├── .gitignore
├── README.md
├── ROISegNet
    ├── ROISegNet_2016.prototxt
    ├── ROISegNet_2017.prototxt
    ├── solve.py
    ├── solver_davis16.prototxt
    └── solver_davis17.prototxt
├── data
    ├── README.md
    └── dowdload_DAVIS16.sh
├── demo
    ├── combine_mask.m
    ├── deploy.prototxt
    ├── infer_davis16.py
    ├── jaccard_single.m
    └── weights.mat
├── download_all.sh
├── figures
    └── framework.png
├── models
    └── download_model.sh
├── part-tracking
    └── siamese-fc
        └── download_parts.sh
├── python_layers
    ├── bbox_accuracy_layer.py
    ├── bbox_data_layer.py
    ├── bbox_iou.py
    ├── compute_similarity.py
    ├── davis2016_ROIdata_layer.py
    ├── davis2016_fgbg_data_layer.py
    ├── davis2016_siamese_data_layer.py
    ├── davis2017_ROIdata_layer.py
    ├── davis_data_layer.py
    ├── davis_siamese_data_layer.py
    ├── davis_siamese_data_layer3.py
    ├── davis_test_score_layer.py
    ├── mask_data_layer.py
    ├── mask_data_layer2.py
    ├── readme
    ├── response_value.py
    ├── siamese_online_hard_mining_layer.py
    └── slice_layer.py
├── results
    └── download_results.sh
└── test_davis16.sh

/.gitignore:
--------------------------------------------------------------------------------
1 | results/favos*
2 | results-demo/*
3 | data/DAVIS2016
4 | part-tracking/siamese-fc/DAVIS2016
5 | *.zip
6 | *.tar
7 | *.caffemodel
8 | *.solverstate
9 |
10 | # python
11 |
12 | __pycache__/
13 | *.py[cod]
14 | *$py.class
15 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Fast and Accurate Online Video Object Segmentation via Tracking Parts (FAVOS)
2 |
3 | ![Alt Text](https://github.com/JingchunCheng/FAVOS/blob/master/figures/framework.png)
4 |
5 | Contact: Jingchun Cheng (chengjingchun14 at 163 dot com)
6 |
7 | ## Paper
8 | [Fast and Accurate Online Video Object Segmentation via Tracking Parts](https://arxiv.org/abs/1806.02323)
9 | [Jingchun Cheng](https://sites.google.com/view/jingchun-cheng), [Yi-Hsuan Tsai](https://sites.google.com/site/yihsuantsai/home), [Wei-Chih Hung](https://hfslyc.github.io/), [Shengjin Wang](http://www.tsinghua.edu.cn/publish/eeen/3784/2010/20101219115601212198627/20101219115601212198627_.html) and [Ming-Hsuan Yang](http://faculty.ucmerced.edu/mhyang/)
10 | IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (Spotlight)
11 |
12 | This is the authors' demo code (single-GPU version) for the DAVIS 2016 dataset, as described in the paper above. Please cite our paper if you find it useful for your research.
13 |
14 | ```
15 | @inproceedings{Cheng_favos_2018,
16 | author = {J. Cheng and Y.-H. Tsai and W.-C. Hung and S. Wang and M.-H. Yang},
17 | booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
18 | title = {Fast and Accurate Online Video Object Segmentation via Tracking Parts},
19 | year = {2018}
20 | }
21 | ```
22 |
23 | ## FAVOS results
24 | [Segmentation Comparisons with Fast Online Methods](https://www.dropbox.com/s/l95ozepuohie7x4/DAVIS16_segmentation_comparison_methods_with_strong_applicability.avi?dl=0)
25 |
26 | [Example Video of Part Tracking](https://www.dropbox.com/s/3yszhdjz6klpmzr/Illustration_part_tracking.avi?dl=0)
27 |
28 |
29 | ## Requirements
30 | * Caffe (pycaffe)
31 | * OpenCV
32 | * MATLAB
33 | * A GPU with at least 12 GB of memory
34 |
35 | Download the DAVIS 2016 dataset, trained models, tracked parts, and pre-computed results:
36 | ```
37 | sh download_all.sh
38 | ```
39 |
40 | ## Test our model
41 | We provide an example testing script, `test_davis16.sh`:
42 | ```
43 | # Please run download_all.sh first
44 | # Usage: sh test_davis16.sh <gpu_id> <sequence_name>
45 |
46 | sh test_davis16.sh 0 blackswan
47 | ```
48 | The results are saved in `results-demo/res_favos/`.
49 | You can replace the sequence name with any other sequence in the DAVIS 2016 validation set to obtain results for other videos.
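To run the demo on several sequences in one go, a small driver can wrap `test_davis16.sh`. The following is a minimal sketch and not part of the released code: it is written for Python 3, it assumes the script takes a GPU id and a sequence name as in the example above, and the helper names `build_commands` and `run_all` are hypothetical.

```python
import subprocess


def build_commands(gpu_id, sequences, script="test_davis16.sh"):
    """Return one `sh <script> <gpu_id> <sequence>` command per sequence."""
    return [["sh", script, str(gpu_id), name] for name in sequences]


def run_all(gpu_id, sequences):
    """Run the demo script on each sequence in turn, stopping at the first failure."""
    for cmd in build_commands(gpu_id, sequences):
        # check=True raises CalledProcessError if the script exits with a non-zero status
        subprocess.run(cmd, check=True)
```

Calling `run_all(0, ["blackswan", "camel", "dog"])` from the repository root would then process three validation sequences on GPU 0, one after another.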
50 |
51 |
52 | ## Train your own ROISegNet
53 | Download the [ResNet-101 model](https://www.dropbox.com/s/2506oyjkwy7acjv/init.caffemodel?dl=0) and save it in the `models` folder as `init.caffemodel`:
54 | ```
55 | cd ROISegNet
56 | python solve.py ../models/init.caffemodel solver_davis16.prototxt 0
57 | ```
58 |
59 | ## Tracker
60 | We use the SiaFC tracker from [Fully-Convolutional Siamese Networks for Object Tracking](https://github.com/bertinetto/siamese-fc).
61 | The pre-computed parts and tracking results on DAVIS 2016 can be downloaded [here](https://www.dropbox.com/s/pkqlzlhwun4qwuu/parts_DAVIS2016.tar?dl=0).
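For reference, `demo/infer_davis16.py` reads the tracked parts of a sequence from a `.mat` file under `part-tracking/siamese-fc/DAVIS2016` and turns the boxes of each frame into ROI rows for the network. The sketch below restates that per-frame step in plain Python. It is an illustration under assumptions rather than the released code: the function name `frame_rois` is hypothetical, and the `boxes[part][frame]` layout with `(x1, y1, x2, y2)` coordinates is inferred from how the demo script indexes the `boxes` array and crops the image.

```python
def frame_rois(boxes, frame_idx):
    """Build [0, x1, y1, x2, y2] rows for every tracked part in one frame.

    `boxes` is a nested sequence indexed as boxes[part][frame][coordinate]
    (an assumed layout, inferred from the indexing in demo/infer_davis16.py).
    """
    rois = []
    for part in boxes:
        x1, y1, x2, y2 = part[frame_idx][:4]
        # Round to whole pixels and clamp negative coordinates to 0, as the demo does
        coords = [max(0.0, float(round(v))) for v in (x1, y1, x2, y2)]
        # Column 0 is left at 0, matching the ROI array built in the demo script
        rois.append([0.0] + coords)
    return rois
```

In the demo itself the file is loaded with `scipy.io.loadmat`, and the resulting rows are passed through the network in batches of 200.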
62 |
63 | Note that we are currently working on a stable version that combines part tracking and ROISegNet for practical use on arbitrary videos. We will update the code in the near future.
64 |
65 | ## Download our segmentation results on the DAVIS datasets
66 | * FAVOS on DAVIS2016 [link](https://www.dropbox.com/s/9zwob31bz91u75h/favos.tar?dl=0)
67 | * FAVOS on DAVIS2017 [link](https://www.dropbox.com/s/8gtcgf27qdhzyqu/favos_2017.tar?dl=0)
68 |
69 |
70 | ## Note
71 | The models and code are available for non-commercial research purposes only.
72 |
73 | 06/2018: demo code released
74 |
75 |
--------------------------------------------------------------------------------
/ROISegNet/solve.py:
--------------------------------------------------------------------------------
1 | import os,sys
2 | sys.path.append("../caffe")
3 | sys.path.append("../caffe/python")
4 | sys.path.append("../caffe/python/caffe")
5 |
6 | sys.path.insert(0, "../../fcn_python/")
7 | sys.path.insert(0, "../../python_layers/")
8 |
9 |
10 | import caffe
11 | import surgery
12 |
13 | import numpy as np
14 |
15 |
16 | weights = sys.argv[1]
17 | solver_proto = sys.argv[2]
18 | gpu_id = int(sys.argv[3])
19 |
20 | caffe.set_device(gpu_id)
21 | caffe.set_mode_gpu()
22 |
23 | solver = caffe.SGDSolver(solver_proto)
24 | solver.net.copy_from(weights)
25 |
26 | # surgeries
27 | interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
28 | surgery.interp(solver.net, interp_layers)
29 |
30 | for _ in range(20):
31 |     solver.step(10000)
32 |
33 |
34 |
--------------------------------------------------------------------------------
/ROISegNet/solver_davis16.prototxt:
--------------------------------------------------------------------------------
1 | train_net: "ROISegNet_2016.prototxt"
2 | display: 20
3 |
4 | average_loss: 20
5 | lr_policy: "fixed"
6 | # lr for unnormalized softmax
7 | base_lr: 1e-5
8 |
9 | # high momentum
10 | momentum: 0.99
11 |
12 | clip_gradients: 20
13 | # no gradient accumulation
14 |
15 | iter_size: 1
16 | max_iter: 100000
17 | weight_decay: 0.0005
18 | snapshot: 5000
19 | snapshot_prefix: "../models/ROISegNet_2016"
20 |
21 | test_initialization: false
22 |
--------------------------------------------------------------------------------
/ROISegNet/solver_davis17.prototxt:
--------------------------------------------------------------------------------
1 | train_net: "ROISegNet_2017.prototxt"
2 | display: 20
3 |
4 | average_loss: 20
5 | lr_policy: "fixed"
6 | # lr for unnormalized softmax
7 | base_lr: 1e-5
8 |
9 | # high momentum
10 | momentum: 0.99
11 |
12 | clip_gradients: 20
13 | # no gradient accumulation
14 |
15 | iter_size: 1
16 | max_iter: 100000
17 | weight_decay: 0.0005
18 | snapshot: 5000
19 | snapshot_prefix: "../models/ROISegNet_2017"
20 |
21 | test_initialization: false
22 |
--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
1 | Download the DAVIS 2016 and the DAVIS 2017 datasets, and put them in this folder.
2 |
--------------------------------------------------------------------------------
/data/dowdload_DAVIS16.sh:
--------------------------------------------------------------------------------
1 | if [ !
-d "DAVIS2016" ]; then 2 | wget https://graphics.ethz.ch/Downloads/Data/Davis/DAVIS-data.zip 3 | unzip DAVIS-data.zip 4 | mv DAVIS DAVIS2016 5 | rm -f DAVIS-data.zip 6 | fi 7 | -------------------------------------------------------------------------------- /demo/combine_mask.m: -------------------------------------------------------------------------------- 1 | function combine_mask(class_name) 2 | 3 | DAVIS_dir = '../data/DAVIS2016/'; 4 | base_dir = '../results/favos_baseline/'; 5 | part_dir = '../results-demo/res_part/'; 6 | res_dir = '../results-demo/res_favos/'; 7 | 8 | 9 | assert(exist(base_dir)>0, 'Please download baseline results "results/favos_baseline".'); 10 | assert(exist(part_dir)>0, 'Please run part segmentation first.'); 11 | 12 | if ~exist(res_dir) 13 | mkdir(res_dir); 14 | end 15 | 16 | 17 | %% combine parts 18 | mkdir([res_dir,class_name]); 19 | 20 | images = dir(fullfile(DAVIS_dir,'JPEGImages/480p/',class_name,'/*.jpg')); 21 | for k = 1:length(images) 22 | 23 | ann_img = imread([DAVIS_dir,'Annotations/480p/',class_name,'/00000.png']); 24 | ann_img = uint8(ann_img>0); 25 | 26 | num_obj = 1; 27 | im_name = images(k).name; 28 | im_name = im_name(1:end-4); 29 | 30 | img = imread([DAVIS_dir,'JPEGImages/480p/', class_name,'/',im_name,'.jpg']); 31 | base_img = imread([base_dir,class_name,'/',im_name,'.png']); 32 | if (size(base_img,1)~=size(img,1)) || (size(base_img,2)~=size(img,2)) 33 | base_img = imresize(base_img,[size(img,1),size(img,2)]); 34 | end 35 | [H, W, ~] = size(img); 36 | 37 | 38 | mask_img = zeros(size(ann_img)); 39 | mask_num = zeros(size(ann_img)); 40 | 41 | 42 | mask_img_sim = zeros(size(ann_img)); 43 | mask_img_score = zeros(size(ann_img)); 44 | 45 | roi_seg_file = [part_dir, class_name,'/',im_name,'.mat']; 46 | load(roi_seg_file); 47 | 48 | 49 | box_num = size(mask,1); 50 | 51 | 52 | 53 | for j = 1:box_num 54 | 55 | box = rois(j,:); 56 | box = box(2:end); 57 | 58 | 59 | [~,~,roi_h,roi_w] = size(mask); 60 | 61 | box = round(box); 62 | 
box(3) = box(3)-box(1)+1; 63 | box(4) = box(4)-box(2)+1; 64 | ww = box(3); 65 | hh = box(4); 66 | 67 | patch = imcrop(img,box); 68 | if numel(patch) == 0 69 | continue 70 | end 71 | 72 | tmp_mask = mask(j,:,:,:); 73 | [~,~,roi_h,roi_w] = size(tmp_mask); 74 | tmp_mask = reshape(tmp_mask,[2, roi_h,roi_w]); 75 | res_mask = exp(tmp_mask(2,:,:))./(exp(tmp_mask(1,:,:))+exp(tmp_mask(2,:,:))); 76 | res_mask = reshape(res_mask,[roi_h,roi_w]); 77 | 78 | 79 | roi_mask = imresize(res_mask,[hh, ww]); 80 | box = rois(j,:); 81 | box = round(box(2:end)); 82 | if box(2) == 0 83 | box(2) = 1; 84 | box(4) = box(4)+1; 85 | end 86 | if box(1) == 0 87 | box(1) = 1; 88 | box(3) = box(3)+1; 89 | end 90 | box(3) = min(box(3), W); 91 | box(4) = min(box(4), H); 92 | if (hh~=box(4)-box(2)+1) || (ww~=box(3)-box(1)+1) 93 | roi_mask = roi_mask(1:box(4)-box(2)+1,1:box(3)-box(1)+1); 94 | end 95 | 96 | mask_img(box(2):box(4),box(1):box(3)) = mask_img(box(2):box(4),box(1):box(3)) + roi_mask; 97 | mask_num(box(2):box(4),box(1):box(3)) = mask_num(box(2):box(4),box(1):box(3)) + 1; 98 | end 99 | 100 | mask_img_num = double(mask_num); 101 | 102 | mask_num(mask_num == 0) = 1; 103 | mask_img = mask_img./mask_num; 104 | mask_img = mask_img/max(max(mask_img)); 105 | 106 | img_parts = mask_img; 107 | 108 | 109 | %% combine with whole image predictions 110 | num_obj = 1; 111 | im_name = images(k).name; 112 | im_name = im_name(1:end-4); 113 | 114 | 115 | if k == 1 116 | load('weights.mat'); 117 | ratio = weights(find(strcmp(classes,class_name)>0)); 118 | end 119 | 120 | base_img = imread([base_dir,class_name,'/',im_name,'.png']); 121 | if (size(base_img,1)~=size(ann_img,1)) || (size(base_img,2)~=size(ann_img,2)) 122 | base_img = imresize(base_img,[size(ann_img,1),size(ann_img,2)]); 123 | end 124 | base_img = double(base_img); 125 | base_img = base_img/max(max(base_img)); 126 | 127 | img_pick = img_parts; 128 | 129 | res_img = base_img*ratio + img_pick*(1-ratio); 130 | 131 | % soft output 132 | % 
imwrite(res_img, [res_dir,class_name,'/',im_name,'.jpg']); 133 | imwrite((res_img>0.5), [res_dir,class_name,'/',im_name,'.png']); 134 | 135 | 136 | end 137 | 138 | quit; 139 | 140 | -------------------------------------------------------------------------------- /demo/infer_davis16.py: -------------------------------------------------------------------------------- 1 | import os,sys 2 | sys.path.append("../caffe") 3 | sys.path.append("../caffe/python") 4 | sys.path.append("../caffe/python/caffe") 5 | 6 | import caffe 7 | 8 | import numpy as np 9 | from PIL import Image 10 | import scipy.io 11 | 12 | from scipy.misc import imresize 13 | 14 | import os 15 | from scipy import io 16 | 17 | import shutil 18 | 19 | from numpy import * 20 | import math 21 | import random 22 | 23 | 24 | 25 | def load_image(im_name): 26 | im = Image.open(im_name) 27 | print >> sys.stderr, 'loading {}'.format(im_name) 28 | return im 29 | 30 | 31 | def func_dist(feat1, feat2): 32 | dist = np.zeros([feat1.shape[0], feat2.shape[0]], dtype=np.float32) 33 | for i in range(feat1.shape[0]): 34 | for j in range(feat2.shape[0]): 35 | vector1 = np.float32(mat(feat1[i,...])) 36 | vector2 = np.float32(mat(feat2[j,...])) 37 | 38 | dist[i][j] = sqrt((vector1-vector2)*((vector1-vector2).T)) 39 | 40 | return dist 41 | 42 | 43 | def get_img_rois(img, rois, pool_h, pool_w): 44 | box_num = rois.shape[0] 45 | img_rois = np.zeros((box_num, 3, pool_h, pool_w)) 46 | for i in range(box_num): 47 | bb = rois[i,1:5] 48 | ROI = img.crop(bb) 49 | ROI = ROI.resize((np.int(pool_w), np.int(pool_h))) 50 | in_ = np.array(ROI, dtype=np.float32) 51 | in_ = in_[:,:,::-1] 52 | in_ -= np.array((104.00698793,116.66876762,122.67891434)) 53 | ROI = in_.transpose((2,0,1)) 54 | img_rois[i,...] 
= ROI
55 |
56 |     return img_rois
57 |
58 |
59 |
60 | def get_bbox(label):
61 |     label = np.array(label, dtype=np.uint8)
62 |     pos = np.where(label >0)
63 |     if len(pos[0])<100:
64 |         print >> sys.stderr, 'skipping very small object (area = {})'.format(len(pos[0]))
65 |         bb = [0, 0, 0, 0]
66 |         return bb
67 |     else:
68 |         bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
69 |         print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0]))
70 |         return bb
71 |
72 |
73 | def load_label(im_name):
74 |     print >> sys.stderr, 'loading {}'.format(im_name)
75 |     im = Image.open(im_name)
76 |     label = np.array(im, dtype=np.uint8)
77 |     return label
78 |
79 |
80 | davis_dir = '../data/DAVIS2016/'
81 | file_out = '../results-demo/res_part/'
82 |
83 | if not os.path.exists(file_out):
84 |     os.makedirs(file_out)
85 |
86 | model = sys.argv[1]
87 | deploy_proto = sys.argv[2]
88 | cls_name = sys.argv[3]
89 | device_id = int(sys.argv[4])
90 |
91 | caffe.set_device(device_id)
92 | caffe.set_mode_gpu()
93 | net = caffe.Net(deploy_proto, model, caffe.TEST)
94 |
95 | img_path = '{}/JPEGImages/480p/{}'.format(davis_dir, cls_name)
96 | images = os.listdir(img_path)
97 | images = sorted(images)
98 |
99 |
100 | det_dir = '../part-tracking/siamese-fc/DAVIS2016'
101 | det_name = '{}/{}.mat.mat'.format(det_dir, cls_name)
102 | data = io.loadmat(det_name)
103 | Dboxes = data['boxes']
104 |
105 |
106 | for idx in range(len(images)):
107 |
108 |     if os.path.exists(file_out) == False:
109 |         os.mkdir(file_out)
110 |
111 |     if os.path.exists('{}/{}'.format(file_out, cls_name)) == False:
112 |         os.mkdir('{}/{}'.format(file_out, cls_name))
113 |
114 |     im_name = '{}/JPEGImages/480p/{}/{}'.format(davis_dir, cls_name, images[idx])
115 |
116 |     ss = images[idx].split('.jpg')
117 |     ss = ss[0]
118 |
119 |     img_idx = int(ss)
120 |     dets = Dboxes[0:Dboxes.shape[0],img_idx,0:Dboxes.shape[2]]
121 |     dets = np.round(dets)
122 |
123 |     pos = np.where(dets<0)
124 |
125 |     for k in range(len(pos[0])):
126 |         dets[pos[0][k],
pos[1][k]] = 0 127 | 128 | input_roi = np.zeros((dets.shape[0],5)) 129 | for k in range(dets.shape[0]): 130 | input_roi[k, 1] = dets[k, 0] 131 | input_roi[k, 2] = dets[k, 1] 132 | input_roi[k, 3] = dets[k, 2] 133 | input_roi[k, 4] = dets[k, 3] 134 | 135 | img1 = load_image(im_name) 136 | pool_w = 80 137 | pool_h = 80 138 | 139 | NUM = 200 140 | if dets.shape[0] > 0: 141 | input_rois = input_roi 142 | print >> sys.stderr,'box num = {}'.format(dets.shape[0]) 143 | outs2 = np.zeros([dets.shape[0],2,pool_w,pool_h], dtype = np.float32) 144 | for k in range( np.int(np.ceil( dets.shape[0]*1.0/NUM)) ): 145 | input_roi = input_rois[k*NUM:np.min([(k+1)*NUM,dets.shape[0]]), ...] 146 | datas = get_img_rois(img1, input_roi, pool_h, pool_w) 147 | net.blobs['datas'].reshape(*datas.shape) 148 | net.blobs['datas'].data[...] = datas 149 | net.forward() 150 | out1 = net.blobs['feat'].data 151 | out2 = net.blobs['pred_mask'].data 152 | 153 | if k == 0: 154 | feat_len = out1.shape[1] 155 | outs1 = np.zeros([dets.shape[0], feat_len], dtype = np.float32) 156 | print >> sys.stderr, 'Feature length = {}, Candidate number = {}'.format(feat_len, dets.shape[0]) 157 | 158 | outs1[k*NUM:np.min([(k+1)*NUM,dets.shape[0]]),...] = out1.reshape([out1.shape[0],out1.shape[1]]) 159 | outs2[k*NUM:np.min([(k+1)*NUM,dets.shape[0]]),...] 
= out2 160 | print>>sys.stderr, 'Processing {}->{}'.format(k*NUM, np.min([(k+1)*NUM,dets.shape[0]])) 161 | 162 | io.savemat('{}/{}/{}.mat'.format(file_out, cls_name, ss), {'mask': outs2, 'rois': input_rois}) 163 | 164 | -------------------------------------------------------------------------------- /demo/jaccard_single.m: -------------------------------------------------------------------------------- 1 | function [J, inters, fp, fn] = jaccard_single( object, ground_truth ) 2 | 3 | % Make sure they're binary 4 | object = logical(object); 5 | ground_truth = logical(ground_truth); 6 | 7 | % Intersection between all sets 8 | inters = object.*ground_truth; 9 | fp = object.*(1-inters); 10 | fn = ground_truth.*(1-inters); 11 | 12 | % Areas of the intersections 13 | inters = sum(inters(:)); % Intersection 14 | fp = sum(fp(:)); % False positives 15 | fn = sum(fn(:)); % False negatives 16 | 17 | % Compute the fraction 18 | denom = inters + fp + fn; 19 | if denom==0 20 | J = 1; 21 | else 22 | J = inters/denom; 23 | end 24 | end 25 | -------------------------------------------------------------------------------- /demo/weights.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JingchunCheng/FAVOS/27662f841ed782d7a2701be1d72c481a1a203445/demo/weights.mat -------------------------------------------------------------------------------- /download_all.sh: -------------------------------------------------------------------------------- 1 | cd data && sh ./dowdload_DAVIS16.sh && cd ../ 2 | cd models && sh ./download_model.sh && cd ../ 3 | cd part-tracking/siamese-fc && sh ./download_parts.sh && cd ../../ 4 | cd results && sh ./download_results.sh && cd ../ 5 | -------------------------------------------------------------------------------- /figures/framework.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/JingchunCheng/FAVOS/27662f841ed782d7a2701be1d72c481a1a203445/figures/framework.png -------------------------------------------------------------------------------- /models/download_model.sh: -------------------------------------------------------------------------------- 1 | if [ ! -f "ROISegNet_2016.caffemodel" ]; then 2 | wget https://www.dropbox.com/s/tkfa22j0ypq8ncq/ROISegNet_2016.caffemodel 3 | fi 4 | 5 | -------------------------------------------------------------------------------- /part-tracking/siamese-fc/download_parts.sh: -------------------------------------------------------------------------------- 1 | if [ ! -d "DAVIS2016" ]; then 2 | wget https://www.dropbox.com/s/pkqlzlhwun4qwuu/parts_DAVIS2016.tar 3 | tar -xf parts_DAVIS2016.tar 4 | rm -f parts_DAVIS2016.tar 5 | fi 6 | 7 | -------------------------------------------------------------------------------- /python_layers/bbox_accuracy_layer.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | import numpy as np 3 | import math 4 | from scipy.misc import imresize 5 | import sys 6 | import random 7 | 8 | class BBoxAccLayer(caffe.Layer): 9 | 10 | def setup(self, bottom, top): 11 | # check input pair 12 | if len(bottom) != 2: 13 | raise Exception("Need two bottoms: score label") 14 | 15 | 16 | def reshape(self, bottom, top): 17 | #top[0].reshape(bottom[0].data.shape[0],bottom[0].data.shape[1],bottom[0].data.shape[2],bottom[0].data.shape[3])I 18 | #top[1].reshape(bottom[1].data.shape[0],bottom[1].data.shape[1],bottom[1].data.shape[2],bottom[1].data.shape[3]) 19 | self.score = np.zeros_like(bottom[0].data, dtype=np.float32) 20 | self.label = np.zeros_like(bottom[1].data, dtype=np.float32) 21 | self.H = 480 22 | self.W = 854 23 | 24 | 25 | 26 | def forward(self, bottom, top): 27 | self.score = bottom[0].data 28 | self.label = bottom[1].data 29 | label = self.label 30 | num_bg = 0 31 | num_fg = 0 32 | correct_bg = 0 33 | 
correct_fg = 0
34 |         for iw in range(self.W):
35 |             for ih in range(self.H):
36 |                 if label[0,0,ih,iw] == -1:
37 |                     continue
38 |                 elif self.score[0,0,ih,iw]>self.score[0,1,ih,iw]:
39 |                     if label[0,0,ih,iw] == 0:
40 |                         num_bg = num_bg + 1
41 |                         correct_bg = correct_bg + 1
42 |                     else:
43 |                         num_fg = num_fg + 1
44 |                 else:
45 |                     if label[0,0,ih,iw] == 1:
46 |                         num_fg = num_fg + 1
47 |                         correct_fg = correct_fg + 1
48 |                     else:
49 |                         num_bg = num_bg + 1
50 |
51 |         if num_bg > 0 and num_fg > 0:
52 |             print >> sys.stderr,'pr_bg = {}; pr_fg = {}; pr_all = {}'.format(correct_bg*1.0/num_bg,correct_fg*1.0/num_fg,(correct_bg+correct_fg)*1.0/(num_bg+num_fg))
53 |
54 |
55 |     def backward(self, top, propagate_down, bottom):
56 |         pass
57 |
--------------------------------------------------------------------------------
/python_layers/bbox_data_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 |
9 | class BboxDataLayer(caffe.Layer):
10 |
11 |     def setup(self, bottom, top):
12 |         # check input pair
13 |         if len(bottom) != 2:
14 |             raise Exception("Need two bottoms: label, rois")
15 |         if len(top) != 1:
16 |             raise Exception("Need one top: bbox_targets")
17 |
18 |
19 |     def reshape(self, bottom, top):
20 |         self.label = np.zeros_like(bottom[0].data, dtype=np.float32)
21 |         self.rois = np.zeros_like(bottom[1].data, dtype=np.float32)
22 |         self.bbox_targets = np.zeros([bottom[1].data.shape[0], 4], dtype=np.float32)
23 |         top[0].reshape(self.rois.shape[0],4)
24 |
25 |     def forward(self, bottom, top):
26 |         self.label = bottom[0].data
27 |         self.label = self.label.reshape([self.label.shape[1],self.label.shape[2]])
28 |         self.rois = bottom[1].data
29 |         self.box_num = self.rois.shape[0]
30 |         gt_box = self.get_bbox(self.label)
31 |
32 |         for i in range(self.box_num):
33 |             bbgt = self.transform_box(gt_box, self.rois[i,1:5])
34 |             bbgt =
bbgt.reshape(1,4)
35 |             self.bbox_targets[i,...] = bbgt
36 |
37 |
38 |
39 |         top[0].reshape(self.box_num,4)
40 |         top[0].data[...] = self.bbox_targets
41 |
42 |
43 |
44 |     def backward(self, top, propagate_down, bottom):
45 |         pass
46 |
47 |
48 |     def get_bbox(self, label):
49 |         label = np.array(label, dtype=np.uint8)
50 |         pos = np.where(label == 1)
51 |         bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
52 |         #print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0]))
53 |         return bb
54 |
55 |
56 |     def transform_box(self,box,roi):
57 |         x_a = (roi[0] + roi[2])*1.0/2
58 |         y_a = (roi[1] + roi[3])*1.0/2
59 |         w_a = (roi[2]-roi[0]+1)*1.0
60 |         h_a = (roi[3]-roi[1]+1)*1.0
61 |
62 |         bb = zeros([4,1],dtype=np.float32)
63 |         bb[0] = (box[0] - x_a)*1.0/w_a
64 |         bb[1] = (box[1] - y_a)*1.0/h_a
65 |         bb[2] = box[2] - box[0] + 1
66 |         bb[2] = math.log(bb[2]*1.0/w_a)
67 |         bb[3] = box[3] - box[1] + 1
68 |         bb[3] = math.log(bb[3]*1.0/h_a)
69 |
70 |         #print >> sys.stderr, 'box = {}, box_transform = {}'.format(box, bb)
71 |         return bb
72 |
--------------------------------------------------------------------------------
/python_layers/bbox_iou.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 | import math
9 |
10 | class BBoxIoULayer(caffe.Layer):
11 |
12 |     def setup(self, bottom, top):
13 |         # check input triple
14 |         if len(bottom) != 3:
15 |             raise Exception("Need three bottoms: label, roi, bbox_pred")
16 |
17 |
18 |     def reshape(self, bottom, top):
19 |         self.label = np.zeros_like(bottom[0].data, dtype=np.float32)
20 |         self.roi = np.zeros_like(bottom[1].data, dtype=np.float32)
21 |         self.bbox_pred = np.zeros_like(bottom[2].data, dtype=np.float32)
22 |
23 |
24 |
25 |     def forward(self, bottom, top):
26 |         self.label = bottom[0].data
27 |         self.label =
self.label.reshape([self.label.shape[1],self.label.shape[2]]) 28 | self.rois = bottom[1].data 29 | self.bbox_pred = bottom[2].data 30 | box_num = self.rois.shape[0] 31 | gt_box = self.get_bbox(self.label) 32 | # print >> sys.stderr, self.label.shape 33 | 34 | IoUs = np.zeros([box_num, 2], dtype=np.float32) 35 | for i in range(box_num): 36 | roi = self.rois[i,1:5] 37 | box = self.bbox_pred[i,...,0,0] 38 | box = self.recover_box(box,roi) 39 | iou_roi = self.func_iou(roi, gt_box) 40 | iou_pred = self.func_iou(box, gt_box) 41 | IoUs[i,0]= iou_roi 42 | IoUs[i,1]= iou_pred 43 | 44 | # print >> sys.stderr, 'gt = {}'.format(gt_box) 45 | ave_iou = np.sum(IoUs, axis=0)*1.0/box_num 46 | 47 | print >> sys.stderr,'IoU before bbrg = {}; IoU after bbrg = {};'.format(ave_iou[0], ave_iou[1]) 48 | 49 | 50 | def backward(self, top, propagate_down, bottom): 51 | pass 52 | 53 | 54 | def func_iou(self, bb, gtbb): 55 | iou = 0 56 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1 57 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1 58 | if iw>0 and ih>0: 59 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih 60 | iou = np.float32(iw*ih*1.0/ua) 61 | 62 | return iou 63 | 64 | 65 | 66 | def get_bbox(self, label): 67 | label = np.array(label, dtype=np.uint8) 68 | pos = np.where(label == 1) 69 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 70 | print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0])) 71 | return bb 72 | 73 | 74 | def recover_box(self,box,roi): 75 | x_a = (roi[0] + roi[2])*1.0/2 76 | y_a = (roi[1] + roi[3])*1.0/2 77 | w_a = (roi[2]-roi[0]+1)*1.0 78 | h_a = (roi[3]-roi[1]+1)*1.0 79 | 80 | x = box[0]*w_a + x_a 81 | y = box[1]*h_a + y_a 82 | w = math.exp(box[2])*w_a 83 | h = math.exp(box[3])*h_a 84 | 85 | bb = zeros([4,1],dtype=np.float32) 86 | bb[0] = x - w*1.0/2 87 | bb[1] = y - h*1.0/2 88 | bb[2] = x + w*1.0/2 89 | bb[3] = y + h*1.0/2 90 | 91 | # print >> sys.stderr, 'box_transform = {}, box = 
{}'.format(box, bb) 92 | return bb 93 | 94 | -------------------------------------------------------------------------------- /python_layers/compute_similarity.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | import numpy as np 3 | from numpy import * 4 | import math 5 | from scipy.misc import imresize 6 | import sys 7 | import random 8 | 9 | class SimilarityLayer(caffe.Layer): 10 | 11 | def setup(self, bottom, top): 12 | # check input pair 13 | if len(bottom) != 2: 14 | raise Exception("Need two bottoms: bbox, feats") 15 | 16 | 17 | def reshape(self, bottom, top): 18 | self.bbox = np.zeros_like(bottom[0].data, dtype=np.float32) 19 | self.feats = np.zeros_like(bottom[1].data, dtype=np.float32) 20 | self.H = 480 21 | self.W = 854 22 | top[0].reshape(self.feats.shape[0],self.bbox.shape[0]) 23 | 24 | 25 | def forward(self, bottom, top): 26 | self.feats = bottom[1].data 27 | self.bbox = bottom[0].data 28 | num_obj = self.bbox.shape[0] 29 | self.feat1 = self.feats[0:num_obj,...] 30 | self.feat2 = self.feats[num_obj:self.feats.shape[0],...] 31 | #print self.feat1.shape 300,4096,1,1 32 | #print >> sys.stderr, self.feat1 33 | #print >> sys.stderr, self.feat2 34 | print >> sys.stderr, 'Number of objects: {}'.format(num_obj) 35 | 36 | dist = np.zeros([self.feat2.shape[0], num_obj], dtype=np.float32) 37 | for obj_id in range(num_obj): 38 | for i in range(self.feat2.shape[0]): 39 | vector1 = np.float32(mat(self.feat1[obj_id,...,0])) 40 | vector2 = np.float32(mat(self.feat2[i,...,0])) 41 | dist[i, obj_id] = sqrt(((vector1-vector2).T)*(vector1-vector2)) 42 | 43 | top[0].reshape(self.feat2.shape[0], num_obj) 44 | top[0].data[...] 
= dist
45 |
46 |     def backward(self, top, propagate_down, bottom):
47 |         pass
48 |
--------------------------------------------------------------------------------
/python_layers/davis2016_ROIdata_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 |
3 | import numpy as np
4 | from PIL import Image
5 | from scipy.misc import imresize
6 |
7 | import sys
8 |
9 | import random
10 |
11 | from numpy import *
12 | import math
13 | from scipy.misc import imresize
14 |
15 |
16 |
17 | class Davis2016ROIDataLayer(caffe.Layer):
18 |
19 |     def setup(self, bottom, top):
20 |         params = eval(self.param_str)
21 |         self.davis_dir = params['davis_dir']
22 |         self.split = params['split']
23 |         self.mean = np.array(params['mean'])
24 |         self.random = params.get('randomize', True)
25 |         self.seed = params.get('seed', None)
26 |         self.scale = params.get('scale', 1)
27 |         self.fg_rate = params.get('fg_rate', 0.5)
28 |         self.augment = params.get('with_augmentation', True)
29 |         self.fg_random = params.get('fg_random', False)
30 |         self.aug_params = np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip)
31 |         self.H = 480
32 |         self.W = 854
33 |         self.pool_w = params.get('pool_w', 7)
34 |         self.pool_h = params.get('pool_h', 7)
35 |         self.box_num = params.get('box_num', 100)
36 |         self.fg_thre = params.get('fg_thre', 0.7)
37 |
38 |         if self.augment:
39 |             self.aug_num = int(self.aug_params[0])
40 |             self.max_scale = self.aug_params[1]
41 |             self.max_rotate = self.aug_params[2]
42 |             self.max_transW = self.aug_params[3]
43 |             self.max_transH = self.aug_params[4]
44 |             self.flip = (self.aug_params[5]>0)
45 |
46 |         # tops
47 |         if len(top) != 3:
48 |             raise Exception("Need to define three tops: data, labels, weights")
49 |         # data layers have no bottoms
50 |         if len(bottom) != 0:
51 |             raise Exception("Do not define a bottom.")
52 |
53 |         # load indices for images and labels
54 |         split_f =
'{}/ImageSets/480p/{}.txt'.format(self.davis_dir, 55 | self.split) 56 | self.indices = open(split_f, 'r').read().splitlines() 57 | self.idx1 = -1 # we pick idx in reshape 58 | self.idx2 = -1 59 | 60 | # make eval deterministic 61 | if 'train' not in self.split: 62 | self.random = False 63 | 64 | # randomization: seed and pick 65 | if self.random: 66 | random.seed(self.seed) 67 | self.idx = random.randint(0, len(self.indices)-1) 68 | 69 | 70 | def reshape(self, bottom, top): 71 | while True: 72 | # pick next input 73 | if self.random: 74 | self.idx1 = random.randint(0, len(self.indices)-1) 75 | else: 76 | self.idx1 += 1 77 | if self.idx1 == len(self.indices): 78 | self.idx1 = 0 79 | 80 | idx1 = self.idx1 81 | 82 | #get clip name 83 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2] 84 | 85 | if self.augment == False or random.randint(0, self.aug_num) == 0: 86 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0]) 87 | self.label1 = self.load_label(self.indices[idx1].split(' ')[1]) 88 | self.img1 = self.img1.resize((self.H, self.W)) 89 | self.label1 = imresize(self.label1, size=(self.H, self.W), interp="nearest") 90 | else: 91 | scale = (random.random()*2-1) * self.max_scale 92 | rotation = (random.random()*2-1) * self.max_rotate 93 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 94 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 95 | if self.flip: 96 | flip = (random.randint(0,1) > 0) 97 | else: 98 | flip = False 99 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 100 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 101 | 102 | 103 | if self.scale != 1: 104 | self.img1 = self.img1.resize((np.int(self.H*self.scale), np.int(self.W*self.scale))) 105 | print >> sys.stderr, 'SCALE {}'.format(self.scale) 106 | 107 | if np.max(self.label1) == 0: 108 | continue 109 | 110 | if 
np.sum(self.get_bbox(self.label1)) == 0: 111 | # if objects in either image are too small 112 | continue 113 | 114 | if self.fg_random: 115 | self.rois = self.get_rois(self.label1, self.box_num, 1) 116 | else: 117 | self.rois = self.get_bboxes(self.label1, self.box_num) 118 | 119 | self.img_rois = self.get_img_rois(self.img1, self.rois) 120 | self.lab_rois = self.get_lab_rois(self.label1, self.rois) 121 | self.weights = self.calculate_weight_rois(self.lab_rois) 122 | 123 | break 124 | 125 | # reshape tops 126 | top[0].reshape(self.box_num, 3, self.pool_h, self.pool_w) 127 | top[1].reshape(self.box_num, self.pool_h, self.pool_w) 128 | top[2].reshape(self.box_num, self.pool_h, self.pool_w) 129 | 130 | 131 | 132 | def forward(self, bottom, top): 133 | # assign output 134 | top[0].data[...] = self.img_rois 135 | top[1].data[...] = self.lab_rois 136 | top[2].data[...] = self.weights 137 | 138 | def backward(self, top, propagate_down, bottom): 139 | pass 140 | 141 | 142 | def load_image(self, idx): 143 | """ 144 | Load input image and preprocess for Caffe: 145 | - cast to float 146 | - switch channels RGB -> BGR 147 | - subtract mean 148 | - transpose to channel x height x width order 149 | """ 150 | print >> sys.stderr, 'loading {}'.format(idx) 151 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 152 | im = im.resize((self.W, self.H)) 153 | # in_ = np.array(im, dtype=np.float32) 154 | # in_ = in_[:,:,::-1] 155 | # in_ -= self.mean 156 | # in_ = in_.transpose((2,0,1)) 157 | return im 158 | 159 | 160 | def load_label(self, idx): 161 | """ 162 | Load label image as 1 x height x width integer array of label indices. 163 | The leading singleton dimension is required by the loss. 
164 | """ 165 | print >> sys.stderr, 'loading {}'.format(idx) 166 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 167 | im = im.resize((self.W, self.H)) 168 | if self.scale != 1: 169 | im = im.resize((np.int(self.W*self.scale), np.int(self.H*self.scale))) 170 | 171 | label = np.array(im, dtype=np.uint8) 172 | # label = label[np.newaxis, ...] 173 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label)) 174 | return label 175 | 176 | 177 | 178 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip): 179 | img_W = np.int( self.W*(1.0 + scale) ) 180 | img_H = np.int( self.H*(1.0 + scale) ) 181 | 182 | print >> sys.stderr, 'loading {}'.format(idx) 183 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip) 184 | 185 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 186 | im = im.resize((img_W,img_H)) 187 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h)) 188 | im = im.rotate(rotation) 189 | if flip: 190 | im = im.transpose(Image.FLIP_LEFT_RIGHT) 191 | 192 | if scale>0: 193 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H) 194 | im = im.crop(box) 195 | else: 196 | im = im.resize((self.W, self.H)) 197 | 198 | # in_ = np.array(im, dtype=np.float32) 199 | # in_ = in_[:,:,::-1] 200 | # in_ -= self.mean 201 | 202 | return im 203 | 204 | 205 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip): 206 | img_W = np.int( self.W*(1.0 + scale) ) 207 | img_H = np.int( self.H*(1.0 + scale) ) 208 | 209 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 210 | im = im.resize((img_W,img_H)) 211 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h)) 212 | im = im.rotate(rotation) 213 | if flip: 214 | im = im.transpose(Image.FLIP_LEFT_RIGHT) 215 | 216 | if scale>0: 217 | # w_start = np.int(random.random()*(img_W - 
self.W)) 218 | # h_start = np.int(random.random()*(img_H - self.H)) 219 | # box = (w_start, h_start, w_start+self.W, h_start+self.H) 220 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H) 221 | im = im.crop(box) 222 | else: 223 | im = im.resize((self.W, self.H)) 224 | 225 | if self.scale != 1: 226 | im = im.resize((np.int(self.W*self.scale), np.int(self.H*self.scale))) 227 | 228 | label = np.array(im, dtype=np.uint8) 229 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label)) 230 | 231 | return label 232 | 233 | 234 | def get_bboxes(self, label, box_num): 235 | label = np.array(label, dtype=np.uint8) 236 | pos = np.where(label > 0 ) 237 | rois = np.zeros((box_num, 5)) 238 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 239 | for k in range(box_num): 240 | rois[k, 1:5] = bb 241 | return rois 242 | 243 | 244 | def get_bbox(self, label): 245 | label = np.array(label, dtype=np.uint8) 246 | pos = np.where(label > 0 ) 247 | if len(pos[0])<1024: 248 | print >> sys.stderr, 'Escape very small object (area < 1024).' 
249 | bb = [0, 0, 0, 0] 250 | return bb 251 | else: 252 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 253 | print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0])) 254 | return bb 255 | 256 | 257 | def get_rois(self, label, box_num, fg_rate): 258 | label = np.array(label, dtype=np.uint8) 259 | pos = np.where(label > 0) 260 | W = label.shape[1] 261 | H = label.shape[0] 262 | print>>sys.stderr, label.shape 263 | rois = np.zeros((box_num, 5)) 264 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 265 | box = np.zeros_like(gtbb) 266 | bb = np.zeros((1,5)) 267 | k = 0 268 | while k < np.int(box_num*fg_rate): 269 | bb[0, 1] = random.randint(0,W) 270 | bb[0, 2] = random.randint(0,H) 271 | bb[0, 3] = random.randint(bb[0, 1],W) 272 | bb[0, 4] = random.randint(bb[0, 2],H) 273 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 274 | continue 275 | iou = self.func_iou(bb[0, 1:5], gtbb) 276 | # print>>sys.stderr, 'gt = {}; proposal = {}; IoU = {};'.format(gtbb, bb, iou) 277 | if iou > self.fg_thre: 278 | rois[k,...] = bb 279 | k = k + 1 280 | # print>>sys.stderr, 'box_num = {}; gt = {}; proposal = {}; IoU = {};'.format(k, gtbb, bb[0, 1:5], iou) 281 | 282 | while k < box_num: 283 | bb[0, 1] = random.randint(0,W) 284 | bb[0, 2] = random.randint(0,H) 285 | bb[0, 3] = random.randint(bb[0, 1],W) 286 | bb[0, 4] = random.randint(bb[0, 2],H) 287 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 288 | continue 289 | iou = self.func_iou(bb[0, 1:5], gtbb) 290 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou) 291 | if iou < 0.5: 292 | rois[k,...] 
= bb 293 | k = k + 1 294 | 295 | return rois 296 | 297 | 298 | 299 | def func_iou(self, bb, gtbb): 300 | iou = 0 301 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1 302 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1 303 | if iw>0 and ih>0: 304 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih 305 | iou = np.float32(iw*ih*1.0/ua) 306 | 307 | return iou 308 | 309 | def get_img_rois(self, img, rois): 310 | #img = Image.fromarray(img) 311 | img_rois = np.zeros((self.box_num, 3, self.pool_h, self.pool_w)) 312 | for i in range(self.box_num): 313 | bb = rois[i,1:5] 314 | ROI = img.crop(bb) 315 | ROI = ROI.resize((np.int(self.pool_w), np.int(self.pool_h))) 316 | # print>>sys.stderr, ROI.shape 317 | in_ = np.array(ROI, dtype=np.float32) 318 | in_ = in_[:,:,::-1] 319 | in_ -= self.mean 320 | ROI = in_.transpose((2,0,1)) 321 | img_rois[i,...] = ROI 322 | 323 | return img_rois 324 | 325 | 326 | def get_lab_rois(self, lab, rois): 327 | lab = Image.fromarray(lab) 328 | lab_rois = np.zeros((self.box_num, self.pool_h, self.pool_w)) 329 | for i in range(self.box_num): 330 | bb = rois[i,1:5] 331 | ROI = lab.crop(bb) 332 | ROI = ROI.resize((np.int(self.pool_w), np.int(self.pool_h))) 333 | ROI = np.array(ROI, dtype=np.uint8) 334 | ROI = np.uint8(ROI>0) 335 | lab_rois[i,...] = ROI 336 | 337 | return lab_rois 338 | 339 | 340 | def calculate_weight_rois(self, labels): 341 | weights = np.ones((self.box_num, self.pool_h, self.pool_w)) 342 | for i in range(self.box_num): 343 | label = labels[i,...] 
344 | pos = np.where(label==1) 345 | neg = np.where(label==0) 346 | weight_pos = len(pos[0])*1.0/(len(pos[0])+len(neg[0])) 347 | for k in range(len(pos[0])): 348 | weights[i, pos[0][k],pos[1][k]] = 1 - weight_pos 349 | # print >> sys.stderr, 'pos_num = {}, neg_num = {}, weight_pos = {}'.format(len(pos[0]), len(neg[0]), 1 - weight_pos) 350 | 351 | return weights 352 | 353 | -------------------------------------------------------------------------------- /python_layers/davis2016_fgbg_data_layer.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | 3 | import numpy as np 4 | from PIL import Image 5 | 6 | import cv2 7 | from scipy.misc import imresize 8 | from scipy.misc import imrotate 9 | 10 | import sys 11 | 12 | import random 13 | 14 | class DAVIS2016FgBgDataLayer(caffe.Layer): 15 | """ 16 | Load (input image, label image, weight map) triples from DAVIS 2016 17 | one-at-a-time while reshaping the net to preserve dimensions. 18 | 19 | Use this to feed data to a fully convolutional network. 
20 | """ 21 | 22 | def setup(self, bottom, top): 23 | # config 24 | params = eval(self.param_str) 25 | self.davis_dir = params['davis_dir'] 26 | self.split = params['split'] 27 | self.mean = np.array(params['mean']) 28 | self.random = params.get('randomize', True) 29 | self.seed = params.get('seed', None) 30 | self.scale = params.get('scale', 1) 31 | self.augment = params.get('with_augmentation', True) 32 | self.aug_params = np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip) 33 | self.H = 480 34 | self.W = 854 35 | 36 | 37 | # three tops: data, label and weight 38 | if len(top) != 3: 39 | raise Exception("Need to define three tops: data, label, weight") 40 | # data layers have no bottoms 41 | if len(bottom) != 0: 42 | raise Exception("Do not define a bottom.") 43 | 44 | # load indices for images and labels 45 | split_f = '{}/ImageSets/480p/{}.txt'.format(self.davis_dir, 46 | self.split) 47 | self.indices = open(split_f, 'r').read().splitlines() 48 | self.idx = -1 # we pick idx in reshape 49 | 50 | if self.augment: 51 | self.aug_num = np.int(self.aug_params[0]) 52 | self.max_scale = self.aug_params[1] 53 | self.max_rotate = self.aug_params[2] 54 | self.max_transW = self.aug_params[3] 55 | self.max_transH = self.aug_params[4] 56 | self.flip = (self.aug_params[5]>0) 57 | 58 | 59 | # make eval deterministic 60 | if 'train' not in self.split: 61 | self.random = False 62 | 63 | # randomization: seed and pick 64 | if self.random: 65 | random.seed(self.seed) 66 | self.idx = random.randint(0, len(self.indices)-1) 67 | 68 | 69 | def reshape(self, bottom, top): 70 | 71 | while True: 72 | # pick next input 73 | if self.random: 74 | self.idx = random.randint(0, len(self.indices)-1) 75 | else: 76 | self.idx += 1 77 | if self.idx == len(self.indices): 78 | self.idx = 0 79 | 80 | 81 | if self.idx == (len(self.indices) - 1): 82 | continue 83 | 84 | idx = self.idx 85 | 86 | if self.augment == False or random.randint(0, self.aug_num) == 0: 87 | self.img = 
self.load_image(self.indices[idx].split(' ')[0]) 88 | self.label = self.load_label(self.indices[idx].split(' ')[1]) 89 | self.img = imresize(self.img, size=(self.H, self.W), interp="bilinear") 90 | self.label = imresize(self.label, size=(self.H, self.W), interp="nearest") 91 | else: 92 | scale = (random.random()*2-1) * self.max_scale 93 | rotation = (random.random()*2-1) * self.max_rotate 94 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 95 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 96 | if self.flip: 97 | flip = (random.randint(0,1) > 0) 98 | else: 99 | flip = False 100 | self.img = self.load_image_transform(self.indices[idx].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 101 | self.label = self.load_label_transform(self.indices[idx].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 102 | 103 | 104 | if self.scale != 1: 105 | self.img = imresize(self.img, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear") 106 | self.label = imresize(self.label, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest") 107 | 108 | 109 | self.weight = self.calculate_weight(self.label) 110 | self.img = self.img.transpose((2,0,1)) 111 | self.label = self.label[np.newaxis, ...] 112 | self.weight = self.weight[np.newaxis, ...] 113 | break 114 | 115 | # reshape tops to fit (leading 1 is for batch dimension) 116 | top[0].reshape(1, *self.img.shape) 117 | top[1].reshape(1, *self.label.shape) 118 | top[2].reshape(1, *self.weight.shape) 119 | 120 | def forward(self, bottom, top): 121 | top[0].data[...] = self.img 122 | top[1].data[...] = self.label 123 | top[2].data[...] 
= self.weight 124 | 125 | def backward(self, top, propagate_down, bottom): 126 | pass 127 | 128 | 129 | def load_image(self, idx): 130 | """ 131 | Load input image and preprocess for Caffe: 132 | - cast to float 133 | - switch channels RGB -> BGR 134 | - subtract mean 135 | - (the transpose to channel x height x width order is done in reshape) 136 | """ 137 | print >> sys.stderr, 'loading Original {}'.format(idx) 138 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 139 | in_ = np.array(im, dtype=np.float32) 140 | in_ = in_[:,:,::-1] 141 | in_ -= self.mean 142 | return in_ 143 | 144 | 145 | def load_label(self, idx): 146 | """ 147 | Load label image as a height x width binary (foreground/background) uint8 array. 148 | The leading singleton dimension required by the loss is added in reshape. 149 | """ 150 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 151 | label = np.array(np.array(im) > 0, dtype=np.uint8) # threshold the pixel array, not the PIL image object 152 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label)) 153 | 154 | #print(label) 155 | return label 156 | 157 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip): 158 | img_W = np.int( self.W*(1.0 + scale) ) 159 | img_H = np.int( self.H*(1.0 + scale) ) 160 | 161 | print >> sys.stderr, 'loading {}'.format(idx) 162 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip) 163 | 164 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 165 | im = im.resize((img_W,img_H)) 166 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h)) 167 | im = im.rotate(rotation) 168 | if flip: 169 | im = im.transpose(Image.FLIP_LEFT_RIGHT) 170 | 171 | if scale>0: 172 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H) 173 | im = im.crop(box) 174 | else: 175 | im = im.resize((self.W, self.H)) 176 | 177 | 178 | in_ = np.array(im, dtype=np.float32) 179 | in_ = in_[:,:,::-1] 180 | in_ -= self.mean 181 | 182 | return in_ 183 
| 184 | 185 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip): 186 | img_W = np.int( self.W*(1.0 + scale) ) 187 | img_H = np.int( self.H*(1.0 + scale) ) 188 | 189 | 190 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 191 | im = im.resize((img_W,img_H)) 192 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h)) 193 | im = im.rotate(rotation) 194 | if flip: 195 | im = im.transpose(Image.FLIP_LEFT_RIGHT) 196 | 197 | if scale>0: 198 | w_start = np.int((img_W - self.W)/2) # center crop, matching load_image_transform 199 | h_start = np.int((img_H - self.H)/2) 200 | box = (w_start, h_start, w_start+self.W, h_start+self.H) 201 | im = im.crop(box) 202 | else: 203 | im = im.resize((self.W, self.H)) 204 | 205 | label = np.array(np.array(im) > 0, dtype=np.uint8) # threshold the pixel array, not the PIL image object 206 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label)) 207 | 208 | 209 | return label 210 | 211 | 212 | 213 | def calculate_weight(self, annt): 214 | weight = np.zeros_like(annt, dtype = np.float32) + 1 215 | pos = np.where(annt==1) 216 | neg = np.where(annt==0) 217 | pos_weight = 1 - np.float32(len(pos[0]))/np.float32(len(pos[0])+len(neg[0])) 218 | weight = weight - pos_weight 219 | print >> sys.stderr, 'pos_weight: {}'.format(pos_weight) 220 | 221 | for idx in range (len(pos[0])): 222 | weight[pos[0][idx], pos[1][idx]] = pos_weight 223 | 224 | #print(weight) 225 | return weight 226 | -------------------------------------------------------------------------------- /python_layers/davis2016_siamese_data_layer.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | 3 | import numpy as np 4 | from PIL import Image 5 | from scipy.misc import imresize 6 | 7 | import sys 8 | 9 | import random 10 | 11 | class Davis2016SiameseDataLayer(caffe.Layer): 12 | 13 | def setup(self, bottom, top): 14 | params = eval(self.param_str) 15 | self.davis_dir = params['davis_dir'] 16 | self.split = params['split'] 17 | self.mean = 
np.array(params['mean']) 18 | self.random = params.get('randomize', True) 19 | self.seed = params.get('seed', None) 20 | self.scale = params.get('scale', 1) 21 | self.fg_rate = params.get('fg_rate', 0.5) 22 | self.augment = params.get('with_augmentation', True) 23 | self.fg_random = params.get('fg_random', False) 24 | self.aug_params= np.array(params['aug_params']) #( aug_num, max_scale, max_rotate, max_translation, flip) 25 | self.H = 480 26 | self.W = 854 27 | self.box_num = params.get('box_num', 100) 28 | self.fg_thre = params.get('fg_thre', 0.7) 29 | 30 | if self.augment: 31 | self.aug_num = np.int(self.aug_params[0]) 32 | self.max_scale = self.aug_params[1] 33 | self.max_rotate = self.aug_params[2] 34 | self.max_transW = self.aug_params[3] 35 | self.max_transH = self.aug_params[4] 36 | self.flip = (self.aug_params[5]>0) 37 | 38 | # tops 39 | if len(top) != 5: 40 | raise Exception("Need to define five tops: data, labels, label_sim, bbox, rois") 41 | # data layers have no bottoms 42 | if len(bottom) != 0: 43 | raise Exception("Do not define a bottom.") 44 | 45 | # load indices for images and labels 46 | split_f = '{}/ImageSets/480p/{}.txt'.format(self.davis_dir, 47 | self.split) 48 | self.indices = open(split_f, 'r').read().splitlines() 49 | self.idx1 = -1 # we pick idx in reshape 50 | self.idx2 = -1 51 | 52 | # make eval deterministic 53 | if 'train' not in self.split: 54 | self.random = False 55 | 56 | # randomization: seed and pick 57 | if self.random: 58 | random.seed(self.seed) 59 | self.idx = random.randint(0, len(self.indices)-1) 60 | 61 | 62 | def reshape(self, bottom, top): 63 | while True: 64 | # pick next input 65 | if self.random: 66 | self.idx1 = random.randint(0, len(self.indices)-1) 67 | self.idx2 = random.randint(0, len(self.indices)-1) 68 | else: 69 | self.idx1 += 1 70 | self.idx2 += 1 71 | if self.idx1 == len(self.indices): 72 | self.idx1 = 0 73 | if self.idx2 == len(self.indices): 74 | self.idx2 = 0 75 | 76 | idx1 = self.idx1 77 | idx2 = 
self.idx2 78 | 79 | #get clip name 80 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2] 81 | clip2 = self.indices[idx2].split(' ')[0].split('/')[-2] 82 | 83 | if clip1 != clip2: 84 | continue 85 | 86 | 87 | if self.augment == False or random.randint(0, self.aug_num) == 0: 88 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0]) 89 | self.img2 = self.load_image(self.indices[idx2].split(' ')[0]) 90 | self.label1 = self.load_label(self.indices[idx1].split(' ')[1]) 91 | self.label2 = self.load_label(self.indices[idx2].split(' ')[1]) 92 | self.img1 = imresize(self.img1, size=(self.H, self.W), interp="bilinear") 93 | self.label1 = imresize(self.label1, size=(self.H, self.W), interp="nearest") 94 | self.img2 = imresize(self.img2, size=(self.H, self.W), interp="bilinear") 95 | self.label2 = imresize(self.label2, size=(self.H, self.W), interp="nearest") 96 | else: 97 | scale = (random.random()*2-1) * self.max_scale 98 | rotation = (random.random()*2-1) * self.max_rotate 99 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 100 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 101 | if self.flip: 102 | flip = (random.randint(0,1) > 0) 103 | else: 104 | flip = False 105 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 106 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 107 | 108 | scale = (random.random()*2-1) * self.max_scale 109 | rotation = (random.random()*2-1) * self.max_rotate 110 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 111 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 112 | if self.flip: 113 | flip = (random.randint(0,1) > 0) 114 | else: 115 | flip = False 116 | self.img2 = self.load_image_transform(self.indices[idx2].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 117 | self.label2 = 
self.load_label_transform(self.indices[idx2].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 118 | 119 | 120 | if self.scale != 1: 121 | self.img1 = imresize(self.img1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear") 122 | # self.label1 = imresize(self.label1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest", mode='L') 123 | self.img2 = imresize(self.img2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear") 124 | # self.label2 = imresize(self.label2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest", mode='L') 125 | print >> sys.stderr, 'SCALE {}'.format(self.scale) 126 | 127 | 128 | if np.max(self.label1) == 0 or np.max(self.label2) == 0: 129 | continue 130 | 131 | if np.sum(self.get_bbox(self.label1)) == 0 or np.sum(self.get_bbox(self.label2)) == 0: 132 | # if objects in either image are too small 133 | continue 134 | 135 | if self.fg_random: 136 | self.bbox = self.get_rois(self.label1, self.box_num, 1) 137 | else: 138 | self.bbox = self.get_bboxes(self.label1, self.box_num) 139 | self.rois = self.get_rois(self.label2, self.box_num, self.fg_rate) 140 | self.label_sim = np.zeros([self.box_num, 1], dtype = np.uint8) 141 | 142 | for i in range(np.int(self.box_num*self.fg_rate)): 143 | self.label_sim[i] = 1 144 | 145 | # print >> sys.stderr,'label = {}'.format(self.label_sim) 146 | self.img1 = self.img1.transpose((2,0,1)) 147 | self.img2 = self.img2.transpose((2,0,1)) 148 | 149 | self.label1 = np.uint8(self.label1>0) 150 | self.label2 = np.uint8(self.label2>0) 151 | break 152 | 153 | # reshape tops 154 | top[0].reshape(2, *self.img1.shape) 155 | top[1].reshape(2, *self.label1.shape) 156 | top[2].reshape(self.box_num, 1) 157 | top[3].reshape(self.box_num, 5) 158 | top[4].reshape(self.box_num, 5) 159 | 160 | 161 | print >> sys.stderr,self.label1.shape 162 | print >> sys.stderr,self.label2.shape 163 | 164 | def forward(self, bottom, top): 165 | # 
assign output 166 | top[0].data[...] = np.concatenate((self.img1[np.newaxis, ...], self.img2[np.newaxis, ...]), axis = 0) 167 | top[1].data[...] = np.concatenate((self.label1[np.newaxis, ...], self.label2[np.newaxis, ...]), axis = 0) 168 | top[2].data[...] = self.label_sim 169 | top[3].data[...] = self.bbox 170 | top[4].data[...] = self.rois 171 | 172 | def backward(self, top, propagate_down, bottom): 173 | pass 174 | 175 | 176 | def load_image(self, idx): 177 | """ 178 | Load input image and preprocess for Caffe: 179 | - cast to float 180 | - switch channels RGB -> BGR 181 | - subtract mean 182 | - transpose to channel x height x width order 183 | """ 184 | print >> sys.stderr, 'loading {}'.format(idx) 185 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 186 | im = im.resize((self.W, self.H)) 187 | in_ = np.array(im, dtype=np.float32) 188 | in_ = in_[:,:,::-1] 189 | in_ -= self.mean 190 | # in_ = in_.transpose((2,0,1)) 191 | return in_ 192 | 193 | 194 | def load_label(self, idx): 195 | """ 196 | Load label image as 1 x height x width integer array of label indices. 197 | The leading singleton dimension is required by the loss. 198 | """ 199 | print >> sys.stderr, 'loading {}'.format(idx) 200 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 201 | im = im.resize((self.W, self.H)) 202 | if self.scale != 1: 203 | im = im.resize((np.int(self.W*self.scale), np.int(self.H*self.scale))) 204 | 205 | label = np.array(im, dtype=np.uint8) 206 | # label = label[np.newaxis, ...] 
207 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label)) 208 | return label 209 | 210 | 211 | 212 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip): 213 | img_W = np.int( self.W*(1.0 + scale) ) 214 | img_H = np.int( self.H*(1.0 + scale) ) 215 | 216 | print >> sys.stderr, 'loading {}'.format(idx) 217 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip) 218 | 219 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 220 | im = im.resize((img_W,img_H)) 221 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h)) 222 | im = im.rotate(rotation) 223 | if flip: 224 | im = im.transpose(Image.FLIP_LEFT_RIGHT) 225 | 226 | if scale>0: 227 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H) 228 | im = im.crop(box) 229 | else: 230 | im = im.resize((self.W, self.H)) 231 | 232 | in_ = np.array(im, dtype=np.float32) 233 | in_ = in_[:,:,::-1] 234 | in_ -= self.mean 235 | 236 | return in_ 237 | 238 | 239 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip): 240 | img_W = np.int( self.W*(1.0 + scale) ) 241 | img_H = np.int( self.H*(1.0 + scale) ) 242 | 243 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 244 | im = im.resize((img_W,img_H)) 245 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h)) 246 | im = im.rotate(rotation) 247 | if flip: 248 | im = im.transpose(Image.FLIP_LEFT_RIGHT) 249 | 250 | if scale>0: 251 | w_start = np.int((img_W - self.W)/2) # center crop, matching load_image_transform 252 | h_start = np.int((img_H - self.H)/2) 253 | box = (w_start, h_start, w_start+self.W, h_start+self.H) 254 | im = im.crop(box) 255 | else: 256 | im = im.resize((self.W, self.H)) 257 | 258 | if self.scale != 1: 259 | im = im.resize((np.int(self.W*self.scale), np.int(self.H*self.scale))) 260 | 261 | label = np.array(im, dtype=np.uint8) 
262 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label)) 263 | 264 | return label 265 | 266 | 267 | def get_bboxes(self, label, box_num): 268 | label = np.array(label, dtype=np.uint8) 269 | pos = np.where(label > 0 ) 270 | rois = np.zeros((box_num, 5)) 271 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 272 | for k in range(box_num): 273 | rois[k, 1:5] = bb 274 | return rois 275 | 276 | 277 | def get_bbox(self, label): 278 | label = np.array(label, dtype=np.uint8) 279 | pos = np.where(label > 0 ) 280 | if len(pos[0])<1024: 281 | print >> sys.stderr, 'Escape very small object (area < 1024).' 282 | bb = [0, 0, 0, 0] 283 | return bb 284 | else: 285 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 286 | print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0])) 287 | return bb 288 | 289 | 290 | def get_rois(self, label, box_num, fg_rate): 291 | label = np.array(label, dtype=np.uint8) 292 | pos = np.where(label > 0) 293 | W = label.shape[1] 294 | H = label.shape[0] 295 | print>>sys.stderr, label.shape 296 | rois = np.zeros((box_num, 5)) 297 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 298 | box = np.zeros_like(gtbb) 299 | bb = np.zeros((1,5)) 300 | k = 0 301 | while k < np.int(box_num*fg_rate): 302 | bb[0, 1] = random.randint(0,W) 303 | bb[0, 2] = random.randint(0,H) 304 | bb[0, 3] = random.randint(bb[0, 1],W) 305 | bb[0, 4] = random.randint(bb[0, 2],H) 306 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 307 | continue 308 | iou = self.func_iou(bb[0, 1:5], gtbb) 309 | # print>>sys.stderr, 'gt = {}; proposal = {}; IoU = {};'.format(gtbb, bb, iou) 310 | if iou > self.fg_thre: 311 | rois[k,...] 
= bb 312 | k = k + 1 313 | # print>>sys.stderr, 'box_num = {}; gt = {}; proposal = {}; IoU = {};'.format(k, gtbb, bb[0, 1:5], iou) 314 | 315 | while k < box_num: 316 | bb[0, 1] = random.randint(0,W) 317 | bb[0, 2] = random.randint(0,H) 318 | bb[0, 3] = random.randint(bb[0, 1],W) 319 | bb[0, 4] = random.randint(bb[0, 2],H) 320 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 321 | continue 322 | iou = self.func_iou(bb[0, 1:5], gtbb) 323 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou) 324 | if iou < 0.5: 325 | rois[k,...] = bb 326 | k = k + 1 327 | 328 | return rois 329 | 330 | 331 | 332 | def func_iou(self, bb, gtbb): 333 | iou = 0 334 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1 335 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1 336 | if iw>0 and ih>0: 337 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih 338 | iou = np.float32(iw*ih*1.0/ua) 339 | 340 | return iou 341 | 342 | -------------------------------------------------------------------------------- /python_layers/davis2017_ROIdata_layer.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | 3 | import numpy as np 4 | from PIL import Image 5 | from scipy.misc import imresize 6 | 7 | import sys 8 | 9 | import random 10 | 11 | from numpy import * 12 | import math 13 | from scipy.misc import imresize 14 | 15 | import time 16 | 17 | 18 | class Davis2017ROIDataLayer(caffe.Layer): 19 | 20 | def setup(self, bottom, top): 21 | params = eval(self.param_str) 22 | self.davis_dir = params['davis_dir'] 23 | self.split = params['split'] 24 | self.mean = np.array(params['mean']) 25 | self.random = params.get('randomize', True) 26 | self.seed = params.get('seed', None) 27 | self.scale = params.get('scale', 1) 28 | self.fg_rate = params.get('fg_rate', 0.5) 29 | self.augment = params.get('with_augmentation', True) 30 | self.fg_random = params.get('fg_random', False) 31 | 
self.aug_params= np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip) 32 | self.H = 480 33 | self.W = 854 34 | self.pool_w = params.get('pool_w', 7) 35 | self.pool_h = params.get('pool_h', 7) 36 | self.box_num = params.get('box_num', 100) 37 | self.fg_thre = params.get('fg_thre', 0.7) 38 | 39 | if self.augment: 40 | self.aug_num = np.int(self.aug_params[0]) 41 | self.max_scale = self.aug_params[1] 42 | self.max_rotate = self.aug_params[2] 43 | self.max_transW = self.aug_params[3] 44 | self.max_transH = self.aug_params[4] 45 | self.flip = (self.aug_params[5]>0) 46 | 47 | # tops 48 | if len(top) != 3: 49 | raise Exception("Need to define three tops: data, labels, weights") 50 | # data layers have no bottoms 51 | if len(bottom) != 0: 52 | raise Exception("Do not define a bottom.") 53 | 54 | # load indices for images and labels 55 | split_f = '{}/ImageSets/2017/{}.txt'.format(self.davis_dir, 56 | self.split) 57 | self.indices = open(split_f, 'r').read().splitlines() 58 | self.idx1 = -1 # we pick idx in reshape 59 | self.idx2 = -1 60 | 61 | # make eval deterministic 62 | if 'train' not in self.split: 63 | self.random = False 64 | 65 | # randomization: seed and pick 66 | if self.random: 67 | random.seed(self.seed) 68 | self.idx = random.randint(0, len(self.indices)-1) 69 | 70 | 71 | def reshape(self, bottom, top): 72 | while True: 73 | # pick next input 74 | if self.random: 75 | self.idx1 = random.randint(0, len(self.indices)-1) 76 | else: 77 | self.idx1 += 1 78 | if self.idx1 == len(self.indices): 79 | self.idx1 = 0 80 | 81 | idx1 = self.idx1 82 | 83 | #get clip name 84 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2] 85 | 86 | if self.augment == False or random.randint(0, self.aug_num) == 0: 87 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0]) 88 | self.label1 = self.load_label(self.indices[idx1].split(' ')[1]) 89 | # self.img1 = self.img1.resize((self.H, self.W)) 90 | # self.label1 = imresize(self.label1, 
size=(self.H, self.W), interp="nearest") 91 | else: 92 | scale = (random.random()*2-1) * self.max_scale 93 | rotation = (random.random()*2-1) * self.max_rotate 94 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 95 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 96 | if self.flip: 97 | flip = (random.randint(0,1) > 0) 98 | else: 99 | flip = False 100 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 101 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 102 | 103 | 104 | # if self.scale != 1: 105 | # self.img1 = self.img1.resize((np.int(self.H*self.scale), np.int(self.W*self.scale))) 106 | # print >> sys.stderr, 'SCALE {}'.format(self.scale) 107 | 108 | if np.max(self.label1) == 0: 109 | continue 110 | 111 | 112 | if self.fg_random: 113 | self.rois, obj_ids = self.get_rois(self.label1, self.box_num, 1) 114 | else: 115 | self.rois, obj_ids = self.get_bboxes(self.label1, self.box_num) 116 | 117 | if np.sum(self.rois) == 0: 118 | continue 119 | 120 | self.img_rois = self.get_img_rois(self.img1, self.rois) 121 | self.lab_rois = self.get_lab_rois(self.label1, self.rois, obj_ids) 122 | self.weights = self.calculate_weight_rois(self.lab_rois) 123 | 124 | break 125 | 126 | # reshape tops 127 | top[0].reshape(self.box_num, 3, self.pool_h, self.pool_w) 128 | top[1].reshape(self.box_num, self.pool_h, self.pool_w) 129 | top[2].reshape(self.box_num, self.pool_h, self.pool_w) 130 | 131 | 132 | 133 | def forward(self, bottom, top): 134 | # assign output 135 | top[0].data[...] = self.img_rois 136 | top[1].data[...] = self.lab_rois 137 | top[2].data[...] 
= self.weights

    def backward(self, top, propagate_down, bottom):
        pass


    def load_image(self, idx):
        """
        Load the input image and resize it to W x H.
        A PIL image is returned; the Caffe preprocessing
        (cast to float, RGB -> BGR, mean subtraction, transpose to
        channel x height x width order) is applied per ROI in get_img_rois().
        """
        print >> sys.stderr, 'loading {}'.format(idx)
        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        im = im.resize((self.W, self.H))
        return im


    def load_label(self, idx):
        """
        Load label image as a height x width uint8 array of label indices.
        """
        print >> sys.stderr, 'loading {}'.format(idx)
        im = Image.open('{}/{}'.format(self.davis_dir, idx))

        label = np.array(im, dtype=np.uint8)
        print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
        return label



    def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
        img_W = np.int( self.W*(1.0 + scale) )
        img_H = np.int( self.H*(1.0 + scale) )

        print >> sys.stderr, 'loading {}'.format(idx)
        print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)

        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
        im = im.rotate(rotation)
        if flip:
            im = im.transpose(Image.FLIP_LEFT_RIGHT)


        return im


    def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
        img_W = np.int( self.W*(1.0 + scale) )
        img_H = np.int( self.H*(1.0 + scale) )

        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
        im = im.rotate(rotation)
        if flip:
            im = im.transpose(Image.FLIP_LEFT_RIGHT)


        label = np.array(im, dtype=np.uint8)
        print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))

        return label


    def get_bboxes(self, label, box_num):
        label = np.array(label, dtype=np.uint8)
        objs = np.unique(label)

        if len(objs) < 2:
            # only background present; the caller unpacks two values
            bb = [0, 0, 0, 0, 0]
            return bb, bb

        # drop the background id (first unique value)
        objs = objs[1:len(objs)]
        num_obj = len(objs)

        # number of copies of each ground-truth box; range() needs an int
        step = np.int(np.ceil(box_num*1.0/np.float32(num_obj)))
        rois = np.zeros((box_num, 5))
        obj_ids = np.zeros((box_num, 1))
        nn = 0
        area = 0
        for obj_id in objs:
            pos = np.where(label == obj_id)
            area = np.max([area, len(pos[0])])
            bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
            for k in range(step):
                # stop writing once all box_num slots are filled
                if nn == box_num:
                    break
                rois[nn, 1:5] = bb
                obj_ids[nn] = obj_id
                nn = nn + 1

        if area < 40*40:
            bb = [0, 0, 0, 0, 0]
            print >> sys.stderr,'All objects are too small.'
            return bb, bb


        return rois, obj_ids


    def get_bbox(self, label, obj_id):
        label = np.array(label, dtype=np.uint8)
        pos = np.where(label == obj_id )
        if len(pos[0])< 40*40:
            print >> sys.stderr, 'Skip very small object (area < 1600).'
246 | bb = [0, 0, 0, 0] 247 | return bb 248 | else: 249 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 250 | print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0])) 251 | return bb 252 | 253 | 254 | def get_rois(self, label, box_num, fg_rate): 255 | label = np.array(label, dtype=np.uint8) 256 | objs = np.unique(label) 257 | 258 | start_time = time.clock() 259 | 260 | if len(objs) < 2: 261 | bb = [0, 0, 0, 0, 0] 262 | return bb, bb 263 | 264 | area = 0 265 | objs = objs[1:len(objs)] 266 | for obj_id in objs: 267 | pos = np.where(label == obj_id) 268 | area = np.max([area, len(pos[0])]) 269 | 270 | if area < 40*40: 271 | bb = [0, 0, 0, 0, 0] 272 | print >> sys.stderr,'All objects are too small.' 273 | return bb, bb 274 | 275 | 276 | num_obj = len(objs) 277 | rois = np.zeros((box_num, 5)) 278 | obj_ids = np.zeros((box_num,1)) 279 | k = 0 280 | gts = np.zeros([num_obj, 4]) 281 | for obj_id in objs: 282 | pos = np.where(label == obj_id) 283 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 284 | gts[k,...]=gtbb 285 | k = k + 1 286 | 287 | W = label.shape[1] 288 | H = label.shape[0] 289 | print>>sys.stderr, label.shape 290 | 291 | box = np.zeros_like(gtbb) 292 | bb = np.zeros((1,5)) 293 | k = 0 294 | while k < np.int(box_num*fg_rate): 295 | bb[0, 1] = random.randint(0,W) 296 | bb[0, 2] = random.randint(0,H) 297 | bb[0, 3] = random.randint(bb[0, 1],W) 298 | bb[0, 4] = random.randint(bb[0, 2],H) 299 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 300 | continue 301 | 302 | flag_select = False 303 | flag_break = False 304 | for kk in range(num_obj): 305 | iou = self.func_iou(bb[0, 1:5], gts[kk,...]) 306 | if iou > self.fg_thre and flag_select == False: 307 | inb = 0 308 | for nn in range(num_obj): 309 | if nn == kk: 310 | continue 311 | else: 312 | inb = np.max([inb, self.func_inb(gts[nn,...], bb[0,1:5])]) 313 | if inb < 0.7: 314 | flag_select = True 315 | IoU = iou 316 | idx = objs[kk] 317 | continue 318 | 
319 | if iou > self.fg_thre and flag_select == True: 320 | flag_select = False 321 | break 322 | 323 | 324 | if flag_select: 325 | rois[k,...] = bb 326 | obj_ids[k] = idx 327 | # print>>sys.stderr,'obj_id {}: proposal = {}, iou = {}'.format(idx,bb,IoU) 328 | k = k + 1 329 | 330 | duration_time = time.clock() - start_time 331 | if duration_time > 60: 332 | print >> sys.stderr,'Objects too close, can not get enough proposals within {} s'.format(duration_time) 333 | flag_break = True 334 | break 335 | 336 | 337 | if flag_break: 338 | bb = [0, 0, 0, 0, 0] 339 | return bb, bb 340 | 341 | while k < box_num: 342 | bb[0, 1] = random.randint(0,W) 343 | bb[0, 2] = random.randint(0,H) 344 | bb[0, 3] = random.randint(bb[0, 1],W) 345 | bb[0, 4] = random.randint(bb[0, 2],H) 346 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 347 | continue 348 | 349 | flag_select = True 350 | for kk in range(num_obj): 351 | iou = self.func_iou(bb[0, 1:5], gts[kk,...]) 352 | if iou > 0.5: 353 | flag_select = False 354 | break 355 | 356 | if flag_select: 357 | rois[k,...] = bb 358 | k = k + 1 359 | 360 | return rois, obj_ids 361 | 362 | 363 | 364 | def func_iou(self, bb, gtbb): 365 | iou = 0 366 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1 367 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1 368 | if iw>0 and ih>0: 369 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih 370 | iou = np.float32(iw*ih*1.0/ua) 371 | 372 | return iou 373 | 374 | def get_img_rois(self, img, rois): 375 | #img = Image.fromarray(img) 376 | img_rois = np.zeros((self.box_num, 3, self.pool_h, self.pool_w)) 377 | for i in range(self.box_num): 378 | bb = rois[i,1:5] 379 | ROI = img.crop(bb) 380 | ROI = ROI.resize((np.int(self.pool_w), np.int(self.pool_h))) 381 | # print>>sys.stderr, ROI.shape 382 | in_ = np.array(ROI, dtype=np.float32) 383 | in_ = in_[:,:,::-1] 384 | in_ -= self.mean 385 | ROI = in_.transpose((2,0,1)) 386 | img_rois[i,...] 
= ROI 387 | 388 | return img_rois 389 | 390 | 391 | def get_lab_rois(self, lab, rois, obj_ids): 392 | lab_rois = np.zeros((self.box_num, self.pool_h, self.pool_w)) 393 | for i in range(self.box_num): 394 | bb = rois[i,1:5] 395 | label= np.uint8(lab == obj_ids[i]) 396 | label= Image.fromarray(label) 397 | ROI = label.crop(bb) 398 | # print >> sys.stderr, 'ROI_lab = {}'.format(np.unique(ROI)) 399 | ROI = ROI.resize((np.int(self.pool_w), np.int(self.pool_h))) 400 | ROI = np.array(ROI, dtype=np.uint8) 401 | ROI = np.uint8(ROI>0.5) 402 | lab_rois[i,...] = ROI 403 | 404 | # print >> sys.stderr, 'ROI_lab = {}'.format(np.unique(lab_rois)) 405 | return lab_rois 406 | 407 | 408 | def calculate_weight_rois(self, labels): 409 | weights = np.ones((self.box_num, self.pool_h, self.pool_w)) 410 | for i in range(self.box_num): 411 | label = labels[i,...] 412 | pos = np.where(label==1) 413 | neg = np.where(label==0) 414 | weight_pos = len(pos[0])*1.0/(len(pos[0])+len(neg[0])) 415 | for k in range(len(pos[0])): 416 | weights[i, pos[0][k],pos[1][k]] = 1 - weight_pos 417 | # print >> sys.stderr, 'pos_num = {}, neg_num = {}, weight_pos = {}'.format(len(pos[0]), len(neg[0]), 1 - weight_pos) 418 | 419 | return weights 420 | 421 | def func_inb(self, bb, gtbb): 422 | # iou/area(bb) 423 | iou = 0 424 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1 425 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1 426 | if iw>0 and ih>0: 427 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) 428 | iou = np.float32(iw*ih*1.0/ua) 429 | 430 | return iou 431 | 432 | -------------------------------------------------------------------------------- /python_layers/davis_data_layer.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | 3 | import numpy as np 4 | from PIL import Image 5 | 6 | import cv2 7 | from scipy.misc import imresize 8 | from scipy.misc import imrotate 9 | 10 | import sys 11 | 12 | import random 13 | 14 | class DAVISDataLayer(caffe.Layer): 15 | """ 16 
| Load (input image, label image) pairs from PASCAL VOC 17 | one-at-a-time while reshaping the net to preserve dimensions. 18 | 19 | Use this to feed data to a fully convolutional network. 20 | """ 21 | 22 | def setup(self, bottom, top): 23 | # config 24 | params = eval(self.param_str) 25 | self.davis_dir = params['davis_dir'] 26 | self.split = params['split'] 27 | self.mean = np.array(params['mean']) 28 | self.random = params.get('randomize', True) 29 | self.seed = params.get('seed', None) 30 | self.scale = params.get('scale', 1) 31 | self.augment = params.get('with_augmentation', True) 32 | self.aug_params = np.array(params['aug_params']) #( aug_num, max_scale, max_rotate, max_translation, flip) 33 | self.H = 480 34 | self.W = 854 35 | 36 | 37 | # two tops: data and label 38 | if len(top) != 2: 39 | raise Exception("Need to define two tops: data label") 40 | # data layers have no bottoms 41 | if len(bottom) != 0: 42 | raise Exception("Do not define a bottom.") 43 | 44 | # load indices for images and labels 45 | split_f = '{}/ImageSets/2017/{}.txt'.format(self.davis_dir, 46 | self.split) 47 | self.indices = open(split_f, 'r').read().splitlines() 48 | self.idx = -1 # we pick idx in reshape 49 | 50 | if self.augment: 51 | self.aug_num = np.int(self.aug_params[0]) 52 | self.max_scale = self.aug_params[1] 53 | self.max_rotate = self.aug_params[2] 54 | self.max_transW = self.aug_params[3] 55 | self.max_transH = self.aug_params[4] 56 | self.flip = (self.aug_params[5]>0) 57 | 58 | 59 | # make eval deterministic 60 | if 'train' not in self.split: 61 | self.random = False 62 | 63 | # randomization: seed and pick 64 | if self.random: 65 | random.seed(self.seed) 66 | self.idx = random.randint(0, len(self.indices)-1) 67 | 68 | 69 | def reshape(self, bottom, top): 70 | 71 | while True: 72 | # pick next input 73 | if self.random: 74 | self.idx = random.randint(0, len(self.indices)-1) 75 | else: 76 | self.idx += 1 77 | if self.idx == len(self.indices): 78 | self.idx = 0 79 | 80 
| 81 | if self.idx == (len(self.indices) - 1): 82 | continue 83 | 84 | idx = self.idx 85 | 86 | if self.augment == False or random.randint(0, self.aug_num) == 0: 87 | self.img = self.load_image(self.indices[idx].split(' ')[0]) 88 | self.label = self.load_label(self.indices[idx].split(' ')[1]) 89 | self.img = imresize(self.img, size=(self.H, self.W), interp="bilinear") 90 | self.label = imresize(self.label, size=(self.H, self.W), interp="nearest") 91 | else: 92 | scale = (random.random()*2-1) * self.max_scale 93 | rotation = (random.random()*2-1) * self.max_rotate 94 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 95 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 96 | if self.flip: 97 | flip = (random.randint(0,1) > 0) 98 | else: 99 | flip = False 100 | self.img = self.load_image_transform(self.indices[idx].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 101 | self.label = self.load_label_transform(self.indices[idx].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 102 | 103 | 104 | if self.scale != 1: 105 | self.img = imresize(self.img, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear") 106 | self.label = imresize(self.label, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest") 107 | 108 | 109 | self.img = self.img.transpose((2,0,1)) 110 | self.label = self.label[np.newaxis, ...] 111 | # print(self.img.shape) 112 | break 113 | 114 | # reshape tops to fit (leading 2 is for batch dimension) 115 | top[0].reshape(1, *self.img.shape) 116 | top[1].reshape(1, *self.label.shape) 117 | 118 | def forward(self, bottom, top): 119 | top[0].data[...] = self.img 120 | top[1].data[...] 
= self.label

    def backward(self, top, propagate_down, bottom):
        pass


    def load_image(self, idx):
        """
        Load input image and preprocess for Caffe:
        - cast to float
        - switch channels RGB -> BGR
        - subtract mean
        The transpose to channel x height x width order is done in reshape().
        """
        print >> sys.stderr, 'loading Original {}'.format(idx)
        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        in_ = np.array(im, dtype=np.float32)
        in_ = in_[:,:,::-1]
        in_ -= self.mean
        return in_


    def load_label(self, idx):
        """
        Load label image as a height x width integer array of label indices.
        The leading singleton dimension required by the loss is added in reshape().
        """
        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        label = np.array(im, dtype=np.uint8)
        print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))

        #print(label)
        return label

    def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
        img_W = np.int( self.W*(1.0 + scale) )
        img_H = np.int( self.H*(1.0 + scale) )

        print >> sys.stderr, 'loading {}'.format(idx)
        print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)

        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        im = im.resize((img_W,img_H))
        im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
        im = im.rotate(rotation)
        if flip:
            im = im.transpose(Image.FLIP_LEFT_RIGHT)

        if scale>0:
            # enlarged frame: cut the central W x H region back out
            box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
            im = im.crop(box)
        else:
            im = im.resize((self.W, self.H))


        in_ = np.array(im, dtype=np.float32)
        in_ = in_[:,:,::-1]
        in_ -= self.mean

        return in_
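The crop arithmetic in load_image_transform above can be checked in isolation. The following is a minimal sketch, written for Python 3 and assuming only the 854 x 480 frame size hard-coded in the layer; the helper names `scaled_size` and `center_crop_box` are illustrative and do not exist in the repository.

```python
def scaled_size(width, height, scale):
    """Frame size after scaling by (1 + scale), truncated to int as the layer does."""
    return int(width * (1.0 + scale)), int(height * (1.0 + scale))


def center_crop_box(src_w, src_h, out_w, out_h):
    """(left, upper, right, lower) box of size out_w x out_h centred in the source frame."""
    left = int((src_w - out_w) / 2)
    upper = int((src_h - out_h) / 2)
    return (left, upper, left + out_w, upper + out_h)


# Frame size used by the layer (self.W, self.H).
W, H = 854, 480

# Hypothetical augmentation draw: enlarge the frame by 25 percent.
img_w, img_h = scaled_size(W, H, 0.25)
box = center_crop_box(img_w, img_h, W, H)
print(img_w, img_h, box)
```

With a scale of 0.25 the frame is enlarged to 1067 x 600 and the box (106, 60, 960, 540) cuts the central 854 x 480 region back out. Cropping the label with the same box keeps it aligned with the image.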


    def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
        img_W = np.int( self.W*(1.0 + scale) )
        img_H = np.int( self.H*(1.0 + scale) )


        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        im = im.resize((img_W,img_H))
        im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
        im = im.rotate(rotation)
        if flip:
            im = im.transpose(Image.FLIP_LEFT_RIGHT)

        if scale>0:
            # use the same center crop as load_image_transform so the label
            # stays aligned with the image (a random crop here would not match)
            box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
            im = im.crop(box)
        else:
            im = im.resize((self.W, self.H))

        label = np.array(im, dtype=np.uint8)
        print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))


        return label




--------------------------------------------------------------------------------
/python_layers/davis_siamese_data_layer.py:
--------------------------------------------------------------------------------
import caffe

import numpy as np
from PIL import Image
from scipy.misc import imresize

import sys

import random

class DavisSiameseDataLayer(caffe.Layer):

    def setup(self, bottom, top):
        params = eval(self.param_str)
        self.davis_dir = params['davis_dir']
        self.split = params['split']
        self.mean = np.array(params['mean'])
        self.random = params.get('randomize', True)
        self.seed = params.get('seed', None)
        self.scale = params.get('scale', 1)
        self.fg_rate = params.get('fg_rate', 0.5)
        self.augment = params.get('with_augmentation', True)
        self.aug_params= np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip)
        self.H = 480
        self.W = 854
        self.box_num = params.get('box_num', 100)

        if self.augment:
            self.aug_num = 
np.int(self.aug_params[0]) 30 | self.max_scale = self.aug_params[1] 31 | self.max_rotate = self.aug_params[2] 32 | self.max_transW = self.aug_params[3] 33 | self.max_transH = self.aug_params[4] 34 | self.flip = (self.aug_params[5]>0) 35 | 36 | # tops 37 | if len(top) != 5: 38 | raise Exception("Need to define five tops: data, labels, label_sim, bbox, rois") 39 | # data layers have no bottoms 40 | if len(bottom) != 0: 41 | raise Exception("Do not define a bottom.") 42 | 43 | # load indices for images and labels 44 | split_f = '{}/ImageSets/480p/{}.txt'.format(self.davis_dir, 45 | self.split) 46 | self.indices = open(split_f, 'r').read().splitlines() 47 | self.idx1 = -1 # we pick idx in reshape 48 | self.idx2 = -1 49 | 50 | # make eval deterministic 51 | if 'train' not in self.split: 52 | self.random = False 53 | 54 | # randomization: seed and pick 55 | if self.random: 56 | random.seed(self.seed) 57 | self.idx = random.randint(0, len(self.indices)-1) 58 | 59 | 60 | def reshape(self, bottom, top): 61 | while True: 62 | # pick next input 63 | if self.random: 64 | self.idx1 = random.randint(0, len(self.indices)-1) 65 | self.idx2 = random.randint(0, len(self.indices)-1) 66 | else: 67 | self.idx1 += 1 68 | self.idx2 += 1 69 | if self.idx1 == len(self.indices): 70 | self.idx1 = 0 71 | if self.idx2 == len(self.indices): 72 | self.idx2 = 0 73 | 74 | idx1 = self.idx1 75 | idx2 = self.idx2 76 | 77 | #get clip name 78 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2] 79 | clip2 = self.indices[idx2].split(' ')[0].split('/')[-2] 80 | 81 | if clip1 != clip2: 82 | continue 83 | 84 | 85 | if self.augment == False or random.randint(0, self.aug_num) == 0: 86 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0]) 87 | self.img2 = self.load_image(self.indices[idx2].split(' ')[0]) 88 | self.label1 = self.load_label(self.indices[idx1].split(' ')[1]) 89 | self.label2 = self.load_label(self.indices[idx2].split(' ')[1]) 90 | self.img1 = imresize(self.img1, size=(self.H, 
self.W), interp="bilinear") 91 | self.label1 = imresize(self.label1, size=(self.H, self.W), interp="nearest") 92 | self.img2 = imresize(self.img2, size=(self.H, self.W), interp="bilinear") 93 | self.label2 = imresize(self.label2, size=(self.H, self.W), interp="nearest") 94 | else: 95 | scale = (random.random()*2-1) * self.max_scale 96 | rotation = (random.random()*2-1) * self.max_rotate 97 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 98 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 99 | if self.flip: 100 | flip = (random.randint(0,1) > 0) 101 | else: 102 | flip = False 103 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 104 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 105 | 106 | scale = (random.random()*2-1) * self.max_scale 107 | rotation = (random.random()*2-1) * self.max_rotate 108 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 109 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 110 | if self.flip: 111 | flip = (random.randint(0,1) > 0) 112 | else: 113 | flip = False 114 | self.img2 = self.load_image_transform(self.indices[idx2].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 115 | self.label2 = self.load_label_transform(self.indices[idx2].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 116 | 117 | 118 | if self.scale != 1: 119 | self.img1 = imresize(self.img1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear") 120 | self.label1 = imresize(self.label1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest") 121 | self.img2 = imresize(self.img2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear") 122 | self.label2 = imresize(self.label2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest") 123 | print >> 
sys.stderr, 'SCALE {}'.format(self.scale) 124 | 125 | # randomly select an object 126 | num_obj = np.max(self.label1) 127 | if num_obj == 0 or np.max(self.label2) == 0: 128 | continue 129 | 130 | obj_id = random.randint(1, num_obj) 131 | if np.sum(self.get_bbox(self.label1, obj_id)) == 0 or np.sum(self.get_bbox(self.label2, obj_id)) == 0: 132 | # if objects in either image are too small 133 | continue 134 | 135 | # self.bbox = self.get_rois(self.label1, obj_id, self.box_num, 1) 136 | self.bbox = self.get_bboxes(self.label1, obj_id, self.box_num) 137 | self.rois = self.get_rois(self.label2, obj_id, self.box_num, self.fg_rate) 138 | self.label_sim = np.zeros([self.box_num, 1], dtype = np.uint8) 139 | for i in range(np.int(self.box_num/2)): 140 | self.label_sim[i] = 1 141 | # print >> sys.stderr,'label = {}'.format(self.label_sim) 142 | self.img1 = self.img1.transpose((2,0,1)) 143 | self.img2 = self.img2.transpose((2,0,1)) 144 | 145 | self.label1 = np.uint8(self.label1==obj_id) 146 | self.label2 = np.uint8(self.label2==obj_id) 147 | break 148 | 149 | # reshape tops 150 | top[0].reshape(2, *self.img1.shape) 151 | top[1].reshape(2, *self.label1.shape) 152 | top[2].reshape(self.box_num, 1) 153 | top[3].reshape(self.box_num, 5) 154 | top[4].reshape(self.box_num, 5) 155 | 156 | 157 | def forward(self, bottom, top): 158 | # assign output 159 | top[0].data[...] = np.concatenate((self.img1[np.newaxis, ...], self.img2[np.newaxis, ...]), axis = 0) 160 | top[1].data[...] = np.concatenate((self.label1[np.newaxis, ...], self.label2[np.newaxis, ...]), axis = 0) 161 | top[2].data[...] = self.label_sim 162 | top[3].data[...] = self.bbox 163 | top[4].data[...] 
= self.rois 164 | 165 | def backward(self, top, propagate_down, bottom): 166 | pass 167 | 168 | 169 | def load_image(self, idx): 170 | """ 171 | Load input image and preprocess for Caffe: 172 | - cast to float 173 | - switch channels RGB -> BGR 174 | - subtract mean 175 | - transpose to channel x height x width order 176 | """ 177 | print >> sys.stderr, 'loading {}'.format(idx) 178 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 179 | im = im.resize((self.W, self.H)) 180 | in_ = np.array(im, dtype=np.float32) 181 | in_ = in_[:,:,::-1] 182 | in_ -= self.mean 183 | # in_ = in_.transpose((2,0,1)) 184 | return in_ 185 | 186 | 187 | def load_label(self, idx): 188 | """ 189 | Load label image as 1 x height x width integer array of label indices. 190 | The leading singleton dimension is required by the loss. 191 | """ 192 | print >> sys.stderr, 'loading {}'.format(idx) 193 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 194 | im = im.resize((self.W, self.H)) 195 | label = np.array(im, dtype=np.uint8) 196 | # label = label[np.newaxis, ...] 
        return label



    def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
        img_W = np.int( self.W*(1.0 + scale) )
        img_H = np.int( self.H*(1.0 + scale) )

        print >> sys.stderr, 'loading {}'.format(idx)
        print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)

        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        im = im.resize((img_W,img_H))
        im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
        im = im.rotate(rotation)
        if flip:
            im = im.transpose(Image.FLIP_LEFT_RIGHT)

        if scale>0:
            # enlarged frame: cut the central W x H region back out
            box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
            im = im.crop(box)
        else:
            im = im.resize((self.W, self.H))

        in_ = np.array(im, dtype=np.float32)
        in_ = in_[:,:,::-1]
        in_ -= self.mean

        return in_


    def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
        img_W = np.int( self.W*(1.0 + scale) )
        img_H = np.int( self.H*(1.0 + scale) )

        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        im = im.resize((img_W,img_H))
        im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
        im = im.rotate(rotation)
        if flip:
            im = im.transpose(Image.FLIP_LEFT_RIGHT)

        if scale>0:
            # use the same center crop as load_image_transform so the label
            # stays aligned with the image (a random crop here would not match)
            box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
            im = im.crop(box)
        else:
            im = im.resize((self.W, self.H))

        label = np.array(im, dtype=np.uint8)
        print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))

        return label


    def get_bboxes(self, label, obj_id, box_num):
        label = 
np.array(label, dtype=np.uint8) 255 | pos = np.where(label == obj_id) 256 | rois = np.zeros((box_num, 5)) 257 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 258 | for k in range(box_num): 259 | rois[k, 1:5] = bb 260 | return rois 261 | 262 | 263 | def get_bbox(self, label, obj_id): 264 | label = np.array(label, dtype=np.uint8) 265 | pos = np.where(label == obj_id) 266 | if len(pos[0])<1024: 267 | print >> sys.stderr, 'escape very small object {}'.format(obj_id) 268 | bb = [0, 0, 0, 0] 269 | return bb 270 | else: 271 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 272 | print>>sys.stderr, 'obj = {}; gt = {}; area = {};'.format(obj_id, bb, len(pos[0])) 273 | return bb 274 | 275 | 276 | def get_rois(self, label, obj_id, box_num, fg_rate): 277 | label = np.array(label, dtype=np.uint8) 278 | pos = np.where(label == obj_id) 279 | W = label.shape[1] 280 | H = label.shape[0] 281 | print>>sys.stderr, label.shape 282 | rois = np.zeros((box_num, 5)) 283 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 284 | box = np.zeros_like(gtbb) 285 | bb = np.zeros((1,5)) 286 | k = 0 287 | while k < np.int(box_num*fg_rate): 288 | bb[0, 1] = random.randint(0,W) 289 | bb[0, 2] = random.randint(0,H) 290 | bb[0, 3] = random.randint(bb[0, 1],W) 291 | bb[0, 4] = random.randint(bb[0, 2],H) 292 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 293 | continue 294 | iou = self.func_iou(bb[0, 1:5], gtbb) 295 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou) 296 | if iou > 0.7: 297 | rois[k,...] 
= bb 298 | k = k + 1 299 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb[0, 1:5], iou) 300 | 301 | while k < box_num: 302 | bb[0, 1] = random.randint(0,W) 303 | bb[0, 2] = random.randint(0,H) 304 | bb[0, 3] = random.randint(bb[0, 1],W) 305 | bb[0, 4] = random.randint(bb[0, 2],H) 306 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 307 | continue 308 | iou = self.func_iou(bb[0, 1:5], gtbb) 309 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou) 310 | if iou < 0.5: 311 | rois[k,...] = bb 312 | k = k + 1 313 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb[0, 1:5], iou) 314 | 315 | return rois 316 | 317 | 318 | 319 | def func_iou(self, bb, gtbb): 320 | iou = 0 321 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1 322 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1 323 | if iw>0 and ih>0: 324 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih 325 | iou = np.float32(iw*ih*1.0/ua) 326 | 327 | return iou 328 | 329 | -------------------------------------------------------------------------------- /python_layers/davis_siamese_data_layer3.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | 3 | import numpy as np 4 | from PIL import Image 5 | from scipy.misc import imresize 6 | 7 | import sys 8 | 9 | import random 10 | 11 | class DavisSiameseDataLayer3(caffe.Layer): 12 | 13 | def setup(self, bottom, top): 14 | params = eval(self.param_str) 15 | self.davis_dir = params['davis_dir'] 16 | self.split = params['split'] 17 | self.mean = np.array(params['mean']) 18 | self.random = params.get('randomize', True) 19 | self.seed = params.get('seed', None) 20 | self.scale = params.get('scale', 1) 21 | self.fg_rate = params.get('fg_rate', 0.5) 22 | self.augment = params.get('with_augmentation', True) 23 | self.aug_params= np.array(params['aug_params']) 
#( aug_num, max_scale, max_rotate, max_translation, flip) 24 | self.H = 480 25 | self.W = 854 26 | self.box_num = params.get('box_num', 100) 27 | 28 | if self.augment: 29 | self.aug_num = np.int(self.aug_params[0]) 30 | self.max_scale = self.aug_params[1] 31 | self.max_rotate = self.aug_params[2] 32 | self.max_transW = self.aug_params[3] 33 | self.max_transH = self.aug_params[4] 34 | self.flip = (self.aug_params[5]>0) 35 | 36 | # tops 37 | if len(top) != 5: 38 | raise Exception("Need to define five tops: data, labels, label_sim, bbox, rois") 39 | # data layers have no bottoms 40 | if len(bottom) != 0: 41 | raise Exception("Do not define a bottom.") 42 | 43 | # load indices for images and labels 44 | split_f = '{}/ImageSets/480p/{}.txt'.format(self.davis_dir, 45 | self.split) 46 | self.indices = open(split_f, 'r').read().splitlines() 47 | self.idx1 = -1 # we pick idx in reshape 48 | self.idx2 = -1 49 | 50 | # make eval deterministic 51 | if 'train' not in self.split: 52 | self.random = False 53 | 54 | # randomization: seed and pick 55 | if self.random: 56 | random.seed(self.seed) 57 | self.idx = random.randint(0, len(self.indices)-1) 58 | 59 | 60 | def reshape(self, bottom, top): 61 | while True: 62 | # pick next input 63 | if self.random: 64 | self.idx1 = random.randint(0, len(self.indices)-1) 65 | self.idx2 = random.randint(0, len(self.indices)-1) 66 | else: 67 | self.idx1 += 1 68 | self.idx2 += 1 69 | if self.idx1 == len(self.indices): 70 | self.idx1 = 0 71 | if self.idx2 == len(self.indices): 72 | self.idx2 = 0 73 | 74 | idx1 = self.idx1 75 | idx2 = self.idx2 76 | 77 | #get clip name 78 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2] 79 | clip2 = self.indices[idx2].split(' ')[0].split('/')[-2] 80 | 81 | if clip1 != clip2: 82 | continue 83 | 84 | 85 | if self.augment == False or random.randint(0, self.aug_num) == 0: 86 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0]) 87 | self.img2 = self.load_image(self.indices[idx2].split(' ')[0]) 88 | 
self.label1 = self.load_label(self.indices[idx1].split(' ')[1]) 89 | self.label2 = self.load_label(self.indices[idx2].split(' ')[1]) 90 | self.img1 = imresize(self.img1, size=(self.H, self.W), interp="bilinear") 91 | self.label1 = imresize(self.label1, size=(self.H, self.W), interp="nearest") 92 | self.img2 = imresize(self.img2, size=(self.H, self.W), interp="bilinear") 93 | self.label2 = imresize(self.label2, size=(self.H, self.W), interp="nearest") 94 | else: 95 | scale = (random.random()*2-1) * self.max_scale 96 | rotation = (random.random()*2-1) * self.max_rotate 97 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 98 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 99 | if self.flip: 100 | flip = (random.randint(0,1) > 0) 101 | else: 102 | flip = False 103 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 104 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 105 | 106 | scale = (random.random()*2-1) * self.max_scale 107 | rotation = (random.random()*2-1) * self.max_rotate 108 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W ) 109 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H ) 110 | if self.flip: 111 | flip = (random.randint(0,1) > 0) 112 | else: 113 | flip = False 114 | self.img2 = self.load_image_transform(self.indices[idx2].split(' ')[0], scale, rotation, trans_h, trans_w, flip) 115 | self.label2 = self.load_label_transform(self.indices[idx2].split(' ')[1], scale, rotation, trans_h, trans_w, flip) 116 | 117 | 118 | if self.scale != 1: 119 | self.img1 = imresize(self.img1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear") 120 | self.label1 = imresize(self.label1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest") 121 | self.img2 = imresize(self.img2, size=(np.int(self.H*self.scale), 
np.int(self.W*self.scale)), interp="bilinear") 122 | self.label2 = imresize(self.label2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest") 123 | print >> sys.stderr, 'SCALE {}'.format(self.scale) 124 | 125 | # randomly select an object 126 | num_obj = np.max(self.label1) 127 | if num_obj == 0 or np.max(self.label2) == 0: 128 | continue 129 | 130 | obj_id = random.randint(1, num_obj) 131 | if np.sum(self.get_bbox(self.label1, obj_id)) == 0 or np.sum(self.get_bbox(self.label2, obj_id)) == 0: 132 | # if objects in either image are too small 133 | continue 134 | 135 | self.bbox = self.get_rois(self.label1, obj_id, self.box_num, 1) 136 | # self.bbox = self.get_bboxes(self.label1, obj_id, self.box_num) 137 | self.rois = self.get_rois(self.label2, obj_id, self.box_num, self.fg_rate) 138 | self.label_sim = np.zeros([self.box_num, 1], dtype = np.uint8) 139 | for i in range(np.int(self.box_num/2)): 140 | self.label_sim[i] = 1 141 | # print >> sys.stderr,'label = {}'.format(self.label_sim) 142 | self.img1 = self.img1.transpose((2,0,1)) 143 | self.img2 = self.img2.transpose((2,0,1)) 144 | 145 | self.label1 = np.uint8(self.label1==obj_id) 146 | self.label2 = np.uint8(self.label2==obj_id) 147 | break 148 | 149 | # reshape tops 150 | top[0].reshape(2, *self.img1.shape) 151 | top[1].reshape(2, *self.label1.shape) 152 | top[2].reshape(self.box_num, 1) 153 | top[3].reshape(self.box_num, 5) 154 | top[4].reshape(self.box_num, 5) 155 | 156 | 157 | def forward(self, bottom, top): 158 | # assign output 159 | top[0].data[...] = np.concatenate((self.img1[np.newaxis, ...], self.img2[np.newaxis, ...]), axis = 0) 160 | top[1].data[...] = np.concatenate((self.label1[np.newaxis, ...], self.label2[np.newaxis, ...]), axis = 0) 161 | top[2].data[...] = self.label_sim 162 | top[3].data[...] = self.bbox 163 | top[4].data[...] 
= self.rois 164 | 165 | def backward(self, top, propagate_down, bottom): 166 | pass 167 | 168 | 169 | def load_image(self, idx): 170 | """ 171 | Load input image and preprocess for Caffe: 172 | - cast to float 173 | - switch channels RGB -> BGR 174 | - subtract mean 175 | - transpose to channel x height x width order 176 | """ 177 | print >> sys.stderr, 'loading {}'.format(idx) 178 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 179 | im = im.resize((self.W, self.H)) 180 | in_ = np.array(im, dtype=np.float32) 181 | in_ = in_[:,:,::-1] 182 | in_ -= self.mean 183 | # in_ = in_.transpose((2,0,1)) 184 | return in_ 185 | 186 | 187 | def load_label(self, idx): 188 | """ 189 | Load label image as 1 x height x width integer array of label indices. 190 | The leading singleton dimension is required by the loss. 191 | """ 192 | print >> sys.stderr, 'loading {}'.format(idx) 193 | im = Image.open('{}/{}'.format(self.davis_dir, idx)) 194 | im = im.resize((self.W, self.H)) 195 | label = np.array(im, dtype=np.uint8) 196 | # label = label[np.newaxis, ...] 
        return label



    def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
        img_W = np.int( self.W*(1.0 + scale) )
        img_H = np.int( self.H*(1.0 + scale) )

        print >> sys.stderr, 'loading {}'.format(idx)
        print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)

        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        im = im.resize((img_W,img_H))
        im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
        im = im.rotate(rotation)
        if flip:
            im = im.transpose(Image.FLIP_LEFT_RIGHT)

        if scale>0:
            # enlarged image: take the center crop at the target size
            box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
            im = im.crop(box)
        else:
            im = im.resize((self.W, self.H))

        in_ = np.array(im, dtype=np.float32)
        in_ = in_[:,:,::-1]
        in_ -= self.mean

        return in_


    def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
        img_W = np.int( self.W*(1.0 + scale) )
        img_H = np.int( self.H*(1.0 + scale) )

        im = Image.open('{}/{}'.format(self.davis_dir, idx))
        im = im.resize((img_W,img_H))
        im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
        im = im.rotate(rotation)
        if flip:
            im = im.transpose(Image.FLIP_LEFT_RIGHT)

        if scale>0:
            # use the same center crop as load_image_transform; a random crop here
            # would leave the label misaligned with its image
            box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
            im = im.crop(box)
        else:
            im = im.resize((self.W, self.H))

        label = np.array(im, dtype=np.uint8)
        print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))

        return label


    def get_bboxes(self, label, obj_id, box_num):
        label = 
np.array(label, dtype=np.uint8) 255 | pos = np.where(label == obj_id) 256 | rois = np.zeros((box_num, 5)) 257 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 258 | for k in range(box_num): 259 | rois[k, 1:5] = bb 260 | return rois 261 | 262 | 263 | def get_bbox(self, label, obj_id): 264 | label = np.array(label, dtype=np.uint8) 265 | pos = np.where(label == obj_id) 266 | if len(pos[0])<1024: 267 | print >> sys.stderr, 'escape very small object {}'.format(obj_id) 268 | bb = [0, 0, 0, 0] 269 | return bb 270 | else: 271 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 272 | print>>sys.stderr, 'obj = {}; gt = {}; area = {};'.format(obj_id, bb, len(pos[0])) 273 | return bb 274 | 275 | 276 | def get_rois(self, label, obj_id, box_num, fg_rate): 277 | label = np.array(label, dtype=np.uint8) 278 | pos = np.where(label == obj_id) 279 | W = label.shape[1] 280 | H = label.shape[0] 281 | print>>sys.stderr, label.shape 282 | rois = np.zeros((box_num, 5)) 283 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])] 284 | box = np.zeros_like(gtbb) 285 | bb = np.zeros((1,5)) 286 | k = 0 287 | while k < np.int(box_num*fg_rate): 288 | bb[0, 1] = random.randint(0,W) 289 | bb[0, 2] = random.randint(0,H) 290 | bb[0, 3] = random.randint(bb[0, 1],W) 291 | bb[0, 4] = random.randint(bb[0, 2],H) 292 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 293 | continue 294 | iou = self.func_iou(bb[0, 1:5], gtbb) 295 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou) 296 | if iou > 0.7: 297 | rois[k,...] 
= bb 298 | k = k + 1 299 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb[0, 1:5], iou) 300 | 301 | while k < box_num: 302 | bb[0, 1] = random.randint(0,W) 303 | bb[0, 2] = random.randint(0,H) 304 | bb[0, 3] = random.randint(bb[0, 1],W) 305 | bb[0, 4] = random.randint(bb[0, 2],H) 306 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4: 307 | continue 308 | iou = self.func_iou(bb[0, 1:5], gtbb) 309 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou) 310 | if iou < 0.5: 311 | rois[k,...] = bb 312 | k = k + 1 313 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb[0, 1:5], iou) 314 | 315 | return rois 316 | 317 | 318 | 319 | def func_iou(self, bb, gtbb): 320 | iou = 0 321 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1 322 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1 323 | if iw>0 and ih>0: 324 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih 325 | iou = np.float32(iw*ih*1.0/ua) 326 | 327 | return iou 328 | 329 | -------------------------------------------------------------------------------- /python_layers/davis_test_score_layer.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | 3 | import numpy as np 4 | from PIL import Image 5 | from scipy.misc import imresize 6 | 7 | import sys 8 | 9 | import random 10 | 11 | from scipy import ndimage 12 | 13 | 14 | 15 | class DavisSiameseTestScoreLayer(caffe.Layer): 16 | 17 | def setup(self, bottom, top): 18 | params = eval(self.param_str) 19 | self.davis_dir = params['davis_dir'] 20 | self.split = params['split'] 21 | self.mean = np.array(params['mean']) 22 | self.sc_params = np.array(params['aug_params']) # min_scale, max_scale, stride, flag_search_range (0: whole image, 1: search_range), search_range 23 | self.H = 480 24 | self.W = 854 25 | 26 | 27 | self.min_scale = self.sc_params[0] 28 | 
        self.max_scale = self.sc_params[1]
        self.stride = self.sc_params[2]
        self.flag_range = (self.sc_params[3]>0)
        self.search_range = self.sc_params[4]
        # anchors: 1*2, 2*1, 1*1

        # tops
        if len(top) != 3:
            raise Exception("Need to define three tops: scores, scales, anchors")
        # this layer reads three bottoms
        if len(bottom) != 3:
            raise Exception("Need three bottoms: gt1_box, feat1, feat2")

    def reshape(self, bottom, top):
        self.bbox = np.zeros_like(bottom[0].data, dtype = np.float32)
        self.feat1 = np.zeros_like(bottom[1].data, dtype = np.float32)
        self.feat2 = np.zeros_like(bottom[2].data, dtype = np.float32)
        # three anchors per scale; must be an integer because it is used as an array shape
        self.num_scoremaps = int(round(3.0*( (self.max_scale - self.min_scale)*1.0/self.stride + 1 )))
        self.scales = np.zeros([self.num_scoremaps, 1], dtype = np.float32)
        self.anchors = np.zeros([self.num_scoremaps, 2], dtype = np.float32)
        self.score_maps = np.zeros([self.num_scoremaps, bottom[1].shape[2], bottom[1].shape[3]], dtype = np.float32)
        # reshape tops
        top[0].reshape(self.num_scoremaps, bottom[1].shape[2], bottom[1].shape[3])
        top[1].reshape(self.num_scoremaps, 1)
        top[2].reshape(self.num_scoremaps, 2)


    def forward(self, bottom, top):
        self.bbox = bottom[0].data
        self.feat1 = bottom[1].data
        self.feat2 = bottom[2].data

        # drop the batch axis (assumed to be 1), then reorder C x H x W -> H x W x C
        self.feat1 = self.feat1.reshape([self.feat1.shape[1], self.feat1.shape[2], self.feat1.shape[3]])
        self.feat1 = self.feat1.transpose((1,2,0))
        feat_bbox = self.extract_feat_ROI(self.feat1, self.bbox) # get gt features

        # feat2 must be reshaped from its own 4-D shape; feat1 is already 3-D at this point
        self.feat2 = self.feat2.reshape([self.feat2.shape[1], self.feat2.shape[2], self.feat2.shape[3]])
        self.feat2 = self.feat2.transpose((1,2,0))

        anchor = [1, 1]
        for i in range(self.num_scoremaps // 3):
            scale = self.min_scale + i*self.stride
            feat_map = self.extract_feat_map(self.feat2, scale, anchor) # get scaled feature maps
            score_map = ndimage.convolve(feat_bbox, feat_map, mode='reflect', cval=0.0)
            self.score_maps[i,...] = score_map
            self.scales[i] = scale
            self.anchors[i,...] = anchor


        # assign output
        top[0].data[...] = self.score_maps
        top[1].data[...] = self.scales
        top[2].data[...] = self.anchors



    def backward(self, top, propagate_down, bottom):
        pass


    def extract_feat_ROI(self, feat, bbox):
        # round and cast to integers so the values can be used as slice indices
        bbox = np.round(bbox).astype(np.int32)
        feat_bbox = feat[bbox[0]:bbox[2],bbox[1]:bbox[3],...]

        return feat_bbox


    def extract_feat_map(self, feat, scale, anchor):
        # the body of this method is missing in the source
        raise NotImplementedError("extract_feat_map is not implemented")



--------------------------------------------------------------------------------
/python_layers/mask_data_layer.py:
--------------------------------------------------------------------------------
import caffe
import numpy as np
from numpy import *
import math
from scipy.misc import imresize
import sys
import random

from PIL import Image

class MaskDataLayer(caffe.Layer):

    def setup(self, bottom, top):
        # check input pair
        if len(bottom) != 2:
            raise Exception("Need two bottoms: label, rois")
        if len(top) != 2:
            raise Exception("Need two tops: mask_targets, weights")
        params = eval(self.param_str)
        self.mask_w = params.get('mask_w', 14)
        self.mask_h = params.get('mask_h', 14)


    def reshape(self, bottom, top):
        self.label = np.zeros_like(bottom[0].data, dtype=np.float32)
        self.rois = np.zeros_like(bottom[1].data, dtype=np.float32)
        self.mask_targets = np.zeros([self.rois.shape[0], self.mask_h, self.mask_w],dtype=np.uint8)
        self.weights = np.zeros([self.rois.shape[0], self.mask_h, self.mask_w],dtype=np.float32)
        top[0].reshape(self.rois.shape[0], self.mask_h, self.mask_w)
        top[1].reshape(self.rois.shape[0], self.mask_h, self.mask_w)

    def forward(self, bottom, top):
        self.label = bottom[0].data
        self.label = 
self.label.reshape(self.label.shape[1], self.label.shape[2]) 35 | # print >> sys.stderr, self.label 36 | self.rois = bottom[1].data 37 | box_num = self.rois.shape[0] 38 | label_map = self.label 39 | for i in range(box_num): 40 | label_map_new = self.label_in_box(label_map, self.rois[i,...]) 41 | label_map_new = imresize(label_map_new, size=(np.int(self.mask_h), np.int(self.mask_w)), interp="nearest", mode='F') 42 | label_map_new = np.array(label_map_new, dtype=np.uint8) 43 | # print label_map_new 44 | # pos = np.where(label_map_new == 1) 45 | # print >> sys.stderr, 'Number of positive pixels after resize: {}'.format(len(pos[0])) 46 | self.mask_targets[i,...] = label_map_new 47 | self.weights[i,...] = self.calculate_weight(label_map_new) 48 | # print self.mask_targets 49 | top[0].data[...] = self.mask_targets 50 | top[1].data[...] = self.weights 51 | 52 | 53 | def backward(self, top, propagate_down, bottom): 54 | pass 55 | 56 | 57 | def label_in_box(self, label, bb): 58 | bb = bb[1:5] 59 | label = Image.fromarray(label) 60 | # print >>sys.stderr, bb 61 | label = label.crop(bb) 62 | # label = np.array(label, dtype=np.uint8) 63 | # pos = np.where(label == 1) 64 | # print >> sys.stderr, 'Number of positive pixels: {}'.format(len(pos[0])) 65 | 66 | return label 67 | 68 | 69 | 70 | def calculate_weight(self, label): 71 | pos = np.where(label==1) 72 | neg = np.where(label==0) 73 | weight_pos = len(pos[0])*1.0/(len(pos[0])+len(neg[0])) 74 | weight = np.ones_like(label, dtype=np.float32)*weight_pos 75 | for i in range(len(pos[0])): 76 | weight[pos[0][i],pos[1][i]] = 1 - weight_pos 77 | 78 | print >> sys.stderr, 'pos_num = {}, neg_num = {}, weight_pos = {}'.format(len(pos[0]), len(neg[0]), 1 - weight_pos) 79 | 80 | return weight 81 | -------------------------------------------------------------------------------- /python_layers/mask_data_layer2.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | import numpy as np 3 | 
from numpy import * 4 | import math 5 | from scipy.misc import imresize 6 | import sys 7 | import random 8 | 9 | from PIL import Image 10 | 11 | class MaskDataLayer2(caffe.Layer): 12 | 13 | def setup(self, bottom, top): 14 | # check input pair 15 | if len(bottom) != 2: 16 | raise Exception("Need two bottoms: label, rois") 17 | if len(top) != 2: 18 | raise Exception("Need two top: mask_targets, weights") 19 | params = eval(self.param_str) 20 | self.mask_w = params.get('mask_w', 14) 21 | self.mask_h = params.get('mask_h', 14) 22 | 23 | 24 | def reshape(self, bottom, top): 25 | self.label = np.zeros_like(bottom[0].data, dtype=np.float32) 26 | self.rois = np.zeros_like(bottom[1].data, dtype=np.float32) 27 | self.mask_targets = np.zeros([self.rois.shape[0], self.mask_h, self.mask_w],dtype=np.uint8) 28 | self.weights = np.zeros([self.rois.shape[0], self.mask_h, self.mask_w],dtype=np.float32) 29 | top[0].reshape(self.rois.shape[0], self.mask_h, self.mask_w) 30 | top[1].reshape(self.rois.shape[0], self.mask_h, self.mask_w) 31 | 32 | def forward(self, bottom, top): 33 | self.label = bottom[0].data 34 | self.label = self.label.reshape(self.label.shape[1], self.label.shape[2]) 35 | # print >> sys.stderr, self.label 36 | self.rois = bottom[1].data 37 | box_num = self.rois.shape[0] 38 | label_map = self.label 39 | for i in range(box_num): 40 | label_map_new = self.label_in_box(label_map, self.rois[i,...]) 41 | label_map_new = imresize(label_map_new, size=(np.int(self.mask_h), np.int(self.mask_w)), interp="nearest", mode='F') 42 | label_map_new = np.array(label_map_new, dtype=np.uint8) 43 | # print label_map_new 44 | # pos = np.where(label_map_new == 1) 45 | # print >> sys.stderr, 'Number of positive pixels after resize: {}'.format(len(pos[0])) 46 | self.mask_targets[i,...] = label_map_new 47 | self.weights[i,...] = self.calculate_weight(label_map_new) 48 | # print self.mask_targets 49 | top[0].data[...] = self.mask_targets 50 | top[1].data[...] 
= self.weights 51 | 52 | 53 | def backward(self, top, propagate_down, bottom): 54 | pass 55 | 56 | 57 | def label_in_box(self, label, bb): 58 | box = bb[1:5] 59 | #bb[0] = box[1] 60 | #bb[1] = box[0] 61 | #bb[3] = box[2] 62 | #bb[2] = box[3] 63 | #bb = bb[0:4] 64 | bb = box 65 | label = Image.fromarray(label) 66 | # print >>sys.stderr, bb 67 | label = label.crop(bb) 68 | # label = np.array(label, dtype=np.uint8) 69 | # pos = np.where(label == 1) 70 | # print >> sys.stderr, 'Number of positive pixels: {}'.format(len(pos[0])) 71 | 72 | return label 73 | 74 | 75 | 76 | def calculate_weight(self, label): 77 | pos = np.where(label==1) 78 | neg = np.where(label==0) 79 | weight_pos = len(pos[0])*1.0/(len(pos[0])+len(neg[0])) 80 | weight = np.ones_like(label, dtype=np.float32)*weight_pos 81 | for i in range(len(pos[0])): 82 | weight[pos[0][i],pos[1][i]] = 1 - weight_pos 83 | 84 | print >> sys.stderr, 'pos_num = {}, neg_num = {}, weight_pos = {}'.format(len(pos[0]), len(neg[0]), 1 - weight_pos) 85 | 86 | return weight 87 | -------------------------------------------------------------------------------- /python_layers/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /python_layers/response_value.py: -------------------------------------------------------------------------------- 1 | import caffe 2 | import numpy as np 3 | from numpy import * 4 | import math 5 | from scipy.misc import imresize 6 | import sys 7 | import random 8 | 9 | class ResValueLayer(caffe.Layer): 10 | 11 | def setup(self, bottom, top): 12 | # check input pair 13 | if len(bottom) != 1: 14 | raise Exception("Need one bottom.") 15 | 16 | 17 | def reshape(self, bottom, top): 18 | self.response = np.zeros_like(bottom[0].data, dtype=np.float32) 19 | 20 | def forward(self, bottom, top): 21 | self.response = bottom[0].data 22 | print >> sys.stderr,'Response Absolute Value Mean = {}; 
Response Value Range = [{}, {}];'.format(np.mean(np.abs(self.response)), np.min(self.response), np.max(self.response))


    def backward(self, top, propagate_down, bottom):
        pass

--------------------------------------------------------------------------------
/python_layers/siamese_online_hard_mining_layer.py:
--------------------------------------------------------------------------------
import caffe
import numpy as np
from numpy import *
import math
from scipy.misc import imresize
import sys
import random

class SiaHardMiningLayer(caffe.Layer):

    def setup(self, bottom, top):
        # check input triple
        if len(bottom) != 3:
            raise Exception("Need three bottoms: label, feat1, feat2")


    def reshape(self, bottom, top):
        self.label = np.zeros_like(bottom[0].data, dtype=np.float32)
        self.label_new = np.zeros_like(bottom[0].data, dtype=np.float32)
        self.feat1 = np.zeros_like(bottom[1].data, dtype=np.float32)
        self.feat2 = np.zeros_like(bottom[2].data, dtype=np.float32)
        self.maintain_rate = 0.5
        self.H = 480
        self.W = 854
        self.pos = np.zeros([ 2,self.label.shape[0] ])
        top[0].reshape(bottom[0].shape[0]/2, 1)
        top[1].reshape(bottom[1].shape[0]/2, bottom[1].shape[1], bottom[1].shape[2], bottom[1].shape[3])
        top[2].reshape(bottom[1].shape[0]/2, bottom[1].shape[1], bottom[1].shape[2], bottom[1].shape[3])


    def forward(self, bottom, top):
        # work on a copy so that marking easy samples with -1 does not overwrite the bottom blob
        self.label = bottom[0].data.copy()
        self.feat1 = bottom[1].data
        self.feat2 = bottom[2].data
        label = self.label
        #print self.feat1.shape 300,4096,1,1
        #print >> sys.stderr, self.feat1
        #print >> sys.stderr, self.feat2

        # Euclidean distance between the two feature vectors of every pair
        dist = np.zeros([len(self.label),1], dtype=np.float32)
        for i in range(len(self.label)):
            vector1 = np.float32(mat(self.feat1[i,...,0]))
            vector2 = np.float32(mat(self.feat2[i,...,0]))
            dist[i] = sqrt(((vector1-vector2).T)*(vector1-vector2))
45 | 46 | 47 | pos = np.where(self.label == 1) 48 | D_sim = dist[pos] 49 | thre = np.median(D_sim) 50 | for i in range(len(pos[0])): 51 | if D_sim[i] < thre: 52 | label[pos[0][i]] = -1 53 | 54 | pos = np.where(self.label == 0) 55 | D_diff= dist[pos] 56 | thre = np.median(D_diff) 57 | for i in range(len(pos[0])): 58 | if D_diff[i] > thre: 59 | label[pos[0][i]] = -1 60 | 61 | # print>> sys.stderr, self.label 62 | 63 | pos1 = np.where(label == 1) 64 | pos2 = np.where(label == 0) 65 | if len(pos1[0])>len(pos2[0]): 66 | pos = np.zeros_like(pos2) 67 | pos[0][...] = pos1[0][0:len(pos2[0])] 68 | pos[1][...] = pos1[1][0:len(pos2[0])] 69 | pos1 = pos 70 | 71 | if len(pos2[0])>len(pos1[0]): 72 | pos = np.zeros_like(pos1) 73 | pos[0][...] = pos2[0][0:len(pos1[0])] 74 | pos[1][...] = pos2[1][0:len(pos1[0])] 75 | pos2 = pos 76 | 77 | 78 | self.pos = np.concatenate((pos1, pos2), axis = 1) 79 | self.label_new = self.label[self.pos[0]] 80 | self.feat1 = self.feat1[self.pos[0],...] 81 | self.feat2 = self.feat2[self.pos[0],...] 82 | # print>> sys.stderr, self.label_new.shape 83 | 84 | top[0].reshape(*self.label_new.shape) 85 | top[0].data[...] = self.label_new 86 | top[1].reshape(*self.feat1.shape) 87 | top[1].data[...] = self.feat1 88 | top[2].reshape(*self.feat2.shape) 89 | top[2].data[...] = self.feat2 90 | 91 | 92 | 93 | def backward(self, top, propagate_down, bottom): 94 | for i in range(3): 95 | if not propagate_down[i]: 96 | continue 97 | 98 | if i == 0: 99 | continue 100 | 101 | bottom[i].diff[...] = np.zeros_like(bottom[i].data, dtype=np.float32) 102 | # print >> sys.stderr, top[i].diff.shape 103 | # print >> sys.stderr, bottom[i].diff.shape 104 | for j in range(len(self.pos[0])): 105 | bottom[i].diff[self.pos[0][j],...] += top[i].diff[j,...] 
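The selection performed in `SiaHardMiningLayer.forward` can be sketched outside Caffe. The following is a minimal NumPy illustration rather than the layer itself: the helper name `select_hard_pairs` and the toy inputs are hypothetical, and it assumes flat `(N, D)` feature arrays and a flat 0/1 label vector. As in the layer, similar pairs that are closer than the median distance and dissimilar pairs that are farther than the median distance are treated as easy and dropped, and the larger class is then truncated to the size of the smaller one.

```python
import numpy as np


def select_hard_pairs(labels, feat1, feat2):
    """Return the indices of the pairs kept after hard mining (hypothetical helper).

    labels: (N,) array of 0/1, where 1 marks a similar pair.
    feat1, feat2: (N, D) feature arrays of the two branches.
    """
    labels = np.asarray(labels).astype(np.int64)
    diff = np.asarray(feat1, dtype=np.float32) - np.asarray(feat2, dtype=np.float32)
    # Euclidean distance between the two embeddings of every pair
    dist = np.linalg.norm(diff, axis=1)

    pos = np.where(labels == 1)[0]
    neg = np.where(labels == 0)[0]
    # similar pairs that are already close are easy: keep those at or above the median
    keep_pos = pos[dist[pos] >= np.median(dist[pos])] if len(pos) else pos
    # dissimilar pairs that are already far apart are easy: keep those at or below the median
    keep_neg = neg[dist[neg] <= np.median(dist[neg])] if len(neg) else neg

    # balance both classes by truncating the larger one
    n = min(len(keep_pos), len(keep_neg))
    return np.concatenate((keep_pos[:n], keep_neg[:n]))


if __name__ == "__main__":
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    toy_feat1 = np.zeros((8, 2), dtype=np.float32)
    # distances 1, 2, 3, 4 for the similar pairs and again for the dissimilar pairs
    toy_feat2 = np.array([[1, 0], [2, 0], [3, 0], [4, 0],
                          [1, 0], [2, 0], [3, 0], [4, 0]], dtype=np.float32)
    print(select_hard_pairs(toy_labels, toy_feat1, toy_feat2).tolist())  # [2, 3, 4, 5]
```

Returning indices instead of modified labels keeps the input untouched and makes it straightforward to route gradients back to the selected rows, which is what the layer's `backward` does with `self.pos`.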
--------------------------------------------------------------------------------
/python_layers/slice_layer.py:
--------------------------------------------------------------------------------
import caffe
import numpy as np
from numpy import *
import math
from scipy.misc import imresize
import sys
import random

class SliceLayer(caffe.Layer):

    def setup(self, bottom, top):
        # check input pair
        if len(bottom) != 1:
            raise Exception("Need one bottom: feat_concat")


    def reshape(self, bottom, top):
        self.feats = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(self.feats.shape[0]-1,self.feats.shape[1],self.feats.shape[2],self.feats.shape[3])
        top[1].reshape(self.feats.shape[0]-1,self.feats.shape[1],self.feats.shape[2],self.feats.shape[3])


    def forward(self, bottom, top):
        self.feats = bottom[0].data
        num_obj = self.feats.shape[0]-1
        self.feat2 = self.feats[1:self.feats.shape[0],...]
        self.feat1 = np.zeros_like(self.feat2,dtype=np.float32)

        print >> sys.stderr, 'Number of objects: {}'.format(num_obj)

        for obj_id in range(num_obj):
            self.feat1[obj_id,...] = self.feats[0,...]

        top[0].reshape(*self.feat1.shape)
        top[0].data[...] = self.feat1
        top[1].reshape(*self.feat2.shape)
        top[1].data[...] = self.feat2


    def backward(self, top, propagate_down, bottom):
        pass

--------------------------------------------------------------------------------
/results/download_results.sh:
--------------------------------------------------------------------------------
if [ ! -d "favos_baseline" ]; then
    wget https://www.dropbox.com/s/9zwob31bz91u75h/favos.tar
    tar -xf favos.tar
    rm -f favos.tar
fi


--------------------------------------------------------------------------------
/test_davis16.sh:
--------------------------------------------------------------------------------
# Demo script for FAVOS
# Usage: sh test_davis16.sh {GPU-id} {sequence-name}

cd demo && \
python infer_davis16.py ../models/ROISegNet_2016.caffemodel deploy.prototxt $2 $1 && \
matlab -nojvm -nodesktop -nodisplay -r "combine_mask('$2')" && \
cd ../

--------------------------------------------------------------------------------
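The usage comment in `test_davis16.sh` takes a GPU id followed by a sequence name. To run the demo over several sequences, the commands could be assembled as in the sketch below. This is an illustration only: `build_demo_commands` and `run_demo` are hypothetical helpers that are not part of the repository, the sequence names are example values, and the script is assumed to be invoked from the repository root.

```python
import subprocess


def build_demo_commands(gpu_id, sequences, script="test_davis16.sh"):
    """Build one `sh test_davis16.sh {GPU-id} {sequence-name}` command per sequence."""
    return [["sh", script, str(gpu_id), name] for name in sequences]


def run_demo(gpu_id, sequences, dry_run=True):
    """Print the commands, or execute them one after another when dry_run is False."""
    for cmd in build_demo_commands(gpu_id, sequences):
        if dry_run:
            print(" ".join(cmd))
        else:
            # stops at the first sequence whose demo run fails
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # example sequence names; replace them with the sequences available locally
    run_demo(0, ["blackswan", "camel"])
```

Passing the command as a list avoids shell quoting problems with sequence names, and the dry-run default makes it possible to inspect the commands before any GPU work starts.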