├── .gitignore
├── README.md
├── ROISegNet
│   ├── ROISegNet_2016.prototxt
│   ├── ROISegNet_2017.prototxt
│   ├── solve.py
│   ├── solver_davis16.prototxt
│   └── solver_davis17.prototxt
├── data
│   ├── README.md
│   └── dowdload_DAVIS16.sh
├── demo
│   ├── combine_mask.m
│   ├── deploy.prototxt
│   ├── infer_davis16.py
│   ├── jaccard_single.m
│   └── weights.mat
├── download_all.sh
├── figures
│   └── framework.png
├── models
│   └── download_model.sh
├── part-tracking
│   └── siamese-fc
│       └── download_parts.sh
├── python_layers
│   ├── bbox_accuracy_layer.py
│   ├── bbox_data_layer.py
│   ├── bbox_iou.py
│   ├── compute_similarity.py
│   ├── davis2016_ROIdata_layer.py
│   ├── davis2016_fgbg_data_layer.py
│   ├── davis2016_siamese_data_layer.py
│   ├── davis2017_ROIdata_layer.py
│   ├── davis_data_layer.py
│   ├── davis_siamese_data_layer.py
│   ├── davis_siamese_data_layer3.py
│   ├── davis_test_score_layer.py
│   ├── mask_data_layer.py
│   ├── mask_data_layer2.py
│   ├── readme
│   ├── response_value.py
│   ├── siamese_online_hard_mining_layer.py
│   └── slice_layer.py
├── results
│   └── download_results.sh
└── test_davis16.sh
/.gitignore:
--------------------------------------------------------------------------------
1 | results/favos*
2 | results-demo/*
3 | data/DAVIS2016
4 | part-tracking/siamese-fc/DAVIS2016
5 | *.zip
6 | *.tar
7 | *.caffemodel
8 | *.solverstate
9 |
10 | # python
11 |
12 | __pycache__/
13 | *.py[cod]
14 | *$py.class
15 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Fast and Accurate Online Video Object Segmentation via Tracking Parts (FAVOS)
2 |
3 | 
4 |
5 | Contact: Jingchun Cheng (chengjingchun14 at 163 dot com)
6 |
7 | ## Paper
8 | [Fast and Accurate Online Video Object Segmentation via Tracking Parts](https://arxiv.org/abs/1806.02323)
9 | [Jingchun Cheng](https://sites.google.com/view/jingchun-cheng), [Yi-Hsuan Tsai](https://sites.google.com/site/yihsuantsai/home), [Wei-Chih Hung](https://hfslyc.github.io/), [Shengjin Wang](http://www.tsinghua.edu.cn/publish/eeen/3784/2010/20101219115601212198627/20101219115601212198627_.html) and [Ming-Hsuan Yang](http://faculty.ucmerced.edu/mhyang/)
10 | IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (Spotlight)
11 |
12 | This is the authors' demo (single-GPU-version) code for the DAVIS 2016 dataset as described in the above paper. Please cite our paper if you find it useful for your research.
13 |
14 | ```
15 | @inproceedings{Cheng_favos_2018,
16 | author = {J. Cheng and Y.-H. Tsai and W.-C. Hung and S. Wang and M.-H. Yang},
17 | booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
18 | title = {Fast and Accurate Online Video Object Segmentation via Tracking Parts},
19 | year = {2018}
20 | }
21 | ```
22 |
23 | ## FAVOS results
24 | [Segmentation Comparisons with Fast Online Methods](https://www.dropbox.com/s/l95ozepuohie7x4/DAVIS16_segmentation_comparison_methods_with_strong_applicability.avi?dl=0)
25 |
26 | [Example Video of Part Tracking](https://www.dropbox.com/s/3yszhdjz6klpmzr/Illustration_part_tracking.avi?dl=0)
27 |
28 |
29 | ## Requirements
30 | * caffe (pycaffe)
31 | * opencv
32 | * matlab
33 | * A GPU with at least 12GB memory
34 |
35 | Download the DAVIS 2016 dataset, trained models, tracked parts, and pre-computed results:
36 | ```
37 | sh download_all.sh
38 | ```
39 |
40 | ## Test our model
41 | We provide an example testing script `test_davis16.sh`.
42 | ```
43 | # Please run download_all.sh first
44 | # Usage: sh test_davis16.sh GPU_ID SEQUENCE_NAME
45 |
46 | sh test_davis16.sh 0 blackswan
47 | ```
48 | The results are saved in `results-demo/res_favos/`.
49 | To obtain results for other videos, replace the sequence name with any other sequence in the DAVIS 2016 validation set.
50 |
51 |
52 | ## Train your own ROISegNet
53 | Download the [ResNet-101 model](https://www.dropbox.com/s/2506oyjkwy7acjv/init.caffemodel?dl=0) and save it in the `models` folder as `init.caffemodel`.
54 | ```
55 | cd ROISegNet
56 | python solve.py ../models/init.caffemodel solver_davis16.prototxt 0
57 | ```
58 |
59 | ## Tracker
60 | We use the SiaFC tracker in [Fully-Convolutional Siamese Networks for Object Tracking](https://github.com/bertinetto/siamese-fc).
61 | The pre-computed parts and tracking results on DAVIS 2016 can be downloaded [here](https://www.dropbox.com/s/pkqlzlhwun4qwuu/parts_DAVIS2016.tar?dl=0).
62 |
63 | Note that we are currently working on a stable version that combines part tracking and ROISegNet for practical use on arbitrary videos. We will update the code in the near future.
64 |
65 | ## Download our segmentation results on the DAVIS datasets
66 | * FAVOS on DAVIS2016 [link](https://www.dropbox.com/s/9zwob31bz91u75h/favos.tar?dl=0)
67 | * FAVOS on DAVIS2017 [link](https://www.dropbox.com/s/8gtcgf27qdhzyqu/favos_2017.tar?dl=0)
68 |
69 |
70 | ## Note
71 | The models and code are available for non-commercial research purposes only.
72 |
73 | 06/2018: demo code released
74 |
75 |
--------------------------------------------------------------------------------
/ROISegNet/solve.py:
--------------------------------------------------------------------------------
1 | import os,sys
2 | sys.path.append("../caffe")
3 | sys.path.append("../caffe/python")
4 | sys.path.append("../caffe/python/caffe")
5 |
6 | sys.path.insert(0, "../../fcn_python/")
7 | sys.path.insert(0, "../../python_layers/")
8 |
9 |
10 | import caffe
11 | import surgery
12 |
13 | import numpy as np
14 |
15 |
16 | weights = sys.argv[1]
17 | solver_proto = sys.argv[2]
18 | gpu_id = int(sys.argv[3])  # np.int is deprecated; the builtin int behaves the same here
19 |
20 | caffe.set_device(gpu_id)
21 | caffe.set_mode_gpu()
22 |
23 | solver = caffe.SGDSolver(solver_proto)
24 | solver.net.copy_from(weights)
25 |
26 | # surgeries
27 | interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
28 | surgery.interp(solver.net, interp_layers)
29 |
30 | for _ in range(20):
31 |     solver.step(10000)
32 |
33 |
34 |
--------------------------------------------------------------------------------
/ROISegNet/solver_davis16.prototxt:
--------------------------------------------------------------------------------
1 | train_net: "ROISegNet_2016.prototxt"
2 | display: 20
3 |
4 | average_loss: 20
5 | lr_policy: "fixed"
6 | # lr for unnormalized softmax
7 | base_lr: 1e-5
8 |
9 | # high momentum
10 | momentum: 0.99
11 |
12 | clip_gradients: 20
13 | # no gradient accumulation
14 |
15 | iter_size: 1
16 | max_iter: 100000
17 | weight_decay: 0.0005
18 | snapshot: 5000
19 | snapshot_prefix: "../models/ROISegNet_2016"
20 |
21 | test_initialization: false
22 |
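The solver settings above combine a fixed learning rate (`base_lr: 1e-5`), high momentum (`0.99`), weight decay (`0.0005`) and gradient clipping (`clip_gradients: 20`). As a rough illustration of how these interact, here is a minimal NumPy sketch of one SGD-with-momentum step. It is based on the update rule described in the Caffe solver documentation (clip by global L2 norm, add L2 decay, then `v = momentum * v - lr * grad`). The function name `sgd_momentum_step` is hypothetical, and per-layer `lr_mult`/`decay_mult` factors are ignored.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, base_lr=1e-5, momentum=0.99,
                      weight_decay=0.0005, clip_gradients=20.0):
    """One illustrative SGD step using the hyperparameters from the solver file."""
    w = np.asarray(w, dtype=np.float64)
    grad = np.asarray(grad, dtype=np.float64)
    # Rescale the gradient if its L2 norm exceeds the clipping threshold.
    norm = np.sqrt(np.sum(grad ** 2))
    if norm > clip_gradients:
        grad = grad * (clip_gradients / norm)
    # L2 weight decay is added to the (possibly clipped) gradient.
    grad = grad + weight_decay * w
    # Momentum update: v <- mu * v - lr * grad, then w <- w + v.
    velocity = momentum * velocity - base_lr * grad
    return w + velocity, velocity
```

With momentum 0.99, a constant gradient eventually produces steps about 100 times larger than `base_lr * grad`, which may be one reason the learning rate is kept this small.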
--------------------------------------------------------------------------------
/ROISegNet/solver_davis17.prototxt:
--------------------------------------------------------------------------------
1 | train_net: "ROISegNet_2017.prototxt"
2 | display: 20
3 |
4 | average_loss: 20
5 | lr_policy: "fixed"
6 | # lr for unnormalized softmax
7 | base_lr: 1e-5
8 |
9 | # high momentum
10 | momentum: 0.99
11 |
12 | clip_gradients: 20
13 | # no gradient accumulation
14 |
15 | iter_size: 1
16 | max_iter: 100000
17 | weight_decay: 0.0005
18 | snapshot: 5000
19 | snapshot_prefix: "../models/ROISegNet_2017"
20 |
21 | test_initialization: false
22 |
--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
1 | Download the DAVIS 2016 and the DAVIS 2017 datasets, and put them in this folder.
2 |
--------------------------------------------------------------------------------
/data/dowdload_DAVIS16.sh:
--------------------------------------------------------------------------------
1 | if [ ! -d "DAVIS2016" ]; then
2 | wget https://graphics.ethz.ch/Downloads/Data/Davis/DAVIS-data.zip
3 | unzip DAVIS-data.zip
4 | mv DAVIS DAVIS2016
5 | rm -f DAVIS-data.zip
6 | fi
7 |
--------------------------------------------------------------------------------
/demo/combine_mask.m:
--------------------------------------------------------------------------------
1 | function combine_mask(class_name)
2 |
3 | DAVIS_dir = '../data/DAVIS2016/';
4 | base_dir = '../results/favos_baseline/';
5 | part_dir = '../results-demo/res_part/';
6 | res_dir = '../results-demo/res_favos/';
7 |
8 |
9 | assert(exist(base_dir)>0, 'Please download baseline results "results/favos_baseline".');
10 | assert(exist(part_dir)>0, 'Please run part segmentation first.');
11 |
12 | if ~exist(res_dir)
13 | mkdir(res_dir);
14 | end
15 |
16 |
17 | %% combine parts
18 | mkdir([res_dir,class_name]);
19 |
20 | images = dir(fullfile(DAVIS_dir,'JPEGImages/480p/',class_name,'/*.jpg'));
21 | for k = 1:length(images)
22 |
23 | ann_img = imread([DAVIS_dir,'Annotations/480p/',class_name,'/00000.png']);
24 | ann_img = uint8(ann_img>0);
25 |
26 | num_obj = 1;
27 | im_name = images(k).name;
28 | im_name = im_name(1:end-4);
29 |
30 | img = imread([DAVIS_dir,'JPEGImages/480p/', class_name,'/',im_name,'.jpg']);
31 | base_img = imread([base_dir,class_name,'/',im_name,'.png']);
32 | if (size(base_img,1)~=size(img,1)) || (size(base_img,2)~=size(img,2))
33 | base_img = imresize(base_img,[size(img,1),size(img,2)]);
34 | end
35 | [H, W, ~] = size(img);
36 |
37 |
38 | mask_img = zeros(size(ann_img));
39 | mask_num = zeros(size(ann_img));
40 |
41 |
42 | mask_img_sim = zeros(size(ann_img));
43 | mask_img_score = zeros(size(ann_img));
44 |
45 | roi_seg_file = [part_dir, class_name,'/',im_name,'.mat'];
46 | load(roi_seg_file);
47 |
48 |
49 | box_num = size(mask,1);
50 |
51 |
52 |
53 | for j = 1:box_num
54 |
55 | box = rois(j,:);
56 | box = box(2:end);
57 |
58 |
59 | [~,~,roi_h,roi_w] = size(mask);
60 |
61 | box = round(box);
62 | box(3) = box(3)-box(1)+1;
63 | box(4) = box(4)-box(2)+1;
64 | ww = box(3);
65 | hh = box(4);
66 |
67 | patch = imcrop(img,box);
68 | if numel(patch) == 0
69 | continue
70 | end
71 |
72 | tmp_mask = mask(j,:,:,:);
73 | [~,~,roi_h,roi_w] = size(tmp_mask);
74 | tmp_mask = reshape(tmp_mask,[2, roi_h,roi_w]);
75 | res_mask = exp(tmp_mask(2,:,:))./(exp(tmp_mask(1,:,:))+exp(tmp_mask(2,:,:)));
76 | res_mask = reshape(res_mask,[roi_h,roi_w]);
77 |
78 |
79 | roi_mask = imresize(res_mask,[hh, ww]);
80 | box = rois(j,:);
81 | box = round(box(2:end));
82 | if box(2) == 0
83 | box(2) = 1;
84 | box(4) = box(4)+1;
85 | end
86 | if box(1) == 0
87 | box(1) = 1;
88 | box(3) = box(3)+1;
89 | end
90 | box(3) = min(box(3), W);
91 | box(4) = min(box(4), H);
92 | if (hh~=box(4)-box(2)+1) || (ww~=box(3)-box(1)+1)
93 | roi_mask = roi_mask(1:box(4)-box(2)+1,1:box(3)-box(1)+1);
94 | end
95 |
96 | mask_img(box(2):box(4),box(1):box(3)) = mask_img(box(2):box(4),box(1):box(3)) + roi_mask;
97 | mask_num(box(2):box(4),box(1):box(3)) = mask_num(box(2):box(4),box(1):box(3)) + 1;
98 | end
99 |
100 | mask_img_num = double(mask_num);
101 |
102 | mask_num(mask_num == 0) = 1;
103 | mask_img = mask_img./mask_num;
104 | mask_img = mask_img/max(max(mask_img));
105 |
106 | img_parts = mask_img;
107 |
108 |
109 | %% combine with whole image predictions
110 | num_obj = 1;
111 | im_name = images(k).name;
112 | im_name = im_name(1:end-4);
113 |
114 |
115 | if k == 1
116 | load('weights.mat');
117 | ratio = weights(find(strcmp(classes,class_name)>0));
118 | end
119 |
120 | base_img = imread([base_dir,class_name,'/',im_name,'.png']);
121 | if (size(base_img,1)~=size(ann_img,1)) || (size(base_img,2)~=size(ann_img,2))
122 | base_img = imresize(base_img,[size(ann_img,1),size(ann_img,2)]);
123 | end
124 | base_img = double(base_img);
125 | base_img = base_img/max(max(base_img));
126 |
127 | img_pick = img_parts;
128 |
129 | res_img = base_img*ratio + img_pick*(1-ratio);
130 |
131 | % soft output
132 | % imwrite(res_img, [res_dir,class_name,'/',im_name,'.jpg']);
133 | imwrite((res_img>0.5), [res_dir,class_name,'/',im_name,'.png']);
134 |
135 |
136 | end
137 |
138 | quit;
139 |
140 |
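For readers without MATLAB, the fusion performed by `combine_mask.m` can be approximated in NumPy. The sketch below is not the authors' code: the names `fg_probability` and `fuse_part_masks` are hypothetical, boxes are assumed to be 0-based inclusive `(x1, y1, x2, y2)` tuples, each part probability map is assumed to be already resized to its box, and `base_prob` is assumed to be already scaled to [0, 1]. Unlike the MATLAB script, it guards against dividing by a zero maximum.

```python
import numpy as np

def fg_probability(scores):
    """Two-class softmax foreground probability from a (2, h, w) score array."""
    scores = np.asarray(scores, dtype=np.float64)
    # exp(s1) / (exp(s0) + exp(s1)) == 1 / (1 + exp(s0 - s1))
    return 1.0 / (1.0 + np.exp(scores[0] - scores[1]))

def fuse_part_masks(part_probs, boxes, base_prob, ratio, height, width):
    """Average overlapping part predictions, then blend with the whole-image map."""
    acc = np.zeros((height, width), dtype=np.float64)
    cnt = np.zeros((height, width), dtype=np.float64)
    for prob, (x1, y1, x2, y2) in zip(part_probs, boxes):
        acc[y1:y2 + 1, x1:x2 + 1] += prob   # sum of part probabilities
        cnt[y1:y2 + 1, x1:x2 + 1] += 1      # number of boxes covering each pixel
    cnt[cnt == 0] = 1                       # avoid division by zero outside all boxes
    parts = acc / cnt
    if parts.max() > 0:
        parts = parts / parts.max()         # normalize to [0, 1]
    fused = ratio * np.asarray(base_prob, dtype=np.float64) + (1.0 - ratio) * parts
    return fused > 0.5                      # binary mask, as in the imwrite call
```

Here `ratio` plays the role of the per-sequence weight loaded from `weights.mat`.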
--------------------------------------------------------------------------------
/demo/infer_davis16.py:
--------------------------------------------------------------------------------
1 | import os,sys
2 | sys.path.append("../caffe")
3 | sys.path.append("../caffe/python")
4 | sys.path.append("../caffe/python/caffe")
5 |
6 | import caffe
7 |
8 | import numpy as np
9 | from PIL import Image
10 | import scipy.io
11 |
12 | from scipy.misc import imresize
13 |
14 | import os
15 | from scipy import io
16 |
17 | import shutil
18 |
19 | from numpy import *
20 | import math
21 | import random
22 |
23 |
24 |
25 | def load_image(im_name):
26 | im = Image.open(im_name)
27 | print >> sys.stderr, 'loading {}'.format(im_name)
28 | return im
29 |
30 |
31 | def func_dist(feat1, feat2):
32 | dist = np.zeros([feat1.shape[0], feat2.shape[0]], dtype=np.float32)
33 | for i in range(feat1.shape[0]):
34 | for j in range(feat2.shape[0]):
35 | vector1 = np.float32(mat(feat1[i,...]))
36 | vector2 = np.float32(mat(feat2[j,...]))
37 |
38 | dist[i][j] = sqrt((vector1-vector2)*((vector1-vector2).T))
39 |
40 | return dist
41 |
42 |
43 | def get_img_rois(img, rois, pool_h, pool_w):
44 | box_num = rois.shape[0]
45 | img_rois = np.zeros((box_num, 3, pool_h, pool_w))
46 | for i in range(box_num):
47 | bb = rois[i,1:5]
48 | ROI = img.crop(bb)
49 | ROI = ROI.resize((np.int(pool_w), np.int(pool_h)))
50 | in_ = np.array(ROI, dtype=np.float32)
51 | in_ = in_[:,:,::-1]
52 | in_ -= np.array((104.00698793,116.66876762,122.67891434))
53 | ROI = in_.transpose((2,0,1))
54 | img_rois[i,...] = ROI
55 |
56 | return img_rois
57 |
58 |
59 |
60 | def get_bbox(label):
61 | label = np.array(label, dtype=np.uint8)
62 | pos = np.where(label >0)
63 | if len(pos[0])<100:
64 | print >> sys.stderr, 'skipping very small object'
65 | bb = [0, 0, 0, 0]
66 | return bb
67 | else:
68 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
69 | print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0]))
70 | return bb
71 |
72 |
73 | def load_label(im_name):
74 | print >> sys.stderr, 'loading {}'.format(im_name)
75 | im = Image.open(im_name)
76 | label = np.array(im, dtype=np.uint8)
77 | return label
78 |
79 |
80 | davis_dir = '../data/DAVIS2016/'
81 | file_out = '../results-demo/res_part/'
82 |
83 | if not os.path.exists(file_out):
84 | os.makedirs(file_out)
85 |
86 | model = sys.argv[1]
87 | deploy_proto = sys.argv[2]
88 | cls_name = sys.argv[3]
89 | device_id = int(sys.argv[4])
90 |
91 | caffe.set_device(device_id)
92 | caffe.set_mode_gpu()
93 | net = caffe.Net(deploy_proto, model, caffe.TEST)
94 |
95 | img_path = '{}/JPEGImages/480p/{}'.format(davis_dir, cls_name)
96 | images = os.listdir(img_path)
97 | images = sorted(images)
98 |
99 |
100 | det_dir = '../part-tracking/siamese-fc/DAVIS2016'
101 | det_name = '{}/{}.mat.mat'.format(det_dir, cls_name)
102 | data = io.loadmat(det_name)
103 | Dboxes = data['boxes']
104 |
105 |
106 | for idx in range(len(images)):
107 |
108 | if os.path.exists(file_out) == False:
109 | os.mkdir(file_out)
110 |
111 | if os.path.exists('{}/{}'.format(file_out, cls_name)) == False:
112 | os.mkdir('{}/{}'.format(file_out, cls_name))
113 |
114 | im_name = '{}/JPEGImages/480p/{}/{}'.format(davis_dir, cls_name, images[idx])
115 |
116 | ss = images[idx].split('.jpg')
117 | ss = ss[0]
118 |
119 | img_idx = np.int(ss)
120 | dets = Dboxes[0:Dboxes.shape[0],img_idx,0:Dboxes.shape[2]]
121 | dets = np.round(dets)
122 |
123 | pos = np.where(dets<0)
124 |
125 | for k in range(len(pos[0])):
126 | dets[pos[0][k], pos[1][k]] = 0
127 |
128 | input_roi = np.zeros((dets.shape[0],5))
129 | for k in range(dets.shape[0]):
130 | input_roi[k, 1] = dets[k, 0]
131 | input_roi[k, 2] = dets[k, 1]
132 | input_roi[k, 3] = dets[k, 2]
133 | input_roi[k, 4] = dets[k, 3]
134 |
135 | img1 = load_image(im_name)
136 | pool_w = 80
137 | pool_h = 80
138 |
139 | NUM = 200
140 | if dets.shape[0] > 0:
141 | input_rois = input_roi
142 | print >> sys.stderr,'box num = {}'.format(dets.shape[0])
143 | outs2 = np.zeros([dets.shape[0],2,pool_h,pool_w], dtype = np.float32)
144 | for k in range( np.int(np.ceil( dets.shape[0]*1.0/NUM)) ):
145 | input_roi = input_rois[k*NUM:np.min([(k+1)*NUM,dets.shape[0]]), ...]
146 | datas = get_img_rois(img1, input_roi, pool_h, pool_w)
147 | net.blobs['datas'].reshape(*datas.shape)
148 | net.blobs['datas'].data[...] = datas
149 | net.forward()
150 | out1 = net.blobs['feat'].data
151 | out2 = net.blobs['pred_mask'].data
152 |
153 | if k == 0:
154 | feat_len = out1.shape[1]
155 | outs1 = np.zeros([dets.shape[0], feat_len], dtype = np.float32)
156 | print >> sys.stderr, 'Feature length = {}, Candidate number = {}'.format(feat_len, dets.shape[0])
157 |
158 | outs1[k*NUM:np.min([(k+1)*NUM,dets.shape[0]]),...] = out1.reshape([out1.shape[0],out1.shape[1]])
159 | outs2[k*NUM:np.min([(k+1)*NUM,dets.shape[0]]),...] = out2
160 | print>>sys.stderr, 'Processing {}->{}'.format(k*NUM, np.min([(k+1)*NUM,dets.shape[0]]))
161 |
162 | io.savemat('{}/{}/{}.mat'.format(file_out, cls_name, ss), {'mask': outs2, 'rois': input_rois})
163 |
164 |
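The inference loop above feeds the ROIs to the network in batches of at most `NUM = 200` and recomputes the slice bounds with `np.min` in several places. A small generator can express the same chunking once. This is only a sketch, and `iter_chunks` is a hypothetical helper name, not part of the repository.

```python
def iter_chunks(total, chunk_size):
    """Yield (start, stop) index pairs covering range(total) in fixed-size chunks."""
    for start in range(0, total, chunk_size):
        # The final chunk is shorter when total is not a multiple of chunk_size.
        yield start, min(start + chunk_size, total)
```

In the loop above this would be used as `for start, stop in iter_chunks(dets.shape[0], NUM):`, slicing `input_rois[start:stop]` and writing into `outs1[start:stop]` and `outs2[start:stop]`.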
--------------------------------------------------------------------------------
/demo/jaccard_single.m:
--------------------------------------------------------------------------------
1 | function [J, inters, fp, fn] = jaccard_single( object, ground_truth )
2 |
3 | % Make sure they're binary
4 | object = logical(object);
5 | ground_truth = logical(ground_truth);
6 |
7 | % Intersection between all sets
8 | inters = object.*ground_truth;
9 | fp = object.*(1-inters);
10 | fn = ground_truth.*(1-inters);
11 |
12 | % Areas of the intersections
13 | inters = sum(inters(:)); % Intersection
14 | fp = sum(fp(:)); % False positives
15 | fn = sum(fn(:)); % False negatives
16 |
17 | % Compute the fraction
18 | denom = inters + fp + fn;
19 | if denom==0
20 | J = 1;
21 | else
22 | J = inters/denom;
23 | end
24 | end
25 |
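Since the rest of the pipeline is in Python, it can be convenient to evaluate masks without MATLAB. The following is a minimal NumPy sketch of the same region-similarity measure as `jaccard_single.m`, including its convention that two empty masks score 1. It is an illustrative port, not code from the repository.

```python
import numpy as np

def jaccard_single(obj, ground_truth):
    """Return (J, intersection, false positives, false negatives) for two masks."""
    obj = np.asarray(obj, dtype=bool)
    gt = np.asarray(ground_truth, dtype=bool)
    inters = int(np.logical_and(obj, gt).sum())   # pixels in both masks
    fp = int(np.logical_and(obj, ~gt).sum())      # predicted but not in ground truth
    fn = int(np.logical_and(~obj, gt).sum())      # in ground truth but not predicted
    denom = inters + fp + fn                      # size of the union
    j = 1.0 if denom == 0 else inters / float(denom)
    return j, inters, fp, fn
```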
--------------------------------------------------------------------------------
/demo/weights.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JingchunCheng/FAVOS/27662f841ed782d7a2701be1d72c481a1a203445/demo/weights.mat
--------------------------------------------------------------------------------
/download_all.sh:
--------------------------------------------------------------------------------
1 | cd data && sh ./dowdload_DAVIS16.sh && cd ../
2 | cd models && sh ./download_model.sh && cd ../
3 | cd part-tracking/siamese-fc && sh ./download_parts.sh && cd ../../
4 | cd results && sh ./download_results.sh && cd ../
5 |
--------------------------------------------------------------------------------
/figures/framework.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JingchunCheng/FAVOS/27662f841ed782d7a2701be1d72c481a1a203445/figures/framework.png
--------------------------------------------------------------------------------
/models/download_model.sh:
--------------------------------------------------------------------------------
1 | if [ ! -f "ROISegNet_2016.caffemodel" ]; then
2 | wget https://www.dropbox.com/s/tkfa22j0ypq8ncq/ROISegNet_2016.caffemodel
3 | fi
4 |
5 |
--------------------------------------------------------------------------------
/part-tracking/siamese-fc/download_parts.sh:
--------------------------------------------------------------------------------
1 | if [ ! -d "DAVIS2016" ]; then
2 | wget https://www.dropbox.com/s/pkqlzlhwun4qwuu/parts_DAVIS2016.tar
3 | tar -xf parts_DAVIS2016.tar
4 | rm -f parts_DAVIS2016.tar
5 | fi
6 |
7 |
--------------------------------------------------------------------------------
/python_layers/bbox_accuracy_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | import math
4 | from scipy.misc import imresize
5 | import sys
6 | import random
7 |
8 | class BBoxAccLayer(caffe.Layer):
9 |
10 | def setup(self, bottom, top):
11 | # check input pair
12 | if len(bottom) != 2:
13 | raise Exception("Need two bottoms: score label")
14 |
15 |
16 | def reshape(self, bottom, top):
17 | #top[0].reshape(bottom[0].data.shape[0],bottom[0].data.shape[1],bottom[0].data.shape[2],bottom[0].data.shape[3])
18 | #top[1].reshape(bottom[1].data.shape[0],bottom[1].data.shape[1],bottom[1].data.shape[2],bottom[1].data.shape[3])
19 | self.score = np.zeros_like(bottom[0].data, dtype=np.float32)
20 | self.label = np.zeros_like(bottom[1].data, dtype=np.float32)
21 | self.H = 480
22 | self.W = 854
23 |
24 |
25 |
26 | def forward(self, bottom, top):
27 | self.score = bottom[0].data
28 | self.label = bottom[1].data
29 | label = self.label
30 | num_bg = 0
31 | num_fg = 0
32 | correct_bg = 0
33 | correct_fg = 0
34 | for iw in range(self.W):
35 | for ih in range(self.H):
36 | if label[0,0,ih,iw] == -1:
37 | continue
38 | elif self.score[0,0,ih,iw]>self.score[0,1,ih,iw]:
39 | if label[0,0,ih,iw] == 0:
40 | num_bg = num_bg + 1
41 | correct_bg = correct_bg + 1
42 | else:
43 | num_fg = num_fg + 1
44 | else:
45 | if label[0,0,ih,iw] == 1:
46 | num_fg = num_fg + 1
47 | correct_fg = correct_fg + 1
48 | else:
49 | num_bg = num_bg + 1
50 |
51 | if num_bg > 0 and num_fg > 0:
52 | print >> sys.stderr,'pr_bg = {}; pr_fg = {}; pr_all = {}'.format(correct_bg*1.0/num_bg,correct_fg*1.0/num_fg,(correct_bg+correct_fg)*1.0/(num_bg+num_fg))
53 |
54 |
55 | def backward(self, top, propagate_down, bottom):
56 | pass
57 |
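The forward pass above visits every pixel of a 480 x 854 frame in a Python loop, which is slow. The same per-class accuracies can be computed with boolean masks. The sketch below assumes labels take only the values 0, 1 and the ignore value -1, and uses a hypothetical function name `fg_bg_accuracy`. It returns `None` in the case where the layer prints nothing.

```python
import numpy as np

def fg_bg_accuracy(score, label, ignore_label=-1):
    """score: (2, H, W) class scores; label: (H, W) with values 0, 1 or ignore_label."""
    score = np.asarray(score)
    label = np.asarray(label)
    valid = label != ignore_label
    pred_bg = score[0] > score[1]            # ties count as foreground, as in the loop
    is_bg = valid & (label == 0)
    is_fg = valid & (label == 1)
    num_bg, num_fg = int(is_bg.sum()), int(is_fg.sum())
    if num_bg == 0 or num_fg == 0:
        return None
    correct_bg = int((is_bg & pred_bg).sum())
    correct_fg = int((is_fg & ~pred_bg).sum())
    return (correct_bg / float(num_bg),
            correct_fg / float(num_fg),
            (correct_bg + correct_fg) / float(num_bg + num_fg))
```

Inside the layer this would be called as `fg_bg_accuracy(bottom[0].data[0], bottom[1].data[0, 0])`.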
--------------------------------------------------------------------------------
/python_layers/bbox_data_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 |
9 | class BboxDataLayer(caffe.Layer):
10 |
11 | def setup(self, bottom, top):
12 | # check input pair
13 | if len(bottom) != 2:
14 | raise Exception("Need two bottoms: label, rois")
15 | if len(top) != 1:
16 | raise Exception("Need one top: bbox_targets")
17 |
18 |
19 | def reshape(self, bottom, top):
20 | self.label = np.zeros_like(bottom[0].data, dtype=np.float32)
21 | self.rois = np.zeros_like(bottom[1].data, dtype=np.float32)
22 | self.bbox_targets = np.zeros([bottom[1].data.shape[0], 4], dtype=np.float32)
23 | top[0].reshape(self.rois.shape[0],4)
24 |
25 | def forward(self, bottom, top):
26 | self.label = bottom[0].data
27 | self.label = self.label.reshape([self.label.shape[1],self.label.shape[2]])
28 | self.rois = bottom[1].data
29 | self.box_num = self.rois.shape[0]
30 | gt_box = self.get_bbox(self.label)
31 |
32 | for i in range(self.box_num):
33 | bbgt = self.transform_box(gt_box, self.rois[i,1:5])
34 | bbgt = bbgt.reshape(1,4)
35 | self.bbox_targets[i,...] = bbgt
36 |
37 |
38 |
39 | top[0].reshape(self.box_num,4)
40 | top[0].data[...] = self.bbox_targets
41 |
42 |
43 |
44 | def backward(self, top, propagate_down, bottom):
45 | pass
46 |
47 |
48 | def get_bbox(self, label):
49 | label = np.array(label, dtype=np.uint8)
50 | pos = np.where(label == 1)
51 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
52 | #print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0]))
53 | return bb
54 |
55 |
56 | def transform_box(self,box,roi):
57 | x_a = (roi[0] + roi[2])*1.0/2
58 | y_a = (roi[1] + roi[3])*1.0/2
59 | w_a = (roi[2]-roi[0]+1)*1.0
60 | h_a = (roi[3]-roi[1]+1)*1.0
61 |
62 | bb = zeros([4,1],dtype=np.float32)
63 | bb[0] = (box[0] - x_a)*1.0/w_a
64 | bb[1] = (box[1] - y_a)*1.0/h_a
65 | bb[2] = box[2] - box[0] + 1
66 | bb[2] = math.log(bb[2]*1.0/w_a)
67 | bb[3] = box[3] - box[1] + 1
68 | bb[3] = math.log(bb[3]*1.0/h_a)
69 |
70 | #print >> sys.stderr, 'box = {}, box_transform = {}'.format(box, bb)
71 | return bb
72 |
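Note that `transform_box` above encodes the ground-truth left and top edges (`box[0]`, `box[1]`), whereas `recover_box` in `bbox_iou.py` treats the decoded values as a box center. For comparison, here is a minimal sketch of the center-based parameterization commonly used for bounding-box regression, written so that encoding followed by decoding returns the original box. The names `encode_box` and `decode_box` are hypothetical, and this is not claimed to be the variant used to train the released model.

```python
import math

def _center_size(box):
    """Center and size of an inclusive (x1, y1, x2, y2) box."""
    w = box[2] - box[0] + 1.0
    h = box[3] - box[1] + 1.0
    return (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0, w, h

def encode_box(gt, roi):
    """Regression targets (tx, ty, tw, th) of gt relative to roi."""
    x_a, y_a, w_a, h_a = _center_size(roi)
    x_g, y_g, w_g, h_g = _center_size(gt)
    return [(x_g - x_a) / w_a, (y_g - y_a) / h_a,
            math.log(w_g / w_a), math.log(h_g / h_a)]

def decode_box(target, roi):
    """Invert encode_box: recover (x1, y1, x2, y2) from targets and roi."""
    x_a, y_a, w_a, h_a = _center_size(roi)
    x = target[0] * w_a + x_a
    y = target[1] * h_a + y_a
    w = math.exp(target[2]) * w_a
    h = math.exp(target[3]) * h_a
    # (w - 1) / 2 keeps the inclusive-pixel convention consistent with _center_size.
    return [x - (w - 1) / 2.0, y - (h - 1) / 2.0,
            x + (w - 1) / 2.0, y + (h - 1) / 2.0]
```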
--------------------------------------------------------------------------------
/python_layers/bbox_iou.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 | import math
9 |
10 | class BBoxIoULayer(caffe.Layer):
11 |
12 | def setup(self, bottom, top):
13 | # check input pair
14 | if len(bottom) != 3:
15 | raise Exception("Need three bottoms: label, roi, bbox_pred")
16 |
17 |
18 | def reshape(self, bottom, top):
19 | self.label = np.zeros_like(bottom[0].data, dtype=np.float32)
20 | self.roi = np.zeros_like(bottom[1].data, dtype=np.float32)
21 | self.bbox_pred = np.zeros_like(bottom[2].data, dtype=np.float32)
22 |
23 |
24 |
25 | def forward(self, bottom, top):
26 | self.label = bottom[0].data
27 | self.label = self.label.reshape([self.label.shape[1],self.label.shape[2]])
28 | self.rois = bottom[1].data
29 | self.bbox_pred = bottom[2].data
30 | box_num = self.rois.shape[0]
31 | gt_box = self.get_bbox(self.label)
32 | # print >> sys.stderr, self.label.shape
33 |
34 | IoUs = np.zeros([box_num, 2], dtype=np.float32)
35 | for i in range(box_num):
36 | roi = self.rois[i,1:5]
37 | box = self.bbox_pred[i,...,0,0]
38 | box = self.recover_box(box,roi)
39 | iou_roi = self.func_iou(roi, gt_box)
40 | iou_pred = self.func_iou(box, gt_box)
41 | IoUs[i,0]= iou_roi
42 | IoUs[i,1]= iou_pred
43 |
44 | # print >> sys.stderr, 'gt = {}'.format(gt_box)
45 | ave_iou = np.sum(IoUs, axis=0)*1.0/box_num
46 |
47 | print >> sys.stderr,'IoU before bbrg = {}; IoU after bbrg = {};'.format(ave_iou[0], ave_iou[1])
48 |
49 |
50 | def backward(self, top, propagate_down, bottom):
51 | pass
52 |
53 |
54 | def func_iou(self, bb, gtbb):
55 | iou = 0
56 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1
57 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1
58 | if iw>0 and ih>0:
59 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih
60 | iou = np.float32(iw*ih*1.0/ua)
61 |
62 | return iou
63 |
64 |
65 |
66 | def get_bbox(self, label):
67 | label = np.array(label, dtype=np.uint8)
68 | pos = np.where(label == 1)
69 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
70 | print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0]))
71 | return bb
72 |
73 |
74 | def recover_box(self,box,roi):
75 | x_a = (roi[0] + roi[2])*1.0/2
76 | y_a = (roi[1] + roi[3])*1.0/2
77 | w_a = (roi[2]-roi[0]+1)*1.0
78 | h_a = (roi[3]-roi[1]+1)*1.0
79 |
80 | x = box[0]*w_a + x_a
81 | y = box[1]*h_a + y_a
82 | w = math.exp(box[2])*w_a
83 | h = math.exp(box[3])*h_a
84 |
85 | bb = zeros([4,1],dtype=np.float32)
86 | bb[0] = x - w*1.0/2
87 | bb[1] = y - h*1.0/2
88 | bb[2] = x + w*1.0/2
89 | bb[3] = y + h*1.0/2
90 |
91 | # print >> sys.stderr, 'box_transform = {}, box = {}'.format(box, bb)
92 | return bb
93 |
94 |
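The IoU computation in `func_iou` does not depend on Caffe, so it can be checked in isolation. The sketch below restates it as a standalone function with the same inclusive-pixel (`+ 1`) convention. `box_iou` is a hypothetical name used only for this example.

```python
def box_iou(bb, gtbb):
    """Intersection over union of two inclusive (x1, y1, x2, y2) boxes."""
    iw = min(bb[2], gtbb[2]) - max(bb[0], gtbb[0]) + 1
    ih = min(bb[3], gtbb[3]) - max(bb[1], gtbb[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0                       # boxes do not overlap
    area_a = (bb[2] - bb[0] + 1) * (bb[3] - bb[1] + 1)
    area_b = (gtbb[2] - gtbb[0] + 1) * (gtbb[3] - gtbb[1] + 1)
    union = area_a + area_b - iw * ih
    return iw * ih / float(union)
```

Two 10 x 10 boxes offset by five pixels in each direction overlap in a 5 x 5 region, giving 25 / 175.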
--------------------------------------------------------------------------------
/python_layers/compute_similarity.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 |
9 | class SimilarityLayer(caffe.Layer):
10 |
11 | def setup(self, bottom, top):
12 | # check input pair
13 | if len(bottom) != 2:
14 | raise Exception("Need two bottoms: bbox, feats")
15 |
16 |
17 | def reshape(self, bottom, top):
18 | self.bbox = np.zeros_like(bottom[0].data, dtype=np.float32)
19 | self.feats = np.zeros_like(bottom[1].data, dtype=np.float32)
20 | self.H = 480
21 | self.W = 854
22 | top[0].reshape(self.feats.shape[0],self.bbox.shape[0])
23 |
24 |
25 | def forward(self, bottom, top):
26 | self.feats = bottom[1].data
27 | self.bbox = bottom[0].data
28 | num_obj = self.bbox.shape[0]
29 | self.feat1 = self.feats[0:num_obj,...]
30 | self.feat2 = self.feats[num_obj:self.feats.shape[0],...]
31 | #print self.feat1.shape 300,4096,1,1
32 | #print >> sys.stderr, self.feat1
33 | #print >> sys.stderr, self.feat2
34 | print >> sys.stderr, 'Number of objects: {}'.format(num_obj)
35 |
36 | dist = np.zeros([self.feat2.shape[0], num_obj], dtype=np.float32)
37 | for obj_id in range(num_obj):
38 | for i in range(self.feat2.shape[0]):
39 | vector1 = np.float32(mat(self.feat1[obj_id,...,0]))
40 | vector2 = np.float32(mat(self.feat2[i,...,0]))
41 | dist[i, obj_id] = sqrt(((vector1-vector2).T)*(vector1-vector2))
42 |
43 | top[0].reshape(self.feat2.shape[0], num_obj)
44 | top[0].data[...] = dist
45 |
46 | def backward(self, top, propagate_down, bottom):
47 | pass
48 |
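The nested loops above compute a Euclidean distance for every (candidate, object) pair using `numpy.matrix` products. Assuming each feature can be flattened to a vector, the same distances can be obtained with broadcasting. This is an illustrative sketch with a hypothetical function name, not a drop-in replacement tested against the released model.

```python
import numpy as np

def pairwise_euclidean(feat_a, feat_b):
    """Return D with D[i, j] = ||feat_a[i] - feat_b[j]||_2."""
    a = np.asarray(feat_a, dtype=np.float64)
    b = np.asarray(feat_b, dtype=np.float64)
    a = a.reshape(a.shape[0], -1)        # flatten trailing singleton dimensions
    b = b.reshape(b.shape[0], -1)
    diff = a[:, None, :] - b[None, :, :]  # shape (len(a), len(b), feature_dim)
    return np.sqrt(np.sum(diff ** 2, axis=2))
```

To match the layout of `dist` in the layer (`dist[i, obj_id]`), call it as `pairwise_euclidean(self.feat2, self.feat1)`. The intermediate `diff` array holds `len(a) * len(b) * feature_dim` values, so for large inputs it may be preferable to process `feat_a` in slices.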
--------------------------------------------------------------------------------
/python_layers/davis2016_ROIdata_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 |
3 | import numpy as np
4 | from PIL import Image
5 | from scipy.misc import imresize
6 |
7 | import sys
8 |
9 | import random
10 |
11 | from numpy import *
12 | import math
13 | from scipy.misc import imresize
14 |
15 |
16 |
17 | class Davis2016ROIDataLayer(caffe.Layer):
18 |
19 | def setup(self, bottom, top):
20 | params = eval(self.param_str)
21 | self.davis_dir = params['davis_dir']
22 | self.split = params['split']
23 | self.mean = np.array(params['mean'])
24 | self.random = params.get('randomize', True)
25 | self.seed = params.get('seed', None)
26 | self.scale = params.get('scale', 1)
27 | self.fg_rate = params.get('fg_rate', 0.5)
28 | self.augment = params.get('with_augmentation', True)
29 | self.fg_random = params.get('fg_random', False)
30 | self.aug_params= np.array(params['aug_params']) #( aug_num, max_scale, max_rotate, max_translation, flip)
31 | self.H = 480
32 | self.W = 854
33 | self.pool_w = params.get('pool_w', 7)
34 | self.pool_h = params.get('pool_h', 7)
35 | self.box_num = params.get('box_num', 100)
36 | self.fg_thre = params.get('fg_thre', 0.7)
37 |
38 | if self.augment:
39 | self.aug_num = np.int(self.aug_params[0])
40 | self.max_scale = self.aug_params[1]
41 | self.max_rotate = self.aug_params[2]
42 | self.max_transW = self.aug_params[3]
43 | self.max_transH = self.aug_params[4]
44 | self.flip = (self.aug_params[5]>0)
45 |
46 | # tops
47 | if len(top) != 3:
48 | raise Exception("Need to define three tops: data, labels, weights")
49 | # data layers have no bottoms
50 | if len(bottom) != 0:
51 | raise Exception("Do not define a bottom.")
52 |
53 | # load indices for images and labels
54 | split_f = '{}/ImageSets/480p/{}.txt'.format(self.davis_dir,
55 | self.split)
56 | self.indices = open(split_f, 'r').read().splitlines()
57 | self.idx1 = -1 # we pick idx in reshape
58 | self.idx2 = -1
59 |
60 | # make eval deterministic
61 | if 'train' not in self.split:
62 | self.random = False
63 |
64 | # randomization: seed and pick
65 | if self.random:
66 | random.seed(self.seed)
67 | self.idx = random.randint(0, len(self.indices)-1)
68 |
69 |
70 | def reshape(self, bottom, top):
71 | while True:
72 | # pick next input
73 | if self.random:
74 | self.idx1 = random.randint(0, len(self.indices)-1)
75 | else:
76 | self.idx1 += 1
77 | if self.idx1 == len(self.indices):
78 | self.idx1 = 0
79 |
80 | idx1 = self.idx1
81 |
82 | #get clip name
83 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2]
84 |
85 | if self.augment == False or random.randint(0, self.aug_num) == 0:
86 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0])
87 | self.label1 = self.load_label(self.indices[idx1].split(' ')[1])
88 | self.img1 = self.img1.resize((self.W, self.H))  # PIL expects (width, height)
89 | self.label1 = imresize(self.label1, size=(self.H, self.W), interp="nearest")
90 | else:
91 | scale = (random.random()*2-1) * self.max_scale
92 | rotation = (random.random()*2-1) * self.max_rotate
93 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
94 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
95 | if self.flip:
96 | flip = (random.randint(0,1) > 0)
97 | else:
98 | flip = False
99 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
100 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
101 |
102 |
103 | if self.scale != 1:
104 | self.img1 = self.img1.resize((int(self.W*self.scale), int(self.H*self.scale)))  # PIL expects (width, height)
105 | print >> sys.stderr, 'SCALE {}'.format(self.scale)
106 |
107 | if np.max(self.label1) == 0:
108 | continue
109 |
110 | if np.sum(self.get_bbox(self.label1)) == 0:
111 | # skip the frame if its object is too small
112 | continue
113 |
114 | if self.fg_random:
115 | self.rois = self.get_rois(self.label1, self.box_num, 1)
116 | else:
117 | self.rois = self.get_bboxes(self.label1, self.box_num)
118 |
119 | self.img_rois = self.get_img_rois(self.img1, self.rois)
120 | self.lab_rois = self.get_lab_rois(self.label1, self.rois)
121 | self.weights = self.calculate_weight_rois(self.lab_rois)
122 |
123 | break
124 |
125 | # reshape tops
126 | top[0].reshape(self.box_num, 3, self.pool_h, self.pool_w)
127 | top[1].reshape(self.box_num, self.pool_h, self.pool_w)
128 | top[2].reshape(self.box_num, self.pool_h, self.pool_w)
129 |
130 |
131 |
132 | def forward(self, bottom, top):
133 | # assign output
134 | top[0].data[...] = self.img_rois
135 | top[1].data[...] = self.lab_rois
136 | top[2].data[...] = self.weights
137 |
138 | def backward(self, top, propagate_down, bottom):
139 | pass
140 |
141 |
142 | def load_image(self, idx):
143 | """
144 | Load input image and resize it to (W, H); returns a PIL image.
145 | Caffe preprocessing is deferred to get_img_rois, which for each ROI will:
146 | - cast to float and switch channels RGB -> BGR
147 | - subtract mean
148 | - transpose to channel x height x width order
149 | """
150 | print >> sys.stderr, 'loading {}'.format(idx)
151 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
152 | im = im.resize((self.W, self.H))
153 | # in_ = np.array(im, dtype=np.float32)
154 | # in_ = in_[:,:,::-1]
155 | # in_ -= self.mean
156 | # in_ = in_.transpose((2,0,1))
157 | return im
158 |
159 |
160 | def load_label(self, idx):
161 | """
162 | Load label image as a height x width uint8 array of label indices.
163 | No leading singleton dimension is added here.
164 | """
165 | print >> sys.stderr, 'loading {}'.format(idx)
166 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
167 | im = im.resize((self.W, self.H))
168 | if self.scale != 1:
169 | im = im.resize((np.int(self.W*self.scale), np.int(self.H*self.scale)))
170 |
171 | label = np.array(im, dtype=np.uint8)
172 | # label = label[np.newaxis, ...]
173 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
174 | return label
175 |
176 |
177 |
178 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
179 | img_W = np.int( self.W*(1.0 + scale) )
180 | img_H = np.int( self.H*(1.0 + scale) )
181 |
182 | print >> sys.stderr, 'loading {}'.format(idx)
183 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)
184 |
185 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
186 | im = im.resize((img_W,img_H))
187 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
188 | im = im.rotate(rotation)
189 | if flip:
190 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
191 |
192 | if scale>0:
193 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
194 | im = im.crop(box)
195 | else:
196 | im = im.resize((self.W, self.H))
197 |
198 | # in_ = np.array(im, dtype=np.float32)
199 | # in_ = in_[:,:,::-1]
200 | # in_ -= self.mean
201 |
202 | return im
203 |
204 |
205 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
206 | img_W = np.int( self.W*(1.0 + scale) )
207 | img_H = np.int( self.H*(1.0 + scale) )
208 |
209 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
210 | im = im.resize((img_W,img_H))
211 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
212 | im = im.rotate(rotation)
213 | if flip:
214 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
215 |
216 | if scale>0:
217 | # w_start = np.int(random.random()*(img_W - self.W))
218 | # h_start = np.int(random.random()*(img_H - self.H))
219 | # box = (w_start, h_start, w_start+self.W, h_start+self.H)
220 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
221 | im = im.crop(box)
222 | else:
223 | im = im.resize((self.W, self.H))
224 |
225 | if self.scale != 1:
226 | im = im.resize((np.int(self.W*self.scale), np.int(self.H*self.scale)))
227 |
228 | label = np.array(im, dtype=np.uint8)
229 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
230 |
231 | return label
232 |
233 |
234 | def get_bboxes(self, label, box_num):
235 | label = np.array(label, dtype=np.uint8)
236 | pos = np.where(label > 0 )
237 | rois = np.zeros((box_num, 5))
238 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
239 | for k in range(box_num):
240 | rois[k, 1:5] = bb
241 | return rois
242 |
243 |
244 | def get_bbox(self, label):
245 | label = np.array(label, dtype=np.uint8)
246 | pos = np.where(label > 0 )
247 | if len(pos[0])<1024:
248 | print >> sys.stderr, 'Skipping very small object (area < 1024 pixels).'
249 | bb = [0, 0, 0, 0]
250 | return bb
251 | else:
252 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
253 | print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0]))
254 | return bb
255 |
256 |
257 | def get_rois(self, label, box_num, fg_rate):
258 | label = np.array(label, dtype=np.uint8)
259 | pos = np.where(label > 0)
260 | W = label.shape[1]
261 | H = label.shape[0]
262 | print>>sys.stderr, label.shape
263 | rois = np.zeros((box_num, 5))
264 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
265 | box = np.zeros_like(gtbb)
266 | bb = np.zeros((1,5))
267 | k = 0
268 | while k < np.int(box_num*fg_rate):
269 | bb[0, 1] = random.randint(0,W)
270 | bb[0, 2] = random.randint(0,H)
271 | bb[0, 3] = random.randint(bb[0, 1],W)
272 | bb[0, 4] = random.randint(bb[0, 2],H)
273 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
274 | continue
275 | iou = self.func_iou(bb[0, 1:5], gtbb)
276 | # print>>sys.stderr, 'gt = {}; proposal = {}; IoU = {};'.format(gtbb, bb, iou)
277 | if iou > self.fg_thre:
278 | rois[k,...] = bb
279 | k = k + 1
280 | # print>>sys.stderr, 'box_num = {}; gt = {}; proposal = {}; IoU = {};'.format(k, gtbb, bb[0, 1:5], iou)
281 |
282 | while k < box_num:
283 | bb[0, 1] = random.randint(0,W)
284 | bb[0, 2] = random.randint(0,H)
285 | bb[0, 3] = random.randint(bb[0, 1],W)
286 | bb[0, 4] = random.randint(bb[0, 2],H)
287 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
288 | continue
289 | iou = self.func_iou(bb[0, 1:5], gtbb)
290 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou)
291 | if iou < 0.5:
292 | rois[k,...] = bb
293 | k = k + 1
294 |
295 | return rois
296 |
297 |
298 |
299 | def func_iou(self, bb, gtbb):
300 | iou = 0
301 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1
302 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1
303 | if iw>0 and ih>0:
304 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih
305 | iou = np.float32(iw*ih*1.0/ua)
306 |
307 | return iou
308 |
309 | def get_img_rois(self, img, rois):
310 | #img = Image.fromarray(img)
311 | img_rois = np.zeros((self.box_num, 3, self.pool_h, self.pool_w))
312 | for i in range(self.box_num):
313 | bb = rois[i,1:5]
314 | ROI = img.crop(bb)
315 | ROI = ROI.resize((np.int(self.pool_w), np.int(self.pool_h)))
316 | # print>>sys.stderr, ROI.shape
317 | in_ = np.array(ROI, dtype=np.float32)
318 | in_ = in_[:,:,::-1]
319 | in_ -= self.mean
320 | ROI = in_.transpose((2,0,1))
321 | img_rois[i,...] = ROI
322 |
323 | return img_rois
324 |
325 |
326 | def get_lab_rois(self, lab, rois):
327 | lab = Image.fromarray(lab)
328 | lab_rois = np.zeros((self.box_num, self.pool_h, self.pool_w))
329 | for i in range(self.box_num):
330 | bb = rois[i,1:5]
331 | ROI = lab.crop(bb)
332 | ROI = ROI.resize((np.int(self.pool_w), np.int(self.pool_h)), Image.NEAREST)  # nearest neighbour keeps label values intact
333 | ROI = np.array(ROI, dtype=np.uint8)
334 | ROI = np.uint8(ROI>0)
335 | lab_rois[i,...] = ROI
336 |
337 | return lab_rois
338 |
339 |
340 | def calculate_weight_rois(self, labels):
341 | weights = np.ones((self.box_num, self.pool_h, self.pool_w))
342 | for i in range(self.box_num):
343 | label = labels[i,...]
344 | pos = np.where(label==1)
345 | neg = np.where(label==0)
346 | weight_pos = len(pos[0])*1.0/(len(pos[0])+len(neg[0]))
347 | # vectorized: foreground pixels are down-weighted by the foreground fraction
348 | weights[i][label == 1] = 1 - weight_pos
349 | # print >> sys.stderr, 'pos_num = {}, neg_num = {}, weight_pos = {}'.format(len(pos[0]), len(neg[0]), 1 - weight_pos)
350 |
351 | return weights
352 |
353 |
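The scale augmentation in `load_image_transform` and `load_label_transform` above enlarges the frame by a factor of `1 + scale` and, when `scale > 0`, crops a centred `W x H` window back out. The following is a minimal standalone sketch of that box arithmetic, not the layer's own code. The helper names `scaled_size` and `center_crop_box` are hypothetical and do not exist in the repository.

```python
def scaled_size(width, height, scale):
    """Size of the frame after resizing by a factor of (1 + scale)."""
    return int(width * (1.0 + scale)), int(height * (1.0 + scale))


def center_crop_box(width, height, scale):
    """Return the PIL-style (left, upper, right, lower) box that cuts the
    centred width x height window out of the enlarged frame (scale > 0)."""
    img_w, img_h = scaled_size(width, height, scale)
    left = (img_w - width) // 2
    upper = (img_h - height) // 2
    return (left, upper, left + width, upper + height)


# DAVIS 480p frame enlarged by 25 percent, then cropped back to 854 x 480.
box = center_crop_box(854, 480, 0.25)
```

Using the same box for the image and its label keeps the two pixel-aligned after the crop.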
--------------------------------------------------------------------------------
/python_layers/davis2016_fgbg_data_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 |
3 | import numpy as np
4 | from PIL import Image
5 |
6 | import cv2
7 | from scipy.misc import imresize
8 | from scipy.misc import imrotate
9 |
10 | import sys
11 |
12 | import random
13 |
14 | class DAVIS2016FgBgDataLayer(caffe.Layer):
15 | """
16 | Load (input image, label image, weight map) triples from DAVIS 2016
17 | one-at-a-time while reshaping the net to preserve dimensions.
18 |
19 | Use this to feed data to a fully convolutional network.
20 | """
21 |
22 | def setup(self, bottom, top):
23 | # config
24 | params = eval(self.param_str)
25 | self.davis_dir = params['davis_dir']
26 | self.split = params['split']
27 | self.mean = np.array(params['mean'])
28 | self.random = params.get('randomize', True)
29 | self.seed = params.get('seed', None)
30 | self.scale = params.get('scale', 1)
31 | self.augment = params.get('with_augmentation', True)
32 | self.aug_params = np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip)
33 | self.H = 480
34 | self.W = 854
35 |
36 |
37 | # three tops: data, label and weight
38 | if len(top) != 3:
39 | raise Exception("Need to define three tops: data, label, weight")
40 | # data layers have no bottoms
41 | if len(bottom) != 0:
42 | raise Exception("Do not define a bottom.")
43 |
44 | # load indices for images and labels
45 | split_f = '{}/ImageSets/480p/{}.txt'.format(self.davis_dir,
46 | self.split)
47 | self.indices = open(split_f, 'r').read().splitlines()
48 | self.idx = -1 # we pick idx in reshape
49 |
50 | if self.augment:
51 | self.aug_num = np.int(self.aug_params[0])
52 | self.max_scale = self.aug_params[1]
53 | self.max_rotate = self.aug_params[2]
54 | self.max_transW = self.aug_params[3]
55 | self.max_transH = self.aug_params[4]
56 | self.flip = (self.aug_params[5]>0)
57 |
58 |
59 | # make eval deterministic
60 | if 'train' not in self.split:
61 | self.random = False
62 |
63 | # randomization: seed and pick
64 | if self.random:
65 | random.seed(self.seed)
66 | self.idx = random.randint(0, len(self.indices)-1)
67 |
68 |
69 | def reshape(self, bottom, top):
70 |
71 | while True:
72 | # pick next input
73 | if self.random:
74 | self.idx = random.randint(0, len(self.indices)-1)
75 | else:
76 | self.idx += 1
77 | if self.idx == len(self.indices):
78 | self.idx = 0
79 |
80 |
81 | if self.idx == (len(self.indices) - 1):
82 | continue
83 |
84 | idx = self.idx
85 |
86 | if self.augment == False or random.randint(0, self.aug_num) == 0:
87 | self.img = self.load_image(self.indices[idx].split(' ')[0])
88 | self.label = self.load_label(self.indices[idx].split(' ')[1])
89 | self.img = cv2.resize(self.img, (self.W, self.H), interpolation=cv2.INTER_LINEAR)  # imresize would rescale the mean-subtracted floats to uint8 [0, 255]
90 | self.label = imresize(self.label, size=(self.H, self.W), interp="nearest")
91 | else:
92 | scale = (random.random()*2-1) * self.max_scale
93 | rotation = (random.random()*2-1) * self.max_rotate
94 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
95 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
96 | if self.flip:
97 | flip = (random.randint(0,1) > 0)
98 | else:
99 | flip = False
100 | self.img = self.load_image_transform(self.indices[idx].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
101 | self.label = self.load_label_transform(self.indices[idx].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
102 |
103 |
104 | if self.scale != 1:
105 | self.img = cv2.resize(self.img, (np.int(self.W*self.scale), np.int(self.H*self.scale)), interpolation=cv2.INTER_LINEAR)  # keeps float values, unlike imresize
106 | self.label = imresize(self.label, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest")
107 |
108 |
109 | self.weight = self.calculate_weight(self.label)
110 | self.img = self.img.transpose((2,0,1))
111 | self.label = self.label[np.newaxis, ...]
112 | self.weight = self.weight[np.newaxis, ...]
113 | break
114 |
115 | # reshape tops to fit (leading 1 is the batch dimension)
116 | top[0].reshape(1, *self.img.shape)
117 | top[1].reshape(1, *self.label.shape)
118 | top[2].reshape(1, *self.weight.shape)
119 |
120 | def forward(self, bottom, top):
121 | top[0].data[...] = self.img
122 | top[1].data[...] = self.label
123 | top[2].data[...] = self.weight
124 |
125 | def backward(self, top, propagate_down, bottom):
126 | pass
127 |
128 |
129 | def load_image(self, idx):
130 | """
131 | Load input image and preprocess for Caffe:
132 | - cast to float
133 | - switch channels RGB -> BGR
134 | - subtract mean
135 | - transpose to channel x height x width order
136 | """
137 | print >> sys.stderr, 'loading Original {}'.format(idx)
138 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
139 | in_ = np.array(im, dtype=np.float32)
140 | in_ = in_[:,:,::-1]
141 | in_ -= self.mean
142 | return in_
143 |
144 |
145 | def load_label(self, idx):
146 | """
147 | Load label image as a height x width binary (0/1) uint8 array.
148 | The leading singleton dimension required by the loss is added in reshape.
149 | """
150 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
151 | label = np.array(np.array(im) > 0, dtype=np.uint8)  # compare pixel values, not the PIL Image object
152 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
153 |
154 | #print(label)
155 | return label
156 |
157 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
158 | img_W = np.int( self.W*(1.0 + scale) )
159 | img_H = np.int( self.H*(1.0 + scale) )
160 |
161 | print >> sys.stderr, 'loading {}'.format(idx)
162 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)
163 |
164 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
165 | im = im.resize((img_W,img_H))
166 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
167 | im = im.rotate(rotation)
168 | if flip:
169 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
170 |
171 | if scale>0:
172 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
173 | im = im.crop(box)
174 | else:
175 | im = im.resize((self.W, self.H))
176 |
177 |
178 | in_ = np.array(im, dtype=np.float32)
179 | in_ = in_[:,:,::-1]
180 | in_ -= self.mean
181 |
182 | return in_
183 |
184 |
185 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
186 | img_W = np.int( self.W*(1.0 + scale) )
187 | img_H = np.int( self.H*(1.0 + scale) )
188 |
189 |
190 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
191 | im = im.resize((img_W,img_H))
192 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
193 | im = im.rotate(rotation)
194 | if flip:
195 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
196 |
197 | if scale>0:
198 | w_start = np.int((img_W - self.W)/2)  # centred crop, same box as load_image_transform
199 | h_start = np.int((img_H - self.H)/2)  # so image and label stay aligned
200 | box = (w_start, h_start, w_start+self.W, h_start+self.H)
201 | im = im.crop(box)
202 | else:
203 | im = im.resize((self.W, self.H))
204 |
205 | label = np.array(np.array(im) > 0, dtype=np.uint8)  # compare pixel values, not the PIL Image object
206 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
207 |
208 |
209 | return label
210 |
211 |
212 |
213 | def calculate_weight(self, annt):
214 | weight = np.zeros_like(annt, dtype = np.float32) + 1
215 | pos = np.where(annt==1)
216 | neg = np.where(annt==0)
217 | pos_weight = 1 - np.float32(len(pos[0]))/np.float32(len(pos[0])+len(neg[0]))
218 | weight = weight - pos_weight
219 | print >> sys.stderr, 'pos_weight: {}'.format(pos_weight)
220 |
221 | # vectorized: positive pixels receive pos_weight
222 | weight[annt == 1] = pos_weight
223 |
224 | #print(weight)
225 | return weight
226 |
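`calculate_weight` above builds a class-balanced weight map: foreground pixels receive the background fraction and background pixels receive the foreground fraction. The following is a minimal NumPy sketch of that scheme, assuming a strictly binary mask (values 0 and 1 only). The function name `balanced_weights` is hypothetical and not part of the repository.

```python
import numpy as np


def balanced_weights(mask):
    """Per-pixel loss weights for a binary mask (1 = foreground, 0 = background).

    Foreground pixels get the background fraction and background pixels get
    the foreground fraction, so the two classes carry equal total weight.
    """
    mask = np.asarray(mask)
    fg_fraction = float(np.count_nonzero(mask == 1)) / mask.size
    weights = np.full(mask.shape, fg_fraction, dtype=np.float32)
    weights[mask == 1] = 1.0 - fg_fraction
    return weights


# One foreground pixel out of eight: it alone weighs as much as the other seven.
mask = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 0]], dtype=np.uint8)
weights = balanced_weights(mask)
```

Boolean-mask assignment replaces the per-pixel Python loop, which matters for full 854 x 480 frames.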
--------------------------------------------------------------------------------
/python_layers/davis2016_siamese_data_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 |
3 | import numpy as np
4 | from PIL import Image
5 | from scipy.misc import imresize
6 |
7 | import sys
8 |
9 | import random
10 |
11 | class Davis2016SiameseDataLayer(caffe.Layer):
12 |
13 | def setup(self, bottom, top):
14 | params = eval(self.param_str)
15 | self.davis_dir = params['davis_dir']
16 | self.split = params['split']
17 | self.mean = np.array(params['mean'])
18 | self.random = params.get('randomize', True)
19 | self.seed = params.get('seed', None)
20 | self.scale = params.get('scale', 1)
21 | self.fg_rate = params.get('fg_rate', 0.5)
22 | self.augment = params.get('with_augmentation', True)
23 | self.fg_random = params.get('fg_random', False)
24 | self.aug_params= np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip)
25 | self.H = 480
26 | self.W = 854
27 | self.box_num = params.get('box_num', 100)
28 | self.fg_thre = params.get('fg_thre', 0.7)
29 |
30 | if self.augment:
31 | self.aug_num = np.int(self.aug_params[0])
32 | self.max_scale = self.aug_params[1]
33 | self.max_rotate = self.aug_params[2]
34 | self.max_transW = self.aug_params[3]
35 | self.max_transH = self.aug_params[4]
36 | self.flip = (self.aug_params[5]>0)
37 |
38 | # tops
39 | if len(top) != 5:
40 | raise Exception("Need to define five tops: data, labels, label_sim, bbox, rois")
41 | # data layers have no bottoms
42 | if len(bottom) != 0:
43 | raise Exception("Do not define a bottom.")
44 |
45 | # load indices for images and labels
46 | split_f = '{}/ImageSets/480p/{}.txt'.format(self.davis_dir,
47 | self.split)
48 | self.indices = open(split_f, 'r').read().splitlines()
49 | self.idx1 = -1 # we pick idx in reshape
50 | self.idx2 = -1
51 |
52 | # make eval deterministic
53 | if 'train' not in self.split:
54 | self.random = False
55 |
56 | # randomization: seed and pick
57 | if self.random:
58 | random.seed(self.seed)
59 | self.idx = random.randint(0, len(self.indices)-1)
60 |
61 |
62 | def reshape(self, bottom, top):
63 | while True:
64 | # pick next input
65 | if self.random:
66 | self.idx1 = random.randint(0, len(self.indices)-1)
67 | self.idx2 = random.randint(0, len(self.indices)-1)
68 | else:
69 | self.idx1 += 1
70 | self.idx2 += 1
71 | if self.idx1 == len(self.indices):
72 | self.idx1 = 0
73 | if self.idx2 == len(self.indices):
74 | self.idx2 = 0
75 |
76 | idx1 = self.idx1
77 | idx2 = self.idx2
78 |
79 | #get clip name
80 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2]
81 | clip2 = self.indices[idx2].split(' ')[0].split('/')[-2]
82 |
83 | if clip1 != clip2:
84 | continue
85 |
86 |
87 | if self.augment == False or random.randint(0, self.aug_num) == 0:
88 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0])
89 | self.img2 = self.load_image(self.indices[idx2].split(' ')[0])
90 | self.label1 = self.load_label(self.indices[idx1].split(' ')[1])
91 | self.label2 = self.load_label(self.indices[idx2].split(' ')[1])
92 | self.img1 = imresize(self.img1, size=(self.H, self.W), interp="bilinear")
93 | self.label1 = imresize(self.label1, size=(self.H, self.W), interp="nearest")
94 | self.img2 = imresize(self.img2, size=(self.H, self.W), interp="bilinear")
95 | self.label2 = imresize(self.label2, size=(self.H, self.W), interp="nearest")
96 | else:
97 | scale = (random.random()*2-1) * self.max_scale
98 | rotation = (random.random()*2-1) * self.max_rotate
99 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
100 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
101 | if self.flip:
102 | flip = (random.randint(0,1) > 0)
103 | else:
104 | flip = False
105 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
106 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
107 |
108 | scale = (random.random()*2-1) * self.max_scale
109 | rotation = (random.random()*2-1) * self.max_rotate
110 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
111 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
112 | if self.flip:
113 | flip = (random.randint(0,1) > 0)
114 | else:
115 | flip = False
116 | self.img2 = self.load_image_transform(self.indices[idx2].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
117 | self.label2 = self.load_label_transform(self.indices[idx2].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
118 |
119 |
120 | if self.scale != 1:
121 | self.img1 = imresize(self.img1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear")
122 | # self.label1 = imresize(self.label1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest", mode='L')
123 | self.img2 = imresize(self.img2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear")
124 | # self.label2 = imresize(self.label2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest", mode='L')
125 | print >> sys.stderr, 'SCALE {}'.format(self.scale)
126 |
127 |
128 | if np.max(self.label1) == 0 or np.max(self.label2) == 0:
129 | continue
130 |
131 | if np.sum(self.get_bbox(self.label1)) == 0 or np.sum(self.get_bbox(self.label2)) == 0:
132 | # if objects in either image are too small
133 | continue
134 |
135 | if self.fg_random:
136 | self.bbox = self.get_rois(self.label1, self.box_num, 1)
137 | else:
138 | self.bbox = self.get_bboxes(self.label1, self.box_num)
139 | self.rois = self.get_rois(self.label2, self.box_num, self.fg_rate)
140 | self.label_sim = np.zeros([self.box_num, 1], dtype = np.uint8)
141 |
142 | for i in range(np.int(self.box_num*self.fg_rate)):
143 | self.label_sim[i] = 1
144 |
145 | # print >> sys.stderr,'label = {}'.format(self.label_sim)
146 | self.img1 = self.img1.transpose((2,0,1))
147 | self.img2 = self.img2.transpose((2,0,1))
148 |
149 | self.label1 = np.uint8(self.label1>0)
150 | self.label2 = np.uint8(self.label2>0)
151 | break
152 |
153 | # reshape tops
154 | top[0].reshape(2, *self.img1.shape)
155 | top[1].reshape(2, *self.label1.shape)
156 | top[2].reshape(self.box_num, 1)
157 | top[3].reshape(self.box_num, 5)
158 | top[4].reshape(self.box_num, 5)
159 |
160 |
161 | print >> sys.stderr,self.label1.shape
162 | print >> sys.stderr,self.label2.shape
163 |
164 | def forward(self, bottom, top):
165 | # assign output
166 | top[0].data[...] = np.concatenate((self.img1[np.newaxis, ...], self.img2[np.newaxis, ...]), axis = 0)
167 | top[1].data[...] = np.concatenate((self.label1[np.newaxis, ...], self.label2[np.newaxis, ...]), axis = 0)
168 | top[2].data[...] = self.label_sim
169 | top[3].data[...] = self.bbox
170 | top[4].data[...] = self.rois
171 |
172 | def backward(self, top, propagate_down, bottom):
173 | pass
174 |
175 |
176 | def load_image(self, idx):
177 | """
178 | Load input image and preprocess for Caffe:
179 | - cast to float
180 | - switch channels RGB -> BGR
181 | - subtract mean
182 | - transpose to channel x height x width order
183 | """
184 | print >> sys.stderr, 'loading {}'.format(idx)
185 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
186 | im = im.resize((self.W, self.H))
187 | in_ = np.array(im, dtype=np.float32)
188 | in_ = in_[:,:,::-1]
189 | in_ -= self.mean
190 | # in_ = in_.transpose((2,0,1))
191 | return in_
192 |
193 |
194 | def load_label(self, idx):
195 | """
196 | Load label image as a height x width uint8 array of label indices.
197 | No leading singleton dimension is added here.
198 | """
199 | print >> sys.stderr, 'loading {}'.format(idx)
200 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
201 | im = im.resize((self.W, self.H))
202 | if self.scale != 1:
203 | im = im.resize((np.int(self.W*self.scale), np.int(self.H*self.scale)))
204 |
205 | label = np.array(im, dtype=np.uint8)
206 | # label = label[np.newaxis, ...]
207 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
208 | return label
209 |
210 |
211 |
212 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
213 | img_W = np.int( self.W*(1.0 + scale) )
214 | img_H = np.int( self.H*(1.0 + scale) )
215 |
216 | print >> sys.stderr, 'loading {}'.format(idx)
217 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)
218 |
219 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
220 | im = im.resize((img_W,img_H))
221 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
222 | im = im.rotate(rotation)
223 | if flip:
224 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
225 |
226 | if scale>0:
227 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
228 | im = im.crop(box)
229 | else:
230 | im = im.resize((self.W, self.H))
231 |
232 | in_ = np.array(im, dtype=np.float32)
233 | in_ = in_[:,:,::-1]
234 | in_ -= self.mean
235 |
236 | return in_
237 |
238 |
239 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
240 | img_W = np.int( self.W*(1.0 + scale) )
241 | img_H = np.int( self.H*(1.0 + scale) )
242 |
243 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
244 | im = im.resize((img_W,img_H))
245 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
246 | im = im.rotate(rotation)
247 | if flip:
248 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
249 |
250 | if scale>0:
251 | w_start = np.int((img_W - self.W)/2)  # centred crop, same box as load_image_transform
252 | h_start = np.int((img_H - self.H)/2)  # so image and label stay aligned
253 | box = (w_start, h_start, w_start+self.W, h_start+self.H)
254 | im = im.crop(box)
255 | else:
256 | im = im.resize((self.W, self.H))
257 |
258 | if self.scale != 1:
259 | im = im.resize((np.int(self.W*self.scale), np.int(self.H*self.scale)))
260 |
261 | label = np.array(im, dtype=np.uint8)
262 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
263 |
264 | return label
265 |
266 |
267 | def get_bboxes(self, label, box_num):
268 | label = np.array(label, dtype=np.uint8)
269 | pos = np.where(label > 0 )
270 | rois = np.zeros((box_num, 5))
271 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
272 | for k in range(box_num):
273 | rois[k, 1:5] = bb
274 | return rois
275 |
276 |
277 | def get_bbox(self, label):
278 | label = np.array(label, dtype=np.uint8)
279 | pos = np.where(label > 0 )
280 | if len(pos[0])<1024:
281 | print >> sys.stderr, 'Skipping very small object (area < 1024 pixels).'
282 | bb = [0, 0, 0, 0]
283 | return bb
284 | else:
285 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
286 | print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0]))
287 | return bb
288 |
289 |
290 | def get_rois(self, label, box_num, fg_rate):
291 | label = np.array(label, dtype=np.uint8)
292 | pos = np.where(label > 0)
293 | W = label.shape[1]
294 | H = label.shape[0]
295 | print>>sys.stderr, label.shape
296 | rois = np.zeros((box_num, 5))
297 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
298 | box = np.zeros_like(gtbb)
299 | bb = np.zeros((1,5))
300 | k = 0
301 | while k < np.int(box_num*fg_rate):
302 | bb[0, 1] = random.randint(0,W)
303 | bb[0, 2] = random.randint(0,H)
304 | bb[0, 3] = random.randint(bb[0, 1],W)
305 | bb[0, 4] = random.randint(bb[0, 2],H)
306 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
307 | continue
308 | iou = self.func_iou(bb[0, 1:5], gtbb)
309 | # print>>sys.stderr, 'gt = {}; proposal = {}; IoU = {};'.format(gtbb, bb, iou)
310 | if iou > self.fg_thre:
311 | rois[k,...] = bb
312 | k = k + 1
313 | # print>>sys.stderr, 'box_num = {}; gt = {}; proposal = {}; IoU = {};'.format(k, gtbb, bb[0, 1:5], iou)
314 |
315 | while k < box_num:
316 | bb[0, 1] = random.randint(0,W)
317 | bb[0, 2] = random.randint(0,H)
318 | bb[0, 3] = random.randint(bb[0, 1],W)
319 | bb[0, 4] = random.randint(bb[0, 2],H)
320 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
321 | continue
322 | iou = self.func_iou(bb[0, 1:5], gtbb)
323 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou)
324 | if iou < 0.5:
325 | rois[k,...] = bb
326 | k = k + 1
327 |
328 | return rois
329 |
330 |
331 |
332 | def func_iou(self, bb, gtbb):
333 | iou = 0
334 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1
335 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1
336 | if iw>0 and ih>0:
337 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih
338 | iou = np.float32(iw*ih*1.0/ua)
339 |
340 | return iou
341 |
342 |
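`get_rois` and `func_iou` above draw random boxes and keep them by IoU with the ground-truth box: the first `int(box_num * fg_rate)` must exceed `fg_thre`, the rest must fall below 0.5, and `label_sim` marks the first group as positives. The following is a minimal pure-Python sketch of that rejection sampling, not the layer's own code. The names `box_iou` and `sample_boxes` are hypothetical, and unlike the layer the sketch caps coordinates at `width - 1` and `height - 1` so that every box stays inside the frame.

```python
import random


def box_iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2) with inclusive pixel coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return float(iw * ih) / (area_a + area_b - iw * ih)


def sample_boxes(gt, width, height, num, fg_rate, fg_thre=0.7, bg_thre=0.5,
                 min_area=256, rng=random):
    """Rejection-sample `num` boxes: the first int(num * fg_rate) have
    IoU > fg_thre with `gt`, the remaining ones have IoU < bg_thre."""
    num_fg = int(num * fg_rate)
    boxes, labels = [], []
    while len(boxes) < num:
        x1 = rng.randint(0, width - 1)
        y1 = rng.randint(0, height - 1)
        x2 = rng.randint(x1, width - 1)
        y2 = rng.randint(y1, height - 1)
        if (x2 - x1 + 1) * (y2 - y1 + 1) < min_area:
            continue  # discard tiny proposals, as the layer does (8*8*4 pixels)
        iou = box_iou((x1, y1, x2, y2), gt)
        want_fg = len(boxes) < num_fg
        if (want_fg and iou > fg_thre) or (not want_fg and iou < bg_thre):
            boxes.append((x1, y1, x2, y2))
            labels.append(1 if want_fg else 0)
    return boxes, labels


# Small frame and a seeded generator keep the example fast and repeatable.
gt_box = (8, 8, 55, 40)
boxes, labels = sample_boxes(gt_box, 64, 48, num=4, fg_rate=0.5,
                             rng=random.Random(0))
```

Rejection sampling can take many draws when the ground-truth box is small or `fg_thre` is high, which is one reason the layer skips objects with fewer than 1024 pixels.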
--------------------------------------------------------------------------------
/python_layers/davis2017_ROIdata_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 |
3 | import numpy as np
4 | from PIL import Image
5 | from scipy.misc import imresize
6 |
7 | import sys
8 |
9 | from numpy import *
10 |
11 | import random  # must follow the star import, which otherwise rebinds `random` to numpy.random
12 | import math
13 | from scipy.misc import imresize
14 |
15 | import time
16 |
17 |
18 | class Davis2017ROIDataLayer(caffe.Layer):
19 |
20 | def setup(self, bottom, top):
21 | params = eval(self.param_str)
22 | self.davis_dir = params['davis_dir']
23 | self.split = params['split']
24 | self.mean = np.array(params['mean'])
25 | self.random = params.get('randomize', True)
26 | self.seed = params.get('seed', None)
27 | self.scale = params.get('scale', 1)
28 | self.fg_rate = params.get('fg_rate', 0.5)
29 | self.augment = params.get('with_augmentation', True)
30 | self.fg_random = params.get('fg_random', False)
31 | self.aug_params= np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip)
32 | self.H = 480
33 | self.W = 854
34 | self.pool_w = params.get('pool_w', 7)
35 | self.pool_h = params.get('pool_h', 7)
36 | self.box_num = params.get('box_num', 100)
37 | self.fg_thre = params.get('fg_thre', 0.7)
38 |
39 | if self.augment:
40 | self.aug_num = np.int(self.aug_params[0])
41 | self.max_scale = self.aug_params[1]
42 | self.max_rotate = self.aug_params[2]
43 | self.max_transW = self.aug_params[3]
44 | self.max_transH = self.aug_params[4]
45 | self.flip = (self.aug_params[5]>0)
46 |
47 | # tops
48 | if len(top) != 3:
49 | raise Exception("Need to define three tops: data, labels, weights")
50 | # data layers have no bottoms
51 | if len(bottom) != 0:
52 | raise Exception("Do not define a bottom.")
53 |
54 | # load indices for images and labels
55 | split_f = '{}/ImageSets/2017/{}.txt'.format(self.davis_dir,
56 | self.split)
57 | self.indices = open(split_f, 'r').read().splitlines()
58 | self.idx1 = -1 # we pick idx in reshape
59 | self.idx2 = -1
60 |
61 | # make eval deterministic
62 | if 'train' not in self.split:
63 | self.random = False
64 |
65 | # randomization: seed and pick
66 | if self.random:
67 | random.seed(self.seed)
68 | self.idx = random.randint(0, len(self.indices)-1)
69 |
70 |
71 | def reshape(self, bottom, top):
72 | while True:
73 | # pick next input
74 | if self.random:
75 | self.idx1 = random.randint(0, len(self.indices)-1)
76 | else:
77 | self.idx1 += 1
78 | if self.idx1 == len(self.indices):
79 | self.idx1 = 0
80 |
81 | idx1 = self.idx1
82 |
83 | #get clip name
84 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2]
85 |
86 | if self.augment == False or random.randint(0, self.aug_num) == 0:
87 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0])
88 | self.label1 = self.load_label(self.indices[idx1].split(' ')[1])
89 | # self.img1 = self.img1.resize((self.H, self.W))
90 | # self.label1 = imresize(self.label1, size=(self.H, self.W), interp="nearest")
91 | else:
92 | scale = (random.random()*2-1) * self.max_scale
93 | rotation = (random.random()*2-1) * self.max_rotate
94 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
95 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
96 | if self.flip:
97 | flip = (random.randint(0,1) > 0)
98 | else:
99 | flip = False
100 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
101 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
102 |
103 |
104 | # if self.scale != 1:
105 | # self.img1 = self.img1.resize((np.int(self.H*self.scale), np.int(self.W*self.scale)))
106 | # print >> sys.stderr, 'SCALE {}'.format(self.scale)
107 |
108 | if np.max(self.label1) == 0:
109 | continue
110 |
111 |
112 | if self.fg_random:
113 | self.rois, obj_ids = self.get_rois(self.label1, self.box_num, 1)
114 | else:
115 | self.rois, obj_ids = self.get_bboxes(self.label1, self.box_num)
116 |
117 | if np.sum(self.rois) == 0:
118 | continue
119 |
120 | self.img_rois = self.get_img_rois(self.img1, self.rois)
121 | self.lab_rois = self.get_lab_rois(self.label1, self.rois, obj_ids)
122 | self.weights = self.calculate_weight_rois(self.lab_rois)
123 |
124 | break
125 |
126 | # reshape tops
127 | top[0].reshape(self.box_num, 3, self.pool_h, self.pool_w)
128 | top[1].reshape(self.box_num, self.pool_h, self.pool_w)
129 | top[2].reshape(self.box_num, self.pool_h, self.pool_w)
130 |
131 |
132 |
133 | def forward(self, bottom, top):
134 | # assign output
135 | top[0].data[...] = self.img_rois
136 | top[1].data[...] = self.lab_rois
137 | top[2].data[...] = self.weights
138 |
139 | def backward(self, top, propagate_down, bottom):
140 | pass
141 |
142 |
143 | def load_image(self, idx):
144 | """
145 | Load the input image and resize it to W x H.
146 | Returns a PIL image; the Caffe preprocessing
147 | (cast to float, RGB -> BGR, mean subtraction,
148 | channel x height x width transpose)
149 | is applied per ROI in get_img_rois.
150 | """
151 | print >> sys.stderr, 'loading {}'.format(idx)
152 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
153 | im = im.resize((self.W, self.H))
154 | return im
155 |
156 |
157 | def load_label(self, idx):
158 | """
159 | Load label image as a height x width uint8 array of object indices.
160 | No leading singleton dimension is added here; per-ROI labels are built in get_lab_rois.
161 | """
162 | print >> sys.stderr, 'loading {}'.format(idx)
163 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
164 |
165 | label = np.array(im, dtype=np.uint8)
166 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
167 | return label
168 |
169 |
170 |
171 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
172 | img_W = np.int( self.W*(1.0 + scale) )
173 | img_H = np.int( self.H*(1.0 + scale) )
174 |
175 | print >> sys.stderr, 'loading {}'.format(idx)
176 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)
177 |
178 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
179 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
180 | im = im.rotate(rotation)
181 | if flip:
182 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
183 |
184 |
185 | return im
186 |
187 |
188 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
189 | img_W = np.int( self.W*(1.0 + scale) )
190 | img_H = np.int( self.H*(1.0 + scale) )
191 |
192 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
193 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
194 | im = im.rotate(rotation)
195 | if flip:
196 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
197 |
198 |
199 | label = np.array(im, dtype=np.uint8)
200 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
201 |
202 | return label
203 |
204 |
205 | def get_bboxes(self, label, box_num):
206 | label = np.array(label, dtype=np.uint8)
207 | objs = np.unique(label)
208 |
209 | if len(objs) < 2:
210 | bb = [0, 0, 0, 0, 0]
211 | return bb, bb
212 |
213 | objs = objs[1:len(objs)]
214 | num_obj = len(objs)
215 |
216 | step = np.int(np.ceil(box_num*1.0/np.float32(num_obj)))
217 | rois = np.zeros((box_num, 5))
218 | obj_ids = np.zeros((box_num, 1))
219 | nn = 0
220 | area = 0
221 | for obj_id in objs:
222 | pos = np.where(label == obj_id)
223 | area = np.max([area, len(pos[0])])
224 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
225 | for k in range(step):
226 | rois[nn, 1:5] = bb
227 | obj_ids[nn] = obj_id
228 | nn = nn + 1
229 | if nn == box_num:
230 | break
231 |
232 | if area < 40*40:
233 | bb = [0, 0, 0, 0, 0]
234 | print >> sys.stderr,'All objects are too small.'
235 | return bb, bb
236 |
237 |
238 | return rois, obj_ids
239 |
240 |
241 | def get_bbox(self, label, obj_id):
242 | label = np.array(label, dtype=np.uint8)
243 | pos = np.where(label == obj_id )
244 | if len(pos[0])< 40*40:
245 | print >> sys.stderr, 'Skipping very small object (area < 1600).'
246 | bb = [0, 0, 0, 0]
247 | return bb
248 | else:
249 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
250 | print>>sys.stderr, 'gt = {}; area = {};'.format(bb, len(pos[0]))
251 | return bb
252 |
253 |
254 | def get_rois(self, label, box_num, fg_rate):
255 | label = np.array(label, dtype=np.uint8)
256 | objs = np.unique(label)
257 |
258 | start_time = time.clock()
259 |
260 | if len(objs) < 2:
261 | bb = [0, 0, 0, 0, 0]
262 | return bb, bb
263 |
264 | area = 0
265 | objs = objs[1:len(objs)]
266 | for obj_id in objs:
267 | pos = np.where(label == obj_id)
268 | area = np.max([area, len(pos[0])])
269 |
270 | if area < 40*40:
271 | bb = [0, 0, 0, 0, 0]
272 | print >> sys.stderr,'All objects are too small.'
273 | return bb, bb
274 |
275 |
276 | num_obj = len(objs)
277 | rois = np.zeros((box_num, 5))
278 | obj_ids = np.zeros((box_num,1))
279 | k = 0
280 | gts = np.zeros([num_obj, 4])
281 | for obj_id in objs:
282 | pos = np.where(label == obj_id)
283 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
284 | gts[k,...]=gtbb
285 | k = k + 1
286 |
287 | W = label.shape[1]
288 | H = label.shape[0]
289 | print>>sys.stderr, label.shape
290 |
291 | box = np.zeros_like(gtbb)
292 | bb = np.zeros((1,5))
293 | k = 0
294 | while k < np.int(box_num*fg_rate):
295 | bb[0, 1] = random.randint(0,W)
296 | bb[0, 2] = random.randint(0,H)
297 | bb[0, 3] = random.randint(bb[0, 1],W)
298 | bb[0, 4] = random.randint(bb[0, 2],H)
299 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
300 | continue
301 |
302 | flag_select = False
303 | flag_break = False
304 | for kk in range(num_obj):
305 | iou = self.func_iou(bb[0, 1:5], gts[kk,...])
306 | if iou > self.fg_thre and flag_select == False:
307 | inb = 0
308 | for nn in range(num_obj):
309 | if nn == kk:
310 | continue
311 | else:
312 | inb = np.max([inb, self.func_inb(gts[nn,...], bb[0,1:5])])
313 | if inb < 0.7:
314 | flag_select = True
315 | IoU = iou
316 | idx = objs[kk]
317 | continue
318 |
319 | if iou > self.fg_thre and flag_select == True:
320 | flag_select = False
321 | break
322 |
323 |
324 | if flag_select:
325 | rois[k,...] = bb
326 | obj_ids[k] = idx
327 | # print>>sys.stderr,'obj_id {}: proposal = {}, iou = {}'.format(idx,bb,IoU)
328 | k = k + 1
329 |
330 | duration_time = time.clock() - start_time
331 | if duration_time > 60:
332 | print >> sys.stderr,'Objects too close; cannot get enough proposals within {} s'.format(duration_time)
333 | flag_break = True
334 | break
335 |
336 |
337 | if flag_break:
338 | bb = [0, 0, 0, 0, 0]
339 | return bb, bb
340 |
341 | while k < box_num:
342 | bb[0, 1] = random.randint(0,W)
343 | bb[0, 2] = random.randint(0,H)
344 | bb[0, 3] = random.randint(bb[0, 1],W)
345 | bb[0, 4] = random.randint(bb[0, 2],H)
346 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
347 | continue
348 |
349 | flag_select = True
350 | for kk in range(num_obj):
351 | iou = self.func_iou(bb[0, 1:5], gts[kk,...])
352 | if iou > 0.5:
353 | flag_select = False
354 | break
355 |
356 | if flag_select:
357 | rois[k,...] = bb
358 | k = k + 1
359 |
360 | return rois, obj_ids
361 |
362 |
363 |
364 | def func_iou(self, bb, gtbb):
365 | iou = 0
366 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1
367 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1
368 | if iw>0 and ih>0:
369 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih
370 | iou = np.float32(iw*ih*1.0/ua)
371 |
372 | return iou
373 |
374 | def get_img_rois(self, img, rois):
375 | #img = Image.fromarray(img)
376 | img_rois = np.zeros((self.box_num, 3, self.pool_h, self.pool_w))
377 | for i in range(self.box_num):
378 | bb = rois[i,1:5]
379 | ROI = img.crop(bb)
380 | ROI = ROI.resize((np.int(self.pool_w), np.int(self.pool_h)))
381 | # print>>sys.stderr, ROI.shape
382 | in_ = np.array(ROI, dtype=np.float32)
383 | in_ = in_[:,:,::-1]
384 | in_ -= self.mean
385 | ROI = in_.transpose((2,0,1))
386 | img_rois[i,...] = ROI
387 |
388 | return img_rois
389 |
390 |
391 | def get_lab_rois(self, lab, rois, obj_ids):
392 | lab_rois = np.zeros((self.box_num, self.pool_h, self.pool_w))
393 | for i in range(self.box_num):
394 | bb = rois[i,1:5]
395 | label= np.uint8(lab == obj_ids[i])
396 | label= Image.fromarray(label)
397 | ROI = label.crop(bb)
398 | # print >> sys.stderr, 'ROI_lab = {}'.format(np.unique(ROI))
399 | ROI = ROI.resize((np.int(self.pool_w), np.int(self.pool_h)))
400 | ROI = np.array(ROI, dtype=np.uint8)
401 | ROI = np.uint8(ROI>0.5)
402 | lab_rois[i,...] = ROI
403 |
404 | # print >> sys.stderr, 'ROI_lab = {}'.format(np.unique(lab_rois))
405 | return lab_rois
406 |
407 |
408 | def calculate_weight_rois(self, labels):
409 | weights = np.ones((self.box_num, self.pool_h, self.pool_w))
410 | for i in range(self.box_num):
411 | label = labels[i,...]
412 | pos = np.where(label==1)
413 | neg = np.where(label==0)
414 | weight_pos = len(pos[0])*1.0/(len(pos[0])+len(neg[0]))
415 | for k in range(len(pos[0])):
416 | weights[i, pos[0][k],pos[1][k]] = 1 - weight_pos
417 | # print >> sys.stderr, 'pos_num = {}, neg_num = {}, weight_pos = {}'.format(len(pos[0]), len(neg[0]), 1 - weight_pos)
418 |
419 | return weights
420 |
421 | def func_inb(self, bb, gtbb):
422 | # intersection area / area(bb): the fraction of bb covered by gtbb
423 | iou = 0
424 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1
425 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1
426 | if iw>0 and ih>0:
427 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1)
428 | iou = np.float32(iw*ih*1.0/ua)
429 |
430 | return iou
431 |
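The weighting in `calculate_weight_rois` can be read as a simple class-balancing rule: foreground pixels get weight `1 - p`, where `p` is the foreground fraction of the ROI, and every other pixel keeps weight 1. The following is a minimal plain-Python sketch of that rule for a single binary mask. The function name `balanced_weights` and the list-of-rows input are assumptions made for illustration; the layer itself works on NumPy arrays of shape `(box_num, pool_h, pool_w)`.

```python
def balanced_weights(mask):
    """Return per-pixel weights for a binary 2-D mask given as a list of rows.

    Foreground pixels (value 1) get weight 1 - p, where p is the fraction of
    foreground pixels in the mask; all other pixels keep weight 1.
    """
    total = sum(len(row) for row in mask)
    positives = sum(1 for row in mask for value in row if value == 1)
    pos_frac = positives / float(total) if total else 0.0
    return [[1.0 - pos_frac if value == 1 else 1.0 for value in row]
            for row in mask]


# Hypothetical 2 x 4 mask: 2 of 8 pixels are foreground, so p = 0.25.
mask = [[1, 0, 0, 0],
        [1, 0, 0, 0]]
weights = balanced_weights(mask)
```

With this rule, a ROI that is almost entirely foreground contributes small foreground weights, so background pixels are not drowned out in the loss.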
432 |
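`func_iou` and `func_inb` above differ only in the denominator: the first divides the intersection by the union of both boxes, the second by the area of its first argument alone. The sketch below restates both as standalone functions, assuming boxes are `(x1, y1, x2, y2)` tuples with inclusive pixel coordinates, as in the layer. The names `box_area`, `intersection_area`, `iou` and `coverage` are illustrative and not part of the repository.

```python
def box_area(box):
    """Area of an inclusive-coordinate box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x2 - x1 + 1) * (y2 - y1 + 1)


def intersection_area(a, b):
    """Area of the overlap between boxes a and b, or 0 if they are disjoint."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return iw * ih if iw > 0 and ih > 0 else 0


def iou(a, b):
    """Intersection over union, as in func_iou."""
    inter = intersection_area(a, b)
    return inter / float(box_area(a) + box_area(b) - inter)


def coverage(a, b):
    """Fraction of box a that is covered by box b, as in func_inb(a, b)."""
    return intersection_area(a, b) / float(box_area(a))


# Two 10 x 10 boxes overlapping in a 5 x 10 strip.
left = (0, 0, 9, 9)
right = (5, 0, 14, 9)
overlap_iou = iou(left, right)            # 50 / 150
overlap_coverage = coverage(left, right)  # 50 / 100
```

In `get_rois`, the coverage ratio is what rejects a proposal that matches one object well but also swallows most of a neighbouring object's box.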
--------------------------------------------------------------------------------
/python_layers/davis_data_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 |
3 | import numpy as np
4 | from PIL import Image
5 |
6 | import cv2
7 | from scipy.misc import imresize
8 | from scipy.misc import imrotate
9 |
10 | import sys
11 |
12 | import random
13 |
14 | class DAVISDataLayer(caffe.Layer):
15 | """
16 | Load (input image, label image) pairs from DAVIS
17 | one-at-a-time while reshaping the net to preserve dimensions.
18 |
19 | Use this to feed data to a fully convolutional network.
20 | """
21 |
22 | def setup(self, bottom, top):
23 | # config
24 | params = eval(self.param_str)
25 | self.davis_dir = params['davis_dir']
26 | self.split = params['split']
27 | self.mean = np.array(params['mean'])
28 | self.random = params.get('randomize', True)
29 | self.seed = params.get('seed', None)
30 | self.scale = params.get('scale', 1)
31 | self.augment = params.get('with_augmentation', True)
32 | self.aug_params = np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip)
33 | self.H = 480
34 | self.W = 854
35 |
36 |
37 | # two tops: data and label
38 | if len(top) != 2:
39 | raise Exception("Need to define two tops: data and label")
40 | # data layers have no bottoms
41 | if len(bottom) != 0:
42 | raise Exception("Do not define a bottom.")
43 |
44 | # load indices for images and labels
45 | split_f = '{}/ImageSets/2017/{}.txt'.format(self.davis_dir,
46 | self.split)
47 | self.indices = open(split_f, 'r').read().splitlines()
48 | self.idx = -1 # we pick idx in reshape
49 |
50 | if self.augment:
51 | self.aug_num = np.int(self.aug_params[0])
52 | self.max_scale = self.aug_params[1]
53 | self.max_rotate = self.aug_params[2]
54 | self.max_transW = self.aug_params[3]
55 | self.max_transH = self.aug_params[4]
56 | self.flip = (self.aug_params[5]>0)
57 |
58 |
59 | # make eval deterministic
60 | if 'train' not in self.split:
61 | self.random = False
62 |
63 | # randomization: seed and pick
64 | if self.random:
65 | random.seed(self.seed)
66 | self.idx = random.randint(0, len(self.indices)-1)
67 |
68 |
69 | def reshape(self, bottom, top):
70 |
71 | while True:
72 | # pick next input
73 | if self.random:
74 | self.idx = random.randint(0, len(self.indices)-1)
75 | else:
76 | self.idx += 1
77 | if self.idx == len(self.indices):
78 | self.idx = 0
79 |
80 |
81 | if self.idx == (len(self.indices) - 1):
82 | continue
83 |
84 | idx = self.idx
85 |
86 | if self.augment == False or random.randint(0, self.aug_num) == 0:
87 | self.img = self.load_image(self.indices[idx].split(' ')[0])
88 | self.label = self.load_label(self.indices[idx].split(' ')[1])
89 | self.img = imresize(self.img, size=(self.H, self.W), interp="bilinear")
90 | self.label = imresize(self.label, size=(self.H, self.W), interp="nearest")
91 | else:
92 | scale = (random.random()*2-1) * self.max_scale
93 | rotation = (random.random()*2-1) * self.max_rotate
94 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
95 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
96 | if self.flip:
97 | flip = (random.randint(0,1) > 0)
98 | else:
99 | flip = False
100 | self.img = self.load_image_transform(self.indices[idx].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
101 | self.label = self.load_label_transform(self.indices[idx].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
102 |
103 |
104 | if self.scale != 1:
105 | self.img = imresize(self.img, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear")
106 | self.label = imresize(self.label, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest")
107 |
108 |
109 | self.img = self.img.transpose((2,0,1))
110 | self.label = self.label[np.newaxis, ...]
111 | # print(self.img.shape)
112 | break
113 |
114 | # reshape tops to fit (leading 1 is for batch dimension)
115 | top[0].reshape(1, *self.img.shape)
116 | top[1].reshape(1, *self.label.shape)
117 |
118 | def forward(self, bottom, top):
119 | top[0].data[...] = self.img
120 | top[1].data[...] = self.label
121 |
122 | def backward(self, top, propagate_down, bottom):
123 | pass
124 |
125 |
126 | def load_image(self, idx):
127 | """
128 | Load input image and preprocess for Caffe:
129 | - cast to float
130 | - switch channels RGB -> BGR
131 | - subtract mean
132 | - (the channel x height x width transpose is done later, in reshape)
133 | """
134 | print >> sys.stderr, 'loading Original {}'.format(idx)
135 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
136 | in_ = np.array(im, dtype=np.float32)
137 | in_ = in_[:,:,::-1]
138 | in_ -= self.mean
139 | return in_
140 |
141 |
142 | def load_label(self, idx):
143 | """
144 | Load label image as a height x width uint8 array of label indices.
145 | The leading singleton dimension required by the loss is added in reshape.
146 | """
147 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
148 | label = np.array(im, dtype=np.uint8)
149 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
150 |
151 | #print(label)
152 | return label
153 |
154 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
155 | img_W = np.int( self.W*(1.0 + scale) )
156 | img_H = np.int( self.H*(1.0 + scale) )
157 |
158 | print >> sys.stderr, 'loading {}'.format(idx)
159 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)
160 |
161 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
162 | im = im.resize((img_W,img_H))
163 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
164 | im = im.rotate(rotation)
165 | if flip:
166 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
167 |
168 | if scale>0:
169 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
170 | im = im.crop(box)
171 | else:
172 | im = im.resize((self.W, self.H))
173 |
174 |
175 | in_ = np.array(im, dtype=np.float32)
176 | in_ = in_[:,:,::-1]
177 | in_ -= self.mean
178 |
179 | return in_
180 |
181 |
182 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
183 | img_W = np.int( self.W*(1.0 + scale) )
184 | img_H = np.int( self.H*(1.0 + scale) )
185 |
186 |
187 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
188 | im = im.resize((img_W,img_H))
189 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
190 | im = im.rotate(rotation)
191 | if flip:
192 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
193 |
194 | if scale>0:
195 | w_start = np.int((img_W - self.W)/2) # center crop, so the label stays aligned with load_image_transform
196 | h_start = np.int((img_H - self.H)/2)
197 | box = (w_start, h_start, w_start+self.W, h_start+self.H)
198 | im = im.crop(box)
199 | else:
200 | im = im.resize((self.W, self.H))
201 |
202 | label = np.array(im, dtype=np.uint8)
203 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
204 |
205 |
206 | return label
207 |
208 |
209 |
210 |
211 |
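The augmentation path in `reshape` draws one set of parameters per sample and passes the same values to both the image and the label loader, which only keeps the pair aligned if both loaders also crop the same window. The sketch below, under stated assumptions, isolates those two steps: drawing the parameters from the six `aug_params` bounds and computing a single center crop box that both loaders could share. The function names, the dictionary layout and the `rng` argument are hypothetical; only the formulas are taken from the layer.

```python
import random


def sample_augmentation(max_scale, max_rotate, max_trans_w, max_trans_h,
                        allow_flip, width=854, height=480, rng=random):
    """Draw one set of augmentation parameters, uniform in [-max, +max]."""
    return {
        'scale': (rng.random() * 2 - 1) * max_scale,
        'rotation': (rng.random() * 2 - 1) * max_rotate,
        'trans_w': int((rng.random() * 2 - 1) * max_trans_w * width),
        'trans_h': int((rng.random() * 2 - 1) * max_trans_h * height),
        'flip': bool(allow_flip and rng.randint(0, 1) > 0),
    }


def center_crop_box(scale, width=854, height=480):
    """Crop window that cuts a width x height region from the scaled frame."""
    scaled_w = int(width * (1.0 + scale))
    scaled_h = int(height * (1.0 + scale))
    left = (scaled_w - width) // 2
    top = (scaled_h - height) // 2
    return (left, top, left + width, top + height)


# Hypothetical bounds: 20% scale, 30 degrees, 10% translation, flips allowed.
params = sample_augmentation(0.2, 30, 0.1, 0.1, True, rng=random.Random(0))
crop = center_crop_box(0.5)  # (213, 120, 1067, 600) for an 854 x 480 frame
```

Computing the crop box once and applying it to both the image and its annotation avoids the two drifting apart when `scale > 0`.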
--------------------------------------------------------------------------------
/python_layers/davis_siamese_data_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 |
3 | import numpy as np
4 | from PIL import Image
5 | from scipy.misc import imresize
6 |
7 | import sys
8 |
9 | import random
10 |
11 | class DavisSiameseDataLayer(caffe.Layer):
12 |
13 | def setup(self, bottom, top):
14 | params = eval(self.param_str)
15 | self.davis_dir = params['davis_dir']
16 | self.split = params['split']
17 | self.mean = np.array(params['mean'])
18 | self.random = params.get('randomize', True)
19 | self.seed = params.get('seed', None)
20 | self.scale = params.get('scale', 1)
21 | self.fg_rate = params.get('fg_rate', 0.5)
22 | self.augment = params.get('with_augmentation', True)
23 | self.aug_params= np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip)
24 | self.H = 480
25 | self.W = 854
26 | self.box_num = params.get('box_num', 100)
27 |
28 | if self.augment:
29 | self.aug_num = np.int(self.aug_params[0])
30 | self.max_scale = self.aug_params[1]
31 | self.max_rotate = self.aug_params[2]
32 | self.max_transW = self.aug_params[3]
33 | self.max_transH = self.aug_params[4]
34 | self.flip = (self.aug_params[5]>0)
35 |
36 | # tops
37 | if len(top) != 5:
38 | raise Exception("Need to define five tops: data, labels, label_sim, bbox, rois")
39 | # data layers have no bottoms
40 | if len(bottom) != 0:
41 | raise Exception("Do not define a bottom.")
42 |
43 | # load indices for images and labels
44 | split_f = '{}/ImageSets/480p/{}.txt'.format(self.davis_dir,
45 | self.split)
46 | self.indices = open(split_f, 'r').read().splitlines()
47 | self.idx1 = -1 # we pick idx in reshape
48 | self.idx2 = -1
49 |
50 | # make eval deterministic
51 | if 'train' not in self.split:
52 | self.random = False
53 |
54 | # randomization: seed and pick
55 | if self.random:
56 | random.seed(self.seed)
57 | self.idx = random.randint(0, len(self.indices)-1)
58 |
59 |
60 | def reshape(self, bottom, top):
61 | while True:
62 | # pick next input
63 | if self.random:
64 | self.idx1 = random.randint(0, len(self.indices)-1)
65 | self.idx2 = random.randint(0, len(self.indices)-1)
66 | else:
67 | self.idx1 += 1
68 | self.idx2 += 1
69 | if self.idx1 == len(self.indices):
70 | self.idx1 = 0
71 | if self.idx2 == len(self.indices):
72 | self.idx2 = 0
73 |
74 | idx1 = self.idx1
75 | idx2 = self.idx2
76 |
77 | #get clip name
78 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2]
79 | clip2 = self.indices[idx2].split(' ')[0].split('/')[-2]
80 |
81 | if clip1 != clip2:
82 | continue
83 |
84 |
85 | if self.augment == False or random.randint(0, self.aug_num) == 0:
86 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0])
87 | self.img2 = self.load_image(self.indices[idx2].split(' ')[0])
88 | self.label1 = self.load_label(self.indices[idx1].split(' ')[1])
89 | self.label2 = self.load_label(self.indices[idx2].split(' ')[1])
90 | self.img1 = imresize(self.img1, size=(self.H, self.W), interp="bilinear")
91 | self.label1 = imresize(self.label1, size=(self.H, self.W), interp="nearest")
92 | self.img2 = imresize(self.img2, size=(self.H, self.W), interp="bilinear")
93 | self.label2 = imresize(self.label2, size=(self.H, self.W), interp="nearest")
94 | else:
95 | scale = (random.random()*2-1) * self.max_scale
96 | rotation = (random.random()*2-1) * self.max_rotate
97 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
98 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
99 | if self.flip:
100 | flip = (random.randint(0,1) > 0)
101 | else:
102 | flip = False
103 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
104 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
105 |
106 | scale = (random.random()*2-1) * self.max_scale
107 | rotation = (random.random()*2-1) * self.max_rotate
108 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
109 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
110 | if self.flip:
111 | flip = (random.randint(0,1) > 0)
112 | else:
113 | flip = False
114 | self.img2 = self.load_image_transform(self.indices[idx2].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
115 | self.label2 = self.load_label_transform(self.indices[idx2].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
116 |
117 |
118 | if self.scale != 1:
119 | self.img1 = imresize(self.img1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear")
120 | self.label1 = imresize(self.label1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest")
121 | self.img2 = imresize(self.img2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear")
122 | self.label2 = imresize(self.label2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest")
123 | print >> sys.stderr, 'SCALE {}'.format(self.scale)
124 |
125 | # randomly select an object
126 | num_obj = np.max(self.label1)
127 | if num_obj == 0 or np.max(self.label2) == 0:
128 | continue
129 |
130 | obj_id = random.randint(1, num_obj)
131 | if np.sum(self.get_bbox(self.label1, obj_id)) == 0 or np.sum(self.get_bbox(self.label2, obj_id)) == 0:
132 | # if objects in either image are too small
133 | continue
134 |
135 | # self.bbox = self.get_rois(self.label1, obj_id, self.box_num, 1)
136 | self.bbox = self.get_bboxes(self.label1, obj_id, self.box_num)
137 | self.rois = self.get_rois(self.label2, obj_id, self.box_num, self.fg_rate)
138 | self.label_sim = np.zeros([self.box_num, 1], dtype = np.uint8)
139 | for i in range(np.int(self.box_num*self.fg_rate)): # mark the foreground ROIs produced by get_rois
140 | self.label_sim[i] = 1
141 | # print >> sys.stderr,'label = {}'.format(self.label_sim)
142 | self.img1 = self.img1.transpose((2,0,1))
143 | self.img2 = self.img2.transpose((2,0,1))
144 |
145 | self.label1 = np.uint8(self.label1==obj_id)
146 | self.label2 = np.uint8(self.label2==obj_id)
147 | break
148 |
149 | # reshape tops
150 | top[0].reshape(2, *self.img1.shape)
151 | top[1].reshape(2, *self.label1.shape)
152 | top[2].reshape(self.box_num, 1)
153 | top[3].reshape(self.box_num, 5)
154 | top[4].reshape(self.box_num, 5)
155 |
156 |
157 | def forward(self, bottom, top):
158 | # assign output
159 | top[0].data[...] = np.concatenate((self.img1[np.newaxis, ...], self.img2[np.newaxis, ...]), axis = 0)
160 | top[1].data[...] = np.concatenate((self.label1[np.newaxis, ...], self.label2[np.newaxis, ...]), axis = 0)
161 | top[2].data[...] = self.label_sim
162 | top[3].data[...] = self.bbox
163 | top[4].data[...] = self.rois
164 |
165 | def backward(self, top, propagate_down, bottom):
166 | pass
167 |
168 |
169 | def load_image(self, idx):
170 | """
171 | Load input image and preprocess for Caffe:
172 | - cast to float
173 | - switch channels RGB -> BGR
174 | - subtract mean
175 | - (the channel x height x width transpose is done later, in reshape)
176 | """
177 | print >> sys.stderr, 'loading {}'.format(idx)
178 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
179 | im = im.resize((self.W, self.H))
180 | in_ = np.array(im, dtype=np.float32)
181 | in_ = in_[:,:,::-1]
182 | in_ -= self.mean
183 | # in_ = in_.transpose((2,0,1))
184 | return in_
185 |
186 |
187 | def load_label(self, idx):
188 | """
188 | Load label image as a height x width uint8 array of label indices, resized to W x H.
189 | No leading singleton dimension is added here; the batch dimension is added in forward.
191 | """
192 | print >> sys.stderr, 'loading {}'.format(idx)
193 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
194 | im = im.resize((self.W, self.H))
195 | label = np.array(im, dtype=np.uint8)
196 | # label = label[np.newaxis, ...]
197 | return label
198 |
199 |
200 |
201 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
202 | img_W = np.int( self.W*(1.0 + scale) )
203 | img_H = np.int( self.H*(1.0 + scale) )
204 |
205 | print >> sys.stderr, 'loading {}'.format(idx)
206 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)
207 |
208 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
209 | im = im.resize((img_W,img_H))
210 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
211 | im = im.rotate(rotation)
212 | if flip:
213 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
214 |
215 | if scale>0:
216 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
217 | im = im.crop(box)
218 | else:
219 | im = im.resize((self.W, self.H))
220 |
221 | in_ = np.array(im, dtype=np.float32)
222 | in_ = in_[:,:,::-1]
223 | in_ -= self.mean
224 |
225 | return in_
226 |
227 |
228 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
229 | img_W = np.int( self.W*(1.0 + scale) )
230 | img_H = np.int( self.H*(1.0 + scale) )
231 |
232 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
233 | im = im.resize((img_W,img_H))
234 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
235 | im = im.rotate(rotation)
236 | if flip:
237 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
238 |
239 | if scale>0:
240 | w_start = np.int((img_W - self.W)/2) # center crop, so the label stays aligned with load_image_transform
241 | h_start = np.int((img_H - self.H)/2)
242 | box = (w_start, h_start, w_start+self.W, h_start+self.H)
243 | im = im.crop(box)
244 | else:
245 | im = im.resize((self.W, self.H))
246 |
247 | label = np.array(im, dtype=np.uint8)
248 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
249 |
250 | return label
251 |
252 |
253 | def get_bboxes(self, label, obj_id, box_num):
254 | label = np.array(label, dtype=np.uint8)
255 | pos = np.where(label == obj_id)
256 | rois = np.zeros((box_num, 5))
257 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
258 | for k in range(box_num):
259 | rois[k, 1:5] = bb
260 | return rois
261 |
262 |
263 | def get_bbox(self, label, obj_id):
264 | label = np.array(label, dtype=np.uint8)
265 | pos = np.where(label == obj_id)
266 | if len(pos[0])<1024:
267 | print >> sys.stderr, 'skipping very small object {}'.format(obj_id)
268 | bb = [0, 0, 0, 0]
269 | return bb
270 | else:
271 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
272 | print>>sys.stderr, 'obj = {}; gt = {}; area = {};'.format(obj_id, bb, len(pos[0]))
273 | return bb
274 |
275 |
276 | def get_rois(self, label, obj_id, box_num, fg_rate):
277 | label = np.array(label, dtype=np.uint8)
278 | pos = np.where(label == obj_id)
279 | W = label.shape[1]
280 | H = label.shape[0]
281 | print>>sys.stderr, label.shape
282 | rois = np.zeros((box_num, 5))
283 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
284 | box = np.zeros_like(gtbb)
285 | bb = np.zeros((1,5))
286 | k = 0
287 | while k < np.int(box_num*fg_rate):
288 | bb[0, 1] = random.randint(0,W)
289 | bb[0, 2] = random.randint(0,H)
290 | bb[0, 3] = random.randint(bb[0, 1],W)
291 | bb[0, 4] = random.randint(bb[0, 2],H)
292 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
293 | continue
294 | iou = self.func_iou(bb[0, 1:5], gtbb)
295 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou)
296 | if iou > 0.7:
297 | rois[k,...] = bb
298 | k = k + 1
299 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb[0, 1:5], iou)
300 |
301 | while k < box_num:
302 | bb[0, 1] = random.randint(0,W)
303 | bb[0, 2] = random.randint(0,H)
304 | bb[0, 3] = random.randint(bb[0, 1],W)
305 | bb[0, 4] = random.randint(bb[0, 2],H)
306 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
307 | continue
308 | iou = self.func_iou(bb[0, 1:5], gtbb)
309 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou)
310 | if iou < 0.5:
311 | rois[k,...] = bb
312 | k = k + 1
313 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb[0, 1:5], iou)
314 |
315 | return rois
316 |
317 |
318 |
319 | def func_iou(self, bb, gtbb):
320 | iou = 0
321 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1
322 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1
323 | if iw>0 and ih>0:
324 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih
325 | iou = np.float32(iw*ih*1.0/ua)
326 |
327 | return iou
328 |
329 |
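`get_rois` above is a rejection sampler: it draws random boxes, discards those below a minimum area, and keeps a box as foreground when its IoU with the ground-truth box exceeds 0.7 or as background when it falls below 0.5. The sketch below shows that loop in isolation. It is an illustration rather than the layer's code: the name `sample_boxes`, the `accept` callback and the `max_tries` cap are assumptions, and coordinates are drawn up to `width - 1` and `height - 1` so that boxes stay inside the image, whereas the layer draws up to `W` and `H`.

```python
import random


def iou(a, b):
    """Intersection over union of inclusive boxes (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return iw * ih / float(area_a + area_b - iw * ih)


def sample_boxes(gt, width, height, count, accept, rng,
                 min_area=256, max_tries=100000):
    """Draw up to `count` random boxes whose IoU with `gt` passes `accept`.

    Returns fewer than `count` boxes if `max_tries` draws are not enough.
    """
    boxes = []
    for _ in range(max_tries):
        if len(boxes) == count:
            break
        x1 = rng.randint(0, width - 1)
        y1 = rng.randint(0, height - 1)
        x2 = rng.randint(x1, width - 1)
        y2 = rng.randint(y1, height - 1)
        if (x2 - x1 + 1) * (y2 - y1 + 1) < min_area:
            continue  # skip tiny boxes, as the layer does (8*8*4 pixels)
        box = (x1, y1, x2, y2)
        if accept(iou(box, gt)):
            boxes.append(box)
    return boxes


rng = random.Random(0)
gt_box = (20, 10, 70, 60)  # hypothetical ground-truth box in a 100 x 80 image
positives = sample_boxes(gt_box, 100, 80, 2, lambda v: v > 0.7, rng)
negatives = sample_boxes(gt_box, 100, 80, 5, lambda v: v < 0.5, rng)
```

Uniform draws rarely land above an IoU of 0.7, so the foreground phase dominates the running time. A cap such as `max_tries` bounds the loop when the target object is small.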
--------------------------------------------------------------------------------
/python_layers/davis_siamese_data_layer3.py:
--------------------------------------------------------------------------------
1 | import caffe
2 |
3 | import numpy as np
4 | from PIL import Image
5 | from scipy.misc import imresize
6 |
7 | import sys
8 |
9 | import random
10 |
11 | class DavisSiameseDataLayer3(caffe.Layer):
12 |
13 | def setup(self, bottom, top):
14 | params = eval(self.param_str)
15 | self.davis_dir = params['davis_dir']
16 | self.split = params['split']
17 | self.mean = np.array(params['mean'])
18 | self.random = params.get('randomize', True)
19 | self.seed = params.get('seed', None)
20 | self.scale = params.get('scale', 1)
21 | self.fg_rate = params.get('fg_rate', 0.5)
22 | self.augment = params.get('with_augmentation', True)
23 | self.aug_params= np.array(params['aug_params']) # (aug_num, max_scale, max_rotate, max_transW, max_transH, flip)
24 | self.H = 480
25 | self.W = 854
26 | self.box_num = params.get('box_num', 100)
27 |
28 | if self.augment:
29 | self.aug_num = np.int(self.aug_params[0])
30 | self.max_scale = self.aug_params[1]
31 | self.max_rotate = self.aug_params[2]
32 | self.max_transW = self.aug_params[3]
33 | self.max_transH = self.aug_params[4]
34 | self.flip = (self.aug_params[5]>0)
35 |
36 | # tops
37 | if len(top) != 5:
38 | raise Exception("Need to define five tops: data, labels, label_sim, bbox, rois")
39 | # data layers have no bottoms
40 | if len(bottom) != 0:
41 | raise Exception("Do not define a bottom.")
42 |
43 | # load indices for images and labels
44 | split_f = '{}/ImageSets/480p/{}.txt'.format(self.davis_dir,
45 | self.split)
46 | self.indices = open(split_f, 'r').read().splitlines()
47 | self.idx1 = -1 # we pick idx in reshape
48 | self.idx2 = -1
49 |
50 | # make eval deterministic
51 | if 'train' not in self.split:
52 | self.random = False
53 |
54 | # randomization: seed and pick
55 | if self.random:
56 | random.seed(self.seed)
57 | self.idx = random.randint(0, len(self.indices)-1)
58 |
59 |
60 | def reshape(self, bottom, top):
61 | while True:
62 | # pick next input
63 | if self.random:
64 | self.idx1 = random.randint(0, len(self.indices)-1)
65 | self.idx2 = random.randint(0, len(self.indices)-1)
66 | else:
67 | self.idx1 += 1
68 | self.idx2 += 1
69 | if self.idx1 == len(self.indices):
70 | self.idx1 = 0
71 | if self.idx2 == len(self.indices):
72 | self.idx2 = 0
73 |
74 | idx1 = self.idx1
75 | idx2 = self.idx2
76 |
77 | #get clip name
78 | clip1 = self.indices[idx1].split(' ')[0].split('/')[-2]
79 | clip2 = self.indices[idx2].split(' ')[0].split('/')[-2]
80 |
81 | if clip1 != clip2:
82 | continue
83 |
84 |
85 | if self.augment == False or random.randint(0, self.aug_num) == 0:
86 | self.img1 = self.load_image(self.indices[idx1].split(' ')[0])
87 | self.img2 = self.load_image(self.indices[idx2].split(' ')[0])
88 | self.label1 = self.load_label(self.indices[idx1].split(' ')[1])
89 | self.label2 = self.load_label(self.indices[idx2].split(' ')[1])
90 | self.img1 = imresize(self.img1, size=(self.H, self.W), interp="bilinear")
91 | self.label1 = imresize(self.label1, size=(self.H, self.W), interp="nearest")
92 | self.img2 = imresize(self.img2, size=(self.H, self.W), interp="bilinear")
93 | self.label2 = imresize(self.label2, size=(self.H, self.W), interp="nearest")
94 | else:
95 | scale = (random.random()*2-1) * self.max_scale
96 | rotation = (random.random()*2-1) * self.max_rotate
97 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
98 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
99 | if self.flip:
100 | flip = (random.randint(0,1) > 0)
101 | else:
102 | flip = False
103 | self.img1 = self.load_image_transform(self.indices[idx1].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
104 | self.label1 = self.load_label_transform(self.indices[idx1].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
105 |
106 | scale = (random.random()*2-1) * self.max_scale
107 | rotation = (random.random()*2-1) * self.max_rotate
108 | trans_w = np.int( (random.random()*2-1) * self.max_transW * self.W )
109 | trans_h = np.int( (random.random()*2-1) * self.max_transH * self.H )
110 | if self.flip:
111 | flip = (random.randint(0,1) > 0)
112 | else:
113 | flip = False
114 | self.img2 = self.load_image_transform(self.indices[idx2].split(' ')[0], scale, rotation, trans_h, trans_w, flip)
115 | self.label2 = self.load_label_transform(self.indices[idx2].split(' ')[1], scale, rotation, trans_h, trans_w, flip)
116 |
117 |
118 | if self.scale != 1:
119 | self.img1 = imresize(self.img1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear")
120 | self.label1 = imresize(self.label1, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest")
121 | self.img2 = imresize(self.img2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="bilinear")
122 | self.label2 = imresize(self.label2, size=(np.int(self.H*self.scale), np.int(self.W*self.scale)), interp="nearest")
123 | print >> sys.stderr, 'SCALE {}'.format(self.scale)
124 |
125 | # randomly select an object
126 | num_obj = np.max(self.label1)
127 | if num_obj == 0 or np.max(self.label2) == 0:
128 | continue
129 |
130 | obj_id = random.randint(1, num_obj)
131 | if np.sum(self.get_bbox(self.label1, obj_id)) == 0 or np.sum(self.get_bbox(self.label2, obj_id)) == 0:
132 | # if objects in either image are too small
133 | continue
134 |
135 | self.bbox = self.get_rois(self.label1, obj_id, self.box_num, 1)
136 | # self.bbox = self.get_bboxes(self.label1, obj_id, self.box_num)
137 | self.rois = self.get_rois(self.label2, obj_id, self.box_num, self.fg_rate)
138 | self.label_sim = np.zeros([self.box_num, 1], dtype = np.uint8)
139 | for i in range(np.int(self.box_num/2)):
140 | self.label_sim[i] = 1
141 | # print >> sys.stderr,'label = {}'.format(self.label_sim)
142 | self.img1 = self.img1.transpose((2,0,1))
143 | self.img2 = self.img2.transpose((2,0,1))
144 |
145 | self.label1 = np.uint8(self.label1==obj_id)
146 | self.label2 = np.uint8(self.label2==obj_id)
147 | break
148 |
149 | # reshape tops
150 | top[0].reshape(2, *self.img1.shape)
151 | top[1].reshape(2, *self.label1.shape)
152 | top[2].reshape(self.box_num, 1)
153 | top[3].reshape(self.box_num, 5)
154 | top[4].reshape(self.box_num, 5)
155 |
156 |
157 | def forward(self, bottom, top):
158 | # assign output
159 | top[0].data[...] = np.concatenate((self.img1[np.newaxis, ...], self.img2[np.newaxis, ...]), axis = 0)
160 | top[1].data[...] = np.concatenate((self.label1[np.newaxis, ...], self.label2[np.newaxis, ...]), axis = 0)
161 | top[2].data[...] = self.label_sim
162 | top[3].data[...] = self.bbox
163 | top[4].data[...] = self.rois
164 |
165 | def backward(self, top, propagate_down, bottom):
166 | pass
167 |
168 |
169 | def load_image(self, idx):
170 | """
171 | Load input image and preprocess for Caffe:
172 | - cast to float
173 | - switch channels RGB -> BGR
174 | - subtract mean
175 | - keep height x width x channel order (reshape() transposes to channel x height x width)
176 | """
177 | print >> sys.stderr, 'loading {}'.format(idx)
178 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
179 | im = im.resize((self.W, self.H))
180 | in_ = np.array(im, dtype=np.float32)
181 | in_ = in_[:,:,::-1]
182 | in_ -= self.mean
183 | # in_ = in_.transpose((2,0,1))
184 | return in_
185 |
186 |
187 | def load_label(self, idx):
188 | """
189 | Load label image as a height x width uint8 array of label indices.
190 | No leading singleton dimension is added here; forward() stacks the two labels along a new leading axis.
191 | """
192 | print >> sys.stderr, 'loading {}'.format(idx)
193 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
194 | im = im.resize((self.W, self.H))
195 | label = np.array(im, dtype=np.uint8)
196 | # label = label[np.newaxis, ...]
197 | return label
198 |
199 |
200 |
201 | def load_image_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
202 | img_W = np.int( self.W*(1.0 + scale) )
203 | img_H = np.int( self.H*(1.0 + scale) )
204 |
205 | print >> sys.stderr, 'loading {}'.format(idx)
206 | print >> sys.stderr, 'scale: {}; rotation: {}; translation: ({},{}); flip: {}.'.format(scale, rotation, trans_w, trans_h, flip)
207 |
208 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
209 | im = im.resize((img_W,img_H))
210 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
211 | im = im.rotate(rotation)
212 | if flip:
213 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
214 |
215 | if scale>0:
216 | box = (np.int((img_W - self.W)/2), np.int((img_H - self.H)/2), np.int((img_W - self.W)/2)+self.W, np.int((img_H - self.H)/2)+self.H)
217 | im = im.crop(box)
218 | else:
219 | im = im.resize((self.W, self.H))
220 |
221 | in_ = np.array(im, dtype=np.float32)
222 | in_ = in_[:,:,::-1]
223 | in_ -= self.mean
224 |
225 | return in_
226 |
227 |
228 | def load_label_transform(self, idx, scale, rotation, trans_h, trans_w, flip):
229 | img_W = np.int( self.W*(1.0 + scale) )
230 | img_H = np.int( self.H*(1.0 + scale) )
231 |
232 | im = Image.open('{}/{}'.format(self.davis_dir, idx))
233 | im = im.resize((img_W,img_H))
234 | im = im.transform((img_W,img_H),Image.AFFINE,(1,0,trans_w,0,1,trans_h))
235 | im = im.rotate(rotation)
236 | if flip:
237 | im = im.transpose(Image.FLIP_LEFT_RIGHT)
238 |
239 | if scale>0:
240 | w_start = np.int((img_W - self.W)/2) # center crop, matching load_image_transform so image and label stay aligned
241 | h_start = np.int((img_H - self.H)/2)
242 | box = (w_start, h_start, w_start+self.W, h_start+self.H)
243 | im = im.crop(box)
244 | else:
245 | im = im.resize((self.W, self.H))
246 |
247 | label = np.array(im, dtype=np.uint8)
248 | print >> sys.stderr, 'Number of Objects: {}'.format(np.max(label))
249 |
250 | return label
251 |
252 |
253 | def get_bboxes(self, label, obj_id, box_num):
254 | label = np.array(label, dtype=np.uint8)
255 | pos = np.where(label == obj_id)
256 | rois = np.zeros((box_num, 5))
257 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
258 | for k in range(box_num):
259 | rois[k, 1:5] = bb
260 | return rois
261 |
262 |
263 | def get_bbox(self, label, obj_id):
264 | label = np.array(label, dtype=np.uint8)
265 | pos = np.where(label == obj_id)
266 | if len(pos[0])<1024:
267 | print >> sys.stderr, 'escape very small object {}'.format(obj_id)
268 | bb = [0, 0, 0, 0]
269 | return bb
270 | else:
271 | bb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
272 | print>>sys.stderr, 'obj = {}; gt = {}; area = {};'.format(obj_id, bb, len(pos[0]))
273 | return bb
274 |
275 |
276 | def get_rois(self, label, obj_id, box_num, fg_rate):
277 | label = np.array(label, dtype=np.uint8)
278 | pos = np.where(label == obj_id)
279 | W = label.shape[1]
280 | H = label.shape[0]
281 | print>>sys.stderr, label.shape
282 | rois = np.zeros((box_num, 5))
283 | gtbb = [np.min(pos[1]), np.min(pos[0]), np.max(pos[1]), np.max(pos[0])]
284 | box = np.zeros_like(gtbb)
285 | bb = np.zeros((1,5))
286 | k = 0
287 | while k < np.int(box_num*fg_rate):
288 | bb[0, 1] = random.randint(0,W)
289 | bb[0, 2] = random.randint(0,H)
290 | bb[0, 3] = random.randint(bb[0, 1],W)
291 | bb[0, 4] = random.randint(bb[0, 2],H)
292 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
293 | continue
294 | iou = self.func_iou(bb[0, 1:5], gtbb)
295 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou)
296 | if iou > 0.7:
297 | rois[k,...] = bb
298 | k = k + 1
299 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb[0, 1:5], iou)
300 |
301 | while k < box_num:
302 | bb[0, 1] = random.randint(0,W)
303 | bb[0, 2] = random.randint(0,H)
304 | bb[0, 3] = random.randint(bb[0, 1],W)
305 | bb[0, 4] = random.randint(bb[0, 2],H)
306 | if (bb[0,4] - bb[0,2]+1)*(bb[0,3]-bb[0,1]+1) < 8*8*4:
307 | continue
308 | iou = self.func_iou(bb[0, 1:5], gtbb)
309 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb, iou)
310 | if iou < 0.5:
311 | rois[k,...] = bb
312 | k = k + 1
313 | # print>>sys.stderr, 'obj = {}; gt = {}; proposal = {}; IoU = {};'.format(obj_id, gtbb, bb[0, 1:5], iou)
314 |
315 | return rois
316 |
317 |
318 |
319 | def func_iou(self, bb, gtbb):
320 | iou = 0
321 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1
322 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1
323 | if iw>0 and ih>0:
324 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih
325 | iou = np.float32(iw*ih*1.0/ua)
326 |
327 | return iou
328 |
329 |
--------------------------------------------------------------------------------
/python_layers/davis_test_score_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 |
3 | import numpy as np
4 | from PIL import Image
5 | from scipy.misc import imresize
6 |
7 | import sys
8 |
9 | import random
10 |
11 | from scipy import ndimage
12 |
13 |
14 |
15 | class DavisSiameseTestScoreLayer(caffe.Layer):
16 |
17 | def setup(self, bottom, top):
18 | params = eval(self.param_str)
19 | self.davis_dir = params['davis_dir']
20 | self.split = params['split']
21 | self.mean = np.array(params['mean'])
22 | self.sc_params = np.array(params['aug_params']) # min_scale, max_scale, stride, flag_search_range (0: whole image, 1: search_range), search_range
23 | self.H = 480
24 | self.W = 854
25 |
26 |
27 | self.min_scale = self.sc_params[0]
28 | self.max_scale = self.sc_params[1]
29 | self.stride = self.sc_params[2]
30 | self.flag_range = (self.sc_params[3]>0)
31 | self.search_range = self.sc_params[4]
32 | # anchors: 1*2, 2*1, 1*1
33 |
34 | # tops
35 | if len(top) != 3:
36 | raise Exception("Need to define three tops: scores, scales, anchors")
37 | # bottoms
38 | if len(bottom) != 3:
39 | raise Exception("Need three bottoms: gt1_box, feat1, feat2")
40 |
41 | def reshape(self, bottom, top):
42 | self.bbox = np.zeros_like(bottom[0].data, dtype = np.float32)
43 | self.feat1 = np.zeros_like(bottom[1].data, dtype = np.float32)
44 | self.feat2 = np.zeros_like(bottom[2].data, dtype = np.float32)
45 | self.num_scoremaps = 3*(np.int(np.round((self.max_scale - self.min_scale)*1.0/self.stride)) + 1) # integer count: array shapes cannot be floats
46 | self.scales = np.zeros([self.num_scoremaps, 1], dtype = np.float32)
47 | self.anchors = np.zeros([self.num_scoremaps, 2], dtype = np.float32)
48 | self.score_maps = np.zeros([self.num_scoremaps, bottom[1].shape[2], bottom[1].shape[3]], dtype = np.float32)
49 | # reshape tops
50 | top[0].reshape(self.num_scoremaps, bottom[1].shape[2], bottom[1].shape[3])
51 | top[1].reshape(self.num_scoremaps, 1)
52 | top[2].reshape(self.num_scoremaps, 2)
53 |
54 |
55 | def forward(self, bottom, top):
56 | self.bbox = bottom[0].data
57 | self.feat1 = bottom[1].data
58 | self.feat2 = bottom[2].data
59 |
60 | self.feat1 = self.feat1.reshape([self.feat1.shape[1], self.feat1.shape[2], self.feat1.shape[3]])
61 | self.feat1 = self.feat1.transpose((1,2,0)) # C x H x W -> H x W x C
62 | feat_bbox = self.extract_feat_ROI(self.feat1, self.bbox) # get gt features
63 |
64 | self.feat2 = self.feat2.reshape([self.feat2.shape[1], self.feat2.shape[2], self.feat2.shape[3]])
65 | self.feat2 = self.feat2.transpose((1,2,0)) # C x H x W -> H x W x C
66 |
67 | anchor = [1, 1]
68 | for i in range(np.int(self.num_scoremaps/3)):
69 | scale = self.min_scale + i*self.stride
70 | feat_map = self.extract_feat_map(self.feat2, scale, anchor) # get scaled feature maps
71 | score_map = ndimage.convolve(feat_bbox, feat_map, mode='reflect', cval=0.0)
72 | self.score_maps[i,...] = score_map
73 | self.scales[i] = scale
74 | self.anchors[i,...] = anchor
75 |
76 |
77 | # assign output
78 | top[0].data[...] = self.score_maps
79 | top[1].data[...] = self.scales
80 | top[2].data[...] = self.anchors
81 |
82 |
83 |
84 | def backward(self, top, propagate_down, bottom):
85 | pass
86 |
87 |
88 | def extract_feat_ROI(self, feat, bbox):
89 | bbox = np.round(bbox).astype(np.int32) # slice indices must be integers
90 | feat_bbox = feat[bbox[0]:bbox[2],bbox[1]:bbox[3],...]
91 |
92 | return feat_bbox
93 |
94 |
95 | def extract_feat_map(self, feat, scale, anchor):
96 | raise NotImplementedError('extract_feat_map has no body in the source')
97 |
98 |
99 |
--------------------------------------------------------------------------------
/python_layers/mask_data_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 |
9 | from PIL import Image
10 |
11 | class MaskDataLayer(caffe.Layer):
12 |
13 | def setup(self, bottom, top):
14 | # check input pair
15 | if len(bottom) != 2:
16 | raise Exception("Need two bottoms: label, rois")
17 | if len(top) != 2:
18 | raise Exception("Need two tops: mask_targets, weights")
19 | params = eval(self.param_str)
20 | self.mask_w = params.get('mask_w', 14)
21 | self.mask_h = params.get('mask_h', 14)
22 |
23 |
24 | def reshape(self, bottom, top):
25 | self.label = np.zeros_like(bottom[0].data, dtype=np.float32)
26 | self.rois = np.zeros_like(bottom[1].data, dtype=np.float32)
27 | self.mask_targets = np.zeros([self.rois.shape[0], self.mask_h, self.mask_w],dtype=np.uint8)
28 | self.weights = np.zeros([self.rois.shape[0], self.mask_h, self.mask_w],dtype=np.float32)
29 | top[0].reshape(self.rois.shape[0], self.mask_h, self.mask_w)
30 | top[1].reshape(self.rois.shape[0], self.mask_h, self.mask_w)
31 |
32 | def forward(self, bottom, top):
33 | self.label = bottom[0].data
34 | self.label = self.label.reshape(self.label.shape[1], self.label.shape[2])
35 | # print >> sys.stderr, self.label
36 | self.rois = bottom[1].data
37 | box_num = self.rois.shape[0]
38 | label_map = self.label
39 | for i in range(box_num):
40 | label_map_new = self.label_in_box(label_map, self.rois[i,...])
41 | label_map_new = imresize(label_map_new, size=(np.int(self.mask_h), np.int(self.mask_w)), interp="nearest", mode='F')
42 | label_map_new = np.array(label_map_new, dtype=np.uint8)
43 | # print label_map_new
44 | # pos = np.where(label_map_new == 1)
45 | # print >> sys.stderr, 'Number of positive pixels after resize: {}'.format(len(pos[0]))
46 | self.mask_targets[i,...] = label_map_new
47 | self.weights[i,...] = self.calculate_weight(label_map_new)
48 | # print self.mask_targets
49 | top[0].data[...] = self.mask_targets
50 | top[1].data[...] = self.weights
51 |
52 |
53 | def backward(self, top, propagate_down, bottom):
54 | pass
55 |
56 |
57 | def label_in_box(self, label, bb):
58 | bb = bb[1:5]
59 | label = Image.fromarray(label)
60 | # print >>sys.stderr, bb
61 | label = label.crop(bb)
62 | # label = np.array(label, dtype=np.uint8)
63 | # pos = np.where(label == 1)
64 | # print >> sys.stderr, 'Number of positive pixels: {}'.format(len(pos[0]))
65 |
66 | return label
67 |
68 |
69 |
70 | def calculate_weight(self, label):
71 | pos = np.where(label==1)
72 | neg = np.where(label==0)
73 | weight_pos = len(pos[0])*1.0/(len(pos[0])+len(neg[0]))
74 | weight = np.ones_like(label, dtype=np.float32)*weight_pos
75 | for i in range(len(pos[0])):
76 | weight[pos[0][i],pos[1][i]] = 1 - weight_pos
77 |
78 | print >> sys.stderr, 'pos_num = {}, neg_num = {}, weight_pos = {}'.format(len(pos[0]), len(neg[0]), 1 - weight_pos)
79 |
80 | return weight
81 |
--------------------------------------------------------------------------------
/python_layers/mask_data_layer2.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 |
9 | from PIL import Image
10 |
11 | class MaskDataLayer2(caffe.Layer):
12 |
13 | def setup(self, bottom, top):
14 | # check input pair
15 | if len(bottom) != 2:
16 | raise Exception("Need two bottoms: label, rois")
17 | if len(top) != 2:
18 | raise Exception("Need two tops: mask_targets, weights")
19 | params = eval(self.param_str)
20 | self.mask_w = params.get('mask_w', 14)
21 | self.mask_h = params.get('mask_h', 14)
22 |
23 |
24 | def reshape(self, bottom, top):
25 | self.label = np.zeros_like(bottom[0].data, dtype=np.float32)
26 | self.rois = np.zeros_like(bottom[1].data, dtype=np.float32)
27 | self.mask_targets = np.zeros([self.rois.shape[0], self.mask_h, self.mask_w],dtype=np.uint8)
28 | self.weights = np.zeros([self.rois.shape[0], self.mask_h, self.mask_w],dtype=np.float32)
29 | top[0].reshape(self.rois.shape[0], self.mask_h, self.mask_w)
30 | top[1].reshape(self.rois.shape[0], self.mask_h, self.mask_w)
31 |
32 | def forward(self, bottom, top):
33 | self.label = bottom[0].data
34 | self.label = self.label.reshape(self.label.shape[1], self.label.shape[2])
35 | # print >> sys.stderr, self.label
36 | self.rois = bottom[1].data
37 | box_num = self.rois.shape[0]
38 | label_map = self.label
39 | for i in range(box_num):
40 | label_map_new = self.label_in_box(label_map, self.rois[i,...])
41 | label_map_new = imresize(label_map_new, size=(np.int(self.mask_h), np.int(self.mask_w)), interp="nearest", mode='F')
42 | label_map_new = np.array(label_map_new, dtype=np.uint8)
43 | # print label_map_new
44 | # pos = np.where(label_map_new == 1)
45 | # print >> sys.stderr, 'Number of positive pixels after resize: {}'.format(len(pos[0]))
46 | self.mask_targets[i,...] = label_map_new
47 | self.weights[i,...] = self.calculate_weight(label_map_new)
48 | # print self.mask_targets
49 | top[0].data[...] = self.mask_targets
50 | top[1].data[...] = self.weights
51 |
52 |
53 | def backward(self, top, propagate_down, bottom):
54 | pass
55 |
56 |
57 | def label_in_box(self, label, bb):
58 | box = bb[1:5]
59 | #bb[0] = box[1]
60 | #bb[1] = box[0]
61 | #bb[3] = box[2]
62 | #bb[2] = box[3]
63 | #bb = bb[0:4]
64 | bb = box
65 | label = Image.fromarray(label)
66 | # print >>sys.stderr, bb
67 | label = label.crop(bb)
68 | # label = np.array(label, dtype=np.uint8)
69 | # pos = np.where(label == 1)
70 | # print >> sys.stderr, 'Number of positive pixels: {}'.format(len(pos[0]))
71 |
72 | return label
73 |
74 |
75 |
76 | def calculate_weight(self, label):
77 | pos = np.where(label==1)
78 | neg = np.where(label==0)
79 | weight_pos = len(pos[0])*1.0/(len(pos[0])+len(neg[0]))
80 | weight = np.ones_like(label, dtype=np.float32)*weight_pos
81 | for i in range(len(pos[0])):
82 | weight[pos[0][i],pos[1][i]] = 1 - weight_pos
83 |
84 | print >> sys.stderr, 'pos_num = {}, neg_num = {}, weight_pos = {}'.format(len(pos[0]), len(neg[0]), 1 - weight_pos)
85 |
86 | return weight
87 |
--------------------------------------------------------------------------------
/python_layers/readme:
--------------------------------------------------------------------------------
1 |
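Several data layers in this directory (for example `func_iou` in davis_siamese_data_layer3.py) score randomly sampled boxes against the ground-truth box by intersection over union, treating box corners as inclusive pixel coordinates. The block below is a minimal standalone sketch of that computation. The name `box_iou` is illustrative and is not part of the repository.

```python
def box_iou(bb, gtbb):
    """IoU of two boxes given as (x1, y1, x2, y2) with inclusive pixel corners."""
    # Width and height of the overlap; the +1 accounts for inclusive corners.
    iw = min(bb[2], gtbb[2]) - max(bb[0], gtbb[0]) + 1
    ih = min(bb[3], gtbb[3]) - max(bb[1], gtbb[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap

    area_bb = (bb[2] - bb[0] + 1) * (bb[3] - bb[1] + 1)
    area_gt = (gtbb[2] - gtbb[0] + 1) * (gtbb[3] - gtbb[1] + 1)
    union = area_bb + area_gt - iw * ih
    return float(iw * ih) / union
```

In `get_rois`, a sampled box is kept as a positive when this value exceeds 0.7 and as a negative when it is below 0.5.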
2 |
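`calculate_weight` in mask_data_layer.py and mask_data_layer2.py gives each pixel of a mask target a weight that offsets foreground/background imbalance: foreground pixels receive the background fraction, and all other pixels receive the foreground fraction. A vectorized sketch of the same idea follows. `balanced_weights` is a hypothetical name, and returning zeros for a mask that contains no 0 or 1 pixels is an assumption added here (the original would divide by zero in that case).

```python
import numpy as np


def balanced_weights(mask):
    """Per-pixel weights for a binary mask, mirroring calculate_weight."""
    mask = np.asarray(mask)
    pos = mask == 1
    neg = mask == 0
    total = int(pos.sum()) + int(neg.sum())
    if total == 0:
        # Assumption: no usable pixels, so every weight is zero.
        return np.zeros(mask.shape, dtype=np.float32)

    pos_frac = pos.sum() / float(total)
    # Every pixel starts at the foreground fraction ...
    weights = np.full(mask.shape, pos_frac, dtype=np.float32)
    # ... and foreground pixels are set to the background fraction.
    weights[pos] = 1.0 - pos_frac
    return weights
```

Boolean indexing replaces the per-pixel Python loop of the original, which matters little for 14 x 14 targets but keeps the sketch short.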
--------------------------------------------------------------------------------
/python_layers/response_value.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 |
9 | class ResValueLayer(caffe.Layer):
10 |
11 | def setup(self, bottom, top):
12 | # check input pair
13 | if len(bottom) != 1:
14 | raise Exception("Need one bottom.")
15 |
16 |
17 | def reshape(self, bottom, top):
18 | self.response = np.zeros_like(bottom[0].data, dtype=np.float32)
19 |
20 | def forward(self, bottom, top):
21 | self.response = bottom[0].data
22 | print >> sys.stderr,'Response Absolute Value Mean = {}; Response Value Range = [{}, {}];'.format(np.mean(np.abs(self.response)), np.min(self.response), np.max(self.response))
23 |
24 |
25 | def backward(self, top, propagate_down, bottom):
26 | pass
27 |
--------------------------------------------------------------------------------
/python_layers/siamese_online_hard_mining_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 |
9 | class SiaHardMiningLayer(caffe.Layer):
10 |
11 | def setup(self, bottom, top):
12 | # check input pair
13 | if len(bottom) != 3:
14 | raise Exception("Need three bottoms: label, feat1, feat2")
15 |
16 |
17 | def reshape(self, bottom, top):
18 | self.label = np.zeros_like(bottom[0].data, dtype=np.float32)
19 | self.label_new = np.zeros_like(bottom[0].data, dtype=np.float32)
20 | self.feat1 = np.zeros_like(bottom[1].data, dtype=np.float32)
21 | self.feat2 = np.zeros_like(bottom[2].data, dtype=np.float32)
22 | self.maintain_rate = 0.5
23 | self.H = 480
24 | self.W = 854
25 | self.pos = np.zeros([ 2,self.label.shape[0] ])
26 | top[0].reshape(bottom[0].shape[0]/2, 1)
27 | top[1].reshape(bottom[1].shape[0]/2, bottom[1].shape[1], bottom[1].shape[2], bottom[1].shape[3])
28 | top[2].reshape(bottom[1].shape[0]/2, bottom[1].shape[1], bottom[1].shape[2], bottom[1].shape[3])
29 |
30 |
31 | def forward(self, bottom, top):
32 | self.label = bottom[0].data
33 | self.feat1 = bottom[1].data
34 | self.feat2 = bottom[2].data
35 | label = self.label
36 | #print self.feat1.shape 300,4096,1,1
37 | #print >> sys.stderr, self.feat1
38 | #print >> sys.stderr, self.feat2
39 |
40 | dist = np.zeros([len(self.label),1], dtype=np.float32)
41 | for i in range(len(self.label)):
42 | vector1 = np.float32(mat(self.feat1[i,...,0]))
43 | vector2 = np.float32(mat(self.feat2[i,...,0]))
44 | dist[i] = sqrt(((vector1-vector2).T)*(vector1-vector2))
45 |
46 |
47 | pos = np.where(self.label == 1)
48 | D_sim = dist[pos]
49 | thre = np.median(D_sim)
50 | for i in range(len(pos[0])):
51 | if D_sim[i] < thre:
52 | label[pos[0][i]] = -1
53 |
54 | pos = np.where(self.label == 0)
55 | D_diff= dist[pos]
56 | thre = np.median(D_diff)
57 | for i in range(len(pos[0])):
58 | if D_diff[i] > thre:
59 | label[pos[0][i]] = -1
60 |
61 | # print>> sys.stderr, self.label
62 |
63 | pos1 = np.where(label == 1)
64 | pos2 = np.where(label == 0)
65 | if len(pos1[0])>len(pos2[0]):
66 | pos = np.zeros_like(pos2)
67 | pos[0][...] = pos1[0][0:len(pos2[0])]
68 | pos[1][...] = pos1[1][0:len(pos2[0])]
69 | pos1 = pos
70 |
71 | if len(pos2[0])>len(pos1[0]):
72 | pos = np.zeros_like(pos1)
73 | pos[0][...] = pos2[0][0:len(pos1[0])]
74 | pos[1][...] = pos2[1][0:len(pos1[0])]
75 | pos2 = pos
76 |
77 |
78 | self.pos = np.concatenate((pos1, pos2), axis = 1)
79 | self.label_new = self.label[self.pos[0]]
80 | self.feat1 = self.feat1[self.pos[0],...]
81 | self.feat2 = self.feat2[self.pos[0],...]
82 | # print>> sys.stderr, self.label_new.shape
83 |
84 | top[0].reshape(*self.label_new.shape)
85 | top[0].data[...] = self.label_new
86 | top[1].reshape(*self.feat1.shape)
87 | top[1].data[...] = self.feat1
88 | top[2].reshape(*self.feat2.shape)
89 | top[2].data[...] = self.feat2
90 |
91 |
92 |
93 | def backward(self, top, propagate_down, bottom):
94 | for i in range(3):
95 | if not propagate_down[i]:
96 | continue
97 |
98 | if i == 0:
99 | continue
100 |
101 | bottom[i].diff[...] = np.zeros_like(bottom[i].data, dtype=np.float32)
102 | # print >> sys.stderr, top[i].diff.shape
103 | # print >> sys.stderr, bottom[i].diff.shape
104 | for j in range(len(self.pos[0])):
105 | bottom[i].diff[self.pos[0][j],...] += top[i].diff[j,...]
106 |
107 |
108 |
--------------------------------------------------------------------------------
/python_layers/slice_layer.py:
--------------------------------------------------------------------------------
1 | import caffe
2 | import numpy as np
3 | from numpy import *
4 | import math
5 | from scipy.misc import imresize
6 | import sys
7 | import random
8 |
9 | class SliceLayer(caffe.Layer):
10 |
11 | def setup(self, bottom, top):
12 | # check input pair
13 | if len(bottom) != 1:
14 | raise Exception("Need one bottom: feat_concat")
15 |
16 |
17 | def reshape(self, bottom, top):
18 | self.feats = np.zeros_like(bottom[0].data, dtype=np.float32)
19 | top[0].reshape(self.feats.shape[0]-1,self.feats.shape[1],self.feats.shape[2],self.feats.shape[3])
20 | top[1].reshape(self.feats.shape[0]-1,self.feats.shape[1],self.feats.shape[2],self.feats.shape[3])
21 |
22 |
23 | def forward(self, bottom, top):
24 | self.feats = bottom[0].data
25 | num_obj = self.feats.shape[0]-1
26 | self.feat2 = self.feats[1:self.feats.shape[0],...]
27 | self.feat1 = np.zeros_like(self.feat2,dtype=np.float32)
28 |
29 | print >> sys.stderr, 'Number of objects: {}'.format(num_obj)
30 |
31 | for obj_id in range(num_obj):
32 | self.feat1[obj_id,...] = self.feats[0,...]
33 |
34 | top[0].reshape(*self.feat1.shape)
35 | top[0].data[...] = self.feat1
36 | top[1].reshape(*self.feat2.shape)
37 | top[1].data[...] = self.feat2
38 |
39 |
40 | def backward(self, top, propagate_down, bottom):
41 | pass
42 |
--------------------------------------------------------------------------------
/results/download_results.sh:
--------------------------------------------------------------------------------
1 | if [ ! -d "favos_baseline" ]; then
2 | wget https://www.dropbox.com/s/9zwob31bz91u75h/favos.tar
3 | tar -xf favos.tar
4 | rm -f favos.tar
5 | fi
6 |
7 |
--------------------------------------------------------------------------------
/test_davis16.sh:
--------------------------------------------------------------------------------
1 | # Demo script for FAVOS
2 | # Usage: sh test_davis16.sh {GPU-id} {sequence-name}
3 |
4 | cd demo && \
5 | python infer_davis16.py ../models/ROISegNet_2016.caffemodel deploy.prototxt $2 $1 && \
6 | matlab -nojvm -nodesktop -nodisplay -r "combine_mask('$2')" && \
7 | cd ../
8 |
--------------------------------------------------------------------------------