├── CSPN.pdf
├── LICENSE
├── README.md
├── approach.PNG
├── argmax_layer_CSPN.py
├── create_saliency_images.py
├── create_saliency_raw.py
├── evaluate_VOC_val.py
├── projection_layer_CSPN.py
├── results.PNG
├── softmax_loss_CSPN.py
├── train.prototxt
└── train_weighted_argmax_softmax.prototxt
/CSPN.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/briqr/CSPN/d3d01e5a4e29d0c2ee4f1dfda1f2e7815163d346/CSPN.pdf
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 Rania Briq
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Convolutional Simplex Projection Network (CSPN)
2 | In this work we propose an optimization approach for solving weakly supervised semantic segmentation with deep Convolutional Neural Networks (CNNs).
3 | The method introduces a novel layer which applies simplex projection to the output of a neural network in order to satisfy
4 | object size constraints. Although we demonstrate the performance of the approach in a semantic segmentation application,
5 | the method is generic and can be integrated into any given network architecture. The goal of introducing such a layer is to
6 | reduce the set of feasible solutions to only those that satisfy some given constraints.
7 | This implementation builds on [SEC](https://github.com/kolesman/SEC).
8 | Further details on the approach can be found in the paper, which was accepted at BMVC 2018.
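
To make the size constraint concrete, here is a minimal sketch of a Euclidean projection onto a scaled simplex, i.e. the set of non-negative score maps whose entries sum to a target object size `nu`. It uses the common sort-based algorithm, not the randomized-pivot routine in `projection_layer_CSPN.py`, and the function name `project_to_simplex` is hypothetical (it does not exist in this repository). It assumes `nu > 0`.

```python
import numpy as np


def project_to_simplex(v, nu):
    """Project v onto {x : x >= 0, sum(x) == nu} in the Euclidean sense (nu > 0)."""
    flat = np.asarray(v, dtype=np.float64).ravel()
    u = np.sort(flat)[::-1]              # scores in descending order
    css = np.cumsum(u)                   # running sums of the sorted scores
    k = np.arange(1, flat.size + 1)
    # Entries that remain positive after subtracting the shared shift
    support = u - (css - nu) / k > 0
    rho = k[support][-1]                 # size of the support of the projection
    tau = (css[rho - 1] - nu) / rho      # shift applied to every entry
    return np.maximum(flat - tau, 0.0).reshape(np.shape(v))


heatmap = np.array([[0.9, 0.8], [0.7, 0.1]])
projected = project_to_simplex(heatmap, 2.0)
print(projected, projected.sum())
```

When the scores already sum to less than `nu`, the shift `tau` is negative and every entry is raised by the same amount, which corresponds to the `theta < nu` branch of the layer in this repository.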
9 |
10 |
11 |
12 | ## Instructions:
13 |
14 | To use our algorithm, you need to train a modified version of the SEC CNN. First, a few preprocessing steps are required.
15 |
16 |
17 | To obtain object size estimates, we rely on an external module for which you need to download the trained model.
18 | 1. Download the [DCSM library](https://github.com/shimoda-uec/dcsm) and the [DCSM trained model](http://mm.cs.uec.ac.jp/shimoda-k/models/mp512_iter_20000.caffemodel). Place the files `create_saliency_raw.py` and `create_saliency_images.py` in the DCSM folder.
19 |
20 | 2. To create the saliency maps, run the file `create_saliency_raw.py` after updating the paths to match your machine.
21 |
22 | These are raw class-specific saliency maps; we merge them into a single image for each input image. To do that, run:
23 |
24 | 3. `create_saliency_images.py`.
25 |
26 | In this file, you need to specify a threshold value: pixels scoring above it count as salient, and all others as background. In our experiments the threshold value **0.125** performed best. Note that multiple classes could be above this threshold, so to prevent a pixel from being assigned to more than one class concurrently, we take the class with the maximum score.
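
The thresholding and per-pixel maximum described above can be sketched as follows. This is an illustrative simplification rather than the script itself: the function name `merge_saliency_maps` is hypothetical, the maps are assumed to be stacked into one array with index 0 reserved for background, and saliency scores are assumed to lie in `[0, 1]` so that a background score of 2 always wins where no class is salient.

```python
import numpy as np


def merge_saliency_maps(class_maps, threshold=0.125):
    """Merge per-class saliency maps of shape (num_classes, H, W) into one label map."""
    maps = np.array(class_maps, dtype=np.float32)    # copy, so the input is untouched
    foreground = maps[1:]                            # view onto the object classes
    foreground[foreground < threshold] = 0.0         # suppress weak responses
    salient = (foreground >= threshold).any(axis=0)  # pixels claimed by some class
    maps[0] = np.where(salient, 0.0, 2.0)            # background wins everywhere else
    return np.argmax(maps, axis=0)                   # strongest class per pixel


maps = np.zeros((3, 2, 2))
maps[1] = [[0.9, 0.1], [0.2, 0.0]]
maps[2] = [[0.5, 0.05], [0.6, 0.0]]
print(merge_saliency_maps(maps))
```

Taking the arg-max over the stacked maps resolves pixels where several classes exceed the threshold in favour of the highest-scoring class.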
27 |
28 | 4. Download [SEC](https://github.com/kolesman/SEC) and replace their `train.prototxt` with our `train.prototxt`.
29 |
30 | 5. You may now start training your segmentation CNN by typing in the command:
31 |
32 | `~/libs/caffe/build/tools/caffe train --solver solver.prototxt -weights vgg16_20M.caffemodel`
33 |
34 | (Make sure that Caffe Python layers (*_CSPN.py as well as SEC layers) are added to your system path so that Caffe can detect them.)
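
Before launching training, it can help to confirm that the layer modules named in `train.prototxt` (`pylayers`, `projection_layer_CSPN`, `argmax_layer_CSPN`) are actually visible on the Python path. The helper below is a hypothetical sketch, not part of this repository or of Caffe. It only looks for plain `<name>.py` files in each path entry and does not import anything.

```python
import os
import sys


def find_layer_modules(module_names, search_path=None):
    """Map each module name to the first directory containing <name>.py, or None."""
    dirs = sys.path if search_path is None else search_path
    found = {}
    for name in module_names:
        found[name] = None
        for d in dirs:
            # An empty path entry stands for the current working directory
            if os.path.isfile(os.path.join(d or ".", name + ".py")):
                found[name] = d
                break
    return found


layers = ["pylayers", "projection_layer_CSPN", "argmax_layer_CSPN"]
for name, location in find_layer_modules(layers).items():
    print(name, "->", location if location is not None else "NOT FOUND")
```

Any module reported as not found needs its directory added to `PYTHONPATH` (or to `sys.path`) before `caffe train` is started.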
35 |
36 | 6. After training is finished, you can run the code in `evaluate_VOC_val.py` to evaluate the trained CNN on the Pascal VOC validation set.
37 |
38 | 7. Alternatively, you can download the [trained model](https://1drv.ms/u/s!AkaKZSdGrfMlhC-hsevrwVm1fdt1).
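
The per-image score printed by the evaluation script in step 6 is the overlap between predicted and ground-truth foreground pixels. A minimal sketch of that measure is given below. The function name `foreground_iou` is hypothetical, and returning 1.0 when both masks are entirely background is an assumption added here to avoid a division by zero.

```python
import numpy as np


def foreground_iou(pred, gt):
    """Intersection over union of the non-background (label != 0) pixels."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    match = (pred == gt) & (pred != 0)   # foreground pixels with the correct label
    union = (pred != 0) | (gt != 0)      # pixels that are foreground in either map
    n_union = np.count_nonzero(union)
    if n_union == 0:                     # assumption: two empty masks agree perfectly
        return 1.0
    return np.count_nonzero(match) / float(n_union)


pred = np.array([[1, 0], [2, 2]])
gt = np.array([[1, 1], [2, 0]])
print("{}%".format(100 * foreground_iou(pred, gt)))
```

This is a per-image figure. It is not the class-wise mean IoU usually reported for Pascal VOC, which is accumulated over the whole validation set.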
39 |
40 | The optimal performance when integrating the new approach is reached at an early stage (iteration 3000), whereas the baseline approach required a larger number of iterations. This fast convergence can be attributed to the fact that the space of feasible solutions is restricted to those within the constraint set.
41 |
42 |
43 |
44 | For remarks or questions, please drop me an email (briq@iai.uni-bonn.de).
45 |
46 | If you use our approach or rely on it, please cite our paper:
47 |
48 | ```
49 | @ARTICLE{CSPN2018,
50 | author = {{Briq}, R. and {Moeller}, M. and {Gall}, J.},
51 | title = "{Convolutional Simplex Projection Network (CSPN) for Weakly Supervised Semantic Segmentation}",
52 | journal = {arXiv preprint arXiv:1807.09169},
53 | year = {2018}
54 | }
55 | ```
56 |
57 |
--------------------------------------------------------------------------------
/approach.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/briqr/CSPN/d3d01e5a4e29d0c2ee4f1dfda1f2e7815163d346/approach.PNG
--------------------------------------------------------------------------------
/argmax_layer_CSPN.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.insert(0,'/home/briq/libs/caffe/python')
3 | import caffe
4 | import random
5 | import numpy as np
6 | import scipy.misc
7 |
8 |
9 | class ArgmaxLayer(caffe.Layer):
10 |
11 |
12 |     def setup(self, bottom, top):
13 |         pass
14 |
15 |
16 |     def reshape(self, bottom, top):
17 |         top[0].reshape(bottom[0].num, bottom[0].shape[2], bottom[0].shape[3])
18 |
19 |
20 |
21 |     def forward(self, bottom, top):
22 |         top[0].data[...] = np.argmax(bottom[0].data[...], axis=1)
23 |
24 |
25 |     def backward(self, top, propagate_down, bottom):
26 |         propagate_down = False
27 |         pass
28 |
--------------------------------------------------------------------------------
/create_saliency_images.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import numpy as np
3 | import scipy.misc
4 | import scipy.ndimage as nd
5 | import os.path
6 | import scipy.io as sio
7 |
8 |
9 |
10 |
11 | saliency_path = '/media/VOC/saliency/raw_maps/' # the path of the raw class-specific saliency maps, created by create_saliency_raw.py
12 | save_path = '/media/VOC/saliency/thresholded_saliency_images/' # the path where combined class-specific saliency maps will be saved after thresholding
13 | dataset_path = 'val.txt'
14 | size = 41 #corresponds to the dimension of fc8 in the CNN
15 |
16 | with open(dataset_path) as fp:
17 |     images = fp.readlines()
18 | for im_id in range(len(images)):
19 |     import os
20 |
21 |     im_name = images[im_id].split(' ')[0].split('.')[0].split('/')[2]
22 |
23 |     saliency_ims = []
24 |     threshold = 0.125
25 |     bkg = np.ones((size, size))*2
26 |     for c in range(20):
27 |         if(c==0):
28 |             saliency_ims.append(np.zeros((size,size)))
29 |             continue
30 |
31 |         saliency_name = saliency_path+im_name+'_' + str(c)+'.mat'
32 |         if (not os.path.isfile(saliency_name)):
33 |             saliency_ims.append(np.ones((size,size)))
34 |             saliency_ims[c] *= -2 # just to make sure non-occurring classes never turn up in the argmax operation
35 |             continue
36 |
37 |         saliency_map = sio.loadmat(saliency_name, squeeze_me=True)['data'] #
38 |
39 |         saliency_map = nd.zoom(saliency_map.astype('float32'), (size / float(saliency_map.shape[0]), size / float(saliency_map.shape[1]) ), order=1)
40 |         saliency_map[saliency_map<threshold] = 0
41 |         bkg[np.where(saliency_map>=threshold)]=0 # mark the saliency pixels as non background
42 |         saliency_ims.append(saliency_map)
43 |     saliency_ims[0] = bkg
44 |     total_name = save_path+im_name+'.mat'
45 |     total_im=np.argmax(saliency_ims, axis=0)
46 |     sio.savemat(total_name , {'data':total_im})
47 |
--------------------------------------------------------------------------------
/create_saliency_raw.py:
--------------------------------------------------------------------------------
1 | #Creates the saliency raw maps from DCSM CNN, you can download the library and prototxt from https://github.com/shimoda-uec/dcsm, and the trained model from
2 | #http://mm.cs.uec.ac.jp/shimoda-k/models/mp512_iter_20000.caffemodel
3 | import sys
4 | sys.path.append('/home/briq/libs/dcsm_saliency/caffe/python')
5 | import os
6 | import numpy as np
7 | import scipy.ndimage as nd
8 | import caffe
9 | import scipy.io as sio
10 |
11 | cm = './mp512_iter_20000.caffemodel'
12 | proto='./gdep.prototxt'
13 | caffe.set_mode_gpu()
14 |
15 |
16 | bsize=512
17 | mean = np.zeros((3,int(bsize),int(bsize)))
18 | mean[0,:,:]=104.00699
19 | mean[1,:,:]=116.66877
20 | mean[2,:,:]=122.67892
21 | channel_swap = [2,1,0]
22 | center_only=False
23 | input_scale=None
24 | image_dims=[bsize,bsize]
25 | ims=(int(bsize),int(bsize))
26 | raw_scale=255.0
27 | data = caffe.Classifier(proto,cm, image_dims=image_dims,mean=mean,
28 | input_scale=input_scale,
29 | raw_scale=raw_scale,
30 | channel_swap=channel_swap)
31 |
32 |
33 | size = 41
34 | orig_img_path = '/media/datasets/VOC2012/JPEGImages/' # the path to your VOC pascal input images
35 | img_list_path = 'input_list.txt'
36 | save_path= '/media/VOC/saliency/raw_maps/'
37 | with open(img_list_path) as f:
38 |     content = f.readlines()
39 | f.close()
40 | content = [x.strip() for x in content]
41 |
42 | for line in content:
43 |     img_name = line.strip().split('.')[0]
44 |     img_full_name = orig_img_path + img_name + '.jpg'
45 |     im = [caffe.io.load_image(img_full_name)]
46 |     im2=[caffe.io.resize_image(im[0], ims)]
47 |     im3 = np.zeros((1,ims[0],ims[1],im2[0].shape[2]),dtype=np.float32)
48 |     im3[0]=im2[0]
49 |     caffe_in = np.zeros(np.array(im3.shape)[[0, 3, 1, 2]],dtype=np.float32)
50 |     caffe_in[0]=data.transformer.preprocess('data', im3[0])
51 |     out = data.forward_all(**{'data': caffe_in})
52 |     map=data.blobs['dcsmn'].data
53 |     id=data.blobs['sortid'].data
54 |
55 |     for i in range(map.shape[0]):
56 |         sn=save_path+img_name+'_'+str(int(id[0,i,0,0])+1)+'.mat'
57 |         sio.savemat(sn, {'data':map[i,0,:,:]})
58 |
59 |
--------------------------------------------------------------------------------
/evaluate_VOC_val.py:
--------------------------------------------------------------------------------
1 | # You can use this code to evaluate the trained model of CSPN on VOC validation data, adapted from SEC
2 |
3 | import numpy as np
4 | import pylab
5 |
6 | import scipy.ndimage as nd
7 |
8 |
9 | import imageio
10 | from matplotlib import pyplot as plt
11 | from matplotlib import colors as mpl_colors
12 |
13 | import krahenbuhl2013
14 | import sys
15 | sys.path.insert(0,'/home/briq/libs/caffe/python')
16 | import caffe
17 | import scipy
18 |
19 | caffe.set_device(0)
20 | caffe.set_mode_gpu()
21 |
22 |
23 |
24 | voc_classes = [ 'background',
25 | 'aeroplane',
26 | 'bicycle',
27 | 'bird',
28 | 'boat',
29 | 'bottle',
30 | 'bus',
31 | 'car',
32 | 'cat',
33 | 'chair',
34 | 'cow',
35 | 'diningtable',
36 | 'dog',
37 | 'horse',
38 | 'motorbike',
39 | 'person',
40 | 'pottedplant',
41 | 'sheep',
42 | 'sofa',
43 | 'train',
44 | 'tvmonitor',
45 | ]
46 | max_label = 20
47 |
48 | mean_pixel = np.array([104.0, 117.0, 123.0])
49 |
50 | palette = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.5, 0.0),
51 | (0.0, 0.0, 0.5), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5), (0.5, 0.5, 0.5),
52 | (0.25, 0.0, 0.0), (0.75, 0.0, 0.0), (0.25, 0.5, 0.0), (0.75, 0.5, 0.0),
53 | (0.25, 0.0, 0.5), (0.75, 0.0, 0.5), (0.25, 0.5, 0.5), (0.75, 0.5, 0.5),
54 | (0.0, 0.25, 0.0), (0.5, 0.25, 0.0), (0.0, 0.75, 0.0), (0.5, 0.75, 0.0),
55 | (0.0, 0.25, 0.5)]
56 | my_cmap = mpl_colors.LinearSegmentedColormap.from_list('Custom cmap', palette, 21)
57 |
58 |
59 |
60 | def preprocess(image, size, mean_pixel=mean_pixel):
61 |
62 |     image = np.array(image)
63 |
64 |     image = nd.zoom(image.astype('float32'),
65 |                     (size / float(image.shape[0]),
66 |                      size / float(image.shape[1]), 1.0),
67 |                     order=1)
68 |
69 |     image = image[:, :, [2, 1, 0]]
70 |     image = image - mean_pixel
71 |
72 |     image = image.transpose([2, 0, 1])
73 |     return image
74 |
75 |
76 | def predict_mask(image_file, net, smooth=True):
77 |
78 |     im = pylab.imread(image_file)
79 |
80 |     net.blobs['images'].data[0] = preprocess(im, 321)
81 |     net.forward()
82 |
83 |     scores = np.transpose(net.blobs['fc8-SEC'].data[0], [1, 2, 0])
84 |     d1, d2 = float(im.shape[0]), float(im.shape[1])
85 |     scores_exp = np.exp(scores - np.max(scores, axis=2, keepdims=True))
86 |     probs = scores_exp / np.sum(scores_exp, axis=2, keepdims=True)
87 |     probs = nd.zoom(probs, (d1 / probs.shape[0], d2 / probs.shape[1], 1.0), order=1)
88 |
89 |     eps = 0.00001
90 |     probs[probs < eps] = eps
91 |
92 |     if smooth:
93 |         result = np.argmax(krahenbuhl2013.CRF(im, np.log(probs), scale_factor=1.0), axis=2)
94 |     else:
95 |         result = np.argmax(probs, axis=2)
96 |
97 |     return result
98 |
99 | def evaluate(res, gt_img):
100 |     intersect_gt_res = np.sum( (res == gt_img) & (res!=0) & (gt_img!=0) )
101 |     union_gt_res = np.sum( (res!=0) | (gt_img!=0) )
102 |     acc = float(intersect_gt_res) / union_gt_res
103 |     return acc
104 |
105 |
106 |
107 | model = '/home/briq/libs/CSPN/training/models/model_iter_3000.caffemodel'
108 | draw = False
109 | smoothing = True
110 |
111 |
112 | if __name__ == "__main__":
113 |
114 |     num_classes = len(voc_classes)
115 |     gt_path = '/media/datasets/VOC2012/SegmentationClassAug/'
116 |
117 |     orig_img_path = '/media/datasets/VOC2012/JPEGImages/'
118 |     img_list_path = '/home/briq/libs/CSPN/list/val_id.txt'
119 |
120 |     with open(img_list_path) as f:
121 |         content = f.readlines()
122 |     f.close()
123 |     content = [x.strip() for x in content]
124 |
125 |     num_ims = 0
126 |
127 |     cspn_net = caffe.Net('deploy.prototxt', model, caffe.TEST)
128 |
129 |     for line in content:
130 |         img_name = line.strip()
131 |
132 |
133 |         gt_name = gt_path + img_name
134 |         gt_name = gt_name + '.png'
135 |
136 |         gt_img = imageio.imread(gt_name)
137 |
138 |
139 |         orig_img_name = orig_img_path + img_name
140 |         orig_img_name = orig_img_name + '.jpg'
141 |         res = predict_mask(orig_img_name, cspn_net, smooth=smoothing)
142 |
143 |         num_ims += 1
144 |         if(num_ims%100==0):
145 |             print '-----------------im:{}---------------------\n'.format(num_ims)
146 |
147 |
148 |         acc = evaluate(res, gt_img)
149 |
150 |         print img_name, str(num_ims), "{}%\n".format(acc*100)
151 |
152 |         if draw:
153 |             fig = plt.figure()
154 |             ax = fig.add_subplot('221')
155 |             ax.imshow(pylab.imread(orig_img_name))
156 |             plt.title('image')
157 |
158 |             ax = fig.add_subplot('222')
159 |             ax.matshow(gt_img, vmin=0, vmax=21, cmap=my_cmap)
160 |             plt.title('GT')
161 |
162 |             ax = fig.add_subplot('223')
163 |             ax.matshow(res, vmin=0, vmax=21, cmap=my_cmap)
164 |             plt.title('CSPN')
165 |
166 |
167 |             plt.show()
168 |
169 |
170 |
171 |
--------------------------------------------------------------------------------
/projection_layer_CSPN.py:
--------------------------------------------------------------------------------
1 | # the simplex projection algorithm implemented as a layer, while using the saliency maps to obtain object size estimates
2 | import sys
3 | sys.path.insert(0,'/home/briq/libs/caffe/python')
4 | import caffe
5 | import random
6 | import numpy as np
7 | import scipy.misc
8 | import imageio
9 | import cv2
10 | import scipy.ndimage as nd
11 | import os.path
12 | import scipy.io as sio
13 | class SimplexProjectionLayer(caffe.Layer):
14 |
15 |
16 | saliency_path = '/media/VOC/saliency/thresholded_saliency_images/'
17 | input_list_path = '/home/briq/libs/CSPN/training/input_list.txt'
18 |
19 | def simplexProjectionLinear(self, data_ind, class_ind, V_im, nu):
20 | if(nu<1):
21 | return V_im
22 |
23 | heatmap_size = V_im.shape[0]*V_im.shape[1]
24 | theta = np.sum(V_im)
25 | if(theta ==nu): # the size constrain is already satisfied
26 | return V_im
27 | if(theta < nu):
28 | pi = V_im+(nu-theta)/heatmap_size
29 | return pi
30 |
31 | V = V_im.flatten()
32 | s = 0.0
33 | p = 0.0
34 | U=V
35 |
36 | while(len(U) > 0):
37 | k = random.randint(0, len(U)-1)
38 | uk = U[k]
39 | UG = U[U>=uk]
40 | delta_p = len(UG)
41 | delta_s = np.sum(UG)
42 | if ((s+delta_s)-(p+delta_p)*uk0.5): # the label is there
83 | instance = bottom[0].data[i][c]
84 | nu = np.sum(saliency_im==c)
85 | if(nu>1):
86 | instance = bottom[0].data[i][c]
87 | top[0].data[i][c]= self.simplexProjectionLinear(i, c, instance, nu)
88 |
89 |
90 |
91 | def backward(self, top, propagate_down, bottom):
92 | pass
--------------------------------------------------------------------------------
/results.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/briqr/CSPN/d3d01e5a4e29d0c2ee4f1dfda1f2e7815163d346/results.PNG
--------------------------------------------------------------------------------
/softmax_loss_CSPN.py:
--------------------------------------------------------------------------------
1 | import sys
2 | sys.path.insert(0,'/home/briq/libs/caffe/python')
3 | import caffe
4 | import numpy as np
5 |
6 | class ProjectionSoftmax(caffe.Layer):
7 |
8 |     def softmax(self, scores):
9 |         probs = np.zeros(scores.shape)
10 |         maxScore = np.max(scores, axis=0)
11 |         scores -= maxScore
12 |         scores = np.exp(scores)
13 |         sum_scores = np.sum(scores, axis=0)
14 |         probs = scores / sum_scores
15 |         return probs
16 |
17 |
18 |     def setup(self, bottom, top):
19 |         self.normalization = bottom[0].num*bottom[0].shape[2]*bottom[0].shape[3]
20 |
21 |
22 |     def reshape(self, bottom, top):
23 |         if bottom[0].num != bottom[1].num:
24 |             raise Exception("The true label dimension must be equal to the output dimension")
25 |         top[0].reshape(1)
26 |
27 |
28 |
29 |
30 |     def forward(self, bottom, top):
31 |         self.projected_probs = np.zeros(bottom[1].data.shape)
32 |         for i in range(bottom[0].num):
33 |             self.projected_probs[i] = self.softmax(bottom[1].data[i].copy())
34 |
35 |         accum_loss = np.sum(-bottom[0].data*(np.log(self.projected_probs )))
36 |
37 |         top[0].data[...] = accum_loss / self.normalization
38 |
39 |
40 |
41 |
42 |     def backward(self, top, propagate_down, bottom):
43 |         is_argmax = False
44 |         if(not is_argmax): # softmax
45 |             bottom[0].diff[...] = -(self.projected_probs[...]-bottom[0].data[...])
46 |         else: # hard argmax
47 |             max_label = np.argmax(bottom[1].data, axis=1)
48 |             bottom[0].diff[...] = -(self.projected_probs[...]-bottom[0].data[...])
49 |             bottom[0].diff[:,max_label,:,:] -= 1
50 |         bottom[0].diff[...] /= self.normalization
51 |
--------------------------------------------------------------------------------
/train.prototxt:
--------------------------------------------------------------------------------
1 | name: "CSPN"
2 |
3 | layer {
4 | name: "Input"
5 | type: "ImageData"
6 | top: "images"
7 | top: "image_ids"
8 | transform_param {
9 | mirror: false
10 | mean_value: 104
11 | mean_value: 117
12 | mean_value: 123
13 | }
14 | image_data_param {
15 | root_folder: "/media/datasets/VOC2012/JPEGImages/"
16 | source: "/home/briq/libs/CSPN/training/input_list.txt"
17 | batch_size: 15
18 | new_height: 321
19 | new_width: 321
20 | shuffle: true
21 | }
22 | }
23 |
24 | layer {
25 | type: "Python"
26 | name: 'Annotation'
27 | bottom: 'image_ids'
28 | top: 'labels'
29 | top: 'cues'
30 | python_param {
31 | module: 'pylayers'
32 | layer: 'AnnotationLayer'
33 | }
34 | propagate_down: 0
35 | }
36 |
37 | layer {
38 | bottom: "images"
39 | top: "conv1_1"
40 | name: "conv1_1"
41 | type: "Convolution"
42 | param { lr_mult: 1 decay_mult: 1 }
43 | param { lr_mult: 2 decay_mult: 0 }
44 | convolution_param {
45 | num_output: 64
46 | pad: 1
47 | kernel_size: 3
48 | }
49 | }
50 |
51 | layer {
52 | bottom: "conv1_1"
53 | top: "conv1_1"
54 | name: "relu1_1"
55 | type: "ReLU"
56 | }
57 |
58 | layer {
59 | bottom: "conv1_1"
60 | top: "conv1_2"
61 | name: "conv1_2"
62 | type: "Convolution"
63 | param { lr_mult: 1 decay_mult: 1 }
64 | param { lr_mult: 2 decay_mult: 0 }
65 | convolution_param {
66 | num_output: 64
67 | pad: 1
68 | kernel_size: 3
69 | }
70 | }
71 |
72 | layer {
73 | bottom: "conv1_2"
74 | top: "conv1_2"
75 | name: "relu1_2"
76 | type: "ReLU"
77 | }
78 |
79 | layer {
80 | bottom: "conv1_2"
81 | top: "pool1"
82 | name: "pool1"
83 | type: "Pooling"
84 | pooling_param {
85 | pool: MAX
86 | kernel_size: 3
87 | stride: 2
88 | pad: 1
89 | }
90 | }
91 |
92 | layer {
93 | bottom: "pool1"
94 | top: "conv2_1"
95 | name: "conv2_1"
96 | type: "Convolution"
97 | param { lr_mult: 1 decay_mult: 1 }
98 | param { lr_mult: 2 decay_mult: 0 }
99 | convolution_param {
100 | num_output: 128
101 | pad: 1
102 | kernel_size: 3
103 | }
104 | }
105 |
106 | layer {
107 | bottom: "conv2_1"
108 | top: "conv2_1"
109 | name: "relu2_1"
110 | type: "ReLU"
111 | }
112 |
113 | layer {
114 | bottom: "conv2_1"
115 | top: "conv2_2"
116 | name: "conv2_2"
117 | type: "Convolution"
118 | param { lr_mult: 1 decay_mult: 1 }
119 | param { lr_mult: 2 decay_mult: 0 }
120 | convolution_param {
121 | num_output: 128
122 | pad: 1
123 | kernel_size: 3
124 | }
125 | }
126 |
127 | layer {
128 | bottom: "conv2_2"
129 | top: "conv2_2"
130 | name: "relu2_2"
131 | type: "ReLU"
132 | }
133 |
134 | layer {
135 | bottom: "conv2_2"
136 | top: "pool2"
137 | name: "pool2"
138 | type: "Pooling"
139 | pooling_param {
140 | pool: MAX
141 | kernel_size: 3
142 | stride: 2
143 | pad: 1
144 | }
145 | }
146 |
147 | layer {
148 | bottom: "pool2"
149 | top: "conv3_1"
150 | name: "conv3_1"
151 | type: "Convolution"
152 | param { lr_mult: 1 decay_mult: 1 }
153 | param { lr_mult: 2 decay_mult: 0 }
154 | convolution_param {
155 | num_output: 256
156 | pad: 1
157 | kernel_size: 3
158 | }
159 | }
160 |
161 | layer {
162 | bottom: "conv3_1"
163 | top: "conv3_1"
164 | name: "relu3_1"
165 | type: "ReLU"
166 | }
167 |
168 | layer {
169 | bottom: "conv3_1"
170 | top: "conv3_2"
171 | name: "conv3_2"
172 | type: "Convolution"
173 | param { lr_mult: 1 decay_mult: 1 }
174 | param { lr_mult: 2 decay_mult: 0 }
175 | convolution_param {
176 | num_output: 256
177 | pad: 1
178 | kernel_size: 3
179 | }
180 | }
181 |
182 | layer {
183 | bottom: "conv3_2"
184 | top: "conv3_2"
185 | name: "relu3_2"
186 | type: "ReLU"
187 | }
188 |
189 | layer {
190 | bottom: "conv3_2"
191 | top: "conv3_3"
192 | name: "conv3_3"
193 | type: "Convolution"
194 | param { lr_mult: 1 decay_mult: 1 }
195 | param { lr_mult: 2 decay_mult: 0 }
196 | convolution_param {
197 | num_output: 256
198 | pad: 1
199 | kernel_size: 3
200 | }
201 | }
202 |
203 | layer {
204 | bottom: "conv3_3"
205 | top: "conv3_3"
206 | name: "relu3_3"
207 | type: "ReLU"
208 | }
209 |
210 | layer {
211 | bottom: "conv3_3"
212 | top: "pool3"
213 | name: "pool3"
214 | type: "Pooling"
215 | pooling_param {
216 | pool: MAX
217 | kernel_size: 3
218 | stride: 2
219 | pad: 1
220 | }
221 | }
222 |
223 | layer {
224 | bottom: "pool3"
225 | top: "conv4_1"
226 | name: "conv4_1"
227 | type: "Convolution"
228 | param { lr_mult: 1 decay_mult: 1 }
229 | param { lr_mult: 2 decay_mult: 0 }
230 | convolution_param {
231 | num_output: 512
232 | pad: 1
233 | kernel_size: 3
234 | }
235 | }
236 |
237 | layer {
238 | bottom: "conv4_1"
239 | top: "conv4_1"
240 | name: "relu4_1"
241 | type: "ReLU"
242 | }
243 |
244 | layer {
245 | bottom: "conv4_1"
246 | top: "conv4_2"
247 | name: "conv4_2"
248 | type: "Convolution"
249 | param { lr_mult: 1 decay_mult: 1 }
250 | param { lr_mult: 2 decay_mult: 0 }
251 | convolution_param {
252 | num_output: 512
253 | pad: 1
254 | kernel_size: 3
255 | }
256 | }
257 |
258 | layer {
259 | bottom: "conv4_2"
260 | top: "conv4_2"
261 | name: "relu4_2"
262 | type: "ReLU"
263 | }
264 |
265 | layer {
266 | bottom: "conv4_2"
267 | top: "conv4_3"
268 | name: "conv4_3"
269 | type: "Convolution"
270 | param { lr_mult: 1 decay_mult: 1 }
271 | param { lr_mult: 2 decay_mult: 0 }
272 | convolution_param {
273 | num_output: 512
274 | pad: 1
275 | kernel_size: 3
276 | }
277 | }
278 |
279 | layer {
280 | bottom: "conv4_3"
281 | top: "conv4_3"
282 | name: "relu4_3"
283 | type: "ReLU"
284 | }
285 |
286 | layer {
287 | bottom: "conv4_3"
288 | top: "pool4"
289 | name: "pool4"
290 | type: "Pooling"
291 | pooling_param {
292 | pool: MAX
293 | kernel_size: 3
294 | pad: 1
295 | stride: 1
296 | }
297 | }
298 |
299 | layer {
300 | bottom: "pool4"
301 | top: "conv5_1"
302 | name: "conv5_1"
303 | type: "Convolution"
304 | param { lr_mult: 1 decay_mult: 1 }
305 | param { lr_mult: 2 decay_mult: 0 }
306 | convolution_param {
307 | num_output: 512
308 | pad: 2
309 | dilation: 2
310 | kernel_size: 3
311 | }
312 | }
313 |
314 | layer {
315 | bottom: "conv5_1"
316 | top: "conv5_1"
317 | name: "relu5_1"
318 | type: "ReLU"
319 | }
320 |
321 | layer {
322 | bottom: "conv5_1"
323 | top: "conv5_2"
324 | name: "conv5_2"
325 | type: "Convolution"
326 | param { lr_mult: 1 decay_mult: 1 }
327 | param { lr_mult: 2 decay_mult: 0 }
328 | convolution_param {
329 | num_output: 512
330 | pad: 2
331 | dilation: 2
332 | kernel_size: 3
333 | }
334 | }
335 |
336 | layer {
337 | bottom: "conv5_2"
338 | top: "conv5_2"
339 | name: "relu5_2"
340 | type: "ReLU"
341 | }
342 |
343 | layer {
344 | bottom: "conv5_2"
345 | top: "conv5_3"
346 | name: "conv5_3"
347 | type: "Convolution"
348 | param { lr_mult: 1 decay_mult: 1 }
349 | param { lr_mult: 2 decay_mult: 0 }
350 | convolution_param {
351 | num_output: 512
352 | pad: 2
353 | dilation: 2
354 | kernel_size: 3
355 | }
356 | }
357 |
358 | layer {
359 | bottom: "conv5_3"
360 | top: "conv5_3"
361 | name: "relu5_3"
362 | type: "ReLU"
363 | }
364 |
365 | layer {
366 | bottom: "conv5_3"
367 | top: "pool5"
368 | name: "pool5"
369 | type: "Pooling"
370 | pooling_param {
371 | pool: MAX
372 | kernel_size: 3
373 | stride: 1
374 | pad: 1
375 | }
376 | }
377 |
378 | layer {
379 | bottom: "pool5"
380 | top: "pool5a"
381 | name: "pool5a"
382 | type: "Pooling"
383 | pooling_param {
384 | pool: AVE
385 | kernel_size: 3
386 | stride: 1
387 | pad: 1
388 | }
389 | }
390 |
391 | layer {
392 | bottom: "pool5a"
393 | top: "fc6"
394 | name: "fc6"
395 | type: "Convolution"
396 | param { lr_mult: 1 decay_mult: 1 }
397 | param { lr_mult: 2 decay_mult: 0 }
398 | convolution_param {
399 | num_output: 1024
400 | pad: 12
401 | dilation: 12
402 | kernel_size: 3
403 | }
404 | }
405 |
406 | layer {
407 | bottom: "fc6"
408 | top: "fc6"
409 | name: "relu6"
410 | type: "ReLU"
411 | }
412 |
413 | layer {
414 | bottom: "fc6"
415 | top: "fc6"
416 | name: "drop6"
417 | type: "Dropout"
418 | dropout_param {
419 | dropout_ratio: 0.5
420 | }
421 | }
422 |
423 | layer {
424 | bottom: "fc6"
425 | top: "fc7"
426 | name: "fc7"
427 | type: "Convolution"
428 | param { lr_mult: 1 decay_mult: 1 }
429 | param { lr_mult: 2 decay_mult: 0 }
430 | convolution_param {
431 | num_output: 1024
432 | kernel_size: 1
433 | }
434 | }
435 |
436 | layer {
437 | bottom: "fc7"
438 | top: "fc7"
439 | name: "relu7"
440 | type: "ReLU"
441 | }
442 |
443 | layer {
444 | bottom: "fc7"
445 | top: "fc7"
446 | name: "drop7"
447 | type: "Dropout"
448 | dropout_param {
449 | dropout_ratio: 0.5
450 | }
451 | }
452 |
453 | layer {
454 | bottom: "fc7"
455 | top: "fc8-SEC"
456 | name: "fc8-SEC"
457 | type: "Convolution"
458 | param { lr_mult: 10 decay_mult: 1 }
459 | param { lr_mult: 20 decay_mult: 0 }
460 | convolution_param {
461 | num_output: 21
462 | kernel_size: 1
463 | weight_filler {
464 | type: "gaussian"
465 | std: 0.01
466 | }
467 | bias_filler {
468 | type: "constant"
469 | value: 0
470 | }
471 | }
472 | }
473 |
474 | layer {
475 | type: "Python"
476 | name: 'Softmax'
477 | bottom: 'fc8-SEC'
478 | top: 'fc8-SEC-Softmax'
479 | python_param {
480 | module: 'pylayers'
481 | layer: 'SoftmaxLayer'
482 | }
483 | propagate_down: 1
484 | }
485 |
486 |
487 | layer {
488 | type: "Python"
489 | name: 'CRF'
490 | bottom: 'fc8-SEC'
491 | bottom: 'images'
492 | top: 'fc8-SEC-CRF-log'
493 | python_param {
494 | module: 'pylayers'
495 | layer: 'CRFLayer'
496 | }
497 | propagate_down: 1
498 | propagate_down: 0
499 | }
500 |
501 |
502 |
503 |
504 | layer {
505 | type: "Python"
506 | name: "simplex"
507 | bottom: "fc8-SEC-Softmax"
508 | bottom: "labels" # the weak labels
509 | bottom: "image_ids"
510 | bottom: "fc8-SEC"
511 | top: "simplex_proj"
512 | python_param {
513 | module: "projection_layer_CSPN"
514 | layer: "SimplexProjectionLayer"
515 | }
516 | }
517 |
518 |
519 | layer {
520 | bottom: "simplex_proj"
521 | top: "label_proj"
522 | name: "argmax"
523 | type: "Python"
524 | python_param {
525 | module: "argmax_layer_CSPN"
526 | layer: "ArgmaxLayer"
527 | }
528 |
529 | }
530 |
531 |
532 | layer {
533 | name: "loss_hard"
534 | type: "SoftmaxWithLoss"
535 | bottom: "fc8-SEC"
536 | bottom: "label_proj"
537 | top: "softmax-loss-argmax"
538 | loss_param {
539 | ignore_label: 255
540 | }
541 | loss_weight: 1.0
542 | include: { phase: TRAIN }
543 | }
544 |
545 |
546 |
547 | #layer {
548 | # name: "loss_soft"
549 | # type: "Python"
550 | # bottom: "fc8-SEC-Softmax"
551 | # bottom: "simplex_proj"
552 | # top: "softmax-loss-soft"
553 | # python_param {
554 | # module: "softmax_loss_CSPN"
555 | # layer: "ProjectionSoftmax"
556 | # }
557 | # loss_weight: 1.0
558 | # include: { phase: TRAIN }
559 | #}
560 |
561 |
562 |
563 |
564 | layer {
565 | bottom: "fc8-SEC-Softmax"
566 | bottom: "cues"
567 | top: "loss-Seed"
568 | name: "loss-Seed"
569 | type: "Python"
570 | python_param {
571 | module: "pylayers"
572 | layer: "SeedLossLayer"
573 | }
574 | loss_weight: 1
575 | }
576 |
577 | layer {
578 | bottom: "fc8-SEC-Softmax"
579 | bottom: "fc8-SEC-CRF-log"
580 | top: "loss-Constrain"
581 | name: "loss-Constrain"
582 | type: "Python"
583 | python_param {
584 | module: "pylayers"
585 | layer: "ConstrainLossLayer"
586 | }
587 | loss_weight: 1
588 | }
589 |
590 |
591 |
592 |
--------------------------------------------------------------------------------
/train_weighted_argmax_softmax.prototxt:
--------------------------------------------------------------------------------
1 | name: "CSPN"
2 |
3 | layer {
4 | name: "Input"
5 | type: "ImageData"
6 | top: "images"
7 | top: "image_ids"
8 | transform_param {
9 | mirror: false
10 | mean_value: 104
11 | mean_value: 117
12 | mean_value: 123
13 | }
14 | image_data_param {
15 | root_folder: "/media/datasets/VOC2012_orig/JPEGImages/"
16 | source: "/home/briq/libs/CSPN/training/input_list.txt"
17 | batch_size: 15
18 | new_height: 321
19 | new_width: 321
20 | shuffle: true
21 | }
22 | }
23 |
24 | layer {
25 | type: "Python"
26 | name: 'Annotation'
27 | bottom: 'image_ids'
28 | top: 'labels'
29 | top: 'cues'
30 | python_param {
31 | module: 'pylayers'
32 | layer: 'AnnotationLayer'
33 | }
34 | propagate_down: 0
35 | }
36 |
37 | layer {
38 | bottom: "images"
39 | top: "conv1_1"
40 | name: "conv1_1"
41 | type: "Convolution"
42 | param { lr_mult: 1 decay_mult: 1 }
43 | param { lr_mult: 2 decay_mult: 0 }
44 | convolution_param {
45 | num_output: 64
46 | pad: 1
47 | kernel_size: 3
48 | }
49 | }
50 |
51 | layer {
52 | bottom: "conv1_1"
53 | top: "conv1_1"
54 | name: "relu1_1"
55 | type: "ReLU"
56 | }
57 |
58 | layer {
59 | bottom: "conv1_1"
60 | top: "conv1_2"
61 | name: "conv1_2"
62 | type: "Convolution"
63 | param { lr_mult: 1 decay_mult: 1 }
64 | param { lr_mult: 2 decay_mult: 0 }
65 | convolution_param {
66 | num_output: 64
67 | pad: 1
68 | kernel_size: 3
69 | }
70 | }
71 |
72 | layer {
73 | bottom: "conv1_2"
74 | top: "conv1_2"
75 | name: "relu1_2"
76 | type: "ReLU"
77 | }
78 |
79 | layer {
80 | bottom: "conv1_2"
81 | top: "pool1"
82 | name: "pool1"
83 | type: "Pooling"
84 | pooling_param {
85 | pool: MAX
86 | kernel_size: 3
87 | stride: 2
88 | pad: 1
89 | }
90 | }
91 |
92 | layer {
93 | bottom: "pool1"
94 | top: "conv2_1"
95 | name: "conv2_1"
96 | type: "Convolution"
97 | param { lr_mult: 1 decay_mult: 1 }
98 | param { lr_mult: 2 decay_mult: 0 }
99 | convolution_param {
100 | num_output: 128
101 | pad: 1
102 | kernel_size: 3
103 | }
104 | }
105 |
106 | layer {
107 | bottom: "conv2_1"
108 | top: "conv2_1"
109 | name: "relu2_1"
110 | type: "ReLU"
111 | }
112 |
113 | layer {
114 | bottom: "conv2_1"
115 | top: "conv2_2"
116 | name: "conv2_2"
117 | type: "Convolution"
118 | param { lr_mult: 1 decay_mult: 1 }
119 | param { lr_mult: 2 decay_mult: 0 }
120 | convolution_param {
121 | num_output: 128
122 | pad: 1
123 | kernel_size: 3
124 | }
125 | }
126 |
127 | layer {
128 | bottom: "conv2_2"
129 | top: "conv2_2"
130 | name: "relu2_2"
131 | type: "ReLU"
132 | }
133 |
134 | layer {
135 | bottom: "conv2_2"
136 | top: "pool2"
137 | name: "pool2"
138 | type: "Pooling"
139 | pooling_param {
140 | pool: MAX
141 | kernel_size: 3
142 | stride: 2
143 | pad: 1
144 | }
145 | }
146 |
147 | layer {
148 | bottom: "pool2"
149 | top: "conv3_1"
150 | name: "conv3_1"
151 | type: "Convolution"
152 | param { lr_mult: 1 decay_mult: 1 }
153 | param { lr_mult: 2 decay_mult: 0 }
154 | convolution_param {
155 | num_output: 256
156 | pad: 1
157 | kernel_size: 3
158 | }
159 | }
160 |
161 | layer {
162 | bottom: "conv3_1"
163 | top: "conv3_1"
164 | name: "relu3_1"
165 | type: "ReLU"
166 | }
167 |
168 | layer {
169 | bottom: "conv3_1"
170 | top: "conv3_2"
171 | name: "conv3_2"
172 | type: "Convolution"
173 | param { lr_mult: 1 decay_mult: 1 }
174 | param { lr_mult: 2 decay_mult: 0 }
175 | convolution_param {
176 | num_output: 256
177 | pad: 1
178 | kernel_size: 3
179 | }
180 | }
181 |
182 | layer {
183 | bottom: "conv3_2"
184 | top: "conv3_2"
185 | name: "relu3_2"
186 | type: "ReLU"
187 | }
188 |
189 | layer {
190 | bottom: "conv3_2"
191 | top: "conv3_3"
192 | name: "conv3_3"
193 | type: "Convolution"
194 | param { lr_mult: 1 decay_mult: 1 }
195 | param { lr_mult: 2 decay_mult: 0 }
196 | convolution_param {
197 | num_output: 256
198 | pad: 1
199 | kernel_size: 3
200 | }
201 | }
202 |
203 | layer {
204 | bottom: "conv3_3"
205 | top: "conv3_3"
206 | name: "relu3_3"
207 | type: "ReLU"
208 | }
209 |
210 | layer {
211 | bottom: "conv3_3"
212 | top: "pool3"
213 | name: "pool3"
214 | type: "Pooling"
215 | pooling_param {
216 | pool: MAX
217 | kernel_size: 3
218 | stride: 2
219 | pad: 1
220 | }
221 | }
222 |
223 | layer {
224 | bottom: "pool3"
225 | top: "conv4_1"
226 | name: "conv4_1"
227 | type: "Convolution"
228 | param { lr_mult: 1 decay_mult: 1 }
229 | param { lr_mult: 2 decay_mult: 0 }
230 | convolution_param {
231 | num_output: 512
232 | pad: 1
233 | kernel_size: 3
234 | }
235 | }
236 |
237 | layer {
238 | bottom: "conv4_1"
239 | top: "conv4_1"
240 | name: "relu4_1"
241 | type: "ReLU"
242 | }
243 |
244 | layer {
245 | bottom: "conv4_1"
246 | top: "conv4_2"
247 | name: "conv4_2"
248 | type: "Convolution"
249 | param { lr_mult: 1 decay_mult: 1 }
250 | param { lr_mult: 2 decay_mult: 0 }
251 | convolution_param {
252 | num_output: 512
253 | pad: 1
254 | kernel_size: 3
255 | }
256 | }
257 |
258 | layer {
259 | bottom: "conv4_2"
260 | top: "conv4_2"
261 | name: "relu4_2"
262 | type: "ReLU"
263 | }
264 |
265 | layer {
266 | bottom: "conv4_2"
267 | top: "conv4_3"
268 | name: "conv4_3"
269 | type: "Convolution"
270 | param { lr_mult: 1 decay_mult: 1 }
271 | param { lr_mult: 2 decay_mult: 0 }
272 | convolution_param {
273 | num_output: 512
274 | pad: 1
275 | kernel_size: 3
276 | }
277 | }
278 |
279 | layer {
280 | bottom: "conv4_3"
281 | top: "conv4_3"
282 | name: "relu4_3"
283 | type: "ReLU"
284 | }
285 |
286 | layer {
287 | bottom: "conv4_3"
288 | top: "pool4"
289 | name: "pool4"
290 | type: "Pooling"
291 | pooling_param {
292 | pool: MAX
293 | kernel_size: 3
294 | pad: 1
295 | stride: 1
296 | }
297 | }
298 |
299 | layer {
300 | bottom: "pool4"
301 | top: "conv5_1"
302 | name: "conv5_1"
303 | type: "Convolution"
304 | param { lr_mult: 1 decay_mult: 1 }
305 | param { lr_mult: 2 decay_mult: 0 }
306 | convolution_param {
307 | num_output: 512
308 | pad: 2
309 | dilation: 2
310 | kernel_size: 3
311 | }
312 | }
313 |
314 | layer {
315 | bottom: "conv5_1"
316 | top: "conv5_1"
317 | name: "relu5_1"
318 | type: "ReLU"
319 | }
320 |
321 | layer {
322 | bottom: "conv5_1"
323 | top: "conv5_2"
324 | name: "conv5_2"
325 | type: "Convolution"
326 | param { lr_mult: 1 decay_mult: 1 }
327 | param { lr_mult: 2 decay_mult: 0 }
328 | convolution_param {
329 | num_output: 512
330 | pad: 2
331 | dilation: 2
332 | kernel_size: 3
333 | }
334 | }
335 |
336 | layer {
337 | bottom: "conv5_2"
338 | top: "conv5_2"
339 | name: "relu5_2"
340 | type: "ReLU"
341 | }
342 |
343 | layer {
344 | bottom: "conv5_2"
345 | top: "conv5_3"
346 | name: "conv5_3"
347 | type: "Convolution"
348 | param { lr_mult: 1 decay_mult: 1 }
349 | param { lr_mult: 2 decay_mult: 0 }
350 | convolution_param {
351 | num_output: 512
352 | pad: 2
353 | dilation: 2
354 | kernel_size: 3
355 | }
356 | }
357 |
358 | layer {
359 | bottom: "conv5_3"
360 | top: "conv5_3"
361 | name: "relu5_3"
362 | type: "ReLU"
363 | }
364 |
365 | layer {
366 | bottom: "conv5_3"
367 | top: "pool5"
368 | name: "pool5"
369 | type: "Pooling"
370 | pooling_param {
371 | pool: MAX
372 | kernel_size: 3
373 | stride: 1
374 | pad: 1
375 | }
376 | }
377 |
378 | layer {
379 | bottom: "pool5"
380 | top: "pool5a"
381 | name: "pool5a"
382 | type: "Pooling"
383 | pooling_param {
384 | pool: AVE
385 | kernel_size: 3
386 | stride: 1
387 | pad: 1
388 | }
389 | }
390 |
391 | layer {
392 | bottom: "pool5a"
393 | top: "fc6"
394 | name: "fc6"
395 | type: "Convolution"
396 | param { lr_mult: 1 decay_mult: 1 }
397 | param { lr_mult: 2 decay_mult: 0 }
398 | convolution_param {
399 | num_output: 1024
400 | pad: 12
401 | dilation: 12
402 | kernel_size: 3
403 | }
404 | }
405 |
406 | layer {
407 | bottom: "fc6"
408 | top: "fc6"
409 | name: "relu6"
410 | type: "ReLU"
411 | }
412 |
413 | layer {
414 | bottom: "fc6"
415 | top: "fc6"
416 | name: "drop6"
417 | type: "Dropout"
418 | dropout_param {
419 | dropout_ratio: 0.5
420 | }
421 | }
422 |
423 | layer {
424 | bottom: "fc6"
425 | top: "fc7"
426 | name: "fc7"
427 | type: "Convolution"
428 | param { lr_mult: 1 decay_mult: 1 }
429 | param { lr_mult: 2 decay_mult: 0 }
430 | convolution_param {
431 | num_output: 1024
432 | kernel_size: 1
433 | }
434 | }
435 |
436 | layer {
437 | bottom: "fc7"
438 | top: "fc7"
439 | name: "relu7"
440 | type: "ReLU"
441 | }
442 |
443 | layer {
444 | bottom: "fc7"
445 | top: "fc7"
446 | name: "drop7"
447 | type: "Dropout"
448 | dropout_param {
449 | dropout_ratio: 0.5
450 | }
451 | }
452 |
453 | layer {
454 | bottom: "fc7"
455 | top: "fc8-SEC"
456 | name: "fc8-SEC"
457 | type: "Convolution"
458 | param { lr_mult: 10 decay_mult: 1 }
459 | param { lr_mult: 20 decay_mult: 0 }
460 | convolution_param {
461 | num_output: 21
462 | kernel_size: 1
463 | weight_filler {
464 | type: "gaussian"
465 | std: 0.01
466 | }
467 | bias_filler {
468 | type: "constant"
469 | value: 0
470 | }
471 | }
472 | }
473 |
474 | layer {
475 | type: "Python"
476 | name: 'Softmax'
477 | bottom: 'fc8-SEC'
478 | top: 'fc8-SEC-Softmax'
479 | python_param {
480 | module: 'pylayers'
481 | layer: 'SoftmaxLayer'
482 | }
483 | propagate_down: 1
484 | }
485 |
486 |
487 | layer {
488 | type: "Python"
489 | name: 'CRF'
490 | bottom: 'fc8-SEC'
491 | bottom: 'images'
492 | top: 'fc8-SEC-CRF-log'
493 | python_param {
494 | module: 'pylayers'
495 | layer: 'CRFLayer'
496 | }
497 | propagate_down: 1  # first bottom (fc8-SEC): backpropagate
498 | propagate_down: 0  # second bottom (images): no gradient
499 | }
500 |
501 |
502 |
503 |
504 | layer {
505 | type: "Python"
506 | name: "simplex"
507 | bottom: "fc8-SEC-Softmax"
508 | bottom: "labels" # the weak labels output by the Annotation layer
509 | bottom: "image_ids"
510 | bottom: "fc8-SEC"
511 | top: "simplex_proj"
512 | python_param {
513 | module: "projection_layer_CSPN"
514 | layer: "SimplexProjectionLayer"
515 | }
516 | }
517 |
518 |
519 |
520 | layer {
521 | bottom: "simplex_proj"
522 | top: "label_proj"
523 | name: "argmax"
524 | type: "Python"
525 | python_param {
526 | module: "argmax_layer_CSPN"
527 | layer: "ArgmaxLayer"
528 | }
529 | }
530 |
531 |
532 |
533 | layer {
534 | name: "loss_hard"
535 | type: "SoftmaxWithLoss"
536 | bottom: "fc8-SEC"
537 | bottom: "label_proj"
538 | top: "softmax-loss-hard"
539 | loss_param {
540 | ignore_label: 255
541 | }
542 | loss_weight: 0.9
543 | include: { phase: TRAIN }
544 | }
545 |
546 |
547 | layer {
548 | name: "loss_soft"
549 | type: "Python"
550 | bottom: "fc8-SEC-Softmax"
551 | bottom: "simplex_proj"
552 | bottom: "label_proj"
553 | top: "softmax-loss-soft"
554 | python_param {
555 | module: "softmax_loss_CSPN"
556 | layer: "ProjectionSoftmax"
557 | }
558 | loss_weight: 0.1
559 | include: { phase: TRAIN }
560 | }
561 |
562 |
563 |
564 |
565 | layer {
566 | bottom: "fc8-SEC-Softmax"
567 | bottom: "cues"
568 | top: "loss-Seed"
569 | name: "loss-Seed"
570 | type: "Python"
571 | python_param {
572 | module: "pylayers"
573 | layer: "SeedLossLayer"
574 | }
575 | loss_weight: 1
576 | }
577 |
578 | layer {
579 | bottom: "fc8-SEC-Softmax"
580 | bottom: "fc8-SEC-CRF-log"
581 | top: "loss-Constrain"
582 | name: "loss-Constrain"
583 | type: "Python"
584 | python_param {
585 | module: "pylayers"
586 | layer: "ConstrainLossLayer"
587 | }
588 | loss_weight: 1
589 | }
590 |
591 |
592 |
593 |
--------------------------------------------------------------------------------
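Note on spatial sizes: the following is a minimal sketch, not part of the repository, that traces the side length of the feature map from the 321x321 input to `fc8-SEC` using the kernel, pad, stride and dilation values in the prototxt above. It assumes Caffe's standard output-shape rules (floor for convolution, ceil with last-window clipping for pooling); the helper names `conv_out`, `pool_out`, `LAYERS` and `fc8_size` are hypothetical and only serve the illustration. ReLU and Dropout layers are omitted because they do not change the shape.

```python
import math


def conv_out(size, kernel, pad=0, stride=1, dilation=1):
    """Output side length of a Caffe Convolution layer (floor division)."""
    k_eff = dilation * (kernel - 1) + 1  # effective kernel extent with dilation
    return (size + 2 * pad - k_eff) // stride + 1


def pool_out(size, kernel, pad=0, stride=1):
    """Output side length of a Caffe Pooling layer (ceil, then clip)."""
    out = math.ceil((size + 2 * pad - kernel) / stride) + 1
    # Caffe drops the last window if it would start entirely inside the padding.
    if pad > 0 and (out - 1) * stride >= size + pad:
        out -= 1
    return out


# (kind, repeat, params), transcribed by hand from the prototxt above.
LAYERS = [
    ("conv", 2, dict(kernel=3, pad=1)),                 # conv1_1, conv1_2
    ("pool", 1, dict(kernel=3, pad=1, stride=2)),       # pool1
    ("conv", 2, dict(kernel=3, pad=1)),                 # conv2_1, conv2_2
    ("pool", 1, dict(kernel=3, pad=1, stride=2)),       # pool2
    ("conv", 3, dict(kernel=3, pad=1)),                 # conv3_1..conv3_3
    ("pool", 1, dict(kernel=3, pad=1, stride=2)),       # pool3
    ("conv", 3, dict(kernel=3, pad=1)),                 # conv4_1..conv4_3
    ("pool", 1, dict(kernel=3, pad=1, stride=1)),       # pool4
    ("conv", 3, dict(kernel=3, pad=2, dilation=2)),     # conv5_1..conv5_3
    ("pool", 2, dict(kernel=3, pad=1, stride=1)),       # pool5, pool5a
    ("conv", 1, dict(kernel=3, pad=12, dilation=12)),   # fc6
    ("conv", 2, dict(kernel=1)),                        # fc7, fc8-SEC
]


def fc8_size(size=321):
    """Propagate an input side length through LAYERS and return the result."""
    for kind, repeat, params in LAYERS:
        fn = conv_out if kind == "conv" else pool_out
        for _ in range(repeat):
            size = fn(size, **params)
    return size


print(fc8_size(321))  # 41
```

Under these assumptions only the three stride-2 pooling layers shrink the map (321 to 161 to 81 to 41), so the 21-channel `fc8-SEC` scores, and every blob derived from them in the layers above, have a 41x41 spatial extent.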