├── README.md
├── data.py
├── data_prep.py
├── images
│   ├── outputvideo.mp4
│   ├── testOut.png
│   └── train_label.png
├── runModel.py
└── train.py

/README.md:
--------------------------------------------------------------------------------
1 | 
2 | # Image segmentation using U-Net
3 | Henry Yau
4 | May 28, 2018
5 | 
6 | ## Problem description
7 | The challenge presented was to perform a pixel-wise identification of the road and vehicles in a video from the car simulator Carla. http://carla.readthedocs.io/en/latest/
8 | The solution attempted here is image segmentation using a U-Net implemented in Keras with a TensorFlow backend.
9 | ### Training data
10 | The training data provided is a set of 1000 800x600 PNG images captured from the hood of a simulated car, along with their corresponding labels, also 800x600 PNG images. The training labels have integer values in the red channel corresponding to the ID of the object located at that pixel. The labels are preprocessed so that IDs not corresponding to the road or vehicles are merged into a single label and pixels corresponding to road markings are relabeled as road. In addition, pixels corresponding to the vehicle hood are set to 0 (none). To reduce overfitting, additional simulated runs on Carla were generated, providing 2300 more training images and labels. Data augmentation was used as well. The input images and labels are then resized to 256x256.
11 | 
12 | 
13 | Training image and label:
14 | ![alt text][trainingImage1]
15 | 
16 | [trainingImage1]:https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/master/images/train_label.png "Training image and label"
17 | 
18 | 
19 | ### Data augmentation
20 | The standard data generator in Keras does not appear to be meant for multiclass image segmentation problems like this one. However, there is a trick intended for transforming masks which can be applied here. By providing the same seed to both the image and label ImageDataGenerator instances, the same random transformations are applied to both. zip() then creates an iterator which yields image/label pairs.
21 | 
22 | ```
23 | data_gen_args = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=10.,width_shift_range=0.3,
24 | height_shift_range=0.3,zoom_range=[0.7,1.3], horizontal_flip=True, fill_mode = 'constant', cval = 0.0, shear_range = 0.3)
25 | image_datagen = ImageDataGenerator(**data_gen_args)
26 | mask_datagen = ImageDataGenerator(**data_gen_args)
27 | seed = 1
28 | image_datagen.fit(imgs_train, augment=True, seed=seed)
29 | mask_datagen.fit(imgs_mask_train, augment=True, seed=seed)
30 | ...
31 | train_generator = zip(image_generator, mask_generator)
32 | ```
33 | It is possible to leave the label information in a single channel and convert the values into a one-hot-encoded (OHE) tensor inside the loss function. To make implementing the loss function more straightforward, the label data is instead preprocessed so that the road labels are stored as ones and zeros in the R-channel, the vehicle labels in the G-channel, and the remaining labels in the B-channel, as in the sketch below.
34 | 
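In data.py this encoding is produced by a per-class comparison against the Carla class IDs (7 = road, 10 = vehicle, 15 = everything else); this is condensed from `load_train_data()`, with `n`, `rows` and `cols` standing in for the mask array's dimensions:

```
# class IDs 7 (road), 10 (vehicle) and 15 (everything else)
# become the R, G and B channels of a binary label image
imgs_mask_train2 = np.zeros((n, rows, cols, 3))
for i in range(n):
    for indc, c in enumerate([7, 10, 15]):
        imgs_mask_train2[i, :, :, indc] = (imgs_mask_train[i, :, :, 0] == c).astype(int)
```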
35 | ## U-Net Model
36 | The U-Net architecture is a fairly recent advancement in the image segmentation field, originally developed by Ronneberger et al. at the University of Freiburg to perform image segmentation of neuronal structures. https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
37 | 
38 | The general structure of the U-Net starts out similar to a generic CNN used for image classification: a series of 3x3 convolutional filters and ReLU activation functions interspersed with max pooling layers. This contraction path squeezes the input down to a deep stack of feature maps; in the original work the center has 1024 feature channels, which are ultimately mapped to two classes (background and foreground). The implementation in this project keeps the 1024-channel center but maps the output to three label channels: road, vehicle, and all remaining labels. In a classification problem, a fully connected layer and a sigmoid activation function would be placed after this point to produce an output; U-Net, however, differs significantly from here on. An expansion path is performed which results in a label image with the same dimensions as the input image. The expansion is performed with a series of up-convolutions, each using a 2x2 kernel in the original work and doubling the spatial resolution. What makes U-Net unique is that the feature maps are passed directly from each contraction path layer to the corresponding layer of the expansion path, where they are concatenated with the upsampled features and convolved to recover spatial detail.
39 | 
40 | ## Implementation details
41 | ### U-Net layers
42 | The implementation used here is derived from: https://github.com/petrosgk/Kaggle-Carvana-Image-Masking-Challenge
43 | 
44 | The contraction path layers are implemented in Keras as:
45 | ~~~
46 | down = Conv2D(64, (3, 3), padding='same')(input)
47 | down = BatchNormalization()(down)
48 | down = Activation('relu')(down)
49 | down = Conv2D(64, (3, 3), padding='same')(down)
50 | down = BatchNormalization()(down)
51 | down = Activation('relu')(down)
52 | down_pool = MaxPooling2D((2, 2), strides=(2, 2))(down)
53 | ~~~
54 | Each subsequent block doubles the number of filters until the feature maps are squeezed into a 1024-filter center layer. The up-convolution layers use ```UpSampling2D((2,2))```, which doubles the spatial size of the input, followed by ```concatenate([down, up])```, which joins the feature maps of the corresponding contraction path layer to the expansion layer. The ```Conv2D()``` layers that follow then mix the concatenated features at the new resolution. Each additional block doubles the output dimensions until the original dimensions are reached. (The original implementation uses no padding, so its output dimensions are smaller than the input image; here 'same' padding is used so they match.)
55 | 
56 | ### Loss function
57 | The loss function is a combination of one minus the F-beta score, one minus the Dice score, and a weighted categorical cross entropy. The F-beta score is computed as the average of the two F-beta scores for the vehicle and the road. To implement this, the ```Lambda``` layer is used to separate the two channels. For example, the predictions are separated using:
58 | ~~~
59 | pred0 = Lambda(lambda x : x[:,:,:,0])(y_pred)
60 | pred1 = Lambda(lambda x : x[:,:,:,1])(y_pred)
61 | ~~~
62 | From here, the predictions and truths are flattened to 1D tensors to compute precision and recall, which are then used to compute the F-beta scores using the definitions:
63 | $$\mathrm{precision} =\frac{| \mathrm{pred}\cap\mathrm{truth}|}{|\mathrm{pred}|}$$
64 | $$\mathrm{recall} =\frac{| \mathrm{pred}\cap\mathrm{truth}|}{|\mathrm{truth}|}$$
65 | $$F_{\beta} = (1+\beta^2)\frac{\mathrm{precision}\cdot \mathrm{recall}}{(\beta^2\, \mathrm{precision})+ \mathrm{recall}}$$
66 | 
67 | The F-beta score is a generalization of the F1 score in which more weight can be given to either precision or recall. The Dice index is a measure of how similar two sets are, closely related to precision and recall; one minus the Dice index is a commonly used loss function for the image segmentation problem. The Dice index is defined as:
68 | $$\mathrm{Dice} =2\frac{| \mathrm{pred}\cap\mathrm{truth}|}{|\mathrm{pred}|+|\mathrm{truth}|}$$
69 | 
70 | Finally, the weighted categorical cross entropy is a log loss in which each categorical label $i$ is assigned a weight $w_i$:
71 | $$\mathrm{loss} = -\sum_i w_i\, \mathrm{truth}_i \log(\mathrm{pred}_i)$$
72 | 
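The weighted cross entropy is implemented with the Keras backend in train.py; the weights below (0.5 for road, 4.0 for vehicle, 0.0 for the remaining labels) are the values currently set there:

~~~
def weighted_categorical_crossentropy(y_true, y_pred):
    weights = K.variable([0.5, 4.0, 0.0])  # road, vehicle, everything else
    # scale predictions so that the class probabilities of each pixel sum to 1
    y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
    # clip to prevent NaNs and Infs in the log
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    loss = y_true * K.log(y_pred) * weights
    return -K.sum(loss, -1)
~~~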
73 | ### Results
74 | I intended to train the model with only the weighted categorical cross entropy and the Dice loss, then add the F-beta loss once a local minimum was found, but ran out of time on the virtual machine. The validation Dice score was around 0.98 at the end of training. The model still produces adequate results at over 11 frames per second, with an F-beta score of over 0.98 for the road and 0.81 for vehicles. With more time for training, the accuracy could likely be increased significantly.
75 | 
76 | A video of a test sample can be viewed in the images directory.
77 | Clicking on the sample test output image below links to a video of the sample output overlaid on the test video.
78 | 
79 | [![IMAGE ALT TEXT HERE](https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/master/images/testOut.png)](https://github.com/henyau/Image-Segmentation-with-Unet/blob/master/images/outputvideo.mp4?raw=true)
80 | 
81 | ### Todo
82 | 
83 | Convert the model to TensorRT. The custom loss functions will need to be converted to plug-ins.
--------------------------------------------------------------------------------
/data.py:
--------------------------------------------------------------------------------
1 | from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
2 | from skimage import exposure
3 | #from skimage.morphology import disk
4 | #from skimage.filters import unsharp_mask
5 | from skimage import filters
6 | import numpy as np
7 | import os
8 | import glob
9 | from numpy import zeros, newaxis
10 | #import cv2
11 | 
12 | class dataProcess(object):
13 |     '''
14 |     Images and labels are stored in NumPy arrays and saved to disk; images are normalized and mean-centered when loaded.
15 |     '''
16 | 
17 |     def __init__(self, out_rows, out_cols, data_path = "./Train/train-256", label_path = "./Train/train_masks-256", test_path = "./test", npy_path = "./npydata", img_type = "png"):
18 |         self.out_rows = out_rows
19 |         self.out_cols = out_cols
20 |         self.data_path = './Train/train-'f'{str(out_rows)}' # note: derived from out_rows, overriding the data_path argument
21 |         self.label_path = './Train/train_masks-'f'{str(out_rows)}'
22 |         self.img_type = img_type
23 |         self.test_path = test_path
24 |         self.npy_path = npy_path
25 | 
26 |     def create_train_data(self):
27 |         i = 0
28 |         num_img = 3000
29 |         print('-'*30)
30 |         print('Creating training images...')
31 |         print('-'*30)
32 |         imgs = glob.glob(self.data_path+"/*."+self.img_type)
33 |         print(len(imgs))
34 | 
35 | 
36 |         imgdatas = np.ndarray((num_img,self.out_cols,self.out_rows,3), dtype=np.uint8)
37 |         imglabels = np.ndarray((num_img,self.out_cols,self.out_rows,1), dtype=np.uint8)
38 | 
39 | 
40 |         for imgname in imgs:
41 |             midname = imgname[imgname.rindex("/")+1:]
42 |             img = load_img(self.data_path + "/" + midname,grayscale = False)
43 |             label = load_img(self.label_path + "/" + midname,grayscale = False)
44 |             print(midname)
45 | 
46 |             img = img_to_array(img)
47 |             label = img_to_array(label)
48 |             #img = cv2.imread(self.data_path + "/" + midname,cv2.IMREAD_GRAYSCALE)
49 |             #label = cv2.imread(self.label_path + "/" + midname,cv2.IMREAD_GRAYSCALE)
50 |             #img = np.array([img])
51 |             #label = np.array([label])
52 |             #imgdatas[i] = img
53 |             tmp2 = label[...,0]
54 |             tmp2 = 
tmp2[...,newaxis] #still need 128,128,1 55 | #imglabels[i] = tmp2 56 | 57 | 58 | #if there is a car to the left add it 59 | ''' 60 | cars_b = False 61 | car_px_cnt = 0 62 | for ii in range(128): 63 | for jj in range(64, 110): 64 | if tmp2[ii,jj,0] == 10: 65 | car_px_cnt+=1 66 | if car_px_cnt >10: 67 | cars_b = True 68 | print(midname) 69 | break 70 | if cars_b == True: 71 | break; 72 | if cars_b: 73 | imglabels[i] = tmp2 74 | imgdatas[i] = img 75 | i += 1 76 | ''' 77 | 78 | imglabels[i] = tmp2 79 | imgdatas[i] = img 80 | i += 1 81 | 82 | if i % 100 == 0: 83 | print('Done: {0}/{1} images'.format(i, len(imgs))) 84 | #i += 1 85 | if i>=num_img: 86 | break 87 | print('loading done') 88 | np.save(self.npy_path + '/imgs_train_'f'{str(self.out_rows)}.npy', imgdatas) 89 | np.save(self.npy_path + '/imgs_mask_train_'f'{str(self.out_rows)}.npy', imglabels) 90 | print('Saving to .npy files done.') 91 | 92 | def create_test_data(self): 93 | i = 0 94 | print('-'*30) 95 | print('Creating test images...') 96 | print('-'*30) 97 | imgs = glob.glob(self.test_path+"/*."+self.img_type) 98 | print(len(imgs)) 99 | imgdatas = np.ndarray((len(imgs),self.out_rows,self.out_cols,1), dtype=np.uint8) 100 | for imgname in imgs: 101 | midname = imgname[imgname.rindex("/")+1:] 102 | img = load_img(self.test_path + "/" + midname,grayscale = True) 103 | img = img_to_array(img) 104 | #img = cv2.imread(self.test_path + "/" + midname,cv2.IMREAD_GRAYSCALE) 105 | #img = np.array([img]) 106 | imgdatas[i] = img 107 | i += 1 108 | print('loading done') 109 | np.save(self.npy_path + '/imgs_test.npy', imgdatas) 110 | print('Saving to imgs_test.npy files done.') 111 | 112 | def load_train_data_1chan(self): 113 | print('-'*30) 114 | print('load train images...') 115 | print('-'*30) 116 | imgs_train = np.load(self.npy_path+"/imgs_train.npy") 117 | imgs_mask_train = np.load(self.npy_path+"/imgs_mask_train.npy") 118 | imgs_train = imgs_train.astype('float32') 119 | 120 | imgs_train /= 255 121 | mean = imgs_train.mean(axis = 0) 122 | imgs_train -= mean 123 | #print('mean: '+ repr(mean)) 124 | #imgs_mask_train = imgs_mask_train.astype('float32') 125 | #imgs_mask_train /= 255 126 | 127 | #imgs_mask_train[imgs_mask_train > 0] = 1 # do binary for now 128 | dim_mask = imgs_mask_train.shape 129 | 130 | imgs_mask_train2 = np.zeros(( dim_mask[0] , dim_mask[1], dim_mask[2] , 1 )) # two classes in 3 channels? 131 | 132 | #for categorical 133 | for i in range(dim_mask[0]): 134 | for indc, c in enumerate([7,10,15]): 135 | imgs_mask_train2[i, : , : , 0 ] += (imgs_mask_train[i, : , :, 0] == c ).astype(int) 136 | 137 | #for sparse 138 | 139 | 140 | #imgs_mask_train[imgs_mask_train <= 0.5] = 0 141 | #the mask should have two channels, road and vehicle, then just repaint after 142 | #return imgs_train,imgs_mask_train2 143 | 144 | #test sparse categorical for now? 
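        # returns the normalized images, a single-channel mask marking pixels whose class ID is 7, 10 or 15, and the per-pixel training mean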
145 | return imgs_train,imgs_mask_train2,mean 146 | 147 | def load_train_data(self): 148 | print('-'*30) 149 | print('load train images...') 150 | print('-'*30) 151 | imgs_train = np.load(self.npy_path+'/imgs_train_'f'{str(self.out_rows)}.npy') 152 | imgs_mask_train = np.load(self.npy_path+'/imgs_mask_train_'f'{str(self.out_rows)}.npy') 153 | 154 | 155 | #imgs_train = exposure.equalize_hist(imgs_train) #global equalize 156 | #selem = disk(30) 157 | #imgs_train = rank.equalize(imgs_train, selem=selem) # local equalize 158 | #imgs_train = exposure.adjust_gamma(imgs_train, gamma=2, gain=1) 159 | 160 | imgs_train = imgs_train.astype('float32') 161 | 162 | #imgs_train /= 127 163 | imgs_train /= 255 164 | #imgs_train -= 1 165 | #imgs_train = filters.unsharp_mask(imgs_train) 166 | mean = imgs_train.mean(axis = 0) 167 | #imgs_train -= mean #test without subtracting mean 168 | 169 | 170 | #print('mean: '+ repr(mean)) 171 | #imgs_mask_train = imgs_mask_train.astype('float32') 172 | #imgs_mask_train /= 255 173 | #imgs_mask_train[imgs_mask_train > 0] = 1 # do binary for now 174 | dim_mask = imgs_mask_train.shape 175 | 176 | imgs_mask_train2 = np.zeros(( dim_mask[0] , dim_mask[1], dim_mask[2] , 3 )) # two classes vehicle or road 177 | 178 | #for categorical 179 | for i in range(dim_mask[0]): 180 | for indc, c in enumerate([7,10,15]): 181 | imgs_mask_train2[i, : , : , indc ] = (imgs_mask_train[i, : , :, 0] == c ).astype(int) 182 | 183 | #for sparse 184 | #imgs_mask_train[imgs_mask_train <= 0.5] = 0 185 | #the mask should have two channels, road and vehicle, then just repaint after 186 | #return imgs_train,imgs_mask_train2 187 | 188 | #test sparse categorical for now? 189 | return imgs_train,imgs_mask_train2,mean 190 | 191 | 192 | def create_mean(self): 193 | print('-'*30) 194 | print('load train to compute mean images...') 195 | print('-'*30) 196 | imgs_train = np.load(self.npy_path+'/imgs_train_'f'{str(self.out_rows)}.npy') 197 | imgs_train = imgs_train.astype('float32') 198 | imgs_train /= 255 199 | mean = imgs_train.mean(axis = 0) 200 | np.save(self.npy_path + '/train_mean_'f'{str(self.out_rows)}.npy', mean) 201 | 202 | 203 | def get_mean(self): 204 | mean = np.load(self.npy_path+'/train_mean_'f'{str(self.out_rows)}.npy') 205 | return mean 206 | 207 | 208 | if __name__ == "__main__": 209 | #traindata = dataProcess(448,448, data_path = "./Train/train-448", label_path = "./Train/train_masks-448" 210 | traindata = dataProcess(256,256, data_path = "./Train/train-256", label_path = "./Train/train_masks-256") 211 | #traindata = dataProcess(288,288, data_path = "./Train/train-288", label_path = "./Train/train_masks-288") 212 | #traindata = dataProcess(512,512, data_path = "./Train/train-512", label_path = "./Train/train_masks-512") 213 | traindata.create_train_data() 214 | traindata.create_mean() 215 | 216 | -------------------------------------------------------------------------------- /data_prep.py: -------------------------------------------------------------------------------- 1 | #from fastai.conv_learner import * 2 | #from fastai.dataset import * 3 | from sklearn.model_selection import train_test_split 4 | #from fastai.models.resnet import vgg_resnet50 5 | from concurrent.futures import ThreadPoolExecutor 6 | 7 | from PIL import Image 8 | import matplotlib.pyplot as plt 9 | from pathlib import Path 10 | import json 11 | #torch.cuda.set_device(0) 12 | import shutil 13 | import numpy as np 14 | 15 | import cv2 16 | 17 | PATH = Path('./Train/') 18 | list(PATH.iterdir()) 19 | #sx = 256 20 | sz = 256 21 | 
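# CameraRGB holds the simulator frames and CameraSeg the per-pixel class IDs;
# resize_mask() below merges lane markings (ID 6) into road (7), collapses the
# other non-road/non-vehicle IDs into 15, and zeroes out the vehicle hood region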
22 | TRAIN_DN = 'CameraRGB' 23 | MASKS_DN = 'CameraSeg' 24 | 25 | def show_img(im, figsize=None, ax=None, alpha=None): 26 | if not ax: fig,ax = plt.subplots(figsize=figsize) 27 | ax.imshow(im, alpha=alpha) 28 | ax.set_axis_off() 29 | return ax 30 | 31 | #list((PATH/TRAIN_DN).iterdir())[:5] 32 | list((PATH/MASKS_DN).iterdir())[:5] 33 | 34 | 35 | (PATH/'train_masks-'f'{str(sz)}').mkdir(exist_ok=True) 36 | (PATH/'train-'f'{str(sz)}').mkdir(exist_ok=True) 37 | 38 | def resize_mask(fn): 39 | tmpimg = Image.open(fn).resize((sz,sz)) 40 | oneChanAr = np.zeros((sz, sz)); 41 | for i in range(0,sz): 42 | for j in range(0,sz): 43 | r, g, b = tmpimg.getpixel((i, j)) 44 | if r == 6: 45 | #tmpimg.putpixel((i, j), (7,0,0)) 46 | oneChanAr[j][i] = 7 47 | elif r!= 7 and r!= 10 and r!=0: 48 | #tmpimg.putpixel((i, j), (15,0,0)) 49 | oneChanAr[j][i] = 15 50 | elif j>sz*0.82 and r ==10: # remove hood, use 100 for 128x128 images 51 | #tmpimg.putpixel((i, j), (0,0,0)) #can actually extend the road 52 | oneChanAr[j][i] = 0 53 | else: 54 | #tmpimg.putpixel((i, j), (r,0,0)) #can actually extend the road 55 | oneChanAr[j][i] = r 56 | 57 | 58 | oneChanIm = Image.fromarray(np.uint8(oneChanAr)) 59 | oneChanIm.save((fn.parent.parent)/'train_masks-'f'{str(sz)}'/fn.name, mode='L') 60 | 61 | def resize_img(fn): 62 | Image.open(fn).resize((sz,sz)).save((fn.parent.parent)/'train-'f'{str(sz)}'/fn.name) 63 | 64 | #resize_mask(PATH/MASKS_DN/'200.png') 65 | #ims = cv2.imread(str(PATH/TRAIN_DN/'200.png')) 66 | #im_masks = cv2.imread(str(PATH/'train_masks-128/200.png')) 67 | #print(im_masks.shape) 68 | #ax = show_img(ims) 69 | #ax = show_img(im_masks[...,0]) 70 | 71 | 72 | 73 | files = list((PATH/f'{MASKS_DN}').iterdir()) 74 | with ThreadPoolExecutor(8) as e: 75 | e.map(resize_mask, files) 76 | 77 | files = list((PATH/f'{TRAIN_DN}').iterdir()) 78 | with ThreadPoolExecutor(8) as e: 79 | e.map(resize_img, files) 80 | -------------------------------------------------------------------------------- /images/outputvideo.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/e46e6bd7c195098e7f20b9a898137d32f9b1250a/images/outputvideo.mp4 -------------------------------------------------------------------------------- /images/testOut.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/e46e6bd7c195098e7f20b9a898137d32f9b1250a/images/testOut.png -------------------------------------------------------------------------------- /images/train_label.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/e46e6bd7c195098e7f20b9a898137d32f9b1250a/images/train_label.png -------------------------------------------------------------------------------- /runModel.py: -------------------------------------------------------------------------------- 1 | from keras.models import load_model 2 | import sys, skvideo.io, json, base64 3 | from skimage import exposure 4 | import numpy as np 5 | from PIL import Image 6 | from io import BytesIO, StringIO 7 | from train import * 8 | import cv2 9 | import os 10 | from concurrent.futures import ThreadPoolExecutor 11 | 12 | #sz = 256 13 | #sx = 320 14 | #sy = 224 15 | 16 | #sx = 448 17 | #sy = 640 18 | sx = 256 19 | sy = 256 20 | 21 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 22 | 23 | file = sys.argv[-1] 24 | 25 | if 
file == 'runModel.py':
26 |     print("Error loading video")
27 |     sys.exit(1)
28 | 
29 | 
30 | # Define encoder function
31 | def encode(array):
32 |     pil_img = Image.fromarray(array)
33 |     buff = BytesIO()
34 |     pil_img.save(buff, format="PNG")
35 |     return base64.b64encode(buff.getvalue()).decode("utf-8")
36 | 
37 | video = skvideo.io.vread(file)
38 | 
39 | answer_key = {}
40 | 
41 | # Frame numbering starts at 1
42 | frame = 1
43 | 
44 | # Load model
45 | #model = get_unet_128((sx,sy,3),3)
46 | #model.load_weights("unet_Datgen.hdf5")
47 | #model.load_weights("unet_Datgen_NoMean448.hdf5")
48 | model = load_model('./models/unet_Datgen_256Frozen.hdf5')
49 | 
50 | #model.load_weights("unet_1chan.hdf5")
51 | 
52 | 
53 | #imgs_train, imgs_mask_train, train_mean = mydata.load_train_data()
54 | train_mean = np.load("./npydata/train_mean_256.npy") #mydata.get_mean()
55 | #train_mean = np.load("./npydata/train_mean.npy") #mydata.get_mean()
56 | 
57 | # Process video with trained U-Net model
58 | 
59 | 
60 | #process all the frames in the video, resize frames into an array [n,sx,sy,3], then predict on the whole batch at once
61 | dim = video.shape[0]
62 | 
63 | test_array = np.zeros((dim, sx,sy,3))
64 | #print(test_array.shape)
65 | i = 0
66 | 
67 | 
68 | 
69 | for rgb_frame in video:
70 |     #use PIL for test
71 |     #ims_PIL = Image.fromarray(rgb_frame)
72 |     #ims = ims_PIL.resize((sz, sz),Image.NEAREST)
73 |     #ims = np.array(ims)
74 |     ims = cv2.resize(rgb_frame,(sy,sx) ,interpolation = cv2.INTER_CUBIC)
75 |     #ims = exposure.adjust_gamma(ims, gamma=2, gain=1)
76 |     #ims = exposure.adjust_gamma(ims) #global equalize
77 |     ims = ims.astype('float32')
78 | 
79 |     #ims /= 127.0
80 |     #ims -= 1.0
81 |     ims /= 255
82 |     #ims -= train_mean
83 | 
84 |     test_array[i, ...] = ims
85 |     i += 1
86 | 
87 | #datagen = ImageDataGenerator(zca_whitening=True)
88 | #datagen.fit(test_array)
89 | #for i in range(len(test_array)):
90 | #    test_array[i] = datagen.standardize(test_array[i])
91 | 
92 | wholeBatch = True
93 | if wholeBatch == True:
94 |     res = model.predict(test_array)
95 | 
96 |     for ylab in res:
97 | 
98 |         for i in range(0,sx): #remove hood
99 |             for j in range(0,sy):
100 |                 if j>(sy*0.82):
101 |                     ylab[i,j,1] = 0
102 | 
103 |         res_road = cv2.resize(ylab[...,0],(800,600) ,interpolation = cv2.INTER_CUBIC)
104 |         res_vehicle = cv2.resize(ylab[...,1],(800,600) ,interpolation = cv2.INTER_LINEAR)
105 | 
106 |         res_road = (res_road>0.5).astype('uint8')
107 |         res_vehicle = (res_vehicle>0.5).astype('uint8')
108 | 
109 |         answer_key[frame] = [encode(res_vehicle), encode(res_road)]
110 | 
111 |         frame += 1
112 | else:
113 |     for test_img in test_array:
114 | 
115 |         ylab = model.predict(test_img[newaxis,...])
116 |         for i in range(0,sx):
117 |             for j in range(0,sy):
118 |                 if j>(sy*0.82): # remove hood
119 |                     ylab[0,i,j,1] = 0
120 | 
121 |         res_road = cv2.resize(ylab[0,...,0],(800,600) ,interpolation = cv2.INTER_LINEAR)
122 |         res_vehicle = cv2.resize(ylab[0,...,1],(800,600) ,interpolation = cv2.INTER_LINEAR)
123 | 
124 |         res_road = (res_road>0.5).astype('uint8')
125 |         res_vehicle = (res_vehicle>0.5).astype('uint8')
126 | 
127 |         answer_key[frame] = [encode(res_vehicle[...]), encode(res_road[...])]
128 | 
129 |         frame += 1
130 | 
131 | # Print output in proper json format
132 | print(json.dumps(answer_key))
133 | 
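# Usage (assumed invocation, based on the argv handling above):
#   python runModel.py <video_file>
# The {frame: [vehicle_png_base64, road_png_base64]} answer key is printed to stdout as JSON.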
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | import os
2 | #os.environ["CUDA_VISIBLE_DEVICES"] = "0"
3 | import numpy as np
4 | from keras.models import *
5 | from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Dropout, Cropping2D, Lambda
6 | from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
7 | from keras.optimizers import *
8 | from keras.callbacks import ModelCheckpoint, LearningRateScheduler
9 | from keras import backend as keras
10 | from data import *
11 | from keras.utils.np_utils import to_categorical
12 | from keras.losses import categorical_crossentropy, binary_crossentropy
13 | from keras import backend as K
14 | from keras.utils import to_categorical
15 | from keras.models import Model
16 | from keras.layers import Input, concatenate, Conv2D, MaxPooling2D, Activation, UpSampling2D, BatchNormalization
17 | from keras.optimizers import RMSprop
18 | from sklearn.model_selection import train_test_split
19 | 
20 | def dice(y_true, y_pred): # soft Dice coefficient; argument order matches the Keras (y_true, y_pred) metric convention
21 |     y_true_f = K.flatten(y_true)
22 |     y_pred_f = K.flatten(y_pred)
23 |     intersection = K.sum(y_true_f * y_pred_f)
24 |     return (2. * intersection + 1) / (K.sum(y_true_f) + K.sum(y_pred_f) + 1)
25 | 
26 | def fbeta(y_true, y_pred): # mean F-beta over road (channel 0, beta^2 = 0.25) and vehicle (channel 1, beta^2 = 4)
27 | 
28 |     pred0 = Lambda(lambda x : x[:,:,:,0])(y_pred)
29 |     pred1 = Lambda(lambda x : x[:,:,:,1])(y_pred)
30 |     true0 = Lambda(lambda x : x[:,:,:,0])(y_true)
31 |     true1 = Lambda(lambda x : x[:,:,:,1])(y_true) # channels last
32 | 
33 |     y_pred_0 = K.flatten(pred0)
34 |     y_true_0 = K.flatten(true0)
35 | 
36 |     y_pred_1 = K.flatten(pred1)
37 |     y_true_1 = K.flatten(true1)
38 | 
39 |     intersection0 = K.sum(y_true_0 * y_pred_0)
40 |     intersection1 = K.sum(y_true_1 * y_pred_1)
41 | 
42 |     precision0 = intersection0/(K.sum(y_pred_0)+K.epsilon())
43 |     recall0 = intersection0/(K.sum(y_true_0)+K.epsilon())
44 | 
45 |     precision1 = intersection1/(K.sum(y_pred_1)+K.epsilon())
46 |     recall1 = intersection1/(K.sum(y_true_1)+K.epsilon())
47 | 
48 |     fbeta0 = (1.0+0.25)*(precision0*recall0)/(0.25*precision0+recall0+K.epsilon())
49 |     fbeta1 = (1.0+4.0)*(precision1*recall1)/(4.0*precision1+recall1+K.epsilon())
50 | 
51 |     return ((fbeta0+fbeta1)/2.0)
52 | 
53 | def fbeta_loss(y_true, y_pred):
54 |     return 1-fbeta(y_true, y_pred)
55 | 
56 | def dice_loss(y_true, y_pred):
57 |     return 1-dice(y_true, y_pred)
58 | 
59 | def bce_dice_loss(y_true, y_pred):
60 |     return binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
61 | def weighted_categorical_crossentropy(y_true, y_pred):
62 |     #weights = K.variable([0.5,2.0,0.0])
63 |     weights = K.variable([0.5,4.0,0.0]) # road, vehicle, everything else
64 | 
65 |     # scale predictions so that the class probas of each sample sum to 1
66 |     y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
67 |     # clip to prevent NaN's and Inf's
68 |     y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
69 |     # calc
70 |     loss = y_true * K.log(y_pred) * weights
71 |     loss = -K.sum(loss, -1)
72 | 
73 | 
74 |     return loss
75 | 
76 | def cat_dice_loss(y_true, y_pred):
77 |     # return categorical_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
78 |     #return weighted_categorical_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)+fbeta_loss(y_true, y_pred)
79 |     return weighted_categorical_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
80 |     #return weighted_categorical_crossentropy(y_true, y_pred) +fbeta_loss(y_true, y_pred)
81 |     #return dice_loss(y_true, y_pred)
82 | def get_unet_128(input_shape=(128, 128, 3),
83 |                  num_classes=2):
84 | 
85 |     #This U-Net implementation is derived from: https://github.com/petrosgk/Kaggle-Carvana-Image-Masking-Challenge
86 | 
87 |     inputs = Input(shape=input_shape)
88 |     # 256
89 | 
90 |     down0 = Conv2D(32, (3, 3), padding='same')(inputs)
91 |     down0 = 
BatchNormalization()(down0) 92 | down0 = Activation('relu')(down0) 93 | down0 = Conv2D(32, (3, 3), padding='same')(down0) 94 | down0 = BatchNormalization()(down0) 95 | down0 = Activation('relu')(down0) 96 | down0_pool = MaxPooling2D((2, 2), strides=(2, 2))(down0) 97 | # 128 98 | 99 | down1 = Conv2D(64, (3, 3), padding='same')(down0_pool) 100 | down1 = BatchNormalization()(down1) 101 | down1 = Activation('relu')(down1) 102 | down1 = Conv2D(64, (3, 3), padding='same')(down1) 103 | down1 = BatchNormalization()(down1) 104 | down1 = Activation('relu')(down1) 105 | down1_pool = MaxPooling2D((2, 2), strides=(2, 2))(down1) 106 | # 64 107 | 108 | down2 = Conv2D(128, (3, 3), padding='same')(down1_pool) 109 | down2 = BatchNormalization()(down2) 110 | down2 = Activation('relu')(down2) 111 | down2 = Conv2D(128, (3, 3), padding='same')(down2) 112 | down2 = BatchNormalization()(down2) 113 | down2 = Activation('relu')(down2) 114 | down2_pool = MaxPooling2D((2, 2), strides=(2, 2))(down2) 115 | # 32 116 | 117 | down3 = Conv2D(256, (3, 3), padding='same')(down2_pool) 118 | down3 = BatchNormalization()(down3) 119 | down3 = Activation('relu')(down3) 120 | down3 = Conv2D(256, (3, 3), padding='same')(down3) 121 | down3 = BatchNormalization()(down3) 122 | down3 = Activation('relu')(down3) 123 | down3_pool = MaxPooling2D((2, 2), strides=(2, 2))(down3) 124 | # 16 125 | 126 | down4 = Conv2D(512, (3, 3), padding='same')(down3_pool) 127 | down4 = BatchNormalization()(down4) 128 | down4 = Activation('relu')(down4) 129 | down4 = Conv2D(512, (3, 3), padding='same')(down4) 130 | down4 = BatchNormalization()(down4) 131 | down4 = Activation('relu')(down4) 132 | down4_pool = MaxPooling2D((2, 2), strides=(2, 2))(down4) 133 | # 8 134 | 135 | center = Conv2D(1024, (3, 3), padding='same')(down4_pool) 136 | center = BatchNormalization()(center) 137 | center = Activation('relu')(center) 138 | center = Conv2D(1024, (3, 3), padding='same')(center) 139 | center = BatchNormalization()(center) 140 | center = Activation('relu')(center) 141 | # center 142 | 143 | up4 = UpSampling2D((2, 2))(center) 144 | up4 = concatenate([down4, up4], axis=3) 145 | up4 = Conv2D(512, (3, 3), padding='same')(up4) 146 | up4 = BatchNormalization()(up4) 147 | up4 = Activation('relu')(up4) 148 | up4 = Conv2D(512, (3, 3), padding='same')(up4) 149 | up4 = BatchNormalization()(up4) 150 | up4 = Activation('relu')(up4) 151 | up4 = Conv2D(512, (3, 3), padding='same')(up4) 152 | up4 = BatchNormalization()(up4) 153 | up4 = Activation('relu')(up4) 154 | # 16 155 | 156 | up3 = UpSampling2D((2, 2))(up4) 157 | up3 = concatenate([down3, up3], axis=3) 158 | up3 = Conv2D(256, (3, 3), padding='same')(up3) 159 | up3 = BatchNormalization()(up3) 160 | up3 = Activation('relu')(up3) 161 | up3 = Conv2D(256, (3, 3), padding='same')(up3) 162 | up3 = BatchNormalization()(up3) 163 | up3 = Activation('relu')(up3) 164 | up3 = Conv2D(256, (3, 3), padding='same')(up3) 165 | up3 = BatchNormalization()(up3) 166 | up3 = Activation('relu')(up3) 167 | # 32 168 | 169 | up2 = UpSampling2D((2, 2))(up3) 170 | up2 = concatenate([down2, up2], axis=3) 171 | up2 = Conv2D(128, (3, 3), padding='same')(up2) 172 | up2 = BatchNormalization()(up2) 173 | up2 = Activation('relu')(up2) 174 | up2 = Conv2D(128, (3, 3), padding='same')(up2) 175 | up2 = BatchNormalization()(up2) 176 | up2 = Activation('relu')(up2) 177 | up2 = Conv2D(128, (3, 3), padding='same')(up2) 178 | up2 = BatchNormalization()(up2) 179 | up2 = Activation('relu')(up2) 180 | # 64 181 | 182 | up1 = UpSampling2D((2, 2))(up2) 183 | up1 = 
concatenate([down1, up1], axis=3) 184 | up1 = Conv2D(64, (3, 3), padding='same')(up1) 185 | up1 = BatchNormalization()(up1) 186 | up1 = Activation('relu')(up1) 187 | up1 = Conv2D(64, (3, 3), padding='same')(up1) 188 | up1 = BatchNormalization()(up1) 189 | up1 = Activation('relu')(up1) 190 | up1 = Conv2D(64, (3, 3), padding='same')(up1) 191 | up1 = BatchNormalization()(up1) 192 | up1 = Activation('relu')(up1) 193 | # 128 194 | 195 | up0 = UpSampling2D((2, 2))(up1) 196 | up0 = concatenate([down0, up0], axis=3) 197 | up0 = Conv2D(32, (3, 3), padding='same')(up0) 198 | up0 = BatchNormalization()(up0) 199 | up0 = Activation('relu')(up0) 200 | up0 = Conv2D(32, (3, 3), padding='same')(up0) 201 | up0 = BatchNormalization()(up0) 202 | up0 = Activation('relu')(up0) 203 | up0 = Conv2D(32, (3, 3), padding='same')(up0) 204 | up0 = BatchNormalization()(up0) 205 | up0 = Activation('relu')(up0) 206 | # 256 207 | 208 | #classify = Conv2D(num_classes, (1, 1), activation='sigmoid')(up0) 209 | classify = Conv2D(3, (1, 1), activation='sigmoid')(up0) #using dataGen means 1,3,4 channels only 210 | 211 | model = Model(inputs=inputs, outputs=classify) 212 | 213 | #model.compile(optimizer=RMSprop(lr=0.0001), loss=bce_dice_loss, metrics=[dice_coeff]) 214 | #model.compile(optimizer = Adam(lr = 1e-5), loss = 'sparse_categorical_crossentropy', metrics = [dice]) 215 | #model.compile(optimizer = Adam(lr = 1e-5), loss = 'categorical_crossentropy', metrics = [dice]) 216 | 217 | for layer in model.layers:#freeze the conv layers? # only want to learn input weights (and output weights?) 218 | layer.trainable = False 219 | lr = 1e-4 220 | model.compile(optimizer = Adam(lr = lr, decay=1e-5), loss = cat_dice_loss, metrics = [dice, fbeta]) 221 | #keras.optimizers.Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, schedule_decay=0.004) 222 | 223 | return model 224 | 225 | class TrainModel(object): 226 | 227 | def __init__(self, img_rows = 512, img_cols = 512): 228 | 229 | self.img_rows = img_rows 230 | self.img_cols = img_cols 231 | 232 | def load_data(self): 233 | 234 | mydata = dataProcess(self.img_rows, self.img_cols) 235 | imgs_train, imgs_mask_train, mean = mydata.load_train_data() 236 | #imgs_mask_train = to_categorical(imgs_mask_train) 237 | #imgs_test = mydata.load_test_data() 238 | return imgs_train, imgs_mask_train#, imgs_test 239 | 240 | def train(self): 241 | ''' 242 | print("loading data") 243 | imgs_train, imgs_mask_train, imgs_test = self.load_data() 244 | print("loading data done") 245 | #model = self.get_unet() 246 | model = get_unet_128((128,128,3),2) 247 | print("got unet") 248 | 249 | model_checkpoint = ModelCheckpoint('unet_new.hdf5', monitor='loss',verbose=1, save_best_only=True) 250 | print('Fitting model...') 251 | model.fit(imgs_train, imgs_mask_train, batch_size=8, nb_epoch=8, verbose=1,validation_split=0.2, shuffle=True, callbacks=[model_checkpoint]) 252 | 253 | 254 | #print('predict test data') 255 | #imgs_mask_test = model.predict(imgs_test, batch_size=1, verbose=1) 256 | #np.save('../results/imgs_mask_test.npy', imgs_mask_test) 257 | ''' 258 | '''data_gen_args1 = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=25.,width_shift_range=0.3, 259 | height_shift_range=0.3,zoom_range=[0.5,1.5], horizontal_flip=True, fill_mode = 'constant', cval = 0.0, shear_range = 0.3, zca_whitening= True)''' 260 | 261 | 262 | #data_gen_args = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=25.,width_shift_range=0.2, 263 | # 
height_shift_range=0.2,zoom_range=[0.6,1.4], horizontal_flip=True, fill_mode = 'constant', cval = 0.0, shear_range = 0.0) 264 | #data_gen_args = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=10.0,width_shift_range=0.1, 265 | #height_shift_range=0.1,zoom_range=[0.8,1.2], horizontal_flip=False, fill_mode = 'constant', cval = 0.0, shear_range = 0.0) 266 | data_gen_args = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=0.,width_shift_range=0, 267 | height_shift_range=0.0,zoom_range=0.0, horizontal_flip=True, fill_mode = 'constant', cval = 0.0, shear_range = 0) 268 | 269 | image_datagen = ImageDataGenerator(**data_gen_args)#data generator works on both image and masks 270 | mask_datagen = ImageDataGenerator(**data_gen_args) 271 | 272 | X_train, X_test, y_train, y_test = train_test_split(imgs_train, imgs_mask_train, test_size=0.20, random_state=42) 273 | 274 | seed = 1 275 | ''' 276 | image_datagen.fit(imgs_train, augment=True, seed=seed) 277 | mask_datagen.fit(imgs_mask_train, augment=True, seed=seed) 278 | 279 | image_generator=image_datagen.flow(imgs_train,batch_size = 12,shuffle = True, seed = seed) 280 | mask_generator=mask_datagen.flow(imgs_mask_train,batch_size = 12,shuffle = True, seed = seed) 281 | 282 | #image_generator = image_datagen.flow_from_directory('Train/train-128', class_mode=None,seed=seed) 283 | #mask_generator = mask_datagen.flow_from_directory('Train/train_masks-128',class_mode=None,seed=seed) 284 | 285 | ''' 286 | 287 | image_datagen.fit(X_train, augment=True, seed=seed) 288 | mask_datagen.fit(y_train, augment=True, seed=seed) 289 | image_generator=image_datagen.flow(X_train,batch_size = 4,shuffle = True, seed = seed) 290 | mask_generator=mask_datagen.flow(y_train,batch_size = 4,shuffle = True, seed = seed) 291 | 292 | valid_image_generator=image_datagen.flow(X_test,batch_size = 4,shuffle = True, seed = seed) 293 | valid_mask_generator=mask_datagen.flow(y_test,batch_size = 4,shuffle = True, seed = seed) 294 | 295 | train_generator = zip(image_generator, mask_generator) 296 | valid_generator = zip(valid_image_generator, valid_mask_generator) 297 | model = get_unet_128((self.img_rows,self.img_cols,3),2) 298 | 299 | model_checkpoint = ModelCheckpoint('unet_Datgen_256Frozen.hdf5', monitor='val_loss',verbose=1, save_best_only=True) 300 | #model.load_weights("unet_1chan.hdf5") 301 | #model.load_weights("unet_Datgen_NewDat.hdf5") 302 | model.load_weights("unet_Datgen_256Full.hdf5") 303 | #model.load_weights("unet_Datgen_fBeta.hdf5") 304 | #model.fit_generator(train_generator, steps_per_epoch=200, epochs=40, validation_data = valid_generator, validation_steps=50, verbose = 1, callbacks=[model_checkpoint]) 305 | model.fit(imgs_train, imgs_mask_train, batch_size=12, epochs=20, verbose=1,validation_split=0.2, shuffle=True, callbacks=[model_checkpoint]) 306 | 307 | 308 | if __name__ == '__main__': 309 | unet = TrainModel(256,256) 310 | #unet = TrainModel(288,288) 311 | #unet = TrainModel(512,512) 312 | #unet = TrainModel(448,640) 313 | imgs_train, imgs_mask_train = unet.load_data() 314 | unet.train() --------------------------------------------------------------------------------