├── README.md
├── data.py
├── data_prep.py
├── images
│   ├── outputvideo.mp4
│   ├── testOut.png
│   └── train_label.png
├── runModel.py
└── train.py

/README.md:
--------------------------------------------------------------------------------
1 | 
2 | # Image segmentation using U-Net
3 | Henry Yau
4 | May 28, 2018
5 | 
6 | ## Problem description
7 | The challenge presented was to perform a pixel-wise identification of the road and vehicles in a video from the car simulator Carla. http://carla.readthedocs.io/en/latest/
8 | The solution attempted here is image segmentation using a U-Net implemented in Keras with a TensorFlow backend.
9 | ### Training data
10 | The training data provided is a set of 1000 800x600 PNG images captured from the hood of a simulated car, along with their corresponding labels, also 800x600 PNG images. The training labels have integer values in the red channel corresponding to the ID of the object located at that pixel. The labels are preprocessed so that IDs not corresponding to the road or vehicles are merged into a single label and pixels corresponding to road markings are relabeled as road. In addition, pixels corresponding to the vehicle hood are set to 0 (none). To reduce overfitting, additional simulated runs on Carla were generated, providing 2300 more training images and labels. Data augmentation was used as well. The input images and labels are then resized to 256x256.
11 | 
12 | 
13 | Training image and label:
14 | ![alt text][trainingImage1]
15 | 
16 | [trainingImage1]:https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/master/images/train_label.png "Training image and label"
17 | 
18 | 
19 | ### Data augmentation
20 | The standard data generator in Keras does not appear to be meant for multiclass image segmentation problems like this one. However, there is a trick intended for transforming masks which can be applied here. By providing the same seed to both the image and label ImageDataGenerator instances, the same random transformations are applied to both. zip() then creates an iterator which yields image/label pairs.
21 | 
22 | ```
23 | data_gen_args = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=10.,width_shift_range=0.3,
24 | height_shift_range=0.3,zoom_range=[0.7,1.3], horizontal_flip=True, fill_mode = 'constant', cval = 0.0, shear_range = 0.3)
25 | image_datagen = ImageDataGenerator(**data_gen_args)
26 | mask_datagen = ImageDataGenerator(**data_gen_args)
27 | seed = 1
28 | image_datagen.fit(imgs_train, augment=True, seed=seed)
29 | mask_datagen.fit(imgs_mask_train, augment=True, seed=seed)
30 | ...
31 | train_generator = zip(image_generator, mask_generator)
32 | ```
33 | It is possible to leave the label information in a single channel and convert the values into a one-hot-encoded (OHE) tensor inside the loss function. To make implementing the loss function more straightforward, the label data is instead preprocessed so that the road labels are stored as ones and zeros in the R-channel, the vehicle labels in the G-channel, and the remaining labels in the B-channel, as in the sketch below.
34 | 
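In data.py this encoding is produced by a per-class comparison against the Carla class IDs (7 = road, 10 = vehicle, 15 = everything else); this is condensed from `load_train_data()`, with `n`, `rows` and `cols` standing in for the mask array's dimensions:

```
# class IDs 7 (road), 10 (vehicle) and 15 (everything else)
# become the R, G and B channels of a binary label image
imgs_mask_train2 = np.zeros((n, rows, cols, 3))
for i in range(n):
    for indc, c in enumerate([7, 10, 15]):
        imgs_mask_train2[i, :, :, indc] = (imgs_mask_train[i, :, :, 0] == c).astype(int)
```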
35 | ## U-Net Model
36 | The U-Net architecture is a fairly recent advancement in the image segmentation field, originally developed by Ronneberger et al. at the University of Freiburg to perform image segmentation of neuronal structures. https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
37 | 
38 | The general structure of the U-Net starts out similar to a generic CNN used for image classification: a series of 3x3 convolutional filters and ReLU activation functions interspersed with max pooling layers. This contraction path squeezes the input down to a deep stack of feature maps; in the original work the center has 1024 feature channels, which are ultimately mapped to two classes (background and foreground). The implementation in this project keeps the 1024-channel center but maps the output to three label channels: road, vehicle, and all remaining labels. In a classification problem, a fully connected layer and a sigmoid activation function would be placed after this point to produce an output; U-Net, however, differs significantly from here on. An expansion path is performed which results in a label image with the same dimensions as the input image. The expansion is performed with a series of up-convolutions, each using a 2x2 kernel in the original work and doubling the spatial resolution. What makes U-Net unique is that the feature maps are passed directly from each contraction path layer to the corresponding layer of the expansion path, where they are concatenated with the upsampled features and convolved to recover spatial detail.
39 | 
40 | ## Implementation details
41 | ### U-Net layers
42 | The implementation used here is derived from: https://github.com/petrosgk/Kaggle-Carvana-Image-Masking-Challenge
43 | 
44 | The contraction path layers are implemented in Keras as:
45 | ~~~
46 | down = Conv2D(64, (3, 3), padding='same')(input)
47 | down = BatchNormalization()(down)
48 | down = Activation('relu')(down)
49 | down = Conv2D(64, (3, 3), padding='same')(down)
50 | down = BatchNormalization()(down)
51 | down = Activation('relu')(down)
52 | down_pool = MaxPooling2D((2, 2), strides=(2, 2))(down)
53 | ~~~
54 | Each subsequent block doubles the number of filters until the feature maps are squeezed into a 1024-filter center layer. The up-convolution layers use ```UpSampling2D((2,2))```, which doubles the spatial size of the input, followed by ```concatenate([down, up])```, which joins the feature maps of the corresponding contraction path layer to the expansion layer. The ```Conv2D()``` layers that follow then mix the concatenated features at the new resolution. Each additional block doubles the output dimensions until the original dimensions are reached. (The original implementation uses no padding, so its output dimensions are smaller than the input image; here 'same' padding is used so they match.)
55 | 
56 | ### Loss function
57 | The loss function is a combination of one minus the F-beta score, one minus the Dice score, and a weighted categorical cross entropy. The F-beta score is computed as the average of the two F-beta scores for the vehicle and the road. To implement this, the ```Lambda``` layer is used to separate the two channels. For example, the predictions are separated using:
58 | ~~~
59 | pred0 = Lambda(lambda x : x[:,:,:,0])(y_pred)
60 | pred1 = Lambda(lambda x : x[:,:,:,1])(y_pred)
61 | ~~~
62 | From here, the predictions and truths are flattened to 1D tensors to compute precision and recall, which are then used to compute the F-beta scores using the definitions:
63 | $$\mathrm{precision} =\frac{| \mathrm{pred}\cap\mathrm{truth}|}{|\mathrm{pred}|}$$
64 | $$\mathrm{recall} =\frac{| \mathrm{pred}\cap\mathrm{truth}|}{|\mathrm{truth}|}$$
65 | $$F_{\beta} = (1+\beta^2)\frac{\mathrm{precision}\cdot \mathrm{recall}}{(\beta^2\, \mathrm{precision})+ \mathrm{recall}}$$
66 | 
67 | The F-beta score is a generalization of the F1 score in which more weight can be given to either precision or recall. The Dice index is a measure of how similar two sets are, closely related to precision and recall; one minus the Dice index is a commonly used loss function for the image segmentation problem. The Dice index is defined as:
68 | $$\mathrm{Dice} =2\frac{| \mathrm{pred}\cap\mathrm{truth}|}{|\mathrm{pred}|+|\mathrm{truth}|}$$
69 | 
70 | Finally, the weighted categorical cross entropy is a log loss in which each categorical label $i$ is assigned a weight $w_i$:
71 | $$\mathrm{loss} = -\sum_i w_i\, \mathrm{truth}_i \log(\mathrm{pred}_i)$$
72 | 
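The weighted cross entropy is implemented with the Keras backend in train.py; the weights below (0.5 for road, 4.0 for vehicle, 0.0 for the remaining labels) are the values currently set there:

~~~
def weighted_categorical_crossentropy(y_true, y_pred):
    weights = K.variable([0.5, 4.0, 0.0])  # road, vehicle, everything else
    # scale predictions so that the class probabilities of each pixel sum to 1
    y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
    # clip to prevent NaNs and Infs in the log
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    loss = y_true * K.log(y_pred) * weights
    return -K.sum(loss, -1)
~~~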
73 | ### Results
74 | I intended to train the model with only the weighted categorical cross entropy and the Dice loss, then add the F-beta loss once a local minimum was found, but ran out of time on the virtual machine. The validation Dice score was around 0.98 at the end of training. The model still produces adequate results at over 11 frames per second, with an F-beta score of over 0.98 for the road and 0.81 for vehicles. With more time for training, the accuracy could likely be increased significantly.
75 | 
76 | A video of a test sample can be viewed in the images directory.
77 | Clicking on the sample test output image below links to a video of the sample output overlaid on the test video.
78 | 
79 | [![IMAGE ALT TEXT HERE](https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/master/images/testOut.png)](https://github.com/henyau/Image-Segmentation-with-Unet/blob/master/images/outputvideo.mp4?raw=true)
80 | 
81 | ### Todo
82 | 
83 | Convert the model to TensorRT. The custom loss functions will need to be converted to plug-ins.
--------------------------------------------------------------------------------
/data.py:
--------------------------------------------------------------------------------
1 | from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
2 | from skimage import exposure
3 | #from skimage.morphology import disk
4 | #from skimage.filters import unsharp_mask
5 | from skimage import filters
6 | import numpy as np
7 | import os
8 | import glob
9 | from numpy import zeros, newaxis
10 | #import cv2
11 | 
12 | class dataProcess(object):
13 |     '''
14 |     Images and labels are stored in NumPy arrays and saved to disk; images are normalized and mean-centered when loaded.
15 |     '''
16 | 
17 |     def __init__(self, out_rows, out_cols, data_path = "./Train/train-256", label_path = "./Train/train_masks-256", test_path = "./test", npy_path = "./npydata", img_type = "png"):
18 |         self.out_rows = out_rows
19 |         self.out_cols = out_cols
20 |         self.data_path = './Train/train-'f'{str(out_rows)}' # note: derived from out_rows, overriding the data_path argument
21 |         self.label_path = './Train/train_masks-'f'{str(out_rows)}'
22 |         self.img_type = img_type
23 |         self.test_path = test_path
24 |         self.npy_path = npy_path
25 | 
26 |     def create_train_data(self):
27 |         i = 0
28 |         num_img = 3000
29 |         print('-'*30)
30 |         print('Creating training images...')
31 |         print('-'*30)
32 |         imgs = glob.glob(self.data_path+"/*."+self.img_type)
33 |         print(len(imgs))
34 | 
35 | 
36 |         imgdatas = np.ndarray((num_img,self.out_cols,self.out_rows,3), dtype=np.uint8)
37 |         imglabels = np.ndarray((num_img,self.out_cols,self.out_rows,1), dtype=np.uint8)
38 | 
39 | 
40 |         for imgname in imgs:
41 |             midname = imgname[imgname.rindex("/")+1:]
42 |             img = load_img(self.data_path + "/" + midname,grayscale = False)
43 |             label = load_img(self.label_path + "/" + midname,grayscale = False)
44 |             print(midname)
45 | 
46 |             img = img_to_array(img)
47 |             label = img_to_array(label)
48 |             #img = cv2.imread(self.data_path + "/" + midname,cv2.IMREAD_GRAYSCALE)
49 |             #label = cv2.imread(self.label_path + "/" + midname,cv2.IMREAD_GRAYSCALE)
50 |             #img = np.array([img])
51 |             #label = np.array([label])
52 |             #imgdatas[i] = img
53 |             tmp2 = label[...,0]
54 |             tmp2 = 
tmp2[...,newaxis] #still need 128,128,1 55 | #imglabels[i] = tmp2 56 | 57 | 58 | #if there is a car to the left add it 59 | ''' 60 | cars_b = False 61 | car_px_cnt = 0 62 | for ii in range(128): 63 | for jj in range(64, 110): 64 | if tmp2[ii,jj,0] == 10: 65 | car_px_cnt+=1 66 | if car_px_cnt >10: 67 | cars_b = True 68 | print(midname) 69 | break 70 | if cars_b == True: 71 | break; 72 | if cars_b: 73 | imglabels[i] = tmp2 74 | imgdatas[i] = img 75 | i += 1 76 | ''' 77 | 78 | imglabels[i] = tmp2 79 | imgdatas[i] = img 80 | i += 1 81 | 82 | if i % 100 == 0: 83 | print('Done: {0}/{1} images'.format(i, len(imgs))) 84 | #i += 1 85 | if i>=num_img: 86 | break 87 | print('loading done') 88 | np.save(self.npy_path + '/imgs_train_'f'{str(self.out_rows)}.npy', imgdatas) 89 | np.save(self.npy_path + '/imgs_mask_train_'f'{str(self.out_rows)}.npy', imglabels) 90 | print('Saving to .npy files done.') 91 | 92 | def create_test_data(self): 93 | i = 0 94 | print('-'*30) 95 | print('Creating test images...') 96 | print('-'*30) 97 | imgs = glob.glob(self.test_path+"/*."+self.img_type) 98 | print(len(imgs)) 99 | imgdatas = np.ndarray((len(imgs),self.out_rows,self.out_cols,1), dtype=np.uint8) 100 | for imgname in imgs: 101 | midname = imgname[imgname.rindex("/")+1:] 102 | img = load_img(self.test_path + "/" + midname,grayscale = True) 103 | img = img_to_array(img) 104 | #img = cv2.imread(self.test_path + "/" + midname,cv2.IMREAD_GRAYSCALE) 105 | #img = np.array([img]) 106 | imgdatas[i] = img 107 | i += 1 108 | print('loading done') 109 | np.save(self.npy_path + '/imgs_test.npy', imgdatas) 110 | print('Saving to imgs_test.npy files done.') 111 | 112 | def load_train_data_1chan(self): 113 | print('-'*30) 114 | print('load train images...') 115 | print('-'*30) 116 | imgs_train = np.load(self.npy_path+"/imgs_train.npy") 117 | imgs_mask_train = np.load(self.npy_path+"/imgs_mask_train.npy") 118 | imgs_train = imgs_train.astype('float32') 119 | 120 | imgs_train /= 255 121 | mean = imgs_train.mean(axis = 0) 122 | imgs_train -= mean 123 | #print('mean: '+ repr(mean)) 124 | #imgs_mask_train = imgs_mask_train.astype('float32') 125 | #imgs_mask_train /= 255 126 | 127 | #imgs_mask_train[imgs_mask_train > 0] = 1 # do binary for now 128 | dim_mask = imgs_mask_train.shape 129 | 130 | imgs_mask_train2 = np.zeros(( dim_mask[0] , dim_mask[1], dim_mask[2] , 1 )) # two classes in 3 channels? 131 | 132 | #for categorical 133 | for i in range(dim_mask[0]): 134 | for indc, c in enumerate([7,10,15]): 135 | imgs_mask_train2[i, : , : , 0 ] += (imgs_mask_train[i, : , :, 0] == c ).astype(int) 136 | 137 | #for sparse 138 | 139 | 140 | #imgs_mask_train[imgs_mask_train <= 0.5] = 0 141 | #the mask should have two channels, road and vehicle, then just repaint after 142 | #return imgs_train,imgs_mask_train2 143 | 144 | #test sparse categorical for now? 
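        # returns the normalized images, a single-channel mask marking pixels whose class ID is 7, 10 or 15, and the per-pixel training mean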
145 | return imgs_train,imgs_mask_train2,mean 146 | 147 | def load_train_data(self): 148 | print('-'*30) 149 | print('load train images...') 150 | print('-'*30) 151 | imgs_train = np.load(self.npy_path+'/imgs_train_'f'{str(self.out_rows)}.npy') 152 | imgs_mask_train = np.load(self.npy_path+'/imgs_mask_train_'f'{str(self.out_rows)}.npy') 153 | 154 | 155 | #imgs_train = exposure.equalize_hist(imgs_train) #global equalize 156 | #selem = disk(30) 157 | #imgs_train = rank.equalize(imgs_train, selem=selem) # local equalize 158 | #imgs_train = exposure.adjust_gamma(imgs_train, gamma=2, gain=1) 159 | 160 | imgs_train = imgs_train.astype('float32') 161 | 162 | #imgs_train /= 127 163 | imgs_train /= 255 164 | #imgs_train -= 1 165 | #imgs_train = filters.unsharp_mask(imgs_train) 166 | mean = imgs_train.mean(axis = 0) 167 | #imgs_train -= mean #test without subtracting mean 168 | 169 | 170 | #print('mean: '+ repr(mean)) 171 | #imgs_mask_train = imgs_mask_train.astype('float32') 172 | #imgs_mask_train /= 255 173 | #imgs_mask_train[imgs_mask_train > 0] = 1 # do binary for now 174 | dim_mask = imgs_mask_train.shape 175 | 176 | imgs_mask_train2 = np.zeros(( dim_mask[0] , dim_mask[1], dim_mask[2] , 3 )) # two classes vehicle or road 177 | 178 | #for categorical 179 | for i in range(dim_mask[0]): 180 | for indc, c in enumerate([7,10,15]): 181 | imgs_mask_train2[i, : , : , indc ] = (imgs_mask_train[i, : , :, 0] == c ).astype(int) 182 | 183 | #for sparse 184 | #imgs_mask_train[imgs_mask_train <= 0.5] = 0 185 | #the mask should have two channels, road and vehicle, then just repaint after 186 | #return imgs_train,imgs_mask_train2 187 | 188 | #test sparse categorical for now? 189 | return imgs_train,imgs_mask_train2,mean 190 | 191 | 192 | def create_mean(self): 193 | print('-'*30) 194 | print('load train to compute mean images...') 195 | print('-'*30) 196 | imgs_train = np.load(self.npy_path+'/imgs_train_'f'{str(self.out_rows)}.npy') 197 | imgs_train = imgs_train.astype('float32') 198 | imgs_train /= 255 199 | mean = imgs_train.mean(axis = 0) 200 | np.save(self.npy_path + '/train_mean_'f'{str(self.out_rows)}.npy', mean) 201 | 202 | 203 | def get_mean(self): 204 | mean = np.load(self.npy_path+'/train_mean_'f'{str(self.out_rows)}.npy') 205 | return mean 206 | 207 | 208 | if __name__ == "__main__": 209 | #traindata = dataProcess(448,448, data_path = "./Train/train-448", label_path = "./Train/train_masks-448" 210 | traindata = dataProcess(256,256, data_path = "./Train/train-256", label_path = "./Train/train_masks-256") 211 | #traindata = dataProcess(288,288, data_path = "./Train/train-288", label_path = "./Train/train_masks-288") 212 | #traindata = dataProcess(512,512, data_path = "./Train/train-512", label_path = "./Train/train_masks-512") 213 | traindata.create_train_data() 214 | traindata.create_mean() 215 | 216 | -------------------------------------------------------------------------------- /data_prep.py: -------------------------------------------------------------------------------- 1 | #from fastai.conv_learner import * 2 | #from fastai.dataset import * 3 | from sklearn.model_selection import train_test_split 4 | #from fastai.models.resnet import vgg_resnet50 5 | from concurrent.futures import ThreadPoolExecutor 6 | 7 | from PIL import Image 8 | import matplotlib.pyplot as plt 9 | from pathlib import Path 10 | import json 11 | #torch.cuda.set_device(0) 12 | import shutil 13 | import numpy as np 14 | 15 | import cv2 16 | 17 | PATH = Path('./Train/') 18 | list(PATH.iterdir()) 19 | #sx = 256 20 | sz = 256 21 | 
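# CameraRGB holds the simulator frames and CameraSeg the per-pixel class IDs;
# resize_mask() below merges lane markings (ID 6) into road (7), collapses the
# other non-road/non-vehicle IDs into 15, and zeroes out the vehicle hood region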
22 | TRAIN_DN = 'CameraRGB' 23 | MASKS_DN = 'CameraSeg' 24 | 25 | def show_img(im, figsize=None, ax=None, alpha=None): 26 | if not ax: fig,ax = plt.subplots(figsize=figsize) 27 | ax.imshow(im, alpha=alpha) 28 | ax.set_axis_off() 29 | return ax 30 | 31 | #list((PATH/TRAIN_DN).iterdir())[:5] 32 | list((PATH/MASKS_DN).iterdir())[:5] 33 | 34 | 35 | (PATH/'train_masks-'f'{str(sz)}').mkdir(exist_ok=True) 36 | (PATH/'train-'f'{str(sz)}').mkdir(exist_ok=True) 37 | 38 | def resize_mask(fn): 39 | tmpimg = Image.open(fn).resize((sz,sz)) 40 | oneChanAr = np.zeros((sz, sz)); 41 | for i in range(0,sz): 42 | for j in range(0,sz): 43 | r, g, b = tmpimg.getpixel((i, j)) 44 | if r == 6: 45 | #tmpimg.putpixel((i, j), (7,0,0)) 46 | oneChanAr[j][i] = 7 47 | elif r!= 7 and r!= 10 and r!=0: 48 | #tmpimg.putpixel((i, j), (15,0,0)) 49 | oneChanAr[j][i] = 15 50 | elif j>sz*0.82 and r ==10: # remove hood, use 100 for 128x128 images 51 | #tmpimg.putpixel((i, j), (0,0,0)) #can actually extend the road 52 | oneChanAr[j][i] = 0 53 | else: 54 | #tmpimg.putpixel((i, j), (r,0,0)) #can actually extend the road 55 | oneChanAr[j][i] = r 56 | 57 | 58 | oneChanIm = Image.fromarray(np.uint8(oneChanAr)) 59 | oneChanIm.save((fn.parent.parent)/'train_masks-'f'{str(sz)}'/fn.name, mode='L') 60 | 61 | def resize_img(fn): 62 | Image.open(fn).resize((sz,sz)).save((fn.parent.parent)/'train-'f'{str(sz)}'/fn.name) 63 | 64 | #resize_mask(PATH/MASKS_DN/'200.png') 65 | #ims = cv2.imread(str(PATH/TRAIN_DN/'200.png')) 66 | #im_masks = cv2.imread(str(PATH/'train_masks-128/200.png')) 67 | #print(im_masks.shape) 68 | #ax = show_img(ims) 69 | #ax = show_img(im_masks[...,0]) 70 | 71 | 72 | 73 | files = list((PATH/f'{MASKS_DN}').iterdir()) 74 | with ThreadPoolExecutor(8) as e: 75 | e.map(resize_mask, files) 76 | 77 | files = list((PATH/f'{TRAIN_DN}').iterdir()) 78 | with ThreadPoolExecutor(8) as e: 79 | e.map(resize_img, files) 80 | -------------------------------------------------------------------------------- /images/outputvideo.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/e46e6bd7c195098e7f20b9a898137d32f9b1250a/images/outputvideo.mp4 -------------------------------------------------------------------------------- /images/testOut.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/e46e6bd7c195098e7f20b9a898137d32f9b1250a/images/testOut.png -------------------------------------------------------------------------------- /images/train_label.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/henyau/Image-Segmentation-with-Unet/e46e6bd7c195098e7f20b9a898137d32f9b1250a/images/train_label.png -------------------------------------------------------------------------------- /runModel.py: -------------------------------------------------------------------------------- 1 | from keras.models import load_model 2 | import sys, skvideo.io, json, base64 3 | from skimage import exposure 4 | import numpy as np 5 | from PIL import Image 6 | from io import BytesIO, StringIO 7 | from train import * 8 | import cv2 9 | import os 10 | from concurrent.futures import ThreadPoolExecutor 11 | 12 | #sz = 256 13 | #sx = 320 14 | #sy = 224 15 | 16 | #sx = 448 17 | #sy = 640 18 | sx = 256 19 | sy = 256 20 | 21 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 22 | 23 | file = sys.argv[-1] 24 | 25 | if 
file == 'runModel.py':
26 |     print("Error loading video")
27 |     sys.exit(1)
28 | 
29 | 
30 | # Define encoder function
31 | def encode(array):
32 |     pil_img = Image.fromarray(array)
33 |     buff = BytesIO()
34 |     pil_img.save(buff, format="PNG")
35 |     return base64.b64encode(buff.getvalue()).decode("utf-8")
36 | 
37 | video = skvideo.io.vread(file)
38 | 
39 | answer_key = {}
40 | 
41 | # Frame numbering starts at 1
42 | frame = 1
43 | 
44 | # Load model
45 | #model = get_unet_128((sx,sy,3),3)
46 | #model.load_weights("unet_Datgen.hdf5")
47 | #model.load_weights("unet_Datgen_NoMean448.hdf5")
48 | model = load_model('./models/unet_Datgen_256Frozen.hdf5')
49 | 
50 | #model.load_weights("unet_1chan.hdf5")
51 | 
52 | 
53 | #imgs_train, imgs_mask_train, train_mean = mydata.load_train_data()
54 | train_mean = np.load("./npydata/train_mean_256.npy") #mydata.get_mean()
55 | #train_mean = np.load("./npydata/train_mean.npy") #mydata.get_mean()
56 | 
57 | # Process video with trained U-Net model
58 | 
59 | 
60 | #process all the frames in the video, resize frames into an array [n,sx,sy,3], then predict on the whole batch at once
61 | dim = video.shape[0]
62 | 
63 | test_array = np.zeros((dim, sx,sy,3))
64 | #print(test_array.shape)
65 | i = 0
66 | 
67 | 
68 | 
69 | for rgb_frame in video:
70 |     #use PIL for test
71 |     #ims_PIL = Image.fromarray(rgb_frame)
72 |     #ims = ims_PIL.resize((sz, sz),Image.NEAREST)
73 |     #ims = np.array(ims)
74 |     ims = cv2.resize(rgb_frame,(sy,sx) ,interpolation = cv2.INTER_CUBIC)
75 |     #ims = exposure.adjust_gamma(ims, gamma=2, gain=1)
76 |     #ims = exposure.adjust_gamma(ims) #global equalize
77 |     ims = ims.astype('float32')
78 | 
79 |     #ims /= 127.0
80 |     #ims -= 1.0
81 |     ims /= 255
82 |     #ims -= train_mean
83 | 
84 |     test_array[i, ...] = ims
85 |     i += 1
86 | 
87 | #datagen = ImageDataGenerator(zca_whitening=True)
88 | #datagen.fit(test_array)
89 | #for i in range(len(test_array)):
90 | #    test_array[i] = datagen.standardize(test_array[i])
91 | 
92 | wholeBatch = True
93 | if wholeBatch == True:
94 |     res = model.predict(test_array)
95 | 
96 |     for ylab in res:
97 | 
98 |         for i in range(0,sx): #remove hood
99 |             for j in range(0,sy):
100 |                 if j>(sy*0.82):
101 |                     ylab[i,j,1] = 0
102 | 
103 |         res_road = cv2.resize(ylab[...,0],(800,600) ,interpolation = cv2.INTER_CUBIC)
104 |         res_vehicle = cv2.resize(ylab[...,1],(800,600) ,interpolation = cv2.INTER_LINEAR)
105 | 
106 |         res_road = (res_road>0.5).astype('uint8')
107 |         res_vehicle = (res_vehicle>0.5).astype('uint8')
108 | 
109 |         answer_key[frame] = [encode(res_vehicle), encode(res_road)]
110 | 
111 |         frame += 1
112 | else:
113 |     for test_img in test_array:
114 | 
115 |         ylab = model.predict(test_img[newaxis,...])
116 |         for i in range(0,sx):
117 |             for j in range(0,sy):
118 |                 if j>(sy*0.82): # remove hood
119 |                     ylab[0,i,j,1] = 0
120 | 
121 |         res_road = cv2.resize(ylab[0,...,0],(800,600) ,interpolation = cv2.INTER_LINEAR)
122 |         res_vehicle = cv2.resize(ylab[0,...,1],(800,600) ,interpolation = cv2.INTER_LINEAR)
123 | 
124 |         res_road = (res_road>0.5).astype('uint8')
125 |         res_vehicle = (res_vehicle>0.5).astype('uint8')
126 | 
127 |         answer_key[frame] = [encode(res_vehicle[...]), encode(res_road[...])]
128 | 
129 |         frame += 1
130 | 
131 | # Print output in proper json format
132 | print(json.dumps(answer_key))
133 | 
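# Usage (assumed invocation, based on the argv handling above):
#   python runModel.py <video_file>
# The {frame: [vehicle_png_base64, road_png_base64]} answer key is printed to stdout as JSON.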
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | import os
2 | #os.environ["CUDA_VISIBLE_DEVICES"] = "0"
3 | import numpy as np
4 | from keras.models import *
5 | from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Dropout, Cropping2D, Lambda
6 | from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
7 | from keras.optimizers import *
8 | from keras.callbacks import ModelCheckpoint, LearningRateScheduler
9 | from keras import backend as keras
10 | from data import *
11 | from keras.utils.np_utils import to_categorical
12 | from keras.losses import categorical_crossentropy, binary_crossentropy
13 | from keras import backend as K
14 | from keras.utils import to_categorical
15 | from keras.models import Model
16 | from keras.layers import Input, concatenate, Conv2D, MaxPooling2D, Activation, UpSampling2D, BatchNormalization
17 | from keras.optimizers import RMSprop
18 | from sklearn.model_selection import train_test_split
19 | 
20 | def dice(y_true, y_pred): # soft Dice coefficient; argument order matches the Keras (y_true, y_pred) metric convention
21 |     y_true_f = K.flatten(y_true)
22 |     y_pred_f = K.flatten(y_pred)
23 |     intersection = K.sum(y_true_f * y_pred_f)
24 |     return (2. * intersection + 1) / (K.sum(y_true_f) + K.sum(y_pred_f) + 1)
25 | 
26 | def fbeta(y_true, y_pred): # mean F-beta over road (channel 0, beta^2 = 0.25) and vehicle (channel 1, beta^2 = 4)
27 | 
28 |     pred0 = Lambda(lambda x : x[:,:,:,0])(y_pred)
29 |     pred1 = Lambda(lambda x : x[:,:,:,1])(y_pred)
30 |     true0 = Lambda(lambda x : x[:,:,:,0])(y_true)
31 |     true1 = Lambda(lambda x : x[:,:,:,1])(y_true) # channels last
32 | 
33 |     y_pred_0 = K.flatten(pred0)
34 |     y_true_0 = K.flatten(true0)
35 | 
36 |     y_pred_1 = K.flatten(pred1)
37 |     y_true_1 = K.flatten(true1)
38 | 
39 |     intersection0 = K.sum(y_true_0 * y_pred_0)
40 |     intersection1 = K.sum(y_true_1 * y_pred_1)
41 | 
42 |     precision0 = intersection0/(K.sum(y_pred_0)+K.epsilon())
43 |     recall0 = intersection0/(K.sum(y_true_0)+K.epsilon())
44 | 
45 |     precision1 = intersection1/(K.sum(y_pred_1)+K.epsilon())
46 |     recall1 = intersection1/(K.sum(y_true_1)+K.epsilon())
47 | 
48 |     fbeta0 = (1.0+0.25)*(precision0*recall0)/(0.25*precision0+recall0+K.epsilon())
49 |     fbeta1 = (1.0+4.0)*(precision1*recall1)/(4.0*precision1+recall1+K.epsilon())
50 | 
51 |     return ((fbeta0+fbeta1)/2.0)
52 | 
53 | def fbeta_loss(y_true, y_pred):
54 |     return 1-fbeta(y_true, y_pred)
55 | 
56 | def dice_loss(y_true, y_pred):
57 |     return 1-dice(y_true, y_pred)
58 | 
59 | def bce_dice_loss(y_true, y_pred):
60 |     return binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
61 | def weighted_categorical_crossentropy(y_true, y_pred):
62 |     #weights = K.variable([0.5,2.0,0.0])
63 |     weights = K.variable([0.5,4.0,0.0]) # road, vehicle, everything else
64 | 
65 |     # scale predictions so that the class probas of each sample sum to 1
66 |     y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
67 |     # clip to prevent NaN's and Inf's
68 |     y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
69 |     # calc
70 |     loss = y_true * K.log(y_pred) * weights
71 |     loss = -K.sum(loss, -1)
72 | 
73 | 
74 |     return loss
75 | 
76 | def cat_dice_loss(y_true, y_pred):
77 |     # return categorical_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
78 |     #return weighted_categorical_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)+fbeta_loss(y_true, y_pred)
79 |     return weighted_categorical_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
80 |     #return weighted_categorical_crossentropy(y_true, y_pred) +fbeta_loss(y_true, y_pred)
81 |     #return dice_loss(y_true, y_pred)
82 | def get_unet_128(input_shape=(128, 128, 3),
83 |                  num_classes=2):
84 | 
85 |     #This U-Net implementation is derived from: https://github.com/petrosgk/Kaggle-Carvana-Image-Masking-Challenge
86 | 
87 |     inputs = Input(shape=input_shape)
88 |     # 256
89 | 
90 |     down0 = Conv2D(32, (3, 3), padding='same')(inputs)
91 |     down0 = 
BatchNormalization()(down0) 92 | down0 = Activation('relu')(down0) 93 | down0 = Conv2D(32, (3, 3), padding='same')(down0) 94 | down0 = BatchNormalization()(down0) 95 | down0 = Activation('relu')(down0) 96 | down0_pool = MaxPooling2D((2, 2), strides=(2, 2))(down0) 97 | # 128 98 | 99 | down1 = Conv2D(64, (3, 3), padding='same')(down0_pool) 100 | down1 = BatchNormalization()(down1) 101 | down1 = Activation('relu')(down1) 102 | down1 = Conv2D(64, (3, 3), padding='same')(down1) 103 | down1 = BatchNormalization()(down1) 104 | down1 = Activation('relu')(down1) 105 | down1_pool = MaxPooling2D((2, 2), strides=(2, 2))(down1) 106 | # 64 107 | 108 | down2 = Conv2D(128, (3, 3), padding='same')(down1_pool) 109 | down2 = BatchNormalization()(down2) 110 | down2 = Activation('relu')(down2) 111 | down2 = Conv2D(128, (3, 3), padding='same')(down2) 112 | down2 = BatchNormalization()(down2) 113 | down2 = Activation('relu')(down2) 114 | down2_pool = MaxPooling2D((2, 2), strides=(2, 2))(down2) 115 | # 32 116 | 117 | down3 = Conv2D(256, (3, 3), padding='same')(down2_pool) 118 | down3 = BatchNormalization()(down3) 119 | down3 = Activation('relu')(down3) 120 | down3 = Conv2D(256, (3, 3), padding='same')(down3) 121 | down3 = BatchNormalization()(down3) 122 | down3 = Activation('relu')(down3) 123 | down3_pool = MaxPooling2D((2, 2), strides=(2, 2))(down3) 124 | # 16 125 | 126 | down4 = Conv2D(512, (3, 3), padding='same')(down3_pool) 127 | down4 = BatchNormalization()(down4) 128 | down4 = Activation('relu')(down4) 129 | down4 = Conv2D(512, (3, 3), padding='same')(down4) 130 | down4 = BatchNormalization()(down4) 131 | down4 = Activation('relu')(down4) 132 | down4_pool = MaxPooling2D((2, 2), strides=(2, 2))(down4) 133 | # 8 134 | 135 | center = Conv2D(1024, (3, 3), padding='same')(down4_pool) 136 | center = BatchNormalization()(center) 137 | center = Activation('relu')(center) 138 | center = Conv2D(1024, (3, 3), padding='same')(center) 139 | center = BatchNormalization()(center) 140 | center = Activation('relu')(center) 141 | # center 142 | 143 | up4 = UpSampling2D((2, 2))(center) 144 | up4 = concatenate([down4, up4], axis=3) 145 | up4 = Conv2D(512, (3, 3), padding='same')(up4) 146 | up4 = BatchNormalization()(up4) 147 | up4 = Activation('relu')(up4) 148 | up4 = Conv2D(512, (3, 3), padding='same')(up4) 149 | up4 = BatchNormalization()(up4) 150 | up4 = Activation('relu')(up4) 151 | up4 = Conv2D(512, (3, 3), padding='same')(up4) 152 | up4 = BatchNormalization()(up4) 153 | up4 = Activation('relu')(up4) 154 | # 16 155 | 156 | up3 = UpSampling2D((2, 2))(up4) 157 | up3 = concatenate([down3, up3], axis=3) 158 | up3 = Conv2D(256, (3, 3), padding='same')(up3) 159 | up3 = BatchNormalization()(up3) 160 | up3 = Activation('relu')(up3) 161 | up3 = Conv2D(256, (3, 3), padding='same')(up3) 162 | up3 = BatchNormalization()(up3) 163 | up3 = Activation('relu')(up3) 164 | up3 = Conv2D(256, (3, 3), padding='same')(up3) 165 | up3 = BatchNormalization()(up3) 166 | up3 = Activation('relu')(up3) 167 | # 32 168 | 169 | up2 = UpSampling2D((2, 2))(up3) 170 | up2 = concatenate([down2, up2], axis=3) 171 | up2 = Conv2D(128, (3, 3), padding='same')(up2) 172 | up2 = BatchNormalization()(up2) 173 | up2 = Activation('relu')(up2) 174 | up2 = Conv2D(128, (3, 3), padding='same')(up2) 175 | up2 = BatchNormalization()(up2) 176 | up2 = Activation('relu')(up2) 177 | up2 = Conv2D(128, (3, 3), padding='same')(up2) 178 | up2 = BatchNormalization()(up2) 179 | up2 = Activation('relu')(up2) 180 | # 64 181 | 182 | up1 = UpSampling2D((2, 2))(up2) 183 | up1 = 
concatenate([down1, up1], axis=3) 184 | up1 = Conv2D(64, (3, 3), padding='same')(up1) 185 | up1 = BatchNormalization()(up1) 186 | up1 = Activation('relu')(up1) 187 | up1 = Conv2D(64, (3, 3), padding='same')(up1) 188 | up1 = BatchNormalization()(up1) 189 | up1 = Activation('relu')(up1) 190 | up1 = Conv2D(64, (3, 3), padding='same')(up1) 191 | up1 = BatchNormalization()(up1) 192 | up1 = Activation('relu')(up1) 193 | # 128 194 | 195 | up0 = UpSampling2D((2, 2))(up1) 196 | up0 = concatenate([down0, up0], axis=3) 197 | up0 = Conv2D(32, (3, 3), padding='same')(up0) 198 | up0 = BatchNormalization()(up0) 199 | up0 = Activation('relu')(up0) 200 | up0 = Conv2D(32, (3, 3), padding='same')(up0) 201 | up0 = BatchNormalization()(up0) 202 | up0 = Activation('relu')(up0) 203 | up0 = Conv2D(32, (3, 3), padding='same')(up0) 204 | up0 = BatchNormalization()(up0) 205 | up0 = Activation('relu')(up0) 206 | # 256 207 | 208 | #classify = Conv2D(num_classes, (1, 1), activation='sigmoid')(up0) 209 | classify = Conv2D(3, (1, 1), activation='sigmoid')(up0) #using dataGen means 1,3,4 channels only 210 | 211 | model = Model(inputs=inputs, outputs=classify) 212 | 213 | #model.compile(optimizer=RMSprop(lr=0.0001), loss=bce_dice_loss, metrics=[dice_coeff]) 214 | #model.compile(optimizer = Adam(lr = 1e-5), loss = 'sparse_categorical_crossentropy', metrics = [dice]) 215 | #model.compile(optimizer = Adam(lr = 1e-5), loss = 'categorical_crossentropy', metrics = [dice]) 216 | 217 | for layer in model.layers:#freeze the conv layers? # only want to learn input weights (and output weights?) 218 | layer.trainable = False 219 | lr = 1e-4 220 | model.compile(optimizer = Adam(lr = lr, decay=1e-5), loss = cat_dice_loss, metrics = [dice, fbeta]) 221 | #keras.optimizers.Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, schedule_decay=0.004) 222 | 223 | return model 224 | 225 | class TrainModel(object): 226 | 227 | def __init__(self, img_rows = 512, img_cols = 512): 228 | 229 | self.img_rows = img_rows 230 | self.img_cols = img_cols 231 | 232 | def load_data(self): 233 | 234 | mydata = dataProcess(self.img_rows, self.img_cols) 235 | imgs_train, imgs_mask_train, mean = mydata.load_train_data() 236 | #imgs_mask_train = to_categorical(imgs_mask_train) 237 | #imgs_test = mydata.load_test_data() 238 | return imgs_train, imgs_mask_train#, imgs_test 239 | 240 | def train(self): 241 | ''' 242 | print("loading data") 243 | imgs_train, imgs_mask_train, imgs_test = self.load_data() 244 | print("loading data done") 245 | #model = self.get_unet() 246 | model = get_unet_128((128,128,3),2) 247 | print("got unet") 248 | 249 | model_checkpoint = ModelCheckpoint('unet_new.hdf5', monitor='loss',verbose=1, save_best_only=True) 250 | print('Fitting model...') 251 | model.fit(imgs_train, imgs_mask_train, batch_size=8, nb_epoch=8, verbose=1,validation_split=0.2, shuffle=True, callbacks=[model_checkpoint]) 252 | 253 | 254 | #print('predict test data') 255 | #imgs_mask_test = model.predict(imgs_test, batch_size=1, verbose=1) 256 | #np.save('../results/imgs_mask_test.npy', imgs_mask_test) 257 | ''' 258 | '''data_gen_args1 = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=25.,width_shift_range=0.3, 259 | height_shift_range=0.3,zoom_range=[0.5,1.5], horizontal_flip=True, fill_mode = 'constant', cval = 0.0, shear_range = 0.3, zca_whitening= True)''' 260 | 261 | 262 | #data_gen_args = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=25.,width_shift_range=0.2, 263 | # 
height_shift_range=0.2,zoom_range=[0.6,1.4], horizontal_flip=True, fill_mode = 'constant', cval = 0.0, shear_range = 0.0) 264 | #data_gen_args = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=10.0,width_shift_range=0.1, 265 | #height_shift_range=0.1,zoom_range=[0.8,1.2], horizontal_flip=False, fill_mode = 'constant', cval = 0.0, shear_range = 0.0) 266 | data_gen_args = dict(featurewise_center=False,featurewise_std_normalization=False, rotation_range=0.,width_shift_range=0, 267 | height_shift_range=0.0,zoom_range=0.0, horizontal_flip=True, fill_mode = 'constant', cval = 0.0, shear_range = 0) 268 | 269 | image_datagen = ImageDataGenerator(**data_gen_args)#data generator works on both image and masks 270 | mask_datagen = ImageDataGenerator(**data_gen_args) 271 | 272 | X_train, X_test, y_train, y_test = train_test_split(imgs_train, imgs_mask_train, test_size=0.20, random_state=42) 273 | 274 | seed = 1 275 | ''' 276 | image_datagen.fit(imgs_train, augment=True, seed=seed) 277 | mask_datagen.fit(imgs_mask_train, augment=True, seed=seed) 278 | 279 | image_generator=image_datagen.flow(imgs_train,batch_size = 12,shuffle = True, seed = seed) 280 | mask_generator=mask_datagen.flow(imgs_mask_train,batch_size = 12,shuffle = True, seed = seed) 281 | 282 | #image_generator = image_datagen.flow_from_directory('Train/train-128', class_mode=None,seed=seed) 283 | #mask_generator = mask_datagen.flow_from_directory('Train/train_masks-128',class_mode=None,seed=seed) 284 | 285 | ''' 286 | 287 | image_datagen.fit(X_train, augment=True, seed=seed) 288 | mask_datagen.fit(y_train, augment=True, seed=seed) 289 | image_generator=image_datagen.flow(X_train,batch_size = 4,shuffle = True, seed = seed) 290 | mask_generator=mask_datagen.flow(y_train,batch_size = 4,shuffle = True, seed = seed) 291 | 292 | valid_image_generator=image_datagen.flow(X_test,batch_size = 4,shuffle = True, seed = seed) 293 | valid_mask_generator=mask_datagen.flow(y_test,batch_size = 4,shuffle = True, seed = seed) 294 | 295 | train_generator = zip(image_generator, mask_generator) 296 | valid_generator = zip(valid_image_generator, valid_mask_generator) 297 | model = get_unet_128((self.img_rows,self.img_cols,3),2) 298 | 299 | model_checkpoint = ModelCheckpoint('unet_Datgen_256Frozen.hdf5', monitor='val_loss',verbose=1, save_best_only=True) 300 | #model.load_weights("unet_1chan.hdf5") 301 | #model.load_weights("unet_Datgen_NewDat.hdf5") 302 | model.load_weights("unet_Datgen_256Full.hdf5") 303 | #model.load_weights("unet_Datgen_fBeta.hdf5") 304 | #model.fit_generator(train_generator, steps_per_epoch=200, epochs=40, validation_data = valid_generator, validation_steps=50, verbose = 1, callbacks=[model_checkpoint]) 305 | model.fit(imgs_train, imgs_mask_train, batch_size=12, epochs=20, verbose=1,validation_split=0.2, shuffle=True, callbacks=[model_checkpoint]) 306 | 307 | 308 | if __name__ == '__main__': 309 | unet = TrainModel(256,256) 310 | #unet = TrainModel(288,288) 311 | #unet = TrainModel(512,512) 312 | #unet = TrainModel(448,640) 313 | imgs_train, imgs_mask_train = unet.load_data() 314 | unet.train() --------------------------------------------------------------------------------