├── 01_finding_lane_lines ├── README.md ├── basic.py └── test_images │ └── solidWhiteRight.jpg ├── 02_traffic_sign_detector ├── .gitmodules ├── CMakeLists.txt ├── README.md ├── examples │ └── images │ │ ├── lombada.png │ │ ├── pare.jpg │ │ └── pedestre.jpg ├── src │ ├── detect.cpp │ ├── hog_detector.cpp │ ├── train_object_detector.cpp │ └── view_hog.cpp └── svm_detectors │ ├── lombada_detector.svm │ ├── pare_detector.svm │ └── pedestre_detector.svm ├── 03_opencv_detection ├── CMakeLists.txt ├── README.md ├── input │ ├── object_detection_classes_coco.txt │ ├── ssd_mobilenet_v2_coco_2018_03_29.pbtxt.txt │ └── video_1.mp4 └── main.cpp ├── 04_vehicle_detection ├── README.md ├── SSD.py ├── config.py ├── data │ ├── feat_extraction_params.pickle │ ├── feature_scaler.pickle │ └── svm_trained.pickle ├── functions_detection.py ├── functions_feat_extraction.py ├── functions_utils.py ├── img │ ├── car_samples.png │ ├── confidence_001.png │ ├── confidence_050.png │ ├── hog_car_vs_noncar.jpg │ ├── noncar_samples.png │ └── pipeline_hog.jpg ├── main_hog.py ├── main_ssd.py ├── output_images │ ├── test1.jpg │ ├── test2.jpg │ ├── test3.jpg │ ├── test4.jpg │ ├── test5.jpg │ └── test6.jpg ├── process_video.py ├── project_5_utils.py ├── test_images │ ├── test1.jpg │ ├── test2.jpg │ ├── test3.jpg │ ├── test4.jpg │ ├── test5.jpg │ └── test6.jpg ├── train.py └── vehicle.py ├── 05_road_segmentation ├── README.md ├── __init__.py ├── helper.py ├── image_augmentation.py ├── img │ ├── example.png │ └── overview.jpg ├── main.py └── project_tests.py ├── LICENSE ├── README.md ├── faster-rcnn-tutorial ├── README.md ├── custom_dataset.py ├── dataset │ ├── README.md │ ├── download_dataset.sh │ └── result.png ├── detection │ ├── __init__.py │ ├── anchor_generator.py │ ├── backbone_resnet.py │ ├── faster_RCNN.py │ ├── transformations.py │ └── utils.py ├── inference.ipynb ├── metrics │ ├── README.md │ ├── __init__.py │ ├── bounding_box.py │ ├── enumerators.py │ ├── general_utils.py │ ├── pascal_voc_evaluator.py │ └── pascal_voc_evaluator_test.py ├── setup.py └── train.py └── resources ├── autonomous-driving.md ├── datasets.md ├── deep-learning.md └── images └── overview.png /01_finding_lane_lines/README.md: -------------------------------------------------------------------------------- 1 | 2 | # Finding Lane Lines on the Road 3 | 4 | 5 | ## Basic Lane Finding Project 6 | 7 | For a self driving vehicle to stay in a lane, the first step is to identify lane lines before issuing commands to the control system. Since the lane lines can be of different colors (white, yellow) or forms (solid, dashed) this seemingly trivial task becomes increasingly difficult. Moreover, the situation is further exacerbated with variations in lighting conditions. Thankfully, there are a number of mathematical tools and approaches available nowadays to effectively extract lane lines from an image or dashcam video. 8 | 9 | ### Methodology 10 | 11 | Before attempting to detect lane lines in a video, a software pipeline is developed for lane detection in a series of images. Only after ensuring that it works satisfactorily for test images, the pipeline is employed for lane detection in a video. 12 | 13 | Consider the test image given below: 14 | 15 | ![](./test_images/solidWhiteRight.jpg) 16 | 17 | 1. The test image is first converted to grayscale from RGB using the helper function grayscale(). 18 | 19 | 2. The grayscaled image is given a gaussian blur to remove noise or spurious gradients. 20 | 21 | 3. 
Canny edge detection is applied on this blurred image and a binary image 22 | 23 | 4. A region of interest is defined to separate the lanes from sorrounding environment and a masked image containing only the lanes is extracted using cv2.bitwise_and() function. 24 | 25 | 5. This binary image of identified lane lines is finally merged with the original image using cv2.addweighted() function. 26 | 27 | 28 | - **Advanced:** Built an advanced lane-finding algorithm using distortion correction, image rectification, color transforms, and gradient thresholding. Identified lane curvature and vehicle displacement. Overcame environmental challenges such as shadows and pavement changes. 29 | 30 | [Advanced Lane Finding Project](https://github.com/vsingla2/Self-Driving-Car-NanoDegree-Udacity/blob/master/Term1-Computer-Vision-and-Deep-Learning/Project4-Advanced-Lane_Lines/Advanced-Lane-Lines.ipynb) -------------------------------------------------------------------------------- /01_finding_lane_lines/basic.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import matplotlib.image as mpimg 3 | import numpy as np 4 | import cv2 5 | import math 6 | import os 7 | 8 | # https://github.com/vsingla2/Self-Driving-Car-NanoDegree-Udacity/tree/master/Term1-Computer-Vision-and-Deep-Learning/Project1-Finding-Lane-Lines 9 | 10 | def grayscale(img): 11 | """Applies the Grayscale transform 12 | This will return an image with only one color channel 13 | but NOTE: to see the returned image as grayscale 14 | (assuming your grayscaled image is called 'gray') 15 | you should call plt.imshow(gray, cmap='gray')""" 16 | return cv2.cvtColor(img, cv2.COLOR_RGB2GRAY) 17 | # Or use BGR2GRAY if you read an image with cv2.imread() 18 | # return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 19 | 20 | def rgbtohsv(img): 21 | "Applies rgb to hsv transform" 22 | return cv2.cvtColor(img, cv2.COLOR_RGB2HSV) 23 | 24 | def canny(img, low_threshold, high_threshold): 25 | """Applies the Canny transform""" 26 | return cv2.Canny(img, low_threshold, high_threshold) 27 | 28 | def gaussian_blur(img, kernel_size): 29 | """Applies a Gaussian Noise kernel""" 30 | return cv2.GaussianBlur(img, (kernel_size, kernel_size), 0) 31 | 32 | def region_of_interest(img, vertices): 33 | """ 34 | Applies an image mask. 35 | 36 | Only keeps the region of the image defined by the polygon 37 | formed from `vertices`. The rest of the image is set to black. 38 | """ 39 | #defining a blank mask to start with 40 | mask = np.zeros_like(img) 41 | 42 | #defining a 3 channel or 1 channel color to fill the mask with depending on the input image 43 | if len(img.shape) > 2: 44 | channel_count = img.shape[2] # i.e. 3 or 4 depending on your image 45 | ignore_mask_color = (255,) * channel_count 46 | else: 47 | ignore_mask_color = 255 48 | 49 | #filling pixels inside the polygon defined by "vertices" with the fill color 50 | cv2.fillPoly(mask, vertices, ignore_mask_color) 51 | 52 | #returning the image only where mask pixels are nonzero 53 | masked_image = cv2.bitwise_and(img, mask) 54 | return masked_image 55 | 56 | 57 | def draw_lines(img, lines, color=[200, 0, 0], thickness = 10): 58 | """ 59 | NOTE: this is the function you might want to use as a starting point once you want to 60 | average/extrapolate the line segments you detect to map out the full 61 | extent of the lane (going from the result shown in raw-lines-example.mp4 62 | to that shown in P1_example.mp4). 
63 | 64 | Think about things like separating line segments by their 65 | slope ((y2-y1)/(x2-x1)) to decide which segments are part of the left 66 | line vs. the right line. Then, you can average the position of each of 67 | the lines and extrapolate to the top and bottom of the lane. 68 | 69 | This function draws `lines` with `color` and `thickness`. 70 | Lines are drawn on the image inplace (mutates the image). 71 | If you want to make the lines semi-transparent, think about combining 72 | this function with the weighted_img() function below 73 | """ 74 | x_left = [] 75 | y_left = [] 76 | x_right = [] 77 | y_right = [] 78 | imshape = image.shape 79 | ysize = imshape[0] 80 | ytop = int(0.6*ysize) # need y coordinates of the top and bottom of left and right lane 81 | ybtm = int(ysize) # to calculate x values once a line is found 82 | 83 | for line in lines: 84 | for x1,y1,x2,y2 in line: 85 | slope = float(((y2-y1)/(x2-x1))) 86 | if (slope > 0.5): # if the line slope is greater than tan(26.52 deg), it is the left line 87 | x_left.append(x1) 88 | x_left.append(x2) 89 | y_left.append(y1) 90 | y_left.append(y2) 91 | if (slope < -0.5): # if the line slope is less than tan(153.48 deg), it is the right line 92 | x_right.append(x1) 93 | x_right.append(x2) 94 | y_right.append(y1) 95 | y_right.append(y2) 96 | # only execute if there are points found that meet criteria, this eliminates borderline cases i.e. rogue frames 97 | if (x_left!=[]) & (x_right!=[]) & (y_left!=[]) & (y_right!=[]): 98 | left_line_coeffs = np.polyfit(x_left, y_left, 1) 99 | left_xtop = int((ytop - left_line_coeffs[1])/left_line_coeffs[0]) 100 | left_xbtm = int((ybtm - left_line_coeffs[1])/left_line_coeffs[0]) 101 | right_line_coeffs = np.polyfit(x_right, y_right, 1) 102 | right_xtop = int((ytop - right_line_coeffs[1])/right_line_coeffs[0]) 103 | right_xbtm = int((ybtm - right_line_coeffs[1])/right_line_coeffs[0]) 104 | cv2.line(img, (left_xtop, ytop), (left_xbtm, ybtm), color, thickness) 105 | cv2.line(img, (right_xtop, ytop), (right_xbtm, ybtm), color, thickness) 106 | 107 | def hough_lines(img, rho, theta, threshold, min_line_len, max_line_gap): 108 | """ 109 | `img` should be the output of a Canny transform. 110 | 111 | Returns an image with hough lines drawn. 112 | """ 113 | lines = cv2.HoughLinesP(img, rho, theta, threshold, np.array([]), minLineLength=min_line_len, maxLineGap=max_line_gap) 114 | line_img = np.zeros((img.shape[0], img.shape[1], 3), dtype=np.uint8) 115 | draw_lines(line_img, lines) 116 | return line_img 117 | 118 | # Python 3 has support for cool math symbols. 119 | 120 | def weighted_img(img, initial_img, α=0.8, β=1., λ=0.): 121 | """ 122 | `img` is the output of the hough_lines(), An image with lines drawn on it. 123 | Should be a blank image (all black) with lines drawn on it. 124 | 125 | `initial_img` should be the image before any processing. 126 | 127 | The result image is computed as follows: 128 | 129 | initial_img * α + img * β + λ 130 | NOTE: initial_img and img must be the same shape! 
131 | """ 132 | return cv2.addWeighted(initial_img, α, img, β, λ) 133 | 134 | 135 | #reading in an image 136 | image = mpimg.imread('test_images/solidWhiteRight.jpg') 137 | 138 | #printing out some stats and plotting 139 | print('This image is:', type(image), 'with dimensions:', image.shape) 140 | 141 | # if you wanted to show a single color channel image called 'gray', for example, call as plt.imshow(gray, cmap='gray') 142 | plt.imshow(image) 143 | 144 | test_images_list = os.listdir("test_images/") 145 | 146 | 147 | # define parameters needed for helper functions (given inline) 148 | kernel_size = 5 # gaussian blur 149 | low_threshold = 60 # canny edge detection 150 | high_threshold = 180 # canny edge detection 151 | # Define the Hough transform parameters 152 | rho = 1 # distance resolution in pixels of the Hough grid 153 | theta = np.pi/180 # angular resolution in radians of the Hough grid 154 | threshold = 20 # minimum number of votes (intersections in Hough grid cell) 155 | min_line_length = 40 # minimum number of pixels making up a line 156 | max_line_gap = 25 # maximum gap in pixels between connectable line segments 157 | 158 | for test_image in test_images_list: # iterating through the images in test_images folder 159 | image = mpimg.imread('test_images/' + test_image) # reading in an image 160 | gray = grayscale(image) # convert to grayscale 161 | blur_gray = gaussian_blur(gray, kernel_size) # add gaussian blur to remove noise 162 | edges = canny(blur_gray, low_threshold, high_threshold) # perform canny edge detection 163 | # extract image size and define vertices of the four sided polygon for masking 164 | imshape = image.shape 165 | xsize = imshape[1] 166 | ysize = imshape[0] 167 | vertices = np.array([[(0.05*xsize, ysize ),(0.44*xsize, 0.6*ysize),\ 168 | (0.55*xsize, 0.6*ysize), (0.95*xsize, ysize)]], dtype=np.int32) # 169 | masked_edges = region_of_interest(edges, vertices) # retain information only in the region of interest 170 | line_image = hough_lines(masked_edges, rho, theta, threshold,\ 171 | min_line_length, max_line_gap) # perform hough transform and retain lines with specific properties 172 | lines_edges = weighted_img(line_image, image, α=0.8, β=1., λ=0.) # Draw the lines on the edge image 173 | plt.imshow(lines_edges) # Display the image 174 | plt.show() 175 | mpimg.imsave('test_images_output/' + test_image, lines_edges) # save the resulting image -------------------------------------------------------------------------------- /01_finding_lane_lines/test_images/solidWhiteRight.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/01_finding_lane_lines/test_images/solidWhiteRight.jpg -------------------------------------------------------------------------------- /02_traffic_sign_detector/.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "dlib"] 2 | path = dlib 3 | url = https://github.com/davisking/dlib 4 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | # 2 | # This is a CMake makefile. 
You can find the cmake utility and 3 | # information about it at http://www.cmake.org 4 | # 5 | 6 | cmake_minimum_required(VERSION 2.8.4) 7 | 8 | PROJECT(transito-cv) 9 | 10 | include(dlib/dlib/cmake) 11 | 12 | set(CMAKE_BUILD_TYPE Release) 13 | set(DLIB_NO_GUI_SUPPORT OFF) 14 | 15 | option(USE_AVX_INSTRUCTIONS "Compile your program with AVX instructions" OFF) 16 | 17 | IF(USE_AVX_INSTRUCTIONS) 18 | add_definitions(-mavx) 19 | add_definitions(-march=native) 20 | ENDIF() 21 | 22 | MACRO(add_source name) 23 | ADD_EXECUTABLE(${name} src/${name}.cpp) 24 | TARGET_LINK_LIBRARIES(${name} dlib ) 25 | ENDMACRO() 26 | 27 | add_source(hog_detector) 28 | add_source(train_object_detector) 29 | add_source(detect) 30 | add_source(view_hog) 31 | 32 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | This is a traffic sign detector and classifier that uses [dlib](http://dlib.net/) and its implementation of the Felzenszwalb's version of the Histogram of Oriented Gradients (HoG) detector. 4 | 5 | The training examples used in this repository are from Brazilian road signs, but the classifier should work with any traffic signs, as long as you train it properly. Google Street View images can be used to train the detectors. 25~40 images are sufficient to train a good detector. 6 | 7 | ![](https://cloud.githubusercontent.com/assets/294960/7904020/7d216ae0-07c3-11e5-96fe-2b9d020fec4c.png) 8 | 9 | Note: all programs accept `-h` as command-line parameter to show a help message. 10 | 11 | ## Build 12 | 13 | ``` 14 | mkdir build 15 | (cd build; cmake .. && cmake --build .) 16 | ``` 17 | 18 | If you want to enable AVX instructions (make sure you have compatibility): 19 | 20 | ``` 21 | (cd build; cmake .. -DUSE_AVX_INSTRUCTIONS=ON && cmake --build .) 22 | ``` 23 | 24 | ## Mark signs on images 25 | 26 | 1. Compile `imglab`: 27 | 28 | ``` 29 | cd dlib/tools/imglab 30 | mkdir build 31 | cd build 32 | cmake .. 33 | cmake --build . 34 | ``` 35 | 36 | 2. Create XML from sample images and Train the fHOG detector: 37 | 38 | Please check [transito-cv](https://github.com/fabioperez/transito-cv), there are methods with better results than HoG for traffic sign detector, such as Deep Learning architectures. 39 | 40 | 41 | ## Visualize HOG detectors 42 | 43 | To visualize detectors, use the program `view_hog`. Usage: 44 | 45 | ``` 46 | build/view_hog svm_detectors/pare_detector.svm 47 | ``` 48 | 49 | ![image](https://cloud.githubusercontent.com/assets/294960/8306983/6fa2ca40-1992-11e5-905d-04260fbfe128.png) 50 | 51 | 52 | ## Detect and Classify 53 | 54 | To detect and classify images, run `detect` with the video frames as parameters, use the parameter `--wait` to wait for user input to show next image. 
55 | 56 | ``` 57 | build/detect --wait examples/images/*.jpg 58 | ``` 59 | 60 | ## Examples 61 | 62 | To run the examples: 63 | 64 | ``` 65 | build/detect --wait -u1 examples/images/* 66 | ``` 67 | 68 | ![image6](https://cloud.githubusercontent.com/assets/294960/8306981/6ef3e142-1992-11e5-91b0-e753737bcb5f.png) 69 | ![image7](https://cloud.githubusercontent.com/assets/294960/8306982/6f7f22c0-1992-11e5-8c2e-4079ddffec47.png) 70 | ![image8](https://cloud.githubusercontent.com/assets/294960/8306980/6edb6ae0-1992-11e5-9d77-ddbd0cd59a7b.png) 71 | 72 | 73 | ## References 74 | 75 | - 76 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/examples/images/lombada.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/examples/images/lombada.png -------------------------------------------------------------------------------- /02_traffic_sign_detector/examples/images/pare.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/examples/images/pare.jpg -------------------------------------------------------------------------------- /02_traffic_sign_detector/examples/images/pedestre.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/examples/images/pedestre.jpg -------------------------------------------------------------------------------- /02_traffic_sign_detector/src/detect.cpp: -------------------------------------------------------------------------------- 1 | /* HOG DETECTOR 2 | * 3 | */ 4 | 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | #include 13 | #include 14 | #include 15 | 16 | using namespace std; 17 | using namespace dlib; 18 | 19 | struct TrafficSign { 20 | string name; 21 | string svm_path; 22 | rgb_pixel color; 23 | TrafficSign(string name, string svm_path, rgb_pixel color) : 24 | name(name), svm_path(svm_path), color(color) {}; 25 | }; 26 | 27 | int main(int argc, char** argv) { 28 | try { 29 | command_line_parser parser; 30 | 31 | parser.add_option("h","Display this help message."); 32 | parser.add_option("u", "Upsample each input image times. Each \ 33 | upsampling quadruples the number of pixels in the image \ 34 | (default: 0).", 1); 35 | parser.add_option("wait","Wait user input to show next image."); 36 | 37 | parser.parse(argc, argv); 38 | parser.check_option_arg_range("u", 0, 8); 39 | 40 | const char* one_time_opts[] = {"h","u","wait"}; 41 | parser.check_one_time_options(one_time_opts); 42 | 43 | // Display help message 44 | if (parser.option("h")) { 45 | cout << "Usage: " << argv[0] << " [options] " << endl; 46 | parser.print_options(); 47 | 48 | return EXIT_SUCCESS; 49 | } 50 | 51 | if (parser.number_of_arguments() == 0) { 52 | cout << "You must give a list of input files." << endl; 53 | cout << "\nTry the -h option for more information." 
<< endl; 54 | return EXIT_FAILURE; 55 | } 56 | 57 | const unsigned long upsample_amount = get_option(parser, "u", 0); 58 | 59 | dlib::array > images; 60 | 61 | images.resize(parser.number_of_arguments()); 62 | 63 | for (unsigned long i = 0; i < images.size(); ++i) { 64 | load_image(images[i], parser[i]); 65 | } 66 | 67 | for (unsigned long i = 0; i < upsample_amount; ++i) { 68 | for (unsigned long j = 0; j < images.size(); ++j) { 69 | pyramid_up(images[j]); 70 | } 71 | } 72 | 73 | typedef scan_fhog_pyramid > image_scanner_type; 74 | 75 | // Load SVM detectors 76 | std::vector signs; 77 | signs.push_back(TrafficSign("PARE", "svm_detectors/pare_detector.svm", 78 | rgb_pixel(255,0,0))); 79 | signs.push_back(TrafficSign("LOMBADA", "svm_detectors/lombada_detector.svm", 80 | rgb_pixel(255,122,0))); 81 | signs.push_back(TrafficSign("PEDESTRE", "svm_detectors/pedestre_detector.svm", 82 | rgb_pixel(255,255,0))); 83 | 84 | std::vector > detectors; 85 | 86 | for (int i = 0; i < signs.size(); i++) { 87 | object_detector detector; 88 | deserialize(signs[i].svm_path) >> detector; 89 | detectors.push_back(detector); 90 | } 91 | 92 | image_window win; 93 | std::vector rects; 94 | for (unsigned long i = 0; i < images.size(); ++i) { 95 | evaluate_detectors(detectors, images[i], rects); 96 | 97 | // Put the image and detections into the window. 98 | win.clear_overlay(); 99 | win.set_image(images[i]); 100 | 101 | for (unsigned long j = 0; j < rects.size(); ++j) { 102 | win.add_overlay(rects[j].rect, signs[rects[j].weight_index].color, 103 | signs[rects[j].weight_index].name); 104 | } 105 | 106 | if (parser.option("wait")) { 107 | cout << "Press any key to continue..."; 108 | cin.get(); 109 | } 110 | } 111 | } 112 | catch (exception& e) { 113 | cout << "\nexception thrown!" << endl; 114 | cout << e.what() << endl; 115 | } 116 | } 117 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/src/hog_detector.cpp: -------------------------------------------------------------------------------- 1 | /* HOG DETECTOR TRAINER 2 | * This program trains a fHOG detector. 3 | * For help, run ./hog_detector -h 4 | * 5 | * Sample usage: 6 | * ./hog_detector -u1 --filter 0.4 -v images/pare 7 | * 8 | * To better understand the code of this detector, read the following example codes: 9 | * http://dlib.net/fhog_object_detector_ex.cpp.html 10 | * http://dlib.net/train_object_detector.cpp.html 11 | */ 12 | 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | 20 | #include 21 | #include 22 | #include 23 | 24 | using namespace std; 25 | using namespace dlib; 26 | 27 | int main(int argc, char** argv) { 28 | try { 29 | command_line_parser parser; 30 | parser.add_option("h","Display this help message."); 31 | parser.add_option("c","Set the SVM C parameter to (default: 1.0).",1); 32 | parser.add_option("u", "Upsample each input image times. 
Each upsampling quadruples the number of pixels in the image (default: 0).", 1); 33 | parser.add_option("v","Be verbose."); 34 | parser.add_option("filter","Remove filters with singular value less than (default: disabled).", 1); 35 | parser.add_option("detector-name","Save SVM detector to (default: 'detector.svm').", 1); 36 | parser.add_option("threads", "Use threads for training (default: 4).",1); 37 | parser.add_option("eps", "Set SVM training epsilon to (default: 0.01).", 1); 38 | parser.add_option("norm", "If set, the nuclear norm regularization strength will be (default: disabled).", 1); 39 | 40 | // TODO: Variable window size 41 | #if 0 42 | parser.add_option("w","Set window size to x pixels (default: 80x80.", 2); 43 | #endif 44 | 45 | parser.parse(argc, argv); 46 | 47 | // Can't give an option more than once 48 | const char* one_time_opts[] = {"h","c","u","v","detector-name", "threads", "eps","filter","norm"}; 49 | parser.check_one_time_options(one_time_opts); 50 | 51 | // Check parameters values 52 | parser.check_option_arg_range("c", 1e-12, 1e12); 53 | parser.check_option_arg_range("u", 0, 8); 54 | parser.check_option_arg_range("threads", 1, 1000); 55 | parser.check_option_arg_range("eps", 1e-5, 1e4); 56 | parser.check_option_arg_range("filter", 0.0, 2.0); 57 | parser.check_option_arg_range("norm", 1e-12, 1e12); 58 | 59 | // Display help message 60 | if (parser.option("h")) { 61 | cout << "Usage: " << argv[0] << " [options] " << endl; 62 | cout << " must countain the files training.xml and testing.xml." << endl; 63 | parser.print_options(); 64 | 65 | return EXIT_SUCCESS; 66 | } 67 | 68 | if (parser.number_of_arguments() == 0) { 69 | cout << "You must give an image or an image dataset metadata XML file produced by the imglab tool." << endl; 70 | cout << "\nTry the -h option for more information." 
<< endl; 71 | return EXIT_FAILURE; 72 | } 73 | 74 | // Declarations and parameters 75 | const std::string dir = parser[0]; 76 | dlib::array > images_train, images_test; 77 | std::vector > sign_boxes_train, sign_boxes_test; 78 | const double c_val = get_option(parser, "c", 1.0); 79 | const unsigned long upsample_amount = get_option(parser, "u", 0); 80 | const std::string detector_name = get_option(parser, "detector-name", "detector.svm"); 81 | const int num_threads = get_option(parser, "threads", 4); 82 | const double eps = get_option(parser, "eps", 0.01); 83 | const double filter_val = get_option(parser, "filter", 0.0); 84 | const double norm = get_option(parser, "norm", 0.0); 85 | 86 | cout << "Training with the following parameters: " << endl; 87 | cout << " threads: "<< num_threads << endl; 88 | cout << " C: "<< c_val << endl; 89 | cout << " epsilon: "<< eps << endl; 90 | cout << " upsample this many times : "<< upsample_amount << endl; 91 | cout << " filter threshold : "<< filter_val << endl; 92 | cout << " NNR strenght : "<< norm << endl; 93 | 94 | // Load training and testing datasets 95 | load_image_dataset(images_train, sign_boxes_train, dir+"/training.xml"); 96 | load_image_dataset(images_test, sign_boxes_test, dir+"/testing.xml"); 97 | 98 | // Upsample images (set by -u parameters) 99 | for (unsigned long i = 0; i < upsample_amount; ++i) { 100 | upsample_image_dataset >(images_train, sign_boxes_train); 101 | upsample_image_dataset >(images_test, sign_boxes_test); 102 | } 103 | 104 | // Create fHOG scanner 105 | typedef scan_fhog_pyramid > image_scanner_type; 106 | image_scanner_type scanner; 107 | if (norm > 0.0) scanner.set_nuclear_norm_regularization_strength(norm); 108 | scanner.set_detection_window_size(80, 80); 109 | structural_object_detection_trainer trainer(scanner); 110 | trainer.set_num_threads(num_threads); // Number of working threads 111 | trainer.set_c(c_val); // SVM C-value 112 | if (parser.option("v")) trainer.be_verbose(); 113 | trainer.set_epsilon(eps); 114 | object_detector detector = trainer.train(images_train, sign_boxes_train); 115 | 116 | if (filter_val > 0.0) { 117 | int num_filters_before = num_separable_filters(detector); 118 | detector = threshold_filter_singular_values(detector,filter_val); 119 | cout << num_filters_before-num_separable_filters(detector) << " filters were removed." << endl; 120 | } 121 | 122 | // Test results on training and testing dataset 123 | cout << "training results: " << test_object_detection_function(detector, images_train, sign_boxes_train) << endl; 124 | cout << "testing results: " << test_object_detection_function(detector, images_test, sign_boxes_test) << endl; 125 | 126 | // Save detector to disk 127 | serialize(detector_name) << detector; 128 | } 129 | catch (exception& e) { 130 | cout << "\nexception thrown!" << endl; 131 | cout << e.what() << endl; 132 | } 133 | } 134 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/src/view_hog.cpp: -------------------------------------------------------------------------------- 1 | /* Visualise a fHOG detector. 2 | * This program takes a fHOG detector as input and displays it in a window. 
3 | * 4 | * Usage: 5 | * ./view_hog detector.svm 6 | */ 7 | 8 | #include 9 | #include 10 | #include 11 | 12 | #include 13 | #include 14 | #include 15 | 16 | using namespace std; 17 | using namespace dlib; 18 | 19 | int main(int argc, char** argv) { 20 | try { 21 | command_line_parser parser; 22 | parser.add_option("h","Display this help message."); 23 | 24 | parser.parse(argc, argv); 25 | const char* one_time_opts[] = {"h"}; 26 | parser.check_one_time_options(one_time_opts); 27 | 28 | // Display help message 29 | if (parser.option("h")) { 30 | cout << "Usage: " << argv[0] << " [options] " << endl; 31 | parser.print_options(); 32 | 33 | return EXIT_SUCCESS; 34 | } 35 | 36 | if (parser.number_of_arguments() == 0) { 37 | cout << "You must give a fHOG SVM file as input." << endl; 38 | cout << "\nTry the -h option for more information." << endl; 39 | return EXIT_FAILURE; 40 | } 41 | 42 | typedef scan_fhog_pyramid > image_scanner_type; 43 | object_detector detector; 44 | deserialize(argv[1]) >> detector; 45 | image_window hogwin(draw_fhog(detector), "Learned fHOG detector"); 46 | cout << "Press any key to exit!" << endl; 47 | cin.get(); // Wait input to exit 48 | } 49 | catch (exception& e) { 50 | cout << "\nexception thrown!" << endl; 51 | cout << e.what() << endl; 52 | } 53 | } 54 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/svm_detectors/lombada_detector.svm: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/svm_detectors/lombada_detector.svm -------------------------------------------------------------------------------- /02_traffic_sign_detector/svm_detectors/pare_detector.svm: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/svm_detectors/pare_detector.svm -------------------------------------------------------------------------------- /02_traffic_sign_detector/svm_detectors/pedestre_detector.svm: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/svm_detectors/pedestre_detector.svm -------------------------------------------------------------------------------- /03_opencv_detection/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | # Older versions of CMake are likely to work just fine but, since 2 | # I don't know where to cut off I just use the version I'm using 3 | cmake_minimum_required(VERSION "3.17") 4 | 5 | # name of this example project 6 | project(demo) 7 | 8 | # set OpenCV_DIR variable equal to the path to the cmake 9 | # files within the previously installed opencv program 10 | set(OpenCV_DIR /Users/ifding/Documents/Code/opencv/install/lib/cmake/opencv4) 11 | 12 | # Tell compiler to use C++ 17 features which is needed because 13 | # Clang version is often behind in the XCode installation 14 | set(CMAKE_CXX_STANDARD 17) 15 | 16 | # configure the necessary common CMake environment variables 17 | # needed to include and link the OpenCV program into this 18 | # demo project, namely OpenCV_INCLUDE_DIRS and OpenCV_LIBS 19 | find_package( OpenCV REQUIRED ) 20 | 21 | # tell the build to include the headers 
from OpenCV 22 | include_directories( ${OpenCV_INCLUDE_DIRS} ) 23 | 24 | # specify the executable target to be built 25 | add_executable(demo main.cpp) 26 | 27 | # tell it to link the executable target against OpenCV 28 | target_link_libraries(demo ${OpenCV_LIBS} ) -------------------------------------------------------------------------------- /03_opencv_detection/README.md: -------------------------------------------------------------------------------- 1 | 2 | # Object detection in OpenCV 3 | 4 | 5 | - Install OpenCV by following [here](https://thecodinginterface.com/blog/opencv-cpp-vscode/) 6 | 7 | - Download `frozen_inference_graph.pb` from [learnopencv](https://github.com/spmallick/learnopencv/tree/master/Deep-Learning-with-OpenCV-DNN-Module/input) 8 | 9 | - More models can be found in 10 | 11 | To run the code in C++: 12 | 13 | ```bash 14 | mkdir build 15 | cd build 16 | cmake .. 17 | cmake --build . --config Release 18 | cd .. 19 | ./build/demo 20 | ``` 21 | -------------------------------------------------------------------------------- /03_opencv_detection/input/object_detection_classes_coco.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | street sign 13 | stop sign 14 | parking meter 15 | bench 16 | bird 17 | cat 18 | dog 19 | horse 20 | sheep 21 | cow 22 | elephant 23 | bear 24 | zebra 25 | giraffe 26 | hat 27 | backpack 28 | umbrella 29 | shoe 30 | eye glasses 31 | handbag 32 | tie 33 | suitcase 34 | frisbee 35 | skis 36 | snowboard 37 | sports ball 38 | kite 39 | baseball bat 40 | baseball glove 41 | skateboard 42 | surfboard 43 | tennis racket 44 | bottle 45 | plate 46 | wine glass 47 | cup 48 | fork 49 | knife 50 | spoon 51 | bowl 52 | banana 53 | apple 54 | sandwich 55 | orange 56 | broccoli 57 | carrot 58 | hot dog 59 | pizza 60 | donut 61 | cake 62 | chair 63 | couch 64 | potted plant 65 | bed 66 | mirror 67 | dining table 68 | window 69 | desk 70 | toilet 71 | door 72 | tv 73 | laptop 74 | mouse 75 | remote 76 | keyboard 77 | cell phone 78 | microwave 79 | oven 80 | toaster 81 | sink 82 | refrigerator 83 | blender 84 | book 85 | clock 86 | vase 87 | scissors 88 | teddy bear 89 | hair drier 90 | toothbrush 91 | -------------------------------------------------------------------------------- /03_opencv_detection/input/video_1.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/03_opencv_detection/input/video_1.mp4 -------------------------------------------------------------------------------- /03_opencv_detection/main.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | 7 | #include 8 | #include 9 | #include 10 | 11 | 12 | using namespace cv; 13 | using namespace dnn; 14 | 15 | float confThreshold = 0.4, nmsThreshold = 0.4; 16 | std::vector classes; 17 | 18 | inline void preprocess(const Mat& frame, Net& net, Size inpSize, float scale, 19 | const Scalar& mean, bool swapRB); 20 | 21 | void postprocess(Mat& frame, const std::vector& out, Net& net); 22 | 23 | void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame); 24 | 25 | 26 | template 27 | class QueueFPS : public std::queue 28 | { 29 | public: 30 | QueueFPS() : counter(0) {} 31 
| 32 | void push(const T& entry) 33 | { 34 | std::lock_guard lock(mutex); 35 | 36 | std::queue::push(entry); 37 | counter += 1; 38 | if (counter == 1) 39 | { 40 | // Start counting from a second frame (warmup). 41 | tm.reset(); 42 | tm.start(); 43 | } 44 | } 45 | 46 | T get() 47 | { 48 | std::lock_guard lock(mutex); 49 | T entry = this->front(); 50 | this->pop(); 51 | return entry; 52 | } 53 | 54 | float getFPS() 55 | { 56 | tm.stop(); 57 | double fps = counter / tm.getTimeSec(); 58 | tm.start(); 59 | return static_cast(fps); 60 | } 61 | 62 | void clear() 63 | { 64 | std::lock_guard lock(mutex); 65 | while (!this->empty()) 66 | this->pop(); 67 | } 68 | 69 | unsigned int counter; 70 | 71 | private: 72 | TickMeter tm; 73 | std::mutex mutex; 74 | }; 75 | 76 | 77 | int main(int argc, char** argv) 78 | { 79 | 80 | float scale = 1.0; 81 | Scalar mean = Scalar(104, 177, 123); 82 | bool swapRB = false; 83 | int inpWidth = 300; 84 | int inpHeight = 300; 85 | size_t asyncNumReq = 0; 86 | 87 | // Open file with classes names. 88 | std::string file = "./input/object_detection_classes_coco.txt"; 89 | std::ifstream ifs(file.c_str()); 90 | if (!ifs.is_open()) 91 | CV_Error(Error::StsError, "File " + file + " not found"); 92 | std::string line; 93 | while (std::getline(ifs, line)) 94 | { 95 | classes.push_back(line); 96 | } 97 | 98 | std::string modelPath = "./input/frozen_inference_graph.pb"; 99 | std::string configPath = "./input/ssd_mobilenet_v2_coco_2018_03_29.pbtxt.txt"; 100 | std::string framework = "TensorFlow"; 101 | 102 | // Load a model. 103 | Net net = readNet(modelPath, configPath, framework); 104 | std::vector outNames = net.getUnconnectedOutLayersNames(); 105 | 106 | // Create a window 107 | static const std::string kWinName = "Object detection in OpenCV"; 108 | namedWindow(kWinName, WINDOW_NORMAL); 109 | 110 | // Open a video file or an image file or a camera stream. 
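    // The sample below is hard-wired to the bundled video file; a device index
    // (e.g. VideoCapture cap(0);) can be passed instead to read from a camera.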
111 | //VideoCapture cap; 112 | VideoCapture cap("./input/video_1.mp4"); 113 | 114 | bool process = true; 115 | 116 | // Frames capturing thread 117 | QueueFPS framesQueue; 118 | std::thread framesThread([&](){ 119 | Mat frame; 120 | while (process) 121 | { 122 | cap >> frame; 123 | if (!frame.empty()) 124 | framesQueue.push(frame.clone()); 125 | else 126 | break; 127 | } 128 | }); 129 | 130 | // Frames processing thread 131 | QueueFPS processedFramesQueue; 132 | QueueFPS > predictionsQueue; 133 | std::thread processingThread([&](){ 134 | std::queue futureOutputs; 135 | Mat blob; 136 | while (process) 137 | { 138 | // Get a next frame 139 | Mat frame; 140 | { 141 | if (!framesQueue.empty()) 142 | { 143 | frame = framesQueue.get(); 144 | if (asyncNumReq) 145 | { 146 | if (futureOutputs.size() == asyncNumReq) 147 | frame = Mat(); 148 | } 149 | else 150 | framesQueue.clear(); // Skip the rest of frames 151 | } 152 | } 153 | 154 | // Process the frame 155 | if (!frame.empty()) 156 | { 157 | preprocess(frame, net, Size(inpWidth, inpHeight), scale, mean, swapRB); 158 | processedFramesQueue.push(frame); 159 | 160 | if (asyncNumReq) 161 | { 162 | futureOutputs.push(net.forwardAsync()); 163 | } 164 | else 165 | { 166 | std::vector outs; 167 | net.forward(outs, outNames); 168 | predictionsQueue.push(outs); 169 | } 170 | } 171 | 172 | while (!futureOutputs.empty() && 173 | futureOutputs.front().wait_for(std::chrono::seconds(0))) 174 | { 175 | AsyncArray async_out = futureOutputs.front(); 176 | futureOutputs.pop(); 177 | Mat out; 178 | async_out.get(out); 179 | predictionsQueue.push({out}); 180 | } 181 | } 182 | }); 183 | 184 | // Postprocessing and rendering loop 185 | while (waitKey(1) < 0) 186 | { 187 | if (predictionsQueue.empty()) 188 | continue; 189 | 190 | std::vector outs = predictionsQueue.get(); 191 | Mat frame = processedFramesQueue.get(); 192 | 193 | postprocess(frame, outs, net); 194 | 195 | if (predictionsQueue.counter > 1) 196 | { 197 | std::string label = format("Camera: %.2f FPS", framesQueue.getFPS()); 198 | putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0)); 199 | 200 | label = format("Network: %.2f FPS", predictionsQueue.getFPS()); 201 | putText(frame, label, Point(0, 30), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0)); 202 | 203 | label = format("Skipped frames: %d", framesQueue.counter - predictionsQueue.counter); 204 | putText(frame, label, Point(0, 45), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0)); 205 | } 206 | imshow(kWinName, frame); 207 | } 208 | 209 | process = false; 210 | framesThread.join(); 211 | processingThread.join(); 212 | 213 | return 0; 214 | } 215 | 216 | inline void preprocess(const Mat& frame, Net& net, Size inpSize, float scale, 217 | const Scalar& mean, bool swapRB) 218 | { 219 | static Mat blob; 220 | // Create a 4D blob from a frame. 221 | if (inpSize.width <= 0) inpSize.width = frame.cols; 222 | if (inpSize.height <= 0) inpSize.height = frame.rows; 223 | blobFromImage(frame, blob, 1.0, inpSize, Scalar(), swapRB, false, CV_8U); 224 | 225 | // Run a model. 
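    // Note: blobFromImage above produced a CV_8U blob with no scaling or mean subtraction,
    // so the scale factor and mean are applied here by setInput instead.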
226 | net.setInput(blob, "", scale, mean); 227 | if (net.getLayer(0)->outputNameToIndex("im_info") != -1) // Faster-RCNN or R-FCN 228 | { 229 | resize(frame, frame, inpSize); 230 | Mat imInfo = (Mat_(1, 3) << inpSize.height, inpSize.width, 1.6f); 231 | net.setInput(imInfo, "im_info"); 232 | } 233 | } 234 | 235 | void postprocess(Mat& frame, const std::vector& outs, Net& net) 236 | { 237 | static std::vector outLayers = net.getUnconnectedOutLayers(); 238 | 239 | std::vector classIds; 240 | std::vector confidences; 241 | std::vector boxes; 242 | 243 | // Network produces output blob with a shape 1x1xNx7 where N is a number of 244 | // detections and an every detection is a vector of values 245 | // [batchId, classId, confidence, left, top, right, bottom] 246 | CV_Assert(outs.size() > 0); 247 | for (size_t k = 0; k < outs.size(); k++) 248 | { 249 | float* data = (float*)outs[k].data; 250 | for (size_t i = 0; i < outs[k].total(); i += 7) 251 | { 252 | float confidence = data[i + 2]; 253 | if (confidence > confThreshold) 254 | { 255 | int left = (int)data[i + 3]; 256 | int top = (int)data[i + 4]; 257 | int right = (int)data[i + 5]; 258 | int bottom = (int)data[i + 6]; 259 | int width = right - left + 1; 260 | int height = bottom - top + 1; 261 | if (width <= 2 || height <= 2) 262 | { 263 | left = (int)(data[i + 3] * frame.cols); 264 | top = (int)(data[i + 4] * frame.rows); 265 | right = (int)(data[i + 5] * frame.cols); 266 | bottom = (int)(data[i + 6] * frame.rows); 267 | width = right - left + 1; 268 | height = bottom - top + 1; 269 | } 270 | classIds.push_back((int)(data[i + 1]) - 1); // Skip 0th background class id. 271 | boxes.push_back(Rect(left, top, width, height)); 272 | confidences.push_back(confidence); 273 | } 274 | } 275 | } 276 | 277 | // NMS is used inside Region layer only on DNN_BACKEND_OPENCV for another backends we need NMS in sample 278 | // or NMS is required if number of outputs > 1 279 | if (outLayers.size() > 1) 280 | { 281 | std::map > class2indices; 282 | for (size_t i = 0; i < classIds.size(); i++) 283 | { 284 | if (confidences[i] >= confThreshold) 285 | { 286 | class2indices[classIds[i]].push_back(i); 287 | } 288 | } 289 | std::vector nmsBoxes; 290 | std::vector nmsConfidences; 291 | std::vector nmsClassIds; 292 | for (auto it = class2indices.begin(); it != class2indices.end(); ++it) 293 | { 294 | std::vector localBoxes; 295 | std::vector localConfidences; 296 | std::vector classIndices = it->second; 297 | for (size_t i = 0; i < classIndices.size(); i++) 298 | { 299 | localBoxes.push_back(boxes[classIndices[i]]); 300 | localConfidences.push_back(confidences[classIndices[i]]); 301 | } 302 | std::vector nmsIndices; 303 | NMSBoxes(localBoxes, localConfidences, confThreshold, nmsThreshold, nmsIndices); 304 | for (size_t i = 0; i < nmsIndices.size(); i++) 305 | { 306 | size_t idx = nmsIndices[i]; 307 | nmsBoxes.push_back(localBoxes[idx]); 308 | nmsConfidences.push_back(localConfidences[idx]); 309 | nmsClassIds.push_back(it->first); 310 | } 311 | } 312 | boxes = nmsBoxes; 313 | classIds = nmsClassIds; 314 | confidences = nmsConfidences; 315 | } 316 | 317 | for (size_t idx = 0; idx < boxes.size(); ++idx) 318 | { 319 | Rect box = boxes[idx]; 320 | drawPred(classIds[idx], confidences[idx], box.x, box.y, 321 | box.x + box.width, box.y + box.height, frame); 322 | } 323 | } 324 | 325 | void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame) 326 | { 327 | rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0)); 328 | 329 | 
std::string label = format("%.2f", conf); 330 | if (!classes.empty()) 331 | { 332 | CV_Assert(classId < (int)classes.size()); 333 | label = classes[classId] + ": " + label; 334 | } 335 | 336 | int baseLine; 337 | Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine); 338 | 339 | top = max(top, labelSize.height); 340 | rectangle(frame, Point(left, top - labelSize.height), 341 | Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED); 342 | putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.5, Scalar()); 343 | } 344 | -------------------------------------------------------------------------------- /04_vehicle_detection/README.md: -------------------------------------------------------------------------------- 1 | # Vehicle Detection Project 2 | 3 | 4 | ### Abstract 5 | 6 | The goal of the project was to develop a pipeline to reliably detect cars given a video from a roof-mounted camera: in this readme the reader will find a short summary of how I tackled the problem. 7 | 8 | **Long story short**: 9 | - (baseline) HOG features + linear SVM to detect cars, temporal smoothing to discard false positive 10 | - (submission) [SSD deep network](https://arxiv.org/pdf/1512.02325.pdf) for detection, thresholds on detection confidence and label to discard false positive 11 | 12 | *That said, let's go into details!* 13 | 14 | ### Good old CV: Histogram of Oriented Gradients (HOG) 15 | 16 | #### 1. Feature Extraction. 17 | 18 | In the field of computer vision, a *features* is a compact representation that encodes information that is relevant for a given task. In our case, features must be informative enough to distinguish between *car* and *non-car* image patches as accurately as possible. 19 | 20 | Here is an example of how the `vehicle` and `non-vehicle` classes look like in this dataset: 21 | 22 |

23 | ![non_car_img](img/noncar_samples.png) 24 | 25 | *Randomly-sampled non-car patches.* 26 | 27 | 28 | ![car_img](img/car_samples.png) 29 | 30 | *Randomly-sampled car patches.*
31 | 32 | The most of the code that relates to feature extraction is contained in [`functions_feat_extraction.py`](functions_feat_extraction.py). Nonetheless, all parameters used in the phase of feature extraction are stored as dictionary in [`config.py`](config.py), in order to be able to access them from anywhere in the project. 33 | 34 | Actual feature extraction is performed by the function `image_to_features`, which takes as input an image and the dictionary of parameters, and returns the features computed for that image. In order to perform batch feature extraction on the whole dataset (for training), `extract_features_from_file_list` takes as input a list of images and return a list of feature vectors, one for each input image. 35 | 36 | For the task of car detection I used *color histograms* and *spatial features* to encode the object visual appearence and HOG features to encode the object's *shape*. While color the first two features are easy to understand and implement, HOG features can be a little bit trickier to master. 37 | 38 | #### 2. Choosing HOG parameters. 39 | 40 | HOG stands for *Histogram of Oriented Gradients* and refer to a powerful descriptor that has met with a wide success in the computer vision community, since its [introduction](http://vc.cs.nthu.edu.tw/home/paper/codfiles/hkchiu/201205170946/Histograms%20of%20Oriented%20Gradients%20for%20Human%20Detection.pdf) in 2005 with the main purpose of people detection. 41 | 42 |
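As a concrete illustration, the snippet below sketches how a HOG feature vector can be computed for a single 64x64 patch with scikit-image's `hog` (the helper name and the test image used here are only illustrative; the parameter values mirror those listed in [`config.py`](config.py)):
```
import cv2
import numpy as np
from skimage.feature import hog

def hog_single_channel(channel, orient=9, pix_per_cell=8, cell_per_block=2):
    """HOG descriptor of one image channel, returned as a flat feature vector."""
    return hog(channel,
               orientations=orient,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               feature_vector=True)

# Any 64x64 patch works; here a test image is simply resized for illustration.
patch = cv2.resize(cv2.imread('test_images/test1.jpg'), (64, 64))
patch = cv2.cvtColor(patch, cv2.COLOR_BGR2YCrCb)   # color space used in this project
hog_vector = np.concatenate([hog_single_channel(patch[:, :, c]) for c in range(3)])
print(hog_vector.shape)   # (5292,) = 3 channels x 7x7 blocks x 2x2 cells x 9 orientations
```
Concatenating color histograms and spatially binned raw pixels to this vector yields the final descriptor used for classification.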

43 | ![hog](img/hog_car_vs_noncar.jpg) 44 | 45 | *Representation of HOG descriptors for a car patch (left) and a non-car patch (right).*
46 | 47 | The bad news is, HOG come along with a *lot* of parameters to tune in order to work properly. The main parameters are the size of the cell in which the gradients are accumulated, as well as the number of orientations used to discretize the histogram of gradients. Furthermore, one must specify the number of cells that compose a block, on which later a feature normalization will be performed. Finally, being the HOG computed on a single-channel image, arises the need of deciding which channel to use, eventually computing the feature on all channels then concatenating the result. 48 | 49 | In order to select the right parameters, both the classifier accuracy and computational efficiency are to consider. After various attemps, I came up to the following parameters that are stored in [`config.py`](config.py): 50 | ``` 51 | # parameters used in the phase of feature extraction 52 | feat_extraction_params = {'resize_h': 64, # resize image height before feat extraction 53 | 'resize_w': 64, # resize image height before feat extraction 54 | 'color_space': 'YCrCb', # Can be RGB, HSV, LUV, HLS, YUV, YCrCb 55 | 'orient': 9, # HOG orientations 56 | 'pix_per_cell': 8, # HOG pixels per cell 57 | 'cell_per_block': 2, # HOG cells per block 58 | 'hog_channel': "ALL", # Can be 0, 1, 2, or "ALL" 59 | 'spatial_size': (32, 32), # Spatial binning dimensions 60 | 'hist_bins': 16, # Number of histogram bins 61 | 'spatial_feat': True, # Spatial features on or off 62 | 'hist_feat': True, # Histogram features on or off 63 | 'hog_feat': True} # HOG features on or off 64 | ``` 65 | 66 | #### 3. Training the classifier 67 | 68 | Once decided which features to used, we can train a classifier on these. In [`train.py`](train.py) I train a linear SVM for task of binary classification *car* vs *non-car*. First, training data are listed a feature vector is extracted for each image: 69 | ``` 70 | cars = get_file_list_recursively(root_data_vehicle) 71 | notcars = get_file_list_recursively(root_data_non_vehicle) 72 | 73 | car_features = extract_features_from_file_list(cars, feat_extraction_params) 74 | notcar_features = extract_features_from_file_list(notcars, feat_extraction_params) 75 | ``` 76 | Then, the actual training set is composed as the set of all car and all non-car features (labels are given accordingly). Furthermore, feature vectors are standardize in order to have all the features in a similar range and ease training. 77 | ``` 78 | feature_scaler = StandardScaler().fit(X) # per-column scaler 79 | scaled_X = feature_scaler.transform(X) 80 | ``` 81 | Now, training the LinearSVM classifier is as easy as: 82 | ``` 83 | svc = LinearSVC() # svc = SVC(kernel='rbf') 84 | svc.fit(X_train, y_train) 85 | ``` 86 | In order to have an idea of the classifier performance, we can make a prediction on the test set with `svc.score(X_test, y_test)`. Training the SVM with the features explained above took around 10 minutes on my laptop. 87 | 88 | ### Sliding Window Search 89 | 90 | #### 1. Describe how (and identify where in your code) you implemented a sliding window search. How did you decide what scales to search and how much to overlap windows? 91 | 92 | In a first phase, I implemented a naive sliding window approach in order to get windows at different scales for the purpose of classification. This is shown in function `compute_windows_multiscale` in [`functions_detection.py`](functions_detection.py). This turned out to be very slow. 
I ultimately implemented a function that jointly searches the region of interest and classifies each window, as suggested by the course instructor. The performance boost comes from the fact that HOG features are computed only once for the whole region of interest and then subsampled at different scales, which gives the same effect as a multiscale search but in a much more computationally efficient way. This function is called `find_cars` and is implemented in [`functions_feat_extraction.py`](functions_feat_extraction.py). Of course the *tradeoff* is evident: the more scales are searched and the more adjacent windows overlap, the slower the search becomes. 93 | 94 | #### 2. Show some examples of test images to demonstrate how your pipeline is working. What did you do to optimize the performance of your classifier? 95 | 96 | The whole classification pipeline using the CV approach is implemented in [`main_hog.py`](main_hog.py). Each test image goes through the `process_pipeline` function, which is responsible for all phases: feature extraction, classification and showing the results. 97 | 98 |
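At its core, classifying one candidate window takes only a few lines. The sketch below is a simplified view of that step, assuming the trained `svc`, the fitted `feature_scaler`, the `image_to_features` helper and the `feat_extraction_params` dictionary introduced above (`frame` and `windows` stand for a test image and its candidate windows):
```
import cv2
import numpy as np

hot_windows = []
for (x_min, y_min), (x_max, y_max) in windows:
    patch = cv2.resize(frame[y_min:y_max, x_min:x_max], (64, 64))
    feats = np.array(image_to_features(patch, feat_extraction_params)).reshape(1, -1)
    feats = feature_scaler.transform(feats)          # same scaling used at training time
    if svc.predict(feats)[0] == 1:                   # assuming label 1 was assigned to cars
        hot_windows.append(((x_min, y_min), (x_max, y_max)))
```
The windows collected in `hot_windows` are then accumulated into the heatmap used to reject false positives (see the thumbnails in the figure below).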

99 | ![hog_pipeline](img/pipeline_hog.jpg) 100 | 101 | *Result of the HOG pipeline on one of the test images.*
102 | 103 | In order to optimize the performance of the classifier, I started the training with different configuration of the parameters, and kept the best one. Performing detection at different scales also helped a lot, even if exceeding in this direction can lead to very long computational time for a single image. At the end of this pipeline, the whole processing, from image reading to writing the ouput blend, took about 0.5 second per frame. 104 | 105 | ### Computer Vision on Steroids, a.k.a. Deep Learning 106 | 107 | #### 1. SSD (*Single Shot Multi-Box Detector*) network 108 | 109 | In order to solve the aforementioned problems, I decided to use a deep network to perform the detection, thus replacing the HOG+SVM pipeline. For this task employed the recently proposed [SSD deep network](https://arxiv.org/pdf/1512.02325.pdf) for detection. This paved the way for several huge advantages: 110 | - the network performs detection and classification in a single pass, and natively goes in GPU (*is fast*) 111 | - there is no more need to tune and validate hundreds of parameters related to the phase of feature extraction (*is robust*) 112 | - being the "car" class in very common, various pretrained models are available in different frameworks (Keras, Tensorflow etc.) that are already able to nicely distinguish this class of objects (*no need to retrain*) 113 | - the network outputs a confidence level along with the coordinates of the bounding box, so we can decide the tradeoff precision and recall just by tuning the confidence level we want (*less false positive*) 114 | 115 | The whole pipeline has been adapted to the make use of SSD network in file [`main_ssd.py`](main_ssd.py). 116 | 117 | ### Video Implementation 118 | 119 | #### 1. Provide a link to your final video output. Your pipeline should perform reasonably well on the entire project video (somewhat wobbly or unstable bounding boxes are ok as long as you are identifying the vehicles most of the time with minimal false positives.) 120 | Here's a [link to my video result](https://www.youtube.com/watch?v=Cd7p5pnP3e0) 121 | 122 | 123 | #### 2. Describe how (and identify where in your code) you implemented some kind of filter for false positives and some method for combining overlapping bounding boxes. 124 | 125 | In a first phase while I was still using HOG+SVM, I implemented a heatmap to average detection results from successive frames. The heatmap was thresholded to a minimum value before labeling regions, so to remove the major part of false positive. This process in shown in the thumbnails on the left of the previous figure. 126 | 127 | When I turned to deep learning, as mentioned before I could rely on a *confidence score* to decide the tradeoff between precision and recall. The following figure shows the effect of thresholding SSD detection at different level of confidence. 128 | 129 | 130 | 131 | 137 | 143 | 144 |
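In code, this filtering amounts to nothing more than comparing each detection's confidence (and class label) against a threshold. A minimal sketch, assuming the detections arrive in a list called `detections` as `(label, confidence, x_min, y_min, x_max, y_max)` tuples (the exact output format depends on the SSD implementation used):
```
min_confidence = 0.50     # raise to favour precision, lower to favour recall
keep = [(x_min, y_min, x_max, y_max)
        for label, confidence, x_min, y_min, x_max, y_max in detections
        if label == 'car' and confidence >= min_confidence]
```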
132 | ![low_confidence](img/confidence_001.png) 133 | 134 | *SSD Network result setting minimum confidence = 0.01* 135 | 136 | 137 | ![high_confidence](img/confidence_050.png) 138 | 139 | *SSD Network result setting minimum confidence = 0.50*
145 | 146 | Actually, while using SSD network for detection for the project video I found that integrating detections over time was not only useless, but even detrimental for performance. Indeed, being detections very precide and false positive almost zero, there was no need anymore to carry on information from previous detections. 147 | 148 | --- 149 | 150 | ### Discussion 151 | 152 | #### 1. Briefly discuss any problems / issues you faced in your implementation of this project. Where will your pipeline likely fail? What could you do to make it more robust? 153 | 154 | In the first phase, the HOG+SVM approach turned out to be slightly frustrating, in that strongly relied on the parameter chosed to perform feature extraction, training and detection. Even if I found a set of parameters that more or less worked for the project video, I wasn't satisfied of the result, because parameters were so finely tuned on the project video that certainly were not robust to different situations. 155 | 156 | For this reason, I turned to deep learning, and I leveraged on an existing detection network (pretrained on Pascal VOC classes) to tackle the problem. From that moment, the sun shone again on this assignment! :-) 157 | 158 | ### Acknowledgments 159 | 160 | Implementation of [Single Shot MultiBox Detector](https://arxiv.org/pdf/1512.02325.pdf) was borrowed from [this repo](https://github.com/rykov8/ssd_keras) and then slightly modified for my purpose. Thank you [rykov8](https://github.com/rykov8) for porting this amazing network in Keras-Tensorflow! 161 | -------------------------------------------------------------------------------- /04_vehicle_detection/config.py: -------------------------------------------------------------------------------- 1 | # root directory that contain all vehicle images in nested subdirectories 2 | root_data_vehicle = '../../../NANODEGREE/term_1/project_5_vehicle_detection/vehicles' 3 | 4 | # root directory that contain all NON-vehicle images in nested subdirectories 5 | root_data_non_vehicle = '../../../NANODEGREE/term_1/project_5_vehicle_detection/non-vehicles' 6 | 7 | # parameters used in the phase of feature extraction 8 | feat_extraction_params = {'resize_h': 64, # resize image height before feat extraction 9 | 'resize_w': 64, # resize image height before feat extraction 10 | 'color_space': 'YCrCb', # Can be RGB, HSV, LUV, HLS, YUV, YCrCb 11 | 'orient': 9, # HOG orientations 12 | 'pix_per_cell': 8, # HOG pixels per cell 13 | 'cell_per_block': 2, # HOG cells per block 14 | 'hog_channel': "ALL", # Can be 0, 1, 2, or "ALL" 15 | 'spatial_size': (32, 32), # Spatial binning dimensions 16 | 'hist_bins': 16, # Number of histogram bins 17 | 'spatial_feat': True, # Spatial features on or off 18 | 'hist_feat': True, # Histogram features on or off 19 | 'hog_feat': True} # HOG features on or off 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /04_vehicle_detection/data/feat_extraction_params.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/data/feat_extraction_params.pickle -------------------------------------------------------------------------------- /04_vehicle_detection/data/feature_scaler.pickle: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/data/feature_scaler.pickle -------------------------------------------------------------------------------- /04_vehicle_detection/data/svm_trained.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/data/svm_trained.pickle -------------------------------------------------------------------------------- /04_vehicle_detection/functions_detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | 5 | from functions_feat_extraction import image_to_features 6 | from project_5_utils import stitch_together 7 | 8 | 9 | def draw_labeled_bounding_boxes(img, labeled_frame, num_objects): 10 | """ 11 | Starting from labeled regions, draw enclosing rectangles in the original color frame. 12 | """ 13 | # Iterate through all detected cars 14 | for car_number in range(1, num_objects + 1): 15 | # Find pixels with each car_number label value 16 | rows, cols = np.where(labeled_frame == car_number) 17 | 18 | # Find minimum enclosing rectangle 19 | x_min, y_min = np.min(cols), np.min(rows) 20 | x_max, y_max = np.max(cols), np.max(rows) 21 | 22 | cv2.rectangle(img, (x_min, y_min), (x_max, y_max), color=(255, 0, 0), thickness=6) 23 | 24 | return img 25 | 26 | 27 | def compute_heatmap_from_detections(frame, hot_windows, threshold=5, verbose=False): 28 | """ 29 | Compute heatmaps from windows classified as positive, in order to filter false positives. 30 | """ 31 | h, w, c = frame.shape 32 | 33 | heatmap = np.zeros(shape=(h, w), dtype=np.uint8) 34 | 35 | for bbox in hot_windows: 36 | # for each bounding box, add heat to the corresponding rectangle in the image 37 | x_min, y_min = bbox[0] 38 | x_max, y_max = bbox[1] 39 | heatmap[y_min:y_max, x_min:x_max] += 1 # add heat 40 | 41 | # apply threshold + morphological closure to remove noise 42 | _, heatmap_thresh = cv2.threshold(heatmap, threshold, 255, type=cv2.THRESH_BINARY) 43 | heatmap_thresh = cv2.morphologyEx(heatmap_thresh, op=cv2.MORPH_CLOSE, 44 | kernel=cv2.getStructuringElement(cv2.MORPH_ELLIPSE, 45 | (13, 13)), iterations=1) 46 | if verbose: 47 | f, ax = plt.subplots(1, 3) 48 | ax[0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)) 49 | ax[1].imshow(heatmap, cmap='hot') 50 | ax[2].imshow(heatmap_thresh, cmap='hot') 51 | plt.show() 52 | 53 | return heatmap, heatmap_thresh 54 | 55 | 56 | def compute_windows_multiscale(image, verbose=False): 57 | """ 58 | Naive implementation of multiscale window search. 
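Three window sizes (32, 64 and 128 pixels) are searched over different horizontal bands of the frame: the smaller windows are restricted to a narrow band around the horizon, while the larger ones extend towards the bottom of the image.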
59 | """ 60 | h, w, c = image.shape 61 | 62 | windows_multiscale = [] 63 | 64 | windows_32 = slide_window(image, x_start_stop=[None, None], 65 | y_start_stop=[4 * h // 8, 5 * h // 8], 66 | xy_window=(32, 32), xy_overlap=(0.8, 0.8)) 67 | windows_multiscale.append(windows_32) 68 | 69 | windows_64 = slide_window(image, x_start_stop=[None, None], 70 | y_start_stop=[4 * h // 8, 6 * h // 8], 71 | xy_window=(64, 64), xy_overlap=(0.8, 0.8)) 72 | windows_multiscale.append(windows_64) 73 | 74 | windows_128 = slide_window(image, x_start_stop=[None, None], y_start_stop=[3 * h // 8, h], 75 | xy_window=(128, 128), xy_overlap=(0.8, 0.8)) 76 | windows_multiscale.append(windows_128) 77 | 78 | if verbose: 79 | windows_img_32 = draw_boxes(image, windows_32, color=(0, 0, 255), thick=1) 80 | windows_img_64 = draw_boxes(image, windows_64, color=(0, 255, 0), thick=1) 81 | windows_img_128 = draw_boxes(image, windows_128, color=(255, 0, 0), thick=1) 82 | 83 | stitching = stitch_together([windows_img_32, windows_img_64, windows_img_128], (1, 3), 84 | resize_dim=(1300, 500)) 85 | cv2.imshow('', stitching) 86 | cv2.waitKey() 87 | 88 | return np.concatenate(windows_multiscale) 89 | 90 | 91 | def slide_window(img, x_start_stop=[None, None], y_start_stop=[None, None], 92 | xy_window=(64, 64), xy_overlap=(0.5, 0.5)): 93 | """ 94 | Implementation of a sliding window in a region of interest of the image. 95 | """ 96 | # If x and/or y start/stop positions not defined, set to image size 97 | if x_start_stop[0] is None: 98 | x_start_stop[0] = 0 99 | if x_start_stop[1] is None: 100 | x_start_stop[1] = img.shape[1] 101 | if y_start_stop[0] is None: 102 | y_start_stop[0] = 0 103 | if y_start_stop[1] is None: 104 | y_start_stop[1] = img.shape[0] 105 | 106 | # Compute the span of the region to be searched 107 | x_span = x_start_stop[1] - x_start_stop[0] 108 | y_span = y_start_stop[1] - y_start_stop[0] 109 | 110 | # Compute the number of pixels per step in x/y 111 | n_x_pix_per_step = np.int(xy_window[0] * (1 - xy_overlap[0])) 112 | n_y_pix_per_step = np.int(xy_window[1] * (1 - xy_overlap[1])) 113 | 114 | # Compute the number of windows in x / y 115 | n_x_windows = np.int(x_span / n_x_pix_per_step) - 1 116 | n_y_windows = np.int(y_span / n_y_pix_per_step) - 1 117 | 118 | # Initialize a list to append window positions to 119 | window_list = [] 120 | 121 | # Loop through finding x and y window positions. 122 | for i in range(n_y_windows): 123 | for j in range(n_x_windows): 124 | # Calculate window position 125 | start_x = j * n_x_pix_per_step + x_start_stop[0] 126 | end_x = start_x + xy_window[0] 127 | start_y = i * n_y_pix_per_step + y_start_stop[0] 128 | end_y = start_y + xy_window[1] 129 | 130 | # Append window position to list 131 | window_list.append(((start_x, start_y), (end_x, end_y))) 132 | 133 | # Return the list of windows 134 | return window_list 135 | 136 | 137 | def draw_boxes(img, bbox_list, color=(0, 0, 255), thick=6): 138 | """ 139 | Draw all bounding boxes in `bbox_list` onto a given image. 
140 | :param img: input image 141 | :param bbox_list: list of bounding boxes 142 | :param color: color used for drawing boxes 143 | :param thick: thickness of the box line 144 | :return: a new image with the bounding boxes drawn 145 | """ 146 | # Make a copy of the image 147 | img_copy = np.copy(img) 148 | 149 | # Iterate through the bounding boxes 150 | for bbox in bbox_list: 151 | # Draw a rectangle given bbox coordinates 152 | tl_corner = tuple(bbox[0]) 153 | br_corner = tuple(bbox[1]) 154 | cv2.rectangle(img_copy, tl_corner, br_corner, color, thick) 155 | 156 | # Return the image copy with boxes drawn 157 | return img_copy 158 | 159 | 160 | # Define a function you will pass an image and the list of windows to be searched (output of slide_windows()) 161 | def search_windows(img, windows, clf, scaler, feat_extraction_params): 162 | hot_windows = [] # list to receive positive detection windows 163 | 164 | for window in windows: 165 | # Extract the current window from original image 166 | resize_h, resize_w = feat_extraction_params['resize_h'], feat_extraction_params['resize_w'] 167 | test_img = cv2.resize(img[window[0][1]:window[1][1], window[0][0]:window[1][0]], 168 | (resize_w, resize_h)) 169 | 170 | # Extract features for that window using single_img_features() 171 | features = image_to_features(test_img, feat_extraction_params) 172 | 173 | # Scale extracted features to be fed to classifier 174 | test_features = scaler.transform(np.array(features).reshape(1, -1)) 175 | 176 | # Predict on rescaled features 177 | prediction = clf.predict(test_features) 178 | 179 | # If positive (prediction == 1) then save the window 180 | if prediction == 1: 181 | hot_windows.append(window) 182 | 183 | # Return windows for positive detections 184 | return hot_windows 185 | -------------------------------------------------------------------------------- /04_vehicle_detection/functions_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def normalize_image(img): 5 | """ 6 | Normalize image between 0 and 255 and cast to uint8 7 | (useful for visualization) 8 | """ 9 | img = np.float32(img) 10 | 11 | img = img / img.max() * 255 12 | 13 | return np.uint8(img) -------------------------------------------------------------------------------- /04_vehicle_detection/img/car_samples.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/car_samples.png -------------------------------------------------------------------------------- /04_vehicle_detection/img/confidence_001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/confidence_001.png -------------------------------------------------------------------------------- /04_vehicle_detection/img/confidence_050.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/confidence_050.png -------------------------------------------------------------------------------- /04_vehicle_detection/img/hog_car_vs_noncar.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/hog_car_vs_noncar.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/img/noncar_samples.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/noncar_samples.png -------------------------------------------------------------------------------- /04_vehicle_detection/img/pipeline_hog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/pipeline_hog.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/main_hog.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import pickle 4 | from functions_detection import * 5 | import scipy 6 | from functions_utils import normalize_image 7 | from functions_feat_extraction import find_cars 8 | import time 9 | import collections 10 | 11 | time_window = 5 12 | hot_windows_history = collections.deque(maxlen=time_window) 13 | 14 | # load pretrained svm classifier 15 | svc = pickle.load(open('data/svm_trained.pickle', 'rb')) 16 | 17 | # load feature scaler fitted on training data 18 | feature_scaler = pickle.load(open('data/feature_scaler.pickle', 'rb')) 19 | 20 | # load parameters used to perform feature extraction 21 | feat_extraction_params = pickle.load(open('data/feat_extraction_params.pickle', 'rb')) 22 | 23 | 24 | def prepare_output_blend(frame, img_hot_windows, img_heatmap, img_labeling, img_detection): 25 | 26 | h, w, c = frame.shape 27 | 28 | # decide the size of thumbnail images 29 | thumb_ratio = 0.25 30 | thumb_h, thumb_w = int(thumb_ratio * h), int(thumb_ratio * w) 31 | 32 | # resize to thumbnails images from various stages of the pipeline 33 | thumb_hot_windows = cv2.resize(img_hot_windows, dsize=(thumb_w, thumb_h)) 34 | thumb_heatmap = cv2.resize(img_heatmap, dsize=(thumb_w, thumb_h)) 35 | thumb_labeling = cv2.resize(img_labeling, dsize=(thumb_w, thumb_h)) 36 | 37 | off_x, off_y = 20, 45 38 | 39 | # add a semi-transparent rectangle to highlight thumbnails on the left 40 | mask = cv2.rectangle(img_detection.copy(), (0, 0), (2*off_x + thumb_w, h), (0, 0, 0), thickness=cv2.FILLED) 41 | img_blend = cv2.addWeighted(src1=mask, alpha=0.2, src2=img_detection, beta=0.8, gamma=0) 42 | 43 | # stitch thumbnails 44 | img_blend[off_y:off_y+thumb_h, off_x:off_x+thumb_w, :] = thumb_hot_windows 45 | img_blend[2*off_y+thumb_h:2*(off_y+thumb_h), off_x:off_x+thumb_w, :] = thumb_heatmap 46 | img_blend[3*off_y+2*thumb_h:3*(off_y+thumb_h), off_x:off_x+thumb_w, :] = thumb_labeling 47 | 48 | return img_blend 49 | 50 | 51 | def process_pipeline(frame, svc, feature_scaler, feat_extraction_params, keep_state=True, verbose=False): 52 | 53 | hot_windows = [] 54 | 55 | for subsample in np.arange(1, 3, 0.5): 56 | hot_windows += find_cars(frame, 400, 600, subsample, svc, feature_scaler, feat_extraction_params) 57 | 58 | if keep_state: 59 | if hot_windows: 60 | hot_windows_history.append(hot_windows) 61 | hot_windows = np.concatenate(hot_windows_history) 62 | 63 | # compute heatmaps positive windows found 64 | thresh = (time_window - 1) if keep_state else 0 
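# NOTE: when keep_state is True, hot windows from the last `time_window` frames are accumulated, so a pixel must be covered by more than `time_window - 1` positive windows to survive thresholding; this suppresses transient false positives.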
65 | heatmap, heatmap_thresh = compute_heatmap_from_detections(frame, hot_windows, threshold=thresh, verbose=False) 66 | 67 | # label connected components 68 | labeled_frame, num_objects = scipy.ndimage.measurements.label(heatmap_thresh) 69 | 70 | # prepare images for blend 71 | img_hot_windows = draw_boxes(frame, hot_windows, color=(0, 0, 255), thick=2) # show pos windows 72 | img_heatmap = cv2.applyColorMap(normalize_image(heatmap), colormap=cv2.COLORMAP_HOT) # draw heatmap 73 | img_labeling = cv2.applyColorMap(normalize_image(labeled_frame), colormap=cv2.COLORMAP_HOT) # draw label 74 | img_detection = draw_labeled_bounding_boxes(frame.copy(), labeled_frame, num_objects) # draw detected bboxes 75 | 76 | img_blend_out = prepare_output_blend(frame, img_hot_windows, img_heatmap, img_labeling, img_detection) 77 | 78 | if verbose: 79 | cv2.imshow('detection bboxes', img_hot_windows) 80 | cv2.imshow('heatmap', img_heatmap) 81 | cv2.imshow('labeled frame', img_labeling) 82 | cv2.imshow('detections', img_detection) 83 | cv2.waitKey() 84 | 85 | return img_blend_out 86 | 87 | 88 | if __name__ == '__main__': 89 | 90 | test_img_dir = 'test_images' 91 | for test_img in os.listdir(test_img_dir): 92 | 93 | t = time.time() 94 | print('Processing image {}...'.format(test_img), end="") 95 | 96 | frame = cv2.imread(os.path.join(test_img_dir, test_img)) 97 | 98 | frame_out = process_pipeline(frame, svc, feature_scaler, feat_extraction_params, keep_state=False, verbose=False) 99 | 100 | cv2.imwrite('output_images/{}'.format(test_img), frame_out) 101 | 102 | print('Done. Elapsed: {:.02f}'.format(time.time()-t)) 103 | 104 | -------------------------------------------------------------------------------- /04_vehicle_detection/main_ssd.py: -------------------------------------------------------------------------------- 1 | from functions_detection import * 2 | from SSD import process_frame_bgr_with_SSD, get_SSD_model 3 | from vehicle import Vehicle 4 | import os 5 | import os.path as path 6 | 7 | 8 | # global deep network model 9 | ssd_model, bbox_helper, color_palette = get_SSD_model() 10 | 11 | 12 | def process_pipeline(frame, verbose=False): 13 | 14 | detected_vehicles = [] 15 | 16 | img_blend_out = frame.copy() 17 | 18 | # return bounding boxes detected by SSD 19 | ssd_bboxes = process_frame_bgr_with_SSD(frame, ssd_model, bbox_helper, allow_classes=[7], min_confidence=0.3) 20 | for row in ssd_bboxes: 21 | label, confidence, x_min, y_min, x_max, y_max = row 22 | x_min = int(round(x_min * frame.shape[1])) 23 | y_min = int(round(y_min * frame.shape[0])) 24 | x_max = int(round(x_max * frame.shape[1])) 25 | y_max = int(round(y_max * frame.shape[0])) 26 | 27 | proposed_vehicle = Vehicle(x_min, y_min, x_max, y_max) 28 | 29 | if not detected_vehicles: 30 | detected_vehicles.append(proposed_vehicle) 31 | else: 32 | for i, vehicle in enumerate(detected_vehicles): 33 | if vehicle.contains(*proposed_vehicle.center): 34 | pass # go on, bigger bbox already detected in that position 35 | elif proposed_vehicle.contains(*vehicle.center): 36 | detected_vehicles[i] = proposed_vehicle # keep the bigger window 37 | else: 38 | detected_vehicles.append(proposed_vehicle) 39 | 40 | # draw bounding boxes of detected vehicles on frame 41 | for vehicle in detected_vehicles: 42 | vehicle.draw(img_blend_out, color=(0, 255, 255), thickness=2) 43 | 44 | h, w = frame.shape[:2] 45 | off_x, off_y = 30, 30 46 | thumb_h, thumb_w = (96, 128) 47 | 48 | # add a semi-transparent rectangle to highlight thumbnails on the left 49 | mask = 
cv2.rectangle(frame.copy(), (0, 0), (w, 2 * off_y + thumb_h), (0, 0, 0), thickness=cv2.FILLED) 50 | img_blend_out = cv2.addWeighted(src1=mask, alpha=0.3, src2=img_blend_out, beta=0.8, gamma=0) 51 | 52 | # create list of thumbnails s.t. this can be later sorted for drawing 53 | vehicle_thumbs = [] 54 | for i, vehicle in enumerate(detected_vehicles): 55 | x_min, y_min, x_max, y_max = vehicle.coords 56 | vehicle_thumbs.append(frame[y_min:y_max, x_min:x_max, :]) 57 | 58 | # draw detected car thumbnails on the top of the frame 59 | for i, thumbnail in enumerate(sorted(vehicle_thumbs, key=lambda x: np.mean(x), reverse=True)): 60 | vehicle_thumb = cv2.resize(thumbnail, dsize=(thumb_w, thumb_h)) 61 | start_x = 300 + (i+1) * off_x + i * thumb_w 62 | img_blend_out[off_y:off_y + thumb_h, start_x:start_x + thumb_w, :] = vehicle_thumb 63 | 64 | # write the counter of cars detected 65 | font = cv2.FONT_HERSHEY_SIMPLEX 66 | cv2.putText(img_blend_out, 'Vehicles in sight: {:02d}'.format(len(detected_vehicles)), 67 | (20, off_y + thumb_h // 2), font, 0.8, (255, 255, 255), 2, cv2.LINE_AA) 68 | 69 | return img_blend_out 70 | 71 | 72 | if __name__ == '__main__': 73 | 74 | mode = 'images' 75 | 76 | if mode == 'video': 77 | 78 | video_file = 'project_video.mp4' 79 | 80 | cap_in = cv2.VideoCapture(video_file) 81 | video_out_dir = '../../../NANODEGREE/term_1/project_5_vehicle_detection/frames_out' 82 | 83 | f_counter = 0 84 | while True: 85 | 86 | ret, frame = cap_in.read() 87 | 88 | if ret: 89 | 90 | f_counter += 1 91 | 92 | frame_out = process_pipeline(frame, verbose=1) 93 | 94 | cv2.imwrite(path.join(video_out_dir, '{:06d}.jpg'.format(f_counter)), frame_out) 95 | 96 | cv2.imshow('', frame_out) 97 | if cv2.waitKey(1) & 0xFF == ord('q'): 98 | break 99 | 100 | # When everything done, release the capture 101 | cap_in.release() 102 | cv2.destroyAllWindows() 103 | exit() 104 | 105 | else: 106 | 107 | test_img_dir = 'test_images' 108 | for test_img in os.listdir(test_img_dir): 109 | 110 | frame = cv2.imread(os.path.join(test_img_dir, test_img)) 111 | 112 | frame_out = process_pipeline(frame, verbose=False) 113 | 114 | cv2.imwrite('output_images/{}'.format(test_img), frame_out) 115 | 116 | 117 | 118 | -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test1.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test2.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test3.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test4.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test4.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test5.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test6.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test6.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/process_video.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import os.path as path 3 | from SSD import process_frame_bgr_with_SSD, show_SSD_results, get_SSD_model 4 | 5 | 6 | if __name__ == '__main__': 7 | 8 | SSD_net, bbox_helper, color_palette = get_SSD_model() 9 | 10 | # video_file = 'project_video.mp4' 11 | video_file = 'C:/Users/minotauro/Google Drive/DEMO_SMARTAREA/modena.mp4' 12 | # out_path = 'C:/Users/minotauro/Google Drive/DEMO_SMARTAREA/out_frames' 13 | out_path = 'C:/temp_frames' 14 | 15 | cap = cv2.VideoCapture(video_file) 16 | 17 | counter = 0 18 | while True: 19 | 20 | ret, frame = cap.read() 21 | 22 | if ret: 23 | bboxes = process_frame_bgr_with_SSD(frame, SSD_net, bbox_helper, 24 | min_confidence=0.2, 25 | allow_classes=[2, 7, 14, 15]) 26 | 27 | show_SSD_results(bboxes, frame, color_palette=color_palette) 28 | 29 | cv2.imwrite(path.join(out_path, '{:06d}.jpg'.format(counter)), frame) 30 | # cv2.imshow('', frame) 31 | # if cv2.waitKey(1) & 0xFF == ord('q'): 32 | # break 33 | 34 | counter += 1 35 | 36 | 37 | # When everything done, release the capture 38 | cap.release() 39 | cv2.destroyAllWindows() 40 | exit() 41 | -------------------------------------------------------------------------------- /04_vehicle_detection/project_5_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | from os.path import exists 4 | from os.path import join 5 | 6 | import cv2 7 | import numpy as np 8 | 9 | 10 | def get_file_list_recursively(top_directory): 11 | """ 12 | Get list of full paths of all files found under root directory "top_directory". 13 | If a list of allowed file extensions is provided, files are filtered according to this list. 
14 | 15 | Parameters 16 | ---------- 17 | top_directory: str 18 | Root of the hierarchy 19 | 20 | Returns 21 | ------- 22 | file_list: list 23 | List of files found under top_directory (with full path) 24 | """ 25 | if not exists(top_directory): 26 | raise ValueError('Directory "{}" does NOT exist.'.format(top_directory)) 27 | 28 | file_list = [] 29 | 30 | for cur_dir, cur_subdirs, cur_files in os.walk(top_directory): 31 | 32 | for file in cur_files: 33 | file_list.append(join(cur_dir, file)) 34 | sys.stdout.write( 35 | '\r[{}] - found {:06d} files...'.format(top_directory, len(file_list))) 36 | sys.stdout.flush() 37 | 38 | sys.stdout.write(' Done.\n') 39 | 40 | return file_list 41 | 42 | 43 | def stitch_together(input_images, layout, resize_dim=None, off_x=None, off_y=None, 44 | bg_color=(0, 0, 0)): 45 | """ 46 | Stitch together N input images into a bigger frame, using a grid layout. 47 | Input images can be either color or grayscale, but must all have the same size. 48 | 49 | Parameters 50 | ---------- 51 | input_images : list 52 | List of input images 53 | layout : tuple 54 | Grid layout of the stitch expressed as (rows, cols) 55 | resize_dim : couple 56 | If not None, stitch is resized to this size 57 | off_x : int 58 | Offset between stitched images along x axis 59 | off_y : int 60 | Offset between stitched images along y axis 61 | bg_color : tuple 62 | Color used for background 63 | 64 | Returns 65 | ------- 66 | stitch : ndarray 67 | Stitch of input images 68 | """ 69 | 70 | if len(set([img.shape for img in input_images])) > 1: 71 | raise ValueError('All images must have the same shape') 72 | 73 | if len(set([img.dtype for img in input_images])) > 1: 74 | raise ValueError('All images must have the same data type') 75 | 76 | # determine if input images are color (3 channels) or grayscale (single channel) 77 | if len(input_images[0].shape) == 2: 78 | mode = 'grayscale' 79 | img_h, img_w = input_images[0].shape 80 | elif len(input_images[0].shape) == 3: 81 | mode = 'color' 82 | img_h, img_w, img_c = input_images[0].shape 83 | else: 84 | raise ValueError('Unknown shape for input images') 85 | 86 | # if no offset is provided, set to 10% of image size 87 | if off_x is None: 88 | off_x = img_w // 10 89 | if off_y is None: 90 | off_y = img_h // 10 91 | 92 | # create stitch mask 93 | rows, cols = layout 94 | stitch_h = rows * img_h + (rows + 1) * off_y 95 | stitch_w = cols * img_w + (cols + 1) * off_x 96 | if mode == 'color': 97 | bg_color = np.array(bg_color)[None, None, :] # cast to ndarray add singleton dimensions 98 | stitch = np.uint8(np.repeat(np.repeat(bg_color, stitch_h, axis=0), stitch_w, axis=1)) 99 | elif mode == 'grayscale': 100 | stitch = np.zeros(shape=(stitch_h, stitch_w), dtype=np.uint8) 101 | 102 | for r in range(0, rows): 103 | for c in range(0, cols): 104 | 105 | list_idx = r * cols + c 106 | 107 | if list_idx < len(input_images): 108 | if mode == 'color': 109 | stitch[r * (off_y + img_h) + off_y: r * (off_y + img_h) + off_y + img_h, 110 | c * (off_x + img_w) + off_x: c * (off_x + img_w) + off_x + img_w, 111 | :] = input_images[list_idx] 112 | elif mode == 'grayscale': 113 | stitch[r * (off_y + img_h) + off_y: r * (off_y + img_h) + off_y + img_h, 114 | c * (off_x + img_w) + off_x: c * (off_x + img_w) + off_x + img_w] \ 115 | = input_images[list_idx] 116 | 117 | if resize_dim: 118 | stitch = cv2.resize(stitch, dsize=(resize_dim[::-1])) 119 | 120 | return stitch 121 | 122 | 123 | class Rectangle: 124 | """ 125 | 2D Rectangle defined by top-left and bottom-right corners. 
126 | Parameters 127 | ---------- 128 | x_min : int 129 | x coordinate of top-left corner. 130 | y_min : int 131 | y coordinate of top-left corner. 132 | x_max : int 133 | x coordinate of bottom-right corner. 134 | y_min : int 135 | y coordinate of bottom-right corner. 136 | """ 137 | 138 | def __init__(self, x_min, y_min, x_max, y_max, label=""): 139 | 140 | self.x_min = x_min 141 | self.y_min = y_min 142 | self.x_max = x_max 143 | self.y_max = y_max 144 | 145 | self.x_side = self.x_max - self.x_min 146 | self.y_side = self.y_max - self.y_min 147 | 148 | self.label = label 149 | 150 | def intersect_with(self, rect): 151 | """ 152 | Compute the intersection between this instance and another Rectangle. 153 | 154 | Parameters 155 | ---------- 156 | rect : Rectangle 157 | The instance of the second Rectangle. 158 | 159 | Returns 160 | ------- 161 | intersection_area : float 162 | Area of intersection between the two rectangles expressed in number of pixels. 163 | """ 164 | if not isinstance(rect, Rectangle): 165 | raise ValueError('Cannot compute intersection if "rect" is not a Rectangle') 166 | 167 | dx = min(self.x_max, rect.x_max) - max(self.x_min, rect.x_min) 168 | dy = min(self.y_max, rect.y_max) - max(self.y_min, rect.y_min) 169 | 170 | if dx >= 0 and dy >= 0: 171 | intersection = dx * dy 172 | else: 173 | intersection = 0. 174 | 175 | return intersection 176 | 177 | def resize_sides(self, ratio, bounds=None): 178 | """ 179 | Resize the sides of rectangle while mantaining the aspect ratio and center position. 180 | Parameters 181 | ---------- 182 | ratio : float 183 | Ratio of the resize in range (0, infinity), where 2 means double the size and 0.5 is half of the size. 184 | bounds: tuple, optional 185 | If present, clip the Rectangle to these bounds=(xbmin, ybmin, xbmax, ybmax). 186 | Returns 187 | ------- 188 | rectangle : Rectangle 189 | Reshaped Rectangle. 190 | """ 191 | 192 | # compute offset 193 | off_x = abs(ratio * self.x_side - self.x_side) / 2 194 | off_y = abs(ratio * self.y_side - self.y_side) / 2 195 | 196 | # offset changes sign according if the resize is either positive or negative 197 | sign = np.sign(ratio - 1.) 198 | off_x = np.int32(off_x * sign) 199 | off_y = np.int32(off_y * sign) 200 | 201 | # update top-left and bottom-right coords 202 | new_x_min, new_y_min = self.x_min - off_x, self.y_min - off_y 203 | new_x_max, new_y_max = self.x_max + off_x, self.y_max + off_y 204 | 205 | # eventually clip the coordinates according to the given bounds 206 | if bounds: 207 | b_x_min, b_y_min, b_x_max, b_y_max = bounds 208 | new_x_min = max(new_x_min, b_x_min) 209 | new_y_min = max(new_y_min, b_y_min) 210 | new_x_max = min(new_x_max, b_x_max) 211 | new_y_max = min(new_y_max, b_y_max) 212 | 213 | return Rectangle(new_x_min, new_y_min, new_x_max, new_y_max) 214 | 215 | def draw(self, frame, color=255, thickness=2, draw_label=False): 216 | """ 217 | Draw Rectangle on a given frame. 218 | Notice: while this function does not return anything, original image `frame` is modified. 219 | Parameters 220 | ---------- 221 | frame : 2D / 3D np.array 222 | The image on which the rectangle is drawn. 223 | color : tuple, optional 224 | Color used to draw the rectangle (default = 255) 225 | thickness : int, optional 226 | Line thickness used t draw the rectangle (default = 1) 227 | draw_label : bool, optional 228 | If True and the Rectangle has a label, draws it on the top of the rectangle. 
229 | Returns 230 | ------- 231 | None 232 | """ 233 | if draw_label and self.label: 234 | # compute text size 235 | text_font, text_scale, text_thick = cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1 236 | (text_w, text_h), baseline = cv2.getTextSize(self.label, text_font, text_scale, 237 | text_thick) 238 | 239 | # draw rectangle on which text will be displayed 240 | text_rect_w = min(text_w, self.x_side - 2 * baseline) 241 | out = cv2.rectangle(frame.copy(), pt1=(self.x_min, self.y_min - text_h - 2 * baseline), 242 | pt2=(self.x_min + text_rect_w + 2 * baseline, self.y_min), 243 | color=color, thickness=cv2.FILLED) 244 | cv2.addWeighted(frame, 0.75, out, 0.25, 0, dst=frame) 245 | 246 | # actually write text label 247 | cv2.putText(frame, self.label, (self.x_min + baseline, self.y_min - baseline), 248 | text_font, text_scale, (0, 0, 0), text_thick, cv2.LINE_AA) 249 | 250 | # add text rectangle border 251 | cv2.rectangle(frame, pt1=(self.x_min, self.y_min - text_h - 2 * baseline), 252 | pt2=(self.x_min + text_rect_w + 2 * baseline, self.y_min), color=color, 253 | thickness=thickness) 254 | 255 | # draw the Rectangle 256 | cv2.rectangle(frame, (self.x_min, self.y_min), (self.x_max, self.y_max), color, thickness) 257 | 258 | def get_binary_mask(self, mask_shape): 259 | """ 260 | Get uint8 binary mask of shape `mask_shape` with rectangle in foreground. 261 | Parameters 262 | ---------- 263 | mask_shape : (tuple) 264 | Shape of the mask to return - following convention (h, w) 265 | Returns 266 | ------- 267 | mask : np.array 268 | Binary uint8 mask of shape `mask_shape` with rectangle drawn as foreground. 269 | """ 270 | if mask_shape[0] < self.y_max or mask_shape[1] < self.x_max: 271 | raise ValueError('Mask shape is smaller than Rectangle size') 272 | mask = np.zeros(shape=mask_shape, dtype=np.uint8) 273 | mask = cv2.rectangle(mask, self.tl_corner, self.br_corner, color=255, thickness=cv2.FILLED) 274 | return mask 275 | 276 | @property 277 | def tl_corner(self): 278 | """ 279 | Coordinates of the top-left corner of rectangle (as int32). 280 | Returns 281 | ------- 282 | tl_corner : int32 tuple 283 | """ 284 | return tuple(map(np.int32, (self.x_min, self.y_min))) 285 | 286 | @property 287 | def br_corner(self): 288 | """ 289 | Coordinates of the bottom-right corner of rectangle. 290 | 291 | Returns 292 | ------- 293 | br_corner : int32 tuple 294 | """ 295 | return tuple(map(np.int32, (self.x_max, self.y_max))) 296 | 297 | @property 298 | def coords(self): 299 | """ 300 | Coordinates (x_min, y_min, x_max, y_max) which define the Rectangle. 
301 | 302 | Returns 303 | ------- 304 | coordinates : int32 tuple 305 | """ 306 | return tuple(map(np.int32, (self.x_min, self.y_min, self.x_max, self.y_max))) 307 | 308 | @property 309 | def area(self): 310 | """ 311 | Get the area of Rectangle 312 | 313 | Returns 314 | ------- 315 | area : float32 316 | """ 317 | return np.float32(self.x_side * self.y_side) 318 | -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test1.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test2.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test3.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test4.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test5.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test6.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test6.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pickle 3 | import time 4 | 5 | import cv2 6 | import matplotlib.pyplot as plt 7 | import numpy as np 8 | from sklearn.model_selection import train_test_split 9 | from sklearn.preprocessing import StandardScaler 10 | from sklearn.svm import LinearSVC 11 | 12 | from config import root_data_non_vehicle, root_data_vehicle, feat_extraction_params 13 | from functions_detection import draw_boxes 14 | from functions_detection import search_windows 15 | from functions_detection import slide_window 16 | from functions_feat_extraction import extract_features_from_file_list 17 | from project_5_utils import get_file_list_recursively 18 | 19 | 20 | if __name__ == '__main__': 21 | 22 | # read paths of training images 23 | cars = get_file_list_recursively(root_data_vehicle) 24 | notcars = get_file_list_recursively(root_data_non_vehicle) 25 | 26 | 
print('Extracting car features...') 27 | car_features = extract_features_from_file_list(cars, feat_extraction_params) 28 | 29 | print('Extracting non-car features...') 30 | notcar_features = extract_features_from_file_list(notcars, feat_extraction_params) 31 | 32 | X = np.vstack((car_features, notcar_features)).astype(np.float64) 33 | 34 | # standardize features with sklearn preprocessing 35 | feature_scaler = StandardScaler().fit(X) # per-column scaler 36 | scaled_X = feature_scaler.transform(X) 37 | 38 | # Define the labels vector 39 | y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features)))) 40 | 41 | # Split up data into randomized training and test sets 42 | rand_state = np.random.randint(0, 100) 43 | X_train, X_test, y_train, y_test = train_test_split(scaled_X, y, test_size=0.2, random_state=rand_state) 44 | 45 | print('Feature vector length:', len(X_train[0])) 46 | 47 | # Define the classifier 48 | svc = LinearSVC() # svc = SVC(kernel='rbf') 49 | 50 | # Train the classifier (check training time) 51 | t = time.time() 52 | svc.fit(X_train, y_train) 53 | t2 = time.time() 54 | print(round(t2 - t, 2), 'Seconds to train SVC...') 55 | 56 | # Check the score of the SVC 57 | print('Test Accuracy of SVC = ', round(svc.score(X_test, y_test), 4)) 58 | 59 | # dump all stuff necessary to perform testing in a successive phase 60 | with open('data/svm_trained.pickle', 'wb') as f: 61 | pickle.dump(svc, f) 62 | with open('data/feature_scaler.pickle', 'wb') as f: 63 | pickle.dump(feature_scaler, f) 64 | with open('data/feat_extraction_params.pickle', 'wb') as f: 65 | pickle.dump(feat_extraction_params, f) 66 | 67 | # test on images in "test_images" directory 68 | test_img_dir = 'test_images' 69 | for test_img in os.listdir(test_img_dir): 70 | image = cv2.imread(os.path.join(test_img_dir, test_img)) 71 | 72 | h, w, c = image.shape 73 | draw_image = np.copy(image) 74 | 75 | windows = slide_window(image, x_start_stop=[None, None], y_start_stop=[h//2, None], 76 | xy_window=(64, 64), xy_overlap=(0.8, 0.8)) 77 | 78 | hot_windows = search_windows(image, windows, svc, feature_scaler, feat_extraction_params) 79 | 80 | window_img = draw_boxes(draw_image, hot_windows, color=(0, 0, 255), thick=6) 81 | 82 | plt.imshow(cv2.cvtColor(window_img, cv2.COLOR_BGR2RGB)) 83 | plt.show() 84 | -------------------------------------------------------------------------------- /04_vehicle_detection/vehicle.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | 4 | 5 | class Vehicle: 6 | """ 7 | 2D Vehicle defined by top-left and bottom-right corners. 8 | 9 | Parameters 10 | ---------- 11 | x_min : int 12 | x coordinate of top-left corner. 13 | y_min : int 14 | y coordinate of top-left corner. 15 | x_max : int 16 | x coordinate of bottom-right corner. 17 | y_min : int 18 | y coordinate of bottom-right corner. 19 | """ 20 | 21 | def __init__(self, x_min, y_min, x_max, y_max): 22 | 23 | self.x_min = x_min 24 | self.y_min = y_min 25 | self.x_max = x_max 26 | self.y_max = y_max 27 | 28 | self.x_side = self.x_max - self.x_min 29 | self.y_side = self.y_max - self.y_min 30 | 31 | def intersect_with(self, rect): 32 | """ 33 | Compute the intersection between this instance and another Vehicle. 34 | 35 | Parameters 36 | ---------- 37 | rect : Vehicle 38 | The instance of the second Vehicle. 39 | 40 | Returns 41 | ------- 42 | intersection_area : float 43 | Area of intersection between the two rectangles expressed in number of pixels. 
44 | """ 45 | if not isinstance(rect, Vehicle): 46 | raise ValueError('Cannot compute intersection if "rect" is not a Vehicle') 47 | 48 | dx = min(self.x_max, rect.x_max) - max(self.x_min, rect.x_min) 49 | dy = min(self.y_max, rect.y_max) - max(self.y_min, rect.y_min) 50 | 51 | if dx >= 0 and dy >= 0: 52 | intersection = dx * dy 53 | else: 54 | intersection = 0. 55 | 56 | return intersection 57 | 58 | def resize_sides(self, ratio, bounds=None): 59 | """ 60 | Resize the sides of rectangle while mantaining the aspect ratio and center position. 61 | 62 | Parameters 63 | ---------- 64 | ratio : float 65 | Ratio of the resize in range (0, infinity), where 2 means double the size and 0.5 is half of the size. 66 | bounds: tuple, optional 67 | If present, clip the Vehicle to these bounds=(xbmin, ybmin, xbmax, ybmax). 68 | 69 | Returns 70 | ------- 71 | rectangle : Vehicle 72 | Reshaped Vehicle. 73 | """ 74 | 75 | # compute offset 76 | off_x = abs(ratio * self.x_side - self.x_side) / 2 77 | off_y = abs(ratio * self.y_side - self.y_side) / 2 78 | 79 | # offset changes sign according if the resize is either positive or negative 80 | sign = np.sign(ratio - 1.) 81 | off_x = np.int32(off_x * sign) 82 | off_y = np.int32(off_y * sign) 83 | 84 | # update top-left and bottom-right coords 85 | new_x_min, new_y_min = self.x_min - off_x, self.y_min - off_y 86 | new_x_max, new_y_max = self.x_max + off_x, self.y_max + off_y 87 | 88 | # eventually clip the coordinates according to the given bounds 89 | if bounds: 90 | b_x_min, b_y_min, b_x_max, b_y_max = bounds 91 | new_x_min = max(new_x_min, b_x_min) 92 | new_y_min = max(new_y_min, b_y_min) 93 | new_x_max = min(new_x_max, b_x_max) 94 | new_y_max = min(new_y_max, b_y_max) 95 | 96 | return Vehicle(new_x_min, new_y_min, new_x_max, new_y_max) 97 | 98 | def draw(self, frame, color=255, thickness=1): 99 | """ 100 | Draw Vehicle on a given frame. 101 | 102 | Notice: while this function does not return anything, original image `frame` is modified. 103 | 104 | Parameters 105 | ---------- 106 | frame : 2D / 3D np.array 107 | The image on which the rectangle is drawn. 108 | color : tuple, optional 109 | Color used to draw the rectangle (default = 255) 110 | thickness : int, optional 111 | Line thickness used t draw the rectangle (default = 1) 112 | 113 | Returns 114 | ------- 115 | None 116 | """ 117 | cv2.rectangle(frame, (self.x_min, self.y_min), (self.x_max, self.y_max), color, thickness) 118 | 119 | def get_binary_mask(self, mask_shape): 120 | """ 121 | Get uint8 binary mask of shape `mask_shape` with rectangle in foreground. 122 | 123 | Parameters 124 | ---------- 125 | mask_shape : (tuple) 126 | Shape of the mask to return - following convention (h, w) 127 | 128 | Returns 129 | ------- 130 | mask : np.array 131 | Binary uint8 mask of shape `mask_shape` with rectangle drawn as foreground. 
132 | """ 133 | if mask_shape[0] < self.y_max or mask_shape[1] < self.x_max: 134 | raise ValueError('Mask shape is smaller than Vehicle size') 135 | mask = np.zeros(shape=mask_shape, dtype=np.uint8) 136 | mask = cv2.rectangle(mask, self.tl_corner, self.br_corner, color=255, thickness=cv2.FILLED) 137 | return mask 138 | 139 | def contains(self, x, y): 140 | 141 | if self.x_min < x < self.x_max and self.y_min < y < self.y_max: 142 | return True 143 | else: 144 | return False 145 | 146 | @property 147 | def center(self): 148 | center_x = self.x_min + self.x_side // 2 149 | center_y = self.y_min + self.y_side // 2 150 | return tuple(map(np.int32, (center_x, center_y))) 151 | 152 | @property 153 | def tl_corner(self): 154 | """ 155 | Coordinates of the top-left corner of rectangle (as int32). 156 | 157 | Returns 158 | ------- 159 | tl_corner : int32 tuple 160 | """ 161 | return tuple(map(np.int32, (self.x_min, self.y_min))) 162 | 163 | @property 164 | def br_corner(self): 165 | """ 166 | Coordinates of the bottom-right corner of rectangle. 167 | 168 | Returns 169 | ------- 170 | br_corner : int32 tuple 171 | """ 172 | return tuple(map(np.int32, (self.x_max, self.y_max))) 173 | 174 | @property 175 | def coords(self): 176 | """ 177 | Coordinates (x_min, y_min, x_max, y_max) which define the Vehicle. 178 | 179 | Returns 180 | ------- 181 | coordinates : int32 tuple 182 | """ 183 | return tuple(map(np.int32, (self.x_min, self.y_min, self.x_max, self.y_max))) 184 | 185 | 186 | @property 187 | def area(self): 188 | """ 189 | Get the area of Vehicle 190 | 191 | Returns 192 | ------- 193 | area : float32 194 | """ 195 | return np.float32(self.x_side * self.y_side) -------------------------------------------------------------------------------- /05_road_segmentation/README.md: -------------------------------------------------------------------------------- 1 | # Semantic Segmentation 2 | ### Introduction 3 | In this project, you'll label the pixels of a road in images using a Fully Convolutional Network (FCN). 4 | 5 |

6 | Overview 7 |
Qualitative results. 8 |
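As a rough illustration of what labelling road pixels amounts to at inference time, the sketch below (NumPy-only; function and variable names are illustrative and not part of the project) thresholds a per-pixel road probability map and blends the resulting mask into the frame as a semi-transparent green overlay, similar in spirit to what `gen_test_output` in `helper.py` does.

```python
import numpy as np

# Illustrative sketch: overlay a thresholded road-probability map on the input frame.

def overlay_road_mask(image, road_prob, threshold=0.5, alpha=0.5):
    """
    image:     uint8 RGB frame of shape (h, w, 3)
    road_prob: float array of shape (h, w), values in [0, 1] (e.g. network softmax output)
    """
    road = road_prob > threshold                 # boolean per-pixel segmentation
    blended = image.astype(np.float32)
    green = np.array([0.0, 255.0, 0.0])
    blended[road] = (1 - alpha) * blended[road] + alpha * green  # tint road pixels green
    return blended.astype(np.uint8)


if __name__ == '__main__':
    frame = np.zeros((160, 576, 3), dtype=np.uint8)   # dummy frame at the training resolution
    prob = np.random.rand(160, 576)                   # dummy per-pixel road probabilities
    print(overlay_road_mask(frame, prob).shape)       # (160, 576, 3)
```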

9 | 10 | 11 | 12 | ### Setup 13 | ##### Frameworks and Packages 14 | Make sure you have the following installed: 15 | - [Python 3](https://www.python.org/) 16 | - [TensorFlow](https://www.tensorflow.org/) 17 | - [NumPy](http://www.numpy.org/) 18 | - [SciPy](https://www.scipy.org/) 19 | ##### Dataset 20 | Download the [KITTI Road dataset](http://www.cvlibs.net/datasets/kitti/eval_road.php) from [here](http://www.cvlibs.net/download.php?file=data_road.zip). Extract the dataset into the `data` folder. This will create the folder `data_road` with all the training and test images. 21 | 22 | ### Run 23 | 24 | Run the following command to run the project: 25 | ``` 26 | python main.py 27 | ``` 28 | **Note:** If running this in a Jupyter Notebook, system messages, such as those regarding test status, may appear in the terminal rather than the notebook. 29 | 30 |
-------------------------------------------------------------------------------- /05_road_segmentation/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/05_road_segmentation/__init__.py -------------------------------------------------------------------------------- /05_road_segmentation/helper.py: -------------------------------------------------------------------------------- 1 | import re 2 | import random 3 | import numpy as np 4 | import os.path 5 | import scipy.misc 6 | import shutil 7 | import zipfile 8 | import time 9 | import tensorflow as tf 10 | from glob import glob 11 | from urllib.request import urlretrieve 12 | from tqdm import tqdm 13 | 14 | 15 | class DLProgress(tqdm): 16 | last_block = 0 17 | 18 | def hook(self, block_num=1, block_size=1, total_size=None): 19 | self.total = total_size 20 | self.update((block_num - self.last_block) * block_size) 21 | self.last_block = block_num 22 | 23 | 24 | def maybe_download_pretrained_vgg(data_dir): 25 | """ 26 | Download and extract pretrained vgg model if it doesn't exist 27 | :param data_dir: Directory to download the model to 28 | """ 29 | vgg_filename = 'vgg.zip' 30 | vgg_path = os.path.join(data_dir, 'vgg') 31 | vgg_files = [ 32 | os.path.join(vgg_path, 'variables/variables.data-00000-of-00001'), 33 | os.path.join(vgg_path, 'variables/variables.index'), 34 | os.path.join(vgg_path, 'saved_model.pb')] 35 | 36 | missing_vgg_files = [vgg_file for vgg_file in vgg_files if not os.path.exists(vgg_file)] 37 | if missing_vgg_files: 38 | # Clean vgg dir 39 | if os.path.exists(vgg_path): 40 | shutil.rmtree(vgg_path) 41 | os.makedirs(vgg_path) 42 | 43 | # Download vgg 44 | print('Downloading pre-trained vgg model...') 45 | with DLProgress(unit='B', unit_scale=True, miniters=1) as pbar: 46 | urlretrieve( 47 | 'https://s3-us-west-1.amazonaws.com/udacity-selfdrivingcar/vgg.zip', 48 | os.path.join(vgg_path, vgg_filename), 49 | pbar.hook) 50 | 51 | # Extract vgg 52 | print('Extracting model...') 53 | zip_ref = zipfile.ZipFile(os.path.join(vgg_path, vgg_filename), 'r') 54 | zip_ref.extractall(data_dir) 55 | zip_ref.close() 56 | 57 | # Remove zip file to save space 58 | os.remove(os.path.join(vgg_path, vgg_filename)) 59 | 60 | 61 | def gen_batch_function(data_folder, image_shape): 62 | """ 63 | Generate function to create batches of training data 64 | :param data_folder: Path to folder that contains all the datasets 65 | :param image_shape: Tuple - Shape of image 66 | :return: 67 | """ 68 | def get_batches_fn(batch_size): 69 | """ 70 | Create batches of
training data 71 | :param batch_size: Batch Size 72 | :return: Batches of training data 73 | """ 74 | image_paths = glob(os.path.join(data_folder, 'image_2', '*.png')) 75 | label_paths = { 76 | re.sub(r'_(lane|road)_', '_', os.path.basename(path)): path 77 | for path in glob(os.path.join(data_folder, 'gt_image_2', '*_road_*.png'))} 78 | background_color = np.array([255, 0, 0]) 79 | 80 | random.shuffle(image_paths) 81 | for batch_i in range(0, len(image_paths), batch_size): 82 | images = [] 83 | gt_images = [] 84 | for image_file in image_paths[batch_i:batch_i+batch_size]: 85 | gt_image_file = label_paths[os.path.basename(image_file)] 86 | 87 | image = scipy.misc.imresize(scipy.misc.imread(image_file), image_shape) 88 | gt_image = scipy.misc.imresize(scipy.misc.imread(gt_image_file), image_shape) 89 | 90 | gt_bg = np.all(gt_image == background_color, axis=2) 91 | gt_bg = gt_bg.reshape(*gt_bg.shape, 1) 92 | gt_image = np.concatenate((gt_bg, np.invert(gt_bg)), axis=2) 93 | 94 | images.append(image) 95 | gt_images.append(gt_image) 96 | 97 | yield np.array(images), np.array(gt_images) 98 | return get_batches_fn 99 | 100 | 101 | def gen_test_output(sess, logits, keep_prob, image_pl, data_folder, image_shape): 102 | """ 103 | Generate test output using the test images 104 | :param sess: TF session 105 | :param logits: TF Tensor for the logits 106 | :param keep_prob: TF Placeholder for the dropout keep robability 107 | :param image_pl: TF Placeholder for the image placeholder 108 | :param data_folder: Path to the folder that contains the datasets 109 | :param image_shape: Tuple - Shape of image 110 | :return: Output for for each test image 111 | """ 112 | for image_file in glob(os.path.join(data_folder, 'image_2', '*.png')): 113 | image = scipy.misc.imresize(scipy.misc.imread(image_file), image_shape) 114 | 115 | im_softmax = sess.run( 116 | [tf.nn.softmax(logits)], 117 | {keep_prob: 1.0, image_pl: [image]}) 118 | im_softmax = im_softmax[0][:, 1].reshape(image_shape[0], image_shape[1]) 119 | segmentation = (im_softmax > 0.5).reshape(image_shape[0], image_shape[1], 1) 120 | mask = np.dot(segmentation, np.array([[0, 255, 0, 127]])) 121 | mask = scipy.misc.toimage(mask, mode="RGBA") 122 | street_im = scipy.misc.toimage(image) 123 | street_im.paste(mask, box=None, mask=mask) 124 | 125 | yield os.path.basename(image_file), np.array(street_im) 126 | 127 | 128 | def save_inference_samples(runs_dir, data_dir, sess, image_shape, logits, keep_prob, input_image): 129 | # Make folder for current run 130 | output_dir = os.path.join(runs_dir, str(time.time())) 131 | if os.path.exists(output_dir): 132 | shutil.rmtree(output_dir) 133 | os.makedirs(output_dir) 134 | 135 | # Run NN on test images and save them to HD 136 | print('Training Finished. Saving test images to: {}'.format(output_dir)) 137 | image_outputs = gen_test_output( 138 | sess, logits, keep_prob, input_image, os.path.join(data_dir, 'data_road/testing'), image_shape) 139 | for name, image in image_outputs: 140 | scipy.misc.imsave(os.path.join(output_dir, name), image) 141 | -------------------------------------------------------------------------------- /05_road_segmentation/image_augmentation.py: -------------------------------------------------------------------------------- 1 | """ 2 | Fairly basic set of tools for data augmentation on images. 
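The augmentations implemented below are random horizontal mirroring and random jitter of hue, saturation and brightness in HSV space.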
3 | """ 4 | 5 | import numpy as np 6 | import random 7 | import cv2 8 | from os.path import join, expanduser 9 | import matplotlib.pyplot as plt 10 | 11 | 12 | def perform_augmentation(batch_x, batch_y): 13 | """ 14 | Perform basic data augmentation on image batches. 15 | 16 | Parameters 17 | ---------- 18 | batch_x: ndarray of shape (b, h, w, c) 19 | Batch of images in RGB format, values in [0, 255] 20 | batch_y: ndarray of shape (b, h, w, c) 21 | Batch of ground truth with road segmentation 22 | 23 | Returns 24 | ------- 25 | batch_x_aug, batch_y_aug: two ndarray of shape (b, h, w, c) 26 | Augmented batches 27 | """ 28 | def mirror(x): 29 | return x[:, ::-1, :] 30 | 31 | def augment_in_hsv_space(x_hsv): 32 | x_hsv = np.float32(cv2.cvtColor(x_hsv, cv2.COLOR_RGB2HSV)) 33 | x_hsv[:, :, 0] = x_hsv[:, :, 0] * random.uniform(0.9, 1.1) # change hue 34 | x_hsv[:, :, 1] = x_hsv[:, :, 1] * random.uniform(0.5, 2.0) # change saturation 35 | x_hsv[:, :, 2] = x_hsv[:, :, 2] * random.uniform(0.5, 2.0) # change brightness 36 | x_hsv = np.uint8(np.clip(x_hsv, 0, 255)) 37 | return cv2.cvtColor(x_hsv, cv2.COLOR_HSV2RGB) 38 | 39 | batch_x_aug = np.copy(batch_x) 40 | batch_y_aug = np.copy(batch_y) 41 | 42 | for b in range(batch_x_aug.shape[0]): 43 | 44 | # Random mirroring 45 | should_mirror = random.choice([True, False]) 46 | if should_mirror: 47 | batch_x_aug[b] = mirror(batch_x[b]) 48 | batch_y_aug[b] = mirror(batch_y[b]) 49 | 50 | # Random change in image values (hue, saturation, brightness) 51 | batch_x_aug[b] = augment_in_hsv_space(batch_x_aug[b]) 52 | 53 | return batch_x_aug, batch_y_aug 54 | 55 | 56 | def debug_visualize_data_augmentation(): 57 | 58 | from main_27 import gen_batch_function # keep here to avoid circular dependencies 59 | 60 | """ 61 | Dirty and running code to debug image augmentation functions. 
62 | """ 63 | 64 | # Parameters 65 | data_dir = join(expanduser("~"), 'code', 'self-driving-car', 'project_12_road_segmentation', 'data') 66 | image_h, image_w = (160, 576) 67 | batch_size = 20 68 | 69 | # Create function to get batches 70 | batch_generator = gen_batch_function(join(data_dir, 'data_road/training'), (image_h, image_w)) 71 | 72 | # Load a batch and augment it 73 | batch_x, batch_y = next(batch_generator(batch_size)) 74 | batch_x_aug, batch_y_aug = perform_augmentation(batch_x, batch_y) 75 | 76 | # Show both original and augmented batch images 77 | for i in range(batch_size): 78 | plt.figure(1) 79 | 80 | x = batch_x[i] 81 | y = np.uint8(batch_y[i][:, :, 1]) * 255 # cast from boolean to uint8 for visualization 82 | y = np.stack([y, y, y], axis=2) # turn to 3-channels for visualization 83 | xy = np.concatenate([x, y], axis=0) 84 | 85 | plt.imshow(xy) 86 | 87 | plt.figure(2) 88 | 89 | x_aug = batch_x_aug[i] 90 | y_aug = np.uint8(batch_y_aug[i][:, :, 1]) * 255 # cast from boolean to uint8 for visualization 91 | y_aug = np.stack([y_aug, y_aug, y_aug], axis=2) # turn to 3-channels for visualization 92 | xy_aug = np.concatenate([x_aug, y_aug], axis=0) 93 | 94 | plt.imshow(xy_aug) 95 | 96 | plt.show(block=False) 97 | plt.waitforbuttonpress() 98 | 99 | 100 | if __name__ == '__main__': 101 | 102 | debug_visualize_data_augmentation() 103 | -------------------------------------------------------------------------------- /05_road_segmentation/img/example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/05_road_segmentation/img/example.png -------------------------------------------------------------------------------- /05_road_segmentation/img/overview.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/05_road_segmentation/img/overview.jpg -------------------------------------------------------------------------------- /05_road_segmentation/main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import warnings 4 | import tensorflow as tf 5 | from helper import gen_batch_function, save_inference_samples 6 | from distutils.version import LooseVersion 7 | from os.path import join, expanduser 8 | import project_tests as tests 9 | from image_augmentation import perform_augmentation 10 | 11 | 12 | # Check TensorFlow Version 13 | assert LooseVersion(tf.__version__) >= LooseVersion('1.0'),\ 14 | 'Please use TensorFlow version 1.0 or newer. You are using {}'.format(tf.__version__) 15 | print('TensorFlow Version: {}'.format(tf.__version__)) 16 | 17 | # Check for a GPU 18 | if not tf.test.gpu_device_name(): 19 | warnings.warn('No GPU found. Please use a GPU to train your neural network.') 20 | else: 21 | print('Default GPU Device: {}'.format(tf.test.gpu_device_name())) 22 | 23 | 24 | def load_vgg(sess, vgg_path): 25 | """ 26 | Load Pretrained VGG Model into TensorFlow. 
27 | 28 | :param sess: TensorFlow Session 29 | :param vgg_path: Path to vgg folder, containing "variables/" and "saved_model.pb" 30 | :return: Tuple of Tensors from VGG model (image_input, keep_prob, layer3_out, layer4_out, layer7_out) 31 | """ 32 | 33 | vgg_input_tensor_name = 'image_input:0' 34 | vgg_keep_prob_tensor_name = 'keep_prob:0' 35 | vgg_layer3_out_tensor_name = 'layer3_out:0' 36 | vgg_layer4_out_tensor_name = 'layer4_out:0' 37 | vgg_layer7_out_tensor_name = 'layer7_out:0' 38 | 39 | tf.saved_model.loader.load(sess, ['vgg16'], vgg_path) 40 | graph = tf.get_default_graph() 41 | 42 | image_input = graph.get_tensor_by_name(vgg_input_tensor_name) 43 | keep_prob = graph.get_tensor_by_name(vgg_keep_prob_tensor_name) 44 | layer3_out = graph.get_tensor_by_name(vgg_layer3_out_tensor_name) 45 | layer4_out = graph.get_tensor_by_name(vgg_layer4_out_tensor_name) 46 | layer7_out = graph.get_tensor_by_name(vgg_layer7_out_tensor_name) 47 | 48 | return image_input, keep_prob, layer3_out, layer4_out, layer7_out 49 | 50 | 51 | def layers(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes): 52 | """ 53 | Create the layers for a fully convolutional network. Build skip-layers using the vgg layers. 54 | For reference: https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf 55 | 56 | :param vgg_layer7_out: TF Tensor for VGG Layer 3 output 57 | :param vgg_layer4_out: TF Tensor for VGG Layer 4 output 58 | :param vgg_layer3_out: TF Tensor for VGG Layer 7 output 59 | :param num_classes: Number of classes to classify 60 | :return: The Tensor for the last layer of output 61 | """ 62 | 63 | kernel_regularizer = tf.contrib.layers.l2_regularizer(0.5) 64 | 65 | # Compute logits 66 | layer3_logits = tf.layers.conv2d(vgg_layer3_out, num_classes, kernel_size=[1, 1], 67 | padding='same', kernel_regularizer=kernel_regularizer) 68 | layer4_logits = tf.layers.conv2d(vgg_layer4_out, num_classes, kernel_size=[1, 1], 69 | padding='same', kernel_regularizer=kernel_regularizer) 70 | layer7_logits = tf.layers.conv2d(vgg_layer7_out, num_classes, kernel_size=[1, 1], 71 | padding='same', kernel_regularizer=kernel_regularizer) 72 | 73 | # Add skip connection before 4th and 7th layer 74 | layer7_logits_up = tf.image.resize_images(layer7_logits, size=[10, 36]) 75 | layer_4_7_fused = tf.add(layer7_logits_up, layer4_logits) 76 | 77 | # Add skip connection before (4+7)th and 3rd layer 78 | layer_4_7_fused_up = tf.image.resize_images(layer_4_7_fused, size=[20, 72]) 79 | layer_3_4_7_fused = tf.add(layer3_logits, layer_4_7_fused_up) 80 | 81 | # resize to original size 82 | layer_3_4_7_up = tf.image.resize_images(layer_3_4_7_fused, size=[160, 576]) 83 | layer_3_4_7_up = tf.layers.conv2d(layer_3_4_7_up, num_classes, kernel_size=[15, 15], 84 | padding='same', kernel_regularizer=kernel_regularizer) 85 | 86 | return layer_3_4_7_up 87 | 88 | 89 | def optimize(net_prediction, labels, learning_rate, num_classes): 90 | """ 91 | Build the TensorFLow loss and optimizer operations. 
92 | :param net_prediction: TF Tensor of the last layer in the neural network 93 | :param labels: TF Placeholder for the correct label image 94 | :param learning_rate: TF Placeholder for the learning rate 95 | :param num_classes: Number of classes to classify 96 | :return: Tuple of (logits, train_op, cross_entropy_loss) 97 | """ 98 | 99 | # Unroll 100 | logits_flat = tf.reshape(net_prediction, (-1, num_classes)) 101 | labels_flat = tf.reshape(labels, (-1, num_classes)) 102 | 103 | # Define loss 104 | cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels_flat, logits=logits_flat)) 105 | 106 | # Define optimization step 107 | train_step = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy_loss) 108 | 109 | return logits_flat, train_step, cross_entropy_loss 110 | 111 | 112 | def train_nn(sess, training_epochs, batch_size, get_batches_fn, train_op, cross_entropy_loss, 113 | image_input, labels, keep_prob, learning_rate): 114 | """ 115 | Train neural network and print out the loss during training. 116 | :param sess: TF Session 117 | :param training_epochs: Number of epochs 118 | :param batch_size: Batch size 119 | :param get_batches_fn: Function to get batches of training data. Call using get_batches_fn(batch_size) 120 | :param train_op: TF Operation to train the neural network 121 | :param cross_entropy_loss: TF Tensor for the amount of loss 122 | :param image_input: TF Placeholder for input images 123 | :param labels: TF Placeholder for label images 124 | :param keep_prob: TF Placeholder for dropout keep probability 125 | :param learning_rate: TF Placeholder for learning rate 126 | """ 127 | 128 | # Variable initialization 129 | sess.run(tf.global_variables_initializer()) 130 | 131 | lr = args.learning_rate 132 | 133 | for e in range(0, training_epochs): 134 | 135 | loss_this_epoch = 0.0 136 | 137 | for i in range(0, args.batches_per_epoch): 138 | 139 | # Load a batch of examples 140 | batch_x, batch_y = next(get_batches_fn(batch_size)) 141 | if should_do_augmentation: 142 | batch_x, batch_y = perform_augmentation(batch_x, batch_y) 143 | 144 | _, cur_loss = sess.run(fetches=[train_op, cross_entropy_loss], 145 | feed_dict={image_input: batch_x, labels: batch_y, keep_prob: 0.25, 146 | learning_rate: lr}) 147 | 148 | loss_this_epoch += cur_loss 149 | 150 | print('Epoch: {:02d} - Loss: {:.03f}'.format(e, loss_this_epoch / args.batches_per_epoch)) 151 | 152 | 153 | def perform_tests(): 154 | tests.test_for_kitti_dataset(data_dir) 155 | tests.test_load_vgg(load_vgg, tf) 156 | tests.test_layers(layers) 157 | tests.test_optimize(optimize) 158 | tests.test_train_nn(train_nn) 159 | 160 | 161 | def run(): 162 | 163 | num_classes = 2 164 | 165 | image_h, image_w = (160, 576) 166 | 167 | with tf.Session() as sess: 168 | 169 | # Path to vgg model 170 | vgg_path = join(data_dir, 'vgg') 171 | 172 | # Create function to get batches 173 | batch_generator = gen_batch_function(join(data_dir, 'data_road/training'), (image_h, image_w)) 174 | 175 | # Load VGG pretrained 176 | image_input, keep_prob, vgg_layer3_out, vgg_layer4_out, vgg_layer7_out = load_vgg(sess, vgg_path) 177 | 178 | # Add skip connections 179 | output = layers(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes) 180 | 181 | # Define placeholders 182 | labels = tf.placeholder(tf.float32, shape=[None, image_h, image_w, num_classes]) 183 | learning_rate = tf.placeholder(tf.float32, shape=[]) 184 | 185 | logits, train_op, cross_entropy_loss = optimize(output, labels, learning_rate, 
num_classes) 186 | 187 | # Training parameters 188 | train_nn(sess, args.training_epochs, args.batch_size, batch_generator, train_op, cross_entropy_loss, 189 | image_input, labels, keep_prob, learning_rate) 190 | 191 | save_inference_samples(runs_dir, data_dir, sess, (image_h, image_w), logits, keep_prob, image_input) 192 | 193 | 194 | def parse_arguments(): 195 | """ 196 | Parse command line arguments 197 | """ 198 | parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter) 199 | parser.add_argument('--batch_size', type=int, default=8, help='Batch size used for training', metavar='') 200 | parser.add_argument('--batches_per_epoch', type=int, default=100, help='Batches each training epoch', metavar='') 201 | parser.add_argument('--training_epochs', type=int, default=30, help='Number of training epoch', metavar='') 202 | parser.add_argument('--learning_rate', type=float, default=1e-4, help='Learning rate', metavar='') 203 | parser.add_argument('--augmentation', type=bool, default=True, help='Perform augmentation in training', metavar='') 204 | parser.add_argument('--gpu', type=int, default=0, help='Which GPU to use', metavar='') 205 | return parser.parse_args() 206 | 207 | 208 | if __name__ == '__main__': 209 | 210 | data_dir = join(expanduser("~"), 'code', 'self-driving-car', 'project_12_road_segmentation', 'data') 211 | runs_dir = join(expanduser("~"), 'majinbu_home', 'road_segmentation_prediction') 212 | 213 | args = parse_arguments() 214 | 215 | # Appropriately set GPU device 216 | os.environ['CUDA_VISIBLE_DEVICES'] = str(args.gpu) 217 | print('Using GPU: {:02d}.'.format(args.gpu)) 218 | 219 | # Turn off augmentation during tests 220 | should_do_augmentation = False 221 | perform_tests() 222 | 223 | # Restore appropriate augmentation value 224 | should_do_augmentation = args.augmentation 225 | run() 226 | -------------------------------------------------------------------------------- /05_road_segmentation/project_tests.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | from copy import deepcopy 4 | from glob import glob 5 | from unittest import mock 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | 10 | 11 | def test_safe(func): 12 | """ 13 | Isolate tests 14 | """ 15 | def func_wrapper(*args): 16 | with tf.Graph().as_default(): 17 | result = func(*args) 18 | print('Tests Passed') 19 | return result 20 | 21 | return func_wrapper 22 | 23 | 24 | def _prevent_print(function, params): 25 | sys.stdout = open(os.devnull, "w") 26 | function(**params) 27 | sys.stdout = sys.__stdout__ 28 | 29 | 30 | def _assert_tensor_shape(tensor, shape, display_name): 31 | assert tf.assert_rank(tensor, len(shape), message='{} has wrong rank'.format(display_name)) 32 | 33 | tensor_shape = tensor.get_shape().as_list() if len(shape) else [] 34 | 35 | wrong_dimension = [ten_dim for ten_dim, cor_dim in zip(tensor_shape, shape) 36 | if cor_dim is not None and ten_dim != cor_dim] 37 | assert not wrong_dimension, \ 38 | '{} has wrong shape. Found {}'.format(display_name, tensor_shape) 39 | 40 | 41 | class TmpMock(object): 42 | """ 43 | Mock a attribute. Restore attribute when exiting scope. 
44 | """ 45 | def __init__(self, module, attrib_name): 46 | self.original_attrib = deepcopy(getattr(module, attrib_name)) 47 | setattr(module, attrib_name, mock.MagicMock()) 48 | self.module = module 49 | self.attrib_name = attrib_name 50 | 51 | def __enter__(self): 52 | return getattr(self.module, self.attrib_name) 53 | 54 | def __exit__(self, type, value, traceback): 55 | setattr(self.module, self.attrib_name, self.original_attrib) 56 | 57 | 58 | @test_safe 59 | def test_load_vgg(load_vgg, tf_module): 60 | with TmpMock(tf_module.saved_model.loader, 'load') as mock_load_model: 61 | vgg_path = '' 62 | sess = tf.Session() 63 | test_input_image = tf.placeholder(tf.float32, name='image_input') 64 | test_keep_prob = tf.placeholder(tf.float32, name='keep_prob') 65 | test_vgg_layer3_out = tf.placeholder(tf.float32, name='layer3_out') 66 | test_vgg_layer4_out = tf.placeholder(tf.float32, name='layer4_out') 67 | test_vgg_layer7_out = tf.placeholder(tf.float32, name='layer7_out') 68 | 69 | input_image, keep_prob, vgg_layer3_out, vgg_layer4_out, vgg_layer7_out = load_vgg(sess, vgg_path) 70 | 71 | assert mock_load_model.called, \ 72 | 'tf.saved_model.loader.load() not called' 73 | assert mock_load_model.call_args == mock.call(sess, ['vgg16'], vgg_path), \ 74 | 'tf.saved_model.loader.load() called with wrong arguments.' 75 | 76 | assert input_image == test_input_image, 'input_image is the wrong object' 77 | assert keep_prob == test_keep_prob, 'keep_prob is the wrong object' 78 | assert vgg_layer3_out == test_vgg_layer3_out, 'layer3_out is the wrong object' 79 | assert vgg_layer4_out == test_vgg_layer4_out, 'layer4_out is the wrong object' 80 | assert vgg_layer7_out == test_vgg_layer7_out, 'layer7_out is the wrong object' 81 | 82 | 83 | @test_safe 84 | def test_layers(layers): 85 | num_classes = 2 86 | vgg_layer3_out = tf.placeholder(tf.float32, [None, None, None, 256]) 87 | vgg_layer4_out = tf.placeholder(tf.float32, [None, None, None, 512]) 88 | vgg_layer7_out = tf.placeholder(tf.float32, [None, None, None, 4096]) 89 | layers_output = layers(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes) 90 | 91 | _assert_tensor_shape(layers_output, [None, None, None, num_classes], 'Layers Output') 92 | 93 | 94 | @test_safe 95 | def test_optimize(optimize): 96 | num_classes = 2 97 | shape = [2, 3, 4, num_classes] 98 | layers_output = tf.Variable(tf.zeros(shape)) 99 | correct_label = tf.placeholder(tf.float32, [None, None, None, num_classes]) 100 | learning_rate = tf.placeholder(tf.float32) 101 | logits, train_op, cross_entropy_loss = optimize(layers_output, correct_label, learning_rate, num_classes) 102 | 103 | _assert_tensor_shape(logits, [2*3*4, num_classes], 'Logits') 104 | 105 | with tf.Session() as sess: 106 | sess.run(tf.global_variables_initializer()) 107 | sess.run([train_op], {correct_label: np.arange(np.prod(shape)).reshape(shape), learning_rate: 10}) 108 | test, loss = sess.run([layers_output, cross_entropy_loss], {correct_label: np.arange(np.prod(shape)).reshape(shape)}) 109 | 110 | assert test.min() != 0 or test.max() != 0, 'Training operation not changing weights.' 
111 | 112 | 113 | @test_safe 114 | def test_train_nn(train_nn): 115 | epochs = 1 116 | batch_size = 2 117 | 118 | def get_batches_fn(batch_size_parm): 119 | shape = [batch_size_parm, 2, 3, 3] 120 | yield np.arange(np.prod(shape)).reshape(shape) 121 | 122 | train_op = tf.constant(0) 123 | cross_entropy_loss = tf.constant(10.11) 124 | input_image = tf.placeholder(tf.float32, name='input_image') 125 | correct_label = tf.placeholder(tf.float32, name='correct_label') 126 | keep_prob = tf.placeholder(tf.float32, name='keep_prob') 127 | learning_rate = tf.placeholder(tf.float32, name='learning_rate') 128 | with tf.Session() as sess: 129 | parameters = { 130 | 'sess': sess, 131 | 'training_epochs': epochs, 132 | 'batch_size': batch_size, 133 | 'get_batches_fn': get_batches_fn, 134 | 'train_op': train_op, 135 | 'cross_entropy_loss': cross_entropy_loss, 136 | 'image_input': input_image, 137 | 'labels': correct_label, 138 | 'keep_prob': keep_prob, 139 | 'learning_rate': learning_rate} 140 | _prevent_print(train_nn, parameters) 141 | 142 | 143 | @test_safe 144 | def test_for_kitti_dataset(data_dir): 145 | kitti_dataset_path = os.path.join(data_dir, 'data_road') 146 | training_labels_count = len(glob(os.path.join(kitti_dataset_path, 'training/gt_image_2/*_road_*.png'))) 147 | training_images_count = len(glob(os.path.join(kitti_dataset_path, 'training/image_2/*.png'))) 148 | testing_images_count = len(glob(os.path.join(kitti_dataset_path, 'testing/image_2/*.png'))) 149 | 150 | assert not (training_images_count == training_labels_count == testing_images_count == 0),\ 151 | 'Kitti dataset not found. Extract Kitti dataset in {}'.format(kitti_dataset_path) 152 | assert training_images_count == 289, 'Expected 289 training images, found {} images.'.format(training_images_count) 153 | assert training_labels_count == 289, 'Expected 289 training labels, found {} labels.'.format(training_labels_count) 154 | assert testing_images_count == 290, 'Expected 290 testing images, found {} images.'.format(testing_images_count) 155 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Fei Ding 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Deep Learning using Python/C++/OpenCV 3 | 4 | --- 5 | 6 | ## Basics 7 | 8 | - [Deep Learning Basics](./resources/deep-learning.md) 9 | - [Components of Autonomous Driving System](./resources/autonomous-driving.md) 10 | - [Datasets](./resources/datasets.md) 11 | - [Train your own object detector with Faster-RCNN & PyTorch](./faster-rcnn-tutorial) 12 | 13 | 14 | ## Computer Vision and Deep Learning 15 | 16 | #### [P1 - Detecting Lane Lines](./01_finding_lane_lines) 17 | - **Basic:** Detected highway lane lines on a video stream. Used OpenCV image analysis techniques to identify lines, including Hough transforms and Canny edge detection. 18 | - **Keywords:** Computer Vision, OpenCV 19 | 20 | #### [P2 - Traffic Sign Classification](./02_traffic_sign_detector) 21 | - **Summary:** Built and trained a support vector machine (SVM) to classify traffic signs, using [dlib](http://dlib.net/). Google Street View images can be used to train the detectors. 25~40 images are sufficient to train a good detector. 22 | - **Keywords:** Computer Vision, Machine Learning 23 | 24 | #### [P3 - Object Detection with OpenCV](./03_opencv_detection) 25 | - **Summary:** OpenCV's DNN API (for C++ and Python) is very easy to use: just load the network and run it. Multiple inputs/outputs are supported. Here are the examples: https://github.com/opencv/opencv/tree/master/samples/dnn. 26 | 27 | #### [P4 - Vehicle Detection and Tracking](./04_vehicle_detection) 28 | - **Summary:** Created a vehicle detection and tracking pipeline with OpenCV, histogram of oriented gradients (HOG), and support vector machines (SVM). Implemented the same pipeline using a deep network to perform detection. Optimized and evaluated the model on video data from an automotive camera taken during highway driving. 29 | - **Keywords:** Computer Vision, Deep Learning, OpenCV 30 | 31 | #### [P5 - Road Segmentation](./05_road_segmentation) 32 | - **Summary:** Implemented road segmentation using a fully convolutional network. 33 | - **Keywords:** Deep Learning, Semantic Segmentation 34 | 35 | 36 | ## References 37 | 38 | - 39 | - 40 | 41 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/README.md: -------------------------------------------------------------------------------- 1 | # Train your own object detector with Faster-RCNN & PyTorch 2 | 3 | This repository contains all files that were used for the blog tutorial 4 | [**Train your own object detector with Faster-RCNN & PyTorch**](https://github.com/ifding/faster-rcnn-tutorial). 5 | 6 | - If you want to use Neptune for your own experiments, add the 'NEPTUNE' env var to your system. For example, I use `dotenv`: 7 | 8 | `$ dotenv set NEPTUNE your_key` 9 | 10 | This creates `.env` in the current directory, and `.env` is already listed in `.gitignore`. After that, calling `load_dotenv()` in your code will automatically pick up the 'NEPTUNE' env var from `.env`.
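  A minimal sketch of reading the key back at training time (assuming the `python-dotenv` package, which provides the `dotenv` CLI and `load_dotenv()`; how you hand the key to your experiment logger is up to you):

```python
import os

from dotenv import load_dotenv  # from the python-dotenv package

load_dotenv()                    # loads key/value pairs from .env into the process environment
api_key = os.getenv('NEPTUNE')   # the key stored via `dotenv set NEPTUNE your_key`

# Pass `api_key` to whatever experiment logger you use (e.g. a Neptune logger);
# keeping the key in .env avoids hard-coding credentials in the training script.
```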
11 | 12 | - Just focus on modifying `custom_dataset.py` file for your own data 13 | - Now compatible with pytorch 1.9 and pytorch lighting 1.37 14 | 15 | 16 | ## Installation steps: 17 | 18 | - `conda create -n ` 19 | - `conda activate ` 20 | - `conda install python=3.8` 21 | - `git clone https://github.com/ifding/faster-rcnn-tutorial.git` 22 | - `cd faster-rcnn-tutorial` 23 | - `pip install .` 24 | - You have to install a pytorch version with `pip` or `conda` that meets the requirements of your hardware. 25 | Otherwise the versions for torch etc. specified in [setup.py](setup.py) are installed. 26 | To install the correct pytorch version for your hardware, check [pytorch.org](https://pytorch.org/). 27 | - [OPTIONAL] To check whether pytorch uses the nvidia gpu, check if `torch.cuda.is_available()` returns `True` in a python shell. 28 | 29 | ## Custom dataset: balloon 30 | 31 | - `sh download_dataset.sh` 32 | - `CUDA_VISIBLE_DEVICES=0 python train.py` 33 | - check the checkpoint in folder `balloon` and experiment details in 34 | 35 | ![](dataset/result.png) 36 | 37 | ## Acknowledge 38 | 39 | Most code is borrowed from , thanks, Johannes Schmidt! 40 | 41 | I try to reduce less important compontents, and make the whole pipeline more clear. 42 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/custom_dataset.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pathlib 3 | from multiprocessing import Pool 4 | from typing import List, Dict 5 | import numpy as np 6 | import json 7 | import cv2 8 | 9 | import torch 10 | from skimage.color import rgba2rgb 11 | from skimage.io import imread 12 | from torchvision.ops import box_convert 13 | from detection.transformations import ComposeDouble, ComposeSingle, map_class_to_int 14 | from detection.utils import read_json 15 | 16 | 17 | # https://github.com/TannerGilbert/Object-Detection-and-Image-Segmentation-with-Detectron2 18 | def get_balloon_dicts(img_dir): 19 | json_file = os.path.join(img_dir, "via_region_data.json") 20 | with open(json_file) as f: 21 | imgs_anns = json.load(f) 22 | 23 | dataset_dicts = [] 24 | for idx, v in enumerate(imgs_anns.values()): 25 | record = {} 26 | 27 | filename = os.path.join(img_dir, v["filename"]) 28 | height, width = cv2.imread(filename).shape[:2] 29 | 30 | record["file_name"] = filename 31 | 32 | annos = v["regions"] 33 | boxes, labels = [], [] 34 | for _, anno in annos.items(): 35 | assert not anno["region_attributes"] 36 | anno = anno["shape_attributes"] 37 | px = anno["all_points_x"] 38 | py = anno["all_points_y"] 39 | 40 | boxes.append([np.min(px), np.min(py), np.max(px), np.max(py)]) 41 | labels.append('balloon') 42 | record["annotations"] = {'boxes': boxes, 'labels': labels} 43 | dataset_dicts.append(record) 44 | return dataset_dicts 45 | 46 | 47 | class ObjectDetectionDataSet(torch.utils.data.Dataset): 48 | """ 49 | Builds a dataset with images and their respective targets. 50 | A target is expected to be a json file 51 | and should contain at least a 'boxes' and a 'labels' key. 52 | inputs and targets are expected to be a list of pathlib.Path objects. 53 | 54 | In case your labels are strings, you can use mapping (a dict) to int-encode them. 
55 | Returns a dict with the following keys: 'x', 'x_name', 'y', 'y_name' 56 | """ 57 | 58 | def __init__(self, 59 | data_path: str, 60 | transform: ComposeDouble = None, 61 | mapping: Dict = None 62 | ): 63 | self.data_path = data_path 64 | self.dataset_dict = get_balloon_dicts(data_path) 65 | self.transform = transform 66 | self.mapping = mapping 67 | 68 | def __len__(self): 69 | return len(self.dataset_dict) 70 | 71 | def __getitem__(self, 72 | index: int): 73 | record = self.dataset_dict[index] 74 | 75 | # Load input 76 | x = imread(record["file_name"]) 77 | img_name = os.path.basename(record["file_name"]) 78 | 79 | # From RGBA to RGB 80 | if x.shape[-1] == 4: 81 | x = rgba2rgb(x) 82 | 83 | # Label Mapping 84 | y = record["annotations"] 85 | if self.mapping: 86 | labels = map_class_to_int(y['labels'], mapping=self.mapping) 87 | else: 88 | labels = y['labels'] 89 | 90 | # Create target, should be converted to np.ndarrays 91 | target = {'boxes': np.array(y['boxes']), 92 | 'labels': np.array(labels)} 93 | 94 | if self.transform is not None: 95 | x, target = self.transform(x, target) # returns np.ndarrays 96 | 97 | # Typecasting 98 | x = torch.from_numpy(x).type(torch.float32) 99 | target = {key: torch.from_numpy(value).type(torch.int64) for key, value in target.items()} 100 | 101 | return {'x': x, 'y': target, 'x_name': img_name, 'y_name': img_name} 102 | 103 | 104 | 105 | class ObjectDetectionDatasetSingle(torch.utils.data.Dataset): 106 | """ 107 | Builds a dataset with images. 108 | inputs is expected to be a list of pathlib.Path objects. 109 | 110 | Returns a dict with the following keys: 'x', 'x_name' 111 | """ 112 | 113 | def __init__(self, 114 | inputs: List[pathlib.Path], 115 | transform: ComposeSingle = None, 116 | use_cache: bool = False, 117 | ): 118 | self.inputs = inputs 119 | self.transform = transform 120 | self.use_cache = use_cache 121 | 122 | if self.use_cache: 123 | # Use multiprocessing to load images and targets into RAM 124 | with Pool() as pool: 125 | self.cached_data = pool.starmap(self.read_images, inputs) 126 | 127 | def __len__(self): 128 | return len(self.inputs) 129 | 130 | def __getitem__(self, 131 | index: int): 132 | if self.use_cache: 133 | x = self.cached_data[index] 134 | else: 135 | # Select the sample 136 | input_ID = self.inputs[index] 137 | 138 | # Load input and target 139 | x = self.read_images(input_ID) 140 | 141 | # From RGBA to RGB 142 | if x.shape[-1] == 4: 143 | x = rgba2rgb(x) 144 | 145 | # Preprocessing 146 | if self.transform is not None: 147 | x = self.transform(x) # returns a np.ndarray 148 | 149 | # Typecasting 150 | x = torch.from_numpy(x).type(torch.float32) 151 | 152 | return {'x': x, 'x_name': self.inputs[index].name} 153 | 154 | @staticmethod 155 | def read_images(inp): 156 | return imread(inp) 157 | 158 | class ObjectDetectionDataSetDouble(torch.utils.data.Dataset): 159 | """ 160 | Builds a dataset with images and their respective targets. 161 | A target is expected to be a json file 162 | and should contain at least a 'boxes' and a 'labels' key. 163 | inputs and targets are expected to be a list of pathlib.Path objects. 164 | In case your labels are strings, you can use mapping (a dict) to int-encode them. 
165 | Returns a dict with the following keys: 'x', 'x_name', 'y', 'y_name' 166 | """ 167 | 168 | def __init__(self, 169 | inputs: List[pathlib.Path], 170 | targets: List[pathlib.Path], 171 | transform: ComposeDouble = None, 172 | use_cache: bool = False, 173 | convert_to_format: str = None, 174 | mapping: Dict = None 175 | ): 176 | self.inputs = inputs 177 | self.targets = targets 178 | self.transform = transform 179 | self.use_cache = use_cache 180 | self.convert_to_format = convert_to_format 181 | self.mapping = mapping 182 | 183 | if self.use_cache: 184 | # Use multiprocessing to load images and targets into RAM 185 | with Pool() as pool: 186 | self.cached_data = pool.starmap(self.read_images, zip(inputs, targets)) 187 | 188 | def __len__(self): 189 | return len(self.inputs) 190 | 191 | def __getitem__(self, 192 | index: int): 193 | if self.use_cache: 194 | x, y = self.cached_data[index] 195 | else: 196 | # Select the sample 197 | input_ID = self.inputs[index] 198 | target_ID = self.targets[index] 199 | 200 | # Load input and target 201 | x, y = self.read_images(input_ID, target_ID) 202 | 203 | # From RGBA to RGB 204 | if x.shape[-1] == 4: 205 | x = rgba2rgb(x) 206 | 207 | # Read boxes 208 | try: 209 | boxes = torch.from_numpy(y['boxes']).to(torch.float32) 210 | except TypeError: 211 | boxes = torch.tensor(y['boxes']).to(torch.float32) 212 | 213 | # Read scores 214 | if 'scores' in y.keys(): 215 | try: 216 | scores = torch.from_numpy(y['scores']).to(torch.float32) 217 | except TypeError: 218 | scores = torch.tensor(y['scores']).to(torch.float32) 219 | 220 | # Label Mapping 221 | if self.mapping: 222 | labels = map_class_to_int(y['labels'], mapping=self.mapping) 223 | else: 224 | labels = y['labels'] 225 | 226 | # Read labels 227 | try: 228 | labels = torch.from_numpy(labels).to(torch.int64) 229 | except TypeError: 230 | labels = torch.tensor(labels).to(torch.int64) 231 | 232 | 233 | # Create target 234 | target = {'boxes': boxes, 235 | 'labels': labels} 236 | 237 | if 'scores' in y.keys(): 238 | target['scores'] = scores 239 | 240 | # Preprocessing 241 | target = {key: value.numpy() for key, value in target.items()} # all tensors should be converted to np.ndarrays 242 | 243 | if self.transform is not None: 244 | x, target = self.transform(x, target) # returns np.ndarrays 245 | 246 | # Typecasting 247 | x = torch.from_numpy(x).type(torch.float32) 248 | target = {key: torch.from_numpy(value).type(torch.int64) for key, value in target.items()} 249 | 250 | return {'x': x, 'y': target, 'x_name': self.inputs[index].name, 'y_name': self.targets[index].name} 251 | 252 | @staticmethod 253 | def read_images(inp, tar): 254 | return imread(inp), read_json(tar) -------------------------------------------------------------------------------- /faster-rcnn-tutorial/dataset/README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Dataset 4 | 5 | Before we can start training our model we need to download some dataset. In this case we will use a dataset with balloon images. 
6 | 7 | 8 | 9 | 10 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/dataset/download_dataset.sh: -------------------------------------------------------------------------------- 1 | # fetch balloon images 2 | 3 | wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip 4 | unzip balloon_dataset.zip > /dev/null 5 | rm balloon_dataset.zip 6 | rm -fr __MACOSX 7 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/dataset/result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/faster-rcnn-tutorial/dataset/result.png -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/faster-rcnn-tutorial/detection/__init__.py -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/anchor_generator.py: -------------------------------------------------------------------------------- 1 | from typing import Tuple 2 | 3 | import torch 4 | from torch import nn 5 | from torch.jit.annotations import List, Optional, Dict 6 | from torchvision.models.detection.image_list import ImageList 7 | from torchvision.models.detection.transform import GeneralizedRCNNTransform 8 | 9 | 10 | class AnchorGenerator(nn.Module): 11 | # Slightly adapted AnchorGenerator from torchvision. 12 | # It returns anchors_over_all_feature_maps instead of anchors (concatenated for every feature layer) 13 | 14 | """ 15 | Module that generates anchors for a set of feature maps and 16 | image sizes. 17 | 18 | The module support computing anchors at multiple sizes and aspect ratios 19 | per feature map. This module assumes aspect ratio = height / width for 20 | each anchor. 21 | 22 | sizes and aspect_ratios should have the same number of elements, and it should 23 | correspond to the number of feature maps. 24 | 25 | sizes[i] and aspect_ratios[i] can have an arbitrary number of elements, 26 | and AnchorGenerator will output a set of sizes[i] * aspect_ratios[i] anchors 27 | per spatial location for feature map i. 
28 | 29 | Arguments: 30 | sizes (Tuple[Tuple[int]]): 31 | aspect_ratios (Tuple[Tuple[float]]): 32 | """ 33 | 34 | __annotations__ = { 35 | "cell_anchors": Optional[List[torch.Tensor]], 36 | "_cache": Dict[str, List[torch.Tensor]] 37 | } 38 | 39 | def __init__( 40 | self, 41 | sizes=((128, 256, 512),), 42 | aspect_ratios=((0.5, 1.0, 2.0),), 43 | ): 44 | super(AnchorGenerator, self).__init__() 45 | 46 | if not isinstance(sizes[0], (list, tuple)): 47 | sizes = tuple((s,) for s in sizes) 48 | if not isinstance(aspect_ratios[0], (list, tuple)): 49 | aspect_ratios = (aspect_ratios,) * len(sizes) 50 | 51 | assert len(sizes) == len(aspect_ratios) 52 | 53 | self.sizes = sizes 54 | self.aspect_ratios = aspect_ratios 55 | self.cell_anchors = None 56 | self._cache = {} 57 | 58 | def generate_anchors(self, scales, aspect_ratios, dtype=torch.float32, device="cpu"): 59 | # type: (List[int], List[float], int, Device) -> Tensor # noqa: F821 60 | scales = torch.as_tensor(scales, dtype=dtype, device=device) 61 | aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device) 62 | h_ratios = torch.sqrt(aspect_ratios) 63 | w_ratios = 1 / h_ratios 64 | 65 | ws = (w_ratios[:, None] * scales[None, :]).view(-1) 66 | hs = (h_ratios[:, None] * scales[None, :]).view(-1) 67 | 68 | base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2 69 | return base_anchors.round() 70 | 71 | def set_cell_anchors(self, dtype, device): 72 | # type: (int, Device) -> None # noqa: F821 73 | if self.cell_anchors is not None: 74 | cell_anchors = self.cell_anchors 75 | assert cell_anchors is not None 76 | # suppose that all anchors have the same device 77 | # which is a valid assumption in the current state of the codebase 78 | if cell_anchors[0].device == device: 79 | return 80 | 81 | cell_anchors = [ 82 | self.generate_anchors( 83 | sizes, 84 | aspect_ratios, 85 | dtype, 86 | device 87 | ) 88 | for sizes, aspect_ratios in zip(self.sizes, self.aspect_ratios) 89 | ] 90 | self.cell_anchors = cell_anchors 91 | 92 | def num_anchors_per_location(self): 93 | return [len(s) * len(a) for s, a in zip(self.sizes, self.aspect_ratios)] 94 | 95 | # For every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:2), 96 | # output g[i] anchors that are s[i] distance apart in direction i, with the same dimensions as a. 97 | def grid_anchors(self, grid_sizes, strides): 98 | # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor] 99 | anchors = [] 100 | cell_anchors = self.cell_anchors 101 | assert cell_anchors is not None 102 | assert len(grid_sizes) == len(strides) == len(cell_anchors) 103 | 104 | for size, stride, base_anchors in zip( 105 | grid_sizes, strides, cell_anchors 106 | ): 107 | grid_height, grid_width = size 108 | stride_height, stride_width = stride 109 | device = base_anchors.device 110 | 111 | # For output anchor, compute [x_center, y_center, x_center, y_center] 112 | shifts_x = torch.arange( 113 | 0, grid_width, dtype=torch.float32, device=device 114 | ) * stride_width 115 | shifts_y = torch.arange( 116 | 0, grid_height, dtype=torch.float32, device=device 117 | ) * stride_height 118 | shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x) 119 | shift_x = shift_x.reshape(-1) 120 | shift_y = shift_y.reshape(-1) 121 | shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1) 122 | 123 | # For every (base anchor, output anchor) pair, 124 | # offset each zero-centered base anchor by the center of the output anchor. 
125 | anchors.append( 126 | (shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4) 127 | ) 128 | 129 | return anchors 130 | 131 | def cached_grid_anchors(self, grid_sizes, strides): 132 | # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor] 133 | key = str(grid_sizes) + str(strides) 134 | if key in self._cache: 135 | return self._cache[key] 136 | anchors = self.grid_anchors(grid_sizes, strides) 137 | self._cache[key] = anchors 138 | return anchors 139 | 140 | def forward(self, image_list, feature_maps): 141 | # type: (ImageList, List[Tensor]) -> List[Tensor] 142 | grid_sizes = list([feature_map.shape[-2:] for feature_map in feature_maps]) 143 | image_size = image_list.tensors.shape[-2:] 144 | dtype, device = feature_maps[0].dtype, feature_maps[0].device 145 | strides = [[torch.tensor(image_size[0] // g[0], dtype=torch.int64, device=device), 146 | torch.tensor(image_size[1] // g[1], dtype=torch.int64, device=device)] for g in grid_sizes] 147 | self.set_cell_anchors(dtype, device) 148 | anchors_over_all_feature_maps = self.cached_grid_anchors(grid_sizes, strides) 149 | self._cache.clear() 150 | return anchors_over_all_feature_maps 151 | 152 | 153 | def get_anchor_boxes(image: torch.tensor, 154 | rcnn_transform: GeneralizedRCNNTransform, 155 | feature_map_size: tuple, 156 | anchor_size: Tuple[tuple] = ((128, 256, 512),), 157 | aspect_ratios: Tuple[tuple] = ((1.0,),), 158 | ): 159 | """ 160 | Returns the anchors for a given image and feature map. 161 | image should be a torch.tensor with shape [C, H, W]. 162 | feature_map_size should be a tuple with shape (C, H, W]). 163 | Only one feature map supported at the moment. 164 | 165 | Example: 166 | 167 | from torchvision.models.detection.transform import GeneralizedRCNNTransform 168 | 169 | transform = GeneralizedRCNNTransform(min_size=1024, 170 | max_size=1024, 171 | image_mean=[0.485, 0.456, 0.406], 172 | image_std=[0.229, 0.224, 0.225]) 173 | 174 | image = dataset[0]['x'] # ObjectDetectionDataSet 175 | 176 | anchors = get_anchor_boxes(image, 177 | transform, 178 | feature_map_size=(512, 16, 16), 179 | anchor_size=((128, 256, 512),), 180 | aspect_ratios=((1.0, 2.0),) 181 | ) 182 | """ 183 | 184 | image_transformed = rcnn_transform([image]) 185 | 186 | features = [torch.rand(size=feature_map_size)] 187 | 188 | anchor_gen = AnchorGenerator(anchor_size, aspect_ratios) 189 | anchors = anchor_gen(image_list=image_transformed[0], feature_maps=features) 190 | 191 | return anchors[0] 192 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/backbone_resnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torchvision.models as models 3 | from torch import nn 4 | from torchvision.models import resnet 5 | from torchvision.models._utils import IntermediateLayerGetter 6 | from torchvision.ops import misc as misc_nn_ops 7 | from torchvision.ops.feature_pyramid_network import FeaturePyramidNetwork 8 | 9 | 10 | def get_resnet_backbone(backbone_name: str): 11 | """ 12 | Returns a resnet backbone pretrained on ImageNet. 13 | Removes the average-pooling layer and the linear layer at the end. 
14 | """ 15 | if backbone_name == 'resnet18': 16 | pretrained_model = models.resnet18(pretrained=True, progress=False) 17 | out_channels = 512 18 | elif backbone_name == 'resnet34': 19 | pretrained_model = models.resnet34(pretrained=True, progress=False) 20 | out_channels = 512 21 | elif backbone_name == 'resnet50': 22 | pretrained_model = models.resnet50(pretrained=True, progress=False) 23 | out_channels = 2048 24 | elif backbone_name == 'resnet101': 25 | pretrained_model = models.resnet101(pretrained=True, progress=False) 26 | out_channels = 2048 27 | elif backbone_name == 'resnet152': 28 | pretrained_model = models.resnet152(pretrained=True, progress=False) 29 | out_channels = 2048 30 | 31 | backbone = torch.nn.Sequential(*list(pretrained_model.children())[:-2]) 32 | backbone.out_channels = out_channels 33 | 34 | return backbone 35 | 36 | 37 | def get_resnet_fpn_backbone(backbone_name: str, pretrained: bool = True, trainable_layers: int = 5): 38 | """ 39 | Returns a resnet backbone with fpn pretrained on ImageNet. 40 | """ 41 | backbone = resnet_fpn_backbone(backbone_name=backbone_name, 42 | pretrained=pretrained, 43 | trainable_layers=trainable_layers) 44 | 45 | backbone.out_channels = 256 46 | return backbone 47 | 48 | 49 | def resnet_fpn_backbone(backbone_name: str, 50 | pretrained: bool, 51 | norm_layer=misc_nn_ops.FrozenBatchNorm2d, 52 | trainable_layers: int = 3, 53 | returned_layers=None, 54 | extra_blocks=None 55 | ): 56 | # Slight adaptation from the original pytorch vision package 57 | # Changes: Removed extra_blocks parameter - This parameter invokes LastLevelMaxPool(), which I don't need 58 | """ 59 | Constructs a specified ResNet backbone with FPN on top. Freezes the specified number of layers in the backbone. 60 | 61 | Arguments: 62 | backbone_name (string): resnet architecture. Possible values are 'ResNet', 'resnet18', 'resnet34', 'resnet50', 63 | 'resnet101', 'resnet152', 'resnext50_32x4d', 'resnext101_32x8d', 'wide_resnet50_2', 'wide_resnet101_2' 64 | norm_layer (torchvision.ops): it is recommended to use the default value. For details visit: 65 | (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267) 66 | pretrained (bool): If True, returns a model with backbone pre-trained on Imagenet 67 | trainable_layers (int): number of trainable (not frozen) resnet layers starting from final block. 68 | Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. 
69 | """ 70 | backbone = resnet.__dict__[backbone_name]( 71 | pretrained=pretrained, 72 | norm_layer=norm_layer) 73 | 74 | # select layers that wont be frozen 75 | assert trainable_layers <= 5 and trainable_layers >= 0 76 | layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers] 77 | # freeze layers only if pretrained backbone is used 78 | for name, parameter in backbone.named_parameters(): 79 | if all([not name.startswith(layer) for layer in layers_to_train]): 80 | parameter.requires_grad_(False) 81 | 82 | if returned_layers is None: 83 | returned_layers = [1, 2, 3, 4] 84 | assert min(returned_layers) > 0 and max(returned_layers) < 5 85 | return_layers = {f'layer{k}': str(v) for v, k in enumerate(returned_layers)} 86 | 87 | in_channels_stage2 = backbone.inplanes // 8 88 | in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers] 89 | out_channels = 256 90 | return BackboneWithFPN(backbone, return_layers, in_channels_list, out_channels, extra_blocks=extra_blocks) 91 | 92 | 93 | class BackboneWithFPN(nn.Module): 94 | """ 95 | Adds a FPN on top of a model. 96 | Internally, it uses torchvision.models._utils.IntermediateLayerGetter to 97 | extract a submodel that returns the feature maps specified in return_layers. 98 | The same limitations of IntermediatLayerGetter apply here. 99 | Arguments: 100 | backbone (nn.Module) 101 | return_layers (Dict[name, new_name]): a dict containing the names 102 | of the modules for which the activations will be returned as 103 | the key of the dict, and the value of the dict is the name 104 | of the returned activation (which the user can specify). 105 | in_channels_list (List[int]): number of channels for each feature map 106 | that is returned, in the order they are present in the OrderedDict 107 | out_channels (int): number of channels in the FPN. 
108 | Attributes: 109 | out_channels (int): the number of channels in the FPN 110 | """ 111 | 112 | def __init__(self, backbone, return_layers, in_channels_list, out_channels, extra_blocks=None): 113 | super(BackboneWithFPN, self).__init__() 114 | 115 | self.body = IntermediateLayerGetter(backbone, return_layers=return_layers) 116 | self.fpn = FeaturePyramidNetwork( 117 | in_channels_list=in_channels_list, 118 | out_channels=out_channels, 119 | extra_blocks=extra_blocks, 120 | ) 121 | self.out_channels = out_channels 122 | 123 | def forward(self, x): 124 | x = self.body(x) 125 | x = self.fpn(x) 126 | return x 127 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/faster_RCNN.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | from itertools import chain 3 | from typing import Tuple, List 4 | 5 | import pytorch_lightning as pl 6 | import torch 7 | from torchvision.models.detection.faster_rcnn import FasterRCNN 8 | from torchvision.models.detection.rpn import AnchorGenerator 9 | from torchvision.ops import MultiScaleRoIAlign 10 | 11 | from metrics.enumerators import MethodAveragePrecision 12 | from metrics.pascal_voc_evaluator import get_pascalvoc_metrics 13 | from .backbone_resnet import get_resnet_backbone, get_resnet_fpn_backbone 14 | from .utils import from_dict_to_boundingbox 15 | 16 | 17 | def get_anchor_generator(anchor_size: Tuple[tuple] = None, aspect_ratios: Tuple[tuple] = None): 18 | """Returns the anchor generator.""" 19 | if anchor_size is None: 20 | anchor_size = ((16,), (32,), (64,), (128,)) 21 | if aspect_ratios is None: 22 | aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_size) 23 | 24 | anchor_generator = AnchorGenerator(sizes=anchor_size, 25 | aspect_ratios=aspect_ratios) 26 | return anchor_generator 27 | 28 | 29 | def get_roi_pool(featmap_names: List[str] = None, output_size: int = 7, sampling_ratio: int = 2): 30 | """Returns the ROI Pooling""" 31 | if featmap_names is None: 32 | # default for resnet with FPN 33 | featmap_names = ['0', '1', '2', '3'] 34 | 35 | roi_pooler = MultiScaleRoIAlign(featmap_names=featmap_names, 36 | output_size=output_size, 37 | sampling_ratio=sampling_ratio) 38 | 39 | return roi_pooler 40 | 41 | 42 | def get_fasterRCNN(backbone: torch.nn.Module, 43 | anchor_generator: AnchorGenerator, 44 | roi_pooler: MultiScaleRoIAlign, 45 | num_classes: int, 46 | image_mean: List[float] = [0.485, 0.456, 0.406], 47 | image_std: List[float] = [0.229, 0.224, 0.225], 48 | min_size: int = 512, 49 | max_size: int = 1024, 50 | **kwargs 51 | ): 52 | """Returns the Faster-RCNN model. 
Default normalization: ImageNet""" 53 | model = FasterRCNN(backbone=backbone, 54 | rpn_anchor_generator=anchor_generator, 55 | box_roi_pool=roi_pooler, 56 | num_classes=num_classes, 57 | image_mean=image_mean, # ImageNet 58 | image_std=image_std, # ImageNet 59 | min_size=min_size, 60 | max_size=max_size, 61 | **kwargs 62 | ) 63 | model.num_classes = num_classes 64 | model.image_mean = image_mean 65 | model.image_std = image_std 66 | model.min_size = min_size 67 | model.max_size = max_size 68 | 69 | return model 70 | 71 | 72 | def get_fasterRCNN_resnet(num_classes: int, 73 | backbone_name: str, 74 | anchor_size: List[float], 75 | aspect_ratios: List[float], 76 | fpn: bool = True, 77 | min_size: int = 512, 78 | max_size: int = 1024, 79 | **kwargs 80 | ): 81 | """Returns the Faster-RCNN model with resnet backbone with and without fpn.""" 82 | 83 | # Backbone 84 | if fpn: 85 | backbone = get_resnet_fpn_backbone(backbone_name=backbone_name) 86 | else: 87 | backbone = get_resnet_backbone(backbone_name=backbone_name) 88 | 89 | # Anchors 90 | anchor_size = anchor_size 91 | aspect_ratios = aspect_ratios * len(anchor_size) 92 | anchor_generator = get_anchor_generator(anchor_size=anchor_size, aspect_ratios=aspect_ratios) 93 | 94 | # ROI Pool 95 | with torch.no_grad(): 96 | backbone.eval() 97 | random_input = torch.rand(size=(1, 3, 512, 512)) 98 | features = backbone(random_input) 99 | 100 | if isinstance(features, torch.Tensor): 101 | 102 | features = OrderedDict([('0', features)]) 103 | 104 | featmap_names = [key for key in features.keys() if key.isnumeric()] 105 | 106 | roi_pool = get_roi_pool(featmap_names=featmap_names) 107 | 108 | # Model 109 | return get_fasterRCNN(backbone=backbone, 110 | anchor_generator=anchor_generator, 111 | roi_pooler=roi_pool, 112 | num_classes=num_classes, 113 | min_size=min_size, 114 | max_size=max_size, 115 | **kwargs) 116 | 117 | 118 | class FasterRCNN_lightning(pl.LightningModule): 119 | def __init__(self, 120 | model: torch.nn.Module, 121 | lr: float = 0.0001, 122 | iou_threshold: float = 0.5 123 | ): 124 | super().__init__() 125 | 126 | # Model 127 | self.model = model 128 | 129 | # Classes (background inclusive) 130 | self.num_classes = self.model.num_classes 131 | 132 | # Learning rate 133 | self.lr = lr 134 | 135 | # IoU threshold 136 | self.iou_threshold = iou_threshold 137 | 138 | # Transformation parameters 139 | self.mean = model.image_mean 140 | self.std = model.image_std 141 | self.min_size = model.min_size 142 | self.max_size = model.max_size 143 | 144 | # Save hyperparameters 145 | self.save_hyperparameters() 146 | 147 | def forward(self, x): 148 | self.model.eval() 149 | return self.model(x) 150 | 151 | def training_step(self, batch, batch_idx): 152 | # Batch 153 | x, y, x_name, y_name = batch # tuple unpacking 154 | 155 | loss_dict = self.model(x, y) 156 | loss = sum(loss for loss in loss_dict.values()) 157 | 158 | self.log_dict(loss_dict) 159 | return loss 160 | 161 | def validation_step(self, batch, batch_idx): 162 | # Batch 163 | x, y, x_name, y_name = batch 164 | 165 | # Inference 166 | preds = self.model(x) 167 | 168 | gt_boxes = [from_dict_to_boundingbox(target, name=name, groundtruth=True) for target, name in zip(y, x_name)] 169 | gt_boxes = list(chain(*gt_boxes)) 170 | 171 | pred_boxes = [from_dict_to_boundingbox(pred, name=name, groundtruth=False) for pred, name in zip(preds, x_name)] 172 | pred_boxes = list(chain(*pred_boxes)) 173 | 174 | return {'pred_boxes': pred_boxes, 'gt_boxes': gt_boxes} 175 | 176 | def validation_epoch_end(self, 
outs): 177 | gt_boxes = [out['gt_boxes'] for out in outs] 178 | gt_boxes = list(chain(*gt_boxes)) 179 | pred_boxes = [out['pred_boxes'] for out in outs] 180 | pred_boxes = list(chain(*pred_boxes)) 181 | 182 | metric = get_pascalvoc_metrics(gt_boxes=gt_boxes, 183 | det_boxes=pred_boxes, 184 | iou_threshold=self.iou_threshold, 185 | method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 186 | generate_table=True) 187 | 188 | per_class, mAP = metric['per_class'], metric['mAP'] 189 | self.log('Validation_mAP', mAP) 190 | 191 | for key, value in per_class.items(): 192 | self.log(f'Validation_AP_{key}', value['AP']) 193 | 194 | def test_step(self, batch, batch_idx): 195 | # Batch 196 | x, y, x_name, y_name = batch 197 | 198 | # Inference 199 | preds = self.model(x) 200 | 201 | gt_boxes = [from_dict_to_boundingbox(target, name=name, groundtruth=True) for target, name in zip(y, x_name)] 202 | gt_boxes = list(chain(*gt_boxes)) 203 | 204 | pred_boxes = [from_dict_to_boundingbox(pred, name=name, groundtruth=False) for pred, name in zip(preds, x_name)] 205 | pred_boxes = list(chain(*pred_boxes)) 206 | 207 | return {'pred_boxes': pred_boxes, 'gt_boxes': gt_boxes} 208 | 209 | def test_epoch_end(self, outs): 210 | gt_boxes = [out['gt_boxes'] for out in outs] 211 | gt_boxes = list(chain(*gt_boxes)) 212 | pred_boxes = [out['pred_boxes'] for out in outs] 213 | pred_boxes = list(chain(*pred_boxes)) 214 | 215 | metric = get_pascalvoc_metrics(gt_boxes=gt_boxes, 216 | det_boxes=pred_boxes, 217 | iou_threshold=self.iou_threshold, 218 | method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 219 | generate_table=True) 220 | 221 | per_class, mAP = metric['per_class'], metric['mAP'] 222 | self.log('Test_mAP', mAP) 223 | 224 | for key, value in per_class.items(): 225 | self.log(f'Test_AP_{key}', value['AP']) 226 | 227 | def configure_optimizers(self): 228 | optimizer = torch.optim.SGD(self.model.parameters(), 229 | lr=self.lr, 230 | momentum=0.9, 231 | weight_decay=0.005) 232 | lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 233 | mode='max', 234 | factor=0.75, 235 | patience=30, 236 | min_lr=0) 237 | return {'optimizer': optimizer, 'lr_scheduler': lr_scheduler, 'monitor': 'Validation_mAP'} 238 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/transformations.py: -------------------------------------------------------------------------------- 1 | from functools import partial 2 | from typing import List, Callable 3 | 4 | import albumentations as A 5 | import numpy as np 6 | import torch 7 | from sklearn.externals._pilutil import bytescale 8 | from torchvision.ops import nms 9 | 10 | 11 | def normalize_01(inp: np.ndarray): 12 | """Squash image input to the value range [0, 1] (no clipping)""" 13 | inp_out = (inp - np.min(inp)) / np.ptp(inp) 14 | return inp_out 15 | 16 | 17 | def normalize(inp: np.ndarray, mean: float, std: float): 18 | """Normalize based on mean and standard deviation.""" 19 | inp_out = (inp - mean) / std 20 | return inp_out 21 | 22 | 23 | def re_normalize(inp: np.ndarray, 24 | low: int = 0, 25 | high: int = 255 26 | ): 27 | """Normalize the data to a certain range. Default: [0-255]""" 28 | inp_out = bytescale(inp, low=low, high=high) 29 | return inp_out 30 | 31 | 32 | def clip_bbs(inp: np.ndarray, 33 | bbs: np.ndarray): 34 | """ 35 | If the bounding boxes exceed one dimension, they are clipped to the dim's maximum. 36 | Bounding boxes are expected to be in xyxy format. 
37 | Example: x_value=224 but x_shape=200 -> x1=199 38 | """ 39 | 40 | def clip(value: int, max: int): 41 | 42 | if value >= max - 1: 43 | value = max - 1 44 | elif value <= 0: 45 | value = 0 46 | 47 | return value 48 | 49 | output = [] 50 | for bb in bbs: 51 | x1, y1, x2, y2 = tuple(bb) 52 | x_shape = inp.shape[1] 53 | y_shape = inp.shape[0] 54 | 55 | x1 = clip(x1, x_shape) 56 | y1 = clip(y1, y_shape) 57 | x2 = clip(x2, x_shape) 58 | y2 = clip(y2, y_shape) 59 | 60 | output.append([x1, y1, x2, y2]) 61 | 62 | return np.array(output) 63 | 64 | 65 | def map_class_to_int(labels: List[str], mapping: dict): 66 | """Maps a string to an integer.""" 67 | labels = np.array(labels) 68 | dummy = np.empty_like(labels) 69 | for key, value in mapping.items(): 70 | dummy[labels == key] = value 71 | 72 | return dummy.astype(np.uint8) 73 | 74 | 75 | def apply_nms(target: dict, iou_threshold): 76 | """Non-maximum Suppression""" 77 | boxes = torch.tensor(target['boxes']) 78 | labels = torch.tensor(target['labels']) 79 | scores = torch.tensor(target['scores']) 80 | 81 | if boxes.size()[0] > 0: 82 | mask = nms(boxes, scores, iou_threshold=iou_threshold) 83 | mask = (np.array(mask),) 84 | 85 | target['boxes'] = np.asarray(boxes)[mask] 86 | target['labels'] = np.asarray(labels)[mask] 87 | target['scores'] = np.asarray(scores)[mask] 88 | 89 | return target 90 | 91 | 92 | def apply_score_threshold(target: dict, score_threshold): 93 | """Removes bounding box predictions with low scores.""" 94 | boxes = target['boxes'] 95 | labels = target['labels'] 96 | scores = target['scores'] 97 | 98 | mask = np.where(scores > score_threshold) 99 | target['boxes'] = boxes[mask] 100 | target['labels'] = labels[mask] 101 | target['scores'] = scores[mask] 102 | 103 | return target 104 | 105 | 106 | class Repr: 107 | """Evaluatable string representation of an object""" 108 | 109 | def __repr__(self): return f'{self.__class__.__name__}: {self.__dict__}' 110 | 111 | 112 | class FunctionWrapperSingle(Repr): 113 | """A function wrapper that returns a partial for input only.""" 114 | 115 | def __init__(self, function: Callable, *args, **kwargs): 116 | self.function = partial(function, *args, **kwargs) 117 | 118 | def __call__(self, inp: np.ndarray): return self.function(inp) 119 | 120 | 121 | class FunctionWrapperDouble(Repr): 122 | """A function wrapper that returns a partial for an input-target pair.""" 123 | 124 | def __init__(self, function: Callable, input: bool = True, target: bool = False, *args, **kwargs): 125 | self.function = partial(function, *args, **kwargs) 126 | self.input = input 127 | self.target = target 128 | 129 | def __call__(self, inp: np.ndarray, tar: dict): 130 | if self.input: inp = self.function(inp) 131 | if self.target: tar = self.function(tar) 132 | return inp, tar 133 | 134 | 135 | class Compose: 136 | """Baseclass - composes several transforms together.""" 137 | 138 | def __init__(self, transforms: List[Callable]): 139 | self.transforms = transforms 140 | 141 | def __repr__(self): return str([transform for transform in self.transforms]) 142 | 143 | 144 | class ComposeDouble(Compose): 145 | """Composes transforms for input-target pairs.""" 146 | 147 | def __call__(self, inp: np.ndarray, target: dict): 148 | for t in self.transforms: 149 | inp, target = t(inp, target) 150 | return inp, target 151 | 152 | 153 | class ComposeSingle(Compose): 154 | """Composes transforms for input only.""" 155 | 156 | def __call__(self, inp: np.ndarray): 157 | for t in self.transforms: 158 | inp = t(inp) 159 | return inp 160 | 161 
| 162 | class AlbumentationWrapper(Repr): 163 | """ 164 | A wrapper for the albumentation package. 165 | Bounding boxes are expected to be in xyxy format (pascal_voc). 166 | Bounding boxes cannot be larger than the spatial image's dimensions. 167 | Use Clip() if your bounding boxes are outside of the image, before using this wrapper. 168 | """ 169 | 170 | def __init__(self, albumentation: Callable, format: str = 'pascal_voc'): 171 | self.albumentation = albumentation 172 | self.format = format 173 | 174 | def __call__(self, inp: np.ndarray, tar: dict): 175 | # input, target 176 | transform = A.Compose([ 177 | self.albumentation 178 | ], bbox_params=A.BboxParams(format=self.format, label_fields=['class_labels'])) 179 | 180 | out_dict = transform(image=inp, bboxes=tar['boxes'], class_labels=tar['labels']) 181 | 182 | input_out = np.array(out_dict['image']) 183 | boxes = np.array(out_dict['bboxes']) 184 | labels = np.array(out_dict['class_labels']) 185 | 186 | tar['boxes'] = boxes 187 | tar['labels'] = labels 188 | 189 | return input_out, tar 190 | 191 | 192 | class Clip(Repr): 193 | """ 194 | If the bounding boxes exceed one dimension, they are clipped to the dim's maximum. 195 | Bounding boxes are expected to be in xyxy format. 196 | Example: x_value=224 but x_shape=200 -> x1=199 197 | """ 198 | 199 | def __call__(self, inp: np.ndarray, tar: dict): 200 | new_boxes = clip_bbs(inp=inp, bbs=tar['boxes']) 201 | tar['boxes'] = new_boxes 202 | 203 | return inp, tar 204 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pathlib 4 | import cv2 5 | 6 | import importlib_metadata 7 | import numpy as np 8 | import pandas as pd 9 | import torch 10 | from IPython import get_ipython 11 | from neptunecontrib.api import log_table 12 | from torchvision.models.detection.transform import GeneralizedRCNNTransform 13 | from torchvision.ops import box_convert, box_area 14 | 15 | from metrics.bounding_box import BoundingBox 16 | from metrics.enumerators import BBFormat, BBType 17 | 18 | 19 | def get_filenames_of_path(path: pathlib.Path, ext: str = '*'): 20 | """ 21 | Returns a list of files in a directory/path. Uses pathlib. 22 | """ 23 | filenames = [file for file in path.glob(ext) if file.is_file()] 24 | assert len(filenames) > 0, f'No files found in path: {path}' 25 | return filenames 26 | 27 | 28 | def read_json(path: pathlib.Path): 29 | with open(str(path), 'r') as fp: # fp is the file pointer 30 | file = json.loads(s=fp.read()) 31 | 32 | return file 33 | 34 | 35 | def save_json(obj, path: pathlib.Path): 36 | with open(path, 'w') as fp: # fp is the file pointer 37 | json.dump(obj=obj, fp=fp, indent=4, sort_keys=False) 38 | 39 | 40 | def collate_double(batch): 41 | """ 42 | collate function for the ObjectDetectionDataSet. 43 | Only used by the dataloader. 44 | """ 45 | x = [sample['x'] for sample in batch] 46 | y = [sample['y'] for sample in batch] 47 | x_name = [sample['x_name'] for sample in batch] 48 | y_name = [sample['y_name'] for sample in batch] 49 | return x, y, x_name, y_name 50 | 51 | 52 | def collate_single(batch): 53 | """ 54 | collate function for the ObjectDetectionDataSetSingle. 55 | Only used by the dataloader. 
56 | """ 57 | x = [sample['x'] for sample in batch] 58 | x_name = [sample['x_name'] for sample in batch] 59 | return x, x_name 60 | 61 | 62 | def color_mapping_func(labels, mapping): 63 | """Maps an label (integer or string) to a color""" 64 | color_list = [mapping[value] for value in labels] 65 | return color_list 66 | 67 | 68 | def enable_gui_qt(): 69 | """Performs the magic command %gui qt""" 70 | ipython = get_ipython() 71 | ipython.magic('gui qt') 72 | 73 | 74 | def stats_dataset(dataset, rcnn_transform: GeneralizedRCNNTransform = False): 75 | """ 76 | Iterates over the dataset and returns some stats. 77 | Can be useful to pick the right anchor box sizes. 78 | """ 79 | stats = { 80 | 'image_height': [], 81 | 'image_width': [], 82 | 'image_mean': [], 83 | 'image_std': [], 84 | 'boxes_height': [], 85 | 'boxes_width': [], 86 | 'boxes_num': [], 87 | 'boxes_area': [] 88 | } 89 | for batch in dataset: 90 | # Batch 91 | x, y, x_name, y_name = batch['x'], batch['y'], batch['x_name'], batch['y_name'] 92 | 93 | # Transform 94 | if rcnn_transform: 95 | x, y = rcnn_transform([x], [y]) 96 | x, y = x.tensors, y[0] 97 | 98 | # Image 99 | stats['image_height'].append(x.shape[-2]) 100 | stats['image_width'].append(x.shape[-1]) 101 | stats['image_mean'].append(x.mean().item()) 102 | stats['image_std'].append(x.std().item()) 103 | 104 | # Target 105 | wh = box_convert(y['boxes'], 'xyxy', 'xywh')[:, -2:] 106 | stats['boxes_height'].append(wh[:, -2]) 107 | stats['boxes_width'].append(wh[:, -1]) 108 | stats['boxes_num'].append(len(wh)) 109 | stats['boxes_area'].append(box_area(y['boxes'])) 110 | 111 | stats['image_height'] = torch.tensor(stats['image_height'], dtype=torch.float) 112 | stats['image_width'] = torch.tensor(stats['image_width'], dtype=torch.float) 113 | stats['image_mean'] = torch.tensor(stats['image_mean'], dtype=torch.float) 114 | stats['image_std'] = torch.tensor(stats['image_std'], dtype=torch.float) 115 | stats['boxes_height'] = torch.cat(stats['boxes_height']) 116 | stats['boxes_width'] = torch.cat(stats['boxes_width']) 117 | stats['boxes_area'] = torch.cat(stats['boxes_area']) 118 | stats['boxes_num'] = torch.tensor(stats['boxes_num'], dtype=torch.float) 119 | 120 | return stats 121 | 122 | 123 | def from_file_to_boundingbox(file_name: pathlib.Path, groundtruth: bool = True): 124 | """Returns a list of BoundingBox objects from groundtruth or prediction.""" 125 | file = torch.load(file_name) 126 | labels = file['labels'] 127 | boxes = file['boxes'] 128 | scores = file['scores'] if not groundtruth else [None] * len(boxes) 129 | 130 | gt = BBType.GROUND_TRUTH if groundtruth else BBType.DETECTED 131 | 132 | return [BoundingBox(image_name=file_name.stem, 133 | class_id=l, 134 | coordinates=tuple(bb), 135 | format=BBFormat.XYX2Y2, 136 | bb_type=gt, 137 | confidence=s) for bb, l, s in zip(boxes, labels, scores)] 138 | 139 | 140 | def from_dict_to_boundingbox(file: dict, name: str, groundtruth: bool = True): 141 | """Returns list of BoundingBox objects from groundtruth or prediction.""" 142 | labels = file['labels'] 143 | boxes = file['boxes'] 144 | scores = np.array(file['scores'].cpu()) if not groundtruth else [None] * len(boxes) 145 | 146 | gt = BBType.GROUND_TRUTH if groundtruth else BBType.DETECTED 147 | 148 | return [BoundingBox(image_name=name, 149 | class_id=int(l), 150 | coordinates=tuple(bb), 151 | format=BBFormat.XYX2Y2, 152 | bb_type=gt, 153 | confidence=s) for bb, l, s in zip(boxes, labels, scores)] 154 | 155 | 156 | def log_packages_neptune(neptune_logger): 157 | """Uses the 
neptunecontrib.api to log the packages of the current python env.""" 158 | dists = importlib_metadata.distributions() 159 | packages = {idx: (dist.metadata['Name'], dist.version) for idx, dist in enumerate(dists)} 160 | 161 | packages_df = pd.DataFrame.from_dict(packages, orient='index', columns=['package', 'version']) 162 | 163 | log_table(name='packages', table=packages_df, experiment=neptune_logger.experiment) 164 | 165 | 166 | def log_mapping_neptune(mapping: dict, neptune_logger): 167 | """Uses the neptunecontrib.api to log a class mapping.""" 168 | mapping_df = pd.DataFrame.from_dict(mapping, orient='index', columns=['class_value']) 169 | log_table(name='mapping', table=mapping_df, experiment=neptune_logger.experiment) 170 | 171 | 172 | def log_model_neptune(checkpoint_path: pathlib.Path, 173 | save_directory: pathlib.Path, 174 | name: str, 175 | neptune_logger): 176 | """Saves the model to disk, uploads it to neptune and removes it again.""" 177 | checkpoint = torch.load(checkpoint_path) 178 | model = checkpoint['hyper_parameters']['model'] 179 | torch.save(model.state_dict(), save_directory / name) 180 | neptune_logger.experiment.set_property('checkpoint_name', checkpoint_path.name) 181 | neptune_logger.experiment.log_artifact(str(save_directory / name)) 182 | if os.path.isfile(save_directory / name): 183 | os.remove(save_directory / name) 184 | 185 | 186 | def log_checkpoint_neptune(checkpoint_path: pathlib.Path, neptune_logger): 187 | neptune_logger.experiment.set_property('checkpoint_name', checkpoint_path.name) 188 | neptune_logger.experiment.log_artifact(str(checkpoint_path)) 189 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/README.md: -------------------------------------------------------------------------------- 1 | To compute the metrics of an object detection model, 2 | one can use this [opensource toolbox for object detection metrics](https://github.com/rafaelpadilla/review_object_detection_metrics). 3 | 4 | This work was published in the 5 | [Journal Electronics - Special Issue Deep Learning Based Object Detection](https://www.mdpi.com/journal/electronics/special_issues/learning_based_detection). 6 | 7 | You can download the paper [here](https://github.com/rafaelpadilla/review_object_detection_metrics/blob/main/published_paper.pdf). 8 | ``` 9 | @Article{electronics10030279, 10 | AUTHOR = {Padilla, Rafael and Passos, Wesley L. and Dias, Thadeu L. B. and Netto, Sergio L. and da Silva, Eduardo A. B.}, 11 | TITLE = {A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit}, 12 | JOURNAL = {Electronics}, 13 | VOLUME = {10}, 14 | YEAR = {2021}, 15 | NUMBER = {3}, 16 | ARTICLE-NUMBER = {279}, 17 | URL = {https://www.mdpi.com/2079-9292/10/3/279}, 18 | ISSN = {2079-9292}, 19 | DOI = {10.3390/electronics10030279} 20 | } 21 | ``` 22 | 23 | 24 | You can find these files in their repo, the code here is slightly adjusted. 
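A minimal usage sketch of the adjusted modules in this folder, based on how `get_pascalvoc_metrics` and `BoundingBox` are called elsewhere in this tutorial (see `pascal_voc_evaluator_test.py` and `detection/utils.py`); the image name, class id and box coordinates below are made-up illustrative values:

```python
from metrics.bounding_box import BoundingBox
from metrics.enumerators import BBFormat, BBType, MethodAveragePrecision
from metrics.pascal_voc_evaluator import get_pascalvoc_metrics

# One ground-truth box and one detection for the same (hypothetical) image, in xyxy format.
gt_boxes = [BoundingBox(image_name='img_0', class_id='balloon',
                        coordinates=(10, 20, 110, 220),
                        format=BBFormat.XYX2Y2, bb_type=BBType.GROUND_TRUTH,
                        confidence=None)]
det_boxes = [BoundingBox(image_name='img_0', class_id='balloon',
                         coordinates=(12, 25, 105, 210),
                         format=BBFormat.XYX2Y2, bb_type=BBType.DETECTED,
                         confidence=0.9)]

# Pascal VOC metrics (AP per class and mAP) at IoU threshold 0.5.
output = get_pascalvoc_metrics(gt_boxes=gt_boxes,
                               det_boxes=det_boxes,
                               iou_threshold=0.5,
                               method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION,
                               generate_table=False)
per_class, mAP = output['per_class'], output['mAP']
```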
25 | 26 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/faster-rcnn-tutorial/metrics/__init__.py -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/enumerators.py: -------------------------------------------------------------------------------- 1 | from enum import Enum 2 | 3 | 4 | class MethodAveragePrecision(Enum): 5 | """ 6 | Class representing if the coordinates are relative to the 7 | image size or are absolute values. 8 | Developed by: Rafael Padilla 9 | Last modification: Apr 28 2018 10 | """ 11 | EVERY_POINT_INTERPOLATION = 1 12 | ELEVEN_POINT_INTERPOLATION = 2 13 | 14 | 15 | class CoordinatesType(Enum): 16 | """ 17 | Class representing if the coordinates are relative to the 18 | image size or are absolute values. 19 | Developed by: Rafael Padilla 20 | Last modification: Apr 28 2018 21 | """ 22 | RELATIVE = 1 23 | ABSOLUTE = 2 24 | 25 | 26 | class BBType(Enum): 27 | """ 28 | Class representing if the bounding box is groundtruth or not. 29 | """ 30 | GROUND_TRUTH = 1 31 | DETECTED = 2 32 | 33 | 34 | class BBFormat(Enum): 35 | """ 36 | Class representing the format of a bounding box. 37 | """ 38 | XYWH = 1 39 | XYX2Y2 = 2 40 | PASCAL_XML = 3 41 | YOLO = 4 42 | 43 | 44 | class FileFormat(Enum): 45 | ABSOLUTE_TEXT = 1 46 | PASCAL = 2 47 | LABEL_ME = 3 48 | COCO = 4 49 | CVAT = 5 50 | YOLO = 6 51 | OPENIMAGE = 7 52 | IMAGENET = 8 53 | UNKNOWN = 9 -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/general_utils.py: -------------------------------------------------------------------------------- 1 | import fnmatch 2 | import os 3 | 4 | import cv2 5 | import matplotlib.pyplot as plt 6 | import numpy as np 7 | from PyQt5 import QtCore, QtGui 8 | 9 | from metrics.enumerators import BBFormat 10 | 11 | 12 | def get_files_recursively(directory, extension="*"): 13 | if '.' not in extension: 14 | extension = '*.' + extension 15 | files = [ 16 | os.path.join(dirpath, f) for dirpath, dirnames, files in os.walk(directory) 17 | for f in fnmatch.filter(files, extension) 18 | ] 19 | return files 20 | 21 | 22 | def convert_box_xywh2xyxy(box): 23 | arr = box.copy() 24 | arr[:, 2] += arr[:, 0] 25 | arr[:, 3] += arr[:, 1] 26 | return arr 27 | 28 | 29 | def convert_box_xyxy2xywh(box): 30 | arr = box.copy() 31 | arr[:, 2] -= arr[:, 0] 32 | arr[:, 3] -= arr[:, 1] 33 | return arr 34 | 35 | 36 | # size => (width, height) of the image 37 | # box => (X1, X2, Y1, Y2) of the bounding box 38 | def convert_to_relative_values(size, box): 39 | dw = 1. / (size[0]) 40 | dh = 1. 
/ (size[1]) 41 | cx = (box[1] + box[0]) / 2.0 42 | cy = (box[3] + box[2]) / 2.0 43 | w = box[1] - box[0] 44 | h = box[3] - box[2] 45 | x = cx * dw 46 | y = cy * dh 47 | w = w * dw 48 | h = h * dh 49 | # YOLO's format 50 | # x,y => (bounding_box_center)/width_of_the_image 51 | # w => bounding_box_width / width_of_the_image 52 | # h => bounding_box_height / height_of_the_image 53 | return (x, y, w, h) 54 | 55 | 56 | # size => (width, height) of the image 57 | # box => (centerX, centerY, w, h) of the bounding box relative to the image 58 | def convert_to_absolute_values(size, box): 59 | w_box = size[0] * box[2] 60 | h_box = size[1] * box[3] 61 | 62 | x1 = (float(box[0]) * float(size[0])) - (w_box / 2) 63 | y1 = (float(box[1]) * float(size[1])) - (h_box / 2) 64 | x2 = x1 + w_box 65 | y2 = y1 + h_box 66 | return (round(x1), round(y1), round(x2), round(y2)) 67 | 68 | 69 | def add_bb_into_image(image, bb, color=(255, 0, 0), thickness=2, label=None): 70 | r = int(color[0]) 71 | g = int(color[1]) 72 | b = int(color[2]) 73 | 74 | font = cv2.FONT_HERSHEY_SIMPLEX 75 | fontScale = 0.5 76 | fontThickness = 1 77 | 78 | x1, y1, x2, y2 = bb.get_absolute_bounding_box(BBFormat.XYX2Y2) 79 | x1 = int(x1) 80 | y1 = int(y1) 81 | x2 = int(x2) 82 | y2 = int(y2) 83 | cv2.rectangle(image, (x1, y1), (x2, y2), (b, g, r), thickness) 84 | # Add label 85 | if label is not None: 86 | # Get size of the text box 87 | (tw, th) = cv2.getTextSize(label, font, fontScale, fontThickness)[0] 88 | # Top-left coord of the textbox 89 | (xin_bb, yin_bb) = (x1 + thickness, y1 - th + int(12.5 * fontScale)) 90 | # Checking position of the text top-left (outside or inside the bb) 91 | if yin_bb - th <= 0: # if outside the image 92 | yin_bb = y1 + th # put it inside the bb 93 | r_Xin = x1 - int(thickness / 2) 94 | r_Yin = y1 - th - int(thickness / 2) 95 | # Draw filled rectangle to put the text in it 96 | cv2.rectangle(image, (r_Xin, r_Yin - thickness), 97 | (r_Xin + tw + thickness * 3, r_Yin + th + int(12.5 * fontScale)), (b, g, r), 98 | -1) 99 | cv2.putText(image, label, (xin_bb, yin_bb), font, fontScale, (0, 0, 0), fontThickness, 100 | cv2.LINE_AA) 101 | return image 102 | 103 | 104 | def remove_file_extension(filename): 105 | return os.path.join(os.path.dirname(filename), os.path.splitext(filename)[0]) 106 | 107 | 108 | def get_files_dir(directory, extensions=['*']): 109 | ret = [] 110 | for extension in extensions: 111 | if extension == '*': 112 | ret += [f for f in os.listdir(directory)] 113 | continue 114 | elif extension is None: 115 | # accepts all extensions 116 | extension = '' 117 | elif '.' 
not in extension: 118 | extension = f'.{extension}' 119 | ret += [f for f in os.listdir(directory) if f.endswith(extension)] 120 | return ret 121 | 122 | 123 | def remove_file_extension(filename): 124 | return os.path.join(os.path.dirname(filename), os.path.splitext(filename)[0]) 125 | 126 | 127 | def image_to_pixmap(image): 128 | image = image.astype(np.uint8) 129 | if image.shape[2] == 4: 130 | qformat = QtGui.QImage.Format_RGBA8888 131 | else: 132 | qformat = QtGui.QImage.Format_RGB888 133 | 134 | image = QtGui.QImage(image.data, image.shape[1], image.shape[0], image.strides[0], qformat) 135 | # image= image.rgbSwapped() 136 | return QtGui.QPixmap(image) 137 | 138 | 139 | def show_image_in_qt_component(image, label_component): 140 | pix = image_to_pixmap((image).astype(np.uint8)) 141 | label_component.setPixmap(pix) 142 | label_component.setAlignment(QtCore.Qt.AlignCenter) 143 | 144 | 145 | def get_files_recursively(directory, extension="*"): 146 | if '.' not in extension: 147 | extension = '*.' + extension 148 | files = [ 149 | os.path.join(dirpath, f) for dirpath, dirnames, files in os.walk(directory) 150 | for f in fnmatch.filter(files, extension) 151 | ] 152 | return files 153 | 154 | 155 | def is_str_int(s): 156 | if s[0] in ('-', '+'): 157 | return s[1:].isdigit() 158 | return s.isdigit() 159 | 160 | 161 | def get_file_name_only(file_path): 162 | if file_path is None: 163 | return '' 164 | return os.path.splitext(os.path.basename(file_path))[0] 165 | 166 | 167 | def find_file(directory, file_name, match_extension=True): 168 | if os.path.isdir(directory) is False: 169 | return None 170 | for dirpath, dirnames, files in os.walk(directory): 171 | for f in files: 172 | f1 = os.path.basename(f) 173 | f2 = file_name 174 | if not match_extension: 175 | f1 = os.path.splitext(f1)[0] 176 | f2 = os.path.splitext(f2)[0] 177 | if f1 == f2: 178 | return os.path.join(dirpath, os.path.basename(f)) 179 | return None 180 | 181 | 182 | def get_image_resolution(image_file): 183 | if image_file is None or not os.path.isfile(image_file): 184 | print(f'Warning: Path {image_file} not found.') 185 | return None 186 | img = cv2.imread(image_file) 187 | if img is None: 188 | print(f'Warning: Error loading the image {image_file}.') 189 | return None 190 | h, w, _ = img.shape 191 | return {'height': h, 'width': w} 192 | 193 | 194 | def draw_bb_into_image(image, boundingBox, color, thickness, label=None): 195 | if isinstance(image, str): 196 | image = cv2.imread(image) 197 | 198 | r = int(color[0]) 199 | g = int(color[1]) 200 | b = int(color[2]) 201 | 202 | font = cv2.FONT_HERSHEY_SIMPLEX 203 | fontScale = 0.5 204 | fontThickness = 1 205 | 206 | xIn = boundingBox[0] 207 | yIn = boundingBox[1] 208 | cv2.rectangle(image, (boundingBox[0], boundingBox[1]), (boundingBox[2], boundingBox[3]), 209 | (b, g, r), thickness) 210 | # Add label 211 | if label is not None: 212 | # Get size of the text box 213 | (tw, th) = cv2.getTextSize(label, font, fontScale, fontThickness)[0] 214 | # Top-left coord of the textbox 215 | (xin_bb, yin_bb) = (xIn + thickness, yIn - th + int(12.5 * fontScale)) 216 | # Checking position of the text top-left (outside or inside the bb) 217 | if yin_bb - th <= 0: # if outside the image 218 | yin_bb = yIn + th # put it inside the bb 219 | r_Xin = xIn - int(thickness / 2) 220 | r_Yin = yin_bb - th - int(thickness / 2) 221 | # Draw filled rectangle to put the text in it 222 | cv2.rectangle(image, (r_Xin, r_Yin - thickness), 223 | (r_Xin + tw + thickness * 3, r_Yin + th + int(12.5 * fontScale)), (b, 
g, r), 224 | -1) 225 | cv2.putText(image, label, (xin_bb, yin_bb), font, fontScale, (0, 0, 0), fontThickness, 226 | cv2.LINE_AA) 227 | return image 228 | 229 | 230 | def plot_bb_per_classes(dict_bbs_per_class, 231 | horizontally=True, 232 | rotation=0, 233 | show=False, 234 | extra_title=''): 235 | plt.close() 236 | if horizontally: 237 | ypos = np.arange(len(dict_bbs_per_class.keys())) 238 | plt.barh(ypos, dict_bbs_per_class.values()) 239 | plt.yticks(ypos, dict_bbs_per_class.keys()) 240 | plt.xlabel('amount of bounding boxes') 241 | plt.ylabel('classes') 242 | else: 243 | plt.bar(dict_bbs_per_class.keys(), dict_bbs_per_class.values()) 244 | plt.xlabel('classes') 245 | plt.ylabel('amount of bounding boxes') 246 | plt.xticks(rotation=rotation) 247 | title = f'Distribution of bounding boxes per class {extra_title}' 248 | plt.title(title) 249 | if show: 250 | # plt.tight_layout() 251 | # plt.show(aspect='auto') 252 | fig = plt.gcf() 253 | fig.canvas.set_window_title(title) 254 | fig.tight_layout() 255 | fig.show() 256 | return plt 257 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/pascal_voc_evaluator_test.py: -------------------------------------------------------------------------------- 1 | # Imports 2 | import pathlib 3 | from itertools import chain 4 | 5 | from metrics.enumerators import MethodAveragePrecision 6 | from metrics.pascal_voc_evaluator import get_pascalvoc_metrics 7 | from helper.utils import from_file_to_boundingbox 8 | from helper.utils import get_filenames_of_path 9 | 10 | # root directory 11 | root = pathlib.Path(r"C:\Users\johan\Desktop\Johannes\Heads") 12 | 13 | # input and target files 14 | inputs = get_filenames_of_path(root / 'input') 15 | targets = get_filenames_of_path(root / 'target') 16 | 17 | inputs.sort() 18 | targets.sort() 19 | 20 | # get the gt_boxes from disk 21 | gt_boxes = [from_file_to_boundingbox(file_name, groundtruth=True) for file_name in targets] 22 | # reduce list 23 | gt_boxes = list(chain(*gt_boxes)) 24 | # TODO: add predictions 25 | pred_boxes = [from_file_to_boundingbox(file_name, groundtruth=False) for file_name in targets] 26 | pred_boxes = list(chain(*pred_boxes)) 27 | 28 | output = get_pascalvoc_metrics(gt_boxes=gt_boxes, 29 | det_boxes=pred_boxes, 30 | iou_threshold=0.5, 31 | method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 32 | generate_table=True) 33 | 34 | per_class, mAP = output['per_class'], output['mAP'] 35 | head = per_class['head'] 36 | 37 | # %% another test:Difference between computing the mAP per batch and then taking the mean and computing it directly from all batches 38 | all_gt = [] 39 | all_pred = [] 40 | all_mAP = [] 41 | all_per_class = [] 42 | for batch in dataloader_valid: 43 | x, y, x_name, y_name = batch 44 | with torch.no_grad(): 45 | task.model.eval() 46 | preds = task.model(x) 47 | 48 | from itertools import chain 49 | from utils import from_dict_to_BoundingBox 50 | 51 | gt_boxes = list( 52 | chain(*[from_dict_to_BoundingBox(target, name=name, groundtruth=True) for target, name in zip(y, x_name)])) 53 | pred_boxes = list( 54 | chain(*[from_dict_to_BoundingBox(pred, name=name, groundtruth=False) for pred, name in zip(preds, x_name)])) 55 | 56 | all_gt.append(gt_boxes) 57 | all_pred.append(pred_boxes) 58 | 59 | from metrics.pascal_voc_evaluator import get_pascalvoc_metrics 60 | from metrics.enumerators import MethodAveragePrecision 61 | metric = get_pascalvoc_metrics(gt_boxes=gt_boxes, 62 | det_boxes=pred_boxes, 63 | iou_threshold=0.5, 64 | 
method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 65 | generate_table=False) 66 | 67 | per_class, mAP = metric['per_class'], metric['mAP'] 68 | all_per_class.append(per_class) 69 | all_mAP.append(mAP) 70 | 71 | all_tp = [pc[1]['total TP'] for pc in all_per_class] 72 | all_fp = [pc[1]['total FP'] for pc in all_per_class] 73 | 74 | 75 | all_gt = list(chain(*all_gt)) 76 | all_pred = list(chain(*all_pred)) 77 | 78 | m = get_pascalvoc_metrics(gt_boxes=all_gt, 79 | det_boxes=all_pred, 80 | iou_threshold=0.5, 81 | method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 82 | generate_table=True) 83 | 84 | per_class, mAP = m['per_class'], m['mAP'] 85 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/setup.py: -------------------------------------------------------------------------------- 1 | import setuptools 2 | 3 | setuptools.setup( 4 | name="faster_rcnn_tutorial", 5 | version="0.0.1", 6 | author="ifding", 7 | author_email="", 8 | url="https://github.com/ifding/faster-rcnn-tutorial", 9 | packages=setuptools.find_packages(), 10 | include_package_data=True, 11 | classifiers=[ 12 | "Programming Language :: Python :: 3", 13 | "License :: OSI Approved :: MIT License", 14 | "Operating System :: OS Independent", 15 | ], 16 | python_requires='>=3.6', 17 | install_requires=[ 18 | 'numpy', 19 | 'scikit-image', 20 | 'sklearn', 21 | 'neptune-contrib', 22 | 'python-dotenv', 23 | 'albumentations==0.5.2', 24 | 'pytorch-lightning==1.3.5', 25 | 'torch==1.8.1', 26 | 'torchvision==0.9.1', 27 | 'torchsummary==1.5.1', 28 | 'torchmetrics==0.2.0' 29 | ] 30 | ) 31 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/train.py: -------------------------------------------------------------------------------- 1 | # imports 2 | import os 3 | import pathlib 4 | import json 5 | from dotenv import load_dotenv 6 | 7 | import albumentations as A 8 | import numpy as np 9 | from pytorch_lightning import Trainer 10 | from pytorch_lightning import seed_everything 11 | from pytorch_lightning.callbacks import ModelCheckpoint, LearningRateMonitor, EarlyStopping 12 | from pytorch_lightning.loggers.neptune import NeptuneLogger 13 | from torch.utils.data import DataLoader 14 | from torchvision.models.detection.transform import GeneralizedRCNNTransform 15 | 16 | from custom_dataset import ObjectDetectionDataSet 17 | from detection.faster_RCNN import FasterRCNN_lightning, get_fasterRCNN_resnet 18 | from detection.transformations import Clip, ComposeDouble, AlbumentationWrapper 19 | from detection.transformations import FunctionWrapperDouble, normalize_01 20 | from detection.utils import collate_double, stats_dataset 21 | from detection.utils import log_mapping_neptune, log_model_neptune, log_packages_neptune 22 | 23 | # hyper-parameters 24 | params = {'BATCH_SIZE': 2, 25 | 'OWNER': 'feid', # your username in neptune 26 | 'SAVE_DIR': None, # checkpoints will be saved to cwd 27 | 'LOG_MODEL': False, # whether to log the model to neptune after training 28 | 'GPU': 1, # set to None for cpu training 29 | 'LR': 0.001, 30 | 'PRECISION': 32, 31 | 'CLASSES': 2, 32 | 'SEED': 42, 33 | 'PROJECT': 'Balloon', 34 | 'EXPERIMENT': 'balloon', 35 | 'MAXEPOCHS': 100, 36 | 'PATIENCE': 50, 37 | 'BACKBONE': 'resnet34', 38 | 'FPN': False, 39 | 'ANCHOR_SIZE': ((32, 64, 128, 256, 512),), 40 | 'ASPECT_RATIOS': ((0.5, 1.0, 2.0),), 41 | 'MIN_SIZE': 1024, 42 | 'MAX_SIZE': 1024, 43 | 'IMG_MEAN': [0.485, 0.456, 0.406], 44 | 'IMG_STD': [0.229, 0.224, 0.225], 
45 | 'IOU_THRESHOLD': 0.5 46 | } 47 | 48 | 49 | def main(): 50 | # api key, https://github.com/neptune-ai/neptune-client 51 | load_dotenv() # read environment variables 52 | api_key = os.environ['NEPTUNE'] # if this throws an error, you didn't set your env var 53 | 54 | # save directory 55 | save_dir = os.getcwd() if not params['SAVE_DIR'] else params['SAVE_DIR'] 56 | 57 | # custom dataset directory 58 | data_path = 'dataset/balloon' 59 | train_path = os.path.join(data_path, 'train') 60 | val_path = os.path.join(data_path, 'val') 61 | 62 | # label mapping, starting at 1, as the background is assigned 0 63 | mapping = { 64 | 'balloon': 1, 65 | } 66 | 67 | # training transformations and augmentations 68 | transforms_training = ComposeDouble([ 69 | Clip(), 70 | AlbumentationWrapper(albumentation=A.HorizontalFlip(p=0.5)), 71 | AlbumentationWrapper(albumentation=A.RandomScale(p=0.5, scale_limit=0.5)), 72 | #AlbumentationWrapper(albumentation=A.VerticalFlip(p=0.5)), 73 | FunctionWrapperDouble(np.moveaxis, source=-1, destination=0), 74 | FunctionWrapperDouble(normalize_01) 75 | ]) 76 | 77 | # validation transformations 78 | transforms_validation = ComposeDouble([ 79 | Clip(), 80 | FunctionWrapperDouble(np.moveaxis, source=-1, destination=0), 81 | FunctionWrapperDouble(normalize_01) 82 | ]) 83 | 84 | 85 | # random seed 86 | seed_everything(params['SEED']) 87 | 88 | # dataset training 89 | dataset_train = ObjectDetectionDataSet(data_path=train_path, 90 | transform=transforms_training, 91 | mapping=mapping) 92 | 93 | # dataset validation 94 | dataset_valid = ObjectDetectionDataSet(data_path=val_path, 95 | transform=transforms_validation, 96 | mapping=mapping) 97 | 98 | # dataloader training 99 | dataloader_train = DataLoader(dataset=dataset_train, 100 | batch_size=params['BATCH_SIZE'], 101 | shuffle=True, 102 | num_workers=0, 103 | collate_fn=collate_double) 104 | 105 | # dataloader validation 106 | dataloader_valid = DataLoader(dataset=dataset_valid, 107 | batch_size=1, 108 | shuffle=False, 109 | num_workers=0, 110 | collate_fn=collate_double) 111 | 112 | # Datasets statistics exploration 113 | if False: 114 | stats_train = stats_dataset(dataset_train) 115 | transform = GeneralizedRCNNTransform(min_size=1024, 116 | max_size=1024, 117 | image_mean=[0.485, 0.456, 0.406], 118 | image_std=[0.229, 0.224, 0.225]) 119 | stats_train_transform = stats_dataset(dataset_train, transform) 120 | print(stats_train) 121 | print(stats_train_transform) 122 | 123 | # neptune logger 124 | neptune_logger = NeptuneLogger( 125 | api_key=api_key, 126 | project_name=f'{params["OWNER"]}/{params["PROJECT"]}', # use your neptune name here 127 | experiment_name=params['EXPERIMENT'], 128 | params=params 129 | ) 130 | 131 | assert neptune_logger.name # http GET request to check if the project exists 132 | 133 | # model init 134 | model = get_fasterRCNN_resnet(num_classes=params['CLASSES'], 135 | backbone_name=params['BACKBONE'], 136 | anchor_size=params['ANCHOR_SIZE'], 137 | aspect_ratios=params['ASPECT_RATIOS'], 138 | fpn=params['FPN'], 139 | min_size=params['MIN_SIZE'], 140 | max_size=params['MAX_SIZE']) 141 | 142 | # lightning init 143 | task = FasterRCNN_lightning(model=model, lr=params['LR'], iou_threshold=params['IOU_THRESHOLD']) 144 | 145 | # callbacks 146 | checkpoint_callback = ModelCheckpoint(monitor='Validation_mAP', mode='max') 147 | learningrate_callback = LearningRateMonitor(logging_interval='step', log_momentum=False) 148 | early_stopping_callback = EarlyStopping(monitor='Validation_mAP', 
patience=params['PATIENCE'], mode='max') 149 | 150 | # trainer init 151 | trainer = Trainer(gpus=params['GPU'], 152 | precision=params['PRECISION'], # try 16 with enable_pl_optimizer=False 153 | callbacks=[checkpoint_callback, learningrate_callback, early_stopping_callback], 154 | default_root_dir=save_dir, # where checkpoints are saved to 155 | logger=neptune_logger, 156 | log_every_n_steps=1, 157 | num_sanity_val_steps=0, 158 | ) 159 | 160 | # start training 161 | trainer.max_epochs = params['MAXEPOCHS'] 162 | trainer.fit(task, 163 | train_dataloader=dataloader_train, 164 | val_dataloaders=dataloader_valid) 165 | 166 | # start testing 167 | #trainer.test(ckpt_path='best', test_dataloaders=dataloader_valid) 168 | 169 | # log packages 170 | log_packages_neptune(neptune_logger) 171 | 172 | # log mapping as table 173 | log_mapping_neptune(mapping, neptune_logger) 174 | 175 | # log model 176 | if params['LOG_MODEL']: 177 | checkpoint_path = pathlib.Path(checkpoint_callback.best_model_path) 178 | log_model_neptune(checkpoint_path=checkpoint_path, 179 | save_directory=pathlib.Path.home(), 180 | name='best_model.pt', 181 | neptune_logger=neptune_logger) 182 | 183 | # stop logger 184 | neptune_logger.experiment.stop() 185 | print('Finished') 186 | 187 | 188 | if __name__ == '__main__': 189 | main() 190 | -------------------------------------------------------------------------------- /resources/autonomous-driving.md: -------------------------------------------------------------------------------- 1 | 2 | ## Components of Autonomous Driving System 3 | 4 | 5 | ![Alt](images/overview.png "Standard components in a modern autonomous driving system pipeline.") 6 | 7 | The Autonomous Driving survey paper (https://arxiv.org/pdf/2002.00444.pdf) demonstrates the above pipeline from sensor stream to control actuation. 8 | 9 | The **sensor architecture** includes multiple sets of cameras, radars and LIDARs, as well as a GPS-GNSS system for absolute localization, and Inertial Measurement Units (IMUs) that provide the 3D pose of the vehicle in space. 10 | 11 | The goal of the **perception module** is the creation of an intermediate-level representation of the environment state that is later utilized by a decision-making system that produces the driving policy. 12 | 13 | This state would include lane position, drivable zone, the location of agents such as cars and pedestrians, the state of traffic lights, and more. 14 | 15 | Several perception tasks, such as _semantic segmentation_, _motion estimation_, _depth estimation_ and _soiling detection_, can be unified into a single multi-task model. 16 | 17 | 18 | ## Courses 19 | * [[Coursera] Machine Learning](https://www.coursera.org/learn/machine-learning) - presented by [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng); as of Jan 28, 2020 it has 125,344 ratings and 30,705 reviews. 20 | * [[Coursera+DeepLearning.ai] Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) - presented by [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng); 5 courses that teach the foundations of deep learning, programming language: Python. 21 | * [[Udacity] Self-Driving Car Nanodegree Program](https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013) - teaches the skills and techniques used by self-driving car teams. The program syllabus can be found [here](https://medium.com/self-driving-cars/term-1-in-depth-on-udacitys-self-driving-car-curriculum-ffcf46af0c08#.bfgw9uxd9).
22 | * [[University of Toronto] CSC2541 23 | Visual Perception for Autonomous Driving](http://www.cs.toronto.edu/~urtasun/courses/CSC2541/CSC2541_Winter16.html) - A graduate course in visual perception for autonomous driving. The class briefly covers topics in localization, ego-motion estimation, free-space estimation, visual recognition (classification, detection, segmentation). 24 | * [[INRIA] Mobile Robots and Autonomous Vehicles](https://www.fun-mooc.fr/courses/inria/41005S02/session02/about?utm_source=mooc-list) - Introduces the key concepts required to program mobile robots and autonomous vehicles. The course presents both formal and algorithmic tools, and for its last week's topics (behavior modeling and learning), it will also provide realistic examples and programming exercises in Python. 25 | * [[University of Glasgow] ENG5017 Autonomous Vehicle Guidance Systems](http://www.gla.ac.uk/coursecatalogue/course/?code=ENG5017) - Introduces the concepts behind autonomous vehicle guidance and coordination and enables students to design and implement guidance strategies for vehicles incorporating planning, optimising and reacting elements. 26 | * [[David Silver - Udacity] How to Land An Autonomous Vehicle Job: Coursework](https://medium.com/self-driving-cars/how-to-land-an-autonomous-vehicle-job-coursework-e7acc2bfe740#.j5b2kwbso) - David Silver, from Udacity, reviews his coursework for landing a job in self-driving cars coming from a Software Engineering background. 27 | * [[Stanford] - CS221 Artificial Intelligence: Principles and Techniques](http://stanford.edu/~cpiech/cs221/index.html) - Contains a simple self-driving project and simulator. 28 | * [[MIT] 6.S094: Deep Learning for Self-Driving Cars](http://selfdrivingcars.mit.edu/) - *"This class is an introduction to the practice of deep learning through the applied theme of building a self-driving car. It is open to beginners and is designed for those who are new to machine learning, but it can also benefit advanced researchers in the field looking for a practical overview of deep learning methods and their application. (...)"* 29 | * [[MIT] Deep Learning](https://deeplearning.mit.edu/) - *"This page is a collection of MIT courses and lectures on deep learning, deep reinforcement learning, autonomous vehicles, and artificial intelligence organized by Lex Fridman."* 30 | * [[MIT] Human-Centered Artificial Intelligence](https://hcai.mit.edu/) - *"Human-Centered AI at MIT is a collection of research and courses focused on the design, development, and deployment of artificial intelligence systems that learn from and collaborate with humans in a deep, meaningful way."* 31 | * [[UCSD] - MAE/ECE148 Introduction to Autonomous Vehicles](https://guitar.ucsd.edu/maeece148/index.php/Introduction_to_Autonomous_Vehicles) - A hands-on, project-based course using DonkeyCar with lane-tracking functionality and various advanced topics such as object detection, navigation, etc. 32 | * [[MIT] 2.166 Duckietown](http://duckietown.mit.edu/index.html) - A class about the science of autonomy at the graduate level. This is a hands-on, project-focused course focusing on self-driving vehicles and high-level autonomy. The problem: **Design the Autonomous Robo-Taxis System for the City of Duckietown.** 33 | * [[Coursera] Self-Driving Cars](https://www.coursera.org/specializations/self-driving-cars#about) - A 4-course specialization about Self-Driving Cars by the University of Toronto.
It covers Introduction, State Estimation & Localization, Visual Perception, and Motion Planning. -------------------------------------------------------------------------------- /resources/datasets.md: -------------------------------------------------------------------------------- 1 | ## Datasets 2 | 3 | > 4 | 5 | * [Udacity](https://github.com/udacity/self-driving-car/tree/master/datasets) - Udacity driving datasets released for the [Udacity Challenges](https://www.udacity.com/self-driving-car). Contains ROSBAG training data (~80 GB). 6 | * [Comma.ai](https://archive.org/details/comma-dataset) - 7 and a quarter hours of largely highway driving. Consists of 10 video clips of variable size recorded at 20 Hz with a camera mounted on the windshield of an Acura ILX 2016. In parallel to the videos, measurements such as the car's speed, acceleration, steering angle, GPS coordinates and gyroscope angles were also recorded. These measurements are transformed into a uniform 100 Hz time base. 7 | * [Oxford's Robotic Car](http://robotcar-dataset.robots.ox.ac.uk/) - over 100 repetitions of a consistent route through Oxford, UK, captured over a period of more than a year. The dataset captures many different combinations of weather, traffic and pedestrians, along with longer-term changes such as construction and roadworks. 8 | * [KITTI Vision Benchmark Suite](http://www.cvlibs.net/datasets/kitti/raw_data.php) - 6 hours of traffic scenarios at 10-100 Hz using a variety of sensor modalities such as high-resolution 9 | color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. 10 | * [University of Michigan North Campus Long-Term Vision and LIDAR Dataset](http://robots.engin.umich.edu/nclt/) - consists of omnidirectional imagery, 3D lidar, planar lidar, GPS, and proprioceptive 11 | sensors for odometry collected using a Segway robot. 12 | * [University of Michigan Ford Campus Vision and Lidar Data Set](http://robots.engin.umich.edu/SoftwareData/Ford) - dataset collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck. The vehicle is outfitted with a professional (Applanix POS LV) and consumer (Xsens MTI-G) Inertial Measurement Unit (IMU), a Velodyne 3D-lidar scanner, two push-broom forward-looking Riegl lidars, and a Point Grey Ladybug3 omnidirectional camera system. 13 | * [DIPLECS Autonomous Driving Datasets (2015)](http://cvssp.org/data/diplecs/) - the dataset was recorded by placing an HD camera in a car driving around the Surrey countryside. The dataset contains about 30 minutes of driving. The video is 1920x1080 in colour, encoded using the H.264 codec. Steering is estimated by tracking markers on the steering wheel. The car's speed is estimated by OCR of the car's speedometer (but the accuracy of the method is not guaranteed). 14 | * [Velodyne SLAM Dataset from Karlsruhe Institute of Technology](http://www.mrt.kit.edu/z/publ/download/velodyneslam/dataset.html) - two challenging datasets recorded with the Velodyne HDL64E-S2 scanner in the city of Karlsruhe, Germany. 15 | * [SYNTHetic collection of Imagery and Annotations (SYNTHIA)](http://synthia-dataset.net/) - consists of a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes: misc, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, lanemarking.
16 | * [Cityscapes Dataset](https://www.cityscapes-dataset.com/) - focuses on semantic understanding of urban street scenes. A large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. Details on the annotated classes and examples of the annotations are available. 17 | * [CSSAD Dataset](http://aplicaciones.cimat.mx/Personal/jbhayet/ccsad-dataset) - Several real-world stereo datasets exist for the development and testing of algorithms in the fields of perception and navigation of autonomous vehicles. However, none of them was recorded in developing countries and therefore they lack the particular characteristics that can be found in their streets and roads, like abundant potholes, speed bumps and peculiar flows of pedestrians. This stereo dataset was recorded from a moving vehicle and contains high-resolution stereo images which are complemented with orientation and acceleration data obtained from an IMU, GPS data, and data from the car computer. 18 | * [Daimler Urban Segmentation Dataset](http://www.6d-vision.com/scene-labeling) - consists of video sequences recorded in urban traffic, with 5000 rectified stereo image pairs at a resolution of 1024x440. 500 frames (every 10th frame of the sequence) come with pixel-level semantic class annotations for 5 classes: ground, building, vehicle, pedestrian, sky. Dense disparity maps are provided as a reference; however, these are not manually annotated but computed using semi-global matching (SGM). 19 | * [Self Racing Cars - XSens/Fairchild Dataset](http://data.selfracingcars.com/) - The files include measurements from the Fairchild FIS1100 6 Degree of Freedom (DoF) IMU, the Fairchild FMT-1030 AHRS, the Xsens MTi-3 AHRS, and the Xsens MTi-G-710 GNSS/INS. The files from the event can all be read in the MT Manager software, available as part of the MT Software Suite. 20 | * [MIT AGE Lab](http://lexfridman.com/automated-synchronization-of-driving-data-video-audio-telemetry-accelerometer/) - a small sample of the 1,000+ hours of multi-sensor driving datasets collected at AgeLab. 21 | * [Yet Another Computer Vision Index To Datasets (YACVID)](http://yacvid.hayko.at/) - a list of frequently used computer vision datasets. 22 | * [KUL Belgium Traffic Sign Dataset](http://www.vision.ee.ethz.ch/~timofter/traffic_signs/) - a large dataset with 10000+ traffic sign annotations and thousands of physically distinct traffic signs. 4 video sequences recorded with 8 high-resolution cameras mounted on a van, totalling more than 3 hours, with traffic sign annotations, camera calibrations and poses. About 16000 background images. The material is captured in Belgium, in urban environments from the Flanders region, by GeoAutomation. 23 | * [LISA: Laboratory for Intelligent & Safe Automobiles, UC San Diego Datasets](http://cvrr.ucsd.edu/LISA/datasets.html) - traffic signs, vehicle detection, traffic lights, trajectory patterns. 24 | * [Multisensory Omni-directional Long-term Place Recognition (MOLP) dataset for autonomous driving](http://hcr.mines.edu/code/MOLP.html) - recorded using omni-directional stereo cameras over one year in Colorado, USA.
[paper](https://arxiv.org/abs/1704.05215) 25 | * [Lane Instance Segmentation in Urban Environments](https://five.ai/datasets) Semi-automated method for labelling lane instances. 24,000 image set available. [paper](https://arxiv.org/pdf/1807.01347.pdf) 26 | * [Foggy Zurich Dataset](https://www.vision.ee.ethz.ch/~csakarid/Model_adaptation_SFSU_dense/) Curriculum Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding. 3.8k high-quality foggy images in and around Zurich. [paper](https://arxiv.org/abs/1901.01415) 27 | * [SullyChen AutoPilot Dataset](https://github.com/SullyChen/Autopilot-TensorFlow) Dataset collected by SullyChen in and around California. 28 | * [Waymo Training and Validation Data](https://waymo.com/open) One terabyte of data with 3D and 2D labels. 29 | * [Intel's dataset for AD conditions in India](https://www.intel.ai/iiit-hyderabad-and-intel-release-worlds-first-dataset-for-driving-in-india/#gs.28pnw5) A dataset for Autonomous Driving conditions in India with segmented annotations (10k), by Intel & IIIT Hyderabad. 30 | * [nuScenes Dataset](https://www.nuscenes.org/) A large dataset with 1,400,000 images and 390,000 lidar sweeps from Boston and Singapore. Provides manually generated 3D bounding boxes for 23 object classes. 31 | * [German Traffic Sign Dataset](http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset) A large dataset of German traffic sign recognition data (GTSRB) with more than 40 classes in 50k images and detection data (GTSDB) with 900 image annotations. 32 | * [Swedish Traffic Sign Dataset](https://www.cvl.isy.liu.se/research/datasets/traffic-signs-dataset/) A dataset with traffic signs recorded on 350 km of Swedish roads, consisting of 20k+ images, of which about 20% are annotated. 33 | -------------------------------------------------------------------------------- /resources/deep-learning.md: -------------------------------------------------------------------------------- 1 | 2 | ## Deep Learning Basics 3 | 4 | - [Official PyTorch tutorials](http://pytorch.org/tutorials/) for more tutorials (some of these tutorials are included there) 5 | - [apachecn/MachineLearning](https://github.com/apachecn/MachineLearning) 6 | - [Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow](https://github.com/dennybritz/reinforcement-learning) 7 | - [lawlite19/DeepLearning_Python](https://github.com/lawlite19/DeepLearning_Python) 8 | - [A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks](https://github.com/rasbt/pattern_classification) 9 | - [Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech](https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap) 10 | - [Content for Udacity's Machine Learning curriculum](https://github.com/udacity/machine-learning) 11 | - [This is the lab repository of my honours degree project on machine learning](https://github.com/ShokuninSan/machine-learning) 12 | - [A curated list of awesome Machine Learning frameworks, libraries and software](https://github.com/josephmisiti/awesome-machine-learning) 13 | - [Bare bones Python implementations of some of the fundamental Machine Learning models and algorithms](https://github.com/eriklindernoren/ML-From-Scratch) 14 | - [The "Python Machine Learning" book code repository and info resource](https://github.com/rasbt/python-machine-learning-book) 15 | 16 | -------------------------------------------------------------------------------- /resources/images/overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/resources/images/overview.png --------------------------------------------------------------------------------