├── 01_finding_lane_lines ├── README.md ├── basic.py └── test_images │ └── solidWhiteRight.jpg ├── 02_traffic_sign_detector ├── .gitmodules ├── CMakeLists.txt ├── README.md ├── examples │ └── images │ │ ├── lombada.png │ │ ├── pare.jpg │ │ └── pedestre.jpg ├── src │ ├── detect.cpp │ ├── hog_detector.cpp │ ├── train_object_detector.cpp │ └── view_hog.cpp └── svm_detectors │ ├── lombada_detector.svm │ ├── pare_detector.svm │ └── pedestre_detector.svm ├── 03_opencv_detection ├── CMakeLists.txt ├── README.md ├── input │ ├── object_detection_classes_coco.txt │ ├── ssd_mobilenet_v2_coco_2018_03_29.pbtxt.txt │ └── video_1.mp4 └── main.cpp ├── 04_vehicle_detection ├── README.md ├── SSD.py ├── config.py ├── data │ ├── feat_extraction_params.pickle │ ├── feature_scaler.pickle │ └── svm_trained.pickle ├── functions_detection.py ├── functions_feat_extraction.py ├── functions_utils.py ├── img │ ├── car_samples.png │ ├── confidence_001.png │ ├── confidence_050.png │ ├── hog_car_vs_noncar.jpg │ ├── noncar_samples.png │ └── pipeline_hog.jpg ├── main_hog.py ├── main_ssd.py ├── output_images │ ├── test1.jpg │ ├── test2.jpg │ ├── test3.jpg │ ├── test4.jpg │ ├── test5.jpg │ └── test6.jpg ├── process_video.py ├── project_5_utils.py ├── test_images │ ├── test1.jpg │ ├── test2.jpg │ ├── test3.jpg │ ├── test4.jpg │ ├── test5.jpg │ └── test6.jpg ├── train.py └── vehicle.py ├── 05_road_segmentation ├── README.md ├── __init__.py ├── helper.py ├── image_augmentation.py ├── img │ ├── example.png │ └── overview.jpg ├── main.py └── project_tests.py ├── LICENSE ├── README.md ├── faster-rcnn-tutorial ├── README.md ├── custom_dataset.py ├── dataset │ ├── README.md │ ├── download_dataset.sh │ └── result.png ├── detection │ ├── __init__.py │ ├── anchor_generator.py │ ├── backbone_resnet.py │ ├── faster_RCNN.py │ ├── transformations.py │ └── utils.py ├── inference.ipynb ├── metrics │ ├── README.md │ ├── __init__.py │ ├── bounding_box.py │ ├── enumerators.py │ ├── general_utils.py │ ├── pascal_voc_evaluator.py │ └── pascal_voc_evaluator_test.py ├── setup.py └── train.py └── resources ├── autonomous-driving.md ├── datasets.md ├── deep-learning.md └── images └── overview.png /01_finding_lane_lines/README.md: -------------------------------------------------------------------------------- 1 | 2 | # Finding Lane Lines on the Road 3 | 4 | 5 | ## Basic Lane Finding Project 6 | 7 | For a self driving vehicle to stay in a lane, the first step is to identify lane lines before issuing commands to the control system. Since the lane lines can be of different colors (white, yellow) or forms (solid, dashed) this seemingly trivial task becomes increasingly difficult. Moreover, the situation is further exacerbated with variations in lighting conditions. Thankfully, there are a number of mathematical tools and approaches available nowadays to effectively extract lane lines from an image or dashcam video. 8 | 9 | ### Methodology 10 | 11 | Before attempting to detect lane lines in a video, a software pipeline is developed for lane detection in a series of images. Only after ensuring that it works satisfactorily for test images, the pipeline is employed for lane detection in a video. 12 | 13 | Consider the test image given below: 14 | 15 | ![](./test_images/solidWhiteRight.jpg) 16 | 17 | 1. The test image is first converted to grayscale from RGB using the helper function grayscale(). 18 | 19 | 2. The grayscaled image is given a gaussian blur to remove noise or spurious gradients. 20 | 21 | 3. 
Canny edge detection is applied on this blurred image and a binary image 22 | 23 | 4. A region of interest is defined to separate the lanes from sorrounding environment and a masked image containing only the lanes is extracted using cv2.bitwise_and() function. 24 | 25 | 5. This binary image of identified lane lines is finally merged with the original image using cv2.addweighted() function. 26 | 27 | 28 | - **Advanced:** Built an advanced lane-finding algorithm using distortion correction, image rectification, color transforms, and gradient thresholding. Identified lane curvature and vehicle displacement. Overcame environmental challenges such as shadows and pavement changes. 29 | 30 | [Advanced Lane Finding Project](https://github.com/vsingla2/Self-Driving-Car-NanoDegree-Udacity/blob/master/Term1-Computer-Vision-and-Deep-Learning/Project4-Advanced-Lane_Lines/Advanced-Lane-Lines.ipynb) -------------------------------------------------------------------------------- /01_finding_lane_lines/basic.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import matplotlib.image as mpimg 3 | import numpy as np 4 | import cv2 5 | import math 6 | import os 7 | 8 | # https://github.com/vsingla2/Self-Driving-Car-NanoDegree-Udacity/tree/master/Term1-Computer-Vision-and-Deep-Learning/Project1-Finding-Lane-Lines 9 | 10 | def grayscale(img): 11 | """Applies the Grayscale transform 12 | This will return an image with only one color channel 13 | but NOTE: to see the returned image as grayscale 14 | (assuming your grayscaled image is called 'gray') 15 | you should call plt.imshow(gray, cmap='gray')""" 16 | return cv2.cvtColor(img, cv2.COLOR_RGB2GRAY) 17 | # Or use BGR2GRAY if you read an image with cv2.imread() 18 | # return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 19 | 20 | def rgbtohsv(img): 21 | "Applies rgb to hsv transform" 22 | return cv2.cvtColor(img, cv2.COLOR_RGB2HSV) 23 | 24 | def canny(img, low_threshold, high_threshold): 25 | """Applies the Canny transform""" 26 | return cv2.Canny(img, low_threshold, high_threshold) 27 | 28 | def gaussian_blur(img, kernel_size): 29 | """Applies a Gaussian Noise kernel""" 30 | return cv2.GaussianBlur(img, (kernel_size, kernel_size), 0) 31 | 32 | def region_of_interest(img, vertices): 33 | """ 34 | Applies an image mask. 35 | 36 | Only keeps the region of the image defined by the polygon 37 | formed from `vertices`. The rest of the image is set to black. 38 | """ 39 | #defining a blank mask to start with 40 | mask = np.zeros_like(img) 41 | 42 | #defining a 3 channel or 1 channel color to fill the mask with depending on the input image 43 | if len(img.shape) > 2: 44 | channel_count = img.shape[2] # i.e. 3 or 4 depending on your image 45 | ignore_mask_color = (255,) * channel_count 46 | else: 47 | ignore_mask_color = 255 48 | 49 | #filling pixels inside the polygon defined by "vertices" with the fill color 50 | cv2.fillPoly(mask, vertices, ignore_mask_color) 51 | 52 | #returning the image only where mask pixels are nonzero 53 | masked_image = cv2.bitwise_and(img, mask) 54 | return masked_image 55 | 56 | 57 | def draw_lines(img, lines, color=[200, 0, 0], thickness = 10): 58 | """ 59 | NOTE: this is the function you might want to use as a starting point once you want to 60 | average/extrapolate the line segments you detect to map out the full 61 | extent of the lane (going from the result shown in raw-lines-example.mp4 62 | to that shown in P1_example.mp4). 
63 | 64 | Think about things like separating line segments by their 65 | slope ((y2-y1)/(x2-x1)) to decide which segments are part of the left 66 | line vs. the right line. Then, you can average the position of each of 67 | the lines and extrapolate to the top and bottom of the lane. 68 | 69 | This function draws `lines` with `color` and `thickness`. 70 | Lines are drawn on the image inplace (mutates the image). 71 | If you want to make the lines semi-transparent, think about combining 72 | this function with the weighted_img() function below 73 | """ 74 | x_left = [] 75 | y_left = [] 76 | x_right = [] 77 | y_right = [] 78 | imshape = image.shape 79 | ysize = imshape[0] 80 | ytop = int(0.6*ysize) # need y coordinates of the top and bottom of left and right lane 81 | ybtm = int(ysize) # to calculate x values once a line is found 82 | 83 | for line in lines: 84 | for x1,y1,x2,y2 in line: 85 | slope = float(((y2-y1)/(x2-x1))) 86 | if (slope > 0.5): # if the line slope is greater than tan(26.52 deg), it is the left line 87 | x_left.append(x1) 88 | x_left.append(x2) 89 | y_left.append(y1) 90 | y_left.append(y2) 91 | if (slope < -0.5): # if the line slope is less than tan(153.48 deg), it is the right line 92 | x_right.append(x1) 93 | x_right.append(x2) 94 | y_right.append(y1) 95 | y_right.append(y2) 96 | # only execute if there are points found that meet criteria, this eliminates borderline cases i.e. rogue frames 97 | if (x_left!=[]) & (x_right!=[]) & (y_left!=[]) & (y_right!=[]): 98 | left_line_coeffs = np.polyfit(x_left, y_left, 1) 99 | left_xtop = int((ytop - left_line_coeffs[1])/left_line_coeffs[0]) 100 | left_xbtm = int((ybtm - left_line_coeffs[1])/left_line_coeffs[0]) 101 | right_line_coeffs = np.polyfit(x_right, y_right, 1) 102 | right_xtop = int((ytop - right_line_coeffs[1])/right_line_coeffs[0]) 103 | right_xbtm = int((ybtm - right_line_coeffs[1])/right_line_coeffs[0]) 104 | cv2.line(img, (left_xtop, ytop), (left_xbtm, ybtm), color, thickness) 105 | cv2.line(img, (right_xtop, ytop), (right_xbtm, ybtm), color, thickness) 106 | 107 | def hough_lines(img, rho, theta, threshold, min_line_len, max_line_gap): 108 | """ 109 | `img` should be the output of a Canny transform. 110 | 111 | Returns an image with hough lines drawn. 112 | """ 113 | lines = cv2.HoughLinesP(img, rho, theta, threshold, np.array([]), minLineLength=min_line_len, maxLineGap=max_line_gap) 114 | line_img = np.zeros((img.shape[0], img.shape[1], 3), dtype=np.uint8) 115 | draw_lines(line_img, lines) 116 | return line_img 117 | 118 | # Python 3 has support for cool math symbols. 119 | 120 | def weighted_img(img, initial_img, α=0.8, β=1., λ=0.): 121 | """ 122 | `img` is the output of the hough_lines(), An image with lines drawn on it. 123 | Should be a blank image (all black) with lines drawn on it. 124 | 125 | `initial_img` should be the image before any processing. 126 | 127 | The result image is computed as follows: 128 | 129 | initial_img * α + img * β + λ 130 | NOTE: initial_img and img must be the same shape! 
131 | """ 132 | return cv2.addWeighted(initial_img, α, img, β, λ) 133 | 134 | 135 | #reading in an image 136 | image = mpimg.imread('test_images/solidWhiteRight.jpg') 137 | 138 | #printing out some stats and plotting 139 | print('This image is:', type(image), 'with dimensions:', image.shape) 140 | 141 | # if you wanted to show a single color channel image called 'gray', for example, call as plt.imshow(gray, cmap='gray') 142 | plt.imshow(image) 143 | 144 | test_images_list = os.listdir("test_images/") 145 | 146 | 147 | # define parameters needed for helper functions (given inline) 148 | kernel_size = 5 # gaussian blur 149 | low_threshold = 60 # canny edge detection 150 | high_threshold = 180 # canny edge detection 151 | # Define the Hough transform parameters 152 | rho = 1 # distance resolution in pixels of the Hough grid 153 | theta = np.pi/180 # angular resolution in radians of the Hough grid 154 | threshold = 20 # minimum number of votes (intersections in Hough grid cell) 155 | min_line_length = 40 # minimum number of pixels making up a line 156 | max_line_gap = 25 # maximum gap in pixels between connectable line segments 157 | 158 | for test_image in test_images_list: # iterating through the images in test_images folder 159 | image = mpimg.imread('test_images/' + test_image) # reading in an image 160 | gray = grayscale(image) # convert to grayscale 161 | blur_gray = gaussian_blur(gray, kernel_size) # add gaussian blur to remove noise 162 | edges = canny(blur_gray, low_threshold, high_threshold) # perform canny edge detection 163 | # extract image size and define vertices of the four sided polygon for masking 164 | imshape = image.shape 165 | xsize = imshape[1] 166 | ysize = imshape[0] 167 | vertices = np.array([[(0.05*xsize, ysize ),(0.44*xsize, 0.6*ysize),\ 168 | (0.55*xsize, 0.6*ysize), (0.95*xsize, ysize)]], dtype=np.int32) # 169 | masked_edges = region_of_interest(edges, vertices) # retain information only in the region of interest 170 | line_image = hough_lines(masked_edges, rho, theta, threshold,\ 171 | min_line_length, max_line_gap) # perform hough transform and retain lines with specific properties 172 | lines_edges = weighted_img(line_image, image, α=0.8, β=1., λ=0.) # Draw the lines on the edge image 173 | plt.imshow(lines_edges) # Display the image 174 | plt.show() 175 | mpimg.imsave('test_images_output/' + test_image, lines_edges) # save the resulting image -------------------------------------------------------------------------------- /01_finding_lane_lines/test_images/solidWhiteRight.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/01_finding_lane_lines/test_images/solidWhiteRight.jpg -------------------------------------------------------------------------------- /02_traffic_sign_detector/.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "dlib"] 2 | path = dlib 3 | url = https://github.com/davisking/dlib 4 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | # 2 | # This is a CMake makefile. 
You can find the cmake utility and 3 | # information about it at http://www.cmake.org 4 | # 5 | 6 | cmake_minimum_required(VERSION 2.8.4) 7 | 8 | PROJECT(transito-cv) 9 | 10 | include(dlib/dlib/cmake) 11 | 12 | set(CMAKE_BUILD_TYPE Release) 13 | set(DLIB_NO_GUI_SUPPORT OFF) 14 | 15 | option(USE_AVX_INSTRUCTIONS "Compile your program with AVX instructions" OFF) 16 | 17 | IF(USE_AVX_INSTRUCTIONS) 18 | add_definitions(-mavx) 19 | add_definitions(-march=native) 20 | ENDIF() 21 | 22 | MACRO(add_source name) 23 | ADD_EXECUTABLE(${name} src/${name}.cpp) 24 | TARGET_LINK_LIBRARIES(${name} dlib ) 25 | ENDMACRO() 26 | 27 | add_source(hog_detector) 28 | add_source(train_object_detector) 29 | add_source(detect) 30 | add_source(view_hog) 31 | 32 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | This is a traffic sign detector and classifier that uses [dlib](http://dlib.net/) and its implementation of the Felzenszwalb's version of the Histogram of Oriented Gradients (HoG) detector. 4 | 5 | The training examples used in this repository are from Brazilian road signs, but the classifier should work with any traffic signs, as long as you train it properly. Google Street View images can be used to train the detectors. 25~40 images are sufficient to train a good detector. 6 | 7 | ![](https://cloud.githubusercontent.com/assets/294960/7904020/7d216ae0-07c3-11e5-96fe-2b9d020fec4c.png) 8 | 9 | Note: all programs accept `-h` as command-line parameter to show a help message. 10 | 11 | ## Build 12 | 13 | ``` 14 | mkdir build 15 | (cd build; cmake .. && cmake --build .) 16 | ``` 17 | 18 | If you want to enable AVX instructions (make sure you have compatibility): 19 | 20 | ``` 21 | (cd build; cmake .. -DUSE_AVX_INSTRUCTIONS=ON && cmake --build .) 22 | ``` 23 | 24 | ## Mark signs on images 25 | 26 | 1. Compile `imglab`: 27 | 28 | ``` 29 | cd dlib/tools/imglab 30 | mkdir build 31 | cd build 32 | cmake .. 33 | cmake --build . 34 | ``` 35 | 36 | 2. Create XML from sample images and Train the fHOG detector: 37 | 38 | Please check [transito-cv](https://github.com/fabioperez/transito-cv), there are methods with better results than HoG for traffic sign detector, such as Deep Learning architectures. 39 | 40 | 41 | ## Visualize HOG detectors 42 | 43 | To visualize detectors, use the program `view_hog`. Usage: 44 | 45 | ``` 46 | build/view_hog svm_detectors/pare_detector.svm 47 | ``` 48 | 49 | ![image](https://cloud.githubusercontent.com/assets/294960/8306983/6fa2ca40-1992-11e5-905d-04260fbfe128.png) 50 | 51 | 52 | ## Detect and Classify 53 | 54 | To detect and classify images, run `detect` with the video frames as parameters, use the parameter `--wait` to wait for user input to show next image. 
55 | 56 | ``` 57 | build/detect --wait examples/images/*.jpg 58 | ``` 59 | 60 | ## Examples 61 | 62 | To run the examples: 63 | 64 | ``` 65 | build/detect --wait -u1 examples/images/* 66 | ``` 67 | 68 | ![image6](https://cloud.githubusercontent.com/assets/294960/8306981/6ef3e142-1992-11e5-91b0-e753737bcb5f.png) 69 | ![image7](https://cloud.githubusercontent.com/assets/294960/8306982/6f7f22c0-1992-11e5-8c2e-4079ddffec47.png) 70 | ![image8](https://cloud.githubusercontent.com/assets/294960/8306980/6edb6ae0-1992-11e5-9d77-ddbd0cd59a7b.png) 71 | 72 | 73 | ## References 74 | 75 | - 76 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/examples/images/lombada.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/examples/images/lombada.png -------------------------------------------------------------------------------- /02_traffic_sign_detector/examples/images/pare.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/examples/images/pare.jpg -------------------------------------------------------------------------------- /02_traffic_sign_detector/examples/images/pedestre.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/examples/images/pedestre.jpg -------------------------------------------------------------------------------- /02_traffic_sign_detector/src/detect.cpp: -------------------------------------------------------------------------------- 1 | /* HOG DETECTOR 2 | * 3 | */ 4 | 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | #include 13 | #include 14 | #include 15 | 16 | using namespace std; 17 | using namespace dlib; 18 | 19 | struct TrafficSign { 20 | string name; 21 | string svm_path; 22 | rgb_pixel color; 23 | TrafficSign(string name, string svm_path, rgb_pixel color) : 24 | name(name), svm_path(svm_path), color(color) {}; 25 | }; 26 | 27 | int main(int argc, char** argv) { 28 | try { 29 | command_line_parser parser; 30 | 31 | parser.add_option("h","Display this help message."); 32 | parser.add_option("u", "Upsample each input image times. Each \ 33 | upsampling quadruples the number of pixels in the image \ 34 | (default: 0).", 1); 35 | parser.add_option("wait","Wait user input to show next image."); 36 | 37 | parser.parse(argc, argv); 38 | parser.check_option_arg_range("u", 0, 8); 39 | 40 | const char* one_time_opts[] = {"h","u","wait"}; 41 | parser.check_one_time_options(one_time_opts); 42 | 43 | // Display help message 44 | if (parser.option("h")) { 45 | cout << "Usage: " << argv[0] << " [options] " << endl; 46 | parser.print_options(); 47 | 48 | return EXIT_SUCCESS; 49 | } 50 | 51 | if (parser.number_of_arguments() == 0) { 52 | cout << "You must give a list of input files." << endl; 53 | cout << "\nTry the -h option for more information." 
<< endl; 54 | return EXIT_FAILURE; 55 | } 56 | 57 | const unsigned long upsample_amount = get_option(parser, "u", 0); 58 | 59 | dlib::array > images; 60 | 61 | images.resize(parser.number_of_arguments()); 62 | 63 | for (unsigned long i = 0; i < images.size(); ++i) { 64 | load_image(images[i], parser[i]); 65 | } 66 | 67 | for (unsigned long i = 0; i < upsample_amount; ++i) { 68 | for (unsigned long j = 0; j < images.size(); ++j) { 69 | pyramid_up(images[j]); 70 | } 71 | } 72 | 73 | typedef scan_fhog_pyramid > image_scanner_type; 74 | 75 | // Load SVM detectors 76 | std::vector signs; 77 | signs.push_back(TrafficSign("PARE", "svm_detectors/pare_detector.svm", 78 | rgb_pixel(255,0,0))); 79 | signs.push_back(TrafficSign("LOMBADA", "svm_detectors/lombada_detector.svm", 80 | rgb_pixel(255,122,0))); 81 | signs.push_back(TrafficSign("PEDESTRE", "svm_detectors/pedestre_detector.svm", 82 | rgb_pixel(255,255,0))); 83 | 84 | std::vector > detectors; 85 | 86 | for (int i = 0; i < signs.size(); i++) { 87 | object_detector detector; 88 | deserialize(signs[i].svm_path) >> detector; 89 | detectors.push_back(detector); 90 | } 91 | 92 | image_window win; 93 | std::vector rects; 94 | for (unsigned long i = 0; i < images.size(); ++i) { 95 | evaluate_detectors(detectors, images[i], rects); 96 | 97 | // Put the image and detections into the window. 98 | win.clear_overlay(); 99 | win.set_image(images[i]); 100 | 101 | for (unsigned long j = 0; j < rects.size(); ++j) { 102 | win.add_overlay(rects[j].rect, signs[rects[j].weight_index].color, 103 | signs[rects[j].weight_index].name); 104 | } 105 | 106 | if (parser.option("wait")) { 107 | cout << "Press any key to continue..."; 108 | cin.get(); 109 | } 110 | } 111 | } 112 | catch (exception& e) { 113 | cout << "\nexception thrown!" << endl; 114 | cout << e.what() << endl; 115 | } 116 | } 117 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/src/hog_detector.cpp: -------------------------------------------------------------------------------- 1 | /* HOG DETECTOR TRAINER 2 | * This program trains a fHOG detector. 3 | * For help, run ./hog_detector -h 4 | * 5 | * Sample usage: 6 | * ./hog_detector -u1 --filter 0.4 -v images/pare 7 | * 8 | * To better understand the code of this detector, read the following example codes: 9 | * http://dlib.net/fhog_object_detector_ex.cpp.html 10 | * http://dlib.net/train_object_detector.cpp.html 11 | */ 12 | 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | 20 | #include 21 | #include 22 | #include 23 | 24 | using namespace std; 25 | using namespace dlib; 26 | 27 | int main(int argc, char** argv) { 28 | try { 29 | command_line_parser parser; 30 | parser.add_option("h","Display this help message."); 31 | parser.add_option("c","Set the SVM C parameter to (default: 1.0).",1); 32 | parser.add_option("u", "Upsample each input image times. 
Each upsampling quadruples the number of pixels in the image (default: 0).", 1); 33 | parser.add_option("v","Be verbose."); 34 | parser.add_option("filter","Remove filters with singular value less than (default: disabled).", 1); 35 | parser.add_option("detector-name","Save SVM detector to (default: 'detector.svm').", 1); 36 | parser.add_option("threads", "Use threads for training (default: 4).",1); 37 | parser.add_option("eps", "Set SVM training epsilon to (default: 0.01).", 1); 38 | parser.add_option("norm", "If set, the nuclear norm regularization strength will be (default: disabled).", 1); 39 | 40 | // TODO: Variable window size 41 | #if 0 42 | parser.add_option("w","Set window size to x pixels (default: 80x80.", 2); 43 | #endif 44 | 45 | parser.parse(argc, argv); 46 | 47 | // Can't give an option more than once 48 | const char* one_time_opts[] = {"h","c","u","v","detector-name", "threads", "eps","filter","norm"}; 49 | parser.check_one_time_options(one_time_opts); 50 | 51 | // Check parameters values 52 | parser.check_option_arg_range("c", 1e-12, 1e12); 53 | parser.check_option_arg_range("u", 0, 8); 54 | parser.check_option_arg_range("threads", 1, 1000); 55 | parser.check_option_arg_range("eps", 1e-5, 1e4); 56 | parser.check_option_arg_range("filter", 0.0, 2.0); 57 | parser.check_option_arg_range("norm", 1e-12, 1e12); 58 | 59 | // Display help message 60 | if (parser.option("h")) { 61 | cout << "Usage: " << argv[0] << " [options] " << endl; 62 | cout << " must countain the files training.xml and testing.xml." << endl; 63 | parser.print_options(); 64 | 65 | return EXIT_SUCCESS; 66 | } 67 | 68 | if (parser.number_of_arguments() == 0) { 69 | cout << "You must give an image or an image dataset metadata XML file produced by the imglab tool." << endl; 70 | cout << "\nTry the -h option for more information." 
<< endl; 71 | return EXIT_FAILURE; 72 | } 73 | 74 | // Declarations and parameters 75 | const std::string dir = parser[0]; 76 | dlib::array > images_train, images_test; 77 | std::vector > sign_boxes_train, sign_boxes_test; 78 | const double c_val = get_option(parser, "c", 1.0); 79 | const unsigned long upsample_amount = get_option(parser, "u", 0); 80 | const std::string detector_name = get_option(parser, "detector-name", "detector.svm"); 81 | const int num_threads = get_option(parser, "threads", 4); 82 | const double eps = get_option(parser, "eps", 0.01); 83 | const double filter_val = get_option(parser, "filter", 0.0); 84 | const double norm = get_option(parser, "norm", 0.0); 85 | 86 | cout << "Training with the following parameters: " << endl; 87 | cout << " threads: "<< num_threads << endl; 88 | cout << " C: "<< c_val << endl; 89 | cout << " epsilon: "<< eps << endl; 90 | cout << " upsample this many times : "<< upsample_amount << endl; 91 | cout << " filter threshold : "<< filter_val << endl; 92 | cout << " NNR strenght : "<< norm << endl; 93 | 94 | // Load training and testing datasets 95 | load_image_dataset(images_train, sign_boxes_train, dir+"/training.xml"); 96 | load_image_dataset(images_test, sign_boxes_test, dir+"/testing.xml"); 97 | 98 | // Upsample images (set by -u parameters) 99 | for (unsigned long i = 0; i < upsample_amount; ++i) { 100 | upsample_image_dataset >(images_train, sign_boxes_train); 101 | upsample_image_dataset >(images_test, sign_boxes_test); 102 | } 103 | 104 | // Create fHOG scanner 105 | typedef scan_fhog_pyramid > image_scanner_type; 106 | image_scanner_type scanner; 107 | if (norm > 0.0) scanner.set_nuclear_norm_regularization_strength(norm); 108 | scanner.set_detection_window_size(80, 80); 109 | structural_object_detection_trainer trainer(scanner); 110 | trainer.set_num_threads(num_threads); // Number of working threads 111 | trainer.set_c(c_val); // SVM C-value 112 | if (parser.option("v")) trainer.be_verbose(); 113 | trainer.set_epsilon(eps); 114 | object_detector detector = trainer.train(images_train, sign_boxes_train); 115 | 116 | if (filter_val > 0.0) { 117 | int num_filters_before = num_separable_filters(detector); 118 | detector = threshold_filter_singular_values(detector,filter_val); 119 | cout << num_filters_before-num_separable_filters(detector) << " filters were removed." << endl; 120 | } 121 | 122 | // Test results on training and testing dataset 123 | cout << "training results: " << test_object_detection_function(detector, images_train, sign_boxes_train) << endl; 124 | cout << "testing results: " << test_object_detection_function(detector, images_test, sign_boxes_test) << endl; 125 | 126 | // Save detector to disk 127 | serialize(detector_name) << detector; 128 | } 129 | catch (exception& e) { 130 | cout << "\nexception thrown!" << endl; 131 | cout << e.what() << endl; 132 | } 133 | } 134 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/src/view_hog.cpp: -------------------------------------------------------------------------------- 1 | /* Visualise a fHOG detector. 2 | * This program takes a fHOG detector as input and displays it in a window. 
3 | * 4 | * Usage: 5 | * ./view_hog detector.svm 6 | */ 7 | 8 | #include 9 | #include 10 | #include 11 | 12 | #include 13 | #include 14 | #include 15 | 16 | using namespace std; 17 | using namespace dlib; 18 | 19 | int main(int argc, char** argv) { 20 | try { 21 | command_line_parser parser; 22 | parser.add_option("h","Display this help message."); 23 | 24 | parser.parse(argc, argv); 25 | const char* one_time_opts[] = {"h"}; 26 | parser.check_one_time_options(one_time_opts); 27 | 28 | // Display help message 29 | if (parser.option("h")) { 30 | cout << "Usage: " << argv[0] << " [options] " << endl; 31 | parser.print_options(); 32 | 33 | return EXIT_SUCCESS; 34 | } 35 | 36 | if (parser.number_of_arguments() == 0) { 37 | cout << "You must give a fHOG SVM file as input." << endl; 38 | cout << "\nTry the -h option for more information." << endl; 39 | return EXIT_FAILURE; 40 | } 41 | 42 | typedef scan_fhog_pyramid > image_scanner_type; 43 | object_detector detector; 44 | deserialize(argv[1]) >> detector; 45 | image_window hogwin(draw_fhog(detector), "Learned fHOG detector"); 46 | cout << "Press any key to exit!" << endl; 47 | cin.get(); // Wait input to exit 48 | } 49 | catch (exception& e) { 50 | cout << "\nexception thrown!" << endl; 51 | cout << e.what() << endl; 52 | } 53 | } 54 | -------------------------------------------------------------------------------- /02_traffic_sign_detector/svm_detectors/lombada_detector.svm: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/svm_detectors/lombada_detector.svm -------------------------------------------------------------------------------- /02_traffic_sign_detector/svm_detectors/pare_detector.svm: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/svm_detectors/pare_detector.svm -------------------------------------------------------------------------------- /02_traffic_sign_detector/svm_detectors/pedestre_detector.svm: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/02_traffic_sign_detector/svm_detectors/pedestre_detector.svm -------------------------------------------------------------------------------- /03_opencv_detection/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | # Older versions of CMake are likely to work just fine but, since 2 | # I don't know where to cut off I just use the version I'm using 3 | cmake_minimum_required(VERSION "3.17") 4 | 5 | # name of this example project 6 | project(demo) 7 | 8 | # set OpenCV_DIR variable equal to the path to the cmake 9 | # files within the previously installed opencv program 10 | set(OpenCV_DIR /Users/ifding/Documents/Code/opencv/install/lib/cmake/opencv4) 11 | 12 | # Tell compiler to use C++ 17 features which is needed because 13 | # Clang version is often behind in the XCode installation 14 | set(CMAKE_CXX_STANDARD 17) 15 | 16 | # configure the necessary common CMake environment variables 17 | # needed to include and link the OpenCV program into this 18 | # demo project, namely OpenCV_INCLUDE_DIRS and OpenCV_LIBS 19 | find_package( OpenCV REQUIRED ) 20 | 21 | # tell the build to include the headers 
from OpenCV 22 | include_directories( ${OpenCV_INCLUDE_DIRS} ) 23 | 24 | # specify the executable target to be built 25 | add_executable(demo main.cpp) 26 | 27 | # tell it to link the executable target against OpenCV 28 | target_link_libraries(demo ${OpenCV_LIBS} ) -------------------------------------------------------------------------------- /03_opencv_detection/README.md: -------------------------------------------------------------------------------- 1 | 2 | # Object detection in OpenCV 3 | 4 | 5 | - Install OpenCV by following [here](https://thecodinginterface.com/blog/opencv-cpp-vscode/) 6 | 7 | - Download `frozen_inference_graph.pb` from [learnopencv](https://github.com/spmallick/learnopencv/tree/master/Deep-Learning-with-OpenCV-DNN-Module/input) 8 | 9 | - More models can be found in 10 | 11 | To run the code in C++: 12 | 13 | ```bash 14 | mkdir build 15 | cd build 16 | cmake .. 17 | cmake --build . --config Release 18 | cd .. 19 | ./build/demo 20 | ``` 21 | -------------------------------------------------------------------------------- /03_opencv_detection/input/object_detection_classes_coco.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorcycle 5 | airplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | street sign 13 | stop sign 14 | parking meter 15 | bench 16 | bird 17 | cat 18 | dog 19 | horse 20 | sheep 21 | cow 22 | elephant 23 | bear 24 | zebra 25 | giraffe 26 | hat 27 | backpack 28 | umbrella 29 | shoe 30 | eye glasses 31 | handbag 32 | tie 33 | suitcase 34 | frisbee 35 | skis 36 | snowboard 37 | sports ball 38 | kite 39 | baseball bat 40 | baseball glove 41 | skateboard 42 | surfboard 43 | tennis racket 44 | bottle 45 | plate 46 | wine glass 47 | cup 48 | fork 49 | knife 50 | spoon 51 | bowl 52 | banana 53 | apple 54 | sandwich 55 | orange 56 | broccoli 57 | carrot 58 | hot dog 59 | pizza 60 | donut 61 | cake 62 | chair 63 | couch 64 | potted plant 65 | bed 66 | mirror 67 | dining table 68 | window 69 | desk 70 | toilet 71 | door 72 | tv 73 | laptop 74 | mouse 75 | remote 76 | keyboard 77 | cell phone 78 | microwave 79 | oven 80 | toaster 81 | sink 82 | refrigerator 83 | blender 84 | book 85 | clock 86 | vase 87 | scissors 88 | teddy bear 89 | hair drier 90 | toothbrush 91 | -------------------------------------------------------------------------------- /03_opencv_detection/input/video_1.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/03_opencv_detection/input/video_1.mp4 -------------------------------------------------------------------------------- /03_opencv_detection/main.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | 7 | #include 8 | #include 9 | #include 10 | 11 | 12 | using namespace cv; 13 | using namespace dnn; 14 | 15 | float confThreshold = 0.4, nmsThreshold = 0.4; 16 | std::vector classes; 17 | 18 | inline void preprocess(const Mat& frame, Net& net, Size inpSize, float scale, 19 | const Scalar& mean, bool swapRB); 20 | 21 | void postprocess(Mat& frame, const std::vector& out, Net& net); 22 | 23 | void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame); 24 | 25 | 26 | template 27 | class QueueFPS : public std::queue 28 | { 29 | public: 30 | QueueFPS() : counter(0) {} 31 
| 32 | void push(const T& entry) 33 | { 34 | std::lock_guard lock(mutex); 35 | 36 | std::queue::push(entry); 37 | counter += 1; 38 | if (counter == 1) 39 | { 40 | // Start counting from a second frame (warmup). 41 | tm.reset(); 42 | tm.start(); 43 | } 44 | } 45 | 46 | T get() 47 | { 48 | std::lock_guard lock(mutex); 49 | T entry = this->front(); 50 | this->pop(); 51 | return entry; 52 | } 53 | 54 | float getFPS() 55 | { 56 | tm.stop(); 57 | double fps = counter / tm.getTimeSec(); 58 | tm.start(); 59 | return static_cast(fps); 60 | } 61 | 62 | void clear() 63 | { 64 | std::lock_guard lock(mutex); 65 | while (!this->empty()) 66 | this->pop(); 67 | } 68 | 69 | unsigned int counter; 70 | 71 | private: 72 | TickMeter tm; 73 | std::mutex mutex; 74 | }; 75 | 76 | 77 | int main(int argc, char** argv) 78 | { 79 | 80 | float scale = 1.0; 81 | Scalar mean = Scalar(104, 177, 123); 82 | bool swapRB = false; 83 | int inpWidth = 300; 84 | int inpHeight = 300; 85 | size_t asyncNumReq = 0; 86 | 87 | // Open file with classes names. 88 | std::string file = "./input/object_detection_classes_coco.txt"; 89 | std::ifstream ifs(file.c_str()); 90 | if (!ifs.is_open()) 91 | CV_Error(Error::StsError, "File " + file + " not found"); 92 | std::string line; 93 | while (std::getline(ifs, line)) 94 | { 95 | classes.push_back(line); 96 | } 97 | 98 | std::string modelPath = "./input/frozen_inference_graph.pb"; 99 | std::string configPath = "./input/ssd_mobilenet_v2_coco_2018_03_29.pbtxt.txt"; 100 | std::string framework = "TensorFlow"; 101 | 102 | // Load a model. 103 | Net net = readNet(modelPath, configPath, framework); 104 | std::vector outNames = net.getUnconnectedOutLayersNames(); 105 | 106 | // Create a window 107 | static const std::string kWinName = "Object detection in OpenCV"; 108 | namedWindow(kWinName, WINDOW_NORMAL); 109 | 110 | // Open a video file or an image file or a camera stream. 
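    // The sample below is hard-wired to the bundled video file; a device index
    // (e.g. VideoCapture cap(0);) can be passed instead to read from a camera.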
111 | //VideoCapture cap; 112 | VideoCapture cap("./input/video_1.mp4"); 113 | 114 | bool process = true; 115 | 116 | // Frames capturing thread 117 | QueueFPS framesQueue; 118 | std::thread framesThread([&](){ 119 | Mat frame; 120 | while (process) 121 | { 122 | cap >> frame; 123 | if (!frame.empty()) 124 | framesQueue.push(frame.clone()); 125 | else 126 | break; 127 | } 128 | }); 129 | 130 | // Frames processing thread 131 | QueueFPS processedFramesQueue; 132 | QueueFPS > predictionsQueue; 133 | std::thread processingThread([&](){ 134 | std::queue futureOutputs; 135 | Mat blob; 136 | while (process) 137 | { 138 | // Get a next frame 139 | Mat frame; 140 | { 141 | if (!framesQueue.empty()) 142 | { 143 | frame = framesQueue.get(); 144 | if (asyncNumReq) 145 | { 146 | if (futureOutputs.size() == asyncNumReq) 147 | frame = Mat(); 148 | } 149 | else 150 | framesQueue.clear(); // Skip the rest of frames 151 | } 152 | } 153 | 154 | // Process the frame 155 | if (!frame.empty()) 156 | { 157 | preprocess(frame, net, Size(inpWidth, inpHeight), scale, mean, swapRB); 158 | processedFramesQueue.push(frame); 159 | 160 | if (asyncNumReq) 161 | { 162 | futureOutputs.push(net.forwardAsync()); 163 | } 164 | else 165 | { 166 | std::vector outs; 167 | net.forward(outs, outNames); 168 | predictionsQueue.push(outs); 169 | } 170 | } 171 | 172 | while (!futureOutputs.empty() && 173 | futureOutputs.front().wait_for(std::chrono::seconds(0))) 174 | { 175 | AsyncArray async_out = futureOutputs.front(); 176 | futureOutputs.pop(); 177 | Mat out; 178 | async_out.get(out); 179 | predictionsQueue.push({out}); 180 | } 181 | } 182 | }); 183 | 184 | // Postprocessing and rendering loop 185 | while (waitKey(1) < 0) 186 | { 187 | if (predictionsQueue.empty()) 188 | continue; 189 | 190 | std::vector outs = predictionsQueue.get(); 191 | Mat frame = processedFramesQueue.get(); 192 | 193 | postprocess(frame, outs, net); 194 | 195 | if (predictionsQueue.counter > 1) 196 | { 197 | std::string label = format("Camera: %.2f FPS", framesQueue.getFPS()); 198 | putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0)); 199 | 200 | label = format("Network: %.2f FPS", predictionsQueue.getFPS()); 201 | putText(frame, label, Point(0, 30), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0)); 202 | 203 | label = format("Skipped frames: %d", framesQueue.counter - predictionsQueue.counter); 204 | putText(frame, label, Point(0, 45), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0)); 205 | } 206 | imshow(kWinName, frame); 207 | } 208 | 209 | process = false; 210 | framesThread.join(); 211 | processingThread.join(); 212 | 213 | return 0; 214 | } 215 | 216 | inline void preprocess(const Mat& frame, Net& net, Size inpSize, float scale, 217 | const Scalar& mean, bool swapRB) 218 | { 219 | static Mat blob; 220 | // Create a 4D blob from a frame. 221 | if (inpSize.width <= 0) inpSize.width = frame.cols; 222 | if (inpSize.height <= 0) inpSize.height = frame.rows; 223 | blobFromImage(frame, blob, 1.0, inpSize, Scalar(), swapRB, false, CV_8U); 224 | 225 | // Run a model. 
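    // Note: blobFromImage above produced a CV_8U blob with no scaling or mean subtraction,
    // so the scale factor and mean are applied here by setInput instead.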
226 | net.setInput(blob, "", scale, mean); 227 | if (net.getLayer(0)->outputNameToIndex("im_info") != -1) // Faster-RCNN or R-FCN 228 | { 229 | resize(frame, frame, inpSize); 230 | Mat imInfo = (Mat_(1, 3) << inpSize.height, inpSize.width, 1.6f); 231 | net.setInput(imInfo, "im_info"); 232 | } 233 | } 234 | 235 | void postprocess(Mat& frame, const std::vector& outs, Net& net) 236 | { 237 | static std::vector outLayers = net.getUnconnectedOutLayers(); 238 | 239 | std::vector classIds; 240 | std::vector confidences; 241 | std::vector boxes; 242 | 243 | // Network produces output blob with a shape 1x1xNx7 where N is a number of 244 | // detections and an every detection is a vector of values 245 | // [batchId, classId, confidence, left, top, right, bottom] 246 | CV_Assert(outs.size() > 0); 247 | for (size_t k = 0; k < outs.size(); k++) 248 | { 249 | float* data = (float*)outs[k].data; 250 | for (size_t i = 0; i < outs[k].total(); i += 7) 251 | { 252 | float confidence = data[i + 2]; 253 | if (confidence > confThreshold) 254 | { 255 | int left = (int)data[i + 3]; 256 | int top = (int)data[i + 4]; 257 | int right = (int)data[i + 5]; 258 | int bottom = (int)data[i + 6]; 259 | int width = right - left + 1; 260 | int height = bottom - top + 1; 261 | if (width <= 2 || height <= 2) 262 | { 263 | left = (int)(data[i + 3] * frame.cols); 264 | top = (int)(data[i + 4] * frame.rows); 265 | right = (int)(data[i + 5] * frame.cols); 266 | bottom = (int)(data[i + 6] * frame.rows); 267 | width = right - left + 1; 268 | height = bottom - top + 1; 269 | } 270 | classIds.push_back((int)(data[i + 1]) - 1); // Skip 0th background class id. 271 | boxes.push_back(Rect(left, top, width, height)); 272 | confidences.push_back(confidence); 273 | } 274 | } 275 | } 276 | 277 | // NMS is used inside Region layer only on DNN_BACKEND_OPENCV for another backends we need NMS in sample 278 | // or NMS is required if number of outputs > 1 279 | if (outLayers.size() > 1) 280 | { 281 | std::map > class2indices; 282 | for (size_t i = 0; i < classIds.size(); i++) 283 | { 284 | if (confidences[i] >= confThreshold) 285 | { 286 | class2indices[classIds[i]].push_back(i); 287 | } 288 | } 289 | std::vector nmsBoxes; 290 | std::vector nmsConfidences; 291 | std::vector nmsClassIds; 292 | for (auto it = class2indices.begin(); it != class2indices.end(); ++it) 293 | { 294 | std::vector localBoxes; 295 | std::vector localConfidences; 296 | std::vector classIndices = it->second; 297 | for (size_t i = 0; i < classIndices.size(); i++) 298 | { 299 | localBoxes.push_back(boxes[classIndices[i]]); 300 | localConfidences.push_back(confidences[classIndices[i]]); 301 | } 302 | std::vector nmsIndices; 303 | NMSBoxes(localBoxes, localConfidences, confThreshold, nmsThreshold, nmsIndices); 304 | for (size_t i = 0; i < nmsIndices.size(); i++) 305 | { 306 | size_t idx = nmsIndices[i]; 307 | nmsBoxes.push_back(localBoxes[idx]); 308 | nmsConfidences.push_back(localConfidences[idx]); 309 | nmsClassIds.push_back(it->first); 310 | } 311 | } 312 | boxes = nmsBoxes; 313 | classIds = nmsClassIds; 314 | confidences = nmsConfidences; 315 | } 316 | 317 | for (size_t idx = 0; idx < boxes.size(); ++idx) 318 | { 319 | Rect box = boxes[idx]; 320 | drawPred(classIds[idx], confidences[idx], box.x, box.y, 321 | box.x + box.width, box.y + box.height, frame); 322 | } 323 | } 324 | 325 | void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame) 326 | { 327 | rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0)); 328 | 329 | 
std::string label = format("%.2f", conf); 330 | if (!classes.empty()) 331 | { 332 | CV_Assert(classId < (int)classes.size()); 333 | label = classes[classId] + ": " + label; 334 | } 335 | 336 | int baseLine; 337 | Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine); 338 | 339 | top = max(top, labelSize.height); 340 | rectangle(frame, Point(left, top - labelSize.height), 341 | Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED); 342 | putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.5, Scalar()); 343 | } 344 | -------------------------------------------------------------------------------- /04_vehicle_detection/README.md: -------------------------------------------------------------------------------- 1 | # Vehicle Detection Project 2 | 3 | 4 | ### Abstract 5 | 6 | The goal of the project was to develop a pipeline to reliably detect cars given a video from a roof-mounted camera: in this readme the reader will find a short summary of how I tackled the problem. 7 | 8 | **Long story short**: 9 | - (baseline) HOG features + linear SVM to detect cars, temporal smoothing to discard false positive 10 | - (submission) [SSD deep network](https://arxiv.org/pdf/1512.02325.pdf) for detection, thresholds on detection confidence and label to discard false positive 11 | 12 | *That said, let's go into details!* 13 | 14 | ### Good old CV: Histogram of Oriented Gradients (HOG) 15 | 16 | #### 1. Feature Extraction. 17 | 18 | In the field of computer vision, a *features* is a compact representation that encodes information that is relevant for a given task. In our case, features must be informative enough to distinguish between *car* and *non-car* image patches as accurately as possible. 19 | 20 | Here is an example of how the `vehicle` and `non-vehicle` classes look like in this dataset: 21 | 22 |

23 | ![non_car_img](img/noncar_samples.png) 24 | 25 | *Randomly-sampled non-car patches.* 26 | 27 | 28 | ![car_img](img/car_samples.png) 29 | 30 | *Randomly-sampled car patches.*
31 | 32 | The most of the code that relates to feature extraction is contained in [`functions_feat_extraction.py`](functions_feat_extraction.py). Nonetheless, all parameters used in the phase of feature extraction are stored as dictionary in [`config.py`](config.py), in order to be able to access them from anywhere in the project. 33 | 34 | Actual feature extraction is performed by the function `image_to_features`, which takes as input an image and the dictionary of parameters, and returns the features computed for that image. In order to perform batch feature extraction on the whole dataset (for training), `extract_features_from_file_list` takes as input a list of images and return a list of feature vectors, one for each input image. 35 | 36 | For the task of car detection I used *color histograms* and *spatial features* to encode the object visual appearence and HOG features to encode the object's *shape*. While color the first two features are easy to understand and implement, HOG features can be a little bit trickier to master. 37 | 38 | #### 2. Choosing HOG parameters. 39 | 40 | HOG stands for *Histogram of Oriented Gradients* and refer to a powerful descriptor that has met with a wide success in the computer vision community, since its [introduction](http://vc.cs.nthu.edu.tw/home/paper/codfiles/hkchiu/201205170946/Histograms%20of%20Oriented%20Gradients%20for%20Human%20Detection.pdf) in 2005 with the main purpose of people detection. 41 | 42 |
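As a concrete illustration, the snippet below sketches how a HOG feature vector can be computed for a single 64x64 patch with scikit-image's `hog` (the helper name and the test image used here are only illustrative; the parameter values mirror those listed in [`config.py`](config.py)):
```
import cv2
import numpy as np
from skimage.feature import hog

def hog_single_channel(channel, orient=9, pix_per_cell=8, cell_per_block=2):
    """HOG descriptor of one image channel, returned as a flat feature vector."""
    return hog(channel,
               orientations=orient,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               feature_vector=True)

# Any 64x64 patch works; here a test image is simply resized for illustration.
patch = cv2.resize(cv2.imread('test_images/test1.jpg'), (64, 64))
patch = cv2.cvtColor(patch, cv2.COLOR_BGR2YCrCb)   # color space used in this project
hog_vector = np.concatenate([hog_single_channel(patch[:, :, c]) for c in range(3)])
print(hog_vector.shape)   # (5292,) = 3 channels x 7x7 blocks x 2x2 cells x 9 orientations
```
Concatenating color histograms and spatially binned raw pixels to this vector yields the final descriptor used for classification.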

43 | ![hog](img/hog_car_vs_noncar.jpg) 44 | 45 | *Representation of HOG descriptors for a car patch (left) and a non-car patch (right).*
46 | 47 | The bad news is, HOG come along with a *lot* of parameters to tune in order to work properly. The main parameters are the size of the cell in which the gradients are accumulated, as well as the number of orientations used to discretize the histogram of gradients. Furthermore, one must specify the number of cells that compose a block, on which later a feature normalization will be performed. Finally, being the HOG computed on a single-channel image, arises the need of deciding which channel to use, eventually computing the feature on all channels then concatenating the result. 48 | 49 | In order to select the right parameters, both the classifier accuracy and computational efficiency are to consider. After various attemps, I came up to the following parameters that are stored in [`config.py`](config.py): 50 | ``` 51 | # parameters used in the phase of feature extraction 52 | feat_extraction_params = {'resize_h': 64, # resize image height before feat extraction 53 | 'resize_w': 64, # resize image height before feat extraction 54 | 'color_space': 'YCrCb', # Can be RGB, HSV, LUV, HLS, YUV, YCrCb 55 | 'orient': 9, # HOG orientations 56 | 'pix_per_cell': 8, # HOG pixels per cell 57 | 'cell_per_block': 2, # HOG cells per block 58 | 'hog_channel': "ALL", # Can be 0, 1, 2, or "ALL" 59 | 'spatial_size': (32, 32), # Spatial binning dimensions 60 | 'hist_bins': 16, # Number of histogram bins 61 | 'spatial_feat': True, # Spatial features on or off 62 | 'hist_feat': True, # Histogram features on or off 63 | 'hog_feat': True} # HOG features on or off 64 | ``` 65 | 66 | #### 3. Training the classifier 67 | 68 | Once decided which features to used, we can train a classifier on these. In [`train.py`](train.py) I train a linear SVM for task of binary classification *car* vs *non-car*. First, training data are listed a feature vector is extracted for each image: 69 | ``` 70 | cars = get_file_list_recursively(root_data_vehicle) 71 | notcars = get_file_list_recursively(root_data_non_vehicle) 72 | 73 | car_features = extract_features_from_file_list(cars, feat_extraction_params) 74 | notcar_features = extract_features_from_file_list(notcars, feat_extraction_params) 75 | ``` 76 | Then, the actual training set is composed as the set of all car and all non-car features (labels are given accordingly). Furthermore, feature vectors are standardize in order to have all the features in a similar range and ease training. 77 | ``` 78 | feature_scaler = StandardScaler().fit(X) # per-column scaler 79 | scaled_X = feature_scaler.transform(X) 80 | ``` 81 | Now, training the LinearSVM classifier is as easy as: 82 | ``` 83 | svc = LinearSVC() # svc = SVC(kernel='rbf') 84 | svc.fit(X_train, y_train) 85 | ``` 86 | In order to have an idea of the classifier performance, we can make a prediction on the test set with `svc.score(X_test, y_test)`. Training the SVM with the features explained above took around 10 minutes on my laptop. 87 | 88 | ### Sliding Window Search 89 | 90 | #### 1. Describe how (and identify where in your code) you implemented a sliding window search. How did you decide what scales to search and how much to overlap windows? 91 | 92 | In a first phase, I implemented a naive sliding window approach in order to get windows at different scales for the purpose of classification. This is shown in function `compute_windows_multiscale` in [`functions_detection.py`](functions_detection.py). This turned out to be very slow. 
I ultimately implemented a function that jointly searches the region of interest and classifies each window, as suggested by the course instructor. The performance boost comes from the fact that HOG features are computed only once for the whole region of interest and then subsampled at different scales, which gives the same effect as a multiscale search but in a much more computationally efficient way. This function is called `find_cars` and is implemented in [`functions_feat_extraction.py`](functions_feat_extraction.py). Of course the *tradeoff* is evident: the more scales are searched and the more adjacent windows overlap, the slower the search becomes. 93 | 94 | #### 2. Show some examples of test images to demonstrate how your pipeline is working. What did you do to optimize the performance of your classifier? 95 | 96 | The whole classification pipeline using the CV approach is implemented in [`main_hog.py`](main_hog.py). Each test image goes through the `process_pipeline` function, which is responsible for all phases: feature extraction, classification and showing the results. 97 | 98 |
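At its core, classifying one candidate window takes only a few lines. The sketch below is a simplified view of that step, assuming the trained `svc`, the fitted `feature_scaler`, the `image_to_features` helper and the `feat_extraction_params` dictionary introduced above (`frame` and `windows` stand for a test image and its candidate windows):
```
import cv2
import numpy as np

hot_windows = []
for (x_min, y_min), (x_max, y_max) in windows:
    patch = cv2.resize(frame[y_min:y_max, x_min:x_max], (64, 64))
    feats = np.array(image_to_features(patch, feat_extraction_params)).reshape(1, -1)
    feats = feature_scaler.transform(feats)          # same scaling used at training time
    if svc.predict(feats)[0] == 1:                   # assuming label 1 was assigned to cars
        hot_windows.append(((x_min, y_min), (x_max, y_max)))
```
The windows collected in `hot_windows` are then accumulated into the heatmap used to reject false positives (see the thumbnails in the figure below).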

99 | ![hog_pipeline](img/pipeline_hog.jpg) 100 | 101 | *Result of the HOG pipeline on one of the test images.*
102 | 103 | In order to optimize the performance of the classifier, I started the training with different configuration of the parameters, and kept the best one. Performing detection at different scales also helped a lot, even if exceeding in this direction can lead to very long computational time for a single image. At the end of this pipeline, the whole processing, from image reading to writing the ouput blend, took about 0.5 second per frame. 104 | 105 | ### Computer Vision on Steroids, a.k.a. Deep Learning 106 | 107 | #### 1. SSD (*Single Shot Multi-Box Detector*) network 108 | 109 | In order to solve the aforementioned problems, I decided to use a deep network to perform the detection, thus replacing the HOG+SVM pipeline. For this task employed the recently proposed [SSD deep network](https://arxiv.org/pdf/1512.02325.pdf) for detection. This paved the way for several huge advantages: 110 | - the network performs detection and classification in a single pass, and natively goes in GPU (*is fast*) 111 | - there is no more need to tune and validate hundreds of parameters related to the phase of feature extraction (*is robust*) 112 | - being the "car" class in very common, various pretrained models are available in different frameworks (Keras, Tensorflow etc.) that are already able to nicely distinguish this class of objects (*no need to retrain*) 113 | - the network outputs a confidence level along with the coordinates of the bounding box, so we can decide the tradeoff precision and recall just by tuning the confidence level we want (*less false positive*) 114 | 115 | The whole pipeline has been adapted to the make use of SSD network in file [`main_ssd.py`](main_ssd.py). 116 | 117 | ### Video Implementation 118 | 119 | #### 1. Provide a link to your final video output. Your pipeline should perform reasonably well on the entire project video (somewhat wobbly or unstable bounding boxes are ok as long as you are identifying the vehicles most of the time with minimal false positives.) 120 | Here's a [link to my video result](https://www.youtube.com/watch?v=Cd7p5pnP3e0) 121 | 122 | 123 | #### 2. Describe how (and identify where in your code) you implemented some kind of filter for false positives and some method for combining overlapping bounding boxes. 124 | 125 | In a first phase while I was still using HOG+SVM, I implemented a heatmap to average detection results from successive frames. The heatmap was thresholded to a minimum value before labeling regions, so to remove the major part of false positive. This process in shown in the thumbnails on the left of the previous figure. 126 | 127 | When I turned to deep learning, as mentioned before I could rely on a *confidence score* to decide the tradeoff between precision and recall. The following figure shows the effect of thresholding SSD detection at different level of confidence. 128 | 129 | 130 | 131 | 137 | 143 | 144 |
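In code, this filtering amounts to nothing more than comparing each detection's confidence (and class label) against a threshold. A minimal sketch, assuming the detections arrive in a list called `detections` as `(label, confidence, x_min, y_min, x_max, y_max)` tuples (the exact output format depends on the SSD implementation used):
```
min_confidence = 0.50     # raise to favour precision, lower to favour recall
keep = [(x_min, y_min, x_max, y_max)
        for label, confidence, x_min, y_min, x_max, y_max in detections
        if label == 'car' and confidence >= min_confidence]
```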
132 | ![low_confidence](img/confidence_001.png) 133 | 134 | *SSD Network result setting minimum confidence = 0.01* 135 | 136 | 137 | ![high_confidence](img/confidence_050.png) 138 | 139 | *SSD Network result setting minimum confidence = 0.50*
145 | 146 | Actually, while using SSD network for detection for the project video I found that integrating detections over time was not only useless, but even detrimental for performance. Indeed, being detections very precide and false positive almost zero, there was no need anymore to carry on information from previous detections. 147 | 148 | --- 149 | 150 | ### Discussion 151 | 152 | #### 1. Briefly discuss any problems / issues you faced in your implementation of this project. Where will your pipeline likely fail? What could you do to make it more robust? 153 | 154 | In the first phase, the HOG+SVM approach turned out to be slightly frustrating, in that strongly relied on the parameter chosed to perform feature extraction, training and detection. Even if I found a set of parameters that more or less worked for the project video, I wasn't satisfied of the result, because parameters were so finely tuned on the project video that certainly were not robust to different situations. 155 | 156 | For this reason, I turned to deep learning, and I leveraged on an existing detection network (pretrained on Pascal VOC classes) to tackle the problem. From that moment, the sun shone again on this assignment! :-) 157 | 158 | ### Acknowledgments 159 | 160 | Implementation of [Single Shot MultiBox Detector](https://arxiv.org/pdf/1512.02325.pdf) was borrowed from [this repo](https://github.com/rykov8/ssd_keras) and then slightly modified for my purpose. Thank you [rykov8](https://github.com/rykov8) for porting this amazing network in Keras-Tensorflow! 161 | -------------------------------------------------------------------------------- /04_vehicle_detection/config.py: -------------------------------------------------------------------------------- 1 | # root directory that contain all vehicle images in nested subdirectories 2 | root_data_vehicle = '../../../NANODEGREE/term_1/project_5_vehicle_detection/vehicles' 3 | 4 | # root directory that contain all NON-vehicle images in nested subdirectories 5 | root_data_non_vehicle = '../../../NANODEGREE/term_1/project_5_vehicle_detection/non-vehicles' 6 | 7 | # parameters used in the phase of feature extraction 8 | feat_extraction_params = {'resize_h': 64, # resize image height before feat extraction 9 | 'resize_w': 64, # resize image height before feat extraction 10 | 'color_space': 'YCrCb', # Can be RGB, HSV, LUV, HLS, YUV, YCrCb 11 | 'orient': 9, # HOG orientations 12 | 'pix_per_cell': 8, # HOG pixels per cell 13 | 'cell_per_block': 2, # HOG cells per block 14 | 'hog_channel': "ALL", # Can be 0, 1, 2, or "ALL" 15 | 'spatial_size': (32, 32), # Spatial binning dimensions 16 | 'hist_bins': 16, # Number of histogram bins 17 | 'spatial_feat': True, # Spatial features on or off 18 | 'hist_feat': True, # Histogram features on or off 19 | 'hog_feat': True} # HOG features on or off 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /04_vehicle_detection/data/feat_extraction_params.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/data/feat_extraction_params.pickle -------------------------------------------------------------------------------- /04_vehicle_detection/data/feature_scaler.pickle: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/data/feature_scaler.pickle -------------------------------------------------------------------------------- /04_vehicle_detection/data/svm_trained.pickle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/data/svm_trained.pickle -------------------------------------------------------------------------------- /04_vehicle_detection/functions_detection.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | 5 | from functions_feat_extraction import image_to_features 6 | from project_5_utils import stitch_together 7 | 8 | 9 | def draw_labeled_bounding_boxes(img, labeled_frame, num_objects): 10 | """ 11 | Starting from labeled regions, draw enclosing rectangles in the original color frame. 12 | """ 13 | # Iterate through all detected cars 14 | for car_number in range(1, num_objects + 1): 15 | # Find pixels with each car_number label value 16 | rows, cols = np.where(labeled_frame == car_number) 17 | 18 | # Find minimum enclosing rectangle 19 | x_min, y_min = np.min(cols), np.min(rows) 20 | x_max, y_max = np.max(cols), np.max(rows) 21 | 22 | cv2.rectangle(img, (x_min, y_min), (x_max, y_max), color=(255, 0, 0), thickness=6) 23 | 24 | return img 25 | 26 | 27 | def compute_heatmap_from_detections(frame, hot_windows, threshold=5, verbose=False): 28 | """ 29 | Compute heatmaps from windows classified as positive, in order to filter false positives. 30 | """ 31 | h, w, c = frame.shape 32 | 33 | heatmap = np.zeros(shape=(h, w), dtype=np.uint8) 34 | 35 | for bbox in hot_windows: 36 | # for each bounding box, add heat to the corresponding rectangle in the image 37 | x_min, y_min = bbox[0] 38 | x_max, y_max = bbox[1] 39 | heatmap[y_min:y_max, x_min:x_max] += 1 # add heat 40 | 41 | # apply threshold + morphological closure to remove noise 42 | _, heatmap_thresh = cv2.threshold(heatmap, threshold, 255, type=cv2.THRESH_BINARY) 43 | heatmap_thresh = cv2.morphologyEx(heatmap_thresh, op=cv2.MORPH_CLOSE, 44 | kernel=cv2.getStructuringElement(cv2.MORPH_ELLIPSE, 45 | (13, 13)), iterations=1) 46 | if verbose: 47 | f, ax = plt.subplots(1, 3) 48 | ax[0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)) 49 | ax[1].imshow(heatmap, cmap='hot') 50 | ax[2].imshow(heatmap_thresh, cmap='hot') 51 | plt.show() 52 | 53 | return heatmap, heatmap_thresh 54 | 55 | 56 | def compute_windows_multiscale(image, verbose=False): 57 | """ 58 | Naive implementation of multiscale window search. 
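Three window sizes (32, 64 and 128 pixels) are searched over different horizontal bands of the frame: the smaller windows are restricted to a narrow band around the horizon, while the larger ones extend towards the bottom of the image.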
59 | """ 60 | h, w, c = image.shape 61 | 62 | windows_multiscale = [] 63 | 64 | windows_32 = slide_window(image, x_start_stop=[None, None], 65 | y_start_stop=[4 * h // 8, 5 * h // 8], 66 | xy_window=(32, 32), xy_overlap=(0.8, 0.8)) 67 | windows_multiscale.append(windows_32) 68 | 69 | windows_64 = slide_window(image, x_start_stop=[None, None], 70 | y_start_stop=[4 * h // 8, 6 * h // 8], 71 | xy_window=(64, 64), xy_overlap=(0.8, 0.8)) 72 | windows_multiscale.append(windows_64) 73 | 74 | windows_128 = slide_window(image, x_start_stop=[None, None], y_start_stop=[3 * h // 8, h], 75 | xy_window=(128, 128), xy_overlap=(0.8, 0.8)) 76 | windows_multiscale.append(windows_128) 77 | 78 | if verbose: 79 | windows_img_32 = draw_boxes(image, windows_32, color=(0, 0, 255), thick=1) 80 | windows_img_64 = draw_boxes(image, windows_64, color=(0, 255, 0), thick=1) 81 | windows_img_128 = draw_boxes(image, windows_128, color=(255, 0, 0), thick=1) 82 | 83 | stitching = stitch_together([windows_img_32, windows_img_64, windows_img_128], (1, 3), 84 | resize_dim=(1300, 500)) 85 | cv2.imshow('', stitching) 86 | cv2.waitKey() 87 | 88 | return np.concatenate(windows_multiscale) 89 | 90 | 91 | def slide_window(img, x_start_stop=[None, None], y_start_stop=[None, None], 92 | xy_window=(64, 64), xy_overlap=(0.5, 0.5)): 93 | """ 94 | Implementation of a sliding window in a region of interest of the image. 95 | """ 96 | # If x and/or y start/stop positions not defined, set to image size 97 | if x_start_stop[0] is None: 98 | x_start_stop[0] = 0 99 | if x_start_stop[1] is None: 100 | x_start_stop[1] = img.shape[1] 101 | if y_start_stop[0] is None: 102 | y_start_stop[0] = 0 103 | if y_start_stop[1] is None: 104 | y_start_stop[1] = img.shape[0] 105 | 106 | # Compute the span of the region to be searched 107 | x_span = x_start_stop[1] - x_start_stop[0] 108 | y_span = y_start_stop[1] - y_start_stop[0] 109 | 110 | # Compute the number of pixels per step in x/y 111 | n_x_pix_per_step = np.int(xy_window[0] * (1 - xy_overlap[0])) 112 | n_y_pix_per_step = np.int(xy_window[1] * (1 - xy_overlap[1])) 113 | 114 | # Compute the number of windows in x / y 115 | n_x_windows = np.int(x_span / n_x_pix_per_step) - 1 116 | n_y_windows = np.int(y_span / n_y_pix_per_step) - 1 117 | 118 | # Initialize a list to append window positions to 119 | window_list = [] 120 | 121 | # Loop through finding x and y window positions. 122 | for i in range(n_y_windows): 123 | for j in range(n_x_windows): 124 | # Calculate window position 125 | start_x = j * n_x_pix_per_step + x_start_stop[0] 126 | end_x = start_x + xy_window[0] 127 | start_y = i * n_y_pix_per_step + y_start_stop[0] 128 | end_y = start_y + xy_window[1] 129 | 130 | # Append window position to list 131 | window_list.append(((start_x, start_y), (end_x, end_y))) 132 | 133 | # Return the list of windows 134 | return window_list 135 | 136 | 137 | def draw_boxes(img, bbox_list, color=(0, 0, 255), thick=6): 138 | """ 139 | Draw all bounding boxes in `bbox_list` onto a given image. 
140 | :param img: input image 141 | :param bbox_list: list of bounding boxes 142 | :param color: color used for drawing boxes 143 | :param thick: thickness of the box line 144 | :return: a new image with the bounding boxes drawn 145 | """ 146 | # Make a copy of the image 147 | img_copy = np.copy(img) 148 | 149 | # Iterate through the bounding boxes 150 | for bbox in bbox_list: 151 | # Draw a rectangle given bbox coordinates 152 | tl_corner = tuple(bbox[0]) 153 | br_corner = tuple(bbox[1]) 154 | cv2.rectangle(img_copy, tl_corner, br_corner, color, thick) 155 | 156 | # Return the image copy with boxes drawn 157 | return img_copy 158 | 159 | 160 | # Define a function you will pass an image and the list of windows to be searched (output of slide_windows()) 161 | def search_windows(img, windows, clf, scaler, feat_extraction_params): 162 | hot_windows = [] # list to receive positive detection windows 163 | 164 | for window in windows: 165 | # Extract the current window from original image 166 | resize_h, resize_w = feat_extraction_params['resize_h'], feat_extraction_params['resize_w'] 167 | test_img = cv2.resize(img[window[0][1]:window[1][1], window[0][0]:window[1][0]], 168 | (resize_w, resize_h)) 169 | 170 | # Extract features for that window using single_img_features() 171 | features = image_to_features(test_img, feat_extraction_params) 172 | 173 | # Scale extracted features to be fed to classifier 174 | test_features = scaler.transform(np.array(features).reshape(1, -1)) 175 | 176 | # Predict on rescaled features 177 | prediction = clf.predict(test_features) 178 | 179 | # If positive (prediction == 1) then save the window 180 | if prediction == 1: 181 | hot_windows.append(window) 182 | 183 | # Return windows for positive detections 184 | return hot_windows 185 | -------------------------------------------------------------------------------- /04_vehicle_detection/functions_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def normalize_image(img): 5 | """ 6 | Normalize image between 0 and 255 and cast to uint8 7 | (useful for visualization) 8 | """ 9 | img = np.float32(img) 10 | 11 | img = img / img.max() * 255 12 | 13 | return np.uint8(img) -------------------------------------------------------------------------------- /04_vehicle_detection/img/car_samples.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/car_samples.png -------------------------------------------------------------------------------- /04_vehicle_detection/img/confidence_001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/confidence_001.png -------------------------------------------------------------------------------- /04_vehicle_detection/img/confidence_050.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/confidence_050.png -------------------------------------------------------------------------------- /04_vehicle_detection/img/hog_car_vs_noncar.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/hog_car_vs_noncar.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/img/noncar_samples.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/noncar_samples.png -------------------------------------------------------------------------------- /04_vehicle_detection/img/pipeline_hog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/img/pipeline_hog.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/main_hog.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import pickle 4 | from functions_detection import * 5 | import scipy 6 | from functions_utils import normalize_image 7 | from functions_feat_extraction import find_cars 8 | import time 9 | import collections 10 | 11 | time_window = 5 12 | hot_windows_history = collections.deque(maxlen=time_window) 13 | 14 | # load pretrained svm classifier 15 | svc = pickle.load(open('data/svm_trained.pickle', 'rb')) 16 | 17 | # load feature scaler fitted on training data 18 | feature_scaler = pickle.load(open('data/feature_scaler.pickle', 'rb')) 19 | 20 | # load parameters used to perform feature extraction 21 | feat_extraction_params = pickle.load(open('data/feat_extraction_params.pickle', 'rb')) 22 | 23 | 24 | def prepare_output_blend(frame, img_hot_windows, img_heatmap, img_labeling, img_detection): 25 | 26 | h, w, c = frame.shape 27 | 28 | # decide the size of thumbnail images 29 | thumb_ratio = 0.25 30 | thumb_h, thumb_w = int(thumb_ratio * h), int(thumb_ratio * w) 31 | 32 | # resize to thumbnails images from various stages of the pipeline 33 | thumb_hot_windows = cv2.resize(img_hot_windows, dsize=(thumb_w, thumb_h)) 34 | thumb_heatmap = cv2.resize(img_heatmap, dsize=(thumb_w, thumb_h)) 35 | thumb_labeling = cv2.resize(img_labeling, dsize=(thumb_w, thumb_h)) 36 | 37 | off_x, off_y = 20, 45 38 | 39 | # add a semi-transparent rectangle to highlight thumbnails on the left 40 | mask = cv2.rectangle(img_detection.copy(), (0, 0), (2*off_x + thumb_w, h), (0, 0, 0), thickness=cv2.FILLED) 41 | img_blend = cv2.addWeighted(src1=mask, alpha=0.2, src2=img_detection, beta=0.8, gamma=0) 42 | 43 | # stitch thumbnails 44 | img_blend[off_y:off_y+thumb_h, off_x:off_x+thumb_w, :] = thumb_hot_windows 45 | img_blend[2*off_y+thumb_h:2*(off_y+thumb_h), off_x:off_x+thumb_w, :] = thumb_heatmap 46 | img_blend[3*off_y+2*thumb_h:3*(off_y+thumb_h), off_x:off_x+thumb_w, :] = thumb_labeling 47 | 48 | return img_blend 49 | 50 | 51 | def process_pipeline(frame, svc, feature_scaler, feat_extraction_params, keep_state=True, verbose=False): 52 | 53 | hot_windows = [] 54 | 55 | for subsample in np.arange(1, 3, 0.5): 56 | hot_windows += find_cars(frame, 400, 600, subsample, svc, feature_scaler, feat_extraction_params) 57 | 58 | if keep_state: 59 | if hot_windows: 60 | hot_windows_history.append(hot_windows) 61 | hot_windows = np.concatenate(hot_windows_history) 62 | 63 | # compute heatmaps positive windows found 64 | thresh = (time_window - 1) if keep_state else 0 
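# NOTE: when keep_state is True, hot windows from the last `time_window` frames are accumulated, so a pixel must be covered by more than `time_window - 1` positive windows to survive thresholding; this suppresses transient false positives.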
65 | heatmap, heatmap_thresh = compute_heatmap_from_detections(frame, hot_windows, threshold=thresh, verbose=False) 66 | 67 | # label connected components 68 | labeled_frame, num_objects = scipy.ndimage.measurements.label(heatmap_thresh) 69 | 70 | # prepare images for blend 71 | img_hot_windows = draw_boxes(frame, hot_windows, color=(0, 0, 255), thick=2) # show pos windows 72 | img_heatmap = cv2.applyColorMap(normalize_image(heatmap), colormap=cv2.COLORMAP_HOT) # draw heatmap 73 | img_labeling = cv2.applyColorMap(normalize_image(labeled_frame), colormap=cv2.COLORMAP_HOT) # draw label 74 | img_detection = draw_labeled_bounding_boxes(frame.copy(), labeled_frame, num_objects) # draw detected bboxes 75 | 76 | img_blend_out = prepare_output_blend(frame, img_hot_windows, img_heatmap, img_labeling, img_detection) 77 | 78 | if verbose: 79 | cv2.imshow('detection bboxes', img_hot_windows) 80 | cv2.imshow('heatmap', img_heatmap) 81 | cv2.imshow('labeled frame', img_labeling) 82 | cv2.imshow('detections', img_detection) 83 | cv2.waitKey() 84 | 85 | return img_blend_out 86 | 87 | 88 | if __name__ == '__main__': 89 | 90 | test_img_dir = 'test_images' 91 | for test_img in os.listdir(test_img_dir): 92 | 93 | t = time.time() 94 | print('Processing image {}...'.format(test_img), end="") 95 | 96 | frame = cv2.imread(os.path.join(test_img_dir, test_img)) 97 | 98 | frame_out = process_pipeline(frame, svc, feature_scaler, feat_extraction_params, keep_state=False, verbose=False) 99 | 100 | cv2.imwrite('output_images/{}'.format(test_img), frame_out) 101 | 102 | print('Done. Elapsed: {:.02f}'.format(time.time()-t)) 103 | 104 | -------------------------------------------------------------------------------- /04_vehicle_detection/main_ssd.py: -------------------------------------------------------------------------------- 1 | from functions_detection import * 2 | from SSD import process_frame_bgr_with_SSD, get_SSD_model 3 | from vehicle import Vehicle 4 | import os 5 | import os.path as path 6 | 7 | 8 | # global deep network model 9 | ssd_model, bbox_helper, color_palette = get_SSD_model() 10 | 11 | 12 | def process_pipeline(frame, verbose=False): 13 | 14 | detected_vehicles = [] 15 | 16 | img_blend_out = frame.copy() 17 | 18 | # return bounding boxes detected by SSD 19 | ssd_bboxes = process_frame_bgr_with_SSD(frame, ssd_model, bbox_helper, allow_classes=[7], min_confidence=0.3) 20 | for row in ssd_bboxes: 21 | label, confidence, x_min, y_min, x_max, y_max = row 22 | x_min = int(round(x_min * frame.shape[1])) 23 | y_min = int(round(y_min * frame.shape[0])) 24 | x_max = int(round(x_max * frame.shape[1])) 25 | y_max = int(round(y_max * frame.shape[0])) 26 | 27 | proposed_vehicle = Vehicle(x_min, y_min, x_max, y_max) 28 | 29 | if not detected_vehicles: 30 | detected_vehicles.append(proposed_vehicle) 31 | else: 32 | for i, vehicle in enumerate(detected_vehicles): 33 | if vehicle.contains(*proposed_vehicle.center): 34 | pass # go on, bigger bbox already detected in that position 35 | elif proposed_vehicle.contains(*vehicle.center): 36 | detected_vehicles[i] = proposed_vehicle # keep the bigger window 37 | else: 38 | detected_vehicles.append(proposed_vehicle) 39 | 40 | # draw bounding boxes of detected vehicles on frame 41 | for vehicle in detected_vehicles: 42 | vehicle.draw(img_blend_out, color=(0, 255, 255), thickness=2) 43 | 44 | h, w = frame.shape[:2] 45 | off_x, off_y = 30, 30 46 | thumb_h, thumb_w = (96, 128) 47 | 48 | # add a semi-transparent rectangle to highlight thumbnails on the left 49 | mask = 
cv2.rectangle(frame.copy(), (0, 0), (w, 2 * off_y + thumb_h), (0, 0, 0), thickness=cv2.FILLED) 50 | img_blend_out = cv2.addWeighted(src1=mask, alpha=0.3, src2=img_blend_out, beta=0.8, gamma=0) 51 | 52 | # create list of thumbnails s.t. this can be later sorted for drawing 53 | vehicle_thumbs = [] 54 | for i, vehicle in enumerate(detected_vehicles): 55 | x_min, y_min, x_max, y_max = vehicle.coords 56 | vehicle_thumbs.append(frame[y_min:y_max, x_min:x_max, :]) 57 | 58 | # draw detected car thumbnails on the top of the frame 59 | for i, thumbnail in enumerate(sorted(vehicle_thumbs, key=lambda x: np.mean(x), reverse=True)): 60 | vehicle_thumb = cv2.resize(thumbnail, dsize=(thumb_w, thumb_h)) 61 | start_x = 300 + (i+1) * off_x + i * thumb_w 62 | img_blend_out[off_y:off_y + thumb_h, start_x:start_x + thumb_w, :] = vehicle_thumb 63 | 64 | # write the counter of cars detected 65 | font = cv2.FONT_HERSHEY_SIMPLEX 66 | cv2.putText(img_blend_out, 'Vehicles in sight: {:02d}'.format(len(detected_vehicles)), 67 | (20, off_y + thumb_h // 2), font, 0.8, (255, 255, 255), 2, cv2.LINE_AA) 68 | 69 | return img_blend_out 70 | 71 | 72 | if __name__ == '__main__': 73 | 74 | mode = 'images' 75 | 76 | if mode == 'video': 77 | 78 | video_file = 'project_video.mp4' 79 | 80 | cap_in = cv2.VideoCapture(video_file) 81 | video_out_dir = '../../../NANODEGREE/term_1/project_5_vehicle_detection/frames_out' 82 | 83 | f_counter = 0 84 | while True: 85 | 86 | ret, frame = cap_in.read() 87 | 88 | if ret: 89 | 90 | f_counter += 1 91 | 92 | frame_out = process_pipeline(frame, verbose=1) 93 | 94 | cv2.imwrite(path.join(video_out_dir, '{:06d}.jpg'.format(f_counter)), frame_out) 95 | 96 | cv2.imshow('', frame_out) 97 | if cv2.waitKey(1) & 0xFF == ord('q'): 98 | break 99 | 100 | # When everything done, release the capture 101 | cap_in.release() 102 | cv2.destroyAllWindows() 103 | exit() 104 | 105 | else: 106 | 107 | test_img_dir = 'test_images' 108 | for test_img in os.listdir(test_img_dir): 109 | 110 | frame = cv2.imread(os.path.join(test_img_dir, test_img)) 111 | 112 | frame_out = process_pipeline(frame, verbose=False) 113 | 114 | cv2.imwrite('output_images/{}'.format(test_img), frame_out) 115 | 116 | 117 | 118 | -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test1.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test2.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test3.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test4.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test4.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test5.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/output_images/test6.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/output_images/test6.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/process_video.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import os.path as path 3 | from SSD import process_frame_bgr_with_SSD, show_SSD_results, get_SSD_model 4 | 5 | 6 | if __name__ == '__main__': 7 | 8 | SSD_net, bbox_helper, color_palette = get_SSD_model() 9 | 10 | # video_file = 'project_video.mp4' 11 | video_file = 'C:/Users/minotauro/Google Drive/DEMO_SMARTAREA/modena.mp4' 12 | # out_path = 'C:/Users/minotauro/Google Drive/DEMO_SMARTAREA/out_frames' 13 | out_path = 'C:/temp_frames' 14 | 15 | cap = cv2.VideoCapture(video_file) 16 | 17 | counter = 0 18 | while True: 19 | 20 | ret, frame = cap.read() 21 | 22 | if ret: 23 | bboxes = process_frame_bgr_with_SSD(frame, SSD_net, bbox_helper, 24 | min_confidence=0.2, 25 | allow_classes=[2, 7, 14, 15]) 26 | 27 | show_SSD_results(bboxes, frame, color_palette=color_palette) 28 | 29 | cv2.imwrite(path.join(out_path, '{:06d}.jpg'.format(counter)), frame) 30 | # cv2.imshow('', frame) 31 | # if cv2.waitKey(1) & 0xFF == ord('q'): 32 | # break 33 | 34 | counter += 1 35 | 36 | 37 | # When everything done, release the capture 38 | cap.release() 39 | cv2.destroyAllWindows() 40 | exit() 41 | -------------------------------------------------------------------------------- /04_vehicle_detection/project_5_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | from os.path import exists 4 | from os.path import join 5 | 6 | import cv2 7 | import numpy as np 8 | 9 | 10 | def get_file_list_recursively(top_directory): 11 | """ 12 | Get list of full paths of all files found under root directory "top_directory". 13 | If a list of allowed file extensions is provided, files are filtered according to this list. 
14 | 15 | Parameters 16 | ---------- 17 | top_directory: str 18 | Root of the hierarchy 19 | 20 | Returns 21 | ------- 22 | file_list: list 23 | List of files found under top_directory (with full path) 24 | """ 25 | if not exists(top_directory): 26 | raise ValueError('Directory "{}" does NOT exist.'.format(top_directory)) 27 | 28 | file_list = [] 29 | 30 | for cur_dir, cur_subdirs, cur_files in os.walk(top_directory): 31 | 32 | for file in cur_files: 33 | file_list.append(join(cur_dir, file)) 34 | sys.stdout.write( 35 | '\r[{}] - found {:06d} files...'.format(top_directory, len(file_list))) 36 | sys.stdout.flush() 37 | 38 | sys.stdout.write(' Done.\n') 39 | 40 | return file_list 41 | 42 | 43 | def stitch_together(input_images, layout, resize_dim=None, off_x=None, off_y=None, 44 | bg_color=(0, 0, 0)): 45 | """ 46 | Stitch together N input images into a bigger frame, using a grid layout. 47 | Input images can be either color or grayscale, but must all have the same size. 48 | 49 | Parameters 50 | ---------- 51 | input_images : list 52 | List of input images 53 | layout : tuple 54 | Grid layout of the stitch expressed as (rows, cols) 55 | resize_dim : couple 56 | If not None, stitch is resized to this size 57 | off_x : int 58 | Offset between stitched images along x axis 59 | off_y : int 60 | Offset between stitched images along y axis 61 | bg_color : tuple 62 | Color used for background 63 | 64 | Returns 65 | ------- 66 | stitch : ndarray 67 | Stitch of input images 68 | """ 69 | 70 | if len(set([img.shape for img in input_images])) > 1: 71 | raise ValueError('All images must have the same shape') 72 | 73 | if len(set([img.dtype for img in input_images])) > 1: 74 | raise ValueError('All images must have the same data type') 75 | 76 | # determine if input images are color (3 channels) or grayscale (single channel) 77 | if len(input_images[0].shape) == 2: 78 | mode = 'grayscale' 79 | img_h, img_w = input_images[0].shape 80 | elif len(input_images[0].shape) == 3: 81 | mode = 'color' 82 | img_h, img_w, img_c = input_images[0].shape 83 | else: 84 | raise ValueError('Unknown shape for input images') 85 | 86 | # if no offset is provided, set to 10% of image size 87 | if off_x is None: 88 | off_x = img_w // 10 89 | if off_y is None: 90 | off_y = img_h // 10 91 | 92 | # create stitch mask 93 | rows, cols = layout 94 | stitch_h = rows * img_h + (rows + 1) * off_y 95 | stitch_w = cols * img_w + (cols + 1) * off_x 96 | if mode == 'color': 97 | bg_color = np.array(bg_color)[None, None, :] # cast to ndarray add singleton dimensions 98 | stitch = np.uint8(np.repeat(np.repeat(bg_color, stitch_h, axis=0), stitch_w, axis=1)) 99 | elif mode == 'grayscale': 100 | stitch = np.zeros(shape=(stitch_h, stitch_w), dtype=np.uint8) 101 | 102 | for r in range(0, rows): 103 | for c in range(0, cols): 104 | 105 | list_idx = r * cols + c 106 | 107 | if list_idx < len(input_images): 108 | if mode == 'color': 109 | stitch[r * (off_y + img_h) + off_y: r * (off_y + img_h) + off_y + img_h, 110 | c * (off_x + img_w) + off_x: c * (off_x + img_w) + off_x + img_w, 111 | :] = input_images[list_idx] 112 | elif mode == 'grayscale': 113 | stitch[r * (off_y + img_h) + off_y: r * (off_y + img_h) + off_y + img_h, 114 | c * (off_x + img_w) + off_x: c * (off_x + img_w) + off_x + img_w] \ 115 | = input_images[list_idx] 116 | 117 | if resize_dim: 118 | stitch = cv2.resize(stitch, dsize=(resize_dim[::-1])) 119 | 120 | return stitch 121 | 122 | 123 | class Rectangle: 124 | """ 125 | 2D Rectangle defined by top-left and bottom-right corners. 
126 | Parameters 127 | ---------- 128 | x_min : int 129 | x coordinate of top-left corner. 130 | y_min : int 131 | y coordinate of top-left corner. 132 | x_max : int 133 | x coordinate of bottom-right corner. 134 | y_min : int 135 | y coordinate of bottom-right corner. 136 | """ 137 | 138 | def __init__(self, x_min, y_min, x_max, y_max, label=""): 139 | 140 | self.x_min = x_min 141 | self.y_min = y_min 142 | self.x_max = x_max 143 | self.y_max = y_max 144 | 145 | self.x_side = self.x_max - self.x_min 146 | self.y_side = self.y_max - self.y_min 147 | 148 | self.label = label 149 | 150 | def intersect_with(self, rect): 151 | """ 152 | Compute the intersection between this instance and another Rectangle. 153 | 154 | Parameters 155 | ---------- 156 | rect : Rectangle 157 | The instance of the second Rectangle. 158 | 159 | Returns 160 | ------- 161 | intersection_area : float 162 | Area of intersection between the two rectangles expressed in number of pixels. 163 | """ 164 | if not isinstance(rect, Rectangle): 165 | raise ValueError('Cannot compute intersection if "rect" is not a Rectangle') 166 | 167 | dx = min(self.x_max, rect.x_max) - max(self.x_min, rect.x_min) 168 | dy = min(self.y_max, rect.y_max) - max(self.y_min, rect.y_min) 169 | 170 | if dx >= 0 and dy >= 0: 171 | intersection = dx * dy 172 | else: 173 | intersection = 0. 174 | 175 | return intersection 176 | 177 | def resize_sides(self, ratio, bounds=None): 178 | """ 179 | Resize the sides of rectangle while mantaining the aspect ratio and center position. 180 | Parameters 181 | ---------- 182 | ratio : float 183 | Ratio of the resize in range (0, infinity), where 2 means double the size and 0.5 is half of the size. 184 | bounds: tuple, optional 185 | If present, clip the Rectangle to these bounds=(xbmin, ybmin, xbmax, ybmax). 186 | Returns 187 | ------- 188 | rectangle : Rectangle 189 | Reshaped Rectangle. 190 | """ 191 | 192 | # compute offset 193 | off_x = abs(ratio * self.x_side - self.x_side) / 2 194 | off_y = abs(ratio * self.y_side - self.y_side) / 2 195 | 196 | # offset changes sign according if the resize is either positive or negative 197 | sign = np.sign(ratio - 1.) 198 | off_x = np.int32(off_x * sign) 199 | off_y = np.int32(off_y * sign) 200 | 201 | # update top-left and bottom-right coords 202 | new_x_min, new_y_min = self.x_min - off_x, self.y_min - off_y 203 | new_x_max, new_y_max = self.x_max + off_x, self.y_max + off_y 204 | 205 | # eventually clip the coordinates according to the given bounds 206 | if bounds: 207 | b_x_min, b_y_min, b_x_max, b_y_max = bounds 208 | new_x_min = max(new_x_min, b_x_min) 209 | new_y_min = max(new_y_min, b_y_min) 210 | new_x_max = min(new_x_max, b_x_max) 211 | new_y_max = min(new_y_max, b_y_max) 212 | 213 | return Rectangle(new_x_min, new_y_min, new_x_max, new_y_max) 214 | 215 | def draw(self, frame, color=255, thickness=2, draw_label=False): 216 | """ 217 | Draw Rectangle on a given frame. 218 | Notice: while this function does not return anything, original image `frame` is modified. 219 | Parameters 220 | ---------- 221 | frame : 2D / 3D np.array 222 | The image on which the rectangle is drawn. 223 | color : tuple, optional 224 | Color used to draw the rectangle (default = 255) 225 | thickness : int, optional 226 | Line thickness used t draw the rectangle (default = 1) 227 | draw_label : bool, optional 228 | If True and the Rectangle has a label, draws it on the top of the rectangle. 
229 | Returns 230 | ------- 231 | None 232 | """ 233 | if draw_label and self.label: 234 | # compute text size 235 | text_font, text_scale, text_thick = cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1 236 | (text_w, text_h), baseline = cv2.getTextSize(self.label, text_font, text_scale, 237 | text_thick) 238 | 239 | # draw rectangle on which text will be displayed 240 | text_rect_w = min(text_w, self.x_side - 2 * baseline) 241 | out = cv2.rectangle(frame.copy(), pt1=(self.x_min, self.y_min - text_h - 2 * baseline), 242 | pt2=(self.x_min + text_rect_w + 2 * baseline, self.y_min), 243 | color=color, thickness=cv2.FILLED) 244 | cv2.addWeighted(frame, 0.75, out, 0.25, 0, dst=frame) 245 | 246 | # actually write text label 247 | cv2.putText(frame, self.label, (self.x_min + baseline, self.y_min - baseline), 248 | text_font, text_scale, (0, 0, 0), text_thick, cv2.LINE_AA) 249 | 250 | # add text rectangle border 251 | cv2.rectangle(frame, pt1=(self.x_min, self.y_min - text_h - 2 * baseline), 252 | pt2=(self.x_min + text_rect_w + 2 * baseline, self.y_min), color=color, 253 | thickness=thickness) 254 | 255 | # draw the Rectangle 256 | cv2.rectangle(frame, (self.x_min, self.y_min), (self.x_max, self.y_max), color, thickness) 257 | 258 | def get_binary_mask(self, mask_shape): 259 | """ 260 | Get uint8 binary mask of shape `mask_shape` with rectangle in foreground. 261 | Parameters 262 | ---------- 263 | mask_shape : (tuple) 264 | Shape of the mask to return - following convention (h, w) 265 | Returns 266 | ------- 267 | mask : np.array 268 | Binary uint8 mask of shape `mask_shape` with rectangle drawn as foreground. 269 | """ 270 | if mask_shape[0] < self.y_max or mask_shape[1] < self.x_max: 271 | raise ValueError('Mask shape is smaller than Rectangle size') 272 | mask = np.zeros(shape=mask_shape, dtype=np.uint8) 273 | mask = cv2.rectangle(mask, self.tl_corner, self.br_corner, color=255, thickness=cv2.FILLED) 274 | return mask 275 | 276 | @property 277 | def tl_corner(self): 278 | """ 279 | Coordinates of the top-left corner of rectangle (as int32). 280 | Returns 281 | ------- 282 | tl_corner : int32 tuple 283 | """ 284 | return tuple(map(np.int32, (self.x_min, self.y_min))) 285 | 286 | @property 287 | def br_corner(self): 288 | """ 289 | Coordinates of the bottom-right corner of rectangle. 290 | 291 | Returns 292 | ------- 293 | br_corner : int32 tuple 294 | """ 295 | return tuple(map(np.int32, (self.x_max, self.y_max))) 296 | 297 | @property 298 | def coords(self): 299 | """ 300 | Coordinates (x_min, y_min, x_max, y_max) which define the Rectangle. 
301 | 302 | Returns 303 | ------- 304 | coordinates : int32 tuple 305 | """ 306 | return tuple(map(np.int32, (self.x_min, self.y_min, self.x_max, self.y_max))) 307 | 308 | @property 309 | def area(self): 310 | """ 311 | Get the area of Rectangle 312 | 313 | Returns 314 | ------- 315 | area : float32 316 | """ 317 | return np.float32(self.x_side * self.y_side) 318 | -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test1.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test2.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test3.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test4.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test5.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/test_images/test6.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/04_vehicle_detection/test_images/test6.jpg -------------------------------------------------------------------------------- /04_vehicle_detection/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pickle 3 | import time 4 | 5 | import cv2 6 | import matplotlib.pyplot as plt 7 | import numpy as np 8 | from sklearn.model_selection import train_test_split 9 | from sklearn.preprocessing import StandardScaler 10 | from sklearn.svm import LinearSVC 11 | 12 | from config import root_data_non_vehicle, root_data_vehicle, feat_extraction_params 13 | from functions_detection import draw_boxes 14 | from functions_detection import search_windows 15 | from functions_detection import slide_window 16 | from functions_feat_extraction import extract_features_from_file_list 17 | from project_5_utils import get_file_list_recursively 18 | 19 | 20 | if __name__ == '__main__': 21 | 22 | # read paths of training images 23 | cars = get_file_list_recursively(root_data_vehicle) 24 | notcars = get_file_list_recursively(root_data_non_vehicle) 25 | 26 | 
print('Extracting car features...') 27 | car_features = extract_features_from_file_list(cars, feat_extraction_params) 28 | 29 | print('Extracting non-car features...') 30 | notcar_features = extract_features_from_file_list(notcars, feat_extraction_params) 31 | 32 | X = np.vstack((car_features, notcar_features)).astype(np.float64) 33 | 34 | # standardize features with sklearn preprocessing 35 | feature_scaler = StandardScaler().fit(X) # per-column scaler 36 | scaled_X = feature_scaler.transform(X) 37 | 38 | # Define the labels vector 39 | y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features)))) 40 | 41 | # Split up data into randomized training and test sets 42 | rand_state = np.random.randint(0, 100) 43 | X_train, X_test, y_train, y_test = train_test_split(scaled_X, y, test_size=0.2, random_state=rand_state) 44 | 45 | print('Feature vector length:', len(X_train[0])) 46 | 47 | # Define the classifier 48 | svc = LinearSVC() # svc = SVC(kernel='rbf') 49 | 50 | # Train the classifier (check training time) 51 | t = time.time() 52 | svc.fit(X_train, y_train) 53 | t2 = time.time() 54 | print(round(t2 - t, 2), 'Seconds to train SVC...') 55 | 56 | # Check the score of the SVC 57 | print('Test Accuracy of SVC = ', round(svc.score(X_test, y_test), 4)) 58 | 59 | # dump all stuff necessary to perform testing in a successive phase 60 | with open('data/svm_trained.pickle', 'wb') as f: 61 | pickle.dump(svc, f) 62 | with open('data/feature_scaler.pickle', 'wb') as f: 63 | pickle.dump(feature_scaler, f) 64 | with open('data/feat_extraction_params.pickle', 'wb') as f: 65 | pickle.dump(feat_extraction_params, f) 66 | 67 | # test on images in "test_images" directory 68 | test_img_dir = 'test_images' 69 | for test_img in os.listdir(test_img_dir): 70 | image = cv2.imread(os.path.join(test_img_dir, test_img)) 71 | 72 | h, w, c = image.shape 73 | draw_image = np.copy(image) 74 | 75 | windows = slide_window(image, x_start_stop=[None, None], y_start_stop=[h//2, None], 76 | xy_window=(64, 64), xy_overlap=(0.8, 0.8)) 77 | 78 | hot_windows = search_windows(image, windows, svc, feature_scaler, feat_extraction_params) 79 | 80 | window_img = draw_boxes(draw_image, hot_windows, color=(0, 0, 255), thick=6) 81 | 82 | plt.imshow(cv2.cvtColor(window_img, cv2.COLOR_BGR2RGB)) 83 | plt.show() 84 | -------------------------------------------------------------------------------- /04_vehicle_detection/vehicle.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | 4 | 5 | class Vehicle: 6 | """ 7 | 2D Vehicle defined by top-left and bottom-right corners. 8 | 9 | Parameters 10 | ---------- 11 | x_min : int 12 | x coordinate of top-left corner. 13 | y_min : int 14 | y coordinate of top-left corner. 15 | x_max : int 16 | x coordinate of bottom-right corner. 17 | y_min : int 18 | y coordinate of bottom-right corner. 19 | """ 20 | 21 | def __init__(self, x_min, y_min, x_max, y_max): 22 | 23 | self.x_min = x_min 24 | self.y_min = y_min 25 | self.x_max = x_max 26 | self.y_max = y_max 27 | 28 | self.x_side = self.x_max - self.x_min 29 | self.y_side = self.y_max - self.y_min 30 | 31 | def intersect_with(self, rect): 32 | """ 33 | Compute the intersection between this instance and another Vehicle. 34 | 35 | Parameters 36 | ---------- 37 | rect : Vehicle 38 | The instance of the second Vehicle. 39 | 40 | Returns 41 | ------- 42 | intersection_area : float 43 | Area of intersection between the two rectangles expressed in number of pixels. 
44 | """ 45 | if not isinstance(rect, Vehicle): 46 | raise ValueError('Cannot compute intersection if "rect" is not a Vehicle') 47 | 48 | dx = min(self.x_max, rect.x_max) - max(self.x_min, rect.x_min) 49 | dy = min(self.y_max, rect.y_max) - max(self.y_min, rect.y_min) 50 | 51 | if dx >= 0 and dy >= 0: 52 | intersection = dx * dy 53 | else: 54 | intersection = 0. 55 | 56 | return intersection 57 | 58 | def resize_sides(self, ratio, bounds=None): 59 | """ 60 | Resize the sides of rectangle while mantaining the aspect ratio and center position. 61 | 62 | Parameters 63 | ---------- 64 | ratio : float 65 | Ratio of the resize in range (0, infinity), where 2 means double the size and 0.5 is half of the size. 66 | bounds: tuple, optional 67 | If present, clip the Vehicle to these bounds=(xbmin, ybmin, xbmax, ybmax). 68 | 69 | Returns 70 | ------- 71 | rectangle : Vehicle 72 | Reshaped Vehicle. 73 | """ 74 | 75 | # compute offset 76 | off_x = abs(ratio * self.x_side - self.x_side) / 2 77 | off_y = abs(ratio * self.y_side - self.y_side) / 2 78 | 79 | # offset changes sign according if the resize is either positive or negative 80 | sign = np.sign(ratio - 1.) 81 | off_x = np.int32(off_x * sign) 82 | off_y = np.int32(off_y * sign) 83 | 84 | # update top-left and bottom-right coords 85 | new_x_min, new_y_min = self.x_min - off_x, self.y_min - off_y 86 | new_x_max, new_y_max = self.x_max + off_x, self.y_max + off_y 87 | 88 | # eventually clip the coordinates according to the given bounds 89 | if bounds: 90 | b_x_min, b_y_min, b_x_max, b_y_max = bounds 91 | new_x_min = max(new_x_min, b_x_min) 92 | new_y_min = max(new_y_min, b_y_min) 93 | new_x_max = min(new_x_max, b_x_max) 94 | new_y_max = min(new_y_max, b_y_max) 95 | 96 | return Vehicle(new_x_min, new_y_min, new_x_max, new_y_max) 97 | 98 | def draw(self, frame, color=255, thickness=1): 99 | """ 100 | Draw Vehicle on a given frame. 101 | 102 | Notice: while this function does not return anything, original image `frame` is modified. 103 | 104 | Parameters 105 | ---------- 106 | frame : 2D / 3D np.array 107 | The image on which the rectangle is drawn. 108 | color : tuple, optional 109 | Color used to draw the rectangle (default = 255) 110 | thickness : int, optional 111 | Line thickness used t draw the rectangle (default = 1) 112 | 113 | Returns 114 | ------- 115 | None 116 | """ 117 | cv2.rectangle(frame, (self.x_min, self.y_min), (self.x_max, self.y_max), color, thickness) 118 | 119 | def get_binary_mask(self, mask_shape): 120 | """ 121 | Get uint8 binary mask of shape `mask_shape` with rectangle in foreground. 122 | 123 | Parameters 124 | ---------- 125 | mask_shape : (tuple) 126 | Shape of the mask to return - following convention (h, w) 127 | 128 | Returns 129 | ------- 130 | mask : np.array 131 | Binary uint8 mask of shape `mask_shape` with rectangle drawn as foreground. 
132 | """ 133 | if mask_shape[0] < self.y_max or mask_shape[1] < self.x_max: 134 | raise ValueError('Mask shape is smaller than Vehicle size') 135 | mask = np.zeros(shape=mask_shape, dtype=np.uint8) 136 | mask = cv2.rectangle(mask, self.tl_corner, self.br_corner, color=255, thickness=cv2.FILLED) 137 | return mask 138 | 139 | def contains(self, x, y): 140 | 141 | if self.x_min < x < self.x_max and self.y_min < y < self.y_max: 142 | return True 143 | else: 144 | return False 145 | 146 | @property 147 | def center(self): 148 | center_x = self.x_min + self.x_side // 2 149 | center_y = self.y_min + self.y_side // 2 150 | return tuple(map(np.int32, (center_x, center_y))) 151 | 152 | @property 153 | def tl_corner(self): 154 | """ 155 | Coordinates of the top-left corner of rectangle (as int32). 156 | 157 | Returns 158 | ------- 159 | tl_corner : int32 tuple 160 | """ 161 | return tuple(map(np.int32, (self.x_min, self.y_min))) 162 | 163 | @property 164 | def br_corner(self): 165 | """ 166 | Coordinates of the bottom-right corner of rectangle. 167 | 168 | Returns 169 | ------- 170 | br_corner : int32 tuple 171 | """ 172 | return tuple(map(np.int32, (self.x_max, self.y_max))) 173 | 174 | @property 175 | def coords(self): 176 | """ 177 | Coordinates (x_min, y_min, x_max, y_max) which define the Vehicle. 178 | 179 | Returns 180 | ------- 181 | coordinates : int32 tuple 182 | """ 183 | return tuple(map(np.int32, (self.x_min, self.y_min, self.x_max, self.y_max))) 184 | 185 | 186 | @property 187 | def area(self): 188 | """ 189 | Get the area of Vehicle 190 | 191 | Returns 192 | ------- 193 | area : float32 194 | """ 195 | return np.float32(self.x_side * self.y_side) -------------------------------------------------------------------------------- /05_road_segmentation/README.md: -------------------------------------------------------------------------------- 1 | # Semantic Segmentation 2 | ### Introduction 3 | In this project, you'll label the pixels of a road in images using a Fully Convolutional Network (FCN). 4 | 5 |

6 | Overview 7 |
Qualitative results. 8 |
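As a rough illustration of what labelling road pixels amounts to at inference time, the sketch below (NumPy-only; function and variable names are illustrative and not part of the project) thresholds a per-pixel road probability map and blends the resulting mask into the frame as a semi-transparent green overlay, similar in spirit to what `gen_test_output` in `helper.py` does.

```python
import numpy as np

# Illustrative sketch: overlay a thresholded road-probability map on the input frame.

def overlay_road_mask(image, road_prob, threshold=0.5, alpha=0.5):
    """
    image:     uint8 RGB frame of shape (h, w, 3)
    road_prob: float array of shape (h, w), values in [0, 1] (e.g. network softmax output)
    """
    road = road_prob > threshold                 # boolean per-pixel segmentation
    blended = image.astype(np.float32)
    green = np.array([0.0, 255.0, 0.0])
    blended[road] = (1 - alpha) * blended[road] + alpha * green  # tint road pixels green
    return blended.astype(np.uint8)


if __name__ == '__main__':
    frame = np.zeros((160, 576, 3), dtype=np.uint8)   # dummy frame at the training resolution
    prob = np.random.rand(160, 576)                   # dummy per-pixel road probabilities
    print(overlay_road_mask(frame, prob).shape)       # (160, 576, 3)
```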

9 | 10 | 11 | 12 | ### Setup 13 | ##### Frameworks and Packages 14 | Make sure you have the following installed: 15 | - [Python 3](https://www.python.org/) 16 | - [TensorFlow](https://www.tensorflow.org/) 17 | - [NumPy](http://www.numpy.org/) 18 | - [SciPy](https://www.scipy.org/) 19 | ##### Dataset 20 | Download the [KITTI Road dataset](http://www.cvlibs.net/datasets/kitti/eval_road.php) from [here](http://www.cvlibs.net/download.php?file=data_road.zip). Extract the dataset into the `data` folder. This will create the folder `data_road` with all the training and test images. 21 | 22 | ### Run 23 | 24 | Run the following command to run the project: 25 | ``` 26 | python main.py 27 | ``` 28 | **Note:** If running this in a Jupyter Notebook, system messages, such as those regarding test status, may appear in the terminal rather than the notebook. 29 | 30 |
-------------------------------------------------------------------------------- /05_road_segmentation/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/05_road_segmentation/__init__.py -------------------------------------------------------------------------------- /05_road_segmentation/helper.py: -------------------------------------------------------------------------------- 1 | import re 2 | import random 3 | import numpy as np 4 | import os.path 5 | import scipy.misc 6 | import shutil 7 | import zipfile 8 | import time 9 | import tensorflow as tf 10 | from glob import glob 11 | from urllib.request import urlretrieve 12 | from tqdm import tqdm 13 | 14 | 15 | class DLProgress(tqdm): 16 | last_block = 0 17 | 18 | def hook(self, block_num=1, block_size=1, total_size=None): 19 | self.total = total_size 20 | self.update((block_num - self.last_block) * block_size) 21 | self.last_block = block_num 22 | 23 | 24 | def maybe_download_pretrained_vgg(data_dir): 25 | """ 26 | Download and extract pretrained vgg model if it doesn't exist 27 | :param data_dir: Directory to download the model to 28 | """ 29 | vgg_filename = 'vgg.zip' 30 | vgg_path = os.path.join(data_dir, 'vgg') 31 | vgg_files = [ 32 | os.path.join(vgg_path, 'variables/variables.data-00000-of-00001'), 33 | os.path.join(vgg_path, 'variables/variables.index'), 34 | os.path.join(vgg_path, 'saved_model.pb')] 35 | 36 | missing_vgg_files = [vgg_file for vgg_file in vgg_files if not os.path.exists(vgg_file)] 37 | if missing_vgg_files: 38 | # Clean vgg dir 39 | if os.path.exists(vgg_path): 40 | shutil.rmtree(vgg_path) 41 | os.makedirs(vgg_path) 42 | 43 | # Download vgg 44 | print('Downloading pre-trained vgg model...') 45 | with DLProgress(unit='B', unit_scale=True, miniters=1) as pbar: 46 | urlretrieve( 47 | 'https://s3-us-west-1.amazonaws.com/udacity-selfdrivingcar/vgg.zip', 48 | os.path.join(vgg_path, vgg_filename), 49 | pbar.hook) 50 | 51 | # Extract vgg 52 | print('Extracting model...') 53 | zip_ref = zipfile.ZipFile(os.path.join(vgg_path, vgg_filename), 'r') 54 | zip_ref.extractall(data_dir) 55 | zip_ref.close() 56 | 57 | # Remove zip file to save space 58 | os.remove(os.path.join(vgg_path, vgg_filename)) 59 | 60 | 61 | def gen_batch_function(data_folder, image_shape): 62 | """ 63 | Generate function to create batches of training data 64 | :param data_folder: Path to folder that contains all the datasets 65 | :param image_shape: Tuple - Shape of image 66 | :return: 67 | """ 68 | def get_batches_fn(batch_size): 69 | """ 70 | Create batches of
training data 71 | :param batch_size: Batch Size 72 | :return: Batches of training data 73 | """ 74 | image_paths = glob(os.path.join(data_folder, 'image_2', '*.png')) 75 | label_paths = { 76 | re.sub(r'_(lane|road)_', '_', os.path.basename(path)): path 77 | for path in glob(os.path.join(data_folder, 'gt_image_2', '*_road_*.png'))} 78 | background_color = np.array([255, 0, 0]) 79 | 80 | random.shuffle(image_paths) 81 | for batch_i in range(0, len(image_paths), batch_size): 82 | images = [] 83 | gt_images = [] 84 | for image_file in image_paths[batch_i:batch_i+batch_size]: 85 | gt_image_file = label_paths[os.path.basename(image_file)] 86 | 87 | image = scipy.misc.imresize(scipy.misc.imread(image_file), image_shape) 88 | gt_image = scipy.misc.imresize(scipy.misc.imread(gt_image_file), image_shape) 89 | 90 | gt_bg = np.all(gt_image == background_color, axis=2) 91 | gt_bg = gt_bg.reshape(*gt_bg.shape, 1) 92 | gt_image = np.concatenate((gt_bg, np.invert(gt_bg)), axis=2) 93 | 94 | images.append(image) 95 | gt_images.append(gt_image) 96 | 97 | yield np.array(images), np.array(gt_images) 98 | return get_batches_fn 99 | 100 | 101 | def gen_test_output(sess, logits, keep_prob, image_pl, data_folder, image_shape): 102 | """ 103 | Generate test output using the test images 104 | :param sess: TF session 105 | :param logits: TF Tensor for the logits 106 | :param keep_prob: TF Placeholder for the dropout keep robability 107 | :param image_pl: TF Placeholder for the image placeholder 108 | :param data_folder: Path to the folder that contains the datasets 109 | :param image_shape: Tuple - Shape of image 110 | :return: Output for for each test image 111 | """ 112 | for image_file in glob(os.path.join(data_folder, 'image_2', '*.png')): 113 | image = scipy.misc.imresize(scipy.misc.imread(image_file), image_shape) 114 | 115 | im_softmax = sess.run( 116 | [tf.nn.softmax(logits)], 117 | {keep_prob: 1.0, image_pl: [image]}) 118 | im_softmax = im_softmax[0][:, 1].reshape(image_shape[0], image_shape[1]) 119 | segmentation = (im_softmax > 0.5).reshape(image_shape[0], image_shape[1], 1) 120 | mask = np.dot(segmentation, np.array([[0, 255, 0, 127]])) 121 | mask = scipy.misc.toimage(mask, mode="RGBA") 122 | street_im = scipy.misc.toimage(image) 123 | street_im.paste(mask, box=None, mask=mask) 124 | 125 | yield os.path.basename(image_file), np.array(street_im) 126 | 127 | 128 | def save_inference_samples(runs_dir, data_dir, sess, image_shape, logits, keep_prob, input_image): 129 | # Make folder for current run 130 | output_dir = os.path.join(runs_dir, str(time.time())) 131 | if os.path.exists(output_dir): 132 | shutil.rmtree(output_dir) 133 | os.makedirs(output_dir) 134 | 135 | # Run NN on test images and save them to HD 136 | print('Training Finished. Saving test images to: {}'.format(output_dir)) 137 | image_outputs = gen_test_output( 138 | sess, logits, keep_prob, input_image, os.path.join(data_dir, 'data_road/testing'), image_shape) 139 | for name, image in image_outputs: 140 | scipy.misc.imsave(os.path.join(output_dir, name), image) 141 | -------------------------------------------------------------------------------- /05_road_segmentation/image_augmentation.py: -------------------------------------------------------------------------------- 1 | """ 2 | Fairly basic set of tools for data augmentation on images. 
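The augmentations implemented below are random horizontal mirroring and random jitter of hue, saturation and brightness in HSV space.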
3 | """ 4 | 5 | import numpy as np 6 | import random 7 | import cv2 8 | from os.path import join, expanduser 9 | import matplotlib.pyplot as plt 10 | 11 | 12 | def perform_augmentation(batch_x, batch_y): 13 | """ 14 | Perform basic data augmentation on image batches. 15 | 16 | Parameters 17 | ---------- 18 | batch_x: ndarray of shape (b, h, w, c) 19 | Batch of images in RGB format, values in [0, 255] 20 | batch_y: ndarray of shape (b, h, w, c) 21 | Batch of ground truth with road segmentation 22 | 23 | Returns 24 | ------- 25 | batch_x_aug, batch_y_aug: two ndarray of shape (b, h, w, c) 26 | Augmented batches 27 | """ 28 | def mirror(x): 29 | return x[:, ::-1, :] 30 | 31 | def augment_in_hsv_space(x_hsv): 32 | x_hsv = np.float32(cv2.cvtColor(x_hsv, cv2.COLOR_RGB2HSV)) 33 | x_hsv[:, :, 0] = x_hsv[:, :, 0] * random.uniform(0.9, 1.1) # change hue 34 | x_hsv[:, :, 1] = x_hsv[:, :, 1] * random.uniform(0.5, 2.0) # change saturation 35 | x_hsv[:, :, 2] = x_hsv[:, :, 2] * random.uniform(0.5, 2.0) # change brightness 36 | x_hsv = np.uint8(np.clip(x_hsv, 0, 255)) 37 | return cv2.cvtColor(x_hsv, cv2.COLOR_HSV2RGB) 38 | 39 | batch_x_aug = np.copy(batch_x) 40 | batch_y_aug = np.copy(batch_y) 41 | 42 | for b in range(batch_x_aug.shape[0]): 43 | 44 | # Random mirroring 45 | should_mirror = random.choice([True, False]) 46 | if should_mirror: 47 | batch_x_aug[b] = mirror(batch_x[b]) 48 | batch_y_aug[b] = mirror(batch_y[b]) 49 | 50 | # Random change in image values (hue, saturation, brightness) 51 | batch_x_aug[b] = augment_in_hsv_space(batch_x_aug[b]) 52 | 53 | return batch_x_aug, batch_y_aug 54 | 55 | 56 | def debug_visualize_data_augmentation(): 57 | 58 | from main_27 import gen_batch_function # keep here to avoid circular dependencies 59 | 60 | """ 61 | Dirty and running code to debug image augmentation functions. 
62 | """ 63 | 64 | # Parameters 65 | data_dir = join(expanduser("~"), 'code', 'self-driving-car', 'project_12_road_segmentation', 'data') 66 | image_h, image_w = (160, 576) 67 | batch_size = 20 68 | 69 | # Create function to get batches 70 | batch_generator = gen_batch_function(join(data_dir, 'data_road/training'), (image_h, image_w)) 71 | 72 | # Load a batch and augment it 73 | batch_x, batch_y = next(batch_generator(batch_size)) 74 | batch_x_aug, batch_y_aug = perform_augmentation(batch_x, batch_y) 75 | 76 | # Show both original and augmented batch images 77 | for i in range(batch_size): 78 | plt.figure(1) 79 | 80 | x = batch_x[i] 81 | y = np.uint8(batch_y[i][:, :, 1]) * 255 # cast from boolean to uint8 for visualization 82 | y = np.stack([y, y, y], axis=2) # turn to 3-channels for visualization 83 | xy = np.concatenate([x, y], axis=0) 84 | 85 | plt.imshow(xy) 86 | 87 | plt.figure(2) 88 | 89 | x_aug = batch_x_aug[i] 90 | y_aug = np.uint8(batch_y_aug[i][:, :, 1]) * 255 # cast from boolean to uint8 for visualization 91 | y_aug = np.stack([y_aug, y_aug, y_aug], axis=2) # turn to 3-channels for visualization 92 | xy_aug = np.concatenate([x_aug, y_aug], axis=0) 93 | 94 | plt.imshow(xy_aug) 95 | 96 | plt.show(block=False) 97 | plt.waitforbuttonpress() 98 | 99 | 100 | if __name__ == '__main__': 101 | 102 | debug_visualize_data_augmentation() 103 | -------------------------------------------------------------------------------- /05_road_segmentation/img/example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/05_road_segmentation/img/example.png -------------------------------------------------------------------------------- /05_road_segmentation/img/overview.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/05_road_segmentation/img/overview.jpg -------------------------------------------------------------------------------- /05_road_segmentation/main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import warnings 4 | import tensorflow as tf 5 | from helper import gen_batch_function, save_inference_samples 6 | from distutils.version import LooseVersion 7 | from os.path import join, expanduser 8 | import project_tests as tests 9 | from image_augmentation import perform_augmentation 10 | 11 | 12 | # Check TensorFlow Version 13 | assert LooseVersion(tf.__version__) >= LooseVersion('1.0'),\ 14 | 'Please use TensorFlow version 1.0 or newer. You are using {}'.format(tf.__version__) 15 | print('TensorFlow Version: {}'.format(tf.__version__)) 16 | 17 | # Check for a GPU 18 | if not tf.test.gpu_device_name(): 19 | warnings.warn('No GPU found. Please use a GPU to train your neural network.') 20 | else: 21 | print('Default GPU Device: {}'.format(tf.test.gpu_device_name())) 22 | 23 | 24 | def load_vgg(sess, vgg_path): 25 | """ 26 | Load Pretrained VGG Model into TensorFlow. 
27 | 28 | :param sess: TensorFlow Session 29 | :param vgg_path: Path to vgg folder, containing "variables/" and "saved_model.pb" 30 | :return: Tuple of Tensors from VGG model (image_input, keep_prob, layer3_out, layer4_out, layer7_out) 31 | """ 32 | 33 | vgg_input_tensor_name = 'image_input:0' 34 | vgg_keep_prob_tensor_name = 'keep_prob:0' 35 | vgg_layer3_out_tensor_name = 'layer3_out:0' 36 | vgg_layer4_out_tensor_name = 'layer4_out:0' 37 | vgg_layer7_out_tensor_name = 'layer7_out:0' 38 | 39 | tf.saved_model.loader.load(sess, ['vgg16'], vgg_path) 40 | graph = tf.get_default_graph() 41 | 42 | image_input = graph.get_tensor_by_name(vgg_input_tensor_name) 43 | keep_prob = graph.get_tensor_by_name(vgg_keep_prob_tensor_name) 44 | layer3_out = graph.get_tensor_by_name(vgg_layer3_out_tensor_name) 45 | layer4_out = graph.get_tensor_by_name(vgg_layer4_out_tensor_name) 46 | layer7_out = graph.get_tensor_by_name(vgg_layer7_out_tensor_name) 47 | 48 | return image_input, keep_prob, layer3_out, layer4_out, layer7_out 49 | 50 | 51 | def layers(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes): 52 | """ 53 | Create the layers for a fully convolutional network. Build skip-layers using the vgg layers. 54 | For reference: https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf 55 | 56 | :param vgg_layer7_out: TF Tensor for VGG Layer 3 output 57 | :param vgg_layer4_out: TF Tensor for VGG Layer 4 output 58 | :param vgg_layer3_out: TF Tensor for VGG Layer 7 output 59 | :param num_classes: Number of classes to classify 60 | :return: The Tensor for the last layer of output 61 | """ 62 | 63 | kernel_regularizer = tf.contrib.layers.l2_regularizer(0.5) 64 | 65 | # Compute logits 66 | layer3_logits = tf.layers.conv2d(vgg_layer3_out, num_classes, kernel_size=[1, 1], 67 | padding='same', kernel_regularizer=kernel_regularizer) 68 | layer4_logits = tf.layers.conv2d(vgg_layer4_out, num_classes, kernel_size=[1, 1], 69 | padding='same', kernel_regularizer=kernel_regularizer) 70 | layer7_logits = tf.layers.conv2d(vgg_layer7_out, num_classes, kernel_size=[1, 1], 71 | padding='same', kernel_regularizer=kernel_regularizer) 72 | 73 | # Add skip connection before 4th and 7th layer 74 | layer7_logits_up = tf.image.resize_images(layer7_logits, size=[10, 36]) 75 | layer_4_7_fused = tf.add(layer7_logits_up, layer4_logits) 76 | 77 | # Add skip connection before (4+7)th and 3rd layer 78 | layer_4_7_fused_up = tf.image.resize_images(layer_4_7_fused, size=[20, 72]) 79 | layer_3_4_7_fused = tf.add(layer3_logits, layer_4_7_fused_up) 80 | 81 | # resize to original size 82 | layer_3_4_7_up = tf.image.resize_images(layer_3_4_7_fused, size=[160, 576]) 83 | layer_3_4_7_up = tf.layers.conv2d(layer_3_4_7_up, num_classes, kernel_size=[15, 15], 84 | padding='same', kernel_regularizer=kernel_regularizer) 85 | 86 | return layer_3_4_7_up 87 | 88 | 89 | def optimize(net_prediction, labels, learning_rate, num_classes): 90 | """ 91 | Build the TensorFLow loss and optimizer operations. 
92 | :param net_prediction: TF Tensor of the last layer in the neural network 93 | :param labels: TF Placeholder for the correct label image 94 | :param learning_rate: TF Placeholder for the learning rate 95 | :param num_classes: Number of classes to classify 96 | :return: Tuple of (logits, train_op, cross_entropy_loss) 97 | """ 98 | 99 | # Unroll 100 | logits_flat = tf.reshape(net_prediction, (-1, num_classes)) 101 | labels_flat = tf.reshape(labels, (-1, num_classes)) 102 | 103 | # Define loss 104 | cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels_flat, logits=logits_flat)) 105 | 106 | # Define optimization step 107 | train_step = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy_loss) 108 | 109 | return logits_flat, train_step, cross_entropy_loss 110 | 111 | 112 | def train_nn(sess, training_epochs, batch_size, get_batches_fn, train_op, cross_entropy_loss, 113 | image_input, labels, keep_prob, learning_rate): 114 | """ 115 | Train neural network and print out the loss during training. 116 | :param sess: TF Session 117 | :param training_epochs: Number of epochs 118 | :param batch_size: Batch size 119 | :param get_batches_fn: Function to get batches of training data. Call using get_batches_fn(batch_size) 120 | :param train_op: TF Operation to train the neural network 121 | :param cross_entropy_loss: TF Tensor for the amount of loss 122 | :param image_input: TF Placeholder for input images 123 | :param labels: TF Placeholder for label images 124 | :param keep_prob: TF Placeholder for dropout keep probability 125 | :param learning_rate: TF Placeholder for learning rate 126 | """ 127 | 128 | # Variable initialization 129 | sess.run(tf.global_variables_initializer()) 130 | 131 | lr = args.learning_rate 132 | 133 | for e in range(0, training_epochs): 134 | 135 | loss_this_epoch = 0.0 136 | 137 | for i in range(0, args.batches_per_epoch): 138 | 139 | # Load a batch of examples 140 | batch_x, batch_y = next(get_batches_fn(batch_size)) 141 | if should_do_augmentation: 142 | batch_x, batch_y = perform_augmentation(batch_x, batch_y) 143 | 144 | _, cur_loss = sess.run(fetches=[train_op, cross_entropy_loss], 145 | feed_dict={image_input: batch_x, labels: batch_y, keep_prob: 0.25, 146 | learning_rate: lr}) 147 | 148 | loss_this_epoch += cur_loss 149 | 150 | print('Epoch: {:02d} - Loss: {:.03f}'.format(e, loss_this_epoch / args.batches_per_epoch)) 151 | 152 | 153 | def perform_tests(): 154 | tests.test_for_kitti_dataset(data_dir) 155 | tests.test_load_vgg(load_vgg, tf) 156 | tests.test_layers(layers) 157 | tests.test_optimize(optimize) 158 | tests.test_train_nn(train_nn) 159 | 160 | 161 | def run(): 162 | 163 | num_classes = 2 164 | 165 | image_h, image_w = (160, 576) 166 | 167 | with tf.Session() as sess: 168 | 169 | # Path to vgg model 170 | vgg_path = join(data_dir, 'vgg') 171 | 172 | # Create function to get batches 173 | batch_generator = gen_batch_function(join(data_dir, 'data_road/training'), (image_h, image_w)) 174 | 175 | # Load VGG pretrained 176 | image_input, keep_prob, vgg_layer3_out, vgg_layer4_out, vgg_layer7_out = load_vgg(sess, vgg_path) 177 | 178 | # Add skip connections 179 | output = layers(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes) 180 | 181 | # Define placeholders 182 | labels = tf.placeholder(tf.float32, shape=[None, image_h, image_w, num_classes]) 183 | learning_rate = tf.placeholder(tf.float32, shape=[]) 184 | 185 | logits, train_op, cross_entropy_loss = optimize(output, labels, learning_rate, 
num_classes) 186 | 187 | # Training parameters 188 | train_nn(sess, args.training_epochs, args.batch_size, batch_generator, train_op, cross_entropy_loss, 189 | image_input, labels, keep_prob, learning_rate) 190 | 191 | save_inference_samples(runs_dir, data_dir, sess, (image_h, image_w), logits, keep_prob, image_input) 192 | 193 | 194 | def parse_arguments(): 195 | """ 196 | Parse command line arguments 197 | """ 198 | parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter) 199 | parser.add_argument('--batch_size', type=int, default=8, help='Batch size used for training', metavar='') 200 | parser.add_argument('--batches_per_epoch', type=int, default=100, help='Batches each training epoch', metavar='') 201 | parser.add_argument('--training_epochs', type=int, default=30, help='Number of training epoch', metavar='') 202 | parser.add_argument('--learning_rate', type=float, default=1e-4, help='Learning rate', metavar='') 203 | parser.add_argument('--augmentation', type=bool, default=True, help='Perform augmentation in training', metavar='') 204 | parser.add_argument('--gpu', type=int, default=0, help='Which GPU to use', metavar='') 205 | return parser.parse_args() 206 | 207 | 208 | if __name__ == '__main__': 209 | 210 | data_dir = join(expanduser("~"), 'code', 'self-driving-car', 'project_12_road_segmentation', 'data') 211 | runs_dir = join(expanduser("~"), 'majinbu_home', 'road_segmentation_prediction') 212 | 213 | args = parse_arguments() 214 | 215 | # Appropriately set GPU device 216 | os.environ['CUDA_VISIBLE_DEVICES'] = str(args.gpu) 217 | print('Using GPU: {:02d}.'.format(args.gpu)) 218 | 219 | # Turn off augmentation during tests 220 | should_do_augmentation = False 221 | perform_tests() 222 | 223 | # Restore appropriate augmentation value 224 | should_do_augmentation = args.augmentation 225 | run() 226 | -------------------------------------------------------------------------------- /05_road_segmentation/project_tests.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | from copy import deepcopy 4 | from glob import glob 5 | from unittest import mock 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | 10 | 11 | def test_safe(func): 12 | """ 13 | Isolate tests 14 | """ 15 | def func_wrapper(*args): 16 | with tf.Graph().as_default(): 17 | result = func(*args) 18 | print('Tests Passed') 19 | return result 20 | 21 | return func_wrapper 22 | 23 | 24 | def _prevent_print(function, params): 25 | sys.stdout = open(os.devnull, "w") 26 | function(**params) 27 | sys.stdout = sys.__stdout__ 28 | 29 | 30 | def _assert_tensor_shape(tensor, shape, display_name): 31 | assert tf.assert_rank(tensor, len(shape), message='{} has wrong rank'.format(display_name)) 32 | 33 | tensor_shape = tensor.get_shape().as_list() if len(shape) else [] 34 | 35 | wrong_dimension = [ten_dim for ten_dim, cor_dim in zip(tensor_shape, shape) 36 | if cor_dim is not None and ten_dim != cor_dim] 37 | assert not wrong_dimension, \ 38 | '{} has wrong shape. Found {}'.format(display_name, tensor_shape) 39 | 40 | 41 | class TmpMock(object): 42 | """ 43 | Mock a attribute. Restore attribute when exiting scope. 
44 | """ 45 | def __init__(self, module, attrib_name): 46 | self.original_attrib = deepcopy(getattr(module, attrib_name)) 47 | setattr(module, attrib_name, mock.MagicMock()) 48 | self.module = module 49 | self.attrib_name = attrib_name 50 | 51 | def __enter__(self): 52 | return getattr(self.module, self.attrib_name) 53 | 54 | def __exit__(self, type, value, traceback): 55 | setattr(self.module, self.attrib_name, self.original_attrib) 56 | 57 | 58 | @test_safe 59 | def test_load_vgg(load_vgg, tf_module): 60 | with TmpMock(tf_module.saved_model.loader, 'load') as mock_load_model: 61 | vgg_path = '' 62 | sess = tf.Session() 63 | test_input_image = tf.placeholder(tf.float32, name='image_input') 64 | test_keep_prob = tf.placeholder(tf.float32, name='keep_prob') 65 | test_vgg_layer3_out = tf.placeholder(tf.float32, name='layer3_out') 66 | test_vgg_layer4_out = tf.placeholder(tf.float32, name='layer4_out') 67 | test_vgg_layer7_out = tf.placeholder(tf.float32, name='layer7_out') 68 | 69 | input_image, keep_prob, vgg_layer3_out, vgg_layer4_out, vgg_layer7_out = load_vgg(sess, vgg_path) 70 | 71 | assert mock_load_model.called, \ 72 | 'tf.saved_model.loader.load() not called' 73 | assert mock_load_model.call_args == mock.call(sess, ['vgg16'], vgg_path), \ 74 | 'tf.saved_model.loader.load() called with wrong arguments.' 75 | 76 | assert input_image == test_input_image, 'input_image is the wrong object' 77 | assert keep_prob == test_keep_prob, 'keep_prob is the wrong object' 78 | assert vgg_layer3_out == test_vgg_layer3_out, 'layer3_out is the wrong object' 79 | assert vgg_layer4_out == test_vgg_layer4_out, 'layer4_out is the wrong object' 80 | assert vgg_layer7_out == test_vgg_layer7_out, 'layer7_out is the wrong object' 81 | 82 | 83 | @test_safe 84 | def test_layers(layers): 85 | num_classes = 2 86 | vgg_layer3_out = tf.placeholder(tf.float32, [None, None, None, 256]) 87 | vgg_layer4_out = tf.placeholder(tf.float32, [None, None, None, 512]) 88 | vgg_layer7_out = tf.placeholder(tf.float32, [None, None, None, 4096]) 89 | layers_output = layers(vgg_layer3_out, vgg_layer4_out, vgg_layer7_out, num_classes) 90 | 91 | _assert_tensor_shape(layers_output, [None, None, None, num_classes], 'Layers Output') 92 | 93 | 94 | @test_safe 95 | def test_optimize(optimize): 96 | num_classes = 2 97 | shape = [2, 3, 4, num_classes] 98 | layers_output = tf.Variable(tf.zeros(shape)) 99 | correct_label = tf.placeholder(tf.float32, [None, None, None, num_classes]) 100 | learning_rate = tf.placeholder(tf.float32) 101 | logits, train_op, cross_entropy_loss = optimize(layers_output, correct_label, learning_rate, num_classes) 102 | 103 | _assert_tensor_shape(logits, [2*3*4, num_classes], 'Logits') 104 | 105 | with tf.Session() as sess: 106 | sess.run(tf.global_variables_initializer()) 107 | sess.run([train_op], {correct_label: np.arange(np.prod(shape)).reshape(shape), learning_rate: 10}) 108 | test, loss = sess.run([layers_output, cross_entropy_loss], {correct_label: np.arange(np.prod(shape)).reshape(shape)}) 109 | 110 | assert test.min() != 0 or test.max() != 0, 'Training operation not changing weights.' 
111 | 112 | 113 | @test_safe 114 | def test_train_nn(train_nn): 115 | epochs = 1 116 | batch_size = 2 117 | 118 | def get_batches_fn(batch_size_parm): 119 | shape = [batch_size_parm, 2, 3, 3] 120 | yield np.arange(np.prod(shape)).reshape(shape) 121 | 122 | train_op = tf.constant(0) 123 | cross_entropy_loss = tf.constant(10.11) 124 | input_image = tf.placeholder(tf.float32, name='input_image') 125 | correct_label = tf.placeholder(tf.float32, name='correct_label') 126 | keep_prob = tf.placeholder(tf.float32, name='keep_prob') 127 | learning_rate = tf.placeholder(tf.float32, name='learning_rate') 128 | with tf.Session() as sess: 129 | parameters = { 130 | 'sess': sess, 131 | 'training_epochs': epochs, 132 | 'batch_size': batch_size, 133 | 'get_batches_fn': get_batches_fn, 134 | 'train_op': train_op, 135 | 'cross_entropy_loss': cross_entropy_loss, 136 | 'image_input': input_image, 137 | 'labels': correct_label, 138 | 'keep_prob': keep_prob, 139 | 'learning_rate': learning_rate} 140 | _prevent_print(train_nn, parameters) 141 | 142 | 143 | @test_safe 144 | def test_for_kitti_dataset(data_dir): 145 | kitti_dataset_path = os.path.join(data_dir, 'data_road') 146 | training_labels_count = len(glob(os.path.join(kitti_dataset_path, 'training/gt_image_2/*_road_*.png'))) 147 | training_images_count = len(glob(os.path.join(kitti_dataset_path, 'training/image_2/*.png'))) 148 | testing_images_count = len(glob(os.path.join(kitti_dataset_path, 'testing/image_2/*.png'))) 149 | 150 | assert not (training_images_count == training_labels_count == testing_images_count == 0),\ 151 | 'Kitti dataset not found. Extract Kitti dataset in {}'.format(kitti_dataset_path) 152 | assert training_images_count == 289, 'Expected 289 training images, found {} images.'.format(training_images_count) 153 | assert training_labels_count == 289, 'Expected 289 training labels, found {} labels.'.format(training_labels_count) 154 | assert testing_images_count == 290, 'Expected 290 testing images, found {} images.'.format(testing_images_count) 155 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Fei Ding 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Deep Learning using Python/C++/OpenCV 3 | 4 | --- 5 | 6 | ## Basics 7 | 8 | - [Deep Learning Basics](./resources/deep-learning.md) 9 | - [Components of Autonomous Driving System](./resources/autonomous-driving.md) 10 | - [Datasets](./resources/datasets.md) 11 | - [Train your own object detector with Faster-RCNN & PyTorch](./faster-rcnn-tutorial) 12 | 13 | 14 | ## Computer Vision and Deep Learning 15 | 16 | #### [P1 - Detecting Lane Lines](./01_finding_lane_lines) 17 | - **Basic:** Detected highway lane lines on a video stream. Used OpenCV image analysis techniques to identify lines, including Hough transforms and Canny edge detection. 18 | - **Keywords:** Computer Vision, OpenCV 19 | 20 | #### [P2 - Traffic Sign Classification](./02_traffic_sign_detector) 21 | - **Summary:** Built and trained a support vector machine (SVM) to classify traffic signs, using [dlib](http://dlib.net/). Google Street View images can be used to train the detectors. 25~40 images are sufficient to train a good detector. 22 | - **Keywords:** Computer Vision, Machine Learning 23 | 24 | #### [P3 - Object Detection with OpenCV](./03_opencv_detection) 25 | - **Summary:** OpenCV's DNN API (for C++ and Python) is very easy to use: just load the network and run it. Multiple inputs/outputs are supported. Here are the examples: https://github.com/opencv/opencv/tree/master/samples/dnn. 26 | 27 | #### [P4 - Vehicle Detection and Tracking](./04_vehicle_detection) 28 | - **Summary:** Created a vehicle detection and tracking pipeline with OpenCV, histogram of oriented gradients (HOG), and support vector machines (SVM). Implemented the same pipeline using a deep network to perform detection. Optimized and evaluated the model on video data from an automotive camera taken during highway driving. 29 | - **Keywords:** Computer Vision, Deep Learning, OpenCV 30 | 31 | #### [P5 - Road Segmentation](./05_road_segmentation) 32 | - **Summary:** Implemented road segmentation using a fully convolutional network. 33 | - **Keywords:** Deep Learning, Semantic Segmentation 34 | 35 | 36 | ## References 37 | 38 | - 39 | - 40 | 41 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/README.md: -------------------------------------------------------------------------------- 1 | # Train your own object detector with Faster-RCNN & PyTorch 2 | 3 | This repository contains all files that were used for the blog tutorial 4 | [**Train your own object detector with Faster-RCNN & PyTorch**](https://github.com/ifding/faster-rcnn-tutorial). 5 | 6 | - If you want to use Neptune for your own experiments, add the 'NEPTUNE' env var to your system. For example, I use `dotenv`: 7 | 8 | `$ dotenv set NEPTUNE your_key` 9 | 10 | This creates `.env` in the current directory, and `.env` is already listed in `.gitignore`. After that, calling `load_dotenv()` in your code will automatically pick up the 'NEPTUNE' env var from `.env`.
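  A minimal sketch of reading the key back at training time (assuming the `python-dotenv` package, which provides the `dotenv` CLI and `load_dotenv()`; how you hand the key to your experiment logger is up to you):

```python
import os

from dotenv import load_dotenv  # from the python-dotenv package

load_dotenv()                    # loads key/value pairs from .env into the process environment
api_key = os.getenv('NEPTUNE')   # the key stored via `dotenv set NEPTUNE your_key`

# Pass `api_key` to whatever experiment logger you use (e.g. a Neptune logger);
# keeping the key in .env avoids hard-coding credentials in the training script.
```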
11 | 12 | - Just focus on modifying `custom_dataset.py` file for your own data 13 | - Now compatible with pytorch 1.9 and pytorch lighting 1.37 14 | 15 | 16 | ## Installation steps: 17 | 18 | - `conda create -n ` 19 | - `conda activate ` 20 | - `conda install python=3.8` 21 | - `git clone https://github.com/ifding/faster-rcnn-tutorial.git` 22 | - `cd faster-rcnn-tutorial` 23 | - `pip install .` 24 | - You have to install a pytorch version with `pip` or `conda` that meets the requirements of your hardware. 25 | Otherwise the versions for torch etc. specified in [setup.py](setup.py) are installed. 26 | To install the correct pytorch version for your hardware, check [pytorch.org](https://pytorch.org/). 27 | - [OPTIONAL] To check whether pytorch uses the nvidia gpu, check if `torch.cuda.is_available()` returns `True` in a python shell. 28 | 29 | ## Custom dataset: balloon 30 | 31 | - `sh download_dataset.sh` 32 | - `CUDA_VISIBLE_DEVICES=0 python train.py` 33 | - check the checkpoint in folder `balloon` and experiment details in 34 | 35 | ![](dataset/result.png) 36 | 37 | ## Acknowledge 38 | 39 | Most code is borrowed from , thanks, Johannes Schmidt! 40 | 41 | I try to reduce less important compontents, and make the whole pipeline more clear. 42 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/custom_dataset.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pathlib 3 | from multiprocessing import Pool 4 | from typing import List, Dict 5 | import numpy as np 6 | import json 7 | import cv2 8 | 9 | import torch 10 | from skimage.color import rgba2rgb 11 | from skimage.io import imread 12 | from torchvision.ops import box_convert 13 | from detection.transformations import ComposeDouble, ComposeSingle, map_class_to_int 14 | from detection.utils import read_json 15 | 16 | 17 | # https://github.com/TannerGilbert/Object-Detection-and-Image-Segmentation-with-Detectron2 18 | def get_balloon_dicts(img_dir): 19 | json_file = os.path.join(img_dir, "via_region_data.json") 20 | with open(json_file) as f: 21 | imgs_anns = json.load(f) 22 | 23 | dataset_dicts = [] 24 | for idx, v in enumerate(imgs_anns.values()): 25 | record = {} 26 | 27 | filename = os.path.join(img_dir, v["filename"]) 28 | height, width = cv2.imread(filename).shape[:2] 29 | 30 | record["file_name"] = filename 31 | 32 | annos = v["regions"] 33 | boxes, labels = [], [] 34 | for _, anno in annos.items(): 35 | assert not anno["region_attributes"] 36 | anno = anno["shape_attributes"] 37 | px = anno["all_points_x"] 38 | py = anno["all_points_y"] 39 | 40 | boxes.append([np.min(px), np.min(py), np.max(px), np.max(py)]) 41 | labels.append('balloon') 42 | record["annotations"] = {'boxes': boxes, 'labels': labels} 43 | dataset_dicts.append(record) 44 | return dataset_dicts 45 | 46 | 47 | class ObjectDetectionDataSet(torch.utils.data.Dataset): 48 | """ 49 | Builds a dataset with images and their respective targets. 50 | A target is expected to be a json file 51 | and should contain at least a 'boxes' and a 'labels' key. 52 | inputs and targets are expected to be a list of pathlib.Path objects. 53 | 54 | In case your labels are strings, you can use mapping (a dict) to int-encode them. 
55 | Returns a dict with the following keys: 'x', 'x_name', 'y', 'y_name' 56 | """ 57 | 58 | def __init__(self, 59 | data_path: str, 60 | transform: ComposeDouble = None, 61 | mapping: Dict = None 62 | ): 63 | self.data_path = data_path 64 | self.dataset_dict = get_balloon_dicts(data_path) 65 | self.transform = transform 66 | self.mapping = mapping 67 | 68 | def __len__(self): 69 | return len(self.dataset_dict) 70 | 71 | def __getitem__(self, 72 | index: int): 73 | record = self.dataset_dict[index] 74 | 75 | # Load input 76 | x = imread(record["file_name"]) 77 | img_name = os.path.basename(record["file_name"]) 78 | 79 | # From RGBA to RGB 80 | if x.shape[-1] == 4: 81 | x = rgba2rgb(x) 82 | 83 | # Label Mapping 84 | y = record["annotations"] 85 | if self.mapping: 86 | labels = map_class_to_int(y['labels'], mapping=self.mapping) 87 | else: 88 | labels = y['labels'] 89 | 90 | # Create target, should be converted to np.ndarrays 91 | target = {'boxes': np.array(y['boxes']), 92 | 'labels': np.array(labels)} 93 | 94 | if self.transform is not None: 95 | x, target = self.transform(x, target) # returns np.ndarrays 96 | 97 | # Typecasting 98 | x = torch.from_numpy(x).type(torch.float32) 99 | target = {key: torch.from_numpy(value).type(torch.int64) for key, value in target.items()} 100 | 101 | return {'x': x, 'y': target, 'x_name': img_name, 'y_name': img_name} 102 | 103 | 104 | 105 | class ObjectDetectionDatasetSingle(torch.utils.data.Dataset): 106 | """ 107 | Builds a dataset with images. 108 | inputs is expected to be a list of pathlib.Path objects. 109 | 110 | Returns a dict with the following keys: 'x', 'x_name' 111 | """ 112 | 113 | def __init__(self, 114 | inputs: List[pathlib.Path], 115 | transform: ComposeSingle = None, 116 | use_cache: bool = False, 117 | ): 118 | self.inputs = inputs 119 | self.transform = transform 120 | self.use_cache = use_cache 121 | 122 | if self.use_cache: 123 | # Use multiprocessing to load images and targets into RAM 124 | with Pool() as pool: 125 | self.cached_data = pool.starmap(self.read_images, inputs) 126 | 127 | def __len__(self): 128 | return len(self.inputs) 129 | 130 | def __getitem__(self, 131 | index: int): 132 | if self.use_cache: 133 | x = self.cached_data[index] 134 | else: 135 | # Select the sample 136 | input_ID = self.inputs[index] 137 | 138 | # Load input and target 139 | x = self.read_images(input_ID) 140 | 141 | # From RGBA to RGB 142 | if x.shape[-1] == 4: 143 | x = rgba2rgb(x) 144 | 145 | # Preprocessing 146 | if self.transform is not None: 147 | x = self.transform(x) # returns a np.ndarray 148 | 149 | # Typecasting 150 | x = torch.from_numpy(x).type(torch.float32) 151 | 152 | return {'x': x, 'x_name': self.inputs[index].name} 153 | 154 | @staticmethod 155 | def read_images(inp): 156 | return imread(inp) 157 | 158 | class ObjectDetectionDataSetDouble(torch.utils.data.Dataset): 159 | """ 160 | Builds a dataset with images and their respective targets. 161 | A target is expected to be a json file 162 | and should contain at least a 'boxes' and a 'labels' key. 163 | inputs and targets are expected to be a list of pathlib.Path objects. 164 | In case your labels are strings, you can use mapping (a dict) to int-encode them. 
165 | Returns a dict with the following keys: 'x', 'x_name', 'y', 'y_name' 166 | """ 167 | 168 | def __init__(self, 169 | inputs: List[pathlib.Path], 170 | targets: List[pathlib.Path], 171 | transform: ComposeDouble = None, 172 | use_cache: bool = False, 173 | convert_to_format: str = None, 174 | mapping: Dict = None 175 | ): 176 | self.inputs = inputs 177 | self.targets = targets 178 | self.transform = transform 179 | self.use_cache = use_cache 180 | self.convert_to_format = convert_to_format 181 | self.mapping = mapping 182 | 183 | if self.use_cache: 184 | # Use multiprocessing to load images and targets into RAM 185 | with Pool() as pool: 186 | self.cached_data = pool.starmap(self.read_images, zip(inputs, targets)) 187 | 188 | def __len__(self): 189 | return len(self.inputs) 190 | 191 | def __getitem__(self, 192 | index: int): 193 | if self.use_cache: 194 | x, y = self.cached_data[index] 195 | else: 196 | # Select the sample 197 | input_ID = self.inputs[index] 198 | target_ID = self.targets[index] 199 | 200 | # Load input and target 201 | x, y = self.read_images(input_ID, target_ID) 202 | 203 | # From RGBA to RGB 204 | if x.shape[-1] == 4: 205 | x = rgba2rgb(x) 206 | 207 | # Read boxes 208 | try: 209 | boxes = torch.from_numpy(y['boxes']).to(torch.float32) 210 | except TypeError: 211 | boxes = torch.tensor(y['boxes']).to(torch.float32) 212 | 213 | # Read scores 214 | if 'scores' in y.keys(): 215 | try: 216 | scores = torch.from_numpy(y['scores']).to(torch.float32) 217 | except TypeError: 218 | scores = torch.tensor(y['scores']).to(torch.float32) 219 | 220 | # Label Mapping 221 | if self.mapping: 222 | labels = map_class_to_int(y['labels'], mapping=self.mapping) 223 | else: 224 | labels = y['labels'] 225 | 226 | # Read labels 227 | try: 228 | labels = torch.from_numpy(labels).to(torch.int64) 229 | except TypeError: 230 | labels = torch.tensor(labels).to(torch.int64) 231 | 232 | 233 | # Create target 234 | target = {'boxes': boxes, 235 | 'labels': labels} 236 | 237 | if 'scores' in y.keys(): 238 | target['scores'] = scores 239 | 240 | # Preprocessing 241 | target = {key: value.numpy() for key, value in target.items()} # all tensors should be converted to np.ndarrays 242 | 243 | if self.transform is not None: 244 | x, target = self.transform(x, target) # returns np.ndarrays 245 | 246 | # Typecasting 247 | x = torch.from_numpy(x).type(torch.float32) 248 | target = {key: torch.from_numpy(value).type(torch.int64) for key, value in target.items()} 249 | 250 | return {'x': x, 'y': target, 'x_name': self.inputs[index].name, 'y_name': self.targets[index].name} 251 | 252 | @staticmethod 253 | def read_images(inp, tar): 254 | return imread(inp), read_json(tar) -------------------------------------------------------------------------------- /faster-rcnn-tutorial/dataset/README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Dataset 4 | 5 | Before we can start training our model we need to download some dataset. In this case we will use a dataset with balloon images. 
6 | 7 | 8 | 9 | 10 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/dataset/download_dataset.sh: -------------------------------------------------------------------------------- 1 | # fetch balloon images 2 | 3 | wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip 4 | unzip balloon_dataset.zip > /dev/null 5 | rm balloon_dataset.zip 6 | rm -fr __MACOSX 7 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/dataset/result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/faster-rcnn-tutorial/dataset/result.png -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/faster-rcnn-tutorial/detection/__init__.py -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/anchor_generator.py: -------------------------------------------------------------------------------- 1 | from typing import Tuple 2 | 3 | import torch 4 | from torch import nn 5 | from torch.jit.annotations import List, Optional, Dict 6 | from torchvision.models.detection.image_list import ImageList 7 | from torchvision.models.detection.transform import GeneralizedRCNNTransform 8 | 9 | 10 | class AnchorGenerator(nn.Module): 11 | # Slightly adapted AnchorGenerator from torchvision. 12 | # It returns anchors_over_all_feature_maps instead of anchors (concatenated for every feature layer) 13 | 14 | """ 15 | Module that generates anchors for a set of feature maps and 16 | image sizes. 17 | 18 | The module support computing anchors at multiple sizes and aspect ratios 19 | per feature map. This module assumes aspect ratio = height / width for 20 | each anchor. 21 | 22 | sizes and aspect_ratios should have the same number of elements, and it should 23 | correspond to the number of feature maps. 24 | 25 | sizes[i] and aspect_ratios[i] can have an arbitrary number of elements, 26 | and AnchorGenerator will output a set of sizes[i] * aspect_ratios[i] anchors 27 | per spatial location for feature map i. 
28 | 29 | Arguments: 30 | sizes (Tuple[Tuple[int]]): 31 | aspect_ratios (Tuple[Tuple[float]]): 32 | """ 33 | 34 | __annotations__ = { 35 | "cell_anchors": Optional[List[torch.Tensor]], 36 | "_cache": Dict[str, List[torch.Tensor]] 37 | } 38 | 39 | def __init__( 40 | self, 41 | sizes=((128, 256, 512),), 42 | aspect_ratios=((0.5, 1.0, 2.0),), 43 | ): 44 | super(AnchorGenerator, self).__init__() 45 | 46 | if not isinstance(sizes[0], (list, tuple)): 47 | sizes = tuple((s,) for s in sizes) 48 | if not isinstance(aspect_ratios[0], (list, tuple)): 49 | aspect_ratios = (aspect_ratios,) * len(sizes) 50 | 51 | assert len(sizes) == len(aspect_ratios) 52 | 53 | self.sizes = sizes 54 | self.aspect_ratios = aspect_ratios 55 | self.cell_anchors = None 56 | self._cache = {} 57 | 58 | def generate_anchors(self, scales, aspect_ratios, dtype=torch.float32, device="cpu"): 59 | # type: (List[int], List[float], int, Device) -> Tensor # noqa: F821 60 | scales = torch.as_tensor(scales, dtype=dtype, device=device) 61 | aspect_ratios = torch.as_tensor(aspect_ratios, dtype=dtype, device=device) 62 | h_ratios = torch.sqrt(aspect_ratios) 63 | w_ratios = 1 / h_ratios 64 | 65 | ws = (w_ratios[:, None] * scales[None, :]).view(-1) 66 | hs = (h_ratios[:, None] * scales[None, :]).view(-1) 67 | 68 | base_anchors = torch.stack([-ws, -hs, ws, hs], dim=1) / 2 69 | return base_anchors.round() 70 | 71 | def set_cell_anchors(self, dtype, device): 72 | # type: (int, Device) -> None # noqa: F821 73 | if self.cell_anchors is not None: 74 | cell_anchors = self.cell_anchors 75 | assert cell_anchors is not None 76 | # suppose that all anchors have the same device 77 | # which is a valid assumption in the current state of the codebase 78 | if cell_anchors[0].device == device: 79 | return 80 | 81 | cell_anchors = [ 82 | self.generate_anchors( 83 | sizes, 84 | aspect_ratios, 85 | dtype, 86 | device 87 | ) 88 | for sizes, aspect_ratios in zip(self.sizes, self.aspect_ratios) 89 | ] 90 | self.cell_anchors = cell_anchors 91 | 92 | def num_anchors_per_location(self): 93 | return [len(s) * len(a) for s, a in zip(self.sizes, self.aspect_ratios)] 94 | 95 | # For every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:2), 96 | # output g[i] anchors that are s[i] distance apart in direction i, with the same dimensions as a. 97 | def grid_anchors(self, grid_sizes, strides): 98 | # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor] 99 | anchors = [] 100 | cell_anchors = self.cell_anchors 101 | assert cell_anchors is not None 102 | assert len(grid_sizes) == len(strides) == len(cell_anchors) 103 | 104 | for size, stride, base_anchors in zip( 105 | grid_sizes, strides, cell_anchors 106 | ): 107 | grid_height, grid_width = size 108 | stride_height, stride_width = stride 109 | device = base_anchors.device 110 | 111 | # For output anchor, compute [x_center, y_center, x_center, y_center] 112 | shifts_x = torch.arange( 113 | 0, grid_width, dtype=torch.float32, device=device 114 | ) * stride_width 115 | shifts_y = torch.arange( 116 | 0, grid_height, dtype=torch.float32, device=device 117 | ) * stride_height 118 | shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x) 119 | shift_x = shift_x.reshape(-1) 120 | shift_y = shift_y.reshape(-1) 121 | shifts = torch.stack((shift_x, shift_y, shift_x, shift_y), dim=1) 122 | 123 | # For every (base anchor, output anchor) pair, 124 | # offset each zero-centered base anchor by the center of the output anchor. 
125 | anchors.append( 126 | (shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4) 127 | ) 128 | 129 | return anchors 130 | 131 | def cached_grid_anchors(self, grid_sizes, strides): 132 | # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor] 133 | key = str(grid_sizes) + str(strides) 134 | if key in self._cache: 135 | return self._cache[key] 136 | anchors = self.grid_anchors(grid_sizes, strides) 137 | self._cache[key] = anchors 138 | return anchors 139 | 140 | def forward(self, image_list, feature_maps): 141 | # type: (ImageList, List[Tensor]) -> List[Tensor] 142 | grid_sizes = list([feature_map.shape[-2:] for feature_map in feature_maps]) 143 | image_size = image_list.tensors.shape[-2:] 144 | dtype, device = feature_maps[0].dtype, feature_maps[0].device 145 | strides = [[torch.tensor(image_size[0] // g[0], dtype=torch.int64, device=device), 146 | torch.tensor(image_size[1] // g[1], dtype=torch.int64, device=device)] for g in grid_sizes] 147 | self.set_cell_anchors(dtype, device) 148 | anchors_over_all_feature_maps = self.cached_grid_anchors(grid_sizes, strides) 149 | self._cache.clear() 150 | return anchors_over_all_feature_maps 151 | 152 | 153 | def get_anchor_boxes(image: torch.tensor, 154 | rcnn_transform: GeneralizedRCNNTransform, 155 | feature_map_size: tuple, 156 | anchor_size: Tuple[tuple] = ((128, 256, 512),), 157 | aspect_ratios: Tuple[tuple] = ((1.0,),), 158 | ): 159 | """ 160 | Returns the anchors for a given image and feature map. 161 | image should be a torch.tensor with shape [C, H, W]. 162 | feature_map_size should be a tuple with shape (C, H, W]). 163 | Only one feature map supported at the moment. 164 | 165 | Example: 166 | 167 | from torchvision.models.detection.transform import GeneralizedRCNNTransform 168 | 169 | transform = GeneralizedRCNNTransform(min_size=1024, 170 | max_size=1024, 171 | image_mean=[0.485, 0.456, 0.406], 172 | image_std=[0.229, 0.224, 0.225]) 173 | 174 | image = dataset[0]['x'] # ObjectDetectionDataSet 175 | 176 | anchors = get_anchor_boxes(image, 177 | transform, 178 | feature_map_size=(512, 16, 16), 179 | anchor_size=((128, 256, 512),), 180 | aspect_ratios=((1.0, 2.0),) 181 | ) 182 | """ 183 | 184 | image_transformed = rcnn_transform([image]) 185 | 186 | features = [torch.rand(size=feature_map_size)] 187 | 188 | anchor_gen = AnchorGenerator(anchor_size, aspect_ratios) 189 | anchors = anchor_gen(image_list=image_transformed[0], feature_maps=features) 190 | 191 | return anchors[0] 192 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/backbone_resnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torchvision.models as models 3 | from torch import nn 4 | from torchvision.models import resnet 5 | from torchvision.models._utils import IntermediateLayerGetter 6 | from torchvision.ops import misc as misc_nn_ops 7 | from torchvision.ops.feature_pyramid_network import FeaturePyramidNetwork 8 | 9 | 10 | def get_resnet_backbone(backbone_name: str): 11 | """ 12 | Returns a resnet backbone pretrained on ImageNet. 13 | Removes the average-pooling layer and the linear layer at the end. 
14 | """ 15 | if backbone_name == 'resnet18': 16 | pretrained_model = models.resnet18(pretrained=True, progress=False) 17 | out_channels = 512 18 | elif backbone_name == 'resnet34': 19 | pretrained_model = models.resnet34(pretrained=True, progress=False) 20 | out_channels = 512 21 | elif backbone_name == 'resnet50': 22 | pretrained_model = models.resnet50(pretrained=True, progress=False) 23 | out_channels = 2048 24 | elif backbone_name == 'resnet101': 25 | pretrained_model = models.resnet101(pretrained=True, progress=False) 26 | out_channels = 2048 27 | elif backbone_name == 'resnet152': 28 | pretrained_model = models.resnet152(pretrained=True, progress=False) 29 | out_channels = 2048 30 | 31 | backbone = torch.nn.Sequential(*list(pretrained_model.children())[:-2]) 32 | backbone.out_channels = out_channels 33 | 34 | return backbone 35 | 36 | 37 | def get_resnet_fpn_backbone(backbone_name: str, pretrained: bool = True, trainable_layers: int = 5): 38 | """ 39 | Returns a resnet backbone with fpn pretrained on ImageNet. 40 | """ 41 | backbone = resnet_fpn_backbone(backbone_name=backbone_name, 42 | pretrained=pretrained, 43 | trainable_layers=trainable_layers) 44 | 45 | backbone.out_channels = 256 46 | return backbone 47 | 48 | 49 | def resnet_fpn_backbone(backbone_name: str, 50 | pretrained: bool, 51 | norm_layer=misc_nn_ops.FrozenBatchNorm2d, 52 | trainable_layers: int = 3, 53 | returned_layers=None, 54 | extra_blocks=None 55 | ): 56 | # Slight adaptation from the original pytorch vision package 57 | # Changes: Removed extra_blocks parameter - This parameter invokes LastLevelMaxPool(), which I don't need 58 | """ 59 | Constructs a specified ResNet backbone with FPN on top. Freezes the specified number of layers in the backbone. 60 | 61 | Arguments: 62 | backbone_name (string): resnet architecture. Possible values are 'ResNet', 'resnet18', 'resnet34', 'resnet50', 63 | 'resnet101', 'resnet152', 'resnext50_32x4d', 'resnext101_32x8d', 'wide_resnet50_2', 'wide_resnet101_2' 64 | norm_layer (torchvision.ops): it is recommended to use the default value. For details visit: 65 | (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267) 66 | pretrained (bool): If True, returns a model with backbone pre-trained on Imagenet 67 | trainable_layers (int): number of trainable (not frozen) resnet layers starting from final block. 68 | Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. 
69 | """ 70 | backbone = resnet.__dict__[backbone_name]( 71 | pretrained=pretrained, 72 | norm_layer=norm_layer) 73 | 74 | # select layers that wont be frozen 75 | assert trainable_layers <= 5 and trainable_layers >= 0 76 | layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers] 77 | # freeze layers only if pretrained backbone is used 78 | for name, parameter in backbone.named_parameters(): 79 | if all([not name.startswith(layer) for layer in layers_to_train]): 80 | parameter.requires_grad_(False) 81 | 82 | if returned_layers is None: 83 | returned_layers = [1, 2, 3, 4] 84 | assert min(returned_layers) > 0 and max(returned_layers) < 5 85 | return_layers = {f'layer{k}': str(v) for v, k in enumerate(returned_layers)} 86 | 87 | in_channels_stage2 = backbone.inplanes // 8 88 | in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers] 89 | out_channels = 256 90 | return BackboneWithFPN(backbone, return_layers, in_channels_list, out_channels, extra_blocks=extra_blocks) 91 | 92 | 93 | class BackboneWithFPN(nn.Module): 94 | """ 95 | Adds a FPN on top of a model. 96 | Internally, it uses torchvision.models._utils.IntermediateLayerGetter to 97 | extract a submodel that returns the feature maps specified in return_layers. 98 | The same limitations of IntermediatLayerGetter apply here. 99 | Arguments: 100 | backbone (nn.Module) 101 | return_layers (Dict[name, new_name]): a dict containing the names 102 | of the modules for which the activations will be returned as 103 | the key of the dict, and the value of the dict is the name 104 | of the returned activation (which the user can specify). 105 | in_channels_list (List[int]): number of channels for each feature map 106 | that is returned, in the order they are present in the OrderedDict 107 | out_channels (int): number of channels in the FPN. 
108 | Attributes: 109 | out_channels (int): the number of channels in the FPN 110 | """ 111 | 112 | def __init__(self, backbone, return_layers, in_channels_list, out_channels, extra_blocks=None): 113 | super(BackboneWithFPN, self).__init__() 114 | 115 | self.body = IntermediateLayerGetter(backbone, return_layers=return_layers) 116 | self.fpn = FeaturePyramidNetwork( 117 | in_channels_list=in_channels_list, 118 | out_channels=out_channels, 119 | extra_blocks=extra_blocks, 120 | ) 121 | self.out_channels = out_channels 122 | 123 | def forward(self, x): 124 | x = self.body(x) 125 | x = self.fpn(x) 126 | return x 127 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/faster_RCNN.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | from itertools import chain 3 | from typing import Tuple, List 4 | 5 | import pytorch_lightning as pl 6 | import torch 7 | from torchvision.models.detection.faster_rcnn import FasterRCNN 8 | from torchvision.models.detection.rpn import AnchorGenerator 9 | from torchvision.ops import MultiScaleRoIAlign 10 | 11 | from metrics.enumerators import MethodAveragePrecision 12 | from metrics.pascal_voc_evaluator import get_pascalvoc_metrics 13 | from .backbone_resnet import get_resnet_backbone, get_resnet_fpn_backbone 14 | from .utils import from_dict_to_boundingbox 15 | 16 | 17 | def get_anchor_generator(anchor_size: Tuple[tuple] = None, aspect_ratios: Tuple[tuple] = None): 18 | """Returns the anchor generator.""" 19 | if anchor_size is None: 20 | anchor_size = ((16,), (32,), (64,), (128,)) 21 | if aspect_ratios is None: 22 | aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_size) 23 | 24 | anchor_generator = AnchorGenerator(sizes=anchor_size, 25 | aspect_ratios=aspect_ratios) 26 | return anchor_generator 27 | 28 | 29 | def get_roi_pool(featmap_names: List[str] = None, output_size: int = 7, sampling_ratio: int = 2): 30 | """Returns the ROI Pooling""" 31 | if featmap_names is None: 32 | # default for resnet with FPN 33 | featmap_names = ['0', '1', '2', '3'] 34 | 35 | roi_pooler = MultiScaleRoIAlign(featmap_names=featmap_names, 36 | output_size=output_size, 37 | sampling_ratio=sampling_ratio) 38 | 39 | return roi_pooler 40 | 41 | 42 | def get_fasterRCNN(backbone: torch.nn.Module, 43 | anchor_generator: AnchorGenerator, 44 | roi_pooler: MultiScaleRoIAlign, 45 | num_classes: int, 46 | image_mean: List[float] = [0.485, 0.456, 0.406], 47 | image_std: List[float] = [0.229, 0.224, 0.225], 48 | min_size: int = 512, 49 | max_size: int = 1024, 50 | **kwargs 51 | ): 52 | """Returns the Faster-RCNN model. 
Default normalization: ImageNet""" 53 | model = FasterRCNN(backbone=backbone, 54 | rpn_anchor_generator=anchor_generator, 55 | box_roi_pool=roi_pooler, 56 | num_classes=num_classes, 57 | image_mean=image_mean, # ImageNet 58 | image_std=image_std, # ImageNet 59 | min_size=min_size, 60 | max_size=max_size, 61 | **kwargs 62 | ) 63 | model.num_classes = num_classes 64 | model.image_mean = image_mean 65 | model.image_std = image_std 66 | model.min_size = min_size 67 | model.max_size = max_size 68 | 69 | return model 70 | 71 | 72 | def get_fasterRCNN_resnet(num_classes: int, 73 | backbone_name: str, 74 | anchor_size: List[float], 75 | aspect_ratios: List[float], 76 | fpn: bool = True, 77 | min_size: int = 512, 78 | max_size: int = 1024, 79 | **kwargs 80 | ): 81 | """Returns the Faster-RCNN model with resnet backbone with and without fpn.""" 82 | 83 | # Backbone 84 | if fpn: 85 | backbone = get_resnet_fpn_backbone(backbone_name=backbone_name) 86 | else: 87 | backbone = get_resnet_backbone(backbone_name=backbone_name) 88 | 89 | # Anchors 90 | anchor_size = anchor_size 91 | aspect_ratios = aspect_ratios * len(anchor_size) 92 | anchor_generator = get_anchor_generator(anchor_size=anchor_size, aspect_ratios=aspect_ratios) 93 | 94 | # ROI Pool 95 | with torch.no_grad(): 96 | backbone.eval() 97 | random_input = torch.rand(size=(1, 3, 512, 512)) 98 | features = backbone(random_input) 99 | 100 | if isinstance(features, torch.Tensor): 101 | 102 | features = OrderedDict([('0', features)]) 103 | 104 | featmap_names = [key for key in features.keys() if key.isnumeric()] 105 | 106 | roi_pool = get_roi_pool(featmap_names=featmap_names) 107 | 108 | # Model 109 | return get_fasterRCNN(backbone=backbone, 110 | anchor_generator=anchor_generator, 111 | roi_pooler=roi_pool, 112 | num_classes=num_classes, 113 | min_size=min_size, 114 | max_size=max_size, 115 | **kwargs) 116 | 117 | 118 | class FasterRCNN_lightning(pl.LightningModule): 119 | def __init__(self, 120 | model: torch.nn.Module, 121 | lr: float = 0.0001, 122 | iou_threshold: float = 0.5 123 | ): 124 | super().__init__() 125 | 126 | # Model 127 | self.model = model 128 | 129 | # Classes (background inclusive) 130 | self.num_classes = self.model.num_classes 131 | 132 | # Learning rate 133 | self.lr = lr 134 | 135 | # IoU threshold 136 | self.iou_threshold = iou_threshold 137 | 138 | # Transformation parameters 139 | self.mean = model.image_mean 140 | self.std = model.image_std 141 | self.min_size = model.min_size 142 | self.max_size = model.max_size 143 | 144 | # Save hyperparameters 145 | self.save_hyperparameters() 146 | 147 | def forward(self, x): 148 | self.model.eval() 149 | return self.model(x) 150 | 151 | def training_step(self, batch, batch_idx): 152 | # Batch 153 | x, y, x_name, y_name = batch # tuple unpacking 154 | 155 | loss_dict = self.model(x, y) 156 | loss = sum(loss for loss in loss_dict.values()) 157 | 158 | self.log_dict(loss_dict) 159 | return loss 160 | 161 | def validation_step(self, batch, batch_idx): 162 | # Batch 163 | x, y, x_name, y_name = batch 164 | 165 | # Inference 166 | preds = self.model(x) 167 | 168 | gt_boxes = [from_dict_to_boundingbox(target, name=name, groundtruth=True) for target, name in zip(y, x_name)] 169 | gt_boxes = list(chain(*gt_boxes)) 170 | 171 | pred_boxes = [from_dict_to_boundingbox(pred, name=name, groundtruth=False) for pred, name in zip(preds, x_name)] 172 | pred_boxes = list(chain(*pred_boxes)) 173 | 174 | return {'pred_boxes': pred_boxes, 'gt_boxes': gt_boxes} 175 | 176 | def validation_epoch_end(self, 
outs): 177 | gt_boxes = [out['gt_boxes'] for out in outs] 178 | gt_boxes = list(chain(*gt_boxes)) 179 | pred_boxes = [out['pred_boxes'] for out in outs] 180 | pred_boxes = list(chain(*pred_boxes)) 181 | 182 | metric = get_pascalvoc_metrics(gt_boxes=gt_boxes, 183 | det_boxes=pred_boxes, 184 | iou_threshold=self.iou_threshold, 185 | method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 186 | generate_table=True) 187 | 188 | per_class, mAP = metric['per_class'], metric['mAP'] 189 | self.log('Validation_mAP', mAP) 190 | 191 | for key, value in per_class.items(): 192 | self.log(f'Validation_AP_{key}', value['AP']) 193 | 194 | def test_step(self, batch, batch_idx): 195 | # Batch 196 | x, y, x_name, y_name = batch 197 | 198 | # Inference 199 | preds = self.model(x) 200 | 201 | gt_boxes = [from_dict_to_boundingbox(target, name=name, groundtruth=True) for target, name in zip(y, x_name)] 202 | gt_boxes = list(chain(*gt_boxes)) 203 | 204 | pred_boxes = [from_dict_to_boundingbox(pred, name=name, groundtruth=False) for pred, name in zip(preds, x_name)] 205 | pred_boxes = list(chain(*pred_boxes)) 206 | 207 | return {'pred_boxes': pred_boxes, 'gt_boxes': gt_boxes} 208 | 209 | def test_epoch_end(self, outs): 210 | gt_boxes = [out['gt_boxes'] for out in outs] 211 | gt_boxes = list(chain(*gt_boxes)) 212 | pred_boxes = [out['pred_boxes'] for out in outs] 213 | pred_boxes = list(chain(*pred_boxes)) 214 | 215 | metric = get_pascalvoc_metrics(gt_boxes=gt_boxes, 216 | det_boxes=pred_boxes, 217 | iou_threshold=self.iou_threshold, 218 | method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 219 | generate_table=True) 220 | 221 | per_class, mAP = metric['per_class'], metric['mAP'] 222 | self.log('Test_mAP', mAP) 223 | 224 | for key, value in per_class.items(): 225 | self.log(f'Test_AP_{key}', value['AP']) 226 | 227 | def configure_optimizers(self): 228 | optimizer = torch.optim.SGD(self.model.parameters(), 229 | lr=self.lr, 230 | momentum=0.9, 231 | weight_decay=0.005) 232 | lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 233 | mode='max', 234 | factor=0.75, 235 | patience=30, 236 | min_lr=0) 237 | return {'optimizer': optimizer, 'lr_scheduler': lr_scheduler, 'monitor': 'Validation_mAP'} 238 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/transformations.py: -------------------------------------------------------------------------------- 1 | from functools import partial 2 | from typing import List, Callable 3 | 4 | import albumentations as A 5 | import numpy as np 6 | import torch 7 | from sklearn.externals._pilutil import bytescale 8 | from torchvision.ops import nms 9 | 10 | 11 | def normalize_01(inp: np.ndarray): 12 | """Squash image input to the value range [0, 1] (no clipping)""" 13 | inp_out = (inp - np.min(inp)) / np.ptp(inp) 14 | return inp_out 15 | 16 | 17 | def normalize(inp: np.ndarray, mean: float, std: float): 18 | """Normalize based on mean and standard deviation.""" 19 | inp_out = (inp - mean) / std 20 | return inp_out 21 | 22 | 23 | def re_normalize(inp: np.ndarray, 24 | low: int = 0, 25 | high: int = 255 26 | ): 27 | """Normalize the data to a certain range. Default: [0-255]""" 28 | inp_out = bytescale(inp, low=low, high=high) 29 | return inp_out 30 | 31 | 32 | def clip_bbs(inp: np.ndarray, 33 | bbs: np.ndarray): 34 | """ 35 | If the bounding boxes exceed one dimension, they are clipped to the dim's maximum. 36 | Bounding boxes are expected to be in xyxy format. 
37 | Example: x_value=224 but x_shape=200 -> x1=199 38 | """ 39 | 40 | def clip(value: int, max: int): 41 | 42 | if value >= max - 1: 43 | value = max - 1 44 | elif value <= 0: 45 | value = 0 46 | 47 | return value 48 | 49 | output = [] 50 | for bb in bbs: 51 | x1, y1, x2, y2 = tuple(bb) 52 | x_shape = inp.shape[1] 53 | y_shape = inp.shape[0] 54 | 55 | x1 = clip(x1, x_shape) 56 | y1 = clip(y1, y_shape) 57 | x2 = clip(x2, x_shape) 58 | y2 = clip(y2, y_shape) 59 | 60 | output.append([x1, y1, x2, y2]) 61 | 62 | return np.array(output) 63 | 64 | 65 | def map_class_to_int(labels: List[str], mapping: dict): 66 | """Maps a string to an integer.""" 67 | labels = np.array(labels) 68 | dummy = np.empty_like(labels) 69 | for key, value in mapping.items(): 70 | dummy[labels == key] = value 71 | 72 | return dummy.astype(np.uint8) 73 | 74 | 75 | def apply_nms(target: dict, iou_threshold): 76 | """Non-maximum Suppression""" 77 | boxes = torch.tensor(target['boxes']) 78 | labels = torch.tensor(target['labels']) 79 | scores = torch.tensor(target['scores']) 80 | 81 | if boxes.size()[0] > 0: 82 | mask = nms(boxes, scores, iou_threshold=iou_threshold) 83 | mask = (np.array(mask),) 84 | 85 | target['boxes'] = np.asarray(boxes)[mask] 86 | target['labels'] = np.asarray(labels)[mask] 87 | target['scores'] = np.asarray(scores)[mask] 88 | 89 | return target 90 | 91 | 92 | def apply_score_threshold(target: dict, score_threshold): 93 | """Removes bounding box predictions with low scores.""" 94 | boxes = target['boxes'] 95 | labels = target['labels'] 96 | scores = target['scores'] 97 | 98 | mask = np.where(scores > score_threshold) 99 | target['boxes'] = boxes[mask] 100 | target['labels'] = labels[mask] 101 | target['scores'] = scores[mask] 102 | 103 | return target 104 | 105 | 106 | class Repr: 107 | """Evaluatable string representation of an object""" 108 | 109 | def __repr__(self): return f'{self.__class__.__name__}: {self.__dict__}' 110 | 111 | 112 | class FunctionWrapperSingle(Repr): 113 | """A function wrapper that returns a partial for input only.""" 114 | 115 | def __init__(self, function: Callable, *args, **kwargs): 116 | self.function = partial(function, *args, **kwargs) 117 | 118 | def __call__(self, inp: np.ndarray): return self.function(inp) 119 | 120 | 121 | class FunctionWrapperDouble(Repr): 122 | """A function wrapper that returns a partial for an input-target pair.""" 123 | 124 | def __init__(self, function: Callable, input: bool = True, target: bool = False, *args, **kwargs): 125 | self.function = partial(function, *args, **kwargs) 126 | self.input = input 127 | self.target = target 128 | 129 | def __call__(self, inp: np.ndarray, tar: dict): 130 | if self.input: inp = self.function(inp) 131 | if self.target: tar = self.function(tar) 132 | return inp, tar 133 | 134 | 135 | class Compose: 136 | """Baseclass - composes several transforms together.""" 137 | 138 | def __init__(self, transforms: List[Callable]): 139 | self.transforms = transforms 140 | 141 | def __repr__(self): return str([transform for transform in self.transforms]) 142 | 143 | 144 | class ComposeDouble(Compose): 145 | """Composes transforms for input-target pairs.""" 146 | 147 | def __call__(self, inp: np.ndarray, target: dict): 148 | for t in self.transforms: 149 | inp, target = t(inp, target) 150 | return inp, target 151 | 152 | 153 | class ComposeSingle(Compose): 154 | """Composes transforms for input only.""" 155 | 156 | def __call__(self, inp: np.ndarray): 157 | for t in self.transforms: 158 | inp = t(inp) 159 | return inp 160 | 161 
| 162 | class AlbumentationWrapper(Repr): 163 | """ 164 | A wrapper for the albumentation package. 165 | Bounding boxes are expected to be in xyxy format (pascal_voc). 166 | Bounding boxes cannot be larger than the spatial image's dimensions. 167 | Use Clip() if your bounding boxes are outside of the image, before using this wrapper. 168 | """ 169 | 170 | def __init__(self, albumentation: Callable, format: str = 'pascal_voc'): 171 | self.albumentation = albumentation 172 | self.format = format 173 | 174 | def __call__(self, inp: np.ndarray, tar: dict): 175 | # input, target 176 | transform = A.Compose([ 177 | self.albumentation 178 | ], bbox_params=A.BboxParams(format=self.format, label_fields=['class_labels'])) 179 | 180 | out_dict = transform(image=inp, bboxes=tar['boxes'], class_labels=tar['labels']) 181 | 182 | input_out = np.array(out_dict['image']) 183 | boxes = np.array(out_dict['bboxes']) 184 | labels = np.array(out_dict['class_labels']) 185 | 186 | tar['boxes'] = boxes 187 | tar['labels'] = labels 188 | 189 | return input_out, tar 190 | 191 | 192 | class Clip(Repr): 193 | """ 194 | If the bounding boxes exceed one dimension, they are clipped to the dim's maximum. 195 | Bounding boxes are expected to be in xyxy format. 196 | Example: x_value=224 but x_shape=200 -> x1=199 197 | """ 198 | 199 | def __call__(self, inp: np.ndarray, tar: dict): 200 | new_boxes = clip_bbs(inp=inp, bbs=tar['boxes']) 201 | tar['boxes'] = new_boxes 202 | 203 | return inp, tar 204 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/detection/utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pathlib 4 | import cv2 5 | 6 | import importlib_metadata 7 | import numpy as np 8 | import pandas as pd 9 | import torch 10 | from IPython import get_ipython 11 | from neptunecontrib.api import log_table 12 | from torchvision.models.detection.transform import GeneralizedRCNNTransform 13 | from torchvision.ops import box_convert, box_area 14 | 15 | from metrics.bounding_box import BoundingBox 16 | from metrics.enumerators import BBFormat, BBType 17 | 18 | 19 | def get_filenames_of_path(path: pathlib.Path, ext: str = '*'): 20 | """ 21 | Returns a list of files in a directory/path. Uses pathlib. 22 | """ 23 | filenames = [file for file in path.glob(ext) if file.is_file()] 24 | assert len(filenames) > 0, f'No files found in path: {path}' 25 | return filenames 26 | 27 | 28 | def read_json(path: pathlib.Path): 29 | with open(str(path), 'r') as fp: # fp is the file pointer 30 | file = json.loads(s=fp.read()) 31 | 32 | return file 33 | 34 | 35 | def save_json(obj, path: pathlib.Path): 36 | with open(path, 'w') as fp: # fp is the file pointer 37 | json.dump(obj=obj, fp=fp, indent=4, sort_keys=False) 38 | 39 | 40 | def collate_double(batch): 41 | """ 42 | collate function for the ObjectDetectionDataSet. 43 | Only used by the dataloader. 44 | """ 45 | x = [sample['x'] for sample in batch] 46 | y = [sample['y'] for sample in batch] 47 | x_name = [sample['x_name'] for sample in batch] 48 | y_name = [sample['y_name'] for sample in batch] 49 | return x, y, x_name, y_name 50 | 51 | 52 | def collate_single(batch): 53 | """ 54 | collate function for the ObjectDetectionDataSetSingle. 55 | Only used by the dataloader. 
56 | """ 57 | x = [sample['x'] for sample in batch] 58 | x_name = [sample['x_name'] for sample in batch] 59 | return x, x_name 60 | 61 | 62 | def color_mapping_func(labels, mapping): 63 | """Maps an label (integer or string) to a color""" 64 | color_list = [mapping[value] for value in labels] 65 | return color_list 66 | 67 | 68 | def enable_gui_qt(): 69 | """Performs the magic command %gui qt""" 70 | ipython = get_ipython() 71 | ipython.magic('gui qt') 72 | 73 | 74 | def stats_dataset(dataset, rcnn_transform: GeneralizedRCNNTransform = False): 75 | """ 76 | Iterates over the dataset and returns some stats. 77 | Can be useful to pick the right anchor box sizes. 78 | """ 79 | stats = { 80 | 'image_height': [], 81 | 'image_width': [], 82 | 'image_mean': [], 83 | 'image_std': [], 84 | 'boxes_height': [], 85 | 'boxes_width': [], 86 | 'boxes_num': [], 87 | 'boxes_area': [] 88 | } 89 | for batch in dataset: 90 | # Batch 91 | x, y, x_name, y_name = batch['x'], batch['y'], batch['x_name'], batch['y_name'] 92 | 93 | # Transform 94 | if rcnn_transform: 95 | x, y = rcnn_transform([x], [y]) 96 | x, y = x.tensors, y[0] 97 | 98 | # Image 99 | stats['image_height'].append(x.shape[-2]) 100 | stats['image_width'].append(x.shape[-1]) 101 | stats['image_mean'].append(x.mean().item()) 102 | stats['image_std'].append(x.std().item()) 103 | 104 | # Target 105 | wh = box_convert(y['boxes'], 'xyxy', 'xywh')[:, -2:] 106 | stats['boxes_height'].append(wh[:, -2]) 107 | stats['boxes_width'].append(wh[:, -1]) 108 | stats['boxes_num'].append(len(wh)) 109 | stats['boxes_area'].append(box_area(y['boxes'])) 110 | 111 | stats['image_height'] = torch.tensor(stats['image_height'], dtype=torch.float) 112 | stats['image_width'] = torch.tensor(stats['image_width'], dtype=torch.float) 113 | stats['image_mean'] = torch.tensor(stats['image_mean'], dtype=torch.float) 114 | stats['image_std'] = torch.tensor(stats['image_std'], dtype=torch.float) 115 | stats['boxes_height'] = torch.cat(stats['boxes_height']) 116 | stats['boxes_width'] = torch.cat(stats['boxes_width']) 117 | stats['boxes_area'] = torch.cat(stats['boxes_area']) 118 | stats['boxes_num'] = torch.tensor(stats['boxes_num'], dtype=torch.float) 119 | 120 | return stats 121 | 122 | 123 | def from_file_to_boundingbox(file_name: pathlib.Path, groundtruth: bool = True): 124 | """Returns a list of BoundingBox objects from groundtruth or prediction.""" 125 | file = torch.load(file_name) 126 | labels = file['labels'] 127 | boxes = file['boxes'] 128 | scores = file['scores'] if not groundtruth else [None] * len(boxes) 129 | 130 | gt = BBType.GROUND_TRUTH if groundtruth else BBType.DETECTED 131 | 132 | return [BoundingBox(image_name=file_name.stem, 133 | class_id=l, 134 | coordinates=tuple(bb), 135 | format=BBFormat.XYX2Y2, 136 | bb_type=gt, 137 | confidence=s) for bb, l, s in zip(boxes, labels, scores)] 138 | 139 | 140 | def from_dict_to_boundingbox(file: dict, name: str, groundtruth: bool = True): 141 | """Returns list of BoundingBox objects from groundtruth or prediction.""" 142 | labels = file['labels'] 143 | boxes = file['boxes'] 144 | scores = np.array(file['scores'].cpu()) if not groundtruth else [None] * len(boxes) 145 | 146 | gt = BBType.GROUND_TRUTH if groundtruth else BBType.DETECTED 147 | 148 | return [BoundingBox(image_name=name, 149 | class_id=int(l), 150 | coordinates=tuple(bb), 151 | format=BBFormat.XYX2Y2, 152 | bb_type=gt, 153 | confidence=s) for bb, l, s in zip(boxes, labels, scores)] 154 | 155 | 156 | def log_packages_neptune(neptune_logger): 157 | """Uses the 
neptunecontrib.api to log the packages of the current python env.""" 158 | dists = importlib_metadata.distributions() 159 | packages = {idx: (dist.metadata['Name'], dist.version) for idx, dist in enumerate(dists)} 160 | 161 | packages_df = pd.DataFrame.from_dict(packages, orient='index', columns=['package', 'version']) 162 | 163 | log_table(name='packages', table=packages_df, experiment=neptune_logger.experiment) 164 | 165 | 166 | def log_mapping_neptune(mapping: dict, neptune_logger): 167 | """Uses the neptunecontrib.api to log a class mapping.""" 168 | mapping_df = pd.DataFrame.from_dict(mapping, orient='index', columns=['class_value']) 169 | log_table(name='mapping', table=mapping_df, experiment=neptune_logger.experiment) 170 | 171 | 172 | def log_model_neptune(checkpoint_path: pathlib.Path, 173 | save_directory: pathlib.Path, 174 | name: str, 175 | neptune_logger): 176 | """Saves the model to disk, uploads it to neptune and removes it again.""" 177 | checkpoint = torch.load(checkpoint_path) 178 | model = checkpoint['hyper_parameters']['model'] 179 | torch.save(model.state_dict(), save_directory / name) 180 | neptune_logger.experiment.set_property('checkpoint_name', checkpoint_path.name) 181 | neptune_logger.experiment.log_artifact(str(save_directory / name)) 182 | if os.path.isfile(save_directory / name): 183 | os.remove(save_directory / name) 184 | 185 | 186 | def log_checkpoint_neptune(checkpoint_path: pathlib.Path, neptune_logger): 187 | neptune_logger.experiment.set_property('checkpoint_name', checkpoint_path.name) 188 | neptune_logger.experiment.log_artifact(str(checkpoint_path)) 189 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/README.md: -------------------------------------------------------------------------------- 1 | To compute the metrics of an object detection model, 2 | one can use this [opensource toolbox for object detection metrics](https://github.com/rafaelpadilla/review_object_detection_metrics). 3 | 4 | This work was published in the 5 | [Journal Electronics - Special Issue Deep Learning Based Object Detection](https://www.mdpi.com/journal/electronics/special_issues/learning_based_detection). 6 | 7 | You can download the paper [here](https://github.com/rafaelpadilla/review_object_detection_metrics/blob/main/published_paper.pdf). 8 | ``` 9 | @Article{electronics10030279, 10 | AUTHOR = {Padilla, Rafael and Passos, Wesley L. and Dias, Thadeu L. B. and Netto, Sergio L. and da Silva, Eduardo A. B.}, 11 | TITLE = {A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit}, 12 | JOURNAL = {Electronics}, 13 | VOLUME = {10}, 14 | YEAR = {2021}, 15 | NUMBER = {3}, 16 | ARTICLE-NUMBER = {279}, 17 | URL = {https://www.mdpi.com/2079-9292/10/3/279}, 18 | ISSN = {2079-9292}, 19 | DOI = {10.3390/electronics10030279} 20 | } 21 | ``` 22 | 23 | 24 | You can find these files in their repo, the code here is slightly adjusted. 
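A minimal usage sketch of the adjusted modules in this folder, based on how `get_pascalvoc_metrics` and `BoundingBox` are called elsewhere in this tutorial (see `pascal_voc_evaluator_test.py` and `detection/utils.py`); the image name, class id and box coordinates below are made-up illustrative values:

```python
from metrics.bounding_box import BoundingBox
from metrics.enumerators import BBFormat, BBType, MethodAveragePrecision
from metrics.pascal_voc_evaluator import get_pascalvoc_metrics

# One ground-truth box and one detection for the same (hypothetical) image, in xyxy format.
gt_boxes = [BoundingBox(image_name='img_0', class_id='balloon',
                        coordinates=(10, 20, 110, 220),
                        format=BBFormat.XYX2Y2, bb_type=BBType.GROUND_TRUTH,
                        confidence=None)]
det_boxes = [BoundingBox(image_name='img_0', class_id='balloon',
                         coordinates=(12, 25, 105, 210),
                         format=BBFormat.XYX2Y2, bb_type=BBType.DETECTED,
                         confidence=0.9)]

# Pascal VOC metrics (AP per class and mAP) at IoU threshold 0.5.
output = get_pascalvoc_metrics(gt_boxes=gt_boxes,
                               det_boxes=det_boxes,
                               iou_threshold=0.5,
                               method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION,
                               generate_table=False)
per_class, mAP = output['per_class'], output['mAP']
```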
25 | 26 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/faster-rcnn-tutorial/metrics/__init__.py -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/enumerators.py: -------------------------------------------------------------------------------- 1 | from enum import Enum 2 | 3 | 4 | class MethodAveragePrecision(Enum): 5 | """ 6 | Class representing if the coordinates are relative to the 7 | image size or are absolute values. 8 | Developed by: Rafael Padilla 9 | Last modification: Apr 28 2018 10 | """ 11 | EVERY_POINT_INTERPOLATION = 1 12 | ELEVEN_POINT_INTERPOLATION = 2 13 | 14 | 15 | class CoordinatesType(Enum): 16 | """ 17 | Class representing if the coordinates are relative to the 18 | image size or are absolute values. 19 | Developed by: Rafael Padilla 20 | Last modification: Apr 28 2018 21 | """ 22 | RELATIVE = 1 23 | ABSOLUTE = 2 24 | 25 | 26 | class BBType(Enum): 27 | """ 28 | Class representing if the bounding box is groundtruth or not. 29 | """ 30 | GROUND_TRUTH = 1 31 | DETECTED = 2 32 | 33 | 34 | class BBFormat(Enum): 35 | """ 36 | Class representing the format of a bounding box. 37 | """ 38 | XYWH = 1 39 | XYX2Y2 = 2 40 | PASCAL_XML = 3 41 | YOLO = 4 42 | 43 | 44 | class FileFormat(Enum): 45 | ABSOLUTE_TEXT = 1 46 | PASCAL = 2 47 | LABEL_ME = 3 48 | COCO = 4 49 | CVAT = 5 50 | YOLO = 6 51 | OPENIMAGE = 7 52 | IMAGENET = 8 53 | UNKNOWN = 9 -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/general_utils.py: -------------------------------------------------------------------------------- 1 | import fnmatch 2 | import os 3 | 4 | import cv2 5 | import matplotlib.pyplot as plt 6 | import numpy as np 7 | from PyQt5 import QtCore, QtGui 8 | 9 | from metrics.enumerators import BBFormat 10 | 11 | 12 | def get_files_recursively(directory, extension="*"): 13 | if '.' not in extension: 14 | extension = '*.' + extension 15 | files = [ 16 | os.path.join(dirpath, f) for dirpath, dirnames, files in os.walk(directory) 17 | for f in fnmatch.filter(files, extension) 18 | ] 19 | return files 20 | 21 | 22 | def convert_box_xywh2xyxy(box): 23 | arr = box.copy() 24 | arr[:, 2] += arr[:, 0] 25 | arr[:, 3] += arr[:, 1] 26 | return arr 27 | 28 | 29 | def convert_box_xyxy2xywh(box): 30 | arr = box.copy() 31 | arr[:, 2] -= arr[:, 0] 32 | arr[:, 3] -= arr[:, 1] 33 | return arr 34 | 35 | 36 | # size => (width, height) of the image 37 | # box => (X1, X2, Y1, Y2) of the bounding box 38 | def convert_to_relative_values(size, box): 39 | dw = 1. / (size[0]) 40 | dh = 1. 
/ (size[1]) 41 | cx = (box[1] + box[0]) / 2.0 42 | cy = (box[3] + box[2]) / 2.0 43 | w = box[1] - box[0] 44 | h = box[3] - box[2] 45 | x = cx * dw 46 | y = cy * dh 47 | w = w * dw 48 | h = h * dh 49 | # YOLO's format 50 | # x,y => (bounding_box_center)/width_of_the_image 51 | # w => bounding_box_width / width_of_the_image 52 | # h => bounding_box_height / height_of_the_image 53 | return (x, y, w, h) 54 | 55 | 56 | # size => (width, height) of the image 57 | # box => (centerX, centerY, w, h) of the bounding box relative to the image 58 | def convert_to_absolute_values(size, box): 59 | w_box = size[0] * box[2] 60 | h_box = size[1] * box[3] 61 | 62 | x1 = (float(box[0]) * float(size[0])) - (w_box / 2) 63 | y1 = (float(box[1]) * float(size[1])) - (h_box / 2) 64 | x2 = x1 + w_box 65 | y2 = y1 + h_box 66 | return (round(x1), round(y1), round(x2), round(y2)) 67 | 68 | 69 | def add_bb_into_image(image, bb, color=(255, 0, 0), thickness=2, label=None): 70 | r = int(color[0]) 71 | g = int(color[1]) 72 | b = int(color[2]) 73 | 74 | font = cv2.FONT_HERSHEY_SIMPLEX 75 | fontScale = 0.5 76 | fontThickness = 1 77 | 78 | x1, y1, x2, y2 = bb.get_absolute_bounding_box(BBFormat.XYX2Y2) 79 | x1 = int(x1) 80 | y1 = int(y1) 81 | x2 = int(x2) 82 | y2 = int(y2) 83 | cv2.rectangle(image, (x1, y1), (x2, y2), (b, g, r), thickness) 84 | # Add label 85 | if label is not None: 86 | # Get size of the text box 87 | (tw, th) = cv2.getTextSize(label, font, fontScale, fontThickness)[0] 88 | # Top-left coord of the textbox 89 | (xin_bb, yin_bb) = (x1 + thickness, y1 - th + int(12.5 * fontScale)) 90 | # Checking position of the text top-left (outside or inside the bb) 91 | if yin_bb - th <= 0: # if outside the image 92 | yin_bb = y1 + th # put it inside the bb 93 | r_Xin = x1 - int(thickness / 2) 94 | r_Yin = y1 - th - int(thickness / 2) 95 | # Draw filled rectangle to put the text in it 96 | cv2.rectangle(image, (r_Xin, r_Yin - thickness), 97 | (r_Xin + tw + thickness * 3, r_Yin + th + int(12.5 * fontScale)), (b, g, r), 98 | -1) 99 | cv2.putText(image, label, (xin_bb, yin_bb), font, fontScale, (0, 0, 0), fontThickness, 100 | cv2.LINE_AA) 101 | return image 102 | 103 | 104 | def remove_file_extension(filename): 105 | return os.path.join(os.path.dirname(filename), os.path.splitext(filename)[0]) 106 | 107 | 108 | def get_files_dir(directory, extensions=['*']): 109 | ret = [] 110 | for extension in extensions: 111 | if extension == '*': 112 | ret += [f for f in os.listdir(directory)] 113 | continue 114 | elif extension is None: 115 | # accepts all extensions 116 | extension = '' 117 | elif '.' 
not in extension: 118 | extension = f'.{extension}' 119 | ret += [f for f in os.listdir(directory) if f.endswith(extension)] 120 | return ret 121 | 122 | 123 | def remove_file_extension(filename): 124 | return os.path.join(os.path.dirname(filename), os.path.splitext(filename)[0]) 125 | 126 | 127 | def image_to_pixmap(image): 128 | image = image.astype(np.uint8) 129 | if image.shape[2] == 4: 130 | qformat = QtGui.QImage.Format_RGBA8888 131 | else: 132 | qformat = QtGui.QImage.Format_RGB888 133 | 134 | image = QtGui.QImage(image.data, image.shape[1], image.shape[0], image.strides[0], qformat) 135 | # image= image.rgbSwapped() 136 | return QtGui.QPixmap(image) 137 | 138 | 139 | def show_image_in_qt_component(image, label_component): 140 | pix = image_to_pixmap((image).astype(np.uint8)) 141 | label_component.setPixmap(pix) 142 | label_component.setAlignment(QtCore.Qt.AlignCenter) 143 | 144 | 145 | def get_files_recursively(directory, extension="*"): 146 | if '.' not in extension: 147 | extension = '*.' + extension 148 | files = [ 149 | os.path.join(dirpath, f) for dirpath, dirnames, files in os.walk(directory) 150 | for f in fnmatch.filter(files, extension) 151 | ] 152 | return files 153 | 154 | 155 | def is_str_int(s): 156 | if s[0] in ('-', '+'): 157 | return s[1:].isdigit() 158 | return s.isdigit() 159 | 160 | 161 | def get_file_name_only(file_path): 162 | if file_path is None: 163 | return '' 164 | return os.path.splitext(os.path.basename(file_path))[0] 165 | 166 | 167 | def find_file(directory, file_name, match_extension=True): 168 | if os.path.isdir(directory) is False: 169 | return None 170 | for dirpath, dirnames, files in os.walk(directory): 171 | for f in files: 172 | f1 = os.path.basename(f) 173 | f2 = file_name 174 | if not match_extension: 175 | f1 = os.path.splitext(f1)[0] 176 | f2 = os.path.splitext(f2)[0] 177 | if f1 == f2: 178 | return os.path.join(dirpath, os.path.basename(f)) 179 | return None 180 | 181 | 182 | def get_image_resolution(image_file): 183 | if image_file is None or not os.path.isfile(image_file): 184 | print(f'Warning: Path {image_file} not found.') 185 | return None 186 | img = cv2.imread(image_file) 187 | if img is None: 188 | print(f'Warning: Error loading the image {image_file}.') 189 | return None 190 | h, w, _ = img.shape 191 | return {'height': h, 'width': w} 192 | 193 | 194 | def draw_bb_into_image(image, boundingBox, color, thickness, label=None): 195 | if isinstance(image, str): 196 | image = cv2.imread(image) 197 | 198 | r = int(color[0]) 199 | g = int(color[1]) 200 | b = int(color[2]) 201 | 202 | font = cv2.FONT_HERSHEY_SIMPLEX 203 | fontScale = 0.5 204 | fontThickness = 1 205 | 206 | xIn = boundingBox[0] 207 | yIn = boundingBox[1] 208 | cv2.rectangle(image, (boundingBox[0], boundingBox[1]), (boundingBox[2], boundingBox[3]), 209 | (b, g, r), thickness) 210 | # Add label 211 | if label is not None: 212 | # Get size of the text box 213 | (tw, th) = cv2.getTextSize(label, font, fontScale, fontThickness)[0] 214 | # Top-left coord of the textbox 215 | (xin_bb, yin_bb) = (xIn + thickness, yIn - th + int(12.5 * fontScale)) 216 | # Checking position of the text top-left (outside or inside the bb) 217 | if yin_bb - th <= 0: # if outside the image 218 | yin_bb = yIn + th # put it inside the bb 219 | r_Xin = xIn - int(thickness / 2) 220 | r_Yin = yin_bb - th - int(thickness / 2) 221 | # Draw filled rectangle to put the text in it 222 | cv2.rectangle(image, (r_Xin, r_Yin - thickness), 223 | (r_Xin + tw + thickness * 3, r_Yin + th + int(12.5 * fontScale)), (b, 
g, r), 224 | -1) 225 | cv2.putText(image, label, (xin_bb, yin_bb), font, fontScale, (0, 0, 0), fontThickness, 226 | cv2.LINE_AA) 227 | return image 228 | 229 | 230 | def plot_bb_per_classes(dict_bbs_per_class, 231 | horizontally=True, 232 | rotation=0, 233 | show=False, 234 | extra_title=''): 235 | plt.close() 236 | if horizontally: 237 | ypos = np.arange(len(dict_bbs_per_class.keys())) 238 | plt.barh(ypos, dict_bbs_per_class.values()) 239 | plt.yticks(ypos, dict_bbs_per_class.keys()) 240 | plt.xlabel('amount of bounding boxes') 241 | plt.ylabel('classes') 242 | else: 243 | plt.bar(dict_bbs_per_class.keys(), dict_bbs_per_class.values()) 244 | plt.xlabel('classes') 245 | plt.ylabel('amount of bounding boxes') 246 | plt.xticks(rotation=rotation) 247 | title = f'Distribution of bounding boxes per class {extra_title}' 248 | plt.title(title) 249 | if show: 250 | # plt.tight_layout() 251 | # plt.show(aspect='auto') 252 | fig = plt.gcf() 253 | fig.canvas.set_window_title(title) 254 | fig.tight_layout() 255 | fig.show() 256 | return plt 257 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/metrics/pascal_voc_evaluator_test.py: -------------------------------------------------------------------------------- 1 | # Imports 2 | import pathlib 3 | from itertools import chain 4 | 5 | from metrics.enumerators import MethodAveragePrecision 6 | from metrics.pascal_voc_evaluator import get_pascalvoc_metrics 7 | from helper.utils import from_file_to_boundingbox 8 | from helper.utils import get_filenames_of_path 9 | 10 | # root directory 11 | root = pathlib.Path(r"C:\Users\johan\Desktop\Johannes\Heads") 12 | 13 | # input and target files 14 | inputs = get_filenames_of_path(root / 'input') 15 | targets = get_filenames_of_path(root / 'target') 16 | 17 | inputs.sort() 18 | targets.sort() 19 | 20 | # get the gt_boxes from disk 21 | gt_boxes = [from_file_to_boundingbox(file_name, groundtruth=True) for file_name in targets] 22 | # reduce list 23 | gt_boxes = list(chain(*gt_boxes)) 24 | # TODO: add predictions 25 | pred_boxes = [from_file_to_boundingbox(file_name, groundtruth=False) for file_name in targets] 26 | pred_boxes = list(chain(*pred_boxes)) 27 | 28 | output = get_pascalvoc_metrics(gt_boxes=gt_boxes, 29 | det_boxes=pred_boxes, 30 | iou_threshold=0.5, 31 | method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 32 | generate_table=True) 33 | 34 | per_class, mAP = output['per_class'], output['mAP'] 35 | head = per_class['head'] 36 | 37 | # %% another test:Difference between computing the mAP per batch and then taking the mean and computing it directly from all batches 38 | all_gt = [] 39 | all_pred = [] 40 | all_mAP = [] 41 | all_per_class = [] 42 | for batch in dataloader_valid: 43 | x, y, x_name, y_name = batch 44 | with torch.no_grad(): 45 | task.model.eval() 46 | preds = task.model(x) 47 | 48 | from itertools import chain 49 | from utils import from_dict_to_BoundingBox 50 | 51 | gt_boxes = list( 52 | chain(*[from_dict_to_BoundingBox(target, name=name, groundtruth=True) for target, name in zip(y, x_name)])) 53 | pred_boxes = list( 54 | chain(*[from_dict_to_BoundingBox(pred, name=name, groundtruth=False) for pred, name in zip(preds, x_name)])) 55 | 56 | all_gt.append(gt_boxes) 57 | all_pred.append(pred_boxes) 58 | 59 | from metrics.pascal_voc_evaluator import get_pascalvoc_metrics 60 | from metrics.enumerators import MethodAveragePrecision 61 | metric = get_pascalvoc_metrics(gt_boxes=gt_boxes, 62 | det_boxes=pred_boxes, 63 | iou_threshold=0.5, 64 | 
method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 65 | generate_table=False) 66 | 67 | per_class, mAP = metric['per_class'], metric['mAP'] 68 | all_per_class.append(per_class) 69 | all_mAP.append(mAP) 70 | 71 | all_tp = [pc[1]['total TP'] for pc in all_per_class] 72 | all_fp = [pc[1]['total FP'] for pc in all_per_class] 73 | 74 | 75 | all_gt = list(chain(*all_gt)) 76 | all_pred = list(chain(*all_pred)) 77 | 78 | m = get_pascalvoc_metrics(gt_boxes=all_gt, 79 | det_boxes=all_pred, 80 | iou_threshold=0.5, 81 | method=MethodAveragePrecision.EVERY_POINT_INTERPOLATION, 82 | generate_table=True) 83 | 84 | per_class, mAP = m['per_class'], m['mAP'] 85 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/setup.py: -------------------------------------------------------------------------------- 1 | import setuptools 2 | 3 | setuptools.setup( 4 | name="faster_rcnn_tutorial", 5 | version="0.0.1", 6 | author="ifding", 7 | author_email="", 8 | url="https://github.com/ifding/faster-rcnn-tutorial", 9 | packages=setuptools.find_packages(), 10 | include_package_data=True, 11 | classifiers=[ 12 | "Programming Language :: Python :: 3", 13 | "License :: OSI Approved :: MIT License", 14 | "Operating System :: OS Independent", 15 | ], 16 | python_requires='>=3.6', 17 | install_requires=[ 18 | 'numpy', 19 | 'scikit-image', 20 | 'sklearn', 21 | 'neptune-contrib', 22 | 'python-dotenv', 23 | 'albumentations==0.5.2', 24 | 'pytorch-lightning==1.3.5', 25 | 'torch==1.8.1', 26 | 'torchvision==0.9.1', 27 | 'torchsummary==1.5.1', 28 | 'torchmetrics==0.2.0' 29 | ] 30 | ) 31 | -------------------------------------------------------------------------------- /faster-rcnn-tutorial/train.py: -------------------------------------------------------------------------------- 1 | # imports 2 | import os 3 | import pathlib 4 | import json 5 | from dotenv import load_dotenv 6 | 7 | import albumentations as A 8 | import numpy as np 9 | from pytorch_lightning import Trainer 10 | from pytorch_lightning import seed_everything 11 | from pytorch_lightning.callbacks import ModelCheckpoint, LearningRateMonitor, EarlyStopping 12 | from pytorch_lightning.loggers.neptune import NeptuneLogger 13 | from torch.utils.data import DataLoader 14 | from torchvision.models.detection.transform import GeneralizedRCNNTransform 15 | 16 | from custom_dataset import ObjectDetectionDataSet 17 | from detection.faster_RCNN import FasterRCNN_lightning, get_fasterRCNN_resnet 18 | from detection.transformations import Clip, ComposeDouble, AlbumentationWrapper 19 | from detection.transformations import FunctionWrapperDouble, normalize_01 20 | from detection.utils import collate_double, stats_dataset 21 | from detection.utils import log_mapping_neptune, log_model_neptune, log_packages_neptune 22 | 23 | # hyper-parameters 24 | params = {'BATCH_SIZE': 2, 25 | 'OWNER': 'feid', # your username in neptune 26 | 'SAVE_DIR': None, # checkpoints will be saved to cwd 27 | 'LOG_MODEL': False, # whether to log the model to neptune after training 28 | 'GPU': 1, # set to None for cpu training 29 | 'LR': 0.001, 30 | 'PRECISION': 32, 31 | 'CLASSES': 2, 32 | 'SEED': 42, 33 | 'PROJECT': 'Balloon', 34 | 'EXPERIMENT': 'balloon', 35 | 'MAXEPOCHS': 100, 36 | 'PATIENCE': 50, 37 | 'BACKBONE': 'resnet34', 38 | 'FPN': False, 39 | 'ANCHOR_SIZE': ((32, 64, 128, 256, 512),), 40 | 'ASPECT_RATIOS': ((0.5, 1.0, 2.0),), 41 | 'MIN_SIZE': 1024, 42 | 'MAX_SIZE': 1024, 43 | 'IMG_MEAN': [0.485, 0.456, 0.406], 44 | 'IMG_STD': [0.229, 0.224, 0.225], 
45 | 'IOU_THRESHOLD': 0.5 46 | } 47 | 48 | 49 | def main(): 50 | # api key, https://github.com/neptune-ai/neptune-client 51 | load_dotenv() # read environment variables 52 | api_key = os.environ['NEPTUNE'] # if this throws an error, you didn't set your env var 53 | 54 | # save directory 55 | save_dir = os.getcwd() if not params['SAVE_DIR'] else params['SAVE_DIR'] 56 | 57 | # custom dataset directory 58 | data_path = 'dataset/balloon' 59 | train_path = os.path.join(data_path, 'train') 60 | val_path = os.path.join(data_path, 'val') 61 | 62 | # label mapping, starting at 1, as the background is assigned 0 63 | mapping = { 64 | 'balloon': 1, 65 | } 66 | 67 | # training transformations and augmentations 68 | transforms_training = ComposeDouble([ 69 | Clip(), 70 | AlbumentationWrapper(albumentation=A.HorizontalFlip(p=0.5)), 71 | AlbumentationWrapper(albumentation=A.RandomScale(p=0.5, scale_limit=0.5)), 72 | #AlbumentationWrapper(albumentation=A.VerticalFlip(p=0.5)), 73 | FunctionWrapperDouble(np.moveaxis, source=-1, destination=0), 74 | FunctionWrapperDouble(normalize_01) 75 | ]) 76 | 77 | # validation transformations 78 | transforms_validation = ComposeDouble([ 79 | Clip(), 80 | FunctionWrapperDouble(np.moveaxis, source=-1, destination=0), 81 | FunctionWrapperDouble(normalize_01) 82 | ]) 83 | 84 | 85 | # random seed 86 | seed_everything(params['SEED']) 87 | 88 | # dataset training 89 | dataset_train = ObjectDetectionDataSet(data_path=train_path, 90 | transform=transforms_training, 91 | mapping=mapping) 92 | 93 | # dataset validation 94 | dataset_valid = ObjectDetectionDataSet(data_path=val_path, 95 | transform=transforms_validation, 96 | mapping=mapping) 97 | 98 | # dataloader training 99 | dataloader_train = DataLoader(dataset=dataset_train, 100 | batch_size=params['BATCH_SIZE'], 101 | shuffle=True, 102 | num_workers=0, 103 | collate_fn=collate_double) 104 | 105 | # dataloader validation 106 | dataloader_valid = DataLoader(dataset=dataset_valid, 107 | batch_size=1, 108 | shuffle=False, 109 | num_workers=0, 110 | collate_fn=collate_double) 111 | 112 | # Datasets statistics exploration 113 | if False: 114 | stats_train = stats_dataset(dataset_train) 115 | transform = GeneralizedRCNNTransform(min_size=1024, 116 | max_size=1024, 117 | image_mean=[0.485, 0.456, 0.406], 118 | image_std=[0.229, 0.224, 0.225]) 119 | stats_train_transform = stats_dataset(dataset_train, transform) 120 | print(stats_train) 121 | print(stats_train_transform) 122 | 123 | # neptune logger 124 | neptune_logger = NeptuneLogger( 125 | api_key=api_key, 126 | project_name=f'{params["OWNER"]}/{params["PROJECT"]}', # use your neptune name here 127 | experiment_name=params['EXPERIMENT'], 128 | params=params 129 | ) 130 | 131 | assert neptune_logger.name # http GET request to check if the project exists 132 | 133 | # model init 134 | model = get_fasterRCNN_resnet(num_classes=params['CLASSES'], 135 | backbone_name=params['BACKBONE'], 136 | anchor_size=params['ANCHOR_SIZE'], 137 | aspect_ratios=params['ASPECT_RATIOS'], 138 | fpn=params['FPN'], 139 | min_size=params['MIN_SIZE'], 140 | max_size=params['MAX_SIZE']) 141 | 142 | # lightning init 143 | task = FasterRCNN_lightning(model=model, lr=params['LR'], iou_threshold=params['IOU_THRESHOLD']) 144 | 145 | # callbacks 146 | checkpoint_callback = ModelCheckpoint(monitor='Validation_mAP', mode='max') 147 | learningrate_callback = LearningRateMonitor(logging_interval='step', log_momentum=False) 148 | early_stopping_callback = EarlyStopping(monitor='Validation_mAP', 
patience=params['PATIENCE'], mode='max') 149 | 150 | # trainer init 151 | trainer = Trainer(gpus=params['GPU'], 152 | precision=params['PRECISION'], # try 16 with enable_pl_optimizer=False 153 | callbacks=[checkpoint_callback, learningrate_callback, early_stopping_callback], 154 | default_root_dir=save_dir, # where checkpoints are saved to 155 | logger=neptune_logger, 156 | log_every_n_steps=1, 157 | num_sanity_val_steps=0, 158 | ) 159 | 160 | # start training 161 | trainer.max_epochs = params['MAXEPOCHS'] 162 | trainer.fit(task, 163 | train_dataloader=dataloader_train, 164 | val_dataloaders=dataloader_valid) 165 | 166 | # start testing 167 | #trainer.test(ckpt_path='best', test_dataloaders=dataloader_valid) 168 | 169 | # log packages 170 | log_packages_neptune(neptune_logger) 171 | 172 | # log mapping as table 173 | log_mapping_neptune(mapping, neptune_logger) 174 | 175 | # log model 176 | if params['LOG_MODEL']: 177 | checkpoint_path = pathlib.Path(checkpoint_callback.best_model_path) 178 | log_model_neptune(checkpoint_path=checkpoint_path, 179 | save_directory=pathlib.Path.home(), 180 | name='best_model.pt', 181 | neptune_logger=neptune_logger) 182 | 183 | # stop logger 184 | neptune_logger.experiment.stop() 185 | print('Finished') 186 | 187 | 188 | if __name__ == '__main__': 189 | main() 190 | -------------------------------------------------------------------------------- /resources/autonomous-driving.md: -------------------------------------------------------------------------------- 1 | 2 | ## Components of Autonomous Driving System 3 | 4 | 5 | ![Alt](images/overview.png "Standard components in a modern autonomous driving system pipeline.") 6 | 7 | The Autonomous Driving survey paper (https://arxiv.org/pdf/2002.00444.pdf) demonstrates the above pipeline from sensor stream to control actuation. 8 | 9 | The **sensor architecture** includes multiple sets of cameras, radars and LIDARs, as well as a GPS-GNSS system for absolute localization, and Inertial Measurement Units (IMUs) that provide the 3D pose of the vehicle in space. 10 | 11 | The goal of the **perception module** is the creation of an intermediate-level representation of the environment state that is later utilized by a decision-making system that produces the driving policy. 12 | 13 | This state would include lane position, drivable zone, the location of agents such as cars and pedestrians, the state of traffic lights, and more. 14 | 15 | Several perception tasks, such as _semantic segmentation_, _motion estimation_, _depth estimation_ and _soiling detection_, can be unified into a single multi-task model. 16 | 17 | 18 | ## Courses 19 | * [[Coursera] Machine Learning](https://www.coursera.org/learn/machine-learning) - presented by [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng); as of Jan 28, 2020 it has 125,344 ratings and 30,705 reviews. 20 | * [[Coursera+DeepLearning.ai] Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) - presented by [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng); 5 courses that teach the foundations of deep learning, programming language: Python. 21 | * [[Udacity] Self-Driving Car Nanodegree Program](https://www.udacity.com/course/self-driving-car-engineer-nanodegree--nd013) - teaches the skills and techniques used by self-driving car teams. The program syllabus can be found [here](https://medium.com/self-driving-cars/term-1-in-depth-on-udacitys-self-driving-car-curriculum-ffcf46af0c08#.bfgw9uxd9).
22 | * [[University of Toronto] CSC2541 23 | Visual Perception for Autonomous Driving](http://www.cs.toronto.edu/~urtasun/courses/CSC2541/CSC2541_Winter16.html) - A graduate course in visual perception for autonomous driving. The class briefly covers topics in localization, ego-motion estimation, free-space estimation, visual recognition (classification, detection, segmentation). 24 | * [[INRIA] Mobile Robots and Autonomous Vehicles](https://www.fun-mooc.fr/courses/inria/41005S02/session02/about?utm_source=mooc-list) - Introduces the key concepts required to program mobile robots and autonomous vehicles. The course presents both formal and algorithmic tools, and for its last week's topics (behavior modeling and learning), it will also provide realistic examples and programming exercises in Python. 25 | * [[University of Glasgow] ENG5017 Autonomous Vehicle Guidance Systems](http://www.gla.ac.uk/coursecatalogue/course/?code=ENG5017) - Introduces the concepts behind autonomous vehicle guidance and coordination and enables students to design and implement guidance strategies for vehicles incorporating planning, optimising and reacting elements. 26 | * [[David Silver - Udacity] How to Land An Autonomous Vehicle Job: Coursework](https://medium.com/self-driving-cars/how-to-land-an-autonomous-vehicle-job-coursework-e7acc2bfe740#.j5b2kwbso) - David Silver, from Udacity, reviews his coursework for landing a job in self-driving cars coming from a Software Engineering background. 27 | * [[Stanford] - CS221 Artificial Intelligence: Principles and Techniques](http://stanford.edu/~cpiech/cs221/index.html) - Contains a simple self-driving project and simulator. 28 | * [[MIT] 6.S094: Deep Learning for Self-Driving Cars](http://selfdrivingcars.mit.edu/) - *"This class is an introduction to the practice of deep learning through the applied theme of building a self-driving car. It is open to beginners and is designed for those who are new to machine learning, but it can also benefit advanced researchers in the field looking for a practical overview of deep learning methods and their application. (...)"* 29 | * [[MIT] Deep Learning](https://deeplearning.mit.edu/) - *"This page is a collection of MIT courses and lectures on deep learning, deep reinforcement learning, autonomous vehicles, and artificial intelligence organized by Lex Fridman."* 30 | * [[MIT] Human-Centered Artificial Intelligence](https://hcai.mit.edu/) - *"Human-Centered AI at MIT is a collection of research and courses focused on the design, development, and deployment of artificial intelligence systems that learn from and collaborate with humans in a deep, meaningful way."* 31 | * [[UCSD] - MAE/ECE148 Introduction to Autonomous Vehicles](https://guitar.ucsd.edu/maeece148/index.php/Introduction_to_Autonomous_Vehicles) - A hands-on, project-based course using DonkeyCar with lane-tracking functionality and various advanced topics such as object detection, navigation, etc. 32 | * [[MIT] 2.166 Duckietown](http://duckietown.mit.edu/index.html) - A class about the science of autonomy at the graduate level. This is a hands-on, project-focused course focusing on self-driving vehicles and high-level autonomy. The problem: **Design the Autonomous Robo-Taxis System for the City of Duckietown.** 33 | * [[Coursera] Self-Driving Cars](https://www.coursera.org/specializations/self-driving-cars#about) - A 4-course specialization about Self-Driving Cars by the University of Toronto.
It covers Introduction, State Estimation & Localization, Visual Perception, and Motion Planning. -------------------------------------------------------------------------------- /resources/datasets.md: -------------------------------------------------------------------------------- 1 | ## Datasets 2 | 3 | > 4 | 5 | * [Udacity](https://github.com/udacity/self-driving-car/tree/master/datasets) - Udacity driving datasets released for the [Udacity Challenges](https://www.udacity.com/self-driving-car). Contains ROSBAG training data (~80 GB). 6 | * [Comma.ai](https://archive.org/details/comma-dataset) - 7 and a quarter hours of largely highway driving. Consists of 10 video clips of variable size recorded at 20 Hz with a camera mounted on the windshield of an Acura ILX 2016. In parallel to the videos, measurements such as the car's speed, acceleration, steering angle, GPS coordinates and gyroscope angles were also recorded. These measurements are transformed into a uniform 100 Hz time base. 7 | * [Oxford's Robotic Car](http://robotcar-dataset.robots.ox.ac.uk/) - over 100 repetitions of a consistent route through Oxford, UK, captured over a period of more than a year. The dataset captures many different combinations of weather, traffic and pedestrians, along with longer-term changes such as construction and roadworks. 8 | * [KITTI Vision Benchmark Suite](http://www.cvlibs.net/datasets/kitti/raw_data.php) - 6 hours of traffic scenarios at 10-100 Hz using a variety of sensor modalities such as high-resolution 9 | color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. 10 | * [University of Michigan North Campus Long-Term Vision and LIDAR Dataset](http://robots.engin.umich.edu/nclt/) - consists of omnidirectional imagery, 3D lidar, planar lidar, GPS, and proprioceptive 11 | sensors for odometry collected using a Segway robot. 12 | * [University of Michigan Ford Campus Vision and Lidar Data Set](http://robots.engin.umich.edu/SoftwareData/Ford) - dataset collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck. The vehicle is outfitted with a professional (Applanix POS LV) and consumer (Xsens MTI-G) Inertial Measurement Unit (IMU), a Velodyne 3D-lidar scanner, two push-broom forward-looking Riegl lidars, and a Point Grey Ladybug3 omnidirectional camera system. 13 | * [DIPLECS Autonomous Driving Datasets (2015)](http://cvssp.org/data/diplecs/) - the dataset was recorded by placing an HD camera in a car driving around the Surrey countryside. The dataset contains about 30 minutes of driving. The video is 1920x1080 in colour, encoded using the H.264 codec. Steering is estimated by tracking markers on the steering wheel. The car's speed is estimated by OCR of the car's speedometer (but the accuracy of the method is not guaranteed). 14 | * [Velodyne SLAM Dataset from Karlsruhe Institute of Technology](http://www.mrt.kit.edu/z/publ/download/velodyneslam/dataset.html) - two challenging datasets recorded with the Velodyne HDL64E-S2 scanner in the city of Karlsruhe, Germany. 15 | * [SYNTHetic collection of Imagery and Annotations (SYNTHIA)](http://synthia-dataset.net/) - consists of a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes: misc, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, lanemarking.
16 | * [Cityscapes Dataset](https://www.cityscapes-dataset.com/) - focuses on semantic understanding of urban street scenes. A large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. Details on the annotated classes and examples of the annotations are available. 17 | * [CSSAD Dataset](http://aplicaciones.cimat.mx/Personal/jbhayet/ccsad-dataset) - Several real-world stereo datasets exist for the development and testing of algorithms in the fields of perception and navigation of autonomous vehicles. However, none of them was recorded in developing countries and therefore they lack the particular characteristics that can be found in their streets and roads, like abundant potholes, speed bumps and peculiar flows of pedestrians. This stereo dataset was recorded from a moving vehicle and contains high-resolution stereo images which are complemented with orientation and acceleration data obtained from an IMU, GPS data, and data from the car computer. 18 | * [Daimler Urban Segmentation Dataset](http://www.6d-vision.com/scene-labeling) - consists of video sequences recorded in urban traffic, with 5000 rectified stereo image pairs at a resolution of 1024x440. 500 frames (every 10th frame of the sequence) come with pixel-level semantic class annotations for 5 classes: ground, building, vehicle, pedestrian, sky. Dense disparity maps are provided as a reference; however, these are not manually annotated but computed using semi-global matching (SGM). 19 | * [Self Racing Cars - XSens/Fairchild Dataset](http://data.selfracingcars.com/) - The files include measurements from the Fairchild FIS1100 6 Degree of Freedom (DoF) IMU, the Fairchild FMT-1030 AHRS, the Xsens MTi-3 AHRS, and the Xsens MTi-G-710 GNSS/INS. The files from the event can all be read in the MT Manager software, available as part of the MT Software Suite. 20 | * [MIT AGE Lab](http://lexfridman.com/automated-synchronization-of-driving-data-video-audio-telemetry-accelerometer/) - a small sample of the 1,000+ hours of multi-sensor driving datasets collected at AgeLab. 21 | * [Yet Another Computer Vision Index To Datasets (YACVID)](http://yacvid.hayko.at/) - a list of frequently used computer vision datasets. 22 | * [KUL Belgium Traffic Sign Dataset](http://www.vision.ee.ethz.ch/~timofter/traffic_signs/) - a large dataset with 10000+ traffic sign annotations and thousands of physically distinct traffic signs. 4 video sequences recorded with 8 high-resolution cameras mounted on a van, totalling more than 3 hours, with traffic sign annotations, camera calibrations and poses. About 16000 background images. The material is captured in Belgium, in urban environments from the Flanders region, by GeoAutomation. 23 | * [LISA: Laboratory for Intelligent & Safe Automobiles, UC San Diego Datasets](http://cvrr.ucsd.edu/LISA/datasets.html) - traffic signs, vehicle detection, traffic lights, trajectory patterns. 24 | * [Multisensory Omni-directional Long-term Place Recognition (MOLP) dataset for autonomous driving](http://hcr.mines.edu/code/MOLP.html) - recorded using omni-directional stereo cameras over one year in Colorado, USA.
[paper](https://arxiv.org/abs/1704.05215) 25 | * [Lane Instance Segmentation in Urban Environments](https://five.ai/datasets) Semi-automated method for labelling lane instances. 24,000 image set available. [paper](https://arxiv.org/pdf/1807.01347.pdf) 26 | * [Foggy Zurich Dataset](https://www.vision.ee.ethz.ch/~csakarid/Model_adaptation_SFSU_dense/) Curriculum Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding. 3.8k high-quality foggy images in and around Zurich. [paper](https://arxiv.org/abs/1901.01415) 27 | * [SullyChen AutoPilot Dataset](https://github.com/SullyChen/Autopilot-TensorFlow) Dataset collected by SullyChen in and around California. 28 | * [Waymo Training and Validation Data](https://waymo.com/open) One terabyte of data with 3D and 2D labels. 29 | * [Intel's dataset for AD conditions in India](https://www.intel.ai/iiit-hyderabad-and-intel-release-worlds-first-dataset-for-driving-in-india/#gs.28pnw5) A dataset for Autonomous Driving conditions in India with segmented annotations (10k), by Intel & IIIT Hyderabad. 30 | * [nuScenes Dataset](https://www.nuscenes.org/) A large dataset with 1,400,000 images and 390,000 lidar sweeps from Boston and Singapore. Provides manually generated 3D bounding boxes for 23 object classes. 31 | * [German Traffic Sign Dataset](http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset) A large dataset of German traffic sign recognition data (GTSRB) with more than 40 classes in 50k images and detection data (GTSDB) with 900 image annotations. 32 | * [Swedish Traffic Sign Dataset](https://www.cvl.isy.liu.se/research/datasets/traffic-signs-dataset/) A dataset with traffic signs recorded on 350 km of Swedish roads, consisting of 20k+ images, of which about 20% are annotated. 33 | -------------------------------------------------------------------------------- /resources/deep-learning.md: -------------------------------------------------------------------------------- 1 | 2 | ## Deep Learning Basics 3 | 4 | - [Official PyTorch tutorials](http://pytorch.org/tutorials/) for more tutorials (some of these tutorials are included there) 5 | - [apachecn/MachineLearning](https://github.com/apachecn/MachineLearning) 6 | - [Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow](https://github.com/dennybritz/reinforcement-learning) 7 | - [lawlite19/DeepLearning_Python](https://github.com/lawlite19/DeepLearning_Python) 8 | - [A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks](https://github.com/rasbt/pattern_classification) 9 | - [Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech](https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap) 10 | - [Content for Udacity's Machine Learning curriculum](https://github.com/udacity/machine-learning) 11 | - [This is the lab repository of my honours degree project on machine learning](https://github.com/ShokuninSan/machine-learning) 12 | - [A curated list of awesome Machine Learning frameworks, libraries and software](https://github.com/josephmisiti/awesome-machine-learning) 13 | - [Bare bones Python implementations of some of the fundamental Machine Learning models and algorithms](https://github.com/eriklindernoren/ML-From-Scratch) 14 | - [The "Python Machine Learning" book code repository and info resource](https://github.com/rasbt/python-machine-learning-book) 15 | 16 | -------------------------------------------------------------------------------- /resources/images/overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ifding/deep-learning-python/fc8bc808d5439686f0ee24a4f0f3b1f5354df6c0/resources/images/overview.png --------------------------------------------------------------------------------