├── README.md ├── images ├── .DS_Store ├── bird_orig.png ├── bird_predict.png ├── boy_orig.png ├── boy_predict.png ├── cutepenguin_boxes.png ├── cutepenguin_no_boxes.png ├── full-diagram.png ├── image_with_bounding.png ├── light_orig.png ├── light_predict.png ├── loss_1.png ├── loss_2.png └── pixel_array.png └── src ├── .ipynb_checkpoints ├── dataprep-checkpoint.ipynb └── plswork-checkpoint.ipynb ├── dataprep.ipynb ├── testing.ipynb └── training.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # Object-detection-using-Faster-RCNN 2 | ## Introduction 3 | 4 | What exactly is object detection? 5 | 6 | Looking at the picture below, what do you see? Well, if everything is loading properly, you're thinking two cute baby penguins. If that's the case, you're 100% correct: it's two cute baby penguins. What you just did was the first half of object detection: you identified that there is an object in the image! 7 | 8 | ![alt text](images/cutepenguin_no_boxes.png) 9 | 10 | Now the next step would be for me to ask you to draw boxes around where the penguins are in the image. If you did, it would probably look something like this: 11 | 12 | ![alt text](images/cutepenguin_boxes.png) 13 | 14 | Now what you've just done is object detection! You not only identified that penguins were in the image, you also showed me where they were and how confident you were that each one was a penguin. 15 | 16 | ## Motivation 17 | 18 | So why is object detection hard for computers to do? Unfortunately, computers don't see images the way we do. Instead of seeing two cute penguins, they see a 3-dimensional array of numbers between 0 and 255 for each individual square, or pixel. An example is shown below. Now that you've seen how a computer sees images, object detection suddenly becomes a bit trickier! 19 | 20 | ![alt text](images/pixel_array.png) 21 | 22 | You might be asking yourself, why is object detection useful? A few examples that come to mind are self-driving cars and self-flying drones. This is only becoming more relevant as time goes on: just earlier this week Tesla released their first full self-driving car ride video (https://www.youtube.com/watch?v=tlThdr3O5Qo). Amazon has also been working diligently for the past few years to get drones to deliver packages. Now that we have some background, let's dive in. 23 | 24 | ## The Data 25 | [Google AI Open Dataset](https://storage.googleapis.com/openimages/web/index.html) 26 | 27 | Over 1.7 million unique images, 600 different object classes, and over 14 million human-annotated objects within the images. 28 | 29 | An example of how the original images look: 30 | 31 | ![alt text](images/image_with_bounding.png) 32 | 33 | Because the dataset is so massive, I chose a subset of roughly 2,500 images and split them into 1,800 train and 700 test images, which gave me close to 8,000 objects to try to detect. 34 | 35 | Steps to download the type of data I used: 36 | 37 | 1.) Press *Download* 38 | 39 | 2.) Then press *Download from Figure Eight* 40 | 41 | 3.) Next, press *Download Options* in the top right 42 | 43 | 4.) Under Train_00.zip, download train-annotations-bbox.csv and train-images-boxable.csv 44 | 45 | 5.) Scroll down the page to the 'Please Note' section, then click the hyperlink in the second paragraph labeled '*You can download it here*'.
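Once the CSVs are downloaded, a quick sanity check is to load them with pandas, which is essentially the first thing `src/dataprep.ipynb` does. The snippet below is a minimal sketch of that step; the `data/` folder name is an assumption, so point `base_path` at wherever you saved the files.

```python
# Minimal sketch: load the three downloaded Open Images CSVs with pandas
# (mirrors the first cells of src/dataprep.ipynb). The 'data/' path is an
# assumption; change base_path to wherever you saved the files.
import pandas as pd

base_path = 'data/'

# image name -> download URL for every boxable image
df_image_links = pd.read_csv(base_path + 'train-images-boxable.csv')

# one row per human-annotated bounding box, with normalized XMin/XMax/YMin/YMax
df_annot_box = pd.read_csv(base_path + 'train-annotations-bbox.csv')

# machine label id -> human-readable class name (e.g. 'Person', 'Bird')
df_class_labels = pd.read_csv(base_path + 'class-descriptions-boxable.csv', header=None)
df_class_labels.columns = ['id', 'name']

print(df_annot_box.head())
```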
46 | 47 | ## Faster R-CNN 48 | 49 | Originally presented in a paper titled [**Faster R-CNN: Towards Real-Time Object Detection 50 | with Region Proposal Networks**](https://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf). The Faster region-based convolutional neural network is the third iteration of the R-CNN family and by far the fastest. 51 | 52 | Now how exactly does it work? 53 | 54 | ![alt text](images/full-diagram.png) 55 | (Image credit to the [original paper](https://arxiv.org/pdf/1506.01497.pdf)) 56 | 57 | Let's start at the bottom of the image. First, you pass the full resized image into the 'conv layers'. These layers are generally a pretrained network such as VGG-16 or ResNet with the head cut off, 58 | allowing us to extract just the feature maps generated by the network. If you're unsure what a feature map is, think of a simplified image where everything but the feature (a line, a curve, maybe a square; things that help the computer determine what type of object it is) is ignored. 59 | 60 | If you'd like to look further into feature maps, here's a link to a research paper titled [Visualizing and Understanding Convolutional Networks](https://arxiv.org/pdf/1311.2901.pdf). 61 | 62 | Now that we've extracted the feature maps, we pass them to the Region Proposal Network (RPN). The RPN is the real star of the show; it is what allows Faster R-CNN to be roughly 250x faster than the original R-CNN model. The RPN is actually two small networks, one for classification and one for regression. The regression network generates, by default, 9 bounding boxes per position on the feature map. These bounding boxes use 1:1, 2:1, and 1:2 aspect ratios, with the originally proposed scales being 128, 256, and 512 pixels. For the purposes of my project I reduced the image size to 300 pixels, so I halved those values, leaving me with box scales of 64, 128, and 256 pixels. Once the boxes are proposed, the classification network kicks in and determines whether an object is in each proposed region or not. If the network thinks there is an object in the region, it's passed off to the RoI pooling layer. 63 | 64 | The RoI (Region of Interest) pooling layer extracts the portion of the feature map within that bounding box, resizes it to a fixed size, and passes it to another two networks. 65 | 66 | Again, one is a regressor and one is a classifier: the regression network refines the bounding box for where it thinks the object is in this portion of the image, and the classification network then determines what type of object is within that bounding box. Is it a bird? A plane? Superman? Well, if the network is well trained, let's hope it predicts the correct answer! 67 | 68 | That's a quick summary of how the Faster Region-based Convolutional Neural Network works. If you have any questions, please feel free to get in contact with me! 69 | 70 | ## My Models 71 | 72 | I trained two models: one to detect where a person is, and another to detect traffic lights, birds, and footballs. I trained on an AWS p2.xlarge instance, which took about 6 minutes per epoch, or in other words 6 minutes per 1,000 images processed. Both models were trained for around 150 epochs. 73 | 74 | The loss graphs for the traffic lights, birds, and footballs model: 75 | 76 | ![alt text](images/loss_1.png) ![alt text](images/loss_2.png) 77 | 78 | As you can see from the graphs above, the models were still learning as the epochs progressed, so they definitely could have used more training time.
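To make the anchor-box numbers from the Faster R-CNN section above concrete, here is a small sketch that builds the 9 anchor shapes (width, height in pixels) proposed at every feature-map position. The scales and ratios match the `anchor_box_scales` and `anchor_box_ratios` values set in the `Config` class in `src/training.ipynb`; only the small loop around them is new.

```python
# Sketch: the 9 anchor shapes per feature-map position, from 3 scales x 3 ratios.
# Scales are the halved values (64, 128, 256) described above; ratios are the
# 1:1, ~1:2 and ~2:1 pairs used in the Config class of src/training.ipynb.
import math

anchor_box_scales = [64, 128, 256]
anchor_box_ratios = [
    [1, 1],                                  # 1:1
    [1. / math.sqrt(2), 2. / math.sqrt(2)],  # ~1:2 (tall)
    [2. / math.sqrt(2), 1. / math.sqrt(2)],  # ~2:1 (wide)
]

anchors = []
for scale in anchor_box_scales:
    for ratio_x, ratio_y in anchor_box_ratios:
        # anchor width and height in pixels, centred on the current position
        anchors.append((round(scale * ratio_x), round(scale * ratio_y)))

print(anchors)
# [(64, 64), (45, 91), (91, 45), (128, 128), (91, 181), (181, 91),
#  (256, 256), (181, 362), (362, 181)]
```

Every position on the feature map proposes these same 9 shapes; the RPN's regression head then nudges each one, and its classification head scores whether it contains an object.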
79 | 80 | ## Results 81 | 82 | Here are some of my favorite results from both models. 83 | 84 | ![alt text](images/boy_orig.png) ![alt text](images/boy_predict.png) 85 | 86 | ![alt text](images/bird_orig.png) ![alt text](images/bird_predict.png) 87 | 88 | ![alt text](images/light_orig.png) ![alt text](images/light_predict.png) 89 | 90 | ## Conclusions 91 | 92 | With only a week to do this project, I would say that my models performed well. This whole project has been a very fun experience, and Faster R-CNN is a complex model type that works extremely well. 93 | 94 | ## Future work 95 | 96 | ### More training time 97 | * Allowing the model to train on the images for longer will definitely help it learn the shapes better! 98 | 99 | ### More training data 100 | * More data allows the model to learn the features of each object better, allowing it to perform even better! 101 | 102 | ### Try out a ResNet instead of VGG-16 103 | * Self-explanatory: perhaps a different feature map extractor will help it perform better as well. 104 | 105 | 106 | ## Acknowledgments 107 | 108 | Yinghan Xu's article was an absolute must-have for this project. 109 | [Link](https://towardsdatascience.com/faster-r-cnn-object-detection-implemented-by-keras-for-custom-data-from-googles-open-images-125f62b9141a) 110 | 111 | Tryolabs wrote an excellent article on Faster R-CNN that helped my understanding tremendously. [Link](https://tryolabs.com/blog/2018/01/18/faster-r-cnn-down-the-rabbit-hole-of-modern-object-detection/) 112 | 113 | -------------------------------------------------------------------------------- /images/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/.DS_Store -------------------------------------------------------------------------------- /images/bird_orig.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/bird_orig.png -------------------------------------------------------------------------------- /images/bird_predict.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/bird_predict.png -------------------------------------------------------------------------------- /images/boy_orig.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/boy_orig.png -------------------------------------------------------------------------------- /images/boy_predict.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/boy_predict.png -------------------------------------------------------------------------------- /images/cutepenguin_boxes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/cutepenguin_boxes.png
-------------------------------------------------------------------------------- /images/cutepenguin_no_boxes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/cutepenguin_no_boxes.png -------------------------------------------------------------------------------- /images/full-diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/full-diagram.png -------------------------------------------------------------------------------- /images/image_with_bounding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/image_with_bounding.png -------------------------------------------------------------------------------- /images/light_orig.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/light_orig.png -------------------------------------------------------------------------------- /images/light_predict.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/light_predict.png -------------------------------------------------------------------------------- /images/loss_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/loss_1.png -------------------------------------------------------------------------------- /images/loss_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/loss_2.png -------------------------------------------------------------------------------- /images/pixel_array.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DaHeller/Object-detection-using-Faster-RCNN/fe96e699b7ebf660cb719fe0e74e1389777b2afb/images/pixel_array.png -------------------------------------------------------------------------------- /src/.ipynb_checkpoints/dataprep-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 18, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import time\n", 11 | "import sys\n", 12 | "import os\n", 13 | "import random\n", 14 | "from skimage import io\n", 15 | "import pandas as pd\n", 16 | "from matplotlib import pyplot as plt\n", 17 | "from shutil import copyfile\n", 18 | "\n", 19 | "import cv2\n", 20 | "import tensorflow as tf" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 19, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "base_path = '~/Object-detection-using-Faster-RCNN/data/'\n", 30 | "image_links = 
'train-images-boxable.csv'\n", 31 | "annot_box = 'train-annotations-bbox.csv'\n", 32 | "class_labels = 'class-descriptions-boxable.csv'" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 20, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "df_image_links = pd.read_csv(base_path+image_links)" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 21, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "df_annot_box = pd.read_csv(base_path+annot_box)" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 22, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "df_class_labels = pd.read_csv(base_path+class_labels,header=None)\n", 60 | "df_class_labels.columns=['id','name']\n" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "def plot_org_img_with_boxes(image_name):\n", 70 | " \"\"\"\n", 71 | " Input: \n", 72 | " image_name(string) = the actual file name '6b5bfa4e9b0e767c.jpg'\n", 73 | " \n", 74 | " Return:\n", 75 | " One plot of the original image no bounding boxes, the other with bounding boxes.\n", 76 | " \"\"\"\n", 77 | " temp = df_image_links[df_image_links['image_name']==image_name]\n", 78 | " img_url = temp['image_url'].values[0]\n", 79 | " img_id = image_name[:16]\n", 80 | " \n", 81 | " img = io.imread(img_url)\n", 82 | " \n", 83 | " height, width, _ = img.shape\n", 84 | " plt.figure(figsize=(10,10))\n", 85 | " plt.subplot(1,2,1)\n", 86 | " plt.title('Original Image')\n", 87 | " plt.imshow(img)\n", 88 | " boxes = df_annot_box[df_annot_box['ImageID']==img_id]\n", 89 | " img_bbox = img.copy()\n", 90 | " for index, row in boxes.iterrows():\n", 91 | " xmin,xmax,ymin,ymax = row['XMin'],row['XMax'],row['YMin'],row['YMax']\n", 92 | " xmin,xmax,ymin,ymax = int(xmin*width),int(xmax*width),int(ymin*height),int(ymax*height)\n", 93 | " label_name = row['LabelName']\n", 94 | " \n", 95 | " temp_df = df_class_labels[df_class_labels['id']==label_name]\n", 96 | " class_of_box = temp_df['name'].values[0]\n", 97 | " \n", 98 | " cv2.rectangle(img_bbox,(xmin,ymin),(xmax,ymax),(255,255,0),2)\n", 99 | " font = cv2.FONT_HERSHEY_SIMPLEX\n", 100 | " cv2.putText(img_bbox,class_of_box,(xmin,ymin-10), font, 1,(255,255,0),2)\n", 101 | " plt.subplot(1,2,2)\n", 102 | " plt.title('Image with Bounding Box')\n", 103 | " plt.imshow(img_bbox)\n", 104 | " plt.show()\n", 105 | " " 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "def create_csv(which_class, num_rows,save=False,save_path ='~/Object-detection-using-Faster-RCNN/createddata/'):\n", 115 | " \"\"\"Input: \n", 116 | " which_class(str): one of 600 classes available in the dataset\n", 117 | " num_rows(int): how many rows you want your csv to be could potentially be less than inputted if not that many in data\n", 118 | " \n", 119 | " Returns: \n", 120 | " Df with num_rows randomly chosen number of rows\n", 121 | " if save==True: saves it as a csv in save_path + (which_class)_1000.csv\n", 122 | " \n", 123 | " \"\"\"\n", 124 | " class_id = df_class_labels[df_class_labels['name']==which_class].values[0][0] #collect the class_id value \n", 125 | " num_total_pics = df_annot_box[df_annot_box['LabelName']==class_id] #select all the annotation boxes with that id value\n", 126 | " print('Total amount of {} in data'.format(which_class))\n", 127 | " print(len(num_total_pics))\n", 128 | " \n", 129 
| " print('Number of unique pictures featuring atleast one of {}'.format(which_class))\n", 130 | " num_unique_pics_of_class = np.unique(num_total_pics['ImageID']) #remove duplicate images from the df, \n", 131 | " #such as smooshing down a picture that has 2 birds to one value\n", 132 | " print(len(num_unique_pics_of_class))\n", 133 | " random_rows = np.random.choice(num_unique_pics_of_class,num_rows,replace=False)#randomly choose Num_rows\n", 134 | " array_append_jpg = [df_image_links[df_image_links['image_name']==name+'.jpg'] for name in random_rows]\n", 135 | " df = pd.DataFrame()\n", 136 | " for i in range(len(array_append_jpg)):\n", 137 | " df = df.append(array_append_jpg[i], ignore_index = True)\n", 138 | " if save:\n", 139 | " df.to_csv(save_path + '{}_{}.csv'.format(which_class,num_rows))\n", 140 | " \n", 141 | " return df\n", 142 | "\n" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "def download_images(csv_file_path, save_file_path, type_of_class):\n", 152 | " \"\"\"\n", 153 | " INPUT:\n", 154 | " csv_file_path(string) = path to where you saved the csv generated from create_csv\n", 155 | " save_file_path(string) = path to where you want to save all of the images\n", 156 | " type_of_class(string) = whichever class('Person', 'Bird', etc...)\n", 157 | " \n", 158 | " Returns: None\n", 159 | " \n", 160 | " Generates: A new folder with the name of type_of_class with all the images downloaded inside of it\n", 161 | " \"\"\"\n", 162 | " df = pd.read_csv(csv_file_path)\n", 163 | " if len(df.columns) >=3:\n", 164 | " df.drop('Unnamed: 0',axis=1,inplace=True)\n", 165 | " urls = df['image_url'].values\n", 166 | " directory = save_file_path + type_of_class\n", 167 | " os.mkdir(directory)\n", 168 | " for url in urls:\n", 169 | " img = io.imread(url)\n", 170 | " file_name=url[-20:]\n", 171 | " io.imsave(directory+'/'+file_name, img)\n", 172 | " \n", 173 | " " 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "train_path = '../createddata/train/'\n", 183 | "test_path = '../createddata/test/'\n", 184 | "# os.mkdir(train_path)\n", 185 | "# os.mkdir(test_path)\n", 186 | "def split_train_test(file_path_to_imgs, percentage_split=.8, save_path_train = '../createddata/train/',save_path_test = '../createddata/test/'):\n", 187 | " \"\"\"\n", 188 | " Input: \n", 189 | " file_path_to_imgs = file path to the image directory where all the downloaded images from download_images() are saved\n", 190 | " percentage_split(int between 0-1) = Default at .8 The percentage you want to be train images, remaining percent is test\n", 191 | " save_path_train = file path where you want to save the train images\n", 192 | " save_path_test = file path where you want to save the test images\n", 193 | " \n", 194 | " RETURNS: None\n", 195 | " \n", 196 | " Generates:\n", 197 | " Copied images into specificed train and test directorys\n", 198 | " \"\"\"\n", 199 | " imgs = os.listdir(file_path_to_imgs)\n", 200 | " imgs = [f for f in imgs if not f.startswith('.')]\n", 201 | " random.seed(1)\n", 202 | " random.shuffle(imgs)\n", 203 | " num_of_train_imgs = int(len(imgs)*percentage_split)\n", 204 | " num_of_test_imgs =len(imgs)-int(len(imgs)*(1-percentage_split))\n", 205 | " train_imgs = imgs[:num_of_train_imgs]\n", 206 | " test_imgs = imgs[num_of_test_imgs:]\n", 207 | " for val in train_imgs:\n", 208 | " img_loc = file_path_to_imgs + 
val\n", 209 | " save_loc = save_path_train+val\n", 210 | " copyfile(img_loc, save_loc)\n", 211 | " for val in test_imgs:\n", 212 | " img_loc = file_path_to_imgs + val\n", 213 | " save_loc = save_path_test+val\n", 214 | " copyfile(img_loc, save_loc)" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 49, 220 | "metadata": {}, 221 | "outputs": [], 222 | "source": [ 223 | "def create_df_out_of_image_folders(path,names_of_classes=[],save=False, type_of_data='train'):\n", 224 | " \"\"\"\n", 225 | " INPUTS:\n", 226 | " path = path to where you just stored your train or test pictures\n", 227 | " names_of_classes = A list with the classes you've chosen for your data ('Bird','Person,'Traffic light'...etc)\n", 228 | " save = True save a csv file you can load in with this information\n", 229 | " type_of_data = actually just used as a variable to name your saved csv \n", 230 | " ex:\n", 231 | " if type_of_data = 'train' file will save as 'train_df.csv'\n", 232 | " \n", 233 | " Returns:\n", 234 | " dataframe with format [FileName, XMIN, XMax, YMin, Ymax, ClassName] for each bounding box\n", 235 | " \n", 236 | " \"\"\"\n", 237 | " class_id = []\n", 238 | " for val in names_of_classes:\n", 239 | " class_id.append(df_class_labels[df_class_labels['name']==val].values[0][0])\n", 240 | " df = pd.DataFrame(columns=['FileName', 'XMin', 'XMax', 'YMin', 'YMax', 'ClassName'])\n", 241 | " train_imgs = os.listdir(path)\n", 242 | " train_imgs = [name for name in train_imgs if not name.startswith('.')]\n", 243 | " for i in range(len(train_imgs)):\n", 244 | " sys.stdout.write('Parse train_imgs ' + str(i) + '; Number of boxes: ' + str(len(df)) + '\\r')\n", 245 | " sys.stdout.flush()\n", 246 | " img_name = train_imgs[i]\n", 247 | " img_id = img_name[0:16]\n", 248 | " tmp_df = df_annot_box[df_annot_box['ImageID']==img_id]\n", 249 | " for index,row in tmp_df.iterrows():\n", 250 | " labelname=row['LabelName']\n", 251 | " for val in range(len(names_of_classes)):\n", 252 | " if labelname == class_id[val]:\n", 253 | " df = df.append({'FileName': img_name, \n", 254 | " 'XMin': row['XMin'], \n", 255 | " 'XMax': row['XMax'], \n", 256 | " 'YMin': row['YMin'], \n", 257 | " 'YMax': row['YMax'], \n", 258 | " 'ClassName': names_of_classes[val]}, \n", 259 | " ignore_index=True)\n", 260 | " if save:\n", 261 | " df.to_csv('../createddata/{}_df.csv'.format(type_of_data))\n", 262 | " return df" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": null, 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "#Use case for preparing data from the google AI dataset\n", 272 | "base_path = '~/Object-detection-using-Faster-RCNN/createddata/'\n", 273 | "create_csv('Person',num_rows=2500, save=True, save_path = base_path)\n", 274 | "download_images(base_path + 'Person_1000.csv',base_path +'/images/', 'Person')\n", 275 | "split_train_test(base_path+'/images/', percentage_splot=.8,save_path_train='../createddata/train/',save_path_test= '../createddata/test')\n", 276 | "train_df = create_df_out_of_image_folders('../createddata/train/',names_of_classes=['Person'], save =True, type_of_data='train')\n", 277 | "test_df = create_df_out_of_image_folders('../createddata/test/',names_of_classes=['Person'], save =True, type_of_data='test')\n", 278 | "\n", 279 | "#Example of where to call in the csvs generated from 'create_df_out_of_image_folders' if working with them later\n", 280 | "# train_df = pd.read_csv('../createddata/train_df.csv')\n", 281 | "# train_df.drop('Unnamed: 
0',inplace=True,axis=1)\n", 282 | "\n", 283 | "# test_df = pd.read_csv('../createddata/test_df.csv')\n", 284 | "# test_df.drop('Unnamed: 0',inplace=True,axis=1)\n" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "# For training\n", 294 | "\n", 295 | "f= open('../createddata' + \"/train_annotation.txt\",\"w+\")\n", 296 | "for idx, row in train_df.iterrows():\n", 297 | "# sys.stdout.write(str(idx) + '\\r')\n", 298 | "# sys.stdout.flush()\n", 299 | " img = cv2.imread(('../createddata' + '/train/' + row['FileName']))\n", 300 | " plt.imshow(img)\n", 301 | " height, width = img.shape[:2]\n", 302 | " x1 = int(row['XMin'] * width)\n", 303 | " x2 = int(row['XMax'] * width)\n", 304 | " y1 = int(row['YMin'] * height)\n", 305 | " y2 = int(row['YMax'] * height)\n", 306 | " \n", 307 | " fileName = '/home/ubuntu/Object-detection-using-Faster-RCNN/createddata/train/' +row['FileName']\n", 308 | " className = row['ClassName']\n", 309 | " other_name = '/home/ubuntu/Object-detection-using-Faster-RCNN/createddata/train/' +row['FileName']\n", 310 | " f.write(other_name + ',' + str(x1) + ',' + str(y1) + ',' + str(x2) + ',' + str(y2) + ',' + className + '\\n')\n", 311 | "f.close()" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 17, 317 | "metadata": {}, 318 | "outputs": [], 319 | "source": [ 320 | "\n", 321 | "f= open('../createddata' + \"/test_annotation.txt\",\"w+\")\n", 322 | "for idx, row in test_df.iterrows():\n", 323 | "# sys.stdout.write(str(idx) + '\\r')\n", 324 | "# sys.stdout.flush()\n", 325 | " img = cv2.imread(('../createddata' + '/test/' + row['FileName']))\n", 326 | " height, width = img.shape[:2]\n", 327 | " x1 = int(row['XMin'] * width)\n", 328 | " x2 = int(row['XMax'] * width)\n", 329 | " y1 = int(row['YMin'] * height)\n", 330 | " y2 = int(row['YMax'] * height)\n", 331 | " \n", 332 | " fileName = '/Users/davidheller/Object-detection-using-Faster-RCNN/createddata/test/' +row['FileName']\n", 333 | " other_name = '/home/ubuntu/Object-detection-using-Faster-RCNN/createddata/test/' +row['FileName']\n", 334 | " className = row['ClassName']\n", 335 | " f.write(other_name + ',' + str(x1) + ',' + str(y1) + ',' + str(x2) + ',' + str(y2) + ',' + className + '\\n')\n", 336 | "f.close()\n", 337 | "\n", 338 | "\n", 339 | "\n", 340 | "\n", 341 | "\n" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": null, 347 | "metadata": {}, 348 | "outputs": [], 349 | "source": [] 350 | } 351 | ], 352 | "metadata": { 353 | "kernelspec": { 354 | "display_name": "Python 3", 355 | "language": "python", 356 | "name": "python3" 357 | }, 358 | "language_info": { 359 | "codemirror_mode": { 360 | "name": "ipython", 361 | "version": 3 362 | }, 363 | "file_extension": ".py", 364 | "mimetype": "text/x-python", 365 | "name": "python", 366 | "nbconvert_exporter": "python", 367 | "pygments_lexer": "ipython3", 368 | "version": "3.7.3" 369 | } 370 | }, 371 | "nbformat": 4, 372 | "nbformat_minor": 2 373 | } 374 | -------------------------------------------------------------------------------- /src/.ipynb_checkpoints/plswork-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 2 6 | } 7 | -------------------------------------------------------------------------------- /src/dataprep.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 18, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import time\n", 11 | "import sys\n", 12 | "import os\n", 13 | "import random\n", 14 | "from skimage import io\n", 15 | "import pandas as pd\n", 16 | "from matplotlib import pyplot as plt\n", 17 | "from shutil import copyfile\n", 18 | "\n", 19 | "import cv2\n", 20 | "import tensorflow as tf" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 19, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "base_path = '~/Object-detection-using-Faster-RCNN/data/'\n", 30 | "image_links = 'train-images-boxable.csv'\n", 31 | "annot_box = 'train-annotations-bbox.csv'\n", 32 | "class_labels = 'class-descriptions-boxable.csv'" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 20, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "df_image_links = pd.read_csv(base_path+image_links)" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 21, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "df_annot_box = pd.read_csv(base_path+annot_box)" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 22, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "df_class_labels = pd.read_csv(base_path+class_labels,header=None)\n", 60 | "df_class_labels.columns=['id','name']\n" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "def plot_org_img_with_boxes(image_name):\n", 70 | " \"\"\"\n", 71 | " Input: \n", 72 | " image_name(string) = the actual file name '6b5bfa4e9b0e767c.jpg'\n", 73 | " \n", 74 | " Return:\n", 75 | " One plot of the original image no bounding boxes, the other with bounding boxes.\n", 76 | " \"\"\"\n", 77 | " temp = df_image_links[df_image_links['image_name']==image_name]\n", 78 | " img_url = temp['image_url'].values[0]\n", 79 | " img_id = image_name[:16]\n", 80 | " \n", 81 | " img = io.imread(img_url)\n", 82 | " \n", 83 | " height, width, _ = img.shape\n", 84 | " plt.figure(figsize=(10,10))\n", 85 | " plt.subplot(1,2,1)\n", 86 | " plt.title('Original Image')\n", 87 | " plt.imshow(img)\n", 88 | " boxes = df_annot_box[df_annot_box['ImageID']==img_id]\n", 89 | " img_bbox = img.copy()\n", 90 | " for index, row in boxes.iterrows():\n", 91 | " xmin,xmax,ymin,ymax = row['XMin'],row['XMax'],row['YMin'],row['YMax']\n", 92 | " xmin,xmax,ymin,ymax = int(xmin*width),int(xmax*width),int(ymin*height),int(ymax*height)\n", 93 | " label_name = row['LabelName']\n", 94 | " \n", 95 | " temp_df = df_class_labels[df_class_labels['id']==label_name]\n", 96 | " class_of_box = temp_df['name'].values[0]\n", 97 | " \n", 98 | " cv2.rectangle(img_bbox,(xmin,ymin),(xmax,ymax),(255,255,0),2)\n", 99 | " font = cv2.FONT_HERSHEY_SIMPLEX\n", 100 | " cv2.putText(img_bbox,class_of_box,(xmin,ymin-10), font, 1,(255,255,0),2)\n", 101 | " plt.subplot(1,2,2)\n", 102 | " plt.title('Image with Bounding Box')\n", 103 | " plt.imshow(img_bbox)\n", 104 | " plt.show()\n", 105 | " " 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "def create_csv(which_class, num_rows,save=False,save_path ='~/Object-detection-using-Faster-RCNN/createddata/'):\n", 115 | " \"\"\"Input: \n", 116 | " which_class(str): 
one of 600 classes available in the dataset\n", 117 | " num_rows(int): how many rows you want your csv to be could potentially be less than inputted if not that many in data\n", 118 | " \n", 119 | " Returns: \n", 120 | " Df with num_rows randomly chosen number of rows\n", 121 | " if save==True: saves it as a csv in save_path + (which_class)_1000.csv\n", 122 | " \n", 123 | " \"\"\"\n", 124 | " class_id = df_class_labels[df_class_labels['name']==which_class].values[0][0] #collect the class_id value \n", 125 | " num_total_pics = df_annot_box[df_annot_box['LabelName']==class_id] #select all the annotation boxes with that id value\n", 126 | " print('Total amount of {} in data'.format(which_class))\n", 127 | " print(len(num_total_pics))\n", 128 | " \n", 129 | " print('Number of unique pictures featuring atleast one of {}'.format(which_class))\n", 130 | " num_unique_pics_of_class = np.unique(num_total_pics['ImageID']) #remove duplicate images from the df, \n", 131 | " #such as smooshing down a picture that has 2 birds to one value\n", 132 | " print(len(num_unique_pics_of_class))\n", 133 | " random_rows = np.random.choice(num_unique_pics_of_class,num_rows,replace=False)#randomly choose Num_rows\n", 134 | " array_append_jpg = [df_image_links[df_image_links['image_name']==name+'.jpg'] for name in random_rows]\n", 135 | " df = pd.DataFrame()\n", 136 | " for i in range(len(array_append_jpg)):\n", 137 | " df = df.append(array_append_jpg[i], ignore_index = True)\n", 138 | " if save:\n", 139 | " df.to_csv(save_path + '{}_{}.csv'.format(which_class,num_rows))\n", 140 | " \n", 141 | " return df\n", 142 | "\n" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": null, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "def download_images(csv_file_path, save_file_path, type_of_class):\n", 152 | " \"\"\"\n", 153 | " INPUT:\n", 154 | " csv_file_path(string) = path to where you saved the csv generated from create_csv\n", 155 | " save_file_path(string) = path to where you want to save all of the images\n", 156 | " type_of_class(string) = whichever class('Person', 'Bird', etc...)\n", 157 | " \n", 158 | " Returns: None\n", 159 | " \n", 160 | " Generates: A new folder with the name of type_of_class with all the images downloaded inside of it\n", 161 | " \"\"\"\n", 162 | " df = pd.read_csv(csv_file_path)\n", 163 | " if len(df.columns) >=3:\n", 164 | " df.drop('Unnamed: 0',axis=1,inplace=True)\n", 165 | " urls = df['image_url'].values\n", 166 | " directory = save_file_path + type_of_class\n", 167 | " os.mkdir(directory)\n", 168 | " for url in urls:\n", 169 | " img = io.imread(url)\n", 170 | " file_name=url[-20:]\n", 171 | " io.imsave(directory+'/'+file_name, img)\n", 172 | " \n", 173 | " " 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": null, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "train_path = '../createddata/train/'\n", 183 | "test_path = '../createddata/test/'\n", 184 | "# os.mkdir(train_path)\n", 185 | "# os.mkdir(test_path)\n", 186 | "def split_train_test(file_path_to_imgs, percentage_split=.8, save_path_train = '../createddata/train/',save_path_test = '../createddata/test/'):\n", 187 | " \"\"\"\n", 188 | " Input: \n", 189 | " file_path_to_imgs = file path to the image directory where all the downloaded images from download_images() are saved\n", 190 | " percentage_split(int between 0-1) = Default at .8 The percentage you want to be train images, remaining percent is test\n", 191 | " save_path_train = file path 
where you want to save the train images\n", 192 | " save_path_test = file path where you want to save the test images\n", 193 | " \n", 194 | " RETURNS: None\n", 195 | " \n", 196 | " Generates:\n", 197 | " Copied images into specificed train and test directorys\n", 198 | " \"\"\"\n", 199 | " imgs = os.listdir(file_path_to_imgs)\n", 200 | " imgs = [f for f in imgs if not f.startswith('.')]\n", 201 | " random.seed(1)\n", 202 | " random.shuffle(imgs)\n", 203 | " num_of_train_imgs = int(len(imgs)*percentage_split)\n", 204 | " num_of_test_imgs =len(imgs)-int(len(imgs)*(1-percentage_split))\n", 205 | " train_imgs = imgs[:num_of_train_imgs]\n", 206 | " test_imgs = imgs[num_of_test_imgs:]\n", 207 | " for val in train_imgs:\n", 208 | " img_loc = file_path_to_imgs + val\n", 209 | " save_loc = save_path_train+val\n", 210 | " copyfile(img_loc, save_loc)\n", 211 | " for val in test_imgs:\n", 212 | " img_loc = file_path_to_imgs + val\n", 213 | " save_loc = save_path_test+val\n", 214 | " copyfile(img_loc, save_loc)" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 49, 220 | "metadata": {}, 221 | "outputs": [], 222 | "source": [ 223 | "def create_df_out_of_image_folders(path,names_of_classes=[],save=False, type_of_data='train'):\n", 224 | " \"\"\"\n", 225 | " INPUTS:\n", 226 | " path = path to where you just stored your train or test pictures\n", 227 | " names_of_classes = A list with the classes you've chosen for your data ('Bird','Person,'Traffic light'...etc)\n", 228 | " save = True save a csv file you can load in with this information\n", 229 | " type_of_data = actually just used as a variable to name your saved csv \n", 230 | " ex:\n", 231 | " if type_of_data = 'train' file will save as 'train_df.csv'\n", 232 | " \n", 233 | " Returns:\n", 234 | " dataframe with format [FileName, XMIN, XMax, YMin, Ymax, ClassName] for each bounding box\n", 235 | " \n", 236 | " \"\"\"\n", 237 | " class_id = []\n", 238 | " for val in names_of_classes:\n", 239 | " class_id.append(df_class_labels[df_class_labels['name']==val].values[0][0])\n", 240 | " df = pd.DataFrame(columns=['FileName', 'XMin', 'XMax', 'YMin', 'YMax', 'ClassName'])\n", 241 | " train_imgs = os.listdir(path)\n", 242 | " train_imgs = [name for name in train_imgs if not name.startswith('.')]\n", 243 | " for i in range(len(train_imgs)):\n", 244 | " sys.stdout.write('Parse train_imgs ' + str(i) + '; Number of boxes: ' + str(len(df)) + '\\r')\n", 245 | " sys.stdout.flush()\n", 246 | " img_name = train_imgs[i]\n", 247 | " img_id = img_name[0:16]\n", 248 | " tmp_df = df_annot_box[df_annot_box['ImageID']==img_id]\n", 249 | " for index,row in tmp_df.iterrows():\n", 250 | " labelname=row['LabelName']\n", 251 | " for val in range(len(names_of_classes)):\n", 252 | " if labelname == class_id[val]:\n", 253 | " df = df.append({'FileName': img_name, \n", 254 | " 'XMin': row['XMin'], \n", 255 | " 'XMax': row['XMax'], \n", 256 | " 'YMin': row['YMin'], \n", 257 | " 'YMax': row['YMax'], \n", 258 | " 'ClassName': names_of_classes[val]}, \n", 259 | " ignore_index=True)\n", 260 | " if save:\n", 261 | " df.to_csv('../createddata/{}_df.csv'.format(type_of_data))\n", 262 | " return df" 263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": null, 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "#Use case for preparing data from the google AI dataset\n", 272 | "base_path = '~/Object-detection-using-Faster-RCNN/createddata/'\n", 273 | "create_csv('Person',num_rows=2500, save=True, save_path = base_path)\n", 274 | 
"download_images(base_path + 'Person_1000.csv',base_path +'/images/', 'Person')\n", 275 | "split_train_test(base_path+'/images/', percentage_splot=.8,save_path_train='../createddata/train/',save_path_test= '../createddata/test')\n", 276 | "train_df = create_df_out_of_image_folders('../createddata/train/',names_of_classes=['Person'], save =True, type_of_data='train')\n", 277 | "test_df = create_df_out_of_image_folders('../createddata/test/',names_of_classes=['Person'], save =True, type_of_data='test')\n", 278 | "\n", 279 | "#Example of where to call in the csvs generated from 'create_df_out_of_image_folders' if working with them later\n", 280 | "# train_df = pd.read_csv('../createddata/train_df.csv')\n", 281 | "# train_df.drop('Unnamed: 0',inplace=True,axis=1)\n", 282 | "\n", 283 | "# test_df = pd.read_csv('../createddata/test_df.csv')\n", 284 | "# test_df.drop('Unnamed: 0',inplace=True,axis=1)\n" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "# For training\n", 294 | "\n", 295 | "f= open('../createddata' + \"/train_annotation.txt\",\"w+\")\n", 296 | "for idx, row in train_df.iterrows():\n", 297 | "# sys.stdout.write(str(idx) + '\\r')\n", 298 | "# sys.stdout.flush()\n", 299 | " img = cv2.imread(('../createddata' + '/train/' + row['FileName']))\n", 300 | " plt.imshow(img)\n", 301 | " height, width = img.shape[:2]\n", 302 | " x1 = int(row['XMin'] * width)\n", 303 | " x2 = int(row['XMax'] * width)\n", 304 | " y1 = int(row['YMin'] * height)\n", 305 | " y2 = int(row['YMax'] * height)\n", 306 | " \n", 307 | " fileName = '/home/ubuntu/Object-detection-using-Faster-RCNN/createddata/train/' +row['FileName']\n", 308 | " className = row['ClassName']\n", 309 | " other_name = '/home/ubuntu/Object-detection-using-Faster-RCNN/createddata/train/' +row['FileName']\n", 310 | " f.write(other_name + ',' + str(x1) + ',' + str(y1) + ',' + str(x2) + ',' + str(y2) + ',' + className + '\\n')\n", 311 | "f.close()" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 17, 317 | "metadata": {}, 318 | "outputs": [], 319 | "source": [ 320 | "\n", 321 | "f= open('../createddata' + \"/test_annotation.txt\",\"w+\")\n", 322 | "for idx, row in test_df.iterrows():\n", 323 | "# sys.stdout.write(str(idx) + '\\r')\n", 324 | "# sys.stdout.flush()\n", 325 | " img = cv2.imread(('../createddata' + '/test/' + row['FileName']))\n", 326 | " height, width = img.shape[:2]\n", 327 | " x1 = int(row['XMin'] * width)\n", 328 | " x2 = int(row['XMax'] * width)\n", 329 | " y1 = int(row['YMin'] * height)\n", 330 | " y2 = int(row['YMax'] * height)\n", 331 | " \n", 332 | " fileName = '/Users/davidheller/Object-detection-using-Faster-RCNN/createddata/test/' +row['FileName']\n", 333 | " other_name = '/home/ubuntu/Object-detection-using-Faster-RCNN/createddata/test/' +row['FileName']\n", 334 | " className = row['ClassName']\n", 335 | " f.write(other_name + ',' + str(x1) + ',' + str(y1) + ',' + str(x2) + ',' + str(y2) + ',' + className + '\\n')\n", 336 | "f.close()\n", 337 | "\n", 338 | "\n", 339 | "\n", 340 | "\n", 341 | "\n" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": null, 347 | "metadata": {}, 348 | "outputs": [], 349 | "source": [] 350 | } 351 | ], 352 | "metadata": { 353 | "kernelspec": { 354 | "display_name": "Python 3", 355 | "language": "python", 356 | "name": "python3" 357 | }, 358 | "language_info": { 359 | "codemirror_mode": { 360 | "name": "ipython", 361 | "version": 3 362 | }, 363 
| "file_extension": ".py", 364 | "mimetype": "text/x-python", 365 | "name": "python", 366 | "nbconvert_exporter": "python", 367 | "pygments_lexer": "ipython3", 368 | "version": "3.7.3" 369 | } 370 | }, 371 | "nbformat": 4, 372 | "nbformat_minor": 2 373 | } 374 | -------------------------------------------------------------------------------- /src/training.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 32, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from __future__ import division\n", 10 | "from __future__ import print_function\n", 11 | "from __future__ import absolute_import\n", 12 | "import random\n", 13 | "import pprint\n", 14 | "import sys\n", 15 | "import time\n", 16 | "import numpy as np\n", 17 | "from optparse import OptionParser\n", 18 | "import pickle\n", 19 | "import math\n", 20 | "import cv2\n", 21 | "import copy\n", 22 | "import matplotlib as mpl\n", 23 | "mpl.use('Agg')\n", 24 | "from matplotlib import pyplot as plt\n", 25 | "import tensorflow as tf\n", 26 | "import pandas as pd\n", 27 | "import os\n", 28 | "\n", 29 | "from sklearn.metrics import average_precision_score\n", 30 | "\n", 31 | "from keras import backend as K\n", 32 | "from keras.optimizers import Adam, SGD, RMSprop\n", 33 | "from keras.layers import Flatten, Dense, Input, Conv2D, MaxPooling2D, Dropout\n", 34 | "from keras.layers import GlobalAveragePooling2D, GlobalMaxPooling2D, TimeDistributed\n", 35 | "from keras.engine.topology import get_source_inputs\n", 36 | "from keras.utils import layer_utils\n", 37 | "from keras.utils.data_utils import get_file\n", 38 | "from keras.objectives import categorical_crossentropy\n", 39 | "\n", 40 | "from keras.models import Model\n", 41 | "from keras.utils import generic_utils\n", 42 | "from keras.engine import Layer, InputSpec\n", 43 | "from keras import initializers, regularizers" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 3, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "class Config:\n", 53 | "\n", 54 | "\tdef __init__(self):\n", 55 | "\n", 56 | "\t\t# Print the process or not\n", 57 | "\t\tself.verbose = True\n", 58 | "\n", 59 | "\t\t# Name of base network\n", 60 | "\t\tself.network = 'vgg'\n", 61 | "\n", 62 | "\t\t# Setting for data augmentation\n", 63 | "\t\tself.use_horizontal_flips = False\n", 64 | "\t\tself.use_vertical_flips = False\n", 65 | "\t\tself.rot_90 = False\n", 66 | "\n", 67 | "\t\t# Anchor box scales\n", 68 | " # Note that if im_size is smaller, anchor_box_scales should be scaled\n", 69 | " # Original anchor_box_scales in the paper is [128, 256, 512]\n", 70 | "\t\tself.anchor_box_scales = [64, 128, 256] \n", 71 | "\n", 72 | "\t\t# Anchor box ratios\n", 73 | "\t\tself.anchor_box_ratios = [[1, 1], [1./math.sqrt(2), 2./math.sqrt(2)], [2./math.sqrt(2), 1./math.sqrt(2)]]\n", 74 | "\n", 75 | "\t\t# Size to resize the smallest side of the image\n", 76 | "\t\t# Original setting in paper is 600. 
Set to 300 in here to save training time\n", 77 | "\t\tself.im_size = 300\n", 78 | "\n", 79 | "\t\t# image channel-wise mean to subtract\n", 80 | "\t\tself.img_channel_mean = [103.939, 116.779, 123.68]\n", 81 | "\t\tself.img_scaling_factor = 1.0\n", 82 | "\n", 83 | "\t\t# number of ROIs at once\n", 84 | "\t\tself.num_rois = 4\n", 85 | "\n", 86 | "\t\t# stride at the RPN (this depends on the network configuration)\n", 87 | "\t\tself.rpn_stride = 16\n", 88 | "\n", 89 | "\t\tself.balanced_classes = False\n", 90 | "\n", 91 | "\t\t# scaling the stdev\n", 92 | "\t\tself.std_scaling = 4.0\n", 93 | "\t\tself.classifier_regr_std = [8.0, 8.0, 4.0, 4.0]\n", 94 | "\n", 95 | "\t\t# overlaps for RPN\n", 96 | "\t\tself.rpn_min_overlap = 0.3\n", 97 | "\t\tself.rpn_max_overlap = 0.7\n", 98 | "\n", 99 | "\t\t# overlaps for classifier ROIs\n", 100 | "\t\tself.classifier_min_overlap = 0.1\n", 101 | "\t\tself.classifier_max_overlap = 0.5\n", 102 | "\n", 103 | "\t\t# placeholder for the class mapping, automatically generated by the parser\n", 104 | "\t\tself.class_mapping = None\n", 105 | "\n", 106 | "\t\tself.model_path = None" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 4, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "def get_data(input_path):\n", 116 | "\t\"\"\"Parse the data from annotation file\n", 117 | "\t\n", 118 | "\tArgs:\n", 119 | "\t\tinput_path: annotation file path\n", 120 | "\n", 121 | "\tReturns:\n", 122 | "\t\tall_data: list(filepath, width, height, list(bboxes))\n", 123 | "\t\tclasses_count: dict{key:class_name, value:count_num} \n", 124 | "\t\t\te.g. {'Car': 2383, 'Mobile phone': 1108, 'Person': 3745}\n", 125 | "\t\tclass_mapping: dict{key:class_name, value: idx}\n", 126 | "\t\t\te.g. {'Car': 0, 'Mobile phone': 1, 'Person': 2}\n", 127 | "\t\"\"\"\n", 128 | "\tfound_bg = False\n", 129 | "\tall_imgs = {}\n", 130 | "\n", 131 | "\tclasses_count = {}\n", 132 | "\n", 133 | "\tclass_mapping = {}\n", 134 | "\n", 135 | "\tvisualise = True\n", 136 | "\n", 137 | "\ti = 1\n", 138 | "\t\n", 139 | "\twith open(input_path,'r') as f:\n", 140 | "\n", 141 | "\t\tprint('Parsing annotation files')\n", 142 | "\n", 143 | "\t\tfor line in f:\n", 144 | "\n", 145 | "\t\t\t# Print process\n", 146 | "\t\t\tsys.stdout.write('\\r'+'idx=' + str(i))\n", 147 | "\t\t\ti += 1\n", 148 | "\n", 149 | "\t\t\tline_split = line.strip().split(',')\n", 150 | "\n", 151 | "\t\t\t# Make sure the info saved in annotation file matching the format (path_filename, x1, y1, x2, y2, class_name)\n", 152 | "\t\t\t# Note:\n", 153 | "\t\t\t#\tOne path_filename might has several classes (class_name)\n", 154 | "\t\t\t#\tx1, y1, x2, y2 are the pixel value of the origial image, not the ratio value\n", 155 | "\t\t\t#\t(x1, y1) top left coordinates; (x2, y2) bottom right coordinates\n", 156 | "\t\t\t# x1,y1-------------------\n", 157 | "\t\t\t#\t|\t\t\t\t\t\t|\n", 158 | "\t\t\t#\t|\t\t\t\t\t\t|\n", 159 | "\t\t\t#\t|\t\t\t\t\t\t|\n", 160 | "\t\t\t#\t|\t\t\t\t\t\t|\n", 161 | "\t\t\t#\t---------------------x2,y2\n", 162 | "\n", 163 | "\t\t\t(filename,x1,y1,x2,y2,class_name) = line_split\n", 164 | "\n", 165 | "\t\t\tif class_name not in classes_count:\n", 166 | "\t\t\t\tclasses_count[class_name] = 1\n", 167 | "\t\t\telse:\n", 168 | "\t\t\t\tclasses_count[class_name] += 1\n", 169 | "\n", 170 | "\t\t\tif class_name not in class_mapping:\n", 171 | "\t\t\t\tif class_name == 'bg' and found_bg == False:\n", 172 | "\t\t\t\t\tprint('Found class name with special name bg. 
Will be treated as a background region (this is usually for hard negative mining).')\n", 173 | "\t\t\t\t\tfound_bg = True\n", 174 | "\t\t\t\tclass_mapping[class_name] = len(class_mapping)\n", 175 | "\n", 176 | "\t\t\tif filename not in all_imgs:\n", 177 | "\t\t\t\tall_imgs[filename] = {}\n", 178 | "\t\t\t\t\n", 179 | "\t\t\t\timg = cv2.imread(filename)\n", 180 | "\t\t\t\t(rows,cols) = img.shape[:2]\n", 181 | "\t\t\t\tall_imgs[filename]['filepath'] = filename\n", 182 | "\t\t\t\tall_imgs[filename]['width'] = cols\n", 183 | "\t\t\t\tall_imgs[filename]['height'] = rows\n", 184 | "\t\t\t\tall_imgs[filename]['bboxes'] = []\n", 185 | "\t\t\t\t# if np.random.randint(0,6) > 0:\n", 186 | "\t\t\t\t# \tall_imgs[filename]['imageset'] = 'trainval'\n", 187 | "\t\t\t\t# else:\n", 188 | "\t\t\t\t# \tall_imgs[filename]['imageset'] = 'test'\n", 189 | "\n", 190 | "\t\t\tall_imgs[filename]['bboxes'].append({'class': class_name, 'x1': int(x1), 'x2': int(x2), 'y1': int(y1), 'y2': int(y2)})\n", 191 | "\n", 192 | "\n", 193 | "\t\tall_data = []\n", 194 | "\t\tfor key in all_imgs:\n", 195 | "\t\t\tall_data.append(all_imgs[key])\n", 196 | "\t\t\n", 197 | "\t\t# make sure the bg class is last in the list\n", 198 | "\t\tif found_bg:\n", 199 | "\t\t\tif class_mapping['bg'] != len(class_mapping) - 1:\n", 200 | "\t\t\t\tkey_to_switch = [key for key in class_mapping.keys() if class_mapping[key] == len(class_mapping)-1][0]\n", 201 | "\t\t\t\tval_to_switch = class_mapping['bg']\n", 202 | "\t\t\t\tclass_mapping['bg'] = len(class_mapping) - 1\n", 203 | "\t\t\t\tclass_mapping[key_to_switch] = val_to_switch\n", 204 | "\t\t\n", 205 | "\t\treturn all_data, classes_count, class_mapping\n" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 5, 211 | "metadata": {}, 212 | "outputs": [], 213 | "source": [ 214 | "class RoiPoolingConv(Layer):\n", 215 | " '''ROI pooling layer for 2D inputs.\n", 216 | " See Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,\n", 217 | " K. He, X. Zhang, S. Ren, J. Sun\n", 218 | " # Arguments\n", 219 | " pool_size: int\n", 220 | " Size of pooling region to use. 
pool_size = 7 will result in a 7x7 region.\n", 221 | " num_rois: number of regions of interest to be used\n", 222 | " # Input shape\n", 223 | " list of two 4D tensors [X_img,X_roi] with shape:\n", 224 | " X_img:\n", 225 | " `(1, rows, cols, channels)`\n", 226 | " X_roi:\n", 227 | " `(1,num_rois,4)` list of rois, with ordering (x,y,w,h)\n", 228 | " # Output shape\n", 229 | " 3D tensor with shape:\n", 230 | " `(1, num_rois, channels, pool_size, pool_size)`\n", 231 | " '''\n", 232 | " def __init__(self, pool_size, num_rois, **kwargs):\n", 233 | "\n", 234 | " self.dim_ordering = K.image_dim_ordering()\n", 235 | " self.pool_size = pool_size\n", 236 | " self.num_rois = num_rois\n", 237 | "\n", 238 | " super(RoiPoolingConv, self).__init__(**kwargs)\n", 239 | "\n", 240 | " def build(self, input_shape):\n", 241 | " self.nb_channels = input_shape[0][3] \n", 242 | "\n", 243 | " def compute_output_shape(self, input_shape):\n", 244 | " return None, self.num_rois, self.pool_size, self.pool_size, self.nb_channels\n", 245 | "\n", 246 | " def call(self, x, mask=None):\n", 247 | "\n", 248 | " assert(len(x) == 2)\n", 249 | "\n", 250 | " # x[0] is image with shape (rows, cols, channels)\n", 251 | " img = x[0]\n", 252 | "\n", 253 | " # x[1] is roi with shape (num_rois,4) with ordering (x,y,w,h)\n", 254 | " rois = x[1]\n", 255 | "\n", 256 | " input_shape = K.shape(img)\n", 257 | "\n", 258 | " outputs = []\n", 259 | "\n", 260 | " for roi_idx in range(self.num_rois):\n", 261 | "\n", 262 | " x = rois[0, roi_idx, 0]\n", 263 | " y = rois[0, roi_idx, 1]\n", 264 | " w = rois[0, roi_idx, 2]\n", 265 | " h = rois[0, roi_idx, 3]\n", 266 | "\n", 267 | " x = K.cast(x, 'int32')\n", 268 | " y = K.cast(y, 'int32')\n", 269 | " w = K.cast(w, 'int32')\n", 270 | " h = K.cast(h, 'int32')\n", 271 | "\n", 272 | " # Resized roi of the image to pooling size (7x7)\n", 273 | " rs = tf.image.resize_images(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))\n", 274 | " outputs.append(rs)\n", 275 | " \n", 276 | "\n", 277 | " final_output = K.concatenate(outputs, axis=0)\n", 278 | "\n", 279 | " # Reshape to (1, num_rois, pool_size, pool_size, nb_channels)\n", 280 | " # Might be (1, 4, 7, 7, 3)\n", 281 | " final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))\n", 282 | "\n", 283 | " # permute_dimensions is similar to transpose\n", 284 | " final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))\n", 285 | "\n", 286 | " return final_output\n", 287 | " \n", 288 | " \n", 289 | " def get_config(self):\n", 290 | " config = {'pool_size': self.pool_size,\n", 291 | " 'num_rois': self.num_rois}\n", 292 | " base_config = super(RoiPoolingConv, self).get_config()\n", 293 | " return dict(list(base_config.items()) + list(config.items()))" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 6, 299 | "metadata": {}, 300 | "outputs": [], 301 | "source": [ 302 | "def get_img_output_length(width, height):\n", 303 | " def get_output_length(input_length):\n", 304 | " return input_length//16\n", 305 | "\n", 306 | " return get_output_length(width), get_output_length(height) \n", 307 | "\n", 308 | "def nn_base(input_tensor=None, trainable=False):\n", 309 | "\n", 310 | "\n", 311 | " input_shape = (None, None, 3)\n", 312 | "\n", 313 | " if input_tensor is None:\n", 314 | " img_input = Input(shape=input_shape)\n", 315 | " else:\n", 316 | " if not K.is_keras_tensor(input_tensor):\n", 317 | " img_input = Input(tensor=input_tensor, shape=input_shape)\n", 318 | " else:\n", 319 | 
" img_input = input_tensor\n", 320 | "\n", 321 | " bn_axis = 3\n", 322 | "\n", 323 | " # Block 1\n", 324 | " x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)\n", 325 | " x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)\n", 326 | " x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)\n", 327 | "\n", 328 | " # Block 2\n", 329 | " x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)\n", 330 | " x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)\n", 331 | " x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)\n", 332 | "\n", 333 | " # Block 3\n", 334 | " x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)\n", 335 | " x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)\n", 336 | " x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)\n", 337 | " x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)\n", 338 | "\n", 339 | " # Block 4\n", 340 | " x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)\n", 341 | " x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)\n", 342 | " x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)\n", 343 | " x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)\n", 344 | "\n", 345 | " # Block 5\n", 346 | " x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)\n", 347 | " x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)\n", 348 | " x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)\n", 349 | " # x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)\n", 350 | "\n", 351 | " return x" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 7, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "def rpn_layer(base_layers, num_anchors):\n", 361 | " \"\"\"Create a rpn layer\n", 362 | " Step1: Pass through the feature map from base layer to a 3x3 512 channels convolutional layer\n", 363 | " Keep the padding 'same' to preserve the feature map's size\n", 364 | " Step2: Pass the step1 to two (1,1) convolutional layer to replace the fully connected layer\n", 365 | " classification layer: num_anchors (9 in here) channels for 0, 1 sigmoid activation output\n", 366 | " regression layer: num_anchors*4 (36 in here) channels for computing the regression of bboxes with linear activation\n", 367 | " Args:\n", 368 | " base_layers: vgg in here\n", 369 | " num_anchors: 9 in here\n", 370 | "\n", 371 | " Returns:\n", 372 | " [x_class, x_regr, base_layers]\n", 373 | " x_class: classification for whether it's an object\n", 374 | " x_regr: bboxes regression\n", 375 | " base_layers: vgg in here\n", 376 | " \"\"\"\n", 377 | " x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)\n", 378 | "\n", 379 | " x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)\n", 380 | " x_regr = Conv2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer='zero', name='rpn_out_regress')(x)\n", 381 | "\n", 382 | " return [x_class, x_regr, base_layers]" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": 8, 388 | "metadata": {}, 389 | "outputs": [], 390 | "source": [ 
391 | "def classifier_layer(base_layers, input_rois, num_rois, nb_classes = 4):\n", 392 | " \"\"\"Create a classifier layer\n", 393 | " \n", 394 | " Args:\n", 395 | " base_layers: vgg\n", 396 | " input_rois: `(1,num_rois,4)` list of rois, with ordering (x,y,w,h)\n", 397 | " num_rois: number of rois to be processed in one time (4 in here)\n", 398 | "\n", 399 | " Returns:\n", 400 | " list(out_class, out_regr)\n", 401 | " out_class: classifier layer output\n", 402 | " out_regr: regression layer output\n", 403 | " \"\"\"\n", 404 | "\n", 405 | " input_shape = (num_rois,7,7,512)\n", 406 | "\n", 407 | " pooling_regions = 7\n", 408 | "\n", 409 | " # out_roi_pool.shape = (1, num_rois, channels, pool_size, pool_size)\n", 410 | " # num_rois (4) 7x7 roi pooling\n", 411 | " out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])\n", 412 | "\n", 413 | " # Flatten the convlutional layer and connected to 2 FC and 2 dropout\n", 414 | " out = TimeDistributed(Flatten(name='flatten'))(out_roi_pool)\n", 415 | " out = TimeDistributed(Dense(4096, activation='relu', name='fc1'))(out)\n", 416 | " out = TimeDistributed(Dropout(0.5))(out)\n", 417 | " out = TimeDistributed(Dense(4096, activation='relu', name='fc2'))(out)\n", 418 | " out = TimeDistributed(Dropout(0.5))(out)\n", 419 | "\n", 420 | " # There are two output layer\n", 421 | " # out_class: softmax acivation function for classify the class name of the object\n", 422 | " # out_regr: linear activation function for bboxes coordinates regression\n", 423 | " out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)\n", 424 | " # note: no regression target for bg class\n", 425 | " out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)\n", 426 | "\n", 427 | " return [out_class, out_regr]" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 9, 433 | "metadata": {}, 434 | "outputs": [], 435 | "source": [ 436 | "def union(au, bu, area_intersection):\n", 437 | "\tarea_a = (au[2] - au[0]) * (au[3] - au[1])\n", 438 | "\tarea_b = (bu[2] - bu[0]) * (bu[3] - bu[1])\n", 439 | "\tarea_union = area_a + area_b - area_intersection\n", 440 | "\treturn area_union\n", 441 | "\n", 442 | "\n", 443 | "def intersection(ai, bi):\n", 444 | "\tx = max(ai[0], bi[0])\n", 445 | "\ty = max(ai[1], bi[1])\n", 446 | "\tw = min(ai[2], bi[2]) - x\n", 447 | "\th = min(ai[3], bi[3]) - y\n", 448 | "\tif w < 0 or h < 0:\n", 449 | "\t\treturn 0\n", 450 | "\treturn w*h\n", 451 | "\n", 452 | "\n", 453 | "def iou(a, b):\n", 454 | "\t# a and b should be (x1,y1,x2,y2)\n", 455 | "\n", 456 | "\tif a[0] >= a[2] or a[1] >= a[3] or b[0] >= b[2] or b[1] >= b[3]:\n", 457 | "\t\treturn 0.0\n", 458 | "\n", 459 | "\tarea_i = intersection(a, b)\n", 460 | "\tarea_u = union(a, b, area_i)\n", 461 | "\n", 462 | "\treturn float(area_i) / float(area_u + 1e-6)" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": 10, 468 | "metadata": {}, 469 | "outputs": [], 470 | "source": [ 471 | "def calc_rpn(C, img_data, width, height, resized_width, resized_height, img_length_calc_function):\n", 472 | "\t\"\"\"(Important part!) 
Calculate the rpn for all anchors \n", 473 | "\t\tIf feature map has shape 38x50=1900, there are 1900x9=17100 potential anchors\n", 474 | "\t\n", 475 | "\tArgs:\n", 476 | "\t\tC: config\n", 477 | "\t\timg_data: augmented image data\n", 478 | "\t\twidth: original image width (e.g. 600)\n", 479 | "\t\theight: original image height (e.g. 800)\n", 480 | "\t\tresized_width: resized image width according to C.im_size (e.g. 300)\n", 481 | "\t\tresized_height: resized image height according to C.im_size (e.g. 400)\n", 482 | "\t\timg_length_calc_function: function to calculate final layer's feature map (of base model) size according to input image size\n", 483 | "\n", 484 | "\tReturns:\n", 485 | "\t\ty_rpn_cls: list(num_bboxes, y_is_box_valid + y_rpn_overlap)\n", 486 | "\t\t\ty_is_box_valid: 0 or 1 (0 means the box is invalid, 1 means the box is valid)\n", 487 | "\t\t\ty_rpn_overlap: 0 or 1 (0 means the box is not an object, 1 means the box is an object)\n", 488 | "\t\ty_rpn_regr: list(num_bboxes, 4*y_rpn_overlap + y_rpn_regr)\n", 489 | "\t\t\ty_rpn_regr: x1,y1,x2,y2 bunding boxes coordinates\n", 490 | "\t\"\"\"\n", 491 | "\tdownscale = float(C.rpn_stride) \n", 492 | "\tanchor_sizes = C.anchor_box_scales # 128, 256, 512\n", 493 | "\tanchor_ratios = C.anchor_box_ratios # 1:1, 1:2*sqrt(2), 2*sqrt(2):1\n", 494 | "\tnum_anchors = len(anchor_sizes) * len(anchor_ratios) # 3x3=9\n", 495 | "\n", 496 | "\t# calculate the output map size based on the network architecture\n", 497 | "\t(output_width, output_height) = img_length_calc_function(resized_width, resized_height)\n", 498 | "\n", 499 | "\tn_anchratios = len(anchor_ratios) # 3\n", 500 | "\t\n", 501 | "\t# initialise empty output objectives\n", 502 | "\ty_rpn_overlap = np.zeros((output_height, output_width, num_anchors))\n", 503 | "\ty_is_box_valid = np.zeros((output_height, output_width, num_anchors))\n", 504 | "\ty_rpn_regr = np.zeros((output_height, output_width, num_anchors * 4))\n", 505 | "\n", 506 | "\tnum_bboxes = len(img_data['bboxes'])\n", 507 | "\n", 508 | "\tnum_anchors_for_bbox = np.zeros(num_bboxes).astype(int)\n", 509 | "\tbest_anchor_for_bbox = -1*np.ones((num_bboxes, 4)).astype(int)\n", 510 | "\tbest_iou_for_bbox = np.zeros(num_bboxes).astype(np.float32)\n", 511 | "\tbest_x_for_bbox = np.zeros((num_bboxes, 4)).astype(int)\n", 512 | "\tbest_dx_for_bbox = np.zeros((num_bboxes, 4)).astype(np.float32)\n", 513 | "\n", 514 | "\t# get the GT box coordinates, and resize to account for image resizing\n", 515 | "\tgta = np.zeros((num_bboxes, 4))\n", 516 | "\tfor bbox_num, bbox in enumerate(img_data['bboxes']):\n", 517 | "\t\t# get the GT box coordinates, and resize to account for image resizing\n", 518 | "\t\tgta[bbox_num, 0] = bbox['x1'] * (resized_width / float(width))\n", 519 | "\t\tgta[bbox_num, 1] = bbox['x2'] * (resized_width / float(width))\n", 520 | "\t\tgta[bbox_num, 2] = bbox['y1'] * (resized_height / float(height))\n", 521 | "\t\tgta[bbox_num, 3] = bbox['y2'] * (resized_height / float(height))\n", 522 | "\t\n", 523 | "\t# rpn ground truth\n", 524 | "\n", 525 | "\tfor anchor_size_idx in range(len(anchor_sizes)):\n", 526 | "\t\tfor anchor_ratio_idx in range(n_anchratios):\n", 527 | "\t\t\tanchor_x = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][0]\n", 528 | "\t\t\tanchor_y = anchor_sizes[anchor_size_idx] * anchor_ratios[anchor_ratio_idx][1]\t\n", 529 | "\t\t\t\n", 530 | "\t\t\tfor ix in range(output_width):\t\t\t\t\t\n", 531 | "\t\t\t\t# x-coordinates of the current anchor box\t\n", 532 | "\t\t\t\tx1_anc = downscale * (ix 
+ 0.5) - anchor_x / 2\n", 533 | "\t\t\t\tx2_anc = downscale * (ix + 0.5) + anchor_x / 2\t\n", 534 | "\t\t\t\t\n", 535 | "\t\t\t\t# ignore boxes that go across image boundaries\t\t\t\t\t\n", 536 | "\t\t\t\tif x1_anc < 0 or x2_anc > resized_width:\n", 537 | "\t\t\t\t\tcontinue\n", 538 | "\t\t\t\t\t\n", 539 | "\t\t\t\tfor jy in range(output_height):\n", 540 | "\n", 541 | "\t\t\t\t\t# y-coordinates of the current anchor box\n", 542 | "\t\t\t\t\ty1_anc = downscale * (jy + 0.5) - anchor_y / 2\n", 543 | "\t\t\t\t\ty2_anc = downscale * (jy + 0.5) + anchor_y / 2\n", 544 | "\n", 545 | "\t\t\t\t\t# ignore boxes that go across image boundaries\n", 546 | "\t\t\t\t\tif y1_anc < 0 or y2_anc > resized_height:\n", 547 | "\t\t\t\t\t\tcontinue\n", 548 | "\n", 549 | "\t\t\t\t\t# bbox_type indicates whether an anchor should be a target\n", 550 | "\t\t\t\t\t# Initialize with 'negative'\n", 551 | "\t\t\t\t\tbbox_type = 'neg'\n", 552 | "\n", 553 | "\t\t\t\t\t# this is the best IOU for the (x,y) coord and the current anchor\n", 554 | "\t\t\t\t\t# note that this is different from the best IOU for a GT bbox\n", 555 | "\t\t\t\t\tbest_iou_for_loc = 0.0\n", 556 | "\n", 557 | "\t\t\t\t\tfor bbox_num in range(num_bboxes):\n", 558 | "\t\t\t\t\t\t\n", 559 | "\t\t\t\t\t\t# get IOU of the current GT box and the current anchor box\n", 560 | "\t\t\t\t\t\tcurr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2], gta[bbox_num, 1], gta[bbox_num, 3]], [x1_anc, y1_anc, x2_anc, y2_anc])\n", 561 | "\t\t\t\t\t\t# calculate the regression targets if they will be needed\n", 562 | "\t\t\t\t\t\tif curr_iou > best_iou_for_bbox[bbox_num] or curr_iou > C.rpn_max_overlap:\n", 563 | "\t\t\t\t\t\t\tcx = (gta[bbox_num, 0] + gta[bbox_num, 1]) / 2.0\n", 564 | "\t\t\t\t\t\t\tcy = (gta[bbox_num, 2] + gta[bbox_num, 3]) / 2.0\n", 565 | "\t\t\t\t\t\t\tcxa = (x1_anc + x2_anc)/2.0\n", 566 | "\t\t\t\t\t\t\tcya = (y1_anc + y2_anc)/2.0\n", 567 | "\n", 568 | "\t\t\t\t\t\t\t# x,y are the center point of ground-truth bbox\n", 569 | "\t\t\t\t\t\t\t# xa,ya are the center point of anchor bbox (xa=downscale * (ix + 0.5); ya=downscale * (iy+0.5))\n", 570 | "\t\t\t\t\t\t\t# w,h are the width and height of ground-truth bbox\n", 571 | "\t\t\t\t\t\t\t# wa,ha are the width and height of anchor bboxe\n", 572 | "\t\t\t\t\t\t\t# tx = (x - xa) / wa\n", 573 | "\t\t\t\t\t\t\t# ty = (y - ya) / ha\n", 574 | "\t\t\t\t\t\t\t# tw = log(w / wa)\n", 575 | "\t\t\t\t\t\t\t# th = log(h / ha)\n", 576 | "\t\t\t\t\t\t\ttx = (cx - cxa) / (x2_anc - x1_anc)\n", 577 | "\t\t\t\t\t\t\tty = (cy - cya) / (y2_anc - y1_anc)\n", 578 | "\t\t\t\t\t\t\ttw = np.log((gta[bbox_num, 1] - gta[bbox_num, 0]) / (x2_anc - x1_anc))\n", 579 | "\t\t\t\t\t\t\tth = np.log((gta[bbox_num, 3] - gta[bbox_num, 2]) / (y2_anc - y1_anc))\n", 580 | "\t\t\t\t\t\t\n", 581 | "\t\t\t\t\t\tif img_data['bboxes'][bbox_num]['class'] != 'bg':\n", 582 | "\n", 583 | "\t\t\t\t\t\t\t# all GT boxes should be mapped to an anchor box, so we keep track of which anchor box was best\n", 584 | "\t\t\t\t\t\t\tif curr_iou > best_iou_for_bbox[bbox_num]:\n", 585 | "\t\t\t\t\t\t\t\tbest_anchor_for_bbox[bbox_num] = [jy, ix, anchor_ratio_idx, anchor_size_idx]\n", 586 | "\t\t\t\t\t\t\t\tbest_iou_for_bbox[bbox_num] = curr_iou\n", 587 | "\t\t\t\t\t\t\t\tbest_x_for_bbox[bbox_num,:] = [x1_anc, x2_anc, y1_anc, y2_anc]\n", 588 | "\t\t\t\t\t\t\t\tbest_dx_for_bbox[bbox_num,:] = [tx, ty, tw, th]\n", 589 | "\n", 590 | "\t\t\t\t\t\t\t# we set the anchor to positive if the IOU is >0.7 (it does not matter if there was another better box, it just indicates overlap)\n", 
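"\t\t\t\t\t\t\t# (per the comments in this function, C.rpn_max_overlap is 0.7 and C.rpn_min_overlap is 0.3)\n", "\t\t\t\t\t\t\t# Worked example (hypothetical numbers) for the targets computed above: a GT box centred at\n", "\t\t\t\t\t\t\t# (150, 150) with width 100, against a 128-wide anchor centred at (160, 160), gives\n", "\t\t\t\t\t\t\t# tx = (150 - 160) / 128 = -0.078 and tw = log(100 / 128) = -0.247.\n",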
591 | "\t\t\t\t\t\t\tif curr_iou > C.rpn_max_overlap:\n", 592 | "\t\t\t\t\t\t\t\tbbox_type = 'pos'\n", 593 | "\t\t\t\t\t\t\t\tnum_anchors_for_bbox[bbox_num] += 1\n", 594 | "\t\t\t\t\t\t\t\t# we update the regression layer target if this IOU is the best for the current (x,y) and anchor position\n", 595 | "\t\t\t\t\t\t\t\tif curr_iou > best_iou_for_loc:\n", 596 | "\t\t\t\t\t\t\t\t\tbest_iou_for_loc = curr_iou\n", 597 | "\t\t\t\t\t\t\t\t\tbest_regr = (tx, ty, tw, th)\n", 598 | "\n", 599 | "\t\t\t\t\t\t\t# if the IOU is >0.3 and <0.7, it is ambiguous and no included in the objective\n", 600 | "\t\t\t\t\t\t\tif C.rpn_min_overlap < curr_iou < C.rpn_max_overlap:\n", 601 | "\t\t\t\t\t\t\t\t# gray zone between neg and pos\n", 602 | "\t\t\t\t\t\t\t\tif bbox_type != 'pos':\n", 603 | "\t\t\t\t\t\t\t\t\tbbox_type = 'neutral'\n", 604 | "\n", 605 | "\t\t\t\t\t# turn on or off outputs depending on IOUs\n", 606 | "\t\t\t\t\tif bbox_type == 'neg':\n", 607 | "\t\t\t\t\t\ty_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1\n", 608 | "\t\t\t\t\t\ty_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0\n", 609 | "\t\t\t\t\telif bbox_type == 'neutral':\n", 610 | "\t\t\t\t\t\ty_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0\n", 611 | "\t\t\t\t\t\ty_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 0\n", 612 | "\t\t\t\t\telif bbox_type == 'pos':\n", 613 | "\t\t\t\t\t\ty_is_box_valid[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1\n", 614 | "\t\t\t\t\t\ty_rpn_overlap[jy, ix, anchor_ratio_idx + n_anchratios * anchor_size_idx] = 1\n", 615 | "\t\t\t\t\t\tstart = 4 * (anchor_ratio_idx + n_anchratios * anchor_size_idx)\n", 616 | "\t\t\t\t\t\ty_rpn_regr[jy, ix, start:start+4] = best_regr\n", 617 | "\n", 618 | "\t# we ensure that every bbox has at least one positive RPN region\n", 619 | "\n", 620 | "\tfor idx in range(num_anchors_for_bbox.shape[0]):\n", 621 | "\t\tif num_anchors_for_bbox[idx] == 0:\n", 622 | "\t\t\t# no box with an IOU greater than zero ...\n", 623 | "\t\t\tif best_anchor_for_bbox[idx, 0] == -1:\n", 624 | "\t\t\t\tcontinue\n", 625 | "\t\t\ty_is_box_valid[\n", 626 | "\t\t\t\tbest_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], best_anchor_for_bbox[idx,2] + n_anchratios *\n", 627 | "\t\t\t\tbest_anchor_for_bbox[idx,3]] = 1\n", 628 | "\t\t\ty_rpn_overlap[\n", 629 | "\t\t\t\tbest_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], best_anchor_for_bbox[idx,2] + n_anchratios *\n", 630 | "\t\t\t\tbest_anchor_for_bbox[idx,3]] = 1\n", 631 | "\t\t\tstart = 4 * (best_anchor_for_bbox[idx,2] + n_anchratios * best_anchor_for_bbox[idx,3])\n", 632 | "\t\t\ty_rpn_regr[\n", 633 | "\t\t\t\tbest_anchor_for_bbox[idx,0], best_anchor_for_bbox[idx,1], start:start+4] = best_dx_for_bbox[idx, :]\n", 634 | "\n", 635 | "\ty_rpn_overlap = np.transpose(y_rpn_overlap, (2, 0, 1))\n", 636 | "\ty_rpn_overlap = np.expand_dims(y_rpn_overlap, axis=0)\n", 637 | "\n", 638 | "\ty_is_box_valid = np.transpose(y_is_box_valid, (2, 0, 1))\n", 639 | "\ty_is_box_valid = np.expand_dims(y_is_box_valid, axis=0)\n", 640 | "\n", 641 | "\ty_rpn_regr = np.transpose(y_rpn_regr, (2, 0, 1))\n", 642 | "\ty_rpn_regr = np.expand_dims(y_rpn_regr, axis=0)\n", 643 | "\n", 644 | "\tpos_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 1, y_is_box_valid[0, :, :, :] == 1))\n", 645 | "\tneg_locs = np.where(np.logical_and(y_rpn_overlap[0, :, :, :] == 0, y_is_box_valid[0, :, :, :] == 1))\n", 646 | "\n", 647 | "\tnum_pos = len(pos_locs[0])\n", 648 
| "\n", 649 | "\t# one issue is that the RPN has many more negative than positive regions, so we turn off some of the negative\n", 650 | "\t# regions. We also limit it to 256 regions.\n", 651 | "\tnum_regions = 256\n", 652 | "\n", 653 | "\tif len(pos_locs[0]) > num_regions/2:\n", 654 | "\t\tval_locs = random.sample(range(len(pos_locs[0])), len(pos_locs[0]) - num_regions/2)\n", 655 | "\t\ty_is_box_valid[0, pos_locs[0][val_locs], pos_locs[1][val_locs], pos_locs[2][val_locs]] = 0\n", 656 | "\t\tnum_pos = num_regions/2\n", 657 | "\n", 658 | "\tif len(neg_locs[0]) + num_pos > num_regions:\n", 659 | "\t\tval_locs = random.sample(range(len(neg_locs[0])), len(neg_locs[0]) - num_pos)\n", 660 | "\t\ty_is_box_valid[0, neg_locs[0][val_locs], neg_locs[1][val_locs], neg_locs[2][val_locs]] = 0\n", 661 | "\n", 662 | "\ty_rpn_cls = np.concatenate([y_is_box_valid, y_rpn_overlap], axis=1)\n", 663 | "\ty_rpn_regr = np.concatenate([np.repeat(y_rpn_overlap, 4, axis=1), y_rpn_regr], axis=1)\n", 664 | "\n", 665 | "\treturn np.copy(y_rpn_cls), np.copy(y_rpn_regr), num_pos" 666 | ] 667 | }, 668 | { 669 | "cell_type": "code", 670 | "execution_count": 11, 671 | "metadata": {}, 672 | "outputs": [], 673 | "source": [ 674 | "def get_new_img_size(width, height, img_min_side=300):\n", 675 | "\tif width <= height:\n", 676 | "\t\tf = float(img_min_side) / width\n", 677 | "\t\tresized_height = int(f * height)\n", 678 | "\t\tresized_width = img_min_side\n", 679 | "\telse:\n", 680 | "\t\tf = float(img_min_side) / height\n", 681 | "\t\tresized_width = int(f * width)\n", 682 | "\t\tresized_height = img_min_side\n", 683 | "\n", 684 | "\treturn resized_width, resized_height\n", 685 | "\n", 686 | "def augment(img_data, config, augment=True):\n", 687 | "\tassert 'filepath' in img_data\n", 688 | "\tassert 'bboxes' in img_data\n", 689 | "\tassert 'width' in img_data\n", 690 | "\tassert 'height' in img_data\n", 691 | "\n", 692 | "\timg_data_aug = copy.deepcopy(img_data)\n", 693 | "\n", 694 | "\timg = cv2.imread(img_data_aug['filepath'])\n", 695 | "\n", 696 | "\tif augment:\n", 697 | "\t\trows, cols = img.shape[:2]\n", 698 | "\n", 699 | "\t\tif config.use_horizontal_flips and np.random.randint(0, 2) == 0:\n", 700 | "\t\t\timg = cv2.flip(img, 1)\n", 701 | "\t\t\tfor bbox in img_data_aug['bboxes']:\n", 702 | "\t\t\t\tx1 = bbox['x1']\n", 703 | "\t\t\t\tx2 = bbox['x2']\n", 704 | "\t\t\t\tbbox['x2'] = cols - x1\n", 705 | "\t\t\t\tbbox['x1'] = cols - x2\n", 706 | "\n", 707 | "\t\tif config.use_vertical_flips and np.random.randint(0, 2) == 0:\n", 708 | "\t\t\timg = cv2.flip(img, 0)\n", 709 | "\t\t\tfor bbox in img_data_aug['bboxes']:\n", 710 | "\t\t\t\ty1 = bbox['y1']\n", 711 | "\t\t\t\ty2 = bbox['y2']\n", 712 | "\t\t\t\tbbox['y2'] = rows - y1\n", 713 | "\t\t\t\tbbox['y1'] = rows - y2\n", 714 | "\n", 715 | "\t\tif config.rot_90:\n", 716 | "\t\t\tangle = np.random.choice([0,90,180,270],1)[0]\n", 717 | "\t\t\tif angle == 270:\n", 718 | "\t\t\t\timg = np.transpose(img, (1,0,2))\n", 719 | "\t\t\t\timg = cv2.flip(img, 0)\n", 720 | "\t\t\telif angle == 180:\n", 721 | "\t\t\t\timg = cv2.flip(img, -1)\n", 722 | "\t\t\telif angle == 90:\n", 723 | "\t\t\t\timg = np.transpose(img, (1,0,2))\n", 724 | "\t\t\t\timg = cv2.flip(img, 1)\n", 725 | "\t\t\telif angle == 0:\n", 726 | "\t\t\t\tpass\n", 727 | "\n", 728 | "\t\t\tfor bbox in img_data_aug['bboxes']:\n", 729 | "\t\t\t\tx1 = bbox['x1']\n", 730 | "\t\t\t\tx2 = bbox['x2']\n", 731 | "\t\t\t\ty1 = bbox['y1']\n", 732 | "\t\t\t\ty2 = bbox['y2']\n", 733 | "\t\t\t\tif angle == 270:\n", 734 | 
"\t\t\t\t\tbbox['x1'] = y1\n", 735 | "\t\t\t\t\tbbox['x2'] = y2\n", 736 | "\t\t\t\t\tbbox['y1'] = cols - x2\n", 737 | "\t\t\t\t\tbbox['y2'] = cols - x1\n", 738 | "\t\t\t\telif angle == 180:\n", 739 | "\t\t\t\t\tbbox['x2'] = cols - x1\n", 740 | "\t\t\t\t\tbbox['x1'] = cols - x2\n", 741 | "\t\t\t\t\tbbox['y2'] = rows - y1\n", 742 | "\t\t\t\t\tbbox['y1'] = rows - y2\n", 743 | "\t\t\t\telif angle == 90:\n", 744 | "\t\t\t\t\tbbox['x1'] = rows - y2\n", 745 | "\t\t\t\t\tbbox['x2'] = rows - y1\n", 746 | "\t\t\t\t\tbbox['y1'] = x1\n", 747 | "\t\t\t\t\tbbox['y2'] = x2 \n", 748 | "\t\t\t\telif angle == 0:\n", 749 | "\t\t\t\t\tpass\n", 750 | "\n", 751 | "\timg_data_aug['width'] = img.shape[1]\n", 752 | "\timg_data_aug['height'] = img.shape[0]\n", 753 | "\treturn img_data_aug, img" 754 | ] 755 | }, 756 | { 757 | "cell_type": "code", 758 | "execution_count": 12, 759 | "metadata": {}, 760 | "outputs": [], 761 | "source": [ 762 | "def get_anchor_gt(all_img_data, C, img_length_calc_function, mode='train'):\n", 763 | "\t\"\"\" Yield the ground-truth anchors as Y (labels)\n", 764 | "\t\t\n", 765 | "\tArgs:\n", 766 | "\t\tall_img_data: list(filepath, width, height, list(bboxes))\n", 767 | "\t\tC: config\n", 768 | "\t\timg_length_calc_function: function to calculate final layer's feature map (of base model) size according to input image size\n", 769 | "\t\tmode: 'train' or 'test'; 'train' mode need augmentation\n", 770 | "\n", 771 | "\tReturns:\n", 772 | "\t\tx_img: image data after resized and scaling (smallest size = 300px)\n", 773 | "\t\tY: [y_rpn_cls, y_rpn_regr]\n", 774 | "\t\timg_data_aug: augmented image data (original image with augmentation)\n", 775 | "\t\tdebug_img: show image for debug\n", 776 | "\t\tnum_pos: show number of positive anchors for debug\n", 777 | "\t\"\"\"\n", 778 | "\twhile True:\n", 779 | "\n", 780 | "\t\tfor img_data in all_img_data:\n", 781 | "\t\t\ttry:\n", 782 | "\n", 783 | "\t\t\t\t# read in image, and optionally add augmentation\n", 784 | "\n", 785 | "\t\t\t\tif mode == 'train':\n", 786 | "\t\t\t\t\timg_data_aug, x_img = augment(img_data, C, augment=True)\n", 787 | "\t\t\t\telse:\n", 788 | "\t\t\t\t\timg_data_aug, x_img = augment(img_data, C, augment=False)\n", 789 | "\n", 790 | "\t\t\t\t(width, height) = (img_data_aug['width'], img_data_aug['height'])\n", 791 | "\t\t\t\t(rows, cols, _) = x_img.shape\n", 792 | "\n", 793 | "\t\t\t\tassert cols == width\n", 794 | "\t\t\t\tassert rows == height\n", 795 | "\n", 796 | "\t\t\t\t# get image dimensions for resizing\n", 797 | "\t\t\t\t(resized_width, resized_height) = get_new_img_size(width, height, C.im_size)\n", 798 | "\n", 799 | "\t\t\t\t# resize the image so that smalles side is length = 300px\n", 800 | "\t\t\t\tx_img = cv2.resize(x_img, (resized_width, resized_height), interpolation=cv2.INTER_CUBIC)\n", 801 | "\t\t\t\tdebug_img = x_img.copy()\n", 802 | "\n", 803 | "\t\t\t\ttry:\n", 804 | "\t\t\t\t\ty_rpn_cls, y_rpn_regr, num_pos = calc_rpn(C, img_data_aug, width, height, resized_width, resized_height, img_length_calc_function)\n", 805 | "\t\t\t\texcept:\n", 806 | "\t\t\t\t\tcontinue\n", 807 | "\n", 808 | "\t\t\t\t# Zero-center by mean pixel, and preprocess image\n", 809 | "\n", 810 | "\t\t\t\tx_img = x_img[:,:, (2, 1, 0)] # BGR -> RGB\n", 811 | "\t\t\t\tx_img = x_img.astype(np.float32)\n", 812 | "\t\t\t\tx_img[:, :, 0] -= C.img_channel_mean[0]\n", 813 | "\t\t\t\tx_img[:, :, 1] -= C.img_channel_mean[1]\n", 814 | "\t\t\t\tx_img[:, :, 2] -= C.img_channel_mean[2]\n", 815 | "\t\t\t\tx_img /= C.img_scaling_factor\n", 816 | "\n", 817 | 
"\t\t\t\tx_img = np.transpose(x_img, (2, 0, 1))\n", 818 | "\t\t\t\tx_img = np.expand_dims(x_img, axis=0)\n", 819 | "\n", 820 | "\t\t\t\ty_rpn_regr[:, y_rpn_regr.shape[1]//2:, :, :] *= C.std_scaling\n", 821 | "\n", 822 | "\t\t\t\tx_img = np.transpose(x_img, (0, 2, 3, 1))\n", 823 | "\t\t\t\ty_rpn_cls = np.transpose(y_rpn_cls, (0, 2, 3, 1))\n", 824 | "\t\t\t\ty_rpn_regr = np.transpose(y_rpn_regr, (0, 2, 3, 1))\n", 825 | "\n", 826 | "\t\t\t\tyield np.copy(x_img), [np.copy(y_rpn_cls), np.copy(y_rpn_regr)], img_data_aug, debug_img, num_pos\n", 827 | "\n", 828 | "\t\t\texcept Exception as e:\n", 829 | "\t\t\t\tprint(e)\n", 830 | "\t\t\t\tcontinue" 831 | ] 832 | }, 833 | { 834 | "cell_type": "code", 835 | "execution_count": 13, 836 | "metadata": {}, 837 | "outputs": [], 838 | "source": [ 839 | "lambda_rpn_regr = 1.0\n", 840 | "lambda_rpn_class = 1.0\n", 841 | "\n", 842 | "lambda_cls_regr = 1.0\n", 843 | "lambda_cls_class = 1.0\n", 844 | "\n", 845 | "epsilon = 1e-4" 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": 14, 851 | "metadata": {}, 852 | "outputs": [], 853 | "source": [ 854 | "def rpn_loss_regr(num_anchors):\n", 855 | " \"\"\"Loss function for rpn regression\n", 856 | " Args:\n", 857 | " num_anchors: number of anchors (9 in here)\n", 858 | " Returns:\n", 859 | " Smooth L1 loss function \n", 860 | " 0.5*x*x (if x_abs < 1)\n", 861 | " x_abx - 0.5 (otherwise)\n", 862 | " \"\"\"\n", 863 | " def rpn_loss_regr_fixed_num(y_true, y_pred):\n", 864 | "\n", 865 | " # x is the difference between true value and predicted vaue\n", 866 | " x = y_true[:, :, :, 4 * num_anchors:] - y_pred\n", 867 | "\n", 868 | " # absolute value of x\n", 869 | " x_abs = K.abs(x)\n", 870 | "\n", 871 | " # If x_abs <= 1.0, x_bool = 1\n", 872 | " x_bool = K.cast(K.less_equal(x_abs, 1.0), tf.float32)\n", 873 | "\n", 874 | " return lambda_rpn_regr * K.sum(\n", 875 | " y_true[:, :, :, :4 * num_anchors] * (x_bool * (0.5 * x * x) + (1 - x_bool) * (x_abs - 0.5))) / K.sum(epsilon + y_true[:, :, :, :4 * num_anchors])\n", 876 | "\n", 877 | " return rpn_loss_regr_fixed_num\n", 878 | "\n", 879 | "\n", 880 | "def rpn_loss_cls(num_anchors):\n", 881 | " \"\"\"Loss function for rpn classification\n", 882 | " Args:\n", 883 | " num_anchors: number of anchors (9 in here)\n", 884 | " y_true[:, :, :, :9]: [0,1,0,0,0,0,0,1,0] means only the second and the eighth box is valid which contains pos or neg anchor => isValid\n", 885 | " y_true[:, :, :, 9:]: [0,1,0,0,0,0,0,0,0] means the second box is pos and eighth box is negative\n", 886 | " Returns:\n", 887 | " lambda * sum((binary_crossentropy(isValid*y_pred,y_true))) / N\n", 888 | " \"\"\"\n", 889 | " def rpn_loss_cls_fixed_num(y_true, y_pred):\n", 890 | "\n", 891 | " return lambda_rpn_class * K.sum(y_true[:, :, :, :num_anchors] * K.binary_crossentropy(y_pred[:, :, :, :], y_true[:, :, :, num_anchors:])) / K.sum(epsilon + y_true[:, :, :, :num_anchors])\n", 892 | "\n", 893 | " return rpn_loss_cls_fixed_num\n", 894 | "\n", 895 | "\n", 896 | "def class_loss_regr(num_classes):\n", 897 | " \"\"\"Loss function for rpn regression\n", 898 | " Args:\n", 899 | " num_anchors: number of anchors (9 in here)\n", 900 | " Returns:\n", 901 | " Smooth L1 loss function \n", 902 | " 0.5*x*x (if x_abs < 1)\n", 903 | " x_abx - 0.5 (otherwise)\n", 904 | " \"\"\"\n", 905 | " def class_loss_regr_fixed_num(y_true, y_pred):\n", 906 | " x = y_true[:, :, 4*num_classes:] - y_pred\n", 907 | " x_abs = K.abs(x)\n", 908 | " x_bool = K.cast(K.less_equal(x_abs, 1.0), 'float32')\n", 909 | " return 
lambda_cls_regr * K.sum(y_true[:, :, :4*num_classes] * (x_bool * (0.5 * x * x) + (1 - x_bool) * (x_abs - 0.5))) / K.sum(epsilon + y_true[:, :, :4*num_classes])\n", 910 | " return class_loss_regr_fixed_num\n", 911 | "\n", 912 | "\n", 913 | "def class_loss_cls(y_true, y_pred):\n", 914 | " return lambda_cls_class * K.mean(categorical_crossentropy(y_true[0, :, :], y_pred[0, :, :]))" 915 | ] 916 | }, 917 | { 918 | "cell_type": "code", 919 | "execution_count": 15, 920 | "metadata": {}, 921 | "outputs": [], 922 | "source": [ 923 | "def non_max_suppression_fast(boxes, probs, overlap_thresh=0.9, max_boxes=300):\n", 924 | " # code used from here: http://www.pyimagesearch.com/2015/02/16/faster-non-maximum-suppression-python/\n", 925 | " # if there are no boxes, return an empty list\n", 926 | "\n", 927 | " # Process explanation:\n", 928 | " # Step 1: Sort the probs list\n", 929 | " # Step 2: Find the larget prob 'Last' in the list and save it to the pick list\n", 930 | " # Step 3: Calculate the IoU with 'Last' box and other boxes in the list. If the IoU is larger than overlap_threshold, delete the box from list\n", 931 | " # Step 4: Repeat step 2 and step 3 until there is no item in the probs list \n", 932 | " if len(boxes) == 0:\n", 933 | " return []\n", 934 | "\n", 935 | " # grab the coordinates of the bounding boxes\n", 936 | " x1 = boxes[:, 0]\n", 937 | " y1 = boxes[:, 1]\n", 938 | " x2 = boxes[:, 2]\n", 939 | " y2 = boxes[:, 3]\n", 940 | "\n", 941 | " np.testing.assert_array_less(x1, x2)\n", 942 | " np.testing.assert_array_less(y1, y2)\n", 943 | "\n", 944 | " # if the bounding boxes integers, convert them to floats --\n", 945 | " # this is important since we'll be doing a bunch of divisions\n", 946 | " if boxes.dtype.kind == \"i\":\n", 947 | " boxes = boxes.astype(\"float\")\n", 948 | "\n", 949 | " # initialize the list of picked indexes\t\n", 950 | " pick = []\n", 951 | "\n", 952 | " # calculate the areas\n", 953 | " area = (x2 - x1) * (y2 - y1)\n", 954 | "\n", 955 | " # sort the bounding boxes \n", 956 | " idxs = np.argsort(probs)\n", 957 | "\n", 958 | " # keep looping while some indexes still remain in the indexes\n", 959 | " # list\n", 960 | " while len(idxs) > 0:\n", 961 | " # grab the last index in the indexes list and add the\n", 962 | " # index value to the list of picked indexes\n", 963 | " last = len(idxs) - 1\n", 964 | " i = idxs[last]\n", 965 | " pick.append(i)\n", 966 | "\n", 967 | " # find the intersection\n", 968 | "\n", 969 | " xx1_int = np.maximum(x1[i], x1[idxs[:last]])\n", 970 | " yy1_int = np.maximum(y1[i], y1[idxs[:last]])\n", 971 | " xx2_int = np.minimum(x2[i], x2[idxs[:last]])\n", 972 | " yy2_int = np.minimum(y2[i], y2[idxs[:last]])\n", 973 | "\n", 974 | " ww_int = np.maximum(0, xx2_int - xx1_int)\n", 975 | " hh_int = np.maximum(0, yy2_int - yy1_int)\n", 976 | "\n", 977 | " area_int = ww_int * hh_int\n", 978 | "\n", 979 | " # find the union\n", 980 | " area_union = area[i] + area[idxs[:last]] - area_int\n", 981 | "\n", 982 | " # compute the ratio of overlap\n", 983 | " overlap = area_int/(area_union + 1e-6)\n", 984 | "\n", 985 | " # delete all indexes from the index list that have\n", 986 | " idxs = np.delete(idxs, np.concatenate(([last],\n", 987 | " np.where(overlap > overlap_thresh)[0])))\n", 988 | "\n", 989 | " if len(pick) >= max_boxes:\n", 990 | " break\n", 991 | "\n", 992 | " # return only the bounding boxes that were picked using the integer data type\n", 993 | " boxes = boxes[pick].astype(\"int\")\n", 994 | " probs = probs[pick]\n", 995 | " return boxes, probs\n", 
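"\n", "# A small sanity check with hypothetical numbers: two heavily overlapping boxes plus one separate\n", "# box should collapse to two picks, keeping the higher-probability box of the overlapping pair:\n", "#   boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]])\n", "#   probs = np.array([0.9, 0.8, 0.7])\n", "#   non_max_suppression_fast(boxes, probs, overlap_thresh=0.5)  # picks boxes 0 and 2\n",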
996 | "\n", 997 | "def apply_regr_np(X, T):\n", 998 | " \"\"\"Apply regression layer to all anchors in one feature map\n", 999 | "\n", 1000 | " Args:\n", 1001 | " X: shape=(4, 18, 25) the current anchor type for all points in the feature map\n", 1002 | " T: regression layer shape=(4, 18, 25)\n", 1003 | "\n", 1004 | " Returns:\n", 1005 | " X: regressed position and size for current anchor\n", 1006 | " \"\"\"\n", 1007 | " try:\n", 1008 | " x = X[0, :, :]\n", 1009 | " y = X[1, :, :]\n", 1010 | " w = X[2, :, :]\n", 1011 | " h = X[3, :, :]\n", 1012 | "\n", 1013 | " tx = T[0, :, :]\n", 1014 | " ty = T[1, :, :]\n", 1015 | " tw = T[2, :, :]\n", 1016 | " th = T[3, :, :]\n", 1017 | "\n", 1018 | " cx = x + w/2.\n", 1019 | " cy = y + h/2.\n", 1020 | " cx1 = tx * w + cx\n", 1021 | " cy1 = ty * h + cy\n", 1022 | "\n", 1023 | " w1 = np.exp(tw.astype(np.float64)) * w\n", 1024 | " h1 = np.exp(th.astype(np.float64)) * h\n", 1025 | " x1 = cx1 - w1/2.\n", 1026 | " y1 = cy1 - h1/2.\n", 1027 | "\n", 1028 | " x1 = np.round(x1)\n", 1029 | " y1 = np.round(y1)\n", 1030 | " w1 = np.round(w1)\n", 1031 | " h1 = np.round(h1)\n", 1032 | " return np.stack([x1, y1, w1, h1])\n", 1033 | " except Exception as e:\n", 1034 | " print(e)\n", 1035 | " return X\n", 1036 | " \n", 1037 | "def apply_regr(x, y, w, h, tx, ty, tw, th):\n", 1038 | " # Apply regression to x, y, w and h\n", 1039 | " try:\n", 1040 | " cx = x + w/2.\n", 1041 | " cy = y + h/2.\n", 1042 | " cx1 = tx * w + cx\n", 1043 | " cy1 = ty * h + cy\n", 1044 | " w1 = math.exp(tw) * w\n", 1045 | " h1 = math.exp(th) * h\n", 1046 | " x1 = cx1 - w1/2.\n", 1047 | " y1 = cy1 - h1/2.\n", 1048 | " x1 = int(round(x1))\n", 1049 | " y1 = int(round(y1))\n", 1050 | " w1 = int(round(w1))\n", 1051 | " h1 = int(round(h1))\n", 1052 | "\n", 1053 | " return x1, y1, w1, h1\n", 1054 | "\n", 1055 | " except ValueError:\n", 1056 | " return x, y, w, h\n", 1057 | " except OverflowError:\n", 1058 | " return x, y, w, h\n", 1059 | " except Exception as e:\n", 1060 | " print(e)\n", 1061 | " return x, y, w, h\n", 1062 | "\n", 1063 | "def calc_iou(R, img_data, C, class_mapping):\n", 1064 | " \"\"\"Converts from (x1,y1,x2,y2) to (x,y,w,h) format\n", 1065 | "\n", 1066 | " Args:\n", 1067 | " R: bboxes, probs\n", 1068 | " \"\"\"\n", 1069 | " bboxes = img_data['bboxes']\n", 1070 | " (width, height) = (img_data['width'], img_data['height'])\n", 1071 | " # get image dimensions for resizing\n", 1072 | " (resized_width, resized_height) = get_new_img_size(width, height, C.im_size)\n", 1073 | "\n", 1074 | " gta = np.zeros((len(bboxes), 4))\n", 1075 | "\n", 1076 | " for bbox_num, bbox in enumerate(bboxes):\n", 1077 | " # get the GT box coordinates, and resize to account for image resizing\n", 1078 | " # gta[bbox_num, 0] = (40 * (600 / 800)) / 16 = int(round(1.875)) = 2 (x in feature map)\n", 1079 | " gta[bbox_num, 0] = int(round(bbox['x1'] * (resized_width / float(width))/C.rpn_stride))\n", 1080 | " gta[bbox_num, 1] = int(round(bbox['x2'] * (resized_width / float(width))/C.rpn_stride))\n", 1081 | " gta[bbox_num, 2] = int(round(bbox['y1'] * (resized_height / float(height))/C.rpn_stride))\n", 1082 | " gta[bbox_num, 3] = int(round(bbox['y2'] * (resized_height / float(height))/C.rpn_stride))\n", 1083 | "\n", 1084 | " x_roi = []\n", 1085 | " y_class_num = []\n", 1086 | " y_class_regr_coords = []\n", 1087 | " y_class_regr_label = []\n", 1088 | " IoUs = [] # for debugging only\n", 1089 | "\n", 1090 | " # R.shape[0]: number of bboxes (=300 from non_max_suppression)\n", 1091 | " for ix in range(R.shape[0]):\n", 1092 | " 
(x1, y1, x2, y2) = R[ix, :]\n", 1093 | " x1 = int(round(x1))\n", 1094 | " y1 = int(round(y1))\n", 1095 | " x2 = int(round(x2))\n", 1096 | " y2 = int(round(y2))\n", 1097 | "\n", 1098 | " best_iou = 0.0\n", 1099 | " best_bbox = -1\n", 1100 | " # Iterate through all the ground-truth bboxes to calculate the iou\n", 1101 | " for bbox_num in range(len(bboxes)):\n", 1102 | " curr_iou = iou([gta[bbox_num, 0], gta[bbox_num, 2], gta[bbox_num, 1], gta[bbox_num, 3]], [x1, y1, x2, y2])\n", 1103 | "\n", 1104 | " # Find out the corresponding ground-truth bbox_num with larget iou\n", 1105 | " if curr_iou > best_iou:\n", 1106 | " best_iou = curr_iou\n", 1107 | " best_bbox = bbox_num\n", 1108 | "\n", 1109 | " if best_iou < C.classifier_min_overlap:\n", 1110 | " continue\n", 1111 | " else:\n", 1112 | " w = x2 - x1\n", 1113 | " h = y2 - y1\n", 1114 | " x_roi.append([x1, y1, w, h])\n", 1115 | " IoUs.append(best_iou)\n", 1116 | "\n", 1117 | " if C.classifier_min_overlap <= best_iou < C.classifier_max_overlap:\n", 1118 | " # hard negative example\n", 1119 | " cls_name = 'bg'\n", 1120 | " elif C.classifier_max_overlap <= best_iou:\n", 1121 | " cls_name = bboxes[best_bbox]['class']\n", 1122 | " cxg = (gta[best_bbox, 0] + gta[best_bbox, 1]) / 2.0\n", 1123 | " cyg = (gta[best_bbox, 2] + gta[best_bbox, 3]) / 2.0\n", 1124 | "\n", 1125 | " cx = x1 + w / 2.0\n", 1126 | " cy = y1 + h / 2.0\n", 1127 | "\n", 1128 | " tx = (cxg - cx) / float(w)\n", 1129 | " ty = (cyg - cy) / float(h)\n", 1130 | " tw = np.log((gta[best_bbox, 1] - gta[best_bbox, 0]) / float(w))\n", 1131 | " th = np.log((gta[best_bbox, 3] - gta[best_bbox, 2]) / float(h))\n", 1132 | " else:\n", 1133 | " print('roi = {}'.format(best_iou))\n", 1134 | " raise RuntimeError\n", 1135 | "\n", 1136 | " class_num = class_mapping[cls_name]\n", 1137 | " class_label = len(class_mapping) * [0]\n", 1138 | " class_label[class_num] = 1\n", 1139 | " y_class_num.append(copy.deepcopy(class_label))\n", 1140 | " coords = [0] * 4 * (len(class_mapping) - 1)\n", 1141 | " labels = [0] * 4 * (len(class_mapping) - 1)\n", 1142 | " if cls_name != 'bg':\n", 1143 | " label_pos = 4 * class_num\n", 1144 | " sx, sy, sw, sh = C.classifier_regr_std\n", 1145 | " coords[label_pos:4+label_pos] = [sx*tx, sy*ty, sw*tw, sh*th]\n", 1146 | " labels[label_pos:4+label_pos] = [1, 1, 1, 1]\n", 1147 | " y_class_regr_coords.append(copy.deepcopy(coords))\n", 1148 | " y_class_regr_label.append(copy.deepcopy(labels))\n", 1149 | " else:\n", 1150 | " y_class_regr_coords.append(copy.deepcopy(coords))\n", 1151 | " y_class_regr_label.append(copy.deepcopy(labels))\n", 1152 | "\n", 1153 | " if len(x_roi) == 0:\n", 1154 | " return None, None, None, None\n", 1155 | "\n", 1156 | " # bboxes that iou > C.classifier_min_overlap for all gt bboxes in 300 non_max_suppression bboxes\n", 1157 | " X = np.array(x_roi)\n", 1158 | " # one hot code for bboxes from above => x_roi (X)\n", 1159 | " Y1 = np.array(y_class_num)\n", 1160 | " # corresponding labels and corresponding gt bboxes\n", 1161 | " Y2 = np.concatenate([np.array(y_class_regr_label),np.array(y_class_regr_coords)],axis=1)\n", 1162 | "\n", 1163 | " return np.expand_dims(X, axis=0), np.expand_dims(Y1, axis=0), np.expand_dims(Y2, axis=0), IoUs" 1164 | ] 1165 | }, 1166 | { 1167 | "cell_type": "code", 1168 | "execution_count": 16, 1169 | "metadata": {}, 1170 | "outputs": [], 1171 | "source": [ 1172 | "def rpn_to_roi(rpn_layer, regr_layer, C, dim_ordering, use_regr=True, max_boxes=300,overlap_thresh=0.9):\n", 1173 | "\t\"\"\"Convert rpn layer to roi bboxes\n", 1174 | "\n", 1175 
| "\tArgs: (num_anchors = 9)\n", 1176 | "\t\trpn_layer: output layer for rpn classification \n", 1177 | "\t\t\tshape (1, feature_map.height, feature_map.width, num_anchors)\n", 1178 | "\t\t\tMight be (1, 18, 25, 18) if resized image is 400 width and 300\n", 1179 | "\t\tregr_layer: output layer for rpn regression\n", 1180 | "\t\t\tshape (1, feature_map.height, feature_map.width, num_anchors)\n", 1181 | "\t\t\tMight be (1, 18, 25, 72) if resized image is 400 width and 300\n", 1182 | "\t\tC: config\n", 1183 | "\t\tuse_regr: Wether to use bboxes regression in rpn\n", 1184 | "\t\tmax_boxes: max bboxes number for non-max-suppression (NMS)\n", 1185 | "\t\toverlap_thresh: If iou in NMS is larger than this threshold, drop the box\n", 1186 | "\n", 1187 | "\tReturns:\n", 1188 | "\t\tresult: boxes from non-max-suppression (shape=(300, 4))\n", 1189 | "\t\t\tboxes: coordinates for bboxes (on the feature map)\n", 1190 | "\t\"\"\"\n", 1191 | "\tregr_layer = regr_layer / C.std_scaling\n", 1192 | "\n", 1193 | "\tanchor_sizes = C.anchor_box_scales # (3 in here)\n", 1194 | "\tanchor_ratios = C.anchor_box_ratios # (3 in here)\n", 1195 | "\n", 1196 | "\tassert rpn_layer.shape[0] == 1\n", 1197 | "\n", 1198 | "\t(rows, cols) = rpn_layer.shape[1:3]\n", 1199 | "\n", 1200 | "\tcurr_layer = 0\n", 1201 | "\n", 1202 | "\t# A.shape = (4, feature_map.height, feature_map.width, num_anchors) \n", 1203 | "\t# Might be (4, 18, 25, 18) if resized image is 400 width and 300\n", 1204 | "\t# A is the coordinates for 9 anchors for every point in the feature map \n", 1205 | "\t# => all 18x25x9=4050 anchors cooridnates\n", 1206 | "\tA = np.zeros((4, rpn_layer.shape[1], rpn_layer.shape[2], rpn_layer.shape[3]))\n", 1207 | "\n", 1208 | "\tfor anchor_size in anchor_sizes:\n", 1209 | "\t\tfor anchor_ratio in anchor_ratios:\n", 1210 | "\t\t\t# anchor_x = (128 * 1) / 16 = 8 => width of current anchor\n", 1211 | "\t\t\t# anchor_y = (128 * 2) / 16 = 16 => height of current anchor\n", 1212 | "\t\t\tanchor_x = (anchor_size * anchor_ratio[0])/C.rpn_stride\n", 1213 | "\t\t\tanchor_y = (anchor_size * anchor_ratio[1])/C.rpn_stride\n", 1214 | "\t\t\t\n", 1215 | "\t\t\t# curr_layer: 0~8 (9 anchors)\n", 1216 | "\t\t\t# the Kth anchor of all position in the feature map (9th in total)\n", 1217 | "\t\t\tregr = regr_layer[0, :, :, 4 * curr_layer:4 * curr_layer + 4] # shape => (18, 25, 4)\n", 1218 | "\t\t\tregr = np.transpose(regr, (2, 0, 1)) # shape => (4, 18, 25)\n", 1219 | "\n", 1220 | "\t\t\t# Create 18x25 mesh grid\n", 1221 | "\t\t\t# For every point in x, there are all the y points and vice versa\n", 1222 | "\t\t\t# X.shape = (18, 25)\n", 1223 | "\t\t\t# Y.shape = (18, 25)\n", 1224 | "\t\t\tX, Y = np.meshgrid(np.arange(cols),np. 
arange(rows))\n", 1225 | "\n", 1226 | "\t\t\t# Calculate anchor position and size for each feature map point\n", 1227 | "\t\t\tA[0, :, :, curr_layer] = X - anchor_x/2 # Top left x coordinate\n", 1228 | "\t\t\tA[1, :, :, curr_layer] = Y - anchor_y/2 # Top left y coordinate\n", 1229 | "\t\t\tA[2, :, :, curr_layer] = anchor_x # width of current anchor\n", 1230 | "\t\t\tA[3, :, :, curr_layer] = anchor_y # height of current anchor\n", 1231 | "\n", 1232 | "\t\t\t# Apply regression to x, y, w and h if there is rpn regression layer\n", 1233 | "\t\t\tif use_regr:\n", 1234 | "\t\t\t\tA[:, :, :, curr_layer] = apply_regr_np(A[:, :, :, curr_layer], regr)\n", 1235 | "\n", 1236 | "\t\t\t# Avoid width and height exceeding 1\n", 1237 | "\t\t\tA[2, :, :, curr_layer] = np.maximum(1, A[2, :, :, curr_layer])\n", 1238 | "\t\t\tA[3, :, :, curr_layer] = np.maximum(1, A[3, :, :, curr_layer])\n", 1239 | "\n", 1240 | "\t\t\t# Convert (x, y , w, h) to (x1, y1, x2, y2)\n", 1241 | "\t\t\t# x1, y1 is top left coordinate\n", 1242 | "\t\t\t# x2, y2 is bottom right coordinate\n", 1243 | "\t\t\tA[2, :, :, curr_layer] += A[0, :, :, curr_layer]\n", 1244 | "\t\t\tA[3, :, :, curr_layer] += A[1, :, :, curr_layer]\n", 1245 | "\n", 1246 | "\t\t\t# Avoid bboxes drawn outside the feature map\n", 1247 | "\t\t\tA[0, :, :, curr_layer] = np.maximum(0, A[0, :, :, curr_layer])\n", 1248 | "\t\t\tA[1, :, :, curr_layer] = np.maximum(0, A[1, :, :, curr_layer])\n", 1249 | "\t\t\tA[2, :, :, curr_layer] = np.minimum(cols-1, A[2, :, :, curr_layer])\n", 1250 | "\t\t\tA[3, :, :, curr_layer] = np.minimum(rows-1, A[3, :, :, curr_layer])\n", 1251 | "\n", 1252 | "\t\t\tcurr_layer += 1\n", 1253 | "\n", 1254 | "\tall_boxes = np.reshape(A.transpose((0, 3, 1, 2)), (4, -1)).transpose((1, 0)) # shape=(4050, 4)\n", 1255 | "\tall_probs = rpn_layer.transpose((0, 3, 1, 2)).reshape((-1)) # shape=(4050,)\n", 1256 | "\n", 1257 | "\tx1 = all_boxes[:, 0]\n", 1258 | "\ty1 = all_boxes[:, 1]\n", 1259 | "\tx2 = all_boxes[:, 2]\n", 1260 | "\ty2 = all_boxes[:, 3]\n", 1261 | "\n", 1262 | "\t# Find out the bboxes which is illegal and delete them from bboxes list\n", 1263 | "\tidxs = np.where((x1 - x2 >= 0) | (y1 - y2 >= 0))\n", 1264 | "\n", 1265 | "\tall_boxes = np.delete(all_boxes, idxs, 0)\n", 1266 | "\tall_probs = np.delete(all_probs, idxs, 0)\n", 1267 | "\n", 1268 | "\t# Apply non_max_suppression\n", 1269 | "\t# Only extract the bboxes. Don't need rpn probs in the later process\n", 1270 | "\tresult = non_max_suppression_fast(all_boxes, all_probs, overlap_thresh=overlap_thresh, max_boxes=max_boxes)[0]\n", 1271 | "\n", 1272 | "\treturn result\n" 1273 | ] 1274 | }, 1275 | { 1276 | "cell_type": "code", 1277 | "execution_count": 35, 1278 | "metadata": {}, 1279 | "outputs": [], 1280 | "source": [ 1281 | "base_path = '/home/ubuntu/keras-frcnn'\n", 1282 | "\n", 1283 | "train_path = '/home/ubuntu/Object-detection-using-Faster-RCNN/createddata/annotation.txt' # Training data (annotation file)\n", 1284 | "\n", 1285 | "num_rois = 4 # Number of RoIs to process at once.\n", 1286 | "\n", 1287 | "# Augmentation flag\n", 1288 | "horizontal_flips = True # Augment with horizontal flips in training. \n", 1289 | "vertical_flips = True # Augment with vertical flips in training. \n", 1290 | "rot_90 = True # Augment with 90 degree rotations in training. 
\n", 1291 | "\n", 1292 | "output_weight_path = base_path + '/model/model_frcnn_vgg.hdf5' \n", 1293 | "\n", 1294 | "record_path = base_path+ '/model/record.csv' # Record data (used to save the losses, classification accuracy and mean average precision)\n", 1295 | "\n", 1296 | "base_weight_path = base_path + '/model/vgg16_weights_tf_dim_ordering_tf_kernels.h5'\n", 1297 | "\n", 1298 | "config_output_filename = base_path +'model_vgg_config.pickle'\n", 1299 | "\n", 1300 | "\n", 1301 | "\n", 1302 | "\n", 1303 | "\n" 1304 | ] 1305 | }, 1306 | { 1307 | "cell_type": "code", 1308 | "execution_count": 18, 1309 | "metadata": {}, 1310 | "outputs": [], 1311 | "source": [ 1312 | "# Create the config\n", 1313 | "C = Config()\n", 1314 | "\n", 1315 | "C.use_horizontal_flips = horizontal_flips\n", 1316 | "C.use_vertical_flips = vertical_flips\n", 1317 | "C.rot_90 = rot_90\n", 1318 | "\n", 1319 | "C.record_path = record_path\n", 1320 | "C.model_path = output_weight_path\n", 1321 | "C.num_rois = num_rois\n", 1322 | "\n", 1323 | "C.base_net_weights = base_weight_path" 1324 | ] 1325 | }, 1326 | { 1327 | "cell_type": "code", 1328 | "execution_count": 19, 1329 | "metadata": {}, 1330 | "outputs": [ 1331 | { 1332 | "name": "stdout", 1333 | "output_type": "stream", 1334 | "text": [ 1335 | "Parsing annotation files\n", 1336 | "idx=200\n", 1337 | "Spend 0.02 mins to load the data\n" 1338 | ] 1339 | } 1340 | ], 1341 | "source": [ 1342 | "st = time.time()\n", 1343 | "train_imgs, classes_count, class_mapping = get_data(train_path)\n", 1344 | "print()\n", 1345 | "print('Spend %0.2f mins to load the data' % ((time.time()-st)/60) )" 1346 | ] 1347 | }, 1348 | { 1349 | "cell_type": "code", 1350 | "execution_count": 20, 1351 | "metadata": {}, 1352 | "outputs": [ 1353 | { 1354 | "name": "stdout", 1355 | "output_type": "stream", 1356 | "text": [ 1357 | "Training images per class:\n", 1358 | "{'Bird': 84, 'Football': 26, 'Traffic light': 90, 'bg': 0}\n", 1359 | "Num classes (including bg) = 4\n", 1360 | "{'Bird': 0, 'Football': 1, 'Traffic light': 2, 'bg': 3}\n", 1361 | "Config has been written to /Users/davidheller/keras-frcnnmodel_vgg_config.pickle, and can be loaded when testing to ensure correct results\n" 1362 | ] 1363 | } 1364 | ], 1365 | "source": [ 1366 | "if 'bg' not in classes_count:\n", 1367 | "\tclasses_count['bg'] = 0\n", 1368 | "\tclass_mapping['bg'] = len(class_mapping)\n", 1369 | "# e.g.\n", 1370 | "# classes_count: {'Car': 2383, 'Mobile phone': 1108, 'Person': 3745, 'bg': 0}\n", 1371 | "# class_mapping: {'Person': 0, 'Car': 1, 'Mobile phone': 2, 'bg': 3}\n", 1372 | "C.class_mapping = class_mapping\n", 1373 | "\n", 1374 | "print('Training images per class:')\n", 1375 | "pprint.pprint(classes_count)\n", 1376 | "print('Num classes (including bg) = {}'.format(len(classes_count)))\n", 1377 | "print(class_mapping)\n", 1378 | "\n", 1379 | "# Save the configuration\n", 1380 | "with open(config_output_filename, 'wb') as config_f:\n", 1381 | "\tpickle.dump(C,config_f)\n", 1382 | "\tprint('Config has been written to {}, and can be loaded when testing to ensure correct results'.format(config_output_filename))\n" 1383 | ] 1384 | }, 1385 | { 1386 | "cell_type": "code", 1387 | "execution_count": 21, 1388 | "metadata": {}, 1389 | "outputs": [ 1390 | { 1391 | "name": "stdout", 1392 | "output_type": "stream", 1393 | "text": [ 1394 | "Num train samples (images) 73\n" 1395 | ] 1396 | } 1397 | ], 1398 | "source": [ 1399 | "# Shuffle the images with seed\n", 1400 | "random.seed(1)\n", 1401 | "random.shuffle(train_imgs)\n", 1402 | "\n", 
1403 | "print('Num train samples (images) {}'.format(len(train_imgs)))" 1404 | ] 1405 | }, 1406 | { 1407 | "cell_type": "code", 1408 | "execution_count": 22, 1409 | "metadata": {}, 1410 | "outputs": [], 1411 | "source": [ 1412 | "# Get train data generator which generate X, Y, image_data\n", 1413 | "data_gen_train = get_anchor_gt(train_imgs, C, get_img_output_length, mode='train')" 1414 | ] 1415 | }, 1416 | { 1417 | "cell_type": "code", 1418 | "execution_count": 23, 1419 | "metadata": {}, 1420 | "outputs": [], 1421 | "source": [ 1422 | "X, Y, image_data, debug_img, debug_num_pos = next(data_gen_train)" 1423 | ] 1424 | }, 1425 | { 1426 | "cell_type": "code", 1427 | "execution_count": 36, 1428 | "metadata": {}, 1429 | "outputs": [], 1430 | "source": [ 1431 | "# print('Original image: height=%d width=%d'%(image_data['height'], image_data['width']))\n", 1432 | "# print('Resized image: height=%d width=%d C.im_size=%d'%(X.shape[1], X.shape[2], C.im_size))\n", 1433 | "# print('Feature map size: height=%d width=%d C.rpn_stride=%d'%(Y[0].shape[1], Y[0].shape[2], C.rpn_stride))\n", 1434 | "# print(X.shape)\n", 1435 | "# print(str(len(Y))+\" includes 'y_rpn_cls' and 'y_rpn_regr'\")\n", 1436 | "# print('Shape of y_rpn_cls {}'.format(Y[0].shape))\n", 1437 | "# print('Shape of y_rpn_regr {}'.format(Y[1].shape))\n", 1438 | "# print(image_data)\n", 1439 | "\n", 1440 | "# print('Number of positive anchors for this image: %d' % (debug_num_pos))\n", 1441 | "# if debug_num_pos==0:\n", 1442 | "# gt_x1, gt_x2 = image_data['bboxes'][0]['x1']*(X.shape[2]/image_data['height']), image_data['bboxes'][0]['x2']*(X.shape[2]/image_data['height'])\n", 1443 | "# gt_y1, gt_y2 = image_data['bboxes'][0]['y1']*(X.shape[1]/image_data['width']), image_data['bboxes'][0]['y2']*(X.shape[1]/image_data['width'])\n", 1444 | "# gt_x1, gt_y1, gt_x2, gt_y2 = int(gt_x1), int(gt_y1), int(gt_x2), int(gt_y2)\n", 1445 | "\n", 1446 | "# img = debug_img.copy()\n", 1447 | "# img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n", 1448 | "# color = (0, 255, 0)\n", 1449 | "# cv2.putText(img, 'gt bbox', (gt_x1, gt_y1-5), cv2.FONT_HERSHEY_DUPLEX, 0.7, color, 1)\n", 1450 | "# cv2.rectangle(img, (gt_x1, gt_y1), (gt_x2, gt_y2), color, 2)\n", 1451 | "# cv2.circle(img, (int((gt_x1+gt_x2)/2), int((gt_y1+gt_y2)/2)), 3, color, -1)\n", 1452 | "\n", 1453 | "# plt.grid()\n", 1454 | "# plt.imshow(img)\n", 1455 | "# plt.show()\n", 1456 | "# else:\n", 1457 | "# cls = Y[0][0]\n", 1458 | "# pos_cls = np.where(cls==1)\n", 1459 | "# print(pos_cls)\n", 1460 | "# regr = Y[1][0]\n", 1461 | "# pos_regr = np.where(regr==1)\n", 1462 | "# print(pos_regr)\n", 1463 | "# print('y_rpn_cls for possible pos anchor: {}'.format(cls[pos_cls[0][0],pos_cls[1][0],:]))\n", 1464 | "# print('y_rpn_regr for positive anchor: {}'.format(regr[pos_regr[0][0],pos_regr[1][0],:]))\n", 1465 | "\n", 1466 | "# gt_x1, gt_x2 = image_data['bboxes'][0]['x1']*(X.shape[2]/image_data['width']), image_data['bboxes'][0]['x2']*(X.shape[2]/image_data['width'])\n", 1467 | "# gt_y1, gt_y2 = image_data['bboxes'][0]['y1']*(X.shape[1]/image_data['height']), image_data['bboxes'][0]['y2']*(X.shape[1]/image_data['height'])\n", 1468 | "# gt_x1, gt_y1, gt_x2, gt_y2 = int(gt_x1), int(gt_y1), int(gt_x2), int(gt_y2)\n", 1469 | "\n", 1470 | "# img = debug_img.copy()\n", 1471 | "# img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n", 1472 | "# color = (0, 255, 0)\n", 1473 | "# # cv2.putText(img, 'gt bbox', (gt_x1, gt_y1-5), cv2.FONT_HERSHEY_DUPLEX, 0.7, color, 1)\n", 1474 | "# cv2.rectangle(img, (gt_x1, gt_y1), (gt_x2, gt_y2), color, 
2)\n", 1475 | "# cv2.circle(img, (int((gt_x1+gt_x2)/2), int((gt_y1+gt_y2)/2)), 3, color, -1)\n", 1476 | "\n", 1477 | "# # Add text\n", 1478 | "# textLabel = 'gt bbox'\n", 1479 | "# (retval,baseLine) = cv2.getTextSize(textLabel,cv2.FONT_HERSHEY_COMPLEX,0.5,1)\n", 1480 | "# textOrg = (gt_x1, gt_y1+5)\n", 1481 | "# cv2.rectangle(img, (textOrg[0] - 5, textOrg[1]+baseLine - 5), (textOrg[0]+retval[0] + 5, textOrg[1]-retval[1] - 5), (0, 0, 0), 2)\n", 1482 | "# cv2.rectangle(img, (textOrg[0] - 5,textOrg[1]+baseLine - 5), (textOrg[0]+retval[0] + 5, textOrg[1]-retval[1] - 5), (255, 255, 255), -1)\n", 1483 | "# cv2.putText(img, textLabel, textOrg, cv2.FONT_HERSHEY_DUPLEX, 0.5, (0, 0, 0), 1)\n", 1484 | "\n", 1485 | "# # Draw positive anchors according to the y_rpn_regr\n", 1486 | "# for i in range(debug_num_pos):\n", 1487 | "\n", 1488 | "# color = (100+i*(155/4), 0, 100+i*(155/4))\n", 1489 | "\n", 1490 | "# idx = pos_regr[2][i*4]/4\n", 1491 | "# anchor_size = C.anchor_box_scales[int(idx/3)]\n", 1492 | "# anchor_ratio = C.anchor_box_ratios[2-int((idx+1)%3)]\n", 1493 | "\n", 1494 | "# center = (pos_regr[1][i*4]*C.rpn_stride, pos_regr[0][i*4]*C.rpn_stride)\n", 1495 | "# print('Center position of positive anchor: ', center)\n", 1496 | "# cv2.circle(img, center, 3, color, -1)\n", 1497 | "# anc_w, anc_h = anchor_size*anchor_ratio[0], anchor_size*anchor_ratio[1]\n", 1498 | "# cv2.rectangle(img, (center[0]-int(anc_w/2), center[1]-int(anc_h/2)), (center[0]+int(anc_w/2), center[1]+int(anc_h/2)), color, 2)\n", 1499 | "# # cv2.putText(img, 'pos anchor bbox '+str(i+1), (center[0]-int(anc_w/2), center[1]-int(anc_h/2)-5), cv2.FONT_HERSHEY_DUPLEX, 0.5, color, 1)\n", 1500 | "\n", 1501 | "# print('Green bboxes is ground-truth bbox. Others are positive anchors')\n", 1502 | "# plt.figure(figsize=(8,8))\n", 1503 | "# plt.grid()\n", 1504 | "# plt.imshow(img)\n", 1505 | "# plt.show()" 1506 | ] 1507 | }, 1508 | { 1509 | "cell_type": "code", 1510 | "execution_count": 25, 1511 | "metadata": {}, 1512 | "outputs": [ 1513 | { 1514 | "name": "stdout", 1515 | "output_type": "stream", 1516 | "text": [ 1517 | "WARNING:tensorflow:From /Users/davidheller/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", 1518 | "Instructions for updating:\n", 1519 | "Colocations handled automatically by placer.\n" 1520 | ] 1521 | } 1522 | ], 1523 | "source": [ 1524 | "input_shape_img = (None, None, 3)\n", 1525 | "\n", 1526 | "img_input = Input(shape=input_shape_img)\n", 1527 | "roi_input = Input(shape=(None, 4))\n", 1528 | "\n", 1529 | "# define the base network (VGG here, can be Resnet50, Inception, etc)\n", 1530 | "shared_layers = nn_base(img_input, trainable=True)" 1531 | ] 1532 | }, 1533 | { 1534 | "cell_type": "code", 1535 | "execution_count": 26, 1536 | "metadata": {}, 1537 | "outputs": [ 1538 | { 1539 | "name": "stdout", 1540 | "output_type": "stream", 1541 | "text": [ 1542 | "WARNING:tensorflow:From /Users/davidheller/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3144: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", 1543 | "Instructions for updating:\n", 1544 | "Please use `rate` instead of `keep_prob`. 
Rate should be set to `rate = 1 - keep_prob`.\n", 1545 | "This is the first time of your training\n", 1546 | "loading weights from /Users/davidheller/keras-frcnn/model/vgg16_weights_tf_dim_ordering_tf_kernels.h5\n", 1547 | "Could not load pretrained model weights. Weights can be found in the keras application folder https://github.com/fchollet/keras/tree/master/keras/applications\n" 1548 | ] 1549 | } 1550 | ], 1551 | "source": [ 1552 | "# define the RPN, built on the base layers\n", 1553 | "num_anchors = len(C.anchor_box_scales) * len(C.anchor_box_ratios) # 9\n", 1554 | "rpn = rpn_layer(shared_layers, num_anchors)\n", 1555 | "\n", 1556 | "classifier = classifier_layer(shared_layers, roi_input, C.num_rois, nb_classes=len(classes_count))\n", 1557 | "\n", 1558 | "model_rpn = Model(img_input, rpn[:2])\n", 1559 | "model_classifier = Model([img_input, roi_input], classifier)\n", 1560 | "\n", 1561 | "# this is a model that holds both the RPN and the classifier, used to load/save weights for the models\n", 1562 | "model_all = Model([img_input, roi_input], rpn[:2] + classifier)\n", 1563 | "\n", 1564 | "# Because the google colab can only run the session several hours one time (then you need to connect again), \n", 1565 | "# we need to save the model and load the model to continue training\n", 1566 | "if not os.path.isfile(C.model_path):\n", 1567 | " #If this is the begin of the training, load the pre-traind base network such as vgg-16\n", 1568 | " try:\n", 1569 | " print('This is the first time of your training')\n", 1570 | " print('loading weights from {}'.format(C.base_net_weights))\n", 1571 | " model_rpn.load_weights(C.base_net_weights, by_name=True)\n", 1572 | " model_classifier.load_weights(C.base_net_weights, by_name=True)\n", 1573 | " except:\n", 1574 | " print('Could not load pretrained model weights. 
Weights can be found in the keras application folder \\\n", 1575 | " https://github.com/fchollet/keras/tree/master/keras/applications')\n", 1576 | " \n", 1577 | " # Create the record.csv file to record losses, acc and mAP\n", 1578 | " record_df = pd.DataFrame(columns=['mean_overlapping_bboxes', 'class_acc', 'loss_rpn_cls', 'loss_rpn_regr', 'loss_class_cls', 'loss_class_regr', 'curr_loss', 'elapsed_time', 'mAP'])\n", 1579 | "else:\n", 1580 | " # If this is a continued training, load the trained model from before\n", 1581 | " print('Continue training based on previous trained model')\n", 1582 | " print('Loading weights from {}'.format(C.model_path))\n", 1583 | " model_rpn.load_weights(C.model_path, by_name=True)\n", 1584 | " model_classifier.load_weights(C.model_path, by_name=True)\n", 1585 | " \n", 1586 | " # Load the records\n", 1587 | " record_df = pd.read_csv(record_path)\n", 1588 | "\n", 1589 | " r_mean_overlapping_bboxes = record_df['mean_overlapping_bboxes']\n", 1590 | " r_class_acc = record_df['class_acc']\n", 1591 | " r_loss_rpn_cls = record_df['loss_rpn_cls']\n", 1592 | " r_loss_rpn_regr = record_df['loss_rpn_regr']\n", 1593 | " r_loss_class_cls = record_df['loss_class_cls']\n", 1594 | " r_loss_class_regr = record_df['loss_class_regr']\n", 1595 | " r_curr_loss = record_df['curr_loss']\n", 1596 | " r_elapsed_time = record_df['elapsed_time']\n", 1597 | " r_mAP = record_df['mAP']\n", 1598 | "\n", 1599 | " print('Already train %dK batches'% (len(record_df)))\n" 1600 | ] 1601 | }, 1602 | { 1603 | "cell_type": "code", 1604 | "execution_count": 27, 1605 | "metadata": {}, 1606 | "outputs": [], 1607 | "source": [ 1608 | "optimizer = Adam(lr=1e-5)\n", 1609 | "optimizer_classifier = Adam(lr=1e-5)\n", 1610 | "model_rpn.compile(optimizer=optimizer, loss=[rpn_loss_cls(num_anchors), rpn_loss_regr(num_anchors)])\n", 1611 | "model_classifier.compile(optimizer=optimizer_classifier, loss=[class_loss_cls, class_loss_regr(len(classes_count)-1)], metrics={'dense_class_{}'.format(len(classes_count)): 'accuracy'})\n", 1612 | "model_all.compile(optimizer='sgd', loss='mae')" 1613 | ] 1614 | }, 1615 | { 1616 | "cell_type": "code", 1617 | "execution_count": 28, 1618 | "metadata": {}, 1619 | "outputs": [], 1620 | "source": [ 1621 | "# Training setting\n", 1622 | "total_epochs = len(record_df)\n", 1623 | "r_epochs = len(record_df)\n", 1624 | "\n", 1625 | "epoch_length = 1000\n", 1626 | "num_epochs = 50\n", 1627 | "iter_num = 0\n", 1628 | "\n", 1629 | "total_epochs += num_epochs\n", 1630 | "\n", 1631 | "losses = np.zeros((epoch_length, 5))\n", 1632 | "rpn_accuracy_rpn_monitor = []\n", 1633 | "rpn_accuracy_for_epoch = []\n", 1634 | "\n", 1635 | "if len(record_df)==0:\n", 1636 | " best_loss = np.Inf\n", 1637 | "else:\n", 1638 | " best_loss = np.min(r_curr_loss)" 1639 | ] 1640 | }, 1641 | { 1642 | "cell_type": "code", 1643 | "execution_count": 29, 1644 | "metadata": {}, 1645 | "outputs": [ 1646 | { 1647 | "name": "stdout", 1648 | "output_type": "stream", 1649 | "text": [ 1650 | "0\n" 1651 | ] 1652 | } 1653 | ], 1654 | "source": [ 1655 | "print(len(record_df))\n" 1656 | ] 1657 | }, 1658 | { 1659 | "cell_type": "code", 1660 | "execution_count": 30, 1661 | "metadata": {}, 1662 | "outputs": [ 1663 | { 1664 | "name": "stdout", 1665 | "output_type": "stream", 1666 | "text": [ 1667 | "Epoch 1/50\n", 1668 | "WARNING:tensorflow:From /Users/davidheller/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future 
version.\n", 1669 | "Instructions for updating:\n", 1670 | "Use tf.cast instead.\n" 1671 | ] 1672 | }, 1673 | { 1674 | "ename": "KeyboardInterrupt", 1675 | "evalue": "", 1676 | "output_type": "error", 1677 | "traceback": [ 1678 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 1679 | "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", 1680 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 89\u001b[0m \u001b[0;31m# Y1[:, sel_samples, :] => one hot encode for num_rois bboxes which contains selected neg and pos\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 90\u001b[0m \u001b[0;31m# Y2[:, sel_samples, :] => labels and gt bboxes for num_rois bboxes which contains selected neg and pos\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 91\u001b[0;31m \u001b[0mloss_class\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmodel_classifier\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain_on_batch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX2\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msel_samples\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mY1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msel_samples\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mY2\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msel_samples\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 92\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[0mlosses\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0miter_num\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mloss_rpn\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1681 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/keras/engine/training.py\u001b[0m in \u001b[0;36mtrain_on_batch\u001b[0;34m(self, x, y, sample_weight, class_weight)\u001b[0m\n\u001b[1;32m 1880\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1881\u001b[0m \u001b[0mins\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0msample_weights\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1882\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_train_function\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1883\u001b[0m \u001b[0moutputs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain_function\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mins\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1884\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m 
\u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1682 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/keras/engine/training.py\u001b[0m in \u001b[0;36m_make_train_function\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 990\u001b[0m training_updates = self.optimizer.get_updates(\n\u001b[1;32m 991\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_collected_trainable_weights\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 992\u001b[0;31m loss=self.total_loss)\n\u001b[0m\u001b[1;32m 993\u001b[0m \u001b[0mupdates\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdates\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mtraining_updates\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmetrics_updates\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 994\u001b[0m \u001b[0;31m# Gets loss and metrics. Updates weights at each call.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1683 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/keras/legacy/interfaces.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 89\u001b[0m warnings.warn('Update your `' + object_name +\n\u001b[1;32m 90\u001b[0m '` call to the Keras 2 API: ' + signature, stacklevel=2)\n\u001b[0;32m---> 91\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 92\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_original_function\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1684 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/keras/optimizers.py\u001b[0m in \u001b[0;36mget_updates\u001b[0;34m(self, loss, params)\u001b[0m\n\u001b[1;32m 455\u001b[0m (1. 
- K.pow(self.beta_1, t)))\n\u001b[1;32m 456\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 457\u001b[0;31m \u001b[0mms\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mzeros\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mint_shape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mp\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 458\u001b[0m \u001b[0mvs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mzeros\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mint_shape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mp\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 459\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mamsgrad\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1685 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/keras/optimizers.py\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 455\u001b[0m (1. 
- K.pow(self.beta_1, t)))\n\u001b[1;32m 456\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 457\u001b[0;31m \u001b[0mms\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mzeros\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mint_shape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mp\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 458\u001b[0m \u001b[0mvs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mzeros\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mint_shape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mK\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mp\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 459\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mamsgrad\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1686 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py\u001b[0m in \u001b[0;36mzeros\u001b[0;34m(shape, dtype, name)\u001b[0m\n\u001b[1;32m 693\u001b[0m \u001b[0mv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mzeros\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtf_dtype\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 694\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mpy_all\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_shape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mas_list\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 695\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mvariable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 696\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 697\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 1687 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py\u001b[0m in \u001b[0;36mvariable\u001b[0;34m(value, dtype, name, constraint)\u001b[0m\n\u001b[1;32m 394\u001b[0m 
\u001b[0mv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_uses_learning_phase\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 395\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 396\u001b[0;31m \u001b[0mv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mVariable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mas_dtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 397\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndarray\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 398\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_keras_shape\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1688 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(cls, *args, **kwargs)\u001b[0m\n\u001b[1;32m 211\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__call__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcls\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 212\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcls\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mVariableV1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 213\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mcls\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_variable_v1_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 214\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mcls\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mVariable\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 215\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mcls\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_variable_v2_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1689 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py\u001b[0m in \u001b[0;36m_variable_v1_call\u001b[0;34m(cls, initial_value, trainable, collections, validate_shape, caching_device, name, variable_def, dtype, expected_shape, import_scope, constraint, use_resource, synchronization, aggregation)\u001b[0m\n\u001b[1;32m 174\u001b[0m 
\u001b[0muse_resource\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0muse_resource\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 175\u001b[0m \u001b[0msynchronization\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msynchronization\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 176\u001b[0;31m aggregation=aggregation)\n\u001b[0m\u001b[1;32m 177\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 178\u001b[0m def _variable_v2_call(cls,\n", 1690 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py\u001b[0m in \u001b[0;36m\u001b[0;34m(**kwargs)\u001b[0m\n\u001b[1;32m 153\u001b[0m aggregation=VariableAggregation.NONE):\n\u001b[1;32m 154\u001b[0m \u001b[0;34m\"\"\"Call on Variable class. Useful to force the signature.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 155\u001b[0;31m \u001b[0mprevious_getter\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mdefault_variable_creator\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 156\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mgetter\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mops\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_default_graph\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_variable_creator_stack\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# pylint: disable=protected-access\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 157\u001b[0m \u001b[0mprevious_getter\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_make_getter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgetter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mprevious_getter\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1691 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py\u001b[0m in \u001b[0;36mdefault_variable_creator\u001b[0;34m(next_creator, **kwargs)\u001b[0m\n\u001b[1;32m 2493\u001b[0m \u001b[0mcaching_device\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcaching_device\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2494\u001b[0m \u001b[0mconstraint\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mconstraint\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvariable_def\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvariable_def\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2495\u001b[0;31m expected_shape=expected_shape, import_scope=import_scope)\n\u001b[0m\u001b[1;32m 2496\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2497\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 1692 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(cls, *args, **kwargs)\u001b[0m\n\u001b[1;32m 215\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mcls\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_variable_v2_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 216\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 217\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0msuper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mVariableMetaclass\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcls\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__call__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 218\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 219\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 1693 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, initial_value, trainable, collections, validate_shape, caching_device, name, variable_def, dtype, expected_shape, import_scope, constraint)\u001b[0m\n\u001b[1;32m 1393\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1394\u001b[0m \u001b[0mexpected_shape\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mexpected_shape\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1395\u001b[0;31m constraint=constraint)\n\u001b[0m\u001b[1;32m 1396\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1397\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__repr__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1694 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/variables.py\u001b[0m in \u001b[0;36m_init_from_args\u001b[0;34m(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, expected_shape, constraint)\u001b[0m\n\u001b[1;32m 1545\u001b[0m self._try_guard_against_uninitialized_dependencies(\n\u001b[1;32m 1546\u001b[0m self._initial_value),\n\u001b[0;32m-> 1547\u001b[0;31m validate_shape=validate_shape).op\n\u001b[0m\u001b[1;32m 1548\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1549\u001b[0m \u001b[0;31m# TODO(vrv): Change this class to not take caching_device, but\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1695 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/state_ops.py\u001b[0m in \u001b[0;36massign\u001b[0;34m(ref, value, validate_shape, use_locking, name)\u001b[0m\n\u001b[1;32m 221\u001b[0m return gen_state_ops.assign(\n\u001b[1;32m 222\u001b[0m \u001b[0mref\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0muse_locking\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0muse_locking\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 223\u001b[0;31m validate_shape=validate_shape)\n\u001b[0m\u001b[1;32m 224\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mref\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0massign\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalue\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 225\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 1696 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gen_state_ops.py\u001b[0m in \u001b[0;36massign\u001b[0;34m(ref, value, validate_shape, use_locking, name)\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[0m_result\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_op\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 66\u001b[0m \u001b[0m_inputs_flat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_op\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 67\u001b[0;31m _attrs = (\"T\", _op.get_attr(\"T\"), \"validate_shape\",\n\u001b[0m\u001b[1;32m 68\u001b[0m \u001b[0m_op\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_attr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"validate_shape\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"use_locking\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 69\u001b[0m _op.get_attr(\"use_locking\"))\n", 1697 | "\u001b[0;32m~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py\u001b[0m in \u001b[0;36mget_attr\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 2413\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2414\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mattr_value_pb2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mAttrValue\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2415\u001b[0;31m \u001b[0mx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mParseFromString\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2416\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2417\u001b[0m \u001b[0moneof_value\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mWhichOneof\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"value\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 1698 | "\u001b[0;31mKeyboardInterrupt\u001b[0m: " 1699 | ] 1700 | } 1701 | ], 1702 | "source": [ 1703 | "start_time = time.time()\n", 1704 | "for epoch_num in range(num_epochs):\n", 1705 | "\n", 1706 | " progbar = generic_utils.Progbar(epoch_length)\n", 1707 | " print('Epoch {}/{}'.format(r_epochs + 1, total_epochs))\n", 1708 | " \n", 1709 | " r_epochs += 1\n", 1710 | "\n", 1711 | " while True:\n", 1712 | " try:\n", 1713 | "\n", 1714 | " if len(rpn_accuracy_rpn_monitor) == epoch_length and C.verbose:\n", 1715 | " mean_overlapping_bboxes = float(sum(rpn_accuracy_rpn_monitor))/len(rpn_accuracy_rpn_monitor)\n", 1716 | " rpn_accuracy_rpn_monitor = []\n", 1717 | "# print('Average number of overlapping bounding boxes from RPN = {} for {} previous iterations'.format(mean_overlapping_bboxes, epoch_length))\n", 1718 | " if mean_overlapping_bboxes == 0:\n", 1719 | " print('RPN is not producing bounding boxes that overlap the ground truth boxes. 
Check RPN settings or keep training.')\n", 1720 | "\n", 1721 | " # Generate X (x_img) and label Y ([y_rpn_cls, y_rpn_regr])\n", 1722 | " X, Y, img_data, debug_img, debug_num_pos = next(data_gen_train)\n", 1723 | "\n", 1724 | " # Train rpn model and get loss value [_, loss_rpn_cls, loss_rpn_regr]\n", 1725 | " loss_rpn = model_rpn.train_on_batch(X, Y)\n", 1726 | "\n", 1727 | " # Get predicted rpn from rpn model [rpn_cls, rpn_regr]\n", 1728 | " P_rpn = model_rpn.predict_on_batch(X)\n", 1729 | "\n", 1730 | " # R: bboxes (shape=(300,4))\n", 1731 | " # Convert rpn layer to roi bboxes\n", 1732 | " R = rpn_to_roi(P_rpn[0], P_rpn[1], C, K.image_dim_ordering(), use_regr=True, overlap_thresh=0.7, max_boxes=300)\n", 1733 | " \n", 1734 | " # note: calc_iou converts from (x1,y1,x2,y2) to (x,y,w,h) format\n", 1735 | " # X2: bboxes that iou > C.classifier_min_overlap for all gt bboxes in 300 non_max_suppression bboxes\n", 1736 | " # Y1: one hot code for bboxes from above => x_roi (X)\n", 1737 | " # Y2: corresponding labels and corresponding gt bboxes\n", 1738 | " X2, Y1, Y2, IouS = calc_iou(R, img_data, C, class_mapping)\n", 1739 | "\n", 1740 | " # If X2 is None means there are no matching bboxes\n", 1741 | " if X2 is None:\n", 1742 | " rpn_accuracy_rpn_monitor.append(0)\n", 1743 | " rpn_accuracy_for_epoch.append(0)\n", 1744 | " continue\n", 1745 | " \n", 1746 | " # Find out the positive anchors and negative anchors\n", 1747 | " neg_samples = np.where(Y1[0, :, -1] == 1)\n", 1748 | " pos_samples = np.where(Y1[0, :, -1] == 0)\n", 1749 | "\n", 1750 | " if len(neg_samples) > 0:\n", 1751 | " neg_samples = neg_samples[0]\n", 1752 | " else:\n", 1753 | " neg_samples = []\n", 1754 | "\n", 1755 | " if len(pos_samples) > 0:\n", 1756 | " pos_samples = pos_samples[0]\n", 1757 | " else:\n", 1758 | " pos_samples = []\n", 1759 | "\n", 1760 | " rpn_accuracy_rpn_monitor.append(len(pos_samples))\n", 1761 | " rpn_accuracy_for_epoch.append((len(pos_samples)))\n", 1762 | "\n", 1763 | " if C.num_rois > 1:\n", 1764 | " # If number of positive anchors is larger than 4//2 = 2, randomly choose 2 pos samples\n", 1765 | " if len(pos_samples) < C.num_rois//2:\n", 1766 | " selected_pos_samples = pos_samples.tolist()\n", 1767 | " else:\n", 1768 | " selected_pos_samples = np.random.choice(pos_samples, C.num_rois//2, replace=False).tolist()\n", 1769 | " \n", 1770 | " # Randomly choose (num_rois - num_pos) neg samples\n", 1771 | " try:\n", 1772 | " selected_neg_samples = np.random.choice(neg_samples, C.num_rois - len(selected_pos_samples), replace=False).tolist()\n", 1773 | " except:\n", 1774 | " selected_neg_samples = np.random.choice(neg_samples, C.num_rois - len(selected_pos_samples), replace=True).tolist()\n", 1775 | " \n", 1776 | " # Save all the pos and neg samples in sel_samples\n", 1777 | " sel_samples = selected_pos_samples + selected_neg_samples\n", 1778 | " else:\n", 1779 | " # in the extreme case where num_rois = 1, we pick a random pos or neg sample\n", 1780 | " selected_pos_samples = pos_samples.tolist()\n", 1781 | " selected_neg_samples = neg_samples.tolist()\n", 1782 | " if np.random.randint(0, 2):\n", 1783 | " sel_samples = random.choice(neg_samples)\n", 1784 | " else:\n", 1785 | " sel_samples = random.choice(pos_samples)\n", 1786 | "\n", 1787 | " # training_data: [X, X2[:, sel_samples, :]]\n", 1788 | " # labels: [Y1[:, sel_samples, :], Y2[:, sel_samples, :]]\n", 1789 | " # X => img_data resized image\n", 1790 | " # X2[:, sel_samples, :] => num_rois (4 in here) bboxes which contains selected neg and pos\n", 1791 | " # 
Y1[:, sel_samples, :] => one hot encode for num_rois bboxes which contains selected neg and pos\n", 1792 | " # Y2[:, sel_samples, :] => labels and gt bboxes for num_rois bboxes which contains selected neg and pos\n", 1793 | " loss_class = model_classifier.train_on_batch([X, X2[:, sel_samples, :]], [Y1[:, sel_samples, :], Y2[:, sel_samples, :]])\n", 1794 | "\n", 1795 | " losses[iter_num, 0] = loss_rpn[1]\n", 1796 | " losses[iter_num, 1] = loss_rpn[2]\n", 1797 | "\n", 1798 | " losses[iter_num, 2] = loss_class[1]\n", 1799 | " losses[iter_num, 3] = loss_class[2]\n", 1800 | " losses[iter_num, 4] = loss_class[3]\n", 1801 | "\n", 1802 | " iter_num += 1\n", 1803 | "\n", 1804 | " progbar.update(iter_num, [('rpn_cls', np.mean(losses[:iter_num, 0])), ('rpn_regr', np.mean(losses[:iter_num, 1])),\n", 1805 | " ('final_cls', np.mean(losses[:iter_num, 2])), ('final_regr', np.mean(losses[:iter_num, 3]))])\n", 1806 | "\n", 1807 | " if iter_num == epoch_length:\n", 1808 | " loss_rpn_cls = np.mean(losses[:, 0])\n", 1809 | " loss_rpn_regr = np.mean(losses[:, 1])\n", 1810 | " loss_class_cls = np.mean(losses[:, 2])\n", 1811 | " loss_class_regr = np.mean(losses[:, 3])\n", 1812 | " class_acc = np.mean(losses[:, 4])\n", 1813 | "\n", 1814 | " mean_overlapping_bboxes = float(sum(rpn_accuracy_for_epoch)) / len(rpn_accuracy_for_epoch)\n", 1815 | " rpn_accuracy_for_epoch = []\n", 1816 | "\n", 1817 | " if C.verbose:\n", 1818 | " print('Mean number of bounding boxes from RPN overlapping ground truth boxes: {}'.format(mean_overlapping_bboxes))\n", 1819 | " print('Classifier accuracy for bounding boxes from RPN: {}'.format(class_acc))\n", 1820 | " print('Loss RPN classifier: {}'.format(loss_rpn_cls))\n", 1821 | " print('Loss RPN regression: {}'.format(loss_rpn_regr))\n", 1822 | " print('Loss Detector classifier: {}'.format(loss_class_cls))\n", 1823 | " print('Loss Detector regression: {}'.format(loss_class_regr))\n", 1824 | " print('Total loss: {}'.format(loss_rpn_cls + loss_rpn_regr + loss_class_cls + loss_class_regr))\n", 1825 | " print('Elapsed time: {}'.format(time.time() - start_time))\n", 1826 | " elapsed_time = (time.time()-start_time)/60\n", 1827 | "\n", 1828 | " curr_loss = loss_rpn_cls + loss_rpn_regr + loss_class_cls + loss_class_regr\n", 1829 | " iter_num = 0\n", 1830 | " start_time = time.time()\n", 1831 | "\n", 1832 | " if curr_loss < best_loss:\n", 1833 | " if C.verbose:\n", 1834 | " print('Total loss decreased from {} to {}, saving weights'.format(best_loss,curr_loss))\n", 1835 | " best_loss = curr_loss\n", 1836 | " model_all.save_weights(C.model_path)\n", 1837 | "\n", 1838 | " new_row = {'mean_overlapping_bboxes':round(mean_overlapping_bboxes, 3), \n", 1839 | " 'class_acc':round(class_acc, 3), \n", 1840 | " 'loss_rpn_cls':round(loss_rpn_cls, 3), \n", 1841 | " 'loss_rpn_regr':round(loss_rpn_regr, 3), \n", 1842 | " 'loss_class_cls':round(loss_class_cls, 3), \n", 1843 | " 'loss_class_regr':round(loss_class_regr, 3), \n", 1844 | " 'curr_loss':round(curr_loss, 3), \n", 1845 | " 'elapsed_time':round(elapsed_time, 3), \n", 1846 | " 'mAP': 0}\n", 1847 | "\n", 1848 | " record_df = record_df.append(new_row, ignore_index=True)\n", 1849 | " record_df.to_csv(record_path, index=0)\n", 1850 | "\n", 1851 | " break\n", 1852 | "\n", 1853 | " except Exception as e:\n", 1854 | " print('Exception: {}'.format(e))\n", 1855 | " continue\n", 1856 | "\n", 1857 | "print('Training complete, exiting.')" 1858 | ] 1859 | }, 1860 | { 1861 | "cell_type": "code", 1862 | "execution_count": null, 1863 | "metadata": {}, 1864 | "outputs": [], 
1865 | "source": [ 1866 | "# plt.figure(figsize=(15,5))\n", 1867 | "# plt.subplot(1,2,1)\n", 1868 | "# plt.plot(np.arange(0, r_epochs), record_df['mean_overlapping_bboxes'], 'r')\n", 1869 | "# plt.title('mean_overlapping_bboxes')\n", 1870 | "# plt.subplot(1,2,2)\n", 1871 | "# plt.plot(np.arange(0, r_epochs), record_df['class_acc'], 'r')\n", 1872 | "# plt.title('class_acc')\n", 1873 | "\n", 1874 | "# plt.show()\n", 1875 | "\n", 1876 | "# plt.figure(figsize=(15,5))\n", 1877 | "# plt.subplot(1,2,1)\n", 1878 | "# plt.plot(np.arange(0, r_epochs), record_df['loss_rpn_cls'], 'r')\n", 1879 | "# plt.title('loss_rpn_cls')\n", 1880 | "# plt.subplot(1,2,2)\n", 1881 | "# plt.plot(np.arange(0, r_epochs), record_df['loss_rpn_regr'], 'r')\n", 1882 | "# plt.title('loss_rpn_regr')\n", 1883 | "# plt.show()\n", 1884 | "\n", 1885 | "\n", 1886 | "# plt.figure(figsize=(15,5))\n", 1887 | "# plt.subplot(1,2,1)\n", 1888 | "# plt.plot(np.arange(0, r_epochs), record_df['loss_class_cls'], 'r')\n", 1889 | "# plt.title('loss_class_cls')\n", 1890 | "# plt.subplot(1,2,2)\n", 1891 | "# plt.plot(np.arange(0, r_epochs), record_df['loss_class_regr'], 'r')\n", 1892 | "# plt.title('loss_class_regr')\n", 1893 | "# plt.show()\n", 1894 | "\n", 1895 | "# plt.plot(np.arange(0, r_epochs), record_df['curr_loss'], 'r')\n", 1896 | "# plt.title('total_loss')\n", 1897 | "# plt.show()\n", 1898 | "\n", 1899 | "# plt.figure(figsize=(15,5))\n", 1900 | "# plt.subplot(1,2,1)\n", 1901 | "# plt.plot(np.arange(0, r_epochs), record_df['curr_loss'], 'r')\n", 1902 | "# plt.title('total_loss')\n", 1903 | "# plt.subplot(1,2,2)\n", 1904 | "# plt.plot(np.arange(0, r_epochs), record_df['elapsed_time'], 'r')\n", 1905 | "# plt.title('elapsed_time')\n", 1906 | "# plt.show()\n", 1907 | "\n", 1908 | "# plt.title('loss')\n", 1909 | "# plt.plot(np.arange(0, r_epochs), record_df['loss_rpn_cls'], 'b')\n", 1910 | "# plt.plot(np.arange(0, r_epochs), record_df['loss_rpn_regr'], 'g')\n", 1911 | "# plt.plot(np.arange(0, r_epochs), record_df['loss_class_cls'], 'r')\n", 1912 | "# plt.plot(np.arange(0, r_epochs), record_df['loss_class_regr'], 'c')\n", 1913 | "# # plt.plot(np.arange(0, r_epochs), record_df['curr_loss'], 'm')\n", 1914 | "# plt.show()" 1915 | ] 1916 | }, 1917 | { 1918 | "cell_type": "code", 1919 | "execution_count": null, 1920 | "metadata": {}, 1921 | "outputs": [], 1922 | "source": [] 1923 | }, 1924 | { 1925 | "cell_type": "code", 1926 | "execution_count": null, 1927 | "metadata": {}, 1928 | "outputs": [], 1929 | "source": [] 1930 | }, 1931 | { 1932 | "cell_type": "code", 1933 | "execution_count": null, 1934 | "metadata": {}, 1935 | "outputs": [], 1936 | "source": [] 1937 | } 1938 | ], 1939 | "metadata": { 1940 | "kernelspec": { 1941 | "display_name": "Python 3", 1942 | "language": "python", 1943 | "name": "python3" 1944 | }, 1945 | "language_info": { 1946 | "codemirror_mode": { 1947 | "name": "ipython", 1948 | "version": 3 1949 | }, 1950 | "file_extension": ".py", 1951 | "mimetype": "text/x-python", 1952 | "name": "python", 1953 | "nbconvert_exporter": "python", 1954 | "pygments_lexer": "ipython3", 1955 | "version": "3.7.3" 1956 | } 1957 | }, 1958 | "nbformat": 4, 1959 | "nbformat_minor": 2 1960 | } 1961 | --------------------------------------------------------------------------------