├── README.md
├── images
│   ├── img1.png
│   ├── img2.png
│   ├── img3.png
│   ├── img4.png
│   ├── img5.png
│   ├── img6.png
│   ├── img7.png
│   ├── img8.png
│   └── readme
└── math_2Dto3D.py

/README.md:
--------------------------------------------------------------------------------
# 3D-detection-with-monocular-RGB-image
Reference papers:
Paper1: 3D Bounding Box Estimation Using Deep Learning and Geometry
URL: https://arxiv.org/abs/1612.00496
Paper2: Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction
URL: https://arxiv.org/abs/1904.01690
Paper3: 3D Bounding Boxes for Road Vehicles: A One-Stage, Localization Prioritized Approach using Single Monocular Images
URL: https://link.springer.com/chapter/10.1007%2F978-3-030-11021-5_39

I did this 3D detection research during my internship at MEGVII. Most of the code, including the training, testing, and library code, is not allowed to be posted online, because it contains MEGVII's base-model and framework information.

I want to share my viewpoint and thoughts on 3D detection with monocular RGB images. The hardest and trickiest part is how to use monocular RGB images to predict location, so, with my mentor's approval, I am posting the code for this part. It uses only NumPy and the math module rather than a deep learning framework. Besides, I compare different methods for orientation prediction and location inference.

## Data set and structure
Kitti 2D object: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d
Input: monocular RGB image; 2D boxes, dimensions, orientation, and location of objects; the camera's intrinsic and extrinsic parameters.
Data format: JSON.
## Overall thought
Class and 2D box prediction + orientation prediction + dimension prediction --> location inference --> visualization
### Class and 2D box prediction
Applied Faster R-CNN with a ResNet backbone and a two-layer fully connected head to predict the 2D boxes (top-left and bottom-right points).
### Orientation prediction
Because the full 2pi range is hard for a model to learn, dividing the 2pi range into several bins and predicting a bin class plus an offset regression gives better performance.
![image](https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image/blob/master/images/img1.png)
#### Different thoughts:
* Predict alpha directly and use the spatial constraint to infer theta_y, vs. predict theta_y directly.
Paper1 explains why the first method should be used for angle prediction, but the second method actually performed better in my experiments.
* Predict the angle directly vs. predict its sin & cos.
sin & cos is better.
* Number of bins.
Dividing 2pi into 4 bins gives the best performance (a decoding sketch follows this list).
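To make the bin-plus-offset scheme concrete, here is a minimal NumPy sketch of the decoding step. It assumes the network outputs one confidence per bin and a (sin, cos) pair for the residual angle within each bin; the names `bin_scores` and `bin_offsets` are mine, not from the internship code.

```python
import numpy as np

def decode_orientation(bin_scores, bin_offsets, num_bins=4):
    """Decode a binned orientation prediction back into a single angle.

    bin_scores:  (num_bins,) confidence per bin
    bin_offsets: (num_bins, 2) predicted (sin, cos) of the residual
                 angle relative to each bin center
    """
    bin_size = 2 * np.pi / num_bins
    best = int(np.argmax(bin_scores))        # most confident bin
    sin_r, cos_r = bin_offsets[best]
    residual = np.arctan2(sin_r, cos_r)      # offset within the bin
    angle = best * bin_size + residual       # bin center + offset
    return (angle + np.pi) % (2 * np.pi) - np.pi   # wrap into [-pi, pi)
```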
### Dimension prediction
Because object sizes in the Kitti set vary a lot, performance is bad if we regress the dimensions of every object directly.
First, calculate the average dimensions for each class. Second, regress each object's offset from that average. Third, according to the object class predicted in the first step, add the average dimensions and the offset (sketched below).
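A minimal sketch of this decoding step. The per-class averages below are illustrative placeholders (in the real pipeline they are computed from the training labels), and dimensions follow the Kitti (height, width, length) order.

```python
import numpy as np

# placeholder per-class average (height, width, length) in meters;
# in practice these are computed from the training labels
CLASS_MEAN_DIMS = {
    "car":        np.array([1.53, 1.63, 3.88]),
    "pedestrian": np.array([1.76, 0.66, 0.84]),
    "cyclist":    np.array([1.74, 0.60, 1.76]),
}

def decode_dimensions(pred_class, pred_offset):
    """Recover absolute dimensions from the regressed residual."""
    return CLASS_MEAN_DIMS[pred_class] + np.asarray(pred_offset)
```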
### Location inference (the tricky part)
This part is the hardest to comprehend and needs background on the projection principle and the different coordinate systems.
#### Different thoughts:
* Method one: According to paper1, we can use the relationship between the 2D box and the 3D box to infer the 3D location. As we can see from the figure below, some vertices of the 3D box lie on the sides of the 2D box, and we can use this principle to infer the location coordinates. First, write the location (the center point of the 3D box) as (x, y, z) and use the dimensions and orientation to express the coordinates of the 8 vertices in terms of x, y, z. Second, transform the 8 coordinates from the world coordinate system into the camera coordinate system. Third, each side of the 2D box can be touched by any of the 8 vertices of the 3D box, so there are 8^4 = 4096 cases; with some prior knowledge and the angle prediction result, the number of cases reduces to 64. This step yields 64 systems, each containing 4 equations. Fourth, solve the 64 systems and select the best solution as the location (x, y, z). This is what `calc_location` in math_2Dto3D.py below implements.
![image](https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image/blob/master/images/img2.png)

* Method two: According to paper2, because of the prior knowledge that objects in a self-driving scene sit on the ground, we can use the object height, the 2D box, and the projection constraint to infer the depth z, and then use z to infer x and y with the projection constraint (see the sketch after this list).

* Method three: Method two assumes that, after projection into the 2D image, the center point of the 3D box coincides with the center point of the 2D box. Actually, the two points do not coincide exactly. According to paper3, we can first predict the projected 3D center in the 2D image (shown below) and then feed that point into method two.
![image](https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image/blob/master/images/img3.png)
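To make method two concrete, here is a minimal sketch of the depth-from-height reasoning under a plain pinhole model, reading the focal lengths and principal point from the 3x4 projection matrix. The function name and the simplifications (the 2D box center standing in for the projected 3D center, ignoring the translation column of the calibration matrix) are mine, not from the papers.

```python
import numpy as np

def locate_from_height(box_2d, dims_3d, proj_matrix):
    """Method-two style location: depth from apparent height,
    then x, y by back-projecting the 2D box center.

    box_2d:      [(xmin, ymin), (xmax, ymax)] in pixels
    dims_3d:     (height, width, length) in meters
    proj_matrix: 3x4 camera projection matrix
    """
    fx, fy = proj_matrix[0][0], proj_matrix[1][1]
    cx, cy = proj_matrix[0][2], proj_matrix[1][2]
    (xmin, ymin), (xmax, ymax) = box_2d

    # pinhole model: pixel height h2d = fy * H3d / z  =>  z = fy * H3d / h2d
    h2d = ymax - ymin
    z = fy * dims_3d[0] / h2d

    # back-project the 2D box center at depth z to get x and y
    u = (xmin + xmax) / 2
    v = (ymin + ymax) / 2
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```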
## Results
The best performance (Kitti AP, easy / moderate / hard):

| Task | Easy | Moderate | Hard |
| --- | --- | --- | --- |
| car_detection | 96.452675 | 86.783386 | 77.942184 |
| car_orientation | 93.204292 | 82.368660 | 73.366890 |
| pedestrian_detection | 69.537376 | 60.686756 | 52.112762 |
| pedestrian_orientation | 51.052326 | 44.875721 | 38.976936 |
| cyclist_detection | 65.076256 | 47.723835 | 46.861427 |
| cyclist_orientation | 40.380432 | 30.131805 | 29.838795 |

### 2D box prediction visualization
![image](https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image/blob/master/images/img4.png)
### 3D box prediction visualization
![image](https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image/blob/master/images/img5.png)
![image](https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image/blob/master/images/img6.png)
![image](https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image/blob/master/images/img7.png)
## Conclusion
For orientation prediction, dividing theta_y into 4 bins and regressing sin & cos is better.
For location inference, method one has better performance.

--------------------------------------------------------------------------------
/math_2Dto3D.py:
--------------------------------------------------------------------------------
import numpy as np


def rotation_matrix(yaw, pitch=0, roll=0):
    tx = roll
    ty = yaw
    tz = pitch

    Rx = np.array([[1, 0, 0], [0, np.cos(tx), -np.sin(tx)], [0, np.sin(tx), np.cos(tx)]])
    Ry = np.array([[np.cos(ty), 0, np.sin(ty)], [0, 1, 0], [-np.sin(ty), 0, np.cos(ty)]])
    Rz = np.array([[np.cos(tz), -np.sin(tz), 0], [np.sin(tz), np.cos(tz), 0], [0, 0, 1]])

    # only the yaw rotation is used; the full composition is left for reference
    return Ry.reshape([3, 3])
    # return np.dot(np.dot(Rz, Ry), Rx)


# option to rotate and shift (for label info)
def create_corners(dimension, location=None, R=None):

    # dimension order in the labels: height, width, length
    dx = dimension[2] / 2  # length
    dy = dimension[0] / 2  # height
    dz = dimension[1] / 2  # width

    x_corners = []
    y_corners = []
    z_corners = []

    for i in [1, -1]:
        for j in [1, -1]:
            for k in [1, -1]:
                x_corners.append(dx * i)
                y_corners.append(dy * j)
                z_corners.append(dz * k)

    corners = np.array([x_corners, y_corners, z_corners], dtype=float)

    # rotate if R is passed in
    if R is not None:
        corners = np.dot(R, corners)

    # shift if location is passed in
    if location is not None:
        for i, loc in enumerate(location):
            corners[i, :] = corners[i, :] + loc

    final_corners = []
    for i in range(8):
        final_corners.append([corners[0][i], corners[1][i], corners[2][i]])

    return final_corners
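
# Example (hypothetical numbers, not from the dataset): the 8 corners of a
# car-sized box placed 10 m in front of the camera and rotated 45 degrees
# about the vertical axis. Note that dimension = (height, width, length):
#
#   create_corners([1.53, 1.63, 3.88], location=[0.0, 1.7, 10.0],
#                  R=rotation_matrix(np.pi / 4))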

# this is based on the paper. Math!
# calib is a 3x4 matrix, box_2d is [(xmin, ymin), (xmax, ymax)]
def calc_location(dimension, proj_matrix, box_2d, alpha, theta_ray):
    # global orientation
    orient = alpha + theta_ray
    # rotation from the object (world) frame into the camera frame, so the
    # corners can be computed from location + orientation
    R = rotation_matrix(orient)

    # format 2d corners
    xmin = box_2d[0][0]
    ymin = box_2d[0][1]
    xmax = box_2d[1][0]
    ymax = box_2d[1][1]

    # left top right bottom
    box_corners = [xmin, ymin, xmax, ymax]

    # get the point constraints
    constraints = []

    left_constraints = []
    right_constraints = []
    top_constraints = []
    bottom_constraints = []

    # using a different coord system; in the original dataset the first entry
    # is height (of the car), the second is width (side to side), and the
    # third is length (front to back)
    dx = dimension[2] / 2  # length
    dy = dimension[0] / 2  # height
    dz = dimension[1] / 2  # width

    # below is very much based on trial and error

    # based on the relative angle, a different configuration occurs
    # negative is back of car, positive is front
    left_mult = 1
    right_mult = -1

    # about straight on but opposite way
    # left side -> front-right of the car, right side -> front-left
    if alpha < np.deg2rad(92) and alpha > np.deg2rad(88):
        left_mult = 1
        right_mult = 1
    # about straight on and same way
    # left side -> rear-left of the car, right side -> rear-right
    elif alpha < np.deg2rad(-88) and alpha > np.deg2rad(-92):
        left_mult = -1
        right_mult = -1
    # this works but doesn't make much sense
    # left side -> rear-left/right (alpha < 0: left; alpha > 0: right)
    elif alpha < np.deg2rad(90) and alpha > -np.deg2rad(90):
        left_mult = -1
        right_mult = 1

    # if the car is facing the opposite way, switch left and right
    switch_mult = -1
    if alpha > 0:
        switch_mult = 1

    # left and right could either be the front of the car or the back of the car
    # careful to use left and right based on image, not the actual car's left and right
    for i in (-2, 0):
        left_constraints.append([left_mult * dx, i * dy, -switch_mult * dz])
    for i in (-2, 0):
        right_constraints.append([right_mult * dx, i * dy, switch_mult * dz])

    """
    # left and right could either be the front of the car or the back of the car
    # careful to use left and right based on image, not the actual car's left and right
    for i in (-1, 1):
        for j in (-1, 1):
            for k in (-2, 0):
                left_constraints.append([i * dx, k * dy, j * dz])
    for i in (-1, 1):
        for j in (-1, 1):
            for k in (-2, 0):
                right_constraints.append([i * dx, k * dy, j * dz])
    """

    # top and bottom are easy, just the top and bottom of the car
    for i in (-1, 1):
        for j in (-1, 1):
            top_constraints.append([i * dx, -dy * 2, j * dz])
    for i in (-1, 1):
        for j in (-1, 1):
            bottom_constraints.append([i * dx, 0, j * dz])

    # now, 64 combinations
    for left in left_constraints:
        for top in top_constraints:
            for right in right_constraints:
                for bottom in bottom_constraints:
                    constraints.append([left, top, right, bottom])

    # filter out the ones with repeats
    constraints = filter(lambda x: len(x) == len(set(tuple(i) for i in x)), constraints)

    # create pre M (the term with I and the R*X)
    pre_M = np.zeros([4, 4])
    # 1's down diagonal
    for i in range(0, 4):
        pre_M[i][i] = 1

    best_loc = None
    best_error = 1e09
    best_X = None

    # loop through each possible constraint, hold on to the best guess
    # constraint will be 64 sets of 4 corners
    count = 0
    for constraint in constraints:
        # each corner
        Xa = constraint[0]
        Xb = constraint[1]
        Xc = constraint[2]
        Xd = constraint[3]

        # the four constraints, one per side (left/top/right/bottom); shape = (4, 3)
        X_array = [Xa, Xb, Xc, Xd]

        # M: all 1's down diagonal, and upper 3x1 is Rotation_matrix * [x, y, z]
        Ma = np.copy(pre_M)
        Mb = np.copy(pre_M)
        Mc = np.copy(pre_M)
        Md = np.copy(pre_M)

        M_array = [Ma, Mb, Mc, Md]  # four 4x4 identity matrices

        # create A, b
        A = np.zeros([4, 3], dtype=float)
        b = np.zeros([4, 1])

        # index 0 constrains a pixel x coordinate (left/right sides),
        # index 1 a pixel y coordinate (top/bottom sides)
        indices = [0, 1, 0, 1]
        for row, index in enumerate(indices):
            # X is one corner in the object (world) coordinate system, shape = (3,)
            X = X_array[row]
            M = M_array[row]  # a 4x4 identity matrix

            # create M for corner X
            RX = np.dot(R, X)  # the corner rotated into the camera frame, shape = (3,)
            # identity with R*X in the top three rows of the last column, shape = (4, 4)
            M[:3, 3] = RX.reshape(3)

            # project: shape = (3, 4); the first three columns come from the
            # projection matrix, the last column is the projected corner
            M = np.dot(proj_matrix, M)

            A[row, :] = M[index, :3] - box_corners[row] * M[2, :3]
            b[row] = box_corners[row] * M[2, 3] - M[index, 3]

        # solve with least squares; four equations over-determine the three
        # unknowns, so there is some residual error
        loc, error, rank, s = np.linalg.lstsq(A, b, rcond=None)

        # found a better estimation
        if error.size > 0 and error[0] < best_error:
            count += 1  # for debugging
            best_loc = loc
            best_error = error[0]
            best_X = X_array

    # return best_loc, [left_constraints, right_constraints] # for debugging
    if best_loc is not None:
        best_loc = [best_loc[0][0], best_loc[1][0], best_loc[2][0]]
    return best_loc, best_X
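
# Derivation of the linear system assembled above. A corner X in the object
# frame sits at R @ X + T in the camera frame, where T = (tx, ty, tz) is the
# unknown location. With M = [[I, R @ X], [0, 1]] the homogeneous corner is
# M @ [T, 1], and requiring its projection to touch a 2D box side at pixel
# value u gives (with PM = proj_matrix @ M, and i = 0 for the left/right
# sides, i = 1 for the top/bottom sides):
#
#   PM[i, :3] @ T + PM[i, 3] = u * (PM[2, :3] @ T + PM[2, 3])
#
# which rearranges to one row of A @ T = b with
#   A_row = PM[i, :3] - u * PM[2, :3]
#   b_row = u * PM[2, 3] - PM[i, 3]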

# this is based on the paper. Math!
# calib is a 3x4 matrix, box_2d is [(xmin, ymin), (xmax, ymax)]
def calc_location_new(dimension, proj_matrix, box_2d, alpha, theta_ray):
    # global orientation
    orient = alpha + theta_ray
    # rotation from the object (world) frame into the camera frame
    R = rotation_matrix(orient)

    # format 2d corners
    xmin = box_2d[0][0]
    ymin = box_2d[0][1]
    xmax = box_2d[1][0]
    ymax = box_2d[1][1]

    # left top right bottom
    box_corners = [xmin, ymin, xmax, ymax]

    # get the point constraints
    constraints = []

    left_constraints = []
    right_constraints = []
    top_constraints = []
    bottom_constraints = []

    dx = dimension[2] / 2  # length
    dy = dimension[0] / 2  # height
    dz = dimension[1] / 2  # width

    # below is very much based on trial and error

    # based on the relative angle, a different configuration occurs
    # negative is back of car, positive is front
    left_mult = 1
    right_mult = -1

    # about straight on but opposite way
    # left side -> front-right of the car, right side -> front-left
    if alpha < np.deg2rad(92) and alpha > np.deg2rad(88):
        left_mult = 1
        right_mult = 1
    # about straight on and same way
    # left side -> rear-left of the car, right side -> rear-right
    elif alpha < np.deg2rad(-88) and alpha > np.deg2rad(-92):
        left_mult = -1
        right_mult = -1
    # this works but doesn't make much sense
    # left side -> rear-left/right (alpha < 0: left; alpha > 0: right)
    elif alpha < np.deg2rad(90) and alpha > -np.deg2rad(90):
        left_mult = -1
        right_mult = 1

    # if the car is facing the opposite way, switch left and right
    switch_mult = -1
    if alpha > 0:
        switch_mult = 1

    # left and right could either be the front of the car or the back of the car
    # careful to use left and right based on image, not the actual car's left and right
    for i in (-2, 0):
        left_constraints.append([left_mult * dx, i * dy, -switch_mult * dz])
    for i in (-2, 0):
        right_constraints.append([right_mult * dx, i * dy, switch_mult * dz])

    # top and bottom are easy, just the top and bottom of the car
    for i in (-1, 1):
        for j in (-1, 1):
            top_constraints.append([i * dx, -dy * 2, j * dz])
    for i in (-1, 1):
        for j in (-1, 1):
            bottom_constraints.append([i * dx, 0, j * dz])

    # now, 64 combinations
    for left in left_constraints:
        for top in top_constraints:
            for right in right_constraints:
                for bottom in bottom_constraints:
                    constraints.append([left, top, right, bottom])

    # filter out the ones with repeats
    constraints = filter(lambda x: len(x) == len(set(tuple(i) for i in x)), constraints)

    # create pre M (the term with I and the R*X)
    pre_M = np.zeros([4, 4])
    # 1's down diagonal
    for i in range(0, 4):
        pre_M[i][i] = 1

    best_loc = None
    best_error = 1e09
    best_X = None

    # loop through each possible constraint, hold on to the best guess
    # constraint will be 64 sets of 4 corners
    count = 0
    for constraint in constraints:
        # each corner
        Xa = constraint[0]
        Xb = constraint[1]
        Xc = constraint[2]
        Xd = constraint[3]

        # the four constraints, one per side (left/top/right/bottom); shape = (4, 3)
        X_array = [Xa, Xb, Xc, Xd]

        # M: all 1's down diagonal, and upper 3x1 is Rotation_matrix * [x, y, z]
        Ma = np.copy(pre_M)
        Mb = np.copy(pre_M)
        Mc = np.copy(pre_M)
        Md = np.copy(pre_M)

        M_array = [Ma, Mb, Mc, Md]  # four 4x4 identity matrices

        # create A, b
        A = np.zeros([4, 3], dtype=float)
        b = np.zeros([4, 1])

        indices = [0, 1, 0, 1]
        for row, index in enumerate(indices):
            X = X_array[row]
            M = M_array[row]

            # create M for corner X
            RX = np.dot(R, X)
            M[:3, 3] = RX.reshape(3)

            M = np.dot(proj_matrix, M)

            A[row, :] = M[index, :3] - box_corners[row] * M[2, :3]
            b[row] = box_corners[row] * M[2, 3] - M[index, 3]

        # solve with least squares; four equations over-determine the three
        # unknowns, so there is some residual error
        loc, error, rank, s = np.linalg.lstsq(A, b, rcond=None)

        # 3d bounding box corners in the object frame used above
        # (x in [-dx, dx], y in [-2*dy, 0], z in [-dz, dz])
        x_corners = [dx, dx, -dx, -dx, dx, dx, -dx, -dx]
        y_corners = [0, 0, 0, 0, -2 * dy, -2 * dy, -2 * dy, -2 * dy]
        z_corners = [dz, -dz, -dz, dz, dz, -dz, -dz, dz]

        # rotate and translate 3d bounding box
        corners_3d = np.dot(R, np.vstack([x_corners, y_corners, z_corners]))
        corners_3d[0, :] = corners_3d[0, :] + loc[0]
        corners_3d[1, :] = corners_3d[1, :] + loc[1]
        corners_3d[2, :] = corners_3d[2, :] + loc[2]
        corners_3d = np.transpose(corners_3d)

        # project the 8 corners into the image
        N = corners_3d.shape[0]
        points = np.hstack([corners_3d, np.ones((N, 1))]).T
        points = np.matmul(proj_matrix, points)
        points /= points[2, :]
        points_2d = (points[0:2, :]).T

        # count how many projected corners fall inside the (slightly padded) 2D box
        included = 0
        print('box_corners', box_corners)
        for cor_point_2d in points_2d:
            if (cor_point_2d[0] < xmax + 3 and cor_point_2d[0] > xmin - 3 and
                    cor_point_2d[1] < ymax + 3 and cor_point_2d[1] > ymin - 3):
                included += 1

        # found a better estimation, and all 8 corners project into the 2D box
        if error.size > 0 and error[0] < best_error and included == 8:
            count += 1  # for debugging
            best_loc = loc
            best_error = error[0]
            best_X = X_array
            print('best_loc', best_loc)

    # return best_loc, [left_constraints, right_constraints] # for debugging
    if best_loc is not None:
        best_loc = [best_loc[0][0], best_loc[1][0], best_loc[2][0]]
    return best_loc, best_X
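
# calc_location_new differs from calc_location only in the extra filter above:
# a candidate location is kept only when all 8 projected corners of the
# resulting 3D box land inside the (slightly padded) 2D box, which rejects
# solutions that satisfy the four side equations but put the box in an
# implausible pose.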

def calc_theta_ray(img, box_2d, proj_matrix):
    # angle between the camera's forward axis and the ray through the 2D box center
    width = img.shape[1]
    # horizontal field of view from the focal length
    fovx = 2 * np.arctan(width / (2 * proj_matrix[0][0]))
    center = (box_2d[1][0] + box_2d[0][0]) / 2
    dx = center - (width / 2)

    mult = 1
    if dx < 0:
        mult = -1
    dx = abs(dx)
    angle = np.arctan((2 * dx * np.tan(fovx / 2)) / width)
    angle = angle * mult
    return angle
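
if __name__ == "__main__":
    # Hypothetical smoke test (numbers are made up, Kitti-style, not ground
    # truth): a car-sized box seen slightly right of the image center.
    P2 = np.array([[721.5377, 0.0, 609.5593, 44.85728],
                   [0.0, 721.5377, 172.854, 0.2163791],
                   [0.0, 0.0, 1.0, 0.002745884]])
    img = np.zeros((375, 1242, 3))      # stand-in image, only its shape is used
    box_2d = [(560, 175), (700, 250)]   # [(xmin, ymin), (xmax, ymax)]
    dims = [1.53, 1.63, 3.88]           # height, width, length (meters)
    alpha = -1.2                        # observation angle (radians)

    theta_ray = calc_theta_ray(img, box_2d, P2)
    loc, corners = calc_location(dims, P2, box_2d, alpha, theta_ray)
    print("theta_ray:", theta_ray, "location:", loc)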
--------------------------------------------------------------------------------