├── .dockerignore
├── .gitignore
├── CoRL
│   ├── config_inference
│   │   ├── camera_info.yaml
│   │   └── config_pose.yaml
│   ├── cuboid.py
│   ├── cuboid_pnp_solver.py
│   ├── detector.py
│   ├── inference.py
│   ├── models.py
│   ├── readme.md
│   ├── requirements.txt
│   ├── train.py
│   └── utils_dope.py
├── common
│   ├── cuboid.py
│   ├── cuboid_pnp_solver.py
│   ├── debug.py
│   ├── detector.py
│   ├── models.py
│   └── utils.py
├── config
│   ├── blenderproc_camera_info_example.yaml
│   ├── camera_info.yaml
│   └── config_pose.yaml
├── data_generation
│   ├── backgrounds
│   │   ├── README.md
│   │   └── messy_office.png
│   ├── blenderproc_data_gen
│   │   ├── README.md
│   │   ├── generate_training_data.py
│   │   └── run_blenderproc_datagen.py
│   ├── dome_hdri_haven
│   │   └── download.md
│   ├── models
│   │   └── Ketchup
│   │       └── google_16k
│   │           ├── texture_map.png
│   │           ├── texture_map_flat.png
│   │           ├── textured.mtl
│   │           ├── textured.obj
│   │           ├── textured.obj.bin
│   │           ├── textured.obj.json
│   │           ├── textured_simple.mtl
│   │           └── textured_simple.obj
│   ├── nvisii_data_gen
│   │   ├── .gitignore
│   │   ├── debug_json_ros_node.py
│   │   ├── doc
│   │   │   └── videos
│   │   │       ├── cylinder_nosym.mp4
│   │   │       ├── cylinder_sym.mp4
│   │   │       └── hex_screw.mp4
│   │   ├── download_google_scanned_objects.py
│   │   ├── generate_dataset.py
│   │   ├── models_with_symmetries
│   │   │   ├── cylinder
│   │   │   │   └── google_16k
│   │   │   │       ├── model_info.json
│   │   │   │       ├── texture_map_flat.png
│   │   │   │       ├── textured.mtl
│   │   │   │       └── textured.obj
│   │   │   └── hex_screw
│   │   │       └── google_16k
│   │   │           ├── model_info.json
│   │   │           ├── texture_map_flat.png
│   │   │           ├── textured.mtl
│   │   │           └── textured.obj
│   │   ├── output
│   │   │   └── output_example
│   │   │       ├── 00000.depth.exr
│   │   │       ├── 00000.json
│   │   │       ├── 00000.png
│   │   │       └── 00000.seg.exr
│   │   ├── readme.md
│   │   ├── requirements.txt
│   │   ├── single_video_pybullet.py
│   │   └── utils.py
│   ├── readme.md
│   └── validate_data.py
├── doc
│   └── camera_tutorial.md
├── dope_objects.png
├── evaluate
│   ├── .gitignore
│   ├── add_compute.py
│   ├── download_content.sh
│   ├── evaluate.py
│   ├── kpd_compute.py
│   ├── make_graphs.py
│   ├── overlay.png
│   ├── readme.md
│   ├── render_json.py
│   ├── results
│   │   └── output.png
│   └── utils_eval.py
├── inference
│   ├── README.md
│   └── inference.py
├── license.md
├── readme.md
├── requirements.txt
├── ros1
│   ├── README.md
│   ├── docker
│   │   ├── Dockerfile.noetic
│   │   ├── init_workspace.sh
│   │   ├── readme.md
│   │   └── run_dope_docker.sh
│   ├── dope
│   │   ├── CMakeLists.txt
│   │   ├── config
│   │   │   ├── camera_info.yaml
│   │   │   └── config_pose.yaml
│   │   ├── launch
│   │   │   ├── camera.launch
│   │   │   └── dope.launch
│   │   ├── nodes
│   │   │   ├── camera
│   │   │   └── dope
│   │   ├── package.xml
│   │   ├── setup.py
│   │   ├── src
│   │   │   └── dope
│   │   │       ├── __init__.py
│   │   │       ├── inference
│   │   │       │   ├── __init__.py
│   │   │       │   ├── cuboid.py
│   │   │       │   ├── cuboid_pnp_solver.py
│   │   │       │   └── detector.py
│   │   │       └── utils.py
│   │   └── weights
│   │       └── README.md
│   └── requirements.txt
├── ros2
│   └── README.md
├── sample_data
│   ├── 000000.json
│   ├── 000000.png
│   ├── 000001.json
│   ├── 000001.png
│   ├── 000030.json
│   ├── 000030.png
│   ├── 000031.json
│   └── 000031.png
├── train
│   ├── .gitignore
│   ├── README.md
│   ├── docker
│   │   ├── Dockerfile
│   │   └── get_nvidia_libs.sh
│   ├── misc
│   │   ├── arial.ttf
│   │   └── test_projection.py
│   └── train.py
├── walkthrough.md
└── weights
    └── readme.md
/.dockerignore:
--------------------------------------------------------------------------------
1 | .git
2 | /weights
3 | *.pyc
4 | *._
5 | *.png
6 | __pycache__
7 | venv
8 | .idea
9 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pth
2 | *.pyc
3 | *._
4 | __pycache__
5 | venv
6 | .idea
7 | .DS_Store
8 | ._.DS_Store
9 | *.hdr
10 | google_scanned_models/
11 | scripts/nvisii_data_gen/output/dataset/
12 |
--------------------------------------------------------------------------------
/CoRL/config_inference/camera_info.yaml:
--------------------------------------------------------------------------------
1 | image_width: 640
2 | image_height: 480
3 | camera_name: dope_webcam_0
4 | camera_matrix:
5 | rows: 3
6 | cols: 3
7 | data: [641.5, 0, 320.0, 0, 641.5, 240.0, 0, 0, 1]
8 | distortion_model: plumb_bob
9 | distortion_coefficients:
10 | rows: 1
11 | cols: 5
12 | data: [0, 0, 0, 0, 0]
13 | rectification_matrix:
14 | rows: 3
15 | cols: 3
16 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
17 | projection_matrix:
18 | rows: 3
19 | cols: 4
20 | data: [641.5, 0, 320.0, 0, 0, 641.5, 240.0, 0, 0, 0, 1, 0]
21 |
--------------------------------------------------------------------------------
/CoRL/config_inference/config_pose.yaml:
--------------------------------------------------------------------------------
1 | topic_camera: "/dope/webcam/image_raw"
2 | topic_camera_info: "/dope/webcam/camera_info"
3 | topic_publishing: "dope"
4 | input_is_rectified: True # Whether the input image is rectified (strongly suggested!)
5 | downscale_height: 400 # if the input image is larger than this, scale it down to this pixel height
6 |
7 | # Comment any of these lines to prevent detection / pose estimation of that object
8 | weights: {
9 | # "cracker":"package://dope/weights/cracker_60.pth",
10 | # "gelatin":"package://dope/weights/gelatin_60.pth",
11 | # "meat":"package://dope/weights/meat_20.pth",
12 | # "mustard":"package://dope/weights/mustard_60.pth",
13 | # "soup":"package://dope/weights/soup_60.pth",
14 | # 'peg_hole': "package://dope/weights/peg_box_40.pth",
15 | # 'cube_red': "package://dope/weights/red_40.pth",
16 | #"sugar":"package://dope/weights/sugar_60.pth"
17 | # "bleach":"package://dope/weights/bleach_28_dr.pth"
18 | # 'pudding':"weights_dope/pudding.pth"
19 | # 'pudding':"weights_dope/dope_network/net_epoch_60.pth"
20 | 'alphabet_soup':"weights_dope/resnet_simple/alphabe_soup.pth"
21 | }
22 |
23 | # Type of neural network architecture
24 | architectures: {
25 | 'pudding':"dope",
26 | 'alphabet_soup':'resnet_simple',
27 | }
28 |
29 |
30 | # Cuboid dimension in cm x,y,z
31 | dimensions: {
32 | "cracker": [16.403600692749023,21.343700408935547,7.179999828338623],
33 | "gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059],
34 | "meat": [10.164673805236816,8.3542995452880859,5.7600898742675781],
35 | "mustard": [9.6024150848388672,19.130100250244141,5.824894905090332],
36 | "soup": [6.7659378051757813,10.185500144958496,6.771425724029541],
37 | "sugar": [9.267730712890625,17.625339508056641,4.5134143829345703],
38 | "bleach": [10.267730712890625,26.625339508056641,7.5134143829345703],
39 | "peg_hole": [12.6,3.9,12.6],
40 | 'cube_red':[5,5,5],
41 | 'pudding':[49.47199821472168, 29.923000335693359, 83.498001098632812],
42 | 'alphabet_soup':[8.3555002212524414, 7.1121001243591309, 6.6055998802185059]
43 |
44 | }
45 |
46 | class_ids: {
47 | "cracker": 1,
48 | "gelatin": 2,
49 | "meat": 3,
50 | "mustard": 4,
51 | "soup": 5,
52 | "sugar": 6,
53 | "bleach": 7,
54 | "peg_hole": 8,
55 | "cube_red": 9,
56 | 'pudding': 10,
57 | 'alphabet_soup': 12,
58 | }
59 |
60 | draw_colors: {
61 | "cracker": [13, 255, 128], # green
62 |     "gelatin": [255, 255, 255], # white
63 | "meat": [0, 104, 255], # blue
64 | "mustard": [217,12, 232], # magenta
65 | "soup": [255, 101, 0], # orange
66 | "sugar": [232, 222, 12], # yellow
67 | "bleach": [232, 222, 12], # yellow
68 | "peg_hole": [232, 222, 12], # yellow
69 | "cube_red": [255,0,0],
70 | "pudding": [255,0,0],
71 | }
72 |
73 | # optional: provide a transform that is applied to the pose returned by DOPE
74 | model_transforms: {
75 | # "cracker": [[ 0, 0, 1, 0],
76 | # [ 0, -1, 0, 0],
77 | # [ 1, 0, 0, 0],
78 | # [ 0, 0, 0, 1]]
79 | }
80 |
81 | # optional: if you provide a mesh of the object here, a mesh marker will be
82 | # published for visualization in RViz
83 | # You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb
84 | meshes: {
85 | # "cracker": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/003_cracker_box/google_16k/textured.obj",
86 | # "gelatin": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/009_gelatin_box/google_16k/textured.obj",
87 | # "meat": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/010_potted_meat_can/google_16k/textured.obj",
88 | # "mustard": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/006_mustard_bottle/google_16k/textured.obj",
89 | # "soup": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/005_tomato_soup_can/google_16k/textured.obj",
90 | # "sugar": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/004_sugar_box/google_16k/textured.obj",
91 | # "bleach": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/021_bleach_cleanser/google_16k/textured.obj",
92 | }
93 |
94 | # optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0.
95 | mesh_scales: {
96 | "cracker": 0.01,
97 | "gelatin": 0.01,
98 | "meat": 0.01,
99 | "mustard": 0.01,
100 | "soup": 0.01,
101 | "sugar": 0.01,
102 | "bleach": 0.01,
103 | }
104 |
105 | # Config params for DOPE
106 | thresh_angle: 0.5
107 | thresh_map: 0.0001
108 | sigma: 3
109 | thresh_points: 0.1
110 |
--------------------------------------------------------------------------------
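
The CoRL `inference.py` script is driven by these YAML files. Below is a minimal sketch (not from the repository) of loading this config with PyYAML and cross-checking the enabled entries; it assumes PyYAML is installed, the path matches your checkout, and the actual loading code in `inference.py` may differ:

```
# Minimal sketch (not part of the repository): load the inference config and
# check that every enabled weight entry also has an architecture and cuboid
# dimensions defined. Assumes PyYAML is installed and the path matches your checkout.
import yaml

with open("config_inference/config_pose.yaml") as f:
    cfg = yaml.safe_load(f)

for name, weight_path in cfg["weights"].items():
    arch = cfg.get("architectures", {}).get(name, "dope")
    dims = cfg["dimensions"].get(name)
    if dims is None:
        print(f"warning: no cuboid dimensions for '{name}'")
    print(f"{name}: weights={weight_path}, architecture={arch}, dimensions_cm={dims}")
```
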
/CoRL/cuboid.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
4 |
5 | from enum import IntEnum, unique
6 | import numpy as np
7 | import cv2
8 | from pyrr import Quaternion, Matrix44, Vector3, euler
9 |
10 | # Related to the object's local coordinate system
11 | # @unique
12 | class CuboidVertexType(IntEnum):
13 | FrontTopRight = 0
14 | FrontTopLeft = 1
15 | FrontBottomLeft = 2
16 | FrontBottomRight = 3
17 | RearTopRight = 4
18 | RearTopLeft = 5
19 | RearBottomLeft = 6
20 | RearBottomRight = 7
21 | Center = 8
22 |     TotalCornerVertexCount = 8  # Corner vertices do not include the center point
23 | TotalVertexCount = 9
24 |
25 | # List of vertex index pairs defining each edge of the cuboid
26 | CuboidLineIndexes = [
27 | # Front face
28 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.FrontTopRight ],
29 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.FrontBottomRight ],
30 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontBottomLeft ],
31 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.FrontTopLeft ],
32 | # Back face
33 | [ CuboidVertexType.RearTopLeft, CuboidVertexType.RearTopRight ],
34 | [ CuboidVertexType.RearTopRight, CuboidVertexType.RearBottomRight ],
35 | [ CuboidVertexType.RearBottomRight, CuboidVertexType.RearBottomLeft ],
36 | [ CuboidVertexType.RearBottomLeft, CuboidVertexType.RearTopLeft ],
37 | # Left face
38 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.RearBottomLeft ],
39 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.RearTopLeft ],
40 | # Right face
41 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.RearBottomRight ],
42 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopRight ],
43 | ]
44 |
45 |
46 | # ========================= Cuboid3d =========================
47 | class Cuboid3d():
48 | '''This class contains a 3D cuboid.'''
49 |
50 | # Create a box with a certain size
51 | def __init__(self, size3d = [1.0, 1.0, 1.0], center_location = [0, 0, 0],
52 | coord_system = None, parent_object = None):
53 |
54 | # NOTE: This local coordinate system is similar
55 | # to the intrinsic transform matrix of a 3d object
56 | self.center_location = center_location
57 | self.coord_system = coord_system
58 | self.size3d = size3d
59 | self._vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount
60 |
61 | self.generate_vertexes()
62 |
63 | def get_vertex(self, vertex_type):
64 | """Returns the location of a vertex.
65 |
66 | Args:
67 | vertex_type: enum of type CuboidVertexType
68 |
69 | Returns:
70 | Numpy array(3) - Location of the vertex type in the cuboid
71 | """
72 | return self._vertices[vertex_type]
73 |
74 | def get_vertices(self):
75 | return self._vertices
76 |
77 | def generate_vertexes(self):
78 | width, height, depth = self.size3d
79 |
80 | # By default just use the normal OpenCV coordinate system
81 | if (self.coord_system is None):
82 | cx, cy, cz = self.center_location
83 | # X axis point to the right
84 | right = cx + width / 2.0
85 | left = cx - width / 2.0
86 | # Y axis point downward
87 | top = cy - height / 2.0
88 | bottom = cy + height / 2.0
89 | # Z axis point forward
90 | front = cz + depth / 2.0
91 | rear = cz - depth / 2.0
92 |
93 | # List of 8 vertices of the box
94 | self._vertices = [
95 | [right, top, front], # Front Top Right
96 | [left, top, front], # Front Top Left
97 | [left, bottom, front], # Front Bottom Left
98 | [right, bottom, front], # Front Bottom Right
99 | [right, top, rear], # Rear Top Right
100 | [left, top, rear], # Rear Top Left
101 | [left, bottom, rear], # Rear Bottom Left
102 | [right, bottom, rear], # Rear Bottom Right
103 | self.center_location, # Center
104 | ]
105 | else:
106 | sx, sy, sz = self.size3d
107 | forward = np.array(self.coord_system.forward, dtype=float) * sy * 0.5
108 | up = np.array(self.coord_system.up, dtype=float) * sz * 0.5
109 | right = np.array(self.coord_system.right, dtype=float) * sx * 0.5
110 | center = np.array(self.center_location, dtype=float)
111 | self._vertices = [
112 | center + forward + up + right, # Front Top Right
113 | center + forward + up - right, # Front Top Left
114 | center + forward - up - right, # Front Bottom Left
115 | center + forward - up + right, # Front Bottom Right
116 | center - forward + up + right, # Rear Top Right
117 | center - forward + up - right, # Rear Top Left
118 | center - forward - up - right, # Rear Bottom Left
119 | center - forward - up + right, # Rear Bottom Right
120 | self.center_location, # Center
121 | ]
122 |
123 | def get_projected_cuboid2d(self, cuboid_transform, camera_intrinsic_matrix):
124 | """
125 | Projects the cuboid into the image plane using camera intrinsics.
126 |
127 | Args:
128 | cuboid_transform: the world transform of the cuboid
129 | camera_intrinsic_matrix: camera intrinsic matrix
130 |
131 | Returns:
132 | Cuboid2d - the projected cuboid points
133 | """
134 |
135 | world_transform_matrix = cuboid_transform
136 | rvec = [0, 0, 0]
137 | tvec = [0, 0, 0]
138 | dist_coeffs = np.zeros((4, 1))
139 |
140 | transformed_vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount
141 | for vertex_index in range(CuboidVertexType.TotalVertexCount):
142 | vertex3d = self._vertices[vertex_index]
143 | transformed_vertices[vertex_index] = world_transform_matrix * vertex3d
144 |
145 | projected_vertices = cv2.projectPoints(transformed_vertices, rvec, tvec,
146 | camera_intrinsic_matrix, dist_coeffs)
147 |
148 | return Cuboid2d(projected_vertices)
149 |
--------------------------------------------------------------------------------
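
A minimal usage sketch for `Cuboid3d` (assuming it is run from the `CoRL/` directory so that `cuboid.py` is importable; the dimensions are the rounded `soup` entry from `config_pose.yaml`):

```
# Minimal sketch: build the 3D cuboid for one object from its configured
# dimensions (cm; rounded "soup" entry from config_pose.yaml) and print the
# nine keypoints (8 corners + center) in CuboidVertexType order.
from cuboid import Cuboid3d, CuboidVertexType

cuboid = Cuboid3d(size3d=[6.766, 10.186, 6.771])

for i in range(CuboidVertexType.TotalVertexCount):
    print(CuboidVertexType(i).name, cuboid.get_vertex(i))
```
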
/CoRL/cuboid_pnp_solver.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
4 |
5 | import cv2
6 | import numpy as np
7 | from cuboid import CuboidVertexType
8 | from pyrr import Quaternion
9 |
10 |
11 | class CuboidPNPSolver(object):
12 | """
13 | This class is used to find the 6-DoF pose of a cuboid given its projected vertices.
14 |
15 | Runs perspective-n-point (PNP) algorithm.
16 | """
17 |
18 | # Class variables
19 | cv2version = cv2.__version__.split(".")
20 | cv2majorversion = int(cv2version[0])
21 |
22 | def __init__(
23 | self,
24 | object_name="",
25 | camera_intrinsic_matrix=None,
26 | cuboid3d=None,
27 | dist_coeffs=np.zeros((4, 1)),
28 | ):
29 | self.object_name = object_name
30 | if not camera_intrinsic_matrix is None:
31 | self._camera_intrinsic_matrix = camera_intrinsic_matrix
32 | else:
33 | self._camera_intrinsic_matrix = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0]])
34 | self._cuboid3d = cuboid3d
35 |
36 | self._dist_coeffs = dist_coeffs
37 |
38 | def set_camera_intrinsic_matrix(self, new_intrinsic_matrix):
39 | """Sets the camera intrinsic matrix"""
40 | self._camera_intrinsic_matrix = new_intrinsic_matrix
41 |
42 | def set_dist_coeffs(self, dist_coeffs):
43 |         """Sets the camera distortion coefficients"""
44 | self._dist_coeffs = dist_coeffs
45 |
46 | def solve_pnp(self, cuboid2d_points, pnp_algorithm=None):
47 | """
48 |         Detects the rotation and translation
49 |         of a cuboid object from its vertices'
50 | 2D location in the image
51 | """
52 |
53 |         # Fall back to the default PnP algorithm based on the OpenCV version
54 | if pnp_algorithm is None:
55 | if CuboidPNPSolver.cv2majorversion == 2:
56 | pnp_algorithm = cv2.CV_ITERATIVE
57 | elif CuboidPNPSolver.cv2majorversion == 3:
58 | pnp_algorithm = cv2.SOLVEPNP_ITERATIVE
59 |
60 | if pnp_algorithm is None:
61 | pnp_algorithm = cv2.SOLVEPNP_EPNP
62 |
63 | location = None
64 | quaternion = None
65 | projected_points = cuboid2d_points
66 |
67 | cuboid3d_points = np.array(self._cuboid3d.get_vertices())
68 | obj_2d_points = []
69 | obj_3d_points = []
70 |
71 | for i in range(CuboidVertexType.TotalVertexCount):
72 | check_point_2d = cuboid2d_points[i]
73 | # Ignore invalid points
74 | if check_point_2d is None:
75 | continue
76 | obj_2d_points.append(check_point_2d)
77 | obj_3d_points.append(cuboid3d_points[i])
78 |
79 | obj_2d_points = np.array(obj_2d_points, dtype=float)
80 | obj_3d_points = np.array(obj_3d_points, dtype=float)
81 |
82 | valid_point_count = len(obj_2d_points)
83 |
84 |         # Can only do PnP if we have at least 4 valid points
85 | is_points_valid = valid_point_count >= 4
86 |
87 | if is_points_valid:
88 |
89 | ret, rvec, tvec = cv2.solvePnP(
90 | obj_3d_points,
91 | obj_2d_points,
92 | self._camera_intrinsic_matrix,
93 | self._dist_coeffs,
94 | flags=pnp_algorithm,
95 | )
96 |
97 | if ret:
98 | location = list(x[0] for x in tvec)
99 | quaternion = self.convert_rvec_to_quaternion(rvec)
100 |
101 | projected_points, _ = cv2.projectPoints(
102 | cuboid3d_points,
103 | rvec,
104 | tvec,
105 | self._camera_intrinsic_matrix,
106 | self._dist_coeffs,
107 | )
108 | projected_points = np.squeeze(projected_points)
109 |
110 | # If the location.Z is negative or object is behind the camera then flip both location and rotation
111 | x, y, z = location
112 | if z < 0:
113 | # Get the opposite location
114 | location = [-x, -y, -z]
115 |
116 | # Change the rotation by 180 degree
117 | rotate_angle = np.pi
118 | rotate_quaternion = Quaternion.from_axis_rotation(
119 | location, rotate_angle
120 | )
121 | quaternion = rotate_quaternion.cross(quaternion)
122 |
123 | return location, quaternion, projected_points
124 |
125 | def convert_rvec_to_quaternion(self, rvec):
126 |         """Convert rvec (an axis-angle rotation vector, as returned by cv2.solvePnP) to a quaternion"""
127 | theta = np.sqrt(
128 | rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2]
129 | ) # in radians
130 | raxis = [rvec[0] / theta, rvec[1] / theta, rvec[2] / theta]
131 |
132 | # pyrr's Quaternion (order is XYZW), https://pyrr.readthedocs.io/en/latest/oo_api_quaternion.html
133 | return Quaternion.from_axis_rotation(raxis, theta)
134 |
135 | def project_points(self, rvec, tvec):
136 |         """Project the cuboid's 3D vertices onto the image using the given rotation and translation"""
137 |         output_points, _ = cv2.projectPoints(
138 |             np.array(self._cuboid3d.get_vertices(), dtype=float),
139 |             rvec,
140 |             tvec,
141 |             self._camera_intrinsic_matrix,
142 |             self._dist_coeffs,
143 | )
144 |
145 | output_points = np.squeeze(output_points)
146 | return output_points
147 |
--------------------------------------------------------------------------------
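
A minimal sketch of how `Cuboid3d` and `CuboidPNPSolver` fit together (assuming it is run from the `CoRL/` directory; the 2D keypoints below are made-up placeholders standing in for the points extracted from the network's belief maps):

```
# Minimal sketch: recover a 6-DoF pose from nine cuboid keypoints with
# CuboidPNPSolver. The 2D keypoints below are made-up placeholders standing
# in for the points extracted from the network's belief maps.
import numpy as np

from cuboid import Cuboid3d
from cuboid_pnp_solver import CuboidPNPSolver

# Intrinsics from config_inference/camera_info.yaml
K = np.array([[641.5, 0.0, 320.0],
              [0.0, 641.5, 240.0],
              [0.0, 0.0, 1.0]])

cuboid = Cuboid3d(size3d=[6.766, 10.186, 6.771])  # "soup" dimensions in cm (rounded)
solver = CuboidPNPSolver("soup", camera_intrinsic_matrix=K, cuboid3d=cuboid)

# Eight corners + centroid; entries may be None for undetected keypoints,
# as long as at least four remain.
keypoints_2d = [(380, 180), (300, 180), (300, 300), (380, 300),
                (370, 190), (310, 190), (310, 290), (370, 290), (340, 240)]

location, quaternion, projected_points = solver.solve_pnp(keypoints_2d)
print("location (cm):", location)
print("quaternion (x, y, z, w):", quaternion)
```
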
/CoRL/readme.md:
--------------------------------------------------------------------------------
1 | # Training
2 |
3 | This is the training code used for the [CoRL 2018 paper](https://arxiv.org/abs/1809.10790). You can also use this training script on data generated by the data-generation scripts in [data_generation/nvisii_data_gen/](../data_generation/nvisii_data_gen/).
4 |
5 | ```
6 | python -m torch.distributed.launch --nproc_per_node=1 train.py --network dope --epochs 2 --batchsize 10 --outf tmp/ --data ../nvisii_data_gen/output/output_example/
7 | ```
8 |
9 | There is also an accompanying dataset for training DOPE on the meat can with the shiny top: [link here](https://drive.google.com/file/d/1Q5VLnlt1gu2pKIAcUo9uzSyWw1nGlSF8/view?usp=sharing).
10 |
11 | # Inference
12 |
13 | I also made an inference script that runs without any ROS components.
14 |
15 | ```
16 | python inference.py
17 | ```
18 |
19 | Look at the file for more information. As with the ROS node, everything is configured through the yaml files in `config_inference`. The code is very close to the original, with some changes.
20 |
21 | Check `models.py`, as we propose several different architectures.
--------------------------------------------------------------------------------
/CoRL/requirements.txt:
--------------------------------------------------------------------------------
1 | opencv-python-headless<4.3
2 | albumentations
3 | matplotlib
4 | simplejson
5 | numpy
6 | opencv_python
7 | photutils
8 | scipy
9 | torch
10 | pyquaternion
11 | tqdm
12 | pyrr
13 | Pillow==5.2.0
14 | torchvision
15 | PyYAML
16 | tensorboardX
17 |
--------------------------------------------------------------------------------
/common/cuboid.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
4 |
5 | from enum import IntEnum, unique
6 | import numpy as np
7 | import cv2
8 | from pyrr import Quaternion, Matrix44, Vector3, euler
9 |
10 | # Related to the object's local coordinate system
11 | # @unique
12 | class CuboidVertexType(IntEnum):
13 | FrontTopRight = 0
14 | FrontTopLeft = 1
15 | FrontBottomLeft = 2
16 | FrontBottomRight = 3
17 | RearTopRight = 4
18 | RearTopLeft = 5
19 | RearBottomLeft = 6
20 | RearBottomRight = 7
21 | Center = 8
22 |     TotalCornerVertexCount = 8  # Corner vertices do not include the center point
23 | TotalVertexCount = 9
24 |
25 | # List of vertex index pairs defining each edge of the cuboid
26 | CuboidLineIndexes = [
27 | # Front face
28 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.FrontTopRight ],
29 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.FrontBottomRight ],
30 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontBottomLeft ],
31 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.FrontTopLeft ],
32 | # Back face
33 | [ CuboidVertexType.RearTopLeft, CuboidVertexType.RearTopRight ],
34 | [ CuboidVertexType.RearTopRight, CuboidVertexType.RearBottomRight ],
35 | [ CuboidVertexType.RearBottomRight, CuboidVertexType.RearBottomLeft ],
36 | [ CuboidVertexType.RearBottomLeft, CuboidVertexType.RearTopLeft ],
37 | # Left face
38 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.RearBottomLeft ],
39 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.RearTopLeft ],
40 | # Right face
41 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.RearBottomRight ],
42 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopRight ],
43 | ]
44 |
45 |
46 | # ========================= Cuboid3d =========================
47 | class Cuboid3d():
48 | '''This class contains a 3D cuboid.'''
49 |
50 | # Create a box with a certain size
51 | def __init__(self, size3d = [1.0, 1.0, 1.0], center_location = [0, 0, 0],
52 | coord_system = None, parent_object = None):
53 |
54 | # NOTE: This local coordinate system is similar
55 | # to the intrinsic transform matrix of a 3d object
56 | self.center_location = center_location
57 | self.coord_system = coord_system
58 | self.size3d = size3d
59 | self._vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount
60 |
61 | self.generate_vertexes()
62 |
63 | def get_vertex(self, vertex_type):
64 | """Returns the location of a vertex.
65 |
66 | Args:
67 | vertex_type: enum of type CuboidVertexType
68 |
69 | Returns:
70 | Numpy array(3) - Location of the vertex type in the cuboid
71 | """
72 | return self._vertices[vertex_type]
73 |
74 | def get_vertices(self):
75 | return self._vertices
76 |
77 | def generate_vertexes(self):
78 | width, height, depth = self.size3d
79 |
80 | # By default just use the normal OpenCV coordinate system
81 | if (self.coord_system is None):
82 | cx, cy, cz = self.center_location
83 | # X axis point to the right
84 | right = cx + width / 2.0
85 | left = cx - width / 2.0
86 | # Y axis point downward
87 | top = cy - height / 2.0
88 | bottom = cy + height / 2.0
89 | # Z axis point forward
90 | front = cz + depth / 2.0
91 | rear = cz - depth / 2.0
92 |
93 | # List of 8 vertices of the box
94 | self._vertices = [
95 | [right, top, front], # Front Top Right
96 | [left, top, front], # Front Top Left
97 | [left, bottom, front], # Front Bottom Left
98 | [right, bottom, front], # Front Bottom Right
99 | [right, top, rear], # Rear Top Right
100 | [left, top, rear], # Rear Top Left
101 | [left, bottom, rear], # Rear Bottom Left
102 | [right, bottom, rear], # Rear Bottom Right
103 | self.center_location, # Center
104 | ]
105 | else:
106 | sx, sy, sz = self.size3d
107 | forward = np.array(self.coord_system.forward, dtype=float) * sy * 0.5
108 | up = np.array(self.coord_system.up, dtype=float) * sz * 0.5
109 | right = np.array(self.coord_system.right, dtype=float) * sx * 0.5
110 | center = np.array(self.center_location, dtype=float)
111 | self._vertices = [
112 | center + forward + up + right, # Front Top Right
113 | center + forward + up - right, # Front Top Left
114 | center + forward - up - right, # Front Bottom Left
115 | center + forward - up + right, # Front Bottom Right
116 | center - forward + up + right, # Rear Top Right
117 | center - forward + up - right, # Rear Top Left
118 | center - forward - up - right, # Rear Bottom Left
119 | center - forward - up + right, # Rear Bottom Right
120 | self.center_location, # Center
121 | ]
122 |
123 | def get_projected_cuboid2d(self, cuboid_transform, camera_intrinsic_matrix):
124 | """
125 | Projects the cuboid into the image plane using camera intrinsics.
126 |
127 | Args:
128 | cuboid_transform: the world transform of the cuboid
129 | camera_intrinsic_matrix: camera intrinsic matrix
130 |
131 | Returns:
132 | Cuboid2d - the projected cuboid points
133 | """
134 |
135 | world_transform_matrix = cuboid_transform
136 | rvec = [0, 0, 0]
137 | tvec = [0, 0, 0]
138 | dist_coeffs = np.zeros((4, 1))
139 |
140 | transformed_vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount
141 | for vertex_index in range(CuboidVertexType.TotalVertexCount):
142 | vertex3d = self._vertices[vertex_index]
143 | transformed_vertices[vertex_index] = world_transform_matrix * vertex3d
144 |
145 | projected_vertices = cv2.projectPoints(transformed_vertices, rvec, tvec,
146 | camera_intrinsic_matrix, dist_coeffs)
147 |
148 | return Cuboid2d(projected_vertices)
149 |
--------------------------------------------------------------------------------
/common/cuboid_pnp_solver.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
4 |
5 | import cv2
6 | import numpy as np
7 | from cuboid import CuboidVertexType
8 | from pyrr import Quaternion
9 |
10 |
11 | class CuboidPNPSolver(object):
12 | """
13 | This class is used to find the 6-DoF pose of a cuboid given its projected vertices.
14 |
15 | Runs perspective-n-point (PNP) algorithm.
16 | """
17 |
18 | # Class variables
19 | cv2version = cv2.__version__.split(".")
20 | cv2majorversion = int(cv2version[0])
21 |
22 | def __init__(
23 | self,
24 | object_name="",
25 | camera_intrinsic_matrix=None,
26 | cuboid3d=None,
27 | dist_coeffs=np.zeros((4, 1)),
28 | ):
29 | self.object_name = object_name
30 | if not camera_intrinsic_matrix is None:
31 | self._camera_intrinsic_matrix = camera_intrinsic_matrix
32 | else:
33 | self._camera_intrinsic_matrix = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0]])
34 | self._cuboid3d = cuboid3d
35 |
36 | self._dist_coeffs = dist_coeffs
37 |
38 | def set_camera_intrinsic_matrix(self, new_intrinsic_matrix):
39 | """Sets the camera intrinsic matrix"""
40 | self._camera_intrinsic_matrix = new_intrinsic_matrix
41 |
42 | def set_dist_coeffs(self, dist_coeffs):
43 |         """Sets the camera distortion coefficients"""
44 | self._dist_coeffs = dist_coeffs
45 |
46 | def solve_pnp(self, cuboid2d_points, pnp_algorithm=None):
47 | """
48 |         Detects the rotation and translation
49 |         of a cuboid object from its vertices'
50 | 2D location in the image
51 | """
52 |
53 |         # Fall back to the default PnP algorithm based on the OpenCV version
54 | if pnp_algorithm is None:
55 | if CuboidPNPSolver.cv2majorversion == 2:
56 | pnp_algorithm = cv2.CV_ITERATIVE
57 | elif CuboidPNPSolver.cv2majorversion == 3:
58 | pnp_algorithm = cv2.SOLVEPNP_ITERATIVE
59 |
60 | if pnp_algorithm is None:
61 | pnp_algorithm = cv2.SOLVEPNP_EPNP
62 |
63 | location = None
64 | quaternion = None
65 | projected_points = cuboid2d_points
66 |
67 | cuboid3d_points = np.array(self._cuboid3d.get_vertices())
68 | obj_2d_points = []
69 | obj_3d_points = []
70 |
71 | for i in range(CuboidVertexType.TotalVertexCount):
72 | check_point_2d = cuboid2d_points[i]
73 | # Ignore invalid points
74 | if check_point_2d is None:
75 | continue
76 | obj_2d_points.append(check_point_2d)
77 | obj_3d_points.append(cuboid3d_points[i])
78 |
79 | obj_2d_points = np.array(obj_2d_points, dtype=float)
80 | obj_3d_points = np.array(obj_3d_points, dtype=float)
81 |
82 | valid_point_count = len(obj_2d_points)
83 |
84 |         # Can only do PnP if we have at least 4 valid points
85 | is_points_valid = valid_point_count >= 4
86 |
87 | if is_points_valid:
88 |
89 | ret, rvec, tvec = cv2.solvePnP(
90 | obj_3d_points,
91 | obj_2d_points,
92 | self._camera_intrinsic_matrix,
93 | self._dist_coeffs,
94 | flags=pnp_algorithm,
95 | )
96 |
97 | if ret:
98 | location = list(x[0] for x in tvec)
99 | quaternion = self.convert_rvec_to_quaternion(rvec)
100 |
101 | projected_points, _ = cv2.projectPoints(
102 | cuboid3d_points,
103 | rvec,
104 | tvec,
105 | self._camera_intrinsic_matrix,
106 | self._dist_coeffs,
107 | )
108 | projected_points = np.squeeze(projected_points)
109 |
110 | # If the location.Z is negative or object is behind the camera then flip both location and rotation
111 | x, y, z = location
112 | if z < 0:
113 | # Get the opposite location
114 | location = [-x, -y, -z]
115 |
116 | # Change the rotation by 180 degree
117 | rotate_angle = np.pi
118 | rotate_quaternion = Quaternion.from_axis_rotation(
119 | location, rotate_angle
120 | )
121 | quaternion = rotate_quaternion.cross(quaternion)
122 |
123 | return location, quaternion, projected_points
124 |
125 | def convert_rvec_to_quaternion(self, rvec):
126 |         """Convert rvec (an axis-angle rotation vector, as returned by cv2.solvePnP) to a quaternion"""
127 | theta = np.sqrt(
128 | rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2]
129 | ) # in radians
130 | raxis = [rvec[0] / theta, rvec[1] / theta, rvec[2] / theta]
131 |
132 | # pyrr's Quaternion (order is XYZW), https://pyrr.readthedocs.io/en/latest/oo_api_quaternion.html
133 | return Quaternion.from_axis_rotation(raxis, theta)
134 |
135 | def project_points(self, rvec, tvec):
136 |         """Project the cuboid's 3D vertices onto the image using the given rotation and translation"""
137 |         output_points, _ = cv2.projectPoints(
138 |             np.array(self._cuboid3d.get_vertices(), dtype=float),
139 |             rvec,
140 |             tvec,
141 |             self._camera_intrinsic_matrix,
142 |             self._dist_coeffs,
143 | )
144 |
145 | output_points = np.squeeze(output_points)
146 | return output_points
147 |
--------------------------------------------------------------------------------
/common/debug.py:
--------------------------------------------------------------------------------
1 | # Debugging Tool to Visualize Synthetic Data Projected Points Accuracy
2 |
3 | from PIL import Image
4 | import json
5 |
6 | from utils import Draw, loadimages
7 |
8 | import argparse
9 | import os
10 |
11 |
12 | def visualize_projected_points(path_img, path_json, path_output, img_name, root):
13 | img = Image.open(path_img).convert("RGB")
14 |
15 | with open(path_json) as f:
16 | data_json = json.load(f)
17 |
18 | draw = Draw(img)
19 |
20 | for obj in data_json["objects"]:
21 | projected_cuboid_keypoints = [tuple(pair) for pair in obj["projected_cuboid"]]
22 | draw.draw_cube(projected_cuboid_keypoints)
23 |
24 | path_output = os.path.join(
25 |         path_output, path_img.replace(root, "").replace(img_name, "").lstrip("/")
26 | )
27 | os.makedirs(path_output, exist_ok=True)
28 | img.save(os.path.join(path_output, img_name))
29 |
30 |
31 | if __name__ == "__main__":
32 | parser = argparse.ArgumentParser()
33 |
34 | parser.add_argument(
35 | "--outf",
36 | default="output/debug",
37 | help="Where to store the debug output images.",
38 | )
39 | parser.add_argument(
40 | "--data",
41 | required=True,
42 | help="Folder containing groundtruth and images.",
43 | )
44 |
45 | opt = parser.parse_args()
46 |
47 | imgs = sorted(loadimages(opt.data, extensions=["jpg", "png"]))
48 |
49 | for i, (img_path, img_name, json_path) in enumerate(imgs):
50 | img_rel_path = img_path.replace(opt.data, "")
51 | print(f"Debugging image {img_rel_path} ({i + 1} of {len(imgs)}) | Outputting to: {opt.outf + '/' + img_rel_path}")
52 | visualize_projected_points(img_path, json_path, opt.outf, img_name, opt.data)
53 |
--------------------------------------------------------------------------------
/common/models.py:
--------------------------------------------------------------------------------
1 | """
2 | NVIDIA from jtremblay@gmail.com
3 | """
4 |
5 | # Networks
6 | import torch
7 | import torch
8 | import torch.nn as nn
9 | import torch.nn.parallel
10 | import torch.utils.data
11 | import torchvision.models as models
12 |
13 |
14 | class DopeNetwork(nn.Module):
15 | def __init__(
16 | self,
17 | pretrained=False,
18 | numBeliefMap=9,
19 | numAffinity=16,
20 | stop_at_stage=6, # number of stages to process (if less than total number of stages)
21 | ):
22 | super(DopeNetwork, self).__init__()
23 |
24 | self.stop_at_stage = stop_at_stage
25 |
26 |         vgg_full = models.vgg19(pretrained=pretrained).features
27 | self.vgg = nn.Sequential()
28 | for i_layer in range(24):
29 | self.vgg.add_module(str(i_layer), vgg_full[i_layer])
30 |
31 | # Add some layers
32 | i_layer = 23
33 | self.vgg.add_module(
34 | str(i_layer), nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1)
35 | )
36 | self.vgg.add_module(str(i_layer + 1), nn.ReLU(inplace=True))
37 | self.vgg.add_module(
38 | str(i_layer + 2), nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1)
39 | )
40 | self.vgg.add_module(str(i_layer + 3), nn.ReLU(inplace=True))
41 |
42 | # print('---Belief------------------------------------------------')
43 | # _2 are the belief map stages
44 | self.m1_2 = DopeNetwork.create_stage(128, numBeliefMap, True)
45 | self.m2_2 = DopeNetwork.create_stage(
46 | 128 + numBeliefMap + numAffinity, numBeliefMap, False
47 | )
48 | self.m3_2 = DopeNetwork.create_stage(
49 | 128 + numBeliefMap + numAffinity, numBeliefMap, False
50 | )
51 | self.m4_2 = DopeNetwork.create_stage(
52 | 128 + numBeliefMap + numAffinity, numBeliefMap, False
53 | )
54 | self.m5_2 = DopeNetwork.create_stage(
55 | 128 + numBeliefMap + numAffinity, numBeliefMap, False
56 | )
57 | self.m6_2 = DopeNetwork.create_stage(
58 | 128 + numBeliefMap + numAffinity, numBeliefMap, False
59 | )
60 |
61 | # print('---Affinity----------------------------------------------')
62 | # _1 are the affinity map stages
63 | self.m1_1 = DopeNetwork.create_stage(128, numAffinity, True)
64 | self.m2_1 = DopeNetwork.create_stage(
65 | 128 + numBeliefMap + numAffinity, numAffinity, False
66 | )
67 | self.m3_1 = DopeNetwork.create_stage(
68 | 128 + numBeliefMap + numAffinity, numAffinity, False
69 | )
70 | self.m4_1 = DopeNetwork.create_stage(
71 | 128 + numBeliefMap + numAffinity, numAffinity, False
72 | )
73 | self.m5_1 = DopeNetwork.create_stage(
74 | 128 + numBeliefMap + numAffinity, numAffinity, False
75 | )
76 | self.m6_1 = DopeNetwork.create_stage(
77 | 128 + numBeliefMap + numAffinity, numAffinity, False
78 | )
79 |
80 | def forward(self, x):
81 | """Runs inference on the neural network"""
82 |
83 | out1 = self.vgg(x)
84 |
85 | out1_2 = self.m1_2(out1)
86 | out1_1 = self.m1_1(out1)
87 |
88 | if self.stop_at_stage == 1:
89 | return [out1_2], [out1_1]
90 |
91 | out2 = torch.cat([out1_2, out1_1, out1], 1)
92 | out2_2 = self.m2_2(out2)
93 | out2_1 = self.m2_1(out2)
94 |
95 | if self.stop_at_stage == 2:
96 | return [out1_2, out2_2], [out1_1, out2_1]
97 |
98 | out3 = torch.cat([out2_2, out2_1, out1], 1)
99 | out3_2 = self.m3_2(out3)
100 | out3_1 = self.m3_1(out3)
101 |
102 | if self.stop_at_stage == 3:
103 | return [out1_2, out2_2, out3_2], [out1_1, out2_1, out3_1]
104 |
105 | out4 = torch.cat([out3_2, out3_1, out1], 1)
106 | out4_2 = self.m4_2(out4)
107 | out4_1 = self.m4_1(out4)
108 |
109 | if self.stop_at_stage == 4:
110 | return [out1_2, out2_2, out3_2, out4_2], [out1_1, out2_1, out3_1, out4_1]
111 |
112 | out5 = torch.cat([out4_2, out4_1, out1], 1)
113 | out5_2 = self.m5_2(out5)
114 | out5_1 = self.m5_1(out5)
115 |
116 | if self.stop_at_stage == 5:
117 | return [out1_2, out2_2, out3_2, out4_2, out5_2], [
118 | out1_1,
119 | out2_1,
120 | out3_1,
121 | out4_1,
122 | out5_1,
123 | ]
124 |
125 | out6 = torch.cat([out5_2, out5_1, out1], 1)
126 | out6_2 = self.m6_2(out6)
127 | out6_1 = self.m6_1(out6)
128 |
129 | return [out1_2, out2_2, out3_2, out4_2, out5_2, out6_2], [
130 | out1_1,
131 | out2_1,
132 | out3_1,
133 | out4_1,
134 | out5_1,
135 | out6_1,
136 | ]
137 |
138 | @staticmethod
139 | def create_stage(in_channels, out_channels, first=False):
140 | """Create the neural network layers for a single stage."""
141 |
142 | model = nn.Sequential()
143 | mid_channels = 128
144 | if first:
145 | padding = 1
146 | kernel = 3
147 | count = 6
148 | final_channels = 512
149 | else:
150 | padding = 3
151 | kernel = 7
152 | count = 10
153 | final_channels = mid_channels
154 |
155 | # First convolution
156 | model.add_module(
157 | "0",
158 | nn.Conv2d(
159 | in_channels, mid_channels, kernel_size=kernel, stride=1, padding=padding
160 | ),
161 | )
162 |
163 | # Middle convolutions
164 | i = 1
165 | while i < count - 1:
166 | model.add_module(str(i), nn.ReLU(inplace=True))
167 | i += 1
168 | model.add_module(
169 | str(i),
170 | nn.Conv2d(
171 | mid_channels,
172 | mid_channels,
173 | kernel_size=kernel,
174 | stride=1,
175 | padding=padding,
176 | ),
177 | )
178 | i += 1
179 |
180 | # Penultimate convolution
181 | model.add_module(str(i), nn.ReLU(inplace=True))
182 | i += 1
183 | model.add_module(
184 | str(i), nn.Conv2d(mid_channels, final_channels, kernel_size=1, stride=1)
185 | )
186 | i += 1
187 |
188 | # Last convolution
189 | model.add_module(str(i), nn.ReLU(inplace=True))
190 | i += 1
191 | model.add_module(
192 | str(i), nn.Conv2d(final_channels, out_channels, kernel_size=1, stride=1)
193 | )
194 | i += 1
195 |
196 | return model
197 |
--------------------------------------------------------------------------------
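
A minimal sketch that checks the output shapes of `DopeNetwork` on a dummy input (assuming it is run from the `common/` directory so that `models.py` is importable). The first 24 VGG-19 feature layers contain three max-pool layers, so each stage outputs maps at 1/8 of the input resolution, with 9 belief channels and 16 affinity channels:

```
# Minimal sketch: run an untrained DopeNetwork on a dummy batch and check the
# output shapes. The first 24 VGG-19 feature layers contain three max-pool
# layers, so the maps are at 1/8 of the input resolution.
import torch

from models import DopeNetwork

net = DopeNetwork()   # 6 stages, 9 belief maps, 16 affinity maps per stage
net.eval()

x = torch.randn(1, 3, 400, 400)   # dummy RGB input
with torch.no_grad():
    beliefs, affinities = net(x)

print(len(beliefs), len(affinities))   # 6 6 (one output per stage)
print(beliefs[-1].shape)               # torch.Size([1, 9, 50, 50])
print(affinities[-1].shape)            # torch.Size([1, 16, 50, 50])
```
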
/config/blenderproc_camera_info_example.yaml:
--------------------------------------------------------------------------------
1 | image_width: 500
2 | image_height: 500
3 | camera_name: dope_webcam_0
4 | camera_matrix:
5 | rows: 3
6 | cols: 3
7 | data: [603.5535070631239, 0, 249.5, 0, 603.5535070631239, 249.5, 0, 0, 1]
8 | distortion_model: plumb_bob
9 | distortion_coefficients:
10 | rows: 1
11 | cols: 5
12 | data: [0, 0, 0, 0, 0]
13 | rectification_matrix:
14 | rows: 3
15 | cols: 3
16 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
17 | projection_matrix:
18 | rows: 3
19 | cols: 4
20 | data: [603.5535070631239, 0, 249.5, 0, 0, 603.5535070631239, 249.5, 0, 0, 0, 1, 0]
21 |
--------------------------------------------------------------------------------
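
For a pinhole model, the focal length in this example corresponds to a horizontal field of view of roughly 45 degrees, via fx = (width / 2) / tan(fov_x / 2); a quick check:

```
# The example focal length above corresponds to roughly a 45-degree horizontal
# FOV for a 500-pixel-wide image: fx = (width / 2) / tan(fov_x / 2).
import math

width = 500
fx = 603.5535070631239
fov_x = 2.0 * math.atan((width / 2.0) / fx)
print(math.degrees(fov_x))   # ~45.0 degrees
```
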
/config/camera_info.yaml:
--------------------------------------------------------------------------------
1 | image_width: 640
2 | image_height: 480
3 | camera_name: dope_webcam_0
4 | camera_matrix:
5 | rows: 3
6 | cols: 3
7 | data: [641.5, 0, 320.0, 0, 641.5, 240.0, 0, 0, 1]
8 | distortion_model: plumb_bob
9 | distortion_coefficients:
10 | rows: 1
11 | cols: 5
12 | data: [0, 0, 0, 0, 0]
13 | rectification_matrix:
14 | rows: 3
15 | cols: 3
16 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
17 | projection_matrix:
18 | rows: 3
19 | cols: 4
20 | data: [641.5, 0, 320.0, 0, 0, 641.5, 240.0, 0, 0, 0, 1, 0]
21 |
--------------------------------------------------------------------------------
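
With this camera matrix, a 3D point expressed in the camera frame projects to pixel coordinates as u = fx·X/Z + cx and v = fy·Y/Z + cy. A small sketch (the example point is arbitrary):

```
# Minimal sketch: project a 3D point given in the camera frame (same units as
# the pose) to pixel coordinates using the camera matrix above.
import numpy as np

K = np.array([[641.5, 0.0, 320.0],
              [0.0, 641.5, 240.0],
              [0.0, 0.0, 1.0]])

point_cam = np.array([0.05, -0.02, 0.60])   # arbitrary example point
u, v, w = K @ point_cam
print(u / w, v / w)   # approx. (373.5, 218.6)
```
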
/config/config_pose.yaml:
--------------------------------------------------------------------------------
1 | topic_camera: "/dope/webcam/image_raw"
2 | topic_camera_info: "/dope/webcam/camera_info"
3 | topic_publishing: "dope"
4 | input_is_rectified: True # Whether the input image is rectified (strongly suggested!)
5 | downscale_height: 400 # if the input image is larger than this, scale it down to this pixel height
6 |
7 | # Comment any of these lines to prevent detection / pose estimation of that object
8 | weights: {
9 | # "cracker":"package://dope/weights/cracker_60.pth",
10 | # "gelatin":"package://dope/weights/gelatin_60.pth",
11 | # "meat":"package://dope/weights/meat_20.pth",
12 | # "mustard":"package://dope/weights/mustard_60.pth",
13 | "soup":"package://dope/weights/soup_60.pth",
14 | #"sugar":"package://dope/weights/sugar_60.pth",
15 | # "bleach":"package://dope/weights/bleach_28_dr.pth"
16 |
17 | # NEW OBJECTS - HOPE
18 | # "AlphabetSoup":"package://dope/weights/AlphabetSoup.pth",
19 | # "BBQSauce":"package://dope/weights/BBQSauce.pth",
20 | # "Butter":"package://dope/weights/Butter.pth",
21 | # "Cherries":"package://dope/weights/Cherries.pth",
22 | # "ChocolatePudding":"package://dope/weights/ChocolatePudding.pth",
23 | # "Cookies":"package://dope/weights/Cookies.pth",
24 | # "Corn":"package://dope/weights/Corn.pth",
25 | # "CreamCheese":"package://dope/weights/CreamCheese.pth",
26 | # "GreenBeans":"package://dope/weights/GreenBeans.pth",
27 | # "GranolaBars":"package://dope/weights/GranolaBars.pth",
28 | # "Ketchup":"package://dope/weights/Ketchup.pth",
29 | # "MacaroniAndCheese":"package://dope/weights/MacaroniAndCheese.pth",
30 | # "Mayo":"package://dope/weights/Mayo.pth",
31 | # "Milk":"package://dope/weights/Milk.pth",
32 | # "Mushrooms":"package://dope/weights/Mushrooms.pth",
33 | # "Mustard":"package://dope/weights/Mustard.pth",
34 | # "Parmesan":"package://dope/weights/Parmesan.pth",
35 | # "PeasAndCarrots":"package://dope/weights/PeasAndCarrots.pth",
36 | # "Peaches":"package://dope/weights/Peaches.pth",
37 | # "Pineapple":"package://dope/weights/Pineapple.pth",
38 | # "Popcorn":"package://dope/weights/Popcorn.pth",
39 | # "OrangeJuice":"package://dope/weights/OrangeJuice.pth",
40 | # "Raisins":"package://dope/weights/Raisins.pth",
41 | # "SaladDressing":"package://dope/weights/SaladDressing.pth",
42 | # "Spaghetti":"package://dope/weights/Spaghetti.pth",
43 | # "TomatoSauce":"package://dope/weights/TomatoSauce.pth",
44 | # "Tuna":"package://dope/weights/Tuna.pth",
45 | # "Yogurt":"package://dope/weights/Yogurt.pth",
46 |
47 | }
48 |
49 | # Cuboid dimension in cm x,y,z
50 | dimensions: {
51 | "cracker": [16.403600692749023,21.343700408935547,7.179999828338623],
52 | "gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059],
53 | "meat": [10.164673805236816,8.3542995452880859,5.7600898742675781],
54 | "mustard": [9.6024150848388672,19.130100250244141,5.824894905090332],
55 | "soup": [6.7659378051757813,10.185500144958496,6.771425724029541],
56 | "sugar": [9.267730712890625,17.625339508056641,4.5134143829345703],
57 | "bleach": [10.267730712890625,26.625339508056641,7.5134143829345703],
58 |
59 | # new objects
60 | "AlphabetSoup" : [ 8.3555002212524414, 7.1121001243591309, 6.6055998802185059 ],
61 | "Butter" : [ 5.282599925994873, 2.3935999870300293, 10.330100059509277 ],
62 | "Ketchup" : [ 14.860799789428711, 4.3368000984191895, 6.4513998031616211 ],
63 | "Pineapple" : [ 5.7623000144958496, 6.95989990234375, 6.567500114440918 ],
64 | "BBQSauce" : [ 14.832900047302246, 4.3478999137878418, 6.4632000923156738 ],
65 | "MacaroniAndCheese" : [ 16.625600814819336, 4.0180997848510742, 12.350899696350098 ],
66 | "Popcorn" : [ 8.4976997375488281, 3.825200080871582, 12.649200439453125 ],
67 | "Mayo" : [ 14.790200233459473, 4.1030998229980469, 6.4541001319885254 ],
68 | "Raisins" : [ 12.317500114440918, 3.9751999378204346, 8.5874996185302734 ],
69 | "Cherries" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ],
70 | "Milk" : [ 19.035800933837891, 7.326200008392334, 7.2154998779296875 ],
71 | "SaladDressing" : [ 14.744099617004395, 4.3695998191833496, 6.403900146484375 ],
72 | "ChocolatePudding" : [ 4.947199821472168, 2.9923000335693359, 8.3498001098632812 ],
73 | "Mushrooms" : [ 3.3322000503540039, 7.079899787902832, 6.5869998931884766 ],
74 | "Spaghetti" : [ 4.9836997985839844, 2.8492999076843262, 24.988100051879883 ],
75 | "Cookies" : [ 16.724300384521484, 4.015200138092041, 12.274600028991699 ],
76 | "Mustard" : [ 16.004999160766602, 4.8573999404907227, 6.5132999420166016 ],
77 | "TomatoSauce" : [ 8.2847003936767578, 7.0198001861572266, 6.6469998359680176 ],
78 | "Corn" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ],
79 | "OrangeJuice" : [ 19.248300552368164, 7.2781000137329102, 7.1582999229431152 ],
80 | "Tuna" : [ 3.2571001052856445, 7.0805997848510742, 6.5837001800537109 ],
81 | "CreamCheese" : [ 5.3206000328063965, 2.4230999946594238, 10.359000205993652 ],
82 | "Parmesan" : [ 10.286199569702148, 6.6093001365661621, 7.1117000579833984 ],
83 | "Yogurt" : [ 5.3677000999450684, 6.7961997985839844, 6.7915000915527344 ],
84 | "GranolaBars" : [ 12.400600433349609, 3.8738000392913818, 16.53380012512207 ],
85 | "Peaches" : [ 5.7781000137329102, 7.0961999893188477, 6.5925998687744141 ],
86 | "GreenBeans" : [ 5.758699893951416, 7.0608000755310059, 6.5732002258300781 ],
87 | "PeasAndCarrots" : [ 5.8512001037597656, 7.0636000633239746, 6.5918002128601074 ]
88 | }
89 |
90 | class_ids: {
91 | "cracker": 1,
92 | "gelatin": 2,
93 | "meat": 3,
94 | "mustard": 4,
95 | "soup": 5,
96 | "sugar": 6,
97 | "bleach": 7,
98 | "AlphabetSoup" : 9,
99 | "Ketchup" : 10,
100 | "Pineapple" : 11,
101 | "BBQSauce" : 12,
102 | "MacaroniAndCheese" : 13,
103 | "Popcorn" : 14,
104 | "Butter" : 15,
105 | "Mayo" : 16,
106 | "Raisins" : 17,
107 | "Cherries" : 18,
108 | "Milk" : 19,
109 | "SaladDressing" : 20,
110 | "ChocolatePudding" : 21,
111 | "Mushrooms" : 22,
112 | "Spaghetti" : 23,
113 | "Cookies" : 24,
114 | "Mustard" : 25,
115 | "TomatoSauce" : 26,
116 | "Corn" : 27,
117 | "OrangeJuice" : 28,
118 | "Tuna" : 29,
119 |     "CreamCheese" : 30,
120 | "Parmesan" : 31,
121 | "Yogurt" : 32,
122 | "GranolaBars" : 33,
123 | "Peaches" : 34,
124 | "GreenBeans" : 35,
125 | "PeasAndCarrots" : 36
126 | }
127 |
128 | draw_colors: {
129 | "cracker": [13, 255, 128], # green
130 |     "gelatin": [255, 255, 255], # white
131 | "meat": [0, 104, 255], # blue
132 | "mustard": [217,12, 232], # magenta
133 | "soup": [255, 101, 0], # orange
134 | "sugar": [232, 222, 12], # yellow
135 | "bleach": [232, 222, 12], # yellow
136 | }
137 |
138 | # optional: provide a transform that is applied to the pose returned by DOPE
139 | model_transforms: {
140 | # "cracker": [[ 0, 0, 1, 0],
141 | # [ 0, -1, 0, 0],
142 | # [ 1, 0, 0, 0],
143 | # [ 0, 0, 0, 1]]
144 | }
145 |
146 | # optional: if you provide a mesh of the object here, a mesh marker will be
147 | # published for visualization in RViz
148 | # You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb
149 | meshes: {
150 | # "cracker": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/003_cracker_box/google_16k/textured.obj",
151 | # "gelatin": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/009_gelatin_box/google_16k/textured.obj",
152 | # "meat": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/010_potted_meat_can/google_16k/textured.obj",
153 | # "mustard": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/006_mustard_bottle/google_16k/textured.obj",
154 | # "soup": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/005_tomato_soup_can/google_16k/textured.obj",
155 | # "sugar": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/004_sugar_box/google_16k/textured.obj",
156 | # "bleach": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/021_bleach_cleanser/google_16k/textured.obj",
157 | }
158 |
159 | # optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0.
160 | mesh_scales: {
161 | "cracker": 0.01,
162 | "gelatin": 0.01,
163 | "meat": 0.01,
164 | "mustard": 0.01,
165 | "soup": 0.01,
166 | "sugar": 0.01,
167 | "bleach": 0.01,
168 | }
169 |
170 | overlay_belief_images: True # Whether to overlay the input image on the belief images published on /dope/belief_[obj_name]
171 |
172 | # Config params for DOPE
173 | thresh_angle: 0.5
174 | thresh_map: 0.01
175 | sigma: 3
176 | thresh_points: 0.1
177 |
--------------------------------------------------------------------------------
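
Since `class_ids`, `dimensions` and `weights` are maintained by hand, a small sketch (not from the repository) that flags duplicate class ids and weights entries without dimensions can catch copy-paste slips; it assumes PyYAML is installed and the path matches your checkout:

```
# Minimal sketch (not part of the repository): flag duplicate class ids and
# weights entries without cuboid dimensions. Assumes PyYAML is installed and
# the path matches your checkout.
import collections
import yaml

with open("config/config_pose.yaml") as f:
    cfg = yaml.safe_load(f)

for class_id, count in collections.Counter(cfg["class_ids"].values()).items():
    if count > 1:
        print(f"duplicate class id: {class_id}")

for name in cfg.get("weights", {}):
    if name not in cfg["dimensions"]:
        print(f"'{name}' has weights but no dimensions entry")
```
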
/data_generation/backgrounds/README.md:
--------------------------------------------------------------------------------
1 | # Background Images
2 |
3 | Place background images here.
--------------------------------------------------------------------------------
/data_generation/backgrounds/messy_office.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/backgrounds/messy_office.png
--------------------------------------------------------------------------------
/data_generation/blenderproc_data_gen/README.md:
--------------------------------------------------------------------------------
1 | # Synthetic Data Generation with BlenderProc
2 |
3 | ## Installation
4 | BlenderProc can be installed with pip:
5 | ```
6 | pip install blenderproc
7 | ```
8 | If you run into trouble, please consult the [project's own GitHub page](https://github.com/DLR-RM/BlenderProc).
9 |
10 |
11 | ## Usage
12 |
13 | [BlenderProc](https://github.com/DLR-RM/BlenderProc) is intended to create a single scene and render multiple frames of it. Adding and removing objects (such as varying the number of distractors) will cause memory bloat and poor performance. To avoid this issue, we use a batching script (`run_blenderproc_datagen.py`) to run a standalone BlenderProc script several times.
14 |
15 |
16 |
17 | ### Usage example:
18 |
19 | Run the BlenderProc script in five parallel jobs, each generating 1000 frames. Each frame will have six instances of the object and ten randomly chosen distractor objects:
20 | ```
21 | ./run_blenderproc_datagen.py --nb_runs 5 --nb_frames 1000 --path_single_obj ../models/Ketchup/google_16k/textured.obj --nb_objects 6 --distractors_folder ~/data/google_scanned_models/ --nb_distractors 10 --backgrounds_folder ../dome_hdri_haven/
22 | ```
23 |
24 | Parameters of the top-level script can be shown by running
25 | ```
26 | python ./run_blenderproc_datagen.py --help
27 | ```
28 |
29 | Note that, as a BlenderProc script, `generate_training_data.py` cannot be invoked with Python directly; it must be run via the `blenderproc` launch script. To discover its command-line parameters, you must look
30 | at the source code itself; `blenderproc run ./generate_training_data.py --help` will not report
31 | them properly.
32 |
33 | BlenderProc searches for python modules in a different order than when invoking Python by itself. If you run into an issue where `generate_training_data.py` fails to import modules that you have installed, you may have to re-install them via BlenderProc; e.g. `blenderproc pip install pyquaternion`.
34 |
35 | ## Versioning Notes
36 |
37 | 10 June 2024
38 |
39 | The order of the `projected_cuboid` points generated by the BlenderProc scripts changed with
40 | changelist `22a2468`. This fix rotated the points 180 degrees around the Z (vertical) axis. Do not
41 | intermingle data generated before this change with data generated after it.
42 |
--------------------------------------------------------------------------------
/data_generation/blenderproc_data_gen/run_blenderproc_datagen.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import argparse
4 | import multiprocessing
5 | import os
6 | from queue import Queue
7 | import subprocess
8 | import sys
9 |
10 |
11 | parser = argparse.ArgumentParser()
12 | ## Parameters for this script
13 | parser.add_argument(
14 | '--nb_runs',
15 | default=1,
16 | type=int,
17 | help='Number of times the datagen script is run. Each time it is run, a new set of '
18 | 'distractors is selected.'
19 | )
20 | parser.add_argument(
21 | '--nb_workers',
22 | default=0,
23 | type=int,
24 | help='Number of parallel blenderproc workers to run. The default of 0 will create '
25 | 'one worker for every CPU core'
26 | )
27 |
28 |
29 | opt, unknown = parser.parse_known_args()
30 |
31 | num_workers = min(opt.nb_workers, multiprocessing.cpu_count())
32 | if num_workers == 0:
33 | num_workers = multiprocessing.cpu_count()
34 |
35 | amount_of_runs = opt.nb_runs
36 |
37 | # set the folder in which the generation script is located
38 | rerun_folder = os.path.abspath(os.path.dirname(__file__))
39 |
40 | Q = Queue(maxsize = num_workers)
41 | for run_id in range(amount_of_runs):
42 | if Q.full():
43 | proc = Q.get()
44 | proc.wait()
45 |
46 | # execute one BlenderProc run
47 | cmd = ["blenderproc", "run", os.path.join(rerun_folder, "generate_training_data.py")]
48 | cmd.extend(unknown)
49 | cmd.extend(['--run_id', str(run_id)])
50 | p = subprocess.Popen(" ".join(cmd), shell=True)
51 | Q.put(p)
52 |
--------------------------------------------------------------------------------
/data_generation/dome_hdri_haven/download.md:
--------------------------------------------------------------------------------
1 | Download HDRI maps from https://polyhaven.com/hdris
--------------------------------------------------------------------------------
/data_generation/models/Ketchup/google_16k/texture_map.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/models/Ketchup/google_16k/texture_map.png
--------------------------------------------------------------------------------
/data_generation/models/Ketchup/google_16k/texture_map_flat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/models/Ketchup/google_16k/texture_map_flat.png
--------------------------------------------------------------------------------
/data_generation/models/Ketchup/google_16k/textured.mtl:
--------------------------------------------------------------------------------
1 | newmtl textured:_texture
2 | map_Kd texture_map.png
3 |
--------------------------------------------------------------------------------
/data_generation/models/Ketchup/google_16k/textured.obj.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/models/Ketchup/google_16k/textured.obj.bin
--------------------------------------------------------------------------------
/data_generation/models/Ketchup/google_16k/textured.obj.json:
--------------------------------------------------------------------------------
1 | {
2 | "created_at": "2019-11-20T15:13:41.741628",
3 | "version": "0.1",
4 | "mtllibs": [
5 | "textured.mtl"
6 | ],
7 | "vertex_buffers": [
8 | {
9 | "material": "textured:_texture",
10 | "vertex_format": "T2F_N3F_V3F",
11 | "byte_offset": 0,
12 | "byte_length": 1491360
13 | }
14 | ]
15 | }
--------------------------------------------------------------------------------
/data_generation/models/Ketchup/google_16k/textured_simple.mtl:
--------------------------------------------------------------------------------
1 | # Blender MTL File: 'None'
2 | # Material Count: 1
3 |
4 | newmtl textured:_texture
5 | Ns 0.000000
6 | Ka 1.000000 1.000000 1.000000
7 | Kd 0.800000 0.800000 0.800000
8 | Ks 0.000000 0.000000 0.000000
9 | Ke 0.000000 0.000000 0.000000
10 | Ni 1.450000
11 | d 1.000000
12 | illum 1
13 | map_Kd texture_map.png
14 |
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/.gitignore:
--------------------------------------------------------------------------------
1 | output/
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/debug_json_ros_node.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """
3 | This is a simple ROS node that reads the various transform data from a set of
4 | json files that were generated by `nvisii_data_gen` and publishes them as TF
5 | transforms over ROS, so that they can be visualized using RViz (to debug
6 | whether the transforms are correct). It was only used while debugging the json
7 | output of `nvisii_data_gen`, so most users should not need this file. It is
8 | only left in here as an example on how to use the transformations from the json
9 | fields in ROS.
10 | """
11 |
12 | import json
13 | import time
14 |
15 | import numpy as np
16 | import rospy
17 | import tf
18 | from tf.transformations import quaternion_from_matrix, translation_from_matrix
19 |
20 | rospy.init_node("debug_json")
21 |
22 | tf_broadcaster = tf.TransformBroadcaster()
23 |
24 | while not rospy.is_shutdown():
25 | # This assumes that there are (at least) 50 frames (00000.json, 00001.json, ...) in the directory.
26 | for frame_number in range(50):
27 | if rospy.is_shutdown():
28 | break
29 | path = f"{str(frame_number).zfill(5)}.json"
30 | with open(path) as json_file:
31 | conf = json.load(json_file)
32 | print(path)
33 |
34 | stamp = rospy.Time.now()
35 |
36 | camera_data = conf['camera_data']
37 | tf_broadcaster.sendTransform(translation=camera_data['location_worldframe'],
38 | rotation=camera_data['quaternion_xyzw_worldframe'],
39 | time=stamp,
40 | parent='world',
41 | child='camera',
42 | )
43 | # transpose to transform between column-major and row-major order
44 | camera_view_matrix = np.array(camera_data['camera_view_matrix']).transpose()
45 | tf_broadcaster.sendTransform(translation=translation_from_matrix(camera_view_matrix),
46 | rotation=quaternion_from_matrix(camera_view_matrix),
47 | time=stamp,
48 | parent='camera',
49 | child='world_from_matrix',
50 | )
51 | for object_data in conf['objects']:
52 | tf_broadcaster.sendTransform(translation=object_data['location_worldframe'],
53 | rotation=object_data['quaternion_xyzw_worldframe'],
54 | time=stamp,
55 | parent='world',
56 | child=f"{object_data['name']}_world",
57 | )
58 | tf_broadcaster.sendTransform(translation=object_data['location'],
59 | rotation=object_data['quaternion_xyzw'],
60 | time=stamp,
61 | parent='camera',
62 | child=f"{object_data['name']}_cam",
63 | )
64 | local_to_world_matrix = np.array(object_data['local_to_world_matrix']).transpose()
65 | tf_broadcaster.sendTransform(translation=translation_from_matrix(local_to_world_matrix),
66 | rotation=quaternion_from_matrix(local_to_world_matrix),
67 | time=stamp,
68 | parent='world',
69 | child=f"{object_data['name']}_cam_from_matrix",
70 | )
71 |
72 | time.sleep(1 / 30)
73 |
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/doc/videos/cylinder_nosym.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/doc/videos/cylinder_nosym.mp4
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/doc/videos/cylinder_sym.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/doc/videos/cylinder_sym.mp4
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/doc/videos/hex_screw.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/doc/videos/hex_screw.mp4
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/download_google_scanned_objects.py:
--------------------------------------------------------------------------------
1 | import requests
2 | import simplejson as json
3 | import subprocess
4 |
5 | collection_name = 'Google%20Scanned%20Objects'
6 | owner_name = 'GoogleResearch'
7 | # The server URL
8 | base_url ='https://fuel.ignitionrobotics.org'
9 | # Path to get the models in the collection
10 | # (the page number placeholder is filled in inside the download loop below)
11 | next_url = '/1.0/models?per_page=100&page={}&q=collections:' + collection_name
12 | # Path to download a single model in the collection
13 | download_url = 'https://fuel.ignitionrobotics.org/1.0/{}/models/'.format(owner_name)
14 | count = 0
15 | total_count = 0
16 | # Iterate over the pages
17 | # while next_url:
18 | downloaded = {}
19 |
20 | subprocess.call(['mkdir','google_scanned_models/'])
21 |
22 |
23 |
24 | for i in range(1,1100):
25 | print(count)
26 | # Get the contents of the current page.
27 | try:
28 | r = requests.get(base_url + next_url.format(str(i)))
29 | # print(base_url + next_url)
30 | # print(r.headers)
31 | # break
32 | # Convert to JSON
33 | # print(r.text)
34 | models = json.loads(r.text)
35 |     except Exception:
36 | continue
37 | # print(models)
38 | # break
39 | # Get the next page's URL
40 | # next_url = ''
41 | # if 'Link' in r.headers:
42 | # links = r.headers['Link'].split(',')
43 | # for link in links:
44 | # parts = link.split(';')
45 | # if 'next' in parts[1]:
46 | # next_url = parts[0].replace('<','').replace('>','')
47 | # Get the total number of models to download
48 | if total_count <= 0 and 'X-Total-Count' in r.headers:
49 | total_count = int(r.headers['X-Total-Count'])
50 | # Download each model
51 | for model in models:
52 | # count+=1
53 | model_name = model['name']
54 | if model_name not in downloaded:
55 | downloaded[model_name] = 1
56 | count+=1
57 | print ('Downloading (%d/%d) %s' % (count, total_count, model_name))
58 | download = requests.get(download_url+model_name+'.zip', stream=True)
59 | with open("google_scanned_models/"+model_name+'.zip', 'wb') as fd:
60 | for chunk in download.iter_content(chunk_size=1024*1024):
61 | fd.write(chunk)
62 |
63 | subprocess.call(['unzip',"google_scanned_models/"+model_name+'.zip','-d', "google_scanned_models/"+model_name])
64 | subprocess.call(['rm',"google_scanned_models/"+model_name+'.zip'])
65 |
66 |
67 |
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/generate_dataset.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | import random
3 | import subprocess
4 |
5 |
6 | # 100 runs x 200 frames = 20,000 images in total
7 |
8 | for i in range(0, 100):
9 | to_call = [
10 | "python",'single_video_pybullet.py',
11 | '--spp','10',
12 | '--nb_frames', '200',
13 | '--nb_objects',str(int(random.uniform(50,75))),
14 | '--scale', '0.01',
15 | '--outf',f"dataset/{str(i).zfill(3)}",
16 | ]
17 | subprocess.call(to_call)
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/models_with_symmetries/cylinder/google_16k/model_info.json:
--------------------------------------------------------------------------------
1 | {
2 | "symmetries_discrete": [[1, 0, 0, 0,
3 | 0, -1, 0, 0,
4 | 0, 0, -1, 0,
5 | 0, 0, 0, 1]],
6 | "symmetries_continuous": [{"axis": [0, 0, 1], "offset": [0, 0, 0]}],
7 | "align_axes": [{"object": [0, 1, 0], "camera": [0, 0, 1]}, {"object": [0, 0, 1], "camera": [0, 1, 0]}]
8 | }
9 |
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/models_with_symmetries/cylinder/google_16k/texture_map_flat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/models_with_symmetries/cylinder/google_16k/texture_map_flat.png
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/models_with_symmetries/cylinder/google_16k/textured.mtl:
--------------------------------------------------------------------------------
1 | # Blender MTL File: 'None'
2 | # Material Count: 1
3 |
4 | newmtl cylinder_material
5 | Ns 225.000000
6 | Ka 1.000000 1.000000 1.000000
7 | Kd 0.800000 0.800000 0.800000
8 | Ks 0.500000 0.500000 0.500000
9 | Ke 0.000000 0.000000 0.000000
10 | Ni 1.450000
11 | d 1.000000
12 | illum 2
13 | map_Kd texture_map_flat.png
14 |
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/model_info.json:
--------------------------------------------------------------------------------
1 | {
2 | "symmetries_discrete": [[ 0.5, -0.866, 0, 0,
3 | 0.866, 0.5, 0, 0,
4 | 0, 0, 1, 0,
5 | 0, 0, 0, 1],
6 | [-0.5, -0.866, 0, 0,
7 | 0.866, -0.5, 0, 0,
8 | 0, 0, 1, 0,
9 | 0, 0, 0, 1],
10 | [-1, 0, 0, 0,
11 | 0, -1, 0, 0,
12 | 0, 0, 1, 0,
13 | 0, 0, 0, 1],
14 | [-0.5, 0.866, 0, 0,
15 | -0.866, -0.5, 0, 0,
16 | 0, 0, 1, 0,
17 | 0, 0, 0, 1],
18 | [ 0.5, 0.866, 0, 0,
19 | -0.866, 0.5, 0, 0,
20 | 0, 0, 1, 0,
21 | 0, 0, 0, 1]],
22 | "align_axes": [{"object": [0, 1, 0], "camera": [0, 0, 1]}]
23 | }
24 |
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/texture_map_flat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/texture_map_flat.png
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/textured.mtl:
--------------------------------------------------------------------------------
1 | # Blender MTL File: 'textured.blend'
2 | # Material Count: 1
3 |
4 | newmtl hexagon_material
5 | Ns 225.000000
6 | Ka 1.000000 1.000000 1.000000
7 | Kd 0.800000 0.800000 0.800000
8 | Ks 0.500000 0.500000 0.500000
9 | Ke 0.000000 0.000000 0.000000
10 | Ni 1.450000
11 | d 1.000000
12 | illum 2
13 | map_Kd texture_map_flat.png
14 |
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/textured.obj:
--------------------------------------------------------------------------------
1 | # Blender v3.0.1 OBJ File: 'textured.blend'
2 | # www.blender.org
3 | mtllib textured.mtl
4 | o Cylinder
5 | v -0.000000 0.075000 0.000000
6 | v -0.000000 0.075000 0.075000
7 | v 0.064952 0.037500 0.000000
8 | v 0.064952 0.037500 0.075000
9 | v 0.064952 -0.037500 0.000000
10 | v 0.064952 -0.037500 0.075000
11 | v -0.000000 -0.075000 0.000000
12 | v -0.000000 -0.075000 0.075000
13 | v -0.064952 -0.037500 0.000000
14 | v -0.064952 -0.037500 0.075000
15 | v -0.064952 0.037500 0.000000
16 | v -0.064952 0.037500 0.075000
17 | v 0.000000 0.037500 -0.075000
18 | v 0.000000 0.037500 0.000000
19 | v 0.007316 0.036779 -0.075000
20 | v 0.007316 0.036779 0.000000
21 | v 0.014351 0.034645 -0.075000
22 | v 0.014351 0.034645 0.000000
23 | v 0.020834 0.031180 -0.075000
24 | v 0.020834 0.031180 0.000000
25 | v 0.026517 0.026517 -0.075000
26 | v 0.026517 0.026517 0.000000
27 | v 0.031180 0.020834 -0.075000
28 | v 0.031180 0.020834 0.000000
29 | v 0.034645 0.014351 -0.075000
30 | v 0.034645 0.014351 0.000000
31 | v 0.036779 0.007316 -0.075000
32 | v 0.036779 0.007316 0.000000
33 | v 0.037500 0.000000 -0.075000
34 | v 0.037500 -0.000000 -0.000000
35 | v 0.036779 -0.007316 -0.075000
36 | v 0.036779 -0.007316 -0.000000
37 | v 0.034645 -0.014351 -0.075000
38 | v 0.034645 -0.014351 -0.000000
39 | v 0.031180 -0.020834 -0.075000
40 | v 0.031180 -0.020834 -0.000000
41 | v 0.026517 -0.026517 -0.075000
42 | v 0.026517 -0.026517 -0.000000
43 | v 0.020834 -0.031180 -0.075000
44 | v 0.020834 -0.031180 -0.000000
45 | v 0.014351 -0.034645 -0.075000
46 | v 0.014351 -0.034645 -0.000000
47 | v 0.007316 -0.036779 -0.075000
48 | v 0.007316 -0.036779 -0.000000
49 | v -0.000000 -0.037500 -0.075000
50 | v -0.000000 -0.037500 -0.000000
51 | v -0.007316 -0.036779 -0.075000
52 | v -0.007316 -0.036779 -0.000000
53 | v -0.014351 -0.034645 -0.075000
54 | v -0.014351 -0.034645 -0.000000
55 | v -0.020834 -0.031180 -0.075000
56 | v -0.020834 -0.031180 -0.000000
57 | v -0.026517 -0.026517 -0.075000
58 | v -0.026517 -0.026517 -0.000000
59 | v -0.031180 -0.020834 -0.075000
60 | v -0.031180 -0.020834 -0.000000
61 | v -0.034645 -0.014351 -0.075000
62 | v -0.034645 -0.014351 -0.000000
63 | v -0.036779 -0.007316 -0.075000
64 | v -0.036779 -0.007316 -0.000000
65 | v -0.037500 0.000000 -0.075000
66 | v -0.037500 -0.000000 -0.000000
67 | v -0.036779 0.007316 -0.075000
68 | v -0.036779 0.007316 0.000000
69 | v -0.034645 0.014351 -0.075000
70 | v -0.034645 0.014351 0.000000
71 | v -0.031180 0.020834 -0.075000
72 | v -0.031180 0.020834 0.000000
73 | v -0.026517 0.026517 -0.075000
74 | v -0.026517 0.026516 0.000000
75 | v -0.020834 0.031180 -0.075000
76 | v -0.020834 0.031180 0.000000
77 | v -0.014351 0.034645 -0.075000
78 | v -0.014351 0.034645 0.000000
79 | v -0.007316 0.036779 -0.075000
80 | v -0.007316 0.036779 0.000000
81 | vt 1.000000 0.749602
82 | vt 1.000000 0.916269
83 | vt 0.833333 0.916269
84 | vt 0.833333 0.749602
85 | vt 0.666667 0.916269
86 | vt 0.666667 0.749602
87 | vt 0.500000 0.916269
88 | vt 0.500000 0.749602
89 | vt 0.333333 0.916269
90 | vt 0.333333 0.749602
91 | vt 0.457846 0.130000
92 | vt 0.457846 0.370000
93 | vt 0.250000 0.490000
94 | vt 0.042154 0.370000
95 | vt 0.042154 0.130000
96 | vt 0.250000 0.010000
97 | vt 0.166667 0.916269
98 | vt 0.166667 0.749602
99 | vt -0.000000 0.916269
100 | vt -0.000000 0.749602
101 | vt 0.785595 0.311550
102 | vt 0.889518 0.371550
103 | vt 0.993441 0.311550
104 | vt 0.993441 0.191550
105 | vt 0.889518 0.131550
106 | vt 0.785595 0.191550
107 | vt 1.000000 0.505485
108 | vt 1.000000 0.672152
109 | vt 0.968750 0.672152
110 | vt 0.968750 0.505485
111 | vt 0.937500 0.672152
112 | vt 0.937500 0.505485
113 | vt 0.906250 0.672152
114 | vt 0.906250 0.505485
115 | vt 0.875000 0.672152
116 | vt 0.875000 0.505485
117 | vt 0.843750 0.672152
118 | vt 0.843750 0.505485
119 | vt 0.812500 0.672152
120 | vt 0.812500 0.505485
121 | vt 0.781250 0.672152
122 | vt 0.781250 0.505485
123 | vt 0.750000 0.672152
124 | vt 0.750000 0.505485
125 | vt 0.718750 0.672152
126 | vt 0.718750 0.505485
127 | vt 0.687500 0.672152
128 | vt 0.687500 0.505485
129 | vt 0.656250 0.672152
130 | vt 0.656250 0.505485
131 | vt 0.625000 0.672152
132 | vt 0.625000 0.505485
133 | vt 0.593750 0.672152
134 | vt 0.593750 0.505485
135 | vt 0.562500 0.672152
136 | vt 0.562500 0.505485
137 | vt 0.531250 0.672152
138 | vt 0.531250 0.505485
139 | vt 0.500000 0.672152
140 | vt 0.500000 0.505485
141 | vt 0.468750 0.672152
142 | vt 0.468750 0.505485
143 | vt 0.437500 0.672152
144 | vt 0.437500 0.505485
145 | vt 0.406250 0.672152
146 | vt 0.406250 0.505485
147 | vt 0.375000 0.672152
148 | vt 0.375000 0.505485
149 | vt 0.343750 0.672152
150 | vt 0.343750 0.505485
151 | vt 0.312500 0.672152
152 | vt 0.312500 0.505485
153 | vt 0.281250 0.672152
154 | vt 0.281250 0.505485
155 | vt 0.250000 0.672152
156 | vt 0.250000 0.505485
157 | vt 0.218750 0.672152
158 | vt 0.218750 0.505485
159 | vt 0.187500 0.672152
160 | vt 0.187500 0.505485
161 | vt 0.156250 0.672152
162 | vt 0.156250 0.505485
163 | vt 0.125000 0.672152
164 | vt 0.125000 0.505485
165 | vt 0.093750 0.672152
166 | vt 0.093750 0.505485
167 | vt 0.062500 0.672152
168 | vt 0.062500 0.505485
169 | vt 0.031250 0.672152
170 | vt 0.031250 0.505485
171 | vt 0.000000 0.672152
172 | vt 0.000000 0.505485
173 | vt 0.750000 0.370000
174 | vt 0.726589 0.367694
175 | vt 0.704078 0.360866
176 | vt 0.683332 0.349776
177 | vt 0.665147 0.334853
178 | vt 0.650224 0.316668
179 | vt 0.639134 0.295922
180 | vt 0.632306 0.273411
181 | vt 0.630000 0.250000
182 | vt 0.632306 0.226589
183 | vt 0.639134 0.204078
184 | vt 0.650224 0.183332
185 | vt 0.665147 0.165147
186 | vt 0.683332 0.150224
187 | vt 0.704078 0.139134
188 | vt 0.726589 0.132306
189 | vt 0.750000 0.130000
190 | vt 0.773411 0.132306
191 | vt 0.795922 0.139134
192 | vt 0.816668 0.150224
193 | vt 0.834853 0.165147
194 | vt 0.849776 0.183332
195 | vt 0.860866 0.204078
196 | vt 0.867694 0.226589
197 | vt 0.870000 0.250000
198 | vt 0.867694 0.273411
199 | vt 0.860866 0.295922
200 | vt 0.849776 0.316668
201 | vt 0.834853 0.334853
202 | vt 0.816668 0.349776
203 | vt 0.795922 0.360866
204 | vt 0.773411 0.367694
205 | vn 0.5000 0.8660 0.0000
206 | vn 1.0000 0.0000 0.0000
207 | vn 0.5000 -0.8660 0.0000
208 | vn -0.5000 -0.8660 0.0000
209 | vn 0.0000 0.0000 1.0000
210 | vn -1.0000 0.0000 0.0000
211 | vn -0.5000 0.8660 0.0000
212 | vn 0.0000 0.0000 -1.0000
213 | vn 0.0980 0.9952 0.0000
214 | vn 0.2903 0.9569 0.0000
215 | vn 0.4714 0.8819 0.0000
216 | vn 0.6344 0.7730 0.0000
217 | vn 0.7730 0.6344 0.0000
218 | vn 0.8819 0.4714 0.0000
219 | vn 0.9569 0.2903 0.0000
220 | vn 0.9952 0.0980 0.0000
221 | vn 0.9952 -0.0980 -0.0000
222 | vn 0.9569 -0.2903 -0.0000
223 | vn 0.8819 -0.4714 -0.0000
224 | vn 0.7730 -0.6344 -0.0000
225 | vn 0.6344 -0.7730 -0.0000
226 | vn 0.4714 -0.8819 -0.0000
227 | vn 0.2903 -0.9569 0.0000
228 | vn 0.0980 -0.9952 -0.0000
229 | vn -0.0980 -0.9952 -0.0000
230 | vn -0.2903 -0.9569 -0.0000
231 | vn -0.4714 -0.8819 -0.0000
232 | vn -0.6344 -0.7730 -0.0000
233 | vn -0.7730 -0.6344 -0.0000
234 | vn -0.8819 -0.4714 -0.0000
235 | vn -0.9569 -0.2903 -0.0000
236 | vn -0.9952 -0.0980 -0.0000
237 | vn -0.9952 0.0980 0.0000
238 | vn -0.9569 0.2903 0.0000
239 | vn -0.8819 0.4714 0.0000
240 | vn -0.7730 0.6344 0.0000
241 | vn -0.6344 0.7730 0.0000
242 | vn -0.4714 0.8819 0.0000
243 | vn -0.2903 0.9569 0.0000
244 | vn -0.0980 0.9952 0.0000
245 | usemtl hexagon_material
246 | s off
247 | f 1/1/1 2/2/1 4/3/1 3/4/1
248 | f 3/4/2 4/3/2 6/5/2 5/6/2
249 | f 5/6/3 6/5/3 8/7/3 7/8/3
250 | f 7/8/4 8/7/4 10/9/4 9/10/4
251 | f 6/11/5 4/12/5 2/13/5 12/14/5 10/15/5 8/16/5
252 | f 9/10/6 10/9/6 12/17/6 11/18/6
253 | f 11/18/7 12/17/7 2/19/7 1/20/7
254 | f 11/21/8 1/22/8 3/23/8 5/24/8 7/25/8 9/26/8
255 | f 13/27/9 14/28/9 16/29/9 15/30/9
256 | f 15/30/10 16/29/10 18/31/10 17/32/10
257 | f 17/32/11 18/31/11 20/33/11 19/34/11
258 | f 19/34/12 20/33/12 22/35/12 21/36/12
259 | f 21/36/13 22/35/13 24/37/13 23/38/13
260 | f 23/38/14 24/37/14 26/39/14 25/40/14
261 | f 25/40/15 26/39/15 28/41/15 27/42/15
262 | f 27/42/16 28/41/16 30/43/16 29/44/16
263 | f 29/44/17 30/43/17 32/45/17 31/46/17
264 | f 31/46/18 32/45/18 34/47/18 33/48/18
265 | f 33/48/19 34/47/19 36/49/19 35/50/19
266 | f 35/50/20 36/49/20 38/51/20 37/52/20
267 | f 37/52/21 38/51/21 40/53/21 39/54/21
268 | f 39/54/22 40/53/22 42/55/22 41/56/22
269 | f 41/56/23 42/55/23 44/57/23 43/58/23
270 | f 43/58/24 44/57/24 46/59/24 45/60/24
271 | f 45/60/25 46/59/25 48/61/25 47/62/25
272 | f 47/62/26 48/61/26 50/63/26 49/64/26
273 | f 49/64/27 50/63/27 52/65/27 51/66/27
274 | f 51/66/28 52/65/28 54/67/28 53/68/28
275 | f 53/68/29 54/67/29 56/69/29 55/70/29
276 | f 55/70/30 56/69/30 58/71/30 57/72/30
277 | f 57/72/31 58/71/31 60/73/31 59/74/31
278 | f 59/74/32 60/73/32 62/75/32 61/76/32
279 | f 61/76/33 62/75/33 64/77/33 63/78/33
280 | f 63/78/34 64/77/34 66/79/34 65/80/34
281 | f 65/80/35 66/79/35 68/81/35 67/82/35
282 | f 67/82/36 68/81/36 70/83/36 69/84/36
283 | f 69/84/37 70/83/37 72/85/37 71/86/37
284 | f 71/86/38 72/85/38 74/87/38 73/88/38
285 | f 73/88/39 74/87/39 76/89/39 75/90/39
286 | f 75/90/40 76/89/40 14/91/40 13/92/40
287 | f 13/93/8 15/94/8 17/95/8 19/96/8 21/97/8 23/98/8 25/99/8 27/100/8 29/101/8 31/102/8 33/103/8 35/104/8 37/105/8 39/106/8 41/107/8 43/108/8 45/109/8 47/110/8 49/111/8 51/112/8 53/113/8 55/114/8 57/115/8 59/116/8 61/117/8 63/118/8 65/119/8 67/120/8 69/121/8 71/122/8 73/123/8 75/124/8
288 |
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/output/output_example/00000.depth.exr:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/output/output_example/00000.depth.exr
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/output/output_example/00000.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/output/output_example/00000.png
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/output/output_example/00000.seg.exr:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/output/output_example/00000.seg.exr
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/readme.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | # Description
4 |
5 | These sample scripts use [NViSII](https://github.com/owl-project/NVISII) to generate synthetic data for training the [DOPE](https://github.com/NVlabs/Deep_Object_Pose) object pose estimator.
6 | The data can also be used for training other networks.
7 | To generate the data, you will need NVIDIA drivers 450 or above.
8 | We also highly recommend a GPU with RTX capabilities, as ray tracing can be costly on a non-RTX GPU.
9 |
10 | The code in this repo is a cleaned-up version of what was used to generate the data called `dome` in our NViSII [paper](https://arxiv.org/abs/2105.13962).
11 |
12 |
13 |
14 | # Installation
15 | ```
16 | pip install -r requirements.txt
17 | ```
18 |
19 | ## HDRI maps
20 | You will need to download HDRI maps to illuminate the scene. These can be found freely on [polyhaven](https://polyhaven.com/hdris).
21 | For testing purposes, you can download a single one here:
22 | ```
23 | wget https://www.dropbox.com/s/na3vo8rca7feoiq/teatro_massimo_2k.hdr
24 | mv teatro_massimo_2k.hdr dome_hdri_haven/
25 | ```
26 |
27 |
28 | ## Distractors
29 |
30 | The script, as is, expects some objects to be used as distractors. It currently uses the [Google scanned objects dataset](https://app.ignitionrobotics.org/GoogleResearch/fuel/collections/Google%20Scanned%20Objects), which can be downloaded automatically with the following:
31 |
32 | ```
33 | python download_google_scanned_objects.py
34 | ```
35 |
36 | If you do *not* want to use the distractors, use the following argument when running the script: `--nb_distractors 0`.
37 |
38 | # Running the script
39 |
40 | If you downloaded everything from the previous steps, _e.g._, a single HDRI map and some distractors from Google scanned objects, you can run the following command:
41 |
42 | ```
43 | python single_video_pybullet.py --nb_frames 1 --scale 0.01
44 | ```
45 |
46 | This will generate a single frame example in `output/output_example/`. The image should be similar to the following:
47 |
48 | 
49 |
50 | The script has a few controls that are exposed at the beginning of the file.
51 | Please consult `single_video_pybullet.py --help` for a complete list of parameters.
52 | The major parameters are as follows (see the example invocation after this list):
53 | - `--spp`: number of samples per pixel; higher values produce better-quality images.
54 | - `--nb_frames`: number of images to export.
55 | - `--outf`: folder in which to store the data.
56 | - `--nb_objects`: number of objects to load; the same object may be loaded multiple times.
57 | - `--nb_distractors`: number of distractor objects to add, using 3D models from Google scanned objects.
58 |
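For example, a larger run could look like the following (a sketch only; the flag values and the `dataset/run0` output folder are placeholders to adapt to your GPU budget and storage):

```
python single_video_pybullet.py \
    --spp 100 \
    --nb_frames 200 \
    --nb_objects 60 \
    --nb_distractors 15 \
    --scale 0.01 \
    --outf dataset/run0
```
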
59 | # Adding your own 3D models
60 |
61 | You can simply use `--path_single_obj` to load your own 3D model. There are, however, some limitations when exporting the metadata if the obj is complex. Try to keep it as a single obj with a single texture, similar to the one provided in the repo.
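
For instance (a sketch; `my_models/mug/textured.obj` is a hypothetical path):

```
python single_video_pybullet.py --path_single_obj my_models/mug/textured.obj --nb_frames 1 --scale 0.01
```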
62 |
63 | ## Modifying the code to load your object
64 |
65 | The script loads 3D models expressed in the format introduced by the YCB dataset.
66 | However, it is fairly easy to change the script to load your own 3D model; [NViSII](https://github.com/owl-project/NVISII) can load other formats
67 | as well, not just `obj` files. In `single_video_pybullet.py`, find the following code:
68 |
69 | ```python
70 | for i_obj in range(int(opt.nb_objects)):
71 |
72 | toy_to_load = google_content_folder[random.randint(0,len(google_content_folder)-1)]
73 |
74 | obj_to_load = toy_to_load + "/google_16k/textured.obj"
75 | texture_to_load = toy_to_load + "/google_16k/texture_map_flat.png"
76 | name = "hope_" + toy_to_load.split('/')[-2] + f"_{i_obj}"
77 | adding_mesh_object(name,obj_to_load,texture_to_load,scale=0.01)
78 | ```
79 | You can change the `obj_to_load` and `texture_to_load` to match your data format. If your file format is quite different, for example you are using a `.glb` file, then in the function `adding_mesh_object()` you will need to change the following:
80 |
81 | ```python
82 | if obj_to_load in mesh_loaded:
83 | toy_mesh = mesh_loaded[obj_to_load]
84 | else:
85 | toy_mesh = visii.mesh.create_from_file(name,obj_to_load)
86 | mesh_loaded[obj_to_load] = toy_mesh
87 | ```
88 | `visii.mesh.create_from_file` is the function used to load the data, and it can handle different file formats. The rest of `adding_mesh_object()` also loads the right texture and applies a material, and it creates a collision mesh so that the object can move in the physics simulation.
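
As a minimal sketch (hypothetical paths; only `visii.mesh.create_from_file` and the variables from the block above are assumed), swapping in a `.glb` file could look like this:

```python
# hypothetical paths, for illustration only
obj_to_load = "my_models/my_object/my_object.glb"
texture_to_load = "my_models/my_object/albedo.png"

if obj_to_load in mesh_loaded:
    toy_mesh = mesh_loaded[obj_to_load]
else:
    # create_from_file handles several mesh formats, not only .obj
    toy_mesh = visii.mesh.create_from_file(name, obj_to_load)
    mesh_loaded[obj_to_load] = toy_mesh
```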
89 |
90 |
91 |
92 |
93 |
94 | ## Updates
95 |
96 | - 11/01/2022: Added the possibility to load a single object with `--path_single_obj`; just give the direct path to the object.
97 | This feature uses [nvisii.import_scene()](https://nvisii.com/nvisii.html#nvisii.import_scene).
98 | If the obj file is complex, it will break the object into sub-components,
99 | so you might not get the projected cuboid of the whole object, but rather the pose and cuboid of each component.
100 | Be careful when using this option and make sure you understand the implications.
101 | TODO: track the cuboid of the import_scene from nvisii.
102 |
103 |
104 | ## Citation
105 |
106 | If you use this data generation script in your research, please cite as follows:
107 |
108 | ```latex
109 | @misc{morrical2021nvisii,
110 | title={NViSII: A Scriptable Tool for Photorealistic Image Generation},
111 | author={Nathan Morrical and Jonathan Tremblay and Yunzhi Lin and Stephen Tyree and Stan Birchfield and Valerio Pascucci and Ingo Wald},
112 | year={2021},
113 | eprint={2105.13962},
114 | archivePrefix={arXiv},
115 | primaryClass={cs.CV}
116 | }
117 | ```
118 |
--------------------------------------------------------------------------------
/data_generation/nvisii_data_gen/requirements.txt:
--------------------------------------------------------------------------------
1 | nvisii
2 | numpy
3 | opencv-python
4 | pybullet
5 | randomcolor
6 | requests
7 | simplejson
8 | Pillow
9 | pyquaternion
10 |
--------------------------------------------------------------------------------
/data_generation/validate_data.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import json
4 | import numpy as np
5 | import os
6 | from PIL import Image, ImageDraw
7 | from pyquaternion import Quaternion
8 | import sys
9 | sys.path.append("../common/")
10 | from cuboid import CuboidVertexType
11 |
12 | def main(json_files):
13 | for json_fn in json_files:
14 | # Find corresponding PNG
15 | base, _ = os.path.splitext(json_fn)
16 | img_fn = base+'.png'
17 | if not os.path.isfile(img_fn):
18 | print(f"Could not locate '{img_fn}'. Skipping..")
19 | continue
20 |
21 | # Load JSON data
22 | with open(json_fn, 'r') as F:
23 | data_json = json.load(F)
24 | up = np.array(data_json['camera_data']['camera_look_at']['up'])
25 | at = np.array(data_json['camera_data']['camera_look_at']['at'])
26 | eye = np.array(data_json['camera_data']['camera_look_at']['eye'])
27 |
28 | cam_matrix = np.eye(4)
29 | cam_matrix[0:3,0] = up
30 | cam_matrix[0:3,1] = np.cross(up, -at)
31 | cam_matrix[0:3,2] = -at
32 | cam_matrix[0:3,3] = -eye
33 |
34 | img = Image.open(img_fn)
35 |
36 | objects = data_json['objects']
37 | # draw projected cuboid dots
38 | for oo in objects:
39 | draw = ImageDraw.Draw(img)
40 | pts = oo['projected_cuboid']
41 | for idx, pt in enumerate(pts):
42 | draw.ellipse((pt[0]-2, pt[1]-2, pt[0]+2, pt[1]+2), fill = 'cyan',
43 | outline ='cyan')
44 |
45 | # Note that the enum names DO NOT MATCH the positions of the points
46 | # when projected into 3D. This is an old bug that will not be fixed,
47 | # as it will result in errors in inference in older trained models
48 | line_order = [
49 | # Front
50 | [CuboidVertexType.FrontTopRight, CuboidVertexType.FrontTopLeft, 'red'],
51 | [CuboidVertexType.FrontTopLeft, CuboidVertexType.FrontBottomLeft, 'red'],
52 | [CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontBottomLeft, 'red'],
53 | [CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontTopRight, 'red'],
54 | # Rear
55 | [CuboidVertexType.RearTopRight, CuboidVertexType.RearTopLeft, 'cyan'],
56 | [CuboidVertexType.RearBottomLeft, CuboidVertexType.RearTopLeft, 'cyan'],
57 | [CuboidVertexType.RearBottomLeft, CuboidVertexType.RearBottomRight, 'cyan'],
58 | [CuboidVertexType.RearTopRight, CuboidVertexType.RearBottomRight, 'cyan'],
59 | # Sides
60 | [CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopRight, 'green'],
61 | [CuboidVertexType.RearBottomRight, CuboidVertexType.FrontBottomRight, 'green'],
62 | [CuboidVertexType.RearTopLeft, CuboidVertexType.FrontTopLeft, 'cyan'],
63 | [CuboidVertexType.FrontBottomLeft, CuboidVertexType.RearBottomLeft, 'cyan'],
64 | # 'X' on top
65 | [CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopLeft, 'cyan'],
66 | [CuboidVertexType.FrontTopLeft, CuboidVertexType.RearTopRight, 'cyan']
67 | ]
68 |
69 | for ll in line_order:
70 | draw.line([(pts[ll[0]][0],pts[ll[0]][1]), (pts[ll[1]][0],pts[ll[1]][1])],
71 | fill=ll[2], width=1)
72 |
73 | img.save(base+'-output.png')
74 |
75 |
76 | def usage_msg(script_name):
77 | print(f"Usage: {script_name} _JSON FILES_")
78 | print(" The basename of the JSON files in _FILES_ will be used to find its")
79 | print(" corresponding image file; i.e. if `00001.json` is provided, the code")
80 | print(" will look for an image named `00001.png`")
81 |
82 |
83 | if __name__ == "__main__":
84 | # Print out usage information if there are no arguments
85 | if len(sys.argv) < 2:
86 | usage_msg(sys.argv[0])
87 | exit(0)
88 |
89 | # ..or if the first argument is a request for help
90 | s = sys.argv[1].lstrip('-')
91 | if s == "h" or s == "help":
92 | usage_msg(sys.argv[0])
93 | exit(0)
94 |
95 | main(sys.argv[1:])
96 |
97 |
98 |
--------------------------------------------------------------------------------
/doc/camera_tutorial.md:
--------------------------------------------------------------------------------
1 | ## Running DOPE with a webcam
2 |
3 | This tutorial explains how to:
4 |
5 | 1. start a ROS driver for a regular USB webcam
6 | 2. calibrate the camera **or** enter the camera intrinsics manually
7 | 3. rectify the images and publish them on a topic
8 |
9 | Since DOPE relies solely on RGB images and the associated `camera_info` topic,
10 | it is essential that the camera is properly calibrated to give good results.
11 | Also, unless you are using a very low-distortion lens, the images should be
12 | rectified before feeding them to DOPE.
13 |
14 | ### A. Starting a ROS driver for a USB webcam
15 |
16 | In this tutorial, we're using the [usb_cam](http://wiki.ros.org/usb_cam)
17 | ROS package. If this package is not working with your camera, simply google
18 | around - nowadays there is a ROS driver for almost every camera.
19 |
20 | 1. Install the driver:
21 |
22 | ```bash
23 | sudo apt install ros-kinetic-usb-cam
24 | ```
25 |
26 | 2. Run the camera driver (enter each command in a separate terminal)
27 |
28 | ```bash
29 | roscore
30 | rosrun usb_cam usb_cam_node _camera_name:='usb_cam' _camera_frame_id:='usb_cam'
31 | ```
32 |
33 | See the [usb_cam wiki page](http://wiki.ros.org/usb_cam) for a list of all
34 | parameters.
35 |
36 | 3. Check that the camera is running:
37 |
38 | ```
39 | $ rostopic list
40 | [...]
41 | /usb_cam/camera_info
42 | /usb_cam/image_raw
43 | [...]
44 | $ rostopic hz /usb_cam/image_raw
45 | subscribed to [/usb_cam/image_raw]
46 | average rate: 30.001
47 | min: 0.029s max: 0.038s std dev: 0.00280s window: 28
48 | ```
49 |
50 | 4. If you want, you can also run `rviz` to visualize the camera topic.
51 |
52 | Since the camera is still uncalibrated, you should have seen the following
53 | warning when starting the `usb_cam` node in step 2:
54 |
55 | ```
56 | [ WARN] [1561548002.895791819]: Camera calibration file /home/******/.ros/camera_info/usb_cam.yaml not found.
57 | ```
58 |
59 | Also, the camera_info topic is all zeros:
60 |
61 | ```bash
62 | $ rostopic echo -n1 /usb_cam/camera_info
63 | header:
64 | seq: 87
65 | stamp:
66 | secs: 1561548114
67 | nsecs: 388301085
68 | frame_id: "usb_cam"
69 | height: 480
70 | width: 640
71 | distortion_model: ''
72 | D: []
73 | K: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
74 | R: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
75 | P: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
76 | binning_x: 0
77 | binning_y: 0
78 | roi:
79 | x_offset: 0
80 | y_offset: 0
81 | height: 0
82 | width: 0
83 | do_rectify: False
84 | ```
85 |
86 | To fix this, we need to generate a file called
87 | `~/.ros/camera_info/usb_cam.yaml` which holds the camera intrinsics. Either
88 | follow step **B** or **C** to do this.
89 |
90 | ### B. Manually entering camera intrinsics
91 |
92 | If you know the camera intrinsics of your webcam, you can simply generate a new
93 | file `~/.ros/camera_info/usb_cam.yaml` which looks like this (the example is
94 | for a Logitech C920 webcam with the following intrinsics: fx = 641.5,
95 | fy = 641.5, cx = 320.0, cy = 240.0):
96 |
97 |
98 | ```
99 | image_width: 640
100 | image_height: 480
101 | camera_name: usb_cam
102 | camera_matrix:
103 | rows: 3
104 | cols: 3
105 | data: [641.5, 0, 320.0, 0, 641.5, 240.0, 0, 0, 1]
106 | distortion_model: plumb_bob
107 | distortion_coefficients:
108 | rows: 1
109 | cols: 5
110 | data: [0, 0, 0, 0, 0]
111 | rectification_matrix:
112 | rows: 3
113 | cols: 3
114 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
115 | projection_matrix:
116 | rows: 3
117 | cols: 4
118 | data: [641.5, 0, 320.0, 0, 0, 641.5, 240.0, 0, 0, 0, 1, 0]
119 | ```
120 |
121 | After creating this file, restart the `usb_cam` driver for the changes to take
122 | effect. The warning "Camera calibration file not found" should have
123 | disappeared, and the `/usb_cam/camera_info` topic should reflect the values
124 | entered above.
125 |
126 | Since the camera intrinsics we supplied above do not specify distortion
127 | coefficients, the image does not need to be rectified, so you can skip the
128 | remaining steps and use the `/usb_cam/image_raw` topic as input for DOPE.
129 |
130 | If you want to do proper calibration and rectification instead, skip step **B**
131 | and continue with **C**.
132 |
133 | ### C. Calibrating the webcam
134 |
135 | Follow the steps in [this tutorial](http://wiki.ros.org/camera_calibration/Tutorials/MonocularCalibration).
136 |
137 | In short, run these commands:
138 |
139 | ```bash
140 | sudo apt install ros-kinetic-camera-calibration
141 | rosrun camera_calibration cameracalibrator.py --size 6x7 --square 0.0495 image:=/usb_cam/image_raw camera:=/usb_cam # adjust these values to your checkerboard
142 | ```
143 |
144 | * Move your checkerboard around and make sure that you cover a good range of
145 | distance from the camera, all parts of the image, and horizontal and vertical
146 | skew of the checkerboard.
147 | * When done, press "calibrate" and **wait** until the calibration is complete.
148 | This can take a long time (minutes or hours), depending on how many
149 | calibration samples you took. As long as the image window is frozen and
150 | `camera_calibration` hogs a CPU, it's still computing.
151 | * Once the calibration has finished, the window will unfreeze. Press "save",
152 | then press "commit".
153 |
154 | After this, the calibration info should have been saved to
155 | `~/.ros/camera_info/usb_cam.yaml`. Restart the `usb_cam` driver for the changes
156 | to take effect.
157 |
158 |
159 | ### D. Rectifying the images
160 |
161 | 1. Install `image_proc`:
162 |
163 | ```bash
164 | sudo apt install ros-kinetic-image-proc
165 | ```
166 |
167 | 2. Create a file called `usb_cam_image_proc.launch` with the following contents:
168 |
169 | ```xml
170 | <launch>
171 | 
172 |   <!-- Run image_proc inside the usb_cam namespace so that it picks up the
173 |        /usb_cam/image_raw and /usb_cam/camera_info topics published by the
174 |        usb_cam driver, and publishes the rectified images, including
175 |        /usb_cam/image_rect_color, in the same namespace. -->
176 | 
177 |   <group ns="usb_cam">
178 | 
179 |     <node pkg="image_proc" type="image_proc" name="image_proc"
180 |           output="screen" />
181 | 
182 |   </group>
183 | 
184 | </launch>
185 | ```
186 |
187 | 3. Launch it:
188 |
189 | ```bash
190 | roslaunch usb_cam_image_proc.launch
191 | ```
192 |
193 | This should publish the topic `/usb_cam/image_rect_color` (among others). You
194 | can now use this topic as the input for DOPE.
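
To verify that rectified images are actually being published, you can check the rate, analogous to the check in step A:

```bash
rostopic hz /usb_cam/image_rect_color
```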
195 |
--------------------------------------------------------------------------------
/dope_objects.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/dope_objects.png
--------------------------------------------------------------------------------
/evaluate/.gitignore:
--------------------------------------------------------------------------------
1 | content/
2 | data/
3 | results/
4 |
--------------------------------------------------------------------------------
/evaluate/download_content.sh:
--------------------------------------------------------------------------------
1 | mkdir data
2 |
3 | cd data
4 |
5 | wget https://www.dropbox.com/s/qeljw3vjnc416bs/table_003_cracker_box_dope_results.zip
6 | wget https://www.dropbox.com/s/mn2yqflc6fcqaic/table_003_cracker_box.zip
7 |
8 | unzip table_003_cracker_box_dope_results.zip
9 | rm table_003_cracker_box_dope_results.zip
10 | mkdir table_dope_results/
11 | mv table_003_cracker_box table_dope_results/scene1/
12 |
13 | unzip table_003_cracker_box.zip
14 | rm table_003_cracker_box.zip
15 | mkdir table_ground_truth/
16 | mv table_003_cracker_box table_ground_truth/scene1/
17 |
18 | cd ../
19 |
20 | mkdir content
21 | cd content
22 |
23 | wget https://www.dropbox.com/s/b61es9q5nhwtooi/003_cracker_box.zip
24 | unzip 003_cracker_box.zip
25 | rm 003_cracker_box.zip
26 | mv 003_cracker_box 003_cracker_box_16k
27 |
28 | cd ../
--------------------------------------------------------------------------------
/evaluate/kpd_compute.py:
--------------------------------------------------------------------------------
1 | """
2 | This script computes the average distance metric at the keypoint level
3 | from the ground truth (GT) to the predicted guesses (GU).
4 | """
5 |
6 |
7 |
8 | import argparse
9 | import os
10 | import numpy as np
11 | import glob
12 | import math
13 |
14 | # from pymesh import obj
15 | # from pymesh import ply
16 | # import pywavefront
17 | # import pymesh
18 | from scipy import spatial
19 |
20 | import simplejson as json
21 | import copy
22 | from pyquaternion import Quaternion
23 | import pickle
24 | import nvisii as visii
25 | import subprocess
26 |
27 |
28 |
29 | parser = argparse.ArgumentParser()
30 |
31 | parser.add_argument('--data_prediction',
32 | default = "data/table_dope_results/",
33 | help='path to prediction data')
34 | parser.add_argument('--data',
35 | default="data/table_ground_truth/",
36 | help='path to data ground truth')
37 | parser.add_argument("--outf",
38 | default="results_kpd/",
39 | help="where to put the data"
40 | )
41 | parser.add_argument("--show",
42 | action='store_true',
43 | help="show the graph at the end. "
44 | )
45 |
46 | opt = parser.parse_args()
47 |
48 |
49 |
50 | if opt.outf is None:
51 | opt.outf = opt.data_prediction
52 |
53 | if not os.path.isdir(opt.outf):
54 | print(f'creating the folder: {opt.outf}')
55 | os.mkdir(opt.outf)
56 |
57 | if os.path.isdir(opt.outf + "/tmp"):
58 | print(f'folder {opt.outf + "/tmp"}/ exists')
59 | else:
60 | os.mkdir(opt.outf + "/tmp")
61 | print(f'created folder {opt.outf + "/tmp"}/')
62 |
63 | def get_all_entries(path_to_explore, what='*.json'):
64 |
65 | imgs = []
66 |
67 | def add_images(path):
68 | # print(path)
69 | # print(glob.glob(path+"/*json"))
70 | # print(glob.glob(path+"/"+what))
71 | for j in sorted(glob.glob(path+"/"+what)):
72 | # print(j)
73 | imgs.append(j)
74 | # imgsname.append(j.replace(path,"").replace("/",""))
75 |
76 |
77 | def explore(path):
78 | if not os.path.isdir(path):
79 | return
80 | folders = [os.path.join(path, o) for o in os.listdir(path)
81 | if os.path.isdir(os.path.join(path,o))]
82 | # if len(folders)>0:
83 | for path_entry in folders:
84 | explore(path_entry)
85 |
86 |
87 | add_images(path)
88 |
89 | explore(path_to_explore)
90 | return imgs
91 |
92 |
93 |
94 |
95 |
96 | ###### START #######
97 |
98 | data_truth = get_all_entries(opt.data,"*.json")
99 | data_prediction = get_all_entries(opt.data_prediction,"*.json")
100 |
101 |
102 | print('number of ground truths found',len(data_truth))
103 | print("number of predictions found",len(data_prediction))
104 |
105 | adds_objects = {}
106 |
107 | adds_all = []
108 | all_gts = []
109 | count_all_annotations = 0
110 | count_by_object = {}
111 |
112 | count_all_guesses = 0
113 | count_by_object_guesses = {}
114 |
115 |
116 | for gt_file in data_truth:
117 | scene_gt = gt_file.replace(opt.data,"").replace('.json','')
118 | pred_scene = None
119 |
120 |
121 | for d in data_prediction:
122 | scene_d = d.replace(opt.data_prediction,'').replace('json','').replace('.','')
123 |
124 | # if scene in d:
125 | # print(scene_d,scene_gt)
126 | if scene_d.split('/')[-1] == scene_gt.split('/')[-1]:
127 | pred_scene = d
128 | break
129 |
130 | if pred_scene is None:
131 | continue
132 | # print(gt_file)
133 | gt_json = None
134 | with open(gt_file) as json_file:
135 | gt_json = json.load(json_file)
136 |
137 | gu_json = None
138 | with open(pred_scene) as json_file:
139 | gu_json = json.load(json_file)
140 |
141 |
142 | objects_gt = [] #name obj, keypoints
143 |
144 | for obj in gt_json['objects']:
145 | if 'class' not in obj:
146 | name_gt = obj['name']
147 | else:
148 | name_gt = obj['class']
149 | # little hack from bug in the data
150 | if name_gt == '003':
151 | name_gt = "003_cracker_box_16k"
152 |
153 | objects_gt.append(
154 | [
155 | name_gt,
156 | obj["projected_cuboid"]
157 | ]
158 | )
159 |
160 | count_all_annotations += 1
161 |
162 | if name_gt in count_by_object:
163 | count_by_object[name_gt] +=1
164 | else:
165 | count_by_object[name_gt] = 1
166 |
167 | for obj_guess in gu_json['objects']:
168 |
169 |         if 'class' not in obj_guess:
170 | name_guess = obj_guess['name']
171 | name_look_up = obj_guess['name']
172 | else:
173 | name_guess = obj_guess['class']
174 | name_look_up = obj_guess['class']
175 |
176 |
177 | keypoints_gu = obj_guess["projected_cuboid"]
178 |
179 | count_all_guesses += 1
180 |
181 | if name_guess in count_by_object_guesses:
182 | count_by_object_guesses[name_guess] +=1
183 | else:
184 | count_by_object_guesses[name_guess] = 1
185 |
186 |
187 | # print (name, pose_mesh)
188 | candidates = []
189 | for i_obj_gt, obj_gt in enumerate(objects_gt):
190 | name_gt, pose_mesh_gt = obj_gt
191 |
192 | # print(name_look_up,name_gt)
193 |
194 | if name_look_up == name_gt:
195 | candidates.append([i_obj_gt, pose_mesh_gt, name_gt])
196 |
197 | best_dist = 10000000000
198 | best_index = -1
199 |
200 | for candi_gt in candidates:
201 | # compute the add
202 | i_gt, keypoint_gt, name_gt = candi_gt
203 | dist = []
204 |
205 | for i in range(len(keypoints_gu)):
206 | dist_key = 100000
207 | for j in range(len(keypoints_gu)):
208 | d = np.sqrt((keypoint_gt[i][0]-keypoints_gu[j][0])**2+(keypoint_gt[i][1]-keypoints_gu[j][1])**2)
209 | # print(keypoint_gt[i],keypoints_gu[i],i,d)
210 | if d < dist_key:
211 | dist_key = d
212 | dist.append(dist_key)
213 |
214 |
215 | dist = np.mean(dist)
216 |
217 | if dist < best_dist:
218 | best_dist = dist
219 | best_index = i_gt
220 |
221 | if best_index != -1:
222 | if not name_guess in adds_objects.keys():
223 | adds_objects[name_guess] = []
224 | adds_all.append(best_dist)
225 | adds_objects[name_guess].append(best_dist)
226 |
227 | # save the data
228 | if len(opt.outf.split("/"))>1:
229 | path = None
230 | for folder in opt.outf.split("/"):
231 | if path is None:
232 | path = folder
233 | else:
234 | path = path + "/" + folder
235 | try:
236 | os.mkdir(path)
237 | except:
238 | pass
239 | else:
240 | try:
241 | os.mkdir(opt.outf)
242 | except:
243 | pass
244 | print(adds_objects.keys())
245 | count_by_object["all"] = count_all_annotations
246 | pickle.dump(count_by_object,open(f'{opt.outf}/count_all_annotations.p','wb'))
247 | pickle.dump(adds_all,open(f'{opt.outf}/adds_all.p','wb'))
248 |
249 | count_by_object_guesses["all"] = count_all_guesses
250 | pickle.dump(count_by_object_guesses,open(f'{opt.outf}/count_all_guesses.p','wb'))
251 |
252 |
253 | labels = []
254 | data = []
255 | for key in adds_objects.keys():
256 | pickle.dump(adds_objects[key],open(f'{opt.outf}/adds_{key}.p','wb'))
257 | labels.append(key)
258 | data.append(f'{opt.outf}/adds_{key}.p')
259 |
260 |
261 | array_to_call = ["python",
262 | "make_graphs.py",
263 | '--pixels',
264 | '--threshold',"50.0",
265 | "--outf",
266 | opt.outf,
267 | '--labels',
268 | ]
269 |
270 | for label in labels:
271 | array_to_call.append(label)
272 |
273 | array_to_call.append('--data')
274 | for d_p in data:
275 | array_to_call.append(d_p)
276 |
277 | array_to_call.append('--colours')
278 | for i in range(len(data)):
279 | array_to_call.append(str(i))
280 | if opt.show:
281 | array_to_call.append('--show')
282 |
283 | print(array_to_call)
284 | subprocess.call(array_to_call)
285 |
286 | # subprocess.call(
287 | # [
288 | # "python", "make_graphs.py",
289 | # "--data", f'{opt.outf}/adds_{key}.p',
290 | # "--labels", key,
291 | # "--outf", opt.outf,
292 | # '--colours', "0",
293 | # ]
294 | # )
295 |
296 |
297 | visii.deinitialize()
298 |
299 |
--------------------------------------------------------------------------------
/evaluate/make_graphs.py:
--------------------------------------------------------------------------------
1 | import matplotlib
2 | import pickle
3 | import argparse
4 | import seaborn as sns
5 | import matplotlib.pyplot as plt
6 | import numpy as np
7 | import os
8 |
9 | import glob
10 |
11 | # load the data
12 | # might be multiple datasets
13 |
14 | # make the plots
15 | os.environ["CUDA_VISIBLE_DEVICES"]="1"
16 |
17 | parser = argparse.ArgumentParser()
18 |
19 | parser.add_argument('--data_folder',
20 | default = None,
21 | help='path to data output')
22 | parser.add_argument('--data',
23 | nargs='+',
24 | default=None,
25 | help='list of csv files')
26 | parser.add_argument('--labels',
27 | nargs='+',
28 | default=None,
29 | help='labels to put')
30 | parser.add_argument('--colours',
31 | nargs='+',
32 | default=None,
33 | help = '')
34 |
35 | parser.add_argument("--outf",
36 | default=None,
37 | help="where to put the data")
38 |
39 | parser.add_argument('--threshold',
40 | default = 0.1,
41 | type = float
42 | )
43 | parser.add_argument('--title',
44 | default = 'AUC')
45 | parser.add_argument('--filename',
46 | default = 'output')
47 | parser.add_argument('--styles',
48 | nargs='+',
49 | default=None,
50 | help = '')
51 | parser.add_argument("--show",
52 | action='store_true',
53 | help="show the graph at the end. "
54 | )
55 | parser.add_argument("--pixels",
56 | action='store_true',
57 | help="Using keypoint distance as metric"
58 | )
59 | opt = parser.parse_args()
60 | sns.set_style("white")
61 | sns.set_style("ticks")
62 | sns.set_context("paper")
63 | # sns.set_context("notebook")
64 | # sns.set_context("talk")
65 | sns.despine()
66 | # load the data
67 |
68 | # if folder load all the files and create a graph
69 | # if a list put all of them in the same graph
70 |
71 | plt.tight_layout()
72 | # sns.set(font_scale=1.1)
73 |
74 | if opt.data_folder is not None:
75 | # load the data from the file
76 | adds_to_load = glob.glob(f"{opt.data_folder}/adds*")
77 | counts_dict = pickle.load(open(f"{opt.data_folder}/count_all_annotations.p",'rb'))
78 |
79 | else:
80 | # load the files in the list
81 | adds_to_load = opt.data
82 | counts_dict = None
83 |
84 | fig = plt.figure()
85 | ax = plt.axes()
86 |
87 |
88 | for i_file, file in enumerate(adds_to_load):
89 | print(file)
90 | label = file.split("/")[-1]
91 | label = label.replace('adds_','').replace(".p",'')
92 | filename = label
93 |
94 | if not counts_dict is None:
95 | fig = plt.figure()
96 | ax = plt.axes()
97 |
98 | n_pnp_possible_frames = counts_dict[filename]
99 |
100 | else:
101 | # check labels
102 | try:
103 | label = opt.labels[i_file]
104 | except:
105 | label = filename
106 |
107 | # get n possible solutions
108 | path = "/".join(file.split("/")[0:-1]) + '/'
109 | n_pnp_possible_frames = pickle.load(open(f"{path}/count_all_annotations.p",'rb'))[filename]
110 |
111 | adds_objects = pickle.load(open(file,'rb'))
112 |
113 | # add_pnp_found = np.array(adds_objects)/100
114 | add_pnp_found = np.array(adds_objects)
115 | print('mean',add_pnp_found.mean(),'std',add_pnp_found.std(),
116 | 'ratio',f'{len(add_pnp_found)}/{n_pnp_possible_frames}')
117 | n_pnp_found = len(add_pnp_found)
118 |
119 | delta_threshold = opt.threshold/300
120 | add_threshold_values = np.arange(0., opt.threshold, delta_threshold)
121 |
122 | counts = []
123 | for value in add_threshold_values:
124 | under_threshold = len(np.where(add_pnp_found <= value)[0])/n_pnp_possible_frames
125 | counts.append(under_threshold)
126 |
127 | for value in [0.02,0.04,0.06]:
128 | under_threshold = len(np.where(add_pnp_found <= value)[0])/n_pnp_possible_frames
129 | print('auc at ',value,':', under_threshold)
130 | auc = np.trapz(counts, dx = delta_threshold)/opt.threshold
131 |
132 | # divide might screw this up .... to check!
133 | print('auc',auc)
134 | # print('found', n_pnp_found/n_pnp_possible_frames)
135 | # print('mean', np.mean(add[np.where(add > pnp_sol_found_magic_number)]))
136 | # print('median',np.median(add[np.where(add > pnp_sol_found_magic_number)]))
137 | # print('std',np.std(add[np.where(add > pnp_sol_found_magic_number)]))
138 |
139 | cycle = plt.rcParams['axes.prop_cycle'].by_key()['color']
140 | if counts_dict is None:
141 | colour = cycle[int(opt.colours[i_file])]
142 | # colour = cycle[int(i_file)]
143 | else:
144 | colour = cycle[0]
145 |
146 | try:
147 |         style = opt.styles[i_file]
148 | if style == '0':
149 | style = '-'
150 | elif style == '1':
151 | style = '--'
152 | elif style == '2':
153 | style = ':'
154 |
155 | else:
156 | style = '-'
157 | except:
158 | style = '-'
159 |
160 | label = f'{label} ({auc:.3f})'
161 | ax.plot(add_threshold_values, counts,style,color=colour,label=label)
162 |
163 | if not counts_dict is None:
164 | if opt.pixels:
165 | plt.xlabel('L2 threshold distance (pixels)')
166 | else:
167 | plt.xlabel('ADD threshold distance (m)')
168 | plt.ylabel('Accuracy')
169 | plt.title(f'{filename} auc: {auc:.3f}')
170 |
171 | ax.set_ylim(0,1)
172 | ax.set_xlim(0, float(opt.threshold))
173 |
174 | # ax.set_xticklabels([0,20,40,60,80,100])
175 | plt.tight_layout()
176 | plt.savefig(f'{opt.data_folder}/{filename}.png')
177 | plt.close()
178 |
179 | if counts_dict is None:
180 | if opt.pixels:
181 | plt.xlabel('L2 threshold distance (pixels)')
182 | else:
183 | plt.xlabel('ADD threshold distance (m)')
184 |
185 | plt.ylabel('Accuracy')
186 | plt.title(opt.title)
187 | ax.legend(loc='lower right',frameon = True, fancybox=True, framealpha=0.8)
188 |
189 |
190 | legend = ax.get_legend()
191 | for i, t in enumerate(legend.get_texts()):
192 | if opt.data[i] == '666':
193 | t.set_ha('left') # ha is alias for horizontalalignment
194 | t.set_position((-30,0))
195 |
196 | ax.set_ylim(0,1)
197 | ax.set_xlim(0, float(opt.threshold))
198 | # ax.set_xticklabels([0,20,40,60,80,100])
199 | plt.tight_layout()
200 | try:
201 | os.mkdir(opt.outf)
202 | except:
203 | pass
204 | if opt.outf is None:
205 | plt.savefig(f'{opt.filename}.png')
206 | else:
207 | plt.savefig(f'{opt.outf}/{opt.filename}.png')
208 | if opt.show:
209 | plt.show()
210 | plt.close()
211 |
212 |
213 |
--------------------------------------------------------------------------------
/evaluate/overlay.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/evaluate/overlay.png
--------------------------------------------------------------------------------
/evaluate/readme.md:
--------------------------------------------------------------------------------
1 | # Deep Object Pose Estimation (DOPE) - Evaluation
2 |
3 | ## IMPORTANT NOTE
4 | These utilities currently require NVISII for visualization.
5 |
6 |
7 | ## Simple Performance Evaluation
8 | This directory contains code to measure the performance of your trained DOPE model. Below is an example of running the basic evaluation script:
9 |
10 | ```
11 | python evaluate.py --data_prediction ../inference/output --data ../sample_data
12 | ```
13 | ### Arguments
14 | #### `--data`:
15 | Path to ground-truth data for the predictions that you wish to evaluate.
16 |
17 | #### `--data_prediction`:
18 | Path to predictions that were generated from running inference on your trained model. To support the evaluation of multiple sets of weights at once, this path can point to a folder containing the outputs of multiple inference results.
19 |
20 | #### `--models`:
21 | Path to 3D model files.
22 | These models are loaded before running evaluation and are rendered to compute the 3D error between the predicted results and ground truth.
23 | Point this argument at the root of the folder containing all of your different model files. Below is a sample folder structure:
24 |
25 | ```
26 | /PATH_TO_MODELS_FOLDER
27 | ├── 002_master_chef_can
28 | │ ├── 002_master_chef_can.xml
29 | │ ├── points.xyz
30 | │ ├── textured.mtl
31 | │ ├── textured.obj
32 | │ ├── textured_simple.obj
33 | │ ├── textured_simple.obj.mtl
34 | │ └── texture_map.png
35 | └── 035_power_drill
36 | ├── 035_power_drill.xml
37 | ├── points.xyz
38 | ├── textured.mtl
39 | ├── textured.obj
40 | ├── textured_simple.obj
41 | ├── textured_simple.obj.mtl
42 | └── texture_map.png
43 | ```
44 |
45 | If you trained DOPE on a new object and want to evaluate its
46 | performance, make sure to include the 3D model files in a folder that
47 | matches `"class_name"` in the ground truth `.json` file.
48 |
49 | Multiple models can be loaded at once as the script will recursively
50 | search for any 3D models in the folder specified in `--models`.
51 |
52 | #### `--adds`:
53 | The average distance computed using the closest point distance between
54 | the predicted pose and the ground truth pose. This takes a while to
55 | compute. If you are only looking for a fast approximation, use
56 | ``--cuboid``.
57 |
58 | #### `--cuboid`:
59 | Computes average distance using the 8 cuboid points of the 3D models.
60 | It is much faster than ``--adds`` but is only an approximation for the
61 | metric. It should be used for testing purposes.
62 |
63 |
64 |
65 | # More Complex ADD Metrics and Figure Generation
66 |
67 | ## Requirements
68 |
69 | Run the download script `./download_content.sh`; it downloads a simple scene with annotations rendered by NViSII, together with DOPE predictions.
70 |
71 | ## How to run
72 |
73 | If you downloaded the previous content you can execute the following:
74 |
75 | ```
76 | python add_compute.py
77 | ```
78 | which should generate the following results:
79 | ```
80 | mean 0.0208515107260977 std 0.016006083915162977 ratio 17/22
81 | auc at 0.02 : 0.5
82 | auc at 0.04 : 0.6818181818181818
83 | auc at 0.06 : 0.7272727272727273
84 | auc 0.6115249999999999
85 | ```
86 | This means the area under the curve (*auc*) from 0 cm to 10 cm is 0.61. This script also produces graphs such as:
87 |
88 | 
89 |
90 | These are the metrics we reported in the original DOPE paper; please refer to the paper for an explanation of the graph.
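As a rough illustration of how these numbers are obtained (a simplified sketch mirroring `calculate_auc_total` in `utils_eval.py`, not the full evaluation code), the accuracy-vs-threshold curve is built by sweeping a distance threshold and counting the fraction of all ground-truth objects whose ADD error falls below it; the *auc* is the normalized area under that curve:

```
import numpy as np

def auc_from_add(add_values, total_objects, max_threshold=0.1, step=1e-5):
    add_values = np.asarray(add_values)
    thresholds = np.arange(0.0, max_threshold, step)
    # Fraction of all objects under each threshold (missed detections never count).
    accuracy = [np.count_nonzero(add_values <= t) / total_objects for t in thresholds]
    # Normalized area under the accuracy-vs-threshold curve.
    return np.trapz(accuracy, dx=step) / max_threshold
```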
91 |
92 | ## Assumptions
93 | We make a few assumptions in this script.
94 | 1. We assume the folder structures are the same and that the folders contain only scenes. See the `data/` folder example obtained by downloading the content.
95 | 2. We assume the annotation folder is in the OpenGL format and uses the nvisii outputs from the data generation pipeline. If you use a different file format, please update the script or your data.
96 | 3. We assume the inferences come from DOPE inference, _i.e._, the poses are in the OpenGL format. These conventions are easy to change, _e.g._, look for the line `visii_gu.get_transform().rotate_around` in `add_compute.py` to change the pose convention (a small conversion sketch follows this list).
97 |
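For context, the difference between the two camera conventions is a 180-degree rotation of the camera frame about its X axis (OpenGL looks down -Z, OpenCV down +Z). Below is an illustrative sketch of flipping an object-in-camera pose from one convention to the other, assuming 4x4 homogeneous matrices; `add_compute.py` handles this with `rotate_around` instead.

```
import numpy as np

# 180-degree rotation about the camera X axis; it is its own inverse,
# so the same matrix converts OpenGL -> OpenCV and OpenCV -> OpenGL.
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

def flip_camera_convention(pose_cam_obj):
    # pose_cam_obj: 4x4 pose of the object expressed in the camera frame.
    return FLIP_YZ @ pose_cam_obj
```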
98 | If the script takes too long to run, please run with `--cuboid`: instead of using the 3D model's vertices to compute the metric, it uses the 3D cuboid of the model.
99 |
100 | ## 2D image-based metric
101 |
102 | If you do not have a 3D model of your object and would prefer to simply measure the quality of your detections with a Euclidean distance on the predicted keypoints, you can use `python kpd_compute.py`; it is very similar to `add_compute.py` and behaves in much the same way.
103 | The metric used here is the Euclidean (L2) distance between each predicted keypoint and the corresponding ground-truth keypoint. We then propose using a threshold plot to evaluate the data, similar to the ADD metric.
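Below is a minimal sketch of the underlying metric, assuming `pred_kps` and `gt_kps` are matching N x 2 arrays of keypoints in pixels (the names are illustrative, not taken from `kpd_compute.py`):

```
import numpy as np

def keypoint_distance(pred_kps, gt_kps):
    # Mean L2 (Euclidean) pixel distance between corresponding keypoints.
    pred_kps = np.asarray(pred_kps, dtype=float)
    gt_kps = np.asarray(gt_kps, dtype=float)
    return float(np.mean(np.linalg.norm(pred_kps - gt_kps, axis=1)))
```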
104 |
105 | # Rendering 3d predictions using NViSII
106 |
107 | 
108 |
109 | We added a script that renders the 3D model on top of your predictions. It uses a version of NViSII that has not been released yet; please manually install this [wheel](https://www.dropbox.com/s/m85v7ts981xs090/nvisii-1.2.dev47%2Bgf122b5b.72-cp36-cp36m-manylinux2014_x86_64.whl?dl=0).
110 | ```
111 | # for scenes with DOPE inference
112 | python render_json.py --path_json data/table_dope_results/scene1/00300.json --scale 0.01 --opencv --contour --gray
113 | # for scenes generated by nvisii
114 | python render_json.py --path_json data/table_ground_truth/scene1/00100.json --scale 0.01 --contour --gray
115 | ```
116 |
117 | `--gray` renders the 3D model in gray and `--contour` adds the 3D model's contour in green.
118 |
119 | ## Rendering BOP format on images
120 |
121 | Using the same arguments, you can run this script on BOP annotations with 3D models. The script simply rebuilds the data structure needed to load the scene.
122 |
123 | ```
124 | python render_json.py --path_json /PATH/TO/hope_bop/hope_val/val/000001/scene_gt.json --bop --objs_folder /PATH/TO/hope_bop/hope_models/models/ --gray --contour --bop_scene 0
125 | ```
126 |
127 | Only `--bop` needs to be passed to load a BOP scene. You can select which scene to load with `--bop_scene`. The rest of the behavior is the same. This was only tested on the HOPE data.
128 |
129 | ## Assumptions
130 |
131 | We assume that the camera intrinsics are stored in the camera data (as is the case for NViSII outputs); when they are, the script uses them. If they are missing, the script falls back to a 512 x 512 image with a FOV of 0.78.
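For reference, here is a small sketch of the kind of fallback intrinsics this implies, assuming the stated FOV of 0.78 is in radians and is applied to both axes of a square 512 x 512 image via the standard pinhole relation (this is an illustration, not the exact code in `render_json.py`):

```
import math

width = height = 512
fov = 0.78  # radians (assumed)

# Pinhole relation: focal length in pixels from the field of view.
f = 0.5 * width / math.tan(0.5 * fov)
cx, cy = width / 2.0, height / 2.0

K = [[f, 0, cx],
     [0, f, cy],
     [0, 0, 1]]
```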
132 |
133 |
138 |
--------------------------------------------------------------------------------
/evaluate/results/output.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/evaluate/results/output.png
--------------------------------------------------------------------------------
/evaluate/utils_eval.py:
--------------------------------------------------------------------------------
1 | import os
2 | import nvisii as visii
3 | import numpy as np
4 |
5 |
6 | def create_obj(
7 | name="name",
8 | path_obj="",
9 | path_tex=None,
10 | scale=1,
11 | rot_base=None, # visii quat
12 | pos_base=None, # visii vec3
13 | ):
14 |
15 | # This is for YCB like dataset
16 | if path_obj in create_obj.meshes:
17 | obj_mesh = create_obj.meshes[path_obj]
18 | else:
19 | obj_mesh = visii.mesh.create_from_file(name, path_obj)
20 | create_obj.meshes[path_obj] = obj_mesh
21 |
22 | obj_entity = visii.entity.create(
23 | name=name,
24 | # mesh = visii.mesh.create_sphere("mesh1", 1, 128, 128),
25 | mesh=obj_mesh,
26 | transform=visii.transform.create(name),
27 | material=visii.material.create(name),
28 | )
29 |
30 | # should randomize
31 | obj_entity.get_material().set_metallic(0) # should 0 or 1
32 | obj_entity.get_material().set_transmission(0) # should 0 or 1
33 | obj_entity.get_material().set_roughness(1) # default is 1
34 |
35 | if not path_tex is None:
36 |
37 | if path_tex in create_obj.textures:
38 | obj_texture = create_obj.textures[path_tex]
39 | else:
40 | obj_texture = visii.texture.create_from_file(name, path_tex)
41 | create_obj.textures[path_tex] = obj_texture
42 |
43 | obj_entity.get_material().set_base_color_texture(obj_texture)
44 |
45 | obj_entity.get_transform().set_scale(visii.vec3(scale))
46 |
47 | if not rot_base is None:
48 | obj_entity.get_transform().set_rotation(rot_base)
49 | if not pos_base is None:
50 | obj_entity.get_transform().set_position(pos_base)
51 |
52 | return obj_entity
53 |
54 |
55 | create_obj.meshes = {}
56 | create_obj.textures = {}
57 |
58 |
59 | def add_cuboid(name, debug=False):
60 | obj = visii.entity.get(name)
61 |
62 | min_obj = obj.get_mesh().get_min_aabb_corner()
63 | max_obj = obj.get_mesh().get_max_aabb_corner()
64 | centroid_obj = obj.get_mesh().get_aabb_center()
65 |
66 | cuboid = [
67 | visii.vec3(max_obj[0], max_obj[1], max_obj[2]),
68 | visii.vec3(min_obj[0], max_obj[1], max_obj[2]),
69 | visii.vec3(max_obj[0], min_obj[1], max_obj[2]),
70 | visii.vec3(max_obj[0], max_obj[1], min_obj[2]),
71 | visii.vec3(min_obj[0], min_obj[1], max_obj[2]),
72 | visii.vec3(max_obj[0], min_obj[1], min_obj[2]),
73 | visii.vec3(min_obj[0], max_obj[1], min_obj[2]),
74 | visii.vec3(min_obj[0], min_obj[1], min_obj[2]),
75 | visii.vec3(centroid_obj[0], centroid_obj[1], centroid_obj[2]),
76 | ]
77 |
78 | # change the ids to be like ndds / DOPE
79 | cuboid = [
80 | cuboid[2],
81 | cuboid[0],
82 | cuboid[3],
83 | cuboid[5],
84 | cuboid[4],
85 | cuboid[1],
86 | cuboid[6],
87 | cuboid[7],
88 | cuboid[-1],
89 | ]
90 |
91 | cuboid.append(visii.vec3(centroid_obj[0], centroid_obj[1], centroid_obj[2]))
92 |
93 | for i_p, p in enumerate(cuboid):
94 | child_transform = visii.transform.create(f"{name}_cuboid_{i_p}")
95 | child_transform.set_position(p)
96 | child_transform.set_scale(visii.vec3(0.3))
97 | child_transform.set_parent(obj.get_transform())
98 | if debug:
99 | visii.entity.create(
100 | name=f"{name}_cuboid_{i_p}",
101 | mesh=visii.mesh.create_sphere(f"{name}_cuboid_{i_p}"),
102 | transform=child_transform,
103 | material=visii.material.create(f"{name}_cuboid_{i_p}"),
104 | )
105 |
106 | for i_v, v in enumerate(cuboid):
107 | cuboid[i_v] = [v[0], v[1], v[2]]
108 |
109 | return cuboid
110 |
111 |
112 | def loadmodels(root, cuboid, suffix=""):
113 | models = {}
114 |
115 | def explore(path):
116 | if not os.path.isdir(path):
117 | return
118 | folders = [
119 | os.path.join(path, o)
120 | for o in os.listdir(path)
121 | if os.path.isdir(os.path.join(path, o))
122 | ]
123 |
124 | if len(folders) > 0:
125 | for path_entry in folders:
126 | explore(path_entry)
127 | else:
128 | print("Looking at:", path)
129 | path_obj = os.path.join(path, "textured_simple.obj")
130 | path_tex = os.path.join(path, "texture_map.png")
131 |
132 | if os.path.exists(path_obj) and os.path.exists(path_tex):
133 | path = path.rstrip("/")
134 | model_name = path.split("/")[-1]
135 |
136 | print(f"Loading Model: {model_name}")
137 |
138 | models[model_name] = create_obj(
139 | name=model_name + suffix,
140 | path_obj=path_obj,
141 | path_tex=path_tex,
142 | scale=0.01,
143 | )
144 |
145 | if cuboid:
146 | add_cuboid(model_name + suffix)
147 |
148 | if "gu" in suffix:
149 | models[model_name].get_material().set_metallic(1)
150 | models[model_name].get_material().set_roughness(0.05)
151 |
152 | explore(root)
153 |
154 | return models
155 |
156 |
157 | def load_groundtruth(root):
158 | gts = []
159 |
160 | def explore(path):
161 |
162 | if not os.path.isdir(path):
163 | return
164 | folders = [
165 | os.path.join(path, o)
166 | for o in os.listdir(path)
167 | if os.path.isdir(os.path.join(path, o))
168 | ]
169 |
170 | for path_entry in folders:
171 | explore(path_entry)
172 |
173 | gts.extend(
174 | [
175 | os.path.join(path, gt).replace(root, "").lstrip("/")
176 | for gt in os.listdir(path)
177 | if gt.endswith(".json") and not "settings" in gt
178 | ]
179 | )
180 |
181 | explore(root)
182 |
183 | return gts
184 |
185 |
186 | def load_prediction(root, groundtruths):
187 | """
188 | Supports multiple prediction folders for one set of testing data.
189 | Each prediction folder must contain the same folder structure as the testing data directory.
190 | """
191 | subdirs = os.listdir(root)
192 | subdirs.append("")
193 |
194 | prediction_folders = []
195 |
196 | for dir in subdirs:
197 | valid_folder = True
198 | for gt in groundtruths:
199 | file_path = os.path.join(os.path.abspath(root), dir, gt)
200 |
201 | if not os.path.exists(file_path):
202 | valid_folder = False
203 | break
204 |
205 | if valid_folder:
206 | prediction_folders.append(dir)
207 |
208 | return prediction_folders
209 |
210 |
211 | def calculate_auc(thresholds, add_list, total_objects):
212 | res = []
213 | for thresh in thresholds:
214 | under_thresh = len(np.where(add_list <= thresh)[0]) / total_objects
215 |
216 | res.append(under_thresh)
217 |
218 | return res
219 |
220 |
221 | def calculate_auc_total(
222 | add_list, total_objects, delta_threshold=0.00001, max_threshold=0.1
223 | ):
224 | add_threshold_values = np.arange(0.0, max_threshold, delta_threshold)
225 |
226 | counts = []
227 | for value in add_threshold_values:
228 | under_threshold = len(np.where(add_list <= value)[0]) / total_objects
229 | counts.append(under_threshold)
230 |
231 | auc = np.trapz(counts, dx=delta_threshold) / max_threshold
232 |
233 | return auc
234 |
--------------------------------------------------------------------------------
/inference/README.md:
--------------------------------------------------------------------------------
1 | # Deep Object Pose Estimation (DOPE) - Inference
2 |
3 | This directory contains a simple example of inference for DOPE.
4 |
5 |
6 | ## Setup
7 |
8 | If you haven't already, install the dependencies listed in `requirements.txt`
9 | in the root of the repo:
10 |
11 | ```
12 | pip install -r ../requirements.txt
13 | ```
14 |
15 | ## Running Inference
16 |
17 | The `inference.py` script runs inference with a trained model. The following three arguments are required:
18 | 1. `--weights`: path to the trained model weights. Can either point to a single `.pth` file or a folder containing multiple `.pth` files. If this path points to a folder with multiple `.pth` files, the script will individually load and run inference for all of the weights.
19 | 2. `--data`: path to the data to run inference on. The script **recursively** loads all files that end with the extensions specified in the `--exts` flag.
20 | 3. `--object`: name of the class to run detections on. This name must be defined under `dimensions` in the config file passed to `--config`.
21 |
22 | Below is an example of running inference:
23 |
24 | ```
25 | python inference.py --weights ../weights --data ../sample_data --object cracker
26 | ```
27 |
28 | ### Configuration Files
29 | Depending on the images you want to run inference on, you may need to redefine the configuration values in `camera_info.yaml` and `config_pose.yaml`.
30 | You can either define a new configuration file and specify it with `--config` and `--camera` or update `camera_info.yaml` and `config_pose.yaml`.
31 |
32 | Before running inference, it is important to make sure that:
33 | 1. The `projection_matrix` field is set properly in `camera_info.yaml` (or the file you specified for `--camera`); a short sketch for reading these values back in Python follows this list.
34 | The `projection_matrix` field should be a `3x4` matrix of the form:
35 | ```
36 | [fx, 0, cx, 0,
37 | 0, fy, cy, 0,
38 | 0, 0, 1, 0]
39 | ```
40 |
41 | 2. The `dimensions` and `class_ids` fields have been specified for the object you wish to detect in `config_pose.yaml` (or the file you specified for `--config`).
42 |
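For reference, here is a small sketch of reading the intrinsics back out of that flattened 3x4 `projection_matrix` list. It assumes the default `config/camera_info.yaml` at the repo root and the same `rows`/`cols`/`data` layout used by the ROS `camera_info.yaml`; adjust the path if you pass a different file to `--camera`.

```
import yaml

with open("../config/camera_info.yaml") as f:
    P = yaml.safe_load(f)["projection_matrix"]["data"]  # flat, row-major 3x4 list

fx, cx = P[0], P[2]
fy, cy = P[5], P[6]
print(f"fx={fx}, fy={fy}, cx={cx}, cy={cy}")
```
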
43 | ### Running Inference with Multiple Weights at Once
44 | The inference script can run inference on multiple weights if the path specified in ``--weights`` points to a folder containing multiple `.pth` files.
45 | This feature is useful for fast evaluation of multiple weights to find the epoch that performs the best.
46 | While, generally, later epochs tend to perform better than earlier ones, this is not always the case.
47 | For more information on how to quantitatively evaluate the performance of a trained model, refer to the `/evaluate` subdirectory.
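Conceptually (a simplified sketch, not the actual loop in `inference.py`), pointing `--weights` at a folder amounts to something like:

```
from pathlib import Path

weights_dir = Path("../weights")  # value passed to --weights
for weights_file in sorted(weights_dir.glob("*.pth")):
    # The real script loads each checkpoint in turn and runs a full
    # inference pass over the data given with --data.
    print(f"would run inference with {weights_file.name}")
```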
48 |
--------------------------------------------------------------------------------
/license.md:
--------------------------------------------------------------------------------
1 | NVIDIA Source Code License
2 |
3 |
4 | Copyright (c) 2024, NVIDIA Corporation & affiliates. All rights reserved.
5 |
6 |
7 | =======================================================================
8 |
9 | 1. Definitions
10 |
11 | "Licensor" means any person or entity that distributes its Work.
12 |
13 | "Software" means the original work of authorship made available under
14 | this License.
15 |
16 | "Work" means the Software and any additions to or derivative works of
17 | the Software that are made available under this License.
18 |
19 | The terms "reproduce," "reproduction," "derivative works," and
20 | "distribution" have the meaning as provided under U.S. copyright law;
21 | provided, however, that for the purposes of this License, derivative
22 | works shall not include works that remain separable from, or merely
23 | link (or bind by name) to the interfaces of, the Work.
24 |
25 | Works, including the Software, are "made available" under this License
26 | by including in or with the Work either (a) a copyright notice
27 | referencing the applicability of this License to the Work, or (b) a
28 | copy of this License.
29 |
30 | 2. License Grants
31 |
32 | 2.1 Copyright Grant. Subject to the terms and conditions of this
33 | License, each Licensor grants to you a perpetual, worldwide,
34 | non-exclusive, royalty-free, copyright license to reproduce,
35 | prepare derivative works of, publicly display, publicly perform,
36 | sublicense and distribute its Work and any resulting derivative
37 | works in any form.
38 |
39 | 3. Limitations
40 |
41 | 3.1 Redistribution. You may reproduce or distribute the Work only
42 | if (a) you do so under this License, (b) you include a complete
43 | copy of this License with your distribution, and (c) you retain
44 | without modification any copyright, patent, trademark, or
45 | attribution notices that are present in the Work.
46 |
47 | 3.2 Derivative Works. You may specify that additional or different
48 | terms apply to the use, reproduction, and distribution of your
49 | derivative works of the Work ("Your Terms") only if (a) Your Terms
50 | provide that the use limitation in Section 3.3 applies to your
51 | derivative works, and (b) you identify the specific derivative
52 | works that are subject to Your Terms. Notwithstanding Your Terms,
53 | this License (including the redistribution requirements in Section
54 | 3.1) will continue to apply to the Work itself.
55 |
56 | 3.3 Use Limitation. The Work and any derivative works thereof only
57 | may be used or intended for use non-commercially. Notwithstanding
58 | the foregoing, NVIDIA and its affiliates may use the Work and any
59 | derivative works commercially. As used herein, "non-commercially"
60 | means for research or evaluation purposes only.
61 |
62 | 3.4 Patent Claims. If you bring or threaten to bring a patent claim
63 | against any Licensor (including any claim, cross-claim or
64 | counterclaim in a lawsuit) to enforce any patents that you allege
65 | are infringed by any Work, then your rights under this License from
66 | such Licensor (including the grant in Section 2.1) will terminate
67 | immediately.
68 |
69 | 3.5 Trademarks. This License does not grant any rights to use any
70 | Licensor's or its affiliates' names, logos, or trademarks, except
71 | as necessary to reproduce the notices described in this License.
72 |
73 | 3.6 Termination. If you violate any term of this License, then your
74 | rights under this License (including the grant in Section 2.1) will
75 | terminate immediately.
76 |
77 | 4. Disclaimer of Warranty.
78 |
79 | THE WORK IS PROVIDED "AS IS" WITHOUT WARRANTIES OR CONDITIONS OF ANY
80 | KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
81 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR
82 | NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER
83 | THIS LICENSE.
84 |
85 | 5. Limitation of Liability.
86 |
87 | EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL
88 | THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE
89 | SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT,
90 | INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF
91 | OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK
92 | (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION,
93 | LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER
94 | COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF
95 | THE POSSIBILITY OF SUCH DAMAGES.
96 |
97 | =======================================================================
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
1 | [](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode)
2 | 
3 | # Deep Object Pose Estimation
4 |
5 | This is the official repository for NVIDIA's Deep Object Pose Estimation, which performs detection and 6-DoF pose estimation of **known objects** from an RGB camera. For full details, see our [CoRL 2018 paper](https://arxiv.org/abs/1809.10790) and [video](https://youtu.be/yVGViBqWtBI).
6 |
7 |
8 | 
9 |
10 |
11 | ## Contents
12 |
13 | This repository contains complete code for [training](train), [inference](inference), numerical [evaluation](evaluate) of results, and synthetic [data generation](data_generation). We also provide a [ROS1 Noetic package](ros1) that performs inference on images from a USB camera.
14 |
15 | Hardware-accelerated ROS2 inference can be done with the external
16 | [NVIDIA Isaac ROS DOPE](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_pose_estimation/tree/main/isaac_ros_dope) project.
17 |
18 |
19 | The [original version](CoRL) of the code used for the CoRL paper is also included
20 | for reference, but is no longer being maintained.
21 |
22 | ## Walkthrough
23 | We provide a [walkthrough](walkthrough.md) of the entire pipeline: generating data, training a model, and inference.
24 |
25 | ## Datasets
26 |
27 | We have trained and tested DOPE with two publicly available datasets: YCB, and HOPE. The trained weights can be [downloaded from Google Drive](https://drive.google.com/drive/folders/1DfoA3m_Bm0fW8tOWXGVxi4ETlLEAgmcg).
28 |
29 |
30 |
31 | ### YCB 3D Models
32 | YCB models can be downloaded from the [YCB website](http://www.ycbbenchmarks.com/), or by using [NVDU](https://github.com/NVIDIA/Dataset_Utilities) (see the `nvdu_ycb` command).
33 |
34 |
35 | ### HOPE 3D Models
36 | The [HOPE dataset](https://github.com/swtyree/hope-dataset/) is a collection of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The 3D models [can be downloaded here](https://drive.google.com/drive/folders/1jiJS9KgcYAkfb8KJPp5MRlB0P11BStft).
37 | The folders are organized in the style of the YCB 3D models.
38 |
39 | The physical objects can be purchased online (details and links to Amazon can be found in the [HOPE repository README](https://github.com/swtyree/hope-dataset/)).
40 |
41 |
42 |
43 | ## Tested Configurations
44 |
45 | We have tested our standalone training, inference and evaluation scripts on Ubuntu 20.04 and 22.04 with Python 3.8+, using an NVIDIA Titan X, 2080Ti, and Titan RTX.
46 |
47 | The ROS1 node has been tested with ROS Noetic using Python 3.10. The Isaac ROS2 DOPE node has been tested with ROS2 Foxy on Jetson AGX Xavier with JetPack 4.6, and on x86/Ubuntu 20.04 with an NVIDIA Titan X, 2080Ti, and Titan RTX.
48 |
49 |
50 |
51 |
52 | ## How to cite DOPE
53 |
54 | If you use this tool in a research project, please cite as follows:
55 | ```
56 | @inproceedings{tremblay2018corl:dope,
57 | author = {Jonathan Tremblay and Thang To and Balakumar Sundaralingam and Yu Xiang and Dieter Fox and Stan Birchfield},
58 | title = {Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects},
59 | booktitle = {Conference on Robot Learning (CoRL)},
60 | url = "https://arxiv.org/abs/1809.10790",
61 | year = 2018
62 | }
63 | ```
64 |
65 | ## License
66 |
67 | Copyright (C) 2018-2025 NVIDIA Corporation. All rights reserved. This code is licensed under the [NVIDIA Source Code License](license.md).
68 |
69 |
70 | ## Acknowledgment
71 |
72 | Thanks to Jeff Smith (jeffreys@nvidia.com) for help maintaining the repo and software. Thanks also to [Martin Günther](https://github.com/mintar) for his code contributions and fixes.
73 |
74 |
75 | ## Contact
76 |
77 | Jonathan Tremblay (jtremblay@nvidia.com), Stan Birchfield (sbirchfield@nvidia.com)
78 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | torch>=2.1.0
2 | torchvision
3 | tensorboardX
4 | boto3
5 | albumentations
6 | botocore
7 | certifi
8 | charset-normalizer
9 | configparser==5.0.0
10 | idna
11 | imageio
12 | jmespath
13 | joblib
14 | networkx
15 | numpy
16 | opencv-python-headless==4.6.0.66
17 | packaging
18 | Pillow
19 | protobuf==3.20.1
20 | pyparsing
21 | pyrr==0.10.3
22 | python-dateutil
23 | PyWavelets
24 | PyYAML
25 | qudida
26 | requests
27 | seaborn
28 | s3transfer
29 | scikit-image
30 | scikit-learn
31 | scipy
32 | simplejson
33 | six
34 | threadpoolctl
35 | tifffile
36 | typing_extensions
37 | urllib3
38 |
39 |
40 | ## If running into dependency issues, install the version-specific requirements below in a virtual environment
41 | # albumentations==1.2.1
42 | # boto3==1.24.58
43 | # botocore==1.27.58
44 | # certifi==2022.6.15
45 | # charset-normalizer==2.1.1
46 | # idna==3.3
47 | # imageio==2.21.1
48 | # jmespath==1.0.1
49 | # joblib==1.1.0
50 | # networkx==2.8.6
51 | # numpy==1.23.2
52 | # opencv-python-headless==4.6.0.66
53 | # packaging==21.3
54 | # Pillow==9.2.0
55 | # protobuf==3.20.1
56 | # pyparsing==3.0.9
57 | # pyrr==0.10.3
58 | # python-dateutil==2.8.2
59 | # PyWavelets==1.3.0
60 | # PyYAML==6.0
61 | # qudida==0.0.4
62 | # requests==2.28.1
63 | # s3transfer==0.6.0
64 | # scikit-image==0.19.3
65 | # scikit-learn==1.1.2
66 | # scipy==1.9.0
67 | # simplejson==3.17.6
68 | # six==1.16.0
69 | # tensorboardX==2.5.1
70 | # threadpoolctl==3.1.0
71 | # tifffile==2022.8.12
72 | # torch==1.12.1
73 | # torchvision==0.13.1
74 | # typing_extensions==4.3.0
75 | # urllib3==1.26.12
76 |
--------------------------------------------------------------------------------
/ros1/README.md:
--------------------------------------------------------------------------------
1 | # Running DOPE with ROS
2 |
3 | This directory and its subdirectories contain code for running DOPE with ROS Noetic.
4 | The following steps assume you have installed ROS already. Alternatively, you can use the provided [Docker image](docker/readme.md) and skip to Step #7.
5 |
6 | 1. **Install ROS**
7 |
8 | Follow these [instructions](http://wiki.ros.org/noetic/Installation/Ubuntu).
9 | You can select any of the default configurations in step 1.4; even the
10 | ROS-Base (Bare Bones) package (`ros-noetic-ros-base`) is enough.
11 |
12 | 2. **Create a catkin workspace** (if you do not already have one). To create a catkin workspace, follow these [instructions](http://wiki.ros.org/catkin/Tutorials/create_a_workspace):
13 | ```
14 | $ mkdir -p ~/catkin_ws/src # Replace `catkin_ws` with the name of your workspace
15 | $ cd ~/catkin_ws/
16 | $ catkin_make
17 | ```
18 |
19 | 3. **Download the DOPE code**
20 | ```
21 | $ cd ~/src
22 | $ git clone https://github.com/NVlabs/Deep_Object_Pose.git
23 | $ ln -s ~/src/Deep_Object_Pose/ros1/dope ~/catkin_ws/src/dope
24 | ```
25 |
26 | 4. **Install python dependencies**
27 | ```
28 | $ cd ~/catkin_ws/src/dope
29 | $ python3 -m pip install -r ~/src/Deep_Object_Pose/requirements.txt
30 | ```
31 |
32 | 5. **Install ROS dependencies**
33 | ```
34 | $ cd ~/catkin_ws
35 | $ rosdep install --from-paths src -i --rosdistro noetic
36 | $ sudo apt-get install ros-noetic-rosbash ros-noetic-ros-comm
37 | ```
38 |
39 | 6. **Build**
40 | ```
41 | $ cd ~/catkin_ws
42 | $ catkin_make
43 | ```
44 |
45 | 7. **Download [the weights](https://drive.google.com/open?id=1DfoA3m_Bm0fW8tOWXGVxi4ETlLEAgmcg)** and save them to the `weights` folder, *i.e.*, `~/catkin_ws/src/dope/weights/`.
46 |
47 |
48 | ## Running
49 |
50 | 1. **Start ROS master**
51 | ```
52 | $ cd ~/catkin_ws
53 | $ source devel/setup.bash
54 | $ roscore
55 | ```
56 |
57 | 2. **Start camera node** (or start your own camera node)
58 | ```
59 | $ roslaunch dope camera.launch # Publishes RGB images to `/dope/webcam_rgb_raw`
60 | ```
61 |
62 | The camera must publish a correct `camera_info` topic to enable DOPE to compute the correct poses. Basically all ROS drivers have a `camera_info_url` parameter where you can set the calibration info (but most ROS drivers include a reasonable default).
63 |
64 | For details on calibration and rectification of your camera see the [camera tutorial](doc/camera_tutorial.md).
65 |
66 | 3. **Edit config info** (if desired) in `~/catkin_ws/src/dope/config/config_pose.yaml`
67 | * `topic_camera`: RGB topic to listen to
68 | * `topic_camera_info`: camera info topic to listen to
69 | * `topic_publishing`: topic namespace for publishing
70 | * `input_is_rectified`: Whether the input images are rectified. It is strongly suggested to use a rectified input topic.
71 | * `downscale_height`: If the input image is larger than this, scale it down to this pixel height. Very large input images eat up all the GPU memory and slow down inference. Also, DOPE works best when the objects appear at a pixel size similar to what it saw in the training data (which is downscaled to 400 px). For these reasons, downscaling large input images to something reasonable (e.g., 400-500 px) improves memory consumption, inference speed *and* recognition results.
72 | * `weights`: dictionary of object names and the paths to their weights files; **comment out any line to disable detection/estimation of that object**
73 | * `dimensions`: dictionary of dimensions for the objects (key values must match the `weights` names)
74 | * `class_ids`: dictionary of class ids to be used in the messages published on the `/dope/detected_objects` topic (key values must match the `weights` names)
75 | * `draw_colors`: dictionary of object colors (key values must match the `weights` names)
76 | * `model_transforms`: dictionary of transforms that are applied to the pose before publishing (key values must match the `weights` names)
77 | * `meshes`: dictionary of mesh filenames for visualization (key values must match the `weights` names)
78 | * `mesh_scales`: dictionary of scaling factors for the visualization meshes (key values must match the `weights` names)
79 | * `overlay_belief_images`: whether to overlay the input image on the belief images published on /dope/belief_[obj_name]
80 | * `thresh_angle`: undocumented
81 | * `thresh_map`: undocumented
82 | * `sigma`: undocumented
83 | * `thresh_points`: Thresholding the confidence for object detection; increase this value if you see too many false positives, reduce it if objects are not detected.
84 |
85 | 4. **Start DOPE node**
86 | ```
87 | $ roslaunch dope dope.launch [config:=/path/to/my_config.yaml] # Config file is optional; default is `config_pose.yaml`
88 | ```
89 |
90 |
91 |
92 |
93 | ## Debugging
94 |
95 | * The following ROS topics are published (assuming `topic_publishing == 'dope'`):
96 | ```
97 | /dope/belief_[obj_name] # belief maps of object
98 | /dope/dimension_[obj_name] # dimensions of object
99 | /dope/pose_[obj_name] # timestamped pose of object
100 | /dope/rgb_points # RGB images with detected cuboids overlaid
101 | /dope/detected_objects # vision_msgs/Detection3DArray of all detected objects
102 | /dope/markers # RViz visualization markers for all objects
103 | ```
104 | *Note:* `[obj_name]` is in {cracker, gelatin, meat, mustard, soup, sugar}. A minimal subscriber sketch is shown at the end of this section.
105 |
106 | * To debug in RViz, run `rviz`, then add one or more of the following displays:
107 | * `Add > Image` to view the raw RGB image or the image with cuboids overlaid
108 | * `Add > Pose` to view the object coordinate frame in 3D.
109 | * `Add > MarkerArray` to view the cuboids, meshes etc. in 3D.
110 | * `Add > Camera` to view the RGB Image with the poses and markers from above.
111 |
112 | If you do not have a coordinate frame set up, you can run this static transformation: `rosrun tf2_ros static_transform_publisher 0 0 0 0.7071 0 0 -0.7071 world <camera_frame_id>`, where `<camera_frame_id>` is the `frame_id` of your input camera messages. Make sure that in RViz's `Global Options`, the `Fixed Frame` is set to `world`. Alternatively, you can skip the `static_transform_publisher` step and directly set the `Fixed Frame` to your `<camera_frame_id>`.
113 |
114 | * If `rosrun` does not find the package (`[rospack] Error: package 'dope' not found`), be sure that you called `source devel/setup.bash` as mentioned above. To find the package, run `rospack find dope`.
115 |
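Below is a minimal sketch of consuming these topics from your own node, assuming the `soup` object is enabled and that the per-object pose topic carries `geometry_msgs/PoseStamped` messages (check with `rostopic type` if unsure):

```
#!/usr/bin/env python3
import rospy
from geometry_msgs.msg import PoseStamped
from vision_msgs.msg import Detection3DArray

def on_pose(msg):
    p = msg.pose.position
    rospy.loginfo("soup at x=%.3f y=%.3f z=%.3f (frame %s)",
                  p.x, p.y, p.z, msg.header.frame_id)

def on_detections(msg):
    rospy.loginfo("%d objects detected", len(msg.detections))

rospy.init_node("dope_listener")
rospy.Subscriber("/dope/pose_soup", PoseStamped, on_pose)
rospy.Subscriber("/dope/detected_objects", Detection3DArray, on_detections)
rospy.spin()
```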
116 |
--------------------------------------------------------------------------------
/ros1/docker/Dockerfile.noetic:
--------------------------------------------------------------------------------
1 | FROM ros:noetic-robot
2 |
3 | # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
4 | # Full license terms provided in LICENSE.md file.
5 | # To build:
6 | # docker build -t nvidia-dope:noetic-v1 -f Dockerfile.noetic ..
7 |
8 | ENV HOME /root
9 | ENV DEBIAN_FRONTEND=noninteractive
10 |
11 | # Install system and development components
12 | RUN apt update && apt -y --no-install-recommends install \
13 | software-properties-common \
14 | build-essential \
15 | cmake \
16 | git \
17 | python3-pip \
18 | libxext6 \
19 | libx11-6 \
20 | libglvnd0 \
21 | libgl1 \
22 | libglx0 \
23 | libegl1 \
24 | freeglut3-dev \
25 | && apt -y autoremove \
26 | && apt clean
27 |
28 | # Install required ROS components
29 | RUN apt update && apt -y --no-install-recommends install \
30 | ros-noetic-cv-bridge \
31 | ros-noetic-geometry-msgs \
32 | ros-noetic-message-filters \
33 | ros-noetic-resource-retriever \
34 | ros-noetic-rospy \
35 | ros-noetic-sensor-msgs \
36 | ros-noetic-std-msgs \
37 | ros-noetic-tf \
38 | ros-noetic-vision-msgs \
39 | ros-noetic-visualization-msgs \
40 | ros-noetic-rviz \
41 | && apt -y autoremove \
42 | && apt clean
43 |
44 | # pip install required Python packages
45 | COPY requirements.txt ${HOME}
46 | RUN python3 -m pip install --no-cache-dir -r ${HOME}/requirements.txt
47 |
48 | # Setup catkin workspace
49 | ENV CATKIN_WS ${HOME}/catkin_ws
50 | COPY dope ${CATKIN_WS}/src/dope
51 | COPY docker/init_workspace.sh ${HOME}
52 | RUN ${HOME}/init_workspace.sh
53 | RUN echo "source ${CATKIN_WS}/devel/setup.bash" >> ${HOME}/.bashrc
54 |
55 | ENV DISPLAY :0
56 | ENV NVIDIA_VISIBLE_DEVICES all
57 | ENV NVIDIA_DRIVER_CAPABILITIES graphics,utility,compute
58 | ENV TERM=xterm
59 | # Some QT-Apps don't show controls without this
60 | ENV QT_X11_NO_MITSHM 1
--------------------------------------------------------------------------------
/ros1/docker/init_workspace.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | #
3 |
4 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
5 | # Full license terms provided in LICENSE.md file.
6 |
7 | # Stop in case of any error.
8 | set -e
9 |
10 | source /opt/ros/noetic/setup.bash
11 |
12 | # Create catkin workspace.
13 | mkdir -p ${CATKIN_WS}/src
14 | cd ${CATKIN_WS}/src
15 | catkin_init_workspace
16 | # Clone ROS libraries that must be built from source
17 | git clone https://github.com/ros-perception/camera_info_manager_py.git
18 | cd ..
19 | catkin_make
20 |
--------------------------------------------------------------------------------
/ros1/docker/readme.md:
--------------------------------------------------------------------------------
1 | ## DOPE in a Docker Container
2 |
3 | Running ROS inside of [Docker](https://www.docker.com/) is an excellent way to
4 | experiment with DOPE, as it allows the user to completely isolate all software and configuration
5 | changes from the host system. This document describes how to create and run a
6 | Docker image that contains a complete ROS environment that supports DOPE,
7 | including all required components, such as ROS Noetic, rviz, CUDA with cuDNN,
8 | and other packages.
9 |
10 | The current configuration assumes all components are installed on an x86 host
11 | platform running Ubuntu 18.04 or later. Further, use of the DOPE Docker container requires an NVIDIA GPU and Docker version 19.03.0 or later.
12 |
13 |
14 | ### Steps
15 |
16 | 1. **Download the DOPE code**
17 | ```
18 | $ git clone https://github.com/NVlabs/Deep_Object_Pose.git dope
19 | ```
20 |
21 | 2. **Build the docker image**
22 | ```
23 | $ cd dope/docker
24 | $ docker build -t nvidia-dope:noetic-v1 -f Dockerfile.noetic ..
25 | ```
26 | This will take several minutes and requires an internet connection.
27 |
28 | 3. **Plug in your camera**
29 | Docker will not recognize a USB device that is plugged in after the container is started.
30 |
31 | 4. **Run the container**
32 | ```
33 | $ ./run_dope_docker.sh [name] [host dir] [container dir]
34 | ```
35 | Parameters:
36 | - `name` is an optional field that specifies the name of this image. By default, it is `nvidia-dope-v2`. By using different names, you can create multiple containers from the same image.
37 | - `host dir` and `container dir` are a pair of optional fields that allow you to specify a mapping between a directory on your host machine and a location inside the container. This is useful for sharing code and data between the two systems. By default, it maps the directory containing dope to `/root/catkin_ws/src/dope` in the container.
38 |
39 | Only the first invocation of this script with a given name will create a container. Subsequent executions will attach to the running container allowing you -- in effect -- to have multiple terminal sessions into a single container.
40 |
41 | 5. **Build DOPE**
42 | Return to step 7 of the [installation instructions](../readme.md) (downloading the weights).
43 |
44 | *Note:* Since the Docker container binds directly to the host's network, it will see `roscore` even if `roscore` is running outside the Docker container.
45 |
--------------------------------------------------------------------------------
/ros1/docker/run_dope_docker.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
4 | # Full license terms provided in LICENSE.md file.
5 |
6 | CONTAINER_NAME=$1
7 | if [[ -z "${CONTAINER_NAME}" ]]; then
8 | CONTAINER_NAME=nvidia-dope-v2
9 | fi
10 |
11 | # This specifies a mapping between a host directory and a directory in the
12 | # docker container. This mapping should be changed if you wish to have access to
13 | # a different directory
14 | HOST_DIR=$2
15 | if [[ -z "${HOST_DIR}" ]]; then
16 | HOST_DIR=`realpath ${PWD}/..`
17 | fi
18 |
19 | CONTAINER_DIR=$3
20 | if [[ -z "${CONTAINER_DIR}" ]]; then
21 | CONTAINER_DIR=/root/catkin_ws/src/dope
22 | fi
23 |
24 | echo "Container name : ${CONTAINER_NAME}"
25 | echo "Host directory : ${HOST_DIR}"
26 | echo "Container directory: ${CONTAINER_DIR}"
27 | DOPE_ID=`docker ps -aqf "name=^/${CONTAINER_NAME}$"`
28 | if [ -z "${DOPE_ID}" ]; then
29 | echo "Creating new DOPE docker container."
30 | xhost +local:root
31 | docker run --gpus all -it --privileged --network=host -v ${HOST_DIR}:${CONTAINER_DIR}:rw -v /tmp/.X11-unix:/tmp/.X11-unix:rw --env="DISPLAY" --name=${CONTAINER_NAME} nvidia-dope:noetic-v1 bash
32 | else
33 | echo "Found DOPE docker container: ${DOPE_ID}."
34 | # Check if the container is already running and start if necessary.
35 | if [ -z `docker ps -qf "name=^/${CONTAINER_NAME}$"` ]; then
36 | xhost +local:${DOPE_ID}
37 | echo "Starting and attaching to ${CONTAINER_NAME} container..."
38 | docker start ${DOPE_ID}
39 | docker attach ${DOPE_ID}
40 | else
41 | echo "Found running ${CONTAINER_NAME} container, attaching bash..."
42 | docker exec -it ${DOPE_ID} bash
43 | fi
44 | fi
45 |
--------------------------------------------------------------------------------
/ros1/dope/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | cmake_minimum_required(VERSION 3.5.2)
2 | project(dope)
3 |
4 | ## Compile as C++11, supported in ROS Kinetic and newer
5 | # add_compile_options(-std=c++11)
6 |
7 | ## Find catkin macros and libraries
8 | ## if COMPONENTS list like find_package(catkin REQUIRED COMPONENTS xyz)
9 | ## is used, also find other catkin packages
10 | find_package(catkin REQUIRED COMPONENTS
11 | cv_bridge
12 | geometry_msgs
13 | message_filters
14 | resource_retriever
15 | rospy
16 | sensor_msgs
17 | std_msgs
18 | tf
19 | vision_msgs
20 | visualization_msgs
21 | )
22 |
23 | ## System dependencies are found with CMake's conventions
24 | # find_package(Boost REQUIRED COMPONENTS system)
25 |
26 |
27 | ## Uncomment this if the package has a setup.py. This macro ensures
28 | ## modules and global scripts declared therein get installed
29 | ## See http://ros.org/doc/api/catkin/html/user_guide/setup_dot_py.html
30 | catkin_python_setup()
31 |
32 | ################################################
33 | ## Declare ROS messages, services and actions ##
34 | ################################################
35 |
36 | ## To declare and build messages, services or actions from within this
37 | ## package, follow these steps:
38 | ## * Let MSG_DEP_SET be the set of packages whose message types you use in
39 | ## your messages/services/actions (e.g. std_msgs, actionlib_msgs, ...).
40 | ## * In the file package.xml:
41 | ## * add a build_depend tag for "message_generation"
42 | ## * add a build_depend and a exec_depend tag for each package in MSG_DEP_SET
43 | ## * If MSG_DEP_SET isn't empty the following dependency has been pulled in
44 | ## but can be declared for certainty nonetheless:
45 | ## * add a exec_depend tag for "message_runtime"
46 | ## * In this file (CMakeLists.txt):
47 | ## * add "message_generation" and every package in MSG_DEP_SET to
48 | ## find_package(catkin REQUIRED COMPONENTS ...)
49 | ## * add "message_runtime" and every package in MSG_DEP_SET to
50 | ## catkin_package(CATKIN_DEPENDS ...)
51 | ## * uncomment the add_*_files sections below as needed
52 | ## and list every .msg/.srv/.action file to be processed
53 | ## * uncomment the generate_messages entry below
54 | ## * add every package in MSG_DEP_SET to generate_messages(DEPENDENCIES ...)
55 |
56 | ## Generate messages in the 'msg' folder
57 | # add_message_files(
58 | # FILES
59 | # Message1.msg
60 | # Message2.msg
61 | # )
62 |
63 | ## Generate services in the 'srv' folder
64 | # add_service_files(
65 | # FILES
66 | # Service1.srv
67 | # Service2.srv
68 | # )
69 |
70 | ## Generate actions in the 'action' folder
71 | # add_action_files(
72 | # FILES
73 | # Action1.action
74 | # Action2.action
75 | # )
76 |
77 | ## Generate added messages and services with any dependencies listed here
78 | # generate_messages(
79 | # DEPENDENCIES
80 | # std_msgs
81 | # )
82 |
83 | ################################################
84 | ## Declare ROS dynamic reconfigure parameters ##
85 | ################################################
86 |
87 | ## To declare and build dynamic reconfigure parameters within this
88 | ## package, follow these steps:
89 | ## * In the file package.xml:
90 | ## * add a build_depend and a exec_depend tag for "dynamic_reconfigure"
91 | ## * In this file (CMakeLists.txt):
92 | ## * add "dynamic_reconfigure" to
93 | ## find_package(catkin REQUIRED COMPONENTS ...)
94 | ## * uncomment the "generate_dynamic_reconfigure_options" section below
95 | ## and list every .cfg file to be processed
96 |
97 | ## Generate dynamic reconfigure parameters in the 'cfg' folder
98 | # generate_dynamic_reconfigure_options(
99 | # cfg/DynReconf1.cfg
100 | # cfg/DynReconf2.cfg
101 | # )
102 |
103 | ###################################
104 | ## catkin specific configuration ##
105 | ###################################
106 | ## The catkin_package macro generates cmake config files for your package
107 | ## Declare things to be passed to dependent projects
108 | ## INCLUDE_DIRS: uncomment this if your package contains header files
109 | ## LIBRARIES: libraries you create in this project that dependent projects also need
110 | ## CATKIN_DEPENDS: catkin_packages dependent projects also need
111 | ## DEPENDS: system dependencies of this project that dependent projects also need
112 | catkin_package(
113 | CATKIN_DEPENDS geometry_msgs sensor_msgs std_msgs vision_msgs visualization_msgs
114 | # DEPENDS system_lib
115 | )
116 |
117 | ###########
118 | ## Build ##
119 | ###########
120 |
121 | ## Specify additional locations of header files
122 | ## Your package locations should be listed before other locations
123 | include_directories(
124 | # include
125 | ${catkin_INCLUDE_DIRS}
126 | )
127 |
128 | ## Declare a C++ library
129 | # add_library(${PROJECT_NAME}
130 | # src/${PROJECT_NAME}/dope/dope.cpp
131 | # )
132 |
133 | ## Add cmake target dependencies of the library
134 | ## as an example, code may need to be generated before libraries
135 | ## either from message generation or dynamic reconfigure
136 | # add_dependencies(${PROJECT_NAME} ${${PROJECT_NAME}_EXPORTED_TARGETS} ${catkin_EXPORTED_TARGETS})
137 |
138 | ## Declare a C++ executable
139 | ## With catkin_make all packages are built within a single CMake context
140 | ## The recommended prefix ensures that target names across packages don't collide
141 | # add_executable(${PROJECT_NAME}_node src/dope_vis_node.cpp)
142 |
143 | ## Rename C++ executable without prefix
144 | ## The above recommended prefix causes long target names, the following renames the
145 | ## target back to the shorter version for ease of user use
146 | ## e.g. "rosrun someones_pkg node" instead of "rosrun someones_pkg someones_pkg_node"
147 | # set_target_properties(${PROJECT_NAME}_node PROPERTIES OUTPUT_NAME node PREFIX "")
148 |
149 | ## Add cmake target dependencies of the executable
150 | ## same as for the library above
151 | # add_dependencies(${PROJECT_NAME}_node ${${PROJECT_NAME}_EXPORTED_TARGETS} ${catkin_EXPORTED_TARGETS})
152 |
153 | ## Specify libraries to link a library or executable target against
154 | # target_link_libraries(${PROJECT_NAME}_node
155 | # ${catkin_LIBRARIES}
156 | # )
157 |
158 | #############
159 | ## Install ##
160 | #############
161 |
162 | # all install targets should use catkin DESTINATION variables
163 | # See http://ros.org/doc/api/catkin/html/adv_user_guide/variables.html
164 |
165 | ## Mark executable scripts (Python etc.) for installation
166 | ## in contrast to setup.py, you can choose the destination
167 | # install(PROGRAMS
168 | # scripts/my_python_script
169 | # DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
170 | # )
171 |
172 | catkin_install_python(PROGRAMS
173 | nodes/camera
174 | nodes/dope
175 | DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION})
176 |
177 | ## Mark executables and/or libraries for installation
178 | # install(TARGETS ${PROJECT_NAME} ${PROJECT_NAME}_node
179 | # ARCHIVE DESTINATION ${CATKIN_PACKAGE_LIB_DESTINATION}
180 | # LIBRARY DESTINATION ${CATKIN_PACKAGE_LIB_DESTINATION}
181 | # RUNTIME DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
182 | # )
183 |
184 | ## Mark cpp header files for installation
185 | # install(DIRECTORY include/${PROJECT_NAME}/
186 | # DESTINATION ${CATKIN_PACKAGE_INCLUDE_DESTINATION}
187 | # FILES_MATCHING PATTERN "*.h"
188 | # PATTERN ".svn" EXCLUDE
189 | # )
190 |
191 | ## Mark other files for installation (e.g. launch and bag files, etc.)
192 | # install(FILES
193 | # # myfile1
194 | # # myfile2
195 | # DESTINATION ${CATKIN_PACKAGE_SHARE_DESTINATION}
196 | # )
197 |
198 | #############
199 | ## Testing ##
200 | #############
201 |
202 | ## Add gtest based cpp test target and link libraries
203 | # catkin_add_gtest(${PROJECT_NAME}-test test/test_dope_vis.cpp)
204 | # if(TARGET ${PROJECT_NAME}-test)
205 | # target_link_libraries(${PROJECT_NAME}-test ${PROJECT_NAME})
206 | # endif()
207 |
208 | ## Add folders to be run by python nosetests
209 | # catkin_add_nosetests(test)
210 |
--------------------------------------------------------------------------------
/ros1/dope/config/camera_info.yaml:
--------------------------------------------------------------------------------
1 | image_width: 640
2 | image_height: 480
3 | camera_name: dope_webcam_0
4 | camera_matrix:
5 | rows: 3
6 | cols: 3
7 | data: [641.5, 0, 320.0, 0, 641.5, 240.0, 0, 0, 1]
8 | distortion_model: plumb_bob
9 | distortion_coefficients:
10 | rows: 1
11 | cols: 5
12 | data: [0, 0, 0, 0, 0]
13 | rectification_matrix:
14 | rows: 3
15 | cols: 3
16 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
17 | projection_matrix:
18 | rows: 3
19 | cols: 4
20 | data: [641.5, 0, 320.0, 0, 0, 641.5, 240.0, 0, 0, 0, 1, 0]
21 |
--------------------------------------------------------------------------------
/ros1/dope/config/config_pose.yaml:
--------------------------------------------------------------------------------
1 | topic_camera: "/dope/webcam/image_raw"
2 | topic_camera_info: "/dope/webcam/camera_info"
3 | topic_publishing: "dope"
4 | input_is_rectified: True # Whether the input image is rectified (strongly suggested!)
5 | downscale_height: 400 # if the input image is larger than this, scale it down to this pixel height
6 |
7 | # Comment any of these lines to prevent detection / pose estimation of that object
8 | weights: {
9 | # "cracker":"package://dope/weights/cracker_60.pth",
10 | # "gelatin":"package://dope/weights/gelatin_60.pth",
11 | # "meat":"package://dope/weights/meat_20.pth",
12 | # "mustard":"package://dope/weights/mustard_60.pth",
13 | "soup":"package://dope/weights/soup_60.pth",
14 | #"sugar":"package://dope/weights/sugar_60.pth",
15 | # "bleach":"package://dope/weights/bleach_28_dr.pth"
16 |
17 | # NEW OBJECTS - HOPE
18 | # "AlphabetSoup":"package://dope/weights/AlphabetSoup.pth",
19 | # "BBQSauce":"package://dope/weights/BBQSauce.pth",
20 | # "Butter":"package://dope/weights/Butter.pth",
21 | # "Cherries":"package://dope/weights/Cherries.pth",
22 | # "ChocolatePudding":"package://dope/weights/ChocolatePudding.pth",
23 | # "Cookies":"package://dope/weights/Cookies.pth",
24 | # "Corn":"package://dope/weights/Corn.pth",
25 | # "CreamCheese":"package://dope/weights/CreamCheese.pth",
26 | # "GreenBeans":"package://dope/weights/GreenBeans.pth",
27 | # "GranolaBars":"package://dope/weights/GranolaBars.pth",
28 | # "Ketchup":"package://dope/weights/Ketchup.pth",
29 | # "MacaroniAndCheese":"package://dope/weights/MacaroniAndCheese.pth",
30 | # "Mayo":"package://dope/weights/Mayo.pth",
31 | # "Milk":"package://dope/weights/Milk.pth",
32 | # "Mushrooms":"package://dope/weights/Mushrooms.pth",
33 | # "Mustard":"package://dope/weights/Mustard.pth",
34 | # "Parmesan":"package://dope/weights/Parmesan.pth",
35 | # "PeasAndCarrots":"package://dope/weights/PeasAndCarrots.pth",
36 | # "Peaches":"package://dope/weights/Peaches.pth",
37 | # "Pineapple":"package://dope/weights/Pineapple.pth",
38 | # "Popcorn":"package://dope/weights/Popcorn.pth",
39 | # "OrangeJuice":"package://dope/weights/OrangeJuice.pth",
40 | # "Raisins":"package://dope/weights/Raisins.pth",
41 | # "SaladDressing":"package://dope/weights/SaladDressing.pth",
42 | # "Spaghetti":"package://dope/weights/Spaghetti.pth",
43 | # "TomatoSauce":"package://dope/weights/TomatoSauce.pth",
44 | # "Tuna":"package://dope/weights/Tuna.pth",
45 | # "Yogurt":"package://dope/weights/Yogurt.pth",
46 |
47 | }
48 |
49 | # Cuboid dimension in cm x,y,z
50 | dimensions: {
51 | "cracker": [16.403600692749023,21.343700408935547,7.179999828338623],
52 | "gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059],
53 | "meat": [10.164673805236816,8.3542995452880859,5.7600898742675781],
54 | "mustard": [9.6024150848388672,19.130100250244141,5.824894905090332],
55 | "soup": [6.7659378051757813,10.185500144958496,6.771425724029541],
56 | "sugar": [9.267730712890625,17.625339508056641,4.5134143829345703],
57 | "bleach": [10.267730712890625,26.625339508056641,7.5134143829345703],
58 |
59 | # new objects
60 | "AlphabetSoup" : [ 8.3555002212524414, 7.1121001243591309, 6.6055998802185059 ],
61 | "Butter" : [ 5.282599925994873, 2.3935999870300293, 10.330100059509277 ],
62 | "Ketchup" : [ 14.860799789428711, 4.3368000984191895, 6.4513998031616211 ],
63 | "Pineapple" : [ 5.7623000144958496, 6.95989990234375, 6.567500114440918 ],
64 | "BBQSauce" : [ 14.832900047302246, 4.3478999137878418, 6.4632000923156738 ],
65 | "MacaroniAndCheese" : [ 16.625600814819336, 4.0180997848510742, 12.350899696350098 ],
66 | "Popcorn" : [ 8.4976997375488281, 3.825200080871582, 12.649200439453125 ],
67 | "Mayo" : [ 14.790200233459473, 4.1030998229980469, 6.4541001319885254 ],
68 | "Raisins" : [ 12.317500114440918, 3.9751999378204346, 8.5874996185302734 ],
69 | "Cherries" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ],
70 | "Milk" : [ 19.035800933837891, 7.326200008392334, 7.2154998779296875 ],
71 | "SaladDressing" : [ 14.744099617004395, 4.3695998191833496, 6.403900146484375 ],
72 | "ChocolatePudding" : [ 4.947199821472168, 2.9923000335693359, 8.3498001098632812 ],
73 | "Mushrooms" : [ 3.3322000503540039, 7.079899787902832, 6.5869998931884766 ],
74 | "Spaghetti" : [ 4.9836997985839844, 2.8492999076843262, 24.988100051879883 ],
75 | "Cookies" : [ 16.724300384521484, 4.015200138092041, 12.274600028991699 ],
76 | "Mustard" : [ 16.004999160766602, 4.8573999404907227, 6.5132999420166016 ],
77 | "TomatoSauce" : [ 8.2847003936767578, 7.0198001861572266, 6.6469998359680176 ],
78 | "Corn" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ],
79 | "OrangeJuice" : [ 19.248300552368164, 7.2781000137329102, 7.1582999229431152 ],
80 | "Tuna" : [ 3.2571001052856445, 7.0805997848510742, 6.5837001800537109 ],
81 | "CreamCheese" : [ 5.3206000328063965, 2.4230999946594238, 10.359000205993652 ],
82 | "Parmesan" : [ 10.286199569702148, 6.6093001365661621, 7.1117000579833984 ],
83 | "Yogurt" : [ 5.3677000999450684, 6.7961997985839844, 6.7915000915527344 ],
84 | "GranolaBars" : [ 12.400600433349609, 3.8738000392913818, 16.53380012512207 ],
85 | "Peaches" : [ 5.7781000137329102, 7.0961999893188477, 6.5925998687744141 ],
86 | "GreenBeans" : [ 5.758699893951416, 7.0608000755310059, 6.5732002258300781 ],
87 | "PeasAndCarrots" : [ 5.8512001037597656, 7.0636000633239746, 6.5918002128601074 ]
88 | }
89 |
90 | class_ids: {
91 | "cracker": 1,
92 | "gelatin": 2,
93 | "meat": 3,
94 | "mustard": 4,
95 | "soup": 5,
96 | "sugar": 6,
97 | "bleach": 7,
98 | "AlphabetSoup" : 9,
99 | "Ketchup" : 10,
100 | "Pineapple" : 11,
101 | "BBQSauce" : 12,
102 | "MacaroniAndCheese" : 13,
103 | "Popcorn" : 14,
104 | "Butter" : 15,
105 | "Mayo" : 16,
106 | "Raisins" : 17,
107 | "Cherries" : 18,
108 | "Milk" : 19,
109 | "SaladDressing" : 20,
110 | "ChocolatePudding" : 21,
111 | "Mushrooms" : 22,
112 | "Spaghetti" : 23,
113 | "Cookies" : 24,
114 | "Mustard" : 25,
115 | "TomatoSauce" : 26,
116 | "Corn" : 27,
117 | "OrangeJuice" : 28,
118 | "Tuna" : 29,
119 | "CreamCheese" : 20,
120 | "Parmesan" : 31,
121 | "Yogurt" : 32,
122 | "GranolaBars" : 33,
123 | "Peaches" : 34,
124 | "GreenBeans" : 35,
125 | "PeasAndCarrots" : 36
126 | }
127 |
128 | draw_colors: {
129 | "cracker": [13, 255, 128], # green
130 | "gelatin": [255, 255, 255], # while
131 | "meat": [0, 104, 255], # blue
132 | "mustard": [217,12, 232], # magenta
133 | "soup": [255, 101, 0], # orange
134 | "sugar": [232, 222, 12], # yellow
135 | "bleach": [232, 222, 12], # yellow
136 | }
137 |
138 | # optional: provide a transform that is applied to the pose returned by DOPE
139 | model_transforms: {
140 | # "cracker": [[ 0, 0, 1, 0],
141 | # [ 0, -1, 0, 0],
142 | # [ 1, 0, 0, 0],
143 | # [ 0, 0, 0, 1]]
144 | }
145 |
146 | # optional: if you provide a mesh of the object here, a mesh marker will be
147 | # published for visualization in RViz
148 | # You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb
149 | meshes: {
150 | # "cracker": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/003_cracker_box/google_16k/textured.obj",
151 | # "gelatin": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/009_gelatin_box/google_16k/textured.obj",
152 | # "meat": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/010_potted_meat_can/google_16k/textured.obj",
153 | # "mustard": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/006_mustard_bottle/google_16k/textured.obj",
154 | # "soup": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/005_tomato_soup_can/google_16k/textured.obj",
155 | # "sugar": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/004_sugar_box/google_16k/textured.obj",
156 | # "bleach": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/021_bleach_cleanser/google_16k/textured.obj",
157 | }
158 |
159 | # optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0.
160 | mesh_scales: {
161 | "cracker": 0.01,
162 | "gelatin": 0.01,
163 | "meat": 0.01,
164 | "mustard": 0.01,
165 | "soup": 0.01,
166 | "sugar": 0.01,
167 | "bleach": 0.01,
168 | }
169 |
170 | overlay_belief_images: True # Whether to overlay the input image on the belief images published on /dope/belief_[obj_name]
171 |
172 | # Config params for DOPE
173 | thresh_angle: 0.5
174 | thresh_map: 0.01
175 | sigma: 3
176 | thresh_points: 0.1
177 |
--------------------------------------------------------------------------------
/ros1/dope/launch/camera.launch:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
--------------------------------------------------------------------------------
/ros1/dope/launch/dope.launch:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/ros1/dope/nodes/camera:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
3 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
4 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
5 |
6 | """
7 | This file opens an RGB camera and publishes images via ROS.
8 | It uses OpenCV to capture from camera 0.
9 | """
10 |
11 | import cv2
12 | import rospy
13 | from camera_info_manager import CameraInfoManager
14 | from cv_bridge import CvBridge
15 | from sensor_msgs.msg import Image, CameraInfo
16 | import sys
17 |
18 |
19 | def publish_images(freq=100):
20 | cam_index = 0 # index of camera to capture
21 |
22 | ### initialize ROS publishers etc.
23 | rospy.init_node('dope_webcam')
24 | camera_ns = rospy.get_param('camera', 'dope/webcam')
25 | img_topic = '{}/image_raw'.format(camera_ns)
26 | info_topic = '{}/camera_info'.format(camera_ns)
27 | image_pub = rospy.Publisher(img_topic, Image, queue_size=10)
28 | info_pub = rospy.Publisher(info_topic, CameraInfo, queue_size=10)
29 | info_manager = CameraInfoManager(cname='dope_webcam_{}'.format(cam_index),
30 | namespace=camera_ns)
31 | try:
32 | camera_info_url = rospy.get_param('~camera_info_url')
33 | if not info_manager.setURL(camera_info_url):
34 | rospy.logwarn('Camera info URL invalid: %s', camera_info_url)
35 | except KeyError:
36 | # we don't have a camera_info_url, so we'll keep the
37 | # default ('file://${ROS_HOME}/camera_info/${NAME}.yaml')
38 | pass
39 |
40 | info_manager.loadCameraInfo()
41 | if not info_manager.isCalibrated():
42 | rospy.logwarn('Camera is not calibrated, please supply a valid camera_info_url parameter!')
43 |
44 | ### open camera
45 | cap = cv2.VideoCapture(cam_index)
46 | if not cap.isOpened():
47 | rospy.logfatal("ERROR: Unable to open camera for capture. Is camera plugged in?")
48 | sys.exit(1)
49 |
50 | rospy.loginfo("Publishing images from camera %s to topic '%s'...", cam_index, img_topic)
51 | rospy.loginfo("Ctrl-C to stop")
52 |
53 | ### publish images
54 | rate = rospy.Rate(freq)
55 | while not rospy.is_shutdown():
56 | ret, frame = cap.read()
57 |
58 | if ret:
59 | image = CvBridge().cv2_to_imgmsg(frame, "bgr8")
60 | image.header.frame_id = 'dope_webcam'
61 | image.header.stamp = rospy.Time.now()
62 | image_pub.publish(image)
63 | # we need to call getCameraInfo() every time in case it was updated
64 | camera_info = info_manager.getCameraInfo()
65 | camera_info.header = image.header
66 | info_pub.publish(camera_info)
67 |
68 | rate.sleep()
69 |
70 |
71 | if __name__ == "__main__":
72 | try:
73 | publish_images()
74 | except rospy.ROSInterruptException:
75 | pass
76 |
--------------------------------------------------------------------------------
/ros1/dope/package.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | dope
4 | 0.0.0
5 | The DOPE package for deep object pose estimation
6 |
7 |
8 |
9 |
10 | jtremblay
11 |
12 |
13 |
14 |
15 |
16 | CC BY-NC-SA 4.0
17 |
18 |
19 |
20 |
21 |
22 | https://research.nvidia.com/publication/2018-09_Deep-Object-Pose
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 |
47 |
48 |
49 |
50 |
51 | catkin
52 | camera_info_manager_py
53 | cv_bridge
54 | geometry_msgs
55 | message_filters
56 | python-argparse
57 | resource_retriever
58 | rospy
59 | sensor_msgs
60 | std_msgs
61 | tf
62 | vision_msgs
63 | visualization_msgs
64 | python3-pyrr-pip
65 | python-pytorch-pip
66 | python3-numpy
67 | python3-scipy
68 | python3-opencv
69 | python3-pil
70 | python-configparser
71 |
72 |
73 |
74 |
75 |
76 |
77 |
78 |
--------------------------------------------------------------------------------
/ros1/dope/setup.py:
--------------------------------------------------------------------------------
1 | ## ! DO NOT MANUALLY INVOKE THIS setup.py, USE CATKIN INSTEAD
2 |
3 | from distutils.core import setup
4 | from catkin_pkg.python_setup import generate_distutils_setup
5 |
6 | # fetch values from package.xml
7 | setup_args = generate_distutils_setup(
8 | packages=['dope', 'dope.inference'],
9 | package_dir={'': 'src'})
10 |
11 | setup(**setup_args)
12 |
--------------------------------------------------------------------------------
/ros1/dope/src/dope/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/ros1/dope/src/dope/__init__.py
--------------------------------------------------------------------------------
/ros1/dope/src/dope/inference/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/ros1/dope/src/dope/inference/__init__.py
--------------------------------------------------------------------------------
/ros1/dope/src/dope/inference/cuboid.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
4 |
5 | from enum import IntEnum
6 |
7 | import cv2
8 | import numpy as np
9 |
10 |
11 | # Related to the object's local coordinate system
12 | # @unique
13 | class CuboidVertexType(IntEnum):
14 | FrontTopRight = 0
15 | FrontTopLeft = 1
16 | FrontBottomLeft = 2
17 | FrontBottomRight = 3
18 | RearTopRight = 4
19 | RearTopLeft = 5
20 | RearBottomLeft = 6
21 | RearBottomRight = 7
22 | Center = 8
23 | TotalCornerVertexCount = 8 # Corner vertices don't include the center point
24 | TotalVertexCount = 9
25 |
26 | # List of vertex index pairs defining each edge of the cuboid
27 | CuboidLineIndexes = [
28 | # Front face
29 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.FrontTopRight ],
30 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.FrontBottomRight ],
31 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontBottomLeft ],
32 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.FrontTopLeft ],
33 | # Back face
34 | [ CuboidVertexType.RearTopLeft, CuboidVertexType.RearTopRight ],
35 | [ CuboidVertexType.RearTopRight, CuboidVertexType.RearBottomRight ],
36 | [ CuboidVertexType.RearBottomRight, CuboidVertexType.RearBottomLeft ],
37 | [ CuboidVertexType.RearBottomLeft, CuboidVertexType.RearTopLeft ],
38 | # Left face
39 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.RearBottomLeft ],
40 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.RearTopLeft ],
41 | # Right face
42 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.RearBottomRight ],
43 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopRight ],
44 | ]
45 |
46 |
47 | # ========================= Cuboid3d =========================
48 | class Cuboid3d():
49 | '''This class contains a 3D cuboid.'''
50 |
51 | # Create a box with a certain size
52 | def __init__(self, size3d = [1.0, 1.0, 1.0], center_location = [0, 0, 0],
53 | coord_system = None, parent_object = None):
54 |
55 | # NOTE: This local coordinate system is similar
56 | # to the intrinsic transform matrix of a 3d object
57 | self.center_location = center_location
58 | self.coord_system = coord_system
59 | self.size3d = size3d
60 | self._vertices = [[0, 0, 0]] * CuboidVertexType.TotalVertexCount # placeholder; overwritten by generate_vertexes()
61 |
62 | self.generate_vertexes()
63 |
64 | def get_vertex(self, vertex_type):
65 | """Returns the location of a vertex.
66 |
67 | Args:
68 | vertex_type: enum of type CuboidVertexType
69 |
70 | Returns:
71 | Numpy array(3) - Location of the vertex type in the cuboid
72 | """
73 | return self._vertices[vertex_type]
74 |
75 | def get_vertices(self):
76 | return self._vertices
77 |
78 | def generate_vertexes(self):
79 | width, height, depth = self.size3d
80 |
81 | # By default just use the normal OpenCV coordinate system
82 | if (self.coord_system is None):
83 | cx, cy, cz = self.center_location
84 | # X axis point to the right
85 | right = cx + width / 2.0
86 | left = cx - width / 2.0
87 | # Y axis point downward
88 | top = cy - height / 2.0
89 | bottom = cy + height / 2.0
90 | # Z axis point forward
91 | front = cz + depth / 2.0
92 | rear = cz - depth / 2.0
93 |
94 | # List of 8 vertices of the box
95 | self._vertices = [
96 | [right, top, front], # Front Top Right
97 | [left, top, front], # Front Top Left
98 | [left, bottom, front], # Front Bottom Left
99 | [right, bottom, front], # Front Bottom Right
100 | [right, top, rear], # Rear Top Right
101 | [left, top, rear], # Rear Top Left
102 | [left, bottom, rear], # Rear Bottom Left
103 | [right, bottom, rear], # Rear Bottom Right
104 | self.center_location, # Center
105 | ]
106 | else:
107 | sx, sy, sz = self.size3d
108 | forward = np.array(self.coord_system.forward, dtype=float) * sy * 0.5
109 | up = np.array(self.coord_system.up, dtype=float) * sz * 0.5
110 | right = np.array(self.coord_system.right, dtype=float) * sx * 0.5
111 | center = np.array(self.center_location, dtype=float)
112 | self._vertices = [
113 | center + forward + up + right, # Front Top Right
114 | center + forward + up - right, # Front Top Left
115 | center + forward - up - right, # Front Bottom Left
116 | center + forward - up + right, # Front Bottom Right
117 | center - forward + up + right, # Rear Top Right
118 | center - forward + up - right, # Rear Top Left
119 | center - forward - up - right, # Rear Bottom Left
120 | center - forward - up + right, # Rear Bottom Right
121 | self.center_location, # Center
122 | ]
123 |
124 | def get_projected_cuboid2d(self, cuboid_transform, camera_intrinsic_matrix):
125 | """
126 | Projects the cuboid into the image plane using camera intrinsics.
127 |
128 | Args:
129 | cuboid_transform: the world transform of the cuboid
130 | camera_intrinsic_matrix: camera intrinsic matrix
131 |
132 | Returns:
133 | Cuboid2d - the projected cuboid points
134 | """
135 |
136 | world_transform_matrix = cuboid_transform
137 | rvec = [0, 0, 0]
138 | tvec = [0, 0, 0]
139 | dist_coeffs = np.zeros((4, 1))
140 |
141 | transformed_vertices = np.zeros((CuboidVertexType.TotalVertexCount, 3))
142 | for vertex_index in range(CuboidVertexType.TotalVertexCount):
143 | vertex3d = np.append(np.asarray(self._vertices[vertex_index], dtype=float), 1.0) # homogeneous coordinates
144 | transformed_vertices[vertex_index] = (world_transform_matrix @ vertex3d)[:3]
145 |
146 | projected_vertices, _ = cv2.projectPoints(transformed_vertices, np.asarray(rvec, dtype=float),
147 | np.asarray(tvec, dtype=float), camera_intrinsic_matrix, dist_coeffs)
148 |
149 | return Cuboid2d(projected_vertices)
150 |
--------------------------------------------------------------------------------
/ros1/dope/src/dope/inference/cuboid_pnp_solver.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved.
2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
4 |
5 | import cv2
6 | import numpy as np
7 | from .cuboid import CuboidVertexType
8 | from pyrr import Quaternion
9 |
10 |
11 | class CuboidPNPSolver(object):
12 | """
13 | This class is used to find the 6-DoF pose of a cuboid given its projected vertices.
14 |
15 | Runs perspective-n-point (PNP) algorithm.
16 | """
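# Illustrative usage sketch (object name, intrinsics, and cuboid size below are
# made-up example values, not taken from this repository's config):
#
#   solver = CuboidPNPSolver(
#       "cracker",
#       camera_intrinsic_matrix=np.array([[640.0, 0.0, 320.0],
#                                         [0.0, 640.0, 240.0],
#                                         [0.0, 0.0, 1.0]]),
#       cuboid3d=Cuboid3d([16.4, 21.3, 7.2]))
#   # cuboid2d_points: 9 image points, one per CuboidVertexType entry (None if undetected)
#   location, quaternion, projected_points = solver.solve_pnp(cuboid2d_points)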
17 |
18 | # Class variables
19 | cv2version = cv2.__version__.split('.')
20 | cv2majorversion = int(cv2version[0])
21 |
22 | def __init__(self, object_name="", camera_intrinsic_matrix = None, cuboid3d = None,
23 | dist_coeffs = np.zeros((4, 1))):
24 | self.object_name = object_name
25 | if (not camera_intrinsic_matrix is None):
26 | self._camera_intrinsic_matrix = camera_intrinsic_matrix
27 | else:
28 | self._camera_intrinsic_matrix = np.array([
29 | [0, 0, 0],
30 | [0, 0, 0],
31 | [0, 0, 0]
32 | ])
33 | self._cuboid3d = cuboid3d
34 |
35 | self._dist_coeffs = dist_coeffs
36 |
37 | def set_camera_intrinsic_matrix(self, new_intrinsic_matrix):
38 | '''Sets the camera intrinsic matrix'''
39 | self._camera_intrinsic_matrix = new_intrinsic_matrix
40 |
41 | def set_dist_coeffs(self, dist_coeffs):
42 | '''Sets the camera distortion coefficients'''
43 | self._dist_coeffs = dist_coeffs
44 |
45 | def solve_pnp(self, cuboid2d_points, pnp_algorithm = None):
46 | """
47 | Detects the rotation and translation
48 | of a cuboid object from its vertices'
49 | 2D locations in the image.
50 | """
51 |
52 | location = None
53 | quaternion = None
54 | projected_points = cuboid2d_points
55 |
56 | cuboid3d_points = np.array(self._cuboid3d.get_vertices())
57 | obj_2d_points = []
58 | obj_3d_points = []
59 |
60 | for i in range(CuboidVertexType.TotalVertexCount):
61 | check_point_2d = cuboid2d_points[i]
62 | # Ignore invalid points
63 | if (check_point_2d is None):
64 | continue
65 | obj_2d_points.append(check_point_2d)
66 | obj_3d_points.append(cuboid3d_points[i])
67 |
68 | obj_2d_points = np.array(obj_2d_points, dtype=float)
69 | obj_3d_points = np.array(obj_3d_points, dtype=float)
70 |
71 | valid_point_count = len(obj_2d_points)
72 | print(valid_point_count, "valid points found" )
73 |
74 | # Set PNP algorithm based on OpenCV version and number of valid points
75 | is_points_valid = False
76 |
77 | if pnp_algorithm is None:
78 | if CuboidPNPSolver.cv2majorversion == 2:
79 | is_points_valid = True
80 | pnp_algorithm = cv2.CV_ITERATIVE
81 | elif CuboidPNPSolver.cv2majorversion > 2:
82 | if valid_point_count >= 6:
83 | is_points_valid = True
84 | pnp_algorithm = cv2.SOLVEPNP_ITERATIVE
85 | elif valid_point_count >= 4:
86 | is_points_valid = True
87 | pnp_algorithm = cv2.SOLVEPNP_P3P
88 | # This algorithm requires EXACTLY four points, so we truncate our
89 | # data
90 | obj_3d_points = obj_3d_points[:4]
91 | obj_2d_points = obj_2d_points[:4]
92 | # Alternative algorithms:
93 | # pnp_algorithm = cv2.SOLVEPNP_EPNP
94 | else:
95 | assert False, "DOPE will not work with versions of OpenCV earlier than 2.0"
96 |
97 | if is_points_valid:
98 | try:
99 | ret, rvec, tvec = cv2.solvePnP(
100 | obj_3d_points,
101 | obj_2d_points,
102 | self._camera_intrinsic_matrix,
103 | self._dist_coeffs,
104 | flags=pnp_algorithm
105 | )
106 | except cv2.error as e:
107 | # solvePnP will raise an error if there are insufficient points for the
108 | # chosen algorithm
109 | print("cv2.solvePnP failed with an error:", e)
110 | ret = False
111 |
112 | if ret:
113 | location = list(x[0] for x in tvec)
114 | quaternion = self.convert_rvec_to_quaternion(rvec)
115 |
116 | projected_points, _ = cv2.projectPoints(cuboid3d_points, rvec, tvec, self._camera_intrinsic_matrix, self._dist_coeffs)
117 | projected_points = np.squeeze(projected_points)
118 |
119 | # If the location.Z is negative or object is behind the camera then flip both location and rotation
120 | x, y, z = location
121 | if z < 0:
122 | # Get the opposite location
123 | location = [-x, -y, -z]
124 |
125 | # Change the rotation by 180 degree
126 | rotate_angle = np.pi
127 | rotate_quaternion = Quaternion.from_axis_rotation(location, rotate_angle)
128 | quaternion = rotate_quaternion.cross(quaternion)
129 |
130 | return location, quaternion, projected_points
131 |
132 | def convert_rvec_to_quaternion(self, rvec):
133 | '''Convert rvec (an axis-angle rotation vector) to a quaternion'''
134 | theta = np.sqrt(rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2]) # rotation angle in radians
135 | raxis = [rvec[0] / theta, rvec[1] / theta, rvec[2] / theta] if theta > 1e-8 else [0.0, 0.0, 1.0] # guard against zero rotation
136 |
137 | # pyrr's Quaternion (order is XYZW), https://pyrr.readthedocs.io/en/latest/oo_api_quaternion.html
138 | return Quaternion.from_axis_rotation(raxis, theta)
139 |
140 | # Alternatively: pyquaternion
141 | # return Quaternion(axis=raxis, radians=theta) # pyquaternion's Quaternion (order is WXYZ)
142 |
143 | def project_points(self, rvec, tvec):
144 | '''Project points from model onto image using rotation, translation'''
145 | output_points, tmp = cv2.projectPoints(
146 | np.array(self._cuboid3d.get_vertices(), dtype=float),
147 | rvec,
148 | tvec,
149 | self._camera_intrinsic_matrix,
150 | self._dist_coeffs)
151 |
152 | output_points = np.squeeze(output_points)
153 | return output_points
154 |
--------------------------------------------------------------------------------
/ros1/dope/src/dope/utils.py:
--------------------------------------------------------------------------------
1 | import math
2 |
3 | import torch
4 |
5 |
6 | def make_grid(tensor, nrow=8, padding=2,
7 | normalize=False, range_=None, scale_each=False, pad_value=0):
8 | """Make a grid of images.
9 | Args:
10 | tensor (Tensor or list): 4D mini-batch Tensor of shape (B x C x H x W)
11 | or a list of images all of the same size.
12 | nrow (int, optional): Number of images displayed in each row of the grid.
13 | The Final grid size is (B / nrow, nrow). Default is 8.
14 | padding (int, optional): amount of padding. Default is 2.
15 | normalize (bool, optional): If True, shift the image to the range (0, 1),
16 | by subtracting the minimum and dividing by the maximum pixel value.
17 | range_ (tuple, optional): tuple (min, max) where min and max are numbers,
18 | then these numbers are used to normalize the image. By default, min and max
19 | are computed from the tensor.
20 | scale_each (bool, optional): If True, scale each image in the batch of
21 | images separately rather than the (min, max) over all images.
22 | pad_value (float, optional): Value for the padded pixels.
23 | Example:
24 | grid = make_grid(images, nrow=4, normalize=True)
25 | """
26 | if not (torch.is_tensor(tensor) or
27 | (isinstance(tensor, list) and all(torch.is_tensor(t) for t in tensor))):
28 | raise TypeError('tensor or list of tensors expected, got {}'.format(type(tensor)))
29 |
30 | # if list of tensors, convert to a 4D mini-batch Tensor
31 | if isinstance(tensor, list):
32 | tensor = torch.stack(tensor, dim=0)
33 |
34 | if tensor.dim() == 2: # single image H x W
35 | tensor = tensor.view(1, tensor.size(0), tensor.size(1))
36 | if tensor.dim() == 3: # single image
37 | if tensor.size(0) == 1: # if single-channel, convert to 3-channel
38 | tensor = torch.cat((tensor, tensor, tensor), 0)
39 | tensor = tensor.view(1, tensor.size(0), tensor.size(1), tensor.size(2))
40 |
41 | if tensor.dim() == 4 and tensor.size(1) == 1: # single-channel images
42 | tensor = torch.cat((tensor, tensor, tensor), 1)
43 |
44 | if normalize is True:
45 | tensor = tensor.clone() # avoid modifying tensor in-place
46 | if range_ is not None:
47 | assert isinstance(range_, tuple), \
48 | "range has to be a tuple (min, max) if specified. min and max are numbers"
49 |
50 | def norm_ip(img, min, max):
51 | img.clamp_(min=min, max=max)
52 | img.add_(-min).div_(max - min + 1e-5)
53 |
54 | def norm_range(t, range_):
55 | if range_ is not None:
56 | norm_ip(t, range_[0], range_[1])
57 | else:
58 | norm_ip(t, float(t.min()), float(t.max()))
59 |
60 | if scale_each is True:
61 | for t in tensor: # loop over mini-batch dimension
62 | norm_range(t, range_)
63 | else:
64 | norm_range(tensor, range_)
65 |
66 | if tensor.size(0) == 1:
67 | return tensor.squeeze()
68 |
69 | # make the mini-batch of images into a grid
70 | nmaps = tensor.size(0)
71 | xmaps = min(nrow, nmaps)
72 | ymaps = int(math.ceil(float(nmaps) / xmaps))
73 | height, width = int(tensor.size(2) + padding), int(tensor.size(3) + padding)
74 | grid = tensor.new(3, height * ymaps + padding, width * xmaps + padding).fill_(pad_value)
75 | k = 0
76 | for y in range(ymaps):
77 | for x in range(xmaps):
78 | if k >= nmaps:
79 | break
80 | grid.narrow(1, y * height + padding, height - padding) \
81 | .narrow(2, x * width + padding, width - padding) \
82 | .copy_(tensor[k])
83 | k = k + 1
84 | return grid
85 |
86 |
87 | def get_image_grid(tensor, nrow=3, padding=2, mean=None, std=None):
88 | """
89 | Saves a given Tensor into an image file.
90 | If given a mini-batch tensor, will save the tensor as a grid of images.
91 | """
92 | from PIL import Image
93 |
94 | # tensor = tensor.cpu()
95 | grid = make_grid(tensor, nrow=nrow, padding=padding, pad_value=1)
96 | if not mean is None:
97 | # ndarr = grid.mul(std).add(mean).mul(255).byte().transpose(0,2).transpose(0,1).numpy()
98 | ndarr = grid.mul(std).add(mean).mul(255).byte().transpose(0, 2).transpose(0, 1).numpy()
99 | else:
100 | ndarr = grid.mul(0.5).add(0.5).mul(255).byte().transpose(0, 2).transpose(0, 1).numpy()
101 | im = Image.fromarray(ndarr)
102 | return im
103 |
--------------------------------------------------------------------------------
/ros1/dope/weights/README.md:
--------------------------------------------------------------------------------
1 | This is the default location for DOPE weights.
2 |
--------------------------------------------------------------------------------
/ros1/requirements.txt:
--------------------------------------------------------------------------------
1 | pyrr==0.10.3
2 | torch==1.6.0
3 | torchvision==0.7.0
4 | numpy==1.17.4
5 | scipy==1.5.2
6 | opencv_python==4.4.0.44
7 | Pillow==8.1.1
8 | configparser==5.0.0
9 |
--------------------------------------------------------------------------------
/ros2/README.md:
--------------------------------------------------------------------------------
1 | NVIDIA's [Isaac ROS2](https://nvidia-isaac-ros.github.io/index.html) project provides a collection of hardware-accelerated ROS 2 packages, including one for [DOPE](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_pose_estimation/index.html).
2 |
3 | The Isaac ROS DOPE package supports inference only, so [training a custom model](https://nvidia-isaac-ros.github.io/concepts/pose_estimation/dope/tutorial_custom_model.html) should be done with the code in this repository.
4 |
5 |
--------------------------------------------------------------------------------
/sample_data/000000.json:
--------------------------------------------------------------------------------
1 | {
2 | "camera_data": {},
3 | "objects": [
4 | {
5 | "class": "cracker",
6 | "visibility": 1.0,
7 | "location": [
8 | -5.40710973739624,
9 | 10.20408821105957,
10 | 55.31976318359375
11 | ],
12 | "quaternion_xyzw": [
13 | -0.42416033148765564,
14 | 0.6872081160545349,
15 | 0.20139221847057343,
16 | -0.5543231964111328
17 | ],
18 | "projected_cuboid": [
19 | [
20 | 201.36776733398438,
21 | 563.1981201171875
22 | ],
23 | [
24 | 153.2436981201172,
25 | 382.7738952636719
26 | ],
27 | [
28 | 57.269569396972656,
29 | 247.0780792236328
30 | ],
31 | [
32 | 102.94410705566406,
33 | 470.3310852050781
34 | ],
35 | [
36 | 288.4808654785156,
37 | 528.7132568359375
38 | ],
39 | [
40 | 228.2211456298828,
41 | 353.6261291503906
42 | ],
43 | [
44 | 157.5666961669922,
45 | 208.70346069335938
46 | ],
47 | [
48 | 226.2147216796875,
49 | 422.1109619140625
50 | ],
51 | [
52 | 180.91781616210938,
53 | 397.6921691894531
54 | ]
55 | ]
56 | }
57 | ]
58 | }
59 |
--------------------------------------------------------------------------------
/sample_data/000000.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/sample_data/000000.png
--------------------------------------------------------------------------------
/sample_data/000001.json:
--------------------------------------------------------------------------------
1 | {
2 | "camera_data": {},
3 | "objects": [
4 | {
5 | "class": "cracker",
6 | "visibility": 1.0,
7 | "location": [
8 | -5.40710973739624,
9 | 10.20408821105957,
10 | 55.31976318359375
11 | ],
12 | "quaternion_xyzw": [
13 | -0.42416033148765564,
14 | 0.6872081160545349,
15 | 0.20139221847057343,
16 | -0.5543231964111328
17 | ],
18 | "projected_cuboid": [
19 | [
20 | 201.36776733398438,
21 | 563.1981201171875
22 | ],
23 | [
24 | 153.2436981201172,
25 | 382.7738952636719
26 | ],
27 | [
28 | 57.269569396972656,
29 | 247.0780792236328
30 | ],
31 | [
32 | 102.94410705566406,
33 | 470.3310852050781
34 | ],
35 | [
36 | 288.4808654785156,
37 | 528.7132568359375
38 | ],
39 | [
40 | 228.2211456298828,
41 | 353.6261291503906
42 | ],
43 | [
44 | 157.5666961669922,
45 | 208.70346069335938
46 | ],
47 | [
48 | 226.2147216796875,
49 | 422.1109619140625
50 | ],
51 | [
52 | 180.91781616210938,
53 | 397.6921691894531
54 | ]
55 | ]
56 | }
57 | ]
58 | }
59 |
--------------------------------------------------------------------------------
/sample_data/000001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/sample_data/000001.png
--------------------------------------------------------------------------------
/sample_data/000030.json:
--------------------------------------------------------------------------------
1 | {
2 | "camera_data": {},
3 | "objects": [
4 | {
5 | "class": "cracker",
6 | "visibility": 1.0,
7 | "location": [
8 | 6.001536846160889,
9 | -4.993739604949951,
10 | 86.97349548339844
11 | ],
12 | "quaternion_xyzw": [
13 | 0.05684935301542282,
14 | 0.32983407378196716,
15 | -0.32128748297691345,
16 | 0.8858622908592224
17 | ],
18 | "projected_cuboid": [
19 | [
20 | 294.026123046875,
21 | 258.42864990234375
22 | ],
23 | [
24 | 407.443115234375,
25 | 274.3418884277344
26 | ],
27 | [
28 | 363.9100646972656,
29 | 112.13758850097656
30 | ],
31 | [
32 | 240.95428466796875,
33 | 112.85871887207031
34 | ],
35 | [
36 | 264.0873718261719,
37 | 293.51092529296875
38 | ],
39 | [
40 | 379.4239501953125,
41 | 314.3736877441406
42 | ],
43 | [
44 | 329.1734924316406,
45 | 149.2497100830078
46 | ],
47 | [
48 | 204.38331604003906,
49 | 145.52359008789062
50 | ],
51 | [
52 | 309.0063171386719,
53 | 211.89468383789062
54 | ]
55 | ]
56 | }
57 | ]
58 | }
59 |
--------------------------------------------------------------------------------
/sample_data/000030.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/sample_data/000030.png
--------------------------------------------------------------------------------
/sample_data/000031.json:
--------------------------------------------------------------------------------
1 | {
2 | "camera_data": {},
3 | "objects": [
4 | {
5 | "class": "cracker",
6 | "visibility": 1.0,
7 | "location": [
8 | 6.001536846160889,
9 | -4.993739604949951,
10 | 86.97349548339844
11 | ],
12 | "quaternion_xyzw": [
13 | 0.05684935301542282,
14 | 0.32983407378196716,
15 | -0.32128748297691345,
16 | 0.8858622908592224
17 | ],
18 | "projected_cuboid": [
19 | [
20 | 294.026123046875,
21 | 258.42864990234375
22 | ],
23 | [
24 | 407.443115234375,
25 | 274.3418884277344
26 | ],
27 | [
28 | 363.9100646972656,
29 | 112.13758850097656
30 | ],
31 | [
32 | 240.95428466796875,
33 | 112.85871887207031
34 | ],
35 | [
36 | 264.0873718261719,
37 | 293.51092529296875
38 | ],
39 | [
40 | 379.4239501953125,
41 | 314.3736877441406
42 | ],
43 | [
44 | 329.1734924316406,
45 | 149.2497100830078
46 | ],
47 | [
48 | 204.38331604003906,
49 | 145.52359008789062
50 | ],
51 | [
52 | 309.0063171386719,
53 | 211.89468383789062
54 | ]
55 | ]
56 | }
57 | ]
58 | }
59 |
--------------------------------------------------------------------------------
/sample_data/000031.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/sample_data/000031.png
--------------------------------------------------------------------------------
/train/.gitignore:
--------------------------------------------------------------------------------
1 | output/*
2 | **__pycache__**
3 | inference/output/*
4 | evaluate/output/*
5 | docker/drivers/*
--------------------------------------------------------------------------------
/train/README.md:
--------------------------------------------------------------------------------
1 | # Deep Object Pose Estimation (DOPE) - Training
2 |
3 | This repo contains a simplified version of the training pipeline for DOPE.
4 | Scripts for inference, evaluation, and data visualization can be found in this repo's top-level `inference` and `evaluate` directories.
5 |
6 | A user report of training DOPE on a single GPU using NVISII-created synthetic data can [be found here](https://github.com/NVlabs/Deep_Object_Pose/issues/155#issuecomment-791148200).
7 |
8 |
9 |
10 | ## Installing Dependencies
11 | ***Note***
12 |
13 | It is highly recommended to install these dependencies in a virtual environment. You can create and activate a virtual environment by running:
14 | ```
15 | python -m venv ./output/dope_training
16 | source ./output/dope_training/bin/activate
17 | ```
18 | ---
19 | To install the required dependencies, run:
20 | ```
21 | pip install -r ../requirements.txt
22 | ```
23 |
24 | ## Training
25 | To run the training script with locally stored data, you must specify at least the ``--data`` and ``--object`` flags:
26 | ```
27 | python -m torch.distributed.launch --nproc_per_node=1 train.py --data PATH_TO_DATA --object CLASS_OF_OBJECT
28 | ```
29 | The ``--data`` flag specifies the path to the training data; multiple paths can be passed in (see the example below).
30 |
31 | The ``--object`` flag specifies the name of the object to train the DOPE model on.
32 | Although multiple objects can be passed in, DOPE is designed to be trained for a specific object. For best results, only specify one object.
33 | The name of this object must match the `"class"` field in ground-truth `.json` files.
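For example, several dataset directories can be passed to `--data` at once (a sketch; the paths below are placeholders, and space-separated paths are assumed):
```
python -m torch.distributed.launch --nproc_per_node=1 train.py --data PATH_TO_DATA_1 PATH_TO_DATA_2 --object CLASS_OF_OBJECT
```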
34 |
35 | To get a full list of the command line arguments, run `python train.py --help`.
36 |
37 | ### Loading Data from `s3`
38 | There is also an option to train with data that is stored on an `s3` bucket. The script uses `boto3` to load data from `s3`.
39 | The easiest way to configure credentials with `boto3` is with a config file, which you can [setup using this guide](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#aws-config-file).
40 |
41 | When training with data from `s3`, be sure to specify the ``--use_s3`` flag and also the ``--train_buckets`` flag that indicates which buckets to use for training.
42 | Note that multiple buckets can be specified with the `--train_buckets` flag.
43 |
44 | In addition, the `--endpoint` flag must be specified in order to retrieve data from an `s3` bucket.
45 |
46 | Below is a sample command to run the training script while using data from `s3`.
47 | ```
48 | torchrun --nproc_per_node=1 train.py --use_s3 --train_buckets BUCKET_1 BUCKET_2 --endpoint ENDPOINT_URL --object CLASS_OF_OBJECT
49 | ```
50 |
51 | ### Multi-GPU Training
52 |
53 | To run on a multi-GPU machine, set `--nproc_per_node` to the number of GPUs. In addition, divide the number of epochs by the number of GPUs you have.
54 | For example, when running on an 8-GPU machine, setting ``--epochs 5`` is equivalent to running `40` epochs on a single GPU machine.
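For example, on an 8-GPU machine the command might look like this (a sketch; adjust the paths, object name, and epoch count for your setup):
```
torchrun --nproc_per_node=8 train.py --data PATH_TO_DATA --object CLASS_OF_OBJECT --epochs 5
```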
55 |
56 | ## Debugging
57 | There is an option to visualize the `projected_cuboid_points` in the ground truth file. To do so, run:
58 | ```
59 | python debug.py --data PATH_TO_IMAGES
60 | ```
61 |
62 | ## Common Issues
63 |
64 | 1. If you notice you are running out of memory when training, reduce the batch size by specifying a smaller ``--batchsize`` value. By default, this value is `32`.
65 | 2. If you are running into dependency issues when installing,
66 | you can try installing the version-specific dependencies that are commented out in `requirements.txt`. Be sure to do this in a virtual environment.
67 |
68 |
--------------------------------------------------------------------------------
/train/docker/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM nvcr.io/nvidia/pytorch:21.02-py3
2 |
3 | # make sure to run ./get_nvidia_libs.sh before running docker build
4 | COPY drivers/* /usr/lib/x86_64-linux-gnu/
5 |
6 | # Note, installing manually instead of using pip install -r requirements.txt because
7 | # the base PyTorch container already has some dependencies installed. Installing using
8 | # requirements.txt will cause circular dependency issues.
9 | RUN git clone https://github.com/andrewyguo/dope_training.git \
10 | && pip install --no-input tensorboardX \
11 | && pip install --no-input boto3 \
12 | && pip install --no-input albumentations \
13 | && pip install --no-input pyrr \
14 | && pip install --no-input simplejson \
15 | && pip install --no-input visii \
16 | && pip install --no-input opencv_python==4.5.4.60 \
17 | && apt-get update \
18 | && export DEBIAN_FRONTEND=noninteractive \
19 | && apt-get install s3cmd -y \
20 | && apt-get install -y libgl1
21 |
22 | WORKDIR /workspace/dope_training
23 |
24 | # Uncomment and fill in if using s3
25 | # RUN mkdir ~/.aws \
26 | # && echo "[default]" >> ~/.aws/config \
27 | # && echo "aws_access_key_id = " >> ~/.aws/config \
28 | # && echo "aws_secret_access_key = " >> ~/.aws/config \
29 | # # Setup config files for s3 authentication
30 | # && echo "[default]" >> ~/.s3cfg \
31 | # && echo "use_https = True" >> ~/.s3cfg \
32 | # && echo "access_key = " >> ~/.s3cfg \
33 | # && echo "secret_key = " >> ~/.s3cfg \
34 | # && echo "bucket_location = us-east-1" >> ~/.s3cfg \
35 | # && echo "host_base = " >> ~/.s3cfg \
36 | # && echo "host_bucket = bucket-name" >> ~/.s3cfg
37 |
--------------------------------------------------------------------------------
/train/docker/get_nvidia_libs.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copies the necessary NVIDIA driver libraries into ./drivers.
3 | # During docker build, files in ./drivers are copied into the container.
4 | # This is needed to run visii in evaluate.py
5 | mkdir -p drivers && cd drivers/
6 | cp /usr/lib/x86_64-linux-gnu/libnvoptix.so.1 .
7 | cp /usr/lib/x86_64-linux-gnu/*nvidia* .
7 |
--------------------------------------------------------------------------------
/train/misc/arial.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/train/misc/arial.ttf
--------------------------------------------------------------------------------
/train/misc/test_projection.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pyrender
3 |
4 | def transform_points(local_points, local_to_world_matrix):
5 | """
6 | Transforms 3D points from local to world coordinates using a given transformation matrix.
7 |
8 | Parameters:
9 | local_points (numpy array of shape (N, 3)): Points in local coordinates
10 | local_to_world_matrix (numpy array of shape (4, 4)): Transformation matrix from local to world coordinates
11 |
12 | Returns:
13 | world_points (numpy array of shape (N, 3)): Points in world coordinates
14 | """
15 |
16 | # Convert local points to homogeneous coordinates
17 | local_points_hom = np.concatenate([local_points, np.ones((local_points.shape[0], 1))], axis=-1)
18 | local_to_world_matrix_col_major = local_to_world_matrix.T
19 | print_matrix(local_to_world_matrix_col_major, "local_to_world_matrix_col_major")
20 | print_matrix(local_points_hom, "local_points_hom")
21 |
22 | world_points_hom = local_to_world_matrix_col_major @ local_points_hom.T
23 |
24 | # world_points_hom = np.dot(local_to_world_matrix.T, local_points_hom.T)
25 | # print_matrix(world_points_hom, "world_points_hom")
26 | print_matrix(local_to_world_matrix, "local_to_world_matrix")
27 |
28 | world_points = world_points_hom[:, :3]
29 | print_matrix(world_points, "world_points")
30 | return world_points
31 |
32 | def print_matrix(matrix, name):
33 | print(f"--\n{name}: {matrix.shape} ")
34 | for row in matrix:
35 | print(row)
36 |
37 | def get_image_space_points(points, view_proj_matrix):
38 | """
39 | Args:
40 | points: numpy array of N points (N, 3) in the world space. Points will be projected into the image space.
41 | view_proj_matrix: Desired view projection matrix, transforming points from world frame to image space of desired camera
42 | Returns:
43 | numpy array of shape (N, 3) of points projected into the image space.
44 | """
45 |
46 | homo = np.pad(points, ((0, 0), (0, 1)), constant_values=1.0)
47 | tf_points = np.dot(homo, view_proj_matrix)
48 | tf_points = tf_points / (tf_points[..., -1:])
49 | tf_points[..., :2] = 0.5 * (tf_points[..., :2] + 1)
50 | image_space_points = tf_points[..., :3]
51 |
52 | return image_space_points
53 |
54 |
55 | import json
56 | json_fp = "../sample_data/000000.json"
57 |
58 | with open(json_fp) as f:
59 | data = json.load(f)
60 |
61 | camera_data = data["camera_data"]
62 | camera_view_matrix = np.array(camera_data["camera_view_matrix"])
63 |
64 | cx = camera_data["intrinsics"]["cx"]
65 | cy = camera_data["intrinsics"]["cy"]
66 | fx = camera_data["intrinsics"]["fx"]
67 | fy = camera_data["intrinsics"]["fy"]
68 |
69 | camera_intrinsic_matrix = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])
70 |
71 | cam = pyrender.IntrinsicsCamera(fx, fy, cx, cy)
72 |
73 | view_projection_matrix = cam.get_projection_matrix(camera_data["width"], camera_data["height"])
74 |
75 | view_projection_matrix = view_projection_matrix.T
76 |
77 | points_3d = np.array(data["objects"][0]["local_cuboid"])
78 | local_to_world_matrix = np.array(data["objects"][0]["local_to_world_matrix"])
79 |
80 |
81 | world_points_3d = transform_points(points_3d, local_to_world_matrix)
82 |
83 | print_matrix(view_projection_matrix, "view_projection_matrix")
84 | image_space_points = get_image_space_points(world_points_3d, view_projection_matrix)
85 |
86 | resolution = np.array([[camera_data["width"], camera_data["height"], 1.0]])
87 | image_space_points *= resolution
88 |
89 | projected_cuboid_points = [
90 | [pixel_coordinate[0], pixel_coordinate[1]] for pixel_coordinate in image_space_points
91 | ]
92 | print("--\nimage_space_points: ")
93 | for row in image_space_points:
94 | print(row)
95 |
96 |
--------------------------------------------------------------------------------
/walkthrough.md:
--------------------------------------------------------------------------------
1 | # DOPE Pipeline Walkthrough
2 |
3 | Here we provide a detailed example of using the data generation, training, and inference tools provided in this repo. This example is
4 | given not only to demonstrate the various tools, but also to show what
5 | kind of results you can expect.
6 |
7 | We have uploaded our final PTH file as well as some sample data
8 | to [Google Drive](https://drive.google.com/drive/folders/1zq4yJUj8lTn56bWdOMnkCr1Wmj0dq-GL).
9 |
10 |
11 | ## Preparation
12 | We assume you have installed the project dependencies and are in an environment where you have access to GPUs.
13 |
14 |
15 | Follow [the instructions](data_generation/readme.md) for downloading environment maps and distractors.
16 |
17 | Download or create a textured model of your object of interest. For this walkthrough, we will use the [Ketchup](https://drive.google.com/drive/folders/1ICXdhNhahDiUrjh_r5aBMPYbb2aWMCJF?usp=drive_link) model from the [HOPE 3D Model Set](https://drive.google.com/drive/folders/1jiJS9KgcYAkfb8KJPp5MRlB0P11BStft/).
18 |
19 |
20 | For the sake of the example commands, we will assume the following folder
21 | structure:
22 | `~/data/dome_hdri_haven/` contains the HDR environment maps;
23 | `~/data/google_scanned_models/` contains the distractor objects;
24 | `~/data/models/` contains our "hero" models in subdirectories; e.g. `~/data/models/Ketchup`.
25 |
26 | ## Data Generation
27 | We will use the BlenderProc data generation utilities. In the `data_generation/blenderproc_data_gen` directory, run the following command:
28 |
29 | ```
30 | ./run_blenderproc_datagen.py --nb_runs 10 --nb_frames 50000 --path_single_obj ~/data/models/Ketchup/google_16k/textured.obj --nb_objects 1 --distractors_folder ~/data/google_scanned_models/ --nb_distractors 10 --backgrounds_folder ~/data/dome_hdri_haven/ --outf ~/data/KetchupData
31 | ```
32 |
33 | This will create ten subdirectories under the `~/data/KetchupData` directory, each containing 5000 images (`nb_frames` divided by `nb_runs`). For Blender efficiency reasons, the distractors are only changed from run to run. That is, we will have 10 different selections of distractors in our 50,000 images. If you want
34 | a greater selection of distractors, increase the `nb_runs` parameter.
35 |
36 | ## Training
37 |
38 | Assuming your machine has *N* GPUs, run the following command:
39 |
40 | ```
41 | python -m torch.distributed.launch --nproc_per_node=N ./train.py --data ~/data/KetchupData --object Ketchup --epochs 2000 --save_every 100
42 | ```
43 |
44 | This command will train DOPE for 2000 epochs, saving a checkpoint every 100 epochs.
45 |
46 | ## Inference
47 | When training is finished, you will have several saved checkpoints including the final one: `final_net_epoch_2000.pth`. We will use this checkpoint for inference.
48 |
49 |
50 | Generate a small number of new images in the same distribution as your training images. We will use these for inference testing and evaluation.
51 | ```
52 | ./run_blenderproc_datagen.py --nb_runs 2 --nb_frames 20 --path_single_obj ~/data/models/Ketchup/google_16k/textured.obj --nb_objects 1 --distractors_folder ~/data/google_scanned_models/ --nb_distractors 10 --backgrounds_folder ~/data/dome_hdri_haven/ --outf ~/data/KetchupTest
53 | ```
54 | For convenience, we have uploaded 20 test images and JSON files to the [Google Drive](https://drive.google.com/drive/folders/1zq4yJUj8lTn56bWdOMnkCr1Wmj0dq-GL)
55 | location mentioned above.
56 |
57 |
58 | Inside the `inference` directory, run the following command:
59 | ```
60 | python ./inference.py --camera ../config/blenderproc_camera_info_example.yaml --object Ketchup --parallel --weights final_net_epoch_2000.pth --data ~/data/KetchupTest/
61 | ```
62 |
63 | The inference output will be in the `output` directory. Using our provided `final_net_epoch_2000.pth` and our provided test images, DOPE finds the object of interest in 13 out of 20 images.
64 |
65 |
--------------------------------------------------------------------------------
/weights/readme.md:
--------------------------------------------------------------------------------
1 | We have trained and tested DOPE with two publicly available datasets: YCB and HOPE. The trained weights can be
2 | [downloaded from Google Drive](https://drive.google.com/drive/folders/1DfoA3m_Bm0fW8tOWXGVxi4ETlLEAgmcg).
3 |
4 |
5 |
--------------------------------------------------------------------------------