├── .dockerignore ├── .gitignore ├── CoRL ├── config_inference │ ├── camera_info.yaml │ └── config_pose.yaml ├── cuboid.py ├── cuboid_pnp_solver.py ├── detector.py ├── inference.py ├── models.py ├── readme.md ├── requirements.txt ├── train.py └── utils_dope.py ├── common ├── cuboid.py ├── cuboid_pnp_solver.py ├── debug.py ├── detector.py ├── models.py └── utils.py ├── config ├── blenderproc_camera_info_example.yaml ├── camera_info.yaml └── config_pose.yaml ├── data_generation ├── backgrounds │ ├── README.md │ └── messy_office.png ├── blenderproc_data_gen │ ├── README.md │ ├── generate_training_data.py │ └── run_blenderproc_datagen.py ├── dome_hdri_haven │ └── download.md ├── models │ └── Ketchup │ │ └── google_16k │ │ ├── texture_map.png │ │ ├── texture_map_flat.png │ │ ├── textured.mtl │ │ ├── textured.obj │ │ ├── textured.obj.bin │ │ ├── textured.obj.json │ │ ├── textured_simple.mtl │ │ └── textured_simple.obj ├── nvisii_data_gen │ ├── .gitignore │ ├── debug_json_ros_node.py │ ├── doc │ │ └── videos │ │ │ ├── cylinder_nosym.mp4 │ │ │ ├── cylinder_sym.mp4 │ │ │ └── hex_screw.mp4 │ ├── download_google_scanned_objects.py │ ├── generate_dataset.py │ ├── models_with_symmetries │ │ ├── cylinder │ │ │ └── google_16k │ │ │ │ ├── model_info.json │ │ │ │ ├── texture_map_flat.png │ │ │ │ ├── textured.mtl │ │ │ │ └── textured.obj │ │ └── hex_screw │ │ │ └── google_16k │ │ │ ├── model_info.json │ │ │ ├── texture_map_flat.png │ │ │ ├── textured.mtl │ │ │ └── textured.obj │ ├── output │ │ └── output_example │ │ │ ├── 00000.depth.exr │ │ │ ├── 00000.json │ │ │ ├── 00000.png │ │ │ └── 00000.seg.exr │ ├── readme.md │ ├── requirements.txt │ ├── single_video_pybullet.py │ └── utils.py ├── readme.md └── validate_data.py ├── doc └── camera_tutorial.md ├── dope_objects.png ├── evaluate ├── .gitignore ├── add_compute.py ├── download_content.sh ├── evaluate.py ├── kpd_compute.py ├── make_graphs.py ├── overlay.png ├── readme.md ├── render_json.py ├── results │ └── output.png └── utils_eval.py ├── inference ├── README.md └── inference.py ├── license.md ├── readme.md ├── requirements.txt ├── ros1 ├── README.md ├── docker │ ├── Dockerfile.noetic │ ├── init_workspace.sh │ ├── readme.md │ └── run_dope_docker.sh ├── dope │ ├── CMakeLists.txt │ ├── config │ │ ├── camera_info.yaml │ │ └── config_pose.yaml │ ├── launch │ │ ├── camera.launch │ │ └── dope.launch │ ├── nodes │ │ ├── camera │ │ └── dope │ ├── package.xml │ ├── setup.py │ ├── src │ │ └── dope │ │ │ ├── __init__.py │ │ │ ├── inference │ │ │ ├── __init__.py │ │ │ ├── cuboid.py │ │ │ ├── cuboid_pnp_solver.py │ │ │ └── detector.py │ │ │ └── utils.py │ └── weights │ │ └── README.md └── requirements.txt ├── ros2 └── README.md ├── sample_data ├── 000000.json ├── 000000.png ├── 000001.json ├── 000001.png ├── 000030.json ├── 000030.png ├── 000031.json └── 000031.png ├── train ├── .gitignore ├── README.md ├── docker │ ├── Dockerfile │ └── get_nvidia_libs.sh ├── misc │ ├── arial.ttf │ └── test_projection.py └── train.py ├── walkthrough.md └── weights └── readme.md /.dockerignore: -------------------------------------------------------------------------------- 1 | .git 2 | /weights 3 | *.pyc 4 | *._ 5 | *.png 6 | __pycache__ 7 | venv 8 | .idea 9 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.pth 2 | *.pyc 3 | *._ 4 | __pycache__ 5 | venv 6 | .idea 7 | .DS_Store 8 | ._.DS_Store 9 | *.hdr 10 | google_scanned_models/ 11 | 
scripts/nvisii_data_gen/output/dataset/ 12 | -------------------------------------------------------------------------------- /CoRL/config_inference/camera_info.yaml: -------------------------------------------------------------------------------- 1 | image_width: 640 2 | image_height: 480 3 | camera_name: dope_webcam_0 4 | camera_matrix: 5 | rows: 3 6 | cols: 3 7 | data: [641.5, 0, 320.0, 0, 641.5, 240.0, 0, 0, 1] 8 | distortion_model: plumb_bob 9 | distortion_coefficients: 10 | rows: 1 11 | cols: 5 12 | data: [0, 0, 0, 0, 0] 13 | rectification_matrix: 14 | rows: 3 15 | cols: 3 16 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1] 17 | projection_matrix: 18 | rows: 3 19 | cols: 4 20 | data: [641.5, 0, 320.0, 0, 0, 641.5, 240.0, 0, 0, 0, 1, 0] 21 | -------------------------------------------------------------------------------- /CoRL/config_inference/config_pose.yaml: -------------------------------------------------------------------------------- 1 | topic_camera: "/dope/webcam/image_raw" 2 | topic_camera_info: "/dope/webcam/camera_info" 3 | topic_publishing: "dope" 4 | input_is_rectified: True # Whether the input image is rectified (strongly suggested!) 5 | downscale_height: 400 # if the input image is larger than this, scale it down to this pixel height 6 | 7 | # Comment any of these lines to prevent detection / pose estimation of that object 8 | weights: { 9 | # "cracker":"package://dope/weights/cracker_60.pth", 10 | # "gelatin":"package://dope/weights/gelatin_60.pth", 11 | # "meat":"package://dope/weights/meat_20.pth", 12 | # "mustard":"package://dope/weights/mustard_60.pth", 13 | # "soup":"package://dope/weights/soup_60.pth", 14 | # 'peg_hole': "package://dope/weights/peg_box_40.pth", 15 | # 'cube_red': "package://dope/weights/red_40.pth", 16 | #"sugar":"package://dope/weights/sugar_60.pth" 17 | # "bleach":"package://dope/weights/bleach_28_dr.pth" 18 | # 'pudding':"weights_dope/pudding.pth" 19 | # 'pudding':"weights_dope/dope_network/net_epoch_60.pth" 20 | 'alphabet_soup':"weights_dope/resnet_simple/alphabe_soup.pth" 21 | } 22 | 23 | # Type of neural network architecture 24 | architectures: { 25 | 'pudding':"dope", 26 | 'alphabet_soup':'resnet_simple', 27 | } 28 | 29 | 30 | # Cuboid dimension in cm x,y,z 31 | dimensions: { 32 | "cracker": [16.403600692749023,21.343700408935547,7.179999828338623], 33 | "gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059], 34 | "meat": [10.164673805236816,8.3542995452880859,5.7600898742675781], 35 | "mustard": [9.6024150848388672,19.130100250244141,5.824894905090332], 36 | "soup": [6.7659378051757813,10.185500144958496,6.771425724029541], 37 | "sugar": [9.267730712890625,17.625339508056641,4.5134143829345703], 38 | "bleach": [10.267730712890625,26.625339508056641,7.5134143829345703], 39 | "peg_hole": [12.6,3.9,12.6], 40 | 'cube_red':[5,5,5], 41 | 'pudding':[49.47199821472168, 29.923000335693359, 83.498001098632812], 42 | 'alphabet_soup':[8.3555002212524414, 7.1121001243591309, 6.6055998802185059] 43 | 44 | } 45 | 46 | class_ids: { 47 | "cracker": 1, 48 | "gelatin": 2, 49 | "meat": 3, 50 | "mustard": 4, 51 | "soup": 5, 52 | "sugar": 6, 53 | "bleach": 7, 54 | "peg_hole": 8, 55 | "cube_red": 9, 56 | 'pudding': 10, 57 | 'alphabet_soup': 12, 58 | } 59 | 60 | draw_colors: { 61 | "cracker": [13, 255, 128], # green 62 | "gelatin": [255, 255, 255], # while 63 | "meat": [0, 104, 255], # blue 64 | "mustard": [217,12, 232], # magenta 65 | "soup": [255, 101, 0], # orange 66 | "sugar": [232, 222, 12], # yellow 67 | "bleach": [232, 222, 12], # yellow 68 | 
"peg_hole": [232, 222, 12], # yellow 69 | "cube_red": [255,0,0], 70 | "pudding": [255,0,0], 71 | } 72 | 73 | # optional: provide a transform that is applied to the pose returned by DOPE 74 | model_transforms: { 75 | # "cracker": [[ 0, 0, 1, 0], 76 | # [ 0, -1, 0, 0], 77 | # [ 1, 0, 0, 0], 78 | # [ 0, 0, 0, 1]] 79 | } 80 | 81 | # optional: if you provide a mesh of the object here, a mesh marker will be 82 | # published for visualization in RViz 83 | # You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb 84 | meshes: { 85 | # "cracker": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/003_cracker_box/google_16k/textured.obj", 86 | # "gelatin": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/009_gelatin_box/google_16k/textured.obj", 87 | # "meat": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/010_potted_meat_can/google_16k/textured.obj", 88 | # "mustard": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/006_mustard_bottle/google_16k/textured.obj", 89 | # "soup": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/005_tomato_soup_can/google_16k/textured.obj", 90 | # "sugar": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/004_sugar_box/google_16k/textured.obj", 91 | # "bleach": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/021_bleach_cleanser/google_16k/textured.obj", 92 | } 93 | 94 | # optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0. 95 | mesh_scales: { 96 | "cracker": 0.01, 97 | "gelatin": 0.01, 98 | "meat": 0.01, 99 | "mustard": 0.01, 100 | "soup": 0.01, 101 | "sugar": 0.01, 102 | "bleach": 0.01, 103 | } 104 | 105 | # Config params for DOPE 106 | thresh_angle: 0.5 107 | thresh_map: 0.0001 108 | sigma: 3 109 | thresh_points: 0.1 110 | -------------------------------------------------------------------------------- /CoRL/cuboid.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode 4 | 5 | from enum import IntEnum, unique 6 | import numpy as np 7 | import cv2 8 | from pyrr import Quaternion, Matrix44, Vector3, euler 9 | 10 | # Related to the object's local coordinate system 11 | # @unique 12 | class CuboidVertexType(IntEnum): 13 | FrontTopRight = 0 14 | FrontTopLeft = 1 15 | FrontBottomLeft = 2 16 | FrontBottomRight = 3 17 | RearTopRight = 4 18 | RearTopLeft = 5 19 | RearBottomLeft = 6 20 | RearBottomRight = 7 21 | Center = 8 22 | TotalCornerVertexCount = 8 # Corner vertexes doesn't include the center point 23 | TotalVertexCount = 9 24 | 25 | # List of the vertex indexes in each line edges of the cuboid 26 | CuboidLineIndexes = [ 27 | # Front face 28 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.FrontTopRight ], 29 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.FrontBottomRight ], 30 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontBottomLeft ], 31 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.FrontTopLeft ], 32 | # Back face 33 | [ CuboidVertexType.RearTopLeft, CuboidVertexType.RearTopRight ], 34 | [ CuboidVertexType.RearTopRight, CuboidVertexType.RearBottomRight ], 35 | [ CuboidVertexType.RearBottomRight, CuboidVertexType.RearBottomLeft ], 36 | [ CuboidVertexType.RearBottomLeft, CuboidVertexType.RearTopLeft ], 37 | # Left face 38 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.RearBottomLeft ], 39 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.RearTopLeft ], 40 | # Right face 41 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.RearBottomRight ], 42 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopRight ], 43 | ] 44 | 45 | 46 | # ========================= Cuboid3d ========================= 47 | class Cuboid3d(): 48 | '''This class contains a 3D cuboid.''' 49 | 50 | # Create a box with a certain size 51 | def __init__(self, size3d = [1.0, 1.0, 1.0], center_location = [0, 0, 0], 52 | coord_system = None, parent_object = None): 53 | 54 | # NOTE: This local coordinate system is similar 55 | # to the intrinsic transform matrix of a 3d object 56 | self.center_location = center_location 57 | self.coord_system = coord_system 58 | self.size3d = size3d 59 | self._vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount 60 | 61 | self.generate_vertexes() 62 | 63 | def get_vertex(self, vertex_type): 64 | """Returns the location of a vertex. 
65 | 66 | Args: 67 | vertex_type: enum of type CuboidVertexType 68 | 69 | Returns: 70 | Numpy array(3) - Location of the vertex type in the cuboid 71 | """ 72 | return self._vertices[vertex_type] 73 | 74 | def get_vertices(self): 75 | return self._vertices 76 | 77 | def generate_vertexes(self): 78 | width, height, depth = self.size3d 79 | 80 | # By default just use the normal OpenCV coordinate system 81 | if (self.coord_system is None): 82 | cx, cy, cz = self.center_location 83 | # X axis point to the right 84 | right = cx + width / 2.0 85 | left = cx - width / 2.0 86 | # Y axis point downward 87 | top = cy - height / 2.0 88 | bottom = cy + height / 2.0 89 | # Z axis point forward 90 | front = cz + depth / 2.0 91 | rear = cz - depth / 2.0 92 | 93 | # List of 8 vertices of the box 94 | self._vertices = [ 95 | [right, top, front], # Front Top Right 96 | [left, top, front], # Front Top Left 97 | [left, bottom, front], # Front Bottom Left 98 | [right, bottom, front], # Front Bottom Right 99 | [right, top, rear], # Rear Top Right 100 | [left, top, rear], # Rear Top Left 101 | [left, bottom, rear], # Rear Bottom Left 102 | [right, bottom, rear], # Rear Bottom Right 103 | self.center_location, # Center 104 | ] 105 | else: 106 | sx, sy, sz = self.size3d 107 | forward = np.array(self.coord_system.forward, dtype=float) * sy * 0.5 108 | up = np.array(self.coord_system.up, dtype=float) * sz * 0.5 109 | right = np.array(self.coord_system.right, dtype=float) * sx * 0.5 110 | center = np.array(self.center_location, dtype=float) 111 | self._vertices = [ 112 | center + forward + up + right, # Front Top Right 113 | center + forward + up - right, # Front Top Left 114 | center + forward - up - right, # Front Bottom Left 115 | center + forward - up + right, # Front Bottom Right 116 | center - forward + up + right, # Rear Top Right 117 | center - forward + up - right, # Rear Top Left 118 | center - forward - up - right, # Rear Bottom Left 119 | center - forward - up + right, # Rear Bottom Right 120 | self.center_location, # Center 121 | ] 122 | 123 | def get_projected_cuboid2d(self, cuboid_transform, camera_intrinsic_matrix): 124 | """ 125 | Projects the cuboid into the image plane using camera intrinsics. 126 | 127 | Args: 128 | cuboid_transform: the world transform of the cuboid 129 | camera_intrinsic_matrix: camera intrinsic matrix 130 | 131 | Returns: 132 | Cuboid2d - the projected cuboid points 133 | """ 134 | 135 | world_transform_matrix = cuboid_transform 136 | rvec = [0, 0, 0] 137 | tvec = [0, 0, 0] 138 | dist_coeffs = np.zeros((4, 1)) 139 | 140 | transformed_vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount 141 | for vertex_index in range(CuboidVertexType.TotalVertexCount): 142 | vertex3d = self._vertices[vertex_index] 143 | transformed_vertices[vertex_index] = world_transform_matrix * vertex3d 144 | 145 | projected_vertices = cv2.projectPoints(transformed_vertices, rvec, tvec, 146 | camera_intrinsic_matrix, dist_coeffs) 147 | 148 | return Cuboid2d(projected_vertices) 149 | -------------------------------------------------------------------------------- /CoRL/cuboid_pnp_solver.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode 4 | 5 | import cv2 6 | import numpy as np 7 | from cuboid import CuboidVertexType 8 | from pyrr import Quaternion 9 | 10 | 11 | class CuboidPNPSolver(object): 12 | """ 13 | This class is used to find the 6-DoF pose of a cuboid given its projected vertices. 14 | 15 | Runs perspective-n-point (PNP) algorithm. 16 | """ 17 | 18 | # Class variables 19 | cv2version = cv2.__version__.split(".") 20 | cv2majorversion = int(cv2version[0]) 21 | 22 | def __init__( 23 | self, 24 | object_name="", 25 | camera_intrinsic_matrix=None, 26 | cuboid3d=None, 27 | dist_coeffs=np.zeros((4, 1)), 28 | ): 29 | self.object_name = object_name 30 | if not camera_intrinsic_matrix is None: 31 | self._camera_intrinsic_matrix = camera_intrinsic_matrix 32 | else: 33 | self._camera_intrinsic_matrix = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0]]) 34 | self._cuboid3d = cuboid3d 35 | 36 | self._dist_coeffs = dist_coeffs 37 | 38 | def set_camera_intrinsic_matrix(self, new_intrinsic_matrix): 39 | """Sets the camera intrinsic matrix""" 40 | self._camera_intrinsic_matrix = new_intrinsic_matrix 41 | 42 | def set_dist_coeffs(self, dist_coeffs): 43 | """Sets the camera intrinsic matrix""" 44 | self._dist_coeffs = dist_coeffs 45 | 46 | def solve_pnp(self, cuboid2d_points, pnp_algorithm=None): 47 | """ 48 | Detects the rotation and traslation 49 | of a cuboid object from its vertexes' 50 | 2D location in the image 51 | """ 52 | 53 | # Fallback to default PNP algorithm base on OpenCV version 54 | if pnp_algorithm is None: 55 | if CuboidPNPSolver.cv2majorversion == 2: 56 | pnp_algorithm = cv2.CV_ITERATIVE 57 | elif CuboidPNPSolver.cv2majorversion == 3: 58 | pnp_algorithm = cv2.SOLVEPNP_ITERATIVE 59 | 60 | if pnp_algorithm is None: 61 | pnp_algorithm = cv2.SOLVEPNP_EPNP 62 | 63 | location = None 64 | quaternion = None 65 | projected_points = cuboid2d_points 66 | 67 | cuboid3d_points = np.array(self._cuboid3d.get_vertices()) 68 | obj_2d_points = [] 69 | obj_3d_points = [] 70 | 71 | for i in range(CuboidVertexType.TotalVertexCount): 72 | check_point_2d = cuboid2d_points[i] 73 | # Ignore invalid points 74 | if check_point_2d is None: 75 | continue 76 | obj_2d_points.append(check_point_2d) 77 | obj_3d_points.append(cuboid3d_points[i]) 78 | 79 | obj_2d_points = np.array(obj_2d_points, dtype=float) 80 | obj_3d_points = np.array(obj_3d_points, dtype=float) 81 | 82 | valid_point_count = len(obj_2d_points) 83 | 84 | # Can only do PNP if we have more than 3 valid points 85 | is_points_valid = valid_point_count >= 4 86 | 87 | if is_points_valid: 88 | 89 | ret, rvec, tvec = cv2.solvePnP( 90 | obj_3d_points, 91 | obj_2d_points, 92 | self._camera_intrinsic_matrix, 93 | self._dist_coeffs, 94 | flags=pnp_algorithm, 95 | ) 96 | 97 | if ret: 98 | location = list(x[0] for x in tvec) 99 | quaternion = self.convert_rvec_to_quaternion(rvec) 100 | 101 | projected_points, _ = cv2.projectPoints( 102 | cuboid3d_points, 103 | rvec, 104 | tvec, 105 | self._camera_intrinsic_matrix, 106 | self._dist_coeffs, 107 | ) 108 | projected_points = np.squeeze(projected_points) 109 | 110 | # If the location.Z is negative or object is behind the camera then flip both location and rotation 111 | x, y, z = location 112 | if z < 0: 113 | # Get the opposite location 114 | location = [-x, -y, -z] 115 | 116 | # Change the rotation by 180 degree 117 | rotate_angle = np.pi 118 | rotate_quaternion = Quaternion.from_axis_rotation( 119 | location, rotate_angle 120 | ) 121 | quaternion = rotate_quaternion.cross(quaternion) 122 
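        # Note: if PnP could not be attempted (fewer than 4 valid keypoints) or
        # cv2.solvePnP did not succeed, location and quaternion are still None here
        # and projected_points falls back to the input 2D points.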
| 123 | return location, quaternion, projected_points 124 | 125 | def convert_rvec_to_quaternion(self, rvec): 126 | """Convert rvec (an axis-angle rotation vector, i.e. angle * unit axis) to a quaternion""" 127 | theta = np.sqrt( 128 | rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2] 129 | ) # in radians 130 | raxis = [rvec[0] / theta, rvec[1] / theta, rvec[2] / theta] 131 | 132 | # pyrr's Quaternion (order is XYZW), https://pyrr.readthedocs.io/en/latest/oo_api_quaternion.html 133 | return Quaternion.from_axis_rotation(raxis, theta) 134 | 135 | def project_points(self, rvec, tvec): 136 | """Project points from model onto image using rotation, translation""" 137 | output_points, tmp = cv2.projectPoints( 138 | self.__object_vertex_coordinates, 139 | rvec, 140 | tvec, 141 | self.__camera_intrinsic_matrix, 142 | self.__dist_coeffs, 143 | ) 144 | 145 | output_points = np.squeeze(output_points) 146 | return output_points 147 | -------------------------------------------------------------------------------- /CoRL/readme.md: -------------------------------------------------------------------------------- 1 | # Training 2 | 3 | This is the training code used for the [CoRL 2018 paper](https://arxiv.org/abs/1809.10790). You can also use this training script on data generated by the scripts in [data_generation/nvisii_data_gen/](../data_generation/nvisii_data_gen/). 4 | 5 | ``` 6 | python -m torch.distributed.launch --nproc_per_node=1 train.py --network dope --epochs 2 --batchsize 10 --outf tmp/ --data ../nvisii_data_gen/output/output_example/ 7 | ``` 8 | 9 | There is also an accompanying dataset for training DOPE on the meat can with the shiny top ([link here](https://drive.google.com/file/d/1Q5VLnlt1gu2pKIAcUo9uzSyWw1nGlSF8/view?usp=sharing)). 10 | 11 | # Inference 12 | 13 | I also made an inference script that runs without any ROS components. 14 | 15 | ``` 16 | python inference.py 17 | ``` 18 | 19 | Look at the file for more information. As with the ROS node, everything is configured through the YAML files in `config_inference`. The code is very similar to the original, with some changes. 20 | 21 | Check `models.py`, as we propose several different architectures. -------------------------------------------------------------------------------- /CoRL/requirements.txt: -------------------------------------------------------------------------------- 1 | opencv-python-headless<4.3 2 | albumentations 3 | matplotlib 4 | simplejson 5 | numpy 6 | opencv_python 7 | photutils 8 | scipy 9 | torch 10 | pyquaternion 11 | tqdm 12 | pyrr 13 | Pillow==5.2.0 14 | torchvision 15 | PyYAML 16 | tensorboardX 17 | -------------------------------------------------------------------------------- /common/cuboid.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode 4 | 5 | from enum import IntEnum, unique 6 | import numpy as np 7 | import cv2 8 | from pyrr import Quaternion, Matrix44, Vector3, euler 9 | 10 | # Related to the object's local coordinate system 11 | # @unique 12 | class CuboidVertexType(IntEnum): 13 | FrontTopRight = 0 14 | FrontTopLeft = 1 15 | FrontBottomLeft = 2 16 | FrontBottomRight = 3 17 | RearTopRight = 4 18 | RearTopLeft = 5 19 | RearBottomLeft = 6 20 | RearBottomRight = 7 21 | Center = 8 22 | TotalCornerVertexCount = 8 # Corner vertexes doesn't include the center point 23 | TotalVertexCount = 9 24 | 25 | # List of the vertex indexes in each line edges of the cuboid 26 | CuboidLineIndexes = [ 27 | # Front face 28 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.FrontTopRight ], 29 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.FrontBottomRight ], 30 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontBottomLeft ], 31 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.FrontTopLeft ], 32 | # Back face 33 | [ CuboidVertexType.RearTopLeft, CuboidVertexType.RearTopRight ], 34 | [ CuboidVertexType.RearTopRight, CuboidVertexType.RearBottomRight ], 35 | [ CuboidVertexType.RearBottomRight, CuboidVertexType.RearBottomLeft ], 36 | [ CuboidVertexType.RearBottomLeft, CuboidVertexType.RearTopLeft ], 37 | # Left face 38 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.RearBottomLeft ], 39 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.RearTopLeft ], 40 | # Right face 41 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.RearBottomRight ], 42 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopRight ], 43 | ] 44 | 45 | 46 | # ========================= Cuboid3d ========================= 47 | class Cuboid3d(): 48 | '''This class contains a 3D cuboid.''' 49 | 50 | # Create a box with a certain size 51 | def __init__(self, size3d = [1.0, 1.0, 1.0], center_location = [0, 0, 0], 52 | coord_system = None, parent_object = None): 53 | 54 | # NOTE: This local coordinate system is similar 55 | # to the intrinsic transform matrix of a 3d object 56 | self.center_location = center_location 57 | self.coord_system = coord_system 58 | self.size3d = size3d 59 | self._vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount 60 | 61 | self.generate_vertexes() 62 | 63 | def get_vertex(self, vertex_type): 64 | """Returns the location of a vertex. 
65 | 66 | Args: 67 | vertex_type: enum of type CuboidVertexType 68 | 69 | Returns: 70 | Numpy array(3) - Location of the vertex type in the cuboid 71 | """ 72 | return self._vertices[vertex_type] 73 | 74 | def get_vertices(self): 75 | return self._vertices 76 | 77 | def generate_vertexes(self): 78 | width, height, depth = self.size3d 79 | 80 | # By default just use the normal OpenCV coordinate system 81 | if (self.coord_system is None): 82 | cx, cy, cz = self.center_location 83 | # X axis point to the right 84 | right = cx + width / 2.0 85 | left = cx - width / 2.0 86 | # Y axis point downward 87 | top = cy - height / 2.0 88 | bottom = cy + height / 2.0 89 | # Z axis point forward 90 | front = cz + depth / 2.0 91 | rear = cz - depth / 2.0 92 | 93 | # List of 8 vertices of the box 94 | self._vertices = [ 95 | [right, top, front], # Front Top Right 96 | [left, top, front], # Front Top Left 97 | [left, bottom, front], # Front Bottom Left 98 | [right, bottom, front], # Front Bottom Right 99 | [right, top, rear], # Rear Top Right 100 | [left, top, rear], # Rear Top Left 101 | [left, bottom, rear], # Rear Bottom Left 102 | [right, bottom, rear], # Rear Bottom Right 103 | self.center_location, # Center 104 | ] 105 | else: 106 | sx, sy, sz = self.size3d 107 | forward = np.array(self.coord_system.forward, dtype=float) * sy * 0.5 108 | up = np.array(self.coord_system.up, dtype=float) * sz * 0.5 109 | right = np.array(self.coord_system.right, dtype=float) * sx * 0.5 110 | center = np.array(self.center_location, dtype=float) 111 | self._vertices = [ 112 | center + forward + up + right, # Front Top Right 113 | center + forward + up - right, # Front Top Left 114 | center + forward - up - right, # Front Bottom Left 115 | center + forward - up + right, # Front Bottom Right 116 | center - forward + up + right, # Rear Top Right 117 | center - forward + up - right, # Rear Top Left 118 | center - forward - up - right, # Rear Bottom Left 119 | center - forward - up + right, # Rear Bottom Right 120 | self.center_location, # Center 121 | ] 122 | 123 | def get_projected_cuboid2d(self, cuboid_transform, camera_intrinsic_matrix): 124 | """ 125 | Projects the cuboid into the image plane using camera intrinsics. 126 | 127 | Args: 128 | cuboid_transform: the world transform of the cuboid 129 | camera_intrinsic_matrix: camera intrinsic matrix 130 | 131 | Returns: 132 | Cuboid2d - the projected cuboid points 133 | """ 134 | 135 | world_transform_matrix = cuboid_transform 136 | rvec = [0, 0, 0] 137 | tvec = [0, 0, 0] 138 | dist_coeffs = np.zeros((4, 1)) 139 | 140 | transformed_vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount 141 | for vertex_index in range(CuboidVertexType.TotalVertexCount): 142 | vertex3d = self._vertices[vertex_index] 143 | transformed_vertices[vertex_index] = world_transform_matrix * vertex3d 144 | 145 | projected_vertices = cv2.projectPoints(transformed_vertices, rvec, tvec, 146 | camera_intrinsic_matrix, dist_coeffs) 147 | 148 | return Cuboid2d(projected_vertices) 149 | -------------------------------------------------------------------------------- /common/cuboid_pnp_solver.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 
3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode 4 | 5 | import cv2 6 | import numpy as np 7 | from cuboid import CuboidVertexType 8 | from pyrr import Quaternion 9 | 10 | 11 | class CuboidPNPSolver(object): 12 | """ 13 | This class is used to find the 6-DoF pose of a cuboid given its projected vertices. 14 | 15 | Runs perspective-n-point (PNP) algorithm. 16 | """ 17 | 18 | # Class variables 19 | cv2version = cv2.__version__.split(".") 20 | cv2majorversion = int(cv2version[0]) 21 | 22 | def __init__( 23 | self, 24 | object_name="", 25 | camera_intrinsic_matrix=None, 26 | cuboid3d=None, 27 | dist_coeffs=np.zeros((4, 1)), 28 | ): 29 | self.object_name = object_name 30 | if not camera_intrinsic_matrix is None: 31 | self._camera_intrinsic_matrix = camera_intrinsic_matrix 32 | else: 33 | self._camera_intrinsic_matrix = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0]]) 34 | self._cuboid3d = cuboid3d 35 | 36 | self._dist_coeffs = dist_coeffs 37 | 38 | def set_camera_intrinsic_matrix(self, new_intrinsic_matrix): 39 | """Sets the camera intrinsic matrix""" 40 | self._camera_intrinsic_matrix = new_intrinsic_matrix 41 | 42 | def set_dist_coeffs(self, dist_coeffs): 43 | """Sets the camera intrinsic matrix""" 44 | self._dist_coeffs = dist_coeffs 45 | 46 | def solve_pnp(self, cuboid2d_points, pnp_algorithm=None): 47 | """ 48 | Detects the rotation and traslation 49 | of a cuboid object from its vertexes' 50 | 2D location in the image 51 | """ 52 | 53 | # Fallback to default PNP algorithm base on OpenCV version 54 | if pnp_algorithm is None: 55 | if CuboidPNPSolver.cv2majorversion == 2: 56 | pnp_algorithm = cv2.CV_ITERATIVE 57 | elif CuboidPNPSolver.cv2majorversion == 3: 58 | pnp_algorithm = cv2.SOLVEPNP_ITERATIVE 59 | 60 | if pnp_algorithm is None: 61 | pnp_algorithm = cv2.SOLVEPNP_EPNP 62 | 63 | location = None 64 | quaternion = None 65 | projected_points = cuboid2d_points 66 | 67 | cuboid3d_points = np.array(self._cuboid3d.get_vertices()) 68 | obj_2d_points = [] 69 | obj_3d_points = [] 70 | 71 | for i in range(CuboidVertexType.TotalVertexCount): 72 | check_point_2d = cuboid2d_points[i] 73 | # Ignore invalid points 74 | if check_point_2d is None: 75 | continue 76 | obj_2d_points.append(check_point_2d) 77 | obj_3d_points.append(cuboid3d_points[i]) 78 | 79 | obj_2d_points = np.array(obj_2d_points, dtype=float) 80 | obj_3d_points = np.array(obj_3d_points, dtype=float) 81 | 82 | valid_point_count = len(obj_2d_points) 83 | 84 | # Can only do PNP if we have more than 3 valid points 85 | is_points_valid = valid_point_count >= 4 86 | 87 | if is_points_valid: 88 | 89 | ret, rvec, tvec = cv2.solvePnP( 90 | obj_3d_points, 91 | obj_2d_points, 92 | self._camera_intrinsic_matrix, 93 | self._dist_coeffs, 94 | flags=pnp_algorithm, 95 | ) 96 | 97 | if ret: 98 | location = list(x[0] for x in tvec) 99 | quaternion = self.convert_rvec_to_quaternion(rvec) 100 | 101 | projected_points, _ = cv2.projectPoints( 102 | cuboid3d_points, 103 | rvec, 104 | tvec, 105 | self._camera_intrinsic_matrix, 106 | self._dist_coeffs, 107 | ) 108 | projected_points = np.squeeze(projected_points) 109 | 110 | # If the location.Z is negative or object is behind the camera then flip both location and rotation 111 | x, y, z = location 112 | if z < 0: 113 | # Get the opposite location 114 | location = [-x, -y, -z] 115 | 116 | # Change the rotation by 180 degree 117 | rotate_angle = np.pi 118 | rotate_quaternion = Quaternion.from_axis_rotation( 119 | location, rotate_angle 120 | ) 121 | quaternion = rotate_quaternion.cross(quaternion) 122 
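        # On success, location holds the solvePnP translation of the cuboid's local
        # origin in the OpenCV camera frame (x right, y down, z forward) and
        # quaternion the corresponding rotation; on failure both remain None.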
| 123 | return location, quaternion, projected_points 124 | 125 | def convert_rvec_to_quaternion(self, rvec): 126 | """Convert rvec (which is log quaternion) to quaternion""" 127 | theta = np.sqrt( 128 | rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2] 129 | ) # in radians 130 | raxis = [rvec[0] / theta, rvec[1] / theta, rvec[2] / theta] 131 | 132 | # pyrr's Quaternion (order is XYZW), https://pyrr.readthedocs.io/en/latest/oo_api_quaternion.html 133 | return Quaternion.from_axis_rotation(raxis, theta) 134 | 135 | def project_points(self, rvec, tvec): 136 | """Project points from model onto image using rotation, translation""" 137 | output_points, tmp = cv2.projectPoints( 138 | self.__object_vertex_coordinates, 139 | rvec, 140 | tvec, 141 | self.__camera_intrinsic_matrix, 142 | self.__dist_coeffs, 143 | ) 144 | 145 | output_points = np.squeeze(output_points) 146 | return output_points 147 | -------------------------------------------------------------------------------- /common/debug.py: -------------------------------------------------------------------------------- 1 | # Debugging Tool to Visualize Synthetic Data Projected Points Accuracy 2 | 3 | from PIL import Image 4 | import json 5 | 6 | from utils import Draw, loadimages 7 | 8 | import argparse 9 | import os 10 | 11 | 12 | def visualize_projected_points(path_img, path_json, path_output, img_name, root): 13 | img = Image.open(path_img).convert("RGB") 14 | 15 | with open(path_json) as f: 16 | data_json = json.load(f) 17 | 18 | draw = Draw(img) 19 | 20 | for obj in data_json["objects"]: 21 | projected_cuboid_keypoints = [tuple(pair) for pair in obj["projected_cuboid"]] 22 | draw.draw_cube(projected_cuboid_keypoints) 23 | 24 | path_output = os.path.join( 25 | path_output, img_path.replace(root, "").replace(img_name, "").lstrip("/") 26 | ) 27 | os.makedirs(path_output, exist_ok=True) 28 | img.save(os.path.join(path_output, img_name)) 29 | 30 | 31 | if __name__ == "__main__": 32 | parser = argparse.ArgumentParser() 33 | 34 | parser.add_argument( 35 | "--outf", 36 | default="output/debug", 37 | help="Where to store the debug output images.", 38 | ) 39 | parser.add_argument( 40 | "--data", 41 | required=True, 42 | help="Folder containing groundtruth and images.", 43 | ) 44 | 45 | opt = parser.parse_args() 46 | 47 | imgs = sorted(loadimages(opt.data, extensions=["jpg", "png"])) 48 | 49 | for i, (img_path, img_name, json_path) in enumerate(imgs): 50 | img_rel_path = img_path.replace(opt.data, "") 51 | print(f"Debugging image {img_rel_path} ({i + 1} of {len(imgs)}) | Outputting to: {opt.outf + '/' + img_rel_path}") 52 | visualize_projected_points(img_path, json_path, opt.outf, img_name, opt.data) 53 | -------------------------------------------------------------------------------- /common/models.py: -------------------------------------------------------------------------------- 1 | """ 2 | NVIDIA from jtremblay@gmail.com 3 | """ 4 | 5 | # Networks 6 | import torch 7 | import torch 8 | import torch.nn as nn 9 | import torch.nn.parallel 10 | import torch.utils.data 11 | import torchvision.models as models 12 | 13 | 14 | class DopeNetwork(nn.Module): 15 | def __init__( 16 | self, 17 | pretrained=False, 18 | numBeliefMap=9, 19 | numAffinity=16, 20 | stop_at_stage=6, # number of stages to process (if less than total number of stages) 21 | ): 22 | super(DopeNetwork, self).__init__() 23 | 24 | self.stop_at_stage = stop_at_stage 25 | 26 | vgg_full = models.vgg19(pretrained=False).features 27 | self.vgg = nn.Sequential() 28 | for i_layer in 
range(24): 29 | self.vgg.add_module(str(i_layer), vgg_full[i_layer]) 30 | 31 | # Add some layers 32 | i_layer = 23 33 | self.vgg.add_module( 34 | str(i_layer), nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1) 35 | ) 36 | self.vgg.add_module(str(i_layer + 1), nn.ReLU(inplace=True)) 37 | self.vgg.add_module( 38 | str(i_layer + 2), nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1) 39 | ) 40 | self.vgg.add_module(str(i_layer + 3), nn.ReLU(inplace=True)) 41 | 42 | # print('---Belief------------------------------------------------') 43 | # _2 are the belief map stages 44 | self.m1_2 = DopeNetwork.create_stage(128, numBeliefMap, True) 45 | self.m2_2 = DopeNetwork.create_stage( 46 | 128 + numBeliefMap + numAffinity, numBeliefMap, False 47 | ) 48 | self.m3_2 = DopeNetwork.create_stage( 49 | 128 + numBeliefMap + numAffinity, numBeliefMap, False 50 | ) 51 | self.m4_2 = DopeNetwork.create_stage( 52 | 128 + numBeliefMap + numAffinity, numBeliefMap, False 53 | ) 54 | self.m5_2 = DopeNetwork.create_stage( 55 | 128 + numBeliefMap + numAffinity, numBeliefMap, False 56 | ) 57 | self.m6_2 = DopeNetwork.create_stage( 58 | 128 + numBeliefMap + numAffinity, numBeliefMap, False 59 | ) 60 | 61 | # print('---Affinity----------------------------------------------') 62 | # _1 are the affinity map stages 63 | self.m1_1 = DopeNetwork.create_stage(128, numAffinity, True) 64 | self.m2_1 = DopeNetwork.create_stage( 65 | 128 + numBeliefMap + numAffinity, numAffinity, False 66 | ) 67 | self.m3_1 = DopeNetwork.create_stage( 68 | 128 + numBeliefMap + numAffinity, numAffinity, False 69 | ) 70 | self.m4_1 = DopeNetwork.create_stage( 71 | 128 + numBeliefMap + numAffinity, numAffinity, False 72 | ) 73 | self.m5_1 = DopeNetwork.create_stage( 74 | 128 + numBeliefMap + numAffinity, numAffinity, False 75 | ) 76 | self.m6_1 = DopeNetwork.create_stage( 77 | 128 + numBeliefMap + numAffinity, numAffinity, False 78 | ) 79 | 80 | def forward(self, x): 81 | """Runs inference on the neural network""" 82 | 83 | out1 = self.vgg(x) 84 | 85 | out1_2 = self.m1_2(out1) 86 | out1_1 = self.m1_1(out1) 87 | 88 | if self.stop_at_stage == 1: 89 | return [out1_2], [out1_1] 90 | 91 | out2 = torch.cat([out1_2, out1_1, out1], 1) 92 | out2_2 = self.m2_2(out2) 93 | out2_1 = self.m2_1(out2) 94 | 95 | if self.stop_at_stage == 2: 96 | return [out1_2, out2_2], [out1_1, out2_1] 97 | 98 | out3 = torch.cat([out2_2, out2_1, out1], 1) 99 | out3_2 = self.m3_2(out3) 100 | out3_1 = self.m3_1(out3) 101 | 102 | if self.stop_at_stage == 3: 103 | return [out1_2, out2_2, out3_2], [out1_1, out2_1, out3_1] 104 | 105 | out4 = torch.cat([out3_2, out3_1, out1], 1) 106 | out4_2 = self.m4_2(out4) 107 | out4_1 = self.m4_1(out4) 108 | 109 | if self.stop_at_stage == 4: 110 | return [out1_2, out2_2, out3_2, out4_2], [out1_1, out2_1, out3_1, out4_1] 111 | 112 | out5 = torch.cat([out4_2, out4_1, out1], 1) 113 | out5_2 = self.m5_2(out5) 114 | out5_1 = self.m5_1(out5) 115 | 116 | if self.stop_at_stage == 5: 117 | return [out1_2, out2_2, out3_2, out4_2, out5_2], [ 118 | out1_1, 119 | out2_1, 120 | out3_1, 121 | out4_1, 122 | out5_1, 123 | ] 124 | 125 | out6 = torch.cat([out5_2, out5_1, out1], 1) 126 | out6_2 = self.m6_2(out6) 127 | out6_1 = self.m6_1(out6) 128 | 129 | return [out1_2, out2_2, out3_2, out4_2, out5_2, out6_2], [ 130 | out1_1, 131 | out2_1, 132 | out3_1, 133 | out4_1, 134 | out5_1, 135 | out6_1, 136 | ] 137 | 138 | @staticmethod 139 | def create_stage(in_channels, out_channels, first=False): 140 | """Create the neural network layers for a single stage.""" 141 | 142 
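        # The first stage operates directly on the 128-channel image features and uses
        # 3x3 convolutions with a 512-channel penultimate layer; later stages use wider
        # 7x7 convolutions on the image features concatenated with the previous stage's
        # belief and affinity maps (see forward() above).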
| model = nn.Sequential() 143 | mid_channels = 128 144 | if first: 145 | padding = 1 146 | kernel = 3 147 | count = 6 148 | final_channels = 512 149 | else: 150 | padding = 3 151 | kernel = 7 152 | count = 10 153 | final_channels = mid_channels 154 | 155 | # First convolution 156 | model.add_module( 157 | "0", 158 | nn.Conv2d( 159 | in_channels, mid_channels, kernel_size=kernel, stride=1, padding=padding 160 | ), 161 | ) 162 | 163 | # Middle convolutions 164 | i = 1 165 | while i < count - 1: 166 | model.add_module(str(i), nn.ReLU(inplace=True)) 167 | i += 1 168 | model.add_module( 169 | str(i), 170 | nn.Conv2d( 171 | mid_channels, 172 | mid_channels, 173 | kernel_size=kernel, 174 | stride=1, 175 | padding=padding, 176 | ), 177 | ) 178 | i += 1 179 | 180 | # Penultimate convolution 181 | model.add_module(str(i), nn.ReLU(inplace=True)) 182 | i += 1 183 | model.add_module( 184 | str(i), nn.Conv2d(mid_channels, final_channels, kernel_size=1, stride=1) 185 | ) 186 | i += 1 187 | 188 | # Last convolution 189 | model.add_module(str(i), nn.ReLU(inplace=True)) 190 | i += 1 191 | model.add_module( 192 | str(i), nn.Conv2d(final_channels, out_channels, kernel_size=1, stride=1) 193 | ) 194 | i += 1 195 | 196 | return model 197 | -------------------------------------------------------------------------------- /config/blenderproc_camera_info_example.yaml: -------------------------------------------------------------------------------- 1 | image_width: 500 2 | image_height: 500 3 | camera_name: dope_webcam_0 4 | camera_matrix: 5 | rows: 3 6 | cols: 3 7 | data: [603.5535070631239, 0, 249.5, 0, 603.5535070631239, 249.5, 0, 0, 1] 8 | distortion_model: plumb_bob 9 | distortion_coefficients: 10 | rows: 1 11 | cols: 5 12 | data: [0, 0, 0, 0, 0] 13 | rectification_matrix: 14 | rows: 3 15 | cols: 3 16 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1] 17 | projection_matrix: 18 | rows: 3 19 | cols: 4 20 | data: [603.5535070631239, 0, 249.5, 0, 0, 603.5535070631239, 249.5, 0, 0, 0, 1, 0] 21 | -------------------------------------------------------------------------------- /config/camera_info.yaml: -------------------------------------------------------------------------------- 1 | image_width: 640 2 | image_height: 480 3 | camera_name: dope_webcam_0 4 | camera_matrix: 5 | rows: 3 6 | cols: 3 7 | data: [641.5, 0, 320.0, 0, 641.5, 240.0, 0, 0, 1] 8 | distortion_model: plumb_bob 9 | distortion_coefficients: 10 | rows: 1 11 | cols: 5 12 | data: [0, 0, 0, 0, 0] 13 | rectification_matrix: 14 | rows: 3 15 | cols: 3 16 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1] 17 | projection_matrix: 18 | rows: 3 19 | cols: 4 20 | data: [641.5, 0, 320.0, 0, 0, 641.5, 240.0, 0, 0, 0, 1, 0] 21 | -------------------------------------------------------------------------------- /config/config_pose.yaml: -------------------------------------------------------------------------------- 1 | topic_camera: "/dope/webcam/image_raw" 2 | topic_camera_info: "/dope/webcam/camera_info" 3 | topic_publishing: "dope" 4 | input_is_rectified: True # Whether the input image is rectified (strongly suggested!) 
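# Note on the camera_info files in this folder: with a pinhole model the focal length in
# pixels is fx = (image_width / 2) / tan(horizontal_fov / 2). The fx of 641.5 in
# camera_info.yaml for a 640-px-wide image corresponds to roughly a 53-degree horizontal
# FOV; the 603.55 in blenderproc_camera_info_example.yaml for 500 px corresponds to 45 degrees.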
5 | downscale_height: 400 # if the input image is larger than this, scale it down to this pixel height 6 | 7 | # Comment any of these lines to prevent detection / pose estimation of that object 8 | weights: { 9 | # "cracker":"package://dope/weights/cracker_60.pth", 10 | # "gelatin":"package://dope/weights/gelatin_60.pth", 11 | # "meat":"package://dope/weights/meat_20.pth", 12 | # "mustard":"package://dope/weights/mustard_60.pth", 13 | "soup":"package://dope/weights/soup_60.pth", 14 | #"sugar":"package://dope/weights/sugar_60.pth", 15 | # "bleach":"package://dope/weights/bleach_28_dr.pth" 16 | 17 | # NEW OBJECTS - HOPE 18 | # "AlphabetSoup":"package://dope/weights/AlphabetSoup.pth", 19 | # "BBQSauce":"package://dope/weights/BBQSauce.pth", 20 | # "Butter":"package://dope/weights/Butter.pth", 21 | # "Cherries":"package://dope/weights/Cherries.pth", 22 | # "ChocolatePudding":"package://dope/weights/ChocolatePudding.pth", 23 | # "Cookies":"package://dope/weights/Cookies.pth", 24 | # "Corn":"package://dope/weights/Corn.pth", 25 | # "CreamCheese":"package://dope/weights/CreamCheese.pth", 26 | # "GreenBeans":"package://dope/weights/GreenBeans.pth", 27 | # "GranolaBars":"package://dope/weights/GranolaBars.pth", 28 | # "Ketchup":"package://dope/weights/Ketchup.pth", 29 | # "MacaroniAndCheese":"package://dope/weights/MacaroniAndCheese.pth", 30 | # "Mayo":"package://dope/weights/Mayo.pth", 31 | # "Milk":"package://dope/weights/Milk.pth", 32 | # "Mushrooms":"package://dope/weights/Mushrooms.pth", 33 | # "Mustard":"package://dope/weights/Mustard.pth", 34 | # "Parmesan":"package://dope/weights/Parmesan.pth", 35 | # "PeasAndCarrots":"package://dope/weights/PeasAndCarrots.pth", 36 | # "Peaches":"package://dope/weights/Peaches.pth", 37 | # "Pineapple":"package://dope/weights/Pineapple.pth", 38 | # "Popcorn":"package://dope/weights/Popcorn.pth", 39 | # "OrangeJuice":"package://dope/weights/OrangeJuice.pth", 40 | # "Raisins":"package://dope/weights/Raisins.pth", 41 | # "SaladDressing":"package://dope/weights/SaladDressing.pth", 42 | # "Spaghetti":"package://dope/weights/Spaghetti.pth", 43 | # "TomatoSauce":"package://dope/weights/TomatoSauce.pth", 44 | # "Tuna":"package://dope/weights/Tuna.pth", 45 | # "Yogurt":"package://dope/weights/Yogurt.pth", 46 | 47 | } 48 | 49 | # Cuboid dimension in cm x,y,z 50 | dimensions: { 51 | "cracker": [16.403600692749023,21.343700408935547,7.179999828338623], 52 | "gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059], 53 | "meat": [10.164673805236816,8.3542995452880859,5.7600898742675781], 54 | "mustard": [9.6024150848388672,19.130100250244141,5.824894905090332], 55 | "soup": [6.7659378051757813,10.185500144958496,6.771425724029541], 56 | "sugar": [9.267730712890625,17.625339508056641,4.5134143829345703], 57 | "bleach": [10.267730712890625,26.625339508056641,7.5134143829345703], 58 | 59 | # new objects 60 | "AlphabetSoup" : [ 8.3555002212524414, 7.1121001243591309, 6.6055998802185059 ], 61 | "Butter" : [ 5.282599925994873, 2.3935999870300293, 10.330100059509277 ], 62 | "Ketchup" : [ 14.860799789428711, 4.3368000984191895, 6.4513998031616211 ], 63 | "Pineapple" : [ 5.7623000144958496, 6.95989990234375, 6.567500114440918 ], 64 | "BBQSauce" : [ 14.832900047302246, 4.3478999137878418, 6.4632000923156738 ], 65 | "MacaroniAndCheese" : [ 16.625600814819336, 4.0180997848510742, 12.350899696350098 ], 66 | "Popcorn" : [ 8.4976997375488281, 3.825200080871582, 12.649200439453125 ], 67 | "Mayo" : [ 14.790200233459473, 4.1030998229980469, 6.4541001319885254 ], 68 | "Raisins" 
: [ 12.317500114440918, 3.9751999378204346, 8.5874996185302734 ], 69 | "Cherries" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ], 70 | "Milk" : [ 19.035800933837891, 7.326200008392334, 7.2154998779296875 ], 71 | "SaladDressing" : [ 14.744099617004395, 4.3695998191833496, 6.403900146484375 ], 72 | "ChocolatePudding" : [ 4.947199821472168, 2.9923000335693359, 8.3498001098632812 ], 73 | "Mushrooms" : [ 3.3322000503540039, 7.079899787902832, 6.5869998931884766 ], 74 | "Spaghetti" : [ 4.9836997985839844, 2.8492999076843262, 24.988100051879883 ], 75 | "Cookies" : [ 16.724300384521484, 4.015200138092041, 12.274600028991699 ], 76 | "Mustard" : [ 16.004999160766602, 4.8573999404907227, 6.5132999420166016 ], 77 | "TomatoSauce" : [ 8.2847003936767578, 7.0198001861572266, 6.6469998359680176 ], 78 | "Corn" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ], 79 | "OrangeJuice" : [ 19.248300552368164, 7.2781000137329102, 7.1582999229431152 ], 80 | "Tuna" : [ 3.2571001052856445, 7.0805997848510742, 6.5837001800537109 ], 81 | "CreamCheese" : [ 5.3206000328063965, 2.4230999946594238, 10.359000205993652 ], 82 | "Parmesan" : [ 10.286199569702148, 6.6093001365661621, 7.1117000579833984 ], 83 | "Yogurt" : [ 5.3677000999450684, 6.7961997985839844, 6.7915000915527344 ], 84 | "GranolaBars" : [ 12.400600433349609, 3.8738000392913818, 16.53380012512207 ], 85 | "Peaches" : [ 5.7781000137329102, 7.0961999893188477, 6.5925998687744141 ], 86 | "GreenBeans" : [ 5.758699893951416, 7.0608000755310059, 6.5732002258300781 ], 87 | "PeasAndCarrots" : [ 5.8512001037597656, 7.0636000633239746, 6.5918002128601074 ] 88 | } 89 | 90 | class_ids: { 91 | "cracker": 1, 92 | "gelatin": 2, 93 | "meat": 3, 94 | "mustard": 4, 95 | "soup": 5, 96 | "sugar": 6, 97 | "bleach": 7, 98 | "AlphabetSoup" : 9, 99 | "Ketchup" : 10, 100 | "Pineapple" : 11, 101 | "BBQSauce" : 12, 102 | "MacaroniAndCheese" : 13, 103 | "Popcorn" : 14, 104 | "Butter" : 15, 105 | "Mayo" : 16, 106 | "Raisins" : 17, 107 | "Cherries" : 18, 108 | "Milk" : 19, 109 | "SaladDressing" : 20, 110 | "ChocolatePudding" : 21, 111 | "Mushrooms" : 22, 112 | "Spaghetti" : 23, 113 | "Cookies" : 24, 114 | "Mustard" : 25, 115 | "TomatoSauce" : 26, 116 | "Corn" : 27, 117 | "OrangeJuice" : 28, 118 | "Tuna" : 29, 119 | "CreamCheese" : 20, 120 | "Parmesan" : 31, 121 | "Yogurt" : 32, 122 | "GranolaBars" : 33, 123 | "Peaches" : 34, 124 | "GreenBeans" : 35, 125 | "PeasAndCarrots" : 36 126 | } 127 | 128 | draw_colors: { 129 | "cracker": [13, 255, 128], # green 130 | "gelatin": [255, 255, 255], # while 131 | "meat": [0, 104, 255], # blue 132 | "mustard": [217,12, 232], # magenta 133 | "soup": [255, 101, 0], # orange 134 | "sugar": [232, 222, 12], # yellow 135 | "bleach": [232, 222, 12], # yellow 136 | } 137 | 138 | # optional: provide a transform that is applied to the pose returned by DOPE 139 | model_transforms: { 140 | # "cracker": [[ 0, 0, 1, 0], 141 | # [ 0, -1, 0, 0], 142 | # [ 1, 0, 0, 0], 143 | # [ 0, 0, 0, 1]] 144 | } 145 | 146 | # optional: if you provide a mesh of the object here, a mesh marker will be 147 | # published for visualization in RViz 148 | # You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb 149 | meshes: { 150 | # "cracker": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/003_cracker_box/google_16k/textured.obj", 151 | # "gelatin": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/009_gelatin_box/google_16k/textured.obj", 152 | # "meat": 
"file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/010_potted_meat_can/google_16k/textured.obj", 153 | # "mustard": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/006_mustard_bottle/google_16k/textured.obj", 154 | # "soup": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/005_tomato_soup_can/google_16k/textured.obj", 155 | # "sugar": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/004_sugar_box/google_16k/textured.obj", 156 | # "bleach": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/021_bleach_cleanser/google_16k/textured.obj", 157 | } 158 | 159 | # optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0. 160 | mesh_scales: { 161 | "cracker": 0.01, 162 | "gelatin": 0.01, 163 | "meat": 0.01, 164 | "mustard": 0.01, 165 | "soup": 0.01, 166 | "sugar": 0.01, 167 | "bleach": 0.01, 168 | } 169 | 170 | overlay_belief_images: True # Whether to overlay the input image on the belief images published on /dope/belief_[obj_name] 171 | 172 | # Config params for DOPE 173 | thresh_angle: 0.5 174 | thresh_map: 0.01 175 | sigma: 3 176 | thresh_points: 0.1 177 | -------------------------------------------------------------------------------- /data_generation/backgrounds/README.md: -------------------------------------------------------------------------------- 1 | # Background Images 2 | 3 | Place background images here. -------------------------------------------------------------------------------- /data_generation/backgrounds/messy_office.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/backgrounds/messy_office.png -------------------------------------------------------------------------------- /data_generation/blenderproc_data_gen/README.md: -------------------------------------------------------------------------------- 1 | # Synthetic Data Generation with BlenderProc 2 | 3 | ## Installation 4 | BlenderProc can be installed with pip: 5 | ``` 6 | pip install blenderproc 7 | ``` 8 | If you run into troubles, please consult the [project's own GitHub page](https://github.com/DLR-RM/BlenderProc). 9 | 10 | 11 | ## Usage 12 | 13 | [BlenderProc](https://github.com/DLR-RM/BlenderProc) is intended to create a single scene and render multiple frames of it. Adding and removing objects (such as varying the number of distractors) will cause memory bloat and poor performance. To avoid this issue, we use a batching script (`run_blenderproc_datagen.py`) to run a standalone BlenderProc script several times. 14 | 15 | 16 | 17 | ### Usage example: 18 | 19 | Run the BlenderProc script in five parallel jobs, each generating 1000 frames. Each frame will have six instances of the object and ten randomly chosen distractor objects: 20 | ``` 21 | ./run_blenderproc_datagen.py --nb_runs 5 --nb_frames 1000 --path_single_obj ../models/Ketchup/google_16k/textured.obj --nb_objects 6 --distractors_folder ~/data/google_scanned_models/ --nb_distractors 10 --backgrounds_folder ../dome_hdri_haven/ 22 | ``` 23 | 24 | Parameters of the top-level script can be shown by running 25 | ``` 26 | python ./run_blenderproc_datagen.py --help 27 | ``` 28 | 29 | Note that, as a BlenderProc script, `generate_training_data.py` cannot be invoked with Python. It must be run via the `blenderproc` launch script. 
To discover its command-line parameters, you must look 30 | at the source-code itself; `blenderproc run ./generate_training_data.py --help` will not report 31 | them properly. 32 | 33 | BlenderProc searches for python modules in a different order than when invoking Python by itself. If you run into an issue where `generate_training_data.py` fails to import modules that you have installed, you may have to re-install them via BlenderProc; e.g. `blenderproc pip install pyquaternion`. 34 | 35 | ## Versioning Notes 36 | 37 | 10 June 2024 38 | 39 | The order of the `projected_cuboid` points generated by the BlenderProc scripts changed with 40 | changelist `22a2468`. This fix rotated the points 180 degrees around the Z (vertical axis). Do not 41 | intermingle data generated before this change with data generated after it. 42 | -------------------------------------------------------------------------------- /data_generation/blenderproc_data_gen/run_blenderproc_datagen.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import argparse 4 | import multiprocessing 5 | import os 6 | from queue import Queue 7 | import subprocess 8 | import sys 9 | 10 | 11 | parser = argparse.ArgumentParser() 12 | ## Parameters for this script 13 | parser.add_argument( 14 | '--nb_runs', 15 | default=1, 16 | type=int, 17 | help='Number of times the datagen script is run. Each time it is run, a new set of ' 18 | 'distractors is selected.' 19 | ) 20 | parser.add_argument( 21 | '--nb_workers', 22 | default=0, 23 | type=int, 24 | help='Number of parallel blenderproc workers to run. The default of 0 will create ' 25 | 'one worker for every CPU core' 26 | ) 27 | 28 | 29 | opt, unknown = parser.parse_known_args() 30 | 31 | num_workers = min(opt.nb_workers, multiprocessing.cpu_count()) 32 | if num_workers == 0: 33 | num_workers = multiprocessing.cpu_count() 34 | 35 | amount_of_runs = opt.nb_runs 36 | 37 | # set the folder in which the generation script is located 38 | rerun_folder = os.path.abspath(os.path.dirname(__file__)) 39 | 40 | Q = Queue(maxsize = num_workers) 41 | for run_id in range(amount_of_runs): 42 | if Q.full(): 43 | proc = Q.get() 44 | proc.wait() 45 | 46 | # execute one BlenderProc run 47 | cmd = ["blenderproc", "run", os.path.join(rerun_folder, "generate_training_data.py")] 48 | cmd.extend(unknown) 49 | cmd.extend(['--run_id', str(run_id)]) 50 | p = subprocess.Popen(" ".join(cmd), shell=True) 51 | Q.put(p) 52 | -------------------------------------------------------------------------------- /data_generation/dome_hdri_haven/download.md: -------------------------------------------------------------------------------- 1 | Download HDRI maps from https://polyhaven.com/hdris -------------------------------------------------------------------------------- /data_generation/models/Ketchup/google_16k/texture_map.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/models/Ketchup/google_16k/texture_map.png -------------------------------------------------------------------------------- /data_generation/models/Ketchup/google_16k/texture_map_flat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/models/Ketchup/google_16k/texture_map_flat.png 
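A quick, hedged way to sanity-check frames produced by the data-generation scripts above is to load each JSON annotation and confirm that every object entry carries its projected cuboid keypoints. The sketch below is illustrative only: the output path is an assumption, and the `objects` / `projected_cuboid` / `name` fields are the ones read by `common/debug.py` and `data_generation/nvisii_data_gen/debug_json_ros_node.py`.

```
#!/usr/bin/env python3
# Illustrative sanity check for generated annotations (path is an assumption).
import glob
import json

for path in sorted(glob.glob("output/dataset/000/*.json")):
    with open(path) as f:
        frame = json.load(f)
    for obj in frame.get("objects", []):
        pts = obj.get("projected_cuboid", [])
        # Each entry should be an [x, y] pixel coordinate covering the cuboid corners
        # (and, depending on the generator version, the centroid as well).
        print(path, obj.get("name", "?"), len(pts), "keypoints")
```

`common/debug.py` goes a step further and overlays these keypoints on the corresponding image, which is the quickest way to spot a wrong camera matrix or mixed-up point ordering.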
-------------------------------------------------------------------------------- /data_generation/models/Ketchup/google_16k/textured.mtl: -------------------------------------------------------------------------------- 1 | newmtl textured:_texture 2 | map_Kd texture_map.png 3 | -------------------------------------------------------------------------------- /data_generation/models/Ketchup/google_16k/textured.obj.bin: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/models/Ketchup/google_16k/textured.obj.bin -------------------------------------------------------------------------------- /data_generation/models/Ketchup/google_16k/textured.obj.json: -------------------------------------------------------------------------------- 1 | { 2 | "created_at": "2019-11-20T15:13:41.741628", 3 | "version": "0.1", 4 | "mtllibs": [ 5 | "textured.mtl" 6 | ], 7 | "vertex_buffers": [ 8 | { 9 | "material": "textured:_texture", 10 | "vertex_format": "T2F_N3F_V3F", 11 | "byte_offset": 0, 12 | "byte_length": 1491360 13 | } 14 | ] 15 | } -------------------------------------------------------------------------------- /data_generation/models/Ketchup/google_16k/textured_simple.mtl: -------------------------------------------------------------------------------- 1 | # Blender MTL File: 'None' 2 | # Material Count: 1 3 | 4 | newmtl textured:_texture 5 | Ns 0.000000 6 | Ka 1.000000 1.000000 1.000000 7 | Kd 0.800000 0.800000 0.800000 8 | Ks 0.000000 0.000000 0.000000 9 | Ke 0.000000 0.000000 0.000000 10 | Ni 1.450000 11 | d 1.000000 12 | illum 1 13 | map_Kd texture_map.png 14 | -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/.gitignore: -------------------------------------------------------------------------------- 1 | output/ -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/debug_json_ros_node.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """ 3 | This is a simple ROS node that reads the various transform data from a set of 4 | json files that were generated by `nvisii_data_gen` and publishes them as TF 5 | transforms over ROS, so that they can be visualized using RViz (to debug 6 | whether the transforms are correct). It was only used while debugging the json 7 | output of `nvisii_data_gen`, so most users should not need this file. It is 8 | only left in here as an example on how to use the transformations from the json 9 | fields in ROS. 10 | """ 11 | 12 | import json 13 | import time 14 | 15 | import numpy as np 16 | import rospy 17 | import tf 18 | from tf.transformations import quaternion_from_matrix, translation_from_matrix 19 | 20 | rospy.init_node("debug_json") 21 | 22 | tf_broadcaster = tf.TransformBroadcaster() 23 | 24 | while True: 25 | # This assumes that there are (at least) 50 frames (00000.json, 00001.json, ...) in the directory. 
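    # For each frame, the camera pose is published twice (from the stored quaternion and
    # from the transposed camera_view_matrix) and every object three times (world frame,
    # camera frame, and from its local_to_world_matrix). If the json output is consistent,
    # the corresponding TF frames should coincide when viewed in RViz.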
26 | for frame_number in range(50): 27 | if rospy.is_shutdown(): 28 | break 29 | path = f"{str(frame_number).zfill(5)}.json" 30 | with open(path) as json_file: 31 | conf = json.load(json_file) 32 | print(path) 33 | 34 | stamp = rospy.Time.now() 35 | 36 | camera_data = conf['camera_data'] 37 | tf_broadcaster.sendTransform(translation=camera_data['location_worldframe'], 38 | rotation=camera_data['quaternion_xyzw_worldframe'], 39 | time=stamp, 40 | parent='world', 41 | child='camera', 42 | ) 43 | # transpose to transform between column-major and row-major order 44 | camera_view_matrix = np.array(camera_data['camera_view_matrix']).transpose() 45 | tf_broadcaster.sendTransform(translation=translation_from_matrix(camera_view_matrix), 46 | rotation=quaternion_from_matrix(camera_view_matrix), 47 | time=stamp, 48 | parent='camera', 49 | child='world_from_matrix', 50 | ) 51 | for object_data in conf['objects']: 52 | tf_broadcaster.sendTransform(translation=object_data['location_worldframe'], 53 | rotation=object_data['quaternion_xyzw_worldframe'], 54 | time=stamp, 55 | parent='world', 56 | child=f"{object_data['name']}_world", 57 | ) 58 | tf_broadcaster.sendTransform(translation=object_data['location'], 59 | rotation=object_data['quaternion_xyzw'], 60 | time=stamp, 61 | parent='camera', 62 | child=f"{object_data['name']}_cam", 63 | ) 64 | local_to_world_matrix = np.array(object_data['local_to_world_matrix']).transpose() 65 | tf_broadcaster.sendTransform(translation=translation_from_matrix(local_to_world_matrix), 66 | rotation=quaternion_from_matrix(local_to_world_matrix), 67 | time=stamp, 68 | parent='world', 69 | child=f"{object_data['name']}_cam_from_matrix", 70 | ) 71 | 72 | time.sleep(1 / 30) 73 | -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/doc/videos/cylinder_nosym.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/doc/videos/cylinder_nosym.mp4 -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/doc/videos/cylinder_sym.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/doc/videos/cylinder_sym.mp4 -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/doc/videos/hex_screw.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/doc/videos/hex_screw.mp4 -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/download_google_scanned_objects.py: -------------------------------------------------------------------------------- 1 | import sys,json,requests 2 | import simplejson as json 3 | import subprocess 4 | 5 | collection_name = 'Google%20Scanned%20Objects' 6 | owner_name = 'GoogleResearch' 7 | # The server URL 8 | base_url ='https://fuel.ignitionrobotics.org' 9 | # Path to get the models in the collection 10 | next_url = '/1.0/models?page=2&per_page=100&q=collections:{}'.format(collection_name) 11 | next_url = 
'/1.0/models?per_page=100&page={}&q=collections:Google%20Scanned%20Objects' 12 | # Path to download a single model in the collection 13 | download_url = 'https://fuel.ignitionrobotics.org/1.0/{}/models/'.format(owner_name) 14 | count = 0 15 | total_count = 0 16 | # Iterate over the pages 17 | # while next_url: 18 | downloaded = {} 19 | 20 | subprocess.call(['mkdir','google_scanned_models/']) 21 | 22 | 23 | 24 | for i in range(1,1100): 25 | print(count) 26 | # Get the contents of the current page. 27 | try: 28 | r = requests.get(base_url + next_url.format(str(i))) 29 | # print(base_url + next_url) 30 | # print(r.headers) 31 | # break 32 | # Convert to JSON 33 | # print(r.text) 34 | models = json.loads(r.text) 35 | except: 36 | continue 37 | # print(models) 38 | # break 39 | # Get the next page's URL 40 | # next_url = '' 41 | # if 'Link' in r.headers: 42 | # links = r.headers['Link'].split(',') 43 | # for link in links: 44 | # parts = link.split(';') 45 | # if 'next' in parts[1]: 46 | # next_url = parts[0].replace('<','').replace('>','') 47 | # Get the total number of models to download 48 | if total_count <= 0 and 'X-Total-Count' in r.headers: 49 | total_count = int(r.headers['X-Total-Count']) 50 | # Download each model 51 | for model in models: 52 | # count+=1 53 | model_name = model['name'] 54 | if model_name not in downloaded: 55 | downloaded[model_name] = 1 56 | count+=1 57 | print ('Downloading (%d/%d) %s' % (count, total_count, model_name)) 58 | download = requests.get(download_url+model_name+'.zip', stream=True) 59 | with open("google_scanned_models/"+model_name+'.zip', 'wb') as fd: 60 | for chunk in download.iter_content(chunk_size=1024*1024): 61 | fd.write(chunk) 62 | 63 | subprocess.call(['unzip',"google_scanned_models/"+model_name+'.zip','-d', "google_scanned_models/"+model_name]) 64 | subprocess.call(['rm',"google_scanned_models/"+model_name+'.zip']) 65 | 66 | 67 | -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/generate_dataset.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import random 3 | import subprocess 4 | 5 | 6 | # 20 000 images 7 | 8 | for i in range(0, 100): 9 | to_call = [ 10 | "python",'single_video_pybullet.py', 11 | '--spp','10', 12 | '--nb_frames', '200', 13 | '--nb_objects',str(int(random.uniform(50,75))), 14 | '--scale', '0.01', 15 | '--outf',f"dataset/{str(i).zfill(3)}", 16 | ] 17 | subprocess.call(to_call) -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/models_with_symmetries/cylinder/google_16k/model_info.json: -------------------------------------------------------------------------------- 1 | { 2 | "symmetries_discrete": [[1, 0, 0, 0, 3 | 0, -1, 0, 0, 4 | 0, 0, -1, 0, 5 | 0, 0, 0, 1]], 6 | "symmetries_continuous": [{"axis": [0, 0, 1], "offset": [0, 0, 0]}], 7 | "align_axes": [{"object": [0, 1, 0], "camera": [0, 0, 1]}, {"object": [0, 0, 1], "camera": [0, 1, 0]}] 8 | } 9 | -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/models_with_symmetries/cylinder/google_16k/texture_map_flat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/models_with_symmetries/cylinder/google_16k/texture_map_flat.png 
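As a side note on the `model_info.json` format above: the flattened 4x4 in `symmetries_discrete` is a 180-degree rotation about the object's X axis, and `symmetries_continuous` declares a continuous rotational symmetry about the Z axis. Below is a minimal numpy check of the discrete entry (a sketch for illustration, not part of the repo):

```python
import numpy as np

# The flattened 4x4 from the cylinder's model_info.json, reshaped row-major.
T = np.array([1, 0, 0, 0,
              0, -1, 0, 0,
              0, 0, -1, 0,
              0, 0, 0, 1], dtype=float).reshape(4, 4)

R = T[:3, :3]                              # rotation part (no translation)
assert np.allclose(R @ R.T, np.eye(3))     # orthonormal
assert np.isclose(np.linalg.det(R), 1.0)   # proper rotation, not a mirror
assert np.allclose(R @ R, np.eye(3))       # applying it twice gives the identity
```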
-------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/models_with_symmetries/cylinder/google_16k/textured.mtl: -------------------------------------------------------------------------------- 1 | # Blender MTL File: 'None' 2 | # Material Count: 1 3 | 4 | newmtl cylinder_material 5 | Ns 225.000000 6 | Ka 1.000000 1.000000 1.000000 7 | Kd 0.800000 0.800000 0.800000 8 | Ks 0.500000 0.500000 0.500000 9 | Ke 0.000000 0.000000 0.000000 10 | Ni 1.450000 11 | d 1.000000 12 | illum 2 13 | map_Kd texture_map_flat.png 14 | -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/model_info.json: -------------------------------------------------------------------------------- 1 | { 2 | "symmetries_discrete": [[ 0.5, -0.866, 0, 0, 3 | 0.866, 0.5, 0, 0, 4 | 0, 0, 1, 0, 5 | 0, 0, 0, 1], 6 | [-0.5, -0.866, 0, 0, 7 | 0.866, -0.5, 0, 0, 8 | 0, 0, 1, 0, 9 | 0, 0, 0, 1], 10 | [-1, 0, 0, 0, 11 | 0, -1, 0, 0, 12 | 0, 0, 1, 0, 13 | 0, 0, 0, 1], 14 | [-0.5, 0.866, 0, 0, 15 | -0.866, -0.5, 0, 0, 16 | 0, 0, 1, 0, 17 | 0, 0, 0, 1], 18 | [ 0.5, 0.866, 0, 0, 19 | -0.866, 0.5, 0, 0, 20 | 0, 0, 1, 0, 21 | 0, 0, 0, 1]], 22 | "align_axes": [{"object": [0, 1, 0], "camera": [0, 0, 1]}] 23 | } 24 | -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/texture_map_flat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/texture_map_flat.png -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/textured.mtl: -------------------------------------------------------------------------------- 1 | # Blender MTL File: 'textured.blend' 2 | # Material Count: 1 3 | 4 | newmtl hexagon_material 5 | Ns 225.000000 6 | Ka 1.000000 1.000000 1.000000 7 | Kd 0.800000 0.800000 0.800000 8 | Ks 0.500000 0.500000 0.500000 9 | Ke 0.000000 0.000000 0.000000 10 | Ni 1.450000 11 | d 1.000000 12 | illum 2 13 | map_Kd texture_map_flat.png 14 | -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/models_with_symmetries/hex_screw/google_16k/textured.obj: -------------------------------------------------------------------------------- 1 | # Blender v3.0.1 OBJ File: 'textured.blend' 2 | # www.blender.org 3 | mtllib textured.mtl 4 | o Cylinder 5 | v -0.000000 0.075000 0.000000 6 | v -0.000000 0.075000 0.075000 7 | v 0.064952 0.037500 0.000000 8 | v 0.064952 0.037500 0.075000 9 | v 0.064952 -0.037500 0.000000 10 | v 0.064952 -0.037500 0.075000 11 | v -0.000000 -0.075000 0.000000 12 | v -0.000000 -0.075000 0.075000 13 | v -0.064952 -0.037500 0.000000 14 | v -0.064952 -0.037500 0.075000 15 | v -0.064952 0.037500 0.000000 16 | v -0.064952 0.037500 0.075000 17 | v 0.000000 0.037500 -0.075000 18 | v 0.000000 0.037500 0.000000 19 | v 0.007316 0.036779 -0.075000 20 | v 0.007316 0.036779 0.000000 21 | v 0.014351 0.034645 -0.075000 22 | v 0.014351 0.034645 0.000000 23 | v 0.020834 0.031180 -0.075000 24 | v 0.020834 0.031180 0.000000 25 | v 0.026517 0.026517 -0.075000 26 | v 0.026517 0.026517 0.000000 27 | v 0.031180 0.020834 -0.075000 28 | v 
0.031180 0.020834 0.000000 29 | v 0.034645 0.014351 -0.075000 30 | v 0.034645 0.014351 0.000000 31 | v 0.036779 0.007316 -0.075000 32 | v 0.036779 0.007316 0.000000 33 | v 0.037500 0.000000 -0.075000 34 | v 0.037500 -0.000000 -0.000000 35 | v 0.036779 -0.007316 -0.075000 36 | v 0.036779 -0.007316 -0.000000 37 | v 0.034645 -0.014351 -0.075000 38 | v 0.034645 -0.014351 -0.000000 39 | v 0.031180 -0.020834 -0.075000 40 | v 0.031180 -0.020834 -0.000000 41 | v 0.026517 -0.026517 -0.075000 42 | v 0.026517 -0.026517 -0.000000 43 | v 0.020834 -0.031180 -0.075000 44 | v 0.020834 -0.031180 -0.000000 45 | v 0.014351 -0.034645 -0.075000 46 | v 0.014351 -0.034645 -0.000000 47 | v 0.007316 -0.036779 -0.075000 48 | v 0.007316 -0.036779 -0.000000 49 | v -0.000000 -0.037500 -0.075000 50 | v -0.000000 -0.037500 -0.000000 51 | v -0.007316 -0.036779 -0.075000 52 | v -0.007316 -0.036779 -0.000000 53 | v -0.014351 -0.034645 -0.075000 54 | v -0.014351 -0.034645 -0.000000 55 | v -0.020834 -0.031180 -0.075000 56 | v -0.020834 -0.031180 -0.000000 57 | v -0.026517 -0.026517 -0.075000 58 | v -0.026517 -0.026517 -0.000000 59 | v -0.031180 -0.020834 -0.075000 60 | v -0.031180 -0.020834 -0.000000 61 | v -0.034645 -0.014351 -0.075000 62 | v -0.034645 -0.014351 -0.000000 63 | v -0.036779 -0.007316 -0.075000 64 | v -0.036779 -0.007316 -0.000000 65 | v -0.037500 0.000000 -0.075000 66 | v -0.037500 -0.000000 -0.000000 67 | v -0.036779 0.007316 -0.075000 68 | v -0.036779 0.007316 0.000000 69 | v -0.034645 0.014351 -0.075000 70 | v -0.034645 0.014351 0.000000 71 | v -0.031180 0.020834 -0.075000 72 | v -0.031180 0.020834 0.000000 73 | v -0.026517 0.026517 -0.075000 74 | v -0.026517 0.026516 0.000000 75 | v -0.020834 0.031180 -0.075000 76 | v -0.020834 0.031180 0.000000 77 | v -0.014351 0.034645 -0.075000 78 | v -0.014351 0.034645 0.000000 79 | v -0.007316 0.036779 -0.075000 80 | v -0.007316 0.036779 0.000000 81 | vt 1.000000 0.749602 82 | vt 1.000000 0.916269 83 | vt 0.833333 0.916269 84 | vt 0.833333 0.749602 85 | vt 0.666667 0.916269 86 | vt 0.666667 0.749602 87 | vt 0.500000 0.916269 88 | vt 0.500000 0.749602 89 | vt 0.333333 0.916269 90 | vt 0.333333 0.749602 91 | vt 0.457846 0.130000 92 | vt 0.457846 0.370000 93 | vt 0.250000 0.490000 94 | vt 0.042154 0.370000 95 | vt 0.042154 0.130000 96 | vt 0.250000 0.010000 97 | vt 0.166667 0.916269 98 | vt 0.166667 0.749602 99 | vt -0.000000 0.916269 100 | vt -0.000000 0.749602 101 | vt 0.785595 0.311550 102 | vt 0.889518 0.371550 103 | vt 0.993441 0.311550 104 | vt 0.993441 0.191550 105 | vt 0.889518 0.131550 106 | vt 0.785595 0.191550 107 | vt 1.000000 0.505485 108 | vt 1.000000 0.672152 109 | vt 0.968750 0.672152 110 | vt 0.968750 0.505485 111 | vt 0.937500 0.672152 112 | vt 0.937500 0.505485 113 | vt 0.906250 0.672152 114 | vt 0.906250 0.505485 115 | vt 0.875000 0.672152 116 | vt 0.875000 0.505485 117 | vt 0.843750 0.672152 118 | vt 0.843750 0.505485 119 | vt 0.812500 0.672152 120 | vt 0.812500 0.505485 121 | vt 0.781250 0.672152 122 | vt 0.781250 0.505485 123 | vt 0.750000 0.672152 124 | vt 0.750000 0.505485 125 | vt 0.718750 0.672152 126 | vt 0.718750 0.505485 127 | vt 0.687500 0.672152 128 | vt 0.687500 0.505485 129 | vt 0.656250 0.672152 130 | vt 0.656250 0.505485 131 | vt 0.625000 0.672152 132 | vt 0.625000 0.505485 133 | vt 0.593750 0.672152 134 | vt 0.593750 0.505485 135 | vt 0.562500 0.672152 136 | vt 0.562500 0.505485 137 | vt 0.531250 0.672152 138 | vt 0.531250 0.505485 139 | vt 0.500000 0.672152 140 | vt 0.500000 0.505485 141 | vt 0.468750 0.672152 142 | vt 0.468750 
0.505485 143 | vt 0.437500 0.672152 144 | vt 0.437500 0.505485 145 | vt 0.406250 0.672152 146 | vt 0.406250 0.505485 147 | vt 0.375000 0.672152 148 | vt 0.375000 0.505485 149 | vt 0.343750 0.672152 150 | vt 0.343750 0.505485 151 | vt 0.312500 0.672152 152 | vt 0.312500 0.505485 153 | vt 0.281250 0.672152 154 | vt 0.281250 0.505485 155 | vt 0.250000 0.672152 156 | vt 0.250000 0.505485 157 | vt 0.218750 0.672152 158 | vt 0.218750 0.505485 159 | vt 0.187500 0.672152 160 | vt 0.187500 0.505485 161 | vt 0.156250 0.672152 162 | vt 0.156250 0.505485 163 | vt 0.125000 0.672152 164 | vt 0.125000 0.505485 165 | vt 0.093750 0.672152 166 | vt 0.093750 0.505485 167 | vt 0.062500 0.672152 168 | vt 0.062500 0.505485 169 | vt 0.031250 0.672152 170 | vt 0.031250 0.505485 171 | vt 0.000000 0.672152 172 | vt 0.000000 0.505485 173 | vt 0.750000 0.370000 174 | vt 0.726589 0.367694 175 | vt 0.704078 0.360866 176 | vt 0.683332 0.349776 177 | vt 0.665147 0.334853 178 | vt 0.650224 0.316668 179 | vt 0.639134 0.295922 180 | vt 0.632306 0.273411 181 | vt 0.630000 0.250000 182 | vt 0.632306 0.226589 183 | vt 0.639134 0.204078 184 | vt 0.650224 0.183332 185 | vt 0.665147 0.165147 186 | vt 0.683332 0.150224 187 | vt 0.704078 0.139134 188 | vt 0.726589 0.132306 189 | vt 0.750000 0.130000 190 | vt 0.773411 0.132306 191 | vt 0.795922 0.139134 192 | vt 0.816668 0.150224 193 | vt 0.834853 0.165147 194 | vt 0.849776 0.183332 195 | vt 0.860866 0.204078 196 | vt 0.867694 0.226589 197 | vt 0.870000 0.250000 198 | vt 0.867694 0.273411 199 | vt 0.860866 0.295922 200 | vt 0.849776 0.316668 201 | vt 0.834853 0.334853 202 | vt 0.816668 0.349776 203 | vt 0.795922 0.360866 204 | vt 0.773411 0.367694 205 | vn 0.5000 0.8660 0.0000 206 | vn 1.0000 0.0000 0.0000 207 | vn 0.5000 -0.8660 0.0000 208 | vn -0.5000 -0.8660 0.0000 209 | vn 0.0000 0.0000 1.0000 210 | vn -1.0000 0.0000 0.0000 211 | vn -0.5000 0.8660 0.0000 212 | vn 0.0000 0.0000 -1.0000 213 | vn 0.0980 0.9952 0.0000 214 | vn 0.2903 0.9569 0.0000 215 | vn 0.4714 0.8819 0.0000 216 | vn 0.6344 0.7730 0.0000 217 | vn 0.7730 0.6344 0.0000 218 | vn 0.8819 0.4714 0.0000 219 | vn 0.9569 0.2903 0.0000 220 | vn 0.9952 0.0980 0.0000 221 | vn 0.9952 -0.0980 -0.0000 222 | vn 0.9569 -0.2903 -0.0000 223 | vn 0.8819 -0.4714 -0.0000 224 | vn 0.7730 -0.6344 -0.0000 225 | vn 0.6344 -0.7730 -0.0000 226 | vn 0.4714 -0.8819 -0.0000 227 | vn 0.2903 -0.9569 0.0000 228 | vn 0.0980 -0.9952 -0.0000 229 | vn -0.0980 -0.9952 -0.0000 230 | vn -0.2903 -0.9569 -0.0000 231 | vn -0.4714 -0.8819 -0.0000 232 | vn -0.6344 -0.7730 -0.0000 233 | vn -0.7730 -0.6344 -0.0000 234 | vn -0.8819 -0.4714 -0.0000 235 | vn -0.9569 -0.2903 -0.0000 236 | vn -0.9952 -0.0980 -0.0000 237 | vn -0.9952 0.0980 0.0000 238 | vn -0.9569 0.2903 0.0000 239 | vn -0.8819 0.4714 0.0000 240 | vn -0.7730 0.6344 0.0000 241 | vn -0.6344 0.7730 0.0000 242 | vn -0.4714 0.8819 0.0000 243 | vn -0.2903 0.9569 0.0000 244 | vn -0.0980 0.9952 0.0000 245 | usemtl hexagon_material 246 | s off 247 | f 1/1/1 2/2/1 4/3/1 3/4/1 248 | f 3/4/2 4/3/2 6/5/2 5/6/2 249 | f 5/6/3 6/5/3 8/7/3 7/8/3 250 | f 7/8/4 8/7/4 10/9/4 9/10/4 251 | f 6/11/5 4/12/5 2/13/5 12/14/5 10/15/5 8/16/5 252 | f 9/10/6 10/9/6 12/17/6 11/18/6 253 | f 11/18/7 12/17/7 2/19/7 1/20/7 254 | f 11/21/8 1/22/8 3/23/8 5/24/8 7/25/8 9/26/8 255 | f 13/27/9 14/28/9 16/29/9 15/30/9 256 | f 15/30/10 16/29/10 18/31/10 17/32/10 257 | f 17/32/11 18/31/11 20/33/11 19/34/11 258 | f 19/34/12 20/33/12 22/35/12 21/36/12 259 | f 21/36/13 22/35/13 24/37/13 23/38/13 260 | f 23/38/14 24/37/14 26/39/14 25/40/14 261 | f 
25/40/15 26/39/15 28/41/15 27/42/15 262 | f 27/42/16 28/41/16 30/43/16 29/44/16 263 | f 29/44/17 30/43/17 32/45/17 31/46/17 264 | f 31/46/18 32/45/18 34/47/18 33/48/18 265 | f 33/48/19 34/47/19 36/49/19 35/50/19 266 | f 35/50/20 36/49/20 38/51/20 37/52/20 267 | f 37/52/21 38/51/21 40/53/21 39/54/21 268 | f 39/54/22 40/53/22 42/55/22 41/56/22 269 | f 41/56/23 42/55/23 44/57/23 43/58/23 270 | f 43/58/24 44/57/24 46/59/24 45/60/24 271 | f 45/60/25 46/59/25 48/61/25 47/62/25 272 | f 47/62/26 48/61/26 50/63/26 49/64/26 273 | f 49/64/27 50/63/27 52/65/27 51/66/27 274 | f 51/66/28 52/65/28 54/67/28 53/68/28 275 | f 53/68/29 54/67/29 56/69/29 55/70/29 276 | f 55/70/30 56/69/30 58/71/30 57/72/30 277 | f 57/72/31 58/71/31 60/73/31 59/74/31 278 | f 59/74/32 60/73/32 62/75/32 61/76/32 279 | f 61/76/33 62/75/33 64/77/33 63/78/33 280 | f 63/78/34 64/77/34 66/79/34 65/80/34 281 | f 65/80/35 66/79/35 68/81/35 67/82/35 282 | f 67/82/36 68/81/36 70/83/36 69/84/36 283 | f 69/84/37 70/83/37 72/85/37 71/86/37 284 | f 71/86/38 72/85/38 74/87/38 73/88/38 285 | f 73/88/39 74/87/39 76/89/39 75/90/39 286 | f 75/90/40 76/89/40 14/91/40 13/92/40 287 | f 13/93/8 15/94/8 17/95/8 19/96/8 21/97/8 23/98/8 25/99/8 27/100/8 29/101/8 31/102/8 33/103/8 35/104/8 37/105/8 39/106/8 41/107/8 43/108/8 45/109/8 47/110/8 49/111/8 51/112/8 53/113/8 55/114/8 57/115/8 59/116/8 61/117/8 63/118/8 65/119/8 67/120/8 69/121/8 71/122/8 73/123/8 75/124/8 288 | -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/output/output_example/00000.depth.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/output/output_example/00000.depth.exr -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/output/output_example/00000.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/output/output_example/00000.png -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/output/output_example/00000.seg.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/data_generation/nvisii_data_gen/output/output_example/00000.seg.exr -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/readme.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Description 4 | 5 | These sample scripts use [NViSII](https://github.com/owl-project/NVISII) to generate synthetic data for training the [DOPE](https://github.com/NVlabs/Deep_Object_Pose) object pose estimator. 6 | The data can also be used for training other networks. 7 | To generate the data, you will need NVIDIA drivers 450 or above. 8 | We also highly recommend a GPU with RTX capabilities, as ray tracing can be costly on a non-RTX GPU. 9 | 10 | The code in this repo is a cleaned-up version of what was used to generate the data called `dome` in our NViSII [paper](https://arxiv.org/abs/2105.13962). 
11 | 12 | 13 | 14 | # Installation 15 | ``` 16 | pip install -r requirements.txt 17 | ``` 18 | 19 | ## HDRI maps 20 | You will need to download HDRI maps to illuminate the scene. These can be found freely on [polyhaven](https://polyhaven.com/hdris). 21 | For testing purposes, you can download a single one here: 22 | ``` 23 | wget https://www.dropbox.com/s/na3vo8rca7feoiq/teatro_massimo_2k.hdr 24 | mv teatro_massimo_2k.hdr dome_hdri_haven/ 25 | ``` 26 | 27 | 28 | ## Distractors 29 | 30 | The script, as is, expects some objects to be used as distractors. It currently uses the [Google scanned objects dataset](https://app.ignitionrobotics.org/GoogleResearch/fuel/collections/Google%20Scanned%20Objects), which can be downloaded automatically with the following: 31 | 32 | ``` 33 | python download_google_scanned_objects.py 34 | ``` 35 | 36 | If you do *not* want to use the distractors, use the following argument when running the script: `--nb_distractors 0`. 37 | 38 | # Running the script 39 | 40 | If you downloaded everything from the previous steps, _i.e._, a single HDRI map and some distractors from the Google scanned objects dataset, you can run the following command: 41 | 42 | ``` 43 | python single_video_pybullet.py --nb_frames 1 --scale 0.01 44 | ``` 45 | 46 | This will generate a single-frame example in `output/output_example/`. The image should be similar to the following: 47 | 48 | ![example output image](output/output_example/00000.png) 49 | 50 | The script has a few controls that are exposed at the beginning of the file. 51 | Please consult `single_video_pybullet.py --help` for a complete list of parameters. 52 | The major parameters are as follows: 53 | - `--spp` the number of samples per pixel; the higher it is, the better the quality of the resulting image. 54 | - `--nb_frames` the number of images to export. 55 | - `--outf` the folder in which to store the data. 56 | - `--nb_objects` the number of objects to load; the same object may be loaded multiple times. 57 | - `--nb_distractors` how many distractor objects to add; these use 3D models from the Google scanned objects dataset. 58 | 59 | # Adding your own 3D models 60 | 61 | You can simply use `--path_single_obj` to load your own 3D model. However, there are some limitations when exporting the metadata if the OBJ is complex. Try to keep it a single OBJ with a single texture, similar to the one provided in the repo. 62 | 63 | ## Modifying the code to load your object 64 | 65 | The script loads 3D models expressed in the format introduced by the YCB dataset. 66 | It is fairly easy to change the script to load your own 3D model; [NViSII](https://github.com/owl-project/NVISII) can load other formats 67 | as well, not just `obj` files. In `single_video_pybullet.py`, find the following code: 68 | 69 | ```python 70 | for i_obj in range(int(opt.nb_objects)): 71 | 72 | toy_to_load = google_content_folder[random.randint(0,len(google_content_folder)-1)] 73 | 74 | obj_to_load = toy_to_load + "/google_16k/textured.obj" 75 | texture_to_load = toy_to_load + "/google_16k/texture_map_flat.png" 76 | name = "hope_" + toy_to_load.split('/')[-2] + f"_{i_obj}" 77 | adding_mesh_object(name,obj_to_load,texture_to_load,scale=0.01) 78 | ``` 79 | You can change `obj_to_load` and `texture_to_load` to match your data format, as in the sketch below.
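For example, here is a minimal sketch of pointing that loop at a single custom model of your own, reusing `opt` and `adding_mesh_object()` from the script. The folder layout and file names are placeholders, not part of the repo:

```python
# Hypothetical adaptation: load several copies of one custom model instead of
# randomly chosen YCB-style models. Adjust the paths to your own data.
my_model_folder = "my_models/mug"

for i_obj in range(int(opt.nb_objects)):
    obj_to_load = my_model_folder + "/mesh.obj"
    texture_to_load = my_model_folder + "/texture.png"
    name = f"custom_mug_{i_obj}"
    adding_mesh_object(name, obj_to_load, texture_to_load, scale=0.01)
```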
If your file format is quite different, for example you are using a `.glb` file, then in the function `adding_mesh_object()` you will need to change the following: 80 | 81 | ```python 82 | if obj_to_load in mesh_loaded: 83 | toy_mesh = mesh_loaded[obj_to_load] 84 | else: 85 | toy_mesh = visii.mesh.create_from_file(name,obj_to_load) 86 | mesh_loaded[obj_to_load] = toy_mesh 87 | ``` 88 | `visii.mesh.create_from_file` is the function that is used to load the data, this can load different file format. The rest of that function also loads the right texture as well as applying a material. The function also creates a collision mesh to make the object move. 89 | 90 | 91 | 92 | 93 | 94 | ## Updates 95 | 96 | - 11/01/2022: Added the possibility to load a single object with `--path_single_obj`. Just give the direct path to the object. 97 | This function uses [nvisii.import_scene()](https://nvisii.com/nvisii.html#nvisii.import_scene). 98 | If the obj file is complex, it will break the object into sub components, 99 | so you might not have the projected cuboid, and you will get each pose of the different components with the cuboid. 100 | Be careful using this one, make sure your understand the implications. 101 | TODO: track the cuboid of the import_scene from nvisii. 102 | 103 | 104 | ## Citation 105 | 106 | If you use this data generation script in your research, please cite as follows: 107 | 108 | ```latex 109 | @misc{morrical2021nvisii, 110 | title={NViSII: A Scriptable Tool for Photorealistic Image Generation}, 111 | author={Nathan Morrical and Jonathan Tremblay and Yunzhi Lin and Stephen Tyree and Stan Birchfield and Valerio Pascucci and Ingo Wald}, 112 | year={2021}, 113 | eprint={2105.13962}, 114 | archivePrefix={arXiv}, 115 | primaryClass={cs.CV} 116 | } 117 | ``` 118 | -------------------------------------------------------------------------------- /data_generation/nvisii_data_gen/requirements.txt: -------------------------------------------------------------------------------- 1 | nvisii 2 | numpy 3 | opencv-python 4 | pybullet 5 | randomcolor 6 | requests 7 | simplejson 8 | Pillow 9 | pyquaternion 10 | -------------------------------------------------------------------------------- /data_generation/validate_data.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import json 4 | import numpy as np 5 | import os 6 | from PIL import Image, ImageDraw 7 | from pyquaternion import Quaternion 8 | import sys 9 | sys.path.append("../common/") 10 | from cuboid import CuboidVertexType 11 | 12 | def main(json_files): 13 | for json_fn in json_files: 14 | # Find corresponding PNG 15 | base, _ = os.path.splitext(json_fn) 16 | img_fn = base+'.png' 17 | if not os.path.isfile(img_fn): 18 | print(f"Could not locate '{img_fn}'. 
Skipping..") 19 | continue 20 | 21 | # Load JSON data 22 | with open(json_fn, 'r') as F: 23 | data_json = json.load(F) 24 | up = np.array(data_json['camera_data']['camera_look_at']['up']) 25 | at = np.array(data_json['camera_data']['camera_look_at']['at']) 26 | eye = np.array(data_json['camera_data']['camera_look_at']['eye']) 27 | 28 | cam_matrix = np.eye(4) 29 | cam_matrix[0:3,0] = up 30 | cam_matrix[0:3,1] = np.cross(up, -at) 31 | cam_matrix[0:3,2] = -at 32 | cam_matrix[0:3,3] = -eye 33 | 34 | img = Image.open(img_fn) 35 | 36 | objects = data_json['objects'] 37 | # draw projected cuboid dots 38 | for oo in objects: 39 | draw = ImageDraw.Draw(img) 40 | pts = oo['projected_cuboid'] 41 | for idx, pt in enumerate(pts): 42 | draw.ellipse((pt[0]-2, pt[1]-2, pt[0]+2, pt[1]+2), fill = 'cyan', 43 | outline ='cyan') 44 | 45 | # Note that the enum names DO NOT MATCH the positions of the points 46 | # when projected into 3D. This is an old bug that will not be fixed, 47 | # as it will result in errors in inference in older trained models 48 | line_order = [ 49 | # Front 50 | [CuboidVertexType.FrontTopRight, CuboidVertexType.FrontTopLeft, 'red'], 51 | [CuboidVertexType.FrontTopLeft, CuboidVertexType.FrontBottomLeft, 'red'], 52 | [CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontBottomLeft, 'red'], 53 | [CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontTopRight, 'red'], 54 | # Rear 55 | [CuboidVertexType.RearTopRight, CuboidVertexType.RearTopLeft, 'cyan'], 56 | [CuboidVertexType.RearBottomLeft, CuboidVertexType.RearTopLeft, 'cyan'], 57 | [CuboidVertexType.RearBottomLeft, CuboidVertexType.RearBottomRight, 'cyan'], 58 | [CuboidVertexType.RearTopRight, CuboidVertexType.RearBottomRight, 'cyan'], 59 | # Sides 60 | [CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopRight, 'green'], 61 | [CuboidVertexType.RearBottomRight, CuboidVertexType.FrontBottomRight, 'green'], 62 | [CuboidVertexType.RearTopLeft, CuboidVertexType.FrontTopLeft, 'cyan'], 63 | [CuboidVertexType.FrontBottomLeft, CuboidVertexType.RearBottomLeft, 'cyan'], 64 | # 'X' on top 65 | [CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopLeft, 'cyan'], 66 | [CuboidVertexType.FrontTopLeft, CuboidVertexType.RearTopRight, 'cyan'] 67 | ] 68 | 69 | for ll in line_order: 70 | draw.line([(pts[ll[0]][0],pts[ll[0]][1]), (pts[ll[1]][0],pts[ll[1]][1])], 71 | fill=ll[2], width=1) 72 | 73 | img.save(base+'-output.png') 74 | 75 | 76 | def usage_msg(script_name): 77 | print(f"Usage: {script_name} _JSON FILES_") 78 | print(" The basename of the JSON files in _FILES_ will be used to find its") 79 | print(" corresponding image file; i.e. if `00001.json` is provided, the code") 80 | print(" will look for an image named `00001.png`") 81 | 82 | 83 | if __name__ == "__main__": 84 | # Print out usage information if there are no arguments 85 | if len(sys.argv) < 2: 86 | usage_msg(sys.argv[0]) 87 | exit(0) 88 | 89 | # ..or if the first argument is a request for help 90 | s = sys.argv[1].lstrip('-') 91 | if s == "h" or s == "help": 92 | usage_msg(sys.argv[0]) 93 | exit(0) 94 | 95 | main(sys.argv[1:]) 96 | 97 | 98 | -------------------------------------------------------------------------------- /doc/camera_tutorial.md: -------------------------------------------------------------------------------- 1 | ## Running DOPE with a webcam 2 | 3 | This tutorial explains how to: 4 | 5 | 1. start a ROS driver for a regular USB webcam 6 | 2. calibrate the camera **or** enter the camera intrinsics manually 7 | 3. 
rectify the images and publish them on a topic 8 | 9 | Since DOPE relies solely on RGB images and the associated `camera_info` topic, 10 | it is essential that the camera is properly calibrated to give good results. 11 | Also, unless you are using a very low-distortion lens, the images should be 12 | rectified before feeding them to DOPE. 13 | 14 | ### A. Starting a ROS driver for a USB webcam 15 | 16 | In this tutorial, we're using the [usb_cam](http://wiki.ros.org/usb_cam) 17 | ROS package. If this package is not working with your camera, simply google 18 | around - nowadays there is a ROS driver for almost every camera. 19 | 20 | 1. Install the driver: 21 | 22 | ```bash 23 | sudo apt install ros-kinetic-usb-cam 24 | ``` 25 | 26 | 2. Run the camera driver (enter each command in a separate terminal) 27 | 28 | ```bash 29 | roscore 30 | rosrun usb_cam usb_cam_node _camera_name:='usb_cam' _camera_frame_id:='usb_cam' 31 | ``` 32 | 33 | See the [usb_cam wiki page](http://wiki.ros.org/usb_cam) for a list of all 34 | parameters. 35 | 36 | 3. Check that the camera is running: 37 | 38 | ``` 39 | $ rostopic list 40 | [...] 41 | /usb_cam/camera_info 42 | /usb_cam/image_raw 43 | [...] 44 | $ rostopic hz /usb_cam/image_raw 45 | subscribed to [/usb_cam/image_raw] 46 | average rate: 30.001 47 | min: 0.029s max: 0.038s std dev: 0.00280s window: 28 48 | ``` 49 | 50 | 4. If you want, you can also run `rviz` to visualize the camera topic. 51 | 52 | Since the camera is still uncalibrated, you should have seen the following 53 | warning when starting the `usb_cam` node in step 2: 54 | 55 | ``` 56 | [ WARN] [1561548002.895791819]: Camera calibration file /home/******/.ros/camera_info/usb_cam.yaml not found. 57 | ``` 58 | 59 | Also, the camera_info topic is all zeros: 60 | 61 | ```bash 62 | $ rostopic echo -n1 /usb_cam/camera_info 63 | header: 64 | seq: 87 65 | stamp: 66 | secs: 1561548114 67 | nsecs: 388301085 68 | frame_id: "usb_cam" 69 | height: 480 70 | width: 640 71 | distortion_model: '' 72 | D: [] 73 | K: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 74 | R: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 75 | P: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 76 | binning_x: 0 77 | binning_y: 0 78 | roi: 79 | x_offset: 0 80 | y_offset: 0 81 | height: 0 82 | width: 0 83 | do_rectify: False 84 | ``` 85 | 86 | To fix this, we need to generate a file called 87 | `~/.ros/camera_info/usb_cam.yaml` which holds the camera intrinsics. Either 88 | follow step **B** or **C** to do this. 89 | 90 | ### B. Manually entering camera intrinsics 91 | 92 | If you know the camera intrinsics of your webcam, you can simply generate a new 93 | file `~/.ros/camera_info/usb_cam.yaml` which looks like this (the example is 94 | for a Logitech C920 webcam with the following intrinsics: fx = 641.5, 95 | fy = 641.5, cx = 320.0, cy = 240.0): 96 | 97 | 98 | ``` 99 | image_width: 640 100 | image_height: 480 101 | camera_name: usb_cam 102 | camera_matrix: 103 | rows: 3 104 | cols: 3 105 | data: [641.5, 0, 320.0, 0, 641.5, 240.0, 0, 0, 1] 106 | distortion_model: plumb_bob 107 | distortion_coefficients: 108 | rows: 1 109 | cols: 5 110 | data: [0, 0, 0, 0, 0] 111 | rectification_matrix: 112 | rows: 3 113 | cols: 3 114 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1] 115 | projection_matrix: 116 | rows: 3 117 | cols: 4 118 | data: [641.5, 0, 320.0, 0, 0, 641.5, 240.0, 0, 0, 0, 1, 0] 119 | ``` 120 | 121 | After creating this file, restart the `usb_cam` driver for the changes to take 122 | effect. 
The warning "Camera calibration file not found" should have 123 | disappeared, and the `/usb_cam/camera_info` topic should reflect the values 124 | entered above. 125 | 126 | Since the camera intrinsics we supplied above do not specify distortion 127 | coefficients, the image does not need to be rectified, so you can skip the 128 | remaining steps and use the `/usb_cam/image_raw` topic as input for DOPE. 129 | 130 | If you want to do proper calibration and rectification instead, skip step **B** 131 | and continue with **C**. 132 | 133 | ### C. Calibrating the webcam 134 | 135 | Follow the steps in [this tutorial](http://wiki.ros.org/camera_calibration/Tutorials/MonocularCalibration). 136 | 137 | In short, run these commands: 138 | 139 | ```bash 140 | sudo apt install ros-kinetic-camera-calibration 141 | rosrun camera_calibration cameracalibrator.py --size 6x7 --square 0.0495 image:=/usb_cam/image_raw camera:=/usb_cam # adjust these values to your checkerboard 142 | ``` 143 | 144 | * Move your checkerboard around and make sure that you cover a good range of 145 | distance from the camera, all parts of the image, and horizontal and vertical 146 | skew of the checkerboard. 147 | * When done, press "calibrate" and **wait** until the calibration is complete. 148 | This can take a long time (minutes or hours), depending on how many 149 | calibration samples you took. As long as the image window is frozen and 150 | `camera_calibration` hogs a CPU, it's still computing. 151 | * Once the calibration has finished, the window will unfreeze. Press "save", 152 | then press "commit". 153 | 154 | After this, the calibration info should have been saved to 155 | `~/.ros/camera_info/usb_cam.yaml`. Restart the `usb_cam` driver for the changes 156 | to take effect. 157 | 158 | 159 | ### D. Rectifying the images 160 | 161 | 1. Install `image_proc`: 162 | 163 | ```bash 164 | sudo apt install ros-kinetic-image-proc 165 | ``` 166 | 167 | 2. Create a file called `usb_cam_image_proc.launch` with the following contents: 168 | 169 | ```xml 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | ``` 186 | 187 | 3. Launch it: 188 | 189 | ```bash 190 | roslaunch usb_cam_image_proc.launch 191 | ``` 192 | 193 | This should publish the topic `/usb_cam/image_rect_color` (among others). You 194 | can now use this topic as the input for DOPE. 
195 | -------------------------------------------------------------------------------- /dope_objects.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/dope_objects.png -------------------------------------------------------------------------------- /evaluate/.gitignore: -------------------------------------------------------------------------------- 1 | content/ 2 | data/ 3 | results/ 4 | -------------------------------------------------------------------------------- /evaluate/download_content.sh: -------------------------------------------------------------------------------- 1 | mkdir data 2 | 3 | cd data 4 | 5 | wget https://www.dropbox.com/s/qeljw3vjnc416bs/table_003_cracker_box_dope_results.zip 6 | wget https://www.dropbox.com/s/mn2yqflc6fcqaic/table_003_cracker_box.zip 7 | 8 | unzip table_003_cracker_box_dope_results.zip 9 | rm table_003_cracker_box_dope_results.zip 10 | mkdir table_dope_results/ 11 | mv table_003_cracker_box table_dope_results/scene1/ 12 | 13 | unzip table_003_cracker_box.zip 14 | rm table_003_cracker_box.zip 15 | mkdir table_ground_truth/ 16 | mv table_003_cracker_box table_ground_truth/scene1/ 17 | 18 | cd ../ 19 | 20 | mkdir content 21 | cd content 22 | 23 | wget https://www.dropbox.com/s/b61es9q5nhwtooi/003_cracker_box.zip 24 | unzip 003_cracker_box.zip 25 | rm 003_cracker_box.zip 26 | mv 003_cracker_box 003_cracker_box_16k 27 | 28 | cd ../ -------------------------------------------------------------------------------- /evaluate/kpd_compute.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script computes the average distance metric at the keypoint level 3 | from GT to GU. 4 | """ 5 | 6 | 7 | 8 | import argparse 9 | import os 10 | import numpy as np 11 | import glob 12 | import math 13 | 14 | # from pymesh import obj 15 | # from pymesh import ply 16 | # import pywavefront 17 | # import pymesh 18 | from scipy import spatial 19 | 20 | import simplejson as json 21 | import copy 22 | from pyquaternion import Quaternion 23 | import pickle 24 | import nvisii as visii 25 | import subprocess 26 | 27 | 28 | 29 | parser = argparse.ArgumentParser() 30 | 31 | parser.add_argument('--data_prediction', 32 | default = "data/table_dope_results/", 33 | help='path to prediction data') 34 | parser.add_argument('--data', 35 | default="data/table_ground_truth/", 36 | help='path to data ground truth') 37 | parser.add_argument("--outf", 38 | default="results_kpd/", 39 | help="where to put the data" 40 | ) 41 | parser.add_argument("--show", 42 | action='store_true', 43 | help="show the graph at the end. 
" 44 | ) 45 | 46 | opt = parser.parse_args() 47 | 48 | 49 | 50 | if opt.outf is None: 51 | opt.outf = opt.data_prediction 52 | 53 | if not os.path.isdir(opt.outf): 54 | print(f'creating the folder: {opt.outf}') 55 | os.mkdir(opt.outf) 56 | 57 | if os.path.isdir(opt.outf + "/tmp"): 58 | print(f'folder {opt.outf + "/tmp"}/ exists') 59 | else: 60 | os.mkdir(opt.outf + "/tmp") 61 | print(f'created folder {opt.outf + "/tmp"}/') 62 | 63 | def get_all_entries(path_to_explore, what='*.json'): 64 | 65 | imgs = [] 66 | 67 | def add_images(path): 68 | # print(path) 69 | # print(glob.glob(path+"/*json")) 70 | # print(glob.glob(path+"/"+what)) 71 | for j in sorted(glob.glob(path+"/"+what)): 72 | # print(j) 73 | imgs.append(j) 74 | # imgsname.append(j.replace(path,"").replace("/","")) 75 | 76 | 77 | def explore(path): 78 | if not os.path.isdir(path): 79 | return 80 | folders = [os.path.join(path, o) for o in os.listdir(path) 81 | if os.path.isdir(os.path.join(path,o))] 82 | # if len(folders)>0: 83 | for path_entry in folders: 84 | explore(path_entry) 85 | 86 | 87 | add_images(path) 88 | 89 | explore(path_to_explore) 90 | return imgs 91 | 92 | 93 | 94 | 95 | 96 | ###### START ####### 97 | 98 | data_thruth = get_all_entries(opt.data,"*.json") 99 | data_prediction = get_all_entries(opt.data_prediction,"*.json") 100 | 101 | 102 | print('number of ground thruths found',len(data_thruth)) 103 | print("number of predictions found",len(data_prediction)) 104 | 105 | adds_objects = {} 106 | 107 | adds_all = [] 108 | all_gts = [] 109 | count_all_annotations = 0 110 | count_by_object = {} 111 | 112 | count_all_guesses = 0 113 | count_by_object_guesses = {} 114 | 115 | 116 | for gt_file in data_thruth: 117 | scene_gt = gt_file.replace(opt.data,"").replace('.json','') 118 | pred_scene = None 119 | 120 | 121 | for d in data_prediction: 122 | scene_d = d.replace(opt.data_prediction,'').replace('json','').replace('.','') 123 | 124 | # if scene in d: 125 | # print(scene_d,scene_gt) 126 | if scene_d.split('/')[-1] == scene_gt.split('/')[-1]: 127 | pred_scene = d 128 | break 129 | 130 | if pred_scene is None: 131 | continue 132 | # print(gt_file) 133 | gt_json = None 134 | with open(gt_file) as json_file: 135 | gt_json = json.load(json_file) 136 | 137 | gu_json = None 138 | with open(pred_scene) as json_file: 139 | gu_json = json.load(json_file) 140 | 141 | 142 | objects_gt = [] #name obj, keypoints 143 | 144 | for obj in gt_json['objects']: 145 | if 'class' not in obj: 146 | name_gt = obj['name'] 147 | else: 148 | name_gt = obj['class'] 149 | # little hack from bug in the data 150 | if name_gt == '003': 151 | name_gt = "003_cracker_box_16k" 152 | 153 | objects_gt.append( 154 | [ 155 | name_gt, 156 | obj["projected_cuboid"] 157 | ] 158 | ) 159 | 160 | count_all_annotations += 1 161 | 162 | if name_gt in count_by_object: 163 | count_by_object[name_gt] +=1 164 | else: 165 | count_by_object[name_gt] = 1 166 | 167 | for obj_guess in gu_json['objects']: 168 | 169 | if 'class' not in obj: 170 | name_guess = obj_guess['name'] 171 | name_look_up = obj_guess['name'] 172 | else: 173 | name_guess = obj_guess['class'] 174 | name_look_up = obj_guess['class'] 175 | 176 | 177 | keypoints_gu = obj_guess["projected_cuboid"] 178 | 179 | count_all_guesses += 1 180 | 181 | if name_guess in count_by_object_guesses: 182 | count_by_object_guesses[name_guess] +=1 183 | else: 184 | count_by_object_guesses[name_guess] = 1 185 | 186 | 187 | # print (name, pose_mesh) 188 | candidates = [] 189 | for i_obj_gt, obj_gt in enumerate(objects_gt): 190 | 
name_gt, pose_mesh_gt = obj_gt 191 | 192 | # print(name_look_up,name_gt) 193 | 194 | if name_look_up == name_gt: 195 | candidates.append([i_obj_gt, pose_mesh_gt, name_gt]) 196 | 197 | best_dist = 10000000000 198 | best_index = -1 199 | 200 | for candi_gt in candidates: 201 | # compute the add 202 | i_gt, keypoint_gt, name_gt = candi_gt 203 | dist = [] 204 | 205 | for i in range(len(keypoints_gu)): 206 | dist_key = 100000 207 | for j in range(len(keypoints_gu)): 208 | d = np.sqrt((keypoint_gt[i][0]-keypoints_gu[j][0])**2+(keypoint_gt[i][1]-keypoints_gu[j][1])**2) 209 | # print(keypoint_gt[i],keypoints_gu[i],i,d) 210 | if d < dist_key: 211 | dist_key = d 212 | dist.append(dist_key) 213 | 214 | 215 | dist = np.mean(dist) 216 | 217 | if dist < best_dist: 218 | best_dist = dist 219 | best_index = i_gt 220 | 221 | if best_index != -1: 222 | if not name_guess in adds_objects.keys(): 223 | adds_objects[name_guess] = [] 224 | adds_all.append(best_dist) 225 | adds_objects[name_guess].append(best_dist) 226 | 227 | # save the data 228 | if len(opt.outf.split("/"))>1: 229 | path = None 230 | for folder in opt.outf.split("/"): 231 | if path is None: 232 | path = folder 233 | else: 234 | path = path + "/" + folder 235 | try: 236 | os.mkdir(path) 237 | except: 238 | pass 239 | else: 240 | try: 241 | os.mkdir(opt.outf) 242 | except: 243 | pass 244 | print(adds_objects.keys()) 245 | count_by_object["all"] = count_all_annotations 246 | pickle.dump(count_by_object,open(f'{opt.outf}/count_all_annotations.p','wb')) 247 | pickle.dump(adds_all,open(f'{opt.outf}/adds_all.p','wb')) 248 | 249 | count_by_object_guesses["all"] = count_all_guesses 250 | pickle.dump(count_by_object,open(f'{opt.outf}/count_all_guesses.p','wb')) 251 | 252 | 253 | labels = [] 254 | data = [] 255 | for key in adds_objects.keys(): 256 | pickle.dump(adds_objects[key],open(f'{opt.outf}/adds_{key}.p','wb')) 257 | labels.append(key) 258 | data.append(f'{opt.outf}/adds_{key}.p') 259 | 260 | 261 | array_to_call = ["python", 262 | "make_graphs.py", 263 | '--pixels', 264 | '--threshold',"50.0", 265 | "--outf", 266 | opt.outf, 267 | '--labels', 268 | ] 269 | 270 | for label in labels: 271 | array_to_call.append(label) 272 | 273 | array_to_call.append('--data') 274 | for d_p in data: 275 | array_to_call.append(d_p) 276 | 277 | array_to_call.append('--colours') 278 | for i in range(len(data)): 279 | array_to_call.append(str(i)) 280 | if opt.show: 281 | array_to_call.append('--show') 282 | 283 | print(array_to_call) 284 | subprocess.call(array_to_call) 285 | 286 | # subprocess.call( 287 | # [ 288 | # "python", "make_graphs.py", 289 | # "--data", f'{opt.outf}/adds_{key}.p', 290 | # "--labels", key, 291 | # "--outf", opt.outf, 292 | # '--colours', "0", 293 | # ] 294 | # ) 295 | 296 | 297 | visii.deinitialize() 298 | 299 | -------------------------------------------------------------------------------- /evaluate/make_graphs.py: -------------------------------------------------------------------------------- 1 | import matplotlib 2 | import pickle 3 | import argparse 4 | import seaborn as sns 5 | import matplotlib.pyplot as plt 6 | import numpy as np 7 | import os 8 | 9 | import glob 10 | 11 | # load the data 12 | # might be multiple datasets 13 | 14 | # make the plots 15 | os.environ["CUDA_VISIBLE_DEVICES"]="1" 16 | 17 | parser = argparse.ArgumentParser() 18 | 19 | parser.add_argument('--data_folder', 20 | default = None, 21 | help='path to data output') 22 | parser.add_argument('--data', 23 | nargs='+', 24 | default=None, 25 | help='list of csv files') 26 
| parser.add_argument('--labels', 27 | nargs='+', 28 | default=None, 29 | help='labels to put') 30 | parser.add_argument('--colours', 31 | nargs='+', 32 | default=None, 33 | help = '') 34 | 35 | parser.add_argument("--outf", 36 | default=None, 37 | help="where to put the data") 38 | 39 | parser.add_argument('--threshold', 40 | default = 0.1, 41 | type = float 42 | ) 43 | parser.add_argument('--title', 44 | default = 'AUC') 45 | parser.add_argument('--filename', 46 | default = 'output') 47 | parser.add_argument('--styles', 48 | nargs='+', 49 | default=None, 50 | help = '') 51 | parser.add_argument("--show", 52 | action='store_true', 53 | help="show the graph at the end. " 54 | ) 55 | parser.add_argument("--pixels", 56 | action='store_true', 57 | help="Using keypoint distance as metric" 58 | ) 59 | opt = parser.parse_args() 60 | sns.set_style("white") 61 | sns.set_style("ticks") 62 | sns.set_context("paper") 63 | # sns.set_context("notebook") 64 | # sns.set_context("talk") 65 | sns.despine() 66 | # load the data 67 | 68 | # if folder load all the files and create a graph 69 | # if a list put all of them in the same graph 70 | 71 | plt.tight_layout() 72 | # sns.set(font_scale=1.1) 73 | 74 | if opt.data_folder is not None: 75 | # load the data from the file 76 | adds_to_load = glob.glob(f"{opt.data_folder}/adds*") 77 | counts_dict = pickle.load(open(f"{opt.data_folder}/count_all_annotations.p",'rb')) 78 | 79 | else: 80 | # load the files in the list 81 | adds_to_load = opt.data 82 | counts_dict = None 83 | 84 | fig = plt.figure() 85 | ax = plt.axes() 86 | 87 | 88 | for i_file, file in enumerate(adds_to_load): 89 | print(file) 90 | label = file.split("/")[-1] 91 | label = label.replace('adds_','').replace(".p",'') 92 | filename = label 93 | 94 | if not counts_dict is None: 95 | fig = plt.figure() 96 | ax = plt.axes() 97 | 98 | n_pnp_possible_frames = counts_dict[filename] 99 | 100 | else: 101 | # check labels 102 | try: 103 | label = opt.labels[i_file] 104 | except: 105 | label = filename 106 | 107 | # get n possible solutions 108 | path = "/".join(file.split("/")[0:-1]) + '/' 109 | n_pnp_possible_frames = pickle.load(open(f"{path}/count_all_annotations.p",'rb'))[filename] 110 | 111 | adds_objects = pickle.load(open(file,'rb')) 112 | 113 | # add_pnp_found = np.array(adds_objects)/100 114 | add_pnp_found = np.array(adds_objects) 115 | print('mean',add_pnp_found.mean(),'std',add_pnp_found.std(), 116 | 'ratio',f'{len(add_pnp_found)}/{n_pnp_possible_frames}') 117 | n_pnp_found = len(add_pnp_found) 118 | 119 | delta_threshold = opt.threshold/300 120 | add_threshold_values = np.arange(0., opt.threshold, delta_threshold) 121 | 122 | counts = [] 123 | for value in add_threshold_values: 124 | under_threshold = len(np.where(add_pnp_found <= value)[0])/n_pnp_possible_frames 125 | counts.append(under_threshold) 126 | 127 | for value in [0.02,0.04,0.06]: 128 | under_threshold = len(np.where(add_pnp_found <= value)[0])/n_pnp_possible_frames 129 | print('auc at ',value,':', under_threshold) 130 | auc = np.trapz(counts, dx = delta_threshold)/opt.threshold 131 | 132 | # divide might screw this up .... to check! 
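# (Dividing np.trapz(counts, dx=delta_threshold) by opt.threshold normalizes the
# area: accuracy is at most 1 over [0, opt.threshold], so the AUC lies in [0, 1].)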
133 | print('auc',auc) 134 | # print('found', n_pnp_found/n_pnp_possible_frames) 135 | # print('mean', np.mean(add[np.where(add > pnp_sol_found_magic_number)])) 136 | # print('median',np.median(add[np.where(add > pnp_sol_found_magic_number)])) 137 | # print('std',np.std(add[np.where(add > pnp_sol_found_magic_number)])) 138 | 139 | cycle = plt.rcParams['axes.prop_cycle'].by_key()['color'] 140 | if counts_dict is None: 141 | colour = cycle[int(opt.colours[i_file])] 142 | # colour = cycle[int(i_file)] 143 | else: 144 | colour = cycle[0] 145 | 146 | try: 147 | style = args.styles[i_csv] 148 | if style == '0': 149 | style = '-' 150 | elif style == '1': 151 | style = '--' 152 | elif style == '2': 153 | style = ':' 154 | 155 | else: 156 | style = '-' 157 | except: 158 | style = '-' 159 | 160 | label = f'{label} ({auc:.3f})' 161 | ax.plot(add_threshold_values, counts,style,color=colour,label=label) 162 | 163 | if not counts_dict is None: 164 | if opt.pixels: 165 | plt.xlabel('L2 threshold distance (pixels)') 166 | else: 167 | plt.xlabel('ADD threshold distance (m)') 168 | plt.ylabel('Accuracy') 169 | plt.title(f'{filename} auc: {auc:.3f}') 170 | 171 | ax.set_ylim(0,1) 172 | ax.set_xlim(0, float(opt.threshold)) 173 | 174 | # ax.set_xticklabels([0,20,40,60,80,100]) 175 | plt.tight_layout() 176 | plt.savefig(f'{opt.data_folder}/{filename}.png') 177 | plt.close() 178 | 179 | if counts_dict is None: 180 | if opt.pixels: 181 | plt.xlabel('L2 threshold distance (pixels)') 182 | else: 183 | plt.xlabel('ADD threshold distance (m)') 184 | 185 | plt.ylabel('Accuracy') 186 | plt.title(opt.title) 187 | ax.legend(loc='lower right',frameon = True, fancybox=True, framealpha=0.8) 188 | 189 | 190 | legend = ax.get_legend() 191 | for i, t in enumerate(legend.get_texts()): 192 | if opt.data[i] == '666': 193 | t.set_ha('left') # ha is alias for horizontalalignment 194 | t.set_position((-30,0)) 195 | 196 | ax.set_ylim(0,1) 197 | ax.set_xlim(0, float(opt.threshold)) 198 | # ax.set_xticklabels([0,20,40,60,80,100]) 199 | plt.tight_layout() 200 | try: 201 | os.mkdir(opt.outf) 202 | except: 203 | pass 204 | if opt.outf is None: 205 | plt.savefig(f'{opt.filename}.png') 206 | else: 207 | plt.savefig(f'{opt.outf}/{opt.filename}.png') 208 | if opt.show: 209 | plt.show() 210 | plt.close() 211 | 212 | 213 | -------------------------------------------------------------------------------- /evaluate/overlay.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/evaluate/overlay.png -------------------------------------------------------------------------------- /evaluate/readme.md: -------------------------------------------------------------------------------- 1 | # Deep Object Pose Estimation (DOPE) - Evaluation 2 | 3 | ## IMPORTANT NOTE 4 | These utilities currently require NVISII for visualization. 5 | 6 | 7 | ## Simple Performance Evaluation 8 | This directory contains code to measure the performance of your trained DOPE model. Below is an example of running the basic evaluation script: 9 | 10 | ``` 11 | python evaluate.py --data_prediction ../inference/output --data ../sample_data 12 | ``` 13 | ### Arguments 14 | #### `--data`: 15 | Path to ground-truth data for the predictions that you wish to evaluate. 16 | 17 | #### `--data_prediction`: 18 | Path to predictions that were generated from running inference on your trained model. 
To support the evaluation of multiple sets of weights at once, this path can point to a folder containing the outputs of multiple inference results. 19 | 20 | #### `--models`: 21 | Path to 3D model files. 22 | These models are loaded before running evaluation and are rendered to compute the 3D error between the predicted results and ground truth. 23 | Point this argument at the root of the folder containing all of your different model files. Below is a sample folder structure: 24 | 25 | ``` 26 | /PATH_TO_MODELS_FOLDER 27 | ├── 002_master_chef_can 28 | │ ├── 002_master_chef_can.xml 29 | │ ├── points.xyz 30 | │ ├── textured.mtl 31 | │ ├── textured.obj 32 | │ ├── textured_simple.obj 33 | │ ├── textured_simple.obj.mtl 34 | │ └── texture_map.png 35 | └── 035_power_drill 36 | ├── 035_power_drill.xml 37 | ├── points.xyz 38 | ├── textured.mtl 39 | ├── textured.obj 40 | ├── textured_simple.obj 41 | ├── textured_simple.obj.mtl 42 | └── texture_map.png 43 | ``` 44 | 45 | If you trained DOPE on a new object and want to evaluate its 46 | performance, make sure to include the 3D model files in a folder that 47 | matches `"class_name"` in the ground truth `.json` file. 48 | 49 | Multiple models can be loaded at once as the script will recursively 50 | search for any 3D models in the folder specified in `--models`. 51 | 52 | #### `--adds`: 53 | The average distance computed using the closest point distance between 54 | the predicted pose and the ground truth pose. This takes a while to 55 | compute. If you are only looking for a fast approximation, use 56 | ``--cuboid``. 57 | 58 | #### `--cuboid`: 59 | Computes average distance using the 8 cuboid points of the 3D models. 60 | It is much faster than ``--adds`` but is only an approximation for the 61 | metric. It should be used for testing purposes. 62 | 63 | 64 | 65 | # More Complex ADD Metrics and Figure Generation 66 | 67 | ## Requirements 68 | 69 | Run the download content file: `./download_content.sh`, this downloads a simple scene with annotation rendered by NViSII and with DOPE predictions. 70 | 71 | ## How to run 72 | 73 | If you downloaded the previous content you can execute the following: 74 | 75 | ``` 76 | python add_compute.py 77 | ``` 78 | which should generate the following results: 79 | ``` 80 | mean 0.0208515107260977 std 0.016006083915162977 ratio 17/22 81 | auc at 0.02 : 0.5 82 | auc at 0.04 : 0.6818181818181818 83 | auc at 0.06 : 0.7272727272727273 84 | auc 0.6115249999999999 85 | ``` 86 | This means the area under the curve, *auc* from 0 cm to 10 cm is 0.61. This script also produces graphs such as: 87 | 88 | ![example of graph](results/output.png) 89 | 90 | These are the metrics we reported in the original DOPE paper. I will refer to the paper for explaining the graph. 91 | 92 | ## Assumptions 93 | We make a few assumptions in this script. 94 | 1. We assume the folders structures are the same and there are only scenes in the folder. See `data/` folder example from downloading the content. 95 | 2. We assume the notation folder is in the OpenGL format and that it is using the nvisii outputs from the data generation pipeline. If you use a different file format please update the script or your data. 96 | 3. We assume the inferences are from DOPE inference, _e.g._, the poses are in the OpenGL format. These conventions are easy to change, _e.g._, look for the line `visii_gu.get_transform().rotate_around` in `add_compute.py` to change the pose convention. 
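For intuition, here is a minimal sketch (illustrative names, not the repo's code) of how such an accuracy-vs-threshold curve and its normalized AUC can be computed from a list of ADD errors, mirroring the numbers reported above:

```python
import numpy as np

def add_auc(add_errors_m, n_total, max_threshold=0.10, n_steps=300):
    """Accuracy-vs-threshold curve and its normalized area under the curve.

    add_errors_m: ADD error (in meters) for every detection matched to a
                  ground-truth object.
    n_total:      total number of ground-truth annotations (missed detections
                  therefore count against the accuracy).
    """
    errors = np.asarray(add_errors_m)
    thresholds = np.linspace(0.0, max_threshold, n_steps)
    accuracy = [(errors <= t).sum() / n_total for t in thresholds]
    auc = np.trapz(accuracy, thresholds) / max_threshold  # normalized to [0, 1]
    return thresholds, accuracy, auc

# e.g. 17 matched predictions out of 22 annotations, as in the sample output:
# _, _, auc = add_auc(list_of_add_errors, n_total=22)
```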
97 | 98 | If the script takes too long to run, please run with `--cuboid`: instead of using the 3D model's vertices to compute the metric, it uses the 3D cuboid of the model. 99 | 100 | ## 2D image-based metric 101 | 102 | If you do not have a 3D model of your object and would prefer to measure the quality of your detections with a simple Euclidean distance on the predicted keypoints, you can use `python kpd_compute.py`. It is very similar to `add_compute.py` and behaves very similarly. 103 | The metric used here is the Euclidean (L2) distance between each predicted keypoint and the corresponding ground-truth keypoint. We then propose to use a threshold plot to evaluate the data, similar to the ADD metric. 104 | 105 | # Rendering 3D predictions using NViSII 106 | 107 | ![example of overlay](overlay.png) 108 | 109 | We added a script that renders the 3D model on top of your predictions. It uses a version of NViSII that has not been released yet. Please manually install this [wheel](https://www.dropbox.com/s/m85v7ts981xs090/nvisii-1.2.dev47%2Bgf122b5b.72-cp36-cp36m-manylinux2014_x86_64.whl?dl=0). 110 | ``` 111 | # for scenes with DOPE inference 112 | python render_json.py --path_json data/table_dope_results/scene1/00300.json --scale 0.01 --opencv --contour --gray 113 | # for scenes generated by nvisii 114 | python render_json.py --path_json data/table_ground_truth/scene1/00100.json --scale 0.01 --contour --gray 115 | ``` 116 | 117 | `--gray` renders the 3D model as a gray image and `--contour` adds the 3D model contour in green. 118 | 119 | ## Rendering BOP format on images 120 | 121 | Using the same arguments, you can use this script on BOP annotations with 3D models. The script simply rebuilds the data structure that is needed to load the scene. 122 | 123 | ``` 124 | python render_json.py --path_json /PATH/TO/hope_bop/hope_val/val/000001/scene_gt.json --bop --objs_folder /PATH/TO/hope_bop/hope_models/models/ --gray --contour --bop_scene 0 125 | ``` 126 | 127 | Only `--bop` needs to be passed to load a BOP scene. You can choose which scene to load with `--bop_scene`. The rest of the behavior is the same. This was only tested on the HOPE data. 128 | 129 | ## Assumptions 130 | 131 | We assume that the intrinsics are stored in the camera data. If they are not, the script falls back to a 512 x 512 image with a FOV of 0.78. If the camera data is complete, as with NViSII data, the script uses those camera intrinsics.
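As a rough sketch of that fallback under a pinhole model (assuming the 0.78 value is a horizontal field of view in radians; check `render_json.py` for the exact convention used):

```python
import numpy as np

# Hypothetical reconstruction of the fallback intrinsics described above:
# a 512 x 512 image and a field of view of 0.78 radians.
width = height = 512
fov = 0.78  # radians (assumed horizontal FOV)

fx = fy = (width / 2.0) / np.tan(fov / 2.0)   # focal length in pixels
cx, cy = width / 2.0, height / 2.0            # principal point at the image center

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
print(K)
```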
132 | 133 | 138 | -------------------------------------------------------------------------------- /evaluate/results/output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/evaluate/results/output.png -------------------------------------------------------------------------------- /evaluate/utils_eval.py: -------------------------------------------------------------------------------- 1 | import os 2 | import nvisii as visii 3 | import numpy as np 4 | 5 | 6 | def create_obj( 7 | name="name", 8 | path_obj="", 9 | path_tex=None, 10 | scale=1, 11 | rot_base=None, # visii quat 12 | pos_base=None, # visii vec3 13 | ): 14 | 15 | # This is for YCB like dataset 16 | if path_obj in create_obj.meshes: 17 | obj_mesh = create_obj.meshes[path_obj] 18 | else: 19 | obj_mesh = visii.mesh.create_from_file(name, path_obj) 20 | create_obj.meshes[path_obj] = obj_mesh 21 | 22 | obj_entity = visii.entity.create( 23 | name=name, 24 | # mesh = visii.mesh.create_sphere("mesh1", 1, 128, 128), 25 | mesh=obj_mesh, 26 | transform=visii.transform.create(name), 27 | material=visii.material.create(name), 28 | ) 29 | 30 | # should randomize 31 | obj_entity.get_material().set_metallic(0) # should 0 or 1 32 | obj_entity.get_material().set_transmission(0) # should 0 or 1 33 | obj_entity.get_material().set_roughness(1) # default is 1 34 | 35 | if not path_tex is None: 36 | 37 | if path_tex in create_obj.textures: 38 | obj_texture = create_obj.textures[path_tex] 39 | else: 40 | obj_texture = visii.texture.create_from_file(name, path_tex) 41 | create_obj.textures[path_tex] = obj_texture 42 | 43 | obj_entity.get_material().set_base_color_texture(obj_texture) 44 | 45 | obj_entity.get_transform().set_scale(visii.vec3(scale)) 46 | 47 | if not rot_base is None: 48 | obj_entity.get_transform().set_rotation(rot_base) 49 | if not pos_base is None: 50 | obj_entity.get_transform().set_position(pos_base) 51 | 52 | return obj_entity 53 | 54 | 55 | create_obj.meshes = {} 56 | create_obj.textures = {} 57 | 58 | 59 | def add_cuboid(name, debug=False): 60 | obj = visii.entity.get(name) 61 | 62 | min_obj = obj.get_mesh().get_min_aabb_corner() 63 | max_obj = obj.get_mesh().get_max_aabb_corner() 64 | centroid_obj = obj.get_mesh().get_aabb_center() 65 | 66 | cuboid = [ 67 | visii.vec3(max_obj[0], max_obj[1], max_obj[2]), 68 | visii.vec3(min_obj[0], max_obj[1], max_obj[2]), 69 | visii.vec3(max_obj[0], min_obj[1], max_obj[2]), 70 | visii.vec3(max_obj[0], max_obj[1], min_obj[2]), 71 | visii.vec3(min_obj[0], min_obj[1], max_obj[2]), 72 | visii.vec3(max_obj[0], min_obj[1], min_obj[2]), 73 | visii.vec3(min_obj[0], max_obj[1], min_obj[2]), 74 | visii.vec3(min_obj[0], min_obj[1], min_obj[2]), 75 | visii.vec3(centroid_obj[0], centroid_obj[1], centroid_obj[2]), 76 | ] 77 | 78 | # change the ids to be like ndds / DOPE 79 | cuboid = [ 80 | cuboid[2], 81 | cuboid[0], 82 | cuboid[3], 83 | cuboid[5], 84 | cuboid[4], 85 | cuboid[1], 86 | cuboid[6], 87 | cuboid[7], 88 | cuboid[-1], 89 | ] 90 | 91 | cuboid.append(visii.vec3(centroid_obj[0], centroid_obj[1], centroid_obj[2])) 92 | 93 | for i_p, p in enumerate(cuboid): 94 | child_transform = visii.transform.create(f"{name}_cuboid_{i_p}") 95 | child_transform.set_position(p) 96 | child_transform.set_scale(visii.vec3(0.3)) 97 | child_transform.set_parent(obj.get_transform()) 98 | if debug: 99 | visii.entity.create( 100 | name=f"{name}_cuboid_{i_p}", 101 | 
mesh=visii.mesh.create_sphere(f"{name}_cuboid_{i_p}"), 102 | transform=child_transform, 103 | material=visii.material.create(f"{name}_cuboid_{i_p}"), 104 | ) 105 | 106 | for i_v, v in enumerate(cuboid): 107 | cuboid[i_v] = [v[0], v[1], v[2]] 108 | 109 | return cuboid 110 | 111 | 112 | def loadmodels(root, cuboid, suffix=""): 113 | models = {} 114 | 115 | def explore(path): 116 | if not os.path.isdir(path): 117 | return 118 | folders = [ 119 | os.path.join(path, o) 120 | for o in os.listdir(path) 121 | if os.path.isdir(os.path.join(path, o)) 122 | ] 123 | 124 | if len(folders) > 0: 125 | for path_entry in folders: 126 | explore(path_entry) 127 | else: 128 | print("Looking at:", path) 129 | path_obj = os.path.join(path, "textured_simple.obj") 130 | path_tex = os.path.join(path, "texture_map.png") 131 | 132 | if os.path.exists(path_obj) and os.path.exists(path_tex): 133 | path = path.rstrip("/") 134 | model_name = path.split("/")[-1] 135 | 136 | print(f"Loading Model: {model_name}") 137 | 138 | models[model_name] = create_obj( 139 | name=model_name + suffix, 140 | path_obj=path_obj, 141 | path_tex=path_tex, 142 | scale=0.01, 143 | ) 144 | 145 | if cuboid: 146 | add_cuboid(model_name + suffix) 147 | 148 | if "gu" in suffix: 149 | models[model_name].get_material().set_metallic(1) 150 | models[model_name].get_material().set_roughness(0.05) 151 | 152 | explore(root) 153 | 154 | return models 155 | 156 | 157 | def load_groundtruth(root): 158 | gts = [] 159 | 160 | def explore(path): 161 | 162 | if not os.path.isdir(path): 163 | return 164 | folders = [ 165 | os.path.join(path, o) 166 | for o in os.listdir(path) 167 | if os.path.isdir(os.path.join(path, o)) 168 | ] 169 | 170 | for path_entry in folders: 171 | explore(path_entry) 172 | 173 | gts.extend( 174 | [ 175 | os.path.join(path, gt).replace(root, "").lstrip("/") 176 | for gt in os.listdir(path) 177 | if gt.endswith(".json") and not "settings" in gt 178 | ] 179 | ) 180 | 181 | explore(root) 182 | 183 | return gts 184 | 185 | 186 | def load_prediction(root, groundtruths): 187 | """ 188 | Supports multiple prediction folders for one set of testing data. 189 | Each prediction folder must contain the same folder structure as the testing data directory. 
190 | """ 191 | subdirs = os.listdir(root) 192 | subdirs.append("") 193 | 194 | prediction_folders = [] 195 | 196 | for dir in subdirs: 197 | valid_folder = True 198 | for gt in groundtruths: 199 | file_path = os.path.join(os.path.abspath(root), dir, gt) 200 | 201 | if not os.path.exists(file_path): 202 | valid_folder = False 203 | break 204 | 205 | if valid_folder: 206 | prediction_folders.append(dir) 207 | 208 | return prediction_folders 209 | 210 | 211 | def calculate_auc(thresholds, add_list, total_objects): 212 | res = [] 213 | for thresh in thresholds: 214 | under_thresh = len(np.where(add_list <= thresh)[0]) / total_objects 215 | 216 | res.append(under_thresh) 217 | 218 | return res 219 | 220 | 221 | def calculate_auc_total( 222 | add_list, total_objects, delta_threshold=0.00001, max_threshold=0.1 223 | ): 224 | add_threshold_values = np.arange(0.0, max_threshold, delta_threshold) 225 | 226 | counts = [] 227 | for value in add_threshold_values: 228 | under_threshold = len(np.where(add_list <= value)[0]) / total_objects 229 | counts.append(under_threshold) 230 | 231 | auc = np.trapz(counts, dx=delta_threshold) / max_threshold 232 | 233 | return auc 234 | -------------------------------------------------------------------------------- /inference/README.md: -------------------------------------------------------------------------------- 1 | # Deep Object Pose Estimation (DOPE) - Inference 2 | 3 | This directory contains a simple example of inference for DOPE. 4 | 5 | 6 | ## Setup 7 | 8 | If you haven't already, install the dependencies listed in `requirements.txt` 9 | in the root of the repo: 10 | 11 | ``` 12 | pip install -r ../requirements.txt 13 | ``` 14 | 15 | ## Running Inference 16 | 17 | The `inference.py` script will take a trained model to run inference. In order to run, the following 3 arguments are needed: 18 | 1. `--weights`: path to the trained model weights. Can either point to a single `.pth` file or a folder containing multiple `.pth` files. If this path points to a folder with multiple `.pth` files, the script will individually load and run inference for all of the weights. 19 | 2. ``--data`: path to the data that will be used as input to run inference on. The script **recursively** loads all data that end with extensions specified in the `--exts` flag. 20 | 3. `--object`: name of the class to run detections on. This name must be defined under `dimensions` in the config file passed to `--config`. 21 | 22 | Below is an example of running inference: 23 | 24 | ``` 25 | python inference.py --weights ../weights --data ../sample_data --object cracker 26 | ``` 27 | 28 | ### Configuration Files 29 | Depending on the images you want to run inference on, you may need to redefine the configuration values in `camera_info.yaml` and `config_pose.yaml`. 30 | You can either define a new configuration file and specify it with `--config` and `--camera` or update `camera_info.yaml` and `config_pose.yaml`. 31 | 32 | Before running inference, it is important to make sure that: 33 | 1. The `projection_matrix` field is set properly in `camera_info.yaml` (or the file you specified for `--camera`). 34 | The `projection_matrix` field should be a `3x4` matrix of the form: 35 | ``` 36 | [fx, 0, cx, 0, 37 | 0, fy, cy, 0, 38 | 0, 0, 1, 0] 39 | ``` 40 | 41 | 2. The `dimensions` and `class_ids` fields have been specified for the object you wish to detect in `config_pose.yaml` (or the file you specified for `--config`). 
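As a quick sanity check of the `projection_matrix` layout described above, the sketch below reshapes the flattened `data` list from `camera_info.yaml` into a 3x4 matrix and projects a point. The matrix values are the default `dope_webcam_0` calibration shipped with this repository; the 3D point itself is made up for illustration.

```
import numpy as np

# Flattened `data` entry of `projection_matrix` in camera_info.yaml (row-major 3x4).
data = [641.5, 0, 320.0, 0,
        0, 641.5, 240.0, 0,
        0, 0, 1, 0]
P = np.array(data, dtype=float).reshape(3, 4)
fx, fy, cx, cy = P[0, 0], P[1, 1], P[0, 2], P[1, 2]
print(fx, fy, cx, cy)  # 641.5 641.5 320.0 240.0

# Project a hypothetical 3D point in the camera frame (meters) to pixel coordinates.
X = np.array([0.05, -0.02, 0.60, 1.0])  # homogeneous [x, y, z, 1]
u, v, w = P @ X
print(u / w, v / w)  # pixel coordinates; should land inside the 640x480 image
```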
42 | 43 | ### Running Inference with Multiple Weights at Once 44 | The inference script can run inference on multiple weights if the path specified in ``--weights`` points to a folder containing multiple `.pth` files. 45 | This feature is useful for fast evaluation of multiple weights to find the epoch that performs the best. 46 | While, generally, later epochs tend to perform better than earlier ones, this is not always the case. 47 | For more information on how to quantitatively evaluate the performance of a trained model, refer to the `/evaluate` subdirectory. 48 | -------------------------------------------------------------------------------- /license.md: -------------------------------------------------------------------------------- 1 | NVIDIA Source Code License 2 | 3 | 4 | Copyright (c) 2024, NVIDIA Corporation & affiliates. All rights reserved. 5 | 6 | 7 | ======================================================================= 8 | 9 | 1. Definitions 10 | 11 | "Licensor" means any person or entity that distributes its Work. 12 | 13 | "Software" means the original work of authorship made available under 14 | this License. 15 | 16 | "Work" means the Software and any additions to or derivative works of 17 | the Software that are made available under this License. 18 | 19 | The terms "reproduce," "reproduction," "derivative works," and 20 | "distribution" have the meaning as provided under U.S. copyright law; 21 | provided, however, that for the purposes of this License, derivative 22 | works shall not include works that remain separable from, or merely 23 | link (or bind by name) to the interfaces of, the Work. 24 | 25 | Works, including the Software, are "made available" under this License 26 | by including in or with the Work either (a) a copyright notice 27 | referencing the applicability of this License to the Work, or (b) a 28 | copy of this License. 29 | 30 | 2. License Grants 31 | 32 | 2.1 Copyright Grant. Subject to the terms and conditions of this 33 | License, each Licensor grants to you a perpetual, worldwide, 34 | non-exclusive, royalty-free, copyright license to reproduce, 35 | prepare derivative works of, publicly display, publicly perform, 36 | sublicense and distribute its Work and any resulting derivative 37 | works in any form. 38 | 39 | 3. Limitations 40 | 41 | 3.1 Redistribution. You may reproduce or distribute the Work only 42 | if (a) you do so under this License, (b) you include a complete 43 | copy of this License with your distribution, and (c) you retain 44 | without modification any copyright, patent, trademark, or 45 | attribution notices that are present in the Work. 46 | 47 | 3.2 Derivative Works. You may specify that additional or different 48 | terms apply to the use, reproduction, and distribution of your 49 | derivative works of the Work ("Your Terms") only if (a) Your Terms 50 | provide that the use limitation in Section 3.3 applies to your 51 | derivative works, and (b) you identify the specific derivative 52 | works that are subject to Your Terms. Notwithstanding Your Terms, 53 | this License (including the redistribution requirements in Section 54 | 3.1) will continue to apply to the Work itself. 55 | 56 | 3.3 Use Limitation. The Work and any derivative works thereof only 57 | may be used or intended for use non-commercially. Notwithstanding 58 | the foregoing, NVIDIA and its affiliates may use the Work and any 59 | derivative works commercially. As used herein, "non-commercially" 60 | means for research or evaluation purposes only. 
61 | 62 | 3.4 Patent Claims. If you bring or threaten to bring a patent claim 63 | against any Licensor (including any claim, cross-claim or 64 | counterclaim in a lawsuit) to enforce any patents that you allege 65 | are infringed by any Work, then your rights under this License from 66 | such Licensor (including the grant in Section 2.1) will terminate 67 | immediately. 68 | 69 | 3.5 Trademarks. This License does not grant any rights to use any 70 | Licensor�s or its affiliates� names, logos, or trademarks, except 71 | as necessary to reproduce the notices described in this License. 72 | 73 | 3.6 Termination. If you violate any term of this License, then your 74 | rights under this License (including the grant in Section 2.1) will 75 | terminate immediately. 76 | 77 | 4. Disclaimer of Warranty. 78 | 79 | THE WORK IS PROVIDED "AS IS" WITHOUT WARRANTIES OR CONDITIONS OF ANY 80 | KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF 81 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR 82 | NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER 83 | THIS LICENSE. 84 | 85 | 5. Limitation of Liability. 86 | 87 | EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL 88 | THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE 89 | SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, 90 | INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF 91 | OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK 92 | (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, 93 | LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER 94 | COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF 95 | THE POSSIBILITY OF SUCH DAMAGES. 96 | 97 | ======================================================================= -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | [![License CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-blue.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode) 2 | ![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg) 3 | # Deep Object Pose Estimation 4 | 5 | This is the official repository for NVIDIA's Deep Object Pose Estimation, which performs detection and 6-DoF pose estimation of **known objects** from an RGB camera. For full details, see our [CoRL 2018 paper](https://arxiv.org/abs/1809.10790) and [video](https://youtu.be/yVGViBqWtBI). 6 | 7 | 8 | ![DOPE Objects](dope_objects.png) 9 | 10 | 11 | ## Contents 12 | 13 | This repository contains complete code for [training](train), [inference](inference), numerical [evaluation](evaluate) of results, and synthetic [data generation](data_generation). We also provide a [ROS1 Noetic package](ros1) that performs inference on images from a USB camera. 14 | 15 | Hardware-accelerated ROS2 inference can be done with the external 16 | [NVIDIA Isaac ROS DOPE](https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_pose_estimation/tree/main/isaac_ros_dope) project. 17 | 18 | 19 | The [original version](CoRL) of the code used for the CoRL paper is also included 20 | for reference, but is no longer being maintained. 21 | 22 | ## Walkthrough 23 | We provide a [walkthrough](walkthrough.md) of the entire pipeline: generating data, training a model, and inference. 
24 | 
25 | ## Datasets
26 | 
27 | We have trained and tested DOPE with two publicly available datasets: YCB and HOPE. The trained weights can be [downloaded from Google Drive](https://drive.google.com/drive/folders/1DfoA3m_Bm0fW8tOWXGVxi4ETlLEAgmcg).
28 | 
29 | 
30 | 
31 | ### YCB 3D Models
32 | YCB models can be downloaded from the [YCB website](http://www.ycbbenchmarks.com/), or by using [NVDU](https://github.com/NVIDIA/Dataset_Utilities) (see the `nvdu_ycb` command).
33 | 
34 | 
35 | ### HOPE 3D Models
36 | The [HOPE dataset](https://github.com/swtyree/hope-dataset/) is a collection of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The 3D models [can be downloaded here](https://drive.google.com/drive/folders/1jiJS9KgcYAkfb8KJPp5MRlB0P11BStft).
37 | The folders are organized in the style of the YCB 3D models.
38 | 
39 | The physical objects can be purchased online (details and links to Amazon can be found in the [HOPE repository README](https://github.com/swtyree/hope-dataset/)).
40 | 
41 | 
42 | 
43 | ## Tested Configurations
44 | 
45 | We have tested our standalone training, inference and evaluation scripts on Ubuntu 20.04 and 22.04 with Python 3.8+, using an NVIDIA Titan X, 2080Ti, and Titan RTX.
46 | 
47 | The ROS1 node has been tested with ROS Noetic using Python 3.10. The Isaac ROS2 DOPE node has been tested with ROS2 Foxy on Jetson AGX Xavier with JetPack 4.6; and on x86/Ubuntu 20.04 with an NVIDIA Titan X, 2080Ti, and Titan RTX.
48 | 
49 | 
50 |
51 | 52 | ## How to cite DOPE 53 | 54 | If you use this tool in a research project, please cite as follows: 55 | ``` 56 | @inproceedings{tremblay2018corl:dope, 57 | author = {Jonathan Tremblay and Thang To and Balakumar Sundaralingam and Yu Xiang and Dieter Fox and Stan Birchfield}, 58 | title = {Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects}, 59 | booktitle = {Conference on Robot Learning (CoRL)}, 60 | url = "https://arxiv.org/abs/1809.10790", 61 | year = 2018 62 | } 63 | ``` 64 | 65 | ## License 66 | 67 | Copyright (C) 2018-2025 NVIDIA Corporation. All rights reserved. This code is licensed under the [NVIDIA Source Code License](license.md). 68 | 69 | 70 | ## Acknowledgment 71 | 72 | Thanks to Jeff Smith (jeffreys@nvidia.com) for help maintaining the repo and software. Thanks also to [Martin Günther](https://github.com/mintar) for his code contributions and fixes. 73 | 74 | 75 | ## Contact 76 | 77 | Jonathan Tremblay (jtremblay@nvidia.com), Stan Birchfield (sbirchfield@nvidia.com) 78 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch>=2.1.0 2 | torchvision 3 | tensorboardX 4 | boto3 5 | albumentations 6 | botocore 7 | certifi 8 | charset-normalizer 9 | configparser==5.0.0 10 | idna 11 | imageio 12 | jmespath 13 | joblib 14 | networkx 15 | numpy 16 | opencv-python-headless==4.6.0.66 17 | packaging 18 | Pillow 19 | protobuf==3.20.1 20 | pyparsing 21 | pyrr==0.10.3 22 | python-dateutil 23 | PyWavelets 24 | PyYAML 25 | qudida 26 | requests 27 | seaborn 28 | s3transfer 29 | scikit-image 30 | scikit-learn 31 | scipy 32 | simplejson 33 | six 34 | threadpoolctl 35 | tifffile 36 | typing_extensions 37 | urllib3 38 | 39 | 40 | ## If running into dependency issues, install the version-specific requirements below in a virtual environment 41 | # albumentations==1.2.1 42 | # boto3==1.24.58 43 | # botocore==1.27.58 44 | # certifi==2022.6.15 45 | # charset-normalizer==2.1.1 46 | # idna==3.3 47 | # imageio==2.21.1 48 | # jmespath==1.0.1 49 | # joblib==1.1.0 50 | # networkx==2.8.6 51 | # numpy==1.23.2 52 | # opencv-python-headless==4.6.0.66 53 | # packaging==21.3 54 | # Pillow==9.2.0 55 | # protobuf==3.20.1 56 | # pyparsing==3.0.9 57 | # pyrr==0.10.3 58 | # python-dateutil==2.8.2 59 | # PyWavelets==1.3.0 60 | # PyYAML==6.0 61 | # qudida==0.0.4 62 | # requests==2.28.1 63 | # s3transfer==0.6.0 64 | # scikit-image==0.19.3 65 | # scikit-learn==1.1.2 66 | # scipy==1.9.0 67 | # simplejson==3.17.6 68 | # six==1.16.0 69 | # tensorboardX==2.5.1 70 | # threadpoolctl==3.1.0 71 | # tifffile==2022.8.12 72 | # torch==1.12.1 73 | # torchvision==0.13.1 74 | # typing_extensions==4.3.0 75 | # urllib3==1.26.12 76 | -------------------------------------------------------------------------------- /ros1/README.md: -------------------------------------------------------------------------------- 1 | # Running DOPE with ROS 2 | 3 | This directory and its subdirectories contains code for running DOPE with ROS Noetic. 4 | The following steps assume you have installed ROS already. Alternatively, you can use the provided [Docker image](docker/readme.md) and skip to Step #7. 5 | 6 | 1. **Install ROS** 7 | 8 | Follow these [instructions](http://wiki.ros.org/noetic/Installation/Ubuntu). 9 | You can select any of the default configurations in step 1.4; even the 10 | ROS-Base (Bare Bones) package (`ros-noetic-ros-base`) is enough. 11 | 12 | 2. 
**Create a catkin workspace** (if you do not already have one). To create a catkin workspace, follow these [instructions](http://wiki.ros.org/catkin/Tutorials/create_a_workspace):
13 | ```
14 | $ mkdir -p ~/catkin_ws/src # Replace `catkin_ws` with the name of your workspace
15 | $ cd ~/catkin_ws/
16 | $ catkin_make
17 | ```
18 | 
19 | 3. **Download the DOPE code**
20 | ```
21 | $ cd ~/src
22 | $ git clone https://github.com/NVlabs/Deep_Object_Pose.git
23 | $ ln -s ~/src/Deep_Object_Pose/ros1/dope ~/catkin_ws/src/dope
24 | ```
25 | 
26 | 4. **Install Python dependencies**
27 | ```
28 | $ cd ~/catkin_ws/src/dope
29 | $ python3 -m pip install -r ~/src/Deep_Object_Pose/requirements.txt
30 | ```
31 | 
32 | 5. **Install ROS dependencies**
33 | ```
34 | $ cd ~/catkin_ws
35 | $ rosdep install --from-paths src -i --rosdistro noetic
36 | $ sudo apt-get install ros-noetic-rosbash ros-noetic-ros-comm
37 | ```
38 | 
39 | 6. **Build**
40 | ```
41 | $ cd ~/catkin_ws
42 | $ catkin_make
43 | ```
44 | 
45 | 7. **Download [the weights](https://drive.google.com/open?id=1DfoA3m_Bm0fW8tOWXGVxi4ETlLEAgmcg)** and save them to the `weights` folder, *i.e.*, `~/catkin_ws/src/dope/weights/`.
46 | 
47 | 
48 | ## Running
49 | 
50 | 1. **Start ROS master**
51 | ```
52 | $ cd ~/catkin_ws
53 | $ source devel/setup.bash
54 | $ roscore
55 | ```
56 | 
57 | 2. **Start camera node** (or start your own camera node)
58 | ```
59 | $ roslaunch dope camera.launch # Publishes RGB images to `/dope/webcam_rgb_raw`
60 | ```
61 | 
62 | The camera must publish a correct `camera_info` topic to enable DOPE to compute the correct poses. Basically all ROS drivers have a `camera_info_url` parameter where you can set the calibration info (but most ROS drivers include a reasonable default).
63 | 
64 | For details on calibration and rectification of your camera, see the [camera tutorial](doc/camera_tutorial.md).
65 | 
66 | 3. **Edit config info** (if desired) in `~/catkin_ws/src/dope/config/config_pose.yaml`
67 | * `topic_camera`: RGB topic to listen to
68 | * `topic_camera_info`: camera info topic to listen to
69 | * `topic_publishing`: topic namespace for publishing
70 | * `input_is_rectified`: Whether the input images are rectified. It is strongly suggested to use a rectified input topic.
71 | * `downscale_height`: If the input image is larger than this, scale it down to this pixel height. Very large input images eat up all the GPU memory and slow down inference. Also, DOPE works best when the object size (in pixels) matches what appeared in the training data (which is downscaled to 400 px). For these reasons, downscaling large input images to something reasonable (e.g., 400-500 px) reduces memory consumption and improves inference speed *and* recognition results.
72 | * `weights`: dictionary of object names and there weights path name, **comment out any line to disable detection/estimation of that object** 73 | * `dimensions`: dictionary of dimensions for the objects (key values must match the `weights` names) 74 | * `class_ids`: dictionary of class ids to be used in the messages published on the `/dope/detected_objects` topic (key values must match the `weights` names) 75 | * `draw_colors`: dictionary of object colors (key values must match the `weights` names) 76 | * `model_transforms`: dictionary of transforms that are applied to the pose before publishing (key values must match the `weights` names) 77 | * `meshes`: dictionary of mesh filenames for visualization (key values must match the `weights` names) 78 | * `mesh_scales`: dictionary of scaling factors for the visualization meshes (key values must match the `weights` names) 79 | * `overlay_belief_images`: whether to overlay the input image on the belief images published on /dope/belief_[obj_name] 80 | * `thresh_angle`: undocumented 81 | * `thresh_map`: undocumented 82 | * `sigma`: undocumented 83 | * `thresh_points`: Thresholding the confidence for object detection; increase this value if you see too many false positives, reduce it if objects are not detected. 84 | 85 | 4. **Start DOPE node** 86 | ``` 87 | $ roslaunch dope dope.launch [config:=/path/to/my_config.yaml] # Config file is optional; default is `config_pose.yaml` 88 | ``` 89 | 90 | 91 | 92 | 93 | ## Debugging 94 | 95 | * The following ROS topics are published (assuming `topic_publishing == 'dope'`): 96 | ``` 97 | /dope/belief_[obj_name] # belief maps of object 98 | /dope/dimension_[obj_name] # dimensions of object 99 | /dope/pose_[obj_name] # timestamped pose of object 100 | /dope/rgb_points # RGB images with detected cuboids overlaid 101 | /dope/detected_objects # vision_msgs/Detection3DArray of all detected objects 102 | /dope/markers # RViz visualization markers for all objects 103 | ``` 104 | *Note:* `[obj_name]` is in {cracker, gelatin, meat, mustard, soup, sugar} 105 | 106 | * To debug in RViz, run `rviz`, then add one or more of the following displays: 107 | * `Add > Image` to view the raw RGB image or the image with cuboids overlaid 108 | * `Add > Pose` to view the object coordinate frame in 3D. 109 | * `Add > MarkerArray` to view the cuboids, meshes etc. in 3D. 110 | * `Add > Camera` to view the RGB Image with the poses and markers from above. 111 | 112 | If you do not have a coordinate frame set up, you can run this static transformation: `rosrun tf2_ros static_transform_publisher 0 0 0 0.7071 0 0 -0.7071 world `, where `` is the `frame_id` of your input camera messages. Make sure that in RViz's `Global Options`, the `Fixed Frame` is set to `world`. Alternatively, you can skip the `static_transform_publisher` step and directly set the `Fixed Frame` to your ``. 113 | 114 | * If `rosrun` does not find the package (`[rospack] Error: package 'dope' not found`), be sure that you called `source devel/setup.bash` as mentioned above. To find the package, run `rospack find dope`. 115 | 116 | -------------------------------------------------------------------------------- /ros1/docker/Dockerfile.noetic: -------------------------------------------------------------------------------- 1 | FROM ros:noetic-robot 2 | 3 | # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. 4 | # Full license terms provided in LICENSE.md file. 5 | # To build: 6 | # docker build -t nvidia-dope:noetic-v1 -f Dockerfile.noetic .. 
7 | 8 | ENV HOME /root 9 | ENV DEBIAN_FRONTEND=noninteractive 10 | 11 | # Install system and development components 12 | RUN apt update && apt -y --no-install-recommends install \ 13 | software-properties-common \ 14 | build-essential \ 15 | cmake \ 16 | git \ 17 | python3-pip \ 18 | libxext6 \ 19 | libx11-6 \ 20 | libglvnd0 \ 21 | libgl1 \ 22 | libglx0 \ 23 | libegl1 \ 24 | freeglut3-dev \ 25 | && apt -y autoremove \ 26 | && apt clean 27 | 28 | # Install required ROS components 29 | RUN apt update && apt -y --no-install-recommends install \ 30 | ros-noetic-cv-bridge \ 31 | ros-noetic-geometry-msgs \ 32 | ros-noetic-message-filters \ 33 | ros-noetic-resource-retriever \ 34 | ros-noetic-rospy \ 35 | ros-noetic-sensor-msgs \ 36 | ros-noetic-std-msgs \ 37 | ros-noetic-tf \ 38 | ros-noetic-vision-msgs \ 39 | ros-noetic-visualization-msgs \ 40 | ros-noetic-rviz \ 41 | && apt -y autoremove \ 42 | && apt clean 43 | 44 | # pip install required Python packages 45 | COPY requirements.txt ${HOME} 46 | RUN python3 -m pip install --no-cache-dir -r ${HOME}/requirements.txt 47 | 48 | # Setup catkin workspace 49 | ENV CATKIN_WS ${HOME}/catkin_ws 50 | COPY dope ${CATKIN_WS}/src/dope 51 | COPY docker/init_workspace.sh ${HOME} 52 | RUN ${HOME}/init_workspace.sh 53 | RUN echo "source ${CATKIN_WS}/devel/setup.bash" >> ${HOME}/.bashrc 54 | 55 | ENV DISPLAY :0 56 | ENV NVIDIA_VISIBLE_DEVICES all 57 | ENV NVIDIA_DRIVER_CAPABILITIES graphics,utility,compute 58 | ENV TERM=xterm 59 | # Some QT-Apps don't show controls without this 60 | ENV QT_X11_NO_MITSHM 1 -------------------------------------------------------------------------------- /ros1/docker/init_workspace.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | 4 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. 5 | # Full license terms provided in LICENSE.md file. 6 | 7 | # Stop in case of any error. 8 | set -e 9 | 10 | source /opt/ros/noetic/setup.bash 11 | 12 | # Create catkin workspace. 13 | mkdir -p ${CATKIN_WS}/src 14 | cd ${CATKIN_WS}/src 15 | catkin_init_workspace 16 | # Clone ROS libraires that must be built from source 17 | git clone https://github.com/ros-perception/camera_info_manager_py.git 18 | cd .. 19 | catkin_make 20 | -------------------------------------------------------------------------------- /ros1/docker/readme.md: -------------------------------------------------------------------------------- 1 | ## DOPE in a Docker Container 2 | 3 | Running ROS inside of [Docker](https://www.docker.com/) is an excellent way to 4 | experiment with DOPE, as it allows the user to completely isolate all software and configuration 5 | changes from the host system. This document describes how to create and run a 6 | Docker image that contains a complete ROS environment that supports DOPE, 7 | including all required components, such as ROS Noetic, rviz, CUDA with cuDNN, 8 | and other packages. 9 | 10 | The current configuration assumes all components are installed on an x86 host 11 | platform running Ubuntu 18.04 or later. Further, use of the DOPE Docker container requires an NVIDIA GPU to be present, and the use of Docker version 19.03.0 or later. 12 | 13 | 14 | ### Steps 15 | 16 | 1. **Download the DOPE code** 17 | ``` 18 | $ git clone https://github.com/NVlabs/Deep_Object_Pose.git dope 19 | ``` 20 | 21 | 2. **Build the docker image** 22 | ``` 23 | $ cd dope/docker 24 | $ docker build -t nvidia-dope:noetic-v1 -f Dockerfile.noetic .. 
25 | ``` 26 | This will take several minutes and requires an internet connection. 27 | 28 | 3. **Plug in your camera** 29 | Docker will not recognize a USB device that is plugged in after the container is started. 30 | 31 | 4. **Run the container** 32 | ``` 33 | $ ./run_dope_docker.sh [name] [host dir] [container dir] 34 | ``` 35 | Parameters: 36 | - `name` is an optional field that specifies the name of this image. By default, it is `nvidia-dope-v2`. By using different names, you can create multiple containers from the same image. 37 | - `host dir` and `container dir` are a pair of optional fields that allow you to specify a mapping between a directory on your host machine and a location inside the container. This is useful for sharing code and data between the two systems. By default, it maps the directory containing dope to `/root/catkin_ws/src/dope` in the container. 38 | 39 | Only the first invocation of this script with a given name will create a container. Subsequent executions will attach to the running container allowing you -- in effect -- to have multiple terminal sessions into a single container. 40 | 41 | 5. **Build DOPE** 42 | Return to step 7 of the [installation instructions](../readme.md) (downloading the weights). 43 | 44 | *Note:* Since the Docker container binds directly to the host's network, it will see `roscore` even if running outside the docker container. 45 | -------------------------------------------------------------------------------- /ros1/docker/run_dope_docker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. 4 | # Full license terms provided in LICENSE.md file. 5 | 6 | CONTAINER_NAME=$1 7 | if [[ -z "${CONTAINER_NAME}" ]]; then 8 | CONTAINER_NAME=nvidia-dope-v2 9 | fi 10 | 11 | # This specifies a mapping between a host directory and a directory in the 12 | # docker container. This mapping should be changed if you wish to have access to 13 | # a different directory 14 | HOST_DIR=$2 15 | if [[ -z "${HOST_DIR}" ]]; then 16 | HOST_DIR=`realpath ${PWD}/..` 17 | fi 18 | 19 | CONTAINER_DIR=$3 20 | if [[ -z "${CONTAINER_DIR}" ]]; then 21 | CONTAINER_DIR=/root/catkin_ws/src/dope 22 | fi 23 | 24 | echo "Container name : ${CONTAINER_NAME}" 25 | echo "Host directory : ${HOST_DIR}" 26 | echo "Container directory: ${CONTAINER_DIR}" 27 | DOPE_ID=`docker ps -aqf "name=^/${CONTAINER_NAME}$"` 28 | if [ -z "${DOPE_ID}" ]; then 29 | echo "Creating new DOPE docker container." 30 | xhost +local:root 31 | docker run --gpus all -it --privileged --network=host -v ${HOST_DIR}:${CONTAINER_DIR}:rw -v /tmp/.X11-unix:/tmp/.X11-unix:rw --env="DISPLAY" --name=${CONTAINER_NAME} nvidia-dope:noetic-v1 bash 32 | else 33 | echo "Found DOPE docker container: ${DOPE_ID}." 34 | # Check if the container is already running and start if necessary. 35 | if [ -z `docker ps -qf "name=^/${CONTAINER_NAME}$"` ]; then 36 | xhost +local:${DOPE_ID} 37 | echo "Starting and attaching to ${CONTAINER_NAME} container..." 38 | docker start ${DOPE_ID} 39 | docker attach ${DOPE_ID} 40 | else 41 | echo "Found running ${CONTAINER_NAME} container, attaching bash..." 
42 | docker exec -it ${DOPE_ID} bash 43 | fi 44 | fi 45 | -------------------------------------------------------------------------------- /ros1/dope/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.5.2) 2 | project(dope) 3 | 4 | ## Compile as C++11, supported in ROS Kinetic and newer 5 | # add_compile_options(-std=c++11) 6 | 7 | ## Find catkin macros and libraries 8 | ## if COMPONENTS list like find_package(catkin REQUIRED COMPONENTS xyz) 9 | ## is used, also find other catkin packages 10 | find_package(catkin REQUIRED COMPONENTS 11 | cv_bridge 12 | geometry_msgs 13 | message_filters 14 | resource_retriever 15 | rospy 16 | sensor_msgs 17 | std_msgs 18 | tf 19 | vision_msgs 20 | visualization_msgs 21 | ) 22 | 23 | ## System dependencies are found with CMake's conventions 24 | # find_package(Boost REQUIRED COMPONENTS system) 25 | 26 | 27 | ## Uncomment this if the package has a setup.py. This macro ensures 28 | ## modules and global scripts declared therein get installed 29 | ## See http://ros.org/doc/api/catkin/html/user_guide/setup_dot_py.html 30 | catkin_python_setup() 31 | 32 | ################################################ 33 | ## Declare ROS messages, services and actions ## 34 | ################################################ 35 | 36 | ## To declare and build messages, services or actions from within this 37 | ## package, follow these steps: 38 | ## * Let MSG_DEP_SET be the set of packages whose message types you use in 39 | ## your messages/services/actions (e.g. std_msgs, actionlib_msgs, ...). 40 | ## * In the file package.xml: 41 | ## * add a build_depend tag for "message_generation" 42 | ## * add a build_depend and a exec_depend tag for each package in MSG_DEP_SET 43 | ## * If MSG_DEP_SET isn't empty the following dependency has been pulled in 44 | ## but can be declared for certainty nonetheless: 45 | ## * add a exec_depend tag for "message_runtime" 46 | ## * In this file (CMakeLists.txt): 47 | ## * add "message_generation" and every package in MSG_DEP_SET to 48 | ## find_package(catkin REQUIRED COMPONENTS ...) 49 | ## * add "message_runtime" and every package in MSG_DEP_SET to 50 | ## catkin_package(CATKIN_DEPENDS ...) 51 | ## * uncomment the add_*_files sections below as needed 52 | ## and list every .msg/.srv/.action file to be processed 53 | ## * uncomment the generate_messages entry below 54 | ## * add every package in MSG_DEP_SET to generate_messages(DEPENDENCIES ...) 
55 | 56 | ## Generate messages in the 'msg' folder 57 | # add_message_files( 58 | # FILES 59 | # Message1.msg 60 | # Message2.msg 61 | # ) 62 | 63 | ## Generate services in the 'srv' folder 64 | # add_service_files( 65 | # FILES 66 | # Service1.srv 67 | # Service2.srv 68 | # ) 69 | 70 | ## Generate actions in the 'action' folder 71 | # add_action_files( 72 | # FILES 73 | # Action1.action 74 | # Action2.action 75 | # ) 76 | 77 | ## Generate added messages and services with any dependencies listed here 78 | # generate_messages( 79 | # DEPENDENCIES 80 | # std_msgs 81 | # ) 82 | 83 | ################################################ 84 | ## Declare ROS dynamic reconfigure parameters ## 85 | ################################################ 86 | 87 | ## To declare and build dynamic reconfigure parameters within this 88 | ## package, follow these steps: 89 | ## * In the file package.xml: 90 | ## * add a build_depend and a exec_depend tag for "dynamic_reconfigure" 91 | ## * In this file (CMakeLists.txt): 92 | ## * add "dynamic_reconfigure" to 93 | ## find_package(catkin REQUIRED COMPONENTS ...) 94 | ## * uncomment the "generate_dynamic_reconfigure_options" section below 95 | ## and list every .cfg file to be processed 96 | 97 | ## Generate dynamic reconfigure parameters in the 'cfg' folder 98 | # generate_dynamic_reconfigure_options( 99 | # cfg/DynReconf1.cfg 100 | # cfg/DynReconf2.cfg 101 | # ) 102 | 103 | ################################### 104 | ## catkin specific configuration ## 105 | ################################### 106 | ## The catkin_package macro generates cmake config files for your package 107 | ## Declare things to be passed to dependent projects 108 | ## INCLUDE_DIRS: uncomment this if your package contains header files 109 | ## LIBRARIES: libraries you create in this project that dependent projects also need 110 | ## CATKIN_DEPENDS: catkin_packages dependent projects also need 111 | ## DEPENDS: system dependencies of this project that dependent projects also need 112 | catkin_package( 113 | CATKIN_DEPENDS geometry_msgs sensor_msgs std_msgs vision_msgs visualization_msgs 114 | # DEPENDS system_lib 115 | ) 116 | 117 | ########### 118 | ## Build ## 119 | ########### 120 | 121 | ## Specify additional locations of header files 122 | ## Your package locations should be listed before other locations 123 | include_directories( 124 | # include 125 | ${catkin_INCLUDE_DIRS} 126 | ) 127 | 128 | ## Declare a C++ library 129 | # add_library(${PROJECT_NAME} 130 | # src/${PROJECT_NAME}/dope/dope.cpp 131 | # ) 132 | 133 | ## Add cmake target dependencies of the library 134 | ## as an example, code may need to be generated before libraries 135 | ## either from message generation or dynamic reconfigure 136 | # add_dependencies(${PROJECT_NAME} ${${PROJECT_NAME}_EXPORTED_TARGETS} ${catkin_EXPORTED_TARGETS}) 137 | 138 | ## Declare a C++ executable 139 | ## With catkin_make all packages are built within a single CMake context 140 | ## The recommended prefix ensures that target names across packages don't collide 141 | # add_executable(${PROJECT_NAME}_node src/dope_vis_node.cpp) 142 | 143 | ## Rename C++ executable without prefix 144 | ## The above recommended prefix causes long target names, the following renames the 145 | ## target back to the shorter version for ease of user use 146 | ## e.g. 
"rosrun someones_pkg node" instead of "rosrun someones_pkg someones_pkg_node" 147 | # set_target_properties(${PROJECT_NAME}_node PROPERTIES OUTPUT_NAME node PREFIX "") 148 | 149 | ## Add cmake target dependencies of the executable 150 | ## same as for the library above 151 | # add_dependencies(${PROJECT_NAME}_node ${${PROJECT_NAME}_EXPORTED_TARGETS} ${catkin_EXPORTED_TARGETS}) 152 | 153 | ## Specify libraries to link a library or executable target against 154 | # target_link_libraries(${PROJECT_NAME}_node 155 | # ${catkin_LIBRARIES} 156 | # ) 157 | 158 | ############# 159 | ## Install ## 160 | ############# 161 | 162 | # all install targets should use catkin DESTINATION variables 163 | # See http://ros.org/doc/api/catkin/html/adv_user_guide/variables.html 164 | 165 | ## Mark executable scripts (Python etc.) for installation 166 | ## in contrast to setup.py, you can choose the destination 167 | # install(PROGRAMS 168 | # scripts/my_python_script 169 | # DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION} 170 | # ) 171 | 172 | catkin_install_python(PROGRAMS 173 | nodes/camera 174 | nodes/dope 175 | DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}) 176 | 177 | ## Mark executables and/or libraries for installation 178 | # install(TARGETS ${PROJECT_NAME} ${PROJECT_NAME}_node 179 | # ARCHIVE DESTINATION ${CATKIN_PACKAGE_LIB_DESTINATION} 180 | # LIBRARY DESTINATION ${CATKIN_PACKAGE_LIB_DESTINATION} 181 | # RUNTIME DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION} 182 | # ) 183 | 184 | ## Mark cpp header files for installation 185 | # install(DIRECTORY include/${PROJECT_NAME}/ 186 | # DESTINATION ${CATKIN_PACKAGE_INCLUDE_DESTINATION} 187 | # FILES_MATCHING PATTERN "*.h" 188 | # PATTERN ".svn" EXCLUDE 189 | # ) 190 | 191 | ## Mark other files for installation (e.g. launch and bag files, etc.) 192 | # install(FILES 193 | # # myfile1 194 | # # myfile2 195 | # DESTINATION ${CATKIN_PACKAGE_SHARE_DESTINATION} 196 | # ) 197 | 198 | ############# 199 | ## Testing ## 200 | ############# 201 | 202 | ## Add gtest based cpp test target and link libraries 203 | # catkin_add_gtest(${PROJECT_NAME}-test test/test_dope_vis.cpp) 204 | # if(TARGET ${PROJECT_NAME}-test) 205 | # target_link_libraries(${PROJECT_NAME}-test ${PROJECT_NAME}) 206 | # endif() 207 | 208 | ## Add folders to be run by python nosetests 209 | # catkin_add_nosetests(test) 210 | -------------------------------------------------------------------------------- /ros1/dope/config/camera_info.yaml: -------------------------------------------------------------------------------- 1 | image_width: 640 2 | image_height: 480 3 | camera_name: dope_webcam_0 4 | camera_matrix: 5 | rows: 3 6 | cols: 3 7 | data: [641.5, 0, 320.0, 0, 641.5, 240.0, 0, 0, 1] 8 | distortion_model: plumb_bob 9 | distortion_coefficients: 10 | rows: 1 11 | cols: 5 12 | data: [0, 0, 0, 0, 0] 13 | rectification_matrix: 14 | rows: 3 15 | cols: 3 16 | data: [1, 0, 0, 0, 1, 0, 0, 0, 1] 17 | projection_matrix: 18 | rows: 3 19 | cols: 4 20 | data: [641.5, 0, 320.0, 0, 0, 641.5, 240.0, 0, 0, 0, 1, 0] 21 | -------------------------------------------------------------------------------- /ros1/dope/config/config_pose.yaml: -------------------------------------------------------------------------------- 1 | topic_camera: "/dope/webcam/image_raw" 2 | topic_camera_info: "/dope/webcam/camera_info" 3 | topic_publishing: "dope" 4 | input_is_rectified: True # Whether the input image is rectified (strongly suggested!) 
5 | downscale_height: 400 # if the input image is larger than this, scale it down to this pixel height 6 | 7 | # Comment any of these lines to prevent detection / pose estimation of that object 8 | weights: { 9 | # "cracker":"package://dope/weights/cracker_60.pth", 10 | # "gelatin":"package://dope/weights/gelatin_60.pth", 11 | # "meat":"package://dope/weights/meat_20.pth", 12 | # "mustard":"package://dope/weights/mustard_60.pth", 13 | "soup":"package://dope/weights/soup_60.pth", 14 | #"sugar":"package://dope/weights/sugar_60.pth", 15 | # "bleach":"package://dope/weights/bleach_28_dr.pth" 16 | 17 | # NEW OBJECTS - HOPE 18 | # "AlphabetSoup":"package://dope/weights/AlphabetSoup.pth", 19 | # "BBQSauce":"package://dope/weights/BBQSauce.pth", 20 | # "Butter":"package://dope/weights/Butter.pth", 21 | # "Cherries":"package://dope/weights/Cherries.pth", 22 | # "ChocolatePudding":"package://dope/weights/ChocolatePudding.pth", 23 | # "Cookies":"package://dope/weights/Cookies.pth", 24 | # "Corn":"package://dope/weights/Corn.pth", 25 | # "CreamCheese":"package://dope/weights/CreamCheese.pth", 26 | # "GreenBeans":"package://dope/weights/GreenBeans.pth", 27 | # "GranolaBars":"package://dope/weights/GranolaBars.pth", 28 | # "Ketchup":"package://dope/weights/Ketchup.pth", 29 | # "MacaroniAndCheese":"package://dope/weights/MacaroniAndCheese.pth", 30 | # "Mayo":"package://dope/weights/Mayo.pth", 31 | # "Milk":"package://dope/weights/Milk.pth", 32 | # "Mushrooms":"package://dope/weights/Mushrooms.pth", 33 | # "Mustard":"package://dope/weights/Mustard.pth", 34 | # "Parmesan":"package://dope/weights/Parmesan.pth", 35 | # "PeasAndCarrots":"package://dope/weights/PeasAndCarrots.pth", 36 | # "Peaches":"package://dope/weights/Peaches.pth", 37 | # "Pineapple":"package://dope/weights/Pineapple.pth", 38 | # "Popcorn":"package://dope/weights/Popcorn.pth", 39 | # "OrangeJuice":"package://dope/weights/OrangeJuice.pth", 40 | # "Raisins":"package://dope/weights/Raisins.pth", 41 | # "SaladDressing":"package://dope/weights/SaladDressing.pth", 42 | # "Spaghetti":"package://dope/weights/Spaghetti.pth", 43 | # "TomatoSauce":"package://dope/weights/TomatoSauce.pth", 44 | # "Tuna":"package://dope/weights/Tuna.pth", 45 | # "Yogurt":"package://dope/weights/Yogurt.pth", 46 | 47 | } 48 | 49 | # Cuboid dimension in cm x,y,z 50 | dimensions: { 51 | "cracker": [16.403600692749023,21.343700408935547,7.179999828338623], 52 | "gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059], 53 | "meat": [10.164673805236816,8.3542995452880859,5.7600898742675781], 54 | "mustard": [9.6024150848388672,19.130100250244141,5.824894905090332], 55 | "soup": [6.7659378051757813,10.185500144958496,6.771425724029541], 56 | "sugar": [9.267730712890625,17.625339508056641,4.5134143829345703], 57 | "bleach": [10.267730712890625,26.625339508056641,7.5134143829345703], 58 | 59 | # new objects 60 | "AlphabetSoup" : [ 8.3555002212524414, 7.1121001243591309, 6.6055998802185059 ], 61 | "Butter" : [ 5.282599925994873, 2.3935999870300293, 10.330100059509277 ], 62 | "Ketchup" : [ 14.860799789428711, 4.3368000984191895, 6.4513998031616211 ], 63 | "Pineapple" : [ 5.7623000144958496, 6.95989990234375, 6.567500114440918 ], 64 | "BBQSauce" : [ 14.832900047302246, 4.3478999137878418, 6.4632000923156738 ], 65 | "MacaroniAndCheese" : [ 16.625600814819336, 4.0180997848510742, 12.350899696350098 ], 66 | "Popcorn" : [ 8.4976997375488281, 3.825200080871582, 12.649200439453125 ], 67 | "Mayo" : [ 14.790200233459473, 4.1030998229980469, 6.4541001319885254 ], 68 | "Raisins" 
: [ 12.317500114440918, 3.9751999378204346, 8.5874996185302734 ], 69 | "Cherries" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ], 70 | "Milk" : [ 19.035800933837891, 7.326200008392334, 7.2154998779296875 ], 71 | "SaladDressing" : [ 14.744099617004395, 4.3695998191833496, 6.403900146484375 ], 72 | "ChocolatePudding" : [ 4.947199821472168, 2.9923000335693359, 8.3498001098632812 ], 73 | "Mushrooms" : [ 3.3322000503540039, 7.079899787902832, 6.5869998931884766 ], 74 | "Spaghetti" : [ 4.9836997985839844, 2.8492999076843262, 24.988100051879883 ], 75 | "Cookies" : [ 16.724300384521484, 4.015200138092041, 12.274600028991699 ], 76 | "Mustard" : [ 16.004999160766602, 4.8573999404907227, 6.5132999420166016 ], 77 | "TomatoSauce" : [ 8.2847003936767578, 7.0198001861572266, 6.6469998359680176 ], 78 | "Corn" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ], 79 | "OrangeJuice" : [ 19.248300552368164, 7.2781000137329102, 7.1582999229431152 ], 80 | "Tuna" : [ 3.2571001052856445, 7.0805997848510742, 6.5837001800537109 ], 81 | "CreamCheese" : [ 5.3206000328063965, 2.4230999946594238, 10.359000205993652 ], 82 | "Parmesan" : [ 10.286199569702148, 6.6093001365661621, 7.1117000579833984 ], 83 | "Yogurt" : [ 5.3677000999450684, 6.7961997985839844, 6.7915000915527344 ], 84 | "GranolaBars" : [ 12.400600433349609, 3.8738000392913818, 16.53380012512207 ], 85 | "Peaches" : [ 5.7781000137329102, 7.0961999893188477, 6.5925998687744141 ], 86 | "GreenBeans" : [ 5.758699893951416, 7.0608000755310059, 6.5732002258300781 ], 87 | "PeasAndCarrots" : [ 5.8512001037597656, 7.0636000633239746, 6.5918002128601074 ] 88 | } 89 | 90 | class_ids: { 91 | "cracker": 1, 92 | "gelatin": 2, 93 | "meat": 3, 94 | "mustard": 4, 95 | "soup": 5, 96 | "sugar": 6, 97 | "bleach": 7, 98 | "AlphabetSoup" : 9, 99 | "Ketchup" : 10, 100 | "Pineapple" : 11, 101 | "BBQSauce" : 12, 102 | "MacaroniAndCheese" : 13, 103 | "Popcorn" : 14, 104 | "Butter" : 15, 105 | "Mayo" : 16, 106 | "Raisins" : 17, 107 | "Cherries" : 18, 108 | "Milk" : 19, 109 | "SaladDressing" : 20, 110 | "ChocolatePudding" : 21, 111 | "Mushrooms" : 22, 112 | "Spaghetti" : 23, 113 | "Cookies" : 24, 114 | "Mustard" : 25, 115 | "TomatoSauce" : 26, 116 | "Corn" : 27, 117 | "OrangeJuice" : 28, 118 | "Tuna" : 29, 119 | "CreamCheese" : 20, 120 | "Parmesan" : 31, 121 | "Yogurt" : 32, 122 | "GranolaBars" : 33, 123 | "Peaches" : 34, 124 | "GreenBeans" : 35, 125 | "PeasAndCarrots" : 36 126 | } 127 | 128 | draw_colors: { 129 | "cracker": [13, 255, 128], # green 130 | "gelatin": [255, 255, 255], # while 131 | "meat": [0, 104, 255], # blue 132 | "mustard": [217,12, 232], # magenta 133 | "soup": [255, 101, 0], # orange 134 | "sugar": [232, 222, 12], # yellow 135 | "bleach": [232, 222, 12], # yellow 136 | } 137 | 138 | # optional: provide a transform that is applied to the pose returned by DOPE 139 | model_transforms: { 140 | # "cracker": [[ 0, 0, 1, 0], 141 | # [ 0, -1, 0, 0], 142 | # [ 1, 0, 0, 0], 143 | # [ 0, 0, 0, 1]] 144 | } 145 | 146 | # optional: if you provide a mesh of the object here, a mesh marker will be 147 | # published for visualization in RViz 148 | # You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb 149 | meshes: { 150 | # "cracker": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/003_cracker_box/google_16k/textured.obj", 151 | # "gelatin": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/009_gelatin_box/google_16k/textured.obj", 152 | # "meat": 
"file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/010_potted_meat_can/google_16k/textured.obj", 153 | # "mustard": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/006_mustard_bottle/google_16k/textured.obj", 154 | # "soup": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/005_tomato_soup_can/google_16k/textured.obj", 155 | # "sugar": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/004_sugar_box/google_16k/textured.obj", 156 | # "bleach": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/021_bleach_cleanser/google_16k/textured.obj", 157 | } 158 | 159 | # optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0. 160 | mesh_scales: { 161 | "cracker": 0.01, 162 | "gelatin": 0.01, 163 | "meat": 0.01, 164 | "mustard": 0.01, 165 | "soup": 0.01, 166 | "sugar": 0.01, 167 | "bleach": 0.01, 168 | } 169 | 170 | overlay_belief_images: True # Whether to overlay the input image on the belief images published on /dope/belief_[obj_name] 171 | 172 | # Config params for DOPE 173 | thresh_angle: 0.5 174 | thresh_map: 0.01 175 | sigma: 3 176 | thresh_points: 0.1 177 | -------------------------------------------------------------------------------- /ros1/dope/launch/camera.launch: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | -------------------------------------------------------------------------------- /ros1/dope/launch/dope.launch: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /ros1/dope/nodes/camera: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved. 3 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 4 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode 5 | 6 | """ 7 | This file opens an RGB camera and publishes images via ROS. 8 | It uses OpenCV to capture from camera 0. 9 | """ 10 | 11 | import cv2 12 | import rospy 13 | from camera_info_manager import CameraInfoManager 14 | from cv_bridge import CvBridge 15 | from sensor_msgs.msg import Image, CameraInfo 16 | import sys 17 | 18 | 19 | def publish_images(freq=100): 20 | cam_index = 0 # index of camera to capture 21 | 22 | ### initialize ROS publishers etc. 
23 | rospy.init_node('dope_webcam') 24 | camera_ns = rospy.get_param('camera', 'dope/webcam') 25 | img_topic = '{}/image_raw'.format(camera_ns) 26 | info_topic = '{}/camera_info'.format(camera_ns) 27 | image_pub = rospy.Publisher(img_topic, Image, queue_size=10) 28 | info_pub = rospy.Publisher(info_topic, CameraInfo, queue_size=10) 29 | info_manager = CameraInfoManager(cname='dope_webcam_{}'.format(cam_index), 30 | namespace=camera_ns) 31 | try: 32 | camera_info_url = rospy.get_param('~camera_info_url') 33 | if not info_manager.setURL(camera_info_url): 34 | rospy.logwarn('Camera info URL invalid: %s', camera_info_url) 35 | except KeyError: 36 | # we don't have a camera_info_url, so we'll keep the 37 | # default ('file://${ROS_HOME}/camera_info/${NAME}.yaml') 38 | pass 39 | 40 | info_manager.loadCameraInfo() 41 | if not info_manager.isCalibrated(): 42 | rospy.logwarn('Camera is not calibrated, please supply a valid camera_info_url parameter!') 43 | 44 | ### open camera 45 | cap = cv2.VideoCapture(cam_index) 46 | if not cap.isOpened(): 47 | rospy.logfatal("ERROR: Unable to open camera for capture. Is camera plugged in?") 48 | sys.exit(1) 49 | 50 | rospy.loginfo("Publishing images from camera %s to topic '%s'...", cam_index, img_topic) 51 | rospy.loginfo("Ctrl-C to stop") 52 | 53 | ### publish images 54 | rate = rospy.Rate(freq) 55 | while not rospy.is_shutdown(): 56 | ret, frame = cap.read() 57 | 58 | if ret: 59 | image = CvBridge().cv2_to_imgmsg(frame, "bgr8") 60 | image.header.frame_id = 'dope_webcam' 61 | image.header.stamp = rospy.Time.now() 62 | image_pub.publish(image) 63 | # we need to call getCameraInfo() every time in case it was updated 64 | camera_info = info_manager.getCameraInfo() 65 | camera_info.header = image.header 66 | info_pub.publish(camera_info) 67 | 68 | rate.sleep() 69 | 70 | 71 | if __name__ == "__main__": 72 | try: 73 | publish_images() 74 | except rospy.ROSInterruptException: 75 | pass 76 | -------------------------------------------------------------------------------- /ros1/dope/package.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | dope 4 | 0.0.0 5 | The DOPE package for deep object pose estimation 6 | 7 | 8 | 9 | 10 | jtremblay 11 | 12 | 13 | 14 | 15 | 16 | CC BY-NC-SA 4.0 17 | 18 | 19 | 20 | 21 | 22 | https://research.nvidia.com/publication/2018-09_Deep-Object-Pose 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | catkin 52 | camera_info_manager_py 53 | cv_bridge 54 | geometry_msgs 55 | message_filters 56 | python-argparse 57 | resource_retriever 58 | rospy 59 | sensor_msgs 60 | std_msgs 61 | tf 62 | vision_msgs 63 | visualization_msgs 64 | python3-pyrr-pip 65 | python-pytorch-pip 66 | python3-numpy 67 | python3-scipy 68 | python3-opencv 69 | python3-pil 70 | python-configparser 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | -------------------------------------------------------------------------------- /ros1/dope/setup.py: -------------------------------------------------------------------------------- 1 | ## ! 
DO NOT MANUALLY INVOKE THIS setup.py, USE CATKIN INSTEAD 2 | 3 | from distutils.core import setup 4 | from catkin_pkg.python_setup import generate_distutils_setup 5 | 6 | # fetch values from package.xml 7 | setup_args = generate_distutils_setup( 8 | packages=['dope', 'dope.inference'], 9 | package_dir={'': 'src'}) 10 | 11 | setup(**setup_args) 12 | -------------------------------------------------------------------------------- /ros1/dope/src/dope/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/ros1/dope/src/dope/__init__.py -------------------------------------------------------------------------------- /ros1/dope/src/dope/inference/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/ros1/dope/src/dope/inference/__init__.py -------------------------------------------------------------------------------- /ros1/dope/src/dope/inference/cuboid.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode 4 | 5 | from enum import IntEnum 6 | 7 | import cv2 8 | import numpy as np 9 | 10 | 11 | # Related to the object's local coordinate system 12 | # @unique 13 | class CuboidVertexType(IntEnum): 14 | FrontTopRight = 0 15 | FrontTopLeft = 1 16 | FrontBottomLeft = 2 17 | FrontBottomRight = 3 18 | RearTopRight = 4 19 | RearTopLeft = 5 20 | RearBottomLeft = 6 21 | RearBottomRight = 7 22 | Center = 8 23 | TotalCornerVertexCount = 8 # Corner vertexes doesn't include the center point 24 | TotalVertexCount = 9 25 | 26 | # List of the vertex indexes in each line edges of the cuboid 27 | CuboidLineIndexes = [ 28 | # Front face 29 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.FrontTopRight ], 30 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.FrontBottomRight ], 31 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.FrontBottomLeft ], 32 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.FrontTopLeft ], 33 | # Back face 34 | [ CuboidVertexType.RearTopLeft, CuboidVertexType.RearTopRight ], 35 | [ CuboidVertexType.RearTopRight, CuboidVertexType.RearBottomRight ], 36 | [ CuboidVertexType.RearBottomRight, CuboidVertexType.RearBottomLeft ], 37 | [ CuboidVertexType.RearBottomLeft, CuboidVertexType.RearTopLeft ], 38 | # Left face 39 | [ CuboidVertexType.FrontBottomLeft, CuboidVertexType.RearBottomLeft ], 40 | [ CuboidVertexType.FrontTopLeft, CuboidVertexType.RearTopLeft ], 41 | # Right face 42 | [ CuboidVertexType.FrontBottomRight, CuboidVertexType.RearBottomRight ], 43 | [ CuboidVertexType.FrontTopRight, CuboidVertexType.RearTopRight ], 44 | ] 45 | 46 | 47 | # ========================= Cuboid3d ========================= 48 | class Cuboid3d(): 49 | '''This class contains a 3D cuboid.''' 50 | 51 | # Create a box with a certain size 52 | def __init__(self, size3d = [1.0, 1.0, 1.0], center_location = [0, 0, 0], 53 | coord_system = None, parent_object = None): 54 | 55 | # NOTE: This local coordinate system is similar 56 | # to the intrinsic transform matrix of a 3d object 57 | self.center_location = center_location 58 | self.coord_system = 
coord_system 59 | self.size3d = size3d 60 | self._vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount 61 | 62 | self.generate_vertexes() 63 | 64 | def get_vertex(self, vertex_type): 65 | """Returns the location of a vertex. 66 | 67 | Args: 68 | vertex_type: enum of type CuboidVertexType 69 | 70 | Returns: 71 | Numpy array(3) - Location of the vertex type in the cuboid 72 | """ 73 | return self._vertices[vertex_type] 74 | 75 | def get_vertices(self): 76 | return self._vertices 77 | 78 | def generate_vertexes(self): 79 | width, height, depth = self.size3d 80 | 81 | # By default just use the normal OpenCV coordinate system 82 | if (self.coord_system is None): 83 | cx, cy, cz = self.center_location 84 | # X axis point to the right 85 | right = cx + width / 2.0 86 | left = cx - width / 2.0 87 | # Y axis point downward 88 | top = cy - height / 2.0 89 | bottom = cy + height / 2.0 90 | # Z axis point forward 91 | front = cz + depth / 2.0 92 | rear = cz - depth / 2.0 93 | 94 | # List of 8 vertices of the box 95 | self._vertices = [ 96 | [right, top, front], # Front Top Right 97 | [left, top, front], # Front Top Left 98 | [left, bottom, front], # Front Bottom Left 99 | [right, bottom, front], # Front Bottom Right 100 | [right, top, rear], # Rear Top Right 101 | [left, top, rear], # Rear Top Left 102 | [left, bottom, rear], # Rear Bottom Left 103 | [right, bottom, rear], # Rear Bottom Right 104 | self.center_location, # Center 105 | ] 106 | else: 107 | sx, sy, sz = self.size3d 108 | forward = np.array(self.coord_system.forward, dtype=float) * sy * 0.5 109 | up = np.array(self.coord_system.up, dtype=float) * sz * 0.5 110 | right = np.array(self.coord_system.right, dtype=float) * sx * 0.5 111 | center = np.array(self.center_location, dtype=float) 112 | self._vertices = [ 113 | center + forward + up + right, # Front Top Right 114 | center + forward + up - right, # Front Top Left 115 | center + forward - up - right, # Front Bottom Left 116 | center + forward - up + right, # Front Bottom Right 117 | center - forward + up + right, # Rear Top Right 118 | center - forward + up - right, # Rear Top Left 119 | center - forward - up - right, # Rear Bottom Left 120 | center - forward - up + right, # Rear Bottom Right 121 | self.center_location, # Center 122 | ] 123 | 124 | def get_projected_cuboid2d(self, cuboid_transform, camera_intrinsic_matrix): 125 | """ 126 | Projects the cuboid into the image plane using camera intrinsics. 127 | 128 | Args: 129 | cuboid_transform: the world transform of the cuboid 130 | camera_intrinsic_matrix: camera intrinsic matrix 131 | 132 | Returns: 133 | Cuboid2d - the projected cuboid points 134 | """ 135 | 136 | world_transform_matrix = cuboid_transform 137 | rvec = [0, 0, 0] 138 | tvec = [0, 0, 0] 139 | dist_coeffs = np.zeros((4, 1)) 140 | 141 | transformed_vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount 142 | for vertex_index in range(CuboidVertexType.TotalVertexCount): 143 | vertex3d = self._vertices[vertex_index] 144 | transformed_vertices[vertex_index] = world_transform_matrix * vertex3d 145 | 146 | projected_vertices = cv2.projectPoints(transformed_vertices, rvec, tvec, 147 | camera_intrinsic_matrix, dist_coeffs) 148 | 149 | return Cuboid2d(projected_vertices) 150 | -------------------------------------------------------------------------------- /ros1/dope/src/dope/inference/cuboid_pnp_solver.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2018 NVIDIA Corporation. All rights reserved. 
2 | # This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 3 | # https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode 4 | 5 | import cv2 6 | import numpy as np 7 | from .cuboid import CuboidVertexType 8 | from pyrr import Quaternion 9 | 10 | 11 | class CuboidPNPSolver(object): 12 | """ 13 | This class is used to find the 6-DoF pose of a cuboid given its projected vertices. 14 | 15 | Runs perspective-n-point (PNP) algorithm. 16 | """ 17 | 18 | # Class variables 19 | cv2version = cv2.__version__.split('.') 20 | cv2majorversion = int(cv2version[0]) 21 | 22 | def __init__(self, object_name="", camera_intrinsic_matrix = None, cuboid3d = None, 23 | dist_coeffs = np.zeros((4, 1))): 24 | self.object_name = object_name 25 | if (not camera_intrinsic_matrix is None): 26 | self._camera_intrinsic_matrix = camera_intrinsic_matrix 27 | else: 28 | self._camera_intrinsic_matrix = np.array([ 29 | [0, 0, 0], 30 | [0, 0, 0], 31 | [0, 0, 0] 32 | ]) 33 | self._cuboid3d = cuboid3d 34 | 35 | self._dist_coeffs = dist_coeffs 36 | 37 | def set_camera_intrinsic_matrix(self, new_intrinsic_matrix): 38 | '''Sets the camera intrinsic matrix''' 39 | self._camera_intrinsic_matrix = new_intrinsic_matrix 40 | 41 | def set_dist_coeffs(self, dist_coeffs): 42 | '''Sets the distortion coefficients''' 43 | self._dist_coeffs = dist_coeffs 44 | 45 | def solve_pnp(self, cuboid2d_points, pnp_algorithm = None): 46 | """ 47 | Detects the rotation and translation 48 | of a cuboid object from its vertices' 49 | 2D location in the image 50 | """ 51 | 52 | location = None 53 | quaternion = None 54 | projected_points = cuboid2d_points 55 | 56 | cuboid3d_points = np.array(self._cuboid3d.get_vertices()) 57 | obj_2d_points = [] 58 | obj_3d_points = [] 59 | 60 | for i in range(CuboidVertexType.TotalVertexCount): 61 | check_point_2d = cuboid2d_points[i] 62 | # Ignore invalid points 63 | if (check_point_2d is None): 64 | continue 65 | obj_2d_points.append(check_point_2d) 66 | obj_3d_points.append(cuboid3d_points[i]) 67 | 68 | obj_2d_points = np.array(obj_2d_points, dtype=float) 69 | obj_3d_points = np.array(obj_3d_points, dtype=float) 70 | 71 | valid_point_count = len(obj_2d_points) 72 | print(valid_point_count, "valid points found") 73 | 74 | # Set PNP algorithm based on OpenCV version and number of valid points 75 | is_points_valid = False 76 | 77 | if pnp_algorithm is None: 78 | if CuboidPNPSolver.cv2majorversion == 2: 79 | is_points_valid = True 80 | pnp_algorithm = cv2.CV_ITERATIVE 81 | elif CuboidPNPSolver.cv2majorversion > 2: 82 | if valid_point_count >= 6: 83 | is_points_valid = True 84 | pnp_algorithm = cv2.SOLVEPNP_ITERATIVE 85 | elif valid_point_count >= 4: 86 | is_points_valid = True 87 | pnp_algorithm = cv2.SOLVEPNP_P3P 88 | # This algorithm requires EXACTLY four points, so we truncate our 89 | # data 90 | obj_3d_points = obj_3d_points[:4] 91 | obj_2d_points = obj_2d_points[:4] 92 | # Alternative algorithms: 93 | # pnp_algorithm = cv2.SOLVEPNP_EPNP 94 | else: 95 | assert False, "DOPE will not work with versions of OpenCV earlier than 2.0" 96 | 97 | if is_points_valid: 98 | try: 99 | ret, rvec, tvec = cv2.solvePnP( 100 | obj_3d_points, 101 | obj_2d_points, 102 | self._camera_intrinsic_matrix, 103 | self._dist_coeffs, 104 | flags=pnp_algorithm 105 | ) 106 | except: 107 | # solvePnP will assert if there are insufficient points for the 108 | # algorithm 109 | print("cv2.solvePnP failed with an error") 110 | ret = False 111 | 112 | if ret: 113 | location = list(x[0] for x in
tvec) 114 | quaternion = self.convert_rvec_to_quaternion(rvec) 115 | 116 | projected_points, _ = cv2.projectPoints(cuboid3d_points, rvec, tvec, self._camera_intrinsic_matrix, self._dist_coeffs) 117 | projected_points = np.squeeze(projected_points) 118 | 119 | # If the location.Z is negative or object is behind the camera then flip both location and rotation 120 | x, y, z = location 121 | if z < 0: 122 | # Get the opposite location 123 | location = [-x, -y, -z] 124 | 125 | # Change the rotation by 180 degree 126 | rotate_angle = np.pi 127 | rotate_quaternion = Quaternion.from_axis_rotation(location, rotate_angle) 128 | quaternion = rotate_quaternion.cross(quaternion) 129 | 130 | return location, quaternion, projected_points 131 | 132 | def convert_rvec_to_quaternion(self, rvec): 133 | '''Convert rvec (an axis-angle rotation vector, as returned by cv2.solvePnP) to quaternion''' 134 | theta = np.sqrt(rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2]) # in radians 135 | raxis = [rvec[0] / theta, rvec[1] / theta, rvec[2] / theta] 136 | 137 | # pyrr's Quaternion (order is XYZW), https://pyrr.readthedocs.io/en/latest/oo_api_quaternion.html 138 | return Quaternion.from_axis_rotation(raxis, theta) 139 | 140 | # Alternatively: pyquaternion 141 | # return Quaternion(axis=raxis, radians=theta) # uses pyquaternion's Quaternion (order is WXYZ) 142 | 143 | def project_points(self, rvec, tvec): 144 | '''Project points from model onto image using rotation, translation''' 145 | output_points, tmp = cv2.projectPoints( 146 | np.array(self._cuboid3d.get_vertices(), dtype=float), # model cuboid vertices (incl. center) 147 | rvec, 148 | tvec, 149 | self._camera_intrinsic_matrix, 150 | self._dist_coeffs) 151 | 152 | output_points = np.squeeze(output_points) 153 | return output_points 154 | -------------------------------------------------------------------------------- /ros1/dope/src/dope/utils.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | 5 | 6 | def make_grid(tensor, nrow=8, padding=2, 7 | normalize=False, range_=None, scale_each=False, pad_value=0): 8 | """Make a grid of images. 9 | Args: 10 | tensor (Tensor or list): 4D mini-batch Tensor of shape (B x C x H x W) 11 | or a list of images all of the same size. 12 | nrow (int, optional): Number of images displayed in each row of the grid. 13 | The final grid size is (B / nrow, nrow). Default is 8. 14 | padding (int, optional): amount of padding. Default is 2. 15 | normalize (bool, optional): If True, shift the image to the range (0, 1), 16 | by subtracting the minimum and dividing by the maximum pixel value. 17 | range_ (tuple, optional): tuple (min, max) where min and max are numbers, 18 | then these numbers are used to normalize the image. By default, min and max 19 | are computed from the tensor. 20 | scale_each (bool, optional): If True, scale each image in the batch of 21 | images separately rather than the (min, max) over all images. 22 | pad_value (float, optional): Value for the padded pixels.
23 | Example: 24 | See this notebook `here `_ 25 | """ 26 | if not (torch.is_tensor(tensor) or 27 | (isinstance(tensor, list) and all(torch.is_tensor(t) for t in tensor))): 28 | raise TypeError('tensor or list of tensors expected, got {}'.format(type(tensor))) 29 | 30 | # if list of tensors, convert to a 4D mini-batch Tensor 31 | if isinstance(tensor, list): 32 | tensor = torch.stack(tensor, dim=0) 33 | 34 | if tensor.dim() == 2: # single image H x W 35 | tensor = tensor.view(1, tensor.size(0), tensor.size(1)) 36 | if tensor.dim() == 3: # single image 37 | if tensor.size(0) == 1: # if single-channel, convert to 3-channel 38 | tensor = torch.cat((tensor, tensor, tensor), 0) 39 | tensor = tensor.view(1, tensor.size(0), tensor.size(1), tensor.size(2)) 40 | 41 | if tensor.dim() == 4 and tensor.size(1) == 1: # single-channel images 42 | tensor = torch.cat((tensor, tensor, tensor), 1) 43 | 44 | if normalize is True: 45 | tensor = tensor.clone() # avoid modifying tensor in-place 46 | if range_ is not None: 47 | assert isinstance(range_, tuple), \ 48 | "range_ has to be a tuple (min, max) if specified. min and max are numbers" 49 | 50 | def norm_ip(img, min, max): 51 | img.clamp_(min=min, max=max) 52 | img.add_(-min).div_(max - min + 1e-5) 53 | 54 | def norm_range(t, range_): 55 | if range_ is not None: 56 | norm_ip(t, range_[0], range_[1]) 57 | else: 58 | norm_ip(t, float(t.min()), float(t.max())) 59 | 60 | if scale_each is True: 61 | for t in tensor: # loop over mini-batch dimension 62 | norm_range(t, range_) 63 | else: 64 | norm_range(tensor, range_) 65 | 66 | if tensor.size(0) == 1: 67 | return tensor.squeeze() 68 | 69 | # make the mini-batch of images into a grid 70 | nmaps = tensor.size(0) 71 | xmaps = min(nrow, nmaps) 72 | ymaps = int(math.ceil(float(nmaps) / xmaps)) 73 | height, width = int(tensor.size(2) + padding), int(tensor.size(3) + padding) 74 | grid = tensor.new(3, height * ymaps + padding, width * xmaps + padding).fill_(pad_value) 75 | k = 0 76 | for y in range(ymaps): 77 | for x in range(xmaps): 78 | if k >= nmaps: 79 | break 80 | grid.narrow(1, y * height + padding, height - padding) \ 81 | .narrow(2, x * width + padding, width - padding) \ 82 | .copy_(tensor[k]) 83 | k = k + 1 84 | return grid 85 | 86 | 87 | def get_image_grid(tensor, nrow=3, padding=2, mean=None, std=None): 88 | """ 89 | Converts a given Tensor into a PIL image. 90 | If given a mini-batch tensor, the images are tiled into a grid. 91 | """ 92 | from PIL import Image 93 | 94 | # tensor = tensor.cpu() 95 | grid = make_grid(tensor, nrow=nrow, padding=padding, pad_value=1) 96 | if not mean is None: 97 | # ndarr = grid.mul(std).add(mean).mul(255).byte().transpose(0,2).transpose(0,1).numpy() 98 | ndarr = grid.mul(std).add(mean).mul(255).byte().transpose(0, 2).transpose(0, 1).numpy() 99 | else: 100 | ndarr = grid.mul(0.5).add(0.5).mul(255).byte().transpose(0, 2).transpose(0, 1).numpy() 101 | im = Image.fromarray(ndarr) 102 | return im 103 | -------------------------------------------------------------------------------- /ros1/dope/weights/README.md: -------------------------------------------------------------------------------- 1 | This is the default location for DOPE weights.
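The `make_grid` / `get_image_grid` helpers in `ros1/dope/src/dope/utils.py` above return a PIL image rather than writing a file, which makes them convenient for spot-checking batches of belief maps. A minimal usage sketch (not part of the package itself; the `dope.utils` import path assumes the catkin-installed package, and the random tensor merely stands in for real network output):

```
import torch

from dope.utils import get_image_grid  # import path as installed by the catkin package

# Nine fake single-channel 50x50 belief maps in [0, 1]; real network output would go here.
beliefs = torch.rand(9, 1, 50, 50)

# Tile them into a 3x3 grid and save the result for inspection.
grid_image = get_image_grid(beliefs, nrow=3)  # returns a PIL.Image
grid_image.save("belief_maps.png")
```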
2 | -------------------------------------------------------------------------------- /ros1/requirements.txt: -------------------------------------------------------------------------------- 1 | pyrr==0.10.3 2 | torch==1.6.0 3 | torchvision==0.7.0 4 | numpy==1.17.4 5 | scipy==1.5.2 6 | opencv_python==4.4.0.44 7 | Pillow==8.1.1 8 | configparser==5.0.0 9 | -------------------------------------------------------------------------------- /ros2/README.md: -------------------------------------------------------------------------------- 1 | NVIDIA's [Isaac ROS2](https://nvidia-isaac-ros.github.io/index.html) project provides a collection of hardware-accelerated, ROS 2 packages, including one for [DOPE](https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_pose_estimation/index.html). 2 | 3 | This code is for inference only, so [training a custom model](https://nvidia-isaac-ros.github.io/concepts/pose_estimation/dope/tutorial_custom_model.html) should be done with the code in this repository. 4 | 5 | -------------------------------------------------------------------------------- /sample_data/000000.json: -------------------------------------------------------------------------------- 1 | { 2 | "camera_data": {}, 3 | "objects": [ 4 | { 5 | "class": "cracker", 6 | "visibility": 1.0, 7 | "location": [ 8 | -5.40710973739624, 9 | 10.20408821105957, 10 | 55.31976318359375 11 | ], 12 | "quaternion_xyzw": [ 13 | -0.42416033148765564, 14 | 0.6872081160545349, 15 | 0.20139221847057343, 16 | -0.5543231964111328 17 | ], 18 | "projected_cuboid": [ 19 | [ 20 | 201.36776733398438, 21 | 563.1981201171875 22 | ], 23 | [ 24 | 153.2436981201172, 25 | 382.7738952636719 26 | ], 27 | [ 28 | 57.269569396972656, 29 | 247.0780792236328 30 | ], 31 | [ 32 | 102.94410705566406, 33 | 470.3310852050781 34 | ], 35 | [ 36 | 288.4808654785156, 37 | 528.7132568359375 38 | ], 39 | [ 40 | 228.2211456298828, 41 | 353.6261291503906 42 | ], 43 | [ 44 | 157.5666961669922, 45 | 208.70346069335938 46 | ], 47 | [ 48 | 226.2147216796875, 49 | 422.1109619140625 50 | ], 51 | [ 52 | 180.91781616210938, 53 | 397.6921691894531 54 | ] 55 | ] 56 | } 57 | ] 58 | } 59 | -------------------------------------------------------------------------------- /sample_data/000000.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/sample_data/000000.png -------------------------------------------------------------------------------- /sample_data/000001.json: -------------------------------------------------------------------------------- 1 | { 2 | "camera_data": {}, 3 | "objects": [ 4 | { 5 | "class": "cracker", 6 | "visibility": 1.0, 7 | "location": [ 8 | -5.40710973739624, 9 | 10.20408821105957, 10 | 55.31976318359375 11 | ], 12 | "quaternion_xyzw": [ 13 | -0.42416033148765564, 14 | 0.6872081160545349, 15 | 0.20139221847057343, 16 | -0.5543231964111328 17 | ], 18 | "projected_cuboid": [ 19 | [ 20 | 201.36776733398438, 21 | 563.1981201171875 22 | ], 23 | [ 24 | 153.2436981201172, 25 | 382.7738952636719 26 | ], 27 | [ 28 | 57.269569396972656, 29 | 247.0780792236328 30 | ], 31 | [ 32 | 102.94410705566406, 33 | 470.3310852050781 34 | ], 35 | [ 36 | 288.4808654785156, 37 | 528.7132568359375 38 | ], 39 | [ 40 | 228.2211456298828, 41 | 353.6261291503906 42 | ], 43 | [ 44 | 157.5666961669922, 45 | 208.70346069335938 46 | ], 47 | [ 48 | 226.2147216796875, 49 | 422.1109619140625 50 | ], 51 | [ 52 | 180.91781616210938, 53 
| 397.6921691894531 54 | ] 55 | ] 56 | } 57 | ] 58 | } 59 | -------------------------------------------------------------------------------- /sample_data/000001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/sample_data/000001.png -------------------------------------------------------------------------------- /sample_data/000030.json: -------------------------------------------------------------------------------- 1 | { 2 | "camera_data": {}, 3 | "objects": [ 4 | { 5 | "class": "cracker", 6 | "visibility": 1.0, 7 | "location": [ 8 | 6.001536846160889, 9 | -4.993739604949951, 10 | 86.97349548339844 11 | ], 12 | "quaternion_xyzw": [ 13 | 0.05684935301542282, 14 | 0.32983407378196716, 15 | -0.32128748297691345, 16 | 0.8858622908592224 17 | ], 18 | "projected_cuboid": [ 19 | [ 20 | 294.026123046875, 21 | 258.42864990234375 22 | ], 23 | [ 24 | 407.443115234375, 25 | 274.3418884277344 26 | ], 27 | [ 28 | 363.9100646972656, 29 | 112.13758850097656 30 | ], 31 | [ 32 | 240.95428466796875, 33 | 112.85871887207031 34 | ], 35 | [ 36 | 264.0873718261719, 37 | 293.51092529296875 38 | ], 39 | [ 40 | 379.4239501953125, 41 | 314.3736877441406 42 | ], 43 | [ 44 | 329.1734924316406, 45 | 149.2497100830078 46 | ], 47 | [ 48 | 204.38331604003906, 49 | 145.52359008789062 50 | ], 51 | [ 52 | 309.0063171386719, 53 | 211.89468383789062 54 | ] 55 | ] 56 | } 57 | ] 58 | } 59 | -------------------------------------------------------------------------------- /sample_data/000030.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/sample_data/000030.png -------------------------------------------------------------------------------- /sample_data/000031.json: -------------------------------------------------------------------------------- 1 | { 2 | "camera_data": {}, 3 | "objects": [ 4 | { 5 | "class": "cracker", 6 | "visibility": 1.0, 7 | "location": [ 8 | 6.001536846160889, 9 | -4.993739604949951, 10 | 86.97349548339844 11 | ], 12 | "quaternion_xyzw": [ 13 | 0.05684935301542282, 14 | 0.32983407378196716, 15 | -0.32128748297691345, 16 | 0.8858622908592224 17 | ], 18 | "projected_cuboid": [ 19 | [ 20 | 294.026123046875, 21 | 258.42864990234375 22 | ], 23 | [ 24 | 407.443115234375, 25 | 274.3418884277344 26 | ], 27 | [ 28 | 363.9100646972656, 29 | 112.13758850097656 30 | ], 31 | [ 32 | 240.95428466796875, 33 | 112.85871887207031 34 | ], 35 | [ 36 | 264.0873718261719, 37 | 293.51092529296875 38 | ], 39 | [ 40 | 379.4239501953125, 41 | 314.3736877441406 42 | ], 43 | [ 44 | 329.1734924316406, 45 | 149.2497100830078 46 | ], 47 | [ 48 | 204.38331604003906, 49 | 145.52359008789062 50 | ], 51 | [ 52 | 309.0063171386719, 53 | 211.89468383789062 54 | ] 55 | ] 56 | } 57 | ] 58 | } 59 | -------------------------------------------------------------------------------- /sample_data/000031.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/sample_data/000031.png -------------------------------------------------------------------------------- /train/.gitignore: -------------------------------------------------------------------------------- 1 | output/* 2 | **__pycache__** 3 | inference/output/* 4 | evaluate/output/* 5 | docker/drivers/* 
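Each annotation in `sample_data/` stores, per object, a `projected_cuboid` list of nine pixel coordinates: the eight box corners followed by the centroid, in the vertex order defined by `CuboidVertexType`. Below is a small sketch of how one of these files could be overlaid on its image for visual inspection; it is not part of the repo, and the edge list simply mirrors `CuboidLineIndexes` from `cuboid.py`:

```
import json

import cv2

# Edges between cuboid corners, in the same vertex order as CuboidVertexType / CuboidLineIndexes.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0),   # front face
         (4, 5), (5, 6), (6, 7), (7, 4),   # rear face
         (0, 4), (1, 5), (2, 6), (3, 7)]   # edges connecting front and rear faces

with open("sample_data/000000.json") as f:
    annotation = json.load(f)

image = cv2.imread("sample_data/000000.png")
for obj in annotation["objects"]:
    points = [tuple(map(int, p)) for p in obj["projected_cuboid"]]
    for start, end in EDGES:
        cv2.line(image, points[start], points[end], (0, 255, 0), 2)
    cv2.circle(image, points[8], 4, (0, 0, 255), -1)  # centroid

cv2.imwrite("overlay_000000.png", image)
```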
-------------------------------------------------------------------------------- /train/README.md: -------------------------------------------------------------------------------- 1 | # Deep Object Pose Estimation (DOPE) - Training 2 | 3 | This repo contains a simplified version of the training pipeline for DOPE. 4 | Scripts for inference, evaluation, and data visualization can be found in this repo's top-level `inference` and `evaluate` directories. 5 | 6 | A user report of training DOPE on a single GPU using NVISII-created synthetic data can [be found here](https://github.com/NVlabs/Deep_Object_Pose/issues/155#issuecomment-791148200). 7 | 8 | 9 | 10 | ## Installing Dependencies 11 | ***Note*** 12 | 13 | It is highly recommended to install these dependencies in a virtual environment. You can create and activate a virtual environment by running: 14 | ``` 15 | python -m venv ./output/dope_training 16 | source ./output/dope_training/bin/activate 17 | ``` 18 | --- 19 | To install the required dependencies, run: 20 | ``` 21 | pip install -r ../requirements.txt 22 | ``` 23 | 24 | ## Training 25 | To run the training script on locally stored data, at minimum the ``--data`` and ``--object`` flags must be specified: 26 | ``` 27 | python -m torch.distributed.launch --nproc_per_node=1 train.py --data PATH_TO_DATA --object CLASS_OF_OBJECT 28 | ``` 29 | The ``--data`` flag specifies the path to the training data. Multiple paths can be passed in. 30 | 31 | The ``--object`` flag specifies the name of the object to train the DOPE model on. 32 | Although multiple objects can be passed in, DOPE is designed to be trained for a specific object. For best results, only specify one object. 33 | The name of this object must match the `"class"` field in the ground-truth `.json` files. 34 | 35 | To get a full list of the command line arguments, run `python train.py --help`. 36 | 37 | ### Loading Data from `s3` 38 | There is also an option to train with data that is stored in an `s3` bucket. The script uses `boto3` to load data from `s3`. 39 | The easiest way to configure credentials with `boto3` is with a config file, which you can [set up using this guide](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#aws-config-file). 40 | 41 | When training with data from `s3`, be sure to specify the ``--use_s3`` flag and also the ``--train_buckets`` flag that indicates which buckets to use for training. 42 | Note that multiple buckets can be specified with the `--train_buckets` flag. 43 | 44 | In addition, the `--endpoint` flag must be specified in order to retrieve data from an `s3` bucket. 45 | 46 | Below is a sample command to run the training script while using data from `s3`: 47 | ``` 48 | torchrun --nproc_per_node=1 train.py --use_s3 --train_buckets BUCKET_1 BUCKET_2 --endpoint ENDPOINT_URL --object CLASS_OF_OBJECT 49 | ``` 50 | 51 | ### Multi-GPU Training 52 | 53 | To run on multi-GPU machines, set `--nproc_per_node` to the number of GPUs available. In addition, reduce the number of epochs by a factor of the number of GPUs you have. 54 | For example, when running on an 8-GPU machine, setting ``--epochs 5`` is equivalent to running `40` epochs on a single-GPU machine. 55 | 56 | ## Debugging 57 | There is an option to visualize the `projected_cuboid_points` in the ground truth file. To do so, run: 58 | ``` 59 | python debug.py --data PATH_TO_IMAGES 60 | ``` 61 | 62 | ## Common Issues 63 | 64 | 1.
If you notice you are running out of memory when training, reduce the batch size by specifying a smaller ``--batchsize`` value. By default, this value is `32`. 65 | 2. If you are running into dependency issues when installing, 66 | you can try to install the version specific dependencies that are commented out in `requirements.txt`. Be sure to do this in a virtual environment. 67 | 68 | -------------------------------------------------------------------------------- /train/docker/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM nvcr.io/nvidia/pytorch:21.02-py3 2 | 3 | # make sure to run ./get_nvidia_libs.sh before running docker build 4 | COPY drivers/* /usr/lib/x86_64-linux-gnu/ 5 | 6 | # Note, installing manually instead of using pip install -r requirements.txt because 7 | # the base PyTorch container already has some dependencies installed. Installing using 8 | # requirements.txt will cause circular dependency issues. 9 | RUN git clone https://github.com/andrewyguo/dope_training.git \ 10 | && pip install --no-input tensorboardX \ 11 | && pip install --no-input boto3 \ 12 | && pip install --no-input albumentations \ 13 | && pip install --no-input pyrr \ 14 | && pip install --no-input simplejson \ 15 | && pip install --no-input visii \ 16 | && pip install --no-input opencv_python==4.5.4.60 \ 17 | && apt-get update \ 18 | && export DEBIAN_FRONTEND=noninteractive \ 19 | && apt-get install s3cmd -y \ 20 | && apt-get install -y libgl1 21 | 22 | WORKDIR /workspace/dope_training 23 | 24 | # Uncomment and fill in if using s3 25 | # RUN mkdir ~/.aws \ 26 | # && echo "[default]" >> ~/.aws/config \ 27 | # && echo "aws_access_key_id = " >> ~/.aws/config \ 28 | # && echo "aws_secret_access_key = " >> ~/.aws/config \ 29 | # # Setup config files for s3 authentication 30 | # && echo "[default]" >> ~/.s3cfg \ 31 | # && echo "use_https = True" >> ~/.s3cfg \ 32 | # && echo "access_key = " >> ~/.s3cfg \ 33 | # && echo "secret_key = " >> ~/.s3cfg \ 34 | # && echo "bucket_location = us-east-1" >> ~/.s3cfg \ 35 | # && echo "host_base = " >> ~/.s3cfg \ 36 | # && echo "host_bucket = bucket-name" >> ~/.s3cfg 37 | -------------------------------------------------------------------------------- /train/docker/get_nvidia_libs.sh: -------------------------------------------------------------------------------- 1 | # Copies over the necessary driver files into ./drivers. 2 | # During docker build, files in ./drivers are copied into the container. 3 | # This is needed to run visii in evaluate.py 4 | cd drivers/ 5 | cp /usr/lib/x86_64-linux-gnu/libnvoptix.so.1 . 6 | cp /usr/lib/x86_64-linux-gnu/*nvidia* . 7 | -------------------------------------------------------------------------------- /train/misc/arial.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/Deep_Object_Pose/6cb3f3d250aed270e91b605e7cb6e64db4b97dbe/train/misc/arial.ttf -------------------------------------------------------------------------------- /train/misc/test_projection.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pyrender 3 | 4 | def transform_points(local_points, local_to_world_matrix): 5 | """ 6 | Transforms 3D points from local to world coordinates using a given transformation matrix. 
7 | 8 | Parameters: 9 | local_points (numpy array of shape (N, 3)): Points in local coordinates 10 | local_to_world_matrix (numpy array of shape (4, 4)): Transformation matrix from local to world coordinates 11 | 12 | Returns: 13 | world_points (numpy array of shape (N, 3)): Points in world coordinates 14 | """ 15 | 16 | # Convert local points to homogeneous coordinates 17 | local_points_hom = np.concatenate([local_points, np.ones((local_points.shape[0], 1))], axis=-1) 18 | local_to_world_matrix_col_major = local_to_world_matrix.T 19 | print_matrix(local_to_world_matrix_col_major, "local_to_world_matrix_col_major") 20 | print_matrix(local_points_hom, "local_points_hom") 21 | 22 | world_points_hom = local_to_world_matrix_col_major @ local_points_hom.T 23 | 24 | # world_points_hom = np.dot(local_to_world_matrix.T, local_points_hom.T) 25 | # print_matrix(world_points_hom, "world_points_hom") 26 | print_matrix(local_to_world_matrix, "local_to_world_matrix") 27 | 28 | world_points = world_points_hom[:, :3] 29 | print_matrix(world_points, "world_points") 30 | return world_points 31 | 32 | def print_matrix(matrix, name): 33 | print(f"--\n{name}: {matrix.shape} ") 34 | for row in matrix: 35 | print(row) 36 | 37 | def get_image_space_points(points, view_proj_matrix): 38 | """ 39 | Args: 40 | points: numpy array of N points (N, 3) in the world space. Points will be projected into the image space. 41 | view_proj_matrix: Desired view projection matrix, transforming points from world frame to image space of desired camera 42 | Returns: 43 | numpy array of shape (N, 3) of points projected into the image space. 44 | """ 45 | 46 | homo = np.pad(points, ((0, 0), (0, 1)), constant_values=1.0) 47 | tf_points = np.dot(homo, view_proj_matrix) 48 | tf_points = tf_points / (tf_points[..., -1:]) 49 | tf_points[..., :2] = 0.5 * (tf_points[..., :2] + 1) 50 | image_space_points = tf_points[..., :3] 51 | 52 | return image_space_points 53 | 54 | 55 | import json 56 | json_fp = "../sample_data/000000.json" 57 | 58 | with open(json_fp) as f: 59 | data = json.load(f) 60 | 61 | camera_data = data["camera_data"] 62 | camera_view_matrix = np.array(camera_data["camera_view_matrix"]) 63 | 64 | cx = camera_data["intrinsics"]["cx"] 65 | cy = camera_data["intrinsics"]["cy"] 66 | fx = camera_data["intrinsics"]["fx"] 67 | fy = camera_data["intrinsics"]["fy"] 68 | 69 | camera_intrinsic_matrix = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]]) 70 | 71 | cam = pyrender.IntrinsicsCamera(fx, fy, cx, cy) 72 | 73 | view_projection_matrix = cam.get_projection_matrix(camera_data["width"], camera_data["height"]) 74 | 75 | view_projection_matrix = view_projection_matrix.T 76 | 77 | points_3d = np.array(data["objects"][0]["local_cuboid"]) 78 | local_to_world_matrix = np.array(data["objects"][0]["local_to_world_matrix"]) 79 | 80 | 81 | world_points_3d = transform_points(points_3d, local_to_world_matrix) 82 | 83 | print_matrix(view_projection_matrix, "view_projection_matrix") 84 | image_space_points = get_image_space_points(world_points_3d, view_projection_matrix) 85 | 86 | resolution = np.array([[camera_data["width"], camera_data["height"], 1.0]]) 87 | image_space_points *= resolution 88 | 89 | projected_cuboid_points = [ 90 | [pixel_coordinate[0], pixel_coordinate[1]] for pixel_coordinate in image_space_points 91 | ] 92 | print("--\nimage_space_points: ") 93 | for row in image_space_points: 94 | print(row) 95 | 96 | -------------------------------------------------------------------------------- /walkthrough.md: 
-------------------------------------------------------------------------------- 1 | # DOPE Pipeline Walkthrough 2 | 3 | Here we provide a detailed example of using the data generation, training, and inference tools provided in this repo. This example is 4 | given not only to demonstrate the various tools, but also to show what 5 | kind of results you can expect. 6 | 7 | We have uploaded our final PTH file as well as some sample data 8 | to [Google Drive](https://drive.google.com/drive/folders/1zq4yJUj8lTn56bWdOMnkCr1Wmj0dq-GL). 9 | 10 | 11 | ## Preparation 12 | We assume you have installed the project dependencies and are in an environment where you have access to GPUs. 13 | 14 | 15 | Follow [the instructions](data_generation/readme.md) for downloading environment maps and distractors. 16 | 17 | Download or create a textured model of your object of interest. For this walkthrough, we will use the [Ketchup](https://drive.google.com/drive/folders/1ICXdhNhahDiUrjh_r5aBMPYbb2aWMCJF?usp=drive_link) model from the [HOPE 3D Model Set](https://drive.google.com/drive/folders/1jiJS9KgcYAkfb8KJPp5MRlB0P11BStft/). 18 | 19 | 20 | For the sake of the example commands, we will assume the following folder 21 | structure: 22 | `~/data/dome_hdri_haven/` contains the HDR environment maps; 23 | `~/data/google_scanned_models/` contains the distractor objects; 24 | `~/data/models/` contains our "hero" models in subdirectories; e.g. `~/data/models/Ketchup`. 25 | 26 | ## Data Generation 27 | We will use the BlenderProc data generation utilities. In the `data_generation/blenderproc_data_gen` directory, run the following command: 28 | 29 | ``` 30 | ./run_blenderproc_datagen.py --nb_runs 10 --nb_frames 50000 --path_single_obj ~/data/models/Ketchup/google_16k/textured.obj --nb_objects 1 --distractors_folder ~/data/google_scanned_models/ --nb_distractors 10 --backgrounds_folder ~/data/dome_hdri_haven/ --outf ~/data/KetchupData 31 | ``` 32 | 33 | This will create ten subdirectories under the `~/data/KetchupData` directory, each containing 5000 images (`nb_images` divided by `nb_runs`). For Blender efficiency reasons, the distractors are only changed from run to run. That is, we will have 10 different selections of distractors in our 50,000 images. If you want 34 | a greater selection of distractors, increase the `nb_runs` parameter. 35 | 36 | ## Training 37 | 38 | Assuming your machine has *N* GPUs, run the following command: 39 | 40 | ``` 41 | python -m torch.distributed.launch --nproc_per_node=N ./train.py --data ~/data/KetchupData --object Ketchup --epochs 2000 --save_every 100 42 | ``` 43 | 44 | This command will train DOPE for 2000 epochs, saving a checkpoint every 100 epochs. 45 | 46 | ## Inference 47 | When training is finished, you will have several saved checkpoints including the final one: `final_net_epoch_2000.pth`. We will use this checkpoint for inference. 48 | 49 | 50 | Generate a small number of new images in the same distribution as your training images. We will use these for inference testing and evaluation. 51 | ``` 52 | ./run_blenderproc_datagen.py --nb_runs 2 --nb_frames 20 --path_single_obj ~/data/models/Ketchup/google_16k/textured.obj --nb_objects 1 --distractors_folder ~/data/google_scanned_models/ --nb_distractors 10 --backgrounds_folder ~/data/dome_hdri_haven/ --outf ~/data/KetchupTest 53 | ``` 54 | For convenience, we have uploaded 20 test images and JSON files to the [Google Drive](https://drive.google.com/drive/folders/1zq4yJUj8lTn56bWdOMnkCr1Wmj0dq-GL) 55 | location mentioned above. 
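Before running inference, it is worth confirming that the class name recorded in the generated annotations matches the object name used for training. A quick check, assuming the BlenderProc output follows the same annotation schema as the files in `sample_data/` (the paths below follow the layout used in this walkthrough):

```
import glob
import json
import os

# Look at one generated annotation (paths follow the layout used above).
annotation_files = sorted(glob.glob(os.path.expanduser("~/data/KetchupTest/**/*.json"), recursive=True))
with open(annotation_files[0]) as f:
    frame = json.load(f)

# Should report the class the network was trained on, e.g. "Ketchup".
print(sorted({obj["class"] for obj in frame["objects"]}))
```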
56 | 57 | 58 | Inside the `inference` directory, run the following command: 59 | ``` 60 | python ./inference.py --camera ../config/blenderproc_camera_info_example.yaml --object Ketchup --parallel --weights final_net_epoch_2000.pth --data ~/data/KetchupTest/ 61 | ``` 62 | 63 | The inference output will be in the `output` directory. Using our provided `final_net_epoch_2000.pth` and our provided test images, DOPE finds the object of interest in 13 out of 20 images. 64 | 65 | -------------------------------------------------------------------------------- /weights/readme.md: -------------------------------------------------------------------------------- 1 | We have trained and tested DOPE with two publicly available datasets: YCB, and HOPE. These trained weights can be 2 | [downloaded from Google Drive](https://drive.google.com/drive/folders/1DfoA3m_Bm0fW8tOWXGVxi4ETlLEAgmcg). 3 | 4 | 5 | --------------------------------------------------------------------------------
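To sanity-check a downloaded checkpoint before wiring it into a config, a quick inspection with PyTorch is usually enough. This is only a sketch: the file name is a placeholder for whichever `.pth` file you downloaded, and it assumes the checkpoint stores a plain tensor-only `state_dict`; the `module.` prefix only appears when the weights were saved from a `torch.nn.DataParallel`-wrapped model:

```
import torch

# Placeholder path; point this at whichever .pth file you downloaded.
state_dict = torch.load("weights/my_object.pth", map_location="cpu")

# Checkpoints saved from a torch.nn.DataParallel model carry a "module." key prefix;
# strip it if you intend to load the weights into an unwrapped network.
state_dict = {k[len("module."):] if k.startswith("module.") else k: v
              for k, v in state_dict.items()}

total_params = sum(v.numel() for v in state_dict.values())
print(f"{len(state_dict)} tensors, {total_params:,} parameters")
```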