├── .gitignore ├── LICENSE.md ├── README.md ├── arch_configs ├── dream_resnet_f.yaml ├── dream_resnet_h.yaml ├── dream_vgg_f.yaml └── dream_vgg_q.yaml ├── data └── DOWNLOAD.sh ├── docker ├── Dockerfile.kinetic ├── README.md ├── init_workspace.sh └── run_dream_docker.sh ├── dream-franka.png ├── dream ├── .gitignore ├── __init__.py ├── add_plots.py ├── analysis.py ├── datasets.py ├── geometric_vision.py ├── image_proc.py ├── models.py ├── network.py ├── oks_plots.py ├── spatial_softmax.py └── utilities.py ├── manip_configs ├── baxter.yaml ├── kuka.yaml └── panda.yaml ├── requirements.txt ├── scripts ├── analyze_training.py ├── analyze_training_multi.py ├── launch_dream_ros.py ├── network_inference.py ├── network_inference_dataset.py ├── train_network.py ├── train_network_multi.py └── visualize_network_inference.py ├── setup.py ├── test ├── _camera_settings.json ├── test_image_proc.py └── test_utilities.py └── trained_models └── DOWNLOAD.sh /.gitignore: -------------------------------------------------------------------------------- 1 | analyze_results/*.png 2 | __pycache__/ 3 | training_results/ 4 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | # NVIDIA Source Code License-NC 2 | 3 | ## 1. Definitions 4 | 5 | “Licensor” means any person or entity that distributes its Work. 6 | 7 | “Software” means the original work of authorship made available under this License. 8 | “Work” means the Software and any additions to or derivative works of the Software that are made available under this License. 9 | 10 | “Nvidia Processors” means any central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC) or any combination thereof designed, made, sold, or provided by Nvidia or its affiliates. 11 | 12 | The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S. copyright law; provided, however, that for the purposes of this License, derivative works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work. 13 | 14 | Works, including the Software, are “made available” under this License by including in or with the Work either (a) a copyright notice referencing the applicability of this License to the Work, or (b) a copy of this License. 15 | 16 | ## 2. License Grants 17 | 18 | ### 2.1 Copyright Grant. 19 | Subject to the terms and conditions of this License, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form. 20 | 21 | ## 3. Limitations 22 | 23 | ### 3.1 Redistribution. 24 | You may reproduce or distribute the Work only if (a) you do so under this License, (b) you include a complete copy of this License with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work. 25 | 26 | ### 3.2 Derivative Works. 
27 | You may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding Your Terms, this License (including the redistribution requirements in Section 3.1) will continue to apply to the Work itself. 28 | 29 | ### 3.3 Use Limitation. 30 | The Work and any derivative works thereof only may be used or intended for use non-commercially. The Work or derivative works thereof may be used or intended for use by Nvidia or its affiliates commercially or non-commercially. As used herein, “non-commercially” means for research or evaluation purposes only. 31 | 32 | ### 3.4 Patent Claims. 33 | If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this License from such Licensor (including the grants in Sections 2.1 and 2.2) will terminate immediately. 34 | 35 | ### 3.5 Trademarks. 36 | This License does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks, except as necessary to reproduce the notices described in this License. 37 | 38 | ### 3.6 Termination. 39 | If you violate any term of this License, then your rights under this License (including the grants in Sections 2.1 and 2.2) will terminate immediately. 40 | 41 | ## 4. Disclaimer of Warranty. 42 | 43 | THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE. 44 | 45 | ## 5. Limitation of Liability. 46 | 47 | EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 48 | 49 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DREAM: Deep Robot-to-Camera Extrinsics for Articulated Manipulators 2 | 3 | This is the official implementation of ["Camera-to-robot pose estimation from a single image"](https://arxiv.org/abs/1911.09231) (ICRA 2020). The DREAM system uses a robot-specific deep neural network to detect keypoints (typically joint locations) in the RGB image of a robot manipulator. Using these keypoint locations along with the robot forward kinematics, the camera pose with respect to the 4 | robot is estimated using a perspective-n-point (PnP) algorithm. For more details, please see our [paper](https://arxiv.org/abs/1911.09231) and [video](https://youtu.be/O1qAFboFQ8A). 5 | 6 | ![DREAM in operation](dream-franka.png)
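The PnP step mentioned above corresponds to `solve_pnp` (and `solve_pnp_ransac`) in `dream/geometric_vision.py`, which wraps OpenCV's solver (EPnP followed by an iterative refinement). Below is a minimal sketch of calling it directly; the keypoint positions, detections, and camera intrinsics are made-up placeholder values, not network outputs:

```
import numpy as np
import dream

# Hypothetical 3D keypoint positions in the robot frame (from forward kinematics)
# and their detected 2D projections in the image (placeholder values only).
kp_positions_wrt_robot = [[0.0, 0.0, 0.05], [0.10, 0.0, 0.35], [0.15, 0.05, 0.60], [0.30, 0.05, 0.80]]
kp_detections_px = [[320.5, 241.2], [355.1, 198.7], [371.9, 150.3], [402.4, 118.8]]

# Placeholder pinhole intrinsics for a 640 x 480 image.
camera_K = np.array([[615.0, 0.0, 320.0],
                     [0.0, 615.0, 240.0],
                     [0.0, 0.0, 1.0]])

# Returns a success flag, a translation, and a rotation (pyrr Quaternion, XYZW order);
# the pose maps robot-frame points into the camera frame (OpenCV solvePnP convention).
retval, translation, quaternion = dream.geometric_vision.solve_pnp(
    kp_positions_wrt_robot, kp_detections_px, camera_K
)
if retval:
    print("translation:", translation)
    print("rotation (quaternion):", quaternion)
```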
7 | 8 | 9 | ## **Installation** 10 | 11 | We have tested on Ubuntu 16.04 and 18.04 with an NVIDIA GeForce RTX 2080 and Titan X, with both Python 2.7 and Python 3.6. The code may work on other systems. 12 | 13 | Install the DREAM package and its dependencies using `pip`: 14 | 15 | ``` 16 | pip install . -r requirements.txt 17 | ``` 18 | 19 | Download the pre-trained models and (optionally) data. In the scripts below, be sure to comment out files you do not want, as they are very large. Alternatively, you can download the files [manually](https://drive.google.com/drive/folders/1Krp-fCT9ffEML3IpweSOgWiMHHBw6k2Z?usp=sharing). 20 | ``` 21 | cd trained_models; ./DOWNLOAD.sh; cd .. 22 | cd data; ./DOWNLOAD.sh; cd .. 23 | ``` 24 | 25 | Unit tests are implemented in the `pytest` framework. Verify your installation by running them: `pytest test/` 26 | 27 | 28 | ## **Offline inference** 29 | 30 | There are three scripts for offline inference: 31 | 32 | ``` 33 | # Run the network on a single image to display detected 2D keypoints. 34 | python scripts/network_inference.py -i <path/to/network.pth> -m <path/to/image> 35 | 36 | # Process a dataset to save both 2D keypoints and 3D poses. 37 | python scripts/network_inference_dataset.py -i <path/to/network.pth> -d <path/to/dataset> -o <path/to/output_dir> -b 16 -w 8 38 | 39 | # Run the network on an image sequence 40 | # (either a dataset or a directory of images, e.g., from a video), 41 | # and save the resulting visualizations as videos. 42 | python scripts/visualize_network_inference.py -i <path/to/network.pth> -d <path/to/dataset> -o <path/to/output_dir> -s <start_frame> -e <end_frame> 43 | ``` 44 | 45 | Pass `-h` for help on command line arguments. Datasets are assumed to be in [NDDS](https://github.com/NVIDIA/Dataset_Synthesizer) format. 46 | 47 | 48 | #### **Example for single-image inference** 49 | 50 | Single-image inference from one frame of the Panda-3Cam RealSense dataset using the DREAM-vgg-Q network: 51 | 52 | ``` 53 | python scripts/network_inference.py -i trained_models/panda_dream_vgg_q.pth -m data/real/panda-3cam_realsense/000000.rgb.jpg 54 | ``` 55 | 56 | You should see the detected keypoints printed to the screen as well as overlaid on the Panda robot. (See the note below regarding the Panda keypoint locations.) 57 | 58 | 59 | #### **Example for dataset inference** 60 | 61 | Inference on the Panda-3Cam RealSense dataset using the DREAM-vgg-Q network: 62 | 63 | ``` 64 | python scripts/network_inference_dataset.py -i trained_models/panda_dream_vgg_q.pth -d data/real/panda-3cam_realsense/ -o <path/to/output_dir> -b 16 -w 8 65 | ``` 66 | 67 | The analysis will print to both the screen and a file. You should see that the area under the curve (AUC) for the percentage of correct keypoints (PCK) is about 0.720, and the AUC for the average distance (ADD) metric is about 0.792. Various visualizations will also be saved to disk. 68 |
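The ADD AUC reported here is computed (see `dream/add_plots.py` and `pnp_metrics` in `dream/analysis.py`) by sweeping a distance threshold from 0 to 0.1 m, measuring the fraction of frames whose ADD falls under each threshold, and normalizing the area under the resulting accuracy curve; frames for which PnP found no solution still count against the denominator. A minimal sketch of that calculation, using made-up ADD values:

```
import numpy as np

# Hypothetical per-frame ADD values in meters (placeholder data, not real results).
add_per_frame = np.array([0.012, 0.025, 0.003, 0.047, 0.110, 0.008])

# Accuracy-vs-threshold curve, swept from 0 to 0.1 m as in dream/add_plots.py.
threshold = 0.1
dx = 0.00001
thresholds = np.arange(0.0, threshold, dx)
accuracy = [np.mean(add_per_frame <= t) for t in thresholds]

# Normalized area under the curve: 1.0 would mean zero error on every frame.
add_auc = np.trapz(accuracy, dx=dx) / threshold
print("ADD AUC: {:.3f}".format(add_auc))
```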
69 | #### **Example for generating inference visualizations** 70 | 71 | Generating visualizations for one of the sequences in the Panda-3Cam RealSense dataset using the DREAM-vgg-Q network: 72 | 73 | ``` 74 | python scripts/visualize_network_inference.py -i trained_models/panda_dream_vgg_q.pth -d data/real/panda-3cam_realsense -o <path/to/output_dir> -fps 120.0 -s 004151 75 | ``` 76 | This creates videos at 4x the normal camera framerate. 77 | 78 | ## **Online inference using ROS** 79 | 80 | A ROS node is provided for real-time camera pose estimation. 81 | Some values, such as ROS topic names, may 82 | need to be changed for your application. Because of incompatibilities between 83 | ROS (before Noetic) and Python 3, the DREAM ROS node is implemented using Python 2.7. For ease 84 | of use, we have provided a [Docker setup](docker/) containing all the necessary 85 | components to run DREAM with ROS Kinetic. 86 | 87 | Example to run the DREAM ROS node (in verbose mode): 88 | 89 | ``` 90 | python scripts/launch_dream_ros.py -i trained_models/baxter_dream_vgg_q.pth -b torso -v 91 | ``` 92 | 93 | 94 | ## **Training** 95 | 96 | Below is an example for training a DREAM-vgg-Q model for the Franka Emika Panda robot: 97 | 98 | ``` 99 | python scripts/train_network.py -i data/synthetic/panda_synth_train_dr/ -t 0.8 -m manip_configs/panda.yaml -ar arch_configs/dream_vgg_q.yaml -e 25 -lr 0.00015 -b 128 -w 16 -o <path/to/output_dir> 100 | ``` 101 | 102 | The models below are defined in the following architecture files: 103 | - DREAM-vgg-Q: `arch_configs/dream_vgg_q.yaml` 104 | - DREAM-vgg-F: `arch_configs/dream_vgg_f.yaml` 105 | - DREAM-resnet-H: `arch_configs/dream_resnet_h.yaml` 106 | - DREAM-resnet-F: `arch_configs/dream_resnet_f.yaml` (very large network and unwieldy to train) 107 | 108 | ## **Note on Panda keypoints** 109 | 110 | By default, keypoints are defined at the joint locations as specified by the robot URDF file. In the case of the Panda robot, the URDF file defines the joints at non-intuitive locations. As a result, visualizations of keypoint detections may appear to be wrong when they are in fact correct (see [our video](https://youtu.be/O1qAFboFQ8A)). We have since modified the URDF to place the keypoints at the actual joint locations (see Fig. 5a of our paper), but for simplicity we are not releasing the modified URDF at this time. 111 | 112 | 113 | ## **Note on reproducing results** 114 | 115 | The experiments in the paper used the image preprocessing type `shrink-and-crop`, which preserves the aspect ratio of the input image but crops the width so that a 400 x 400 image (the resolution used during training) is sent to the network. To allow for full-frame inference, the released models use the default image preprocessing type `resize`, which avoids this cropping. Careful analysis has shown almost no difference in quantitative results, but if you are looking to reproduce our ICRA results exactly, please change the `architecture/image_preprocessing` value to `shrink-and-crop`. 116 | 117 | The PCK and ADD plots in the paper are generated from `oks_plots.py` and `add_plots.py`. The AUC values in these figures (and in Table 1) are in the `analysis_results.txt` file that is produced by `scripts/network_inference_dataset.py`. 118 | 119 | # Further information 120 | 121 | For an example of how DREAM can be used in practice for vision-based object manipulation, please refer to ["Indirect Object-to-Robot Pose Estimation from an External Monocular RGB Camera"](https://research.nvidia.com/publication/2020-07_Indirect-Object-Pose) by Jonathan Tremblay, Stephen Tyree, Terry Mosier, and Stan Birchfield. 122 | 123 | For more information on how DREAM learns robot keypoint detection from sim-to-real transfer using only synthetic domain-randomized images, please refer to our [sim2realAI blog post](https://sim2realai.github.io/dream-camera-calibration-sim2real/). 124 | 125 | 126 | # License 127 | 128 | DREAM is licensed under the [NVIDIA Source Code License - Non-commercial](LICENSE.md). 129 | 130 | 131 | # Citation 132 | 133 | Please cite our work if you use it for your research. Thank you!
134 | 135 | ``` 136 | @inproceedings{lee2020icra:dream, 137 | title={Camera-to-Robot Pose Estimation from a Single Image}, 138 | author={Lee, Timothy E and Tremblay, Jonathan and To, Thang and Cheng, Jia and Mosier, Terry and Kroemer, Oliver and Fox, Dieter and Birchfield, Stan}, 139 | booktitle={International Conference on Robotics and Automation (ICRA)}, 140 | year=2020, 141 | url={https://arxiv.org/abs/1911.09231} 142 | } 143 | ``` 144 | 145 | # Acknowledgment 146 | Thanks to Jeffrey Smith (jeffreys@nvidia.com) for assistance in preparing this release. 147 | -------------------------------------------------------------------------------- /arch_configs/dream_resnet_f.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - architecture: !!omap 3 | - type: resnet 4 | - target: belief_maps 5 | - full_decoder: true 6 | - input_heads: 7 | - image_rgb 8 | - output_heads: 9 | - belief_maps 10 | - image_normalization: !!omap 11 | - mean: [0.5, 0.5, 0.5] 12 | - stdev: [0.5, 0.5, 0.5] 13 | - loss: !!omap 14 | - type: mse 15 | - training: !!omap 16 | - config: !!omap 17 | - image_preprocessing: shrink-and-crop 18 | - net_input_resolution: [400, 400] 19 | -------------------------------------------------------------------------------- /arch_configs/dream_resnet_h.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - architecture: !!omap 3 | - type: resnet 4 | - target: belief_maps 5 | - input_heads: 6 | - image_rgb 7 | - output_heads: 8 | - belief_maps 9 | - image_normalization: !!omap 10 | - mean: [0.5, 0.5, 0.5] 11 | - stdev: [0.5, 0.5, 0.5] 12 | - loss: !!omap 13 | - type: mse 14 | - training: !!omap 15 | - config: !!omap 16 | - image_preprocessing: shrink-and-crop 17 | - net_input_resolution: [400, 400] 18 | -------------------------------------------------------------------------------- /arch_configs/dream_vgg_f.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - architecture: !!omap 3 | - type: vgg 4 | - target: belief_maps 5 | - deconv_decoder: true 6 | - input_heads: 7 | - image_rgb 8 | - output_heads: 9 | - belief_maps 10 | - image_normalization: !!omap 11 | - mean: [0.5, 0.5, 0.5] 12 | - stdev: [0.5, 0.5, 0.5] 13 | - loss: !!omap 14 | - type: mse 15 | - training: !!omap 16 | - config: !!omap 17 | - image_preprocessing: shrink-and-crop 18 | - net_input_resolution: [400, 400] 19 | -------------------------------------------------------------------------------- /arch_configs/dream_vgg_q.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - architecture: !!omap 3 | - type: vgg 4 | - target: belief_maps 5 | - input_heads: 6 | - image_rgb 7 | - output_heads: 8 | - belief_maps 9 | - image_normalization: !!omap 10 | - mean: [0.5, 0.5, 0.5] 11 | - stdev: [0.5, 0.5, 0.5] 12 | - loss: !!omap 13 | - type: mse 14 | - training: !!omap 15 | - config: !!omap 16 | - image_preprocessing: shrink-and-crop 17 | - net_input_resolution: [400, 400] 18 | -------------------------------------------------------------------------------- /data/DOWNLOAD.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # This script makes it easy to download real/synthetic images with annotations. 4 | # (Files are at https://drive.google.com/open?id=1Krp-fCT9ffEML3IpweSOgWiMHHBw6k2Z) 5 | 6 | # Comment out any data you do not want. 7 | 8 | echo 'Warning: Files are *very* large. 
Be sure to comment out any files you do not want.' 9 | 10 | 11 | #----- real test data ----------------------------------- 12 | 13 | mkdir -p real 14 | cd real 15 | 16 | gdown --id 10Tpx8jAfzP6g44WXfvjlVywbIlxZ4BRx # Panda-3Cam, Azure (368 MB) 17 | gdown --id 14TJ9o9QOdb25zlZ3onsOJlSb7-tGrvKz # Panda-3Cam, Kinect360 (295 MB) 18 | gdown --id 1FFAFpJFwzsjD83S9-Y1ODwDWiWlh1X6P # Panda-3Cam, RealSense (343 MB) 19 | gdown --id 1kL7Goibx4lwKQoO-UQ4gm94f_XdEKTUZ # Panda-Orb, RealSense (2 GB) 20 | 21 | cd .. 22 | 23 | 24 | #----- synthetic training / validation data ----------------------------------- 25 | 26 | mkdir -p synthetic 27 | cd synthetic 28 | 29 | #----- Rethink Robotics' Baxter 30 | gdown --id 1MSRwQpg690RvuvtjNuGYA1ILGipX16dW # test DR (541 MB) 31 | gdown --id 1SzUPYmNxe1OsbGyWdpdkoRjWJurs-NAF # train (9 GB) 32 | 33 | #----- Kuka LBR iiwa 7 R800 34 | gdown --id 1kGvSlVScmMohZStS-_NfCpCa5SBAcx_i # test DR (445 MB) 35 | gdown --id 1ChF4jAGMPbPwe2dOZYPJ2t2rCSR0Xw9R # test non-DR (222 MB) 36 | gdown --id 1HTW3YEGDO22zOT56jFWxfizznw4aGMpU # train (8 GB) 37 | 38 | #----- Franka Emika Panda 39 | gdown --id 1tOzfzlRhUbRO-QkzPHvj-IOPFuSIts-R # test DR (450 MB) 40 | gdown --id 11pK1BqfQkzVnTjyQHVRZ6ZkX4oyxbEQP # test non-DR (190 MB) 41 | gdown --id 1ZXzseMa7aMIKxK4BNH2gacmm3_XGJvxm # train (8 GB) 42 | 43 | cd .. 44 | -------------------------------------------------------------------------------- /docker/Dockerfile.kinetic: -------------------------------------------------------------------------------- 1 | FROM nvidia/cudagl:10.0-devel-ubuntu16.04 2 | 3 | # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. 4 | # Full license terms provided in LICENSE.md file. 5 | 6 | # Build with: 7 | # docker build -t nvidia-dream:kinetic-v1 -f Dockerfile.kinetic .. 
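# (The run_dream_docker.sh script in this directory launches a container from this image and expects the tag nvidia-dream:kinetic-v1; see docker/README.md.)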
8 | 9 | ENV HOME /root 10 | 11 | WORKDIR ${HOME} 12 | 13 | RUN apt-get update && apt-get -y --no-install-recommends install software-properties-common 14 | 15 | # install packages required for setup and installing ROS 16 | RUN apt-get update && apt-get install -q -y \ 17 | dirmngr \ 18 | gnupg2 \ 19 | wget \ 20 | python-tk \ 21 | && rm -rf /var/lib/apt/lists/* 22 | 23 | # install python 24 | RUN add-apt-repository ppa:deadsnakes/ppa \ 25 | && apt-get update \ 26 | && apt-get -y --no-install-recommends install python libpython-dev \ 27 | && wget https://bootstrap.pypa.io/get-pip.py \ 28 | && python2.7 get-pip.py 29 | 30 | # Install DREAM requirements 31 | COPY requirements.txt ${HOME} 32 | RUN python2 -m pip install --no-cache-dir -r requirements.txt 33 | 34 | # set up keys 35 | RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 \ 36 | --recv-keys C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654 37 | 38 | # setup sources.list 39 | RUN echo "deb http://packages.ros.org/ros/ubuntu xenial main" > /etc/apt/sources.list.d/ros1-latest.list 40 | 41 | # install bootstrap tools 42 | RUN apt-get update && apt-get install --no-install-recommends -y \ 43 | python-rosdep \ 44 | python-rosinstall \ 45 | python-vcstools \ 46 | && rm -rf /var/lib/apt/lists/* 47 | 48 | # setup environment 49 | ENV LANG C.UTF-8 50 | ENV LC_ALL C.UTF-8 51 | ENV DISPLAY :0 52 | # Some QT apps don't show controls without this 53 | ENV QT_X11_NO_MITSHM 1 54 | 55 | # bootstrap rosdep 56 | ENV ROS_DISTRO kinetic 57 | RUN rosdep init \ 58 | && rosdep update --rosdistro $ROS_DISTRO 59 | 60 | # install ros packages 61 | RUN apt-get update \ 62 | && apt-get install -y ros-kinetic-ros-core=1.3.2-0* \ 63 | && apt-get install --no-install-recommends -y \ 64 | ros-$ROS_DISTRO-rviz \ 65 | ros-$ROS_DISTRO-tf2 \ 66 | ros-$ROS_DISTRO-cv-bridge \ 67 | ros-$ROS_DISTRO-image-view \ 68 | && rm -rf /var/lib/apt/lists/* 69 | 70 | RUN echo 'source /opt/ros/$ROS_DISTRO/setup.bash' >> ${HOME}/.bashrc 71 | RUN echo 'source /root/catkin_ws/devel/setup.bash' >> ${HOME}/.bashrc 72 | 73 | # Setup catkin workspace 74 | ENV CATKIN_WS ${HOME}/catkin_ws 75 | COPY docker/init_workspace.sh ${HOME} 76 | RUN ${HOME}/init_workspace.sh 77 | 78 | # cleanup 79 | RUN rm /root/requirements.txt /root/get-pip.py /root/init_workspace.sh 80 | 81 | CMD ["bash"] 82 | -------------------------------------------------------------------------------- /docker/README.md: -------------------------------------------------------------------------------- 1 | ## DREAM in a Docker Container 2 | 3 | Running ROS inside of [Docker](https://www.docker.com/) is an excellent way to 4 | experiment with DREAM, as it allows the user to completely isolate all software 5 | and configuration changes from the host system. This document describes how to 6 | create and run a Docker image that contains a complete ROS environment that 7 | supports DREAM, including all required components, such as ROS Kinetic, OpenCV, 8 | CUDA, and other packages. 9 | 10 | The current configuration assumes all components are installed on an x86 host 11 | platform running Ubuntu 16.04 or later. Further, use of the DREAM Docker container 12 | requires an NVIDIA GPU. 13 | 14 | 15 | ### Steps 16 | 17 | 1. **Create the container** 18 | ``` 19 | $ cd deep-arm-cal/docker 20 | $ docker build -t nvidia-dream:kinetic-v1 -f Dockerfile.kinetic .. 21 | ``` 22 | This will take several minutes and requires an internet connection. 23 | 24 | 2.
**Run the container** 25 | ``` 26 | $ ./run_dream_docker.sh [name] [host dir] [container dir] 27 | ``` 28 | Parameters: 29 | - `name` is an optional field that specifies the name of this image. By default, it is `nvidia-dream-v1`. By using different names, you can create multiple containers from the same image. 30 | - `host dir` and `container dir` are a pair of optional fields that allow you to specify a mapping between a directory on your host machine and a location inside the container. This is useful for sharing code and data between the two systems. By default, it maps the directory containing DREAM to `/root/catkin_ws/src/dream` in the container. 31 | 32 | Only the first invocation of this script with a given name will create a container. Subsequent executions will attach to the running container allowing you -- in effect -- to have multiple terminal sessions into a single container. 33 | 34 | 3. **Install DREAM** 35 | 36 | Inside the running Docker container: 37 | ``` 38 | $ cd /root/catkin_ws/src/dream 39 | $ pip install -e . 40 | ``` 41 | 42 | 4. **Run the tests** 43 | 44 | Inside the running Docker container: 45 | ``` 46 | $ cd /root/catkin_ws/src/dream 47 | $ pytest test/ 48 | ``` 49 | 50 | See the README.md in the top-level directory of this repository for information about 51 | running DREAM. -------------------------------------------------------------------------------- /docker/init_workspace.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | 4 | # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. 5 | # Full license terms provided in LICENSE.md file. 6 | 7 | # Stop in case of any error. 8 | set -e 9 | 10 | source /opt/ros/kinetic/setup.bash 11 | 12 | # Create catkin workspace. 13 | mkdir -p ${CATKIN_WS}/src 14 | cd ${CATKIN_WS}/src 15 | catkin_init_workspace 16 | cd .. 17 | catkin_make 18 | source devel/setup.bash 19 | -------------------------------------------------------------------------------- /docker/run_dream_docker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. 4 | # Full license terms provided in LICENSE.md file. 5 | 6 | CONTAINER_NAME=$1 7 | if [[ -z "${CONTAINER_NAME}" ]]; then 8 | CONTAINER_NAME=nvidia-dream-v1 9 | fi 10 | 11 | # This specifies a mapping between a host directory and a directory in the 12 | # docker container. This mapping should be changed if you wish to have access to 13 | # a different directory 14 | HOST_DIR=$2 15 | if [[ -z "${HOST_DIR}" ]]; then 16 | HOST_DIR=`realpath ${PWD}/..` 17 | fi 18 | 19 | CONTAINER_DIR=$3 20 | if [[ -z "${CONTAINER_DIR}" ]]; then 21 | CONTAINER_DIR=/root/catkin_ws/src/dream 22 | fi 23 | 24 | echo "Container name : ${CONTAINER_NAME}" 25 | echo "Host directory : ${HOST_DIR}" 26 | echo "Container directory: ${CONTAINER_DIR}" 27 | DREAM_ID=`docker ps -aqf "name=^/${CONTAINER_NAME}$"` 28 | if [ -z "${DREAM_ID}" ]; then 29 | echo "Creating new DREAM docker container." 30 | xhost + 31 | docker run --gpus all -it --privileged --network=host -v ${HOST_DIR}:${CONTAINER_DIR}:rw -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix${DISPLAY} --name=${CONTAINER_NAME} nvidia-dream:kinetic-v1 bash 32 | else 33 | echo "Found DREAM docker container: ${DREAM_ID}." 34 | # Check if the container is already running and start if necessary. 
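# (Unlike the `docker ps -aqf` call above, omitting -a here matches running containers only.)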
35 | if [ -z `docker ps -qf "name=^/${CONTAINER_NAME}$"` ]; then 36 | xhost +local:${DREAM_ID} 37 | echo "Starting and attaching to ${CONTAINER_NAME} container..." 38 | docker start ${DREAM_ID} 39 | docker attach ${DREAM_ID} 40 | else 41 | echo "Found running ${CONTAINER_NAME} container, attaching bash..." 42 | docker exec -it ${DREAM_ID} bash 43 | fi 44 | fi 45 | -------------------------------------------------------------------------------- /dream-franka.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/DREAM/3360f2aa45f66a58eaa70d0d40f2c46c2682c0bb/dream-franka.png -------------------------------------------------------------------------------- /dream/.gitignore: -------------------------------------------------------------------------------- 1 | *.csv 2 | *.pdf 3 | -------------------------------------------------------------------------------- /dream/__init__.py: -------------------------------------------------------------------------------- 1 | __version__ = "1.3.0" 2 | 3 | from .analysis import * 4 | from .datasets import * 5 | from .geometric_vision import * 6 | from .image_proc import * 7 | from .network import * 8 | from .models import * 9 | from .utilities import * 10 | -------------------------------------------------------------------------------- /dream/add_plots.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import pandas as pd 6 | import argparse 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | import seaborn as sns 10 | 11 | plt.style.use("seaborn-whitegrid") 12 | 13 | 14 | # Example of running the script 15 | # python oks_plots.py --data all_dataset_keypoints.csv all_dataset_keypoints.csv --labels 1 2 16 | 17 | # pythonw oks_plots.py --data deep-arm-cal-paper/data/dope/3cam_real_keypoints.csv deep-arm-cal-paper/data/dream_hg/3cam_real_keypoints.csv deep-arm-cal-paper/data/dream_hg_deconv/3cam_real_keypoints.csv deep-arm-cal-paper/data/resimple/3cam_real_keypoints.csv --labels DOPE DREAM AE resnet 18 | parser = argparse.ArgumentParser(description="OKS for DREAM") 19 | parser.add_argument( 20 | "--data", nargs="+", default="[all_dataset_keypoints.csv]", help="list of csv files" 21 | ) 22 | 23 | parser.add_argument( 24 | "--labels", 25 | nargs="+", 26 | default=None, 27 | help="names for each dataset to be added as label", 28 | ) 29 | 30 | parser.add_argument("--styles", nargs="+", default=None, help="") 31 | parser.add_argument("--threshold", default=0.1) 32 | 33 | parser.add_argument("--colours", nargs="+", default=None, help="") 34 | 35 | parser.add_argument("--pixel", default=20) 36 | 37 | parser.add_argument("--output", default="output.pdf") 38 | 39 | parser.add_argument("--show", default=False, action="store_true") 40 | 41 | parser.add_argument("--divide", default=False, action="store_true") 42 | 43 | 44 | parser.add_argument("--title", default=None) 45 | 46 | args = parser.parse_args() 47 | print(args) 48 | 49 | 50 | fig = plt.figure() 51 | ax = plt.axes() 52 | 53 | for i_csv, csv_file in enumerate(args.data): 54 | print(csv_file) 55 | 56 | if csv_file == "666": 57 | plt.plot([], [], " ", label=args.labels[i_csv].replace("_", " ")) 58 | continue 59 | 60 | name_csv = csv_file.replace(".csv", "") 61 | 62 | df = pd.read_csv(csv_file) 63 | 64 | if 
args.divide is True: 65 | add = np.array(df["add"].tolist()) / 100 66 | else: 67 | add = np.array(df["add"].tolist()) 68 | 69 | pnp_sol_found_magic_number = -9.99 if args.divide else -999.0 70 | 71 | n_inframe_gt_projs = np.array(df["n_inframe_gt_projs"].tolist()) 72 | n_pnp_possible_frames = len(np.where(n_inframe_gt_projs >= 4)[0]) 73 | add_pnp_found = add[np.where(add > pnp_sol_found_magic_number)] 74 | n_pnp_found = len(add_pnp_found) 75 | 76 | delta_threshold = 0.00001 77 | add_threshold_values = np.arange(0.0, args.threshold, delta_threshold) 78 | 79 | counts = [] 80 | for value in add_threshold_values: 81 | under_threshold = ( 82 | len(np.where(add_pnp_found <= value)[0]) / n_pnp_possible_frames 83 | ) 84 | counts.append(under_threshold) 85 | 86 | auc = np.trapz(counts, dx=delta_threshold) / args.threshold 87 | 88 | # TODO: consolidate above calculations with pnp_metrics 89 | from dream import analysis as dream_analysis 90 | 91 | pnp_metrics = dream_analysis.pnp_metrics(df["add"], df["n_inframe_gt_projs"]) 92 | assert pnp_metrics["add_auc"] == auc 93 | assert pnp_metrics["add_mean"] == np.mean( 94 | add[np.where(add > pnp_sol_found_magic_number)] 95 | ) 96 | assert pnp_metrics["add_median"] == np.median( 97 | add[np.where(add > pnp_sol_found_magic_number)] 98 | ) 99 | assert pnp_metrics["add_std"] == np.std( 100 | add[np.where(add > pnp_sol_found_magic_number)] 101 | ) 102 | assert pnp_metrics["num_pnp_found"] == n_pnp_found 103 | assert pnp_metrics["num_pnp_possible"] == n_pnp_possible_frames 104 | 105 | # divide might screw this up .... to check! 106 | print("auc", auc) 107 | print("found", n_pnp_found / n_pnp_possible_frames) 108 | print("mean", np.mean(add[np.where(add > pnp_sol_found_magic_number)])) 109 | print("median", np.median(add[np.where(add > pnp_sol_found_magic_number)])) 110 | print("std", np.std(add[np.where(add > pnp_sol_found_magic_number)])) 111 | 112 | try: 113 | label = args.labels[i_csv] 114 | except: 115 | label = name_csv 116 | cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"] 117 | try: 118 | 119 | colour = cycle[int(args.colours[i_csv])] 120 | except: 121 | colour = "" 122 | 123 | try: 124 | style = args.styles[i_csv] 125 | if style == "0": 126 | style = "-" 127 | elif style == "1": 128 | style = "--" 129 | elif style == "2": 130 | style = ":" 131 | 132 | else: 133 | style = "-" 134 | except: 135 | style = "-" 136 | 137 | label = f"{label} ({auc:.3f})" 138 | ax.plot(add_threshold_values, counts, style, color=colour, label=label) 139 | 140 | plt.xlabel("ADD threshold distance (mm)") 141 | plt.ylabel("Accuracy") 142 | plt.title(args.title) 143 | ax.legend(loc="lower right", frameon=True, fancybox=True, framealpha=0.8) 144 | 145 | 146 | legend = ax.get_legend() 147 | for i, t in enumerate(legend.get_texts()): 148 | if args.data[i] == "666": 149 | t.set_ha("left") # ha is alias for horizontalalignment 150 | t.set_position((-30, 0)) 151 | 152 | ax.set_ylim(0, 1) 153 | ax.set_xlim(0, float(args.threshold)) 154 | ax.set_xticklabels([0, 20, 40, 60, 80, 100]) 155 | 156 | plt.savefig(args.output) 157 | if args.show: 158 | plt.show() 159 | -------------------------------------------------------------------------------- /dream/datasets.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | from enum import IntEnum 6 | 7 | import albumentations as albu 8 | import numpy as np 9 | from PIL import Image as PILImage 10 | import torch 11 | from torch.utils.data import Dataset as TorchDataset 12 | import torchvision.transforms as TVTransforms 13 | 14 | import dream 15 | 16 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 17 | 18 | # Debug mode: 19 | # 0: no debug mode 20 | # 1: light debug 21 | # 2: heavy debug 22 | class ManipulatorNDDSDatasetDebugLevels(IntEnum): 23 | # No debug information 24 | NONE = 0 25 | # Minor debug information, passing of extra info but not saving to disk 26 | LIGHT = 1 27 | # Heavy debug information, including saving data to disk 28 | HEAVY = 2 29 | # Interactive debug mode, not intended to be used for actual training 30 | INTERACTIVE = 3 31 | 32 | 33 | class ManipulatorNDDSDataset(TorchDataset): 34 | def __init__( 35 | self, 36 | ndds_dataset, 37 | manipulator_name, 38 | keypoint_names, 39 | network_input_resolution, 40 | network_output_resolution, 41 | image_normalization, 42 | image_preprocessing, 43 | augment_data=False, 44 | include_ground_truth=True, 45 | include_belief_maps=False, 46 | debug_mode=ManipulatorNDDSDatasetDebugLevels["NONE"], 47 | ): 48 | # Read in the camera intrinsics 49 | self.ndds_dataset_data = ndds_dataset[0] 50 | self.ndds_dataset_config = ndds_dataset[1] 51 | self.manipulator_name = manipulator_name 52 | self.keypoint_names = keypoint_names 53 | self.network_input_resolution = network_input_resolution 54 | self.network_output_resolution = network_output_resolution 55 | self.augment_data = augment_data 56 | 57 | # If include_belief_maps is specified, include_ground_truth must also be 58 | # TBD: revisit better way of passing inputs, maybe to make one argument instead of two 59 | if include_belief_maps: 60 | assert ( 61 | include_ground_truth 62 | ), 'If "include_belief_maps" is True, "include_ground_truth" must also be True.' 63 | self.include_ground_truth = include_ground_truth 64 | self.include_belief_maps = include_belief_maps 65 | 66 | self.debug_mode = debug_mode 67 | 68 | assert ( 69 | isinstance(image_normalization, dict) or not image_normalization 70 | ), 'Expected image_normalization to be either a dict specifying "mean" and "stdev", or None or False to specify no normalization.' 71 | 72 | # Image normalization 73 | # Basic PIL -> tensor without normalization, used for visualizing the net input image 74 | self.tensor_from_image_no_norm_tform = TVTransforms.Compose( 75 | [TVTransforms.ToTensor()] 76 | ) 77 | 78 | if image_normalization: 79 | assert ( 80 | "mean" in image_normalization and len(image_normalization["mean"]) == 3 81 | ), 'When image_normalization is a dict, expected key "mean" specifying a 3-tuple to exist, but it does not.' 82 | assert ( 83 | "stdev" in image_normalization 84 | and len(image_normalization["stdev"]) == 3 85 | ), 'When image_normalization is a dict, expected key "stdev" specifying a 3-tuple to exist, but it does not.' 
86 | 87 | self.tensor_from_image_tform = TVTransforms.Compose( 88 | [ 89 | TVTransforms.ToTensor(), 90 | TVTransforms.Normalize( 91 | image_normalization["mean"], image_normalization["stdev"] 92 | ), 93 | ] 94 | ) 95 | else: 96 | # Use the PIL -> tensor tform without normalization if image_normalization isn't specified 97 | self.tensor_from_image_tform = self.tensor_from_image_no_norm_tform 98 | 99 | assert ( 100 | image_preprocessing in dream.image_proc.KNOWN_IMAGE_PREPROC_TYPES 101 | ), 'Image preprocessing type "{}" is not recognized.'.format( 102 | image_preprocessing 103 | ) 104 | self.image_preprocessing = image_preprocessing 105 | 106 | def __len__(self): 107 | return len(self.ndds_dataset_data) 108 | 109 | def __getitem__(self, index): 110 | 111 | # Parse this datum 112 | datum = self.ndds_dataset_data[index] 113 | image_rgb_path = datum["image_paths"]["rgb"] 114 | 115 | # Extract keypoints from the json file 116 | data_path = datum["data_path"] 117 | if self.include_ground_truth: 118 | keypoints = dream.utilities.load_keypoints( 119 | data_path, self.manipulator_name, self.keypoint_names 120 | ) 121 | else: 122 | # Generate an empty 'keypoints' dict 123 | keypoints = dream.utilities.load_keypoints( 124 | data_path, self.manipulator_name, [] 125 | ) 126 | 127 | # Load image and transform to network input resolution -- pre augmentation 128 | image_rgb_raw = PILImage.open(image_rgb_path).convert("RGB") 129 | image_raw_resolution = image_rgb_raw.size 130 | 131 | # Do image preprocessing, including keypoint conversion 132 | image_rgb_before_aug = dream.image_proc.preprocess_image( 133 | image_rgb_raw, self.network_input_resolution, self.image_preprocessing 134 | ) 135 | kp_projs_before_aug = dream.image_proc.convert_keypoints_to_netin_from_raw( 136 | keypoints["projections"], 137 | image_raw_resolution, 138 | self.network_input_resolution, 139 | self.image_preprocessing, 140 | ) 141 | 142 | # Handle data augmentation 143 | if self.augment_data: 144 | augmentation = albu.Compose( 145 | [ 146 | albu.GaussNoise(), 147 | albu.RandomBrightnessContrast(brightness_by_max=False), 148 | albu.ShiftScaleRotate(rotate_limit=15), 149 | ], 150 | p=1.0, 151 | keypoint_params={"format": "xy", "remove_invisible": False}, 152 | ) 153 | data_to_aug = { 154 | "image": np.array(image_rgb_before_aug), 155 | "keypoints": kp_projs_before_aug, 156 | } 157 | augmented_data = augmentation(**data_to_aug) 158 | image_rgb_net_input = PILImage.fromarray(augmented_data["image"]) 159 | kp_projs_net_input = augmented_data["keypoints"] 160 | else: 161 | image_rgb_net_input = image_rgb_before_aug 162 | kp_projs_net_input = kp_projs_before_aug 163 | 164 | assert ( 165 | image_rgb_net_input.size == self.network_input_resolution 166 | ), "Expected resolution for image_rgb_net_input to be equal to specified network input resolution, but they are different." 
167 | 168 | # Now convert keypoints at network input to network output for use as the trained label 169 | kp_projs_net_output = dream.image_proc.convert_keypoints_to_netout_from_netin( 170 | kp_projs_net_input, 171 | self.network_input_resolution, 172 | self.network_output_resolution, 173 | ) 174 | 175 | # Convert to tensor for output handling 176 | # This one goes through image normalization (used for inference) 177 | image_rgb_net_input_as_tensor = self.tensor_from_image_tform( 178 | image_rgb_net_input 179 | ) 180 | 181 | # This one is not (used for net input overlay visualizations - hence "viz") 182 | image_rgb_net_input_viz_as_tensor = self.tensor_from_image_no_norm_tform( 183 | image_rgb_net_input 184 | ) 185 | 186 | # Convert keypoint data to tensors - use float32 size 187 | keypoint_positions_wrt_cam_as_tensor = torch.from_numpy( 188 | np.array(keypoints["positions_wrt_cam"]) 189 | ).float() 190 | kp_projs_net_output_as_tensor = torch.from_numpy( 191 | np.array(kp_projs_net_output) 192 | ).float() 193 | 194 | # Construct output sample 195 | sample = { 196 | "image_rgb_input": image_rgb_net_input_as_tensor, 197 | "keypoint_projections_output": kp_projs_net_output_as_tensor, 198 | "keypoint_positions": keypoint_positions_wrt_cam_as_tensor, 199 | "config": datum, 200 | } 201 | 202 | # Generate the belief maps directly 203 | if self.include_belief_maps: 204 | belief_maps = dream.image_proc.create_belief_map( 205 | self.network_output_resolution, kp_projs_net_output_as_tensor 206 | ) 207 | belief_maps_as_tensor = torch.tensor(belief_maps).float() 208 | sample["belief_maps"] = belief_maps_as_tensor 209 | 210 | if self.debug_mode >= ManipulatorNDDSDatasetDebugLevels["LIGHT"]: 211 | kp_projections_as_tensor = torch.from_numpy( 212 | np.array(keypoints["projections"]) 213 | ).float() 214 | sample["keypoint_projections_raw"] = kp_projections_as_tensor 215 | kp_projections_input_as_tensor = torch.from_numpy( 216 | kp_projs_net_input 217 | ).float() 218 | sample["keypoint_projections_input"] = kp_projections_input_as_tensor 219 | image_raw_resolution_as_tensor = torch.tensor(image_raw_resolution).float() 220 | sample["image_resolution_raw"] = image_raw_resolution_as_tensor 221 | sample["image_rgb_input_viz"] = image_rgb_net_input_viz_as_tensor 222 | 223 | # TODO: same as LIGHT debug, but also saves to disk 224 | if self.debug_mode >= ManipulatorNDDSDatasetDebugLevels["HEAVY"]: 225 | pass 226 | 227 | # Display to screen 228 | if self.debug_mode >= ManipulatorNDDSDatasetDebugLevels["INTERACTIVE"]: 229 | # Ensure that the points are consistent with the image transformations 230 | # The overlaid points on both image should be consistent, despite image transformations 231 | debug_image_raw = dream.image_proc.overlay_points_on_image( 232 | image_rgb_raw, keypoints["projections"], self.keypoint_names 233 | ) 234 | debug_image_raw.show() 235 | 236 | debug_image = dream.image_proc.overlay_points_on_image( 237 | image_rgb_net_input, kp_projs_net_input, self.keypoint_names 238 | ) 239 | debug_image.show() 240 | 241 | # Also show that the output resolution data are consistent 242 | image_rgb_net_output = image_rgb_net_input.resize( 243 | self.network_output_resolution, resample=PILImage.BILINEAR 244 | ) 245 | debug_image_rgb_net_output = dream.image_proc.overlay_points_on_image( 246 | image_rgb_net_output, kp_projs_net_output, self.keypoint_names 247 | ) 248 | debug_image_rgb_net_output.show() 249 | 250 | if self.include_belief_maps: 251 | for kp_idx in range(len(self.keypoint_names)): 252 | belief_map_kp 
= dream.image_proc.image_from_belief_map( 253 | belief_maps_as_tensor[kp_idx] 254 | ) 255 | belief_map_kp.show() 256 | 257 | belief_map_kp_upscaled = belief_map_kp.resize( 258 | self.network_input_resolution, resample=PILImage.BILINEAR 259 | ) 260 | image_rgb_net_output_belief_blend = PILImage.blend( 261 | image_rgb_net_input, belief_map_kp_upscaled, alpha=0.5 262 | ) 263 | image_rgb_net_output_belief_blend_overlay = dream.image_proc.overlay_points_on_image( 264 | image_rgb_net_output_belief_blend, 265 | [kp_projs_net_input[kp_idx]], 266 | [self.keypoint_names[kp_idx]], 267 | ) 268 | image_rgb_net_output_belief_blend_overlay.show() 269 | 270 | # This only works if the number of workers is zero 271 | input("Press Enter to continue...") 272 | 273 | return sample 274 | 275 | 276 | if __name__ == "__main__": 277 | from PIL import Image 278 | 279 | # beliefs = CreateBeliefMap((100,100),[(50,50),(-1,-1),(0,50),(50,0),(10,10)]) 280 | # for i,b in enumerate(beliefs): 281 | # print(b.shape) 282 | # stack = np.stack([b,b,b],axis=0).transpose(2,1,0) 283 | # im = Image.fromarray((stack*255).astype('uint8')) 284 | # im.save('{}.png'.format(i)) 285 | 286 | path = "/home/sbirchfield/data/FrankaSimpleHomeDR20k/" 287 | # path = '/home/sbirchfield/data/FrankaSimpleMPGammaDR105k/' 288 | 289 | keypoint_names = [ 290 | "panda_link0", 291 | "panda_link2", 292 | "panda_link3", 293 | "panda_link4", 294 | "panda_link6", 295 | "panda_link7", 296 | "panda_hand", 297 | ] 298 | 299 | found_data = dream.utilities.find_ndds_data_in_dir(path) 300 | train_dataset = ManipulatorNDDSDataset( 301 | found_data, 302 | "panda", 303 | keypoint_names, 304 | (400, 400), 305 | (100, 100), 306 | include_belief_maps=True, 307 | augment_data=True, 308 | ) 309 | trainingdata = torch.utils.data.DataLoader( 310 | train_dataset, batch_size=32, shuffle=False, num_workers=1, pin_memory=True 311 | ) 312 | 313 | targets = iter(trainingdata).next() 314 | 315 | for i, b in enumerate(targets["belief_maps"][0]): 316 | print(b.shape) 317 | stack = np.stack([b, b, b], axis=0).transpose(2, 1, 0) 318 | im = Image.fromarray((stack * 255).astype("uint8")) 319 | im.save("{}.png".format(i)) 320 | -------------------------------------------------------------------------------- /dream/geometric_vision.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import cv2 6 | import numpy as np 7 | from pyrr import Quaternion 8 | 9 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 10 | 11 | 12 | def convert_rvec_to_quaternion(rvec): 13 | """Convert rvec (which is log quaternion) to quaternion""" 14 | theta = np.sqrt( 15 | rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2] 16 | ) # in radians 17 | raxis = [rvec[0] / theta, rvec[1] / theta, rvec[2] / theta] 18 | 19 | # pyrr's Quaternion (order is XYZW), https://pyrr.readthedocs.io/en/latest/oo_api_quaternion.html 20 | quaternion = Quaternion.from_axis_rotation(raxis, theta) 21 | quaternion.normalize() 22 | return quaternion 23 | 24 | 25 | def hnormalized(vector): 26 | hnormalized_vector = (vector / vector[-1])[:-1] 27 | return hnormalized_vector 28 | 29 | 30 | def point_projection_from_3d(camera_K, points): 31 | projections = [] 32 | for p in points: 33 | p_unflattened = np.matmul(camera_K, p) 34 | projection = hnormalized(p_unflattened) 35 | projections.append(projection) 36 | projections = np.array(projections) 37 | return projections 38 | 39 | 40 | def solve_pnp( 41 | canonical_points, 42 | projections, 43 | camera_K, 44 | method=cv2.SOLVEPNP_EPNP, 45 | refinement=True, 46 | dist_coeffs=np.array([]), 47 | ): 48 | 49 | n_canonial_points = len(canonical_points) 50 | n_projections = len(projections) 51 | assert ( 52 | n_canonial_points == n_projections 53 | ), "Expected canonical_points and projections to have the same length, but they are length {} and {}.".format( 54 | n_canonial_points, n_projections 55 | ) 56 | 57 | # Process points to remove any NaNs 58 | canonical_points_proc = [] 59 | projections_proc = [] 60 | for canon_pt, proj in zip(canonical_points, projections): 61 | 62 | if ( 63 | canon_pt is None 64 | or len(canon_pt) == 0 65 | or canon_pt[0] is None 66 | or canon_pt[1] is None 67 | or proj is None 68 | or len(proj) == 0 69 | or proj[0] is None 70 | or proj[1] is None 71 | ): 72 | continue 73 | 74 | canonical_points_proc.append(canon_pt) 75 | projections_proc.append(proj) 76 | 77 | # Return if no valid points 78 | if len(canonical_points_proc) == 0: 79 | return False, None, None 80 | 81 | canonical_points_proc = np.array(canonical_points_proc) 82 | projections_proc = np.array(projections_proc) 83 | 84 | # Use cv2's PNP solver 85 | try: 86 | pnp_retval, rvec, tvec = cv2.solvePnP( 87 | canonical_points_proc.reshape(canonical_points_proc.shape[0], 1, 3), 88 | projections_proc.reshape(projections_proc.shape[0], 1, 2), 89 | camera_K, 90 | dist_coeffs, 91 | flags=method, 92 | ) 93 | 94 | if refinement: 95 | pnp_retval, rvec, tvec = cv2.solvePnP( 96 | canonical_points_proc.reshape(canonical_points_proc.shape[0], 1, 3), 97 | projections_proc.reshape(projections_proc.shape[0], 1, 2), 98 | camera_K, 99 | dist_coeffs, 100 | flags=cv2.SOLVEPNP_ITERATIVE, 101 | useExtrinsicGuess=True, 102 | rvec=rvec, 103 | tvec=tvec, 104 | ) 105 | translation = tvec[:, 0] 106 | quaternion = convert_rvec_to_quaternion(rvec[:, 0]) 107 | 108 | except: 109 | pnp_retval = False 110 | translation = None 111 | quaternion = None 112 | 113 | return pnp_retval, translation, quaternion 114 | 115 | 116 | def solve_pnp_ransac( 117 | canonical_points, 118 | projections, 119 | camera_K, 120 | method=cv2.SOLVEPNP_EPNP, 121 | inlier_thresh_px=5.0, # this is the threshold for each point to be considered an inlier 122 | dist_coeffs=np.array([]), 123 | ): 124 | 125 | n_canonial_points = len(canonical_points) 126 | n_projections = 
len(projections) 127 | assert ( 128 | n_canonial_points == n_projections 129 | ), "Expected canonical_points and projections to have the same length, but they are length {} and {}.".format( 130 | n_canonial_points, n_projections 131 | ) 132 | 133 | # Process points to remove any NaNs 134 | canonical_points_proc = [] 135 | projections_proc = [] 136 | for canon_pt, proj in zip(canonical_points, projections): 137 | 138 | if ( 139 | canon_pt is None 140 | or len(canon_pt) == 0 141 | or canon_pt[0] is None 142 | or canon_pt[1] is None 143 | or proj is None 144 | or len(proj) == 0 145 | or proj[0] is None 146 | or proj[1] is None 147 | ): 148 | continue 149 | 150 | canonical_points_proc.append(canon_pt) 151 | projections_proc.append(proj) 152 | 153 | # Return if no valid points 154 | if len(canonical_points_proc) == 0: 155 | return False, None, None, None 156 | 157 | canonical_points_proc = np.array(canonical_points_proc) 158 | projections_proc = np.array(projections_proc) 159 | 160 | # Use cv2's PNP solver 161 | try: 162 | pnp_retval, rvec, tvec, inliers = cv2.solvePnPRansac( 163 | canonical_points_proc.reshape(canonical_points_proc.shape[0], 1, 3), 164 | projections_proc.reshape(projections_proc.shape[0], 1, 2), 165 | camera_K, 166 | dist_coeffs, 167 | reprojectionError=inlier_thresh_px, 168 | flags=method, 169 | ) 170 | 171 | translation = tvec[:, 0] 172 | quaternion = convert_rvec_to_quaternion(rvec[:, 0]) 173 | 174 | except: 175 | pnp_retval = False 176 | translation = None 177 | quaternion = None 178 | inliers = None 179 | 180 | return pnp_retval, translation, quaternion, inliers 181 | 182 | 183 | def add_from_pose(translation, quaternion, keypoint_positions_wrt_cam_gt, camera_K): 184 | transform = np.eye(4) 185 | transform[:3, :3] = quaternion.matrix33.tolist() 186 | transform[:3, -1] = translation 187 | kp_pos_gt_homog = np.hstack( 188 | ( 189 | keypoint_positions_wrt_cam_gt, 190 | np.ones((keypoint_positions_wrt_cam_gt.shape[0], 1)), 191 | ) 192 | ) 193 | kp_pos_aligned = np.transpose(np.matmul(transform, np.transpose(kp_pos_gt_homog)))[ 194 | :, :3 195 | ] 196 | # The below lines were useful when debugging pnp ransac, so left here for now 197 | # projs = point_projection_from_3d(camera_K, kp_pos_aligned) 198 | # temp = np.linalg.norm(kp_projs_est_pnp - projs, axis=1) # all of these should be below the inlier threshold above! 199 | kp_3d_errors = kp_pos_aligned - keypoint_positions_wrt_cam_gt 200 | kp_3d_l2_errors = np.linalg.norm(kp_3d_errors, axis=1) 201 | add = np.mean(kp_3d_l2_errors) 202 | return add 203 | -------------------------------------------------------------------------------- /dream/network.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import os 6 | from PIL import Image as PILImage 7 | 8 | import numpy as np 9 | import ruamel.yaml 10 | import torch 11 | import torchvision.transforms as TVTransforms 12 | 13 | from .spatial_softmax import SoftArgmaxPavlo 14 | import dream 15 | 16 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 17 | 18 | KNOWN_ARCHITECTURES = [ 19 | "vgg", 20 | "resnet", 21 | ] 22 | 23 | KNOWN_OPTIMIZERS = [ 24 | "adam", # the Adam optimizer 25 | "sgd", 26 | ] # the Stochastic Gradient Descent optimizer 27 | 28 | 29 | def create_network_from_config_file(config_file_path, network_params_path=None): 30 | 31 | # Input argument handling 32 | assert os.path.exists( 33 | config_file_path 34 | ), 'Expected config_file_path "{}" to exist, but it does not.'.format( 35 | config_file_path 36 | ) 37 | 38 | if network_params_path: 39 | load_network_parameters = True 40 | assert os.path.exists( 41 | network_params_path 42 | ), 'If provided, expected network_params_path "{}" to exist, but it does not.'.format( 43 | network_params_path 44 | ) 45 | else: 46 | load_network_parameters = False 47 | 48 | # Create parser 49 | data_parser = ruamel.yaml.YAML(typ="safe") 50 | 51 | print('Loading network config file "{}"'.format(config_file_path)) 52 | with open(config_file_path, "r") as f: 53 | network_config = data_parser.load(f) 54 | 55 | # Create the network 56 | dream_network = create_network_from_config_data(network_config) 57 | 58 | # Optionally load weights of the network 59 | if load_network_parameters: 60 | print('Loading network weights file "{}"'.format(network_params_path)) 61 | dream_network.model.load_state_dict(torch.load(network_params_path)) 62 | 63 | return dream_network 64 | 65 | 66 | def create_network_from_config_data(network_config_data): 67 | 68 | # Create the network 69 | dream_network = DreamNetwork(network_config_data) 70 | return dream_network 71 | 72 | 73 | class DreamNetwork: 74 | def __init__(self, network_config): 75 | 76 | # Assert input 77 | assert ( 78 | "architecture" in network_config 79 | ), 'Required key "architecture" is missing from network configuration.' 80 | assert ( 81 | "type" in network_config["architecture"] 82 | ), 'Required key "type" in dictionary "architecture" is missing from network configuration.' 83 | assert ( 84 | "manipulator" in network_config 85 | ), 'Required key "manipulator" is missing from network configuration.' 86 | assert ( 87 | "name" in network_config["manipulator"] 88 | ), 'Required key "name" in dictionary "manipulator" is missing from network configuration.' 89 | assert ( 90 | "keypoints" in network_config["manipulator"] 91 | ), 'Required key "keypoints" in dictionary "manipulator" is missing from network configuration.' 92 | 93 | # Parse keypoint specification 94 | self.keypoint_names = [] 95 | self.friendly_keypoint_names = [] 96 | self.ros_keypoint_frames = [] 97 | for kp_def in network_config["manipulator"]["keypoints"]: 98 | assert "name" in kp_def, 'Keypoint specification is missing key "name".' 
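            # "friendly_name" and "ros_frame" are optional in each keypoint specification; both fall back to the keypoint name below.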
99 | kp_name = kp_def["name"] 100 | self.keypoint_names.append(kp_name) 101 | 102 | friendly_kp_name = ( 103 | kp_def["friendly_name"] if "friendly_name" in kp_def else kp_name 104 | ) 105 | self.friendly_keypoint_names.append(friendly_kp_name) 106 | 107 | ros_kp_frame = kp_def["ros_frame"] if "ros_frame" in kp_def else kp_name 108 | self.ros_keypoint_frames.append(ros_kp_frame) 109 | 110 | self.network_config = network_config 111 | 112 | # TBD: the following "getters" should all be methods 113 | self.manipulator_name = self.network_config["manipulator"]["name"] 114 | self.n_keypoints = len(self.keypoint_names) 115 | self.architecture_type = self.network_config["architecture"]["type"] 116 | 117 | # print robot info 118 | print("`network.py`. `DreamNetwork:__init()` ----------") 119 | print(" Manipulator: {}".format(self.manipulator_name)) 120 | print(" Keypoint names: {}".format(self.keypoint_names)) 121 | print(" Friendly keypoint names: {}".format(self.friendly_keypoint_names)) 122 | print(" Architecture type: {}".format(self.architecture_type)) 123 | 124 | # Parse normalization 125 | assert ( 126 | "image_normalization" in self.network_config["architecture"] 127 | ), 'Required key "image_normalization" in dictionary "architecture" is missing from network configuration.' 128 | self.image_normalization = self.network_config["architecture"][ 129 | "image_normalization" 130 | ] 131 | 132 | # Parse image preprocessing: how to handle what the network does between the input image and the network input layer 133 | assert ( 134 | "image_preprocessing" in self.network_config["architecture"] 135 | ), 'Required key "image_preprocessing" in dictionary "architecture" is missing from network configuration.' 136 | assert ( 137 | self.image_preprocessing() in dream.image_proc.KNOWN_IMAGE_PREPROC_TYPES 138 | ), 'Image preprocessing type "{}" is not recognized.'.format( 139 | self.image_preprocessing() 140 | ) 141 | 142 | # Assert the output heads have been specified 143 | assert ( 144 | "output_heads" in self.network_config["architecture"] 145 | ), 'Required key "output_heads" in dictionary "architecture" is missing from network configuration.' 146 | 147 | assert ( 148 | self.architecture_type in KNOWN_ARCHITECTURES 149 | ), 'Expected architecture type "{}" to be in the list of known network architectures, but it is not.'.format( 150 | self.architecture_type 151 | ) 152 | 153 | # Assert that the input heads have been specified 154 | assert ( 155 | "input_heads" in self.network_config["architecture"] 156 | ), 'Required key "input_heads" in dictionary "architecture" is missing from network configuration.' 157 | assert ( 158 | self.network_config["architecture"]["input_heads"][0] == "image_rgb" 159 | ), 'First input head must be "image_rgb".' 160 | 161 | # Assert that the trained network input resolution has been specified 162 | # The output resolution is calculated at the end, after the model is created 163 | assert ( 164 | "training" in self.network_config 165 | ), 'Required key "training" is missing from network configuration.' 166 | assert ( 167 | "config" in self.network_config["training"] 168 | ), 'Required key "config" in dictionary "training" is missing from network configuration.' 169 | assert ( 170 | "net_input_resolution" in self.network_config["training"]["config"] 171 | ), 'Required key "net_input_resolution" is missing from training configuration.' 
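        # net_input_resolution is expected to be a (width, height) pair.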
172 | len_trained_net_input_res = len( 173 | self.network_config["training"]["config"]["net_input_resolution"] 174 | ) 175 | assert ( 176 | len_trained_net_input_res == 2 177 | ), "Expected trained net input resolution to have length 2, but it has length {}.".format( 178 | len_trained_net_input_res 179 | ) 180 | 181 | assert ( 182 | "platform" in self.network_config["training"] 183 | ), 'Required key "platform" in dictionary "training" is missing from network configuration.' 184 | gpu_ids = self.network_config["training"]["platform"]["gpu_ids"] 185 | data_parallel_device_ids = gpu_ids if gpu_ids else None 186 | 187 | # Options to sort belief map 188 | # Use belief map scores to try to determine cases in multiple instances 189 | self.use_belief_peak_scores = True 190 | # This in the scale of the belief map, which is 0 - 1.0. 191 | self.belief_peak_next_best_score = 0.25 192 | 193 | # Create architectures 194 | if self.architecture_type == "vgg": 195 | vgg_kwargs = {} 196 | if "spatial_softmax" in self.network_config["architecture"]: 197 | assert self.network_config["architecture"]["output_heads"] == [ 198 | "belief_maps", 199 | "keypoints", 200 | ] 201 | vgg_kwargs = { 202 | "internalize_spatial_softmax": True, 203 | "learned_beta": self.network_config["architecture"][ 204 | "spatial_softmax" 205 | ]["learned_beta"], 206 | "initial_beta": self.network_config["architecture"][ 207 | "spatial_softmax" 208 | ]["initial_beta"], 209 | } 210 | else: 211 | assert self.network_config["architecture"]["output_heads"] == [ 212 | "belief_maps" 213 | ] 214 | vgg_kwargs = {"internalize_spatial_softmax": False} 215 | 216 | # Check if using new decoder -- assume output is 100x100 if not 217 | if ( 218 | "deconv_decoder" in self.network_config["architecture"] 219 | and not "full_output" in self.network_config["architecture"] 220 | ): 221 | use_deconv_decoder = self.network_config["architecture"][ 222 | "deconv_decoder" 223 | ] 224 | vgg_kwargs["deconv_decoder"] = use_deconv_decoder 225 | elif "full_output" in self.network_config["architecture"]: 226 | use_deconv_decoder = self.network_config["architecture"][ 227 | "deconv_decoder" 228 | ] 229 | vgg_kwargs["deconv_decoder"] = use_deconv_decoder 230 | vgg_kwargs["full_output"] = True 231 | 232 | if "n_stages" in self.network_config["architecture"]: 233 | vgg_kwargs["n_stages"] = self.network_config[ 234 | "architecture" 235 | ]["n_stages"] 236 | 237 | # Check if skip connections -- use default if not 238 | if "skip_connections" in self.network_config["architecture"]: 239 | vgg_kwargs["skip_connections"] = self.network_config[ 240 | "architecture" 241 | ]["skip_connections"] 242 | 243 | if "n_stages" in self.network_config["architecture"]: 244 | self.model = torch.nn.DataParallel( 245 | dream.models.DreamHourglassMultiStage( 246 | self.n_keypoints, **vgg_kwargs 247 | ), 248 | device_ids=data_parallel_device_ids, 249 | ).cuda() 250 | else: 251 | self.model = torch.nn.DataParallel( 252 | dream.models.DreamHourglass( 253 | self.n_keypoints, **vgg_kwargs 254 | ), 255 | device_ids=data_parallel_device_ids, 256 | ).cuda() 257 | 258 | loss_type = self.network_config["architecture"]["loss"]["type"] 259 | 260 | if loss_type == "mse": 261 | self.criterion = torch.nn.MSELoss() 262 | elif loss_type == "huber": 263 | self.criterion = torch.nn.SmoothL1Loss() 264 | else: 265 | assert False, "Loss not yet implemented." 
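# Putting the branches above together, an "architecture" block for the VGG-based hourglass
# variant might look roughly like this (a sketch assembled from the keys parsed in this
# constructor, not copied from a shipped arch_configs file):
#
#   architecture:
#     type: vgg
#     input_heads: [image_rgb]
#     output_heads: [belief_maps]
#     loss:
#       type: mse                 # or "huber"
#     deconv_decoder: true
#     image_preprocessing: ...    # must be one of dream.image_proc.KNOWN_IMAGE_PREPROC_TYPES
#     image_normalization:
#       mean: [0.5, 0.5, 0.5]     # illustrative values
#       stdev: [0.5, 0.5, 0.5]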
266 | 267 | elif self.architecture_type == "resnet": 268 | 269 | # Assume we're only training on belief maps 270 | assert self.network_config["architecture"]["output_heads"] == [ 271 | "belief_maps" 272 | ] 273 | 274 | resnet_kwargs = {} 275 | 276 | # Check if using "full" output -- assume output is "half" if not 277 | if "full_decoder" in self.network_config["architecture"]: 278 | use_full_decoder = self.network_config["architecture"]["full_decoder"] 279 | resnet_kwargs["full"] = use_full_decoder 280 | 281 | self.model = torch.nn.DataParallel( 282 | dream.models.ResnetSimple(self.n_keypoints, **resnet_kwargs), 283 | device_ids=data_parallel_device_ids, 284 | ).cuda() 285 | 286 | loss_type = self.network_config["architecture"]["loss"]["type"] 287 | 288 | if loss_type == "mse": 289 | self.criterion = torch.nn.MSELoss() 290 | elif loss_type == "huber": 291 | self.criterion = torch.nn.SmoothL1Loss() 292 | else: 293 | assert False, "Loss not yet implemented." 294 | 295 | else: 296 | assert False, 'Network architecture type "{}" not defined.'.format( 297 | self.architecture_type 298 | ) 299 | 300 | # Optimizer is created in a separate call, because this isn't needed unless we're training 301 | self.optimizer = None 302 | 303 | # Now determine trained network output resolution, if not specified, or assert consistency if so 304 | trained_net_out_res_from_model = self.net_output_resolution_from_input_resolution( 305 | self.trained_net_input_resolution() 306 | ) 307 | trained_net_out_res_from_model_as_list = list(trained_net_out_res_from_model) 308 | if "net_output_resolution" in self.network_config["training"]["config"]: 309 | assert ( 310 | self.network_config["training"]["config"]["net_output_resolution"] 311 | == trained_net_out_res_from_model_as_list 312 | ), "Network model and config file disagree for trained network output resolution." 313 | else: 314 | # Add to config 315 | self.network_config["training"]["config"][ 316 | "net_output_resolution" 317 | ] = trained_net_out_res_from_model_as_list 318 | 319 | def trained_net_input_resolution(self): 320 | return tuple(self.network_config["training"]["config"]["net_input_resolution"]) 321 | 322 | def trained_net_output_resolution(self): 323 | return tuple(self.network_config["training"]["config"]["net_output_resolution"]) 324 | 325 | def image_preprocessing(self): 326 | return self.network_config["architecture"]["image_preprocessing"] 327 | 328 | def train(self, network_input_heads, target): 329 | assert self.optimizer, "Optimizer must be defined. Use enable_training() first." 
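# Typical usage of this method, as a sketch (assumes a data loader yielding normalized RGB
# tensor batches and target belief maps; "training_loader" is a hypothetical name, not part
# of this repository):
#
#   net = create_network_from_config_file("arch_configs/dream_vgg_q.yaml")
#   net.enable_training()
#   for image_batch, belief_map_targets in training_loader:
#       loss = net.train([image_batch.cuda()], belief_map_targets.cuda())
#
# enable_training() must be called first so that self.optimizer exists.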
330 | 331 | self.optimizer.zero_grad() 332 | 333 | loss = self.loss(network_input_heads, target) 334 | 335 | loss.backward() 336 | self.optimizer.step() 337 | 338 | return loss 339 | 340 | def loss(self, network_input_heads, target): 341 | network_output_heads = self.model(network_input_heads[0]) 342 | 343 | if self.network_config["architecture"]["output_heads"] == ["belief_maps"]: 344 | 345 | if "n_stages" in self.network_config["architecture"]: 346 | # print('hello') 347 | n_stages = len(network_output_heads) 348 | expanded_size = [n_stages] + [-1] * target.dim() 349 | target_expanded = target.unsqueeze(0).expand(expanded_size) 350 | loss = self.criterion( 351 | torch.stack(network_output_heads), target_expanded 352 | ) 353 | 354 | # This was the loss from before 355 | # loss = torch.tensor(0).float().cuda() 356 | # for belief in network_output_heads: loss += ((belief - target) * (belief - target)).mean() 357 | 358 | else: 359 | loss = self.criterion(network_output_heads[0], target) 360 | 361 | else: 362 | assert False, "Not yet implemented." 363 | 364 | return loss 365 | 366 | # image_raw_resolution: (width, height) in pixels 367 | # calls resolution_after_preprocessing under the hood, using network trained resolution as the reference resolution 368 | def net_resolutions_from_image_raw_resolution( 369 | self, image_raw_resolution, image_preprocessing_override=None 370 | ): 371 | 372 | # Input argument handling 373 | assert ( 374 | len(image_raw_resolution) == 2 375 | ), 'Expected "image_raw_resolution" to have length 2, but it has length {}.'.format( 376 | len(image_raw_resolution) 377 | ) 378 | 379 | image_preprocessing = ( 380 | image_preprocessing_override 381 | if image_preprocessing_override 382 | else self.image_preprocessing() 383 | ) 384 | 385 | # Calculate image resolution at network input layer, after preprocessing 386 | net_input_resolution = dream.image_proc.resolution_after_preprocessing( 387 | image_raw_resolution, 388 | self.trained_net_input_resolution(), 389 | image_preprocessing, 390 | ) 391 | net_output_resolution = self.net_output_resolution_from_input_resolution( 392 | net_input_resolution 393 | ) 394 | 395 | return net_input_resolution, net_output_resolution 396 | 397 | def net_output_resolution_from_input_resolution(self, net_input_resolution): 398 | 399 | # Input argument handling 400 | assert ( 401 | len(net_input_resolution) == 2 402 | ), 'Expected "net_input_resolution" to have length 2, but it has length {}.'.format( 403 | len(net_input_resolution) 404 | ) 405 | 406 | netin_width, netin_height = net_input_resolution 407 | 408 | # Construct test input and send thru network to get the output 409 | # This assumes there is only one input head, which is the RGB image 410 | with torch.no_grad(): 411 | net_input_as_tensor_batch = torch.zeros( 412 | 1, 3, netin_height, netin_width 413 | ).cuda() 414 | net_output_as_tensor_batch = self.model(net_input_as_tensor_batch) 415 | net_output_shape = net_output_as_tensor_batch[0][0].shape 416 | net_output_resolution = (net_output_shape[2], net_output_shape[1]) 417 | 418 | return net_output_resolution 419 | 420 | # Wrapper for inferences from a PIL image directly 421 | # Returns keypoints in the input image (not necessarily network input) frame 422 | # Allows for an optional overwrite 423 | def keypoints_from_image( 424 | self, input_rgb_image_as_pil, image_preprocessing_override=None, debug=False 425 | ): 426 | 427 | # Input handling 428 | assert isinstance( 429 | input_rgb_image_as_pil, PILImage.Image 430 | ), 'Expected 
"input_rgb_image_as_pil" to be a PIL Image, but it is {}.'.format( 431 | type(input_rgb_image_as_pil) 432 | ) 433 | input_image_resolution = input_rgb_image_as_pil.size 434 | 435 | # do preprocessing 436 | image_preprocessing = ( 437 | image_preprocessing_override 438 | if image_preprocessing_override 439 | else self.image_preprocessing() 440 | ) 441 | 442 | input_image_preproc_before_norm = dream.image_proc.preprocess_image( 443 | input_rgb_image_as_pil, 444 | self.trained_net_input_resolution(), 445 | image_preprocessing, 446 | ) 447 | 448 | netin_res_inf = input_image_preproc_before_norm.size 449 | tensor_from_image_tform = TVTransforms.Compose( 450 | [ 451 | TVTransforms.ToTensor(), 452 | TVTransforms.Normalize( 453 | self.image_normalization["mean"], self.image_normalization["stdev"] 454 | ), 455 | ] 456 | ) 457 | input_rgb_image_as_tensor = tensor_from_image_tform( 458 | input_image_preproc_before_norm 459 | ) 460 | 461 | # Inference 462 | with torch.no_grad(): 463 | input_rgb_image_as_tensor_batch = input_rgb_image_as_tensor.unsqueeze( 464 | 0 465 | ).cuda() 466 | belief_maps_net_out_batch, detected_kp_projs_net_out_batch = self.inference( 467 | input_rgb_image_as_tensor_batch 468 | ) 469 | 470 | belief_maps_net_out = belief_maps_net_out_batch[0] 471 | detected_kp_projs_net_out = np.array( 472 | detected_kp_projs_net_out_batch[0], dtype=float 473 | ) 474 | 475 | # Get network output resolution for this inference based on belief maps 476 | belief_map = belief_maps_net_out[0] 477 | netout_res_inf = (belief_map.shape[1], belief_map.shape[0]) 478 | 479 | # Convert keypoints from net-output to net-input 480 | detected_kp_projs_net_in = dream.image_proc.convert_keypoints_to_netin_from_netout( 481 | detected_kp_projs_net_out, netout_res_inf, netin_res_inf 482 | ) 483 | detected_kp_projs = dream.image_proc.convert_keypoints_to_raw_from_netin( 484 | detected_kp_projs_net_in, 485 | netin_res_inf, 486 | input_image_resolution, 487 | image_preprocessing, 488 | ) 489 | 490 | detection_result = {"detected_keypoints": detected_kp_projs} 491 | if debug: 492 | detection_result["image_rgb_net_input"] = input_image_preproc_before_norm 493 | detection_result["belief_maps"] = belief_maps_net_out 494 | detection_result[ 495 | "detected_keypoints_net_output" 496 | ] = detected_kp_projs_net_out 497 | detection_result["detected_keypoints_net_input"] = detected_kp_projs_net_in 498 | 499 | return detection_result 500 | 501 | # Inference is designed to return the best output of belief_maps and keypoints 502 | # This is an abstraction layer so even if multiple stages are used, this only produces one set of outputs (for the final stage) 503 | def inference(self, network_input): 504 | 505 | if self.architecture_type == "dream_legacy": 506 | network_head_outputs = self.model(network_input) 507 | belief_maps = network_head_outputs[0] 508 | detected_kp_projections = self.soft_argmax(belief_maps) 509 | network_output = [belief_maps, detected_kp_projections] 510 | 511 | elif self.network_config["architecture"]["output_heads"] == [ 512 | "belief_maps", 513 | "keypoints", 514 | ]: 515 | network_output = self.model(network_input) 516 | 517 | elif self.network_config["architecture"]["output_heads"] == ["belief_maps"]: 518 | 519 | network_head_output = self.model(network_input) 520 | 521 | # use the last layer 522 | belief_maps_batch = network_head_output[-1] 523 | 524 | # # If we need to transpose the belief map -- this is done here: 525 | # for x in range(belief_maps_batch.shape[0]): 526 | # bmaps = belief_maps_batch[x] 
527 | # belief_maps_batch[x] = torch.cat([torch.t(b).unsqueeze(0) for b in bmaps]) 528 | 529 | peaks_from_belief_maps_kwargs = {} 530 | ( 531 | trained_net_output_width, 532 | trained_net_output_height, 533 | ) = self.trained_net_output_resolution() 534 | if trained_net_output_width >= 400 and trained_net_output_height >= 400: 535 | peaks_from_belief_maps_kwargs["offset_due_to_upsampling"] = 0.0 536 | else: 537 | # heuristic based on previous work with smaller belief maps 538 | peaks_from_belief_maps_kwargs["offset_due_to_upsampling"] = 0.4395 539 | 540 | detected_kp_projs_batch = [] 541 | for belief_maps in belief_maps_batch: 542 | peaks = dream.image_proc.peaks_from_belief_maps( 543 | belief_maps, **peaks_from_belief_maps_kwargs 544 | ) 545 | 546 | # Determine keypoints from this 547 | detected_kp_projs = [] 548 | for peak in peaks: 549 | 550 | if len(peak) == 1: 551 | detected_kp_projs.append([peak[0][0], peak[0][1]]) 552 | else: 553 | if self.use_belief_peak_scores and len(peak) > 1: 554 | # Try to use the belief map scores 555 | peak_sorted_by_score = sorted( 556 | peak, key=lambda x: x[2], reverse=True 557 | ) 558 | if ( 559 | peak_sorted_by_score[0][2] - peak_sorted_by_score[1][2] 560 | >= self.belief_peak_next_best_score 561 | ): 562 | # Keep the best score 563 | detected_kp_projs.append( 564 | [ 565 | peak_sorted_by_score[0][0], 566 | peak_sorted_by_score[0][1], 567 | ] 568 | ) 569 | else: 570 | # Can't determine -- return no detection 571 | # Can't use None because we need to return it as a tensor 572 | detected_kp_projs.append([-999.999, -999.999]) 573 | 574 | else: 575 | # Can't determine -- return no detection 576 | # Can't use None because we need to return it as a tensor 577 | detected_kp_projs.append([-999.999, -999.999]) 578 | 579 | detected_kp_projs_batch.append(detected_kp_projs) 580 | 581 | detected_kp_projs_batch = torch.tensor(detected_kp_projs_batch).float() 582 | 583 | network_output = [belief_maps_batch, detected_kp_projs_batch] 584 | 585 | else: 586 | assert ( 587 | False 588 | ), "Could not determine how to conduct inference on this network." 
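# For single-image use, callers typically go through keypoints_from_image() above rather
# than calling inference() directly. A minimal sketch (file names are hypothetical; assumes
# a CUDA-capable machine and trained weights):
#
#   net = create_network_from_config_file("panda.yaml", network_params_path="panda.pth")
#   net.enable_evaluation()
#   image = PILImage.open("frame_0000.png").convert("RGB")
#   result = net.keypoints_from_image(image, debug=True)
#   keypoints_px = result["detected_keypoints"]   # keypoints in raw-image pixel coordinates
#   belief_maps = result["belief_maps"]           # included only when debug=True
#
# Note that keypoints that cannot be disambiguated are reported as [-999.999, -999.999]
# rather than None, as described in the comments above.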
589 | 590 | return network_output 591 | 592 | def save_network_config(self, config_file_path, overwrite=False): 593 | 594 | if not overwrite: 595 | assert not os.path.exists( 596 | config_file_path 597 | ), 'Output file already exists in "{}".'.format(config_file_path) 598 | 599 | # Create saver 600 | data_saver = ruamel.yaml.YAML() 601 | data_saver.default_flow_style = False 602 | data_saver.explicit_start = False 603 | 604 | with open(config_file_path, "w") as f: 605 | # TBD - convert to ruamel.yaml.comments.CommentedMap to get rid of !!omap in yaml 606 | data_saver.dump(self.network_config, f) 607 | 608 | def save_network_params(self, network_params_path, overwrite=False): 609 | 610 | if not overwrite: 611 | assert not os.path.exists( 612 | network_params_path 613 | ), 'Output file already exists in "{}".'.format(network_params_path) 614 | 615 | # Save weights 616 | torch.save(self.model.state_dict(), network_params_path) 617 | 618 | def save_network( 619 | self, output_dir, output_filename_without_extension, overwrite=False 620 | ): 621 | 622 | dream.utilities.makedirs(output_dir, exist_ok=overwrite) 623 | 624 | network_config_dir = os.path.join( 625 | output_dir, output_filename_without_extension + ".yaml" 626 | ) 627 | self.save_network_config(network_config_dir, overwrite) 628 | 629 | network_params_path = os.path.join( 630 | output_dir, output_filename_without_extension + ".pth" 631 | ) 632 | self.save_network_params(network_params_path, overwrite) 633 | 634 | def enable_training(self): 635 | 636 | # Load optimizer if needed 637 | if not self.optimizer: 638 | 639 | assert ( 640 | "optimizer" in self.network_config["training"]["config"] 641 | ), 'Required key "optimizer" in dictionary "config" is missing from network configuration.' 642 | assert ( 643 | "type" in self.network_config["training"]["config"]["optimizer"] 644 | ), 'Required key "type" in dictionary "optimizer" is missing from network configuration.' 645 | 646 | network_parameters = filter( 647 | lambda p: p.requires_grad, self.model.parameters() 648 | ) 649 | optimizer_type = self.network_config["training"]["config"]["optimizer"][ 650 | "type" 651 | ] 652 | 653 | assert ( 654 | optimizer_type in KNOWN_OPTIMIZERS 655 | ), 'Expected optimizer_type "{}" to be in the list of known optimizers, but it is not.'.format( 656 | optimizer_type 657 | ) 658 | 659 | if optimizer_type == "adam": 660 | 661 | assert ( 662 | "learning_rate" 663 | in self.network_config["training"]["config"]["optimizer"] 664 | ), 'Required key "learning_rate" in dictionary "optimizer" is missing to use the Adam optimizer.' 665 | 666 | self.optimizer = torch.optim.Adam( 667 | network_parameters, 668 | lr=self.network_config["training"]["config"]["optimizer"][ 669 | "learning_rate" 670 | ], 671 | ) 672 | 673 | elif optimizer_type == "sgd": 674 | 675 | assert ( 676 | "learning_rate" 677 | in self.network_config["training"]["config"]["optimizer"] 678 | ), 'Required key "learning_rate" in dictionary "optimizer" is missing to use the SGD optimizer.' 
679 | 680 | self.optimizer = torch.optim.SGD( 681 | network_parameters, 682 | lr=self.network_config["training"]["config"]["optimizer"][ 683 | "learning_rate" 684 | ], 685 | ) 686 | 687 | else: 688 | assert False, 'Optimizer "{}" is not defined.'.format(optimizer_type) 689 | 690 | # Enable training mode 691 | self.model.train() 692 | 693 | def enable_evaluation(self): 694 | 695 | # Enable evaluation mode 696 | self.model.eval() 697 | -------------------------------------------------------------------------------- /dream/oks_plots.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import pandas as pd 6 | import argparse 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | import seaborn as sns 10 | 11 | plt.style.use("seaborn-whitegrid") 12 | 13 | 14 | # Example of running the script 15 | # python oks_plots.py --data all_dataset_keypoints.csv all_dataset_keypoints.csv --labels 1 2 16 | 17 | # pythonw oks_plots.py --data deep-arm-cal-paper/data/dope/3cam_real_keypoints.csv deep-arm-cal-paper/data/dream_hg/3cam_real_keypoints.csv deep-arm-cal-paper/data/dream_hg_deconv/3cam_real_keypoints.csv deep-arm-cal-paper/data/resimple/3cam_real_keypoints.csv --labels DOPE DREAM AE resnet 18 | parser = argparse.ArgumentParser(description="OKS for DREAM") 19 | parser.add_argument( 20 | "--data", nargs="+", default="[all_dataset_keypoints.csv]", help="list of csv files" 21 | ) 22 | 23 | parser.add_argument( 24 | "--labels", 25 | nargs="+", 26 | default=None, 27 | help="names for each dataset to be added as label", 28 | ) 29 | 30 | parser.add_argument("--styles", nargs="+", default=None, help="") 31 | 32 | parser.add_argument("--colours", nargs="+", default=None, help="") 33 | 34 | parser.add_argument("--pixel", default=20) 35 | 36 | parser.add_argument("--output", default="output.pdf") 37 | 38 | parser.add_argument("--show", default=False, action="store_true") 39 | 40 | parser.add_argument("--title", default=None) 41 | 42 | args = parser.parse_args() 43 | print(args) 44 | 45 | 46 | fig = plt.figure() 47 | ax = plt.axes() 48 | 49 | handles = [] 50 | 51 | for i_csv, csv_file in enumerate(args.data): 52 | print(csv_file) 53 | if csv_file == "666": 54 | plt.plot([], [], " ", label=args.labels[i_csv].replace("_", " ")) 55 | continue 56 | name_csv = csv_file.replace(".csv", "") 57 | 58 | df = pd.read_csv(csv_file) 59 | 60 | # PCK percentage of correct keypoints 61 | all_dist = [] 62 | all_pred = [] 63 | all_gt = [] 64 | for i in range(7): 65 | # Compute all the distances between keypoints - Implementing them does not work well 66 | 67 | fpred = [] 68 | fgt = [] 69 | gt = df[[f"kp{i}x_gt", f"kp{i}y_gt"]].values.tolist() 70 | pred = df[[f"kp{i}x", f"kp{i}y"]].values.tolist() 71 | 72 | all_gt.append(gt) 73 | all_pred.append(pred) 74 | 75 | for i_entry, entry in enumerate(gt): 76 | if entry[0] > 0 and entry[0] < 640 and entry[1] > 0 and entry[1] < 480: 77 | fgt.append([float(entry[0]), float(entry[1])]) 78 | fpred.append([float(pred[i_entry][0]), float(pred[i_entry][1])]) 79 | 80 | pred = np.array(fpred) 81 | gt = np.array(fgt) 82 | 83 | values = np.linalg.norm(gt - pred, axis=1) 84 | 85 | # print(pair.shape) 86 | # add them to a single list 87 | 88 | all_dist += values.tolist() 89 | 90 | all_dist = np.array(all_dist) 91 | 92 | # print(len(all_dist)) 93 | 94 | # all_dist = 
all_dist[np.where(all_dist<1000)] 95 | # all_dist = all_dist[np.where(all_dist<1000)] 96 | 97 | # print(all_dist[:10]) 98 | # print(all_dist.shape) 99 | print("detected", len(all_dist)) 100 | 101 | pck_values = np.arange(0, int(args.pixel), 0.01) 102 | 103 | y_values = [] 104 | 105 | for value in pck_values: 106 | size_good = len(np.where(all_dist < value)[0]) / len(all_dist) 107 | y_values.append(size_good) 108 | # print(value,size_good) 109 | 110 | auc = np.trapz(y_values, dx=0.01) / float(args.pixel) 111 | 112 | print("auc", auc) 113 | all_dist = all_dist[np.where(all_dist < 1000)] 114 | print("mean", np.mean(all_dist)) 115 | print("median", np.median(all_dist)) 116 | print("std", np.std(all_dist)) 117 | 118 | # TODO: consolidate above calculations with pnp_metrics 119 | from dream import analysis as dream_analysis 120 | 121 | dim = np.array(all_pred).shape 122 | temp_pred = np.reshape(np.array(all_pred), (dim[0] * dim[1], dim[2])) 123 | temp_gt = np.reshape(np.array(all_gt), (dim[0] * dim[1], dim[2])) 124 | kp_metrics = dream_analysis.keypoint_metrics(temp_pred, temp_gt, (640, 480)) 125 | assert kp_metrics["l2_error_mean_px"] == np.mean(all_dist) 126 | assert kp_metrics["l2_error_median_px"] == np.median(all_dist) 127 | assert kp_metrics["l2_error_std_px"] == np.std(all_dist) 128 | assert np.abs(auc - kp_metrics["l2_error_auc"]) < 1.0e-15 129 | 130 | # plot 131 | 132 | try: 133 | label = args.labels[i_csv] 134 | except: 135 | label = name_csv 136 | cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"] 137 | try: 138 | 139 | colour = cycle[int(args.colours[i_csv])] 140 | except: 141 | colour = "" 142 | 143 | try: 144 | style = args.styles[i_csv] 145 | if style == "0": 146 | style = "-" 147 | elif style == "1": 148 | style = "--" 149 | elif style == "2": 150 | style = ":" 151 | 152 | else: 153 | style = "-" 154 | except: 155 | style = "-" 156 | 157 | label = f"{label} ({auc:.3f})" 158 | ax.plot(pck_values, y_values, style, color=colour, label=label) 159 | 160 | # from matplotlib.patches import Rectangle 161 | 162 | # extra = Rectangle((0, 0), 1, 1, fc="w", fill=False, edgecolor='none', linewidth=0) 163 | 164 | 165 | plt.xlabel("PCK threshold distance (pixels)") 166 | plt.ylabel("Accuracy") 167 | plt.title(args.title) 168 | # ax.legend([extra, handles[0], handles[1], handles[2], extra , handles[3], handles[4], handles[5] ], ("Non-DR",0,1,2 ,"DR",0,1,2),loc = "lower right") 169 | ax.legend(loc="lower right", frameon=True, fancybox=True, framealpha=0.8) 170 | 171 | legend = ax.get_legend() 172 | for i, t in enumerate(legend.get_texts()): 173 | if args.data[i] == "666": 174 | t.set_ha("left") # ha is alias for horizontalalignment 175 | t.set_position((-30, 0)) 176 | 177 | ax.set_ylim(0, 1) 178 | ax.set_xlim(0, int(args.pixel)) 179 | plt.savefig(args.output) 180 | if args.show: 181 | plt.show() 182 | -------------------------------------------------------------------------------- /dream/spatial_softmax.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import torch 6 | from torch.autograd import Variable 7 | import torch.nn.functional as F 8 | from torch.nn.parameter import Parameter 9 | import numpy as np 10 | 11 | # This code is adapted from 12 | # https://gitlab-master.nvidia.com/pmolchanov/lpr-3d-hand-pose-rgb-demo/blob/master/handpose/models/image_heatmaps_pose2dZrel_softargmax_slim.py 13 | 14 | 15 | class SoftArgmaxPavlo(torch.nn.Module): 16 | def __init__(self, n_keypoints=5, learned_beta=False, initial_beta=25.0): 17 | super(SoftArgmaxPavlo, self).__init__() 18 | self.avgpool = torch.nn.AvgPool2d(7, stride=1, padding=3) 19 | if learned_beta: 20 | self.beta = torch.nn.Parameter(torch.ones(n_keypoints) * initial_beta) 21 | else: 22 | self.beta = (torch.ones(n_keypoints) * initial_beta).cuda() 23 | 24 | def forward(self, heatmaps, size_mult=1.0): 25 | 26 | epsilon = 1e-8 27 | bch, ch, n_row, n_col = heatmaps.size() 28 | n_kpts = ch 29 | 30 | beta = self.beta 31 | 32 | # input has the shape (#bch, n_kpts+1, img_sz[0], img_sz[1]) 33 | # +1 is for the Zrel 34 | heatmaps2d = heatmaps[:, :n_kpts, :, :] 35 | heatmaps2d = self.avgpool(heatmaps2d) 36 | 37 | # heatmaps2d has the shape (#bch, n_kpts, img_sz[0]*img_sz[1]) 38 | heatmaps2d = heatmaps2d.contiguous().view(bch, n_kpts, -1) 39 | 40 | # getting the max value of the maps across each 2D matrix 41 | map_max = torch.max(heatmaps2d, dim=2, keepdim=True)[0] 42 | 43 | # reducing the max from each map 44 | # this will make the max value zero and all other values 45 | # will be negative. 46 | # max_reduced_maps has the shape (#bch, n_kpts, img_sz[0]*img_sz[1]) 47 | heatmaps2d = heatmaps2d - map_max 48 | 49 | beta_ = beta.view(1, n_kpts, 1).repeat(1, 1, n_row * n_col) 50 | # due to applying the beta value, the non-max values will be further 51 | # pushed towards zero after applying the exp function 52 | exp_maps = torch.exp(beta_ * heatmaps2d) 53 | # normalizing the exp_maps by diving it to the sum of elements 54 | # exp_maps_sum has the shape (#bch, n_kpts, 1) 55 | exp_maps_sum = torch.sum(exp_maps, dim=2, keepdim=True) 56 | exp_maps_sum = exp_maps_sum.view(bch, n_kpts, 1, 1) 57 | normalized_maps = exp_maps.view(bch, n_kpts, n_row, n_col) / ( 58 | exp_maps_sum + epsilon 59 | ) 60 | 61 | col_vals = torch.arange(0, n_col) * size_mult 62 | col_repeat = col_vals.repeat(n_row, 1) 63 | col_idx = col_repeat.view(1, 1, n_row, n_col).cuda() 64 | # col_mat gives a column measurement matrix to be used for getting 65 | # 'x'. It is a matrix where each row has the sequential values starting 66 | # from 0 up to n_col-1: 67 | # 0,1,2, ..., n_col-1 68 | # 0,1,2, ..., n_col-1 69 | # 0,1,2, ..., n_col-1 70 | 71 | row_vals = torch.arange(0, n_row).view(n_row, -1) * size_mult 72 | row_repeat = row_vals.repeat(1, n_col) 73 | row_idx = row_repeat.view(1, 1, n_row, n_col).cuda() 74 | # row_mat gives a row measurement matrix to be used for getting 'y'. 75 | # It is a matrix where each column has the sequential values starting 76 | # from 0 up to n_row-1: 77 | # 0,0,0, ..., 0 78 | # 1,1,1, ..., 1 79 | # 2,2,2, ..., 2 80 | # ... 
81 | # n_row-1, ..., n_row-1 82 | 83 | col_idx = Variable(col_idx, requires_grad=False) 84 | weighted_x = normalized_maps * col_idx.float() 85 | weighted_x = weighted_x.view(bch, n_kpts, -1) 86 | x_vals = torch.sum(weighted_x, dim=2) 87 | 88 | row_idx = Variable(row_idx, requires_grad=False) 89 | weighted_y = normalized_maps * row_idx.float() 90 | weighted_y = weighted_y.view(bch, n_kpts, -1) 91 | y_vals = torch.sum(weighted_y, dim=2) 92 | 93 | out = torch.stack((x_vals, y_vals), dim=2) 94 | 95 | return out 96 | 97 | 98 | if __name__ == "__main__": 99 | from torch.autograd import Variable 100 | 101 | data = Variable(torch.zeros([1, 3, 3, 3])) 102 | data[0, 0, 0, 1] = 10 103 | data[0, 1, 1, 1] = 10 104 | data[0, 2, 1, 2] = 10 105 | layer = SoftArgmaxPavlo(n_keypoints=3) 106 | print(layer(data.cuda())) 107 | -------------------------------------------------------------------------------- /dream/utilities.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import os 6 | import random 7 | 8 | import numpy as np 9 | from ruamel.yaml import YAML 10 | import torch 11 | 12 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 13 | 14 | 15 | def set_random_seed(seed): 16 | assert isinstance( 17 | seed, int 18 | ), 'Expected "seed" to be an integer, but it is "{}".'.format(type(seed)) 19 | random.seed(seed) 20 | os.environ["PYTHONHASHSEED"] = str(seed) 21 | np.random.seed(seed) 22 | torch.manual_seed(seed) 23 | torch.cuda.manual_seed(seed) 24 | torch.cuda.manual_seed_all(seed) 25 | torch.backends.cudnn.deterministic = True 26 | torch.backends.cudnn.benchmark = False 27 | 28 | 29 | def makedirs(directory, exist_ok=False): 30 | """A method that replicates the functionality of os.makedirs that works for both Python 2 and Python 3.""" 31 | if os.path.exists(directory): 32 | assert exist_ok, 'Specified directory "{}" already exists.'.format(directory) 33 | else: 34 | os.makedirs(directory) 35 | return 36 | 37 | 38 | def is_ndds_dataset(input_dir, data_extension="json"): 39 | 40 | # Input argument handling 41 | # Expand user shortcut if it exists 42 | input_dir = os.path.expanduser(input_dir) 43 | assert os.path.exists( 44 | input_dir 45 | ), 'Expected path "{}" to exist, but it does not.'.format(input_dir) 46 | assert isinstance( 47 | data_extension, str 48 | ), 'Expected "data_extension" to be a string, but it is "{}".'.format( 49 | type(data_extension) 50 | ) 51 | 52 | data_full_ext = "."
+ data_extension 53 | 54 | dirlist = os.listdir(input_dir) 55 | 56 | # Find json files 57 | data_filenames = [f for f in dirlist if f.endswith(data_full_ext)] 58 | 59 | # Extract name from json file 60 | data_names = [os.path.splitext(f)[0] for f in data_filenames if f[0].isdigit()] 61 | 62 | is_ndds_dataset = True if data_names else False 63 | 64 | return is_ndds_dataset 65 | 66 | 67 | def find_ndds_data_in_dir( 68 | input_dir, data_extension="json", image_extension=None, requested_image_types="all", 69 | ): 70 | 71 | # Input argument handling 72 | # Expand user shortcut if it exists 73 | input_dir = os.path.expanduser(input_dir) 74 | assert os.path.exists( 75 | input_dir 76 | ), 'Expected path "{}" to exist, but it does not.'.format(input_dir) 77 | dirlist = os.listdir(input_dir) 78 | 79 | assert isinstance( 80 | data_extension, str 81 | ), 'Expected "data_extension" to be a string, but it is "{}".'.format( 82 | type(data_extension) 83 | ) 84 | data_full_ext = "." + data_extension 85 | 86 | if image_extension is None: 87 | # Auto detect based on list of image extensions to try 88 | # In case there is a tie, prefer the extensions that are closer to the front 89 | image_exts_to_try = ["png", "jpg"] 90 | num_image_exts = [] 91 | for image_ext in image_exts_to_try: 92 | num_image_exts.append(len([f for f in dirlist if f.endswith(image_ext)])) 93 | max_num_image_exts = np.max(num_image_exts) 94 | idx_max = np.where(num_image_exts == max_num_image_exts)[0] 95 | # If there are multiple indices due to ties, this uses the one closest to the front 96 | image_extension = image_exts_to_try[idx_max[0]] 97 | # Mention to user if there are multiple cases to ensure they are aware of the selection 98 | if len(idx_max) > 1 and max_num_image_exts > 0: 99 | print( 100 | 'Multiple sets of images detected in NDDS dataset with different extensions. Using extension "{}".'.format( 101 | image_extension 102 | ) 103 | ) 104 | else: 105 | assert isinstance( 106 | image_extension, str 107 | ), 'If specified, expected "image_extension" to be a string, but it is "{}".'.format( 108 | type(image_extension) 109 | ) 110 | image_full_ext = "." + image_extension 111 | 112 | assert ( 113 | requested_image_types is None 114 | or requested_image_types == "all" 115 | or isinstance(requested_image_types, list) 116 | ), "Expected \"requested_image_types\" to be None, 'all', or a list of requested_image_types." 
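# Typical usage, as a sketch (the dataset directory is hypothetical; file naming follows the
# NDDS conventions handled below):
#
#   found_data, found_configs = find_ndds_data_in_dir(
#       "~/data/synthetic_panda", requested_image_types=["rgb"]
#   )
#   # found_data[0]  -> {"name": "000000", "data_path": ".../000000.json",
#   #                    "image_paths": {"rgb": ".../000000.rgb.png"}}
#   # found_configs  -> {"camera": ".../_camera_settings.json",
#   #                    "object": ".../_object_settings.json", "unsorted": [...]}
#   camera_K = load_camera_intrinsics(found_configs["camera"])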
117 | 118 | # Read in json files 119 | data_filenames = [f for f in dirlist if f.endswith(data_full_ext)] 120 | 121 | # Sort candidate data files by name 122 | data_filenames.sort() 123 | 124 | data_names = [os.path.splitext(f)[0] for f in data_filenames if f[0].isdigit()] 125 | 126 | # If there are no matching json files -- this is not an NDDS dataset -- return None 127 | if not data_names: 128 | return None, None 129 | 130 | data_paths = [os.path.join(input_dir, f) for f in data_filenames if f[0].isdigit()] 131 | 132 | if requested_image_types == "all": 133 | # Detect based on first entry 134 | first_entry_name = data_names[0] 135 | matching_image_names = [ 136 | f 137 | for f in dirlist 138 | if f.startswith(first_entry_name) and f.endswith(image_full_ext) 139 | ] 140 | find_rgb = ( 141 | True 142 | if first_entry_name + ".rgb" + image_full_ext in matching_image_names 143 | else False 144 | ) 145 | find_depth = ( 146 | True 147 | if first_entry_name + ".depth" + image_full_ext in matching_image_names 148 | else False 149 | ) 150 | find_cs = ( 151 | True 152 | if first_entry_name + ".cs" + image_full_ext in matching_image_names 153 | else False 154 | ) 155 | if len(matching_image_names) > 3: 156 | print("Image types detected that are not yet implemented in this function.") 157 | 158 | elif requested_image_types: 159 | # Check based on known data types 160 | known_image_types = ["rgb", "depth", "cs"] 161 | for this_image_type in requested_image_types: 162 | assert ( 163 | this_image_type in known_image_types 164 | ), 'Image type "{}" not recognized.'.format(this_image_type) 165 | 166 | find_rgb = True if "rgb" in requested_image_types else False 167 | find_depth = True if "depth" in requested_image_types else False 168 | find_cs = True if "cs" in requested_image_types else False 169 | 170 | else: 171 | find_rgb = False 172 | find_depth = False 173 | find_cs = False 174 | 175 | dict_of_lists_images = {} 176 | n_samples = len(data_names) 177 | 178 | if find_rgb: 179 | rgb_paths = [ 180 | os.path.join(input_dir, f + ".rgb" + image_full_ext) for f in data_names 181 | ] 182 | for n in range(n_samples): 183 | assert os.path.exists( 184 | rgb_paths[n] 185 | ), 'Expected image "{}" to exist, but it does not.'.format(rgb_paths[n]) 186 | dict_of_lists_images["rgb"] = rgb_paths 187 | 188 | if find_depth: 189 | depth_paths = [ 190 | os.path.join(input_dir, f + ".depth" + image_full_ext) for f in data_names 191 | ] 192 | for n in range(n_samples): 193 | assert os.path.exists( 194 | depth_paths[n] 195 | ), 'Expected image "{}" to exist, but it does not.'.format(depth_paths[n]) 196 | dict_of_lists_images["depth"] = depth_paths 197 | 198 | if find_cs: 199 | cs_paths = [ 200 | os.path.join(input_dir, f + ".cs" + image_full_ext) for f in data_names 201 | ] 202 | for n in range(n_samples): 203 | assert os.path.exists( 204 | cs_paths[n] 205 | ), 'Expected image "{}" to exist, but it does not.'.format(cs_paths[n]) 206 | dict_of_lists_images["class_segmentation"] = cs_paths 207 | 208 | found_images = [ 209 | dict(zip(dict_of_lists_images, t)) for t in zip(*dict_of_lists_images.values()) 210 | ] 211 | 212 | # Create output dictionaries 213 | dict_of_lists = {"name": data_names, "data_path": data_paths} 214 | 215 | if find_rgb or find_depth or find_cs: 216 | dict_of_lists["image_paths"] = found_images 217 | 218 | found_data = [dict(zip(dict_of_lists, t)) for t in zip(*dict_of_lists.values())] 219 | 220 | # Process config files, which are data files that don't have an associated image 221 | found_configs = 
{"camera": None, "object": None, "unsorted": []} 222 | data_filenames_without_images = [f for f in data_filenames if not f[0].isdigit()] 223 | 224 | for data_filename in data_filenames_without_images: 225 | if data_filename == "_camera_settings" + data_full_ext: 226 | found_configs["camera"] = os.path.join(input_dir, data_filename) 227 | elif data_filename == "_object_settings" + data_full_ext: 228 | found_configs["object"] = os.path.join(input_dir, data_filename) 229 | else: 230 | found_configs["unsorted"].append(os.path.join(input_dir, data_filename)) 231 | 232 | return found_data, found_configs 233 | 234 | 235 | def load_camera_intrinsics(camera_data_path): 236 | 237 | # Input argument handling 238 | assert os.path.exists( 239 | camera_data_path 240 | ), 'Expected path "{}" to exist, but it does not.'.format(camera_data_path) 241 | 242 | # Create YAML/json parser 243 | data_parser = YAML(typ="safe") 244 | 245 | with open(camera_data_path, "r") as f: 246 | cam_settings_data = data_parser.load(f) 247 | 248 | camera_fx = cam_settings_data["camera_settings"][0]["intrinsic_settings"]["fx"] 249 | camera_fy = cam_settings_data["camera_settings"][0]["intrinsic_settings"]["fy"] 250 | camera_cx = cam_settings_data["camera_settings"][0]["intrinsic_settings"]["cx"] 251 | camera_cy = cam_settings_data["camera_settings"][0]["intrinsic_settings"]["cy"] 252 | camera_K = np.array( 253 | [[camera_fx, 0.0, camera_cx], [0.0, camera_fy, camera_cy], [0.0, 0.0, 1.0]] 254 | ) 255 | 256 | return camera_K 257 | 258 | 259 | def load_image_resolution(camera_data_path): 260 | 261 | # Input argument handling 262 | assert os.path.exists( 263 | camera_data_path 264 | ), 'Expected path "{}" to exist, but it does not.'.format(camera_data_path) 265 | 266 | # Create YAML/json parser 267 | data_parser = YAML(typ="safe") 268 | 269 | with open(camera_data_path, "r") as f: 270 | cam_settings_data = data_parser.load(f) 271 | 272 | image_width = cam_settings_data["camera_settings"][0]["captured_image_size"][ 273 | "width" 274 | ] 275 | image_height = cam_settings_data["camera_settings"][0]["captured_image_size"][ 276 | "height" 277 | ] 278 | image_resolution = (image_width, image_height) 279 | 280 | return image_resolution 281 | 282 | 283 | def load_keypoints(data_path, object_name, keypoint_names): 284 | assert os.path.exists( 285 | data_path 286 | ), 'Expected data_path "{}" to exist, but it does not.'.format(data_path) 287 | 288 | # Set up output structure 289 | keypoint_data = {"positions_wrt_cam": [], "projections": []} 290 | 291 | # Load keypoints for a particular object for now 292 | parser = YAML(typ="safe") 293 | with open(data_path, "r") as f: 294 | data = parser.load(f) 295 | 296 | assert ( 297 | "objects" in data.keys() 298 | ), 'Expected "objects" key to exist in data file, but it does not.' 299 | 300 | object_names = [o["class"] for o in data["objects"]] 301 | assert ( 302 | object_name in object_names 303 | ), 'Requested object_name "{}" does not exist in the data file objects.'.format( 304 | object_name 305 | ) 306 | 307 | idx_object = object_names.index(object_name) 308 | 309 | object_data = data["objects"][idx_object] 310 | object_keypoints = object_data["keypoints"] 311 | 312 | object_keypoint_names = [kp["name"] for kp in object_keypoints] 313 | 314 | # Process in same order as keypoint_names to retain same order 315 | for kp_name in keypoint_names: 316 | assert ( 317 | kp_name in object_keypoint_names 318 | ), "Expected keypoint '{}' to exist in the data file '{}', but it does not. 
Rather, the keypoints are '{}'".format( 319 | kp_name, data_path, object_keypoint_names 320 | ) 321 | 322 | idx_kp = object_keypoint_names.index(kp_name) 323 | kp_data = object_keypoints[idx_kp] 324 | 325 | kp_position_wrt_cam = kp_data["location"] 326 | kp_projection = kp_data["projected_location"] 327 | 328 | keypoint_data["positions_wrt_cam"].append(kp_position_wrt_cam) 329 | keypoint_data["projections"].append(kp_projection) 330 | 331 | return keypoint_data 332 | -------------------------------------------------------------------------------- /manip_configs/baxter.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - manipulator: !!omap 3 | - name: baxter 4 | - keypoints: 5 | - name: torso_t0 6 | friendly_name: Torso 7 | ros_frame: torso_t0 8 | - name: left_s0 9 | friendly_name: Left:S0 10 | ros_frame: left_s0 11 | - name: left_s1 12 | friendly_name: Left:S1 13 | ros_frame: left_s1 14 | - name: left_e0 15 | friendly_name: Left:E0 16 | ros_frame: left_e0 17 | - name: left_e1 18 | friendly_name: Left:E1 19 | ros_frame: left_e1 20 | - name: left_w0 21 | friendly_name: Left:W0 22 | ros_frame: left_w0 23 | - name: left_w1 24 | friendly_name: Left:W1 25 | ros_frame: left_w1 26 | - name: left_w2 27 | friendly_name: Left:W2 28 | ros_frame: left_w2 29 | - name: left_hand 30 | friendly_name: LeftHand 31 | ros_frame: left_hand 32 | - name: right_s0 33 | friendly_name: Right:S0 34 | ros_frame: right_s0 35 | - name: right_s1 36 | friendly_name: Right:S1 37 | ros_frame: right_s1 38 | - name: right_e0 39 | friendly_name: Right:E0 40 | ros_frame: right_e0 41 | - name: right_e1 42 | friendly_name: Right:E1 43 | ros_frame: right_e1 44 | - name: right_w0 45 | friendly_name: Right:W0 46 | ros_frame: right_w0 47 | - name: right_w1 48 | friendly_name: Right:W1 49 | ros_frame: right_w1 50 | - name: right_w2 51 | friendly_name: Right:W2 52 | ros_frame: right_w2 53 | - name: right_hand 54 | friendly_name: RightHand 55 | ros_frame: right_hand 56 | -------------------------------------------------------------------------------- /manip_configs/kuka.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - manipulator: !!omap 3 | - name: kuka 4 | - keypoints: 5 | - name: iiwa7_link_0 6 | friendly_name: Base 7 | ros_frame: iiwa7_link_0 8 | - name: iiwa7_link_1 9 | friendly_name: Joint1 10 | ros_frame: iiwa7_link_1 11 | - name: iiwa7_link_2 12 | friendly_name: Joint2 13 | ros_frame: iiwa7_link_2 14 | - name: iiwa7_link_3 15 | friendly_name: Joint3 16 | ros_frame: iiwa7_link_3 17 | - name: iiwa7_link_4 18 | friendly_name: Joint4 19 | ros_frame: iiwa7_link_4 20 | - name: iiwa7_link_5 21 | friendly_name: Joint5 22 | ros_frame: iiwa7_link_5 23 | - name: iiwa7_link_6 24 | friendly_name: Joint6 25 | ros_frame: iiwa7_link_6 26 | - name: iiwa7_link_7 27 | friendly_name: Wrist 28 | ros_frame: iiwa7_link_7 29 | -------------------------------------------------------------------------------- /manip_configs/panda.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - manipulator: !!omap 3 | - name: panda 4 | - keypoints: 5 | - name: panda_link0 6 | friendly_name: Base 7 | ros_frame: panda_link0 8 | - name: panda_link2 9 | friendly_name: Joint2 10 | ros_frame: panda_link2 11 | - name: panda_link3 12 | friendly_name: Joint3 13 | ros_frame: panda_link3 14 | - name: panda_link4 15 | friendly_name: Joint4 16 | ros_frame: panda_link4 17 | - name: panda_link6 18 | friendly_name: Joint6 19 | 
ros_frame: panda_link6 20 | - name: panda_link7 21 | friendly_name: Joint7 22 | ros_frame: panda_link7 23 | - name: panda_hand 24 | friendly_name: Hand 25 | ros_frame: panda_hand 26 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | future 2 | albumentations 3 | enum34 4 | matplotlib 5 | numpy 6 | opencv-python 7 | pandas 8 | Pillow 9 | pyrr 10 | pytest 11 | ruamel.yaml 12 | seaborn 13 | scipy 14 | tensorboardX 15 | torch 16 | torchvision 17 | tqdm 18 | webcolors 19 | gdown 20 | -------------------------------------------------------------------------------- /scripts/analyze_training.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | import pickle 8 | 9 | from ruamel.yaml import YAML 10 | 11 | import dream.analysis as dream_analysis 12 | 13 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 14 | 15 | LOSS_TEXT = "loss" 16 | VIZ_TEXT = "viz" 17 | 18 | 19 | def analyze_training(args): 20 | 21 | # Input argument handling 22 | assert os.path.exists( 23 | args.input_params_path 24 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 25 | args.input_params_path 26 | ) 27 | 28 | if args.input_config_path: 29 | input_config_path = args.input_config_path 30 | else: 31 | # Use params filepath to infer the config filepath 32 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 33 | 34 | assert os.path.exists( 35 | input_config_path 36 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 37 | input_config_path 38 | ) 39 | 40 | do_training_plots = True if LOSS_TEXT in args.analyses else False 41 | do_visualizations = True if VIZ_TEXT in args.analyses else False 42 | 43 | dream.utilities.makedirs(args.output_dir, exist_ok=args.force_overwrite) 44 | 45 | if do_training_plots: 46 | 47 | # Use params filepath to infer the training log pickle 48 | training_log_path = os.path.join( 49 | os.path.dirname(args.input_params_path), "training_log.pkl" 50 | ) 51 | with open(training_log_path, "rb") as f: 52 | training_log = pickle.load(f) 53 | 54 | epochs = training_log["epochs"] 55 | batch_training_losses = training_log["batch_training_losses"] 56 | batch_validation_losses = training_log["batch_validation_losses"] 57 | 58 | save_plot_path = os.path.join(args.output_dir, "train_valid_loss.png") 59 | dream_analysis.plot_train_valid_loss( 60 | epochs, 61 | batch_training_losses, 62 | batch_validation_losses, 63 | save_plot_path=save_plot_path, 64 | ) 65 | 66 | if do_visualizations: 67 | 68 | # Create parser 69 | data_parser = YAML(typ="safe") 70 | 71 | with open(input_config_path, "r") as f: 72 | network_config = data_parser.load(f) 73 | 74 | dataset_dir = os.path.expanduser(network_config["data_path"]) 75 | 76 | dream_analysis.analyze_ndds_dataset( 77 | args.input_params_path, 78 | input_config_path, 79 | dataset_dir, 80 | args.output_dir, 81 | batch_size=args.batch_size, 82 | force_overwrite=args.force_overwrite, 83 | gpu_ids=args.gpu_ids, 84 | ) 85 | 86 | 87 | if __name__ == "__main__": 88 | parser = argparse.ArgumentParser( 89 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 90 | ) 91 | parser.add_argument( 92 | 
"-i", 93 | "--input-params-path", 94 | required=True, 95 | help="Path to network parameters file.", 96 | ) 97 | parser.add_argument( 98 | "-c", 99 | "--input-config-path", 100 | default=None, 101 | help="Path to network configuration file.", 102 | ) 103 | parser.add_argument( 104 | "-o", 105 | "--output-dir", 106 | required=True, 107 | help="Path to output directory to save training analysis results.", 108 | ) 109 | parser.add_argument( 110 | "-f", 111 | "--force-overwrite", 112 | action="store_true", 113 | default=False, 114 | help="Forces overwriting of analysis results in the provided directory.", 115 | ) 116 | parser.add_argument( 117 | "-a", 118 | "--analyses", 119 | nargs="+", 120 | choices=[LOSS_TEXT, VIZ_TEXT], 121 | default=[LOSS_TEXT, VIZ_TEXT], 122 | help="Specify the analyses to run: 'loss' is loss plots; 'viz' is visualizations. Multiple analyses can be specified.", 123 | ) 124 | parser.add_argument( 125 | "-b", 126 | "--batch-size", 127 | type=int, 128 | default=1, 129 | help="The batch size used to process the data. Does not affect results, only how quickly the results are obtained.", 130 | ) 131 | parser.add_argument( 132 | "-g", 133 | "--gpu-ids", 134 | nargs="+", 135 | type=int, 136 | default=None, 137 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 138 | ) 139 | args = parser.parse_args() 140 | analyze_training(args) 141 | -------------------------------------------------------------------------------- /scripts/analyze_training_multi.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | import pickle 8 | 9 | import matplotlib.pyplot as plt 10 | import numpy as np 11 | 12 | import dream.analysis as dream_analysis 13 | 14 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 15 | 16 | 17 | def analyze_training(args): 18 | 19 | # Input argument handling 20 | assert os.path.exists( 21 | args.input_dir 22 | ), 'Expected input directory "{}" to exist, but it does not.'.format(args.input_dir) 23 | 24 | save_results = True if args.output_dir else False 25 | 26 | if save_results: 27 | dream.utilities.makedirs(args.output_dir, exist_ok=args.force_overwrite) 28 | 29 | dir_list = [ 30 | this_dir 31 | for this_dir in os.listdir(args.input_dir) 32 | if os.path.isdir(os.path.join(args.input_dir, this_dir)) 33 | ] 34 | dir_list.sort(key=lambda x: os.path.getmtime(os.path.join(args.input_dir, x))) 35 | 36 | train_epochs = None 37 | all_losses_list = [] 38 | all_validation_losses_list = [] 39 | random_seeds = [] 40 | for d in dir_list: 41 | results_path = os.path.join(args.input_dir, d, "training_log.pkl") 42 | with open(results_path, "rb") as f: 43 | training_log = pickle.load(f) 44 | 45 | # Process log 46 | epochs = training_log["epochs"] 47 | losses = training_log["losses"] 48 | this_random_seed = training_log["random_seed"] 49 | 50 | if train_epochs: 51 | assert train_epochs == epochs 52 | else: 53 | train_epochs = epochs 54 | 55 | all_losses_list.append(losses) 56 | 57 | if "validation_losses" in training_log: 58 | all_validation_losses_list.append(training_log["validation_losses"]) 59 | 60 | random_seeds.append(this_random_seed) 61 | 62 | print("{}: Random seed: {}".format(d, this_random_seed)) 63 | 64 | # Process 65 | all_losses = np.array(all_losses_list) 66 | all_losses_std = np.std(all_losses, axis=0) 67 | all_losses_mean = np.mean(all_losses, axis=0) 68 | all_losses_med = np.median(all_losses, axis=0) 69 | all_losses_max = np.max(all_losses, axis=0) 70 | all_losses_min = np.min(all_losses, axis=0) 71 | 72 | all_validation_losses = np.array(all_validation_losses_list) 73 | 74 | # Determine the worst and best results 75 | n_traces = len(all_losses_list) 76 | n_epochs = len(train_epochs) 77 | all_losses_lasthalf = all_losses[:, (n_epochs // 2) :] 78 | all_losses_lasthalf_sum = np.sum(all_losses_lasthalf, axis=1) 79 | x_worst = np.argmax(all_losses_lasthalf_sum) 80 | x_best = np.argmin(all_losses_lasthalf_sum) 81 | x_sort = np.argsort(all_losses_lasthalf_sum) 82 | x_median = x_sort[n_traces // 2] 83 | 84 | print("Training Loss Performance") 85 | print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~") 86 | print("Best instance for training loss: {}".format(dir_list[x_best])) 87 | print("Median instance for training loss: {}".format(dir_list[x_median])) 88 | print("Worst instance for training loss: {}".format(dir_list[x_worst])) 89 | print("") 90 | 91 | # Plot 92 | fig, ax = plt.subplots() 93 | ax.plot(train_epochs, np.transpose(all_losses), ".-") 94 | ax.plot( 95 | train_epochs, 96 | all_losses[x_worst, :], 97 | "-", 98 | linewidth=8, 99 | alpha=0.667, 100 | label="Worst training result", 101 | ) 102 | ax.plot( 103 | train_epochs, 104 | all_losses[x_best, :], 105 | "-", 106 | linewidth=8, 107 | alpha=0.667, 108 | label="Best training result", 109 | ) 110 | ax.plot( 111 | train_epochs, 112 | all_losses[x_median, :], 113 | "-", 114 | linewidth=8, 115 | alpha=0.667, 116 | label="Median training result", 117 | ) 118 | ax.grid() 119 | plt.xlabel("Training epoch") 120 | 
plt.ylabel("Training loss") 121 | plt.xlim((train_epochs[0], train_epochs[-1])) 122 | plt.title("All training results ({} instances)".format(n_traces)) 123 | ax.legend(loc="best") 124 | if save_results: 125 | training_results_path = os.path.join( 126 | args.output_dir, "training_results_instances.png" 127 | ) 128 | plt.savefig(training_results_path) 129 | 130 | fig, ax = plt.subplots() 131 | ax.fill_between( 132 | train_epochs, 133 | all_losses_mean - all_losses_std, 134 | all_losses_mean + all_losses_std, 135 | alpha=0.333, 136 | label="Aggregate mean +- 1 std dev", 137 | ) 138 | ax.plot(train_epochs, all_losses_mean, ".-", label="Aggregate mean") 139 | ax.plot(train_epochs, all_losses_med, ".-", label="Aggregate median") 140 | ax.plot(train_epochs, all_losses_min, ".-", label="Aggregate min") 141 | ax.plot(train_epochs, all_losses_max, ".-", label="Aggregate max") 142 | ax.grid() 143 | plt.xlabel("Training epoch") 144 | plt.ylabel("Training loss") 145 | plt.xlim((train_epochs[0], train_epochs[-1])) 146 | plt.title("Aggregate (epoch-wise) training results ({} instances)".format(n_traces)) 147 | ax.legend(loc="best") 148 | if save_results: 149 | training_results_agg_path = os.path.join( 150 | args.output_dir, "training_results_aggregate.png" 151 | ) 152 | plt.savefig(training_results_agg_path) 153 | 154 | if not save_results: 155 | plt.show() 156 | 157 | # Now show validation losses 158 | if len(all_validation_losses) > 0: 159 | 160 | # Determine the best overall validation loss 161 | min_validation_losses_traces = np.min(all_validation_losses, axis=1) 162 | min_valid_loss_all_traces = np.min(min_validation_losses_traces) 163 | x_best_valid_loss = np.argmin(min_validation_losses_traces) 164 | x_epoch = np.argmin(all_validation_losses[x_best_valid_loss]) 165 | 166 | print("Validation Loss Performance:") 167 | print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~") 168 | print( 169 | "Best instance for validation loss: {} ({} after epoch {})".format( 170 | dir_list[x_best_valid_loss], 171 | min_valid_loss_all_traces, 172 | train_epochs[x_epoch], 173 | ) 174 | ) 175 | 176 | for n in range(n_traces): 177 | 178 | save_plot_path = ( 179 | os.path.join(args.output_dir, "train_valid_loss_{}".format(dir_list[n])) 180 | if save_results 181 | else None 182 | ) 183 | dream_analysis.plot_train_valid_loss( 184 | train_epochs, 185 | all_losses[n, :], 186 | all_validation_losses[n, :], 187 | dataset_name=dir_list[n], 188 | save_plot_path=save_plot_path, 189 | ) 190 | 191 | if not save_results: 192 | plt.show() 193 | 194 | 195 | if __name__ == "__main__": 196 | 197 | # Parse input arguments 198 | parser = argparse.ArgumentParser( 199 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 200 | ) 201 | parser.add_argument( 202 | "-i", 203 | "--input-dir", 204 | metavar="input_dir", 205 | required=True, 206 | help="Path to directory of training results to analyze.", 207 | ) 208 | parser.add_argument( 209 | "-o", 210 | "--output-dir", 211 | metavar="output_dir", 212 | default=None, 213 | help="Path to output directory to save analysis results.", 214 | ) 215 | parser.add_argument( 216 | "-f", 217 | "--force-overwrite", 218 | action="store_true", 219 | default=False, 220 | help="Forces overwriting of analysis results in the provided directory.", 221 | ) 222 | args = parser.parse_args() 223 | 224 | # Analyze training results 225 | analyze_training(args) 226 | -------------------------------------------------------------------------------- /scripts/launch_dream_ros.py: 
-------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | #!/usr/bin/env python2 6 | 7 | import argparse 8 | import os 9 | 10 | import cv2 11 | import numpy as np 12 | from PIL import Image as PILImage 13 | from pyrr import Quaternion 14 | from ruamel.yaml import YAML 15 | import torch 16 | import webcolors 17 | 18 | import rospy 19 | 20 | from cv_bridge import CvBridge 21 | from geometry_msgs.msg import TransformStamped 22 | from sensor_msgs.msg import Image, CameraInfo 23 | from std_srvs.srv import Empty 24 | import tf2_ros as tf2 25 | import tf 26 | 27 | import dream 28 | 29 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 30 | 31 | # USERS: Please change the ROS topics / frames below as desired. 32 | 33 | # ROS topic for listening to RGB images 34 | image_topic = "/camera/color/image_raw" 35 | 36 | # ROS topic for listening to camera intrinsics 37 | camera_info_topic = "/camera/color/camera_info" 38 | 39 | # ROS service for sending request to capture frame 40 | capture_frame_service_topic = "/dream/capture_frame" 41 | 42 | # ROS service for sending request to clear buffer 43 | clear_buffer_service_topic = "/dream/clear_buffer" 44 | 45 | # ROS topics for outputs 46 | topic_out_net_input_image = "/dream/net_input_image" 47 | topic_out_keypoint_overlay = "/dream/keypoint_overlay" 48 | topic_out_belief_maps = "/dream/belief_maps" 49 | topic_out_keypoint_belief_overlay = "/dream/keypoint_belief_overlay" 50 | topic_out_keypoint_names = "/dream/keypoint_names" 51 | topic_out_keypoint_frame_overlay = "/dream/keypoint_frame_overlay" 52 | 53 | # ROS frames for the output of DREAM 54 | # tform_out_basename is now set by the user - previously was 'dream/base_frame' 55 | tform_out_childname = "dream/camera_rgb_frame" 56 | 57 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 58 | 59 | 60 | class DreamInferenceROS: 61 | def __init__(self, args, single_frame_mode=True, compute_2d_to_3d_transform=False): 62 | """Initialize inference engine. 63 | 64 | single_frame_mode: Set this to True. (Appears to be some sort of future-proofing by Tim.) 
65 | """ 66 | self.cv_image = None 67 | self.camera_K = None 68 | 69 | # TODO: -- continuous mode produces a TF at each frame 70 | # not continuous mode allows for several frames before producing an estimate 71 | self.single_frame_mode = single_frame_mode 72 | self.capture_frame_srv = rospy.Service( 73 | capture_frame_service_topic, Empty, self.on_capture_frame 74 | ) 75 | self.clear_buffer_srv = rospy.Service( 76 | clear_buffer_service_topic, Empty, self.on_clear_buffer 77 | ) 78 | self.kp_projs_raw_buffer = np.array([]) 79 | self.kp_positions_buffer = np.array([]) 80 | self.pnp_solution_found = False 81 | self.capture_frame_max_kps = True 82 | 83 | self.compute_2d_to_3d_transform = compute_2d_to_3d_transform 84 | 85 | # Create subscribers 86 | self.image_sub = rospy.Subscriber( 87 | image_topic, Image, self.on_image, queue_size=1 88 | ) 89 | self.bridge = CvBridge() 90 | 91 | # Input argument handling 92 | assert os.path.exists( 93 | args.input_params_path 94 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 95 | args.input_params_path 96 | ) 97 | 98 | if args.input_config_path: 99 | input_config_path = args.input_config_path 100 | else: 101 | # Use params filepath to infer the config filepath 102 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 103 | 104 | assert os.path.exists( 105 | input_config_path 106 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 107 | input_config_path 108 | ) 109 | 110 | # Create parser 111 | print("# Opening config file: {} ...".format(input_config_path)) 112 | data_parser = YAML(typ="safe") 113 | 114 | with open(input_config_path, "r") as f: 115 | network_config = data_parser.load(f) 116 | 117 | # Overwrite GPU 118 | # If nothing is specified at the command line, None is the default, which uses all GPUs 119 | # TBD - think about a better way of doing this 120 | network_config["training"]["platform"]["gpu_ids"] = args.gpu_ids 121 | 122 | # Load network 123 | print("# Creating network...") 124 | self.dream_network = dream.create_network_from_config_data(network_config) 125 | 126 | print( 127 | "Loading network with weights from: {} ...".format(args.input_params_path) 128 | ) 129 | self.dream_network.model.load_state_dict(torch.load(args.input_params_path)) 130 | self.dream_network.enable_evaluation() 131 | 132 | # Use image preprocessing specified by config by default, unless user specifies otherwise 133 | if args.image_preproc_override: 134 | self.image_preprocessing = args.image_preproc_override 135 | else: 136 | self.image_preprocessing = self.dream_network.image_preprocessing() 137 | 138 | # Output names used to look up keypoints in TF tree 139 | self.keypoint_tf_frames = self.dream_network.ros_keypoint_frames 140 | print("ROS keypoint frames: {}".format(self.keypoint_tf_frames)) 141 | 142 | # Define publishers 143 | self.net_input_image_pub = rospy.Publisher( 144 | topic_out_net_input_image, Image, queue_size=1 145 | ) 146 | self.image_overlay_pub = rospy.Publisher( 147 | topic_out_keypoint_overlay, Image, queue_size=1 148 | ) 149 | self.belief_image_pub = rospy.Publisher( 150 | topic_out_belief_maps, Image, queue_size=1 151 | ) 152 | self.kp_belief_overlay_pub = rospy.Publisher( 153 | topic_out_keypoint_belief_overlay, Image, queue_size=1 154 | ) 155 | self.kp_frame_overlay_pub = rospy.Publisher( 156 | topic_out_keypoint_frame_overlay, Image, queue_size=1 157 | ) 158 | 159 | # Store the base frame for the TF lookup 160 | self.base_tf_frame = args.base_frame 161 | 162 | # Define 
TFs 163 | self.tfBuffer = tf2.Buffer() 164 | self.tf_broadcaster = tf2.TransformBroadcaster() 165 | self.listener = tf2.TransformListener(self.tfBuffer) 166 | self.camera_pose_tform = TransformStamped() 167 | 168 | self.camera_pose_tform.header.frame_id = self.base_tf_frame 169 | self.camera_pose_tform.child_frame_id = tform_out_childname 170 | 171 | # Subscriber for camera intrinsics topic 172 | self.camera_info_sub = rospy.Subscriber( 173 | camera_info_topic, CameraInfo, self.on_camera_info, queue_size=1 174 | ) 175 | 176 | # Verbose mode 177 | self.verbose = args.verbose 178 | 179 | def on_capture_frame(self, req): 180 | print("Capturing frame.") 181 | found_kp_projs_net_input = dream_ros.process_image() 182 | print(found_kp_projs_net_input) 183 | ( 184 | kp_projs_raw_good_sample, 185 | kp_positions_good_sample, 186 | ) = dream_ros.keypoint_correspondences(found_kp_projs_net_input) 187 | if self.capture_frame_max_kps and kp_projs_raw_good_sample is not None: 188 | n_found_keypoints = kp_projs_raw_good_sample.shape[0] 189 | if n_found_keypoints != self.dream_network.n_keypoints: 190 | print( 191 | "Only found {} keypoints -- not continuing. Try again.".format( 192 | n_found_keypoints 193 | ) 194 | ) 195 | return [] 196 | if ( 197 | kp_projs_raw_good_sample is not None 198 | and kp_positions_good_sample is not None 199 | ): 200 | dream_ros.solve_pnp_buffer( 201 | kp_projs_raw_good_sample, kp_positions_good_sample 202 | ) 203 | return [] 204 | 205 | def on_clear_buffer(self, req): 206 | print("Clearing frame buffer.") 207 | self.kp_projs_raw_buffer = np.array([]) 208 | self.kp_positions_buffer = np.array([]) 209 | self.pnp_solution_found = False 210 | return [] 211 | 212 | def on_image(self, image): 213 | self.cv_image = self.bridge.imgmsg_to_cv2(image, "rgb8") 214 | 215 | def on_camera_info(self, camera_info): 216 | # Create camera intrinsics matrix 217 | fx = camera_info.K[0] 218 | fy = camera_info.K[4] 219 | cx = camera_info.K[2] 220 | cy = camera_info.K[5] 221 | self.camera_K = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]) 222 | 223 | def process_image(self): 224 | """Performs inference on the image most recently captured. 225 | 226 | Input (none) 227 | self.cv_image: Holds the image to be processed. 
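The `on_camera_info` callback above assembles the pinhole intrinsics matrix from the row-major `CameraInfo.K` field. A small sanity-check sketch of that layout (the numeric values are made up):

```python
import numpy as np

# sensor_msgs/CameraInfo.K is a row-major 3x3 matrix flattened to 9 entries:
# [fx, 0, cx, 0, fy, cy, 0, 0, 1]
K_flat = [615.0, 0.0, 320.0, 0.0, 615.0, 240.0, 0.0, 0.0, 1.0]  # illustrative values

fx, fy = K_flat[0], K_flat[4]
cx, cy = K_flat[2], K_flat[5]
camera_K = np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])
assert np.allclose(camera_K.reshape(-1), K_flat)
```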
228 | 229 | Returns 230 | detected_keypoints: Array of 2D keypoint coords wrt original image, possibly including missing keypoints 231 | """ 232 | 233 | if self.cv_image is None: 234 | return 235 | 236 | # Determine if we need debug content from single-image inference based on subscribers 237 | num_connect_belief_image_pub = self.belief_image_pub.get_num_connections() 238 | num_connect_net_input_image_pub = self.net_input_image_pub.get_num_connections() 239 | num_connect_image_overlay_pub = self.image_overlay_pub.get_num_connections() 240 | num_connect_kp_belief_overlay_pub = ( 241 | self.kp_belief_overlay_pub.get_num_connections() 242 | ) 243 | 244 | if ( 245 | num_connect_belief_image_pub > 0 246 | or num_connect_net_input_image_pub > 0 247 | or num_connect_image_overlay_pub > 0 248 | or num_connect_kp_belief_overlay_pub > 0 249 | ): 250 | debug_inference = True 251 | else: 252 | debug_inference = False 253 | 254 | # Detect keypoints from single-image inference 255 | image_raw = PILImage.fromarray(self.cv_image) 256 | detection_result = self.dream_network.keypoints_from_image( 257 | image_raw, 258 | image_preprocessing_override=self.image_preprocessing, 259 | debug=debug_inference, 260 | ) 261 | detected_keypoints = detection_result["detected_keypoints"] 262 | 263 | # Publish debug topics - only do this if something is subscribed 264 | # TODO: clean up so some of the intermediate processing isn't duplicated 265 | if num_connect_belief_image_pub > 0: 266 | belief_maps = detection_result["belief_maps"] 267 | belief_images = dream.image_proc.images_from_belief_maps( 268 | belief_maps, normalization_method=6 269 | ) 270 | belief_image_mosaic = dream.image_proc.mosaic_images( 271 | belief_images, rows=1, cols=len(belief_images), inner_padding_px=5 272 | ) 273 | cv_belief_image = np.array(belief_image_mosaic) 274 | cv_belief_image = cv_belief_image[:, :, ::-1].copy() 275 | belief_msg = self.bridge.cv2_to_imgmsg(cv_belief_image, encoding="bgr8") 276 | self.belief_image_pub.publish(belief_msg) 277 | 278 | if num_connect_net_input_image_pub > 0: 279 | net_input_image = detection_result["image_rgb_net_input"] 280 | cv_input_image = np.array(net_input_image) 281 | cv_input_image = cv_input_image[:, :, ::-1].copy() 282 | net_input_image_msg = self.bridge.cv2_to_imgmsg( 283 | cv_input_image, encoding="bgr8" 284 | ) 285 | self.net_input_image_pub.publish(net_input_image_msg) 286 | 287 | if num_connect_image_overlay_pub > 0: 288 | # TODO: fix color cycle 289 | if self.dream_network.n_keypoints == 7: 290 | point_colors = [ 291 | "red", 292 | "blue", 293 | "green", 294 | "yellow", 295 | "black", 296 | "cyan", 297 | "white", 298 | ] 299 | else: 300 | point_colors = "red" 301 | 302 | image_overlay = dream.image_proc.overlay_points_on_image( 303 | image_raw, 304 | detected_keypoints, 305 | self.dream_network.friendly_keypoint_names, 306 | annotation_color_dot=point_colors, 307 | annotation_color_text=point_colors, 308 | ) 309 | cv_image_overlay = np.array(image_overlay) 310 | cv_image_overlay = cv_image_overlay[:, :, ::-1].copy() 311 | image_overlay_msg = self.bridge.cv2_to_imgmsg( 312 | cv_image_overlay, encoding="bgr8" 313 | ) 314 | self.image_overlay_pub.publish(image_overlay_msg) 315 | 316 | if num_connect_kp_belief_overlay_pub > 0: 317 | image_raw_resolution = (self.cv_image.shape[1], self.cv_image.shape[0]) 318 | net_input_resolution = detection_result["image_rgb_net_input"].size 319 | belief_maps = detection_result["belief_maps"] 320 | flattened_belief_tensor = belief_maps.sum(dim=0) 321 | 
flattened_belief_image = dream.image_proc.image_from_belief_map( 322 | flattened_belief_tensor, colormap="hot", normalization_method=6 323 | ) 324 | flattened_belief_image_netin = dream.image_proc.convert_image_to_netin_from_netout( 325 | flattened_belief_image, net_input_resolution 326 | ) 327 | flattened_belief_image_raw = dream.image_proc.inverse_preprocess_image( 328 | flattened_belief_image_netin, 329 | image_raw_resolution, 330 | self.image_preprocessing, 331 | ) 332 | flattened_belief_image_raw_blend = PILImage.blend( 333 | image_raw, flattened_belief_image_raw, alpha=0.5 334 | ) 335 | 336 | # Overlay keypoints 337 | # TODO: fix color cycle 338 | if self.dream_network.n_keypoints == 7: 339 | point_colors = [ 340 | "red", 341 | "blue", 342 | "green", 343 | "yellow", 344 | "black", 345 | "cyan", 346 | "white", 347 | ] 348 | else: 349 | point_colors = "red" 350 | 351 | kp_belief_overlay = dream.image_proc.overlay_points_on_image( 352 | flattened_belief_image_raw_blend, 353 | detected_keypoints, 354 | self.dream_network.friendly_keypoint_names, 355 | annotation_color_dot=point_colors, 356 | annotation_color_text=point_colors, 357 | ) 358 | cv_kp_belief_overlay = np.array(kp_belief_overlay) 359 | cv_kp_belief_overlay = cv_kp_belief_overlay[:, :, ::-1].copy() 360 | kp_belief_overlay_msg = self.bridge.cv2_to_imgmsg( 361 | cv_kp_belief_overlay, encoding="bgr8" 362 | ) 363 | self.kp_belief_overlay_pub.publish(kp_belief_overlay_msg) 364 | 365 | return detected_keypoints 366 | 367 | def keypoint_correspondences(self, detected_kp_projs): 368 | """Convert 2D keypoint coords to (2D, 3D) pairs of correspondences. 369 | 370 | Input: 371 | detected_kp_projs: Array of 2D keypoint coords wrt original image, possibly including missing keypoints 372 | 373 | Returns: 374 | kp_projs_raw_good_sample: Array of 2D keypoint coords wrt original image 375 | kp_positions_good_sample: Array of 3D keypoint coords 376 | """ 377 | 378 | if not self.compute_2d_to_3d_transform: 379 | return None, None 380 | 381 | all_kp_positions = [] 382 | 383 | for i in range(len(self.keypoint_tf_frames)): 384 | keypoint_tf_frame = self.keypoint_tf_frames[i] 385 | if self.verbose: 386 | print( 387 | "Attempting transform lookup between {} and {}...".format( 388 | self.base_tf_frame, keypoint_tf_frame 389 | ) 390 | ) 391 | try: 392 | tform = self.tfBuffer.lookup_transform( 393 | self.base_tf_frame, keypoint_tf_frame, rospy.Time() 394 | ) 395 | this_pos = np.array( 396 | [ 397 | tform.transform.translation.x, 398 | tform.transform.translation.y, 399 | tform.transform.translation.z, 400 | ] 401 | ) 402 | all_kp_positions.append(this_pos) 403 | 404 | except tf2.TransformException as e: 405 | print("TF Exception: {}".format(e)) 406 | return None, None 407 | 408 | # Now determine which to keep 409 | kp_projs_raw_good_sample = [] 410 | kp_positions_good_sample = [] 411 | for this_kp_proj_est, this_kp_position in zip( 412 | detected_kp_projs, all_kp_positions 413 | ): 414 | if ( 415 | this_kp_proj_est is not None 416 | and this_kp_proj_est[0] 417 | and this_kp_proj_est[1] 418 | and not (this_kp_proj_est[0] < -999.0 and this_kp_proj_est[1] < 0.999) 419 | ): 420 | # This keypoint is defined to exist within the image frame, so we keep it 421 | kp_projs_raw_good_sample.append(this_kp_proj_est) 422 | kp_positions_good_sample.append(this_kp_position) 423 | 424 | kp_projs_raw_good_sample = np.array(kp_projs_raw_good_sample) 425 | kp_positions_good_sample = np.array(kp_positions_good_sample) 426 | 427 | return kp_projs_raw_good_sample, 
kp_positions_good_sample 428 | 429 | def solve_pnp_buffer(self, candidate_kp_projs_raw, candidate_kp_positions): 430 | 431 | if self.camera_K is None: 432 | self.pnp_solution_found = False 433 | return 434 | 435 | kp_projs_raw_to_try = np.array( 436 | self.kp_projs_raw_buffer.tolist() + candidate_kp_projs_raw.tolist() 437 | ) 438 | kp_positions_to_try = np.array( 439 | self.kp_positions_buffer.tolist() + candidate_kp_positions.tolist() 440 | ) 441 | 442 | if self.verbose: 443 | print("\nSolving for PNP... ~~~~~~~~~~~~~~~~~~~~") 444 | print("2D Detected KP projections in raw:") 445 | print(kp_projs_raw_to_try) 446 | 447 | print("3D KP positions:") 448 | print(kp_positions_to_try) 449 | 450 | pnp_retval, tvec, quat = dream.geometric_vision.solve_pnp( 451 | kp_positions_to_try, kp_projs_raw_to_try, self.camera_K 452 | ) 453 | 454 | if pnp_retval: 455 | self.pnp_solution_found = True 456 | 457 | if self.verbose: 458 | print("Camera-from-robot pose, found by PNP:") 459 | print(tvec) 460 | print(quat) 461 | 462 | # Update transform 463 | T_cam_from_robot = tf.transformations.quaternion_matrix(quat) 464 | T_cam_from_robot[:3, -1] = tvec 465 | T_robot_from_cam = tf.transformations.inverse_matrix(T_cam_from_robot) 466 | 467 | robot_from_cam_pos = T_robot_from_cam[:3, -1] 468 | 469 | R_robot_from_cam = T_robot_from_cam[:3, :3] 470 | temp = tf.transformations.identity_matrix() 471 | temp[:3, :3] = R_robot_from_cam 472 | robot_from_cam_quat = tf.transformations.quaternion_from_matrix(temp) 473 | 474 | # Update transform msg for publication 475 | self.camera_pose_tform.transform.translation.x = robot_from_cam_pos[0] 476 | self.camera_pose_tform.transform.translation.y = robot_from_cam_pos[1] 477 | self.camera_pose_tform.transform.translation.z = robot_from_cam_pos[2] 478 | 479 | self.camera_pose_tform.transform.rotation.x = robot_from_cam_quat[0] 480 | self.camera_pose_tform.transform.rotation.y = robot_from_cam_quat[1] 481 | self.camera_pose_tform.transform.rotation.z = robot_from_cam_quat[2] 482 | self.camera_pose_tform.transform.rotation.w = robot_from_cam_quat[3] 483 | 484 | # Save buffer since we got a PNP solution if we're not in single-frame mode 485 | if not self.single_frame_mode: 486 | self.kp_projs_raw_buffer = kp_projs_raw_to_try 487 | self.kp_positions_buffer = kp_positions_to_try 488 | print( 489 | "Adding to buffer! 
New buffer size: {}".format( 490 | self.kp_positions_buffer.shape[0] 491 | ) 492 | ) 493 | 494 | else: 495 | print("PnP failed to provide a solution.") 496 | self.pnp_solution_found = False 497 | 498 | def publish_pose(self): 499 | self.camera_pose_tform.header.stamp = rospy.Time().now() 500 | self.tf_broadcaster.sendTransform(self.camera_pose_tform) 501 | 502 | # Generate and publish keypoint frame overlay 503 | if self.kp_frame_overlay_pub.get_num_connections() > 0: 504 | 505 | if self.cv_image is None or self.camera_K is None: 506 | return 507 | 508 | all_kp_transforms = [] 509 | for i in range(len(self.keypoint_tf_frames)): 510 | keypoint_tf_frame = self.keypoint_tf_frames[i] 511 | # Lookup transform between dream published frame and keypoint frames 512 | try: 513 | tform = self.tfBuffer.lookup_transform( 514 | self.camera_pose_tform.child_frame_id, 515 | keypoint_tf_frame, 516 | rospy.Time(), 517 | ) 518 | all_kp_transforms.append(tform) 519 | 520 | except tf2.TransformException as e: 521 | print("TF Exception: {}".format(e)) 522 | return None, None 523 | 524 | cv_image_overlay = self.cv_image.copy() 525 | 526 | frame_len = 0.1 527 | frame_thickness = 3 528 | frame_triad_pts = np.array( 529 | [ 530 | [0.0, 0.0, 0.0, 1.0], 531 | [frame_len, 0.0, 0.0, 1.0], 532 | [0.0, frame_len, 0.0, 1.0], 533 | [0.0, 0.0, frame_len, 1.0], 534 | ] 535 | ) 536 | shift = 4 537 | factor = 1 << shift 538 | point_radius = 4.0 539 | 540 | for kp_tform in all_kp_transforms: 541 | pos = [ 542 | kp_tform.transform.translation.x, 543 | kp_tform.transform.translation.y, 544 | kp_tform.transform.translation.z, 545 | ] 546 | quat = [ 547 | kp_tform.transform.rotation.x, 548 | kp_tform.transform.rotation.y, 549 | kp_tform.transform.rotation.z, 550 | kp_tform.transform.rotation.w, 551 | ] 552 | T = tf.transformations.quaternion_matrix(quat) 553 | T[:3, -1] = pos 554 | 555 | frame_triad_positions_homog = np.transpose( 556 | np.matmul(T, np.transpose(frame_triad_pts)) 557 | ) 558 | frame_triad_positions = [ 559 | dream.geometric_vision.hnormalized(v).tolist() 560 | for v in frame_triad_positions_homog 561 | ] 562 | frame_triad_projs = dream.geometric_vision.point_projection_from_3d( 563 | self.camera_K, frame_triad_positions 564 | ) 565 | 566 | # Overlay line on image 567 | point0_fixedpt = ( 568 | int(frame_triad_projs[0][0] * factor), 569 | int(frame_triad_projs[0][1] * factor), 570 | ) 571 | point1_fixedpt = ( 572 | int(frame_triad_projs[1][0] * factor), 573 | int(frame_triad_projs[1][1] * factor), 574 | ) 575 | point2_fixedpt = ( 576 | int(frame_triad_projs[2][0] * factor), 577 | int(frame_triad_projs[2][1] * factor), 578 | ) 579 | point3_fixedpt = ( 580 | int(frame_triad_projs[3][0] * factor), 581 | int(frame_triad_projs[3][1] * factor), 582 | ) 583 | 584 | # x-axis 585 | cv_image_overlay = cv2.line( 586 | cv_image_overlay, 587 | point0_fixedpt, 588 | point1_fixedpt, 589 | webcolors.name_to_rgb("red"), 590 | thickness=frame_thickness, 591 | shift=shift, 592 | ) 593 | # y-axis 594 | cv_image_overlay = cv2.line( 595 | cv_image_overlay, 596 | point0_fixedpt, 597 | point2_fixedpt, 598 | webcolors.name_to_rgb("green"), 599 | thickness=frame_thickness, 600 | shift=shift, 601 | ) 602 | # z-axis 603 | cv_image_overlay = cv2.line( 604 | cv_image_overlay, 605 | point0_fixedpt, 606 | point3_fixedpt, 607 | webcolors.name_to_rgb("blue"), 608 | thickness=frame_thickness, 609 | shift=shift, 610 | ) 611 | # center of frame triad 612 | radius_fixedpt = int(point_radius * factor) 613 | cv_image_overlay = cv2.circle( 614 | 
cv_image_overlay, 615 | point0_fixedpt, 616 | radius_fixedpt, 617 | webcolors.name_to_rgb("black"), 618 | thickness=-1, 619 | shift=shift, 620 | ) 621 | 622 | cv_image_overlay = cv_image_overlay[:, :, ::-1] 623 | image_overlay_msg = self.bridge.cv2_to_imgmsg( 624 | cv_image_overlay, encoding="bgr8" 625 | ) 626 | self.kp_frame_overlay_pub.publish(image_overlay_msg) 627 | 628 | 629 | if __name__ == "__main__": 630 | 631 | # Parse input arguments 632 | parser = argparse.ArgumentParser( 633 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 634 | ) 635 | parser.add_argument( 636 | "-i", 637 | "--input-params-path", 638 | required=True, 639 | help="Path to network parameters file.", 640 | ) 641 | parser.add_argument( 642 | "-c", 643 | "--input-config-path", 644 | default=None, 645 | help="Path to network configuration file. If nothing is specified, the script will search for a config file by the same name as the network parameters file.", 646 | ) 647 | parser.add_argument( 648 | "-b", 649 | "--base-frame", 650 | required=True, 651 | help="The ROS TF name for the base frame of the robot, which serves as the canonical frame for PnP.", 652 | ) 653 | parser.add_argument( 654 | "-r", 655 | "--node-rate", 656 | type=float, 657 | default=10.0, 658 | help="The rate in Hz for this node to run.", 659 | ) 660 | parser.add_argument( 661 | "-v", 662 | "--verbose", 663 | action="store_true", 664 | default=False, 665 | help="Outputs all diagnostic information to the screen.", 666 | ) 667 | parser.add_argument( 668 | "-p", 669 | "--image-preproc-override", 670 | default=None, 671 | help="Overrides the image preprocessing specified by the network. (Debug argument.)", 672 | ) 673 | parser.add_argument( 674 | "-g", 675 | "--gpu-ids", 676 | nargs="+", 677 | type=int, 678 | default=None, 679 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 680 | ) 681 | args = parser.parse_args() 682 | 683 | # Initialize ROS node 684 | rospy.init_node("dream") 685 | 686 | # Create DREAM inference engine 687 | single_frame_mode = True 688 | mode_str = "single-frame mode" if single_frame_mode else "multi-frame mode" 689 | dream_ros = DreamInferenceROS( 690 | args, single_frame_mode, compute_2d_to_3d_transform=True 691 | ) 692 | print("DREAM ~ online ~ " + mode_str) 693 | 694 | # Main loop 695 | rate = rospy.Rate(args.node_rate) 696 | while not rospy.is_shutdown(): 697 | 698 | # Find keypoints in image 699 | found_kp_projs_net_input = dream_ros.process_image() 700 | 701 | # Solve PNP (if in single frame mode) 702 | if single_frame_mode and found_kp_projs_net_input is not None: 703 | ( 704 | kp_projs_raw_good_sample, 705 | kp_positions_good_sample, 706 | ) = dream_ros.keypoint_correspondences(found_kp_projs_net_input) 707 | if ( 708 | kp_projs_raw_good_sample is not None 709 | and kp_positions_good_sample is not None 710 | ): 711 | dream_ros.solve_pnp_buffer( 712 | kp_projs_raw_good_sample, kp_positions_good_sample 713 | ) 714 | 715 | # Publish pose (if found) 716 | if dream_ros.pnp_solution_found: 717 | dream_ros.publish_pose() 718 | 719 | rate.sleep() 720 | -------------------------------------------------------------------------------- /scripts/network_inference.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
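`solve_pnp_buffer` above converts the camera-from-robot pose returned by PnP into the robot-from-camera transform that is broadcast over TF. The same inversion in plain numpy (the rotation and translation values are placeholders, not results):

```python
import numpy as np

# Camera-from-robot pose assembled from a PnP solution (illustrative values)
R_cam_from_robot = np.eye(3)
tvec = np.array([0.1, 0.0, 0.8])

T_cam_from_robot = np.eye(4)
T_cam_from_robot[:3, :3] = R_cam_from_robot
T_cam_from_robot[:3, 3] = tvec

# Invert to obtain the robot-from-camera pose that is published as the TF transform
T_robot_from_cam = np.linalg.inv(T_cam_from_robot)
robot_from_cam_pos = T_robot_from_cam[:3, 3]
robot_from_cam_rot = T_robot_from_cam[:3, :3]
```

For a rigid transform the inverse can also be formed analytically as `R.T` and `-R.T @ t`, which is numerically cheaper than a general matrix inverse.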
Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import math 7 | import os 8 | from PIL import Image as PILImage 9 | 10 | import numpy as np 11 | from ruamel.yaml import YAML 12 | import torch 13 | import torchvision.transforms as TVTransforms 14 | 15 | import dream 16 | 17 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 18 | 19 | 20 | def generate_belief_map_visualizations( 21 | belief_maps, keypoint_projs_detected, keypoint_projs_gt=None 22 | ): 23 | 24 | belief_map_images = dream.image_proc.images_from_belief_maps( 25 | belief_maps, normalization_method=6 26 | ) 27 | 28 | belief_map_images_kp = [] 29 | for kp in range(len(keypoint_projs_detected)): 30 | if keypoint_projs_gt: 31 | keypoint_projs = [keypoint_projs_gt[kp], keypoint_projs_detected[kp]] 32 | color = ["green", "red"] 33 | else: 34 | keypoint_projs = [keypoint_projs_detected[kp]] 35 | color = "red" 36 | belief_map_image_kp = dream.image_proc.overlay_points_on_image( 37 | belief_map_images[kp], 38 | keypoint_projs, 39 | annotation_color_dot=color, 40 | annotation_color_text=color, 41 | point_diameter=4, 42 | ) 43 | belief_map_images_kp.append(belief_map_image_kp) 44 | n_cols = int(math.ceil(len(keypoint_projs_detected) / 2.0)) 45 | belief_maps_kp_mosaic = dream.image_proc.mosaic_images( 46 | belief_map_images_kp, 47 | rows=2, 48 | cols=n_cols, 49 | inner_padding_px=10, 50 | fill_color_rgb=(0, 0, 0), 51 | ) 52 | return belief_maps_kp_mosaic 53 | 54 | 55 | def network_inference(args): 56 | 57 | # Input argument handling 58 | assert os.path.exists( 59 | args.input_params_path 60 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 61 | args.input_params_path 62 | ) 63 | 64 | if args.input_config_path: 65 | input_config_path = args.input_config_path 66 | else: 67 | # Use params filepath to infer the config filepath 68 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 69 | 70 | assert os.path.exists( 71 | input_config_path 72 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 73 | input_config_path 74 | ) 75 | 76 | assert os.path.exists( 77 | args.image_path 78 | ), 'Expected image_path "{}" to exist, but it does not.'.format(args.image_path) 79 | 80 | # Create parser 81 | print("# Opening config file: {} ...".format(input_config_path)) 82 | data_parser = YAML(typ="safe") 83 | 84 | with open(input_config_path, "r") as f: 85 | network_config = data_parser.load(f) 86 | 87 | # Overwrite GPU 88 | # If nothing is specified at the command line, None is the default, which uses all GPUs 89 | # TBD - think about a better way of doing this 90 | network_config["training"]["platform"]["gpu_ids"] = args.gpu_ids 91 | 92 | # Load network 93 | print("# Creating network...") 94 | dream_network = dream.create_network_from_config_data(network_config) 95 | 96 | print("Loading network with weights from: {} ...".format(args.input_params_path)) 97 | dream_network.model.load_state_dict(torch.load(args.input_params_path)) 98 | dream_network.enable_evaluation() 99 | 100 | # Load in image 101 | print("# Loading image: {} ...".format(args.image_path)) 102 | image_rgb_OrigInput_asPilImage = PILImage.open(args.image_path).convert("RGB") 103 | orig_image_dim = tuple(image_rgb_OrigInput_asPilImage.size) 104 | 105 | # Use image preprocessing specified by config by default, unless user specifies otherwise 106 | if args.image_preproc_override: 107 | image_preprocessing = args.image_preproc_override 108 | print( 109 | " Image preprocessing: 
'{}' --- as specified by user".format( 110 | image_preprocessing 111 | ) 112 | ) 113 | else: 114 | image_preprocessing = dream_network.image_preprocessing() 115 | print( 116 | " Image preprocessing: '{}' --- default as specified by network config".format( 117 | image_preprocessing 118 | ) 119 | ) 120 | 121 | print("Detecting keypoints...") 122 | detection_result = dream_network.keypoints_from_image( 123 | image_rgb_OrigInput_asPilImage, 124 | image_preprocessing_override=image_preprocessing, 125 | debug=True, 126 | ) 127 | kp_coords_wrtOrigInput_asArray = detection_result["detected_keypoints"] 128 | print( 129 | "Detected keypoints in input image:\n{}".format(kp_coords_wrtOrigInput_asArray) 130 | ) 131 | 132 | kp_coords_wrtNetOutput_asArray = detection_result["detected_keypoints_net_output"] 133 | image_rgb_NetInput_asPilImage = detection_result["image_rgb_net_input"] 134 | input_image_dim = image_rgb_NetInput_asPilImage.size 135 | belief_maps_wrtNetOutput_asTensor = detection_result["belief_maps"] 136 | kp_coords_wrtNetInput_asArray = detection_result["detected_keypoints_net_input"] 137 | 138 | # Read in keypoints if provided 139 | if args.keypoints_path: 140 | print( 141 | "# Loading ground truth keypoints from {} ...".format(args.keypoints_path) 142 | ) 143 | keypoints_gt = dream.utilities.load_keypoints( 144 | args.keypoints_path, 145 | dream_network.manipulator_name, 146 | dream_network.keypoint_names, 147 | ) 148 | kp_coords_gt_wrtOrig = keypoints_gt["projections"] 149 | print( 150 | "Ground truth keypoints in input image:\n{}".format( 151 | np.array(kp_coords_gt_wrtOrig) 152 | ) 153 | ) 154 | 155 | kp_coords_gt_wrtNetInput_asArray = dream.image_proc.convert_keypoints_to_netin_from_raw( 156 | kp_coords_gt_wrtOrig, 157 | orig_image_dim, 158 | dream_network.trained_net_input_resolution(), 159 | image_preprocessing, 160 | ) 161 | kp_coords_gt_wrtNetOutput_asArray = dream.image_proc.convert_keypoints_to_netout_from_netin( 162 | kp_coords_gt_wrtNetInput_asArray, 163 | dream_network.trained_net_input_resolution(), 164 | dream_network.trained_net_output_resolution(), 165 | ) 166 | kp_coords_gt_wrtNetInput_asList = kp_coords_gt_wrtNetInput_asArray.tolist() 167 | kp_coords_gt_wrtNetOutput_asList = kp_coords_gt_wrtNetOutput_asArray.tolist() 168 | else: 169 | print("# Not loading ground truth keypoints (not provided)") 170 | kp_coords_gt_wrtNetInput_asList = None 171 | kp_coords_gt_wrtNetOutput_asList = None 172 | 173 | # Generate visualization output: keypoints, with ground truth if requested) overlaid on image used for network input 174 | keypoints_wrtNetInput_overlay = dream.image_proc.overlay_points_on_image( 175 | image_rgb_NetInput_asPilImage, 176 | kp_coords_gt_wrtNetInput_asList, 177 | dream_network.friendly_keypoint_names, 178 | annotation_color_dot="green", 179 | annotation_color_text="white", 180 | ) 181 | keypoints_wrtNetInput_overlay = dream.image_proc.overlay_points_on_image( 182 | keypoints_wrtNetInput_overlay, 183 | kp_coords_wrtNetInput_asArray, 184 | dream_network.friendly_keypoint_names, 185 | annotation_color_dot="red", 186 | annotation_color_text="white", 187 | ) 188 | keypoints_wrtNetInput_overlay.show( 189 | title="Keypoints (possibly with ground truth) on net input image" 190 | ) 191 | 192 | # Generate visualization output: mosaic of raw belief maps from network 193 | belief_maps_overlay = generate_belief_map_visualizations( 194 | belief_maps_wrtNetOutput_asTensor, 195 | kp_coords_wrtNetOutput_asArray, 196 | kp_coords_gt_wrtNetOutput_asList, 197 | ) 198 | 
belief_maps_overlay.show(title="Belief map output mosaic") 199 | 200 | # Generate visualization output: mosaic of belief maps, with keypoints, overlaid on image used for network input 201 | belief_maps_wrtNetOutput_asListOfPilImages = dream.image_proc.images_from_belief_maps( 202 | belief_maps_wrtNetOutput_asTensor, normalization_method=6 203 | ) 204 | blended_array = [] 205 | 206 | for n in range(len(kp_coords_wrtNetOutput_asArray)): 207 | 208 | bm_wrtNetOutput_asPilImage = belief_maps_wrtNetOutput_asListOfPilImages[n] 209 | kp = kp_coords_wrtNetInput_asArray[n] 210 | fname = dream_network.friendly_keypoint_names[n] 211 | 212 | bm_wrtNetInput_asPilImage = bm_wrtNetOutput_asPilImage.resize( 213 | input_image_dim, resample=PILImage.BILINEAR 214 | ) 215 | blended = PILImage.blend( 216 | image_rgb_NetInput_asPilImage, bm_wrtNetInput_asPilImage, alpha=0.5 217 | ) 218 | blended = dream.image_proc.overlay_points_on_image( 219 | blended, 220 | [kp], 221 | [fname], 222 | annotation_color_dot="red", 223 | annotation_color_text="white", 224 | ) 225 | blended_array.append(blended) 226 | 227 | n_cols = int(math.ceil(len(kp_coords_wrtNetOutput_asArray) / 2.0)) 228 | belief_maps_with_kp_overlaid_mosaic = dream.image_proc.mosaic_images( 229 | blended_array, rows=2, cols=n_cols, fill_color_rgb=(0, 0, 0) 230 | ) 231 | belief_maps_with_kp_overlaid_mosaic.show( 232 | title="Mosaic of belief maps, with keypoints, on original" 233 | ) 234 | 235 | # Squash belief maps into one combined image 236 | belief_map_combined_wrtNetOutput_asTensor = belief_maps_wrtNetOutput_asTensor.sum( 237 | dim=0 238 | ) 239 | belief_map_combined_wrtNetOutput_asPilImage = dream.image_proc.image_from_belief_map( 240 | belief_map_combined_wrtNetOutput_asTensor, normalization_method=6 241 | ) # clamps to b/w 0 and 1 242 | belief_map_combined_wrtNetInput_asPilImage = dream.image_proc.convert_image_to_netin_from_netout( 243 | belief_map_combined_wrtNetOutput_asPilImage, input_image_dim 244 | ) 245 | 246 | # Generate visualization output: belief maps, with keypoints, overlaid on image used for network input 247 | belief_map_combined_wrtNetInput_overlay = PILImage.blend( 248 | image_rgb_NetInput_asPilImage, 249 | belief_map_combined_wrtNetInput_asPilImage, 250 | alpha=0.5, 251 | ) 252 | belief_map_combined_wrtNetInput_overlay = dream.image_proc.overlay_points_on_image( 253 | belief_map_combined_wrtNetInput_overlay, 254 | kp_coords_wrtNetInput_asArray, 255 | dream_network.friendly_keypoint_names, 256 | annotation_color_dot="red", 257 | annotation_color_text="white", 258 | ) 259 | belief_map_combined_wrtNetInput_overlay.show( 260 | title="Belief maps, with keypoints, on net input image" 261 | ) 262 | 263 | # Generate visualization output: belief maps, with keypoints, overlaid on original image 264 | belief_map_combined_wrtOrigInput_asPilImage = dream.image_proc.inverse_preprocess_image( 265 | belief_map_combined_wrtNetInput_asPilImage, orig_image_dim, image_preprocessing 266 | ) 267 | belief_map_combined_wrtOrigInput_overlay = PILImage.blend( 268 | image_rgb_OrigInput_asPilImage, 269 | belief_map_combined_wrtOrigInput_asPilImage, 270 | alpha=0.5, 271 | ) 272 | belief_map_combined_wrtOrigInput_overlay = dream.image_proc.overlay_points_on_image( 273 | belief_map_combined_wrtOrigInput_overlay, 274 | kp_coords_wrtOrigInput_asArray, 275 | dream_network.friendly_keypoint_names, 276 | annotation_color_dot="red", 277 | annotation_color_text="white", 278 | ) 279 | belief_map_combined_wrtOrigInput_overlay.show( 280 | title="Belief maps, with keypoints, 
on original image" 281 | ) 282 | 283 | print("Done.") 284 | 285 | 286 | if __name__ == "__main__": 287 | 288 | print( 289 | "---------- Running 'network_inference.py' -------------------------------------------------" 290 | ) 291 | 292 | # Parse input arguments 293 | parser = argparse.ArgumentParser( 294 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 295 | ) 296 | parser.add_argument( 297 | "-i", 298 | "--input-params-path", 299 | required=True, 300 | help="Path to network parameters file.", 301 | ) 302 | parser.add_argument( 303 | "-c", 304 | "--input-config-path", 305 | default=None, 306 | help="Path to network configuration file. If nothing is specified, the script will search for a config file by the same name as the network parameters file.", 307 | ) 308 | parser.add_argument( 309 | "-m", "--image_path", required=True, help="Path to image used for inference." 310 | ) 311 | parser.add_argument( 312 | "-k", 313 | "--keypoints_path", 314 | default=None, 315 | help="Path to NDDS dataset with ground truth keypoints information.", 316 | ) 317 | parser.add_argument( 318 | "-g", 319 | "--gpu-ids", 320 | nargs="+", 321 | type=int, 322 | default=None, 323 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 324 | ) 325 | parser.add_argument( 326 | "-p", 327 | "--image-preproc-override", 328 | default=None, 329 | help="Overrides the image preprocessing specified by the network. (Debug argument.)", 330 | ) 331 | args = parser.parse_args() 332 | 333 | # Run network inference 334 | network_inference(args) 335 | -------------------------------------------------------------------------------- /scripts/network_inference_dataset.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | 8 | import dream.analysis as dream_analysis 9 | 10 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 11 | 12 | 13 | def network_inference_dataset(args): 14 | 15 | # Input argument handling 16 | assert os.path.exists( 17 | args.input_params_path 18 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 19 | args.input_params_path 20 | ) 21 | 22 | if args.input_config_path: 23 | input_config_path = args.input_config_path 24 | else: 25 | # Use params filepath to infer the config filepath 26 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 27 | 28 | assert os.path.exists( 29 | input_config_path 30 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 31 | input_config_path 32 | ) 33 | 34 | assert os.path.exists( 35 | args.dataset_dir 36 | ), 'Expected dataset_dir "{}" to exist, but it does not.'.format(args.dataset_dir) 37 | 38 | assert ( 39 | isinstance(args.batch_size, int) and args.batch_size > 0 40 | ), 'If specified, "batch_size" must be a positive integer.' 
41 | 42 | visualize_belief_maps = not args.not_visualize_belief_maps 43 | 44 | dream_analysis.analyze_ndds_dataset( 45 | args.input_params_path, 46 | input_config_path, 47 | args.dataset_dir, 48 | args.output_dir, 49 | visualize_belief_maps=visualize_belief_maps, 50 | pnp_analysis=True, 51 | force_overwrite=args.force_overwrite, 52 | image_preprocessing_override=args.image_preproc_override, 53 | batch_size=args.batch_size, 54 | num_workers=args.num_workers, 55 | gpu_ids=args.gpu_ids, 56 | ) 57 | 58 | 59 | if __name__ == "__main__": 60 | parser = argparse.ArgumentParser( 61 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 62 | ) 63 | parser.add_argument( 64 | "-i", 65 | "--input-params-path", 66 | required=True, 67 | help="Path to network parameters file.", 68 | ) 69 | parser.add_argument( 70 | "-c", 71 | "--input-config-path", 72 | default=None, 73 | help="Path to network configuration file. If nothing is specified, the script will search for a config file by the same name as the network parameters file.", 74 | ) 75 | parser.add_argument( 76 | "-d", 77 | "--dataset-dir", 78 | required=True, 79 | help="Path to NDDS dataset on which to conduct network inference.", 80 | ) 81 | parser.add_argument( 82 | "-o", 83 | "--output-dir", 84 | required=True, 85 | help="Path to output directory to save analysis results.", 86 | ) 87 | parser.add_argument( 88 | "-not-v", 89 | "--not-visualize-belief-maps", 90 | action="store_true", 91 | default=False, 92 | help="Disable belief map visualization. Without this flag, belief map visualization is enabled by default.", 93 | ) 94 | parser.add_argument( 95 | "-f", 96 | "--force-overwrite", 97 | action="store_true", 98 | default=False, 99 | help="Forces overwriting of analysis results in the provided directory.", 100 | ) 101 | parser.add_argument( 102 | "-b", 103 | "--batch-size", 104 | type=int, 105 | default=16, 106 | help="The batch size used to process the data. Does not affect results, only how quickly the results are obtained.", 107 | ) 108 | parser.add_argument( 109 | "-w", 110 | "--num-workers", 111 | type=int, 112 | default=8, 113 | help='The number of subprocesses ("workers") used for loading the data. 0 means that no subprocesses are used. ' 114 | + "Does not affect results, only how quickly the results are obtained.", 115 | ) 116 | parser.add_argument( 117 | "-g", 118 | "--gpu-ids", 119 | nargs="+", 120 | type=int, 121 | default=None, 122 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 123 | ) 124 | parser.add_argument( 125 | "-p", 126 | "--image-preproc-override", 127 | default=None, 128 | help="Overrides the image preprocessing specified by the network. (Debug argument.)", 129 | ) 130 | args = parser.parse_args() 131 | network_inference_dataset(args) 132 | -------------------------------------------------------------------------------- /scripts/train_network_multi.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
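`network_inference_dataset.py` above only validates `batch_size` and forwards it, together with `num_workers`, to `dream.analysis.analyze_ndds_dataset`. These map onto the standard PyTorch `DataLoader` arguments; a toy sketch of how such settings behave (the dataset here is a stand-in, not the repo's NDDS dataset class):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset of 32 fake RGB images; the repo uses its NDDS manipulator dataset instead
dataset = TensorDataset(torch.zeros(32, 3, 400, 400))
loader = DataLoader(dataset, batch_size=16, num_workers=0, shuffle=False)

for (batch,) in loader:
    print(batch.shape)  # torch.Size([16, 3, 400, 400]) for each full batch
```

As the help text notes, these options only change how quickly results are produced, not the results themselves.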
Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | import subprocess 8 | 9 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 10 | 11 | # Totally developmental script 12 | 13 | 14 | def train_network_multi(args): 15 | 16 | # Input argument handling 17 | assert ( 18 | isinstance(args.num_instances, int) and args.num_instances > 0 19 | ), "Expected num_instances to be an integer greater than 0." 20 | 21 | # Generate output directories 22 | dream.utilities.makedirs(args.output_dir) 23 | output_dirs = [ 24 | os.path.join(args.output_dir, "train_{}".format(n)) 25 | for n in range(args.num_instances) 26 | ] 27 | 28 | for output_dir in output_dirs: 29 | 30 | # TBD: do this without subprocess 31 | train_command_output = ( 32 | "python scripts/train_network.py " 33 | + args.train_command 34 | + ' -o "{}"'.format(output_dir) 35 | ) 36 | subprocess.run(train_command_output, shell=True) 37 | 38 | 39 | if __name__ == "__main__": 40 | parser = argparse.ArgumentParser( 41 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 42 | ) 43 | parser.add_argument( 44 | "-n", 45 | "--num-instances", 46 | metavar="num_instances", 47 | type=int, 48 | required=True, 49 | help="Number of training instances to run.", 50 | ) 51 | parser.add_argument( 52 | "-o", 53 | "--output-dir", 54 | metavar="output_dir", 55 | required=True, 56 | help="Output directory for training instances.", 57 | ) 58 | parser.add_argument( 59 | "-c", 60 | "--train-command", 61 | metavar="train_command", 62 | required=True, 63 | help="Command line options for training.", 64 | ) 65 | args = parser.parse_args() 66 | 67 | # Run train network, multiple instances version 68 | train_network_multi(args) 69 | -------------------------------------------------------------------------------- /scripts/visualize_network_inference.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
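`train_network_multi.py` above is explicitly marked as developmental: it calls `dream.utilities.makedirs` but never imports `dream`, so as shown it needs an `import dream` (or a plain `os.makedirs`) before it will run. A self-contained sketch of the same launch loop (paths and options are illustrative):

```python
import os
import subprocess

def launch_training_instances(num_instances, output_dir, train_command):
    # Stand-in for dream.utilities.makedirs
    os.makedirs(output_dir, exist_ok=True)
    for n in range(num_instances):
        instance_dir = os.path.join(output_dir, "train_{}".format(n))
        cmd = 'python scripts/train_network.py {} -o "{}"'.format(train_command, instance_dir)
        subprocess.run(cmd, shell=True)

# Example (illustrative arguments):
# launch_training_instances(3, "training_results/multi", "<options for train_network.py>")
```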
Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | from PIL import Image as PILImage 8 | import shutil 9 | import subprocess 10 | import sys 11 | 12 | import numpy as np 13 | from ruamel.yaml import YAML 14 | import torch 15 | from torch.utils.data import DataLoader as TorchDataLoader 16 | import torchvision.transforms as TVTransforms 17 | from tqdm import tqdm 18 | 19 | import dream 20 | 21 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 22 | 23 | 24 | def video_from_frames(frames_dir, video_output_path, video_framerate): 25 | force_str = "-y" 26 | loglevel_str = "-loglevel 24" 27 | framerate_str = "-framerate {}".format(video_framerate) 28 | input_data_str = '-pattern_type glob -i "{}"'.format( 29 | os.path.join(frames_dir, "*.png") 30 | ) 31 | output_vid_str = '"{}"'.format(video_output_path) 32 | encoding_str = "-vcodec libx264 -pix_fmt yuv420p" 33 | ffmpeg_vid_cmd = ( 34 | "ffmpeg " 35 | + force_str 36 | + " " 37 | + loglevel_str 38 | + " " 39 | + framerate_str 40 | + " " 41 | + input_data_str 42 | + " " 43 | + encoding_str 44 | + " " 45 | + output_vid_str 46 | ) 47 | 48 | print("Running command: {}".format(ffmpeg_vid_cmd)) 49 | subprocess.call(ffmpeg_vid_cmd, shell=True) 50 | 51 | 52 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 53 | 54 | KP_OVERLAY_RAW = "kp_raw" 55 | KP_OVERLAY_NET_INPUT = "kp_net_input" 56 | KP_BELIEF_OVERLAY_RAW = "kp_belief_raw" 57 | BELIEF_OVERLAY_RAW = "belief_raw" 58 | 59 | 60 | def visualize_network_inference(args): 61 | 62 | # Input argument handling 63 | assert os.path.exists( 64 | args.input_params_path 65 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 66 | args.input_params_path 67 | ) 68 | 69 | if args.input_config_path: 70 | input_config_path = args.input_config_path 71 | else: 72 | # Use params filepath to infer the config filepath 73 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 74 | 75 | assert os.path.exists( 76 | input_config_path 77 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 78 | input_config_path 79 | ) 80 | 81 | assert os.path.exists( 82 | args.dataset_path 83 | ), 'Expected dataset_path "{}" to exist, but it does not.'.format(args.dataset_path) 84 | 85 | # Determine what types of visualizations to do 86 | print("visualization types: {}".format(args.visualization_types)) 87 | do_kp_overlay_raw = True if KP_OVERLAY_RAW in args.visualization_types else False 88 | do_kp_overlay_net_input = ( 89 | True if KP_OVERLAY_NET_INPUT in args.visualization_types else False 90 | ) 91 | do_kp_belief_overlay_raw = ( 92 | True if KP_BELIEF_OVERLAY_RAW in args.visualization_types else False 93 | ) 94 | do_belief_overlay_raw = ( 95 | True if BELIEF_OVERLAY_RAW in args.visualization_types else False 96 | ) 97 | 98 | videos_to_make = [] 99 | needs_belief_maps = False 100 | if do_kp_overlay_raw: 101 | idx_kp_overlay_raw = len(videos_to_make) 102 | videos_to_make.append( 103 | { 104 | "frames_dir": os.path.join(args.output_dir, "frames_kp_overlay_raw"), 105 | "output_path": os.path.join(args.output_dir, "kp_overlay_raw.mp4"), 106 | "frame": [], 107 | } 108 | ) 109 | if do_kp_overlay_net_input: 110 | idx_kp_overlay_net_input = len(videos_to_make) 111 | videos_to_make.append( 112 | { 113 | "frames_dir": os.path.join( 114 | args.output_dir, "frames_kp_overlay_net_input" 115 | ), 116 | "output_path": os.path.join( 117 | args.output_dir, "kp_overlay_net_input.mp4" 118 | ), 119 | 
"frame": [], 120 | } 121 | ) 122 | if do_kp_belief_overlay_raw: 123 | idx_kp_belief_overlay_raw = len(videos_to_make) 124 | needs_belief_maps = True 125 | videos_to_make.append( 126 | { 127 | "frames_dir": os.path.join( 128 | args.output_dir, "frames_kp_belief_overlay_raw" 129 | ), 130 | "output_path": os.path.join( 131 | args.output_dir, "kp_belief_overlay_raw.mp4" 132 | ), 133 | "frame": [], 134 | } 135 | ) 136 | if do_belief_overlay_raw: 137 | idx_belief_overlay_raw = len(videos_to_make) 138 | needs_belief_maps = True 139 | videos_to_make.append( 140 | { 141 | "frames_dir": os.path.join( 142 | args.output_dir, "frames_belief_overlay_raw" 143 | ), 144 | "output_path": os.path.join(args.output_dir, "belief_overlay_raw.mp4"), 145 | "frame": [], 146 | } 147 | ) 148 | 149 | if len(videos_to_make) == 0: 150 | print("No visualizations have been selected.") 151 | sys.exit(0) 152 | 153 | dream.utilities.makedirs(args.output_dir, exist_ok=args.force_overwrite) 154 | for video in videos_to_make: 155 | if os.path.exists(video["frames_dir"]): 156 | assert args.force_overwrite, 'Frames directory "{}" already exists.'.format( 157 | video["frames_dir"] 158 | ) 159 | shutil.rmtree(video["frames_dir"]) 160 | dream.utilities.makedirs(video["frames_dir"], exist_ok=args.force_overwrite) 161 | 162 | # Create parser 163 | data_parser = YAML(typ="safe") 164 | 165 | with open(input_config_path, "r") as f: 166 | network_config = data_parser.load(f) 167 | 168 | # Overwrite GPU 169 | # If nothing is specified at the command line, None is the default, which uses all GPUs 170 | # TBD - think about a better way of doing this 171 | network_config["training"]["platform"]["gpu_ids"] = args.gpu_ids 172 | 173 | # Load network 174 | dream_network = dream.create_network_from_config_data(network_config) 175 | dream_network.model.load_state_dict(torch.load(args.input_params_path)) 176 | dream_network.enable_evaluation() 177 | 178 | # Use image preprocessing specified by config by default, unless user specifies otherwise 179 | if args.image_preproc_override: 180 | image_preprocessing = args.image_preproc_override 181 | else: 182 | image_preprocessing = dream_network.image_preprocessing() 183 | 184 | if args.keypoint_ids is None or len(args.keypoint_ids) == 0: 185 | idx_keypoints = list(range(dream_network.n_keypoints)) 186 | else: 187 | idx_keypoints = args.keypoint_ids 188 | n_idx_keypoints = len(idx_keypoints) 189 | 190 | sample_results = [] 191 | 192 | dataset_to_viz = dream.utilities.find_ndds_data_in_dir(args.dataset_path) 193 | dataset_file_dict_list = dataset_to_viz[ 194 | 0 195 | ] # list of data file dictionaries; each dictionary indicates the files names for rgb, depth, seg, ... 196 | dataset_meta_dict = dataset_to_viz[1] # dictionary of camera, object files, etc. 
197 | 198 | if dataset_file_dict_list: 199 | 200 | # Downselect based on frame name 201 | if args.start_frame or args.end_frame: 202 | sample_names = [x["name"] for x in dataset_file_dict_list] 203 | start_idx = sample_names.index(args.start_frame) if args.start_frame else 0 204 | end_idx = ( 205 | sample_names.index(args.end_frame) + 1 206 | if args.end_frame 207 | else len(dataset_file_dict_list) 208 | ) 209 | 210 | dataset_to_viz = ( 211 | dataset_file_dict_list[start_idx:end_idx], 212 | dataset_meta_dict, 213 | ) 214 | 215 | image_raw_resolution = dream.utilities.load_image_resolution( 216 | dataset_meta_dict["camera"] 217 | ) 218 | ( 219 | network_input_res_inf, 220 | network_output_res_inf, 221 | ) = dream_network.net_resolutions_from_image_raw_resolution( 222 | image_raw_resolution, image_preprocessing_override=image_preprocessing 223 | ) 224 | 225 | manip_dataset_debug_mode = dream.datasets.ManipulatorNDDSDatasetDebugLevels[ 226 | "LIGHT" 227 | ] 228 | manip_dataset = dream.datasets.ManipulatorNDDSDataset( 229 | dataset_to_viz, 230 | dream_network.manipulator_name, 231 | dream_network.keypoint_names, 232 | network_input_res_inf, 233 | network_output_res_inf, 234 | dream_network.image_normalization, 235 | image_preprocessing, 236 | augment_data=False, 237 | include_ground_truth=not args.no_ground_truth, 238 | debug_mode=manip_dataset_debug_mode, 239 | ) 240 | 241 | # TODO: set batch size and num_workers at command line 242 | batch_size = 8 243 | num_workers = 4 244 | training_data = TorchDataLoader( 245 | manip_dataset, batch_size=batch_size, num_workers=num_workers, shuffle=False 246 | ) 247 | 248 | # Network inference on dataset 249 | with torch.no_grad(): 250 | 251 | for batch_idx, sample in enumerate(tqdm(training_data)): 252 | 253 | this_batch_size = len(sample["config"]["name"]) 254 | 255 | # Conduct inference 256 | network_image_input = sample["image_rgb_input"].cuda() 257 | ( 258 | belief_maps_batch, 259 | detected_kp_projs_netout_batch, 260 | ) = dream_network.inference(network_image_input) 261 | 262 | for b in range(this_batch_size): 263 | 264 | input_image_path = sample["config"]["image_paths"]["rgb"][b] 265 | 266 | if needs_belief_maps: 267 | belief_maps = belief_maps_batch[b] 268 | selected_belief_maps_copy = ( 269 | belief_maps[idx_keypoints, :, :].detach().clone() 270 | ) 271 | else: 272 | selected_belief_maps_copy = [] 273 | 274 | detected_kp_projs_netout = np.array( 275 | detected_kp_projs_netout_batch[b], dtype=float 276 | ) 277 | selected_detected_kp_projs_netout = detected_kp_projs_netout[ 278 | idx_keypoints, : 279 | ] 280 | selected_detected_kp_projs_netin = dream.image_proc.convert_keypoints_to_netin_from_netout( 281 | selected_detected_kp_projs_netout, 282 | network_output_res_inf, 283 | network_input_res_inf, 284 | ) 285 | selected_detected_kp_projs_raw = dream.image_proc.convert_keypoints_to_raw_from_netin( 286 | selected_detected_kp_projs_netin, 287 | network_input_res_inf, 288 | image_raw_resolution, 289 | image_preprocessing, 290 | ) 291 | 292 | if args.no_ground_truth: 293 | selected_gt_kp_projs_raw = [] 294 | selected_gt_kp_projs_netin = [] 295 | else: 296 | selected_gt_kp_projs_raw = np.array( 297 | sample["keypoint_projections_raw"][b][idx_keypoints, :], 298 | dtype=float, 299 | ) 300 | selected_gt_kp_projs_netin = np.array( 301 | sample["keypoint_projections_input"][b][idx_keypoints, :], 302 | dtype=float, 303 | ) 304 | 305 | input_image_raw = PILImage.open(input_image_path).convert("RGB") 306 | image_net_input = dream.image_proc.image_from_tensor( 
307 | sample["image_rgb_input_viz"][b] 308 | ) 309 | 310 | sample_results.append( 311 | ( 312 | input_image_raw, 313 | image_net_input, 314 | selected_belief_maps_copy, 315 | selected_detected_kp_projs_raw, 316 | selected_detected_kp_projs_netin, 317 | selected_gt_kp_projs_raw, 318 | selected_gt_kp_projs_netin, 319 | ) 320 | ) 321 | 322 | else: 323 | # Probably a directory of images - fix this later to avoid code duplication 324 | dirlist = os.listdir(args.dataset_path) 325 | dirlist.sort() 326 | png_image_names = [f for f in dirlist if f.endswith(".png")] 327 | jpg_image_names = [f for f in dirlist if f.endswith(".jpg")] 328 | image_names = ( 329 | png_image_names 330 | if len(png_image_names) > len(jpg_image_names) 331 | else jpg_image_names 332 | ) 333 | 334 | if args.start_frame or args.end_frame: 335 | sample_names = [os.path.splitext(i)[0] for i in image_names] 336 | start_idx = sample_names.index(args.start_frame) if args.start_frame else 0 337 | end_idx = ( 338 | sample_names.index(args.end_frame) + 1 339 | if args.end_frame 340 | else len(sample_names) 341 | ) 342 | 343 | image_names = image_names[start_idx:end_idx] 344 | 345 | # Just use a heuristic to determine the image extension 346 | image_paths = [os.path.join(args.dataset_path, i) for i in image_names] 347 | 348 | for input_image_path in tqdm(image_paths): 349 | 350 | input_image_raw = PILImage.open(input_image_path).convert("RGB") 351 | detection_result = dream_network.keypoints_from_image( 352 | input_image_raw, 353 | image_preprocessing_override=image_preprocessing, 354 | debug=True, 355 | ) 356 | 357 | selected_detected_kps_raw = detection_result["detected_keypoints"][ 358 | idx_keypoints, : 359 | ] 360 | selected_detected_kps_netin = detection_result[ 361 | "detected_keypoints_net_input" 362 | ][idx_keypoints, :] 363 | image_net_input = detection_result["image_rgb_net_input"] 364 | selected_belief_maps = ( 365 | detection_result["belief_maps"][idx_keypoints, :, :] 366 | if needs_belief_maps 367 | else [] 368 | ) 369 | selected_gt_kps_raw = [] 370 | selected_gt_kps_netin = [] 371 | 372 | sample_results.append( 373 | ( 374 | input_image_raw, 375 | image_net_input, 376 | selected_belief_maps, 377 | selected_detected_kps_raw, 378 | selected_detected_kps_netin, 379 | selected_gt_kps_raw, 380 | selected_gt_kps_netin, 381 | ) 382 | ) 383 | 384 | # Iterate through inferred results 385 | idx_this_frame = 1 386 | print("Creating visualizations...") 387 | for ( 388 | image_raw, 389 | input_image, 390 | belief_maps, 391 | detected_kp_projs_raw, 392 | detected_kp_projs_net_input, 393 | gt_kp_projs_raw, 394 | gt_kp_projs_net_input, 395 | ) in tqdm(sample_results): 396 | 397 | show_gt_keypoints = (not args.no_ground_truth) and len(gt_kp_projs_raw) > 0 398 | 399 | image_raw_resolution = image_raw.size 400 | net_input_resolution = input_image.size 401 | 402 | if do_kp_overlay_net_input: 403 | videos_to_make[idx_kp_overlay_net_input]["frame"] = input_image 404 | 405 | if do_kp_overlay_raw: 406 | videos_to_make[idx_kp_overlay_raw]["frame"] = image_raw 407 | 408 | if do_kp_belief_overlay_raw: 409 | flattened_belief_tensor = belief_maps.sum(dim=0) 410 | flattened_belief_image = dream.image_proc.image_from_belief_map( 411 | flattened_belief_tensor, colormap="hot", normalization_method=6 412 | ) 413 | flattened_belief_image_netin = dream.image_proc.convert_image_to_netin_from_netout( 414 | flattened_belief_image, net_input_resolution 415 | ) 416 | flattened_belief_image_raw = dream.image_proc.inverse_preprocess_image( 417 | 
flattened_belief_image_netin, image_raw_resolution, image_preprocessing 418 | ) 419 | videos_to_make[idx_kp_belief_overlay_raw]["frame"] = PILImage.blend( 420 | image_raw, flattened_belief_image_raw, alpha=0.5 421 | ) 422 | 423 | # Previous code here, but the overlays don't look as nice 424 | # Note - this seems pretty slow 425 | # I = np.asarray(flattened_belief_image_raw.convert('L')) 426 | # I_black = I < 20 427 | # mask = PILImage.fromarray(np.uint8(255*I_black)) 428 | # temp = PILImage.composite(image_raw, flattened_belief_image_raw, mask) 429 | # videos_to_make[idx_kp_belief_overlay_raw]['frame'] = PILImage.blend(image_raw, temp, alpha=0.75) 430 | # #PILImage.alpha_composite(flattened_belief_image_raw.convert('RGBA'), image_raw.convert('RGBA')) 431 | 432 | if do_belief_overlay_raw: 433 | flattened_belief_tensor = belief_maps.sum(dim=0) 434 | flattened_belief_image = dream.image_proc.image_from_belief_map( 435 | flattened_belief_tensor, colormap="hot", normalization_method=6 436 | ) 437 | flattened_belief_image_netin = dream.image_proc.convert_image_to_netin_from_netout( 438 | flattened_belief_image, net_input_resolution 439 | ) 440 | flattened_belief_image_raw = dream.image_proc.inverse_preprocess_image( 441 | flattened_belief_image_netin, image_raw_resolution, image_preprocessing 442 | ) 443 | videos_to_make[idx_belief_overlay_raw]["frame"] = PILImage.blend( 444 | image_raw, flattened_belief_image_raw, alpha=0.5 445 | ) 446 | 447 | for n in range(n_idx_keypoints): 448 | detected_kp_proj_raw = detected_kp_projs_raw[n, :] 449 | detected_kp_proj_net_input = detected_kp_projs_net_input[n, :] 450 | 451 | if show_gt_keypoints: 452 | gt_kp_proj_raw = gt_kp_projs_raw[n, :] 453 | gt_kp_proj_net_input = gt_kp_projs_net_input[n, :] 454 | 455 | # Overlay 456 | if do_kp_overlay_net_input: 457 | # Heuristic to make point diameter look good for larger raw resolutions 458 | pt_diameter = ( 459 | 12.0 460 | if image_raw_resolution[0] * image_raw_resolution[1] > 500000 461 | else 6.0 462 | ) 463 | if show_gt_keypoints: 464 | videos_to_make[idx_kp_overlay_net_input][ 465 | "frame" 466 | ] = dream.image_proc.overlay_points_on_image( 467 | videos_to_make[idx_kp_overlay_net_input]["frame"], 468 | [gt_kp_proj_net_input], 469 | annotation_color_dot="green", 470 | annotation_color_text="white", 471 | point_thickness=2, 472 | point_diameter=pt_diameter, 473 | ) 474 | 475 | videos_to_make[idx_kp_overlay_net_input][ 476 | "frame" 477 | ] = dream.image_proc.overlay_points_on_image( 478 | videos_to_make[idx_kp_overlay_net_input]["frame"], 479 | [detected_kp_proj_net_input], 480 | annotation_color_dot="red", 481 | annotation_color_text="white", 482 | point_diameter=pt_diameter, 483 | ) 484 | 485 | if do_kp_overlay_raw: 486 | # Heuristic to make point diameter look good for larger raw resolutions 487 | pt_diameter = ( 488 | 12.0 489 | if image_raw_resolution[0] * image_raw_resolution[1] > 500000 490 | else 6.0 491 | ) 492 | 493 | if show_gt_keypoints: 494 | videos_to_make[idx_kp_overlay_raw][ 495 | "frame" 496 | ] = dream.image_proc.overlay_points_on_image( 497 | videos_to_make[idx_kp_overlay_raw]["frame"], 498 | [gt_kp_proj_raw], 499 | annotation_color_dot="green", 500 | annotation_color_text="white", 501 | point_thickness=2, 502 | point_diameter=pt_diameter + 2, 503 | ) 504 | 505 | videos_to_make[idx_kp_overlay_raw][ 506 | "frame" 507 | ] = dream.image_proc.overlay_points_on_image( 508 | videos_to_make[idx_kp_overlay_raw]["frame"], 509 | [detected_kp_proj_raw], 510 | annotation_color_dot="red", 511 | 
annotation_color_text="white", 512 | point_diameter=pt_diameter, 513 | ) 514 | 515 | if do_kp_belief_overlay_raw: 516 | # Heuristic to make point diameter look good for larger raw resolutions 517 | pt_diameter = ( 518 | 12.0 519 | if image_raw_resolution[0] * image_raw_resolution[1] > 500000 520 | else 6.0 521 | ) 522 | 523 | if show_gt_keypoints: 524 | videos_to_make[idx_kp_belief_overlay_raw][ 525 | "frame" 526 | ] = dream.image_proc.overlay_points_on_image( 527 | videos_to_make[idx_kp_belief_overlay_raw]["frame"], 528 | [gt_kp_proj_raw], 529 | annotation_color_dot="green", 530 | annotation_color_text="white", 531 | point_thickness=2, 532 | point_diameter=pt_diameter + 2, 533 | ) 534 | 535 | videos_to_make[idx_kp_belief_overlay_raw][ 536 | "frame" 537 | ] = dream.image_proc.overlay_points_on_image( 538 | videos_to_make[idx_kp_belief_overlay_raw]["frame"], 539 | [detected_kp_proj_raw], 540 | annotation_color_dot="red", 541 | annotation_color_text="white", 542 | point_diameter=pt_diameter, 543 | ) 544 | 545 | frame_output_filename = str(idx_this_frame).zfill(6) + ".png" 546 | for video in videos_to_make: 547 | video["frame"].save( 548 | os.path.join(video["frames_dir"], frame_output_filename) 549 | ) 550 | 551 | idx_this_frame += 1 552 | 553 | # Call to ffmpeg 554 | for video in videos_to_make: 555 | video_from_frames(video["frames_dir"], video["output_path"], args.framerate) 556 | shutil.rmtree(video["frames_dir"]) 557 | 558 | 559 | if __name__ == "__main__": 560 | parser = argparse.ArgumentParser( 561 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 562 | ) 563 | parser.add_argument( 564 | "-i", 565 | "--input-params-path", 566 | required=True, 567 | help="Path to network parameters file.", 568 | ) 569 | parser.add_argument( 570 | "-c", 571 | "--input-config-path", 572 | default=None, 573 | help="Path to network configuration file. If nothing is specified, the script will search for a config file by the same name as the network parameters file.", 574 | ) 575 | parser.add_argument( 576 | "-d", "--dataset-path", required=True, help="Path to dataset to visualize." 577 | ) 578 | parser.add_argument( 579 | "-o", 580 | "--output-dir", 581 | required=True, 582 | help="Path to output directory to save visualization.", 583 | ) 584 | parser.add_argument( 585 | "-f", 586 | "--force-overwrite", 587 | action="store_true", 588 | default=False, 589 | help="Forces overwriting of analysis results in the provided directory.", 590 | ) 591 | parser.add_argument( 592 | "-k", 593 | "--keypoint-ids", 594 | nargs="+", 595 | type=int, 596 | default=None, 597 | help="List of keypoint indices to use in the visualization. No input (None) means all keypoints will be used.", 598 | ) 599 | parser.add_argument( 600 | "-not-gt", 601 | "--no-ground-truth", 602 | action="store_true", 603 | default=False, 604 | help="Do not overlay ground truth keypoints when the dataset has ground truth.", 605 | ) 606 | parser.add_argument( 607 | "-v", 608 | "--visualization-types", 609 | nargs="+", 610 | choices=[ 611 | KP_OVERLAY_RAW, 612 | KP_OVERLAY_NET_INPUT, 613 | KP_BELIEF_OVERLAY_RAW, 614 | BELIEF_OVERLAY_RAW, 615 | ], 616 | default=[ 617 | KP_OVERLAY_RAW, 618 | KP_OVERLAY_NET_INPUT, 619 | KP_BELIEF_OVERLAY_RAW, 620 | BELIEF_OVERLAY_RAW, 621 | ], 622 | help="Specify the visulizations to generate: 'kp_raw' overlays keypoints onto the original input image. 'kp_net_input' overlays keypoints onto the input image to the network. 
Multiple analyses can be specified.", 623 | ) 624 | parser.add_argument( 625 | "-fps", 626 | "--framerate", 627 | type=float, 628 | default=30.0, 629 | help="Framerate (frames/sec) for the video that is created from the dataset.", 630 | ) 631 | parser.add_argument( 632 | "-s", 633 | "--start-frame", 634 | default=None, 635 | help='Start frame of the dataset to visualize. Use the name, not the file extension (e.g., "-s 001022", not "-s 001022.png").', 636 | ) 637 | parser.add_argument( 638 | "-e", 639 | "--end-frame", 640 | default=None, 641 | help='End frame of the dataset to visualize. Use the name, not the file extension (e.g., "-e 001022", not "-e 001022.png").', 642 | ) 643 | parser.add_argument( 644 | "-g", 645 | "--gpu-ids", 646 | nargs="+", 647 | type=int, 648 | default=None, 649 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 650 | ) 651 | parser.add_argument( 652 | "-p", 653 | "--image-preproc-override", 654 | default=None, 655 | help="Overrides the image preprocessing specified by the network. (Debug argument.)", 656 | ) 657 | args = parser.parse_args() 658 | visualize_network_inference(args) 659 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import io 6 | import os 7 | import re 8 | from setuptools import setup, find_packages 9 | 10 | # This method was adapted from code in 11 | # https://github.com/albumentations-team/albumentations 12 | def get_version(): 13 | current_dir = os.path.abspath(os.path.dirname(__file__)) 14 | version_file = os.path.join(current_dir, "dream", "__init__.py") 15 | with io.open(version_file, encoding="utf-8") as f: 16 | return re.search(r'^__version__ = [\'"]([^\'"]*)[\'"]', f.read(), re.M).group(1) 17 | 18 | setup( 19 | name='dream', 20 | version=get_version(), 21 | author='NVIDIA', 22 | author_email='sbirchfield@nvidia.com', 23 | maintainer='Timothy Lee', 24 | maintainer_email='timothyelee@cmu.edu', 25 | description='Deep Robot-to-camera Extrinsics for Articulated Manipulators', 26 | packages=['dream'], 27 | package_dir={'dream': 'dream'}, 28 | zip_safe=False 29 | ) 30 | -------------------------------------------------------------------------------- /test/_camera_settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "camera_settings": [ 3 | { 4 | "id": "", 5 | "name": "", 6 | "intrinsic_settings": 7 | { 8 | "fx": 160, 9 | "fy": 160, 10 | "cx": 160, 11 | "cy": 120, 12 | "s": 0, 13 | "hfov": 90, 14 | "resolution": 15 | { 16 | "width": 320, 17 | "height": 240 18 | } 19 | }, 20 | "captured_image_size": 21 | { 22 | "width": 320, 23 | "height": 240 24 | } 25 | } 26 | ] 27 | } -------------------------------------------------------------------------------- /test/test_image_proc.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import numpy as np 6 | import torch 7 | 8 | # Needed to prevent pytest from raising a warning when importing torchvision via dream 9 | import warnings 10 | 11 | with warnings.catch_warnings(): 12 | warnings.filterwarnings("ignore", category=DeprecationWarning) 13 | import imp 14 | 15 | from dream import image_proc 16 | 17 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 18 | 19 | 20 | def test_shrink_resolution(): 21 | 22 | # Testing that (640 x 480) shrinks to (533 x 400) using a (400, 400) reference 23 | input_resolution = (640, 480) 24 | ref_resolution = (400, 400) 25 | shrink_res = image_proc.shrink_resolution(input_resolution, ref_resolution) 26 | shrink_res_gt = (533, 400) 27 | assert shrink_res == shrink_res_gt 28 | 29 | # Testing that shrink is a no-op if the resolution is the same 30 | ref_resolution_same = input_resolution 31 | shrink_res_same = image_proc.shrink_resolution( 32 | input_resolution, ref_resolution_same 33 | ) 34 | assert shrink_res_same == input_resolution 35 | 36 | 37 | def test_shrink_and_crop_resolution(): 38 | 39 | # Testing with (640 x 480) input with (400, 400) reference 40 | input_resolution = (640, 480) 41 | ref_resolution = (400, 400) 42 | input_cropped_res, cropped_coords = image_proc.shrink_and_crop_resolution( 43 | input_resolution, ref_resolution 44 | ) 45 | 46 | input_cropped_res_gt = (480, 480) 47 | cropped_coords_gt = (80, 0) 48 | 49 | assert input_cropped_res == input_cropped_res_gt 50 | assert cropped_coords == cropped_coords_gt 51 | 52 | # Testing that shrink-and-crop is a no-op if the resolution is the same 53 | ref_resolution_same = input_resolution 54 | input_cropped_res_same, cropped_coords_same = image_proc.shrink_and_crop_resolution( 55 | input_resolution, ref_resolution_same 56 | ) 57 | cropped_coords_same_gt = (0, 0) 58 | 59 | assert input_cropped_res_same == input_resolution 60 | assert cropped_coords_same == cropped_coords_same_gt 61 | 62 | 63 | def test_resolution_after_preprocessing(): 64 | 65 | # Testing with (640 x 480) input with (400, 400) reference 66 | input_resolution = (640, 480) 67 | ref_resolution = (400, 400) 68 | 69 | preproc_res_none = image_proc.resolution_after_preprocessing( 70 | input_resolution, ref_resolution, "none" 71 | ) 72 | preproc_res_none_gt = input_resolution 73 | assert preproc_res_none == preproc_res_none_gt 74 | 75 | preproc_res_resize = image_proc.resolution_after_preprocessing( 76 | input_resolution, ref_resolution, "resize" 77 | ) 78 | preproc_res_resize_gt = ref_resolution 79 | assert preproc_res_resize == preproc_res_resize_gt 80 | 81 | preproc_res_shrink = image_proc.resolution_after_preprocessing( 82 | input_resolution, ref_resolution, "shrink" 83 | ) 84 | preproc_res_shrink_gt = (533, 400) 85 | assert preproc_res_shrink == preproc_res_shrink_gt 86 | 87 | preproc_res_snc = image_proc.resolution_after_preprocessing( 88 | input_resolution, ref_resolution, "shrink-and-crop" 89 | ) 90 | preproc_res_snc_gt = ref_resolution 91 | assert preproc_res_snc == preproc_res_snc_gt 92 | 93 | 94 | def test_belief_maps(): 95 | 96 | # Generate belief maps 97 | belief_map_resolution = (80, 60) 98 | kp_proj = np.array([65.0, 20.0]) 99 | kp_proj_out_of_frame = np.array( 100 | [belief_map_resolution[0] + 20.0, belief_map_resolution[1] + 20.0] 101 | ) 102 | belief_maps = image_proc.create_belief_map( 103 | belief_map_resolution, [kp_proj, kp_proj_out_of_frame] 104 | ) 105 | 106 | belief_maps_as_tensor = torch.tensor(belief_maps).float() 
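# The belief maps are converted to a float torch tensor above so that the peak extraction further below can be run on them.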
107 | # Uncomment below for showing visuals 108 | # image_proc.image_from_belief_map(belief_maps_as_tensor[0]).show() 109 | # image_proc.image_from_belief_map(belief_maps_as_tensor[1]).show() 110 | 111 | # Detect keypoints in belief maps 112 | all_peaks = image_proc.peaks_from_belief_maps(belief_maps_as_tensor, 0.0) 113 | 114 | # First belief map should only contain one detected keypoint, the ground truth 115 | assert len(all_peaks[0]) == 1 116 | kp_proj_detected = np.array(all_peaks[0][0][:2]) 117 | assert np.linalg.norm(kp_proj - kp_proj_detected) < 1.0e-3 118 | 119 | # Second belief map, for out-of-frame keypoint, should return no peaks 120 | assert len(all_peaks[1]) == 0 121 | -------------------------------------------------------------------------------- /test/test_utilities.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import os 6 | 7 | import numpy as np 8 | 9 | # Needed to prevent pytest from raising a warning when importing torchvision via dream 10 | import warnings 11 | 12 | with warnings.catch_warnings(): 13 | warnings.filterwarnings("ignore", category=DeprecationWarning) 14 | import imp 15 | 16 | from dream import utilities as dream_utils 17 | 18 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 19 | 20 | 21 | def test_load_camera_intrinsics(): 22 | 23 | # Provide the camera setting json that exists in this directory 24 | this_script_dir = os.path.dirname(os.path.realpath(__file__)) 25 | camera_data_path = os.path.join(this_script_dir, "_camera_settings.json") 26 | camera_K = dream_utils.load_camera_intrinsics(camera_data_path) 27 | 28 | # Construct the ground truth camera matrix 29 | cam_fx = 160.0 30 | cam_fy = 160.0 31 | cam_cx = 160.0 32 | cam_cy = 120.0 33 | 34 | camera_K_gt = np.array( 35 | [[cam_fx, 0.0, cam_cx], [0.0, cam_fy, cam_cy], [0.0, 0.0, 1.0]] 36 | ) 37 | 38 | # Test assertion 39 | assert np.all(camera_K == camera_K_gt) 40 | 41 | 42 | def test_load_image_resolution(): 43 | 44 | # Provide the camera setting json that exists in this directory 45 | this_script_dir = os.path.dirname(os.path.realpath(__file__)) 46 | camera_data_path = os.path.join(this_script_dir, "_camera_settings.json") 47 | image_resolution = dream_utils.load_image_resolution(camera_data_path) 48 | 49 | # Construct the ground truth image resolution 50 | image_resolution_gt = (320, 240) 51 | 52 | # Test assertion 53 | assert image_resolution == image_resolution_gt 54 | -------------------------------------------------------------------------------- /trained_models/DOWNLOAD.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # This script makes it easy to download pretrained model weights. 4 | # (Files are at https://drive.google.com/open?id=1Krp-fCT9ffEML3IpweSOgWiMHHBw6k2Z) 5 | 6 | # Each robot contains two files (.pth and .yaml). 7 | # Comment out any model you do not want. 
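# Note: the downloads below use the gdown tool (install with: pip install gdown).
# If a download fails, the files can also be fetched manually from the Google Drive folder linked above.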
8 | 9 | #----- Rethink Robotics' Baxter 10 | 11 | # Baxter VGG-Q (baxter_dream_vgg_q) 12 | gdown --id 1Ia4UxSdilXH9SwyPqem0rS13Mha9pN7F # .pth (85 MB) 13 | gdown --id 1TNhYuOm_-UH5z1rEVm16mnRA7hB7AT1X # .yaml 14 | 15 | 16 | #----- Kuka LBR iiwa 7 R800 17 | 18 | # Kuka ResNet-H (kuka_dream_resnet_h) 19 | gdown --id 1Ctoh01q1IvLHP9pf5Os8eIzJ8fQBgYpJ # .pth (207 MB) 20 | gdown --id 1MLWDTq7yQF9UeV1T3REDk60GYne32OXJ # .yaml 21 | 22 | 23 | #----- Franka Emika Panda 24 | 25 | # Panda VGG-Q (panda_dream_vgg_q) -- recommended 26 | gdown --id 1zS-kQ73dOYMXS8Wku_OUN0q7MvEUm2fZ # .pth (85 MB) 27 | gdown --id 1MKDiknxDzXErd4Gwdv0uMoL65IYjxO0Q # .yaml 28 | 29 | # Panda VGG-F (panda_dream_vgg_f) 30 | gdown --id 1pz-gXux8TxB4pOYnYy5DH7vp-3-mTJFu # .pth (86 MB) 31 | gdown --id 191Pgu_C0qzKpOSoicOOSLq-bR7cg2KVO # .yaml 32 | 33 | # Panda ResNet-H (panda_dream_resnet_h) 34 | gdown --id 16fyv6ps3om0H8dnXRDHj0w4dfEKPSpDW # .pth (207 MB) 35 | gdown --id 1gCpigRIqm1rAw-o7oXpRO2ZTQkHyYF-k # .yaml 36 | 37 | # Panda ResNet-F (panda_dream_resnet_f) -- difficult to train 38 | gdown --id 1d8UfrgQb4ohIAfpRGvDBjabSKuP9LCpy # .pth (211 MB) 39 | gdown --id 1IWdXSmmIq2-eimtNK_ywJZRH4omesSDq # .yaml 40 | --------------------------------------------------------------------------------