├── .gitignore ├── LICENSE.md ├── README.md ├── arch_configs ├── dream_resnet_f.yaml ├── dream_resnet_h.yaml ├── dream_vgg_f.yaml └── dream_vgg_q.yaml ├── data └── DOWNLOAD.sh ├── docker ├── Dockerfile.kinetic ├── README.md ├── init_workspace.sh └── run_dream_docker.sh ├── dream-franka.png ├── dream ├── .gitignore ├── __init__.py ├── add_plots.py ├── analysis.py ├── datasets.py ├── geometric_vision.py ├── image_proc.py ├── models.py ├── network.py ├── oks_plots.py ├── spatial_softmax.py └── utilities.py ├── manip_configs ├── baxter.yaml ├── kuka.yaml └── panda.yaml ├── requirements.txt ├── scripts ├── analyze_training.py ├── analyze_training_multi.py ├── launch_dream_ros.py ├── network_inference.py ├── network_inference_dataset.py ├── train_network.py ├── train_network_multi.py └── visualize_network_inference.py ├── setup.py ├── test ├── _camera_settings.json ├── test_image_proc.py └── test_utilities.py └── trained_models └── DOWNLOAD.sh /.gitignore: -------------------------------------------------------------------------------- 1 | analyze_results/*.png 2 | __pycache__/ 3 | training_results/ 4 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | # NVIDIA Source Code License-NC 2 | 3 | ## 1. Definitions 4 | 5 | “Licensor” means any person or entity that distributes its Work. 6 | 7 | “Software” means the original work of authorship made available under this License. 8 | “Work” means the Software and any additions to or derivative works of the Software that are made available under this License. 9 | 10 | “Nvidia Processors” means any central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC) or any combination thereof designed, made, sold, or provided by Nvidia or its affiliates. 11 | 12 | The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S. copyright law; provided, however, that for the purposes of this License, derivative works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work. 13 | 14 | Works, including the Software, are “made available” under this License by including in or with the Work either (a) a copyright notice referencing the applicability of this License to the Work, or (b) a copy of this License. 15 | 16 | ## 2. License Grants 17 | 18 | ### 2.1 Copyright Grant. 19 | Subject to the terms and conditions of this License, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form. 20 | 21 | ## 3. Limitations 22 | 23 | ### 3.1 Redistribution. 24 | You may reproduce or distribute the Work only if (a) you do so under this License, (b) you include a complete copy of this License with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work. 25 | 26 | ### 3.2 Derivative Works. 
27 | You may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding Your Terms, this License (including the redistribution requirements in Section 3.1) will continue to apply to the Work itself. 28 | 29 | ### 3.3 Use Limitation. 30 | The Work and any derivative works thereof only may be used or intended for use non-commercially. The Work or derivative works thereof may be used or intended for use by Nvidia or its affiliates commercially or non-commercially. As used herein, “non-commercially” means for research or evaluation purposes only. 31 | 32 | ### 3.4 Patent Claims. 33 | If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this License from such Licensor (including the grants in Sections 2.1 and 2.2) will terminate immediately. 34 | 35 | ### 3.5 Trademarks. 36 | This License does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks, except as necessary to reproduce the notices described in this License. 37 | 38 | ### 3.6 Termination. 39 | If you violate any term of this License, then your rights under this License (including the grants in Sections 2.1 and 2.2) will terminate immediately. 40 | 41 | ## 4. Disclaimer of Warranty. 42 | 43 | THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE. 44 | 45 | ## 5. Limitation of Liability. 46 | 47 | EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 48 | 49 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DREAM: Deep Robot-to-Camera Extrinsics for Articulated Manipulators 2 | 3 | This is the official implementation of ["Camera-to-robot pose estimation from a single image"](https://arxiv.org/abs/1911.09231) (ICRA 2020). The DREAM system uses a robot-specific deep neural network to detect keypoints (typically joint locations) in the RGB image of a robot manipulator. Using these keypoint locations along with the robot forward kinematics, the camera pose with respect to the 4 | robot is estimated using a perspective-n-point (PnP) algorithm. For more details, please see our [paper](https://arxiv.org/abs/1911.09231) and [video](https://youtu.be/O1qAFboFQ8A). 5 | 6 | ![DREAM in operation](dream-franka.png)
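The PnP step mentioned above corresponds to `solve_pnp` (and `solve_pnp_ransac`) in `dream/geometric_vision.py`, which wraps OpenCV's solver (EPnP followed by an iterative refinement). Below is a minimal sketch of calling it directly; the keypoint positions, detections, and camera intrinsics are made-up placeholder values, not network outputs:

```
import numpy as np
import dream

# Hypothetical 3D keypoint positions in the robot frame (from forward kinematics)
# and their detected 2D projections in the image (placeholder values only).
kp_positions_wrt_robot = [[0.0, 0.0, 0.05], [0.10, 0.0, 0.35], [0.15, 0.05, 0.60], [0.30, 0.05, 0.80]]
kp_detections_px = [[320.5, 241.2], [355.1, 198.7], [371.9, 150.3], [402.4, 118.8]]

# Placeholder pinhole intrinsics for a 640 x 480 image.
camera_K = np.array([[615.0, 0.0, 320.0],
                     [0.0, 615.0, 240.0],
                     [0.0, 0.0, 1.0]])

# Returns a success flag, a translation, and a rotation (pyrr Quaternion, XYZW order);
# the pose maps robot-frame points into the camera frame (OpenCV solvePnP convention).
retval, translation, quaternion = dream.geometric_vision.solve_pnp(
    kp_positions_wrt_robot, kp_detections_px, camera_K
)
if retval:
    print("translation:", translation)
    print("rotation (quaternion):", quaternion)
```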
7 | 8 | 9 | ## **Installation** 10 | 11 | We have tested on Ubuntu 16.04 and 18.04 with an NVIDIA GeForce RTX 2080 and Titan X, with both Python 2.7 and Python 3.6. The code may work on other systems. 12 | 13 | Install the DREAM package and its dependencies using `pip`: 14 | 15 | ``` 16 | pip install . -r requirements.txt 17 | ``` 18 | 19 | Download the pre-trained models and (optionally) data. In the scripts below, be sure to comment out files you do not want, as they are very large. Alternatively, you can download the files [manually](https://drive.google.com/drive/folders/1Krp-fCT9ffEML3IpweSOgWiMHHBw6k2Z?usp=sharing). 20 | ``` 21 | cd trained_models; ./DOWNLOAD.sh; cd .. 22 | cd data; ./DOWNLOAD.sh; cd .. 23 | ``` 24 | 25 | Unit tests are implemented in the `pytest` framework. Verify your installation by running them: `pytest test/` 26 | 27 | 28 | ## **Offline inference** 29 | 30 | There are three scripts for offline inference: 31 | 32 | ``` 33 | # Run the network on a single image to display detected 2D keypoints. 34 | python scripts/network_inference.py -i <path/to/network.pth> -m <path/to/image> 35 | 36 | # Process a dataset to save both 2D keypoints and 3D poses. 37 | python scripts/network_inference_dataset.py -i <path/to/network.pth> -d <path/to/dataset> -o <path/to/output_dir> -b 16 -w 8 38 | 39 | # Run the network on an image sequence 40 | # (either a dataset or a directory of images, e.g., from a video), 41 | # and save the resulting visualizations as videos. 42 | python scripts/visualize_network_inference.py -i <path/to/network.pth> -d <path/to/dataset> -o <path/to/output_dir> -s <start_frame> -e <end_frame> 43 | ``` 44 | 45 | Pass `-h` for help on command line arguments. Datasets are assumed to be in [NDDS](https://github.com/NVIDIA/Dataset_Synthesizer) format. 46 | 47 | 48 | #### **Example for single-image inference** 49 | 50 | Single-image inference from one frame of the Panda-3Cam RealSense dataset using the DREAM-vgg-Q network: 51 | 52 | ``` 53 | python scripts/network_inference.py -i trained_models/panda_dream_vgg_q.pth -m data/real/panda-3cam_realsense/000000.rgb.jpg 54 | ``` 55 | 56 | You should see the detected keypoints printed to the screen as well as overlaid on the Panda robot. (See the note below regarding the Panda keypoint locations.) 57 | 58 | 59 | #### **Example for dataset inference** 60 | 61 | Inference on the Panda-3Cam RealSense dataset using the DREAM-vgg-Q network: 62 | 63 | ``` 64 | python scripts/network_inference_dataset.py -i trained_models/panda_dream_vgg_q.pth -d data/real/panda-3cam_realsense/ -o <path/to/output_dir> -b 16 -w 8 65 | ``` 66 | 67 | The analysis will print to both the screen and a file. You should see that the area under the curve (AUC) for the percentage of correct keypoints (PCK) is about 0.720, and the AUC for the average distance (ADD) metric is about 0.792. Various visualizations will also be saved to disk. 68 |
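The ADD AUC reported here is computed (see `dream/add_plots.py` and `pnp_metrics` in `dream/analysis.py`) by sweeping a distance threshold from 0 to 0.1 m, measuring the fraction of frames whose ADD falls under each threshold, and normalizing the area under the resulting accuracy curve; frames for which PnP found no solution still count against the denominator. A minimal sketch of that calculation, using made-up ADD values:

```
import numpy as np

# Hypothetical per-frame ADD values in meters (placeholder data, not real results).
add_per_frame = np.array([0.012, 0.025, 0.003, 0.047, 0.110, 0.008])

# Accuracy-vs-threshold curve, swept from 0 to 0.1 m as in dream/add_plots.py.
threshold = 0.1
dx = 0.00001
thresholds = np.arange(0.0, threshold, dx)
accuracy = [np.mean(add_per_frame <= t) for t in thresholds]

# Normalized area under the curve: 1.0 would mean zero error on every frame.
add_auc = np.trapz(accuracy, dx=dx) / threshold
print("ADD AUC: {:.3f}".format(add_auc))
```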
69 | #### **Example for generating inference visualizations** 70 | 71 | Generating visualizations for one of the sequences in the Panda-3Cam RealSense dataset using the DREAM-vgg-Q network: 72 | 73 | ``` 74 | python scripts/visualize_network_inference.py -i trained_models/panda_dream_vgg_q.pth -d data/real/panda-3cam_realsense -o <path/to/output_dir> -fps 120.0 -s 004151 75 | ``` 76 | This creates videos at 4x the normal camera framerate. 77 | 78 | ## **Online inference using ROS** 79 | 80 | A ROS node is provided for real-time camera pose estimation. 81 | Some values, such as ROS topic names, may 82 | need to be changed for your application. Because of incompatibilities between 83 | ROS (before Noetic) and Python 3, the DREAM ROS node is implemented using Python 2.7. For ease 84 | of use, we have provided a [Docker setup](docker/) containing all the necessary 85 | components to run DREAM with ROS Kinetic. 86 | 87 | Example to run the DREAM ROS node (in verbose mode): 88 | 89 | ``` 90 | python scripts/launch_dream_ros.py -i trained_models/baxter_dream_vgg_q.pth -b torso -v 91 | ``` 92 | 93 | 94 | ## **Training** 95 | 96 | Below is an example for training a DREAM-vgg-Q model for the Franka Emika Panda robot: 97 | 98 | ``` 99 | python scripts/train_network.py -i data/synthetic/panda_synth_train_dr/ -t 0.8 -m manip_configs/panda.yaml -ar arch_configs/dream_vgg_q.yaml -e 25 -lr 0.00015 -b 128 -w 16 -o <path/to/output_dir> 100 | ``` 101 | 102 | The models below are defined in the following architecture files: 103 | - DREAM-vgg-Q: `arch_configs/dream_vgg_q.yaml` 104 | - DREAM-vgg-F: `arch_configs/dream_vgg_f.yaml` 105 | - DREAM-resnet-H: `arch_configs/dream_resnet_h.yaml` 106 | - DREAM-resnet-F: `arch_configs/dream_resnet_f.yaml` (very large network and unwieldy to train) 107 | 108 | ## **Note on Panda keypoints** 109 | 110 | By default, keypoints are defined at the joint locations as specified by the robot URDF file. In the case of the Panda robot, the URDF file defines the joints at non-intuitive locations. As a result, visualizations of keypoint detections may appear to be wrong when they are in fact correct (see [our video](https://youtu.be/O1qAFboFQ8A)). We have since modified the URDF to place the keypoints at the actual joint locations (see Fig. 5a of our paper), but for simplicity we are not releasing the modified URDF at this time. 111 | 112 | 113 | ## **Note on reproducing results** 114 | 115 | The experiments in the paper used the image preprocessing type `shrink-and-crop`, which preserves the aspect ratio of the input image but crops the width so that a 400 x 400 image (the resolution used during training) is sent to the network. To allow for full-frame inference, the released models use the default image preprocessing type `resize`, which avoids this cropping. Careful analysis has shown almost no difference in quantitative results, but if you are looking to reproduce our ICRA results exactly, please change the `architecture/image_preprocessing` value to `shrink-and-crop`. 116 | 117 | The PCK and ADD plots in the paper are generated from `oks_plots.py` and `add_plots.py`. The AUC values in these figures (and in Table 1) are in the `analysis_results.txt` file that is produced by `scripts/network_inference_dataset.py`. 118 | 119 | # Further information 120 | 121 | For an example of how DREAM can be used in practice for vision-based object manipulation, please refer to ["Indirect Object-to-Robot Pose Estimation from an External Monocular RGB Camera"](https://research.nvidia.com/publication/2020-07_Indirect-Object-Pose) by Jonathan Tremblay, Stephen Tyree, Terry Mosier, and Stan Birchfield. 122 | 123 | For more information on how DREAM learns robot keypoint detection from sim-to-real transfer using only synthetic domain-randomized images, please refer to our [sim2realAI blog post](https://sim2realai.github.io/dream-camera-calibration-sim2real/). 124 | 125 | 126 | # License 127 | 128 | DREAM is licensed under the [NVIDIA Source Code License - Non-commercial](LICENSE.md). 129 | 130 | 131 | # Citation 132 | 133 | Please cite our work if you use it for your research. Thank you!
134 | 135 | ``` 136 | @inproceedings{lee2020icra:dream, 137 | title={Camera-to-Robot Pose Estimation from a Single Image}, 138 | author={Lee, Timothy E and Tremblay, Jonathan and To, Thang and Cheng, Jia and Mosier, Terry and Kroemer, Oliver and Fox, Dieter and Birchfield, Stan}, 139 | booktitle={International Conference on Robotics and Automation (ICRA)}, 140 | year=2020, 141 | url={https://arxiv.org/abs/1911.09231} 142 | } 143 | ``` 144 | 145 | # Acknowledgment 146 | Thanks to Jeffrey Smith (jeffreys@nvidia.com) for assistance in preparing this release. 147 | -------------------------------------------------------------------------------- /arch_configs/dream_resnet_f.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - architecture: !!omap 3 | - type: resnet 4 | - target: belief_maps 5 | - full_decoder: true 6 | - input_heads: 7 | - image_rgb 8 | - output_heads: 9 | - belief_maps 10 | - image_normalization: !!omap 11 | - mean: [0.5, 0.5, 0.5] 12 | - stdev: [0.5, 0.5, 0.5] 13 | - loss: !!omap 14 | - type: mse 15 | - training: !!omap 16 | - config: !!omap 17 | - image_preprocessing: shrink-and-crop 18 | - net_input_resolution: [400, 400] 19 | -------------------------------------------------------------------------------- /arch_configs/dream_resnet_h.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - architecture: !!omap 3 | - type: resnet 4 | - target: belief_maps 5 | - input_heads: 6 | - image_rgb 7 | - output_heads: 8 | - belief_maps 9 | - image_normalization: !!omap 10 | - mean: [0.5, 0.5, 0.5] 11 | - stdev: [0.5, 0.5, 0.5] 12 | - loss: !!omap 13 | - type: mse 14 | - training: !!omap 15 | - config: !!omap 16 | - image_preprocessing: shrink-and-crop 17 | - net_input_resolution: [400, 400] 18 | -------------------------------------------------------------------------------- /arch_configs/dream_vgg_f.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - architecture: !!omap 3 | - type: vgg 4 | - target: belief_maps 5 | - deconv_decoder: true 6 | - input_heads: 7 | - image_rgb 8 | - output_heads: 9 | - belief_maps 10 | - image_normalization: !!omap 11 | - mean: [0.5, 0.5, 0.5] 12 | - stdev: [0.5, 0.5, 0.5] 13 | - loss: !!omap 14 | - type: mse 15 | - training: !!omap 16 | - config: !!omap 17 | - image_preprocessing: shrink-and-crop 18 | - net_input_resolution: [400, 400] 19 | -------------------------------------------------------------------------------- /arch_configs/dream_vgg_q.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - architecture: !!omap 3 | - type: vgg 4 | - target: belief_maps 5 | - input_heads: 6 | - image_rgb 7 | - output_heads: 8 | - belief_maps 9 | - image_normalization: !!omap 10 | - mean: [0.5, 0.5, 0.5] 11 | - stdev: [0.5, 0.5, 0.5] 12 | - loss: !!omap 13 | - type: mse 14 | - training: !!omap 15 | - config: !!omap 16 | - image_preprocessing: shrink-and-crop 17 | - net_input_resolution: [400, 400] 18 | -------------------------------------------------------------------------------- /data/DOWNLOAD.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # This script makes it easy to download real/synthetic images with annotations. 4 | # (Files are at https://drive.google.com/open?id=1Krp-fCT9ffEML3IpweSOgWiMHHBw6k2Z) 5 | 6 | # Comment out any data you do not want. 7 | 8 | echo 'Warning: Files are *very* large. 
Be sure to comment out any files you do not want.' 9 | 10 | 11 | #----- real test data ----------------------------------- 12 | 13 | mkdir -p real 14 | cd real 15 | 16 | gdown --id 10Tpx8jAfzP6g44WXfvjlVywbIlxZ4BRx # Panda-3Cam, Azure (368 MB) 17 | gdown --id 14TJ9o9QOdb25zlZ3onsOJlSb7-tGrvKz # Panda-3Cam, Kinect360 (295 MB) 18 | gdown --id 1FFAFpJFwzsjD83S9-Y1ODwDWiWlh1X6P # Panda-3Cam, RealSense (343 MB) 19 | gdown --id 1kL7Goibx4lwKQoO-UQ4gm94f_XdEKTUZ # Panda-Orb, RealSense (2 GB) 20 | 21 | cd .. 22 | 23 | 24 | #----- synthetic training / validation data ----------------------------------- 25 | 26 | mkdir -p synthetic 27 | cd synthetic 28 | 29 | #----- Rethink Robotics' Baxter 30 | gdown --id 1MSRwQpg690RvuvtjNuGYA1ILGipX16dW # test DR (541 MB) 31 | gdown --id 1SzUPYmNxe1OsbGyWdpdkoRjWJurs-NAF # train (9 GB) 32 | 33 | #----- Kuka LBR iiwa 7 R800 34 | gdown --id 1kGvSlVScmMohZStS-_NfCpCa5SBAcx_i # test DR (445 MB) 35 | gdown --id 1ChF4jAGMPbPwe2dOZYPJ2t2rCSR0Xw9R # test non-DR (222 MB) 36 | gdown --id 1HTW3YEGDO22zOT56jFWxfizznw4aGMpU # train (8 GB) 37 | 38 | #----- Franka Emika Panda 39 | gdown --id 1tOzfzlRhUbRO-QkzPHvj-IOPFuSIts-R # test DR (450 MB) 40 | gdown --id 11pK1BqfQkzVnTjyQHVRZ6ZkX4oyxbEQP # test non-DR (190 MB) 41 | gdown --id 1ZXzseMa7aMIKxK4BNH2gacmm3_XGJvxm # train (8 GB) 42 | 43 | cd .. 44 | -------------------------------------------------------------------------------- /docker/Dockerfile.kinetic: -------------------------------------------------------------------------------- 1 | FROM nvidia/cudagl:10.0-devel-ubuntu16.04 2 | 3 | # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. 4 | # Full license terms provided in LICENSE.md file. 5 | 6 | # Build with: 7 | # docker build -t nvidia-dream:kinetic-v1 -f Dockerfile.kinetic .. 
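# (The run_dream_docker.sh script in this directory launches a container from this image and expects the tag nvidia-dream:kinetic-v1; see docker/README.md.)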
8 | 9 | ENV HOME /root 10 | 11 | WORKDIR ${HOME} 12 | 13 | RUN apt-get update && apt-get -y --no-install-recommends install software-properties-common 14 | 15 | # install packages required for setup and installing ROS 16 | RUN apt-get update && apt-get install -q -y \ 17 | dirmngr \ 18 | gnupg2 \ 19 | wget \ 20 | python-tk \ 21 | && rm -rf /var/lib/apt/lists/* 22 | 23 | # install python 24 | RUN add-apt-repository ppa:deadsnakes/ppa \ 25 | && apt-get update \ 26 | && apt-get -y --no-install-recommends install python libpython-dev \ 27 | && wget https://bootstrap.pypa.io/get-pip.py \ 28 | && python2.7 get-pip.py 29 | 30 | # Install DREAM requirements 31 | COPY requirements.txt ${HOME} 32 | RUN python2 -m pip install --no-cache-dir -r requirements.txt 33 | 34 | # set up keys 35 | RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 \ 36 | --recv-keys C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654 37 | 38 | # setup sources.list 39 | RUN echo "deb http://packages.ros.org/ros/ubuntu xenial main" > /etc/apt/sources.list.d/ros1-latest.list 40 | 41 | # install bootstrap tools 42 | RUN apt-get update && apt-get install --no-install-recommends -y \ 43 | python-rosdep \ 44 | python-rosinstall \ 45 | python-vcstools \ 46 | && rm -rf /var/lib/apt/lists/* 47 | 48 | # setup environment 49 | ENV LANG C.UTF-8 50 | ENV LC_ALL C.UTF-8 51 | ENV DISPLAY :0 52 | # Some QT apps don't show controls without this 53 | ENV QT_X11_NO_MITSHM 1 54 | 55 | # bootstrap rosdep 56 | ENV ROS_DISTRO kinetic 57 | RUN rosdep init \ 58 | && rosdep update --rosdistro $ROS_DISTRO 59 | 60 | # install ros packages 61 | RUN apt-get update \ 62 | && apt-get install -y ros-kinetic-ros-core=1.3.2-0* \ 63 | && apt-get install --no-install-recommends -y \ 64 | ros-$ROS_DISTRO-rviz \ 65 | ros-$ROS_DISTRO-tf2 \ 66 | ros-$ROS_DISTRO-cv-bridge \ 67 | ros-$ROS_DISTRO-image-view \ 68 | && rm -rf /var/lib/apt/lists/* 69 | 70 | RUN echo 'source /opt/ros/$ROS_DISTRO/setup.bash' >> ${HOME}/.bashrc 71 | RUN echo 'source /root/catkin_ws/devel/setup.bash' >> ${HOME}/.bashrc 72 | 73 | # Setup catkin workspace 74 | ENV CATKIN_WS ${HOME}/catkin_ws 75 | COPY docker/init_workspace.sh ${HOME} 76 | RUN ${HOME}/init_workspace.sh 77 | 78 | # cleanup 79 | RUN rm /root/requirements.txt /root/get-pip.py /root/init_workspace.sh 80 | 81 | CMD ["bash"] 82 | -------------------------------------------------------------------------------- /docker/README.md: -------------------------------------------------------------------------------- 1 | ## DREAM in a Docker Container 2 | 3 | Running ROS inside of [Docker](https://www.docker.com/) is an excellent way to 4 | experiment with DREAM, as it allows the user to completely isolate all software 5 | and configuration changes from the host system. This document describes how to 6 | create and run a Docker image that contains a complete ROS environment that 7 | supports DREAM, including all required components, such as ROS Kinetic, OpenCV, 8 | CUDA, and other packages. 9 | 10 | The current configuration assumes all components are installed on an x86 host 11 | platform running Ubuntu 16.04 or later. Further, use of the DREAM Docker container 12 | requires an NVIDIA GPU. 13 | 14 | 15 | ### Steps 16 | 17 | 1. **Create the container** 18 | ``` 19 | $ cd deep-arm-cal/docker 20 | $ docker build -t nvidia-dream:kinetic-v1 -f Dockerfile.kinetic .. 21 | ``` 22 | This will take several minutes and requires an internet connection. 23 | 24 | 2.
**Run the container** 25 | ``` 26 | $ ./run_dream_docker.sh [name] [host dir] [container dir] 27 | ``` 28 | Parameters: 29 | - `name` is an optional field that specifies the name of this image. By default, it is `nvidia-dream-v1`. By using different names, you can create multiple containers from the same image. 30 | - `host dir` and `container dir` are a pair of optional fields that allow you to specify a mapping between a directory on your host machine and a location inside the container. This is useful for sharing code and data between the two systems. By default, it maps the directory containing DREAM to `/root/catkin_ws/src/dream` in the container. 31 | 32 | Only the first invocation of this script with a given name will create a container. Subsequent executions will attach to the running container allowing you -- in effect -- to have multiple terminal sessions into a single container. 33 | 34 | 3. **Install DREAM** 35 | 36 | Inside the running Docker container: 37 | ``` 38 | $ cd /root/catkin_ws/src/dream 39 | $ pip install -e . 40 | ``` 41 | 42 | 4. **Run the tests** 43 | 44 | Inside the running Docker container: 45 | ``` 46 | $ cd /root/catkin_ws/src/dream 47 | $ pytest test/ 48 | ``` 49 | 50 | See the README.md in the top-level directory of this repository for information about 51 | running DREAM. -------------------------------------------------------------------------------- /docker/init_workspace.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | 4 | # Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. 5 | # Full license terms provided in LICENSE.md file. 6 | 7 | # Stop in case of any error. 8 | set -e 9 | 10 | source /opt/ros/kinetic/setup.bash 11 | 12 | # Create catkin workspace. 13 | mkdir -p ${CATKIN_WS}/src 14 | cd ${CATKIN_WS}/src 15 | catkin_init_workspace 16 | cd .. 17 | catkin_make 18 | source devel/setup.bash 19 | -------------------------------------------------------------------------------- /docker/run_dream_docker.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. 4 | # Full license terms provided in LICENSE.md file. 5 | 6 | CONTAINER_NAME=$1 7 | if [[ -z "${CONTAINER_NAME}" ]]; then 8 | CONTAINER_NAME=nvidia-dream-v1 9 | fi 10 | 11 | # This specifies a mapping between a host directory and a directory in the 12 | # docker container. This mapping should be changed if you wish to have access to 13 | # a different directory 14 | HOST_DIR=$2 15 | if [[ -z "${HOST_DIR}" ]]; then 16 | HOST_DIR=`realpath ${PWD}/..` 17 | fi 18 | 19 | CONTAINER_DIR=$3 20 | if [[ -z "${CONTAINER_DIR}" ]]; then 21 | CONTAINER_DIR=/root/catkin_ws/src/dream 22 | fi 23 | 24 | echo "Container name : ${CONTAINER_NAME}" 25 | echo "Host directory : ${HOST_DIR}" 26 | echo "Container directory: ${CONTAINER_DIR}" 27 | DREAM_ID=`docker ps -aqf "name=^/${CONTAINER_NAME}$"` 28 | if [ -z "${DREAM_ID}" ]; then 29 | echo "Creating new DREAM docker container." 30 | xhost + 31 | docker run --gpus all -it --privileged --network=host -v ${HOST_DIR}:${CONTAINER_DIR}:rw -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix${DISPLAY} --name=${CONTAINER_NAME} nvidia-dream:kinetic-v1 bash 32 | else 33 | echo "Found DREAM docker container: ${DREAM_ID}." 34 | # Check if the container is already running and start if necessary. 
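# (Unlike the `docker ps -aqf` call above, omitting -a here matches running containers only.)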
35 | if [ -z `docker ps -qf "name=^/${CONTAINER_NAME}$"` ]; then 36 | xhost +local:${DREAM_ID} 37 | echo "Starting and attaching to ${CONTAINER_NAME} container..." 38 | docker start ${DREAM_ID} 39 | docker attach ${DREAM_ID} 40 | else 41 | echo "Found running ${CONTAINER_NAME} container, attaching bash..." 42 | docker exec -it ${DREAM_ID} bash 43 | fi 44 | fi 45 | -------------------------------------------------------------------------------- /dream-franka.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NVlabs/DREAM/3360f2aa45f66a58eaa70d0d40f2c46c2682c0bb/dream-franka.png -------------------------------------------------------------------------------- /dream/.gitignore: -------------------------------------------------------------------------------- 1 | *.csv 2 | *.pdf 3 | -------------------------------------------------------------------------------- /dream/__init__.py: -------------------------------------------------------------------------------- 1 | __version__ = "1.3.0" 2 | 3 | from .analysis import * 4 | from .datasets import * 5 | from .geometric_vision import * 6 | from .image_proc import * 7 | from .network import * 8 | from .models import * 9 | from .utilities import * 10 | -------------------------------------------------------------------------------- /dream/add_plots.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import pandas as pd 6 | import argparse 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | import seaborn as sns 10 | 11 | plt.style.use("seaborn-whitegrid") 12 | 13 | 14 | # Example of running the script 15 | # python oks_plots.py --data all_dataset_keypoints.csv all_dataset_keypoints.csv --labels 1 2 16 | 17 | # pythonw oks_plots.py --data deep-arm-cal-paper/data/dope/3cam_real_keypoints.csv deep-arm-cal-paper/data/dream_hg/3cam_real_keypoints.csv deep-arm-cal-paper/data/dream_hg_deconv/3cam_real_keypoints.csv deep-arm-cal-paper/data/resimple/3cam_real_keypoints.csv --labels DOPE DREAM AE resnet 18 | parser = argparse.ArgumentParser(description="OKS for DREAM") 19 | parser.add_argument( 20 | "--data", nargs="+", default="[all_dataset_keypoints.csv]", help="list of csv files" 21 | ) 22 | 23 | parser.add_argument( 24 | "--labels", 25 | nargs="+", 26 | default=None, 27 | help="names for each dataset to be added as label", 28 | ) 29 | 30 | parser.add_argument("--styles", nargs="+", default=None, help="") 31 | parser.add_argument("--threshold", default=0.1) 32 | 33 | parser.add_argument("--colours", nargs="+", default=None, help="") 34 | 35 | parser.add_argument("--pixel", default=20) 36 | 37 | parser.add_argument("--output", default="output.pdf") 38 | 39 | parser.add_argument("--show", default=False, action="store_true") 40 | 41 | parser.add_argument("--divide", default=False, action="store_true") 42 | 43 | 44 | parser.add_argument("--title", default=None) 45 | 46 | args = parser.parse_args() 47 | print(args) 48 | 49 | 50 | fig = plt.figure() 51 | ax = plt.axes() 52 | 53 | for i_csv, csv_file in enumerate(args.data): 54 | print(csv_file) 55 | 56 | if csv_file == "666": 57 | plt.plot([], [], " ", label=args.labels[i_csv].replace("_", " ")) 58 | continue 59 | 60 | name_csv = csv_file.replace(".csv", "") 61 | 62 | df = pd.read_csv(csv_file) 63 | 64 | if 
args.divide is True: 65 | add = np.array(df["add"].tolist()) / 100 66 | else: 67 | add = np.array(df["add"].tolist()) 68 | 69 | pnp_sol_found_magic_number = -9.99 if args.divide else -999.0 70 | 71 | n_inframe_gt_projs = np.array(df["n_inframe_gt_projs"].tolist()) 72 | n_pnp_possible_frames = len(np.where(n_inframe_gt_projs >= 4)[0]) 73 | add_pnp_found = add[np.where(add > pnp_sol_found_magic_number)] 74 | n_pnp_found = len(add_pnp_found) 75 | 76 | delta_threshold = 0.00001 77 | add_threshold_values = np.arange(0.0, args.threshold, delta_threshold) 78 | 79 | counts = [] 80 | for value in add_threshold_values: 81 | under_threshold = ( 82 | len(np.where(add_pnp_found <= value)[0]) / n_pnp_possible_frames 83 | ) 84 | counts.append(under_threshold) 85 | 86 | auc = np.trapz(counts, dx=delta_threshold) / args.threshold 87 | 88 | # TODO: consolidate above calculations with pnp_metrics 89 | from dream import analysis as dream_analysis 90 | 91 | pnp_metrics = dream_analysis.pnp_metrics(df["add"], df["n_inframe_gt_projs"]) 92 | assert pnp_metrics["add_auc"] == auc 93 | assert pnp_metrics["add_mean"] == np.mean( 94 | add[np.where(add > pnp_sol_found_magic_number)] 95 | ) 96 | assert pnp_metrics["add_median"] == np.median( 97 | add[np.where(add > pnp_sol_found_magic_number)] 98 | ) 99 | assert pnp_metrics["add_std"] == np.std( 100 | add[np.where(add > pnp_sol_found_magic_number)] 101 | ) 102 | assert pnp_metrics["num_pnp_found"] == n_pnp_found 103 | assert pnp_metrics["num_pnp_possible"] == n_pnp_possible_frames 104 | 105 | # divide might screw this up .... to check! 106 | print("auc", auc) 107 | print("found", n_pnp_found / n_pnp_possible_frames) 108 | print("mean", np.mean(add[np.where(add > pnp_sol_found_magic_number)])) 109 | print("median", np.median(add[np.where(add > pnp_sol_found_magic_number)])) 110 | print("std", np.std(add[np.where(add > pnp_sol_found_magic_number)])) 111 | 112 | try: 113 | label = args.labels[i_csv] 114 | except: 115 | label = name_csv 116 | cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"] 117 | try: 118 | 119 | colour = cycle[int(args.colours[i_csv])] 120 | except: 121 | colour = "" 122 | 123 | try: 124 | style = args.styles[i_csv] 125 | if style == "0": 126 | style = "-" 127 | elif style == "1": 128 | style = "--" 129 | elif style == "2": 130 | style = ":" 131 | 132 | else: 133 | style = "-" 134 | except: 135 | style = "-" 136 | 137 | label = f"{label} ({auc:.3f})" 138 | ax.plot(add_threshold_values, counts, style, color=colour, label=label) 139 | 140 | plt.xlabel("ADD threshold distance (mm)") 141 | plt.ylabel("Accuracy") 142 | plt.title(args.title) 143 | ax.legend(loc="lower right", frameon=True, fancybox=True, framealpha=0.8) 144 | 145 | 146 | legend = ax.get_legend() 147 | for i, t in enumerate(legend.get_texts()): 148 | if args.data[i] == "666": 149 | t.set_ha("left") # ha is alias for horizontalalignment 150 | t.set_position((-30, 0)) 151 | 152 | ax.set_ylim(0, 1) 153 | ax.set_xlim(0, float(args.threshold)) 154 | ax.set_xticklabels([0, 20, 40, 60, 80, 100]) 155 | 156 | plt.savefig(args.output) 157 | if args.show: 158 | plt.show() 159 | -------------------------------------------------------------------------------- /dream/datasets.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | from enum import IntEnum 6 | 7 | import albumentations as albu 8 | import numpy as np 9 | from PIL import Image as PILImage 10 | import torch 11 | from torch.utils.data import Dataset as TorchDataset 12 | import torchvision.transforms as TVTransforms 13 | 14 | import dream 15 | 16 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 17 | 18 | # Debug mode: 19 | # 0: no debug mode 20 | # 1: light debug 21 | # 2: heavy debug 22 | class ManipulatorNDDSDatasetDebugLevels(IntEnum): 23 | # No debug information 24 | NONE = 0 25 | # Minor debug information, passing of extra info but not saving to disk 26 | LIGHT = 1 27 | # Heavy debug information, including saving data to disk 28 | HEAVY = 2 29 | # Interactive debug mode, not intended to be used for actual training 30 | INTERACTIVE = 3 31 | 32 | 33 | class ManipulatorNDDSDataset(TorchDataset): 34 | def __init__( 35 | self, 36 | ndds_dataset, 37 | manipulator_name, 38 | keypoint_names, 39 | network_input_resolution, 40 | network_output_resolution, 41 | image_normalization, 42 | image_preprocessing, 43 | augment_data=False, 44 | include_ground_truth=True, 45 | include_belief_maps=False, 46 | debug_mode=ManipulatorNDDSDatasetDebugLevels["NONE"], 47 | ): 48 | # Read in the camera intrinsics 49 | self.ndds_dataset_data = ndds_dataset[0] 50 | self.ndds_dataset_config = ndds_dataset[1] 51 | self.manipulator_name = manipulator_name 52 | self.keypoint_names = keypoint_names 53 | self.network_input_resolution = network_input_resolution 54 | self.network_output_resolution = network_output_resolution 55 | self.augment_data = augment_data 56 | 57 | # If include_belief_maps is specified, include_ground_truth must also be 58 | # TBD: revisit better way of passing inputs, maybe to make one argument instead of two 59 | if include_belief_maps: 60 | assert ( 61 | include_ground_truth 62 | ), 'If "include_belief_maps" is True, "include_ground_truth" must also be True.' 63 | self.include_ground_truth = include_ground_truth 64 | self.include_belief_maps = include_belief_maps 65 | 66 | self.debug_mode = debug_mode 67 | 68 | assert ( 69 | isinstance(image_normalization, dict) or not image_normalization 70 | ), 'Expected image_normalization to be either a dict specifying "mean" and "stdev", or None or False to specify no normalization.' 71 | 72 | # Image normalization 73 | # Basic PIL -> tensor without normalization, used for visualizing the net input image 74 | self.tensor_from_image_no_norm_tform = TVTransforms.Compose( 75 | [TVTransforms.ToTensor()] 76 | ) 77 | 78 | if image_normalization: 79 | assert ( 80 | "mean" in image_normalization and len(image_normalization["mean"]) == 3 81 | ), 'When image_normalization is a dict, expected key "mean" specifying a 3-tuple to exist, but it does not.' 82 | assert ( 83 | "stdev" in image_normalization 84 | and len(image_normalization["stdev"]) == 3 85 | ), 'When image_normalization is a dict, expected key "stdev" specifying a 3-tuple to exist, but it does not.' 
86 | 87 | self.tensor_from_image_tform = TVTransforms.Compose( 88 | [ 89 | TVTransforms.ToTensor(), 90 | TVTransforms.Normalize( 91 | image_normalization["mean"], image_normalization["stdev"] 92 | ), 93 | ] 94 | ) 95 | else: 96 | # Use the PIL -> tensor tform without normalization if image_normalization isn't specified 97 | self.tensor_from_image_tform = self.tensor_from_image_no_norm_tform 98 | 99 | assert ( 100 | image_preprocessing in dream.image_proc.KNOWN_IMAGE_PREPROC_TYPES 101 | ), 'Image preprocessing type "{}" is not recognized.'.format( 102 | image_preprocessing 103 | ) 104 | self.image_preprocessing = image_preprocessing 105 | 106 | def __len__(self): 107 | return len(self.ndds_dataset_data) 108 | 109 | def __getitem__(self, index): 110 | 111 | # Parse this datum 112 | datum = self.ndds_dataset_data[index] 113 | image_rgb_path = datum["image_paths"]["rgb"] 114 | 115 | # Extract keypoints from the json file 116 | data_path = datum["data_path"] 117 | if self.include_ground_truth: 118 | keypoints = dream.utilities.load_keypoints( 119 | data_path, self.manipulator_name, self.keypoint_names 120 | ) 121 | else: 122 | # Generate an empty 'keypoints' dict 123 | keypoints = dream.utilities.load_keypoints( 124 | data_path, self.manipulator_name, [] 125 | ) 126 | 127 | # Load image and transform to network input resolution -- pre augmentation 128 | image_rgb_raw = PILImage.open(image_rgb_path).convert("RGB") 129 | image_raw_resolution = image_rgb_raw.size 130 | 131 | # Do image preprocessing, including keypoint conversion 132 | image_rgb_before_aug = dream.image_proc.preprocess_image( 133 | image_rgb_raw, self.network_input_resolution, self.image_preprocessing 134 | ) 135 | kp_projs_before_aug = dream.image_proc.convert_keypoints_to_netin_from_raw( 136 | keypoints["projections"], 137 | image_raw_resolution, 138 | self.network_input_resolution, 139 | self.image_preprocessing, 140 | ) 141 | 142 | # Handle data augmentation 143 | if self.augment_data: 144 | augmentation = albu.Compose( 145 | [ 146 | albu.GaussNoise(), 147 | albu.RandomBrightnessContrast(brightness_by_max=False), 148 | albu.ShiftScaleRotate(rotate_limit=15), 149 | ], 150 | p=1.0, 151 | keypoint_params={"format": "xy", "remove_invisible": False}, 152 | ) 153 | data_to_aug = { 154 | "image": np.array(image_rgb_before_aug), 155 | "keypoints": kp_projs_before_aug, 156 | } 157 | augmented_data = augmentation(**data_to_aug) 158 | image_rgb_net_input = PILImage.fromarray(augmented_data["image"]) 159 | kp_projs_net_input = augmented_data["keypoints"] 160 | else: 161 | image_rgb_net_input = image_rgb_before_aug 162 | kp_projs_net_input = kp_projs_before_aug 163 | 164 | assert ( 165 | image_rgb_net_input.size == self.network_input_resolution 166 | ), "Expected resolution for image_rgb_net_input to be equal to specified network input resolution, but they are different." 
167 | 168 | # Now convert keypoints at network input to network output for use as the trained label 169 | kp_projs_net_output = dream.image_proc.convert_keypoints_to_netout_from_netin( 170 | kp_projs_net_input, 171 | self.network_input_resolution, 172 | self.network_output_resolution, 173 | ) 174 | 175 | # Convert to tensor for output handling 176 | # This one goes through image normalization (used for inference) 177 | image_rgb_net_input_as_tensor = self.tensor_from_image_tform( 178 | image_rgb_net_input 179 | ) 180 | 181 | # This one is not (used for net input overlay visualizations - hence "viz") 182 | image_rgb_net_input_viz_as_tensor = self.tensor_from_image_no_norm_tform( 183 | image_rgb_net_input 184 | ) 185 | 186 | # Convert keypoint data to tensors - use float32 size 187 | keypoint_positions_wrt_cam_as_tensor = torch.from_numpy( 188 | np.array(keypoints["positions_wrt_cam"]) 189 | ).float() 190 | kp_projs_net_output_as_tensor = torch.from_numpy( 191 | np.array(kp_projs_net_output) 192 | ).float() 193 | 194 | # Construct output sample 195 | sample = { 196 | "image_rgb_input": image_rgb_net_input_as_tensor, 197 | "keypoint_projections_output": kp_projs_net_output_as_tensor, 198 | "keypoint_positions": keypoint_positions_wrt_cam_as_tensor, 199 | "config": datum, 200 | } 201 | 202 | # Generate the belief maps directly 203 | if self.include_belief_maps: 204 | belief_maps = dream.image_proc.create_belief_map( 205 | self.network_output_resolution, kp_projs_net_output_as_tensor 206 | ) 207 | belief_maps_as_tensor = torch.tensor(belief_maps).float() 208 | sample["belief_maps"] = belief_maps_as_tensor 209 | 210 | if self.debug_mode >= ManipulatorNDDSDatasetDebugLevels["LIGHT"]: 211 | kp_projections_as_tensor = torch.from_numpy( 212 | np.array(keypoints["projections"]) 213 | ).float() 214 | sample["keypoint_projections_raw"] = kp_projections_as_tensor 215 | kp_projections_input_as_tensor = torch.from_numpy( 216 | kp_projs_net_input 217 | ).float() 218 | sample["keypoint_projections_input"] = kp_projections_input_as_tensor 219 | image_raw_resolution_as_tensor = torch.tensor(image_raw_resolution).float() 220 | sample["image_resolution_raw"] = image_raw_resolution_as_tensor 221 | sample["image_rgb_input_viz"] = image_rgb_net_input_viz_as_tensor 222 | 223 | # TODO: same as LIGHT debug, but also saves to disk 224 | if self.debug_mode >= ManipulatorNDDSDatasetDebugLevels["HEAVY"]: 225 | pass 226 | 227 | # Display to screen 228 | if self.debug_mode >= ManipulatorNDDSDatasetDebugLevels["INTERACTIVE"]: 229 | # Ensure that the points are consistent with the image transformations 230 | # The overlaid points on both image should be consistent, despite image transformations 231 | debug_image_raw = dream.image_proc.overlay_points_on_image( 232 | image_rgb_raw, keypoints["projections"], self.keypoint_names 233 | ) 234 | debug_image_raw.show() 235 | 236 | debug_image = dream.image_proc.overlay_points_on_image( 237 | image_rgb_net_input, kp_projs_net_input, self.keypoint_names 238 | ) 239 | debug_image.show() 240 | 241 | # Also show that the output resolution data are consistent 242 | image_rgb_net_output = image_rgb_net_input.resize( 243 | self.network_output_resolution, resample=PILImage.BILINEAR 244 | ) 245 | debug_image_rgb_net_output = dream.image_proc.overlay_points_on_image( 246 | image_rgb_net_output, kp_projs_net_output, self.keypoint_names 247 | ) 248 | debug_image_rgb_net_output.show() 249 | 250 | if self.include_belief_maps: 251 | for kp_idx in range(len(self.keypoint_names)): 252 | belief_map_kp 
= dream.image_proc.image_from_belief_map( 253 | belief_maps_as_tensor[kp_idx] 254 | ) 255 | belief_map_kp.show() 256 | 257 | belief_map_kp_upscaled = belief_map_kp.resize( 258 | self.network_input_resolution, resample=PILImage.BILINEAR 259 | ) 260 | image_rgb_net_output_belief_blend = PILImage.blend( 261 | image_rgb_net_input, belief_map_kp_upscaled, alpha=0.5 262 | ) 263 | image_rgb_net_output_belief_blend_overlay = dream.image_proc.overlay_points_on_image( 264 | image_rgb_net_output_belief_blend, 265 | [kp_projs_net_input[kp_idx]], 266 | [self.keypoint_names[kp_idx]], 267 | ) 268 | image_rgb_net_output_belief_blend_overlay.show() 269 | 270 | # This only works if the number of workers is zero 271 | input("Press Enter to continue...") 272 | 273 | return sample 274 | 275 | 276 | if __name__ == "__main__": 277 | from PIL import Image 278 | 279 | # beliefs = CreateBeliefMap((100,100),[(50,50),(-1,-1),(0,50),(50,0),(10,10)]) 280 | # for i,b in enumerate(beliefs): 281 | # print(b.shape) 282 | # stack = np.stack([b,b,b],axis=0).transpose(2,1,0) 283 | # im = Image.fromarray((stack*255).astype('uint8')) 284 | # im.save('{}.png'.format(i)) 285 | 286 | path = "/home/sbirchfield/data/FrankaSimpleHomeDR20k/" 287 | # path = '/home/sbirchfield/data/FrankaSimpleMPGammaDR105k/' 288 | 289 | keypoint_names = [ 290 | "panda_link0", 291 | "panda_link2", 292 | "panda_link3", 293 | "panda_link4", 294 | "panda_link6", 295 | "panda_link7", 296 | "panda_hand", 297 | ] 298 | 299 | found_data = dream.utilities.find_ndds_data_in_dir(path) 300 | train_dataset = ManipulatorNDDSDataset( 301 | found_data, 302 | "panda", 303 | keypoint_names, 304 | (400, 400), 305 | (100, 100), 306 | include_belief_maps=True, 307 | augment_data=True, 308 | ) 309 | trainingdata = torch.utils.data.DataLoader( 310 | train_dataset, batch_size=32, shuffle=False, num_workers=1, pin_memory=True 311 | ) 312 | 313 | targets = iter(trainingdata).next() 314 | 315 | for i, b in enumerate(targets["belief_maps"][0]): 316 | print(b.shape) 317 | stack = np.stack([b, b, b], axis=0).transpose(2, 1, 0) 318 | im = Image.fromarray((stack * 255).astype("uint8")) 319 | im.save("{}.png".format(i)) 320 | -------------------------------------------------------------------------------- /dream/geometric_vision.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import cv2 6 | import numpy as np 7 | from pyrr import Quaternion 8 | 9 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 10 | 11 | 12 | def convert_rvec_to_quaternion(rvec): 13 | """Convert rvec (which is log quaternion) to quaternion""" 14 | theta = np.sqrt( 15 | rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2] 16 | ) # in radians 17 | raxis = [rvec[0] / theta, rvec[1] / theta, rvec[2] / theta] 18 | 19 | # pyrr's Quaternion (order is XYZW), https://pyrr.readthedocs.io/en/latest/oo_api_quaternion.html 20 | quaternion = Quaternion.from_axis_rotation(raxis, theta) 21 | quaternion.normalize() 22 | return quaternion 23 | 24 | 25 | def hnormalized(vector): 26 | hnormalized_vector = (vector / vector[-1])[:-1] 27 | return hnormalized_vector 28 | 29 | 30 | def point_projection_from_3d(camera_K, points): 31 | projections = [] 32 | for p in points: 33 | p_unflattened = np.matmul(camera_K, p) 34 | projection = hnormalized(p_unflattened) 35 | projections.append(projection) 36 | projections = np.array(projections) 37 | return projections 38 | 39 | 40 | def solve_pnp( 41 | canonical_points, 42 | projections, 43 | camera_K, 44 | method=cv2.SOLVEPNP_EPNP, 45 | refinement=True, 46 | dist_coeffs=np.array([]), 47 | ): 48 | 49 | n_canonial_points = len(canonical_points) 50 | n_projections = len(projections) 51 | assert ( 52 | n_canonial_points == n_projections 53 | ), "Expected canonical_points and projections to have the same length, but they are length {} and {}.".format( 54 | n_canonial_points, n_projections 55 | ) 56 | 57 | # Process points to remove any NaNs 58 | canonical_points_proc = [] 59 | projections_proc = [] 60 | for canon_pt, proj in zip(canonical_points, projections): 61 | 62 | if ( 63 | canon_pt is None 64 | or len(canon_pt) == 0 65 | or canon_pt[0] is None 66 | or canon_pt[1] is None 67 | or proj is None 68 | or len(proj) == 0 69 | or proj[0] is None 70 | or proj[1] is None 71 | ): 72 | continue 73 | 74 | canonical_points_proc.append(canon_pt) 75 | projections_proc.append(proj) 76 | 77 | # Return if no valid points 78 | if len(canonical_points_proc) == 0: 79 | return False, None, None 80 | 81 | canonical_points_proc = np.array(canonical_points_proc) 82 | projections_proc = np.array(projections_proc) 83 | 84 | # Use cv2's PNP solver 85 | try: 86 | pnp_retval, rvec, tvec = cv2.solvePnP( 87 | canonical_points_proc.reshape(canonical_points_proc.shape[0], 1, 3), 88 | projections_proc.reshape(projections_proc.shape[0], 1, 2), 89 | camera_K, 90 | dist_coeffs, 91 | flags=method, 92 | ) 93 | 94 | if refinement: 95 | pnp_retval, rvec, tvec = cv2.solvePnP( 96 | canonical_points_proc.reshape(canonical_points_proc.shape[0], 1, 3), 97 | projections_proc.reshape(projections_proc.shape[0], 1, 2), 98 | camera_K, 99 | dist_coeffs, 100 | flags=cv2.SOLVEPNP_ITERATIVE, 101 | useExtrinsicGuess=True, 102 | rvec=rvec, 103 | tvec=tvec, 104 | ) 105 | translation = tvec[:, 0] 106 | quaternion = convert_rvec_to_quaternion(rvec[:, 0]) 107 | 108 | except: 109 | pnp_retval = False 110 | translation = None 111 | quaternion = None 112 | 113 | return pnp_retval, translation, quaternion 114 | 115 | 116 | def solve_pnp_ransac( 117 | canonical_points, 118 | projections, 119 | camera_K, 120 | method=cv2.SOLVEPNP_EPNP, 121 | inlier_thresh_px=5.0, # this is the threshold for each point to be considered an inlier 122 | dist_coeffs=np.array([]), 123 | ): 124 | 125 | n_canonial_points = len(canonical_points) 126 | n_projections = 
len(projections) 127 | assert ( 128 | n_canonial_points == n_projections 129 | ), "Expected canonical_points and projections to have the same length, but they are length {} and {}.".format( 130 | n_canonial_points, n_projections 131 | ) 132 | 133 | # Process points to remove any NaNs 134 | canonical_points_proc = [] 135 | projections_proc = [] 136 | for canon_pt, proj in zip(canonical_points, projections): 137 | 138 | if ( 139 | canon_pt is None 140 | or len(canon_pt) == 0 141 | or canon_pt[0] is None 142 | or canon_pt[1] is None 143 | or proj is None 144 | or len(proj) == 0 145 | or proj[0] is None 146 | or proj[1] is None 147 | ): 148 | continue 149 | 150 | canonical_points_proc.append(canon_pt) 151 | projections_proc.append(proj) 152 | 153 | # Return if no valid points 154 | if len(canonical_points_proc) == 0: 155 | return False, None, None, None 156 | 157 | canonical_points_proc = np.array(canonical_points_proc) 158 | projections_proc = np.array(projections_proc) 159 | 160 | # Use cv2's PNP solver 161 | try: 162 | pnp_retval, rvec, tvec, inliers = cv2.solvePnPRansac( 163 | canonical_points_proc.reshape(canonical_points_proc.shape[0], 1, 3), 164 | projections_proc.reshape(projections_proc.shape[0], 1, 2), 165 | camera_K, 166 | dist_coeffs, 167 | reprojectionError=inlier_thresh_px, 168 | flags=method, 169 | ) 170 | 171 | translation = tvec[:, 0] 172 | quaternion = convert_rvec_to_quaternion(rvec[:, 0]) 173 | 174 | except: 175 | pnp_retval = False 176 | translation = None 177 | quaternion = None 178 | inliers = None 179 | 180 | return pnp_retval, translation, quaternion, inliers 181 | 182 | 183 | def add_from_pose(translation, quaternion, keypoint_positions_wrt_cam_gt, camera_K): 184 | transform = np.eye(4) 185 | transform[:3, :3] = quaternion.matrix33.tolist() 186 | transform[:3, -1] = translation 187 | kp_pos_gt_homog = np.hstack( 188 | ( 189 | keypoint_positions_wrt_cam_gt, 190 | np.ones((keypoint_positions_wrt_cam_gt.shape[0], 1)), 191 | ) 192 | ) 193 | kp_pos_aligned = np.transpose(np.matmul(transform, np.transpose(kp_pos_gt_homog)))[ 194 | :, :3 195 | ] 196 | # The below lines were useful when debugging pnp ransac, so left here for now 197 | # projs = point_projection_from_3d(camera_K, kp_pos_aligned) 198 | # temp = np.linalg.norm(kp_projs_est_pnp - projs, axis=1) # all of these should be below the inlier threshold above! 199 | kp_3d_errors = kp_pos_aligned - keypoint_positions_wrt_cam_gt 200 | kp_3d_l2_errors = np.linalg.norm(kp_3d_errors, axis=1) 201 | add = np.mean(kp_3d_l2_errors) 202 | return add 203 | -------------------------------------------------------------------------------- /dream/network.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import os 6 | from PIL import Image as PILImage 7 | 8 | import numpy as np 9 | import ruamel.yaml 10 | import torch 11 | import torchvision.transforms as TVTransforms 12 | 13 | from .spatial_softmax import SoftArgmaxPavlo 14 | import dream 15 | 16 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 17 | 18 | KNOWN_ARCHITECTURES = [ 19 | "vgg", 20 | "resnet", 21 | ] 22 | 23 | KNOWN_OPTIMIZERS = [ 24 | "adam", # the Adam optimizer 25 | "sgd", 26 | ] # the Stochastic Gradient Descent optimizer 27 | 28 | 29 | def create_network_from_config_file(config_file_path, network_params_path=None): 30 | 31 | # Input argument handling 32 | assert os.path.exists( 33 | config_file_path 34 | ), 'Expected config_file_path "{}" to exist, but it does not.'.format( 35 | config_file_path 36 | ) 37 | 38 | if network_params_path: 39 | load_network_parameters = True 40 | assert os.path.exists( 41 | network_params_path 42 | ), 'If provided, expected network_params_path "{}" to exist, but it does not.'.format( 43 | network_params_path 44 | ) 45 | else: 46 | load_network_parameters = False 47 | 48 | # Create parser 49 | data_parser = ruamel.yaml.YAML(typ="safe") 50 | 51 | print('Loading network config file "{}"'.format(config_file_path)) 52 | with open(config_file_path, "r") as f: 53 | network_config = data_parser.load(f) 54 | 55 | # Create the network 56 | dream_network = create_network_from_config_data(network_config) 57 | 58 | # Optionally load weights of the network 59 | if load_network_parameters: 60 | print('Loading network weights file "{}"'.format(network_params_path)) 61 | dream_network.model.load_state_dict(torch.load(network_params_path)) 62 | 63 | return dream_network 64 | 65 | 66 | def create_network_from_config_data(network_config_data): 67 | 68 | # Create the network 69 | dream_network = DreamNetwork(network_config_data) 70 | return dream_network 71 | 72 | 73 | class DreamNetwork: 74 | def __init__(self, network_config): 75 | 76 | # Assert input 77 | assert ( 78 | "architecture" in network_config 79 | ), 'Required key "architecture" is missing from network configuration.' 80 | assert ( 81 | "type" in network_config["architecture"] 82 | ), 'Required key "type" in dictionary "architecture" is missing from network configuration.' 83 | assert ( 84 | "manipulator" in network_config 85 | ), 'Required key "manipulator" is missing from network configuration.' 86 | assert ( 87 | "name" in network_config["manipulator"] 88 | ), 'Required key "name" in dictionary "manipulator" is missing from network configuration.' 89 | assert ( 90 | "keypoints" in network_config["manipulator"] 91 | ), 'Required key "keypoints" in dictionary "manipulator" is missing from network configuration.' 92 | 93 | # Parse keypoint specification 94 | self.keypoint_names = [] 95 | self.friendly_keypoint_names = [] 96 | self.ros_keypoint_frames = [] 97 | for kp_def in network_config["manipulator"]["keypoints"]: 98 | assert "name" in kp_def, 'Keypoint specification is missing key "name".' 
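            # "friendly_name" and "ros_frame" are optional in each keypoint specification; both fall back to the keypoint name below.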
99 | kp_name = kp_def["name"] 100 | self.keypoint_names.append(kp_name) 101 | 102 | friendly_kp_name = ( 103 | kp_def["friendly_name"] if "friendly_name" in kp_def else kp_name 104 | ) 105 | self.friendly_keypoint_names.append(friendly_kp_name) 106 | 107 | ros_kp_frame = kp_def["ros_frame"] if "ros_frame" in kp_def else kp_name 108 | self.ros_keypoint_frames.append(ros_kp_frame) 109 | 110 | self.network_config = network_config 111 | 112 | # TBD: the following "getters" should all be methods 113 | self.manipulator_name = self.network_config["manipulator"]["name"] 114 | self.n_keypoints = len(self.keypoint_names) 115 | self.architecture_type = self.network_config["architecture"]["type"] 116 | 117 | # print robot info 118 | print("`network.py`. `DreamNetwork:__init()` ----------") 119 | print(" Manipulator: {}".format(self.manipulator_name)) 120 | print(" Keypoint names: {}".format(self.keypoint_names)) 121 | print(" Friendly keypoint names: {}".format(self.friendly_keypoint_names)) 122 | print(" Architecture type: {}".format(self.architecture_type)) 123 | 124 | # Parse normalization 125 | assert ( 126 | "image_normalization" in self.network_config["architecture"] 127 | ), 'Required key "image_normalization" in dictionary "architecture" is missing from network configuration.' 128 | self.image_normalization = self.network_config["architecture"][ 129 | "image_normalization" 130 | ] 131 | 132 | # Parse image preprocessing: how to handle what the network does between the input image and the network input layer 133 | assert ( 134 | "image_preprocessing" in self.network_config["architecture"] 135 | ), 'Required key "image_preprocessing" in dictionary "architecture" is missing from network configuration.' 136 | assert ( 137 | self.image_preprocessing() in dream.image_proc.KNOWN_IMAGE_PREPROC_TYPES 138 | ), 'Image preprocessing type "{}" is not recognized.'.format( 139 | self.image_preprocessing() 140 | ) 141 | 142 | # Assert the output heads have been specified 143 | assert ( 144 | "output_heads" in self.network_config["architecture"] 145 | ), 'Required key "output_heads" in dictionary "architecture" is missing from network configuration.' 146 | 147 | assert ( 148 | self.architecture_type in KNOWN_ARCHITECTURES 149 | ), 'Expected architecture type "{}" to be in the list of known network architectures, but it is not.'.format( 150 | self.architecture_type 151 | ) 152 | 153 | # Assert that the input heads have been specified 154 | assert ( 155 | "input_heads" in self.network_config["architecture"] 156 | ), 'Required key "input_heads" in dictionary "architecture" is missing from network configuration.' 157 | assert ( 158 | self.network_config["architecture"]["input_heads"][0] == "image_rgb" 159 | ), 'First input head must be "image_rgb".' 160 | 161 | # Assert that the trained network input resolution has been specified 162 | # The output resolution is calculated at the end, after the model is created 163 | assert ( 164 | "training" in self.network_config 165 | ), 'Required key "training" is missing from network configuration.' 166 | assert ( 167 | "config" in self.network_config["training"] 168 | ), 'Required key "config" in dictionary "training" is missing from network configuration.' 169 | assert ( 170 | "net_input_resolution" in self.network_config["training"]["config"] 171 | ), 'Required key "net_input_resolution" is missing from training configuration.' 
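        # net_input_resolution is expected to be a (width, height) pair.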
172 | len_trained_net_input_res = len( 173 | self.network_config["training"]["config"]["net_input_resolution"] 174 | ) 175 | assert ( 176 | len_trained_net_input_res == 2 177 | ), "Expected trained net input resolution to have length 2, but it has length {}.".format( 178 | len_trained_net_input_res 179 | ) 180 | 181 | assert ( 182 | "platform" in self.network_config["training"] 183 | ), 'Required key "platform" in dictionary "training" is missing from network configuration.' 184 | gpu_ids = self.network_config["training"]["platform"]["gpu_ids"] 185 | data_parallel_device_ids = gpu_ids if gpu_ids else None 186 | 187 | # Options to sort belief map 188 | # Use belief map scores to try to determine cases in multiple instances 189 | self.use_belief_peak_scores = True 190 | # This in the scale of the belief map, which is 0 - 1.0. 191 | self.belief_peak_next_best_score = 0.25 192 | 193 | # Create architectures 194 | if self.architecture_type == "vgg": 195 | vgg_kwargs = {} 196 | if "spatial_softmax" in self.network_config["architecture"]: 197 | assert self.network_config["architecture"]["output_heads"] == [ 198 | "belief_maps", 199 | "keypoints", 200 | ] 201 | vgg_kwargs = { 202 | "internalize_spatial_softmax": True, 203 | "learned_beta": self.network_config["architecture"][ 204 | "spatial_softmax" 205 | ]["learned_beta"], 206 | "initial_beta": self.network_config["architecture"][ 207 | "spatial_softmax" 208 | ]["initial_beta"], 209 | } 210 | else: 211 | assert self.network_config["architecture"]["output_heads"] == [ 212 | "belief_maps" 213 | ] 214 | vgg_kwargs = {"internalize_spatial_softmax": False} 215 | 216 | # Check if using new decoder -- assume output is 100x100 if not 217 | if ( 218 | "deconv_decoder" in self.network_config["architecture"] 219 | and not "full_output" in self.network_config["architecture"] 220 | ): 221 | use_deconv_decoder = self.network_config["architecture"][ 222 | "deconv_decoder" 223 | ] 224 | vgg_kwargs["deconv_decoder"] = use_deconv_decoder 225 | elif "full_output" in self.network_config["architecture"]: 226 | use_deconv_decoder = self.network_config["architecture"][ 227 | "deconv_decoder" 228 | ] 229 | vgg_kwargs["deconv_decoder"] = use_deconv_decoder 230 | vgg_kwargs["full_output"] = True 231 | 232 | if "n_stages" in self.network_config["architecture"]: 233 | vgg_kwargs["n_stages"] = self.network_config[ 234 | "architecture" 235 | ]["n_stages"] 236 | 237 | # Check if skip connections -- use default if not 238 | if "skip_connections" in self.network_config["architecture"]: 239 | vgg_kwargs["skip_connections"] = self.network_config[ 240 | "architecture" 241 | ]["skip_connections"] 242 | 243 | if "n_stages" in self.network_config["architecture"]: 244 | self.model = torch.nn.DataParallel( 245 | dream.models.DreamHourglassMultiStage( 246 | self.n_keypoints, **vgg_kwargs 247 | ), 248 | device_ids=data_parallel_device_ids, 249 | ).cuda() 250 | else: 251 | self.model = torch.nn.DataParallel( 252 | dream.models.DreamHourglass( 253 | self.n_keypoints, **vgg_kwargs 254 | ), 255 | device_ids=data_parallel_device_ids, 256 | ).cuda() 257 | 258 | loss_type = self.network_config["architecture"]["loss"]["type"] 259 | 260 | if loss_type == "mse": 261 | self.criterion = torch.nn.MSELoss() 262 | elif loss_type == "huber": 263 | self.criterion = torch.nn.SmoothL1Loss() 264 | else: 265 | assert False, "Loss not yet implemented." 
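# Putting the branches above together, an "architecture" block for the VGG-based hourglass
# variant might look roughly like this (a sketch assembled from the keys parsed in this
# constructor, not copied from a shipped arch_configs file):
#
#   architecture:
#     type: vgg
#     input_heads: [image_rgb]
#     output_heads: [belief_maps]
#     loss:
#       type: mse                 # or "huber"
#     deconv_decoder: true
#     image_preprocessing: ...    # must be one of dream.image_proc.KNOWN_IMAGE_PREPROC_TYPES
#     image_normalization:
#       mean: [0.5, 0.5, 0.5]     # illustrative values
#       stdev: [0.5, 0.5, 0.5]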
266 | 267 | elif self.architecture_type == "resnet": 268 | 269 | # Assume we're only training on belief maps 270 | assert self.network_config["architecture"]["output_heads"] == [ 271 | "belief_maps" 272 | ] 273 | 274 | resnet_kwargs = {} 275 | 276 | # Check if using "full" output -- assume output is "half" if not 277 | if "full_decoder" in self.network_config["architecture"]: 278 | use_full_decoder = self.network_config["architecture"]["full_decoder"] 279 | resnet_kwargs["full"] = use_full_decoder 280 | 281 | self.model = torch.nn.DataParallel( 282 | dream.models.ResnetSimple(self.n_keypoints, **resnet_kwargs), 283 | device_ids=data_parallel_device_ids, 284 | ).cuda() 285 | 286 | loss_type = self.network_config["architecture"]["loss"]["type"] 287 | 288 | if loss_type == "mse": 289 | self.criterion = torch.nn.MSELoss() 290 | elif loss_type == "huber": 291 | self.criterion = torch.nn.SmoothL1Loss() 292 | else: 293 | assert False, "Loss not yet implemented." 294 | 295 | else: 296 | assert False, 'Network architecture type "{}" not defined.'.format( 297 | self.architecture_type 298 | ) 299 | 300 | # Optimizer is created in a separate call, because this isn't needed unless we're training 301 | self.optimizer = None 302 | 303 | # Now determine trained network output resolution, if not specified, or assert consistency if so 304 | trained_net_out_res_from_model = self.net_output_resolution_from_input_resolution( 305 | self.trained_net_input_resolution() 306 | ) 307 | trained_net_out_res_from_model_as_list = list(trained_net_out_res_from_model) 308 | if "net_output_resolution" in self.network_config["training"]["config"]: 309 | assert ( 310 | self.network_config["training"]["config"]["net_output_resolution"] 311 | == trained_net_out_res_from_model_as_list 312 | ), "Network model and config file disagree for trained network output resolution." 313 | else: 314 | # Add to config 315 | self.network_config["training"]["config"][ 316 | "net_output_resolution" 317 | ] = trained_net_out_res_from_model_as_list 318 | 319 | def trained_net_input_resolution(self): 320 | return tuple(self.network_config["training"]["config"]["net_input_resolution"]) 321 | 322 | def trained_net_output_resolution(self): 323 | return tuple(self.network_config["training"]["config"]["net_output_resolution"]) 324 | 325 | def image_preprocessing(self): 326 | return self.network_config["architecture"]["image_preprocessing"] 327 | 328 | def train(self, network_input_heads, target): 329 | assert self.optimizer, "Optimizer must be defined. Use enable_training() first." 
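# Typical usage of this method, as a sketch (assumes a data loader yielding normalized RGB
# tensor batches and target belief maps; "training_loader" is a hypothetical name, not part
# of this repository):
#
#   net = create_network_from_config_file("arch_configs/dream_vgg_q.yaml")
#   net.enable_training()
#   for image_batch, belief_map_targets in training_loader:
#       loss = net.train([image_batch.cuda()], belief_map_targets.cuda())
#
# enable_training() must be called first so that self.optimizer exists.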
330 | 331 | self.optimizer.zero_grad() 332 | 333 | loss = self.loss(network_input_heads, target) 334 | 335 | loss.backward() 336 | self.optimizer.step() 337 | 338 | return loss 339 | 340 | def loss(self, network_input_heads, target): 341 | network_output_heads = self.model(network_input_heads[0]) 342 | 343 | if self.network_config["architecture"]["output_heads"] == ["belief_maps"]: 344 | 345 | if "n_stages" in self.network_config["architecture"]: 346 | # print('hello') 347 | n_stages = len(network_output_heads) 348 | expanded_size = [n_stages] + [-1] * target.dim() 349 | target_expanded = target.unsqueeze(0).expand(expanded_size) 350 | loss = self.criterion( 351 | torch.stack(network_output_heads), target_expanded 352 | ) 353 | 354 | # This was the loss from before 355 | # loss = torch.tensor(0).float().cuda() 356 | # for belief in network_output_heads: loss += ((belief - target) * (belief - target)).mean() 357 | 358 | else: 359 | loss = self.criterion(network_output_heads[0], target) 360 | 361 | else: 362 | assert False, "Not yet implemented." 363 | 364 | return loss 365 | 366 | # image_raw_resolution: (width, height) in pixels 367 | # calls resolution_after_preprocessing under the hood, using network trained resolution as the reference resolution 368 | def net_resolutions_from_image_raw_resolution( 369 | self, image_raw_resolution, image_preprocessing_override=None 370 | ): 371 | 372 | # Input argument handling 373 | assert ( 374 | len(image_raw_resolution) == 2 375 | ), 'Expected "image_raw_resolution" to have length 2, but it has length {}.'.format( 376 | len(image_raw_resolution) 377 | ) 378 | 379 | image_preprocessing = ( 380 | image_preprocessing_override 381 | if image_preprocessing_override 382 | else self.image_preprocessing() 383 | ) 384 | 385 | # Calculate image resolution at network input layer, after preprocessing 386 | net_input_resolution = dream.image_proc.resolution_after_preprocessing( 387 | image_raw_resolution, 388 | self.trained_net_input_resolution(), 389 | image_preprocessing, 390 | ) 391 | net_output_resolution = self.net_output_resolution_from_input_resolution( 392 | net_input_resolution 393 | ) 394 | 395 | return net_input_resolution, net_output_resolution 396 | 397 | def net_output_resolution_from_input_resolution(self, net_input_resolution): 398 | 399 | # Input argument handling 400 | assert ( 401 | len(net_input_resolution) == 2 402 | ), 'Expected "net_input_resolution" to have length 2, but it has length {}.'.format( 403 | len(net_input_resolution) 404 | ) 405 | 406 | netin_width, netin_height = net_input_resolution 407 | 408 | # Construct test input and send thru network to get the output 409 | # This assumes there is only one input head, which is the RGB image 410 | with torch.no_grad(): 411 | net_input_as_tensor_batch = torch.zeros( 412 | 1, 3, netin_height, netin_width 413 | ).cuda() 414 | net_output_as_tensor_batch = self.model(net_input_as_tensor_batch) 415 | net_output_shape = net_output_as_tensor_batch[0][0].shape 416 | net_output_resolution = (net_output_shape[2], net_output_shape[1]) 417 | 418 | return net_output_resolution 419 | 420 | # Wrapper for inferences from a PIL image directly 421 | # Returns keypoints in the input image (not necessarily network input) frame 422 | # Allows for an optional overwrite 423 | def keypoints_from_image( 424 | self, input_rgb_image_as_pil, image_preprocessing_override=None, debug=False 425 | ): 426 | 427 | # Input handling 428 | assert isinstance( 429 | input_rgb_image_as_pil, PILImage.Image 430 | ), 'Expected 
"input_rgb_image_as_pil" to be a PIL Image, but it is {}.'.format( 431 | type(input_rgb_image_as_pil) 432 | ) 433 | input_image_resolution = input_rgb_image_as_pil.size 434 | 435 | # do preprocessing 436 | image_preprocessing = ( 437 | image_preprocessing_override 438 | if image_preprocessing_override 439 | else self.image_preprocessing() 440 | ) 441 | 442 | input_image_preproc_before_norm = dream.image_proc.preprocess_image( 443 | input_rgb_image_as_pil, 444 | self.trained_net_input_resolution(), 445 | image_preprocessing, 446 | ) 447 | 448 | netin_res_inf = input_image_preproc_before_norm.size 449 | tensor_from_image_tform = TVTransforms.Compose( 450 | [ 451 | TVTransforms.ToTensor(), 452 | TVTransforms.Normalize( 453 | self.image_normalization["mean"], self.image_normalization["stdev"] 454 | ), 455 | ] 456 | ) 457 | input_rgb_image_as_tensor = tensor_from_image_tform( 458 | input_image_preproc_before_norm 459 | ) 460 | 461 | # Inference 462 | with torch.no_grad(): 463 | input_rgb_image_as_tensor_batch = input_rgb_image_as_tensor.unsqueeze( 464 | 0 465 | ).cuda() 466 | belief_maps_net_out_batch, detected_kp_projs_net_out_batch = self.inference( 467 | input_rgb_image_as_tensor_batch 468 | ) 469 | 470 | belief_maps_net_out = belief_maps_net_out_batch[0] 471 | detected_kp_projs_net_out = np.array( 472 | detected_kp_projs_net_out_batch[0], dtype=float 473 | ) 474 | 475 | # Get network output resolution for this inference based on belief maps 476 | belief_map = belief_maps_net_out[0] 477 | netout_res_inf = (belief_map.shape[1], belief_map.shape[0]) 478 | 479 | # Convert keypoints from net-output to net-input 480 | detected_kp_projs_net_in = dream.image_proc.convert_keypoints_to_netin_from_netout( 481 | detected_kp_projs_net_out, netout_res_inf, netin_res_inf 482 | ) 483 | detected_kp_projs = dream.image_proc.convert_keypoints_to_raw_from_netin( 484 | detected_kp_projs_net_in, 485 | netin_res_inf, 486 | input_image_resolution, 487 | image_preprocessing, 488 | ) 489 | 490 | detection_result = {"detected_keypoints": detected_kp_projs} 491 | if debug: 492 | detection_result["image_rgb_net_input"] = input_image_preproc_before_norm 493 | detection_result["belief_maps"] = belief_maps_net_out 494 | detection_result[ 495 | "detected_keypoints_net_output" 496 | ] = detected_kp_projs_net_out 497 | detection_result["detected_keypoints_net_input"] = detected_kp_projs_net_in 498 | 499 | return detection_result 500 | 501 | # Inference is designed to return the best output of belief_maps and keypoints 502 | # This is an abstraction layer so even if multiple stages are used, this only produces one set of outputs (for the final stage) 503 | def inference(self, network_input): 504 | 505 | if self.architecture_type == "dream_legacy": 506 | network_head_outputs = self.model(network_input) 507 | belief_maps = network_head_outputs[0] 508 | detected_kp_projections = self.soft_argmax(belief_maps) 509 | network_output = [belief_maps, detected_kp_projections] 510 | 511 | elif self.network_config["architecture"]["output_heads"] == [ 512 | "belief_maps", 513 | "keypoints", 514 | ]: 515 | network_output = self.model(network_input) 516 | 517 | elif self.network_config["architecture"]["output_heads"] == ["belief_maps"]: 518 | 519 | network_head_output = self.model(network_input) 520 | 521 | # use the last layer 522 | belief_maps_batch = network_head_output[-1] 523 | 524 | # # If we need to transpose the belief map -- this is done here: 525 | # for x in range(belief_maps_batch.shape[0]): 526 | # bmaps = belief_maps_batch[x] 
527 | # belief_maps_batch[x] = torch.cat([torch.t(b).unsqueeze(0) for b in bmaps]) 528 | 529 | peaks_from_belief_maps_kwargs = {} 530 | ( 531 | trained_net_output_width, 532 | trained_net_output_height, 533 | ) = self.trained_net_output_resolution() 534 | if trained_net_output_width >= 400 and trained_net_output_height >= 400: 535 | peaks_from_belief_maps_kwargs["offset_due_to_upsampling"] = 0.0 536 | else: 537 | # heuristic based on previous work with smaller belief maps 538 | peaks_from_belief_maps_kwargs["offset_due_to_upsampling"] = 0.4395 539 | 540 | detected_kp_projs_batch = [] 541 | for belief_maps in belief_maps_batch: 542 | peaks = dream.image_proc.peaks_from_belief_maps( 543 | belief_maps, **peaks_from_belief_maps_kwargs 544 | ) 545 | 546 | # Determine keypoints from this 547 | detected_kp_projs = [] 548 | for peak in peaks: 549 | 550 | if len(peak) == 1: 551 | detected_kp_projs.append([peak[0][0], peak[0][1]]) 552 | else: 553 | if self.use_belief_peak_scores and len(peak) > 1: 554 | # Try to use the belief map scores 555 | peak_sorted_by_score = sorted( 556 | peak, key=lambda x: x[2], reverse=True 557 | ) 558 | if ( 559 | peak_sorted_by_score[0][2] - peak_sorted_by_score[1][2] 560 | >= self.belief_peak_next_best_score 561 | ): 562 | # Keep the best score 563 | detected_kp_projs.append( 564 | [ 565 | peak_sorted_by_score[0][0], 566 | peak_sorted_by_score[0][1], 567 | ] 568 | ) 569 | else: 570 | # Can't determine -- return no detection 571 | # Can't use None because we need to return it as a tensor 572 | detected_kp_projs.append([-999.999, -999.999]) 573 | 574 | else: 575 | # Can't determine -- return no detection 576 | # Can't use None because we need to return it as a tensor 577 | detected_kp_projs.append([-999.999, -999.999]) 578 | 579 | detected_kp_projs_batch.append(detected_kp_projs) 580 | 581 | detected_kp_projs_batch = torch.tensor(detected_kp_projs_batch).float() 582 | 583 | network_output = [belief_maps_batch, detected_kp_projs_batch] 584 | 585 | else: 586 | assert ( 587 | False 588 | ), "Could not determine how to conduct inference on this network." 
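# For single-image use, callers typically go through keypoints_from_image() above rather
# than calling inference() directly. A minimal sketch (file names are hypothetical; assumes
# a CUDA-capable machine and trained weights):
#
#   net = create_network_from_config_file("panda.yaml", network_params_path="panda.pth")
#   net.enable_evaluation()
#   image = PILImage.open("frame_0000.png").convert("RGB")
#   result = net.keypoints_from_image(image, debug=True)
#   keypoints_px = result["detected_keypoints"]   # keypoints in raw-image pixel coordinates
#   belief_maps = result["belief_maps"]           # included only when debug=True
#
# Note that keypoints that cannot be disambiguated are reported as [-999.999, -999.999]
# rather than None, as described in the comments above.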
589 | 590 | return network_output 591 | 592 | def save_network_config(self, config_file_path, overwrite=False): 593 | 594 | if not overwrite: 595 | assert not os.path.exists( 596 | config_file_path 597 | ), 'Output file already exists in "{}".'.format(config_file_path) 598 | 599 | # Create saver 600 | data_saver = ruamel.yaml.YAML() 601 | data_saver.default_flow_style = False 602 | data_saver.explicit_start = False 603 | 604 | with open(config_file_path, "w") as f: 605 | # TBD - convert to ruamel.yaml.comments.CommentedMap to get rid of !!omap in yaml 606 | data_saver.dump(self.network_config, f) 607 | 608 | def save_network_params(self, network_params_path, overwrite=False): 609 | 610 | if not overwrite: 611 | assert not os.path.exists( 612 | network_params_path 613 | ), 'Output file already exists in "{}".'.format(network_params_path) 614 | 615 | # Save weights 616 | torch.save(self.model.state_dict(), network_params_path) 617 | 618 | def save_network( 619 | self, output_dir, output_filename_without_extension, overwrite=False 620 | ): 621 | 622 | dream.utilities.makedirs(output_dir, exist_ok=overwrite) 623 | 624 | network_config_dir = os.path.join( 625 | output_dir, output_filename_without_extension + ".yaml" 626 | ) 627 | self.save_network_config(network_config_dir, overwrite) 628 | 629 | network_params_path = os.path.join( 630 | output_dir, output_filename_without_extension + ".pth" 631 | ) 632 | self.save_network_params(network_params_path, overwrite) 633 | 634 | def enable_training(self): 635 | 636 | # Load optimizer if needed 637 | if not self.optimizer: 638 | 639 | assert ( 640 | "optimizer" in self.network_config["training"]["config"] 641 | ), 'Required key "optimizer" in dictionary "config" is missing from network configuration.' 642 | assert ( 643 | "type" in self.network_config["training"]["config"]["optimizer"] 644 | ), 'Required key "type" in dictionary "optimizer" is missing from network configuration.' 645 | 646 | network_parameters = filter( 647 | lambda p: p.requires_grad, self.model.parameters() 648 | ) 649 | optimizer_type = self.network_config["training"]["config"]["optimizer"][ 650 | "type" 651 | ] 652 | 653 | assert ( 654 | optimizer_type in KNOWN_OPTIMIZERS 655 | ), 'Expected optimizer_type "{}" to be in the list of known optimizers, but it is not.'.format( 656 | optimizer_type 657 | ) 658 | 659 | if optimizer_type == "adam": 660 | 661 | assert ( 662 | "learning_rate" 663 | in self.network_config["training"]["config"]["optimizer"] 664 | ), 'Required key "learning_rate" in dictionary "optimizer" is missing to use the Adam optimizer.' 665 | 666 | self.optimizer = torch.optim.Adam( 667 | network_parameters, 668 | lr=self.network_config["training"]["config"]["optimizer"][ 669 | "learning_rate" 670 | ], 671 | ) 672 | 673 | elif optimizer_type == "sgd": 674 | 675 | assert ( 676 | "learning_rate" 677 | in self.network_config["training"]["config"]["optimizer"] 678 | ), 'Required key "learning_rate" in dictionary "optimizer" is missing to use the SGD optimizer.' 
679 | 680 | self.optimizer = torch.optim.SGD( 681 | network_parameters, 682 | lr=self.network_config["training"]["config"]["optimizer"][ 683 | "learning_rate" 684 | ], 685 | ) 686 | 687 | else: 688 | assert False, 'Optimizer "{}" is not defined.'.format(optimizer_type) 689 | 690 | # Enable training mode 691 | self.model.train() 692 | 693 | def enable_evaluation(self): 694 | 695 | # Enable evaluation mode 696 | self.model.eval() 697 | -------------------------------------------------------------------------------- /dream/oks_plots.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import pandas as pd 6 | import argparse 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | import seaborn as sns 10 | 11 | plt.style.use("seaborn-whitegrid") 12 | 13 | 14 | # Example of running the script 15 | # python oks_plots.py --data all_dataset_keypoints.csv all_dataset_keypoints.csv --labels 1 2 16 | 17 | # pythonw oks_plots.py --data deep-arm-cal-paper/data/dope/3cam_real_keypoints.csv deep-arm-cal-paper/data/dream_hg/3cam_real_keypoints.csv deep-arm-cal-paper/data/dream_hg_deconv/3cam_real_keypoints.csv deep-arm-cal-paper/data/resimple/3cam_real_keypoints.csv --labels DOPE DREAM AE resnet 18 | parser = argparse.ArgumentParser(description="OKS for DREAM") 19 | parser.add_argument( 20 | "--data", nargs="+", default="[all_dataset_keypoints.csv]", help="list of csv files" 21 | ) 22 | 23 | parser.add_argument( 24 | "--labels", 25 | nargs="+", 26 | default=None, 27 | help="names for each dataset to be added as label", 28 | ) 29 | 30 | parser.add_argument("--styles", nargs="+", default=None, help="") 31 | 32 | parser.add_argument("--colours", nargs="+", default=None, help="") 33 | 34 | parser.add_argument("--pixel", default=20) 35 | 36 | parser.add_argument("--output", default="output.pdf") 37 | 38 | parser.add_argument("--show", default=False, action="store_true") 39 | 40 | parser.add_argument("--title", default=None) 41 | 42 | args = parser.parse_args() 43 | print(args) 44 | 45 | 46 | fig = plt.figure() 47 | ax = plt.axes() 48 | 49 | handles = [] 50 | 51 | for i_csv, csv_file in enumerate(args.data): 52 | print(csv_file) 53 | if csv_file == "666": 54 | plt.plot([], [], " ", label=args.labels[i_csv].replace("_", " ")) 55 | continue 56 | name_csv = csv_file.replace(".csv", "") 57 | 58 | df = pd.read_csv(csv_file) 59 | 60 | # PCK percentage of correct keypoints 61 | all_dist = [] 62 | all_pred = [] 63 | all_gt = [] 64 | for i in range(7): 65 | # Compute all the distances between keypoints - Implementing them does not work well 66 | 67 | fpred = [] 68 | fgt = [] 69 | gt = df[[f"kp{i}x_gt", f"kp{i}y_gt"]].values.tolist() 70 | pred = df[[f"kp{i}x", f"kp{i}y"]].values.tolist() 71 | 72 | all_gt.append(gt) 73 | all_pred.append(pred) 74 | 75 | for i_entry, entry in enumerate(gt): 76 | if entry[0] > 0 and entry[0] < 640 and entry[1] > 0 and entry[1] < 480: 77 | fgt.append([float(entry[0]), float(entry[1])]) 78 | fpred.append([float(pred[i_entry][0]), float(pred[i_entry][1])]) 79 | 80 | pred = np.array(fpred) 81 | gt = np.array(fgt) 82 | 83 | values = np.linalg.norm(gt - pred, axis=1) 84 | 85 | # print(pair.shape) 86 | # add them to a single list 87 | 88 | all_dist += values.tolist() 89 | 90 | all_dist = np.array(all_dist) 91 | 92 | # print(len(all_dist)) 93 | 94 | # all_dist = 
all_dist[np.where(all_dist<1000)] 95 | # all_dist = all_dist[np.where(all_dist<1000)] 96 | 97 | # print(all_dist[:10]) 98 | # print(all_dist.shape) 99 | print("detected", len(all_dist)) 100 | 101 | pck_values = np.arange(0, int(args.pixel), 0.01) 102 | 103 | y_values = [] 104 | 105 | for value in pck_values: 106 | size_good = len(np.where(all_dist < value)[0]) / len(all_dist) 107 | y_values.append(size_good) 108 | # print(value,size_good) 109 | 110 | auc = np.trapz(y_values, dx=0.01) / float(args.pixel) 111 | 112 | print("auc", auc) 113 | all_dist = all_dist[np.where(all_dist < 1000)] 114 | print("mean", np.mean(all_dist)) 115 | print("median", np.median(all_dist)) 116 | print("std", np.std(all_dist)) 117 | 118 | # TODO: consolidate above calculations with pnp_metrics 119 | from dream import analysis as dream_analysis 120 | 121 | dim = np.array(all_pred).shape 122 | temp_pred = np.reshape(np.array(all_pred), (dim[0] * dim[1], dim[2])) 123 | temp_gt = np.reshape(np.array(all_gt), (dim[0] * dim[1], dim[2])) 124 | kp_metrics = dream_analysis.keypoint_metrics(temp_pred, temp_gt, (640, 480)) 125 | assert kp_metrics["l2_error_mean_px"] == np.mean(all_dist) 126 | assert kp_metrics["l2_error_median_px"] == np.median(all_dist) 127 | assert kp_metrics["l2_error_std_px"] == np.std(all_dist) 128 | assert np.abs(auc - kp_metrics["l2_error_auc"]) < 1.0e-15 129 | 130 | # plot 131 | 132 | try: 133 | label = args.labels[i_csv] 134 | except: 135 | label = name_csv 136 | cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"] 137 | try: 138 | 139 | colour = cycle[int(args.colours[i_csv])] 140 | except: 141 | colour = "" 142 | 143 | try: 144 | style = args.styles[i_csv] 145 | if style == "0": 146 | style = "-" 147 | elif style == "1": 148 | style = "--" 149 | elif style == "2": 150 | style = ":" 151 | 152 | else: 153 | style = "-" 154 | except: 155 | style = "-" 156 | 157 | label = f"{label} ({auc:.3f})" 158 | ax.plot(pck_values, y_values, style, color=colour, label=label) 159 | 160 | # from matplotlib.patches import Rectangle 161 | 162 | # extra = Rectangle((0, 0), 1, 1, fc="w", fill=False, edgecolor='none', linewidth=0) 163 | 164 | 165 | plt.xlabel("PCK threshold distance (pixels)") 166 | plt.ylabel("Accuracy") 167 | plt.title(args.title) 168 | # ax.legend([extra, handles[0], handles[1], handles[2], extra , handles[3], handles[4], handles[5] ], ("Non-DR",0,1,2 ,"DR",0,1,2),loc = "lower right") 169 | ax.legend(loc="lower right", frameon=True, fancybox=True, framealpha=0.8) 170 | 171 | legend = ax.get_legend() 172 | for i, t in enumerate(legend.get_texts()): 173 | if args.data[i] == "666": 174 | t.set_ha("left") # ha is alias for horizontalalignment 175 | t.set_position((-30, 0)) 176 | 177 | ax.set_ylim(0, 1) 178 | ax.set_xlim(0, int(args.pixel)) 179 | plt.savefig(args.output) 180 | if args.show: 181 | plt.show() 182 | -------------------------------------------------------------------------------- /dream/spatial_softmax.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import torch 6 | from torch.autograd import Variable 7 | import torch.nn.functional as F 8 | from torch.nn.parameter import Parameter 9 | import numpy as np 10 | 11 | # This code is adapted from 12 | # https://gitlab-master.nvidia.com/pmolchanov/lpr-3d-hand-pose-rgb-demo/blob/master/handpose/models/image_heatmaps_pose2dZrel_softargmax_slim.py 13 | 14 | 15 | class SoftArgmaxPavlo(torch.nn.Module): 16 | def __init__(self, n_keypoints=5, learned_beta=False, initial_beta=25.0): 17 | super(SoftArgmaxPavlo, self).__init__() 18 | self.avgpool = torch.nn.AvgPool2d(7, stride=1, padding=3) 19 | if learned_beta: 20 | self.beta = torch.nn.Parameter(torch.ones(n_keypoints) * initial_beta) 21 | else: 22 | self.beta = (torch.ones(n_keypoints) * initial_beta).cuda() 23 | 24 | def forward(self, heatmaps, size_mult=1.0): 25 | 26 | epsilon = 1e-8 27 | bch, ch, n_row, n_col = heatmaps.size() 28 | n_kpts = ch 29 | 30 | beta = self.beta 31 | 32 | # input has the shape (#bch, n_kpts+1, img_sz[0], img_sz[1]) 33 | # +1 is for the Zrel 34 | heatmaps2d = heatmaps[:, :n_kpts, :, :] 35 | heatmaps2d = self.avgpool(heatmaps2d) 36 | 37 | # heatmaps2d has the shape (#bch, n_kpts, img_sz[0]*img_sz[1]) 38 | heatmaps2d = heatmaps2d.contiguous().view(bch, n_kpts, -1) 39 | 40 | # getting the max value of the maps across each 2D matrix 41 | map_max = torch.max(heatmaps2d, dim=2, keepdim=True)[0] 42 | 43 | # reducing the max from each map 44 | # this will make the max value zero and all other values 45 | # will be negative. 46 | # max_reduced_maps has the shape (#bch, n_kpts, img_sz[0]*img_sz[1]) 47 | heatmaps2d = heatmaps2d - map_max 48 | 49 | beta_ = beta.view(1, n_kpts, 1).repeat(1, 1, n_row * n_col) 50 | # due to applying the beta value, the non-max values will be further 51 | # pushed towards zero after applying the exp function 52 | exp_maps = torch.exp(beta_ * heatmaps2d) 53 | # normalizing the exp_maps by diving it to the sum of elements 54 | # exp_maps_sum has the shape (#bch, n_kpts, 1) 55 | exp_maps_sum = torch.sum(exp_maps, dim=2, keepdim=True) 56 | exp_maps_sum = exp_maps_sum.view(bch, n_kpts, 1, 1) 57 | normalized_maps = exp_maps.view(bch, n_kpts, n_row, n_col) / ( 58 | exp_maps_sum + epsilon 59 | ) 60 | 61 | col_vals = torch.arange(0, n_col) * size_mult 62 | col_repeat = col_vals.repeat(n_row, 1) 63 | col_idx = col_repeat.view(1, 1, n_row, n_col).cuda() 64 | # col_mat gives a column measurement matrix to be used for getting 65 | # 'x'. It is a matrix where each row has the sequential values starting 66 | # from 0 up to n_col-1: 67 | # 0,1,2, ..., n_col-1 68 | # 0,1,2, ..., n_col-1 69 | # 0,1,2, ..., n_col-1 70 | 71 | row_vals = torch.arange(0, n_row).view(n_row, -1) * size_mult 72 | row_repeat = row_vals.repeat(1, n_col) 73 | row_idx = row_repeat.view(1, 1, n_row, n_col).cuda() 74 | # row_mat gives a row measurement matrix to be used for getting 'y'. 75 | # It is a matrix where each column has the sequential values starting 76 | # from 0 up to n_row-1: 77 | # 0,0,0, ..., 0 78 | # 1,1,1, ..., 1 79 | # 2,2,2, ..., 2 80 | # ... 
81 | # n_row-1, ..., n_row-1 82 | 83 | col_idx = Variable(col_idx, requires_grad=False) 84 | weighted_x = normalized_maps * col_idx.float() 85 | weighted_x = weighted_x.view(bch, n_kpts, -1) 86 | x_vals = torch.sum(weighted_x, dim=2) 87 | 88 | row_idx = Variable(row_idx, requires_grad=False) 89 | weighted_y = normalized_maps * row_idx.float() 90 | weighted_y = weighted_y.view(bch, n_kpts, -1) 91 | y_vals = torch.sum(weighted_y, dim=2) 92 | 93 | out = torch.stack((x_vals, y_vals), dim=2) 94 | 95 | return out 96 | 97 | 98 | if __name__ == "__main__": 99 | from torch.autograd import Variable 100 | 101 | data = Variable(torch.zeros([1, 3, 3, 3])) 102 | data[0, 0, 0, 1] = 10 103 | data[0, 1, 1, 1] = 10 104 | data[0, 2, 1, 2] = 10 105 | layer = SoftArgmaxPavlo(n_keypoints=3) 106 | print(layer(data.cuda())) 107 | -------------------------------------------------------------------------------- /dream/utilities.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import os 6 | import random 7 | 8 | import numpy as np 9 | from ruamel.yaml import YAML 10 | import torch 11 | 12 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 13 | 14 | 15 | def set_random_seed(seed): 16 | assert isinstance( 17 | seed, int 18 | ), 'Expected "seed" to be an integer, but it is "{}".'.format(type(seed)) 19 | random.seed(seed) 20 | os.environ["PYTHONHASHSEED"] = str(seed) 21 | np.random.seed(seed) 22 | torch.manual_seed(seed) 23 | torch.cuda.manual_seed(seed) 24 | torch.cuda.manual_seed_all(seed) 25 | torch.backends.cudnn.deterministic = True 26 | torch.backends.cudnn.benchmark = False 27 | 28 | 29 | def makedirs(directory, exist_ok=False): 30 | """A method that replicates the functionality of os.makedirs that works for both Python 2 and Python 3.""" 31 | if os.path.exists(directory): 32 | assert exist_ok, 'Specified directory "{}" already exists.'.format(directory) 33 | else: 34 | os.makedirs(directory) 35 | return 36 | 37 | 38 | def is_ndds_dataset(input_dir, data_extension="json"): 39 | 40 | # Input argument handling 41 | # Expand user shortcut if it exists 42 | input_dir = os.path.expanduser(input_dir) 43 | assert os.path.exists( 44 | input_dir 45 | ), 'Expected path "{}" to exist, but it does not.'.format(input_dir) 46 | assert isinstance( 47 | data_extension, str 48 | ), 'Expected "data_extension" to be a string, but it is "{}".'.format( 49 | type(data_extension) 50 | ) 51 | 52 | data_full_ext = "."
+ data_extension 53 | 54 | dirlist = os.listdir(input_dir) 55 | 56 | # Find json files 57 | data_filenames = [f for f in dirlist if f.endswith(data_full_ext)] 58 | 59 | # Extract name from json file 60 | data_names = [os.path.splitext(f)[0] for f in data_filenames if f[0].isdigit()] 61 | 62 | is_ndds_dataset = True if data_names else False 63 | 64 | return is_ndds_dataset 65 | 66 | 67 | def find_ndds_data_in_dir( 68 | input_dir, data_extension="json", image_extension=None, requested_image_types="all", 69 | ): 70 | 71 | # Input argument handling 72 | # Expand user shortcut if it exists 73 | input_dir = os.path.expanduser(input_dir) 74 | assert os.path.exists( 75 | input_dir 76 | ), 'Expected path "{}" to exist, but it does not.'.format(input_dir) 77 | dirlist = os.listdir(input_dir) 78 | 79 | assert isinstance( 80 | data_extension, str 81 | ), 'Expected "data_extension" to be a string, but it is "{}".'.format( 82 | type(data_extension) 83 | ) 84 | data_full_ext = "." + data_extension 85 | 86 | if image_extension is None: 87 | # Auto detect based on list of image extensions to try 88 | # In case there is a tie, prefer the extensions that are closer to the front 89 | image_exts_to_try = ["png", "jpg"] 90 | num_image_exts = [] 91 | for image_ext in image_exts_to_try: 92 | num_image_exts.append(len([f for f in dirlist if f.endswith(image_ext)])) 93 | max_num_image_exts = np.max(num_image_exts) 94 | idx_max = np.where(num_image_exts == max_num_image_exts)[0] 95 | # If there are multiple indices due to ties, this uses the one closest to the front 96 | image_extension = image_exts_to_try[idx_max[0]] 97 | # Mention to user if there are multiple cases to ensure they are aware of the selection 98 | if len(idx_max) > 1 and max_num_image_exts > 0: 99 | print( 100 | 'Multiple sets of images detected in NDDS dataset with different extensions. Using extension "{}".'.format( 101 | image_extension 102 | ) 103 | ) 104 | else: 105 | assert isinstance( 106 | image_extension, str 107 | ), 'If specified, expected "image_extension" to be a string, but it is "{}".'.format( 108 | type(image_extension) 109 | ) 110 | image_full_ext = "." + image_extension 111 | 112 | assert ( 113 | requested_image_types is None 114 | or requested_image_types == "all" 115 | or isinstance(requested_image_types, list) 116 | ), "Expected \"requested_image_types\" to be None, 'all', or a list of requested_image_types." 
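# Typical usage, as a sketch (the dataset directory is hypothetical; file naming follows the
# NDDS conventions handled below):
#
#   found_data, found_configs = find_ndds_data_in_dir(
#       "~/data/synthetic_panda", requested_image_types=["rgb"]
#   )
#   # found_data[0]  -> {"name": "000000", "data_path": ".../000000.json",
#   #                    "image_paths": {"rgb": ".../000000.rgb.png"}}
#   # found_configs  -> {"camera": ".../_camera_settings.json",
#   #                    "object": ".../_object_settings.json", "unsorted": [...]}
#   camera_K = load_camera_intrinsics(found_configs["camera"])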
117 | 118 | # Read in json files 119 | data_filenames = [f for f in dirlist if f.endswith(data_full_ext)] 120 | 121 | # Sort candidate data files by name 122 | data_filenames.sort() 123 | 124 | data_names = [os.path.splitext(f)[0] for f in data_filenames if f[0].isdigit()] 125 | 126 | # If there are no matching json files -- this is not an NDDS dataset -- return None 127 | if not data_names: 128 | return None, None 129 | 130 | data_paths = [os.path.join(input_dir, f) for f in data_filenames if f[0].isdigit()] 131 | 132 | if requested_image_types == "all": 133 | # Detect based on first entry 134 | first_entry_name = data_names[0] 135 | matching_image_names = [ 136 | f 137 | for f in dirlist 138 | if f.startswith(first_entry_name) and f.endswith(image_full_ext) 139 | ] 140 | find_rgb = ( 141 | True 142 | if first_entry_name + ".rgb" + image_full_ext in matching_image_names 143 | else False 144 | ) 145 | find_depth = ( 146 | True 147 | if first_entry_name + ".depth" + image_full_ext in matching_image_names 148 | else False 149 | ) 150 | find_cs = ( 151 | True 152 | if first_entry_name + ".cs" + image_full_ext in matching_image_names 153 | else False 154 | ) 155 | if len(matching_image_names) > 3: 156 | print("Image types detected that are not yet implemented in this function.") 157 | 158 | elif requested_image_types: 159 | # Check based on known data types 160 | known_image_types = ["rgb", "depth", "cs"] 161 | for this_image_type in requested_image_types: 162 | assert ( 163 | this_image_type in known_image_types 164 | ), 'Image type "{}" not recognized.'.format(this_image_type) 165 | 166 | find_rgb = True if "rgb" in requested_image_types else False 167 | find_depth = True if "depth" in requested_image_types else False 168 | find_cs = True if "cs" in requested_image_types else False 169 | 170 | else: 171 | find_rgb = False 172 | find_depth = False 173 | find_cs = False 174 | 175 | dict_of_lists_images = {} 176 | n_samples = len(data_names) 177 | 178 | if find_rgb: 179 | rgb_paths = [ 180 | os.path.join(input_dir, f + ".rgb" + image_full_ext) for f in data_names 181 | ] 182 | for n in range(n_samples): 183 | assert os.path.exists( 184 | rgb_paths[n] 185 | ), 'Expected image "{}" to exist, but it does not.'.format(rgb_paths[n]) 186 | dict_of_lists_images["rgb"] = rgb_paths 187 | 188 | if find_depth: 189 | depth_paths = [ 190 | os.path.join(input_dir, f + ".depth" + image_full_ext) for f in data_names 191 | ] 192 | for n in range(n_samples): 193 | assert os.path.exists( 194 | depth_paths[n] 195 | ), 'Expected image "{}" to exist, but it does not.'.format(depth_paths[n]) 196 | dict_of_lists_images["depth"] = depth_paths 197 | 198 | if find_cs: 199 | cs_paths = [ 200 | os.path.join(input_dir, f + ".cs" + image_full_ext) for f in data_names 201 | ] 202 | for n in range(n_samples): 203 | assert os.path.exists( 204 | cs_paths[n] 205 | ), 'Expected image "{}" to exist, but it does not.'.format(cs_paths[n]) 206 | dict_of_lists_images["class_segmentation"] = cs_paths 207 | 208 | found_images = [ 209 | dict(zip(dict_of_lists_images, t)) for t in zip(*dict_of_lists_images.values()) 210 | ] 211 | 212 | # Create output dictionaries 213 | dict_of_lists = {"name": data_names, "data_path": data_paths} 214 | 215 | if find_rgb or find_depth or find_cs: 216 | dict_of_lists["image_paths"] = found_images 217 | 218 | found_data = [dict(zip(dict_of_lists, t)) for t in zip(*dict_of_lists.values())] 219 | 220 | # Process config files, which are data files that don't have an associated image 221 | found_configs = 
{"camera": None, "object": None, "unsorted": []} 222 | data_filenames_without_images = [f for f in data_filenames if not f[0].isdigit()] 223 | 224 | for data_filename in data_filenames_without_images: 225 | if data_filename == "_camera_settings" + data_full_ext: 226 | found_configs["camera"] = os.path.join(input_dir, data_filename) 227 | elif data_filename == "_object_settings" + data_full_ext: 228 | found_configs["object"] = os.path.join(input_dir, data_filename) 229 | else: 230 | found_configs["unsorted"].append(os.path.join(input_dir, data_filename)) 231 | 232 | return found_data, found_configs 233 | 234 | 235 | def load_camera_intrinsics(camera_data_path): 236 | 237 | # Input argument handling 238 | assert os.path.exists( 239 | camera_data_path 240 | ), 'Expected path "{}" to exist, but it does not.'.format(camera_data_path) 241 | 242 | # Create YAML/json parser 243 | data_parser = YAML(typ="safe") 244 | 245 | with open(camera_data_path, "r") as f: 246 | cam_settings_data = data_parser.load(f) 247 | 248 | camera_fx = cam_settings_data["camera_settings"][0]["intrinsic_settings"]["fx"] 249 | camera_fy = cam_settings_data["camera_settings"][0]["intrinsic_settings"]["fy"] 250 | camera_cx = cam_settings_data["camera_settings"][0]["intrinsic_settings"]["cx"] 251 | camera_cy = cam_settings_data["camera_settings"][0]["intrinsic_settings"]["cy"] 252 | camera_K = np.array( 253 | [[camera_fx, 0.0, camera_cx], [0.0, camera_fy, camera_cy], [0.0, 0.0, 1.0]] 254 | ) 255 | 256 | return camera_K 257 | 258 | 259 | def load_image_resolution(camera_data_path): 260 | 261 | # Input argument handling 262 | assert os.path.exists( 263 | camera_data_path 264 | ), 'Expected path "{}" to exist, but it does not.'.format(camera_data_path) 265 | 266 | # Create YAML/json parser 267 | data_parser = YAML(typ="safe") 268 | 269 | with open(camera_data_path, "r") as f: 270 | cam_settings_data = data_parser.load(f) 271 | 272 | image_width = cam_settings_data["camera_settings"][0]["captured_image_size"][ 273 | "width" 274 | ] 275 | image_height = cam_settings_data["camera_settings"][0]["captured_image_size"][ 276 | "height" 277 | ] 278 | image_resolution = (image_width, image_height) 279 | 280 | return image_resolution 281 | 282 | 283 | def load_keypoints(data_path, object_name, keypoint_names): 284 | assert os.path.exists( 285 | data_path 286 | ), 'Expected data_path "{}" to exist, but it does not.'.format(data_path) 287 | 288 | # Set up output structure 289 | keypoint_data = {"positions_wrt_cam": [], "projections": []} 290 | 291 | # Load keypoints for a particular object for now 292 | parser = YAML(typ="safe") 293 | with open(data_path, "r") as f: 294 | data = parser.load(f) 295 | 296 | assert ( 297 | "objects" in data.keys() 298 | ), 'Expected "objects" key to exist in data file, but it does not.' 299 | 300 | object_names = [o["class"] for o in data["objects"]] 301 | assert ( 302 | object_name in object_names 303 | ), 'Requested object_name "{}" does not exist in the data file objects.'.format( 304 | object_name 305 | ) 306 | 307 | idx_object = object_names.index(object_name) 308 | 309 | object_data = data["objects"][idx_object] 310 | object_keypoints = object_data["keypoints"] 311 | 312 | object_keypoint_names = [kp["name"] for kp in object_keypoints] 313 | 314 | # Process in same order as keypoint_names to retain same order 315 | for kp_name in keypoint_names: 316 | assert ( 317 | kp_name in object_keypoint_names 318 | ), "Expected keypoint '{}' to exist in the data file '{}', but it does not. 
Rather, the keypoints are '{}'".format( 319 | kp_name, data_path, object_keypoint_names 320 | ) 321 | 322 | idx_kp = object_keypoint_names.index(kp_name) 323 | kp_data = object_keypoints[idx_kp] 324 | 325 | kp_position_wrt_cam = kp_data["location"] 326 | kp_projection = kp_data["projected_location"] 327 | 328 | keypoint_data["positions_wrt_cam"].append(kp_position_wrt_cam) 329 | keypoint_data["projections"].append(kp_projection) 330 | 331 | return keypoint_data 332 | -------------------------------------------------------------------------------- /manip_configs/baxter.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - manipulator: !!omap 3 | - name: baxter 4 | - keypoints: 5 | - name: torso_t0 6 | friendly_name: Torso 7 | ros_frame: torso_t0 8 | - name: left_s0 9 | friendly_name: Left:S0 10 | ros_frame: left_s0 11 | - name: left_s1 12 | friendly_name: Left:S1 13 | ros_frame: left_s1 14 | - name: left_e0 15 | friendly_name: Left:E0 16 | ros_frame: left_e0 17 | - name: left_e1 18 | friendly_name: Left:E1 19 | ros_frame: left_e1 20 | - name: left_w0 21 | friendly_name: Left:W0 22 | ros_frame: left_w0 23 | - name: left_w1 24 | friendly_name: Left:W1 25 | ros_frame: left_w1 26 | - name: left_w2 27 | friendly_name: Left:W2 28 | ros_frame: left_w2 29 | - name: left_hand 30 | friendly_name: LeftHand 31 | ros_frame: left_hand 32 | - name: right_s0 33 | friendly_name: Right:S0 34 | ros_frame: right_s0 35 | - name: right_s1 36 | friendly_name: Right:S1 37 | ros_frame: right_s1 38 | - name: right_e0 39 | friendly_name: Right:E0 40 | ros_frame: right_e0 41 | - name: right_e1 42 | friendly_name: Right:E1 43 | ros_frame: right_e1 44 | - name: right_w0 45 | friendly_name: Right:W0 46 | ros_frame: right_w0 47 | - name: right_w1 48 | friendly_name: Right:W1 49 | ros_frame: right_w1 50 | - name: right_w2 51 | friendly_name: Right:W2 52 | ros_frame: right_w2 53 | - name: right_hand 54 | friendly_name: RightHand 55 | ros_frame: right_hand 56 | -------------------------------------------------------------------------------- /manip_configs/kuka.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - manipulator: !!omap 3 | - name: kuka 4 | - keypoints: 5 | - name: iiwa7_link_0 6 | friendly_name: Base 7 | ros_frame: iiwa7_link_0 8 | - name: iiwa7_link_1 9 | friendly_name: Joint1 10 | ros_frame: iiwa7_link_1 11 | - name: iiwa7_link_2 12 | friendly_name: Joint2 13 | ros_frame: iiwa7_link_2 14 | - name: iiwa7_link_3 15 | friendly_name: Joint3 16 | ros_frame: iiwa7_link_3 17 | - name: iiwa7_link_4 18 | friendly_name: Joint4 19 | ros_frame: iiwa7_link_4 20 | - name: iiwa7_link_5 21 | friendly_name: Joint5 22 | ros_frame: iiwa7_link_5 23 | - name: iiwa7_link_6 24 | friendly_name: Joint6 25 | ros_frame: iiwa7_link_6 26 | - name: iiwa7_link_7 27 | friendly_name: Wrist 28 | ros_frame: iiwa7_link_7 29 | -------------------------------------------------------------------------------- /manip_configs/panda.yaml: -------------------------------------------------------------------------------- 1 | !!omap 2 | - manipulator: !!omap 3 | - name: panda 4 | - keypoints: 5 | - name: panda_link0 6 | friendly_name: Base 7 | ros_frame: panda_link0 8 | - name: panda_link2 9 | friendly_name: Joint2 10 | ros_frame: panda_link2 11 | - name: panda_link3 12 | friendly_name: Joint3 13 | ros_frame: panda_link3 14 | - name: panda_link4 15 | friendly_name: Joint4 16 | ros_frame: panda_link4 17 | - name: panda_link6 18 | friendly_name: Joint6 19 | 
ros_frame: panda_link6 20 | - name: panda_link7 21 | friendly_name: Joint7 22 | ros_frame: panda_link7 23 | - name: panda_hand 24 | friendly_name: Hand 25 | ros_frame: panda_hand 26 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | future 2 | albumentations 3 | enum34 4 | matplotlib 5 | numpy 6 | opencv-python 7 | pandas 8 | Pillow 9 | pyrr 10 | pytest 11 | ruamel.yaml 12 | seaborn 13 | scipy 14 | tensorboardX 15 | torch 16 | torchvision 17 | tqdm 18 | webcolors 19 | gdown 20 | -------------------------------------------------------------------------------- /scripts/analyze_training.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | import pickle 8 | 9 | from ruamel.yaml import YAML 10 | 11 | import dream.analysis as dream_analysis 12 | 13 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 14 | 15 | LOSS_TEXT = "loss" 16 | VIZ_TEXT = "viz" 17 | 18 | 19 | def analyze_training(args): 20 | 21 | # Input argument handling 22 | assert os.path.exists( 23 | args.input_params_path 24 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 25 | args.input_params_path 26 | ) 27 | 28 | if args.input_config_path: 29 | input_config_path = args.input_config_path 30 | else: 31 | # Use params filepath to infer the config filepath 32 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 33 | 34 | assert os.path.exists( 35 | input_config_path 36 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 37 | input_config_path 38 | ) 39 | 40 | do_training_plots = True if LOSS_TEXT in args.analyses else False 41 | do_visualizations = True if VIZ_TEXT in args.analyses else False 42 | 43 | dream.utilities.makedirs(args.output_dir, exist_ok=args.force_overwrite) 44 | 45 | if do_training_plots: 46 | 47 | # Use params filepath to infer the training log pickle 48 | training_log_path = os.path.join( 49 | os.path.dirname(args.input_params_path), "training_log.pkl" 50 | ) 51 | with open(training_log_path, "rb") as f: 52 | training_log = pickle.load(f) 53 | 54 | epochs = training_log["epochs"] 55 | batch_training_losses = training_log["batch_training_losses"] 56 | batch_validation_losses = training_log["batch_validation_losses"] 57 | 58 | save_plot_path = os.path.join(args.output_dir, "train_valid_loss.png") 59 | dream_analysis.plot_train_valid_loss( 60 | epochs, 61 | batch_training_losses, 62 | batch_validation_losses, 63 | save_plot_path=save_plot_path, 64 | ) 65 | 66 | if do_visualizations: 67 | 68 | # Create parser 69 | data_parser = YAML(typ="safe") 70 | 71 | with open(input_config_path, "r") as f: 72 | network_config = data_parser.load(f) 73 | 74 | dataset_dir = os.path.expanduser(network_config["data_path"]) 75 | 76 | dream_analysis.analyze_ndds_dataset( 77 | args.input_params_path, 78 | input_config_path, 79 | dataset_dir, 80 | args.output_dir, 81 | batch_size=args.batch_size, 82 | force_overwrite=args.force_overwrite, 83 | gpu_ids=args.gpu_ids, 84 | ) 85 | 86 | 87 | if __name__ == "__main__": 88 | parser = argparse.ArgumentParser( 89 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 90 | ) 91 | parser.add_argument( 92 | 
"-i", 93 | "--input-params-path", 94 | required=True, 95 | help="Path to network parameters file.", 96 | ) 97 | parser.add_argument( 98 | "-c", 99 | "--input-config-path", 100 | default=None, 101 | help="Path to network configuration file.", 102 | ) 103 | parser.add_argument( 104 | "-o", 105 | "--output-dir", 106 | required=True, 107 | help="Path to output directory to save training analysis results.", 108 | ) 109 | parser.add_argument( 110 | "-f", 111 | "--force-overwrite", 112 | action="store_true", 113 | default=False, 114 | help="Forces overwriting of analysis results in the provided directory.", 115 | ) 116 | parser.add_argument( 117 | "-a", 118 | "--analyses", 119 | nargs="+", 120 | choices=[LOSS_TEXT, VIZ_TEXT], 121 | default=[LOSS_TEXT, VIZ_TEXT], 122 | help="Specify the analyses to run: 'loss' is loss plots; 'viz' is visualizations. Multiple analyses can be specified.", 123 | ) 124 | parser.add_argument( 125 | "-b", 126 | "--batch-size", 127 | type=int, 128 | default=1, 129 | help="The batch size used to process the data. Does not affect results, only how quickly the results are obtained.", 130 | ) 131 | parser.add_argument( 132 | "-g", 133 | "--gpu-ids", 134 | nargs="+", 135 | type=int, 136 | default=None, 137 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 138 | ) 139 | args = parser.parse_args() 140 | analyze_training(args) 141 | -------------------------------------------------------------------------------- /scripts/analyze_training_multi.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | import pickle 8 | 9 | import matplotlib.pyplot as plt 10 | import numpy as np 11 | 12 | import dream.analysis as dream_analysis 13 | 14 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 15 | 16 | 17 | def analyze_training(args): 18 | 19 | # Input argument handling 20 | assert os.path.exists( 21 | args.input_dir 22 | ), 'Expected input directory "{}" to exist, but it does not.'.format(args.input_dir) 23 | 24 | save_results = True if args.output_dir else False 25 | 26 | if save_results: 27 | dream.utilities.makedirs(args.output_dir, exist_ok=args.force_overwrite) 28 | 29 | dir_list = [ 30 | this_dir 31 | for this_dir in os.listdir(args.input_dir) 32 | if os.path.isdir(os.path.join(args.input_dir, this_dir)) 33 | ] 34 | dir_list.sort(key=lambda x: os.path.getmtime(os.path.join(args.input_dir, x))) 35 | 36 | train_epochs = None 37 | all_losses_list = [] 38 | all_validation_losses_list = [] 39 | random_seeds = [] 40 | for d in dir_list: 41 | results_path = os.path.join(args.input_dir, d, "training_log.pkl") 42 | with open(results_path, "rb") as f: 43 | training_log = pickle.load(f) 44 | 45 | # Process log 46 | epochs = training_log["epochs"] 47 | losses = training_log["losses"] 48 | this_random_seed = training_log["random_seed"] 49 | 50 | if train_epochs: 51 | assert train_epochs == epochs 52 | else: 53 | train_epochs = epochs 54 | 55 | all_losses_list.append(losses) 56 | 57 | if "validation_losses" in training_log: 58 | all_validation_losses_list.append(training_log["validation_losses"]) 59 | 60 | random_seeds.append(this_random_seed) 61 | 62 | print("{}: Random seed: {}".format(d, this_random_seed)) 63 | 64 | # Process 65 | all_losses = np.array(all_losses_list) 66 | all_losses_std = np.std(all_losses, axis=0) 67 | all_losses_mean = np.mean(all_losses, axis=0) 68 | all_losses_med = np.median(all_losses, axis=0) 69 | all_losses_max = np.max(all_losses, axis=0) 70 | all_losses_min = np.min(all_losses, axis=0) 71 | 72 | all_validation_losses = np.array(all_validation_losses_list) 73 | 74 | # Determine the worst and best results 75 | n_traces = len(all_losses_list) 76 | n_epochs = len(train_epochs) 77 | all_losses_lasthalf = all_losses[:, (n_epochs // 2) :] 78 | all_losses_lasthalf_sum = np.sum(all_losses_lasthalf, axis=1) 79 | x_worst = np.argmax(all_losses_lasthalf_sum) 80 | x_best = np.argmin(all_losses_lasthalf_sum) 81 | x_sort = np.argsort(all_losses_lasthalf_sum) 82 | x_median = x_sort[n_traces // 2] 83 | 84 | print("Training Loss Performance") 85 | print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~") 86 | print("Best instance for training loss: {}".format(dir_list[x_best])) 87 | print("Median instance for training loss: {}".format(dir_list[x_median])) 88 | print("Worst instance for training loss: {}".format(dir_list[x_worst])) 89 | print("") 90 | 91 | # Plot 92 | fig, ax = plt.subplots() 93 | ax.plot(train_epochs, np.transpose(all_losses), ".-") 94 | ax.plot( 95 | train_epochs, 96 | all_losses[x_worst, :], 97 | "-", 98 | linewidth=8, 99 | alpha=0.667, 100 | label="Worst training result", 101 | ) 102 | ax.plot( 103 | train_epochs, 104 | all_losses[x_best, :], 105 | "-", 106 | linewidth=8, 107 | alpha=0.667, 108 | label="Best training result", 109 | ) 110 | ax.plot( 111 | train_epochs, 112 | all_losses[x_median, :], 113 | "-", 114 | linewidth=8, 115 | alpha=0.667, 116 | label="Median training result", 117 | ) 118 | ax.grid() 119 | plt.xlabel("Training epoch") 120 | 
plt.ylabel("Training loss") 121 | plt.xlim((train_epochs[0], train_epochs[-1])) 122 | plt.title("All training results ({} instances)".format(n_traces)) 123 | ax.legend(loc="best") 124 | if save_results: 125 | training_results_path = os.path.join( 126 | args.output_dir, "training_results_instances.png" 127 | ) 128 | plt.savefig(training_results_path) 129 | 130 | fig, ax = plt.subplots() 131 | ax.fill_between( 132 | train_epochs, 133 | all_losses_mean - all_losses_std, 134 | all_losses_mean + all_losses_std, 135 | alpha=0.333, 136 | label="Aggregate mean +- 1 std dev", 137 | ) 138 | ax.plot(train_epochs, all_losses_mean, ".-", label="Aggregate mean") 139 | ax.plot(train_epochs, all_losses_med, ".-", label="Aggregate median") 140 | ax.plot(train_epochs, all_losses_min, ".-", label="Aggregate min") 141 | ax.plot(train_epochs, all_losses_max, ".-", label="Aggregate max") 142 | ax.grid() 143 | plt.xlabel("Training epoch") 144 | plt.ylabel("Training loss") 145 | plt.xlim((train_epochs[0], train_epochs[-1])) 146 | plt.title("Aggregate (epoch-wise) training results ({} instances)".format(n_traces)) 147 | ax.legend(loc="best") 148 | if save_results: 149 | training_results_agg_path = os.path.join( 150 | args.output_dir, "training_results_aggregate.png" 151 | ) 152 | plt.savefig(training_results_agg_path) 153 | 154 | if not save_results: 155 | plt.show() 156 | 157 | # Now show validation losses 158 | if len(all_validation_losses) > 0: 159 | 160 | # Determine the best overall validation loss 161 | min_validation_losses_traces = np.min(all_validation_losses, axis=1) 162 | min_valid_loss_all_traces = np.min(min_validation_losses_traces) 163 | x_best_valid_loss = np.argmin(min_validation_losses_traces) 164 | x_epoch = np.argmin(all_validation_losses[x_best_valid_loss]) 165 | 166 | print("Validation Loss Performance:") 167 | print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~") 168 | print( 169 | "Best instance for validation loss: {} ({} after epoch {})".format( 170 | dir_list[x_best_valid_loss], 171 | min_valid_loss_all_traces, 172 | train_epochs[x_epoch], 173 | ) 174 | ) 175 | 176 | for n in range(n_traces): 177 | 178 | save_plot_path = ( 179 | os.path.join(args.output_dir, "train_valid_loss_{}".format(dir_list[n])) 180 | if save_results 181 | else None 182 | ) 183 | dream_analysis.plot_train_valid_loss( 184 | train_epochs, 185 | all_losses[n, :], 186 | all_validation_losses[n, :], 187 | dataset_name=dir_list[n], 188 | save_plot_path=save_plot_path, 189 | ) 190 | 191 | if not save_results: 192 | plt.show() 193 | 194 | 195 | if __name__ == "__main__": 196 | 197 | # Parse input arguments 198 | parser = argparse.ArgumentParser( 199 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 200 | ) 201 | parser.add_argument( 202 | "-i", 203 | "--input-dir", 204 | metavar="input_dir", 205 | required=True, 206 | help="Path to directory of training results to analyze.", 207 | ) 208 | parser.add_argument( 209 | "-o", 210 | "--output-dir", 211 | metavar="output_dir", 212 | default=None, 213 | help="Path to output directory to save analysis results.", 214 | ) 215 | parser.add_argument( 216 | "-f", 217 | "--force-overwrite", 218 | action="store_true", 219 | default=False, 220 | help="Forces overwriting of analysis results in the provided directory.", 221 | ) 222 | args = parser.parse_args() 223 | 224 | # Analyze training results 225 | analyze_training(args) 226 | -------------------------------------------------------------------------------- /scripts/launch_dream_ros.py: 
-------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | #!/usr/bin/env python2 6 | 7 | import argparse 8 | import os 9 | 10 | import cv2 11 | import numpy as np 12 | from PIL import Image as PILImage 13 | from pyrr import Quaternion 14 | from ruamel.yaml import YAML 15 | import torch 16 | import webcolors 17 | 18 | import rospy 19 | 20 | from cv_bridge import CvBridge 21 | from geometry_msgs.msg import TransformStamped 22 | from sensor_msgs.msg import Image, CameraInfo 23 | from std_srvs.srv import Empty 24 | import tf2_ros as tf2 25 | import tf 26 | 27 | import dream 28 | 29 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 30 | 31 | # USERS: Please change the ROS topics / frames below as desired. 32 | 33 | # ROS topic for listening to RGB images 34 | image_topic = "/camera/color/image_raw" 35 | 36 | # ROS topic for listening to camera intrinsics 37 | camera_info_topic = "/camera/color/camera_info" 38 | 39 | # ROS service for sending request to capture frame 40 | capture_frame_service_topic = "/dream/capture_frame" 41 | 42 | # ROS service for sending request to clear buffer 43 | clear_buffer_service_topic = "/dream/clear_buffer" 44 | 45 | # ROS topics for outputs 46 | topic_out_net_input_image = "/dream/net_input_image" 47 | topic_out_keypoint_overlay = "/dream/keypoint_overlay" 48 | topic_out_belief_maps = "/dream/belief_maps" 49 | topic_out_keypoint_belief_overlay = "/dream/keypoint_belief_overlay" 50 | topic_out_keypoint_names = "/dream/keypoint_names" 51 | topic_out_keypoint_frame_overlay = "/dream/keypoint_frame_overlay" 52 | 53 | # ROS frames for the output of DREAM 54 | # tform_out_basename is now set by the user - previously was 'dream/base_frame' 55 | tform_out_childname = "dream/camera_rgb_frame" 56 | 57 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 58 | 59 | 60 | class DreamInferenceROS: 61 | def __init__(self, args, single_frame_mode=True, compute_2d_to_3d_transform=False): 62 | """Initialize inference engine. 63 | 64 | single_frame_mode: Set this to True. (Appears to be some sort of future-proofing by Tim.) 
65 | """ 66 | self.cv_image = None 67 | self.camera_K = None 68 | 69 | # TODO: -- continuous mode produces a TF at each frame 70 | # not continuous mode allows for several frames before producing an estimate 71 | self.single_frame_mode = single_frame_mode 72 | self.capture_frame_srv = rospy.Service( 73 | capture_frame_service_topic, Empty, self.on_capture_frame 74 | ) 75 | self.clear_buffer_srv = rospy.Service( 76 | clear_buffer_service_topic, Empty, self.on_clear_buffer 77 | ) 78 | self.kp_projs_raw_buffer = np.array([]) 79 | self.kp_positions_buffer = np.array([]) 80 | self.pnp_solution_found = False 81 | self.capture_frame_max_kps = True 82 | 83 | self.compute_2d_to_3d_transform = compute_2d_to_3d_transform 84 | 85 | # Create subscribers 86 | self.image_sub = rospy.Subscriber( 87 | image_topic, Image, self.on_image, queue_size=1 88 | ) 89 | self.bridge = CvBridge() 90 | 91 | # Input argument handling 92 | assert os.path.exists( 93 | args.input_params_path 94 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 95 | args.input_params_path 96 | ) 97 | 98 | if args.input_config_path: 99 | input_config_path = args.input_config_path 100 | else: 101 | # Use params filepath to infer the config filepath 102 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 103 | 104 | assert os.path.exists( 105 | input_config_path 106 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 107 | input_config_path 108 | ) 109 | 110 | # Create parser 111 | print("# Opening config file: {} ...".format(input_config_path)) 112 | data_parser = YAML(typ="safe") 113 | 114 | with open(input_config_path, "r") as f: 115 | network_config = data_parser.load(f) 116 | 117 | # Overwrite GPU 118 | # If nothing is specified at the command line, None is the default, which uses all GPUs 119 | # TBD - think about a better way of doing this 120 | network_config["training"]["platform"]["gpu_ids"] = args.gpu_ids 121 | 122 | # Load network 123 | print("# Creating network...") 124 | self.dream_network = dream.create_network_from_config_data(network_config) 125 | 126 | print( 127 | "Loading network with weights from: {} ...".format(args.input_params_path) 128 | ) 129 | self.dream_network.model.load_state_dict(torch.load(args.input_params_path)) 130 | self.dream_network.enable_evaluation() 131 | 132 | # Use image preprocessing specified by config by default, unless user specifies otherwise 133 | if args.image_preproc_override: 134 | self.image_preprocessing = args.image_preproc_override 135 | else: 136 | self.image_preprocessing = self.dream_network.image_preprocessing() 137 | 138 | # Output names used to look up keypoints in TF tree 139 | self.keypoint_tf_frames = self.dream_network.ros_keypoint_frames 140 | print("ROS keypoint frames: {}".format(self.keypoint_tf_frames)) 141 | 142 | # Define publishers 143 | self.net_input_image_pub = rospy.Publisher( 144 | topic_out_net_input_image, Image, queue_size=1 145 | ) 146 | self.image_overlay_pub = rospy.Publisher( 147 | topic_out_keypoint_overlay, Image, queue_size=1 148 | ) 149 | self.belief_image_pub = rospy.Publisher( 150 | topic_out_belief_maps, Image, queue_size=1 151 | ) 152 | self.kp_belief_overlay_pub = rospy.Publisher( 153 | topic_out_keypoint_belief_overlay, Image, queue_size=1 154 | ) 155 | self.kp_frame_overlay_pub = rospy.Publisher( 156 | topic_out_keypoint_frame_overlay, Image, queue_size=1 157 | ) 158 | 159 | # Store the base frame for the TF lookup 160 | self.base_tf_frame = args.base_frame 161 | 162 | # Define 
TFs 163 | self.tfBuffer = tf2.Buffer() 164 | self.tf_broadcaster = tf2.TransformBroadcaster() 165 | self.listener = tf2.TransformListener(self.tfBuffer) 166 | self.camera_pose_tform = TransformStamped() 167 | 168 | self.camera_pose_tform.header.frame_id = self.base_tf_frame 169 | self.camera_pose_tform.child_frame_id = tform_out_childname 170 | 171 | # Subscriber for camera intrinsics topic 172 | self.camera_info_sub = rospy.Subscriber( 173 | camera_info_topic, CameraInfo, self.on_camera_info, queue_size=1 174 | ) 175 | 176 | # Verbose mode 177 | self.verbose = args.verbose 178 | 179 | def on_capture_frame(self, req): 180 | print("Capturing frame.") 181 | found_kp_projs_net_input = dream_ros.process_image() 182 | print(found_kp_projs_net_input) 183 | ( 184 | kp_projs_raw_good_sample, 185 | kp_positions_good_sample, 186 | ) = dream_ros.keypoint_correspondences(found_kp_projs_net_input) 187 | if self.capture_frame_max_kps and kp_projs_raw_good_sample is not None: 188 | n_found_keypoints = kp_projs_raw_good_sample.shape[0] 189 | if n_found_keypoints != self.dream_network.n_keypoints: 190 | print( 191 | "Only found {} keypoints -- not continuing. Try again.".format( 192 | n_found_keypoints 193 | ) 194 | ) 195 | return [] 196 | if ( 197 | kp_projs_raw_good_sample is not None 198 | and kp_positions_good_sample is not None 199 | ): 200 | dream_ros.solve_pnp_buffer( 201 | kp_projs_raw_good_sample, kp_positions_good_sample 202 | ) 203 | return [] 204 | 205 | def on_clear_buffer(self, req): 206 | print("Clearing frame buffer.") 207 | self.kp_projs_raw_buffer = np.array([]) 208 | self.kp_positions_buffer = np.array([]) 209 | self.pnp_solution_found = False 210 | return [] 211 | 212 | def on_image(self, image): 213 | self.cv_image = self.bridge.imgmsg_to_cv2(image, "rgb8") 214 | 215 | def on_camera_info(self, camera_info): 216 | # Create camera intrinsics matrix 217 | fx = camera_info.K[0] 218 | fy = camera_info.K[4] 219 | cx = camera_info.K[2] 220 | cy = camera_info.K[5] 221 | self.camera_K = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]) 222 | 223 | def process_image(self): 224 | """Performs inference on the image most recently captured. 225 | 226 | Input (none) 227 | self.cv_image: Holds the image to be processed. 
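The `on_camera_info` callback above assembles the pinhole intrinsics matrix from the row-major `CameraInfo.K` field. A small sanity-check sketch of that layout (the numeric values are made up):

```python
import numpy as np

# sensor_msgs/CameraInfo.K is a row-major 3x3 matrix flattened to 9 entries:
# [fx, 0, cx, 0, fy, cy, 0, 0, 1]
K_flat = [615.0, 0.0, 320.0, 0.0, 615.0, 240.0, 0.0, 0.0, 1.0]  # illustrative values

fx, fy = K_flat[0], K_flat[4]
cx, cy = K_flat[2], K_flat[5]
camera_K = np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])
assert np.allclose(camera_K.reshape(-1), K_flat)
```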
228 | 229 | Returns 230 | detected_keypoints: Array of 2D keypoint coords wrt original image, possibly including missing keypoints 231 | """ 232 | 233 | if self.cv_image is None: 234 | return 235 | 236 | # Determine if we need debug content from single-image inference based on subscribers 237 | num_connect_belief_image_pub = self.belief_image_pub.get_num_connections() 238 | num_connect_net_input_image_pub = self.net_input_image_pub.get_num_connections() 239 | num_connect_image_overlay_pub = self.image_overlay_pub.get_num_connections() 240 | num_connect_kp_belief_overlay_pub = ( 241 | self.kp_belief_overlay_pub.get_num_connections() 242 | ) 243 | 244 | if ( 245 | num_connect_belief_image_pub > 0 246 | or num_connect_net_input_image_pub > 0 247 | or num_connect_image_overlay_pub > 0 248 | or num_connect_kp_belief_overlay_pub > 0 249 | ): 250 | debug_inference = True 251 | else: 252 | debug_inference = False 253 | 254 | # Detect keypoints from single-image inference 255 | image_raw = PILImage.fromarray(self.cv_image) 256 | detection_result = self.dream_network.keypoints_from_image( 257 | image_raw, 258 | image_preprocessing_override=self.image_preprocessing, 259 | debug=debug_inference, 260 | ) 261 | detected_keypoints = detection_result["detected_keypoints"] 262 | 263 | # Publish debug topics - only do this if something is subscribed 264 | # TODO: clean up so some of the intermediate processing isn't duplicated 265 | if num_connect_belief_image_pub > 0: 266 | belief_maps = detection_result["belief_maps"] 267 | belief_images = dream.image_proc.images_from_belief_maps( 268 | belief_maps, normalization_method=6 269 | ) 270 | belief_image_mosaic = dream.image_proc.mosaic_images( 271 | belief_images, rows=1, cols=len(belief_images), inner_padding_px=5 272 | ) 273 | cv_belief_image = np.array(belief_image_mosaic) 274 | cv_belief_image = cv_belief_image[:, :, ::-1].copy() 275 | belief_msg = self.bridge.cv2_to_imgmsg(cv_belief_image, encoding="bgr8") 276 | self.belief_image_pub.publish(belief_msg) 277 | 278 | if num_connect_net_input_image_pub > 0: 279 | net_input_image = detection_result["image_rgb_net_input"] 280 | cv_input_image = np.array(net_input_image) 281 | cv_input_image = cv_input_image[:, :, ::-1].copy() 282 | net_input_image_msg = self.bridge.cv2_to_imgmsg( 283 | cv_input_image, encoding="bgr8" 284 | ) 285 | self.net_input_image_pub.publish(net_input_image_msg) 286 | 287 | if num_connect_image_overlay_pub > 0: 288 | # TODO: fix color cycle 289 | if self.dream_network.n_keypoints == 7: 290 | point_colors = [ 291 | "red", 292 | "blue", 293 | "green", 294 | "yellow", 295 | "black", 296 | "cyan", 297 | "white", 298 | ] 299 | else: 300 | point_colors = "red" 301 | 302 | image_overlay = dream.image_proc.overlay_points_on_image( 303 | image_raw, 304 | detected_keypoints, 305 | self.dream_network.friendly_keypoint_names, 306 | annotation_color_dot=point_colors, 307 | annotation_color_text=point_colors, 308 | ) 309 | cv_image_overlay = np.array(image_overlay) 310 | cv_image_overlay = cv_image_overlay[:, :, ::-1].copy() 311 | image_overlay_msg = self.bridge.cv2_to_imgmsg( 312 | cv_image_overlay, encoding="bgr8" 313 | ) 314 | self.image_overlay_pub.publish(image_overlay_msg) 315 | 316 | if num_connect_kp_belief_overlay_pub > 0: 317 | image_raw_resolution = (self.cv_image.shape[1], self.cv_image.shape[0]) 318 | net_input_resolution = detection_result["image_rgb_net_input"].size 319 | belief_maps = detection_result["belief_maps"] 320 | flattened_belief_tensor = belief_maps.sum(dim=0) 321 | 
flattened_belief_image = dream.image_proc.image_from_belief_map( 322 | flattened_belief_tensor, colormap="hot", normalization_method=6 323 | ) 324 | flattened_belief_image_netin = dream.image_proc.convert_image_to_netin_from_netout( 325 | flattened_belief_image, net_input_resolution 326 | ) 327 | flattened_belief_image_raw = dream.image_proc.inverse_preprocess_image( 328 | flattened_belief_image_netin, 329 | image_raw_resolution, 330 | self.image_preprocessing, 331 | ) 332 | flattened_belief_image_raw_blend = PILImage.blend( 333 | image_raw, flattened_belief_image_raw, alpha=0.5 334 | ) 335 | 336 | # Overlay keypoints 337 | # TODO: fix color cycle 338 | if self.dream_network.n_keypoints == 7: 339 | point_colors = [ 340 | "red", 341 | "blue", 342 | "green", 343 | "yellow", 344 | "black", 345 | "cyan", 346 | "white", 347 | ] 348 | else: 349 | point_colors = "red" 350 | 351 | kp_belief_overlay = dream.image_proc.overlay_points_on_image( 352 | flattened_belief_image_raw_blend, 353 | detected_keypoints, 354 | self.dream_network.friendly_keypoint_names, 355 | annotation_color_dot=point_colors, 356 | annotation_color_text=point_colors, 357 | ) 358 | cv_kp_belief_overlay = np.array(kp_belief_overlay) 359 | cv_kp_belief_overlay = cv_kp_belief_overlay[:, :, ::-1].copy() 360 | kp_belief_overlay_msg = self.bridge.cv2_to_imgmsg( 361 | cv_kp_belief_overlay, encoding="bgr8" 362 | ) 363 | self.kp_belief_overlay_pub.publish(kp_belief_overlay_msg) 364 | 365 | return detected_keypoints 366 | 367 | def keypoint_correspondences(self, detected_kp_projs): 368 | """Convert 2D keypoint coords to (2D, 3D) pairs of correspondences. 369 | 370 | Input: 371 | detected_kp_projs: Array of 2D keypoint coords wrt original image, possibly including missing keypoints 372 | 373 | Returns: 374 | kp_projs_raw_good_sample: Array of 2D keypoint coords wrt original image 375 | kp_positions_good_sample: Array of 3D keypoint coords 376 | """ 377 | 378 | if not self.compute_2d_to_3d_transform: 379 | return None, None 380 | 381 | all_kp_positions = [] 382 | 383 | for i in range(len(self.keypoint_tf_frames)): 384 | keypoint_tf_frame = self.keypoint_tf_frames[i] 385 | if self.verbose: 386 | print( 387 | "Attempting transform lookup between {} and {}...".format( 388 | self.base_tf_frame, keypoint_tf_frame 389 | ) 390 | ) 391 | try: 392 | tform = self.tfBuffer.lookup_transform( 393 | self.base_tf_frame, keypoint_tf_frame, rospy.Time() 394 | ) 395 | this_pos = np.array( 396 | [ 397 | tform.transform.translation.x, 398 | tform.transform.translation.y, 399 | tform.transform.translation.z, 400 | ] 401 | ) 402 | all_kp_positions.append(this_pos) 403 | 404 | except tf2.TransformException as e: 405 | print("TF Exception: {}".format(e)) 406 | return None, None 407 | 408 | # Now determine which to keep 409 | kp_projs_raw_good_sample = [] 410 | kp_positions_good_sample = [] 411 | for this_kp_proj_est, this_kp_position in zip( 412 | detected_kp_projs, all_kp_positions 413 | ): 414 | if ( 415 | this_kp_proj_est is not None 416 | and this_kp_proj_est[0] 417 | and this_kp_proj_est[1] 418 | and not (this_kp_proj_est[0] < -999.0 and this_kp_proj_est[1] < 0.999) 419 | ): 420 | # This keypoint is defined to exist within the image frame, so we keep it 421 | kp_projs_raw_good_sample.append(this_kp_proj_est) 422 | kp_positions_good_sample.append(this_kp_position) 423 | 424 | kp_projs_raw_good_sample = np.array(kp_projs_raw_good_sample) 425 | kp_positions_good_sample = np.array(kp_positions_good_sample) 426 | 427 | return kp_projs_raw_good_sample, 
kp_positions_good_sample 428 | 429 | def solve_pnp_buffer(self, candidate_kp_projs_raw, candidate_kp_positions): 430 | 431 | if self.camera_K is None: 432 | self.pnp_solution_found = False 433 | return 434 | 435 | kp_projs_raw_to_try = np.array( 436 | self.kp_projs_raw_buffer.tolist() + candidate_kp_projs_raw.tolist() 437 | ) 438 | kp_positions_to_try = np.array( 439 | self.kp_positions_buffer.tolist() + candidate_kp_positions.tolist() 440 | ) 441 | 442 | if self.verbose: 443 | print("\nSolving for PNP... ~~~~~~~~~~~~~~~~~~~~") 444 | print("2D Detected KP projections in raw:") 445 | print(kp_projs_raw_to_try) 446 | 447 | print("3D KP positions:") 448 | print(kp_positions_to_try) 449 | 450 | pnp_retval, tvec, quat = dream.geometric_vision.solve_pnp( 451 | kp_positions_to_try, kp_projs_raw_to_try, self.camera_K 452 | ) 453 | 454 | if pnp_retval: 455 | self.pnp_solution_found = True 456 | 457 | if self.verbose: 458 | print("Camera-from-robot pose, found by PNP:") 459 | print(tvec) 460 | print(quat) 461 | 462 | # Update transform 463 | T_cam_from_robot = tf.transformations.quaternion_matrix(quat) 464 | T_cam_from_robot[:3, -1] = tvec 465 | T_robot_from_cam = tf.transformations.inverse_matrix(T_cam_from_robot) 466 | 467 | robot_from_cam_pos = T_robot_from_cam[:3, -1] 468 | 469 | R_robot_from_cam = T_robot_from_cam[:3, :3] 470 | temp = tf.transformations.identity_matrix() 471 | temp[:3, :3] = R_robot_from_cam 472 | robot_from_cam_quat = tf.transformations.quaternion_from_matrix(temp) 473 | 474 | # Update transform msg for publication 475 | self.camera_pose_tform.transform.translation.x = robot_from_cam_pos[0] 476 | self.camera_pose_tform.transform.translation.y = robot_from_cam_pos[1] 477 | self.camera_pose_tform.transform.translation.z = robot_from_cam_pos[2] 478 | 479 | self.camera_pose_tform.transform.rotation.x = robot_from_cam_quat[0] 480 | self.camera_pose_tform.transform.rotation.y = robot_from_cam_quat[1] 481 | self.camera_pose_tform.transform.rotation.z = robot_from_cam_quat[2] 482 | self.camera_pose_tform.transform.rotation.w = robot_from_cam_quat[3] 483 | 484 | # Save buffer since we got a PNP solution if we're not in single-frame mode 485 | if not self.single_frame_mode: 486 | self.kp_projs_raw_buffer = kp_projs_raw_to_try 487 | self.kp_positions_buffer = kp_positions_to_try 488 | print( 489 | "Adding to buffer! 
New buffer size: {}".format( 490 | self.kp_positions_buffer.shape[0] 491 | ) 492 | ) 493 | 494 | else: 495 | print("PnP failed to provide a solution.") 496 | self.pnp_solution_found = False 497 | 498 | def publish_pose(self): 499 | self.camera_pose_tform.header.stamp = rospy.Time().now() 500 | self.tf_broadcaster.sendTransform(self.camera_pose_tform) 501 | 502 | # Generate and publish keypoint frame overlay 503 | if self.kp_frame_overlay_pub.get_num_connections() > 0: 504 | 505 | if self.cv_image is None or self.camera_K is None: 506 | return 507 | 508 | all_kp_transforms = [] 509 | for i in range(len(self.keypoint_tf_frames)): 510 | keypoint_tf_frame = self.keypoint_tf_frames[i] 511 | # Lookup transform between dream published frame and keypoint frames 512 | try: 513 | tform = self.tfBuffer.lookup_transform( 514 | self.camera_pose_tform.child_frame_id, 515 | keypoint_tf_frame, 516 | rospy.Time(), 517 | ) 518 | all_kp_transforms.append(tform) 519 | 520 | except tf2.TransformException as e: 521 | print("TF Exception: {}".format(e)) 522 | return None, None 523 | 524 | cv_image_overlay = self.cv_image.copy() 525 | 526 | frame_len = 0.1 527 | frame_thickness = 3 528 | frame_triad_pts = np.array( 529 | [ 530 | [0.0, 0.0, 0.0, 1.0], 531 | [frame_len, 0.0, 0.0, 1.0], 532 | [0.0, frame_len, 0.0, 1.0], 533 | [0.0, 0.0, frame_len, 1.0], 534 | ] 535 | ) 536 | shift = 4 537 | factor = 1 << shift 538 | point_radius = 4.0 539 | 540 | for kp_tform in all_kp_transforms: 541 | pos = [ 542 | kp_tform.transform.translation.x, 543 | kp_tform.transform.translation.y, 544 | kp_tform.transform.translation.z, 545 | ] 546 | quat = [ 547 | kp_tform.transform.rotation.x, 548 | kp_tform.transform.rotation.y, 549 | kp_tform.transform.rotation.z, 550 | kp_tform.transform.rotation.w, 551 | ] 552 | T = tf.transformations.quaternion_matrix(quat) 553 | T[:3, -1] = pos 554 | 555 | frame_triad_positions_homog = np.transpose( 556 | np.matmul(T, np.transpose(frame_triad_pts)) 557 | ) 558 | frame_triad_positions = [ 559 | dream.geometric_vision.hnormalized(v).tolist() 560 | for v in frame_triad_positions_homog 561 | ] 562 | frame_triad_projs = dream.geometric_vision.point_projection_from_3d( 563 | self.camera_K, frame_triad_positions 564 | ) 565 | 566 | # Overlay line on image 567 | point0_fixedpt = ( 568 | int(frame_triad_projs[0][0] * factor), 569 | int(frame_triad_projs[0][1] * factor), 570 | ) 571 | point1_fixedpt = ( 572 | int(frame_triad_projs[1][0] * factor), 573 | int(frame_triad_projs[1][1] * factor), 574 | ) 575 | point2_fixedpt = ( 576 | int(frame_triad_projs[2][0] * factor), 577 | int(frame_triad_projs[2][1] * factor), 578 | ) 579 | point3_fixedpt = ( 580 | int(frame_triad_projs[3][0] * factor), 581 | int(frame_triad_projs[3][1] * factor), 582 | ) 583 | 584 | # x-axis 585 | cv_image_overlay = cv2.line( 586 | cv_image_overlay, 587 | point0_fixedpt, 588 | point1_fixedpt, 589 | webcolors.name_to_rgb("red"), 590 | thickness=frame_thickness, 591 | shift=shift, 592 | ) 593 | # y-axis 594 | cv_image_overlay = cv2.line( 595 | cv_image_overlay, 596 | point0_fixedpt, 597 | point2_fixedpt, 598 | webcolors.name_to_rgb("green"), 599 | thickness=frame_thickness, 600 | shift=shift, 601 | ) 602 | # z-axis 603 | cv_image_overlay = cv2.line( 604 | cv_image_overlay, 605 | point0_fixedpt, 606 | point3_fixedpt, 607 | webcolors.name_to_rgb("blue"), 608 | thickness=frame_thickness, 609 | shift=shift, 610 | ) 611 | # center of frame triad 612 | radius_fixedpt = int(point_radius * factor) 613 | cv_image_overlay = cv2.circle( 614 | 
cv_image_overlay, 615 | point0_fixedpt, 616 | radius_fixedpt, 617 | webcolors.name_to_rgb("black"), 618 | thickness=-1, 619 | shift=shift, 620 | ) 621 | 622 | cv_image_overlay = cv_image_overlay[:, :, ::-1] 623 | image_overlay_msg = self.bridge.cv2_to_imgmsg( 624 | cv_image_overlay, encoding="bgr8" 625 | ) 626 | self.kp_frame_overlay_pub.publish(image_overlay_msg) 627 | 628 | 629 | if __name__ == "__main__": 630 | 631 | # Parse input arguments 632 | parser = argparse.ArgumentParser( 633 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 634 | ) 635 | parser.add_argument( 636 | "-i", 637 | "--input-params-path", 638 | required=True, 639 | help="Path to network parameters file.", 640 | ) 641 | parser.add_argument( 642 | "-c", 643 | "--input-config-path", 644 | default=None, 645 | help="Path to network configuration file. If nothing is specified, the script will search for a config file by the same name as the network parameters file.", 646 | ) 647 | parser.add_argument( 648 | "-b", 649 | "--base-frame", 650 | required=True, 651 | help="The ROS TF name for the base frame of the robot, which serves as the canonical frame for PnP.", 652 | ) 653 | parser.add_argument( 654 | "-r", 655 | "--node-rate", 656 | type=float, 657 | default=10.0, 658 | help="The rate in Hz for this node to run.", 659 | ) 660 | parser.add_argument( 661 | "-v", 662 | "--verbose", 663 | action="store_true", 664 | default=False, 665 | help="Outputs all diagnostic information to the screen.", 666 | ) 667 | parser.add_argument( 668 | "-p", 669 | "--image-preproc-override", 670 | default=None, 671 | help="Overrides the image preprocessing specified by the network. (Debug argument.)", 672 | ) 673 | parser.add_argument( 674 | "-g", 675 | "--gpu-ids", 676 | nargs="+", 677 | type=int, 678 | default=None, 679 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 680 | ) 681 | args = parser.parse_args() 682 | 683 | # Initialize ROS node 684 | rospy.init_node("dream") 685 | 686 | # Create DREAM inference engine 687 | single_frame_mode = True 688 | mode_str = "single-frame mode" if single_frame_mode else "multi-frame mode" 689 | dream_ros = DreamInferenceROS( 690 | args, single_frame_mode, compute_2d_to_3d_transform=True 691 | ) 692 | print("DREAM ~ online ~ " + mode_str) 693 | 694 | # Main loop 695 | rate = rospy.Rate(args.node_rate) 696 | while not rospy.is_shutdown(): 697 | 698 | # Find keypoints in image 699 | found_kp_projs_net_input = dream_ros.process_image() 700 | 701 | # Solve PNP (if in single frame mode) 702 | if single_frame_mode and found_kp_projs_net_input is not None: 703 | ( 704 | kp_projs_raw_good_sample, 705 | kp_positions_good_sample, 706 | ) = dream_ros.keypoint_correspondences(found_kp_projs_net_input) 707 | if ( 708 | kp_projs_raw_good_sample is not None 709 | and kp_positions_good_sample is not None 710 | ): 711 | dream_ros.solve_pnp_buffer( 712 | kp_projs_raw_good_sample, kp_positions_good_sample 713 | ) 714 | 715 | # Publish pose (if found) 716 | if dream_ros.pnp_solution_found: 717 | dream_ros.publish_pose() 718 | 719 | rate.sleep() 720 | -------------------------------------------------------------------------------- /scripts/network_inference.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
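`solve_pnp_buffer` above converts the camera-from-robot pose returned by PnP into the robot-from-camera transform that is broadcast over TF. The same inversion in plain numpy (the rotation and translation values are placeholders, not results):

```python
import numpy as np

# Camera-from-robot pose assembled from a PnP solution (illustrative values)
R_cam_from_robot = np.eye(3)
tvec = np.array([0.1, 0.0, 0.8])

T_cam_from_robot = np.eye(4)
T_cam_from_robot[:3, :3] = R_cam_from_robot
T_cam_from_robot[:3, 3] = tvec

# Invert to obtain the robot-from-camera pose that is published as the TF transform
T_robot_from_cam = np.linalg.inv(T_cam_from_robot)
robot_from_cam_pos = T_robot_from_cam[:3, 3]
robot_from_cam_rot = T_robot_from_cam[:3, :3]
```

For a rigid transform the inverse can also be formed analytically as `R.T` and `-R.T @ t`, which is numerically cheaper than a general matrix inverse.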
Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import math 7 | import os 8 | from PIL import Image as PILImage 9 | 10 | import numpy as np 11 | from ruamel.yaml import YAML 12 | import torch 13 | import torchvision.transforms as TVTransforms 14 | 15 | import dream 16 | 17 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 18 | 19 | 20 | def generate_belief_map_visualizations( 21 | belief_maps, keypoint_projs_detected, keypoint_projs_gt=None 22 | ): 23 | 24 | belief_map_images = dream.image_proc.images_from_belief_maps( 25 | belief_maps, normalization_method=6 26 | ) 27 | 28 | belief_map_images_kp = [] 29 | for kp in range(len(keypoint_projs_detected)): 30 | if keypoint_projs_gt: 31 | keypoint_projs = [keypoint_projs_gt[kp], keypoint_projs_detected[kp]] 32 | color = ["green", "red"] 33 | else: 34 | keypoint_projs = [keypoint_projs_detected[kp]] 35 | color = "red" 36 | belief_map_image_kp = dream.image_proc.overlay_points_on_image( 37 | belief_map_images[kp], 38 | keypoint_projs, 39 | annotation_color_dot=color, 40 | annotation_color_text=color, 41 | point_diameter=4, 42 | ) 43 | belief_map_images_kp.append(belief_map_image_kp) 44 | n_cols = int(math.ceil(len(keypoint_projs_detected) / 2.0)) 45 | belief_maps_kp_mosaic = dream.image_proc.mosaic_images( 46 | belief_map_images_kp, 47 | rows=2, 48 | cols=n_cols, 49 | inner_padding_px=10, 50 | fill_color_rgb=(0, 0, 0), 51 | ) 52 | return belief_maps_kp_mosaic 53 | 54 | 55 | def network_inference(args): 56 | 57 | # Input argument handling 58 | assert os.path.exists( 59 | args.input_params_path 60 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 61 | args.input_params_path 62 | ) 63 | 64 | if args.input_config_path: 65 | input_config_path = args.input_config_path 66 | else: 67 | # Use params filepath to infer the config filepath 68 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 69 | 70 | assert os.path.exists( 71 | input_config_path 72 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 73 | input_config_path 74 | ) 75 | 76 | assert os.path.exists( 77 | args.image_path 78 | ), 'Expected image_path "{}" to exist, but it does not.'.format(args.image_path) 79 | 80 | # Create parser 81 | print("# Opening config file: {} ...".format(input_config_path)) 82 | data_parser = YAML(typ="safe") 83 | 84 | with open(input_config_path, "r") as f: 85 | network_config = data_parser.load(f) 86 | 87 | # Overwrite GPU 88 | # If nothing is specified at the command line, None is the default, which uses all GPUs 89 | # TBD - think about a better way of doing this 90 | network_config["training"]["platform"]["gpu_ids"] = args.gpu_ids 91 | 92 | # Load network 93 | print("# Creating network...") 94 | dream_network = dream.create_network_from_config_data(network_config) 95 | 96 | print("Loading network with weights from: {} ...".format(args.input_params_path)) 97 | dream_network.model.load_state_dict(torch.load(args.input_params_path)) 98 | dream_network.enable_evaluation() 99 | 100 | # Load in image 101 | print("# Loading image: {} ...".format(args.image_path)) 102 | image_rgb_OrigInput_asPilImage = PILImage.open(args.image_path).convert("RGB") 103 | orig_image_dim = tuple(image_rgb_OrigInput_asPilImage.size) 104 | 105 | # Use image preprocessing specified by config by default, unless user specifies otherwise 106 | if args.image_preproc_override: 107 | image_preprocessing = args.image_preproc_override 108 | print( 109 | " Image preprocessing: 
'{}' --- as specified by user".format( 110 | image_preprocessing 111 | ) 112 | ) 113 | else: 114 | image_preprocessing = dream_network.image_preprocessing() 115 | print( 116 | " Image preprocessing: '{}' --- default as specified by network config".format( 117 | image_preprocessing 118 | ) 119 | ) 120 | 121 | print("Detecting keypoints...") 122 | detection_result = dream_network.keypoints_from_image( 123 | image_rgb_OrigInput_asPilImage, 124 | image_preprocessing_override=image_preprocessing, 125 | debug=True, 126 | ) 127 | kp_coords_wrtOrigInput_asArray = detection_result["detected_keypoints"] 128 | print( 129 | "Detected keypoints in input image:\n{}".format(kp_coords_wrtOrigInput_asArray) 130 | ) 131 | 132 | kp_coords_wrtNetOutput_asArray = detection_result["detected_keypoints_net_output"] 133 | image_rgb_NetInput_asPilImage = detection_result["image_rgb_net_input"] 134 | input_image_dim = image_rgb_NetInput_asPilImage.size 135 | belief_maps_wrtNetOutput_asTensor = detection_result["belief_maps"] 136 | kp_coords_wrtNetInput_asArray = detection_result["detected_keypoints_net_input"] 137 | 138 | # Read in keypoints if provided 139 | if args.keypoints_path: 140 | print( 141 | "# Loading ground truth keypoints from {} ...".format(args.keypoints_path) 142 | ) 143 | keypoints_gt = dream.utilities.load_keypoints( 144 | args.keypoints_path, 145 | dream_network.manipulator_name, 146 | dream_network.keypoint_names, 147 | ) 148 | kp_coords_gt_wrtOrig = keypoints_gt["projections"] 149 | print( 150 | "Ground truth keypoints in input image:\n{}".format( 151 | np.array(kp_coords_gt_wrtOrig) 152 | ) 153 | ) 154 | 155 | kp_coords_gt_wrtNetInput_asArray = dream.image_proc.convert_keypoints_to_netin_from_raw( 156 | kp_coords_gt_wrtOrig, 157 | orig_image_dim, 158 | dream_network.trained_net_input_resolution(), 159 | image_preprocessing, 160 | ) 161 | kp_coords_gt_wrtNetOutput_asArray = dream.image_proc.convert_keypoints_to_netout_from_netin( 162 | kp_coords_gt_wrtNetInput_asArray, 163 | dream_network.trained_net_input_resolution(), 164 | dream_network.trained_net_output_resolution(), 165 | ) 166 | kp_coords_gt_wrtNetInput_asList = kp_coords_gt_wrtNetInput_asArray.tolist() 167 | kp_coords_gt_wrtNetOutput_asList = kp_coords_gt_wrtNetOutput_asArray.tolist() 168 | else: 169 | print("# Not loading ground truth keypoints (not provided)") 170 | kp_coords_gt_wrtNetInput_asList = None 171 | kp_coords_gt_wrtNetOutput_asList = None 172 | 173 | # Generate visualization output: keypoints, with ground truth if requested) overlaid on image used for network input 174 | keypoints_wrtNetInput_overlay = dream.image_proc.overlay_points_on_image( 175 | image_rgb_NetInput_asPilImage, 176 | kp_coords_gt_wrtNetInput_asList, 177 | dream_network.friendly_keypoint_names, 178 | annotation_color_dot="green", 179 | annotation_color_text="white", 180 | ) 181 | keypoints_wrtNetInput_overlay = dream.image_proc.overlay_points_on_image( 182 | keypoints_wrtNetInput_overlay, 183 | kp_coords_wrtNetInput_asArray, 184 | dream_network.friendly_keypoint_names, 185 | annotation_color_dot="red", 186 | annotation_color_text="white", 187 | ) 188 | keypoints_wrtNetInput_overlay.show( 189 | title="Keypoints (possibly with ground truth) on net input image" 190 | ) 191 | 192 | # Generate visualization output: mosaic of raw belief maps from network 193 | belief_maps_overlay = generate_belief_map_visualizations( 194 | belief_maps_wrtNetOutput_asTensor, 195 | kp_coords_wrtNetOutput_asArray, 196 | kp_coords_gt_wrtNetOutput_asList, 197 | ) 198 | 
belief_maps_overlay.show(title="Belief map output mosaic") 199 | 200 | # Generate visualization output: mosaic of belief maps, with keypoints, overlaid on image used for network input 201 | belief_maps_wrtNetOutput_asListOfPilImages = dream.image_proc.images_from_belief_maps( 202 | belief_maps_wrtNetOutput_asTensor, normalization_method=6 203 | ) 204 | blended_array = [] 205 | 206 | for n in range(len(kp_coords_wrtNetOutput_asArray)): 207 | 208 | bm_wrtNetOutput_asPilImage = belief_maps_wrtNetOutput_asListOfPilImages[n] 209 | kp = kp_coords_wrtNetInput_asArray[n] 210 | fname = dream_network.friendly_keypoint_names[n] 211 | 212 | bm_wrtNetInput_asPilImage = bm_wrtNetOutput_asPilImage.resize( 213 | input_image_dim, resample=PILImage.BILINEAR 214 | ) 215 | blended = PILImage.blend( 216 | image_rgb_NetInput_asPilImage, bm_wrtNetInput_asPilImage, alpha=0.5 217 | ) 218 | blended = dream.image_proc.overlay_points_on_image( 219 | blended, 220 | [kp], 221 | [fname], 222 | annotation_color_dot="red", 223 | annotation_color_text="white", 224 | ) 225 | blended_array.append(blended) 226 | 227 | n_cols = int(math.ceil(len(kp_coords_wrtNetOutput_asArray) / 2.0)) 228 | belief_maps_with_kp_overlaid_mosaic = dream.image_proc.mosaic_images( 229 | blended_array, rows=2, cols=n_cols, fill_color_rgb=(0, 0, 0) 230 | ) 231 | belief_maps_with_kp_overlaid_mosaic.show( 232 | title="Mosaic of belief maps, with keypoints, on original" 233 | ) 234 | 235 | # Squash belief maps into one combined image 236 | belief_map_combined_wrtNetOutput_asTensor = belief_maps_wrtNetOutput_asTensor.sum( 237 | dim=0 238 | ) 239 | belief_map_combined_wrtNetOutput_asPilImage = dream.image_proc.image_from_belief_map( 240 | belief_map_combined_wrtNetOutput_asTensor, normalization_method=6 241 | ) # clamps to b/w 0 and 1 242 | belief_map_combined_wrtNetInput_asPilImage = dream.image_proc.convert_image_to_netin_from_netout( 243 | belief_map_combined_wrtNetOutput_asPilImage, input_image_dim 244 | ) 245 | 246 | # Generate visualization output: belief maps, with keypoints, overlaid on image used for network input 247 | belief_map_combined_wrtNetInput_overlay = PILImage.blend( 248 | image_rgb_NetInput_asPilImage, 249 | belief_map_combined_wrtNetInput_asPilImage, 250 | alpha=0.5, 251 | ) 252 | belief_map_combined_wrtNetInput_overlay = dream.image_proc.overlay_points_on_image( 253 | belief_map_combined_wrtNetInput_overlay, 254 | kp_coords_wrtNetInput_asArray, 255 | dream_network.friendly_keypoint_names, 256 | annotation_color_dot="red", 257 | annotation_color_text="white", 258 | ) 259 | belief_map_combined_wrtNetInput_overlay.show( 260 | title="Belief maps, with keypoints, on net input image" 261 | ) 262 | 263 | # Generate visualization output: belief maps, with keypoints, overlaid on original image 264 | belief_map_combined_wrtOrigInput_asPilImage = dream.image_proc.inverse_preprocess_image( 265 | belief_map_combined_wrtNetInput_asPilImage, orig_image_dim, image_preprocessing 266 | ) 267 | belief_map_combined_wrtOrigInput_overlay = PILImage.blend( 268 | image_rgb_OrigInput_asPilImage, 269 | belief_map_combined_wrtOrigInput_asPilImage, 270 | alpha=0.5, 271 | ) 272 | belief_map_combined_wrtOrigInput_overlay = dream.image_proc.overlay_points_on_image( 273 | belief_map_combined_wrtOrigInput_overlay, 274 | kp_coords_wrtOrigInput_asArray, 275 | dream_network.friendly_keypoint_names, 276 | annotation_color_dot="red", 277 | annotation_color_text="white", 278 | ) 279 | belief_map_combined_wrtOrigInput_overlay.show( 280 | title="Belief maps, with keypoints, 
on original image" 281 | ) 282 | 283 | print("Done.") 284 | 285 | 286 | if __name__ == "__main__": 287 | 288 | print( 289 | "---------- Running 'network_inference.py' -------------------------------------------------" 290 | ) 291 | 292 | # Parse input arguments 293 | parser = argparse.ArgumentParser( 294 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 295 | ) 296 | parser.add_argument( 297 | "-i", 298 | "--input-params-path", 299 | required=True, 300 | help="Path to network parameters file.", 301 | ) 302 | parser.add_argument( 303 | "-c", 304 | "--input-config-path", 305 | default=None, 306 | help="Path to network configuration file. If nothing is specified, the script will search for a config file by the same name as the network parameters file.", 307 | ) 308 | parser.add_argument( 309 | "-m", "--image_path", required=True, help="Path to image used for inference." 310 | ) 311 | parser.add_argument( 312 | "-k", 313 | "--keypoints_path", 314 | default=None, 315 | help="Path to NDDS dataset with ground truth keypoints information.", 316 | ) 317 | parser.add_argument( 318 | "-g", 319 | "--gpu-ids", 320 | nargs="+", 321 | type=int, 322 | default=None, 323 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 324 | ) 325 | parser.add_argument( 326 | "-p", 327 | "--image-preproc-override", 328 | default=None, 329 | help="Overrides the image preprocessing specified by the network. (Debug argument.)", 330 | ) 331 | args = parser.parse_args() 332 | 333 | # Run network inference 334 | network_inference(args) 335 | -------------------------------------------------------------------------------- /scripts/network_inference_dataset.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | 8 | import dream.analysis as dream_analysis 9 | 10 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 11 | 12 | 13 | def network_inference_dataset(args): 14 | 15 | # Input argument handling 16 | assert os.path.exists( 17 | args.input_params_path 18 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 19 | args.input_params_path 20 | ) 21 | 22 | if args.input_config_path: 23 | input_config_path = args.input_config_path 24 | else: 25 | # Use params filepath to infer the config filepath 26 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 27 | 28 | assert os.path.exists( 29 | input_config_path 30 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 31 | input_config_path 32 | ) 33 | 34 | assert os.path.exists( 35 | args.dataset_dir 36 | ), 'Expected dataset_dir "{}" to exist, but it does not.'.format(args.dataset_dir) 37 | 38 | assert ( 39 | isinstance(args.batch_size, int) and args.batch_size > 0 40 | ), 'If specified, "batch_size" must be a positive integer.' 
41 | 42 | visualize_belief_maps = not args.not_visualize_belief_maps 43 | 44 | dream_analysis.analyze_ndds_dataset( 45 | args.input_params_path, 46 | input_config_path, 47 | args.dataset_dir, 48 | args.output_dir, 49 | visualize_belief_maps=visualize_belief_maps, 50 | pnp_analysis=True, 51 | force_overwrite=args.force_overwrite, 52 | image_preprocessing_override=args.image_preproc_override, 53 | batch_size=args.batch_size, 54 | num_workers=args.num_workers, 55 | gpu_ids=args.gpu_ids, 56 | ) 57 | 58 | 59 | if __name__ == "__main__": 60 | parser = argparse.ArgumentParser( 61 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 62 | ) 63 | parser.add_argument( 64 | "-i", 65 | "--input-params-path", 66 | required=True, 67 | help="Path to network parameters file.", 68 | ) 69 | parser.add_argument( 70 | "-c", 71 | "--input-config-path", 72 | default=None, 73 | help="Path to network configuration file. If nothing is specified, the script will search for a config file by the same name as the network parameters file.", 74 | ) 75 | parser.add_argument( 76 | "-d", 77 | "--dataset-dir", 78 | required=True, 79 | help="Path to NDDS dataset on which to conduct network inference.", 80 | ) 81 | parser.add_argument( 82 | "-o", 83 | "--output-dir", 84 | required=True, 85 | help="Path to output directory to save analysis results.", 86 | ) 87 | parser.add_argument( 88 | "-not-v", 89 | "--not-visualize-belief-maps", 90 | action="store_true", 91 | default=False, 92 | help="Disable belief map visualization. Without this flag, belief map visualization is enabled by default.", 93 | ) 94 | parser.add_argument( 95 | "-f", 96 | "--force-overwrite", 97 | action="store_true", 98 | default=False, 99 | help="Forces overwriting of analysis results in the provided directory.", 100 | ) 101 | parser.add_argument( 102 | "-b", 103 | "--batch-size", 104 | type=int, 105 | default=16, 106 | help="The batch size used to process the data. Does not affect results, only how quickly the results are obtained.", 107 | ) 108 | parser.add_argument( 109 | "-w", 110 | "--num-workers", 111 | type=int, 112 | default=8, 113 | help='The number of subprocesses ("workers") used for loading the data. 0 means that no subprocesses are used. ' 114 | + "Does not affect results, only how quickly the results are obtained.", 115 | ) 116 | parser.add_argument( 117 | "-g", 118 | "--gpu-ids", 119 | nargs="+", 120 | type=int, 121 | default=None, 122 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 123 | ) 124 | parser.add_argument( 125 | "-p", 126 | "--image-preproc-override", 127 | default=None, 128 | help="Overrides the image preprocessing specified by the network. (Debug argument.)", 129 | ) 130 | args = parser.parse_args() 131 | network_inference_dataset(args) 132 | -------------------------------------------------------------------------------- /scripts/train_network_multi.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
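`network_inference_dataset.py` above only validates `batch_size` and forwards it, together with `num_workers`, to `dream.analysis.analyze_ndds_dataset`. These map onto the standard PyTorch `DataLoader` arguments; a toy sketch of how such settings behave (the dataset here is a stand-in, not the repo's NDDS dataset class):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset of 32 fake RGB images; the repo uses its NDDS manipulator dataset instead
dataset = TensorDataset(torch.zeros(32, 3, 400, 400))
loader = DataLoader(dataset, batch_size=16, num_workers=0, shuffle=False)

for (batch,) in loader:
    print(batch.shape)  # torch.Size([16, 3, 400, 400]) for each full batch
```

As the help text notes, these options only change how quickly results are produced, not the results themselves.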
Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | import subprocess 8 | 9 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 10 | 11 | # Totally developmental script 12 | 13 | 14 | def train_network_multi(args): 15 | 16 | # Input argument handling 17 | assert ( 18 | isinstance(args.num_instances, int) and args.num_instances > 0 19 | ), "Expected num_instances to be an integer greater than 0." 20 | 21 | # Generate output directories 22 | dream.utilities.makedirs(args.output_dir) 23 | output_dirs = [ 24 | os.path.join(args.output_dir, "train_{}".format(n)) 25 | for n in range(args.num_instances) 26 | ] 27 | 28 | for output_dir in output_dirs: 29 | 30 | # TBD: do this without subprocess 31 | train_command_output = ( 32 | "python scripts/train_network.py " 33 | + args.train_command 34 | + ' -o "{}"'.format(output_dir) 35 | ) 36 | subprocess.run(train_command_output, shell=True) 37 | 38 | 39 | if __name__ == "__main__": 40 | parser = argparse.ArgumentParser( 41 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 42 | ) 43 | parser.add_argument( 44 | "-n", 45 | "--num-instances", 46 | metavar="num_instances", 47 | type=int, 48 | required=True, 49 | help="Number of training instances to run.", 50 | ) 51 | parser.add_argument( 52 | "-o", 53 | "--output-dir", 54 | metavar="output_dir", 55 | required=True, 56 | help="Output directory for training instances.", 57 | ) 58 | parser.add_argument( 59 | "-c", 60 | "--train-command", 61 | metavar="train_command", 62 | required=True, 63 | help="Command line options for training.", 64 | ) 65 | args = parser.parse_args() 66 | 67 | # Run train network, multiple instances version 68 | train_network_multi(args) 69 | -------------------------------------------------------------------------------- /scripts/visualize_network_inference.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
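`train_network_multi.py` above is explicitly marked as developmental: it calls `dream.utilities.makedirs` but never imports `dream`, so as shown it needs an `import dream` (or a plain `os.makedirs`) before it will run. A self-contained sketch of the same launch loop (paths and options are illustrative):

```python
import os
import subprocess

def launch_training_instances(num_instances, output_dir, train_command):
    # Stand-in for dream.utilities.makedirs
    os.makedirs(output_dir, exist_ok=True)
    for n in range(num_instances):
        instance_dir = os.path.join(output_dir, "train_{}".format(n))
        cmd = 'python scripts/train_network.py {} -o "{}"'.format(train_command, instance_dir)
        subprocess.run(cmd, shell=True)

# Example (illustrative arguments):
# launch_training_instances(3, "training_results/multi", "<options for train_network.py>")
```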
Full 3 | # text can be found in LICENSE.md 4 | 5 | import argparse 6 | import os 7 | from PIL import Image as PILImage 8 | import shutil 9 | import subprocess 10 | import sys 11 | 12 | import numpy as np 13 | from ruamel.yaml import YAML 14 | import torch 15 | from torch.utils.data import DataLoader as TorchDataLoader 16 | import torchvision.transforms as TVTransforms 17 | from tqdm import tqdm 18 | 19 | import dream 20 | 21 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 22 | 23 | 24 | def video_from_frames(frames_dir, video_output_path, video_framerate): 25 | force_str = "-y" 26 | loglevel_str = "-loglevel 24" 27 | framerate_str = "-framerate {}".format(video_framerate) 28 | input_data_str = '-pattern_type glob -i "{}"'.format( 29 | os.path.join(frames_dir, "*.png") 30 | ) 31 | output_vid_str = '"{}"'.format(video_output_path) 32 | encoding_str = "-vcodec libx264 -pix_fmt yuv420p" 33 | ffmpeg_vid_cmd = ( 34 | "ffmpeg " 35 | + force_str 36 | + " " 37 | + loglevel_str 38 | + " " 39 | + framerate_str 40 | + " " 41 | + input_data_str 42 | + " " 43 | + encoding_str 44 | + " " 45 | + output_vid_str 46 | ) 47 | 48 | print("Running command: {}".format(ffmpeg_vid_cmd)) 49 | subprocess.call(ffmpeg_vid_cmd, shell=True) 50 | 51 | 52 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 53 | 54 | KP_OVERLAY_RAW = "kp_raw" 55 | KP_OVERLAY_NET_INPUT = "kp_net_input" 56 | KP_BELIEF_OVERLAY_RAW = "kp_belief_raw" 57 | BELIEF_OVERLAY_RAW = "belief_raw" 58 | 59 | 60 | def visualize_network_inference(args): 61 | 62 | # Input argument handling 63 | assert os.path.exists( 64 | args.input_params_path 65 | ), 'Expected input_params_path "{}" to exist, but it does not.'.format( 66 | args.input_params_path 67 | ) 68 | 69 | if args.input_config_path: 70 | input_config_path = args.input_config_path 71 | else: 72 | # Use params filepath to infer the config filepath 73 | input_config_path = os.path.splitext(args.input_params_path)[0] + ".yaml" 74 | 75 | assert os.path.exists( 76 | input_config_path 77 | ), 'Expected input_config_path "{}" to exist, but it does not.'.format( 78 | input_config_path 79 | ) 80 | 81 | assert os.path.exists( 82 | args.dataset_path 83 | ), 'Expected dataset_path "{}" to exist, but it does not.'.format(args.dataset_path) 84 | 85 | # Determine what types of visualizations to do 86 | print("visualization types: {}".format(args.visualization_types)) 87 | do_kp_overlay_raw = True if KP_OVERLAY_RAW in args.visualization_types else False 88 | do_kp_overlay_net_input = ( 89 | True if KP_OVERLAY_NET_INPUT in args.visualization_types else False 90 | ) 91 | do_kp_belief_overlay_raw = ( 92 | True if KP_BELIEF_OVERLAY_RAW in args.visualization_types else False 93 | ) 94 | do_belief_overlay_raw = ( 95 | True if BELIEF_OVERLAY_RAW in args.visualization_types else False 96 | ) 97 | 98 | videos_to_make = [] 99 | needs_belief_maps = False 100 | if do_kp_overlay_raw: 101 | idx_kp_overlay_raw = len(videos_to_make) 102 | videos_to_make.append( 103 | { 104 | "frames_dir": os.path.join(args.output_dir, "frames_kp_overlay_raw"), 105 | "output_path": os.path.join(args.output_dir, "kp_overlay_raw.mp4"), 106 | "frame": [], 107 | } 108 | ) 109 | if do_kp_overlay_net_input: 110 | idx_kp_overlay_net_input = len(videos_to_make) 111 | videos_to_make.append( 112 | { 113 | "frames_dir": os.path.join( 114 | args.output_dir, "frames_kp_overlay_net_input" 115 | ), 116 | "output_path": os.path.join( 117 | args.output_dir, "kp_overlay_net_input.mp4" 118 | ), 119 | 
"frame": [], 120 | } 121 | ) 122 | if do_kp_belief_overlay_raw: 123 | idx_kp_belief_overlay_raw = len(videos_to_make) 124 | needs_belief_maps = True 125 | videos_to_make.append( 126 | { 127 | "frames_dir": os.path.join( 128 | args.output_dir, "frames_kp_belief_overlay_raw" 129 | ), 130 | "output_path": os.path.join( 131 | args.output_dir, "kp_belief_overlay_raw.mp4" 132 | ), 133 | "frame": [], 134 | } 135 | ) 136 | if do_belief_overlay_raw: 137 | idx_belief_overlay_raw = len(videos_to_make) 138 | needs_belief_maps = True 139 | videos_to_make.append( 140 | { 141 | "frames_dir": os.path.join( 142 | args.output_dir, "frames_belief_overlay_raw" 143 | ), 144 | "output_path": os.path.join(args.output_dir, "belief_overlay_raw.mp4"), 145 | "frame": [], 146 | } 147 | ) 148 | 149 | if len(videos_to_make) == 0: 150 | print("No visualizations have been selected.") 151 | sys.exit(0) 152 | 153 | dream.utilities.makedirs(args.output_dir, exist_ok=args.force_overwrite) 154 | for video in videos_to_make: 155 | if os.path.exists(video["frames_dir"]): 156 | assert args.force_overwrite, 'Frames directory "{}" already exists.'.format( 157 | video["frames_dir"] 158 | ) 159 | shutil.rmtree(video["frames_dir"]) 160 | dream.utilities.makedirs(video["frames_dir"], exist_ok=args.force_overwrite) 161 | 162 | # Create parser 163 | data_parser = YAML(typ="safe") 164 | 165 | with open(input_config_path, "r") as f: 166 | network_config = data_parser.load(f) 167 | 168 | # Overwrite GPU 169 | # If nothing is specified at the command line, None is the default, which uses all GPUs 170 | # TBD - think about a better way of doing this 171 | network_config["training"]["platform"]["gpu_ids"] = args.gpu_ids 172 | 173 | # Load network 174 | dream_network = dream.create_network_from_config_data(network_config) 175 | dream_network.model.load_state_dict(torch.load(args.input_params_path)) 176 | dream_network.enable_evaluation() 177 | 178 | # Use image preprocessing specified by config by default, unless user specifies otherwise 179 | if args.image_preproc_override: 180 | image_preprocessing = args.image_preproc_override 181 | else: 182 | image_preprocessing = dream_network.image_preprocessing() 183 | 184 | if args.keypoint_ids is None or len(args.keypoint_ids) == 0: 185 | idx_keypoints = list(range(dream_network.n_keypoints)) 186 | else: 187 | idx_keypoints = args.keypoint_ids 188 | n_idx_keypoints = len(idx_keypoints) 189 | 190 | sample_results = [] 191 | 192 | dataset_to_viz = dream.utilities.find_ndds_data_in_dir(args.dataset_path) 193 | dataset_file_dict_list = dataset_to_viz[ 194 | 0 195 | ] # list of data file dictionaries; each dictionary indicates the files names for rgb, depth, seg, ... 196 | dataset_meta_dict = dataset_to_viz[1] # dictionary of camera, object files, etc. 
197 | 198 | if dataset_file_dict_list: 199 | 200 | # Downselect based on frame name 201 | if args.start_frame or args.end_frame: 202 | sample_names = [x["name"] for x in dataset_file_dict_list] 203 | start_idx = sample_names.index(args.start_frame) if args.start_frame else 0 204 | end_idx = ( 205 | sample_names.index(args.end_frame) + 1 206 | if args.end_frame 207 | else len(dataset_file_dict_list) 208 | ) 209 | 210 | dataset_to_viz = ( 211 | dataset_file_dict_list[start_idx:end_idx], 212 | dataset_meta_dict, 213 | ) 214 | 215 | image_raw_resolution = dream.utilities.load_image_resolution( 216 | dataset_meta_dict["camera"] 217 | ) 218 | ( 219 | network_input_res_inf, 220 | network_output_res_inf, 221 | ) = dream_network.net_resolutions_from_image_raw_resolution( 222 | image_raw_resolution, image_preprocessing_override=image_preprocessing 223 | ) 224 | 225 | manip_dataset_debug_mode = dream.datasets.ManipulatorNDDSDatasetDebugLevels[ 226 | "LIGHT" 227 | ] 228 | manip_dataset = dream.datasets.ManipulatorNDDSDataset( 229 | dataset_to_viz, 230 | dream_network.manipulator_name, 231 | dream_network.keypoint_names, 232 | network_input_res_inf, 233 | network_output_res_inf, 234 | dream_network.image_normalization, 235 | image_preprocessing, 236 | augment_data=False, 237 | include_ground_truth=not args.no_ground_truth, 238 | debug_mode=manip_dataset_debug_mode, 239 | ) 240 | 241 | # TODO: set batch size and num_workers at command line 242 | batch_size = 8 243 | num_workers = 4 244 | training_data = TorchDataLoader( 245 | manip_dataset, batch_size=batch_size, num_workers=num_workers, shuffle=False 246 | ) 247 | 248 | # Network inference on dataset 249 | with torch.no_grad(): 250 | 251 | for batch_idx, sample in enumerate(tqdm(training_data)): 252 | 253 | this_batch_size = len(sample["config"]["name"]) 254 | 255 | # Conduct inference 256 | network_image_input = sample["image_rgb_input"].cuda() 257 | ( 258 | belief_maps_batch, 259 | detected_kp_projs_netout_batch, 260 | ) = dream_network.inference(network_image_input) 261 | 262 | for b in range(this_batch_size): 263 | 264 | input_image_path = sample["config"]["image_paths"]["rgb"][b] 265 | 266 | if needs_belief_maps: 267 | belief_maps = belief_maps_batch[b] 268 | selected_belief_maps_copy = ( 269 | belief_maps[idx_keypoints, :, :].detach().clone() 270 | ) 271 | else: 272 | selected_belief_maps_copy = [] 273 | 274 | detected_kp_projs_netout = np.array( 275 | detected_kp_projs_netout_batch[b], dtype=float 276 | ) 277 | selected_detected_kp_projs_netout = detected_kp_projs_netout[ 278 | idx_keypoints, : 279 | ] 280 | selected_detected_kp_projs_netin = dream.image_proc.convert_keypoints_to_netin_from_netout( 281 | selected_detected_kp_projs_netout, 282 | network_output_res_inf, 283 | network_input_res_inf, 284 | ) 285 | selected_detected_kp_projs_raw = dream.image_proc.convert_keypoints_to_raw_from_netin( 286 | selected_detected_kp_projs_netin, 287 | network_input_res_inf, 288 | image_raw_resolution, 289 | image_preprocessing, 290 | ) 291 | 292 | if args.no_ground_truth: 293 | selected_gt_kp_projs_raw = [] 294 | selected_gt_kp_projs_netin = [] 295 | else: 296 | selected_gt_kp_projs_raw = np.array( 297 | sample["keypoint_projections_raw"][b][idx_keypoints, :], 298 | dtype=float, 299 | ) 300 | selected_gt_kp_projs_netin = np.array( 301 | sample["keypoint_projections_input"][b][idx_keypoints, :], 302 | dtype=float, 303 | ) 304 | 305 | input_image_raw = PILImage.open(input_image_path).convert("RGB") 306 | image_net_input = dream.image_proc.image_from_tensor( 
307 | sample["image_rgb_input_viz"][b] 308 | ) 309 | 310 | sample_results.append( 311 | ( 312 | input_image_raw, 313 | image_net_input, 314 | selected_belief_maps_copy, 315 | selected_detected_kp_projs_raw, 316 | selected_detected_kp_projs_netin, 317 | selected_gt_kp_projs_raw, 318 | selected_gt_kp_projs_netin, 319 | ) 320 | ) 321 | 322 | else: 323 | # Probably a directory of images - fix this later to avoid code duplication 324 | dirlist = os.listdir(args.dataset_path) 325 | dirlist.sort() 326 | png_image_names = [f for f in dirlist if f.endswith(".png")] 327 | jpg_image_names = [f for f in dirlist if f.endswith(".jpg")] 328 | image_names = ( 329 | png_image_names 330 | if len(png_image_names) > len(jpg_image_names) 331 | else jpg_image_names 332 | ) 333 | 334 | if args.start_frame or args.end_frame: 335 | sample_names = [os.path.splitext(i)[0] for i in image_names] 336 | start_idx = sample_names.index(args.start_frame) if args.start_frame else 0 337 | end_idx = ( 338 | sample_names.index(args.end_frame) + 1 339 | if args.end_frame 340 | else len(sample_names) 341 | ) 342 | 343 | image_names = image_names[start_idx:end_idx] 344 | 345 | # Just use a heuristic to determine the image extension 346 | image_paths = [os.path.join(args.dataset_path, i) for i in image_names] 347 | 348 | for input_image_path in tqdm(image_paths): 349 | 350 | input_image_raw = PILImage.open(input_image_path).convert("RGB") 351 | detection_result = dream_network.keypoints_from_image( 352 | input_image_raw, 353 | image_preprocessing_override=image_preprocessing, 354 | debug=True, 355 | ) 356 | 357 | selected_detected_kps_raw = detection_result["detected_keypoints"][ 358 | idx_keypoints, : 359 | ] 360 | selected_detected_kps_netin = detection_result[ 361 | "detected_keypoints_net_input" 362 | ][idx_keypoints, :] 363 | image_net_input = detection_result["image_rgb_net_input"] 364 | selected_belief_maps = ( 365 | detection_result["belief_maps"][idx_keypoints, :, :] 366 | if needs_belief_maps 367 | else [] 368 | ) 369 | selected_gt_kps_raw = [] 370 | selected_gt_kps_netin = [] 371 | 372 | sample_results.append( 373 | ( 374 | input_image_raw, 375 | image_net_input, 376 | selected_belief_maps, 377 | selected_detected_kps_raw, 378 | selected_detected_kps_netin, 379 | selected_gt_kps_raw, 380 | selected_gt_kps_netin, 381 | ) 382 | ) 383 | 384 | # Iterate through inferred results 385 | idx_this_frame = 1 386 | print("Creating visualizations...") 387 | for ( 388 | image_raw, 389 | input_image, 390 | belief_maps, 391 | detected_kp_projs_raw, 392 | detected_kp_projs_net_input, 393 | gt_kp_projs_raw, 394 | gt_kp_projs_net_input, 395 | ) in tqdm(sample_results): 396 | 397 | show_gt_keypoints = (not args.no_ground_truth) and len(gt_kp_projs_raw) > 0 398 | 399 | image_raw_resolution = image_raw.size 400 | net_input_resolution = input_image.size 401 | 402 | if do_kp_overlay_net_input: 403 | videos_to_make[idx_kp_overlay_net_input]["frame"] = input_image 404 | 405 | if do_kp_overlay_raw: 406 | videos_to_make[idx_kp_overlay_raw]["frame"] = image_raw 407 | 408 | if do_kp_belief_overlay_raw: 409 | flattened_belief_tensor = belief_maps.sum(dim=0) 410 | flattened_belief_image = dream.image_proc.image_from_belief_map( 411 | flattened_belief_tensor, colormap="hot", normalization_method=6 412 | ) 413 | flattened_belief_image_netin = dream.image_proc.convert_image_to_netin_from_netout( 414 | flattened_belief_image, net_input_resolution 415 | ) 416 | flattened_belief_image_raw = dream.image_proc.inverse_preprocess_image( 417 | 
flattened_belief_image_netin, image_raw_resolution, image_preprocessing 418 | ) 419 | videos_to_make[idx_kp_belief_overlay_raw]["frame"] = PILImage.blend( 420 | image_raw, flattened_belief_image_raw, alpha=0.5 421 | ) 422 | 423 | # Previous code here, but the overlays don't look as nice 424 | # Note - this seems pretty slow 425 | # I = np.asarray(flattened_belief_image_raw.convert('L')) 426 | # I_black = I < 20 427 | # mask = PILImage.fromarray(np.uint8(255*I_black)) 428 | # temp = PILImage.composite(image_raw, flattened_belief_image_raw, mask) 429 | # videos_to_make[idx_kp_belief_overlay_raw]['frame'] = PILImage.blend(image_raw, temp, alpha=0.75) 430 | # #PILImage.alpha_composite(flattened_belief_image_raw.convert('RGBA'), image_raw.convert('RGBA')) 431 | 432 | if do_belief_overlay_raw: 433 | flattened_belief_tensor = belief_maps.sum(dim=0) 434 | flattened_belief_image = dream.image_proc.image_from_belief_map( 435 | flattened_belief_tensor, colormap="hot", normalization_method=6 436 | ) 437 | flattened_belief_image_netin = dream.image_proc.convert_image_to_netin_from_netout( 438 | flattened_belief_image, net_input_resolution 439 | ) 440 | flattened_belief_image_raw = dream.image_proc.inverse_preprocess_image( 441 | flattened_belief_image_netin, image_raw_resolution, image_preprocessing 442 | ) 443 | videos_to_make[idx_belief_overlay_raw]["frame"] = PILImage.blend( 444 | image_raw, flattened_belief_image_raw, alpha=0.5 445 | ) 446 | 447 | for n in range(n_idx_keypoints): 448 | detected_kp_proj_raw = detected_kp_projs_raw[n, :] 449 | detected_kp_proj_net_input = detected_kp_projs_net_input[n, :] 450 | 451 | if show_gt_keypoints: 452 | gt_kp_proj_raw = gt_kp_projs_raw[n, :] 453 | gt_kp_proj_net_input = gt_kp_projs_net_input[n, :] 454 | 455 | # Overlay 456 | if do_kp_overlay_net_input: 457 | # Heuristic to make point diameter look good for larger raw resolutions 458 | pt_diameter = ( 459 | 12.0 460 | if image_raw_resolution[0] * image_raw_resolution[1] > 500000 461 | else 6.0 462 | ) 463 | if show_gt_keypoints: 464 | videos_to_make[idx_kp_overlay_net_input][ 465 | "frame" 466 | ] = dream.image_proc.overlay_points_on_image( 467 | videos_to_make[idx_kp_overlay_net_input]["frame"], 468 | [gt_kp_proj_net_input], 469 | annotation_color_dot="green", 470 | annotation_color_text="white", 471 | point_thickness=2, 472 | point_diameter=pt_diameter, 473 | ) 474 | 475 | videos_to_make[idx_kp_overlay_net_input][ 476 | "frame" 477 | ] = dream.image_proc.overlay_points_on_image( 478 | videos_to_make[idx_kp_overlay_net_input]["frame"], 479 | [detected_kp_proj_net_input], 480 | annotation_color_dot="red", 481 | annotation_color_text="white", 482 | point_diameter=pt_diameter, 483 | ) 484 | 485 | if do_kp_overlay_raw: 486 | # Heuristic to make point diameter look good for larger raw resolutions 487 | pt_diameter = ( 488 | 12.0 489 | if image_raw_resolution[0] * image_raw_resolution[1] > 500000 490 | else 6.0 491 | ) 492 | 493 | if show_gt_keypoints: 494 | videos_to_make[idx_kp_overlay_raw][ 495 | "frame" 496 | ] = dream.image_proc.overlay_points_on_image( 497 | videos_to_make[idx_kp_overlay_raw]["frame"], 498 | [gt_kp_proj_raw], 499 | annotation_color_dot="green", 500 | annotation_color_text="white", 501 | point_thickness=2, 502 | point_diameter=pt_diameter + 2, 503 | ) 504 | 505 | videos_to_make[idx_kp_overlay_raw][ 506 | "frame" 507 | ] = dream.image_proc.overlay_points_on_image( 508 | videos_to_make[idx_kp_overlay_raw]["frame"], 509 | [detected_kp_proj_raw], 510 | annotation_color_dot="red", 511 | 
annotation_color_text="white", 512 | point_diameter=pt_diameter, 513 | ) 514 | 515 | if do_kp_belief_overlay_raw: 516 | # Heuristic to make point diameter look good for larger raw resolutions 517 | pt_diameter = ( 518 | 12.0 519 | if image_raw_resolution[0] * image_raw_resolution[1] > 500000 520 | else 6.0 521 | ) 522 | 523 | if show_gt_keypoints: 524 | videos_to_make[idx_kp_belief_overlay_raw][ 525 | "frame" 526 | ] = dream.image_proc.overlay_points_on_image( 527 | videos_to_make[idx_kp_belief_overlay_raw]["frame"], 528 | [gt_kp_proj_raw], 529 | annotation_color_dot="green", 530 | annotation_color_text="white", 531 | point_thickness=2, 532 | point_diameter=pt_diameter + 2, 533 | ) 534 | 535 | videos_to_make[idx_kp_belief_overlay_raw][ 536 | "frame" 537 | ] = dream.image_proc.overlay_points_on_image( 538 | videos_to_make[idx_kp_belief_overlay_raw]["frame"], 539 | [detected_kp_proj_raw], 540 | annotation_color_dot="red", 541 | annotation_color_text="white", 542 | point_diameter=pt_diameter, 543 | ) 544 | 545 | frame_output_filename = str(idx_this_frame).zfill(6) + ".png" 546 | for video in videos_to_make: 547 | video["frame"].save( 548 | os.path.join(video["frames_dir"], frame_output_filename) 549 | ) 550 | 551 | idx_this_frame += 1 552 | 553 | # Call to ffmpeg 554 | for video in videos_to_make: 555 | video_from_frames(video["frames_dir"], video["output_path"], args.framerate) 556 | shutil.rmtree(video["frames_dir"]) 557 | 558 | 559 | if __name__ == "__main__": 560 | parser = argparse.ArgumentParser( 561 | formatter_class=argparse.ArgumentDefaultsHelpFormatter 562 | ) 563 | parser.add_argument( 564 | "-i", 565 | "--input-params-path", 566 | required=True, 567 | help="Path to network parameters file.", 568 | ) 569 | parser.add_argument( 570 | "-c", 571 | "--input-config-path", 572 | default=None, 573 | help="Path to network configuration file. If nothing is specified, the script will search for a config file by the same name as the network parameters file.", 574 | ) 575 | parser.add_argument( 576 | "-d", "--dataset-path", required=True, help="Path to dataset to visualize." 577 | ) 578 | parser.add_argument( 579 | "-o", 580 | "--output-dir", 581 | required=True, 582 | help="Path to output directory to save visualization.", 583 | ) 584 | parser.add_argument( 585 | "-f", 586 | "--force-overwrite", 587 | action="store_true", 588 | default=False, 589 | help="Forces overwriting of analysis results in the provided directory.", 590 | ) 591 | parser.add_argument( 592 | "-k", 593 | "--keypoint-ids", 594 | nargs="+", 595 | type=int, 596 | default=None, 597 | help="List of keypoint indices to use in the visualization. No input (None) means all keypoints will be used.", 598 | ) 599 | parser.add_argument( 600 | "-not-gt", 601 | "--no-ground-truth", 602 | action="store_true", 603 | default=False, 604 | help="Do not overlay ground truth keypoints when the dataset has ground truth.", 605 | ) 606 | parser.add_argument( 607 | "-v", 608 | "--visualization-types", 609 | nargs="+", 610 | choices=[ 611 | KP_OVERLAY_RAW, 612 | KP_OVERLAY_NET_INPUT, 613 | KP_BELIEF_OVERLAY_RAW, 614 | BELIEF_OVERLAY_RAW, 615 | ], 616 | default=[ 617 | KP_OVERLAY_RAW, 618 | KP_OVERLAY_NET_INPUT, 619 | KP_BELIEF_OVERLAY_RAW, 620 | BELIEF_OVERLAY_RAW, 621 | ], 622 | help="Specify the visulizations to generate: 'kp_raw' overlays keypoints onto the original input image. 'kp_net_input' overlays keypoints onto the input image to the network. 
Multiple analyses can be specified.", 623 | ) 624 | parser.add_argument( 625 | "-fps", 626 | "--framerate", 627 | type=float, 628 | default=30.0, 629 | help="Framerate (frames/sec) for the video that is created from the dataset.", 630 | ) 631 | parser.add_argument( 632 | "-s", 633 | "--start-frame", 634 | default=None, 635 | help='Start frame of the dataset to visualize. Use the name, not the file extension (e.g., "-s 001022", not "-s 001022.png").', 636 | ) 637 | parser.add_argument( 638 | "-e", 639 | "--end-frame", 640 | default=None, 641 | help='End frame of the dataset to visualize. Use the name, not the file extension (e.g., "-e 001022", not "-e 001022.png").', 642 | ) 643 | parser.add_argument( 644 | "-g", 645 | "--gpu-ids", 646 | nargs="+", 647 | type=int, 648 | default=None, 649 | help="The GPU IDs on which to conduct network inference. Nothing specified means all GPUs will be utilized. Does not affect results, only how quickly the results are obtained.", 650 | ) 651 | parser.add_argument( 652 | "-p", 653 | "--image-preproc-override", 654 | default=None, 655 | help="Overrides the image preprocessing specified by the network. (Debug argument.)", 656 | ) 657 | args = parser.parse_args() 658 | visualize_network_inference(args) 659 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import io 6 | import os 7 | import re 8 | from setuptools import setup, find_packages 9 | 10 | # This method was adapted from code in 11 | # https://github.com/albumentations-team/albumentations 12 | def get_version(): 13 | current_dir = os.path.abspath(os.path.dirname(__file__)) 14 | version_file = os.path.join(current_dir, "dream", "__init__.py") 15 | with io.open(version_file, encoding="utf-8") as f: 16 | return re.search(r'^__version__ = [\'"]([^\'"]*)[\'"]', f.read(), re.M).group(1) 17 | 18 | setup( 19 | name='dream', 20 | version=get_version(), 21 | author='NVIDIA', 22 | author_email='sbirchfield@nvidia.com', 23 | maintainer='Timothy Lee', 24 | maintainer_email='timothyelee@cmu.edu', 25 | description='Deep Robot-to-camera Extrinsics for Articulated Manipulators', 26 | packages=['dream'], 27 | package_dir={'dream': 'dream'}, 28 | zip_safe=False 29 | ) 30 | -------------------------------------------------------------------------------- /test/_camera_settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "camera_settings": [ 3 | { 4 | "id": "", 5 | "name": "", 6 | "intrinsic_settings": 7 | { 8 | "fx": 160, 9 | "fy": 160, 10 | "cx": 160, 11 | "cy": 120, 12 | "s": 0, 13 | "hfov": 90, 14 | "resolution": 15 | { 16 | "width": 320, 17 | "height": 240 18 | } 19 | }, 20 | "captured_image_size": 21 | { 22 | "width": 320, 23 | "height": 240 24 | } 25 | } 26 | ] 27 | } -------------------------------------------------------------------------------- /test/test_image_proc.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. 
Full 3 | # text can be found in LICENSE.md 4 | 5 | import numpy as np 6 | import torch 7 | 8 | # Needed to prevent pytest from raising a warning when importing torchvision via dream 9 | import warnings 10 | 11 | with warnings.catch_warnings(): 12 | warnings.filterwarnings("ignore", category=DeprecationWarning) 13 | import imp 14 | 15 | from dream import image_proc 16 | 17 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 18 | 19 | 20 | def test_shrink_resolution(): 21 | 22 | # Testing that (640 x 480) shrinks to (533 x 400) using a (400, 400) reference 23 | input_resolution = (640, 480) 24 | ref_resolution = (400, 400) 25 | shrink_res = image_proc.shrink_resolution(input_resolution, ref_resolution) 26 | shrink_res_gt = (533, 400) 27 | assert shrink_res == shrink_res_gt 28 | 29 | # Testing that shrink is a no-op if the resolution is the same 30 | ref_resolution_same = input_resolution 31 | shrink_res_same = image_proc.shrink_resolution( 32 | input_resolution, ref_resolution_same 33 | ) 34 | assert shrink_res_same == input_resolution 35 | 36 | 37 | def test_shrink_and_crop_resolution(): 38 | 39 | # Testing with (640 x 480) input with (400, 400) reference 40 | input_resolution = (640, 480) 41 | ref_resolution = (400, 400) 42 | input_cropped_res, cropped_coords = image_proc.shrink_and_crop_resolution( 43 | input_resolution, ref_resolution 44 | ) 45 | 46 | input_cropped_res_gt = (480, 480) 47 | cropped_coords_gt = (80, 0) 48 | 49 | assert input_cropped_res == input_cropped_res_gt 50 | assert cropped_coords == cropped_coords_gt 51 | 52 | # Testing that shrink-and-crop is a no-op if the resolution is the same 53 | ref_resolution_same = input_resolution 54 | input_cropped_res_same, cropped_coords_same = image_proc.shrink_and_crop_resolution( 55 | input_resolution, ref_resolution_same 56 | ) 57 | cropped_coords_same_gt = (0, 0) 58 | 59 | assert input_cropped_res_same == input_resolution 60 | assert cropped_coords_same == cropped_coords_same_gt 61 | 62 | 63 | def test_resolution_after_preprocessing(): 64 | 65 | # Testing with (640 x 480) input with (400, 400) reference 66 | input_resolution = (640, 480) 67 | ref_resolution = (400, 400) 68 | 69 | preproc_res_none = image_proc.resolution_after_preprocessing( 70 | input_resolution, ref_resolution, "none" 71 | ) 72 | preproc_res_none_gt = input_resolution 73 | assert preproc_res_none == preproc_res_none_gt 74 | 75 | preproc_res_resize = image_proc.resolution_after_preprocessing( 76 | input_resolution, ref_resolution, "resize" 77 | ) 78 | preproc_res_resize_gt = ref_resolution 79 | assert preproc_res_resize == preproc_res_resize_gt 80 | 81 | preproc_res_shrink = image_proc.resolution_after_preprocessing( 82 | input_resolution, ref_resolution, "shrink" 83 | ) 84 | preproc_res_shrink_gt = (533, 400) 85 | assert preproc_res_shrink == preproc_res_shrink_gt 86 | 87 | preproc_res_snc = image_proc.resolution_after_preprocessing( 88 | input_resolution, ref_resolution, "shrink-and-crop" 89 | ) 90 | preproc_res_snc_gt = ref_resolution 91 | assert preproc_res_snc == preproc_res_snc_gt 92 | 93 | 94 | def test_belief_maps(): 95 | 96 | # Generate belief maps 97 | belief_map_resolution = (80, 60) 98 | kp_proj = np.array([65.0, 20.0]) 99 | kp_proj_out_of_frame = np.array( 100 | [belief_map_resolution[0] + 20.0, belief_map_resolution[1] + 20.0] 101 | ) 102 | belief_maps = image_proc.create_belief_map( 103 | belief_map_resolution, [kp_proj, kp_proj_out_of_frame] 104 | ) 105 | 106 | belief_maps_as_tensor = torch.tensor(belief_maps).float() 
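# The belief maps are converted to a float torch tensor above so that the peak extraction further below can be run on them.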
107 | # Uncomment below for showing visuals 108 | # image_proc.image_from_belief_map(belief_maps_as_tensor[0]).show() 109 | # image_proc.image_from_belief_map(belief_maps_as_tensor[1]).show() 110 | 111 | # Detect keypoints in belief maps 112 | all_peaks = image_proc.peaks_from_belief_maps(belief_maps_as_tensor, 0.0) 113 | 114 | # First belief map should only contain one detected keypoint, the ground truth 115 | assert len(all_peaks[0]) == 1 116 | kp_proj_detected = np.array(all_peaks[0][0][:2]) 117 | assert np.linalg.norm(kp_proj - kp_proj_detected) < 1.0e-3 118 | 119 | # Second belief map, for out-of-frame keypoint, should return no peaks 120 | assert len(all_peaks[1]) == 0 121 | -------------------------------------------------------------------------------- /test/test_utilities.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020 NVIDIA Corporation. All rights reserved. 2 | # This work is licensed under the NVIDIA Source Code License - Non-commercial. Full 3 | # text can be found in LICENSE.md 4 | 5 | import os 6 | 7 | import numpy as np 8 | 9 | # Needed to prevent pytest from raising a warning when importing torchvision via dream 10 | import warnings 11 | 12 | with warnings.catch_warnings(): 13 | warnings.filterwarnings("ignore", category=DeprecationWarning) 14 | import imp 15 | 16 | from dream import utilities as dream_utils 17 | 18 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 19 | 20 | 21 | def test_load_camera_intrinsics(): 22 | 23 | # Provide the camera setting json that exists in this directory 24 | this_script_dir = os.path.dirname(os.path.realpath(__file__)) 25 | camera_data_path = os.path.join(this_script_dir, "_camera_settings.json") 26 | camera_K = dream_utils.load_camera_intrinsics(camera_data_path) 27 | 28 | # Construct the ground truth camera matrix 29 | cam_fx = 160.0 30 | cam_fy = 160.0 31 | cam_cx = 160.0 32 | cam_cy = 120.0 33 | 34 | camera_K_gt = np.array( 35 | [[cam_fx, 0.0, cam_cx], [0.0, cam_fy, cam_cy], [0.0, 0.0, 1.0]] 36 | ) 37 | 38 | # Test assertion 39 | assert np.all(camera_K == camera_K_gt) 40 | 41 | 42 | def test_load_image_resolution(): 43 | 44 | # Provide the camera setting json that exists in this directory 45 | this_script_dir = os.path.dirname(os.path.realpath(__file__)) 46 | camera_data_path = os.path.join(this_script_dir, "_camera_settings.json") 47 | image_resolution = dream_utils.load_image_resolution(camera_data_path) 48 | 49 | # Construct the ground truth image resolution 50 | image_resolution_gt = (320, 240) 51 | 52 | # Test assertion 53 | assert image_resolution == image_resolution_gt 54 | -------------------------------------------------------------------------------- /trained_models/DOWNLOAD.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # This script makes it easy to download pretrained model weights. 4 | # (Files are at https://drive.google.com/open?id=1Krp-fCT9ffEML3IpweSOgWiMHHBw6k2Z) 5 | 6 | # Each robot contains two files (.pth and .yaml). 7 | # Comment out any model you do not want. 
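# Note: the downloads below use the gdown tool (install with: pip install gdown).
# If a download fails, the files can also be fetched manually from the Google Drive folder linked above.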
8 | 9 | #----- Rethink Robotics' Baxter 10 | 11 | # Baxter VGG-Q (baxter_dream_vgg_q) 12 | gdown --id 1Ia4UxSdilXH9SwyPqem0rS13Mha9pN7F # .pth (85 MB) 13 | gdown --id 1TNhYuOm_-UH5z1rEVm16mnRA7hB7AT1X # .yaml 14 | 15 | 16 | #----- Kuka LBR iiwa 7 R800 17 | 18 | # Kuka ResNet-H (kuka_dream_resnet_h) 19 | gdown --id 1Ctoh01q1IvLHP9pf5Os8eIzJ8fQBgYpJ # .pth (207 MB) 20 | gdown --id 1MLWDTq7yQF9UeV1T3REDk60GYne32OXJ # .yaml 21 | 22 | 23 | #----- Franka Emika Panda 24 | 25 | # Panda VGG-Q (panda_dream_vgg_q) -- recommended 26 | gdown --id 1zS-kQ73dOYMXS8Wku_OUN0q7MvEUm2fZ # .pth (85 MB) 27 | gdown --id 1MKDiknxDzXErd4Gwdv0uMoL65IYjxO0Q # .yaml 28 | 29 | # Panda VGG-F (panda_dream_vgg_f) 30 | gdown --id 1pz-gXux8TxB4pOYnYy5DH7vp-3-mTJFu # .pth (86 MB) 31 | gdown --id 191Pgu_C0qzKpOSoicOOSLq-bR7cg2KVO # .yaml 32 | 33 | # Panda ResNet-H (panda_dream_resnet_h) 34 | gdown --id 16fyv6ps3om0H8dnXRDHj0w4dfEKPSpDW # .pth (207 MB) 35 | gdown --id 1gCpigRIqm1rAw-o7oXpRO2ZTQkHyYF-k # .yaml 36 | 37 | # Panda ResNet-F (panda_dream_resnet_f) -- difficult to train 38 | gdown --id 1d8UfrgQb4ohIAfpRGvDBjabSKuP9LCpy # .pth (211 MB) 39 | gdown --id 1IWdXSmmIq2-eimtNK_ywJZRH4omesSDq # .yaml 40 | --------------------------------------------------------------------------------