├── .docs
│   └── figures
│       ├── teaser.pdf
│       ├── teaser.png
│       ├── architecture.pdf
│       ├── architecture.png
│       ├── multiple_views.png
│       ├── reconstructions.png
│       ├── results_generation.pdf
│       ├── results_generation.png
│       ├── results_inpainting.pdf
│       ├── results_inpainting.png
│       ├── depth_reconstructions.png
│       ├── results_reconstruction.pdf
│       ├── results_reconstruction.png
│       ├── results_reconstruction_2.pdf
│       ├── results_reconstruction_2.png
│       ├── results_reconstruction_ood.pdf
│       └── results_reconstruction_ood.png
├── LICENSE
└── README.md

/.docs/figures/teaser.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/teaser.pdf
--------------------------------------------------------------------------------
/.docs/figures/teaser.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/teaser.png
--------------------------------------------------------------------------------
/.docs/figures/architecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/architecture.pdf
--------------------------------------------------------------------------------
/.docs/figures/architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/architecture.png
--------------------------------------------------------------------------------
/.docs/figures/multiple_views.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/multiple_views.png
--------------------------------------------------------------------------------
/.docs/figures/reconstructions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/reconstructions.png
--------------------------------------------------------------------------------
/.docs/figures/results_generation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_generation.pdf
--------------------------------------------------------------------------------
/.docs/figures/results_generation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_generation.png
--------------------------------------------------------------------------------
/.docs/figures/results_inpainting.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_inpainting.pdf
--------------------------------------------------------------------------------
/.docs/figures/results_inpainting.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_inpainting.png
--------------------------------------------------------------------------------
/.docs/figures/depth_reconstructions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/depth_reconstructions.png
--------------------------------------------------------------------------------
/.docs/figures/results_reconstruction.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_reconstruction.pdf
--------------------------------------------------------------------------------
/.docs/figures/results_reconstruction.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_reconstruction.png
--------------------------------------------------------------------------------
/.docs/figures/results_reconstruction_2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_reconstruction_2.pdf
--------------------------------------------------------------------------------
/.docs/figures/results_reconstruction_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_reconstruction_2.png
--------------------------------------------------------------------------------
/.docs/figures/results_reconstruction_ood.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_reconstruction_ood.pdf
--------------------------------------------------------------------------------
/.docs/figures/results_reconstruction_ood.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Anciukevicius/RenderDiffusion/HEAD/.docs/figures/results_reconstruction_ood.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 Titas Anciukevičius

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation

*Titas Anciukevičius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J. Mitra, Paul Guerrero*

![Teaser image](.docs/figures/teaser.png)

**Abstract**: Diffusion models currently achieve state-of-the-art performance for both conditional and unconditional image generation.
However, image diffusion models so far do not support tasks required for 3D understanding, such as view-consistent 3D generation or single-view object reconstruction.
In this paper, we present RenderDiffusion, the first diffusion model for 3D generation and inference that can be trained using only monocular 2D supervision.
At the heart of our method is a novel image-denoising architecture that generates and renders an intermediate three-dimensional representation of a scene in each denoising step.
This imposes a strong inductive structure on the diffusion process, giving us a 3D-consistent representation while requiring only 2D supervision.
The resulting 3D representation can be rendered from any viewpoint.
We evaluate RenderDiffusion on the ShapeNet and CLEVR datasets and show competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images. Additionally, our diffusion-based approach allows us to use 2D inpainting to edit 3D scenes. We believe that our work promises to enable full 3D generation at scale when trained on massive image collections, thus circumventing the need for large-scale 3D model collections as supervision.

## Model

Our method builds on the successful training and generation setup of 2D image diffusion models, which are trained to denoise input images with various amounts of added noise. At test time, novel images are generated by applying the model in multiple steps to progressively recover an image, starting from pure noise. We keep this training and generation setup, but modify the architecture of the denoiser to encode the noisy input image into a 3D representation of the scene that is volumetrically rendered to obtain the denoised output image. This introduces an inductive bias that favors 3D scene consistency and allows us to render the 3D representation from novel viewpoints. The figure below shows an overview of our architecture.

![model](.docs/figures/architecture.png)
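To make this concrete, below is a minimal PyTorch-style sketch of a 3D-aware denoiser of this kind. It is an illustration of the idea, not the released implementation: the encoder layout, feature dimensions, ray-sampling bounds, and the omission of timestep conditioning are all simplifying assumptions.

```python
# Illustrative sketch (not the released code): a denoiser that encodes the
# noisy image into triplane features and volumetrically renders them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriplaneDenoiser(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        self.feat_dim = feat_dim
        # Hypothetical image-to-triplane encoder: predicts three axis-aligned
        # feature planes (xy, xz, yz) from the noisy input image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * feat_dim, 3, padding=1),
        )
        # Small MLP decoding a sampled triplane feature to (density, rgb).
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 4),
        )

    def sample_triplanes(self, planes, pts):
        # planes: (B, 3, C, H, W); pts: (B, N, 3) in [-1, 1]^3.
        # Project each point onto the three planes, bilinearly sample
        # features, and sum the contributions.
        feats = 0
        for i, uv in enumerate((pts[..., [0, 1]], pts[..., [0, 2]], pts[..., [1, 2]])):
            f = F.grid_sample(planes[:, i], uv.unsqueeze(2), align_corners=True)
            feats = feats + f.squeeze(-1).transpose(1, 2)  # (B, N, C)
        return feats

    def forward(self, x_t, rays_o, rays_d, n_samples=32, near=0.1, far=2.0):
        # x_t: noisy image (B, 3, H, W); rays_o, rays_d: camera rays (B, R, 3).
        # Timestep conditioning is omitted here for brevity.
        B, _, H, W = x_t.shape
        planes = self.encoder(x_t).view(B, 3, self.feat_dim, H, W)
        # Sample points along each ray and decode density and color.
        z = torch.linspace(near, far, n_samples, device=x_t.device)
        pts = rays_o[:, :, None] + rays_d[:, :, None] * z[None, None, :, None]
        out = self.mlp(self.sample_triplanes(planes, pts.reshape(B, -1, 3)))
        out = out.view(B, -1, n_samples, 4)
        sigma, rgb = F.relu(out[..., 0]), torch.sigmoid(out[..., 1:])
        # Standard alpha compositing along the ray (uniform bin size).
        alpha = 1 - torch.exp(-sigma * (far - near) / n_samples)
        trans = torch.cumprod(1 - alpha + 1e-10, dim=-1)
        trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
        weights = alpha * trans
        return (weights[..., None] * rgb).sum(dim=-2)  # denoised ray colors (B, R, 3)
```

Because the denoised output is produced by rendering the triplanes, the same triplanes can be re-rendered from any other camera, which is what enables the 3D tasks below.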
## Results

We evaluate RenderDiffusion on three tasks: monocular 3D reconstruction, unconditional generation, and 3D-aware inpainting.

### 3D reconstruction

Unlike existing 2D diffusion models, RenderDiffusion can reconstruct 3D scenes from 2D images. To reconstruct the scene shown in an input image $\mathbf{x}_0$, we pass it through the forward process for $t_r \le T$ steps and then denoise it in the reverse process using our learned denoiser $g_\theta$. In the final denoising step, the triplanes encode a 3D scene that can be rendered from novel viewpoints. The choice of $t_r$ introduces an interesting control that is not available in existing 3D reconstruction methods: it lets us trade off reconstruction fidelity against generalization to out-of-distribution input images. At $t_r=0$, no noise is added to the input image and the 3D reconstruction reproduces the scene shown in the input image as accurately as possible; however, out-of-distribution images cannot be handled. With larger values of $t_r$, increasingly out-of-distribution input images can be handled, as the denoiser can move them towards the learned distribution. This comes at the cost of reduced reconstruction fidelity, since the added noise removes some detail from the input image, which the denoiser fills in with generated content.
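The procedure itself is compact. The following is an illustrative sketch using standard DDPM updates; `model` stands for a 3D-aware denoiser like the one sketched above (predicting the clean image directly), `betas` is a DDPM noise schedule of length $T$, and all names are assumptions rather than the released API.

```python
# Illustrative sketch of 3D reconstruction with a controllable noise level t_r.
import torch

@torch.no_grad()
def reconstruct(model, x0, rays_o, rays_d, betas, t_r):
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    def denoise(x_t):
        # Rays cover the H x W image grid in row-major order, so the
        # rendered ray colors can be reshaped back to image layout.
        return model(x_t, rays_o, rays_d).transpose(1, 2).reshape_as(x_t)

    if t_r == 0:
        # No added noise: a single pass reproduces the input scene as
        # faithfully as possible (but cannot handle OOD inputs).
        return denoise(x0)

    # Forward process: jump to step t_r in closed form,
    # x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
    eps = torch.randn_like(x0)
    x_t = alpha_bar[t_r - 1].sqrt() * x0 + (1 - alpha_bar[t_r - 1]).sqrt() * eps

    # Reverse process: denoise from t_r back to 0. The triplanes produced
    # in the final step encode the reconstructed 3D scene.
    for t in reversed(range(t_r)):
        x0_pred = denoise(x_t)
        if t == 0:
            return x0_pred
        abar_t, abar_prev = alpha_bar[t], alpha_bar[t - 1]
        # Posterior q(x_{t-1} | x_t, x_0) with the predicted x_0 plugged in.
        mean = (betas[t] * abar_prev.sqrt() / (1 - abar_t) * x0_pred
                + (1 - abar_prev) * alphas[t].sqrt() / (1 - abar_t) * x_t)
        var = betas[t] * (1 - abar_prev) / (1 - abar_t)
        x_t = mean + var.sqrt() * torch.randn_like(x_t)
```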
![inference](.docs/figures/results_reconstruction.png)

![inference2](.docs/figures/results_reconstruction_2.png)

Using a 3D-aware denoiser allows us to reconstruct a 3D scene from noisy images, where information that is lost to the noise is filled in with generated content. By adding more noise, we can generalize to input images that are increasingly out-of-distribution, at the cost of reconstruction fidelity. In the figure below, we show 3D reconstructions from photos whose backgrounds and materials differ significantly from the images seen at training time. Results with added noise ($t_r=40$) generalize better than results without added noise ($t_r=0$), at the cost of less accurate shapes and poses of the reconstructed models.

![ood_inference](.docs/figures/results_reconstruction_ood.png)

### Unconditional Generation

Below we show qualitative results for unconditional generation.

![generation](.docs/figures/results_generation.png)

### 3D-aware inpainting

Lastly, we apply our trained model to the task of inpainting masked 2D regions of an image while simultaneously reconstructing the 3D shape it shows.
We follow an approach similar to [RePaint](https://github.com/andreas128/RePaint), but use our 3D denoiser instead of their 2D architecture.
Specifically, we condition the denoising iterations on the known regions of the image by setting $\mathbf{x}_{t-1}$ in the known regions to the noised target pixels, while sampling the unknown regions as usual based on $\mathbf{x}_t$.
Thus, the model performs 3D-aware inpainting, finding a latent 3D structure that is consistent with the observed part of the image and plausible in the masked part.
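Under the same assumptions as the sketches above, one such conditioned reverse step might look as follows; `mask` is 1 on observed pixels and 0 on the region to inpaint, and all names are illustrative.

```python
# Illustrative sketch of one RePaint-style conditioned reverse step.
import torch

@torch.no_grad()
def inpaint_step(model, x_t, t, x0_known, mask, rays_o, rays_d, betas):
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    # Ordinary reverse step for the whole image, as in `reconstruct` above.
    x0_pred = model(x_t, rays_o, rays_d).transpose(1, 2).reshape_as(x_t)
    if t == 0:
        x_prev = x0_pred
        known = x0_known  # at t = 0 the known region is the clean target
    else:
        abar_t, abar_prev = alpha_bar[t], alpha_bar[t - 1]
        mean = (betas[t] * abar_prev.sqrt() / (1 - abar_t) * x0_pred
                + (1 - abar_prev) * alphas[t].sqrt() / (1 - abar_t) * x_t)
        var = betas[t] * (1 - abar_prev) / (1 - abar_t)
        x_prev = mean + var.sqrt() * torch.randn_like(x_t)
        # Known region: the target pixels noised to level t-1, so both
        # regions carry a consistent amount of noise.
        eps = torch.randn_like(x0_known)
        known = (alpha_bar[t - 1].sqrt() * x0_known
                 + (1 - alpha_bar[t - 1]).sqrt() * eps)

    # Overwrite the known region; the unknown region keeps the sampled values.
    return mask * known + (1 - mask) * x_prev
```

Iterating this step from pure noise down to $t=0$ produces an image whose observed region matches the input, while the masked region is filled with content drawn from a single consistent underlying 3D scene.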
![inpainting](.docs/figures/results_inpainting.png)

## Code

To aid reproducibility, we will soon release our datasets, code, and checkpoints.

## Related work

Check out related prior and concurrent work:
* [PixelNeRF](https://github.com/sxyu/pixel-nerf) is a non-generative method for inferring implicit 3D representations.
* [EG3D](https://github.com/NVlabs/eg3d) is a generative 3D model based on GANs with a triplane representation.
* Concurrently, [GAUDI](https://github.com/apple/ml-gaudi) presents a diffusion model for generating 3D camera paths and up to 300 scenes. However, unlike ours, it requires two stages of training, with the diffusion model operating only on a latent space. In contrast, our diffusion model is defined directly over pixels, which enables exciting applications such as refinement of generated images and inpainting.

## Citation

```
@article{anciukevicius2022renderdiffusion,
  title   = {{RenderDiffusion}: Image Diffusion for {3D} Reconstruction, Inpainting and Generation},
  author  = {Titas Anciukevicius and Zexiang Xu and Matthew Fisher and Paul Henderson and Hakan Bilen and Niloy J. Mitra and Paul Guerrero},
  year    = 2022,
  journal = {arXiv}
}
```
--------------------------------------------------------------------------------