├── .gitattributes ├── .gitignore ├── LICENSE ├── README.md ├── assets ├── lll++_overview.png └── overview.png ├── configs ├── image_branch.yaml ├── lalaloc_pp │ ├── plan_branch.yaml │ └── transformer_image_branch.yaml └── layout_branch.yaml ├── lalaloc ├── __init__.py ├── config │ ├── __init__.py │ ├── defaults.py │ └── parser.py ├── data │ ├── __init__.py │ ├── dataset.py │ ├── load.py │ ├── split.py │ └── transform.py ├── model │ ├── __init__.py │ ├── lalaloc.py │ ├── lalaloc_base.py │ ├── lalaloc_pp.py │ ├── lalaloc_pp_base.py │ ├── losses.py │ ├── modules.py │ ├── pose_optimisation.py │ ├── position_encoding.py │ ├── transformer.py │ └── unet.py └── utils │ ├── chamfer.py │ ├── eval.py │ ├── floorplan.py │ ├── panorama.py │ ├── polygons.py │ ├── projection.py │ ├── render.py │ └── vogel_disc.py ├── train.py └── trained_models ├── .gitattributes ├── lalaloc_image2layout.ckpt ├── lalaloc_layout2layout.ckpt ├── lalaloc_pp_image2plan.ckpt └── lalaloc_pp_planbranch.ckpt /.gitattributes: -------------------------------------------------------------------------------- 1 | *.ckpt filter=lfs diff=lfs merge=lfs -text 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ 2 | .vscode/ 3 | runs/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Henry Howard-Jenkins @ Active Vision Laboratory, Oxford 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | This is the code repository for LaLaLoc and LaLaLoc++. 4 | 5 | 6 | * We currently provide: 7 | * Training and evaluation code for LaLaLoc, for both the Image-to-Layout and Layout-to-Layout configurations. 8 | * Training and evaluation code for LaLaLoc++'s plan and image branches. 9 | * Pretrained models for all the provided configs. 10 | 11 | ## LaLaLoc++: Global Floor Plan Comprehension for Layout Localisation in Unvisited Environments 12 | **Henry Howard-Jenkins and Victor Adrian Prisacariu** 13 | **(ECCV 2022)** 14 | 15 | [Project Page](https://lalalocpp.active.vision) | Paper(coming soon!) 
16 | 17 | ![LaLaLoc++ Overview](assets/lll++_overview.png) 18 | 19 | ## LaLaLoc: Latent Layout Localisation in Dynamic, Unvisited Environments 20 | **Henry Howard-Jenkins, Jose-Raul Ruiz-Sarmiento and Victor Adrian Prisacariu** 21 | **(ICCV 2021)** 22 | 23 | [Project Page](https://lalaloc.active.vision) | [Paper](https://arxiv.org/abs/2104.09169) 24 | 25 | ![LaLaLoc Overview](assets/overview.png) 26 | 27 | 28 | # Setup 29 | ## Installing Requirements 30 | 31 | * Create conda environment: 32 | ``` 33 | conda create -n lalaloc python==3.8 34 | conda activate lalaloc 35 | ``` 36 | * Install PyTorch: 37 | ``` 38 | conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch 39 | ``` 40 | * Install Pytorch Lightning: 41 | ``` 42 | conda install -c conda-forge pytorch-lightning==1.1.5 43 | ``` 44 | * Install Pytorch3d: 45 | ``` 46 | conda install -c fvcore -c iopath -c conda-forge fvcore iopath 47 | conda install -c bottler nvidiacub 48 | conda install -c pytorch3d pytorch3d==0.4.0 49 | ``` 50 | * Install Pymesh 51 | * Follow build and install instructions: https://github.com/PyMesh/PyMesh 52 | * Install Redner and OpenCV: 53 | ``` 54 | pip install redner-gpu opencv-python 55 | ``` 56 | * Install Scikit-Learn: 57 | ``` 58 | conda install -c anaconda scikit-learn 59 | ``` 60 | 61 | ## Download the Structured3D Dataset 62 | * Information provided here: https://github.com/bertjiazheng/Structured3D 63 | 64 | # Usage 65 | ### Layout/Plan Branch 66 | * Train LaLaLoc's layout branch or LaLaLoc++'s plan branch. 67 | ``` 68 | # LaLaLoc layout branch 69 | python train.py -c configs/layout_branch.yaml \ 70 | DATASET.PATH [path/to/dataset] 71 | ``` 72 | ``` 73 | # LaLaLoc++ plan branch 74 | python train.py -c configs/lalaloc_pp/plan_branch.yaml \ 75 | DATASET.PATH [path/to/dataset] 76 | ``` 77 | * Test LaLaLoc's layout branch: 78 | * Perform evaluation of the trained layout branch on a sampled grid of 0.5m with VDR and LPO. 79 | 80 | Note: Testing LaLaLoc++'s plan branch isn't particularly meaningful. 81 | ``` 82 | python train.py -c configs/layout_branch.yaml -t [path/to/checkpoint] \ 83 | DATASET.PATH [path/to/dataset] \ 84 | SYSTEM.NUM_GPUS 1 \ 85 | TEST.VOGEL_DISC_REFINE True \ 86 | TEST.LATENT_POSE_OPTIMISATION True \ 87 | TEST.POSE_SAMPLE_STEP 500 88 | ``` 89 | 90 | ### Image Branch 91 | * Train the image branch for LaLaLoc and LaLaLoc++ 92 | * Perform training of the image branch with the layout/plan branch from a previous training run. 
93 | ``` 94 | # LaLaLoc image branch 95 | python train.py -c configs/image_branch.yaml \ 96 | DATASET.PATH [path/to/dataset] \ 97 | TRAIN.SOURCE_WEIGHTS [path/to/layout_branch_checkpoint] 98 | ``` 99 | ``` 100 | # LaLaLoc++ image branch 101 | python train.py -c configs/lalaloc_pp/transformer_image_branch.yaml \ 102 | DATASET.PATH [path/to/dataset] \ 103 | TRAIN.SOURCE_WEIGHTS [path/to/plan_branch_checkpoint] 104 | ``` 105 | 106 | * Test the image branch: 107 | ``` 108 | # LaLaLoc image branch 109 | python train.py -c configs/image_branch.yaml -t [path/to/checkpoint] \ 110 | DATASET.PATH [path/to/dataset] \ 111 | SYSTEM.NUM_GPUS 1 \ 112 | TEST.VOGEL_DISC_REFINE True \ 113 | TEST.LATENT_POSE_OPTIMISATION True \ 114 | TEST.POSE_SAMPLE_STEP 500 115 | ``` 116 | ``` 117 | # LaLaLoc++ image branch 118 | python train.py -c configs/lalaloc_pp/transformer_image_branch.yaml -t [path/to/checkpoint] \ 119 | DATASET.PATH [path/to/dataset] \ 120 | SYSTEM.NUM_GPUS 1 121 | ``` 122 | 123 | # Citations 124 | ``` 125 | @inproceedings{howard2022lalaloc++, 126 | title={LaLaLoc++: Global Floor Plan Comprehension for Layout Localisation in Unvisited Environments}, 127 | author={Howard-Jenkins, Henry and Prisacariu, Victor Adrian}, 128 | booktitle={Proceedings of the European Conference on Computer Vision}, 129 | pages={}, 130 | year={2022} 131 | } 132 | ``` 133 | ``` 134 | @inproceedings{howard2021lalaloc, 135 | title={Lalaloc: Latent layout localisation in dynamic, unvisited environments}, 136 | author={Howard-Jenkins, Henry and Ruiz-Sarmiento, Jose-Raul and Prisacariu, Victor Adrian}, 137 | booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, 138 | pages={10107--10116}, 139 | year={2021} 140 | } 141 | ``` -------------------------------------------------------------------------------- /assets/lll++_overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ActiveVisionLab/LaLaLoc/be48bc1884722409eeb6766e55fafa8ece521c9c/assets/lll++_overview.png -------------------------------------------------------------------------------- /assets/overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ActiveVisionLab/LaLaLoc/be48bc1884722409eeb6766e55fafa8ece521c9c/assets/overview.png -------------------------------------------------------------------------------- /configs/image_branch.yaml: -------------------------------------------------------------------------------- 1 | OUT_DIR: "./runs/lalaloc/image_branch" 2 | SYSTEM: 3 | NUM_GPUS: 2 4 | DISTRIBUTED_BACKEND: "ddp" 5 | NUM_WORKERS: 0 6 | MODEL: 7 | QUERY_TYPE: "image" 8 | PANORAMA_BACKBONE: "resnet50" 9 | LAYOUT_BACKBONE: "resnet18" 10 | NORMALISE_EMBEDDING: True 11 | TRAIN: 12 | SOURCE_WEIGHTS: "" 13 | LR_MILESTONES: [100, 150] 14 | NUM_EPOCHS: 200 15 | NEAR_MAX_DIST: 0.5 16 | LOSS: "cos" 17 | INITIAL_LR: 0.1 18 | BATCH_SIZE: 32 19 | TEST_EVERY: 20 20 | NUM_NEAR_SAMPLES: 1 21 | NUM_FAR_SAMPLES: 0 22 | TEST: 23 | POSE_SAMPLE_STEP: 1000 24 | LAYOUTS_MAX_BATCH: 256 25 | RENDER: 26 | BATCH_SIZE: 1024 27 | IMG_SIZE: (128, 256) 28 | POSE_REFINE: 29 | RENDER_SIZE: (128, 256) 30 | -------------------------------------------------------------------------------- /configs/lalaloc_pp/plan_branch.yaml: -------------------------------------------------------------------------------- 1 | OUT_DIR: "./runs/lalaloc_pp/plan_branch" 2 | SYSTEM: 3 | NUM_GPUS: 1 4 | DISTRIBUTED_BACKEND: "ddp" 5 | NUM_WORKERS: 16 6 | MODEL: 7 | TYPE: 
"lalaloc++" 8 | QUERY_TYPE: "layout" 9 | NORMALISE_EMBEDDING: True 10 | NORMALISE_SAMPLE: True 11 | TRAIN: 12 | LR_MILESTONES: [50, 75] 13 | NUM_EPOCHS: 100 14 | INITIAL_LR: 0.05 15 | BATCH_SIZE: 16 16 | TEST_EVERY: 10 17 | NUM_FAR_SAMPLES: 20 18 | COMPUTE_GT_DIST: False 19 | TEST: 20 | POSE_SAMPLE_STEP: 1000 21 | RENDER: 22 | BATCH_SIZE: 1024 23 | IMG_SIZE: (128, 256) -------------------------------------------------------------------------------- /configs/lalaloc_pp/transformer_image_branch.yaml: -------------------------------------------------------------------------------- 1 | OUT_DIR: "./runs/lalaloc_pp/image_branch" 2 | SYSTEM: 3 | NUM_GPUS: 1 4 | DISTRIBUTED_BACKEND: "ddp" 5 | NUM_WORKERS: 16 6 | MODEL: 7 | TYPE: "lalaloc++" 8 | QUERY_TYPE: "image" 9 | PANORAMA_BACKBONE: "resnet50" 10 | LAYOUT_BACKBONE: "resnet18" 11 | NORMALISE_EMBEDDING: True 12 | NORMALISE_SAMPLE: True 13 | PANO_EMBEDDER_TYPE: "transformer-fc" 14 | PANORAMA_MODULE: 15 | POS_AT_INPUT: False 16 | HIDDEN_DIM: 256 17 | TRAIN: 18 | SOURCE_WEIGHTS: "trained_models/lalaloc_pp_planbranch.ckpt" 19 | LR_MILESTONES: [100, 150] 20 | NUM_EPOCHS: 200 21 | NEAR_MAX_DIST: 0.5 22 | INITIAL_LR: 0.1 23 | BATCH_SIZE: 64 24 | TEST_EVERY: 50 25 | NUM_NEAR_SAMPLES: 1 26 | NUM_FAR_SAMPLES: 0 27 | COMPUTE_GT_DIST: False 28 | TEST: 29 | POSE_SAMPLE_STEP: 1000 30 | LAYOUTS_MAX_BATCH: 256 31 | RENDER: 32 | BATCH_SIZE: 1024 33 | IMG_SIZE: (128, 256) 34 | POSE_REFINE: 35 | RENDER_SIZE: (128, 256) 36 | INPUT: 37 | NORMALISE_STD: [0.229, 0.224, 0.225] -------------------------------------------------------------------------------- /configs/layout_branch.yaml: -------------------------------------------------------------------------------- 1 | OUT_DIR: "./runs/lalaloc/layout_branch" 2 | SYSTEM: 3 | NUM_GPUS: 2 4 | DISTRIBUTED_BACKEND: "ddp" 5 | NUM_WORKERS: 0 6 | MODEL: 7 | QUERY_TYPE: "layout" 8 | LAYOUT_BACKBONE: "resnet18" 9 | NORMALISE_EMBEDDING: True 10 | TRAIN: 11 | LR_MILESTONES: [10, 15] 12 | NUM_EPOCHS: 20 13 | NEAR_MAX_DIST: 0.5 14 | LOSS: "bbs" 15 | INITIAL_LR: 0.01 16 | BATCH_SIZE: 4 17 | TEST_EVERY: 4 18 | NUM_FAR_SAMPLES: 20 19 | LAYOUT_LOSS_SCALE: 100.0 20 | DECODER_LOSS_SCALE: 1.0 21 | TEST: 22 | POSE_SAMPLE_STEP: 1000 23 | LAYOUTS_MAX_BATCH: 256 24 | RENDER: 25 | BATCH_SIZE: 1024 26 | IMG_SIZE: (128, 256) 27 | POSE_REFINE: 28 | RENDER_SIZE: (128, 256) -------------------------------------------------------------------------------- /lalaloc/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ActiveVisionLab/LaLaLoc/be48bc1884722409eeb6766e55fafa8ece521c9c/lalaloc/__init__.py -------------------------------------------------------------------------------- /lalaloc/config/__init__.py: -------------------------------------------------------------------------------- 1 | from .defaults import get_cfg_defaults 2 | from .parser import parse_args -------------------------------------------------------------------------------- /lalaloc/config/defaults.py: -------------------------------------------------------------------------------- 1 | from yacs.config import CfgNode as CN 2 | 3 | _C = CN() 4 | _C.OUT_DIR = "./runs" 5 | _C.SEED = 42 6 | 7 | _C.SYSTEM = CN() 8 | _C.SYSTEM.NUM_WORKERS = 0 9 | _C.SYSTEM.NUM_GPUS = 1 10 | _C.SYSTEM.DISTRIBUTED_BACKEND = "ddp" 11 | 12 | 13 | _C.INPUT = CN() 14 | 15 | _C.INPUT.IMG_SIZE = (256, 512) 16 | _C.INPUT.LAYOUT_SIZE = (256, 512) 17 | _C.INPUT.NORMALISE_MEAN = [0.485, 0.456, 0.406] 18 | _C.INPUT.NORMALISE_STD = [0.485, 0.456, 
0.406] 19 | # _C.INPUT.NORMALISE_STD = [0.229, 0.224, 0.225] # A bug meant that this was also used as the STD for LaLaLoc configs 20 | 21 | 22 | _C.DATASET = CN() 23 | 24 | _C.DATASET.PATH = "/home/henry/Data/datasets/structured3d/Structured3D" 25 | _C.DATASET.TEST_LIGHTING = ["warm"] 26 | _C.DATASET.TEST_FURNITURE = ["full"] 27 | _C.DATASET.TRAIN_LIGHTING = ["raw", "warm", "cold"] 28 | _C.DATASET.TRAIN_FURNITURE = ["empty", "full", "simple"] 29 | _C.DATASET.AUGMENT_LAYOUTS = False 30 | _C.DATASET.PIX_PER_MM = 0.025 31 | _C.DATASET.FLOORPLAN_DIVISIBLE_BY = 32 32 | 33 | 34 | _C.RENDER = CN() 35 | 36 | _C.RENDER.IMG_SIZE = (256, 512) 37 | _C.RENDER.USE_CUDA = True 38 | _C.RENDER.BATCH_SIZE = 256 39 | _C.RENDER.INVALID_VALUE = -1000 40 | 41 | 42 | _C.MODEL = CN() 43 | 44 | _C.MODEL.TYPE = "lalaloc" 45 | _C.MODEL.QUERY_TYPE = "image" 46 | _C.MODEL.PANORAMA_BACKBONE = "resnet18" 47 | _C.MODEL.LAYOUT_BACKBONE = "resnet18" 48 | _C.MODEL.DESC_LENGTH = 128 49 | _C.MODEL.NORMALISE_EMBEDDING = True 50 | _C.MODEL.PANO_EMBEDDER_TYPE = "fc" 51 | _C.MODEL.LAYOUT_EMBEDDER_TYPE = "fc" 52 | _C.MODEL.DECODER_RESOLUTION = (32, 64) 53 | _C.MODEL.NORMALISE_SAMPLE = False 54 | 55 | _C.MODEL.UNET_ENCODER_CHANNELS = [32, 64, 128, 256, 512] 56 | _C.MODEL.UNET_DECODER_CHANNELS = [512, 256, 128, 64, 32, 128] 57 | 58 | _C.MODEL.PANORAMA_MODULE = CN() 59 | _C.MODEL.PANORAMA_MODULE.POS_AT_INPUT = True 60 | _C.MODEL.PANORAMA_MODULE.NUM_BLOCKS = 2 61 | _C.MODEL.PANORAMA_MODULE.HIDDEN_DIM = 2048 62 | 63 | 64 | _C.POSE_REFINE = CN() 65 | _C.POSE_REFINE.LR = 0.01 66 | _C.POSE_REFINE.SCHEDULER_PATIENCE = 10 67 | _C.POSE_REFINE.SCHEDULER_THRESHOLD = 0.05 68 | _C.POSE_REFINE.SCHEDULER_DECAY = 0.5 69 | _C.POSE_REFINE.CONVERGANCE_THRESHOLD = 0.0001 70 | _C.POSE_REFINE.CONVERGANCE_PATIENCE = 20 71 | _C.POSE_REFINE.MAX_ITERS = 150 72 | _C.POSE_REFINE.RENDER_SIZE = (256, 512) 73 | 74 | 75 | _C.TRAIN = CN() 76 | 77 | _C.TRAIN.NUM_EPOCHS = 10 78 | _C.TRAIN.BATCH_SIZE = 16 79 | _C.TRAIN.TEST_EVERY = 1 80 | _C.TRAIN.LOSS = "triplet" 81 | _C.TRAIN.GT_LAYOUT_LOSS = False 82 | 83 | _C.TRAIN.INITIAL_LR = 0.01 84 | _C.TRAIN.MOMENTUM = 0.9 85 | _C.TRAIN.WEIGHT_DECAY = 1e-4 86 | _C.TRAIN.LR_MILESTONES = [5, 8] 87 | _C.TRAIN.LR_GAMMA = 0.1 88 | 89 | _C.TRAIN.NEAR_MIN_DIST = 0.0 90 | _C.TRAIN.NEAR_MAX_DIST = 0.5 91 | _C.TRAIN.FAR_MIN_DIST = 2 92 | _C.TRAIN.FAR_MAX_DIST = 10 93 | _C.TRAIN.NUM_NEAR_SAMPLES = 1 94 | _C.TRAIN.NUM_FAR_SAMPLES = 1 95 | 96 | _C.TRAIN.APPEND_GT = False 97 | _C.TRAIN.NO_TRANSFORM = False 98 | 99 | _C.TRAIN.SOURCE_WEIGHTS = "" 100 | _C.TRAIN.COMPUTE_GT_DIST = True 101 | 102 | _C.TRAIN.DISTANCE_LOSS_SCALE = 1.0 103 | _C.TRAIN.LAYOUT_LOSS_SCALE = 1.0 104 | _C.TRAIN.DECODER_LOSS_SCALE = 0.001 105 | 106 | _C.TRAIN.CUT_GRAD = True 107 | _C.TRAIN.SUBSAMPLE_PLAN_X = 1 108 | 109 | 110 | _C.TEST = CN() 111 | 112 | _C.TEST.BATCH_SIZE = 1 113 | _C.TEST.POSE_SAMPLE_STEP = 500 114 | _C.TEST.LAYOUTS_MAX_BATCH = 64 115 | _C.TEST.VOGEL_DISC_REFINE = False 116 | _C.TEST.VOGEL_SAMPLES = 100 117 | _C.TEST.VAL_AS_TEST = False 118 | _C.TEST.LATENT_POSE_OPTIMISATION = False 119 | _C.TEST.DECODE_REFINE = False 120 | _C.TEST.DECODE_USE_GT = False 121 | _C.TEST.METRIC_DUMP = "" 122 | _C.TEST.COMPUTE_GT_DIST = True 123 | _C.TEST.SUBSAMPLE_PLAN_X = 1 124 | 125 | 126 | def get_cfg_defaults(): 127 | """Get a yacs CfgNode object with default values for my_project.""" 128 | # Return a clone so that the defaults will not be altered 129 | # This is for the "local variable" use pattern 130 | return _C.clone() 131 | 
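Note (illustrative): the defaults above are only a base configuration. The YAML files under configs/ and the trailing KEY VALUE pairs shown in the README's train.py commands are merged on top of these defaults via yacs. The snippet below is a minimal sketch of that standard yacs workflow, not a reproduction of the repository's actual train.py; get_cfg_defaults and parse_args are the functions defined in lalaloc/config, while merge_from_file, merge_from_list and freeze are standard yacs CfgNode methods.

```python
# Minimal sketch of how the config defaults are typically combined with a YAML
# file and command-line overrides (assumed workflow, not the repo's train.py).
from lalaloc.config import get_cfg_defaults, parse_args

args = parse_args()                        # -c/--config_file plus trailing KEY VALUE overrides
cfg = get_cfg_defaults()                   # clone of the _C defaults defined above
if args.config_file:
    cfg.merge_from_file(args.config_file)  # e.g. configs/layout_branch.yaml
if args.opts:
    cfg.merge_from_list(args.opts)         # e.g. ["DATASET.PATH", "/data/Structured3D"]
cfg.freeze()                               # make the merged config read-only

print(cfg.MODEL.TYPE, cfg.TRAIN.BATCH_SIZE, cfg.TEST.POSE_SAMPLE_STEP)
```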
-------------------------------------------------------------------------------- /lalaloc/config/parser.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | 4 | def parse_args(): 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument("-c", "--config_file", default="", type=str) 7 | parser.add_argument("-l", "--checkpoint_file", default="", type=str) 8 | parser.add_argument("-t", "--test_ckpt", default="", type=str) 9 | parser.add_argument("-v", "--val", action="store_true") 10 | parser.add_argument( 11 | "opts", 12 | help="Modify config options using the command-line", 13 | default=None, 14 | nargs=argparse.REMAINDER, 15 | ) 16 | 17 | args = parser.parse_args() 18 | return args -------------------------------------------------------------------------------- /lalaloc/data/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ActiveVisionLab/LaLaLoc/be48bc1884722409eeb6766e55fafa8ece521c9c/lalaloc/data/__init__.py -------------------------------------------------------------------------------- /lalaloc/data/dataset.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import warnings 4 | from math import dist 5 | 6 | import numpy as np 7 | import torch 8 | import torchvision.transforms.functional as tvf 9 | from PIL import Image 10 | from pytorch3d.structures import Pointclouds 11 | from torch.utils.data import Dataset 12 | from tqdm import tqdm 13 | 14 | from ..utils.chamfer import chamfer 15 | from ..utils.floorplan import pose_to_pixel_loc, sample_locs 16 | from ..utils.projection import ( 17 | project_depth_to_pc, 18 | project_depth_to_pc_batched, 19 | projects_onto_floor, 20 | ) 21 | from ..utils.render import ( 22 | render_scene, 23 | render_scene_batched, 24 | render_semantic_batched, 25 | render_semantics, 26 | ) 27 | from .load import ( 28 | create_floorplan_from_annos, 29 | load_scene_annos, 30 | prepare_geometry_from_annos, 31 | ) 32 | from .split import scenes_split 33 | from .transform import build_transform 34 | 35 | 36 | def sample_xy_displacement(max_dist=1, min_dist=0): 37 | radius = random.uniform(min_dist, max_dist) 38 | angle = random.uniform(0, 2 * np.pi) 39 | x = radius * np.sin(angle) 40 | y = radius * np.cos(angle) 41 | return np.array([x, y, 0]) * 1000 42 | 43 | 44 | def load_scene_data( 45 | scene_ids, root_path, for_visualisation=False, pix_per_mm=0.025, min_factor=32 46 | ): 47 | scenes = [] 48 | for scene_id in tqdm(scene_ids): 49 | annos = load_scene_annos(root_path, scene_id) 50 | floorplan, plan_params = create_floorplan_from_annos( 51 | annos, scene_id, pix_per_mm, min_factor 52 | ) 53 | scene_geometry, floor_planes, limits = prepare_geometry_from_annos( 54 | annos, for_visualisation=for_visualisation 55 | ) 56 | scene_path = os.path.join(root_path, f"scene_{scene_id:05d}", "2D_rendering") 57 | 58 | scene_rooms = [] 59 | for room_id in np.sort(os.listdir(scene_path)): 60 | room_path = os.path.join(scene_path, room_id, "panorama") 61 | panorama_path = os.path.join(room_path, "{}", "rgb_{}light.png") 62 | pose = np.loadtxt(os.path.join(room_path, "camera_xyz.txt")) 63 | 64 | scene_rooms.append( 65 | { 66 | "id": room_id, 67 | "path": room_path, 68 | "panorama": panorama_path, 69 | "pose": pose, 70 | } 71 | ) 72 | scenes.append( 73 | { 74 | "id": scene_id, 75 | "geometry": scene_geometry, 76 | "rooms": scene_rooms, 77 | "floor_planes": floor_planes, 78 | 
"limits": limits, 79 | "floorplan": floorplan, 80 | "floorplan_params": plan_params, 81 | } 82 | ) 83 | return scenes 84 | 85 | 86 | class Structured3DPlans(Dataset): 87 | def __init__( 88 | self, config, split="train", visualise=False, 89 | ): 90 | scene_ids = scenes_split(split) 91 | dataset_path = config.DATASET.PATH 92 | self.scenes = load_scene_data( 93 | scene_ids, 94 | dataset_path, 95 | visualise, 96 | config.DATASET.PIX_PER_MM, 97 | config.DATASET.FLOORPLAN_DIVISIBLE_BY, 98 | ) 99 | self.is_train = split == "train" 100 | self.visualise = visualise 101 | self.transform = build_transform(config, self.is_train) 102 | self.layout_transform = build_transform(config, self.is_train, is_layout=True) 103 | self.augment = config.DATASET.AUGMENT_LAYOUTS 104 | 105 | self.precomputed = None 106 | 107 | if self.is_train: 108 | furniture_levels = config.DATASET.TRAIN_FURNITURE 109 | lighting_levels = config.DATASET.TRAIN_LIGHTING 110 | else: 111 | furniture_levels = config.DATASET.TEST_FURNITURE 112 | lighting_levels = config.DATASET.TEST_LIGHTING 113 | self.furniture_levels = furniture_levels 114 | self.lighting_levels = lighting_levels 115 | self.compute_gt_dist = config.TRAIN.COMPUTE_GT_DIST 116 | self.compute_gt_dist_test = config.TEST.COMPUTE_GT_DIST 117 | self.config = config 118 | 119 | def __len__(self): 120 | return len(self.scenes) 121 | 122 | def __getitem__(self, idx): 123 | data = self.scenes[idx] 124 | geometry = data["geometry"] 125 | rooms = data["rooms"] 126 | floor = data["floor_planes"] 127 | limits = data["limits"] 128 | floorplan = data["floorplan"] 129 | floorplan_params = data["floorplan_params"] 130 | 131 | if self.is_train: 132 | # randomly select a room from the scene 133 | room = random.choice(rooms) 134 | # sample and process the +ve and -ve traning examples 135 | sampled_poses_and_rooms = self._sample_train_poses( 136 | room["pose"], floor, limits, scene_id=data["id"] 137 | ) 138 | sampled_layouts = render_scene_batched( 139 | self.config, geometry, sampled_poses_and_rooms 140 | ) 141 | sampled_pointclouds = project_depth_to_pc_batched( 142 | self.config, sampled_layouts 143 | ) 144 | room_data = self._process_room( 145 | data["id"], room, geometry, floor, limits, sampled_pointclouds, 146 | ) 147 | panorama = room_data["image"] 148 | pano_pose = room_data["pose"] 149 | pano_depths = room_data["layout"] 150 | pano_layouts = self.layout_transform(room_data["layout"]) 151 | pano_room_idx = [room_data["room_idx"]] 152 | # in some scenarios we do not need the gt distances 153 | if self.compute_gt_dist: 154 | distances = room_data["gt_distances"] 155 | else: 156 | distances = [] 157 | # semantics are only used for visualation therefore not needed here 158 | sampled_semantics = [np.empty((0, 0))] 159 | pano_semantics = [np.empty((0, 0))] 160 | else: 161 | sampled_poses_and_rooms = self._sample_test_poses( 162 | limits, 163 | rooms[0]["pose"][-1], 164 | floor, 165 | step=self.config.TEST.POSE_SAMPLE_STEP, 166 | ) 167 | # if the grid size is set to be very large sometimes there are no test poses 168 | if sampled_poses_and_rooms: 169 | sampled_layouts = render_scene_batched( 170 | self.config, geometry, sampled_poses_and_rooms 171 | ) 172 | sampled_semantics = render_semantic_batched( 173 | self.config, geometry, sampled_poses_and_rooms 174 | ) 175 | sampled_pointclouds = project_depth_to_pc_batched( 176 | self.config, sampled_layouts 177 | ) 178 | else: 179 | warnings.warn( 180 | "The grid size is set too large for the scene leading to there being no valid test poses." 
181 | ) 182 | sampled_layouts = None 183 | sampled_semantics = None 184 | sampled_pointclouds = [] 185 | 186 | panoramas = [] 187 | pano_poses = [] 188 | pano_layouts = [] 189 | pano_room_idx = [] 190 | pano_semantics = [] 191 | distances_all = [] 192 | for room in rooms: 193 | room_data = self._process_room( 194 | data["id"], room, geometry, floor, limits, sampled_pointclouds, 195 | ) 196 | panoramas.append(room_data["image"]) 197 | pano_poses.append(room_data["pose"]) 198 | pano_layouts.append(room_data["layout"]) 199 | pano_room_idx.append(room_data["room_idx"]) 200 | pano_semantics.append(room_data["semantics"]) 201 | distances_all.append(room_data["gt_distances"]) 202 | panorama = torch.stack(panoramas) 203 | pano_pose = np.stack(pano_poses) 204 | pano_room_idx = torch.Tensor(pano_room_idx) 205 | pano_depths = np.stack(pano_layouts) 206 | pano_layouts = torch.stack( 207 | [self.layout_transform(l) for l in pano_layouts], dim=0 208 | ) 209 | if self.compute_gt_dist_test: 210 | distances = torch.stack(distances_all) 211 | else: 212 | distances = [] 213 | pano_semantics = np.stack(pano_semantics) 214 | 215 | sampled_depths = np.stack(sampled_layouts) 216 | sampled_semanitcs = np.stack(sampled_semantics) 217 | sampled_layouts = torch.stack( 218 | [self.layout_transform(l) for l in sampled_layouts], dim=0 219 | ) 220 | 221 | pano_pose = torch.Tensor(pano_pose) 222 | sampled_poses = torch.Tensor([p for p, _ in sampled_poses_and_rooms]) 223 | sampled_room_idxs = torch.Tensor([r for _, r in sampled_poses_and_rooms]) 224 | 225 | if self.is_train: 226 | # geometry doesn't support batching but isn't used in training 227 | geometry = [] 228 | floor = [] 229 | 230 | floorplan = tvf.to_tensor(floorplan) 231 | 232 | return { 233 | "panorama": panorama, 234 | "pano_layout": pano_layouts, 235 | "pano_pose": pano_pose, 236 | "pano_room_idx": pano_room_idx, 237 | "pano_depths": pano_depths, 238 | "pano_semantics": pano_semantics, 239 | "sampled_layouts": sampled_layouts, 240 | "sampled_poses": sampled_poses, 241 | "sampled_room_idxs": sampled_room_idxs, 242 | "sampled_depths": sampled_depths, 243 | "sampled_semantics": sampled_semanitcs, 244 | "distances": distances, 245 | "geometry": geometry, 246 | "floor": floor, 247 | "floorplan": floorplan, 248 | "floorplan_params": floorplan_params, 249 | } 250 | 251 | def _process_room( 252 | self, scene_id, room, geometry, floor, limits, pointclouds_layouts 253 | ): 254 | pose_pano, layout_pano, semantics_pano, room_idx = self._get_room_data( 255 | room, geometry, floor, scene_id 256 | ) 257 | pointcloud_pano = torch.Tensor(project_depth_to_pc(self.config, layout_pano)) 258 | pointclouds_pano = Pointclouds([pointcloud_pano,] * len(pointclouds_layouts)) 259 | 260 | if (self.is_train and self.compute_gt_dist) or ( 261 | not self.is_train and self.compute_gt_dist_test 262 | ): 263 | distances = chamfer(pointclouds_pano, pointclouds_layouts) 264 | else: 265 | distances = [] 266 | 267 | furniture = random.choice(self.furniture_levels) 268 | lighting = random.choice(self.lighting_levels) 269 | panorama_path = room["panorama"].format(furniture, lighting) 270 | # sometimes the panorama image can be get corrupted 271 | # if this happens, reextract the relevant zip file 272 | try: 273 | panorama = Image.open(panorama_path).convert("RGB") 274 | except Exception as e: 275 | print(panorama_path) 276 | print(e) 277 | 278 | panorama = self.transform(panorama) 279 | 280 | return { 281 | "image": panorama, 282 | "pose": pose_pano, 283 | "layout": layout_pano, 284 | "semantics": 
semantics_pano, 285 | "gt_distances": distances, 286 | "room_idx": room_idx, 287 | } 288 | 289 | def _sample_train_poses( 290 | self, pose_panorama, floor_geometry, limits, offset=500, scene_id=None 291 | ): 292 | poses = [] 293 | for _ in range(self.config.TRAIN.NUM_NEAR_SAMPLES): 294 | near_room_idx = -1 295 | num_attempts = 0 296 | while near_room_idx < 0: 297 | if num_attempts > 100: 298 | pose_near = pose_panorama 299 | else: 300 | pose_near = pose_panorama + sample_xy_displacement( 301 | max_dist=self.config.TRAIN.NEAR_MAX_DIST, 302 | min_dist=self.config.TRAIN.NEAR_MIN_DIST, 303 | ) 304 | near_room_idx = projects_onto_floor(pose_near, floor_geometry) 305 | num_attempts += 1 306 | poses.append((pose_near, near_room_idx)) 307 | 308 | for _ in range(self.config.TRAIN.NUM_FAR_SAMPLES): 309 | far_room_idx = -1 310 | num_attempts = 0 311 | while far_room_idx < 0: 312 | x_location = np.random.uniform( 313 | limits[0] + (offset / 2), limits[1] - (offset / 2) 314 | ) 315 | y_location = np.random.uniform( 316 | limits[2] + (offset / 2), limits[3] - (offset / 2) 317 | ) 318 | pose_far = np.array([x_location, y_location, pose_panorama[-1]]) 319 | 320 | if ( 321 | np.linalg.norm(pose_panorama - pose_far) 322 | < self.config.TRAIN.FAR_MIN_DIST 323 | ): 324 | continue 325 | far_room_idx = projects_onto_floor(pose_far, floor_geometry) 326 | if num_attempts >= 100: 327 | pose_far = pose_near 328 | far_room_idx = projects_onto_floor(pose_far, floor_geometry) 329 | num_attempts += 1 330 | poses.append((pose_far, far_room_idx)) 331 | 332 | if self.config.TRAIN.APPEND_GT: 333 | gt_room_idx = projects_onto_floor(pose_panorama, floor_geometry) 334 | poses.append((pose_panorama, gt_room_idx)) 335 | return poses 336 | 337 | def _sample_test_poses(self, limits, z, floor_geometry, step=1000): 338 | x_locations = np.arange(limits[0] + (step / 2), limits[1] + (step / 2), step) 339 | y_locations = np.arange(limits[2] + (step / 2), limits[3] + (step / 2), step) 340 | 341 | poses = np.meshgrid(x_locations, y_locations) 342 | poses = np.stack(poses, axis=2).reshape(-1, 2) 343 | poses = np.concatenate([poses, np.full((poses.shape[0], 1), z)], axis=1) 344 | 345 | room_idxs = [projects_onto_floor(pose, floor_geometry) for pose in poses] 346 | pose_grid = [(p, i) for p, i in zip(poses, room_idxs) if i >= 0] 347 | return pose_grid 348 | 349 | def _get_room_data(self, room, geometry, floor, scene_id): 350 | pose_pano = room["pose"] 351 | room_idx = projects_onto_floor(pose_pano, floor) 352 | if room_idx < 0: 353 | warnings.warn("pose outside room: {}".format(scene_id)) 354 | layout_pano = render_scene(self.config, geometry[room_idx], pose_pano) 355 | semantics_pano = render_semantics(self.config, geometry[room_idx], pose_pano) 356 | return pose_pano, layout_pano, semantics_pano, room_idx 357 | 358 | 359 | class TargetEmbeddingDataset(Structured3DPlans): 360 | def __init__(self, encoder, config, split="train", visualise=False, device="cpu:0"): 361 | super().__init__(config, split=split, visualise=visualise) 362 | 363 | with torch.no_grad(): 364 | encoder.eval() 365 | for scene in tqdm(self.scenes): 366 | floorplan = scene["floorplan"] 367 | floorplan_params = scene["floorplan_params"] 368 | scale = torch.tensor([floorplan_params["scale"]]) 369 | shift = torch.tensor(floorplan_params["shift"]) 370 | plan_height = floorplan_params["h"] 371 | plan_width = floorplan_params["w"] 372 | floorplan = floorplan[:plan_height, :plan_width] 373 | 374 | floorplan = tvf.to_tensor(floorplan).to(device).unsqueeze(0) 375 | plan_embed = 
encoder(floorplan).cpu() 376 | 377 | subsample_x = self.config.TEST.SUBSAMPLE_PLAN_X 378 | if subsample_x > 1: 379 | plan_embed = plan_embed[:, :, ::subsample_x, ::subsample_x] 380 | floorplan = floorplan[:, :, ::subsample_x, ::subsample_x] 381 | scale = scale / subsample_x 382 | 383 | query_poses = [] 384 | rooms = scene["rooms"] 385 | for room in rooms: 386 | query_poses.append(room["pose"]) 387 | query_poses = torch.Tensor(np.stack(query_poses)) 388 | query_locs = pose_to_pixel_loc(query_poses.unsqueeze(0), scale, shift) 389 | 390 | target_embeddings = sample_locs( 391 | plan_embed, query_locs, normalise=self.config.MODEL.NORMALISE_SAMPLE 392 | ).squeeze(0) 393 | 394 | scene["embeddings"] = target_embeddings 395 | 396 | def __getitem__(self, idx): 397 | data = self.scenes[idx] 398 | rooms = data["rooms"] 399 | embeddings = data["embeddings"] 400 | 401 | room_idx = random.randrange(0, len(rooms)) 402 | room = rooms[room_idx] 403 | 404 | furniture = random.choice(self.furniture_levels) 405 | lighting = random.choice(self.lighting_levels) 406 | panorama_path = room["panorama"].format(furniture, lighting) 407 | # sometimes the panorama image can be get corrupted 408 | # if this happens, reextract the relevant zip file 409 | try: 410 | panorama = Image.open(panorama_path).convert("RGB") 411 | except Exception as e: 412 | print(panorama_path) 413 | print(e) 414 | 415 | panorama = self.transform(panorama) 416 | embedding = embeddings[room_idx] 417 | return { 418 | "panorama": panorama, 419 | "target_embedding": embedding, 420 | } 421 | -------------------------------------------------------------------------------- /lalaloc/data/load.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parts of this code are modified from: https://github.com/bertjiazheng/Structured3D 3 | Copyright (c) 2019 Structured3D Group 4 | """ 5 | import json 6 | import math 7 | import os 8 | 9 | import cv2 10 | import numpy as np 11 | import torch 12 | from pytorch3d.structures import Meshes, join_meshes_as_scene 13 | 14 | from ..utils.polygons import clip_polygon, convert_lines_to_vertices 15 | 16 | 17 | def round_up_to_multiple(f, factor=2): 18 | return math.ceil(f / float(factor)) * factor 19 | 20 | 21 | def load_scene_annos(root, scene_id): 22 | with open( 23 | os.path.join(root, f"scene_{scene_id:05d}", "annotation_3d.json") 24 | ) as file: 25 | annos = json.load(file) 26 | return annos 27 | 28 | 29 | def prepare_geometry_from_annos(annos, for_visualisation=False): 30 | junctions = [item["coordinate"] for item in annos["junctions"]] 31 | 32 | # extract hole vertices 33 | lines_holes = [] 34 | for semantic in annos["semantics"]: 35 | if semantic["type"] in ["window", "door"]: 36 | for planeID in semantic["planeID"]: 37 | lines_holes.extend( 38 | np.where(np.array(annos["planeLineMatrix"][planeID]))[0].tolist() 39 | ) 40 | lines_holes = np.unique(lines_holes) 41 | _, vertices_holes = np.where(np.array(annos["lineJunctionMatrix"])[lines_holes]) 42 | vertices_holes = np.unique(vertices_holes) 43 | 44 | # load polygons 45 | rooms = [] 46 | floor_verts = [] 47 | floor_faces = [] 48 | min_x = 1e15 49 | max_x = -1e15 50 | min_y = 1e15 51 | max_y = -1e15 52 | for semantic in annos["semantics"]: 53 | if semantic["type"] in ["outwall", "door", "window"]: 54 | continue 55 | polygons = [] 56 | for planeID in semantic["planeID"]: 57 | plane_anno = annos["planes"][planeID] 58 | lineIDs = np.where(np.array(annos["planeLineMatrix"][planeID]))[0].tolist() 59 | junction_pairs = [ 60 | 
np.where(np.array(annos["lineJunctionMatrix"][lineID]))[0].tolist() 61 | for lineID in lineIDs 62 | ] 63 | polygon = convert_lines_to_vertices(junction_pairs) 64 | vertices, faces = clip_polygon( 65 | polygon, vertices_holes, junctions, plane_anno, clip_holes=False 66 | ) 67 | polygons.append( 68 | [ 69 | vertices, 70 | faces, 71 | planeID, 72 | plane_anno["normal"], 73 | plane_anno["type"], 74 | semantic["type"], 75 | ] 76 | ) 77 | 78 | room_verts = [] 79 | room_faces = [] 80 | for vertices, faces, planeID, normal, plane_type, semantic_type in polygons: 81 | vis_verts = np.array(vertices) 82 | vis_faces = np.array(faces) 83 | if len(vis_faces) == 0: 84 | continue 85 | 86 | room_verts.append(torch.Tensor(vertices)) 87 | room_faces.append(torch.Tensor(faces)) 88 | 89 | min_x = min(min_x, np.min(vis_verts[:, 0])) 90 | max_x = max(max_x, np.max(vis_verts[:, 0])) 91 | min_y = min(min_y, np.min(vis_verts[:, 1])) 92 | max_y = max(max_y, np.max(vis_verts[:, 1])) 93 | 94 | if plane_type == "floor": 95 | floor_verts.append(torch.Tensor(vertices)) 96 | floor_faces.append(torch.Tensor(faces)) 97 | if not for_visualisation: 98 | room = join_meshes_as_scene(Meshes(room_verts, room_faces)) 99 | else: 100 | room = Meshes( 101 | room_verts, room_faces 102 | ) # This provides the correct form for visualisation 103 | rooms.append(room) 104 | floors = Meshes(verts=floor_verts, faces=floor_faces) 105 | limits = (min_x, max_x, min_y, max_y) 106 | return rooms, floors, limits 107 | 108 | 109 | def create_floorplan_from_annos(annos, scene_id, pix_per_mm=0.025, min_factor=32): 110 | # extract the floor in each semantic for floorplan visualization 111 | planes = [] 112 | for semantic in annos["semantics"]: 113 | for planeID in semantic["planeID"]: 114 | if annos["planes"][planeID]["type"] == "floor": 115 | planes.append({"planeID": planeID, "type": semantic["type"]}) 116 | 117 | if semantic["type"] == "outwall": 118 | outerwall_planes = semantic["planeID"] 119 | 120 | # extract hole vertices 121 | lines_holes = [] 122 | for semantic in annos["semantics"]: 123 | if semantic["type"] in ["window", "door"]: 124 | for planeID in semantic["planeID"]: 125 | lines_holes.extend( 126 | np.where(np.array(annos["planeLineMatrix"][planeID]))[0].tolist() 127 | ) 128 | lines_holes = np.unique(lines_holes) 129 | 130 | # junctions on the floor 131 | junctions = np.array([junc["coordinate"] for junc in annos["junctions"]]) 132 | junction_floor = np.where(np.isclose(junctions[:, -1], 0))[0] 133 | 134 | # construct each polygon 135 | polygons = [] 136 | for plane in planes: 137 | lineIDs = np.where(np.array(annos["planeLineMatrix"][plane["planeID"]]))[ 138 | 0 139 | ].tolist() 140 | junction_pairs = [ 141 | np.where(np.array(annos["lineJunctionMatrix"][lineID]))[0].tolist() 142 | for lineID in lineIDs 143 | ] 144 | polygon = convert_lines_to_vertices(junction_pairs) 145 | polygons.append([polygon[0], plane["type"]]) 146 | 147 | outerwall_floor = [] 148 | for planeID in outerwall_planes: 149 | lineIDs = np.where(np.array(annos["planeLineMatrix"][planeID]))[0].tolist() 150 | lineIDs = np.setdiff1d(lineIDs, lines_holes) 151 | junction_pairs = [ 152 | np.where(np.array(annos["lineJunctionMatrix"][lineID]))[0].tolist() 153 | for lineID in lineIDs 154 | ] 155 | for start, end in junction_pairs: 156 | if start in junction_floor and end in junction_floor: 157 | outerwall_floor.append([start, end]) 158 | 159 | outerwall_polygon = convert_lines_to_vertices(outerwall_floor) 160 | polygons.insert(0, [outerwall_polygon[0], "outwall"]) 161 | 162 | 
floorplan, affine_params = plot_floorplan( 163 | polygons, junctions, scene_id, pix_per_mm=pix_per_mm, round_multiple=min_factor 164 | ) 165 | return floorplan, affine_params 166 | 167 | 168 | def plot_floorplan( 169 | polygons, junctions, scene_id, size=512, pix_per_mm=0.025, round_multiple=32 170 | ): 171 | 172 | junctions = junctions[:, :2] 173 | 174 | used_junctions = [] 175 | for polygon, _ in polygons: 176 | used_junctions.append(junctions[np.array(polygon)]) 177 | used_junctions = np.concatenate(used_junctions) 178 | # shift so floorplan fits in unit square 0 and 1 179 | min_x = np.min(used_junctions[:, 0]) 180 | max_x = np.max(used_junctions[:, 0]) 181 | min_y = np.min(used_junctions[:, 1]) 182 | max_y = np.max(used_junctions[:, 1]) 183 | shift = np.array((min_x, min_y)) 184 | 185 | if pix_per_mm < 0: 186 | range = max(max_x - min_x, max_y - min_y) 187 | scale = size / range 188 | floorplan_shape = (size, size, 3) 189 | else: 190 | scale = pix_per_mm 191 | range_x = max_x - min_x 192 | range_y = max_y - min_y 193 | w_ind = round_up_to_multiple(pix_per_mm * range_x, round_multiple) 194 | h_ind = round_up_to_multiple(pix_per_mm * range_y, round_multiple) 195 | w = 1216 196 | h = 960 197 | floorplan_shape = (h, w, 3) 198 | 199 | junctions -= shift 200 | junctions *= scale 201 | 202 | floorplan = np.zeros(floorplan_shape, dtype=np.float32) 203 | for (polygon, poly_type) in polygons: 204 | contours = junctions[np.array(polygon)].astype(np.int32) 205 | if poly_type in ["door", "window", "outwall"]: 206 | cv2.fillPoly(floorplan, pts=[contours], color=(1.0, 1.0, 1.0)) 207 | else: 208 | cv2.fillPoly(floorplan, pts=[contours], color=(0.5, 0.5, 0.5)) 209 | 210 | return floorplan, {"scale": scale, "shift": shift, "w": w_ind, "h": h_ind} 211 | -------------------------------------------------------------------------------- /lalaloc/data/split.py: -------------------------------------------------------------------------------- 1 | def scenes_split(split): 2 | splits = { 3 | "train": ( 4 | list(range(0, 3000)), 5 | [ 6 | 335, 7 | 683, 8 | 1192, 9 | 1753, 10 | 1852, 11 | 2205, 12 | 2209, 13 | 2223, 14 | 2339, 15 | 2357, 16 | 2401, 17 | 2956, 18 | 2309, 19 | 278, 20 | 379, 21 | 1212, 22 | 1840, 23 | 1855, 24 | 2025, 25 | 2110, 26 | 2593, 27 | ], 28 | ), 29 | "val": (list(range(3000, 3250)), [2110, 3086, 3117, 3121, 3239]), 30 | "test": (list(range(3250, 3500)), [3307]), 31 | } 32 | ids, to_remove = splits[split] 33 | ids = [i for i in ids if i not in to_remove] 34 | return ids 35 | -------------------------------------------------------------------------------- /lalaloc/data/transform.py: -------------------------------------------------------------------------------- 1 | import torchvision.transforms as transforms 2 | 3 | 4 | def build_transform(config, is_train, is_layout=False): 5 | # TODO: Data augmentation, flip and rotate the camera 6 | # Needs to be applied to layouts as well 7 | in_size = config.INPUT.LAYOUT_SIZE if is_layout else config.INPUT.IMG_SIZE 8 | 9 | transform = [ 10 | transforms.Resize(in_size), 11 | transforms.ToTensor(), 12 | ] 13 | if is_layout: 14 | transform = [transforms.ToPILImage(),] + transform 15 | elif not config.TRAIN.NO_TRANSFORM: 16 | transform += [ 17 | transforms.Normalize( 18 | mean=config.INPUT.NORMALISE_MEAN, std=config.INPUT.NORMALISE_STD 19 | ), 20 | ] 21 | transform = transforms.Compose(transform) 22 | return transform 23 | -------------------------------------------------------------------------------- /lalaloc/model/__init__.py: 
-------------------------------------------------------------------------------- 1 | from .lalaloc import ImageFromLayout, Layout2LayoutDecode 2 | from .lalaloc_pp import FloorPlanUnetImage, FloorPlanUnetLayout 3 | -------------------------------------------------------------------------------- /lalaloc/model/lalaloc.py: -------------------------------------------------------------------------------- 1 | from logging import warn 2 | import warnings 3 | 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | 8 | from .lalaloc_base import Image2LayoutBase, Layout2LayoutBase 9 | from .modules import LayoutDecoder 10 | from .losses import triplet_loss, bbs_loss 11 | 12 | 13 | class ImageFromLayout(Image2LayoutBase): 14 | def __init__(self, config): 15 | super(ImageFromLayout, self).__init__(config) 16 | self.load_weights_from_l2l(config.TRAIN.SOURCE_WEIGHTS) 17 | 18 | def load_weights_from_l2l(self, ckpt_path): 19 | if not ckpt_path: 20 | warnings.warn("No source for the layout branch weights was specified") 21 | return 22 | # load weights from Layout2Layout model 23 | ckpt_dict = torch.load(ckpt_path) 24 | model_weights = ckpt_dict["state_dict"] 25 | 26 | # load "embedder" weights into "reference_embedder" 27 | load_dict = {} 28 | for k, v in model_weights.items(): 29 | modules = k.split(".") 30 | parent = modules[0] 31 | if parent == "embedder": 32 | child = ".".join(modules[1:]) 33 | load_dict[child] = v 34 | self.reference_embedder.load_state_dict(load_dict) 35 | 36 | # freeze reference_embedder weights 37 | for p in self.reference_embedder.parameters(): 38 | p.requires_grad = False 39 | 40 | def training_step(self, batch, batch_idx): 41 | for m in self.reference_embedder.modules(): 42 | if isinstance(m, nn.BatchNorm2d): 43 | m.eval() 44 | 45 | # compute query and reference embeddings 46 | query_image = batch["panorama"] 47 | query_embed = self.forward(q=query_image) 48 | 49 | reference_layouts = batch["pano_layout"].unsqueeze(1) 50 | reference_embed = self.forward(r=reference_layouts).squeeze(1) 51 | 52 | # perform L2 distance loss 53 | loss = ((query_embed - reference_embed) ** 2).sum(dim=1).sqrt().mean() 54 | 55 | stats_to_log = {"train/l2_loss": loss.item()} 56 | return {"loss": loss, "log": stats_to_log} 57 | 58 | 59 | class Layout2LayoutDecode(Layout2LayoutBase): 60 | def __init__(self, config): 61 | super(Layout2LayoutDecode, self).__init__(config) 62 | self.layout_decoder = LayoutDecoder(config) 63 | 64 | def training_step(self, batch, batch_idx): 65 | query_image = batch[self.query_key] 66 | reference_layouts = batch["sampled_layouts"] 67 | gt_distances = batch["distances"] 68 | 69 | query_embed = self.forward(q=query_image) 70 | reference_embed = self.forward(r=reference_layouts) 71 | 72 | # perform layout2layout layout loss 73 | distances = self.compute_distances(query_embed, reference_embed) 74 | if self.config.TRAIN.LOSS == "triplet": 75 | loss_layout = triplet_loss(distances) 76 | elif self.config.TRAIN.LOSS == "bbs": 77 | gt_distances = gt_distances.float().to(self.device) 78 | loss_layout = bbs_loss(distances, gt_distances) 79 | else: 80 | raise NotImplementedError( 81 | "{} loss type is not currently implemented".format( 82 | self.config.TRAIN.LOSS 83 | ) 84 | ) 85 | 86 | # decode the layout embedding and compute loss 87 | query_decoded = self.layout_decoder(query_embed) 88 | query_target = F.interpolate( 89 | query_image.detach().clone(), self.config.MODEL.DECODER_RESOLUTION 90 | ) 91 | reference_decoded = self.layout_decoder(reference_embed) 92 
| 93 | h, w = reference_layouts.shape[-2:] 94 | reference_targets = F.interpolate( 95 | reference_layouts.view(-1, 1, h, w).detach().clone(), 96 | self.config.MODEL.DECODER_RESOLUTION, 97 | ) 98 | 99 | decoded = torch.cat([query_decoded, reference_decoded], dim=0) 100 | target = torch.cat([query_target, reference_targets], dim=0) 101 | loss_decode = F.l1_loss(decoded, target) 102 | 103 | loss = ( 104 | self.config.TRAIN.DECODER_LOSS_SCALE * loss_decode 105 | + self.config.TRAIN.LAYOUT_LOSS_SCALE * loss_layout 106 | ) 107 | stats_to_log = { 108 | "train/loss": loss.item(), 109 | "train/layout_loss": loss_layout.item(), 110 | "train/decoder_loss": loss_decode.item(), 111 | } 112 | return {"loss": loss, "log": stats_to_log} 113 | -------------------------------------------------------------------------------- /lalaloc/model/lalaloc_base.py: -------------------------------------------------------------------------------- 1 | import pytorch_lightning as pl 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.utils.data import DataLoader 7 | 8 | from lalaloc.utils.render import render_scene_batched 9 | from lalaloc.utils.vogel_disc import sample_vogel_disc 10 | 11 | from ..data.dataset import Structured3DPlans 12 | from ..data.transform import build_transform 13 | from ..utils.eval import recall_at_n 14 | from ..utils.projection import projects_onto_floor 15 | from .modules import LayoutModule, PanoModule 16 | from .pose_optimisation import ( 17 | PoseConvergenceChecker, 18 | init_camera_at_origin, 19 | init_objects_at_pose, 20 | init_optimiser, 21 | render_at_pose, 22 | ) 23 | 24 | 25 | def build_dataloader(config, split): 26 | is_train = split == "train" 27 | 28 | batch_size = config.TRAIN.BATCH_SIZE if is_train else None 29 | num_workers = ( 30 | config.SYSTEM.NUM_WORKERS if is_train or not config.TEST.COMPUTE_GT_DIST else 0 31 | ) 32 | 33 | dataset = Structured3DPlans(config, split) 34 | dataloader = DataLoader( 35 | dataset, batch_size, shuffle=is_train, num_workers=num_workers, 36 | ) 37 | return dataloader 38 | 39 | 40 | class Image2LayoutBase(pl.LightningModule): 41 | def __init__(self, config): 42 | super(Image2LayoutBase, self).__init__() 43 | self.query_embedder = PanoModule(config) 44 | self.reference_embedder = LayoutModule(config) 45 | self.desc_length = config.MODEL.DESC_LENGTH 46 | self.config = config 47 | # The key to access the query data type from the batch dict 48 | self.query_key = "panorama" 49 | 50 | def forward(self, q=None, r=None): 51 | if q is None and r is None: 52 | raise Exception 53 | 54 | if q is not None: 55 | q = self.query_embedder(q) 56 | if r is None: 57 | return q 58 | 59 | if r is not None: 60 | n, m, c, h, w = r.shape 61 | r = r.reshape(n * m, c, h, w) 62 | r = self.reference_embedder(r) 63 | r = r.reshape(n, m, self.desc_length) 64 | if q is None: 65 | return r 66 | 67 | d = self.compute_distances(q, r) 68 | return d 69 | 70 | def compute_distances(self, p, l): 71 | n, m, _ = l.shape 72 | p = p.unsqueeze(1).expand(-1, m, -1) 73 | p = p.reshape(n * m, self.desc_length) 74 | l = l.reshape(-1, self.desc_length) 75 | 76 | d = F.pairwise_distance(p, l) 77 | d = d.reshape(n, m) 78 | return d 79 | 80 | def vogel_refinement(self, query_embeddings, nn_poses, geometry, floor): 81 | transform = build_transform(self.config, False, is_layout=True) 82 | radius = 2 * self.config.TEST.POSE_SAMPLE_STEP 83 | num_samples = self.config.TEST.VOGEL_SAMPLES 84 | refined_poses = [] 85 | for 
query_embedding, nn_pose in zip(query_embeddings, nn_poses): 86 | sampled_poses = sample_vogel_disc(nn_pose, radius, num_samples) 87 | poses_to_render = [] 88 | for pose in sampled_poses: 89 | room_idx = projects_onto_floor(pose, floor) 90 | if room_idx < 0: 91 | continue 92 | poses_to_render.append((pose, room_idx)) 93 | local_layouts = render_scene_batched(self.config, geometry, poses_to_render) 94 | local_poses = [torch.tensor(p[0]) for p in poses_to_render] 95 | 96 | # transform and stack layouts 97 | local_layouts = [transform(l) for l in local_layouts] 98 | local_layouts = torch.stack(local_layouts).to(self.device) 99 | # feed into embedder 100 | local_embeddings = self.forward(r=local_layouts.unsqueeze(0)).detach() 101 | # take min distance between result and query_embedding 102 | # and append its respective pose 103 | distances = torch.norm(local_embeddings - query_embedding, dim=-1) 104 | refined_pose = local_poses[distances.argmin()] 105 | refined_poses.append(refined_pose) 106 | refined_poses = torch.stack(refined_poses) 107 | return refined_poses 108 | 109 | def latent_pose_optimisation(self, query_embeddings, nn_poses, geometry, floor): 110 | # Ensure gradients are enabled and modules are in eval mode 111 | torch.set_grad_enabled(True) 112 | self.eval() 113 | 114 | refined_poses = [] 115 | for query_embedding, nn_pose in zip(query_embeddings, nn_poses): 116 | query_embedding = query_embedding.detach() 117 | query_embedding.requires_grad = False 118 | 119 | # gather room geometry 120 | room_idx = projects_onto_floor(nn_pose, floor) 121 | mesh = geometry[room_idx] 122 | 123 | # centre geometry at pose 124 | nn_pose = nn_pose.to(self.device) 125 | objects, vertices = init_objects_at_pose(nn_pose, mesh, self.device) 126 | camera = init_camera_at_origin(self.config) 127 | 128 | # initialise the refinement translation vector 129 | # note: these represent displacements from the initial pose in metres 130 | pose_xy = torch.zeros((1, 2), requires_grad=True, device=self.device) 131 | pose_z = torch.zeros((1, 1), requires_grad=False, device=self.device) 132 | 133 | # initialise optimisation and stopping metrics 134 | optimiser, scheduler = init_optimiser(self.config, [pose_xy]) 135 | convergence_checker = PoseConvergenceChecker(self.config) 136 | 137 | for j in range(self.config.POSE_REFINE.MAX_ITERS): 138 | optimiser.zero_grad() 139 | layout = render_at_pose( 140 | camera, objects, vertices, torch.cat([pose_xy, pose_z], dim=1) 141 | ) 142 | 143 | h, w = layout.shape 144 | layout = layout.view(1, 1, h, w) 145 | # for faster rendering, sometimes we render the layouts at a smaller resolution than the network takes as input 146 | # therefore, we need to interpolate the layout to the target input size 147 | if self.config.POSE_REFINE.RENDER_SIZE != self.config.INPUT.IMG_SIZE: 148 | layout = F.interpolate(layout, self.config.INPUT.IMG_SIZE) 149 | 150 | layout_embedding = self.forward(r=layout.unsqueeze(0)) 151 | loss = torch.norm(query_embedding - layout_embedding, dim=-1) 152 | loss.backward() 153 | 154 | # check convergence 155 | current_loss = loss.item() 156 | current_pose = torch.cat([pose_xy, pose_z], dim=1).clone().detach() 157 | if convergence_checker.has_converged(current_loss, current_pose): 158 | break 159 | 160 | optimiser.step() 161 | scheduler.step(current_loss) 162 | # pose displacement is optimised in metres, therefore convert it to mm 163 | refined_pose = nn_pose + convergence_checker.best_pose * 1000 164 | refined_poses.append(refined_pose) 165 | refined_poses = 
torch.stack(refined_poses) 166 | refined_poses = refined_poses.detach().cpu().squeeze(1) 167 | torch.set_grad_enabled(False) 168 | return refined_poses 169 | 170 | def configure_optimizers(self): 171 | optimiser = optim.SGD( 172 | filter(lambda p: p.requires_grad, self.parameters()), 173 | lr=self.config.TRAIN.INITIAL_LR, 174 | momentum=self.config.TRAIN.MOMENTUM, 175 | weight_decay=self.config.TRAIN.WEIGHT_DECAY, 176 | ) 177 | scheduler = optim.lr_scheduler.MultiStepLR( 178 | optimiser, self.config.TRAIN.LR_MILESTONES, self.config.TRAIN.LR_GAMMA 179 | ) 180 | return [optimiser], [scheduler] 181 | 182 | def training_step(self, batch, batch_idx): 183 | # Training step should be implemented in child classes 184 | raise NotImplementedError("The training routine is not implemented.") 185 | 186 | def inference_step(self, batch, batch_idx): 187 | query_image = batch[self.query_key] 188 | query_pose = batch["pano_pose"] 189 | reference_layouts = batch["sampled_layouts"] 190 | reference_poses = batch["sampled_poses"] 191 | 192 | gt_distances = batch["distances"] 193 | query_image = query_image 194 | query_pose = query_pose.cpu() 195 | reference_poses = reference_poses.cpu() 196 | gt_distances = gt_distances.cpu() 197 | 198 | # compute the desciptors for the panoramas in each room 199 | query_desc = self.forward(q=query_image).detach() 200 | # compute the descriptors for each of the sampled layouts 201 | # NB: this is split into minibatches since some rooms may be extremely large with many sampled layouts 202 | reference_descs = [] 203 | for layout_minibatch in reference_layouts.split( 204 | self.config.TEST.LAYOUTS_MAX_BATCH 205 | ): 206 | layout_minibatch = layout_minibatch.unsqueeze(0).contiguous().cuda() 207 | reference_descs.append(self.forward(r=layout_minibatch)[0].detach()) 208 | reference_descs = torch.cat(reference_descs) 209 | # compute the distances between each of the room panos and the grid of layouts 210 | n, _ = query_desc.shape 211 | reference_descs = reference_descs.unsqueeze(0).expand(n, -1, -1) 212 | pred_distances = ( 213 | self.compute_distances(query_desc, reference_descs).detach().cpu() 214 | ) 215 | 216 | # gather ranking info for the prediction vs the actual 217 | pose_distances = torch.norm( 218 | (query_pose.unsqueeze(1) - reference_poses.expand(n, -1, -1)), dim=-1 219 | ) 220 | ranking_prediction = torch.argsort(pred_distances, dim=-1)[:, :5] 221 | ranking_layout = torch.argsort(gt_distances, dim=-1)[:, :5] 222 | ranking_pose = torch.argsort(pose_distances, dim=-1)[:, :5] 223 | 224 | retrieval_error = pose_distances.gather( 225 | 1, ranking_prediction[:, 0].unsqueeze(1) 226 | ) 227 | oracle_error = pose_distances.gather(1, ranking_layout[:, 0].unsqueeze(1)) 228 | 229 | nn_poses = reference_poses[ranking_prediction[:, 0]] 230 | 231 | # vogel disc refinement 232 | if self.config.TEST.VOGEL_DISC_REFINE: 233 | refined_poses = self.vogel_refinement( 234 | query_desc, nn_poses, batch["geometry"], batch["floor"], 235 | ) 236 | else: 237 | refined_poses = nn_poses 238 | 239 | # latent pose optimisation 240 | if self.config.TEST.LATENT_POSE_OPTIMISATION: 241 | optimised_poses = self.latent_pose_optimisation( 242 | query_desc, refined_poses, batch["geometry"], batch["floor"], 243 | ) 244 | else: 245 | optimised_poses = refined_poses 246 | 247 | refined_error = torch.norm(query_pose - refined_poses, dim=-1, keepdim=True) 248 | optimised_error = torch.norm(query_pose - optimised_poses, dim=-1, keepdim=True) 249 | 250 | return { 251 | "pred_rank": ranking_prediction, 252 | 
"layout_rank": ranking_layout, 253 | "pose_rank": ranking_pose, 254 | "retrieval_error": retrieval_error, 255 | "optimised_error": optimised_error, 256 | "refined_error": refined_error, 257 | "oracle_error": oracle_error, 258 | "pred_pose": optimised_poses, 259 | } 260 | 261 | def inference_epoch_end(self, outputs, log_key): 262 | predictions = [] 263 | layouts = [] 264 | poses = [] 265 | retrieval_errors = [] 266 | oracle_errors = [] 267 | refined_errors = [] 268 | optimised_errors = [] 269 | pred_poses = [] 270 | 271 | scene_idxs = [] 272 | room_idxs = [] 273 | for i, out in enumerate(outputs): 274 | predictions.extend(out["pred_rank"].unsqueeze(0)) 275 | layouts.extend(out["layout_rank"].unsqueeze(0)) 276 | poses.extend(out["pose_rank"].unsqueeze(0)) 277 | retrieval_errors.extend(out["retrieval_error"].unsqueeze(0)) 278 | oracle_errors.extend(out["oracle_error"].unsqueeze(0)) 279 | refined_errors.extend(out["refined_error"].unsqueeze(0)) 280 | optimised_errors.extend(out["optimised_error"].unsqueeze(0)) 281 | pred_poses.extend(out["pred_pose"].unsqueeze(0)) 282 | 283 | num_rooms = len(out["retrieval_error"]) 284 | scene_idxs.extend([i] * num_rooms) 285 | room_idxs.extend(list(range(num_rooms))) 286 | 287 | predictions = torch.cat(predictions) 288 | layouts = torch.cat(layouts) 289 | poses = torch.cat(poses) 290 | retrieval_errors = torch.cat(retrieval_errors) 291 | oracle_errors = torch.cat(oracle_errors) 292 | refined_errors = torch.cat(refined_errors) 293 | optimised_errors = torch.cat(optimised_errors) 294 | pred_poses = torch.cat(pred_poses) 295 | 296 | scene_idxs = torch.tensor(scene_idxs) 297 | room_idxs = torch.tensor(room_idxs) 298 | 299 | layout_r_at_1 = (predictions[:, 0] == layouts[:, 0]).float().mean().item() 300 | pose_r_at_1 = (predictions[:, 0] == poses[:, 0]).float().mean().item() 301 | oracle_r_at_1 = (layouts[:, 0] == poses[:, 0]).float().mean().item() 302 | 303 | layout_r_at_5 = recall_at_n(5, predictions, layouts).item() 304 | pose_r_at_5 = recall_at_n(5, predictions, poses).item() 305 | oracle_r_at_5 = recall_at_n(5, layouts, poses).item() 306 | 307 | median_retrieval_error = torch.median(retrieval_errors).item() 308 | median_refined_error = torch.median(refined_errors).item() 309 | median_optimised_error = torch.median(optimised_errors).item() 310 | median_oracle_error = torch.median(oracle_errors).item() 311 | 312 | threshold_1cm = (optimised_errors < 10).float().mean().item() 313 | threshold_5cm = (optimised_errors < 50).float().mean().item() 314 | threshold_10cm = (optimised_errors < 100).float().mean().item() 315 | threshold_100cm = (optimised_errors < 1000).float().mean().item() 316 | 317 | if self.config.TEST.METRIC_DUMP: 318 | data = { 319 | "scene_idxs": scene_idxs, 320 | "room_idxs": room_idxs, 321 | "oracle": oracle_errors.cpu(), 322 | "refinement": refined_errors.cpu(), 323 | "optimisation": optimised_errors.cpu(), 324 | "retrieval": retrieval_errors.cpu(), 325 | "pred_poses": pred_poses, 326 | } 327 | torch.save(data, self.config.TEST.METRIC_DUMP) 328 | 329 | stats_to_log = { 330 | "{}/layout_r_at_1".format(log_key): layout_r_at_1, 331 | "{}/pose_r_at_1".format(log_key): pose_r_at_1, 332 | "{}/layout_r_at_5".format(log_key): layout_r_at_5, 333 | "{}/pose_r_at_5".format(log_key): pose_r_at_5, 334 | "{}/oracle_r_at_1".format(log_key): oracle_r_at_1, 335 | "{}/oracle_r_at_5".format(log_key): oracle_r_at_5, 336 | "{}/median_retrieval_error".format(log_key): median_retrieval_error, 337 | "{}/median_refined_error".format(log_key): median_refined_error, 338 | 
"{}/median_optimised_error".format(log_key): median_optimised_error, 339 | "{}/median_oracle_error".format(log_key): median_oracle_error, 340 | "{}/threshold_1cm".format(log_key): threshold_1cm, 341 | "{}/threshold_5cm".format(log_key): threshold_5cm, 342 | "{}/threshold_10cm".format(log_key): threshold_10cm, 343 | "{}/threshold_100cm".format(log_key): threshold_100cm, 344 | } 345 | return {"test_loss": 1 - layout_r_at_1, "log": stats_to_log} 346 | 347 | def train_dataloader(self): 348 | return build_dataloader(self.config, "train") 349 | 350 | def val_dataloader(self): 351 | return build_dataloader(self.config, "val") 352 | 353 | def test_dataloader(self): 354 | # Convenient to make "test" actually the validation set so you can recheck val acc at any point 355 | if self.config.TEST.VAL_AS_TEST: 356 | return build_dataloader(self.config, "val") 357 | return build_dataloader(self.config, "test") 358 | 359 | def validation_step(self, batch, batch_idx): 360 | return self.inference_step(batch, batch_idx) 361 | 362 | def test_step(self, batch, batch_idx): 363 | return self.inference_step(batch, batch_idx) 364 | 365 | def validation_epoch_end(self, outputs): 366 | return self.inference_epoch_end(outputs, "val") 367 | 368 | def test_epoch_end(self, outputs): 369 | return self.inference_epoch_end(outputs, "test") 370 | 371 | 372 | class Layout2LayoutBase(Image2LayoutBase): 373 | def __init__(self, config): 374 | super(Layout2LayoutBase, self).__init__(config) 375 | self.embedder = LayoutModule(config) 376 | self.reference_embedder = None 377 | self.query_embedder = None 378 | self.desc_length = config.MODEL.DESC_LENGTH 379 | self.config = config 380 | self.query_key = "pano_layout" 381 | 382 | def forward(self, q=None, r=None): 383 | if q is None and r is None: 384 | raise Exception 385 | 386 | if q is not None: 387 | q = self.embedder(q) 388 | if r is None: 389 | return q 390 | 391 | if r is not None: 392 | n, m, c, h, w = r.shape 393 | r = r.reshape(n * m, c, h, w) 394 | r = self.embedder(r) 395 | r = r.reshape(n, m, self.desc_length) 396 | if q is None: 397 | return r 398 | 399 | d = self.compute_distances(q, r) 400 | return d 401 | -------------------------------------------------------------------------------- /lalaloc/model/lalaloc_pp.py: -------------------------------------------------------------------------------- 1 | import warnings 2 | 3 | import numpy as np 4 | import torch 5 | import torch.nn.functional as F 6 | from sklearn.decomposition import PCA 7 | from torch.utils.data import DataLoader 8 | 9 | from ..data.dataset import TargetEmbeddingDataset 10 | from ..utils.floorplan import pose_to_pixel_loc, sample_locs 11 | from .lalaloc_pp_base import FloorPlanUnetBase 12 | from .losses import bbs_loss 13 | from .modules import LayoutDecoder 14 | from .unet import UNet 15 | 16 | ROOM_VALUE = 0.5 17 | 18 | 19 | def visualise_features(features, valid_mask=None): 20 | features = features.cpu() 21 | if valid_mask is not None: 22 | valid_mask = valid_mask.cpu() 23 | for feature_map in features: 24 | h, w = feature_map.shape[-2:] 25 | feature_map = feature_map.view(-1, h * w).transpose(0, 1).numpy() 26 | pca = PCA(n_components=3) 27 | feature_map_pca = pca.fit_transform(feature_map) 28 | feature_map_pca = feature_map_pca.reshape(h, w, 3) 29 | 30 | shift = np.min(feature_map_pca) 31 | feature_map_pca -= shift 32 | scale = np.max(feature_map_pca) 33 | feature_map_pca /= scale 34 | 35 | if valid_mask is not None: 36 | invalid_mask = ~valid_mask 37 | feature_map_pca[invalid_mask] = 1.0 38 | 39 | 
return feature_map_pca 40 | 41 | 42 | class FloorPlanUnetLayout(FloorPlanUnetBase): 43 | def __init__(self, config): 44 | super(FloorPlanUnetLayout, self).__init__(config) 45 | self.layout_decoder = LayoutDecoder(config) 46 | 47 | def training_step(self, batch, batch_idx): 48 | plans = batch["floorplan"] 49 | plan_params = batch["floorplan_params"] 50 | plan_scale = plan_params["scale"] 51 | plan_shift = plan_params["shift"] 52 | plan_heights = plan_params["h"] 53 | plan_widths = plan_params["w"] 54 | query_layouts = batch["pano_layout"] 55 | query_pose = batch["pano_pose"] 56 | reference_layouts = batch["sampled_layouts"] 57 | reference_poses = batch["sampled_poses"] 58 | 59 | query_locs = pose_to_pixel_loc(query_pose.unsqueeze(1), plan_scale, plan_shift) 60 | reference_locs = pose_to_pixel_loc(reference_poses, plan_scale, plan_shift) 61 | 62 | # embed floor plan and sample locations 63 | query_embed = [] 64 | reference_embed = [] 65 | for plan, query_loc, reference_loc, h, w in zip( 66 | plans, query_locs, reference_locs, plan_heights, plan_widths 67 | ): 68 | plan_embed = self.floorplan_encoder(plan[:, :h, :w].unsqueeze(0)) 69 | qry_embed = sample_locs( 70 | plan_embed, 71 | query_loc.unsqueeze(0), 72 | normalise=self.config.MODEL.NORMALISE_SAMPLE, 73 | ) 74 | ref_embed = sample_locs( 75 | plan_embed, 76 | reference_loc.unsqueeze(0), 77 | normalise=self.config.MODEL.NORMALISE_SAMPLE, 78 | ) 79 | query_embed.append(qry_embed) 80 | reference_embed.append(ref_embed) 81 | query_embed = torch.cat(query_embed).squeeze(1) 82 | reference_embed = torch.cat(reference_embed) 83 | 84 | # decode the layout embeddings for both queries and reference 85 | query_decoded = self.layout_decoder(query_embed) 86 | query_target = F.interpolate( 87 | query_layouts.detach().clone(), self.config.MODEL.DECODER_RESOLUTION 88 | ) 89 | reference_decoded = self.layout_decoder(reference_embed) 90 | h, w = reference_layouts.shape[-2:] 91 | reference_targets = F.interpolate( 92 | reference_layouts.view(-1, 1, h, w).detach().clone(), 93 | self.config.MODEL.DECODER_RESOLUTION, 94 | ) 95 | decoded = torch.cat([query_decoded, reference_decoded], dim=0) 96 | 97 | # compute decoding loss 98 | target = torch.cat([query_target, reference_targets], dim=0) 99 | loss_decode = F.l1_loss(decoded, target) 100 | loss = self.config.TRAIN.DECODER_LOSS_SCALE * loss_decode 101 | stats_to_log = {"train/decoder_loss": loss_decode.item()} 102 | 103 | # compute bbs loss if specified 104 | if self.config.TRAIN.LOSS == "decoder_plus_bbs": 105 | gt_distances = batch["distances"] 106 | gt_distances = gt_distances.float().to(self.device) 107 | distances = self.compute_distances(query_embed, reference_embed) 108 | loss_layout = bbs_loss(distances, gt_distances) 109 | stats_to_log["train/layout_loss"] = loss_layout.item() 110 | loss = loss + self.config.TRAIN.LAYOUT_LOSS_SCALE * loss_layout 111 | 112 | stats_to_log["train/loss"] = loss.item() 113 | return {"loss": loss, "log": stats_to_log} 114 | 115 | def inference_step(self, batch, batch_idx): 116 | plan_params = batch["floorplan_params"] 117 | plan_height = plan_params["h"] 118 | plan_width = plan_params["w"] 119 | plan = batch["floorplan"][:, :plan_height, :plan_width].unsqueeze(0) 120 | plan_scale = torch.Tensor([plan_params["scale"]]).to(self.device) 121 | plan_shift = plan_params["shift"] 122 | 123 | plan_embed = self.floorplan_encoder(plan).detach() 124 | 125 | # plot latent floor plan 126 | valid_loc_mask = plan[0, 0] == ROOM_VALUE 127 | vis_features = visualise_features(plan_embed, 
valid_loc_mask) 128 | self.logger.experiment.add_image( 129 | f"unet_feats_{batch_idx}", 130 | vis_features, 131 | self.current_epoch, 132 | dataformats="HWC", 133 | ) 134 | 135 | # sample embedding at query location 136 | query_pose = batch["pano_pose"] 137 | query_z = query_pose[0, -1] 138 | query_loc = pose_to_pixel_loc( 139 | query_pose.unsqueeze(0).clone(), plan_scale, plan_shift 140 | ) 141 | query_embed = sample_locs(plan_embed, query_loc).squeeze(0) 142 | 143 | # legacy sampling of sparse grid to emulate LaLaLoc 144 | reference_poses = batch["sampled_poses"] 145 | reference_locs = pose_to_pixel_loc( 146 | reference_poses.unsqueeze(0).clone(), plan_scale, plan_shift 147 | ) 148 | reference_embed = sample_locs(plan_embed, reference_locs).squeeze(0) 149 | n, _ = query_embed.shape 150 | reference_embed = reference_embed.unsqueeze(0).expand(n, -1, -1) 151 | 152 | distances = self.compute_distances(query_embed, reference_embed).detach().cpu() 153 | gt_distances = batch["distances"].cpu() 154 | pose_distances = torch.norm( 155 | (query_pose.unsqueeze(1) - reference_poses.expand(n, -1, -1)), dim=-1 156 | ).cpu() 157 | 158 | # gather ranking info for the prediction vs the actual 159 | ranking_prediction = torch.argsort(distances, dim=-1)[:, :5] 160 | ranking_layout = torch.argsort(gt_distances, dim=-1)[:, :5] 161 | ranking_pose = torch.argsort(pose_distances, dim=-1)[:, :5] 162 | 163 | retrieval_error = pose_distances.gather( 164 | 1, ranking_prediction[:, 0].unsqueeze(1) 165 | ) 166 | oracle_error = pose_distances.gather(1, ranking_layout[:, 0].unsqueeze(1)) 167 | 168 | # retrieve from dense LaLaLoc++ prediction 169 | refined_poses = optimised_poses = self.predict_pose_dense( 170 | query_embed, plan_embed, plan_scale, plan_shift, query_z, valid_loc_mask 171 | ).to(self.device) 172 | 173 | refined_error = torch.norm(query_pose - refined_poses, dim=-1, keepdim=True) 174 | optimised_error = torch.norm(query_pose - optimised_poses, dim=-1, keepdim=True) 175 | 176 | return { 177 | "pred_rank": ranking_prediction, 178 | "layout_rank": ranking_layout, 179 | "pose_rank": ranking_pose, 180 | "retrieval_error": retrieval_error, 181 | "optimised_error": optimised_error, 182 | "refined_error": refined_error, 183 | "oracle_error": oracle_error, 184 | "pred_pose": optimised_poses, 185 | } 186 | 187 | 188 | class FloorPlanUnetImage(FloorPlanUnetBase): 189 | def __init__(self, config): 190 | super(FloorPlanUnetImage, self).__init__(config) 191 | self.load_weights_from_plan_only(config.TRAIN.SOURCE_WEIGHTS) 192 | for p in self.floorplan_encoder.parameters(): 193 | p.requires_grad = False 194 | 195 | def load_weights_from_plan_only(self, ckpt_path): 196 | if not ckpt_path: 197 | warnings.warn("No source for the layout branch weights was specified") 198 | return 199 | print("Loading Floor Plan Encoder from {}".format(ckpt_path)) 200 | # load weights from plan-branch-only model 201 | ckpt_dict = torch.load(ckpt_path) 202 | model_weights = ckpt_dict["state_dict"] 203 | 204 | # load "embedder" weights into "reference_embedder" 205 | load_dict = {} 206 | for k, v in model_weights.items(): 207 | modules = k.split(".") 208 | parent = modules[0] 209 | if parent == "floorplan_encoder": 210 | child = ".".join(modules[1:]) 211 | load_dict[child] = v 212 | self.floorplan_encoder.load_state_dict(load_dict) 213 | 214 | def train_dataloader(self): 215 | dataset = TargetEmbeddingDataset( 216 | self.floorplan_encoder, self.config, device=self.device 217 | ) 218 | batch_size = self.config.TRAIN.BATCH_SIZE 219 | num_workers = 
self.config.SYSTEM.NUM_WORKERS 220 | dataloader = DataLoader( 221 | dataset, batch_size, shuffle=True, num_workers=num_workers, 222 | ) 223 | return dataloader 224 | 225 | def training_step(self, batch, batch_idx): 226 | query_image = batch["panorama"] 227 | target_embed = batch["target_embedding"] 228 | 229 | query_embed = self.forward(q=query_image) 230 | 231 | loss = ((target_embed - query_embed) ** 2).sum(dim=1).sqrt().mean() 232 | stats_to_log = {"train/loss": loss.item()} 233 | return {"loss": loss, "log": stats_to_log} 234 | 235 | def inference_step(self, batch, batch_idx): 236 | plan_params = batch["floorplan_params"] 237 | plan_height = plan_params["h"] 238 | plan_width = plan_params["w"] 239 | plan = batch["floorplan"][:, :plan_height, :plan_width].unsqueeze(0) 240 | plan_scale = torch.Tensor([plan_params["scale"]]).to(self.device) 241 | plan_shift = plan_params["shift"] 242 | 243 | plan_embed = self.floorplan_encoder(plan).detach() 244 | 245 | # if specified subsample the embedded floor plan 246 | subsample_x = self.config.TEST.SUBSAMPLE_PLAN_X 247 | if subsample_x > 1: 248 | plan_embed = plan_embed[:, :, ::subsample_x, ::subsample_x] 249 | plan = plan[:, :, ::subsample_x, ::subsample_x] 250 | plan_scale = plan_scale / subsample_x 251 | 252 | # sample embedding at query location 253 | query_image = batch[self.query_key] 254 | query_pose = batch["pano_pose"] 255 | query_embed = self.forward(q=query_image).detach() 256 | 257 | # plot latent floor plan 258 | query_z = query_pose[0, -1] 259 | valid_loc_mask = plan[0, 0] == ROOM_VALUE 260 | vis_features = visualise_features(plan_embed, valid_loc_mask) 261 | self.logger.experiment.add_image( 262 | f"unet_feats_{batch_idx}", 263 | vis_features, 264 | self.current_epoch, 265 | dataformats="HWC", 266 | ) 267 | 268 | # legacy sampling of sparse grid to emulate LaLaLoc 269 | reference_poses = batch["sampled_poses"] 270 | reference_locs = pose_to_pixel_loc( 271 | reference_poses.unsqueeze(0).clone(), plan_scale, plan_shift 272 | ) 273 | reference_embed = sample_locs(plan_embed, reference_locs).squeeze(0) 274 | n, _ = query_embed.shape 275 | reference_embed = reference_embed.unsqueeze(0).expand(n, -1, -1) 276 | 277 | distances = self.compute_distances(query_embed, reference_embed).detach().cpu() 278 | pose_distances = torch.norm( 279 | (query_pose.unsqueeze(1) - reference_poses.expand(n, -1, -1)), dim=-1 280 | ).cpu() 281 | # gather ranking info for the prediction vs the actual 282 | ranking_prediction = torch.argsort(distances, dim=-1)[:, :5] 283 | ranking_pose = torch.argsort(pose_distances, dim=-1)[:, :5] 284 | 285 | retrieval_error = pose_distances.gather( 286 | 1, ranking_prediction[:, 0].unsqueeze(1) 287 | ) 288 | 289 | if self.config.TEST.COMPUTE_GT_DIST: 290 | gt_distances = batch["distances"].cpu() 291 | ranking_layout = torch.argsort(gt_distances, dim=-1)[:, :5] 292 | oracle_error = pose_distances.gather(1, ranking_layout[:, 0].unsqueeze(1)) 293 | else: 294 | oracle_error = torch.zeros_like(retrieval_error) 295 | ranking_layout = torch.zeros_like(ranking_prediction) 296 | 297 | # retrieve from dense prediction and optimise them 298 | refined_poses = self.predict_pose_dense( 299 | query_embed, plan_embed, plan_scale, plan_shift, query_z, valid_loc_mask 300 | ) 301 | optimised_poses = self.optimise_pose( 302 | query_embed, refined_poses.clone(), plan_embed, plan_scale, plan_shift 303 | ) 304 | 305 | refined_error = torch.norm(query_pose - refined_poses, dim=-1, keepdim=True) 306 | optimised_error = torch.norm(query_pose - 
optimised_poses, dim=-1, keepdim=True) 307 | return { 308 | "pred_rank": ranking_prediction, 309 | "layout_rank": ranking_layout, 310 | "pose_rank": ranking_pose, 311 | "retrieval_error": retrieval_error, 312 | "optimised_error": optimised_error, 313 | "refined_error": refined_error, 314 | "oracle_error": oracle_error, 315 | "pred_pose": optimised_poses, 316 | } 317 | 318 | -------------------------------------------------------------------------------- /lalaloc/model/lalaloc_pp_base.py: -------------------------------------------------------------------------------- 1 | import warnings 2 | 3 | import numpy as np 4 | import torch 5 | import torch.nn.functional as F 6 | from sklearn.decomposition import PCA 7 | from torch.utils.data import DataLoader 8 | 9 | from ..utils.floorplan import ( 10 | create_pixel_loc_grid, 11 | pixel_loc_to_pose, 12 | pose_to_pixel_loc, 13 | sample_locs, 14 | ) 15 | from .lalaloc_base import Image2LayoutBase 16 | from .pose_optimisation import PoseConvergenceChecker, init_optimiser 17 | from .unet import UNet 18 | 19 | 20 | class FloorPlanUnetBase(Image2LayoutBase): 21 | def __init__(self, config): 22 | super(FloorPlanUnetBase, self).__init__(config) 23 | # Remove uneeded LaLaLoc layout branch 24 | self.reference_embedder = None 25 | # Create LaLaLoc++ plan branch 26 | self.floorplan_encoder = UNet(config) 27 | 28 | def predict_pose_dense( 29 | self, query_desc, plan_embed, plan_scale, plan_shift, query_z, mask=None 30 | ): 31 | _, c, h, w = plan_embed.shape 32 | n = query_desc.shape[0] 33 | dense_loc_grid = create_pixel_loc_grid(w, h) 34 | dense_pose_grid = pixel_loc_to_pose( 35 | dense_loc_grid, plan_scale.cpu(), plan_shift.cpu(), query_z 36 | ).cpu() 37 | plan_embed_ = plan_embed.clone() 38 | if mask is not None: 39 | dense_pose_grid = dense_pose_grid[mask, :] 40 | plan_embed_ = plan_embed_[:, :, mask] 41 | dense_poses = dense_pose_grid.view(-1, 3) 42 | plan_embed_ = plan_embed_.view(c, -1).transpose(0, 1) 43 | 44 | plan_embed_ = plan_embed_.unsqueeze(0).expand(n, -1, -1) 45 | pred_distances = self.compute_distances(query_desc, plan_embed_).detach().cpu() 46 | ranking_dense = torch.argsort(pred_distances, dim=-1)[:, :5] 47 | pred_poses = dense_poses[ranking_dense[:, 0]].to(self.device) 48 | return pred_poses 49 | 50 | def optimise_pose( 51 | self, query_embeddings, initial_poses, feature_map, plan_scale, plan_shift 52 | ): 53 | torch.set_grad_enabled(True) 54 | initial_locs = pose_to_pixel_loc( 55 | initial_poses.unsqueeze(1), plan_scale, plan_shift 56 | ) 57 | refined_locs = [] 58 | for query_embedding, initial_loc in zip(query_embeddings, initial_locs): 59 | offset = torch.zeros((1, 2), requires_grad=True, device=self.device) 60 | 61 | # initialise optimisation and stopping metrics 62 | optimiser, scheduler = init_optimiser(self.config, [offset]) 63 | convergence_checker = PoseConvergenceChecker(self.config) 64 | 65 | for j in range(self.config.POSE_REFINE.MAX_ITERS): 66 | optimiser.zero_grad() 67 | embedding = sample_locs( 68 | feature_map, (initial_loc + offset).unsqueeze(0) 69 | ) 70 | loss = torch.norm(query_embedding - embedding, dim=-1) 71 | loss.backward() 72 | 73 | current_loss = loss.item() 74 | current_loc = (initial_loc + offset).clone().detach() 75 | if convergence_checker.has_converged(current_loss, current_loc): 76 | break 77 | optimiser.step() 78 | scheduler.step(current_loss) 79 | refined_loc = convergence_checker.best_pose 80 | refined_locs.append(refined_loc) 81 | refined_locs = torch.stack(refined_locs) 82 | z = initial_poses[0, -1] 83 | 
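        # convert the optimised pixel locations back to metric poses, reusing the height (z) of the initial poses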
refined_poses = pixel_loc_to_pose(refined_locs, plan_scale, plan_shift, z).view( 84 | -1, 3 85 | ) 86 | torch.set_grad_enabled(False) 87 | return refined_poses 88 | -------------------------------------------------------------------------------- /lalaloc/model/losses.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | def triplet_loss(distances, margin=1): 4 | pos_distances = distances[:, 0] 5 | neg_distances = distances[:, 1] 6 | 7 | losses = (pos_distances - neg_distances + margin).clamp(min=0) 8 | loss = losses.sum() 9 | if loss > 0: 10 | loss = loss / len(torch.nonzero(losses)) 11 | return loss 12 | 13 | 14 | def bbs_loss(distances, distances_truth): 15 | n = distances.shape[1] 16 | rows = distances.unsqueeze(1).expand(-1, n, -1) 17 | cols = distances.unsqueeze(2).expand(-1, -1, n) 18 | 19 | rows_gt = distances_truth.unsqueeze(1).expand(-1, n, -1) 20 | cols_gt = distances_truth.unsqueeze(2).expand(-1, -1, n) 21 | 22 | loss = (rows / cols).log() - (rows_gt / cols_gt).log() 23 | # remove i, i matches 24 | identity = torch.eye(n, device=loss.device).unsqueeze(0).expand_as(loss).bool() 25 | loss = loss[~identity] 26 | 27 | loss = (loss ** 2).mean() / 2 28 | return loss -------------------------------------------------------------------------------- /lalaloc/model/modules.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from torchvision import models 5 | 6 | from .position_encoding import build_position_encoding 7 | from .transformer import TransformerEncoder, TransformerEncoderLayer 8 | 9 | 10 | class EmbeddingModule(nn.Module): 11 | def __init__(self, in_channels, desc_channels): 12 | super(EmbeddingModule, self).__init__() 13 | self.pool = nn.AdaptiveAvgPool2d((1, 1)) 14 | self.fc = nn.Linear(in_channels, desc_channels) 15 | 16 | def forward(self, x): 17 | x = self.pool(x) 18 | x = torch.flatten(x, 1) 19 | x = self.fc(x) 20 | return x 21 | 22 | 23 | class MLPEmbeddingModule(nn.Module): 24 | def __init__(self, in_channels, desc_channels): 25 | super(MLPEmbeddingModule, self).__init__() 26 | self.pool = nn.AdaptiveAvgPool2d((1, 1)) 27 | self.mlp = nn.Sequential( 28 | nn.Linear(in_channels, in_channels), 29 | nn.ReLU(), 30 | nn.Linear(in_channels, desc_channels), 31 | ) 32 | 33 | def forward(self, x): 34 | x = self.pool(x) 35 | x = torch.flatten(x, 1) 36 | x = self.mlp(x) 37 | return x 38 | 39 | 40 | class TransformerFCEmbeddingModule(nn.Module): 41 | def __init__( 42 | self, 43 | in_channels, 44 | desc_channels, 45 | pos_at_input=True, 46 | hidden_dim=2048, 47 | num_heads=8, 48 | num_blocks=2, 49 | ): 50 | super().__init__() 51 | self.position_encoder = build_position_encoding(hidden_dim=desc_channels) 52 | self.dim_reduction = nn.Conv2d(in_channels, desc_channels, 1) 53 | encoder_layer = TransformerEncoderLayer(desc_channels, num_heads, hidden_dim) 54 | self.encoder = TransformerEncoder(encoder_layer, num_blocks) 55 | self.pool = nn.AdaptiveAvgPool2d((1, 1)) 56 | self.fc = nn.Linear(desc_channels, desc_channels) 57 | self.pos_at_input = pos_at_input 58 | 59 | def forward(self, x): 60 | pos = self.position_encoder(x) 61 | x = self.dim_reduction(x) 62 | b, c, h, w = x.shape 63 | 64 | x = x.flatten(2).permute(2, 0, 1) # NxCxHxW -> HWxNxC 65 | pos = pos.flatten(2).permute(2, 0, 1) # NxCxHxW -> HWxNxC 66 | if self.pos_at_input: 67 | x = x + pos 68 | pos = None 69 | 70 | x = self.encoder(x, pos=pos).permute(1, 2, 
0).view(b, c, h, w) 71 | x = self.pool(x).flatten(1) 72 | x = self.fc(x) 73 | return x 74 | 75 | 76 | class PanoModule(nn.Module): 77 | def __init__(self, config): 78 | super(PanoModule, self).__init__() 79 | desc_length = config.MODEL.DESC_LENGTH 80 | normalise = config.MODEL.NORMALISE_EMBEDDING 81 | pos_at_input = config.MODEL.PANORAMA_MODULE.POS_AT_INPUT 82 | num_blocks = config.MODEL.PANORAMA_MODULE.NUM_BLOCKS 83 | hidden_dim = config.MODEL.PANORAMA_MODULE.HIDDEN_DIM 84 | net, out_dim = _create_backbone(config.MODEL.PANORAMA_BACKBONE) 85 | self.layers = nn.Sequential(*list(net.children())[:-2]) 86 | if config.MODEL.PANO_EMBEDDER_TYPE == "fc": 87 | self.embedding = EmbeddingModule(out_dim, desc_length) 88 | elif config.MODEL.PANO_EMBEDDER_TYPE == "mlp": 89 | self.embedding = MLPEmbeddingModule(out_dim, desc_length) 90 | elif config.MODEL.PANO_EMBEDDER_TYPE == "transformer-fc": 91 | self.embedding = TransformerFCEmbeddingModule( 92 | out_dim, 93 | desc_length, 94 | pos_at_input=pos_at_input, 95 | num_blocks=num_blocks, 96 | hidden_dim=hidden_dim, 97 | ) 98 | self.normalise = normalise 99 | 100 | def forward(self, x): 101 | x = self.layers(x) 102 | x = self.embedding(x) 103 | if self.normalise: 104 | x = F.normalize(x) 105 | return x 106 | 107 | 108 | class LayoutModule(PanoModule): 109 | def __init__(self, config): 110 | super(LayoutModule, self).__init__(config) 111 | desc_length = config.MODEL.DESC_LENGTH 112 | net, out_dim = _create_backbone(config.MODEL.LAYOUT_BACKBONE) 113 | layers = [ 114 | nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3), 115 | ] 116 | layers += list(net.children())[1:-2] 117 | self.layers = nn.Sequential(*layers) 118 | if config.MODEL.LAYOUT_EMBEDDER_TYPE == "fc": 119 | self.embedding = EmbeddingModule(out_dim, desc_length) 120 | elif config.MODEL.LAYOUT_EMBEDDER_TYPE == "mlp": 121 | self.embedding = MLPEmbeddingModule(out_dim, desc_length) 122 | 123 | 124 | class LayoutDecoder(nn.Module): 125 | def __init__(self, config): 126 | super().__init__() 127 | desc_length = config.MODEL.DESC_LENGTH 128 | self.fc = nn.Sequential(nn.Linear(desc_length, 2048), nn.ReLU()) 129 | upsample_layers = [ 130 | nn.Upsample(scale_factor=2.0, mode="bilinear", align_corners=False), 131 | nn.Conv2d(256, 128, 3, padding=1), 132 | nn.ReLU(), 133 | nn.BatchNorm2d(128), 134 | nn.Upsample(scale_factor=2.0, mode="bilinear", align_corners=False), 135 | nn.Conv2d(128, 64, 3, padding=1), 136 | nn.ReLU(), 137 | nn.BatchNorm2d(64), 138 | nn.Upsample(scale_factor=2.0, mode="bilinear", align_corners=False), 139 | nn.Conv2d(64, 32, 3, padding=1), 140 | nn.ReLU(), 141 | nn.BatchNorm2d(32), 142 | nn.Upsample(scale_factor=2.0, mode="bilinear", align_corners=False), 143 | nn.Conv2d(32, 32, 3, padding=1), 144 | nn.ReLU(), 145 | nn.BatchNorm2d(32), 146 | nn.Conv2d(32, 1, 1), 147 | ] 148 | self.decov_layers = nn.Sequential(*upsample_layers) 149 | 150 | def forward(self, x): 151 | x = self.fc(x) 152 | x = x.view(-1, 256, 2, 4) 153 | x = self.decov_layers(x) 154 | return x 155 | 156 | 157 | def _create_backbone(name): 158 | backbones = { 159 | "resnet18": (models.resnet18(pretrained=True), 512), 160 | "resnet50": (models.resnet50(pretrained=True), 2048), 161 | } 162 | return backbones[name] 163 | -------------------------------------------------------------------------------- /lalaloc/model/pose_optimisation.py: -------------------------------------------------------------------------------- 1 | import pyredner 2 | import torch 3 | 4 | 5 | class PoseConvergenceChecker: 6 | def __init__(self, config): 7 | 
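        # track the best loss and pose seen so far, plus how many consecutive steps improved by less than the threshold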
self.best_loss = 1e5 8 | self.best_pose = None 9 | self.converge_count = 0 10 | self.converge_threshold = config.POSE_REFINE.CONVERGANCE_THRESHOLD 11 | self.converge_patience = config.POSE_REFINE.CONVERGANCE_PATIENCE 12 | 13 | def has_converged(self, current_loss, current_pose): 14 | delta = 0 15 | if current_loss < self.best_loss: 16 | delta = self.best_loss - current_loss 17 | self.best_pose = current_pose 18 | self.best_loss = current_loss 19 | if delta < self.converge_threshold: 20 | self.converge_count += 1 21 | else: 22 | self.converge_count = 0 23 | 24 | return self.converge_count > self.converge_patience 25 | 26 | 27 | def init_objects_at_pose(pose, mesh, device): 28 | # Creates pyrender objects from the mesh at the specified pose 29 | objects = [] 30 | material = pyredner.Material() 31 | for verts, faces in zip(mesh.verts_list(), mesh.faces_list()): 32 | verts = verts.to(device) - pose.unsqueeze(0) 33 | faces = faces.to(device) 34 | objects.append(pyredner.Object(verts, faces.int(), material)) 35 | vertices = [obj.vertices.clone() for obj in objects] 36 | 37 | return objects, vertices 38 | 39 | 40 | def init_camera_at_origin(config): 41 | # Creates a pyrender camera at the origin 42 | origin = torch.Tensor([0.0, 0.0, 0.0]) 43 | # look at x axis equivalent to an offset x' 44 | look_at = torch.Tensor([1, 0, 0]) 45 | up = torch.Tensor([0.0, 0.0, 1.0]) 46 | camera = pyredner.Camera( 47 | position=origin, 48 | look_at=look_at, 49 | up=up, 50 | camera_type=pyredner.camera_type.panorama, 51 | resolution=config.POSE_REFINE.RENDER_SIZE, 52 | ) 53 | return camera 54 | 55 | 56 | def init_optimiser(config, params): 57 | optimiser = torch.optim.Adam(params, lr=config.POSE_REFINE.LR) 58 | scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau( 59 | optimiser, 60 | patience=config.POSE_REFINE.SCHEDULER_PATIENCE, 61 | threshold=config.POSE_REFINE.SCHEDULER_THRESHOLD, 62 | factor=config.POSE_REFINE.SCHEDULER_DECAY, 63 | ) 64 | return optimiser, scheduler 65 | 66 | 67 | def render_at_pose(camera, objects, vertices, pose): 68 | for i in range(len(objects)): 69 | objects[i].vertices = vertices[i] - pose * 1000 70 | 71 | scene = pyredner.Scene(camera=camera, objects=objects) 72 | img = pyredner.render_g_buffer(scene, [pyredner.channels.depth], device=pose.device) 73 | img = img.flip(dims=[1]) 74 | return img.squeeze(2) 75 | 76 | -------------------------------------------------------------------------------- /lalaloc/model/position_encoding.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | """ 3 | Various positional encodings for the transformer. 4 | """ 5 | import math 6 | 7 | import torch 8 | from torch import nn 9 | 10 | 11 | class PositionEmbeddingSine(nn.Module): 12 | """ 13 | This is a more standard version of the position embedding, very similar to the one 14 | used by the Attention is all you need paper, generalized to work on images. 
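    Row and column positions are encoded with interleaved sine/cosine terms and concatenated along the channel dimension.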
15 | """ 16 | 17 | def __init__( 18 | self, num_pos_feats=64, temperature=10000, normalize=False, scale=None 19 | ): 20 | super().__init__() 21 | self.num_pos_feats = num_pos_feats 22 | self.temperature = temperature 23 | self.normalize = normalize 24 | if scale is not None and normalize is False: 25 | raise ValueError("normalize should be True if scale is passed") 26 | if scale is None: 27 | scale = 2 * math.pi 28 | self.scale = scale 29 | 30 | def forward(self, x): 31 | b, c, h, w = x.shape 32 | not_mask = torch.ones((b, h, w), dtype=torch.uint8, device=x.device) 33 | 34 | y_embed = not_mask.cumsum(1, dtype=torch.float32) 35 | x_embed = not_mask.cumsum(2, dtype=torch.float32) 36 | if self.normalize: 37 | eps = 1e-6 38 | y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale 39 | x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale 40 | 41 | dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device) 42 | dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats) 43 | 44 | pos_x = x_embed[:, :, :, None] / dim_t 45 | pos_y = y_embed[:, :, :, None] / dim_t 46 | 47 | pos_x = torch.stack( 48 | (pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4 49 | ).flatten(3) 50 | pos_y = torch.stack( 51 | (pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4 52 | ).flatten(3) 53 | pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2) 54 | 55 | return pos 56 | 57 | 58 | def build_position_encoding(position_embedding_mode="sine", hidden_dim=256): 59 | N_steps = hidden_dim // 2 60 | if position_embedding_mode in ("v2", "sine"): 61 | # TODO find a better way of exposing other arguments 62 | position_embedding = PositionEmbeddingSine(N_steps, normalize=True) 63 | else: 64 | raise ValueError(f"not supported {position_embedding_mode}") 65 | 66 | return position_embedding 67 | -------------------------------------------------------------------------------- /lalaloc/model/transformer.py: -------------------------------------------------------------------------------- 1 | from typing import Optional 2 | 3 | import torch.nn as nn 4 | from torch import Tensor 5 | 6 | 7 | class TransformerEncoder(nn.TransformerEncoder): 8 | def forward( 9 | self, 10 | src: Tensor, 11 | mask: Optional[Tensor] = None, 12 | src_key_padding_mask: Optional[Tensor] = None, 13 | pos: Optional[Tensor] = None, 14 | ) -> Tensor: 15 | output = src 16 | 17 | for mod in self.layers: 18 | output = mod( 19 | output, 20 | src_mask=mask, 21 | src_key_padding_mask=src_key_padding_mask, 22 | pos=pos, 23 | ) 24 | 25 | if self.norm is not None: 26 | output = self.norm(output) 27 | 28 | return output 29 | 30 | 31 | class TransformerEncoderLayer(nn.TransformerEncoderLayer): 32 | def with_pos_embed(self, tensor, pos: Optional[Tensor]): 33 | # print("encoder", pos is None) 34 | return tensor if pos is None else tensor + pos 35 | 36 | def forward( 37 | self, 38 | src: Tensor, 39 | src_mask: Optional[Tensor] = None, 40 | src_key_padding_mask: Optional[Tensor] = None, 41 | pos: Optional[Tensor] = None, 42 | ) -> Tensor: 43 | q = k = self.with_pos_embed(src, pos) 44 | src2 = self.self_attn( 45 | q, k, src, attn_mask=src_mask, key_padding_mask=src_key_padding_mask 46 | )[0] 47 | src = src + self.dropout1(src2) 48 | src = self.norm1(src) 49 | src2 = self.linear2(self.dropout(self.activation(self.linear1(src)))) 50 | src = src + self.dropout2(src2) 51 | src = self.norm2(src) 52 | return src 53 | 54 | 55 | class TransformerDecoder(nn.TransformerDecoder): 56 | def forward( 57 | self, 58 | tgt: Tensor, 
59 | memory: Tensor, 60 | tgt_mask: Optional[Tensor] = None, 61 | memory_mask: Optional[Tensor] = None, 62 | tgt_key_padding_mask: Optional[Tensor] = None, 63 | memory_key_padding_mask: Optional[Tensor] = None, 64 | tgt_pos: Optional[Tensor] = None, 65 | memory_pos: Optional[Tensor] = None, 66 | output_attention: Optional[bool] = False, 67 | ) -> Tensor: 68 | output = tgt 69 | attentions = [] 70 | for mod in self.layers: 71 | output, attention = mod( 72 | output, 73 | memory, 74 | tgt_mask=tgt_mask, 75 | memory_mask=memory_mask, 76 | tgt_key_padding_mask=tgt_key_padding_mask, 77 | memory_key_padding_mask=memory_key_padding_mask, 78 | tgt_pos=tgt_pos, 79 | memory_pos=memory_pos, 80 | output_attention=True, 81 | ) 82 | attentions.append(attention) 83 | 84 | if self.norm is not None: 85 | output = self.norm(output) 86 | 87 | if output_attention: 88 | return output, attentions 89 | return output 90 | 91 | 92 | class TransformerDecoderLayer(nn.TransformerDecoderLayer): 93 | def __init__( 94 | self, d_model, nhead, dim_feedforward=2048, dropout=0.1, activation="relu" 95 | ): 96 | super().__init__( 97 | d_model, 98 | nhead, 99 | dim_feedforward=dim_feedforward, 100 | dropout=dropout, 101 | activation=activation, 102 | ) 103 | 104 | def forward( 105 | self, 106 | tgt: Tensor, 107 | memory: Tensor, 108 | tgt_mask: Optional[Tensor] = None, 109 | memory_mask: Optional[Tensor] = None, 110 | tgt_key_padding_mask: Optional[Tensor] = None, 111 | memory_key_padding_mask: Optional[Tensor] = None, 112 | tgt_pos: Optional[Tensor] = None, 113 | memory_pos: Optional[Tensor] = None, 114 | output_attention: Optional[bool] = False, 115 | ) -> Tensor: 116 | x = tgt 117 | x = self.norm1(x + self._sa_block(x, tgt_mask, tgt_key_padding_mask, tgt_pos)) 118 | x_, attn = self._mha_block( 119 | x, 120 | memory, 121 | memory_mask, 122 | memory_key_padding_mask, 123 | x_pos=tgt_pos, 124 | mem_pos=memory_pos, 125 | output_attention=True, 126 | ) 127 | x = self.norm2(x + x_) 128 | x = self.norm3(x + self._ff_block(x)) 129 | if output_attention: 130 | return x, attn 131 | return x 132 | 133 | def with_pos_embed(self, tensor, pos: Optional[Tensor]): 134 | # print("decoder", pos is None) 135 | return tensor if pos is None else tensor + pos 136 | 137 | def _sa_block( 138 | self, 139 | x: Tensor, 140 | attn_mask: Optional[Tensor], 141 | key_padding_mask: Optional[Tensor], 142 | pos: Optional[Tensor], 143 | ) -> Tensor: 144 | q = k = self.with_pos_embed(x, pos) 145 | x = self.self_attn( 146 | q, 147 | k, 148 | x, 149 | attn_mask=attn_mask, 150 | key_padding_mask=key_padding_mask, 151 | need_weights=False, 152 | )[0] 153 | return self.dropout1(x) 154 | 155 | def _mha_block( 156 | self, 157 | x: Tensor, 158 | mem: Tensor, 159 | attn_mask: Optional[Tensor], 160 | key_padding_mask: Optional[Tensor], 161 | x_pos: Optional[Tensor], 162 | mem_pos: Optional[Tensor], 163 | output_attention: Optional[bool] = False, 164 | ) -> Tensor: 165 | x, attn = self.multihead_attn( 166 | self.with_pos_embed(x, x_pos), 167 | self.with_pos_embed(mem, mem_pos), 168 | mem, 169 | attn_mask=attn_mask, 170 | key_padding_mask=key_padding_mask, 171 | need_weights=True, 172 | ) 173 | if output_attention: 174 | return self.dropout2(x), attn 175 | return self.dropout2(x) 176 | 177 | def _ff_block(self, x: Tensor) -> Tensor: 178 | x = self.linear2(self.dropout(self.activation(self.linear1(x)))) 179 | return self.dropout3(x) 180 | -------------------------------------------------------------------------------- /lalaloc/model/unet.py: 
-------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class ConvBlock(nn.Module): 7 | def __init__(self, in_channels, out_channels): 8 | super().__init__() 9 | self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1) 10 | self.bn1 = nn.BatchNorm2d(out_channels) 11 | self.relu1 = nn.ReLU() 12 | self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1) 13 | self.bn2 = nn.BatchNorm2d(out_channels) 14 | self.relu2 = nn.ReLU() 15 | 16 | def forward(self, x): 17 | x = self.conv1(x) 18 | x = self.bn1(x) 19 | x = self.relu1(x) 20 | x = self.conv2(x) 21 | x = self.bn2(x) 22 | x = self.relu2(x) 23 | return x 24 | 25 | 26 | class DecodeBlock(nn.Module): 27 | def __init__(self, in_channels, out_channels): 28 | super().__init__() 29 | self.upsample = nn.Upsample( 30 | scale_factor=2.0, mode="bilinear", align_corners=False 31 | ) 32 | self.conv = ConvBlock(in_channels * 2, out_channels) 33 | 34 | def forward(self, x, encoder_x): 35 | x = self.upsample(x) 36 | x = torch.cat([x, encoder_x], dim=1) 37 | x = self.conv(x) 38 | return x 39 | 40 | 41 | class Encoder(nn.Module): 42 | def __init__(self, channels): 43 | super().__init__() 44 | in_channels = 3 45 | blocks = [] 46 | for out_channels in channels: 47 | blocks.append(ConvBlock(in_channels, out_channels)) 48 | in_channels = out_channels 49 | self.blocks = nn.ModuleList(blocks) 50 | self.pool = nn.MaxPool2d(2) 51 | 52 | def forward(self, x): 53 | features = [] 54 | for block in self.blocks: 55 | x = block(x) 56 | features.append(x) 57 | x = self.pool(x) 58 | return x, features 59 | 60 | 61 | class Decoder(nn.Module): 62 | def __init__(self, channels, in_channels): 63 | super().__init__() 64 | blocks = [] 65 | self.initial_block = DecodeBlock(in_channels, channels[0]) 66 | for out_channels in channels[1:]: 67 | blocks.append(DecodeBlock(in_channels, out_channels)) 68 | in_channels = out_channels 69 | self.blocks = nn.ModuleList(blocks) 70 | 71 | def forward(self, x, encoder_features): 72 | for block, encoder_x in zip(self.blocks, encoder_features): 73 | x = block(x, encoder_x) 74 | return x 75 | 76 | 77 | class UNet(nn.Module): 78 | def __init__(self, config): 79 | super().__init__() 80 | encoder_channels = config.MODEL.UNET_ENCODER_CHANNELS 81 | decoder_channels = config.MODEL.UNET_DECODER_CHANNELS 82 | self.encoder = Encoder(encoder_channels) 83 | self.decoder = Decoder(decoder_channels, encoder_channels[-1]) 84 | self.conv = nn.Conv2d(128, 128, 3, padding=1) 85 | 86 | def forward(self, x): 87 | x, features = self.encoder(x) 88 | x = self.decoder(x, features[::-1]) 89 | x = self.conv(x) 90 | x = F.normalize(x, dim=1) 91 | return x 92 | -------------------------------------------------------------------------------- /lalaloc/utils/chamfer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from pytorch3d.loss import chamfer_distance 3 | 4 | 5 | def chamfer(points1, points2, device=None): 6 | with torch.no_grad(): 7 | if device is None: 8 | points1 = points1.cuda() 9 | points2 = points2.cuda() 10 | else: 11 | points1 = points1.to(device) 12 | points2 = points2.to(device) 13 | distances, _ = chamfer_distance(points1, points2, batch_reduction=None) 14 | return distances.cpu() 15 | -------------------------------------------------------------------------------- /lalaloc/utils/eval.py: -------------------------------------------------------------------------------- 1 | import 
torch 2 | 3 | def index_of_val(arr, value): 4 | return (arr == value).nonzero() 5 | 6 | 7 | def recall_at_n(n, predictions, truths): 8 | results = [] 9 | for prediction, truth in zip(predictions, truths): 10 | intersection = [p for p in prediction[:n] if p in truth[:n]] 11 | result = 1 if len(intersection) > 0 else 0 12 | results.append(result) 13 | results = torch.Tensor(results).mean() 14 | return results -------------------------------------------------------------------------------- /lalaloc/utils/floorplan.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | 4 | 5 | def pose_to_pixel_loc(pose, scale, shift): 6 | locs = pose.clone()[:, :, :2] 7 | locs -= shift.view(-1, 1, 2) 8 | locs *= scale.view(-1, 1, 1) 9 | return locs 10 | 11 | 12 | def pixel_loc_to_pose(locs, scale, shift, z): 13 | poses = locs.clone().float() 14 | poses /= scale.view(-1, 1, 1) 15 | poses += shift.view(-1, 1, 2) 16 | h, w, _ = poses.shape 17 | zs = torch.full((h, w, 1), z).to(poses.device) 18 | return torch.cat([poses, zs], dim=-1) 19 | 20 | 21 | def create_pixel_loc_grid(w, h): 22 | x = torch.arange(w) 23 | y = torch.arange(h) 24 | ys, xs = torch.meshgrid([y, x]) 25 | loc_grid = torch.stack([xs, ys], dim=-1) 26 | return loc_grid 27 | 28 | 29 | def sample_locs(tensor, locs, normalise=True): 30 | b, c, h, w = tensor.shape 31 | locs[:, :, 0] = locs[:, :, 0] / (w / 2) - 1 32 | locs[:, :, 1] = locs[:, :, 1] / (h / 2) - 1 33 | _, n, _ = locs.shape 34 | locs = locs.view(b, 1, -1, 2).clone().float() 35 | sampled = F.grid_sample(tensor, locs, align_corners=False) 36 | sampled = sampled.view(b, c, n).permute(0, 2, 1) 37 | if normalise: 38 | sampled = F.normalize(sampled, dim=-1) 39 | return sampled 40 | -------------------------------------------------------------------------------- /lalaloc/utils/panorama.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parts of this code are modified from: https://github.com/bertjiazheng/Structured3D 3 | Copyright (c) 2019 Structured3D Group 4 | """ 5 | import numpy as np 6 | 7 | 8 | def uvs_to_rays(uvs): 9 | xs_ray = np.cos(uvs[:, 1]) * np.sin(uvs[:, 0]) 10 | ys_ray = np.cos(uvs[:, 1]) * np.cos(uvs[:, 0]) 11 | zs_ray = np.sin(uvs[:, 1]) 12 | 13 | rays = np.stack([xs_ray, ys_ray, zs_ray], axis=1) 14 | rays = rays / np.linalg.norm(rays, axis=1).reshape(-1, 1) 15 | return rays 16 | 17 | 18 | def coords_to_uv(coords, width, height): 19 | """ 20 | Image coordinates (xy) to uv 21 | """ 22 | middleX = width / 2 + 0.5 23 | middleY = height / 2 + 0.5 24 | uv = np.hstack( 25 | [ 26 | (coords[:, [0]] - middleX) / width * 2 * np.pi, 27 | -(coords[:, [1]] - middleY) / height * np.pi, 28 | ] 29 | ) 30 | return uv 31 | -------------------------------------------------------------------------------- /lalaloc/utils/polygons.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parts of this code are modified from: https://github.com/bertjiazheng/Structured3D 3 | Copyright (c) 2019 Structured3D Group 4 | """ 5 | import numpy as np 6 | import pymesh 7 | 8 | 9 | def project(x, meta): 10 | """ project 3D to 2D for polygon clipping 11 | """ 12 | proj_axis = max(range(3), key=lambda i: abs(meta["normal"][i])) 13 | 14 | return tuple(c for i, c in enumerate(x) if i != proj_axis) 15 | 16 | 17 | def project_inv(x, meta): 18 | """ recover 3D points from 2D 19 | """ 20 | # Returns the vector w in the walls' plane such that project(w) 
equals x. 21 | proj_axis = max(range(3), key=lambda i: abs(meta["normal"][i])) 22 | 23 | w = list(x) 24 | w[proj_axis:proj_axis] = [0.0] 25 | c = -meta["offset"] 26 | for i in range(3): 27 | c -= w[i] * meta["normal"][i] 28 | c /= meta["normal"][proj_axis] 29 | w[proj_axis] = c 30 | return tuple(w) 31 | 32 | 33 | def triangulate(points): 34 | """ triangulate the plane for operation and visualization 35 | """ 36 | 37 | num_points = len(points) 38 | indices = np.arange(num_points, dtype=np.int) 39 | segments = np.vstack((indices, np.roll(indices, -1))).T 40 | tri = pymesh.triangle() 41 | tri.points = np.array(points) 42 | 43 | tri.segments = segments 44 | tri.verbosity = 0 45 | tri.run() 46 | return tri.mesh 47 | 48 | 49 | def clip_polygon(polygons, vertices_hole, junctions, meta, clip_holes=True): 50 | """ clip polygon the hole 51 | """ 52 | if len(polygons) == 1: 53 | junctions = [junctions[vertex] for vertex in polygons[0]] 54 | mesh_wall = triangulate(junctions) 55 | vertices = np.array(mesh_wall.vertices) 56 | faces = np.array(mesh_wall.faces) 57 | 58 | return vertices, faces 59 | 60 | else: 61 | wall = [] 62 | holes = [] 63 | for polygon in polygons: 64 | if np.any(np.intersect1d(polygon, vertices_hole)): 65 | holes.append(polygon) 66 | else: 67 | wall.append(polygon) 68 | 69 | # extract junctions on this plane 70 | indices = [] 71 | junctions_wall = [] 72 | for plane in wall: 73 | for vertex in plane: 74 | indices.append(vertex) 75 | junctions_wall.append(junctions[vertex]) 76 | junctions_wall = [project(x, meta) for x in junctions_wall] 77 | mesh_wall = triangulate(junctions_wall) 78 | 79 | if clip_holes: 80 | junctions_holes = [] 81 | for plane in holes: 82 | junctions_hole = [] 83 | for vertex in plane: 84 | indices.append(vertex) 85 | junctions_hole.append(junctions[vertex]) 86 | junctions_holes.append(junctions_hole) 87 | 88 | junctions_holes = [ 89 | [project(x, meta) for x in junctions_hole] 90 | for junctions_hole in junctions_holes 91 | ] 92 | 93 | for hole in junctions_holes: 94 | mesh_hole = triangulate(hole) 95 | mesh_wall = pymesh.boolean(mesh_wall, mesh_hole, "difference") 96 | 97 | vertices = [project_inv(vertex, meta) for vertex in mesh_wall.vertices] 98 | 99 | return vertices, np.array(mesh_wall.faces) 100 | 101 | 102 | def convert_lines_to_vertices(lines): 103 | """convert line representation to polygon vertices 104 | """ 105 | polygons = [] 106 | lines = np.array(lines) 107 | 108 | polygon = None 109 | while len(lines) != 0: 110 | if polygon is None: 111 | polygon = lines[0].tolist() 112 | lines = np.delete(lines, 0, 0) 113 | 114 | lineID, juncID = np.where(lines == polygon[-1]) 115 | vertex = lines[lineID[0], 1 - juncID[0]] 116 | lines = np.delete(lines, lineID, 0) 117 | 118 | if vertex in polygon: 119 | polygons.append(polygon) 120 | polygon = None 121 | else: 122 | polygon.append(vertex) 123 | 124 | return polygons 125 | -------------------------------------------------------------------------------- /lalaloc/utils/projection.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from pytorch3d.structures import Pointclouds 4 | from pytorch3d.renderer.mesh.rasterize_meshes import barycentric_coordinates 5 | 6 | from .panorama import coords_to_uv, uvs_to_rays 7 | 8 | 9 | def project_depth_to_pc( 10 | config, depth, camera_type="panorama", camera_params=None, 11 | ): 12 | height, width = depth.shape 13 | depth = depth.reshape(-1) 14 | xys_camera = np.stack( 15 | np.meshgrid(np.arange(width), 
np.arange(height)), axis=2 16 | ).reshape((-1, 2)) 17 | 18 | if camera_type == "panorama": 19 | uvs = coords_to_uv(xys_camera, width, height) 20 | rays = uvs_to_rays(uvs).astype(np.float32).reshape(-1, 3) 21 | else: 22 | raise NotImplementedError( 23 | "{} camera_type is not currently implemented".format(camera_type) 24 | ) 25 | invalid_depth = np.isnan(depth) | (depth <= 0) 26 | points = depth[~invalid_depth].reshape(-1, 1) * rays[~invalid_depth] 27 | 28 | return points 29 | 30 | 31 | def project_depth_to_pc_batched( 32 | config, depths, camera_type="panorama", camera_params=None 33 | ): 34 | height, width = config.RENDER.IMG_SIZE 35 | xys_camera = np.stack( 36 | np.meshgrid(np.arange(width), np.arange(height)), axis=2 37 | ).reshape((-1, 2)) 38 | 39 | if camera_type == "panorama": 40 | uvs = coords_to_uv(xys_camera, width, height) 41 | rays = uvs_to_rays(uvs).astype(np.float32).reshape(-1, 3) 42 | else: 43 | raise NotImplementedError( 44 | "{} camera_type is not currently implemented".format(camera_type) 45 | ) 46 | points = [] 47 | for depth in depths: 48 | depth = depth.reshape(-1) 49 | invalid_depth = np.isnan(depth) | (depth <= 0) 50 | point = depth[~invalid_depth].reshape(-1, 1) * rays[~invalid_depth] 51 | points.append(torch.Tensor(point)) 52 | 53 | points = Pointclouds(points) 54 | return points 55 | 56 | 57 | def projects_onto_floor(location, floors): 58 | for idx, (verts, faces) in enumerate(zip(floors.verts_list(), floors.faces_list())): 59 | triangles = [verts[idxs] for idxs in faces] 60 | for triangle in triangles: 61 | bary = barycentric_coordinates(location, *triangle) 62 | bary = torch.Tensor(bary) 63 | if (bary >= 0).all() and (bary <= 1).all() and abs(sum(bary) - 1) < 1e-5: 64 | return idx 65 | return -1 66 | -------------------------------------------------------------------------------- /lalaloc/utils/render.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pyredner 3 | import torch 4 | from pyredner import scene 5 | 6 | device = torch.device("cpu:0") 7 | 8 | pyredner.set_print_timing(False) 9 | pyredner.set_device(device) 10 | 11 | 12 | def create_camera(pose, img_size): 13 | # look at x axis equivalent to an offset x' 14 | look_at = pose.clone() + torch.Tensor([1, 0, 0]) 15 | up = torch.Tensor([0.0, 0.0, 1.0]) 16 | 17 | return pyredner.Camera( 18 | position=pose, 19 | look_at=look_at, 20 | up=up, 21 | camera_type=pyredner.camera_type.panorama, 22 | resolution=img_size, 23 | ) 24 | 25 | 26 | def create_objects(mesh): 27 | material = pyredner.Material() 28 | objects = [] 29 | for verts, faces in zip(mesh.verts_list(), mesh.faces_list()): 30 | objects.append(pyredner.Object(verts, faces.int(), material)) 31 | return objects 32 | 33 | 34 | def render_scene(config, mesh, pose): 35 | img_size = config.RENDER.IMG_SIZE 36 | pose = torch.Tensor(pose) 37 | 38 | camera = create_camera(pose, img_size) 39 | objects = create_objects(mesh) 40 | scene = pyredner.Scene(camera=camera, objects=objects) 41 | 42 | img = pyredner.render_g_buffer(scene, [pyredner.channels.depth], device=device) 43 | # flip x axis for equivalent to a flipped x' 44 | img = img.flip(dims=[1]) 45 | img = img.cpu().squeeze(2).numpy() 46 | img = np.ascontiguousarray(img) 47 | return img 48 | 49 | 50 | def render_semantics(config, mesh, pose): 51 | img_size = config.RENDER.IMG_SIZE 52 | pose = torch.Tensor(pose) 53 | 54 | camera = create_camera(pose, img_size) 55 | objects = create_objects(mesh) 56 | scene = pyredner.Scene(camera=camera, 
objects=objects) 57 | 58 | img = pyredner.render_g_buffer(scene, [pyredner.channels.shape_id], num_samples=1) 59 | # flip x axis for equivalent to a flipped x' 60 | img = img.flip(dims=[1]) 61 | img = img.cpu().squeeze(2).numpy() 62 | img = np.ascontiguousarray(img) 63 | return img 64 | 65 | 66 | def render_scene_batched(config, geometry, poses): 67 | scenes = [] 68 | rooms = [r for _, r in poses] 69 | rooms = np.unique(rooms) 70 | 71 | objects = {} 72 | for room in rooms: 73 | mesh = geometry[room] 74 | objs = create_objects(mesh) 75 | objects[room] = objs 76 | 77 | img_size = config.RENDER.IMG_SIZE 78 | for pose, room in poses: 79 | pose = torch.Tensor(pose) 80 | camera = create_camera(pose, img_size) 81 | scenes.append(pyredner.Scene(camera=camera, objects=objects[room])) 82 | 83 | imgs = pyredner.render_g_buffer( 84 | scenes, [pyredner.channels.depth], device=device, num_samples=1 85 | ) 86 | # flip x axis for equivalent to a flipped x' 87 | imgs = imgs.flip(dims=[2]) 88 | imgs = imgs.cpu().squeeze(3).numpy() 89 | imgs = np.ascontiguousarray(imgs) 90 | return imgs 91 | 92 | 93 | def render_semantic_batched(config, geometry, poses): 94 | scenes = [] 95 | rooms = [r for _, r in poses] 96 | rooms = np.unique(rooms) 97 | 98 | objects = {} 99 | for room in rooms: 100 | mesh = geometry[room] 101 | objs = create_objects(mesh) 102 | objects[room] = objs 103 | 104 | img_size = config.RENDER.IMG_SIZE 105 | for pose, room in poses: 106 | pose = torch.Tensor(pose) 107 | camera = create_camera(pose, img_size) 108 | scenes.append(pyredner.Scene(camera=camera, objects=objects[room])) 109 | 110 | imgs = pyredner.render_g_buffer( 111 | scenes, [pyredner.channels.shape_id], device=device, num_samples=1 112 | ) 113 | # flip x axis for equivalent to a flipped x' 114 | imgs = imgs.flip(dims=[2]) 115 | imgs = imgs.cpu().squeeze(3).numpy() 116 | imgs = np.ascontiguousarray(imgs) 117 | return imgs 118 | -------------------------------------------------------------------------------- /lalaloc/utils/vogel_disc.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def sample_vogel_disc(centre, radius, num_samples): 5 | phi = (1 + np.sqrt(5)) / 2 6 | 7 | samples = [] 8 | for i in range(1, num_samples + 1): 9 | distance = radius * np.sqrt(i) / np.sqrt(num_samples + 1) 10 | angle = 2 * np.pi * phi * i 11 | 12 | x = distance * np.cos(angle) + centre[0] 13 | y = distance * np.sin(angle) + centre[1] 14 | samples.append(np.array([x, y, centre[2]])) 15 | return samples -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import pytorch_lightning as pl 2 | import torch 3 | import torch.nn as nn 4 | from pytorch_lightning import loggers 5 | from pytorch_lightning.callbacks import ModelCheckpoint 6 | 7 | from lalaloc.config import get_cfg_defaults, parse_args 8 | from lalaloc.model import ( 9 | FloorPlanUnetImage, 10 | FloorPlanUnetLayout, 11 | ImageFromLayout, 12 | Layout2LayoutDecode, 13 | ) 14 | 15 | if __name__ == "__main__": 16 | args = parse_args() 17 | 18 | config = get_cfg_defaults() 19 | config.merge_from_file(args.config_file) 20 | config.merge_from_list(args.opts) 21 | if args.val: 22 | config.TEST.VAL_AS_TEST = True 23 | config.freeze() 24 | print(config) 25 | 26 | pl.seed_everything(config.SEED) 27 | 28 | if args.checkpoint_file: 29 | resume_path = args.checkpoint_file 30 | else: 31 | resume_path = None 32 | 33 | if 
config.MODEL.TYPE == "lalaloc": 34 | if config.MODEL.QUERY_TYPE == "image": 35 | model = ImageFromLayout(config) 36 | elif config.MODEL.QUERY_TYPE == "layout": 37 | model = Layout2LayoutDecode(config) 38 | else: 39 | raise NotImplementedError( 40 | "The query type, {}, isn't recognised.".format(config.MODEL.QUERY_TYPE) 41 | ) 42 | elif config.MODEL.TYPE == "lalaloc++": 43 | if config.MODEL.QUERY_TYPE == "image": 44 | model = FloorPlanUnetImage(config) 45 | elif config.MODEL.QUERY_TYPE == "layout": 46 | model = FloorPlanUnetLayout(config) 47 | else: 48 | raise NotImplementedError( 49 | "The query type, {}, isn't recognised.".format(config.MODEL.QUERY_TYPE) 50 | ) 51 | 52 | logger = loggers.TensorBoardLogger(config.OUT_DIR) 53 | checkpoint_callback = ModelCheckpoint(save_top_k=-1,) 54 | 55 | trainer = pl.Trainer( 56 | max_epochs=config.TRAIN.NUM_EPOCHS, 57 | gpus=config.SYSTEM.NUM_GPUS, 58 | logger=logger, 59 | distributed_backend=config.SYSTEM.DISTRIBUTED_BACKEND, 60 | limit_val_batches=25, 61 | resume_from_checkpoint=resume_path, 62 | num_sanity_val_steps=2, 63 | check_val_every_n_epoch=config.TRAIN.TEST_EVERY, 64 | callbacks=[checkpoint_callback], 65 | ) 66 | if args.test_ckpt: 67 | assert config.SYSTEM.NUM_GPUS == 1 68 | load = torch.load(args.test_ckpt) 69 | model.load_state_dict(load["state_dict"], strict=False) 70 | trainer.test(model) 71 | else: 72 | trainer.fit(model) 73 | -------------------------------------------------------------------------------- /trained_models/.gitattributes: -------------------------------------------------------------------------------- 1 | lalaloc_pp_image2plan.ckpt filter=lfs diff=lfs merge=lfs -text 2 | lalaloc_pp_planbranch.ckpt filter=lfs diff=lfs merge=lfs -text 3 | -------------------------------------------------------------------------------- /trained_models/lalaloc_image2layout.ckpt: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:ad1e340d3f2e52bdc64cdaeeb2d62e46388b1524aa106b59a3e78e5fb60f2472 3 | size 235577387 4 | -------------------------------------------------------------------------------- /trained_models/lalaloc_layout2layout.ckpt: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:3b8e8eff602cc464e88e786011a9e1fcfb45fca0ed8769eb75d0062a0e5a1b23 3 | size 140357335 4 | -------------------------------------------------------------------------------- /trained_models/lalaloc_pp_image2plan.ckpt: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:fba43aa21769c53e3bb0802a475184980f724b8efdddb219ceca577c43ae6d98 3 | size 257260520 4 | -------------------------------------------------------------------------------- /trained_models/lalaloc_pp_planbranch.ckpt: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:0c4d45af98576cbdde2bc457d5111b87a3e5dc3e5cbc828463373f85f8eeb0e4 3 | size 105801121 4 | --------------------------------------------------------------------------------