├── .gitattributes ├── .gitignore ├── LICENSE ├── README.md ├── assets ├── lll++_overview.png └── overview.png ├── configs ├── image_branch.yaml ├── lalaloc_pp │ ├── plan_branch.yaml │ └── transformer_image_branch.yaml └── layout_branch.yaml ├── lalaloc ├── __init__.py ├── config │ ├── __init__.py │ ├── defaults.py │ └── parser.py ├── data │ ├── __init__.py │ ├── dataset.py │ ├── load.py │ ├── split.py │ └── transform.py ├── model │ ├── __init__.py │ ├── lalaloc.py │ ├── lalaloc_base.py │ ├── lalaloc_pp.py │ ├── lalaloc_pp_base.py │ ├── losses.py │ ├── modules.py │ ├── pose_optimisation.py │ ├── position_encoding.py │ ├── transformer.py │ └── unet.py └── utils │ ├── chamfer.py │ ├── eval.py │ ├── floorplan.py │ ├── panorama.py │ ├── polygons.py │ ├── projection.py │ ├── render.py │ └── vogel_disc.py ├── train.py └── trained_models ├── .gitattributes ├── lalaloc_image2layout.ckpt ├── lalaloc_layout2layout.ckpt ├── lalaloc_pp_image2plan.ckpt └── lalaloc_pp_planbranch.ckpt /.gitattributes: -------------------------------------------------------------------------------- 1 | *.ckpt filter=lfs diff=lfs merge=lfs -text 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ 2 | .vscode/ 3 | runs/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Henry Howard-Jenkins @ Active Vision Laboratory, Oxford 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | This is the code repository for LaLaLoc and LaLaLoc++. 4 | 5 | 6 | * We currently provide: 7 | * Training and evaluation code for LaLaLoc, for both the Image-to-Layout and Layout-to-Layout configurations. 8 | * Training and evaluation code for LaLaLoc++'s plan and image branches. 9 | * Pretrained models for all the provided configs. 10 | 11 | ## LaLaLoc++: Global Floor Plan Comprehension for Layout Localisation in Unvisited Environments 12 | **Henry Howard-Jenkins and Victor Adrian Prisacariu** 13 | **(ECCV 2022)** 14 | 15 | [Project Page](https://lalalocpp.active.vision) | Paper(coming soon!) 
16 | 17 | ![LaLaLoc++ Overview](assets/lll++_overview.png) 18 | 19 | ## LaLaLoc: Latent Layout Localisation in Dynamic, Unvisited Environments 20 | **Henry Howard-Jenkins, Jose-Raul Ruiz-Sarmiento and Victor Adrian Prisacariu** 21 | **(ICCV 2021)** 22 | 23 | [Project Page](https://lalaloc.active.vision) | [Paper](https://arxiv.org/abs/2104.09169) 24 | 25 | ![LaLaLoc Overview](assets/overview.png) 26 | 27 | 28 | # Setup 29 | ## Installing Requirements 30 | 31 | * Create conda environment: 32 | ``` 33 | conda create -n lalaloc python==3.8 34 | conda activate lalaloc 35 | ``` 36 | * Install PyTorch: 37 | ``` 38 | conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch 39 | ``` 40 | * Install Pytorch Lightning: 41 | ``` 42 | conda install -c conda-forge pytorch-lightning==1.1.5 43 | ``` 44 | * Install Pytorch3d: 45 | ``` 46 | conda install -c fvcore -c iopath -c conda-forge fvcore iopath 47 | conda install -c bottler nvidiacub 48 | conda install -c pytorch3d pytorch3d==0.4.0 49 | ``` 50 | * Install Pymesh 51 | * Follow build and install instructions: https://github.com/PyMesh/PyMesh 52 | * Install Redner and OpenCV: 53 | ``` 54 | pip install redner-gpu opencv-python 55 | ``` 56 | * Install Scikit-Learn: 57 | ``` 58 | conda install -c anaconda scikit-learn 59 | ``` 60 | 61 | ## Download the Structured3D Dataset 62 | * Information provided here: https://github.com/bertjiazheng/Structured3D 63 | 64 | # Usage 65 | ### Layout/Plan Branch 66 | * Train LaLaLoc's layout branch or LaLaLoc++'s plan branch. 67 | ``` 68 | # LaLaLoc layout branch 69 | python train.py -c configs/layout_branch.yaml \ 70 | DATASET.PATH [path/to/dataset] 71 | ``` 72 | ``` 73 | # LaLaLoc++ plan branch 74 | python train.py -c configs/lalaloc_pp/plan_branch.yaml \ 75 | DATASET.PATH [path/to/dataset] 76 | ``` 77 | * Test LaLaLoc's layout branch: 78 | * Perform evaluation of the trained layout branch on a sampled grid of 0.5m with VDR and LPO. 79 | 80 | Note: Testing LaLaLoc++'s plan branch isn't particularly meaningful. 81 | ``` 82 | python train.py -c configs/layout_branch.yaml -t [path/to/checkpoint] \ 83 | DATASET.PATH [path/to/dataset] \ 84 | SYSTEM.NUM_GPUS 1 \ 85 | TEST.VOGEL_DISC_REFINE True \ 86 | TEST.LATENT_POSE_OPTIMISATION True \ 87 | TEST.POSE_SAMPLE_STEP 500 88 | ``` 89 | 90 | ### Image Branch 91 | * Train the image branch for LaLaLoc and LaLaLoc++ 92 | * Perform training of the image branch with the layout/plan branch from a previous training run. 
93 | ``` 94 | # LaLaLoc image branch 95 | python train.py -c configs/image_branch.yaml \ 96 | DATASET.PATH [path/to/dataset] \ 97 | TRAIN.SOURCE_WEIGHTS [path/to/layout_branch_checkpoint] 98 | ``` 99 | ``` 100 | # LaLaLoc++ image branch 101 | python train.py -c configs/lalaloc_pp/transformer_image_branch.yaml \ 102 | DATASET.PATH [path/to/dataset] \ 103 | TRAIN.SOURCE_WEIGHTS [path/to/plan_branch_checkpoint] 104 | ``` 105 | 106 | * Test the image branch: 107 | ``` 108 | # LaLaLoc image branch 109 | python train.py -c configs/image_branch.yaml -t [path/to/checkpoint] \ 110 | DATASET.PATH [path/to/dataset] \ 111 | SYSTEM.NUM_GPUS 1 \ 112 | TEST.VOGEL_DISC_REFINE True \ 113 | TEST.LATENT_POSE_OPTIMISATION True \ 114 | TEST.POSE_SAMPLE_STEP 500 115 | ``` 116 | ``` 117 | # LaLaLoc++ image branch 118 | python train.py -c configs/lalaloc_pp/transformer_image_branch.yaml -t [path/to/checkpoint] \ 119 | DATASET.PATH [path/to/dataset] \ 120 | SYSTEM.NUM_GPUS 1 121 | ``` 122 | 123 | # Citations 124 | ``` 125 | @inproceedings{howard2022lalaloc++, 126 | title={LaLaLoc++: Global Floor Plan Comprehension for Layout Localisation in Unvisited Environments}, 127 | author={Howard-Jenkins, Henry and Prisacariu, Victor Adrian}, 128 | booktitle={Proceedings of the European Conference on Computer Vision}, 129 | pages={}, 130 | year={2022} 131 | } 132 | ``` 133 | ``` 134 | @inproceedings{howard2021lalaloc, 135 | title={Lalaloc: Latent layout localisation in dynamic, unvisited environments}, 136 | author={Howard-Jenkins, Henry and Ruiz-Sarmiento, Jose-Raul and Prisacariu, Victor Adrian}, 137 | booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision}, 138 | pages={10107--10116}, 139 | year={2021} 140 | } 141 | ``` -------------------------------------------------------------------------------- /assets/lll++_overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ActiveVisionLab/LaLaLoc/be48bc1884722409eeb6766e55fafa8ece521c9c/assets/lll++_overview.png -------------------------------------------------------------------------------- /assets/overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ActiveVisionLab/LaLaLoc/be48bc1884722409eeb6766e55fafa8ece521c9c/assets/overview.png -------------------------------------------------------------------------------- /configs/image_branch.yaml: -------------------------------------------------------------------------------- 1 | OUT_DIR: "./runs/lalaloc/image_branch" 2 | SYSTEM: 3 | NUM_GPUS: 2 4 | DISTRIBUTED_BACKEND: "ddp" 5 | NUM_WORKERS: 0 6 | MODEL: 7 | QUERY_TYPE: "image" 8 | PANORAMA_BACKBONE: "resnet50" 9 | LAYOUT_BACKBONE: "resnet18" 10 | NORMALISE_EMBEDDING: True 11 | TRAIN: 12 | SOURCE_WEIGHTS: "" 13 | LR_MILESTONES: [100, 150] 14 | NUM_EPOCHS: 200 15 | NEAR_MAX_DIST: 0.5 16 | LOSS: "cos" 17 | INITIAL_LR: 0.1 18 | BATCH_SIZE: 32 19 | TEST_EVERY: 20 20 | NUM_NEAR_SAMPLES: 1 21 | NUM_FAR_SAMPLES: 0 22 | TEST: 23 | POSE_SAMPLE_STEP: 1000 24 | LAYOUTS_MAX_BATCH: 256 25 | RENDER: 26 | BATCH_SIZE: 1024 27 | IMG_SIZE: (128, 256) 28 | POSE_REFINE: 29 | RENDER_SIZE: (128, 256) 30 | -------------------------------------------------------------------------------- /configs/lalaloc_pp/plan_branch.yaml: -------------------------------------------------------------------------------- 1 | OUT_DIR: "./runs/lalaloc_pp/plan_branch" 2 | SYSTEM: 3 | NUM_GPUS: 1 4 | DISTRIBUTED_BACKEND: "ddp" 5 | NUM_WORKERS: 16 6 | MODEL: 7 | TYPE: 
"lalaloc++" 8 | QUERY_TYPE: "layout" 9 | NORMALISE_EMBEDDING: True 10 | NORMALISE_SAMPLE: True 11 | TRAIN: 12 | LR_MILESTONES: [50, 75] 13 | NUM_EPOCHS: 100 14 | INITIAL_LR: 0.05 15 | BATCH_SIZE: 16 16 | TEST_EVERY: 10 17 | NUM_FAR_SAMPLES: 20 18 | COMPUTE_GT_DIST: False 19 | TEST: 20 | POSE_SAMPLE_STEP: 1000 21 | RENDER: 22 | BATCH_SIZE: 1024 23 | IMG_SIZE: (128, 256) -------------------------------------------------------------------------------- /configs/lalaloc_pp/transformer_image_branch.yaml: -------------------------------------------------------------------------------- 1 | OUT_DIR: "./runs/lalaloc_pp/image_branch" 2 | SYSTEM: 3 | NUM_GPUS: 1 4 | DISTRIBUTED_BACKEND: "ddp" 5 | NUM_WORKERS: 16 6 | MODEL: 7 | TYPE: "lalaloc++" 8 | QUERY_TYPE: "image" 9 | PANORAMA_BACKBONE: "resnet50" 10 | LAYOUT_BACKBONE: "resnet18" 11 | NORMALISE_EMBEDDING: True 12 | NORMALISE_SAMPLE: True 13 | PANO_EMBEDDER_TYPE: "transformer-fc" 14 | PANORAMA_MODULE: 15 | POS_AT_INPUT: False 16 | HIDDEN_DIM: 256 17 | TRAIN: 18 | SOURCE_WEIGHTS: "trained_models/lalaloc_pp_planbranch.ckpt" 19 | LR_MILESTONES: [100, 150] 20 | NUM_EPOCHS: 200 21 | NEAR_MAX_DIST: 0.5 22 | INITIAL_LR: 0.1 23 | BATCH_SIZE: 64 24 | TEST_EVERY: 50 25 | NUM_NEAR_SAMPLES: 1 26 | NUM_FAR_SAMPLES: 0 27 | COMPUTE_GT_DIST: False 28 | TEST: 29 | POSE_SAMPLE_STEP: 1000 30 | LAYOUTS_MAX_BATCH: 256 31 | RENDER: 32 | BATCH_SIZE: 1024 33 | IMG_SIZE: (128, 256) 34 | POSE_REFINE: 35 | RENDER_SIZE: (128, 256) 36 | INPUT: 37 | NORMALISE_STD: [0.229, 0.224, 0.225] -------------------------------------------------------------------------------- /configs/layout_branch.yaml: -------------------------------------------------------------------------------- 1 | OUT_DIR: "./runs/lalaloc/layout_branch" 2 | SYSTEM: 3 | NUM_GPUS: 2 4 | DISTRIBUTED_BACKEND: "ddp" 5 | NUM_WORKERS: 0 6 | MODEL: 7 | QUERY_TYPE: "layout" 8 | LAYOUT_BACKBONE: "resnet18" 9 | NORMALISE_EMBEDDING: True 10 | TRAIN: 11 | LR_MILESTONES: [10, 15] 12 | NUM_EPOCHS: 20 13 | NEAR_MAX_DIST: 0.5 14 | LOSS: "bbs" 15 | INITIAL_LR: 0.01 16 | BATCH_SIZE: 4 17 | TEST_EVERY: 4 18 | NUM_FAR_SAMPLES: 20 19 | LAYOUT_LOSS_SCALE: 100.0 20 | DECODER_LOSS_SCALE: 1.0 21 | TEST: 22 | POSE_SAMPLE_STEP: 1000 23 | LAYOUTS_MAX_BATCH: 256 24 | RENDER: 25 | BATCH_SIZE: 1024 26 | IMG_SIZE: (128, 256) 27 | POSE_REFINE: 28 | RENDER_SIZE: (128, 256) -------------------------------------------------------------------------------- /lalaloc/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ActiveVisionLab/LaLaLoc/be48bc1884722409eeb6766e55fafa8ece521c9c/lalaloc/__init__.py -------------------------------------------------------------------------------- /lalaloc/config/__init__.py: -------------------------------------------------------------------------------- 1 | from .defaults import get_cfg_defaults 2 | from .parser import parse_args -------------------------------------------------------------------------------- /lalaloc/config/defaults.py: -------------------------------------------------------------------------------- 1 | from yacs.config import CfgNode as CN 2 | 3 | _C = CN() 4 | _C.OUT_DIR = "./runs" 5 | _C.SEED = 42 6 | 7 | _C.SYSTEM = CN() 8 | _C.SYSTEM.NUM_WORKERS = 0 9 | _C.SYSTEM.NUM_GPUS = 1 10 | _C.SYSTEM.DISTRIBUTED_BACKEND = "ddp" 11 | 12 | 13 | _C.INPUT = CN() 14 | 15 | _C.INPUT.IMG_SIZE = (256, 512) 16 | _C.INPUT.LAYOUT_SIZE = (256, 512) 17 | _C.INPUT.NORMALISE_MEAN = [0.485, 0.456, 0.406] 18 | _C.INPUT.NORMALISE_STD = [0.485, 0.456, 
0.406] 19 | # _C.INPUT.NORMALISE_STD = [0.229, 0.224, 0.225] # A bug meant that this was also used as the STD for LaLaLoc configs 20 | 21 | 22 | _C.DATASET = CN() 23 | 24 | _C.DATASET.PATH = "/home/henry/Data/datasets/structured3d/Structured3D" 25 | _C.DATASET.TEST_LIGHTING = ["warm"] 26 | _C.DATASET.TEST_FURNITURE = ["full"] 27 | _C.DATASET.TRAIN_LIGHTING = ["raw", "warm", "cold"] 28 | _C.DATASET.TRAIN_FURNITURE = ["empty", "full", "simple"] 29 | _C.DATASET.AUGMENT_LAYOUTS = False 30 | _C.DATASET.PIX_PER_MM = 0.025 31 | _C.DATASET.FLOORPLAN_DIVISIBLE_BY = 32 32 | 33 | 34 | _C.RENDER = CN() 35 | 36 | _C.RENDER.IMG_SIZE = (256, 512) 37 | _C.RENDER.USE_CUDA = True 38 | _C.RENDER.BATCH_SIZE = 256 39 | _C.RENDER.INVALID_VALUE = -1000 40 | 41 | 42 | _C.MODEL = CN() 43 | 44 | _C.MODEL.TYPE = "lalaloc" 45 | _C.MODEL.QUERY_TYPE = "image" 46 | _C.MODEL.PANORAMA_BACKBONE = "resnet18" 47 | _C.MODEL.LAYOUT_BACKBONE = "resnet18" 48 | _C.MODEL.DESC_LENGTH = 128 49 | _C.MODEL.NORMALISE_EMBEDDING = True 50 | _C.MODEL.PANO_EMBEDDER_TYPE = "fc" 51 | _C.MODEL.LAYOUT_EMBEDDER_TYPE = "fc" 52 | _C.MODEL.DECODER_RESOLUTION = (32, 64) 53 | _C.MODEL.NORMALISE_SAMPLE = False 54 | 55 | _C.MODEL.UNET_ENCODER_CHANNELS = [32, 64, 128, 256, 512] 56 | _C.MODEL.UNET_DECODER_CHANNELS = [512, 256, 128, 64, 32, 128] 57 | 58 | _C.MODEL.PANORAMA_MODULE = CN() 59 | _C.MODEL.PANORAMA_MODULE.POS_AT_INPUT = True 60 | _C.MODEL.PANORAMA_MODULE.NUM_BLOCKS = 2 61 | _C.MODEL.PANORAMA_MODULE.HIDDEN_DIM = 2048 62 | 63 | 64 | _C.POSE_REFINE = CN() 65 | _C.POSE_REFINE.LR = 0.01 66 | _C.POSE_REFINE.SCHEDULER_PATIENCE = 10 67 | _C.POSE_REFINE.SCHEDULER_THRESHOLD = 0.05 68 | _C.POSE_REFINE.SCHEDULER_DECAY = 0.5 69 | _C.POSE_REFINE.CONVERGANCE_THRESHOLD = 0.0001 70 | _C.POSE_REFINE.CONVERGANCE_PATIENCE = 20 71 | _C.POSE_REFINE.MAX_ITERS = 150 72 | _C.POSE_REFINE.RENDER_SIZE = (256, 512) 73 | 74 | 75 | _C.TRAIN = CN() 76 | 77 | _C.TRAIN.NUM_EPOCHS = 10 78 | _C.TRAIN.BATCH_SIZE = 16 79 | _C.TRAIN.TEST_EVERY = 1 80 | _C.TRAIN.LOSS = "triplet" 81 | _C.TRAIN.GT_LAYOUT_LOSS = False 82 | 83 | _C.TRAIN.INITIAL_LR = 0.01 84 | _C.TRAIN.MOMENTUM = 0.9 85 | _C.TRAIN.WEIGHT_DECAY = 1e-4 86 | _C.TRAIN.LR_MILESTONES = [5, 8] 87 | _C.TRAIN.LR_GAMMA = 0.1 88 | 89 | _C.TRAIN.NEAR_MIN_DIST = 0.0 90 | _C.TRAIN.NEAR_MAX_DIST = 0.5 91 | _C.TRAIN.FAR_MIN_DIST = 2 92 | _C.TRAIN.FAR_MAX_DIST = 10 93 | _C.TRAIN.NUM_NEAR_SAMPLES = 1 94 | _C.TRAIN.NUM_FAR_SAMPLES = 1 95 | 96 | _C.TRAIN.APPEND_GT = False 97 | _C.TRAIN.NO_TRANSFORM = False 98 | 99 | _C.TRAIN.SOURCE_WEIGHTS = "" 100 | _C.TRAIN.COMPUTE_GT_DIST = True 101 | 102 | _C.TRAIN.DISTANCE_LOSS_SCALE = 1.0 103 | _C.TRAIN.LAYOUT_LOSS_SCALE = 1.0 104 | _C.TRAIN.DECODER_LOSS_SCALE = 0.001 105 | 106 | _C.TRAIN.CUT_GRAD = True 107 | _C.TRAIN.SUBSAMPLE_PLAN_X = 1 108 | 109 | 110 | _C.TEST = CN() 111 | 112 | _C.TEST.BATCH_SIZE = 1 113 | _C.TEST.POSE_SAMPLE_STEP = 500 114 | _C.TEST.LAYOUTS_MAX_BATCH = 64 115 | _C.TEST.VOGEL_DISC_REFINE = False 116 | _C.TEST.VOGEL_SAMPLES = 100 117 | _C.TEST.VAL_AS_TEST = False 118 | _C.TEST.LATENT_POSE_OPTIMISATION = False 119 | _C.TEST.DECODE_REFINE = False 120 | _C.TEST.DECODE_USE_GT = False 121 | _C.TEST.METRIC_DUMP = "" 122 | _C.TEST.COMPUTE_GT_DIST = True 123 | _C.TEST.SUBSAMPLE_PLAN_X = 1 124 | 125 | 126 | def get_cfg_defaults(): 127 | """Get a yacs CfgNode object with default values for my_project.""" 128 | # Return a clone so that the defaults will not be altered 129 | # This is for the "local variable" use pattern 130 | return _C.clone() 131 | 
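Note (illustrative): the defaults above are only a base configuration. The YAML files under configs/ and the trailing KEY VALUE pairs shown in the README's train.py commands are merged on top of these defaults via yacs. The snippet below is a minimal sketch of that standard yacs workflow, not a reproduction of the repository's actual train.py; get_cfg_defaults and parse_args are the functions defined in lalaloc/config, while merge_from_file, merge_from_list and freeze are standard yacs CfgNode methods.

```python
# Minimal sketch of how the config defaults are typically combined with a YAML
# file and command-line overrides (assumed workflow, not the repo's train.py).
from lalaloc.config import get_cfg_defaults, parse_args

args = parse_args()                        # -c/--config_file plus trailing KEY VALUE overrides
cfg = get_cfg_defaults()                   # clone of the _C defaults defined above
if args.config_file:
    cfg.merge_from_file(args.config_file)  # e.g. configs/layout_branch.yaml
if args.opts:
    cfg.merge_from_list(args.opts)         # e.g. ["DATASET.PATH", "/data/Structured3D"]
cfg.freeze()                               # make the merged config read-only

print(cfg.MODEL.TYPE, cfg.TRAIN.BATCH_SIZE, cfg.TEST.POSE_SAMPLE_STEP)
```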
-------------------------------------------------------------------------------- /lalaloc/config/parser.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | 4 | def parse_args(): 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument("-c", "--config_file", default="", type=str) 7 | parser.add_argument("-l", "--checkpoint_file", default="", type=str) 8 | parser.add_argument("-t", "--test_ckpt", default="", type=str) 9 | parser.add_argument("-v", "--val", action="store_true") 10 | parser.add_argument( 11 | "opts", 12 | help="Modify config options using the command-line", 13 | default=None, 14 | nargs=argparse.REMAINDER, 15 | ) 16 | 17 | args = parser.parse_args() 18 | return args -------------------------------------------------------------------------------- /lalaloc/data/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ActiveVisionLab/LaLaLoc/be48bc1884722409eeb6766e55fafa8ece521c9c/lalaloc/data/__init__.py -------------------------------------------------------------------------------- /lalaloc/data/dataset.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import warnings 4 | from math import dist 5 | 6 | import numpy as np 7 | import torch 8 | import torchvision.transforms.functional as tvf 9 | from PIL import Image 10 | from pytorch3d.structures import Pointclouds 11 | from torch.utils.data import Dataset 12 | from tqdm import tqdm 13 | 14 | from ..utils.chamfer import chamfer 15 | from ..utils.floorplan import pose_to_pixel_loc, sample_locs 16 | from ..utils.projection import ( 17 | project_depth_to_pc, 18 | project_depth_to_pc_batched, 19 | projects_onto_floor, 20 | ) 21 | from ..utils.render import ( 22 | render_scene, 23 | render_scene_batched, 24 | render_semantic_batched, 25 | render_semantics, 26 | ) 27 | from .load import ( 28 | create_floorplan_from_annos, 29 | load_scene_annos, 30 | prepare_geometry_from_annos, 31 | ) 32 | from .split import scenes_split 33 | from .transform import build_transform 34 | 35 | 36 | def sample_xy_displacement(max_dist=1, min_dist=0): 37 | radius = random.uniform(min_dist, max_dist) 38 | angle = random.uniform(0, 2 * np.pi) 39 | x = radius * np.sin(angle) 40 | y = radius * np.cos(angle) 41 | return np.array([x, y, 0]) * 1000 42 | 43 | 44 | def load_scene_data( 45 | scene_ids, root_path, for_visualisation=False, pix_per_mm=0.025, min_factor=32 46 | ): 47 | scenes = [] 48 | for scene_id in tqdm(scene_ids): 49 | annos = load_scene_annos(root_path, scene_id) 50 | floorplan, plan_params = create_floorplan_from_annos( 51 | annos, scene_id, pix_per_mm, min_factor 52 | ) 53 | scene_geometry, floor_planes, limits = prepare_geometry_from_annos( 54 | annos, for_visualisation=for_visualisation 55 | ) 56 | scene_path = os.path.join(root_path, f"scene_{scene_id:05d}", "2D_rendering") 57 | 58 | scene_rooms = [] 59 | for room_id in np.sort(os.listdir(scene_path)): 60 | room_path = os.path.join(scene_path, room_id, "panorama") 61 | panorama_path = os.path.join(room_path, "{}", "rgb_{}light.png") 62 | pose = np.loadtxt(os.path.join(room_path, "camera_xyz.txt")) 63 | 64 | scene_rooms.append( 65 | { 66 | "id": room_id, 67 | "path": room_path, 68 | "panorama": panorama_path, 69 | "pose": pose, 70 | } 71 | ) 72 | scenes.append( 73 | { 74 | "id": scene_id, 75 | "geometry": scene_geometry, 76 | "rooms": scene_rooms, 77 | "floor_planes": floor_planes, 78 | 
"limits": limits, 79 | "floorplan": floorplan, 80 | "floorplan_params": plan_params, 81 | } 82 | ) 83 | return scenes 84 | 85 | 86 | class Structured3DPlans(Dataset): 87 | def __init__( 88 | self, config, split="train", visualise=False, 89 | ): 90 | scene_ids = scenes_split(split) 91 | dataset_path = config.DATASET.PATH 92 | self.scenes = load_scene_data( 93 | scene_ids, 94 | dataset_path, 95 | visualise, 96 | config.DATASET.PIX_PER_MM, 97 | config.DATASET.FLOORPLAN_DIVISIBLE_BY, 98 | ) 99 | self.is_train = split == "train" 100 | self.visualise = visualise 101 | self.transform = build_transform(config, self.is_train) 102 | self.layout_transform = build_transform(config, self.is_train, is_layout=True) 103 | self.augment = config.DATASET.AUGMENT_LAYOUTS 104 | 105 | self.precomputed = None 106 | 107 | if self.is_train: 108 | furniture_levels = config.DATASET.TRAIN_FURNITURE 109 | lighting_levels = config.DATASET.TRAIN_LIGHTING 110 | else: 111 | furniture_levels = config.DATASET.TEST_FURNITURE 112 | lighting_levels = config.DATASET.TEST_LIGHTING 113 | self.furniture_levels = furniture_levels 114 | self.lighting_levels = lighting_levels 115 | self.compute_gt_dist = config.TRAIN.COMPUTE_GT_DIST 116 | self.compute_gt_dist_test = config.TEST.COMPUTE_GT_DIST 117 | self.config = config 118 | 119 | def __len__(self): 120 | return len(self.scenes) 121 | 122 | def __getitem__(self, idx): 123 | data = self.scenes[idx] 124 | geometry = data["geometry"] 125 | rooms = data["rooms"] 126 | floor = data["floor_planes"] 127 | limits = data["limits"] 128 | floorplan = data["floorplan"] 129 | floorplan_params = data["floorplan_params"] 130 | 131 | if self.is_train: 132 | # randomly select a room from the scene 133 | room = random.choice(rooms) 134 | # sample and process the +ve and -ve traning examples 135 | sampled_poses_and_rooms = self._sample_train_poses( 136 | room["pose"], floor, limits, scene_id=data["id"] 137 | ) 138 | sampled_layouts = render_scene_batched( 139 | self.config, geometry, sampled_poses_and_rooms 140 | ) 141 | sampled_pointclouds = project_depth_to_pc_batched( 142 | self.config, sampled_layouts 143 | ) 144 | room_data = self._process_room( 145 | data["id"], room, geometry, floor, limits, sampled_pointclouds, 146 | ) 147 | panorama = room_data["image"] 148 | pano_pose = room_data["pose"] 149 | pano_depths = room_data["layout"] 150 | pano_layouts = self.layout_transform(room_data["layout"]) 151 | pano_room_idx = [room_data["room_idx"]] 152 | # in some scenarios we do not need the gt distances 153 | if self.compute_gt_dist: 154 | distances = room_data["gt_distances"] 155 | else: 156 | distances = [] 157 | # semantics are only used for visualation therefore not needed here 158 | sampled_semantics = [np.empty((0, 0))] 159 | pano_semantics = [np.empty((0, 0))] 160 | else: 161 | sampled_poses_and_rooms = self._sample_test_poses( 162 | limits, 163 | rooms[0]["pose"][-1], 164 | floor, 165 | step=self.config.TEST.POSE_SAMPLE_STEP, 166 | ) 167 | # if the grid size is set to be very large sometimes there are no test poses 168 | if sampled_poses_and_rooms: 169 | sampled_layouts = render_scene_batched( 170 | self.config, geometry, sampled_poses_and_rooms 171 | ) 172 | sampled_semantics = render_semantic_batched( 173 | self.config, geometry, sampled_poses_and_rooms 174 | ) 175 | sampled_pointclouds = project_depth_to_pc_batched( 176 | self.config, sampled_layouts 177 | ) 178 | else: 179 | warnings.warn( 180 | "The grid size is set too large for the scene leading to there being no valid test poses." 
181 | ) 182 | sampled_layouts = None 183 | sampled_semantics = None 184 | sampled_pointclouds = [] 185 | 186 | panoramas = [] 187 | pano_poses = [] 188 | pano_layouts = [] 189 | pano_room_idx = [] 190 | pano_semantics = [] 191 | distances_all = [] 192 | for room in rooms: 193 | room_data = self._process_room( 194 | data["id"], room, geometry, floor, limits, sampled_pointclouds, 195 | ) 196 | panoramas.append(room_data["image"]) 197 | pano_poses.append(room_data["pose"]) 198 | pano_layouts.append(room_data["layout"]) 199 | pano_room_idx.append(room_data["room_idx"]) 200 | pano_semantics.append(room_data["semantics"]) 201 | distances_all.append(room_data["gt_distances"]) 202 | panorama = torch.stack(panoramas) 203 | pano_pose = np.stack(pano_poses) 204 | pano_room_idx = torch.Tensor(pano_room_idx) 205 | pano_depths = np.stack(pano_layouts) 206 | pano_layouts = torch.stack( 207 | [self.layout_transform(l) for l in pano_layouts], dim=0 208 | ) 209 | if self.compute_gt_dist_test: 210 | distances = torch.stack(distances_all) 211 | else: 212 | distances = [] 213 | pano_semantics = np.stack(pano_semantics) 214 | 215 | sampled_depths = np.stack(sampled_layouts) 216 | sampled_semanitcs = np.stack(sampled_semantics) 217 | sampled_layouts = torch.stack( 218 | [self.layout_transform(l) for l in sampled_layouts], dim=0 219 | ) 220 | 221 | pano_pose = torch.Tensor(pano_pose) 222 | sampled_poses = torch.Tensor([p for p, _ in sampled_poses_and_rooms]) 223 | sampled_room_idxs = torch.Tensor([r for _, r in sampled_poses_and_rooms]) 224 | 225 | if self.is_train: 226 | # geometry doesn't support batching but isn't used in training 227 | geometry = [] 228 | floor = [] 229 | 230 | floorplan = tvf.to_tensor(floorplan) 231 | 232 | return { 233 | "panorama": panorama, 234 | "pano_layout": pano_layouts, 235 | "pano_pose": pano_pose, 236 | "pano_room_idx": pano_room_idx, 237 | "pano_depths": pano_depths, 238 | "pano_semantics": pano_semantics, 239 | "sampled_layouts": sampled_layouts, 240 | "sampled_poses": sampled_poses, 241 | "sampled_room_idxs": sampled_room_idxs, 242 | "sampled_depths": sampled_depths, 243 | "sampled_semantics": sampled_semanitcs, 244 | "distances": distances, 245 | "geometry": geometry, 246 | "floor": floor, 247 | "floorplan": floorplan, 248 | "floorplan_params": floorplan_params, 249 | } 250 | 251 | def _process_room( 252 | self, scene_id, room, geometry, floor, limits, pointclouds_layouts 253 | ): 254 | pose_pano, layout_pano, semantics_pano, room_idx = self._get_room_data( 255 | room, geometry, floor, scene_id 256 | ) 257 | pointcloud_pano = torch.Tensor(project_depth_to_pc(self.config, layout_pano)) 258 | pointclouds_pano = Pointclouds([pointcloud_pano,] * len(pointclouds_layouts)) 259 | 260 | if (self.is_train and self.compute_gt_dist) or ( 261 | not self.is_train and self.compute_gt_dist_test 262 | ): 263 | distances = chamfer(pointclouds_pano, pointclouds_layouts) 264 | else: 265 | distances = [] 266 | 267 | furniture = random.choice(self.furniture_levels) 268 | lighting = random.choice(self.lighting_levels) 269 | panorama_path = room["panorama"].format(furniture, lighting) 270 | # sometimes the panorama image can be get corrupted 271 | # if this happens, reextract the relevant zip file 272 | try: 273 | panorama = Image.open(panorama_path).convert("RGB") 274 | except Exception as e: 275 | print(panorama_path) 276 | print(e) 277 | 278 | panorama = self.transform(panorama) 279 | 280 | return { 281 | "image": panorama, 282 | "pose": pose_pano, 283 | "layout": layout_pano, 284 | "semantics": 
semantics_pano, 285 | "gt_distances": distances, 286 | "room_idx": room_idx, 287 | } 288 | 289 | def _sample_train_poses( 290 | self, pose_panorama, floor_geometry, limits, offset=500, scene_id=None 291 | ): 292 | poses = [] 293 | for _ in range(self.config.TRAIN.NUM_NEAR_SAMPLES): 294 | near_room_idx = -1 295 | num_attempts = 0 296 | while near_room_idx < 0: 297 | if num_attempts > 100: 298 | pose_near = pose_panorama 299 | else: 300 | pose_near = pose_panorama + sample_xy_displacement( 301 | max_dist=self.config.TRAIN.NEAR_MAX_DIST, 302 | min_dist=self.config.TRAIN.NEAR_MIN_DIST, 303 | ) 304 | near_room_idx = projects_onto_floor(pose_near, floor_geometry) 305 | num_attempts += 1 306 | poses.append((pose_near, near_room_idx)) 307 | 308 | for _ in range(self.config.TRAIN.NUM_FAR_SAMPLES): 309 | far_room_idx = -1 310 | num_attempts = 0 311 | while far_room_idx < 0: 312 | x_location = np.random.uniform( 313 | limits[0] + (offset / 2), limits[1] - (offset / 2) 314 | ) 315 | y_location = np.random.uniform( 316 | limits[2] + (offset / 2), limits[3] - (offset / 2) 317 | ) 318 | pose_far = np.array([x_location, y_location, pose_panorama[-1]]) 319 | 320 | if ( 321 | np.linalg.norm(pose_panorama - pose_far) 322 | < self.config.TRAIN.FAR_MIN_DIST 323 | ): 324 | continue 325 | far_room_idx = projects_onto_floor(pose_far, floor_geometry) 326 | if num_attempts >= 100: 327 | pose_far = pose_near 328 | far_room_idx = projects_onto_floor(pose_far, floor_geometry) 329 | num_attempts += 1 330 | poses.append((pose_far, far_room_idx)) 331 | 332 | if self.config.TRAIN.APPEND_GT: 333 | gt_room_idx = projects_onto_floor(pose_panorama, floor_geometry) 334 | poses.append((pose_panorama, gt_room_idx)) 335 | return poses 336 | 337 | def _sample_test_poses(self, limits, z, floor_geometry, step=1000): 338 | x_locations = np.arange(limits[0] + (step / 2), limits[1] + (step / 2), step) 339 | y_locations = np.arange(limits[2] + (step / 2), limits[3] + (step / 2), step) 340 | 341 | poses = np.meshgrid(x_locations, y_locations) 342 | poses = np.stack(poses, axis=2).reshape(-1, 2) 343 | poses = np.concatenate([poses, np.full((poses.shape[0], 1), z)], axis=1) 344 | 345 | room_idxs = [projects_onto_floor(pose, floor_geometry) for pose in poses] 346 | pose_grid = [(p, i) for p, i in zip(poses, room_idxs) if i >= 0] 347 | return pose_grid 348 | 349 | def _get_room_data(self, room, geometry, floor, scene_id): 350 | pose_pano = room["pose"] 351 | room_idx = projects_onto_floor(pose_pano, floor) 352 | if room_idx < 0: 353 | warnings.warn("pose outside room: {}".format(scene_id)) 354 | layout_pano = render_scene(self.config, geometry[room_idx], pose_pano) 355 | semantics_pano = render_semantics(self.config, geometry[room_idx], pose_pano) 356 | return pose_pano, layout_pano, semantics_pano, room_idx 357 | 358 | 359 | class TargetEmbeddingDataset(Structured3DPlans): 360 | def __init__(self, encoder, config, split="train", visualise=False, device="cpu:0"): 361 | super().__init__(config, split=split, visualise=visualise) 362 | 363 | with torch.no_grad(): 364 | encoder.eval() 365 | for scene in tqdm(self.scenes): 366 | floorplan = scene["floorplan"] 367 | floorplan_params = scene["floorplan_params"] 368 | scale = torch.tensor([floorplan_params["scale"]]) 369 | shift = torch.tensor(floorplan_params["shift"]) 370 | plan_height = floorplan_params["h"] 371 | plan_width = floorplan_params["w"] 372 | floorplan = floorplan[:plan_height, :plan_width] 373 | 374 | floorplan = tvf.to_tensor(floorplan).to(device).unsqueeze(0) 375 | plan_embed = 
encoder(floorplan).cpu() 376 | 377 | subsample_x = self.config.TEST.SUBSAMPLE_PLAN_X 378 | if subsample_x > 1: 379 | plan_embed = plan_embed[:, :, ::subsample_x, ::subsample_x] 380 | floorplan = floorplan[:, :, ::subsample_x, ::subsample_x] 381 | scale = scale / subsample_x 382 | 383 | query_poses = [] 384 | rooms = scene["rooms"] 385 | for room in rooms: 386 | query_poses.append(room["pose"]) 387 | query_poses = torch.Tensor(np.stack(query_poses)) 388 | query_locs = pose_to_pixel_loc(query_poses.unsqueeze(0), scale, shift) 389 | 390 | target_embeddings = sample_locs( 391 | plan_embed, query_locs, normalise=self.config.MODEL.NORMALISE_SAMPLE 392 | ).squeeze(0) 393 | 394 | scene["embeddings"] = target_embeddings 395 | 396 | def __getitem__(self, idx): 397 | data = self.scenes[idx] 398 | rooms = data["rooms"] 399 | embeddings = data["embeddings"] 400 | 401 | room_idx = random.randrange(0, len(rooms)) 402 | room = rooms[room_idx] 403 | 404 | furniture = random.choice(self.furniture_levels) 405 | lighting = random.choice(self.lighting_levels) 406 | panorama_path = room["panorama"].format(furniture, lighting) 407 | # sometimes the panorama image can be get corrupted 408 | # if this happens, reextract the relevant zip file 409 | try: 410 | panorama = Image.open(panorama_path).convert("RGB") 411 | except Exception as e: 412 | print(panorama_path) 413 | print(e) 414 | 415 | panorama = self.transform(panorama) 416 | embedding = embeddings[room_idx] 417 | return { 418 | "panorama": panorama, 419 | "target_embedding": embedding, 420 | } 421 | -------------------------------------------------------------------------------- /lalaloc/data/load.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parts of this code are modified from: https://github.com/bertjiazheng/Structured3D 3 | Copyright (c) 2019 Structured3D Group 4 | """ 5 | import json 6 | import math 7 | import os 8 | 9 | import cv2 10 | import numpy as np 11 | import torch 12 | from pytorch3d.structures import Meshes, join_meshes_as_scene 13 | 14 | from ..utils.polygons import clip_polygon, convert_lines_to_vertices 15 | 16 | 17 | def round_up_to_multiple(f, factor=2): 18 | return math.ceil(f / float(factor)) * factor 19 | 20 | 21 | def load_scene_annos(root, scene_id): 22 | with open( 23 | os.path.join(root, f"scene_{scene_id:05d}", "annotation_3d.json") 24 | ) as file: 25 | annos = json.load(file) 26 | return annos 27 | 28 | 29 | def prepare_geometry_from_annos(annos, for_visualisation=False): 30 | junctions = [item["coordinate"] for item in annos["junctions"]] 31 | 32 | # extract hole vertices 33 | lines_holes = [] 34 | for semantic in annos["semantics"]: 35 | if semantic["type"] in ["window", "door"]: 36 | for planeID in semantic["planeID"]: 37 | lines_holes.extend( 38 | np.where(np.array(annos["planeLineMatrix"][planeID]))[0].tolist() 39 | ) 40 | lines_holes = np.unique(lines_holes) 41 | _, vertices_holes = np.where(np.array(annos["lineJunctionMatrix"])[lines_holes]) 42 | vertices_holes = np.unique(vertices_holes) 43 | 44 | # load polygons 45 | rooms = [] 46 | floor_verts = [] 47 | floor_faces = [] 48 | min_x = 1e15 49 | max_x = -1e15 50 | min_y = 1e15 51 | max_y = -1e15 52 | for semantic in annos["semantics"]: 53 | if semantic["type"] in ["outwall", "door", "window"]: 54 | continue 55 | polygons = [] 56 | for planeID in semantic["planeID"]: 57 | plane_anno = annos["planes"][planeID] 58 | lineIDs = np.where(np.array(annos["planeLineMatrix"][planeID]))[0].tolist() 59 | junction_pairs = [ 60 | 
np.where(np.array(annos["lineJunctionMatrix"][lineID]))[0].tolist() 61 | for lineID in lineIDs 62 | ] 63 | polygon = convert_lines_to_vertices(junction_pairs) 64 | vertices, faces = clip_polygon( 65 | polygon, vertices_holes, junctions, plane_anno, clip_holes=False 66 | ) 67 | polygons.append( 68 | [ 69 | vertices, 70 | faces, 71 | planeID, 72 | plane_anno["normal"], 73 | plane_anno["type"], 74 | semantic["type"], 75 | ] 76 | ) 77 | 78 | room_verts = [] 79 | room_faces = [] 80 | for vertices, faces, planeID, normal, plane_type, semantic_type in polygons: 81 | vis_verts = np.array(vertices) 82 | vis_faces = np.array(faces) 83 | if len(vis_faces) == 0: 84 | continue 85 | 86 | room_verts.append(torch.Tensor(vertices)) 87 | room_faces.append(torch.Tensor(faces)) 88 | 89 | min_x = min(min_x, np.min(vis_verts[:, 0])) 90 | max_x = max(max_x, np.max(vis_verts[:, 0])) 91 | min_y = min(min_y, np.min(vis_verts[:, 1])) 92 | max_y = max(max_y, np.max(vis_verts[:, 1])) 93 | 94 | if plane_type == "floor": 95 | floor_verts.append(torch.Tensor(vertices)) 96 | floor_faces.append(torch.Tensor(faces)) 97 | if not for_visualisation: 98 | room = join_meshes_as_scene(Meshes(room_verts, room_faces)) 99 | else: 100 | room = Meshes( 101 | room_verts, room_faces 102 | ) # This provides the correct form for visualisation 103 | rooms.append(room) 104 | floors = Meshes(verts=floor_verts, faces=floor_faces) 105 | limits = (min_x, max_x, min_y, max_y) 106 | return rooms, floors, limits 107 | 108 | 109 | def create_floorplan_from_annos(annos, scene_id, pix_per_mm=0.025, min_factor=32): 110 | # extract the floor in each semantic for floorplan visualization 111 | planes = [] 112 | for semantic in annos["semantics"]: 113 | for planeID in semantic["planeID"]: 114 | if annos["planes"][planeID]["type"] == "floor": 115 | planes.append({"planeID": planeID, "type": semantic["type"]}) 116 | 117 | if semantic["type"] == "outwall": 118 | outerwall_planes = semantic["planeID"] 119 | 120 | # extract hole vertices 121 | lines_holes = [] 122 | for semantic in annos["semantics"]: 123 | if semantic["type"] in ["window", "door"]: 124 | for planeID in semantic["planeID"]: 125 | lines_holes.extend( 126 | np.where(np.array(annos["planeLineMatrix"][planeID]))[0].tolist() 127 | ) 128 | lines_holes = np.unique(lines_holes) 129 | 130 | # junctions on the floor 131 | junctions = np.array([junc["coordinate"] for junc in annos["junctions"]]) 132 | junction_floor = np.where(np.isclose(junctions[:, -1], 0))[0] 133 | 134 | # construct each polygon 135 | polygons = [] 136 | for plane in planes: 137 | lineIDs = np.where(np.array(annos["planeLineMatrix"][plane["planeID"]]))[ 138 | 0 139 | ].tolist() 140 | junction_pairs = [ 141 | np.where(np.array(annos["lineJunctionMatrix"][lineID]))[0].tolist() 142 | for lineID in lineIDs 143 | ] 144 | polygon = convert_lines_to_vertices(junction_pairs) 145 | polygons.append([polygon[0], plane["type"]]) 146 | 147 | outerwall_floor = [] 148 | for planeID in outerwall_planes: 149 | lineIDs = np.where(np.array(annos["planeLineMatrix"][planeID]))[0].tolist() 150 | lineIDs = np.setdiff1d(lineIDs, lines_holes) 151 | junction_pairs = [ 152 | np.where(np.array(annos["lineJunctionMatrix"][lineID]))[0].tolist() 153 | for lineID in lineIDs 154 | ] 155 | for start, end in junction_pairs: 156 | if start in junction_floor and end in junction_floor: 157 | outerwall_floor.append([start, end]) 158 | 159 | outerwall_polygon = convert_lines_to_vertices(outerwall_floor) 160 | polygons.insert(0, [outerwall_polygon[0], "outwall"]) 161 | 162 | 
floorplan, affine_params = plot_floorplan( 163 | polygons, junctions, scene_id, pix_per_mm=pix_per_mm, round_multiple=min_factor 164 | ) 165 | return floorplan, affine_params 166 | 167 | 168 | def plot_floorplan( 169 | polygons, junctions, scene_id, size=512, pix_per_mm=0.025, round_multiple=32 170 | ): 171 | 172 | junctions = junctions[:, :2] 173 | 174 | used_junctions = [] 175 | for polygon, _ in polygons: 176 | used_junctions.append(junctions[np.array(polygon)]) 177 | used_junctions = np.concatenate(used_junctions) 178 | # shift so floorplan fits in unit square 0 and 1 179 | min_x = np.min(used_junctions[:, 0]) 180 | max_x = np.max(used_junctions[:, 0]) 181 | min_y = np.min(used_junctions[:, 1]) 182 | max_y = np.max(used_junctions[:, 1]) 183 | shift = np.array((min_x, min_y)) 184 | 185 | if pix_per_mm < 0: 186 | range = max(max_x - min_x, max_y - min_y) 187 | scale = size / range 188 | floorplan_shape = (size, size, 3) 189 | else: 190 | scale = pix_per_mm 191 | range_x = max_x - min_x 192 | range_y = max_y - min_y 193 | w_ind = round_up_to_multiple(pix_per_mm * range_x, round_multiple) 194 | h_ind = round_up_to_multiple(pix_per_mm * range_y, round_multiple) 195 | w = 1216 196 | h = 960 197 | floorplan_shape = (h, w, 3) 198 | 199 | junctions -= shift 200 | junctions *= scale 201 | 202 | floorplan = np.zeros(floorplan_shape, dtype=np.float32) 203 | for (polygon, poly_type) in polygons: 204 | contours = junctions[np.array(polygon)].astype(np.int32) 205 | if poly_type in ["door", "window", "outwall"]: 206 | cv2.fillPoly(floorplan, pts=[contours], color=(1.0, 1.0, 1.0)) 207 | else: 208 | cv2.fillPoly(floorplan, pts=[contours], color=(0.5, 0.5, 0.5)) 209 | 210 | return floorplan, {"scale": scale, "shift": shift, "w": w_ind, "h": h_ind} 211 | -------------------------------------------------------------------------------- /lalaloc/data/split.py: -------------------------------------------------------------------------------- 1 | def scenes_split(split): 2 | splits = { 3 | "train": ( 4 | list(range(0, 3000)), 5 | [ 6 | 335, 7 | 683, 8 | 1192, 9 | 1753, 10 | 1852, 11 | 2205, 12 | 2209, 13 | 2223, 14 | 2339, 15 | 2357, 16 | 2401, 17 | 2956, 18 | 2309, 19 | 278, 20 | 379, 21 | 1212, 22 | 1840, 23 | 1855, 24 | 2025, 25 | 2110, 26 | 2593, 27 | ], 28 | ), 29 | "val": (list(range(3000, 3250)), [2110, 3086, 3117, 3121, 3239]), 30 | "test": (list(range(3250, 3500)), [3307]), 31 | } 32 | ids, to_remove = splits[split] 33 | ids = [i for i in ids if i not in to_remove] 34 | return ids 35 | -------------------------------------------------------------------------------- /lalaloc/data/transform.py: -------------------------------------------------------------------------------- 1 | import torchvision.transforms as transforms 2 | 3 | 4 | def build_transform(config, is_train, is_layout=False): 5 | # TODO: Data augmentation, flip and rotate the camera 6 | # Needs to be applied to layouts as well 7 | in_size = config.INPUT.LAYOUT_SIZE if is_layout else config.INPUT.IMG_SIZE 8 | 9 | transform = [ 10 | transforms.Resize(in_size), 11 | transforms.ToTensor(), 12 | ] 13 | if is_layout: 14 | transform = [transforms.ToPILImage(),] + transform 15 | elif not config.TRAIN.NO_TRANSFORM: 16 | transform += [ 17 | transforms.Normalize( 18 | mean=config.INPUT.NORMALISE_MEAN, std=config.INPUT.NORMALISE_STD 19 | ), 20 | ] 21 | transform = transforms.Compose(transform) 22 | return transform 23 | -------------------------------------------------------------------------------- /lalaloc/model/__init__.py: 
-------------------------------------------------------------------------------- 1 | from .lalaloc import ImageFromLayout, Layout2LayoutDecode 2 | from .lalaloc_pp import FloorPlanUnetImage, FloorPlanUnetLayout 3 | -------------------------------------------------------------------------------- /lalaloc/model/lalaloc.py: -------------------------------------------------------------------------------- 1 | from logging import warn 2 | import warnings 3 | 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | 8 | from .lalaloc_base import Image2LayoutBase, Layout2LayoutBase 9 | from .modules import LayoutDecoder 10 | from .losses import triplet_loss, bbs_loss 11 | 12 | 13 | class ImageFromLayout(Image2LayoutBase): 14 | def __init__(self, config): 15 | super(ImageFromLayout, self).__init__(config) 16 | self.load_weights_from_l2l(config.TRAIN.SOURCE_WEIGHTS) 17 | 18 | def load_weights_from_l2l(self, ckpt_path): 19 | if not ckpt_path: 20 | warnings.warn("No source for the layout branch weights was specified") 21 | return 22 | # load weights from Layout2Layout model 23 | ckpt_dict = torch.load(ckpt_path) 24 | model_weights = ckpt_dict["state_dict"] 25 | 26 | # load "embedder" weights into "reference_embedder" 27 | load_dict = {} 28 | for k, v in model_weights.items(): 29 | modules = k.split(".") 30 | parent = modules[0] 31 | if parent == "embedder": 32 | child = ".".join(modules[1:]) 33 | load_dict[child] = v 34 | self.reference_embedder.load_state_dict(load_dict) 35 | 36 | # freeze reference_embedder weights 37 | for p in self.reference_embedder.parameters(): 38 | p.requires_grad = False 39 | 40 | def training_step(self, batch, batch_idx): 41 | for m in self.reference_embedder.modules(): 42 | if isinstance(m, nn.BatchNorm2d): 43 | m.eval() 44 | 45 | # compute query and reference embeddings 46 | query_image = batch["panorama"] 47 | query_embed = self.forward(q=query_image) 48 | 49 | reference_layouts = batch["pano_layout"].unsqueeze(1) 50 | reference_embed = self.forward(r=reference_layouts).squeeze(1) 51 | 52 | # perform L2 distance loss 53 | loss = ((query_embed - reference_embed) ** 2).sum(dim=1).sqrt().mean() 54 | 55 | stats_to_log = {"train/l2_loss": loss.item()} 56 | return {"loss": loss, "log": stats_to_log} 57 | 58 | 59 | class Layout2LayoutDecode(Layout2LayoutBase): 60 | def __init__(self, config): 61 | super(Layout2LayoutDecode, self).__init__(config) 62 | self.layout_decoder = LayoutDecoder(config) 63 | 64 | def training_step(self, batch, batch_idx): 65 | query_image = batch[self.query_key] 66 | reference_layouts = batch["sampled_layouts"] 67 | gt_distances = batch["distances"] 68 | 69 | query_embed = self.forward(q=query_image) 70 | reference_embed = self.forward(r=reference_layouts) 71 | 72 | # perform layout2layout layout loss 73 | distances = self.compute_distances(query_embed, reference_embed) 74 | if self.config.TRAIN.LOSS == "triplet": 75 | loss_layout = triplet_loss(distances) 76 | elif self.config.TRAIN.LOSS == "bbs": 77 | gt_distances = gt_distances.float().to(self.device) 78 | loss_layout = bbs_loss(distances, gt_distances) 79 | else: 80 | raise NotImplementedError( 81 | "{} loss type is not currently implemented".format( 82 | self.config.TRAIN.LOSS 83 | ) 84 | ) 85 | 86 | # decode the layout embedding and compute loss 87 | query_decoded = self.layout_decoder(query_embed) 88 | query_target = F.interpolate( 89 | query_image.detach().clone(), self.config.MODEL.DECODER_RESOLUTION 90 | ) 91 | reference_decoded = self.layout_decoder(reference_embed) 92 
| 93 | h, w = reference_layouts.shape[-2:] 94 | reference_targets = F.interpolate( 95 | reference_layouts.view(-1, 1, h, w).detach().clone(), 96 | self.config.MODEL.DECODER_RESOLUTION, 97 | ) 98 | 99 | decoded = torch.cat([query_decoded, reference_decoded], dim=0) 100 | target = torch.cat([query_target, reference_targets], dim=0) 101 | loss_decode = F.l1_loss(decoded, target) 102 | 103 | loss = ( 104 | self.config.TRAIN.DECODER_LOSS_SCALE * loss_decode 105 | + self.config.TRAIN.LAYOUT_LOSS_SCALE * loss_layout 106 | ) 107 | stats_to_log = { 108 | "train/loss": loss.item(), 109 | "train/layout_loss": loss_layout.item(), 110 | "train/decoder_loss": loss_decode.item(), 111 | } 112 | return {"loss": loss, "log": stats_to_log} 113 | -------------------------------------------------------------------------------- /lalaloc/model/lalaloc_base.py: -------------------------------------------------------------------------------- 1 | import pytorch_lightning as pl 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.utils.data import DataLoader 7 | 8 | from lalaloc.utils.render import render_scene_batched 9 | from lalaloc.utils.vogel_disc import sample_vogel_disc 10 | 11 | from ..data.dataset import Structured3DPlans 12 | from ..data.transform import build_transform 13 | from ..utils.eval import recall_at_n 14 | from ..utils.projection import projects_onto_floor 15 | from .modules import LayoutModule, PanoModule 16 | from .pose_optimisation import ( 17 | PoseConvergenceChecker, 18 | init_camera_at_origin, 19 | init_objects_at_pose, 20 | init_optimiser, 21 | render_at_pose, 22 | ) 23 | 24 | 25 | def build_dataloader(config, split): 26 | is_train = split == "train" 27 | 28 | batch_size = config.TRAIN.BATCH_SIZE if is_train else None 29 | num_workers = ( 30 | config.SYSTEM.NUM_WORKERS if is_train or not config.TEST.COMPUTE_GT_DIST else 0 31 | ) 32 | 33 | dataset = Structured3DPlans(config, split) 34 | dataloader = DataLoader( 35 | dataset, batch_size, shuffle=is_train, num_workers=num_workers, 36 | ) 37 | return dataloader 38 | 39 | 40 | class Image2LayoutBase(pl.LightningModule): 41 | def __init__(self, config): 42 | super(Image2LayoutBase, self).__init__() 43 | self.query_embedder = PanoModule(config) 44 | self.reference_embedder = LayoutModule(config) 45 | self.desc_length = config.MODEL.DESC_LENGTH 46 | self.config = config 47 | # The key to access the query data type from the batch dict 48 | self.query_key = "panorama" 49 | 50 | def forward(self, q=None, r=None): 51 | if q is None and r is None: 52 | raise Exception 53 | 54 | if q is not None: 55 | q = self.query_embedder(q) 56 | if r is None: 57 | return q 58 | 59 | if r is not None: 60 | n, m, c, h, w = r.shape 61 | r = r.reshape(n * m, c, h, w) 62 | r = self.reference_embedder(r) 63 | r = r.reshape(n, m, self.desc_length) 64 | if q is None: 65 | return r 66 | 67 | d = self.compute_distances(q, r) 68 | return d 69 | 70 | def compute_distances(self, p, l): 71 | n, m, _ = l.shape 72 | p = p.unsqueeze(1).expand(-1, m, -1) 73 | p = p.reshape(n * m, self.desc_length) 74 | l = l.reshape(-1, self.desc_length) 75 | 76 | d = F.pairwise_distance(p, l) 77 | d = d.reshape(n, m) 78 | return d 79 | 80 | def vogel_refinement(self, query_embeddings, nn_poses, geometry, floor): 81 | transform = build_transform(self.config, False, is_layout=True) 82 | radius = 2 * self.config.TEST.POSE_SAMPLE_STEP 83 | num_samples = self.config.TEST.VOGEL_SAMPLES 84 | refined_poses = [] 85 | for 
query_embedding, nn_pose in zip(query_embeddings, nn_poses): 86 | sampled_poses = sample_vogel_disc(nn_pose, radius, num_samples) 87 | poses_to_render = [] 88 | for pose in sampled_poses: 89 | room_idx = projects_onto_floor(pose, floor) 90 | if room_idx < 0: 91 | continue 92 | poses_to_render.append((pose, room_idx)) 93 | local_layouts = render_scene_batched(self.config, geometry, poses_to_render) 94 | local_poses = [torch.tensor(p[0]) for p in poses_to_render] 95 | 96 | # transform and stack layouts 97 | local_layouts = [transform(l) for l in local_layouts] 98 | local_layouts = torch.stack(local_layouts).to(self.device) 99 | # feed into embedder 100 | local_embeddings = self.forward(r=local_layouts.unsqueeze(0)).detach() 101 | # take min distance between result and query_embedding 102 | # and append its respective pose 103 | distances = torch.norm(local_embeddings - query_embedding, dim=-1) 104 | refined_pose = local_poses[distances.argmin()] 105 | refined_poses.append(refined_pose) 106 | refined_poses = torch.stack(refined_poses) 107 | return refined_poses 108 | 109 | def latent_pose_optimisation(self, query_embeddings, nn_poses, geometry, floor): 110 | # Ensure gradients are enabled and modules are in eval mode 111 | torch.set_grad_enabled(True) 112 | self.eval() 113 | 114 | refined_poses = [] 115 | for query_embedding, nn_pose in zip(query_embeddings, nn_poses): 116 | query_embedding = query_embedding.detach() 117 | query_embedding.requires_grad = False 118 | 119 | # gather room geometry 120 | room_idx = projects_onto_floor(nn_pose, floor) 121 | mesh = geometry[room_idx] 122 | 123 | # centre geometry at pose 124 | nn_pose = nn_pose.to(self.device) 125 | objects, vertices = init_objects_at_pose(nn_pose, mesh, self.device) 126 | camera = init_camera_at_origin(self.config) 127 | 128 | # initialise the refinement translation vector 129 | # note: these represent displacements from the initial pose in metres 130 | pose_xy = torch.zeros((1, 2), requires_grad=True, device=self.device) 131 | pose_z = torch.zeros((1, 1), requires_grad=False, device=self.device) 132 | 133 | # initialise optimisation and stopping metrics 134 | optimiser, scheduler = init_optimiser(self.config, [pose_xy]) 135 | convergence_checker = PoseConvergenceChecker(self.config) 136 | 137 | for j in range(self.config.POSE_REFINE.MAX_ITERS): 138 | optimiser.zero_grad() 139 | layout = render_at_pose( 140 | camera, objects, vertices, torch.cat([pose_xy, pose_z], dim=1) 141 | ) 142 | 143 | h, w = layout.shape 144 | layout = layout.view(1, 1, h, w) 145 | # for faster rendering, sometimes we render the layouts at a smaller resolution than the network takes as input 146 | # therefore, we need to interpolate the layout to the target input size 147 | if self.config.POSE_REFINE.RENDER_SIZE != self.config.INPUT.IMG_SIZE: 148 | layout = F.interpolate(layout, self.config.INPUT.IMG_SIZE) 149 | 150 | layout_embedding = self.forward(r=layout.unsqueeze(0)) 151 | loss = torch.norm(query_embedding - layout_embedding, dim=-1) 152 | loss.backward() 153 | 154 | # check convergence 155 | current_loss = loss.item() 156 | current_pose = torch.cat([pose_xy, pose_z], dim=1).clone().detach() 157 | if convergence_checker.has_converged(current_loss, current_pose): 158 | break 159 | 160 | optimiser.step() 161 | scheduler.step(current_loss) 162 | # pose displacement is optimised in metres, therefore convert it to mm 163 | refined_pose = nn_pose + convergence_checker.best_pose * 1000 164 | refined_poses.append(refined_pose) 165 | refined_poses = 
torch.stack(refined_poses) 166 | refined_poses = refined_poses.detach().cpu().squeeze(1) 167 | torch.set_grad_enabled(False) 168 | return refined_poses 169 | 170 | def configure_optimizers(self): 171 | optimiser = optim.SGD( 172 | filter(lambda p: p.requires_grad, self.parameters()), 173 | lr=self.config.TRAIN.INITIAL_LR, 174 | momentum=self.config.TRAIN.MOMENTUM, 175 | weight_decay=self.config.TRAIN.WEIGHT_DECAY, 176 | ) 177 | scheduler = optim.lr_scheduler.MultiStepLR( 178 | optimiser, self.config.TRAIN.LR_MILESTONES, self.config.TRAIN.LR_GAMMA 179 | ) 180 | return [optimiser], [scheduler] 181 | 182 | def training_step(self, batch, batch_idx): 183 | # Training step should be implemented in child classes 184 | raise NotImplementedError("The training routine is not implemented.") 185 | 186 | def inference_step(self, batch, batch_idx): 187 | query_image = batch[self.query_key] 188 | query_pose = batch["pano_pose"] 189 | reference_layouts = batch["sampled_layouts"] 190 | reference_poses = batch["sampled_poses"] 191 | 192 | gt_distances = batch["distances"] 193 | query_image = query_image 194 | query_pose = query_pose.cpu() 195 | reference_poses = reference_poses.cpu() 196 | gt_distances = gt_distances.cpu() 197 | 198 | # compute the desciptors for the panoramas in each room 199 | query_desc = self.forward(q=query_image).detach() 200 | # compute the descriptors for each of the sampled layouts 201 | # NB: this is split into minibatches since some rooms may be extremely large with many sampled layouts 202 | reference_descs = [] 203 | for layout_minibatch in reference_layouts.split( 204 | self.config.TEST.LAYOUTS_MAX_BATCH 205 | ): 206 | layout_minibatch = layout_minibatch.unsqueeze(0).contiguous().cuda() 207 | reference_descs.append(self.forward(r=layout_minibatch)[0].detach()) 208 | reference_descs = torch.cat(reference_descs) 209 | # compute the distances between each of the room panos and the grid of layouts 210 | n, _ = query_desc.shape 211 | reference_descs = reference_descs.unsqueeze(0).expand(n, -1, -1) 212 | pred_distances = ( 213 | self.compute_distances(query_desc, reference_descs).detach().cpu() 214 | ) 215 | 216 | # gather ranking info for the prediction vs the actual 217 | pose_distances = torch.norm( 218 | (query_pose.unsqueeze(1) - reference_poses.expand(n, -1, -1)), dim=-1 219 | ) 220 | ranking_prediction = torch.argsort(pred_distances, dim=-1)[:, :5] 221 | ranking_layout = torch.argsort(gt_distances, dim=-1)[:, :5] 222 | ranking_pose = torch.argsort(pose_distances, dim=-1)[:, :5] 223 | 224 | retrieval_error = pose_distances.gather( 225 | 1, ranking_prediction[:, 0].unsqueeze(1) 226 | ) 227 | oracle_error = pose_distances.gather(1, ranking_layout[:, 0].unsqueeze(1)) 228 | 229 | nn_poses = reference_poses[ranking_prediction[:, 0]] 230 | 231 | # vogel disc refinement 232 | if self.config.TEST.VOGEL_DISC_REFINE: 233 | refined_poses = self.vogel_refinement( 234 | query_desc, nn_poses, batch["geometry"], batch["floor"], 235 | ) 236 | else: 237 | refined_poses = nn_poses 238 | 239 | # latent pose optimisation 240 | if self.config.TEST.LATENT_POSE_OPTIMISATION: 241 | optimised_poses = self.latent_pose_optimisation( 242 | query_desc, refined_poses, batch["geometry"], batch["floor"], 243 | ) 244 | else: 245 | optimised_poses = refined_poses 246 | 247 | refined_error = torch.norm(query_pose - refined_poses, dim=-1, keepdim=True) 248 | optimised_error = torch.norm(query_pose - optimised_poses, dim=-1, keepdim=True) 249 | 250 | return { 251 | "pred_rank": ranking_prediction, 252 | 
"layout_rank": ranking_layout, 253 | "pose_rank": ranking_pose, 254 | "retrieval_error": retrieval_error, 255 | "optimised_error": optimised_error, 256 | "refined_error": refined_error, 257 | "oracle_error": oracle_error, 258 | "pred_pose": optimised_poses, 259 | } 260 | 261 | def inference_epoch_end(self, outputs, log_key): 262 | predictions = [] 263 | layouts = [] 264 | poses = [] 265 | retrieval_errors = [] 266 | oracle_errors = [] 267 | refined_errors = [] 268 | optimised_errors = [] 269 | pred_poses = [] 270 | 271 | scene_idxs = [] 272 | room_idxs = [] 273 | for i, out in enumerate(outputs): 274 | predictions.extend(out["pred_rank"].unsqueeze(0)) 275 | layouts.extend(out["layout_rank"].unsqueeze(0)) 276 | poses.extend(out["pose_rank"].unsqueeze(0)) 277 | retrieval_errors.extend(out["retrieval_error"].unsqueeze(0)) 278 | oracle_errors.extend(out["oracle_error"].unsqueeze(0)) 279 | refined_errors.extend(out["refined_error"].unsqueeze(0)) 280 | optimised_errors.extend(out["optimised_error"].unsqueeze(0)) 281 | pred_poses.extend(out["pred_pose"].unsqueeze(0)) 282 | 283 | num_rooms = len(out["retrieval_error"]) 284 | scene_idxs.extend([i] * num_rooms) 285 | room_idxs.extend(list(range(num_rooms))) 286 | 287 | predictions = torch.cat(predictions) 288 | layouts = torch.cat(layouts) 289 | poses = torch.cat(poses) 290 | retrieval_errors = torch.cat(retrieval_errors) 291 | oracle_errors = torch.cat(oracle_errors) 292 | refined_errors = torch.cat(refined_errors) 293 | optimised_errors = torch.cat(optimised_errors) 294 | pred_poses = torch.cat(pred_poses) 295 | 296 | scene_idxs = torch.tensor(scene_idxs) 297 | room_idxs = torch.tensor(room_idxs) 298 | 299 | layout_r_at_1 = (predictions[:, 0] == layouts[:, 0]).float().mean().item() 300 | pose_r_at_1 = (predictions[:, 0] == poses[:, 0]).float().mean().item() 301 | oracle_r_at_1 = (layouts[:, 0] == poses[:, 0]).float().mean().item() 302 | 303 | layout_r_at_5 = recall_at_n(5, predictions, layouts).item() 304 | pose_r_at_5 = recall_at_n(5, predictions, poses).item() 305 | oracle_r_at_5 = recall_at_n(5, layouts, poses).item() 306 | 307 | median_retrieval_error = torch.median(retrieval_errors).item() 308 | median_refined_error = torch.median(refined_errors).item() 309 | median_optimised_error = torch.median(optimised_errors).item() 310 | median_oracle_error = torch.median(oracle_errors).item() 311 | 312 | threshold_1cm = (optimised_errors < 10).float().mean().item() 313 | threshold_5cm = (optimised_errors < 50).float().mean().item() 314 | threshold_10cm = (optimised_errors < 100).float().mean().item() 315 | threshold_100cm = (optimised_errors < 1000).float().mean().item() 316 | 317 | if self.config.TEST.METRIC_DUMP: 318 | data = { 319 | "scene_idxs": scene_idxs, 320 | "room_idxs": room_idxs, 321 | "oracle": oracle_errors.cpu(), 322 | "refinement": refined_errors.cpu(), 323 | "optimisation": optimised_errors.cpu(), 324 | "retrieval": retrieval_errors.cpu(), 325 | "pred_poses": pred_poses, 326 | } 327 | torch.save(data, self.config.TEST.METRIC_DUMP) 328 | 329 | stats_to_log = { 330 | "{}/layout_r_at_1".format(log_key): layout_r_at_1, 331 | "{}/pose_r_at_1".format(log_key): pose_r_at_1, 332 | "{}/layout_r_at_5".format(log_key): layout_r_at_5, 333 | "{}/pose_r_at_5".format(log_key): pose_r_at_5, 334 | "{}/oracle_r_at_1".format(log_key): oracle_r_at_1, 335 | "{}/oracle_r_at_5".format(log_key): oracle_r_at_5, 336 | "{}/median_retrieval_error".format(log_key): median_retrieval_error, 337 | "{}/median_refined_error".format(log_key): median_refined_error, 338 | 
"{}/median_optimised_error".format(log_key): median_optimised_error, 339 | "{}/median_oracle_error".format(log_key): median_oracle_error, 340 | "{}/threshold_1cm".format(log_key): threshold_1cm, 341 | "{}/threshold_5cm".format(log_key): threshold_5cm, 342 | "{}/threshold_10cm".format(log_key): threshold_10cm, 343 | "{}/threshold_100cm".format(log_key): threshold_100cm, 344 | } 345 | return {"test_loss": 1 - layout_r_at_1, "log": stats_to_log} 346 | 347 | def train_dataloader(self): 348 | return build_dataloader(self.config, "train") 349 | 350 | def val_dataloader(self): 351 | return build_dataloader(self.config, "val") 352 | 353 | def test_dataloader(self): 354 | # Convenient to make "test" actually the validation set so you can recheck val acc at any point 355 | if self.config.TEST.VAL_AS_TEST: 356 | return build_dataloader(self.config, "val") 357 | return build_dataloader(self.config, "test") 358 | 359 | def validation_step(self, batch, batch_idx): 360 | return self.inference_step(batch, batch_idx) 361 | 362 | def test_step(self, batch, batch_idx): 363 | return self.inference_step(batch, batch_idx) 364 | 365 | def validation_epoch_end(self, outputs): 366 | return self.inference_epoch_end(outputs, "val") 367 | 368 | def test_epoch_end(self, outputs): 369 | return self.inference_epoch_end(outputs, "test") 370 | 371 | 372 | class Layout2LayoutBase(Image2LayoutBase): 373 | def __init__(self, config): 374 | super(Layout2LayoutBase, self).__init__(config) 375 | self.embedder = LayoutModule(config) 376 | self.reference_embedder = None 377 | self.query_embedder = None 378 | self.desc_length = config.MODEL.DESC_LENGTH 379 | self.config = config 380 | self.query_key = "pano_layout" 381 | 382 | def forward(self, q=None, r=None): 383 | if q is None and r is None: 384 | raise Exception 385 | 386 | if q is not None: 387 | q = self.embedder(q) 388 | if r is None: 389 | return q 390 | 391 | if r is not None: 392 | n, m, c, h, w = r.shape 393 | r = r.reshape(n * m, c, h, w) 394 | r = self.embedder(r) 395 | r = r.reshape(n, m, self.desc_length) 396 | if q is None: 397 | return r 398 | 399 | d = self.compute_distances(q, r) 400 | return d 401 | -------------------------------------------------------------------------------- /lalaloc/model/lalaloc_pp.py: -------------------------------------------------------------------------------- 1 | import warnings 2 | 3 | import numpy as np 4 | import torch 5 | import torch.nn.functional as F 6 | from sklearn.decomposition import PCA 7 | from torch.utils.data import DataLoader 8 | 9 | from ..data.dataset import TargetEmbeddingDataset 10 | from ..utils.floorplan import pose_to_pixel_loc, sample_locs 11 | from .lalaloc_pp_base import FloorPlanUnetBase 12 | from .losses import bbs_loss 13 | from .modules import LayoutDecoder 14 | from .unet import UNet 15 | 16 | ROOM_VALUE = 0.5 17 | 18 | 19 | def visualise_features(features, valid_mask=None): 20 | features = features.cpu() 21 | if valid_mask is not None: 22 | valid_mask = valid_mask.cpu() 23 | for feature_map in features: 24 | h, w = feature_map.shape[-2:] 25 | feature_map = feature_map.view(-1, h * w).transpose(0, 1).numpy() 26 | pca = PCA(n_components=3) 27 | feature_map_pca = pca.fit_transform(feature_map) 28 | feature_map_pca = feature_map_pca.reshape(h, w, 3) 29 | 30 | shift = np.min(feature_map_pca) 31 | feature_map_pca -= shift 32 | scale = np.max(feature_map_pca) 33 | feature_map_pca /= scale 34 | 35 | if valid_mask is not None: 36 | invalid_mask = ~valid_mask 37 | feature_map_pca[invalid_mask] = 1.0 38 | 39 | 
return feature_map_pca 40 | 41 | 42 | class FloorPlanUnetLayout(FloorPlanUnetBase): 43 | def __init__(self, config): 44 | super(FloorPlanUnetLayout, self).__init__(config) 45 | self.layout_decoder = LayoutDecoder(config) 46 | 47 | def training_step(self, batch, batch_idx): 48 | plans = batch["floorplan"] 49 | plan_params = batch["floorplan_params"] 50 | plan_scale = plan_params["scale"] 51 | plan_shift = plan_params["shift"] 52 | plan_heights = plan_params["h"] 53 | plan_widths = plan_params["w"] 54 | query_layouts = batch["pano_layout"] 55 | query_pose = batch["pano_pose"] 56 | reference_layouts = batch["sampled_layouts"] 57 | reference_poses = batch["sampled_poses"] 58 | 59 | query_locs = pose_to_pixel_loc(query_pose.unsqueeze(1), plan_scale, plan_shift) 60 | reference_locs = pose_to_pixel_loc(reference_poses, plan_scale, plan_shift) 61 | 62 | # embed floor plan and sample locations 63 | query_embed = [] 64 | reference_embed = [] 65 | for plan, query_loc, reference_loc, h, w in zip( 66 | plans, query_locs, reference_locs, plan_heights, plan_widths 67 | ): 68 | plan_embed = self.floorplan_encoder(plan[:, :h, :w].unsqueeze(0)) 69 | qry_embed = sample_locs( 70 | plan_embed, 71 | query_loc.unsqueeze(0), 72 | normalise=self.config.MODEL.NORMALISE_SAMPLE, 73 | ) 74 | ref_embed = sample_locs( 75 | plan_embed, 76 | reference_loc.unsqueeze(0), 77 | normalise=self.config.MODEL.NORMALISE_SAMPLE, 78 | ) 79 | query_embed.append(qry_embed) 80 | reference_embed.append(ref_embed) 81 | query_embed = torch.cat(query_embed).squeeze(1) 82 | reference_embed = torch.cat(reference_embed) 83 | 84 | # decode the layout embeddings for both queries and reference 85 | query_decoded = self.layout_decoder(query_embed) 86 | query_target = F.interpolate( 87 | query_layouts.detach().clone(), self.config.MODEL.DECODER_RESOLUTION 88 | ) 89 | reference_decoded = self.layout_decoder(reference_embed) 90 | h, w = reference_layouts.shape[-2:] 91 | reference_targets = F.interpolate( 92 | reference_layouts.view(-1, 1, h, w).detach().clone(), 93 | self.config.MODEL.DECODER_RESOLUTION, 94 | ) 95 | decoded = torch.cat([query_decoded, reference_decoded], dim=0) 96 | 97 | # compute decoding loss 98 | target = torch.cat([query_target, reference_targets], dim=0) 99 | loss_decode = F.l1_loss(decoded, target) 100 | loss = self.config.TRAIN.DECODER_LOSS_SCALE * loss_decode 101 | stats_to_log = {"train/decoder_loss": loss_decode.item()} 102 | 103 | # compute bbs loss if specified 104 | if self.config.TRAIN.LOSS == "decoder_plus_bbs": 105 | gt_distances = batch["distances"] 106 | gt_distances = gt_distances.float().to(self.device) 107 | distances = self.compute_distances(query_embed, reference_embed) 108 | loss_layout = bbs_loss(distances, gt_distances) 109 | stats_to_log["train/layout_loss"] = loss_layout.item() 110 | loss = loss + self.config.TRAIN.LAYOUT_LOSS_SCALE * loss_layout 111 | 112 | stats_to_log["train/loss"] = loss.item() 113 | return {"loss": loss, "log": stats_to_log} 114 | 115 | def inference_step(self, batch, batch_idx): 116 | plan_params = batch["floorplan_params"] 117 | plan_height = plan_params["h"] 118 | plan_width = plan_params["w"] 119 | plan = batch["floorplan"][:, :plan_height, :plan_width].unsqueeze(0) 120 | plan_scale = torch.Tensor([plan_params["scale"]]).to(self.device) 121 | plan_shift = plan_params["shift"] 122 | 123 | plan_embed = self.floorplan_encoder(plan).detach() 124 | 125 | # plot latent floor plan 126 | valid_loc_mask = plan[0, 0] == ROOM_VALUE 127 | vis_features = visualise_features(plan_embed, 
valid_loc_mask) 128 | self.logger.experiment.add_image( 129 | f"unet_feats_{batch_idx}", 130 | vis_features, 131 | self.current_epoch, 132 | dataformats="HWC", 133 | ) 134 | 135 | # sample embedding at query location 136 | query_pose = batch["pano_pose"] 137 | query_z = query_pose[0, -1] 138 | query_loc = pose_to_pixel_loc( 139 | query_pose.unsqueeze(0).clone(), plan_scale, plan_shift 140 | ) 141 | query_embed = sample_locs(plan_embed, query_loc).squeeze(0) 142 | 143 | # legacy sampling of sparse grid to emulate LaLaLoc 144 | reference_poses = batch["sampled_poses"] 145 | reference_locs = pose_to_pixel_loc( 146 | reference_poses.unsqueeze(0).clone(), plan_scale, plan_shift 147 | ) 148 | reference_embed = sample_locs(plan_embed, reference_locs).squeeze(0) 149 | n, _ = query_embed.shape 150 | reference_embed = reference_embed.unsqueeze(0).expand(n, -1, -1) 151 | 152 | distances = self.compute_distances(query_embed, reference_embed).detach().cpu() 153 | gt_distances = batch["distances"].cpu() 154 | pose_distances = torch.norm( 155 | (query_pose.unsqueeze(1) - reference_poses.expand(n, -1, -1)), dim=-1 156 | ).cpu() 157 | 158 | # gather ranking info for the prediction vs the actual 159 | ranking_prediction = torch.argsort(distances, dim=-1)[:, :5] 160 | ranking_layout = torch.argsort(gt_distances, dim=-1)[:, :5] 161 | ranking_pose = torch.argsort(pose_distances, dim=-1)[:, :5] 162 | 163 | retrieval_error = pose_distances.gather( 164 | 1, ranking_prediction[:, 0].unsqueeze(1) 165 | ) 166 | oracle_error = pose_distances.gather(1, ranking_layout[:, 0].unsqueeze(1)) 167 | 168 | # retrieve from dense LaLaLoc++ prediction 169 | refined_poses = optimised_poses = self.predict_pose_dense( 170 | query_embed, plan_embed, plan_scale, plan_shift, query_z, valid_loc_mask 171 | ).to(self.device) 172 | 173 | refined_error = torch.norm(query_pose - refined_poses, dim=-1, keepdim=True) 174 | optimised_error = torch.norm(query_pose - optimised_poses, dim=-1, keepdim=True) 175 | 176 | return { 177 | "pred_rank": ranking_prediction, 178 | "layout_rank": ranking_layout, 179 | "pose_rank": ranking_pose, 180 | "retrieval_error": retrieval_error, 181 | "optimised_error": optimised_error, 182 | "refined_error": refined_error, 183 | "oracle_error": oracle_error, 184 | "pred_pose": optimised_poses, 185 | } 186 | 187 | 188 | class FloorPlanUnetImage(FloorPlanUnetBase): 189 | def __init__(self, config): 190 | super(FloorPlanUnetImage, self).__init__(config) 191 | self.load_weights_from_plan_only(config.TRAIN.SOURCE_WEIGHTS) 192 | for p in self.floorplan_encoder.parameters(): 193 | p.requires_grad = False 194 | 195 | def load_weights_from_plan_only(self, ckpt_path): 196 | if not ckpt_path: 197 | warnings.warn("No source for the layout branch weights was specified") 198 | return 199 | print("Loading Floor Plan Encoder from {}".format(ckpt_path)) 200 | # load weights from plan-branch-only model 201 | ckpt_dict = torch.load(ckpt_path) 202 | model_weights = ckpt_dict["state_dict"] 203 | 204 | # load "embedder" weights into "reference_embedder" 205 | load_dict = {} 206 | for k, v in model_weights.items(): 207 | modules = k.split(".") 208 | parent = modules[0] 209 | if parent == "floorplan_encoder": 210 | child = ".".join(modules[1:]) 211 | load_dict[child] = v 212 | self.floorplan_encoder.load_state_dict(load_dict) 213 | 214 | def train_dataloader(self): 215 | dataset = TargetEmbeddingDataset( 216 | self.floorplan_encoder, self.config, device=self.device 217 | ) 218 | batch_size = self.config.TRAIN.BATCH_SIZE 219 | num_workers = 
self.config.SYSTEM.NUM_WORKERS 220 | dataloader = DataLoader( 221 | dataset, batch_size, shuffle=True, num_workers=num_workers, 222 | ) 223 | return dataloader 224 | 225 | def training_step(self, batch, batch_idx): 226 | query_image = batch["panorama"] 227 | target_embed = batch["target_embedding"] 228 | 229 | query_embed = self.forward(q=query_image) 230 | 231 | loss = ((target_embed - query_embed) ** 2).sum(dim=1).sqrt().mean() 232 | stats_to_log = {"train/loss": loss.item()} 233 | return {"loss": loss, "log": stats_to_log} 234 | 235 | def inference_step(self, batch, batch_idx): 236 | plan_params = batch["floorplan_params"] 237 | plan_height = plan_params["h"] 238 | plan_width = plan_params["w"] 239 | plan = batch["floorplan"][:, :plan_height, :plan_width].unsqueeze(0) 240 | plan_scale = torch.Tensor([plan_params["scale"]]).to(self.device) 241 | plan_shift = plan_params["shift"] 242 | 243 | plan_embed = self.floorplan_encoder(plan).detach() 244 | 245 | # if specified subsample the embedded floor plan 246 | subsample_x = self.config.TEST.SUBSAMPLE_PLAN_X 247 | if subsample_x > 1: 248 | plan_embed = plan_embed[:, :, ::subsample_x, ::subsample_x] 249 | plan = plan[:, :, ::subsample_x, ::subsample_x] 250 | plan_scale = plan_scale / subsample_x 251 | 252 | # sample embedding at query location 253 | query_image = batch[self.query_key] 254 | query_pose = batch["pano_pose"] 255 | query_embed = self.forward(q=query_image).detach() 256 | 257 | # plot latent floor plan 258 | query_z = query_pose[0, -1] 259 | valid_loc_mask = plan[0, 0] == ROOM_VALUE 260 | vis_features = visualise_features(plan_embed, valid_loc_mask) 261 | self.logger.experiment.add_image( 262 | f"unet_feats_{batch_idx}", 263 | vis_features, 264 | self.current_epoch, 265 | dataformats="HWC", 266 | ) 267 | 268 | # legacy sampling of sparse grid to emulate LaLaLoc 269 | reference_poses = batch["sampled_poses"] 270 | reference_locs = pose_to_pixel_loc( 271 | reference_poses.unsqueeze(0).clone(), plan_scale, plan_shift 272 | ) 273 | reference_embed = sample_locs(plan_embed, reference_locs).squeeze(0) 274 | n, _ = query_embed.shape 275 | reference_embed = reference_embed.unsqueeze(0).expand(n, -1, -1) 276 | 277 | distances = self.compute_distances(query_embed, reference_embed).detach().cpu() 278 | pose_distances = torch.norm( 279 | (query_pose.unsqueeze(1) - reference_poses.expand(n, -1, -1)), dim=-1 280 | ).cpu() 281 | # gather ranking info for the prediction vs the actual 282 | ranking_prediction = torch.argsort(distances, dim=-1)[:, :5] 283 | ranking_pose = torch.argsort(pose_distances, dim=-1)[:, :5] 284 | 285 | retrieval_error = pose_distances.gather( 286 | 1, ranking_prediction[:, 0].unsqueeze(1) 287 | ) 288 | 289 | if self.config.TEST.COMPUTE_GT_DIST: 290 | gt_distances = batch["distances"].cpu() 291 | ranking_layout = torch.argsort(gt_distances, dim=-1)[:, :5] 292 | oracle_error = pose_distances.gather(1, ranking_layout[:, 0].unsqueeze(1)) 293 | else: 294 | oracle_error = torch.zeros_like(retrieval_error) 295 | ranking_layout = torch.zeros_like(ranking_prediction) 296 | 297 | # retrieve from dense prediction and optimise them 298 | refined_poses = self.predict_pose_dense( 299 | query_embed, plan_embed, plan_scale, plan_shift, query_z, valid_loc_mask 300 | ) 301 | optimised_poses = self.optimise_pose( 302 | query_embed, refined_poses.clone(), plan_embed, plan_scale, plan_shift 303 | ) 304 | 305 | refined_error = torch.norm(query_pose - refined_poses, dim=-1, keepdim=True) 306 | optimised_error = torch.norm(query_pose - 
optimised_poses, dim=-1, keepdim=True) 307 | return { 308 | "pred_rank": ranking_prediction, 309 | "layout_rank": ranking_layout, 310 | "pose_rank": ranking_pose, 311 | "retrieval_error": retrieval_error, 312 | "optimised_error": optimised_error, 313 | "refined_error": refined_error, 314 | "oracle_error": oracle_error, 315 | "pred_pose": optimised_poses, 316 | } 317 | 318 | -------------------------------------------------------------------------------- /lalaloc/model/lalaloc_pp_base.py: -------------------------------------------------------------------------------- 1 | import warnings 2 | 3 | import numpy as np 4 | import torch 5 | import torch.nn.functional as F 6 | from sklearn.decomposition import PCA 7 | from torch.utils.data import DataLoader 8 | 9 | from ..utils.floorplan import ( 10 | create_pixel_loc_grid, 11 | pixel_loc_to_pose, 12 | pose_to_pixel_loc, 13 | sample_locs, 14 | ) 15 | from .lalaloc_base import Image2LayoutBase 16 | from .pose_optimisation import PoseConvergenceChecker, init_optimiser 17 | from .unet import UNet 18 | 19 | 20 | class FloorPlanUnetBase(Image2LayoutBase): 21 | def __init__(self, config): 22 | super(FloorPlanUnetBase, self).__init__(config) 23 | # Remove uneeded LaLaLoc layout branch 24 | self.reference_embedder = None 25 | # Create LaLaLoc++ plan branch 26 | self.floorplan_encoder = UNet(config) 27 | 28 | def predict_pose_dense( 29 | self, query_desc, plan_embed, plan_scale, plan_shift, query_z, mask=None 30 | ): 31 | _, c, h, w = plan_embed.shape 32 | n = query_desc.shape[0] 33 | dense_loc_grid = create_pixel_loc_grid(w, h) 34 | dense_pose_grid = pixel_loc_to_pose( 35 | dense_loc_grid, plan_scale.cpu(), plan_shift.cpu(), query_z 36 | ).cpu() 37 | plan_embed_ = plan_embed.clone() 38 | if mask is not None: 39 | dense_pose_grid = dense_pose_grid[mask, :] 40 | plan_embed_ = plan_embed_[:, :, mask] 41 | dense_poses = dense_pose_grid.view(-1, 3) 42 | plan_embed_ = plan_embed_.view(c, -1).transpose(0, 1) 43 | 44 | plan_embed_ = plan_embed_.unsqueeze(0).expand(n, -1, -1) 45 | pred_distances = self.compute_distances(query_desc, plan_embed_).detach().cpu() 46 | ranking_dense = torch.argsort(pred_distances, dim=-1)[:, :5] 47 | pred_poses = dense_poses[ranking_dense[:, 0]].to(self.device) 48 | return pred_poses 49 | 50 | def optimise_pose( 51 | self, query_embeddings, initial_poses, feature_map, plan_scale, plan_shift 52 | ): 53 | torch.set_grad_enabled(True) 54 | initial_locs = pose_to_pixel_loc( 55 | initial_poses.unsqueeze(1), plan_scale, plan_shift 56 | ) 57 | refined_locs = [] 58 | for query_embedding, initial_loc in zip(query_embeddings, initial_locs): 59 | offset = torch.zeros((1, 2), requires_grad=True, device=self.device) 60 | 61 | # initialise optimisation and stopping metrics 62 | optimiser, scheduler = init_optimiser(self.config, [offset]) 63 | convergence_checker = PoseConvergenceChecker(self.config) 64 | 65 | for j in range(self.config.POSE_REFINE.MAX_ITERS): 66 | optimiser.zero_grad() 67 | embedding = sample_locs( 68 | feature_map, (initial_loc + offset).unsqueeze(0) 69 | ) 70 | loss = torch.norm(query_embedding - embedding, dim=-1) 71 | loss.backward() 72 | 73 | current_loss = loss.item() 74 | current_loc = (initial_loc + offset).clone().detach() 75 | if convergence_checker.has_converged(current_loss, current_loc): 76 | break 77 | optimiser.step() 78 | scheduler.step(current_loss) 79 | refined_loc = convergence_checker.best_pose 80 | refined_locs.append(refined_loc) 81 | refined_locs = torch.stack(refined_locs) 82 | z = initial_poses[0, -1] 83 | 
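        # convert the optimised pixel locations back to metric poses, reusing the height (z) of the initial poses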
refined_poses = pixel_loc_to_pose(refined_locs, plan_scale, plan_shift, z).view( 84 | -1, 3 85 | ) 86 | torch.set_grad_enabled(False) 87 | return refined_poses 88 | -------------------------------------------------------------------------------- /lalaloc/model/losses.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | def triplet_loss(distances, margin=1): 4 | pos_distances = distances[:, 0] 5 | neg_distances = distances[:, 1] 6 | 7 | losses = (pos_distances - neg_distances + margin).clamp(min=0) 8 | loss = losses.sum() 9 | if loss > 0: 10 | loss = loss / len(torch.nonzero(losses)) 11 | return loss 12 | 13 | 14 | def bbs_loss(distances, distances_truth): 15 | n = distances.shape[1] 16 | rows = distances.unsqueeze(1).expand(-1, n, -1) 17 | cols = distances.unsqueeze(2).expand(-1, -1, n) 18 | 19 | rows_gt = distances_truth.unsqueeze(1).expand(-1, n, -1) 20 | cols_gt = distances_truth.unsqueeze(2).expand(-1, -1, n) 21 | 22 | loss = (rows / cols).log() - (rows_gt / cols_gt).log() 23 | # remove i, i matches 24 | identity = torch.eye(n, device=loss.device).unsqueeze(0).expand_as(loss).bool() 25 | loss = loss[~identity] 26 | 27 | loss = (loss ** 2).mean() / 2 28 | return loss -------------------------------------------------------------------------------- /lalaloc/model/modules.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from torchvision import models 5 | 6 | from .position_encoding import build_position_encoding 7 | from .transformer import TransformerEncoder, TransformerEncoderLayer 8 | 9 | 10 | class EmbeddingModule(nn.Module): 11 | def __init__(self, in_channels, desc_channels): 12 | super(EmbeddingModule, self).__init__() 13 | self.pool = nn.AdaptiveAvgPool2d((1, 1)) 14 | self.fc = nn.Linear(in_channels, desc_channels) 15 | 16 | def forward(self, x): 17 | x = self.pool(x) 18 | x = torch.flatten(x, 1) 19 | x = self.fc(x) 20 | return x 21 | 22 | 23 | class MLPEmbeddingModule(nn.Module): 24 | def __init__(self, in_channels, desc_channels): 25 | super(MLPEmbeddingModule, self).__init__() 26 | self.pool = nn.AdaptiveAvgPool2d((1, 1)) 27 | self.mlp = nn.Sequential( 28 | nn.Linear(in_channels, in_channels), 29 | nn.ReLU(), 30 | nn.Linear(in_channels, desc_channels), 31 | ) 32 | 33 | def forward(self, x): 34 | x = self.pool(x) 35 | x = torch.flatten(x, 1) 36 | x = self.mlp(x) 37 | return x 38 | 39 | 40 | class TransformerFCEmbeddingModule(nn.Module): 41 | def __init__( 42 | self, 43 | in_channels, 44 | desc_channels, 45 | pos_at_input=True, 46 | hidden_dim=2048, 47 | num_heads=8, 48 | num_blocks=2, 49 | ): 50 | super().__init__() 51 | self.position_encoder = build_position_encoding(hidden_dim=desc_channels) 52 | self.dim_reduction = nn.Conv2d(in_channels, desc_channels, 1) 53 | encoder_layer = TransformerEncoderLayer(desc_channels, num_heads, hidden_dim) 54 | self.encoder = TransformerEncoder(encoder_layer, num_blocks) 55 | self.pool = nn.AdaptiveAvgPool2d((1, 1)) 56 | self.fc = nn.Linear(desc_channels, desc_channels) 57 | self.pos_at_input = pos_at_input 58 | 59 | def forward(self, x): 60 | pos = self.position_encoder(x) 61 | x = self.dim_reduction(x) 62 | b, c, h, w = x.shape 63 | 64 | x = x.flatten(2).permute(2, 0, 1) # NxCxHxW -> HWxNxC 65 | pos = pos.flatten(2).permute(2, 0, 1) # NxCxHxW -> HWxNxC 66 | if self.pos_at_input: 67 | x = x + pos 68 | pos = None 69 | 70 | x = self.encoder(x, pos=pos).permute(1, 2, 
0).view(b, c, h, w) 71 | x = self.pool(x).flatten(1) 72 | x = self.fc(x) 73 | return x 74 | 75 | 76 | class PanoModule(nn.Module): 77 | def __init__(self, config): 78 | super(PanoModule, self).__init__() 79 | desc_length = config.MODEL.DESC_LENGTH 80 | normalise = config.MODEL.NORMALISE_EMBEDDING 81 | pos_at_input = config.MODEL.PANORAMA_MODULE.POS_AT_INPUT 82 | num_blocks = config.MODEL.PANORAMA_MODULE.NUM_BLOCKS 83 | hidden_dim = config.MODEL.PANORAMA_MODULE.HIDDEN_DIM 84 | net, out_dim = _create_backbone(config.MODEL.PANORAMA_BACKBONE) 85 | self.layers = nn.Sequential(*list(net.children())[:-2]) 86 | if config.MODEL.PANO_EMBEDDER_TYPE == "fc": 87 | self.embedding = EmbeddingModule(out_dim, desc_length) 88 | elif config.MODEL.PANO_EMBEDDER_TYPE == "mlp": 89 | self.embedding = MLPEmbeddingModule(out_dim, desc_length) 90 | elif config.MODEL.PANO_EMBEDDER_TYPE == "transformer-fc": 91 | self.embedding = TransformerFCEmbeddingModule( 92 | out_dim, 93 | desc_length, 94 | pos_at_input=pos_at_input, 95 | num_blocks=num_blocks, 96 | hidden_dim=hidden_dim, 97 | ) 98 | self.normalise = normalise 99 | 100 | def forward(self, x): 101 | x = self.layers(x) 102 | x = self.embedding(x) 103 | if self.normalise: 104 | x = F.normalize(x) 105 | return x 106 | 107 | 108 | class LayoutModule(PanoModule): 109 | def __init__(self, config): 110 | super(LayoutModule, self).__init__(config) 111 | desc_length = config.MODEL.DESC_LENGTH 112 | net, out_dim = _create_backbone(config.MODEL.LAYOUT_BACKBONE) 113 | layers = [ 114 | nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3), 115 | ] 116 | layers += list(net.children())[1:-2] 117 | self.layers = nn.Sequential(*layers) 118 | if config.MODEL.LAYOUT_EMBEDDER_TYPE == "fc": 119 | self.embedding = EmbeddingModule(out_dim, desc_length) 120 | elif config.MODEL.LAYOUT_EMBEDDER_TYPE == "mlp": 121 | self.embedding = MLPEmbeddingModule(out_dim, desc_length) 122 | 123 | 124 | class LayoutDecoder(nn.Module): 125 | def __init__(self, config): 126 | super().__init__() 127 | desc_length = config.MODEL.DESC_LENGTH 128 | self.fc = nn.Sequential(nn.Linear(desc_length, 2048), nn.ReLU()) 129 | upsample_layers = [ 130 | nn.Upsample(scale_factor=2.0, mode="bilinear", align_corners=False), 131 | nn.Conv2d(256, 128, 3, padding=1), 132 | nn.ReLU(), 133 | nn.BatchNorm2d(128), 134 | nn.Upsample(scale_factor=2.0, mode="bilinear", align_corners=False), 135 | nn.Conv2d(128, 64, 3, padding=1), 136 | nn.ReLU(), 137 | nn.BatchNorm2d(64), 138 | nn.Upsample(scale_factor=2.0, mode="bilinear", align_corners=False), 139 | nn.Conv2d(64, 32, 3, padding=1), 140 | nn.ReLU(), 141 | nn.BatchNorm2d(32), 142 | nn.Upsample(scale_factor=2.0, mode="bilinear", align_corners=False), 143 | nn.Conv2d(32, 32, 3, padding=1), 144 | nn.ReLU(), 145 | nn.BatchNorm2d(32), 146 | nn.Conv2d(32, 1, 1), 147 | ] 148 | self.decov_layers = nn.Sequential(*upsample_layers) 149 | 150 | def forward(self, x): 151 | x = self.fc(x) 152 | x = x.view(-1, 256, 2, 4) 153 | x = self.decov_layers(x) 154 | return x 155 | 156 | 157 | def _create_backbone(name): 158 | backbones = { 159 | "resnet18": (models.resnet18(pretrained=True), 512), 160 | "resnet50": (models.resnet50(pretrained=True), 2048), 161 | } 162 | return backbones[name] 163 | -------------------------------------------------------------------------------- /lalaloc/model/pose_optimisation.py: -------------------------------------------------------------------------------- 1 | import pyredner 2 | import torch 3 | 4 | 5 | class PoseConvergenceChecker: 6 | def __init__(self, config): 7 | 
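        # track the best loss and pose seen so far, plus how many consecutive steps improved by less than the threshold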
self.best_loss = 1e5 8 | self.best_pose = None 9 | self.converge_count = 0 10 | self.converge_threshold = config.POSE_REFINE.CONVERGANCE_THRESHOLD 11 | self.converge_patience = config.POSE_REFINE.CONVERGANCE_PATIENCE 12 | 13 | def has_converged(self, current_loss, current_pose): 14 | delta = 0 15 | if current_loss < self.best_loss: 16 | delta = self.best_loss - current_loss 17 | self.best_pose = current_pose 18 | self.best_loss = current_loss 19 | if delta < self.converge_threshold: 20 | self.converge_count += 1 21 | else: 22 | self.converge_count = 0 23 | 24 | return self.converge_count > self.converge_patience 25 | 26 | 27 | def init_objects_at_pose(pose, mesh, device): 28 | # Creates pyrender objects from the mesh at the specified pose 29 | objects = [] 30 | material = pyredner.Material() 31 | for verts, faces in zip(mesh.verts_list(), mesh.faces_list()): 32 | verts = verts.to(device) - pose.unsqueeze(0) 33 | faces = faces.to(device) 34 | objects.append(pyredner.Object(verts, faces.int(), material)) 35 | vertices = [obj.vertices.clone() for obj in objects] 36 | 37 | return objects, vertices 38 | 39 | 40 | def init_camera_at_origin(config): 41 | # Creates a pyrender camera at the origin 42 | origin = torch.Tensor([0.0, 0.0, 0.0]) 43 | # look at x axis equivalent to an offset x' 44 | look_at = torch.Tensor([1, 0, 0]) 45 | up = torch.Tensor([0.0, 0.0, 1.0]) 46 | camera = pyredner.Camera( 47 | position=origin, 48 | look_at=look_at, 49 | up=up, 50 | camera_type=pyredner.camera_type.panorama, 51 | resolution=config.POSE_REFINE.RENDER_SIZE, 52 | ) 53 | return camera 54 | 55 | 56 | def init_optimiser(config, params): 57 | optimiser = torch.optim.Adam(params, lr=config.POSE_REFINE.LR) 58 | scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau( 59 | optimiser, 60 | patience=config.POSE_REFINE.SCHEDULER_PATIENCE, 61 | threshold=config.POSE_REFINE.SCHEDULER_THRESHOLD, 62 | factor=config.POSE_REFINE.SCHEDULER_DECAY, 63 | ) 64 | return optimiser, scheduler 65 | 66 | 67 | def render_at_pose(camera, objects, vertices, pose): 68 | for i in range(len(objects)): 69 | objects[i].vertices = vertices[i] - pose * 1000 70 | 71 | scene = pyredner.Scene(camera=camera, objects=objects) 72 | img = pyredner.render_g_buffer(scene, [pyredner.channels.depth], device=pose.device) 73 | img = img.flip(dims=[1]) 74 | return img.squeeze(2) 75 | 76 | -------------------------------------------------------------------------------- /lalaloc/model/position_encoding.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | """ 3 | Various positional encodings for the transformer. 4 | """ 5 | import math 6 | 7 | import torch 8 | from torch import nn 9 | 10 | 11 | class PositionEmbeddingSine(nn.Module): 12 | """ 13 | This is a more standard version of the position embedding, very similar to the one 14 | used by the Attention is all you need paper, generalized to work on images. 
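    Row and column positions are encoded with interleaved sine/cosine terms and concatenated along the channel dimension.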
15 | """ 16 | 17 | def __init__( 18 | self, num_pos_feats=64, temperature=10000, normalize=False, scale=None 19 | ): 20 | super().__init__() 21 | self.num_pos_feats = num_pos_feats 22 | self.temperature = temperature 23 | self.normalize = normalize 24 | if scale is not None and normalize is False: 25 | raise ValueError("normalize should be True if scale is passed") 26 | if scale is None: 27 | scale = 2 * math.pi 28 | self.scale = scale 29 | 30 | def forward(self, x): 31 | b, c, h, w = x.shape 32 | not_mask = torch.ones((b, h, w), dtype=torch.uint8, device=x.device) 33 | 34 | y_embed = not_mask.cumsum(1, dtype=torch.float32) 35 | x_embed = not_mask.cumsum(2, dtype=torch.float32) 36 | if self.normalize: 37 | eps = 1e-6 38 | y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale 39 | x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale 40 | 41 | dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device) 42 | dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats) 43 | 44 | pos_x = x_embed[:, :, :, None] / dim_t 45 | pos_y = y_embed[:, :, :, None] / dim_t 46 | 47 | pos_x = torch.stack( 48 | (pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4 49 | ).flatten(3) 50 | pos_y = torch.stack( 51 | (pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4 52 | ).flatten(3) 53 | pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2) 54 | 55 | return pos 56 | 57 | 58 | def build_position_encoding(position_embedding_mode="sine", hidden_dim=256): 59 | N_steps = hidden_dim // 2 60 | if position_embedding_mode in ("v2", "sine"): 61 | # TODO find a better way of exposing other arguments 62 | position_embedding = PositionEmbeddingSine(N_steps, normalize=True) 63 | else: 64 | raise ValueError(f"not supported {position_embedding_mode}") 65 | 66 | return position_embedding 67 | -------------------------------------------------------------------------------- /lalaloc/model/transformer.py: -------------------------------------------------------------------------------- 1 | from typing import Optional 2 | 3 | import torch.nn as nn 4 | from torch import Tensor 5 | 6 | 7 | class TransformerEncoder(nn.TransformerEncoder): 8 | def forward( 9 | self, 10 | src: Tensor, 11 | mask: Optional[Tensor] = None, 12 | src_key_padding_mask: Optional[Tensor] = None, 13 | pos: Optional[Tensor] = None, 14 | ) -> Tensor: 15 | output = src 16 | 17 | for mod in self.layers: 18 | output = mod( 19 | output, 20 | src_mask=mask, 21 | src_key_padding_mask=src_key_padding_mask, 22 | pos=pos, 23 | ) 24 | 25 | if self.norm is not None: 26 | output = self.norm(output) 27 | 28 | return output 29 | 30 | 31 | class TransformerEncoderLayer(nn.TransformerEncoderLayer): 32 | def with_pos_embed(self, tensor, pos: Optional[Tensor]): 33 | # print("encoder", pos is None) 34 | return tensor if pos is None else tensor + pos 35 | 36 | def forward( 37 | self, 38 | src: Tensor, 39 | src_mask: Optional[Tensor] = None, 40 | src_key_padding_mask: Optional[Tensor] = None, 41 | pos: Optional[Tensor] = None, 42 | ) -> Tensor: 43 | q = k = self.with_pos_embed(src, pos) 44 | src2 = self.self_attn( 45 | q, k, src, attn_mask=src_mask, key_padding_mask=src_key_padding_mask 46 | )[0] 47 | src = src + self.dropout1(src2) 48 | src = self.norm1(src) 49 | src2 = self.linear2(self.dropout(self.activation(self.linear1(src)))) 50 | src = src + self.dropout2(src2) 51 | src = self.norm2(src) 52 | return src 53 | 54 | 55 | class TransformerDecoder(nn.TransformerDecoder): 56 | def forward( 57 | self, 58 | tgt: Tensor, 
59 | memory: Tensor, 60 | tgt_mask: Optional[Tensor] = None, 61 | memory_mask: Optional[Tensor] = None, 62 | tgt_key_padding_mask: Optional[Tensor] = None, 63 | memory_key_padding_mask: Optional[Tensor] = None, 64 | tgt_pos: Optional[Tensor] = None, 65 | memory_pos: Optional[Tensor] = None, 66 | output_attention: Optional[bool] = False, 67 | ) -> Tensor: 68 | output = tgt 69 | attentions = [] 70 | for mod in self.layers: 71 | output, attention = mod( 72 | output, 73 | memory, 74 | tgt_mask=tgt_mask, 75 | memory_mask=memory_mask, 76 | tgt_key_padding_mask=tgt_key_padding_mask, 77 | memory_key_padding_mask=memory_key_padding_mask, 78 | tgt_pos=tgt_pos, 79 | memory_pos=memory_pos, 80 | output_attention=True, 81 | ) 82 | attentions.append(attention) 83 | 84 | if self.norm is not None: 85 | output = self.norm(output) 86 | 87 | if output_attention: 88 | return output, attentions 89 | return output 90 | 91 | 92 | class TransformerDecoderLayer(nn.TransformerDecoderLayer): 93 | def __init__( 94 | self, d_model, nhead, dim_feedforward=2048, dropout=0.1, activation="relu" 95 | ): 96 | super().__init__( 97 | d_model, 98 | nhead, 99 | dim_feedforward=dim_feedforward, 100 | dropout=dropout, 101 | activation=activation, 102 | ) 103 | 104 | def forward( 105 | self, 106 | tgt: Tensor, 107 | memory: Tensor, 108 | tgt_mask: Optional[Tensor] = None, 109 | memory_mask: Optional[Tensor] = None, 110 | tgt_key_padding_mask: Optional[Tensor] = None, 111 | memory_key_padding_mask: Optional[Tensor] = None, 112 | tgt_pos: Optional[Tensor] = None, 113 | memory_pos: Optional[Tensor] = None, 114 | output_attention: Optional[bool] = False, 115 | ) -> Tensor: 116 | x = tgt 117 | x = self.norm1(x + self._sa_block(x, tgt_mask, tgt_key_padding_mask, tgt_pos)) 118 | x_, attn = self._mha_block( 119 | x, 120 | memory, 121 | memory_mask, 122 | memory_key_padding_mask, 123 | x_pos=tgt_pos, 124 | mem_pos=memory_pos, 125 | output_attention=True, 126 | ) 127 | x = self.norm2(x + x_) 128 | x = self.norm3(x + self._ff_block(x)) 129 | if output_attention: 130 | return x, attn 131 | return x 132 | 133 | def with_pos_embed(self, tensor, pos: Optional[Tensor]): 134 | # print("decoder", pos is None) 135 | return tensor if pos is None else tensor + pos 136 | 137 | def _sa_block( 138 | self, 139 | x: Tensor, 140 | attn_mask: Optional[Tensor], 141 | key_padding_mask: Optional[Tensor], 142 | pos: Optional[Tensor], 143 | ) -> Tensor: 144 | q = k = self.with_pos_embed(x, pos) 145 | x = self.self_attn( 146 | q, 147 | k, 148 | x, 149 | attn_mask=attn_mask, 150 | key_padding_mask=key_padding_mask, 151 | need_weights=False, 152 | )[0] 153 | return self.dropout1(x) 154 | 155 | def _mha_block( 156 | self, 157 | x: Tensor, 158 | mem: Tensor, 159 | attn_mask: Optional[Tensor], 160 | key_padding_mask: Optional[Tensor], 161 | x_pos: Optional[Tensor], 162 | mem_pos: Optional[Tensor], 163 | output_attention: Optional[bool] = False, 164 | ) -> Tensor: 165 | x, attn = self.multihead_attn( 166 | self.with_pos_embed(x, x_pos), 167 | self.with_pos_embed(mem, mem_pos), 168 | mem, 169 | attn_mask=attn_mask, 170 | key_padding_mask=key_padding_mask, 171 | need_weights=True, 172 | ) 173 | if output_attention: 174 | return self.dropout2(x), attn 175 | return self.dropout2(x) 176 | 177 | def _ff_block(self, x: Tensor) -> Tensor: 178 | x = self.linear2(self.dropout(self.activation(self.linear1(x)))) 179 | return self.dropout3(x) 180 | -------------------------------------------------------------------------------- /lalaloc/model/unet.py: 
-------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class ConvBlock(nn.Module): 7 | def __init__(self, in_channels, out_channels): 8 | super().__init__() 9 | self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1) 10 | self.bn1 = nn.BatchNorm2d(out_channels) 11 | self.relu1 = nn.ReLU() 12 | self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1) 13 | self.bn2 = nn.BatchNorm2d(out_channels) 14 | self.relu2 = nn.ReLU() 15 | 16 | def forward(self, x): 17 | x = self.conv1(x) 18 | x = self.bn1(x) 19 | x = self.relu1(x) 20 | x = self.conv2(x) 21 | x = self.bn2(x) 22 | x = self.relu2(x) 23 | return x 24 | 25 | 26 | class DecodeBlock(nn.Module): 27 | def __init__(self, in_channels, out_channels): 28 | super().__init__() 29 | self.upsample = nn.Upsample( 30 | scale_factor=2.0, mode="bilinear", align_corners=False 31 | ) 32 | self.conv = ConvBlock(in_channels * 2, out_channels) 33 | 34 | def forward(self, x, encoder_x): 35 | x = self.upsample(x) 36 | x = torch.cat([x, encoder_x], dim=1) 37 | x = self.conv(x) 38 | return x 39 | 40 | 41 | class Encoder(nn.Module): 42 | def __init__(self, channels): 43 | super().__init__() 44 | in_channels = 3 45 | blocks = [] 46 | for out_channels in channels: 47 | blocks.append(ConvBlock(in_channels, out_channels)) 48 | in_channels = out_channels 49 | self.blocks = nn.ModuleList(blocks) 50 | self.pool = nn.MaxPool2d(2) 51 | 52 | def forward(self, x): 53 | features = [] 54 | for block in self.blocks: 55 | x = block(x) 56 | features.append(x) 57 | x = self.pool(x) 58 | return x, features 59 | 60 | 61 | class Decoder(nn.Module): 62 | def __init__(self, channels, in_channels): 63 | super().__init__() 64 | blocks = [] 65 | self.initial_block = DecodeBlock(in_channels, channels[0]) 66 | for out_channels in channels[1:]: 67 | blocks.append(DecodeBlock(in_channels, out_channels)) 68 | in_channels = out_channels 69 | self.blocks = nn.ModuleList(blocks) 70 | 71 | def forward(self, x, encoder_features): 72 | for block, encoder_x in zip(self.blocks, encoder_features): 73 | x = block(x, encoder_x) 74 | return x 75 | 76 | 77 | class UNet(nn.Module): 78 | def __init__(self, config): 79 | super().__init__() 80 | encoder_channels = config.MODEL.UNET_ENCODER_CHANNELS 81 | decoder_channels = config.MODEL.UNET_DECODER_CHANNELS 82 | self.encoder = Encoder(encoder_channels) 83 | self.decoder = Decoder(decoder_channels, encoder_channels[-1]) 84 | self.conv = nn.Conv2d(128, 128, 3, padding=1) 85 | 86 | def forward(self, x): 87 | x, features = self.encoder(x) 88 | x = self.decoder(x, features[::-1]) 89 | x = self.conv(x) 90 | x = F.normalize(x, dim=1) 91 | return x 92 | -------------------------------------------------------------------------------- /lalaloc/utils/chamfer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from pytorch3d.loss import chamfer_distance 3 | 4 | 5 | def chamfer(points1, points2, device=None): 6 | with torch.no_grad(): 7 | if device is None: 8 | points1 = points1.cuda() 9 | points2 = points2.cuda() 10 | else: 11 | points1 = points1.to(device) 12 | points2 = points2.to(device) 13 | distances, _ = chamfer_distance(points1, points2, batch_reduction=None) 14 | return distances.cpu() 15 | -------------------------------------------------------------------------------- /lalaloc/utils/eval.py: -------------------------------------------------------------------------------- 1 | import 
torch 2 | 3 | def index_of_val(arr, value): 4 | return (arr == value).nonzero() 5 | 6 | 7 | def recall_at_n(n, predictions, truths): 8 | results = [] 9 | for prediction, truth in zip(predictions, truths): 10 | intersection = [p for p in prediction[:n] if p in truth[:n]] 11 | result = 1 if len(intersection) > 0 else 0 12 | results.append(result) 13 | results = torch.Tensor(results).mean() 14 | return results -------------------------------------------------------------------------------- /lalaloc/utils/floorplan.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | 4 | 5 | def pose_to_pixel_loc(pose, scale, shift): 6 | locs = pose.clone()[:, :, :2] 7 | locs -= shift.view(-1, 1, 2) 8 | locs *= scale.view(-1, 1, 1) 9 | return locs 10 | 11 | 12 | def pixel_loc_to_pose(locs, scale, shift, z): 13 | poses = locs.clone().float() 14 | poses /= scale.view(-1, 1, 1) 15 | poses += shift.view(-1, 1, 2) 16 | h, w, _ = poses.shape 17 | zs = torch.full((h, w, 1), z).to(poses.device) 18 | return torch.cat([poses, zs], dim=-1) 19 | 20 | 21 | def create_pixel_loc_grid(w, h): 22 | x = torch.arange(w) 23 | y = torch.arange(h) 24 | ys, xs = torch.meshgrid([y, x]) 25 | loc_grid = torch.stack([xs, ys], dim=-1) 26 | return loc_grid 27 | 28 | 29 | def sample_locs(tensor, locs, normalise=True): 30 | b, c, h, w = tensor.shape 31 | locs[:, :, 0] = locs[:, :, 0] / (w / 2) - 1 32 | locs[:, :, 1] = locs[:, :, 1] / (h / 2) - 1 33 | _, n, _ = locs.shape 34 | locs = locs.view(b, 1, -1, 2).clone().float() 35 | sampled = F.grid_sample(tensor, locs, align_corners=False) 36 | sampled = sampled.view(b, c, n).permute(0, 2, 1) 37 | if normalise: 38 | sampled = F.normalize(sampled, dim=-1) 39 | return sampled 40 | -------------------------------------------------------------------------------- /lalaloc/utils/panorama.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parts of this code are modified from: https://github.com/bertjiazheng/Structured3D 3 | Copyright (c) 2019 Structured3D Group 4 | """ 5 | import numpy as np 6 | 7 | 8 | def uvs_to_rays(uvs): 9 | xs_ray = np.cos(uvs[:, 1]) * np.sin(uvs[:, 0]) 10 | ys_ray = np.cos(uvs[:, 1]) * np.cos(uvs[:, 0]) 11 | zs_ray = np.sin(uvs[:, 1]) 12 | 13 | rays = np.stack([xs_ray, ys_ray, zs_ray], axis=1) 14 | rays = rays / np.linalg.norm(rays, axis=1).reshape(-1, 1) 15 | return rays 16 | 17 | 18 | def coords_to_uv(coords, width, height): 19 | """ 20 | Image coordinates (xy) to uv 21 | """ 22 | middleX = width / 2 + 0.5 23 | middleY = height / 2 + 0.5 24 | uv = np.hstack( 25 | [ 26 | (coords[:, [0]] - middleX) / width * 2 * np.pi, 27 | -(coords[:, [1]] - middleY) / height * np.pi, 28 | ] 29 | ) 30 | return uv 31 | -------------------------------------------------------------------------------- /lalaloc/utils/polygons.py: -------------------------------------------------------------------------------- 1 | """ 2 | Parts of this code are modified from: https://github.com/bertjiazheng/Structured3D 3 | Copyright (c) 2019 Structured3D Group 4 | """ 5 | import numpy as np 6 | import pymesh 7 | 8 | 9 | def project(x, meta): 10 | """ project 3D to 2D for polygon clipping 11 | """ 12 | proj_axis = max(range(3), key=lambda i: abs(meta["normal"][i])) 13 | 14 | return tuple(c for i, c in enumerate(x) if i != proj_axis) 15 | 16 | 17 | def project_inv(x, meta): 18 | """ recover 3D points from 2D 19 | """ 20 | # Returns the vector w in the walls' plane such that project(w) 
equals x. 21 | proj_axis = max(range(3), key=lambda i: abs(meta["normal"][i])) 22 | 23 | w = list(x) 24 | w[proj_axis:proj_axis] = [0.0] 25 | c = -meta["offset"] 26 | for i in range(3): 27 | c -= w[i] * meta["normal"][i] 28 | c /= meta["normal"][proj_axis] 29 | w[proj_axis] = c 30 | return tuple(w) 31 | 32 | 33 | def triangulate(points): 34 | """ triangulate the plane for operation and visualization 35 | """ 36 | 37 | num_points = len(points) 38 | indices = np.arange(num_points, dtype=np.int) 39 | segments = np.vstack((indices, np.roll(indices, -1))).T 40 | tri = pymesh.triangle() 41 | tri.points = np.array(points) 42 | 43 | tri.segments = segments 44 | tri.verbosity = 0 45 | tri.run() 46 | return tri.mesh 47 | 48 | 49 | def clip_polygon(polygons, vertices_hole, junctions, meta, clip_holes=True): 50 | """ clip polygon the hole 51 | """ 52 | if len(polygons) == 1: 53 | junctions = [junctions[vertex] for vertex in polygons[0]] 54 | mesh_wall = triangulate(junctions) 55 | vertices = np.array(mesh_wall.vertices) 56 | faces = np.array(mesh_wall.faces) 57 | 58 | return vertices, faces 59 | 60 | else: 61 | wall = [] 62 | holes = [] 63 | for polygon in polygons: 64 | if np.any(np.intersect1d(polygon, vertices_hole)): 65 | holes.append(polygon) 66 | else: 67 | wall.append(polygon) 68 | 69 | # extract junctions on this plane 70 | indices = [] 71 | junctions_wall = [] 72 | for plane in wall: 73 | for vertex in plane: 74 | indices.append(vertex) 75 | junctions_wall.append(junctions[vertex]) 76 | junctions_wall = [project(x, meta) for x in junctions_wall] 77 | mesh_wall = triangulate(junctions_wall) 78 | 79 | if clip_holes: 80 | junctions_holes = [] 81 | for plane in holes: 82 | junctions_hole = [] 83 | for vertex in plane: 84 | indices.append(vertex) 85 | junctions_hole.append(junctions[vertex]) 86 | junctions_holes.append(junctions_hole) 87 | 88 | junctions_holes = [ 89 | [project(x, meta) for x in junctions_hole] 90 | for junctions_hole in junctions_holes 91 | ] 92 | 93 | for hole in junctions_holes: 94 | mesh_hole = triangulate(hole) 95 | mesh_wall = pymesh.boolean(mesh_wall, mesh_hole, "difference") 96 | 97 | vertices = [project_inv(vertex, meta) for vertex in mesh_wall.vertices] 98 | 99 | return vertices, np.array(mesh_wall.faces) 100 | 101 | 102 | def convert_lines_to_vertices(lines): 103 | """convert line representation to polygon vertices 104 | """ 105 | polygons = [] 106 | lines = np.array(lines) 107 | 108 | polygon = None 109 | while len(lines) != 0: 110 | if polygon is None: 111 | polygon = lines[0].tolist() 112 | lines = np.delete(lines, 0, 0) 113 | 114 | lineID, juncID = np.where(lines == polygon[-1]) 115 | vertex = lines[lineID[0], 1 - juncID[0]] 116 | lines = np.delete(lines, lineID, 0) 117 | 118 | if vertex in polygon: 119 | polygons.append(polygon) 120 | polygon = None 121 | else: 122 | polygon.append(vertex) 123 | 124 | return polygons 125 | -------------------------------------------------------------------------------- /lalaloc/utils/projection.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from pytorch3d.structures import Pointclouds 4 | from pytorch3d.renderer.mesh.rasterize_meshes import barycentric_coordinates 5 | 6 | from .panorama import coords_to_uv, uvs_to_rays 7 | 8 | 9 | def project_depth_to_pc( 10 | config, depth, camera_type="panorama", camera_params=None, 11 | ): 12 | height, width = depth.shape 13 | depth = depth.reshape(-1) 14 | xys_camera = np.stack( 15 | np.meshgrid(np.arange(width), 
np.arange(height)), axis=2 16 | ).reshape((-1, 2)) 17 | 18 | if camera_type == "panorama": 19 | uvs = coords_to_uv(xys_camera, width, height) 20 | rays = uvs_to_rays(uvs).astype(np.float32).reshape(-1, 3) 21 | else: 22 | raise NotImplementedError( 23 | "{} camera_type is not currently implemented".format(camera_type) 24 | ) 25 | invalid_depth = np.isnan(depth) | (depth <= 0) 26 | points = depth[~invalid_depth].reshape(-1, 1) * rays[~invalid_depth] 27 | 28 | return points 29 | 30 | 31 | def project_depth_to_pc_batched( 32 | config, depths, camera_type="panorama", camera_params=None 33 | ): 34 | height, width = config.RENDER.IMG_SIZE 35 | xys_camera = np.stack( 36 | np.meshgrid(np.arange(width), np.arange(height)), axis=2 37 | ).reshape((-1, 2)) 38 | 39 | if camera_type == "panorama": 40 | uvs = coords_to_uv(xys_camera, width, height) 41 | rays = uvs_to_rays(uvs).astype(np.float32).reshape(-1, 3) 42 | else: 43 | raise NotImplementedError( 44 | "{} camera_type is not currently implemented".format(camera_type) 45 | ) 46 | points = [] 47 | for depth in depths: 48 | depth = depth.reshape(-1) 49 | invalid_depth = np.isnan(depth) | (depth <= 0) 50 | point = depth[~invalid_depth].reshape(-1, 1) * rays[~invalid_depth] 51 | points.append(torch.Tensor(point)) 52 | 53 | points = Pointclouds(points) 54 | return points 55 | 56 | 57 | def projects_onto_floor(location, floors): 58 | for idx, (verts, faces) in enumerate(zip(floors.verts_list(), floors.faces_list())): 59 | triangles = [verts[idxs] for idxs in faces] 60 | for triangle in triangles: 61 | bary = barycentric_coordinates(location, *triangle) 62 | bary = torch.Tensor(bary) 63 | if (bary >= 0).all() and (bary <= 1).all() and abs(sum(bary) - 1) < 1e-5: 64 | return idx 65 | return -1 66 | -------------------------------------------------------------------------------- /lalaloc/utils/render.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pyredner 3 | import torch 4 | from pyredner import scene 5 | 6 | device = torch.device("cpu:0") 7 | 8 | pyredner.set_print_timing(False) 9 | pyredner.set_device(device) 10 | 11 | 12 | def create_camera(pose, img_size): 13 | # look at x axis equivalent to an offset x' 14 | look_at = pose.clone() + torch.Tensor([1, 0, 0]) 15 | up = torch.Tensor([0.0, 0.0, 1.0]) 16 | 17 | return pyredner.Camera( 18 | position=pose, 19 | look_at=look_at, 20 | up=up, 21 | camera_type=pyredner.camera_type.panorama, 22 | resolution=img_size, 23 | ) 24 | 25 | 26 | def create_objects(mesh): 27 | material = pyredner.Material() 28 | objects = [] 29 | for verts, faces in zip(mesh.verts_list(), mesh.faces_list()): 30 | objects.append(pyredner.Object(verts, faces.int(), material)) 31 | return objects 32 | 33 | 34 | def render_scene(config, mesh, pose): 35 | img_size = config.RENDER.IMG_SIZE 36 | pose = torch.Tensor(pose) 37 | 38 | camera = create_camera(pose, img_size) 39 | objects = create_objects(mesh) 40 | scene = pyredner.Scene(camera=camera, objects=objects) 41 | 42 | img = pyredner.render_g_buffer(scene, [pyredner.channels.depth], device=device) 43 | # flip x axis for equivalent to a flipped x' 44 | img = img.flip(dims=[1]) 45 | img = img.cpu().squeeze(2).numpy() 46 | img = np.ascontiguousarray(img) 47 | return img 48 | 49 | 50 | def render_semantics(config, mesh, pose): 51 | img_size = config.RENDER.IMG_SIZE 52 | pose = torch.Tensor(pose) 53 | 54 | camera = create_camera(pose, img_size) 55 | objects = create_objects(mesh) 56 | scene = pyredner.Scene(camera=camera, 
objects=objects) 57 | 58 | img = pyredner.render_g_buffer(scene, [pyredner.channels.shape_id], num_samples=1) 59 | # flip x axis for equivalent to a flipped x' 60 | img = img.flip(dims=[1]) 61 | img = img.cpu().squeeze(2).numpy() 62 | img = np.ascontiguousarray(img) 63 | return img 64 | 65 | 66 | def render_scene_batched(config, geometry, poses): 67 | scenes = [] 68 | rooms = [r for _, r in poses] 69 | rooms = np.unique(rooms) 70 | 71 | objects = {} 72 | for room in rooms: 73 | mesh = geometry[room] 74 | objs = create_objects(mesh) 75 | objects[room] = objs 76 | 77 | img_size = config.RENDER.IMG_SIZE 78 | for pose, room in poses: 79 | pose = torch.Tensor(pose) 80 | camera = create_camera(pose, img_size) 81 | scenes.append(pyredner.Scene(camera=camera, objects=objects[room])) 82 | 83 | imgs = pyredner.render_g_buffer( 84 | scenes, [pyredner.channels.depth], device=device, num_samples=1 85 | ) 86 | # flip x axis for equivalent to a flipped x' 87 | imgs = imgs.flip(dims=[2]) 88 | imgs = imgs.cpu().squeeze(3).numpy() 89 | imgs = np.ascontiguousarray(imgs) 90 | return imgs 91 | 92 | 93 | def render_semantic_batched(config, geometry, poses): 94 | scenes = [] 95 | rooms = [r for _, r in poses] 96 | rooms = np.unique(rooms) 97 | 98 | objects = {} 99 | for room in rooms: 100 | mesh = geometry[room] 101 | objs = create_objects(mesh) 102 | objects[room] = objs 103 | 104 | img_size = config.RENDER.IMG_SIZE 105 | for pose, room in poses: 106 | pose = torch.Tensor(pose) 107 | camera = create_camera(pose, img_size) 108 | scenes.append(pyredner.Scene(camera=camera, objects=objects[room])) 109 | 110 | imgs = pyredner.render_g_buffer( 111 | scenes, [pyredner.channels.shape_id], device=device, num_samples=1 112 | ) 113 | # flip x axis for equivalent to a flipped x' 114 | imgs = imgs.flip(dims=[2]) 115 | imgs = imgs.cpu().squeeze(3).numpy() 116 | imgs = np.ascontiguousarray(imgs) 117 | return imgs 118 | -------------------------------------------------------------------------------- /lalaloc/utils/vogel_disc.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def sample_vogel_disc(centre, radius, num_samples): 5 | phi = (1 + np.sqrt(5)) / 2 6 | 7 | samples = [] 8 | for i in range(1, num_samples + 1): 9 | distance = radius * np.sqrt(i) / np.sqrt(num_samples + 1) 10 | angle = 2 * np.pi * phi * i 11 | 12 | x = distance * np.cos(angle) + centre[0] 13 | y = distance * np.sin(angle) + centre[1] 14 | samples.append(np.array([x, y, centre[2]])) 15 | return samples -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import pytorch_lightning as pl 2 | import torch 3 | import torch.nn as nn 4 | from pytorch_lightning import loggers 5 | from pytorch_lightning.callbacks import ModelCheckpoint 6 | 7 | from lalaloc.config import get_cfg_defaults, parse_args 8 | from lalaloc.model import ( 9 | FloorPlanUnetImage, 10 | FloorPlanUnetLayout, 11 | ImageFromLayout, 12 | Layout2LayoutDecode, 13 | ) 14 | 15 | if __name__ == "__main__": 16 | args = parse_args() 17 | 18 | config = get_cfg_defaults() 19 | config.merge_from_file(args.config_file) 20 | config.merge_from_list(args.opts) 21 | if args.val: 22 | config.TEST.VAL_AS_TEST = True 23 | config.freeze() 24 | print(config) 25 | 26 | pl.seed_everything(config.SEED) 27 | 28 | if args.checkpoint_file: 29 | resume_path = args.checkpoint_file 30 | else: 31 | resume_path = None 32 | 33 | if 
config.MODEL.TYPE == "lalaloc": 34 | if config.MODEL.QUERY_TYPE == "image": 35 | model = ImageFromLayout(config) 36 | elif config.MODEL.QUERY_TYPE == "layout": 37 | model = Layout2LayoutDecode(config) 38 | else: 39 | raise NotImplementedError( 40 | "The query type, {}, isn't recognised.".format(config.MODEL.QUERY_TYPE) 41 | ) 42 | elif config.MODEL.TYPE == "lalaloc++": 43 | if config.MODEL.QUERY_TYPE == "image": 44 | model = FloorPlanUnetImage(config) 45 | elif config.MODEL.QUERY_TYPE == "layout": 46 | model = FloorPlanUnetLayout(config) 47 | else: 48 | raise NotImplementedError( 49 | "The query type, {}, isn't recognised.".format(config.MODEL.QUERY_TYPE) 50 | ) 51 | 52 | logger = loggers.TensorBoardLogger(config.OUT_DIR) 53 | checkpoint_callback = ModelCheckpoint(save_top_k=-1,) 54 | 55 | trainer = pl.Trainer( 56 | max_epochs=config.TRAIN.NUM_EPOCHS, 57 | gpus=config.SYSTEM.NUM_GPUS, 58 | logger=logger, 59 | distributed_backend=config.SYSTEM.DISTRIBUTED_BACKEND, 60 | limit_val_batches=25, 61 | resume_from_checkpoint=resume_path, 62 | num_sanity_val_steps=2, 63 | check_val_every_n_epoch=config.TRAIN.TEST_EVERY, 64 | callbacks=[checkpoint_callback], 65 | ) 66 | if args.test_ckpt: 67 | assert config.SYSTEM.NUM_GPUS == 1 68 | load = torch.load(args.test_ckpt) 69 | model.load_state_dict(load["state_dict"], strict=False) 70 | trainer.test(model) 71 | else: 72 | trainer.fit(model) 73 | -------------------------------------------------------------------------------- /trained_models/.gitattributes: -------------------------------------------------------------------------------- 1 | lalaloc_pp_image2plan.ckpt filter=lfs diff=lfs merge=lfs -text 2 | lalaloc_pp_planbranch.ckpt filter=lfs diff=lfs merge=lfs -text 3 | -------------------------------------------------------------------------------- /trained_models/lalaloc_image2layout.ckpt: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:ad1e340d3f2e52bdc64cdaeeb2d62e46388b1524aa106b59a3e78e5fb60f2472 3 | size 235577387 4 | -------------------------------------------------------------------------------- /trained_models/lalaloc_layout2layout.ckpt: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:3b8e8eff602cc464e88e786011a9e1fcfb45fca0ed8769eb75d0062a0e5a1b23 3 | size 140357335 4 | -------------------------------------------------------------------------------- /trained_models/lalaloc_pp_image2plan.ckpt: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:fba43aa21769c53e3bb0802a475184980f724b8efdddb219ceca577c43ae6d98 3 | size 257260520 4 | -------------------------------------------------------------------------------- /trained_models/lalaloc_pp_planbranch.ckpt: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:0c4d45af98576cbdde2bc457d5111b87a3e5dc3e5cbc828463373f85f8eeb0e4 3 | size 105801121 4 | --------------------------------------------------------------------------------