├── .gitignore
├── README.md
├── VLN-DUET-RVR
│   ├── README.md
│   ├── map_nav_src
│   │   ├── models
│   │   │   ├── graph_utils.py
│   │   │   ├── model.py
│   │   │   ├── ops.py
│   │   │   ├── transformer.py
│   │   │   ├── vilmodel.py
│   │   │   └── vlnbert_init.py
│   │   ├── reverie
│   │   │   ├── agent_base.py
│   │   │   ├── agent_obj.py
│   │   │   ├── data_utils.py
│   │   │   ├── env.py
│   │   │   ├── env_hm3d.py
│   │   │   ├── eval_utils.py
│   │   │   ├── main_nav_obj.py
│   │   │   ├── main_nav_obj_hm3d.py
│   │   │   └── parser.py
│   │   ├── scripts
│   │   │   └── run_reverie.sh
│   │   └── utils
│   │       ├── data.py
│   │       ├── distributed.py
│   │       ├── logger.py
│   │       ├── misc.py
│   │       └── ops.py
│   └── pretrain_src
│       ├── configs
│       │   ├── model_config.json
│       │   └── training_args.json
│       ├── data
│       │   ├── __init__.py
│       │   ├── common.py
│       │   ├── dataset.py
│       │   ├── loader.py
│       │   └── tasks.py
│       ├── model
│       │   ├── __init__.py
│       │   ├── ops.py
│       │   ├── pretrain_cmt.py
│       │   ├── transformer.py
│       │   └── vilmodel.py
│       ├── optim
│       │   ├── __init__.py
│       │   ├── adamw.py
│       │   ├── lookahead.py
│       │   ├── misc.py
│       │   ├── radam.py
│       │   ├── ralamb.py
│       │   ├── rangerlars.py
│       │   └── sched.py
│       ├── parser.py
│       ├── submit_reverie.sh
│       ├── test
│       │   ├── test_dataset.py
│       │   ├── test_tasks.py
│       │   └── test_vilmodel.py
│       ├── train_hm3d_reverie.py
│       ├── train_reverie_obj.py
│       ├── train_soon_obj.py
│       └── utils
│           ├── __init__.py
│           ├── distributed.py
│           ├── logger.py
│           ├── misc.py
│           └── save.py
├── VLN-DUET
│   ├── datasets
│   │   └── .gitkeep
│   ├── map_nav_src
│   │   ├── models
│   │   │   ├── graph_utils.py
│   │   │   ├── model.py
│   │   │   ├── ops.py
│   │   │   ├── transformer.py
│   │   │   ├── vilmodel.py
│   │   │   └── vlnbert_init.py
│   │   ├── r2r
│   │   │   ├── agent.py
│   │   │   ├── agent_base.py
│   │   │   ├── data_utils.py
│   │   │   ├── env.py
│   │   │   ├── eval_utils.py
│   │   │   ├── main_nav.py
│   │   │   └── parser.py
│   │   ├── scripts
│   │   │   ├── r2r_b16_mix.sh
│   │   │   └── r2r_h14_envedit_mix.sh
│   │   └── utils
│   │       ├── data.py
│   │       ├── distributed.py
│   │       ├── logger.py
│   │       ├── misc.py
│   │       └── ops.py
│   ├── pretrain_src
│   │   ├── config
│   │   │   ├── r2r_model_config_clip-b16.json
│   │   │   ├── r2r_model_config_clip-h14.json
│   │   │   ├── r2r_pretrain_hm3d+mp3d+gibson_clip-b16.json
│   │   │   └── r2r_pretrain_hm3d+mp3d+gibson_clip-h14.json
│   │   ├── data
│   │   │   ├── __init__.py
│   │   │   ├── common.py
│   │   │   ├── dataset.py
│   │   │   ├── loader.py
│   │   │   └── tasks.py
│   │   ├── model
│   │   │   ├── __init__.py
│   │   │   ├── ops.py
│   │   │   ├── pretrain_cmt.py
│   │   │   ├── transformer.py
│   │   │   └── vilmodel.py
│   │   ├── optim
│   │   │   ├── __init__.py
│   │   │   ├── adamw.py
│   │   │   ├── lookahead.py
│   │   │   ├── misc.py
│   │   │   ├── radam.py
│   │   │   ├── ralamb.py
│   │   │   ├── rangerlars.py
│   │   │   └── sched.py
│   │   ├── parser.py
│   │   ├── run_r2r_b16.sh
│   │   ├── run_r2r_h14.sh
│   │   ├── train_r2r.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── distributed.py
│   │       ├── logger.py
│   │       ├── misc.py
│   │       └── save.py
│   └── requirements.txt
└── files
    └── overall.jpg
/.gitignore:
--------------------------------------------------------------------------------
1 | .ftpignore
2 | .ftpconfig
3 |
4 | # Byte-compiled / optimized / DLL files
5 | __pycache__/
6 | *.py[cod]
7 | *$py.class
8 |
9 | .vscode/
10 |
11 | VLN-DUET/datasets/*
12 | out/*
13 | !VLN-DUET/datasets/.gitkeep
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ScaleVLN
2 | Official implementation of the **ICCV 2023** paper:
3 | **Scaling Data Generation in Vision-and-Language Navigation**
4 | [**Zun Wang**](https://zunwang1.github.io/), [**Jialu Li**](https://jialuli-luka.github.io/), [**Yicong Hong**](http://www.yiconghong.me/), [Yi Wang](https://shepnerd.github.io/), [Qi Wu](http://www.qi-wu.me/), [Mohit Bansal](https://www.cs.unc.edu/~mbansal/), [Stephen Gould](http://users.cecs.anu.edu.au/~sgould/), [Hao Tan](https://www.cs.unc.edu/~airsplay/), [Yu Qiao](https://scholar.google.com/citations?hl=en&user=gFtI-8QAAAAJ&view_op=list_works)
5 |
6 | [Paper & Appendices](https://arxiv.org/abs/2307.15644)
7 |
8 | 
9 |
10 |
11 |
12 | ## Abstract
13 | Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents. To tackle the common data scarcity issue in existing vision-and-language navigation datasets, we propose an effective paradigm for generating large-scale data for learning, which applies 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction-trajectory pairs using fully-accessible resources on the web. Importantly, we investigate the influence of each component in this paradigm on the agent's performance and study how to adequately apply the augmented data to pre-train and fine-tune an agent. Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11\% absolute with regard to previous SoTA) to a significantly new best of 80\% single-run success rate on the R2R test split by simple imitation learning. The long-lasting generalization gap between navigating in seen and unseen environments is also reduced to less than 1\% (versus 8\% in the previous best method). Moreover, our paradigm also facilitates different models to achieve new state-of-the-art navigation results on CVDN, REVERIE, and R2R in continuous environments.
14 |
15 | ## Updates
16 | - **2023/07/31**🔥: We release the ScaleVLN dataset, models and training codes for R2R.
17 |
18 | - **2023/07/14**: ScaleVLN is accepted by ICCV 2023! 🎉
19 |
20 | ## TODOs
21 |
22 | - [x] ScaleVLN dataset
23 | - [x] Code, data and trained models for R2R
24 | - [x] Code, data and trained models for RVR
25 | - [ ] Graph Construction Code
26 | - [ ] Speaker Training Code
27 |
28 | ## Prerequisites
29 |
30 | ### Installation
31 |
32 | 1. Install the Matterport3D simulator: follow the instructions [here](https://github.com/peteanderson80/Matterport3DSimulator). We use the latest version instead of v0.1. Then add the simulator build directory to `PYTHONPATH`:
33 | ```bash
34 | export PYTHONPATH=Matterport3DSimulator/build:$PYTHONPATH
35 | ```
36 |
37 | 2. Install requirements:
38 | ```
39 | conda create --name vlnde python=3.9
40 | conda activate vlnde
41 | cd VLN-DUET
42 | pip install -r requirements.txt
43 | ```
44 |
45 | ### R2R
46 |
47 | 1. Download the required data from [here](https://huggingface.co/datasets/OpenGVLab/ScaleVLN/blob/main/r2r_preprocess_data.zip) and unzip it to `VLN-DUET/datasets/R2R`. It should include three folders `annotations, connectivity, connectivity_mp3d`.
48 |
49 | 2. Download the CLIP and EnvEdit features from [here](https://huggingface.co/datasets/OpenGVLab/ScaleVLN/blob/main/features.zip) and unzip it to `VLN-DUET/datasets/R2R`. It should include one folder `features`.
50 |
51 | 3. (Optional) Download the trained models from [here](https://huggingface.co/datasets/OpenGVLab/ScaleVLN/blob/main/r2r_trained_models.zip) and unzip it to `VLN-DUET/datasets/R2R`. It should include one folder `trained_models`.
52 |
53 | 4. Download the pre-trained LXMERT checkpoint from [here](https://nlp.cs.unc.edu/data/model_LXRT.pth) and place it at `VLN-DUET/datasets/pretrained/LXMERT`. The expected `datasets` layout after these steps is sketched below.
54 |
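After these steps, the `VLN-DUET/datasets` folder should look roughly like this (a sketch assuming the default unzip locations above; `trained_models` only appears if you downloaded the optional checkpoints):
```
datasets
├─ R2R
│  ├─ annotations
│  ├─ connectivity
│  ├─ connectivity_mp3d
│  ├─ features
│  └─ trained_models
└─ pretrained
   └─ LXMERT
```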
55 |
56 | ## Running
57 |
58 | ### Pre-training
59 |
60 | We use two NVIDIA A100 GPUs for pre-training agents on ScaleVLN.
61 |
62 | ```bash
63 | bash run_r2r_b16.sh "0,1" 45008
64 | bash run_r2r_h14.sh "0,1" 45009
65 | ```
66 |
67 |
68 | ### Fine-tuning
69 |
70 | We use one NVIDIA A100 GPU for fine-tuning our agents.
71 |
72 | ```bash
73 | bash scripts/r2r_b16_mix.sh 0
74 | bash scripts/r2r_h14_envedit_mix.sh 0
75 | ...
76 | ```
77 |
78 | ## REVERIE
79 | Please see `ScaleVLN/VLN-DUET-RVR/README.md`.
80 |
81 | ## Citation
82 | Please cite our paper:
83 | ```
84 | @InProceedings{wang2023scalevln,
85 | author    = {Zun Wang and Jialu Li and Yicong Hong and Yi Wang and Qi Wu and Mohit Bansal and Stephen Gould and Hao Tan and Yu Qiao},
86 | title = {Scaling Data Generation in Vision-and-Language Navigation},
87 | booktitle = {ICCV 2023},
88 | year = {2023}
89 | }
90 | ```
91 |
92 | ## Acknowledgement
93 |
94 | We thank the developers of [DUET](https://github.com/cshizhe/VLN-DUET), [EnvDrop](https://github.com/clip-vil/CLIP-ViL/tree/master/CLIP-ViL-VLN), [Co-Mod GAN](https://github.com/zsyzzsoft/co-mod-gan), [Discrete-Continuous VLN](https://github.com/YicongHong/Discrete-Continuous-VLN), [HAMT](https://github.com/cshizhe/VLN-HAMT) for their public code release.
95 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/README.md:
--------------------------------------------------------------------------------
1 | 1. Follow the ScaleVLN R2R instructions to prepare the R2R data and the pre-trained LXMERT (the REVERIE connectivity files and image features are reused from the R2R data) and unzip them in `datasets`.
2 | 2. Download the REVERIE (RVR) data from [here](https://huggingface.co/datasets/OpenGVLab/ScaleVLN/blob/main/rvr_data.zip) and unzip it in `datasets`. We expect files in this structure:
3 | ```
4 | datasets
5 | ├─ R2R
6 | ├─ REVERIE
7 | ├─ pretrained
8 | ```
9 | 3. Run `bash submit_reverie.sh` (under `pretrain_src`) to start RVR pre-training (it may take ~1h to preload the data before training starts).
10 | 4. Run `bash scripts/run_reverie.sh` (under `map_nav_src`) to start RVR fine-tuning; see the command sketch below.
11 |
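A minimal sketch of the two commands, assuming they are launched from the corresponding sub-directories of `VLN-DUET-RVR` shown in the repository tree (paths inside the scripts may still need to be adapted to your dataset location):
```bash
# RVR pre-training (VLN-DUET-RVR/pretrain_src/submit_reverie.sh)
cd VLN-DUET-RVR/pretrain_src && bash submit_reverie.sh

# RVR fine-tuning (VLN-DUET-RVR/map_nav_src/scripts/run_reverie.sh)
cd VLN-DUET-RVR/map_nav_src && bash scripts/run_reverie.sh
```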
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/models/graph_utils.py:
--------------------------------------------------------------------------------
1 | from collections import defaultdict
2 | import numpy as np
3 |
4 | MAX_DIST = 30
5 | MAX_STEP = 10
6 |
7 | def calc_position_distance(a, b):
8 | # a, b: (x, y, z)
9 | dx = b[0] - a[0]
10 | dy = b[1] - a[1]
11 | dz = b[2] - a[2]
12 | dist = np.sqrt(dx**2 + dy**2 + dz**2)
13 | return dist
14 |
15 | def calculate_vp_rel_pos_fts(a, b, base_heading=0, base_elevation=0):
16 | # a, b: (x, y, z)
17 | dx = b[0] - a[0]
18 | dy = b[1] - a[1]
19 | dz = b[2] - a[2]
20 | xy_dist = max(np.sqrt(dx**2 + dy**2), 1e-8)
21 | xyz_dist = max(np.sqrt(dx**2 + dy**2 + dz**2), 1e-8)
22 |
23 |     # the simulator's API is weird (the x and y axes are transposed)
24 | heading = np.arcsin(dx/xy_dist) # [-pi/2, pi/2]
25 | if b[1] < a[1]:
26 | heading = np.pi - heading
27 | heading -= base_heading
28 |
29 | elevation = np.arcsin(dz/xyz_dist) # [-pi/2, pi/2]
30 | elevation -= base_elevation
31 |
32 | return heading, elevation, xyz_dist
33 |
34 | def get_angle_fts(headings, elevations, angle_feat_size):
35 | ang_fts = [np.sin(headings), np.cos(headings), np.sin(elevations), np.cos(elevations)]
36 | ang_fts = np.vstack(ang_fts).transpose().astype(np.float32)
37 | num_repeats = angle_feat_size // 4
38 | if num_repeats > 1:
39 | ang_fts = np.concatenate([ang_fts] * num_repeats, 1)
40 | return ang_fts
41 |
42 |
43 | class FloydGraph(object):
44 | def __init__(self):
45 | self._dis = defaultdict(lambda :defaultdict(lambda: 95959595))
46 | self._point = defaultdict(lambda :defaultdict(lambda: ""))
47 | self._visited = set()
48 |
49 | def distance(self, x, y):
50 | if x == y:
51 | return 0
52 | else:
53 | return self._dis[x][y]
54 |
55 | def add_edge(self, x, y, dis):
56 | if dis < self._dis[x][y]:
57 | self._dis[x][y] = dis
58 | self._dis[y][x] = dis
59 | self._point[x][y] = ""
60 | self._point[y][x] = ""
61 |
62 | def update(self, k):
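        # one Floyd-Warshall relaxation step: try routing every pair (x, y) through the newly visited node k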
63 | for x in self._dis:
64 | for y in self._dis:
65 | if x != y:
66 | if self._dis[x][k] + self._dis[k][y] < self._dis[x][y]:
67 | self._dis[x][y] = self._dis[x][k] + self._dis[k][y]
68 | self._dis[y][x] = self._dis[x][y]
69 | self._point[x][y] = k
70 | self._point[y][x] = k
71 | self._visited.add(k)
72 |
73 | def visited(self, k):
74 | return (k in self._visited)
75 |
76 | def path(self, x, y):
77 | """
78 | :param x: start
79 | :param y: end
80 | :return: the path from x to y [v1, v2, ..., v_n, y]
81 | """
82 | if x == y:
83 | return []
84 | if self._point[x][y] == "": # Direct edge
85 | return [y]
86 | else:
87 | k = self._point[x][y]
88 | # print(x, y, k)
89 | # for x1 in (x, k, y):
90 | # for x2 in (x, k, y):
91 | # print(x1, x2, "%.4f" % self._dis[x1][x2])
92 | return self.path(x, k) + self.path(k, y)
93 |
94 |
95 | class GraphMap(object):
96 | def __init__(self, start_vp):
97 | self.start_vp = start_vp # start viewpoint
98 |
99 | self.node_positions = {} # viewpoint to position (x, y, z)
100 | self.graph = FloydGraph() # shortest path graph
101 | self.node_embeds = {} # {viewpoint: feature (sum feature, count)}
102 | self.node_stop_scores = {} # {viewpoint: prob}
103 | self.node_nav_scores = {} # {viewpoint: {t: prob}}
104 | self.node_step_ids = {}
105 |
106 | def update_graph(self, ob):
107 | self.node_positions[ob['viewpoint']] = ob['position']
108 | for cc in ob['candidate']:
109 | self.node_positions[cc['viewpointId']] = cc['position']
110 | dist = calc_position_distance(ob['position'], cc['position'])
111 | self.graph.add_edge(ob['viewpoint'], cc['viewpointId'], dist)
112 | self.graph.update(ob['viewpoint'])
113 |
114 | def update_node_embed(self, vp, embed, rewrite=False):
115 | if rewrite:
116 | self.node_embeds[vp] = [embed, 1]
117 | else:
118 | if vp in self.node_embeds:
119 | self.node_embeds[vp][0] = self.node_embeds[vp][0] + embed
120 | self.node_embeds[vp][1] = self.node_embeds[vp][1] + 1
121 | else:
122 | self.node_embeds[vp] = [embed, 1]
123 |
124 | def get_node_embed(self, vp):
125 | return self.node_embeds[vp][0] / self.node_embeds[vp][1]
126 |
127 | def get_pos_fts(self, cur_vp, gmap_vpids, cur_heading, cur_elevation, angle_feat_size=4):
128 | # dim=7 (sin(heading), cos(heading), sin(elevation), cos(elevation),
129 | # line_dist, shortest_dist, shortest_step)
130 | rel_angles, rel_dists = [], []
131 | for vp in gmap_vpids:
132 | if vp is None:
133 | rel_angles.append([0, 0])
134 | rel_dists.append([0, 0, 0])
135 | else:
136 | rel_heading, rel_elevation, rel_dist = calculate_vp_rel_pos_fts(
137 | self.node_positions[cur_vp], self.node_positions[vp],
138 | base_heading=cur_heading, base_elevation=cur_elevation,
139 | )
140 | rel_angles.append([rel_heading, rel_elevation])
141 | rel_dists.append(
142 | [rel_dist / MAX_DIST, self.graph.distance(cur_vp, vp) / MAX_DIST, \
143 | len(self.graph.path(cur_vp, vp)) / MAX_STEP]
144 | )
145 | rel_angles = np.array(rel_angles).astype(np.float32)
146 | rel_dists = np.array(rel_dists).astype(np.float32)
147 | rel_ang_fts = get_angle_fts(rel_angles[:, 0], rel_angles[:, 1], angle_feat_size)
148 | return np.concatenate([rel_ang_fts, rel_dists], 1)
149 |
150 | def save_to_json(self):
151 | nodes = {}
152 | for vp, pos in self.node_positions.items():
153 | nodes[vp] = {
154 | 'location': pos, # (x, y, z)
155 | 'visited': self.graph.visited(vp),
156 | }
157 | if nodes[vp]['visited']:
158 | nodes[vp]['stop_prob'] = self.node_stop_scores[vp]['stop']
159 | nodes[vp]['og_objid'] = self.node_stop_scores[vp]['og']
160 | else:
161 | nodes[vp]['nav_prob'] = self.node_nav_scores[vp]
162 |
163 | edges = []
164 | for k, v in self.graph._dis.items():
165 | for kk in v.keys():
166 | edges.append((k, kk))
167 |
168 | return {'nodes': nodes, 'edges': edges}
169 |
170 |
171 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/models/model.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import collections
3 |
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 |
8 | from transformers import BertPreTrainedModel
9 |
10 | from .vlnbert_init import get_vlnbert_models
11 |
12 | class VLNBert(nn.Module):
13 | def __init__(self, args):
14 | super().__init__()
15 |         print('\nInitializing the VLN-BERT model ...')
16 | self.args = args
17 |
18 | self.vln_bert = get_vlnbert_models(args, config=None) # initialize the VLN-BERT
19 | self.drop_env = nn.Dropout(p=args.feat_dropout)
20 |
21 | def forward(self, mode, batch):
22 | batch = collections.defaultdict(lambda: None, batch)
23 |
24 | if mode == 'language':
25 | txt_embeds = self.vln_bert(mode, batch)
26 | return txt_embeds
27 |
28 | elif mode == 'panorama':
29 | batch['view_img_fts'] = self.drop_env(batch['view_img_fts'])
30 | if 'obj_img_fts' in batch:
31 | batch['obj_img_fts'] = self.drop_env(batch['obj_img_fts'])
32 | pano_embeds, pano_masks = self.vln_bert(mode, batch)
33 | return pano_embeds, pano_masks
34 |
35 | elif mode == 'navigation':
36 | outs = self.vln_bert(mode, batch)
37 | return outs
38 |
39 | else:
40 | raise NotImplementedError('wrong mode: %s'%mode)
41 |
42 |
43 | class Critic(nn.Module):
44 | def __init__(self, args):
45 | super(Critic, self).__init__()
46 | self.state2value = nn.Sequential(
47 | nn.Linear(768, 512),
48 | nn.ReLU(),
49 | nn.Dropout(args.dropout),
50 | nn.Linear(512, 1),
51 | )
52 |
53 | def forward(self, state):
54 | return self.state2value(state).squeeze()
55 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/models/ops.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | from .transformer import TransformerEncoder, TransformerEncoderLayer
4 |
5 | # try:
6 | # from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
7 | # except (ImportError, AttributeError) as e:
8 | # # logger.info("Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .")
9 | # BertLayerNorm = torch.nn.LayerNorm
10 | BertLayerNorm = torch.nn.LayerNorm
11 |
12 | def create_transformer_encoder(config, num_layers, norm=False):
13 | enc_layer = TransformerEncoderLayer(
14 | config.hidden_size, config.num_attention_heads,
15 | dim_feedforward=config.intermediate_size,
16 | dropout=config.hidden_dropout_prob,
17 | activation=config.hidden_act,
18 | normalize_before=True
19 | )
20 | if norm:
21 | norm_layer = BertLayerNorm(config.hidden_size, eps=1e-12)
22 | else:
23 | norm_layer = None
24 | return TransformerEncoder(enc_layer, num_layers, norm=norm_layer, batch_first=True)
25 |
26 | def extend_neg_masks(masks, dtype=None):
27 | """
28 | mask from (N, L) into (N, 1(H), 1(L), L) and make it negative
29 | """
30 | if dtype is None:
31 | dtype = torch.float
32 | extended_masks = masks.unsqueeze(1).unsqueeze(2)
33 | extended_masks = extended_masks.to(dtype=dtype)
34 | extended_masks = (1.0 - extended_masks) * -10000.0
35 | return extended_masks
36 |
37 | def gen_seq_masks(seq_lens, max_len=None):
38 | if max_len is None:
39 | max_len = max(seq_lens)
40 | batch_size = len(seq_lens)
41 | device = seq_lens.device
42 |
43 | masks = torch.arange(max_len).unsqueeze(0).repeat(batch_size, 1).to(device)
44 | masks = masks < seq_lens.unsqueeze(1)
45 | return masks
46 |
47 | def pad_tensors_wgrad(tensors, lens=None):
48 | """B x [T, ...] torch tensors"""
49 | if lens is None:
50 | lens = [t.size(0) for t in tensors]
51 | max_len = max(lens)
52 | batch_size = len(tensors)
53 | hid = list(tensors[0].size()[1:])
54 |
55 | device = tensors[0].device
56 | dtype = tensors[0].dtype
57 |
58 | output = []
59 | for i in range(batch_size):
60 | if lens[i] < max_len:
61 | tmp = torch.cat(
62 | [tensors[i], torch.zeros([max_len-lens[i]]+hid, dtype=dtype).to(device)],
63 | dim=0
64 | )
65 | else:
66 | tmp = tensors[i]
67 | output.append(tmp)
68 | output = torch.stack(output, 0)
69 | return output
70 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/models/vlnbert_init.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 |
4 | def get_tokenizer(args):
5 | from transformers import AutoTokenizer
6 | if args.tokenizer == 'xlm':
7 | cfg_name = 'xlm-roberta-base'
8 | else:
9 | cfg_name = 'bert-base-uncased'
10 | tokenizer = AutoTokenizer.from_pretrained(cfg_name)
11 | return tokenizer
12 |
13 | def get_vlnbert_models(args, config=None):
14 |
15 | from transformers import PretrainedConfig
16 | from models.vilmodel import GlocalTextPathNavCMT
17 |
18 | model_name_or_path = args.bert_ckpt_file
19 | new_ckpt_weights = {}
20 | if model_name_or_path is not None:
21 | ckpt_weights = torch.load(model_name_or_path)
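        # strip the 'module.' prefix added by DistributedDataParallel and move prediction-head / sap_fuse weights under 'bert.'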
22 | for k, v in ckpt_weights.items():
23 | if k.startswith('module'):
24 | k = k[7:]
25 | if '_head' in k or 'sap_fuse' in k:
26 | new_ckpt_weights['bert.' + k] = v
27 | else:
28 | new_ckpt_weights[k] = v
29 |
30 | if args.tokenizer == 'xlm':
31 | cfg_name = 'xlm-roberta-base'
32 | else:
33 | cfg_name = 'bert-base-uncased'
34 | vis_config = PretrainedConfig.from_pretrained(cfg_name)
35 |
36 | if args.tokenizer == 'xlm':
37 | vis_config.type_vocab_size = 2
38 |
39 | vis_config.max_action_steps = 100
40 | vis_config.image_feat_size = args.image_feat_size
41 | vis_config.angle_feat_size = args.angle_feat_size
42 | vis_config.obj_feat_size = args.obj_feat_size
43 | vis_config.obj_loc_size = 3
44 | vis_config.num_l_layers = args.num_l_layers
45 | vis_config.num_pano_layers = args.num_pano_layers
46 | vis_config.num_x_layers = args.num_x_layers
47 | vis_config.graph_sprels = args.graph_sprels
48 | vis_config.glocal_fuse = args.fusion == 'dynamic'
49 |
50 | vis_config.fix_lang_embedding = args.fix_lang_embedding
51 | vis_config.fix_pano_embedding = args.fix_pano_embedding
52 | vis_config.fix_local_branch = args.fix_local_branch
53 |
54 | vis_config.update_lang_bert = not args.fix_lang_embedding
55 | vis_config.output_attentions = True
56 | vis_config.pred_head_dropout_prob = 0.1
57 | vis_config.use_lang2visn_attn = False
58 |
59 | visual_model = GlocalTextPathNavCMT.from_pretrained(
60 | pretrained_model_name_or_path=None,
61 | config=vis_config,
62 | state_dict=new_ckpt_weights)
63 |
64 | return visual_model
65 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/reverie/data_utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | import jsonlines
4 | import numpy as np
5 |
6 | import lmdb
7 | import msgpack
8 | import msgpack_numpy
9 | msgpack_numpy.patch()
10 |
11 | from utils.data import angle_feature
12 |
13 | class ObjectFeatureDB(object):
14 | def __init__(self, obj_ft_file, obj_feat_size, im_width=640, im_height=480):
15 | self.obj_feat_size = obj_feat_size
16 | self.obj_ft_file = obj_ft_file
17 | self._feature_store = {}
18 | self.im_width = im_width
19 | self.im_height = im_height
20 | self.env = lmdb.open(self.obj_ft_file, readonly=True)
21 |
22 | def load_feature(self, scan, viewpoint, max_objects=None):
23 | key = '%s_%s' % (scan, viewpoint)
24 | if key in self._feature_store:
25 | obj_fts, obj_attrs = self._feature_store[key]
26 | else:
27 | with self.env.begin() as txn:
28 | obj_data = txn.get(key.encode('ascii'))
29 | if obj_data is not None:
30 | obj_data = msgpack.unpackb(obj_data)
31 | obj_fts = obj_data['fts'][:, :self.obj_feat_size].astype(np.float32)
32 | obj_attrs = {k: v for k, v in obj_data.items() if k != 'fts'}
33 | else:
34 | obj_fts = np.zeros((0, self.obj_feat_size), dtype=np.float32)
35 | obj_attrs = {}
36 | self._feature_store[key] = (obj_fts, obj_attrs)
37 |
38 | if max_objects is not None:
39 | obj_fts = obj_fts[:max_objects]
40 | obj_attrs = {k: v[:max_objects] for k, v in obj_attrs.items()}
41 | return obj_fts, obj_attrs
42 |
43 | def get_object_feature(
44 | self, scan, viewpoint, base_heading, base_elevation, angle_feat_size,
45 | max_objects=None
46 | ):
47 | obj_fts, obj_attrs = self.load_feature(scan, viewpoint, max_objects=max_objects)
48 | obj_ang_fts = np.zeros((len(obj_fts), angle_feat_size), dtype=np.float32)
49 | obj_box_fts = np.zeros((len(obj_fts), 3), dtype=np.float32)
50 | obj_ids = []
51 | if len(obj_fts) > 0:
52 | for k, obj_ang in enumerate(obj_attrs['centers']):
53 | obj_ang_fts[k] = angle_feature(
54 | obj_ang[0] - base_heading, obj_ang[1] - base_elevation, angle_feat_size
55 | )
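                # the last two bbox entries are (w, h); store normalized height, width and their product (area) as a 3-d box feature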
56 | w, h = obj_attrs['bboxes'][k][2:]
57 | obj_box_fts[k, :2] = [h/self.im_height, w/self.im_width]
58 | obj_box_fts[k, 2] = obj_box_fts[k, 0] * obj_box_fts[k, 1]
59 | obj_ids = obj_attrs['obj_ids']
60 | return obj_fts, obj_ang_fts, obj_box_fts, obj_ids
61 |
62 |
63 | def load_instr_datasets(anno_dir, dataset, splits, tokenizer):
64 | data = []
65 | for split in splits:
66 | if "/" not in split: # the official splits
67 | if tokenizer == 'bert':
68 | filepath = os.path.join(anno_dir, 'REVERIE_%s_enc.json' % split)
69 | elif tokenizer == 'xlm':
70 | filepath = os.path.join(anno_dir, 'REVERIE_%s_enc_xlmr.json' % split)
71 | else:
72 |                 raise NotImplementedError('unsupported tokenizer %s' % tokenizer)
73 |
74 | with open(filepath) as f:
75 | new_data = json.load(f)
76 | else: # augmented data
77 | print('\nLoading augmented data %s for pretraining...' % os.path.basename(split))
78 | if split.endswith('json'):
79 | with open(split) as f:
80 | new_data = json.load(f)
81 | elif split.endswith('jsonl'):
82 | # reuse pretrain aug format
83 | with jsonlines.open(split) as f:
84 | new_data = []
85 | for item in f:
86 | objid = item['instr_id'].split('_')[1]
87 | new_data.append({
88 | 'scan': item['scan'],
89 | 'id': '%s_%d'%(item['instr_id'], len(new_data)),
90 | 'instructions': [''],
91 | 'instr_encodings': [item['instr_encoding']],
92 | 'path_id': '%s_%d'%(item['instr_id'], len(new_data)),
93 | 'objId': objid,
94 | 'path': item['path'],
95 | 'heading': np.random.rand() * np.pi * 2,
96 | 'end_vps': item['pos_vps'],
97 | })
98 | else:
99 | raise NotImplementedError('unsupported aug data format %s' % split)
100 | # Join
101 | data += new_data
102 | return data
103 |
104 | def construct_instrs(anno_dir, dataset, splits, tokenizer, max_instr_len=512):
105 | data = []
106 | for i, item in enumerate(load_instr_datasets(anno_dir, dataset, splits, tokenizer)):
107 | # Split multiple instructions into separate entries
108 | for j, instr in enumerate(item['instructions']):
109 | new_item = dict(item)
110 | if 'objId' in item:
111 | new_item['instr_id'] = '%s_%s_%d' % (str(item['path_id']), str(item['objId']), j)
112 | else:
113 | new_item['path_id'] = item['id']
114 | new_item['instr_id'] = '%s_%d' % (item['id'], j)
115 | new_item['objId'] = None
116 | new_item['instruction'] = instr
117 | new_item['instr_encoding'] = item['instr_encodings'][j][:max_instr_len]
118 | del new_item['instructions']
119 | del new_item['instr_encodings']
120 | data.append(new_item)
121 | return data
122 |
123 | def load_obj2vps(bbox_file):
124 | obj2vps = {}
125 | bbox_data = json.load(open(bbox_file))
126 | for scanvp, value in bbox_data.items():
127 | scan, vp = scanvp.split('_')
128 | # for all visible objects at that viewpoint
129 | for objid, objinfo in value.items():
130 | if objinfo['visible_pos']:
131 | # if such object not already in the dict
132 | obj2vps.setdefault(scan+'_'+objid, [])
133 | obj2vps[scan+'_'+objid].append(vp)
134 | return obj2vps
135 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/reverie/env_hm3d.py:
--------------------------------------------------------------------------------
1 | ''' Batched REVERIE navigation environment '''
2 |
3 | import json
4 | import os
5 | import numpy as np
6 | import math
7 | import random
8 | import networkx as nx
9 | import copy
10 | import h5py
11 | import jsonlines
12 | import collections
13 |
14 | from utils.data import load_nav_graphs, new_simulator
15 | from utils.data import angle_feature, get_all_point_angle_feature
16 |
17 | from .env import EnvBatch, ReverieObjectNavBatch
18 | from utils.data import ImageFeaturesDB
19 | from .data_utils import ObjectFeatureDB
20 |
21 | def construct_instrs(instr_files, max_instr_len=512):
22 | data = []
23 | for instr_file in instr_files:
24 | with jsonlines.open(instr_file) as f:
25 | for item in f:
26 | newitem = {
27 | 'instr_id': item['instr_id'],
28 | 'objId': item['objid'],
29 | 'scan': item['scan'],
30 | 'path': item['path'],
31 | 'end_vps': item['pos_vps'],
32 | 'instruction': item['instruction'],
33 | 'instr_encoding': item['instr_encoding'][:max_instr_len],
34 | 'heading': np.random.rand() * np.pi * 2,
35 | }
36 | data.append(newitem)
37 | return data
38 |
39 |
40 | class HM3DReverieObjectNavBatch(ReverieObjectNavBatch):
41 | ''' Implements the REVERIE navigation task, using discretized viewpoints and pretrained features '''
42 |
43 | def __init__(
44 | self, view_db_file, obj_db_file, instr_files, connectivity_dir,
45 | multi_endpoints=False, multi_startpoints=False,
46 | image_feat_size=768, obj_feat_size=768,
47 | batch_size=64, angle_feat_size=4, max_objects=None,
48 | seed=0, name=None, sel_data_idxs=None, scan_ranges=None
49 | ):
50 | view_db = ImageFeaturesDB(view_db_file, image_feat_size)
51 | obj_db = ObjectFeatureDB(obj_db_file, obj_feat_size, im_width=224, im_height=224)
52 | instr_data = construct_instrs(instr_files, max_instr_len=100)
53 | if scan_ranges is not None:
54 | scan_idxs = set(list(range(scan_ranges[0], scan_ranges[1])))
55 | new_instr_data = []
56 | for item in instr_data:
57 | if int(item['scan'].split('-')[0]) in scan_idxs:
58 | new_instr_data.append(item)
59 | instr_data = new_instr_data
60 | #print(connectivity_dir)
61 | #exit()
62 | self.env = EnvBatch(connectivity_dir, feat_db=view_db, batch_size=batch_size)
63 | self.obj_db = obj_db
64 | self.data = instr_data
65 | self.scans = set([x['scan'] for x in self.data])
66 | self.multi_endpoints = multi_endpoints
67 | self.multi_startpoints = multi_startpoints
68 | self.connectivity_dir = connectivity_dir
69 | self.batch_size = batch_size
70 | self.angle_feat_size = angle_feat_size
71 | self.max_objects = max_objects
72 | self.name = name
73 |
74 | self.gt_trajs = self._get_gt_trajs(self.data) # for evaluation
75 |
76 | # in validation, we would split the data
77 | if sel_data_idxs is not None:
78 | t_split, n_splits = sel_data_idxs
79 | ndata_per_split = len(self.data) // n_splits
80 | start_idx = ndata_per_split * t_split
81 | if t_split == n_splits - 1:
82 | end_idx = None
83 | else:
84 | end_idx = start_idx + ndata_per_split
85 | self.data = self.data[start_idx: end_idx]
86 |
87 | # use different seeds in different processes to shuffle data
88 | self.seed = seed
89 | random.seed(self.seed)
90 | random.shuffle(self.data)
91 |
92 | self.obj2vps = collections.defaultdict(list) # {scan_objid: vp_list} (objects can be viewed at the viewpoints)
93 | for item in self.data:
94 | self.obj2vps['%s_%s'%(item['scan'], item['objId'])].extend(item['end_vps'])
95 |
96 | self.ix = 0
97 | self._load_nav_graphs()
98 |
99 | self.sim = new_simulator(self.connectivity_dir)
100 | self.angle_feature = get_all_point_angle_feature(self.sim, self.angle_feat_size)
101 |
102 | self.buffered_state_dict = {}
103 | print('%s loaded with %d instructions, using splits: %s' % (
104 | self.__class__.__name__, len(self.data), self.name))
105 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/reverie/eval_utils.py:
--------------------------------------------------------------------------------
1 | ''' Utils for evaluation '''
2 |
3 | import numpy as np
4 |
5 |
6 | def cal_dtw(shortest_distances, prediction, reference, success=None, threshold=3.0):
7 | dtw_matrix = np.inf * np.ones((len(prediction) + 1, len(reference) + 1))
8 | dtw_matrix[0][0] = 0
9 | for i in range(1, len(prediction)+1):
10 | for j in range(1, len(reference)+1):
11 | best_previous_cost = min(
12 | dtw_matrix[i-1][j], dtw_matrix[i][j-1], dtw_matrix[i-1][j-1])
13 | cost = shortest_distances[prediction[i-1]][reference[j-1]]
14 | dtw_matrix[i][j] = cost + best_previous_cost
15 |
16 | dtw = dtw_matrix[len(prediction)][len(reference)]
17 | ndtw = np.exp(-dtw/(threshold * len(reference)))
18 | if success is None:
19 | success = float(shortest_distances[prediction[-1]][reference[-1]] < threshold)
20 | sdtw = success * ndtw
21 |
22 | return {
23 | 'DTW': dtw,
24 | 'nDTW': ndtw,
25 | 'SDTW': sdtw
26 | }
27 |
28 | def cal_cls(shortest_distances, prediction, reference, threshold=3.0):
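    # Coverage weighted by Length Score (CLS): how well the prediction covers the reference path, scaled by a length-matching term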
29 | def length(nodes):
30 | return np.sum([
31 | shortest_distances[a][b]
32 | for a, b in zip(nodes[:-1], nodes[1:])
33 | ])
34 |
35 | coverage = np.mean([
36 | np.exp(-np.min([ # pylint: disable=g-complex-comprehension
37 | shortest_distances[u][v] for v in prediction
38 | ]) / threshold) for u in reference
39 | ])
40 | expected = coverage * length(reference)
41 | score = expected / (expected + np.abs(expected - length(prediction)))
42 | return coverage * score
43 |
44 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/reverie/parser.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os
3 |
4 |
5 | def parse_args():
6 | parser = argparse.ArgumentParser(description="")
7 |
8 | parser.add_argument('--root_dir', type=str, default='/sequoia/data3/shichen/datasets')
9 | parser.add_argument('--dataset', type=str, default='reverie', choices=['reverie'])
10 | parser.add_argument('--output_dir', type=str, default='default', help='experiment id')
11 | parser.add_argument('--seed', type=int, default=0)
12 |
13 | parser.add_argument('--tokenizer', choices=['bert', 'xlm'], default='bert')
14 |
15 | parser.add_argument('--fusion', choices=['global', 'local', 'avg', 'dynamic'])
16 | parser.add_argument('--dagger_sample', choices=['sample', 'expl_sample', 'argmax'])
17 | parser.add_argument('--expl_max_ratio', type=float, default=0.6)
18 | parser.add_argument('--loss_nav_3', action='store_true', default=False)
19 |
20 |     # distributed training (single node, multiple GPUs)
21 | parser.add_argument('--world_size', type=int, default=1, help='number of gpus')
22 | parser.add_argument('--local_rank', type=int, default=-1)
23 | parser.add_argument("--node_rank", type=int, default=0, help="Id of the node")
24 |
25 | # General
26 | parser.add_argument('--iters', type=int, default=100000, help='training iterations')
27 | parser.add_argument('--log_every', type=int, default=1000)
28 | parser.add_argument('--eval_first', action='store_true', default=False)
29 |
30 | # Data preparation
31 | parser.add_argument('--max_instr_len', type=int, default=80)
32 | parser.add_argument('--max_action_len', type=int, default=15)
33 | parser.add_argument('--max_objects', type=int, default=20)
34 | parser.add_argument('--batch_size', type=int, default=8)
35 | parser.add_argument('--ignoreid', type=int, default=-100, help='ignoreid for action')
36 |
37 | # Load the model from
38 | parser.add_argument("--resume_file", default=None, help='path of the trained model')
39 | parser.add_argument("--resume_optimizer", action="store_true", default=False)
40 |
41 | # Augmented Paths from
42 | parser.add_argument("--multi_endpoints", default=False, action="store_true")
43 | parser.add_argument("--multi_startpoints", default=False, action="store_true")
44 | parser.add_argument("--aug_only", default=False, action="store_true")
45 | parser.add_argument("--aug", default=None)
46 | parser.add_argument('--bert_ckpt_file', default=None, help='init vlnbert')
47 |
48 | # Listener Model Config
49 | parser.add_argument("--ml_weight", type=float, default=0.20)
50 | parser.add_argument('--entropy_loss_weight', type=float, default=0.01)
51 |
52 | parser.add_argument("--features", type=str, default='vitbase')
53 | parser.add_argument('--obj_features', type=str, default='vitbase')
54 |
55 | parser.add_argument('--fix_lang_embedding', action='store_true', default=False)
56 | parser.add_argument('--fix_pano_embedding', action='store_true', default=False)
57 | parser.add_argument('--fix_local_branch', action='store_true', default=False)
58 |
59 | parser.add_argument('--num_l_layers', type=int, default=9)
60 | parser.add_argument('--num_pano_layers', type=int, default=2)
61 | parser.add_argument('--num_x_layers', type=int, default=4)
62 |
63 | parser.add_argument('--enc_full_graph', default=False, action='store_true')
64 | parser.add_argument('--graph_sprels', action='store_true', default=False)
65 |
66 | # Dropout Param
67 | parser.add_argument('--dropout', type=float, default=0.5)
68 | parser.add_argument('--feat_dropout', type=float, default=0.3)
69 |
70 |     # Submission configuration
71 | parser.add_argument('--test', action='store_true', default=False)
72 | parser.add_argument('--zero_shot', action='store_true', default=False)
73 | parser.add_argument("--submit", action='store_true', default=False)
74 | parser.add_argument('--no_backtrack', action='store_true', default=False)
75 | parser.add_argument('--detailed_output', action='store_true', default=False)
76 |
77 | # Training Configurations
78 | parser.add_argument(
79 | '--optim', type=str, default='rms',
80 | choices=['rms', 'adam', 'adamW', 'sgd']
81 | ) # rms, adam
82 | parser.add_argument('--lr', type=float, default=0.00001, help="the learning rate")
83 | parser.add_argument('--decay', dest='weight_decay', type=float, default=0.)
84 | parser.add_argument(
85 | '--feedback', type=str, default='sample',
86 | help='How to choose next position, one of ``teacher``, ``sample`` and ``argmax``'
87 | )
88 | parser.add_argument('--epsilon', type=float, default=0.1, help='')
89 |
90 | # Model hyper params:
91 | parser.add_argument("--angle_feat_size", type=int, default=4)
92 | parser.add_argument('--image_feat_size', type=int, default=2048)
93 | parser.add_argument('--obj_feat_size', type=int, default=2048)
94 | parser.add_argument('--views', type=int, default=36)
95 |
96 | # # A2C
97 | parser.add_argument("--gamma", default=0.9, type=float, help='reward discount factor')
98 | parser.add_argument(
99 | "--normalize", dest="normalize_loss", default="total",
100 | type=str, help='batch or total'
101 | )
102 | parser.add_argument('--train_alg',
103 | choices=['imitation', 'dagger', 'a3c', 'reinforce'],
104 | default='imitation'
105 | )
106 |
107 | parser.add_argument('--hm3d_scan_ranges', nargs='+', type=int, default=None)
108 | parser.add_argument('--hm3d_og_loss_weight', type=float, default=1)
109 |
110 |
111 | parser.add_argument('--num_mp3d_scans', default=None, type=int)
112 |
113 | args, _ = parser.parse_known_args()
114 |
115 | args = postprocess_args(args)
116 |
117 | return args
118 |
119 |
120 | def postprocess_args(args):
121 | ROOTDIR = args.root_dir
122 |
123 | # Setup input paths
124 | ft_file_map = {
125 | 'timm_vitb16': 'view_timm_imagenet_vitb16',
126 | # 'timm_vitb16': '../../../../data/img_features',
127 | 'mmdet2d_frcnn_hm3d': 'obj_gtmax_mmdet2d_frcnn_hm3d/views',
128 | 'clip-h14': 'clip_vit-h14_final.hdf5'
129 | }
130 | args.img_ft_file = os.path.join(ROOTDIR, 'R2R', 'features', ft_file_map[args.features])
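    # NOTE: the branches below hard-code feature paths from the authors' environment; adjust them to your local dataset layout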
131 | if not os.path.exists(args.img_ft_file):
132 | args.img_ft_file = os.path.join(ROOTDIR, 'R2R', 'features', ft_file_map[args.features])
133 | if args.features == 'timm_vitb16':
134 | args.img_ft_file = "/nvme/wangzun/vln/hm3d-vln/HM3DAutoVLN/datasets/R2R_hm3dautovln/features/view_timm_imagenet_vitb16"
135 | elif args.features == 'clip-h14':
136 | args.img_ft_file = "../datasets/R2R/features/clip_vit-h14_final.hdf5"
137 | args.aug_ft_file = "../datasets/R2R/features/clip_vit-h14_final.hdf5"
138 |
139 | args.mp3d_ft_files = [
140 | os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_img_image_synthesis.hdf5'),
141 | os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_img_mask_image_synthesis.hdf5'),
142 | os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_img_style_transfer.hdf5'),
143 | os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_original.hdf5'),
144 | ]
145 | # args.mp3d_ft_files = [
146 | # os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_original.hdf5'),
147 | # ]
148 |
149 | obj_ft_file_map = {
150 | 'timm_vitb16': 'obj_gtmax_timm_imagenet_vitb16',
151 | 'mmdet2d_frcnn_hm3d': 'obj_gtmax_mmdet2d_frcnn_hm3d/gt_objs',
152 | }
153 | args.obj_ft_file = os.path.join(ROOTDIR, 'REVERIE', 'features', obj_ft_file_map[args.obj_features])
154 | args.obj_ft_file = "../datasets/REVERIE/features/obj_gtmax_timm_imagenet_vitb16"
155 |
156 | args.val_ft_file = os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_original.hdf5')
157 |
158 | args.connectivity_dir = os.path.join(ROOTDIR, 'R2R', 'connectivity')
159 | args.scan_data_dir = os.path.join(ROOTDIR, 'Matterport3D', 'v1_unzip_scans')
160 |
161 | args.anno_dir = os.path.join(ROOTDIR, 'REVERIE', 'annotations')
162 |
163 | # Build paths
164 | args.ckpt_dir = os.path.join(args.output_dir, 'ckpts')
165 | args.log_dir = os.path.join(args.output_dir, 'logs')
166 | if args.zero_shot:
167 | args.log_dir = os.path.join(args.output_dir, 'zero_shot_logs')
168 | args.pred_dir = os.path.join(args.output_dir, 'preds')
169 |
170 | os.makedirs(args.output_dir, exist_ok=True)
171 | os.makedirs(args.ckpt_dir, exist_ok=True)
172 | os.makedirs(args.log_dir, exist_ok=True)
173 | os.makedirs(args.pred_dir, exist_ok=True)
174 |
175 | return args
176 |
177 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/scripts/run_reverie.sh:
--------------------------------------------------------------------------------
1 | #! /bin/bash
2 |
3 | DATA_ROOT=../datasets
4 |
5 | train_alg=dagger
6 |
7 | features=clip-h14
8 | ft_dim=1024
9 | obj_features=timm_vitb16
10 | obj_ft_dim=768
11 |
12 | ngpus=1
13 | seed=0
14 |
15 | name=scalevln_rvr
16 | outdir=${DATA_ROOT}/REVERIE/expr_duet/finetune/${name}
17 |
18 | flag="--root_dir ${DATA_ROOT}
19 | --dataset reverie
20 | --output_dir ${outdir}
21 | --world_size ${ngpus}
22 | --seed ${seed}
23 | --tokenizer bert
24 |
25 | --enc_full_graph
26 | --graph_sprels
27 | --fusion dynamic
28 | --multi_endpoints
29 |
30 | --dagger_sample sample
31 |
32 | --train_alg ${train_alg}
33 |
34 | --num_l_layers 9
35 | --num_x_layers 4
36 | --num_pano_layers 2
37 |
38 | --max_action_len 15
39 | --max_instr_len 100
40 | --max_objects 50
41 |
42 | --batch_size 32
43 | --lr 2e-5
44 | --iters 200000
45 | --log_every 500
46 | --optim adamW
47 |
48 | --features ${features}
49 | --obj_features ${obj_features}
50 | --image_feat_size ${ft_dim}
51 | --angle_feat_size 4
52 | --obj_feat_size ${obj_ft_dim}
53 |
54 | --ml_weight 0.2
55 |
56 | --feat_dropout 0.4
57 | --dropout 0.5
58 |
59 | --gamma 0."
60 |
61 | CUDA_VISIBLE_DEVICES=0 python reverie/main_nav_obj_hm3d.py $flag --tokenizer bert \
62 | --bert_ckpt_file ../datasets/REVERIE/trained_models/model_step_40000.pt \
63 | --aug ../datasets/REVERIE/annotations/ade20k_pseudo3d_depth2_epoch_94_beam0_sample10.jsonl \
64 | --eval_first
65 |
66 | # python reverie/main_nav_obj_hm3d.py $flag --tokenizer bert \
67 | # --zero_shot
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/utils/data.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | import jsonlines
4 | import networkx as nx
5 | import math
6 | import numpy as np
7 |
8 | import lmdb
9 | import msgpack
10 | import msgpack_numpy
11 | msgpack_numpy.patch()
12 | import h5py
13 | import random
14 |
15 | class ImageFeaturesDB(object):
16 | def __init__(self, img_ft_file, image_feat_size):
17 | self.image_feat_size = image_feat_size
18 | self.img_ft_file = img_ft_file
19 | self._feature_store = {}
20 | if ".hdf5" not in self.img_ft_file:
21 | self.env = lmdb.open(self.img_ft_file, readonly=True)
22 | else:
23 | print('pass!')
24 | with h5py.File(self.img_ft_file, 'r') as f:
25 | for key in list(f.keys()):
26 | self._feature_store[key] = f[key][...].astype(np.float32)
27 |
28 | def __del__(self):
29 | if ".hdf5" not in self.img_ft_file:
30 | self.env.close()
31 |
32 | def get_image_feature(self, scan, viewpoint):
33 | key = '%s_%s' % (scan, viewpoint)
34 | if key in self._feature_store:
35 | ft = self._feature_store[key]
36 | else:
37 | if ".hdf5" in self.img_ft_file:
38 | with h5py.File(self.img_ft_file, 'r') as f:
39 | ft = f[key][...].astype(np.float32)
40 | else:
41 | with self.env.begin() as txn:
42 | ft = msgpack.unpackb(txn.get(key.encode('ascii')))
43 | ft = ft[:, :self.image_feat_size].astype(np.float32)
44 | self._feature_store[key] = ft
45 | return ft
46 |
47 | class ImageFeaturesDB2(object):
48 | def __init__(self, img_ft_files, image_feat_size):
49 | self.image_feat_size = image_feat_size
50 | self.img_ft_file = img_ft_files
51 | self._feature_stores = {}
52 | for name in img_ft_files:
53 | self._feature_stores[name] = {}
54 | with h5py.File(name, 'r') as f:
55 | for key in f.keys():
56 | ft = f[key][...][:, :self.image_feat_size].astype(np.float32)
57 | self._feature_stores[name][key] = ft
58 | self.env_names = list(self._feature_stores.keys())
59 | print(self.env_names)
60 |
61 |
62 | def get_image_feature(self, scan, viewpoint):
63 | key = '%s_%s' % (scan, viewpoint)
64 | env_name = random.choice(self.env_names)
65 | if key in self._feature_stores[env_name]:
66 | ft = self._feature_stores[env_name][key]
67 | else:
68 | with h5py.File(env_name, 'r') as f:
69 | ft = f[key][...][:, :self.image_feat_size].astype(np.float32)
70 | self._feature_stores[env_name][key] = ft
71 | return ft
72 |
73 | def load_nav_graphs(connectivity_dir, scans):
74 | ''' Load connectivity graph for each scan '''
75 |
76 | def distance(pose1, pose2):
77 | ''' Euclidean distance between two graph poses '''
78 | return ((pose1['pose'][3]-pose2['pose'][3])**2\
79 | + (pose1['pose'][7]-pose2['pose'][7])**2\
80 | + (pose1['pose'][11]-pose2['pose'][11])**2)**0.5
81 |
82 | graphs = {}
83 | for scan in scans:
84 | with open(os.path.join(connectivity_dir, '%s_connectivity.json' % scan)) as f:
85 | G = nx.Graph()
86 | positions = {}
87 | data = json.load(f)
88 | for i,item in enumerate(data):
89 | if item['included']:
90 | for j,conn in enumerate(item['unobstructed']):
91 | if conn and data[j]['included']:
92 | positions[item['image_id']] = np.array([item['pose'][3],
93 | item['pose'][7], item['pose'][11]]);
94 | assert data[j]['unobstructed'][i], 'Graph should be undirected'
95 | G.add_edge(item['image_id'],data[j]['image_id'],weight=distance(item,data[j]))
96 | nx.set_node_attributes(G, values=positions, name='position')
97 | graphs[scan] = G
98 | return graphs
99 |
100 | def new_simulator(connectivity_dir, scan_data_dir=None, width=640, height=480, vfov=60):
101 | import MatterSim
102 |
103 | sim = MatterSim.Simulator()
104 | if scan_data_dir:
105 | sim.setDatasetPath(scan_data_dir)
106 | sim.setNavGraphPath(connectivity_dir)
107 | sim.setRenderingEnabled(False)
108 | sim.setCameraResolution(width, height)
109 | sim.setCameraVFOV(math.radians(vfov))
110 | sim.setDiscretizedViewingAngles(True)
111 | sim.setBatchSize(1)
112 | sim.initialize()
113 | #sim.init()
114 |
115 | return sim
116 |
117 | def angle_feature(heading, elevation, angle_feat_size):
118 | return np.array(
119 | [math.sin(heading), math.cos(heading), math.sin(elevation), math.cos(elevation)] * (angle_feat_size // 4),
120 | dtype=np.float32)
121 |
122 | def get_point_angle_feature(sim, angle_feat_size, baseViewId=0):
123 | feature = np.empty((36, angle_feat_size), np.float32)
124 | base_heading = (baseViewId % 12) * math.radians(30)
125 | base_elevation = (baseViewId // 12 - 1) * math.radians(30)
126 |
127 | for ix in range(36):
128 | # if ix == 0:
129 | # sim.newEpisode(['ZMojNkEp431'], ['2f4d90acd4024c269fb0efe49a8ac540'], [0], [math.radians(-30)])
130 | # elif ix % 12 == 0:
131 | # sim.makeAction([0], [1.0], [1.0])
132 | # else:
133 | # sim.makeAction([0], [1.0], [0])
134 |
135 | # state = sim.getState()[0]
136 | # assert state.viewIndex == ix
137 |
138 | # heading = state.heading - base_heading
139 | # elevation = state.elevation - base_elevation
140 | heading = (ix % 12) * math.radians(30) - base_heading
141 | elevation = (ix // 12 - 1) * math.radians(30) - base_elevation
142 |
143 | feature[ix, :] = angle_feature(heading, elevation, angle_feat_size)
144 | return feature
145 |
146 | def get_all_point_angle_feature(sim, angle_feat_size):
147 | return [get_point_angle_feature(sim, angle_feat_size, baseViewId) for baseViewId in range(36)]
148 |
149 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/utils/distributed.py:
--------------------------------------------------------------------------------
1 | """
2 | Distributed tools
3 | """
4 | import os
5 | from pathlib import Path
6 | from pprint import pformat
7 | import pickle
8 |
9 | import torch
10 | import torch.distributed as dist
11 |
12 |
13 | def load_init_param(opts):
14 | """
15 | Load parameters for the rendezvous distributed procedure
16 | """
17 | # sync file
18 | if opts.output_dir != "":
19 | sync_dir = Path(opts.output_dir).resolve()
20 | sync_dir.mkdir(parents=True, exist_ok=True)
21 | sync_file = f"{sync_dir}/.torch_distributed_sync"
22 | else:
23 | raise RuntimeError("Can't find any sync dir")
24 |
25 | # world size
26 | if opts.world_size != -1:
27 | world_size = opts.world_size
28 | elif os.environ.get("WORLD_SIZE", "") != "":
29 | world_size = int(os.environ["WORLD_SIZE"])
30 | else:
31 | raise RuntimeError("Can't find any world size")
32 |
33 | # rank
34 | if os.environ.get("RANK", "") != "":
35 |         # torch.distributed.launch provides this variable no matter what
36 | rank = int(os.environ["RANK"])
37 | else:
38 | if opts.node_rank != -1:
39 | node_rank = opts.node_rank
40 | elif os.environ.get("NODE_RANK", "") != "":
41 | node_rank = int(os.environ["NODE_RANK"])
42 | else:
43 | raise RuntimeError("Can't find any rank or node rank")
44 |
45 | if opts.local_rank != -1:
46 | local_rank = opts.local_rank
47 | elif os.environ.get("LOCAL_RANK", "") != "":
48 | local_rank = int(os.environ["LOCAL_RANK"])
49 | else:
50 | raise RuntimeError("Can't find any rank or local rank")
51 |
52 | # WARNING: this assumes that each node has the same number of GPUs
53 | n_gpus = torch.cuda.device_count()
54 | rank = local_rank + node_rank * n_gpus
55 |
56 | return {
57 | "backend": "nccl",
58 | # "init_method": f"file://{sync_file}",
59 | "rank": rank,
60 | "world_size": world_size,
61 | }
62 |
63 |
64 | def init_distributed(opts):
65 | init_param = load_init_param(opts)
66 | rank = init_param["rank"]
67 |
68 | print(f"Init distributed {init_param['rank']} - {init_param['world_size']}")
69 |
70 | dist.init_process_group(**init_param)
71 | return rank
72 |
73 |
74 | def is_default_gpu(opts) -> bool:
75 | return opts.local_rank == -1 or dist.get_rank() == 0
76 |
77 |
78 | def is_dist_avail_and_initialized():
79 | if not dist.is_available():
80 | return False
81 | if not dist.is_initialized():
82 | return False
83 | return True
84 |
85 | def get_world_size():
86 | if not is_dist_avail_and_initialized():
87 | return 1
88 | return dist.get_world_size()
89 |
90 | def all_gather(data):
91 | """
92 | Run all_gather on arbitrary picklable data (not necessarily tensors)
93 | Args:
94 | data: any picklable object
95 | Returns:
96 | list[data]: list of data gathered from each rank
97 | """
98 | world_size = get_world_size()
99 | if world_size == 1:
100 | return [data]
101 |
102 | # serialized to a Tensor
103 | buffer = pickle.dumps(data)
104 | storage = torch.ByteStorage.from_buffer(buffer)
105 | tensor = torch.ByteTensor(storage).to("cuda")
106 |
107 | # obtain Tensor size of each rank
108 | local_size = torch.tensor([tensor.numel()], device="cuda")
109 | size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
110 | dist.all_gather(size_list, local_size)
111 | size_list = [int(size.item()) for size in size_list]
112 | max_size = max(size_list)
113 |
114 | # receiving Tensor from all ranks
115 | # we pad the tensor because torch all_gather does not support
116 | # gathering tensors of different shapes
117 | tensor_list = []
118 | for _ in size_list:
119 | tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
120 | if local_size != max_size:
121 | padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda")
122 | tensor = torch.cat((tensor, padding), dim=0)
123 | dist.all_gather(tensor_list, tensor)
124 |
125 | data_list = []
126 | for size, tensor in zip(size_list, tensor_list):
127 | buffer = tensor.cpu().numpy().tobytes()[:size]
128 | data_list.append(pickle.loads(buffer))
129 |
130 | return data_list
131 |
132 |
133 | def reduce_dict(input_dict, average=True):
134 | """
135 | Args:
136 | input_dict (dict): all the values will be reduced
137 | average (bool): whether to do average or sum
138 | Reduce the values in the dictionary from all processes so that all processes
139 | have the averaged results. Returns a dict with the same fields as
140 | input_dict, after reduction.
141 | """
142 | world_size = get_world_size()
143 | if world_size < 2:
144 | return input_dict
145 | with torch.no_grad():
146 | names = []
147 | values = []
148 | # sort the keys so that they are consistent across processes
149 | for k in sorted(input_dict.keys()):
150 | names.append(k)
151 | values.append(input_dict[k])
152 | values = torch.stack(values, dim=0)
153 | dist.all_reduce(values)
154 | if average:
155 | values /= world_size
156 | reduced_dict = {k: v for k, v in zip(names, values)}
157 | return reduced_dict
158 |
159 |
160 | def merge_dist_results(results):
161 | outs = []
162 | for res in results:
163 | outs.extend(res)
164 | return outs
165 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/utils/logger.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import math
4 | import time
5 | from collections import OrderedDict
6 |
7 |
8 | def write_to_record_file(data, file_path, verbose=True):
9 | if verbose:
10 | print(data)
11 | record_file = open(file_path, 'a')
12 | record_file.write(data+'\n')
13 | record_file.close()
14 |
15 |
16 | def asMinutes(s):
17 | m = math.floor(s / 60)
18 | s -= m * 60
19 | return '%dm %ds' % (m, s)
20 |
21 | def timeSince(since, percent):
22 | now = time.time()
23 | s = now - since
24 | es = s / (percent)
25 | rs = es - s
26 | return '%s (- %s)' % (asMinutes(s), asMinutes(rs))
27 |
28 | class Timer:
29 | def __init__(self):
30 | self.cul = OrderedDict()
31 | self.start = {}
32 | self.iter = 0
33 |
34 | def reset(self):
35 | self.cul = OrderedDict()
36 | self.start = {}
37 | self.iter = 0
38 |
39 | def tic(self, key):
40 | self.start[key] = time.time()
41 |
42 | def toc(self, key):
43 | delta = time.time() - self.start[key]
44 | if key not in self.cul:
45 | self.cul[key] = delta
46 | else:
47 | self.cul[key] += delta
48 |
49 | def step(self):
50 | self.iter += 1
51 |
52 | def show(self):
53 | total = sum(self.cul.values())
54 | for key in self.cul:
55 | print("%s, total time %0.2f, avg time %0.2f, part of %0.2f" %
56 | (key, self.cul[key], self.cul[key]*1./self.iter, self.cul[key]*1./total))
57 | print(total / self.iter)
58 |
59 |
60 | def print_progress(iteration, total, prefix='', suffix='', decimals=1, bar_length=100):
61 | """
62 | Call in a loop to create terminal progress bar
63 | @params:
64 | iteration - Required : current iteration (Int)
65 | total - Required : total iterations (Int)
66 | prefix - Optional : prefix string (Str)
67 | suffix - Optional : suffix string (Str)
68 | decimals - Optional : positive number of decimals in percent complete (Int)
69 | bar_length - Optional : character length of bar (Int)
70 | """
71 | str_format = "{0:." + str(decimals) + "f}"
72 | percents = str_format.format(100 * (iteration / float(total)))
73 | filled_length = int(round(bar_length * iteration / float(total)))
74 | bar = '█' * filled_length + '-' * (bar_length - filled_length)
75 |
76 | sys.stdout.write('\r%s |%s| %s%s %s' % (prefix, bar, percents, '%', suffix)),
77 |
78 | if iteration == total:
79 | sys.stdout.write('\n')
80 | sys.stdout.flush()
81 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/utils/misc.py:
--------------------------------------------------------------------------------
1 | import random
2 | import numpy as np
3 | import torch
4 |
5 | def set_random_seed(seed):
6 | torch.manual_seed(seed)
7 | torch.cuda.manual_seed(seed)
8 | torch.cuda.manual_seed_all(seed)
9 | random.seed(seed)
10 | np.random.seed(seed)
11 |
12 | def length2mask(length, size=None):
13 | batch_size = len(length)
14 | size = int(max(length)) if size is None else size
15 | mask = (torch.arange(size, dtype=torch.int64).unsqueeze(0).repeat(batch_size, 1)
16 | > (torch.LongTensor(length) - 1).unsqueeze(1)).cuda()
17 | return mask
18 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/map_nav_src/utils/ops.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 |
4 | def pad_tensors(tensors, lens=None, pad=0):
5 | """B x [T, ...]"""
6 | if lens is None:
7 | lens = [t.size(0) for t in tensors]
8 | max_len = max(lens)
9 | bs = len(tensors)
10 | hid = list(tensors[0].size()[1:])
11 | size = [bs, max_len] + hid
12 |
13 | dtype = tensors[0].dtype
14 | device = tensors[0].device
15 | output = torch.zeros(*size, dtype=dtype).to(device)
16 | if pad:
17 | output.data.fill_(pad)
18 | for i, (t, l) in enumerate(zip(tensors, lens)):
19 | output.data[i, :l, ...] = t.data
20 | return output
21 |
22 | def gen_seq_masks(seq_lens, max_len=None):
23 | if max_len is None:
24 | max_len = max(seq_lens)
25 |
26 | if isinstance(seq_lens, torch.Tensor):
27 | device = seq_lens.device
28 | masks = torch.arange(max_len).to(device).repeat(len(seq_lens), 1) < seq_lens.unsqueeze(1)
29 | return masks
30 |
31 | if max_len == 0:
32 |         return np.zeros((len(seq_lens), 0), dtype=bool)
33 |
34 | seq_lens = np.array(seq_lens)
35 | batch_size = len(seq_lens)
36 | masks = np.arange(max_len).reshape(-1, max_len).repeat(batch_size, 0)
37 | masks = masks < seq_lens.reshape(-1, 1)
38 | return masks
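A small illustration (shapes invented, import path assumed) of how `pad_tensors` and `gen_seq_masks` are typically used together: ragged per-step features are zero-padded to a common length, and a boolean mask marks the valid positions.

```python
import torch
from utils.ops import pad_tensors, gen_seq_masks  # assumed import path

feats = [torch.randn(3, 8), torch.randn(5, 8)]          # two sequences of 8-d features
padded = pad_tensors(feats)                              # (2, 5, 8), zeros beyond each length
masks = gen_seq_masks(torch.tensor([3, 5]), max_len=5)   # (2, 5) bool, True = valid position
print(padded.shape, masks)
```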
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/configs/model_config.json:
--------------------------------------------------------------------------------
1 | {
2 | "pred_head_dropout_prob": 0.1,
3 | "attention_probs_dropout_prob": 0.1,
4 | "finetuning_task": null,
5 | "hidden_act": "gelu",
6 | "hidden_dropout_prob": 0.1,
7 | "hidden_size": 768,
8 | "image_feat_size": 1024,
9 | "image_prob_size": 0,
10 | "angle_feat_size": 4,
11 | "obj_feat_size": 768,
12 | "obj_prob_size": 0,
13 | "share_scene_obj_enc": false,
14 | "img_feature_type": "imagenet",
15 | "initializer_range": 0.02,
16 | "intermediate_size": 3072,
17 | "num_l_layers": 9,
18 | "num_x_layers": 4,
19 | "num_pano_layers": 2,
20 | "layer_norm_eps": 1e-12,
21 | "max_position_embeddings": 512,
22 | "max_action_steps": 100,
23 | "num_attention_heads": 12,
24 | "num_hidden_layers": 12,
25 | "num_labels": 2,
26 | "output_attentions": false,
27 | "output_hidden_states": false,
28 | "pruned_heads": {},
29 | "torchscript": false,
30 | "type_vocab_size": 2,
31 | "update_lang_bert": true,
32 | "vocab_size": 30522,
33 | "use_lang2visn_attn": true,
34 | "graph_sprels": true,
35 | "glocal_fuse": true,
36 | "lang_bert_name": "bert-base-uncased"
37 | }
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/configs/training_args.json:
--------------------------------------------------------------------------------
1 | {
2 | "vlnbert": "cmt",
3 | "model_config": "",
4 | "checkpoint": null,
5 | "output_dir": "",
6 | "train_batch_size": 128,
7 | "val_batch_size": 64,
8 | "gradient_accumulation_steps": 1,
9 | "learning_rate": 5e-05,
10 | "valid_steps": 5000,
11 | "log_steps": 1000,
12 | "num_train_steps": 200000,
13 | "optim": "adamw",
14 | "betas": [
15 | 0.9,
16 | 0.98
17 | ],
18 | "dropout": 0.1,
19 | "weight_decay": 0.01,
20 | "grad_norm": 5.0,
21 | "warmup_steps": 10000,
22 | "seed": 0,
23 | "fp16": false,
24 | "n_workers": 0,
25 | "pin_mem": true,
26 | "local_rank": -1,
27 | "node_rank": 0,
28 | "world_size": 1,
29 | "mrc_mask_prob": 0.15,
30 | "itm_neg_imgs": 5,
31 | "nearby_vp_steps": null,
32 | "max_objects": 50,
33 | "max_txt_len": 100,
34 | "init_pretrained": "lxmert",
35 | "train_datasets": {
36 | "HM3D": {
37 | "name": "HM3D",
38 | "train_traj_files": [
39 | "../datasets/REVERIE/annotations/pretrain/ade20k_pseudo3d_depth2_epoch_94_beam0_zun_3_none.jsonl",
40 | "../datasets/REVERIE/annotations/pretrain/ade20k_pseudo3d_depth2_epoch_94_beam0_zun_gibson_3_none.jsonl"
41 | ],
42 | "connectivity_dir": "../datasets/R2R/connectivity",
43 | "img_ft_file": "../datasets/R2R/features/clip_vit-h14_final.hdf5",
44 | "obj_ft_file": "../../data_all/obj_features_merged",
45 | "scanvp_cands_file": [
46 | "../datasets/REVERIE/annotations/scanvp_candview_relangles_new.json",
47 | "../datasets/R2R/annotations/scanvp_candview_relangles.json",
48 | "../datasets/REVERIE/annotations/scanvp_candview_relangles_new_gibson.json"
49 | ],
50 | "tasks": [
51 | "mlm",
52 | "sap",
53 | "og"
54 | ],
55 | "mix_ratio": [
56 | 1,
57 | 1,
58 | 1
59 | ],
60 | "scan_ranges": null
61 | },
62 | "REVERIE": {
63 | "name": "REVERIE",
64 | "val_seen_traj_files": [
65 | "../datasets/REVERIE/annotations/pretrain/REVERIE_val_seen_enc.jsonl"
66 | ],
67 | "val_unseen_traj_files": [
68 | "../datasets/REVERIE/annotations/pretrain/REVERIE_val_unseen_enc.jsonl"
69 | ],
70 | "connectivity_dir": "../datasets/R2R/connectivity",
71 | "img_ft_file": "../datasets/R2R/features/clip_vit-h14_final.hdf5",
72 | "obj_ft_file": "../datasets/REVERIE/features/obj_gtmax_timm_imagenet_vitb16",
73 | "scanvp_cands_file": [
74 | "../datasets/R2R/annotations/scanvp_candview_relangles.json"
75 | ]
76 | }
77 | }
78 | }
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/data/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wz0919/ScaleVLN/1189fe898462e2e10908631070bcf2d4ec2204b2/VLN-DUET-RVR/pretrain_src/data/__init__.py
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/data/common.py:
--------------------------------------------------------------------------------
1 | import os
2 | import math
3 | import json
4 | import numpy as np
5 | import networkx as nx
6 |
7 | import torch
8 |
9 | def pad_tensors(tensors, lens=None, pad=0):
10 | """B x [T, ...] torch tensors"""
11 | if lens is None:
12 | lens = [t.size(0) for t in tensors]
13 | max_len = max(lens)
14 | bs = len(tensors)
15 | hid = list(tensors[0].size()[1:])
16 | size = [bs, max_len] + hid
17 |
18 | dtype = tensors[0].dtype
19 | output = torch.zeros(*size, dtype=dtype)
20 | if pad:
21 | output.data.fill_(pad)
22 | for i, (t, l) in enumerate(zip(tensors, lens)):
23 | output.data[i, :l, ...] = t.data
24 | return output
25 |
26 | def gen_seq_masks(seq_lens, max_len=None):
27 | """
28 | Args:
29 | seq_lens: list or nparray int, shape=(N, )
30 | Returns:
31 | masks: nparray, shape=(N, L), padded=0
32 | """
33 | seq_lens = np.array(seq_lens)
34 | if max_len is None:
35 | max_len = max(seq_lens)
36 | if max_len == 0:
37 |         return np.zeros((len(seq_lens), 0), dtype=bool)
38 | batch_size = len(seq_lens)
39 | masks = np.arange(max_len).reshape(-1, max_len).repeat(batch_size, 0)
40 | masks = masks < seq_lens.reshape(-1, 1)
41 | return masks
42 |
43 | def get_angle_fts(headings, elevations, angle_feat_size):
44 | ang_fts = [np.sin(headings), np.cos(headings), np.sin(elevations), np.cos(elevations)]
45 | ang_fts = np.vstack(ang_fts).transpose().astype(np.float32)
46 | num_repeats = angle_feat_size // 4
47 | if num_repeats > 1:
48 | ang_fts = np.concatenate([ang_fts] * num_repeats, 1)
49 | return ang_fts
50 |
51 | def get_view_rel_angles(baseViewId=0):
52 | rel_angles = np.zeros((36, 2), dtype=np.float32)
53 |
54 | base_heading = (baseViewId % 12) * math.radians(30)
55 | base_elevation = (baseViewId // 12 - 1) * math.radians(30)
56 | for ix in range(36):
57 | if ix == 0:
58 | heading = 0
59 | elevation = math.radians(-30)
60 | elif ix % 12 == 0:
61 | heading = 0
62 | elevation += math.radians(30)
63 | else:
64 | heading += math.radians(30)
65 | rel_angles[ix, 0] = heading - base_heading
66 | rel_angles[ix, 1] = elevation - base_elevation
67 |
68 | return rel_angles
69 |
70 |
71 | def load_nav_graphs(connectivity_dir):
72 | ''' Load connectivity graph for each scan '''
73 |
74 | def distance(pose1, pose2):
75 | ''' Euclidean distance between two graph poses '''
76 | return ((pose1['pose'][3]-pose2['pose'][3])**2\
77 | + (pose1['pose'][7]-pose2['pose'][7])**2\
78 | + (pose1['pose'][11]-pose2['pose'][11])**2)**0.5
79 |
80 | scans = [x.strip() for x in open(os.path.join(connectivity_dir, 'scans.txt')).readlines()]
81 | graphs = {}
82 | for scan in scans:
83 | with open(os.path.join(connectivity_dir, '%s_connectivity.json' % scan)) as f:
84 | G = nx.Graph()
85 | positions = {}
86 | data = json.load(f)
87 | for i, item in enumerate(data):
88 | if item['included']:
89 | for j,conn in enumerate(item['unobstructed']):
90 | if conn and data[j]['included']:
91 | positions[item['image_id']] = np.array([item['pose'][3],
92 | item['pose'][7], item['pose'][11]]);
93 | assert data[j]['unobstructed'][i], 'Graph should be undirected'
94 | G.add_edge(item['image_id'],data[j]['image_id'],weight=distance(item,data[j]))
95 | nx.set_node_attributes(G, values=positions, name='position')
96 | graphs[scan] = G
97 |
98 | shortest_distances = {}
99 | shortest_paths = {}
100 | for scan, G in graphs.items(): # compute all shortest paths
101 | shortest_distances[scan] = dict(nx.all_pairs_dijkstra_path_length(G))
102 | shortest_paths[scan] = dict(nx.all_pairs_dijkstra_path(G))
103 | return graphs, shortest_distances, shortest_paths
104 |
105 | def softmax(logits, dim=1):
106 | # logits: (n, d)
107 | tmp = np.exp(logits)
108 | return tmp / np.sum(tmp, axis=dim, keepdims=True)
109 |
110 |
111 | def calculate_vp_rel_pos_fts(a, b, base_heading=0, base_elevation=0):
112 | # a, b: (x, y, z)
113 | dx = b[0] - a[0]
114 | dy = b[1] - a[1]
115 | dz = b[2] - a[2]
116 | xy_dist = max(np.sqrt(dx**2 + dy**2), 1e-8)
117 | xyz_dist = max(np.sqrt(dx**2 + dy**2 + dz**2), 1e-8)
118 |
119 |     # the simulator's api is weird (x-y axis is transposed)
120 | heading = np.arcsin(dx/xy_dist) # [-pi/2, pi/2]
121 | if b[1] < a[1]:
122 | heading = np.pi - heading
123 | heading -= base_heading
124 |
125 | elevation = np.arcsin(dz/xyz_dist) # [-pi/2, pi/2]
126 | elevation -= base_elevation
127 |
128 | return heading, elevation, xyz_dist
129 |
130 | def normalize_angle(x):
131 | '''convert radians into (-pi, pi]'''
132 | pi2 = 2 * math.pi
133 | x = x % pi2 # [0, 2pi]
134 | x = np.where(x > math.pi, x - pi2, x)
135 | return x
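An illustrative call (coordinates made up, import path assumed) showing how the relative-position and angle-feature helpers above fit together:

```python
import numpy as np
from data.common import calculate_vp_rel_pos_fts, get_angle_fts  # assumed import path

a, b = (0.0, 0.0, 1.5), (1.0, 1.0, 1.5)                  # two viewpoint positions (x, y, z)
heading, elevation, dist = calculate_vp_rel_pos_fts(a, b)
ang_ft = get_angle_fts(np.array([heading]), np.array([elevation]), angle_feat_size=4)
print(heading, elevation, dist)                          # ~0.785, 0.0, ~1.414
print(ang_ft.shape)                                      # (1, 4): sin/cos of heading and elevation
```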
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/data/loader.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | A prefetch loader to speed up data loading
6 | Modified from Nvidia Deep Learning Examples
7 | (https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch).
8 | """
9 | import random
10 | from typing import List, Dict, Tuple, Union, Iterator
11 |
12 | import torch
13 | from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
14 | from torch.utils.data.distributed import DistributedSampler
15 | import torch.distributed as dist
16 |
17 |
18 | class MetaLoader:
19 | """wraps multiple data loaders"""
20 |
21 | def __init__(
22 | self, loaders, accum_steps: int = 1, distributed: bool = False, device=None
23 | ):
24 | assert isinstance(loaders, dict)
25 | self.name2loader = {}
26 | self.name2iter = {}
27 | self.name2pre_epoch = {}
28 | self.names: List[str] = []
29 | ratios: List[int] = []
30 | for n, l in loaders.items():
31 | if isinstance(l, tuple):
32 | l, r, p = l
33 | elif isinstance(l, DataLoader):
34 | r = 1
35 | p = lambda e: None
36 | else:
37 | raise ValueError()
38 | self.names.append(n)
39 | self.name2loader[n] = l
40 | self.name2iter[n] = iter(l)
41 | self.name2pre_epoch[n] = p
42 | ratios.append(r)
43 |
44 | self.accum_steps = accum_steps
45 | self.device = device
46 | self.sampling_ratios = torch.tensor(ratios).float().to(self.device)
47 | self.distributed = distributed
48 | self.step = 0
49 |
50 | def __iter__(self) -> Iterator[Tuple]:
51 | """this iterator will run indefinitely"""
52 | task_id = None
53 | epoch_id = 0
54 | while True:
55 | if self.step % self.accum_steps == 0:
56 | task_id = torch.multinomial(self.sampling_ratios, 1)
57 | if self.distributed:
58 |                     # make sure all processes are training the same task
59 | dist.broadcast(task_id, 0)
60 | self.step += 1
61 | task = self.names[task_id.cpu().item()]
62 | iter_ = self.name2iter[task]
63 | try:
64 | batch = next(iter_)
65 | except StopIteration:
66 | epoch_id += 1
67 | # In distributed mode, calling the set_epoch() method at the beginning of each epoch
68 | # before creating the DataLoader iterator is necessary to make shuffling work properly
69 | # across multiple epochs. Otherwise, the same ordering will be always used.
70 | self.name2pre_epoch[task](epoch_id)
71 | iter_ = iter(self.name2loader[task])
72 | batch = next(iter_)
73 | self.name2iter[task] = iter_
74 |
75 | yield task, batch
76 |
77 |
78 | def move_to_cuda(batch: Union[List, Tuple, Dict, torch.Tensor], device: torch.device):
79 | if isinstance(batch, torch.Tensor):
80 | return batch.to(device, non_blocking=True)
81 | elif isinstance(batch, list):
82 | return [move_to_cuda(t, device) for t in batch]
83 | elif isinstance(batch, tuple):
84 | return tuple(move_to_cuda(t, device) for t in batch)
85 | elif isinstance(batch, dict):
86 | return {n: move_to_cuda(t, device) for n, t in batch.items()}
87 | return batch
88 |
89 |
90 | class PrefetchLoader(object):
91 | """
92 | overlap compute and cuda data transfer
93 | """
94 | def __init__(self, loader, device: torch.device):
95 | self.loader = loader
96 | self.device = device
97 |
98 | def __iter__(self):
99 | loader_it = iter(self.loader)
100 | self.preload(loader_it)
101 | batch = self.next(loader_it)
102 | while batch is not None:
103 | yield batch
104 | batch = self.next(loader_it)
105 |
106 | def __len__(self):
107 | return len(self.loader)
108 |
109 | def preload(self, it):
110 | try:
111 | self.batch = next(it)
112 | except StopIteration:
113 | self.batch = None
114 | return
115 | self.batch = move_to_cuda(self.batch, self.device)
116 |
117 | def next(self, it):
118 | batch = self.batch
119 | self.preload(it)
120 | return batch
121 |
122 | def __getattr__(self, name):
123 | method = self.loader.__getattribute__(name)
124 | return method
125 |
126 |
127 | def build_dataloader(task, dataset, collate_fn, is_train: bool, opts):
128 |
129 | batch_size = opts.train_batch_size if is_train else opts.val_batch_size
130 | # if task == 'itm': batch_size = max(1, batch_size // 2)
131 |
132 | if opts.local_rank == -1:
133 | if is_train:
134 | sampler: Union[
135 | RandomSampler, SequentialSampler, DistributedSampler
136 | ] = RandomSampler(dataset)
137 | else:
138 | sampler = SequentialSampler(dataset)
139 |
140 | size = torch.cuda.device_count() if torch.cuda.is_available() else 1
141 | pre_epoch = lambda e: None
142 |
143 | # DataParallel: scale the batch size by the number of GPUs
144 | if size > 1:
145 | batch_size *= size
146 |
147 | else:
148 | size = dist.get_world_size()
149 | sampler = DistributedSampler(
150 | dataset, num_replicas=size, rank=dist.get_rank(), shuffle=is_train
151 | )
152 | pre_epoch = sampler.set_epoch
153 |
154 | loader = DataLoader(
155 | dataset,
156 | sampler=sampler,
157 | batch_size=batch_size,
158 | num_workers=opts.n_workers,
159 | pin_memory=opts.pin_mem,
160 | collate_fn=collate_fn,
161 | drop_last=False,
162 | )
163 |
164 | return loader, pre_epoch
165 |
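A toy sketch (datasets and ratios invented, import path assumed) of how `MetaLoader` interleaves task loaders: each entry maps a task name to `(loader, sampling_ratio, pre_epoch_hook)`, and the iterator samples one task per step.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from data.loader import MetaLoader  # assumed import path

mlm_loader = DataLoader(TensorDataset(torch.arange(10)), batch_size=2)
sap_loader = DataLoader(TensorDataset(torch.arange(10)), batch_size=2)
meta = MetaLoader(
    {'mlm': (mlm_loader, 2, lambda epoch: None),   # sampled ~2x as often as 'sap'
     'sap': (sap_loader, 1, lambda epoch: None)},
    accum_steps=1, distributed=False, device=torch.device('cpu'))

for step, (task, batch) in enumerate(meta):        # the iterator never terminates on its own
    print(task, batch)
    if step == 3:
        break
```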
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/model/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wz0919/ScaleVLN/1189fe898462e2e10908631070bcf2d4ec2204b2/VLN-DUET-RVR/pretrain_src/model/__init__.py
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/model/ops.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | from .transformer import TransformerEncoder, TransformerEncoderLayer
4 |
5 | # try:
6 | # from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
7 | # except (ImportError, AttributeError) as e:
8 | # # logger.info("Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .")
9 | # BertLayerNorm = torch.nn.LayerNorm
10 | BertLayerNorm = torch.nn.LayerNorm
11 |
12 | def create_transformer_encoder(config, num_layers, norm=False):
13 | enc_layer = TransformerEncoderLayer(
14 | config.hidden_size, config.num_attention_heads,
15 | dim_feedforward=config.intermediate_size,
16 | dropout=config.hidden_dropout_prob,
17 | activation=config.hidden_act,
18 | normalize_before=True
19 | )
20 | if norm:
21 | norm_layer = BertLayerNorm(config.hidden_size, eps=1e-12)
22 | else:
23 | norm_layer = None
24 | return TransformerEncoder(enc_layer, num_layers, norm=norm_layer, batch_first=True)
25 |
26 | def extend_neg_masks(masks, dtype=None):
27 | """
28 |     expand mask from (N, L) to (N, 1(H), 1(L), L) and make it an additive negative mask
29 | """
30 | if dtype is None:
31 | dtype = torch.float
32 | extended_masks = masks.unsqueeze(1).unsqueeze(2)
33 | extended_masks = extended_masks.to(dtype=dtype)
34 | extended_masks = (1.0 - extended_masks) * -10000.0
35 | return extended_masks
36 |
37 | def gen_seq_masks(seq_lens, max_len=None):
38 | if max_len is None:
39 | max_len = max(seq_lens)
40 | batch_size = len(seq_lens)
41 | device = seq_lens.device
42 |
43 | masks = torch.arange(max_len).unsqueeze(0).repeat(batch_size, 1).to(device)
44 | masks = masks < seq_lens.unsqueeze(1)
45 | return masks
46 |
47 | def pad_tensors_wgrad(tensors, lens=None):
48 | """B x [T, ...] torch tensors"""
49 | if lens is None:
50 | lens = [t.size(0) for t in tensors]
51 | max_len = max(lens)
52 | batch_size = len(tensors)
53 | hid = list(tensors[0].size()[1:])
54 |
55 | device = tensors[0].device
56 | dtype = tensors[0].dtype
57 |
58 | output = []
59 | for i in range(batch_size):
60 | if lens[i] < max_len:
61 | tmp = torch.cat(
62 | [tensors[i], torch.zeros([max_len-lens[i]]+hid, dtype=dtype).to(device)],
63 | dim=0
64 | )
65 | else:
66 | tmp = tensors[i]
67 | output.append(tmp)
68 | output = torch.stack(output, 0)
69 | return output
70 |
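A short sketch (shapes invented, import path assumed) of the masking helpers above: lengths become a boolean mask, `extend_neg_masks` converts it into the additive `(N, 1, 1, L)` form consumed by BERT-style attention, and `pad_tensors_wgrad` pads without detaching gradients.

```python
import torch
from model.ops import gen_seq_masks, extend_neg_masks, pad_tensors_wgrad  # assumed import path

lens = torch.tensor([2, 4])
masks = gen_seq_masks(lens, max_len=4)    # (2, 4) bool, True = valid token
ext = extend_neg_masks(masks)             # (2, 1, 1, 4): 0.0 where valid, -10000.0 where padded
padded = pad_tensors_wgrad([torch.randn(2, 8, requires_grad=True),
                            torch.randn(4, 8, requires_grad=True)])  # (2, 4, 8)
print(ext.shape, padded.shape, padded.requires_grad)
```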
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/optim/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | """
6 | from .sched import noam_schedule, warmup_linear, get_lr_sched
7 | from .adamw import AdamW
8 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/optim/adamw.py:
--------------------------------------------------------------------------------
1 | """
2 | AdamW optimizer (weight decay fix)
3 | copied from huggingface (https://github.com/huggingface/transformers).
4 | """
5 |
6 | import math
7 | from typing import Callable, Iterable, Tuple
8 |
9 | import torch
10 |
11 | from torch.optim import Optimizer
12 |
13 | class AdamW(Optimizer):
14 | """
15 | Implements Adam algorithm with weight decay fix as introduced in `Decoupled Weight Decay Regularization
16 | `__.
17 |
18 | Parameters:
19 | params (:obj:`Iterable[torch.nn.parameter.Parameter]`):
20 | Iterable of parameters to optimize or dictionaries defining parameter groups.
21 | lr (:obj:`float`, `optional`, defaults to 1e-3):
22 | The learning rate to use.
23 | betas (:obj:`Tuple[float,float]`, `optional`, defaults to (0.9, 0.999)):
24 | Adam's betas parameters (b1, b2).
25 | eps (:obj:`float`, `optional`, defaults to 1e-6):
26 | Adam's epsilon for numerical stability.
27 | weight_decay (:obj:`float`, `optional`, defaults to 0):
28 | Decoupled weight decay to apply.
29 | correct_bias (:obj:`bool`, `optional`, defaults to `True`):
30 |             Whether or not to correct bias in Adam (for instance, in Bert TF repository they use :obj:`False`).
31 | """
32 |
33 | def __init__(
34 | self,
35 | params: Iterable[torch.nn.parameter.Parameter],
36 | lr: float = 1e-3,
37 | betas: Tuple[float, float] = (0.9, 0.999),
38 | eps: float = 1e-6,
39 | weight_decay: float = 0.0,
40 | correct_bias: bool = True,
41 | ):
42 | if lr < 0.0:
43 | raise ValueError("Invalid learning rate: {} - should be >= 0.0".format(lr))
44 | if not 0.0 <= betas[0] < 1.0:
45 | raise ValueError("Invalid beta parameter: {} - should be in [0.0, 1.0[".format(betas[0]))
46 | if not 0.0 <= betas[1] < 1.0:
47 | raise ValueError("Invalid beta parameter: {} - should be in [0.0, 1.0[".format(betas[1]))
48 | if not 0.0 <= eps:
49 | raise ValueError("Invalid epsilon value: {} - should be >= 0.0".format(eps))
50 | defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay, correct_bias=correct_bias)
51 | super().__init__(params, defaults)
52 |
53 | def step(self, closure: Callable = None):
54 | """
55 | Performs a single optimization step.
56 |
57 | Arguments:
58 | closure (:obj:`Callable`, `optional`): A closure that reevaluates the model and returns the loss.
59 | """
60 | loss = None
61 | if closure is not None:
62 | loss = closure()
63 |
64 | for group in self.param_groups:
65 | for p in group["params"]:
66 | if p.grad is None:
67 | continue
68 | grad = p.grad.data
69 | if grad.is_sparse:
70 | raise RuntimeError("Adam does not support sparse gradients, please consider SparseAdam instead")
71 |
72 | state = self.state[p]
73 |
74 | # State initialization
75 | if len(state) == 0:
76 | state["step"] = 0
77 | # Exponential moving average of gradient values
78 | state["exp_avg"] = torch.zeros_like(p.data)
79 | # Exponential moving average of squared gradient values
80 | state["exp_avg_sq"] = torch.zeros_like(p.data)
81 |
82 | exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
83 | beta1, beta2 = group["betas"]
84 |
85 | state["step"] += 1
86 |
87 | # Decay the first and second moment running average coefficient
88 | # In-place operations to update the averages at the same time
89 | exp_avg.mul_(beta1).add_(grad, alpha=1.0 - beta1)
90 | exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
91 | denom = exp_avg_sq.sqrt().add_(group["eps"])
92 |
93 | step_size = group["lr"]
94 | if group["correct_bias"]: # No bias correction for Bert
95 | bias_correction1 = 1.0 - beta1 ** state["step"]
96 | bias_correction2 = 1.0 - beta2 ** state["step"]
97 | step_size = step_size * math.sqrt(bias_correction2) / bias_correction1
98 |
99 | p.data.addcdiv_(exp_avg, denom, value=-step_size)
100 |
101 | # Just adding the square of the weights to the loss function is *not*
102 | # the correct way of using L2 regularization/weight decay with Adam,
103 | # since that will interact with the m and v parameters in strange ways.
104 | #
105 | # Instead we want to decay the weights in a manner that doesn't interact
106 | # with the m/v parameters. This is equivalent to adding the square
107 | # of the weights to the loss with plain (non-momentum) SGD.
108 | # Add weight decay at the end (fixed version)
109 | if group["weight_decay"] > 0.0:
110 | p.data.add_(p.data, alpha=-group["lr"] * group["weight_decay"])
111 |
112 | return loss
113 |
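A standalone usage sketch (toy model and loss; in the pretraining code the optimizer is actually built by `build_optimizer` in `optim/misc.py`, which sets up weight-decay parameter groups):

```python
import torch
from optim.adamw import AdamW  # assumed import path

model = torch.nn.Linear(16, 4)
optimizer = AdamW(model.parameters(), lr=5e-5, betas=(0.9, 0.98), weight_decay=0.01)

loss = model(torch.randn(8, 16)).pow(2).mean()   # dummy loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```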
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/optim/lookahead.py:
--------------------------------------------------------------------------------
1 | # Lookahead implementation from https://github.com/rwightman/pytorch-image-models/blob/master/timm/optim/lookahead.py
2 |
3 | """ Lookahead Optimizer Wrapper.
4 | Implementation modified from: https://github.com/alphadl/lookahead.pytorch
5 | Paper: `Lookahead Optimizer: k steps forward, 1 step back` - https://arxiv.org/abs/1907.08610
6 | """
7 | import torch
8 | from torch.optim.optimizer import Optimizer
9 | from torch.optim import Adam
10 | from collections import defaultdict
11 |
12 | class Lookahead(Optimizer):
13 | def __init__(self, base_optimizer, alpha=0.5, k=6):
14 | if not 0.0 <= alpha <= 1.0:
15 | raise ValueError(f'Invalid slow update rate: {alpha}')
16 | if not 1 <= k:
17 | raise ValueError(f'Invalid lookahead steps: {k}')
18 | defaults = dict(lookahead_alpha=alpha, lookahead_k=k, lookahead_step=0)
19 | self.base_optimizer = base_optimizer
20 | self.param_groups = self.base_optimizer.param_groups
21 | self.defaults = base_optimizer.defaults
22 | self.defaults.update(defaults)
23 | self.state = defaultdict(dict)
24 | # manually add our defaults to the param groups
25 | for name, default in defaults.items():
26 | for group in self.param_groups:
27 | group.setdefault(name, default)
28 |
29 | def update_slow(self, group):
30 | for fast_p in group["params"]:
31 | if fast_p.grad is None:
32 | continue
33 | param_state = self.state[fast_p]
34 | if 'slow_buffer' not in param_state:
35 | param_state['slow_buffer'] = torch.empty_like(fast_p.data)
36 | param_state['slow_buffer'].copy_(fast_p.data)
37 | slow = param_state['slow_buffer']
38 |             slow.add_(fast_p.data - slow, alpha=group['lookahead_alpha'])
39 | fast_p.data.copy_(slow)
40 |
41 | def sync_lookahead(self):
42 | for group in self.param_groups:
43 | self.update_slow(group)
44 |
45 | def step(self, closure=None):
46 | # print(self.k)
47 | #assert id(self.param_groups) == id(self.base_optimizer.param_groups)
48 | loss = self.base_optimizer.step(closure)
49 | for group in self.param_groups:
50 | group['lookahead_step'] += 1
51 | if group['lookahead_step'] % group['lookahead_k'] == 0:
52 | self.update_slow(group)
53 | return loss
54 |
55 | def state_dict(self):
56 | fast_state_dict = self.base_optimizer.state_dict()
57 | slow_state = {
58 | (id(k) if isinstance(k, torch.Tensor) else k): v
59 | for k, v in self.state.items()
60 | }
61 | fast_state = fast_state_dict['state']
62 | param_groups = fast_state_dict['param_groups']
63 | return {
64 | 'state': fast_state,
65 | 'slow_state': slow_state,
66 | 'param_groups': param_groups,
67 | }
68 |
69 | def load_state_dict(self, state_dict):
70 | fast_state_dict = {
71 | 'state': state_dict['state'],
72 | 'param_groups': state_dict['param_groups'],
73 | }
74 | self.base_optimizer.load_state_dict(fast_state_dict)
75 |
76 | # We want to restore the slow state, but share param_groups reference
77 | # with base_optimizer. This is a bit redundant but least code
78 | slow_state_new = False
79 | if 'slow_state' not in state_dict:
80 | print('Loading state_dict from optimizer without Lookahead applied.')
81 | state_dict['slow_state'] = defaultdict(dict)
82 | slow_state_new = True
83 | slow_state_dict = {
84 | 'state': state_dict['slow_state'],
85 | 'param_groups': state_dict['param_groups'], # this is pointless but saves code
86 | }
87 | super(Lookahead, self).load_state_dict(slow_state_dict)
88 | self.param_groups = self.base_optimizer.param_groups # make both ref same container
89 | if slow_state_new:
90 | # reapply defaults to catch missing lookahead specific ones
91 | for name, default in self.defaults.items():
92 | for group in self.param_groups:
93 | group.setdefault(name, default)
94 |
95 | def LookaheadAdam(params, alpha=0.5, k=6, *args, **kwargs):
96 | adam = Adam(params, *args, **kwargs)
97 | return Lookahead(adam, alpha, k)
98 |
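A minimal sketch (toy model, hypothetical hyper-parameters) of wrapping a base optimizer with `Lookahead` via the `LookaheadAdam` helper; every `k` fast steps the slow weights are interpolated toward the fast weights and copied back.

```python
import torch
from optim.lookahead import LookaheadAdam  # assumed import path

model = torch.nn.Linear(16, 4)
opt = LookaheadAdam(model.parameters(), alpha=0.5, k=6, lr=1e-3)

for _ in range(12):                               # slow/fast sync happens at steps 6 and 12
    loss = model(torch.randn(8, 16)).pow(2).mean()
    loss.backward()
    opt.step()
    model.zero_grad()
```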
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/optim/misc.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | Misc lr helper
6 | """
7 | from torch.optim import Adam, Adamax
8 |
9 | from .adamw import AdamW
10 | from .rangerlars import RangerLars
11 |
12 | def build_optimizer(model, opts):
13 | param_optimizer = list(model.named_parameters())
14 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
15 | optimizer_grouped_parameters = [
16 | {'params': [p for n, p in param_optimizer
17 | if not any(nd in n for nd in no_decay)],
18 | 'weight_decay': opts.weight_decay},
19 | {'params': [p for n, p in param_optimizer
20 | if any(nd in n for nd in no_decay)],
21 | 'weight_decay': 0.0}
22 | ]
23 |
24 | # currently Adam only
25 | if opts.optim == 'adam':
26 | OptimCls = Adam
27 | elif opts.optim == 'adamax':
28 | OptimCls = Adamax
29 | elif opts.optim == 'adamw':
30 | OptimCls = AdamW
31 | elif opts.optim == 'rangerlars':
32 | OptimCls = RangerLars
33 | else:
34 | raise ValueError('invalid optimizer')
35 | optimizer = OptimCls(optimizer_grouped_parameters,
36 | lr=opts.learning_rate, betas=opts.betas)
37 | return optimizer
38 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/optim/ralamb.py:
--------------------------------------------------------------------------------
1 | import torch, math
2 | from torch.optim.optimizer import Optimizer
3 |
4 | # RAdam + LARS
5 | class Ralamb(Optimizer):
6 |
7 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0):
8 | defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
9 | self.buffer = [[None, None, None] for ind in range(10)]
10 | super(Ralamb, self).__init__(params, defaults)
11 |
12 | def __setstate__(self, state):
13 | super(Ralamb, self).__setstate__(state)
14 |
15 | def step(self, closure=None):
16 |
17 | loss = None
18 | if closure is not None:
19 | loss = closure()
20 |
21 | for group in self.param_groups:
22 |
23 | for p in group['params']:
24 | if p.grad is None:
25 | continue
26 | grad = p.grad.data.float()
27 | if grad.is_sparse:
28 | raise RuntimeError('Ralamb does not support sparse gradients')
29 |
30 | p_data_fp32 = p.data.float()
31 |
32 | state = self.state[p]
33 |
34 | if len(state) == 0:
35 | state['step'] = 0
36 | state['exp_avg'] = torch.zeros_like(p_data_fp32)
37 | state['exp_avg_sq'] = torch.zeros_like(p_data_fp32)
38 | else:
39 | state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32)
40 | state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32)
41 |
42 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
43 | beta1, beta2 = group['betas']
44 |
45 | # Decay the first and second moment running average coefficient
46 | # m_t
47 | exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
48 | # v_t
49 | exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
50 |
51 | state['step'] += 1
52 | buffered = self.buffer[int(state['step'] % 10)]
53 |
54 | if state['step'] == buffered[0]:
55 | N_sma, radam_step_size = buffered[1], buffered[2]
56 | else:
57 | buffered[0] = state['step']
58 | beta2_t = beta2 ** state['step']
59 | N_sma_max = 2 / (1 - beta2) - 1
60 | N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t)
61 | buffered[1] = N_sma
62 |
63 | # more conservative since it's an approximated value
64 | if N_sma >= 5:
65 | radam_step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
66 | else:
67 | radam_step_size = 1.0 / (1 - beta1 ** state['step'])
68 | buffered[2] = radam_step_size
69 |
70 | if group['weight_decay'] != 0:
71 | p_data_fp32.add_(p_data_fp32, alpha=-group['weight_decay'] * group['lr'])
72 |
73 | # more conservative since it's an approximated value
74 | radam_step = p_data_fp32.clone()
75 | if N_sma >= 5:
76 | denom = exp_avg_sq.sqrt().add_(group['eps'])
77 |                     radam_step.addcdiv_(exp_avg, denom, value=-radam_step_size * group['lr'])
78 | else:
79 | radam_step.add_(exp_avg, alpha=-radam_step_size * group['lr'])
80 |
81 | radam_norm = radam_step.pow(2).sum().sqrt()
82 | weight_norm = p.data.pow(2).sum().sqrt().clamp(0, 10)
83 | if weight_norm == 0 or radam_norm == 0:
84 | trust_ratio = 1
85 | else:
86 | trust_ratio = weight_norm / radam_norm
87 |
88 | state['weight_norm'] = weight_norm
89 | state['adam_norm'] = radam_norm
90 | state['trust_ratio'] = trust_ratio
91 |
92 | if N_sma >= 5:
93 |                     p_data_fp32.addcdiv_(exp_avg, denom, value=-radam_step_size * group['lr'] * trust_ratio)
94 |                 else:
95 |                     p_data_fp32.add_(exp_avg, alpha=-radam_step_size * group['lr'] * trust_ratio)
96 |
97 | p.data.copy_(p_data_fp32)
98 |
99 | return loss
100 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/optim/rangerlars.py:
--------------------------------------------------------------------------------
1 | import torch, math
2 | from torch.optim.optimizer import Optimizer
3 | import itertools as it
4 | from .lookahead import *
5 | from .ralamb import *
6 |
7 | # RAdam + LARS + LookAHead
8 |
9 | # Lookahead implementation from https://github.com/lonePatient/lookahead_pytorch/blob/master/optimizer.py
10 | # RAdam + LARS implementation from https://gist.github.com/redknightlois/c4023d393eb8f92bb44b2ab582d7ec20
11 |
12 | def RangerLars(params, alpha=0.5, k=6, *args, **kwargs):
13 | ralamb = Ralamb(params, *args, **kwargs)
14 | return Lookahead(ralamb, alpha, k)
15 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/optim/sched.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | optimizer learning rate scheduling helpers
6 | """
7 | from math import ceil
8 |
9 |
10 | def noam_schedule(step, warmup_step=4000):
11 | """ original Transformer schedule"""
12 | if step <= warmup_step:
13 | return step / warmup_step
14 | return (warmup_step ** 0.5) * (step ** -0.5)
15 |
16 |
17 | def warmup_linear(step, warmup_step, tot_step):
18 | """ BERT schedule """
19 | if step < warmup_step:
20 | return step / warmup_step
21 | return max(0, (tot_step-step)/(tot_step-warmup_step))
22 |
23 |
24 | def get_lr_sched(global_step, opts):
25 | # learning rate scheduling
26 | lr_this_step = opts.learning_rate * warmup_linear(
27 | global_step, opts.warmup_steps, opts.num_train_steps)
28 | if lr_this_step <= 0:
29 | lr_this_step = 1e-8
30 | return lr_this_step
31 |
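A worked example of the warmup-then-linear-decay schedule (numbers taken from `configs/training_args.json`: peak lr 5e-05, 10k warmup steps, 200k total steps):

```python
from optim.sched import warmup_linear  # assumed import path

base_lr, warmup, total = 5e-5, 10000, 200000
for step in (1000, 10000, 100000, 200000):
    print(step, base_lr * warmup_linear(step, warmup, total))
# 1000   -> 5.0e-06  (still warming up: step / warmup)
# 10000  -> 5.0e-05  (peak)
# 100000 -> ~2.6e-05 (linear decay toward 0)
# 200000 -> 0.0      (get_lr_sched would clamp this to 1e-8)
```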
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/parser.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import sys
3 | import json
4 |
5 |
6 | def load_parser():
7 | parser = argparse.ArgumentParser()
8 |
9 | # Required parameters
10 | # NOTE: train tasks and val tasks cannot take command line arguments
11 | parser.add_argument('--vlnbert', choices=['cmt', 'mmt', 'causal.cmt', 'cmt3'])
12 | parser.add_argument(
13 | "--model_config", type=str, help="path to model structure config json"
14 | )
15 | parser.add_argument(
16 | "--checkpoint", default=None, type=str, help="path to model checkpoint (*.pt)"
17 | )
18 |
19 | parser.add_argument(
20 | "--output_dir",
21 | default=None,
22 | type=str,
23 | help="The output directory where the model checkpoints will be written.",
24 | )
25 |
26 | # training parameters
27 | parser.add_argument(
28 | "--train_batch_size",
29 | default=4096,
30 | type=int,
31 | help="Total batch size for training. ",
32 | )
33 | parser.add_argument(
34 | "--val_batch_size",
35 | default=4096,
36 | type=int,
37 | help="Total batch size for validation. ",
38 | )
39 | parser.add_argument(
40 | "--gradient_accumulation_steps",
41 | type=int,
42 | default=16,
43 | help="Number of updates steps to accumualte before "
44 | "performing a backward/update pass.",
45 | )
46 | parser.add_argument(
47 | "--learning_rate",
48 | default=3e-5,
49 | type=float,
50 | help="The initial learning rate for Adam.",
51 | )
52 | parser.add_argument(
53 | "--valid_steps", default=1000, type=int, help="Run validation every X steps"
54 | )
55 | parser.add_argument("--log_steps", default=1000, type=int)
56 | parser.add_argument(
57 | "--num_train_steps",
58 | default=100000,
59 | type=int,
60 | help="Total number of training updates to perform.",
61 | )
62 | parser.add_argument(
63 | "--optim",
64 | default="adamw",
65 | choices=["adam", "adamax", "adamw"],
66 | help="optimizer",
67 | )
68 | parser.add_argument(
69 | "--betas", default=[0.9, 0.98], nargs="+", help="beta for adam optimizer"
70 | )
71 | parser.add_argument(
72 | "--dropout", default=0.1, type=float, help="tune dropout regularization"
73 | )
74 | parser.add_argument(
75 | "--weight_decay",
76 | default=0.01,
77 | type=float,
78 | help="weight decay (L2) regularization",
79 | )
80 | parser.add_argument(
81 | "--grad_norm",
82 | default=2.0,
83 | type=float,
84 | help="gradient clipping (-1 for no clipping)",
85 | )
86 | parser.add_argument(
87 | "--warmup_steps",
88 | default=10000,
89 | type=int,
90 | help="Number of training steps to perform linear " "learning rate warmup for.",
91 | )
92 |
93 | # device parameters
94 | parser.add_argument(
95 | "--seed", type=int, default=0, help="random seed for initialization"
96 | )
97 | parser.add_argument(
98 | "--fp16",
99 | action="store_true",
100 | help="Whether to use 16-bit float precision instead of 32-bit",
101 | )
102 | parser.add_argument(
103 | "--n_workers", type=int, default=4, help="number of data workers"
104 | )
105 | parser.add_argument("--pin_mem", action="store_true", help="pin memory")
106 |
107 | # distributed computing
108 | parser.add_argument(
109 | "--local_rank",
110 | type=int,
111 | default=-1,
112 | help="local rank for distributed training on gpus",
113 | )
114 | parser.add_argument(
115 | "--node_rank",
116 | type=int,
117 | default=0,
118 | help="Id of the node",
119 | )
120 | parser.add_argument(
121 | "--world_size",
122 | type=int,
123 | default=1,
124 | help="Number of GPUs across all nodes",
125 | )
126 |
127 | # can use config files
128 | parser.add_argument("--config", required=True, help="JSON config files")
129 |
130 | return parser
131 |
132 |
133 | def parse_with_config(parser):
134 | args = parser.parse_args()
135 | if args.config is not None:
136 | config_args = json.load(open(args.config))
137 | override_keys = {
138 | arg[2:].split("=")[0] for arg in sys.argv[1:] if arg.startswith("--")
139 | }
140 | for k, v in config_args.items():
141 | if k not in override_keys:
142 | setattr(args, k, v)
143 | del args.config
144 | return args
145 |
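How the two-level configuration works in practice (paths and the overridden value below are only examples): every key in the JSON passed via `--config` is copied onto `args`, except keys that were also given explicitly on the command line, which keep their command-line values.

```python
# Sketch assuming it is run from pretrain_src/, so the local parser.py is imported.
import sys
from parser import load_parser, parse_with_config

sys.argv = ['train_hm3d_reverie.py',
            '--config', 'configs/training_args.json',
            '--learning_rate', '1e-4']            # CLI value wins over the 5e-05 in the JSON
args = parse_with_config(load_parser())
print(args.learning_rate)                         # 0.0001
print(args.train_batch_size)                      # 128, taken from the JSON config
```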
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/submit_reverie.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | #SBATCH --gres=gpu:1
4 | #SBATCH --qos=gamma_access
5 | #SBATCH -p gamma-gpu
6 | #SBATCH --wrap='hostname'
7 | #SBATCH --mem=64G
8 | #SBATCH --time=2-0:00:00
9 |
10 | # module add cuda/11.2
11 | # module add gcc/9.1.0
12 | # module add nccl/1.3.3
13 |
14 | # CUDA_VISIBLE_DEVICES=$1
15 | # python train_hm3d_reverie.py --world_size 4
16 |
17 | export NCCL_IB_DISABLE=1; export NCCL_P2P_DISABLE=1;
18 | CUDA_VISIBLE_DEVICES=0 python -u train_hm3d_reverie.py \
19 | --vlnbert cmt \
20 | --model_config configs/model_config.json \
21 | --config configs/training_args.json \
22 | --output_dir ../datasets/REVERIE/expr_duet/pretrain/hm3d_rvr
23 | # --model_config config/hm3d_reverie_obj_model_config.json \
24 | # --config config/hm3d_reverie_obj_pretrain_text.clip.json \
25 | # --output_dir ../datasets/REVERIE/expr_duet/pretrain/agent7_text.clip.fix_nomlm
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/test/test_dataset.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 |
4 | from data.dataset import ReverieTextPathData
5 |
6 | if __name__ == '__main__':
7 | # test
8 | data_dir = '/sequoia/data3/shichen/datasets'
9 | train_traj_files = [os.path.join(data_dir, "REVERIE/annotations/pretrain/REVERIE_train_enc.jsonl")]
10 | connectivity_dir = os.path.join(data_dir, "R2R/connectivity")
11 | img_ft_file = os.path.join(data_dir, "R2R/features/pth_vit_base_patch16_224_imagenet.hdf5")
12 | obj_ft_file = os.path.join(data_dir, "REVERIE/features/obj.avg.top3.min80_vit_base_patch16_224_imagenet.hdf5")
13 | scanvp_cands_file = os.path.join(data_dir, "R2R/annotations/scanvp_candview_relangles.json")
14 |
15 | train_nav_db = ReverieTextPathData(
16 | train_traj_files, img_ft_file, obj_ft_file,
17 | scanvp_cands_file, connectivity_dir,
18 | image_prob_size=1000, image_feat_size=768, angle_feat_size=4,
19 | obj_feat_size=768, obj_prob_size=1000,
20 | max_txt_len=100, max_objects=20, in_memory=True
21 | )
22 | print(len(train_nav_db))
23 |
24 | print('\n\npos')
25 | print(train_nav_db.get_input(0, 'pos', return_act_label=True, return_img_probs=True, return_obj_label=True))
26 |
27 | print('\n\nneg_in_gt_path')
28 | print(train_nav_db.get_input(0, 'neg_in_gt_path', return_act_label=True))
29 |
30 | print('\n\nneg_others')
31 | print(train_nav_db.get_input(0, 'neg_others', return_act_label=True))
32 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/test/test_tasks.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 |
4 | from data.dataset import ReverieTextPathData
5 | from data.tasks import MlmDataset, mlm_collate
6 |
7 | import torch
8 | from torch.utils.data import DataLoader
9 | from transformers import AutoTokenizer
10 |
11 | if __name__ == '__main__':
12 | # test
13 | data_dir = '/sequoia/data3/shichen/datasets'
14 | train_traj_files = [os.path.join(data_dir, "REVERIE/annotations/pretrain/REVERIE_train_enc.jsonl")]
15 | connectivity_dir = os.path.join(data_dir, "R2R/connectivity")
16 | img_ft_file = os.path.join(data_dir, "R2R/features/pth_vit_base_patch16_224_imagenet.hdf5")
17 | obj_ft_file = os.path.join(data_dir, "REVERIE/features/obj.avg.top3.min80_vit_base_patch16_224_imagenet.hdf5")
18 | scanvp_cands_file = os.path.join(data_dir, "R2R/annotations/scanvp_candview_relangles.json")
19 |
20 | train_nav_db = ReverieTextPathData(
21 | train_traj_files, img_ft_file, obj_ft_file,
22 | scanvp_cands_file, connectivity_dir,
23 | image_prob_size=1000, image_feat_size=768, angle_feat_size=4,
24 | obj_feat_size=768, obj_prob_size=1000,
25 | max_txt_len=100, max_objects=20, in_memory=True
26 | )
27 |
28 | tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
29 | dataset = MlmDataset(train_nav_db, tokenizer)
30 |
31 | loader = DataLoader(dataset, batch_size=2, shuffle=False, collate_fn=mlm_collate)
32 |
33 | for batch in loader:
34 | for key, value in batch.items():
35 | print(key)
36 | if isinstance(value, torch.Tensor):
37 | print(value.size())
38 | break
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/test/test_vilmodel.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import json
4 | from easydict import EasyDict
5 |
6 | from data.dataset import ReverieTextPathData
7 | from data.tasks import MlmDataset, mlm_collate
8 | from model.vilmodel import GlocalTextPathCMT
9 |
10 | import torch
11 | from torch.utils.data import DataLoader
12 | from transformers import AutoTokenizer, PretrainedConfig
13 |
14 | if __name__ == '__main__':
15 | # test
16 | data_dir = '/sequoia/data3/shichen/datasets'
17 | train_traj_files = [os.path.join(data_dir, "REVERIE/annotations/pretrain/REVERIE_train_enc.jsonl")]
18 | connectivity_dir = os.path.join(data_dir, "R2R/connectivity")
19 | img_ft_file = os.path.join(data_dir, "R2R/features/pth_vit_base_patch16_224_imagenet.hdf5")
20 | obj_ft_file = os.path.join(data_dir, "REVERIE/features/obj.avg.top3.min80_vit_base_patch16_224_imagenet.hdf5")
21 | scanvp_cands_file = os.path.join(data_dir, "R2R/annotations/scanvp_candview_relangles.json")
22 |
23 | model_config = PretrainedConfig.from_json_file('config/reverie_obj_model_config.json')
24 | model = GlocalTextPathCMT(model_config).cuda()
25 |
26 | train_nav_db = ReverieTextPathData(
27 | train_traj_files, img_ft_file, obj_ft_file,
28 | scanvp_cands_file, connectivity_dir,
29 | image_prob_size=1000, image_feat_size=768, angle_feat_size=4,
30 | obj_feat_size=768, obj_prob_size=1000,
31 | max_txt_len=100, max_objects=20, in_memory=True
32 | )
33 |
34 | tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
35 | dataset = MlmDataset(train_nav_db, tokenizer)
36 | loader = DataLoader(dataset, batch_size=2, shuffle=False, collate_fn=mlm_collate)
37 |
38 | for batch in loader:
39 | for key, value in batch.items():
40 | print(key)
41 | if isinstance(value, torch.Tensor):
42 | batch[key] = value.cuda()
43 | print('\t', value.size())
44 | txt_embeds = model.forward_mlm(
45 | batch['txt_ids'], batch['txt_lens'], batch['traj_view_img_fts'],
46 | batch['traj_obj_img_fts'], batch['traj_loc_fts'], batch['traj_nav_types'],
47 | batch['traj_step_lens'], batch['traj_vp_view_lens'], batch['traj_vp_obj_lens'],
48 | batch['traj_vpids'], batch['traj_cand_vpids'],
49 | batch['gmap_lens'], batch['gmap_step_ids'], batch['gmap_pos_fts'],
50 | batch['gmap_pair_dists'], batch['gmap_vpids'], batch['vp_pos_fts'],
51 | )
52 | print(txt_embeds)
53 | s = txt_embeds.sum()
54 | s.backward()
55 | print(s)
56 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wz0919/ScaleVLN/1189fe898462e2e10908631070bcf2d4ec2204b2/VLN-DUET-RVR/pretrain_src/utils/__init__.py
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/utils/distributed.py:
--------------------------------------------------------------------------------
1 | """
2 | Distributed tools
3 | """
4 | import os
5 | from pathlib import Path
6 | from pprint import pformat
7 | import pickle
8 |
9 | import torch
10 | import torch.distributed as dist
11 |
12 |
13 | DEFAULT_PORT = 8738
14 | DEFAULT_PORT_RANGE = 127
15 | # Default address of world rank 0
16 | DEFAULT_MASTER_ADDR = "127.0.0.1"
17 | SLURM_JOBID = os.environ.get("SLURM_JOB_ID", None)
18 |
19 | def parse_ip(s):
20 | s = s.split("-")
21 | s = [y for x in s for y in x.split("[") if y]
22 | s = [y for x in s for y in x.split(",") if y ]
23 |
24 | return ".".join(s[2:6])
25 |
26 | def load_init_param(opts):
27 | # import pdb;pdb.set_trace()
28 | """
29 | Load parameters for the rendezvous distributed procedure
30 | """
31 | # # sync file
32 | # if opts.output_dir != "":
33 | # sync_dir = Path(opts.output_dir).resolve()
34 | # sync_dir.mkdir(parents=True, exist_ok=True)
35 | # sync_file = f"{sync_dir}/.torch_distributed_sync"
36 | # else:
37 | # raise RuntimeError("Can't find any sync dir")
38 |
39 | # # world size
40 | # if opts.world_size != -1:
41 | # world_size = opts.world_size
42 | # elif os.environ.get("WORLD_SIZE", "") != "":
43 | # world_size = int(os.environ["WORLD_SIZE"])
44 | # else:
45 | # raise RuntimeError("Can't find any world size")
46 |
47 | # # rank
48 | # if os.environ.get("RANK", "") != "":
49 | # # pytorch.distributed.launch provide this variable no matter what
50 | # rank = int(os.environ["RANK"])
51 | # else:
52 | # # if not provided, calculate the gpu rank
53 | # if opts.node_rank != -1:
54 | # node_rank = opts.node_rank
55 | # elif os.environ.get("NODE_RANK", "") != "":
56 | # node_rank = int(os.environ["NODE_RANK"])
57 | # else:
58 | # raise RuntimeError("Can't find any rank or node rank")
59 |
60 | # if opts.local_rank != -1:
61 | # local_rank = opts.local_rank
62 | # elif os.environ.get("LOCAL_RANK", "") != "":
63 | # local_rank = int(os.environ["LOCAL_RANK"])
64 | # else:
65 | # raise RuntimeError("Can't find any rank or local rank")
66 |
67 | # # WARNING: this assumes that each node has the same number of GPUs
68 | # n_gpus = torch.cuda.device_count()
69 | # rank = local_rank + node_rank * n_gpus
70 | # opts.rank = rank
71 |
72 | # Check to see if we should parse from torch.distributed.launch
73 | if os.environ.get("LOCAL_RANK", None) is not None:
74 | local_rank = int(os.environ["LOCAL_RANK"])
75 | world_rank = int(os.environ["RANK"])
76 | world_size = int(os.environ["WORLD_SIZE"])
77 |     # Else parse from SLURM if using SLURM
78 | elif SLURM_JOBID is not None:
79 | local_rank = int(os.environ["SLURM_LOCALID"])
80 | world_rank = int(os.environ["SLURM_PROCID"])
81 | world_size = int(os.environ["SLURM_NTASKS"])
82 | # Otherwise setup for just 1 process, this is nice for testing
83 | else:
84 | local_rank = 0
85 | world_rank = 0
86 | world_size = 1
87 |
88 | opts.local_rank = local_rank
89 | opts.rank = world_rank
90 | opts.world_size = world_size
91 |
92 | print("tcp://{}:{}".format(
93 | parse_ip(os.environ['SLURM_STEP_NODELIST']), "9998"))
94 | return {
95 | "backend": "nccl",
96 | "init_method": "tcp://{}:{}".format(
97 | parse_ip(os.environ['SLURM_STEP_NODELIST']), "9998"),
98 | "rank": world_rank,
99 | "world_size": world_size,
100 | }
101 |
102 |
103 | def init_distributed(opts):
104 | init_param = load_init_param(opts)
105 | rank = init_param["rank"]
106 |
107 | print(f"Init distributed {init_param['rank']} - {init_param['world_size']}")
108 |
109 | dist.init_process_group(**init_param)
110 |
111 |
112 | def is_default_gpu(opts) -> bool:
113 | return opts.local_rank == -1 or dist.get_rank() == 0
114 |
115 |
116 |
117 | def is_dist_avail_and_initialized():
118 | if not dist.is_available():
119 | return False
120 | if not dist.is_initialized():
121 | return False
122 | return True
123 |
124 | def get_world_size():
125 | if not is_dist_avail_and_initialized():
126 | return 1
127 | return dist.get_world_size()
128 |
129 | def all_gather(data):
130 | """
131 | Run all_gather on arbitrary picklable data (not necessarily tensors)
132 | Args:
133 | data: any picklable object
134 | Returns:
135 | list[data]: list of data gathered from each rank
136 | """
137 | world_size = get_world_size()
138 | if world_size == 1:
139 | return [data]
140 |
141 | # serialized to a Tensor
142 | buffer = pickle.dumps(data)
143 | storage = torch.ByteStorage.from_buffer(buffer)
144 | tensor = torch.ByteTensor(storage).to("cuda")
145 |
146 | # obtain Tensor size of each rank
147 | local_size = torch.tensor([tensor.numel()], device="cuda")
148 | size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
149 | dist.all_gather(size_list, local_size)
150 | size_list = [int(size.item()) for size in size_list]
151 | max_size = max(size_list)
152 |
153 | # receiving Tensor from all ranks
154 | # we pad the tensor because torch all_gather does not support
155 | # gathering tensors of different shapes
156 | tensor_list = []
157 | for _ in size_list:
158 | tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
159 | if local_size != max_size:
160 | padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda")
161 | tensor = torch.cat((tensor, padding), dim=0)
162 | dist.all_gather(tensor_list, tensor)
163 |
164 | data_list = []
165 | for size, tensor in zip(size_list, tensor_list):
166 | buffer = tensor.cpu().numpy().tobytes()[:size]
167 | data_list.append(pickle.loads(buffer))
168 |
169 | return data_list
170 |
171 |
172 | def reduce_dict(input_dict, average=True):
173 | """
174 | Args:
175 | input_dict (dict): all the values will be reduced
176 | average (bool): whether to do average or sum
177 | Reduce the values in the dictionary from all processes so that all processes
178 | have the averaged results. Returns a dict with the same fields as
179 | input_dict, after reduction.
180 | """
181 | world_size = get_world_size()
182 | if world_size < 2:
183 | return input_dict
184 | with torch.no_grad():
185 | names = []
186 | values = []
187 | # sort the keys so that they are consistent across processes
188 | for k in sorted(input_dict.keys()):
189 | names.append(k)
190 | values.append(input_dict[k])
191 | values = torch.stack(values, dim=0)
192 | dist.all_reduce(values)
193 | if average:
194 | values /= world_size
195 | reduced_dict = {k: v for k, v in zip(names, values)}
196 | return reduced_dict
197 |
198 |
199 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/utils/logger.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | helper for logging
6 | NOTE: loggers are global objects; use with caution
7 | """
8 | import logging
9 | import math
10 |
11 | import tensorboardX
12 |
13 |
14 | _LOG_FMT = '%(asctime)s - %(levelname)s - %(name)s - %(message)s'
15 | _DATE_FMT = '%m/%d/%Y %H:%M:%S'
16 | logging.basicConfig(format=_LOG_FMT, datefmt=_DATE_FMT, level=logging.INFO)
17 | LOGGER = logging.getLogger('__main__') # this is the global logger
18 |
19 |
20 | def add_log_to_file(log_path):
21 | fh = logging.FileHandler(log_path)
22 | formatter = logging.Formatter(_LOG_FMT, datefmt=_DATE_FMT)
23 | fh.setFormatter(formatter)
24 | LOGGER.addHandler(fh)
25 |
26 |
27 | class TensorboardLogger(object):
28 | def __init__(self):
29 | self._logger = None
30 | self._global_step = 0
31 |
32 | def create(self, path):
33 | self._logger = tensorboardX.SummaryWriter(path)
34 |
35 | def noop(self, *args, **kwargs):
36 | return
37 |
38 | def step(self):
39 | self._global_step += 1
40 |
41 | @property
42 | def global_step(self):
43 | return self._global_step
44 |
45 | def log_scalar_dict(self, log_dict, prefix=''):
46 | """ log a dictionary of scalar values"""
47 | if self._logger is None:
48 | return
49 | if prefix:
50 | prefix = f'{prefix}_'
51 | for name, value in log_dict.items():
52 | if isinstance(value, dict):
53 |                 self.log_scalar_dict(value,
54 |                                      prefix=f'{prefix}{name}')
55 | else:
56 | self._logger.add_scalar(f'{prefix}{name}', value,
57 | self._global_step)
58 |
59 | def __getattr__(self, name):
60 | if self._logger is None:
61 | return self.noop
62 | return self._logger.__getattribute__(name)
63 |
64 |
65 | TB_LOGGER = TensorboardLogger()
66 |
67 |
68 | class RunningMeter(object):
69 | """ running meteor of a scalar value
70 | (useful for monitoring training loss)
71 | """
72 | def __init__(self, name, val=None, smooth=0.99):
73 | self._name = name
74 | self._sm = smooth
75 | self._val = val
76 |
77 | def __call__(self, value):
78 | val = (value if self._val is None
79 | else value*(1-self._sm) + self._val*self._sm)
80 | if not math.isnan(val):
81 | self._val = val
82 |
83 | def __str__(self):
84 | return f'{self._name}: {self._val:.4f}'
85 |
86 | @property
87 | def val(self):
88 | if self._val is None:
89 | return 0
90 | return self._val
91 |
92 | @property
93 | def name(self):
94 | return self._name
95 |
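A toy illustration of `RunningMeter`'s exponential smoothing (the smoothing factor and loss values are made up):

```python
from utils.logger import RunningMeter  # assumed import path

meter = RunningMeter('mlm_loss', smooth=0.9)
for loss in (2.0, 1.0, 0.5):
    meter(loss)            # val <- 0.9 * old_val + 0.1 * new_value (after the first update)
print(meter)               # mlm_loss: 1.7600
```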
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/utils/misc.py:
--------------------------------------------------------------------------------
1 | import random
2 | import numpy as np
3 | from typing import Tuple, Union, Dict, Any
4 |
5 | import os
6 | import torch
7 | import torch.distributed as dist
8 | from torch.nn.parallel import DistributedDataParallel as DDP
9 |
10 | from .distributed import init_distributed
11 | from .logger import LOGGER
12 |
13 |
14 | def set_random_seed(seed):
15 | random.seed(seed)
16 | np.random.seed(seed)
17 | torch.manual_seed(seed)
18 | torch.cuda.manual_seed_all(seed)
19 |
20 | def set_dropout(model, drop_p):
21 | for name, module in model.named_modules():
22 | # we might want to tune dropout for smaller dataset
23 | if isinstance(module, torch.nn.Dropout):
24 | if module.p != drop_p:
25 | module.p = drop_p
26 | LOGGER.info(f'{name} set to {drop_p}')
27 |
28 |
29 | def set_cuda(opts) -> Tuple[bool, int, torch.device]:
30 | """
31 | Initialize CUDA for distributed computing
32 | """
33 | if not torch.cuda.is_available():
34 | assert opts.local_rank == -1, opts.local_rank
35 | return True, 0, torch.device("cpu")
36 |
37 | # get device settings
38 | if opts.local_rank != -1:
39 | init_distributed(opts)
40 | torch.cuda.set_device(opts.local_rank)
41 | device = torch.device("cuda", opts.local_rank)
42 | n_gpu = 1
43 | default_gpu = dist.get_rank() == 0
44 | if default_gpu:
45 | LOGGER.info(f"Found {dist.get_world_size()} GPUs")
46 | elif os.environ.get("SLURM_JOBID", None) is not None:
47 | init_distributed(opts)
48 | torch.cuda.set_device(opts.local_rank)
49 | device = torch.device("cuda", opts.local_rank)
50 | n_gpu = 1
51 | default_gpu = dist.get_rank() == 0
52 | if default_gpu:
53 | LOGGER.info(f"Found {dist.get_world_size()} GPUs")
54 | else:
55 | default_gpu = True
56 | device = torch.device("cuda")
57 | n_gpu = torch.cuda.device_count()
58 |
59 | return default_gpu, n_gpu, device
60 |
61 |
62 | def wrap_model(
63 | model: torch.nn.Module, device: torch.device, local_rank: int
64 | ) -> torch.nn.Module:
65 | model.to(device)
66 |
67 | if local_rank != -1:
68 | model = DDP(model, device_ids=[local_rank], find_unused_parameters=True)
69 | # At the time of DDP wrapping, parameters and buffers (i.e., model.state_dict())
70 | # on rank0 are broadcasted to all other ranks.
71 | elif torch.cuda.device_count() > 1:
72 | LOGGER.info("Using data parallel")
73 | model = torch.nn.DataParallel(model)
74 |
75 | return model
76 |
77 |
78 | class NoOp(object):
79 | """ useful for distributed training No-Ops """
80 | def __getattr__(self, name):
81 | return self.noop
82 |
83 | def noop(self, *args, **kwargs):
84 | return
85 |
86 |
--------------------------------------------------------------------------------
/VLN-DUET-RVR/pretrain_src/utils/save.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | saving utilities
6 | """
7 | import json
8 | import os
9 | import torch
10 |
11 |
12 | def save_training_meta(args):
13 | os.makedirs(os.path.join(args.output_dir, 'logs'), exist_ok=True)
14 | os.makedirs(os.path.join(args.output_dir, 'ckpts'), exist_ok=True)
15 |
16 | with open(os.path.join(args.output_dir, 'logs', 'training_args.json'), 'w') as writer:
17 | json.dump(vars(args), writer, indent=4)
18 | model_config = json.load(open(args.model_config))
19 | with open(os.path.join(args.output_dir, 'logs', 'model_config.json'), 'w') as writer:
20 | json.dump(model_config, writer, indent=4)
21 |
22 |
23 | class ModelSaver(object):
24 | def __init__(self, output_dir, prefix='model_step', suffix='pt'):
25 | self.output_dir = output_dir
26 | self.prefix = prefix
27 | self.suffix = suffix
28 |
29 | def save(self, model, step, optimizer=None):
30 | output_model_file = os.path.join(self.output_dir,
31 | f"{self.prefix}_{step}.{self.suffix}")
32 | state_dict = {}
33 | for k, v in model.state_dict().items():
34 | if k.startswith('module.'):
35 | k = k[7:]
36 | if isinstance(v, torch.Tensor):
37 | state_dict[k] = v.cpu()
38 | else:
39 | state_dict[k] = v
40 | torch.save(state_dict, output_model_file)
41 | if optimizer is not None:
42 | dump = {'step': step, 'optimizer': optimizer.state_dict()}
43 | if hasattr(optimizer, '_amp_stash'):
44 | pass # TODO fp16 optimizer
45 | torch.save(dump, f'{self.output_dir}/train_state_{step}.pt')
46 |
47 |
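A minimal sketch of ModelSaver; the output directory and toy model are placeholders, and the import path is an assumption. Note that ModelSaver does not create the directory itself, hence the makedirs call:

    import os
    import torch
    from utils.save import ModelSaver  # import path assumed

    out_dir = '/tmp/ckpts'             # placeholder path
    os.makedirs(out_dir, exist_ok=True)
    saver = ModelSaver(out_dir)
    saver.save(torch.nn.Linear(4, 4), step=1000)   # -> /tmp/ckpts/model_step_1000.pt
    state_dict = torch.load(os.path.join(out_dir, 'model_step_1000.pt'))  # CPU tensors, no 'module.' prefix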
--------------------------------------------------------------------------------
/VLN-DUET/datasets/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wz0919/ScaleVLN/1189fe898462e2e10908631070bcf2d4ec2204b2/VLN-DUET/datasets/.gitkeep
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/models/graph_utils.py:
--------------------------------------------------------------------------------
1 | from collections import defaultdict
2 | import numpy as np
3 |
4 | MAX_DIST = 30
5 | MAX_STEP = 10
6 |
7 | def calc_position_distance(a, b):
8 | # a, b: (x, y, z)
9 | dx = b[0] - a[0]
10 | dy = b[1] - a[1]
11 | dz = b[2] - a[2]
12 | dist = np.sqrt(dx**2 + dy**2 + dz**2)
13 | return dist
14 |
15 | def calculate_vp_rel_pos_fts(a, b, base_heading=0, base_elevation=0):
16 | # a, b: (x, y, z)
17 | dx = b[0] - a[0]
18 | dy = b[1] - a[1]
19 | dz = b[2] - a[2]
20 | xy_dist = max(np.sqrt(dx**2 + dy**2), 1e-8)
21 | xyz_dist = max(np.sqrt(dx**2 + dy**2 + dz**2), 1e-8)
22 |
23 |     # the simulator's api is weird (x-y axis is transposed)
24 | heading = np.arcsin(dx/xy_dist) # [-pi/2, pi/2]
25 | if b[1] < a[1]:
26 | heading = np.pi - heading
27 | heading -= base_heading
28 |
29 | elevation = np.arcsin(dz/xyz_dist) # [-pi/2, pi/2]
30 | elevation -= base_elevation
31 |
32 | return heading, elevation, xyz_dist
33 |
34 | def get_angle_fts(headings, elevations, angle_feat_size):
35 | ang_fts = [np.sin(headings), np.cos(headings), np.sin(elevations), np.cos(elevations)]
36 | ang_fts = np.vstack(ang_fts).transpose().astype(np.float32)
37 | num_repeats = angle_feat_size // 4
38 | if num_repeats > 1:
39 | ang_fts = np.concatenate([ang_fts] * num_repeats, 1)
40 | return ang_fts
41 |
42 |
43 | class FloydGraph(object):
44 | def __init__(self):
45 | self._dis = defaultdict(lambda :defaultdict(lambda: 95959595))
46 | self._point = defaultdict(lambda :defaultdict(lambda: ""))
47 | self._visited = set()
48 |
49 | def distance(self, x, y):
50 | if x == y:
51 | return 0
52 | else:
53 | return self._dis[x][y]
54 |
55 | def add_edge(self, x, y, dis):
56 | if dis < self._dis[x][y]:
57 | self._dis[x][y] = dis
58 | self._dis[y][x] = dis
59 | self._point[x][y] = ""
60 | self._point[y][x] = ""
61 |
62 | def update(self, k):
63 | for x in self._dis:
64 | for y in self._dis:
65 | if x != y:
66 | if self._dis[x][k] + self._dis[k][y] < self._dis[x][y]:
67 | self._dis[x][y] = self._dis[x][k] + self._dis[k][y]
68 | self._dis[y][x] = self._dis[x][y]
69 | self._point[x][y] = k
70 | self._point[y][x] = k
71 | self._visited.add(k)
72 |
73 | def visited(self, k):
74 | return (k in self._visited)
75 |
76 | def path(self, x, y):
77 | """
78 | :param x: start
79 | :param y: end
80 | :return: the path from x to y [v1, v2, ..., v_n, y]
81 | """
82 | if x == y:
83 | return []
84 | if self._point[x][y] == "": # Direct edge
85 | return [y]
86 | else:
87 | k = self._point[x][y]
88 | # print(x, y, k)
89 | # for x1 in (x, k, y):
90 | # for x2 in (x, k, y):
91 | # print(x1, x2, "%.4f" % self._dis[x1][x2])
92 | return self.path(x, k) + self.path(k, y)
93 |
94 |
95 | class GraphMap(object):
96 | def __init__(self, start_vp):
97 | self.start_vp = start_vp # start viewpoint
98 |
99 | self.node_positions = {} # viewpoint to position (x, y, z)
100 | self.graph = FloydGraph() # shortest path graph
101 | self.node_embeds = {} # {viewpoint: feature (sum feature, count)}
102 | self.node_stop_scores = {} # {viewpoint: prob}
103 | self.node_nav_scores = {} # {viewpoint: {t: prob}}
104 | self.node_step_ids = {}
105 |
106 | def update_graph(self, ob):
107 | self.node_positions[ob['viewpoint']] = ob['position']
108 | for cc in ob['candidate']:
109 | self.node_positions[cc['viewpointId']] = cc['position']
110 | dist = calc_position_distance(ob['position'], cc['position'])
111 | self.graph.add_edge(ob['viewpoint'], cc['viewpointId'], dist)
112 | self.graph.update(ob['viewpoint'])
113 |
114 | def update_node_embed(self, vp, embed, rewrite=False):
115 | if rewrite:
116 | self.node_embeds[vp] = [embed, 1]
117 | else:
118 | if vp in self.node_embeds:
119 | self.node_embeds[vp][0] = self.node_embeds[vp][0] + embed
120 | self.node_embeds[vp][1] = self.node_embeds[vp][1] + 1
121 | else:
122 | self.node_embeds[vp] = [embed, 1]
123 |
124 | def get_node_embed(self, vp):
125 | return self.node_embeds[vp][0] / self.node_embeds[vp][1]
126 |
127 | def get_pos_fts(self, cur_vp, gmap_vpids, cur_heading, cur_elevation, angle_feat_size=4):
128 | # dim=7 (sin(heading), cos(heading), sin(elevation), cos(elevation),
129 | # line_dist, shortest_dist, shortest_step)
130 | rel_angles, rel_dists = [], []
131 | for vp in gmap_vpids:
132 | if vp is None:
133 | rel_angles.append([0, 0])
134 | rel_dists.append([0, 0, 0])
135 | else:
136 | rel_heading, rel_elevation, rel_dist = calculate_vp_rel_pos_fts(
137 | self.node_positions[cur_vp], self.node_positions[vp],
138 | base_heading=cur_heading, base_elevation=cur_elevation,
139 | )
140 | rel_angles.append([rel_heading, rel_elevation])
141 | rel_dists.append(
142 | [rel_dist / MAX_DIST, self.graph.distance(cur_vp, vp) / MAX_DIST, \
143 | len(self.graph.path(cur_vp, vp)) / MAX_STEP]
144 | )
145 | rel_angles = np.array(rel_angles).astype(np.float32)
146 | rel_dists = np.array(rel_dists).astype(np.float32)
147 | rel_ang_fts = get_angle_fts(rel_angles[:, 0], rel_angles[:, 1], angle_feat_size)
148 | return np.concatenate([rel_ang_fts, rel_dists], 1)
149 |
150 | def save_to_json(self):
151 | nodes = {}
152 | for vp, pos in self.node_positions.items():
153 | nodes[vp] = {
154 | 'location': pos, # (x, y, z)
155 | 'visited': self.graph.visited(vp),
156 | }
157 | if nodes[vp]['visited']:
158 | nodes[vp]['stop_prob'] = self.node_stop_scores[vp]['stop']
159 | nodes[vp]['og_objid'] = self.node_stop_scores[vp]['og']
160 | else:
161 | nodes[vp]['nav_prob'] = self.node_nav_scores[vp]
162 |
163 | edges = []
164 | for k, v in self.graph._dis.items():
165 | for kk in v.keys():
166 | edges.append((k, kk))
167 |
168 | return {'nodes': nodes, 'edges': edges}
169 |
170 |
171 |
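A toy sketch of the incremental Floyd-Warshall graph above; the viewpoint ids and distances are made up, and the import path is an assumption. Each update(k) relaxes all pairs through the newly visited node k, after which distance()/path() return shortest-path results:

    from models.graph_utils import FloydGraph  # import path assumed

    g = FloydGraph()
    g.add_edge('a', 'b', 1.0)
    g.add_edge('b', 'c', 2.0)
    for vp in ('a', 'b', 'c'):     # mark nodes as visited, relaxing paths through them
        g.update(vp)

    print(g.distance('a', 'c'))    # 3.0
    print(g.path('a', 'c'))        # ['b', 'c']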
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/models/model.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import collections
3 |
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 |
8 | from transformers import BertPreTrainedModel
9 |
10 | from .vlnbert_init import get_vlnbert_models
11 |
12 | class VLNBert(nn.Module):
13 | def __init__(self, args):
14 | super().__init__()
15 |         print('\nInitializing the VLN-BERT model ...')
16 | self.args = args
17 |
18 | self.vln_bert = get_vlnbert_models(args, config=None) # initialize the VLN-BERT
19 | self.drop_env = nn.Dropout(p=args.feat_dropout)
20 |
21 | def forward(self, mode, batch):
22 | batch = collections.defaultdict(lambda: None, batch)
23 |
24 | if mode == 'language':
25 | txt_embeds = self.vln_bert(mode, batch)
26 | return txt_embeds
27 |
28 | elif mode == 'panorama':
29 | batch['view_img_fts'] = self.drop_env(batch['view_img_fts'])
30 | if 'obj_img_fts' in batch:
31 | batch['obj_img_fts'] = self.drop_env(batch['obj_img_fts'])
32 | pano_embeds, pano_masks = self.vln_bert(mode, batch)
33 | return pano_embeds, pano_masks
34 |
35 | elif mode == 'navigation':
36 | outs = self.vln_bert(mode, batch)
37 | return outs
38 |
39 | else:
40 | raise NotImplementedError('wrong mode: %s'%mode)
41 |
42 |
43 | class Critic(nn.Module):
44 | def __init__(self, args):
45 | super(Critic, self).__init__()
46 | self.state2value = nn.Sequential(
47 | nn.Linear(768, 512),
48 | nn.ReLU(),
49 | nn.Dropout(args.dropout),
50 | nn.Linear(512, 1),
51 | )
52 |
53 | def forward(self, state):
54 | return self.state2value(state).squeeze()
55 |
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/models/ops.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | from .transformer import TransformerEncoder, TransformerEncoderLayer
4 |
5 | # try:
6 | # from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
7 | # except (ImportError, AttributeError) as e:
8 | # # logger.info("Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .")
9 | # BertLayerNorm = torch.nn.LayerNorm
10 | BertLayerNorm = torch.nn.LayerNorm
11 |
12 | def create_transformer_encoder(config, num_layers, norm=False):
13 | enc_layer = TransformerEncoderLayer(
14 | config.hidden_size, config.num_attention_heads,
15 | dim_feedforward=config.intermediate_size,
16 | dropout=config.hidden_dropout_prob,
17 | activation=config.hidden_act,
18 | normalize_before=True
19 | )
20 | if norm:
21 | norm_layer = BertLayerNorm(config.hidden_size, eps=1e-12)
22 | else:
23 | norm_layer = None
24 | return TransformerEncoder(enc_layer, num_layers, norm=norm_layer, batch_first=True)
25 |
26 | def extend_neg_masks(masks, dtype=None):
27 | """
28 | mask from (N, L) into (N, 1(H), 1(L), L) and make it negative
29 | """
30 | if dtype is None:
31 | dtype = torch.float
32 | extended_masks = masks.unsqueeze(1).unsqueeze(2)
33 | extended_masks = extended_masks.to(dtype=dtype)
34 | extended_masks = (1.0 - extended_masks) * -10000.0
35 | return extended_masks
36 |
37 | def gen_seq_masks(seq_lens, max_len=None):
38 | if max_len is None:
39 | max_len = max(seq_lens)
40 | batch_size = len(seq_lens)
41 | device = seq_lens.device
42 |
43 | masks = torch.arange(max_len).unsqueeze(0).repeat(batch_size, 1).to(device)
44 | masks = masks < seq_lens.unsqueeze(1)
45 | return masks
46 |
47 | def pad_tensors_wgrad(tensors, lens=None):
48 | """B x [T, ...] torch tensors"""
49 | if lens is None:
50 | lens = [t.size(0) for t in tensors]
51 | max_len = max(lens)
52 | batch_size = len(tensors)
53 | hid = list(tensors[0].size()[1:])
54 |
55 | device = tensors[0].device
56 | dtype = tensors[0].dtype
57 |
58 | output = []
59 | for i in range(batch_size):
60 | if lens[i] < max_len:
61 | tmp = torch.cat(
62 | [tensors[i], torch.zeros([max_len-lens[i]]+hid, dtype=dtype).to(device)],
63 | dim=0
64 | )
65 | else:
66 | tmp = tensors[i]
67 | output.append(tmp)
68 | output = torch.stack(output, 0)
69 | return output
70 |
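A small sketch of the mask/padding helpers above on toy tensors (import path assumed): gen_seq_masks marks valid positions, extend_neg_masks turns them into additive attention masks, and pad_tensors_wgrad zero-pads a list of variable-length tensors while keeping gradients:

    import torch
    from models.ops import gen_seq_masks, extend_neg_masks, pad_tensors_wgrad  # path assumed

    lens = torch.tensor([2, 3])
    masks = gen_seq_masks(lens, max_len=3)   # (2, 3) bool, True where position < length
    neg = extend_neg_masks(masks)            # (2, 1, 1, 3); padded positions become -10000.0
    padded = pad_tensors_wgrad([torch.ones(2, 4), torch.ones(3, 4)])  # (2, 3, 4), zero-padded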
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/models/vlnbert_init.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 |
4 | def get_tokenizer(args):
5 | from transformers import AutoTokenizer
6 | if args.tokenizer == 'xlm':
7 | cfg_name = 'xlm-roberta-base'
8 | else:
9 | cfg_name = 'bert-base-uncased'
10 | tokenizer = AutoTokenizer.from_pretrained(cfg_name)
11 | return tokenizer
12 |
13 | def get_vlnbert_models(args, config=None):
14 |
15 | from transformers import PretrainedConfig
16 | from models.vilmodel import GlocalTextPathNavCMT
17 |
18 | model_name_or_path = args.bert_ckpt_file
19 | new_ckpt_weights = {}
20 | if model_name_or_path is not None:
21 | ckpt_weights = torch.load(model_name_or_path)
22 | for k, v in ckpt_weights.items():
23 | if k.startswith('module'):
24 | k = k[7:]
25 | if '_head' in k or 'sap_fuse' in k:
26 | new_ckpt_weights['bert.' + k] = v
27 | else:
28 | new_ckpt_weights[k] = v
29 |
30 | if args.tokenizer == 'xlm':
31 | cfg_name = 'xlm-roberta-base'
32 | else:
33 | cfg_name = 'bert-base-uncased'
34 | vis_config = PretrainedConfig.from_pretrained(cfg_name)
35 |
36 | if args.tokenizer == 'xlm':
37 | vis_config.type_vocab_size = 2
38 |
39 | vis_config.max_action_steps = 100
40 | vis_config.image_feat_size = args.image_feat_size
41 | vis_config.angle_feat_size = args.angle_feat_size
42 | vis_config.obj_feat_size = args.obj_feat_size
43 | vis_config.obj_loc_size = 3
44 | vis_config.num_l_layers = args.num_l_layers
45 | vis_config.num_pano_layers = args.num_pano_layers
46 | vis_config.num_x_layers = args.num_x_layers
47 | vis_config.graph_sprels = args.graph_sprels
48 | vis_config.glocal_fuse = args.fusion == 'dynamic'
49 |
50 | vis_config.fix_lang_embedding = args.fix_lang_embedding
51 | vis_config.fix_pano_embedding = args.fix_pano_embedding
52 | vis_config.fix_local_branch = args.fix_local_branch
53 |
54 | vis_config.update_lang_bert = not args.fix_lang_embedding
55 | vis_config.output_attentions = True
56 | vis_config.pred_head_dropout_prob = 0.1
57 | vis_config.use_lang2visn_attn = False
58 |
59 | visual_model = GlocalTextPathNavCMT.from_pretrained(
60 | pretrained_model_name_or_path=None,
61 | config=vis_config,
62 | state_dict=new_ckpt_weights)
63 |
64 | return visual_model
65 |
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/r2r/data_utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | import numpy as np
4 |
5 | def load_instr_datasets(anno_dir, dataset, splits, tokenizer, is_test=True):
6 | data = []
7 | for split in splits:
8 | if "/" not in split: # the official splits
9 | if tokenizer == 'bert':
10 | filepath = os.path.join(anno_dir, '%s_%s_enc.json' % (dataset.upper(), split))
11 | elif tokenizer == 'xlm':
12 | filepath = os.path.join(anno_dir, '%s_%s_enc_xlmr.json' % (dataset.upper(), split))
13 | else:
14 |                 raise NotImplementedError('unsupported tokenizer %s' % tokenizer)
15 |
16 | with open(filepath) as f:
17 | new_data = json.load(f)
18 |
19 | if split == 'val_train_seen':
20 | new_data = new_data[:50]
21 |
22 | if not is_test:
23 | if dataset == 'r4r' and split == 'val_unseen':
24 | ridxs = np.random.permutation(len(new_data))[:200]
25 | new_data = [new_data[ridx] for ridx in ridxs]
26 | else: # augmented data
27 | print('\nLoading augmented data %s for pretraining...' % os.path.basename(split))
28 | with open(split) as f:
29 | new_data = json.load(f)
30 | # Join
31 | data += new_data
32 | return data
33 |
34 | def construct_instrs(anno_dir, dataset, splits, tokenizer, max_instr_len=512, is_test=True):
35 | data = []
36 | for i, item in enumerate(load_instr_datasets(anno_dir, dataset, splits, tokenizer, is_test=is_test)):
37 | # Split multiple instructions into separate entries
38 | for j, instr in enumerate(item['instructions']):
39 | new_item = dict(item)
40 | new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
41 | new_item['instruction'] = instr
42 | new_item['instr_encoding'] = item['instr_encodings'][j][:max_instr_len]
43 | del new_item['instructions']
44 | del new_item['instr_encodings']
45 | data.append(new_item)
46 | return data
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/r2r/eval_utils.py:
--------------------------------------------------------------------------------
1 | ''' Utils for evaluation '''
2 |
3 | import numpy as np
4 |
5 |
6 | def cal_dtw(shortest_distances, prediction, reference, success=None, threshold=3.0):
7 | dtw_matrix = np.inf * np.ones((len(prediction) + 1, len(reference) + 1))
8 | dtw_matrix[0][0] = 0
9 | for i in range(1, len(prediction)+1):
10 | for j in range(1, len(reference)+1):
11 | best_previous_cost = min(
12 | dtw_matrix[i-1][j], dtw_matrix[i][j-1], dtw_matrix[i-1][j-1])
13 | cost = shortest_distances[prediction[i-1]][reference[j-1]]
14 | dtw_matrix[i][j] = cost + best_previous_cost
15 |
16 | dtw = dtw_matrix[len(prediction)][len(reference)]
17 | ndtw = np.exp(-dtw/(threshold * len(reference)))
18 | if success is None:
19 | success = float(shortest_distances[prediction[-1]][reference[-1]] < threshold)
20 | sdtw = success * ndtw
21 |
22 | return {
23 | 'DTW': dtw,
24 | 'nDTW': ndtw,
25 | 'SDTW': sdtw
26 | }
27 |
28 | def cal_cls(shortest_distances, prediction, reference, threshold=3.0):
29 | def length(nodes):
30 | return np.sum([
31 | shortest_distances[a][b]
32 | for a, b in zip(nodes[:-1], nodes[1:])
33 | ])
34 |
35 | coverage = np.mean([
36 | np.exp(-np.min([ # pylint: disable=g-complex-comprehension
37 | shortest_distances[u][v] for v in prediction
38 | ]) / threshold) for u in reference
39 | ])
40 | expected = coverage * length(reference)
41 | score = expected / (expected + np.abs(expected - length(prediction)))
42 | return coverage * score
43 |
44 |
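A toy example of the nDTW/SDTW and CLS metrics above. The 3-node shortest-distance table below is made up; in the real pipeline it is the all-pairs shortest-path table of a scan's navigation graph, and the import path is assumed:

    from r2r.eval_utils import cal_dtw, cal_cls  # import path assumed

    shortest = {
        'a': {'a': 0.0, 'b': 2.0, 'c': 4.0},
        'b': {'a': 2.0, 'b': 0.0, 'c': 2.0},
        'c': {'a': 4.0, 'b': 2.0, 'c': 0.0},
    }
    print(cal_dtw(shortest, prediction=['a', 'b', 'c'], reference=['a', 'c']))
    # DTW = 2.0, nDTW = exp(-2 / (3 * 2)) ~= 0.72, SDTW = nDTW (stop node within 3m)
    print(cal_cls(shortest, prediction=['a', 'b', 'c'], reference=['a', 'c']))  # 1.0 here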
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/r2r/parser.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | from ast import arg
3 | import os
4 |
5 |
6 | def parse_args():
7 | parser = argparse.ArgumentParser(description="")
8 |
9 | parser.add_argument('--root_dir', type=str, default='../datasets')
10 | parser.add_argument('--dataset', type=str, default='r2r', choices=['r2r', 'r4r'])
11 | parser.add_argument('--output_dir', type=str, default='default', help='experiment id')
12 | parser.add_argument('--seed', type=int, default=0)
13 |
14 | parser.add_argument('--tokenizer', choices=['bert', 'xlm'], default='bert')
15 |
16 | parser.add_argument('--act_visited_nodes', action='store_true', default=False)
17 | parser.add_argument('--fusion', choices=['global', 'local', 'avg', 'dynamic'])
18 | parser.add_argument('--expl_sample', action='store_true', default=False)
19 | parser.add_argument('--expl_max_ratio', type=float, default=0.6)
20 | parser.add_argument('--expert_policy', default='spl', choices=['spl', 'ndtw'])
21 |
22 |     # distributed training (single-node, multiple-gpus)
23 | parser.add_argument('--world_size', type=int, default=1, help='number of gpus')
24 | parser.add_argument('--local_rank', type=int, default=-1)
25 | parser.add_argument("--node_rank", type=int, default=0, help="Id of the node")
26 |
27 | # General
28 | parser.add_argument('--iters', type=int, default=100000, help='training iterations')
29 | parser.add_argument('--log_every', type=int, default=1000)
30 | parser.add_argument('--eval_first', action='store_true', default=False)
31 |
32 | # Data preparation
33 | parser.add_argument('--max_instr_len', type=int, default=80)
34 | parser.add_argument('--max_action_len', type=int, default=15)
35 | parser.add_argument('--batch_size', type=int, default=8)
36 | parser.add_argument('--ignoreid', type=int, default=-100, help='ignoreid for action')
37 |
38 | # Load the model from
39 | parser.add_argument("--resume_file", default=None, help='path of the trained model')
40 | parser.add_argument("--resume_optimizer", action="store_true", default=False)
41 |
42 |
43 | # Augmented Paths from
44 | parser.add_argument("--aug", default=None)
45 | parser.add_argument('--bert_ckpt_file', default=None, help='init vlnbert')
46 |
47 | # Listener Model Config
48 | parser.add_argument("--ml_weight", type=float, default=0.20)
49 | parser.add_argument('--entropy_loss_weight', type=float, default=0.01)
50 |
51 | parser.add_argument("--features", type=str, default='vitbase')
52 | parser.add_argument("--env_aug", action='store_true', default=False)
53 | parser.add_argument("--aug_times", type=int, default=19)
54 |
55 | parser.add_argument('--fix_lang_embedding', action='store_true', default=False)
56 | parser.add_argument('--fix_pano_embedding', action='store_true', default=False)
57 | parser.add_argument('--fix_local_branch', action='store_true', default=False)
58 |
59 | parser.add_argument('--num_l_layers', type=int, default=9)
60 | parser.add_argument('--num_pano_layers', type=int, default=2)
61 | parser.add_argument('--num_x_layers', type=int, default=4)
62 |
63 | parser.add_argument('--enc_full_graph', default=False, action='store_true')
64 | parser.add_argument('--graph_sprels', action='store_true', default=False)
65 |
66 | # Dropout Param
67 | parser.add_argument('--dropout', type=float, default=0.5)
68 | parser.add_argument('--feat_dropout', type=float, default=0.3)
69 |
70 |     # Submission configuration
71 | parser.add_argument('--test', action='store_true', default=False)
72 | parser.add_argument('--zero_shot', action='store_true', default=False)
73 | parser.add_argument("--submit", action='store_true', default=False)
74 | parser.add_argument('--no_backtrack', action='store_true', default=False)
75 | parser.add_argument('--detailed_output', action='store_true', default=False)
76 |
77 | # Training Configurations
78 | parser.add_argument(
79 | '--optim', type=str, default='rms',
80 | choices=['rms', 'adam', 'adamW', 'sgd']
81 | ) # rms, adam
82 | parser.add_argument('--lr', type=float, default=0.00001, help="the learning rate")
83 | parser.add_argument('--decay', dest='weight_decay', type=float, default=0.)
84 | parser.add_argument(
85 | '--feedback', type=str, default='sample',
86 | help='How to choose next position, one of ``teacher``, ``sample`` and ``argmax``'
87 | )
88 | parser.add_argument('--epsilon', type=float, default=0.1, help='')
89 |
90 | # Model hyper params:
91 | parser.add_argument("--angle_feat_size", type=int, default=4)
92 | parser.add_argument('--image_feat_size', type=int, default=2048)
93 | parser.add_argument('--obj_feat_size', type=int, default=0)
94 | parser.add_argument('--views', type=int, default=36)
95 |
96 | # # A2C
97 | parser.add_argument("--gamma", default=0.9, type=float, help='reward discount factor')
98 | parser.add_argument(
99 | "--normalize", dest="normalize_loss", default="total",
100 | type=str, help='batch or total'
101 | )
102 | parser.add_argument('--train_alg',
103 | choices=['imitation', 'dagger'],
104 | default='imitation'
105 | )
106 |
107 | args, _ = parser.parse_known_args()
108 |
109 | args = postprocess_args(args)
110 |
111 | return args
112 |
113 |
114 | def postprocess_args(args):
115 | ROOTDIR = args.root_dir
116 |
117 | # Setup input paths
118 | ft_file_map = {
119 | 'clip.h14': 'clip_vit-h14_mp3d_hm3d_gibson.hdf5',
120 | 'clip.b16': 'clip_vit-b16_mp3d_hm3d_gibson.hdf5'
121 | }
122 |
123 | args.aug_ft_file = os.path.join(ROOTDIR, 'R2R', 'features', ft_file_map[args.features])
124 |
125 | if args.features == 'clip.h14':
126 | args.mp3d_ft_files = [os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_original.hdf5')]
127 | args.val_ft_file = os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_original.hdf5')
128 | elif args.features == 'clip.b16':
129 | args.mp3d_ft_files = [os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-b16_mp3d_original.hdf5')]
130 | args.val_ft_file = os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-b16_mp3d_original.hdf5')
131 |
132 | if args.env_aug: # only h14
133 | args.mp3d_ft_files = [
134 | os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_img_image_synthesis.hdf5'),
135 | os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_img_mask_image_synthesis.hdf5'),
136 | os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_img_style_transfer.hdf5'),
137 | os.path.join(ROOTDIR, 'R2R', 'features', 'clip_vit-h14_mp3d_original.hdf5'),
138 | ]
139 |
140 |
141 | if args.aug:
142 | args.connectivity_dir = os.path.join(ROOTDIR, 'R2R', 'connectivity')
143 | else:
144 | args.connectivity_dir = os.path.join(ROOTDIR, 'R2R', 'connectivity_mp3d')
145 |
146 | args.scan_data_dir = os.path.join(ROOTDIR, 'Matterport3D', 'v1_unzip_scans')
147 |
148 | args.anno_dir = os.path.join(ROOTDIR, 'R2R', 'annotations')
149 |
150 | # Build paths
151 | args.ckpt_dir = os.path.join(args.output_dir, 'ckpts')
152 | if args.zero_shot:
153 | args.log_dir = os.path.join(args.output_dir, 'zero_shot_logs')
154 | else:
155 | args.log_dir = os.path.join(args.output_dir, 'logs')
156 | args.pred_dir = os.path.join(args.output_dir, 'preds')
157 |
158 | if not args.zero_shot:
159 | os.makedirs(args.output_dir, exist_ok=True)
160 | os.makedirs(args.ckpt_dir, exist_ok=True)
161 | os.makedirs(args.pred_dir, exist_ok=True)
162 | os.makedirs(args.log_dir, exist_ok=True)
163 |
164 | return args
165 |
166 |
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/scripts/r2r_b16_mix.sh:
--------------------------------------------------------------------------------
1 | DATA_ROOT=../datasets
2 |
3 | train_alg=dagger
4 |
5 | features=clip.b16
6 | ft_dim=512
7 | obj_features=vitbase
8 | obj_ft_dim=768
9 |
10 | ngpus=1
11 | bs=16
12 | seed=0
13 |
14 | name=${train_alg}-${features}
15 | name=${name}-seed.${seed}
16 | name=${name}-aug.mp3d.prevalent.hm3d_gibson.envdrop.init.140k
17 |
18 |
19 | outdir=${DATA_ROOT}/R2R/exprs_map/finetune/${name}-aug.hm3d.envdrop
20 |
21 | flag="--root_dir ${DATA_ROOT}
22 | --dataset r2r
23 | --output_dir ${outdir}
24 | --world_size ${ngpus}
25 | --seed ${seed}
26 | --tokenizer bert
27 |
28 | --enc_full_graph
29 | --graph_sprels
30 | --fusion dynamic
31 |
32 | --expert_policy spl
33 | --train_alg ${train_alg}
34 |
35 | --num_l_layers 9
36 | --num_x_layers 4
37 | --num_pano_layers 2
38 |
39 | --max_action_len 15
40 | --max_instr_len 200
41 |
42 | --batch_size ${bs}
43 | --lr 1e-5
44 | --iters 200000
45 | --log_every 500
46 | --aug_times 9
47 |
48 | --optim adamW
49 |
50 | --features ${features}
51 | --image_feat_size ${ft_dim}
52 | --angle_feat_size 4
53 |
54 | --ml_weight 0.15
55 |
56 | --feat_dropout 0.4
57 | --dropout 0.5
58 |
59 | --gamma 0."
60 |
61 | # # zero shot
62 | # python r2r/main_nav.py $flag \
63 | # --tokenizer bert \
64 | # --zero_shot
65 |
66 | # train
67 | CUDA_VISIBLE_DEVICES=$1 python r2r/main_nav.py $flag \
68 | --tokenizer bert \
69 | --bert_ckpt_file ../datasets/R2R/trained_models/pretrain/duet_vit-b16_model_step_140000.pt \
70 | --aug ../datasets/R2R/annotations/R2R_scalevln_ft_aug_enc.json
71 |
72 | # # test
73 | # CUDA_VISIBLE_DEVICES=$1 python r2r/main_nav.py $flag \
74 | # --tokenizer bert \
75 | # --resume_file ../datasets/R2R/trained_models/finetune/duet_vit-b16_ft_best_val_unseen \
76 | # --test --submit
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/scripts/r2r_h14_envedit_mix.sh:
--------------------------------------------------------------------------------
1 | DATA_ROOT=../datasets
2 |
3 | train_alg=dagger
4 |
5 | features=clip.h14
6 | ft_dim=1024
7 | obj_features=vitbase
8 | obj_ft_dim=768
9 |
10 | ngpus=1
11 | bs=8
12 | seed=0
13 |
14 | name=${train_alg}-${features}-envedit
15 | name=${name}-seed.${seed}
16 | name=${name}-aug.mp3d.prevalent.hm3d_gibson.envdrop.init.190k
17 |
18 |
19 | outdir=${DATA_ROOT}/R2R/exprs_map/finetune/${name}-aug.hm3d.envdrop
20 |
21 | flag="--root_dir ${DATA_ROOT}
22 | --dataset r2r
23 | --output_dir ${outdir}
24 | --world_size ${ngpus}
25 | --seed ${seed}
26 | --tokenizer bert
27 |
28 | --enc_full_graph
29 | --graph_sprels
30 | --fusion dynamic
31 |
32 | --expert_policy spl
33 | --train_alg ${train_alg}
34 |
35 | --num_l_layers 9
36 | --num_x_layers 4
37 | --num_pano_layers 2
38 |
39 | --max_action_len 15
40 | --max_instr_len 200
41 |
42 | --batch_size ${bs}
43 | --lr 1e-5
44 | --iters 200000
45 | --log_every 500
46 | --aug_times 9
47 |
48 | --env_aug
49 |
50 | --optim adamW
51 |
52 | --features ${features}
53 | --image_feat_size ${ft_dim}
54 | --angle_feat_size 4
55 |
56 | --ml_weight 0.15
57 |
58 | --feat_dropout 0.4
59 | --dropout 0.5
60 |
61 | --gamma 0."
62 |
63 | # # zero shot
64 | # python r2r/main_nav.py $flag \
65 | # --tokenizer bert \
66 | # --zero_shot
67 |
68 | # train
69 | CUDA_VISIBLE_DEVICES=$1 python r2r/main_nav.py $flag \
70 | --tokenizer bert \
71 | --bert_ckpt_file ../datasets/R2R/trained_models/pretrain/duet_vit-h14_model_step_190000.pt \
72 | --aug ../datasets/R2R/annotations/R2R_scalevln_ft_aug_enc.json
73 |
74 | # # test
75 | # CUDA_VISIBLE_DEVICES='0' python r2r/main_nav.py $flag \
76 | # --tokenizer bert \
77 | # --resume_file ../datasets/R2R/trained_models/finetune/duet_vit-h14_ft_best_val_unseen \
78 | # --test --submit
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/utils/data.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | import jsonlines
4 | import h5py
5 | import networkx as nx
6 | import math
7 | import numpy as np
8 | import random
9 |
10 | class ImageFeaturesDB(object):
11 | def __init__(self, img_ft_file, image_feat_size):
12 | self.image_feat_size = image_feat_size
13 | self.img_ft_file = img_ft_file
14 | self._feature_store = {}
15 | with h5py.File(self.img_ft_file, 'r') as f:
16 | for key in f.keys():
17 | ft = f[key][...][:, :self.image_feat_size].astype(np.float32)
18 | self._feature_store[key] = ft
19 |
20 |
21 | def get_image_feature(self, scan, viewpoint):
22 | key = '%s_%s' % (scan, viewpoint)
23 | if key in self._feature_store:
24 | ft = self._feature_store[key]
25 | else:
26 | with h5py.File(self.img_ft_file, 'r') as f:
27 | ft = f[key][...][:, :self.image_feat_size].astype(np.float32)
28 | self._feature_store[key] = ft
29 | return ft
30 |
31 | class ImageFeaturesDB2(object):
32 | def __init__(self, img_ft_files, image_feat_size):
33 | self.image_feat_size = image_feat_size
34 | self.img_ft_file = img_ft_files
35 | self._feature_stores = {}
36 | for name in img_ft_files:
37 | self._feature_stores[name] = {}
38 | with h5py.File(name, 'r') as f:
39 | for key in f.keys():
40 | ft = f[key][...][:, :self.image_feat_size].astype(np.float32)
41 | self._feature_stores[name][key] = ft
42 | self.env_names = list(self._feature_stores.keys())
43 | print(self.env_names)
44 |
45 |
46 | def get_image_feature(self, scan, viewpoint):
47 | key = '%s_%s' % (scan, viewpoint)
48 | env_name = random.choice(self.env_names)
49 | if key in self._feature_stores[env_name]:
50 | ft = self._feature_stores[env_name][key]
51 | else:
52 | with h5py.File(env_name, 'r') as f:
53 | ft = f[key][...][:, :self.image_feat_size].astype(np.float32)
54 | self._feature_stores[env_name][key] = ft
55 | return ft
56 |
57 | def load_nav_graphs(connectivity_dir, scans):
58 | ''' Load connectivity graph for each scan '''
59 |
60 | def distance(pose1, pose2):
61 | ''' Euclidean distance between two graph poses '''
62 | return ((pose1['pose'][3]-pose2['pose'][3])**2\
63 | + (pose1['pose'][7]-pose2['pose'][7])**2\
64 | + (pose1['pose'][11]-pose2['pose'][11])**2)**0.5
65 |
66 | graphs = {}
67 | for scan in scans:
68 | with open(os.path.join(connectivity_dir, '%s_connectivity.json' % scan)) as f:
69 | G = nx.Graph()
70 | positions = {}
71 | data = json.load(f)
72 | for i,item in enumerate(data):
73 | if item['included']:
74 | for j,conn in enumerate(item['unobstructed']):
75 | if conn and data[j]['included']:
76 | positions[item['image_id']] = np.array([item['pose'][3],
77 | item['pose'][7], item['pose'][11]]);
78 | assert data[j]['unobstructed'][i], 'Graph should be undirected'
79 | G.add_edge(item['image_id'],data[j]['image_id'],weight=distance(item,data[j]))
80 | nx.set_node_attributes(G, values=positions, name='position')
81 | graphs[scan] = G
82 | return graphs
83 |
84 | def new_simulator(connectivity_dir, scan_data_dir=None):
85 | import MatterSim
86 |
87 | # Simulator image parameters
88 | WIDTH = 640
89 | HEIGHT = 480
90 | VFOV = 60
91 |
92 | sim = MatterSim.Simulator()
93 | if scan_data_dir:
94 | sim.setDatasetPath(scan_data_dir)
95 | sim.setNavGraphPath(connectivity_dir)
96 | sim.setRenderingEnabled(False)
97 | sim.setCameraResolution(WIDTH, HEIGHT)
98 | sim.setCameraVFOV(math.radians(VFOV))
99 | sim.setDiscretizedViewingAngles(True)
100 | sim.setBatchSize(1)
101 | sim.initialize()
102 |
103 | return sim
104 |
105 | def angle_feature(heading, elevation, angle_feat_size):
106 | return np.array(
107 | [math.sin(heading), math.cos(heading), math.sin(elevation), math.cos(elevation)] * (angle_feat_size // 4),
108 | dtype=np.float32)
109 |
110 | def get_point_angle_feature(sim, angle_feat_size, baseViewId=0):
111 | feature = np.empty((36, angle_feat_size), np.float32)
112 | base_heading = (baseViewId % 12) * math.radians(30)
113 | base_elevation = (baseViewId // 12 - 1) * math.radians(30)
114 |
115 | for ix in range(36):
116 | if ix == 0:
117 | sim.newEpisode(['ZMojNkEp431'], ['2f4d90acd4024c269fb0efe49a8ac540'], [0], [math.radians(-30)])
118 | elif ix % 12 == 0:
119 | sim.makeAction([0], [1.0], [1.0])
120 | else:
121 | sim.makeAction([0], [1.0], [0])
122 |
123 | state = sim.getState()[0]
124 | assert state.viewIndex == ix
125 |
126 | heading = state.heading - base_heading
127 | elevation = state.elevation - base_elevation
128 |
129 | feature[ix, :] = angle_feature(heading, elevation, angle_feat_size)
130 | return feature
131 |
132 | def get_all_point_angle_feature(sim, angle_feat_size):
133 | return [get_point_angle_feature(sim, angle_feat_size, baseViewId) for baseViewId in range(36)]
134 |
135 |
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/utils/distributed.py:
--------------------------------------------------------------------------------
1 | """
2 | Distributed tools
3 | """
4 | import os
5 | from pathlib import Path
6 | from pprint import pformat
7 | import pickle
8 |
9 | import torch
10 | import torch.distributed as dist
11 |
12 |
13 | def load_init_param(opts):
14 | """
15 | Load parameters for the rendezvous distributed procedure
16 | """
17 | # sync file
18 | if opts.output_dir != "":
19 | sync_dir = Path(opts.output_dir).resolve()
20 | sync_dir.mkdir(parents=True, exist_ok=True)
21 | sync_file = f"{sync_dir}/.torch_distributed_sync"
22 | else:
23 | raise RuntimeError("Can't find any sync dir")
24 |
25 | # world size
26 | if opts.world_size != -1:
27 | world_size = opts.world_size
28 | elif os.environ.get("WORLD_SIZE", "") != "":
29 | world_size = int(os.environ["WORLD_SIZE"])
30 | else:
31 | raise RuntimeError("Can't find any world size")
32 |
33 | # rank
34 | if os.environ.get("RANK", "") != "":
35 |         # pytorch.distributed.launch provides this variable no matter what
36 | rank = int(os.environ["RANK"])
37 | else:
38 | if opts.node_rank != -1:
39 | node_rank = opts.node_rank
40 | elif os.environ.get("NODE_RANK", "") != "":
41 | node_rank = int(os.environ["NODE_RANK"])
42 | else:
43 | raise RuntimeError("Can't find any rank or node rank")
44 |
45 | if opts.local_rank != -1:
46 | local_rank = opts.local_rank
47 | elif os.environ.get("LOCAL_RANK", "") != "":
48 | local_rank = int(os.environ["LOCAL_RANK"])
49 | else:
50 | raise RuntimeError("Can't find any rank or local rank")
51 |
52 | # WARNING: this assumes that each node has the same number of GPUs
53 | n_gpus = torch.cuda.device_count()
54 | rank = local_rank + node_rank * n_gpus
55 |
56 | return {
57 | "backend": "nccl",
58 | # "init_method": f"file://{sync_file}",
59 | "rank": rank,
60 | "world_size": world_size,
61 | }
62 |
63 |
64 | def init_distributed(opts):
65 | init_param = load_init_param(opts)
66 | rank = init_param["rank"]
67 |
68 | print(f"Init distributed {init_param['rank']} - {init_param['world_size']}")
69 |
70 | dist.init_process_group(**init_param)
71 | return rank
72 |
73 |
74 | def is_default_gpu(opts) -> bool:
75 | return opts.local_rank == -1 or dist.get_rank() == 0
76 |
77 |
78 | def is_dist_avail_and_initialized():
79 | if not dist.is_available():
80 | return False
81 | if not dist.is_initialized():
82 | return False
83 | return True
84 |
85 | def get_world_size():
86 | if not is_dist_avail_and_initialized():
87 | return 1
88 | return dist.get_world_size()
89 |
90 | def all_gather(data):
91 | """
92 | Run all_gather on arbitrary picklable data (not necessarily tensors)
93 | Args:
94 | data: any picklable object
95 | Returns:
96 | list[data]: list of data gathered from each rank
97 | """
98 | world_size = get_world_size()
99 | if world_size == 1:
100 | return [data]
101 |
102 | # serialized to a Tensor
103 | buffer = pickle.dumps(data)
104 | storage = torch.ByteStorage.from_buffer(buffer)
105 | tensor = torch.ByteTensor(storage).to("cuda")
106 |
107 | # obtain Tensor size of each rank
108 | local_size = torch.tensor([tensor.numel()], device="cuda")
109 | size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
110 | dist.all_gather(size_list, local_size)
111 | size_list = [int(size.item()) for size in size_list]
112 | max_size = max(size_list)
113 |
114 | # receiving Tensor from all ranks
115 | # we pad the tensor because torch all_gather does not support
116 | # gathering tensors of different shapes
117 | tensor_list = []
118 | for _ in size_list:
119 | tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
120 | if local_size != max_size:
121 | padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda")
122 | tensor = torch.cat((tensor, padding), dim=0)
123 | dist.all_gather(tensor_list, tensor)
124 |
125 | data_list = []
126 | for size, tensor in zip(size_list, tensor_list):
127 | buffer = tensor.cpu().numpy().tobytes()[:size]
128 | data_list.append(pickle.loads(buffer))
129 |
130 | return data_list
131 |
132 |
133 | def reduce_dict(input_dict, average=True):
134 | """
135 | Args:
136 | input_dict (dict): all the values will be reduced
137 | average (bool): whether to do average or sum
138 | Reduce the values in the dictionary from all processes so that all processes
139 | have the averaged results. Returns a dict with the same fields as
140 | input_dict, after reduction.
141 | """
142 | world_size = get_world_size()
143 | if world_size < 2:
144 | return input_dict
145 | with torch.no_grad():
146 | names = []
147 | values = []
148 | # sort the keys so that they are consistent across processes
149 | for k in sorted(input_dict.keys()):
150 | names.append(k)
151 | values.append(input_dict[k])
152 | values = torch.stack(values, dim=0)
153 | dist.all_reduce(values)
154 | if average:
155 | values /= world_size
156 | reduced_dict = {k: v for k, v in zip(names, values)}
157 | return reduced_dict
158 |
159 |
160 | def merge_dist_results(results):
161 | outs = []
162 | for res in results:
163 | outs.extend(res)
164 | return outs
165 |
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/utils/logger.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import math
4 | import time
5 | from collections import OrderedDict
6 |
7 |
8 | def write_to_record_file(data, file_path, verbose=True):
9 | if verbose:
10 | print(data)
11 | record_file = open(file_path, 'a')
12 | record_file.write(data+'\n')
13 | record_file.close()
14 |
15 |
16 | def asMinutes(s):
17 | m = math.floor(s / 60)
18 | s -= m * 60
19 | return '%dm %ds' % (m, s)
20 |
21 | def timeSince(since, percent):
22 | now = time.time()
23 | s = now - since
24 | es = s / (percent)
25 | rs = es - s
26 | return '%s (- %s)' % (asMinutes(s), asMinutes(rs))
27 |
28 | class Timer:
29 | def __init__(self):
30 | self.cul = OrderedDict()
31 | self.start = {}
32 | self.iter = 0
33 |
34 | def reset(self):
35 | self.cul = OrderedDict()
36 | self.start = {}
37 | self.iter = 0
38 |
39 | def tic(self, key):
40 | self.start[key] = time.time()
41 |
42 | def toc(self, key):
43 | delta = time.time() - self.start[key]
44 | if key not in self.cul:
45 | self.cul[key] = delta
46 | else:
47 | self.cul[key] += delta
48 |
49 | def step(self):
50 | self.iter += 1
51 |
52 | def show(self):
53 | total = sum(self.cul.values())
54 | for key in self.cul:
55 | print("%s, total time %0.2f, avg time %0.2f, part of %0.2f" %
56 | (key, self.cul[key], self.cul[key]*1./self.iter, self.cul[key]*1./total))
57 | print(total / self.iter)
58 |
59 |
60 | def print_progress(iteration, total, prefix='', suffix='', decimals=1, bar_length=100):
61 | """
62 | Call in a loop to create terminal progress bar
63 | @params:
64 | iteration - Required : current iteration (Int)
65 | total - Required : total iterations (Int)
66 | prefix - Optional : prefix string (Str)
67 | suffix - Optional : suffix string (Str)
68 | decimals - Optional : positive number of decimals in percent complete (Int)
69 | bar_length - Optional : character length of bar (Int)
70 | """
71 | str_format = "{0:." + str(decimals) + "f}"
72 | percents = str_format.format(100 * (iteration / float(total)))
73 | filled_length = int(round(bar_length * iteration / float(total)))
74 | bar = '█' * filled_length + '-' * (bar_length - filled_length)
75 |
76 | sys.stdout.write('\r%s |%s| %s%s %s' % (prefix, bar, percents, '%', suffix)),
77 |
78 | if iteration == total:
79 | sys.stdout.write('\n')
80 | sys.stdout.flush()
81 |
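A tiny demo of the progress-bar helper above; the loop bounds and sleep are arbitrary stand-ins for real work, and the import path is assumed:

    import time
    from utils.logger import print_progress  # import path assumed

    for i in range(1, 11):
        time.sleep(0.05)                      # stand-in for real work
        print_progress(i, 10, prefix='train', suffix='done', bar_length=40)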
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/utils/misc.py:
--------------------------------------------------------------------------------
1 | import random
2 | import numpy as np
3 | import torch
4 |
5 | def set_random_seed(seed):
6 | torch.manual_seed(seed)
7 | torch.cuda.manual_seed(seed)
8 | torch.cuda.manual_seed_all(seed)
9 | random.seed(seed)
10 | np.random.seed(seed)
11 |
12 | def length2mask(length, size=None):
13 | batch_size = len(length)
14 | size = int(max(length)) if size is None else size
15 | mask = (torch.arange(size, dtype=torch.int64).unsqueeze(0).repeat(batch_size, 1)
16 | > (torch.LongTensor(length) - 1).unsqueeze(1)).cuda()
17 | return mask
18 |
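A hedged example of length2mask; note that it returns True at the *padded* positions (the opposite convention of gen_seq_masks) and moves the result to GPU, so it needs a CUDA device. The lengths and import path are assumptions:

    from utils.misc import length2mask  # import path assumed

    mask = length2mask([2, 4], size=5)
    # tensor([[False, False,  True,  True,  True],
    #         [False, False, False, False,  True]], device='cuda:0')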
--------------------------------------------------------------------------------
/VLN-DUET/map_nav_src/utils/ops.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 |
4 | def pad_tensors(tensors, lens=None, pad=0):
5 | """B x [T, ...]"""
6 | if lens is None:
7 | lens = [t.size(0) for t in tensors]
8 | max_len = max(lens)
9 | bs = len(tensors)
10 | hid = list(tensors[0].size()[1:])
11 | size = [bs, max_len] + hid
12 |
13 | dtype = tensors[0].dtype
14 | device = tensors[0].device
15 | output = torch.zeros(*size, dtype=dtype).to(device)
16 | if pad:
17 | output.data.fill_(pad)
18 | for i, (t, l) in enumerate(zip(tensors, lens)):
19 | output.data[i, :l, ...] = t.data
20 | return output
21 |
22 | def gen_seq_masks(seq_lens, max_len=None):
23 | if max_len is None:
24 | max_len = max(seq_lens)
25 |
26 | if isinstance(seq_lens, torch.Tensor):
27 | device = seq_lens.device
28 | masks = torch.arange(max_len).to(device).repeat(len(seq_lens), 1) < seq_lens.unsqueeze(1)
29 | return masks
30 |
31 | if max_len == 0:
32 |         return np.zeros((len(seq_lens), 0), dtype=bool)
33 |
34 | seq_lens = np.array(seq_lens)
35 | batch_size = len(seq_lens)
36 | masks = np.arange(max_len).reshape(-1, max_len).repeat(batch_size, 0)
37 | masks = masks < seq_lens.reshape(-1, 1)
38 | return masks
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/config/r2r_model_config_clip-b16.json:
--------------------------------------------------------------------------------
1 | {
2 | "pred_head_dropout_prob": 0.1,
3 | "attention_probs_dropout_prob": 0.1,
4 | "finetuning_task": null,
5 | "hidden_act": "gelu",
6 | "hidden_dropout_prob": 0.1,
7 | "hidden_size": 768,
8 | "image_feat_size": 512,
9 | "image_prob_size": 1000,
10 | "angle_feat_size": 4,
11 | "obj_feat_size": 0,
12 | "obj_prob_size": 0,
13 | "img_feature_type": "imagenet",
14 | "initializer_range": 0.02,
15 | "intermediate_size": 3072,
16 | "num_l_layers": 9,
17 | "num_x_layers": 4,
18 | "num_pano_layers": 2,
19 | "layer_norm_eps": 1e-12,
20 | "max_position_embeddings": 512,
21 | "max_action_steps": 100,
22 | "num_attention_heads": 12,
23 | "num_hidden_layers": 12,
24 | "num_labels": 2,
25 | "output_attentions": false,
26 | "output_hidden_states": false,
27 | "pruned_heads": {},
28 | "torchscript": false,
29 | "type_vocab_size": 2,
30 | "update_lang_bert": true,
31 | "vocab_size": 30522,
32 | "use_lang2visn_attn": true,
33 | "graph_sprels": true,
34 | "glocal_fuse": true,
35 | "lang_bert_name": "bert-base-uncased"
36 | }
37 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/config/r2r_model_config_clip-h14.json:
--------------------------------------------------------------------------------
1 | {
2 | "pred_head_dropout_prob": 0.1,
3 | "attention_probs_dropout_prob": 0.1,
4 | "finetuning_task": null,
5 | "hidden_act": "gelu",
6 | "hidden_dropout_prob": 0.1,
7 | "hidden_size": 768,
8 | "image_feat_size": 1024,
9 | "image_prob_size": 1000,
10 | "angle_feat_size": 4,
11 | "obj_feat_size": 0,
12 | "obj_prob_size": 0,
13 | "img_feature_type": "imagenet",
14 | "initializer_range": 0.02,
15 | "intermediate_size": 3072,
16 | "num_l_layers": 9,
17 | "num_x_layers": 4,
18 | "num_pano_layers": 2,
19 | "layer_norm_eps": 1e-12,
20 | "max_position_embeddings": 512,
21 | "max_action_steps": 100,
22 | "num_attention_heads": 12,
23 | "num_hidden_layers": 12,
24 | "num_labels": 2,
25 | "output_attentions": false,
26 | "output_hidden_states": false,
27 | "pruned_heads": {},
28 | "torchscript": false,
29 | "type_vocab_size": 2,
30 | "update_lang_bert": true,
31 | "vocab_size": 30522,
32 | "use_lang2visn_attn": true,
33 | "graph_sprels": true,
34 | "glocal_fuse": true,
35 | "lang_bert_name": "bert-base-uncased"
36 | }
37 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/config/r2r_pretrain_hm3d+mp3d+gibson_clip-b16.json:
--------------------------------------------------------------------------------
1 | {
2 | "model_config": "",
3 | "checkpoint": null,
4 | "output_dir": "",
5 | "mrc_mask_prob": 0.15,
6 | "max_txt_len": 200,
7 | "train_batch_size": 128,
8 | "val_batch_size": 128,
9 | "gradient_accumulation_steps": 1,
10 | "learning_rate": 5e-05,
11 | "valid_steps": 10000,
12 | "log_steps": 1000,
13 | "num_train_steps":200000,
14 | "optim": "adamw",
15 | "betas": [
16 | 0.9,
17 | 0.98
18 | ],
19 | "dropout": 0.1,
20 | "weight_decay": 0.01,
21 | "grad_norm": 5.0,
22 | "warmup_steps": 10000,
23 | "seed": 0,
24 | "fp16": false,
25 | "n_workers": 1,
26 | "pin_mem": false,
27 | "init_pretrained": "lxmert",
28 |
29 | "train_datasets": {
30 | "R2R": {
31 | "name": "R2R",
32 | "train_traj_files": ["../datasets/R2R/annotations/pretrain_map/R2R_train_enc.jsonl",
33 | "../datasets/R2R/annotations/pretrain_map/R2R_hm3d_aug_envdrop_generated_enc.jsonl",
34 | "../datasets/R2R/annotations/pretrain_map/R2R_prevalent_aug_train_enc.jsonl",
35 | "../datasets/R2R/annotations/pretrain_map/R2R_gibson_aug_envdrop_generated_enc.jsonl"],
36 | "val_seen_traj_files": ["../datasets/R2R/annotations/pretrain_map/R2R_val_seen_enc.jsonl"],
37 | "val_unseen_traj_files": ["../datasets/R2R/annotations/pretrain_map/R2R_val_unseen_enc.jsonl"],
38 | "connectivity_dir": "../datasets/R2R/connectivity",
39 | "img_ft_file": "../datasets/R2R/features/clip_vit-b16_mp3d_hm3d_gibson.hdf5",
40 | "scanvp_cands_file": "../datasets/R2R/annotations/scanvp_candview_relangles_with_hm3d_gibson.json",
41 | "tasks": [
42 | "mlm",
43 | "sap"
44 | ],
45 | "mix_ratio": [
46 | 1,
47 | 1
48 | ]
49 | }
50 | }
51 | }
52 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/config/r2r_pretrain_hm3d+mp3d+gibson_clip-h14.json:
--------------------------------------------------------------------------------
1 | {
2 | "model_config": "",
3 | "checkpoint": null,
4 | "output_dir": "",
5 | "mrc_mask_prob": 0.15,
6 | "max_txt_len": 200,
7 | "train_batch_size": 128,
8 | "val_batch_size": 128,
9 | "gradient_accumulation_steps": 1,
10 | "learning_rate": 5e-05,
11 | "valid_steps": 10000,
12 | "log_steps": 1000,
13 | "num_train_steps":200000,
14 | "optim": "adamw",
15 | "betas": [
16 | 0.9,
17 | 0.98
18 | ],
19 | "dropout": 0.1,
20 | "weight_decay": 0.01,
21 | "grad_norm": 5.0,
22 | "warmup_steps": 10000,
23 | "seed": 0,
24 | "fp16": false,
25 | "n_workers": 1,
26 | "pin_mem": false,
27 | "init_pretrained": "lxmert",
28 |
29 | "train_datasets": {
30 | "R2R": {
31 | "name": "R2R",
32 | "train_traj_files": ["../datasets/R2R/annotations/pretrain_map/R2R_train_enc.jsonl",
33 | "../datasets/R2R/annotations/pretrain_map/R2R_hm3d_aug_envdrop_generated_enc.jsonl",
34 | "../datasets/R2R/annotations/pretrain_map/R2R_prevalent_aug_train_enc.jsonl",
35 | "../datasets/R2R/annotations/pretrain_map/R2R_gibson_aug_envdrop_generated_enc.jsonl"],
36 | "val_seen_traj_files": ["../datasets/R2R/annotations/pretrain_map/R2R_val_seen_enc.jsonl"],
37 | "val_unseen_traj_files": ["../datasets/R2R/annotations/pretrain_map/R2R_val_unseen_enc.jsonl"],
38 | "connectivity_dir": "../datasets/R2R/connectivity",
39 | "img_ft_file": "../datasets/R2R/features/clip_vit-h14_mp3d_hm3d_gibson.hdf5",
40 | "scanvp_cands_file": "../datasets/R2R/annotations/scanvp_candview_relangles_with_hm3d_gibson.json",
41 | "tasks": [
42 | "mlm",
43 | "sap"
44 | ],
45 | "mix_ratio": [
46 | 1,
47 | 1
48 | ]
49 | }
50 | }
51 | }
52 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/data/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wz0919/ScaleVLN/1189fe898462e2e10908631070bcf2d4ec2204b2/VLN-DUET/pretrain_src/data/__init__.py
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/data/common.py:
--------------------------------------------------------------------------------
1 | import os
2 | import math
3 | import json
4 | import numpy as np
5 | import networkx as nx
6 |
7 | import torch
8 |
9 | def pad_tensors(tensors, lens=None, pad=0):
10 | """B x [T, ...] torch tensors"""
11 | if lens is None:
12 | lens = [t.size(0) for t in tensors]
13 | max_len = max(lens)
14 | bs = len(tensors)
15 | hid = list(tensors[0].size()[1:])
16 | size = [bs, max_len] + hid
17 |
18 | dtype = tensors[0].dtype
19 | output = torch.zeros(*size, dtype=dtype)
20 | if pad:
21 | output.data.fill_(pad)
22 | for i, (t, l) in enumerate(zip(tensors, lens)):
23 | output.data[i, :l, ...] = t.data
24 | return output
25 |
26 | def gen_seq_masks(seq_lens, max_len=None):
27 | """
28 | Args:
29 | seq_lens: list or nparray int, shape=(N, )
30 | Returns:
31 | masks: nparray, shape=(N, L), padded=0
32 | """
33 | seq_lens = np.array(seq_lens)
34 | if max_len is None:
35 | max_len = max(seq_lens)
36 | if max_len == 0:
37 |         return np.zeros((len(seq_lens), 0), dtype=bool)
38 | batch_size = len(seq_lens)
39 | masks = np.arange(max_len).reshape(-1, max_len).repeat(batch_size, 0)
40 | masks = masks < seq_lens.reshape(-1, 1)
41 | return masks
42 |
43 | def get_angle_fts(headings, elevations, angle_feat_size):
44 | ang_fts = [np.sin(headings), np.cos(headings), np.sin(elevations), np.cos(elevations)]
45 | ang_fts = np.vstack(ang_fts).transpose().astype(np.float32)
46 | num_repeats = angle_feat_size // 4
47 | if num_repeats > 1:
48 | ang_fts = np.concatenate([ang_fts] * num_repeats, 1)
49 | return ang_fts
50 |
51 | def get_view_rel_angles(baseViewId=0):
52 | rel_angles = np.zeros((36, 2), dtype=np.float32)
53 |
54 | base_heading = (baseViewId % 12) * math.radians(30)
55 | base_elevation = (baseViewId // 12 - 1) * math.radians(30)
56 | for ix in range(36):
57 | if ix == 0:
58 | heading = 0
59 | elevation = math.radians(-30)
60 | elif ix % 12 == 0:
61 | heading = 0
62 | elevation += math.radians(30)
63 | else:
64 | heading += math.radians(30)
65 | rel_angles[ix, 0] = heading - base_heading
66 | rel_angles[ix, 1] = elevation - base_elevation
67 |
68 | return rel_angles
69 |
70 |
71 | def load_nav_graphs(connectivity_dir):
72 | ''' Load connectivity graph for each scan '''
73 |
74 | def distance(pose1, pose2):
75 | ''' Euclidean distance between two graph poses '''
76 | return ((pose1['pose'][3]-pose2['pose'][3])**2\
77 | + (pose1['pose'][7]-pose2['pose'][7])**2\
78 | + (pose1['pose'][11]-pose2['pose'][11])**2)**0.5
79 |
80 | scans = [x.strip() for x in open(os.path.join(connectivity_dir, 'scans.txt')).readlines()]
81 | graphs = {}
82 | for scan in scans:
83 | with open(os.path.join(connectivity_dir, '%s_connectivity.json' % scan)) as f:
84 | G = nx.Graph()
85 | positions = {}
86 | data = json.load(f)
87 | for i, item in enumerate(data):
88 | if item['included']:
89 | for j,conn in enumerate(item['unobstructed']):
90 | if conn and data[j]['included']:
91 | positions[item['image_id']] = np.array([item['pose'][3],
92 | item['pose'][7], item['pose'][11]]);
93 | assert data[j]['unobstructed'][i], 'Graph should be undirected'
94 | G.add_edge(item['image_id'],data[j]['image_id'],weight=distance(item,data[j]))
95 | nx.set_node_attributes(G, values=positions, name='position')
96 | graphs[scan] = G
97 |
98 | shortest_distances = {}
99 | shortest_paths = {}
100 | for scan, G in graphs.items(): # compute all shortest paths
101 | shortest_distances[scan] = dict(nx.all_pairs_dijkstra_path_length(G))
102 | shortest_paths[scan] = dict(nx.all_pairs_dijkstra_path(G))
103 | return graphs, shortest_distances, shortest_paths
104 |
105 | def softmax(logits, dim=1):
106 | # logits: (n, d)
107 | tmp = np.exp(logits)
108 | return tmp / np.sum(tmp, axis=dim, keepdims=True)
109 |
110 |
111 | def calculate_vp_rel_pos_fts(a, b, base_heading=0, base_elevation=0):
112 | # a, b: (x, y, z)
113 | dx = b[0] - a[0]
114 | dy = b[1] - a[1]
115 | dz = b[2] - a[2]
116 | xy_dist = max(np.sqrt(dx**2 + dy**2), 1e-8)
117 | xyz_dist = max(np.sqrt(dx**2 + dy**2 + dz**2), 1e-8)
118 |
119 |     # the simulator's api is weird (x-y axis is transposed)
120 | heading = np.arcsin(dx/xy_dist) # [-pi/2, pi/2]
121 | if b[1] < a[1]:
122 | heading = np.pi - heading
123 | heading -= base_heading
124 |
125 | elevation = np.arcsin(dz/xyz_dist) # [-pi/2, pi/2]
126 | elevation -= base_elevation
127 |
128 | return heading, elevation, xyz_dist
129 |
130 | def normalize_angle(x):
131 | '''convert radians into (-pi, pi]'''
132 | pi2 = 2 * math.pi
133 | x = x % pi2 # [0, 2pi]
134 | x = np.where(x > math.pi, x - pi2, x)
135 | return x
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/data/loader.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | A prefetch loader to speedup data loading
6 | Modified from Nvidia Deep Learning Examples
7 | (https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch).
8 | """
9 | import random
10 | from typing import List, Dict, Tuple, Union, Iterator
11 |
12 | import torch
13 | from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
14 | from torch.utils.data.distributed import DistributedSampler
15 | import torch.distributed as dist
16 |
17 |
18 | class MetaLoader:
19 | """wraps multiple data loaders"""
20 |
21 | def __init__(
22 | self, loaders, accum_steps: int = 1, distributed: bool = False, device=None
23 | ):
24 | assert isinstance(loaders, dict)
25 | self.name2loader = {}
26 | self.name2iter = {}
27 | self.name2pre_epoch = {}
28 | self.names: List[str] = []
29 | ratios: List[int] = []
30 | for n, l in loaders.items():
31 | if isinstance(l, tuple):
32 | l, r, p = l
33 | elif isinstance(l, DataLoader):
34 | r = 1
35 | p = lambda e: None
36 | else:
37 | raise ValueError()
38 | self.names.append(n)
39 | self.name2loader[n] = l
40 | self.name2iter[n] = iter(l)
41 | self.name2pre_epoch[n] = p
42 | ratios.append(r)
43 |
44 | self.accum_steps = accum_steps
45 | self.device = device
46 | self.sampling_ratios = torch.tensor(ratios).float().to(self.device)
47 | self.distributed = distributed
48 | self.step = 0
49 |
50 | def __iter__(self) -> Iterator[Tuple]:
51 | """this iterator will run indefinitely"""
52 | task_id = None
53 | epoch_id = 0
54 | while True:
55 | if self.step % self.accum_steps == 0:
56 | task_id = torch.multinomial(self.sampling_ratios, 1)
57 | if self.distributed:
58 | # make sure all processes are training the same task
59 | dist.broadcast(task_id, 0)
60 | self.step += 1
61 | task = self.names[task_id.cpu().item()]
62 | iter_ = self.name2iter[task]
63 | try:
64 | batch = next(iter_)
65 | except StopIteration:
66 | epoch_id += 1
67 | # In distributed mode, calling the set_epoch() method at the beginning of each epoch
68 | # before creating the DataLoader iterator is necessary to make shuffling work properly
69 | # across multiple epochs. Otherwise, the same ordering will always be used.
70 | self.name2pre_epoch[task](epoch_id)
71 | iter_ = iter(self.name2loader[task])
72 | batch = next(iter_)
73 | self.name2iter[task] = iter_
74 |
75 | yield task, batch
76 |
77 |
78 | def move_to_cuda(batch: Union[List, Tuple, Dict, torch.Tensor], device: torch.device):
79 | if isinstance(batch, torch.Tensor):
80 | return batch.to(device, non_blocking=True)
81 | elif isinstance(batch, list):
82 | return [move_to_cuda(t, device) for t in batch]
83 | elif isinstance(batch, tuple):
84 | return tuple(move_to_cuda(t, device) for t in batch)
85 | elif isinstance(batch, dict):
86 | return {n: move_to_cuda(t, device) for n, t in batch.items()}
87 | return batch
88 |
89 |
90 | class PrefetchLoader(object):
91 | """
92 | overlap compute and cuda data transfer
93 | """
94 | def __init__(self, loader, device: torch.device):
95 | self.loader = loader
96 | self.device = device
97 |
98 | def __iter__(self):
99 | loader_it = iter(self.loader)
100 | self.preload(loader_it)
101 | batch = self.next(loader_it)
102 | while batch is not None:
103 | yield batch
104 | batch = self.next(loader_it)
105 |
106 | def __len__(self):
107 | return len(self.loader)
108 |
109 | def preload(self, it):
110 | try:
111 | self.batch = next(it)
112 | except StopIteration:
113 | self.batch = None
114 | return
115 | self.batch = move_to_cuda(self.batch, self.device)
116 |
117 | def next(self, it):
118 | batch = self.batch
119 | self.preload(it)
120 | return batch
121 |
122 | def __getattr__(self, name):
123 | method = self.loader.__getattribute__(name)
124 | return method
125 |
126 |
127 | def build_dataloader(task, dataset, collate_fn, is_train: bool, opts):
128 |
129 | batch_size = opts.train_batch_size if is_train else opts.val_batch_size
130 | # if task == 'itm': batch_size = max(1, batch_size // 2)
131 |
132 | if opts.local_rank == -1:
133 | if is_train:
134 | sampler: Union[
135 | RandomSampler, SequentialSampler, DistributedSampler
136 | ] = RandomSampler(dataset)
137 | else:
138 | sampler = SequentialSampler(dataset)
139 |
140 | size = torch.cuda.device_count() if torch.cuda.is_available() else 1
141 | pre_epoch = lambda e: None
142 |
143 | # DataParallel: scale the batch size by the number of GPUs
144 | if size > 1:
145 | batch_size *= size
146 |
147 | else:
148 | size = dist.get_world_size()
149 | sampler = DistributedSampler(
150 | dataset, num_replicas=size, rank=dist.get_rank(), shuffle=is_train
151 | )
152 | pre_epoch = sampler.set_epoch
153 |
154 | loader = DataLoader(
155 | dataset,
156 | sampler=sampler,
157 | batch_size=batch_size,
158 | num_workers=opts.n_workers,
159 | pin_memory=opts.pin_mem,
160 | collate_fn=collate_fn,
161 | drop_last=False,
162 | )
163 |
164 | return loader, pre_epoch
165 |
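
A minimal sketch of how these pieces are typically wired together; dataset names, collate functions and the `opts`/`device` fields are illustrative, not defined in this file:

    # build one dataloader per pretraining task, then interleave them
    mlm_loader, mlm_pre_epoch = build_dataloader('mlm', mlm_dataset, mlm_collate, True, opts)
    sap_loader, sap_pre_epoch = build_dataloader('sap', sap_dataset, sap_collate, True, opts)

    meta_loader = MetaLoader(
        {'mlm': (mlm_loader, 1, mlm_pre_epoch),   # (loader, sampling ratio, pre-epoch hook)
         'sap': (sap_loader, 1, sap_pre_epoch)},
        accum_steps=opts.gradient_accumulation_steps,
        distributed=opts.local_rank != -1,
        device=device,
    )
    meta_loader = PrefetchLoader(meta_loader, device)

    for step, (task, batch) in enumerate(meta_loader):   # iterates indefinitely
        if step >= opts.num_train_steps:
            break
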
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/model/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wz0919/ScaleVLN/1189fe898462e2e10908631070bcf2d4ec2204b2/VLN-DUET/pretrain_src/model/__init__.py
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/model/ops.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | from .transformer import TransformerEncoder, TransformerEncoderLayer
4 |
5 | # try:
6 | # from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
7 | # except (ImportError, AttributeError) as e:
8 | # # logger.info("Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .")
9 | # BertLayerNorm = torch.nn.LayerNorm
10 | BertLayerNorm = torch.nn.LayerNorm
11 |
12 | def create_transformer_encoder(config, num_layers, norm=False):
13 | enc_layer = TransformerEncoderLayer(
14 | config.hidden_size, config.num_attention_heads,
15 | dim_feedforward=config.intermediate_size,
16 | dropout=config.hidden_dropout_prob,
17 | activation=config.hidden_act,
18 | normalize_before=True
19 | )
20 | if norm:
21 | norm_layer = BertLayerNorm(config.hidden_size, eps=1e-12)
22 | else:
23 | norm_layer = None
24 | return TransformerEncoder(enc_layer, num_layers, norm=norm_layer, batch_first=True)
25 |
26 | def extend_neg_masks(masks, dtype=None):
27 | """
28 | mask from (N, L) into (N, 1(H), 1(L), L) and make it negative
29 | """
30 | if dtype is None:
31 | dtype = torch.float
32 | extended_masks = masks.unsqueeze(1).unsqueeze(2)
33 | extended_masks = extended_masks.to(dtype=dtype)
34 | extended_masks = (1.0 - extended_masks) * -10000.0
35 | return extended_masks
36 |
37 | def gen_seq_masks(seq_lens, max_len=None):
38 | if max_len is None:
39 | max_len = max(seq_lens)
40 | batch_size = len(seq_lens)
41 | device = seq_lens.device
42 |
43 | masks = torch.arange(max_len).unsqueeze(0).repeat(batch_size, 1).to(device)
44 | masks = masks < seq_lens.unsqueeze(1)
45 | return masks
46 |
47 | def pad_tensors_wgrad(tensors, lens=None):
48 | """B x [T, ...] torch tensors"""
49 | if lens is None:
50 | lens = [t.size(0) for t in tensors]
51 | max_len = max(lens)
52 | batch_size = len(tensors)
53 | hid = list(tensors[0].size()[1:])
54 |
55 | device = tensors[0].device
56 | dtype = tensors[0].dtype
57 |
58 | output = []
59 | for i in range(batch_size):
60 | if lens[i] < max_len:
61 | tmp = torch.cat(
62 | [tensors[i], torch.zeros([max_len-lens[i]]+hid, dtype=dtype).to(device)],
63 | dim=0
64 | )
65 | else:
66 | tmp = tensors[i]
67 | output.append(tmp)
68 | output = torch.stack(output, 0)
69 | return output
70 |
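
A small sketch of what the masking and padding helpers produce (shapes only, illustrative inputs):

    import torch

    seq_lens = torch.tensor([3, 1])
    masks = gen_seq_masks(seq_lens)       # tensor([[True, True, True], [True, False, False]])
    att_masks = extend_neg_masks(masks)   # shape (2, 1, 1, 3); 0 for kept tokens, -10000 for padded

    feats = [torch.randn(3, 8), torch.randn(1, 8)]
    padded = pad_tensors_wgrad(feats)     # shape (2, 3, 8), zero-padded, gradients preserved
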
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/optim/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | """
6 | from .sched import noam_schedule, warmup_linear, get_lr_sched
7 | from .adamw import AdamW
8 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/optim/adamw.py:
--------------------------------------------------------------------------------
1 | """
2 | AdamW optimizer (weight decay fix)
3 | copied from huggingface (https://github.com/huggingface/transformers).
4 | """
5 |
6 | import math
7 | from typing import Callable, Iterable, Tuple
8 |
9 | import torch
10 |
11 | from torch.optim import Optimizer
12 |
13 | class AdamW(Optimizer):
14 | """
15 | Implements Adam algorithm with weight decay fix as introduced in `Decoupled Weight Decay Regularization
16 | <https://arxiv.org/abs/1711.05101>`__.
17 |
18 | Parameters:
19 | params (:obj:`Iterable[torch.nn.parameter.Parameter]`):
20 | Iterable of parameters to optimize or dictionaries defining parameter groups.
21 | lr (:obj:`float`, `optional`, defaults to 1e-3):
22 | The learning rate to use.
23 | betas (:obj:`Tuple[float,float]`, `optional`, defaults to (0.9, 0.999)):
24 | Adam's betas parameters (b1, b2).
25 | eps (:obj:`float`, `optional`, defaults to 1e-6):
26 | Adam's epsilon for numerical stability.
27 | weight_decay (:obj:`float`, `optional`, defaults to 0):
28 | Decoupled weight decay to apply.
29 | correct_bias (:obj:`bool`, `optional`, defaults to `True`):
30 | Whether or not to correct bias in Adam (for instance, in the Bert TF repository they use :obj:`False`).
31 | """
32 |
33 | def __init__(
34 | self,
35 | params: Iterable[torch.nn.parameter.Parameter],
36 | lr: float = 1e-3,
37 | betas: Tuple[float, float] = (0.9, 0.999),
38 | eps: float = 1e-6,
39 | weight_decay: float = 0.0,
40 | correct_bias: bool = True,
41 | ):
42 | if lr < 0.0:
43 | raise ValueError("Invalid learning rate: {} - should be >= 0.0".format(lr))
44 | if not 0.0 <= betas[0] < 1.0:
45 | raise ValueError("Invalid beta parameter: {} - should be in [0.0, 1.0[".format(betas[0]))
46 | if not 0.0 <= betas[1] < 1.0:
47 | raise ValueError("Invalid beta parameter: {} - should be in [0.0, 1.0[".format(betas[1]))
48 | if not 0.0 <= eps:
49 | raise ValueError("Invalid epsilon value: {} - should be >= 0.0".format(eps))
50 | defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay, correct_bias=correct_bias)
51 | super().__init__(params, defaults)
52 |
53 | def step(self, closure: Callable = None):
54 | """
55 | Performs a single optimization step.
56 |
57 | Arguments:
58 | closure (:obj:`Callable`, `optional`): A closure that reevaluates the model and returns the loss.
59 | """
60 | loss = None
61 | if closure is not None:
62 | loss = closure()
63 |
64 | for group in self.param_groups:
65 | for p in group["params"]:
66 | if p.grad is None:
67 | continue
68 | grad = p.grad.data
69 | if grad.is_sparse:
70 | raise RuntimeError("Adam does not support sparse gradients, please consider SparseAdam instead")
71 |
72 | state = self.state[p]
73 |
74 | # State initialization
75 | if len(state) == 0:
76 | state["step"] = 0
77 | # Exponential moving average of gradient values
78 | state["exp_avg"] = torch.zeros_like(p.data)
79 | # Exponential moving average of squared gradient values
80 | state["exp_avg_sq"] = torch.zeros_like(p.data)
81 |
82 | exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
83 | beta1, beta2 = group["betas"]
84 |
85 | state["step"] += 1
86 |
87 | # Decay the first and second moment running average coefficient
88 | # In-place operations to update the averages at the same time
89 | exp_avg.mul_(beta1).add_(grad, alpha=1.0 - beta1)
90 | exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
91 | denom = exp_avg_sq.sqrt().add_(group["eps"])
92 |
93 | step_size = group["lr"]
94 | if group["correct_bias"]: # No bias correction for Bert
95 | bias_correction1 = 1.0 - beta1 ** state["step"]
96 | bias_correction2 = 1.0 - beta2 ** state["step"]
97 | step_size = step_size * math.sqrt(bias_correction2) / bias_correction1
98 |
99 | p.data.addcdiv_(exp_avg, denom, value=-step_size)
100 |
101 | # Just adding the square of the weights to the loss function is *not*
102 | # the correct way of using L2 regularization/weight decay with Adam,
103 | # since that will interact with the m and v parameters in strange ways.
104 | #
105 | # Instead we want to decay the weights in a manner that doesn't interact
106 | # with the m/v parameters. This is equivalent to adding the square
107 | # of the weights to the loss with plain (non-momentum) SGD.
108 | # Add weight decay at the end (fixed version)
109 | if group["weight_decay"] > 0.0:
110 | p.data.add_(p.data, alpha=-group["lr"] * group["weight_decay"])
111 |
112 | return loss
113 |
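
Basic usage is the same as any torch optimizer; a short sketch with a placeholder model:

    import torch

    model = torch.nn.Linear(16, 4)
    optimizer = AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)

    loss = model(torch.randn(2, 16)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
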
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/optim/lookahead.py:
--------------------------------------------------------------------------------
1 | # Lookahead implementation from https://github.com/rwightman/pytorch-image-models/blob/master/timm/optim/lookahead.py
2 |
3 | """ Lookahead Optimizer Wrapper.
4 | Implementation modified from: https://github.com/alphadl/lookahead.pytorch
5 | Paper: `Lookahead Optimizer: k steps forward, 1 step back` - https://arxiv.org/abs/1907.08610
6 | """
7 | import torch
8 | from torch.optim.optimizer import Optimizer
9 | from torch.optim import Adam
10 | from collections import defaultdict
11 |
12 | class Lookahead(Optimizer):
13 | def __init__(self, base_optimizer, alpha=0.5, k=6):
14 | if not 0.0 <= alpha <= 1.0:
15 | raise ValueError(f'Invalid slow update rate: {alpha}')
16 | if not 1 <= k:
17 | raise ValueError(f'Invalid lookahead steps: {k}')
18 | defaults = dict(lookahead_alpha=alpha, lookahead_k=k, lookahead_step=0)
19 | self.base_optimizer = base_optimizer
20 | self.param_groups = self.base_optimizer.param_groups
21 | self.defaults = base_optimizer.defaults
22 | self.defaults.update(defaults)
23 | self.state = defaultdict(dict)
24 | # manually add our defaults to the param groups
25 | for name, default in defaults.items():
26 | for group in self.param_groups:
27 | group.setdefault(name, default)
28 |
29 | def update_slow(self, group):
30 | for fast_p in group["params"]:
31 | if fast_p.grad is None:
32 | continue
33 | param_state = self.state[fast_p]
34 | if 'slow_buffer' not in param_state:
35 | param_state['slow_buffer'] = torch.empty_like(fast_p.data)
36 | param_state['slow_buffer'].copy_(fast_p.data)
37 | slow = param_state['slow_buffer']
38 | slow.add_(fast_p.data - slow, alpha=group['lookahead_alpha'])
39 | fast_p.data.copy_(slow)
40 |
41 | def sync_lookahead(self):
42 | for group in self.param_groups:
43 | self.update_slow(group)
44 |
45 | def step(self, closure=None):
46 | # print(self.k)
47 | #assert id(self.param_groups) == id(self.base_optimizer.param_groups)
48 | loss = self.base_optimizer.step(closure)
49 | for group in self.param_groups:
50 | group['lookahead_step'] += 1
51 | if group['lookahead_step'] % group['lookahead_k'] == 0:
52 | self.update_slow(group)
53 | return loss
54 |
55 | def state_dict(self):
56 | fast_state_dict = self.base_optimizer.state_dict()
57 | slow_state = {
58 | (id(k) if isinstance(k, torch.Tensor) else k): v
59 | for k, v in self.state.items()
60 | }
61 | fast_state = fast_state_dict['state']
62 | param_groups = fast_state_dict['param_groups']
63 | return {
64 | 'state': fast_state,
65 | 'slow_state': slow_state,
66 | 'param_groups': param_groups,
67 | }
68 |
69 | def load_state_dict(self, state_dict):
70 | fast_state_dict = {
71 | 'state': state_dict['state'],
72 | 'param_groups': state_dict['param_groups'],
73 | }
74 | self.base_optimizer.load_state_dict(fast_state_dict)
75 |
76 | # We want to restore the slow state, but share param_groups reference
77 | # with base_optimizer. This is a bit redundant but least code
78 | slow_state_new = False
79 | if 'slow_state' not in state_dict:
80 | print('Loading state_dict from optimizer without Lookahead applied.')
81 | state_dict['slow_state'] = defaultdict(dict)
82 | slow_state_new = True
83 | slow_state_dict = {
84 | 'state': state_dict['slow_state'],
85 | 'param_groups': state_dict['param_groups'], # this is pointless but saves code
86 | }
87 | super(Lookahead, self).load_state_dict(slow_state_dict)
88 | self.param_groups = self.base_optimizer.param_groups # make both ref same container
89 | if slow_state_new:
90 | # reapply defaults to catch missing lookahead specific ones
91 | for name, default in self.defaults.items():
92 | for group in self.param_groups:
93 | group.setdefault(name, default)
94 |
95 | def LookaheadAdam(params, alpha=0.5, k=6, *args, **kwargs):
96 | adam = Adam(params, *args, **kwargs)
97 | return Lookahead(adam, alpha, k)
98 |
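
A short sketch of wrapping a base optimizer via the `LookaheadAdam` helper above (model and data are placeholders):

    import torch

    model = torch.nn.Linear(16, 4)
    optimizer = LookaheadAdam(model.parameters(), lr=1e-3)  # Adam wrapped with alpha=0.5, k=6

    loss = model(torch.randn(2, 16)).sum()
    loss.backward()
    optimizer.step()            # every k-th step also updates the slow weights
    optimizer.sync_lookahead()  # optionally sync slow weights before evaluation
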
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/optim/misc.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | Misc lr helper
6 | """
7 | from torch.optim import Adam, Adamax
8 |
9 | from .adamw import AdamW
10 | from .rangerlars import RangerLars
11 |
12 | def build_optimizer(model, opts):
13 | param_optimizer = list(model.named_parameters())
14 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
15 | optimizer_grouped_parameters = [
16 | {'params': [p for n, p in param_optimizer
17 | if not any(nd in n for nd in no_decay)],
18 | 'weight_decay': opts.weight_decay},
19 | {'params': [p for n, p in param_optimizer
20 | if any(nd in n for nd in no_decay)],
21 | 'weight_decay': 0.0}
22 | ]
23 |
24 | # choose optimizer class
25 | if opts.optim == 'adam':
26 | OptimCls = Adam
27 | elif opts.optim == 'adamax':
28 | OptimCls = Adamax
29 | elif opts.optim == 'adamw':
30 | OptimCls = AdamW
31 | elif opts.optim == 'rangerlars':
32 | OptimCls = RangerLars
33 | else:
34 | raise ValueError('invalid optimizer')
35 | optimizer = OptimCls(optimizer_grouped_parameters,
36 | lr=opts.learning_rate, betas=opts.betas)
37 | return optimizer
38 |
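
`opts` only needs a few attributes here; a minimal sketch using an ad-hoc namespace with illustrative values:

    import argparse
    import torch

    opts = argparse.Namespace(optim='adamw', learning_rate=3e-5,
                              weight_decay=0.01, betas=(0.9, 0.98))
    model = torch.nn.Linear(16, 4)
    optimizer = build_optimizer(model, opts)  # AdamW with no weight decay on biases/LayerNorm
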
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/optim/ralamb.py:
--------------------------------------------------------------------------------
1 | import torch, math
2 | from torch.optim.optimizer import Optimizer
3 |
4 | # RAdam + LARS
5 | class Ralamb(Optimizer):
6 |
7 | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0):
8 | defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
9 | self.buffer = [[None, None, None] for ind in range(10)]
10 | super(Ralamb, self).__init__(params, defaults)
11 |
12 | def __setstate__(self, state):
13 | super(Ralamb, self).__setstate__(state)
14 |
15 | def step(self, closure=None):
16 |
17 | loss = None
18 | if closure is not None:
19 | loss = closure()
20 |
21 | for group in self.param_groups:
22 |
23 | for p in group['params']:
24 | if p.grad is None:
25 | continue
26 | grad = p.grad.data.float()
27 | if grad.is_sparse:
28 | raise RuntimeError('Ralamb does not support sparse gradients')
29 |
30 | p_data_fp32 = p.data.float()
31 |
32 | state = self.state[p]
33 |
34 | if len(state) == 0:
35 | state['step'] = 0
36 | state['exp_avg'] = torch.zeros_like(p_data_fp32)
37 | state['exp_avg_sq'] = torch.zeros_like(p_data_fp32)
38 | else:
39 | state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32)
40 | state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32)
41 |
42 | exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
43 | beta1, beta2 = group['betas']
44 |
45 | # Decay the first and second moment running average coefficient
46 | # m_t
47 | exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
48 | # v_t
49 | exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
50 |
51 | state['step'] += 1
52 | buffered = self.buffer[int(state['step'] % 10)]
53 |
54 | if state['step'] == buffered[0]:
55 | N_sma, radam_step_size = buffered[1], buffered[2]
56 | else:
57 | buffered[0] = state['step']
58 | beta2_t = beta2 ** state['step']
59 | N_sma_max = 2 / (1 - beta2) - 1
60 | N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t)
61 | buffered[1] = N_sma
62 |
63 | # more conservative since it's an approximated value
64 | if N_sma >= 5:
65 | radam_step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
66 | else:
67 | radam_step_size = 1.0 / (1 - beta1 ** state['step'])
68 | buffered[2] = radam_step_size
69 |
70 | if group['weight_decay'] != 0:
71 | p_data_fp32.add_(p_data_fp32, alpha=-group['weight_decay'] * group['lr'])
72 |
73 | # more conservative since it's an approximated value
74 | radam_step = p_data_fp32.clone()
75 | if N_sma >= 5:
76 | denom = exp_avg_sq.sqrt().add_(group['eps'])
77 | radam_step.addcdiv_(exp_avg, denom, value=-radam_step_size * group['lr'])
78 | else:
79 | radam_step.add_(exp_avg, alpha=-radam_step_size * group['lr'])
80 |
81 | radam_norm = radam_step.pow(2).sum().sqrt()
82 | weight_norm = p.data.pow(2).sum().sqrt().clamp(0, 10)
83 | if weight_norm == 0 or radam_norm == 0:
84 | trust_ratio = 1
85 | else:
86 | trust_ratio = weight_norm / radam_norm
87 |
88 | state['weight_norm'] = weight_norm
89 | state['adam_norm'] = radam_norm
90 | state['trust_ratio'] = trust_ratio
91 |
92 | if N_sma >= 5:
93 | p_data_fp32.addcdiv_(exp_avg, denom, value=-radam_step_size * group['lr'] * trust_ratio)
94 | else:
95 | p_data_fp32.add_(exp_avg, alpha=-radam_step_size * group['lr'] * trust_ratio)
96 |
97 | p.data.copy_(p_data_fp32)
98 |
99 | return loss
100 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/optim/rangerlars.py:
--------------------------------------------------------------------------------
1 | import torch, math
2 | from torch.optim.optimizer import Optimizer
3 | import itertools as it
4 | from .lookahead import *
5 | from .ralamb import *
6 |
7 | # RAdam + LARS + LookAHead
8 |
9 | # Lookahead implementation from https://github.com/lonePatient/lookahead_pytorch/blob/master/optimizer.py
10 | # RAdam + LARS implementation from https://gist.github.com/redknightlois/c4023d393eb8f92bb44b2ab582d7ec20
11 |
12 | def RangerLars(params, alpha=0.5, k=6, *args, **kwargs):
13 | ralamb = Ralamb(params, *args, **kwargs)
14 | return Lookahead(ralamb, alpha, k)
15 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/optim/sched.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | optimizer learning rate scheduling helpers
6 | """
7 | from math import ceil
8 |
9 |
10 | def noam_schedule(step, warmup_step=4000):
11 | """ original Transformer schedule"""
12 | if step <= warmup_step:
13 | return step / warmup_step
14 | return (warmup_step ** 0.5) * (step ** -0.5)
15 |
16 |
17 | def warmup_linear(step, warmup_step, tot_step):
18 | """ BERT schedule """
19 | if step < warmup_step:
20 | return step / warmup_step
21 | return max(0, (tot_step-step)/(tot_step-warmup_step))
22 |
23 |
24 | def get_lr_sched(global_step, opts):
25 | # learning rate scheduling
26 | lr_this_step = opts.learning_rate * warmup_linear(
27 | global_step, opts.warmup_steps, opts.num_train_steps)
28 | if lr_this_step <= 0:
29 | lr_this_step = 1e-8
30 | return lr_this_step
31 |
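
A quick check of the schedule shape, assuming `opts` carries the three fields used above:

    import argparse

    opts = argparse.Namespace(learning_rate=3e-5, warmup_steps=10000, num_train_steps=100000)

    print(get_lr_sched(0, opts))       # 0.0, clipped to 1e-8
    print(get_lr_sched(10000, opts))   # peak: 3e-5
    print(get_lr_sched(100000, opts))  # 0.0, clipped to 1e-8
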
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/parser.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import sys
3 | import json
4 |
5 |
6 | def load_parser():
7 | parser = argparse.ArgumentParser()
8 |
9 | # Required parameters
10 | # NOTE: train tasks and val tasks cannot take command line arguments
11 | parser.add_argument('--vlnbert', choices=['cmt'])
12 | parser.add_argument(
13 | "--model_config", type=str, help="path to model structure config json"
14 | )
15 | parser.add_argument(
16 | "--checkpoint", default=None, type=str, help="path to model checkpoint (*.pt)"
17 | )
18 |
19 | parser.add_argument(
20 | "--output_dir",
21 | default=None,
22 | type=str,
23 | help="The output directory where the model checkpoints will be written.",
24 | )
25 |
26 | # training parameters
27 | parser.add_argument(
28 | "--train_batch_size",
29 | default=4096,
30 | type=int,
31 | help="Total batch size for training. ",
32 | )
33 | parser.add_argument(
34 | "--val_batch_size",
35 | default=4096,
36 | type=int,
37 | help="Total batch size for validation. ",
38 | )
39 | parser.add_argument(
40 | "--gradient_accumulation_steps",
41 | type=int,
42 | default=16,
43 | help="Number of updates steps to accumualte before "
44 | "performing a backward/update pass.",
45 | )
46 | parser.add_argument(
47 | "--learning_rate",
48 | default=3e-5,
49 | type=float,
50 | help="The initial learning rate for Adam.",
51 | )
52 | parser.add_argument(
53 | "--valid_steps", default=1000, type=int, help="Run validation every X steps"
54 | )
55 | parser.add_argument("--log_steps", default=1000, type=int)
56 | parser.add_argument(
57 | "--num_train_steps",
58 | default=100000,
59 | type=int,
60 | help="Total number of training updates to perform.",
61 | )
62 | parser.add_argument(
63 | "--optim",
64 | default="adamw",
65 | choices=["adam", "adamax", "adamw"],
66 | help="optimizer",
67 | )
68 | parser.add_argument(
69 | "--betas", default=[0.9, 0.98], nargs="+", help="beta for adam optimizer"
70 | )
71 | parser.add_argument(
72 | "--dropout", default=0.1, type=float, help="tune dropout regularization"
73 | )
74 | parser.add_argument(
75 | "--weight_decay",
76 | default=0.01,
77 | type=float,
78 | help="weight decay (L2) regularization",
79 | )
80 | parser.add_argument(
81 | "--grad_norm",
82 | default=2.0,
83 | type=float,
84 | help="gradient clipping (-1 for no clipping)",
85 | )
86 | parser.add_argument(
87 | "--warmup_steps",
88 | default=10000,
89 | type=int,
90 | help="Number of training steps to perform linear " "learning rate warmup for.",
91 | )
92 |
93 | # device parameters
94 | parser.add_argument(
95 | "--seed", type=int, default=0, help="random seed for initialization"
96 | )
97 | parser.add_argument(
98 | "--fp16",
99 | action="store_true",
100 | help="Whether to use 16-bit float precision instead of 32-bit",
101 | )
102 | parser.add_argument(
103 | "--n_workers", type=int, default=4, help="number of data workers"
104 | )
105 | parser.add_argument("--pin_mem", action="store_true", help="pin memory")
106 |
107 | # distributed computing
108 | parser.add_argument(
109 | "--local_rank",
110 | type=int,
111 | default=-1,
112 | help="local rank for distributed training on gpus",
113 | )
114 | parser.add_argument(
115 | "--node_rank",
116 | type=int,
117 | default=0,
118 | help="Id of the node",
119 | )
120 | parser.add_argument(
121 | "--world_size",
122 | type=int,
123 | default=1,
124 | help="Number of GPUs across all nodes",
125 | )
126 |
127 | # can use config files
128 | parser.add_argument("--config", required=True, help="JSON config files")
129 |
130 | return parser
131 |
132 |
133 | def parse_with_config(parser):
134 | args = parser.parse_args()
135 | if args.config is not None:
136 | config_args = json.load(open(args.config))
137 | override_keys = {
138 | arg[2:].split("=")[0] for arg in sys.argv[1:] if arg.startswith("--")
139 | }
140 | for k, v in config_args.items():
141 | if k not in override_keys:
142 | setattr(args, k, v)
143 | del args.config
144 | return args
145 |
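
The `--config` JSON supplies defaults, and any flag given explicitly on the command line wins; a minimal sketch of the intended call pattern (paths are illustrative):

    # e.g. python train_r2r.py --config config/r2r_pretrain_hm3d+mp3d+gibson_clip-b16.json --output_dir out/run1
    parser = load_parser()
    opts = parse_with_config(parser)
    # keys present in the JSON but not on the command line are copied onto `opts`;
    # --output_dir above overrides any "output_dir" value in the JSON
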
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/run_r2r_b16.sh:
--------------------------------------------------------------------------------
1 |
2 | NODE_RANK=0
3 | NUM_GPUS=2
4 | outdir=../datasets/R2R/exprs_map/pretrain/cmt-clip.vit.b16-mlm.sap-init.lxmert-aug.mp3d.prevalent.hm3d_gibson.envdrop
5 |
6 | # train
7 | CUDA_VISIBLE_DEVICES=$1 python -m torch.distributed.launch \
8 | --master_port $2 \
9 | --nproc_per_node=${NUM_GPUS} --node_rank $NODE_RANK \
10 | train_r2r.py --world_size ${NUM_GPUS} \
11 | --vlnbert cmt \
12 | --model_config config/r2r_model_config_clip-b16.json \
13 | --config config/r2r_pretrain_hm3d+mp3d+gibson_clip-b16.json \
14 | --output_dir $outdir
15 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/run_r2r_h14.sh:
--------------------------------------------------------------------------------
1 |
2 | NODE_RANK=0
3 | NUM_GPUS=2
4 | outdir=../datasets/R2R/exprs_map/pretrain/cmt-clip.vit.h14-mlm.sap-init.lxmert-aug.mp3d.prevalent.hm3d_gibson.envdrop
5 |
6 | # train
7 | CUDA_VISIBLE_DEVICES=$1 python -m torch.distributed.launch \
8 | --master_port $2 \
9 | --nproc_per_node=${NUM_GPUS} --node_rank $NODE_RANK \
10 | train_r2r.py --world_size ${NUM_GPUS} \
11 | --vlnbert cmt \
12 | --model_config config/r2r_model_config_clip-h14.json \
13 | --config config/r2r_pretrain_hm3d+mp3d+gibson_clip-h14.json \
14 | --output_dir $outdir
15 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wz0919/ScaleVLN/1189fe898462e2e10908631070bcf2d4ec2204b2/VLN-DUET/pretrain_src/utils/__init__.py
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/utils/distributed.py:
--------------------------------------------------------------------------------
1 | """
2 | Distributed tools
3 | """
4 | import os
5 | from pathlib import Path
6 | from pprint import pformat
7 | import pickle
8 |
9 | import torch
10 | import torch.distributed as dist
11 |
12 |
13 | def load_init_param(opts):
14 | """
15 | Load parameters for the rendezvous distributed procedure
16 | """
17 | # sync file
18 | if opts.output_dir != "":
19 | sync_dir = Path(opts.output_dir).resolve()
20 | sync_dir.mkdir(parents=True, exist_ok=True)
21 | sync_file = f"{sync_dir}/.torch_distributed_sync"
22 | else:
23 | raise RuntimeError("Can't find any sync dir")
24 |
25 | # world size
26 | if opts.world_size != -1:
27 | world_size = opts.world_size
28 | elif os.environ.get("WORLD_SIZE", "") != "":
29 | world_size = int(os.environ["WORLD_SIZE"])
30 | else:
31 | raise RuntimeError("Can't find any world size")
32 |
33 | # rank
34 | if os.environ.get("RANK", "") != "":
35 | # torch.distributed.launch provides this variable no matter what
36 | rank = int(os.environ["RANK"])
37 | else:
38 | # if not provided, calculate the gpu rank
39 | if opts.node_rank != -1:
40 | node_rank = opts.node_rank
41 | elif os.environ.get("NODE_RANK", "") != "":
42 | node_rank = int(os.environ["NODE_RANK"])
43 | else:
44 | raise RuntimeError("Can't find any rank or node rank")
45 |
46 | if opts.local_rank != -1:
47 | local_rank = opts.local_rank
48 | elif os.environ.get("LOCAL_RANK", "") != "":
49 | local_rank = int(os.environ["LOCAL_RANK"])
50 | else:
51 | raise RuntimeError("Can't find any rank or local rank")
52 |
53 | # WARNING: this assumes that each node has the same number of GPUs
54 | n_gpus = torch.cuda.device_count()
55 | rank = local_rank + node_rank * n_gpus
56 | opts.rank = rank
57 |
58 | return {
59 | "backend": "nccl",
60 | # "init_method": f"file://{sync_file}",
61 | "rank": rank,
62 | "world_size": world_size,
63 | }
64 |
65 |
66 | def init_distributed(opts):
67 | init_param = load_init_param(opts)
68 | rank = init_param["rank"]
69 |
70 | print(f"Init distributed {init_param['rank']} - {init_param['world_size']}")
71 |
72 | dist.init_process_group(**init_param)
73 |
74 |
75 | def is_default_gpu(opts) -> bool:
76 | return opts.local_rank == -1 or dist.get_rank() == 0
77 |
78 |
79 | def is_dist_avail_and_initialized():
80 | if not dist.is_available():
81 | return False
82 | if not dist.is_initialized():
83 | return False
84 | return True
85 |
86 | def get_world_size():
87 | if not is_dist_avail_and_initialized():
88 | return 1
89 | return dist.get_world_size()
90 |
91 | def all_gather(data):
92 | """
93 | Run all_gather on arbitrary picklable data (not necessarily tensors)
94 | Args:
95 | data: any picklable object
96 | Returns:
97 | list[data]: list of data gathered from each rank
98 | """
99 | world_size = get_world_size()
100 | if world_size == 1:
101 | return [data]
102 |
103 | # serialized to a Tensor
104 | buffer = pickle.dumps(data)
105 | storage = torch.ByteStorage.from_buffer(buffer)
106 | tensor = torch.ByteTensor(storage).to("cuda")
107 |
108 | # obtain Tensor size of each rank
109 | local_size = torch.tensor([tensor.numel()], device="cuda")
110 | size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
111 | dist.all_gather(size_list, local_size)
112 | size_list = [int(size.item()) for size in size_list]
113 | max_size = max(size_list)
114 |
115 | # receiving Tensor from all ranks
116 | # we pad the tensor because torch all_gather does not support
117 | # gathering tensors of different shapes
118 | tensor_list = []
119 | for _ in size_list:
120 | tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
121 | if local_size != max_size:
122 | padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda")
123 | tensor = torch.cat((tensor, padding), dim=0)
124 | dist.all_gather(tensor_list, tensor)
125 |
126 | data_list = []
127 | for size, tensor in zip(size_list, tensor_list):
128 | buffer = tensor.cpu().numpy().tobytes()[:size]
129 | data_list.append(pickle.loads(buffer))
130 |
131 | return data_list
132 |
133 |
134 | def reduce_dict(input_dict, average=True):
135 | """
136 | Args:
137 | input_dict (dict): all the values will be reduced
138 | average (bool): whether to do average or sum
139 | Reduce the values in the dictionary from all processes so that all processes
140 | have the averaged results. Returns a dict with the same fields as
141 | input_dict, after reduction.
142 | """
143 | world_size = get_world_size()
144 | if world_size < 2:
145 | return input_dict
146 | with torch.no_grad():
147 | names = []
148 | values = []
149 | # sort the keys so that they are consistent across processes
150 | for k in sorted(input_dict.keys()):
151 | names.append(k)
152 | values.append(input_dict[k])
153 | values = torch.stack(values, dim=0)
154 | dist.all_reduce(values)
155 | if average:
156 | values /= world_size
157 | reduced_dict = {k: v for k, v in zip(names, values)}
158 | return reduced_dict
159 |
160 |
161 |
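
A short sketch of the gather/reduce helpers as used from a training loop (assumes the process group has already been initialised via `init_distributed`):

    # gather arbitrary picklable results (e.g. per-rank evaluation stats)
    all_results = all_gather({'rank': dist.get_rank(), 'preds': [0, 1, 2]})

    # average scalar losses across ranks for logging
    loss_dict = {'mlm_loss': torch.tensor(0.7, device='cuda'),
                 'sap_loss': torch.tensor(1.2, device='cuda')}
    reduced = reduce_dict(loss_dict, average=True)
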
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/utils/logger.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | helper for logging
6 | NOTE: loggers are global objects; use with caution
7 | """
8 | import logging
9 | import math
10 |
11 | import tensorboardX
12 |
13 |
14 | _LOG_FMT = '%(asctime)s - %(levelname)s - %(name)s - %(message)s'
15 | _DATE_FMT = '%m/%d/%Y %H:%M:%S'
16 | logging.basicConfig(format=_LOG_FMT, datefmt=_DATE_FMT, level=logging.INFO)
17 | LOGGER = logging.getLogger('__main__') # this is the global logger
18 |
19 |
20 | def add_log_to_file(log_path):
21 | fh = logging.FileHandler(log_path)
22 | formatter = logging.Formatter(_LOG_FMT, datefmt=_DATE_FMT)
23 | fh.setFormatter(formatter)
24 | LOGGER.addHandler(fh)
25 |
26 |
27 | class TensorboardLogger(object):
28 | def __init__(self):
29 | self._logger = None
30 | self._global_step = 0
31 |
32 | def create(self, path):
33 | self._logger = tensorboardX.SummaryWriter(path)
34 |
35 | def noop(self, *args, **kwargs):
36 | return
37 |
38 | def step(self):
39 | self._global_step += 1
40 |
41 | @property
42 | def global_step(self):
43 | return self._global_step
44 |
45 | def log_scalar_dict(self, log_dict, prefix=''):
46 | """ log a dictionary of scalar values"""
47 | if self._logger is None:
48 | return
49 | if prefix:
50 | prefix = f'{prefix}_'
51 | for name, value in log_dict.items():
52 | if isinstance(value, dict):
53 | self.log_scalar_dict(value,
54 | prefix=f'{prefix}{name}')
55 | else:
56 | self._logger.add_scalar(f'{prefix}{name}', value,
57 | self._global_step)
58 |
59 | def __getattr__(self, name):
60 | if self._logger is None:
61 | return self.noop
62 | return self._logger.__getattribute__(name)
63 |
64 |
65 | TB_LOGGER = TensorboardLogger()
66 |
67 |
68 | class RunningMeter(object):
69 | """ running meteor of a scalar value
70 | (useful for monitoring training loss)
71 | """
72 | def __init__(self, name, val=None, smooth=0.99):
73 | self._name = name
74 | self._sm = smooth
75 | self._val = val
76 |
77 | def __call__(self, value):
78 | val = (value if self._val is None
79 | else value*(1-self._sm) + self._val*self._sm)
80 | if not math.isnan(val):
81 | self._val = val
82 |
83 | def __str__(self):
84 | return f'{self._name}: {self._val:.4f}'
85 |
86 | @property
87 | def val(self):
88 | if self._val is None:
89 | return 0
90 | return self._val
91 |
92 | @property
93 | def name(self):
94 | return self._name
95 |
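
Typical usage during training; a sketch with an illustrative log directory and stand-in loss values:

    TB_LOGGER.create('out/run1/logs')  # before this call, all TB_LOGGER methods are no-ops
    loss_meter = RunningMeter('loss')

    for step, loss in enumerate([1.2, 1.1, 0.9]):  # stand-in for real losses
        loss_meter(loss)
        TB_LOGGER.log_scalar_dict({'loss': loss_meter.val}, prefix='train')
        TB_LOGGER.step()

    LOGGER.info(str(loss_meter))  # smoothed value, e.g. "loss: 1.1960"
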
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/utils/misc.py:
--------------------------------------------------------------------------------
1 | import random
2 | import numpy as np
3 | from typing import Tuple, Union, Dict, Any
4 |
5 | import torch
6 | import torch.distributed as dist
7 | from torch.nn.parallel import DistributedDataParallel as DDP
8 |
9 | from .distributed import init_distributed
10 | from .logger import LOGGER
11 |
12 |
13 | def set_random_seed(seed):
14 | random.seed(seed)
15 | np.random.seed(seed)
16 | torch.manual_seed(seed)
17 | torch.cuda.manual_seed_all(seed)
18 |
19 | def set_dropout(model, drop_p):
20 | for name, module in model.named_modules():
21 | # we might want to tune dropout for smaller dataset
22 | if isinstance(module, torch.nn.Dropout):
23 | if module.p != drop_p:
24 | module.p = drop_p
25 | LOGGER.info(f'{name} set to {drop_p}')
26 |
27 | def set_cuda(opts) -> Tuple[bool, int, torch.device]:
28 | """
29 | Initialize CUDA for distributed computing
30 | """
31 | if not torch.cuda.is_available():
32 | assert opts.local_rank == -1, opts.local_rank
33 | return True, 0, torch.device("cpu")
34 |
35 | # get device settings
36 | if opts.local_rank != -1:
37 | init_distributed(opts)
38 | torch.cuda.set_device(opts.local_rank)
39 | device = torch.device("cuda", opts.local_rank)
40 | n_gpu = 1
41 | default_gpu = dist.get_rank() == 0
42 | if default_gpu:
43 | LOGGER.info(f"Found {dist.get_world_size()} GPUs")
44 | else:
45 | default_gpu = True
46 | device = torch.device("cuda")
47 | n_gpu = torch.cuda.device_count()
48 |
49 | return default_gpu, n_gpu, device
50 |
51 |
52 | def wrap_model(
53 | model: torch.nn.Module, device: torch.device, local_rank: int
54 | ) -> torch.nn.Module:
55 | model.to(device)
56 |
57 | if local_rank != -1:
58 | model = DDP(model, device_ids=[local_rank], find_unused_parameters=True)
59 | # At the time of DDP wrapping, parameters and buffers (i.e., model.state_dict())
60 | # on rank0 are broadcasted to all other ranks.
61 | elif torch.cuda.device_count() > 1:
62 | LOGGER.info("Using data parallel")
63 | model = torch.nn.DataParallel(model)
64 |
65 | return model
66 |
67 |
68 | class NoOp(object):
69 | """ useful for distributed training No-Ops """
70 | def __getattr__(self, name):
71 | return self.noop
72 |
73 | def noop(self, *args, **kwargs):
74 | return
75 |
76 |
--------------------------------------------------------------------------------
/VLN-DUET/pretrain_src/utils/save.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright (c) Microsoft Corporation.
3 | Licensed under the MIT license.
4 |
5 | saving utilities
6 | """
7 | import json
8 | import os
9 | import torch
10 |
11 |
12 | def save_training_meta(args):
13 | os.makedirs(os.path.join(args.output_dir, 'logs'), exist_ok=True)
14 | os.makedirs(os.path.join(args.output_dir, 'ckpts'), exist_ok=True)
15 |
16 | with open(os.path.join(args.output_dir, 'logs', 'training_args.json'), 'w') as writer:
17 | json.dump(vars(args), writer, indent=4)
18 | model_config = json.load(open(args.model_config))
19 | with open(os.path.join(args.output_dir, 'logs', 'model_config.json'), 'w') as writer:
20 | json.dump(model_config, writer, indent=4)
21 |
22 |
23 | class ModelSaver(object):
24 | def __init__(self, output_dir, prefix='model_step', suffix='pt'):
25 | self.output_dir = output_dir
26 | self.prefix = prefix
27 | self.suffix = suffix
28 |
29 | def save(self, model, step, optimizer=None):
30 | output_model_file = os.path.join(self.output_dir,
31 | f"{self.prefix}_{step}.{self.suffix}")
32 | state_dict = {}
33 | for k, v in model.state_dict().items():
34 | if k.startswith('module.'):
35 | k = k[7:]
36 | if isinstance(v, torch.Tensor):
37 | state_dict[k] = v.cpu()
38 | else:
39 | state_dict[k] = v
40 | torch.save(state_dict, output_model_file)
41 | if optimizer is not None:
42 | dump = {'step': step, 'optimizer': optimizer.state_dict()}
43 | if hasattr(optimizer, '_amp_stash'):
44 | pass # TODO fp16 optimizer
45 | torch.save(dump, f'{self.output_dir}/train_state_{step}.pt')
46 |
47 |
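
A sketch of how the saver is typically driven from a training loop; `args`, `model`, `optimizer` and `global_step` are assumed to come from the caller:

    save_training_meta(args)  # dumps training_args.json and model_config.json under output_dir/logs
    saver = ModelSaver(os.path.join(args.output_dir, 'ckpts'))
    saver.save(model, step=global_step, optimizer=optimizer)  # writes model_step_<step>.pt and train_state_<step>.pt
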
--------------------------------------------------------------------------------
/VLN-DUET/requirements.txt:
--------------------------------------------------------------------------------
1 | # jsonlines==2.0.0
2 | # tqdm==4.62.0
3 | # easydict==1.9
4 | # Shapely==1.7.1
5 | # h5py==2.10.0
6 | timm==0.4.9
7 | # networkx==2.5.1
8 | # numpy==1.20.3
9 | # tensorboardX==2.4.1
10 | protobuf==3.20.1
11 | line_profiler==4.0.3
12 | # transformers==4.12.5
13 |
--------------------------------------------------------------------------------
/files/overall.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wz0919/ScaleVLN/1189fe898462e2e10908631070bcf2d4ec2204b2/files/overall.jpg
--------------------------------------------------------------------------------