├── .github
    ├── CODE_OF_CONDUCT.md
    └── CONTRIBUTING.md
├── LICENSE
├── README.md
├── adaptation.py
├── assets
    ├── examples_pos.png
    └── method.png
├── data
    ├── head_clip_t=0.01.pt
    ├── head_clip_t=0.1.pt
    ├── simat_img_clip.pt
    └── simat_words_clip.ptd
├── demo.ipynb
├── encode.py
├── eval.py
├── prepare_dataset.py
├── requirements.txt
└── simat_db
    ├── captions.txt
    ├── oscar_similarity_matrix.pt
    ├── retrieval_db.tsv
    ├── transfos.csv
    └── triplets.csv


/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
  advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
  address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.

This Code of Conduct also applies outside the project spaces when there is a
reasonable belief that an individual's behavior may have a negative impact on
the project or its community.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at . All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq


--------------------------------------------------------------------------------
/.github/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing to SIMAT
We want to make contributing to this project as easy and transparent as
possible.

## Our Development Process
Minor changes and improvements will be released on an ongoing basis.
Larger changes (e.g., changesets implementing a new paper) will be released on a more periodic basis.

## Pull Requests
We actively welcome your pull requests.

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.

Complete your CLA here:

## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.

Meta has a [bounty program](https://www.facebook.com/whitehat/) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.

## Coding Style
* 2 spaces for indentation rather than tabs
* 80 character line length
* ...

## License
By contributing to SIMAT, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------

MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
This repository contains the database and code used in the paper [Embedding Arithmetic for Text-driven Image Transformation](https://arxiv.org/abs/2112.03162) (Guillaume Couairon, Holger Schwenk, Matthijs Douze, Matthieu Cord).

This work is inspired by the geometric properties of word embeddings, such as Queen ~ Woman + (King - Man).
We extend this idea to multimodal embedding spaces (like CLIP), which lets us semantically edit images via "delta vectors".

Transformed images can then be retrieved in a dataset of images.
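
A minimal sketch of this embedding arithmetic, assuming made-up 3-dimensional vectors in place of real CLIP embeddings. The helper names (`normalize`, `transform`, `retrieve`) and the toy database are hypothetical and only illustrate the idea; they are not part of this repository.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def transform(img_emb, src_txt_emb, tgt_txt_emb, lbd=1.0):
    """Shift an image embedding by lbd times the text delta (target - source)."""
    img = normalize(img_emb)
    src = normalize(src_txt_emb)
    tgt = normalize(tgt_txt_emb)
    return [i + lbd * (t - s) for i, s, t in zip(img, src, tgt)]

def retrieve(query, database):
    """Return the key whose normalized embedding has the largest dot product."""
    def score(key):
        return sum(q * d for q, d in zip(query, normalize(database[key])))
    return max(database, key=score)

# Toy embeddings (assumption): axis 0 ~ "cat", axis 1 ~ "dog", axis 2 ~ "sofa"
database = {
    "cat_on_sofa": [1.0, 0.0, 1.0],
    "dog_on_sofa": [0.0, 1.0, 1.0],
    "dog_on_grass": [0.0, 1.0, 0.0],
}

# Query: start from the cat image and apply the delta "cat" -> "dog"
query = transform(database["cat_on_sofa"], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
result = retrieve(query, database)
print(result)
```

The scaling factor `lbd` plays the role of the lambda parameter used elsewhere in this repository: larger values push the query further along the text delta.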



## The SIMAT Dataset

We build SIMAT, a dataset to evaluate the task of text-driven image transformation, for simple images that can be characterized by a single subject-relation-object annotation.
A **transformation query** is a pair (*image*, *query*) where the query asks to change the subject, the relation or the object in the input *image*.
SIMAT contains ~6k images and an average of 3 transformation queries per image.

The goal is to retrieve an image in the dataset that corresponds to the query specifications.
We use [OSCAR](https://github.com/microsoft/Oscar) as an oracle to check whether retrieved images are correct with respect to the expected modifications.



## Examples

Below are a few examples from the dataset, together with the images retrieved by our best-performing algorithm.


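
The oracle-based check described in the dataset section can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's evaluation code: the function names and toy data are hypothetical, while the 0.5 threshold, the exclusion of the query image itself, and the `1/sqrt(norm2)` weighting mirror what `eval.py` does.

```python
import math

def nearest_not_self(ranked_ids, query_id):
    """Return the best-ranked image id that is not the query image itself."""
    return next(i for i in ranked_ids if i != query_id)

def weighted_accuracy(retrieved, target_captions, oracle, norm2, threshold=0.5):
    """Weighted percentage of queries whose retrieved image the oracle accepts."""
    weights = [1.0 / math.sqrt(n) for n in norm2]
    hits = [oracle[r][t] > threshold for r, t in zip(retrieved, target_captions)]
    accepted = sum(w for w, h in zip(weights, hits) if h)
    return 100.0 * accepted / sum(weights)

# Hypothetical toy data: oracle[i][c] is the similarity of image i to caption c
oracle = [
    [0.9, 0.1],  # image 0 matches caption 0
    [0.2, 0.8],  # image 1 matches caption 1
    [0.3, 0.4],  # image 2 matches neither caption
]
rankings = [[0, 1, 2], [2, 1, 0]]  # ranked retrievals for two queries
query_ids = [0, 1]                 # input image of each query
target_captions = [1, 0]           # caption each query should reach
norm2 = [1.0, 4.0]                 # per-query normalization terms

retrieved = [nearest_not_self(r, q) for r, q in zip(rankings, query_ids)]
score = weighted_accuracy(retrieved, target_captions, oracle, norm2)
print(retrieved, round(score, 2))
```

Excluding the query image matters because the untransformed input is usually its own nearest neighbour, which would otherwise dominate the retrieval.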


## Download dataset

The SIMAT database is composed of crops of images from Visual Genome. You first need to download Visual Genome and then run the following command:

```bash
python prepare_dataset.py --path=/path/to/visual/genome/
```

## Perform inference with CLIP ViT-B/32

In this example, we use the CLIP ViT-B/32 model to edit an image. Note that the CLIP embeddings of the dataset are pre-computed.

```python
import clip
import torch
from torchvision import datasets
from PIL import Image
from IPython.display import display

# hack to normalize tensors easily
torch.Tensor.normalize = lambda x: x / x.norm(dim=-1, keepdim=True)

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# database to perform the retrieval step
dataset = datasets.ImageFolder('simat_db/images/')

db = torch.load('data/simat_img_clip.pt')
db_stacked = torch.stack(list(db.values())).float()

# maps a row index of db_stacked back to its region id
idx2rid = list(db.keys())

model, prep = clip.load('ViT-B/32', device=device)

image = Image.open('simat_db/images/images/98316.png')
img_enc = model.encode_image(prep(image).unsqueeze(0).to(device)).float().cpu().detach()

txt = ['cat', 'dog']
txt_enc = model.encode_text(clip.tokenize(txt).to(device)).float().cpu().detach()

# optionally, we can apply a linear layer on top of the embeddings
heads = torch.load('data/head_clip_t=0.1.pt')
img_enc = heads['img_head'](img_enc)
txt_enc = heads['txt_head'](txt_enc)

# the database embeddings must go through the same image head
db_stacked = heads['img_head'](db_stacked).normalize()


# now we perform the transformation step
lbd = 1
target_enc = img_enc.normalize() + lbd * (txt_enc[1].normalize() - txt_enc[0].normalize())


retrieved_idx = (db_stacked @ target_enc.float().T).argmax(0).item()

retrieved_rid = idx2rid[retrieved_idx]

display(Image.open(f'simat_db/images/images/{retrieved_rid}.png'))

```


## Compute SIMAT scores with CLIP

You can run the evaluation script with the following command:

```bash
python eval.py --backbone clip --domain dev --tau 0.01 --lbds 1 2
```
It automatically loads the adaptation layer corresponding to the value of *tau*.

## Train adaptation layers on COCO

In this part, you can train linear layers on top of the CLIP encoder on the COCO dataset, to get a better alignment. Here is an example:

```bash
python adaptation.py --backbone ViT-B/32 --lr 0.001 --tau 0.1 --batch_size 512
```

## Citation

If you find this paper or dataset useful for your research, please cite it as follows.
```
@article{gco1embedding,
  title={Embedding Arithmetic for text-driven Image Transformation},
  author={Guillaume Couairon and Matthieu Cord and Matthijs Douze and Holger Schwenk},
  journal={arXiv preprint arXiv:2112.03162},
  year={2021}
}
```

## References

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020), OpenAI 2021

Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Fei-Fei Li. [Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations](https://arxiv.org/abs/1602.07332), IJCV 2017

Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiaowei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao. [Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks](https://arxiv.org/abs/2004.06165), ECCV 2020

## License

SIMAT is released under the MIT license.
See LICENSE for details.


--------------------------------------------------------------------------------
/adaptation.py:
--------------------------------------------------------------------------------
# Copyright (c) 2015-present, Facebook, Inc.
# All rights reserved.

import pytorch_lightning as pl
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as T
from PIL import Image
import argparse
import torchvision.datasets as datasets
from functools import partial
import numpy as np
import torch
import clip

COCO_ROOT_TRAIN = '/checkpoint/gcouairon/coco/full'
COCO_ANN_TRAIN = '/checkpoint/gcouairon/coco/train_annotations.json'
COCO_ROOT_VAL = '/checkpoint/gcouairon/coco/full'
COCO_ANN_VAL = '/checkpoint/gcouairon/coco/val_annotations.json'

def main():
    parser = argparse.ArgumentParser(description='Train adaptation layers')
    parser.add_argument('--backbone', type=str, default='ViT-B/32', help='which CLIP model to use')
    parser.add_argument('--precision', type=int, default=16)
    parser.add_argument('--output_dim', type=int, default=512)
    parser.add_argument('--tau', type=float, default=0.1)

    parser.add_argument('--lr', type=float, default=1e-3)
    parser.add_argument('--sched_step_size', type=int, default=25)
    parser.add_argument('--sched_gamma', type=float, default=0.1)
    parser.add_argument('--gpus', type=int, default=8)
    parser.add_argument('--nodes', type=int, default=1)
    parser.add_argument('--max_epochs', type=int, default=50)
    parser.add_argument('--batch_size', type=int, default=512)


    args = parser.parse_args()
    train_adaptation_layers(args)


def clip_loss(imgf, txtf, T = 0.01):
    # L2-normalize both modalities so the dot product is a cosine similarity
    imgf = imgf / imgf.norm(2, dim=-1, keepdim=True)
    txtf = txtf / txtf.norm(2, dim=-1, keepdim=True)

    mma = (imgf @ txtf.T)/T # mma for MultiModalAlignment

    # the i-th image matches the i-th caption of the batch
    labels = torch.arange(mma.shape[0], device=mma.device)

    # symmetric cross-entropy: image-to-text and text-to-image
    loss1 = F.cross_entropy(mma, labels)
    loss2 = F.cross_entropy(mma.T, labels)
    loss = (loss1 + loss2)/2

    return loss



class MMEncoder(pl.LightningModule):
    def __init__(self, args):
        super().__init__()
        self.args = args
        self.core, _ = clip.load(args.backbone, device='cpu', jit=False)

        self.img_head = nn.Linear(512, args.output_dim)
        self.init_head(self.img_head)

        self.txt_head = nn.Linear(512, args.output_dim)
        self.init_head(self.txt_head)


    def init_head(self, x, eps=0.01):
        # start close to the identity so the heads initially preserve CLIP embeddings
        x.weight.data = torch.eye(*x.weight.data.shape) + eps * x.weight.data
        x.bias.data = eps * x.bias.data


    def training_step(self, batch, batch_nb):
        img, txt = batch
        # the CLIP backbone is frozen; only the linear heads are trained
        with torch.no_grad():
            img_ = self.core.encode_image(img).detach()
            txt_ = self.core.encode_text(txt).detach()

        img_ = self.img_head(img_)
        txt_ = self.txt_head(txt_)

        loss = clip_loss(img_, txt_, T=self.args.tau)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_nb):
        img, txt = batch
        with torch.no_grad():
            img_ = self.core.encode_image(img).detach()
            txt_ = self.core.encode_text(txt).detach()

        img_ = self.img_head(img_)
        txt_ = self.txt_head(txt_)

        loss = clip_loss(img_, txt_, T=self.args.tau)

        self.log('val_loss', loss)
        return img_, txt_


    def configure_optimizers(self):
        opt = torch.optim.Adam([
            {'params': self.img_head.parameters(), 'lr': self.args.lr},
            {'params': self.txt_head.parameters(), 'lr': self.args.lr}
        ], lr=self.args.lr)
        sched = torch.optim.lr_scheduler.StepLR(opt, step_size = self.args.sched_step_size, gamma = self.args.sched_gamma)
        return [opt], [sched]

def train_adaptation_layers(args):

    transform = T.Compose([T.Resize(size=256, interpolation=Image.BICUBIC),
                           T.RandomCrop(224), T.RandomHorizontalFlip(), T.ToTensor(),
                           T.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                                       std=(0.26862954, 0.26130258, 0.27577711))])

    train_dataset = datasets.CocoCaptions(root = COCO_ROOT_TRAIN,
                                          annFile = COCO_ANN_TRAIN, transform=transform)

    val_dataset = datasets.CocoCaptions(root = COCO_ROOT_VAL,
                                        annFile = COCO_ANN_VAL, transform=transform)

    # stack images into a (B, 3, 224, 224) batch and pick one of the first 5 captions at random
    collate_fn = lambda x:(torch.stack([xi[0] for xi in x]), clip.tokenize([xi[1][np.random.randint(5)] for xi in x]))

    dl = partial(torch.utils.data.DataLoader,
                 batch_size=args.batch_size,
                 num_workers=10,
                 collate_fn = collate_fn)


    train_dataloader = dl(train_dataset, shuffle=True)

    val_dataloader = dl(val_dataset, shuffle=False)

    model = MMEncoder(args)

    trainer = pl.Trainer(precision=args.precision,
                         gpus=args.gpus,
                         num_nodes=args.nodes,
                         gradient_clip_val=1,
                         accelerator='ddp',
                         max_epochs=args.max_epochs,
                         progress_bar_refresh_rate=10)

    trainer.fit(model, train_dataloader, [val_dataloader])



if __name__ == '__main__':
    main()


--------------------------------------------------------------------------------
/assets/examples_pos.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/SIMAT/00fc29c5f02e2438187dc694ede42cdbab4bd82a/assets/examples_pos.png


--------------------------------------------------------------------------------
/assets/method.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/SIMAT/00fc29c5f02e2438187dc694ede42cdbab4bd82a/assets/method.png


--------------------------------------------------------------------------------
/data/head_clip_t=0.01.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/SIMAT/00fc29c5f02e2438187dc694ede42cdbab4bd82a/data/head_clip_t=0.01.pt


--------------------------------------------------------------------------------
/data/head_clip_t=0.1.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/SIMAT/00fc29c5f02e2438187dc694ede42cdbab4bd82a/data/head_clip_t=0.1.pt


--------------------------------------------------------------------------------
/data/simat_img_clip.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/SIMAT/00fc29c5f02e2438187dc694ede42cdbab4bd82a/data/simat_img_clip.pt


--------------------------------------------------------------------------------
/data/simat_words_clip.ptd:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facebookresearch/SIMAT/00fc29c5f02e2438187dc694ede42cdbab4bd82a/data/simat_words_clip.ptd


--------------------------------------------------------------------------------
/encode.py:
--------------------------------------------------------------------------------
# Copyright (c) 2015-present, Facebook, Inc.
# All rights reserved.

import clip
import torch
import torchvision.datasets as datasets
from functools import partial
from tqdm import tqdm
import pandas as pd
from pathlib import Path

# code for encoding the SIMAT database with CLIP
# produces the files data/simat_img_clip_2.pt and data/simat_words_clip_2.ptd

device = 'cuda:1'

DATA_PATH = 'simat_db/images/'
CLIP_MODEL = 'ViT-B/32'

model, prep = clip.load(CLIP_MODEL, device=device)

ds = datasets.ImageFolder(DATA_PATH, transform=prep)

dl = torch.utils.data.DataLoader(ds, batch_size=32, num_workers=10, shuffle=False)

img_enc = torch.cat([model.encode_image(b.to(device)).cpu().detach() for b, i in tqdm(dl)]).float()

fnames = [x[0].name for x in datasets.ImageFolder(DATA_PATH, loader=Path)]
region_ids = [int(x[:-4]) for x in fnames]

img_enc_mapping = dict(zip(region_ids, img_enc))
torch.save(img_enc_mapping, 'data/simat_img_clip_2.pt')

# encode words
transfos = pd.read_csv('simat_db/transfos.csv', index_col=0)
words = list(set(transfos.target) | set(transfos.value))
tokens = clip.tokenize(words)

word_encs = torch.cat([model.encode_text(b.to(device)).cpu().detach() for b in tqdm(tokens.split(32))])

w2we = dict(zip(words, word_encs))
torch.save(w2we, 'data/simat_words_clip_2.ptd')


--------------------------------------------------------------------------------
/eval.py:
--------------------------------------------------------------------------------
# Copyright (c) 2015-present, Facebook, Inc.
# All rights reserved.

import clip
import torch.nn as nn
from torchvision import datasets
import argparse
import torch
import pandas as pd
import numpy as np

torch.Tensor.normalize = lambda x: x/x.norm(dim=-1, keepdim=True)

def simat_eval(args):
    #img_head, txt_head, emb_key='clip', lbds=[1], test=True:, tau
    # get heads !
    emb_key = 'clip'
    heads = torch.load(f'data/head_{emb_key}_t={args.tau}.pt')
    #heads = dict(img_head = lambda x:x, txt_head=lambda x:x)
    output = {}
    transfos = pd.read_csv('simat_db/transfos.csv', index_col=0)
    triplets = pd.read_csv('simat_db/triplets.csv', index_col=0)
    did2rid = dict(zip(triplets.dataset_id, triplets.index))
    rid2did = dict(zip(triplets.index, triplets.dataset_id))

    transfos = transfos[transfos.is_test == (args.domain == 'test')]

    transfos_did = [rid2did[rid] for rid in transfos.region_id]

    #new method
    clip_simat = torch.load('data/simat_img_clip.pt')
    img_embs_stacked = torch.stack([clip_simat[did2rid[i]] for i in range(len(clip_simat))]).float()
    img_embs_stacked = heads['img_head'](img_embs_stacked).normalize()
    value_embs = torch.stack([img_embs_stacked[did] for did in transfos_did])


    word_embs = dict(torch.load(f'data/simat_words_{emb_key}.ptd'))
    w2v = {k:heads['txt_head'](v.float()).normalize() for k, v in word_embs.items()}
    delta_vectors = torch.stack([w2v[x.target] - w2v[x.value] for i, x in transfos.iterrows()])

    oscar_scores = torch.load('simat_db/oscar_similarity_matrix.pt')
    weights = 1/np.array(transfos.norm2)**.5
    weights = weights/sum(weights)

    for lbd in args.lbds:
        target_embs = value_embs + lbd*delta_vectors

        nnb = (target_embs @ img_embs_stacked.T).topk(5).indices
        nnb_notself = [r[0] if r[0].item() != t else r[1] for r, t in zip(nnb, transfos_did)]

        scores = np.array([oscar_scores[ri, tc] for ri, tc in zip(nnb_notself, transfos.target_ids)]) > .5


        output[lbd] = 100*np.average(scores, weights=weights)
    return output

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Run eval')
    parser.add_argument('--domain', type=str, default='dev', help='domain, test or dev')
    parser.add_argument('--backbone', type=str, default='clip', help='backbone method. Only clip is supported.')
    parser.add_argument('--tau', type=float, default=0.1, help='pretraining temperature tau')
    parser.add_argument('--lbds', nargs='+', default=[1], help='list of values for lambda')
    args = parser.parse_args()
    args.lbds = [float(l) for l in args.lbds]

    output = simat_eval(args)
    print('SIMAT Scores:')
    for lbd, v in output.items():
        print(f'{lbd=}: {v:.2f}')

--------------------------------------------------------------------------------
/prepare_dataset.py:
--------------------------------------------------------------------------------
# Copyright (c) 2015-present, Facebook, Inc.
# All rights reserved.

import pandas as pd
from pathlib import Path
from tqdm import tqdm
from PIL import Image
import argparse

parser = argparse.ArgumentParser(description='prepare SIMAT dataset')
parser.add_argument('--path', type=str, required=True,
                    help='where the Visual Genome dataset is stored')
args = parser.parse_args()
# using Path avoids depending on a trailing slash in --path
path = Path(args.path)

triplets = pd.read_csv('simat_db/triplets.csv')

Path('simat_db/images').mkdir(exist_ok=True)
Path('simat_db/images/images').mkdir(exist_ok=True)

# add images in the right folder
retrieval_db = pd.read_csv('simat_db/retrieval_db.tsv', sep='\t', index_col=0)
rid2iid = dict(zip(retrieval_db.index, retrieval_db.image_id))
for i, l in tqdm(triplets.iterrows(), total=len(triplets)):
    # open the source Visual Genome image and crop the annotated region
    img = Image.open(path / f'{rid2iid[l.region_id]}.jpg')
    bbox = [int(x) for x in retrieval_db.loc[l.region_id].bbox.split(',')]
    img.crop(bbox).save(f'simat_db/images/images/{l.region_id}.png')



--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
# requires Python >= 3.8 (eval.py uses the f-string "=" specifier)
numpy
pandas
tqdm
torch
torchvision
pytorch-lightning
Pillow
git+https://github.com/openai/CLIP.git

--------------------------------------------------------------------------------
/simat_db/captions.txt:
--------------------------------------------------------------------------------
1 | A bear laying against a wall. 2 | A bear laying in a bed. 3 | A bear laying in a field. 4 | A bear laying in dirt. 5 | A bear laying on a log. 6 | A bear laying on a rock. 7 | A bear laying on a tree. 8 | A bear laying on rocks. 9 | A bear laying on the grass. 10 | A bear leaning on a log. 11 | A bear leaning on a rock. 12 | A bear leaning on a tree. 13 | A bear leaning on a wall. 14 | A bear licking another bear. 15 | A bear playing in water. 16 | A bear playing with a bear. 17 | A bear sitting on a bed.
18 | A bear sitting on a chair. 19 | A bear sitting on a rock. 20 | A bear sitting on a shelf. 21 | A bear sitting on a table. 22 | A bear sitting on a tree. 23 | A bear sitting on another bear. 24 | A bear sitting on books. 25 | A bear sitting on lap. 26 | A bear sitting on the floor. 27 | A bear sitting on the grass. 28 | A bear sitting on water. 29 | A bear sitting outside. 30 | A bear sniffing a man. 31 | A bear sniffing grass. 32 | A bird eating from a cup. 33 | A bird sitting on a bench. 34 | A bird sitting on a boat. 35 | A bird sitting on a bowl. 36 | A bird sitting on a building. 37 | A bird sitting on a chair. 38 | A bird sitting on a computer. 39 | A bird sitting on a feeder. 40 | A bird sitting on a fence. 41 | A bird sitting on a hand. 42 | A bird sitting on a pole. 43 | A bird sitting on a rock. 44 | A bird sitting on a table. 45 | A bird sitting on a tree. 46 | A bird sitting on a window. 47 | A bird sitting on rocks. 48 | A bird sitting on the grass. 49 | A bird sitting on the sand. 50 | A bird sitting on the water. 51 | A boy balancing on a skateboard. 52 | A boy balancing on a surfboard. 53 | A boy jumping on a skateboard. 54 | A boy laying in a bed. 55 | A boy laying in a sofa. 56 | A boy laying on a pillow. 57 | A boy laying on a skateboard. 58 | A boy laying on a surfboard. 59 | A boy leaning on a fence. 60 | A boy leaning on a table. 61 | A boy playing in a park. 62 | A boy playing in water. 63 | A boy playing on a field. 64 | A boy playing on the beach. 65 | A boy playing on the grass. 66 | A boy playing on the sand. 67 | A boy playing outside. 68 | A boy playing with a ball. 69 | A boy playing with a baseball bat. 70 | A boy playing with a computer. 71 | A boy playing with a frisbee. 72 | A boy playing with a girl. 73 | A boy playing with a kite. 74 | A boy playing with a skateboard. 75 | A boy playing with another boy. 76 | A boy running in a field. 77 | A boy running on the beach. 78 | A boy running on the grass. 
79 | A boy sitting on a arm. 80 | A boy sitting on a bed. 81 | A boy sitting on a bench. 82 | A boy sitting on a boat. 83 | A boy sitting on a chair. 84 | A boy sitting on a couch. 85 | A boy sitting on a desk. 86 | A boy sitting on a elephant. 87 | A boy sitting on a fence. 88 | A boy sitting on a field. 89 | A boy sitting on a horse. 90 | A boy sitting on a motorcycle. 91 | A boy sitting on a seat. 92 | A boy sitting on a skateboard. 93 | A boy sitting on a sofa. 94 | A boy sitting on a table. 95 | A boy sitting on a toilet. 96 | A boy sitting on a wall. 97 | A boy sitting on the floor. 98 | A boy sitting on the grass. 99 | A boy sleeping on a couch. 100 | A boy swinging a baseball bat. 101 | A boy swinging a tennis racket. 102 | A boy touching a elephant. 103 | A cat eating from a plate. 104 | A cat eating from a toilet. 105 | A cat laying in a bag. 106 | A cat laying in a bed. 107 | A cat laying in a bench. 108 | A cat laying in a blanket. 109 | A cat laying in a car. 110 | A cat laying in a cat. 111 | A cat laying in a hood. 112 | A cat laying in a sink. 113 | A cat laying in a sofa. 114 | A cat laying in a suitcase. 115 | A cat laying on a carpet. 116 | A cat laying on a chair. 117 | A cat laying on a computer. 118 | A cat laying on a couch. 119 | A cat laying on a desk. 120 | A cat laying on a keyboard. 121 | A cat laying on a pillow. 122 | A cat laying on a rug. 123 | A cat laying on a table. 124 | A cat laying on the floor. 125 | A cat licking a keyboard. 126 | A cat licking a plate. 127 | A cat licking his leg. 128 | A cat playing with another cat. 129 | A cat playing with shoes. 130 | A cat sitting on a bed. 131 | A cat sitting on a bench. 132 | A cat sitting on a blanket. 133 | A cat sitting on a car. 134 | A cat sitting on a chair. 135 | A cat sitting on a computer. 136 | A cat sitting on a couch. 137 | A cat sitting on a desk. 138 | A cat sitting on a floor. 139 | A cat sitting on a grass. 140 | A cat sitting on a rug. 141 | A cat sitting on a seat. 
142 | A cat sitting on a sink. 143 | A cat sitting on a sofa. 144 | A cat sitting on a suitcase. 145 | A cat sitting on a table. 146 | A cat sitting on a toilet. 147 | A cat sitting on a tv. 148 | A cat sitting on a window sill. 149 | A cat sitting on a window. 150 | A cat sleeping on a bed. 151 | A cat sleeping on a blanket. 152 | A cat sleeping on a chair. 153 | A cat sleeping on a computer. 154 | A cat sleeping on a couch. 155 | A cat sleeping on a desk. 156 | A cat sleeping on a floor. 157 | A cat sleeping on a rug. 158 | A cat sniffing a bottle. 159 | A cat sniffing a computer. 160 | A cat sniffing a pizza. 161 | A cat touching a tv. 162 | A cow drinking water. 163 | A cow eating from a bottle. 164 | A cow laying in a field. 165 | A cow laying in a straw. 166 | A cow laying in grass. 167 | A cow laying in hay. 168 | A cow laying in the dirt. 169 | A cow laying on the beach. 170 | A cow laying on the floor. 171 | A cow laying on the sand. 172 | A cow laying on the street. 173 | A cow licking another cow. 174 | A cow sitting on a beach. 175 | A cow sitting on a field. 176 | A cow sitting on another cow. 177 | A cow sitting on dirt. 178 | A cow sitting on hay. 179 | A cow sitting on sand. 180 | A cow sitting on the grass. 181 | A cow touching a cow. 182 | A dog biting a frisbee. 183 | A dog chasing a ball. 184 | A dog chasing a cow. 185 | A dog chasing a dog. 186 | A dog chasing a frisbee. 187 | A dog chasing a motorcycle. 188 | A dog chasing a sheep. 189 | A dog entering water. 190 | A dog grabbing a frisbee. 191 | A dog laying in a bed. 192 | A dog laying in a boat. 193 | A dog laying in a couch. 194 | A dog laying in a lap. 195 | A dog laying in a sofa. 196 | A dog laying in dirt. 197 | A dog laying in grass. 198 | A dog laying in rocks. 199 | A dog laying in the street. 200 | A dog laying on a bench. 201 | A dog laying on a blanket. 202 | A dog laying on a carpet. 203 | A dog laying on a man. 204 | A dog laying on a pillow. 205 | A dog laying on a rug. 
206 | A dog laying on a suitcase. 207 | A dog laying on a surfboard. 208 | A dog laying on sand. 209 | A dog laying on the beach. 210 | A dog laying on the floor. 211 | A dog playing in a field. 212 | A dog playing in water. 213 | A dog playing on the beach. 214 | A dog playing on the floor. 215 | A dog playing on the grass. 216 | A dog playing with a ball. 217 | A dog playing with a cat. 218 | A dog playing with a frisbee. 219 | A dog playing with a soccer ball. 220 | A dog playing with another dog. 221 | A dog playing with snow. 222 | A dog running in a beach. 223 | A dog running in a field. 224 | A dog running in grass. 225 | A dog running in sand. 226 | A dog running in the snow. 227 | A dog running in water. 228 | A dog sitting in the dirt. 229 | A dog sitting on a bed. 230 | A dog sitting on a bench. 231 | A dog sitting on a blanket. 232 | A dog sitting on a car. 233 | A dog sitting on a carpet. 234 | A dog sitting on a chair. 235 | A dog sitting on a couch. 236 | A dog sitting on a floor. 237 | A dog sitting on a grass. 238 | A dog sitting on a lap. 239 | A dog sitting on a man. 240 | A dog sitting on a motorcycle. 241 | A dog sitting on a pillow. 242 | A dog sitting on a sand. 243 | A dog sitting on a seat. 244 | A dog sitting on a sidewalk. 245 | A dog sitting on a table. 246 | A dog sitting on a truck. 247 | A dog sitting on a window. 248 | A dog sleeping on a bed. 249 | A dog sleeping on a bench. 250 | A dog sleeping on a carpet. 251 | A dog sleeping on a couch. 252 | A dog sleeping on a floor. 253 | A dog sleeping on a pillow. 254 | A dog sleeping on a sidewalk. 255 | A dog sniffing a frisbee. 256 | A dog sniffing grass. 257 | A elephant drinking water. 258 | A elephant laying in grass. 259 | A elephant laying in water. 260 | A elephant leaving water. 261 | A elephant playing in water. 262 | A elephant playing with a ball. 263 | A elephant playing with a mud. 264 | A elephant playing with another elephant. 265 | A elephant sitting in water. 
266 | A elephant touching a man. 267 | A giraffe drinking water. 268 | A giraffe eating a leaf. 269 | A giraffe eating from a basket. 270 | A giraffe eating from a feeder. 271 | A giraffe eating from a hand. 272 | A giraffe eating from a tree. 273 | A giraffe eating from a woman. 274 | A giraffe laying in a dirt. 275 | A giraffe laying in grass. 276 | A giraffe leaning on a fence. 277 | A giraffe leaning on a railing. 278 | A giraffe leaning on a rock. 279 | A giraffe licking a leaf. 280 | A giraffe licking a pole. 281 | A giraffe licking a tree. 282 | A giraffe running in a field. 283 | A giraffe sitting in the dirt. 284 | A giraffe sitting on grass. 285 | A giraffe sniffing another giraffe. 286 | A giraffe touching another giraffe. 287 | A girl balancing on a surfboard. 288 | A girl biting a pizza. 289 | A girl jumping on a bed. 290 | A girl laying in a bed. 291 | A girl laying in a couch. 292 | A girl laying in a pillow. 293 | A girl laying on a bench. 294 | A girl laying on a board. 295 | A girl laying on a surfboard. 296 | A girl laying on the floor. 297 | A girl leaning on a computer. 298 | A girl leaning on a fence. 299 | A girl playing in a field. 300 | A girl playing in a park. 301 | A girl playing in water. 302 | A girl playing on the beach. 303 | A girl playing on the grass. 304 | A girl playing on the sand. 305 | A girl playing outside. 306 | A girl playing with a ball. 307 | A girl playing with a boy. 308 | A girl playing with a computer. 309 | A girl playing with a cup. 310 | A girl playing with a frisbee. 311 | A girl playing with a girl. 312 | A girl playing with a kite. 313 | A girl playing with a skateboard. 314 | A girl playing with a umbrella. 315 | A girl running in a beach. 316 | A girl running in a field. 317 | A girl running in a road. 318 | A girl running on grass. 319 | A girl sitting on a bed. 320 | A girl sitting on a bench. 321 | A girl sitting on a blanket. 322 | A girl sitting on a bus. 323 | A girl sitting on a chair. 
324 | A girl sitting on a couch. 325 | A girl sitting on a counter. 326 | A girl sitting on a desk. 327 | A girl sitting on a floor. 328 | A girl sitting on a girl. 329 | A girl sitting on a grass. 330 | A girl sitting on a horse. 331 | A girl sitting on a lap. 332 | A girl sitting on a mat. 333 | A girl sitting on a seat. 334 | A girl sitting on a sofa. 335 | A girl sitting on a suitcase. 336 | A girl sitting on a surfboard. 337 | A girl sitting on a table. 338 | A girl sitting on a wall. 339 | A girl sleeping on a seat. 340 | A girl swinging a baseball bat. 341 | A girl swinging a tennis racket. 342 | A girl touching a horse. 343 | A horse chasing a cow. 344 | A horse jumping on a fence. 345 | A horse jumping on poles. 346 | A horse laying in a field. 347 | A horse laying on grass. 348 | A horse laying on his back. 349 | A horse laying on the beach. 350 | A horse leaning on a fence. 351 | A horse running in a field. 352 | A horse running in the dirt. 353 | A horse running in water. 354 | A horse running on a sand. 355 | A horse running on grass. 356 | A horse running on the beach. 357 | A man balancing on a skateboard. 358 | A man balancing on a surfboard. 359 | A man balancing on a wave. 360 | A man biting a pizza. 361 | A man biting a surfboard. 362 | A man chasing a ball. 363 | A man chasing a cow. 364 | A man chasing a frisbee. 365 | A man eating from a bottle. 366 | A man eating from a cup. 367 | A man eating from a plate. 368 | A man entering a bus. 369 | A man entering a plane. 370 | A man entering a train. 371 | A man grabbing a frisbee. 372 | A man grabbing a pizza. 373 | A man grabbing a skateboard. 374 | A man grabbing a surfboard. 375 | A man grabbing a tennis racket. 376 | A man grabbing food. 377 | A man jumping on a bed. 378 | A man jumping on a skateboard. 379 | A man jumping on skis. 380 | A man laying in a chair. 381 | A man laying in a couch. 382 | A man laying in bed. 383 | A man laying in grass. 384 | A man laying in snow. 
385 | A man laying in water. 386 | A man laying on a bench. 387 | A man laying on a board. 388 | A man laying on a pillow. 389 | A man laying on a surfboard. 390 | A man laying on a towel. 391 | A man laying on sand. 392 | A man laying on the beach. 393 | A man laying on the floor. 394 | A man laying on the sidewalk. 395 | A man leaning on a baseball bat. 396 | A man leaning on a bicycle. 397 | A man leaning on a bike. 398 | A man leaning on a boat. 399 | A man leaning on a car. 400 | A man leaning on a computer. 401 | A man leaning on a counter. 402 | A man leaning on a fence. 403 | A man leaning on a hand. 404 | A man leaning on a kite. 405 | A man leaning on a motorcycle. 406 | A man leaning on a pole. 407 | A man leaning on a rail. 408 | A man leaning on a railing. 409 | A man leaning on a surfboard. 410 | A man leaning on a table. 411 | A man leaning on a train. 412 | A man leaning on a tree. 413 | A man leaning on a truck. 414 | A man leaning on a wall. 415 | A man leaving a train. 416 | A man leaving water. 417 | A man playing in the snow. 418 | A man playing on a field. 419 | A man playing on the beach. 420 | A man playing on the grass. 421 | A man playing on the sand. 422 | A man playing tennis on a tennis court. 423 | A man playing with a ball. 424 | A man playing with a board. 425 | A man playing with a boy. 426 | A man playing with a bus. 427 | A man playing with a dog. 428 | A man playing with a frisbee. 429 | A man playing with a keyboard. 430 | A man playing with a kite. 431 | A man playing with a soccer ball. 432 | A man playing with a tennis racket. 433 | A man playing with a water. 434 | A man playing with a wave. 435 | A man playing with a woman. 436 | A man playing with another man. 437 | A man running in a field. 438 | A man running on a beach. 439 | A man running on grass. 440 | A man running on the sand. 441 | A man running with another man. 442 | A man selling bananas. 443 | A man sitting on a bed. 444 | A man sitting on a bench. 
445 | A man sitting on a bike. 446 | A man sitting on a boat. 447 | A man sitting on a bus. 448 | A man sitting on a car. 449 | A man sitting on a chair. 450 | A man sitting on a couch. 451 | A man sitting on a desk. 452 | A man sitting on a elephant. 453 | A man sitting on a floor. 454 | A man sitting on a grass. 455 | A man sitting on a horse. 456 | A man sitting on a motorcycle. 457 | A man sitting on a seat. 458 | A man sitting on a sidewalk. 459 | A man sitting on a sofa. 460 | A man sitting on a table. 461 | A man sitting on the beach. 462 | A man sitting on the snow. 463 | A man sleeping on a bed. 464 | A man sleeping on a bench. 465 | A man sleeping on a pillow. 466 | A man swinging a ball. 467 | A man swinging a baseball bat. 468 | A man swinging a tennis racket. 469 | A man swinging an arm. 470 | A man touching a banana. 471 | A man touching a board. 472 | A man touching a cat. 473 | A man touching a computer. 474 | A man touching a cow. 475 | A man touching a dog. 476 | A man touching a elephant. 477 | A man touching a head. 478 | A man touching a horse. 479 | A man touching a pizza. 480 | A man touching a plane. 481 | A man touching a skateboard. 482 | A man touching a snow. 483 | A man touching a suitcase. 484 | A man touching a water. 485 | A man touching a woman. 486 | A man waiting at a bus stop. 487 | A man waiting at a train. 488 | A man waiting for the bus. 489 | A sheep laying in a field. 490 | A sheep laying in a straw. 491 | A sheep laying in grass. 492 | A sheep laying on a hill. 493 | A sheep laying on hay. 494 | A sheep running in a field. 495 | A sheep running in grass. 496 | A sheep sitting on a field. 497 | A sheep sitting on a grass. 498 | A woman balancing on a surfboard. 499 | A woman entering a train. 500 | A woman entering bus. 501 | A woman grabbing a man. 502 | A woman grabbing a pizza. 503 | A woman grabbing a tennis racket. 504 | A woman jumping on a horse. 505 | A woman laying in a bag. 506 | A woman laying in a bed. 
507 | A woman laying in a bench. 508 | A woman laying in a board. 509 | A woman laying in a couch. 510 | A woman laying in a floor. 511 | A woman laying in a sofa. 512 | A woman laying in hay. 513 | A woman laying on a pillow. 514 | A woman laying on a surfboard. 515 | A woman laying on a towel. 516 | A woman laying on her head. 517 | A woman laying on sand. 518 | A woman laying on snow. 519 | A woman laying on the beach. 520 | A woman leaning on a baseball bat. 521 | A woman leaning on a bear. 522 | A woman leaning on a bike. 523 | A woman leaning on a chair. 524 | A woman leaning on a computer. 525 | A woman leaning on a couch. 526 | A woman leaning on a counter. 527 | A woman leaning on a fence. 528 | A woman leaning on a man. 529 | A woman leaning on a pole. 530 | A woman leaning on a railing. 531 | A woman leaning on a table. 532 | A woman leaning on a wall. 533 | A woman playing in a field. 534 | A woman playing in a park. 535 | A woman playing in the water. 536 | A woman playing on sand. 537 | A woman playing on the grass. 538 | A woman playing tennis on a tennis court. 539 | A woman playing with a ball. 540 | A woman playing with a dog. 541 | A woman playing with a elephant. 542 | A woman playing with a frisbee. 543 | A woman playing with a tennis racket. 544 | A woman playing with a wave. 545 | A woman playing with another woman. 546 | A woman running in a field. 547 | A woman running in grass. 548 | A woman running in the water. 549 | A woman running on the beach. 550 | A woman running on the sidewalk. 551 | A woman selling bananas. 552 | A woman selling food. 553 | A woman sitting on a bed. 554 | A woman sitting on a bench. 555 | A woman sitting on a boat. 556 | A woman sitting on a bus. 557 | A woman sitting on a chair. 558 | A woman sitting on a couch. 559 | A woman sitting on a desk. 560 | A woman sitting on a floor. 561 | A woman sitting on a grass. 562 | A woman sitting on a horse. 563 | A woman sitting on a man. 
564 | A woman sitting on a motorcycle. 565 | A woman sitting on a sand. 566 | A woman sitting on a seat. 567 | A woman sitting on a sofa. 568 | A woman sitting on a surfboard. 569 | A woman sitting on a table. 570 | A woman sitting on a wall. 571 | A woman sitting on snow. 572 | A woman sitting on the beach. 573 | A woman sleeping on a bed. 574 | A woman swinging a baseball bat. 575 | A woman swinging a tennis racket. 576 | A woman swinging her arm. 577 | A woman touching a bottle. 578 | A woman touching a cat. 579 | A woman touching a dog. 580 | A woman touching a elephant. 581 | A woman touching a giraffe. 582 | A woman touching a horse. 583 | A woman touching a man. 584 | A woman touching a sheep. 585 | A woman waiting at a bus stop. 586 | A woman waiting at a platform. 587 | A zebra biting a zebra. 588 | A zebra drinking water. 589 | A zebra eating from a feeder. 590 | A zebra eating from hay. 591 | A zebra laying in a field. 592 | A zebra laying on dirt. 593 | A zebra laying on grass. 594 | A zebra laying on mud. 595 | A zebra leaning on a zebra. 596 | A zebra playing with another zebra. 597 | A zebra running in a field. 598 | A zebra running in water. 599 | A zebra running on grass. 600 | A zebra sitting on grass. 601 | A zebra touching another zebra 602 | People entering a bus. 603 | People entering a plane. 604 | People entering a train. 605 | People laying on the beach. 606 | People laying on the sand. 607 | People leaning on a computer. 608 | People leaning on a fence. 609 | People leaning on a railing. 610 | People leaving a building. 611 | People leaving a plane. 612 | People playing in a field. 613 | People playing in the snow. 614 | People playing in water. 615 | People playing on grass. 616 | People playing on sand. 617 | People playing on the beach. 618 | People playing with a ball. 619 | People playing with a frisbee. 620 | People playing with a kite. 621 | People playing with a park. 622 | People sitting on a beach. 
623 | People sitting on a bench. 624 | People sitting on a boat. 625 | People sitting on a building. 626 | People sitting on a bus. 627 | People sitting on a car. 628 | People sitting on a chair. 629 | People sitting on a couch. 630 | People sitting on a elephant. 631 | People sitting on a floor. 632 | People sitting on a grass. 633 | People sitting on a park. 634 | People sitting on a sand. 635 | People sitting on a seat. 636 | People sitting on a sidewalk. 637 | People sitting on a snow. 638 | People sitting on a sofa. 639 | People sitting on a table. 640 | People sitting on a train. 641 | People sitting on a wall. 642 | People waiting at a bus stop. 643 | People waiting at a platform. 644 | People waiting at a train. 645 | People waiting at bus. 646 | -------------------------------------------------------------------------------- /simat_db/oscar_similarity_matrix.pt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebookresearch/SIMAT/00fc29c5f02e2438187dc694ede42cdbab4bd82a/simat_db/oscar_similarity_matrix.pt --------------------------------------------------------------------------------