├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── NOTICE
├── README.md
├── figures
│   ├── qual_retriv.png
│   └── teaser.png
└── trainer
    └── loss.py


/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing Guidelines

Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
documentation, we greatly value feedback and contributions from our community.

Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
information to effectively respond to your bug report or contribution.


## Reporting Bugs/Feature Requests

We welcome you to use the GitHub issue tracker to report bugs or suggest features.

When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:

* A reproducible test case or series of steps
* The version of our code being used
* Any modifications you've made relevant to the bug
* Anything unusual about your environment or deployment


## Contributing via Pull Requests
Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:

1. You are working against the latest source on the *main* branch.
2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
3. You open an issue to discuss any significant work - we would hate for your time to be wasted.

To send us a pull request, please:

1. Fork the repository.
2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
3. Ensure local tests pass.
4. Commit to your fork using clear commit messages.
5. Send us a pull request, answering any default questions in the pull request interface.
6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.

GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
[creating a pull request](https://help.github.com/articles/creating-a-pull-request/).


## Finding contributions to work on
Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.


## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.


## Security issue notifications
If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.


## Licensing

See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

--------------------------------------------------------------------------------
/NOTICE:
--------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# CrossCLR - ICCV 2021

This is the official implementation of the paper:

### CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations [[Paper]](https://arxiv.org/abs/2109.14910)

Authors:
[Mohammadreza Zolfaghari](https://mzolfaghari.github.io/),
[Yi Zhu](https://bryanyzhu.github.io/),
[Peter Gehler](http://gehler.io/),
[Thomas Brox](https://lmb.informatik.uni-freiburg.de/people/brox/index.html)



## Update

##### [Dec 2021] CrossCLR-onlyIntraModality released
## Loss Function
The loss function [`CrossCLR`](https://github.com/amazon-research/crossmodal-contrastive-learning), implemented as `CrossCLR_onlyIntraModality` in `trainer/loss.py`, takes video features and text features of shape `[bsz, f_dim]` as input and returns a scalar loss.

Usage:
```python
from trainer.loss import CrossCLR_onlyIntraModality

# define loss with a temperature `temp` and weights for negative samples `w`
criterion = CrossCLR_onlyIntraModality(temperature=temp, negative_weight=w)

# features: [bsz, f_dim]
video_features = ...
text_features = ...

# CrossCLR
loss = criterion(video_features, text_features)

...
```


## Qualitative samples


## Reference
```
@inproceedings{crossclr_aws_21,
  author     = {Mohammadreza Zolfaghari and
                Yi Zhu and
                Peter V. Gehler and
                Thomas Brox},
  title      = {CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations},
  url        = {https://arxiv.org/abs/2109.14910},
  eprinttype = {arXiv},
  booktitle  = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month      = {October},
  year       = {2021},
}
```


## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This project is licensed under the Apache-2.0 License.


--------------------------------------------------------------------------------
/figures/qual_retriv.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/amazon-science/crossmodal-contrastive-learning/c9627a66c2b2737af8c151ee8526f3cdfec42c5a/figures/qual_retriv.png

--------------------------------------------------------------------------------
/figures/teaser.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/amazon-science/crossmodal-contrastive-learning/c9627a66c2b2737af8c151ee8526f3cdfec42c5a/figures/teaser.png

--------------------------------------------------------------------------------
/trainer/loss.py:
--------------------------------------------------------------------------------
from torch import nn
import torch
import torch.nn.functional as F


def cosine_sim(emb1, emb2):
    """Compute pairwise similarities between two groups of embeddings.

    Args:
        emb1: tensor of shape (n, embed_dim)
        emb2: tensor of shape (m, embed_dim)
    Returns:
        Tensor of shape (n, m) with the pairwise dot products. These are
        cosine similarities in [-1, 1] only if the inputs are L2-normalized.
    """
    return emb1.mm(emb2.t())


class MaxMargin_coot(nn.Module):
    """Regular Contrastive Loss between 2 groups of embeddings
    inputs shape (batch, embed_dim)
    Ref: COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning, NeurIPS 2020
    """

    def __init__(self, use_cuda: bool, margin: float = 0.1):
        super().__init__()
        self.margin = margin
        self.sim = cosine_sim
        self.use_cuda = use_cuda

    def forward(self, im, s):
        scores = self.sim(im, s)
        # Scores of the matching pairs, broadcast along rows and columns.
        diagonal = scores.diag().view(im.size(0), 1)
        d1 = diagonal.expand_as(scores)
        d2 = diagonal.t().expand_as(scores)
        # Hinge cost in both retrieval directions.
        cost_s = (self.margin + scores - d1).clamp(min=0)
        cost_im = (self.margin + scores - d2).clamp(min=0)
        # Matching pairs must not be penalized, so clear the diagonal.
        mask = torch.eye(scores.size(0)) > .5
        if self.use_cuda:
            mask = mask.cuda()
        cost_s = cost_s.masked_fill_(mask, 0)
        cost_im = cost_im.masked_fill_(mask, 0)
        return (cost_s.sum() + cost_im.sum()).div(im.shape[0] * s.shape[0])


class CrossCLR_onlyIntraModality(nn.Module):
    """
    CrossCLR Loss between 2 groups of embeddings - Only Intra Modality alignment
    ICCV 2021
    """

    def __init__(self, temperature=0.03, negative_weight=0.8, logger=None):
        super().__init__()
        self.logit_scale = nn.Parameter(torch.ones([]))
        self.criterion = torch.nn.CrossEntropyLoss(reduction='none')
        self.temperature = temperature
        self.logger = logger
        self.negative_w = negative_weight  # Weight of negative samples logits.

    def compute_loss(self, logits, mask):
        # Negative log of the softmax probability mass on the entries selected by `mask`.
        return -torch.log((F.softmax(logits, dim=1) * mask).sum(1))

    def _get_positive_mask(self, batch_size, device=None, dtype=None):
        # 1 everywhere except the diagonal: multiplying by this mask zeroes
        # each sample's similarity with itself.
        return 1 - torch.eye(batch_size, device=device, dtype=dtype)

    def forward(self, video_features, text_features):
        """
        Inputs shape (batch, embed_dim)
        Args:
            video_features: Visual embeddings (batch, embed_dim)
            text_features: Text embeddings (batch, embed_dim)
        Returns:
            Scalar loss averaged over both directions (video-to-text and text-to-video).
        """
        batch_size = video_features.shape[0]
        device, dtype = video_features.device, video_features.dtype

        # Normalize features
        video_features = nn.functional.normalize(video_features, dim=1)
        text_features = nn.functional.normalize(text_features, dim=1)

        # Inter-modality alignment
        logits_per_vid = video_features @ text_features.t()
        logits_per_text = text_features @ video_features.t()

        # Intra-modality alignment
        logits_clstr_vid = video_features @ video_features.t()
        logits_clstr_txt = text_features @ text_features.t()

        logits_per_vid /= self.temperature
        logits_per_text /= self.temperature
        logits_clstr_vid /= self.temperature
        logits_clstr_txt /= self.temperature

        # Intra-modality negatives with the self-similarity diagonal zeroed.
        positive_mask = self._get_positive_mask(batch_size, device=device, dtype=dtype)
        negatives_vid = logits_clstr_vid * positive_mask
        negatives_txt = logits_clstr_txt * positive_mask

        vid_logits = torch.cat([logits_per_vid, self.negative_w * negatives_vid], dim=1)
        txt_logits = torch.cat([logits_per_text, self.negative_w * negatives_txt], dim=1)

        # Only the matching cross-modal pair (diagonal of the first block) is a positive.
        mask_vid = torch.eye(batch_size, device=device, dtype=dtype)
        mask_txt = torch.eye(batch_size, device=device, dtype=dtype)

        mask_neg_v = torch.zeros_like(negatives_vid)
        mask_neg_t = torch.zeros_like(negatives_txt)
        mask_v = torch.cat([mask_vid, mask_neg_v], dim=1)
        mask_t = torch.cat([mask_txt, mask_neg_t], dim=1)

        loss_i = self.compute_loss(vid_logits, mask_v)
        loss_t = self.compute_loss(txt_logits, mask_t)

        return (loss_i.mean() + loss_t.mean()) / 2
--------------------------------------------------------------------------------
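The arithmetic in `CrossCLR_onlyIntraModality.forward` can be checked on a tiny batch without PyTorch. The following is a minimal sketch using only the standard library, not part of the repository: the names `l2_normalize`, `dot`, `one_direction` and `crossclr_intra_reference` are hypothetical, and the default values simply mirror the constructor defaults in `trainer/loss.py`. For each sample it builds the inter-modality logits, appends the weighted intra-modality logits with the diagonal set to zero, and takes the negative log-softmax of the matching pair. Note that the zeroed diagonal still contributes a logit of 0 (a term of exp(0) = 1) to the softmax denominator, which is what multiplying by the mask does in `forward`.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def one_direction(anchor, other, temperature, negative_w):
    """Mean loss for one direction (for example video -> text).

    anchor, other: lists of L2-normalized vectors, one per sample.
    """
    n = len(anchor)
    losses = []
    for i in range(n):
        # Inter-modality logits: sample i against every sample of the other modality.
        inter = [dot(anchor[i], other[j]) / temperature for j in range(n)]
        # Intra-modality logits: sample i against its own modality, with the
        # diagonal entry set to 0 and the rest scaled by negative_w.
        intra = [0.0 if j == i else negative_w * dot(anchor[i], anchor[j]) / temperature
                 for j in range(n)]
        logits = inter + intra
        # Numerically stable log-sum-exp over all 2n logits.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Only the matching pair (entry i of the inter block) counts as positive.
        losses.append(log_denominator - inter[i])
    return sum(losses) / n


def crossclr_intra_reference(video, text, temperature=0.03, negative_w=0.8):
    """Framework-free reference value for the loss on a small batch."""
    video = [l2_normalize(v) for v in video]
    text = [l2_normalize(t) for t in text]
    loss_v = one_direction(video, text, temperature, negative_w)
    loss_t = one_direction(text, video, temperature, negative_w)
    return (loss_v + loss_t) / 2


if __name__ == "__main__":
    # Made-up toy embeddings, chosen only to exercise the function.
    video = [[1.0, 0.0], [0.0, 1.0]]
    text = [[2.0, 0.0], [0.0, 3.0]]
    print(crossclr_intra_reference(video, text))
```

A reference like this is mainly useful as a sanity check: feeding the same small batch to the PyTorch module and to this function should give values that agree up to floating-point precision.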