├── CODE_OF_CONDUCT.md
├── LICENSE
├── SUPPORT.md
├── README_prev.md
├── SECURITY.md
├── randomized_quantization.py
└── README.md

/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Microsoft Open Source Code of Conduct

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).

Resources:

- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/SUPPORT.md:
--------------------------------------------------------------------------------
# TODO: The maintainer of this repo has not yet edited this file

**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?

- **No CSS support:** Fill out this template with information about how to file issues and get help.
- **Yes CSS support:** Fill out an intake form at [aka.ms/onboardsupport](https://aka.ms/onboardsupport). CSS will work with/help you to determine next steps.
- **Not sure?** Fill out an intake as though the answer were "Yes". CSS will help you decide.

*Then remove this first heading from this SUPPORT.md file before publishing your repo.*

# Support

## How to file issues and get help

This project uses GitHub Issues to track bugs and feature requests. Please search the existing
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.

For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
CHANNEL. WHERE WILL YOU HELP PEOPLE?**.

## Microsoft Support Policy

Support for this **PROJECT or PRODUCT** is limited to the resources listed above.
--------------------------------------------------------------------------------
/README_prev.md:
--------------------------------------------------------------------------------
# Project

> This repo has been populated by an initial template to help get you started. Please
> make sure to update the content to build a great experience for community-building.

As the maintainer of this project, please make a few updates:

- Improve this README.md file to provide a great experience
- Update SUPPORT.md with content about this project's support experience
- Understand the security reporting process in SECURITY.md
- Remove this section from the README

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
## Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, including [Microsoft](https://github.com/microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report).
If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

* Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs.

## Preferred Languages

We prefer all communications to be in English.

## Policy

Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd).
--------------------------------------------------------------------------------
/randomized_quantization.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn


class RandomizedQuantizationAugModule(nn.Module):
    def __init__(self, region_num, collapse_to_val='inside_random', spacing='random',
                 transforms_like=False, p_random_apply_rand_quant=1):
        """
        region_num: int; number of quantization regions per channel
        collapse_to_val: 'middle' | 'inside_random' | 'all_zeros'; proxy value used for each region
        spacing: 'random' | 'uniform'; how region boundaries are placed
        transforms_like: bool; if True, inputs are (C, H, W) as in torchvision.transforms;
            otherwise inputs are (B, C, H, W)
        p_random_apply_rand_quant: float; probability of applying the augmentation
        """
        super().__init__()
        self.region_num = region_num
        self.collapse_to_val = collapse_to_val
        self.spacing = spacing
        self.transforms_like = transforms_like
        self.p_random_apply_rand_quant = p_random_apply_rand_quant

    def get_params(self, x):
        """
        x: (C, H, W)
        returns (C), (C), (C)
        """
        C, _, _ = x.size()  # one image, channels first
        min_val, max_val = x.view(C, -1).min(1)[0], x.view(C, -1).max(1)[0]  # per-channel min/max over the spatial dimensions
        total_region_percentile_number = (torch.ones(C) * (self.region_num - 1)).int()  # region_num - 1 boundaries per channel
        return min_val, max_val, total_region_percentile_number

    def forward(self, x):
        """
        x: (B, c, H, W), or (C, H, W) when transforms_like=True
        """
        EPSILON = 1  # pushes the last right end past max_val so the max pixel falls inside a region
        if self.p_random_apply_rand_quant != 1:
            x_orig = x
        if not self.transforms_like:
            B, c, H, W = x.shape
            C = B * c
            x = x.view(C, H, W)
        else:
            C, H, W = x.shape
        min_val, max_val, total_region_percentile_number_per_channel = self.get_params(x)  # -> (C), (C), (C)

        # region percentiles for each channel
        if self.spacing == "random":
            region_percentiles = torch.rand(total_region_percentile_number_per_channel.sum(), device=x.device)
        elif self.spacing == "uniform":
            region_percentiles = torch.tile(torch.arange(1 / (total_region_percentile_number_per_channel[0] + 1), 1, step=1 / (total_region_percentile_number_per_channel[0] + 1), device=x.device), [C])
        else:
            raise NotImplementedError(f"unknown spacing: {self.spacing}")
        region_percentiles_per_channel = region_percentiles.reshape([-1, self.region_num - 1])
        # ordered region ends
        region_percentiles_pos = (region_percentiles_per_channel * (max_val - min_val).view(C, 1) + min_val.view(C, 1)).view(C, -1, 1, 1)
        ordered_region_right_ends_for_checking = torch.cat([region_percentiles_pos, max_val.view(C, 1, 1, 1) + EPSILON], dim=1).sort(1)[0]
        ordered_region_right_ends = torch.cat([region_percentiles_pos, max_val.view(C, 1, 1, 1) + 1e-6], dim=1).sort(1)[0]
        ordered_region_left_ends = torch.cat([min_val.view(C, 1, 1, 1), region_percentiles_pos], dim=1).sort(1)[0]
        # ordered middle points
        ordered_region_mid = (ordered_region_right_ends + ordered_region_left_ends) / 2

        # associate region id
        is_inside_each_region = (x.view(C, 1, H, W) < ordered_region_right_ends_for_checking) * (x.view(C, 1, H, W) >= ordered_region_left_ends)  # -> (C, self.region_num, H, W); boolean
        assert (is_inside_each_region.sum(1) == 1).all()  # sanity check: each pixel falls into exactly one region
        associated_region_id = torch.argmax(is_inside_each_region.int(), dim=1, keepdim=True)  # -> (C, 1, H, W)

        if self.collapse_to_val == 'middle':
            # middle points as the proxy for all values in corresponding regions
            proxy_vals = torch.gather(ordered_region_mid.expand([-1, -1, H, W]), 1, associated_region_id)[:, 0]
            x = proxy_vals.type(x.dtype)
        elif self.collapse_to_val == 'inside_random':
            # random points inside each region as the proxy for all values in corresponding regions
            proxy_percentiles_per_region = torch.rand((total_region_percentile_number_per_channel + 1).sum(), device=x.device)
            proxy_percentiles_per_channel = proxy_percentiles_per_region.reshape([-1, self.region_num])
            ordered_region_rand = ordered_region_left_ends + proxy_percentiles_per_channel.view(C, -1, 1, 1) * (ordered_region_right_ends - ordered_region_left_ends)
            proxy_vals = torch.gather(ordered_region_rand.expand([-1, -1, H, W]), 1, associated_region_id)[:, 0]
            x = proxy_vals.type(x.dtype)
        elif self.collapse_to_val == 'all_zeros':
            proxy_vals = torch.zeros_like(x)
            x = proxy_vals.type(x.dtype)
        else:
            raise NotImplementedError

        if not self.transforms_like:
            x = x.view(B, c, H, W)

        if self.p_random_apply_rand_quant != 1:
            if not self.transforms_like:
                x = torch.where(torch.rand([B, 1, 1, 1], device=x.device) < self.p_random_apply_rand_quant, x, x_orig)
            else:
                x = torch.where(torch.rand([C, 1, 1], device=x.device) < self.p_random_apply_rand_quant, x, x_orig)

        return x
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Introduction
This is a PyTorch implementation of the ICCV 2023 paper [Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning](https://arxiv.org/abs/2212.08663).
The paper introduces a self-supervised augmentation for data-agnostic representation learning: each input channel is quantized through a non-uniform quantizer, with the quantized value
sampled randomly within randomly generated quantization bins.
Applied in conjunction with sequential augmentations on self-supervised contrastive models, randomized quantization achieves results on par with
modality-specific augmentations on vision tasks, and state-of-the-art results on 3D point clouds as well as on audio.
We also demonstrate that the method can augment intermediate embeddings in a deep neural network on the comprehensive [DABS](https://arxiv.org/abs/2111.12062) benchmark, which
comprises various data modalities.
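For a concrete feel of what the augmentation does, here is a minimal sketch (ours, not part of the repo) that applies the module from `randomized_quantization.py` to a random image-shaped tensor; the tensor size and `region_num` are arbitrary illustration choices:

```python
import torch

from randomized_quantization import RandomizedQuantizationAugModule

# Quantize a random 3-channel "image" into 4 randomly placed regions per channel,
# collapsing every pixel to a random value inside its region (the default mode).
aug = RandomizedQuantizationAugModule(region_num=4, transforms_like=True)
x = torch.rand(3, 8, 8)
y = aug(x)
assert y.shape == x.shape
print(y[0].unique().numel())  # at most 4 distinct values remain in channel 0
```

Because the bin boundaries and the collapse targets are resampled on every call, two invocations on the same input generally produce different outputs, which is what makes this usable as a stochastic augmentation.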
## Pretrained checkpoints on ImageNet under [moco-v3](https://arxiv.org/abs/2104.02057)

| Augmentations | Pre-trained checkpoints | Linear probe (top-1, %) |
| :-: | :-: | :-: |
| Randomized Quantization (100 epochs) | [model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/randomized_quantization_100ep.pth.tar) | 42.9 |
| RRC + Randomized Quantization (100 epochs) | [model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/rrc_randomized_quantization_100ep.pth.tar) | 67.9 |
| RRC + Randomized Quantization (300 epochs) | [model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/rrc_randomized_quantization_300ep.pth.tar) | 71.6 |
| RRC + Randomized Quantization (800 epochs) | [model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/rrc_randomized_quantization_800ep.pth.tar) | 72.1 |
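To inspect one of these checkpoints locally, a loading sketch such as the one below may help. Note that the `'state_dict'` key and the `'module.'` prefix are assumptions carried over from the moco-v3 reference training code, not a documented format of these files:

```python
import torch

# Download a checkpoint from the table above first; the filename is just the one
# from the 800-epoch row.
ckpt = torch.load('rrc_randomized_quantization_800ep.pth.tar', map_location='cpu')

# moco-v3 typically saves a dict with a 'state_dict' entry whose keys carry a
# DistributedDataParallel 'module.' prefix; fall back to the raw object otherwise.
state_dict = ckpt['state_dict'] if isinstance(ckpt, dict) and 'state_dict' in ckpt else ckpt
state_dict = {k.replace('module.', '', 1): v for k, v in state_dict.items()}
print(len(state_dict), "tensors loaded")
```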
## Pretrained checkpoints on [Audioset](https://ieeexplore.ieee.org/document/7952261) under [BYOL-A](https://arxiv.org/abs/2103.06695)
We largely follow the experimental settings of [BYOL-A](https://arxiv.org/abs/2103.06695) and treat it as our baseline, replacing the Mixup augmentation used in BYOL-A with our randomized quantization.
The network is trained on [Audioset](https://ieeexplore.ieee.org/document/7952261) for 100 epochs. Linear probing results are reported below on six downstream audio classification datasets: NSynth ([NS](https://arxiv.org/abs/1704.01279)), UrbanSound8K ([US8K](https://dl.acm.org/doi/abs/10.1145/2647868.2655045)), VoxCeleb1 ([VC1](https://arxiv.org/abs/1706.08612)), VoxForge ([VF](http://www.voxforge.org)), Speech Commands V2 with 12 classes ([SPCV2/12](https://arxiv.org/abs/1804.03209)), and the full Speech Commands V2 ([SPCV2](https://arxiv.org/abs/1804.03209)):

| Method | Augmentations | NS | US8K | VC1 | VF | SPCV2/12 | SPCV2 | Average |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| BYOL-A | RRC + [Mixup](https://arxiv.org/abs/1710.09412) | 74.1 | 79.1 | 40.1 | 90.2 | 91.0 | 92.2 | 77.8 |
| [Our model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/randomized_quantization_audio.pth) | RRC + Randomized Quantization | 74.2 | 78.0 | 45.7 | 92.6 | 95.1 | 92.1 | 79.6 |

## Usage
The code has been tested with PyTorch 1.10.0, CUDA 11.3, and cuDNN 8.2.0.
We recommend working with [this docker image](https://hub.docker.com/layers/wuzhiron/pytorch/pytorch1.10.0-cuda11.3-cudnn8-singularity/images/sha256-3e0feccdb9a72cc93e520c35dcf08b928ca379234e4ed7fe7376f7eb53d1dd7a?context=explore).
Below are use cases, based on [moco-v3](https://github.com/facebookresearch/moco-v3), that let you inject our augmentation into your own project with minimal effort.

1. Call the augmentation as one of the torchvision.transforms modules.
```python
region_num = 8
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262-L285
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
    transforms.ToTensor()
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
    transforms.ToTensor()
]
```
2. Randomly apply our augmentation with a given probability.
```python
region_num = 8
p_random_apply1, p_random_apply2 = 0.5, 0.5
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply1),
    transforms.ToTensor()
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply2),
    transforms.ToTensor()
]
```
3. Call the augmentation in forward(). This is faster than the two usages above since the augmentation runs on the GPU.
```python
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L35
region_num = 8
self.rand_quant_layer = RandomizedQuantizationAugModule(region_num)
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L86-L94
q1 = self.predictor(self.base_encoder(self.rand_quant_layer(x1)))
q2 = self.predictor(self.base_encoder(self.rand_quant_layer(x2)))

with torch.no_grad():  # no gradient
    self._update_momentum_encoder(m)  # update the momentum encoder

    # compute momentum features as targets
    k1 = self.momentum_encoder(self.rand_quant_layer(x1))
    k2 = self.momentum_encoder(self.rand_quant_layer(x2))
```
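As a quick end-to-end check of usage 3, this sketch (ours, not part of the repo) runs the module on a random image batch; the batch size, image size, and probability are arbitrary illustration choices:

```python
import torch

from randomized_quantization import RandomizedQuantizationAugModule

device = 'cuda' if torch.cuda.is_available() else 'cpu'
rand_quant = RandomizedQuantizationAugModule(region_num=8,
                                             p_random_apply_rand_quant=0.5)
x = torch.rand(16, 3, 224, 224, device=device)  # a random image batch
out = rand_quant(x)  # roughly half of the images are quantized, the rest pass through
assert out.shape == x.shape
```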
## Citation
```
@inproceedings{wu2023randomized,
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16305--16316},
  year={2023}
}

@article{wu2022randomized,
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  journal={arXiv:2212.08663},
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  year={2022},
}
```
## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.
--------------------------------------------------------------------------------