├── CODE_OF_CONDUCT.md
├── LICENSE
├── SUPPORT.md
├── README_prev.md
├── SECURITY.md
├── randomized_quantization.py
└── README.md

/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Microsoft Open Source Code of Conduct

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).

Resources:

- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/SUPPORT.md:
--------------------------------------------------------------------------------
# TODO: The maintainer of this repo has not yet edited this file

**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?

- **No CSS support:** Fill out this template with information about how to file issues and get help.
- **Yes CSS support:** Fill out an intake form at [aka.ms/onboardsupport](https://aka.ms/onboardsupport). CSS will work with/help you to determine next steps.
- **Not sure?** Fill out an intake as though the answer were "Yes". CSS will help you decide.

*Then remove this first heading from this SUPPORT.md file before publishing your repo.*

# Support

## How to file issues and get help

This project uses GitHub Issues to track bugs and feature requests. Please search the existing
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.

For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
CHANNEL. WHERE WILL YOU HELP PEOPLE?**.

## Microsoft Support Policy

Support for this **PROJECT or PRODUCT** is limited to the resources listed above.
--------------------------------------------------------------------------------
/README_prev.md:
--------------------------------------------------------------------------------
# Project

> This repo has been populated by an initial template to help get you started. Please
> make sure to update the content to build a great experience for community-building.

As the maintainer of this project, please make a few updates:

- Improve this README.md file to provide a great experience
- Update SUPPORT.md with content about this project's support experience
- Understand the security reporting process in SECURITY.md
- Remove this section from the README

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
## Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, including [Microsoft](https://github.com/microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report).
If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

* Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs.

## Preferred Languages

We prefer all communications to be in English.

## Policy

Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd).
--------------------------------------------------------------------------------
/randomized_quantization.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn


class RandomizedQuantizationAugModule(nn.Module):
    def __init__(self, region_num, collapse_to_val='inside_random', spacing='random',
                 transforms_like=False, p_random_apply_rand_quant=1):
        """
        region_num: int; number of quantization regions per channel
        collapse_to_val: 'middle' | 'inside_random' | 'all_zeros'; proxy value used for each region
        spacing: 'random' | 'uniform'; how region boundaries are placed
        transforms_like: bool; if True, inputs are (C, H, W) as in torchvision.transforms;
            otherwise inputs are (B, C, H, W)
        p_random_apply_rand_quant: float; probability of applying the augmentation
        """
        super().__init__()
        self.region_num = region_num
        self.collapse_to_val = collapse_to_val
        self.spacing = spacing
        self.transforms_like = transforms_like
        self.p_random_apply_rand_quant = p_random_apply_rand_quant

    def get_params(self, x):
        """
        x: (C, H, W)
        returns (C), (C), (C)
        """
        C, _, _ = x.size()  # one image, channels first
        min_val, max_val = x.view(C, -1).min(1)[0], x.view(C, -1).max(1)[0]  # per-channel min/max over the spatial dimensions
        total_region_percentile_number = (torch.ones(C) * (self.region_num - 1)).int()  # region_num - 1 boundaries per channel
        return min_val, max_val, total_region_percentile_number

    def forward(self, x):
        """
        x: (B, c, H, W), or (C, H, W) when transforms_like=True
        """
        EPSILON = 1  # pushes the last right end past max_val so the max pixel falls inside a region
        if self.p_random_apply_rand_quant != 1:
            x_orig = x
        if not self.transforms_like:
            B, c, H, W = x.shape
            C = B * c
            x = x.view(C, H, W)
        else:
            C, H, W = x.shape
        min_val, max_val, total_region_percentile_number_per_channel = self.get_params(x)  # -> (C), (C), (C)

        # region percentiles for each channel
        if self.spacing == "random":
            region_percentiles = torch.rand(total_region_percentile_number_per_channel.sum(), device=x.device)
        elif self.spacing == "uniform":
            region_percentiles = torch.tile(torch.arange(1 / (total_region_percentile_number_per_channel[0] + 1), 1, step=1 / (total_region_percentile_number_per_channel[0] + 1), device=x.device), [C])
        else:
            raise NotImplementedError(f"unknown spacing: {self.spacing}")
        region_percentiles_per_channel = region_percentiles.reshape([-1, self.region_num - 1])
        # ordered region ends
        region_percentiles_pos = (region_percentiles_per_channel * (max_val - min_val).view(C, 1) + min_val.view(C, 1)).view(C, -1, 1, 1)
        ordered_region_right_ends_for_checking = torch.cat([region_percentiles_pos, max_val.view(C, 1, 1, 1) + EPSILON], dim=1).sort(1)[0]
        ordered_region_right_ends = torch.cat([region_percentiles_pos, max_val.view(C, 1, 1, 1) + 1e-6], dim=1).sort(1)[0]
        ordered_region_left_ends = torch.cat([min_val.view(C, 1, 1, 1), region_percentiles_pos], dim=1).sort(1)[0]
        # ordered middle points
        ordered_region_mid = (ordered_region_right_ends + ordered_region_left_ends) / 2

        # associate region id
        is_inside_each_region = (x.view(C, 1, H, W) < ordered_region_right_ends_for_checking) * (x.view(C, 1, H, W) >= ordered_region_left_ends)  # -> (C, self.region_num, H, W); boolean
        assert (is_inside_each_region.sum(1) == 1).all()  # sanity check: each pixel falls into exactly one region
        associated_region_id = torch.argmax(is_inside_each_region.int(), dim=1, keepdim=True)  # -> (C, 1, H, W)

        if self.collapse_to_val == 'middle':
            # middle points as the proxy for all values in corresponding regions
            proxy_vals = torch.gather(ordered_region_mid.expand([-1, -1, H, W]), 1, associated_region_id)[:, 0]
            x = proxy_vals.type(x.dtype)
        elif self.collapse_to_val == 'inside_random':
            # random points inside each region as the proxy for all values in corresponding regions
            proxy_percentiles_per_region = torch.rand((total_region_percentile_number_per_channel + 1).sum(), device=x.device)
            proxy_percentiles_per_channel = proxy_percentiles_per_region.reshape([-1, self.region_num])
            ordered_region_rand = ordered_region_left_ends + proxy_percentiles_per_channel.view(C, -1, 1, 1) * (ordered_region_right_ends - ordered_region_left_ends)
            proxy_vals = torch.gather(ordered_region_rand.expand([-1, -1, H, W]), 1, associated_region_id)[:, 0]
            x = proxy_vals.type(x.dtype)
        elif self.collapse_to_val == 'all_zeros':
            proxy_vals = torch.zeros_like(x)
            x = proxy_vals.type(x.dtype)
        else:
            raise NotImplementedError

        if not self.transforms_like:
            x = x.view(B, c, H, W)

        if self.p_random_apply_rand_quant != 1:
            if not self.transforms_like:
                x = torch.where(torch.rand([B, 1, 1, 1], device=x.device) < self.p_random_apply_rand_quant, x, x_orig)
            else:
                x = torch.where(torch.rand([C, 1, 1], device=x.device) < self.p_random_apply_rand_quant, x, x_orig)

        return x
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Introduction
This is a PyTorch implementation of the ICCV 2023 paper [Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning](https://arxiv.org/abs/2212.08663).
The paper introduces a self-supervised augmentation for data-agnostic representation learning: each input channel is quantized through a non-uniform quantizer, with the quantized value
sampled randomly within randomly generated quantization bins.
Applied in conjunction with sequential augmentations on self-supervised contrastive models, randomized quantization achieves results on par with
modality-specific augmentations on vision tasks, and state-of-the-art results on 3D point clouds as well as on audio.
We also demonstrate that the method can augment intermediate embeddings in a deep neural network on the comprehensive [DABS](https://arxiv.org/abs/2111.12062) benchmark, which
comprises various data modalities.
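For a concrete feel of what the augmentation does, here is a minimal sketch (ours, not part of the repo) that applies the module from `randomized_quantization.py` to a random image-shaped tensor; the tensor size and `region_num` are arbitrary illustration choices:

```python
import torch

from randomized_quantization import RandomizedQuantizationAugModule

# Quantize a random 3-channel "image" into 4 randomly placed regions per channel,
# collapsing every pixel to a random value inside its region (the default mode).
aug = RandomizedQuantizationAugModule(region_num=4, transforms_like=True)
x = torch.rand(3, 8, 8)
y = aug(x)
assert y.shape == x.shape
print(y[0].unique().numel())  # at most 4 distinct values remain in channel 0
```

Because the bin boundaries and the collapse targets are resampled on every call, two invocations on the same input generally produce different outputs, which is what makes this usable as a stochastic augmentation.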
## Pretrained checkpoints on ImageNet under [moco-v3](https://arxiv.org/abs/2104.02057)

| Augmentations | Pre-trained checkpoints | Linear probe (top-1, %) |
| :-: | :-: | :-: |
| Randomized Quantization (100 epochs) | [model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/randomized_quantization_100ep.pth.tar) | 42.9 |
| RRC + Randomized Quantization (100 epochs) | [model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/rrc_randomized_quantization_100ep.pth.tar) | 67.9 |
| RRC + Randomized Quantization (300 epochs) | [model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/rrc_randomized_quantization_300ep.pth.tar) | 71.6 |
| RRC + Randomized Quantization (800 epochs) | [model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/rrc_randomized_quantization_800ep.pth.tar) | 72.1 |
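To inspect one of these checkpoints locally, a loading sketch such as the one below may help. Note that the `'state_dict'` key and the `'module.'` prefix are assumptions carried over from the moco-v3 reference training code, not a documented format of these files:

```python
import torch

# Download a checkpoint from the table above first; the filename is just the one
# from the 800-epoch row.
ckpt = torch.load('rrc_randomized_quantization_800ep.pth.tar', map_location='cpu')

# moco-v3 typically saves a dict with a 'state_dict' entry whose keys carry a
# DistributedDataParallel 'module.' prefix; fall back to the raw object otherwise.
state_dict = ckpt['state_dict'] if isinstance(ckpt, dict) and 'state_dict' in ckpt else ckpt
state_dict = {k.replace('module.', '', 1): v for k, v in state_dict.items()}
print(len(state_dict), "tensors loaded")
```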
## Pretrained checkpoints on [Audioset](https://ieeexplore.ieee.org/document/7952261) under [BYOL-A](https://arxiv.org/abs/2103.06695)
We largely follow the experimental settings of [BYOL-A](https://arxiv.org/abs/2103.06695) and treat it as our baseline, replacing the Mixup augmentation used in BYOL-A with our randomized quantization.
The network is trained on [Audioset](https://ieeexplore.ieee.org/document/7952261) for 100 epochs. Linear probing results are reported below on six downstream audio classification datasets: NSynth ([NS](https://arxiv.org/abs/1704.01279)), UrbanSound8K ([US8K](https://dl.acm.org/doi/abs/10.1145/2647868.2655045)), VoxCeleb1 ([VC1](https://arxiv.org/abs/1706.08612)), VoxForge ([VF](http://www.voxforge.org)), Speech Commands V2 with 12 classes ([SPCV2/12](https://arxiv.org/abs/1804.03209)), and the full Speech Commands V2 ([SPCV2](https://arxiv.org/abs/1804.03209)):

| Method | Augmentations | NS | US8K | VC1 | VF | SPCV2/12 | SPCV2 | Average |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| BYOL-A | RRC + [Mixup](https://arxiv.org/abs/1710.09412) | 74.1 | 79.1 | 40.1 | 90.2 | 91.0 | 92.2 | 77.8 |
| [Our model](https://frontiers.blob.core.windows.net/pretraining/projects/whm_ckpt/random_quantize/randomized_quantization_audio.pth) | RRC + Randomized Quantization | 74.2 | 78.0 | 45.7 | 92.6 | 95.1 | 92.1 | 79.6 |

## Usage
The code has been tested with PyTorch 1.10.0, CUDA 11.3, and cuDNN 8.2.0.
We recommend working with [this docker image](https://hub.docker.com/layers/wuzhiron/pytorch/pytorch1.10.0-cuda11.3-cudnn8-singularity/images/sha256-3e0feccdb9a72cc93e520c35dcf08b928ca379234e4ed7fe7376f7eb53d1dd7a?context=explore).
Below are use cases, based on [moco-v3](https://github.com/facebookresearch/moco-v3), that let you inject our augmentation into your own project with minimal effort.

1. Call the augmentation as one of the torchvision.transforms modules.
```python
region_num = 8
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262-L285
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
    transforms.ToTensor()
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
    transforms.ToTensor()
]
```
2. Randomly apply our augmentation with a given probability.
```python
region_num = 8
p_random_apply1, p_random_apply2 = 0.5, 0.5
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply1),
    transforms.ToTensor()
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply2),
    transforms.ToTensor()
]
```
3. Call the augmentation in forward(). This is faster than the two usages above since the augmentation runs on the GPU.
```python
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L35
region_num = 8
self.rand_quant_layer = RandomizedQuantizationAugModule(region_num)
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L86-L94
q1 = self.predictor(self.base_encoder(self.rand_quant_layer(x1)))
q2 = self.predictor(self.base_encoder(self.rand_quant_layer(x2)))

with torch.no_grad():  # no gradient
    self._update_momentum_encoder(m)  # update the momentum encoder

    # compute momentum features as targets
    k1 = self.momentum_encoder(self.rand_quant_layer(x1))
    k2 = self.momentum_encoder(self.rand_quant_layer(x2))
```
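As a quick end-to-end check of usage 3, this sketch (ours, not part of the repo) runs the module on a random image batch; the batch size, image size, and probability are arbitrary illustration choices:

```python
import torch

from randomized_quantization import RandomizedQuantizationAugModule

device = 'cuda' if torch.cuda.is_available() else 'cpu'
rand_quant = RandomizedQuantizationAugModule(region_num=8,
                                             p_random_apply_rand_quant=0.5)
x = torch.rand(16, 3, 224, 224, device=device)  # a random image batch
out = rand_quant(x)  # roughly half of the images are quantized, the rest pass through
assert out.shape == x.shape
```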
## Citation
```
@inproceedings{wu2023randomized,
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16305--16316},
  year={2023}
}

@article{wu2022randomized,
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  journal={arXiv:2212.08663},
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  year={2022},
}
```
## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.
--------------------------------------------------------------------------------