├── .gitmodules
├── README.md
└── imgs
    ├── comp.png
    └── norm.png


/.gitmodules:
--------------------------------------------------------------------------------
 1 | [submodule "maskrcnn-benchmark"]
 2 | 	path = maskrcnn-benchmark
 3 | 	url = https://github.com/joe-siyuan-qiao/maskrcnn-benchmark.git
 4 | [submodule "pytorch-classification"]
 5 | 	path = pytorch-classification
 6 | 	url = https://github.com/joe-siyuan-qiao/pytorch-classification.git
 7 | [submodule "dgcnn"]
 8 | 	path = dgcnn
 9 | 	url = https://github.com/csrhddlam/dgcnn.git
10 | [submodule "DeepLabv3.pytorch"]
11 | 	path = DeepLabv3.pytorch
12 | 	url = https://github.com/chenxi116/DeepLabv3.pytorch.git
13 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Weight Standardization
 2 | 
 3 | [Weight Standardization](https://arxiv.org/abs/1903.10520) (WS) is a normalization method to accelerate **micro-batch training**.
 4 | Micro-batch training is hard because small batches do not provide reliable
 5 | statistics for training networks with Batch Normalization (BN), while
 6 | normalization methods that do not rely on batch statistics
 7 | still have difficulty matching the performance
 8 | of BN in large-batch training.
 9 | 
10 | **WS addresses this problem**: when used with Group Normalization (GN) and trained
11 | with 1 image/GPU, WS matches or outperforms
12 | BN trained with large batch sizes, with **only
13 | 2 more lines of code**.
14 | So if you are facing a micro-batch training problem, we encourage you to try Weight Standardization.
15 | You may be surprised by how well it performs.
16 | 
17 | <p float="left">
18 |   <img src="imgs/comp.png" height="200" />
19 |   <img src="imgs/norm.png" height="200" />
20 | </p>
21 | 
22 | WS achieves these results by standardizing the weights in the convolutional layers, which we show smooths the loss landscape by reducing the Lipschitz constants of the loss and the gradients.
23 | Please see our [arXiv](https://arxiv.org/abs/1903.10520) report for the details.
24 | If you find this project helpful, please consider citing our paper.
25 | 
26 | ```
27 | @article{weightstandardization,
28 |   author    = {Siyuan Qiao and Huiyu Wang and Chenxi Liu and Wei Shen and Alan Yuille},
29 |   title     = {Weight Standardization},
30 |   journal   = {arXiv preprint arXiv:1903.10520},
31 |   year      = {2019},
32 | }
33 | ```
34 | 
35 | ## Weight Standardization on computer vision tasks
36 | This repo holds our implementations of [Weight Standardization](https://github.com/joe-siyuan-qiao/WeightStandardization) for the following tasks.
37 | 
38 | | Task | Folder |
39 | |------|:------:|
40 | | ImageNet classification | pytorch-classification |
41 | | Object detection and instance segmentation on COCO | maskrcnn-benchmark |
42 | | Semantic segmentation on PASCAL VOC | DeepLabv3.pytorch |
43 | | Point cloud classification on ModelNet40 | dgcnn |
44 | 
45 | ## Implementing WS as a layer
46 | We provide layer implementations of WS in PyTorch and TensorFlow.
47 | Replacing your convolutional layers with the following ones can improve performance with no other changes to your model.
48 | **NOTE:** Only replace convolutional layers that are followed by normalization layers such as BN or GN.
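
As a rough summary of what both layers below compute (a sketch read off the code; the notation is introduced here for illustration and is not taken from the paper): let $W_{o,1}, \dots, W_{o,n}$ be the $n$ weights of output channel $o$, that is, all input channels (per group) and kernel positions of one filter. Each output channel is standardized independently:

$$
\mu_o = \frac{1}{n} \sum_{j=1}^{n} W_{o,j},
\qquad
\hat{W}_{o,j} = \frac{W_{o,j} - \mu_o}{\sigma_o + \epsilon},
\qquad
\epsilon = 10^{-5}
$$

Here $\sigma_o$ is the standard deviation of $W_{o,1}, \dots, W_{o,n}$. One detail to be aware of: the PyTorch `std` call divides by $n - 1$ by default, while `tf.keras.backend.std` divides by $n$, so the two snippets produce weights whose scale differs by a factor of about $\sqrt{n / (n - 1)}$.
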
49 | #### PyTorch
50 | ```python
51 | import torch.nn as nn
52 | import torch.nn.functional as F
53 | 
54 | class Conv2d(nn.Conv2d):
55 |     """Conv2d that standardizes its weights per output channel before convolving."""
56 | 
57 |     def forward(self, x):
58 |         weight = self.weight
59 |         # Zero-center each output filter over (in_channels, kH, kW).
60 |         weight_mean = weight.mean(dim=(1, 2, 3), keepdim=True)
61 |         weight = weight - weight_mean
62 |         # Per-filter standard deviation, plus a small epsilon for numerical stability.
63 |         std = weight.view(weight.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5
64 |         weight = weight / std
65 |         return F.conv2d(x, weight, self.bias, self.stride,
66 |                         self.padding, self.dilation, self.groups)
67 | ```
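
A minimal usage sketch of the NOTE above: a WS convolution immediately followed by Group Normalization, run on a batch of size 1. The names `WSConv2d`, `standardized_weight`, and `conv_gn_relu` are hypothetical and chosen for this example; the channel counts and group number are arbitrary assumptions, not settings from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WSConv2d(nn.Conv2d):
    """Hypothetical name; same computation as the Conv2d layer above."""

    def standardized_weight(self):
        w = self.weight
        w = w - w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.view(w.size(0), -1).std(dim=1).view(-1, 1, 1, 1) + 1e-5
        return w / std

    def forward(self, x):
        return F.conv2d(x, self.standardized_weight(), self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


def conv_gn_relu(in_channels, out_channels, num_groups=8):
    """WS convolution followed by GroupNorm, so the conv output is re-normalized."""
    return nn.Sequential(
        WSConv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(num_groups, out_channels),
        nn.ReLU(),
    )


torch.manual_seed(0)
block = conv_gn_relu(16, 32)
x = torch.randn(1, 16, 8, 8)  # batch size 1, i.e. the micro-batch setting
y = block(x)

# Inspect the weights actually used by the convolution.
w_hat = block[0].standardized_weight()
per_filter_mean = w_hat.mean(dim=(1, 2, 3))
per_filter_std = w_hat.view(w_hat.size(0), -1).std(dim=1)
print(tuple(y.shape))                 # (1, 32, 8, 8)
print(per_filter_mean.abs().max())    # close to 0
print(per_filter_std.min(), per_filter_std.max())  # both close to 1
```

The learnable parameters are still the raw `weight` tensor; standardization is applied on the fly in every forward pass, so gradients flow through it.
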
68 | #### TensorFlow
69 | ```python
70 | kernel_mean = tf.math.reduce_mean(kernel, axis=[0, 1, 2], keepdims=True, name='kernel_mean')
71 | kernel = kernel - kernel_mean
72 | # Per-output-channel std; tf.math.reduce_std(kernel, axis=[0, 1, 2], keepdims=True) is equivalent.
73 | kernel_std = tf.keras.backend.std(kernel, axis=[0, 1, 2], keepdims=True)
74 | kernel = kernel / (kernel_std + 1e-5)
75 | ```
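
The two snippets reduce over different axes because the frameworks store kernels in different layouts. The NumPy sketch below is an illustration only, not code from this repo: `standardize` is a hypothetical helper, and it assumes the usual conventions of `(out, in, kH, kW)` for PyTorch and `(kH, kW, in, out)` for TensorFlow. It checks that both axis choices standardize the same groups of weights, and shows the scale difference caused by the unbiased versus population standard deviation.

```python
import numpy as np


def standardize(kernel, axes, ddof, eps=1e-5):
    """Zero-center and rescale `kernel` over `axes` (one group per output channel)."""
    centered = kernel - kernel.mean(axis=axes, keepdims=True)
    std = centered.std(axis=axes, keepdims=True, ddof=ddof)
    return centered / (std + eps)


rng = np.random.default_rng(0)
w_oihw = rng.normal(size=(8, 4, 3, 3))        # PyTorch layout: (out, in, kH, kW)
w_hwio = np.transpose(w_oihw, (2, 3, 1, 0))   # TensorFlow layout: (kH, kW, in, out)

# With the same estimator (ddof=0), both layouts give the same standardized kernel.
a = standardize(w_oihw, axes=(1, 2, 3), ddof=0)
b = standardize(w_hwio, axes=(0, 1, 2), ddof=0)
same_result = np.allclose(a, np.transpose(b, (3, 2, 0, 1)))

# The unbiased estimate (ddof=1, PyTorch's default) shrinks the result slightly:
# the overall scale ratio is close to sqrt((n - 1) / n).
n = w_oihw[0].size                            # weights per output channel
c = standardize(w_oihw, axes=(1, 2, 3), ddof=1)
ratio = float(np.linalg.norm(c) / np.linalg.norm(a))

print(same_result)
print(ratio, np.sqrt((n - 1) / n))
```
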
76 | 


--------------------------------------------------------------------------------
/imgs/comp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joe-siyuan-qiao/WeightStandardization/d81e50d05ad449fe4b4b0a70d964a28050961dbb/imgs/comp.png


--------------------------------------------------------------------------------
/imgs/norm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joe-siyuan-qiao/WeightStandardization/d81e50d05ad449fe4b4b0a70d964a28050961dbb/imgs/norm.png


--------------------------------------------------------------------------------