├── .gitignore
├── 0.png
├── 1.png
├── 2.png
├── 3.png
├── README.md
├── README_en.md
├── attention.py
├── deit
│   ├── deit.py
│   ├── transforms.py
│   └── wenht.jpg
├── detr
│   ├── main.py
│   ├── resnet.py
│   └── transformer.py
├── distributed
│   └── main.py
├── iterator
│   └── tmp.py
├── load_config
│   ├── a.yaml
│   ├── config.py
│   └── main.py
├── resnet.py
├── swin_transformer
│   ├── main_1128.py
│   ├── main_1129.py
│   ├── main_1130.py
│   └── mask_1129.py
├── vit.py
└── vit_1126.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *__pycache__/
2 | *.pyc
3 |
--------------------------------------------------------------------------------
/0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/0.png
--------------------------------------------------------------------------------
/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/1.png
--------------------------------------------------------------------------------
/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/2.png
--------------------------------------------------------------------------------
/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/3.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Paddle_VIT_tutorial
2 |
3 | Code recorded from Dr. Zhu's Baidu PaddlePaddle course `从零开始学视觉Transformer` (Learn Vision Transformer from Scratch).
4 | 5 | [English](./README_en.md) | 简体中文 6 | 7 | 课程链接:https://aistudio.baidu.com/aistudio/course/introduce/25102?directly=1&shared=1 8 | 9 | 官方代码链接:https://github.com/BR-IDL/PaddleViT/tree/develop/edu 10 | 11 | 同步上课讲的一些代码,纯手敲,仅供参考,有问题可以一起交流学习。 12 | 13 | 具体时间线及对应代码如下: 14 | 15 | - Class #0, 2021.11.23 16 | 17 | resnet18 实现 [resnet.py](./resnet.py) 18 | 19 | - Class #1, 2021.11.24 20 | 21 | 开始搭建ViT [vit.py](./vit.py) 22 | 23 | - Class #2, 2021.11.25 24 | 25 | Multi-Head Self Attention [attention.py](./attention.py) 26 | 27 | - Class #3, 2021.11.26 28 | 29 | 实现一个ViT模型 [vit_1126.py](./vit_1126.py) 30 | 31 | - Class #4, 2021.11.27 32 | 33 | 实现DeiT [deit/deit.py](./deit/deit.py) 34 | 35 | 图像输入网络前的步骤——图像处理 [deit/transforms.py](./deit/transforms.py) 36 | 37 | - Class #5, 2021.11.28 38 | 39 | 图像窗口上的注意力机制 [swin_transformer/main_1128.py](./swin_transformer/main_1128.py) 40 | 41 | - Class #6, 2021.11.29 42 | 43 | 注意力掩码 Attention Mask [swin_transformer/mask_1129.py](./swin_transformer/mask_1129.py) 44 | 45 | 实现Swin Transformer 的 SwinBlock [swin_transformer/main_1129.py](./swin_transformer/main_1129.py) 46 | 47 | - Class #7, 2021.11.30 48 | 49 | 实现 Swin Transformer [swin_transformer/main_1130.py](./swin_transformer/main_1130.py) 50 | 51 | 数据加载过程——迭代器的实现 [iterator_1130/tmp.py](./iterator_1130/tmp.py) 52 | 53 | - Class #8, 2021.11.31 54 | 55 | [PaddleViT](https://github.com/BR-IDL/PaddleViT) 中配置文件的加载逻辑 [load_config](./load_config/) 56 | 57 | - Class #9, 2021.12.1 58 | 59 | PaddlePaddle 进行多机多卡训练 [distributed/main.py](./distributed/main.py) 60 | 61 | - Class #10, 2021.12.2 62 | 63 | 实现 DETR [detr](./detr/) 64 | 65 | 感谢百度飞桨~加油! 66 | -------------------------------------------------------------------------------- /README_en.md: -------------------------------------------------------------------------------- 1 | # Paddle_VIT_tutorial 2 | 3 | English | [简体中文](./README.md) 4 | 5 | This repo contains some codes recorded from the online course, [Learn Vision Transformer from Scratch](https://aistudio.baidu.com/aistudio/course/introduce/25102?directly=1&shared=1), which was lectured by [Dr. Zhu](https://github.com/xperzy), Baidu PaddlePaddle. 6 | 7 | If you have any questions, please feel free to contact me. 8 | 9 | Official code:https://github.com/BR-IDL/PaddleViT/tree/develop/edu 10 | 11 | Timeline and corresponding codes: 12 | 13 | - Class #0, 2021.11.23 14 | 15 | Implementation of resnet18. [resnet.py](./resnet.py) 16 | 17 | - Class #1, 2021.11.24 18 | 19 | Let's build a ViT! [vit.py](./vit.py) 20 | 21 | - Class #2, 2021.11.25 22 | 23 | Multi-Head Self Attention. [attention.py](./attention.py) 24 | 25 | - Class #3, 2021.11.26 26 | 27 | Implementation of ViT. [vit_1126.py](./vit_1126.py) 28 | 29 | - Class #4, 2021.11.27 30 | 31 | Implementation of DeiT. [deit/deit.py](./deit/deit.py) 32 | 33 | Before feeding to a net: Image Preprocess. [deit/transforms.py](./deit/transforms.py) 34 | 35 | - Class #5, 2021.11.28 36 | 37 | Window Attention. [swin_transformer/main_1128.py](./swin_transformer/main_1128.py) 38 | 39 | - Class #6, 2021.11.29 40 | 41 | Attention Mask. [swin_transformer/mask_1129.py](./swin_transformer/mask_1129.py) 42 | 43 | Implementation of SwinBlock, a block of Swin Transformer. [swin_transformer/main_1129.py](./swin_transformer/main_1129.py) 44 | 45 | - Class #7, 2021.11.30 46 | 47 | Implementation of Swin Transformer. [swin_transformer/main_1130.py](./swin_transformer/main_1130.py) 48 | 49 | Used to load data: Iterator. 
[iterator_1130/tmp.py](./iterator_1130/tmp.py) 50 | 51 | - Class #8, 2021.11.31 52 | 53 | How does [PaddleViT](https://github.com/BR-IDL/PaddleViT) set and load configs? [load_config](./load_config/) 54 | 55 | - Class #9, 2021.12.1 56 | 57 | Distributed training for PaddlePaddle. [distributed/main.py](./distributed/main.py) 58 | 59 | - Class #10, 2021.12.2 60 | 61 | Implementation of DETR. [detr](./detr/) 62 | 63 | Thanks a lot for what Baidu PaddlePaddle have done! Fighting! 64 | -------------------------------------------------------------------------------- /attention.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.25 3 | Written By: Dr. Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | 7 | import paddle as paddle 8 | import paddle.nn as nn 9 | 10 | paddle.set_device('cpu') 11 | 12 | class Attention(nn.Layer): 13 | def __init__(self, embed_dim, num_heads, qkv_bias=False, qk_scale=None, dropout=0., attention_dropout=0.): 14 | super(Attention, self).__init__() 15 | self.embed_dim = embed_dim 16 | self.num_heads = num_heads 17 | self.head_dim = int(self.embed_dim / self.num_heads) 18 | self.all_head_dim = self.head_dim * num_heads 19 | self.qkv = nn.Linear(embed_dim, 20 | self.all_head_dim * 3, 21 | bias_attr=False if qkv_bias is False else None) 22 | self.scale = self.head_dim ** -0.5 if qk_scale is None else qk_scale 23 | self.softmax = nn.Softmax(-1) 24 | self.proj = nn.Linear(self.all_head_dim, self.embed_dim) 25 | 26 | def transpose_multi_head(self, x): 27 | # N: num_patches 28 | # x: [B, N, all_head_dim] 29 | new_shape = x.shape[:-1] + [self.num_heads, self.head_dim] 30 | x = x.reshape(new_shape) 31 | # x: [B, N, num_heads, head_dim] 32 | x = x.transpose([0, 2, 1, 3]) 33 | # x: [B, num_heads, N, head_dim] 34 | return x 35 | 36 | def forward(self, x): 37 | B, N, _ = x.shape 38 | qkv = self.qkv(x).chunk(3, -1) 39 | # [B, N, all_head_dim] * 3 40 | q, k, v = map(self.transpose_multi_head, qkv) 41 | 42 | # q, k, v: [B, num_heads, N, head_dim] 43 | attn = paddle.matmul(q, k, transpose_y=True) # q * k^T 44 | attn = self.scale * attn 45 | attn = self.softmax(attn) 46 | attn_weight = attn 47 | # dropout 48 | # attn :[B, num_heads, N, N] 49 | 50 | out = paddle.matmul(attn, v) # softmax(scale(q * k^T)) * v 51 | out = out.transpose([0, 2, 1, 3]) 52 | # out: [B, N, num_heads, head_dim] 53 | out = out.reshape([B, N, -1]) 54 | 55 | out = self.proj(out) 56 | # dropout 57 | return out, attn_weight 58 | 59 | def main(): 60 | t = paddle.randn([8, 16, 96]) # image tokens 61 | model = Attention(embed_dim=96, num_heads=4, qkv_bias=False, qk_scale=None) 62 | print(model) 63 | out, attn_weight = model(t) 64 | print(out.shape) 65 | print(attn_weight.shape) 66 | 67 | 68 | if __name__ == "__main__": 69 | main() -------------------------------------------------------------------------------- /deit/deit.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.27 3 | Written By: Dr. 
Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | 9 | 10 | paddle.set_device('cpu') 11 | 12 | class Identity(nn.Layer): 13 | def __init__(self): 14 | super(Identity, self).__init__() 15 | 16 | def forward(self, x): 17 | return x 18 | 19 | class Mlp(nn.Layer): 20 | def __init__(self, embed_dim, mlp_ratio, dropout=0.): 21 | super().__init__() 22 | self.fc1 = nn.Linear(embed_dim, int(embed_dim * mlp_ratio)) 23 | self.fc2 = nn.Linear(int(embed_dim * mlp_ratio), embed_dim) 24 | self.act = nn.GELU() 25 | self.dropout = nn.Dropout(dropout) 26 | 27 | def forward(self, x): 28 | x = self.fc1(x) 29 | x = self.act(x) 30 | x = self.dropout(x) 31 | x = self.fc2(x) 32 | x = self.dropout(x) 33 | return x 34 | 35 | class PatchEmbedding(nn.Layer): 36 | def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=768, dropout=0.): 37 | super().__init__() 38 | n_patches = (image_size // patch_size) * (image_size // patch_size) 39 | self.patch_embedding = nn.Conv2D(in_channels=in_channels, 40 | out_channels=embed_dim, 41 | kernel_size=patch_size, 42 | stride=patch_size) 43 | 44 | self.class_token = paddle.create_parameter( 45 | shape=[1, 1, embed_dim], 46 | dtype='float32', 47 | default_initializer=paddle.nn.initializer.Constant(0.)) 48 | 49 | self.distill_token = paddle.create_parameter( 50 | shape=[1, 1, embed_dim], 51 | dtype='float32', 52 | default_initializer=nn.initializer.TruncatedNormal(std=.02)) 53 | 54 | self.position_embedding = paddle.create_parameter( 55 | shape=[1, n_patches+2, embed_dim], # +2 56 | dtype='float32', 57 | default_initializer=nn.initializer.TruncatedNormal(std=.02)) 58 | 59 | self.dropout = nn.Dropout(dropout) 60 | 61 | def forward(self, x): 62 | # [n, c, h, w] 63 | class_tokens = self.class_token.expand([x.shape[0], -1, -1]) 64 | distill_tokens = self.distill_token.expand([x.shape[0], -1, -1]) 65 | x = self.patch_embedding(x) #[n, embed_dim, h', w'] 66 | x = x.flatten(2) # [n, embed_dim, h' * w'] 67 | x = x.transpose([0, 2, 1]) # [n, h' * w, embed_dim] 68 | 69 | 70 | x = paddle.concat([class_tokens, distill_tokens, x], axis=1) 71 | 72 | x = x + self.position_embedding 73 | x = self.dropout(x) 74 | return x 75 | 76 | 77 | class Attention(nn.Layer): 78 | """multi-head self attention""" 79 | def __init__(self, embed_dim, num_heads, qkv_bias=True, dropout=0., attention_dropout=0.): 80 | super().__init__() 81 | self.num_heads = num_heads 82 | self.head_dim = int(embed_dim / num_heads) 83 | self.all_head_dim = self.head_dim * num_heads 84 | self.scale = self.head_dim ** -0.5 85 | 86 | self.qkv = nn.Linear(embed_dim, 87 | self.all_head_dim * 3) 88 | 89 | self.proj = nn.Linear(self.all_head_dim, embed_dim) 90 | 91 | self.dropout = nn.Dropout(dropout) 92 | self.attention_dropout = nn.Dropout(attention_dropout) 93 | self.softmax = nn.Softmax(axis=-1) 94 | 95 | def transpose_multi_head(self, x): 96 | # N: num_patches 97 | # x: [B, N, all_head_dim] 98 | new_shape = x.shape[:-1] + [self.num_heads, self.head_dim] 99 | x = x.reshape(new_shape) 100 | # x: [B, N, num_heads, head_dim] 101 | x = x.transpose([0, 2, 1, 3]) 102 | # x: [B, num_heads, N, head_dim] 103 | return x 104 | 105 | def forward(self, x): 106 | B, N, _ = x.shape 107 | qkv = self.qkv(x).chunk(3, -1) 108 | # [B, N, all_head_dim] * 3 109 | q, k, v = map(self.transpose_multi_head, qkv) 110 | 111 | # q, k, v: [B, num_heads, N, head_dim] 112 | attn = paddle.matmul(q, k, transpose_y=True) # q * k^T 113 | attn = self.scale * attn 114 | attn = self.softmax(attn) 115 | attn = 
self.attention_dropout(attn) 116 | # attn :[B, num_heads, N, N] 117 | 118 | out = paddle.matmul(attn, v) # softmax(scale(q * k^T)) * v 119 | out = out.transpose([0, 2, 1, 3]) 120 | # out: [B, N, num_heads, head_dim] 121 | out = out.reshape([B, N, -1]) 122 | 123 | out = self.proj(out) 124 | out = self.dropout(out) 125 | return out 126 | 127 | class EncoderLayer(nn.Layer): 128 | def __init__(self, embed_dim=768, num_heads=4, qkv_bias=True, mlp_ratio=40, dropout=0., attention_dropout=0.): 129 | super().__init__() 130 | self.attn_norm = nn.LayerNorm(embed_dim) 131 | self.attn = Attention(embed_dim, num_heads) 132 | self.mlp_norm = nn.LayerNorm(embed_dim) 133 | self.mlp = Mlp(embed_dim, mlp_ratio) 134 | 135 | def forward(self, x): 136 | h = x # residual 137 | x = self.attn_norm(x) 138 | x = self.attn(x) 139 | x = x + h 140 | 141 | h = x 142 | x = self.mlp_norm(x) 143 | x = self.mlp(x) 144 | x = x + h 145 | return x 146 | 147 | class Encoder(nn.Layer): 148 | def __init__(self, embed_dim, depth): 149 | super().__init__() 150 | layer_list = [] 151 | for _ in range(depth): 152 | encoder_layer = EncoderLayer() 153 | layer_list.append(encoder_layer) 154 | self.layers = nn.LayerList(layer_list) 155 | self.norm = nn.LayerNorm(embed_dim) 156 | 157 | def forward(self, x): 158 | for layer in self.layers: 159 | x = layer(x) 160 | x = self.norm(x) 161 | 162 | return x[:, 0], x[:, 1] 163 | 164 | 165 | class Deit(nn.Layer): 166 | def __init__(self, 167 | image_size=224, 168 | patch_size=16, 169 | in_channels=3, 170 | num_classes=1000, 171 | embed_dim=768, 172 | depth=3, 173 | num_heads=8, 174 | mlp_ratio=4, 175 | qkv_bias=True, 176 | dropout=0., 177 | attention_dropout=0., 178 | droppath=0.): 179 | super().__init__() 180 | self.patch_embedding = PatchEmbedding(224, 16, 3, 768) 181 | self.encoder = Encoder(embed_dim, depth) 182 | self.head = nn.Linear(embed_dim, num_classes) 183 | self.head_distill = nn.Linear(embed_dim, num_classes) 184 | 185 | def forward(self, x): 186 | x = self.patch_embedding(x) 187 | x, x_distill = self.encoder(x) 188 | x = self.head(x) 189 | x_distill = self.head_distill(x_distill) 190 | if self.training: 191 | return x, x_distill 192 | else: 193 | return (x + x_distill) / 2 194 | 195 | def main(): 196 | model = Deit() 197 | print(model) 198 | paddle.summary(model, (4, 3, 224, 224)) 199 | 200 | 201 | if __name__ == '__main__': 202 | main() -------------------------------------------------------------------------------- /deit/transforms.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.27 3 | Written By: Dr. 
Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import numpy as np 7 | from PIL import Image 8 | import paddle 9 | import paddle.vision.transforms as T 10 | paddle.set_device('cpu') 11 | 12 | def crop(img, region): 13 | cropped_img = T.crop(img, *region) 14 | return cropped_img 15 | 16 | class CenterCrop(): 17 | def __init__(self, size): 18 | self.size = size 19 | def __call__(self, img): 20 | w, h = img.size 21 | cw, ch = self.size 22 | crop_top = int(round((h - ch) / 2.)) 23 | crop_left = int(round((w - cw) / 2.)) 24 | return crop(img, (crop_top, crop_left, ch, cw)) 25 | 26 | class Resize(): 27 | def __init__(self, size): 28 | self.size = size 29 | def __call__(self, img): 30 | return T.resize(img, self.size) 31 | 32 | class ToTensor(): 33 | def __init__(self): 34 | pass 35 | def __call__(self, img): 36 | w, h = img.size 37 | img = paddle.to_tensor(np.array(img)) 38 | if img.dtype == paddle.uint8: 39 | img = paddle.cast(img, paddle.float32) / 255. 40 | # img = img.transpose([2, 0, 1]) 41 | return img 42 | 43 | class Compose(): 44 | def __init__(self, transforms): 45 | self.transforms = transforms 46 | 47 | def __call__(self, image): 48 | for t in self.transforms: 49 | image = t(image) 50 | return image 51 | 52 | def main(): 53 | img = Image.open('deit_1127/wenht.jpg') 54 | img = img.convert('L') 55 | transforms = Compose([Resize([256, 256]), 56 | CenterCrop([112, 112]), 57 | ToTensor()]) 58 | out = transforms(img) 59 | print(out) 60 | print(out.shape) 61 | 62 | if __name__ == '__main__': 63 | main() -------------------------------------------------------------------------------- /deit/wenht.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/deit/wenht.jpg -------------------------------------------------------------------------------- /detr/main.py: -------------------------------------------------------------------------------- 1 | import paddle 2 | import paddle.nn as nn 3 | import paddle.nn.functional as F 4 | 5 | import sys 6 | sys.path.append('./') 7 | from detr.resnet import ResNet18 8 | from detr.transformer import Transformer 9 | sys.path.pop() 10 | 11 | paddle.set_device('cpu') 12 | 13 | 14 | class PositionEmbedding(nn.Layer): 15 | def __init__(self, embed_dim): 16 | super().__init__() 17 | self.row_embed = nn.Embedding(50, embed_dim) 18 | self.col_embed = nn.Embedding(50, embed_dim) 19 | 20 | def forward(self, x): 21 | # x: [b, feat, H, W] 22 | h, w = x.shape[-2:] 23 | i = paddle.arange(w) 24 | j = paddle.arange(h) 25 | x_embed = self.col_embed(i) 26 | y_embed = self.row_embed(i) 27 | pos = paddle.concat([x_embed.unsqueeze(0).expand((h, x_embed.shape[0], x_embed.shape[1])), 28 | y_embed.unsqueeze(1).expand((y_embed.shape[0], w, y_embed.shape[1]))], axis=-1) 29 | pos = pos.transpose([2, 0, 1]) 30 | pos = pos.unsqueeze(0) 31 | pos = pos.expand([x.shape[0]] + pos.shape[1::]) #[batch_size, embed_dim, h, w] 32 | return pos 33 | 34 | 35 | class BboxEmbed(nn.Layer): 36 | def __init__(self, in_dim, hidden_dim, out_dim): 37 | super().__init__() 38 | self.fc1 = nn.Linear(in_dim, hidden_dim) 39 | self.fc2 = nn.Linear(hidden_dim, hidden_dim) 40 | self.fc3 = nn.Linear(hidden_dim, out_dim) 41 | self.act = nn.ReLU() 42 | 43 | def forward(self, x): 44 | x = self.fc1(x) 45 | x = self.act(x) 46 | x = self.fc2(x) 47 | x = self.act(x) 48 | x = self.fc3(x) 49 | return x 50 | 51 | 52 | class DETR(nn.Layer): 53 | def __init__(self, backbone, pos_embed, transformer, 
num_classes, num_queries): 54 | super().__init__() 55 | self.num_queries = num_queries 56 | self.transformer = transformer 57 | embed_dim = transformer.embed_dim 58 | 59 | self.class_embed = nn.Linear(embed_dim, num_classes + 1) 60 | self.bbox_embed = BboxEmbed(embed_dim, embed_dim, 4) 61 | self.query_embed = nn.Embedding(num_queries, embed_dim) 62 | 63 | self.input_proj = nn.Conv2D(backbone.num_channels, embed_dim, kernel_size=1) 64 | self.backbone = backbone 65 | self.pos_embed = pos_embed 66 | 67 | def forward(self, x): 68 | print(f'----- INPUT: {x.shape}') 69 | feat = self.backbone(x) 70 | print(f'----- Feature after ResNet18: {feat.shape}') 71 | pos_embed = self.pos_embed(feat) 72 | print(f'----- Positional Embedding: {pos_embed.shape}') 73 | 74 | feat = self.input_proj(feat) 75 | print(f'----- Feature after input_proj: {feat.shape}') 76 | out, _ = self.transformer(feat, self.query_embed.weight, pos_embed) 77 | print(f'----- out after transformer: {out.shape}') 78 | 79 | out_class = self.class_embed(out) 80 | out_coord = self.bbox_embed(out) 81 | print(f'----- out for class: {out_class.shape}') 82 | print(f'----- out for bbox: {out_coord.shape}') 83 | #out_coord = F.sigmoid(out_coord) 84 | 85 | return out_class, out_coord 86 | 87 | 88 | def build_detr(): 89 | backbone = ResNet18() 90 | transformer = Transformer() 91 | pos_embed = PositionEmbedding(16) 92 | detr = DETR(backbone, pos_embed, transformer, 10, 100) 93 | return detr 94 | 95 | 96 | def main(): 97 | t = paddle.randn([3, 3, 224, 224]) 98 | model = build_detr() 99 | out = model(t) 100 | print(out[0].shape, out[1].shape) 101 | 102 | 103 | 104 | if __name__ == "__main__": 105 | main() 106 | -------------------------------------------------------------------------------- /detr/resnet.py: -------------------------------------------------------------------------------- 1 | import paddle 2 | import paddle.nn as nn 3 | 4 | paddle.set_device('cpu') 5 | 6 | class Identity(nn.Layer): 7 | def __init__(self): 8 | super().__init__() 9 | 10 | def forward(self, x): 11 | return x 12 | 13 | class Block(nn.Layer): 14 | def __init__(self, in_dim, out_dim, stride): 15 | super().__init__() 16 | self.conv1 = nn.Conv2D(in_dim, out_dim, 3, stride=stride, padding=1, bias_attr=False) 17 | self.bn1 = nn.BatchNorm2D(out_dim) 18 | self.conv2 = nn.Conv2D(out_dim, out_dim, 3, stride=1, padding=1, bias_attr=False) 19 | self.bn2 = nn.BatchNorm2D(out_dim) 20 | self.relu = nn.ReLU() 21 | 22 | if stride == 2 or in_dim != out_dim: 23 | self.downsample = nn.Sequential(*[ 24 | nn.Conv2D(in_dim, out_dim, 1, stride=stride), 25 | nn.BatchNorm2D(out_dim)]) 26 | else: 27 | self.downsample = Identity() 28 | 29 | def forward(self, x): 30 | h = x 31 | x = self.conv1(x) 32 | x = self.bn1(x) 33 | x = self.relu(x) 34 | x = self.conv2(x) 35 | x = self.bn2(x) 36 | identity = self.downsample(h) 37 | x = x + identity 38 | x = self.relu(x) 39 | return x 40 | 41 | class ResNet18(nn.Layer): 42 | def __init__(self, in_dim=64, num_classes=10): 43 | super().__init__() 44 | self.num_channels = 512 45 | self.in_dim = in_dim 46 | # stem layers 47 | self.conv1 = nn.Conv2D(in_channels=3, 48 | out_channels=in_dim, 49 | kernel_size=3, 50 | stride=1, 51 | padding=1, 52 | bias_attr=False) 53 | self.bn1 = nn.BatchNorm2D(in_dim) 54 | self.relu = nn.ReLU() 55 | #blocks 56 | self.layer1 = self._make_layer(dim=64, n_blocks=2, stride=1) 57 | self.layer2 = self._make_layer(dim=128, n_blocks=2, stride=2) 58 | self.layer3 = self._make_layer(dim=256, n_blocks=2, stride=2) 59 | self.layer4 = 
self._make_layer(dim=512, n_blocks=2, stride=2) 60 | # head layer 61 | self.avgpool = nn.AdaptiveAvgPool2D(1) 62 | self.classifier = nn.Linear(512, num_classes) 63 | 64 | def _make_layer(self, dim, n_blocks, stride): 65 | layer_list = [] 66 | layer_list.append(Block(self.in_dim, dim, stride=stride)) 67 | self.in_dim = dim 68 | for i in range(1, n_blocks): 69 | layer_list.append(Block(self.in_dim, dim, stride=1)) 70 | return nn.Sequential(*layer_list) 71 | 72 | 73 | # CLASS 10: Modify the forward, remove the head and classifier 74 | def forward(self, x): 75 | x = self.conv1(x) 76 | x = self.bn1(x) 77 | x = self.relu(x) 78 | x = self.layer1(x) 79 | x = self.layer2(x) 80 | x = self.layer3(x) 81 | x = self.layer4(x) 82 | return x 83 | 84 | #def forward(self, x): 85 | # x = self.conv1(x) 86 | # x = self.bn1(x) 87 | # x = self.relu(x) 88 | # x = self.layer1(x) 89 | # x = self.layer2(x) 90 | # x = self.layer3(x) 91 | # x = self.layer4(x) 92 | # x = self.forward_feature(x) 93 | # x = self.avgpool(x) 94 | # x = x.flatten(1) 95 | # x = self.classifier(x) 96 | # return x 97 | 98 | def main(): 99 | t = paddle.randn([4, 3, 224, 224]) 100 | model = ResNet18() 101 | print(model) 102 | out = model(t) 103 | print(out.shape) 104 | 105 | if __name__ == "__main__": 106 | main() -------------------------------------------------------------------------------- /detr/transformer.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import paddle 3 | import paddle.nn as nn 4 | import paddle.nn.functional as F 5 | 6 | paddle.set_device('cpu') 7 | 8 | class Identity(nn.Layer): 9 | def __init__(self): 10 | super().__init__() 11 | 12 | def forward(self, x): 13 | return x 14 | 15 | 16 | class Mlp(nn.Layer): 17 | def __init__(self, embed_dim, mlp_ratio, dropout=0.): 18 | super().__init__() 19 | self.fc1 = nn.Linear(embed_dim, int(embed_dim * mlp_ratio)) 20 | self.fc2 = nn.Linear(int(embed_dim * mlp_ratio), embed_dim) 21 | self.act = nn.GELU() 22 | self.dropout = nn.Dropout(dropout) 23 | 24 | def forward(self, x): 25 | x = self.fc1(x) 26 | x = self.act(x) 27 | x = self.dropout(x) 28 | x = self.fc2(x) 29 | x = self.dropout(x) 30 | return x 31 | 32 | 33 | class Attention(nn.Layer): 34 | """multi-head self attention""" 35 | def __init__(self, embed_dim, num_heads, qkv_bias=True, dropout=0., attention_dropout=0.): 36 | super().__init__() 37 | self.num_heads = num_heads 38 | self.head_dim = int(embed_dim / num_heads) 39 | self.all_head_dim = self.head_dim * num_heads 40 | self.scales = self.head_dim ** -0.5 41 | 42 | 43 | # CLASS 10: support decoder 44 | self.q = nn.Linear(embed_dim, 45 | self.all_head_dim) 46 | self.k = nn.Linear(embed_dim, 47 | self.all_head_dim) 48 | self.v = nn.Linear(embed_dim, 49 | self.all_head_dim) 50 | 51 | 52 | self.proj = nn.Linear(self.all_head_dim, embed_dim) 53 | self.dropout = nn.Dropout(dropout) 54 | self.attention_dropout = nn.Dropout(attention_dropout) 55 | self.softmax = nn.Softmax(axis=-1) 56 | 57 | def transpose_multihead(self, x): 58 | # x: [seq_l, batch, all_head_dim] -> [seq_l, batch, n_head, head_dim] 59 | new_shape = x.shape[:-1] + [self.num_heads, self.head_dim] 60 | x = x.reshape(new_shape) 61 | x = x.flatten(1, 2) # merge batch and n_head: [seq_l, batch*n_head, head_dim] 62 | x = x.transpose([1, 0, 2]) #[batch * n_head, seq_l, head_dim] 63 | return x 64 | 65 | def forward(self, query, key, value): 66 | lk = key.shape[0] # when enc-dec: num_patches (sequence len, token len) 67 | b = key.shape[1] # when enc-dec: batch_size 68 | lq = 
query.shape[0] # when enc-dec: num_queries 69 | d = query.shape[2] # when enc-dec: embed_dim 70 | 71 | q = self.q(query) 72 | k = self.k(key) 73 | v = self.v(value) 74 | q, k, v = map(self.transpose_multihead, [q, k, v]) 75 | 76 | print(f'----- ----- ----- ----- [Attn] batch={key.shape[1]}, n_head={self.num_heads}, head_dim={self.head_dim}') 77 | print(f'----- ----- ----- ----- [Attn] q: {q.shape}, k: {k.shape}, v:{v.shape}') 78 | attn = paddle.matmul(q, k, transpose_y=True) # q * k' 79 | attn = attn * self.scales 80 | attn = self.softmax(attn) 81 | attn = self.attention_dropout(attn) 82 | print(f'----- ----- ----- ----- [Attn] attn: {attn.shape}') 83 | 84 | out = paddle.matmul(attn, v) 85 | out = out.transpose([1, 0, 2]) 86 | out = out.reshape([lq, b, d]) 87 | 88 | out = self.proj(out) 89 | out = self.dropout(out) 90 | 91 | return out 92 | 93 | 94 | class EncoderLayer(nn.Layer): 95 | def __init__(self, embed_dim=768, num_heads=4, mlp_ratio=4.0): 96 | super().__init__() 97 | self.attn_norm = nn.LayerNorm(embed_dim) 98 | self.attn = Attention(embed_dim, num_heads) 99 | self.mlp_norm = nn.LayerNorm(embed_dim) 100 | self.mlp = Mlp(embed_dim, mlp_ratio) 101 | 102 | def forward(self, x, pos=None): 103 | 104 | h = x 105 | x = self.attn_norm(x) 106 | q = x + pos if pos is not None else x 107 | k = x + pos if pos is not None else x 108 | print(f'----- ----- ----- encoder q: {q.shape}, k: {k.shape}, v:{x.shape}') 109 | x = self.attn(q, k, x) 110 | x = x + h 111 | 112 | h = x 113 | x = self.mlp_norm(x) 114 | x = self.mlp(x) 115 | x = x + h 116 | print(f'----- ----- ----- encoder out: {x.shape}') 117 | return x 118 | 119 | 120 | class DecoderLayer(nn.Layer): 121 | def __init__(self, embed_dim=768, num_heads=4, mlp_ratio=4.0): 122 | super().__init__() 123 | self.attn_norm = nn.LayerNorm(embed_dim) 124 | self.attn = Attention(embed_dim, num_heads) 125 | self.enc_dec_attn_norm = nn.LayerNorm(embed_dim) 126 | self.enc_dec_attn = Attention(embed_dim, num_heads) 127 | self.mlp_norm = nn.LayerNorm(embed_dim) 128 | self.mlp = Mlp(embed_dim, mlp_ratio) 129 | 130 | def forward(self, x, enc_out, pos=None, query_pos=None): 131 | 132 | h = x 133 | x = self.attn_norm(x) 134 | q = x + query_pos if pos is not None else x 135 | k = x + query_pos if pos is not None else x 136 | print(f'----- ----- ----- decoder(self-attn) q: {q.shape}, k: {k.shape}, v:{x.shape}') 137 | x = self.attn(q, k, x) 138 | x = x + h 139 | 140 | h = x 141 | x = self.enc_dec_attn_norm(x) 142 | q = x + query_pos if pos is not None else x 143 | k = enc_out + pos if pos is not None else x 144 | v = enc_out 145 | print(f'----- ----- ----- decoder(enc-dec attn) q: {q.shape}, k: {k.shape}, v:{v.shape}') 146 | x = self.attn(q, k, v) 147 | x = x + h 148 | 149 | h = x 150 | x = self.mlp_norm(x) 151 | x = self.mlp(x) 152 | x = x + h 153 | print(f'----- ----- ----- decoder out: {x.shape}') 154 | return x 155 | 156 | 157 | class Transformer(nn.Layer): 158 | def __init__(self, embed_dim=32, num_heads=4, num_encoders=2, num_decoders=2): 159 | super().__init__() 160 | self.embed_dim = embed_dim 161 | self.encoder = nn.LayerList([EncoderLayer(embed_dim, num_heads) for i in range(num_encoders)]) 162 | self.decoder = nn.LayerList([DecoderLayer(embed_dim, num_heads) for i in range(num_decoders)]) 163 | self.encoder_norm = nn.LayerNorm(embed_dim) 164 | self.decoder_norm = nn.LayerNorm(embed_dim) 165 | 166 | def forward(self, x, query_embed, pos_embed): 167 | B, C, H, W = x.shape 168 | print(f'----- ----- Transformer INPUT: {x.shape}') 169 | x = x.flatten(2) #[B, C, 
H*W] 170 | x = x.transpose([2, 0, 1]) # [H*W, B, C] 171 | print(f'----- ----- Transformer INPUT(after reshape): {x.shape}') 172 | 173 | # [B, dim, H, W] 174 | pos_embed = pos_embed.flatten(2) 175 | pos_embed = pos_embed.transpose([2, 0, 1]) #[H*W, B, dim] 176 | print(f'----- ----- pos_embed(after reshape): {pos_embed.shape}') 177 | 178 | # [num_queries, dim] 179 | query_embed = query_embed.unsqueeze(1) 180 | query_embed = query_embed.expand((query_embed.shape[0], B, query_embed.shape[2])) 181 | print(f'----- ----- query_embed(after reshape): {query_embed.shape}') 182 | 183 | target = paddle.zeros_like(query_embed) 184 | print(f'----- ----- target (now all zeros): {target.shape}') 185 | 186 | for encoder_layer in self.encoder: 187 | encoder_out = encoder_layer(x, pos_embed) 188 | encoder_out = self.encoder_norm(encoder_out) 189 | print(f'----- ----- encoder out: {encoder_out.shape}') 190 | 191 | for decoder_layer in self.decoder: 192 | decoder_out = decoder_layer(target, 193 | encoder_out, 194 | pos_embed, 195 | query_embed) 196 | decoder_out = self.decoder_norm(decoder_out) 197 | decoder_out = decoder_out.unsqueeze(0) 198 | print(f'----- ----- decoder out: {decoder_out.shape}') 199 | 200 | 201 | decoder_out = decoder_out.transpose([0, 2, 1, 3]) #[1, B, num_queries, embed_dim] 202 | encoder_out = encoder_out.transpose([1, 2, 0]) 203 | encoder_out = encoder_out.reshape([B, C, H, W]) 204 | print(f'----- ----- decoder out(after reshape): {decoder_out.shape}') 205 | 206 | return decoder_out, encoder_out 207 | 208 | 209 | def main(): 210 | trans = Transformer() 211 | print(trans) 212 | 213 | 214 | if __name__ == "__main__": 215 | main() -------------------------------------------------------------------------------- /distributed/main.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import paddle 3 | import paddle.nn as nn 4 | import paddle.distributed as dist 5 | from paddle.io import Dataset 6 | from paddle.io import DataLoader 7 | from paddle.io import DistributedBatchSampler 8 | 9 | class MyDataset(Dataset): 10 | def __init__(self): 11 | super().__init__() 12 | self.data = np.arange(32).astype('float32')[:, np.newaxis] 13 | 14 | def __getitem__(self, idx): 15 | return paddle.to_tensor(self.data[idx]), paddle.to_tensor(self.data[idx]) 16 | 17 | def __len__(self): 18 | return len(self.data) 19 | 20 | def get_dataset(): 21 | dataset = MyDataset() 22 | return dataset 23 | 24 | def get_dataloader(dataset, batch_size): 25 | sample = DistributedBatchSampler(dataset, batch_size=batch_size, shuffle=False) 26 | dataloader = DataLoader(dataset, batch_sampler=sample) 27 | return dataloader 28 | 29 | def build_model(): 30 | model = nn.Sequential(*[ 31 | nn.Linear(1, 8), 32 | nn.ReLU(), 33 | nn.Linear(8, 10) 34 | ]) 35 | return model 36 | 37 | def main_worker(*args): 38 | dataset = args[0] 39 | dist.init_parallel_env() 40 | world_size = dist.get_world_size() 41 | local_rank = dist.get_rank() 42 | 43 | dataloader = get_dataloader(dataset, batch_size=1) 44 | 45 | model = build_model() 46 | model = paddle.DataParallel(model) 47 | print(f'Hello PPViT, I am {local_rank}: I built a model for myself.') 48 | 49 | tensor_list = [] 50 | for data in dataloader: 51 | sample = data[0] 52 | label = data[1] 53 | 54 | out = model(sample) 55 | out = out.argmax(1) 56 | print(f'{local_rank} I got data:{sample.cpu().numpy()}, I have out: {out.cpu().numpy()}') 57 | 58 | dist.all_gather(tensor_list, out) 59 | if local_rank == 0: 60 | print(f'I am {local_rank}: I got 
all_gathered out: {tensor_list}') 61 | break 62 | 63 | def main(): 64 | dataset = get_dataset() 65 | dist.spawn(main_worker, args=(dataset,), nprocs=1) 66 | 67 | if __name__ == '__main__': 68 | main() -------------------------------------------------------------------------------- /iterator/tmp.py: -------------------------------------------------------------------------------- 1 | 2 | class MyIterable(): 3 | def __init__(self): 4 | self.data = [1, 2, 3, 4, 5] 5 | def __iter__(self): 6 | return MyIterator(self.data) 7 | 8 | def __getitem__(self, idx): 9 | return self.data[idx] 10 | 11 | class MyIterator(): 12 | def __init__(self, data): 13 | self.data = data 14 | self.counter = 0 15 | 16 | def __iter__(self): 17 | return self 18 | 19 | def __next__(self): 20 | if self.counter >= len(self.data): 21 | raise StopIteration() 22 | data = self.data[self.counter] 23 | self.counter += 1 24 | return data 25 | 26 | my_iterable = MyIterable() 27 | for d in my_iterable: 28 | print(d) 29 | print(my_iterable[1]) -------------------------------------------------------------------------------- /load_config/a.yaml: -------------------------------------------------------------------------------- 1 | DATA: 2 | BATCH_SIZE: 512 3 | MODEL: 4 | TRANS: 5 | EMBED_DIM: 768 -------------------------------------------------------------------------------- /load_config/config.py: -------------------------------------------------------------------------------- 1 | from yacs.config import CfgNode as CN 2 | import yaml 3 | 4 | _C = CN() 5 | _C.DATA = CN() 6 | _C.DATA.DATASET = 'cifar10' 7 | _C.DATA.BATCH_SIZE = 128 8 | 9 | _C.MODEL = CN() 10 | _C.MODEL.NUM_CLASSES = 10 11 | 12 | _C.MODEL.TRANS = CN() 13 | _C.MODEL.TRANS.EMBED_DIM = 96 14 | _C.MODEL.TRANS.DEPTHS = [2, 2, 6, 2] 15 | _C.MODEL.TRANS.QKV_BIAS = False 16 | 17 | def _update_config_from_file(config, cfg_file): 18 | config.defrost() 19 | config.merge_from_file(cfg_file) # yaml 20 | 21 | def update_config(config, args): 22 | if args.cfg: 23 | _update_config_from_file(config, args.cfg) 24 | if args.dataset: 25 | config.DATA.DATASET = args.dataset 26 | if args.batch_size: 27 | config.DATA.BATCH_SIZE = args.batch_size 28 | return config 29 | 30 | def get_config(cfg_file=None): 31 | config = _C.clone() 32 | if cfg_file: 33 | _update_config_from_file(config, cfg_file) 34 | return config 35 | 36 | 37 | def main(): 38 | cfg = get_config("load_config/a.yaml") 39 | print(cfg) 40 | 41 | if __name__ == "__main__": 42 | main() -------------------------------------------------------------------------------- /load_config/main.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from config import get_config, update_config 3 | 4 | def get_arguments(): 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument('-cfg', type=str, default=None, help='config file') 7 | parser.add_argument('-batch_size', type=int, default=1024, help='batch size') 8 | parser.add_argument('-dataset', type=str, default='imagenet', help='dataset') 9 | return parser.parse_args() 10 | 11 | 12 | 13 | def main(): 14 | cfg = get_config() 15 | print(cfg) 16 | print('-----------------') 17 | 18 | 19 | cfg = get_config("load_config/a.yaml") 20 | print(cfg) 21 | print('-----------------') 22 | 23 | args = get_arguments() 24 | cfg = update_config(cfg, args) 25 | print(cfg) 26 | print('-----------------') 27 | 28 | 29 | if __name__ == "__main__": 30 | main() -------------------------------------------------------------------------------- /resnet.py: 
-------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.23 3 | Written By: Dr. Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | 9 | #paddle.set_device('cpu') 10 | 11 | class Identity(nn.Layer): 12 | def __init__(self): 13 | super().__init__() 14 | 15 | def forward(self, x): 16 | return x 17 | 18 | 19 | class Block(nn.Layer): 20 | def __init__(self, in_dim, out_dim, stride=1): 21 | super().__init__() 22 | ## 补充代码 23 | self.conv1 = nn.Conv2D(in_dim, out_dim, 3, stride=stride, padding=1, bias_attr=False) 24 | self.bn1 = nn.BatchNorm(out_dim) 25 | self.conv2 = nn.Conv2D(out_dim, out_dim, 3, stride=1, padding=1, bias_attr=False) 26 | self.bn2 = nn.BatchNorm(out_dim) 27 | self.relu = nn.ReLU() 28 | 29 | if stride == 2 or in_dim != out_dim: 30 | self.downsample = nn.Sequential(*[ 31 | nn.Conv2D(in_dim, out_dim, 1, stride=stride), 32 | nn.BatchNorm(out_dim) 33 | ]) 34 | else: 35 | self.downsample = Identity() 36 | 37 | def forward(self, x): 38 | ## 补充代码 39 | h = x 40 | x = self.conv1(x) 41 | x = self.bn1(x) 42 | x = self.relu(x) 43 | x = self.conv2(x) 44 | x = self.bn2(x) 45 | identity = self.downsample(h) 46 | x = x + identity 47 | x = self.relu(x) 48 | return x 49 | 50 | 51 | class ResNet18(nn.Layer): 52 | def __init__(self, in_dim=64, num_classes=1000): 53 | super().__init__() 54 | ## 补充代码 55 | self.in_dim = in_dim 56 | self.conv1 = nn.Conv2D(in_channels=3, out_channels=in_dim, kernel_size=3, stride=1, padding=1, bias_attr=False) 57 | self.bn1 = nn.BatchNorm(in_dim) 58 | self.relu = nn.ReLU() 59 | # self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1) 60 | 61 | self.layer1 = self._make_layer(64, 2) 62 | self.layer2 = self._make_layer(128, 2, 2) 63 | self.layer3 = self._make_layer(256, 2, 2) 64 | self.layer4 = self._make_layer(512, 2, 2) 65 | 66 | self.avgpool = nn.AdaptiveAvgPool2D(1) 67 | self.fc = nn.Linear(512, num_classes) 68 | 69 | def _make_layer(self, out_dim, n_blocks, stride=1): 70 | ## 补充代码 71 | layer_list = [] 72 | layer_list.append(Block(self.in_dim, out_dim, stride)) 73 | self.in_dim = out_dim 74 | for _ in range(1, n_blocks): 75 | layer_list.append(Block(self.in_dim, out_dim)) 76 | return nn.Sequential(*layer_list) 77 | 78 | def forward(self, x): 79 | ## 补充代码 80 | x = self.conv1(x) 81 | x = self.bn1(x) 82 | x = self.relu(x) 83 | # x = self.maxpool(x) 84 | 85 | x = self.layer1(x) 86 | x = self.layer2(x) 87 | x = self.layer3(x) 88 | x = self.layer4(x) 89 | 90 | x = self.avgpool(x) 91 | x = x.flatten(1) 92 | x = self.fc(x) 93 | 94 | return x 95 | 96 | 97 | 98 | def main(): 99 | model = ResNet18() 100 | print(model) 101 | x = paddle.randn([2, 3, 32, 32]) 102 | out = model(x) 103 | print(out.shape) 104 | 105 | if __name__ == "__main__": 106 | main() 107 | -------------------------------------------------------------------------------- /swin_transformer/main_1128.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.28 3 | Written By: Dr. 
Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | 9 | paddle.set_device('cpu') 10 | 11 | class PatchEmbedding(nn.Layer): 12 | def __init__(self, patch_size=4, embed_dim=96): 13 | super().__init__() 14 | self.patch_size = nn.Conv2D(3, embed_dim, kernel_size=patch_size, stride=patch_size) 15 | self.norm = nn.LayerNorm(embed_dim) 16 | 17 | def forward(self, x): 18 | x = self.patch_size(x) # [n, embed_dim, h', w'] 19 | x = x.flatten(2) # [n, embed_dim, h'*w'] 20 | x = x.transpose([0, 2, 1]) # [n, h'*w, embed_dim] 21 | x = self.norm(x) 22 | return x 23 | 24 | 25 | class PatchMerging(nn.Layer): 26 | def __init__(self, input_resolution, dim): 27 | super().__init__() 28 | self.resolution = input_resolution 29 | self.dim = dim 30 | self.reduction = nn.Linear(4 * dim, 2 * dim) 31 | self.norm = nn.LayerNorm(4 * dim) 32 | 33 | def forward(self, x): 34 | h, w = self.resolution 35 | b, _, c = x.shape # _ : num_patches 36 | 37 | x = x.reshape([b, h, w, c]) 38 | 39 | x0 = x[:, 0::2, 0::2, :] 40 | x1 = x[:, 0::2, 1::2, :] 41 | x2 = x[:, 1::2, 0::2, :] 42 | x3 = x[:, 1::2, 1::2, :] 43 | 44 | x = paddle.concat([x0, x1, x2, x3], axis=-1) # [b, h/2, w/2, 4c] 45 | x = x.reshape([b, -1, 4 * c]) 46 | x = self.norm(x) 47 | x = self.reduction(x) 48 | 49 | return x 50 | 51 | class Mlp(nn.Layer): 52 | def __init__(self, dim, mlp_ratio=4.0, dropout=0.): 53 | super().__init__() 54 | self.fc1 = nn.Linear(dim, int(dim * mlp_ratio)) 55 | self.fc2 = nn.Linear(int(dim * mlp_ratio), dim) 56 | self.act = nn.GELU() 57 | self.dropout = nn.Dropout(dropout) 58 | 59 | def forward(self, x): 60 | x = self.fc1(x) 61 | x = self.act(x) 62 | x = self.dropout(x) 63 | x = self.fc2(x) 64 | x = self.dropout(x) 65 | return x 66 | 67 | def windows_partition(x, window_size): 68 | B, H, W, C = x.shape 69 | x = x.reshape([B, H//window_size, window_size, W//window_size, window_size, C]) 70 | x = x.transpose([0, 1, 3, 2, 4, 5]) 71 | # [B, h//ws, w//ws, ws, ws, c] 72 | x = x.reshape([-1, window_size, window_size, C]) 73 | # [B * num_patches, ws, ws, c] 74 | return x 75 | 76 | def windows_reverse(windows, window_size, H, W): 77 | B = int(windows.shape[0] // (H / window_size * W / window_size)) 78 | x = windows.reshape([B, H//window_size, W//window_size, window_size, window_size, -1]) 79 | x = x.transpose([0, 1, 3, 2, 4, 5]) 80 | x = x.reshape([B, H, W, -1]) 81 | return x 82 | 83 | class WindowAttention(nn.Layer): 84 | def __init__(self, dim, window_size, num_heads): 85 | super().__init__() 86 | self.dim = dim 87 | self.dim_head = dim // num_heads 88 | self.num_heads = num_heads 89 | self.scale = self.dim_head ** -0.5 90 | self.softmax = nn.Softmax(axis=-1) 91 | self.qkv = nn.Linear(dim, 3 * dim) 92 | self.proj = nn.Linear(dim, dim) 93 | 94 | def transpose_multi_head(self, x): 95 | new_shape = x.shape[:-1] + [self.num_heads, self.dim_head] 96 | x = x.reshape(new_shape) 97 | x = x.transpose([0, 2, 1, 3]) # [B, num_heads, num_patches, dim_head] 98 | return x 99 | 100 | def forward(self, x): 101 | B, N, C = x.shape 102 | # x: [B, num_patches, embed_dim] 103 | qkv = self.qkv(x).chunk(3, -1) 104 | q, k, v = map(self.transpose_multi_head, qkv) 105 | 106 | q = q * self.scale 107 | attn = paddle.matmul(q, k, transpose_y=True) 108 | attn = self.softmax(attn) 109 | 110 | out = paddle.matmul(attn, v) 111 | # [B, num_heads, num_patches, dim_head] 112 | out = out.transpose([0, 2, 1, 3]) 113 | # [B, num_patches, num_heads, dim_head] num_heads * dim_head = embed_dim 114 | out = out.reshape([B, N, C]) 115 | out = self.proj(out) 
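# out: [B, N, C]; the output projection mixes information across the attention heads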
116 | return out 117 | 118 | class SwinBlock(nn.Layer): 119 | def __init__(self, dim, input_resolution, num_heads, window_size): 120 | super().__init__() 121 | self.dim = dim 122 | self.resolution = input_resolution 123 | self.window_size = window_size 124 | 125 | self.attn_norm = nn.LayerNorm(dim) 126 | self.attn = WindowAttention(dim, window_size, num_heads) 127 | 128 | self.mlp_norm = nn.LayerNorm(dim) 129 | self.mlp = Mlp(dim) 130 | 131 | def forward(self, x): 132 | H, W = self.resolution 133 | B, N, C = x.shape 134 | 135 | h = x 136 | x = self.attn_norm(x) 137 | 138 | x = x.reshape([B, H, W, C]) 139 | x_windows = windows_partition(x, self.window_size) 140 | # [B * num_patches, ws, ws, c] 141 | x_windows = x_windows.reshape([-1, self.window_size * self.window_size, C]) 142 | attn_windows = self.attn(x_windows) 143 | attn_windows = attn_windows.reshape([-1, self.window_size, self.window_size, C]) 144 | x = windows_reverse(attn_windows, self.window_size, H, W) 145 | # [B, H, W, C] 146 | x = x.reshape([B, H*W, C]) 147 | 148 | x = self.attn(x) 149 | 150 | x = h + x 151 | 152 | h = x 153 | x = self.mlp_norm(x) 154 | x = self.mlp(x) 155 | x = h + x 156 | return x 157 | 158 | def main(): 159 | t = paddle.randn([4, 3, 224, 224]) 160 | print('image shape = ', t.shape) 161 | patch_embedding = PatchEmbedding(patch_size=4, embed_dim=96) 162 | swin_block = SwinBlock(dim=96, input_resolution=[56, 56], num_heads=4, window_size=7) 163 | patch_merging = PatchMerging(input_resolution=[56, 56], dim=96) 164 | 165 | out = patch_embedding(t) # [4, 56, 56, 96] 166 | print('patch_embedding out shape = ', out.shape) 167 | out = swin_block(out) 168 | print('swin_block out shape = ', out.shape) 169 | out = patch_merging(out) 170 | print('patch_merging out shape = ', out.shape) 171 | 172 | if __name__ == '__main__': 173 | main() -------------------------------------------------------------------------------- /swin_transformer/main_1129.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.29 3 | Written By: Dr. 
Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | from mask_1129 import generate_mask 9 | 10 | paddle.set_device('cpu') 11 | 12 | class PatchEmbedding(nn.Layer): 13 | def __init__(self, patch_size=4, embed_dim=96): 14 | super().__init__() 15 | self.patch_size = nn.Conv2D(3, embed_dim, kernel_size=patch_size, stride=patch_size) 16 | self.norm = nn.LayerNorm(embed_dim) 17 | 18 | def forward(self, x): 19 | x = self.patch_size(x) # [n, embed_dim, h', w'] 20 | x = x.flatten(2) # [n, embed_dim, h'*w'] 21 | x = x.transpose([0, 2, 1]) # [n, h'*w, embed_dim] 22 | x = self.norm(x) 23 | return x 24 | 25 | 26 | class PatchMerging(nn.Layer): 27 | def __init__(self, input_resolution, dim): 28 | super().__init__() 29 | self.resolution = input_resolution 30 | self.dim = dim 31 | self.reduction = nn.Linear(4 * dim, 2 * dim) 32 | self.norm = nn.LayerNorm(4 * dim) 33 | 34 | def forward(self, x): 35 | h, w = self.resolution 36 | b, _, c = x.shape # _ : num_patches 37 | 38 | x = x.reshape([b, h, w, c]) 39 | 40 | x0 = x[:, 0::2, 0::2, :] 41 | x1 = x[:, 0::2, 1::2, :] 42 | x2 = x[:, 1::2, 0::2, :] 43 | x3 = x[:, 1::2, 1::2, :] 44 | 45 | x = paddle.concat([x0, x1, x2, x3], axis=-1) # [b, h/2, w/2, 4c] 46 | x = x.reshape([b, -1, 4 * c]) 47 | x = self.norm(x) 48 | x = self.reduction(x) 49 | 50 | return x 51 | 52 | class Mlp(nn.Layer): 53 | def __init__(self, dim, mlp_ratio=4.0, dropout=0.): 54 | super().__init__() 55 | self.fc1 = nn.Linear(dim, int(dim * mlp_ratio)) 56 | self.fc2 = nn.Linear(int(dim * mlp_ratio), dim) 57 | self.act = nn.GELU() 58 | self.dropout = nn.Dropout(dropout) 59 | 60 | def forward(self, x): 61 | x = self.fc1(x) 62 | x = self.act(x) 63 | x = self.dropout(x) 64 | x = self.fc2(x) 65 | x = self.dropout(x) 66 | return x 67 | 68 | def windows_partition(x, window_size): 69 | B, H, W, C = x.shape 70 | x = x.reshape([B, H//window_size, window_size, W//window_size, window_size, C]) 71 | x = x.transpose([0, 1, 3, 2, 4, 5]) 72 | # [B, h//ws, w//ws, ws, ws, c] 73 | x = x.reshape([-1, window_size, window_size, C]) 74 | # [B * num_patches, ws, ws, c] 75 | return x 76 | 77 | def windows_reverse(windows, window_size, H, W): 78 | B = int(windows.shape[0] // (H / window_size * W / window_size)) 79 | x = windows.reshape([B, H//window_size, W//window_size, window_size, window_size, -1]) 80 | x = x.transpose([0, 1, 3, 2, 4, 5]) 81 | x = x.reshape([B, H, W, -1]) 82 | return x 83 | 84 | class WindowAttention(nn.Layer): 85 | def __init__(self, dim, window_size, num_heads): 86 | super().__init__() 87 | self.dim = dim 88 | self.dim_head = dim // num_heads 89 | self.num_heads = num_heads 90 | self.scale = self.dim_head ** -0.5 91 | self.softmax = nn.Softmax(axis=-1) 92 | self.qkv = nn.Linear(dim, 3 * dim) 93 | self.proj = nn.Linear(dim, dim) 94 | 95 | def transpose_multi_head(self, x): 96 | new_shape = x.shape[:-1] + [self.num_heads, self.dim_head] 97 | x = x.reshape(new_shape) 98 | x = x.transpose([0, 2, 1, 3]) # [B, num_heads, num_patches, dim_head] 99 | return x 100 | 101 | def forward(self, x, mask=None): 102 | B, N, C = x.shape 103 | # x: [B, num_patches, embed_dim] 104 | qkv = self.qkv(x).chunk(3, -1) 105 | q, k, v = map(self.transpose_multi_head, qkv) 106 | 107 | q = q * self.scale 108 | attn = paddle.matmul(q, k, transpose_y=True) 109 | 110 | ##### BEGIN CLASS 6: Mask 111 | if mask is None: 112 | attn = self.softmax(attn) 113 | else: 114 | # mask: [num_windows, num_patches, num_patches] 115 | # attn: [B*num_windows, num_heads, num_patches, num_patches] 116 | attn = 
attn.reshape([B//mask.shape[0], mask.shape[0], self.num_heads, mask.shape[1], mask.shape[1]]) 117 | # attn: [B, num_windows, num_heads, num_patches, num_patches] 118 | # mask: [1, num_windows, 1, num_patches, num_patches] 119 | attn = attn + mask.unsqueeze(1).unsqueeze(0) 120 | attn = attn.reshape([-1, self.num_heads, mask.shape[1], mask.shape[1]]) 121 | # attn: [B*num_windows, num_heads, num_patches, num_patches] 122 | ##### END CLASS 6: Mask 123 | 124 | 125 | out = paddle.matmul(attn, v) 126 | # [B, num_heads, num_patches, dim_head] 127 | out = out.transpose([0, 2, 1, 3]) 128 | # [B, num_patches, num_heads, dim_head] num_heads * dim_head = embed_dim 129 | out = out.reshape([B, N, C]) 130 | out = self.proj(out) 131 | return out 132 | 133 | class SwinBlock(nn.Layer): 134 | def __init__(self, dim, input_resolution, num_heads, window_size, shift_size=0): 135 | super().__init__() 136 | self.dim = dim 137 | self.resolution = input_resolution 138 | self.window_size = window_size 139 | self.shift_size = shift_size 140 | 141 | self.attn_norm = nn.LayerNorm(dim) 142 | self.attn = WindowAttention(dim, window_size, num_heads) 143 | 144 | self.mlp_norm = nn.LayerNorm(dim) 145 | self.mlp = Mlp(dim) 146 | 147 | # CLASS 6 148 | if self.shift_size > 0: 149 | attn_mask = generate_mask(window_size=self.window_size, 150 | shift_size=self.shift_size, 151 | input_resolution=self.resolution) 152 | else: 153 | attn_mask = None 154 | self.register_buffer('attn_mask', attn_mask) 155 | 156 | def forward(self, x): 157 | H, W = self.resolution 158 | B, N, C = x.shape 159 | 160 | h = x 161 | x = self.attn_norm(x) 162 | 163 | x = x.reshape([B, H, W, C]) 164 | 165 | ##### BEGIN CLASS 6 166 | # Shift window 167 | if self.shift_size > 0: 168 | shifted_x = paddle.roll(x, shifts=(-self.shift_size, -self.shift_size), axis=(1, 2)) 169 | else: 170 | shifted_x = x 171 | 172 | # Compute window attn 173 | x_windows = windows_partition(shifted_x, self.window_size) 174 | x_windows = x_windows.reshape([-1, self.window_size * self.window_size, C]) 175 | attn_windows = self.attn(x_windows, mask=self.attn_mask) 176 | attn_windows = attn_windows.reshape([-1, self.window_size, self.window_size, C]) 177 | # Shift back 178 | shifted_x = windows_reverse(attn_windows, self.window_size, H, W) 179 | 180 | if self.shift_size > 0: 181 | x = paddle.roll(x, shifts=(self.shift_size, self.shift_size), axis=(1, 2)) 182 | else: 183 | x = shifted_x 184 | ##### END CLASS 6 185 | 186 | 187 | # [B, H, W, C] 188 | x = x.reshape([B, H*W, C]) 189 | 190 | x = self.attn(x) 191 | 192 | x = h + x 193 | 194 | h = x 195 | x = self.mlp_norm(x) 196 | x = self.mlp(x) 197 | x = h + x 198 | return x 199 | 200 | def main(): 201 | t = paddle.randn([4, 3, 224, 224]) 202 | patch_embedding = PatchEmbedding(patch_size=4, embed_dim=96) 203 | swin_block_w_msa = SwinBlock(dim=96, input_resolution=[56, 56], num_heads=4, window_size=7, shift_size=0) 204 | swin_block_sw_msa = SwinBlock(dim=96, input_resolution=[56, 56], num_heads=4, window_size=7, shift_size=7//2) 205 | patch_merging = PatchMerging(input_resolution=[56, 56], dim=96) 206 | 207 | print('image shape = [4, 3, 224, 224]') 208 | out = patch_embedding(t) # [4, 56, 56, 96] 209 | print('patch_embedding out shape = ', out.shape) 210 | out = swin_block_w_msa(out) 211 | out = swin_block_sw_msa(out) 212 | print('swin_block out shape = ', out.shape) 213 | out = patch_merging(out) 214 | print('patch_merging out shape = ', out.shape) 215 | 216 | if __name__ == '__main__': 217 | main() 
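# A minimal shape-check sketch for the window utilities above. It assumes this
# snippet is saved alongside main_1129.py and mask_1129.py so the imports below
# resolve; only tensor shapes are inspected and the input values are random.
import paddle
from main_1129 import windows_partition, windows_reverse
from mask_1129 import generate_mask

paddle.set_device('cpu')

x = paddle.randn([2, 56, 56, 96])                  # [B, H, W, C] after patch embedding
w = windows_partition(x, window_size=7)            # [2 * 8 * 8, 7, 7, 96] = [128, 7, 7, 96]
y = windows_reverse(w, window_size=7, H=56, W=56)  # back to [2, 56, 56, 96]
mask = generate_mask(window_size=7, shift_size=3, input_resolution=(56, 56))
print(w.shape)      # [128, 7, 7, 96]: one row per 7x7 window
print(y.shape)      # [2, 56, 56, 96]: partition/reverse round-trip restores the layout
print(mask.shape)   # [64, 49, 49]: one (ws*ws, ws*ws) additive mask per window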
-------------------------------------------------------------------------------- /swin_transformer/main_1130.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.30 3 | Written By: Dr. Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | from mask_1129 import generate_mask 9 | 10 | paddle.set_device('cpu') 11 | 12 | class Identity(nn.Layer): 13 | def __init__(self): 14 | super().__init__() 15 | 16 | def forward(self, x): 17 | return x 18 | 19 | class PatchEmbedding(nn.Layer): 20 | def __init__(self, patch_size=4, embed_dim=96): 21 | super().__init__() 22 | self.patch_size = nn.Conv2D(3, embed_dim, kernel_size=patch_size, stride=patch_size) 23 | self.norm = nn.LayerNorm(embed_dim) 24 | 25 | def forward(self, x): 26 | x = self.patch_size(x) # [n, embed_dim, h', w'] 27 | x = x.flatten(2) # [n, embed_dim, h'*w'] 28 | x = x.transpose([0, 2, 1]) # [n, h'*w, embed_dim] 29 | x = self.norm(x) 30 | return x 31 | 32 | 33 | class PatchMerging(nn.Layer): 34 | def __init__(self, input_resolution, dim): 35 | super().__init__() 36 | self.resolution = input_resolution 37 | self.dim = dim 38 | self.reduction = nn.Linear(4 * dim, 2 * dim) 39 | self.norm = nn.LayerNorm(4 * dim) 40 | 41 | def forward(self, x): 42 | h, w = self.resolution 43 | b, _, c = x.shape # _ : num_patches 44 | 45 | x = x.reshape([b, h, w, c]) 46 | 47 | x0 = x[:, 0::2, 0::2, :] 48 | x1 = x[:, 0::2, 1::2, :] 49 | x2 = x[:, 1::2, 0::2, :] 50 | x3 = x[:, 1::2, 1::2, :] 51 | 52 | x = paddle.concat([x0, x1, x2, x3], axis=-1) # [b, h/2, w/2, 4c] 53 | x = x.reshape([b, -1, 4 * c]) 54 | x = self.norm(x) 55 | x = self.reduction(x) 56 | 57 | return x 58 | 59 | class Mlp(nn.Layer): 60 | def __init__(self, dim, mlp_ratio=4.0, dropout=0.): 61 | super().__init__() 62 | self.fc1 = nn.Linear(dim, int(dim * mlp_ratio)) 63 | self.fc2 = nn.Linear(int(dim * mlp_ratio), dim) 64 | self.act = nn.GELU() 65 | self.dropout = nn.Dropout(dropout) 66 | 67 | def forward(self, x): 68 | x = self.fc1(x) 69 | x = self.act(x) 70 | x = self.dropout(x) 71 | x = self.fc2(x) 72 | x = self.dropout(x) 73 | return x 74 | 75 | def windows_partition(x, window_size): 76 | B, H, W, C = x.shape 77 | x = x.reshape([B, H//window_size, window_size, W//window_size, window_size, C]) 78 | x = x.transpose([0, 1, 3, 2, 4, 5]) 79 | # [B, h//ws, w//ws, ws, ws, c] 80 | x = x.reshape([-1, window_size, window_size, C]) 81 | # [B * num_patches, ws, ws, c] 82 | return x 83 | 84 | def windows_reverse(windows, window_size, H, W): 85 | B = int(windows.shape[0] // (H / window_size * W / window_size)) 86 | x = windows.reshape([B, H//window_size, W//window_size, window_size, window_size, -1]) 87 | x = x.transpose([0, 1, 3, 2, 4, 5]) 88 | x = x.reshape([B, H, W, -1]) 89 | return x 90 | 91 | class WindowAttention(nn.Layer): 92 | def __init__(self, dim, window_size, num_heads): 93 | super().__init__() 94 | self.dim = dim 95 | self.dim_head = dim // num_heads 96 | self.num_heads = num_heads 97 | self.scale = self.dim_head ** -0.5 98 | self.softmax = nn.Softmax(axis=-1) 99 | self.qkv = nn.Linear(dim, 3 * dim) 100 | self.proj = nn.Linear(dim, dim) 101 | 102 | def transpose_multi_head(self, x): 103 | new_shape = x.shape[:-1] + [self.num_heads, self.dim_head] 104 | x = x.reshape(new_shape) 105 | x = x.transpose([0, 2, 1, 3]) # [B, num_heads, num_patches, dim_head] 106 | return x 107 | 108 | def forward(self, x, mask=None): 109 | B, N, C = x.shape 110 | # x: [B, num_patches, embed_dim] 111 | qkv = self.qkv(x).chunk(3, 
-1) 112 | q, k, v = map(self.transpose_multi_head, qkv) 113 | 114 | q = q * self.scale 115 | attn = paddle.matmul(q, k, transpose_y=True) 116 | 117 | print('attn shape=', attn.shape) 118 | 119 | ##### BEGIN CLASS 6: Mask 120 | if mask is None: 121 | attn = self.softmax(attn) 122 | else: 123 | # mask: [num_windows, num_patches, num_patches] 124 | # attn: [B*num_windows, num_heads, num_patches, num_patches] 125 | attn = attn.reshape([B//mask.shape[0], mask.shape[0], self.num_heads, mask.shape[1], mask.shape[1]]) 126 | # attn: [B, num_windows, num_heads, num_patches, num_patches] 127 | # mask: [1, num_windows, 1, num_patches, num_patches] 128 | attn = attn + mask.unsqueeze(1).unsqueeze(0) 129 | attn = attn.reshape([-1, self.num_heads, mask.shape[1], mask.shape[1]]) 130 | # attn: [B*num_windows, num_heads, num_patches, num_patches] 131 | ##### END CLASS 6: Mask 132 | 133 | 134 | out = paddle.matmul(attn, v) 135 | # [B, num_heads, num_patches, dim_head] 136 | out = out.transpose([0, 2, 1, 3]) 137 | # [B, num_patches, num_heads, dim_head] num_heads * dim_head = embed_dim 138 | out = out.reshape([B, N, C]) 139 | out = self.proj(out) 140 | return out 141 | 142 | class SwinBlock(nn.Layer): 143 | def __init__(self, dim, input_resolution, num_heads, window_size, shift_size=0): 144 | super().__init__() 145 | self.dim = dim 146 | self.resolution = input_resolution 147 | self.window_size = window_size 148 | self.shift_size = shift_size 149 | 150 | self.attn_norm = nn.LayerNorm(dim) 151 | self.attn = WindowAttention(dim, window_size, num_heads) 152 | 153 | self.mlp_norm = nn.LayerNorm(dim) 154 | self.mlp = Mlp(dim) 155 | 156 | if min(self.resolution) <= self.window_size: 157 | self.shift_size = 0 158 | self.window_size = min(self.resolution) 159 | 160 | # CLASS 6 161 | if self.shift_size > 0: 162 | attn_mask = generate_mask(window_size=self.window_size, 163 | shift_size=self.shift_size, 164 | input_resolution=self.resolution) 165 | else: 166 | attn_mask = None 167 | self.register_buffer('attn_mask', attn_mask) 168 | 169 | def forward(self, x): 170 | H, W = self.resolution 171 | B, N, C = x.shape 172 | 173 | h = x 174 | x = self.attn_norm(x) 175 | 176 | x = x.reshape([B, H, W, C]) 177 | 178 | ##### BEGIN CLASS 6 179 | # Shift window 180 | if self.shift_size > 0: 181 | shifted_x = paddle.roll(x, shifts=(-self.shift_size, -self.shift_size), axis=(1, 2)) 182 | else: 183 | shifted_x = x 184 | 185 | # Compute window attn 186 | x_windows = windows_partition(shifted_x, self.window_size) 187 | x_windows = x_windows.reshape([-1, self.window_size * self.window_size, C]) 188 | attn_windows = self.attn(x_windows, mask=self.attn_mask) 189 | attn_windows = attn_windows.reshape([-1, self.window_size, self.window_size, C]) 190 | # Shift back 191 | shifted_x = windows_reverse(attn_windows, self.window_size, H, W) 192 | 193 | if self.shift_size > 0: 194 | x = paddle.roll(x, shifts=(self.shift_size, self.shift_size), axis=(1, 2)) 195 | else: 196 | x = shifted_x 197 | ##### END CLASS 6 198 | 199 | 200 | # [B, H, W, C] 201 | x = x.reshape([B, H*W, C]) 202 | 203 | x = self.attn(x) 204 | 205 | x = h + x 206 | 207 | h = x 208 | x = self.mlp_norm(x) 209 | x = self.mlp(x) 210 | x = h + x 211 | return x 212 | 213 | class SwinStage(nn.Layer): 214 | def __init__(self, dim, input_resolution, depth, num_heads, window_size, patch_merging=None): 215 | super().__init__() 216 | self.blocks = nn.LayerList() 217 | for i in range(depth): 218 | self.blocks.append( 219 | SwinBlock(dim=dim, 220 | input_resolution=input_resolution, 221 | 
--------------------------------------------------------------------------------
/swin_transformer/mask_1129.py:
--------------------------------------------------------------------------------
"""
DateTime: 2021.11.29
Written By: Dr. Zhu
Recorded By: Hatimwen
"""
import paddle
from PIL import Image
paddle.set_device('cpu')

def window_partition(x, window_size):
    B, H, W, C = x.shape
    x = x.reshape([B, H//window_size, window_size, W//window_size, window_size, C])
    x = x.transpose([0, 1, 3, 2, 4, 5])
    x = x.reshape([-1, window_size, window_size, C])
    return x

def generate_mask(window_size=4, shift_size=2, input_resolution=(8, 8)):
    H, W = input_resolution
    img_mask = paddle.zeros([1, H, W, 1])
    h_slices = [slice(0, -window_size),
                slice(-window_size, -shift_size),
                slice(-shift_size, None)]  # a[slice(0, -window_size)] == a[0:-window_size]
    w_slices = [slice(0, -window_size),
                slice(-window_size, -shift_size),
                slice(-shift_size, None)]
    cnt = 0
    for h in h_slices:
        for w in w_slices:
            img_mask[:, h, w, :] = cnt
            cnt += 1
    windows_mask = window_partition(img_mask, window_size=window_size)
    windows_mask = windows_mask.reshape([-1, window_size * window_size])

    attn_mask = windows_mask.unsqueeze(1) - windows_mask.unsqueeze(2)
    # Broadcasting: [n, 1, ws*ws] - [n, ws*ws, 1]
    # Pairs of positions from different regions get a large negative bias so that
    # softmax drives their attention weights to (almost) zero; same-region pairs stay 0.
    attn_mask = paddle.where(attn_mask != 0,
                             paddle.ones_like(attn_mask) * -100.,
                             paddle.zeros_like(attn_mask))
    return attn_mask

def main():
    mask = generate_mask()
    print(mask.shape)
    # Map masked positions to 255 (white) so the mask can be printed and saved as images.
    mask = ((mask.cpu().numpy() != 0) * 255).astype('uint8')
    for i in range(4):
        for j in range(16):
            for k in range(16):
                print(mask[i, j, k], end='\t')
            print()

        im = Image.fromarray(mask[i, :, :])
        im.save(f'{i}.png')
        print()
        print()
        print()

if __name__ == '__main__':
    main()
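
The mask works by labelling every position with a region id, cutting the id map into windows, and comparing ids pairwise: pairs whose ids differ get the non-zero bias and are therefore suppressed after softmax. A standalone sketch, assuming mask_1129.py is importable from the working directory, counts how many pairs each window blocks for the default 8x8 case.

# Count the blocked patch pairs per window for the default configuration.
import paddle
from mask_1129 import generate_mask

mask = generate_mask(window_size=4, shift_size=2, input_resolution=(8, 8))
print(mask.shape)  # [4, 16, 16]: 4 windows, 16 patches per window
for i in range(mask.shape[0]):
    blocked = int((mask[i] != 0).astype('int32').sum())
    print(f'window {i}: {blocked} of 256 patch pairs are masked')
# Expected: 0, 128, 128, 192 - the regular window blocks nothing, the two edge
# windows each mix two regions, and the corner window mixes four.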
--------------------------------------------------------------------------------
/vit.py:
--------------------------------------------------------------------------------
"""
DateTime: 2021.11.24
Written By: Dr. Zhu
Recorded By: Hatimwen
"""
import paddle
import paddle.nn as nn
# from PIL import Image
from paddle.nn.layer.common import Identity

paddle.set_device('cpu')

class MLp(nn.Layer):
    def __init__(self, embed_dim, mlp_ratio=4.0, dropout=0.):
        super(MLp, self).__init__()
        self.fc1 = nn.Linear(embed_dim, int(embed_dim * mlp_ratio))
        self.fc2 = nn.Linear(int(embed_dim * mlp_ratio), embed_dim)
        self.act = nn.GELU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        return x

class Encoder(nn.Layer):
    def __init__(self, embed_dim):
        super(Encoder, self).__init__()
        self.attn = Identity()  # TODO: replace with multi-head self attention (see attention.py)
        self.attn_norm = nn.LayerNorm(embed_dim)
        self.mlp = MLp(embed_dim)
        self.mlp_norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        h = x
        x = self.attn_norm(x)
        x = self.attn(x)
        x = x + h

        h = x
        x = self.mlp_norm(x)
        x = self.mlp(x)
        x = x + h
        return x


class PatchEmbedding(nn.Layer):
    def __init__(self, image_size, patch_size, in_channels, embed_dim, dropout=0.):
        super(PatchEmbedding, self).__init__()
        self.patch_embed = nn.Conv2D(in_channels,
                                     embed_dim,
                                     kernel_size=patch_size,
                                     stride=patch_size,
                                     weight_attr=paddle.ParamAttr(initializer=nn.initializer.Constant(1.0)),
                                     bias_attr=False)
        self.drop_out = nn.Dropout(dropout)

    def forward(self, x):
        x = self.patch_embed(x)     # [n, embed_dim, h', w']
        x = x.flatten(2)            # [n, embed_dim, h' * w']
        x = x.transpose([0, 2, 1])  # [n, h' * w', embed_dim]
        x = self.drop_out(x)
        return x

class ViT(nn.Layer):
    def __init__(self):
        super(ViT, self).__init__()
        self.patch_embed = PatchEmbedding(224, 7, 3, 16)
        layer_list = [Encoder(16) for _ in range(5)]
        self.encoders = nn.LayerList(layer_list)
        self.head = nn.Linear(16, 10)  # 10: num_classes
        self.avgpool = nn.AdaptiveAvgPool1D(1)

    def forward(self, x):
        x = self.patch_embed(x)
        for encoder in self.encoders:
            x = encoder(x)
        # a LayerNorm is usually applied here
        # [n, h' * w', embed_dim]
        x = x.transpose([0, 2, 1])
        x = self.avgpool(x)  # [n, embed_dim, 1]
        x = x.flatten(1)     # [n, embed_dim]
        x = self.head(x)
        return x

def main():
    # random img:
    # img = np.random.randint(0, 255, [28, 28], dtype=np.uint8)
    # sample = paddle.to_tensor(img, dtype='float32')
    # sample = sample.reshape([1, 1, 28, 28])

    # patch_embed = PatchEmbedding(28, 7, 1, 1)
    # out = patch_embed(sample)
    # print(out)
    # print(out.shape)

    # mlp = MLp(1)
    # out = mlp(out)
    # print(out)
    # print(out.shape)

    t = paddle.randn([4, 3, 224, 224])
    vit = ViT()
    out = vit(t)
    print(out)
    print(type(out))
    print(out.shape)

if __name__ == "__main__":
    main()
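
Because the Conv2D in PatchEmbedding above is initialized with constant weights of 1.0 and no bias, each embedding value is simply the sum of the pixels in its patch, which makes intermediate results easy to verify by hand. A standalone sketch, not part of the repository, assuming vit.py is importable from the working directory:

# With all-ones conv weights and no bias, patch embedding reduces to a per-patch pixel sum.
import paddle
from vit import PatchEmbedding

patch_embed = PatchEmbedding(image_size=28, patch_size=7, in_channels=1, embed_dim=1)
img = paddle.ones([1, 1, 28, 28])
out = patch_embed(img)
print(out.shape)            # [1, 16, 1]: a 4x4 grid of patches, embed_dim 1
print(float(out[0, 0, 0]))  # 49.0 == the 7*7 ones summed inside the first patch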
--------------------------------------------------------------------------------
/vit_1126.py:
--------------------------------------------------------------------------------
"""
DateTime: 2021.11.26
Written By: Dr. Zhu
Recorded By: Hatimwen
"""
import paddle
import paddle.nn as nn

paddle.set_device('cpu')

class Identity(nn.Layer):
    def __init__(self):
        super(Identity, self).__init__()

    def forward(self, x):
        return x

class Mlp(nn.Layer):
    def __init__(self, embed_dim, mlp_ratio, dropout=0.):
        super().__init__()
        self.fc1 = nn.Linear(embed_dim, int(embed_dim * mlp_ratio))
        self.fc2 = nn.Linear(int(embed_dim * mlp_ratio), embed_dim)
        self.act = nn.GELU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        return x

class PatchEmbedding(nn.Layer):
    def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=768, dropout=0.):
        super().__init__()
        n_patches = (image_size // patch_size) * (image_size // patch_size)
        self.patch_embedding = nn.Conv2D(in_channels=in_channels,
                                         out_channels=embed_dim,
                                         kernel_size=patch_size,
                                         stride=patch_size)

        self.class_token = paddle.create_parameter(
            shape=[1, 1, embed_dim],
            dtype='float32',
            default_initializer=paddle.nn.initializer.Constant(0.))

        self.position_embedding = paddle.create_parameter(
            shape=[1, n_patches+1, embed_dim],
            dtype='float32',
            default_initializer=nn.initializer.TruncatedNormal(std=.02))

        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: [n, c, h, w]
        class_tokens = self.class_token.expand([x.shape[0], -1, -1])  # one class token per sample
        # class_tokens = self.class_token.expand([x.shape[0], 1, self.embed_dim]) # for batch
        x = self.patch_embedding(x)  # [n, embed_dim, h', w']
        x = x.flatten(2)             # [n, embed_dim, h' * w']
        x = x.transpose([0, 2, 1])   # [n, h' * w', embed_dim]
        x = paddle.concat([class_tokens, x], axis=1)

        x = x + self.position_embedding
        x = self.dropout(x)
        return x

class Attention(nn.Layer):
    """multi-head self attention"""
    def __init__(self, embed_dim, num_heads, qkv_bias=True, dropout=0., attention_dropout=0.):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = int(embed_dim / num_heads)
        self.all_head_dim = self.head_dim * num_heads
        self.scale = self.head_dim ** -0.5

        self.qkv = nn.Linear(embed_dim,
                             self.all_head_dim * 3)

        self.proj = nn.Linear(self.all_head_dim, embed_dim)

        self.dropout = nn.Dropout(dropout)
        self.attention_dropout = nn.Dropout(attention_dropout)
        self.softmax = nn.Softmax(axis=-1)

    def transpose_multi_head(self, x):
        # N: num_patches
        # x: [B, N, all_head_dim]
        new_shape = x.shape[:-1] + [self.num_heads, self.head_dim]
        x = x.reshape(new_shape)
        # x: [B, N, num_heads, head_dim]
        x = x.transpose([0, 2, 1, 3])
        # x: [B, num_heads, N, head_dim]
        return x

    def forward(self, x):
        B, N, _ = x.shape
        qkv = self.qkv(x).chunk(3, -1)
        # [B, N, all_head_dim] * 3
        q, k, v = map(self.transpose_multi_head, qkv)

        # q, k, v: [B, num_heads, N, head_dim]
        attn = paddle.matmul(q, k, transpose_y=True)  # q * k^T
        attn = self.scale * attn
        attn = self.softmax(attn)
        attn = self.attention_dropout(attn)
        # attn: [B, num_heads, N, N]

        out = paddle.matmul(attn, v)  # softmax(scale(q * k^T)) * v
        out = out.transpose([0, 2, 1, 3])
        # out: [B, N, num_heads, head_dim]
        out = out.reshape([B, N, -1])

        out = self.proj(out)
        out = self.dropout(out)
        return out

class EncoderLayer(nn.Layer):
    def __init__(self, embed_dim=768, num_heads=4, qkv_bias=True, mlp_ratio=4.0, dropout=0., attention_dropout=0.):
        super().__init__()
        self.attn_norm = nn.LayerNorm(embed_dim)
        self.attn = Attention(embed_dim, num_heads)
        self.mlp_norm = nn.LayerNorm(embed_dim)
        self.mlp = Mlp(embed_dim, mlp_ratio)

    def forward(self, x):
        h = x  # residual
        x = self.attn_norm(x)
        x = self.attn(x)
        x = x + h

        h = x
        x = self.mlp_norm(x)
        x = self.mlp(x)
        x = x + h
        return x

class Encoder(nn.Layer):
    def __init__(self, embed_dim, depth):
        super().__init__()
        layer_list = []
        for i in range(depth):
            encoder_layer = EncoderLayer(embed_dim)
            layer_list.append(encoder_layer)
        self.layers = nn.LayerList(layer_list)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        x = self.norm(x)
        return x

class VisualTransformer(nn.Layer):
    def __init__(self,
                 image_size=224,
                 patch_size=16,
                 in_channels=3,
                 num_classes=1000,
                 embed_dim=768,
                 depth=3,
                 num_heads=8,
                 mlp_ratio=4,
                 qkv_bias=True,
                 dropout=0.,
                 attention_dropout=0.,
                 droppath=0.):
        super().__init__()
        self.patch_embedding = PatchEmbedding(image_size, patch_size, in_channels, embed_dim)
        self.encoder = Encoder(embed_dim, depth)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # x: [N, C, H, W]
        x = self.patch_embedding(x)  # [N, num_patches + 1, embed_dim]
        # x = x.flatten(2) # [N, embed_dim, h' * w'] h' * w' = num_patches
        # x = x.transpose([0, 2, 1]) # [N, num_patches, embed_dim]
        x = self.encoder(x)
        x = self.classifier(x[:, 0])  # classify from the class token
        return x

def main():
    vit = VisualTransformer()
    print(vit)
    paddle.summary(vit, input_size=(4, 3, 224, 224))


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
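
As a quick usage sketch, not part of the repository and assuming it is run from the repository root so vit_1126.py can be imported: the model turns each 224x224 image into 197 tokens (196 patches plus one class token) and classifies from the class token alone.

# Forward-pass shape check for vit_1126.py.
import paddle
from vit_1126 import PatchEmbedding, VisualTransformer

x = paddle.randn([2, 3, 224, 224])

tokens = PatchEmbedding()(x)  # a standalone embedding layer, only to inspect the token shape
print(tokens.shape)           # [2, 197, 768]: 14*14 patches + 1 class token, each 768-d

model = VisualTransformer(depth=3)
logits = model(x)
print(logits.shape)           # [2, 1000]: the classifier reads only token 0, the class token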