├── .gitignore
├── 0.png
├── 1.png
├── 2.png
├── 3.png
├── README.md
├── README_en.md
├── attention.py
├── deit
│   ├── deit.py
│   ├── transforms.py
│   └── wenht.jpg
├── detr
│   ├── main.py
│   ├── resnet.py
│   └── transformer.py
├── distributed
│   └── main.py
├── iterator
│   └── tmp.py
├── load_config
│   ├── a.yaml
│   ├── config.py
│   └── main.py
├── resnet.py
├── swin_transformer
│   ├── main_1128.py
│   ├── main_1129.py
│   ├── main_1130.py
│   └── mask_1129.py
├── vit.py
└── vit_1126.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *__pycache__/
2 | *.pyc
3 |
--------------------------------------------------------------------------------
/0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/0.png
--------------------------------------------------------------------------------
/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/1.png
--------------------------------------------------------------------------------
/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/2.png
--------------------------------------------------------------------------------
/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/3.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Paddle_VIT_tutorial
2 |
3 | Code recorded from Dr. Zhu's Baidu PaddlePaddle course `从零开始学视觉Transformer` (Learn Vision Transformer from Scratch).
4 | 5 | [English](./README_en.md) | 简体中文 6 | 7 | 课程链接:https://aistudio.baidu.com/aistudio/course/introduce/25102?directly=1&shared=1 8 | 9 | 官方代码链接:https://github.com/BR-IDL/PaddleViT/tree/develop/edu 10 | 11 | 同步上课讲的一些代码,纯手敲,仅供参考,有问题可以一起交流学习。 12 | 13 | 具体时间线及对应代码如下: 14 | 15 | - Class #0, 2021.11.23 16 | 17 | resnet18 实现 [resnet.py](./resnet.py) 18 | 19 | - Class #1, 2021.11.24 20 | 21 | 开始搭建ViT [vit.py](./vit.py) 22 | 23 | - Class #2, 2021.11.25 24 | 25 | Multi-Head Self Attention [attention.py](./attention.py) 26 | 27 | - Class #3, 2021.11.26 28 | 29 | 实现一个ViT模型 [vit_1126.py](./vit_1126.py) 30 | 31 | - Class #4, 2021.11.27 32 | 33 | 实现DeiT [deit/deit.py](./deit/deit.py) 34 | 35 | 图像输入网络前的步骤——图像处理 [deit/transforms.py](./deit/transforms.py) 36 | 37 | - Class #5, 2021.11.28 38 | 39 | 图像窗口上的注意力机制 [swin_transformer/main_1128.py](./swin_transformer/main_1128.py) 40 | 41 | - Class #6, 2021.11.29 42 | 43 | 注意力掩码 Attention Mask [swin_transformer/mask_1129.py](./swin_transformer/mask_1129.py) 44 | 45 | 实现Swin Transformer 的 SwinBlock [swin_transformer/main_1129.py](./swin_transformer/main_1129.py) 46 | 47 | - Class #7, 2021.11.30 48 | 49 | 实现 Swin Transformer [swin_transformer/main_1130.py](./swin_transformer/main_1130.py) 50 | 51 | 数据加载过程——迭代器的实现 [iterator_1130/tmp.py](./iterator_1130/tmp.py) 52 | 53 | - Class #8, 2021.11.31 54 | 55 | [PaddleViT](https://github.com/BR-IDL/PaddleViT) 中配置文件的加载逻辑 [load_config](./load_config/) 56 | 57 | - Class #9, 2021.12.1 58 | 59 | PaddlePaddle 进行多机多卡训练 [distributed/main.py](./distributed/main.py) 60 | 61 | - Class #10, 2021.12.2 62 | 63 | 实现 DETR [detr](./detr/) 64 | 65 | 感谢百度飞桨~加油! 66 | -------------------------------------------------------------------------------- /README_en.md: -------------------------------------------------------------------------------- 1 | # Paddle_VIT_tutorial 2 | 3 | English | [简体中文](./README.md) 4 | 5 | This repo contains some codes recorded from the online course, [Learn Vision Transformer from Scratch](https://aistudio.baidu.com/aistudio/course/introduce/25102?directly=1&shared=1), which was lectured by [Dr. Zhu](https://github.com/xperzy), Baidu PaddlePaddle. 6 | 7 | If you have any questions, please feel free to contact me. 8 | 9 | Official code:https://github.com/BR-IDL/PaddleViT/tree/develop/edu 10 | 11 | Timeline and corresponding codes: 12 | 13 | - Class #0, 2021.11.23 14 | 15 | Implementation of resnet18. [resnet.py](./resnet.py) 16 | 17 | - Class #1, 2021.11.24 18 | 19 | Let's build a ViT! [vit.py](./vit.py) 20 | 21 | - Class #2, 2021.11.25 22 | 23 | Multi-Head Self Attention. [attention.py](./attention.py) 24 | 25 | - Class #3, 2021.11.26 26 | 27 | Implementation of ViT. [vit_1126.py](./vit_1126.py) 28 | 29 | - Class #4, 2021.11.27 30 | 31 | Implementation of DeiT. [deit/deit.py](./deit/deit.py) 32 | 33 | Before feeding to a net: Image Preprocess. [deit/transforms.py](./deit/transforms.py) 34 | 35 | - Class #5, 2021.11.28 36 | 37 | Window Attention. [swin_transformer/main_1128.py](./swin_transformer/main_1128.py) 38 | 39 | - Class #6, 2021.11.29 40 | 41 | Attention Mask. [swin_transformer/mask_1129.py](./swin_transformer/mask_1129.py) 42 | 43 | Implementation of SwinBlock, a block of Swin Transformer. [swin_transformer/main_1129.py](./swin_transformer/main_1129.py) 44 | 45 | - Class #7, 2021.11.30 46 | 47 | Implementation of Swin Transformer. [swin_transformer/main_1130.py](./swin_transformer/main_1130.py) 48 | 49 | Used to load data: Iterator. 
[iterator_1130/tmp.py](./iterator_1130/tmp.py) 50 | 51 | - Class #8, 2021.11.31 52 | 53 | How does [PaddleViT](https://github.com/BR-IDL/PaddleViT) set and load configs? [load_config](./load_config/) 54 | 55 | - Class #9, 2021.12.1 56 | 57 | Distributed training for PaddlePaddle. [distributed/main.py](./distributed/main.py) 58 | 59 | - Class #10, 2021.12.2 60 | 61 | Implementation of DETR. [detr](./detr/) 62 | 63 | Thanks a lot for what Baidu PaddlePaddle have done! Fighting! 64 | -------------------------------------------------------------------------------- /attention.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.25 3 | Written By: Dr. Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | 7 | import paddle as paddle 8 | import paddle.nn as nn 9 | 10 | paddle.set_device('cpu') 11 | 12 | class Attention(nn.Layer): 13 | def __init__(self, embed_dim, num_heads, qkv_bias=False, qk_scale=None, dropout=0., attention_dropout=0.): 14 | super(Attention, self).__init__() 15 | self.embed_dim = embed_dim 16 | self.num_heads = num_heads 17 | self.head_dim = int(self.embed_dim / self.num_heads) 18 | self.all_head_dim = self.head_dim * num_heads 19 | self.qkv = nn.Linear(embed_dim, 20 | self.all_head_dim * 3, 21 | bias_attr=False if qkv_bias is False else None) 22 | self.scale = self.head_dim ** -0.5 if qk_scale is None else qk_scale 23 | self.softmax = nn.Softmax(-1) 24 | self.proj = nn.Linear(self.all_head_dim, self.embed_dim) 25 | 26 | def transpose_multi_head(self, x): 27 | # N: num_patches 28 | # x: [B, N, all_head_dim] 29 | new_shape = x.shape[:-1] + [self.num_heads, self.head_dim] 30 | x = x.reshape(new_shape) 31 | # x: [B, N, num_heads, head_dim] 32 | x = x.transpose([0, 2, 1, 3]) 33 | # x: [B, num_heads, N, head_dim] 34 | return x 35 | 36 | def forward(self, x): 37 | B, N, _ = x.shape 38 | qkv = self.qkv(x).chunk(3, -1) 39 | # [B, N, all_head_dim] * 3 40 | q, k, v = map(self.transpose_multi_head, qkv) 41 | 42 | # q, k, v: [B, num_heads, N, head_dim] 43 | attn = paddle.matmul(q, k, transpose_y=True) # q * k^T 44 | attn = self.scale * attn 45 | attn = self.softmax(attn) 46 | attn_weight = attn 47 | # dropout 48 | # attn :[B, num_heads, N, N] 49 | 50 | out = paddle.matmul(attn, v) # softmax(scale(q * k^T)) * v 51 | out = out.transpose([0, 2, 1, 3]) 52 | # out: [B, N, num_heads, head_dim] 53 | out = out.reshape([B, N, -1]) 54 | 55 | out = self.proj(out) 56 | # dropout 57 | return out, attn_weight 58 | 59 | def main(): 60 | t = paddle.randn([8, 16, 96]) # image tokens 61 | model = Attention(embed_dim=96, num_heads=4, qkv_bias=False, qk_scale=None) 62 | print(model) 63 | out, attn_weight = model(t) 64 | print(out.shape) 65 | print(attn_weight.shape) 66 | 67 | 68 | if __name__ == "__main__": 69 | main() -------------------------------------------------------------------------------- /deit/deit.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.27 3 | Written By: Dr. 
Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | 9 | 10 | paddle.set_device('cpu') 11 | 12 | class Identity(nn.Layer): 13 | def __init__(self): 14 | super(Identity, self).__init__() 15 | 16 | def forward(self, x): 17 | return x 18 | 19 | class Mlp(nn.Layer): 20 | def __init__(self, embed_dim, mlp_ratio, dropout=0.): 21 | super().__init__() 22 | self.fc1 = nn.Linear(embed_dim, int(embed_dim * mlp_ratio)) 23 | self.fc2 = nn.Linear(int(embed_dim * mlp_ratio), embed_dim) 24 | self.act = nn.GELU() 25 | self.dropout = nn.Dropout(dropout) 26 | 27 | def forward(self, x): 28 | x = self.fc1(x) 29 | x = self.act(x) 30 | x = self.dropout(x) 31 | x = self.fc2(x) 32 | x = self.dropout(x) 33 | return x 34 | 35 | class PatchEmbedding(nn.Layer): 36 | def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=768, dropout=0.): 37 | super().__init__() 38 | n_patches = (image_size // patch_size) * (image_size // patch_size) 39 | self.patch_embedding = nn.Conv2D(in_channels=in_channels, 40 | out_channels=embed_dim, 41 | kernel_size=patch_size, 42 | stride=patch_size) 43 | 44 | self.class_token = paddle.create_parameter( 45 | shape=[1, 1, embed_dim], 46 | dtype='float32', 47 | default_initializer=paddle.nn.initializer.Constant(0.)) 48 | 49 | self.distill_token = paddle.create_parameter( 50 | shape=[1, 1, embed_dim], 51 | dtype='float32', 52 | default_initializer=nn.initializer.TruncatedNormal(std=.02)) 53 | 54 | self.position_embedding = paddle.create_parameter( 55 | shape=[1, n_patches+2, embed_dim], # +2 56 | dtype='float32', 57 | default_initializer=nn.initializer.TruncatedNormal(std=.02)) 58 | 59 | self.dropout = nn.Dropout(dropout) 60 | 61 | def forward(self, x): 62 | # [n, c, h, w] 63 | class_tokens = self.class_token.expand([x.shape[0], -1, -1]) 64 | distill_tokens = self.distill_token.expand([x.shape[0], -1, -1]) 65 | x = self.patch_embedding(x) #[n, embed_dim, h', w'] 66 | x = x.flatten(2) # [n, embed_dim, h' * w'] 67 | x = x.transpose([0, 2, 1]) # [n, h' * w, embed_dim] 68 | 69 | 70 | x = paddle.concat([class_tokens, distill_tokens, x], axis=1) 71 | 72 | x = x + self.position_embedding 73 | x = self.dropout(x) 74 | return x 75 | 76 | 77 | class Attention(nn.Layer): 78 | """multi-head self attention""" 79 | def __init__(self, embed_dim, num_heads, qkv_bias=True, dropout=0., attention_dropout=0.): 80 | super().__init__() 81 | self.num_heads = num_heads 82 | self.head_dim = int(embed_dim / num_heads) 83 | self.all_head_dim = self.head_dim * num_heads 84 | self.scale = self.head_dim ** -0.5 85 | 86 | self.qkv = nn.Linear(embed_dim, 87 | self.all_head_dim * 3) 88 | 89 | self.proj = nn.Linear(self.all_head_dim, embed_dim) 90 | 91 | self.dropout = nn.Dropout(dropout) 92 | self.attention_dropout = nn.Dropout(attention_dropout) 93 | self.softmax = nn.Softmax(axis=-1) 94 | 95 | def transpose_multi_head(self, x): 96 | # N: num_patches 97 | # x: [B, N, all_head_dim] 98 | new_shape = x.shape[:-1] + [self.num_heads, self.head_dim] 99 | x = x.reshape(new_shape) 100 | # x: [B, N, num_heads, head_dim] 101 | x = x.transpose([0, 2, 1, 3]) 102 | # x: [B, num_heads, N, head_dim] 103 | return x 104 | 105 | def forward(self, x): 106 | B, N, _ = x.shape 107 | qkv = self.qkv(x).chunk(3, -1) 108 | # [B, N, all_head_dim] * 3 109 | q, k, v = map(self.transpose_multi_head, qkv) 110 | 111 | # q, k, v: [B, num_heads, N, head_dim] 112 | attn = paddle.matmul(q, k, transpose_y=True) # q * k^T 113 | attn = self.scale * attn 114 | attn = self.softmax(attn) 115 | attn = 
self.attention_dropout(attn) 116 | # attn :[B, num_heads, N, N] 117 | 118 | out = paddle.matmul(attn, v) # softmax(scale(q * k^T)) * v 119 | out = out.transpose([0, 2, 1, 3]) 120 | # out: [B, N, num_heads, head_dim] 121 | out = out.reshape([B, N, -1]) 122 | 123 | out = self.proj(out) 124 | out = self.dropout(out) 125 | return out 126 | 127 | class EncoderLayer(nn.Layer): 128 | def __init__(self, embed_dim=768, num_heads=4, qkv_bias=True, mlp_ratio=40, dropout=0., attention_dropout=0.): 129 | super().__init__() 130 | self.attn_norm = nn.LayerNorm(embed_dim) 131 | self.attn = Attention(embed_dim, num_heads) 132 | self.mlp_norm = nn.LayerNorm(embed_dim) 133 | self.mlp = Mlp(embed_dim, mlp_ratio) 134 | 135 | def forward(self, x): 136 | h = x # residual 137 | x = self.attn_norm(x) 138 | x = self.attn(x) 139 | x = x + h 140 | 141 | h = x 142 | x = self.mlp_norm(x) 143 | x = self.mlp(x) 144 | x = x + h 145 | return x 146 | 147 | class Encoder(nn.Layer): 148 | def __init__(self, embed_dim, depth): 149 | super().__init__() 150 | layer_list = [] 151 | for _ in range(depth): 152 | encoder_layer = EncoderLayer() 153 | layer_list.append(encoder_layer) 154 | self.layers = nn.LayerList(layer_list) 155 | self.norm = nn.LayerNorm(embed_dim) 156 | 157 | def forward(self, x): 158 | for layer in self.layers: 159 | x = layer(x) 160 | x = self.norm(x) 161 | 162 | return x[:, 0], x[:, 1] 163 | 164 | 165 | class Deit(nn.Layer): 166 | def __init__(self, 167 | image_size=224, 168 | patch_size=16, 169 | in_channels=3, 170 | num_classes=1000, 171 | embed_dim=768, 172 | depth=3, 173 | num_heads=8, 174 | mlp_ratio=4, 175 | qkv_bias=True, 176 | dropout=0., 177 | attention_dropout=0., 178 | droppath=0.): 179 | super().__init__() 180 | self.patch_embedding = PatchEmbedding(224, 16, 3, 768) 181 | self.encoder = Encoder(embed_dim, depth) 182 | self.head = nn.Linear(embed_dim, num_classes) 183 | self.head_distill = nn.Linear(embed_dim, num_classes) 184 | 185 | def forward(self, x): 186 | x = self.patch_embedding(x) 187 | x, x_distill = self.encoder(x) 188 | x = self.head(x) 189 | x_distill = self.head_distill(x_distill) 190 | if self.training: 191 | return x, x_distill 192 | else: 193 | return (x + x_distill) / 2 194 | 195 | def main(): 196 | model = Deit() 197 | print(model) 198 | paddle.summary(model, (4, 3, 224, 224)) 199 | 200 | 201 | if __name__ == '__main__': 202 | main() -------------------------------------------------------------------------------- /deit/transforms.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.27 3 | Written By: Dr. 
Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import numpy as np 7 | from PIL import Image 8 | import paddle 9 | import paddle.vision.transforms as T 10 | paddle.set_device('cpu') 11 | 12 | def crop(img, region): 13 | cropped_img = T.crop(img, *region) 14 | return cropped_img 15 | 16 | class CenterCrop(): 17 | def __init__(self, size): 18 | self.size = size 19 | def __call__(self, img): 20 | w, h = img.size 21 | cw, ch = self.size 22 | crop_top = int(round((h - ch) / 2.)) 23 | crop_left = int(round((w - cw) / 2.)) 24 | return crop(img, (crop_top, crop_left, ch, cw)) 25 | 26 | class Resize(): 27 | def __init__(self, size): 28 | self.size = size 29 | def __call__(self, img): 30 | return T.resize(img, self.size) 31 | 32 | class ToTensor(): 33 | def __init__(self): 34 | pass 35 | def __call__(self, img): 36 | w, h = img.size 37 | img = paddle.to_tensor(np.array(img)) 38 | if img.dtype == paddle.uint8: 39 | img = paddle.cast(img, paddle.float32) / 255. 40 | # img = img.transpose([2, 0, 1]) 41 | return img 42 | 43 | class Compose(): 44 | def __init__(self, transforms): 45 | self.transforms = transforms 46 | 47 | def __call__(self, image): 48 | for t in self.transforms: 49 | image = t(image) 50 | return image 51 | 52 | def main(): 53 | img = Image.open('deit_1127/wenht.jpg') 54 | img = img.convert('L') 55 | transforms = Compose([Resize([256, 256]), 56 | CenterCrop([112, 112]), 57 | ToTensor()]) 58 | out = transforms(img) 59 | print(out) 60 | print(out.shape) 61 | 62 | if __name__ == '__main__': 63 | main() -------------------------------------------------------------------------------- /deit/wenht.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hatimwen/Paddle_VIT_tutorial/b1ab0d408f5dad4f8f6690bde4f91fffc5b9f118/deit/wenht.jpg -------------------------------------------------------------------------------- /detr/main.py: -------------------------------------------------------------------------------- 1 | import paddle 2 | import paddle.nn as nn 3 | import paddle.nn.functional as F 4 | 5 | import sys 6 | sys.path.append('./') 7 | from detr.resnet import ResNet18 8 | from detr.transformer import Transformer 9 | sys.path.pop() 10 | 11 | paddle.set_device('cpu') 12 | 13 | 14 | class PositionEmbedding(nn.Layer): 15 | def __init__(self, embed_dim): 16 | super().__init__() 17 | self.row_embed = nn.Embedding(50, embed_dim) 18 | self.col_embed = nn.Embedding(50, embed_dim) 19 | 20 | def forward(self, x): 21 | # x: [b, feat, H, W] 22 | h, w = x.shape[-2:] 23 | i = paddle.arange(w) 24 | j = paddle.arange(h) 25 | x_embed = self.col_embed(i) 26 | y_embed = self.row_embed(i) 27 | pos = paddle.concat([x_embed.unsqueeze(0).expand((h, x_embed.shape[0], x_embed.shape[1])), 28 | y_embed.unsqueeze(1).expand((y_embed.shape[0], w, y_embed.shape[1]))], axis=-1) 29 | pos = pos.transpose([2, 0, 1]) 30 | pos = pos.unsqueeze(0) 31 | pos = pos.expand([x.shape[0]] + pos.shape[1::]) #[batch_size, embed_dim, h, w] 32 | return pos 33 | 34 | 35 | class BboxEmbed(nn.Layer): 36 | def __init__(self, in_dim, hidden_dim, out_dim): 37 | super().__init__() 38 | self.fc1 = nn.Linear(in_dim, hidden_dim) 39 | self.fc2 = nn.Linear(hidden_dim, hidden_dim) 40 | self.fc3 = nn.Linear(hidden_dim, out_dim) 41 | self.act = nn.ReLU() 42 | 43 | def forward(self, x): 44 | x = self.fc1(x) 45 | x = self.act(x) 46 | x = self.fc2(x) 47 | x = self.act(x) 48 | x = self.fc3(x) 49 | return x 50 | 51 | 52 | class DETR(nn.Layer): 53 | def __init__(self, backbone, pos_embed, transformer, 
num_classes, num_queries): 54 | super().__init__() 55 | self.num_queries = num_queries 56 | self.transformer = transformer 57 | embed_dim = transformer.embed_dim 58 | 59 | self.class_embed = nn.Linear(embed_dim, num_classes + 1) 60 | self.bbox_embed = BboxEmbed(embed_dim, embed_dim, 4) 61 | self.query_embed = nn.Embedding(num_queries, embed_dim) 62 | 63 | self.input_proj = nn.Conv2D(backbone.num_channels, embed_dim, kernel_size=1) 64 | self.backbone = backbone 65 | self.pos_embed = pos_embed 66 | 67 | def forward(self, x): 68 | print(f'----- INPUT: {x.shape}') 69 | feat = self.backbone(x) 70 | print(f'----- Feature after ResNet18: {feat.shape}') 71 | pos_embed = self.pos_embed(feat) 72 | print(f'----- Positional Embedding: {pos_embed.shape}') 73 | 74 | feat = self.input_proj(feat) 75 | print(f'----- Feature after input_proj: {feat.shape}') 76 | out, _ = self.transformer(feat, self.query_embed.weight, pos_embed) 77 | print(f'----- out after transformer: {out.shape}') 78 | 79 | out_class = self.class_embed(out) 80 | out_coord = self.bbox_embed(out) 81 | print(f'----- out for class: {out_class.shape}') 82 | print(f'----- out for bbox: {out_coord.shape}') 83 | #out_coord = F.sigmoid(out_coord) 84 | 85 | return out_class, out_coord 86 | 87 | 88 | def build_detr(): 89 | backbone = ResNet18() 90 | transformer = Transformer() 91 | pos_embed = PositionEmbedding(16) 92 | detr = DETR(backbone, pos_embed, transformer, 10, 100) 93 | return detr 94 | 95 | 96 | def main(): 97 | t = paddle.randn([3, 3, 224, 224]) 98 | model = build_detr() 99 | out = model(t) 100 | print(out[0].shape, out[1].shape) 101 | 102 | 103 | 104 | if __name__ == "__main__": 105 | main() 106 | -------------------------------------------------------------------------------- /detr/resnet.py: -------------------------------------------------------------------------------- 1 | import paddle 2 | import paddle.nn as nn 3 | 4 | paddle.set_device('cpu') 5 | 6 | class Identity(nn.Layer): 7 | def __init__(self): 8 | super().__init__() 9 | 10 | def forward(self, x): 11 | return x 12 | 13 | class Block(nn.Layer): 14 | def __init__(self, in_dim, out_dim, stride): 15 | super().__init__() 16 | self.conv1 = nn.Conv2D(in_dim, out_dim, 3, stride=stride, padding=1, bias_attr=False) 17 | self.bn1 = nn.BatchNorm2D(out_dim) 18 | self.conv2 = nn.Conv2D(out_dim, out_dim, 3, stride=1, padding=1, bias_attr=False) 19 | self.bn2 = nn.BatchNorm2D(out_dim) 20 | self.relu = nn.ReLU() 21 | 22 | if stride == 2 or in_dim != out_dim: 23 | self.downsample = nn.Sequential(*[ 24 | nn.Conv2D(in_dim, out_dim, 1, stride=stride), 25 | nn.BatchNorm2D(out_dim)]) 26 | else: 27 | self.downsample = Identity() 28 | 29 | def forward(self, x): 30 | h = x 31 | x = self.conv1(x) 32 | x = self.bn1(x) 33 | x = self.relu(x) 34 | x = self.conv2(x) 35 | x = self.bn2(x) 36 | identity = self.downsample(h) 37 | x = x + identity 38 | x = self.relu(x) 39 | return x 40 | 41 | class ResNet18(nn.Layer): 42 | def __init__(self, in_dim=64, num_classes=10): 43 | super().__init__() 44 | self.num_channels = 512 45 | self.in_dim = in_dim 46 | # stem layers 47 | self.conv1 = nn.Conv2D(in_channels=3, 48 | out_channels=in_dim, 49 | kernel_size=3, 50 | stride=1, 51 | padding=1, 52 | bias_attr=False) 53 | self.bn1 = nn.BatchNorm2D(in_dim) 54 | self.relu = nn.ReLU() 55 | #blocks 56 | self.layer1 = self._make_layer(dim=64, n_blocks=2, stride=1) 57 | self.layer2 = self._make_layer(dim=128, n_blocks=2, stride=2) 58 | self.layer3 = self._make_layer(dim=256, n_blocks=2, stride=2) 59 | self.layer4 = 
self._make_layer(dim=512, n_blocks=2, stride=2) 60 | # head layer 61 | self.avgpool = nn.AdaptiveAvgPool2D(1) 62 | self.classifier = nn.Linear(512, num_classes) 63 | 64 | def _make_layer(self, dim, n_blocks, stride): 65 | layer_list = [] 66 | layer_list.append(Block(self.in_dim, dim, stride=stride)) 67 | self.in_dim = dim 68 | for i in range(1, n_blocks): 69 | layer_list.append(Block(self.in_dim, dim, stride=1)) 70 | return nn.Sequential(*layer_list) 71 | 72 | 73 | # CLASS 10: Modify the forward, remove the head and classifier 74 | def forward(self, x): 75 | x = self.conv1(x) 76 | x = self.bn1(x) 77 | x = self.relu(x) 78 | x = self.layer1(x) 79 | x = self.layer2(x) 80 | x = self.layer3(x) 81 | x = self.layer4(x) 82 | return x 83 | 84 | #def forward(self, x): 85 | # x = self.conv1(x) 86 | # x = self.bn1(x) 87 | # x = self.relu(x) 88 | # x = self.layer1(x) 89 | # x = self.layer2(x) 90 | # x = self.layer3(x) 91 | # x = self.layer4(x) 92 | # x = self.forward_feature(x) 93 | # x = self.avgpool(x) 94 | # x = x.flatten(1) 95 | # x = self.classifier(x) 96 | # return x 97 | 98 | def main(): 99 | t = paddle.randn([4, 3, 224, 224]) 100 | model = ResNet18() 101 | print(model) 102 | out = model(t) 103 | print(out.shape) 104 | 105 | if __name__ == "__main__": 106 | main() -------------------------------------------------------------------------------- /detr/transformer.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import paddle 3 | import paddle.nn as nn 4 | import paddle.nn.functional as F 5 | 6 | paddle.set_device('cpu') 7 | 8 | class Identity(nn.Layer): 9 | def __init__(self): 10 | super().__init__() 11 | 12 | def forward(self, x): 13 | return x 14 | 15 | 16 | class Mlp(nn.Layer): 17 | def __init__(self, embed_dim, mlp_ratio, dropout=0.): 18 | super().__init__() 19 | self.fc1 = nn.Linear(embed_dim, int(embed_dim * mlp_ratio)) 20 | self.fc2 = nn.Linear(int(embed_dim * mlp_ratio), embed_dim) 21 | self.act = nn.GELU() 22 | self.dropout = nn.Dropout(dropout) 23 | 24 | def forward(self, x): 25 | x = self.fc1(x) 26 | x = self.act(x) 27 | x = self.dropout(x) 28 | x = self.fc2(x) 29 | x = self.dropout(x) 30 | return x 31 | 32 | 33 | class Attention(nn.Layer): 34 | """multi-head self attention""" 35 | def __init__(self, embed_dim, num_heads, qkv_bias=True, dropout=0., attention_dropout=0.): 36 | super().__init__() 37 | self.num_heads = num_heads 38 | self.head_dim = int(embed_dim / num_heads) 39 | self.all_head_dim = self.head_dim * num_heads 40 | self.scales = self.head_dim ** -0.5 41 | 42 | 43 | # CLASS 10: support decoder 44 | self.q = nn.Linear(embed_dim, 45 | self.all_head_dim) 46 | self.k = nn.Linear(embed_dim, 47 | self.all_head_dim) 48 | self.v = nn.Linear(embed_dim, 49 | self.all_head_dim) 50 | 51 | 52 | self.proj = nn.Linear(self.all_head_dim, embed_dim) 53 | self.dropout = nn.Dropout(dropout) 54 | self.attention_dropout = nn.Dropout(attention_dropout) 55 | self.softmax = nn.Softmax(axis=-1) 56 | 57 | def transpose_multihead(self, x): 58 | # x: [seq_l, batch, all_head_dim] -> [seq_l, batch, n_head, head_dim] 59 | new_shape = x.shape[:-1] + [self.num_heads, self.head_dim] 60 | x = x.reshape(new_shape) 61 | x = x.flatten(1, 2) # merge batch and n_head: [seq_l, batch*n_head, head_dim] 62 | x = x.transpose([1, 0, 2]) #[batch * n_head, seq_l, head_dim] 63 | return x 64 | 65 | def forward(self, query, key, value): 66 | lk = key.shape[0] # when enc-dec: num_patches (sequence len, token len) 67 | b = key.shape[1] # when enc-dec: batch_size 68 | lq = 
query.shape[0] # when enc-dec: num_queries 69 | d = query.shape[2] # when enc-dec: embed_dim 70 | 71 | q = self.q(query) 72 | k = self.k(key) 73 | v = self.v(value) 74 | q, k, v = map(self.transpose_multihead, [q, k, v]) 75 | 76 | print(f'----- ----- ----- ----- [Attn] batch={key.shape[1]}, n_head={self.num_heads}, head_dim={self.head_dim}') 77 | print(f'----- ----- ----- ----- [Attn] q: {q.shape}, k: {k.shape}, v:{v.shape}') 78 | attn = paddle.matmul(q, k, transpose_y=True) # q * k' 79 | attn = attn * self.scales 80 | attn = self.softmax(attn) 81 | attn = self.attention_dropout(attn) 82 | print(f'----- ----- ----- ----- [Attn] attn: {attn.shape}') 83 | 84 | out = paddle.matmul(attn, v) 85 | out = out.transpose([1, 0, 2]) 86 | out = out.reshape([lq, b, d]) 87 | 88 | out = self.proj(out) 89 | out = self.dropout(out) 90 | 91 | return out 92 | 93 | 94 | class EncoderLayer(nn.Layer): 95 | def __init__(self, embed_dim=768, num_heads=4, mlp_ratio=4.0): 96 | super().__init__() 97 | self.attn_norm = nn.LayerNorm(embed_dim) 98 | self.attn = Attention(embed_dim, num_heads) 99 | self.mlp_norm = nn.LayerNorm(embed_dim) 100 | self.mlp = Mlp(embed_dim, mlp_ratio) 101 | 102 | def forward(self, x, pos=None): 103 | 104 | h = x 105 | x = self.attn_norm(x) 106 | q = x + pos if pos is not None else x 107 | k = x + pos if pos is not None else x 108 | print(f'----- ----- ----- encoder q: {q.shape}, k: {k.shape}, v:{x.shape}') 109 | x = self.attn(q, k, x) 110 | x = x + h 111 | 112 | h = x 113 | x = self.mlp_norm(x) 114 | x = self.mlp(x) 115 | x = x + h 116 | print(f'----- ----- ----- encoder out: {x.shape}') 117 | return x 118 | 119 | 120 | class DecoderLayer(nn.Layer): 121 | def __init__(self, embed_dim=768, num_heads=4, mlp_ratio=4.0): 122 | super().__init__() 123 | self.attn_norm = nn.LayerNorm(embed_dim) 124 | self.attn = Attention(embed_dim, num_heads) 125 | self.enc_dec_attn_norm = nn.LayerNorm(embed_dim) 126 | self.enc_dec_attn = Attention(embed_dim, num_heads) 127 | self.mlp_norm = nn.LayerNorm(embed_dim) 128 | self.mlp = Mlp(embed_dim, mlp_ratio) 129 | 130 | def forward(self, x, enc_out, pos=None, query_pos=None): 131 | 132 | h = x 133 | x = self.attn_norm(x) 134 | q = x + query_pos if pos is not None else x 135 | k = x + query_pos if pos is not None else x 136 | print(f'----- ----- ----- decoder(self-attn) q: {q.shape}, k: {k.shape}, v:{x.shape}') 137 | x = self.attn(q, k, x) 138 | x = x + h 139 | 140 | h = x 141 | x = self.enc_dec_attn_norm(x) 142 | q = x + query_pos if pos is not None else x 143 | k = enc_out + pos if pos is not None else x 144 | v = enc_out 145 | print(f'----- ----- ----- decoder(enc-dec attn) q: {q.shape}, k: {k.shape}, v:{v.shape}') 146 | x = self.attn(q, k, v) 147 | x = x + h 148 | 149 | h = x 150 | x = self.mlp_norm(x) 151 | x = self.mlp(x) 152 | x = x + h 153 | print(f'----- ----- ----- decoder out: {x.shape}') 154 | return x 155 | 156 | 157 | class Transformer(nn.Layer): 158 | def __init__(self, embed_dim=32, num_heads=4, num_encoders=2, num_decoders=2): 159 | super().__init__() 160 | self.embed_dim = embed_dim 161 | self.encoder = nn.LayerList([EncoderLayer(embed_dim, num_heads) for i in range(num_encoders)]) 162 | self.decoder = nn.LayerList([DecoderLayer(embed_dim, num_heads) for i in range(num_decoders)]) 163 | self.encoder_norm = nn.LayerNorm(embed_dim) 164 | self.decoder_norm = nn.LayerNorm(embed_dim) 165 | 166 | def forward(self, x, query_embed, pos_embed): 167 | B, C, H, W = x.shape 168 | print(f'----- ----- Transformer INPUT: {x.shape}') 169 | x = x.flatten(2) #[B, C, 
H*W] 170 | x = x.transpose([2, 0, 1]) # [H*W, B, C] 171 | print(f'----- ----- Transformer INPUT(after reshape): {x.shape}') 172 | 173 | # [B, dim, H, W] 174 | pos_embed = pos_embed.flatten(2) 175 | pos_embed = pos_embed.transpose([2, 0, 1]) #[H*W, B, dim] 176 | print(f'----- ----- pos_embed(after reshape): {pos_embed.shape}') 177 | 178 | # [num_queries, dim] 179 | query_embed = query_embed.unsqueeze(1) 180 | query_embed = query_embed.expand((query_embed.shape[0], B, query_embed.shape[2])) 181 | print(f'----- ----- query_embed(after reshape): {query_embed.shape}') 182 | 183 | target = paddle.zeros_like(query_embed) 184 | print(f'----- ----- target (now all zeros): {target.shape}') 185 | 186 | for encoder_layer in self.encoder: 187 | encoder_out = encoder_layer(x, pos_embed) 188 | encoder_out = self.encoder_norm(encoder_out) 189 | print(f'----- ----- encoder out: {encoder_out.shape}') 190 | 191 | for decoder_layer in self.decoder: 192 | decoder_out = decoder_layer(target, 193 | encoder_out, 194 | pos_embed, 195 | query_embed) 196 | decoder_out = self.decoder_norm(decoder_out) 197 | decoder_out = decoder_out.unsqueeze(0) 198 | print(f'----- ----- decoder out: {decoder_out.shape}') 199 | 200 | 201 | decoder_out = decoder_out.transpose([0, 2, 1, 3]) #[1, B, num_queries, embed_dim] 202 | encoder_out = encoder_out.transpose([1, 2, 0]) 203 | encoder_out = encoder_out.reshape([B, C, H, W]) 204 | print(f'----- ----- decoder out(after reshape): {decoder_out.shape}') 205 | 206 | return decoder_out, encoder_out 207 | 208 | 209 | def main(): 210 | trans = Transformer() 211 | print(trans) 212 | 213 | 214 | if __name__ == "__main__": 215 | main() -------------------------------------------------------------------------------- /distributed/main.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import paddle 3 | import paddle.nn as nn 4 | import paddle.distributed as dist 5 | from paddle.io import Dataset 6 | from paddle.io import DataLoader 7 | from paddle.io import DistributedBatchSampler 8 | 9 | class MyDataset(Dataset): 10 | def __init__(self): 11 | super().__init__() 12 | self.data = np.arange(32).astype('float32')[:, np.newaxis] 13 | 14 | def __getitem__(self, idx): 15 | return paddle.to_tensor(self.data[idx]), paddle.to_tensor(self.data[idx]) 16 | 17 | def __len__(self): 18 | return len(self.data) 19 | 20 | def get_dataset(): 21 | dataset = MyDataset() 22 | return dataset 23 | 24 | def get_dataloader(dataset, batch_size): 25 | sample = DistributedBatchSampler(dataset, batch_size=batch_size, shuffle=False) 26 | dataloader = DataLoader(dataset, batch_sampler=sample) 27 | return dataloader 28 | 29 | def build_model(): 30 | model = nn.Sequential(*[ 31 | nn.Linear(1, 8), 32 | nn.ReLU(), 33 | nn.Linear(8, 10) 34 | ]) 35 | return model 36 | 37 | def main_worker(*args): 38 | dataset = args[0] 39 | dist.init_parallel_env() 40 | world_size = dist.get_world_size() 41 | local_rank = dist.get_rank() 42 | 43 | dataloader = get_dataloader(dataset, batch_size=1) 44 | 45 | model = build_model() 46 | model = paddle.DataParallel(model) 47 | print(f'Hello PPViT, I am {local_rank}: I built a model for myself.') 48 | 49 | tensor_list = [] 50 | for data in dataloader: 51 | sample = data[0] 52 | label = data[1] 53 | 54 | out = model(sample) 55 | out = out.argmax(1) 56 | print(f'{local_rank} I got data:{sample.cpu().numpy()}, I have out: {out.cpu().numpy()}') 57 | 58 | dist.all_gather(tensor_list, out) 59 | if local_rank == 0: 60 | print(f'I am {local_rank}: I got 
all_gathered out: {tensor_list}') 61 | break 62 | 63 | def main(): 64 | dataset = get_dataset() 65 | dist.spawn(main_worker, args=(dataset,), nprocs=1) 66 | 67 | if __name__ == '__main__': 68 | main() -------------------------------------------------------------------------------- /iterator/tmp.py: -------------------------------------------------------------------------------- 1 | 2 | class MyIterable(): 3 | def __init__(self): 4 | self.data = [1, 2, 3, 4, 5] 5 | def __iter__(self): 6 | return MyIterator(self.data) 7 | 8 | def __getitem__(self, idx): 9 | return self.data[idx] 10 | 11 | class MyIterator(): 12 | def __init__(self, data): 13 | self.data = data 14 | self.counter = 0 15 | 16 | def __iter__(self): 17 | return self 18 | 19 | def __next__(self): 20 | if self.counter >= len(self.data): 21 | raise StopIteration() 22 | data = self.data[self.counter] 23 | self.counter += 1 24 | return data 25 | 26 | my_iterable = MyIterable() 27 | for d in my_iterable: 28 | print(d) 29 | print(my_iterable[1]) -------------------------------------------------------------------------------- /load_config/a.yaml: -------------------------------------------------------------------------------- 1 | DATA: 2 | BATCH_SIZE: 512 3 | MODEL: 4 | TRANS: 5 | EMBED_DIM: 768 -------------------------------------------------------------------------------- /load_config/config.py: -------------------------------------------------------------------------------- 1 | from yacs.config import CfgNode as CN 2 | import yaml 3 | 4 | _C = CN() 5 | _C.DATA = CN() 6 | _C.DATA.DATASET = 'cifar10' 7 | _C.DATA.BATCH_SIZE = 128 8 | 9 | _C.MODEL = CN() 10 | _C.MODEL.NUM_CLASSES = 10 11 | 12 | _C.MODEL.TRANS = CN() 13 | _C.MODEL.TRANS.EMBED_DIM = 96 14 | _C.MODEL.TRANS.DEPTHS = [2, 2, 6, 2] 15 | _C.MODEL.TRANS.QKV_BIAS = False 16 | 17 | def _update_config_from_file(config, cfg_file): 18 | config.defrost() 19 | config.merge_from_file(cfg_file) # yaml 20 | 21 | def update_config(config, args): 22 | if args.cfg: 23 | _update_config_from_file(config, args.cfg) 24 | if args.dataset: 25 | config.DATA.DATASET = args.dataset 26 | if args.batch_size: 27 | config.DATA.BATCH_SIZE = args.batch_size 28 | return config 29 | 30 | def get_config(cfg_file=None): 31 | config = _C.clone() 32 | if cfg_file: 33 | _update_config_from_file(config, cfg_file) 34 | return config 35 | 36 | 37 | def main(): 38 | cfg = get_config("load_config/a.yaml") 39 | print(cfg) 40 | 41 | if __name__ == "__main__": 42 | main() -------------------------------------------------------------------------------- /load_config/main.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from config import get_config, update_config 3 | 4 | def get_arguments(): 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument('-cfg', type=str, default=None, help='config file') 7 | parser.add_argument('-batch_size', type=int, default=1024, help='batch size') 8 | parser.add_argument('-dataset', type=str, default='imagenet', help='dataset') 9 | return parser.parse_args() 10 | 11 | 12 | 13 | def main(): 14 | cfg = get_config() 15 | print(cfg) 16 | print('-----------------') 17 | 18 | 19 | cfg = get_config("load_config/a.yaml") 20 | print(cfg) 21 | print('-----------------') 22 | 23 | args = get_arguments() 24 | cfg = update_config(cfg, args) 25 | print(cfg) 26 | print('-----------------') 27 | 28 | 29 | if __name__ == "__main__": 30 | main() -------------------------------------------------------------------------------- /resnet.py: 
-------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.23 3 | Written By: Dr. Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | 9 | #paddle.set_device('cpu') 10 | 11 | class Identity(nn.Layer): 12 | def __init__(self): 13 | super().__init__() 14 | 15 | def forward(self, x): 16 | return x 17 | 18 | 19 | class Block(nn.Layer): 20 | def __init__(self, in_dim, out_dim, stride=1): 21 | super().__init__() 22 | ## 补充代码 23 | self.conv1 = nn.Conv2D(in_dim, out_dim, 3, stride=stride, padding=1, bias_attr=False) 24 | self.bn1 = nn.BatchNorm(out_dim) 25 | self.conv2 = nn.Conv2D(out_dim, out_dim, 3, stride=1, padding=1, bias_attr=False) 26 | self.bn2 = nn.BatchNorm(out_dim) 27 | self.relu = nn.ReLU() 28 | 29 | if stride == 2 or in_dim != out_dim: 30 | self.downsample = nn.Sequential(*[ 31 | nn.Conv2D(in_dim, out_dim, 1, stride=stride), 32 | nn.BatchNorm(out_dim) 33 | ]) 34 | else: 35 | self.downsample = Identity() 36 | 37 | def forward(self, x): 38 | ## 补充代码 39 | h = x 40 | x = self.conv1(x) 41 | x = self.bn1(x) 42 | x = self.relu(x) 43 | x = self.conv2(x) 44 | x = self.bn2(x) 45 | identity = self.downsample(h) 46 | x = x + identity 47 | x = self.relu(x) 48 | return x 49 | 50 | 51 | class ResNet18(nn.Layer): 52 | def __init__(self, in_dim=64, num_classes=1000): 53 | super().__init__() 54 | ## 补充代码 55 | self.in_dim = in_dim 56 | self.conv1 = nn.Conv2D(in_channels=3, out_channels=in_dim, kernel_size=3, stride=1, padding=1, bias_attr=False) 57 | self.bn1 = nn.BatchNorm(in_dim) 58 | self.relu = nn.ReLU() 59 | # self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1) 60 | 61 | self.layer1 = self._make_layer(64, 2) 62 | self.layer2 = self._make_layer(128, 2, 2) 63 | self.layer3 = self._make_layer(256, 2, 2) 64 | self.layer4 = self._make_layer(512, 2, 2) 65 | 66 | self.avgpool = nn.AdaptiveAvgPool2D(1) 67 | self.fc = nn.Linear(512, num_classes) 68 | 69 | def _make_layer(self, out_dim, n_blocks, stride=1): 70 | ## 补充代码 71 | layer_list = [] 72 | layer_list.append(Block(self.in_dim, out_dim, stride)) 73 | self.in_dim = out_dim 74 | for _ in range(1, n_blocks): 75 | layer_list.append(Block(self.in_dim, out_dim)) 76 | return nn.Sequential(*layer_list) 77 | 78 | def forward(self, x): 79 | ## 补充代码 80 | x = self.conv1(x) 81 | x = self.bn1(x) 82 | x = self.relu(x) 83 | # x = self.maxpool(x) 84 | 85 | x = self.layer1(x) 86 | x = self.layer2(x) 87 | x = self.layer3(x) 88 | x = self.layer4(x) 89 | 90 | x = self.avgpool(x) 91 | x = x.flatten(1) 92 | x = self.fc(x) 93 | 94 | return x 95 | 96 | 97 | 98 | def main(): 99 | model = ResNet18() 100 | print(model) 101 | x = paddle.randn([2, 3, 32, 32]) 102 | out = model(x) 103 | print(out.shape) 104 | 105 | if __name__ == "__main__": 106 | main() 107 | -------------------------------------------------------------------------------- /swin_transformer/main_1128.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.28 3 | Written By: Dr. 
Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | 9 | paddle.set_device('cpu') 10 | 11 | class PatchEmbedding(nn.Layer): 12 | def __init__(self, patch_size=4, embed_dim=96): 13 | super().__init__() 14 | self.patch_size = nn.Conv2D(3, embed_dim, kernel_size=patch_size, stride=patch_size) 15 | self.norm = nn.LayerNorm(embed_dim) 16 | 17 | def forward(self, x): 18 | x = self.patch_size(x) # [n, embed_dim, h', w'] 19 | x = x.flatten(2) # [n, embed_dim, h'*w'] 20 | x = x.transpose([0, 2, 1]) # [n, h'*w, embed_dim] 21 | x = self.norm(x) 22 | return x 23 | 24 | 25 | class PatchMerging(nn.Layer): 26 | def __init__(self, input_resolution, dim): 27 | super().__init__() 28 | self.resolution = input_resolution 29 | self.dim = dim 30 | self.reduction = nn.Linear(4 * dim, 2 * dim) 31 | self.norm = nn.LayerNorm(4 * dim) 32 | 33 | def forward(self, x): 34 | h, w = self.resolution 35 | b, _, c = x.shape # _ : num_patches 36 | 37 | x = x.reshape([b, h, w, c]) 38 | 39 | x0 = x[:, 0::2, 0::2, :] 40 | x1 = x[:, 0::2, 1::2, :] 41 | x2 = x[:, 1::2, 0::2, :] 42 | x3 = x[:, 1::2, 1::2, :] 43 | 44 | x = paddle.concat([x0, x1, x2, x3], axis=-1) # [b, h/2, w/2, 4c] 45 | x = x.reshape([b, -1, 4 * c]) 46 | x = self.norm(x) 47 | x = self.reduction(x) 48 | 49 | return x 50 | 51 | class Mlp(nn.Layer): 52 | def __init__(self, dim, mlp_ratio=4.0, dropout=0.): 53 | super().__init__() 54 | self.fc1 = nn.Linear(dim, int(dim * mlp_ratio)) 55 | self.fc2 = nn.Linear(int(dim * mlp_ratio), dim) 56 | self.act = nn.GELU() 57 | self.dropout = nn.Dropout(dropout) 58 | 59 | def forward(self, x): 60 | x = self.fc1(x) 61 | x = self.act(x) 62 | x = self.dropout(x) 63 | x = self.fc2(x) 64 | x = self.dropout(x) 65 | return x 66 | 67 | def windows_partition(x, window_size): 68 | B, H, W, C = x.shape 69 | x = x.reshape([B, H//window_size, window_size, W//window_size, window_size, C]) 70 | x = x.transpose([0, 1, 3, 2, 4, 5]) 71 | # [B, h//ws, w//ws, ws, ws, c] 72 | x = x.reshape([-1, window_size, window_size, C]) 73 | # [B * num_patches, ws, ws, c] 74 | return x 75 | 76 | def windows_reverse(windows, window_size, H, W): 77 | B = int(windows.shape[0] // (H / window_size * W / window_size)) 78 | x = windows.reshape([B, H//window_size, W//window_size, window_size, window_size, -1]) 79 | x = x.transpose([0, 1, 3, 2, 4, 5]) 80 | x = x.reshape([B, H, W, -1]) 81 | return x 82 | 83 | class WindowAttention(nn.Layer): 84 | def __init__(self, dim, window_size, num_heads): 85 | super().__init__() 86 | self.dim = dim 87 | self.dim_head = dim // num_heads 88 | self.num_heads = num_heads 89 | self.scale = self.dim_head ** -0.5 90 | self.softmax = nn.Softmax(axis=-1) 91 | self.qkv = nn.Linear(dim, 3 * dim) 92 | self.proj = nn.Linear(dim, dim) 93 | 94 | def transpose_multi_head(self, x): 95 | new_shape = x.shape[:-1] + [self.num_heads, self.dim_head] 96 | x = x.reshape(new_shape) 97 | x = x.transpose([0, 2, 1, 3]) # [B, num_heads, num_patches, dim_head] 98 | return x 99 | 100 | def forward(self, x): 101 | B, N, C = x.shape 102 | # x: [B, num_patches, embed_dim] 103 | qkv = self.qkv(x).chunk(3, -1) 104 | q, k, v = map(self.transpose_multi_head, qkv) 105 | 106 | q = q * self.scale 107 | attn = paddle.matmul(q, k, transpose_y=True) 108 | attn = self.softmax(attn) 109 | 110 | out = paddle.matmul(attn, v) 111 | # [B, num_heads, num_patches, dim_head] 112 | out = out.transpose([0, 2, 1, 3]) 113 | # [B, num_patches, num_heads, dim_head] num_heads * dim_head = embed_dim 114 | out = out.reshape([B, N, C]) 115 | out = self.proj(out) 
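# out: [B, N, C]; the output projection mixes information across the attention heads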
116 | return out 117 | 118 | class SwinBlock(nn.Layer): 119 | def __init__(self, dim, input_resolution, num_heads, window_size): 120 | super().__init__() 121 | self.dim = dim 122 | self.resolution = input_resolution 123 | self.window_size = window_size 124 | 125 | self.attn_norm = nn.LayerNorm(dim) 126 | self.attn = WindowAttention(dim, window_size, num_heads) 127 | 128 | self.mlp_norm = nn.LayerNorm(dim) 129 | self.mlp = Mlp(dim) 130 | 131 | def forward(self, x): 132 | H, W = self.resolution 133 | B, N, C = x.shape 134 | 135 | h = x 136 | x = self.attn_norm(x) 137 | 138 | x = x.reshape([B, H, W, C]) 139 | x_windows = windows_partition(x, self.window_size) 140 | # [B * num_patches, ws, ws, c] 141 | x_windows = x_windows.reshape([-1, self.window_size * self.window_size, C]) 142 | attn_windows = self.attn(x_windows) 143 | attn_windows = attn_windows.reshape([-1, self.window_size, self.window_size, C]) 144 | x = windows_reverse(attn_windows, self.window_size, H, W) 145 | # [B, H, W, C] 146 | x = x.reshape([B, H*W, C]) 147 | 148 | x = self.attn(x) 149 | 150 | x = h + x 151 | 152 | h = x 153 | x = self.mlp_norm(x) 154 | x = self.mlp(x) 155 | x = h + x 156 | return x 157 | 158 | def main(): 159 | t = paddle.randn([4, 3, 224, 224]) 160 | print('image shape = ', t.shape) 161 | patch_embedding = PatchEmbedding(patch_size=4, embed_dim=96) 162 | swin_block = SwinBlock(dim=96, input_resolution=[56, 56], num_heads=4, window_size=7) 163 | patch_merging = PatchMerging(input_resolution=[56, 56], dim=96) 164 | 165 | out = patch_embedding(t) # [4, 56, 56, 96] 166 | print('patch_embedding out shape = ', out.shape) 167 | out = swin_block(out) 168 | print('swin_block out shape = ', out.shape) 169 | out = patch_merging(out) 170 | print('patch_merging out shape = ', out.shape) 171 | 172 | if __name__ == '__main__': 173 | main() -------------------------------------------------------------------------------- /swin_transformer/main_1129.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.29 3 | Written By: Dr. 
Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | from mask_1129 import generate_mask 9 | 10 | paddle.set_device('cpu') 11 | 12 | class PatchEmbedding(nn.Layer): 13 | def __init__(self, patch_size=4, embed_dim=96): 14 | super().__init__() 15 | self.patch_size = nn.Conv2D(3, embed_dim, kernel_size=patch_size, stride=patch_size) 16 | self.norm = nn.LayerNorm(embed_dim) 17 | 18 | def forward(self, x): 19 | x = self.patch_size(x) # [n, embed_dim, h', w'] 20 | x = x.flatten(2) # [n, embed_dim, h'*w'] 21 | x = x.transpose([0, 2, 1]) # [n, h'*w, embed_dim] 22 | x = self.norm(x) 23 | return x 24 | 25 | 26 | class PatchMerging(nn.Layer): 27 | def __init__(self, input_resolution, dim): 28 | super().__init__() 29 | self.resolution = input_resolution 30 | self.dim = dim 31 | self.reduction = nn.Linear(4 * dim, 2 * dim) 32 | self.norm = nn.LayerNorm(4 * dim) 33 | 34 | def forward(self, x): 35 | h, w = self.resolution 36 | b, _, c = x.shape # _ : num_patches 37 | 38 | x = x.reshape([b, h, w, c]) 39 | 40 | x0 = x[:, 0::2, 0::2, :] 41 | x1 = x[:, 0::2, 1::2, :] 42 | x2 = x[:, 1::2, 0::2, :] 43 | x3 = x[:, 1::2, 1::2, :] 44 | 45 | x = paddle.concat([x0, x1, x2, x3], axis=-1) # [b, h/2, w/2, 4c] 46 | x = x.reshape([b, -1, 4 * c]) 47 | x = self.norm(x) 48 | x = self.reduction(x) 49 | 50 | return x 51 | 52 | class Mlp(nn.Layer): 53 | def __init__(self, dim, mlp_ratio=4.0, dropout=0.): 54 | super().__init__() 55 | self.fc1 = nn.Linear(dim, int(dim * mlp_ratio)) 56 | self.fc2 = nn.Linear(int(dim * mlp_ratio), dim) 57 | self.act = nn.GELU() 58 | self.dropout = nn.Dropout(dropout) 59 | 60 | def forward(self, x): 61 | x = self.fc1(x) 62 | x = self.act(x) 63 | x = self.dropout(x) 64 | x = self.fc2(x) 65 | x = self.dropout(x) 66 | return x 67 | 68 | def windows_partition(x, window_size): 69 | B, H, W, C = x.shape 70 | x = x.reshape([B, H//window_size, window_size, W//window_size, window_size, C]) 71 | x = x.transpose([0, 1, 3, 2, 4, 5]) 72 | # [B, h//ws, w//ws, ws, ws, c] 73 | x = x.reshape([-1, window_size, window_size, C]) 74 | # [B * num_patches, ws, ws, c] 75 | return x 76 | 77 | def windows_reverse(windows, window_size, H, W): 78 | B = int(windows.shape[0] // (H / window_size * W / window_size)) 79 | x = windows.reshape([B, H//window_size, W//window_size, window_size, window_size, -1]) 80 | x = x.transpose([0, 1, 3, 2, 4, 5]) 81 | x = x.reshape([B, H, W, -1]) 82 | return x 83 | 84 | class WindowAttention(nn.Layer): 85 | def __init__(self, dim, window_size, num_heads): 86 | super().__init__() 87 | self.dim = dim 88 | self.dim_head = dim // num_heads 89 | self.num_heads = num_heads 90 | self.scale = self.dim_head ** -0.5 91 | self.softmax = nn.Softmax(axis=-1) 92 | self.qkv = nn.Linear(dim, 3 * dim) 93 | self.proj = nn.Linear(dim, dim) 94 | 95 | def transpose_multi_head(self, x): 96 | new_shape = x.shape[:-1] + [self.num_heads, self.dim_head] 97 | x = x.reshape(new_shape) 98 | x = x.transpose([0, 2, 1, 3]) # [B, num_heads, num_patches, dim_head] 99 | return x 100 | 101 | def forward(self, x, mask=None): 102 | B, N, C = x.shape 103 | # x: [B, num_patches, embed_dim] 104 | qkv = self.qkv(x).chunk(3, -1) 105 | q, k, v = map(self.transpose_multi_head, qkv) 106 | 107 | q = q * self.scale 108 | attn = paddle.matmul(q, k, transpose_y=True) 109 | 110 | ##### BEGIN CLASS 6: Mask 111 | if mask is None: 112 | attn = self.softmax(attn) 113 | else: 114 | # mask: [num_windows, num_patches, num_patches] 115 | # attn: [B*num_windows, num_heads, num_patches, num_patches] 116 | attn = 
attn.reshape([B//mask.shape[0], mask.shape[0], self.num_heads, mask.shape[1], mask.shape[1]]) 117 | # attn: [B, num_windows, num_heads, num_patches, num_patches] 118 | # mask: [1, num_windows, 1, num_patches, num_patches] 119 | attn = attn + mask.unsqueeze(1).unsqueeze(0) 120 | attn = attn.reshape([-1, self.num_heads, mask.shape[1], mask.shape[1]]) 121 | # attn: [B*num_windows, num_heads, num_patches, num_patches] 122 | ##### END CLASS 6: Mask 123 | 124 | 125 | out = paddle.matmul(attn, v) 126 | # [B, num_heads, num_patches, dim_head] 127 | out = out.transpose([0, 2, 1, 3]) 128 | # [B, num_patches, num_heads, dim_head] num_heads * dim_head = embed_dim 129 | out = out.reshape([B, N, C]) 130 | out = self.proj(out) 131 | return out 132 | 133 | class SwinBlock(nn.Layer): 134 | def __init__(self, dim, input_resolution, num_heads, window_size, shift_size=0): 135 | super().__init__() 136 | self.dim = dim 137 | self.resolution = input_resolution 138 | self.window_size = window_size 139 | self.shift_size = shift_size 140 | 141 | self.attn_norm = nn.LayerNorm(dim) 142 | self.attn = WindowAttention(dim, window_size, num_heads) 143 | 144 | self.mlp_norm = nn.LayerNorm(dim) 145 | self.mlp = Mlp(dim) 146 | 147 | # CLASS 6 148 | if self.shift_size > 0: 149 | attn_mask = generate_mask(window_size=self.window_size, 150 | shift_size=self.shift_size, 151 | input_resolution=self.resolution) 152 | else: 153 | attn_mask = None 154 | self.register_buffer('attn_mask', attn_mask) 155 | 156 | def forward(self, x): 157 | H, W = self.resolution 158 | B, N, C = x.shape 159 | 160 | h = x 161 | x = self.attn_norm(x) 162 | 163 | x = x.reshape([B, H, W, C]) 164 | 165 | ##### BEGIN CLASS 6 166 | # Shift window 167 | if self.shift_size > 0: 168 | shifted_x = paddle.roll(x, shifts=(-self.shift_size, -self.shift_size), axis=(1, 2)) 169 | else: 170 | shifted_x = x 171 | 172 | # Compute window attn 173 | x_windows = windows_partition(shifted_x, self.window_size) 174 | x_windows = x_windows.reshape([-1, self.window_size * self.window_size, C]) 175 | attn_windows = self.attn(x_windows, mask=self.attn_mask) 176 | attn_windows = attn_windows.reshape([-1, self.window_size, self.window_size, C]) 177 | # Shift back 178 | shifted_x = windows_reverse(attn_windows, self.window_size, H, W) 179 | 180 | if self.shift_size > 0: 181 | x = paddle.roll(x, shifts=(self.shift_size, self.shift_size), axis=(1, 2)) 182 | else: 183 | x = shifted_x 184 | ##### END CLASS 6 185 | 186 | 187 | # [B, H, W, C] 188 | x = x.reshape([B, H*W, C]) 189 | 190 | x = self.attn(x) 191 | 192 | x = h + x 193 | 194 | h = x 195 | x = self.mlp_norm(x) 196 | x = self.mlp(x) 197 | x = h + x 198 | return x 199 | 200 | def main(): 201 | t = paddle.randn([4, 3, 224, 224]) 202 | patch_embedding = PatchEmbedding(patch_size=4, embed_dim=96) 203 | swin_block_w_msa = SwinBlock(dim=96, input_resolution=[56, 56], num_heads=4, window_size=7, shift_size=0) 204 | swin_block_sw_msa = SwinBlock(dim=96, input_resolution=[56, 56], num_heads=4, window_size=7, shift_size=7//2) 205 | patch_merging = PatchMerging(input_resolution=[56, 56], dim=96) 206 | 207 | print('image shape = [4, 3, 224, 224]') 208 | out = patch_embedding(t) # [4, 56, 56, 96] 209 | print('patch_embedding out shape = ', out.shape) 210 | out = swin_block_w_msa(out) 211 | out = swin_block_sw_msa(out) 212 | print('swin_block out shape = ', out.shape) 213 | out = patch_merging(out) 214 | print('patch_merging out shape = ', out.shape) 215 | 216 | if __name__ == '__main__': 217 | main() 
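# A minimal shape-check sketch for the window utilities above. It assumes this
# snippet is saved alongside main_1129.py and mask_1129.py so the imports below
# resolve; only tensor shapes are inspected and the input values are random.
import paddle
from main_1129 import windows_partition, windows_reverse
from mask_1129 import generate_mask

paddle.set_device('cpu')

x = paddle.randn([2, 56, 56, 96])                  # [B, H, W, C] after patch embedding
w = windows_partition(x, window_size=7)            # [2 * 8 * 8, 7, 7, 96] = [128, 7, 7, 96]
y = windows_reverse(w, window_size=7, H=56, W=56)  # back to [2, 56, 56, 96]
mask = generate_mask(window_size=7, shift_size=3, input_resolution=(56, 56))
print(w.shape)      # [128, 7, 7, 96]: one row per 7x7 window
print(y.shape)      # [2, 56, 56, 96]: partition/reverse round-trip restores the layout
print(mask.shape)   # [64, 49, 49]: one (ws*ws, ws*ws) additive mask per window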
-------------------------------------------------------------------------------- /swin_transformer/main_1130.py: -------------------------------------------------------------------------------- 1 | """ 2 | DateTime: 2021.11.30 3 | Written By: Dr. Zhu 4 | Recorded By: Hatimwen 5 | """ 6 | import paddle 7 | import paddle.nn as nn 8 | from mask_1129 import generate_mask 9 | 10 | paddle.set_device('cpu') 11 | 12 | class Identity(nn.Layer): 13 | def __init__(self): 14 | super().__init__() 15 | 16 | def forward(self, x): 17 | return x 18 | 19 | class PatchEmbedding(nn.Layer): 20 | def __init__(self, patch_size=4, embed_dim=96): 21 | super().__init__() 22 | self.patch_size = nn.Conv2D(3, embed_dim, kernel_size=patch_size, stride=patch_size) 23 | self.norm = nn.LayerNorm(embed_dim) 24 | 25 | def forward(self, x): 26 | x = self.patch_size(x) # [n, embed_dim, h', w'] 27 | x = x.flatten(2) # [n, embed_dim, h'*w'] 28 | x = x.transpose([0, 2, 1]) # [n, h'*w, embed_dim] 29 | x = self.norm(x) 30 | return x 31 | 32 | 33 | class PatchMerging(nn.Layer): 34 | def __init__(self, input_resolution, dim): 35 | super().__init__() 36 | self.resolution = input_resolution 37 | self.dim = dim 38 | self.reduction = nn.Linear(4 * dim, 2 * dim) 39 | self.norm = nn.LayerNorm(4 * dim) 40 | 41 | def forward(self, x): 42 | h, w = self.resolution 43 | b, _, c = x.shape # _ : num_patches 44 | 45 | x = x.reshape([b, h, w, c]) 46 | 47 | x0 = x[:, 0::2, 0::2, :] 48 | x1 = x[:, 0::2, 1::2, :] 49 | x2 = x[:, 1::2, 0::2, :] 50 | x3 = x[:, 1::2, 1::2, :] 51 | 52 | x = paddle.concat([x0, x1, x2, x3], axis=-1) # [b, h/2, w/2, 4c] 53 | x = x.reshape([b, -1, 4 * c]) 54 | x = self.norm(x) 55 | x = self.reduction(x) 56 | 57 | return x 58 | 59 | class Mlp(nn.Layer): 60 | def __init__(self, dim, mlp_ratio=4.0, dropout=0.): 61 | super().__init__() 62 | self.fc1 = nn.Linear(dim, int(dim * mlp_ratio)) 63 | self.fc2 = nn.Linear(int(dim * mlp_ratio), dim) 64 | self.act = nn.GELU() 65 | self.dropout = nn.Dropout(dropout) 66 | 67 | def forward(self, x): 68 | x = self.fc1(x) 69 | x = self.act(x) 70 | x = self.dropout(x) 71 | x = self.fc2(x) 72 | x = self.dropout(x) 73 | return x 74 | 75 | def windows_partition(x, window_size): 76 | B, H, W, C = x.shape 77 | x = x.reshape([B, H//window_size, window_size, W//window_size, window_size, C]) 78 | x = x.transpose([0, 1, 3, 2, 4, 5]) 79 | # [B, h//ws, w//ws, ws, ws, c] 80 | x = x.reshape([-1, window_size, window_size, C]) 81 | # [B * num_patches, ws, ws, c] 82 | return x 83 | 84 | def windows_reverse(windows, window_size, H, W): 85 | B = int(windows.shape[0] // (H / window_size * W / window_size)) 86 | x = windows.reshape([B, H//window_size, W//window_size, window_size, window_size, -1]) 87 | x = x.transpose([0, 1, 3, 2, 4, 5]) 88 | x = x.reshape([B, H, W, -1]) 89 | return x 90 | 91 | class WindowAttention(nn.Layer): 92 | def __init__(self, dim, window_size, num_heads): 93 | super().__init__() 94 | self.dim = dim 95 | self.dim_head = dim // num_heads 96 | self.num_heads = num_heads 97 | self.scale = self.dim_head ** -0.5 98 | self.softmax = nn.Softmax(axis=-1) 99 | self.qkv = nn.Linear(dim, 3 * dim) 100 | self.proj = nn.Linear(dim, dim) 101 | 102 | def transpose_multi_head(self, x): 103 | new_shape = x.shape[:-1] + [self.num_heads, self.dim_head] 104 | x = x.reshape(new_shape) 105 | x = x.transpose([0, 2, 1, 3]) # [B, num_heads, num_patches, dim_head] 106 | return x 107 | 108 | def forward(self, x, mask=None): 109 | B, N, C = x.shape 110 | # x: [B, num_patches, embed_dim] 111 | qkv = self.qkv(x).chunk(3, 
-1) 112 | q, k, v = map(self.transpose_multi_head, qkv) 113 | 114 | q = q * self.scale 115 | attn = paddle.matmul(q, k, transpose_y=True) 116 | 117 | print('attn shape=', attn.shape) 118 | 119 | ##### BEGIN CLASS 6: Mask 120 | if mask is None: 121 | attn = self.softmax(attn) 122 | else: 123 | # mask: [num_windows, num_patches, num_patches] 124 | # attn: [B*num_windows, num_heads, num_patches, num_patches] 125 | attn = attn.reshape([B//mask.shape[0], mask.shape[0], self.num_heads, mask.shape[1], mask.shape[1]]) 126 | # attn: [B, num_windows, num_heads, num_patches, num_patches] 127 | # mask: [1, num_windows, 1, num_patches, num_patches] 128 | attn = attn + mask.unsqueeze(1).unsqueeze(0) 129 | attn = attn.reshape([-1, self.num_heads, mask.shape[1], mask.shape[1]]) 130 | # attn: [B*num_windows, num_heads, num_patches, num_patches] 131 | ##### END CLASS 6: Mask 132 | 133 | 134 | out = paddle.matmul(attn, v) 135 | # [B, num_heads, num_patches, dim_head] 136 | out = out.transpose([0, 2, 1, 3]) 137 | # [B, num_patches, num_heads, dim_head] num_heads * dim_head = embed_dim 138 | out = out.reshape([B, N, C]) 139 | out = self.proj(out) 140 | return out 141 | 142 | class SwinBlock(nn.Layer): 143 | def __init__(self, dim, input_resolution, num_heads, window_size, shift_size=0): 144 | super().__init__() 145 | self.dim = dim 146 | self.resolution = input_resolution 147 | self.window_size = window_size 148 | self.shift_size = shift_size 149 | 150 | self.attn_norm = nn.LayerNorm(dim) 151 | self.attn = WindowAttention(dim, window_size, num_heads) 152 | 153 | self.mlp_norm = nn.LayerNorm(dim) 154 | self.mlp = Mlp(dim) 155 | 156 | if min(self.resolution) <= self.window_size: 157 | self.shift_size = 0 158 | self.window_size = min(self.resolution) 159 | 160 | # CLASS 6 161 | if self.shift_size > 0: 162 | attn_mask = generate_mask(window_size=self.window_size, 163 | shift_size=self.shift_size, 164 | input_resolution=self.resolution) 165 | else: 166 | attn_mask = None 167 | self.register_buffer('attn_mask', attn_mask) 168 | 169 | def forward(self, x): 170 | H, W = self.resolution 171 | B, N, C = x.shape 172 | 173 | h = x 174 | x = self.attn_norm(x) 175 | 176 | x = x.reshape([B, H, W, C]) 177 | 178 | ##### BEGIN CLASS 6 179 | # Shift window 180 | if self.shift_size > 0: 181 | shifted_x = paddle.roll(x, shifts=(-self.shift_size, -self.shift_size), axis=(1, 2)) 182 | else: 183 | shifted_x = x 184 | 185 | # Compute window attn 186 | x_windows = windows_partition(shifted_x, self.window_size) 187 | x_windows = x_windows.reshape([-1, self.window_size * self.window_size, C]) 188 | attn_windows = self.attn(x_windows, mask=self.attn_mask) 189 | attn_windows = attn_windows.reshape([-1, self.window_size, self.window_size, C]) 190 | # Shift back 191 | shifted_x = windows_reverse(attn_windows, self.window_size, H, W) 192 | 193 | if self.shift_size > 0: 194 | x = paddle.roll(x, shifts=(self.shift_size, self.shift_size), axis=(1, 2)) 195 | else: 196 | x = shifted_x 197 | ##### END CLASS 6 198 | 199 | 200 | # [B, H, W, C] 201 | x = x.reshape([B, H*W, C]) 202 | 203 | x = self.attn(x) 204 | 205 | x = h + x 206 | 207 | h = x 208 | x = self.mlp_norm(x) 209 | x = self.mlp(x) 210 | x = h + x 211 | return x 212 | 213 | class SwinStage(nn.Layer): 214 | def __init__(self, dim, input_resolution, depth, num_heads, window_size, patch_merging=None): 215 | super().__init__() 216 | self.blocks = nn.LayerList() 217 | for i in range(depth): 218 | self.blocks.append( 219 | SwinBlock(dim=dim, 220 | input_resolution=input_resolution, 221 | 
--------------------------------------------------------------------------------
/swin_transformer/mask_1129.py:
--------------------------------------------------------------------------------
"""
DateTime: 2021.11.29
Written By: Dr. Zhu
Recorded By: Hatimwen
"""
import paddle
from PIL import Image
paddle.set_device('cpu')

def window_partition(x, window_size):
    B, H, W, C = x.shape
    x = x.reshape([B, H//window_size, window_size, W//window_size, window_size, C])
    x = x.transpose([0, 1, 3, 2, 4, 5])
    x = x.reshape([-1, window_size, window_size, C])
    return x

def generate_mask(window_size=4, shift_size=2, input_resolution=(8, 8)):
    H, W = input_resolution
    img_mask = paddle.zeros([1, H, W, 1])
    h_slices = [slice(0, -window_size),
                slice(-window_size, -shift_size),
                slice(-shift_size, None)]  # a[slice(0, -window_size)] == a[0:-window_size]
    w_slices = [slice(0, -window_size),
                slice(-window_size, -shift_size),
                slice(-shift_size, None)]
    cnt = 0
    for h in h_slices:
        for w in w_slices:
            img_mask[:, h, w, :] = cnt
            cnt += 1
    windows_mask = window_partition(img_mask, window_size=window_size)
    windows_mask = windows_mask.reshape([-1, window_size * window_size])

    attn_mask = windows_mask.unsqueeze(1) - windows_mask.unsqueeze(2)
    # Broadcasting: [n, 1, ws*ws] - [n, ws*ws, 1]
    # Pairs of positions from different regions get a large negative bias so that
    # softmax drives their attention weights to (almost) zero; same-region pairs stay 0.
    attn_mask = paddle.where(attn_mask != 0,
                             paddle.ones_like(attn_mask) * -100.,
                             paddle.zeros_like(attn_mask))
    return attn_mask

def main():
    mask = generate_mask()
    print(mask.shape)
    # Map masked positions to 255 (white) so the mask can be printed and saved as images.
    mask = ((mask.cpu().numpy() != 0) * 255).astype('uint8')
    for i in range(4):
        for j in range(16):
            for k in range(16):
                print(mask[i, j, k], end='\t')
            print()

        im = Image.fromarray(mask[i, :, :])
        im.save(f'{i}.png')
        print()
        print()
        print()

if __name__ == '__main__':
    main()
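
The mask works by labelling every position with a region id, cutting the id map into windows, and comparing ids pairwise: pairs whose ids differ get the non-zero bias and are therefore suppressed after softmax. A standalone sketch, assuming mask_1129.py is importable from the working directory, counts how many pairs each window blocks for the default 8x8 case.

# Count the blocked patch pairs per window for the default configuration.
import paddle
from mask_1129 import generate_mask

mask = generate_mask(window_size=4, shift_size=2, input_resolution=(8, 8))
print(mask.shape)  # [4, 16, 16]: 4 windows, 16 patches per window
for i in range(mask.shape[0]):
    blocked = int((mask[i] != 0).astype('int32').sum())
    print(f'window {i}: {blocked} of 256 patch pairs are masked')
# Expected: 0, 128, 128, 192 - the regular window blocks nothing, the two edge
# windows each mix two regions, and the corner window mixes four.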
--------------------------------------------------------------------------------
/vit.py:
--------------------------------------------------------------------------------
"""
DateTime: 2021.11.24
Written By: Dr. Zhu
Recorded By: Hatimwen
"""
import paddle
import paddle.nn as nn
# from PIL import Image
from paddle.nn.layer.common import Identity

paddle.set_device('cpu')

class MLp(nn.Layer):
    def __init__(self, embed_dim, mlp_ratio=4.0, dropout=0.):
        super(MLp, self).__init__()
        self.fc1 = nn.Linear(embed_dim, int(embed_dim * mlp_ratio))
        self.fc2 = nn.Linear(int(embed_dim * mlp_ratio), embed_dim)
        self.act = nn.GELU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        return x

class Encoder(nn.Layer):
    def __init__(self, embed_dim):
        super(Encoder, self).__init__()
        self.attn = Identity()  # TODO: replace with multi-head self attention (see attention.py)
        self.attn_norm = nn.LayerNorm(embed_dim)
        self.mlp = MLp(embed_dim)
        self.mlp_norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        h = x
        x = self.attn_norm(x)
        x = self.attn(x)
        x = x + h

        h = x
        x = self.mlp_norm(x)
        x = self.mlp(x)
        x = x + h
        return x


class PatchEmbedding(nn.Layer):
    def __init__(self, image_size, patch_size, in_channels, embed_dim, dropout=0.):
        super(PatchEmbedding, self).__init__()
        self.patch_embed = nn.Conv2D(in_channels,
                                     embed_dim,
                                     kernel_size=patch_size,
                                     stride=patch_size,
                                     weight_attr=paddle.ParamAttr(initializer=nn.initializer.Constant(1.0)),
                                     bias_attr=False)
        self.drop_out = nn.Dropout(dropout)

    def forward(self, x):
        x = self.patch_embed(x)     # [n, embed_dim, h', w']
        x = x.flatten(2)            # [n, embed_dim, h' * w']
        x = x.transpose([0, 2, 1])  # [n, h' * w', embed_dim]
        x = self.drop_out(x)
        return x

class ViT(nn.Layer):
    def __init__(self):
        super(ViT, self).__init__()
        self.patch_embed = PatchEmbedding(224, 7, 3, 16)
        layer_list = [Encoder(16) for _ in range(5)]
        self.encoders = nn.LayerList(layer_list)
        self.head = nn.Linear(16, 10)  # 10: num_classes
        self.avgpool = nn.AdaptiveAvgPool1D(1)

    def forward(self, x):
        x = self.patch_embed(x)
        for encoder in self.encoders:
            x = encoder(x)
        # a LayerNorm is usually applied here
        # [n, h' * w', embed_dim]
        x = x.transpose([0, 2, 1])
        x = self.avgpool(x)  # [n, embed_dim, 1]
        x = x.flatten(1)     # [n, embed_dim]
        x = self.head(x)
        return x

def main():
    # random img:
    # img = np.random.randint(0, 255, [28, 28], dtype=np.uint8)
    # sample = paddle.to_tensor(img, dtype='float32')
    # sample = sample.reshape([1, 1, 28, 28])

    # patch_embed = PatchEmbedding(28, 7, 1, 1)
    # out = patch_embed(sample)
    # print(out)
    # print(out.shape)

    # mlp = MLp(1)
    # out = mlp(out)
    # print(out)
    # print(out.shape)

    t = paddle.randn([4, 3, 224, 224])
    vit = ViT()
    out = vit(t)
    print(out)
    print(type(out))
    print(out.shape)

if __name__ == "__main__":
    main()
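
Because the Conv2D in PatchEmbedding above is initialized with constant weights of 1.0 and no bias, each embedding value is simply the sum of the pixels in its patch, which makes intermediate results easy to verify by hand. A standalone sketch, not part of the repository, assuming vit.py is importable from the working directory:

# With all-ones conv weights and no bias, patch embedding reduces to a per-patch pixel sum.
import paddle
from vit import PatchEmbedding

patch_embed = PatchEmbedding(image_size=28, patch_size=7, in_channels=1, embed_dim=1)
img = paddle.ones([1, 1, 28, 28])
out = patch_embed(img)
print(out.shape)            # [1, 16, 1]: a 4x4 grid of patches, embed_dim 1
print(float(out[0, 0, 0]))  # 49.0 == the 7*7 ones summed inside the first patch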
--------------------------------------------------------------------------------
/vit_1126.py:
--------------------------------------------------------------------------------
"""
DateTime: 2021.11.26
Written By: Dr. Zhu
Recorded By: Hatimwen
"""
import paddle
import paddle.nn as nn

paddle.set_device('cpu')

class Identity(nn.Layer):
    def __init__(self):
        super(Identity, self).__init__()

    def forward(self, x):
        return x

class Mlp(nn.Layer):
    def __init__(self, embed_dim, mlp_ratio, dropout=0.):
        super().__init__()
        self.fc1 = nn.Linear(embed_dim, int(embed_dim * mlp_ratio))
        self.fc2 = nn.Linear(int(embed_dim * mlp_ratio), embed_dim)
        self.act = nn.GELU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        return x

class PatchEmbedding(nn.Layer):
    def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=768, dropout=0.):
        super().__init__()
        n_patches = (image_size // patch_size) * (image_size // patch_size)
        self.patch_embedding = nn.Conv2D(in_channels=in_channels,
                                         out_channels=embed_dim,
                                         kernel_size=patch_size,
                                         stride=patch_size)

        self.class_token = paddle.create_parameter(
            shape=[1, 1, embed_dim],
            dtype='float32',
            default_initializer=paddle.nn.initializer.Constant(0.))

        self.position_embedding = paddle.create_parameter(
            shape=[1, n_patches+1, embed_dim],
            dtype='float32',
            default_initializer=nn.initializer.TruncatedNormal(std=.02))

        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: [n, c, h, w]
        class_tokens = self.class_token.expand([x.shape[0], -1, -1])  # one class token per sample
        # class_tokens = self.class_token.expand([x.shape[0], 1, self.embed_dim]) # for batch
        x = self.patch_embedding(x)  # [n, embed_dim, h', w']
        x = x.flatten(2)             # [n, embed_dim, h' * w']
        x = x.transpose([0, 2, 1])   # [n, h' * w', embed_dim]
        x = paddle.concat([class_tokens, x], axis=1)

        x = x + self.position_embedding
        x = self.dropout(x)
        return x

class Attention(nn.Layer):
    """multi-head self attention"""
    def __init__(self, embed_dim, num_heads, qkv_bias=True, dropout=0., attention_dropout=0.):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = int(embed_dim / num_heads)
        self.all_head_dim = self.head_dim * num_heads
        self.scale = self.head_dim ** -0.5

        self.qkv = nn.Linear(embed_dim,
                             self.all_head_dim * 3)

        self.proj = nn.Linear(self.all_head_dim, embed_dim)

        self.dropout = nn.Dropout(dropout)
        self.attention_dropout = nn.Dropout(attention_dropout)
        self.softmax = nn.Softmax(axis=-1)

    def transpose_multi_head(self, x):
        # N: num_patches
        # x: [B, N, all_head_dim]
        new_shape = x.shape[:-1] + [self.num_heads, self.head_dim]
        x = x.reshape(new_shape)
        # x: [B, N, num_heads, head_dim]
        x = x.transpose([0, 2, 1, 3])
        # x: [B, num_heads, N, head_dim]
        return x

    def forward(self, x):
        B, N, _ = x.shape
        qkv = self.qkv(x).chunk(3, -1)
        # [B, N, all_head_dim] * 3
        q, k, v = map(self.transpose_multi_head, qkv)

        # q, k, v: [B, num_heads, N, head_dim]
        attn = paddle.matmul(q, k, transpose_y=True)  # q * k^T
        attn = self.scale * attn
        attn = self.softmax(attn)
        attn = self.attention_dropout(attn)
        # attn: [B, num_heads, N, N]

        out = paddle.matmul(attn, v)  # softmax(scale(q * k^T)) * v
        out = out.transpose([0, 2, 1, 3])
        # out: [B, N, num_heads, head_dim]
        out = out.reshape([B, N, -1])

        out = self.proj(out)
        out = self.dropout(out)
        return out

class EncoderLayer(nn.Layer):
    def __init__(self, embed_dim=768, num_heads=4, qkv_bias=True, mlp_ratio=4.0, dropout=0., attention_dropout=0.):
        super().__init__()
        self.attn_norm = nn.LayerNorm(embed_dim)
        self.attn = Attention(embed_dim, num_heads)
        self.mlp_norm = nn.LayerNorm(embed_dim)
        self.mlp = Mlp(embed_dim, mlp_ratio)

    def forward(self, x):
        h = x  # residual
        x = self.attn_norm(x)
        x = self.attn(x)
        x = x + h

        h = x
        x = self.mlp_norm(x)
        x = self.mlp(x)
        x = x + h
        return x

class Encoder(nn.Layer):
    def __init__(self, embed_dim, depth):
        super().__init__()
        layer_list = []
        for i in range(depth):
            encoder_layer = EncoderLayer(embed_dim)
            layer_list.append(encoder_layer)
        self.layers = nn.LayerList(layer_list)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        x = self.norm(x)
        return x

class VisualTransformer(nn.Layer):
    def __init__(self,
                 image_size=224,
                 patch_size=16,
                 in_channels=3,
                 num_classes=1000,
                 embed_dim=768,
                 depth=3,
                 num_heads=8,
                 mlp_ratio=4,
                 qkv_bias=True,
                 dropout=0.,
                 attention_dropout=0.,
                 droppath=0.):
        super().__init__()
        self.patch_embedding = PatchEmbedding(image_size, patch_size, in_channels, embed_dim)
        self.encoder = Encoder(embed_dim, depth)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # x: [N, C, H, W]
        x = self.patch_embedding(x)  # [N, num_patches + 1, embed_dim]
        # x = x.flatten(2) # [N, embed_dim, h' * w'] h' * w' = num_patches
        # x = x.transpose([0, 2, 1]) # [N, num_patches, embed_dim]
        x = self.encoder(x)
        x = self.classifier(x[:, 0])  # classify from the class token
        return x

def main():
    vit = VisualTransformer()
    print(vit)
    paddle.summary(vit, input_size=(4, 3, 224, 224))


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
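
As a quick usage sketch, not part of the repository and assuming it is run from the repository root so vit_1126.py can be imported: the model turns each 224x224 image into 197 tokens (196 patches plus one class token) and classifies from the class token alone.

# Forward-pass shape check for vit_1126.py.
import paddle
from vit_1126 import PatchEmbedding, VisualTransformer

x = paddle.randn([2, 3, 224, 224])

tokens = PatchEmbedding()(x)  # a standalone embedding layer, only to inspect the token shape
print(tokens.shape)           # [2, 197, 768]: 14*14 patches + 1 class token, each 768-d

model = VisualTransformer(depth=3)
logits = model(x)
print(logits.shape)           # [2, 1000]: the classifier reads only token 0, the class token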