├── README.md
├── .gitattributes
├── assets
    └── 猫数据增强.png
├── 6模型保存与加载.md
├── 5自定义网络层.md
├── 2面向 Numpy 用户的 PyTorch 速查表.md
├── 7数据增强的方法.md
├── pytorch套路.md
├── 9pytorch读取数据集.md
├── 4pytorch初始化.md
├── 01一步步实现神经网络.md
├── 1PyTorch 实现中的一些常用技巧.md
├── 3pytorch中的损失函数.md
├── 0tensor操作.md
└── 8pytorch优化函数学习率衰减.md


/README.md:
--------------------------------------------------------------------------------
1 | # pytorch-note
2 |  PyTorch study notes
3 | A summary of what I have learned about PyTorch.
4 | 


--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
1 | # Auto detect text files and perform LF normalization
2 | * text=auto
3 | 


--------------------------------------------------------------------------------
/assets/猫数据增强.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ys1305/pytorch-note/HEAD/assets/猫数据增强.png


--------------------------------------------------------------------------------
/6模型保存与加载.md:
--------------------------------------------------------------------------------
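 1 | There are two ways to save and restore a network: save only the network's state (parameters and buffers), or save the entire network (model and state). The first is recommended because it keeps working after the code is refactored.
 2 | Official tutorial on saving and loading models: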
 3 | https://pytorch.org/tutorials/beginner/saving_loading_models.html
 4 | 
 5 | ## 1 Saving parameters
 6 | 
 7 |     # Save the model
 8 |     torch.save(model.state_dict(), "wordavg-model.pth")
 9 |     
10 |     # Load the model; the model structure must be defined before loading
11 |     model = WordAVGModel(vocab_size=VOCAB_SIZE, 
12 |                          embedding_size=EMBEDDING_SIZE, 
13 |                          output_size=OUTPUT_SIZE, 
14 |                          pad_idx=PAD_IDX)
15 |     model = model.cuda()
16 |     # Copy the parameters and buffers from the state_dict into the model
17 |     model.load_state_dict(torch.load("wordavg-model.pth"))
18 | 
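19 | In PyTorch, the learnable parameters (weights and biases) of a torch.nn.Module model are contained in the model's parameters (accessed with model.parameters()). A state_dict is simply a Python dictionary object that maps each layer to its parameter tensors. Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) and registered buffers (such as batchnorm's running_mean) have entries in the model's state_dict. Optimizer objects (torch.optim) also have a state_dict, which contains information about the optimizer's state as well as the hyperparameters used.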
20 | 
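21 | Because state_dict objects are Python dictionaries, they can be easily saved, updated, altered, and restored, which adds a great deal of modularity to PyTorch models and optimizers.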
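
To make the description of `state_dict` concrete, here is a minimal sketch that prints the entries of a model and of an optimizer. The small `nn.Sequential` model and its layer sizes are assumptions chosen only for illustration; they do not come from these notes.

```python
import torch
from torch import nn

# A tiny model, hypothetical and for illustration only.
model = nn.Sequential(
    nn.Linear(4, 3),
    nn.BatchNorm1d(3),
    nn.ReLU(),          # no parameters or buffers, so no state_dict entry
    nn.Linear(3, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Parameters and registered buffers, keyed by "<module name>.<tensor name>".
model_keys = list(model.state_dict().keys())
for key in model_keys:
    print(key, tuple(model.state_dict()[key].shape))

# The optimizer state_dict holds per-parameter state and the hyperparameters.
optimizer_keys = list(optimizer.state_dict().keys())
print(optimizer_keys)
```

The ReLU at index 2 contributes no keys, while the batchnorm layer contributes both parameters (`1.weight`, `1.bias`) and buffers such as `1.running_mean`.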
22 | 
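23 | ## 2 Saving the entire model
24 | 
25 |     # Save:
26 |     torch.save(model, PATH)
27 |     # Load:
28 |     # Model class must be defined somewhere
29 |     model = torch.load(PATH)  # on PyTorch >= 2.6, pass weights_only=False to unpickle a full model
30 |     model.eval()
31 | Read on its own, this snippet from the documentation can give the impression that the whole model has been saved and can be loaded and used anywhere. That is not the case.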
32 | 
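33 | This save/load process uses the most intuitive syntax and involves the least amount of code. Saving a model this way saves the entire module using Python's pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason is that pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used at load time. Because of this, your code can break in various ways when used in other projects or after refactoring.
34 | 
35 | A common PyTorch convention is to save models using either a .pt or .pth file extension.
36 | 
37 | Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do so will yield inconsistent inference results.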
38 | 
39 | ## 3 Saving & Loading a General Checkpoint for Inference and/or Resuming Training
40 | 
41 |     # Save:
42 |     torch.save({
43 |                 'epoch': epoch,
44 |                 'model_state_dict': model.state_dict(),
45 |                 'optimizer_state_dict': optimizer.state_dict(),
46 |                 'loss': loss,
47 |                 ...
48 |                 }, PATH)
49 |     # Load:
50 |     model = TheModelClass(*args, **kwargs)
51 |     optimizer = TheOptimizerClass(*args, **kwargs)
52 |     
53 |     checkpoint = torch.load(PATH)
54 |     model.load_state_dict(checkpoint['model_state_dict'])
55 |     optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
56 |     epoch = checkpoint['epoch']
57 |     loss = checkpoint['loss']
58 |     
59 |     model.eval()
60 |     # - or -
61 |     model.train()
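
The checkpoint pattern above can be exercised end to end with a small round trip. This is only a sketch under stated assumptions: `nn.Linear(4, 2)` stands in for `TheModelClass`, SGD with momentum stands in for `TheOptimizerClass`, and the temporary file path is hypothetical.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for TheModelClass
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One training step so that the optimizer has momentum state worth saving.
loss = model(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")  # hypothetical path
torch.save({
    "epoch": 1,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "loss": loss.item(),
}, path)

# Rebuild the objects first, then restore their state.
restored = nn.Linear(4, 2)
restored_optimizer = torch.optim.SGD(restored.parameters(), lr=0.1, momentum=0.9)
checkpoint = torch.load(path)
restored.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1

weights_match = torch.equal(model.weight, restored.weight)
print(weights_match, start_epoch)
```

Storing `loss.item()` rather than the loss tensor keeps the checkpoint free of autograd history; call `restored.train()` afterwards to resume training or `restored.eval()` for inference.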


--------------------------------------------------------------------------------
/5自定义网络层.md:
--------------------------------------------------------------------------------
  1 | # nn.Parameter
  2 | 
  3 | `Only tensors wrapped in nn.Parameter are added to model.parameters() and can be updated by an optimizer`
  4 | 
  5 | ```python
  6 | x = torch.from_numpy(x).float()
  7 | y = torch.from_numpy(y).float()
  8 | # w and b can be updated by the optimizer
  9 | w = nn.Parameter(torch.randn(2, 1))
 10 | b = nn.Parameter(torch.zeros(1))
 11 | 
 12 | optimizer = torch.optim.SGD([w, b], 1e-1)
 13 | 
 14 | def logistic_regression(x):
 15 |     return torch.mm(x, w) + b
 16 | 
 17 | criterion = nn.BCEWithLogitsLoss()
 18 | for e in range(100):
 19 |     out = logistic_regression(x)
 20 |     loss = criterion(out, y)
 21 |     optimizer.zero_grad()
 22 |     loss.backward()
 23 |     optimizer.step()
 24 |     if (e + 1) % 20 == 0:
 25 |         print('epoch: {}, loss: {}'.format(e+1, loss.item()))
 26 | ```
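
The registration rule can be checked directly. The sketch below uses a hypothetical `Scale` module (not part of these notes) with one attribute wrapped in `nn.Parameter` and one plain tensor, and lists what `named_parameters()` reports.

```python
import torch
from torch import nn

class Scale(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(1))   # registered as a parameter
        self.offset = torch.zeros(1)              # plain tensor: not registered

    def forward(self, x):
        return x * self.gain + self.offset

module = Scale()
param_names = [name for name, _ in module.named_parameters()]
print(param_names)  # ['gain']
```

Only `gain` would be handed to an optimizer built from `module.parameters()`; `offset` stays fixed and is also absent from the `state_dict`.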
 27 | 
 28 | 
 29 | 
 30 | ## Custom layers and functions
 31 | 
 32 | ```python
 33 | import  torch
 34 | from    torch import nn
 35 | from    torch import optim
 36 | 
 37 | 
 38 | # Only tensors wrapped in nn.Parameter are added to model.parameters() and can be updated by an optimizer
 39 | class MyLinear(nn.Module):
 40 |     def __init__(self, inp, outp):
 41 |         super(MyLinear, self).__init__()
 42 | 
 43 |         # requires_grad = True
 44 |         self.w = nn.Parameter(torch.randn(outp, inp))
 45 |         self.b = nn.Parameter(torch.randn(outp))
 46 |     # x.shape: [batch, inp]
 47 |     def forward(self, x):
 48 |         # x = x @ self.w.t() + self.b
 49 |         x = x.mm(self.w.t()) + self.b
 50 |         # .t() is the transpose; @ is equivalent to mm, both are matrix multiplication
 51 |         return x
 52 |     
 53 |     
 54 | # A function wrapped in a module class so that it can be placed inside nn.Sequential
 55 | class Flatten(nn.Module):
 56 | 
 57 |     def __init__(self):
 58 |         super(Flatten, self).__init__()
 59 | 
 60 |     def forward(self, input):
 61 |         return input.view(input.size(0), -1)
 62 | ```
 63 | 
 64 | ## Accessing children/modules
 65 | 
 66 | ```python
 67 | class sim_net(nn.Module):
 68 |     def __init__(self):
 69 |         super(sim_net, self).__init__()
 70 |         self.l1 = nn.Sequential(
 71 |             nn.Linear(30, 40),
 72 |             nn.ReLU()
 73 |         )
 74 |         
 75 |         self.l1[0].weight.data = torch.randn(40, 30) # initialize a specific layer directly
 76 |         
 77 |         self.l2 = nn.Sequential(
 78 |             nn.Linear(40, 50),
 79 |             nn.ReLU()
 80 |         )
 81 |         
 82 |         self.l3 = nn.Sequential(
 83 |             nn.Linear(50, 10),
 84 |             nn.ReLU()
 85 |         )
 86 |     
 87 |     def forward(self, x):
 88 |         x = self.l1(x)
 89 |         x = self.l2(x)
 90 |         x = self.l3(x)
 91 |         return x
 92 | net2 = sim_net()
 93 | 
 94 | # Iterate over children
 95 | for i in net2.children():
 96 |     print(i)
 97 |     
 98 | # Iterate over modules
 99 | for i in net2.modules():
100 |     print(i)
101 | ```
102 | 
103 | 
104 | 
105 | ```python
106 | class BasicNet(nn.Module):
107 | 
108 |     def __init__(self):
109 |         super(BasicNet, self).__init__()
110 | 
111 |         self.net = nn.Linear(4, 3)
112 | 
113 |     def forward(self, x):
114 |         return self.net(x)
115 | 
116 | class Net(nn.Module):
117 | 
118 |     def __init__(self):
119 |         super(Net, self).__init__()
120 | 
121 |         self.net = nn.Sequential(BasicNet(),
122 |                                  nn.ReLU(),
123 |                                  nn.Linear(3, 2))
124 | 
125 |     def forward(self, x):
126 |         return self.net(x)
127 |         
128 | net = Net()
129 | net.to(device)
130 | 
131 | net.train()
132 | net.eval()
133 | 
134 | # net.load_state_dict(torch.load('ckpt.mdl'))
135 | # torch.save(net.state_dict(), 'ckpt.mdl')
136 | 
137 | for name, t in net.named_parameters():
138 |     print('parameters:', name, t.shape)
139 | print("-"*20)
140 | # Direct children only
141 | for name, m in net.named_children():
142 |     print('children:', name, m)
143 | print("-"*20)
144 | # All descendants, including children of children
145 | for name, m in net.named_modules():
146 |     print('modules:', name, m)
147 | 
148 | # Output:
149 | parameters: net.0.net.weight torch.Size([3, 4])
150 | parameters: net.0.net.bias torch.Size([3])
151 | parameters: net.2.weight torch.Size([2, 3])
152 | parameters: net.2.bias torch.Size([2])
153 | --------------------
154 | children: net Sequential(
155 |   (0): BasicNet(
156 |     (net): Linear(in_features=4, out_features=3, bias=True)
157 |   )
158 |   (1): ReLU()
159 |   (2): Linear(in_features=3, out_features=2, bias=True)
160 | )
161 | --------------------
162 | modules:  Net(
163 |   (net): Sequential(
164 |     (0): BasicNet(
165 |       (net): Linear(in_features=4, out_features=3, bias=True)
166 |     )
167 |     (1): ReLU()
168 |     (2): Linear(in_features=3, out_features=2, bias=True)
169 |   )
170 | )
171 | # The model itself
172 | 
173 | modules: net Sequential(
174 |   (0): BasicNet(
175 |     (net): Linear(in_features=4, out_features=3, bias=True)
176 |   )
177 |   (1): ReLU()
178 |   (2): Linear(in_features=3, out_features=2, bias=True)
179 | )
180 | # Direct child
181 | modules: net.0 BasicNet(
182 |   (net): Linear(in_features=4, out_features=3, bias=True)
183 | )
184 | # Child of a child
185 | modules: net.0.net Linear(in_features=4, out_features=3, bias=True)
186 | # 
187 | modules: net.1 ReLU()
188 | modules: net.2 Linear(in_features=3, out_features=2, bias=True)
189 | ```
190 | 
191 | 
192 | 


--------------------------------------------------------------------------------
/2面向 Numpy 用户的 PyTorch 速查表.md:
--------------------------------------------------------------------------------
  1 | # Types
  2 | 
  3 | | Numpy      | PyTorch                     |
  4 | | ---------- | --------------------------- |
  5 | | np.ndarray | torch.Tensor                |
  6 | | np.float32 | torch.float32; torch.float  |
  7 | | np.float64 | torch.float64; torch.double |
  8 | | np.float16 | torch.float16; torch.half   |
  9 | | np.int8    | torch.int8                  |
 10 | | np.uint8   | torch.uint8                 |
 11 | | np.int16   | torch.int16; torch.short    |
 12 | | np.int32   | torch.int32; torch.int      |
 13 | | np.int64   | torch.int64; torch.long     |
 14 | 
 15 | # Constructors
 16 | 
 17 | ## Ones and zeros
 18 | 
 19 | | Numpy            | PyTorch             |
 20 | | ---------------- | ------------------- |
 21 | | np.empty((2, 3)) | torch.empty(2, 3)   |
 22 | | np.empty_like(x) | torch.empty_like(x) |
 23 | | np.eye           | torch.eye           |
 24 | | np.identity      | torch.eye           |
 25 | | np.ones          | torch.ones          |
 26 | | np.ones_like     | torch.ones_like     |
 27 | | np.zeros         | torch.zeros         |
 28 | | np.zeros_like    | torch.zeros_like    |
 29 | 
 30 | ## From existing data
 31 | 
 32 | | Numpy                                                        | PyTorch                                       |
 33 | | ------------------------------------------------------------ | --------------------------------------------- |
 34 | | np.array([[1, 2], [3, 4]])                                   | torch.tensor([[1, 2], [3, 4]])                |
 35 | | np.array([3.2, 4.3], dtype=np.float16); np.float16([3.2, 4.3]) | torch.tensor([3.2, 4.3], dtype=torch.float16) |
 36 | | x.copy()                                                     | x.clone()                                     |
 37 | | np.fromfile(file)                                            | torch.tensor(torch.Storage(file))             |
 38 | | np.frombuffer                                                |                                               |
 39 | | np.fromfunction                                              |                                               |
 40 | | np.fromiter                                                  |                                               |
 41 | | np.fromstring                                                |                                               |
 42 | | np.load                                                      | torch.load                                    |
 43 | | np.loadtxt                                                   |                                               |
 44 | | np.concatenate                                               | torch.cat                                     |
 45 | 
 46 | ## Numerical ranges
 47 | 
 48 | | Numpy                | PyTorch                 |
 49 | | -------------------- | ----------------------- |
 50 | | np.arange(10)        | torch.arange(10)        |
 51 | | np.arange(2, 3, 0.1) | torch.arange(2, 3, 0.1) |
 52 | | np.linspace          | torch.linspace          |
 53 | | np.logspace          | torch.logspace          |
 54 | 
 55 | ## Building matrices
 56 | 
 57 | | Numpy   | PyTorch    |
 58 | | ------- | ---------- |
 59 | | np.diag | torch.diag |
 60 | | np.tril | torch.tril |
 61 | | np.triu | torch.triu |
 62 | 
 63 | ## Attributes
 64 | 
 65 | | Numpy     | PyTorch      |
 66 | | --------- | ------------ |
 67 | | x.shape   | x.shape      |
 68 | | x.strides | x.stride()   |
 69 | | x.ndim    | x.dim()      |
 70 | | x.data    | x.data       |
 71 | | x.size    | x.nelement() |
 72 | | x.dtype   | x.dtype      |
 73 | 
 74 | ## Indexing
 75 | 
 76 | | Numpy               | PyTorch                                  |
 77 | | ------------------- | ---------------------------------------- |
 78 | | x[0]                | x[0]                                     |
 79 | | x[:, 0]             | x[:, 0]                                  |
 80 | | x[indices]          | x[indices]                               |
 81 | | np.take(x, indices) | torch.take(x, torch.LongTensor(indices)) |
 82 | | x[x != 0]           | x[x != 0]                                |
 83 | 
 84 | ## Shape manipulation
 85 | 
 86 | | Numpy                                  | PyTorch                  |
 87 | | -------------------------------------- | ------------------------ |
 88 | | x.reshape                              | x.reshape; x.view        |
 89 | | x.resize()                             | x.resize_                |
 90 | |                                        | x.resize_as_             |
 91 | | x.transpose                            | x.transpose or x.permute |
 92 | | x.flatten                              | x.view(-1)               |
 93 | | x.squeeze()                            | x.squeeze()              |
 94 | | x[:, np.newaxis]; np.expand_dims(x, 1) | x.unsqueeze(1)           |
 95 | 
 96 | ## Item selection and manipulation
 97 | 
 98 | | Numpy                                                   | PyTorch                                                      |
 99 | | ------------------------------------------------------- | ------------------------------------------------------------ |
100 | | np.put                                                  |                                                              |
101 | | x.put                                                   | x.put_                                                       |
102 | | x = np.array([1, 2, 3])<br>x.repeat(2) # [1, 1, 2, 2, 3, 3] | x = torch.tensor([1, 2, 3])<br>x.repeat(2) # [1, 2, 3, 1, 2, 3]<br>x.repeat(2).reshape(2, -1).transpose(1, 0).reshape(-1) # [1, 1, 2, 2, 3, 3] |
103 | | np.tile(x, (3, 2))                                      | x.repeat(3, 2)                                               |
104 | | np.choose                                               |                                                              |
105 | | np.sort                                                 | sorted, indices = torch.sort(x, [dim])                       |
106 | | np.argsort                                              | sorted, indices = torch.sort(x, [dim])                       |
107 | | np.nonzero                                              | torch.nonzero                                                |
108 | | np.where                                                | torch.where                                                  |
109 | | x[::-1]                                                 |                                                              |
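
The `repeat` row is an easy place to trip, because the method of the same name behaves differently in the two libraries. The sketch below compares them side by side; `repeat_interleave` is offered as an assumption about what you may prefer over the reshape/transpose chain, not as something taken from the table.

```python
import numpy as np
import torch

a = np.array([1, 2, 3])
t = torch.tensor([1, 2, 3])

np_repeat = a.repeat(2)                     # element-wise repetition
torch_tile = t.repeat(2)                    # whole-tensor repetition, like np.tile
torch_interleave = t.repeat_interleave(2)   # element-wise, like np.repeat
print(np_repeat, torch_tile, torch_interleave)
```

In short, `torch.Tensor.repeat` corresponds to `np.tile`, while `torch.Tensor.repeat_interleave` corresponds to `np.repeat`.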
110 | 
111 | ## Calculation
112 | 
113 | | Numpy       | PyTorch                        |
114 | | ----------- | ------------------------------ |
115 | | x.min       | x.min                          |
116 | | x.argmin    | x.argmin                       |
117 | | x.max       | x.max                          |
118 | | x.argmax    | x.argmax                       |
119 | | x.clip      | x.clamp                        |
120 | | x.round     | x.round                        |
121 | | np.floor(x) | torch.floor(x); x.floor()      |
122 | | np.ceil(x)  | torch.ceil(x); x.ceil()        |
123 | | x.trace     | x.trace                        |
124 | | x.sum       | x.sum                          |
125 | | x.cumsum    | x.cumsum                       |
126 | | x.mean      | x.mean                         |
127 | | x.std       | x.std                          |
128 | | x.prod      | x.prod                         |
129 | | x.cumprod   | x.cumprod                      |
130 | | x.all       | (x == 1).sum() == x.nelement() |
131 | | x.any       | (x == 1).sum() > 0             |
132 | 
133 | ## Comparison
134 | 
135 | | Numpy            | PyTorch |
136 | | ---------------- | ------- |
137 | | np.less          | x.lt    |
138 | | np.less_equal    | x.le    |
139 | | np.greater       | x.gt    |
140 | | np.greater_equal | x.ge    |
141 | | np.equal         | x.eq    |
142 | | np.not_equal     | x.ne    |
143 | 
144 |  
145 | 
146 | 
147 | 
148 | # PyTorch and TensorFlow API cheat sheet
149 | |Operation |PyTorch |TensorFlow |NumPy|
150 | | ---------------- | ------- | ------- | ------- |
151 | |Clip |torch.clamp(x, min, max) |tf.clip_by_value(x, min, max) |np.clip(x, min, max)|
152 | |Minimum along a dimension |torch.min(x, dim)[0] |tf.reduce_min(x, axis) |np.min(x, axis)|
153 | |Element-wise maximum of two tensors |torch.max(x, y) |tf.maximum(x, y) |np.maximum(x, y)|
154 | |Element-wise minimum of two tensors |torch.min(x, y) |tf.minimum(x, y) |np.minimum(x, y)|
155 | |Index of the maximum |torch.max(x, dim)[1] |tf.argmax(x, axis) |np.argmax(x, axis)|
156 | |Index of the minimum |torch.min(x, dim)[1] |tf.argmin(x, axis) |np.argmin(x, axis)|
157 | |Compare (x > y) |torch.gt(x, y) |tf.greater(x, y) |np.greater(x, y)|
158 | |Compare (x < y) |torch.lt(x, y) |tf.less(x, y) |np.less(x, y)|
159 | |Compare (x == y) |torch.eq(x, y) |tf.equal(x, y) |np.equal(x, y)|
160 | |Compare (x != y) |torch.ne(x, y) |tf.not_equal(x, y) |np.not_equal(x, y)|
161 | |Indices where a condition holds |torch.nonzero(cond) |tf.where(cond) |np.where(cond)|
162 | |Concatenate tensors |torch.cat([x, y], dim) |tf.concat([x, y], axis) |np.concatenate([x, y], axis)|
163 | |Stack into one tensor |torch.stack([x1, x2], dim) |tf.stack([x1, x2], axis) |np.stack([x, y], axis)|
164 | |Split one tensor into several |torch.split(x1, split_size_or_sections, dim) |tf.split(x1, num_or_size_splits, axis) |np.split(x1, indices_or_sections, axis)|
165 | |- |torch.unbind(x1, dim) |tf.unstack(x1, axis) |NULL|
166 | |Shuffle |torch.randperm(n) (1) |tf.random_shuffle(x) |np.random.shuffle(x) (2); np.random.permutation(x) (3)|
167 | |Top k values |torch.topk(x, k, dim, largest, sorted) |tf.nn.top_k(x, k, sorted) |NULL|
168 | 
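169 | 1. This function only produces a random permutation of the integers 0 to n-1, so shuffle the indices first and then use the shuffled indices to pick out the data in shuffled order.
170 | 2. This function shuffles the array in place and returns nothing.
171 | 3. This function leaves the original unchanged and returns a shuffled copy.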
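
Note 1 describes shuffling by way of indices. A minimal sketch of that idea follows; the `data` tensor is a hypothetical example, and the seed is fixed only to make the run repeatable.

```python
import torch

torch.manual_seed(0)
data = torch.arange(10, 20)            # hypothetical data to shuffle
perm = torch.randperm(data.size(0))    # random permutation of 0..n-1
shuffled = data[perm]                  # index with the permuted indices
print(perm, shuffled)
```

Reusing the same `perm` on a second tensor (for example the labels) keeps samples and labels aligned after shuffling.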


--------------------------------------------------------------------------------
/7数据增强的方法.md:
--------------------------------------------------------------------------------
  1 | # Combining transforms with Compose***
  2 | 
  3 | transforms contains a number of common image transformations, which can be chained together with `Compose`
  4 | 
  5 | ```python
  6 | torchvision.transforms.Compose(transforms)
  7 | # Composes several transforms together.
  8 | # Parameters: transforms (list of Transform objects) - the list of transforms to compose.
  9 | 
 10 | >>> transforms.Compose([
 11 | >>>     transforms.CenterCrop(10),
 12 | >>>     transforms.ToTensor(),
 13 | >>> ])
 14 | ```
 15 | 
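 16 | This is mainly summarized from the official documentation, which merely lists the methods without grouping them and in no particular order. Here they are grouped into four categories for easier lookup:
 17 | **Crop**
 18 | - Center crop: transforms.CenterCrop
 19 | - Random crop: transforms.RandomCrop
 20 | - Random-size, random-aspect-ratio crop: transforms.RandomResizedCrop
 21 | - Four corners and center crop: transforms.FiveCrop
 22 | - Four corners and center crop plus their flipped versions: transforms.TenCrop
 23 | 
 24 | **Flip and Rotation**
 25 | - Horizontal flip with probability p: transforms.RandomHorizontalFlip(p=0.5)
 26 | - Vertical flip with probability p: transforms.RandomVerticalFlip(p=0.5)
 27 | - Random rotation: transforms.RandomRotation
 28 | 
 29 | **Image transforms**
 30 | - Resize: transforms.Resize
 31 | - Normalization: transforms.Normalize
 32 | - Convert to a tensor and scale to [0, 1]: transforms.ToTensor
 33 | - Padding: transforms.Pad
 34 | - Adjust brightness, contrast, and saturation: transforms.ColorJitter
 35 | - Convert to grayscale: transforms.Grayscale
 36 | - Linear transformation: transforms.LinearTransformation()
 37 | - Affine transformation: transforms.RandomAffine
 38 | - Convert to grayscale with probability p: transforms.RandomGrayscale
 39 | - Convert data to a PIL Image: transforms.ToPILImage
 40 | - transforms.Lambda: apply a user-defined lambda as a transform.
 41 | 
 42 | **Operations on transforms, for more flexible augmentation**
 43 | - transforms.RandomChoice(transforms): pick one transform at random from the given list and apply it
 44 | - transforms.RandomApply(transforms, p=0.5): apply the given list of transforms with probability p
 45 | - transforms.RandomOrder: apply the given transforms in a random order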
 46 | 
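 47 | # I. Crop
 48 | ## 1. Random crop: transforms.RandomCrop
 49 | ```
 50 | class torchvision.transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')
 51 | ```
 52 | 
 53 | Function: randomly crop the image to the given size.
 54 | Parameters:
 55 | size- (sequence or int): if a sequence, it is (h, w); if an int, the crop is (size, size).
 56 | padding- (sequence or int, optional): the number of pixels to pad.
 57 | If an int, that many pixels are padded on all four sides. For example, with padding=4 a $32 \times 32$ image becomes $40 \times 40$.
 58 | If a sequence of 2 numbers, the first is the left/right padding and the second is the top/bottom padding. If a sequence of 4 numbers, they are the left, top, right, and bottom padding.
 59 | fill- (int or tuple): the fill value (only used when the padding mode is constant). If an int, every channel is filled with that value; if a tuple of length 3, its entries are the fill values for the R, G, and B channels.
 60 | padding_mode- the padding mode, one of four: 1. constant, pad with a constant value. 2. edge, pad with the pixel values at the image edge. 3. reflect 4. symmetric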
 61 | 
 62 | ## 2. Center crop: transforms.CenterCrop***
 63 | ```
 64 | class torchvision.transforms.CenterCrop(size)
 65 | ```
 66 | 
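 67 | Function: crop the image at the center to the given size.
 68 | Parameters:
 69 | size- (sequence or int): the desired crop shape. If size is an int, a square crop is made; if it is a sequence like (h, w), a rectangular crop is made.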
 70 | 
 71 | ## 3. Random-size, random-aspect-ratio crop: transforms.RandomResizedCrop
 72 | ```
 73 | class torchvision.transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)
 74 | ```
 75 | 
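 76 | Function: crop the original image with a random size and a random aspect ratio, then resize the crop to the given size.
 77 | Parameters:
 78 | size- the output resolution
 79 | scale- the range of the random crop's size. For example, scale=(0.08, 1.0) means the cropped area is between 0.08 and 1.0 times the area of the original image.
 80 | ratio- the range of the random aspect ratio
 81 | interpolation- the interpolation method; the default is bilinear interpolation (PIL.Image.BILINEAR)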
 82 | 
 83 | ## 4. Four corners and center crop: transforms.FiveCrop
 84 | ```
 85 | class torchvision.transforms.FiveCrop(size)
 86 | ```
 87 | 
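 88 | Function: crop the four corners and the center of the image, giving 5 images. The result is a tuple of 5 images, not a single tensor; stack the crops (for example inside a transforms.Lambda) to obtain a 4D tensor.
 89 | Parameters:
 90 | size- (sequence or int): if a sequence, it is (h, w); if an int, the crop is (size, size).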
 91 | 
 92 | ## 5. Four corners and center crop plus flips: transforms.TenCrop
 93 | ```
 94 | class torchvision.transforms.TenCrop(size, vertical_flip=False)
 95 | 
 96 | ```
 97 | 
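 98 | Function: crop the four corners and the center of the image and also produce the flipped version of each crop (horizontal or vertical), giving 10 images. The result is a tuple of 10 images; stack them to obtain a 4D tensor.
 99 | Parameters:
100 | size- (sequence or int): if a sequence, it is (h, w); if an int, the crop is (size, size).
101 | vertical_flip (bool) - whether to flip vertically; the default is False, i.e. horizontal flipping.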
102 | 
103 | # II. Flip and Rotation
104 | ## 6. Horizontal flip with probability p: transforms.RandomHorizontalFlip
105 | ```
106 | class torchvision.transforms.RandomHorizontalFlip(p=0.5)
107 | ```
108 | 
109 | Function: flip the PIL image horizontally with probability p.
110 | Parameters:
111 | p- the probability; the default is 0.5
112 | 
113 | ## 7. Vertical flip with probability p: transforms.RandomVerticalFlip
114 | ```
115 | class torchvision.transforms.RandomVerticalFlip(p=0.5)
116 | ```
117 | 
118 | Function: flip the PIL image vertically with probability p.
119 | Parameters:
120 | p- the probability; the default is 0.5
121 | 
122 | ## 8. Random rotation: transforms.RandomRotation
123 | ```
124 | class torchvision.transforms.RandomRotation(degrees, resample=False, expand=False, center=None)
125 | ```
126 | 
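127 | Function: rotate the image by a random angle drawn from degrees.
128 | Parameters:
129 | degrees- (sequence or float or int): if a single number such as 30, the angle is drawn from (-30, +30).
130 | If a sequence such as (30, 60), the angle is drawn from between 30 and 60 degrees.
131 | resample- the resampling method, one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC; the default is nearest neighbor.
132 | expand- (bool, optional): if True, the output is enlarged so that it holds the whole rotated image; if False, the output keeps the input size.
133 | center- the optional center of rotation, with the origin at the upper-left corner; the default is the center of the image.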
134 | 
135 | # III. Image transforms
136 | ## 9. Resize: transforms.Resize
137 | ```
138 | class torchvision.transforms.Resize(size, interpolation=2)
139 | ```
140 | 
141 | Function: resize the image to the given resolution.
142 | Parameters:
143 | 
144 | - **size** (*sequence* *or* [*int*](https://docs.python.org/3/library/functions.html#int)) – the desired output size. If size is a sequence like (h, w), the output has exactly that size. If size is an int, the ==smaller edge== of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size), which preserves the original aspect ratio.
145 | - interpolation- the interpolation method; the default is PIL.Image.BILINEAR
146 | 
147 | ## 10. Normalization: transforms.Normalize***
148 | ```
149 | torchvision.transforms.Normalize(mean, std, inplace=False)
150 | ```
151 | 
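152 | Function: normalize the data channel by channel, i.e. subtract the mean and then divide by the standard deviation.
153 | 
154 | Normalizes the input image with a mean and a standard deviation. Given the means `(M1,...,Mn)` and standard deviations `(S1,..,Sn)` for `n` channels, this transform normalizes each channel of a `torch.*Tensor`, i.e. `input[channel] = (input[channel] - mean[channel]) / std[channel]`.
155 | 
156 | 
157 | 
158 | The image Tensor to be normalized must have shape (C, H, W).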
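
The per-channel formula can be reproduced with plain tensor broadcasting. This is a sketch of the arithmetic only, not torchvision's implementation; the image size and the mean/std values are assumptions picked for illustration.

```python
import torch

image = torch.rand(3, 4, 5)                 # hypothetical (C, H, W) image in [0, 1]
mean = torch.tensor([0.5, 0.5, 0.5])        # example per-channel mean
std = torch.tensor([0.25, 0.25, 0.25])      # example per-channel std

# Reshape to (C, 1, 1) so that each channel uses its own mean and std.
normalized = (image - mean[:, None, None]) / std[:, None, None]
print(normalized.shape)
```

With these example values an input in [0, 1] ends up in [-2, 2], which is why Normalize is usually applied after the image has been converted to a tensor.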
159 | 
160 | ## 11. Convert to a tensor: transforms.ToTensor***
161 | ```
162 | class torchvision.transforms.ToTensor
163 | ```
164 | 
165 | 
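166 | Function: convert a PIL Image or an ndarray to a tensor, scaled to [0, 1].
167 | 
168 | ==Converts a PIL Image or a numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].== The input should be a PIL Image in one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1), or a numpy.ndarray with dtype np.uint8.
169 | 
170 | Note: the scaling to [0, 1] is a plain division by 255. If your ndarray data uses a different scale, you need to adjust it yourself.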
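
The conversion described above amounts to a channel reordering followed by a division by 255. The sketch below does this by hand for a uint8 array; it illustrates the arithmetic and is not torchvision's own code, and the random 2 x 3 image is a made-up input.

```python
import numpy as np
import torch

# Hypothetical 2 x 3 RGB image stored as (H, W, C) uint8, as an image library would give it.
hwc = np.random.randint(0, 256, size=(2, 3, 3), dtype=np.uint8)

# Move channels first, convert to float32, and scale [0, 255] to [0.0, 1.0].
chw = torch.from_numpy(hwc).permute(2, 0, 1).float().div(255)
print(chw.shape, chw.dtype)
```

If your array is already floating point or uses another value range, this division is the step you would need to replace with your own scaling.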
171 | 
172 | ## 12. Padding: transforms.Pad
173 | ```
174 | class torchvision.transforms.Pad(padding, fill=0, padding_mode='constant')
175 | ```
176 | 
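177 | Function: pad the image.
178 | Parameters:
179 | 
180 | - padding- (sequence or int): the number of pixels to pad.
181 |     If an int, that many pixels are padded on all four sides. For example, with padding=4 a $32 \times 32$ image becomes $40 \times 40$.
182 |     If a sequence of 2 numbers, the first is the left/right padding and the second is the top/bottom padding. If a sequence of 4 numbers, they are the left, top, right, and bottom padding.
183 | - fill- (int or tuple): the fill value (only used when the padding mode is constant). If an int, every channel is filled with that value; if a tuple of length 3, its entries are the fill values for the R, G, and B channels.
184 | - padding_mode- the padding mode, one of four: constant, edge, reflect, symmetric.
185 |     - constant: pad with a constant value, given by the fill argument.
186 |     - edge: pad with the values at the edge of the image.
187 |     - reflect: pad with a reflection of the image about the edge, without repeating the edge value. For example, padding [1, 2, 3, 4] with 2 elements on both sides gives [3, 2, 1, 2, 3, 4, 3, 2].
188 |     - symmetric: pad with a reflection of the image that repeats the edge value. For example, padding [1, 2, 3, 4] with 2 elements on both sides gives [2, 1, 1, 2, 3, 4, 4, 3].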
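
The four padding modes are easiest to tell apart on a one-dimensional row. The sketch below uses NumPy's `np.pad`, whose modes of the same names are assumed here to follow the same conventions as the ones described above; it is an illustration of the modes, not a call into torchvision.

```python
import numpy as np

row = np.array([1, 2, 3, 4])
padded = {mode: np.pad(row, 2, mode=mode).tolist()
          for mode in ("constant", "edge", "reflect", "symmetric")}
for mode, values in padded.items():
    print(mode, values)
```

The `reflect` and `symmetric` results match the two worked examples in the list: the first mirrors around the edge value, the second mirrors including it.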
189 | 
190 | ## 13. Adjust brightness, contrast, and saturation: transforms.ColorJitter
191 | ```
192 | class torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
193 | ```
194 | 
195 | 
196 | Function: randomly change the brightness, contrast, saturation, and hue of an image.
197 | 
198 | ## 14. Convert to grayscale: transforms.Grayscale
199 | ```
200 | class torchvision.transforms.Grayscale(num_output_channels=1)
201 | ```
202 | 
203 | 
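204 | Function: convert the image to grayscale.
205 | Parameters:
206 | num_output_channels- (int): if 1, the result is an ordinary single-channel grayscale image; if num_output_channels == 3, a 3-channel image with r == g == b is returned.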
207 | 
208 | ## 15. Linear transformation: transforms.LinearTransformation()
209 | ```
210 | class torchvision.transforms.LinearTransformation(transformation_matrix)
211 | ```
212 | 
213 | 
214 | Function: apply a linear transformation to the data; it can be used for whitening. Whitening: zero-center the data, compute the data covariance matrix
215 | Parameters:
216 | transformation_matrix (Tensor) – tensor [D x D], D = C x H x W
217 | 
218 | ## 16. Affine transformation: transforms.RandomAffine
219 | ```
220 | class torchvision.transforms.RandomAffine(degrees, translate=None, scale=None, shear=None, resample=False, fillcolor=0)
221 | ```
222 | 
223 | 
224 | Function: random affine transformation.
225 | 
226 | ## 17. Convert to grayscale with probability p: transforms.RandomGrayscale
227 | ```
228 | class torchvision.transforms.RandomGrayscale(p=0.1)
229 | ```
230 | 
231 | 
232 | Function: convert the image to grayscale with probability p. If the input has 3 channels, the output is 3-channel with r == g == b.
233 | 
234 | ## 18. Convert data to a PIL Image: transforms.ToPILImage
235 | ```
236 | class torchvision.transforms.ToPILImage(mode=None)
237 | ```
238 | 
239 | 
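240 | Function: convert a tensor or an ndarray to a PIL Image.
241 | Parameters:
242 | mode- when None, the mode is inferred from the input: 1-channel input is handled according to its data type, 3-channel input is assumed to be RGB, and 4-channel input is assumed to be RGBA.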
243 | 
244 | ## 19.transforms.Lambda
245 | Apply a user-defined lambda as a transform.
246 | 
247 | 
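248 | # IV. Operations on transforms, for more flexible augmentation
249 | Besides the image operations themselves, PyTorch also lets you randomly choose among and combine these operations.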
250 | 
251 | ## 20.transforms.RandomChoice(transforms)
252 | Function: apply a single transform picked at random from the given list.
253 | 
254 | ## 21.transforms.RandomApply(transforms, p=0.5)
255 | Function: apply the given list of transforms with probability p.
256 | 
257 | ## 22.transforms.RandomOrder
258 | 
259 | ```
260 | class torchvision.transforms.RandomOrder(transforms)
261 | ```
262 | 
263 | Function: apply the given transforms in a random order.
264 | 
265 | 
266 | 
267 | # Example
268 | 
269 | ```python
270 | from PIL import Image
271 | from torchvision import transforms as tfs
272 | # Read an image
273 | im = Image.open('./cat.png')
274 | # Resize
275 | print('before scale, shape: {}'.format(im.size))
276 | new_im = tfs.Resize((100, 200))(im)
277 | print('after scale, shape: {}'.format(new_im.size))
278 | # before scale, shape: (224, 224)
279 | # after scale, shape: (200, 100)
280 | 
281 | # Randomly crop a 150 x 100 (h x w) region
282 | random_im2 = tfs.RandomCrop((150, 100))(im)
283 | # Crop a 100 x 100 region at the center
284 | center_im = tfs.CenterCrop(100)(im)
285 | # Random horizontal flip
286 | h_flip = tfs.RandomHorizontalFlip()(im)
287 | # Random vertical flip
288 | v_flip = tfs.RandomVerticalFlip()(im)
289 | # Brightness
290 | bright_im = tfs.ColorJitter(brightness=1)(im) # brightness factor drawn at random from 0 to 2; 1 is the original image
291 | # Contrast
292 | contrast_im = tfs.ColorJitter(contrast=1)(im) # contrast factor drawn at random from 0 to 2; 1 is the original image
293 | # Hue
294 | color_im = tfs.ColorJitter(hue=0.5)(im) # hue shift drawn at random from -0.5 to 0.5
295 | 
296 | 
297 | im_aug = tfs.Compose([
298 |     tfs.Resize(120),
299 |     tfs.RandomHorizontalFlip(),
300 |     tfs.RandomCrop(96),
301 |     tfs.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5)
302 | ])
303 | import matplotlib.pyplot as plt
304 | 
305 | nrows = 3
306 | ncols = 3
307 | figsize = (8, 8)
308 | _, figs = plt.subplots(nrows, ncols, figsize=figsize)
309 | for i in range(nrows):
310 |     for j in range(ncols):
311 |         figs[i][j].imshow(im_aug(im))
312 |         figs[i][j].axes.get_xaxis().set_visible(False)
313 |         figs[i][j].axes.get_yaxis().set_visible(False)
314 | plt.show()
315 | ```
316 | 
317 | 
318 | 
319 | ![Cat data augmentation](assets/猫数据增强.png)


--------------------------------------------------------------------------------
/pytorch套路.md:
--------------------------------------------------------------------------------
  1 | ```python
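  2 | # loss.backward() computes the gradients; the parameters can also be updated by hand instead of with an optimizer
  3 | # optimizer.step() merely uses the gradients computed by loss.backward() to update the parameters
  4 | # Only the model, the training data, and the target labels need .to(device)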
  5 | 
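  6 | import random
  7 | import numpy as np
  8 | import torch
  9 | import torch.nn as nn # implementations of the various layer types
 10 | import torch.nn.functional as F # functional counterparts of the layer types, e.g. convolution, pooling, and normalization functions
 11 | # Whether a GPU is available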
 12 | USE_CUDA = torch.cuda.is_available()
 13 | 
 14 | # To make experiments reproducible, the various random seeds are usually fixed to a given value
 15 | 
 16 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 17 | 
 18 | random.seed(53113)
 19 | np.random.seed(53113)
 20 | torch.manual_seed(53113)
 21 | if USE_CUDA:
 22 |     torch.cuda.manual_seed(53113)
 23 |     
 24 | # A custom model class must inherit from nn.Module and override at least __init__ and forward
 25 | 
 26 | class TwoLayerNet(torch.nn.Module):
 27 |     def __init__(self, D_in, H, D_out):
 28 |         """
 29 |         In the constructor we instantiate two nn.Linear modules and assign them as member variables.
 30 |         """
 31 |         super(TwoLayerNet, self).__init__()
 32 |         # Initialize the parent class
 33 |         
 34 |         self.linear1 = torch.nn.Linear(D_in, H)
 35 |         self.linear2 = torch.nn.Linear(H, D_out)
 36 | 
 37 |     def forward(self, x):
 38 |         """
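 39 |         In the forward function we accept a Tensor of input data and must return a Tensor of output data.
 40 |         We can use the modules defined in the constructor as well as arbitrary (differentiable) operations on Tensors.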
 41 |         """
 42 |         h_relu = self.linear1(x).clamp(min=0)
 43 |         y_pred = self.linear2(h_relu)
 44 |         return y_pred
 45 | 
 46 | # N is batch size; D_in is input dimension;
 47 | # H is hidden dimension; D_out is output dimension.
 48 | N, D_in, H, D_out = 64, 1000, 100, 10
 49 | 
 50 | # Inputs and outputs
 51 | x = torch.randn(N, D_in, device=device)
 52 | y = torch.randn(N, D_out, device=device)
 53 | 
 54 | # Create the model
 55 | model = TwoLayerNet(D_in, H, D_out)
 56 | model = model.to(device)
 57 | 
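 58 | # Construct the loss function and the optimizer.
 59 | # The call to model.parameters() in the SGD constructor contains the learnable parameters of the two nn.Linear modules that are members of the model.
 60 | # Loss function
 61 | loss_fn = torch.nn.MSELoss(reduction='sum')
 62 | # The optimizer does not need .to(device)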
 63 | optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
 64 | for t in range(500):
 65 |     # Forward pass: feed in the data and compute the output
 66 |     y_pred = model(x)
 67 | 
 68 |     # Compute and print loss
 69 |     # Call the loss function to compute the loss
 70 |     loss = loss_fn(y_pred, y)
 71 |     print(t, loss.item())
 72 | 
 73 |     # Zero gradients, perform a backward pass, and update the weights.
 74 |     # Clear the gradients of all optimized parameters
 75 |     optimizer.zero_grad()
 76 |     # Backward pass
 77 |     loss.backward()
 78 |     # Update the parameters
 79 |     optimizer.step()
 80 |     
 81 | #测试时不用计算梯度
 82 | #with torch.no_grad(): 
 83 | # 禁用梯度计算
 84 | ```
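
The closing comments of the block above mention `torch.no_grad()` for test time. A minimal sketch of that pattern follows; the tiny `nn.Linear` model and random input are stand-ins chosen for illustration, not the `TwoLayerNet` trained above.

```python
import torch

# Hypothetical stand-in model and data, used only to illustrate evaluation.
model = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)

model.eval()               # put layers such as dropout/batchnorm into eval behaviour
with torch.no_grad():      # no autograd graph is recorded inside this block
    y_eval = model(x)

y_train = model(x)         # outside the block, autograd tracks the computation again

print(y_eval.requires_grad)   # False
print(y_train.requires_grad)  # True
```

Skipping the graph saves memory and time during evaluation; `model.eval()` and `torch.no_grad()` are independent switches and are usually used together.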
 85 | 
 86 | 
 87 | 
 88 | 
 89 | ​    
 90 | 
 91 | # CNN-LeNet5
 92 | ```python
 93 | import torch.nn as nn
 94 | import torch.nn.functional as F
 95 | class LeNet5(nn.Module):
 96 |     def __init__(self):
 97 |         super(LeNet5, self).__init__()
 98 |         # 1 input image channel, 6 output channels, 5x5 square convolution
 99 |         # kernel
100 |         self.conv1 = nn.Conv2d(1, 6, 5)
101 |         self.conv2 = nn.Conv2d(6, 16, 5)
102 |         # an affine operation: y = Wx + b
103 |         self.fc1 = nn.Linear(16 * 4 * 4, 120) # the paper uses a convolution here; the official tutorial uses a linear layer
104 |         self.fc2 = nn.Linear(120, 84)
105 |         self.fc3 = nn.Linear(84, 10)
106 | 
107 |     def forward(self, x):
108 |         # Max pooling over a (2, 2) window
109 |         x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
110 |         # If the size is a square you can only specify a single number
111 |         x = F.max_pool2d(F.relu(self.conv2(x)), 2)
112 |         x = x.view(-1, self.num_flat_features(x))
113 |         x = F.relu(self.fc1(x))
114 |         x = F.relu(self.fc2(x))
115 |         x = self.fc3(x)
116 |         return x
117 | 
118 |     def num_flat_features(self, x):
119 |         size = x.size()[1:]  # all dimensions except the batch dimension
120 |         num_features = 1
121 |         for s in size:
122 |             num_features *= s
123 |         return num_features
124 |     
125 | 
126 | ```
127 | 
128 |     net = LeNet5()
129 |     print(net)
130 |     LeNet5(
131 |       (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
132 |       (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
133 |       (fc1): Linear(in_features=256, out_features=120, bias=True)
134 |       (fc2): Linear(in_features=120, out_features=84, bias=True)
135 |       (fc3): Linear(in_features=84, out_features=10, bias=True)
136 |     )
137 | 
138 | # Complete CNN
139 | 
140 | ## Defining the CNN
141 | 
142 | ```python
143 | import torch
144 | import torch.nn as nn
145 | import torch.nn.functional as F
146 | import torch.optim as optim
147 | from torchvision import datasets, transforms
148 | 
149 | 
150 | class Net(nn.Module):
151 |     def __init__(self):
152 |         super(Net, self).__init__()
153 |         self.conv1 = nn.Conv2d(1, 20, 5, 1) 
154 |         # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1)
155 |         # in_channels: number of input channels; 1 for grayscale handwritten digits, 3 for color images
156 |         # out_channels: number of output channels, equal to the number of convolution kernels
157 |         # kernel_size: size of the convolution kernel
158 |         # stride: step size
159 |          
160 |         self.conv2 = nn.Conv2d(20, 50, 5, 1)
161 |         # out_channels of the previous conv layer is in_channels of the next one, hence 20
162 |         # out_channels: 50 convolution kernels
163 |         
164 |         
165 |         self.fc1 = nn.Linear(4*4*50, 500)
166 |         # Fully connected layer: torch.nn.Linear(in_features, out_features)
167 |         # in_features: input feature dimension; 4*4*50 is worked out by hand and depends on the input image size
168 |         # out_features: output feature dimension
169 |         
170 |         self.fc2 = nn.Linear(500, 10)
171 |         # Output dimension 10, for 10-class classification
172 | 
173 |     def forward(self, x):  
174 |         #print(x.shape)  # input shape for handwritten digits: (N,1,28,28), N is batch_size
175 |         x = F.relu(self.conv1(x)) # x = (N,20,24,24)
176 |         x = F.max_pool2d(x, 2, 2) # x = (N,20,12,12)
177 |         x = F.relu(self.conv2(x)) # x = (N,50,8,8)
178 |         x = F.max_pool2d(x, 2, 2) # x = (N,50,4,4)
179 |         x = x.view(-1, 4*4*50)    # x = (N,4*4*50)
180 |         x = F.relu(self.fc1(x))   # x = (N,4*4*50)*(4*4*50, 500)=(N,500)
181 |         x = self.fc2(x)           # x = (N,500)*(500, 10)=(N,10)
182 |         return F.log_softmax(x, dim=1)  # log-softmax over the classes: 10 log-probabilities per image
183 | ```
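
The `4*4*50` input size of `fc1` has to be worked out from the image size. A small sketch of that arithmetic, assuming no padding and the usual floor-division output-size formula; the helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 28                      # MNIST images are 28x28
size = conv_out(size, 5)       # conv1, 5x5 kernel -> 24
size = pool_out(size, 2, 2)    # 2x2 max pool      -> 12
size = conv_out(size, 5)       # conv2, 5x5 kernel -> 8
size = pool_out(size, 2, 2)    # 2x2 max pool      -> 4

flat_features = size * size * 50   # 50 channels after conv2
print(size, flat_features)         # 4 800
```

The same calculation with 16 output channels gives the `16 * 4 * 4` used by the LeNet5 variant earlier.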
184 | 
185 | 
186 | 
187 | Definition of the NLL loss:
188 | $$
189 | \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
190 |         l_n = - w_{y_n} x_{n,y_n}, \quad
191 |         w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore\_index}\}
192 | $$
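
The formula above can be traced by hand. The following is a plain-Python sketch of the unreduced loss, not PyTorch's implementation; the function name `nll_loss_unreduced` and the sample numbers are assumptions made for illustration, while `-100` mirrors the default `ignore_index` of `F.nll_loss`.

```python
import math


def nll_loss_unreduced(log_probs, targets, weight=None, ignore_index=-100):
    """Per-sample losses l_n = -w[y_n] * x[n][y_n]; ignored targets contribute 0."""
    num_classes = len(log_probs[0])
    if weight is None:
        weight = [1.0] * num_classes
    losses = []
    for row, y in zip(log_probs, targets):
        if y == ignore_index:
            losses.append(0.0)          # indicator 1{c != ignore_index} is 0
        else:
            losses.append(-weight[y] * row[y])
    return losses


# Three samples, three classes; each row holds log-probabilities.
log_probs = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.8), math.log(0.1)],
    [math.log(0.3), math.log(0.3), math.log(0.4)],
]
targets = [0, 1, -100]       # the last sample is ignored
weight = [1.0, 2.0, 1.0]     # class 1 counts twice

losses = nll_loss_unreduced(log_probs, targets, weight)
print(losses)
```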
193 | 
194 | ## Defining the training function
195 | ```python
196 | def train(model, device, train_loader, optimizer, epoch, log_interval=100):
197 |     model.train() # switch to training mode
198 |     for batch_idx, (data, target) in enumerate(train_loader):
199 |         data, target = data.to(device), target.to(device)
200 |         optimizer.zero_grad() # reset the gradients to zero
201 |         output = model(data)  # output shape [N,10]; data is the x argument of forward
202 |         loss = F.nll_loss(output, target) # this loss is the mean over the batch
203 | # F.nll_loss(F.log_softmax(input), target):
204 | # single-label cross-entropy loss: each image belongs to exactly one class, and the input needs softmax
205 | # there is also a multi-label loss, where an image can have several classes and the input needs sigmoid
206 |         
207 |         loss.backward()
208 |         optimizer.step()
209 |         if batch_idx % log_interval == 0:
210 |             print("Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
211 |                 epoch, 
212 |                 batch_idx * len(data), # e.g. 100*32
213 |                 len(train_loader.dataset), # 60000
214 |                 100. * batch_idx / len(train_loader), # len(train_loader)=60000/32=1875
215 |                 loss.item()
216 |             ))
217 |             #print(len(train_loader))
218 | ```
219 | 
220 | 
221 | ## Defining the test function
222 | 
223 | ```python
224 | def test(model, device, test_loader):
225 |     model.eval() # switch to evaluation mode
226 |     test_loss = 0
227 |     correct = 0
228 |     with torch.no_grad():
229 |         for data, target in test_loader:
230 |             data, target = data.to(device), target.to(device)
231 |             output = model(data) 
232 |             test_loss += F.nll_loss(output, target, reduction='sum').item()
233 |             # sum up batch loss
234 |             # reduction='sum' adds up the loss of every element in the batch; the default 'mean' averages
235 |                        
236 |             pred = output.argmax(dim=1, keepdim=True) 
237 |             # get the index of the max log-probability
238 |             
239 |             #print(target.shape) #torch.Size([32])
240 |             #print(pred.shape) #torch.Size([32, 1])
241 |             correct += pred.eq(target.view_as(pred)).sum().item()
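242 |             # pred and target have different shapes, so target is reshaped with view_as
243 |             # pred.eq() gives True where equal and False otherwise; the result has shape (32, 1).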
244 | 
245 |     test_loss /= len(test_loader.dataset)
246 | 
247 |     print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
248 |         test_loss, correct, len(test_loader.dataset),
249 |         100. * correct / len(test_loader.dataset)))
250 | ```
251 | ## Training and testing
252 | 
253 | ```python
254 | torch.manual_seed(53113)
255 | 
256 | use_cuda = torch.cuda.is_available()
257 | device = torch.device("cuda" if use_cuda else "cpu")
258 | 
259 | batch_size = test_batch_size = 32
260 | kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
261 | train_loader = torch.utils.data.DataLoader(
262 |     datasets.MNIST('./mnist_data', train=True, download=True,
263 |                    transform=transforms.Compose([
264 |                        transforms.ToTensor(),
265 |                        transforms.Normalize((0.1307,), (0.3081,))
266 |                    ])),
267 |     batch_size=batch_size, shuffle=True, **kwargs)
268 | test_loader = torch.utils.data.DataLoader(
269 |     datasets.MNIST('./mnist_data', train=False, transform=transforms.Compose([
270 |                        transforms.ToTensor(),
271 |                        transforms.Normalize((0.1307,), (0.3081,))
272 |                    ])),
273 |     batch_size=test_batch_size, shuffle=True, **kwargs)
274 | 
275 | 
276 | lr = 0.01
277 | momentum = 0.5
278 | model = Net().to(device)
279 | optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
280 | 
281 | epochs = 2
282 | for epoch in range(1, epochs + 1):
283 |     train(model, device, train_loader, optimizer, epoch)
284 |     test(model, device, test_loader)
285 | 
286 | save_model = True
287 | if (save_model):
288 |     torch.save(model.state_dict(),"mnist_cnn.pt")
289 | ```
290 | 
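291 | ## Loss function details
292 | 
293 | The logits passed to cross_entropy are raw outputs that have not gone through a softmax layer.
294 | 
295 | The label is a single class index, not the corresponding one-hot vector.
296 | $$
297 | \text{loss}(x, \text{class}) = -\log\left(\frac{\exp(x[\text{class}])}{\sum_j \exp(x[j])}\right)
298 |                = -x[\text{class}] + \log\left(\sum_j \exp(x[j])\right)
299 | $$
300 | nll_loss, by contrast, takes the output after softmax and log have been applied: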
301 | 
302 | ```
303 | out=F.log_softmax(out,dim=1)
304 | ```
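
A quick numeric check of the relationship described above: cross-entropy on raw logits equals the NLL loss applied to log-softmax outputs. This is a plain-Python sketch with hypothetical helper names, not the library code; the logits are the three-class example values used elsewhere in these notes.

```python
import math


def log_softmax(x):
    """Numerically stable log-softmax of a list of logits."""
    m = max(x)
    lse = m + math.log(sum(math.exp(v - m) for v in x))
    return [v - lse for v in x]


def cross_entropy(x, c):
    """-x[c] + log(sum_j exp(x[j])), the second form of the formula."""
    m = max(x)
    return -x[c] + m + math.log(sum(math.exp(v - m) for v in x))


def nll(log_probs, c):
    """Negative log-likelihood of class c given log-probabilities."""
    return -log_probs[c]


logits = [-1.233, 2.657, 0.534]
label = 2

ce = cross_entropy(logits, label)
via_nll = nll(log_softmax(logits), label)
direct = -math.log(math.exp(logits[label]) / sum(math.exp(v) for v in logits))

print(ce, via_nll, direct)  # all three agree up to rounding
```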
305 | 
306 | ```
307 | torch.nn.CrossEntropyLoss
308 | Applies log-softmax to the input and then computes the negative log-likelihood against the target,
309 | i.e. it combines nn.LogSoftmax() and nn.NLLLoss().
310 | 
311 | The target is a class index rather than the corresponding one-hot vector.
312 | 
313 | torch.nn.NLLLoss
314 | loss(input, class) = -input[class]. For example, in a three-class task with
315 | input=[-1.233, 2.657, 0.534] and true label 2 (class=2), the loss is -0.534.
316 | ```
317 | 
318 | 
319 | 
320 | | torch.nn         | torch.nn.functional (F) |
321 | | ---------------- | ----------------------- |
322 | | CrossEntropyLoss | cross_entropy           |
323 | | LogSoftmax       | log_softmax             |
324 | | NLLLoss          | nll_loss                |
325 | 
326 | 
327 | 
328 | ```python
329 | import torch
330 | import torch.utils.data as Data
331 | 
332 | BATCH_SIZE = 5                      # assumed value; the original snippet leaves it undefined
333 | x = torch.linspace(1, 10, 10)       # this is x data (torch tensor)
334 | y = torch.linspace(10, 1, 10)       # this is y data (torch tensor)
335 | torch_dataset = Data.TensorDataset(x, y) # y is the target
336 | loader = Data.DataLoader(
337 |     dataset=torch_dataset,      # torch TensorDataset format
338 |     batch_size=BATCH_SIZE,      # mini batch size
339 |     shuffle=True,               # reshuffle the data at every epoch
340 |     num_workers=2,              # subprocesses for loading data
341 | )
342 | ```


--------------------------------------------------------------------------------
/9pytorch读取数据集.md:
--------------------------------------------------------------------------------
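  1 | ```python
  2 | import numpy as np
  3 | import pandas as pd
  4 | from PIL import Image
  5 | from torch.utils.data import Dataset
  6 | # Custom Dataset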
  7 | class MyMNIST(Dataset):
  8 |     def __init__(self, csv_file: str, train=False, transform=None):
  9 |         self.train = train
 10 |         self.transform = transform
 11 |         if self.train:
 12 |             train_df = pd.read_csv(csv_file)
 13 |             self.train_labels = train_df.iloc[:, 0].values
 14 |             self.train_data = train_df.iloc[:, 1:].values.reshape((-1, 28, 28))
 15 |         else:
 16 |             test_df = pd.read_csv(csv_file)
 17 |             self.test_data = test_df.values.reshape((-1, 28, 28))
 18 |     
 19 |     def __len__(self):
 20 |         if self.train:
 21 |             return len(self.train_data)
 22 |         else:
 23 |             return len(self.test_data)
 24 |         
 25 |     def __getitem__(self, index):
 26 |         if self.train:
 27 |             image, label = self.train_data[index], self.train_labels[index]
 28 |         else:
 29 |             image = self.test_data[index]
 30 |         image = Image.fromarray(image.astype(np.uint8))
 31 |         if self.transform is not None:
 32 |             image = self.transform(image)
 33 |         if self.train:
 34 |             return image, label
 35 |         else:
 36 |             return image
 37 | ```
 38 | 
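 39 | # Plain neural network code
 40 | 
 41 | The pixel values must be divided by 255 here, whereas the CNN version does not need this because ToTensor() divides by 255 automatically.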
 42 | 
 43 | ```python
 44 | import os
 45 | import torch
 46 | import torch.nn as nn
 47 | import pandas as pd
 48 | from skimage import io, transform
 49 | import numpy as np
 50 | import matplotlib.pyplot as plt
 51 | from torch.utils.data import Dataset, DataLoader
 52 | 
 53 | class MyData(Dataset):
 54 |     """MNIST CSV dataset: the first column is the label, the remaining columns are pixel values."""
 55 | 
 56 |     def __init__(self, csv_file, transform=None):
 57 |         """
 58 |         Args:
 59 |             csv_file (string): Path to the CSV file without a header row;
 60 |                 each row holds a label followed by 784 pixel values.
 61 |             transform (callable, optional): Accepted but not used
 62 |                 by this class.
 63 |         """
 64 |         self.csv_data = pd.read_csv(csv_file,header=None)
 65 |         self.xdata = self.csv_data.iloc[:,1:].values/255.
 66 |         self.ydata = self.csv_data.iloc[:,0].values
 67 | 
 68 |     def __len__(self):
 69 |         return len(self.csv_data)
 70 | 
 71 |     def __getitem__(self, idx):
 72 |         x = self.xdata[idx]
 73 |         y = self.ydata[idx]
 74 | 
 75 |         return x, y
 76 | 
 77 |         # return self.xdata[idx],self.ydata[idx]
 78 | # 
 79 | 
 80 | # initialize the paths to our training and testing CSV files
 81 | TRAIN_CSV = "../data/Mnist/mnist_train.csv"
 82 | TEST_CSV = "../data/Mnist/mnist_test.csv"
 83 | 
 84 | # initialize the number of epochs to train for and batch size
 85 | batch_size = 32
 86 | 
 87 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
 88 | 
 89 | # Hyper-parameters 
 90 | input_size = 784
 91 | hidden_size = 500
 92 | num_classes = 10
 93 | num_epochs = 1
 94 | learning_rate = 0.001
 95 | 
 96 | # initialize both the training and testing image generators
 97 | train_loader = DataLoader(dataset=MyData(TRAIN_CSV),
 98 |                         batch_size=batch_size, 
 99 |                        shuffle=True)
100 | test_loader = DataLoader(dataset=MyData(TEST_CSV), 
101 |                         batch_size=batch_size, 
102 |                        shuffle=True)
103 | # Fully connected neural network with one hidden layer
104 | class NeuralNet(nn.Module):
105 |     def __init__(self, input_size, hidden_size, num_classes):
106 |         super(NeuralNet, self).__init__()
107 |         self.fc1 = nn.Linear(input_size, hidden_size) 
108 |         self.relu = nn.ReLU()
109 |         self.fc2 = nn.Linear(hidden_size, num_classes)  
110 |     
111 |     def forward(self, x):
112 |         out = self.fc1(x)
113 |         out = self.relu(out)
114 |         out = self.fc2(out)
115 |         return out
116 | 
117 | model = NeuralNet(input_size, hidden_size, num_classes).to(device)
118 | 
119 | # Loss and optimizer
120 | criterion = nn.CrossEntropyLoss()
121 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  
122 | 
123 | # Train the model
124 | total_step = len(train_loader)
125 | for epoch in range(num_epochs):
126 |     for i, (images, labels) in enumerate(train_loader):  
127 |         # Move tensors to the configured device
128 |         images = images.type(torch.FloatTensor)
129 |         # print(images.max()) 最大值为1
130 | 
131 |         images = images.reshape(-1, 28*28).to(device)
132 |         labels = labels.to(device)
133 |         
134 |         # Forward pass
135 |         outputs = model(images)
136 |         loss = criterion(outputs, labels)
137 |         
138 |         # Backward and optimize
139 |         optimizer.zero_grad()
140 |         loss.backward()
141 |         optimizer.step()
142 |         
143 |         if (i+1) % 100 == 0:
144 |             print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
145 |                    .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
146 | 
147 | with torch.no_grad():
148 |     correct = 0
149 |     total = 0
150 |     for images, labels in test_loader:
151 |         images = images.type(torch.FloatTensor)
152 |         images = images.reshape(-1, 28*28).to(device)
153 |         labels = labels.to(device)
154 |         outputs = model(images)
155 |         _, predicted = torch.max(outputs.data, 1)
156 |         total += labels.size(0)
157 |         correct += (predicted == labels).sum().item()
158 | 
159 |     print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))
160 | 
161 | ```
162 | 
163 | 
164 | 
165 | 
166 | 
167 | # Complete CNN code
168 | 
169 | ```python
170 | import torch
171 | import pandas as pd 
172 | import numpy as np 
173 | import torch.nn as nn
174 | import torch.nn.functional as F
175 | import torch.optim as optim
176 | from torch.optim import lr_scheduler
177 | from torch.utils.data import DataLoader, Dataset
178 | from torchvision import transforms
179 | from torchvision.utils import make_grid
180 | 
181 | import math
182 | import random
183 | 
184 | from PIL import Image, ImageOps, ImageEnhance
185 | import numbers
186 | 
187 | import matplotlib.pyplot as plt
188 | 
189 | train_df = pd.read_csv('./data/train.csv')
190 | 
191 | n_train = len(train_df)
192 | n_pixels = len(train_df.columns) - 1
193 | n_class = len(set(train_df['label']))
194 | 
195 | # test_df = pd.read_csv('./data/test.csv')
196 | 
197 | # print(train_df.iloc[:,1:].values.mean(axis=1).mean())
198 | 
199 | class MNIST_data(Dataset):
200 |     """MNIST data set"""
201 |     # transforms.ToTensor() divides by 255 automatically
202 |     
203 |     def __init__(self, file_path, 
204 |                  transform = transforms.Compose([transforms.ToTensor(), 
205 |                      transforms.Normalize(mean=(0.5,), std=(0.5,))])
206 |                 ):
207 |         
208 |         df = pd.read_csv(file_path)
209 |         
210 |         if len(df.columns) == n_pixels:
211 |             # test data
212 |             self.X = df.values.reshape((-1,28,28)).astype(np.uint8)[:,:,:,None]
213 |             self.y = None
214 |         else:
215 |             # training data
216 |             self.X = df.iloc[:,1:].values.reshape((-1,28,28)).astype(np.uint8)[:,:,:,None]
217 |             self.y = torch.from_numpy(df.iloc[:,0].values)
218 |             
219 |         self.transform = transform
220 |     
221 |     def __len__(self):
222 |         return len(self.X)
223 | 
224 |     def __getitem__(self, idx):
225 |         if self.y is not None:
226 |             return self.transform(self.X[idx]), self.y[idx]
227 |         else:
228 |             return self.transform(self.X[idx])
229 | 
230 | batch_size = 32
231 | 
232 | train_dataset = MNIST_data('./data/train.csv', transform= transforms.Compose(
233 |                             [transforms.ToPILImage(), transforms.RandomRotation(degrees=20), 
234 |                             # RandomShift(3),
235 |                              transforms.ToTensor(), transforms.Normalize(mean=(0.5,), std=(0.5,))]))
236 | # test_dataset = MNIST_data('./data/test.csv')
237 | 
238 | 
239 | train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
240 |                                            batch_size=batch_size, shuffle=True)
241 | # test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
242 | #                                            batch_size=batch_size, shuffle=False)
243 | 
244 | 
245 | # print(next(iter(train_loader))[0].max())
246 | # print(next(iter(train_loader))[0].min())
247 | # tensor(1.)
248 | # tensor(-1.)
249 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
250 | 
251 | class Net(nn.Module):
252 | 
253 |     def __init__(self):
254 |         super(Net, self).__init__()
255 |         # 1 input image channel, 6 output channels, 5x5 square convolution
256 |         # kernel
257 |         self.conv1 = nn.Conv2d(1, 6, 5)
258 |         self.conv2 = nn.Conv2d(6, 16, 5)
259 |         # an affine operation: y = Wx + b
260 |         self.fc1 = nn.Linear(16 * 4 * 4, 120) # the paper uses a convolution here; the official tutorial uses a linear layer
261 |         self.fc2 = nn.Linear(120, 84)
262 |         self.fc3 = nn.Linear(84, 10)
263 | 
264 |     def forward(self, x):
265 |         # Max pooling over a (2, 2) window
266 |         x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
267 |         # If the size is a square you can only specify a single number
268 |         x = F.max_pool2d(F.relu(self.conv2(x)), 2)
269 |         x = x.view(-1, self.num_flat_features(x))
270 |         x = F.relu(self.fc1(x))
271 |         x = F.relu(self.fc2(x))
272 |         x = self.fc3(x)
273 |         return F.log_softmax(x, dim=1)
274 | 
275 |     def num_flat_features(self, x):
276 |         size = x.size()[1:]  # all dimensions except the batch dimension
277 |         num_features = 1
278 |         for s in size:
279 |             num_features *= s
280 |         return num_features
281 |     
282 | def train(model, device, train_loader, optimizer, epoch, log_interval=100):
283 |     model.train() # switch to training mode
284 |     for batch_idx, (data, target) in enumerate(train_loader):
285 |         data, target = data.to(device), target.to(device)
286 |         optimizer.zero_grad() # reset the gradients to zero
287 |         output = model(data)  # output shape [N,10]; data is the x argument of forward
288 |         loss = F.nll_loss(output, target) # this loss is the mean over the batch
289 | # F.nll_loss(F.log_softmax(input), target):
290 | # single-label cross-entropy loss: each image belongs to exactly one class, and the input needs softmax
291 | # there is also a multi-label loss, where an image can have several classes and the input needs sigmoid
292 |         
293 |         loss.backward()
294 |         optimizer.step()
295 |         if batch_idx % log_interval == 0:
296 |             print("Train Epoch: {} [{}/{} ({:0f}%)]\tLoss: {:.6f}".format(
297 |                 epoch, 
298 |                 batch_idx * len(data), # e.g. 100*32
299 |                 len(train_loader.dataset), # 42000 rows in this train.csv
300 |                 100. * batch_idx / len(train_loader), # len(train_loader)=ceil(42000/32)=1313; this is the share of data processed so far, not accuracy
301 |                 loss.item()
302 |             ))
303 |             #print(len(train_loader))
304 | 
305 | lr = 0.01
306 | momentum = 0.5
307 | model = Net().to(device)
308 | optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
309 | 
310 | epochs = 1
311 | for epoch in range(1, epochs + 1):
312 |     train(model, device, train_loader, optimizer, epoch)
313 | 
314 | # Train Epoch: 1 [0/42000 (0.000000%)]    Loss: 2.316952
315 | # Train Epoch: 1 [3200/42000 (7.616146%)] Loss: 2.290542
316 | # Train Epoch: 1 [6400/42000 (15.232292%)]    Loss: 2.284746
317 | # Train Epoch: 1 [9600/42000 (22.848439%)]    Loss: 2.109172
318 | # Train Epoch: 1 [12800/42000 (30.464585%)]   Loss: 1.203029
319 | # Train Epoch: 1 [16000/42000 (38.080731%)]   Loss: 0.748803
320 | # Train Epoch: 1 [19200/42000 (45.696877%)]   Loss: 0.574634
321 | # Train Epoch: 1 [22400/42000 (53.313024%)]   Loss: 0.399714
322 | # Train Epoch: 1 [25600/42000 (60.929170%)]   Loss: 0.317008
323 | # Train Epoch: 1 [28800/42000 (68.545316%)]   Loss: 0.168170
324 | # Train Epoch: 1 [32000/42000 (76.161462%)]   Loss: 0.153755
325 | # Train Epoch: 1 [35200/42000 (83.777609%)]   Loss: 0.332306
326 | # Train Epoch: 1 [38400/42000 (91.393755%)]   Loss: 0.197081
327 | # Train Epoch: 1 [41600/42000 (99.009901%)]   Loss: 0.208421
328 | ```
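
The commented-out check in the code above reports a maximum of 1 and a minimum of -1 for the input batch. That range follows from the two transforms; here is the arithmetic as a plain-Python sketch, where the helper names are hypothetical and only mimic what `ToTensor()` and `Normalize(mean=(0.5,), std=(0.5,))` do to a single pixel.

```python
def to_tensor_scale(pixel):
    """Mimic the scaling ToTensor() applies to uint8 pixels: [0, 255] -> [0.0, 1.0]."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize: subtract the mean, then divide by the standard deviation."""
    return (value - mean) / std


darkest = normalize(to_tensor_scale(0))      # -1.0
brightest = normalize(to_tensor_scale(255))  # 1.0
print(darkest, brightest)
```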
329 | 
330 | 


--------------------------------------------------------------------------------
/4pytorch初始化.md:
--------------------------------------------------------------------------------
  1 | 
  2 | 
  3 | ## Weight Initialization
  4 | 
  5 | The default initialization of PyTorch parameters lives in each layer's `reset_parameters()` method. For example, `nn.Linear` and `nn.Conv2d` both draw from a uniform distribution on \[-limit, limit\], where limit is `1. / sqrt(fan_in)` and `fan_in` is the number of input units of the parameter tensor.
  6 | 
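
As a worked example of the limit above, the sketch below evaluates `1 / sqrt(fan_in)` for two layer shapes used in these notes. It assumes the usual fan-in convention (`in_features` for a linear layer, `in_channels * kernel_h * kernel_w` for an ungrouped convolution); `default_limit` is a made-up helper name.

```python
import math


def default_limit(fan_in):
    """Bound of the default uniform initialization, 1 / sqrt(fan_in)."""
    return 1.0 / math.sqrt(fan_in)


linear_fan_in = 1000       # nn.Linear(1000, 100): fan_in = in_features
conv_fan_in = 1 * 5 * 5    # nn.Conv2d(1, 6, 5): in_channels * kernel_h * kernel_w

print(default_limit(linear_fan_in))  # about 0.0316
print(default_limit(conv_fan_in))    # 0.2
```

Wider layers therefore start with smaller weights, which keeps the scale of the layer output roughly independent of the number of inputs.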
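  7 | Below are several common initialization schemes.
  8 | 
  9 | ### Xavier Initialization
 10 | 
 11 | The basic idea of Xavier initialization is to keep the variance of a layer's inputs and outputs the same, which keeps the outputs from all drifting toward 0. It is a general-purpose scheme, though its derivation assumes roughly linear, symmetric activations (such as tanh), so for ReLU networks the He initialization described below is usually preferred.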
 12 | 
 13 | ```python
 14 | # Default usage (gain=1)
 15 | for m in model.modules():
 16 |     if isinstance(m, (nn.Conv2d, nn.Linear)):
 17 |         nn.init.xavier_uniform_(m.weight)
 18 | ```
 19 | 
 20 | The `gain` argument can also be used to scale the standard deviation of the initialization to match a specific activation function:
 21 | 
 22 | ```python
 23 | for m in model.modules():
 24 |     if isinstance(m, (nn.Conv2d, nn.Linear)):
 25 |         nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('relu'))
 26 | ```
 27 | 
 28 | References:
 29 | 
 30 | - [Understanding the difficulty of training deep feedforward neural networks](https://www.pytorchtutorial.com/goto/http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)
 31 | 
 32 | ### He et al. Initialization
 33 | 
 34 | ```
 35 | torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
 36 | 
 37 | ```
 38 | 
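 39 | The idea behind He initialization is that, in a ReLU network, roughly half of the neurons in each layer are assumed to be active while the other half output 0. It is the recommended choice for ReLU networks.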
 40 | 
 41 | ```python
 42 | # he initialization
 43 | for m in model.modules():
 44 |     if isinstance(m, (nn.Conv2d, nn.Linear)):
 45 |         nn.init.kaiming_normal_(m.weight, mode='fan_in')
 46 | ```
 47 | 
 48 | 
 49 | 
 50 | ### Orthogonal Initialization
 51 | 
 52 | Mainly used to counter vanishing and exploding gradients in deep networks; it is a common way to initialize RNN parameters.
 53 | 
 54 | ```python
 55 | for m in model.modules():
 56 |     if isinstance(m, (nn.Conv2d, nn.Linear)):
 57 |         nn.init.orthogonal_(m.weight)
 58 | ```
 59 | 
 60 | 
 61 | 
 62 | ### Batchnorm Initialization
 63 | 
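 64 | Before the nonlinear activation we want the outputs to have a well-behaved distribution (for example, Gaussian), which makes computing gradients and updating parameters easier. Batch Normalization forcibly applies a Gaussian normalization followed by a linear transformation to the outputs: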
 65 | 
 66 | ![](https://www.pytorchtutorial.com/wp-content/uploads/2019/02/v2-2b14851823a6ec035cc16147eb5e04b0_hd.png)
 67 | 
 68 | Implementation:
 69 | 
 70 | ```python
 71 | for m in model.modules():
 72 |     if isinstance(m, nn.BatchNorm2d):
 73 |         nn.init.constant_(m.weight, 1)
 74 |         nn.init.constant_(m.bias, 0)
 75 | ```
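
To make the "normalize, then apply a linear transformation" description concrete, the sketch below recomputes the output of `nn.BatchNorm2d` by hand in training mode and compares the two. The input shape and seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 5, 5)        # (N, C, H, W), arbitrary example input

bn = nn.BatchNorm2d(3)             # a freshly built module is in training mode
y = bn(x)

# Per-channel statistics over the batch and spatial dimensions.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)

# Normalization step followed by the learnable linear transformation.
x_hat = (x - mean) / torch.sqrt(var + bn.eps)
manual = bn.weight.view(1, -1, 1, 1) * x_hat + bn.bias.view(1, -1, 1, 1)

print(torch.allclose(y, manual, atol=1e-5))
```

Setting `weight` to 1 and `bias` to 0, as in the initialization loop above, makes the layer start out as a pure normalization.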
 76 | 
 77 | 
 78 | 
 79 | 
 80 | 
 81 | 
 82 | 
 83 | 
 84 | 
 85 | ## Initializing a single layer
 86 | 
 87 | ```python
 88 | conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
 89 | nn.init.xavier_uniform_(conv1.weight)
 90 | nn.init.constant_(conv1.bias, 0.1)
 91 | ```
 92 | 
 93 | ## Initializing a whole model
 94 | 
 95 | ```python
 96 | def weights_init(m):
 97 |     classname = m.__class__.__name__
 98 |     if classname.find('Conv2d') != -1:
 99 |         nn.init.xavier_normal_(m.weight.data)
100 |         nn.init.constant_(m.bias.data, 0.0)
101 |     elif classname.find('Linear') != -1:
102 |         nn.init.xavier_normal_(m.weight)
103 |         nn.init.constant_(m.bias, 0.0)
104 | net = Net()
105 | net.apply(weights_init) # apply recursively visits every module in the network and calls the given function on each of them.
106 | ```
107 | 
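108 | Accessing members prefixed with an underscore is discouraged: they are internal and may change without notice. A better approach is to check whether a module is an instance of a given type: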
109 | 
110 | ```
111 | def weights_init(m):
112 |     if isinstance(m, (nn.Conv2d, nn.Linear)):
113 |         nn.init.xavier_normal_(m.weight)
114 |         nn.init.constant_(m.bias, 0.0)
115 | ```
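
A short usage sketch for the `isinstance`-based function above. The small `nn.Sequential` network is a hypothetical stand-in for `Net`, and the extra `None` check on the bias is an addition for layers built with `bias=False`.

```python
import torch.nn as nn


def weights_init(m):
    # Only convolution and linear layers are touched; other modules are skipped.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:          # layers created with bias=False have no bias
            nn.init.constant_(m.bias, 0.0)


# Hypothetical stand-in network for 28x28 single-channel inputs.
net = nn.Sequential(
    nn.Conv2d(1, 6, 5),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(6 * 24 * 24, 10),
)
net.apply(weights_init)   # visits every submodule, then the container itself

for name, param in net.named_parameters():
    if name.endswith("bias"):
        print(name, param.abs().sum().item())
```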
116 | 
117 | 
118 | 
119 | ```python
120 | import torch
121 | import torch.nn as nn
122 | 
123 | w = torch.empty(2, 3)
124 | 
125 | # 1. Uniform distribution - U(a, b)
126 | # torch.nn.init.uniform_(tensor, a=0, b=1)
127 | nn.init.uniform_(w)
128 | # tensor([[ 0.0578,  0.3402,  0.5034],
129 | #         [ 0.7865,  0.7280,  0.6269]])
130 | 
131 | # 2. Normal distribution - N(mean, std)
132 | # torch.nn.init.normal_(tensor, mean=0, std=1)
133 | nn.init.normal_(w)
134 | # tensor([[ 0.3326,  0.0171, -0.6745],
135 | #        [ 0.1669,  0.1747,  0.0472]])
136 | 
137 | # 3. Constant - fixed value val
138 | # torch.nn.init.constant_(tensor, val)
139 | nn.init.constant_(w, 0.3)
140 | # tensor([[ 0.3000,  0.3000,  0.3000],
141 | #         [ 0.3000,  0.3000,  0.3000]])
142 | 
143 | # 4. Ones on the diagonal, zeros elsewhere
144 | # torch.nn.init.eye_(tensor)
145 | nn.init.eye_(w)
146 | # tensor([[ 1.,  0.,  0.],
147 | #         [ 0.,  1.,  0.]])
148 | 
149 | # 5. Dirac delta initialization; only for {3, 4, 5}-dimensional torch.Tensor
150 | # torch.nn.init.dirac_(tensor)
151 | w1 = torch.empty(3, 16, 5, 5)
152 | nn.init.dirac_(w1)
153 | 
154 | # 6. xavier_uniform initialization
155 | # torch.nn.init.xavier_uniform_(tensor, gain=1)
156 | # From - Understanding the difficulty of training deep feedforward neural networks - Bengio 2010
157 | nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
158 | # tensor([[ 1.3374,  0.7932, -0.0891],
159 | #         [-1.3363, -0.0206, -0.9346]])
160 | 
161 | # 7. xavier_normal initialization
162 | # torch.nn.init.xavier_normal_(tensor, gain=1)
163 | nn.init.xavier_normal_(w)
164 | # tensor([[-0.1777,  0.6740,  0.1139],
165 | #         [ 0.3018, -0.2443,  0.6824]])
166 | 
167 | # 8. kaiming_uniform initialization
168 | # From - Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - HeKaiming 2015
169 | # torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
170 | nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
171 | # tensor([[ 0.6426, -0.9582, -1.1783],
172 | #         [-0.0515, -0.4975,  1.3237]])
173 | 
174 | # 9. kaiming_normal initialization
175 | # torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
176 | nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
177 | # tensor([[ 0.2530, -0.4382,  1.5995],
178 | #         [ 0.0544,  1.6392, -2.0752]])
179 | 
180 | # 10. Orthogonal matrix - (semi)orthogonal matrix
181 | # From - Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe 2013
182 | # torch.nn.init.orthogonal_(tensor, gain=1)
183 | nn.init.orthogonal_(w)
184 | # tensor([[ 0.5786, -0.5642, -0.5890],
185 | #         [-0.7517, -0.0886, -0.6536]])
186 | 
187 | # 11. Sparse matrix
188 | # Non-zero elements are drawn from the normal distribution N(0, 0.01).
189 | # From - Deep learning via Hessian-free optimization - Martens 2010
190 | # torch.nn.init.sparse_(tensor, sparsity, std=0.01)
191 | nn.init.sparse_(w, sparsity=0.1)
192 | # tensor(1.00000e-03 *
193 | #        [[-0.3382,  1.9501, -1.7761],
194 | #         [ 0.0000,  0.0000,  0.0000]])
195 | ```
196 | 
197 | 
198 | 
199 | ### Xavier uniform distribution
200 | ```python
201 | # torch.nn.init.xavier_uniform_(tensor, gain=1)
202 | # Xavier initialization draws from the uniform distribution U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out)).
203 | # The gain is chosen according to the type of activation function,
204 | # e.g. nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
205 | # PS: this initialization is also known as Glorot initialization.
206 | 
207 | """
208 | torch.nn.init.xavier_uniform_(tensor, gain=1)
209 | Fills the input tensor with values drawn from a uniform distribution, following the method described by
210 | Glorot, X. and Bengio, Y. in "Understanding the difficulty of training deep feedforward neural networks".
211 | The resulting values are sampled from U(-a, a), where a = gain * sqrt(2 / (fan_in + fan_out)) * sqrt(3). Also known as Glorot initialisation.
212 | 
213 | Args:
214 | tensor – an n-dimensional torch.Tensor
215 | gain - an optional scaling factor
216 | """
217 | import torch
218 | from torch import nn
219 | w = torch.empty(3, 5)
220 | nn.init.xavier_uniform_(w, gain=1)
221 | print(w)
222 | ```
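
The two ways of writing the Xavier uniform bound, `gain * sqrt(6 / (fan_in + fan_out))` and `gain * sqrt(2 / (fan_in + fan_out)) * sqrt(3)`, are the same quantity. A plain-Python sketch that evaluates both for a 3x5 weight, where the helper names are made up and the fan values assume the usual convention of columns as `fan_in` and rows as `fan_out`:

```python
import math


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """a in U(-a, a): gain * sqrt(6 / (fan_in + fan_out))."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def xavier_normal_std(fan_in, fan_out, gain=1.0):
    """Standard deviation of the normal variant: gain * sqrt(2 / (fan_in + fan_out))."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


# A weight of shape (3, 5): fan_out = 3 rows, fan_in = 5 columns.
bound = xavier_uniform_bound(5, 3)
std = xavier_normal_std(5, 3)

print(std)                          # 0.5
print(bound, math.sqrt(3.0) * std)  # both forms of the bound agree

relu_gain = math.sqrt(2.0)          # the value nn.init.calculate_gain('relu') returns
print(xavier_uniform_bound(5, 3, gain=relu_gain))
```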
223 | 
224 | 
225 | 
226 | ### Xavier normal distribution
227 | 
228 | ```python
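229 | # torch.nn.init.xavier_normal_(tensor, gain=1)
230 | # Xavier initialization drawing from a normal distribution with
231 | # mean = 0, std = gain * sqrt(2 / (fan_in + fan_out))
232 | 
233 | # Kaiming initialization comes from the paper "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification". Its derivation also starts from keeping the variance consistent; it was proposed as an improvement for ReLU-type activations, on which Xavier initialization performs poorly. See the paper for details.
234 | 
235 | """
236 | Fills the input tensor with values drawn from a normal distribution, following the method described by Glorot, X. and Bengio, Y. (2010) in
237 | "Understanding the difficulty of training deep feedforward neural networks".
238 | The resulting values are sampled from a normal distribution with mean 0 and standard deviation gain * sqrt(2 / (fan_in + fan_out)).
239 | Also known as Glorot initialisation.
240 | Args:
241 | tensor – an n-dimensional torch.Tensor
242 | gain - an optional scaling factor
243 | """
244 |     
245 | b = torch.empty(3, 4)
246 | nn.init.xavier_normal_(b, gain=1)
247 | print(b)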
248 | ```
249 | 
250 | 
251 | 
252 | ### Kaiming uniform distribution
253 | 
254 | ```python
255 | # torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
256 | # Uniform distribution U(-bound, bound) with bound = sqrt(6 / ((1 + a^2) * fan_in))
257 | # where a is the slope of the activation on the negative half-axis (0 for relu)
258 | # mode - either fan_in or fan_out: fan_in keeps the variance consistent in the forward pass, fan_out in the backward pass
259 | # nonlinearity - relu or leaky_relu; the default is leaky_relu
260 | # e.g. nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
261 | 
262 | w = torch.empty(3, 5)
263 | nn.init.kaiming_uniform_(w, a=0, mode='fan_in')
264 | print(w)
265 | ```
266 | 
267 | 
268 | 
269 | ### Kaiming normal distribution
270 | 
271 | ```
272 | torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
273 | Zero-mean normal distribution N(0, std) with std = sqrt(2 / ((1 + a^2) * fan_in))
274 | where a is the slope of the activation on the negative half-axis (0 for relu)
275 | mode - either fan_in or fan_out: fan_in keeps the variance consistent in the forward pass, fan_out in the backward pass
276 | nonlinearity - relu or leaky_relu; the default is leaky_relu
277 | nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
278 | ```
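
The two Kaiming formulas differ only by a factor of `sqrt(3)`: `bound = sqrt(6 / ((1 + a^2) * fan))` and `std = sqrt(2 / ((1 + a^2) * fan))`. A plain-Python sketch that evaluates them; the helper names are made up for this example, and `fan` stands for whichever of `fan_in` or `fan_out` the `mode` argument selects.

```python
import math


def kaiming_gain(a=0.0):
    """Gain for (leaky) ReLU with negative slope a: sqrt(2 / (1 + a^2))."""
    return math.sqrt(2.0 / (1.0 + a * a))


def kaiming_normal_std(fan, a=0.0):
    """Standard deviation of the normal variant: gain / sqrt(fan)."""
    return kaiming_gain(a) / math.sqrt(fan)


def kaiming_uniform_bound(fan, a=0.0):
    """Bound of the uniform variant: sqrt(3) * std."""
    return math.sqrt(3.0) * kaiming_normal_std(fan, a)


# A weight of shape (3, 5) with mode='fan_in': fan = 5, ReLU (a = 0).
print(kaiming_normal_std(5))      # sqrt(2 / 5)
print(kaiming_uniform_bound(5))   # sqrt(6 / 5)

# With a = sqrt(5) the bound simplifies to 1 / sqrt(fan).
print(kaiming_uniform_bound(25, a=math.sqrt(5.0)))
```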
279 | 
280 | 
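281 | 2. Others
282 | 
283 | ### Uniform initialization
284 | 
285 | torch.nn.init.uniform_(tensor, a=0, b=1)
286 | Draws values from the uniform distribution U(a, b).
287 | 
288 | tensor - an n-dimensional torch.Tensor
289 | a - lower bound of the uniform distribution
290 | b - upper bound of the uniform distribution
291 | 
292 | 
293 | 
294 | ### Normal initialization
295 | 
296 | torch.nn.init.normal_(tensor, mean=0, std=1)
297 | Draws values from the normal distribution N(mean, std); the defaults are 0 and 1.
298 | 
299 | tensor – an n-dimensional torch.Tensor
300 | mean – mean of the normal distribution
301 | std – standard deviation of the normal distribution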
302 | 
303 | 
304 | 
305 | ### Constant initialization
306 | 
307 | torch.nn.init.constant_(tensor, val)
308 | Sets every value to the constant val, e.g. nn.init.constant_(w, 0.3)
309 | 
310 | ```python
311 | """
312 | torch.nn.init.constant_(tensor, val)
313 | Fills the input tensor with the value val.
314 | Args:
315 | tensor – an n-dimensional torch.Tensor
316 | val – the value to fill the tensor with
317 | """
318 | w = torch.empty(3, 5)
319 | nn.init.constant_(w, 1.2)
320 | print(w)
321 | # tensor([[1.2000, 1.2000, 1.2000, 1.2000, 1.2000],
322 | #         [1.2000, 1.2000, 1.2000, 1.2000, 1.2000],
323 | #         [1.2000, 1.2000, 1.2000, 1.2000, 1.2000]])
324 | ```
325 | 
326 | 
327 | 
328 | ### Identity initialization
329 | 
330 | torch.nn.init.eye_(tensor)
331 | Initializes a 2-D tensor as the identity matrix.
332 | 
333 | ```python
334 | 
335 | """
336 | torch.nn.init.eye_(tensor)
337 | Fills the 2-D input tensor with the identity matrix. In Linear layers this preserves the identity of as many inputs as possible.
338 | Args:
339 | tensor - a 2-D torch.Tensor
340 | """
341 | w=torch.Tensor(3,5)
342 | nn.init.eye_(w)
343 | print(w)
344 | # tensor([[1., 0., 0., 0., 0.],
345 | #         [0., 1., 0., 0., 0.],
346 | #         [0., 0., 1., 0., 0.]])
347 | 
348 | ```
349 | 
350 | 
351 | 
352 | ### Orthogonal initialization
353 | 
354 | torch.nn.init.orthogonal_(tensor, gain=1)
355 | Fills the tensor with a (semi-)orthogonal matrix; see "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks", Saxe, A. et al. (2013).
356 | 
357 | ```python
358 | """
359 | torch.nn.init.orthogonal_(tensor, gain=1)
362 | Fills the input tensor with a (semi-)orthogonal matrix. The input tensor must have
363 | at least 2 dimensions; for tensors with more than 2 dimensions, the trailing
364 | dimensions are flattened.
365 | 
366 | Args:
367 | tensor - an n-dimensional torch.Tensor, where n >= 2
368 | gain - optional scaling factor
369 | """
370 | w = torch.Tensor(3, 5)
371 | nn.init.orthogonal_(w)
372 | print(w)
373 | ```
374 | 
375 | 
376 | 
377 | ### Sparse initialization
378 | 
379 | torch.nn.init.sparse_(tensor, sparsity, std=0.01)
380 | Fills a 2-D tensor as a sparse matrix: a fraction of the elements in each column is set to zero, and the non-zero elements are drawn from the normal distribution N(0, std^2).
382 | 
383 | sparsity - the fraction of elements in each column to be set to zero
384 | std - standard deviation of the normal distribution used to generate the non-zero values
386 | 
387 | ```python
388 | w = torch.Tensor(3, 5)
389 | nn.init.sparse_(w, sparsity=0.1)
390 | print(w)
391 | 
392 | # tensor([[-0.0042,  0.0000,  0.0000, -0.0016,  0.0000],
393 | #         [ 0.0000,  0.0050,  0.0082,  0.0000,  0.0003],
394 | #         [ 0.0018, -0.0016, -0.0003, -0.0068,  0.0103]])
395 | ```
396 | 
397 | 
398 | 
399 | ### Dirac initialization
400 | 
401 | ```python
402 | """
403 | torch.nn.init.dirac_(tensor)
404 | Fills the {3, 4, 5}-dimensional input tensor with the Dirac delta function. In convolutional layers this preserves the identity of as many input channels as possible.
405 | Args:
406 | tensor - a {3, 4, 5}-dimensional torch.Tensor
407 | """
408 | w=torch.Tensor(3,16,5,5)
409 | nn.init.dirac_(w)
410 | print(w)
411 | 
412 | print(w.sum())
413 | # tensor(3.)
414 | ```
415 | 
416 | 
417 | 
418 | 
419 | 
420 | ### Computing the gain: calculate_gain
421 | 
422 | torch.nn.init.calculate_gain(nonlinearity, param=None)
423 | 
424 | ```python
425 | # torch.nn.init.calculate_gain(nonlinearity, param=None)
426 | # Returns the recommended gain value for the given nonlinearity.
427 | # Args:
428 | # nonlinearity - the nonlinear function (its nn.functional name)
429 | # param - optional parameter for the nonlinear function
430 | 
431 | from torch import nn
432 | import torch
433 | gain = nn.init.calculate_gain('leaky_relu')
434 | print(gain)
435 | 
436 | # 1.4141428569978354
437 | ```
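
The gain is usually passed on to one of the Xavier initializers. The sketch below assumes a tanh layer of arbitrary size and checks that xavier_uniform_ stays within gain * sqrt(6 / (fan_in + fan_out)).

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

gain = nn.init.calculate_gain('tanh')   # recommended gain for tanh is 5/3
fan_out, fan_in = 200, 100
w = torch.empty(fan_out, fan_in)
nn.init.xavier_uniform_(w, gain=gain)

# xavier_uniform_ samples from U(-bound, bound) with this bound
bound = gain * math.sqrt(6.0 / (fan_in + fan_out))
max_abs = w.abs().max().item()
print(gain, bound, max_abs)
```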
438 | 
439 | 
440 | 
441 | | nonlinearity | gain |
442 | | ---- | ---- |
443 | | Linear / Identity | 1 |
444 | | Conv{1,2,3}D | 1 |
445 | | Sigmoid | 1 |
446 | | Tanh | 5/3 |
447 | | ReLU | sqrt(2) |
448 | | Leaky ReLU | sqrt(2 / (1 + negative_slope^2)) |
449 | 
450 | 


--------------------------------------------------------------------------------
/01一步步实现神经网络.md:
--------------------------------------------------------------------------------
  1 | # NumPy implementation
  2 | 
  3 | ```python
  4 | import numpy as np
  5 | 
  6 | # N is batch size; D_in is input dimension;
  7 | # H is hidden dimension; D_out is output dimension.
  8 | N, D_in, H, D_out = 64, 1000, 100, 10
  9 | 
 10 | # Create random input and output data
 11 | x = np.random.randn(N, D_in)
 12 | y = np.random.randn(N, D_out)
 13 | 
 14 | # Randomly initialize weights
 15 | w1 = np.random.randn(D_in, H)
 16 | w2 = np.random.randn(H, D_out)
 17 | 
 18 | learning_rate = 1e-6
 19 | for t in range(500):
 20 |     # Forward pass: compute predicted y
 21 |     h = x.dot(w1)
 22 |     h_relu = np.maximum(h, 0)
 23 |     y_pred = h_relu.dot(w2)
 24 | 
 25 |     # Compute and print loss
 26 |     loss = np.square(y_pred - y).sum()
 27 |     print(t, loss)
 28 | 
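 29 |     # Backprop to compute gradients of loss with respect to w1 and w2
 30 |     
 31 |     # loss = sum((y_pred - y) ** 2), so d(loss)/d(y_pred) = 2 * (y_pred - y)
 32 |     grad_y_pred = 2.0 * (y_pred - y)
 33 |     # Chain rule back through y_pred = h_relu.dot(w2) and h_relu = max(h, 0)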
 34 |     grad_w2 = h_relu.T.dot(grad_y_pred)
 35 |     grad_h_relu = grad_y_pred.dot(w2.T)
 36 |     grad_h = grad_h_relu.copy()
 37 |     grad_h[h < 0] = 0
 38 |     grad_w1 = x.T.dot(grad_h)
 39 | 
 40 |     # Update weights
 41 |     w1 -= learning_rate * grad_w1
 42 |     w2 -= learning_rate * grad_w2
 43 | ```
 44 | 
 45 | 
 46 | 
 47 | # Tensor implementation
 48 | 
 49 | ```python
 50 | import torch
 51 | 
 52 | 
 53 | dtype = torch.float
 54 | device = torch.device("cpu")
 55 | # device = torch.device("cuda:0") # Uncomment this to run on GPU
 56 | 
 57 | # N is batch size; D_in is input dimension;
 58 | # H is hidden dimension; D_out is output dimension.
 59 | N, D_in, H, D_out = 64, 1000, 100, 10
 60 | 
 61 | # Create random input and output data
 62 | x = torch.randn(N, D_in, device=device, dtype=dtype)
 63 | y = torch.randn(N, D_out, device=device, dtype=dtype)
 64 | 
 65 | # Randomly initialize weights
 66 | w1 = torch.randn(D_in, H, device=device, dtype=dtype)
 67 | w2 = torch.randn(H, D_out, device=device, dtype=dtype)
 68 | 
 69 | learning_rate = 1e-6
 70 | for t in range(500):
 71 |     # Forward pass: compute predicted y
 72 |     h = x.mm(w1)
 73 |     h_relu = h.clamp(min=0)
 74 |     y_pred = h_relu.mm(w2)
 75 | 
 76 |     # Compute and print loss
 77 |     loss = (y_pred - y).pow(2).sum().item()
 78 |     print(t, loss)
 79 | 
 80 |     # Backprop to compute gradients of w1 and w2 with respect to loss
 81 |     grad_y_pred = 2.0 * (y_pred - y)
 82 |     grad_w2 = h_relu.t().mm(grad_y_pred)
 83 |     grad_h_relu = grad_y_pred.mm(w2.t())
 84 |     grad_h = grad_h_relu.clone()
 85 |     grad_h[h < 0] = 0
 86 |     grad_w1 = x.t().mm(grad_h)
 87 | 
 88 |     # Update weights using gradient descent
 89 |     w1 -= learning_rate * grad_w1
 90 |     w2 -= learning_rate * grad_w2
 91 | ```
 92 | 
 93 | 
 94 | 
 95 | # Autograd
 96 | 
 97 | ```python
 98 | import torch
 99 | 
100 | dtype = torch.float
101 | device = torch.device("cpu")
102 | # device = torch.device("cuda:0") # Uncomment this to run on GPU
103 | 
104 | # N is batch size; D_in is input dimension;
105 | # H is hidden dimension; D_out is output dimension.
106 | N, D_in, H, D_out = 64, 1000, 100, 10
107 | 
108 | # Create random Tensors to hold the inputs and outputs.
109 | # requires_grad defaults to False, so no gradients are computed for them during the backward pass.
110 | x = torch.randn(N, D_in, device=device, dtype=dtype)
111 | y = torch.randn(N, D_out, device=device, dtype=dtype)
112 | 
113 | # Create random Tensors for the weights.
114 | # requires_grad=True indicates that we want gradients with respect to these Tensors during the backward pass.
115 | w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
116 | w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
117 | 
118 | learning_rate = 1e-6
119 | for t in range(500):
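120 |     # Forward pass: compute predicted y using operations on Tensors. This is the same as an ordinary forward pass,
121 |     # but we do not need to keep intermediate results because we are not implementing the backward pass by hand.
122 |     y_pred = x.mm(w1).clamp(min=0).mm(w2)
123 | 
124 |     # Compute the loss from the forward pass.
125 |     # loss is a 0-dimensional (scalar) Tensor;
126 |     # loss.item() returns its value as a Python number.
127 |     loss = (y_pred - y).pow(2).sum()
128 |     print(t, loss.item())
129 | 
130 |     # Use autograd to compute the backward pass. backward() computes the gradient of loss
131 |     # with respect to every Tensor that has requires_grad=True. After this call,
132 |     # w1.grad and w2.grad hold the gradients of loss with respect to w1 and w2.
133 |     loss.backward()
134 | 
135 |     # Update the weights manually with gradient descent (an automatic way is introduced later).
136 |     # Wrap the update in torch.no_grad() because w1 and w2 have requires_grad=True,
137 |     # but we do not want autograd to track the weight update itself.
138 |     # An alternative is to operate on weight.data and weight.grad.data, which autograd does not track.
139 |     # tensor.data gives a Tensor that shares storage with the original tensor
140 |     # but does not record history in the computation graph.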
141 |     with torch.no_grad():
142 |         w1 -= learning_rate * w1.grad
143 |         w2 -= learning_rate * w2.grad
144 | 
145 |         # Manually zero the gradients after updating weights
146 |         w1.grad.zero_()
147 |         w2.grad.zero_()
148 | ```
149 | 
150 | 
151 | 
152 | # Defining custom autograd Functions
153 | 
154 | ```python
155 | 
156 | import torch
157 | 
158 | class MyReLU(torch.autograd.Function):
159 |   """
160 |   We can implement our own custom autograd Functions by subclassing
161 |   torch.autograd.Function and implementing the forward and backward passes
162 |   which operate on Tensors.
163 |   """
164 |   @staticmethod
165 |   def forward(ctx, x):
166 |     """
167 |     In the forward pass we receive a context object and a Tensor containing the
168 |     input; we must return a Tensor containing the output, and we can use the
169 |     context object to cache objects for use in the backward pass.
170 |     """
171 |     ctx.save_for_backward(x)
172 |     return x.clamp(min=0)
173 | 
174 |   @staticmethod
175 |   def backward(ctx, grad_output):
176 |     """
177 |     In the backward pass we receive the context object and a Tensor containing
178 |     the gradient of the loss with respect to the output produced during the
179 |     forward pass. We can retrieve cached data from the context object, and must
180 |     compute and return the gradient of the loss with respect to the input to the
181 |     forward function.
182 |     """
183 |     x, = ctx.saved_tensors
184 |     grad_x = grad_output.clone()
185 |     grad_x[x < 0] = 0
186 |     return grad_x
187 | 
188 | 
189 | dtype = torch.float
190 | device = torch.device("cpu")
191 | # device = torch.device("cuda:0") # Uncomment this to run on GPU
192 | 
193 | # N is batch size; D_in is input dimension;
194 | # H is hidden dimension; D_out is output dimension.
195 | N, D_in, H, D_out = 64, 1000, 100, 10
196 | 
197 | # Create random Tensors to hold the inputs and outputs.
198 | # requires_grad defaults to False, so no gradients are computed for them during the backward pass.
199 | x = torch.randn(N, D_in, device=device, dtype=dtype)
200 | y = torch.randn(N, D_out, device=device, dtype=dtype)
201 | 
202 | # Create random Tensors for the weights.
203 | # requires_grad=True indicates that we want gradients with respect to these Tensors during the backward pass.
204 | w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
205 | w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
206 | 
207 | learning_rate = 1e-6
208 | for t in range(500):
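209 |     # Forward pass: compute predicted y using operations on Tensors;
210 |     # the ReLU is applied with our custom autograd Function.
211 |     y_pred = MyReLU.apply(x.mm(w1)).mm(w2)
212 |    
213 |     # Compute the loss from the forward pass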
214 |     loss = (y_pred - y).pow(2).sum()
215 |     print(t, loss.item())
216 | 
217 |     loss.backward()
218 | 
219 |     with torch.no_grad():
220 |         w1 -= learning_rate * w1.grad
221 |         w2 -= learning_rate * w2.grad
222 | 
223 |         # Manually zero the gradients after updating weights
224 |         w1.grad.zero_()
225 |         w2.grad.zero_()
226 | ```
227 | 
228 | 
229 | 
230 | # PyTorch: nn
231 | 
232 | ```python
233 | import torch
234 | 
235 | # N is batch size; D_in is input dimension;
236 | # H is hidden dimension; D_out is output dimension.
237 | N, D_in, H, D_out = 64, 1000, 100, 10
238 | 
239 | # Create random Tensors to hold inputs and outputs
240 | x = torch.randn(N, D_in)
241 | y = torch.randn(N, D_out)
242 | 
243 | # Use the nn package to define our model as a sequence of layers. nn.Sequential
244 | # is a Module which contains other Modules, and applies them in sequence to
245 | # produce its output. Each Linear Module computes output from input using a
246 | # linear function, and holds internal Tensors for its weight and bias.
247 | model = torch.nn.Sequential(
248 |     torch.nn.Linear(D_in, H),
249 |     torch.nn.ReLU(),
250 |     torch.nn.Linear(H, D_out),
251 | )
252 | 
253 | # The nn package also contains definitions of popular loss functions; in this
254 | # case we will use Mean Squared Error (MSE) as our loss function.
255 | loss_fn = torch.nn.MSELoss(reduction='sum')
256 | 
257 | learning_rate = 1e-4
258 | for t in range(500):
259 |     # Forward pass: compute predicted y by passing x to the model. Module objects
260 |     # override the __call__ operator so you can call them like functions. When
261 |     # doing so you pass a Tensor of input data to the Module and it produces
262 |     # a Tensor of output data.
263 |     y_pred = model(x)
264 | 
265 |     # Compute and print loss. We pass Tensors containing the predicted and true
266 |     # values of y, and the loss function returns a Tensor containing the
267 |     # loss.
268 |     loss = loss_fn(y_pred, y)
269 |     print(t, loss.item())
270 | 
271 |     # Zero the gradients before running the backward pass.
272 |     model.zero_grad()
273 | 
274 |     # Backward pass: compute gradient of the loss with respect to all the learnable
275 |     # parameters of the model. Internally, the parameters of each Module are stored
276 |     # in Tensors with requires_grad=True, so this call will compute gradients for
277 |     # all learnable parameters in the model.
278 |     loss.backward()
279 | 
280 |     # Update the weights using gradient descent. Each parameter is a Tensor, so
281 |     # we can access its gradients like we did before.
282 |     with torch.no_grad():
283 |         for param in model.parameters():
284 |             param -= learning_rate * param.grad
285 | ```
286 | 
287 | 
288 | 
289 | # PyTorch: optim
290 | 
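291 | This time we no longer update the model's weights by hand; instead we use the optim package to update the parameters. The optim package provides a variety of optimization algorithms, including SGD+momentum, RMSProp and Adam.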
292 | 
293 | ```python
294 | import torch
295 | 
296 | # N is batch size; D_in is input dimension;
297 | # H is hidden dimension; D_out is output dimension.
298 | N, D_in, H, D_out = 64, 1000, 100, 10
299 | 
300 | # Create random Tensors to hold inputs and outputs
301 | x = torch.randn(N, D_in)
302 | y = torch.randn(N, D_out)
303 | 
304 | # Use the nn package to define our model and loss function.
305 | model = torch.nn.Sequential(
306 |     torch.nn.Linear(D_in, H),
307 |     torch.nn.ReLU(),
308 |     torch.nn.Linear(H, D_out),
309 | )
310 | loss_fn = torch.nn.MSELoss(reduction='sum')
311 | 
312 | learning_rate = 1e-4
313 | optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
314 | for t in range(500):
315 |     # Forward pass: compute predicted y by passing x to the model.
316 |     y_pred = model(x)
317 | 
318 |     # Compute the loss
319 |     loss = loss_fn(y_pred, y)
320 |     print(t, loss.item())
321 |     # Zero the gradients
322 |     optimizer.zero_grad()
323 |     # Backward pass: compute the gradients
324 |     loss.backward()
325 | 
326 |     # Update the parameters
327 |     optimizer.step()
328 | ```
329 | 
330 | 
331 | 
332 | # PyTorch: Custom nn Modules
333 | 
334 | ```python
335 | import torch
336 | 
337 | 
338 | class TwoLayerNet(torch.nn.Module):
339 |     def __init__(self, D_in, H, D_out):
340 |         """
341 |         In the constructor we instantiate two nn.Linear modules and assign them as
342 |         member variables.
343 |         """
344 |         super(TwoLayerNet, self).__init__()
345 |         self.linear1 = torch.nn.Linear(D_in, H)
346 |         self.linear2 = torch.nn.Linear(H, D_out)
347 | 
348 |     def forward(self, x):
349 |         """
350 |         In the forward function we accept a Tensor of input data and we must return
351 |         a Tensor of output data. We can use Modules defined in the constructor as
352 |         well as arbitrary operators on Tensors.
353 |         """
354 |         h_relu = self.linear1(x).clamp(min=0)
355 |         y_pred = self.linear2(h_relu)
356 |         return y_pred
357 | 
358 | 
359 | 
360 | N, D_in, H, D_out = 64, 1000, 100, 10
361 | x = torch.randn(N, D_in)
362 | y = torch.randn(N, D_out)
363 | # Construct our model by instantiating the class defined above
364 | model = TwoLayerNet(D_in, H, D_out)
365 | 
366 | criterion = torch.nn.MSELoss(reduction='sum')
367 | optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
368 | for t in range(500):
369 |     # Forward pass: Compute predicted y by passing x to the model
370 |     y_pred = model(x)
371 | 
372 |     # Compute and print loss
373 |     loss = criterion(y_pred, y)
374 |     print(t, loss.item())
375 | 
376 |     # Zero gradients, perform a backward pass, and update the weights.
377 |     optimizer.zero_grad()
378 |     loss.backward()
379 |     optimizer.step()
380 | ```
381 | 
382 | 


--------------------------------------------------------------------------------
/1PyTorch 实现中的一些常用技巧.md:
--------------------------------------------------------------------------------
  1 | Model Statistics
  2 | ------------------------
  3 | 
  4 | ### Counting the total number of parameters
  5 | 
  6 | ```python
  7 | num_params = sum(param.numel() for param in model.parameters())
  8 | ```
  9 | 
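 10 | Weight Regularization
 11 | ----------------------------
 12 | 
 13 | ### Older approaches
 14 | 
 15 | #### L2/L1 Regularization
 16 | 
 17 | In machine learning, an extra term is almost always added to the loss function. The two most common choices are called **_L1 regularization_** and **_L2 regularization_**, or the **_L1 norm_** and the **_L2 norm_**.
 18 | 
 19 | L1 and L2 regularization can be seen as penalty terms on the loss function. The "penalty" restricts some of the parameters of the model.
 20 | 
 21 | *   L1 regularization is the **_sum of the absolute values_** of the elements of the weight vector w, usually written ${||w||}_1$
 22 | *   L2 regularization is the **_square root of the sum of the squares_** of the elements of the weight vector w, usually written ${||w||}_2$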
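
The two norms defined above can be computed directly. A minimal sketch with a hand-picked vector, so the arithmetic is easy to follow:

```python
import torch

w = torch.tensor([3.0, -4.0])

l1 = w.abs().sum()            # |3| + |-4| = 7
l2 = w.pow(2).sum().sqrt()    # sqrt(9 + 16) = 5

# torch.norm gives the same values for p=1 and p=2
l1_builtin = torch.norm(w, p=1)
l2_builtin = torch.norm(w, p=2)
print(l1.item(), l2.item())   # 7.0 5.0
```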
 23 | 
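 24 | The effects of L1 and L2 regularization, as stated in many articles:
 25 | 
 26 | *   L1 regularization produces a sparse weight matrix, i.e. a sparse model, and can be used for feature selection
 27 | *   L2 regularization helps prevent overfitting; to some extent, L1 does too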
 28 | 
 29 | Implementing L2 regularization:
 30 | 
 31 | ```python
 32 | reg = 1e-6
 33 | l2_loss = torch.zeros(1)
 34 | for name, param in model.named_parameters():
 35 |     if 'bias' not in name:
 36 |         l2_loss = l2_loss + (0.5 * reg * torch.sum(torch.pow(param, 2)))
 37 | ```
 38 | 
 39 | Implementing L1 regularization:
 40 | 
 41 | ```python
 42 | reg = 1e-6
 43 | l1_loss = torch.zeros(1)
 44 | for name, param in model.named_parameters():
 45 |     if 'bias' not in name:
 46 |         l1_loss = l1_loss + (reg * torch.sum(torch.abs(param)))
 47 | ```
 48 | 
 49 | 
 50 | 
 51 | #### Orthogonal Regularization
 52 | 
 53 | ```python
 54 | reg = 1e-6
 55 | orth_loss = torch.zeros(1)
 56 | for name, param in model.named_parameters():
 57 |     if 'bias' not in name:
 58 |         param_flat = param.view(param.shape[0], -1)
 59 |         sym = torch.mm(param_flat, torch.t(param_flat))
 60 |         sym = sym - torch.eye(param_flat.shape[0])
 61 |         orth_loss = orth_loss + (reg * sym.sum())
 62 | ```
 63 | 
 64 | 
 65 | 
 66 | #### Max Norm Constraint
 67 | 
 68 | Put simply, this directly constrains the values of w.
 69 | ```python
 70 | @torch.no_grad()  # rescale in place; rebinding `param` would leave the model unchanged
 71 | def max_norm(model, max_val=3, eps=1e-8):
 72 |     for name, param in model.named_parameters():
 73 |         if 'bias' not in name:
 74 |             norm = param.norm(2, dim=0, keepdim=True)
 75 |             desired = torch.clamp(norm, 0, max_val)
 76 |             param.mul_(desired / (eps + norm))
 77 | ```
 78 | 
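 79 | ### L2 regularization
 80 | 
 81 | The most direct way to apply L2 regularization in PyTorch is the optimizer's built-in weight_decay option, which sets the weight decay rate and corresponds to λ in L2 regularization.
 82 | 
 83 | ```python
 84 | optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5)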
 85 | ```
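
To make the correspondence concrete, the sketch below compares one plain SGD step with weight_decay against one step on a loss with an explicit 0.5 * λ * ||w||² term. The data, learning rate and λ are made-up values; the comparison relies on SGD adding weight_decay * w to the gradient.

```python
import torch

torch.manual_seed(0)

x = torch.randn(8, 3)
y = torch.randn(8, 1)
lr, wd = 0.1, 0.01

# Two copies of the same initial weights
w_a = torch.randn(3, 1, requires_grad=True)
w_b = w_a.detach().clone().requires_grad_(True)

# (a) let the optimizer apply the decay
opt_a = torch.optim.SGD([w_a], lr=lr, weight_decay=wd)
loss_a = ((x @ w_a - y) ** 2).mean()
opt_a.zero_grad()
loss_a.backward()
opt_a.step()

# (b) add the penalty 0.5 * wd * ||w||^2 to the loss by hand
opt_b = torch.optim.SGD([w_b], lr=lr)
loss_b = ((x @ w_b - y) ** 2).mean() + 0.5 * wd * w_b.pow(2).sum()
opt_b.zero_grad()
loss_b.backward()
opt_b.step()

same_update = torch.allclose(w_a, w_b, atol=1e-6)
print(same_update)
```

Note that this identity is specific to plain SGD-style updates; adaptive optimizers such as Adam treat the two formulations differently.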
 86 | 
 87 | ```python
 88 | lambda_ = torch.tensor(1.)  # `lambda` is a reserved word in Python
 89 | l2_reg = torch.tensor(0.)
 90 | for param in model.parameters():
 91 |     l2_reg = l2_reg + torch.norm(param)
 92 | loss = loss + lambda_ * l2_reg
 93 | ```
 94 | 
 95 | 
 96 | 
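 97 | In addition, optimizers support per-parameter options, which set options for individual groups of parameters for finer control. Instead of passing a single iterable of parameters, pass an iterable of dicts. Each dict must contain the key params, which lists the parameters in that group; the other keys must match keyword arguments accepted by the optimizer.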
 98 | 
 99 | ```python
100 | optim.SGD([
101 |                 {'params': model.base.parameters()},
102 |                 {'params': model.classifier.parameters(), 'lr': 1e-3}
103 |             ], lr=1e-2, momentum=0.9)
104 | ```
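
The resulting groups can be inspected through optimizer.param_groups. The sketch below uses a hypothetical two-part model whose attribute names base and classifier mirror the snippet above; options missing from a group fall back to the optimizer-level defaults.

```python
import torch
from torch import nn, optim


class Net(nn.Module):
    """Hypothetical model with a `base` and a `classifier` part."""

    def __init__(self):
        super().__init__()
        self.base = nn.Linear(4, 8)
        self.classifier = nn.Linear(8, 2)

    def forward(self, x):
        return self.classifier(torch.relu(self.base(x)))


model = Net()
optimizer = optim.SGD([
    {'params': model.base.parameters()},                    # uses the default lr=1e-2
    {'params': model.classifier.parameters(), 'lr': 1e-3},  # overrides lr
], lr=1e-2, momentum=0.9)

group_lrs = [group['lr'] for group in optimizer.param_groups]
group_momenta = [group['momentum'] for group in optimizer.param_groups]
print(group_lrs, group_momenta)
```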
105 | 
106 | 
107 | 
108 | ```python
109 | weight_p, bias_p = [], []
110 | for name, p in model.named_parameters():
111 |     if 'bias' in name:
112 |         bias_p += [p]
113 |     else:
114 |         weight_p += [p]
115 | # Parameter names are generated automatically: weights contain "weight" and biases contain "bias",
116 | # so the kind of parameter can be told from its name. This differs from TensorFlow, where users can name variables themselves (default names also exist).
117 | optim.SGD([
118 |           {'params': weight_p, 'weight_decay':1e-5},
119 |           {'params': bias_p, 'weight_decay':0}
120 |           ], lr=1e-2, momentum=0.9)
121 | ```
122 | 
123 | ### L1 regularization
124 | 
125 | ```python
126 | criterion = nn.CrossEntropyLoss()
127 | 
128 | classify_loss = criterion(input=out, target=batch_train_label)
129 | 
130 | lambda_ = torch.tensor(1.)  # `lambda` is a reserved word in Python
131 | l1_reg = torch.tensor(0.)
132 | for param in model.parameters():
133 |     l1_reg = l1_reg + torch.sum(torch.abs(param))
134 | 
135 | loss = classify_loss + lambda_ * l1_reg
136 | ```
137 | 
138 | 
139 | 
140 | ### Defining a regularization class
141 | 
142 | ```python
143 | # Check whether a GPU is available
144 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
145 | # device='cuda'
146 | print("-----device:{}".format(device))
147 | print("-----Pytorch version:{}".format(torch.__version__))
148 |  
149 |  
150 | class Regularization(torch.nn.Module):
151 |     def __init__(self,model,weight_decay,p=2):
152 |         '''
153 |         :param model: the model
154 |         :param weight_decay: regularization coefficient
155 |         :param p: order of the norm; defaults to the 2-norm,
156 |                   p=2 gives L2 regularization, p=1 gives L1 regularization
157 |         '''
158 |         super(Regularization, self).__init__()
159 |         if weight_decay <= 0:
160 |             raise ValueError("weight_decay must be greater than 0")
162 |         self.model=model
163 |         self.weight_decay=weight_decay
164 |         self.p=p
165 |         self.weight_list=self.get_weight(model)
166 |         self.weight_info(self.weight_list)
167 |  
168 |     def to(self,device):
169 |         '''
170 |         Select the device to run on
171 |         :param device: cuda or cpu
172 |         :return:
173 |         '''
174 |         self.device=device
175 |         super().to(device)
176 |         return self
177 |  
178 |     def forward(self, model):
179 |         self.weight_list=self.get_weight(model)  # fetch the latest weights
180 |         reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
181 |         return reg_loss
182 |  
183 |     def get_weight(self,model):
184 |         '''
185 |         Return the list of (name, parameter) pairs for the model's weights
186 |         :param model:
187 |         :return:
188 |         '''
189 |         weight_list = []
190 |         for name, param in model.named_parameters():
191 |             if 'weight' in name:
192 |                 weight = (name, param)
193 |                 weight_list.append(weight)
194 |         return weight_list
195 |  
196 |     def regularization_loss(self,weight_list, weight_decay, p=2):
197 |         '''
198 |         Compute the sum of the weight norms, scaled by weight_decay
199 |         :param weight_list:
200 |         :param p: order of the norm; defaults to the 2-norm
201 |         :param weight_decay:
202 |         :return:
203 |         '''
204 |         # weight_decay=Variable(torch.FloatTensor([weight_decay]).to(self.device),requires_grad=True)
205 |         # reg_loss=Variable(torch.FloatTensor([0.]).to(self.device),requires_grad=True)
206 |         # weight_decay=torch.FloatTensor([weight_decay]).to(self.device)
207 |         # reg_loss=torch.FloatTensor([0.]).to(self.device)
208 |         reg_loss=0
209 |         for name, w in weight_list:
210 |             l2_reg = torch.norm(w, p=p)
211 |             reg_loss = reg_loss + l2_reg
212 |  
213 |         reg_loss=weight_decay*reg_loss
214 |         return reg_loss
215 |  
216 |     def weight_info(self,weight_list):
217 |         '''
218 |         Print the names of the weights being regularized
219 |         :param weight_list:
220 |         :return:
221 |         '''
222 |         print("---------------regularization weight---------------")
223 |         for name ,w in weight_list:
224 |             print(name)
225 |         print("---------------------------------------------------")
226 | ```
227 | 
228 | #### Using the regularization class
229 | 
230 | ```python
231 | # Check whether a GPU is available
232 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
233 |  
234 | print("-----device:{}".format(device))
235 | print("-----Pytorch version:{}".format(torch.__version__))
236 |  
237 | weight_decay=100.0 # regularization coefficient
238 |  
239 | model = my_net().to(device)
240 | # Initialize the regularizer
241 | if weight_decay>0:
242 |    reg_loss=Regularization(model, weight_decay, p=2).to(device)
243 | else:
244 |    print("no regularization")
245 |  
246 |  
247 | criterion= nn.CrossEntropyLoss().to(device) # CrossEntropyLoss=softmax+cross entropy
248 | optimizer = optim.Adam(model.parameters(),lr=learning_rate)  # no need to set weight_decay here
249 |  
250 | # train
251 | batch_train_data=...
252 | batch_train_label=...
253 |  
254 | out = model(batch_train_data)
255 |  
256 | # loss and regularization
257 | loss = criterion(input=out, target=batch_train_label)
258 | if weight_decay > 0:
259 |    loss = loss + reg_loss(model)
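260 | total_loss = loss.item()  # Python float, for logging only
261 |  
262 | # backprop
263 | optimizer.zero_grad()  # clear all accumulated gradients
264 | loss.backward()  # backward() must be called on the loss Tensor, not on the float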
265 | optimizer.step()
266 | ```
267 | 
268 | 
269 | 
270 | ### **Learning rate decay**
271 | 
272 | torch.optim.lr_scheduler 
273 | 
274 | #### By epoch count
275 | 
276 | Every step_size epochs, the current learning rate is multiplied by gamma.
277 | 
278 | ```python
279 | optimizer = optim.SGD(params=model.parameters(), lr=0.05)
280 | 
281 | # lr_scheduler.StepLR()
282 | # Assuming optimizer uses lr = 0.05 for all groups
283 | # lr = 0.05     if epoch < 30
284 | # lr = 0.005    if 30 <= epoch < 60
285 | # lr = 0.0005   if 60 <= epoch < 90
286 | 
287 | scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
288 | y = []  # learning rate used in each epoch
289 | for epoch in range(100):
290 |     lr = scheduler.get_last_lr()[0]
291 |     print(epoch, lr)
292 |     y.append(lr)
293 |     # ... train for one epoch here ...
294 |     optimizer.step()
295 |     scheduler.step()  # call after optimizer.step()
296 | ```
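
The schedule listed in the comments has the closed form lr0 * gamma ** (epoch // step_size). A self-contained sketch that checks it, with a single dummy parameter standing in for a model:

```python
import math
import torch
from torch.optim import SGD, lr_scheduler

param = torch.zeros(1, requires_grad=True)   # dummy parameter, stands in for a model
optimizer = SGD([param], lr=0.05)
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

lrs = []
for epoch in range(90):
    lrs.append(scheduler.get_last_lr()[0])   # learning rate used in this epoch
    optimizer.step()                         # training for the epoch would happen here
    scheduler.step()

expected = [0.05 * 0.1 ** (epoch // 30) for epoch in range(90)]
matches = all(math.isclose(a, b, rel_tol=1e-6) for a, b in zip(lrs, expected))
print(lrs[0], lrs[30], lrs[60], matches)
```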
297 | 
298 | #### By a monitored metric
299 | 
300 | ```python
301 | # class torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
302 | #     mode='min', factor=0.1, patience=10, verbose=False,
303 | #     threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
304 | ```
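
Unlike StepLR, this scheduler is stepped with the metric it monitors. The sketch below feeds it a made-up sequence of validation losses that stops improving; the model, the losses and patience=2 are illustrative assumptions.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(2, 1)
optimizer = SGD(model.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)

# Hypothetical validation losses: improvement stalls after the second epoch
val_losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]

lrs = []
for val_loss in val_losses:
    optimizer.step()            # training for the epoch would happen here
    scheduler.step(val_loss)    # pass the monitored metric
    lrs.append(optimizer.param_groups[0]['lr'])

print(lrs)  # the learning rate drops once the loss has stalled for more than `patience` epochs
```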
305 | 
306 | 
307 | 
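308 | # Inspecting layer outputs (feature maps), weights and biases of a PyTorch network
309 | 
310 | ## weight and bias
311 | 
312 | ```python
313 | # Method 1: parameters can be inspected in several ways; the simplest is direct attribute access
314 | model = alexnet(pretrained=True).to(device)
315 | conv1_weight = model.features[0].weight
316 | 
317 | # Method 2 
318 | # Also useful when you rewrite a network based on a pretrained model: the per-layer parameters are the same but the structure is expressed differently.
319 | # You can iterate over the params and assign them to the matching layers of your network, which avoids the mismatch errors of a direct load.
320 | for layer, param in model.state_dict().items():  # param is a weight or bias Tensor
321 |     print(layer, param)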
322 | ```
323 | 
324 | ## feature map
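325 | Because PyTorch builds its computation graph dynamically and does not keep intermediate results, inspecting the feature maps output by each layer is not very convenient. Two cases are discussed below:
326 | 
327 | 1. The layers you want to inspect are separate attributes: capture their outputs in variables in forward and return them.
328 | ```python
329 | class Net(nn.Module):
330 |     def __init__(self):
331 |         super().__init__()  # required before assigning submodules
332 |         self.conv1 = nn.Conv2d(1, 1, 3)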
333 |         self.conv2 = nn.Conv2d(1, 1, 3)
334 |         self.conv3 = nn.Conv2d(1, 1, 3)
335 | 
336 |     def forward(self, x):
337 |         out1 = F.relu(self.conv1(x))
338 |         out2 = F.relu(self.conv2(out1))
339 |         out3 = F.relu(self.conv3(out2))
340 |         return out1, out2, out3
341 | ```
342 | 
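343 | 2. The layer you want to inspect sits inside an nn.Sequential() container. This takes more work; the main options are:
344 | 
345 | ```python
346 | # Method 1: use nn.Module.children()
347 | # After instantiating the model, use nn.Module.children() to drop the layers after the one you want to inspect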
348 | import torch
349 | import torch.nn as nn
350 | from torchvision import models
351 | 
352 | model = models.alexnet(pretrained=True)
353 | 
354 | # remove last fully-connected layer
355 | new_classifier = nn.Sequential(*list(model.classifier.children())[:-1])
356 | model.classifier = new_classifier
357 | # Keep the first five layers of features (up to the ReLU after the second convolution)
358 | new_features = nn.Sequential(*list(model.features.children())[:5])
359 | model.features = new_features
360 | ```
361 | 
362 | 
363 | 
364 | ```python
365 | # Method 2: use hooks (recommended, since the original model is left unchanged)
366 | # torch.nn.Module.register_forward_hook(hook)
367 | # hook(module, input, output) -> None
368 | 
369 | model = models.alexnet(pretrained=True)
370 | # Define the hook
371 | def hook(module, input, output):
372 |     print(output.size())
373 | # Register it; the hook fires on every forward pass through this layer
374 | handle = model.features[0].register_forward_hook(hook)
375 | # Remove the hook through its handle once it is no longer needed
376 | handle.remove()
377 | 
378 | # torch.nn.Module.register_backward_hook(hook)  (newer PyTorch versions also provide register_full_backward_hook)
379 | # hook(module, grad_input, grad_output) -> Tensor or None
380 | model = alexnet(pretrained=True).to(device)
381 | grads = []
382 | def hook(module, grad_input, grad_output):
383 |     grads.append(grad_output)
384 |     print(len(grads))
385 | 
386 | handle = model.features[0].register_backward_hook(hook)
387 | ```
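
A forward hook only fires when data actually flows through the layer, so a complete round trip looks roughly like the sketch below. It uses a small made-up nn.Sequential instead of AlexNet to avoid downloading weights; the names features and save_output are illustrative.

```python
import torch
from torch import nn

features = {}  # layer name -> captured feature map


def save_output(name):
    """Build a forward hook that stores the layer output under `name`."""
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook


model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3),
)

handle = model[0].register_forward_hook(save_output('conv1'))
out = model(torch.randn(2, 1, 8, 8))   # the hook fires during this forward pass
handle.remove()                        # later forward passes no longer trigger it

print(features['conv1'].shape)  # torch.Size([2, 4, 6, 6])
print(out.shape)                # torch.Size([2, 8, 4, 4])
```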
388 | 
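389 | Note: you can also define a feature-extractor class, or even rebuild an equivalent model in which each layer is a separate attribute, which reduces the problem to the first case.
390 | 
391 | ## Counting model parameters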
392 | ```python
393 | def count_parameters(model):
394 |     return sum(p.numel() for p in model.parameters() if p.requires_grad)
395 | ```
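
A quick usage sketch of the function above on a small made-up model. Freezing a layer removes its parameters from the count because of the requires_grad filter.

```python
import torch
from torch import nn


def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
total = count_parameters(model)        # 10*5 + 5 + 5*2 + 2 = 67

# Freeze the first layer; only the second layer remains trainable
for p in model[0].parameters():
    p.requires_grad_(False)
trainable = count_parameters(model)    # 5*2 + 2 = 12

print(total, trainable)  # 67 12
```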
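396 | # Custom Operation (Function)
397 | The class torch.autograd.Function defines the formula for differentiating an operation and records operation history. Every operation performed on Tensors creates a new function object that performs the computation and records that it happened. The history is retained as a DAG of functions, with edges denoting data dependencies (input <- output). When backward is called, the graph is processed in topological order by calling the backward() method of each Function object and passing the returned gradients on to the next Function.
398 | 
399 | In general, the only way users interact with functions is by creating subclasses and defining new operations. This is the recommended way of extending torch.autograd.
400 | 
401 | ## Notes on creating a subclass
402 | 
403 | - The subclass must override forward() and backward(), both as static methods decorated with @staticmethod.
404 | - forward() must accept a context ctx as its first argument; the context can store tensors to be retrieved during the backward pass. Any number of further arguments (tensors or other types) may follow.
405 | - backward() must accept a context ctx as its first argument; the context can be used to retrieve tensors saved during the forward pass.
406 | - The remaining arguments of backward() are the gradients with respect to the outputs of forward(), one per returned value. Its return values are the gradients with respect to the inputs of forward(), one per input.
407 | - The operation is invoked with class_name.apply(arg).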
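
The rules above can be illustrated with a Function that takes two inputs, so backward() has to return two gradients. This is a minimal sketch; the class name MyMul is made up, and torch.autograd.gradcheck compares the hand-written backward with numerical gradients.

```python
import torch


class MyMul(torch.autograd.Function):
    """Element-wise product a * b with a hand-written backward pass."""

    @staticmethod
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)   # keep both inputs for the backward pass
        return a * b

    @staticmethod
    def backward(ctx, grad_output):
        a, b = ctx.saved_tensors
        # One gradient per input of forward(): d(a*b)/da = b, d(a*b)/db = a
        return grad_output * b, grad_output * a


# gradcheck needs double precision inputs
a = torch.randn(4, dtype=torch.double, requires_grad=True)
b = torch.randn(4, dtype=torch.double, requires_grad=True)

ok = torch.autograd.gradcheck(MyMul.apply, (a, b))
print(ok)  # True when the analytical and numerical gradients agree
```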
408 | 
409 | ### Example 1: a custom ReLU activation function
410 | 
411 | ```python
412 | class MyReLU(torch.autograd.Function):
413 |     """
414 |     We can implement our own custom autograd Functions by subclassing
415 |     torch.autograd.Function and implementing the forward and backward passes
416 |     which operate on Tensors.
417 |     """
418 | 
419 |     @staticmethod
420 |     def forward(ctx, input):
421 |         """
422 |         In the forward pass we receive a Tensor containing the input and return
423 |         a Tensor containing the output. ctx is a context object that can be used
424 |         to stash information for backward computation. You can cache arbitrary
425 |         objects for use in the backward pass using the ctx.save_for_backward method.
426 |         """
427 |         ctx.save_for_backward(input)
428 |         return input.clamp(min=0)
429 | 
430 |     @staticmethod
431 |     def backward(ctx, grad_output):
432 |         """
433 |         In the backward pass we receive a Tensor containing the gradient of the loss
434 |         with respect to the output, and we need to compute the gradient of the loss
435 |         with respect to the input.
436 |         """
437 |         input, = ctx.saved_tensors
438 |         grad_input = grad_output.clone()
439 |         grad_input[input < 0] = 0
440 |         return grad_input
441 | ```
442 | 
443 | 
444 | ### Example 2: a custom OHEMHingeLoss loss function
445 | 
446 | ```python
447 | # from the https://github.com/yjxiong/action-detection
448 | class OHEMHingeLoss(torch.autograd.Function):
449 |     """
450 |     This class is the core implementation for the completeness loss in paper.
451 |     It computes class-wise hinge loss and performs online hard negative mining (OHEM).
452 |     """
453 | 
454 |     @staticmethod
455 |     def forward(ctx, pred, labels, is_positive, ohem_ratio, group_size):
456 |         n_sample = pred.size()[0]
457 |         assert n_sample == len(labels), "mismatch between sample size and label size"
458 |         losses = torch.zeros(n_sample)
459 |         slopes = torch.zeros(n_sample)
460 |         for i in range(n_sample):
461 |             losses[i] = max(0, 1 - is_positive * pred[i, labels[i] - 1])
462 |             slopes[i] = -is_positive if losses[i] != 0 else 0
463 | 
464 |         losses = losses.view(-1, group_size).contiguous()
465 |         sorted_losses, indices = torch.sort(losses, dim=1, descending=True)
466 |         keep_num = int(group_size * ohem_ratio)
467 |         loss = torch.zeros(1).cuda()
468 |         for i in range(losses.size(0)):
469 |             loss += sorted_losses[i, :keep_num].sum()
470 |         ctx.loss_ind = indices[:, :keep_num]
471 |         ctx.labels = labels
472 |         ctx.slopes = slopes
473 |         ctx.shape = pred.size()
474 |         ctx.group_size = group_size
475 |         ctx.num_group = losses.size(0)
476 |         return loss
477 | 
478 |     @staticmethod
479 |     def backward(ctx, grad_output):
480 |         labels = ctx.labels
481 |         slopes = ctx.slopes
482 | 
483 |         grad_in = torch.zeros(ctx.shape)
484 |         for group in range(ctx.num_group):
485 |             for idx in ctx.loss_ind[group]:
486 |                 loc = idx + group * ctx.group_size
487 |                 grad_in[loc, labels[loc] - 1] = slopes[loc] * grad_output.data[0]
488 |         return torch.autograd.Variable(grad_in.cuda()), None, None, None, None
489 | ```


--------------------------------------------------------------------------------
/3pytorch中的损失函数.md:
--------------------------------------------------------------------------------
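  1 | Remembering the most commonly used ones is enough.
  2 | 
  3 | # 1 Cross-entropy loss: CrossEntropyLoss
  4 | 
  5 | The logits passed to cross_entropy are raw outputs that have not gone through a softmax layer.
  6 | 
  7 | The target is a single class index per sample, not the corresponding one-hot vector.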
  8 | $$
  9 | \operatorname{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right)
 10 |                = -x[class] + \log\left(\sum_j \exp(x[j])\right)
 11 | $$
 12 | 
 13 | 
 14 | ![img](F:/%E7%AC%94%E8%AE%B0%E6%95%B4%E7%90%86/%E6%9C%89%E9%81%93%E4%BA%91%E7%AC%94%E8%AE%B0/yangsenupc@163.com/32f7540b9310445d879401203e4e0881/clipboard.png)
 15 | 
 16 | ```python
 17 | class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
 18 | # Applies softmax to the input and then computes the cross-entropy against the target, i.e. this
 19 | # criterion combines nn.LogSoftmax() and nn.NLLLoss()
 20 | # The target holds class indices (labels), not the corresponding one-hot vectors
 21 | 
 22 | # weight: a manual rescaling weight given to each class. If given, has to be a Tensor of size C
 23 | ```
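
As a minimal sketch of the statement that CrossEntropyLoss combines LogSoftmax and NLLLoss, the snippet below compares `F.cross_entropy` with `F.nll_loss(F.log_softmax(...))` and with a hand evaluation of the formula above. The tensor names and values (`logits`, `target`, `picked`) are illustrative assumptions, not part of the original notes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw scores: batch of 4, C = 3 classes
target = torch.tensor([0, 2, 1, 2])   # class indices, not one-hot vectors

ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Hand evaluation of -x[class] + log(sum_j exp(x[j])), averaged over the batch
picked = logits[torch.arange(4), target]
manual = (-picked + torch.logsumexp(logits, dim=1)).mean()

print(ce.item(), nll.item(), manual.item())
```

All three values should agree up to floating-point error, which is why the logits must not be passed through softmax before calling `cross_entropy`.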
 24 | 
 25 | ## Template
 26 | 
 27 | ```python
 28 | criterion = nn.CrossEntropyLoss().to(device)
 29 | 
 30 | for batch_idx, (data, target) in enumerate(train_loader):
 31 |         data = data.view(-1, 28*28)
 32 |         data, target = data.to(device), target.to(device)
 33 | 
 34 |         logits = model(data)
 35 |         loss = criterion(logits, target)
 36 |         # logits.shape: [batch, C], target.shape: [batch]
 37 |         # C is the total number of classes
 38 |         # In the handwritten-digit example, batch = 50:
 39 |         # torch.Size([50, 10])
 40 |         # torch.Size([50])
 41 | 
 42 |         optimizer.zero_grad()
 43 |         loss.backward()
 44 |         optimizer.step()
 45 | 
 46 | ```
 47 | 
 48 | 
 49 | 
 50 | | torch.nn         | torch.nn.functional (F) |
 51 | | ---------------- | ----------------------- |
 52 | | CrossEntropyLoss | cross_entropy           |
 53 | | LogSoftmax       | log_softmax             |
 54 | | NLLLoss          | nll_loss                |
 55 | 
 56 | # 2 NLLLoss
 57 | 
 58 | Negative log-likelihood loss: the maximum-likelihood / log-likelihood cost function.
 59 | 
 60 | ```python
 61 | torch.nn.NLLLoss
 62 | # loss(input, class) = -input[class]. For example, in a three-class task with
 63 | # input = [-1.233, 2.657, 0.534] and true label 2 (class=2), the loss is -0.534
 64 | ```
 65 | 
 66 | nll_loss, by contrast, takes as input the output after softmax and log, i.e. log-probabilities.
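
The three-class arithmetic above can be checked directly; this is only a sketch that reuses the numbers from the example as-is (in real use the input would be log-probabilities produced by `log_softmax`). The names `inp` and `target` are illustrative.

```python
import torch
import torch.nn.functional as F

# Values from the three-class example; used as-is for the arithmetic.
inp = torch.tensor([[-1.233, 2.657, 0.534]])   # shape [batch=1, C=3]
target = torch.tensor([2])                     # true class index

loss = F.nll_loss(inp, target)                 # picks -inp[0, 2]
print(loss.item())
```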
 67 | 
 68 | ## Template
 69 | 
 70 | ```python
 71 | out=F.log_softmax(out,dim=1)
 72 | # log-softmax for classification: each sample gets N log-probabilities, where N is the number of classes
 73 | ```
 74 | 
 75 | ```python
 76 | for batch_idx, (data, target) in enumerate(train_loader):
 77 |         data, target = data.to(device), target.to(device)
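 78 |         optimizer.zero_grad()  # reset the gradients
 79 |         output = model(data)  # output shape [N, 10]; data is the x argument of the model's forward
 80 |         loss = F.nll_loss(output, target)  # the loss is the mean over the batch
 81 | # F.nll_loss(F.log_softmax(input, dim=1), target):
 82 | # single-label cross-entropy loss: each image has exactly one class, and the input must go through (log-)softmax
 83 | # there is also the multi-label case, where one image has several classes and the input must go through sigmoid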
 84 |         loss.backward()
 85 |         optimizer.step()
 86 | 
 87 | ```
 88 | 
 89 | 
 90 | 
 91 | # 3 L1Loss
 92 | 
 93 | ## Purpose
 94 | 
 95 | Computes the absolute value of the difference between output and target; it can return either a tensor of the same shape or a scalar.
 96 | 
 97 | ```python
 98 | torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')
 99 | 
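100 | # reduce (bool): whether the return value is a scalar; default True
101 | # size_average (bool): only takes effect when reduce=True. If True, the returned loss is the mean;
102 | # if False, it is the sum of the per-sample losses.
103 | # size_average and reduce are in the process of being deprecated
104 | 
105 | # reduction='mean': scalar output, the mean
106 | # reduction='sum': scalar output, the sum
107 | # reduction='none': tensor output, no reduction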
108 | ```
109 | 
110 | ![img](https://pytorch.apachecn.org/docs/1.0/img/415564bfa6c89ba182a02fe2a3d0ca49.jpg)
111 | 
112 | where $N$ is the batch size. If `reduction` is not `'none'` (default `'mean'`), then:
113 | $$
114 | \ell(x, y)=\left\{\begin{array}{ll}{\operatorname{mean}(L),} & {\text { if reduction }=\text { 'mean' }} \\ {\operatorname{sum}(L),} & {\text { if reduction }=\text { 'sum' }}\end{array}\right.
115 | $$
116 | 
117 | $$
118 | \begin{array}{l}{\text { Input: }(N, *) \text { where } * \text { means any number of additional dimensions }} \\ {\text { Target: }(N, *) \text {, same shape as the input }} \\ {\text { Output: scalar. If reduction is 'none', then }(N, *), \text { same shape as the input }}\end{array}
119 | $$
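
A small sketch of the three `reduction` modes, using `nn.L1Loss` on hand-picked values; the tensors `output` and `target` are illustrative assumptions.

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.5, 2.0], [1.0, 8.0]])

per_element = nn.L1Loss(reduction='none')(output, target)  # |output - target|, same shape as the input
total = nn.L1Loss(reduction='sum')(output, target)         # scalar: sum of all elements
mean = nn.L1Loss(reduction='mean')(output, target)         # scalar: sum / number of elements

print(per_element, total.item(), mean.item())
```

The same `reduction` argument behaves identically for MSELoss and the other element-wise losses in these notes.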
120 | 
121 | 
122 | # 4 MSELoss
123 | 
124 | ## Purpose
125 | 
126 | Computes the square of the difference between output and target; it can return either a tensor of the same shape or a scalar.
127 | 
128 | 
129 | 
130 | ```python
131 | torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')
132 | # reduction='mean': scalar output, the mean
133 | # reduction='sum': scalar output, the sum
134 | # reduction='none': tensor output, no reduction
135 | ```
136 | 
137 | The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as:
138 | $$
139 | \ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{\top}, \quad l_{n}=\left(x_{n}-y_{n}\right)^{2}
140 | $$
141 | where $N$ is the batch size. If `reduction` is not `'none'` (default `'mean'`), then:
142 | $$
143 | \ell(x, y)=\left\{\begin{array}{ll}{\operatorname{mean}(L),} & {\text { if reduction }=\text { 'mean' }} \\ {\operatorname{sum}(L),} & {\text { if reduction }=\text { 'sum' }}\end{array}\right.
144 | $$
145 | 
146 | 
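147 | # 5 BCELoss: cross-entropy for binary classification
148 | 
149 | ## Purpose
150 | 
151 | The cross-entropy function for binary classification. It can be regarded as a special case of nn.CrossEntropyLoss restricted to two classes; y must be in {0,1}. Note also that input should have the form of a probability, as cross-entropy requires, so the input to BCELoss is usually the output of a sigmoid activation layer.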
152 | 
153 | ```python
154 | torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')
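155 | # weight (Tensor): a manual rescaling weight for the loss of each batch element (w_n in the formula below); weighting is often used for class-imbalance problems.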
156 | ```
157 | 
158 | The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as
159 | $$
160 | \ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{T}, \quad l_{n}=-w_{n}\left[y_{n} \cdot \log x_{n}+\left(1-y_{n}\right) \cdot \log \left(1-x_{n}\right)\right]\\
161 | \ell(x, y)=\left\{\begin{array}{ll}{\operatorname{mean}(L),} & {\text { if reduction }=\text { 'mean' }} \\ {\operatorname{sum}(L),} & {\text { if reduction }=\text { 'sum' }}\end{array}\right.
162 | $$
163 | 
164 | 
165 | # 6 BCEWithLogitsLoss
166 | 
167 | ## Purpose
168 | 
169 | Combines Sigmoid with BCELoss, much as CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss(). That is, the input first passes through the Sigmoid activation, which turns it into probabilities.
170 | 
171 | ```python
172 | torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)
173 | # weight (Tensor): sets a weight for each sample in the batch. If given, has to be a Tensor of size “nbatch”.
174 | # pos_weight: weight of the positive samples. p > 1 increases recall and p < 1 increases precision, so it can be used to trade off recall against precision. Must be a vector with length equal to the number of classes.
175 | ```
176 | 
177 | 
178 | $$
179 | \ell(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}^{\top}, \quad l_{n}=-w_{n}\left[y_{n} \cdot \log \sigma\left(x_{n}\right)+\left(1-y_{n}\right) \cdot \log \left(1-\sigma\left(x_{n}\right)\right)\right]
180 | $$
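
The relationship between BCEWithLogitsLoss and BCELoss can be sketched as follows: applying `torch.sigmoid` first and then `nn.BCELoss` should match the fused loss, and both should match the formula with $w_n = 1$. The names `logits`, `labels` and `p` are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5)                              # raw scores, no sigmoid applied
labels = torch.tensor([1.0, 0.0, 1.0, 1.0, 0.0])     # targets in {0, 1}

fused = nn.BCEWithLogitsLoss()(logits, labels)       # sigmoid + BCE in one step
two_step = nn.BCELoss()(torch.sigmoid(logits), labels)

# Hand evaluation of -[y log(sigma(x)) + (1 - y) log(1 - sigma(x))], averaged
p = torch.sigmoid(logits)
manual = -(labels * torch.log(p) + (1 - labels) * torch.log(1 - p)).mean()

print(fused.item(), two_step.item(), manual.item())
```

The fused version is generally preferred in practice because it is numerically more stable for large-magnitude logits.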
181 | 
182 | 
183 | 
184 | 
185 | 
186 | # 7.PoissonNLLLoss
187 | ```
188 | torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')
189 | ```
190 | 
191 | Purpose:
192 | Negative log-likelihood loss for tasks in which the target follows a Poisson distribution.
193 | Formula:
194 | $$
195 | \text{target} \sim \mathrm{Poisson}(\text{input})\\
196 | \text{loss}(\text{input}, \text{target}) = \text{input} - \text{target} * \log(\text{input}) + \log(\text{target!})
197 | $$
198 | Parameters:
199 | 
200 | - log_input (bool): if True, the loss is computed as $loss(input,target)=exp(input) - target * input$;
201 |     if False, as $loss(input,target)=input - target * log(input+eps)$
202 | 
203 | - full (bool): whether to compute the full loss, i.e. to add the Stirling approximation of the factorial term, target*log(target) - target + 0.5*log(2π·target)
204 | 
205 | - eps (float): a small correction added to avoid evaluating log(0) when log_input = False, i.e. $loss(input,target)=input - target * log(input+eps)$
206 | 
207 | - reduction (*string*,*optional*) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'`
208 | 
209 |     ### shape
210 | 
211 |     Input: $(N,∗)$, where $∗$ means any number of additional dimensions
212 | 
213 |     Target: $(N,∗)$, same shape as the input
214 | 
215 |     Output: scalar by default. If `reduction` is `'none'`, then $(N,∗)$, the same shape as the input
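
A hedged sketch of the two `log_input` modes described above, checked against the formulas by hand. The tensors `log_rate`, `rate` and `target` are illustrative; `full` is left at its default of False, so the Stirling term is not included.

```python
import torch
import torch.nn as nn

log_rate = torch.tensor([0.1, 1.0, -0.5])   # input interpreted as log of the Poisson rate
rate = torch.exp(log_rate)                  # the same rates, not in log space
target = torch.tensor([1.0, 3.0, 0.0])      # observed counts

# log_input=True: loss = exp(input) - target * input
loss_log = nn.PoissonNLLLoss(log_input=True)(log_rate, target)
manual_log = (torch.exp(log_rate) - target * log_rate).mean()

# log_input=False: loss = input - target * log(input + eps)
eps = 1e-8
loss_raw = nn.PoissonNLLLoss(log_input=False, eps=eps)(rate, target)
manual_raw = (rate - target * torch.log(rate + eps)).mean()

print(loss_log.item(), loss_raw.item())
```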
216 | 
217 | # 8.KLDivLoss
218 | ```python
219 | torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean')
220 | 
221 | ```
222 | 
223 | Purpose:
224 | Computes the KL divergence (Kullback–Leibler divergence) between input and target.
225 | Formula:
226 | $$
227 | l(x, y)=L=\left\{l_{1}, \ldots, l_{N}\right\}, \quad l_{n}=y_{n} \cdot\left(\log y_{n}-x_{n}\right)
228 | $$
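229 | (The formula takes no logarithm of $x_n$ because the input is expected to already contain log-probabilities, e.g. the output of `log_softmax`, while the target is given as probabilities.)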
230 | 
231 |  If `reduction` is not `'none'`(default `'mean'`), then:
232 | $$
233 | \ell(x, y) = \begin{cases} \operatorname{mean}(L); \text{if reduction} = \text{'mean';} \\ \operatorname{sum}(L); \text{if reduction} = \text{'sum';} \end{cases}
234 | $$
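235 | Background: KL divergence
236 | The KL divergence (Kullback–Leibler divergence), also called relative entropy, describes the difference between two probability distributions. In the discrete case it is:
237 | $D(P \| Q) = \sum_{i} p(i) \log \frac{p(i)}{q(i)}$
238 | Here p is the true distribution and q is the distribution fitted to p; D(P||Q) is the information lost when q is used to approximate p. This information loss can be read as a loss: the lower it is, the closer the fitted distribution q is to the true distribution p. Seen from another angle, the formula is the expectation under p of the log-difference between p and q.
239 | Note in particular that D(p||q) ≠ D(q||p): it is not symmetric, so it cannot be called a K-L distance.
240 | Entropy = cross-entropy - relative entropy
241 | From an information-theoretic point of view, the three are related by entropy = cross-entropy - relative entropy, i.e. H(p) = H(p,q) - D(p||q). In machine learning, when the training data is fixed, H(p) is constant, so minimizing the relative entropy D(p||q) is equivalent to minimizing the cross-entropy H(p,q).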
242 | 
243 | Parameters:
244 | **reduction** (*string*, *optional*) – Specifies the reduction to apply to the output: `'none'` | `'batchmean'` | `'sum'` | `'mean'`. `'none'`: no reduction will be applied. `'batchmean'`: the sum of the output will be divided by batchsize. `'sum'`: the output will be summed. `'mean'`: the output will be divided by the number of elements in the output. Default: `'mean'`
245 | 
246 | Usage note:
247 | To obtain the true KL divergence, do the following:
248 | 
249 | `reduction` = `'mean'` doesn’t return the true KL divergence value; please use `reduction` = `'batchmean'`, which aligns with the mathematical definition of KL divergence. In the next major release, `'mean'` will be changed to be the same as `'batchmean'`.
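
A minimal sketch of the points above, assuming the input holds log-probabilities and the target holds probabilities: `'batchmean'` matches the hand-computed sum of $y_n(\log y_n - x_n)$ divided by the batch size, while `'mean'` divides by the number of elements instead. The names `log_q` and `p` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
log_q = F.log_softmax(torch.randn(2, 4), dim=1)   # input: log-probabilities of the fitted distribution
p = F.softmax(torch.randn(2, 4), dim=1)           # target: probabilities of the true distribution

kl = nn.KLDivLoss(reduction='batchmean')(log_q, p)
manual = (p * (torch.log(p) - log_q)).sum() / p.size(0)

# 'mean' divides by all 8 elements rather than by the batch size of 2
# (PyTorch may emit a warning about this reduction mode)
mean_variant = nn.KLDivLoss(reduction='mean')(log_q, p)

print(kl.item(), manual.item(), mean_variant.item())
```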
250 | 
251 | ## shape
252 | 
253 | Input: $(N,∗)$, where $∗$ means any number of additional dimensions
254 | 
255 | Target: $(N,∗)$, same shape as the input
256 | 
257 | Output: scalar by default. If `reduction` is `'none'`, then $(N,∗)$, the same shape as the input
258 | 
259 | 
260 | 
261 | # 9.MarginRankingLoss
262 | ```
263 | torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')
264 | ```
265 | 
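266 | Purpose:
267 | Compares two inputs x1 and x2 under a ranking label y: the loss is 0 when y * (x1 - x2) is at least margin, and positive otherwise.
268 | Formula:
269 | $$
270 | \operatorname{loss}(x, y)=\max (0,-y *(x_1-x_2)+\operatorname{margin})
271 | $$
272 | When y = 1, x1 must exceed x2 (by at least margin) for the loss to vanish; conversely, when y = -1, x1 must be smaller than x2 for the loss to vanish.
273 | Parameters:
274 | margin (float): the required gap between x1 and x2.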
275 | **reduction** (*string*, *optional*) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'`
276 | 
277 | ## shape
278 | 
279 | $$
280 | \begin{array}{l}{\text { Input: }(N, D) \text { where } N \text { is the batch size and } D \text { is the size of a sample. }} \\ {\text { Target: }(N)} \\ {\text { Output: scalar. If reduction is 'none', then }(N) .}\end{array}
281 | $$
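
The ranking formula can be sketched on three hand-picked pairs; the values of `x1`, `x2`, `y` and the margin of 0.1 are illustrative assumptions.

```python
import torch
import torch.nn as nn

x1 = torch.tensor([0.8, 0.2, 0.5])
x2 = torch.tensor([0.3, 0.6, 0.5])
y = torch.tensor([1.0, 1.0, -1.0])    # 1: x1 should rank higher, -1: x2 should rank higher

loss_fn = nn.MarginRankingLoss(margin=0.1, reduction='none')
per_pair = loss_fn(x1, x2, y)

# Hand evaluation of max(0, -y * (x1 - x2) + margin)
manual = torch.clamp(-y * (x1 - x2) + 0.1, min=0)

print(per_pair)
```

The first pair is ranked correctly with room to spare (loss 0), the second is ranked the wrong way round (loss 0.5), and the third is a tie, which still costs the margin (loss 0.1).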
282 | 
283 | # 10.HingeEmbeddingLoss
284 | ```
285 | torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')
286 | ```
287 | 
288 | Purpose:
289 | An extension of the hinge loss, mainly used to measure whether two inputs are similar; typically used for learning nonlinear embeddings or for semi-supervised learning.
290 | Formula:
291 | 
292 | The loss function for the $n$-th sample in the mini-batch is
293 | 
294 | $l_n = \begin{cases} x_n,  \text{if}\; y_n = 1,\\ \max \{0, \Delta - x_n\}, \text{if}\; y_n = -1, \end{cases}$
295 | 
296 | and the total loss function is
297 | 
298 | $\ell(x, y) = \begin{cases} \operatorname{mean}(L), \text{if reduction} = \text{mean;}\\ \operatorname{sum}(L), \text{if reduction} = \text{sum.} \end{cases}$
299 | 
300 | where $L = \{l_1,\dots,l_N\}^\top.$
301 | 
302 | Parameters:
303 | margin (float): default 1; the tolerated gap.
304 | 
305 | ## shape
306 | 
307 | $$
308 | \begin{array}{l}{\text { Input: (*) where * means any number of dimensions.
309 | The sum operation operates over all the elements. }} \\ {\text { Target: }(*), \text { same shape as the input }} \\ {\text { Output: scalar. If reduction is 'none', then same shape as the input }}\end{array}
310 | $$
311 | 
312 | # 11.MultiLabelMarginLoss
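313 | class torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')
314 | Purpose:
315 | For classification tasks in which one sample can belong to several classes. For example, in a four-class task, sample x belongs to classes 0 and 1 but not to classes 2 and 3.
316 | Formula:
317 | $\operatorname{loss}(x, y)=\sum_{i j} \frac{\max (0,1-(x[y[j]]-x[i]))}{\text{x.size}(0)}$
318 | x[y[j]] is the output value for a class that sample x belongs to, and x[i] is the output value for a class it does not belong to.
319 | 
320 | Parameters:
321 | size_average (bool): only takes effect when reduce=True. If True, the returned loss is the mean; if False, it is the sum of the per-sample losses.
322 | reduce (bool): whether the return value is a scalar; default True.
323 | Input: (C) or (N,C) where N is the batch size and C is the number of classes.
324 | Target: (C) or (N,C), same shape as the input.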
325 | 
326 | # 12.SmoothL1Loss
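327 | class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean')
328 | Purpose:
329 | Computes the smooth L1 loss, a special case of the Huber loss (with the parameter δ fixed to 1).
330 | Background:
331 | The Huber loss is commonly used for regression; its main feature is that it is insensitive to outliers and noise, which makes it robust.
332 | Its formula is:
333 | $L_{\delta}(a)=\begin{cases} \frac{1}{2} a^{2}, & \text{if } |a| \leq \delta, \\ \delta\left(|a|-\frac{1}{2} \delta\right), & \text{otherwise} \end{cases}$
334 | That is, when the absolute error is smaller than δ the L2 loss is used, and when it is larger than δ the L1 loss is used.
335 | Back to SmoothL1Loss: this is the Huber loss with δ=1.
336 | Its formula is:
337 | $l_n=\begin{cases} 0.5\,(x_n-y_n)^{2}, & \text{if } |x_n-y_n|<1, \\ |x_n-y_n|-0.5, & \text{otherwise} \end{cases}$
338 | It corresponds to the red curve in the original figure (the figure is not included here).
339 | 
340 | Parameters:
341 | size_average (bool): only takes effect when reduce=True. If True, the returned loss is the mean; if False, it is the sum of the per-sample losses.
342 | reduce (bool): whether the return value is a scalar; default True.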
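
As a sketch of the "L2 below the threshold, L1 above it" behaviour described above, the snippet compares `nn.SmoothL1Loss` with a hand-written piecewise function using the threshold 1. The tensors `pred` and `true` are illustrative assumptions.

```python
import torch
import torch.nn as nn

pred = torch.tensor([0.0, 0.5, 3.0])
true = torch.tensor([0.2, 0.0, 0.0])

loss = nn.SmoothL1Loss(reduction='none')(pred, true)

# Piecewise form: 0.5 * d^2 when |d| < 1, otherwise |d| - 0.5
diff = (pred - true).abs()
manual = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)

print(loss)
```

The outlier with an error of 3.0 contributes 2.5 rather than the 4.5 that a pure 0.5·d² loss would give, which is where the robustness to outliers comes from.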
343 | 
344 | # 13.SoftMarginLoss
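345 | class torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')
346 | Purpose:
347 | Creates a criterion that optimizes a two-class classification logistic loss between input tensor x and target tensor y (containing 1 or -1). (I do not yet understand how to use it; anyone who does is welcome to add details!)
348 | Formula:
349 | $\operatorname{loss}(x, y)=\sum_{i} \frac{\log (1+\exp (-y[i] * x[i]))}{\text{x.nelement}()}$
350 | Parameters:
351 | size_average (bool): only takes effect when reduce=True. If True, the returned loss is the mean; if False, it is the sum of the per-sample losses.
352 | reduce (bool): whether the return value is a scalar; default True.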
353 | 
354 | # 14.MultiLabelSoftMarginLoss
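355 | class torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='mean')
356 | Purpose:
357 | The multi-label version of SoftMarginLoss: a multi-label one-versus-all loss based on max-entropy.
358 | Formula:
359 | $\operatorname{loss}(x, y)=-\frac{1}{C} \sum_{i}\left[y[i] \cdot \log \frac{1}{1+\exp (-x[i])}+(1-y[i]) \cdot \log \frac{\exp (-x[i])}{1+\exp (-x[i])}\right]$
360 | Parameters:
361 | weight (Tensor): sets a weight for the loss of each class. weight must be a float tensor whose length equals the number of classes C, i.e. every class must be given a weight.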
362 | 
363 | # 15.CosineEmbeddingLoss
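364 | class torch.nn.CosineEmbeddingLoss(margin=0, size_average=None, reduce=None, reduction='mean')
365 | Purpose:
366 | Uses the cosine similarity to measure whether two inputs are similar; typically used for learning nonlinear embeddings or for semi-supervised learning.
367 | Formula:
368 | $\operatorname{loss}(x, y)=\begin{cases} 1-\cos (x_1, x_2), & \text{if } y=1, \\ \max (0, \cos (x_1, x_2)-\operatorname{margin}), & \text{if } y=-1 \end{cases}$
369 | Parameters:
370 | margin (float): takes values in [-1, 1]; the recommended range is [0, 0.5]
371 | size_average (bool): only takes effect when reduce=True. If True, the returned loss is the mean; if False, it is the sum of the per-sample losses.
372 | reduce (bool): whether the return value is a scalar; default True.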
373 | 
374 | # 16.MultiMarginLoss
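375 | class torch.nn.MultiMarginLoss(p=1, margin=1, weight=None, size_average=None, reduce=None, reduction='mean')
376 | Purpose:
377 | Computes the multi-class hinge loss.
378 | Formula:
379 | $\operatorname{loss}(x, y)=\frac{\sum_{i} w[y] \cdot \max (0, \operatorname{margin}-x[y]+x[i])^{p}}{\text{x.size}(0)}$
380 | where x is the score vector of one sample and y its class index, with 0 ≤ y ≤ x.size(0) - 1; i runs from 0 to x.size(0) - 1 with i ≠ y; p is 1 or 2; and w[y] is the weight of class y.
381 | Parameters:
382 | p (int): default 1; only 1 or 2 are allowed.
383 | margin (float): default 1
384 | weight (Tensor): sets a weight for the loss of each class. weight must be a float tensor whose length equals the number of classes C, i.e. every class must be given a weight.
385 | size_average (bool): only takes effect when reduce=True. If True, the returned loss is the mean; if False, it is the sum of the per-sample losses.
386 | reduce (bool): whether the return value is a scalar; default True.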
387 | 
388 | # 17.TripletMarginLoss
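389 | class torch.nn.TripletMarginLoss(margin=1.0, p=2, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')
390 | Purpose:
391 | Computes the triplet loss, commonly used in face verification.
392 | Given an Anchor, a Positive and a Negative sample, the goal is to make the distance between the Positive and the Anchor as small as possible and the distance between the Anchor and the Negative as large as possible.
393 | 
394 | In terms of the formula, the Anchor-Positive distance plus a threshold (the margin) must be smaller than the Anchor-Negative distance.
395 | 
396 | Formula:
397 | $L(a, p, n)=\max \{d(a_i, p_i)-d(a_i, n_i)+\operatorname{margin}, 0\}$, where $d(x_i, y_i)=\|x_i-y_i\|_p$
398 | 
399 | Parameters:
400 | margin (float): default 1
401 | p (int): the norm degree; default 2
402 | swap (bool): The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. Default: False
403 | size_average (bool): only takes effect when reduce=True. If True, the returned loss is the mean; if False, it is the sum of the per-sample losses.
404 | 
405 | reduce (bool): whether the return value is a scalar; default True.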
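
A small sketch of the triplet condition, using 2-D points chosen so the distances are easy to read off; the points and the variable names (`anchor`, `positive`, `negative`, `easy`, `hard`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

anchor = torch.tensor([[0.0, 0.0]])
positive = torch.tensor([[0.0, 1.0]])   # distance 1 from the anchor
negative = torch.tensor([[3.0, 0.0]])   # distance 3 from the anchor

loss_fn = nn.TripletMarginLoss(margin=1.0, p=2)

# d(a, p) + margin = 2 is below d(a, n) = 3, so the triplet is already satisfied
easy = loss_fn(anchor, positive, negative)

# Swapping the roles violates the condition: 3 - 1 + 1 = 3
hard = loss_fn(anchor, negative, positive)

print(easy.item(), hard.item())
```

The results match the formula only up to the small `eps` that PyTorch adds inside the distance computation.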
406 | 
407 | 
408 | 
409 | # 18 CTCLoss
410 | 
411 | 
412 | 
413 | 


--------------------------------------------------------------------------------
/0tensor操作.md:
--------------------------------------------------------------------------------
  1 | | Function                          | Purpose                                     |
  2 | | --------------------------------- | ------------------------------------------- |
  3 | | Tensor(*sizes)                    | basic constructor                           |
  4 | | tensor(data,)                     | constructor similar to np.array             |
  5 | | ones(*sizes)                      | all-ones Tensor                             |
  6 | | zeros(*sizes)                     | all-zeros Tensor                            |
  7 | | eye(*sizes)                       | ones on the diagonal, zeros elsewhere       |
  8 | | arange(s,e,step)                  | from s to e with step size step             |
  9 | | linspace(s,e,steps)               | from s to e, split evenly into steps points |
 10 | | rand/randn(*sizes)                | uniform / standard normal distribution      |
 11 | | normal(mean,std)/uniform(from,to) | normal / uniform distribution               |
 12 | | randperm(m)                       | random permutation of length m              |
 13 | 
 14 | # Creating tensors
 15 | 
 16 | All of these creation functions accept a dtype and a device (cpu/gpu) at creation time.
 17 | 
 18 | To inspect a tensor's shape, `tensor.shape` is equivalent to `tensor.size()`.
 19 | 
 20 | ```python
 21 | import torch as t
 22 | # Create a tensor from list data
 23 | b = t.Tensor([[1,2,3],[4,5,6]])
 24 | b.tolist() # convert the tensor to a list
 25 | b_size = b.size()
 26 | b.numel() # total number of elements in b, 2*3; equivalent to b.nelement()
 27 | # Create a tensor with the same shape as b
 28 | c = t.Tensor(b_size)
 29 | # Create a tensor whose elements are 2 and 3
 30 | d = t.Tensor((2, 3))
 31 | ```
 32 | 
 33 | ```python
 34 | 
 35 | 
 36 | >>> torch.arange(0,10,1)  # end is exclusive
 37 | tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
 38 | >>> torch.range(0,10,1)  # end is inclusive; torch.range is deprecated in favor of torch.arange
 39 | tensor([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,
 40 |          10.])
 41 | ```
 42 | 
 43 | 
 44 | 
 45 | # Tensor types
 46 | 
 47 | | Data type                | dtype                             | CPU tensor                                                   | GPU tensor                |
 48 | | ------------------------ | --------------------------------- | ------------------------------------------------------------ | ------------------------- |
 49 | | 32-bit floating point    | `torch.float32` or `torch.float`  | `torch.FloatTensor`                                          | `torch.cuda.FloatTensor`  |
 50 | | 64-bit floating point    | `torch.float64` or `torch.double` | `torch.DoubleTensor`                                         | `torch.cuda.DoubleTensor` |
 51 | | 16-bit floating point    | `torch.float16` or `torch.half`   | `torch.HalfTensor`                                           | `torch.cuda.HalfTensor`   |
 52 | | 8-bit integer (unsigned) | `torch.uint8`                     | [`torch.ByteTensor`](https://pytorch.org/docs/stable/tensors.html#torch.ByteTensor) | `torch.cuda.ByteTensor`   |
 53 | | 8-bit integer (signed)   | `torch.int8`                      | `torch.CharTensor`                                           | `torch.cuda.CharTensor`   |
 54 | | 16-bit integer (signed)  | `torch.int16` or `torch.short`    | `torch.ShortTensor`                                          | `torch.cuda.ShortTensor`  |
 55 | | 32-bit integer (signed)  | `torch.int32` or `torch.int`      | `torch.IntTensor`                                            | `torch.cuda.IntTensor`    |
 56 | | 64-bit integer (signed)  | `torch.int64` or `torch.long`     | `torch.LongTensor`                                           | `torch.cuda.LongTensor`   |
 57 | 
 58 | 
 59 | 
 60 | ```python
 61 | import torch as t
 62 | # Set the default tensor type; note that the argument is a string
 63 | t.set_default_tensor_type('torch.DoubleTensor')
 64 | a = t.Tensor(2,3)
 65 | a.dtype # a is now a DoubleTensor, dtype is float64
 66 | # Restore the previous default
 67 | t.set_default_tensor_type('torch.FloatTensor')
 68 | t.zeros_like(a) # equivalent to t.zeros(a.shape,dtype=a.dtype,device=a.device)
 69 | t.zeros_like(a, dtype=t.int16) # individual attributes can be overridden
 70 | 
 71 | ```
 72 | 
 73 | ## Converting types
 74 | 
 75 | ```python
 76 | >>> c = torch.tensor([3,4,5], dtype=torch.long)
 77 | >>> c
 78 | tensor([3, 4, 5])
 79 | >>> c.dtype
 80 | torch.int64
 81 | 
 82 | 
 83 | >>> a = torch.Tensor([2,3])
 84 | >>> a.dtype
 85 | torch.float32
 86 | >>> a.requires_grad
 87 | False
 88 | >>> a.int()
 89 | tensor([2, 3], dtype=torch.int32)
 90 | >>> a.short()
 91 | tensor([2, 3], dtype=torch.int16)
 92 | >>> a.type(torch.FloatTensor)
 93 | tensor([2., 3.])
 94 | >>> a.dtype
 95 | torch.float32
 96 | >>> a.long()  # returns a new int64 tensor; a itself is not modified, so a.dtype stays float32
 97 | tensor([2, 3])
 98 | >>> a.dtype
 99 | torch.float32
100 | >>> a.double()
101 | tensor([2., 3.], dtype=torch.float64)
102 | 
103 | >>> b=torch.LongTensor([4,5])
104 | >>> b
105 | tensor([4, 5])
106 | >>> b.dtype
107 | torch.int64
108 | ```
109 | 
110 | 
111 | 
112 | 
113 | 
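114 | # Element-wise operations
115 | 
116 | | Function                        | Purpose                                                          |
117 | | ------------------------------- | ---------------------------------------------------------------- |
118 | | abs/sqrt/div/exp/fmod/log/pow.. | absolute value/square root/division/exponential/remainder/logarithm/power.. |
119 | | cos/sin/asin/atan2/cosh..       | trigonometric functions                                          |
120 | | ceil/round/floor/trunc          | round up/round to nearest/round down/keep only the integer part  |
121 | | clamp(input, min, max)          | clip values that fall outside [min, max]                         |
122 | | sigmoid/tanh..                  | activation functions                                             |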
123 | 
124 | 
125 | 
126 | 
127 | 
128 | # Tensor and NumPy
129 | 
130 | Tensor ----> NumPy: use data.numpy(), where data is a Tensor
131 | 
132 | NumPy ----> Tensor: use torch.from_numpy(data), where data is a NumPy array
133 | 
134 | ```python
135 | import numpy as np
136 | a = np.ones([2, 3],dtype=np.float32)
137 | b = t.from_numpy(a)
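138 | b = t.Tensor(a) # a NumPy array can also be passed directly to Tensor
139 | 
140 | c = b.numpy() # a, b and c share memory
141 | 
142 | # When the NumPy dtype differs from the Tensor type, the data is copied and memory is not shared.
143 | # Whatever the input type, t.tensor(a) always copies the data and never shares memory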
144 | ```
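
The sharing and copying behaviour noted above can be sketched as follows; only `torch.from_numpy` and `torch.tensor` are exercised here, and the names `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

a = np.ones((2, 3), dtype=np.float32)
shared = torch.from_numpy(a)   # shares memory with the NumPy array
copied = torch.tensor(a)       # always copies the data

a[0, 0] = 100.0                # modify the NumPy array in place

print(shared[0, 0].item())     # follows the change made to a
print(copied[0, 0].item())     # keeps the original value
print(np.shares_memory(a, shared.numpy()))
```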
145 | 
146 | 
147 | 
148 | # Autograd
149 | 
150 | Autograd must be requested explicitly; tensors created by default do not track gradients
151 | 
152 | ```python
153 | # Set requires_grad when creating the tensor
154 | a = t.randn(3,4, requires_grad=True)
155 | # or
156 | a = t.randn(3,4).requires_grad_()
157 | # or
158 | a = t.randn(3,4)
159 | a.requires_grad=True
160 | ```
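
A minimal sketch of what `requires_grad=True` enables: after calling `backward()` on a scalar result, the gradient is available in `.grad`. The function y = sum(x²) and the names `x`, `y`, `plain` are illustrative assumptions.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()     # scalar function of x
y.backward()           # fills x.grad with dy/dx = 2x

plain = torch.tensor([1.0, 2.0, 3.0])   # default: gradients are not tracked

print(x.grad)
print(plain.requires_grad)
```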
161 | 
162 | 
163 | 
164 | 
165 | 
166 | # Tensor operations
167 | 
168 | 
169 | 
170 | ## **Tensor attributes:**
171 | 
172 | Tensor attributes are described by three classes: torch.dtype, torch.device and torch.layout
173 | 
174 | torch.dtype is the class that represents the data type of a torch.Tensor. PyTorch has eight different data types; the table below is the full list of dtypes.
175 | 
176 | 
177 | 
178 | ![img](https://pic1.zhimg.com/80/v2-95729ebb10269f807b0809fb09b125d0_hd.jpg)
179 | 
180 | 
181 | 
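182 | torch.device is the class that represents the device on which a torch.Tensor is allocated; the device type is either 'cpu' or 'cuda'. If no device index is given, the tensor is placed on the current device of that type: for example, 'cuda' is equivalent to 'cuda:X', where X is the return value of torch.cuda.current_device()
183 | 
184 | The device can be read from tensor.device, and a device can be specified either by a string or by a string plus an index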
185 | 
186 | ```python3
187 | # From a string:
188 | >>> torch.device('cuda:0')
189 | device(type='cuda', index=0)
190 | >>> torch.device('cpu')
191 | device(type='cpu')
192 | >>> torch.device('cuda') # the current device
193 | device(type='cuda')
194 | 
195 | # From a string and a device index:
196 | >>> torch.device('cuda', 0)
197 | device(type='cuda', index=0)
198 | >>> torch.device('cpu', 0)
199 | device(type='cpu', index=0)
200 | ```
201 | 
202 | In addition, tensors are moved between cpu and cuda devices with 'to':
203 | 
204 | ```text
205 | >>> device_cpu = torch.device("cpu")  # declare the cpu device
206 | >>> device_cuda = torch.device('cuda')  # declare the cuda device
207 | >>> data = torch.Tensor([1])
208 | >>> data.to(device_cpu)  # move the data to the cpu
209 | >>> data.to(device_cuda)   # move the data to cuda
210 | ```
211 | 
212 | 
213 | 
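214 | torch.layout is the class that represents the memory layout of a torch.Tensor; the layout used throughout these notes is torch.strided (dense tensors), and torch.sparse_coo has experimental support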
215 | 
216 | 
217 | 
218 | ## **Creating tensors**
219 | 
220 | - Direct creation
221 | 
222 | torch.tensor(data, dtype=None, device=None,requires_grad=False)
223 | 
224 | data - can be a list, tuple, NumPy array, scalar or other types
225 | 
226 | dtype - the desired data type of the returned tensor
227 | 
228 | device - the device on which the tensor is returned
229 | 
230 | requires_grad - whether operations on the tensor are recorded in the graph; default False
231 | 
232 | Note that torch.tensor always copies data. To avoid the copy for an existing tensor, use torch.Tensor.detach(); for data coming from NumPy, use torch.from_numpy(), which shares memory
233 | 
234 | 
235 | 
236 | ```text
237 | >>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]])
238 | tensor([[ 0.1000,  1.2000],
239 |         [ 2.2000,  3.1000],
240 |         [ 4.9000,  5.2000]])
241 |  
242 | >>> torch.tensor([0, 1])  # Type inference on data
243 | tensor([ 0,  1])
244 |  
245 | >>> torch.tensor([[0.11111, 0.222222, 0.3333333]],
246 |                  dtype=torch.float64,
247 |                  device=torch.device('cuda:0'))  # creates a torch.cuda.DoubleTensor
248 | tensor([[ 0.1111,  0.2222,  0.3333]], dtype=torch.float64, device='cuda:0')
249 |  
250 | >>> torch.tensor(3.14159)  # Create a scalar (zero-dimensional tensor)
251 | tensor(3.1416)
252 |  
253 | >>> torch.tensor([])  # Create an empty tensor (of size (0,))
254 | tensor([])
255 | ```
256 | 
257 | 
258 | 
259 | - From NumPy data
260 | 
261 | torch.from_numpy(ndarray)
262 | 
263 | Note: the returned tensor shares its data with the ndarray, so any modification of the tensor affects the ndarray,
264 | and vice versa
265 | 
266 | ```text
267 | >>> a = numpy.array([1, 2, 3])
268 | >>> t = torch.from_numpy(a)
269 | >>> t
270 | tensor([ 1,  2,  3])
271 | >>> t[0] = -1
272 | >>> a
273 | array([-1,  2,  3])
274 | ```
275 | 
276 | 
277 | 
278 | - Creating specific tensors
279 | 
280 | By value:
281 | 
282 | ```text
283 | torch.zeros(*sizes, out=None, ..)  # returns an all-zeros tensor of size sizes
284 | 
285 | torch.zeros_like(input, ..)  # returns an all-zeros tensor with the same size as input
286 | 
287 | torch.ones(*sizes, out=None, ..)  # returns an all-ones tensor of size sizes
288 | 
289 | torch.ones_like(input, ..)  # returns an all-ones tensor with the same size as input
290 | 
291 | torch.full(size, fill_value, …)  # returns a tensor of size size with every element equal to fill_value
292 | 
293 | torch.full_like(input, fill_value, …)  # returns a tensor with the same size as input, with every element equal to fill_value
294 | 
295 | torch.arange(start=0, end, step=1, …)  # returns a 1-d tensor from start to end (exclusive) with step size step
296 | 
297 | torch.linspace(start, end, steps=100, …)  # returns a 1-d tensor of steps evenly spaced points from start to end
298 | 
299 | torch.logspace(start, end, steps=100, …)  # returns a 1-d tensor of steps points spaced logarithmically from 10^start to 10^end
300 | ```
301 | 
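302 | By matrix type:
303 | 
304 | ```text
305 | torch.eye(n, m=None, out=None,…)  # returns a 2-D matrix with ones on the diagonal and zeros elsewhere
306 | 
307 | torch.empty(*sizes, out=None, …)  # returns a tensor of size sizes filled with uninitialized values
308 | 
309 | torch.empty_like(input, …)  # returns a tensor with the same size as input, filled with uninitialized values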
310 | ```
311 | 
312 | 
313 | 
314 | - *Random sampling:*
315 | 
316 | ```text
317 | torch.normal(mean, std, out=None)
318 | 
319 | torch.rand(*size, out=None, dtype=None, …)  # returns uniformly distributed random values in [0, 1)
320 | 
321 | torch.rand_like(input, dtype=None, …)  # returns a tensor with the same size as input, filled with uniformly distributed random values
322 | 
323 | torch.randint(low=0, high, size,…)  # returns uniformly distributed random integers in [low, high)
324 | 
325 | torch.randint_like(input, low=0, high, dtype=None, …)  # same as randint, with the size of input
326 | 
327 | torch.randn(*sizes, out=None, …)  # returns a tensor of size sizes drawn from the normal distribution with mean 0 and variance 1
328 | 
329 | torch.randn_like(input, dtype=None, …)
330 | 
331 | torch.randperm(n, out=None, dtype=torch.int64)  # returns a random permutation of the integers 0 to n-1
332 | ```
333 | 
334 | 
335 | 
336 | 
337 | 
338 | 
339 | 
340 | ## **Manipulating tensors**
341 | 
342 | Basic operations:
343 | 
344 | Joining ops:
345 | 
346 | ```text
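347 | torch.cat(seq,dim=0,out=None) # concatenates the tensors in seq along dim; all tensors must have the same size (except along dim) or be empty. The inverse operations are torch.split() and torch.chunk()
348 | torch.stack(seq, dim=0, out=None) # joins the tensors in seq along a new dimension; all tensors must have the same size
349 | 
350 | # Note: the difference between .cat and .stack is that cat extends an existing dimension, which can be read as appending, whereas stack adds a new dimension, which can be
351 | # read as layering
352 | >>> a=torch.Tensor([1,2,3])
353 | >>> torch.stack((a,a)).size()
354 | torch.Size([2, 3])
355 | >>> torch.cat((a,a)).size()
356 | torch.Size([6])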
357 | ```
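
The difference between `cat` and `stack` can be sketched on two 1-D tensors; the tensors `a` and `b` and the result names are illustrative.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

joined = torch.cat((a, b))                  # extends the existing dimension: shape (6,)
stacked = torch.stack((a, b))               # adds a new leading dimension: shape (2, 3)
stacked_cols = torch.stack((a, b), dim=1)   # new dimension at position 1: shape (3, 2)

print(joined.shape, stacked.shape, stacked_cols.shape)
```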
358 | 
359 | 
360 | 
361 | ```text
362 | torch.gather(input, dim, index, out=None) # returns a new tensor gathered along dim
363 | >> t = torch.Tensor([[1,2],[3,4]])
364 | >> index = torch.LongTensor([[0,0],[1,0]])
365 | >> torch.gather(t, 0, index) # since dim=0, the result is
366 | | t[index[0, 0], 0]   t[index[0, 1], 1] |
367 | | t[index[1, 0], 0]   t[index[1, 1], 1] |
368 | 
369 | For a 3-D tensor, this reads
370 | 
371 | out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
372 | out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
373 | out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2
374 | ```
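
The indexing rule above can be traced on the same 2x2 example, here written with `torch.tensor` and integer values; `rows` and `cols` are illustrative names.

```python
import torch

t = torch.tensor([[1, 2], [3, 4]])
index = torch.tensor([[0, 0], [1, 0]])

rows = torch.gather(t, 0, index)   # out[i][j] = t[index[i][j]][j]
cols = torch.gather(t, 1, index)   # out[i][j] = t[i][index[i][j]]

print(rows)   # tensor([[1, 2], [3, 2]])
print(cols)   # tensor([[1, 1], [4, 3]])
```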
375 | 
376 | 
377 | 
378 | Slicing ops:
379 | 
380 | ```text
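381 | torch.split(tensor, split_size_or_sections, dim=0) # splits the tensor into chunks
382 | torch.chunk(tensor, chunks, dim=0) # splits the tensor into chunks; the last chunk is smaller if the size is not evenly divisible
383 | 
384 | # Note: the difference between split and chunk:
385 | # for split, split_size_or_sections is the size of each chunk, while for chunk, chunks is the number of chunks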
386 | >>> a = torch.Tensor([1,2,3])
387 | >>> torch.split(a,1)
388 | (tensor([1.]), tensor([2.]), tensor([3.]))
389 | >>> torch.chunk(a,1)
390 | (tensor([ 1., 2., 3.]),)
391 | ```
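
A short sketch of the split/chunk distinction on a 7-element tensor; the tensor `a` and the result names are illustrative.

```python
import torch

a = torch.arange(7)

by_size = torch.split(a, 3)            # chunk size 3 -> sizes 3, 3, 1
by_sections = torch.split(a, [2, 5])   # explicit sizes per chunk
by_count = torch.chunk(a, 3)           # 3 chunks -> sizes 3, 3, 1

print([len(p) for p in by_size])
print([len(p) for p in by_sections])
print([len(p) for p in by_count])
```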
392 | 
393 | 
394 | 
395 | Indexing ops:
396 | 
397 | ```text
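398 | torch.index_select(input, dim, index, out=None) # returns the entries of input selected along dim; index must be a LongTensor; the result does not share memory with input
399 | 
400 | torch.masked_select(input, mask, out=None) # returns the values of input selected by mask as a 1-D tensor. mask is a ByteTensor (a BoolTensor in recent versions): true entries are returned, false entries are not; the result does not share memory with input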
401 | >>> x = torch.randn(3, 4)
402 | >>> x
403 | tensor([[ 0.3552, -2.3825, -0.8297,  0.3477],
404 |         [-1.2035,  1.2252,  0.5002,  0.6248],
405 |         [ 0.1307, -2.0608,  0.1244,  2.0139]])
406 | >>> mask = x.ge(0.5)
407 | >>> mask
408 | tensor([[ 0,  0,  0,  0],
409 |         [ 0,  1,  1,  1],
410 |         [ 0,  0,  0,  1]], dtype=torch.uint8)
411 | >>> torch.masked_select(x, mask)
412 | tensor([ 1.2252,  0.5002,  0.6248,  2.0139])
413 | ```
414 | 
415 | 
416 | 
417 | 
418 | 
419 | Mutation ops:
420 | 
421 | ```text
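422 | torch.transpose(input, dim0, dim1, out=None) # returns the tensor with dim0 and dim1 swapped
423 | torch.t(input, out=None) # transpose for 2-D matrices; a convenience function for transpose
424 | 
425 | torch.squeeze(input, dim, out=None)  # by default removes all dimensions of size 1; when dim is given, removes only that dimension if its size is 1. The returned tensor shares storage with input, so changing one changes the other
426 | torch.unsqueeze(input, dim, out=None) # inserts a dimension of size 1 at dim, e.g. A x B becomes 1 x A x B
427 | 
428 | torch.reshape(input, shape) # returns a tensor of size shape with the same values; in a shape such as (-1,), -1 means the size is inferred
429 | # Note on reshape(-1,)
430 | >>> a=torch.Tensor([1,2,3,4,5]) # a.size() is torch.Size([5])
431 | >>> b=a.reshape(1,-1)  # the first dimension is 1; the second is inferred from the size of a
432 | >>> b.size()
433 | torch.Size([1, 5])
434 | 
435 | torch.where(condition,x,y) # picks between x and y element-wise according to condition: x where true, y where false, forming a new tensor
436 | 
437 | torch.unbind(tensor, dim=0) # returns a tuple with the given dim removed, i.e. splits along dim
438 | >>> a=torch.Tensor([[1,2,3],[2,3,4]])
439 | >>> torch.unbind(a,dim=0)
440 | (tensor([1., 2., 3.]), tensor([2., 3., 4.])) # one (2,3) tensor becomes two tensors of shape (3)
441 | 
442 | torch.nonzero(input, out=None) # returns the indices of the nonzero values; each row is the index of one nonzero value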
443 | >>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]))
444 | tensor([[ 0],
445 |         [ 1],
446 |         [ 2],
447 |         [ 4]])
448 | >>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
449 |                                 [0.0, 0.4, 0.0, 0.0],
450 |                                 [0.0, 0.0, 1.2, 0.0],
451 |                                 [0.0, 0.0, 0.0,-0.4]]))
452 | tensor([[ 0,  0],
453 |         [ 1,  1],
454 |         [ 2,  2],
455 |         [ 3,  3]])
456 | ```
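
A few of the shape-changing ops above, sketched on small tensors; all tensor names and values are illustrative assumptions.

```python
import torch

a = torch.zeros(1, 3, 1)
all_squeezed = a.squeeze()             # removes every size-1 dimension: shape (3,)
first_squeezed = a.squeeze(0)          # removes only dim 0: shape (3, 1)
expanded = torch.unsqueeze(a, 0)       # adds a leading dimension: shape (1, 1, 3, 1)

# squeeze returns a view, so writing through it changes the original tensor
all_squeezed[0] = 7.0

x = torch.tensor([1.0, -2.0, 3.0])
clipped = torch.where(x > 0, x, torch.zeros_like(x))   # keep positives, replace the rest with 0

print(all_squeezed.shape, first_squeezed.shape, expanded.shape)
print(a[0, 0, 0].item(), clipped)
```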
457 | 
458 | 
459 | 
460 | ## **Tensor operations**
461 | 
462 | - Point-wise operations
463 | 
464 | Trigonometric functions:
465 | 
466 | ```text
467 | torch.abs(input, out=None)
468 | torch.acos(input, out=None)
469 | torch.asin(input, out=None)
470 | torch.atan(input, out=None)
471 | torch.atan2(input, other, out=None) 
472 | torch.cos(input, out=None)
473 | torch.cosh(input, out=None)
474 | torch.sin(input, out=None)
475 | torch.sinh(input, out=None)
476 | torch.tan(input, out=None)
477 | torch.tanh(input, out=None)
478 | ```
479 | 
480 | 
481 | 
482 | Basic arithmetic (addition, subtraction, multiplication, division):
483 | 
484 | ```text
485 | torch.add(input, value, out=None)
486 |           .add(input, value=1, other, out=None)
487 |           .addcdiv(tensor, value=1, tensor1, tensor2, out=None)
488 |           .addcmul(tensor, value=1, tensor1, tensor2, out=None)
489 | torch.div(input, value, out=None)
490 |          .div(input, other, out=None)
491 | torch.mul(input, value, out=None)
492 |         .mul(input, other, out=None)
493 | ```
494 | 
495 | 
496 | 
497 | Logarithms:
498 | 
499 | ```text
500 | torch.log(input, out=None)  # y_i=log_e(x_i)
501 | torch.log1p(input, out=None)  #y_i=log_e(x_i+1)
502 | torch.log2(input, out=None)   #y_i=log_2(x_i)
503 | torch.log10(input,out=None)  #y_i=log_10(x_i)
504 | ```
505 | 
506 | 
507 | 
508 | Power function:
509 | 
510 | ```text
511 | torch.pow(input, exponent, out=None)  # y_i=input^(exponent)
512 | ```
513 | 
514 | 
515 | 
516 | Exponentials:
517 | 
518 | ```text
519 | torch.exp(tensor, out=None)    #y_i=e^(x_i)
520 | torch.expm1(tensor, out=None)   #y_i=e^(x_i) -1
521 | ```
522 | 
523 | 
524 | 
525 | Rounding and truncation functions:
526 | 
527 | ```text
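528 | torch.ceil(input, out=None)   # smallest integer greater than or equal to each element (rounds toward +inf)
529 | torch.floor(input, out=None)  # largest integer less than or equal to each element (rounds toward -inf)
530 | 
531 | torch.round(input, out=None)  # nearest integer; ties (x.5) are rounded to the nearest even integer
532 | 
533 | torch.trunc(input, out=None)  # integer part (the fraction is discarded)
534 | torch.frac(tensor, out=None)  # fractional part
535 | 
536 | torch.fmod(input, divisor, out=None)  # remainder of input/divisor; the result has the sign of input (the dividend)
537 | torch.remainder(input, divisor, out=None)  # remainder of input/divisor; the result has the sign of divisor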
538 | ```
539 | 
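`fmod` and `remainder` only differ for negative operands, and `round` resolves ties toward even integers, which is easy to miss. A small sketch (the input values are chosen purely for illustration):

```python
import torch

x = torch.tensor([-3.0, 3.0])
fmod_result = torch.fmod(x, 2)            # sign follows the dividend x
remainder_result = torch.remainder(x, 2)  # sign follows the divisor 2

y = torch.tensor([-1.7, 0.5, 1.5, 2.5])
rounded = torch.round(y)     # ties go to the nearest even integer
truncated = torch.trunc(y)   # drops the fractional part (rounds toward zero)

print(fmod_result, remainder_result)
print(rounded, truncated)
```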
540 | 
541 | 
542 | Other operations:
543 | 
544 | ```text
545 | torch.erf(tensor, out=None)
546 | 
547 | torch.erfinv(tensor, out=None)
548 | 
549 | torch.sigmoid(input, out=None)
550 | 
551 | torch.clamp(input, min, max, out=None)  # elements below min become min, elements above max become max, the rest are unchanged
552 | 
553 | torch.neg(input, out=None) # out_i = -1 * input_i
554 | 
555 | torch.reciprocal(input, out=None)  # out_i = 1 / input_i
556 | 
557 | torch.sqrt(input, out=None)  # out_i = sqrt(input_i)
558 | torch.rsqrt(input, out=None) # out_i = 1 / sqrt(input_i)
559 | 
560 | torch.sign(input, out=None)  # out_i = sign(input_i): 1 if positive, -1 if negative, 0 if zero
561 | 
562 | torch.lerp(start, end, weight, out=None)  # out_i = start_i + weight * (end_i - start_i)
563 | ```
564 | 
565 | 
566 | 
567 | - Reduction operations
568 | 
569 | ```text
570 | torch.argmax(input, dim=None, keepdim=False) # index of the maximum value
571 | torch.argmin(input, dim=None, keepdim=False)  # index of the minimum value
572 | 
573 | torch.cumprod(input, dim, out=None)  #y_i=x_1 * x_2 * x_3 *…* x_i
574 | torch.cumsum(input, dim, out=None)  #y_i=x_1 + x_2 + … + x_i
575 | 
576 | torch.dist(input, other, p=2)     # p-norm of (input - other)
577 | torch.mean()                      # mean
578 | torch.sum()                       # sum
579 | torch.median(input)               # median
580 | torch.mode(input)                 # mode (most frequent value)
581 | torch.unique(input, sorted=False) # returns a 1-D tensor of the unique values, each appearing once
582 | >>> output = torch.unique(torch.tensor([1, 3, 2, 3], dtype=torch.long))
583 | >>> output
584 | tensor([ 2,  3,  1])
585 | 
586 | torch.std()  # standard deviation
587 | torch.var()  # variance
588 | 
589 | torch.norm(input, p=2) # p-norm
590 | torch.prod(input, dim, keepdim=False) # product of the elements along dim
591 | ```
592 | 
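The effect of `dim` and `keepdim` on the reductions above is easiest to see on a 2 x 3 example. A minimal sketch with illustrative values:

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

total = x.sum()                            # reduce over everything: a 0-dim tensor
col_sum = x.sum(dim=0)                     # collapse the rows: shape (3,)
row_mean = x.mean(dim=1, keepdim=True)     # keepdim keeps the reduced axis: shape (2, 1)

running_sum = torch.cumsum(x, dim=1)       # y_i = x_1 + ... + x_i along each row
running_prod = torch.cumprod(x, dim=1)     # y_i = x_1 * ... * x_i along each row
row_argmax = torch.argmax(x, dim=1)        # index of the largest element in each row
row_prod = torch.prod(x, dim=1)            # product of each row

print(total.item(), col_sum, row_mean.shape)
print(running_sum, running_prod, row_argmax, row_prod)
```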
593 | 
594 | 
595 | - Comparison operations:
596 | 
597 | ```text
598 | torch.eq(input, other, out=None)  # elementwise equality; 1 where the elements are equal
599 | torch.equal(tensor1, tensor2) # True if tensor1 and tensor2 have the same size and elements
600 | >>> torch.eq(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]]))
601 | tensor([[ 1,  0],
602 |         [ 0,  1]], dtype=torch.uint8)
606 | 
607 | torch.ge(input, other, out=None)   # input >= other
608 | torch.gt(input, other, out=None)   # input > other
609 | torch.le(input, other, out=None)    # input <= other
610 | torch.lt(input, other, out=None)    # input < other
611 | torch.ne(input, other, out=None)  # input != other
612 | 
613 | torch.max()                        # maximum value
614 | torch.min()                        # minimum value
615 | torch.isnan(tensor) # tests each element for NaN
616 | torch.sort(input, dim=None, descending=False, out=None) # sorts input along dim
617 | torch.topk(input, k, dim=None, largest=True, sorted=True, out=None)  # the k largest values along dim and their indices
618 | torch.kthvalue(input, k, dim=None, keepdim=False, out=None) # the k-th smallest value along dim and its index
619 | ```
620 | 
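`topk` and `kthvalue` are easy to confuse: the first returns several values (the k largest by default), the second a single value (the k-th smallest). A short sketch with made-up data:

```python
import torch

x = torch.tensor([5.0, 1.0, 4.0, 2.0, 3.0])

top_values, top_indices = torch.topk(x, k=2)        # the two largest values and where they sit
kth_value, kth_index = torch.kthvalue(x, k=2)       # the 2nd smallest value and its index
sorted_values, order = torch.sort(x, descending=True)

print(top_values, top_indices)
print(kth_value, kth_index)
print(sorted_values, order)
```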
621 | 
622 | 
623 | - Spectral operations
624 | 
625 | ```text
626 | torch.fft(input, signal_ndim, normalized=False)
627 | torch.ifft(input, signal_ndim, normalized=False)
628 | torch.rfft(input, signal_ndim, normalized=False, onesided=True)
629 | torch.irfft(input, signal_ndim, normalized=False, onesided=True)
630 | torch.stft(signal, frame_length, hop, …)
631 | ```
632 | 
633 | 
634 | 
635 | - Other operations:
636 | 
637 | ```text
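638 | torch.cross(input, other, dim=-1, out=None)  # cross product
639 | 
640 | torch.dot(tensor1, tensor2)  # dot product of tensor1 and tensor2
641 | 
642 | torch.mm(mat1, mat2, out=None) # matrix product of mat1 and mat2
643 | 
644 | torch.eig(a, eigenvectors=False, out=None) # eigenvalues (and optionally eigenvectors) of the square matrix a
645 | 
646 | torch.det(A)  # determinant of the matrix A
647 | 
648 | torch.trace(input) # trace of a 2-D matrix (sum of the diagonal elements)
649 | 
650 | torch.diag(input, diagonal=0, out=None) # 1-D input: builds a 2-D matrix with input on the diagonal; 2-D input: extracts the diagonal
651 | 
652 | torch.histc(input, bins=100, min=0, max=0, out=None) # histogram of input
653 | 
654 | torch.tril(input, diagonal=0, out=None)  # lower triangular part of the matrix; the other elements are set to 0
655 | 
656 | torch.triu(input, diagonal=0, out=None) # upper triangular part of the matrix; the other elements are set to 0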
657 | ```
658 | 
659 | 
660 | 
661 | ## Tips:
662 | 
663 | - Getting a Python number:
664 | 
665 | Since PyTorch 0.4, a Python number is obtained from a one-element tensor with `.item()`:
666 | 
667 | ```text
668 | >>> a = torch.Tensor([1,2,3])
669 | >>> a[0]   # indexing returns a tensor
670 | tensor(1.)
671 | >>> a[0].item()  # get the Python number
672 | 1.0
673 | ```
674 | 
675 | 
676 | 
677 | - Tensor settings
678 | 
679 | Type checks:
680 | 
681 | ```text
682 | torch.is_tensor()  # True if the object is a PyTorch tensor
683 | torch.is_storage() # True if the object is a PyTorch storage
684 | ```
685 | 
686 | 
687 | 
688 | A small trick: to check whether a tensor is empty, you can do the following
689 | 
690 | ```text
691 | >>> a=torch.Tensor()
692 | >>> len(a)
693 | 0
694 | >>> len(a) == 0
695 | True
696 | ```
697 | 
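Note that `len()` only looks at the first dimension, so it can miss a tensor that has no elements because a later dimension is zero. Checking `numel()` is one way around that; the sketch below is an illustration, not part of the original notes.

```python
import torch

a = torch.Tensor()        # shape (0,)
b = torch.zeros(3, 0)     # shape (3, 0): three rows, but no elements at all

len_a, len_b = len(a), len(b)            # len() is the size of the first dimension
empty_a = a.numel() == 0                 # numel() counts all elements
empty_b = b.numel() == 0

print(len_a, len_b, empty_a, empty_b)
```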
698 | 
699 | 
700 | Settings: built-in functions let you configure the default precision and type of tensors, the print options, and so on
701 | 
702 | ```text
703 | torch.set_default_dtype(d)  # sets the default floating-point dtype used by torch.tensor()
704 | 
705 | torch.set_default_tensor_type() # similar: sets the default tensor type used by torch.tensor()
706 | >>> torch.tensor([1.2, 3]).dtype           # initial default for floating point is torch.float32
707 | torch.float32
708 | >>> torch.set_default_dtype(torch.float64)
709 | >>> torch.tensor([1.2, 3]).dtype           # a new floating point tensor
710 | torch.float64
711 | >>> torch.set_default_tensor_type(torch.DoubleTensor)
712 | >>> torch.tensor([1.2, 3]).dtype    # a new floating point tensor
713 | torch.float64
714 | 
715 | torch.get_default_dtype() # returns the current default floating-point torch.dtype
716 | 
717 | torch.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, profile=None)
718 | # sets the options used when printing tensors
719 | ```


--------------------------------------------------------------------------------
/8pytorch优化函数学习率衰减.md:
--------------------------------------------------------------------------------
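  1 | PyTorch provides ten optimizers; this note goes through each of them.
  2 | 
  3 | # torch.optim
  4 | 
  5 | `torch.optim` is a package implementing various optimization algorithms. Most commonly used methods are supported, and the interface is general enough that more sophisticated methods can be integrated in the future.
  6 | 
  7 | ## How to use an optimizer
  8 | 
  9 | To use `torch.optim`, you construct an optimizer object that holds the current state and updates the parameters based on the computed gradients.
 10 | 
 11 | ### Constructing it
 12 | 
 13 | To construct an `Optimizer`, you give it an iterable containing the parameters to optimize (all of them should be `Variable` objects; since PyTorch 0.4 these are ordinary tensors with `requires_grad=True`). You can then specify optimizer options such as the learning rate, weight decay, and so on.
 14 | 
 15 | Example: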
 16 | 
 17 | ```
 18 | optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum=0.9)
 19 | optimizer = optim.Adam([var1, var2], lr = 0.0001)
 20 | ```
 21 | 
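 22 | ### Per-parameter options
 23 | 
 24 | `Optimizer` also supports per-parameter options. To use them, pass an iterable of `dict`s instead of an iterable of `Variable`s. Each dict defines a separate parameter group and must contain a `params` key holding the list of parameters in that group. The other keys should match keyword arguments accepted by the optimizer and are used as the optimization options for that group.
 25 | 
 26 | **Note:**
 27 | 
 28 | You can still pass options as keyword arguments. They are used as defaults in the groups that do not override them. This is useful when you only want to change an option for one group while keeping it the same for the others.
 29 | 
 30 | For example, this is handy when specifying per-layer learning rates: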
 31 | 
 32 | ```python
 33 | optim.SGD([
 34 |                 {'params': model.base.parameters()},
 35 |                 {'params': model.classifier.parameters(), 'lr': 1e-3}
 36 |             ], lr=1e-2, momentum=0.9)
 37 | ```
 38 | 
 39 | This means the parameters of `model.base` use a learning rate of `1e-2`, the parameters of `model.classifier` use `1e-3`, and a momentum of `0.9` is used for all parameters.
 40 | 
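The fallback to the keyword defaults can be checked by inspecting `optimizer.param_groups`. The sketch below assumes a hypothetical `TinyNet` with `base` and `classifier` submodules; that class is only a stand-in for the `model` used in the notes.

```python
import torch
from torch import nn, optim


class TinyNet(nn.Module):
    """Hypothetical model with the two submodules referenced above."""

    def __init__(self):
        super().__init__()
        self.base = nn.Linear(4, 8)
        self.classifier = nn.Linear(8, 2)

    def forward(self, x):
        return self.classifier(torch.relu(self.base(x)))


model = TinyNet()
optimizer = optim.SGD(
    [
        {"params": model.base.parameters()},                     # falls back to lr=1e-2
        {"params": model.classifier.parameters(), "lr": 1e-3},   # overrides lr only
    ],
    lr=1e-2,
    momentum=0.9,
)

# Each group ends up with its own lr, while momentum comes from the shared default.
group_settings = [(g["lr"], g["momentum"]) for g in optimizer.param_groups]
print(group_settings)
```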
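 41 | ### Taking an optimization step
 42 | 
 43 | All optimizers implement a `step()` method that updates the parameters. It can be used in two ways:
 44 | 
 45 | **optimizer.step()**
 46 | 
 47 | This is the simplified version supported by most optimizers. It can be called once the gradients have been computed, e.g. by `backward()`.
 48 | 
 49 | Example: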
 50 | 
 51 | ```python
 52 | for input, target in dataset:
 53 |     optimizer.zero_grad()
 54 |     output = model(input)
 55 |     loss = loss_fn(output, target)
 56 |     loss.backward()
 57 |     optimizer.step()
 58 | ```
 59 | 
 60 | **optimizer.step(closure)**
 61 | 
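 62 | Some optimization algorithms, such as Conjugate Gradient and LBFGS, need to re-evaluate the function multiple times, so you have to pass in a closure that lets them recompute your model. The closure should clear the gradients, compute the loss, and return it.
 63 | 
 64 | Example: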
 65 | 
 66 | ```python
 67 | for input, target in dataset:
 68 |     def closure():
 69 |         optimizer.zero_grad()
 70 |         output = model(input)
 71 |         loss = loss_fn(output, target)
 72 |         loss.backward()
 73 |         return loss
 74 |     optimizer.step(closure)
 75 | ```
 76 | 
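A self-contained version of the closure pattern, using LBFGS on a toy quadratic. This is a sketch under stated assumptions: the target vector and the starting point are invented for illustration, and there is no model or dataset.

```python
import torch
from torch import optim

target = torch.tensor([1.0, 2.0])                  # illustrative optimum
w = torch.tensor([3.0, -2.0], requires_grad=True)  # parameter to optimize
optimizer = optim.LBFGS([w], lr=1.0, max_iter=20)


def closure():
    # LBFGS may call this several times per step, so it must redo the full
    # zero_grad / forward / backward cycle and return the loss.
    optimizer.zero_grad()
    loss = ((w - target) ** 2).sum()
    loss.backward()
    return loss


for _ in range(5):
    optimizer.step(closure)

print(w.detach())
```

On this convex problem `w` should end up very close to `target`; on a real model the closure would wrap the forward pass and the loss function instead.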
 77 | 
 78 | 
 79 | # class torch.optim.Optimizer(params, defaults) 
 80 | 
 81 | ==Base class for all optimizers.==
 82 | 
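 83 | **Parameters:**
 84 | 
 85 | - params (iterable) – an iterable of `Variable`s or `dict`s. Specifies which parameters should be optimized.
 86 | - defaults (dict) – a dict containing default values of the optimization options (used when a parameter group does not specify them).
 87 | 
 88 | #### load_state_dict(state_dict) 
 89 | 
 90 | Loads the optimizer state.
 91 | 
 92 | **Parameters:**
 93 | 
 94 | state_dict (`dict`) – the optimizer state. Should be an object returned from a call to `state_dict()`.
 95 | 
 96 | #### state_dict() 
 97 | 
 98 | Returns the state of the optimizer as a `dict`.
 99 | 
100 | It contains two entries:
101 | 
102 | - state - a dict holding the current optimization state. Its content differs between optimizer classes.
103 | - param_groups - a list containing all parameter groups, each of which is a dict.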
104 | 
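A minimal sketch of round-tripping optimizer state through `state_dict()` and `load_state_dict()`. The tiny linear model and the learning rates are assumptions chosen only to make the effect visible.

```python
import torch
from torch import nn, optim

model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take one step so that the momentum buffers exist in the optimizer state.
model(torch.ones(2, 3)).sum().backward()
optimizer.step()

state = optimizer.state_dict()
print(sorted(state.keys()))

# A second optimizer built with a different lr picks up the saved settings.
restored = optim.SGD(model.parameters(), lr=0.5)
restored.load_state_dict(state)
restored_lr = restored.param_groups[0]["lr"]
restored_momentum = restored.param_groups[0]["momentum"]
print(restored_lr, restored_momentum)
```

When checkpointing, this dict is usually saved alongside the model's own `state_dict` so that training can resume with the same momentum buffers and hyperparameters.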
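105 | #### step(closure) 
106 | 
107 | Performs a single optimization step (parameter update).
108 | 
109 | **Parameters:**
110 | 
111 | - closure (`callable`) – a closure that re-evaluates the model and returns the loss. Optional for most optimizers.
112 | 
113 | #### zero_grad()
114 | 
115 | Clears the gradients of all optimized `Variable`s.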
116 | 
117 | 
118 | 
119 | # 1 torch.optim.SGD
120 | ```python
121 | class torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)
122 | ```
123 | 
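124 | Description:
125 | Implements SGD, SGD with momentum, and SGD with NAG (Nesterov accelerated gradient) momentum; all variants can include a weight_decay term.
126 | 
127 | ## Parameters:
128 | 
129 | - params (iterable) – an iterable of parameters to optimize, or of dicts defining parameter groups
130 | - lr (`float`) – learning rate
131 | - momentum (`float`, optional) – momentum factor (default: 0)
132 | - weight_decay (`float`, optional) – weight decay (L2 penalty) (default: 0)
133 | - dampening (`float`, optional) – dampening for momentum (default: 0). In the source it is used as buf.mul_(momentum).add_(1 - dampening, d_p). Note that when nesterov is used, dampening must be 0.
134 | - nesterov (`bool`, optional) – enables Nesterov momentum (default: False)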
135 | 
136 | 
137 | 
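138 | Caveat:
139 | When using SGD in PyTorch, note that the update rule differs slightly from some other frameworks.
140 | In PyTorch it is:
141 | $v = \rho v + g \\
142 | p = p - lr \cdot v = p - lr \cdot \rho v - lr \cdot g$
143 | In other frameworks:
144 | $v = \rho v + lr \cdot g \\
145 | p = p - v = p - \rho v - lr \cdot g$
146 | Here ρ is the momentum, v the velocity, g the gradient, and p the parameters. The difference lies in the ρ∗v term, which PyTorch also multiplies by the learning rate.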
147 | 
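The PyTorch form of the update can be checked numerically against `optim.SGD`. A sketch on the toy loss $p^2$ (so $g = 2p$); the values of `lr` and `rho` are arbitrary choices for illustration.

```python
import torch
from torch import optim

lr, rho = 0.1, 0.9
p = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([p], lr=lr, momentum=rho)

p_manual, v = 1.0, 0.0   # hand-rolled copy of the parameter and its velocity
for _ in range(3):
    # PyTorch update
    optimizer.zero_grad()
    loss = (p ** 2).sum()
    loss.backward()
    optimizer.step()

    # Manual update with the rule above: v = rho * v + g, p = p - lr * v
    g = 2.0 * p_manual
    v = rho * v + g
    p_manual = p_manual - lr * v

print(p.item(), p_manual)
```

The two trajectories should agree up to float32 rounding, which confirms that the learning rate multiplies the whole velocity rather than only the gradient.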
148 | ## Hand-written SGD
149 | 
150 | ```python
151 | def sgd_update(parameters, lr):
152 |     for param in parameters:
153 |         param.data = param.data - lr * param.grad.data
154 |         
155 | def sgd_momentum(parameters, vs, lr, gamma):
156 |     for param, v in zip(parameters, vs):
157 |         v[:] = gamma * v + lr * param.grad.data
158 |         param.data = param.data - v
159 |         
160 | loss.backward()
161 | sgd_momentum(net.parameters(), vs, 1e-2, 0.9) # momentum 0.9, learning rate 0.01
162 | ```
163 | 
164 | 
165 | 
166 | # 2 torch.optim.ASGD
167 | ```
168 | class torch.optim.ASGD(params, lr=0.01, lambd=0.0001, alpha=0.75, t0=1000000.0, weight_decay=0)
169 | ```
170 | 
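171 | Description:
172 | ASGD stands for Averaged Stochastic Gradient Descent (these notes also call it SAG, although Stochastic Average Gradient usually names a different method). Put simply, ASGD is a variant of SGD that trades memory for time. For details see the paper: http://riejohnson.com/rie/stograd_nips.pdf
173 | 
174 | ## **Parameters:**
175 | 
176 | - params (iterable) – an iterable of parameters to optimize, or of dicts defining parameter groups
177 | - lr (`float`, optional) – initial learning rate (default: 1e-2); it can be adjusted as training proceeds
178 | - lambd (`float`, optional) – decay term (default: 1e-4)
179 | - alpha (`float`, optional) – power for the eta update (default: 0.75)
180 | - t0 (`float`, optional) – point at which to start averaging (default: 1e6)
181 | - weight_decay (`float`, optional) – weight decay (L2 penalty) (default: 0)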
182 | 
183 | # 3 torch.optim.Rprop
184 | ```
185 | class torch.optim.Rprop(params, lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))
186 | ```
187 | 
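188 | Description:
189 | Implements the Rprop (resilient backpropagation) method. Original paper: Martin Riedmiller und Heinrich Braun: Rprop - A Fast Adaptive Learning Algorithm. Proceedings of the International Symposium on Computer and Information Science VII, 1992.
190 | 
191 | ## **Parameters:**
192 | 
193 | - params (iterable) – an iterable of parameters to optimize, or of dicts defining parameter groups
194 | - lr (`float`, optional) – learning rate (default: 1e-2)
195 | - etas (Tuple[`float`, `float`], optional) – a pair (etaminus, etaplus) of multiplicative decrease and increase factors (default: 0.5, 1.2)
196 | - step_sizes (Tuple[`float`, `float`], optional) – a pair of minimal and maximal allowed step sizes (default: 1e-6, 50)
197 | 
198 | 
199 | 
200 | This method suits full-batch training and is not suitable for mini-batches, so it is rarely seen now that mini-batch training dominates.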
201 | 
202 | # 4 torch.optim.Adagrad
203 | ```python
204 | class torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0)
205 | ```
206 | 
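207 | Description:
208 | Implements Adagrad (Adaptive Gradient). Adagrad is an adaptive method that assigns each parameter its own learning rate. That rate depends on the gradient magnitudes and the number of iterations: the larger the accumulated gradients, the smaller the rate, and vice versa. Its drawback is that the learning rate becomes too small late in training, because Adagrad accumulates all past squared gradients in the denominator.
209 | 
210 | ## **Parameters:**
211 | 
212 | - params (iterable) – an iterable of parameters to optimize, or of dicts defining parameter groups
213 | - lr (`float`, optional) – learning rate (default: 1e-2)
214 | - lr_decay (`float`, optional) – learning rate decay (default: 0)
215 | - weight_decay (`float`, optional) – weight decay (L2 penalty) (default: 0)
216 | 
217 | #### step(closure=None) 
218 | 
219 | Performs a single optimization step (parameter update).
220 | 
221 | **Parameters:**
222 | 
223 | - closure (`callable`) – a closure that re-evaluates the model and returns the loss. Optional for most optimizers.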
224 | 
225 | ## Hand-written Adagrad
226 | 
227 | ```python
228 | def sgd_adagrad(parameters, sqrs, lr):
229 |     eps = 1e-10
230 |     for param, sqr in zip(parameters, sqrs):
231 |         sqr[:] = sqr + param.grad.data ** 2
232 |         div = lr / torch.sqrt(sqr + eps) * param.grad.data
233 |         param.data = param.data - div
234 |         
235 | # update the parameters inside the training loop
236 | sgd_adagrad(net.parameters(), sqrs, 1e-2) # learning rate 0.01
237 | ```
238 | 
239 | 
240 | 
241 | # 5 torch.optim.Adadelta
242 | ```
243 | class torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)
244 | ```
245 | 
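246 | Description:
247 | Implements Adadelta, an improvement over Adagrad. The denominator in Adadelta accumulates only terms close to the current time step (a running average), which avoids the learning rate becoming too small late in training.
248 | 
249 | ## **Parameters:**
250 | 
251 | - params (iterable) – an iterable of parameters to optimize, or of dicts defining parameter groups
252 | - rho (`float`, optional) – coefficient used for computing a running average of squared gradients (default: 0.9)
253 | - eps (`float`, optional) – term added to the denominator to improve numerical stability (default: 1e-6)
254 | - lr (`float`, optional) – coefficient that scales delta before it is applied to the parameters (default: 1.0)
255 | - weight_decay (`float`, optional) – weight decay (L2 penalty) (default: 0)
256 | 
257 | #### step(closure=None) 
258 | 
259 | Performs a single optimization step (parameter update).
260 | 
261 | **Parameters:**
262 | 
263 | - closure (`callable`) – a closure that re-evaluates the model and returns the loss. Optional for most optimizers.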
264 | 
265 | 
266 | 
267 | ## Hand-written Adadelta
268 | 
269 | ```python
270 | def adadelta(parameters, sqrs, deltas, rho):
271 |     eps = 1e-6
272 |     for param, sqr, delta in zip(parameters, sqrs, deltas):
273 |         sqr[:] = rho * sqr + (1 - rho) * param.grad.data ** 2
274 |         cur_delta = torch.sqrt(delta + eps) / torch.sqrt(sqr + eps) * param.grad.data
275 |         delta[:] = rho * delta + (1 - rho) * cur_delta ** 2
276 |         param.data = param.data - cur_delta
277 | 
278 | 
279 | # update the parameters inside the training loop
280 | adadelta(net.parameters(), sqrs, deltas, 0.9) # rho set to 0.9
281 | ```
282 | 
283 | 
284 | 
285 | 
286 | 
287 | # 6 torch.optim.RMSprop
288 | ```
289 | class torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
290 | ```
291 | 
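292 | Description:
293 | Implements RMSprop (proposed by Hinton); RMS stands for root mean square. Like Adadelta, RMSprop is an improvement over Adagrad. It divides by a root mean square of recent gradients, which mitigates Adagrad's rapidly decaying learning rate and also dampens oscillations.
294 | 
295 | ## **Parameters:**
296 | 
297 | - params (iterable) – an iterable of parameters to optimize, or of dicts defining parameter groups
298 | - lr (`float`, optional) – learning rate (default: 1e-2)
299 | - momentum (`float`, optional) – momentum factor (default: 0)
300 | - alpha (`float`, optional) – smoothing constant (default: 0.99)
301 | - eps (`float`, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
302 | - centered (`bool`, optional) – if True, computes the centered RMSProp, in which the gradient is normalized by an estimate of its variance
303 | - weight_decay (`float`, optional) – weight decay (L2 penalty) (default: 0)
304 | 
305 | 
306 | 
307 | ## Hand-written RMSprop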
308 | 
309 | ```python
310 | def rmsprop(parameters, sqrs, lr, alpha):
311 |     eps = 1e-10
312 |     for param, sqr in zip(parameters, sqrs):
313 |         sqr[:] = alpha * sqr + (1 - alpha) * param.grad.data ** 2
314 |         div = lr / torch.sqrt(sqr + eps) * param.grad.data
315 |         param.data = param.data - div
316 |         
317 |         
318 | loss.backward()
319 | rmsprop(net.parameters(), sqrs, 1e-3, 0.9) # learning rate 0.001, alpha 0.9
320 | ```
321 | 
322 | 
323 | 
324 | # 7 torch.optim.Adam(AMSGrad)
325 | ```
326 | class torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
327 | ```
328 | 
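329 | Description:
330 | Implements Adam (Adaptive Moment Estimation). Adam is an adaptive-learning-rate method that uses estimates of the first and second moments of the gradient to adjust the learning rate dynamically. As Prof. Wu said in class, Adam combines Momentum and RMSprop and adds bias correction.
331 | 
332 | ## **Parameters:**
333 | 
334 | - params (iterable) – an iterable of parameters to optimize, or of dicts defining parameter groups
335 | - lr (`float`, optional) – learning rate (default: 1e-3)
336 | - betas (Tuple[`float`, `float`], optional) – coefficients used for computing running averages of the gradient and its square (default: 0.9, 0.999)
337 | - eps (`float`, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
338 | - weight_decay (`float`, optional) – weight decay (L2 penalty) (default: 0)
339 | - amsgrad (`bool`, optional) – whether to use the AMSGrad variant (default: False). AMSGrad is an improvement to Adam that adds an extra constraint (it keeps the running maximum of the second-moment estimate) so that the effective learning rate never increases. (AMSGrad, one of the ICLR 2018 best papers, "On the Convergence of Adam and Beyond".)
340 | 
341 | ## Hand-written Adam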
342 | 
343 | ```python
344 | def adam(parameters, vs, sqrs, lr, t, beta1=0.9, beta2=0.999):
345 |     eps = 1e-8
346 |     for param, v, sqr in zip(parameters, vs, sqrs):
347 |         v[:] = beta1 * v + (1 - beta1) * param.grad.data
348 |         sqr[:] = beta2 * sqr + (1 - beta2) * param.grad.data ** 2
349 |         v_hat = v / (1 - beta1 ** t)
350 |         s_hat = sqr / (1 - beta2 ** t)
351 |         param.data = param.data - lr * v_hat / torch.sqrt(s_hat + eps)
352 | 
353 | # define a 3-layer network with Sequential
354 | net = nn.Sequential(
355 |     nn.Linear(784, 200),
356 |     nn.ReLU(),
357 |     nn.Linear(200, 10),
358 | )
359 | # initialize the squared-gradient and momentum terms
360 | sqrs = []
361 | vs = []
362 | for param in net.parameters():
363 |     sqrs.append(torch.zeros_like(param.data))
364 |     vs.append(torch.zeros_like(param.data))
365 | t = 1
366 | # start training
367 | losses = []
368 | idx = 0
369 |         
370 | for e in range(5):
371 |     train_loss = 0
372 |     for im, label in train_data:
373 |         # forward pass
374 |         out = net(im)
375 |         loss = criterion(out, label)
376 |         # backward pass
377 |         net.zero_grad()
378 |         loss.backward()
379 |         adam(net.parameters(), vs, sqrs, 1e-3, t) # learning rate 0.001
380 |         t += 1
381 |         # record the loss (use .item() since PyTorch 0.4; loss.data[0] no longer works)
382 |         train_loss += loss.item()
383 |         if idx % 30 == 0:
384 |             losses.append(loss.item())
385 |         idx += 1
386 |     print('epoch: {}, Train Loss: {:.6f}'
387 |           .format(e, train_loss / len(train_data)))
388 | ```
389 | 
390 | 
391 | 
392 | # 8 torch.optim.Adamax
393 | ```
394 | class torch.optim.Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
395 | ```
396 | 
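397 | Description:
398 | Implements Adamax, a variant of Adam based on the infinity norm. It adds the notion of an upper bound on the learning rate to Adam, hence the name Adamax.
399 | 
400 | ## **Parameters:**
401 | 
402 | - params (iterable) – an iterable of parameters to optimize, or of dicts defining parameter groups
403 | - lr (`float`, optional) – learning rate (default: 2e-3)
404 | - betas (Tuple[`float`, `float`], optional) – coefficients used for computing running averages of the gradient and its square
405 | - eps (`float`, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
406 | - weight_decay (`float`, optional) – weight decay (L2 penalty) (default: 0)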
407 | 
408 | # 9 torch.optim.SparseAdam
409 | ```
410 | class torch.optim.SparseAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)
411 | ```
412 | 
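413 | Description:
414 | A stripped-down version of Adam for sparse tensors.
415 | Only moments that show up in the gradient get updated, and only those portions of the gradient get applied to the parameters.
416 | 
417 | ## Parameters:
418 | 
419 | - params (iterable) – an iterable of parameters to optimize, or of dicts defining parameter groups
420 | - lr (`float`, optional) – learning rate (default: 1e-3)
421 | - betas (Tuple[`float`, `float`], optional) – coefficients used for computing running averages of the gradient and its square
422 | - eps (`float`, optional) – term added to the denominator to improve numerical stability (default: 1e-8)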
423 | 
424 | # 10 torch.optim.LBFGS
425 | ```python
426 | class torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-05, tolerance_change=1e-09, history_size=100, line_search_fn=None)
427 | ```
428 | 
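429 | Description:
430 | Implements L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno). L-BFGS is a quasi-Newton method that improves on BFGS by using less memory.
431 | Usage notes:
432 | 
433 | #### Warning
434 | 
435 | This optimizer does not support per-parameter options or parameter groups (there can be only one).
436 | 
437 | #### Warning
438 | 
439 | Right now all parameters have to be on a single device. This will be improved in the future.
440 | 
441 | #### Note
442 | 
443 | This is a very memory-intensive optimizer (it requires an additional `param_bytes * (history_size + 1)` bytes). If it does not fit in memory, try reducing the history size or use a different algorithm.
444 | 
445 | ## **Parameters:**
446 | 
447 | - lr (`float`) – learning rate (default: 1)
448 | - max_iter (`int`) – maximal number of iterations per optimization step (default: 20)
449 | - max_eval (`int`) – maximal number of function evaluations per optimization step (default: max_iter * 1.25)
450 | - tolerance_grad (`float`) – termination tolerance on first-order optimality (default: 1e-5)
451 | - tolerance_change (`float`) – termination tolerance on function value/parameter changes (default: 1e-9)
452 | - history_size (`int`) – update history size (default: 100)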
453 | 
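454 | # **What are parameter groups (param_groups)?** 
455 | 
456 | The optimizer manages its parameter groups through `optimizer.param_groups`. Each `param_group` stores a group of parameters together with its learning rate, momentum, and so on, so the learning rate of a group can be changed by assigning to `param_group['lr']`.
457 | 
458 | Below is an example of inspecting these values through `param_groups`
459 | 
460 | ```python
461 | # two param groups, i.e. len(optimizer.param_groups) == 2
462 | optimizer = optim.SGD([
463 |                 {'params': model.base.parameters()},
464 |                 {'params': model.classifier.parameters(), 'lr': 1e-3}
465 |             ], lr=1e-2, momentum=0.9)
466 | 
467 | # a single param group
468 | optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=.9)
469 | # read the learning rate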
470 | print('learning rate: {}'.format(optimizer.param_groups[0]['lr']))
471 | print('weight decay: {}'.format(optimizer.param_groups[0]['weight_decay']))
472 | ```
473 | 
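Changing the learning rate by hand amounts to writing into each group's dict. A minimal sketch; `scale_lr` is a hypothetical helper introduced here for illustration, not a PyTorch function.

```python
import torch
from torch import optim


def scale_lr(optimizer, factor):
    """Hypothetical helper: multiply the learning rate of every param group by `factor`."""
    for group in optimizer.param_groups:
        group["lr"] *= factor


w = torch.zeros(2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
optimizer = optim.SGD(
    [{"params": [w]}, {"params": [b], "lr": 1e-3}],
    lr=1e-2,
    momentum=0.9,
)

scale_lr(optimizer, 0.5)   # halve both learning rates
lrs_after = [group["lr"] for group in optimizer.param_groups]
print(lrs_after)
```

The schedulers in `torch.optim.lr_scheduler` do essentially the same thing, just with the new value computed from the epoch counter.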
474 | 
475 | 
476 | # How to adjust the learning rate
477 | 
478 | `torch.optim.lr_scheduler` provides several methods to adjust the learning rate based on the number of epochs. [`torch.optim.lr_scheduler.ReduceLROnPlateau`](https://pytorch.apachecn.org/docs/1.0/#/optim?id=torch.optim.lr_scheduler.reducelronplateau) allows dynamic learning rate reducing based on some validation measurements.
479 | 
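480 | PyTorch implements learning-rate schedules through the torch.optim.lr_scheduler interface. The schedules fall into three categories:
481 | 
482 | a. Scheduled adjustment: fixed-interval (Step), milestone-based (MultiStep), exponential decay (Exponential), and cosine annealing (CosineAnnealing).
483 | b. Adaptive adjustment: ReduceLROnPlateau.
484 | c. Custom adjustment: LambdaLR.
485 | 
486 | The first category adjusts the rate according to a fixed rule and is the most commonly used: fixed-interval decay (Step), decay at chosen milestones (MultiStep), exponential decay (Exponential), and CosineAnnealing. With all four, the timing of each adjustment is under manual control.
487 | The second category adjusts according to training progress: ReduceLROnPlateau monitors a metric and adjusts the learning rate once that metric stops changing, which makes it adaptive.
488 | The third category is custom adjustment with LambdaLR. It is very flexible: different layers can get different schedules, which is useful in fine-tuning, where we can set not only a different learning rate but also a different schedule for each layer. 
489 | 
490 | ## scheduler.step()
491 | 
492 | scheduler.step() should be called only once per loop iteration (typically once per epoch).
493 | 
494 | ## 1 Fixed-interval adjustment: StepLR
495 | Multiplies the learning rate by gamma every step_size steps. Note that a step here usually means an epoch, not an iteration.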
496 | 
497 | ```python
498 | torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
499 | ```
500 | 
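501 | Parameters:
502 | 
503 | - step_size(int) - interval between decays; if 30, the learning rate is multiplied by gamma at steps 30, 60, 90, …
504 | - gamma(float) - multiplicative decay factor; default 0.1, i.e. a 10x reduction.
505 | - last_epoch(int) - index of the last epoch, used to decide whether the learning rate needs adjusting: when last_epoch hits the configured interval, the rate is adjusted. When it is -1, the learning rate is set to its initial value.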
506 | 
507 | ```python
508 | import torch
509 | import torch.optim as optim
510 | from torch.optim import lr_scheduler
511 | from torchvision.models import AlexNet
512 | import matplotlib.pyplot as plt
513 | 
514 | 
515 | model = AlexNet(num_classes=2)
516 | optimizer = optim.SGD(params=model.parameters(), lr=0.05)
517 | 
518 | # lr_scheduler.StepLR()
519 | # Assuming optimizer uses lr = 0.05 for all groups
520 | # lr = 0.05     if epoch < 30
521 | # lr = 0.005    if 30 <= epoch < 60
522 | # lr = 0.0005   if 60 <= epoch < 90
523 | 
524 | scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
525 | plt.figure()
526 | x = list(range(100))
527 | y = []
528 | for epoch in range(100):
529 |     lr = optimizer.param_groups[0]['lr']  # learning rate used during this epoch
530 |     print(epoch, lr)
531 |     y.append(lr)
532 |     scheduler.step()  # call after the epoch's training, i.e. after optimizer.step()
533 | 
534 | plt.plot(x, y)
535 | ```
536 | 
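The schedule can also be compared against its closed form, `base_lr * gamma ** (epoch // step_size)`. The sketch below assumes PyTorch 1.1 or later (where `scheduler.step()` is called after `optimizer.step()`) and uses a dummy parameter instead of AlexNet; `step_lr` is a helper written for this illustration.

```python
import torch
from torch import optim
from torch.optim import lr_scheduler


def step_lr(base_lr, gamma, step_size, epoch):
    """Closed form of the StepLR schedule."""
    return base_lr * gamma ** (epoch // step_size)


param = torch.zeros(1, requires_grad=True)   # dummy parameter, only needed to build an optimizer
optimizer = optim.SGD([param], lr=0.05)
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

observed = []
for epoch in range(90):
    observed.append(optimizer.param_groups[0]["lr"])   # rate in effect during this epoch
    optimizer.step()     # no gradients exist, so this is a no-op that keeps the call order valid
    scheduler.step()

expected = [step_lr(0.05, 0.1, 30, epoch) for epoch in range(90)]
max_error = max(abs(o - e) for o, e in zip(observed, expected))
print(observed[0], observed[30], observed[60], max_error)
```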
537 | ![img](https://upload-images.jianshu.io/upload_images/11478104-d4791323b2c09941.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/857/format/webp)
538 | 
539 | 
540 | 
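541 | ## 2 Milestone-based adjustment: MultiStepLR
542 | Adjusts the learning rate at the specified milestones. This is handy for late-stage tuning: inspect the loss curve and choose the adjustment points for each experiment.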
543 | 
544 | ```python
545 | torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
546 | ```
547 | 
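548 | Parameters:
549 | 
550 | - milestones(list) - a list whose elements give the epochs at which to adjust the learning rate; it must be increasing, e.g. milestones=[30,80,120]
551 | - gamma(float) - multiplicative decay factor; default 0.1, i.e. a 10x reduction.
552 | - last_epoch(int) - index of the last epoch, used to decide whether the learning rate needs adjusting: when last_epoch hits a milestone, the rate is adjusted. When it is -1, the learning rate is set to its initial value.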
553 | 
554 | 
555 | 
556 | ```python
557 | # ---------------------------------------------------------------
558 | # the intervals can be chosen freely
559 | # lr_scheduler.MultiStepLR()
560 | #  Assuming optimizer uses lr = 0.05 for all groups
561 | # lr = 0.05     if epoch < 30
562 | # lr = 0.005    if 30 <= epoch < 80
563 | #  lr = 0.0005   if epoch >= 80
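564 | optimizer = optim.SGD(params=model.parameters(), lr=0.05)  # fresh optimizer so the schedule starts from lr = 0.05
565 | plt.figure()
566 | y.clear()
567 | scheduler = lr_scheduler.MultiStepLR(optimizer, [30, 80], 0.1)
568 | for epoch in range(100):
569 |     y.append(optimizer.param_groups[0]['lr'])  # learning rate used during this epoch
570 |     print(epoch, 'lr={:.6f}'.format(y[-1]))
571 |     scheduler.step()  # call after the epoch's training, i.e. after optimizer.step()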
572 | 
573 | plt.plot(x, y)
574 | plt.show()
575 | ```
576 | 
577 | 
578 | 
579 | ![img](https://upload-images.jianshu.io/upload_images/11478104-b0c490c9034c897c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/834/format/webp)
580 | 
581 | ## 3 Exponential decay: ExponentialLR
582 | Decays the learning rate exponentially: $lr = base\_lr \cdot gamma^{epoch}$
583 | 
584 | ```
585 | torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
586 | ```
587 | 
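588 | Parameters:
589 | 
590 | - gamma - base of the multiplicative factor; the exponent is the epoch, i.e. gamma**epoch
591 | 
592 | - last_epoch(int) - index of the last epoch, used to decide whether the learning rate
593 |     needs adjusting. When it is -1, the learning rate is set to its initial
594 |     value. 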
595 | 
596 | 
597 | 
598 | ```python
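599 | optimizer = optim.SGD(params=model.parameters(), lr=0.05)  # fresh optimizer so the schedule starts from lr = 0.05
600 | scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
601 | plt.figure()
602 | y.clear()
603 | for epoch in range(100):
604 |     y.append(optimizer.param_groups[0]['lr'])  # learning rate used during this epoch
605 |     print(epoch, 'lr={:.6f}'.format(y[-1]))
606 |     scheduler.step()  # call after the epoch's training, i.e. after optimizer.step()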
607 | 
608 | plt.plot(x, y)
609 | plt.show()
610 | ```
611 | 
612 | 
613 | 
614 | ![img](https://upload-images.jianshu.io/upload_images/11478104-ddf68c9742f2e64c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/805/format/webp)
615 | 
616 | 
617 | 
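618 | ## 4 Cosine annealing: CosineAnnealingLR
619 | The learning rate follows a cosine curve. The initial learning rate is the maximum, the period is 2∗T_max, and within one period the rate first decreases and then increases.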
620 | 
621 | ```
622 | torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
623 | ```
624 | 
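625 | Parameters:
626 | 
627 | - T_max(int) - number of epochs in half a cosine period: the rate falls from its initial value to eta_min over T_max epochs and then rises again.
628 | - eta_min(float) - minimum learning rate reached within a period; default 0.
629 | 
630 | The update rule is: 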
631 | $$
632 | \eta_{t+1} = \eta_{min} + (\eta_t - \eta_{min})\frac{1 +
633 |         \cos(\frac{T_{cur}+1}{T_{max}}\pi)}{1 + \cos(\frac{T_{cur}}{T_{max}}\pi)},
634 |         T_{cur} \neq (2k+1)T_{max};\\
635 |         \eta_{t+1} = \eta_{t} + (\eta_{max} - \eta_{min})\frac{1 -
636 |         \cos(\frac{1}{T_{max}}\pi)}{2},
637 |         T_{cur} = (2k+1)T_{max}.
638 | $$
639 | When last_epoch=-1, sets initial lr as lr. Notice that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators. If the learning rate is set solely by this scheduler, the learning rate at each step becomes:
640 | $$
641 | \eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})(1 +
642 |         \cos(\frac{T_{cur}}{T_{max}}\pi))
643 | $$
644 | 
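The closed-form expression can be compared with what the scheduler actually sets. The sketch below covers only the descending half-period (epochs 0 to `T_max`), assumes PyTorch 1.1 or later, and uses a dummy parameter; the values of `base_lr`, `eta_min` and `T_max` are arbitrary.

```python
import math

import torch
from torch import optim
from torch.optim import lr_scheduler

base_lr, eta_min, T_max = 0.1, 0.001, 10

param = torch.zeros(1, requires_grad=True)   # dummy parameter, only needed to build an optimizer
optimizer = optim.SGD([param], lr=base_lr)
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=T_max, eta_min=eta_min)


def cosine_lr(t):
    """Closed form: eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T_max))."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t / T_max))


observed, expected = [], []
for t in range(T_max + 1):
    observed.append(optimizer.param_groups[0]["lr"])
    expected.append(cosine_lr(t))
    optimizer.step()      # no gradients exist, so this is a no-op that keeps the call order valid
    scheduler.step()

max_error = max(abs(o - e) for o, e in zip(observed, expected))
print(observed[0], observed[T_max], max_error)
```

The rate starts at `base_lr` and reaches `eta_min` after exactly `T_max` epochs, matching the description of `T_max` as half a period.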
645 | 
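646 | ## 5 Metric-based adjustment: ReduceLROnPlateau
647 | Adjusts the learning rate when a monitored metric stops improving (decreasing or increasing); a very practical strategy.
648 | For example, adjust when the validation loss stops decreasing, or monitor validation accuracy and adjust when it stops increasing.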
649 | 
650 | ```python
651 | torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
652 | ```
653 | 
654 | Parameters:
655 | 
656 | - mode(str) - `min` or `max`. `min` triggers when the metric stops decreasing (e.g. a loss); `max` triggers when it stops increasing (e.g. accuracy).
657 | 
658 | - factor(float) - multiplicative factor (the counterpart of gamma in the other schedulers): lr = lr * factor
659 | 
660 | - patience(int) - number of steps with no improvement to tolerate before the learning rate is adjusted.
661 | 
662 | - verbose(bool) - whether to print learning-rate information: print('Epoch {:5d}: reducing learning rate of group {} to {:.4e}.'.format(epoch, i, new_lr))
663 | 
664 |     If True, prints a message to stdout for each update. Default: False.
665 | 
666 | - threshold_mode(str) - how to decide whether the metric has reached a new best; one of rel and abs.
667 |     When threshold_mode == rel and mode == max, dynamic_threshold = best * ( 1 + threshold );
668 |     when threshold_mode == rel and mode == min, dynamic_threshold = best * ( 1 - threshold );
669 |     when threshold_mode == abs and mode == max, dynamic_threshold = best + threshold;
670 |     when threshold_mode == abs and mode == min, dynamic_threshold = best - threshold.
671 | 
672 | - threshold(float) - used together with threshold_mode.
673 | 
674 | - cooldown(int) - "cooldown period": after an adjustment, the scheduler waits this many steps, letting the model train for a while, before monitoring resumes.
675 | 
676 | - min_lr(float or list) - lower bound on the learning rate; a float, or a list when there are several parameter groups.
677 | 
678 | - eps(float) - minimal decay applied to the learning rate: if the difference between the new and old rate is smaller than eps, the update is skipped.
679 | 
680 | 
681 | 
682 | ```python
683 | optimizer = torch.optim.SGD(model.parameters(), args.lr,
684 |                             momentum=args.momentum,
685 |                             weight_decay=args.weight_decay)
686 | 
687 | scheduler = ReduceLROnPlateau(optimizer, 'min')
688 | for epoch in range(args.start_epoch, args.epochs):
689 |     train(train_loader, model, criterion, optimizer, epoch)
690 |     result_avg, loss_val = validate(val_loader, model, criterion, epoch)
691 |     # Note that step should be called after validate()
692 |     scheduler.step(loss_val)
693 | ```
694 | 
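The interplay of `patience` and `factor` can be seen without a real training loop by feeding the scheduler a made-up sequence of validation losses. Everything below (the loss values, the dummy parameter, `patience=2`) is an assumption for illustration.

```python
import torch
from torch import optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

param = torch.zeros(1, requires_grad=True)   # dummy parameter, only needed to build an optimizer
optimizer = optim.SGD([param], lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=2)

# Made-up validation losses: two improvements, then a plateau.
val_losses = [1.0, 0.8, 0.8, 0.8, 0.8, 0.8]

lrs = []
for loss_val in val_losses:
    optimizer.step()              # stands in for one epoch of training
    scheduler.step(loss_val)      # the scheduler sees the metric after validation
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)
```

The rate is only reduced on the third consecutive epoch without improvement, because `patience=2` tolerates two such epochs first.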
695 | 
696 | 
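697 | ## 6 Custom adjustment: LambdaLR
698 | Sets a separate learning-rate schedule for each parameter group. The rule is
699 | 
700 | $lr = base\_lr \cdot lmbda(self.last\_epoch)$
701 | 
702 | This is very useful in fine-tuning: we can set not only a different learning rate for each layer but also a different schedule.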
703 | 
704 | ```
705 | torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
706 | ```
707 | 
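708 | Parameters:
709 | 
710 | - lr_lambda(function or list) - a function computing the multiplicative factor, usually from the step (epoch) index; pass a list of such functions when there are several parameter groups.
711 | 
712 | - last_epoch (int) – index of the last epoch, used to decide whether the learning rate
713 |     needs adjusting. When it is -1, the learning rate is set to its initial
714 |     value. 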
715 | 
716 | ```python
717 | ignored_params = list(map(id, net.fc3.parameters()))
718 | base_params = filter(lambda p: id(p) not in ignored_params, net.parameters())
719 | optimizer = optim.SGD([
720 |         {'params': base_params},
721 |         {'params': net.fc3.parameters(), 'lr': 0.001*100}], 0.001, momentum=0.9, weight_decay=1e-4)
722 | # Assuming optimizer has two groups.
723 | lambda1 = lambda epoch: epoch // 3
724 | lambda2 = lambda epoch: 0.95 ** epoch
725 | scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
726 | for epoch in range(100):
727 |     print('epoch: ', epoch, 'lr: ', [g['lr'] for g in optimizer.param_groups])
728 |     train(...)
729 |     validate(...)
730 |     scheduler.step()
731 |     
732 | # Output:
733 | # epoch: 0 lr: [0.0, 0.1]
734 | # epoch: 1 lr: [0.0, 0.095]
735 | # epoch: 2 lr: [0.0, 0.09025]
736 | # epoch: 3 lr: [0.001, 0.0857375]
737 | # epoch: 4 lr: [0.001, 0.081450625]
738 | # epoch: 5 lr: [0.001, 0.07737809374999999]
739 | # epoch: 6 lr: [0.002, 0.07350918906249998]
740 | # epoch: 7 lr: [0.002, 0.06983372960937498]
741 | # epoch: 8 lr: [0.002, 0.06634204312890622]
742 | # epoch: 9 lr: [0.003, 0.0630249409724609]
743 | # Why is the learning rate of the first group 0? Look at how it is computed.
744 | # The initial learning rate of the first group is 0.001 and
745 | # lambda1 = lambda epoch: epoch // 3.
746 | # In the first epoch, lr = base_lr * lmbda(self.last_epoch)
747 | # gives lr = 0.001 * (0 // 3) = 0; since 1 // 3 and 2 // 3 are also 0, the rate stays 0 for the first three epochs.
748 | # The second group is easy to follow: it starts at 0.1 and lr = 0.1 * 0.95^epoch, so
749 | # at epoch 0 lr = 0.1 and at epoch 1 lr = 0.1 * 0.95.
750 | ```
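
The recorded values can be reproduced without PyTorch by applying the rule lr = base_lr * lmbda(epoch) directly. A minimal sketch: `base_lrs` and `lambdas` mirror the two parameter groups of the example, and the helper name `lambda_lrs` is hypothetical.

```python
base_lrs = [0.001, 0.1]  # initial learning rate of each parameter group
lambdas = [lambda epoch: epoch // 3, lambda epoch: 0.95 ** epoch]


def lambda_lrs(epoch):
    """Return base_lr * lmbda(epoch) for every parameter group."""
    return [base_lr * fn(epoch) for base_lr, fn in zip(base_lrs, lambdas)]


for epoch in range(10):
    print('epoch:', epoch, 'lr:', lambda_lrs(epoch))
```

Because `epoch // 3` is 0 for epochs 0 to 2, the first group is effectively frozen during those epochs.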
751 | 
752 | 
753 | 
754 | ## 7 CyclicLR
755 | 
756 | ```python
757 | torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1)
758 | ```
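
With the default `mode='triangular'`, the learning rate climbs linearly from `base_lr` to `max_lr` over `step_size_up` iterations and then falls back to `base_lr`. The sketch below reproduces that triangular shape in plain Python, following the formula from the cyclical learning rate paper and assuming equal up and down step sizes. The function name `triangular_lr` is hypothetical and this is not the library's code.

```python
import math


def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cycle: base_lr -> max_lr -> base_lr every 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


for it in range(0, 9, 2):
    print(it, triangular_lr(it, base_lr=0.01, max_lr=0.1, step_size=4))
```

Note that `CyclicLR` is meant to be stepped after every batch rather than after every epoch, so `iteration` here counts batches.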
759 | 
760 | 
761 | 
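## Source code of step()

In PyTorch the learning rate is updated through scheduler.step(). We know that epoch is an important parameter affecting the learning rate, so how is epoch tied to scheduler.step()? To answer that we need to read the source.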
```python
def step(self, epoch=None):
    if epoch is None:
        epoch = self.last_epoch + 1
    self.last_epoch = epoch
    for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()):
        param_group['lr'] = lr
```
774 | 
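The function takes an argument epoch that defaults to None; when it is None, epoch = self.last_epoch + 1. This shows that last_epoch is what keeps track of the epoch. As mentioned above, last_epoch starts at -1, so the first epoch value is -1 + 1 = 0. The most important step comes next: computing the learning rate and writing it back.

Because PyTorch manages parameters in groups, a for loop is needed to fetch and update the learning rate of each parameter group. Note get_lr() here: its job is to return the learning rate of each parameter group for the current epoch.

Taking StepLR() as an example, here is its get_lr():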
780 | 
```python
def get_lr(self):
    return [base_lr * self.gamma ** (self.last_epoch // self.step_size)
            for base_lr in self.base_lrs]
```
786 | 
Because PyTorch manages parameters in groups and there may be several of them, the loop runs over all groups and the result is a list. Each element is computed as

```
base_lr * self.gamma ** (self.last_epoch // self.step_size)
```

Each call to scheduler.step() increases epoch by 1, so scheduler.step() must be called inside the for loop over epochs.
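
Putting the two excerpts together, a stripped-down plain-Python scheduler that only mimics the quoted `step()` and `get_lr()` logic could look like this. `MiniStepLR` is a hypothetical stand-in that works on a list of dicts in place of `optimizer.param_groups`. It is a sketch for following the arithmetic, not the PyTorch class.

```python
class MiniStepLR:
    """Hypothetical stand-in that mirrors the quoted step()/get_lr() logic."""

    def __init__(self, param_groups, step_size, gamma=0.1, last_epoch=-1):
        self.param_groups = param_groups
        self.base_lrs = [group['lr'] for group in param_groups]
        self.step_size = step_size
        self.gamma = gamma
        self.last_epoch = last_epoch

    def get_lr(self):
        return [base_lr * self.gamma ** (self.last_epoch // self.step_size)
                for base_lr in self.base_lrs]

    def step(self, epoch=None):
        if epoch is None:
            epoch = self.last_epoch + 1
        self.last_epoch = epoch
        for group, lr in zip(self.param_groups, self.get_lr()):
            group['lr'] = lr


groups = [{'lr': 0.1}, {'lr': 0.01}]
scheduler = MiniStepLR(groups, step_size=2, gamma=0.5)
for epoch in range(5):
    scheduler.step()  # last_epoch goes from -1 to 0 on the first call
    print(scheduler.last_epoch, [g['lr'] for g in groups])
```

With `step_size=2` and `gamma=0.5`, both groups keep their base rate while `last_epoch` is 0 or 1 and are halved each time `last_epoch // 2` increases.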
795 | 
796 | 
797 | 
## Learning rate decay example
799 | 
```python
import torch
from torch.optim import lr_scheduler


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
print('learning rate: {}'.format(optimizer.param_groups[0]['lr']))
print('weight decay: {}'.format(optimizer.param_groups[0]['weight_decay']))

# scheduler = lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

# scheduler = lr_scheduler.MultiStepLR(optimizer, [50, 100], 0.5)

gamma = 0.9
scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
for t in range(200):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 25 == 0:
        print(t, loss.item())
        # get_last_lr() needs a recent PyTorch; older releases used get_lr() here.
        print('t:', t, scheduler.get_last_lr()[0])
        print('learning rate: {}'.format(optimizer.param_groups[0]['lr']))
        print(1e-3 * gamma ** t)
        # print('weight decay: {}'.format(optimizer.param_groups[0]['weight_decay']))

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # scheduler.step() must appear exactly once per loop iteration; it is placed
    # after optimizer.step(), the order PyTorch 1.1.0 and later expect.
    scheduler.step()
```
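
The comparison value `1e-3*gamma**t` printed in the loop is the closed form of exponential decay. The quick plain-Python check below (no PyTorch needed, variable names illustrative) confirms that multiplying by `gamma` once per iteration yields the same value up to floating-point rounding.

```python
import math

base_lr, gamma = 1e-3, 0.9
lr = base_lr
for t in range(200):
    # At the start of iteration t the rate has been decayed t times.
    assert math.isclose(lr, base_lr * gamma ** t, rel_tol=1e-9)
    lr *= gamma  # one decay per iteration, as one scheduler.step() call would apply

print(lr, base_lr * gamma ** 200)
```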
863 | 
## Changing the learning rate manually

```python
# A single parameter group
# optimizer.param_groups is a list
# optimizer.param_groups[0] is a dict
optimizer.param_groups[0]['lr'] = 1e-5
# Multiple parameter groups
def set_learning_rate(optimizer, lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
```
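
As one possible use of `set_learning_rate`, the sketch below applies a manual linear warmup. To keep it runnable without PyTorch, the optimizer is replaced by a stand-in object that exposes only a `param_groups` list of dicts, which is the one attribute the function touches. The warmup schedule and the names `target_lr` and `warmup_steps` are illustrative assumptions.

```python
from types import SimpleNamespace


def set_learning_rate(optimizer, lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr


# Stand-in for an optimizer: only the `param_groups` attribute is used here.
optimizer = SimpleNamespace(param_groups=[{'lr': 0.0}, {'lr': 0.0}])

target_lr, warmup_steps = 0.1, 5
for step in range(warmup_steps):
    # Linear warmup: 1/5, 2/5, ... of the target rate, applied to every group.
    set_learning_rate(optimizer, target_lr * (step + 1) / warmup_steps)
    print(step, [g['lr'] for g in optimizer.param_groups])
```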
876 | 
877 | 


--------------------------------------------------------------------------------