├── doc
│   ├── 3损失函数.md
│   ├── 4layer的实现.md
│   ├── 2优化器.md
│   └── 1自动求导基础运算实现.md
├── README.md
├── LICENSE
├── test
│   ├── test_optim.py
│   └── test_tensor.py
├── easytorch
│   ├── layer.py
│   ├── functional.py
│   ├── optim.py
│   └── tensor.py
└── example
    ├── Predict.ipynb
    └── FunctionApproximation.ipynb

/doc/3损失函数.md:
--------------------------------------------------------------------------------
# 3. Loss Functions

## L2 loss

The L2 loss, $loss = \sum_i (y_i - pred_i)^2$, is simple to implement and needs no further explanation.

Its drawback is that the gradient, $2(pred_i - y_i)$, depends on the value of the residual $x = pred_i - y_i$: when $x$ is very large the gradient is very large too, which makes training unstable.

## L1 loss

The L1 loss has the form $loss = \sum_i |y_i - pred_i|$. Its derivative with respect to the residual $x$ is $sign(x)$; it is not differentiable at $x = 0$, where a subgradient can be used, taken as 0. (The implementations in ```functional.py```, ```mse_loss``` and ```l1_loss```, average over the elements instead of summing.)

The L1 loss has the opposite problem from L2: the gradient magnitude is constant. Even when $x$ is very small the gradient is still 1 in magnitude, so with a fixed learning rate the parameters easily oscillate and it is hard to converge to higher precision.

--------------------------------------------------------------------------------
/doc/4layer的实现.md:
--------------------------------------------------------------------------------
# 4. Implementing layers

See ```layer.py``` for the code. An abstract base class ```Layer``` first defines the interface, and each subclass then implements the ```forward``` method.

The current implementation is overly simple; later I hope to follow PyTorch's structure and build a version that is "just a little bit" (read: a lot) more complex.

## Linear

A Linear layer computes $x @ W + b$, where $W$ and $b$ are trainable parameters. Only this forward computation needs to be implemented; backpropagation is handled by autograd, so it is very simple.

## Activation layers

Activation layers have no trainable parameters; they are implemented the same way as the corresponding autograd functions (they simply call ```F.relu``` and ```F.tanh```).

## Sequential

A Sequential layer stores all of its layers at initialization and, in the forward pass, passes the data through them in order.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# easytorch

A simple deep learning framework implemented in Python with NumPy. Its API is largely the same as PyTorch's; it implements automatic differentiation, basic optimizers, layers, and more.

## 1 Documentation

[1. Basic operations for automatic differentiation](./doc/1自动求导基础运算实现.md)

[2. Implementing optimizers](./doc/2优化器.md)

[3. Loss functions](./doc/3损失函数.md)

[4. Implementing layers](./doc/4layer的实现.md)

## 2 Quick Start

``` python
import numpy as np
from easytorch.layer import Linear, Tanh, Sequential
from easytorch.optim import SGD
from easytorch.tensor import Tensor
import easytorch.functional as F

# Example training data: fit sin(x) on [-3, 3], inputs of shape (N, 1)
x = Tensor(np.linspace(-3, 3, 100).reshape(-1, 1))
y = Tensor(np.sin(x.data))
epochs = 1000

# Create a model, optimizer, loss function
model = Sequential(
    Linear(1, 5),
    Tanh(),
    Linear(5, 1)
)
opt = SGD(model.parameters(), lr=3e-4)
loss_fn = F.mse_loss

# Train the model
for epoch in range(epochs):
    pred = model(x)
    loss = loss_fn(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

## 3 Example

1. [Approximating a trigonometric function with a neural network](./example/FunctionApproximation.ipynb)
2. [Predicting Boston house prices with a neural network](./example/Predict.ipynb)

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2021 SongLei

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/test/test_optim.py:
--------------------------------------------------------------------------------
import sys
sys.path.append('..')

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from easytorch import tensor, optim


def generate_data(n=100, f=lambda x: 2 * x - 1):
    data = []
    for _ in range(n):
        x = np.random.uniform(-3, 3, 3)
        y = f(x) + 0.01 * np.random.randn()
        data.append([x, y])
    return data


def sgd_linear_approximation():
    train_data = generate_data(n=100, f=lambda x: x[0]+2*x[1]+3*x[2])
    x = tensor.Tensor([x for x, y in train_data])
    y = tensor.Tensor([y for x, y in train_data])
    w = tensor.random(3, requires_grad=True)
    b = tensor.Tensor(1.0, requires_grad=True)
    opt = optim.SGD([w, b], lr=0.01)
    loss_list = []

    for _ in tqdm(range(1000)):
        for data_x, data_y in zip(x, y):
            pred = data_x @ w + b
            loss = ((pred - data_y) * (pred - data_y)).mean()
            loss_list.append(loss.data)
            opt.zero_grad()
            loss.backward()
            opt.step()

    plt.plot(loss_list)
    plt.show()


if __name__ == '__main__':
    sgd_linear_approximation()

--------------------------------------------------------------------------------
/doc/2优化器.md:
--------------------------------------------------------------------------------
# 2. Optimizers

The basic framework of an iterative optimization algorithm is:

1. Compute the gradient of the objective with respect to the current parameters, $g_t = \nabla f(\omega _t)$
2. Update the historical first-order moment $m_t$ and second-order moment $V_t$
3. Use $m_t$ to control the update direction and $V_t$ to control the step size, and compute the current descent step $\eta_t = \alpha \frac{m_t}{\sqrt{V_t}}$
4. Update the parameters, $\omega_{t+1} = \omega_t - \eta_t$

Different optimizers differ only in step 2.

See ```optim.py``` for the code.

## SGD

In step 2, SGD uses $m_{t} = g_t$ and $V_t = 1$.

The update rule is $\omega_{t+1} = \omega_t - \alpha g_t$.

## Adagrad (Adaptive gradient)

In step 2, Adagrad uses $m_{t} = g_{t}$ and $V_{t} = V_{t-1} + g_t \odot g_t$, which adds an adaptive step size: because $V_t$ accumulates squared gradients, parameters whose gradients $g_t$ have been large are updated more slowly, while parameters whose gradients have been small are updated relatively faster.

The update rule is $\omega_{t+1} = \omega_t - \frac{\alpha}{\sqrt{V_{t}} + \epsilon} \odot g_t$.

The problem with Adagrad is that $V_t$ keeps accumulating, so late in training the denominator becomes too large, the effective learning rate becomes too small, and the parameters barely change.

## Moment

Moment (momentum) introduces a first-order moment; step 2 becomes $m_t = \beta m_{t-1} + (1 - \beta) g_t$. Averaging the current gradient with past gradients slows learning along directions that oscillate and speeds it up along directions of steady descent.

The update rule is $\omega_{t+1} = \omega_t - \alpha m_t$.

## RMSprop

RMSprop partly solves Adagrad's vanishing learning rate by replacing the running sum with an exponential moving average of the second-order moment: $V_{t} = \beta V_{t-1} + (1 - \beta)g_t \odot g_t$.

The update rule is $\omega_{t+1} = \omega_t - \frac{\alpha}{\sqrt{V_{t}} + \epsilon} \odot g_t$.

## Adam

Adam (Adaptive moment estimation) combines the two ideas above (momentum and RMSprop). Step 2 becomes $m_t = \beta_0 m_{t-1} + (1 - \beta_0) g_t$ and $V_{t} = \beta_1 V_{t-1} + (1-\beta_1)g_t \odot g_t$, followed by bias correction: $\hat{m}_t = \frac{m_t}{1 - \beta_0^t}$, $\hat{V}_t = \frac{V_t}{1 - \beta_1^t}$.

The update rule is $\omega_{t+1} = \omega_t - \alpha \frac{\hat{m}_t}{\sqrt{\hat{V}_t} + \epsilon}$.

## Thoughts

The optimizers already include a form of learning-rate decay, so is adding an extra learning-rate decay schedule still useful?

--------------------------------------------------------------------------------
/easytorch/layer.py:
--------------------------------------------------------------------------------
from easytorch import tensor
import easytorch.functional as F
import abc


class Layer(metaclass=abc.ABCMeta):

    def __init__(self):
        self.params = []

    @abc.abstractmethod
    def forward(self, x):
        pass

    def __call__(self, x):
        return self.forward(x)

    def parameters(self):
        return self.params


class Linear(Layer):

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = tensor.random(in_features, out_features)
        self.params.append(self.weight)
        if bias:
            self.bias = tensor.random(out_features)
            self.params.append(self.bias)
        else:
            self.bias = None

    def forward(self, x):
        y = x @ self.weight
        # Compare with None explicitly; "if self.bias:" would fall back to Tensor.__len__
        if self.bias is not None:
            y += self.bias
        return y


class Sequential(Layer):

    def __init__(self, *layers):
        super(Sequential, self).__init__()
        self.layers = layers
        for layer in layers:
            assert isinstance(layer, Layer)
            self.params.extend(layer.parameters())

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


class ReLU(Layer):

    def __init__(self):
        super(ReLU, self).__init__()

    def forward(self, x):
        return F.relu(x)


class Tanh(Layer):

    def __init__(self):
        super(Tanh, self).__init__()

    def forward(self, x):
        return F.tanh(x)

--------------------------------------------------------------------------------
/doc/1自动求导基础运算实现.md:
--------------------------------------------------------------------------------
# 1. Basic operations for automatic differentiation

This part implements the basic operations of automatic differentiation. Note that all derivations below use **differentials**. Deriving directly with derivatives causes many problems, e.g. the chain rule does not hold in its usual form for matrix derivatives, and the derivative of a matrix with respect to a matrix is a 4-D tensor; differentials avoid these problems. See ```tensor.py``` for the code.

## 1. Basic operations

### Add, Sub, Mul, Divide, Pow

These are the four arithmetic operations plus power. They are all elementwise, so their derivatives are simple and almost identical to the scalar case; the tricky part is handling broadcasting.

1. $Z = X + Y$, so $dZ = dX + dY$; mathematically the gradient is the identity matrix, which is trivial. The harder part in a real implementation is broadcasting, which comes in two cases. In the first, one array has fewer dimensions than the other, e.g. $X=[[1, 2], [3, 4]]$ and $Y = [1, 2]$; the gradient for $Y$ must then be summed over the extra leading dimensions. In the second, both arrays have the same number of dimensions but some dimensions of one have size 1, e.g. $X=[[1, 2], [3, 4]]$ and $Y = [[1, 2]]$, where $shape(X) = (2, 2)$ and $shape(Y)=(1, 2)$; the gradient must be summed (keeping the dimension) over every dimension whose size is 1.

2. Sub can be implemented directly with Add (plus negation).

3. Elementwise multiplication, $Z = X\odot Y$, gives $dZ = dX \odot Y + X \odot dY$.

4. Elementwise division, $Z = X / Y$, gives $dZ = dX / Y - X \odot dY / Y^2$.

5. Pow is also elementwise and easy to differentiate: for $Z = X^n$, $dZ = n X^{n-1} \odot dX$.

### Sum, Mean

Both collapse a dimension of the matrix, so the gradient is the incoming gradient broadcast back to the original size. The only difference is whether it is also multiplied by a constant ($1/N$ for Mean, where $N$ is the number of averaged elements).

### Matmul

Matrix multiplication, $Z = XY$. Taking the differential with respect to $X$ gives $dZ = dX\,Y$, so the gradient with respect to $X$ is $\frac{\partial L}{\partial Z} Y^T$ (and, similarly, $X^T \frac{\partial L}{\partial Z}$ with respect to $Y$); pay attention to the order of left and right multiplication.

A special case is the product of two 1-D vectors: the second vector is implicitly treated as a column vector and the result is a scalar. For example, the following operation is also valid and gives 5. I have not yet worked out how to handle this case properly (currently ```__matmul__``` reshapes the operands to $(1, n)$ and $(n, 1)$, so the result has shape $(1, 1)$ instead of being a scalar).

``` python
from easytorch.tensor import Tensor

a = Tensor([1., 2.], requires_grad=True)
b = Tensor([1., 2.], requires_grad=True)
c = a @ b  # 1*1 + 2*2 = 5
```

### reshape, ```__getitem__```

Both operations only rearrange or select the elements of a matrix. They are linear maps, so no real derivation is needed: the backward pass just routes each gradient entry back to the position of the element it came from (reshaping back to the old shape, or scattering into a zero array at the indexed positions).

Also note that the tensors produced by these operations share data with the original tensor (a shallow copy, i.e. a view) whenever NumPy returns a view.

### Tanh, Relu

Both activation functions are elementwise functions $\sigma$. Let $y = \sigma(x)$; then $dy = \sigma '(x) \odot dx = diag(\sigma '(x)) dx$. The first form uses the elementwise (Hadamard) product and is simpler to implement; the second writes the same thing as a matrix product with a diagonal Jacobian, which is the standard matrix-calculus form but is more cumbersome to implement.

Tanh derivative: differentiating $tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ gives $tanh'(x) = 1 - tanh^2(x)$.

Relu derivative: relu is not differentiable at $x=0$, so a subgradient is used there, i.e. any $c$ with $f(y) \geq f(0) + c\,(y - 0)$ for all $y$, which means any $c \in [0, 1]$. Usually $c=0$ is chosen (this seems to be the common preference, since it is cheap to compute and yields more sparsity).

### Softmax

Softmax is $\frac{e^{x_i}}{\sum_j e^{x_j}}$. Computed naively, this formula underflows when the $x$ values are very negative, making the denominator 0 (and it overflows when they are very large). Implementations therefore subtract the maximum $m$: $\frac{e^{x_i - m}}{\sum_j e^{x_j - m}}$. For $x_i=m$, $e^{x_i-m} = 1$, so the denominator is always at least 1 and the underflow problem is avoided; since every exponent is at most 0, nothing overflows either.

The derivative has not been solved yet: the gradients differ from PyTorch's.

### Abs

The derivative is the sign function; abs is not differentiable at $x=0$, where a subgradient is needed (```np.sign``` returns 0 there).

## 2. Implementation notes

This part summarizes issues that the math does not need to consider but that an actual implementation does.

1. Broadcasting (see the implementations of Add, Sub, Mul, and Divide)
2. How to handle the case where both operands are the same object (see the implementation of ```tensor.py:Tensor/backward```)

## 3. Testing

### Assertions in the code

After ```backward```, the resulting ```grad``` must have the same shape as ```data```.

### Differential testing against PyTorch

**Untested code is always wrong.** Because the API is the same as PyTorch's, correctness is tested by running the same computation in both frameworks and comparing the ```grad``` of the leaf nodes after backward.

--------------------------------------------------------------------------------
/easytorch/functional.py:
--------------------------------------------------------------------------------
import numpy as np
import warnings
from easytorch import tensor


def tanh(inputs):
    data = np.tanh(inputs.data)
    requires_grad = inputs.requires_grad
    t = tensor.Tensor(data, requires_grad)
    t.is_leaf = False

    if inputs.requires_grad:
        def TanhBackward(grad):
            # d tanh(x) / dx = 1 - tanh(x)^2
            return grad * (1 - np.tanh(inputs.data) ** 2)
        t.grad_node.append(tensor.GRAD_NODE_FMT(inputs, TanhBackward))

    return t


def relu(inputs):
    data = np.maximum(0, inputs.data)
    requires_grad = inputs.requires_grad
    t = tensor.Tensor(data, requires_grad)
    t.is_leaf = False

    if inputs.requires_grad:
        def ReluBackward(grad):
            # Derivative is 1 where x > 0 and 0 elsewhere (subgradient 0 at x == 0)
            relu_prime = np.zeros_like(inputs.data)
            relu_prime[inputs.data > 0] = 1
            return grad * relu_prime

        t.grad_node.append(tensor.GRAD_NODE_FMT(inputs, ReluBackward))

    return t


def softmax(inputs, dim=0):
    # Disabled until the backward pass has been verified against PyTorch
    raise NotImplementedError('softmax is disabled: its backward pass has not been verified against PyTorch')

    def softmax_func(x):
        # Subtract the maximum for numerical stability
        max_v = np.max(x)
        return np.e**(x - max_v) / np.sum(np.e**(x - max_v))
    assert inputs.data.ndim == 1 or (inputs.data.ndim == 2 and (inputs.data.shape[0] == 1 or inputs.data.shape[1] == 1))
    # data = np.apply_over_axes(softmax_func, dim, inputs.data)
    data = softmax_func(inputs.data)
    requires_grad = inputs.requires_grad
    t = tensor.Tensor(data, requires_grad)
    t.is_leaf = False

    if inputs.requires_grad:
        def SoftmaxBackward(grad):
            s = softmax_func(inputs.data).reshape(-1)
            # Jacobian: ds_i/dx_j = s_i * (delta_ij - s_j), i.e. diag(s) - s s^T
            jacobian = np.diag(s) - np.outer(s, s)
            # The Jacobian is symmetric, so J^T @ grad == J @ grad
            next_grad = jacobian @ np.reshape(grad, -1)
            return next_grad.reshape(inputs.data.shape)
        t.grad_node.append(tensor.GRAD_NODE_FMT(inputs, SoftmaxBackward))

    return t


def mse_loss(target_y, y):
    if y.shape != target_y.shape:
        warnings.warn('mse_loss, target size {} is different from input size {}, '
                      'this will likely lead to incorrect results due to broadcasting'.format(target_y.shape, y.shape))
    return ((y - target_y) * (y - target_y)).mean()


def l1_loss(target_y, y):
    if y.shape != target_y.shape:
        warnings.warn('l1_loss, target size {} is different from input size {}, '
                      'this will likely lead to incorrect results due to broadcasting'.format(target_y.shape, y.shape))
    return (y - target_y).abs().mean()

--------------------------------------------------------------------------------
/easytorch/optim.py:
--------------------------------------------------------------------------------
import numpy as np
import abc


class Optimizer(metaclass=abc.ABCMeta):

    def __init__(self, params, lr=3e-4):
        self.params = params
        self.lr = lr
        # Per-parameter second-order (V) and first-order (m) moments
        self.V = []
        self.m = []
        for param in self.params:
            self.V.append(np.zeros_like(param.data))
            self.m.append(np.zeros_like(param.data))

    def zero_grad(self):
        for param in self.params:
            param.grad = 0

    @abc.abstractmethod
    def step(self):
        pass


class SGD(Optimizer):

    def __init__(self, params, lr=3e-4):
        super(SGD, self).__init__(params, lr)

    def step(self):
        for param in self.params:
            param.data -= self.lr * param.grad


class Adagrad(Optimizer):

    def __init__(self, params, lr=1e-2, eps=1e-8):
        super(Adagrad, self).__init__(params, lr)
        self.eps = eps

    def step(self):
        for i in range(len(self.params)):
            # V_t = V_{t-1} + g_t * g_t
            self.V[i] += self.params[i].grad * self.params[i].grad
            self.params[i].data -= self.lr * self.params[i].grad / (np.sqrt(self.V[i]) + self.eps)


class Moment(Optimizer):

    def __init__(self, params, lr=3e-4, beta=0.9):
        super(Moment, self).__init__(params, lr)
        self.beta = beta

    def step(self):
        for i in range(len(self.params)):
            # m_t = beta * m_{t-1} + (1 - beta) * g_t
            self.m[i] = self.beta * self.m[i] + (1 - self.beta) * self.params[i].grad
            self.params[i].data -= self.lr * self.m[i]


class RMSprop(Optimizer):

    def __init__(self, params, lr=1e-2, alpha=0.99, eps=1e-8):
        super(RMSprop, self).__init__(params, lr)
        self.alpha = alpha
        self.eps = eps

    def step(self):
        for i in range(len(self.params)):
            # V_t = alpha * V_{t-1} + (1 - alpha) * g_t * g_t
            self.V[i] = self.alpha * self.V[i] + (1 - self.alpha) * (self.params[i].grad * self.params[i].grad)
            self.params[i].data -= self.lr * self.params[i].grad / (np.sqrt(self.V[i]) + self.eps)


class Adam(Optimizer):

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_delay=0):
        super(Adam, self).__init__(params, lr)
        self.betas = betas
        self.eps = eps
        # beta0^t and beta1^t for bias correction (t = current step)
        self.beta0_bias_correction = self.betas[0]
        self.beta1_bias_correction = self.betas[1]
        self.weight_delay = weight_delay  # weight decay coefficient

    def step(self):
        for i in range(len(self.params)):
            self.m[i] = self.betas[0] * self.m[i] + (1-self.betas[0]) * self.params[i].grad
            # Bias-corrected first moment; self.m keeps the raw running average
            m_hat = self.m[i] / (1 - self.beta0_bias_correction)
            self.V[i] = self.betas[1] * self.V[i] + (1-self.betas[1]) * (self.params[i].grad * self.params[i].grad)
            # Writing it directly like this seems prone to overflow:
            # self.V[i] = self.V[i] / (1 - self.beta1_bias_correction)
            # self.params[i].data = (1 - self.weight_delay) * self.params[i].data - self.lr * self.m[i] \
            #                       / (np.sqrt(self.V[i]) + self.eps)
            # Same update up to the placement of eps: sqrt(V_hat) = sqrt(V) / sqrt(1 - beta1^t)
            self.params[i].data = (1 - self.weight_delay) * self.params[i].data - self.lr * m_hat * \
                np.sqrt(1 - self.beta1_bias_correction) / (np.sqrt(self.V[i]) + self.eps)
        # Advance t once per step, not once per parameter
        self.beta0_bias_correction *= self.betas[0]
        self.beta1_bias_correction *= self.betas[1]

--------------------------------------------------------------------------------
/test/test_tensor.py:
--------------------------------------------------------------------------------
import sys
sys.path.append('..')

import unittest
from easytorch.tensor import Tensor
from torch import tensor as torchTensor


def is_tensor_equal(leaves1, leaves2):
    ret = True
    for t1, t2 in zip(leaves1, leaves2):
        # Compare absolute differences; otherwise any negative difference would pass
        val_eq = (abs(t1.data - t2.detach().numpy()) < 1e-4).all()
        grad_eq = (abs(t1.grad - t2.grad.detach().numpy()) < 1e-4).all()
        requires_grad_eq = (t1.requires_grad == t2.requires_grad)
        ret = ret and val_eq and grad_eq and requires_grad_eq
    return ret


def print_leaves(leaves):
    print('-------------------------')
    for leaf in leaves:
        print(leaf.grad)
    print('-------------------------')


class TestTensor(unittest.TestCase):

    def run_test_case(self, case):
        leaves1 = case(Tensor)
        leaves2 = case(torchTensor)
        self.assertTrue(is_tensor_equal(leaves1, leaves2))

    def test_ops(self):
        def case_add(tensor):
            a = tensor([1., 2.], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            c = tensor([[5., 6.], [7., 8.]], requires_grad=True)
            leaves = [a, b, c]
            d = a + b + 10  # elementwise addition of same-shaped tensors, and scalar addition
            d = d + c + c  # broadcasting, and both operands being the same object
            d = d.mean()
            d.backward()
            leaves = list(filter(lambda x: x.grad is not None, leaves))

            return leaves
        self.run_test_case(case_add)

        def case_sub(tensor):
            a = tensor([1., 2.], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            c = tensor([[5., 6.], [7., 8.]], requires_grad=True)
            leaves = [a, b, c]
            d = a - b - 10  # elementwise subtraction of same-shaped tensors, and scalar subtraction
            d = d - c - c  # broadcasting, and both operands being the same object
            d = 100 - d
            d = d.sum()
            d.backward()
            leaves = list(filter(lambda x: x.grad is not None, leaves))

            return leaves
        self.run_test_case(case_sub)

        def case_mul(tensor):
            a = tensor([1., 2.], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            c = tensor([[5., 6.], [7., 8.]], requires_grad=True)
            leaves = [a, b, c]
            d = a * a
            d = d * c * c + a
            d = d.sum()
            d.backward()
            leaves = list(filter(lambda x: x.grad is not None, leaves))

            return leaves
        self.run_test_case(case_mul)

        def case1(tensor):
            a = tensor([1., 2.], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            c = tensor([[5., 6.], [7., 8.]], requires_grad=True)
            leaves = [a, b, c]

            d = 3*a + b + 1
            d = d * b
            d = d + 5*c / 20
            d = d.mean()
            d.backward()
            leaves = list(filter(lambda x: x.grad is not None, leaves))
            # print_leaves(leaves)

            return leaves
        self.run_test_case(case1)

    def test_dot(self):
        def case1(tensor):
            a = tensor([[1., 2.], [3., 4.]], requires_grad=True)
            b = tensor([[5., 6., 7.], [8., 9., 10.]], requires_grad=True)
            leaves = [a, b]
            c = a @ b
            c = c.sum()
            c.backward()
            return leaves
        self.run_test_case(case1)

        def case2(tensor):
            a = tensor([[1., 2.]], requires_grad=True)
            b = tensor([[5.], [8.]], requires_grad=True)
            leaves = [a, b]
            c = a @ b
            c = c.sum()
            c.backward()
            return leaves
        self.run_test_case(case2)

        def case3(tensor):
            a = tensor([1., 2.], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            leaves = [a, b]
            c = a @ b + b
            c = c.sum()
            c.backward()
            return leaves
        self.run_test_case(case3)

        def case4(tensor):
            a = tensor([1.], requires_grad=True)
            b = tensor([3.], requires_grad=True)
            leaves = [a, b]
            c = a @ b + b
            c = c.sum()
            c.backward()
            return leaves
        self.run_test_case(case4)

    def test_reshape(self):
        a = Tensor([1, 2])
        b = a.reshape(2, 1)
        a[0] = 10
        # reshape must return a view that shares data with the original
        self.assertEqual(a.data[0], b.data[0][0])

        def case1(tensor):
            a = tensor([[1.], [2.]], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            leaves = [a, b]
            c = (2*a + 10).reshape(2)
            d = b * (c + 10)
            d = d.sum()
            d.backward()
            return leaves
        self.run_test_case(case1)

    def test_activation_func(self):
        def case_tanh(tensor):
            a = tensor([1., 2.], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            c = tensor([[5., 6.], [7., 8.]], requires_grad=True)
            leaves = [a, b, c]

            d = 3 * a + b + 1
            d = (d * b).tanh()
            d = (d + 5 * c / 20).tanh()
            d = d.mean()
            d.backward()
            leaves = list(filter(lambda x: x.grad is not None, leaves))

            return leaves
        self.run_test_case(case_tanh)

        def case_relu(tensor):
            a = tensor([1., 2.], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            c = tensor([[5., 6.], [7., 8.]], requires_grad=True)
            leaves = [a, b, c]

            d = 3 * a + b + 1
            d = (d * b).relu()
            d = (d + 5 * c / 20).relu()
            d = d.mean()
            d.backward()
            leaves = list(filter(lambda x: x.grad is not None, leaves))

            return leaves
        self.run_test_case(case_relu)

    def test_pow(self):
        def case(tensor):
            a = tensor([1., 2.], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            c = tensor([[5., 6.], [7., 8.]], requires_grad=True)
            leaves = [a, b, c]

            d = 3 * a.pow(5) + b + 1
            d = (d * b).pow(2).tanh()
            d = (d + 5 * c / 20).relu()
            d = d.mean()
            d.backward()
            leaves = list(filter(lambda x: x.grad is not None, leaves))

            return leaves
        self.run_test_case(case)

    def test_select(self):
        def case(tensor):
            a = tensor([1., 2.], requires_grad=True)
            b = tensor([3., 4.], requires_grad=True)
            c = tensor([[5., 6.], [7., 8.]], requires_grad=True)
            leaves = [a, b, c]
            d = a + b + c[0]
            d = d.mean()
            d.backward()
            leaves = list(filter(lambda x: x.grad is not None, leaves))

            return leaves
        self.run_test_case(case)

    # def test_softmax(self):
    #     def case(tensor):
    #         a = tensor([1., 2., 3.], requires_grad=True)
    #         leaves = [a]
    #         b = a.softmax(dim=0)
    #         b = b.mean()
    #         print('b', b)
    #         b.backward()
    #         leaves = list(filter(lambda x: x.grad is not None, leaves))
    #         print_leaves(leaves)
    #
    #         return leaves
    #
    #     self.run_test_case(case)

    def test_abs(self):
        def case(tensor):
            a = tensor([1., 2., 0., -10, -20], requires_grad=True)
            b = tensor([-2., 4., 0., 0, -20], requires_grad=True)
            leaves = [a, b]
            c = (a + b).abs()
            c = c.sum()
            c.backward()
            leaves = list(filter(lambda x: x.grad is not None, leaves))

            return leaves
        self.run_test_case(case)


if __name__ == '__main__':
    unittest.main()

--------------------------------------------------------------------------------
/easytorch/tensor.py:
--------------------------------------------------------------------------------
import numpy as np
from collections import namedtuple
import easytorch.functional as F


GRAD_NODE_FMT = namedtuple('grad_node', ['tensor', 'grad_fn'])


class Tensor:

    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data)
        self.requires_grad = requires_grad
        # np.int was removed in NumPy 1.24; check for any integer dtype instead
        if np.issubdtype(self.data.dtype, np.integer) and self.requires_grad:
            raise RuntimeError('Only Tensors of floating point and complex dtype can require gradients')
        self.grad_node = []
        self.grad = None
        self.is_leaf = True

    def __repr__(self):
        s = 'tensor({}'.format(self.data)
        if self.grad_node:
            s += ', grad_fn=<{}>)'.format(self.grad_node[0].grad_fn.__name__)
        elif self.requires_grad:
            s += ', requires_grad=True)'
        else:
            s += ')'
        return s

    def __getitem__(self, item):
        data = self.data[item]
        requires_grad = self.requires_grad
        t = Tensor(data, requires_grad)
        t.is_leaf = False

        if self.requires_grad:
            def SelectBackward(grad):
                # Scatter the gradient back to the selected positions
                next_grad = np.zeros_like(self.data)
                next_grad[item] = grad
                return next_grad
            t.grad_node.append(GRAD_NODE_FMT(self, SelectBackward))

        return t

    def __setitem__(self, key, value):
        self.data[key] = value

    def __len__(self):
        return len(self.data)

    @property
    def shape(self):
        return self.data.shape

    @property
    def T(self):
        raise NotImplementedError('Transpose is not implemented')

    def reshape(self, *shape):
        old_shape = self.data.shape
        t = Tensor(self.data.reshape(shape), self.requires_grad)
        t.is_leaf = False

        if self.requires_grad:
            def ViewBackward(grad):
                grad = grad.reshape(old_shape)
                return grad
            t.grad_node.append(GRAD_NODE_FMT(self, ViewBackward))

        return t

    def backward(self, gradient=None):
        if not self.requires_grad:
            raise RuntimeError('tensor does not require grad')
        if self.grad is None:
            if self.data.shape == () or self.data.shape == (1, ):
                self.grad = np.ones(1)
            else:
                print(self.data.shape)
                raise RuntimeError('grad can be implicitly created only for scalar outputs')

        for node in self.grad_node:
            if node.tensor.grad is None:
                node.tensor.grad = node.grad_fn(self.grad)
            else:
                node.tensor.grad += node.grad_fn(self.grad)
            node.tensor.backward()
            if not node.tensor.is_leaf:
                node.tensor.grad = None

    def __add__(self, other):
        other = Tensor.astensor(other)
        data = self.data + other.data
        requires_grad = self.requires_grad or other.requires_grad
        t = Tensor(data, requires_grad)
        t.is_leaf = False

        if self.requires_grad:
            def AddBackward(grad):
                grad = grad * np.ones_like(self.data)
                # Sum over the leading dimensions added by broadcasting
                for _ in range(grad.ndim - self.data.ndim):
                    grad = grad.sum(axis=0)
                # Sum over the dimensions broadcast from size 1
                for i, d in enumerate(self.data.shape):
                    if d == 1:
                        grad = grad.sum(axis=i, keepdims=True)

                assert grad.shape == self.data.shape, 'AddBackward, grad.shape != data.shape'
                return grad
            t.grad_node.append(GRAD_NODE_FMT(self, AddBackward))

        if other.requires_grad:
            def AddBackward(grad):
                grad = grad * np.ones_like(other.data)
                for _ in range(grad.ndim - other.data.ndim):
                    grad = grad.sum(axis=0)

                for i, d in enumerate(other.data.shape):
                    if d == 1:
                        grad = grad.sum(axis=i, keepdims=True)

                assert grad.shape == other.data.shape, 'AddBackward, grad.shape != data.shape'
                return grad

            t.grad_node.append(GRAD_NODE_FMT(other, AddBackward))

        return t

    def __radd__(self, other):
        return self + other

    def __iadd__(self, other):
        return self + other

    def __sub__(self, other):
        # TODO: rewrite sub; currently the grad_fn recorded for sub is AddBackward
        return self + (-other)

    def __rsub__(self, other):
        return other + (-self)

    def __isub__(self, other):
        return self - other

    def __neg__(self):
        data = - self.data
        requires_grad = self.requires_grad
        t = Tensor(data, requires_grad)
        t.is_leaf = False

        if requires_grad:
            def NegBackward(grad):
                return -grad
            t.grad_node.append(GRAD_NODE_FMT(self, NegBackward))

        return t

    def __mul__(self, other):
        other = Tensor.astensor(other)
        data = self.data * other.data
        requires_grad = self.requires_grad or other.requires_grad
        t = Tensor(data, requires_grad)
        t.is_leaf = False

        # Record a node for self only if self itself requires grad
        if self.requires_grad:
            def MulBackward(grad):
                grad = grad * other.data

                for _ in range(grad.ndim - self.data.ndim):
                    grad = grad.sum(0)
                for i, d in enumerate(self.data.shape):
                    if d == 1:
                        grad = grad.sum(axis=i, keepdims=True)

                assert grad.shape == self.data.shape, 'MulBackward, grad.shape != data.shape'
                return grad
            t.grad_node.append(GRAD_NODE_FMT(self, MulBackward))

        if other.requires_grad:
            def MulBackward(grad):
                grad = grad * self.data

                for _ in range(grad.ndim - other.data.ndim):
                    grad = grad.sum(0)
                # Reduce over other's size-1 dimensions
                for i, d in enumerate(other.data.shape):
                    if d == 1:
                        grad = grad.sum(axis=i, keepdims=True)

                assert grad.shape == other.data.shape, 'MulBackward, grad.shape != data.shape'
                return grad
            t.grad_node.append(GRAD_NODE_FMT(other, MulBackward))

        return t

    def __rmul__(self, other):
        return self * other

    def __imul__(self, other):
        return self * other

    def __truediv__(self, other):
        other = Tensor.astensor(other)
        data = self.data / other.data
        requires_grad = self.requires_grad or other.requires_grad
        t = Tensor(data, requires_grad)
        t.is_leaf = False

        if self.requires_grad:
            def DivBackward(grad):
                grad = grad / other.data

                for _ in range(grad.ndim - self.data.ndim):
                    grad = grad.sum(0)
                for i, d in enumerate(self.data.shape):
                    if d == 1:
                        grad = grad.sum(axis=i, keepdims=True)

                assert grad.shape == self.data.shape, 'DivBackward, grad.shape != data.shape'
                return grad
            t.grad_node.append(GRAD_NODE_FMT(self, DivBackward))

        if other.requires_grad:
            def DivBackward(grad):
                # d(X / Y) = -X * dY / Y^2
                grad = - (self.data * grad) / (other.data**2)

                for _ in range(grad.ndim - other.data.ndim):
                    grad = grad.sum(0)
                for i, d in enumerate(other.shape):
                    if d == 1:
                        grad = grad.sum(axis=i, keepdims=True)

                assert grad.shape == other.data.shape, 'DivBackward, grad.shape != data.shape'
                return grad
            t.grad_node.append(GRAD_NODE_FMT(other, DivBackward))

        return t

    def __floordiv__(self, other):
        raise NotImplementedError('__floordiv__ not implemented')

    def sum(self, dim=None, keepdim=False):
        data = self.data.sum(axis=dim, keepdims=keepdim)
        requires_grad = self.requires_grad
        t = Tensor(data, requires_grad)
        t.is_leaf = False

        if self.requires_grad:
            def SumBackward(grad):
                grad = grad * np.ones_like(self.data)
                return grad
            t.grad_node.append(GRAD_NODE_FMT(self, SumBackward))

        return t

    def mean(self, dim=None, keepdim=False):
        data = self.data.mean(axis=dim, keepdims=keepdim)
        requires_grad = self.requires_grad
        t = Tensor(data, requires_grad)
        t.is_leaf = False

        if self.requires_grad:
            def MeanBackward(grad):
                # Divide by the number of elements averaged into each output element
                grad = grad * np.ones_like(self.data) / (self.data.reshape(-1).shape[0] / data.reshape(-1).shape[0])
                return grad
            t.grad_node.append(GRAD_NODE_FMT(self, MeanBackward))

        return t

    def __matmul__(self, other):
        other = Tensor.astensor(other)

        # Promote 1-D operands to a row vector (left) or a column vector (right)
        if self.data.ndim == 1:
            self = self.reshape(1, -1)
        if other.data.ndim == 1:
            other = other.reshape(-1, 1)

        data = self.data @ other.data
        requires_grad = self.requires_grad or other.requires_grad
        t = Tensor(data, requires_grad)
        t.is_leaf = False

        if self.requires_grad:
            def DotBackward(grad):
283 | # d = other.data.reshape(-1, 1).T if other.data.ndim == 1 else other.data.T 284 | grad = grad @ other.data.T 285 | assert grad.shape == self.data.shape, 'DotBackward, grad.shape != data.shape' 286 | return grad 287 | t.grad_node.append(GRAD_NODE_FMT(self, DotBackward)) 288 | 289 | if other.requires_grad: 290 | def DotBackward(grad): 291 | # d = self.data.reshape(-1, 1) if other.data.ndim == 1 else self.data.T 292 | grad = self.data.T @ grad 293 | assert grad.shape == other.data.shape, 'DotBackward, grad.shape != data.shape' 294 | return grad 295 | t.grad_node.append(GRAD_NODE_FMT(other, DotBackward)) 296 | 297 | return t 298 | 299 | def tanh(self): 300 | return F.tanh(self) 301 | 302 | def relu(self): 303 | return F.relu(self) 304 | 305 | def pow(self, n): 306 | data = np.power(self.data, n) 307 | requires_grad = self.requires_grad 308 | t = Tensor(data, requires_grad) 309 | t.is_leaf = False 310 | 311 | if self.requires_grad: 312 | def PowBackward(grad): 313 | return grad * (n * np.power(self.data, n-1)) 314 | t.grad_node.append(GRAD_NODE_FMT(self, PowBackward)) 315 | 316 | return t 317 | 318 | def softmax(self, dim=0): 319 | return F.softmax(self, dim) 320 | 321 | def abs(self): 322 | data = np.abs(self.data) 323 | requires_grad = self.requires_grad 324 | t = Tensor(data, requires_grad) 325 | t.is_leaf = False 326 | 327 | if self.requires_grad: 328 | def AbsBackward(grad): 329 | assert grad.shape == self.data.shape, 'AbsBackward, grad.shape != data.shape' 330 | return grad * np.sign(self.data) 331 | t.grad_node.append(GRAD_NODE_FMT(self, AbsBackward)) 332 | 333 | return t 334 | 335 | @staticmethod 336 | def astensor(data): 337 | if not isinstance(data, Tensor): 338 | data = Tensor(data) 339 | return data 340 | 341 | 342 | def random(*shape, requires_grad=True): 343 | return Tensor(np.random.rand(*shape), requires_grad) 344 | -------------------------------------------------------------------------------- /example/Predict.ipynb: 
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "chemical-wrong",
   "metadata": {},
   "source": [
    "# Predict\n",
    "\n",
    "Predict Boston housing prices with a neural network"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "urban-minutes",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.append('..')\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.datasets import load_boston\n",
    "from tqdm import tqdm\n",
    "from easytorch.layer import Linear, ReLU, Tanh, Sequential\n",
    "from easytorch.optim import SGD\n",
    "from easytorch.tensor import Tensor\n",
    "import easytorch.functional as F"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "published-ridge",
   "metadata": {},
   "source": [
    "## 1. Load the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "wrong-player",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((506, 13), (506,))"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset = load_boston()\n",
    "data_x = dataset.data\n",
    "data_y = dataset.target\n",
    "data_name = dataset.feature_names\n",
    "data_x = (data_x - data_x.mean(axis=0)) / (data_x.std(axis=0) + 1e-6)\n",
    "data_x.shape, data_y.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "affecting-malpractice",
   "metadata": {},
   "outputs": [],
   "source": [
    "train_x = Tensor(data_x)\n",
    "train_y = Tensor(data_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "absolute-edinburgh",
   "metadata": {},
   "source": [
    "## 2. Build and train the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "agricultural-orleans",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential(\n",
    "    Linear(13, 10),\n",
    "    ReLU(),\n",
    "    Linear(10, 1)\n",
    ")\n",
    "opt = SGD(model.parameters(), lr=3e-4)\n",
    "loss_fn = F.l1_loss\n",
    "loss_list = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "german-seven",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 500/500 [00:44<00:00, 11.21it/s]\n"
     ]
    }
   ],
   "source": [
    "for _ in tqdm(range(500)):\n",
    "    sum_loss = 0\n",
    "    for x, y in zip(train_x, train_y):\n",
    "        pred = model(x)\n",
    "        loss = loss_fn(pred, y.reshape(1, 1))\n",
    "        sum_loss += loss.data\n",
    "        opt.zero_grad()\n",
    "        loss.backward()\n",
    "        opt.step()\n",
    "    loss_list.append(sum_loss / len(train_x))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "typical-trigger",
   "metadata": {},
   "source": [
    "## 3. Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "popular-argentina",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       ""
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot(loss_list)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "comic-salem",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(1.98941591887784, grad_fn=)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pred = model(train_x)\n",
    "loss = loss_fn(pred, train_y.reshape(-1, 1)).mean()\n",
    "loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "interior-virgin",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

--------------------------------------------------------------------------------
/example/FunctionApproximation.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "alternative-oxide",
   "metadata": {},
   "source": [
    "# Function Approximation\n",
    "\n",
    "Approximate a trigonometric function with a single-hidden-layer neural network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "beneficial-footwear",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.append('..')\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from tqdm import tqdm\n",
    "from easytorch.layer import Linear, ReLU, Tanh, Sequential\n",
    "from easytorch.optim import SGD\n",
    "from easytorch.tensor import Tensor\n",
    "import easytorch.functional as F"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "split-soldier",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_data(n=100, f=lambda x: 2*np.sin(x) + np.cos(x)):\n",
    "    data = []\n",
    "    for _ in range(n):\n",
    "        x = np.random.uniform(-3, 3)\n",
    "        y = f(x) + 0.03 * np.random.randn()\n",
    "        data.append([x, y])\n",
    "    return data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "alpha-thanksgiving",
   "metadata": {},
   "outputs": [],
   "source": [
    "train_data = generate_data()\n",
    "x = Tensor(np.array([x for x, y in train_data]).reshape(-1, 1))\n",
    "y = Tensor(np.array([y for x, y in train_data]).reshape(-1, 1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "saved-breakdown",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential(\n",
    "    Linear(1, 5),\n",
    "    Tanh(),\n",
    "    Linear(5, 1)\n",
    ")\n",
    "opt = SGD(model.parameters(), lr=3e-3)\n",
    "loss_fn = F.mse_loss\n",
    "loss_list = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "million-immunology",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 700/700 [00:16<00:00, 42.21it/s]\n"
     ]
    }
   ],
   "source": [
    "for epoch in tqdm(range(700)):\n",
    "    for data_x, data_y in zip(x, y):\n",
    "        pred = model(data_x)\n",
    "        loss = loss_fn(pred, data_y.reshape(-1, 1))\n",
    "        opt.zero_grad()\n",
    "        loss.backward()\n",
    "        opt.step()\n",
    "    loss_list.append(loss.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "informative-characteristic",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       ""
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# plt.plot(loss_list)\n",
    "# plt.show()\n",
    "\n",
    "plt.scatter(x.data, y.data, label='true data')\n",
    "plt.scatter(x.data, model(x).data, label='pred data')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "serious-compiler",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

--------------------------------------------------------------------------------
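The backward closures in `tensor.py` (`DivBackward`, `SumBackward`, `MeanBackward` and the two `DotBackward`s) hard-code gradient formulas that were derived by hand. Comparing them against central finite differences is a cheap way to confirm those formulas. Below is a minimal sketch in plain NumPy. It does not import easytorch, so it checks the formulas themselves rather than the library. `unbroadcast` and `numerical_grad` are hypothetical helper names introduced here. The step `eps=1e-6` and the tolerance `atol=1e-6` are assumed values.

```python
import numpy as np


def unbroadcast(grad, shape):
    """Sum an upstream gradient back down to `shape`.

    Same reduction as in DivBackward: drop extra leading axes by summing,
    then sum (keeping dims) over every axis where the operand had size 1.
    """
    for _ in range(grad.ndim - len(shape)):
        grad = grad.sum(0)
    for i, d in enumerate(shape):
        if d == 1:
            grad = grad.sum(axis=i, keepdims=True)
    return grad


def numerical_grad(f, x, eps=1e-6):
    """Central finite differences of a scalar function f at array x."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore the entry before moving on
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g


rng = np.random.default_rng(0)
a = rng.uniform(1.0, 2.0, size=(3, 4))
b = rng.uniform(1.0, 2.0, size=(1, 4))  # broadcast along axis 0 of a
up = rng.normal(size=(3, 4))            # arbitrary upstream gradient

# Division a / b w.r.t. b: -a * up / b**2, reduced back to b's shape.
div_analytic = unbroadcast(-(a * up) / b ** 2, b.shape)
div_numeric = numerical_grad(lambda bb: np.sum(up * (a / bb)), b)

# Sum over axis 1 without keepdims: the upstream gradient has shape (3,)
# and is re-expanded to (3, 1) so it broadcasts along the summed axis.
up_s = rng.normal(size=3)
sum_analytic = np.expand_dims(up_s, 1) * np.ones_like(a)
sum_numeric = numerical_grad(lambda aa: np.sum(up_s * aa.sum(axis=1)), a)

# Mean over all elements: every input receives 1 / n of the gradient.
mean_analytic = np.ones_like(a) / a.size
mean_numeric = numerical_grad(lambda aa: aa.mean(), a)

# Matrix product C = A @ B: dL/dA = up_c @ B.T and dL/dB = A.T @ up_c.
A = rng.normal(size=(3, 5))
B = rng.normal(size=(5, 2))
up_c = rng.normal(size=(3, 2))
dA_numeric = numerical_grad(lambda AA: np.sum(up_c * (AA @ B)), A)
dB_numeric = numerical_grad(lambda BB: np.sum(up_c * (A @ BB)), B)

checks = {
    "div (broadcast b)": (div_analytic, div_numeric),
    "sum over axis 1": (sum_analytic, sum_numeric),
    "mean": (mean_analytic, mean_numeric),
    "matmul dA": (up_c @ B.T, dA_numeric),
    "matmul dB": (A.T @ up_c, dB_numeric),
}
for name, (analytic, numeric) in checks.items():
    print(f"{name}: {np.allclose(analytic, numeric, atol=1e-6)}")
```

Every check should print `True`. A mismatch would point at the corresponding backward formula or at its broadcasting reduction. The sum case shows why the reduced axis must be restored before broadcasting. Without that step, a `(3,)` gradient would be aligned with the trailing axis of a `(3, 4)` input.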