├── LICENSE
├── README.md
├── RLS_Neural_Network.py
├── RLS_sequential_data.py
├── images
│   ├── comparision.jpg
│   └── rls_figure.jpg
├── least-squares-regression.py
├── performance_comparision.cpp
└── tutorial
    ├── RLS_text_prediction.py
    ├── Regression_step_by_step.ipynb
    ├── boston_housing_example.py
    ├── iris_dataset_example.py
    └── mnist_dataset_example.py

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 Hunar Ahmad

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
![GitHub repo size](https://img.shields.io/github/repo-size/hunar4321/RLS_Learning)
![GitHub](https://img.shields.io/github/license/hunar4321/RLS_Learning)

# 1. Recursive Least Squares by predicting errors
This is a simple, intuitive method for solving linear equations using recursive least squares.

Check out the step-by-step video tutorial here: https://youtu.be/4vGaN1dTVhw

------------

#### Illustration - RLS Error Prediction:

![](images/rls_figure.jpg)

#### Comparison of how errors are shared among the inputs in gradient-based methods vs. RLS-based methods
![](images/comparision.jpg)

Algorithmically, this method is faster than matrix inversion because it requires fewer operations. In practice, however, it is hard to compare it fairly with the established linear solvers, because matrix operations benefit from many optimization tricks at the hardware level. We added a simple C++ implementation using the Eigen library to compare the performance of this method with the matrix-inversion approach.

Inspired by the following post by whuber: https://stats.stackexchange.com/q/166718

# 2. Fast Learning in Neural Networks (Real-time optimization)
-----------------------------------
There is an example at the end of *RLS_Neural_Network.py* which shows how this network can learn XOR-like data in a single pass over the samples. Run the code and see the output.

**Advantages of using RLS for learning instead of gradient descent**
1. Fast learning and sample efficiency (can learn in one shot).
2. Online learning (suitable for real-time learning).
3. No worries about local minima.

**Disadvantages:**
1. Computationally inefficient if the input size is large (quadratic complexity).
2. Sensitive to overflow and underflow, which can lead to instability in some cases.
3. The current implementation uses a neural network with a single hidden layer. It is not clear whether adding more layers would help, since learning only happens in the last layer.
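#### Minimal sketch of the update
Below is a minimal sketch (not the repository code itself) of the error-prediction update illustrated above, assuming a noiseless linear system; the names `w_true`, `P`, and `share` are illustrative only. After `M` linearly independent samples the recovered weights match the true ones exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 5                           # number of inputs / weights
w_true = rng.normal(size=M)     # ground-truth weights of a noiseless linear system

P = np.eye(M)                   # tracks the directions of input space not yet explained
w = np.zeros(M)                 # weights being estimated

for _ in range(M):              # one pass over M independent samples is enough here
    x = rng.normal(size=M)
    y = w_true @ x
    e = y - w @ x               # prediction error for this sample
    Px = P @ x
    share = Px / (x @ Px)       # how the error is shared among the inputs
    w += share * e              # correct the weights
    P -= np.outer(share, Px)    # remove the explained direction from P

print(np.allclose(w, w_true))   # -> True
```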
--------------------------------------------------------------------------------
/RLS_Neural_Network.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 4 00:51:38 2020

Author: Hunar Ahmad @ Brainxyz
"""

import numpy as np


class RlsNode:
    """
    Recursive Least Squares estimation (this is the update part of the Kalman filter)
    """
    def __init__(self, _m, _name):
        self.name = _name
        self.M = _m
        self.w = np.random.rand(1, _m)
        self.P = np.eye(_m)

    def RlsPredict(self, v):
        return np.dot(self.w, v)

    def RlsLearn(self, v, e):
        pv = np.dot(self.P, v)
        vv = np.dot(pv.T, v)
        eMod = pv / vv
        ee = eMod * e
        self.w = self.w + ee
        outer = np.outer(eMod, pv)
        self.P = self.P - outer


class Net:
    """
    Neural network (single hidden layer where the weights of the first layer (iWh) are randomly initialized)
    """
    def __init__(self, _input_size, _neurons):
        self.input_size = _input_size
        self.neurons = _neurons
        self.iWh = (np.random.rand(_neurons, _input_size) - 0.5) * 1
        self.nodes = []

    def CreateOutputNode(self, _name):
        nn = RlsNode(self.neurons, _name)
        self.nodes.append(nn)

    def sigmoid(self, x):
        return 1 / (1 + np.e ** -x)

    def FeedForwardL1(self, v):
        vout = np.dot(self.iWh, v)
        # tout = np.tanh(vout)
        tout = self.sigmoid(vout)
        return tout + 0.00000001  ## adding a small value to avoid division by zero in the upcoming computations!

    ### RLS layer (trainable weights using the RLS algorithm)
    def FeedForwardL2(self, tout):
        yhats = []
        for i in range(len(self.nodes)):
            p = self.nodes[i].RlsPredict(tout)
            yhats.append(p[0])
        return np.asarray(yhats)

    ### Error evaluation
    def Evaluate(self, ys, yhats):
        errs = ys - yhats
        return errs

    def Learn(self, acts, errs):
        for i in range(len(self.nodes)):
            self.nodes[i].RlsLearn(acts, errs[i])  # change to errs[0][i] if an indexing error happens


# #### Example Usage ###

x = [[1, 1], [0, 0], [1, 0], [0, 1]]  ## input data
y = [1, 1, 0, 0]                      ## output data (targets)

## configuring the network
n_input = 2
n_neurons = 5
n_output = 1
net = Net(n_input, n_neurons)
for i in range(n_output):
    net.CreateOutputNode(i)

## training
N = len(x)  ## you only need one iteration over the samples to learn (RLS is a near one-shot learner!)
for i in range(N):
    inputt = x[i][:]
    L1 = net.FeedForwardL1(inputt)
    yhats = net.FeedForwardL2(L1)
    errs = net.Evaluate(y[i], yhats)
    net.Learn(L1, errs)

## evaluate after learning
yh = []
for i in range(N):
    inputt = x[i][:]
    L1 = net.FeedForwardL1(inputt)
    yhats = net.FeedForwardL2(L1)
    print("input", inputt, "predicted output:", yhats[0])
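# Note on RlsLearn (in standard recursive-least-squares notation): with hidden
# activity vector v, scalar output error e, and inverse-correlation estimate P,
# the update above is
#     k = P v / (v' P v),    w <- w + e * k',    P <- P - k (P v)'
# so the error is shared among the hidden units in proportion to P v, exactly
# as sketched in the README illustration.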
--------------------------------------------------------------------------------
/RLS_sequential_data.py:
--------------------------------------------------------------------------------
"""
Lattice Recursive Least Squares for predicting the next elements in sequential data
Video Tutorial: https://youtu.be/4vGaN1dTVhw
"""

import numpy as np
import matplotlib.pylab as plt

# generating time series data
N = 200
t = np.arange(0.0001, 1.001, 0.001)
data = 2 * np.sin(2 * np.pi * 50 * t)
ys = data[:N]
for i in range(1, N, 2):
    ys[i] = 1

# ys[70] = -1
# ys[150] = -1

M = 20  # looking back M steps to predict the signal
b = np.zeros(M)
t = np.zeros(M)
f = np.zeros(M)
wf = np.zeros(M)
wb = np.zeros(M)
fb = np.zeros(M)
ff = np.zeros(M) + 0.000001
bb = np.zeros(M) + 0.000001
e = np.zeros(N)

pred = np.zeros(N)
act = np.zeros((M, N))
for n in range(N - 1):

    pred[n] = wf @ b

    f[0] = ys[n]
    t[0] = ys[n]

    for m in range(M - 1):

        ff[m] += f[m] * f[m]
        bb[m] += b[m] * b[m]
        fb[m] += f[m] * b[m]

        wb[m] = fb[m] / ff[m]
        wf[m] = fb[m] / bb[m]

        f[m + 1] = f[m] - wf[m] * b[m]
        t[m + 1] = b[m] - wb[m] * f[m]

    b = t.copy()

    e[n] = f[-1]
    act[:, n] = f

plt.figure(1)
plt.plot(ys)
plt.plot(pred)
plt.legend(["truth", "prediction"])

plt.figure(2)
plt.title("Error")
plt.plot(e)

plt.figure(4)
plt.title("Activity")
plt.imshow(np.flipud(np.abs(act)))

###### generating data ############

# generated_data = np.zeros(N)
# for n in range(N - 2):

#     generated_data[n] = wf @ b

#     f[0] = generated_data[n]
#     t[0] = generated_data[n]

#     for m in range(M-1):

#         f[m + 1] = f[m] - wf[m] * b[m]
#         t[m + 1] = b[m] - wb[m] * f[m]

#     b = t.copy()

# plt.figure(3)
# plt.title("generated_data")
# plt.plot(generated_data)
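# Note on the main loop above (lattice form of RLS): at each stage m the forward
# and backward prediction errors are updated as
#     f[m+1] = f[m] - wf[m] * b[m]     (forward error)
#     t[m+1] = b[m] - wb[m] * f[m]     (new backward error; b holds the previous step's values)
# where wf[m] = fb[m] / bb[m] and wb[m] = fb[m] / ff[m] are reflection
# coefficients estimated online from the running correlations ff, bb and fb.
# The next-sample prediction pred[n] = wf @ b is the sum of the stage-wise
# corrections, and e[n] = f[-1] is the part of the signal left unpredicted.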
--------------------------------------------------------------------------------
/images/comparision.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hunar4321/RLS-neural-net/1e43e64bebd916f94a8787176c4cc9606f16dc36/images/comparision.jpg
--------------------------------------------------------------------------------
/images/rls_figure.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hunar4321/RLS-neural-net/1e43e64bebd916f94a8787176c4cc9606f16dc36/images/rls_figure.jpg
--------------------------------------------------------------------------------
/least-squares-regression.py:
--------------------------------------------------------------------------------
"""
Multiple Regression Using Recursive Error prediction.
Author: Hunar Ahmad @ Brainxyz
Video tutorial: https://youtu.be/4vGaN1dTVhw
"""
import numpy as np
import matplotlib.pylab as plt

N = 100  # number of the subjects
M = 10   # number of the variables
xs = np.random.randn(M, N)
ws = np.random.randn(M)
ys = ws @ xs

x = xs.copy()  # make copies of xs & ys before modifying them
y = ys.copy()
wy = np.zeros(M)
sx = np.zeros(M)
for i in range(M):
    sx[i] = np.sum(x[i]**2)
    for j in range(i+1, M):
        wx = np.sum(x[i] * x[j]) / sx[i]
        x[j] -= wx * x[i]

for i in range(M-1, -1, -1):
    wy[i] = np.sum(y * x[i]) / sx[i]
    y -= wy[i] * xs[i]

yh = wy @ xs  # prediction
plt.title("Prediction")
plt.plot(ys)
plt.plot(yh)
plt.legend(["truth", "prediction"])

print("Solutions:", wy)
print("------------------")
print("True weights:", ws)
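# A quick sanity check (a sketch, meant to be appended to the script above):
# the recovered weights wy should match an ordinary least-squares solve on the
# same data, since ys was generated without noise.
w_ref, _, _, _ = np.linalg.lstsq(xs.T, ys, rcond=None)
print("matches lstsq:", np.allclose(wy, w_ref))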
--------------------------------------------------------------------------------
/performance_comparision.cpp:
--------------------------------------------------------------------------------
// this is a simple C++ script to compare the raw performance of our RLS method versus the conventional matrix inversion method
#include <iostream>
#include <Eigen/Dense>
#include <chrono>

int main() {

    std::cout << "Performance Comparison:" << std::endl;
    std::cout << "----------------------" << std::endl;

    int N = 10000; // number of the subjects
    int M = 100;   // number of the variables

    srand((unsigned int)time(0));

    // Generate data where ys = ws @ xs
    Eigen::MatrixXd xs = Eigen::MatrixXd::Random(M, N);
    Eigen::VectorXd ws = Eigen::VectorXd::Random(M);
    Eigen::VectorXd ys = ws.transpose() * xs;

    Eigen::MatrixXd x = xs; // make copies of xs & ys before modifying them
    Eigen::VectorXd y = ys;

    // method1
    auto start = std::chrono::high_resolution_clock::now();
    Eigen::VectorXd wy(M);
    Eigen::VectorXd sx(M);
    for (int i = 0; i < M; i++) {
        sx(i) = x.row(i).squaredNorm();
        Eigen::VectorXd projection = (x.block(i + 1, 0, M - i - 1, N) * x.row(i).transpose()) / sx(i);
        x.block(i + 1, 0, M - i - 1, N) -= projection * x.row(i);
    }

    for (int i = M - 1; i >= 0; i--) {
        wy(i) = y.dot(x.row(i)) / sx(i);
        y -= wy(i) * xs.row(i).transpose();
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "Method1: Solving the equation with Error Prediction & Gram-Schmidt orthogonalization" << std::endl;
    std::cout << "Elapsed time: " << elapsed.count() << " seconds" << std::endl;

    std::cout << "------------------" << std::endl;
    // method2
    Eigen::MatrixXd ix = xs.transpose();
    start = std::chrono::high_resolution_clock::now();
    Eigen::VectorXd wi = ix.completeOrthogonalDecomposition().pseudoInverse() * ys;
    end = std::chrono::high_resolution_clock::now();
    elapsed = end - start;
    std::cout << "Method2: Solving the equation using Matrix Inversion (pseudo-inverse via Complete Orthogonal Decomposition)" << std::endl;
    std::cout << "Elapsed time: " << elapsed.count() << " seconds" << std::endl;

    std::cout << "------------------" << std::endl;
    std::cout << "Comparing a sample of the resulting weights from the two methods (should be similar)" << std::endl;
    for (int i = 0; i < 5; i++) {
        std::cout << "method1: " << wy(i) << " method2: " << wi(i) << std::endl;
    }

    return 0;
}
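// Build note (an assumption, not part of the original script): with Eigen's
// headers installed under /usr/include/eigen3, adjust the include path for
// your system, a typical build is:
//   g++ -O3 -I/usr/include/eigen3 performance_comparision.cpp -o performance_comparision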
--------------------------------------------------------------------------------
/tutorial/RLS_text_prediction.py:
--------------------------------------------------------------------------------
## a simple usage example for a multiclass problem, like predicting the next letter

import numpy as np
import matplotlib.pylab as plt

text = '''
01 02 03 04 05 06 07 08 09 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
'''

N = len(text)
M = 10
emb_size = 100

words = text[:N]
chars = sorted(list(set(words)))
stoi = {s: i for i, s in enumerate(chars)}
itos = {i: s for s, i in stoi.items()}
vocab_size = len(itos)
xs = np.zeros(N, dtype=int)
for i in range(N):
    xs[i] = stoi[words[i]]

xemb = np.random.randn(vocab_size, emb_size)  # embedding the letters
xpos = np.random.randn(M, emb_size)           # positional embedding
yind = np.eye(vocab_size)                     # classes

wx = np.zeros((emb_size, emb_size))
wy = np.zeros((vocab_size, emb_size))
xy = np.zeros((vocab_size, emb_size))
xx = np.zeros((emb_size, emb_size))
sx = np.zeros(emb_size) + 0.000001
yh = np.zeros(N)
ys = np.zeros(N, dtype=int)
for n in range(N-(M+1)):

    if n % 100 == 0:
        print(".", end="")

    # prepare the inputs
    x = xs[n:n+M].copy()
    x = xemb[x] * xpos
    x = np.sum(x, axis=0)

    # decorrelating the xs
    for i in range(emb_size):
        sx[i] += (x[i]**2)
        for j in range(i+1, emb_size):
            xx[i, j] += (x[i] * x[j])
            wx[i, j] = xx[i, j] / sx[i]
            x[j] -= wx[i, j] * x[i]

    # predict
    pred = wy @ x
    yh[n] = np.argmax(pred)

    # regressing the xs on the ys (outputs)
    ys[n] = xs[n+M]
    for j in range(vocab_size):
        yi = yind[ys[n], j]
        for i in range(emb_size):
            xy[j, i] += yi * x[i]
            wy[j, i] = xy[j, i] / sx[i]
            yi -= wy[j, i] * x[i]

plt.figure(1)
plt.title("Prediction")
plt.plot(ys)
plt.plot(yh)
plt.legend(["truth", "prediction"])

sen = ""
for i in range(len(yh)):
    sen += itos[yh[i]]
print(sen)
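# A quick accuracy readout (a sketch, meant to be appended to the script above);
# only the positions the main loop actually filled are compared.
filled = N - (M + 1)  # number of next-character predictions made by the loop
print("next-character accuracy:", np.mean(yh[:filled] == ys[:filled]))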
--------------------------------------------------------------------------------
/tutorial/boston_housing_example.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import read_csv

# download the housing dataset from: https://www.kaggle.com/code/prasadperera/the-boston-housing-dataset/notebook
column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
data = read_csv("housing.xls", header=None, delimiter=r"\s+", names=column_names)

column_sels = ['LSTAT', 'INDUS', 'NOX', 'PTRATIO', 'RM', 'TAX', 'DIS', 'AGE']
xs = data.loc[:, column_sels]
ys = data['MEDV']

xs = xs.to_numpy().T
ys = ys.to_numpy()

## add a bias term
xs = np.vstack((xs, np.ones(xs.shape[1])))

N = xs.shape[1]
M = xs.shape[0]

x = xs.copy()  # make copies of xs & ys before modifying them
y = ys.copy()
wy = np.zeros(M)
sx = np.zeros(M)
for i in range(M):
    sx[i] = np.sum(x[i]**2)
    for j in range(i+1, M):
        wx = np.sum(x[i] * x[j]) / sx[i]
        x[j] -= wx * x[i]

for i in range(M-1, -1, -1):
    wy[i] = np.sum(y * x[i]) / sx[i]
    y -= wy[i] * xs[i]

yh = wy @ xs  # prediction
plt.figure(1)
plt.title("Prediction")
plt.plot(ys)
plt.plot(yh)
plt.legend(["truth", "prediction"])

print("Solutions:", wy)

### comparing the method with matrix inversion using svd & pinv
# w_pinv = np.linalg.pinv(xs.T) @ ys
# plt.figure(2)
# plt.plot(w_pinv)
# plt.plot(wy)
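# Goodness-of-fit readout (a sketch, meant to be appended to the script above):
# the usual coefficient of determination computed from ys (truth) and yh (prediction).
ss_res = np.sum((ys - yh) ** 2)
ss_tot = np.sum((ys - np.mean(ys)) ** 2)
print("R^2:", 1 - ss_res / ss_tot)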
--------------------------------------------------------------------------------
/tutorial/iris_dataset_example.py:
--------------------------------------------------------------------------------
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Adding a column of ones to the features to include the bias (intercept) in the model
X_train = np.c_[np.ones(X_train.shape[0]), X_train]
X_test = np.c_[np.ones(X_test.shape[0]), X_test]

print('De-correlating all the xs with each other')
print('----------------------------')

xs = X_train.T
N = xs.shape[1]
M = xs.shape[0]
x = xs.copy()
sx = np.zeros(M)
for i in range(M):
    sx[i] = np.sum(x[i]**2)
    for j in range(i+1, M):
        wx = np.sum(x[i] * x[j]) / sx[i]
        x[j] -= wx * x[i]

num_classes = 3  # number of output classes = 3 for the iris dataset, calculated from: (np.max(y_train)+1)

### finding the weights of the decorrelated xs with ys
print("Method 1. regression on ys using multiple y_classes in the form of a one_hot matrix")
ys = np.zeros((N, num_classes))
for i in range(len(y_train)):  # converting the classes to one_hot format
    ys[i, y_train[i]] = 1

wy = np.zeros((M, num_classes))
for c in range(num_classes):
    y = ys[:, c]
    for i in range(M-1, -1, -1):
        wy[i, c] = np.sum(y * x[i]) / sx[i]
        y -= wy[i, c] * xs[i]

## predict the training set
yh_train = X_train @ wy
y_train_pred = np.argmax(yh_train, axis=1)
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = np.argmax(yh_test, axis=1)
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)

print("---------------------------------")
print("Method 2. regression on ys with simple rounding & thresholding of the predicted y classes.....")

wy = np.zeros(M)
y = y_train.astype(float)
for i in range(M-1, -1, -1):
    wy[i] = np.sum(y * x[i]) / sx[i]
    y -= wy[i] * xs[i]

## predict the training set
yh_train = X_train @ wy
y_train_pred = np.round(yh_train).astype(int)
y_train_pred = np.clip(y_train_pred, 0, num_classes-1)
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = np.round(yh_test).astype(int)
y_test_pred = np.clip(y_test_pred, 0, num_classes-1)
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)

print("---------------------------------")
print("Method 3. regression on ys using multiple y_classes in the form of random vectors (embeddings)")

np.random.seed(1)
# Generate a random vector (embedding) for each class
embed_size = 10  # can be the same as the number of classes, 3, or it can be more
class_vectors = np.random.randn(num_classes, embed_size)
ys = np.zeros((N, embed_size))
# Assign each label its corresponding random vector
for i in range(len(y_train)):
    ys[i] = class_vectors[y_train[i]]

wy = np.zeros((M, embed_size))
for c in range(embed_size):
    y = ys[:, c]
    for i in range(M-1, -1, -1):
        wy[i, c] = np.sum(y * x[i]) / sx[i]
        y -= wy[i, c] * xs[i]

## In method 3 the predicted output is a vector instead of a single number, therefore we use a simple distance measure below to find the nearest class vector to the output
def find_nearest_vector(vec, class_vectors):
    min_distance = np.inf
    min_index = 0
    for i in range(len(class_vectors)):
        distance = np.sum(np.abs(vec - class_vectors[i]))
        if distance < min_distance:
            min_distance = distance
            min_index = i
    return min_index

## predict the training set
yh_train = X_train @ wy
y_train_pred = [find_nearest_vector(vec, class_vectors) for vec in yh_train]
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = [find_nearest_vector(vec, class_vectors) for vec in yh_test]
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)
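# Optional baseline (a sketch, meant to be appended to the script above; assumes
# scikit-learn is available and the variables above are still in scope):
# Method 1 solves the same ordinary least-squares problem as a plain linear
# regression on one-hot targets, so the two test accuracies should closely match.
from sklearn.linear_model import LinearRegression

onehot = np.zeros((len(y_train), num_classes))
onehot[np.arange(len(y_train)), y_train] = 1
ref = LinearRegression(fit_intercept=False).fit(X_train, onehot)  # bias column is already in X_train
ref_pred = np.argmax(ref.predict(X_test), axis=1)
print('sklearn baseline test accuracy:', np.sum(ref_pred == y_test) / len(y_test))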
--------------------------------------------------------------------------------
/tutorial/mnist_dataset_example.py:
--------------------------------------------------------------------------------
from sklearn.model_selection import train_test_split
from sklearn import datasets
import numpy as np

# Load MNIST dataset
mnist = datasets.fetch_openml('mnist_784', as_frame=False, parser='liac-arff')
X = mnist.data.astype('float32')
y = mnist.target.astype('int')

# Normalize the pixel values to the range [0, 1]
X /= 255.0

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Adding a column of ones to the features to include the bias (intercept) in the model
X_train = np.c_[np.ones(X_train.shape[0]), X_train]
X_test = np.c_[np.ones(X_test.shape[0]), X_test]

print('----------------------------')
# Print out the sizes of the training and testing sets
print("Size of training set: {}".format(X_train.shape[0]))
print("Size of testing set: {}".format(X_test.shape[0]))
print('----------------------------')
print('De-correlating all the xs with each other')
print('----------------------------')

xs = X_train.T

np.random.seed(1)

def activate(w, x):
    linear = w.T @ x
    # out = linear
    out = np.maximum(linear, 0)  # relu
    return out

nodes = 500
w = np.random.randn(xs.shape[0], nodes)
xs = activate(w, xs)

X_train = activate(w, X_train.T).T
X_test = activate(w, X_test.T).T

N = xs.shape[1]
M = xs.shape[0]

x = xs.copy()
sx = np.zeros(M)

for i in range(M):
    sx[i] = np.sum(x[i]**2)
    for j in range(i+1, M):
        wx = np.sum(x[i] * x[j]) / (sx[i] + 1e-8)  # Adding a small constant to avoid division by zero
        x[j] -= wx * x[i]

num_classes = len(np.unique(y_train))

### finding the weights of the decorrelated xs with ys
print("Method 1. regression on ys using multiple y_classes in the form of a one_hot matrix")
ys = np.zeros((N, num_classes))
for i in range(len(y_train)):  # converting the classes to one_hot format
    ys[i, y_train[i]] = 1

wy = np.zeros((M, num_classes))
for c in range(num_classes):
    y = ys[:, c]
    for i in range(M-1, -1, -1):
        wy[i, c] = np.sum(y * x[i]) / (sx[i] + 1e-8)  # Adding a small constant to avoid division by zero
        y -= wy[i, c] * xs[i]

## predict the training set
yh_train = X_train @ wy
y_train_pred = np.argmax(yh_train, axis=1)
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = np.argmax(yh_test, axis=1)
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)

print("---------------------------------")
print("Method 2. regression on ys with simple rounding & thresholding of the predicted y classes.....")

wy = np.zeros(M)
y = y_train.astype(float)
for i in range(M-1, -1, -1):
    wy[i] = np.sum(y * x[i]) / (sx[i] + 1e-8)  # Adding a small constant to avoid division by zero
    y -= wy[i] * xs[i]

## predict the training set
yh_train = X_train @ wy
y_train_pred = np.round(yh_train).astype(int)
y_train_pred = np.clip(y_train_pred, 0, num_classes-1)
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = np.round(yh_test).astype(int)
y_test_pred = np.clip(y_test_pred, 0, num_classes-1)
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)

print("---------------------------------")
print("Method 3. regression on ys using multiple y_classes in the form of random vectors (embeddings)")

np.random.seed(1)
# Generate a random vector (embedding) for each class
embed_size = 10  # can be the same as the number of classes, or it can be more
class_vectors = np.random.randn(num_classes, embed_size)
ys = np.zeros((N, embed_size))
# Assign each label its corresponding random vector
for i in range(len(y_train)):
    ys[i] = class_vectors[y_train[i]]

wy = np.zeros((M, embed_size))
for c in range(embed_size):
    y = ys[:, c]
    for i in range(M-1, -1, -1):
        wy[i, c] = np.sum(y * x[i]) / (sx[i] + 1e-8)  # Adding a small constant to avoid division by zero
        y -= wy[i, c] * xs[i]

## In method 3 the predicted output is a vector instead of a single number, so we use a simple distance measure below to find the nearest class vector to the output
def find_nearest_vector(vec, class_vectors):
    min_distance = np.inf
    min_index = 0
    for i in range(len(class_vectors)):
        distance = np.sum(np.abs(vec - class_vectors[i]))
        if distance < min_distance:
            min_distance = distance
            min_index = i
    return min_index

## predict the training set
yh_train = X_train @ wy
y_train_pred = [find_nearest_vector(vec, class_vectors) for vec in yh_train]
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = [find_nearest_vector(vec, class_vectors) for vec in yh_test]
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)
--------------------------------------------------------------------------------