├── LICENSE
├── README.md
├── RLS_Neural_Network.py
├── RLS_sequential_data.py
├── images
│   ├── comparision.jpg
│   └── rls_figure.jpg
├── least-squares-regression.py
├── performance_comparision.cpp
└── tutorial
    ├── RLS_text_prediction.py
    ├── Regression_step_by_step.ipynb
    ├── boston_housing_example.py
    ├── iris_dataset_example.py
    └── mnist_dataset_example.py

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 Hunar Ahmad

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
![GitHub repo size](https://img.shields.io/github/repo-size/hunar4321/RLS_Learning)
![GitHub](https://img.shields.io/github/license/hunar4321/RLS_Learning)

# 1. Recursive Least Squares by predicting errors
This is a simple, intuitive method for solving linear equations using recursive least squares.

Check out the step-by-step video tutorial here: https://youtu.be/4vGaN1dTVhw

------------

#### Illustration - RLS Error Prediction:

![](images/rls_figure.jpg)

#### Comparison of how errors are shared among the inputs in gradient-based methods vs. RLS-based methods
![](images/comparision.jpg)

Algorithmically, this method is faster than matrix inversion because it requires fewer operations. In practice, however, it is hard to compare it fairly with the established linear solvers, because matrix operations benefit from many optimization tricks at the hardware level. We added a simple C++ implementation using the Eigen library to compare the performance of this method with the matrix-inversion approach.

Inspired by the following post by whuber: https://stats.stackexchange.com/q/166718

# 2. Fast Learning in Neural Networks (Real-time optimization)
-----------------------------------
There is an example at the end of *RLS_Neural_Network.py* which shows how this network can learn XOR-like data in a single pass over the samples. Run the code and see the output.

**Advantages of using RLS for learning instead of gradient descent**
1. Fast learning and sample efficiency (can learn in one shot).
2. Online learning (suitable for real-time learning).
3. No worries about local minima.

**Disadvantages:**
1. Computationally inefficient if the input size is large (quadratic complexity).
2. Sensitive to overflow and underflow, which can lead to instability in some cases.
3. The current implementation uses a neural network with a single hidden layer. It is not clear whether adding more layers would help, since learning only happens in the last layer.
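#### Minimal sketch of the update
Below is a minimal sketch (not the repository code itself) of the error-prediction update illustrated above, assuming a noiseless linear system; the names `w_true`, `P`, and `share` are illustrative only. After `M` linearly independent samples the recovered weights match the true ones exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 5                           # number of inputs / weights
w_true = rng.normal(size=M)     # ground-truth weights of a noiseless linear system

P = np.eye(M)                   # tracks the directions of input space not yet explained
w = np.zeros(M)                 # weights being estimated

for _ in range(M):              # one pass over M independent samples is enough here
    x = rng.normal(size=M)
    y = w_true @ x
    e = y - w @ x               # prediction error for this sample
    Px = P @ x
    share = Px / (x @ Px)       # how the error is shared among the inputs
    w += share * e              # correct the weights
    P -= np.outer(share, Px)    # remove the explained direction from P

print(np.allclose(w, w_true))   # -> True
```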
--------------------------------------------------------------------------------
/RLS_Neural_Network.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 4 00:51:38 2020

Author: Hunar Ahmad @ Brainxyz
"""

import numpy as np


class RlsNode:
    """
    Recursive Least Squares estimation (this is the update part of the Kalman filter)
    """
    def __init__(self, _m, _name):
        self.name = _name
        self.M = _m
        self.w = np.random.rand(1, _m)
        self.P = np.eye(_m)

    def RlsPredict(self, v):
        return np.dot(self.w, v)

    def RlsLearn(self, v, e):
        pv = np.dot(self.P, v)
        vv = np.dot(pv.T, v)
        eMod = pv / vv
        ee = eMod * e
        self.w = self.w + ee
        outer = np.outer(eMod, pv)
        self.P = self.P - outer


class Net:
    """
    Neural network (single hidden layer where the weights of the first layer (iWh) are randomly initialized)
    """
    def __init__(self, _input_size, _neurons):
        self.input_size = _input_size
        self.neurons = _neurons
        self.iWh = (np.random.rand(_neurons, _input_size) - 0.5) * 1
        self.nodes = []

    def CreateOutputNode(self, _name):
        nn = RlsNode(self.neurons, _name)
        self.nodes.append(nn)

    def sigmoid(self, x):
        return 1 / (1 + np.e ** -x)

    def FeedForwardL1(self, v):
        vout = np.dot(self.iWh, v)
        # tout = np.tanh(vout)
        tout = self.sigmoid(vout)
        return tout + 0.00000001  ## adding a small value to avoid division by zero in the upcoming computations!

    ### RLS layer (trainable weights using the RLS algorithm)
    def FeedForwardL2(self, tout):
        yhats = []
        for i in range(len(self.nodes)):
            p = self.nodes[i].RlsPredict(tout)
            yhats.append(p[0])
        return np.asarray(yhats)

    ### Error evaluation
    def Evaluate(self, ys, yhats):
        errs = ys - yhats
        return errs

    def Learn(self, acts, errs):
        for i in range(len(self.nodes)):
            self.nodes[i].RlsLearn(acts, errs[i])  # change to errs[0][i] if an indexing error happens


# #### Example Usage ###

x = [[1, 1], [0, 0], [1, 0], [0, 1]]  ## input data
y = [1, 1, 0, 0]                      ## output data (targets)

## configuring the network
n_input = 2
n_neurons = 5
n_output = 1
net = Net(n_input, n_neurons)
for i in range(n_output):
    net.CreateOutputNode(i)

## training
N = len(x)  ## you only need one iteration over the samples to learn (RLS is a near one-shot learner!)
for i in range(N):
    inputt = x[i][:]
    L1 = net.FeedForwardL1(inputt)
    yhats = net.FeedForwardL2(L1)
    errs = net.Evaluate(y[i], yhats)
    net.Learn(L1, errs)

## evaluate after learning
yh = []
for i in range(N):
    inputt = x[i][:]
    L1 = net.FeedForwardL1(inputt)
    yhats = net.FeedForwardL2(L1)
    print("input", inputt, "predicted output:", yhats[0])
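# Note on RlsLearn (in standard recursive-least-squares notation): with hidden
# activity vector v, scalar output error e, and inverse-correlation estimate P,
# the update above is
#     k = P v / (v' P v),    w <- w + e * k',    P <- P - k (P v)'
# so the error is shared among the hidden units in proportion to P v, exactly
# as sketched in the README illustration.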
--------------------------------------------------------------------------------
/RLS_sequential_data.py:
--------------------------------------------------------------------------------
"""
Lattice Recursive Least Squares for predicting the next elements in sequential data
Video Tutorial: https://youtu.be/4vGaN1dTVhw
"""

import numpy as np
import matplotlib.pylab as plt

# generating time series data
N = 200
t = np.arange(0.0001, 1.001, 0.001)
data = 2 * np.sin(2 * np.pi * 50 * t)
ys = data[:N]
for i in range(1, N, 2):
    ys[i] = 1

# ys[70] = -1
# ys[150] = -1

M = 20  # looking back M steps to predict the signal
b = np.zeros(M)
t = np.zeros(M)
f = np.zeros(M)
wf = np.zeros(M)
wb = np.zeros(M)
fb = np.zeros(M)
ff = np.zeros(M) + 0.000001
bb = np.zeros(M) + 0.000001
e = np.zeros(N)

pred = np.zeros(N)
act = np.zeros((M, N))
for n in range(N - 1):

    pred[n] = wf @ b

    f[0] = ys[n]
    t[0] = ys[n]

    for m in range(M - 1):

        ff[m] += f[m] * f[m]
        bb[m] += b[m] * b[m]
        fb[m] += f[m] * b[m]

        wb[m] = fb[m] / ff[m]
        wf[m] = fb[m] / bb[m]

        f[m + 1] = f[m] - wf[m] * b[m]
        t[m + 1] = b[m] - wb[m] * f[m]

    b = t.copy()

    e[n] = f[-1]
    act[:, n] = f

plt.figure(1)
plt.plot(ys)
plt.plot(pred)
plt.legend(["truth", "prediction"])

plt.figure(2)
plt.title("Error")
plt.plot(e)

plt.figure(4)
plt.title("Activity")
plt.imshow(np.flipud(np.abs(act)))

###### generating data ############

# generated_data = np.zeros(N)
# for n in range(N - 2):

#     generated_data[n] = wf @ b

#     f[0] = generated_data[n]
#     t[0] = generated_data[n]

#     for m in range(M-1):

#         f[m + 1] = f[m] - wf[m] * b[m]
#         t[m + 1] = b[m] - wb[m] * f[m]

#     b = t.copy()

# plt.figure(3)
# plt.title("generated_data")
# plt.plot(generated_data)
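# Note on the main loop above (lattice form of RLS): at each stage m the forward
# and backward prediction errors are updated as
#     f[m+1] = f[m] - wf[m] * b[m]     (forward error)
#     t[m+1] = b[m] - wb[m] * f[m]     (new backward error; b holds the previous step's values)
# where wf[m] = fb[m] / bb[m] and wb[m] = fb[m] / ff[m] are reflection
# coefficients estimated online from the running correlations ff, bb and fb.
# The next-sample prediction pred[n] = wf @ b is the sum of the stage-wise
# corrections, and e[n] = f[-1] is the part of the signal left unpredicted.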
--------------------------------------------------------------------------------
/images/comparision.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hunar4321/RLS-neural-net/1e43e64bebd916f94a8787176c4cc9606f16dc36/images/comparision.jpg
--------------------------------------------------------------------------------
/images/rls_figure.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hunar4321/RLS-neural-net/1e43e64bebd916f94a8787176c4cc9606f16dc36/images/rls_figure.jpg
--------------------------------------------------------------------------------
/least-squares-regression.py:
--------------------------------------------------------------------------------
"""
Multiple Regression Using Recursive Error prediction.
Author: Hunar Ahmad @ Brainxyz
Video tutorial: https://youtu.be/4vGaN1dTVhw
"""
import numpy as np
import matplotlib.pylab as plt

N = 100  # number of the subjects
M = 10   # number of the variables
xs = np.random.randn(M, N)
ws = np.random.randn(M)
ys = ws @ xs

x = xs.copy()  # make copies of xs & ys before modifying them
y = ys.copy()
wy = np.zeros(M)
sx = np.zeros(M)
for i in range(M):
    sx[i] = np.sum(x[i]**2)
    for j in range(i+1, M):
        wx = np.sum(x[i] * x[j]) / sx[i]
        x[j] -= wx * x[i]

for i in range(M-1, -1, -1):
    wy[i] = np.sum(y * x[i]) / sx[i]
    y -= wy[i] * xs[i]

yh = wy @ xs  # prediction
plt.title("Prediction")
plt.plot(ys)
plt.plot(yh)
plt.legend(["truth", "prediction"])

print("Solutions:", wy)
print("------------------")
print("True weights:", ws)
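# A quick sanity check (a sketch, meant to be appended to the script above):
# the recovered weights wy should match an ordinary least-squares solve on the
# same data, since ys was generated without noise.
w_ref, _, _, _ = np.linalg.lstsq(xs.T, ys, rcond=None)
print("matches lstsq:", np.allclose(wy, w_ref))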
--------------------------------------------------------------------------------
/performance_comparision.cpp:
--------------------------------------------------------------------------------
// this is a simple C++ script to compare the raw performance of our RLS method versus the conventional matrix inversion method
#include <iostream>
#include <Eigen/Dense>
#include <chrono>

int main() {

    std::cout << "Performance Comparison:" << std::endl;
    std::cout << "----------------------" << std::endl;

    int N = 10000; // number of the subjects
    int M = 100;   // number of the variables

    srand((unsigned int)time(0));

    // Generate data where ys = ws @ xs
    Eigen::MatrixXd xs = Eigen::MatrixXd::Random(M, N);
    Eigen::VectorXd ws = Eigen::VectorXd::Random(M);
    Eigen::VectorXd ys = ws.transpose() * xs;

    Eigen::MatrixXd x = xs; // make copies of xs & ys before modifying them
    Eigen::VectorXd y = ys;

    // method1
    auto start = std::chrono::high_resolution_clock::now();
    Eigen::VectorXd wy(M);
    Eigen::VectorXd sx(M);
    for (int i = 0; i < M; i++) {
        sx(i) = x.row(i).squaredNorm();
        Eigen::VectorXd projection = (x.block(i + 1, 0, M - i - 1, N) * x.row(i).transpose()) / sx(i);
        x.block(i + 1, 0, M - i - 1, N) -= projection * x.row(i);
    }

    for (int i = M - 1; i >= 0; i--) {
        wy(i) = y.dot(x.row(i)) / sx(i);
        y -= wy(i) * xs.row(i).transpose();
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "Method1: Solving the equation with Error Prediction & Gram-Schmidt orthogonalization" << std::endl;
    std::cout << "Elapsed time: " << elapsed.count() << " seconds" << std::endl;

    std::cout << "------------------" << std::endl;
    // method2
    Eigen::MatrixXd ix = xs.transpose();
    start = std::chrono::high_resolution_clock::now();
    Eigen::VectorXd wi = ix.completeOrthogonalDecomposition().pseudoInverse() * ys;
    end = std::chrono::high_resolution_clock::now();
    elapsed = end - start;
    std::cout << "Method2: Solving the equation using Matrix Inversion (pseudo-inverse via Complete Orthogonal Decomposition)" << std::endl;
    std::cout << "Elapsed time: " << elapsed.count() << " seconds" << std::endl;

    std::cout << "------------------" << std::endl;
    std::cout << "Comparing a sample of the resulting weights from the two methods (should be similar)" << std::endl;
    for (int i = 0; i < 5; i++) {
        std::cout << "method1: " << wy(i) << " method2: " << wi(i) << std::endl;
    }

    return 0;
}
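// Build note (an assumption, not part of the original script): with Eigen's
// headers installed under /usr/include/eigen3, adjust the include path for
// your system, a typical build is:
//   g++ -O3 -I/usr/include/eigen3 performance_comparision.cpp -o performance_comparision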
--------------------------------------------------------------------------------
/tutorial/RLS_text_prediction.py:
--------------------------------------------------------------------------------
## a simple usage example for a multiclass problem, like predicting the next letter

import numpy as np
import matplotlib.pylab as plt

text = '''
01 02 03 04 05 06 07 08 09 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
'''

N = len(text)
M = 10
emb_size = 100

words = text[:N]
chars = sorted(list(set(words)))
stoi = {s: i for i, s in enumerate(chars)}
itos = {i: s for s, i in stoi.items()}
vocab_size = len(itos)
xs = np.zeros(N, dtype=int)
for i in range(N):
    xs[i] = stoi[words[i]]

xemb = np.random.randn(vocab_size, emb_size)  # embedding the letters
xpos = np.random.randn(M, emb_size)           # positional embedding
yind = np.eye(vocab_size)                     # classes

wx = np.zeros((emb_size, emb_size))
wy = np.zeros((vocab_size, emb_size))
xy = np.zeros((vocab_size, emb_size))
xx = np.zeros((emb_size, emb_size))
sx = np.zeros(emb_size) + 0.000001
yh = np.zeros(N)
ys = np.zeros(N, dtype=int)
for n in range(N-(M+1)):

    if n % 100 == 0:
        print(".", end="")

    # prepare the inputs
    x = xs[n:n+M].copy()
    x = xemb[x] * xpos
    x = np.sum(x, axis=0)

    # decorrelating the xs
    for i in range(emb_size):
        sx[i] += (x[i]**2)
        for j in range(i+1, emb_size):
            xx[i, j] += (x[i] * x[j])
            wx[i, j] = xx[i, j] / sx[i]
            x[j] -= wx[i, j] * x[i]

    # predict
    pred = wy @ x
    yh[n] = np.argmax(pred)

    # regressing the xs on the ys (outputs)
    ys[n] = xs[n+M]
    for j in range(vocab_size):
        yi = yind[ys[n], j]
        for i in range(emb_size):
            xy[j, i] += yi * x[i]
            wy[j, i] = xy[j, i] / sx[i]
            yi -= wy[j, i] * x[i]

plt.figure(1)
plt.title("Prediction")
plt.plot(ys)
plt.plot(yh)
plt.legend(["truth", "prediction"])

sen = ""
for i in range(len(yh)):
    sen += itos[yh[i]]
print(sen)
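# A quick accuracy readout (a sketch, meant to be appended to the script above);
# only the positions the main loop actually filled are compared.
filled = N - (M + 1)  # number of next-character predictions made by the loop
print("next-character accuracy:", np.mean(yh[:filled] == ys[:filled]))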
--------------------------------------------------------------------------------
/tutorial/boston_housing_example.py:
--------------------------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import read_csv

# download the housing dataset from: https://www.kaggle.com/code/prasadperera/the-boston-housing-dataset/notebook
column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
data = read_csv("housing.xls", header=None, delimiter=r"\s+", names=column_names)

column_sels = ['LSTAT', 'INDUS', 'NOX', 'PTRATIO', 'RM', 'TAX', 'DIS', 'AGE']
xs = data.loc[:, column_sels]
ys = data['MEDV']

xs = xs.to_numpy().T
ys = ys.to_numpy()

## add a bias term
xs = np.vstack((xs, np.ones(xs.shape[1])))

N = xs.shape[1]
M = xs.shape[0]

x = xs.copy()  # make copies of xs & ys before modifying them
y = ys.copy()
wy = np.zeros(M)
sx = np.zeros(M)
for i in range(M):
    sx[i] = np.sum(x[i]**2)
    for j in range(i+1, M):
        wx = np.sum(x[i] * x[j]) / sx[i]
        x[j] -= wx * x[i]

for i in range(M-1, -1, -1):
    wy[i] = np.sum(y * x[i]) / sx[i]
    y -= wy[i] * xs[i]

yh = wy @ xs  # prediction
plt.figure(1)
plt.title("Prediction")
plt.plot(ys)
plt.plot(yh)
plt.legend(["truth", "prediction"])

print("Solutions:", wy)

### comparing the method with matrix inversion using svd & pinv
# w_pinv = np.linalg.pinv(xs.T) @ ys
# plt.figure(2)
# plt.plot(w_pinv)
# plt.plot(wy)
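# Goodness-of-fit readout (a sketch, meant to be appended to the script above):
# the usual coefficient of determination computed from ys (truth) and yh (prediction).
ss_res = np.sum((ys - yh) ** 2)
ss_tot = np.sum((ys - np.mean(ys)) ** 2)
print("R^2:", 1 - ss_res / ss_tot)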
--------------------------------------------------------------------------------
/tutorial/iris_dataset_example.py:
--------------------------------------------------------------------------------
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Adding a column of ones to the features to include the bias (intercept) in the model
X_train = np.c_[np.ones(X_train.shape[0]), X_train]
X_test = np.c_[np.ones(X_test.shape[0]), X_test]

print('De-correlating all the xs with each other')
print('----------------------------')

xs = X_train.T
N = xs.shape[1]
M = xs.shape[0]
x = xs.copy()
sx = np.zeros(M)
for i in range(M):
    sx[i] = np.sum(x[i]**2)
    for j in range(i+1, M):
        wx = np.sum(x[i] * x[j]) / sx[i]
        x[j] -= wx * x[i]

num_classes = 3  # number of output classes = 3 for the iris dataset, calculated from: (np.max(y_train)+1)

### finding the weights of the decorrelated xs with ys
print("Method 1. regression on ys using multiple y_classes in the form of a one_hot matrix")
ys = np.zeros((N, num_classes))
for i in range(len(y_train)):  # converting the classes to one_hot format
    ys[i, y_train[i]] = 1

wy = np.zeros((M, num_classes))
for c in range(num_classes):
    y = ys[:, c]
    for i in range(M-1, -1, -1):
        wy[i, c] = np.sum(y * x[i]) / sx[i]
        y -= wy[i, c] * xs[i]

## predict the training set
yh_train = X_train @ wy
y_train_pred = np.argmax(yh_train, axis=1)
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = np.argmax(yh_test, axis=1)
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)

print("---------------------------------")
print("Method 2. regression on ys with simple rounding & thresholding of the predicted y classes.....")

wy = np.zeros(M)
y = y_train.astype(float)
for i in range(M-1, -1, -1):
    wy[i] = np.sum(y * x[i]) / sx[i]
    y -= wy[i] * xs[i]

## predict the training set
yh_train = X_train @ wy
y_train_pred = np.round(yh_train).astype(int)
y_train_pred = np.clip(y_train_pred, 0, num_classes-1)
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = np.round(yh_test).astype(int)
y_test_pred = np.clip(y_test_pred, 0, num_classes-1)
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)

print("---------------------------------")
print("Method 3. regression on ys using multiple y_classes in the form of random vectors (embeddings)")

np.random.seed(1)
# Generate a random vector (embedding) for each class
embed_size = 10  # can be the same as the number of classes, 3, or it can be more
class_vectors = np.random.randn(num_classes, embed_size)
ys = np.zeros((N, embed_size))
# Assign each label its corresponding random vector
for i in range(len(y_train)):
    ys[i] = class_vectors[y_train[i]]

wy = np.zeros((M, embed_size))
for c in range(embed_size):
    y = ys[:, c]
    for i in range(M-1, -1, -1):
        wy[i, c] = np.sum(y * x[i]) / sx[i]
        y -= wy[i, c] * xs[i]

## In method 3 the predicted output is a vector instead of a single number, therefore we use a simple distance measure below to find the nearest class vector to the output
def find_nearest_vector(vec, class_vectors):
    min_distance = np.inf
    min_index = 0
    for i in range(len(class_vectors)):
        distance = np.sum(np.abs(vec - class_vectors[i]))
        if distance < min_distance:
            min_distance = distance
            min_index = i
    return min_index

## predict the training set
yh_train = X_train @ wy
y_train_pred = [find_nearest_vector(vec, class_vectors) for vec in yh_train]
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = [find_nearest_vector(vec, class_vectors) for vec in yh_test]
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)
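# Optional baseline (a sketch, meant to be appended to the script above; assumes
# scikit-learn is available and the variables above are still in scope):
# Method 1 solves the same ordinary least-squares problem as a plain linear
# regression on one-hot targets, so the two test accuracies should closely match.
from sklearn.linear_model import LinearRegression

onehot = np.zeros((len(y_train), num_classes))
onehot[np.arange(len(y_train)), y_train] = 1
ref = LinearRegression(fit_intercept=False).fit(X_train, onehot)  # bias column is already in X_train
ref_pred = np.argmax(ref.predict(X_test), axis=1)
print('sklearn baseline test accuracy:', np.sum(ref_pred == y_test) / len(y_test))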
--------------------------------------------------------------------------------
/tutorial/mnist_dataset_example.py:
--------------------------------------------------------------------------------
from sklearn.model_selection import train_test_split
from sklearn import datasets
import numpy as np

# Load MNIST dataset
mnist = datasets.fetch_openml('mnist_784', as_frame=False, parser='liac-arff')
X = mnist.data.astype('float32')
y = mnist.target.astype('int')

# Normalize the pixel values to the range [0, 1]
X /= 255.0

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Adding a column of ones to the features to include the bias (intercept) in the model
X_train = np.c_[np.ones(X_train.shape[0]), X_train]
X_test = np.c_[np.ones(X_test.shape[0]), X_test]

print('----------------------------')
# Print out the sizes of the training and testing sets
print("Size of training set: {}".format(X_train.shape[0]))
print("Size of testing set: {}".format(X_test.shape[0]))
print('----------------------------')
print('De-correlating all the xs with each other')
print('----------------------------')

xs = X_train.T

np.random.seed(1)

def activate(w, x):
    linear = w.T @ x
    # out = linear
    out = np.maximum(linear, 0)  # relu
    return out

nodes = 500
w = np.random.randn(xs.shape[0], nodes)
xs = activate(w, xs)

X_train = activate(w, X_train.T).T
X_test = activate(w, X_test.T).T

N = xs.shape[1]
M = xs.shape[0]

x = xs.copy()
sx = np.zeros(M)

for i in range(M):
    sx[i] = np.sum(x[i]**2)
    for j in range(i+1, M):
        wx = np.sum(x[i] * x[j]) / (sx[i] + 1e-8)  # Adding a small constant to avoid division by zero
        x[j] -= wx * x[i]

num_classes = len(np.unique(y_train))

### finding the weights of the decorrelated xs with ys
print("Method 1. regression on ys using multiple y_classes in the form of a one_hot matrix")
ys = np.zeros((N, num_classes))
for i in range(len(y_train)):  # converting the classes to one_hot format
    ys[i, y_train[i]] = 1

wy = np.zeros((M, num_classes))
for c in range(num_classes):
    y = ys[:, c]
    for i in range(M-1, -1, -1):
        wy[i, c] = np.sum(y * x[i]) / (sx[i] + 1e-8)  # Adding a small constant to avoid division by zero
        y -= wy[i, c] * xs[i]

## predict the training set
yh_train = X_train @ wy
y_train_pred = np.argmax(yh_train, axis=1)
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = np.argmax(yh_test, axis=1)
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)

print("---------------------------------")
print("Method 2. regression on ys with simple rounding & thresholding of the predicted y classes.....")

wy = np.zeros(M)
y = y_train.astype(float)
for i in range(M-1, -1, -1):
    wy[i] = np.sum(y * x[i]) / (sx[i] + 1e-8)  # Adding a small constant to avoid division by zero
    y -= wy[i] * xs[i]

## predict the training set
yh_train = X_train @ wy
y_train_pred = np.round(yh_train).astype(int)
y_train_pred = np.clip(y_train_pred, 0, num_classes-1)
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = np.round(yh_test).astype(int)
y_test_pred = np.clip(y_test_pred, 0, num_classes-1)
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)

print("---------------------------------")
print("Method 3. regression on ys using multiple y_classes in the form of random vectors (embeddings)")

np.random.seed(1)
# Generate a random vector (embedding) for each class
embed_size = 10  # can be the same as the number of classes, or it can be more
class_vectors = np.random.randn(num_classes, embed_size)
ys = np.zeros((N, embed_size))
# Assign each label its corresponding random vector
for i in range(len(y_train)):
    ys[i] = class_vectors[y_train[i]]

wy = np.zeros((M, embed_size))
for c in range(embed_size):
    y = ys[:, c]
    for i in range(M-1, -1, -1):
        wy[i, c] = np.sum(y * x[i]) / (sx[i] + 1e-8)  # Adding a small constant to avoid division by zero
        y -= wy[i, c] * xs[i]

## In method 3 the predicted output is a vector instead of a single number, so we use a simple distance measure below to find the nearest class vector to the output
def find_nearest_vector(vec, class_vectors):
    min_distance = np.inf
    min_index = 0
    for i in range(len(class_vectors)):
        distance = np.sum(np.abs(vec - class_vectors[i]))
        if distance < min_distance:
            min_distance = distance
            min_index = i
    return min_index

## predict the training set
yh_train = X_train @ wy
y_train_pred = [find_nearest_vector(vec, class_vectors) for vec in yh_train]
train_accuracy = np.sum(y_train_pred == y_train) / len(y_train)
print('train accuracy:', train_accuracy)

## predict the testing set
yh_test = X_test @ wy
y_test_pred = [find_nearest_vector(vec, class_vectors) for vec in yh_test]
test_accuracy = np.sum(y_test_pred == y_test) / len(y_test)
print('test accuracy:', test_accuracy)
--------------------------------------------------------------------------------