├── .gitignore ├── PCA ├── img │ ├── zca.png │ ├── data_pca.png │ ├── dataset.png │ ├── zca_rot.png │ ├── peppers_img.png │ ├── dim_reduction.png │ ├── peppers_dst_bf.png │ ├── zca_data_pca.png │ ├── peppers_PCA1_bf.png │ ├── peppers_PCA2_bf.png │ ├── peppers_PCA3_bf.png │ ├── peppers_aligned.png │ └── peppers_disaligned.png ├── src │ ├── aux │ │ ├── peppers.png │ │ └── plot3D.plt │ ├── PCA.lua │ └── alignedPerturbation.lua └── README.md ├── MLP-regression ├── img │ ├── figure_5.3.png │ ├── x2_reg_neu.png │ ├── absx_reg_neu.png │ ├── singx_reg_neu.png │ ├── sinx_reg_neu.png │ ├── x2_trans_cost.png │ ├── x2_trans_reg.png │ └── x2_reg_neu_fix.png ├── src │ └── regression.lua └── README.md └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Files 2 | *.sw* 3 | .DS_Store 4 | /PCA/src/aux/dataPoints.dat 5 | -------------------------------------------------------------------------------- /PCA/img/zca.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/zca.png -------------------------------------------------------------------------------- /PCA/img/data_pca.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/data_pca.png -------------------------------------------------------------------------------- /PCA/img/dataset.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/dataset.png -------------------------------------------------------------------------------- /PCA/img/zca_rot.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/zca_rot.png -------------------------------------------------------------------------------- /PCA/img/peppers_img.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_img.png -------------------------------------------------------------------------------- /PCA/src/aux/peppers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/src/aux/peppers.png -------------------------------------------------------------------------------- /PCA/img/dim_reduction.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/dim_reduction.png -------------------------------------------------------------------------------- /PCA/img/peppers_dst_bf.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_dst_bf.png -------------------------------------------------------------------------------- /PCA/img/zca_data_pca.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/zca_data_pca.png -------------------------------------------------------------------------------- /PCA/img/peppers_PCA1_bf.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_PCA1_bf.png -------------------------------------------------------------------------------- /PCA/img/peppers_PCA2_bf.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_PCA2_bf.png -------------------------------------------------------------------------------- /PCA/img/peppers_PCA3_bf.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_PCA3_bf.png -------------------------------------------------------------------------------- /PCA/img/peppers_aligned.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_aligned.png -------------------------------------------------------------------------------- /PCA/img/peppers_disaligned.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_disaligned.png -------------------------------------------------------------------------------- /MLP-regression/img/figure_5.3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/figure_5.3.png -------------------------------------------------------------------------------- /MLP-regression/img/x2_reg_neu.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/x2_reg_neu.png -------------------------------------------------------------------------------- /MLP-regression/img/absx_reg_neu.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/absx_reg_neu.png -------------------------------------------------------------------------------- /MLP-regression/img/singx_reg_neu.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/singx_reg_neu.png -------------------------------------------------------------------------------- /MLP-regression/img/sinx_reg_neu.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/sinx_reg_neu.png -------------------------------------------------------------------------------- /MLP-regression/img/x2_trans_cost.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/x2_trans_cost.png -------------------------------------------------------------------------------- /MLP-regression/img/x2_trans_reg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/x2_trans_reg.png -------------------------------------------------------------------------------- /MLP-regression/img/x2_reg_neu_fix.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/x2_reg_neu_fix.png -------------------------------------------------------------------------------- /PCA/src/aux/plot3D.plt: -------------------------------------------------------------------------------- 1 | set object 1 rectangle from screen 0,0 to screen 1,1 fillcolor rgb "grey" behind 2 | unset border 3 | 
unset xtics
4 | unset ytics
5 | unset ztics
6 | set view equal xyz
7 | set xyplane 0
8 | set style line 50 lt 1 lc rgb "red" lw 3
9 | set style line 51 lt 1 lc rgb "green" lw 3
10 | set style line 52 lt 1 lc rgb "blue" lw 3
11 | set arrow 1 from 0,0,0 to 256,0,0 empty ls 50
12 | set arrow 2 from 0,0,0 to 0,256,0 empty ls 51
13 | set arrow 3 from 0,0,0 to 0,0,256 empty ls 52
14 | set view 50, 20
15 | set hidden3d
16 | splot 'aux/dataPoints.dat' using 1:2:3:4 with points pt 7 ps 2 lc rgb variable notitle
17 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Machine learning with Torch
2 |
3 | This repository aims to be a collection of simple machine learning algorithms for [Torch7](http://torch.ch/).
4 |
5 | ## [Regression with MLP](MLP-regression/README.md)
6 |
7 | [*Multilayer perceptrons*](http://en.wikipedia.org/wiki/Multilayer_perceptron) (*MLPs*) are usually used for *pattern recognition* (a *classification* task) in the fields of *image* and *speech* recognition. Nevertheless, they can be effectively used for *regression*. Check out the [`MLP-regression`](MLP-regression) section to find out more about it.
8 |
9 | ## [PCA / KLT](PCA/README.md)
10 | [Principal component analysis](http://en.wikipedia.org/wiki/Principal_component_analysis) (*PCA*), or [Karhunen–Loève transform](http://en.wikipedia.org/wiki/Karhunen%E2%80%93Lo%C3%A8ve_theorem) (*KLT*), allows us to smartly reduce the dimensionality of a *data-space*. It can be used for removing the redundancy from input data (and, therefore, speeding up the learning process) and for visualisation purposes (going from, say, 10 dimensions to 3D, which we can better understand). More details can be found in the [`PCA`](PCA) section.
11 |
--------------------------------------------------------------------------------
/PCA/src/PCA.lua:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------------------------
2 | -- PCA with Torch7
3 | --------------------------------------------------------------------------------
4 | -- Koray Kavukcuoglu (https://github.com/koraykv/unsup)
5 | -- Alfredo Canziani, Jul, Aug 14
6 | --------------------------------------------------------------------------------
7 |
8 | -- Instruction -----------------------------------------------------------------
9 | -- This script aims to provide an understanding of how to play with PCA in
10 | -- order to manipulate data dimensionality (for algorithmic speed or
11 | -- visualisation). Furthermore, it shows how to perform ZCA and PCA whitening.
12 |
13 | -- Choose a spherification radius (in normal applications it is set to 1, but
14 | -- bigger values will look better in the chart and enable ZCA display)
15 | --<<<
16 | radius = 3 -- 1, 3 for visualisation sake
17 | showZCA = false -- true/false
18 | -->>>
19 |
20 | -- You may also want to try enabling and disabling the visualisation of the
21 | -- rotated data and PCA whitening in the ZCA visualisation section below (line
22 | -- 84 and below)
23 |
24 | -- Requires --------------------------------------------------------------------
25 | require 'gnuplot'
26 | require 'unsup'
27 | require 'sys'
28 |
29 | -- Define dataset --------------------------------------------------------------
30 | -- Random 2D data with std ~(1.5,6)
31 | N = 100
32 | math.randomseed(os.time())
33 | x1 = torch.randn(N) * 1.5 + math.random()
34 | x2 = torch.randn(N) * 6 + 2 * math.random()
35 | X = torch.cat(x1, x2, 2) -- Nx2
36 |
37 | -- Rotating the data randomly
38 | theta = math.random(180) * math.pi / 180
39 | R = torch.Tensor{
40 |    {math.cos(theta), -math.sin(theta)},
41 |    {math.sin(theta),
math.cos(theta)}
42 | }
43 | X = X * R:t()
44 | X[{ {},1 }]:add(25)
45 | X[{ {},2 }]:add(10)
46 |
47 | -- PCA -------------------------------------------------------------------------
48 | -- X is m x n
49 | mean = torch.mean(X, 1) -- 1 x n
50 | m = X:size(1)
51 | Xm = X - torch.ones(m, 1) * mean
52 | Xm:div(math.sqrt(m - 1))
53 | v,s,_ = torch.svd(Xm:t())
54 | s:cmul(s) -- n
55 |
56 | -- v: eigenvectors, s: eigenvalues of covariance matrix
57 | b = sys.COLORS.blue; n = sys.COLORS.none
58 | print(b .. 'eigenvectors (columns):' .. n); print(v)
59 | print(b .. 'eigenvalues (power/variance):' .. n); print(s)
60 | print(b .. 'sqrt of the above (energy/std):' .. n); print(torch.sqrt(s))
61 |
62 | -- Projection ------------------------------------------------------------------
63 | X_hat = (X - torch.ones(m,1) * mean) * v[{ {},{1} }] -- m x 1
64 |
65 | -- Visualising PCA -------------------------------------------------------------
66 | vv = v * torch.diag(torch.sqrt(s))
67 | vv = torch.cat(torch.ones(2,1) * mean, vv:t())
68 |
69 | gnuplot.plot{
70 |    {'dataset',X,'+'},
71 |    {'PC1',vv[{ {1,1} , {} }],'v'},
72 |    {'PC2',vv[{ {2,2} , {} }],'v'},
73 |    {'reduced',X_hat:squeeze(), torch.zeros(m), '+'}
74 | }
75 | gnuplot.axis('equal')
76 | gnuplot.axis{-20,50,-10,30}
77 | gnuplot.grid(true)
78 |
79 | -- ZCA / spherification / whitening --------------------------------------------
80 | X_rot = (X - torch.ones(m,1) * mean) * v
81 | X_PCA_white = X_rot * torch.sqrt(s):pow(-1):mul(radius):diag()
82 | X_ZCA_white = X_PCA_white * v:t()
83 |
84 | -- Visualising ZCA -------------------------------------------------------------
85 | if showZCA then
86 |    gnuplot.figure(2)
87 |    gnuplot.plot{
88 |       {'dataset',X,'+'},
89 |       -- {'rotated',X_rot,'+'},
90 |       -- {'PCA white',X_PCA_white,'+'},
91 |       {'ZCA white',X_ZCA_white,'+'}
92 |    }
93 |    gnuplot.axis('equal')
94 |    gnuplot.axis{-20,50,-10,30}
95 |    gnuplot.grid(true)
96 | end
97 |
--------------------------------------------------------------------------------
/PCA/src/alignedPerturbation.lua:
--------------------------------------------------------------------------------
1 | --------------------------------------------------------------------------------
2 | -- Data augmentation by aligned perturbation
3 | --------------------------------------------------------------------------------
4 | -- Alfredo Canziani, Aug 14
5 | --------------------------------------------------------------------------------
6 |
7 | -- Instruction -----------------------------------------------------------------
8 | -- This script aims to provide an understanding of how to play with PCA in
9 | -- order to generate plausible fake data for fighting overfitting.
10 | -- No user input is required, but you are very welcome to muck around.
11 |
12 | -- Requires --------------------------------------------------------------------
13 | require 'image'
14 | require 'gnuplot'
15 | require 'sys'
16 |
17 | -- Function definition (skip, not important) -----------------------------------
18 | function rgb(rgb)
19 |    return rgb[1]*256^2 + rgb[2]*256^1 + rgb[3]*256^0
20 | end
21 |
22 | function dumpToFile(colourImage)
23 |    data = io.open('aux/dataPoints.dat','w+')
24 |    for i = 1, colourImage:size(2) do
25 |       for j = 1, colourImage:size(3) do
26 |          data:write(string.format(
27 |             '%f %f %f %d\n',
28 |             colourImage[1][i][j],
29 |             colourImage[2][i][j],
30 |             colourImage[3][i][j],
31 |             rgb(colourImage[{ {},i,j }])
32 |          ))
33 |       end
34 |    end
35 |    data:close()
36 | end
37 |
38 | function gnuplot.colourPxDistribution(image)
39 |    dumpToFile(image)
40 |    plotCmd = io.open('aux/plot3D.plt','r')
41 |    gnuplot.raw(plotCmd:read('*all'))
42 |    plotCmd:close()
43 |    gnuplot.title('3D colourspace pixels distribution')
44 | end
45 |
46 | function image.loadByte(str)
47 |    return image.load(str):mul(255):add(.5):floor()
48 | end
49 |
50 | -- Loading dataset/image
-------------------------------------------------------
51 | -- Load image in byte (0-255) format
52 | img = image.loadByte('aux/peppers.png')
53 |
54 | -- Display the image and the px distribution
55 | image.display{image = img, zoom = 4, legend = 'Original image', min = 0, max = 255}
56 | gnuplot.colourPxDistribution(img)
57 |
58 | -- Rearranging pixel components along 3-column X matrix
59 | imgT = img:transpose(1,2):transpose(2,3):clone()
60 | X = imgT:reshape(img:size(2)*img:size(3),img:size(1))
61 |
62 | -- PCA -------------------------------------------------------------------------
63 | -- X is m x n
64 | mean = torch.mean(X, 1) -- 1 x n
65 | m = X:size(1)
66 | Xm = X - torch.ones(m, 1) * mean
67 | Xm:div(math.sqrt(m - 1))
68 | v,s,_ = torch.svd(Xm:t())
69 | s:cmul(s) -- n
70 |
71 | -- v: eigenvectors, s: eigenvalues of covariance matrix
72 | b = sys.COLORS.blue; n = sys.COLORS.none
73 | print(b .. 'eigenvectors (columns):' .. n); print(v)
74 | print(b .. 'sqrt of eigenvalues (energy/std):' ..
n); print(torch.sqrt(s))
75 |
76 | -- Scaling eigenvectors with corresponding std
77 | vv = v * torch.diag(torch.sqrt(s))
78 |
79 | -- Visualising PCA -------------------------------------------------------------
80 | -- Line style for PCA arrows
81 | gnuplot.raw('set style line 53 lt 1 lc rgb "white" lw 3')
82 |
83 | -- Drawing arrows
84 | mean = mean:squeeze()
85 | for cmp = 1, 3 do
86 |    arrow = mean + vv[{ {},cmp }]
87 |    cmd = string.format(
88 |       'set arrow %d from %f,%f,%f to %f,%f,%f empty ls 53 front',
89 |       3 + cmp,
90 |       mean [1], mean [2], mean [3],
91 |       arrow[1], arrow[2], arrow[3]
92 |    )
93 |    gnuplot.raw(cmd)
94 | end
95 | gnuplot.plotflush()
96 |
97 | -- Aligned perturbation --------------------------------------------------------
98 | collection1 = {}
99 | for i = 1, 12 do
100 |    perturbation = vv * torch.randn(3,1) * 0.2
101 |    X_hat = X + torch.ones(m,1) * perturbation:t()
102 |    img_hatT = X_hat:reshape(img:size(2),img:size(3),img:size(1))
103 |    img_hat = img_hatT:transpose(2,3):transpose(1,2)
104 |    table.insert(collection1,img_hat:clone())
105 | end
106 | image.display{
107 |    image = collection1, legend = 'Aligned perturbation',
108 |    zoom = 4/3, nrow = 4, min = 0, max = 255
109 | }
110 |
111 | -- Disaligned perturbation -----------------------------------------------------
112 | collection2 = {}
113 | for i = 1, 12 do
114 |    perturbation = torch.randn(3,1) * 0.2 * math.sqrt(s:sum())
115 |    X_hat = X + torch.ones(m,1) * perturbation:t()
116 |    img_hatT = X_hat:reshape(img:size(2),img:size(3),img:size(1))
117 |    img_hat = img_hatT:transpose(2,3):transpose(1,2)
118 |    table.insert(collection2,img_hat:clone())
119 | end
120 | image.display{
121 |    image = collection2, legend = 'Disaligned perturbation',
122 |    zoom = 4/3, nrow = 4, min = 0, max = 255
123 | }
124 |
--------------------------------------------------------------------------------
/MLP-regression/src/regression.lua:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
2 | -- Regression with Neural Networks
3 | --------------------------------------------------------------------------------
4 | -- Alfredo Canziani, Jul 14
5 | --------------------------------------------------------------------------------
6 |
7 | -- Instruction -----------------------------------------------------------------
8 |
9 | -- + Monitor the progress
10 | -- You can run this script with small (1e2) values for maxIteration and see
11 | -- how the MLP learns the target function through partial results.
12 | -- Have a look also at how the cost function decreases with the progress of the
13 | -- training. Notice that, being randomly initialised, the output will change if
14 | -- the script is run another time.
15 |
16 | -- + Check each neuron
17 | -- You can also choose a higher (1e4) maxIteration and then set
18 | -- plotIntermediateResults to false. In this way you will be able to check an
19 | -- optimal regressed function along with the output of the three hidden neurons.
20 |
21 | -- Choose a combination (comment the other)
22 | --<<<
23 | plotIntermediateResults = true
24 | maxIteration = 1e2
25 | --<<<>>>
26 | --plotIntermediateResults = false
27 | --maxIteration = 1e4 -- 1e4 for (a) & (b), 1e5 for (c), 1e6 for (d)
28 | -->>>
29 |
30 | -- Requires --------------------------------------------------------------------
31 | require 'nn'
32 | require 'gnuplot'
33 |
34 | -- Define dataset --------------------------------------------------------------
35 | dataset = {}
36 | function dataset:size() return 50 end
37 | x = torch.linspace(-1,1,dataset:size())
38 |
39 | -- Here you can pick the function you want to regress by commenting out the
40 | -- other assignments "y = ..."
41 | --<<< (a)
42 | y = x:clone():pow(2)
43 | --y = x:clone():mul(math.sqrt(2)):pow(2) - 1
44 | --<<< (b) >>>
45 | --y = torch.sin(x * 2.5)
46 | --<<< (c) >>>
47 | --y = torch.abs(x*2)-1
48 | --<<< (d) >>>
49 | --y = x:gt(0):double() * 2 - 1
50 | -->>>
51 |
52 | for i = 1, dataset:size() do
53 |    dataset[i] = {x:reshape(x:size(1),1)[i], y:reshape(y:size(1),1)[i]}
54 | end
55 |
56 | -- Define model architecture ---------------------------------------------------
57 | model = nn.Sequential()
58 | model:add(nn.Linear(1,3))
59 | model:add(nn.Tanh())
60 | model:add(nn.Linear(3,1))
61 |
62 | -- Trainer definition ----------------------------------------------------------
63 | criterion = nn.MSECriterion()
64 | trainer = nn.StochasticGradient(model, criterion)
65 | trainer.learningRate = 0.01
66 | trainer.maxIteration = maxIteration
67 | trainer.verbose = false
68 |
69 | -- Hook iteration function
70 | nbSnapShots = 5
71 | h = torch.Tensor(nbSnapShots+1, x:size(1))
72 | plt = {{'Training data', x, y, '+'}}
73 |
74 | -- Starting condition
75 | if plotIntermediateResults then
76 |    for i = 1, x:size(1) do
77 |       h[#plt][i] = model:forward(x:reshape(x:size(1),1)[i])[1]
78 |    end
79 |    table.insert(plt,{'Iteration ' .. 0, x, h[#plt], '-'})
80 | end
81 | gnuplot.plot(plt)
82 | gnuplot.grid(true)
83 |
84 | nbErrorPoints = 20
85 | costFunction = {}
86 | costIter = {}
87 |
88 | function trainer.hookIteration(train, iteration, currentError)
89 |    if iteration % (train.maxIteration/nbSnapShots) == 0 then
90 |       if plotIntermediateResults then
91 |          for i = 1, x:size(1) do
92 |             h[#plt][i] = model:forward(x:reshape(x:size(1),1))[i]
93 |          end
94 |          table.insert(plt,{'Iteration ' .. iteration, x, h[#plt], '-'})
95 |          gnuplot.figure(1)
96 |          gnuplot.plot(plt)
97 |       end
98 |       print("# Epoch " .. iteration ..
", error: ", currentError) 99 | end 100 | if iteration % (train.maxIteration/nbErrorPoints) == 0 or iteration == 1 then 101 | table.insert(costFunction, currentError) 102 | table.insert(costIter, iteration) 103 | gnuplot.figure(2) 104 | gnuplot.plot{'Cost function', torch.Tensor(costIter), torch.Tensor(costFunction)} 105 | gnuplot.xlabel('Iterations') 106 | gnuplot.grid(true) 107 | end 108 | end 109 | 110 | -- Training -------------------------------------------------------------------- 111 | -- Start timer 112 | timer = torch.Timer() 113 | 114 | -- Training 115 | trainer:train(dataset) 116 | 117 | -- Profiling 118 | t = timer:time().real 119 | print('Time per iteration [ms]: ', t * 1000 / trainer.maxIteration) 120 | print('Total training time [min]: ', t / 60) 121 | 122 | -- Check neurones -------------------------------------------------------------- 123 | if not plotIntermediateResults then 124 | gnuplot.figure(1) 125 | for i = 1, x:size(1) do 126 | h[1][i] = model:forward(x:reshape(x:size(1),1)[i])[1] 127 | h[2][i] = model.modules[2].output[1] 128 | h[3][i] = model.modules[2].output[2] 129 | h[4][i] = model.modules[2].output[3] 130 | end 131 | table.insert(plt,{'Regression', x, h[1], '-'}) 132 | table.insert(plt,{'Neuron 1', x, h[2], '-'}) 133 | table.insert(plt,{'Neuron 2', x, h[3], '-'}) 134 | table.insert(plt,{'Neuron 3', x, h[4], '-'}) 135 | gnuplot.plot(plt) 136 | end 137 | -------------------------------------------------------------------------------- /MLP-regression/README.md: -------------------------------------------------------------------------------- 1 | # Regression with MLP 2 | 3 | Reading Bishop's [*Pattern Recognition and Machine Learning*](http://research.microsoft.com/en-us/um/people/cmbishop/prml/), I got to the point in which a 3-layer [MLP](http://en.wikipedia.org/wiki/Multilayer_perceptron) with **only 3 hidden neurons** (and 1 output linear neuron) was used to regress seamlessly some continuous functions. 
Here's the image
4 |
5 | ![Figure 5.3](img/figure_5.3.png)
6 |
7 | ## Figure 5.3(a)
8 | So, I wanted to reproduce these results, starting from 5.3(a). Notice that *f*_a : [−1,+1] → [0,+1], *x* ↦ *x*² whereas **tanh** : ℝ → [−1,+1], but it looks like **tanh** : ℝ → [0,+1]. Therefore, I assume a scaling and a shifting factor have been applied to the *tanhs* in order to make them look better on paper (or *f*_a(*x*) = (√2 *x*)² − 1).
9 |
10 | ### MLP and neurons' outputs
11 | Running [`src/regression.lua`](src/regression.lua) with `plotIntermediateResults = false` and `maxIteration = 1e4` produces the following result
12 |
13 | ![*x*², regression and neuron's output](img/x2_reg_neu.png)
14 |
15 | Here we can see the output of the 3 hidden neurons and how they are linearly combined to produce the *regression* of the input function. Notice that, in this specific case, only two hidden neurons are actually contributing to the output, since the one with constant output has the same contribution as a bias term.
16 |
17 | ### Transient
18 | The script [`src/regression.lua`](src/regression.lua) can also be run with `plotIntermediateResults = true` and `maxIteration = 1e2`. In this way we can monitor the progress of the training algorithm in its early iterations (i.e. its *transient response*) and see how convergence is reached.
19 |
20 | ![*x*² transient cost function](img/x2_trans_cost.png)
21 | ![*x*² transient regression](img/x2_trans_reg.png)
22 |
23 | ### Run the script
24 | Running the script is pretty simple. All you need is to read the instructions at the top of the [file](src/regression.lua) and run *Torch* interactively.
25 |
26 | ```bash
27 | th -i regression.lua
28 | ```
29 |
30 | ### The algorithm
31 | The majority of [`src/regression.lua`](src/regression.lua) is visualisation stuff. The algorithmic part is pretty small and simple. I will report it here as well, for clarity and reference.
32 |
33 | ```lua
34 | -- Define dataset --------------------------------------------------------------
35 | dataset = {}
36 | function dataset:size() return 50 end
37 | x = torch.linspace(-1,1,dataset:size())
38 | y = x:clone():pow(2)
39 | for i = 1, dataset:size() do
40 |    dataset[i] = {x:reshape(x:size(1),1)[i], y:reshape(y:size(1),1)[i]}
41 | end
42 |
43 | -- Define model architecture ---------------------------------------------------
44 | model = nn.Sequential()
45 | model:add(nn.Linear(1,3))
46 | model:add(nn.Tanh())
47 | model:add(nn.Linear(3,1))
48 |
49 | -- Trainer definition ----------------------------------------------------------
50 | criterion = nn.MSECriterion()
51 | trainer = nn.StochasticGradient(model, criterion)
52 | trainer.learningRate = 0.01
53 |
54 | -- Training --------------------------------------------------------------------
55 | trainer:train(dataset)
56 | ```
57 |
58 | ### Like in Figure 5.3(a)
59 | If we really want to match the result of Figure 5.3(a) we can use *f*_a(*x*) = (√2 *x*)² − 1 and therefore the following assignment for `y`
60 |
61 | ```lua
62 | y = x:clone():mul(math.sqrt(2)):pow(2) - 1
63 | ```
64 |
65 | ![*x*², regression and neuron's output](img/x2_reg_neu_fix.png)
66 |
67 | ## Figure 5.3(b)
68 | Here it is clear, again, that the function sin(*x*) has been manipulated. Precisely, *f*_b(*x*) = sin(2.5∙*x*). Therefore, by choosing case (b) in the code (i.e. commenting out the other function definitions, as explained in the script itself) we can proceed and run it with *Torch7*.
69 |
70 | ### MLP and neurons' outputs
71 | Picking again `plotIntermediateResults = false` and `maxIteration = 1e4` produces decent results
72 |
73 | ![sin(*x*), regression and neuron's output](img/sinx_reg_neu.png)
74 |
75 | ### The algorithm
76 | The only difference is in the creation of the `dataset`, which is now built with a different `y`
77 |
78 | ```lua
79 | y = torch.sin(x * 2.5)
80 | ```
81 |
82 | ## Figure 5.3(c)
83 | Here we are dealing with our first function ∉ 𝒞¹. More specifically, *f*_c(*x*) = |2∙*x*| − 1 and we are trying to approximate it with a linear combination of 3 𝒞¹ functions.
84 |
85 | ### MLP and neurons' outputs
86 | For this reason we need a higher number of iterations, let's say `maxIteration = 1e5`. Hence, after having commented out what needs to be, we can get the following result
87 |
88 | ![|*x*|, regression and neuron's output](img/absx_reg_neu.png)
89 |
90 | ### The algorithm
91 | Again, the only difference here is the `y` assignment and, more precisely
92 |
93 | ```lua
94 | y = torch.abs(x*2)-1
95 | ```
96 |
97 | ## Figure 5.3(d)
98 | In this last case, things get even worse since sign(*x*) is not even continuous (∉ 𝒞⁰)! Nevertheless, we are in a fortunate case, where we can consider sign(*x*) as the limit of tanh(*ax*) for *a* → +∞. Moreover, since sign(*x*) is sampled, *a* can just be "big" and we don't need to deal with dangerous symbols like "+∞".
99 |
100 | ### MLP and neurons' outputs
101 | To reach convergence, we need some more steps in this case. Setting `maxIteration = 1e6` will do the job. Pay attention that this will take approximately 30 minutes on a MacBook Pro with a 2.2 GHz Intel i7.
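
The limit argument made at the start of this section (a "big" *a* is enough because sign(*x*) is only sampled) can be checked numerically. What follows is a minimal sketch in plain Lua, not part of `regression.lua`: the helper names `tanh`, `sign` and `maxGap` are made up for illustration, and the grid is assumed to be the same 50 evenly spaced points on [−1, +1] that the script builds with `torch.linspace`.

```lua
-- Sketch: tanh(a*x) approaches sign(x) on a sampled grid as a grows.
-- Plain Lua, no Torch needed; all names below are illustrative.

-- Overflow-safe tanh (math.tanh is not available in every Lua version)
local function tanh(z)
   return 1 - 2 / (math.exp(2 * z) + 1)
end

-- Assumed grid: 50 evenly spaced points on [-1, 1], as in regression.lua
local nPoints = 50
local x = {}
for i = 1, nPoints do
   x[i] = -1 + 2 * (i - 1) / (nPoints - 1)
end

-- Target of case (d): +1 for x > 0, -1 otherwise
local function sign(v)
   return v > 0 and 1 or -1
end

-- Largest gap between tanh(a*x) and sign(x) over the grid
local function maxGap(a)
   local gap = 0
   for i = 1, nPoints do
      gap = math.max(gap, math.abs(tanh(a * x[i]) - sign(x[i])))
   end
   return gap
end

for _, a in ipairs{1, 10, 100, 1000} do
   print(string.format('a = %d, max |tanh(a x) - sign(x)| = %.3g', a, maxGap(a)))
end
```

Since the grid has an even number of points, no sample falls exactly on *x* = 0, so the worst case sits at the two samples closest to the origin and the gap shrinks quickly once *a* is large compared with the inverse of the grid spacing.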
102 |
103 | ![sign(*x*), regression and neuron's output](img/singx_reg_neu.png)
104 |
105 | ### The algorithm
106 | Same story as before, we just need a different assignment for `y`
107 |
108 | ```lua
109 | y = x:gt(0):double() * 2 - 1
110 | ```
111 |
--------------------------------------------------------------------------------
/PCA/README.md:
--------------------------------------------------------------------------------
1 | # PCA
2 | [*Principal component analysis*](http://en.wikipedia.org/wiki/Principal_component_analysis) (*PCA*) finds the directions of greatest variance in a dataset.
3 |
4 | ## Index
5 | - [Why do we care?](#why-do-we-care)
6 | - [How does it work?](#how-does-it-work)
7 | - [What is used for?](#what-is-used-for)
8 | - [Dimensionality reduction](#dimensionality-reduction)
9 | - [Acquiring knowledge of variance distribution](#acquiring-knowledge-of-variance-distribution)
10 | - [Reducing data dimensionality](#reducing-data-dimensionality)
11 | - [Conclusion](#conclusion)
12 | - [Run the script](#run-the-script)
13 | - [The algorithm](#the-algorithm)
14 | - [Data spherification](#data-spherification)
15 | - [Run the script](#run-the-script-1)
16 | - [The algorithm](#the-algorithm-1)
17 | - [Data augmentation by aligned perturbation](#data-augmentation-by-aligned-perturbation)
18 | - [Introduction](#introduction)
19 | - [Example](#example)
20 | - [Justification](#justification)
21 | - [Run the script](#run-the-script-2)
22 | - [The algorithm](#the-algorithm-2)
23 |
24 | ## Why do we care?
25 | PCA can do a great deal of useful things such as:
26 |
27 | - speed up training;
28 | - 2/3D representation of data living in *n*-D, *n* > 3;
29 | - data augmentation by aligned perturbation;
30 | - ZCA whitening.
31 |
32 | Of course, it can also screw everything up (see [Feldman's blog post](http://blog.explainmydata.com/2012/07/should-you-apply-pca-to-your-data.html), for example).
33 |
34 | ## How does it work?
35 | To find the *first component*, PCA looks for a **linear combination of the elements of your dataset's vectors that explains the highest variation of the data**. Think of it as the *versor* (unit vector) onto which the projection (*dot product*) of the dataset has its highest variability.
36 | For the *second component*, PCA looks for another (linear) combination, i.e. another *versor*, **orthogonal to the first one**, which explains the second highest variation of the data.
37 | For the third, same story. In this case, the combination has to be orthogonal to all the previously found ones. And so on.
38 |
39 | ## What is used for?
40 |
41 | ### Dimensionality reduction
42 | For each *versor*, *principal component* or *eigenvector* there is an associated *power* (or *energy*, if square rooted), *variance* (or *standard deviation*, if square rooted) or *eigenvalue* which tells us the "amount of variability" in that direction. What often happens is that only the first few components have non-negligible variance. Hence, data dimensionality can be greatly reduced with little loss of information.
43 | In turn, dimensionality reduction can be used to perform a series of tricks, such as the *training speed-up* and the *2/3D visualisation of high dimensional data* I mentioned above.
44 |
45 | ### Acquiring knowledge of variance distribution
46 | Knowing the *direction* and the *amount* of variance of our data allows us, by playing smartly with them, to achieve reasonable *data augmentation* and *data spherification*. More about this later in these notes.
47 |
48 | ## Reducing data dimensionality
49 | OK, let's get our hands dirty with PCA.
50 | So, after all this chatting, let's get a bit more specific with a case study.
51 | Let's say we have data living in a 2D space, with an uneven distribution, which we'd like to compress into 1D, i.e. onto a line.
52 | So, this is what the data looks like 53 | 54 | ![Dataset](img/dataset.png) 55 | 56 | This data is said to be *correlated*. This means that the value of one component influences the other component, hence both components are "important". Let's run PCA 57 | 58 | ![PCA](img/data_pca.png) 59 | 60 | ```lua 61 | eigenvectors (colums): 62 | -0.5488 0.8359 63 | 0.8359 0.5488 64 | [torch.DoubleTensor of dimension 2x2] 65 | 66 | eigenvalues (power/variance): 67 | 27.2135 68 | 2.3272 69 | [torch.DoubleTensor of dimension 2] 70 | 71 | sqrt of the above (energy/std): 72 | 5.2167 73 | 1.5255 74 | [torch.DoubleTensor of dimension 2] 75 | ``` 76 | Great. Now we have the direction of highest variability (1st component) and its orthogonal one. 77 | If we consider the data in its new reference system (represented by the two principal components), we can say it is *uncorrelated*. Loosely speaking, this means that one component does not influence the value of the other component. 78 | Let's look at the text output. Here we can see that the total *energy* / *information* of `5.44` (= √[`27.2` + `2.33`]) is spread unevenly across the components: `5.22` on the first and `1.53` on the second one. This means that, if we project the dataset onto the first component and discard the second one, we would retain 92.1% (= `27.2`/[`27.2` + `2.33`]) of the *variance*. 79 | OK, it looks cool. Let's project 80 | 81 | ![Dimensionality reduction](img/dim_reduction.png) 82 | 83 | Great! Now we have data along only 1 dimension which has a variability very close to that of the original data. 84 | Notice how the "spare datapoint" at south-east in the original data is mapped to the far-west in the projected replica. This is because the "green arrow" of the first component is pointing in the opposite direction, hence the projection will be "very negative". 85 | Notice also that the 5 datapoints at north-west in the original data are evenly separated at the far-east of the projected data.
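To double check the figures quoted above, here is a minimal sketch in plain Lua (no Torch needed). The names `eigenvalues`, `total` and `retained` are mine and do not come from `src/PCA.lua`; the two eigenvalues are simply copied from the text output.

```lua
-- Eigenvalues (variances) copied from the text output above
eigenvalues = { 27.2135, 2.3272 }

-- Total variance is the sum of the eigenvalues
total = 0
for _, s in ipairs(eigenvalues) do total = total + s end

-- Fraction of variance retained when keeping the first k components
retained = {}
cumulative = 0
for k, s in ipairs(eigenvalues) do
   cumulative = cumulative + s
   retained[k] = cumulative / total
   print(string.format('first %d component(s): %.1f%%', k, 100 * retained[k]))
end
-- first 1 component(s): 92.1%
-- first 2 component(s): 100.0%

print(string.format('total energy: %.2f', math.sqrt(total)))
-- total energy: 5.44
```

The same loop works for any number of components, which is handy for deciding how many of them to keep in a higher dimensional dataset.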
86 | 87 | ### Conclusion 88 | Cool. Now we know how to reduce data dimensionality. In turn, this means we can speed up our training (by using less input data) and we are able to visualise data living in high-D in 3/2/1D. 89 | 90 | ### Run the script 91 | Running the script is pretty simple. All you need to do is read the instructions at the top of the file and run Torch interactively. 92 | 93 | ``` 94 | th -i PCA.lua 95 | ``` 96 | 97 | ### The algorithm 98 | The script I've used so far is [`src/PCA.lua`](src/PCA.lua). *PCA* and *projection* are shown below. 99 | 100 | ```lua 101 | -- PCA ------------------------------------------------------------------------- 102 | -- X is m x n 103 | mean = torch.mean(X, 1) -- 1 x n 104 | m = X:size(1) -- number of datapoints 105 | Xm = X - torch.ones(m, 1) * mean -- centred data 106 | Xm:div(math.sqrt(m - 1)) -- so that Xm' * Xm is the covariance 107 | v,s,_ = torch.svd(Xm:t()) -- columns of v are the principal components 108 | s:cmul(s) -- n, squared singular values = eigenvalues 109 | 110 | -- Projection ------------------------------------------------------------------ 111 | X_hat = (X - torch.ones(m,1) * mean) * v[{ {},{1} }] -- m x 1 112 | ``` 113 | 114 | `X`, an `m` × `n` matrix, contains our dataset by rows; in this case `m` = `100` and `n` = `2`, i.e. we have `100` `2`-dimensional datapoints. `X_hat` is our data projected onto the *first component* `v[{ {},{1} }]`. 115 | 116 | ## Data spherification 117 | The framework we have just built can also be used to illustrate *ZCA whitening* or *data spherification*. 118 | First of all, what the heck is "data spherification"? Well, in easy terms, it means redistributing the data (along its principal components) so that the variance is the same in every direction, i.e. minimum correlation. This makes sense **only** if the data components are generated by close parents. Otherwise, you have just bought a ticket to doom (see [Feldman's blog post](http://blog.explainmydata.com/2012/07/should-you-apply-pca-to-your-data.html), for example).
This technique is useful for "stripping off" from the data all the "rubbish information" that supposedly won't help the other algorithms down the pipeline. 119 | That said, let's get our cucumber-blob to resemble more of a tomato-blob, starting with a new dataset and its PCA 120 | 121 | ![PCA for ZCA](img/zca_data_pca.png) 122 | 123 | Its *standard deviations* are `5.51` and `1.54`. We can rotate the data by projecting it onto both *principal components*, and then divide each projection by the corresponding standard deviation, thereby obtaining *PCA whitening*. (For the sake of visualisation, I've multiplied the data by 3, hence obtaining a std of `3` per component.) 124 | 125 | ![ZCA rotated data](img/zca_rot.png) 126 | 127 | Note how the "rotated data" is also "flipped" (to make this easier to spot, I've highlighted a group of four datapoints that have a particular L-shape). This happens because the second principal component happened to be on the "negative side" of a "standard" positively oriented reference system (i.e. it's at 90° clockwise rotation from the first component). Therefore, in order to put things back in their right place and keep the original meaning of each component, we ought to rotate the data back into its original reference system. 128 | 129 | ![ZCA whitening](img/zca.png) 130 | 131 | ### Run the script 132 | The script is the same as above. The only thing you need to do is to enable the visualisation of *ZCA*, and tweak the sphere radius, if you so desire. So, open the script [`src/PCA.lua`](src/PCA.lua), read the instructions and change the code accordingly. 133 | 134 | ### The algorithm 135 | What *ZCA* implies is: (1) rotate the data onto its principal components, (2) normalise the variance, (3) rotate back to the original reference system.
In Torch's terms 136 | 137 | ```lua 138 | -- ZCA / spherification / whitening -------------------------------------------- 139 | X_rot = (X - torch.ones(m,1) * mean) * v -- (1) rotate onto the principal components 140 | X_PCA_white = X_rot * torch.sqrt(s):pow(-1):diag() -- (2) normalise the variance 141 | X_ZCA_white = X_PCA_white * v:t() -- (3) rotate back 142 | ``` 143 | 144 | ## Data augmentation by aligned perturbation 145 | 146 | ### Introduction 147 | What does this bombastic title stand for? Well, the concept behind it is actually quite simple. 148 | Put yourself in the situation in which your learning algorithm is *overfitting* the dataset, i.e. it's learning the inherent distribution of the *training dataset* and won't generalise well to the *testing one*. Therefore, you'd like to artificially augment your training dataset by adding some noise in a "smart" way. E.g., you could add centred (`0`-mean) small (`0.2`-std) Gaussian noise along the data principal components, scaled by the square root of the corresponding eigenvalues. In this way, the "general trend" is preserved and the new fictitious observations will be quite plausible. 149 | 150 | ### Example 151 | Let's get hands-on to build some understanding of this nice concept. 152 | Let our dataset be the pixels of a colour (`3` channels) image of `96` rows and `128` columns. 153 | 154 | ![Peppers](img/peppers_img.png) 155 | 156 | Here the pixels are aligned on a plane in a specific order constituting what we call an *image*. Let's throw them into a 3D space, letting the values of their colour components determine their position. Here they are (if you run the code you will have the chance of rotating the 3D scatter plot and getting a better idea of the pixels' 3D distribution). 157 | 158 | ![Peppers pixels' distribution](img/peppers_dst_bf.png) 159 | 160 | Now we can compute the principal components (as said before, if you run the script you'll be able to change the point of view using the mouse, which will help you understand the distribution's shape and the position of the new reference system).
161 | 162 | ![Peppers PCA view 1](img/peppers_PCA1_bf.png) 163 | ![Peppers PCA view 2](img/peppers_PCA2_bf.png) 164 | ![Peppers PCA view 3](img/peppers_PCA3_bf.png) 165 | 166 | Hence, we can add a small amount of centred Gaussian noise along the principal component directions, scaled by the corresponding standard deviation. Here's what the reconstructed image looks like for `12` different draws of the random variable. 167 | 168 | ![Peppers aligned perturbation](img/peppers_aligned.png) 169 | 170 | ### Justification 171 | Someone may ask why we introduce all this framework if, in the end, we simply use "random values". Well… the result of using spherical random values (in contrast to our ellipsoidal approach) is the following. 172 | 173 | ![Peppers disaligned perturbation](img/peppers_disaligned.png) 174 | 175 | It is undeniable that *aligned perturbation* produces far more credible results. What happens is that the component that has greater spread will eventually "move" much more than those that are more localised, in terms of colour space coordinates. 176 | In this specific case (as we can see from the scatter plots of our 3D pixel distribution) the major component (√*s*₁ = `76.2`) closely approximates the *brightness* channel, i.e. the oriented line that goes from (`0`,`0`,`0`) to (`255`,`255`,`255`), even though it is oriented in the opposite direction. Therefore, the highest perturbation will occur in terms of brightness variability, which won't affect the overall appearance of the image, thanks to our visual *brightness invariance*. Furthermore, all perturbations are compliant with the "data distribution shape", hence the output will look more "natural". The remaining two components (√*s*₂ = `43.1` and √*s*₃ = `29.8`), which are orthogonal to the brightness one, will change mainly the *saturation* (average std radius of `52.4`) and less the *hue* (average rotation of `34.7` degrees).
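The two figures quoted for the minor components seem to be recoverable from their standard deviations alone. The plain Lua sketch below is only my reading of where `52.4` and `34.7` come from (the text does not spell it out), and the names `std2`, `std3`, `radius` and `angle` are hypothetical.

```lua
-- Standard deviations of the 2nd and 3rd components, copied from the text
std2, std3 = 43.1, 29.8

-- Assumption: the "average std radius" is the Euclidean norm of the two stds
radius = math.sqrt(std2^2 + std3^2)

-- Assumption: the "average rotation" is the angle of (std2, std3),
-- measured from the 2nd component
angle = math.deg(math.atan(std3 / std2))

print(string.format('radius: %.1f, angle: %.1f degrees', radius, angle))
-- radius: 52.4, angle: 34.7 degrees
```

Both values match the ones given above, which at least makes this interpretation plausible.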
177 | 178 | ### Run the script 179 | In this case, running the script [`src/alignedPerturbation.lua`](src/alignedPerturbation.lua) requires `qlua` for the visualisation of the images. Therefore, we can start an interactive session with 180 | 181 | ```bash 182 | qlua -i alignedPerturbation.lua 183 | ``` 184 | 185 | ### The algorithm 186 | It comprises 3 main parts: (1) loading the dataset, (2) computing PCA (this is the exact code you can read above) plus scaling the eigenvectors by the square root of the corresponding eigenvalues, and (3) adding noise along the principal components. In code we have 187 | 188 | ```lua 189 | -- Loading dataset/image ------------------------------------------------------- 190 | -- Load image in byte (0-255) format 191 | img = image.loadByte('aux/peppers.png') 192 | 193 | -- Rearranging pixel components along 3-column X matrix 194 | imgT = img:transpose(1,2):transpose(2,3):clone() 195 | X = imgT:reshape(img:size(2)*img:size(3),img:size(1)) 196 | 197 | -- PCA ------------------------------------------------------------------------- 198 | -- see above -- 199 | 200 | -- Scaling eigenvectors with corresponding std 201 | vv = v * torch.diag(torch.sqrt(s)) 202 | 203 | -- Aligned perturbation -------------------------------------------------------- 204 | collection1 = {} 205 | for i = 1, 12 do 206 | perturbation = vv * torch.randn(3,1) * 0.2 207 | X_hat = X + torch.ones(m,1) * perturbation:t() 208 | img_hatT = X_hat:reshape(img:size(2),img:size(3),img:size(1)) 209 | img_hat = img_hatT:transpose(2,3):transpose(1,2) 210 | table.insert(collection1,img_hat:clone()) 211 | end 212 | ``` 213 | 214 | The only line that is actually worth mentioning, which constitutes the algorithm itself, is the following 215 | 216 | ```lua 217 | perturbation = vv * torch.randn(3,1) * 0.2 218 | ``` 219 | 220 | `torch.randn(3,1) * 0.2` is a spherical Gaussian random variable with `0.2` std; let's call it __*a*__.
Therefore, we'd like to add to our pixels: *a*₁ ∙ √*s*₁ ∙ __*v*__₁ + *a*₂ ∙ √*s*₂ ∙ __*v*__₂ + *a*₃ ∙ √*s*₃ ∙ __*v*__₃ = [__vv__] ∙ __*a*__, where [__vv__] is the matrix of scaled eigenvectors, i.e. [__vv__] = [__v__] ∙ `diag(`√__*s*__`)`, with [__v__] being the matrix of eigenvectors, stacked side by side as columns, and __*s*__ being the vector of eigenvalues. 221 | --------------------------------------------------------------------------------