├── .gitignore
├── PCA
    ├── img
    │   ├── zca.png
    │   ├── data_pca.png
    │   ├── dataset.png
    │   ├── zca_rot.png
    │   ├── peppers_img.png
    │   ├── dim_reduction.png
    │   ├── peppers_dst_bf.png
    │   ├── zca_data_pca.png
    │   ├── peppers_PCA1_bf.png
    │   ├── peppers_PCA2_bf.png
    │   ├── peppers_PCA3_bf.png
    │   ├── peppers_aligned.png
    │   └── peppers_disaligned.png
    ├── src
    │   ├── aux
    │   │   ├── peppers.png
    │   │   └── plot3D.plt
    │   ├── PCA.lua
    │   └── alignedPerturbation.lua
    └── README.md
├── MLP-regression
    ├── img
    │   ├── figure_5.3.png
    │   ├── x2_reg_neu.png
    │   ├── absx_reg_neu.png
    │   ├── singx_reg_neu.png
    │   ├── sinx_reg_neu.png
    │   ├── x2_trans_cost.png
    │   ├── x2_trans_reg.png
    │   └── x2_reg_neu_fix.png
    ├── src
    │   └── regression.lua
    └── README.md
└── README.md


/.gitignore:
--------------------------------------------------------------------------------
1 | # Files
2 | *.sw*
3 | .DS_Store
4 | /PCA/src/aux/dataPoints.dat
5 | 


--------------------------------------------------------------------------------
/PCA/img/zca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/zca.png


--------------------------------------------------------------------------------
/PCA/img/data_pca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/data_pca.png


--------------------------------------------------------------------------------
/PCA/img/dataset.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/dataset.png


--------------------------------------------------------------------------------
/PCA/img/zca_rot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/zca_rot.png


--------------------------------------------------------------------------------
/PCA/img/peppers_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_img.png


--------------------------------------------------------------------------------
/PCA/src/aux/peppers.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/src/aux/peppers.png


--------------------------------------------------------------------------------
/PCA/img/dim_reduction.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/dim_reduction.png


--------------------------------------------------------------------------------
/PCA/img/peppers_dst_bf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_dst_bf.png


--------------------------------------------------------------------------------
/PCA/img/zca_data_pca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/zca_data_pca.png


--------------------------------------------------------------------------------
/PCA/img/peppers_PCA1_bf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_PCA1_bf.png


--------------------------------------------------------------------------------
/PCA/img/peppers_PCA2_bf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_PCA2_bf.png


--------------------------------------------------------------------------------
/PCA/img/peppers_PCA3_bf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_PCA3_bf.png


--------------------------------------------------------------------------------
/PCA/img/peppers_aligned.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_aligned.png


--------------------------------------------------------------------------------
/PCA/img/peppers_disaligned.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/PCA/img/peppers_disaligned.png


--------------------------------------------------------------------------------
/MLP-regression/img/figure_5.3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/figure_5.3.png


--------------------------------------------------------------------------------
/MLP-regression/img/x2_reg_neu.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/x2_reg_neu.png


--------------------------------------------------------------------------------
/MLP-regression/img/absx_reg_neu.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/absx_reg_neu.png


--------------------------------------------------------------------------------
/MLP-regression/img/singx_reg_neu.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/singx_reg_neu.png


--------------------------------------------------------------------------------
/MLP-regression/img/sinx_reg_neu.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/sinx_reg_neu.png


--------------------------------------------------------------------------------
/MLP-regression/img/x2_trans_cost.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/x2_trans_cost.png


--------------------------------------------------------------------------------
/MLP-regression/img/x2_trans_reg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/x2_trans_reg.png


--------------------------------------------------------------------------------
/MLP-regression/img/x2_reg_neu_fix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Atcold/torch-Machine-learning-with-Torch/HEAD/MLP-regression/img/x2_reg_neu_fix.png


--------------------------------------------------------------------------------
/PCA/src/aux/plot3D.plt:
--------------------------------------------------------------------------------
 1 | set object 1 rectangle from screen 0,0 to screen 1,1 fillcolor rgb "grey" behind
 2 | unset border
 3 | unset xtics
 4 | unset ytics
 5 | unset ztics
 6 | set view equal xyz
 7 | set xyplane 0
 8 | set style line 50 lt 1 lc rgb "red"   lw 3
 9 | set style line 51 lt 1 lc rgb "green" lw 3
10 | set style line 52 lt 1 lc rgb "blue"  lw 3
11 | set arrow 1 from 0,0,0 to 256,0,0 empty ls 50
12 | set arrow 2 from 0,0,0 to 0,256,0 empty ls 51
13 | set arrow 3 from 0,0,0 to 0,0,256 empty ls 52
14 | set view 50, 20
15 | set hidden3d
16 | splot 'aux/dataPoints.dat' using 1:2:3:4 with points pt 7 ps 2 lc rgb variable notitle
17 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Machine learning with Torch
 2 | 
 3 | This repository aims to be a collection of simple machine learning algorithms for [Torch7](http://torch.ch/).
 4 | 
 5 | ## [Regression with MLP](MLP-regression/README.md)
 6 | 
 7 | [*Multilayer perceptrons*](http://en.wikipedia.org/wiki/Multilayer_perceptron) (*MLPs*) are usually used for *pattern recognition* (a *classification* task) in the fields of *image* and *speech* recognition. Nevertheless, they can be effectively used for *regression*. Check out the [`MLP-regression`](MLP-regression) section to find out more about it.
 8 | 
 9 | ## [PCA / KLT](PCA/README.md)
10 | [Principal component analysis](http://en.wikipedia.org/wiki/Principal_component_analysis) (*PCA*), or [Karhunen–Loève transform](http://en.wikipedia.org/wiki/Karhunen%E2%80%93Lo%C3%A8ve_theorem) (*KLT*), allows us to smartly reduce the dimensionality of a *data-space*. It can be used for removing redundancy from the input data (and, therefore, speeding up the learning process) and for visualisation purposes (going from, say, 10 dimensions to 3D, which we can better understand). More details can be found in the [`PCA`](PCA) section.
11 | 


--------------------------------------------------------------------------------
/PCA/src/PCA.lua:
--------------------------------------------------------------------------------
 1 | --------------------------------------------------------------------------------
 2 | -- PCA with Torch7
 3 | --------------------------------------------------------------------------------
 4 | -- Koray Kavukcuoglu (https://github.com/koraykv/unsup)
 5 | -- Alfredo Canziani, Jul, Aug 14
 6 | --------------------------------------------------------------------------------
 7 | 
 8 | -- Instruction -----------------------------------------------------------------
 9 | -- This script aims to provide an understanding of how to play with PCA in
10 | -- order to manipulate data dimensionality (for algorithmic speed or
11 | -- visualisation). Furthermore, it shows how to perform ZCA and PCA whitening.
12 | 
13 | -- Choose a spherification radius (normally set to 1, but bigger values look
14 | -- better in the chart) and enable or disable the ZCA display
15 | --<<<
16 | radius = 3 -- 1, 3 for visualisation sake
17 | showZCA = false -- true/false
18 | -->>>
19 | 
20 | -- You want, perhaps, also to try to enable and disable the visualisation of the
21 | -- rotated data and PCA whitening in the ZCA visualisation section below (line
22 | -- 84 and below)
23 | 
24 | -- Requires --------------------------------------------------------------------
25 | require 'gnuplot'
26 | require 'unsup'
27 | require 'sys'
28 | 
29 | -- Define dataset --------------------------------------------------------------
30 | -- Random 2D data with std ~(1.5,6)
31 | N = 100
32 | math.randomseed(os.time())
33 | x1 = torch.randn(N) * 1.5 + math.random()
34 | x2 = torch.randn(N) * 6 + 2 * math.random()
35 | X = torch.cat(x1, x2, 2) -- Nx2
36 | 
37 | -- Rotating the data randomly
38 | theta = math.random(180) * math.pi / 180
39 | R = torch.Tensor{
40 |    {math.cos(theta), -math.sin(theta)},
41 |    {math.sin(theta),  math.cos(theta)}
42 | }
43 | X = X * R:t()
44 | X[{ {},1 }]:add(25)
45 | X[{ {},2 }]:add(10)
46 | 
47 | -- PCA -------------------------------------------------------------------------
48 | -- X is m x n
49 | mean = torch.mean(X, 1) -- 1 x n
50 | m = X:size(1)
51 | Xm = X - torch.ones(m, 1) * mean
52 | Xm:div(math.sqrt(m - 1))
53 | v,s,_ = torch.svd(Xm:t())
54 | s:cmul(s) -- n
55 | 
56 | -- v: eigenvectors, s: eigenvalues of covariance matrix
57 | b = sys.COLORS.blue; n = sys.COLORS.none
58 | print(b .. 'eigenvectors (columns):' .. n); print(v)
59 | print(b .. 'eigenvalues (power/variance):' .. n); print(s)
60 | print(b .. 'sqrt of the above (energy/std):' .. n); print(torch.sqrt(s))
61 | 
62 | -- Projection ------------------------------------------------------------------
63 | X_hat = (X - torch.ones(m,1) * mean) * v[{ {},{1} }] -- m x 1
64 | 
65 | -- Visualising PCA -------------------------------------------------------------
66 | vv = v * torch.diag(torch.sqrt(s))
67 | vv = torch.cat(torch.ones(2,1) * mean, vv:t())
68 | 
69 | gnuplot.plot{
70 |    {'dataset',X,'+'},
71 |    {'PC1',vv[{ {1,1} , {} }],'v'},
72 |    {'PC2',vv[{ {2,2} , {} }],'v'},
73 |    {'reduced',X_hat:squeeze(), torch.zeros(m), '+'}
74 | }
75 | gnuplot.axis('equal')
76 | gnuplot.axis{-20,50,-10,30}
77 | gnuplot.grid(true)
78 | 
79 | -- ZCA / spherification / whitening --------------------------------------------
80 | X_rot = (X - torch.ones(m,1) * mean) * v
81 | X_PCA_white = X_rot * torch.sqrt(s):pow(-1):mul(radius):diag()
82 | X_ZCA_white = X_PCA_white * v:t()
83 | 
84 | -- Visualising ZCA -------------------------------------------------------------
85 | if showZCA then
86 |    gnuplot.figure(2)
87 |    gnuplot.plot{
88 |       {'dataset',X,'+'},
89 | --    {'rotated',X_rot,'+'},
90 | --    {'PCA white',X_PCA_white,'+'},
91 |       {'ZCA white',X_ZCA_white,'+'}
92 |    }
93 |    gnuplot.axis('equal')
94 |    gnuplot.axis{-20,50,-10,30}
95 |    gnuplot.grid(true)
96 | end
97 | 


--------------------------------------------------------------------------------
/PCA/src/alignedPerturbation.lua:
--------------------------------------------------------------------------------
  1 | --------------------------------------------------------------------------------
  2 | -- Data augmentation by aligned perturbation
  3 | --------------------------------------------------------------------------------
  4 | -- Alfredo Canziani, Aug 14
  5 | --------------------------------------------------------------------------------
  6 | 
  7 | -- Instruction -----------------------------------------------------------------
  8 | -- This script aims to provide an understanding of how to play with PCA in
  9 | -- order to generate plausible fake data for fighting overfitting.
 10 | -- No user input is required, but you are very welcome to muck around.
 11 | 
 12 | -- Requires --------------------------------------------------------------------
 13 | require 'image'
 14 | require 'gnuplot'
 15 | require 'sys'
 16 | 
 17 | -- Function definition (skip, not important) -----------------------------------
 18 | function rgb(rgb)
 19 |    return rgb[1]*256^2 + rgb[2]*256^1 + rgb[3]*256^0
 20 | end
 21 | 
 22 | function dumpToFile(colourImage)
 23 |    data = io.open('aux/dataPoints.dat','w+')
 24 |    for i = 1, colourImage:size(2) do
 25 |       for j = 1, colourImage:size(3) do
 26 |          data:write(string.format(
 27 |             '%f %f %f %d\n',
 28 |             colourImage[1][i][j],
 29 |             colourImage[2][i][j],
 30 |             colourImage[3][i][j],
 31 |             rgb(colourImage[{ {},i,j }])
 32 |          ))
 33 |       end
 34 |    end
 35 |    data:close()
 36 | end
 37 | 
 38 | function gnuplot.colourPxDistribution(image)
 39 |    dumpToFile(image)
 40 |    plotCmd = io.open('aux/plot3D.plt','r')
 41 |    gnuplot.raw(plotCmd:read('*all'))
 42 |    plotCmd:close()
 43 |    gnuplot.title('3D colourspace pixels distribution')
 44 | end
 45 | 
 46 | function image.loadByte(str)
 47 |    return image.load(str):mul(255):add(.5):floor()
 48 | end
 49 | 
 50 | -- Loading dataset/image -------------------------------------------------------
 51 | -- Load image in byte (0-255) format
 52 | img = image.loadByte('aux/peppers.png')
 53 | 
 54 | -- Display the image and the px distribution
 55 | image.display{image = img, zoom = 4, legend = 'Original image', min = 0, max = 255}
 56 | gnuplot.colourPxDistribution(img)
 57 | 
 58 | -- Rearranging pixel components along 3-column X matrix
 59 | imgT = img:transpose(1,2):transpose(2,3):clone()
 60 | X = imgT:reshape(img:size(2)*img:size(3),img:size(1))
 61 | 
 62 | -- PCA -------------------------------------------------------------------------
 63 | -- X is m x n
 64 | mean = torch.mean(X, 1) -- 1 x n
 65 | m = X:size(1)
 66 | Xm = X - torch.ones(m, 1) * mean
 67 | Xm:div(math.sqrt(m - 1))
 68 | v,s,_ = torch.svd(Xm:t())
 69 | s:cmul(s) -- n
 70 | 
 71 | -- v: eigenvectors, s: eigenvalues of covariance matrix
 72 | b = sys.COLORS.blue; n = sys.COLORS.none
 73 | print(b .. 'eigenvectors (columns):' .. n); print(v)
 74 | print(b .. 'sqrt of eigenvalues (energy/std):' .. n); print(torch.sqrt(s))
 75 | 
 76 | -- Scaling eigenvectors with corresponding std
 77 | vv = v * torch.diag(torch.sqrt(s))
 78 | 
 79 | -- Visualising PCA -------------------------------------------------------------
 80 | -- Line style for PCA arrows
 81 | gnuplot.raw('set style line 53 lt 1 lc rgb "white" lw 3')
 82 | 
 83 | -- Drawing arrows
 84 | mean = mean:squeeze()
 85 | for cmp = 1, 3 do
 86 |    arrow = mean + vv[{ {},cmp }]
 87 |    cmd = string.format(
 88 |       'set arrow %d from %f,%f,%f to %f,%f,%f empty ls 53 front',
 89 |       3 + cmp,
 90 |       mean [1], mean [2], mean [3],
 91 |       arrow[1], arrow[2], arrow[3]
 92 |    )
 93 |    gnuplot.raw(cmd)
 94 | end
 95 | gnuplot.plotflush()
 96 | 
 97 | -- Aligned perturbation --------------------------------------------------------
 98 | collection1 = {}
 99 | for i = 1, 12 do
100 |    perturbation = vv * torch.randn(3,1) * 0.2
101 |    X_hat = X + torch.ones(m,1) * perturbation:t()
102 |    img_hatT = X_hat:reshape(img:size(2),img:size(3),img:size(1))
103 |    img_hat = img_hatT:transpose(2,3):transpose(1,2)
104 |    table.insert(collection1,img_hat:clone())
105 | end
106 | image.display{
107 |    image = collection1, legend = 'Aligned perturbation',
108 |    zoom = 4/3, nrow = 4, min = 0, max = 255
109 | }
110 | 
111 | -- Disaligned perturbation -----------------------------------------------------
112 | collection2 = {}
113 | for i = 1, 12 do
114 |    perturbation = torch.randn(3,1) * 0.2 * math.sqrt(s:sum())
115 |    X_hat = X + torch.ones(m,1) * perturbation:t()
116 |    img_hatT = X_hat:reshape(img:size(2),img:size(3),img:size(1))
117 |    img_hat = img_hatT:transpose(2,3):transpose(1,2)
118 |    table.insert(collection2,img_hat:clone())
119 | end
120 | image.display{
121 |    image = collection2, legend = 'Disaligned perturbation',
122 |    zoom = 4/3, nrow = 4, min = 0, max = 255
123 | }
124 | 


--------------------------------------------------------------------------------
/MLP-regression/src/regression.lua:
--------------------------------------------------------------------------------
  1 | --------------------------------------------------------------------------------
  2 | -- Regression with Neural Networks
  3 | --------------------------------------------------------------------------------
  4 | -- Alfredo Canziani, Jul 14
  5 | --------------------------------------------------------------------------------
  6 | 
  7 | -- Instruction -----------------------------------------------------------------
  8 | 
  9 | -- + Monitor the progress
 10 | -- You can run this script with small (1e2) values for <maxIteration> and see
 11 | -- how the MLP learns the target function through <nbSnapShots> partial results.
 12 | -- Have a look also at how the cost function decreases with the progress of the
 13 | -- training. Notice that, being randomly initialised, the output will change if
 14 | -- the script is run another time.
 15 | 
 16 | -- + Check each neuron
 17 | -- You can also choose a higher (1e4) <maxIteration> and then set to <false>
 18 | -- <plotIntermediateResults>. In this way you will be able to check an optimal
 19 | -- regressed function along with the output of the three hidden neurons.
 20 | 
 21 | -- Choose a combination (comment the other)
 22 | --<<<
 23 | plotIntermediateResults = true
 24 | maxIteration = 1e2
 25 | --<<<>>>
 26 | --plotIntermediateResults = false
 27 | --maxIteration = 1e4 -- 1e4 for (a) & (b), 1e5 for (c), 1e6 for (d)
 28 | -->>>
 29 | 
 30 | -- Requires --------------------------------------------------------------------
 31 | require 'nn'
 32 | require 'gnuplot'
 33 | 
 34 | -- Define dataset --------------------------------------------------------------
 35 | dataset = {}
 36 | function dataset:size() return 50 end
 37 | x = torch.linspace(-1,1,dataset:size())
 38 | 
 39 | -- Here you can pick the function you want to regress by commenting out the
 40 | -- other assignments "y = ..."
 41 | --<<< (a)
 42 | y = x:clone():pow(2)
 43 | --y = x:clone():mul(math.sqrt(2)):pow(2) - 1
 44 | --<<< (b) >>>
 45 | --y = torch.sin(x * 2.5)
 46 | --<<< (c) >>>
 47 | --y = torch.abs(x*2)-1
 48 | --<<< (d) >>>
 49 | --y = x:gt(0):double() * 2 - 1
 50 | -->>>
 51 | 
 52 | for i = 1, dataset:size() do
 53 |    dataset[i] = {x:reshape(x:size(1),1)[i], y:reshape(y:size(1),1)[i]}
 54 | end
 55 | 
 56 | -- Define model architecture ---------------------------------------------------
 57 | model = nn.Sequential()
 58 | model:add(nn.Linear(1,3))
 59 | model:add(nn.Tanh())
 60 | model:add(nn.Linear(3,1))
 61 | 
 62 | -- Trainer definition ----------------------------------------------------------
 63 | criterion = nn.MSECriterion()
 64 | trainer = nn.StochasticGradient(model, criterion)
 65 | trainer.learningRate = 0.01
 66 | trainer.maxIteration = maxIteration
 67 | trainer.verbose = false
 68 | 
 69 | -- Hook iteration function
 70 | nbSnapShots = 5
 71 | h = torch.Tensor(nbSnapShots+1, x:size(1))
 72 | plt = {{'Training data', x, y, '+'}}
 73 | 
 74 | -- Starting condition
 75 | if plotIntermediateResults then
 76 |    for i = 1, x:size(1) do
 77 |       h[#plt][i] = model:forward(x:reshape(x:size(1),1)[i])[1]
 78 |    end
 79 |    table.insert(plt,{'Iteration ' .. 0, x, h[#plt], '-'})
 80 | end
 81 | gnuplot.plot(plt)
 82 | gnuplot.grid(true)
 83 | 
 84 | nbErrorPoints = 20
 85 | costFunction = {}
 86 | costIter = {}
 87 | 
 88 | function trainer.hookIteration(train, iteration, currentError)
 89 |    if iteration % (train.maxIteration/nbSnapShots) == 0 then
 90 |       if plotIntermediateResults then
 91 |          for i = 1, x:size(1) do
 92 |             h[#plt][i] = model:forward(x:reshape(x:size(1),1)[i])[1]
 93 |          end
 94 |          table.insert(plt,{'Iteration ' .. iteration, x, h[#plt], '-'})
 95 |          gnuplot.figure(1)
 96 |          gnuplot.plot(plt)
 97 |       end
 98 |       print("# Epoch " .. iteration .. ", error: ", currentError)
 99 |    end
100 |    if iteration % (train.maxIteration/nbErrorPoints) == 0 or iteration == 1 then
101 |       table.insert(costFunction, currentError)
102 |       table.insert(costIter, iteration)
103 |       gnuplot.figure(2)
104 |       gnuplot.plot{'Cost function', torch.Tensor(costIter), torch.Tensor(costFunction)}
105 |       gnuplot.xlabel('Iterations')
106 |       gnuplot.grid(true)
107 |    end
108 | end
109 | 
110 | -- Training --------------------------------------------------------------------
111 | -- Start timer
112 | timer = torch.Timer()
113 | 
114 | -- Training
115 | trainer:train(dataset)
116 | 
117 | -- Profiling
118 | t = timer:time().real
119 | print('Time per iteration [ms]: ', t * 1000 / trainer.maxIteration)
120 | print('Total training time [min]: ', t / 60)
121 | 
122 | -- Check neurones --------------------------------------------------------------
123 | if not plotIntermediateResults then
124 |    gnuplot.figure(1)
125 |    for i = 1, x:size(1) do
126 |       h[1][i] = model:forward(x:reshape(x:size(1),1)[i])[1]
127 |       h[2][i] = model.modules[2].output[1]
128 |       h[3][i] = model.modules[2].output[2]
129 |       h[4][i] = model.modules[2].output[3]
130 |    end
131 |    table.insert(plt,{'Regression', x, h[1], '-'})
132 |    table.insert(plt,{'Neuron 1', x, h[2], '-'})
133 |    table.insert(plt,{'Neuron 2', x, h[3], '-'})
134 |    table.insert(plt,{'Neuron 3', x, h[4], '-'})
135 |    gnuplot.plot(plt)
136 | end
137 | 


--------------------------------------------------------------------------------
/MLP-regression/README.md:
--------------------------------------------------------------------------------
  1 | # Regression with MLP
  2 | 
  3 | Reading Bishop's [*Pattern Recognition and Machine Learning*](http://research.microsoft.com/en-us/um/people/cmbishop/prml/), I got to the point where a 3-layer [MLP](http://en.wikipedia.org/wiki/Multilayer_perceptron) with **only 3 hidden neurons** (and 1 linear output neuron) is used to seamlessly regress a few simple functions. Here's the image
  4 | 
  5 | ![Figure 5.3](img/figure_5.3.png)
  6 | 
  7 | ## Figure 5.3(a)
  8 | Therefore, I wanted to reproduce these results, starting from 5.3(a). Notice that *f*_a : [−1,+1] → [0,+1], *x* ↦ *x*² whereas **tanh** : ℝ → (−1,+1), but in the figure it looks like **tanh** : ℝ → [0,+1]. Therefore, I assume that a scaling and a shift have been applied to the *tanhs* in order to make them look better on paper (or that *f*_a(*x*) = (√2 *x*)² − 1).
  9 | 
 10 | ### MLP and neurons' outputs
 11 | Running [`src/regression.lua`](src/regression.lua) with `plotIntermediateResults = false` and `maxIteration = 1e4` produces the following result
 12 | 
 13 | ![*x*², regression and neuron's output](img/x2_reg_neu.png)
 14 | 
 15 | Here we can see the output of the 3 hidden neurons and how they are linearly combined to produce the *regression* of the input function. Notice that, in this specific case, only two hidden neurons are actually shaping the output, since the one with constant output contributes in the same way as a bias term.
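
To see why a hidden neuron with constant output is equivalent to a bias term, consider the minimal sketch below. It is plain Lua (no Torch needed) with made-up weights, not the trained values behind the figure, and `tanh`, `forward` and `forwardFolded` are hypothetical helper names that do not appear in `regression.lua`.

```lua
-- tanh written with math.exp, since math.tanh is not available in every Lua version
function tanh(z)
   return 1 - 2 / (math.exp(2 * z) + 1)
end

-- Hypothetical parameters of a 1-3-1 network: hidden weights w1, hidden
-- biases b1, output weights w2 and output bias b2. The third hidden neuron
-- has zero input weight, so its output tanh(b1[3]) does not depend on x.
w1 = {  2.0, -2.0, 0.0 }
b1 = { -1.0, -1.0, 3.0 }
w2 = {  0.8,  0.8, 0.5 }
b2 = 0.1

-- Full forward pass: y = b2 + sum_k w2[k] * tanh(w1[k] * x + b1[k])
function forward(x)
   local y = b2
   for k = 1, 3 do
      y = y + w2[k] * tanh(w1[k] * x + b1[k])
   end
   return y
end

-- Same network with the constant neuron folded into the output bias
function forwardFolded(x)
   local y = b2 + w2[3] * tanh(b1[3])
   for k = 1, 2 do
      y = y + w2[k] * tanh(w1[k] * x + b1[k])
   end
   return y
end

for _, x in ipairs({-1, -0.5, 0, 0.5, 1}) do
   print(string.format('x = %5.2f   full = %.6f   folded = %.6f',
      x, forward(x), forwardFolded(x)))
end
```

Both columns agree for every `x` (up to floating-point rounding), which is the sense in which the constant neuron only shifts the output.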
 16 | 
 17 | ### Transient
 18 | The script [`src/regression.lua`](src/regression.lua) can also be run with `plotIntermediateResults = true` and `maxIteration = 1e2`. In this way we can monitor the progress of the training algorithm in its early iterations (i.e. its *transient response*) and see how the convergence is reached.
 19 | 
 20 | ![*x*² transient cost function](img/x2_trans_cost.png)
 21 | ![*x*² transient regression](img/x2_trans_reg.png)
 22 | 
 23 | ### Run the script
 24 | Running the script is pretty simple. All you need to do is read the instructions at the top of the [file](src/regression.lua) and run *Torch* interactively.
 25 | 
 26 | ```bash
 27 | th -i regression.lua
 28 | ```
 29 | 
 30 | ### The algorithm
 31 | The majority of [`src/regression.lua`](src/regression.lua) is visualisation stuff. The algorithmic part is pretty small and simple. I will report it here as well, for clarity and reference.
 32 | 
 33 | ```lua
 34 | -- Define dataset --------------------------------------------------------------
 35 | dataset = {}
 36 | function dataset:size() return 50 end
 37 | x = torch.linspace(-1,1,dataset:size())
 38 | y = x:clone():pow(2)
 39 | for i = 1, dataset:size() do
 40 |    dataset[i] = {x:reshape(x:size(1),1)[i], y:reshape(y:size(1),1)[i]}
 41 | end
 42 | 
 43 | -- Define model architecture ---------------------------------------------------
 44 | model = nn.Sequential()
 45 | model:add(nn.Linear(1,3))
 46 | model:add(nn.Tanh())
 47 | model:add(nn.Linear(3,1))
 48 | 
 49 | -- Trainer definition ----------------------------------------------------------
 50 | criterion = nn.MSECriterion()
 51 | trainer = nn.StochasticGradient(model, criterion)
 52 | trainer.learningRate = 0.01
 53 | 
 54 | -- Training --------------------------------------------------------------------
 55 | trainer:train(dataset)
 56 | ```
 57 | 
 58 | ### Like in Figure 5.3(a)
 59 | If we really want to match the result of Figure 5.3(a) we can use *f*_a(*x*) = (√2 *x*)² − 1 and therefore the following assignment for `y`
 60 | 
 61 | ```lua
 62 | y = x:clone():mul(math.sqrt(2)):pow(2) - 1
 63 | ```
 64 | 
 65 | ![*x*², regression and neuron's output](img/x2_reg_neu_fix.png)
 66 | 
 67 | ## Figure 5.3(b)
 68 | Here it is clear, again, that the function sin(*x*) has been manipulated. More precisely, *f*_b(*x*) = sin(2.5∙*x*). Therefore, by choosing case (b) in the code (i.e. commenting out the other function definitions, as explained in the script itself) we can proceed and run it with *Torch7*.
 69 | 
 70 | ### MLP and neurons' outputs
 71 | Picking again `plotIntermediateResults = false` and `maxIteration = 1e4` produces decent results
 72 | 
 73 | ![sin(*x*), regression and neuron's output](img/sinx_reg_neu.png)
 74 | 
 75 | ### The algorithm
 76 | The only difference is in the creation of the `dataset`, which now is built with a different `y`
 77 | 
 78 | ```lua
 79 | y = torch.sin(x * 2.5)
 80 | ```
 81 | 
 82 | ## Figure 5.3(c)
 83 | Here we are dealing with our first function ∉ 𝒞¹. More specifically, *f*_c(*x*) = |2∙*x*| − 1 and we are trying to approximate it with a linear combination of 3 𝒞¹ functions.
 84 | 
 85 | ### MLP and neurons' outputs
 86 | For this reason we need a higher number of iterations, let's say `maxIteration = 1e5`. Hence, after having commented out what needs to be, we can get the following result
 87 | 
 88 | ![|*x*|, regression and neuron's output](img/absx_reg_neu.png)
 89 | 
 90 | ### The algorithm
 91 | Again here, the only difference is the `y` assignment and, more precisely
 92 | 
 93 | ```lua
 94 | y = torch.abs(x*2)-1
 95 | ```
 96 | 
 97 | ## Figure 5.3(d)
 98 | In this last case, things get even worse since sign(*x*) is not even continuous (∉ 𝒞⁰)! Nevertheless, we are in a fortunate case, where we can consider sign(*x*) to be the limit of tanh(*ax*) as *a* → +∞. Moreover, since sign(*x*) is sampled, *a* can just be "big" and we don't need to deal with dangerous symbols like "+∞".
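
The limit argument can be checked numerically. The following is a minimal plain-Lua sketch (no Torch required); the helper names `tanh`, `sign` and `maxGap` are illustrative and not part of `regression.lua`, and the 50-point grid is assumed to mimic `torch.linspace(-1,1,50)`, which never samples *x* = 0.

```lua
-- tanh written with math.exp, since math.tanh is not available in every Lua version
function tanh(z)
   return 1 - 2 / (math.exp(2 * z) + 1)
end

-- Same target as case (d) of the script: +1 for x > 0, -1 otherwise
function sign(x)
   return x > 0 and 1 or -1
end

-- Largest |tanh(a*x) - sign(x)| over n evenly spaced samples in [-1, 1]
function maxGap(a, n)
   local gap = 0
   for i = 1, n do
      local x = -1 + 2 * (i - 1) / (n - 1)
      gap = math.max(gap, math.abs(tanh(a * x) - sign(x)))
   end
   return gap
end

for _, a in ipairs({1, 10, 100, 1000}) do
   print(string.format('a = %4d   max gap = %.3e', a, maxGap(a, 50)))
end
```

The gap shrinks as `a` grows because the sample closest to the jump sits at |*x*| = 1/49, so a finite but large slope is enough on this grid.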
 99 | 
100 | ### MLP and neurons' outputs
101 | To reach convergence, we need some more steps in this case. Setting `maxIteration = 1e6` will do the job. Be aware that this takes approximately 30 minutes on a MacBook Pro with a 2.2 GHz Intel i7.
102 | 
103 | ![sign(*x*), regression and neuron's output](img/singx_reg_neu.png)
104 | 
105 | ### The algorithm
106 | Same story as before, we just need a different assignment for `y`
107 | 
108 | ```lua
109 | y = x:gt(0):double() * 2 - 1
110 | ```
111 | 


--------------------------------------------------------------------------------
/PCA/README.md:
--------------------------------------------------------------------------------
  1 | # PCA
  2 | [*Principal component analysis*](http://en.wikipedia.org/wiki/Principal_component_analysis) (*PCA*) finds the directions of greatest variance in a dataset.
  3 | 
  4 | ## Index
  5 |  - [Why do we care?](#why-do-we-care)
  6 |  - [How does it work?](#how-does-it-work)
  7 |  - [What is used for?](#what-is-used-for)
  8 |     - [Dimensionality reduction](#dimensionality-reduction)
  9 |     - [Acquiring knowledge of variance distribution](#acquiring-knowledge-of-variance-distribution)
 10 |  - [Reducing data dimensionality](#reducing-data-dimensionality)
 11 |     - [Conclusion](#conclusion)
 12 |     - [Run the script](#run-the-script)
 13 |     - [The algorithm](#the-algorithm)
 14 |  - [Data spherification](#data-spherification)
 15 |     - [Run the script](#run-the-script-1)
 16 |     - [The algorithm](#the-algorithm-1)
 17 |  - [Data augmentation by aligned perturbation](#data-augmentation-by-aligned-perturbation)
 18 |     - [Introduction](#introduction)
 19 |     - [Example](#example)
 20 |     - [Justification](#justification)
 21 |     - [Run the script](#run-the-script-2)
 22 |     - [The algorithm](#the-algorithm-2)
 23 | 
 24 | ## Why do we care?
 25 | PCA can do a great deal of useful things such as:
 26 | 
 27 |  - speed up training;
 28 |  - 2/3D representation of data living in *n*-D, *n* > 3;
 29 |  - data augmentation by aligned perturbation;
 30 |  - ZCA whitening.
 31 | 
 32 | Of course, it can also screw everything up (see [Feldman's blog post](http://blog.explainmydata.com/2012/07/should-you-apply-pca-to-your-data.html), for example).
 33 | 
 34 | ## How does it work?
 35 | To find the *first component*, PCA looks for a **linear combination of the elements of your dataset's vectors which explains the highest variation of the data**. Think of it as the *versor* (unit vector) onto which the projection (*dot product*) of the dataset has its highest variability.
 36 | For the *second component*, PCA looks for another (linear) combination, i.e. another *versor*, **orthogonal to the first one**, which explains the second highest variation of the data.
 37 | For the third, same story: the elements' combination has to be orthogonal to all the previously found ones. And so on.
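
The description above can be checked numerically. The sketch below is only an illustration under stated assumptions: the dataset is a hypothetical synthetic one, and the components are obtained from the eigen-decomposition of the covariance matrix (`torch.symeig`) rather than from the SVD used later in these notes.

```lua
require 'torch'
torch.manualSeed(0)

-- Hypothetical correlated 2D dataset: 200 points, one per row
local m = 200
local X = torch.randn(m, 2) * torch.Tensor({{3, 1}, {1, 2}})

-- Centre the data and build the (2 x 2) covariance matrix
local Xc = X - torch.ones(m, 1) * torch.mean(X, 1)
local C = Xc:t() * Xc / (m - 1)

-- Eigen-decomposition of the symmetric matrix C (eigenvalues in ascending order)
local e, V = torch.symeig(C, 'V')
local v1 = V[{ {}, {2} }]  -- first component: largest eigenvalue
local v2 = V[{ {}, {1} }]  -- second component

-- Variance of the dataset projected onto a versor u
local function projectedVar(u) return torch.var(Xc * u) end

print('variance along v1:', projectedVar(v1), 'largest eigenvalue:', e[2])
print('variance along v2:', projectedVar(v2), 'smallest eigenvalue:', e[1])
print('v1 . v2 =', (v1:t() * v2)[1][1])

-- Random versors never show more variance than the first component
for i = 1, 5 do
   local u = torch.randn(2, 1)
   u:div(u:norm())
   print('variance along a random versor:', projectedVar(u))
end
```

The variance along each component matches its eigenvalue, the two components are orthogonal, and no random direction beats the first one.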
 38 | 
 39 | ## What is used for?
 40 | 
 41 | ### Dimensionality reduction
 42 | For each *versor*, *principal component* or *eigenvector* there is an associated *power* (or *energy*, if square rooted), *variance* (or *standard deviation*, if square rooted) or *eigenvalue*, which tells us the "amount of variability" in that direction. What often happens is that only the first few components have non-negligible variance. Hence, data dimensionality can be greatly reduced with little loss of information.
 43 | In turn, dimensionality reduction enables a series of tricks, such as the *training speed-up* and the *2/3D visualisation of high dimensional data* mentioned above.
 44 | 
 45 | ### Acquiring knowledge of variance distribution
 46 | Knowing the *direction* and the *amount* of variance of our data allows us, by playing smartly with them, to achieve reasonable *data augmentation* and *data spherification*. More on this later in these notes.
 47 | 
 48 | ## Reducing data dimensionality
 49 | OK, let's get our hands dirty with PCA.  
 50 | After all this chatting, let's get a bit more specific with a case study.
 51 | Suppose we have data living in a 2D space, with an uneven distribution, that we'd like to compress into 1D, i.e. onto a line.
 52 | This is what the data looks like
 53 | 
 54 | ![Dataset](img/dataset.png)
 55 | 
 56 | This data is said to be *correlated*. This means that the value of one component influences the other component, hence both components are "important". Let's run PCA
 57 | 
 58 | ![PCA](img/data_pca.png)
 59 | 
 60 | ```lua
 61 | eigenvectors (colums):
 62 | -0.5488  0.8359
 63 |  0.8359  0.5488
 64 | [torch.DoubleTensor of dimension 2x2]
 65 | 
 66 | eigenvalues (power/variance):
 67 |  27.2135
 68 |   2.3272
 69 | [torch.DoubleTensor of dimension 2]
 70 | 
 71 | sqrt of the above (energy/std):
 72 |  5.2167
 73 |  1.5255
 74 | [torch.DoubleTensor of dimension 2]
 75 | ```
 76 | Great. Now we have the direction of highest variability (1st component) and its orthogonal one.  
 77 | If we consider the data in its new reference system (represented by the two principal components), we can say it is *uncorrelated*. Loosely speaking, this means that one component does not influence the value of the other component.  
 78 | Let's look at the text output. Here we can see that the total *energy* / *information* of `5.44` (= √[`27.2` + `2.33`]) is spread unevenly across the components. `5.22` on the first and `1.53` on the second one. This means that, if we project the dataset onto the first component and discard the second one, we would retain 92.1% (= `27.2`/[`27.2` + `2.33`]) of the *variance*.
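
The arithmetic above can be reproduced in a few lines of plain Lua, starting from the eigenvalues printed by the script. This is just a worked check of the numbers quoted in the text.

```lua
-- Eigenvalues (power/variance) as printed above
local s = {27.2135, 2.3272}

local total = s[1] + s[2]
print(string.format('total energy        : %.2f', math.sqrt(total)))              -- 5.44
print(string.format('energy per component: %.2f, %.2f',
                    math.sqrt(s[1]), math.sqrt(s[2])))                             -- 5.22, 1.53
print(string.format('variance retained   : %.1f%%', 100 * s[1] / total))          -- 92.1%
```

Note that it is the variances that add up to the total, not the standard deviations, which is why the retained fraction is computed on the eigenvalues themselves.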
 79 | OK, that looks cool. Let's project
 80 | 
 81 | ![Dimensionality reduction](img/dim_reduction.png)
 82 | 
 83 | Great! Now we have data along only 1 dimension, with a variability very close to that of the original data.
 84 | Notice how the "spare datapoint" at south-east in the original data is mapped to the far west in the projected replica. This is because the "green arrow" of the first component is pointing in the opposite direction, hence the projection will be "very negative".
 85 | Notice also that the 5 datapoints at north-west in the original data are evenly separated at the far east of the projected data.
 86 | 
 87 | ### Conclusion
 88 | Cool. Now we know how to reduce data dimensionality. In turn, this means we can speed up our training (by feeding fewer input dimensions) and we are able to visualise high-dimensional data in 3/2/1D.
 89 | 
 90 | ### Run the script
 91 | Running the script is pretty simple. All you need to do is read the instructions at the top of the file and run Torch interactively.
 92 | 
 93 | ```
 94 | th -i PCA.lua
 95 | ```
 96 | 
 97 | ### The algorithm
 98 | The script I've used so far is [`src/PCA.lua`](src/PCA.lua). *PCA* and *projection* are shown below.
 99 | 
100 | ```lua
101 | -- PCA -------------------------------------------------------------------------
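102 | -- X is m x n, one datapoint per row
103 | mean = torch.mean(X, 1) -- 1 x n, per-column mean
104 | m = X:size(1)
105 | Xm = X - torch.ones(m, 1) * mean -- centred data, m x n
106 | Xm:div(math.sqrt(m - 1)) -- so that Xm' * Xm is the covariance matrix
107 | v,s,_ = torch.svd(Xm:t()) -- v: n x n eigenvectors (columns), s: singular values
108 | s:cmul(s) -- n, squared singular values = eigenvalues (variances)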
109 | 
110 | -- Projection ------------------------------------------------------------------
111 | X_hat = (X - torch.ones(m,1) * mean) * v[{ {},{1} }] -- m x 1
112 | ```
113 | 
114 | `X`, an `m` × `n` matrix, contains our dataset by rows; in this case `m` = `100` and `n` = `2`, i.e. we have `100` `2`-dimensional datapoints. `X_hat` is our data projected onto the *first component* `v[{ {},{1} }]`.
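
To try the snippet without the plotting machinery of `src/PCA.lua`, here is a minimal self-contained sketch. The dataset is a hypothetical synthetic stand-in (not the one shown in the figures), and the two checks at the end are additions for illustration: they verify that the columns of `v` are eigenvectors of the covariance matrix with eigenvalues `s`, and that the variance of the projected data equals the first eigenvalue.

```lua
require 'torch'
torch.manualSeed(0)

-- Hypothetical stand-in for the dataset: m = 100 correlated 2D points, by rows
local m, n = 100, 2
local X = torch.randn(m, n) * torch.Tensor({{3, 1}, {1, 2}})

-- PCA, same steps as above
local mean = torch.mean(X, 1)
local Xm = X - torch.ones(m, 1) * mean
Xm:div(math.sqrt(m - 1))
local v, s = torch.svd(Xm:t())
s:cmul(s)

-- Projection onto the first component
local X_hat = (X - torch.ones(m, 1) * mean) * v[{ {}, {1} }]  -- m x 1

-- Check 1: C * v = v * diag(s), with C the covariance matrix
local C = Xm:t() * Xm  -- covariance, thanks to the 1 / sqrt(m - 1) scaling
print('max |C * v - v * diag(s)|:', (C * v - v * torch.diag(s)):abs():max())

-- Check 2: the projected data carries exactly the first eigenvalue as variance
print('var(X_hat):', torch.var(X_hat), 's[1]:', s[1])
```

Going through the SVD of the scaled, centred data avoids forming the covariance matrix explicitly; the sketch builds `C` only to verify the result.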
115 | 
116 | ## Data spherification
117 | The framework we have just built is also useful for illustrating *ZCA whitening* or *data spherification*.  
118 | First of all, what the heck is "data spherification"? Well, in easy terms, it means rescaling the data along its principal components so that the variance is the same in every direction and the components are uncorrelated. This makes sense **only** if the data components are generated by closely related sources. Otherwise, you are just buying a ticket to doom (see [Feldman's blog post](http://blog.explainmydata.com/2012/07/should-you-apply-pca-to-your-data.html), for example). This technique is useful for stripping the data of all the "rubbish information" that supposedly won't help the job of other algorithms down the pipeline.  
119 | This said, let's get our cucumber-blob to resemble more of a tomato-blob, starting with a new dataset and its PCA
120 | 
121 | ![PCA for ZCA](img/zca_data_pca.png)
122 | 
123 | Its *standard deviations* are `5.51` and `1.54`. We can rotate the data by projecting it onto both *principal components*, and then divide each projection by the corresponding standard deviation, thereby obtaining *PCA whitening*. (For the sake of visualisation, I've multiplied the data by 3, hence obtaining a std of `3` per component.)
124 | 
125 | ![ZCA rotated data](img/zca_rot.png)
126 | 
127 | Note how the "rotated data" is also "flipped" (to make this easier to spot, I've highlighted a group of four datapoints that have a particular L-shape). This happens because the second principal component happened to be on the "negative side" of a "standard" positively oriented reference system (i.e. it's at a 90° clockwise rotation from the first component). Therefore, in order to put things back in their right place and keep the original meaning of each component, we ought to rotate the data back into its original reference system.
128 | 
129 | ![ZCA whitening](img/zca.png)
130 | 
131 | ### Run the script
132 | The script is the same as above. The only thing you need to do is to enable the visualisation of *ZCA* and tweak the sphere radius, if you so desire. So, open the script [`src/PCA.lua`](src/PCA.lua), read the instructions and change the code accordingly.
133 | 
134 | ### The algorithm
135 | What *ZCA* implies is: (1) rotate the data onto its principal components, (2) normalise the variance, (3) rotate back to the original reference system. In Torch's terms
136 | 
137 | ```lua
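138 | -- ZCA / spherification / whitening --------------------------------------------
139 | X_rot = (X - torch.ones(m,1) * mean) * v -- (1) rotate onto the principal components
140 | X_PCA_white = X_rot * torch.sqrt(s):pow(-1):diag() -- (2) unit variance per component
141 | X_ZCA_white = X_PCA_white * v:t() -- (3) rotate back to the original reference system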
142 | ```
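
One way to convince yourself that the three steps do what they claim is to look at the covariance matrix after each of them. The sketch below assumes a hypothetical synthetic dataset and a small helper `cov`, neither of which is part of `src/PCA.lua`.

```lua
require 'torch'
torch.manualSeed(0)

-- Hypothetical correlated 2D dataset, one datapoint per row
local m = 100
local X = torch.randn(m, 2) * torch.Tensor({{3, 1}, {1, 2}})

-- PCA as in the previous section
local mean = torch.mean(X, 1)
local Xc = X - torch.ones(m, 1) * mean
local v, s = torch.svd((Xc / math.sqrt(m - 1)):t())
s:cmul(s)

-- ZCA, same three steps as above
local X_rot = Xc * v
local X_PCA_white = X_rot * torch.diag(torch.sqrt(s):pow(-1))
local X_ZCA_white = X_PCA_white * v:t()

-- Covariance of already-centred data (hypothetical helper)
local function cov(Y) return Y:t() * Y / (m - 1) end

print(cov(X_rot))        -- diagonal, with the eigenvalues s on the diagonal
print(cov(X_PCA_white))  -- identity
print(cov(X_ZCA_white))  -- identity as well, but in the original reference system
```

Both whitened versions have identity covariance; they differ only by the final rotation `v:t()`, which is what keeps each ZCA component aligned with its original axis.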
143 | 
144 | ## Data augmentation by aligned perturbation
145 | 
146 | ### Introduction
147 | What does this bombastic title stand for? Well, the concept behind it is actually quite simple.  
148 | Put yourself in the situation in which your learning algorithm is *overfitting* the dataset, i.e. it's learning the inherent distribution of the *training dataset* and won't generalise well to the *testing one*. Therefore, you'd like to artificially augment your training dataset by adding some noise in a "smart" way. E.g., you could add centred (`0`-mean) small (`0.2`-std) Gaussian noise along the data principal components, scaled by the square root of the corresponding eigenvalues. In this way, the "general trend" is preserved and the new fictitious observations will be quite plausible.
149 | 
150 | ### Example
151 | Let's get hands-on to build some understanding of this nice concept.  
152 | Let our dataset be the pixels of a colour (`3` channels) image of `96` rows and `128` columns.
153 | 
154 | ![Peppers](img/peppers_img.png)
155 | 
156 | Here the pixels are aligned on a plane in a specific order, constituting what we call an *image*. Let's throw them into a 3D space, letting their colour components' values determine their position. Here they are (if you run the code you will have the chance to rotate the 3D scatter plot and get a better idea of the pixels' 3D distribution).
157 | 
158 | ![Peppers pixels' distribution](img/peppers_dst_bf.png)
159 | 
160 | Now we can compute the principal components (as said before, if you run the script you'll be able to change the point of view using the mouse, which will help understand the distribution's shape and the position of the new reference system).
161 | 
162 | ![Peppers PCA view 1](img/peppers_PCA1_bf.png)
163 | ![Peppers PCA view 2](img/peppers_PCA2_bf.png)
164 | ![Peppers PCA view 3](img/peppers_PCA3_bf.png)
165 | 
166 | Hence, we can add a small amount of centred Gaussian noise along the principal component directions, scaled by the corresponding standard deviation. Here's what the reconstructed image looks like for `12` different draws of the random variable.
167 | 
168 | ![Peppers aligned perturbation](img/peppers_aligned.png)
169 | 
170 | ### Justification
171 | Someone may ask why we introduce this whole framework if, in the end, we simply use "random values". Well… the result of using spherical random values (in contrast to our ellipsoidal approach) is the following.
172 | 
173 | ![Peppers disaligned perturbation](img/peppers_disaligned.png)
174 | 
175 | It is undeniable that *aligned perturbation* produces far more credible results. What happens is that the component that has greater spread will eventually "move" much more than those that are more localised, in terms of colour space coordinates.  
176 | In this specific case, as we can see from the 3D scatter plots of the pixels' distribution, the major component (√*s*₁ = `76.2`) closely approximates the *brightness* channel, i.e. the oriented line that goes from (`0`,`0`,`0`) to (`255`,`255`,`255`), even though it is oriented in the opposite direction. Therefore, the highest perturbation will occur in terms of brightness variability, which won't affect the overall appearance of the image, due to our *visual invariance to brightness*. Furthermore, all perturbations are compliant with the "data distribution shape", hence the output will look more "natural". The remaining two components (√*s*₂ = `43.1` and √*s*₃ = `29.8`), which are orthogonal to the brightness one, will change mainly the *saturation* (average std radius of `52.4`) and less the *hue* (average rotation of `34.7` degrees).
177 | 
178 | ### Run the script
179 | In this case, running the script [`src/alignedPerturbation.lua`](src/alignedPerturbation.lua) requires `qlua` for the visualisation of the images. Therefore, we can start an interactive session with
180 | 
181 | ```bash
182 | qlua -i alignedPerturbation.lua
183 | ```
184 | 
185 | ### The algorithm
186 | It comprises 3 main parts: (1) loading the dataset, (2) computing PCA (this is the exact code you can read above) plus scaling the eigenvectors by the corresponding eigenvalues' square root, and (3) adding noise along the principal components. In code we have
187 | 
188 | ```lua
189 | -- Loading dataset/image -------------------------------------------------------
190 | -- Load image in byte (0-255) format
191 | img = image.loadByte('aux/peppers.png')
192 | 
193 | -- Rearranging pixel components along 3-column X matrix
194 | imgT = img:transpose(1,2):transpose(2,3):clone()
195 | X = imgT:reshape(img:size(2)*img:size(3),img:size(1))
196 | 
197 | -- PCA -------------------------------------------------------------------------
198 | -- see above --
199 | 
200 | -- Scaling eigenvectors with corresponding std
201 | vv = v * torch.diag(torch.sqrt(s))
202 | 
203 | -- Aligned perturbation --------------------------------------------------------
204 | collection1 = {}
205 | for i = 1, 12 do
206 |    perturbation = vv * torch.randn(3,1) * 0.2
207 |    X_hat = X + torch.ones(m,1) * perturbation:t()
208 |    img_hatT = X_hat:reshape(img:size(2),img:size(3),img:size(1))
209 |    img_hat = img_hatT:transpose(2,3):transpose(1,2)
210 |    table.insert(collection1,img_hat:clone())
211 | end
212 | ```
213 | 
214 | The only line that is actually worth mentioning, which constitutes the algorithm itself, is the following
215 | 
216 | ```lua
217 | perturbation = vv * torch.randn(3,1) * 0.2
218 | ```
219 | 
220 | `torch.randn(3,1) * 0.2` is a spherical Gaussian random variable with a std of `0.2`; let's call it __*a*__. Therefore, we'd like to add to our pixels: *a*₁ ∙ √*s*₁ ∙ __*v*__₁ + *a*₂ ∙ √*s*₂ ∙ __*v*__₂ + *a*₃ ∙ √*s*₃ ∙ __*v*__₃ = [__vv__] ∙ __*a*__, where [__vv__] is the matrix of scaled eigenvectors, i.e. [__vv__] = [__v__] ∙ `diag(`√__*s*__`)`, with [__v__] being the matrix of eigenvectors, stacked side by side, and __*s*__ being the vector of eigenvalues.
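
The identity between the explicit sum and the matrix product can be checked with a short sketch. The standard deviations are the ones quoted in the *Justification* section, whereas the eigenvectors are **not** those of the peppers image: they are a hypothetical orthonormal basis, generated only so that the example runs without loading the image.

```lua
require 'torch'
torch.manualSeed(0)

-- Eigenvalues from the standard deviations quoted in the text
local s = torch.Tensor({76.2, 43.1, 29.8}):pow(2)

-- Hypothetical eigenvectors: any orthonormal 3 x 3 basis will do for this check
local v = torch.svd(torch.randn(3, 3))  -- first return value, orthonormal columns

-- Scaled eigenvectors and one draw of the random variable a
local vv = v * torch.diag(torch.sqrt(s))
local a = torch.randn(3, 1) * 0.2

-- Matrix form, as in the script
local perturbation = vv * a

-- Explicit sum a_1 sqrt(s_1) v_1 + a_2 sqrt(s_2) v_2 + a_3 sqrt(s_3) v_3
local explicit = torch.zeros(3, 1)
for i = 1, 3 do
   explicit:add(a[i][1] * math.sqrt(s[i]), v[{ {}, {i} }])
end

print((perturbation - explicit):abs():max())  -- zero, up to rounding
```

Since __*a*__ has covariance `0.2`² times the identity, the perturbation has covariance `0.04` ∙ [__v__] ∙ `diag(`__*s*__`)` ∙ [__v__]ᵀ, i.e. a scaled-down copy of the data covariance, which is what makes it "aligned".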
221 | 


--------------------------------------------------------------------------------