├── 01_MATLAB ├── 09_Value_Iteration_Method │ ├── data │ │ ├── environment1.txt │ │ └── environment2.txt │ ├── README.txt │ ├── answers.pdf │ └── code │ │ └── main.m ├── 03_Singular_Value_Decomposition │ ├── RUN.txt │ ├── README.txt │ ├── answers.pdf │ ├── data │ │ └── input1.txt │ ├── code │ │ ├── svd_power.m │ │ └── main.m │ └── README.md ├── 08_K_Mean_Clustering │ ├── README.txt │ ├── answers.pdf │ ├── README.md │ └── code │ │ └── main.m ├── 06_Linear_Regression │ ├── README.txt │ ├── answers.pdf │ ├── data │ │ └── sample_data1.txt │ ├── code │ │ ├── linear_regression.m │ │ └── Assignment4.m │ └── README.md ├── 10_Dynamic_Time_Warping │ ├── README.txt │ ├── answers.pdf │ ├── code │ │ ├── dist.m │ │ ├── main.m │ │ ├── load_data.m │ │ └── dtw.m │ └── README.md ├── 02_Neural_Network │ ├── answers.pdf │ ├── README.txt │ ├── code │ │ ├── main.m │ │ └── neural_network.m │ ├── README.md │ └── data │ │ └── yeast_test.txt ├── 05_K_Nearest_Neigbour │ ├── answers.pdf │ ├── README.txt │ ├── code │ │ ├── normalise.m │ │ ├── main.m │ │ └── knn.m │ └── README.md ├── 07_Logistic_Regression │ ├── Answers.pdf │ ├── README.txt │ ├── code │ │ └── logistic_regression.m │ └── README.md ├── 04_Principal_Component_Analysis │ ├── answers.pdf │ ├── README.txt │ ├── code │ │ ├── power_method.m │ │ ├── compute_X.m │ │ └── main.m │ └── README.md └── 01_Descision_Tree_and_Random_Forest │ ├── answers.pdf │ ├── README.txt │ ├── code │ ├── choose_attribute.m │ ├── get_info_gain.m │ ├── make_tree.m │ ├── main.m │ └── random.m │ └── README.md ├── img └── dnld_rep.png ├── 02_PYTHON ├── 07_Frequentist_Estimate │ ├── answers.pdf │ ├── README.md │ ├── Frequentist_Estimate.ipynb │ └── old_Frequentist_Estimate.ipynb ├── 04_Fitting_1D_Gaussian_to_data │ ├── answers.pdf │ ├── README.md │ └── Fitting_1D_Gaussian_to_data.ipynb ├── 05_Fitting_2D_Gaussian_to_data │ ├── answers.pdf │ ├── README.md │ └── Fitting_2D_Gaussian_to_data.ipynb ├── 06_Error_Function_and_Regularisation │ ├── img │ │ ├── errorfunction.png │ │ └── regularisederror.png │ ├── data │ │ ├── training_outputs1.txt │ │ └── training_inputs1.txt │ ├── README.md │ └── ErrorFunction.ipynb ├── 02_Gaussian_Naive_Bayes │ └── README.md ├── 01_Naive_Bayes_Classifier │ └── README.md └── 03_Mixture_Of_Gaussian_Using_EM_Algortihm │ └── README.md ├── LICENSE └── README.md /01_MATLAB/09_Value_Iteration_Method/data/environment1.txt: -------------------------------------------------------------------------------- 1 | 1.0,X 2 | .,-1.0 -------------------------------------------------------------------------------- /01_MATLAB/09_Value_Iteration_Method/data/environment2.txt: -------------------------------------------------------------------------------- 1 | .,.,.,1.0 2 | .,X,.,-1.0 3 | .,.,.,. 
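The two environment files above encode a small gridworld: a numeric cell is a terminal state carrying that reward, `X` marks a blocked cell, and `.` marks an ordinary cell (this reading is inferred from the files themselves and from the classic 4x3 world that environment2.txt resembles). A minimal Python sketch of parsing such a file and running value iteration over it — illustrative only: the helper names are hypothetical, and deterministic moves are assumed for brevity, whereas the actual assignment may use a stochastic transition model:

```python
def load_grid(path):
    # Rows are comma-separated: numbers = terminal rewards, 'X' = blocked, '.' = ordinary.
    with open(path) as f:
        return [line.strip().split(',') for line in f if line.strip()]

def value_iteration(grid, step_reward=-0.04, gamma=1.0, iterations=20):
    rows, cols = len(grid), len(grid[0])
    V = [[0.0] * cols for _ in range(rows)]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right
    for _ in range(iterations):
        new_V = [row[:] for row in V]
        for r in range(rows):
            for c in range(cols):
                cell = grid[r][c]
                if cell == 'X':
                    continue                             # blocked cell has no value
                if cell != '.':
                    new_V[r][c] = float(cell)            # terminal state: fixed reward
                    continue
                best = float('-inf')
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 'X':
                        nr, nc = r, c                    # bumping a wall: stay put
                    best = max(best, V[nr][nc])
                new_V[r][c] = step_reward + gamma * best
        V = new_V
    return V
```

Under these assumptions, the README invocation main('environment2.txt', -0.04, 1.0, 20) corresponds to value_iteration(load_grid('environment2.txt'), -0.04, 1.0, 20).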
-------------------------------------------------------------------------------- /img/dnld_rep.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/img/dnld_rep.png -------------------------------------------------------------------------------- /01_MATLAB/03_Singular_Value_Decomposition/RUN.txt: -------------------------------------------------------------------------------- 1 | main('input1.txt', 2, 10) 2 | 3 | 4 | 5 | main('input1.txt', 4, 100) -------------------------------------------------------------------------------- /01_MATLAB/09_Value_Iteration_Method/README.txt: -------------------------------------------------------------------------------- 1 | 1) open the main.m file 2 | 2) type main('environment2.txt', -0.04, 1.0, 20) and press Enter 3 | -------------------------------------------------------------------------------- /01_MATLAB/08_K_Mean_Clustering/README.txt: -------------------------------------------------------------------------------- 1 | 1) open the main.m file 2 | 2) type main('yeast_test.txt', 2, 5) in the command window 3 | 3) press Enter -------------------------------------------------------------------------------- /01_MATLAB/06_Linear_Regression/README.txt: -------------------------------------------------------------------------------- 1 | Run linear_regression.m 2 | 3 | 4 | In the command window, type: linear_regression('sample_data1.txt', degree, lambda) -------------------------------------------------------------------------------- /01_MATLAB/10_Dynamic_Time_Warping/README.txt: -------------------------------------------------------------------------------- 1 | 1) open "main.m" 2 | 2) type main('asl_training.txt', 'asl_test.txt') in the command window 3 | 3) press Enter.
-------------------------------------------------------------------------------- /01_MATLAB/02_Neural_Network/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/02_Neural_Network/answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/03_Singular_Value_Decomposition/README.txt: -------------------------------------------------------------------------------- 1 | 2 | In the command window, type: main('input1.txt', 2, 10) 3 | 4 | Here: 5 | M = 2 6 | iterations = 10 7 | 8 | -------------------------------------------------------------------------------- /01_MATLAB/05_K_Nearest_Neigbour/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/05_K_Nearest_Neigbour/answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/06_Linear_Regression/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/06_Linear_Regression/answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/08_K_Mean_Clustering/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/08_K_Mean_Clustering/answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/07_Logistic_Regression/Answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/07_Logistic_Regression/Answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/10_Dynamic_Time_Warping/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/10_Dynamic_Time_Warping/answers.pdf -------------------------------------------------------------------------------- /02_PYTHON/07_Frequentist_Estimate/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/02_PYTHON/07_Frequentist_Estimate/answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/09_Value_Iteration_Method/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/09_Value_Iteration_Method/answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/10_Dynamic_Time_Warping/code/dist.m: -------------------------------------------------------------------------------- 1 | function [cost_value] = dist(x1, y1, x2, y2) 2 | % Euclidean distance between the points (x1, x2) and (y1, y2) 3 | distance = (x1-y1)^2 + (x2-y2)^2; 4 | cost_value = sqrt(distance); 5 | end -------------------------------------------------------------------------------- /01_MATLAB/03_Singular_Value_Decomposition/answers.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/03_Singular_Value_Decomposition/answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/04_Principal_Component_Analysis/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/04_Principal_Component_Analysis/answers.pdf -------------------------------------------------------------------------------- /02_PYTHON/04_Fitting_1D_Gaussian_to_data/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/02_PYTHON/04_Fitting_1D_Gaussian_to_data/answers.pdf -------------------------------------------------------------------------------- /02_PYTHON/05_Fitting_2D_Gaussian_to_data/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/02_PYTHON/05_Fitting_2D_Gaussian_to_data/answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/01_Descision_Tree_and_Random_Forest/answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/01_MATLAB/01_Descision_Tree_and_Random_Forest/answers.pdf -------------------------------------------------------------------------------- /01_MATLAB/03_Singular_Value_Decomposition/data/input1.txt: -------------------------------------------------------------------------------- 1 | 0 1 1 1 0 1 0 0 0 2 | 1 0 0 0 0 1 0 1 0 3 | 1 0 0 0 0 0 0 0 1 4 | 0 1 0 1 0 0 0 0 1 5 | 1 0 0 0 1 0 0 0 0 6 | 0 1 0 0 0 1 1 0 1 7 | 1 0 0 0 0 1 0 0 0 -------------------------------------------------------------------------------- /02_PYTHON/06_Error_Function_and_Regularisation/img/errorfunction.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/02_PYTHON/06_Error_Function_and_Regularisation/img/errorfunction.png -------------------------------------------------------------------------------- /01_MATLAB/07_Logistic_Regression/README.txt: -------------------------------------------------------------------------------- 1 | 1) type logistic_regression(training_filename, degree, testing_filename) 2 | 2) press Enter. 
3 | 3) for example: logistic_regression('pendigits_training.txt', 1, 'pendigits_test.txt') -------------------------------------------------------------------------------- /02_PYTHON/06_Error_Function_and_Regularisation/img/regularisederror.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/milaan9/Machine_Learning_Algorithms_from_Scratch/master/02_PYTHON/06_Error_Function_and_Regularisation/img/regularisederror.png -------------------------------------------------------------------------------- /01_MATLAB/02_Neural_Network/README.txt: -------------------------------------------------------------------------------- 1 | 1) type main('pendigits_training.txt', 'pendigits_test.txt', 2, 20, 50) 2 | where the argument format is: main(training_file, test_file, layers, units_per_layer, rounds) 3 | 4 | 2) press Enter. 5 | -------------------------------------------------------------------------------- /01_MATLAB/04_Principal_Component_Analysis/README.txt: -------------------------------------------------------------------------------- 1 | 2 | Run any one of the following commands: 3 | 4 | main('pendigits_training.txt', 'pendigits_test.txt', 1, 10) 5 | 6 | main('satellite_training.txt', 'satellite_test.txt', 2, 20) 7 | 8 | main('yeast_training.txt', 'yeast_test.txt', 3, 30) -------------------------------------------------------------------------------- /01_MATLAB/05_K_Nearest_Neigbour/README.txt: -------------------------------------------------------------------------------- 1 | 2 | Run the main.m file 3 | 4 | main('pendigits_training.txt','pendigits_test.txt', 1) 5 | 6 | main('pendigits_training.txt','pendigits_test.txt', 3) 7 | 8 | main('pendigits_training.txt','pendigits_test.txt', 5) 9 | 10 | 11 | 12 | -------------------------------------------------------------------------------- /01_MATLAB/03_Singular_Value_Decomposition/code/svd_power.m: -------------------------------------------------------------------------------- 1 | function [U] = svd_power(A, iterations) 2 | D = size(A, 1); 3 | B = ones(D, 1); % start the power iteration from an all-ones vector 4 | for i = 1:iterations 5 | B = (A*B)/norm(A*B); 6 | end 7 | U = B; 8 | end -------------------------------------------------------------------------------- /01_MATLAB/05_K_Nearest_Neigbour/code/normalise.m: -------------------------------------------------------------------------------- 1 | function [data] = normalise(data, std_dev, mean_val) 2 | %disp(data); 3 | for i = 1: (size(data, 2)) 4 | data(:, i) = data(:, i) - mean_val(1, i); 5 | 6 | data(:, i) = data(:, i) / std_dev(1, i); 7 | end 8 | %disp(data); 9 | end -------------------------------------------------------------------------------- /01_MATLAB/02_Neural_Network/code/main.m: -------------------------------------------------------------------------------- 1 | function [] = main(training_file, test_file, layers, units_per_layer, rounds) 2 | obj = neural_network(training_file, test_file, layers, units_per_layer, rounds); 3 | obj = obj.initialise(obj); 4 | for i = 1:obj.rounds 5 | obj = obj.feed_forward(obj, i-1); 6 | end 7 | obj = obj.testing(obj); 8 | end -------------------------------------------------------------------------------- /01_MATLAB/04_Principal_Component_Analysis/code/power_method.m: -------------------------------------------------------------------------------- 1 | function [U] = power_method(covariance, iterations, D) 2 | A = covariance; 3 | B = rand(1, D); 4 | B = transpose(B); 5 | for iteration = 1:iterations 6 | matrix = A*B;
7 | magnitude = norm(matrix); 8 | B = matrix/magnitude; 9 | 10 | end 11 | U = B; 12 | end 13 | -------------------------------------------------------------------------------- /01_MATLAB/04_Principal_Component_Analysis/code/compute_X.m: -------------------------------------------------------------------------------- 1 | function [data] = compute_X(data, U) 2 | 3 | for i = 1:size(data, 1) 4 | %disp(size(U)); 5 | %disp(size(data(i, 1:end))); 6 | term = transpose(U)* transpose(data(i, 1:end))* U; 7 | %disp(size(term)) 8 | data(i, 1:end) = data(i, 1:end) - transpose(term); 9 | end 10 | 11 | end -------------------------------------------------------------------------------- /01_MATLAB/01_Descision_Tree_and_Random_Forest/README.txt: -------------------------------------------------------------------------------- 1 | 1) open "main.m" 2 | 2) in the command window you can type any one of the following and press Enter: 3 | 4 | main('dtree', 'pendigits_training.txt', 'pendigits_test.txt', 'randomized', 50) 5 | 6 | main('dtree', 'pendigits_training.txt', 'pendigits_test.txt', 'optimized', 50) 7 | 8 | main('dtree', 'pendigits_training.txt', 'pendigits_test.txt', 'forest3', 50) 9 | 10 | -------------------------------------------------------------------------------- /01_MATLAB/06_Linear_Regression/data/sample_data1.txt: -------------------------------------------------------------------------------- 1 | 0 -0.3025 2 | 0.0421 0.5519 3 | 0.0842 0.6801 4 | 0.1263 0.4707 5 | 0.1684 0.9281 6 | 0.2105 0.3283 7 | 0.2526 -0.1455 8 | 0.2947 1.0968 9 | 0.3368 0.3703 10 | 0.3789 0.9732 11 | 0.4211 0.7456 12 | 0.4632 0.6848 13 | 0.5053 0.2559 14 | 0.5474 0.0517 15 | 0.5895 -0.6573 16 | 0.6316 0.2664 17 | 0.6737 -0.7675 18 | 0.7158 -0.7983 19 | 0.7579 -0.4087 20 | 0.8000 -1.4177 21 | 22 | 23 | 24 | 25 | -------------------------------------------------------------------------------- /02_PYTHON/06_Error_Function_and_Regularisation/data/training_outputs1.txt: -------------------------------------------------------------------------------- 1 | 3.5269 2 | 5.2430 3 | -0.2659 4 | 3.9385 5 | 4.3225 6 | 0.1213 7 | 1.4293 8 | 2.5223 9 | 3.8194 10 | 4.3168 11 | 0.3155 12 | 3.7798 13 | 4.1128 14 | 3.1523 15 | 4.9872 16 | -0.2744 17 | 1.2272 18 | 4.6418 19 | 2.8333 20 | 4.6780 21 | 2.1960 22 | 0.9208 23 | 3.6260 24 | 4.0712 25 | 1.8112 26 | 2.5611 27 | 2.8856 28 | 2.4767 29 | 3.8546 30 | 1.3331 31 | 2.4832 32 | -0.4488 33 | 0.3009 34 | 1.3362 35 | -1.1489 36 | 4.6507 37 | 3.8073 38 | 2.7776 39 | 3.5709 40 | -0.3047 -------------------------------------------------------------------------------- /01_MATLAB/06_Linear_Regression/code/linear_regression.m: -------------------------------------------------------------------------------- 1 | function [] = linear_regression(filename, degree, lambda) 2 | %filename = 'sample_data1.txt'; 3 | %degree = 1; 4 | %lambda = 0; 5 | object = Assignment4(filename, degree, lambda); 6 | object.load_data(object); 7 | object.calculate_phi(object); 8 | object.calculate_w(object); 9 | for i = 1: size(object.w, 1) 10 | fprintf(' W%d = %.4f\n', i-1, object.w(i, 1)); 11 | end 12 | if degree == 1 13 | fprintf(' W2 = 0\n'); 14 | end 15 | end -------------------------------------------------------------------------------- /01_MATLAB/10_Dynamic_Time_Warping/code/main.m: -------------------------------------------------------------------------------- 1 | function [] = main(train_file, test_file) 2 | %=============================================== 3 | % Reading the files
4 | %=============================================== 5 | [train_data_col1, train_data_col2, train_class, train_lenght] = load_data(train_file); 6 | [test_data_col1, test_data_col2, test_class, test_lenght] = load_data(test_file); 7 | 8 | %=============================================== 9 | % Perform DTW 10 | %=============================================== 11 | dtw(train_data_col1, train_data_col2, train_class, train_lenght, test_data_col1, test_data_col2, test_class, test_lenght) 12 | end -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 milaan9 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE.
22 | -------------------------------------------------------------------------------- /02_PYTHON/06_Error_Function_and_Regularisation/data/training_inputs1.txt: -------------------------------------------------------------------------------- 1 | 0.8147 0.4387 0.3517 2 | 0.9058 0.3816 0.8308 3 | 0.1270 0.7655 0.5853 4 | 0.9134 0.7952 0.5497 5 | 0.6324 0.1869 0.9172 6 | 0.0975 0.4898 0.2858 7 | 0.2785 0.4456 0.7572 8 | 0.5469 0.6463 0.7537 9 | 0.9575 0.7094 0.3804 10 | 0.9649 0.7547 0.5678 11 | 0.1576 0.2760 0.0759 12 | 0.9706 0.6797 0.0540 13 | 0.9572 0.6551 0.5308 14 | 0.4854 0.1626 0.7792 15 | 0.8003 0.1190 0.9340 16 | 0.1419 0.4984 0.1299 17 | 0.4218 0.9597 0.5688 18 | 0.9157 0.3404 0.4694 19 | 0.7922 0.5853 0.0119 20 | 0.9595 0.2238 0.3371 21 | 0.6557 0.7513 0.1622 22 | 0.0357 0.2551 0.7943 23 | 0.8491 0.5060 0.3112 24 | 0.9340 0.6991 0.5285 25 | 0.6787 0.8909 0.1656 26 | 0.7577 0.9593 0.6020 27 | 0.7431 0.5472 0.2630 28 | 0.3922 0.1386 0.6541 29 | 0.6555 0.1493 0.6892 30 | 0.1712 0.2575 0.7482 31 | 0.7060 0.8407 0.4505 32 | 0.0318 0.2543 0.0838 33 | 0.2769 0.8143 0.2290 34 | 0.0462 0.2435 0.9133 35 | 0.0971 0.9293 0.1524 36 | 0.8235 0.3500 0.8258 37 | 0.6948 0.1966 0.5383 38 | 0.3171 0.2511 0.9961 39 | 0.9502 0.6160 0.0782 40 | 0.0344 0.4733 0.4427 -------------------------------------------------------------------------------- /01_MATLAB/10_Dynamic_Time_Warping/code/load_data.m: -------------------------------------------------------------------------------- 1 | function [column_1, column_2, labels, lenght] = load_data(file) 2 | is_last_line = 0; 3 | start=1; 4 | labels = []; 5 | k = 0; 6 | file_instance=fopen(file,'rt'); 7 | line = fopen(file_instance); 8 | while ischar(line) 9 | line = fgetl(file_instance); 10 | if ~isempty(strfind(line,'object')) 11 | k = k+1; 12 | index = k; 13 | end 14 | if ~isempty(strfind(line,'class')) 15 | split_values = split(line,' '); 16 | class = split_values(3); 17 | labels(index) = class; 18 | end 19 | if ~isempty(strfind(line,'----------------')) 20 | is_last_line=0; 21 | end 22 | if is_last_line == 1 23 | if line ~= -1 24 | start = start+1; 25 | split_values = strsplit(string(line)); 26 | column_1(index,start) = double(split_values(1, 2)); 27 | column_2(index,start) = double(split_values(1, 3)); 28 | end 29 | end 30 | if ~isempty(strfind(line,'dominant hand trajectory:')) 31 | is_last_line = 1; 32 | start = 1; 33 | end 34 | end 35 | fclose(file_instance); 36 | lenght = index; 37 | end -------------------------------------------------------------------------------- /01_MATLAB/05_K_Nearest_Neigbour/code/main.m: -------------------------------------------------------------------------------- 1 | function [] = main(train_file, test_file, k) 2 | %=============================================== 3 | % Loading Data 4 | %=============================================== 5 | training_data = load(train_file); 6 | testing_data = load(test_file); 7 | train_target = training_data(:, end); 8 | test_target = testing_data(:, end); 9 | train_data = training_data(:, 1: end-1); 10 | test_data = testing_data(:, 1: end-1); 11 | %=============================================== 12 | % Calculating mean and standard deviation 13 | %=============================================== 14 | mean_of_dimension = mean(train_data); 15 | std_deviation = std(train_data, 1); 16 | %=============================================== 17 | % Normalising Data 18 | %=============================================== 19 | train_data = normalise(train_data, std_deviation, mean_of_dimension); % normalising 
training data 20 | test_data = normalise(test_data, std_deviation, mean_of_dimension); % normalising testing data 21 | %=============================================== 22 | % Performing KNN calculation 23 | %=============================================== 24 | knn(train_data, test_data, train_target, test_target, k); 25 | end -------------------------------------------------------------------------------- /01_MATLAB/06_Linear_Regression/code/Assignment4.m: -------------------------------------------------------------------------------- 1 | classdef Assignment4 < handle 2 | properties 3 | filename; 4 | degree; 5 | lambda; 6 | data; 7 | phi; 8 | w; 9 | end 10 | methods (Static) 11 | function [obj] = Assignment4(filename, degree, lambda) 12 | obj.filename = filename; 13 | obj.degree = degree; 14 | obj.lambda = lambda; 15 | end 16 | 17 | function [obj] = load_data(obj) 18 | obj.data = load(obj.filename); 19 | end 20 | 21 | function [obj] = calculate_phi(obj) 22 | number_of_rows = size(obj.data, 1); 23 | obj.phi = zeros(number_of_rows, (obj.degree)+1); 24 | for row = 1: number_of_rows 25 | obj.phi(row, 1) = 1; 26 | for deg = 1: obj.degree 27 | obj.phi(row, deg+1) = obj.data(row, 1)^deg; 28 | end 29 | end 30 | end 31 | 32 | function [obj] = calculate_w(obj) 33 | if obj.lambda == 0 34 | obj.w = (inv(transpose(obj.phi)* (obj.phi))) * (transpose(obj.phi)) * (obj.data(:,2)); 35 | else 36 | obj.w = (inv((obj.lambda * eye(obj.degree+1)) + transpose(obj.phi)* (obj.phi))) * (transpose(obj.phi)) * (obj.data(:,2)); 37 | end 38 | end 39 | end 40 | end -------------------------------------------------------------------------------- /02_PYTHON/04_Fitting_1D_Gaussian_to_data/README.md: -------------------------------------------------------------------------------- 1 | ### Task 2 | Write code that fits 1-dimensional Gaussians to data. The input file will be a single text file, like the text files in the datasets directory. 3 | A description of the datasets and the file format can be found on this link. 4 | Your code will be given as a command-line argument the path of a text file. 5 | This text file could be any of the six files in the UCI datasets directory, but it could also be ANY OTHER file using the same format as the files in the UCI datasets directory. The link is given below: 6 | 7 | 8 | 
 9 |     UCI dataset directory
10 | 
11 | 12 | As the description states, do NOT use data from the last column (i.e., the class labels) in your calculations. 13 | In these files, all columns except for the last one contain example inputs. The last column contains the class label. 14 | 15 | ### Output 16 | 17 | The output of your code should contain one line for each dimension of each class. Such a line should look like this: 18 | Class %d, dimension %d, mean = %.2f, variance = %.2f 19 | 20 | In your answers.pdf document, provide the output produced by your program when given yeast_training.txt as the input file. 21 | -------------------------------------------------------------------------------- /01_MATLAB/01_Descision_Tree_and_Random_Forest/code/choose_attribute.m: -------------------------------------------------------------------------------- 1 | function[best_attribute,best_threshold,max_gain]=choose_attribute(data,attributes,option) 2 | max_gain = -1; 3 | if isequal(option,'randomized') 4 | attribute = datasample(attributes,1,'Replace',true); 5 | L = min(data(:,attribute)); 6 | M = max(data(:,attribute)); 7 | for K=1:50 8 | threshold= L + K*(M - L)/51; 9 | gain=get_info_gain(data, attribute, threshold); 10 | if gain>max_gain 11 | max_gain=gain; 12 | best_attribute=attribute; 13 | best_threshold=threshold; 14 | end 15 | end 16 | %disp(best_attribute) 17 | 18 | elseif isequal(option,'optimized') 19 | 20 | for attribute = 1:size(data, 2)-1 21 | L=min(data(:, attribute)); 22 | M=max(data(:, attribute)); 23 | for K = 1:50 24 | threshold = L + K*(M-L)/51; 25 | [gain] = get_info_gain(data, attribute, threshold); 26 | if gain > max_gain 27 | max_gain = gain; 28 | best_attribute = attribute; 29 | best_threshold = threshold; 30 | end 31 | end 32 | end 33 | 34 | end 35 | 36 | end -------------------------------------------------------------------------------- /02_PYTHON/02_Gaussian_Naive_Bayes/README.md: -------------------------------------------------------------------------------- 1 | ### Task 2 | 3 | You must implement a program that learns a naive Bayes classifier for a classification problem, given some training data and some additional options. 4 | In particular, your program will be invoked as follows: 5 | - naive_bayes <training_file> <test_file> gaussians 6 | 7 | ### Training: Gaussians 8 | 9 | If the third command-line argument is gaussians, then you should model P(x | class) as a Gaussian separately for each dimension of the data. 10 | The output of the training phase should be a sequence of lines like this: 11 | Class %d, attribute %d, mean = %.2f, std = %.2f 12 | The output lines should be sorted by class number. 13 | Within the same class, lines should be sorted by attribute number. 14 | Attributes should be numbered starting from 0, not from 1. 15 | In certain cases, it is possible that the value computed for the standard deviation is equal to zero. 16 | Your code should make sure that the variance of the Gaussian is NEVER smaller than 0.0001. 17 | Since the variance is the square of the standard deviation, this means that the standard deviation should never be smaller than sqrt(0.0001) = 0.01. 18 | Any time the value for the standard deviation is computed to be smaller than 0.01, your code should replace that value with 0.01. 19 | 20 | In your answers.pdf document, provide the output produced by the training stage of your program when given yeast_training.txt as the input file. 
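The training stage specified above is compact enough to sketch in full. The following Python is illustrative only — the function name and NumPy usage are assumptions, not the repository's code — but it follows the stated rules: one Gaussian per class and attribute, output sorted by class then attribute, attributes numbered from 0, and the 0.01 floor on the standard deviation:

```python
import numpy as np

def train_gaussians(training_file):
    data = np.loadtxt(training_file)
    X, labels = data[:, :-1], data[:, -1].astype(int)
    for cls in np.unique(labels):                  # np.unique returns sorted class numbers
        Xc = X[labels == cls]
        for attr in range(X.shape[1]):             # attributes numbered from 0
            mean = Xc[:, attr].mean()
            std = max(Xc[:, attr].std(), 0.01)     # keeps the variance >= 0.0001
            print("Class %d, attribute %d, mean = %.2f, std = %.2f"
                  % (cls, attr, mean, std))
```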
21 | -------------------------------------------------------------------------------- /02_PYTHON/06_Error_Function_and_Regularisation/README.md: -------------------------------------------------------------------------------- 1 | # Calculating Error Function 2 | 3 | We have a dataset of 40 training examples. 4 | The i-th training example is denoted as (xi, ti), where xi is the example input and ti is the target output. 5 | The inputs xi can be downloaded from training_inputs1.txt. 6 | Each xi is a three-dimensional vector denoted as (xi1, xi2, xi3). 7 | In file training_inputs1.txt, the number at row i and column j is the value for xij. 8 | The target outputs ti can be downloaded from training_outputs1.txt. 9 | Each ti is a real number. Row i of training_outputs1.txt contains the value for ti. 10 | 11 | Error Function formula: 12 | ![ErrorFunction](https://github.com/milaan9/Machine_Learning_Algorithms_from_Scratch/blob/master/02_Python/Error_Function_and_Regularisation/img/errorfunction.png) 13 | 14 | Let w be a three-dimensional vector (w1, w2, w3). 15 | Define y(xi, w) as follows: y(xi, w) = w1*xi1 + w2*xi2 + w3*xi3. 16 | Part a: If w = (3, -1.5, -2), evaluate E(w). 17 | Part b: If w = (5.2, -2, 1), evaluate E(w). 18 | 19 | Regularisation formula: 20 | ![Regularisation](https://github.com/milaan9/Machine_Learning_Algorithms_from_Scratch/blob/master/02_Python/Error_Function_and_Regularisation/img/regularisederror.png) 21 | 22 | Part c: If w = (3, -1.5, -2) and λ = 0.25, evaluate the alternative (regularised) error. 23 | Part d: If w = (5.2, -2, 1) and λ = 0.25, evaluate the alternative (regularised) error. -------------------------------------------------------------------------------- /02_PYTHON/05_Fitting_2D_Gaussian_to_data/README.md: -------------------------------------------------------------------------------- 1 | ### Task 2 | Write code that fits 2-dimensional Gaussians to data. 3 | The input file will be a single text file, like the text files in the UCI datasets directory. 4 | A description of the datasets and the file format can be found on this link. 5 | Your code will be given as a command-line argument the path of a text file. 6 | This text file could be any of the six files in the UCI datasets directory, but it could also be ANY OTHER file using the same format as the files in the datasets directory. 7 | 8 | 
 9 |     UCI dataset directory
10 | 
11 | 12 | A description of the datasets and the file format can be found on the above link. 13 | You should only fit a 2D Gaussian to the first two dimensions of the data. You can ignore the other dimensions. 14 | 15 | ### Output 16 | The output of your code should contain one line for each class. Such a line should look like this: 17 | Class %d, mean = [%.2f, %.2f], sigma = [%.2f, %.2f, %.2f, %.2f] 18 | Note that, in the above output sample, %d is a placeholder for an integer, and %.2f is a placeholder for a number with two decimal digits, following the Java and C printf conventions. With sigma we denote the covariance matrix. 19 | The values of sigma should be printed in this order: [(row=1 col=1), (row=1 col=2), (row=2 col=1), (row=2 col=2)]. 20 | In your answers.pdf document, provide the output produced by your program when given satellite_training.txt as the input file. 21 | -------------------------------------------------------------------------------- /02_PYTHON/07_Frequentist_Estimate/README.md: -------------------------------------------------------------------------------- 1 | ### Task 2 | Implement a simulation, where you estimate the probability of a binary event using a frequentist approach. 3 | The data will be generated by your code using a specific probability distribution, but then your code will ESTIMATE that distribution based on the data that was generated. 4 | Naturally, the estimated distribution will probably not be identical to the true distribution that was used. 5 | First, your code generates a random string S whose length is 3100 characters: S = c1, c2, ..., c3100. 6 | The length of 3100 is based on the "snow in January" example covered in class (it corresponds to 100 years of weather records of January days: 100 × 31 = 3100). 7 | To generate the string S, follow these guidelines: 8 | 9 | Each character ci is either character 'a' or character 'b'. 10 | Each character c should be chosen randomly, so that the prior p(c = 'a') = 0.1. To do this, your code should: 11 | Generate a random number, drawn from a uniform distribution between 0 and 1. 12 | If that random number is less than or equal to 0.1, then the character should be set to 'a'. If the random number is greater than 0.1, the character should be set to 'b'. 13 | You should make sure that the choice of any character is independent of the choice of any other character. In other words: if i != j, then P(ci = 'a' | cj = 'a') = P(ci = 'a'). 14 | After you generate the string S, you should estimate (using the frequentist approach) the probability p(c = 'a'), based on the characters of S. At the end, your code should print out the estimated probability. 
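The whole simulation described above fits in a few lines. This Python sketch is illustrative (the repository's own implementation is in Frequentist_Estimate.ipynb, which is not reproduced here) and follows the stated procedure directly:

```python
import random

# Generate S: 3100 independent characters with prior p(c = 'a') = 0.1.
S = ''.join('a' if random.random() <= 0.1 else 'b' for _ in range(3100))

# Frequentist estimate: the relative frequency of 'a' in S.
estimate = S.count('a') / len(S)
print("p(c = 'a') = %.4f" % estimate)
```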
15 | The program output should follow EXACTLY this format: 16 | p(c = 'a') = %.4f 17 | -------------------------------------------------------------------------------- /01_MATLAB/04_Principal_Component_Analysis/code/main.m: -------------------------------------------------------------------------------- 1 | function [] = main(train_file, test_file, M, iterations) 2 | 3 | %=================================== 4 | % Loading and Initialising Data 5 | %=================================== 6 | train_data = load(train_file); 7 | train_data = double(train_data); 8 | test_data = load(test_file); 9 | test_data = test_data(:, 1:end-1); 10 | U = zeros(size(train_data, 2)-1, M); 11 | data = train_data(:, 1:end-1); 12 | 13 | %=================================== 14 | % Calculating Eigenvector - U 15 | %=================================== 16 | for d = 1:M 17 | 18 | covariance = cov(data, 1); 19 | U(1:end, d) = power_method(covariance, iterations, size(data, 2)); 20 | data = compute_X(data, U(1:end, d)); 21 | 22 | end 23 | 24 | %=================================== 25 | % Printing Eigenvector - U 26 | %=================================== 27 | for i = 1:M 28 | fprintf('Eigenvector %d \n', i); 29 | for j = 1:size(train_data, 2)-1 30 | fprintf('%d : %.4f \n', j, U(j, i)); 31 | end 32 | end 33 | 34 | %========================================================= 35 | % Calculating Projection Matrix and Projection Values 36 | %========================================================= 37 | proj_mat = transpose(U); 38 | proj_value = proj_mat * transpose(test_data); 39 | 40 | %=================================== 41 | % Displaying Results 42 | %=================================== 43 | for i = 1:size(test_data, 1) 44 | fprintf('Test object %d \n', i-1); 45 | for j = 1:M 46 | fprintf('%d : %.4f\n', j, proj_value(j, i)); 47 | end 48 | end 49 | end 50 | -------------------------------------------------------------------------------- /01_MATLAB/01_Descision_Tree_and_Random_Forest/code/get_info_gain.m: -------------------------------------------------------------------------------- 1 | function [gain] = get_info_gain(data, attribute, threshold) 2 | target = data(:,end); 3 | distribution = histc(target,unique(target)); 4 | distribution = distribution / size(target,1); 5 | class_gain = 0; 6 | 7 | for i = 1:size(distribution, 1) 8 | if distribution(i) > 0 9 | class_gain = class_gain + distribution(i) * log2(distribution(i)); 10 | end 11 | end 12 | 13 | class_gain = (-1)*class_gain; 14 | 15 | info_gain_1 = zeros(size(unique(target))); 16 | info_gain_2 = zeros(size(unique(target))); 17 | 18 | find_1 = data(1:end, attribute) < threshold; 19 | find_2 = data(1:end, attribute) >= threshold; 20 | 21 | left_data = data(find_1, :); 22 | right_data = data(find_2, :); 23 | 24 | left_class = histc(left_data(:, end), unique(target)); 25 | right_class = histc(right_data(:, end), unique(target)); 26 | 27 | if isrow(left_class) 28 | left_class = transpose(left_class); 29 | end 30 | 31 | if isrow(right_class) 32 | right_class = transpose(right_class); 33 | end 34 | 35 | left_probab = left_class / sum(left_class); 36 | left_probab(isnan(left_probab)) = 0; 37 | 38 | right_probab = right_class / sum(right_class); 39 | right_probab(isnan(right_probab)) = 0; 40 | 41 | for class = 1: size(unique(target)) 42 | info_gain_1(class, 1) = left_probab(class, 1)*(log2(left_probab(class, 1))); 43 | end 44 | 45 | info_gain_1(isnan(info_gain_1)) = 0; 46 | info_gain_1 = (-1)*info_gain_1; 47 | ig_1 = sum(info_gain_1); 48 | 49 | for class = 1: size(unique(target))
50 | info_gain_2(class, 1) = right_probab(class, 1)*(log2(right_probab(class, 1))); 51 | end 52 | 53 | info_gain_2(isnan(info_gain_2)) = 0; 54 | info_gain_2 = (-1)*info_gain_2; 55 | ig_2 = sum(info_gain_2); 56 | 57 | entropy = (sum(left_class)/size(data, 1))*ig_1 + (sum(right_class)/size(data, 1))*ig_2; 58 | 59 | gain = class_gain - entropy; 60 | 61 | 62 | end 63 | -------------------------------------------------------------------------------- /01_MATLAB/06_Linear_Regression/README.md: -------------------------------------------------------------------------------- 1 | ### Task 2 | In this task you will implement linear regression. 3 | The data used is in the Data folder. 4 | 5 | ### Command-line Arguments 6 | 7 | You must implement a program that uses linear regression to fit a line or a second-degree polynomial to a set of training data. 8 | Your program can be invoked as follows: 9 | linear_regression <training_file> <degree> <λ> 10 | The arguments provide to the program the following information: 11 | The first argument is the path name of the training file, where the training data is stored. 12 | The path name can specify any file stored on the local computer. 13 | The second argument is a number. This number should be either 1 or 2. 14 | We will not test your code with any other values. If the number is 1, you should fit a line to the data. 15 | If the number is 2, you should fit a second-degree polynomial to the data. 16 | The third argument is a non-negative real number (it can be zero or greater than zero). 17 | This is the value of λ that you should use for regularization. If λ = 0, then no regularization is used. 18 | The training file is a text file, containing data in tabular format. Each value is a number, and values are separated by white space. 19 | Each row contains two numbers: the first of those numbers is the training input, and the second of those numbers is the target output. 20 | 21 | ### Output 22 | At the end, your program should print out the values of the weights that you have estimated. 23 | 24 | For C or C++, use: 25 | printf("w0=%.4lf\n", w0); 26 | printf("w1=%.4lf\n", w1); 27 | printf("w2=%.4lf\n", w2); 28 | For any other language, just make sure that you use formatting specifiers that produce aligned output that matches EXACTLY the specs given above for C. Note that you print the value of w2 even when the command-line degree argument is 1. In that case, just print 0 for w2. 
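The weights requested here are the standard regularised least-squares solution w = (λI + ΦᵀΦ)⁻¹Φᵀt, where row i of Φ is (1, xi, ..., xi^degree); this is what calculate_w in Assignment4.m (shown earlier in this listing) computes. An equivalent Python sketch, for illustration only — the function name and NumPy usage are assumptions:

```python
import numpy as np

def fit_polynomial(training_file, degree, lam):
    data = np.loadtxt(training_file)
    x, t = data[:, 0], data[:, 1]
    Phi = np.vander(x, degree + 1, increasing=True)   # columns: 1, x, ..., x^degree
    w = np.linalg.solve(lam * np.eye(degree + 1) + Phi.T @ Phi, Phi.T @ t)
    for i, wi in enumerate(w):
        print("w%d=%.4f" % (i, wi))
    if degree == 1:
        print("w2=0.0000")                            # w2 is printed even for degree 1
    return w
```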
29 | 30 | ### Running the Program 31 | Run linear_regression.m 32 | In the command window, type: 33 | - linear_regression('sample_data1.txt', degree, lambda) -------------------------------------------------------------------------------- /01_MATLAB/01_Descision_Tree_and_Random_Forest/code/make_tree.m: -------------------------------------------------------------------------------- 1 | function[tree,threshold,gain] = make_tree(data,pruning_threshold,option,attributes,class_max,tree,threshold,gain,index) 2 | target=(data(:,end)); 3 | if size(data,1) <= pruning_threshold 4 | distribution=zeros(class_max,1); 5 | target = data(1:end,end); 6 | 7 | for j=1:size(distribution,1) 8 | column = find(target==j); 9 | probclass=size(column,1)/size(target,1); 10 | distribution(j,1)=probclass; 11 | end 12 | 13 | [m,i]=max(distribution); 14 | 15 | tree(index)=i; 16 | 17 | threshold(index)=-1; 18 | 19 | gain(index)=-1; 20 | 21 | elseif size(unique(target),1)==1 22 | 23 | tree(index)=target(1, 1); 24 | threshold(index)=-1; 25 | gain(index)=-1; 26 | 27 | else 28 | [best_attri,best_thresh,best_gain]= choose_attribute(data, attributes,option); 29 | 30 | tree(index)=best_attri; 31 | 32 | threshold(index)=best_thresh; 33 | 34 | gain(index)=best_gain; 35 | 36 | ryt=1; 37 | 38 | left=1; 39 | 40 | for i=1:size(data,1) 41 | if data(i,best_attri) >= best_thresh 42 | rightdata(ryt,:)=data(i,1:end); 43 | ryt = ryt + 1; 44 | else 45 | leftdata(left,:) = data(i,1:end); 46 | left = left + 1; 47 | end 48 | end 49 | 50 | if exist('leftdata') 51 | [tree,threshold,gain]= make_tree(leftdata,pruning_threshold,option,attributes,class_max,tree,threshold,gain,2*index); 52 | 53 | else 54 | tree(2*index)=1; 55 | threshold(2*index)=-1; 56 | gain(2*index)=-1; 57 | end 58 | 59 | if exist('rightdata') 60 | [tree,threshold,gain] = make_tree(rightdata,pruning_threshold,option,attributes,class_max,tree,threshold,gain,(2*index)+1); 61 | else 62 | tree((2*index)+1)=1; 63 | threshold((2*index)+1)=-1; 64 | gain((2*index)+1)=-1; 65 | end 66 | end 67 | end -------------------------------------------------------------------------------- /01_MATLAB/08_K_Mean_Clustering/README.md: -------------------------------------------------------------------------------- 1 | ### Task 1 2 | In this task you will implement k-means clustering. 3 | 4 | ### Command-line Arguments 5 | Your program will be invoked as follows: 6 | k_means_cluster <data_file> <k> <iterations> 7 | The arguments provide to the program the following information: 8 | The first argument, <data_file>, is the path name of a file where the data is stored. 9 | The path name can specify any file stored on the local computer. 10 | The second argument, <k>, specifies the number of clusters. 11 | The third argument, <iterations>, specifies the number of iterations of the main loop. 12 | The initialization stage (giving a random assignment of objects to clusters, and computing the means of those random assignments) does not count as an iteration. 13 | The data file will follow the same format as the training and test files in the UCI datasets directory. 14 | A description of the datasets and the file format can be found on this link. 15 | Your code should also work with ANY OTHER training and test files using the same format as the files in the UCI datasets directory. 16 | As the description states, do NOT use data from the last column (i.e., the class labels) as features. 17 | In these files, all columns except for the last one contain example inputs. The last column contains the class label. 18 | The link to the dataset is given below: 19 | 20 | 
21 |     UCI dataset directory
22 | 
23 | 24 | ### Implementation Guidelines 25 | Use the L2 distance (the Euclidean distance) for computing the distance between any two objects in the dataset. 26 | 27 | ### Output 28 | 29 | After the initialization stage, and after each iteration, you should print the value E(S1,S2,...,SK), as defined on page 27 of the clustering slides. 30 | The output should follow this format: 31 | 32 | After initialization: error = %.4f 33 | After iteration 1: error = %.4f 34 | After iteration 2: error = %.4f 35 | ... 36 | 37 | ### Running The Program 38 | 1) open the main.m file 39 | 2) type main('yeast_test.txt', 2, 5) in the command window 40 | 3) Press Enter 41 | -------------------------------------------------------------------------------- /01_MATLAB/05_K_Nearest_Neigbour/code/knn.m: -------------------------------------------------------------------------------- 1 | function [] = knn(train_data, test_data, train_target, test_target, k) 2 | classification_accuracy = 0; 3 | for i = 1:size(test_data, 1) 4 | %============================================ 5 | % Calculating Euclidean Distance 6 | %============================================ 7 | D = test_data(i, :) - train_data(: , :); 8 | D = D.^2; 9 | dist_mat = sum(D, 2); 10 | dist_mat = sqrt(dist_mat); 11 | dist = [dist_mat train_target]; 12 | %============================================ 13 | % Sorting Row according to minimum distance 14 | %============================================ 15 | dist = sortrows(dist, 1); 16 | %============================================ 17 | % If K value is 1 Print Results 18 | %============================================ 19 | if k == 1 20 | k_neighbours = dist(k, :); 21 | predicted = k_neighbours(1, 2); 22 | true = test_target(i, 1); 23 | if true == predicted 24 | accuracy = 1; 25 | classification_accuracy = classification_accuracy + accuracy; 26 | else 27 | accuracy = 0; 28 | end 29 | fprintf('ID=%5d, predicted=%3d, true=%3d, accuracy=%4.2f \n', i, predicted, true, accuracy) 30 | %=============================================== 31 | % Else K value is greater than 1 Print Results 32 | %=============================================== 33 | else 34 | k_neighbours = dist(1:k, :); 35 | if size(unique(k_neighbours(:, 2)), 1) == 1 36 | predicted = unique(k_neighbours(:, 2)); 37 | elseif size(unique(k_neighbours(:, 2)), 1) == k 38 | predicted = k_neighbours(1, 2); 39 | else 40 | predicted = mode(k_neighbours(:, 2)); 41 | end 42 | true = test_target(i, 1); 43 | if true == predicted 44 | accuracy = 1; 45 | classification_accuracy = classification_accuracy + accuracy; 46 | else 47 | accuracy = 0; 48 | end 49 | fprintf('ID=%5d, predicted=%3d, true=%3d, accuracy=%4.2f \n', i, predicted, true, accuracy) 50 | end 51 | end 52 | fprintf('classification_accuracy=%6.4f \n', classification_accuracy/size(test_target, 1)) 53 | end 54 | -------------------------------------------------------------------------------- /01_MATLAB/01_Descision_Tree_and_Random_Forest/code/main.m: -------------------------------------------------------------------------------- 1 | function [] = main(string, training_file, testing_file, option, pruning_thr) 2 | there = strfind(option,'forest'); 3 | if ~isempty(there) 4 | %disp('in if') 5 | option = 'forest3'; 6 | end 7 | if isequal(option,'forest3') 8 | random(training_file, testing_file, 'randomized', pruning_thr); 9 | 10 | else 11 | %train_file = 'pendigits_training.txt'; 12 | %test_file = 'pendigits_test.txt'; 13 | train_file = training_file; 14 | test_file = testing_file; 15 | train_data = load(train_file); 16 | test_data = 
load(test_file); 17 | target = train_data(:, end); 18 | test_target = test_data(:, end); 19 | unique_class = unique(target); 20 | pruning_thrsld = pruning_thr; 21 | class_max = max(target); 22 | tree = []; 23 | thrsldeshold = []; 24 | gainin = []; 25 | index = 1; 26 | classification_acc = 0; 27 | 28 | attributes = zeros(1, size(train_data, 2)-1); 29 | 30 | for col = 1: size(train_data, 2)-1 31 | attributes(1, col) = col; 32 | end 33 | 34 | %attributes = [1, 2, 3, 4, 5 ,6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]; 35 | 36 | [tree,thrsldeshold,gainin] = make_tree(train_data,pruning_thrsld,option,attributes,class_max,tree,thrsldeshold,gainin,index); 37 | 38 | loop_end = size(tree,2); 39 | 40 | for i=1:loop_end 41 | if (tree(:,i)-1) ~= -1 42 | fprintf('tree=%2d, node=%3d, feature=%2d, thr=%6.2f, gain=%f\n',0,i,tree(:,i)-1,thrsldeshold(:,i),gainin(:,i)); 43 | end 44 | end 45 | 46 | loop_end = size(test_data,1); 47 | for row=1:loop_end 48 | index=1; 49 | is_leaf=1; 50 | while is_leaf == 1 51 | attr=tree(index); 52 | thrsld=thrsldeshold(index); 53 | gain=gainin(index); 54 | if thrsld~=-1 && gain~=-1 55 | if test_data(row,attr) < thrsld 56 | index = (2*index); 57 | else 58 | index = (2*index)+1; 59 | end 60 | else 61 | test_label = test_data(row,end); 62 | if attr~=test_label 63 | acc = 0; 64 | else 65 | acc = 1; 66 | end 67 | classification_acc=classification_acc+acc; 68 | fprintf('ID=%5d, predicted=%3d, true=%3d, accuracy=%4.2f\n', row, attr, test_label, acc); 69 | is_leaf=0; 70 | 71 | end 72 | end 73 | end 74 | 75 | fprintf('classification accuracy=%6.4f\n',classification_acc/size(test_data,1)); 76 | 77 | end 78 | 79 | end 80 | 81 | -------------------------------------------------------------------------------- /01_MATLAB/08_K_Mean_Clustering/code/main.m: -------------------------------------------------------------------------------- 1 | function [] = main(file, k, iterations) 2 | %======================================= 3 | % Initialising Data 4 | %======================================= 5 | % k = 3; 6 | % iterations = 20; 7 | % file = 'yeast_test.txt'; 8 | data = load(file); 9 | rows = size(data, 1); 10 | cols = size(data, 2); 11 | data = data(1:rows, 1:cols-1); 12 | clusters = randi([1 k], rows, 1); 13 | clustered_data = [data clusters]; 14 | mean_matrix = zeros(k, cols-1); 15 | 16 | %======================================= 17 | % Calculating Mean 18 | %======================================= 19 | for i = 1:k 20 | index = clustered_data(:, end) == i; 21 | indexed_data = clustered_data(index, 1:end-1); 22 | mean_matrix(i, :) = mean(indexed_data); 23 | 24 | end 25 | 26 | %======================================= 27 | % Computing Error 28 | %======================================= 29 | error = get_error(clustered_data, mean_matrix); 30 | fprintf('After initialization: error = %.4f \n', error); 31 | 32 | %======================================= 33 | % Starting Iterations 34 | %======================================= 35 | for p = 1:iterations 36 | for q = 1:rows 37 | %======================================= 38 | % Deciding which Cluster data belongs 39 | %======================================= 40 | dist = get_euclidean(data(q, :), mean_matrix); 41 | [minimum_row, minimum_col] = min(dist); 42 | clusters(q) = minimum_col; 43 | end 44 | clustered_data = [data, clusters]; 45 | 46 | %======================================= 47 | % Calculating Mean 48 | %======================================= 49 | for i = 1:k 50 | index = clustered_data(:, end) == i; 51 | indexed_data = clustered_data(index, 1:end-1); 52 
| mean_matrix(i, :) = mean(indexed_data); 53 | end 54 | %======================================= 55 | % Computing Error 56 | %======================================= 57 | error = get_error(clustered_data, mean_matrix); 58 | fprintf('After iteration %d: error = %.4f \n', p, error); 59 | end 60 | end 61 | 62 | function [error] = get_error(data, mean_matrix) 63 | %======================================= 64 | % Computing Error 65 | %======================================= 66 | error = 0; 67 | for j = 1: size(data, 1) 68 | c = data(j, end); 69 | dist = get_euclidean(data(j, 1:end-1), mean_matrix(c, 1:end)); 70 | error = error + dist; 71 | end 72 | end 73 | 74 | function [distance_matrix] = get_euclidean(data, mean_matrix) 75 | %======================================= 76 | % Calculating Euclidean 77 | %======================================= 78 | dist = data - mean_matrix; 79 | dist = dist.^2; 80 | dist = sum(dist, 2); 81 | distance_matrix = sqrt(dist); 82 | end -------------------------------------------------------------------------------- /01_MATLAB/10_Dynamic_Time_Warping/code/dtw.m: -------------------------------------------------------------------------------- 1 | function [] = dtw(train_data_col1, train_data_col2, train_class, train_lenght, test_data_col1, test_data_col2, test_class, test_lenght) 2 | classification_accuracy = 0; 3 | %=============================================== 4 | % For every test object 5 | %=============================================== 6 | for i = 1:test_lenght 7 | x = test_data_col1(i, :); 8 | x = transpose(x); 9 | y = test_data_col2(i, :); 10 | y = transpose(y); 11 | test_cord = [x, y]; 12 | test_cord(all(test_cord == 0, 2), :)=[]; 13 | n = size(test_cord, 1); 14 | %=============================================== 15 | % For every train object 16 | %=============================================== 17 | for j = 1:train_lenght 18 | x = train_data_col1(j, :); 19 | x = transpose(x); 20 | y = train_data_col2(j, :); 21 | y = transpose(y); 22 | train_cord = [x, y]; 23 | train_cord(all(train_cord == 0, 2), :)=[]; 24 | m = size(train_cord, 1); 25 | c = zeros(m, n); 26 | x1 = train_cord(1, 1); 27 | x2 = test_cord(1, 1); 28 | y1 = train_cord(1, 2); 29 | y2 = test_cord(1, 2); 30 | c(1, 1) = dist(x1, x2, y1, y2); 31 | %=============================================== 32 | % Filling the first column 33 | %=============================================== 34 | for k = 2:m 35 | x1 = train_cord(k, 1); 36 | x2 = test_cord(1, 1); 37 | y1 = train_cord(k, 2); 38 | y2 = test_cord(1, 2); 39 | c(k, 1) = c(k-1, 1) + dist(x1, x2, y1, y2); 40 | end 41 | 42 | %=============================================== 43 | % Filling the first row 44 | %=============================================== 45 | for l = 2:n 46 | x1 = train_cord(1, 1); 47 | x2 = test_cord(l, 1); 48 | y1 = train_cord(1, 2); 49 | y2 = test_cord(l, 2); 50 | c(1, l) = c(1, l-1) + dist(x1, x2, y1, y2); 51 | end 52 | 53 | %=============================================== 54 | % Filling the rest of the matrix 55 | %=============================================== 56 | for p = 2:m 57 | for q = 2:n 58 | x1 = train_cord(p, 1); 59 | x2 = test_cord(q, 1); 60 | y1 = train_cord(p, 2); 61 | y2 = test_cord(q, 2); 62 | c(p, q)= min([c(p-1, q) c(p, q-1) c(p-1, q-1)]) + dist(x1, x2, y1, y2); 63 | end 64 | end 65 | cost(j, 1) = c(m, n); 66 | cost(j, 2) = train_class(j); 67 | end 68 | value = sortrows(cost, 1); 69 | distance = value(1, 1); 70 | predicted = value(1, 2); 71 | true = test_class(i); 72 | acc = 0; 73 | if true == predicted 74 | acc = 1; 75 
| end 76 | fprintf('ID=%5d, predicted=%3d, true=%3d, accuracy=%4.2f, distance = %.2f \n', i, predicted, true, acc, distance); 77 | classification_accuracy = classification_accuracy + acc; 78 | end 79 | fprintf('classification accuracy=%6.4f\n', classification_accuracy/test_lenght); 80 | end 81 | -------------------------------------------------------------------------------- /01_MATLAB/03_Singular_Value_Decomposition/README.md: -------------------------------------------------------------------------------- 1 | ### Task 2 | In this task you will implement singular value decomposition (SVD). 3 | More specifically, you will compute the matrices U, S, V for a specific input matrix, and for a specific target dimensionality. 4 | Where needed, you will find eigenvectors using the power method. 5 | 6 | ### Command-line Arguments 7 | 8 | Your program will be invoked as follows: 9 | svd_power <data_file> <M> <iterations> 10 | The arguments provide to the program the following information: 11 | The first argument, <data_file>, is the path name of the file where the input matrix is stored. 12 | The path name can specify any file stored on the local computer. 13 | The data file will have as many lines as the rows of the input matrix. 14 | Line n will contain the values in the n-th row of the matrix. 15 | Within that line n, values will be separated by white space. An example data file is input1.txt. 16 | Values can be any real numbers, and the input matrix can have any number of rows and columns. 17 | The second argument, <M>, specifies the number of dimensions for the SVD output. 18 | In other words, the U matrix should have <M> columns, the V matrix should have <M> columns, and the S matrix should have <M> rows and <M> columns. Remember, the diagonal entries S(d,d) of S should contain values that decrease as d increases. 19 | The third argument, <iterations>, is a number greater than or equal to 1, that specifies the number of iterations for the power method. 20 | Slide 44 in the slides on PCA describes how to use the power method to find the top eigenvector, using a sequence b_k. You should stop calculating this sequence after the specified number of iterations, and use the last b_k (where k = <iterations>) as the eigenvector. 21 | 22 | ### Output 23 | 24 | After you compute matrices U, S, V, you need to print each of those matrices. You also need to print the reconstruction of the original matrix. This reconstruction is computed as U*S*V'. The output should follow this format: 25 | Matrix U: 26 | Row 1: %8.4f %8.4f ... %8.4f 27 | Row 2: %8.4f %8.4f ... %8.4f 28 | ... 29 | Row X: %8.4f %8.4f ... %8.4f 30 | 31 | Matrix S: 32 | Row 1: %8.4f %8.4f ... %8.4f 33 | Row 2: %8.4f %8.4f ... %8.4f 34 | ... 35 | Row M: %8.4f %8.4f ... %8.4f 36 | 37 | Matrix V: 38 | Row 1: %8.4f %8.4f ... %8.4f 39 | Row 2: %8.4f %8.4f ... %8.4f 40 | ... 41 | Row Y: %8.4f %8.4f ... %8.4f 42 | 43 | Reconstruction (U*S*V'): 44 | Row 1: %8.4f %8.4f ... %8.4f 45 | Row 2: %8.4f %8.4f ... %8.4f 46 | ... 47 | Row X: %8.4f %8.4f ... %8.4f 48 | In the above output template: 49 | X is the number of rows in the data file. 50 | Y is the number of columns in the data file. 51 | M is command-line argument <M>. 52 | In each line printing the row of a matrix, the row number should be printed using the %3d format specification (an integer with three allocated spaces). The actual values of the matrix should be printed with the %8.4f format specifier (or equivalent format if using a language different than Java). 
53 | In your answers.pdf file, include the output for the following invocations of your program:
54 | 
55 | svd_power input1.txt 2 10
56 | svd_power input1.txt 4 100
57 | 
58 | ### Running The Program
59 | In the command window, type: main('input1.txt', 2, 10)
60 | 
61 | Here:
62 | M = 2
63 | iterations = 10
64 | 
65 | 
-------------------------------------------------------------------------------- /01_MATLAB/05_K_Nearest_Neigbour/README.md: --------------------------------------------------------------------------------
1 | ### Task
2 | In this task you will implement k-nearest neighbor classification using the Euclidean distance.
3 | 
4 | ### Command-line Arguments
5 | 
6 | Your program will be invoked as follows:
7 | knn_classify <training_file> <test_file> <k>
8 | The arguments provide to the program the following information:
9 | The first argument, <training_file>, is the path name of the training file, where the training data is stored.
10 | The path name can specify any file stored on the local computer.
11 | The second argument, <test_file>, is the path name of the test file, where the test data is stored.
12 | The path name can specify any file stored on the local computer.
13 | The third argument, <k>, specifies the value of k for the k-nearest neighbor classifier.
14 | Files used will be any of the files in the UCI datasets directory.
15 | A description of the datasets and the file format can be found at the link below.
16 | 
17 | 
18 |     UCI dataset directory
19 | 
20 | 
21 | ### Implementation Guidelines
22 | 
23 | Each dimension should be normalized, separately from all other dimensions.
24 | Specifically, for both training and test objects, each dimension should be transformed using the function:
25 | - F(v) = (v - mean) / std, using the mean and std of the values of that dimension on the TRAINING data.
26 | To compute the std, divide by the number of training data (NOT the number of training data minus 1).
27 | Use the L2 distance (the Euclidean distance) for computing the nearest neighbors.
28 | 
29 | ### Classification Stage
30 | 
31 | For each test object you should print a line containing the following info:
32 | object ID. This is the line number where that object occurs in the test file. Start with 0 in numbering the objects, not with 1.
33 | predicted class (the result of the classification). If your classification result is a tie among two or more classes, choose one of them randomly.
34 | true class (from the last column of the test file).
35 | accuracy. This is defined as follows:
36 | If there were no ties in your classification result, and the predicted class is correct, the accuracy is 1.
37 | If there were no ties in your classification result, and the predicted class is incorrect, the accuracy is 0.
38 | If there were ties in your classification result, and the correct class was one of the classes that tied for best, the accuracy is 1 divided by the number of classes that tied for best.
39 | If there were ties in your classification result, and the correct class was NOT one of the classes that tied for best, the accuracy is 0.
40 | To produce this output in a uniform manner, use these printing statements:
41 | For C or C++, use:
42 | printf("ID=%5d, predicted=%3d, true=%3d, accuracy=%4.2lf\n", object_id, predicted_class, true_class, accuracy);
43 | For Java, use:
44 | System.out.printf("ID=%5d, predicted=%3d, true=%3d, accuracy=%4.2f\n", object_id, predicted_class, true_class, accuracy);
45 | 
46 | For Python or any other language, just make sure that you use formatting specifiers that produce aligned output that matches the specs given for C and Java.
47 | After you have printed the results for all test objects, you should print the overall classification accuracy, which is defined as the average of the classification accuracies you printed out for each test object.
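
Returning to the Implementation Guidelines above, the normalisation rule can be sketched directly. A minimal NumPy version (illustrative only; the repository's MATLAB counterpart is normalise.m, and the guard against a zero standard deviation is an added assumption):

```python
import numpy as np

def normalise(train_x, test_x):
    # z-score every dimension using TRAINING statistics only.
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0)   # NumPy's default ddof=0 divides by N, not N-1
    std[std == 0] = 1.0         # assumption: avoid division by zero
    return (train_x - mean) / std, (test_x - mean) / std
```
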
To print the classification accuracy in a uniform manner, use these printing statements:
48 | Use:
49 | printf("classification accuracy=%6.4lf\n", classification_accuracy);
50 | 
51 | 
52 | 
53 | ### Running The Program
54 | Run the main.m file
55 | 
56 | main('pendigits_training.txt','pendigits_test.txt', 1)
57 | 
58 | main('pendigits_training.txt','pendigits_test.txt', 3)
59 | 
60 | main('pendigits_training.txt','pendigits_test.txt', 5)
61 | 
-------------------------------------------------------------------------------- /01_MATLAB/03_Singular_Value_Decomposition/code/main.m: --------------------------------------------------------------------------------
1 | function [] = main(train_file, M, iterations)
2 | % =================================================%
3 | % Loading and Initialising Data
4 | % =================================================%
5 | train_data = load(train_file);
6 | train_data = double(train_data);
7 | A = train_data*transpose(train_data);
8 | A_1 = train_data*transpose(train_data);
9 | U = zeros(size(train_data, 1), M);
10 | 
11 | % =================================================%
12 | % Calculating Eigen Vectors - U
13 | % =================================================%
14 | for d = 1: M
15 | U(:, d) = svd_power(A*A', iterations);
16 | %disp(size(U))
17 | for i = 1:size(A, 1)
18 | value = transpose(U(:, d)) * transpose(A(i, 1:end))*U(:, d);
19 | A(i, :) = A(i, :) - transpose(value);
20 | end
21 | end
22 | 
23 | % =================================================%
24 | % Calculating Lambda, Eigen Values, and S Diagonal
25 | % =================================================%
26 | lambda = transpose(U)*A_1*U;
27 | eigen_values = sqrt(max(lambda));
28 | S = zeros(M, M);
29 | for i = 1:M
30 | S(i, i) = eigen_values(1, i);
31 | end
32 | 
33 | % =================================================%
34 | % Calculating V
35 | % =================================================%
36 | V = zeros(size(train_data, 2), M);
37 | A = transpose(train_data)*train_data;
38 | for d = 1: M
39 | V(:, d) = svd_power(A*A', iterations);
40 | %disp(size(U))
41 | for i = 1:size(A, 1)
42 | value = transpose(V(:, d)) * transpose(A(i, 1:end))*V(:, d);
43 | A(i, :) = A(i, :) - transpose(value);
44 | end
45 | end
46 | 
47 | % =================================================%
48 | % Performing Reconstruction
49 | % =================================================%
50 | reconstruction = U*S*transpose(V);
51 | display_results(U, S, V, reconstruction)
52 | end
53 | 
54 | function [] = display_results(U, S, V, reconstruction)
55 | % =================================================%
56 | % Displaying Eigen Vectors - U
57 | % =================================================%
58 | fprintf('Matrix U: \n')
59 | for row = 1:size(U, 1)
60 | fprintf('Row%3d:', row);
61 | for col = 1:size(U, 2)
62 | fprintf('%8.4f', U(row, col));
63 | end
64 | fprintf('\n')
65 | end
66 | 
67 | % =================================================%
68 | % Displaying Diagonal Matrix - S
69 | % =================================================%
70 | fprintf('\n');
71 | fprintf('Matrix S: \n')
72 | for row = 1:size(S, 1)
73 | fprintf('Row%3d:', row);
74 | for col = 1:size(S, 2)
75 | fprintf('%8.4f', S(row, col));
76 | end
77 | fprintf('\n')
78 | end
79 | 
80 | % =================================================%
81 | % Displaying Matrix - V
82 | % =================================================%
83 | fprintf('\n');
84 | fprintf('Matrix V: \n')
85 | for row = 1:size(V, 1)
86 | fprintf('Row%3d:', row);
87 | for col = 1:size(V, 2)
88 | fprintf('%8.4f',
V(row, col));
89 | end
90 | fprintf('\n')
91 | end
92 | 
93 | 
94 | % =================================================%
95 | % Displaying Reconstruction Matrix
96 | % =================================================%
97 | fprintf('\n');
98 | fprintf('Reconstruction (U*S*V''): \n')
99 | for row = 1:size(reconstruction, 1)
100 | fprintf('Row%3d:', row);
101 | for col = 1:size(reconstruction, 2)
102 | fprintf('%8.4f', reconstruction(row, col));
103 | end
104 | fprintf('\n')
105 | end
106 | 
107 | end
108 | 
109 | 
-------------------------------------------------------------------------------- /01_MATLAB/04_Principal_Component_Analysis/README.md: --------------------------------------------------------------------------------
1 | ### Task
2 | In this task you will implement principal component analysis (PCA).
3 | 
4 | ### Command-line Arguments
5 | 
6 | Your program will be invoked as follows:
7 | pca_power <training_file> <test_file> <M> <iterations>
8 | The arguments provide to the program the following information:
9 | The first argument, <training_file>, is the path name of the training file, where the training data is stored.
10 | The path name can specify any file stored on the local computer.
11 | The second argument, <test_file>, is the path name of the test file, where the test data is stored.
12 | The path name can specify any file stored on the local computer.
13 | The third argument, <M>, specifies the dimension of the output space of the projection.
14 | In other words, you will use the <M> eigenvectors with the largest eigenvalues to define the projection matrix.
15 | The fourth argument, <iterations>, is a number greater than or equal to 1 that specifies the number of iterations for the power method.
16 | Use the power method to find the top eigenvector, using a sequence b_k.
17 | You should stop calculating this sequence after the specified number of iterations, and use the last b_k (where k = <iterations>) as the eigenvector.
18 | The training and test files will follow the same format as the text files in the UCI datasets directory.
19 | ### Training Stage Output
20 | 
21 | After you compute the projection matrix using the training data, print out the top eigenvectors, in decreasing order of their eigenvalues. Note that you do not need to know the eigenvalues to specify this order. You just print out the eigenvectors in the order in which they have been calculated, based on the pseudocode in slide 54 of the slides on PCA. The output should follow this format:
22 | Eigenvector 1
23 | 1: %.4f
24 | 2: %.4f
25 | ...
26 | D: %.4f
27 | 
28 | Eigenvector 2
29 | 1: %.4f
30 | 2: %.4f
31 | ...
32 | D: %.4f
33 | 
34 | ...
35 | 
36 | Eigenvector M
37 | 1: %.4f
38 | 2: %.4f
39 | ...
40 | D: %.4f
41 | In the above output template:
42 | D is the number of dimensions of the training data.
43 | M is command-line argument <M>.
44 | In each line containing a value of an eigenvector, the first number (the dimension index) should be printed using the %3d format specification (an integer with three allocated spaces). The second value is simply the value of the eigenvector in that dimension, with exactly four decimal digits.
45 | 
46 | ### Test Stage
47 | 
48 | For each test object (in the order in which test objects appear in the test file), you should print the projection of that test object based on the projection you computed during training.
49 | The output should follow this format:
50 | 
51 | Test object 0
52 | 1: %.4f
53 | 2: %.4f
54 | ...
55 | M: %.4f
56 | 
57 | Test object 1
58 | 1: %.4f
59 | 2: %.4f
60 | ...
61 | M: %.4f
62 | 
63 | ...
64 | In the above output template:
65 | M is command-line argument <M>.
66 | In each line containing a value of the projection of an object, follow the same instructions as for printing values of eigenvectors at the end of the training stage.
67 | ### Output for answers.pdf
68 | 
69 | In your answers.pdf document, you need to provide parts of the output for some invocations of your program listed below. For each invocation, provide:
70 | The full output of the training stage.
71 | ONLY THE PROJECTION OF THE FIRST OBJECT for the test stage.
72 | Include this output for the following invocations of your program:
73 | pca_power pendigits_training pendigits_test 1 10
74 | pca_power satellite_training satellite_test 2 20
75 | pca_power yeast_training yeast_test 3 30
76 | 
77 | ### Running The Program
78 | 
79 | Run any one of the below commands:
80 | 
81 | main('pendigits_training.txt', 'pendigits_test.txt', 1, 10)
82 | 
83 | main('satellite_training.txt', 'satellite_test.txt', 2, 20)
84 | 
85 | main('yeast_training.txt', 'yeast_test.txt', 3, 30)
86 | 
-------------------------------------------------------------------------------- /02_PYTHON/01_Naive_Bayes_Classifier/README.md: --------------------------------------------------------------------------------
1 | ### Task
2 | In this assignment you will implement naive Bayes classifiers based on histograms.
3 | 
4 | ### Command-line Arguments
5 | 
6 | You must implement a program that learns a naive Bayes classifier for a classification problem,
7 | given some training data and some additional options. In particular, your program will be invoked as follows:
8 | naive_bayes <training_file> <test_file> histograms <number>
9 | 
10 | ### Training: Histograms
11 | 
12 | If the third command-line argument is histograms, then you should model P(x | class) as a histogram separately for each dimension of the data.
13 | The number of bins for each histogram is specified by the fourth command-line argument, <number>.
14 | Suppose that you are building a histogram of N bins for the j-th dimension of the data and for the c-th class.
15 | Let S be the smallest and L be the largest value in the j-th dimension among all training data belonging to the c-th class.
16 | Let G = (L-S)/(N-3). G will be the width of all bins, except for bin 0 and bin N-1, whose width is infinite.
17 | If you get a value of G that is less than 0.0001, then set G to 0.0001. Your bins should have the following ranges:
18 | 
19 | Bin 0, covering interval (-infinity, S-G/2).
20 | Bin 1, covering interval [S-G/2, S+G/2).
21 | Bin 2, covering interval [S+G/2, S+G+G/2).
22 | Bin 3, covering interval [S+G+G/2, S+2G+G/2).
23 | ...
24 | Bin N-2, covering interval [S+(N-4)G+G/2, S+(N-3)G+G/2). This interval is the same as [L-G/2, L+G/2).
25 | Bin N-1, covering interval [S+(N-3)G+G/2, +infinity). This interval is the same as [L+G/2, +infinity).
26 | 
27 | The output of the training phase should be a sequence of lines like this:
28 | Class %d, attribute %d, bin %d, P(bin | class) = %.2f
29 | The output lines should be sorted by class number. Within the same class, lines should be sorted by attribute number.
30 | Within the same attribute, lines should be sorted by bin number. Attributes and bins should be numbered starting from 0, not from 1.
31 | In computing the value that you store at each bin of each histogram, you must use Equation 2.241 on page 120 of the textbook.
32 | Notice that the width of the bin appears in the denominator of that equation. As mentioned above, the minimum width should be 0.0001. If your value of G is less than 0.0001, you should set G to 0.0001.
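
The bin geometry above is easier to see in code. A minimal Python sketch (the function names are illustrative, and N is assumed to be at least 4 so that G is well defined):

```python
def bin_width(S, L, N):
    # G = (L - S) / (N - 3), floored at the minimum width 0.0001.
    return max((L - S) / (N - 3), 0.0001)

def bin_index(v, S, G, N):
    # Bin 0 and bin N-1 are the two infinite outer bins.
    if v < S - G / 2:
        return 0
    k = int((v - (S - G / 2)) // G) + 1  # bins 1..N-2 each have width G
    return min(k, N - 1)                 # values past L + G/2 land in bin N-1
```
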
33 | 
34 | In your answers.pdf document, provide the output produced by the training stage of your program when given yeast_training.txt as the input file, using seven bins for each histogram.
35 | 
36 | ### Classification
37 | 
38 | For each test object you should print a line containing the following info:
39 | object ID. This is the line number where that object occurs in the test file. Start with 0 in numbering the objects, not with 1.
40 | predicted class (the result of the classification). If your classification result is a tie among two or more classes, choose one of them randomly.
41 | probability of the predicted class given the data.
42 | true class (from the last column of the test file).
43 | accuracy. This is defined as follows:
44 | If there were no ties in your classification result, and the predicted class is correct, the accuracy is 1.
45 | If there were no ties in your classification result, and the predicted class is incorrect, the accuracy is 0.
46 | If there were ties in your classification result, and the correct class was one of the classes that tied for best, the accuracy is 1 divided by the number of classes that tied for best.
47 | If there were ties in your classification result, and the correct class was NOT one of the classes that tied for best, the accuracy is 0.
48 | To produce this output in a uniform manner, use these printing statements:
49 | For C or C++, use:
50 | printf("ID=%5d, predicted=%3d, probability = %.4lf, true=%3d, accuracy=%4.2lf\n",
51 | object_id, predicted_class, probability, true_class, accuracy);
52 | For Java, use:
53 | System.out.printf("ID=%5d, predicted=%3d, probability = %.4f, true=%3d, accuracy=%4.2f\n",
54 | object_id, predicted_class, probability, true_class, accuracy);
55 | 
-------------------------------------------------------------------------------- /02_PYTHON/07_Frequentist_Estimate/Frequentist_Estimate.ipynb: --------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "c5b12fc7",
6 | "metadata": {
7 | "ExecuteTime": {
8 | "end_time": "2021-12-20T03:18:13.862346Z",
9 | "start_time": "2021-12-20T03:18:13.817895Z"
10 | }
11 | },
12 | "source": [
13 | "\"logo\"/"
14 | ]
15 | },
16 | {
17 | "cell_type": "code",
18 | "execution_count": 1,
19 | "id": "c7d6f26e",
20 | "metadata": {
21 | "ExecuteTime": {
22 | "end_time": "2021-12-20T05:46:00.120744Z",
23 | "start_time": "2021-12-20T05:46:00.084767Z"
24 | }
25 | },
26 | "outputs": [
27 | {
28 | "name": "stdout",
29 | "output_type": "stream",
30 | "text": [
31 | "p(c = 'a') = 0.1026\n"
32 | ]
33 | }
34 | ],
35 | "source": [
36 | "import random\n",
37 | "\n",
38 | "\n",
39 | "class Task1:\n",
40 | "\n",
41 | "\tdef __init__(self):\n",
42 | "\t\t\"\"\"This function initialises the empty string\"\"\"\n",
43 | "\t\t\n",
44 | "\t\tself.string = \"\"\n",
45 | "\n",
46 | "\tdef form_string(self):\n",
47 | "\t\t\"\"\" This Function will form the string which will contain a and b \"\"\"\n",
48 | "\t\t\n",
49 | "\t\tfor x in range(0, 3100):\n",
50 | "\t\t\trandom_number = random.uniform(0, 1)\n",
51 | "\t\t\tif random_number > 0.1:\n",
52 | "\t\t\t\tself.string += \"b\"\n",
53 | "\t\t\telse:\n",
54 | "\t\t\t\tself.string += \"a\"\n",
55 | "\n",
56 | "\tdef probability_of_character(self, character):\n",
57 | "\t\t\"\"\" This function calculates the probability of occurrence of certain character in the string \"\"\"\n",
58 | "\t\tcount_of_character = self.string.count(character)\n",
59 |
"\t\tcount_of_character = float(count_of_character)\n", 60 | "\t\tprobability = count_of_character/len(self.string)\n", 61 | "\t\treturn probability\n", 62 | "\n", 63 | "if __name__ == \"__main__\":\n", 64 | "\t# start = time.time()\n", 65 | "\ttask1 = Task1()\n", 66 | "\ttask1.form_string()\n", 67 | "\tsolution1 = task1.probability_of_character(\"a\")\n", 68 | "\tprint(\"p(c = 'a') = %.4f\" % solution1)\n", 69 | "\t# end = time.time()\n", 70 | "\t# print(\"time\", end - start)\n" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": null, 76 | "id": "5285289a", 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [] 80 | } 81 | ], 82 | "metadata": { 83 | "kernelspec": { 84 | "display_name": "Python 3", 85 | "language": "python", 86 | "name": "python3" 87 | }, 88 | "language_info": { 89 | "codemirror_mode": { 90 | "name": "ipython", 91 | "version": 3 92 | }, 93 | "file_extension": ".py", 94 | "mimetype": "text/x-python", 95 | "name": "python", 96 | "nbconvert_exporter": "python", 97 | "pygments_lexer": "ipython3", 98 | "version": "3.8.8" 99 | }, 100 | "toc": { 101 | "base_numbering": 1, 102 | "nav_menu": {}, 103 | "number_sections": true, 104 | "sideBar": true, 105 | "skip_h1_title": false, 106 | "title_cell": "Table of Contents", 107 | "title_sidebar": "Contents", 108 | "toc_cell": false, 109 | "toc_position": {}, 110 | "toc_section_display": true, 111 | "toc_window_display": false 112 | }, 113 | "varInspector": { 114 | "cols": { 115 | "lenName": 16, 116 | "lenType": 16, 117 | "lenVar": 40 118 | }, 119 | "kernels_config": { 120 | "python": { 121 | "delete_cmd_postfix": "", 122 | "delete_cmd_prefix": "del ", 123 | "library": "var_list.py", 124 | "varRefreshCmd": "print(var_dic_list())" 125 | }, 126 | "r": { 127 | "delete_cmd_postfix": ") ", 128 | "delete_cmd_prefix": "rm(", 129 | "library": "var_list.r", 130 | "varRefreshCmd": "cat(var_dic_list()) " 131 | } 132 | }, 133 | "types_to_exclude": [ 134 | "module", 135 | "function", 136 | "builtin_function_or_method", 137 | "instance", 138 | "_Feature" 139 | ], 140 | "window_display": false 141 | } 142 | }, 143 | "nbformat": 4, 144 | "nbformat_minor": 5 145 | } 146 | -------------------------------------------------------------------------------- /02_PYTHON/07_Frequentist_Estimate/old_Frequentist_Estimate.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "c5b12fc7", 6 | "metadata": { 7 | "ExecuteTime": { 8 | "end_time": "2021-12-20T03:18:13.862346Z", 9 | "start_time": "2021-12-20T03:18:13.817895Z" 10 | } 11 | }, 12 | "source": [ 13 | "\"logo\"/" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "id": "c7d6f26e", 20 | "metadata": { 21 | "ExecuteTime": { 22 | "end_time": "2021-12-20T05:46:04.900907Z", 23 | "start_time": "2021-12-20T05:46:04.668788Z" 24 | } 25 | }, 26 | "outputs": [ 27 | { 28 | "name": "stdout", 29 | "output_type": "stream", 30 | "text": [ 31 | "p(c = 'a') = 0.0977\n" 32 | ] 33 | } 34 | ], 35 | "source": [ 36 | "import numpy as np\n", 37 | "import time\n", 38 | "\n", 39 | "\n", 40 | "class Task1:\n", 41 | "\n", 42 | "\tdef __init__(self):\n", 43 | "\t\t\"\"\"This Function initialises the random uniform array and the string\"\"\"\n", 44 | "\t\t\n", 45 | "\t\tself.randUniformArray = np.random.uniform(0, 1, 3100)\n", 46 | "\t\tself.string = \"\"\n", 47 | "\n", 48 | "\tdef form_string(self):\n", 49 | "\t\t\"\"\" This Function will form the string which will contain a and b \"\"\"\n", 
50 | "\t\t\n", 51 | "\t\tfor x in range(0, 3100):\n", 52 | "\t\t\tif self.randUniformArray[x] > 0.1:\n", 53 | "\t\t\t\tself.string += \"b\"\n", 54 | "\t\t\telse:\n", 55 | "\t\t\t\tself.string += \"a\"\n", 56 | "\n", 57 | "\tdef probability_of_character(self, character):\n", 58 | "\t\t\"\"\" This function calculates the probability of occurrence of certain character in the string \"\"\"\n", 59 | "\t\t\n", 60 | "\t\tcount_of_character = self.string.count(character)\n", 61 | "\t\tcount_of_character = float(count_of_character)\n", 62 | "\t\tprobability = count_of_character/len(self.string)\n", 63 | "\t\treturn probability\n", 64 | "\n", 65 | "if __name__ == \"__main__\":\n", 66 | "\t# start = time.time()\n", 67 | "\ttask1 = Task1()\n", 68 | "\ttask1.form_string()\n", 69 | "\tsolution1 = task1.probability_of_character(\"a\")\n", 70 | "\tprint(\"p(c = 'a') = %.4f\" % solution1)\n", 71 | "\t# end = time.time()\n", 72 | "\t# print(\"time\", end - start)\n" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": null, 78 | "id": "5285289a", 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [] 82 | } 83 | ], 84 | "metadata": { 85 | "kernelspec": { 86 | "display_name": "Python 3", 87 | "language": "python", 88 | "name": "python3" 89 | }, 90 | "language_info": { 91 | "codemirror_mode": { 92 | "name": "ipython", 93 | "version": 3 94 | }, 95 | "file_extension": ".py", 96 | "mimetype": "text/x-python", 97 | "name": "python", 98 | "nbconvert_exporter": "python", 99 | "pygments_lexer": "ipython3", 100 | "version": "3.8.8" 101 | }, 102 | "toc": { 103 | "base_numbering": 1, 104 | "nav_menu": {}, 105 | "number_sections": true, 106 | "sideBar": true, 107 | "skip_h1_title": false, 108 | "title_cell": "Table of Contents", 109 | "title_sidebar": "Contents", 110 | "toc_cell": false, 111 | "toc_position": {}, 112 | "toc_section_display": true, 113 | "toc_window_display": false 114 | }, 115 | "varInspector": { 116 | "cols": { 117 | "lenName": 16, 118 | "lenType": 16, 119 | "lenVar": 40 120 | }, 121 | "kernels_config": { 122 | "python": { 123 | "delete_cmd_postfix": "", 124 | "delete_cmd_prefix": "del ", 125 | "library": "var_list.py", 126 | "varRefreshCmd": "print(var_dic_list())" 127 | }, 128 | "r": { 129 | "delete_cmd_postfix": ") ", 130 | "delete_cmd_prefix": "rm(", 131 | "library": "var_list.r", 132 | "varRefreshCmd": "cat(var_dic_list()) " 133 | } 134 | }, 135 | "types_to_exclude": [ 136 | "module", 137 | "function", 138 | "builtin_function_or_method", 139 | "instance", 140 | "_Feature" 141 | ], 142 | "window_display": false 143 | } 144 | }, 145 | "nbformat": 4, 146 | "nbformat_minor": 5 147 | } 148 | -------------------------------------------------------------------------------- /02_PYTHON/03_Mixture_Of_Gaussian_Using_EM_Algortihm/README.md: -------------------------------------------------------------------------------- 1 | ### Task 2 | You must implement a program that learns a naive Bayes classifier for a classification problem. 3 | 4 | 5 | ### Command-line Arguments 6 | Given some training data and some additional options. In particular, your program can be invoked as follows: 7 | naive_bayes mixtures 8 | 9 | ### Training: Mixtures of Gaussians 10 | 11 | If the third commandline argument is mixtures, then you should model P(x | class) as a mixture of Gaussians separately for each dimension of the data. 12 | The number of Gaussians for each mixture is specified by the fourth command line argument. 
13 | Suppose that you are building a mixture of N Gaussians for the j-th dimension of the data and for the c-th class.
14 | Let S be the smallest and L be the largest value in the j-th dimension among all training data belonging to the c-th class.
15 | Let G = (L-S)/N. Then, you should initialize all standard deviations of the mixture to 1, you should initialize all weights to 1/N, and you should initialize the means as follows:
16 | 
17 | For the first Gaussian, the initial mean should be S + G/2.
18 | For the second Gaussian, the initial mean should be S + G + G/2.
19 | For the third Gaussian, the initial mean should be S + 2G + G/2.
20 | ...
21 | For the N-th Gaussian, the initial mean should be S + (N-1)G + G/2.
22 | You should repeat the main loop of the EM algorithm 50 times. So, no need to worry about any other stopping criterion.
23 | Your stopping criterion is simply that the loop has been executed 50 times.
24 | In certain cases, it is possible that the M-step computes a value for the standard deviation that is equal to zero.
25 | Your code should make sure that the variance of the Gaussian is NEVER smaller than 0.0001.
26 | Since the variance is the square of the standard deviation, this means that the standard deviation should never be smaller than sqrt(0.0001) = 0.01.
27 | Any time the M-step computes a value for the standard deviation that is smaller than 0.01, your code should replace that value with 0.01.
28 | 
29 | The output of the training phase should be a sequence of lines like this:
30 | 
31 | Class %d, attribute %d, Gaussian %d, mean = %.2f, std = %.2f
32 | The output lines should be sorted by class number. Within the same class, lines should be sorted by attribute number.
33 | Within the same attribute, lines should be sorted by Gaussian number.
34 | Attributes and Gaussians should be numbered starting from 0, not from 1.
35 | In your answers.pdf document, provide the output produced by the training stage of your program when given yeast_training.txt as the input file, using three Gaussians for each mixture.
36 | 
37 | 
38 | ### Classification
39 | 
40 | For each test object you should print a line containing the following info:
41 | object ID. This is the line number where that object occurs in the test file. Start with 0 in numbering the objects, not with 1.
42 | predicted class (the result of the classification). If your classification result is a tie among two or more classes, choose one of them randomly.
43 | probability of the predicted class given the data.
44 | true class (from the last column of the test file).
45 | accuracy. This is defined as follows:
46 | If there were no ties in your classification result, and the predicted class is correct, the accuracy is 1.
47 | If there were no ties in your classification result, and the predicted class is incorrect, the accuracy is 0.
48 | If there were ties in your classification result, and the correct class was one of the classes that tied for best, the accuracy is 1 divided by the number of classes that tied for best.
49 | If there were ties in your classification result, and the correct class was NOT one of the classes that tied for best, the accuracy is 0.
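
Before the output-format spec below, the initialisation rules from the Training section reduce to a short sketch. Illustrative Python with NumPy, where values is a NumPy array holding the training values of one (class, attribute) pair; after each of the 50 EM iterations the M-step result would additionally be clamped with np.maximum(stds, 0.01):

```python
import numpy as np

def init_mixture(values, N):
    # Initialise an N-Gaussian mixture for one class and one attribute.
    S, L = values.min(), values.max()
    G = (L - S) / N
    means = S + G / 2 + G * np.arange(N)  # S + G/2, S + G + G/2, ..., S + (N-1)G + G/2
    stds = np.ones(N)                     # all standard deviations start at 1
    weights = np.full(N, 1.0 / N)         # all mixture weights start at 1/N
    return means, stds, weights
```
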
50 | To produce this output in a uniform manner, use these printing statements:
51 | Use:
52 | printf("ID=%5d, predicted=%3d, probability = %.4lf, true=%3d, accuracy=%4.2lf\n",
53 | object_id, predicted_class, probability, true_class, accuracy);
54 | 
55 | For Python or any other language, just make sure that you use formatting specifiers that produce aligned output that matches the specs given for C and Java.
56 | Object IDs should be numbered starting from 0, not 1.
57 | After you have printed the results for all test objects, you should print the overall classification accuracy, which is defined as the average of the classification accuracies you printed out for each test object.
58 | 
-------------------------------------------------------------------------------- /01_MATLAB/01_Descision_Tree_and_Random_Forest/code/random.m: --------------------------------------------------------------------------------
1 | function [] = random(training_file, testing_file, option, pruning_thr)
2 | %train_file = 'pendigits_training.txt';
3 | %test_file = 'pendigits_test.txt';
4 | train_file = training_file;
5 | test_file = testing_file;
6 | train_data = load(train_file);
7 | test_data = load(test_file);
8 | target = train_data(:, end);
9 | test_target = test_data(:, end);
10 | unique_class = unique(target);
11 | pruning_thrsld = pruning_thr;
12 | classificatio_acc=0;
13 | class_max = max(target);
14 | 
15 | tree1 = [];
16 | thrsldeshold1 = [];
17 | gainin1 = [];
18 | index1 = 1;
19 | 
20 | 
21 | tree2 = [];
22 | thrsldeshold2 = [];
23 | gainin2 = [];
24 | index2 = 1;
25 | 
26 | 
27 | tree3 = [];
28 | thrsldeshold3 = [];
29 | gainin3 = [];
30 | index3 = 1;
31 | 
32 | attributes = zeros(1, size(train_data, 2)-1);
33 | for col = 1: size(train_data, 2)-1
34 | attributes(1, col) = col;
35 | end
36 | 
37 | % = [1, 2, 3, 4, 5 ,6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16];
38 | 
39 | 
40 | 
41 | [tree1,thrsldeshold1,gainin1] = make_tree(train_data,pruning_thrsld,option,attributes,class_max,tree1,thrsldeshold1,gainin1,index1);
42 | [tree2,thrsldeshold2,gainin2] = make_tree(train_data,pruning_thrsld,option,attributes,class_max,tree2,thrsldeshold2,gainin2,index2);
43 | [tree3,thrsldeshold3,gainin3] = make_tree(train_data,pruning_thrsld,option,attributes,class_max,tree3,thrsldeshold3,gainin3,index3);
44 | 
45 | for i=1:size(tree1,2)
46 | if (tree1(:,i)-1) ~= -1
47 | fprintf('tree=%2d, node=%3d, feature=%2d, thr=%6.2f, gain=%f\n',0,i,tree1(:,i)-1,thrsldeshold1(:,i),gainin1(:,i));
48 | end
49 | end
50 | 
51 | for i=1:size(tree2, 2)
52 | if (tree2(:,i)-1) ~= -1
53 | fprintf('tree=%2d, node=%3d, feature=%2d, thr=%6.2f, gain=%f\n',1,i,tree2(:,i)-1,thrsldeshold2(:,i),gainin2(:,i));
54 | end
55 | end
56 | 
57 | for i=1:size(tree3, 2)
58 | if (tree3(:,i)-1) ~= -1
59 | fprintf('tree=%2d, node=%3d, feature=%2d, thr=%6.2f, gain=%f\n',2,i,tree3(:,i)-1,thrsldeshold3(:,i),gainin3(:,i));
60 | end
61 | end
62 | 
63 | 
64 | for test_row=1:size(test_data,1)
65 | index=1;
66 | is_leaf=1;
67 | while is_leaf == 1
68 | atest_classr = tree1(index);
69 | thrsld = thrsldeshold1(index);
70 | gain = gainin1(index);
71 | if thrsld == -1 && gain == -1
72 | tree_1_values = atest_classr;
73 | is_leaf=0;
74 | else
75 | if (test_data(test_row,atest_classr))>=thrsld
76 | index=(2*index)+1;
77 | else
78 | index=(2*index);
79 | end
80 | end
81 | end
82 | index=1;
83 | is_leaf=1;
84 | while is_leaf == 1
85 | atest_classr = tree2(index);
86 | thrsld=thrsldeshold2(index);
87 | gain=gainin2(index);
88 | if thrsld ~= -1 && gain ~= -1
89 | if (test_data(test_row,atest_classr))
0
39 | d = utility_matrix((row -1), col);
40 | end
41 | %=================================
42 | % Can Go Up ??
43 | %=================================
44 | if (row + 1) > size(utility_matrix,1) || utility_matrix((row + 1), col) == 2
45 | u = utility_matrix(row, col);
46 | %up = 1;
47 | elseif (row + 1) <= size(utility_matrix,1)
48 | u = utility_matrix((row + 1), col);
49 | end
50 | %=================================
51 | % Can Go Left ??
52 | %=================================
53 | if (col - 1) == 0 || utility_matrix(row, (col - 1)) == 2
54 | l = utility_matrix(row, col);
55 | %left = 1;
56 | elseif (col - 1) > 0
57 | l = utility_matrix(row, (col - 1));
58 | end
59 | %=================================
60 | % Can Go Right ??
61 | %=================================
62 | if (col + 1) > size(utility_matrix, 2) || utility_matrix(row, (col + 1)) == 2
63 | r = utility_matrix(row, col);
64 | %right = 1;
65 | elseif (col + 1) <= size(utility_matrix, 2)
66 | r = utility_matrix(row, (col + 1));
67 | end
68 | % right, left, up, down
69 | ut = [r, l, u, d];
70 | ut = ut';
71 | 
72 | %=================================
73 | % Calculating Utility Sequence
74 | %=================================
75 | % right, left, up, down
76 | uti = zeros(4, 1);
77 | for x = 1:4
78 | uti(x, 1) = non_terminal + g*ut(x, 1);
79 | end
80 | 
81 | %=================================
82 | % Calculating Expected Utility
83 | %=================================
84 | % right, left, up, down
85 | util = zeros(4, 1);
86 | util(1, 1) = 0.8*uti(1, 1)+0.1*uti(3, 1)+0.1*uti(4, 1);
87 | util(2, 1) = 0.8*uti(2, 1)+0.1*uti(3, 1)+0.1*uti(4, 1);
88 | util(3, 1) = 0.8*uti(3, 1)+0.1*uti(1, 1)+0.1*uti(2, 1);
89 | util(4, 1) = 0.8*uti(4, 1)+0.1*uti(1, 1)+0.1*uti(2, 1);
90 | 
91 | %=========================================
92 | % Selecting the Best Utility
93 | %=========================================
94 | max_value = max(util);
95 | dummy(row, col) = max_value;
96 | 
97 | end
98 | end
99 | utility_matrix = dummy(:, :);
100 | end
101 | %=========================================
102 | % Displaying Results
103 | %=========================================
104 | result = utility_matrix(:, :);
105 | result(result == 2) = 0;
106 | for row = 1:size(result, 1)
107 | for col = 1:size(result, 2)
108 | if col == size(result, 2)
109 | fprintf('%6.3f ', result(row, col));
110 | else
111 | fprintf('%6.3f, ', result(row, col));
112 | end
113 | end
114 | fprintf('\n');
115 | end
116 | end
117 | 
118 | 
119 | 
120 | 
121 | 
122 | 
123 | 
-------------------------------------------------------------------------------- /01_MATLAB/10_Dynamic_Time_Warping/README.md: --------------------------------------------------------------------------------
1 | ### Task 1 (100 points)
2 | 
3 | In this task you will implement 1-nearest neighbor classification of time series using dynamic time warping (DTW) as the distance measure.
4 | Your zip file should have a folder called dtw_classification, which contains your code and the README.txt file.
5 | 
6 | ### Command-line Arguments
7 | 
8 | Your program will be invoked as follows:
9 | dtw_classify <training_file> <test_file>
10 | The arguments provide to the program the following information:
11 | The first argument, <training_file>, is the path name of the training file, where the training data is stored. The path name can specify any file stored on the local computer.
12 | The second argument, <test_file>, is the path name of the test file, where the test data is stored. The path name can specify any file stored on the local computer.
13 | The training and test files will follow the same format as the text files asl_training.txt and asl_test.txt. For example, for test object 40, file asl_test.txt contains the following information:
14 | object ID: 40
15 | class label: 53
16 | sign meaning: advice
17 | 
18 | dominant hand trajectory:
19 | -0.098205 0.584317
20 | -0.108025 0.554856
21 | -0.088384 0.535215
22 | -0.088384 0.535215
23 | -0.068743 0.545035
24 | -0.039282 0.574497
25 | 0.009820 0.574497
26 | 0.058923 0.643240
27 | 0.108025 0.702162
28 | 0.147307 0.751265
29 | 0.186589 0.790547
30 | 0.216050 0.829828
31 | 0.216050 0.859290
32 | 0.225870 0.888751
33 | 0.235691 0.918212
34 | 0.245511 0.928033
35 | 0.265152 0.947674
36 | The object ID for that object is 40. The class label is 53, so classification of that object is correct if and only if its nearest neighbor among the training objects also has class label 53. In the example training and test files, for every test object there are only two training objects with the same class label.
37 | Each time series is a sequence of two-dimensional vectors. For the example shown above (test example 40), after the line with text "dominant hand trajectory", there is a sequence of 17 lines. The n-th line in that sequence contains the value of the n-th vector in the time series. Different objects have different lengths.
38 | 
39 | ### Implementation Guidelines
40 | 
41 | In contrast to the previous assignment, do NOT do any type of normalization on the time series values that you read from the files. Just use those values as they are.
42 | Use the L2 distance (the Euclidean distance) for computing the cost of matching two 2D vectors to each other. Use DTW, as described in the course slides, to compute the distance between two time series.
43 | 
44 | ### Classification Stage
45 | 
46 | For each test object you should print a line containing the following info:
47 | object ID. This number is explicitly stated in the file for each object. Note that object IDs are numbered starting from 1.
48 | predicted class (the result of the classification). If your classification result is a tie among two or more classes, choose one of them randomly.
49 | true class (the class label stated in the test file).
50 | accuracy. This is defined as follows:
51 | If there were no ties in your classification result, and the predicted class is correct, the accuracy is 1.
52 | If there were no ties in your classification result, and the predicted class is incorrect, the accuracy is 0.
53 | If there were ties in your classification result, and the correct class was one of the classes that tied for best, the accuracy is 1 divided by the number of classes that tied for best.
54 | If there were ties in your classification result, and the correct class was NOT one of the classes that tied for best, the accuracy is 0.
55 | the DTW distance of the test object to its nearest neighbor in the training objects.
56 | To produce this output in a uniform manner, use these printing statements:
57 | Use:
58 | printf("ID=%5d, predicted=%3d, true=%3d, accuracy=%4.2lf, distance = %.2lf\n", object_id, predicted_class, true_class, accuracy, distance);
59 | 
60 | 
61 | After you have printed the results for all test objects, you should print the overall classification accuracy, which is defined as the average of the classification accuracies you printed out for each test object.
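
The DTW computation itself is the cost-matrix recurrence implemented in dtw.m. A condensed Python sketch of the same recurrence, assuming a and b are NumPy arrays of shape (length, 2):

```python
import numpy as np

def dtw_distance(a, b):
    # Dynamic time warping with the L2 distance as the local matching cost.
    m, n = len(a), len(b)
    c = np.empty((m, n))
    c[0, 0] = np.linalg.norm(a[0] - b[0])
    for i in range(1, m):                 # first column, as in dtw.m
        c[i, 0] = c[i - 1, 0] + np.linalg.norm(a[i] - b[0])
    for j in range(1, n):                 # first row
        c[0, j] = c[0, j - 1] + np.linalg.norm(a[0] - b[j])
    for i in range(1, m):                 # remaining cells
        for j in range(1, n):
            best = min(c[i - 1, j], c[i, j - 1], c[i - 1, j - 1])
            c[i, j] = best + np.linalg.norm(a[i] - b[j])
    return c[m - 1, n - 1]
```
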
To print the classification accuracy in a uniform manner, use these printing statements:
62 | Use:
63 | printf("classification accuracy=%6.4lf\n", classification_accuracy);
64 | 
65 | ### Output for answers.pdf
66 | 
67 | In your answers.pdf document, you need to provide the COMPLETE output for the following invocation of your program:
68 | dtw_classify asl_training.txt asl_test.txt
69 | 
70 | ### Grading
71 | 
72 | 75 points: Correct implementation of the 1-nearest neighbor classifier with DTW as the distance measure.
73 | 25 points: Following the specifications in producing the required output and in producing the answers.pdf file.
74 | 
75 | ### Running The Program
76 | 1) open "main.m"
77 | 2) type main('asl_training.txt', 'asl_test.txt') in the command window
78 | 3) press enter.
79 | 
-------------------------------------------------------------------------------- /01_MATLAB/07_Logistic_Regression/code/logistic_regression.m: --------------------------------------------------------------------------------
1 | function [] = logistic_regression(train_file, degree, test_file)
2 | 
3 | %============================================
4 | % Initialising Values
5 | %============================================
6 | data = double(load(train_file));
7 | target = data(1: end, end);
8 | data = data(:,1:end-1);
9 | target(target > 1) = 0;
10 | prev_error=0;
11 | rows = size(data,1);
12 | cols = size(data,2);
13 | phi = zeros(rows,1);
14 | 
15 | %============================================
16 | % Calculating Training Phi Matrix
17 | %============================================
18 | for row = 1: rows
19 | phi(row, 1) = 1;
20 | x = 2;
21 | for col = 1: cols
22 | for deg = 1:degree
23 | phi(row, x) = data(row, col)^deg;
24 | x = x+1;
25 | end
26 | end
27 | end
28 | 
29 | %============================================
30 | % Training starts from this point
31 | %============================================
32 | condition = true;
33 | wght = zeros(cols*degree+1,1);
34 | phi_trans = transpose(phi);
35 | m = 1;
36 | 
37 | while condition
38 | wghtT = transpose(wght);
39 | 
40 | for i = 1:rows
41 | output(i,1) = wghtT * phi_trans(1:end,i);
42 | output(i,1) = 1 / (1 + exp(output(i,1)*(-1)));
43 | end
44 | 
45 | %============================================
46 | % Calculating Error Matrix
47 | %============================================
48 | E = phi_trans * (output - target);
49 | 
50 | %============================================
51 | % Calculating New Error
52 | %============================================
53 | new_error = sum(E,1);
54 | 
55 | 
56 | %============================================
57 | % Calculating Error Difference
58 | %============================================
59 | error_diff = abs(new_error - prev_error);
60 | 
61 | %============================================
62 | % Calculating R Diagonal Matrix
63 | %============================================
64 | R = zeros(rows,rows);
65 | for i = 1:rows
66 | R(i,i) = output(i,1) * (1 - output(i,1));
67 | end
68 | 
69 | %============================================
70 | % Calculating New Weights
71 | %============================================
72 | new_wght = wght - pinv(phi_trans * R * phi) * E ;
73 | 
74 | condition = sum(abs(new_wght - wght)) >= 0.001 && error_diff >= 0.001;
75 | if condition
76 | wght = new_wght;
77 | prev_error = new_error;
78 | end
79 | m = m + 1;
80 | end
81 | 
82 | %============================================
83 | % Testing Initialisation
84 | %============================================
85 | test_data = double(load(test_file));
86 | target = test_data(1: end, end);
87 | test_data = test_data(:,1:end-1);
88 | rows = size(test_data,1);
89 | cols = size(test_data,2);
90 | phi = zeros(rows,1);
91 | 
92 | %============================================
93 | % Calculating Testing Phi Matrix
94 | %============================================
95 | for row = 1: rows
96 | phi(row, 1) = 1;
97 | x = 2;
98 | for col = 1: cols
99 | for deg = 1:degree
100 | phi(row, x) = test_data(row, col)^deg;
101 | x = x+1;
102 | end
103 | end
104 | end
105 | 
106 | %============================================
107 | % Calculating Testing Output Matrix
108 | %============================================
109 | phi_trans = transpose(phi);
110 | for i = 1:rows
111 | output(i,1) = transpose(new_wght) * phi_trans(1:end,i);
112 | output(i,1) = 1 / (1 + exp(output(i,1)*(-1)));
113 | end
114 | 
115 | target(target > 1) = 0;
116 | predicted = zeros(size(output, 1), 1);
117 | accuracy = zeros(rows, 1);
118 | 
119 | %============================================
120 | % Printing the Weights
121 | %============================================
122 | for i = 1:size(new_wght, 1)
123 | fprintf(' W%d = %.4f\n', i-1, new_wght(i, 1));
124 | end
125 | 
126 | %============================================
127 | % Prediction
128 | %============================================
129 | for i = 1:rows
130 | first = transpose(new_wght) * transpose(phi(i, 1:end));
131 | second = output(i, 1) ;
132 | if (first > 0) && (second > 0.5)
133 | predicted(i, 1) = 1;
134 | if predicted(i, 1) == target(i, 1)
135 | accuracy(i, 1) = 1;
136 | end
137 | elseif (first < 0) && (1 - second > 0.5)
138 | predicted(i, 1) = 0;
139 | output(i, 1) = (1 - second);
140 | if predicted(i, 1) == target(i, 1)
141 | accuracy(i, 1) = 1;
142 | end
143 | else
144 | predicted(i, 1) = 1;
145 | accuracy(i, 1) = 0.5;
146 | end
147 | fprintf(' objectID=%5d, predicted=%3d, probability = %.4f, true=%3d, accuracy=%4.2f \n', i-1, predicted(i, 1), output(i, 1), target(i, 1), accuracy(i, 1));
148 | end
149 | 
150 | num = sum(accuracy);
151 | den = size(accuracy, 1);
152 | final_acc = num/den;
153 | 
154 | fprintf('classification accuracy=%6.4f \n', final_acc)
155 | 
156 | 
157 | 
158 | end
159 | 
-------------------------------------------------------------------------------- /01_MATLAB/07_Logistic_Regression/README.md: --------------------------------------------------------------------------------
1 | ### Task
2 | In this task you will implement logistic regression using Iteratively Reweighted Least Squares (IRLS).
3 | 
4 | ### Command-line Arguments
5 | 
6 | You must implement a program that learns a logistic regression classifier from a set of training data.
7 | Your program can be invoked as follows:
8 | logistic_regression <training_file> <degree> <test_file>
9 | The arguments provide to the program the following information:
10 | The first argument, <training_file>, is the path name of the training file, where the training data is stored.
11 | The path name can specify any file stored on the local computer.
12 | The second argument, <degree>, is a number equal to either 1 or 2.
13 | We will not test your code with any other values. The degree specifies what function φ you should use.
14 | Suppose that you have an input vector x = (x_1, x_2, ..., x_D)^T.
15 | If the degree is 1, then φ(x) = (1, x_1, x_2, ..., x_D)^T.
16 | If the degree is 2, then φ(x) = (1, x_1, (x_1)^2, x_2, (x_2)^2, ..., x_D, (x_D)^2)^T.
17 | The third argument, <test_file>, is the path name of the test file, where the test data is stored.
18 | The path name can specify any file stored on the local computer.
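
The feature mapping φ described above is mechanical to construct. A minimal Python sketch (a hypothetical helper, mirroring the phi-matrix loop in logistic_regression.m):

```python
def phi(x, degree):
    # degree 1: (1, x1, ..., xD); degree 2: (1, x1, x1^2, ..., xD, xD^2)
    features = [1.0]
    for v in x:
        for d in range(1, degree + 1):
            features.append(v ** d)
    return features
```
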
19 | The training and test files will follow the same format as the text files in the UCI datasets directory.
20 | A description of the datasets and the file format can be found at the link below.
21 | For each dataset, a training file and a test file are provided.
22 | The name of each file indicates what dataset the file belongs to, and whether the file contains training or test data.
23 | Your code should also work with ANY OTHER training and test files using the same format as the files in the UCI datasets directory.
24 | 
25 | 
26 | 
27 |     UCI dataset directory
28 | 
29 | 
30 | 
31 | ### Converting to Binary Classification Problem
32 | 
33 | We have only covered logistic regression for binary classification problems. In this assignment, you should convert the class labels found in the files as follows:
34 | If the class label is equal to 1, it stays equal to 1.
35 | If the class label is not equal to 1, you must set it equal to 0.
36 | This way, your code will only see class labels that are 1 or 0.
37 | 
38 | ### Weight Initialisation
39 | All weights must be initialized to 0.
40 | 
41 | 
42 | ### Stopping Criteria
43 | 
44 | For logistic regression, the training goes through iterations. At each iteration, you should decide as follows if you should stop the training:
45 | Compare the new weight values, computed at this iteration, with the previous weight values. If the sum of absolute values of differences of individual weights is less than 0.001, then you should stop the training.
46 | Compute the cross-entropy error, using the new weights computed at this iteration. Compare it with the cross-entropy error computed using the previous value of weights. If the change in the error is less than 0.001, then you should stop the training.
47 | 
48 | ### Numerical Issues for Yeast Dataset
49 | 
50 | Your code will probably not work on the yeast dataset for degree=2. Don't worry about that, we will not test for that case. As an optional (and zero-credit) task, figure out how to make the code work in this case. Feel free to do any changes you want, including ignoring some dimensions of the data. If you succeed, describe in your answers.pdf file what you did.
51 | 
52 | 
53 | ### Output of Training Stage
54 | printf("w0=%.4lf\n", w0);
55 | printf("w1=%.4lf\n", w1);
56 | printf("w2=%.4lf\n", w2);
57 | ...
58 | For any other language, just make sure that you use formatting specifiers that produce aligned output that matches EXACTLY the specs given above for C. You should print exactly as many lines as the number of weights that you are estimating (D+1 weights if degree=1, 2D+1 weights if degree=2).
59 | 
60 | ### Output of Test Stage
61 | 
62 | After the training stage, you should apply the classifier that you have learned on the test data. For each test object (following the order in which each test object appears in the test file), you should print a line containing the following info:
63 | object ID. This is the line number where that object occurs in the test file. Start with 0 in numbering the objects, not with 1.
64 | predicted class (the result of the classification). If your classification result is a tie, choose one of the tied classes randomly.
65 | probability of the predicted class given the data. This probability is the output of the classifier if the predicted class is 1.
66 | If the predicted class is 0, then the probability is 1 minus the output of the classifier.
67 | true class (should be binary, 0 or 1).
68 | accuracy. This is defined as follows:
69 | If there were no ties in your classification result, and the predicted class is correct, the accuracy is 1.
70 | If there were no ties in your classification result, and the predicted class is incorrect, the accuracy is 0.
71 | If there were ties in your classification result, and the correct class was one of the classes that tied for best, the accuracy is 1 divided by the number of classes that tied for best.
72 | If there were ties in your classification result, since we only have two classes, the accuracy is 0.5.
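
Putting the weight initialisation and the two stopping tests together, the training loop looks roughly like this. An illustrative NumPy sketch; the pinv call mirrors logistic_regression.m, and the eps guard against log(0) is an added assumption:

```python
import numpy as np

def train_logistic(Phi, t):
    # IRLS: Newton-Raphson updates for the logistic regression weights.
    w = np.zeros(Phi.shape[1])              # all weights initialised to 0
    prev_err, eps = None, 1e-12
    while True:
        y = 1.0 / (1.0 + np.exp(-Phi @ w))  # current sigmoid outputs
        R = np.diag(y * (1.0 - y))          # weighting matrix
        w_new = w - np.linalg.pinv(Phi.T @ R @ Phi) @ (Phi.T @ (y - t))
        y_new = 1.0 / (1.0 + np.exp(-Phi @ w_new))
        err = -np.sum(t * np.log(y_new + eps) + (1 - t) * np.log(1 - y_new + eps))
        if np.sum(np.abs(w_new - w)) < 0.001:                      # weight test
            return w_new
        if prev_err is not None and abs(err - prev_err) < 0.001:   # error test
            return w_new
        w, prev_err = w_new, err
```
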
73 | To produce this output in a uniform manner, use these printing statements:
74 | Use:
75 | printf("ID=%5d, predicted=%3d, probability = %.4lf, true=%3d, accuracy=%4.2lf\n",
76 | object_id, predicted_class, probability, true_class, accuracy);
77 | 
78 | ### Running The Program
79 | 1) type logistic_regression(training_filename, degree, testing_filename)
80 | 2) press enter.
81 | 3) for example: logistic_regression('pendigits_training.txt', 1, 'pendigits_test.txt')
82 | 
-------------------------------------------------------------------------------- /01_MATLAB/01_Descision_Tree_and_Random_Forest/README.md: --------------------------------------------------------------------------------
1 | ### Task 1
2 | 
3 | In this task you will implement decision trees and decision forests.
4 | Your program will learn decision trees from training data and will apply decision trees and decision forests to classify test objects.
5 | Your zip file should have a folder called decision_trees, which contains your code and the README.txt file.
6 | 
7 | ### Command-line Arguments
8 | 
9 | Your program will be invoked as follows:
10 | dtree