├── Bioinformatics └── README.md ├── Computer Vision └── README.md ├── LICENSE ├── Machine Learning ├── Problem1 │ ├── 1_Logistic_Regression.ipynb │ ├── 2_Poisson_Regression.ipynb │ ├── 3_Gaussian_Discriminant_Analysis.ipynb │ ├── 4_Linear_Invariance.ipynb │ ├── 5_Quasar_Regression.ipynb │ ├── data │ │ ├── logistic_x.txt │ │ ├── logistic_y.txt │ │ ├── quasar_test.csv │ │ └── quasar_train.csv │ └── ps1.pdf ├── Problem2 │ ├── 1_Training_Stability.ipynb │ ├── 2_Model_Calibration.ipynb │ ├── 3_Bayesian_Logistic_Regression.ipynb │ ├── 4_Constructing_Kernels.ipynb │ ├── 5_Kernelizing_the_Perceptron.ipynb │ ├── 6_Spam_Classification.ipynb │ ├── data │ │ ├── MATRIX.TEST │ │ ├── MATRIX.TRAIN │ │ ├── MATRIX.TRAIN.100 │ │ ├── MATRIX.TRAIN.1400 │ │ ├── MATRIX.TRAIN.200 │ │ ├── MATRIX.TRAIN.400 │ │ ├── MATRIX.TRAIN.50 │ │ ├── MATRIX.TRAIN.800 │ │ ├── data_a.txt │ │ └── data_b.txt │ ├── nb.py │ ├── ps2.pdf │ └── svm.py ├── Problem3 │ ├── 1_Simple_Neural_Network.ipynb │ ├── 2_EM_for_MAP.ipynb │ ├── 3_EM_Application.ipynb │ ├── 4_KL_Divergence.ipynb │ ├── 5_K-means_for_Compression.ipynb │ ├── data │ │ ├── mandrill-large.tiff │ │ ├── mandrill-small.tiff │ │ └── triangle_pb3_1.jpg │ └── ps3.pdf ├── Problem4 │ ├── 2_EM-Convergence.ipynb │ ├── 4_Independent-Component-Analysis.ipynb │ ├── data │ │ ├── bellsej.py │ │ ├── cart_pole.py │ │ ├── control.py │ │ ├── mix.dat │ │ ├── mnist.zip │ │ └── nn_starter.py │ └── ps4.pdf └── Readme.md ├── NLP ├── README.md ├── assignment1 │ ├── Makefile │ ├── assignment1-solution.pdf │ ├── assignment1.pdf │ ├── collect_submission.sh │ ├── get_datasets.sh │ ├── q1_softmax.py │ ├── q2_gradcheck.py │ ├── q2_neural.py │ ├── q2_sigmoid.py │ ├── q3_run.py │ ├── q3_sgd.py │ ├── q3_word2vec.py │ ├── q3_word_vectors.png │ ├── q4_dev_conf.png │ ├── q4_reg_v_acc.png │ ├── q4_sentiment.py │ ├── requirements.txt │ └── utils │ │ ├── __pycache__ │ │ ├── __init__.cpython-36.pyc │ │ ├── glove.cpython-36.pyc │ │ └── treebank.cpython-36.pyc │ │ ├── glove.py │ │ └── treebank.py ├── assignment2 │ ├── assignment2-soln.pdf │ ├── assignment2.pdf │ ├── model.py │ ├── q1_classifier.py │ ├── q1_softmax.py │ ├── q2_initialization.py │ ├── q2_parser_model.py │ ├── q2_parser_transitions.py │ └── utils │ │ ├── general_utils.py │ │ └── parser_utils.py └── assignment3 │ ├── assignment3-soln.pdf │ ├── assignment3.pdf │ ├── data_util.py │ ├── defs.py │ ├── model.py │ ├── ner_model.py │ ├── q1_window.py │ ├── q2_rnn.py │ ├── q2_rnn_cell.py │ ├── q3-clip-gru.png │ ├── q3-clip-rnn.png │ ├── q3-noclip-gru.png │ ├── q3_gru.py │ ├── q3_gru_cell.py │ ├── requirements.txt │ └── util.py ├── Python ├── CME193 │ ├── lec1.pdf │ ├── lec2.pdf │ ├── lec3.pdf │ ├── lec4.pdf │ ├── lec5.pdf │ ├── lec6.pdf │ ├── lec7.pdf │ ├── lec8.pdf │ └── problemsets │ │ ├── Markov-chain-startercode-master.zip │ │ ├── Rock-paper-scissors-startercode-master.zip │ │ ├── exercises.pdf │ │ ├── hangman-master.zip │ │ └── shakespeare.txt └── README.md ├── Readme.md └── Speech └── Readme.md /Bioinformatics/README.md: -------------------------------------------------------------------------------- 1 | 6 | 7 | # Bio Informatics 8 | 9 | ## Course List 10 | **S.No** | **Course Title** | **Link to course** 11 | ------------ | ------------- | --------- 12 | [1](#1-computational-systems-biology--deep-learning-in-life-sciences) | Computational Systems Biology : Deep Learning in Life Sciences | https://mit6874.github.io/ 13 | [2](#2-computational-genomics) | Computatinal Genomics | https://web.stanford.edu/class/cs262/ 14 | [3](#3-the-human-genome-source-code) 
| The Human Genome Source Code | https://web.stanford.edu/class/cs273a/ 15 | 16 | ## Course Details 17 | ### 1. Computational Systems Biology : Deep Learning in Life Sciences 18 | * **Link to course**            :     https://mit6874.github.io/ 19 | * **Offered By**                  :     MIT 20 | * **Pre-Requisites**           :     Calculus, Linear Algebra, Python programming,Probability, 21 |                                            Introductory molecular biology 22 | * **Level**                           :     Advanced 23 | * **Course description** 24 | This course introduces foundations and state-of-the-art machine learning challenges in genomics and the life sciences more broadly. The course introduces both deep learning and classical machine learning approaches to key problems, comparing and contrasting their power and limitations. 25 | 26 | 27 | ### 2. Computational Genomics 28 | * **Link to course**            :     https://web.stanford.edu/class/cs262/ 29 | * **Offered By**                  :     Stanford 30 | * **Pre-Requisites**           :     Design and Analysis of Algorithms 31 | * **Level**                           :     Intermediate 32 | * **Course description** 33 | Genomics is a new and very active application area of computer science. Computer science is playing a central role in genomics: from sequencing and assembling of DNA sequences to analyzing genomes in order to locate genes, similarities between sequences of different organisms, and several other applications. This course aims to present some of the most basic and useful algorithms for sequence analysis, together with the minimal biological background. Sequence alignments, hidden Markov models, multiple alignment algorithms and heuristics such as Gibbs sampling, and the probabilistic interpretation of alignments will be covered. 34 | 35 | 36 | ### 3. The Human Genome Source Code 37 | * **Link to course**            :     https://web.stanford.edu/class/cs273a/ 38 | * **Offered By**                  :     Stanford 39 | * **Pre-Requisites**           :     Programming Experience in any language 40 | * **Level**                           :     Beginner 41 | * **Course description** 42 | The course introduces you to various aspects of genomic data . The course contents cover Population genomics & paternity testing, Medical AI (disease) genomics and Comparative (evolutionary) genomics and maybe a dash of cryptogenomics and genomic privacy. 43 | 44 | #### Happy Learning   :thumbsup: :memo: 45 | 46 | 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /Computer Vision/README.md: -------------------------------------------------------------------------------- 1 | 8 | 9 | # Computer Vision 10 | 11 | ## Course List 12 | **S.No** | **Course Title** | **Link to course** | **Link to Assignment Solutions** 13 | ------------ | ------------- | --------- | ----------- 14 | [1](#1-computer-vision--foundations-and-applications) | Computer Vision: Foundations and Applications | http://vision.stanford.edu/teaching/cs131_fall2122/ | [CS131 Solutions](https://github.com/StanfordVL/CS131_release) 15 | [2](#2-deep-learning-for-computer-vision) | Deep Learning for Computer Vision | http://cs231n.stanford.edu/ | [CS231 Solutions](https://github.com/mantasu/cs231n) 16 | 17 | 18 | ## Course Details 19 | ### 1. 
Computer Vision: Foundations and Applications 20 | * **Link to course**            :     http://vision.stanford.edu/teaching/cs131_fall2122/ 21 | * **Offered By**                  :     Stanford 22 | * **Pre-Requisites**           :     Calculus, Linear Algebra, Python programming,Probability, Statistics 23 | 24 | * **Level**                           :     Beginner 25 | * **Course description** 26 | Computer Vision is one of the fastest growing and most exciting AI disciplines in today’s academia and industry. This 10-week course is designed to open the doors for students who are interested in learning about the fundamental principles and important applications of computer vision.It covers topics ranging from basic operations on images to Image Segmentation. Students will be exposed to a number of real-world applications that are important to our daily lives. 27 | 28 | 29 | ### 2. Deep Learning for Computer Vision 30 | * **Link to course**            :     http://cs231n.stanford.edu/ 31 | * **Offered By**                  :     Stanford 32 | * **Pre-Requisites**           :     Proficiency in Python, Calculus, Linear Algebra,Probability, Statistics 33 | * **Level**                           :     Advanced 34 | * **Course description** 35 | This course is a deep dive into the details of deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement and train their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. 36 | 37 | 38 | 39 | 46 | 47 | #### Happy Learning   :thumbsup: :memo: 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 bayeslabs 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Machine Learning/Problem1/2_Poisson_Regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 1\n", 8 | "## Problem 2: Poisson Regression and the Exponential Family\n", 9 | "\n", 10 | "\n", 11 | "**C. 
Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 1, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps1.pdf](ps1.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x^i$ is the $i^{th}$ feature vector\n", 22 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $m$ is the number of training examples\n", 24 | "- $n$ is the number of features" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "### Question 2.a)\n", 32 | "The exponential family is a class of distributions with the following form:\n", 33 | "\n", 34 | "$$\n", 35 | "p(y;\\eta) = b(y)\\exp{(\\eta^T T(y) - a(\\eta))}\n", 36 | "$$\n", 37 | "\n", 38 | "By identifying the parameters of the Poisson distribution:\n", 39 | "\n", 40 | "$$\n", 41 | "p(y;\\lambda) = \\frac{e^{-\\lambda}\\lambda^y}{y!}\n", 42 | "$$\n", 43 | "\n", 44 | "- $b(y) = \\frac{1}{y!}$\n", 45 | "- $T(y) = y$\n", 46 | "- $\\eta = \\log \\lambda$\n", 47 | "- $a(\\eta) = e^{\\eta}$" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "### Question 2.b)\n", 55 | "\n", 56 | "The canonical response function for the Poisson distribution is given by:\n", 57 | "\n", 58 | "$$\n", 59 | "g(\\eta) = E(y;\\eta) = \\lambda = e^{\\eta}\n", 60 | "$$" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "### Question 2.c)\n", 68 | "\n", 69 | "The log-likelihood of a training example $(x^i, y^i)$ is given by $\\log p(y^i|x^i;\\theta)$.\n", 70 | "\n", 71 | "To derive the stochastic gradient descent update rule, we start by computing the partial derivative of the log-likelihood with respect to parameter $\\theta_j$, with $\\eta = \\theta^T x^i$:\n", 72 | "\n", 73 | "$$\n", 74 | "\\begin{align*}\n", 75 | "\\frac{\\partial}{\\partial \\theta_j} \\log p(y^i|x^i;\\theta) &= \\frac{\\partial}{\\partial \\theta_j} ((\\theta^T x^i)^T y^i - e^{\\theta^T x^i}) \\\\\n", 76 | "&= (y^i-e^{\\theta^T x^i})x_j^i\n", 77 | "\\end{align*}\n", 78 | "$$\n", 79 | "\n", 80 | "The yields the update rule for stochastic gradient descent with learning rate $\\alpha$:\n", 81 | "\n", 82 | "$$\n", 83 | "\\theta_j := \\theta_j + \\alpha.(y^i-e^{\\theta^T x^i})x_j^i\n", 84 | "$$" 85 | ] 86 | } 87 | ], 88 | "metadata": { 89 | "kernelspec": { 90 | "display_name": "Python 2", 91 | "language": "python", 92 | "name": "python2" 93 | }, 94 | "language_info": { 95 | "codemirror_mode": { 96 | "name": "ipython", 97 | "version": 2 98 | }, 99 | "file_extension": ".py", 100 | "mimetype": "text/x-python", 101 | "name": "python", 102 | "nbconvert_exporter": "python", 103 | "pygments_lexer": "ipython2", 104 | "version": "2.7.15" 105 | } 106 | }, 107 | "nbformat": 4, 108 | "nbformat_minor": 2 109 | } 110 | -------------------------------------------------------------------------------- /Machine Learning/Problem1/4_Linear_Invariance.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 1\n", 8 | "## Problem 4: Linear Invariance of Optimization Algorithms\n", 9 | "\n", 10 | "**C. 
Combier**\n", 11 | "\n", 12 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 1, taught by Andrew Ng.\n", 13 | "\n", 14 | "The problem set can be found here: [./ps1.pdf](ps1.pdf)\n", 15 | "\n", 16 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 17 | "\n", 18 | "## Notation\n", 19 | "\n", 20 | "- $x^i$ is the $i^{th}$ feature vector\n", 21 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 22 | "- $m$ is the number of training examples\n", 23 | "- $n$ is the number of features" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": { 29 | "colab_type": "text", 30 | "id": "9J7p406abzgl" 31 | }, 32 | "source": [ 33 | "### Question 4.a)\n", 34 | "Let:\n", 35 | " - $z = A^{-1} x$\n", 36 | " - $g(z) = f(Az)$\n", 37 | " \n", 38 | "We also define the notation $H_f |_x$ and $\\nabla_f |_x$ , which are respectively the Hessian and Gradient of function $f$ evaluted at $x$.\n", 39 | " \n", 40 | "Write the update rule for Newton-Rhapson's method:\n", 41 | " \n", 42 | " $$\n", 43 | " \\begin{align*}\n", 44 | " z : &= z -H_g^{-1} |_z. \\nabla_g |_z \\\\\n", 45 | " : &= z - H_f^{-1} |_{Az}. \\nabla_f |_{Az}\n", 46 | " \\end{align*}\n", 47 | " $$\n", 48 | " \n", 49 | "We calculate the Hessian using the chain rule:\n", 50 | " $$\n", 51 | " H_f|_{Az} = A^T . H_f |_{Az} . A \\implies H_f|_{Az} ^{-1}= A^{-1} . H_f|_{Az}^{-1} . A^{T^{-1}}\n", 52 | " $$\n", 53 | " \n", 54 | "Similarly, the chain rule applied to the gradient operator is given by:\n", 55 | " \n", 56 | " $$\n", 57 | " \\nabla_f |_{Az} = A^T . \\nabla_f|_{Az}\n", 58 | " $$\n", 59 | " \n", 60 | "Combining the two in the update rule:\n", 61 | " \n", 62 | " $$\n", 63 | " \\begin{align*}\n", 64 | " z : &= z - A^{-1} . H_f|_{Az}^{-1}. A^{T^{-1}}. A^T .\\nabla_f|_{Az} \\\\\n", 65 | " : &= z - A^{-1} H_f|_{Az}^{-1} .\\nabla_f|_{Az} \\\\\n", 66 | " : &= z - A^{-1} H_f|_x^{-1} .\\nabla_f|_x \\\\\n", 67 | " : &= A^{-1} . (x - H_f|_x^{-1} . \\nabla_f|_x )\n", 68 | " \\end{align*}\n", 69 | " $$\n", 70 | " \n", 71 | "Which proves the linearity of the Newton-Rhapson method." 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "colab_type": "text", 78 | "id": "OLXoC8jH6mlv" 79 | }, 80 | "source": [ 81 | "### Question 4.b)\n", 82 | "\n", 83 | "In this question, we show that gradient descent is not invariant to linear reparametrization.\n", 84 | "\n", 85 | "Consider the function $f:x \\mapsto x^2$, $x \\in R$\n", 86 | "\n", 87 | "We now consider the gradient descent update rule for this function and parameter $z = \\lambda x$:\n", 88 | "\n", 89 | "$$\n", 90 | "\\begin{align*}\n", 91 | "z:&= z - \\alpha \\frac{df}{dz} \\\\\n", 92 | ": &= z - \\alpha \\frac{df}{dx} \\frac{dx}{dz} \\\\\n", 93 | ": &= \\lambda x -\\frac{\\alpha}{\\lambda} (2 \\lambda x) = \\lambda x - \\alpha (2x) \\\\\n", 94 | "\\neq \\lambda.(x - \\alpha.\\frac{df}{dx}) = \\lambda (x - \\alpha.(2x))\n", 95 | "\\end{align*}\n", 96 | "$$\n", 97 | "\n", 98 | "This counter example shows that gradient descent is not invariant to linear reparametrization." 
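As a supplementary numerical check (not part of the original solution), the minimal sketch below verifies both claims on a small convex test function: after one update from the same starting point, Newton's method produces the same iterate in either parametrization (related by $A$), while gradient descent does not. The test function $f$, the matrix $A$, and the learning rate are arbitrary choices made purely for illustration.

```python
import numpy as np

# Sketch: Newton's method is invariant to the reparametrization z = A^{-1} x,
# gradient descent is not. f, A and alpha below are illustrative choices.

def f_grad(x):
    # f(x) = sum(exp(x)) + 0.5 * ||x||^2, a smooth, strictly convex test function
    return np.exp(x) + x

def f_hess(x):
    return np.diag(np.exp(x)) + np.eye(len(x))

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])      # invertible reparametrization matrix
x = np.array([1.0, -0.5])       # starting point in x-space
z = np.linalg.solve(A, x)       # the same point expressed in z-space
alpha = 0.1                     # gradient-descent learning rate

# One Newton step in each parametrization, using g(z) = f(Az):
x_newton = x - np.linalg.solve(f_hess(x), f_grad(x))
g_grad = A.T.dot(f_grad(A.dot(z)))                 # grad g(z) = A^T grad f(Az)
g_hess = A.T.dot(f_hess(A.dot(z))).dot(A)          # Hess g(z) = A^T Hess f(Az) A
z_newton = z - np.linalg.solve(g_hess, g_grad)

# One gradient-descent step in each parametrization:
x_gd = x - alpha * f_grad(x)
z_gd = z - alpha * g_grad

print("Newton:           ||A z' - x'|| = %.2e" % np.linalg.norm(A.dot(z_newton) - x_newton))
print("Gradient descent: ||A z' - x'|| = %.2e" % np.linalg.norm(A.dot(z_gd) - x_gd))
```

The first discrepancy is zero up to floating-point round-off, while the second is clearly not, matching the derivations above.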
99 | ] 100 | } 101 | ], 102 | "metadata": { 103 | "kernelspec": { 104 | "display_name": "Python 2", 105 | "language": "python", 106 | "name": "python2" 107 | }, 108 | "language_info": { 109 | "codemirror_mode": { 110 | "name": "ipython", 111 | "version": 2 112 | }, 113 | "file_extension": ".py", 114 | "mimetype": "text/x-python", 115 | "name": "python", 116 | "nbconvert_exporter": "python", 117 | "pygments_lexer": "ipython2", 118 | "version": "2.7.15" 119 | } 120 | }, 121 | "nbformat": 4, 122 | "nbformat_minor": 2 123 | } 124 | -------------------------------------------------------------------------------- /Machine Learning/Problem1/data/logistic_x.txt: -------------------------------------------------------------------------------- 1 | 1.3432504e+00 -1.3311479e+00 2 | 1.8205529e+00 -6.3466810e-01 3 | 9.8632067e-01 -1.8885762e+00 4 | 1.9443734e+00 -1.6354520e+00 5 | 9.7673352e-01 -1.3533151e+00 6 | 1.9458584e+00 -2.0443278e+00 7 | 2.1075153e+00 -2.1256684e+00 8 | 2.0703730e+00 -2.4634101e+00 9 | 8.6864964e-01 -2.4119348e+00 10 | 1.8006594e+00 -2.7739689e+00 11 | 3.1283787e+00 -3.4452432e+00 12 | 3.0947429e+00 -3.6446145e+00 13 | 2.9086652e+00 -4.0065037e+00 14 | 2.6770338e+00 -3.0198592e+00 15 | 2.7458671e+00 -2.7100561e+00 16 | 4.1714647e+00 -3.4622482e+00 17 | 3.9313220e+00 -2.1099044e+00 18 | 4.3786870e+00 -2.3804743e+00 19 | 4.8016565e+00 -3.3803344e+00 20 | 4.1661050e+00 -2.8138844e+00 21 | 2.4670141e+00 -1.6108444e+00 22 | 3.4826743e+00 -1.5533872e+00 23 | 3.3652482e+00 -1.8164936e+00 24 | 2.8772788e+00 -1.8511689e+00 25 | 3.1090444e+00 -1.6384946e+00 26 | 2.2183701e+00 7.4279558e-02 27 | 1.9949873e+00 1.6268659e-01 28 | 2.9500308e+00 1.6873016e-02 29 | 2.0216009e+00 1.7227387e-01 30 | 2.0486921e+00 -6.3581041e-01 31 | 8.7548563e-01 -5.4586168e-01 32 | 5.7079941e-01 -3.3278660e-02 33 | 1.4266468e+00 -7.5288337e-01 34 | 7.2265633e-01 -8.6691930e-01 35 | 9.5346198e-01 -1.4896956e+00 36 | 4.8333333e+00 7.0175439e-02 37 | 4.3070175e+00 1.4152047e+00 38 | 6.0321637e+00 4.5029240e-01 39 | 5.4181287e+00 -2.7076023e+00 40 | 3.4590643e+00 -2.8245614e+00 41 | 2.7280702e+00 -9.2397661e-01 42 | 1.0029240e+00 7.7192982e-01 43 | 3.6637427e+00 -7.7777778e-01 44 | 4.3070175e+00 -1.0409357e+00 45 | 3.6929825e+00 -1.0526316e-01 46 | 5.7397661e+00 -1.6257310e+00 47 | 4.9795322e+00 -1.5087719e+00 48 | 6.5000000e+00 -2.9122807e+00 49 | 5.2426901e+00 9.1812865e-01 50 | 1.6754386e+00 5.6725146e-01 51 | 5.1708997e+00 1.2103667e+00 52 | 4.8795188e+00 1.6081848e+00 53 | 4.6649870e+00 1.0695532e+00 54 | 4.4934321e+00 1.2351592e+00 55 | 4.1512967e+00 8.6721260e-01 56 | 3.7177080e+00 1.1517200e+00 57 | 3.6224477e+00 1.3106769e+00 58 | 3.0606943e+00 1.4857163e+00 59 | 7.0718465e+00 -3.4961651e-01 60 | 6.0391832e+00 -2.4756832e-01 61 | 6.6747480e+00 -1.2484766e-01 62 | 6.8461291e+00 2.5977167e-01 63 | 6.4270724e+00 -1.4713863e-01 64 | 6.8456065e+00 1.4754967e+00 65 | 7.7054006e+00 1.6045555e+00 66 | 6.2870658e+00 2.4156427e+00 67 | 6.9810956e+00 1.2599865e+00 68 | 7.0990172e+00 2.2155151e+00 69 | 5.5275479e+00 2.9968421e-01 70 | 5.8303489e+00 -2.1974408e-01 71 | 6.3594527e+00 2.3944217e-01 72 | 6.1004524e+00 -4.0957414e-02 73 | 5.6237412e+00 3.7135914e-01 74 | 5.8836969e+00 2.7768186e+00 75 | 5.5781611e+00 3.0682889e+00 76 | 7.0050662e+00 -2.5781727e-01 77 | 4.4538114e+00 8.3941831e-01 78 | 5.6495924e+00 1.3053929e+00 79 | 4.6337489e+00 1.9467546e+00 80 | 3.6986847e+00 2.2594084e+00 81 | 4.1193005e+00 2.5474510e+00 82 | 4.7665558e+00 2.7531209e+00 83 | 3.0812098e+00 2.7985255e+00 84 | 4.0730994e+00 -3.0292398e+00 
85 | 3.4883041e+00 -1.8888889e+00 86 | 7.6900585e-01 1.2105263e+00 87 | 1.5000000e+00 3.8128655e+00 88 | 5.7982456e+00 -2.0935673e+00 89 | 6.8114529e+00 -8.3456730e-01 90 | 7.1106096e+00 -1.0201158e+00 91 | 7.4941520e+00 -1.7426901e+00 92 | 3.1374269e+00 4.2105263e-01 93 | 1.6754386e+00 5.0877193e-01 94 | 2.4941520e+00 -8.6549708e-01 95 | 4.7748538e+00 9.9415205e-02 96 | 5.8274854e+00 -6.9005848e-01 97 | 2.2894737e+00 1.9707602e+00 98 | 2.4941520e+00 1.4152047e+00 99 | 2.0847953e+00 1.3567251e+00 100 | -------------------------------------------------------------------------------- /Machine Learning/Problem1/data/logistic_y.txt: -------------------------------------------------------------------------------- 1 | -1.0000000e+00 2 | -1.0000000e+00 3 | -1.0000000e+00 4 | -1.0000000e+00 5 | -1.0000000e+00 6 | -1.0000000e+00 7 | -1.0000000e+00 8 | -1.0000000e+00 9 | -1.0000000e+00 10 | -1.0000000e+00 11 | -1.0000000e+00 12 | -1.0000000e+00 13 | -1.0000000e+00 14 | -1.0000000e+00 15 | -1.0000000e+00 16 | -1.0000000e+00 17 | -1.0000000e+00 18 | -1.0000000e+00 19 | -1.0000000e+00 20 | -1.0000000e+00 21 | -1.0000000e+00 22 | -1.0000000e+00 23 | -1.0000000e+00 24 | -1.0000000e+00 25 | -1.0000000e+00 26 | -1.0000000e+00 27 | -1.0000000e+00 28 | -1.0000000e+00 29 | -1.0000000e+00 30 | -1.0000000e+00 31 | -1.0000000e+00 32 | -1.0000000e+00 33 | -1.0000000e+00 34 | -1.0000000e+00 35 | -1.0000000e+00 36 | -1.0000000e+00 37 | -1.0000000e+00 38 | -1.0000000e+00 39 | -1.0000000e+00 40 | -1.0000000e+00 41 | -1.0000000e+00 42 | -1.0000000e+00 43 | -1.0000000e+00 44 | -1.0000000e+00 45 | -1.0000000e+00 46 | -1.0000000e+00 47 | -1.0000000e+00 48 | -1.0000000e+00 49 | -1.0000000e+00 50 | -1.0000000e+00 51 | 1.0000000e+00 52 | 1.0000000e+00 53 | 1.0000000e+00 54 | 1.0000000e+00 55 | 1.0000000e+00 56 | 1.0000000e+00 57 | 1.0000000e+00 58 | 1.0000000e+00 59 | 1.0000000e+00 60 | 1.0000000e+00 61 | 1.0000000e+00 62 | 1.0000000e+00 63 | 1.0000000e+00 64 | 1.0000000e+00 65 | 1.0000000e+00 66 | 1.0000000e+00 67 | 1.0000000e+00 68 | 1.0000000e+00 69 | 1.0000000e+00 70 | 1.0000000e+00 71 | 1.0000000e+00 72 | 1.0000000e+00 73 | 1.0000000e+00 74 | 1.0000000e+00 75 | 1.0000000e+00 76 | 1.0000000e+00 77 | 1.0000000e+00 78 | 1.0000000e+00 79 | 1.0000000e+00 80 | 1.0000000e+00 81 | 1.0000000e+00 82 | 1.0000000e+00 83 | 1.0000000e+00 84 | 1.0000000e+00 85 | 1.0000000e+00 86 | 1.0000000e+00 87 | 1.0000000e+00 88 | 1.0000000e+00 89 | 1.0000000e+00 90 | 1.0000000e+00 91 | 1.0000000e+00 92 | 1.0000000e+00 93 | 1.0000000e+00 94 | 1.0000000e+00 95 | 1.0000000e+00 96 | 1.0000000e+00 97 | 1.0000000e+00 98 | 1.0000000e+00 99 | 1.0000000e+00 100 | -------------------------------------------------------------------------------- /Machine Learning/Problem1/ps1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Machine Learning/Problem1/ps1.pdf -------------------------------------------------------------------------------- /Machine Learning/Problem2/2_Model_Calibration.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 2\n", 8 | "## Problem 2: Model Calibration\n", 9 | "\n", 10 | "\n", 11 | "**C. 
Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 2, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps2.pdf](ps2.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x^i$ is the $i^{th}$ feature vector\n", 22 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $m$ is the number of training examples\n", 24 | "- $n$ is the number of features" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": { 30 | "colab_type": "text", 31 | "id": "rsUtreJMonLw" 32 | }, 33 | "source": [ 34 | "### Question 2.a)\n", 35 | "\n", 36 | "The maximum likelihood parameters $\\theta^*$ are obtained by writing the gradient of the log-likelihood with respect to $\\theta$ and setting it to $0$. In matrix form, this is equivalent to solving the following equation:\n", 37 | "\n", 38 | "$$\n", 39 | "X^T(Y-h_{\\theta}(X)) = 0\n", 40 | "$$\n", 41 | "\n", 42 | "Where:\n", 43 | "- $X$ is an $m \\times (n+1)$ matrix, given the addition of the intercept term $x_0 = 1 \\hspace{1em}, \\forall i$\n", 44 | "- $Y$ is an $m \\times 1$ matrix\n", 45 | "\n", 46 | "Expanding the matrix equation for $\\theta = \\theta^*$ gives:\n", 47 | "\n", 48 | "$$\n", 49 | " \\left[ {\\begin{array}{cccc}\n", 50 | " 1 & ... & 1 \\\\\n", 51 | " x^1_1 & ... & x_n^1 \\\\\n", 52 | " & ... & \\\\\n", 53 | " x^m_1 & ... & x^m_n\n", 54 | " \\end{array} } \\right]\n", 55 | " (Y-h_{\\theta^*}(X)) = 0\n", 56 | " $$\n", 57 | " \n", 58 | " If we extract the first line from the above matrix equation, we get:\n", 59 | " \n", 60 | " $$\n", 61 | " \\sum_{i=1}^m y^i = \\sum_{i=1}^m h_{\\theta^*}(x^i)\n", 62 | " $$\n", 63 | " \n", 64 | " Using the definition of $h_{\\theta^*}$:\n", 65 | " \n", 66 | " $$\n", 67 | " \\sum_{i=1}^m 1(y^i = 1) = \\sum_{i=1}^m P(y = 1|x;\\theta^*)\n", 68 | " $$\n", 69 | " \n", 70 | " We conclule by saying that $|\\{ i \\in I_{0,1} \\}| = m$ which shows the property holds true for $(a,b) = (0,1)$\n", 71 | " \n", 72 | " ### Question 2.b)\n", 73 | " \n", 74 | " - If a model is perfectly callibrated, then all we can say is that the probabilities output from the model match empirical observations. This only describes the probabilities of the outcomes, and not the outcomes themselves, therefore the model does not necessarily achieve perfect accuracy.\n", 75 | " - Conversely, if a model has perfect accuracy, then the probabilities output by the model necessarily match empirical observations\n", 76 | " \n", 77 | " \n", 78 | " This implies that callibration is a weaker assumption than accuracy." 
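As a supplementary check (not part of the original solution), the sketch below fits logistic regression by Newton's method on synthetic data and verifies the property just proved for $(a, b) = (0, 1)$: at $\theta^*$, the sum of the predicted probabilities equals the number of positive labels. The synthetic data and the number of Newton iterations are arbitrary choices made for illustration.

```python
import numpy as np

# Sketch: at the MLE of logistic regression (with intercept), the predicted
# probabilities sum to the number of positive labels. Data is synthetic.

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

np.random.seed(0)
m, n = 500, 2
X = np.hstack([np.ones((m, 1)), np.random.randn(m, n)])   # intercept term x_0 = 1
theta_true = np.array([-0.5, 2.0, -1.0])                  # illustrative "true" parameters
y = (np.random.rand(m) < sigmoid(X.dot(theta_true))).astype(float)

# Fit theta_MLE with a few Newton-Raphson steps on the log-likelihood.
theta = np.zeros(n + 1)
for _ in range(25):
    h = sigmoid(X.dot(theta))
    grad = X.T.dot(y - h)                                  # gradient of the log-likelihood
    H = -X.T.dot(X * (h * (1 - h))[:, None])               # Hessian of the log-likelihood
    theta -= np.linalg.solve(H, grad)

h = sigmoid(X.dot(theta))
print("sum of predicted probabilities: %.4f" % h.sum())
print("number of positive labels:      %.4f" % y.sum())    # the two should agree
```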
79 | ] 80 | } 81 | ], 82 | "metadata": { 83 | "kernelspec": { 84 | "display_name": "Python 2", 85 | "language": "python", 86 | "name": "python2" 87 | }, 88 | "language_info": { 89 | "codemirror_mode": { 90 | "name": "ipython", 91 | "version": 2 92 | }, 93 | "file_extension": ".py", 94 | "mimetype": "text/x-python", 95 | "name": "python", 96 | "nbconvert_exporter": "python", 97 | "pygments_lexer": "ipython2", 98 | "version": "2.7.15" 99 | } 100 | }, 101 | "nbformat": 4, 102 | "nbformat_minor": 2 103 | } 104 | -------------------------------------------------------------------------------- /Machine Learning/Problem2/3_Bayesian_Logistic_Regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 2\n", 8 | "## Problem 3: Bayesian Logistic Regression and Weight Decay\n", 9 | "\n", 10 | "\n", 11 | "**C. Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 3, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps2.pdf](ps2.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x^i$ is the $i^{th}$ feature vector\n", 22 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $m$ is the number of training examples\n", 24 | "- $n$ is the number of features" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": { 30 | "colab_type": "text", 31 | "id": "rsUtreJMonLw" 32 | }, 33 | "source": [ 34 | "### Question 3)\n", 35 | "\n", 36 | "Suppose that $|| \\theta_{MAP} ||^2 > || \\theta_{MLE} ||^2$.\n", 37 | "\n", 38 | "Then, given the prior that $\\theta$ is a gaussian random variable:\n", 39 | "\n", 40 | "$$\n", 41 | "\\begin{align*}\n", 42 | "p(\\theta_{MAP}) &< p(\\theta_{MLE}) \\\\\n", 43 | "p(\\theta_{MAP}) \\prod_{i=1}^m p(y^i |x^i; \\theta_{MAP}) &< p(\\theta_{MLE}) \\prod_{i=1}^m p(y^i |x^i; \\theta_{MAP}) \\\\\n", 44 | "&< p(\\theta_{MLE}) \\prod_{i=1}^m p(y^i |x^i; \\theta_{MLE})\n", 45 | "\\end{align*}\n", 46 | "$$\n", 47 | "This is true because by the definition of $\\theta_{MLE}$:\n", 48 | "$$\n", 49 | "\\forall \\theta, \\prod_{i=1}^m p(y^i |x^i; \\theta) < \\prod_{i=1}^m p(y^i |x^i; \\theta_{MLE})\n", 50 | "$$\n", 51 | "\n", 52 | "However, this statement contradicts the definition of $\\theta_{MAP}$. 
Therefore, our initial assumption is incorrect, which proves that:\n", 53 | "\n", 54 | "$$\n", 55 | "|| \\theta_{MAP} ||^2 \\leq || \\theta_{MLE} ||^2\n", 56 | "$$" 57 | ] 58 | } 59 | ], 60 | "metadata": { 61 | "kernelspec": { 62 | "display_name": "Python 2", 63 | "language": "python", 64 | "name": "python2" 65 | }, 66 | "language_info": { 67 | "codemirror_mode": { 68 | "name": "ipython", 69 | "version": 2 70 | }, 71 | "file_extension": ".py", 72 | "mimetype": "text/x-python", 73 | "name": "python", 74 | "nbconvert_exporter": "python", 75 | "pygments_lexer": "ipython2", 76 | "version": "2.7.15" 77 | } 78 | }, 79 | "nbformat": 4, 80 | "nbformat_minor": 2 81 | } 82 | -------------------------------------------------------------------------------- /Machine Learning/Problem2/4_Constructing_Kernels.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 2\n", 8 | "## Problem 4: Constructing Kernels\n", 9 | "\n", 10 | "\n", 11 | "**C. Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 2, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps2.pdf](ps2.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x^i$ is the $i^{th}$ feature vector\n", 22 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $m$ is the number of training examples\n", 24 | "- $n$ is the number of features" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": { 30 | "colab_type": "text", 31 | "id": "Cc5deiNrag6C" 32 | }, 33 | "source": [ 34 | "### Question 4.a)\n", 35 | "\n", 36 | "$$\n", 37 | "\\begin{align*}\n", 38 | "u^T K u &= u^T K_1 u + u^T K_2 u\\\\\n", 39 | "\\end{align*}\n", 40 | "$$\n", 41 | "\n", 42 | "Since $u^T K_1 u \\geq 0$ and $u^T K_2 u \\geq 0$, we have that $u^T K u \\geq 0$ and thus $K$ is a Mercer kernel.\n", 43 | "\n", 44 | "### Question 4.b)\n", 45 | "\n", 46 | "$$\n", 47 | "\\begin{align*}\n", 48 | "u^T K u &= u^T K_1 u - u^T K_2 u\\\\\n", 49 | "\\end{align*}\n", 50 | "$$\n", 51 | "\n", 52 | "Therefore $u^T K u$ is not necessarily positive, i.e. $K$ is not a Mercer kernel.\n", 53 | "\n", 54 | "### Question 4.c)\n", 55 | "\n", 56 | "$$\n", 57 | "\\begin{align*}\n", 58 | "u^T K u &= u^T a K_1 u\\\\\n", 59 | "&= a. u^T K_1 u \\geq 0\n", 60 | "\\end{align*}\n", 61 | "$$\n", 62 | "\n", 63 | "Therefore $K$ is a Mercer kernel.\n", 64 | "\n", 65 | "### Question 4.d)\n", 66 | "\n", 67 | "$$\n", 68 | "\\begin{align*}\n", 69 | "u^T K u &= - u^T a K_1 u\\\\\n", 70 | "&= -a. 
u^T K_1 u \\leq 0\n", 71 | "\\end{align*}\n", 72 | "$$\n", 73 | "\n", 74 | "Therefore $K$ is **not** a Mercer kernel.\n", 75 | "\n", 76 | "### Question 4.e)\n", 77 | "\n", 78 | "$$\n", 79 | "\\begin{align*}\n", 80 | "u^T K u &= u^T K_1 K_2 u\\\\\n", 81 | "&= u^T K_1 [uu^T] [uu^T]^{-1} K_2 u \\\\\n", 82 | "&= [u^T K_1 u] u^T [uu^T]^{-1} K_2 u\n", 83 | "\\end{align*}\n", 84 | "$$\n", 85 | "\n", 86 | "Now, we use the linear algebra property:\n", 87 | "$$\n", 88 | "A^{-1} = [A^T A]^{-1} A^T\n", 89 | "$$\n", 90 | "The proof is straightforward (multiplying left and right by $A$ yields $[A^T A]^{-1} [A^T A] = I$).\n", 91 | "\n", 92 | "Choosing $A = uu^T$ and replacing in the previous formulation, we get:\n", 93 | "\n", 94 | "$$\n", 95 | "[uu^T]^{-1} = [[uu^T]^T[uu^T]]^{-1} uu^T\n", 96 | "$$\n", 97 | "\n", 98 | "Since $uu^T$ is symmetric, $[uu^T]^T = uu^T$, therefore:\n", 99 | "\n", 100 | "$$\n", 101 | "[uu^T]^{-1} = [[uu^T]^2]^{-1} uu^T\n", 102 | "$$\n", 103 | "\n", 104 | "We inject this formulation into the previous step:\n", 105 | "\n", 106 | "$$\n", 107 | "\\begin{align*}\n", 108 | "u^T K u &= [u^T K_1 u] u^T [[uu^T]^2]^{-1} u[u^T K_2 u]\\\\\n", 109 | "\\end{align*}\n", 110 | "$$\n", 111 | "\n", 112 | "Let $C = [[uu^T]^2]^{-1}$. To complete the proof, we need to show that $C \\geq 0$.\n", 113 | "\n", 114 | "We know that by construction, $uu^T$ is symetric. Therefore $uu^T$ is diagonalizable in an orthogonal basis:\n", 115 | "\n", 116 | "$$\n", 117 | "A = uu^T = Q \\Lambda Q^{-1}.\n", 118 | "$$\n", 119 | "\n", 120 | "Squaring this result yields:\n", 121 | "\n", 122 | "$$\n", 123 | "A^2 = [uu^T]^2 = Q \\Lambda ^2 Q^{-1}.\n", 124 | "$$\n", 125 | "\n", 126 | "The diagonal elements of $\\Lambda^2$ are the eigenvalues of $[uu^T]^2$. The eigenvalues are all positive, hence $[uu^T]^2$ is semi defininte positive.\n", 127 | "\n", 128 | "Finally, because $[uu^T]^2$ is semi definite positive, $C=[[uu^T]^2]^{-1}$ is also semi definite positive.\n", 129 | "\n", 130 | "We therefore have:\n", 131 | "\n", 132 | "$$\n", 133 | "\\begin{align*}\n", 134 | "u^T K u &= [u^T K_1 u] [u^T C u][u^T K_2 u] \\geq 0\\\\\n", 135 | "\\end{align*}\n", 136 | "$$\n", 137 | "\n", 138 | "This concludes the proof that $K$ is indeed a Mercer kernel, since all the elements of this product are positive.\n", 139 | "\n", 140 | "### Question 4.f)\n", 141 | "\n", 142 | "$K$ is not a Mercer kernel. A counter example would be $f: y \\mapsto sign(y)$, and choosing $(x,z) = (-1,1)$.\n", 143 | "\n", 144 | "### Question 4.g)\n", 145 | "\n", 146 | "It is straightforward to prove $K$ is a Mercer kernel, since $K_3$ is a Mercer kernel. This is independant of the chosen map $\\phi$.\n", 147 | "\n", 148 | "### Question 4.f)\n", 149 | "\n", 150 | "We need to prove that $\\forall a_q \\geq 0$, $\\forall N$:\n", 151 | "\n", 152 | "$$\\sum_{q=0}^N a_q K_1 ^q$$\n", 153 | "\n", 154 | "is also a Mercer kernel.\n", 155 | "\n", 156 | "Let's start by showing that $\\forall q, K_1^q$ is a Mercer kernel. We can do this by induction:\n", 157 | "\n", 158 | "**$k=0$:** the result is immediate, since $u^T K_1^0 u = u^T u = ||u||^2 \\geq 0$\n", 159 | "\n", 160 | "**$k \\implies k+1$:** this result is proved in question 4.e)\n", 161 | "\n", 162 | "Furthermore, $\\forall q, a_q \\geq 0$ so according to 4.c), $a_q K_1^q$ is also a Mercer kernel.\n", 163 | "\n", 164 | "Finally, according to 4.a), the sum of two Mercer kernels is also a Mercer kernel. 
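As an aside, and not part of the original solution, the short sketch below checks these closure properties numerically: it builds Gram matrices for two standard Mercer kernels on random points and confirms that the constructions shown to be valid remain positive semi-definite (up to round-off), while $-K_1$ does not. The base kernels and the sample points are arbitrary illustrative choices.

```python
import numpy as np

# Sketch: numerical sanity check of the kernel constructions from Question 4.
np.random.seed(0)
X = np.random.randn(30, 3)

K1 = X.dot(X.T)                                        # linear kernel
sq = np.sum(X ** 2, axis=1)
K2 = np.exp(-(sq[:, None] + sq[None, :] - 2 * K1))     # Gaussian (RBF) kernel

def min_eig(K):
    # smallest eigenvalue of a symmetric matrix; PSD iff >= 0 (up to round-off)
    return np.linalg.eigvalsh(K).min()

print("K1 + K2        : %+.3e" % min_eig(K1 + K2))             # >= 0  (4.a)
print("3 * K1         : %+.3e" % min_eig(3.0 * K1))            # >= 0  (4.c)
print("-K1            : %+.3e" % min_eig(-K1))                 # <  0  (4.d)
print("K1 * K2        : %+.3e" % min_eig(K1 * K2))             # >= 0  (4.e, entrywise product)
print("1 + K1 + K1**2 : %+.3e" % min_eig(1.0 + K1 + K1 ** 2))  # >= 0  (polynomial, last part)
```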
This concludes the proof that $K$ is a Mercer kernel.\n", 165 | "\n" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [] 174 | } 175 | ], 176 | "metadata": { 177 | "kernelspec": { 178 | "display_name": "Python 2", 179 | "language": "python", 180 | "name": "python2" 181 | }, 182 | "language_info": { 183 | "codemirror_mode": { 184 | "name": "ipython", 185 | "version": 2 186 | }, 187 | "file_extension": ".py", 188 | "mimetype": "text/x-python", 189 | "name": "python", 190 | "nbconvert_exporter": "python", 191 | "pygments_lexer": "ipython2", 192 | "version": "2.7.15" 193 | } 194 | }, 195 | "nbformat": 4, 196 | "nbformat_minor": 2 197 | } 198 | -------------------------------------------------------------------------------- /Machine Learning/Problem2/5_Kernelizing_the_Perceptron.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 2\n", 8 | "## Problem 5: Kernelizing the Perceptron\n", 9 | "\n", 10 | "\n", 11 | "**C. Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 5, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps2.pdf](ps2.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x^i$ is the $i^{th}$ feature vector\n", 22 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $m$ is the number of training examples\n", 24 | "- $n$ is the number of features" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": { 30 | "colab_type": "text", 31 | "id": "rsUtreJMonLw" 32 | }, 33 | "source": [ 34 | "### Question 5.a)\n", 35 | "\n", 36 | "Let $K$ be a Mercer kernel with mapping $\\phi: R^n \\to E $:\n", 37 | "\n", 38 | "$$\n", 39 | "\\begin{align*}\n", 40 | "K(x, y) &\\mapsto \\langle \\phi(x), \\phi(y) \\rangle \\\\\n", 41 | "\\end{align*}\n", 42 | "$$\n", 43 | "\n", 44 | "$\\theta^i$ can be represented as $K(\\theta^i,x^i )$\n", 45 | "\n", 46 | "### Question 5.b)\n", 47 | "\n", 48 | "\n", 49 | "$h_{\\theta^i} (x^{i+1}) = g( {\\theta^i}^T \\phi(x^{i+1}) ) = g( K ({\\theta^i} ,x^{i+1}) ) $\n", 50 | "\n", 51 | "### Question 5.c)\n", 52 | "\n", 53 | "Simply remap the update rule by using $K$:\n", 54 | "\n", 55 | "$$\n", 56 | "K (\\theta^{i+1} ,x^{i+1}) := K ({\\theta^i} ,x^{i+1})+ \\alpha y^{i+1}.1 \\left \\{ y^{i+1}g \\left( K ({\\theta^i} ,x^{i+1}) \\right) \\right \\}. 
K(x^{i+1},x^{i+1})\n", 57 | "$$" 58 | ] 59 | } 60 | ], 61 | "metadata": { 62 | "kernelspec": { 63 | "display_name": "Python 2", 64 | "language": "python", 65 | "name": "python2" 66 | }, 67 | "language_info": { 68 | "codemirror_mode": { 69 | "name": "ipython", 70 | "version": 2 71 | }, 72 | "file_extension": ".py", 73 | "mimetype": "text/x-python", 74 | "name": "python", 75 | "nbconvert_exporter": "python", 76 | "pygments_lexer": "ipython2", 77 | "version": "2.7.15" 78 | } 79 | }, 80 | "nbformat": 4, 81 | "nbformat_minor": 2 82 | } 83 | -------------------------------------------------------------------------------- /Machine Learning/Problem2/data/data_a.txt: -------------------------------------------------------------------------------- 1 | -1.000000000000000000e+00 6.012660321346644521e-01 1.650910586864833274e-01 2 | 1.000000000000000000e+00 8.717253403947561319e-01 5.273606284195629934e-01 3 | -1.000000000000000000e+00 3.725479744663405812e-01 4.466090687037850282e-01 4 | -1.000000000000000000e+00 1.357664310444239852e-02 5.135778964393811208e-02 5 | 1.000000000000000000e+00 5.830316375912952820e-01 7.106191307030319537e-01 6 | 1.000000000000000000e+00 9.084797126970022285e-01 1.752718002509726647e-01 7 | -1.000000000000000000e+00 3.999644820298295933e-01 4.739952831980015491e-01 8 | 1.000000000000000000e+00 8.325367962545292544e-01 5.980482975033334370e-01 9 | -1.000000000000000000e+00 4.816405545569502067e-03 9.844565337131838678e-01 10 | -1.000000000000000000e+00 7.499084384207852505e-01 4.542354333273501688e-02 11 | 1.000000000000000000e+00 6.787728052832739944e-01 9.244621614802555065e-01 12 | -1.000000000000000000e+00 2.955893587048743498e-01 3.491424901113958645e-01 13 | -1.000000000000000000e+00 2.036023752010092114e-01 5.920026826758448824e-01 14 | -1.000000000000000000e+00 2.198870010022596633e-01 4.525792858555204301e-01 15 | -1.000000000000000000e+00 1.454150156442755026e-01 8.284065671346926285e-01 16 | -1.000000000000000000e+00 2.507999560886482460e-01 5.412252402844155430e-01 17 | -1.000000000000000000e+00 4.454069172249203179e-01 9.642287028245233316e-02 18 | 1.000000000000000000e+00 2.533909449181281914e-02 9.541419533675528086e-01 19 | 1.000000000000000000e+00 8.795209274605995109e-01 9.807605212599779243e-01 20 | 1.000000000000000000e+00 7.941665553454693161e-01 4.595618235830615239e-01 21 | -1.000000000000000000e+00 2.692782581937129827e-01 3.045600108717329002e-01 22 | -1.000000000000000000e+00 5.269267778019376403e-01 1.446119313623852598e-01 23 | -1.000000000000000000e+00 6.425484830157719429e-01 1.444312811956353082e-01 24 | -1.000000000000000000e+00 2.762516897555838957e-01 2.210722138600362818e-02 25 | -1.000000000000000000e+00 5.583924564641318256e-01 3.164687286818336220e-01 26 | -1.000000000000000000e+00 2.075942837293226484e-01 5.810765156019325195e-01 27 | -1.000000000000000000e+00 5.865040077341432401e-01 1.730316178976930575e-01 28 | -1.000000000000000000e+00 3.805484638713033663e-01 6.717623204463272213e-01 29 | -1.000000000000000000e+00 3.813986396562527581e-01 3.077646651653764831e-02 30 | -1.000000000000000000e+00 4.899248660762223206e-01 4.167626968687931921e-02 31 | 1.000000000000000000e+00 6.682081706177449565e-01 6.628198755504333128e-01 32 | -1.000000000000000000e+00 3.421034653103514067e-01 7.363192575332345724e-01 33 | 1.000000000000000000e+00 8.337479918488441832e-01 1.573775395900884888e-01 34 | -1.000000000000000000e+00 4.923674631211191199e-01 3.888211557841462218e-01 35 | -1.000000000000000000e+00 2.746871724354470468e-01 
2.194037119875775765e-01 36 | 1.000000000000000000e+00 9.514244202703872055e-01 7.517850107244149482e-01 37 | 1.000000000000000000e+00 7.222970233828414077e-01 6.293849395650549239e-01 38 | 1.000000000000000000e+00 7.221358438649915223e-01 9.296141057040921973e-01 39 | -1.000000000000000000e+00 1.351819603026834793e-01 1.854270482295840017e-01 40 | 1.000000000000000000e+00 6.847589110633647280e-01 3.005782511358631170e-01 41 | 1.000000000000000000e+00 9.167839007195689449e-01 7.608979395120699651e-01 42 | -1.000000000000000000e+00 7.296113795807901425e-02 4.672119711166422551e-01 43 | 1.000000000000000000e+00 8.453793119640681253e-01 7.107858693353201751e-01 44 | 1.000000000000000000e+00 8.758550459041712921e-01 5.390947722932232233e-01 45 | -1.000000000000000000e+00 6.680240193628613765e-01 4.079056712401981644e-01 46 | -1.000000000000000000e+00 1.942797580183325268e-01 6.786361588339889783e-01 47 | -1.000000000000000000e+00 6.992478452176446035e-01 2.772214675572326481e-02 48 | 1.000000000000000000e+00 4.856696344043648361e-01 6.878385389105863279e-01 49 | -1.000000000000000000e+00 1.532187070142976282e-01 7.760493991951687986e-01 50 | 1.000000000000000000e+00 4.260091802091322544e-01 8.316101224643255296e-01 51 | 1.000000000000000000e+00 7.730169346598478874e-01 8.167050106762755446e-01 52 | 1.000000000000000000e+00 2.811591409918925422e-01 7.915812228496058589e-01 53 | -1.000000000000000000e+00 3.969712841861736674e-01 5.436956492094292548e-01 54 | -1.000000000000000000e+00 1.561845853065626510e-01 1.149413637944906030e-01 55 | -1.000000000000000000e+00 5.462254340670188446e-01 2.775432517674987221e-01 56 | 1.000000000000000000e+00 9.147298002684245422e-01 8.771582557983079731e-01 57 | 1.000000000000000000e+00 5.480483804952250848e-01 5.571056036752553009e-01 58 | 1.000000000000000000e+00 8.750401697283192171e-01 9.688990899668680212e-01 59 | -1.000000000000000000e+00 8.693212797165661421e-02 4.463068285614146813e-01 60 | 1.000000000000000000e+00 4.598375862902908118e-01 8.810994801265042975e-01 61 | -1.000000000000000000e+00 3.757937993395965570e-02 3.157852668502673099e-01 62 | -1.000000000000000000e+00 1.644957474326235181e-01 7.126782272435483456e-01 63 | -1.000000000000000000e+00 4.696837654443807297e-01 7.651211500270849175e-02 64 | -1.000000000000000000e+00 3.457483908016467655e-01 6.395094766845679235e-01 65 | 1.000000000000000000e+00 8.106231383282292979e-01 4.007879765862863986e-01 66 | -1.000000000000000000e+00 2.198238787111499448e-01 3.812991804733401047e-01 67 | -1.000000000000000000e+00 7.804157051854651028e-01 1.695647137097033852e-01 68 | -1.000000000000000000e+00 1.332143858129445357e-01 6.990228277977402760e-01 69 | 1.000000000000000000e+00 2.147236390512209381e-01 8.529417155381071591e-01 70 | 1.000000000000000000e+00 8.302333072606401521e-01 4.886383657396295987e-01 71 | 1.000000000000000000e+00 6.565655768761827771e-01 6.457323732473402300e-01 72 | 1.000000000000000000e+00 7.214153912553155079e-01 2.401191402571628553e-01 73 | -1.000000000000000000e+00 1.323621993588250945e-01 3.770522982995052619e-01 74 | -1.000000000000000000e+00 4.977100693580231994e-01 6.548100620049546183e-02 75 | -1.000000000000000000e+00 2.540307903435575776e-01 6.519108537876683318e-01 76 | 1.000000000000000000e+00 9.925329991819680231e-01 8.009199234853079385e-01 77 | 1.000000000000000000e+00 9.871058868993810576e-01 8.958814489034055972e-01 78 | 1.000000000000000000e+00 7.329421343906131758e-01 7.162614142817553819e-01 79 | 1.000000000000000000e+00 9.944219413039141475e-01 
7.052213028969684938e-01 80 | -1.000000000000000000e+00 2.072586215931685460e-01 7.856577766861695400e-01 81 | -1.000000000000000000e+00 6.419590805832910974e-01 3.274665648965369158e-01 82 | -1.000000000000000000e+00 3.809536714273964453e-02 7.312617957403662050e-01 83 | -1.000000000000000000e+00 7.506710432993660698e-01 1.265072249195027254e-01 84 | 1.000000000000000000e+00 8.343988342644053091e-01 1.841024147367045227e-01 85 | -1.000000000000000000e+00 1.758563408675866135e-01 4.620185467796678047e-01 86 | 1.000000000000000000e+00 9.017343818930201316e-01 6.451069718007419462e-01 87 | 1.000000000000000000e+00 4.645860976665043829e-01 8.036931879984066107e-01 88 | -1.000000000000000000e+00 1.095113168514771917e-01 5.896622604055185013e-01 89 | 1.000000000000000000e+00 2.595460748012538010e-01 6.656903665016161709e-01 90 | 1.000000000000000000e+00 8.248707100076894116e-01 5.982805782011830775e-01 91 | 1.000000000000000000e+00 6.066008767832432591e-01 7.002726085427883884e-01 92 | -1.000000000000000000e+00 4.960382430993948155e-03 9.641836628417100874e-01 93 | 1.000000000000000000e+00 9.187704651055217386e-02 9.457729354805407551e-01 94 | 1.000000000000000000e+00 2.146449197939926945e-01 9.362964528123883801e-01 95 | 1.000000000000000000e+00 7.650517822114389910e-01 7.758744085669813106e-01 96 | 1.000000000000000000e+00 7.910586028200941033e-01 7.463191757483961242e-01 97 | -1.000000000000000000e+00 3.116241907975170200e-01 3.477116968694634602e-01 98 | 1.000000000000000000e+00 3.473903707850247713e-01 9.182780921232864824e-01 99 | 1.000000000000000000e+00 9.469950080796307734e-01 8.927011126301536148e-01 100 | -1.000000000000000000e+00 1.636710794703860605e-01 8.530396603820045165e-02 101 | -------------------------------------------------------------------------------- /Machine Learning/Problem2/data/data_b.txt: -------------------------------------------------------------------------------- 1 | -1.000000000000000000e+00 5.956630502064887978e-01 1.930721369700331147e-01 2 | -1.000000000000000000e+00 4.369971909808768595e-01 5.448065208512253843e-01 3 | 1.000000000000000000e+00 8.999454640117418025e-01 8.459224350533809389e-01 4 | -1.000000000000000000e+00 5.550637832421146944e-01 9.263357825110341004e-03 5 | -1.000000000000000000e+00 7.468707153253317799e-02 2.828451350645997397e-01 6 | -1.000000000000000000e+00 5.560221769762927480e-01 4.096332857505222691e-01 7 | -1.000000000000000000e+00 6.795002114321661013e-01 2.984057796113770422e-04 8 | -1.000000000000000000e+00 4.710146542284510129e-02 9.463613488310902433e-01 9 | 1.000000000000000000e+00 7.238166214179516667e-01 4.940647054551196016e-01 10 | -1.000000000000000000e+00 2.443048766770348212e-01 1.766116557088118766e-01 11 | 1.000000000000000000e+00 5.974033174044348637e-01 6.139306418759564732e-01 12 | -1.000000000000000000e+00 2.069637273226839769e-01 3.987357388610405229e-01 13 | -1.000000000000000000e+00 3.221219680536931973e-01 2.844430513676717842e-01 14 | 1.000000000000000000e+00 7.445778579629004357e-01 4.351377072146410674e-01 15 | -1.000000000000000000e+00 5.451932897422276936e-01 2.062341013493946829e-01 16 | -1.000000000000000000e+00 1.696239808246304825e-01 4.358392332310834227e-03 17 | 1.000000000000000000e+00 2.339114962383119778e-01 9.684295920296079885e-01 18 | 1.000000000000000000e+00 5.622484507716049018e-01 6.023455730528913810e-01 19 | -1.000000000000000000e+00 2.802845826689382980e-01 1.867436119464408462e-01 20 | -1.000000000000000000e+00 3.079895604262861131e-02 3.020010729636660729e-01 21 | 
-1.000000000000000000e+00 2.278860921159409081e-01 6.609776742839968966e-01 22 | -1.000000000000000000e+00 2.775014686241032980e-01 4.238468460906956725e-01 23 | 1.000000000000000000e+00 3.378432848578666325e-01 7.944254769116867454e-01 24 | 1.000000000000000000e+00 9.939354074741211242e-01 8.490465597278253895e-01 25 | -1.000000000000000000e+00 2.863167202290489710e-01 5.959512902157737546e-02 26 | -1.000000000000000000e+00 1.209130049738784685e-01 3.141006657109364220e-01 27 | 1.000000000000000000e+00 4.003937420986558582e-02 9.676845691209272626e-01 28 | 1.000000000000000000e+00 8.086399636856905770e-01 8.618918132536165233e-01 29 | -1.000000000000000000e+00 5.539808004962957222e-01 1.907996257607158519e-02 30 | 1.000000000000000000e+00 1.163521320106113421e-01 9.398709177182549279e-01 31 | 1.000000000000000000e+00 7.301063472763824613e-01 9.499676490265697160e-01 32 | 1.000000000000000000e+00 8.468338829463664119e-01 1.867208747061926966e-01 33 | 1.000000000000000000e+00 2.608233534980617385e-01 9.834132630673575459e-01 34 | 1.000000000000000000e+00 4.199570420269110871e-01 9.327541919772166512e-01 35 | 1.000000000000000000e+00 7.719418059150739975e-01 5.532023133427225181e-01 36 | 1.000000000000000000e+00 9.206744943572922057e-01 6.352192232989287701e-01 37 | 1.000000000000000000e+00 5.293421631045709397e-01 7.222684582313321222e-01 38 | -1.000000000000000000e+00 1.379460622648243096e-02 4.214618296938960063e-01 39 | -1.000000000000000000e+00 7.751426490739787845e-02 6.299004172832024517e-01 40 | 1.000000000000000000e+00 9.276983243348403407e-01 1.040934615547040032e-01 41 | 1.000000000000000000e+00 7.957432241562331088e-01 9.215388971855021927e-01 42 | -1.000000000000000000e+00 2.239672931215105356e-01 7.332974484931875647e-02 43 | 1.000000000000000000e+00 9.422253608049449003e-01 5.218366190287160311e-01 44 | 1.000000000000000000e+00 9.651959620807119000e-01 2.014979368917352298e-01 45 | 1.000000000000000000e+00 9.940321308542976464e-01 6.081093015679264191e-01 46 | 1.000000000000000000e+00 6.658087328963868678e-01 5.027583853754320486e-01 47 | -1.000000000000000000e+00 7.176560564085979754e-01 3.989391362458483137e-02 48 | -1.000000000000000000e+00 3.487063110075080408e-01 2.238231883533835509e-01 49 | -1.000000000000000000e+00 2.709494622365625771e-01 2.082144211471060880e-01 50 | -1.000000000000000000e+00 3.182573103269309422e-01 3.915896341357829602e-01 51 | 1.000000000000000000e+00 8.274854880780476707e-01 7.264992541831815087e-01 52 | -1.000000000000000000e+00 5.362365071659461746e-01 4.372891008985122507e-01 53 | -1.000000000000000000e+00 8.427624155699986463e-02 4.113506863837846916e-01 54 | 1.000000000000000000e+00 6.701074178232642176e-01 4.536317270900971366e-01 55 | 1.000000000000000000e+00 8.544602135318842828e-01 2.735166442807617226e-01 56 | 1.000000000000000000e+00 9.949426365900033709e-01 7.080489150662402364e-01 57 | 1.000000000000000000e+00 9.344457506471001151e-01 4.628294452441769069e-01 58 | 1.000000000000000000e+00 2.748701859697193495e-01 8.689564728606002930e-01 59 | -1.000000000000000000e+00 3.562057238439320095e-01 3.450267206455902569e-01 60 | 1.000000000000000000e+00 9.877831746152774262e-01 4.914572650953237254e-01 61 | 1.000000000000000000e+00 7.987092991607904757e-01 6.098110781977787997e-01 62 | 1.000000000000000000e+00 8.038461863207471136e-01 2.830525912541793643e-01 63 | 1.000000000000000000e+00 8.130156172775482304e-01 9.302480416896362625e-01 64 | 1.000000000000000000e+00 9.059172919354674391e-01 3.568542056989827405e-01 65 | 
-1.000000000000000000e+00 4.337382469790391770e-01 4.783305272611882986e-01 66 | -1.000000000000000000e+00 1.143317228148252873e-01 7.397845361075310322e-01 67 | -1.000000000000000000e+00 3.449200082398627965e-01 6.130545045172697272e-01 68 | -1.000000000000000000e+00 2.781990965302262309e-01 6.411607338569670356e-01 69 | 1.000000000000000000e+00 6.964452033739039205e-01 8.817775541629888636e-01 70 | 1.000000000000000000e+00 7.989675199719957766e-01 3.531907985451089305e-01 71 | 1.000000000000000000e+00 8.768836927519900737e-01 6.774500857515753927e-01 72 | 1.000000000000000000e+00 6.348480021319043987e-01 5.015602127922110798e-01 73 | -1.000000000000000000e+00 2.119084404426209156e-01 2.856859505361388774e-01 74 | 1.000000000000000000e+00 5.865762180265414738e-01 5.713716895712067645e-01 75 | -1.000000000000000000e+00 6.985710058820981949e-02 7.915028704009061666e-01 76 | -1.000000000000000000e+00 2.355671221113677660e-01 6.438144833145231782e-02 77 | 1.000000000000000000e+00 8.877615762048381987e-01 5.200746512035342439e-01 78 | -1.000000000000000000e+00 2.449941209134361975e-01 3.213293478699230654e-02 79 | -1.000000000000000000e+00 9.069944587950939940e-02 8.690374126034291491e-01 80 | 1.000000000000000000e+00 5.088278277278240891e-01 6.612414281766401114e-01 81 | 1.000000000000000000e+00 6.520418524389888226e-01 7.755778069467987867e-01 82 | 1.000000000000000000e+00 7.722246267138052067e-01 8.067057648890757493e-01 83 | -1.000000000000000000e+00 1.326901020660873343e-01 1.189518561857797474e-01 84 | -1.000000000000000000e+00 3.261379646538820065e-02 7.520091046634519438e-01 85 | -1.000000000000000000e+00 4.051593114583660338e-01 2.829115340428780545e-01 86 | -1.000000000000000000e+00 3.261819856628866976e-01 2.026040109627341712e-01 87 | 1.000000000000000000e+00 8.473728790474870376e-01 7.041767658931633589e-01 88 | 1.000000000000000000e+00 3.762819089695988994e-01 6.355178264510906727e-01 89 | 1.000000000000000000e+00 6.647854472459970854e-01 9.596004675793697869e-01 90 | 1.000000000000000000e+00 8.591562056089958599e-01 7.006845513875639142e-01 91 | 1.000000000000000000e+00 5.181283220873837969e-01 5.480024371738405620e-01 92 | 1.000000000000000000e+00 7.278833585497114234e-01 4.235243120149102536e-01 93 | 1.000000000000000000e+00 3.096603842065530632e-01 9.127329643704719109e-01 94 | -1.000000000000000000e+00 2.894308966996906873e-01 1.389663259706833687e-01 95 | 1.000000000000000000e+00 9.077251355288461498e-01 2.432028201602077777e-01 96 | 1.000000000000000000e+00 8.173287941331177642e-01 6.937093875591073822e-01 97 | -1.000000000000000000e+00 3.721150829343222721e-02 1.226343137474231737e-01 98 | 1.000000000000000000e+00 9.715801573056137563e-02 9.315221884331510438e-01 99 | 1.000000000000000000e+00 8.075115122905083265e-01 5.837523984780504938e-01 100 | -1.000000000000000000e+00 8.298607464063743056e-01 8.628668164813368957e-02 101 | -------------------------------------------------------------------------------- /Machine Learning/Problem2/nb.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def readMatrix(file): 4 | fd = open(file, 'r') 5 | hdr = fd.readline() 6 | rows, cols = [int(s) for s in fd.readline().strip().split()] 7 | tokens = fd.readline().strip().split() 8 | matrix = np.zeros((rows, cols)) 9 | Y = [] 10 | for i, line in enumerate(fd): 11 | nums = [int(x) for x in line.strip().split()] 12 | Y.append(nums[0]) 13 | kv = np.array(nums[1:]) 14 | k = np.cumsum(kv[:-1:2]) 15 | v = kv[1::2] 16 | matrix[i, k] 
= v 17 | return matrix, tokens, np.array(Y) 18 | 19 | def nb_train(matrix, category): 20 | state = {} 21 | N = matrix.shape[1] 22 | ################### 23 | 24 | ################### 25 | return state 26 | 27 | def nb_test(matrix, state): 28 | output = np.zeros(matrix.shape[0]) 29 | ################### 30 | 31 | ################### 32 | return output 33 | 34 | def evaluate(output, label): 35 | error = (output != label).sum() * 1. / len(output) 36 | print 'Error: %1.4f' % error 37 | 38 | def main(): 39 | trainMatrix, tokenlist, trainCategory = readMatrix('./data/MATRIX.TRAIN') 40 | testMatrix, tokenlist, testCategory = readMatrix('./data/MATRIX.TEST') 41 | 42 | state = nb_train(trainMatrix, trainCategory) 43 | output = nb_test(testMatrix, state) 44 | 45 | evaluate(output, testCategory) 46 | return 47 | 48 | if __name__ == '__main__': 49 | main() 50 | -------------------------------------------------------------------------------- /Machine Learning/Problem2/ps2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Machine Learning/Problem2/ps2.pdf -------------------------------------------------------------------------------- /Machine Learning/Problem2/svm.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | tau = 8. 4 | 5 | def svm_readMatrix(file): 6 | fd = open(file, 'r') 7 | hdr = fd.readline() 8 | rows, cols = [int(s) for s in fd.readline().strip().split()] 9 | tokens = fd.readline().strip().split() 10 | matrix = np.zeros((rows, cols)) 11 | Y = [] 12 | for i, line in enumerate(fd): 13 | nums = [int(x) for x in line.strip().split()] 14 | Y.append(nums[0]) 15 | kv = np.array(nums[1:]) 16 | k = np.cumsum(kv[:-1:2]) 17 | v = kv[1::2] 18 | matrix[i, k] = v 19 | category = (np.array(Y) * 2) - 1 20 | return matrix, tokens, category 21 | 22 | def svm_train(matrix, category): 23 | state = {} 24 | M, N = matrix.shape 25 | ##################### 26 | Y = category 27 | matrix = 1. * (matrix > 0) 28 | squared = np.sum(matrix * matrix, axis=1) 29 | gram = matrix.dot(matrix.T) 30 | K = np.exp(-(squared.reshape((1, -1)) + squared.reshape((-1, 1)) - 2 * gram) / (2 * (tau ** 2)) ) 31 | 32 | alpha = np.zeros(M) 33 | alpha_avg = np.zeros(M) 34 | L = 1. / (64 * M) 35 | outer_loops = 40 36 | 37 | alpha_avg 38 | for ii in xrange(outer_loops * M): 39 | i = int(np.random.rand() * M) 40 | margin = Y[i] * np.dot(K[i, :], alpha) 41 | grad = M * L * K[:, i] * alpha[i] 42 | if (margin < 1): 43 | grad -= Y[i] * K[:, i] 44 | alpha -= grad / np.sqrt(ii + 1) 45 | alpha_avg += alpha 46 | 47 | alpha_avg /= (ii + 1) * M 48 | 49 | state['alpha'] = alpha 50 | state['alpha_avg'] = alpha_avg 51 | state['Xtrain'] = matrix 52 | state['Sqtrain'] = squared 53 | #################### 54 | return state 55 | 56 | def svm_test(matrix, state): 57 | M, N = matrix.shape 58 | output = np.zeros(M) 59 | ################### 60 | Xtrain = state['Xtrain'] 61 | Sqtrain = state['Sqtrain'] 62 | matrix = 1. * (matrix > 0) 63 | squared = np.sum(matrix * matrix, axis=1) 64 | gram = matrix.dot(Xtrain.T) 65 | K = np.exp(-(squared.reshape((-1, 1)) + Sqtrain.reshape((1, -1)) - 2 * gram) / (2 * (tau ** 2))) 66 | alpha_avg = state['alpha_avg'] 67 | preds = K.dot(alpha_avg) 68 | output = np.sign(preds) 69 | ################### 70 | return output 71 | 72 | def svm_evaluate(output, label): 73 | error = (output != label).sum() * 1. 
/ len(output) 74 | print 'Error: %1.4f' % error 75 | return error 76 | 77 | def main(): 78 | trainMatrix, tokenlist, trainCategory = svm_readMatrix('./data/MATRIX.TRAIN.400') 79 | testMatrix, tokenlist, testCategory = svm_readMatrix('./data/MATRIX.TEST') 80 | 81 | state = svm_train(trainMatrix, trainCategory) 82 | output = svm_test(testMatrix, state) 83 | 84 | svm_evaluate(output, testCategory) 85 | return 86 | 87 | if __name__ == '__main__': 88 | main() 89 | -------------------------------------------------------------------------------- /Machine Learning/Problem3/1_Simple_Neural_Network.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 3\n", 8 | "## Problem 1: A Simple Neural Network\n", 9 | "\n", 10 | "\n", 11 | "**C. Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 3, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps3.pdf](ps3.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x^i$ is the $i^{th}$ feature vector\n", 22 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $m$ is the number of training examples\n", 24 | "- $n$ is the number of features" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "### Question 1.b)\n", 32 | "\n", 33 | "![triangle separation](data/triangle_pb3_1.jpg)\n", 34 | "\n", 35 | "It seems that a triangle can separate the data.\n", 36 | "\n", 37 | "We can construct a weight matrix by using a combination of linear classifiers, where each side of the triangle represents a decision boundary.\n", 38 | "\n", 39 | "Each side of the triangle can be represented by an equation of the form $w_0 +w_1 x_1 + w_2 x_2 = 0$. If we transform this equality into an inequality, then the output represents on which side of the decision boundary a given data point $(x_1,x_2)$ belongs.
The intersection of the outputs for each of these decision boundaries tells us whether $(x_1,x_2)$ lies within the triangle, in which case we will classify it $0$, and if not as $1$.\n", 40 | "\n", 41 | "The first weight matrix can be written as:\n", 42 | "\n" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "$$\n", 50 | "W^{[1]} = \\left ( \\begin{array}{ccc}\n", 51 | "-1 & 4 & 0 \\\\\n", 52 | "-1 & 0 & 4 \\\\\n", 53 | "4.5 & -1 & -1\n", 54 | "\\end{array} \\right )\n", 55 | "$$\n", 56 | "\n", 57 | "The input vector is:\n", 58 | "$$\n", 59 | "X = (\\begin{array}{ccc}\n", 60 | "1 & x_1 & x_2\n", 61 | "\\end{array})^T\n", 62 | "$$\n", 63 | "\n", 64 | "- The first line of $W^{[1]}$ is the equation for the vertical side of the triangle, $x_1 = 0.25$\n", 65 | "- The second line of $W^{[1]}$ is the equation for the horizontal side of the triangle, $x_2 = 0.25$\n", 66 | "- The third line of $W^{[1]}$ is the equation for the oblique side of the triangle, $x_2 = -x_1 + 4.5$\n", 67 | "\n", 68 | "Consequently, with the given activation function, if the training example given by ($x_1$, $x_2$) lies within the triangle, then:\n", 69 | "\n", 70 | "$$\n", 71 | "f(W^{[1]}X) = (\\begin{array}{ccc}\n", 72 | "1 & 1 & 1\n", 73 | "\\end{array})^T\n", 74 | "$$\n", 75 | "\n", 76 | "In all other cases, at least one element of the output vector $f(W^{[1]}X)$ is not equal to $1$.\n", 77 | "\n", 78 | "We can use this observation to find weights for the ouput layer. We take the sum of the components of $f(W^{[1]}X)$, and compare the value to 2.5 to check if all elements are equal to $1$ or not. This gives the weight matrix:\n", 79 | "\n", 80 | "$$\n", 81 | "W^{[2]} =(\\begin{array}{cccc}\n", 82 | "2.5 & -1 & -1 & -1\n", 83 | "\\end{array})\n", 84 | "$$\n", 85 | "\n", 86 | "The additional term 2.5 is the zero intercept. With this weight matrix, the ouput of the final layer will be $0$ if the training example is within the triangle, and $1$ if it is outside of the triangle.\n", 87 | "\n", 88 | "The " 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "### Question 1.c)\n", 96 | "\n", 97 | "A linear activation function does not work, because the problem is not linearly separable, i.e. there is no hyperplane that perfectly separates the data." 98 | ] 99 | } 100 | ], 101 | "metadata": { 102 | "kernelspec": { 103 | "display_name": "Python 2", 104 | "language": "python", 105 | "name": "python2" 106 | }, 107 | "language_info": { 108 | "codemirror_mode": { 109 | "name": "ipython", 110 | "version": 2 111 | }, 112 | "file_extension": ".py", 113 | "mimetype": "text/x-python", 114 | "name": "python", 115 | "nbconvert_exporter": "python", 116 | "pygments_lexer": "ipython2", 117 | "version": "2.7.15" 118 | } 119 | }, 120 | "nbformat": 4, 121 | "nbformat_minor": 2 122 | } 123 | -------------------------------------------------------------------------------- /Machine Learning/Problem3/2_EM_for_MAP.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 3\n", 8 | "## Problem 2: Expectation-Maximization for Maximum a Posteriori\n", 9 | "\n", 10 | "\n", 11 | "**C. 
Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 3, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps3.pdf](ps3.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x^i$ is the $i^{th}$ feature vector\n", 22 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $m$ is the number of training examples\n", 24 | "- $n$ is the number of features" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "This problem is very similar to the derivation of the EM algorithm for MLE given in the lectures notes. The difference is that we are now in a Bayesian setting, and impose a prior on $\\theta$:\n", 32 | "\n", 33 | "$$\n", 34 | "MAP = \\prod_i^m \\sum_{z^i} p(x^i, z^i | \\theta)p(\\theta)\n", 35 | "$$\n", 36 | "\n", 37 | "Here, $z^i$ denotes the latent (hidden) random variables.\n", 38 | "\n", 39 | "### Step 1: E-step\n", 40 | "\n", 41 | "1. We start by taking the log-MAP:\n", 42 | "\n", 43 | "$$\n", 44 | "\\log MAP = \\sum_i^m \\log \\sum_{z^i} Q_i(z^i) \\frac{p(x^i, z^i | \\theta)}{Q_i(z^i)} + \\log p(\\theta)\n", 45 | "$$\n", 46 | "\n", 47 | "2. We apply Jensen's inequality to the above formula:\n", 48 | "\n", 49 | "$$\n", 50 | "\\log MAP \\geq \\sum_i^m \\sum_{z^i} Q_i(z^i) \\log \\frac{p(x^i, z^i | \\theta)}{Q_i(z^i)} + \\log p(\\theta)\n", 51 | "$$\n", 52 | "\n", 53 | "3. Next, we need to choose a distribution $Q_i$ for $z^i$. The above inequality become an equality if $\\frac{p(x^i, z^i | \\theta)}{Q_i(z^i)} = cste$, which will lead to the inequality becoming tight for the current value of $\\theta$:\n", 54 | "\n", 55 | "$$\n", 56 | "\\begin{align*}\n", 57 | "\\frac{p(x^i, z^i | \\theta)}{Q_i(z^i)} = \\lambda & \\iff Q_i(z^i) = \\frac{1}{\\lambda} p(x^i, z^i | \\theta) \\\\\n", 58 | "& \\iff Q_i(z^i) = \\frac{p(x^i, z^i | \\theta) }{\\sum_{z^i}p(x^i, z^i | \\theta)} \\\\\n", 59 | "& \\iff Q_i(z^i) = \\frac{p(x^i, z^i | \\theta) }{p(x^i | \\theta)} \\\\\n", 60 | "& \\iff Q_i(z^i) = p(z^i | x^i, \\theta)\n", 61 | "\\end{align*}\n", 62 | "$$\n", 63 | "\n", 64 | "This obtained by using the fact that since $Q_i$ is a distribution, $\\sum_{z^i} Q_i(z^i) = 1 \\implies \\lambda = \\sum_{z^i} p(x^i,z^i | \\theta)$.\n", 65 | "\n", 66 | "**This completes the E-step of the EM algorithm.**" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "### Step 2: M-step\n", 74 | "\n", 75 | "For the M-step, we simply maximize the expression obtained in step 2) with respect to $\\theta$:\n", 76 | "\n", 77 | "$$\n", 78 | "\\theta := \\text{arg}\\max_{\\theta} \\sum_i^m \\sum_{z^i} Q_i(z^i) \\log \\frac{p(x^i, z^i | \\theta)}{Q_i(z^i)} + \\log p(\\theta)\n", 79 | "$$\n", 80 | "\n", 81 | "As usual, we do this by taking the gradient with respect to $\\theta$ and setting it to $0$." 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "### Proof of Convergence\n", 89 | "\n", 90 | "We consider two successive iterations $k+1$ and $k$ of EM, and we will prove that $\\ell(\\theta^{k+1}) \\geq \\ell(\\theta^k)$, i.e. that $\\ell$ is monotonically increasing.\n", 91 | "\n", 92 | "We refer the reader to the lecture notes, as the proof is the same." 
93 | ] 94 | } 95 | ], 96 | "metadata": { 97 | "kernelspec": { 98 | "display_name": "Python 2", 99 | "language": "python", 100 | "name": "python2" 101 | }, 102 | "language_info": { 103 | "codemirror_mode": { 104 | "name": "ipython", 105 | "version": 2 106 | }, 107 | "file_extension": ".py", 108 | "mimetype": "text/x-python", 109 | "name": "python", 110 | "nbconvert_exporter": "python", 111 | "pygments_lexer": "ipython2", 112 | "version": "2.7.15" 113 | } 114 | }, 115 | "nbformat": 4, 116 | "nbformat_minor": 2 117 | } 118 | -------------------------------------------------------------------------------- /Machine Learning/Problem3/3_EM_Application.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 3\n", 8 | "## Problem 3: EM Application - Paper Reviews\n", 9 | "\n", 10 | "\n", 11 | "**C. Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 3, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps3.pdf](ps3.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x^i$ is the $i^{th}$ feature vector\n", 22 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $m$ is the number of training examples\n", 24 | "- $n$ is the number of features" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "### Question 3.a.i)\n", 32 | "\n", 33 | "$x^{pr} = y^{pr}+z^{pr}+\\epsilon^{pr}$.\n", 34 | "\n", 35 | "Given that, $y^{pr}$, $z^{pr}$ and $\\epsilon^{pr}$ are all Gaussian, then $x^{pr}$ is also gaussian, with\n", 36 | "\n", 37 | "- Mean $\\mu_p + \\nu_r$\n", 38 | "- Variance $\\sigma_p^2 + \\tau_r^2 + \\sigma^2$\n", 39 | "\n", 40 | "The joint probability distribution for $x^{pr}, y^{pr}, z^{pr})$ is gaussian, with:\n", 41 | "\n", 42 | "- Mean $[\\mu_p + \\nu_r, \\mu_p, \\nu_r]^T$\n", 43 | "- Covariance:\n", 44 | "\n", 45 | "$$\n", 46 | "\\left( \\begin{array}{ccc}\n", 47 | "\\sigma_p^2 + \\tau_r^2 + \\sigma^2 & \\sigma_p^2 & \\tau_r^2 \\\\\n", 48 | "\\sigma_p^2 & \\sigma_p^2 & 0 \\\\\n", 49 | "\\tau_r^2 & 0 & \\tau_r^2\n", 50 | "\\end{array}\\right)\n", 51 | "$$" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "### Question 3.a.ii)\n", 59 | "\n", 60 | "For the E-step, we are looking for a certain distribution $Q_{pr}(y^{pr},z^{pr})$ such that:\n", 61 | "\n", 62 | "$$\n", 63 | "\\frac{p(x^{pr},y^{pr},{z^{pr}})}{Q_{pr}(y^{pr},z^{pr})} = cste\n", 64 | "$$\n", 65 | "\n", 66 | "Since $Q_{pr}$ is a distribution, it must sum (discrete case) or integrate (continuous case) to one, i.e. 
: \n", 67 | "\n", 68 | "$$\n", 69 | "\\sum_{r=1}^R \\sum_{p=1}^P Q_{pr}(y^{pr},z^{pr}) = 1\n", 70 | "$$\n", 71 | "\n", 72 | "This yields the value for the constant, and hence the value of $Q_{pr}$:\n", 73 | "\n", 74 | "$$\n", 75 | "\\begin{align*}\n", 76 | "Q_{pr} &= \\frac{p(x^{pr},y^{pr},{z^{pr}})}{\\sum_{r=1}^R \\sum_{p=1}^P p(x^{pr},y^{pr},{z^{pr}})} \\\\\n", 77 | "& = \\frac{p(x^{pr},y^{pr},{z^{pr}})}{p(x^{pr})}\n", 78 | "\\end{align*}\n", 79 | "$$\n", 80 | "\n", 81 | "We recognize the conditional probability given below:\n", 82 | "\n", 83 | "$$\n", 84 | "Q_{pr} = p(y^{pr},z^{pr} | x^{pr})\n", 85 | "$$\n", 86 | "\n", 87 | "This is also a gaussian distribution. Calculations are heavy, but the mean and variance of this joint distribution are given by:\n", 88 | "\n", 89 | "\\begin{align*}\n", 90 | "\\mu_x &=\n", 91 | "\\begin{bmatrix}\n", 92 | "\\mu_p \\\\\n", 93 | "\\nu_r \\\\\n", 94 | "\\end{bmatrix} + \\frac{x^{pr} - \\mu_p - \\nu_r}{\\sigma_p^2 + \\tau_r^2 + \\sigma^2} \n", 95 | "\\begin{bmatrix}\n", 96 | "\\sigma_p^2 \\\\\n", 97 | "\\tau^2 \\\\\n", 98 | "\\end{bmatrix}\n", 99 | "\\end{align*}\n", 100 | "\n", 101 | "\\begin{align*}\n", 102 | "\\Sigma_x \n", 103 | "&= \\begin{bmatrix}\n", 104 | "\\sigma_p^2 & 0 \\\\ \n", 105 | "0 & \\tau_r^2 \\\\ \n", 106 | "\\end{bmatrix} - \\frac{1}{\\sigma_p^2 + \\tau_r^2 + \\sigma^2} \\begin{bmatrix}\n", 107 | "\\sigma_p^4 & \\sigma_p^2 \\tau_r^2 \\\\\n", 108 | "\\tau_r^2 \\sigma_p^2 & \\tau_r^4 \\\\\n", 109 | "\\end{bmatrix} \n", 110 | "\\end{align*}" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "### Question 3.b)" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "In the E-step, we calculate a lower bound for the log-likelihood, and make it tight for the current value of the parameters of the latent variables $y^{pr}$ and $z^{pr}$. In the M-step, we update those parameters by maximizing the lower bound calculated in the E-step. This is done by calculating the gradient of the lower bound with respect to the parameters ($\\mu_p, \\sigma_p, \\nu_r, \\tau_r $), and setting the gradient to $0$." 125 | ] 126 | } 127 | ], 128 | "metadata": { 129 | "kernelspec": { 130 | "display_name": "Python 2", 131 | "language": "python", 132 | "name": "python2" 133 | }, 134 | "language_info": { 135 | "codemirror_mode": { 136 | "name": "ipython", 137 | "version": 2 138 | }, 139 | "file_extension": ".py", 140 | "mimetype": "text/x-python", 141 | "name": "python", 142 | "nbconvert_exporter": "python", 143 | "pygments_lexer": "ipython2", 144 | "version": "2.7.15" 145 | } 146 | }, 147 | "nbformat": 4, 148 | "nbformat_minor": 2 149 | } 150 | -------------------------------------------------------------------------------- /Machine Learning/Problem3/4_KL_Divergence.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 3\n", 8 | "## Problem 4: KL Divergence and Maximum Likelihood\n", 9 | "\n", 10 | "\n", 11 | "**C. 
Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 3, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps3.pdf](ps3.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x^i$ is the $i^{th}$ feature vector\n", 22 | "- $y^i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $m$ is the number of training examples\n", 24 | "- $n$ is the number of features" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "### Question 4.a)\n", 32 | "\n", 33 | "The goal is to prove that $KL(P||Q) \\geq 0$.\n", 34 | "\n", 35 | "$$\n", 36 | "\\begin{align*}\n", 37 | "KL(P||Q) &= \\sum_x P(x) \\log \\frac{P(x)}{Q(x)} \\\\\n", 38 | "&= -\\sum_x P(x) \\log \\frac{Q(x)}{P(x)} \\\\\n", 39 | "& \\geq -\\log \\sum_x P(x) \\frac{Q(x)}{P(x)} \\\\\n", 40 | "& \\geq -\\log \\sum_x Q(x) \\\\\n", 41 | "& \\geq - \\log 1 \\\\\n", 42 | "& \\geq 0\n", 43 | "\\end{align*}\n", 44 | "$$\n", 45 | "\n", 46 | "Now we prove $KL(P||Q) = 0 \\iff P=Q$.\n", 47 | "\n", 48 | "1. If $P = Q$, then it is immediate that $\\log \\frac{P(x)}{Q(x)} = 0$ and hence $KL(P||Q) = 0$\n", 49 | "\n", 50 | "2. If $KL(P||Q) = 0$, then $\\forall x$, $\\frac{P(x)}{Q(x)} = 1$, therefore $P = Q$\n", 51 | "\n" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "### Question 4.b)\n", 59 | "\n", 60 | "\\begin{align*}\n", 61 | "KL(P(X) \\parallel Q(X)) + KL(P(Y|X) \\parallel Q(Y|X)) \n", 62 | "&= \\sum_x P(x) (\\log \\frac{P(x)}{Q(x)} + \\sum_y P(y|x) \\log \\frac{P(y|x)}{Q(y|x)}) \\\\\n", 63 | "&= \\sum_x P(x) \\sum_y P(y|x) ( \\log \\frac{P(x)}{Q(x)} + \\log \\frac{P(y|x)}{Q(y|x)} ) \\\\\n", 64 | "\\end{align*}\n", 65 | "\n", 66 | "We can include the term $\\log \\frac{P(x)}{Q(x)}$ in the sum over $y$, because $\\sum_y P(y|x) = 1$ since $P$ is a probability distribution. We continue the calculation:\n", 67 | "\n", 68 | "\\begin{align*}\n", 69 | "KL(P(X) \\parallel Q(X)) + KL(P(Y|X) \\parallel Q(Y|X)) \n", 70 | "&= \\sum_x P(x) \\sum_y P(y|x) \\log \\frac{P(x) P(y|x)}{Q(x) Q(y|x)} \\\\\n", 71 | "&= \\sum_x P(x) \\sum_y P(y|x) \\log \\frac{P(x, y)}{Q(x, y)} \\\\\n", 72 | "&= \\sum_x P(x, y) \\log \\frac{P(x, y)}{Q(x, y)} \\\\\n", 73 | "&= KL(P(X, Y) || Q(X, Y)) \\\\\n", 74 | "\\end{align*}" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "### Question 4.c)\n", 82 | "\n", 83 | "\\begin{align*}\n", 84 | "KL(\\hat P || P_{\\theta}) \n", 85 | "&= \\sum_x \\hat P(x) \\log \\frac{\\hat P(x)}{P_{\\theta}(x)} \\\\\n", 86 | "&= - \\sum_x \\hat P(x) \\log \\frac{P_{\\theta}(x)}{\\hat P(x)} \\\\\n", 87 | "&= - \\sum_x (\\frac{1}{m} \\sum_{i=1}^{m} 1 \\{x^{(i)} = x\\}). 
\\log \\frac{P_{\\theta}(x)}{\\frac{1}{m} \\sum_{i=1}^{m} 1 \\{x^{(i)} = x\\}} \\\\\n", 88 | "&= - \\frac{1}{m} \\sum_{i=1}^{m} \\log P_{\\theta}(x^{(i)}) \\\\\n", 89 | "\\end{align*}\n", 90 | "\n", 91 | "Thus, minimizing $KL(\\hat P || P_{\\theta})$ is equivalent to maximizing $\\sum_{i=1}^{m} \\log P_{\\theta}(x^{(i)}) = \\ell(\\theta)$" 92 | ] 93 | } 94 | ], 95 | "metadata": { 96 | "kernelspec": { 97 | "display_name": "Python 2", 98 | "language": "python", 99 | "name": "python2" 100 | }, 101 | "language_info": { 102 | "codemirror_mode": { 103 | "name": "ipython", 104 | "version": 2 105 | }, 106 | "file_extension": ".py", 107 | "mimetype": "text/x-python", 108 | "name": "python", 109 | "nbconvert_exporter": "python", 110 | "pygments_lexer": "ipython2", 111 | "version": "2.7.15" 112 | } 113 | }, 114 | "nbformat": 4, 115 | "nbformat_minor": 2 116 | } 117 | -------------------------------------------------------------------------------- /Machine Learning/Problem3/data/mandrill-large.tiff: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Machine Learning/Problem3/data/mandrill-large.tiff -------------------------------------------------------------------------------- /Machine Learning/Problem3/data/mandrill-small.tiff: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Machine Learning/Problem3/data/mandrill-small.tiff -------------------------------------------------------------------------------- /Machine Learning/Problem3/data/triangle_pb3_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Machine Learning/Problem3/data/triangle_pb3_1.jpg -------------------------------------------------------------------------------- /Machine Learning/Problem3/ps3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Machine Learning/Problem3/ps3.pdf -------------------------------------------------------------------------------- /Machine Learning/Problem4/2_EM-Convergence.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 4\n", 8 | "## Problem 2: Expectation-Maximization Convergence \n", 9 | "\n", 10 | "\n", 11 | "**C. 
Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 3, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps4.pdf](ps4.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x_i$ is the $i^{th}$ feature vector\n", 22 | "- $y_i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $z_i$'s are the latent (hidden) variables\n", 24 | "- $m$ is the number of training examples\n", 25 | "- $n$ is the number of features" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "colab_type": "text", 32 | "id": "2URlArNNdz_q" 33 | }, 34 | "source": [ 35 | "After the E-step, we obtain a lower bound on the log-likelihood denoted by:\n", 36 | "\n", 37 | "$$\n", 38 | "\\beta = \\sum_i^m \\sum_{z_i} Q_i(z_i) \\log \\frac{ p(x_i, z_i;\\theta)}{Q_i (z_i)}\n", 39 | "$$\n", 40 | "\n", 41 | "This lower bound $\\beta$ has been made tight by setting:\n", 42 | "\n", 43 | "$$\n", 44 | "Q_i(z_i) = p(z_i |x_i; \\theta) = \\frac{p(x_i, z_i; \\theta)}{p(x_i; \\theta)}\n", 45 | "$$\n", 46 | "\n", 47 | "For the M-step, we maximize $\\beta$ by taking the gradient with respect to $\\theta$ and setting it to zero.\n", 48 | "\n", 49 | "Suppose that EM hase converged, and that $\\theta = \\theta^*$.\n", 50 | "\n", 51 | "In this case:\n", 52 | "\n", 53 | "$$\n", 54 | "\\begin{align*}\n", 55 | "\\nabla_{\\theta} \\beta & = \\sum_i^m \\sum_{z_i} Q_i(z_i) \\nabla_{\\theta}\\log \\frac{ p(x_i, z_i;\\theta)}{Q_i (z_i)}_{| \\theta = \\theta^*} \\\\\n", 56 | "&= \\sum_i^m \\sum_{z_i} Q_i(z_i) \\frac{Q_i(z_i) }{p(x_i, z_i;\\theta^*) Q_i (z_i)} \\nabla_{\\theta} p(x_i, z_i;\\theta)_{| \\theta = \\theta^*} \\\\\n", 57 | "&= \\sum_i^m \\sum_{z_i} \\frac{p(x_i, z_i; \\theta^*)}{p(x_i; \\theta^*) p(x_i, z_i; \\theta^*)} \\nabla_{\\theta} p(x_i, z_i;\\theta^*)_{| \\theta = \\theta^*} \\\\\n", 58 | "&= \\sum_i^m \\sum_{z_i} \\frac{\\nabla_{\\theta} p(x_i, z_i;\\theta)_{| \\theta = \\theta^*} }{p(x_i; \\theta^*) } \\\\\n", 59 | "&= \\sum_i^m \\frac{\\nabla_{\\theta} p(x_i;\\theta)_{| \\theta = \\theta^*} }{p(x_i; \\theta^*) } \\\\\n", 60 | "&= \\sum_i^m \\nabla_{\\theta} \\log p(x_i;\\theta)_{| \\theta = \\theta^*} \\\\\n", 61 | "&= \\nabla_{\\theta} ( \\sum_i^m \\log p(x_i;\\theta) )_{| \\theta = \\theta^*}\\\\\n", 62 | "&= \\nabla_{\\theta} \\ell (\\theta)_{| \\theta = \\theta^*}\n", 63 | "\\end{align*}\n", 64 | "$$" 65 | ] 66 | } 67 | ], 68 | "metadata": { 69 | "colab": { 70 | "collapsed_sections": [], 71 | "name": "Bonjour, Colaboratory", 72 | "provenance": [], 73 | "version": "0.3.2" 74 | }, 75 | "kernelspec": { 76 | "display_name": "Python 2", 77 | "language": "python", 78 | "name": "python2" 79 | }, 80 | "language_info": { 81 | "codemirror_mode": { 82 | "name": "ipython", 83 | "version": 2 84 | }, 85 | "file_extension": ".py", 86 | "mimetype": "text/x-python", 87 | "name": "python", 88 | "nbconvert_exporter": "python", 89 | "pygments_lexer": "ipython2", 90 | "version": "2.7.15" 91 | } 92 | }, 93 | "nbformat": 4, 94 | "nbformat_minor": 1 95 | } 96 | -------------------------------------------------------------------------------- /Machine Learning/Problem4/4_Independent-Component-Analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": 
"markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# CS229: Problem Set 4\n", 8 | "## Problem 4: Independent Component Analysis\n", 9 | "\n", 10 | "\n", 11 | "**C. Combier**\n", 12 | "\n", 13 | "This iPython Notebook provides solutions to Stanford's CS229 (Machine Learning, Fall 2017) graduate course problem set 3, taught by Andrew Ng.\n", 14 | "\n", 15 | "The problem set can be found here: [./ps4.pdf](ps4.pdf)\n", 16 | "\n", 17 | "I chose to write the solutions to the coding questions in Python, whereas the Stanford class is taught with Matlab/Octave.\n", 18 | "\n", 19 | "## Notation\n", 20 | "\n", 21 | "- $x_i$ is the $i^{th}$ feature vector\n", 22 | "- $y_i$ is the expected outcome for the $i^{th}$ training example\n", 23 | "- $z_i$'s are the latent (hidden) variables\n", 24 | "- $m$ is the number of training examples\n", 25 | "- $n$ is the number of features\n", 26 | "\n", 27 | "For clarity, I've inlined the code of the provided helper function ```belsej.py```.\n", 28 | "\n", 29 | "## Dependencies\n", 30 | "\n", 31 | "I installed ```sounddevice``` to Anaconda with the following command:\n", 32 | "\n", 33 | "```conda install -c conda-forge python-sounddevice ```\n", 34 | "\n", 35 | "First, let's set up the environment and write helper functions:\n", 36 | "\n", 37 | "- ```normalize``` ensures all mixes have the same volume\n", 38 | "- ```load_data``` loads the mix\n", 39 | "- ```play``` plays the audio using ```sounddevice```" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 2, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "### Independent Components Analysis\n", 49 | "###\n", 50 | "### This program requires a working installation of:\n", 51 | "###\n", 52 | "### On Mac:\n", 53 | "### conda install -c conda-forge python-sounddevice\n", 54 | "###\n", 55 | "\n", 56 | "import sounddevice as sd\n", 57 | "import numpy as np\n", 58 | "\n", 59 | "Fs = 11025\n", 60 | "\n", 61 | "def normalize(dat):\n", 62 | " return 0.99 * dat / np.max(np.abs(dat))\n", 63 | "\n", 64 | "def load_data():\n", 65 | " mix = np.loadtxt('data/mix.dat')\n", 66 | " return mix\n", 67 | "\n", 68 | "def play(vec):\n", 69 | " sd.play(vec, Fs, blocking=True)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "Next we write a numerically stable sigmoid function, to avoid overflows:" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 1, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "# Numerically stable sigmoid\n", 86 | "def sigmoid(x):\n", 87 | " return np.where(x >= 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "The following functions calculates the weights to separate the independent components of the five mixes, using stochastic gradient descent and annealing to speed up convergence." 
95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 3, 100 | "metadata": {}, 101 | "outputs": [], 102 | "source": [ 103 | "def unmixer(X):\n", 104 | " M, N = X.shape\n", 105 | " W = np.eye(N)\n", 106 | "\n", 107 | " anneal = [0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01,\n", 108 | " 0.005, 0.005, 0.002, 0.002, 0.001, 0.001]\n", 109 | " print('Separating tracks ...')\n", 110 | " for alpha in anneal:\n", 111 | " for xi in X:\n", 112 | " W += alpha * (np.outer(1 - 2 * sigmoid(np.dot(W, xi.T)), xi) + np.linalg.inv(W.T))\n", 113 | " return W" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "Finally, this last function unmixes the 5 mixes to extract the independent components." 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 4, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "def unmix(X, W):\n", 130 | " S = np.zeros(X.shape)\n", 131 | " S = X.dot(W.T)\n", 132 | " return S" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "Now, we load the mix data:" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 8, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | "Playing mixed track 0\n", 152 | "Playing mixed track 1\n", 153 | "Playing mixed track 2\n", 154 | "Playing mixed track 3\n", 155 | "Playing mixed track 4\n" 156 | ] 157 | } 158 | ], 159 | "source": [ 160 | "X = normalize(load_data())\n", 161 | "for i in range(X.shape[1]):\n", 162 | " print('Playing mixed track %d' % i)\n", 163 | " play(X[:, i])" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "Next, we run Independent Component Analysis and separate the components in the mix:" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 7, 176 | "metadata": {}, 177 | "outputs": [ 178 | { 179 | "name": "stdout", 180 | "output_type": "stream", 181 | "text": [ 182 | "Separating tracks ...\n" 183 | ] 184 | } 185 | ], 186 | "source": [ 187 | "W = unmixer(X)\n", 188 | "S = normalize(unmix(X, W))" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "Finally, we play the separated components:" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 9, 201 | "metadata": {}, 202 | "outputs": [ 203 | { 204 | "name": "stdout", 205 | "output_type": "stream", 206 | "text": [ 207 | "Playing separated track 0\n", 208 | "Playing separated track 1\n", 209 | "Playing separated track 2\n", 210 | "Playing separated track 3\n", 211 | "Playing separated track 4\n" 212 | ] 213 | } 214 | ], 215 | "source": [ 216 | "for i in range(S.shape[1]):\n", 217 | " print('Playing separated track %d' % i)\n", 218 | " play(S[:, i])" 219 | ] 220 | } 221 | ], 222 | "metadata": { 223 | "colab": { 224 | "collapsed_sections": [], 225 | "name": "Bonjour, Colaboratory", 226 | "provenance": [], 227 | "version": "0.3.2" 228 | }, 229 | "kernelspec": { 230 | "display_name": "Python 2", 231 | "language": "python", 232 | "name": "python2" 233 | }, 234 | "language_info": { 235 | "codemirror_mode": { 236 | "name": "ipython", 237 | "version": 2 238 | }, 239 | "file_extension": ".py", 240 | "mimetype": "text/x-python", 241 | "name": "python", 242 | "nbconvert_exporter": "python", 243 | "pygments_lexer": "ipython2", 244 | "version": "2.7.15" 245 | } 246 | }, 247 | "nbformat": 4, 
248 | "nbformat_minor": 1 249 | } 250 | -------------------------------------------------------------------------------- /Machine Learning/Problem4/data/bellsej.py: -------------------------------------------------------------------------------- 1 | ### Independent Components Analysis 2 | ### 3 | ### This program requires a working installation of: 4 | ### 5 | ### On Mac: 6 | ### 1. portaudio: On Mac: brew install portaudio 7 | ### 2. sounddevice: pip install sounddevice 8 | ### 9 | ### On windows: 10 | ### pip install pyaudio sounddevice 11 | ### 12 | 13 | import sounddevice as sd 14 | import numpy as np 15 | 16 | Fs = 11025 17 | 18 | def normalize(dat): 19 | return 0.99 * dat / np.max(np.abs(dat)) 20 | 21 | def load_data(): 22 | mix = np.loadtxt('mix.dat') 23 | return mix 24 | 25 | def play(vec): 26 | sd.play(vec, Fs, blocking=True) 27 | 28 | def unmixer(X): 29 | M, N = X.shape 30 | W = np.eye(N) 31 | 32 | anneal = [0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01, 33 | 0.005, 0.005, 0.002, 0.002, 0.001, 0.001] 34 | print('Separating tracks ...') 35 | ######## Your code here ########## 36 | 37 | ################################### 38 | return W 39 | 40 | def unmix(X, W): 41 | S = np.zeros(X.shape) 42 | 43 | ######### Your code here ########## 44 | 45 | ################################## 46 | return S 47 | 48 | def main(): 49 | X = normalize(load_data()) 50 | 51 | for i in range(X.shape[1]): 52 | print('Playing mixed track %d' % i) 53 | play(X[:, i]) 54 | 55 | W = unmixer(X) 56 | S = normalize(unmix(X, W)) 57 | 58 | for i in range(S.shape[1]): 59 | print('Playing separated track %d' % i) 60 | play(S[:, i]) 61 | 62 | if __name__ == '__main__': 63 | main() 64 | -------------------------------------------------------------------------------- /Machine Learning/Problem4/data/cart_pole.py: -------------------------------------------------------------------------------- 1 | """ 2 | CS 229 Machine Learning, Fall 2017 3 | Problem Set 4 4 | Question: Reinforcement Learning: The inverted pendulum 5 | Author: Sanyam Mehra, sanyam@stanford.edu 6 | """ 7 | from __future__ import division, print_function 8 | from math import sin, cos, pi 9 | import matplotlib.pyplot as plt 10 | import matplotlib.patches as patches 11 | 12 | class CartPole: 13 | def __init__(self, physics): 14 | self.physics = physics 15 | self.mass_cart = 1.0 16 | self.mass_pole = 0.3 17 | self.mass = self.mass_cart + self.mass_pole 18 | self.length = 0.7 # actually half the pole length 19 | self.pole_mass_length = self.mass_pole * self.length 20 | 21 | def simulate(self, action, state_tuple): 22 | """ 23 | Simulation dynamics of the cart-pole system 24 | 25 | Parameters 26 | ---------- 27 | action : int 28 | Action represented as 0 or 1 29 | state_tuple : tuple 30 | Continuous vector of x, x_dot, theta, theta_dot 31 | 32 | Returns 33 | ------- 34 | new_state : tuple 35 | Updated state vector of new_x, new_x_dot, nwe_theta, new_theta_dot 36 | """ 37 | x, x_dot, theta, theta_dot = state_tuple 38 | costheta, sintheta = cos(theta), sin(theta) 39 | # costheta, sintheta = cos(theta * 180 / pi), sin(theta * 180 / pi) 40 | 41 | # calculate force based on action 42 | force = self.physics.force_mag if action > 0 else (-1 * self.physics.force_mag) 43 | 44 | # intermediate calculation 45 | temp = (force + self.pole_mass_length * theta_dot * theta_dot * sintheta) / self.mass 46 | theta_acc = (self.physics.gravity * sintheta - temp * costheta) / (self.length * (4/3 - self.mass_pole * costheta * costheta / self.mass)) 47 | 48 | x_acc = temp - 
self.pole_mass_length * theta_acc * costheta / self.mass 49 | 50 | # return new state variable using Euler's method 51 | new_x = x + self.physics.tau * x_dot 52 | new_x_dot = x_dot + self.physics.tau * x_acc 53 | new_theta = theta + self.physics.tau * theta_dot 54 | new_theta_dot = theta_dot + self.physics.tau * theta_acc 55 | new_state = (new_x, new_x_dot, new_theta, new_theta_dot) 56 | 57 | return new_state 58 | 59 | def get_state(self, state_tuple): 60 | """ 61 | Discretizes the continuous state vector. The current discretization 62 | divides x into 3, x_dot into 3, theta into 6 and theta_dot into 3 63 | categories. A finer discretization produces a larger state space 64 | but allows for a better policy 65 | 66 | Parameters 67 | ---------- 68 | state_tuple : tuple 69 | Continuous vector of x, x_dot, theta, theta_dot 70 | 71 | Returns 72 | ------- 73 | state : int 74 | Discretized state value 75 | """ 76 | x, x_dot, theta, theta_dot = state_tuple 77 | # parameters for state discretization in get_state 78 | # convert degrees to radians 79 | one_deg = pi / 180 80 | six_deg = 6 * pi / 180 81 | twelve_deg = 12 * pi / 180 82 | fifty_deg = 50 * pi / 180 83 | 84 | total_states = 163 85 | state = 0 86 | 87 | if x < -2.4 or x > 2.4 or theta < -twelve_deg or theta > twelve_deg: 88 | state = total_states - 1 # to signal failure 89 | else: 90 | # x: 3 categories 91 | if x < -1.5: 92 | state = 0 93 | elif x < 1.5: 94 | state = 1 95 | else: 96 | state = 2 97 | # x_dot: 3 categories 98 | if x_dot < -0.5: 99 | pass 100 | elif x_dot < 0.5: 101 | state += 3 102 | else: 103 | state += 6 104 | # theta: 6 categories 105 | if theta < -six_deg: 106 | pass 107 | elif theta < -one_deg: 108 | state += 9 109 | elif theta < 0: 110 | state += 18 111 | elif theta < one_deg: 112 | state += 27 113 | elif theta < six_deg: 114 | state += 36 115 | else: 116 | state += 45 117 | # theta_dot: 3 categories 118 | if theta_dot < -fifty_deg: 119 | pass 120 | elif theta_dot < fifty_deg: 121 | state += 54 122 | else: 123 | state += 108 124 | # state += 1 # converting from MATLAB 1-indexing to 0-indexing 125 | return state 126 | 127 | def show_cart(self, state_tuple, pause_time): 128 | """ 129 | Given the `state_tuple`, displays the cart-pole system. 
130 | 131 | Parameters 132 | ---------- 133 | state_tuple : tuple 134 | Continuous vector of x, x_dot, theta, theta_dot 135 | pause_time : float 136 | Time delay in seconds 137 | 138 | Returns 139 | ------- 140 | """ 141 | x, x_dot, theta, theta_dot = state_tuple 142 | X = [x, x + 4*self.length * sin(theta)] 143 | Y = [0, 4*self.length * cos(theta)] 144 | plt.close('all') 145 | fig, ax = plt.subplots(1) 146 | plt.ion() 147 | ax.set_xlim(-3, 3) 148 | ax.set_ylim(-0.5, 3.5) 149 | ax.plot(X, Y) 150 | cart = patches.Rectangle((x - 0.4, -0.25), 0.8, 0.25, 151 | linewidth=1, edgecolor='k', facecolor='cyan') 152 | base = patches.Rectangle((x - 0.01, -0.5), 0.02, 0.25, 153 | linewidth=1, edgecolor='k', facecolor='r') 154 | ax.add_patch(cart) 155 | ax.add_patch(base) 156 | x_dot_str, theta_str, theta_dot_str = '\\dot{x}', '\\theta', '\\dot{\\theta}' 157 | ax.set_title('x: %.3f, $%s$: %.3f, $%s$: %.3f, $%s$: %.3f'\ 158 | %(x, x_dot_str, x_dot, theta_str, theta, theta_dot_str, x)) 159 | plt.show() 160 | plt.pause(pause_time) 161 | 162 | class Physics: 163 | gravity = 9.8 164 | force_mag = 10.0 165 | tau = 0.02 # seconds between state updates 166 | -------------------------------------------------------------------------------- /Machine Learning/Problem4/data/mnist.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Machine Learning/Problem4/data/mnist.zip -------------------------------------------------------------------------------- /Machine Learning/Problem4/data/nn_starter.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | 4 | def readData(images_file, labels_file): 5 | x = np.loadtxt(images_file, delimiter=',') 6 | y = np.loadtxt(labels_file, delimiter=',') 7 | return x, y 8 | 9 | def softmax(x): 10 | """ 11 | Compute softmax function for input. 12 | Use tricks from previous assignment to avoid overflow 13 | """ 14 | ### YOUR CODE HERE 15 | 16 | ### END YOUR CODE 17 | return s 18 | 19 | def sigmoid(x): 20 | """ 21 | Compute the sigmoid function for the input here. 22 | """ 23 | ### YOUR CODE HERE 24 | 25 | ### END YOUR CODE 26 | return s 27 | 28 | def forward_prop(data, labels, params): 29 | """ 30 | return hidder layer, output(softmax) layer and loss 31 | """ 32 | W1 = params['W1'] 33 | b1 = params['b1'] 34 | W2 = params['W2'] 35 | b2 = params['b2'] 36 | 37 | ### YOUR CODE HERE 38 | 39 | ### END YOUR CODE 40 | return h, y, cost 41 | 42 | def backward_prop(data, labels, params): 43 | """ 44 | return gradient of parameters 45 | """ 46 | W1 = params['W1'] 47 | b1 = params['b1'] 48 | W2 = params['W2'] 49 | b2 = params['b2'] 50 | 51 | ### YOUR CODE HERE 52 | 53 | ### END YOUR CODE 54 | 55 | grad = {} 56 | grad['W1'] = gradW1 57 | grad['W2'] = gradW2 58 | grad['b1'] = gradb1 59 | grad['b2'] = gradb2 60 | 61 | return grad 62 | 63 | def nn_train(trainData, trainLabels, devData, devLabels): 64 | (m, n) = trainData.shape 65 | num_hidden = 300 66 | learning_rate = 5 67 | params = {} 68 | 69 | ### YOUR CODE HERE 70 | 71 | ### END YOUR CODE 72 | 73 | return params 74 | 75 | def nn_test(data, labels, params): 76 | h, output, cost = forward_prop(data, labels, params) 77 | accuracy = compute_accuracy(output, labels) 78 | return accuracy 79 | 80 | def compute_accuracy(output, labels): 81 | accuracy = (np.argmax(output,axis=1) == np.argmax(labels,axis=1)).sum() * 1. 
/ labels.shape[0] 82 | return accuracy 83 | 84 | def one_hot_labels(labels): 85 | one_hot_labels = np.zeros((labels.size, 10)) 86 | one_hot_labels[np.arange(labels.size),labels.astype(int)] = 1 87 | return one_hot_labels 88 | 89 | def main(): 90 | np.random.seed(100) 91 | trainData, trainLabels = readData('images_train.csv', 'labels_train.csv') 92 | trainLabels = one_hot_labels(trainLabels) 93 | p = np.random.permutation(60000) 94 | trainData = trainData[p,:] 95 | trainLabels = trainLabels[p,:] 96 | 97 | devData = trainData[0:10000,:] 98 | devLabels = trainLabels[0:10000,:] 99 | trainData = trainData[10000:,:] 100 | trainLabels = trainLabels[10000:,:] 101 | 102 | mean = np.mean(trainData) 103 | std = np.std(trainData) 104 | trainData = (trainData - mean) / std 105 | devData = (devData - mean) / std 106 | 107 | testData, testLabels = readData('images_test.csv', 'labels_test.csv') 108 | testLabels = one_hot_labels(testLabels) 109 | testData = (testData - mean) / std 110 | 111 | params = nn_train(trainData, trainLabels, devData, devLabels) 112 | 113 | 114 | readyForTesting = False 115 | if readyForTesting: 116 | accuracy = nn_test(testData, testLabels, params) 117 | print 'Test accuracy: %f' % accuracy 118 | 119 | if __name__ == '__main__': 120 | main() 121 | -------------------------------------------------------------------------------- /Machine Learning/Problem4/ps4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Machine Learning/Problem4/ps4.pdf -------------------------------------------------------------------------------- /Machine Learning/Readme.md: -------------------------------------------------------------------------------- 1 | 7 | 8 | # Machine Learning 9 | 10 | ## Course List 11 | **S.No** | **Course Title** | **Link to course** | **Link to Assignment Solutions** 12 | ------------ | ------------- | --------- | ----------- 13 | [1](#1-machine-learning) | Machine Learning | https://cs229.stanford.edu/ | [CS229 Assignment Solutions](https://github.com/huyfam/cs229-solutions-2020) 14 | 15 | 16 | 17 | ## Course Details 18 | ### 1. Machine Learning 19 | * **Link to course**            :     http://cs229.stanford.edu/ 20 | * **Offered By**                  :     Stanford 21 | * **Pre-Requisites**           :     Calculus, Linear Algebra, Basic Python programming,Probability Theory 22 | 23 | * **Level**                           :     Beginner 24 | * **Course description** 25 | This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs, practical advice); reinforcement learning and adaptive control. 
26 | 27 | 28 | 35 | 36 | 37 | #### Happy Learning   :thumbsup: :memo: 38 | 39 | 40 | 41 | 42 | 43 | -------------------------------------------------------------------------------- /NLP/README.md: -------------------------------------------------------------------------------- 1 | 6 | 7 | 8 | # Natural Language Processing 9 | 10 | ## Course List 11 | **S.No** | **Course Title** | **Link to course** | **Link to Assignment Solutions** 12 | ------------ | ------------- | --------- | ------------ 13 | [1](#1-natural-language-processing-with-deep-learning) | Natural Language Processing with Deep Learning | https://web.stanford.edu/class/cs224n/index.html | [CS224n Solutions](https://github.com/Brant-Skywalker/CS224n-Winter-2022) 14 | 15 | 16 | 17 | ## Course Details 18 | ### 1. Natural Language Processing with Deep Learning 19 | * **Link to course**            :     https://web.stanford.edu/class/cs224n/index.html 20 | * **Offered By**                  :     Stanford 21 | * **Pre-Requisites**           :     Calculus, Linear Algebra, Proficiency in Python,Probability , Statistics, Foundations of Machine Learning 22 | 23 | * **Level**                           :     Advanced 24 | * **Course description** 25 | In this course, students will gain a thorough introduction to cutting-edge research in Deep Learning for NLP. Through lectures, assignments and a final project, students will learn the necessary skills to design, implement, and understand their own neural network models, using the Pytorch framework. 26 | 27 | 28 | 35 | 36 | 37 | #### Happy Learning   :thumbsup: :memo: 38 | 39 | 40 | 41 | 42 | 43 | 44 | -------------------------------------------------------------------------------- /NLP/assignment1/Makefile: -------------------------------------------------------------------------------- 1 | DATASETS_DIR=utils/datasets 2 | 3 | init: 4 | sh get_datasets.sh 5 | 6 | submit: 7 | sh collect_submission.sh 8 | 9 | clean: 10 | rm -f assignment1.zip 11 | rm -rf ${DATASETS_DIR} 12 | rm -f *.pyc *.png *.npy utils/*.pyc 13 | 14 | -------------------------------------------------------------------------------- /NLP/assignment1/assignment1-solution.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment1/assignment1-solution.pdf -------------------------------------------------------------------------------- /NLP/assignment1/assignment1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment1/assignment1.pdf -------------------------------------------------------------------------------- /NLP/assignment1/collect_submission.sh: -------------------------------------------------------------------------------- 1 | rm -f assignment1.zip 2 | zip -r assignment1.zip *.py *.png saved_params_40000.npy 3 | -------------------------------------------------------------------------------- /NLP/assignment1/get_datasets.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DATASETS_DIR="utils/datasets" 4 | mkdir -p $DATASETS_DIR 5 | 6 | cd $DATASETS_DIR 7 | 8 | # Get Stanford Sentiment Treebank 9 | if hash wget 2>/dev/null; then 10 | wget http://nlp.stanford.edu/~socherr/stanfordSentimentTreebank.zip 11 | else 12 | curl -L http://nlp.stanford.edu/~socherr/stanfordSentimentTreebank.zip -o 
stanfordSentimentTreebank.zip 13 | fi 14 | unzip stanfordSentimentTreebank.zip 15 | rm stanfordSentimentTreebank.zip 16 | 17 | # Get 50D GloVe vectors 18 | if hash wget 2>/dev/null; then 19 | wget http://nlp.stanford.edu/data/glove.6B.zip 20 | else 21 | curl -L http://nlp.stanford.edu/data/glove.6B.zip -o glove.6B.zip 22 | fi 23 | unzip glove.6B.zip 24 | rm glove.6B.100d.txt glove.6B.200d.txt glove.6B.300d.txt glove.6B.zip 25 | -------------------------------------------------------------------------------- /NLP/assignment1/q1_softmax.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def softmax(x): 5 | """Compute the softmax function for each row of the input x. 6 | 7 | It is crucial that this function is optimized for speed because 8 | it will be used frequently in later code. You might find numpy 9 | functions np.exp, np.sum, np.reshape, np.max, and numpy 10 | broadcasting useful for this task. 11 | 12 | Numpy broadcasting documentation: 13 | http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html 14 | 15 | You should also make sure that your code works for a single 16 | D-dimensional vector (treat the vector as a single row) and 17 | for N x D matrices. This may be useful for testing later. Also, 18 | make sure that the dimensions of the output match the input. 19 | 20 | You must implement the optimization in problem 1(a) of the 21 | written assignment! 22 | 23 | Arguments: 24 | x -- A D dimensional vector or N x D dimensional numpy matrix. 25 | 26 | Return: 27 | x -- You are allowed to modify x in-place 28 | """ 29 | orig_shape = x.shape 30 | x-=np.max(x,axis=-1,keepdims=True) 31 | x_exp=np.exp(x) 32 | x=x_exp/np.sum(x_exp, axis=-1, keepdims=True) 33 | 34 | assert x.shape == orig_shape 35 | return x 36 | 37 | 38 | def test_softmax_basic(): 39 | """ 40 | Some simple tests to get you started. 41 | Warning: these are not exhaustive. 42 | """ 43 | print("Running basic tests...") 44 | test1 = softmax(np.array([1,2])) 45 | print(test1) 46 | ans1 = np.array([0.26894142, 0.73105858]) 47 | assert np.allclose(test1, ans1, rtol=1e-05, atol=1e-06) 48 | 49 | test2 = softmax(np.array([[1001,1002],[3,4]])) 50 | print(test2) 51 | ans2 = np.array([ 52 | [0.26894142, 0.73105858], 53 | [0.26894142, 0.73105858]]) 54 | assert np.allclose(test2, ans2, rtol=1e-05, atol=1e-06) 55 | 56 | test3 = softmax(np.array([[-1001,-1002]])) 57 | print(test3) 58 | ans3 = np.array([0.73105858, 0.26894142]) 59 | assert np.allclose(test3, ans3, rtol=1e-05, atol=1e-06) 60 | 61 | print("You should be able to verify these results by hand!\n") 62 | 63 | 64 | def test_softmax(): 65 | """ 66 | Use this space to test your softmax implementation by running: 67 | python q1_softmax.py 68 | This function will not be called by the autograder, nor will 69 | your tests be graded. 70 | """ 71 | print("Running your tests...") 72 | ### YOUR CODE HERE 73 | raise NotImplementedError 74 | ### END YOUR CODE 75 | 76 | 77 | if __name__ == "__main__": 78 | test_softmax_basic() 79 | # test_softmax() 80 | -------------------------------------------------------------------------------- /NLP/assignment1/q2_gradcheck.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import numpy as np 4 | import random 5 | 6 | 7 | # First implement a gradient checker by filling in the following functions 8 | def gradcheck_naive(f, x): 9 | """ Gradient check for a function f. 
10 | 11 | Arguments: 12 | f -- a function that takes a single argument and outputs the 13 | cost and its gradients 14 | x -- the point (numpy array) to check the gradient at 15 | """ 16 | 17 | rndstate = random.getstate() 18 | random.setstate(rndstate) 19 | fx, grad = f(x) # Evaluate function value at original point 20 | h = 1e-4 # Do not change this! 21 | 22 | # Iterate over all indexes ix in x to check the gradient. 23 | it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 24 | while not it.finished: 25 | ix = it.multi_index 26 | 27 | # Try modifying x[ix] with h defined above to compute numerical 28 | # gradients (numgrad). 29 | 30 | # Use the centered difference of the gradient. 31 | # It has smaller asymptotic error than forward / backward difference 32 | # methods. If you are curious, check out here: 33 | # https://math.stackexchange.com/questions/2326181/when-to-use-forward-or-central-difference-approximations 34 | 35 | # Make sure you call random.setstate(rndstate) 36 | # before calling f(x) each time. This will make it possible 37 | # to test cost functions with built in randomness later. 38 | 39 | ### YOUR CODE HERE: 40 | x[ix]+=h 41 | random.setstate(rndstate) 42 | fhn=f(x)[0] 43 | x[ix]-=2*h 44 | random.setstate(rndstate) 45 | fhp=f(x)[0] 46 | x[ix]+=h 47 | numgrad = (fhn-fhp)/(2*h) 48 | ### END YOUR CODE 49 | 50 | # Compare gradients 51 | reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix])) 52 | if reldiff > 1e-5: 53 | print("Gradient check failed.") 54 | print("First gradient error found at index %s" % str(ix)) 55 | print("Your gradient: %f \t Numerical gradient: %f" % ( 56 | grad[ix], numgrad)) 57 | return 58 | 59 | it.iternext() # Step to next dimension 60 | 61 | print("Gradient check passed!") 62 | 63 | 64 | def sanity_check(): 65 | """ 66 | Some basic sanity checks. 67 | """ 68 | quad = lambda x: (np.sum(x ** 2), x * 2) 69 | 70 | print("Running sanity checks...") 71 | gradcheck_naive(quad, np.array(123.456)) # scalar test 72 | gradcheck_naive(quad, np.random.randn(3,)) # 1-D test 73 | gradcheck_naive(quad, np.random.randn(4,5)) # 2-D test 74 | print("") 75 | 76 | 77 | def your_sanity_checks(): 78 | """ 79 | Use this space add any additional sanity checks by running: 80 | python q2_gradcheck.py 81 | This function will not be called by the autograder, nor will 82 | your additional tests be graded. 83 | """ 84 | print("Running your sanity checks...") 85 | ### YOUR CODE HERE 86 | raise NotImplementedError 87 | ### END YOUR CODE 88 | 89 | 90 | if __name__ == "__main__": 91 | sanity_check() 92 | # your_sanity_checks() 93 | -------------------------------------------------------------------------------- /NLP/assignment1/q2_neural.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import numpy as np 4 | import random 5 | 6 | from q1_softmax import softmax 7 | from q2_sigmoid import sigmoid, sigmoid_grad 8 | from q2_gradcheck import gradcheck_naive 9 | 10 | 11 | def forward_backward_prop(X, labels, params, dimensions): 12 | """ 13 | Forward and backward propagation for a two-layer sigmoidal network 14 | 15 | Compute the forward propagation and for the cross entropy cost, 16 | the backward propagation for the gradients for all parameters. 17 | 18 | Notice the gradients computed here are different from the gradients in 19 | the assignment sheet: they are w.r.t. weights, not inputs. 20 | 21 | Arguments: 22 | X -- M x Dx matrix, where each row is a training example x. 
23 | labels -- M x Dy matrix, where each row is a one-hot vector. 24 | params -- Model parameters, these are unpacked for you. 25 | dimensions -- A tuple of input dimension, number of hidden units 26 | and output dimension 27 | """ 28 | 29 | ### Unpack network parameters (do not modify) 30 | ofs = 0 31 | Dx, H, Dy = (dimensions[0], dimensions[1], dimensions[2]) 32 | 33 | W1 = np.reshape(params[ofs:ofs+ Dx * H], (Dx, H)) 34 | ofs += Dx * H 35 | b1 = np.reshape(params[ofs:ofs + H], (1, H)) 36 | ofs += H 37 | W2 = np.reshape(params[ofs:ofs + H * Dy], (H, Dy)) 38 | ofs += H * Dy 39 | b2 = np.reshape(params[ofs:ofs + Dy], (1, Dy)) 40 | 41 | # Note: compute cost based on `sum` not `mean`. 42 | ### YOUR CODE HERE: forward propagation 43 | fc1=X.dot(W1)+b1 # [M,H] 44 | sig1=sigmoid(fc1) # [M,H] 45 | scores=sig1.dot(W2)+b2 # [M,Dy] 46 | shifted_scores = scores - np.max(scores,axis=-1,keepdims=True) # [M,Dy] 47 | z = np.exp(shifted_scores).sum(axis=-1, keepdims=True) # [M,1] 48 | log_porbs = shifted_scores - np.log(z) 49 | cost = -1*(log_porbs*labels).sum() 50 | ### END YOUR CODE 51 | 52 | ### YOUR CODE HERE: backward propagation 53 | dout=np.exp(log_porbs) 54 | dout[labels==1]-=1 55 | gradW2=sig1.T.dot(dout) 56 | gradb2=dout.sum(axis=0) 57 | dsig1=dout.dot(W2.T) 58 | dfc1=sigmoid_grad(sig1)*dsig1 59 | gradW1=X.T.dot(dfc1) 60 | gradb1=dfc1.sum(axis=0) 61 | ### END YOUR CODE 62 | 63 | ### Stack gradients (do not modify) 64 | grad = np.concatenate((gradW1.flatten(), gradb1.flatten(), 65 | gradW2.flatten(), gradb2.flatten())) 66 | 67 | return cost, grad 68 | 69 | 70 | def sanity_check(): 71 | """ 72 | Set up fake data and parameters for the neural network, and test using 73 | gradcheck. 74 | """ 75 | print("Running sanity check...") 76 | 77 | N = 20 78 | dimensions = [10, 5, 10] 79 | data = np.random.randn(N, dimensions[0]) # each row will be a datum 80 | labels = np.zeros((N, dimensions[2])) 81 | for i in range(N): 82 | labels[i, random.randint(0,dimensions[2]-1)] = 1 83 | 84 | params = np.random.randn((dimensions[0] + 1) * dimensions[1] + ( 85 | dimensions[1] + 1) * dimensions[2], ) 86 | 87 | gradcheck_naive(lambda params: 88 | forward_backward_prop(data, labels, params, dimensions), params) 89 | 90 | 91 | def your_sanity_checks(): 92 | """ 93 | Use this space add any additional sanity checks by running: 94 | python q2_neural.py 95 | This function will not be called by the autograder, nor will 96 | your additional tests be graded. 97 | """ 98 | print("Running your sanity checks...") 99 | ### YOUR CODE HERE 100 | raise NotImplementedError 101 | ### END YOUR CODE 102 | 103 | 104 | if __name__ == "__main__": 105 | sanity_check() 106 | # your_sanity_checks() 107 | -------------------------------------------------------------------------------- /NLP/assignment1/q2_sigmoid.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import numpy as np 4 | 5 | 6 | def sigmoid(x): 7 | """ 8 | Compute the sigmoid function for the input here. 9 | 10 | Arguments: 11 | x -- A scalar or numpy array. 
12 | 13 | Return: 14 | s -- sigmoid(x) 15 | """ 16 | 17 | ### YOUR CODE HERE 18 | if isinstance(x,np.ndarray): 19 | mask_pos = (x>0) 20 | mask_neg = (x<=0) 21 | pos = np.zeros_like(x, dtype=float) 22 | neg = np.zeros_like(x, dtype=float) 23 | pos[mask_pos] = np.exp(-x[mask_pos]) 24 | neg[mask_neg] = np.exp(x[mask_neg]) 25 | numerator=np.ones_like(pos) 26 | numerator[mask_neg]=neg[mask_neg] 27 | denumerator=pos+neg 28 | s=numerator/(1+denumerator) 29 | else: 30 | if x < 0: 31 | s = np.exp(x)/(1+np.exp(x)) 32 | else: 33 | s = 1./(1+np.exp(-x)) 34 | ### END YOUR CODE 35 | 36 | return s 37 | 38 | 39 | def sigmoid_grad(s): 40 | """ 41 | Compute the gradient for the sigmoid function here. Note that 42 | for this implementation, the input s should be the sigmoid 43 | function value of your original input x. 44 | 45 | Arguments: 46 | s -- A scalar or numpy array. 47 | 48 | Return: 49 | ds -- Your computed gradient. 50 | """ 51 | 52 | ### YOUR CODE HERE 53 | ds = s*(1-s) 54 | ### END YOUR CODE 55 | 56 | return ds 57 | 58 | 59 | def test_sigmoid_basic(): 60 | """ 61 | Some simple tests to get you started. 62 | Warning: these are not exhaustive. 63 | """ 64 | print("Running basic tests...") 65 | x = np.array([[1, 2], [-1, -2]]) 66 | f = sigmoid(x) 67 | g = sigmoid_grad(f) 68 | print(f) 69 | f_ans = np.array([ 70 | [0.73105858, 0.88079708], 71 | [0.26894142, 0.11920292]]) 72 | assert np.allclose(f, f_ans, rtol=1e-05, atol=1e-06) 73 | print(g) 74 | g_ans = np.array([ 75 | [0.19661193, 0.10499359], 76 | [0.19661193, 0.10499359]]) 77 | assert np.allclose(g, g_ans, rtol=1e-05, atol=1e-06) 78 | print("You should verify these results by hand!\n") 79 | 80 | 81 | def test_sigmoid(): 82 | """ 83 | Use this space to test your sigmoid implementation by running: 84 | python q2_sigmoid.py 85 | This function will not be called by the autograder, nor will 86 | your tests be graded. 87 | """ 88 | print("Running your tests...") 89 | ### YOUR CODE HERE 90 | raise NotImplementedError 91 | ### END YOUR CODE 92 | 93 | 94 | if __name__ == "__main__": 95 | test_sigmoid_basic() 96 | # test_sigmoid() 97 | -------------------------------------------------------------------------------- /NLP/assignment1/q3_run.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import random 4 | import numpy as np 5 | from utils.treebank import StanfordSentiment 6 | import matplotlib 7 | matplotlib.use('agg') 8 | import matplotlib.pyplot as plt 9 | import time 10 | 11 | from q3_word2vec import * 12 | from q3_sgd import * 13 | 14 | # Reset the random seed to make sure that everyone gets the same results 15 | random.seed(314) 16 | dataset = StanfordSentiment() 17 | tokens = dataset.tokens() 18 | nWords = len(tokens) 19 | 20 | # We are going to train 10-dimensional vectors for this assignment 21 | dimVectors = 10 22 | 23 | # Context size 24 | C = 5 25 | 26 | # Reset the random seed to make sure that everyone gets the same results 27 | random.seed(31415) 28 | np.random.seed(9265) 29 | 30 | startTime=time.time() 31 | wordVectors = np.concatenate( 32 | ((np.random.rand(nWords, dimVectors) - 0.5) / 33 | dimVectors, np.zeros((nWords, dimVectors))), 34 | axis=0) 35 | wordVectors = sgd( 36 | lambda vec: word2vec_sgd_wrapper(skipgram, tokens, vec, dataset, C, 37 | negSamplingCostAndGradient), 38 | wordVectors, 0.3, 40000, None, True, PRINT_EVERY=10) 39 | # Note that normalization is not called here. This is not a bug, 40 | # normalizing during training loses the notion of length.
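# Illustrative sketch (an addition, not part of the original starter code): if
# unit-length vectors were wanted after training, a row-wise normalization
# could be applied once at the end. The helper name below is hypothetical and
# is shown only to make the note above concrete.
#
# def normalize_rows(x):
#     # divide each row by its L2 norm so every word vector has length 1
#     return x / np.linalg.norm(x, axis=1, keepdims=True)
#
# wordVectors = normalize_rows(wordVectors)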
41 | 42 | print("sanity check: cost at convergence should be around or below 10") 43 | print("training took %d seconds" % (time.time() - startTime)) 44 | 45 | # concatenate the input and output word vectors 46 | wordVectors = np.concatenate( 47 | (wordVectors[:nWords,:], wordVectors[nWords:,:]), 48 | axis=0) 49 | # wordVectors = wordVectors[:nWords,:] + wordVectors[nWords:,:] 50 | 51 | visualizeWords = [ 52 | "the", "a", "an", ",", ".", "?", "!", "``", "''", "--", 53 | "good", "great", "cool", "brilliant", "wonderful", "well", "amazing", 54 | "worth", "sweet", "enjoyable", "boring", "bad", "waste", "dumb", 55 | "annoying"] 56 | 57 | visualizeIdx = [tokens[word] for word in visualizeWords] 58 | visualizeVecs = wordVectors[visualizeIdx, :] 59 | temp = (visualizeVecs - np.mean(visualizeVecs, axis=0)) 60 | covariance = 1.0 / len(visualizeIdx) * temp.T.dot(temp) 61 | U,S,V = np.linalg.svd(covariance) 62 | coord = temp.dot(U[:,0:2]) 63 | 64 | for i in range(len(visualizeWords)): 65 | plt.text(coord[i,0], coord[i,1], visualizeWords[i], 66 | bbox=dict(facecolor='green', alpha=0.1)) 67 | 68 | plt.xlim((np.min(coord[:,0]), np.max(coord[:,0]))) 69 | plt.ylim((np.min(coord[:,1]), np.max(coord[:,1]))) 70 | 71 | plt.savefig('q3_word_vectors.png') 72 | -------------------------------------------------------------------------------- /NLP/assignment1/q3_sgd.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Save parameters every a few SGD iterations as fail-safe 4 | SAVE_PARAMS_EVERY = 5000 5 | 6 | import glob 7 | import random 8 | import numpy as np 9 | import os.path as op 10 | import pickle 11 | 12 | 13 | def load_saved_params(): 14 | """ 15 | A helper function that loads previously saved parameters and resets 16 | iteration start. 17 | """ 18 | st = 0 19 | for f in glob.glob("saved_params_*.npy"): 20 | iter = int(op.splitext(op.basename(f))[0].split("_")[2]) 21 | if (iter > st): 22 | st = iter 23 | 24 | if st > 0: 25 | with open("saved_params_%d.npy" % st, "rb") as f: 26 | params = pickle.load(f) 27 | state = pickle.load(f) 28 | return st, params, state 29 | else: 30 | return st, None, None 31 | 32 | 33 | def save_params(iter, params): 34 | with open("saved_params_%d.npy" % iter, "wb") as f: 35 | pickle.dump(params, f) 36 | pickle.dump(random.getstate(), f) 37 | 38 | 39 | def sgd(f, x0, step, iterations, postprocessing=None, useSaved=False, 40 | PRINT_EVERY=10): 41 | """ Stochastic Gradient Descent 42 | 43 | Implement the stochastic gradient descent method in this function. 44 | 45 | Arguments: 46 | f -- the function to optimize, it should take a single 47 | argument and yield two outputs, a cost and the gradient 48 | with respect to the arguments 49 | x0 -- the initial point to start SGD from 50 | step -- the step size for SGD 51 | iterations -- total iterations to run SGD for 52 | postprocessing -- postprocessing function for the parameters 53 | if necessary. In the case of word2vec we will need to 54 | normalize the word vectors to have unit length. 
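The postprocessing hook mentioned above is any callable applied to the parameters after each update. A hedged sketch of a row-normalization helper that could be passed in for the word2vec case; the function name and epsilon are illustrative, not taken from the assignment files:

import numpy as np

def normalize_rows(x):
    # scale every row of x to unit L2 length; the small epsilon guards all-zero rows
    return x / (np.sqrt((x ** 2).sum(axis=1, keepdims=True)) + 1e-30)

# e.g. wordVectors = sgd(f, wordVectors, 0.3, 40000, postprocessing=normalize_rows)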
55 | PRINT_EVERY -- specifies how many iterations to output loss 56 | 57 | Return: 58 | x -- the parameter value after SGD finishes 59 | """ 60 | 61 | # Anneal learning rate every several iterations 62 | ANNEAL_EVERY = 20000 63 | 64 | if useSaved: 65 | start_iter, oldx, state = load_saved_params() 66 | if start_iter > 0: 67 | x0 = oldx 68 | step *= 0.5 ** (start_iter / ANNEAL_EVERY) 69 | 70 | if state: 71 | random.setstate(state) 72 | else: 73 | start_iter = 0 74 | 75 | x = x0 76 | 77 | if not postprocessing: 78 | postprocessing = lambda x: x 79 | 80 | expcost = None 81 | 82 | for iter in range(start_iter + 1, iterations + 1): 83 | # Don't forget to apply the postprocessing after every iteration! 84 | # You might want to print the progress every few iterations. 85 | 86 | cost = None 87 | ### YOUR CODE HERE 88 | cost,grad = f(x) 89 | x-=step*grad 90 | x=postprocessing(x) 91 | ### END YOUR CODE 92 | 93 | if iter % PRINT_EVERY == 0: 94 | if not expcost: 95 | expcost = cost 96 | else: 97 | expcost = .95 * expcost + .05 * cost 98 | print("iter %d: %f" % (iter, expcost)) 99 | 100 | if iter % SAVE_PARAMS_EVERY == 0 and useSaved: 101 | save_params(iter, x) 102 | 103 | if iter % ANNEAL_EVERY == 0: 104 | step *= 0.5 105 | 106 | return x 107 | 108 | 109 | def sanity_check(): 110 | quad = lambda x: (np.sum(x ** 2), x * 2) 111 | 112 | print("Running sanity checks...") 113 | t1 = sgd(quad, 0.5, 0.01, 1000, PRINT_EVERY=100) 114 | print("test 1 result:", t1) 115 | assert abs(t1) <= 1e-6 116 | 117 | t2 = sgd(quad, 0.0, 0.01, 1000, PRINT_EVERY=100) 118 | print("test 2 result:", t2) 119 | assert abs(t2) <= 1e-6 120 | 121 | t3 = sgd(quad, -1.5, 0.01, 1000, PRINT_EVERY=100) 122 | print("test 3 result:", t3) 123 | assert abs(t3) <= 1e-6 124 | 125 | print("") 126 | 127 | 128 | def your_sanity_checks(): 129 | """ 130 | Use this space add any additional sanity checks by running: 131 | python q3_sgd.py 132 | This function will not be called by the autograder, nor will 133 | your additional tests be graded. 
134 | """ 135 | print("Running your sanity checks...") 136 | ### YOUR CODE HERE 137 | # raise NotImplementedError 138 | ### END YOUR CODE 139 | 140 | 141 | if __name__ == "__main__": 142 | sanity_check() 143 | # your_sanity_checks() 144 | -------------------------------------------------------------------------------- /NLP/assignment1/q3_word_vectors.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment1/q3_word_vectors.png -------------------------------------------------------------------------------- /NLP/assignment1/q4_dev_conf.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment1/q4_dev_conf.png -------------------------------------------------------------------------------- /NLP/assignment1/q4_reg_v_acc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment1/q4_reg_v_acc.png -------------------------------------------------------------------------------- /NLP/assignment1/q4_sentiment.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import argparse 4 | import numpy as np 5 | import matplotlib 6 | matplotlib.use('agg') 7 | import matplotlib.pyplot as plt 8 | import itertools 9 | 10 | from utils.treebank import StanfordSentiment 11 | import utils.glove as glove 12 | 13 | from q3_sgd import load_saved_params, sgd 14 | 15 | # We will use sklearn here because it will run faster than implementing 16 | # ourselves. However, for other parts of this assignment you must implement 17 | # the functions yourself! 18 | from sklearn.linear_model import LogisticRegression 19 | from sklearn.metrics import confusion_matrix 20 | 21 | 22 | def getArguments(): 23 | parser = argparse.ArgumentParser() 24 | group = parser.add_mutually_exclusive_group(required=True) 25 | group.add_argument("--pretrained", dest="pretrained", action="store_true", 26 | help="Use pretrained GloVe vectors.") 27 | group.add_argument("--yourvectors", dest="yourvectors", action="store_true", 28 | help="Use your vectors from q3.") 29 | return parser.parse_args() 30 | 31 | 32 | def getSentenceFeatures(tokens, wordVectors, sentence): 33 | """ 34 | Obtain the sentence feature for sentiment analysis by averaging its 35 | word vectors 36 | """ 37 | 38 | # Implement computation for the sentence features given a sentence. 39 | 40 | # Inputs: 41 | # tokens -- a dictionary that maps words to their indices in 42 | # the word vector list 43 | # wordVectors -- word vectors (each row) for all tokens 44 | # sentence -- a list of words in the sentence of interest 45 | 46 | # Output: 47 | # - sentVector: feature vector for the sentence 48 | 49 | sentVector = np.zeros((wordVectors.shape[1],)) 50 | 51 | ### YOUR CODE HERE 52 | sentVector = np.sum(np.vstack([wordVectors[tokens[w]] for w in sentence]),axis=0)/len(sentence) 53 | ### END YOUR CODE 54 | 55 | assert sentVector.shape == (wordVectors.shape[1],) 56 | return sentVector 57 | 58 | 59 | def getRegularizationValues(): 60 | """Try different regularizations 61 | 62 | Return a sorted list of values to try. 
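For intuition, getSentenceFeatures above just averages the word vectors of the sentence. A toy illustration with a hypothetical four-word vocabulary and made-up 3-dimensional vectors, assuming the function defined earlier in this file:

import numpy as np

tokens = {"the": 0, "movie": 1, "was": 2, "great": 3}     # hypothetical vocabulary
wordVectors = np.arange(12, dtype=float).reshape(4, 3)    # hypothetical 3-d vectors

feat = getSentenceFeatures(tokens, wordVectors, ["the", "movie", "was", "great"])
assert np.allclose(feat, wordVectors.mean(axis=0))         # the feature is just the mean vector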
63 | """ 64 | values = None # Assign a list of floats in the block below 65 | ### YOUR CODE HERE 66 | values = np.logspace(-6, 0, num=10) 67 | ### END YOUR CODE 68 | return sorted(values) 69 | 70 | 71 | def chooseBestModel(results): 72 | """Choose the best model based on dev set performance. 73 | 74 | Arguments: 75 | results -- A list of python dictionaries of the following format: 76 | { 77 | "reg": regularization, 78 | "clf": classifier, 79 | "train": trainAccuracy, 80 | "dev": devAccuracy, 81 | "test": testAccuracy 82 | } 83 | 84 | Each dictionary represents the performance of one model. 85 | 86 | Returns: 87 | Your chosen result dictionary. 88 | """ 89 | bestResult = None 90 | 91 | ### YOUR CODE HERE 92 | bestResult = max(results, key=lambda x: x['dev']) 93 | ### END YOUR CODE 94 | 95 | return bestResult 96 | 97 | 98 | def accuracy(y, yhat): 99 | """ Precision for classifier """ 100 | assert(y.shape == yhat.shape) 101 | return np.sum(y == yhat) * 100.0 / y.size 102 | 103 | 104 | def plotRegVsAccuracy(regValues, results, filename): 105 | """ Make a plot of regularization vs accuracy """ 106 | plt.plot(regValues, [x["train"] for x in results]) 107 | plt.plot(regValues, [x["dev"] for x in results]) 108 | plt.xscale('log') 109 | plt.xlabel("regularization") 110 | plt.ylabel("accuracy") 111 | plt.legend(['train', 'dev'], loc='upper left') 112 | plt.savefig(filename) 113 | 114 | 115 | def outputConfusionMatrix(features, labels, clf, filename): 116 | """ Generate a confusion matrix """ 117 | pred = clf.predict(features) 118 | cm = confusion_matrix(labels, pred, labels=range(5)) 119 | plt.figure() 120 | plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Reds) 121 | plt.colorbar() 122 | classes = ["- -", "-", "neut", "+", "+ +"] 123 | tick_marks = np.arange(len(classes)) 124 | plt.xticks(tick_marks, classes) 125 | plt.yticks(tick_marks, classes) 126 | thresh = cm.max() / 2. 
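chooseBestModel above simply selects the entry with the highest dev-set accuracy. A quick toy check with hypothetical numbers:

results = [
    {"reg": 1e-4, "clf": None, "train": 39.9, "dev": 36.5, "test": 37.0},
    {"reg": 1e-2, "clf": None, "train": 39.2, "dev": 37.1, "test": 37.6},
    {"reg": 1e+0, "clf": None, "train": 35.0, "dev": 34.8, "test": 35.2},
]
assert chooseBestModel(results)["reg"] == 1e-2   # the row with the highest dev accuracy wins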
127 | for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])): 128 | plt.text(j, i, cm[i, j], 129 | horizontalalignment="center", 130 | color="white" if cm[i, j] > thresh else "black") 131 | plt.tight_layout() 132 | plt.ylabel('True label') 133 | plt.xlabel('Predicted label') 134 | plt.savefig(filename) 135 | 136 | 137 | def outputPredictions(dataset, features, labels, clf, filename): 138 | """ Write the predictions to file """ 139 | pred = clf.predict(features) 140 | with open(filename, "w") as f: 141 | print(f, "True\tPredicted\tText") 142 | for i in range(len(dataset)): 143 | print(f, "%d\t%d\t%s" % ( 144 | labels[i], pred[i], " ".join(dataset[i][0]))) 145 | 146 | 147 | def main(args): 148 | """ Train a model to do sentiment analyis""" 149 | 150 | # Load the dataset 151 | dataset = StanfordSentiment() 152 | tokens = dataset.tokens() 153 | nWords = len(tokens) 154 | 155 | if args.yourvectors: 156 | _, wordVectors, _ = load_saved_params() 157 | wordVectors = np.concatenate( 158 | (wordVectors[:nWords,:], wordVectors[nWords:,:]), 159 | axis=1) 160 | elif args.pretrained: 161 | wordVectors = glove.loadWordVectors(tokens) 162 | dimVectors = wordVectors.shape[1] 163 | 164 | # Load the train set 165 | trainset = dataset.getTrainSentences() 166 | nTrain = len(trainset) 167 | trainFeatures = np.zeros((nTrain, dimVectors)) 168 | trainLabels = np.zeros((nTrain,), dtype=np.int32) 169 | for i in range(nTrain): 170 | words, trainLabels[i] = trainset[i] 171 | trainFeatures[i, :] = getSentenceFeatures(tokens, wordVectors, words) 172 | 173 | # Prepare dev set features 174 | devset = dataset.getDevSentences() 175 | nDev = len(devset) 176 | devFeatures = np.zeros((nDev, dimVectors)) 177 | devLabels = np.zeros((nDev,), dtype=np.int32) 178 | for i in range(nDev): 179 | words, devLabels[i] = devset[i] 180 | devFeatures[i, :] = getSentenceFeatures(tokens, wordVectors, words) 181 | 182 | # Prepare test set features 183 | testset = dataset.getTestSentences() 184 | nTest = len(testset) 185 | testFeatures = np.zeros((nTest, dimVectors)) 186 | testLabels = np.zeros((nTest,), dtype=np.int32) 187 | for i in range(nTest): 188 | words, testLabels[i] = testset[i] 189 | testFeatures[i, :] = getSentenceFeatures(tokens, wordVectors, words) 190 | 191 | # We will save our results from each run 192 | results = [] 193 | regValues = getRegularizationValues() 194 | for reg in regValues: 195 | print("Training for reg=%f" % reg) 196 | # Note: add a very small number to regularization to please the library 197 | clf = LogisticRegression(C=1.0/(reg + 1e-12)) 198 | clf.fit(trainFeatures, trainLabels) 199 | 200 | # Test on train set 201 | pred = clf.predict(trainFeatures) 202 | trainAccuracy = accuracy(trainLabels, pred) 203 | print("Train accuracy (%%): %f" % trainAccuracy) 204 | 205 | # Test on dev set 206 | pred = clf.predict(devFeatures) 207 | devAccuracy = accuracy(devLabels, pred) 208 | print("Dev accuracy (%%): %f" % devAccuracy) 209 | 210 | # Test on test set 211 | # Note: always running on test is poor style. Typically, you should 212 | # do this only after validation. 
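The training loop above maps each regularization strength reg to sklearn's C parameter because LogisticRegression takes the inverse of the regularization strength (C is roughly 1/reg; the 1e-12 only avoids division by zero). A small illustrative sketch, separate from the assignment code, showing that a larger reg (smaller C) shrinks the learned weights:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.randn(200, 5)
y = (X[:, 0] > 0).astype(int)
weak = LogisticRegression(C=100.0).fit(X, y)    # C = 1/reg, so a tiny penalty
strong = LogisticRegression(C=0.01).fit(X, y)   # a large reg shrinks the weights
assert np.abs(strong.coef_).sum() < np.abs(weak.coef_).sum()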
213 | pred = clf.predict(testFeatures) 214 | testAccuracy = accuracy(testLabels, pred) 215 | print("Test accuracy (%%): %f" % testAccuracy) 216 | 217 | results.append({ 218 | "reg": reg, 219 | "clf": clf, 220 | "train": trainAccuracy, 221 | "dev": devAccuracy, 222 | "test": testAccuracy}) 223 | 224 | # Print the accuracies 225 | print("") 226 | print("=== Recap ===") 227 | print("Reg\t\tTrain\tDev\tTest") 228 | for result in results: 229 | print("%.2E\t%.3f\t%.3f\t%.3f" % ( 230 | result["reg"], 231 | result["train"], 232 | result["dev"], 233 | result["test"])) 234 | print("") 235 | 236 | bestResult = chooseBestModel(results) 237 | print("Best regularization value: %0.2E" % bestResult["reg"]) 238 | print("Test accuracy (%%): %f" % bestResult["test"]) 239 | 240 | # do some error analysis 241 | if args.pretrained: 242 | plotRegVsAccuracy(regValues, results, "q4_reg_v_acc.png") 243 | outputConfusionMatrix(devFeatures, devLabels, bestResult["clf"], 244 | "q4_dev_conf.png") 245 | outputPredictions(devset, devFeatures, devLabels, bestResult["clf"], 246 | "q4_dev_pred.txt") 247 | 248 | 249 | if __name__ == "__main__": 250 | main(getArguments()) 251 | -------------------------------------------------------------------------------- /NLP/assignment1/requirements.txt: -------------------------------------------------------------------------------- 1 | matplotlib 2 | scipy 3 | numpy 4 | sklearn 5 | -------------------------------------------------------------------------------- /NLP/assignment1/utils/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment1/utils/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /NLP/assignment1/utils/__pycache__/glove.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment1/utils/__pycache__/glove.cpython-36.pyc -------------------------------------------------------------------------------- /NLP/assignment1/utils/__pycache__/treebank.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment1/utils/__pycache__/treebank.cpython-36.pyc -------------------------------------------------------------------------------- /NLP/assignment1/utils/glove.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | 4 | DEFAULT_FILE_PATH = os.path.join(os.path.dirname(__file__),"datasets/glove.6B.50d.txt") 5 | 6 | def loadWordVectors(tokens, filepath=DEFAULT_FILE_PATH, dimensions=50): 7 | """Read pretrained GloVe vectors""" 8 | wordVectors = np.zeros((len(tokens), dimensions)) 9 | with open(filepath) as ifs: 10 | for line in ifs: 11 | line = line.strip() 12 | if not line: 13 | continue 14 | row = line.split() 15 | token = row[0] 16 | if token not in tokens: 17 | continue 18 | data = [float(x) for x in row[1:]] 19 | if len(data) != dimensions: 20 | raise RuntimeError("wrong number of dimensions") 21 | wordVectors[tokens[token]] = np.asarray(data) 22 | return wordVectors 23 | -------------------------------------------------------------------------------- /NLP/assignment1/utils/treebank.py: 
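Returning to loadWordVectors in utils/glove.py above: it expects tokens to map each word to its row index in the returned matrix, and it assumes the default glove.6B.50d.txt file has been downloaded by get_datasets.sh. A hedged usage sketch with a hypothetical word list:

tokens = {"the": 0, "movie": 1, "zzz-unseen-zzz": 2}   # hypothetical word -> row index map
vecs = loadWordVectors(tokens)                         # shape (3, 50) with the default 50-d file
# rows for words absent from the GloVe file (index 2 here) simply remain all zeros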
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import pickle 5 | import numpy as np 6 | import os 7 | import random 8 | 9 | class StanfordSentiment: 10 | def __init__(self, path=None, tablesize = 1000000): 11 | if not path: 12 | path = os.path.join(os.path.dirname(__file__),"datasets/stanfordSentimentTreebank") 13 | 14 | self.path = path 15 | self.tablesize = tablesize 16 | 17 | def tokens(self): 18 | if hasattr(self, "_tokens") and self._tokens: 19 | return self._tokens 20 | 21 | tokens = dict() 22 | tokenfreq = dict() 23 | wordcount = 0 24 | revtokens = [] 25 | idx = 0 26 | 27 | for sentence in self.sentences(): 28 | for w in sentence: 29 | wordcount += 1 30 | if not w in tokens: 31 | tokens[w] = idx 32 | revtokens += [w] 33 | tokenfreq[w] = 1 34 | idx += 1 35 | else: 36 | tokenfreq[w] += 1 37 | 38 | tokens["UNK"] = idx 39 | revtokens += ["UNK"] 40 | tokenfreq["UNK"] = 1 41 | wordcount += 1 42 | 43 | self._tokens = tokens 44 | self._tokenfreq = tokenfreq 45 | self._wordcount = wordcount 46 | self._revtokens = revtokens 47 | return self._tokens 48 | 49 | def sentences(self): 50 | if hasattr(self, "_sentences") and self._sentences: 51 | return self._sentences 52 | 53 | sentences = [] 54 | with open(self.path + "/datasetSentences.txt", "r") as f: 55 | first = True 56 | for line in f: 57 | if first: 58 | first = False 59 | continue 60 | 61 | splitted = line.strip().split()[1:] 62 | # Deal with some peculiar encoding issues with this file 63 | sentences += [[w.lower() for w in splitted]] 64 | 65 | self._sentences = sentences 66 | self._sentlengths = np.array([len(s) for s in sentences]) 67 | self._cumsentlen = np.cumsum(self._sentlengths) 68 | 69 | return self._sentences 70 | 71 | def numSentences(self): 72 | if hasattr(self, "_numSentences") and self._numSentences: 73 | return self._numSentences 74 | else: 75 | self._numSentences = len(self.sentences()) 76 | return self._numSentences 77 | 78 | def allSentences(self): 79 | if hasattr(self, "_allsentences") and self._allsentences: 80 | return self._allsentences 81 | 82 | sentences = self.sentences() 83 | rejectProb = self.rejectProb() 84 | tokens = self.tokens() 85 | allsentences = [[w for w in s 86 | if 0 >= rejectProb[tokens[w]] or random.random() >= rejectProb[tokens[w]]] 87 | for s in sentences * 30] 88 | 89 | allsentences = [s for s in allsentences if len(s) > 1] 90 | 91 | self._allsentences = allsentences 92 | 93 | return self._allsentences 94 | 95 | def getRandomContext(self, C=5): 96 | allsent = self.allSentences() 97 | sentID = random.randint(0, len(allsent) - 1) 98 | sent = allsent[sentID] 99 | wordID = random.randint(0, len(sent) - 1) 100 | 101 | context = sent[max(0, wordID - C):wordID] 102 | if wordID+1 < len(sent): 103 | context += sent[wordID+1:min(len(sent), wordID + C + 1)] 104 | 105 | centerword = sent[wordID] 106 | context = [w for w in context if w != centerword] 107 | 108 | if len(context) > 0: 109 | return centerword, context 110 | else: 111 | return self.getRandomContext(C) 112 | 113 | def sent_labels(self): 114 | if hasattr(self, "_sent_labels") and self._sent_labels: 115 | return self._sent_labels 116 | 117 | dictionary = dict() 118 | phrases = 0 119 | with open(self.path + "/dictionary.txt", "r") as f: 120 | for line in f: 121 | line = line.strip() 122 | if not line: continue 123 | splitted = line.split("|") 124 | dictionary[splitted[0].lower()] = int(splitted[1]) 125 | phrases += 1 126 | 127 | labels = [0.0] * phrases 128 
| with open(self.path + "/sentiment_labels.txt", "r") as f: 129 | first = True 130 | for line in f: 131 | if first: 132 | first = False 133 | continue 134 | 135 | line = line.strip() 136 | if not line: continue 137 | splitted = line.split("|") 138 | labels[int(splitted[0])] = float(splitted[1]) 139 | 140 | sent_labels = [0.0] * self.numSentences() 141 | sentences = self.sentences() 142 | for i in range(self.numSentences()): 143 | sentence = sentences[i] 144 | full_sent = " ".join(sentence).replace('-lrb-', '(').replace('-rrb-', ')') 145 | try: 146 | sent_labels[i] = labels[dictionary[full_sent]] 147 | except: 148 | continue 149 | 150 | self._sent_labels = sent_labels 151 | return self._sent_labels 152 | 153 | def dataset_split(self): 154 | if hasattr(self, "_split") and self._split: 155 | return self._split 156 | 157 | split = [[] for i in range(3)] 158 | with open(self.path + "/datasetSplit.txt", "r") as f: 159 | first = True 160 | for line in f: 161 | if first: 162 | first = False 163 | continue 164 | 165 | splitted = line.strip().split(",") 166 | split[int(splitted[1]) - 1] += [int(splitted[0]) - 1] 167 | 168 | self._split = split 169 | return self._split 170 | 171 | def getRandomTrainSentence(self): 172 | split = self.dataset_split() 173 | sentId = split[0][random.randint(0, len(split[0]) - 1)] 174 | return self.sentences()[sentId], self.categorify(self.sent_labels()[sentId]) 175 | 176 | def categorify(self, label): 177 | if label <= 0.2: 178 | return 0 179 | elif label <= 0.4: 180 | return 1 181 | elif label <= 0.6: 182 | return 2 183 | elif label <= 0.8: 184 | return 3 185 | else: 186 | return 4 187 | 188 | def getDevSentences(self): 189 | return self.getSplitSentences(2) 190 | 191 | def getTestSentences(self): 192 | return self.getSplitSentences(1) 193 | 194 | def getTrainSentences(self): 195 | return self.getSplitSentences(0) 196 | 197 | def getSplitSentences(self, split=0): 198 | ds_split = self.dataset_split() 199 | return [(self.sentences()[i], self.categorify(self.sent_labels()[i])) for i in ds_split[split]] 200 | 201 | def sampleTable(self): 202 | if hasattr(self, '_sampleTable') and self._sampleTable is not None: 203 | return self._sampleTable 204 | 205 | nTokens = len(self.tokens()) 206 | samplingFreq = np.zeros((nTokens,)) 207 | self.allSentences() 208 | i = 0 209 | for w in range(nTokens): 210 | w = self._revtokens[i] 211 | if w in self._tokenfreq: 212 | freq = 1.0 * self._tokenfreq[w] 213 | # Reweigh 214 | freq = freq ** 0.75 215 | else: 216 | freq = 0.0 217 | samplingFreq[i] = freq 218 | i += 1 219 | 220 | samplingFreq /= np.sum(samplingFreq) 221 | samplingFreq = np.cumsum(samplingFreq) * self.tablesize 222 | 223 | self._sampleTable = [0] * self.tablesize 224 | 225 | j = 0 226 | for i in range(self.tablesize): 227 | while i > samplingFreq[j]: 228 | j += 1 229 | self._sampleTable[i] = j 230 | 231 | return self._sampleTable 232 | 233 | def rejectProb(self): 234 | if hasattr(self, '_rejectProb') and self._rejectProb is not None: 235 | return self._rejectProb 236 | 237 | threshold = 1e-5 * self._wordcount 238 | 239 | nTokens = len(self.tokens()) 240 | rejectProb = np.zeros((nTokens,)) 241 | for i in range(nTokens): 242 | w = self._revtokens[i] 243 | freq = 1.0 * self._tokenfreq[w] 244 | # Reweigh 245 | rejectProb[i] = max(0, 1 - np.sqrt(threshold / freq)) 246 | 247 | self._rejectProb = rejectProb 248 | return self._rejectProb 249 | 250 | def sampleTokenIdx(self): 251 | return self.sampleTable()[random.randint(0, self.tablesize - 1)] 
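A brief usage sketch of the StanfordSentiment helper above, assuming the Stanford Sentiment Treebank files have been fetched into utils/datasets by get_datasets.sh; the calls mirror how q3_run.py and q4_sentiment.py use the class, and the comments are only indicative:

dataset = StanfordSentiment()
tokens = dataset.tokens()                          # word -> index, with "UNK" appended last
center, context = dataset.getRandomContext(C=5)    # one skip-gram training pair
negatives = [dataset.sampleTokenIdx() for _ in range(10)]  # draws follow unigram freq ** 0.75
train = dataset.getTrainSentences()                # list of (list_of_words, label in 0..4)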
-------------------------------------------------------------------------------- /NLP/assignment2/assignment2-soln.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment2/assignment2-soln.pdf -------------------------------------------------------------------------------- /NLP/assignment2/assignment2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment2/assignment2.pdf -------------------------------------------------------------------------------- /NLP/assignment2/model.py: -------------------------------------------------------------------------------- 1 | class Model(object): 2 | """Abstracts a Tensorflow graph for a learning task. 3 | 4 | We use various Model classes as usual abstractions to encapsulate tensorflow 5 | computational graphs. Each algorithm you will construct in this homework will 6 | inherit from a Model object. 7 | """ 8 | def add_placeholders(self): 9 | """Adds placeholder variables to tensorflow computational graph. 10 | 11 | Tensorflow uses placeholder variables to represent locations in a 12 | computational graph where data is inserted. These placeholders are used as 13 | inputs by the rest of the model building and will be fed data during 14 | training. 15 | 16 | See for more information: 17 | https://www.tensorflow.org/versions/r0.7/api_docs/python/io_ops.html#placeholders 18 | """ 19 | raise NotImplementedError("Each Model must re-implement this method.") 20 | 21 | def create_feed_dict(self, inputs_batch, labels_batch=None): 22 | """Creates the feed_dict for one step of training. 23 | 24 | A feed_dict takes the form of: 25 | feed_dict = { 26 | : , 27 | .... 28 | } 29 | 30 | If labels_batch is None, then no labels are added to feed_dict. 31 | 32 | Hint: The keys for the feed_dict should be a subset of the placeholder 33 | tensors created in add_placeholders. 34 | Args: 35 | inputs_batch: A batch of input data. 36 | labels_batch: A batch of label data. 37 | Returns: 38 | feed_dict: The feed dictionary mapping from placeholders to values. 39 | """ 40 | raise NotImplementedError("Each Model must re-implement this method.") 41 | 42 | def add_prediction_op(self): 43 | """Implements the core of the model that transforms a batch of input data into predictions. 44 | 45 | Returns: 46 | pred: A tensor of shape (batch_size, n_classes) 47 | """ 48 | raise NotImplementedError("Each Model must re-implement this method.") 49 | 50 | def add_loss_op(self, pred): 51 | """Adds Ops for the loss function to the computational graph. 52 | 53 | Args: 54 | pred: A tensor of shape (batch_size, n_classes) 55 | Returns: 56 | loss: A 0-d tensor (scalar) output 57 | """ 58 | raise NotImplementedError("Each Model must re-implement this method.") 59 | 60 | def add_training_op(self, loss): 61 | """Sets up the training Ops. 62 | 63 | Creates an optimizer and applies the gradients to all trainable variables. 64 | The Op returned by this function is what must be passed to the 65 | sess.run() to train the model. See 66 | 67 | https://www.tensorflow.org/versions/r0.7/api_docs/python/train.html#Optimizer 68 | 69 | for more information. 70 | 71 | Args: 72 | loss: Loss tensor (a scalar). 73 | Returns: 74 | train_op: The Op for training. 
75 | """ 76 | 77 | raise NotImplementedError("Each Model must re-implement this method.") 78 | 79 | def train_on_batch(self, sess, inputs_batch, labels_batch): 80 | """Perform one step of gradient descent on the provided batch of data. 81 | 82 | Args: 83 | sess: tf.Session() 84 | input_batch: np.ndarray of shape (n_samples, n_features) 85 | labels_batch: np.ndarray of shape (n_samples, n_classes) 86 | Returns: 87 | loss: loss over the batch (a scalar) 88 | """ 89 | feed = self.create_feed_dict(inputs_batch, labels_batch=labels_batch) 90 | _, loss = sess.run([self.train_op, self.loss], feed_dict=feed) 91 | return loss 92 | 93 | def predict_on_batch(self, sess, inputs_batch): 94 | """Make predictions for the provided batch of data 95 | 96 | Args: 97 | sess: tf.Session() 98 | input_batch: np.ndarray of shape (n_samples, n_features) 99 | Returns: 100 | predictions: np.ndarray of shape (n_samples, n_classes) 101 | """ 102 | feed = self.create_feed_dict(inputs_batch) 103 | predictions = sess.run(self.pred, feed_dict=feed) 104 | if predictions.sum() >10000: 105 | print("predictions.sum()", predictions.sum()) 106 | return predictions 107 | 108 | def build(self): 109 | self.add_placeholders() 110 | self.pred = self.add_prediction_op() 111 | self.loss = self.add_loss_op(self.pred) 112 | self.train_op = self.add_training_op(self.loss) 113 | -------------------------------------------------------------------------------- /NLP/assignment2/q1_classifier.py: -------------------------------------------------------------------------------- 1 | import time 2 | 3 | import numpy as np 4 | import tensorflow as tf 5 | 6 | from q1_softmax import softmax 7 | from q1_softmax import cross_entropy_loss 8 | from model import Model 9 | from utils.general_utils import get_minibatches 10 | 11 | 12 | class Config(object): 13 | """Holds model hyperparams and data information. 14 | 15 | The config class is used to store various hyperparameters and dataset 16 | information parameters. Model objects are passed a Config() object at 17 | instantiation. They can then call self.config. to 18 | get the hyperparameter settings. 19 | """ 20 | n_samples = 1024 21 | n_features = 100 22 | n_classes = 5 23 | batch_size = 64 24 | n_epochs = 50 25 | lr = 1e-4 26 | 27 | 28 | class SoftmaxModel(Model): 29 | """Implements a Softmax classifier with cross-entropy loss.""" 30 | 31 | def add_placeholders(self): 32 | """Generates placeholder variables to represent the input tensors. 33 | 34 | These placeholders are used as inputs by the rest of the model building 35 | and will be fed data during training. 36 | 37 | Adds following nodes to the computational graph 38 | 39 | input_placeholder: Input placeholder tensor of shape 40 | (batch_size, n_features), type tf.float32 41 | labels_placeholder: Labels placeholder tensor of shape 42 | (batch_size, n_classes), type tf.int32 43 | 44 | Add these placeholders to self as the instance variables 45 | self.input_placeholder 46 | self.labels_placeholder 47 | """ 48 | ### YOUR CODE HERE 49 | self.input_placeholder=tf.placeholder(dtype=tf.float32,shape=(None, Config.n_features),name='input_placeholder') 50 | self.labels_placeholder=tf.placeholder(dtype=tf.int32,shape=(None, Config.n_classes), name='labels_placeholder') 51 | ### END YOUR CODE 52 | 53 | def create_feed_dict(self, inputs_batch, labels_batch=None): 54 | """Creates the feed_dict for training the given step. 55 | 56 | A feed_dict takes the form of: 57 | feed_dict = { 58 | : , 59 | .... 
60 | } 61 | 62 | If label_batch is None, then no labels are added to feed_dict. 63 | 64 | Hint: The keys for the feed_dict should be the placeholder 65 | tensors created in add_placeholders. 66 | 67 | Args: 68 | inputs_batch: A batch of input data. 69 | labels_batch: A batch of label data. 70 | Returns: 71 | feed_dict: The feed dictionary mapping from placeholders to values. 72 | """ 73 | ### YOUR CODE HERE 74 | feed_dict=dict() 75 | feed_dict[self.input_placeholder]=inputs_batch 76 | if labels_batch is not None: 77 | feed_dict[self.labels_placeholder]=labels_batch 78 | ### END YOUR CODE 79 | return feed_dict 80 | 81 | def add_prediction_op(self): 82 | """Adds the core transformation for this model which transforms a batch of input 83 | data into a batch of predictions. In this case, the transformation is a linear layer plus a 84 | softmax transformation: 85 | 86 | yhat = softmax(xW + b) 87 | 88 | Hint: The input x will be passed in through self.input_placeholder. Each ROW of 89 | self.input_placeholder is a single example. This is usually best-practice for 90 | tensorflow code. 91 | Hint: Make sure to create tf.Variables as needed. 92 | Hint: For this simple use-case, it's sufficient to initialize both weights W 93 | and biases b with zeros. 94 | 95 | Returns: 96 | pred: A tensor of shape (batch_size, n_classes) 97 | """ 98 | ### YOUR CODE HERE 99 | W = tf.Variable(tf.zeros((Config.n_features,Config.n_classes)), dtype=tf.float32) 100 | b = tf.Variable(tf.zeros(Config.n_classes), dtype=tf.float32) 101 | pred = softmax(tf.matmul(self.input_placeholder,W)+b) 102 | ### END YOUR CODE 103 | return pred 104 | 105 | def add_loss_op(self, pred): 106 | """Adds cross_entropy_loss ops to the computational graph. 107 | 108 | Hint: Use the cross_entropy_loss function we defined. This should be a very 109 | short function. 110 | Args: 111 | pred: A tensor of shape (batch_size, n_classes) 112 | Returns: 113 | loss: A 0-d tensor (scalar) 114 | """ 115 | ### YOUR CODE HERE 116 | loss = cross_entropy_loss(self.labels_placeholder,pred) 117 | ### END YOUR CODE 118 | return loss 119 | 120 | def add_training_op(self, loss): 121 | """Sets up the training Ops. 122 | 123 | Creates an optimizer and applies the gradients to all trainable variables. 124 | The Op returned by this function is what must be passed to the 125 | `sess.run()` call to cause the model to train. See 126 | 127 | https://www.tensorflow.org/api_docs/python/tf/train/Optimizer 128 | 129 | for more information. Use the learning rate from self.config. 130 | 131 | Hint: Use tf.train.GradientDescentOptimizer to get an optimizer object. 132 | Calling optimizer.minimize() will return a train_op object. 133 | 134 | Args: 135 | loss: Loss tensor, from cross_entropy_loss. 136 | Returns: 137 | train_op: The Op for training. 138 | """ 139 | ### YOUR CODE HERE 140 | train_op=tf.train.GradientDescentOptimizer(learning_rate=Config.lr).minimize(loss) 141 | ### END YOUR CODE 142 | return train_op 143 | 144 | def run_epoch(self, sess, inputs, labels): 145 | """Runs an epoch of training. 146 | 147 | Args: 148 | sess: tf.Session() object 149 | inputs: np.ndarray of shape (n_samples, n_features) 150 | labels: np.ndarray of shape (n_samples, n_classes) 151 | Returns: 152 | average_loss: scalar. Average minibatch loss of model on epoch. 
153 | """ 154 | n_minibatches, total_loss = 0, 0 155 | for input_batch, labels_batch in get_minibatches([inputs, labels], self.config.batch_size): 156 | n_minibatches += 1 157 | total_loss += self.train_on_batch(sess, input_batch, labels_batch) 158 | return total_loss / n_minibatches 159 | 160 | def fit(self, sess, inputs, labels): 161 | """Fit model on provided data. 162 | 163 | Args: 164 | sess: tf.Session() 165 | inputs: np.ndarray of shape (n_samples, n_features) 166 | labels: np.ndarray of shape (n_samples, n_classes) 167 | Returns: 168 | losses: list of loss per epoch 169 | """ 170 | losses = [] 171 | for epoch in range(self.config.n_epochs): 172 | start_time = time.time() 173 | average_loss = self.run_epoch(sess, inputs, labels) 174 | duration = time.time() - start_time 175 | print(('Epoch {:}: loss = {:.2f} ({:.3f} sec)'.format(epoch, average_loss, duration))) 176 | losses.append(average_loss) 177 | return losses 178 | 179 | def __init__(self, config): 180 | """Initializes the model. 181 | 182 | Args: 183 | config: A model configuration object of type Config 184 | """ 185 | self.config = config 186 | self.build() 187 | 188 | 189 | def test_softmax_model(): 190 | """Train softmax model for a number of steps.""" 191 | config = Config() 192 | 193 | # Generate random data to train the model on 194 | np.random.seed(1234) 195 | inputs = np.random.rand(config.n_samples, config.n_features) 196 | labels = np.zeros((config.n_samples, config.n_classes), dtype=np.int32) 197 | labels[:, 0] = 1 198 | 199 | # Tell TensorFlow that the model will be built into the default Graph. 200 | # (not required but good practice) 201 | with tf.Graph().as_default() as graph: 202 | # Build the model and add the variable initializer op 203 | model = SoftmaxModel(config) 204 | init_op = tf.global_variables_initializer() 205 | # Finalizing the graph causes tensorflow to raise an exception if you try to modify the graph 206 | # further. This is good practice because it makes explicit the distinction between building and 207 | # running the graph. 208 | graph.finalize() 209 | 210 | # Create a session for running ops in the graph 211 | with tf.Session(graph=graph) as sess: 212 | # Run the op to initialize the variables. 213 | sess.run(init_op) 214 | # Fit the model 215 | losses = model.fit(sess, inputs, labels) 216 | 217 | # If ops are implemented correctly, the average loss should fall close to zero 218 | # rapidly. 219 | assert losses[-1] < .5 220 | print("Basic (non-exhaustive) classifier tests pass") 221 | 222 | if __name__ == "__main__": 223 | test_softmax_model() 224 | -------------------------------------------------------------------------------- /NLP/assignment2/q1_softmax.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | from utils.general_utils import test_all_close 4 | 5 | 6 | def softmax(x): 7 | """ 8 | Compute the softmax function in tensorflow. 9 | 10 | You might find the tensorflow functions tf.exp, tf.reduce_max, 11 | tf.reduce_sum, tf.expand_dims useful. (Many solutions are possible, so you may 12 | not need to use all of these functions). Recall also that many common 13 | tensorflow operations are sugared (e.g. x + y does elementwise addition 14 | if x and y are both tensors). Make sure to implement the numerical stability 15 | fixes as in the previous homework! 16 | 17 | Args: 18 | x: tf.Tensor with shape (n_samples, n_features). Note feature vectors are 19 | represented by row-vectors. 
(For simplicity, no need to handle 1-d 20 | input as in the previous homework) 21 | Returns: 22 | out: tf.Tensor with shape (n_sample, n_features). You need to construct this 23 | tensor in this problem. 24 | """ 25 | 26 | ### YOUR CODE HERE 27 | m = tf.reduce_max(x, axis=-1, keepdims=True) 28 | e = tf.exp(x - m) 29 | out = e/tf.reduce_sum(e, axis=-1, keepdims=True) 30 | ### END YOUR CODE 31 | 32 | return out 33 | 34 | 35 | def cross_entropy_loss(y, yhat): 36 | """ 37 | Compute the cross entropy loss in tensorflow. 38 | The loss should be summed over the current minibatch. 39 | 40 | y is a one-hot tensor of shape (n_samples, n_classes) and yhat is a tensor 41 | of shape (n_samples, n_classes). y should be of dtype tf.int32, and yhat should 42 | be of dtype tf.float32. 43 | 44 | The functions tf.to_float, tf.reduce_sum, and tf.log might prove useful. (Many 45 | solutions are possible, so you may not need to use all of these functions). 46 | 47 | Note: You are NOT allowed to use the tensorflow built-in cross-entropy 48 | functions. 49 | 50 | Args: 51 | y: tf.Tensor with shape (n_samples, n_classes). One-hot encoded. 52 | yhat: tf.Tensorwith shape (n_sample, n_classes). Each row encodes a 53 | probability distribution and should sum to 1. 54 | Returns: 55 | out: tf.Tensor with shape (1,) (Scalar output). You need to construct this 56 | tensor in the problem. 57 | """ 58 | 59 | out=0 60 | ### YOUR CODE HERE 61 | out=-tf.reduce_sum(tf.to_float(y)*tf.log(yhat+1e-8)) 62 | ### END YOUR CODE 63 | 64 | return out 65 | 66 | 67 | def test_softmax_basic(): 68 | """ 69 | Some simple tests of softmax to get you started. 70 | Warning: these are not exhaustive. 71 | """ 72 | 73 | test1 = softmax(tf.constant(np.array([[1001, 1002], [3, 4]]), dtype=tf.float32)) 74 | with tf.Session() as sess: 75 | test1 = sess.run(test1) 76 | test_all_close("Softmax test 1", test1, np.array([[0.26894142, 0.73105858], 77 | [0.26894142, 0.73105858]])) 78 | 79 | test2 = softmax(tf.constant(np.array([[-1001, -1002]]), dtype=tf.float32)) 80 | with tf.Session() as sess: 81 | test2 = sess.run(test2) 82 | test_all_close("Softmax test 2", test2, np.array([[0.73105858, 0.26894142]])) 83 | 84 | print("Basic (non-exhaustive) softmax tests pass\n") 85 | 86 | 87 | def test_cross_entropy_loss_basic(): 88 | """ 89 | Some simple tests of cross_entropy_loss to get you started. 90 | Warning: these are not exhaustive. 91 | """ 92 | y = np.array([[0, 1], [1, 0], [1, 0]]) 93 | yhat = np.array([[.5, .5], [.5, .5], [.5, .5]]) 94 | 95 | test1 = cross_entropy_loss(tf.constant(y, dtype=tf.int32), tf.constant(yhat, dtype=tf.float32)) 96 | with tf.Session() as sess: 97 | test1 = sess.run(test1) 98 | expected = -3 * np.log(.5) 99 | test_all_close("Cross-entropy test 1", test1, expected) 100 | 101 | print("Basic (non-exhaustive) cross-entropy tests pass") 102 | 103 | if __name__ == "__main__": 104 | test_softmax_basic() 105 | test_cross_entropy_loss_basic() 106 | -------------------------------------------------------------------------------- /NLP/assignment2/q2_initialization.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | 4 | 5 | def xavier_weight_init(): 6 | """Returns function that creates random tensor. 7 | 8 | The specified function will take in a shape (tuple or 1-d array) and 9 | returns a random tensor of the specified shape drawn from the 10 | Xavier initialization distribution. 11 | 12 | Hint: You might find tf.random_uniform useful. 
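The max-subtraction in the q1_softmax implementation above works because softmax is invariant to adding the same constant to every score in a row. A tiny NumPy check of that property, illustrative and separate from the TensorFlow code:

import numpy as np

def np_softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = np.array([[1001., 1002.], [3., 4.]])
assert np.allclose(np_softmax(x), np_softmax(x - 1000.0))  # shifting every score leaves softmax unchanged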
13 | """ 14 | def _xavier_initializer(shape, **kwargs): 15 | """Defines an initializer for the Xavier distribution. 16 | Specifically, the output should be sampled uniformly from [-epsilon, epsilon] where 17 | epsilon = sqrt(6) / 18 | e.g., if shape = (2, 3), epsilon = sqrt(6 / (2 + 3)) 19 | 20 | This function will be used as a variable initializer. 21 | 22 | Args: 23 | shape: Tuple or 1-d array that species the dimensions of the requested tensor. 24 | Returns: 25 | out: tf.Tensor of specified shape sampled from the Xavier distribution. 26 | """ 27 | ### YOUR CODE HERE 28 | eps=np.sqrt(6/np.sum(shape)) 29 | out=tf.random_uniform(shape,-eps,eps) 30 | ### END YOUR CODE 31 | return out 32 | # Returns defined initializer function. 33 | return _xavier_initializer 34 | 35 | 36 | def test_initialization_basic(): 37 | """Some simple tests for the initialization. 38 | """ 39 | print("Running basic tests...") 40 | xavier_initializer = xavier_weight_init() 41 | shape = (1,) 42 | xavier_mat = xavier_initializer(shape) 43 | assert xavier_mat.get_shape() == shape 44 | 45 | shape = (1, 2, 3) 46 | xavier_mat = xavier_initializer(shape) 47 | assert xavier_mat.get_shape() == shape 48 | print("Basic (non-exhaustive) Xavier initialization tests pass") 49 | 50 | if __name__ == "__main__": 51 | test_initialization_basic() 52 | -------------------------------------------------------------------------------- /NLP/assignment2/q2_parser_transitions.py: -------------------------------------------------------------------------------- 1 | class PartialParse(object): 2 | def __init__(self, sentence): 3 | """Initializes this partial parse. 4 | 5 | Your code should initialize the following fields: 6 | self.stack: The current stack represented as a list with the top of the stack as the 7 | last element of the list. 8 | self.buffer: The current buffer represented as a list with the first item on the 9 | buffer as the first item of the list 10 | self.dependencies: The list of dependencies produced so far. Represented as a list of 11 | tuples where each tuple is of the form (head, dependent). 12 | Order for this list doesn't matter. 13 | 14 | The root token should be represented with the string "ROOT" 15 | 16 | Args: 17 | sentence: The sentence to be parsed as a list of words. 18 | Your code should not modify the sentence. 19 | """ 20 | # The sentence being parsed is kept for bookkeeping purposes. Do not use it in your code. 21 | self.sentence = sentence 22 | 23 | ### YOUR CODE HERE 24 | self.stack=['ROOT'] 25 | self.buffer=sentence.copy() 26 | self.dependencies=[] 27 | ### END YOUR CODE 28 | 29 | def parse_step(self, transition): 30 | """Performs a single parse step by applying the given transition to this partial parse 31 | 32 | Args: 33 | transition: A string that equals "S", "LA", or "RA" representing the shift, left-arc, 34 | and right-arc transitions. You can assume the provided transition is a legal 35 | transition. 
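For reference, the Xavier initializer above returns a callable, and calling it with a shape yields a uniform tensor with the bound computed from that shape. A hedged usage sketch in the graph-mode style of the rest of this assignment; the variable name and shape are arbitrary examples:

xavier = xavier_weight_init()
W = tf.Variable(xavier((200, 300)), name="W_example")   # hypothetical 200x300 weight matrix
# epsilon = sqrt(6 / (200 + 300)) ~= 0.11, so entries start in roughly [-0.11, 0.11]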
36 | """ 37 | ### YOUR CODE HERE 38 | # print(self.stack) 39 | if transition == 'S': 40 | # if len(self.buffer)>0: 41 | w=self.buffer[0] 42 | self.stack.append(w) 43 | del self.buffer[0] 44 | elif transition == 'LA': 45 | # if len(self.stack)>1: 46 | self.dependencies.append((self.stack[-1],self.stack[-2])) 47 | del self.stack[-2] 48 | elif transition == 'RA': 49 | # if len(self.stack)>1: 50 | w=self.stack.pop() 51 | self.dependencies.append((self.stack[-1],w)) 52 | else: 53 | raise(KeyError("Transition str key {} is not valide".format(transition))) 54 | ### END YOUR CODE 55 | 56 | def parse(self, transitions): 57 | """Applies the provided transitions to this PartialParse 58 | 59 | Args: 60 | transitions: The list of transitions in the order they should be applied 61 | Returns: 62 | dependencies: The list of dependencies produced when parsing the sentence. Represented 63 | as a list of tuples where each tuple is of the form (head, dependent) 64 | """ 65 | for transition in transitions: 66 | self.parse_step(transition) 67 | return self.dependencies 68 | 69 | 70 | def minibatch_parse(sentences, model, batch_size): 71 | """Parses a list of sentences in minibatches using a model. 72 | 73 | Args: 74 | sentences: A list of sentences to be parsed (each sentence is a list of words) 75 | model: The model that makes parsing decisions. It is assumed to have a function 76 | model.predict(partial_parses) that takes in a list of PartialParses as input and 77 | returns a list of transitions predicted for each parse. That is, after calling 78 | transitions = model.predict(partial_parses) 79 | transitions[i] will be the next transition to apply to partial_parses[i]. 80 | batch_size: The number of PartialParses to include in each minibatch 81 | Returns: 82 | dependencies: A list where each element is the dependencies list for a parsed sentence. 83 | Ordering should be the same as in sentences (i.e., dependencies[i] should 84 | contain the parse for sentences[i]). 
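As a concrete trace of the three transitions implemented above (the graded check is test_parse further down), parsing a three-word sentence proceeds as follows; the comments spell out each (head, dependent) pair added:

pp = PartialParse(["parse", "this", "sentence"])
pp.parse(["S", "S", "S"])   # stack: [ROOT, parse, this, sentence], buffer: []
pp.parse(["LA"])            # adds (head="sentence", dependent="this")
pp.parse(["RA", "RA"])      # adds (head="parse", dependent="sentence"), then (head="ROOT", dependent="parse")
assert sorted(pp.dependencies) == [('ROOT', 'parse'), ('parse', 'sentence'), ('sentence', 'this')]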
85 | """ 86 | dependencies=[] 87 | ### YOUR CODE HERE 88 | partial_parses=[PartialParse(s) for s in sentences] 89 | unfinished_parses=[p for p in partial_parses] 90 | while len(unfinished_parses) != 0: 91 | pars=unfinished_parses[:batch_size] 92 | transitions=model.predict(pars) 93 | tobe_deleted=[] 94 | for i,p in enumerate(pars): 95 | p.parse_step(transitions[i]) 96 | if len(p.buffer)==0 and len(p.stack)==1: 97 | tobe_deleted.append(i) 98 | for i in reversed(tobe_deleted): 99 | del unfinished_parses[i] 100 | 101 | dependencies=[p.dependencies for p in partial_parses] 102 | 103 | ### END YOUR CODE 104 | 105 | return dependencies 106 | 107 | 108 | def test_step(name, transition, stack, buf, deps, 109 | ex_stack, ex_buf, ex_deps): 110 | """Tests that a single parse step returns the expected output""" 111 | pp = PartialParse([]) 112 | pp.stack, pp.buffer, pp.dependencies = stack, buf, deps 113 | 114 | pp.parse_step(transition) 115 | stack, buf, deps = (tuple(pp.stack), tuple(pp.buffer), tuple(sorted(pp.dependencies))) 116 | assert stack == ex_stack, \ 117 | "{:} test resulted in stack {:}, expected {:}".format(name, stack, ex_stack) 118 | assert buf == ex_buf, \ 119 | "{:} test resulted in buffer {:}, expected {:}".format(name, buf, ex_buf) 120 | assert deps == ex_deps, \ 121 | "{:} test resulted in dependency list {:}, expected {:}".format(name, deps, ex_deps) 122 | print("{:} test passed!".format(name)) 123 | 124 | 125 | def test_parse_step(): 126 | """Simple tests for the PartialParse.parse_step function 127 | Warning: these are not exhaustive 128 | """ 129 | test_step("SHIFT", "S", ["ROOT", "the"], ["cat", "sat"], [], 130 | ("ROOT", "the", "cat"), ("sat",), ()) 131 | test_step("LEFT-ARC", "LA", ["ROOT", "the", "cat"], ["sat"], [], 132 | ("ROOT", "cat",), ("sat",), (("cat", "the"),)) 133 | test_step("RIGHT-ARC", "RA", ["ROOT", "run", "fast"], [], [], 134 | ("ROOT", "run",), (), (("run", "fast"),)) 135 | 136 | 137 | def test_parse(): 138 | """Simple tests for the PartialParse.parse function 139 | Warning: these are not exhaustive 140 | """ 141 | sentence = ["parse", "this", "sentence"] 142 | dependencies = PartialParse(sentence).parse(["S", "S", "S", "LA", "RA", "RA"]) 143 | dependencies = tuple(sorted(dependencies)) 144 | expected = (('ROOT', 'parse'), ('parse', 'sentence'), ('sentence', 'this')) 145 | assert dependencies == expected, \ 146 | "parse test resulted in dependencies {:}, expected {:}".format(dependencies, expected) 147 | assert tuple(sentence) == ("parse", "this", "sentence"), \ 148 | "parse test failed: the input sentence should not be modified" 149 | print("parse test passed!") 150 | 151 | 152 | class DummyModel(object): 153 | """Dummy model for testing the minibatch_parse function 154 | First shifts everything onto the stack and then does exclusively right arcs if the first word of 155 | the sentence is "right", "left" if otherwise. 
156 | """ 157 | def predict(self, partial_parses): 158 | return [("RA" if pp.stack[1] is "right" else "LA") if len(pp.buffer) == 0 else "S" 159 | for pp in partial_parses] 160 | 161 | 162 | def test_dependencies(name, deps, ex_deps): 163 | """Tests the provided dependencies match the expected dependencies""" 164 | deps = tuple(sorted(deps)) 165 | assert deps == ex_deps, \ 166 | "{:} test resulted in dependency list {:}, expected {:}".format(name, deps, ex_deps) 167 | 168 | 169 | def test_minibatch_parse(): 170 | """Simple tests for the minibatch_parse function 171 | Warning: these are not exhaustive 172 | """ 173 | sentences = [["right", "arcs", "only"], 174 | ["right", "arcs", "only", "again"], 175 | ["left", "arcs", "only"], 176 | ["left", "arcs", "only", "again"]] 177 | deps = minibatch_parse(sentences, DummyModel(), 2) 178 | test_dependencies("minibatch_parse", deps[0], 179 | (('ROOT', 'right'), ('arcs', 'only'), ('right', 'arcs'))) 180 | test_dependencies("minibatch_parse", deps[1], 181 | (('ROOT', 'right'), ('arcs', 'only'), ('only', 'again'), ('right', 'arcs'))) 182 | test_dependencies("minibatch_parse", deps[2], 183 | (('only', 'ROOT'), ('only', 'arcs'), ('only', 'left'))) 184 | test_dependencies("minibatch_parse", deps[3], 185 | (('again', 'ROOT'), ('again', 'arcs'), ('again', 'left'), ('again', 'only'))) 186 | print("minibatch_parse test passed!") 187 | 188 | if __name__ == '__main__': 189 | test_parse_step() 190 | test_parse() 191 | test_minibatch_parse() 192 | -------------------------------------------------------------------------------- /NLP/assignment2/utils/general_utils.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import time 3 | import numpy as np 4 | 5 | 6 | def get_minibatches(data, minibatch_size, shuffle=True): 7 | """ 8 | Iterates through the provided data one minibatch at at time. You can use this function to 9 | iterate through data in minibatches as follows: 10 | 11 | for inputs_minibatch in get_minibatches(inputs, minibatch_size): 12 | ... 13 | 14 | Or with multiple data sources: 15 | 16 | for inputs_minibatch, labels_minibatch in get_minibatches([inputs, labels], minibatch_size): 17 | ... 18 | 19 | Args: 20 | data: there are two possible values: 21 | - a list or numpy array 22 | - a list where each element is either a list or numpy array 23 | minibatch_size: the maximum number of items in a minibatch 24 | shuffle: whether to randomize the order of returned data 25 | Returns: 26 | minibatches: the return value depends on data: 27 | - If data is a list/array it yields the next minibatch of data. 28 | - If data a list of lists/arrays it returns the next minibatch of each element in the 29 | list. This can be used to iterate through multiple data sources 30 | (e.g., features and labels) at the same time. 
31 | 32 | """ 33 | list_data = type(data) is list and (type(data[0]) is list or type(data[0]) is np.ndarray) 34 | data_size = len(data[0]) if list_data else len(data) 35 | indices = np.arange(data_size) 36 | if shuffle: 37 | np.random.shuffle(indices) 38 | for minibatch_start in np.arange(0, data_size, minibatch_size): 39 | minibatch_indices = indices[minibatch_start:minibatch_start + minibatch_size] 40 | yield [_minibatch(d, minibatch_indices) for d in data] if list_data \ 41 | else _minibatch(data, minibatch_indices) 42 | 43 | 44 | def _minibatch(data, minibatch_idx): 45 | return data[minibatch_idx] if type(data) is np.ndarray else [data[i] for i in minibatch_idx] 46 | 47 | 48 | def test_all_close(name, actual, expected): 49 | if actual.shape != expected.shape: 50 | raise ValueError("{:} failed, expected output to have shape {:} but has shape {:}" 51 | .format(name, expected.shape, actual.shape)) 52 | if np.amax(np.fabs(actual - expected)) > 1e-6: 53 | raise ValueError("{:} failed, expected {:} but value is {:}".format(name, expected, actual)) 54 | else: 55 | print(name, "passed!") 56 | -------------------------------------------------------------------------------- /NLP/assignment3/assignment3-soln.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment3/assignment3-soln.pdf -------------------------------------------------------------------------------- /NLP/assignment3/assignment3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment3/assignment3.pdf -------------------------------------------------------------------------------- /NLP/assignment3/data_util.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Utility functions to process data. 5 | """ 6 | import os 7 | import pickle 8 | import logging 9 | from collections import Counter 10 | 11 | import numpy as np 12 | from util import read_conll, one_hot, window_iterator, ConfusionMatrix, load_word_vector_mapping 13 | from defs import LBLS, NONE, LMAP, NUM, UNK, EMBED_SIZE 14 | 15 | logger = logging.getLogger(__name__) 16 | logger.setLevel(logging.DEBUG) 17 | logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG) 18 | 19 | 20 | FDIM = 4 21 | P_CASE = "CASE:" 22 | CASES = ["aa", "AA", "Aa", "aA"] 23 | START_TOKEN = "" 24 | END_TOKEN = "" 25 | 26 | def casing(word): 27 | if len(word) == 0: return word 28 | 29 | # all lowercase 30 | if word.islower(): return "aa" 31 | # all uppercase 32 | elif word.isupper(): return "AA" 33 | # starts with capital 34 | elif word[0].isupper(): return "Aa" 35 | # has non-initial capital 36 | else: return "aA" 37 | 38 | def normalize(word): 39 | """ 40 | Normalize words that are numbers or have casing. 41 | """ 42 | if word.isdigit(): return NUM 43 | else: return word.lower() 44 | 45 | def featurize(embeddings, word): 46 | """ 47 | Featurize a word given embeddings. 
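A few concrete input/output pairs for the casing and normalize helpers above, written out for illustration:

assert casing("word") == "aa" and casing("NASA") == "AA"
assert casing("Paris") == "Aa" and casing("iPhone") == "aA"
assert normalize("1984") == NUM       # digit strings collapse to the special NUM token
assert normalize("Paris") == "paris"  # everything else is simply lowercased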
48 | """ 49 | case = casing(word) 50 | word = normalize(word) 51 | case_mapping = {c: one_hot(FDIM, i) for i, c in enumerate(CASES)} 52 | wv = embeddings.get(word, embeddings[UNK]) 53 | fv = case_mapping[case] 54 | return np.hstack((wv, fv)) 55 | 56 | def evaluate(model, X, Y): 57 | cm = ConfusionMatrix(labels=LBLS) 58 | Y_ = model.predict(X) 59 | for i in range(Y.shape[0]): 60 | y, y_ = np.argmax(Y[i]), np.argmax(Y_[i]) 61 | cm.update(y,y_) 62 | cm.print_table() 63 | return cm.summary() 64 | 65 | class ModelHelper(object): 66 | """ 67 | This helper takes care of preprocessing data, constructing embeddings, etc. 68 | """ 69 | def __init__(self, tok2id, max_length): 70 | self.tok2id = tok2id 71 | self.START = [tok2id[START_TOKEN], tok2id[P_CASE + "aa"]] 72 | self.END = [tok2id[END_TOKEN], tok2id[P_CASE + "aa"]] 73 | self.max_length = max_length 74 | 75 | def vectorize_example(self, sentence, labels=None): 76 | sentence_ = [[self.tok2id.get(normalize(word), self.tok2id[UNK]), self.tok2id[P_CASE + casing(word)]] for word in sentence] 77 | if labels: 78 | labels_ = [LBLS.index(l) for l in labels] 79 | return sentence_, labels_ 80 | else: 81 | return sentence_, [LBLS[-1] for _ in sentence] 82 | 83 | def vectorize(self, data): 84 | return [self.vectorize_example(sentence, labels) for sentence, labels in data] 85 | 86 | @classmethod 87 | def build(cls, data): 88 | # Preprocess data to construct an embedding 89 | # Reserve 0 for the special NIL token. 90 | tok2id = build_dict((normalize(word) for sentence, _ in data for word in sentence), offset=1, max_words=10000) 91 | tok2id.update(build_dict([P_CASE + c for c in CASES], offset=len(tok2id))) 92 | tok2id.update(build_dict([START_TOKEN, END_TOKEN, UNK], offset=len(tok2id))) 93 | assert sorted(tok2id.items(), key=lambda t: t[1])[0][1] == 1 94 | logger.info("Built dictionary for %d features.", len(tok2id)) 95 | 96 | max_length = max(len(sentence) for sentence, _ in data) 97 | 98 | return cls(tok2id, max_length) 99 | 100 | def save(self, path): 101 | # Make sure the directory exists. 102 | if not os.path.exists(path): 103 | os.makedirs(path) 104 | # Save the tok2id map. 105 | with open(os.path.join(path, "features.pkl"), "wb") as f: 106 | pickle.dump([self.tok2id, self.max_length], f) 107 | 108 | @classmethod 109 | def load(cls, path): 110 | # Make sure the directory exists. 111 | assert os.path.exists(path) and os.path.exists(os.path.join(path, "features.pkl")) 112 | # Save the tok2id map. 113 | with open(os.path.join(path, "features.pkl"), "rb") as f: 114 | tok2id, max_length = pickle.load(f) 115 | return cls(tok2id, max_length) 116 | 117 | def load_and_preprocess_data(args): 118 | logger.info("Loading training data...") 119 | train = read_conll(args.data_train) 120 | logger.info("Done. Read %d sentences", len(train)) 121 | logger.info("Loading dev data...") 122 | dev = read_conll(args.data_dev) 123 | logger.info("Done. Read %d sentences", len(dev)) 124 | 125 | helper = ModelHelper.build(train) 126 | 127 | # now process all the input data. 128 | train_data = helper.vectorize(train) 129 | dev_data = helper.vectorize(dev) 130 | 131 | return helper, train_data, dev_data, train, dev 132 | 133 | def load_embeddings(args, helper): 134 | embeddings = np.array(np.random.randn(len(helper.tok2id) + 1, EMBED_SIZE), dtype=np.float32) 135 | embeddings[0] = 0. 
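featurize above therefore returns a vector of length EMBED_SIZE + FDIM: the word embedding with a 4-way case indicator appended. A hedged sketch with hypothetical toy embeddings, assuming util.one_hot places the 1 at the given index:

import numpy as np

toy_embeddings = {"paris": np.ones(EMBED_SIZE), UNK: np.zeros(EMBED_SIZE)}   # hypothetical vectors
fv = featurize(toy_embeddings, "Paris")
assert fv.shape == (EMBED_SIZE + FDIM,)           # 50 embedding dims + a 4-way case indicator
assert fv[EMBED_SIZE + CASES.index("Aa")] == 1    # "Paris" is initial-capital, so casing "Aa"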
136 | for word, vec in load_word_vector_mapping(args.vocab, args.vectors).items(): 137 | word = normalize(word) 138 | if word in helper.tok2id: 139 | embeddings[helper.tok2id[word]] = vec 140 | logger.info("Initialized embeddings.") 141 | 142 | return embeddings 143 | 144 | def build_dict(words, max_words=None, offset=0): 145 | cnt = Counter(words) 146 | if max_words: 147 | words = cnt.most_common(max_words) 148 | else: 149 | words = cnt.most_common() 150 | return {word: offset+i for i, (word, _) in enumerate(words)} 151 | 152 | 153 | def get_chunks(seq, default=LBLS.index(NONE)): 154 | """Breaks input of 4 4 4 0 0 4 0 -> (0, 4, 5), (0, 6, 7)""" 155 | chunks = [] 156 | chunk_type, chunk_start = None, None 157 | for i, tok in enumerate(seq): 158 | # End of a chunk 1 159 | if tok == default and chunk_type is not None: 160 | # Add a chunk. 161 | chunk = (chunk_type, chunk_start, i) 162 | chunks.append(chunk) 163 | chunk_type, chunk_start = None, None 164 | # End of a chunk + start of a chunk! 165 | elif tok != default: 166 | if chunk_type is None: 167 | chunk_type, chunk_start = tok, i 168 | elif tok != chunk_type: 169 | chunk = (chunk_type, chunk_start, i) 170 | chunks.append(chunk) 171 | chunk_type, chunk_start = tok, i 172 | else: 173 | pass 174 | # end condition 175 | if chunk_type is not None: 176 | chunk = (chunk_type, chunk_start, len(seq)) 177 | chunks.append(chunk) 178 | return chunks 179 | 180 | def test_get_chunks(): 181 | assert get_chunks([4, 4, 4, 0, 0, 4, 1, 2, 4, 3], 4) == [(0,3,5), (1, 6, 7), (2, 7, 8), (3,9,10)] 182 | -------------------------------------------------------------------------------- /NLP/assignment3/defs.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Common definitions for NER 5 | """ 6 | 7 | from util import one_hot 8 | 9 | LBLS = [ 10 | "PER", 11 | "ORG", 12 | "LOC", 13 | "MISC", 14 | "O", 15 | ] 16 | NONE = "O" 17 | LMAP = {k: one_hot(5,i) for i, k in enumerate(LBLS)} 18 | NUM = "NNNUMMM" 19 | UNK = "UUUNKKK" 20 | 21 | EMBED_SIZE = 50 22 | -------------------------------------------------------------------------------- /NLP/assignment3/model.py: -------------------------------------------------------------------------------- 1 | class Model(object): 2 | """Abstracts a Tensorflow graph for a learning task. 3 | 4 | We use various Model classes as usual abstractions to encapsulate tensorflow 5 | computational graphs. Each algorithm you will construct in this homework will 6 | inherit from a Model object. 7 | """ 8 | def add_placeholders(self): 9 | """Adds placeholder variables to tensorflow computational graph. 10 | 11 | Tensorflow uses placeholder variables to represent locations in a 12 | computational graph where data is inserted. These placeholders are used as 13 | inputs by the rest of the model building and will be fed data during 14 | training. 15 | 16 | See for more information: 17 | https://www.tensorflow.org/versions/r0.7/api_docs/python/io_ops.html#placeholders 18 | """ 19 | raise NotImplementedError("Each Model must re-implement this method.") 20 | 21 | def create_feed_dict(self, inputs_batch, labels_batch=None): 22 | """Creates the feed_dict for one step of training. 23 | 24 | A feed_dict takes the form of: 25 | feed_dict = { 26 | : , 27 | .... 28 | } 29 | 30 | If labels_batch is None, then no labels are added to feed_dict. 31 | 32 | Hint: The keys for the feed_dict should be a subset of the placeholder 33 | tensors created in add_placeholders. 
34 | Args: 35 | inputs_batch: A batch of input data. 36 | labels_batch: A batch of label data. 37 | Returns: 38 | feed_dict: The feed dictionary mapping from placeholders to values. 39 | """ 40 | raise NotImplementedError("Each Model must re-implement this method.") 41 | 42 | def add_prediction_op(self): 43 | """Implements the core of the model that transforms a batch of input data into predictions. 44 | 45 | Returns: 46 | pred: A tensor of shape (batch_size, n_classes) 47 | """ 48 | raise NotImplementedError("Each Model must re-implement this method.") 49 | 50 | def add_loss_op(self, pred): 51 | """Adds Ops for the loss function to the computational graph. 52 | 53 | Args: 54 | pred: A tensor of shape (batch_size, n_classes) 55 | Returns: 56 | loss: A 0-d tensor (scalar) output 57 | """ 58 | raise NotImplementedError("Each Model must re-implement this method.") 59 | 60 | def add_training_op(self, loss): 61 | """Sets up the training Ops. 62 | 63 | Creates an optimizer and applies the gradients to all trainable variables. 64 | The Op returned by this function is what must be passed to the 65 | sess.run() to train the model. See 66 | 67 | https://www.tensorflow.org/versions/r0.7/api_docs/python/train.html#Optimizer 68 | 69 | for more information. 70 | 71 | Args: 72 | loss: Loss tensor (a scalar). 73 | Returns: 74 | train_op: The Op for training. 75 | """ 76 | 77 | raise NotImplementedError("Each Model must re-implement this method.") 78 | 79 | def train_on_batch(self, sess, inputs_batch, labels_batch): 80 | """Perform one step of gradient descent on the provided batch of data. 81 | 82 | Args: 83 | sess: tf.Session() 84 | input_batch: np.ndarray of shape (n_samples, n_features) 85 | labels_batch: np.ndarray of shape (n_samples, n_classes) 86 | Returns: 87 | loss: loss over the batch (a scalar) 88 | """ 89 | feed = self.create_feed_dict(inputs_batch, labels_batch=labels_batch) 90 | _, loss = sess.run([self.train_op, self.loss], feed_dict=feed) 91 | return loss 92 | 93 | def predict_on_batch(self, sess, inputs_batch): 94 | """Make predictions for the provided batch of data 95 | 96 | Args: 97 | sess: tf.Session() 98 | input_batch: np.ndarray of shape (n_samples, n_features) 99 | Returns: 100 | predictions: np.ndarray of shape (n_samples, n_classes) 101 | """ 102 | feed = self.create_feed_dict(inputs_batch) 103 | predictions = sess.run(self.pred, feed_dict=feed) 104 | return predictions 105 | 106 | def build(self): 107 | self.add_placeholders() 108 | self.pred = self.add_prediction_op() 109 | self.loss = self.add_loss_op(self.pred) 110 | self.train_op = self.add_training_op(self.loss) 111 | -------------------------------------------------------------------------------- /NLP/assignment3/ner_model.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2.7 2 | # -*- coding: utf-8 -*- 3 | """ 4 | A model for named entity recognition. 5 | """ 6 | import pdb 7 | import logging 8 | 9 | import tensorflow as tf 10 | from util import ConfusionMatrix, Progbar, minibatches 11 | from data_util import get_chunks 12 | from model import Model 13 | from defs import LBLS 14 | 15 | logger = logging.getLogger("hw3") 16 | logger.setLevel(logging.DEBUG) 17 | logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG) 18 | 19 | class NERModel(Model): 20 | """ 21 | Implements special functionality for NER models. 
22 | """ 23 | 24 | def __init__(self, helper, config, report=None): 25 | self.helper = helper 26 | self.config = config 27 | self.report = report 28 | 29 | def preprocess_sequence_data(self, examples): 30 | """Preprocess sequence data for the model. 31 | 32 | Args: 33 | examples: A list of vectorized input/output sequences. 34 | Returns: 35 | A new list of vectorized input/output pairs appropriate for the model. 36 | """ 37 | raise NotImplementedError("Each Model must re-implement this method.") 38 | 39 | def consolidate_predictions(self, data_raw, data, preds): 40 | """ 41 | Convert a sequence of predictions according to the batching 42 | process back into the original sequence. 43 | """ 44 | raise NotImplementedError("Each Model must re-implement this method.") 45 | 46 | 47 | def evaluate(self, sess, examples, examples_raw): 48 | """Evaluates model performance on @examples. 49 | 50 | This function uses the model to predict labels for @examples and constructs a confusion matrix. 51 | 52 | Args: 53 | sess: the current TensorFlow session. 54 | examples: A list of vectorized input/output pairs. 55 | examples: A list of the original input/output sequence pairs. 56 | Returns: 57 | The F1 score for predicting tokens as named entities. 58 | """ 59 | token_cm = ConfusionMatrix(labels=LBLS) 60 | 61 | correct_preds, total_correct, total_preds = 0., 0., 0. 62 | for _, labels, labels_ in self.output(sess, examples_raw, examples): 63 | for l, l_ in zip(labels, labels_): 64 | token_cm.update(l, l_) 65 | gold = set(get_chunks(labels)) 66 | pred = set(get_chunks(labels_)) 67 | correct_preds += len(gold.intersection(pred)) 68 | total_preds += len(pred) 69 | total_correct += len(gold) 70 | 71 | p = correct_preds / total_preds if correct_preds > 0 else 0 72 | r = correct_preds / total_correct if correct_preds > 0 else 0 73 | f1 = 2 * p * r / (p + r) if correct_preds > 0 else 0 74 | return token_cm, (p, r, f1) 75 | 76 | 77 | def output(self, sess, inputs_raw, inputs=None): 78 | """ 79 | Reports the output of the model on examples (uses helper to featurize each example). 80 | """ 81 | if inputs is None: 82 | inputs = self.preprocess_sequence_data(self.helper.vectorize(inputs_raw)) 83 | 84 | preds = [] 85 | prog = Progbar(target=1 + int(len(inputs) / self.config.batch_size)) 86 | for i, batch in enumerate(minibatches(inputs, self.config.batch_size, shuffle=False)): 87 | # Ignore predict 88 | batch = batch[:1] + batch[2:] 89 | preds_ = self.predict_on_batch(sess, *batch) 90 | preds += list(preds_) 91 | prog.update(i + 1, []) 92 | return self.consolidate_predictions(inputs_raw, inputs, preds) 93 | 94 | def fit(self, sess, saver, train_examples_raw, dev_set_raw): 95 | best_score = 0. 96 | 97 | train_examples = self.preprocess_sequence_data(train_examples_raw) 98 | dev_set = self.preprocess_sequence_data(dev_set_raw) 99 | 100 | for epoch in range(self.config.n_epochs): 101 | logger.info("Epoch %d out of %d", epoch + 1, self.config.n_epochs) 102 | # You may use the progress bar to monitor the training progress 103 | # Addition of progress bar will not be graded, but may help when debugging 104 | prog = Progbar(target=1 + int(len(train_examples) / self.config.batch_size)) 105 | 106 | # The general idea is to loop over minibatches from train_examples, and run train_on_batch inside the loop 107 | # Hint: train_examples could be a list containing the feature data and label data 108 | # Read the doc for utils.get_minibatches to find out how to use it. 
109 | # Note that get_minibatches could return either a list or a list of lists 110 | # [features, labels]. This makes expanding tuples into arguments (* operator) handy 111 | 112 | ### YOUR CODE HERE (2-3 lines) 113 | for batch in minibatches(train_examples, self.config.batch_size): 114 | if len(batch) == 2: 115 | features, labels = batch 116 | self.train_on_batch(sess, features, labels) 117 | elif len(batch) == 3: 118 | features, labels, masks = batch 119 | self.train_on_batch(sess, inputs_batch=features, labels_batch=labels, mask_batch=masks) 120 | else: 121 | raise ValueError("each minibatch has %d components, but only 2 or 3 are supported" % len(batch)) 122 | ### END YOUR CODE 123 | 124 | logger.info("Evaluating on development data") 125 | token_cm, entity_scores = self.evaluate(sess, dev_set, dev_set_raw) 126 | logger.debug("Token-level confusion matrix:\n" + token_cm.as_table()) 127 | logger.debug("Token-level scores:\n" + token_cm.summary()) 128 | logger.info("Entity level P/R/F1: %.2f/%.2f/%.2f", *entity_scores) 129 | 130 | score = entity_scores[-1] 131 | 132 | if score > best_score: 133 | best_score = score 134 | if saver: 135 | logger.info("New best score! Saving model in %s", self.config.model_output) 136 | saver.save(sess, self.config.model_output) 137 | print("") 138 | if self.report: 139 | self.report.log_epoch() 140 | self.report.save() 141 | return best_score 142 | -------------------------------------------------------------------------------- /NLP/assignment3/q2_rnn_cell.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Q2(c): Recurrent neural nets for NER 5 | """ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | 10 | import argparse 11 | import logging 12 | import sys 13 | 14 | import tensorflow as tf 15 | import numpy as np 16 | 17 | logger = logging.getLogger("hw3.q2.1") 18 | logger.setLevel(logging.DEBUG) 19 | logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG) 20 | 21 | class RNNCell(tf.nn.rnn_cell.RNNCell): 22 | """Wrapper around our RNN cell implementation that allows us to play 23 | nicely with TensorFlow. 24 | """ 25 | def __init__(self, input_size, state_size): 26 | self.input_size = input_size 27 | self._state_size = state_size 28 | 29 | @property 30 | def state_size(self): 31 | return self._state_size 32 | 33 | @property 34 | def output_size(self): 35 | return self._state_size 36 | 37 | def __call__(self, inputs, state, scope=None): 38 | """Updates the state using the previous @state and @inputs. 39 | Remember the RNN equations are: 40 | 41 | h_t = sigmoid(x_t W_x + h_{t-1} W_h + b) 42 | 43 | TODO: In the code below, implement an RNN cell using @inputs 44 | (x_t above) and the state (h_{t-1} above). 45 | - Define W_x, W_h, b to be variables of the appropriate shape 46 | using the `tf.get_variable' functions. Make sure you use 47 | the names "W_x", "W_h" and "b"! 48 | - Compute @new_state (h_t) defined above 49 | Tips: 50 | - Remember to initialize your matrices using the xavier 51 | initialization as before. 52 | Args: 53 | inputs: is the input vector of size [None, self.input_size] 54 | state: is the previous state vector of size [None, self.state_size] 55 | scope: is the name of the scope to be used when defining the variables inside. 56 | Returns: 57 | a pair of the output vector and the new state vector. 
58 | """ 59 | scope = scope or type(self).__name__ 60 | 61 | # It's always a good idea to scope variables in functions lest they 62 | # be defined elsewhere! 63 | with tf.variable_scope(scope): 64 | ### YOUR CODE HERE (~6-10 lines) 65 | W_x = tf.get_variable(name="W_x", shape=(self.input_size,self.state_size),initializer=tf.contrib.layers.xavier_initializer()) 66 | b = tf.get_variable(name='b', shape=(self.state_size), initializer=tf.constant_initializer(0)) 67 | W_h = tf.get_variable(name='W_h', shape=(self.state_size,self.output_size), initializer=tf.contrib.layers.xavier_initializer()) 68 | z_t=tf.add(tf.matmul(inputs,W_x)+tf.matmul(state,W_h),b,name='z_t') 69 | new_state=tf.nn.sigmoid(z_t, name='new_state') 70 | ### END YOUR CODE ### 71 | # For an RNN , the output and state are the same (N.B. this 72 | # isn't true for an LSTM, though we aren't using one of those in 73 | # our assignment) 74 | output = new_state 75 | return output, new_state 76 | 77 | def test_rnn_cell(): 78 | with tf.Graph().as_default(): 79 | with tf.variable_scope("test_rnn_cell"): 80 | x_placeholder = tf.placeholder(tf.float32, shape=(None,3)) 81 | h_placeholder = tf.placeholder(tf.float32, shape=(None,2)) 82 | 83 | with tf.variable_scope("rnn"): 84 | tf.get_variable("W_x", initializer=np.array(np.eye(3,2), dtype=np.float32)) 85 | tf.get_variable("W_h", initializer=np.array(np.eye(2,2), dtype=np.float32)) 86 | tf.get_variable("b", initializer=np.array(np.ones(2), dtype=np.float32)) 87 | 88 | tf.get_variable_scope().reuse_variables() 89 | cell = RNNCell(3, 2) 90 | y_var, ht_var = cell(x_placeholder, h_placeholder, scope="rnn") 91 | 92 | init = tf.global_variables_initializer() 93 | with tf.Session() as session: 94 | session.run(init) 95 | x = np.array([ 96 | [0.4, 0.5, 0.6], 97 | [0.3, -0.2, -0.1]], dtype=np.float32) 98 | h = np.array([ 99 | [0.2, 0.5], 100 | [-0.3, -0.3]], dtype=np.float32) 101 | y = np.array([ 102 | [0.832, 0.881], 103 | [0.731, 0.622]], dtype=np.float32) 104 | ht = y 105 | 106 | y_, ht_ = session.run([y_var, ht_var], feed_dict={x_placeholder: x, h_placeholder: h}) 107 | print("y_ = " + str(y_)) 108 | print("ht_ = " + str(ht_)) 109 | 110 | assert np.allclose(y_, ht_), "output and state should be equal." 111 | assert np.allclose(ht, ht_, atol=1e-2), "new state vector does not seem to be correct." 
112 | 113 | def do_test(_): 114 | logger.info("Testing rnn_cell") 115 | test_rnn_cell() 116 | logger.info("Passed!") 117 | 118 | if __name__ == "__main__": 119 | parser = argparse.ArgumentParser(description='Tests the RNN cell implemented as part of Q2 of Homework 3') 120 | subparsers = parser.add_subparsers() 121 | 122 | command_parser = subparsers.add_parser('test', help='') 123 | command_parser.set_defaults(func=do_test) 124 | 125 | ARGS = parser.parse_args() 126 | if ARGS.func is None: 127 | parser.print_help() 128 | sys.exit(1) 129 | else: 130 | ARGS.func(ARGS) 131 | -------------------------------------------------------------------------------- /NLP/assignment3/q3-clip-gru.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment3/q3-clip-gru.png -------------------------------------------------------------------------------- /NLP/assignment3/q3-clip-rnn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment3/q3-clip-rnn.png -------------------------------------------------------------------------------- /NLP/assignment3/q3-noclip-gru.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/NLP/assignment3/q3-noclip-gru.png -------------------------------------------------------------------------------- /NLP/assignment3/q3_gru_cell.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | """ 4 | Q3(d): Grooving with GRUs 5 | """ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | 10 | import argparse 11 | import logging 12 | import sys 13 | 14 | import tensorflow as tf 15 | import numpy as np 16 | 17 | logger = logging.getLogger("hw3.q3.1") 18 | logger.setLevel(logging.DEBUG) 19 | logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG) 20 | 21 | class GRUCell(tf.nn.rnn_cell.RNNCell): 22 | """Wrapper around our GRU cell implementation that allows us to play 23 | nicely with TensorFlow. 24 | """ 25 | def __init__(self, input_size, state_size): 26 | self.input_size = input_size 27 | self._state_size = state_size 28 | 29 | @property 30 | def state_size(self): 31 | return self._state_size 32 | 33 | @property 34 | def output_size(self): 35 | return self._state_size 36 | 37 | def __call__(self, inputs, state, scope=None): 38 | """Updates the state using the previous @state and @inputs. 39 | Remember the GRU equations are: 40 | 41 | z_t = sigmoid(x_t W_z + h_{t-1} U_z + b_z) 42 | r_t = sigmoid(x_t W_r + h_{t-1} U_r + b_r) 43 | o_t = tanh(x_t W_o + r_t * h_{t-1} U_o + b_o) 44 | h_t = z_t * h_{t-1} + (1 - z_t) * o_t 45 | 46 | TODO: In the code below, implement an GRU cell using @inputs 47 | (x_t above) and the state (h_{t-1} above). 48 | - Define U_r, W_r, b_r, U_z, W_z, b_z and U_o, W_o, b_o to 49 | be variables of the apporiate shape using the 50 | `tf.get_variable' functions. 51 | - Compute z, r, o and @new_state (h_t) defined above 52 | Tips: 53 | - Remember to initialize your matrices using the xavier 54 | initialization as before. 
55 | Args: 56 | inputs: is the input vector of size [None, self.input_size] 57 | state: is the previous state vector of size [None, self.state_size] 58 | scope: is the name of the scope to be used when defining the variables inside. 59 | Returns: 60 | a pair of the output vector and the new state vector. 61 | """ 62 | scope = scope or type(self).__name__ 63 | 64 | # It's always a good idea to scope variables in functions lest they 65 | # be defined elsewhere! 66 | with tf.variable_scope(scope): 67 | ### YOUR CODE HERE (~20-30 lines) 68 | W_z=tf.get_variable(name="W_z", shape=(self.input_size, self.output_size), initializer=tf.contrib.layers.xavier_initializer()) 69 | U_z=tf.get_variable(name="U_z", shape=(self.output_size, self.output_size), initializer=tf.contrib.layers.xavier_initializer()) 70 | b_z=tf.get_variable(name='b_z', shape=(self.output_size)) 71 | W_r=tf.get_variable(name="W_r", shape=(self.input_size, self.output_size), initializer=tf.contrib.layers.xavier_initializer()) 72 | U_r=tf.get_variable(name="U_r", shape=(self.output_size, self.output_size), initializer=tf.contrib.layers.xavier_initializer()) 73 | b_r=tf.get_variable(name='b_r', shape=(self.output_size)) 74 | W_o=tf.get_variable(name="W_o", shape=(self.input_size, self.output_size), initializer=tf.contrib.layers.xavier_initializer()) 75 | U_o=tf.get_variable(name="U_o", shape=(self.output_size, self.output_size), initializer=tf.contrib.layers.xavier_initializer()) 76 | b_o=tf.get_variable(name='b_o', shape=(self.output_size)) 77 | z_t=tf.nn.sigmoid(tf.add(tf.matmul(inputs,W_z)+tf.matmul(state,U_z),b_z),name='z_t') 78 | r_t=tf.nn.sigmoid(tf.add(tf.matmul(inputs,W_r)+tf.matmul(state,U_r),b_r),name='r_t') 79 | o_t=tf.nn.tanh(tf.add(tf.matmul(inputs,W_o)+r_t*tf.matmul(state,U_o),b_o),name='o_t') 80 | new_state=z_t*state+(1-z_t)*o_t 81 | ### END YOUR CODE ### 82 | # For a GRU, the output and state are the same (N.B. 
this isn't true 83 | # for an LSTM, though we aren't using one of those in our 84 | # assignment) 85 | output = new_state 86 | return output, new_state 87 | 88 | def test_gru_cell(): 89 | with tf.Graph().as_default(): 90 | with tf.variable_scope("test_gru_cell"): 91 | x_placeholder = tf.placeholder(tf.float32, shape=(None,3)) 92 | h_placeholder = tf.placeholder(tf.float32, shape=(None,2)) 93 | 94 | with tf.variable_scope("gru"): 95 | tf.get_variable("W_r", initializer=np.array(np.eye(3,2), dtype=np.float32)) 96 | tf.get_variable("U_r", initializer=np.array(np.eye(2,2), dtype=np.float32)) 97 | tf.get_variable("b_r", initializer=np.array(np.ones(2), dtype=np.float32)) 98 | tf.get_variable("W_z", initializer=np.array(np.eye(3,2), dtype=np.float32)) 99 | tf.get_variable("U_z", initializer=np.array(np.eye(2,2), dtype=np.float32)) 100 | tf.get_variable("b_z", initializer=np.array(np.ones(2), dtype=np.float32)) 101 | tf.get_variable("W_o", initializer=np.array(np.eye(3,2), dtype=np.float32)) 102 | tf.get_variable("U_o", initializer=np.array(np.eye(2,2), dtype=np.float32)) 103 | tf.get_variable("b_o", initializer=np.array(np.ones(2), dtype=np.float32)) 104 | 105 | tf.get_variable_scope().reuse_variables() 106 | cell = GRUCell(3, 2) 107 | y_var, ht_var = cell(x_placeholder, h_placeholder, scope="gru") 108 | 109 | init = tf.global_variables_initializer() 110 | with tf.Session() as session: 111 | session.run(init) 112 | x = np.array([ 113 | [0.4, 0.5, 0.6], 114 | [0.3, -0.2, -0.1]], dtype=np.float32) 115 | h = np.array([ 116 | [0.2, 0.5], 117 | [-0.3, -0.3]], dtype=np.float32) 118 | y = np.array([ 119 | [ 0.320, 0.555], 120 | [-0.006, 0.020]], dtype=np.float32) 121 | ht = y 122 | 123 | y_, ht_ = session.run([y_var, ht_var], feed_dict={x_placeholder: x, h_placeholder: h}) 124 | print("y_ = " + str(y_)) 125 | print("ht_ = " + str(ht_)) 126 | 127 | assert np.allclose(y_, ht_), "output and state should be equal." 128 | assert np.allclose(ht, ht_, atol=1e-2), "new state vector does not seem to be correct." 
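# A minimal NumPy-only sketch of the same update, handy for verifying the expected
# values in test_gru_cell above without starting a TensorFlow session. It assumes
# the shared identity/ones initializers used in that test; the helper names
# numpy_gru_step and test_gru_cell_numpy are illustrative and are not part of the
# assignment starter code.
def numpy_gru_step(x, h, W_z, U_z, b_z, W_r, U_r, b_r, W_o, U_o, b_o):
    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x.dot(W_z) + h.dot(U_z) + b_z)      # update gate
    r = sigmoid(x.dot(W_r) + h.dot(U_r) + b_r)      # reset gate
    o = np.tanh(x.dot(W_o) + r * h.dot(U_o) + b_o)  # candidate state
    return z * h + (1.0 - z) * o                    # h_t

def test_gru_cell_numpy():
    W = np.eye(3, 2, dtype=np.float32)   # W_z, W_r, W_o all use eye(3, 2) in test_gru_cell
    U = np.eye(2, dtype=np.float32)
    b = np.ones(2, dtype=np.float32)
    x = np.array([[0.4, 0.5, 0.6], [0.3, -0.2, -0.1]], dtype=np.float32)
    h = np.array([[0.2, 0.5], [-0.3, -0.3]], dtype=np.float32)
    expected = np.array([[0.320, 0.555], [-0.006, 0.020]], dtype=np.float32)
    assert np.allclose(numpy_gru_step(x, h, W, U, b, W, U, b, W, U, b), expected, atol=1e-2)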
129 | 130 | def do_test(_): 131 | logger.info("Testing gru_cell") 132 | test_gru_cell() 133 | logger.info("Passed!") 134 | 135 | if __name__ == "__main__": 136 | parser = argparse.ArgumentParser(description='Tests the GRU cell implemented as part of Q3 of Homework 3') 137 | subparsers = parser.add_subparsers() 138 | 139 | command_parser = subparsers.add_parser('test', help='') 140 | command_parser.set_defaults(func=do_test) 141 | 142 | ARGS = parser.parse_args() 143 | if ARGS.func is None: 144 | parser.print_help() 145 | sys.exit(1) 146 | else: 147 | ARGS.func(ARGS) 148 | -------------------------------------------------------------------------------- /NLP/assignment3/requirements.txt: -------------------------------------------------------------------------------- 1 | tensorflow>=0.12 2 | matplotlib 3 | -------------------------------------------------------------------------------- /Python/CME193/lec1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/lec1.pdf -------------------------------------------------------------------------------- /Python/CME193/lec2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/lec2.pdf -------------------------------------------------------------------------------- /Python/CME193/lec3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/lec3.pdf -------------------------------------------------------------------------------- /Python/CME193/lec4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/lec4.pdf -------------------------------------------------------------------------------- /Python/CME193/lec5.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/lec5.pdf -------------------------------------------------------------------------------- /Python/CME193/lec6.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/lec6.pdf -------------------------------------------------------------------------------- /Python/CME193/lec7.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/lec7.pdf -------------------------------------------------------------------------------- /Python/CME193/lec8.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/lec8.pdf -------------------------------------------------------------------------------- /Python/CME193/problemsets/Markov-chain-startercode-master.zip: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/problemsets/Markov-chain-startercode-master.zip -------------------------------------------------------------------------------- /Python/CME193/problemsets/Rock-paper-scissors-startercode-master.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/problemsets/Rock-paper-scissors-startercode-master.zip -------------------------------------------------------------------------------- /Python/CME193/problemsets/exercises.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/problemsets/exercises.pdf -------------------------------------------------------------------------------- /Python/CME193/problemsets/hangman-master.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bayeslabs/AiGym/205350c311f0e0981fe90eae84586c9bd4a9cfef/Python/CME193/problemsets/hangman-master.zip -------------------------------------------------------------------------------- /Python/README.md: -------------------------------------------------------------------------------- 1 | 8 | 9 | 10 | 11 | # Python 12 | 13 | ## Course List 14 | **S.No** | **Course Title** | **Link to course** | **Link to Assignment Solutions** 15 | ------------ | ------------- | --------- | ------------- 16 | [1](#1-scientific-python) | Scientific Python | http://web.stanford.edu/class/cme193/ | [CME193 Solutions](https://github.com/icme/cme193) 17 | [2](#2-introduction-to-computer-science-and-programming-in-python) | Introduction to Computer Science and Programming in Python | https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/ | [CS6.0001 Solutions](https://github.com/tuthang102/MIT-6.0001-Intro-to-CS) 18 | 19 | 20 | ## Course Details 21 | ### 1. Scientific Python 22 | * **Link to course**            :     http://web.stanford.edu/class/cme193/index.html 23 | * **Offered By**                  :     Stanford 24 | * **Pre-Requisites**           :     Basic Programming Knowledge 25 | 26 | * **Level**                           :     Beginner 27 | * **Course description** 28 | This course is recommended for students who are familiar with programming at least at the level of CS106A and want to translate their programming knowledge to Python with the goal of becoming proficient in the scientific computing and data science stack. Lectures will be interactive with a focus on real world applications of scientific computing. Technologies covered include Numpy, SciPy, Pandas, Scikit-learn, and others. Topics will be chosen from Linear Algebra, Optimization, Machine Learning, and Data Science. Prior knowledge of programming will be assumed, and some familiarity with Python is helpful, but not mandatory. 29 | 30 | 31 | ### 2. 
Introduction to Computer Science and Programming in Python 32 | * **Link to course**            :     https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/ 33 | * **Offered By**                  :     MIT 34 | * **Pre-Requisites**           :     None 35 | * **Level**                           :     Intermediate 36 | * **Course description** 37 | Introduction to Computer Science and Programming in Python is intended for students with little or no programming experience. It aims to provide students with an understanding of the role computation can play in solving problems and to help students, regardless of their major, feel justifiably confident of their ability to write small programs that allow them to accomplish useful goals. The class uses the Python 3.5 programming language. 38 | 39 | 40 | #### Happy Learning   :thumbsup: :memo: 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | -------------------------------------------------------------------------------- /Readme.md: -------------------------------------------------------------------------------- 1 | In this repository we maintain the best AI materials to learn and apply across different fields. Most of these courses are Python-based. We hope you enjoy our curation. 2 | -------------------------------------------------------------------------------- /Speech/Readme.md: -------------------------------------------------------------------------------- 1 | For speech processing we will be following the Stanford CS224s course. 2 | 3 | http://web.stanford.edu/class/cs224s/syllabus.html 4 | --------------------------------------------------------------------------------