├── figures
│   ├── fig2018027_EM.png
│   ├── fig20180407_EM.png
│   ├── fig20180320_monge_matrix.png
│   └── fig20180320_master_theorem.png
├── README.md
├── ML_ProblemSet.ipynb
├── Summary_Algo.ipynb
├── Summary_Basic_compact.ipynb
├── Summary_Basic_Lite.ipynb
├── Summary_Basic.ipynb
├── Summary_Algo_Backup.ipynb
├── Summary_Algo_compact.ipynb
└── ProblemSet.ipynb
/figures/fig2018027_EM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangchuheng123/IIIS-preliminary/HEAD/figures/fig2018027_EM.png
--------------------------------------------------------------------------------
/figures/fig20180407_EM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangchuheng123/IIIS-preliminary/HEAD/figures/fig20180407_EM.png
--------------------------------------------------------------------------------
/figures/fig20180320_monge_matrix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangchuheng123/IIIS-preliminary/HEAD/figures/fig20180320_monge_matrix.png
--------------------------------------------------------------------------------
/figures/fig20180320_master_theorem.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zhangchuheng123/IIIS-preliminary/HEAD/figures/fig20180320_master_theorem.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# IIIS-preliminary

## Description

This is a summary (cheat sheet) for the preliminary exam of IIIS, THU. The exam mainly covers undergraduate-level algorithms, AI, and one sub-area (here, machine learning).

## Scope of the Exam

### General requirements (60%):

1. Algorithms (please refer to the requirements for the Theory Qualifying Exam, Algorithms 1-12)
2. Solving Problems by Searching (AI Book Chapter 3)
3. Beyond Classical Search (AI Book Chapter 4)
4. Basic Machine Learning (https://www.coursera.org/learn/machine-learning)
5. Basic Deep Learning (http://cs231n.stanford.edu/syllabus.html, lectures 1-10)

### Sub-area #1 (40%): Machine Learning and Deep Learning

The rest of http://cs231n.stanford.edu/syllabus.html

Other things you should know:
1. PCA (FML Ch 12.1)
1. Random Forest (ESL Ch 15)
1. AdaBoost (FML Ch 6)
1. Gradient Boosting / Additive Models (ESL Ch 9 and 10)
1. Clustering and unsupervised learning (ESL Ch 14.3)
1. Bias-Variance Decomposition, Cross-validation (ESL Ch 7)
1. PAC learning (FML Ch 2)
1. Online learning (FML Ch 7)
1. VC dimension (FML Ch 3)
1. Graphical models: Bayesian networks and undirected graphical models (PRML Ch 8)
1. Optional: Sampling and MCMC (PRML Ch 11)
1. Optional: Reinforcement Learning (FML Ch 14)
1. Optional: Deep learning for NLP (http://web.stanford.edu/class/cs224n/)

### Reference Books

* [AI Book] Artificial Intelligence: A Modern Approach. Third Edition. Stuart Russell and Peter Norvig.
* [PRML] Pattern Recognition and Machine Learning. Christopher Bishop.
* [FML] Foundations of Machine Learning. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar.
* [ESL] The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Trevor Hastie, Robert Tibshirani, Jerome Friedman. 2008. 42 | -------------------------------------------------------------------------------- /ML_ProblemSet.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## 1. Bias-Variance Decomposition and Cross Validation \n", 8 | "\n", 9 | "1. 将期望泛化误差分解为bias,variance和noise\n", 10 | "2. k-fold CV中k取值越大,bias,variance如何变化?LOO相对于k-fold有什么优劣?\n", 11 | "\n", 12 | "## 2. Linear Regression\n", 13 | "\n", 14 | "1. 一元线性回归模型$f(x) = w^T x + b$,通过最小化均方误差推导参数的闭式解\n", 15 | "2. 多元线性回归模型$f(X) = W^T X$,通过最小化均方误差推导参数的闭式解\n", 16 | "2. 证明对线性回归模型做牛顿法的更新可以得到其最优解\n", 17 | "3. 线性回归的概率模型$y|x,w \\sim \\mathcal{N}(wx, \\sigma^2)$,通过最大化似然函数求闭式解\n", 18 | "\n", 19 | "## 3. Logistic Regression\n", 20 | "\n", 21 | "1. 假设$\\ln \\dfrac{p(y=1|x)}{p(y=0|x)} = w^T x + b$,写出在一个数据集上的log-likelihood function\n", 22 | "2. 通过最大化likelihood function来推导其梯度下降算法的公式\n", 23 | "\n", 24 | "## 4. Decision Tree\n", 25 | "\n", 26 | "1. 给定一个数据集,基于信息增益找到对于数据集进行划分的特征(计算题)\n", 27 | "1. 推导gradient boosting tree每个节点上的增益计算公式和权重更新公式\n", 28 | "\n", 29 | "## 5. Neural Network\n", 30 | "\n", 31 | "推导BP算法公式\n", 32 | "\n", 33 | "## 6. SVM\n", 34 | "\n", 35 | "1. 给出线性可分SVM原问题,写出其对偶性形式;样本点是支持向量的条件\n", 36 | "2. 给出线性不可分(soft-margin)SVM原问题,写出其对偶形式;样本点在margin上/内/外的条件\n", 37 | "3. 简要概述一种有效解决SVM的算法(SMO)\n", 38 | "4. 写出核化SVM以及核化线性回归的形式\n", 39 | "\n", 40 | "## 7. MAP and ML\n", 41 | "\n", 42 | "1. 给定一个样本序列,用MAP和ML求相应的参数\n", 43 | "2. 如果使用full Bayesian方法应该怎么求?Full Bayesian方法有什么缺点?\n", 44 | "\n", 45 | "## 8. Bayesian Network \n", 46 | "\n", 47 | "1. 给定一个贝叶斯网络,给定某个随机变量的观测值,求另一个随机变量的概率分布\n", 48 | "2. 给定一个贝叶斯网络,当某个随机变量给定的时候求另外两个随机变量是否独立(可以公式写出说明,也可以使用moralization方法)\n", 49 | "3. 给定一个贝叶斯网络,给定证据变量和待查询变量,用Gibbs sampling求待查询变量的概率分布;根据Markov Blanket求解待改变变量的概率分布\n", 50 | "\n", 51 | "## 9. EM Algorithm\n", 52 | "\n", 53 | "1. 通过对$J(c, \\mu) = \\sum_{i=1}^N ||x^{(i)} - \\mu_{c^{(i)}}||^2$做coordinate descent,推导k-mean参数更新公式\n", 54 | "1. 推导Gaussian mixture model的EM算法迭代公式;GMM与k-means有什么关系?\n", 55 | "2. 推导一般问题的EM算法迭代公式\n", 56 | "3. 证明EM算法的收敛性\n", 57 | "\n", 58 | "## 10. Ensemble\n", 59 | "\n", 60 | "1. 证明相互独立的N个分类器进行majority voting,错误率随N指数下降(Hoeffding不等式)\n", 61 | "2. 【AdaBoost】简述什么是additive model;通过最小化指数损失函数$L = \\mathbb{E}_{x\\sim D}[e^{-y f(x)}]$推导AdaBoost的样本权重更新公式\n", 62 | "3. 证明bagging能够减小variance;简述随机森林模型\n", 63 | "\n", 64 | "## 11. Clustering\n", 65 | "\n", 66 | "层次聚类(hierarchical clustering)的几种距离度量有什么优劣(min,max,mean)?层次聚类结果的表示方法(dendrogram);相比于k-mean的优劣。\n", 67 | "\n", 68 | "## 12. kNN\n", 69 | "\n", 70 | "假设对于任意x和小正数e,在x附近e距离范围内都能找到一个训练样本,证明kNN的泛化错误率不超过贝叶斯最优分类器错误率的两倍\n", 71 | "\n", 72 | "## 13. PCA\n", 73 | "\n", 74 | "1. 说明PCA特征值分解的方法等价于找到某个向低维空间的投影,使得各个样本的重构误差最小\n", 75 | "1. 说明PCA特征值分解的方法等价于找到某个向低维空间的投影,使得各个样本投影的方差最大\n", 76 | "\n", 77 | "## 14. PAC and VC Dimension\n", 78 | "\n", 79 | "1. 证明样本数足够大的时候,任意一个假设的在样本上的经验误差与泛化误差大概率接近;\n", 80 | "2. 证明样本数足够大的时候,通过最小化经验误差得到假设的泛化误差与假设空间中最小泛化误差大概率接近;\n", 81 | "3. 求假设的VC维\n", 82 | " - $h(x) = I(a < x)$\n", 83 | " - $h(x) = I(a < x < b)$\n", 84 | " - $h(x) = I(a\\sin x > 0)$\n", 85 | " - $h(x) = I(\\sin(x + a) > 0)$\n", 86 | " - $h(x) = I(\\sin(ax) > 0)$\n", 87 | "\n", 88 | "## 15. Probabilistic Graphical Model\n", 89 | "\n", 90 | "1. 写出隐马可夫模型的联合概率分布,并使用该公式计算某个序列的概率\n", 91 | "2. 给定一个马可夫条件场,以及相应的势函数,计算联合概率和条件概率\n", 92 | "\n", 93 | "## 16. Searching\n", 94 | "\n", 95 | "1. BFS、DFS、A\\*算法complete和optimal的条件,它们的时间和空间复杂度\n", 96 | "2. 
证明满足$h(n) \\ge c(n, n') + h(n')$的A\\*算法能找到最优解\n", 97 | "3. 简述模拟退火和遗传算法\n", 98 | "\n", 99 | "## 17. Guassian Discriminate Analysis\n", 100 | "\n", 101 | "$$\n", 102 | "\\begin{aligned}\n", 103 | "y & \\sim Bernoulli(\\Phi) = p(y) = \\Phi^y (1-\\Phi)^{1-y} \\\\\n", 104 | "x|y=0 & \\sim N(\\mu_0, \\Sigma) = \\dfrac{1}{(2\\pi)^{n/2} |\\Sigma|^{1/2}} \\exp(-\\dfrac{1}{2}(x-\\mu_0)^T \\Sigma^{-1} (x-\\mu_0)) \\\\\n", 105 | "x|y=1 & \\sim N(\\mu_1, \\Sigma)\n", 106 | "\\end{aligned}\n", 107 | "$$\n", 108 | "\n", 109 | "通过最大化以上模型的似然函数求其参数估计\n", 110 | "\n", 111 | "## 18. Naive Bayes\n", 112 | "\n", 113 | "1. 对于垃圾邮件判别任务写出如何利用朴素贝叶斯模型来做判断\n", 114 | "2. 通过最大化似然函数来推导参数估计公式\n", 115 | "\n", 116 | "## 19. Online Learning\n", 117 | "\n", 118 | "证明perceptron algorithm所犯错误次数有上界\n" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": { 125 | "collapsed": true 126 | }, 127 | "outputs": [], 128 | "source": [] 129 | } 130 | ], 131 | "metadata": { 132 | "kernelspec": { 133 | "display_name": "Python 3", 134 | "language": "python", 135 | "name": "python3" 136 | }, 137 | "language_info": { 138 | "codemirror_mode": { 139 | "name": "ipython", 140 | "version": 3 141 | }, 142 | "file_extension": ".py", 143 | "mimetype": "text/x-python", 144 | "name": "python", 145 | "nbconvert_exporter": "python", 146 | "pygments_lexer": "ipython3", 147 | "version": "3.4.2" 148 | } 149 | }, 150 | "nbformat": 4, 151 | "nbformat_minor": 2 152 | } 153 | -------------------------------------------------------------------------------- /Summary_Algo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Sort\n", 8 | "\n", 9 | "### Bubble Sort\n", 10 | "\n", 11 | "```\n", 12 | "for i = 1 to N-1:\n", 13 | " for j = 1 to N-i:\n", 14 | " if A[j] < A[j-1]:\n", 15 | " swap(A[j], A[j-1])\n", 16 | "```\n", 17 | "\n", 18 | "* 证明正确性:每个j循环之后,A[N-i:N]都变为有序的\n", 19 | "* 复杂度:不管什么情况都需要$O(n^2)$次判断,最好情况可以不做交换,平均和最坏需要$O(n^2)$次交换\n", 20 | "\n", 21 | "### Insertion Sort\n", 22 | "\n", 23 | "```\n", 24 | "for i = 2 to N:\n", 25 | " tmp = A[i]\n", 26 | " for j = i downto 2:\n", 27 | " if tmp >= A[j-1]:\n", 28 | " break\n", 29 | " A[j] = A[j-1]\n", 30 | " A[j] = tmp\n", 31 | "```\n", 32 | "\n", 33 | "* 证明正确性:每次j循环之后A[1:i]都变为有序的\n", 34 | "* 运行时间:$O(n^2)$\n", 35 | "\n", 36 | "### Heap Sort\n", 37 | "\n", 38 | "* 最大堆:一颗二叉树,每个父节点都比自己的子节点大\n", 39 | "* 一个最大堆,根节点i被修改了,通过max_heapify可以使得其在$O(\\log n)$时间内维护成最大堆\n", 40 | "\n", 41 | "```\n", 42 | "def max_heapify(A, i):\n", 43 | " l = left(i)\n", 44 | " r = right(i)\n", 45 | " if l <= heapsize and A[l] > A[i]:\n", 46 | " largest = l\n", 47 | " else:\n", 48 | " largest = i\n", 49 | " if r <= heapsize and A[r] > A[largest]\n", 50 | " largest = r\n", 51 | " if largest != i\n", 52 | " swap(A[i], A[largest])\n", 53 | " max_heapify(A, largest)\n", 54 | "```\n", 55 | "\n", 56 | "* 思路:先建立一个堆,由最大堆的性质可以知道,堆的根是最大的元素,把根和最后一个叶子互换,然后把那个叶子从堆中取走,再维护成一个最大堆。如此重复,可以得到排序\n", 57 | "\n", 58 | "```\n", 59 | "def build_heap(A):\n", 60 | " headsize(A) = len(A)\n", 61 | " for i = floor(len(A) / 2) downto 1:\n", 62 | " max_heapify(A, i)\n", 63 | "```\n", 64 | "\n", 65 | "```\n", 66 | "def heap_sort(A)\n", 67 | " build_heap(A)\n", 68 | " for i = N downto 2:\n", 69 | " swap(A[1], A[i])\n", 70 | " heapsize(A) -= 1\n", 71 | " max_heapify(A, 1)\n", 72 | "```\n", 73 | "\n", 74 | "## Basic Graph\n", 75 | "\n", 76 | "### BFS\n", 77 | "\n", 78 | "```\n", 79 | "def BSF(G, s):\n", 80 | " 
"## Basic Graph\n",
"\n",
"### BFS\n",
"\n",
"```\n",
"def BFS(G, s):\n",
"    for each u in G:\n",
"        color[u] = WHITE\n",
"    color[s] = GRAY\n",
"    Q = [s]\n",
"    while Q:\n",
"        u = Q.pop(0)    # FIFO: take from the front of the queue\n",
"        for each v adj to u:\n",
"            if color[v] == WHITE:\n",
"                color[v] = GRAY\n",
"                Q.append(v)\n",
"        color[u] = BLACK\n",
"```\n",
"\n",
"### DFS\n",
"\n",
"```\n",
"def DFS(G):\n",
"    for each u in G:\n",
"        color[u] = WHITE\n",
"    for each u in G:\n",
"        if color[u] == WHITE:\n",
"            DFS_visit(u)\n",
"\n",
"def DFS_visit(u):\n",
"    color[u] = GRAY\n",
"    for each v adj to u:\n",
"        if color[v] == WHITE:\n",
"            DFS_visit(v)\n",
"    color[u] = BLACK\n",
"```\n",
"\n",
"### Topological Order\n",
"\n",
"* 性质:在有向无环图中,存在一种排序使得每条边都往后指\n",
"* 运行DFS,并且按照每个节点被标记为黑色的时间从晚到早排序,就是拓扑排序;只有某个节点的后继都标黑了它才可能标黑,所以标黑时间靠后的排在前面。\n",
"\n",
"### Strongly Connected Components\n",
"\n",
"* 定义:一个图中的一个节点集合,任意两个节点都可以互通,就叫做强连通分支;问题是给定一个图,找出其强连通分支的分解\n",
"* 图的转置:就是把边都翻过来 $G^T=(V, E^T)$,$E^T = \\{(u,v) | (v, u)\\in E\\}$\n",
"\n",
"```\n",
"def strongly_connected_components(G):\n",
"    DFS(G), record the turn-black (finish) time f(u) for all u\n",
"    DFS(G^T), visiting vertices in decreasing order of f(u), and record the forest\n",
"    each tree in the forest is a strongly connected component\n",
"```\n",
"\n",
"## Matching\n",
"\n",
"* 定义:图$G(V,E)$中的一个边集$M\\subseteq E$,每个结点都至多出现在M的一条边上,称M为匹配。如果每个结点恰好出现在M的一条边上,称为完美匹配。\n",
"    * matching: 不共点的边集M\n",
"    * matching number: 不共点边集的边的数量\n",
"    * maximum matching: matching number最大的匹配\n",
"    * perfect matching: 能够覆盖所有点的匹配\n",
"    * M-alternating path: $G=(V, E)$中一条交替出现在$M$和$E\\setminus M$中的路径\n",
"    * M-augmenting path: 两个端点都没有被M覆盖的M-alternating path\n",
"\n",
"### Hungarian Algorithm\n",
"\n",
"```\n",
"start from any matching M\n",
"if M is a perfect matching:\n",
"    return M\n",
"x0 = an exposed vertex in X\n",
"A = {x0}\n",
"B = {}\n",
"if N(A) == B:\n",
"    return NO_PERFECT_MATCHING\n",
"    # because |N(A)| = |B| = |A| - 1 < |A|, violating Hall's theorem\n",
"else:\n",
"    y1 = a vertex in N(A) but not in B\n",
"    if y1 is covered by M:\n",
"        B += {y1}\n",
"        A += {x1: (y1, x1) in M}\n",
"        goto 'if N(A) == B'\n",
"    else:\n",
"        P = the alternating path from x0 to y1 (an M-augmenting path)\n",
"        replace M by M' = M XOR P (flip the edges along P)\n",
"        goto 'if M is a perfect matching'\n",
"```\n",
"\n",
"### Hall's Theorem\n",
"\n",
"* 动机:如何在不求出最大流的情况下,找出一个证据说某二分图不存在完美匹配,即最大流小于n。根据最大流最小割定理,如果能够找到一个割容量小于n,即可说明。ps. 当然如果直接求出最大流,也能知道是否存在完美匹配。\n",
"* 描述:对于一个子集$A\\subseteq X$,用$\\Gamma(A)\\subseteq Y$表示与A中节点邻接的节点集合,二部图$(X,Y)$中存在覆盖X中所有点的匹配,当且仅当对于所有的$A\\subseteq X$,都有$|\\Gamma(A)|\\ge |A|$。\n",
"\n",
"## Linear Programming\n",
"\n",
"线性规划标准型表示:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"& \\max\\ {\\bf C}^T {\\bf x} \\\\\n",
"& s.t. {\\bf Ax} \\le {\\bf b} \\\\\n",
"& \\quad \\quad {\\bf x} \\ge 0\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"LP都可以转化为标准型。特别地,如果某元素$x_j$不满足$x_j \\ge 0$,可令$x'_j,x''_j \\ge 0$,代换$x_j = x'_j - x''_j$\n",
"\n",
"线性规划松弛型表示,相关的不等式约束都变成等式约束,只剩下关于单个元素的不等式约束:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"& \\max\\ z = v + \\sum_{j\\in N}c_j x_j \\\\\n",
"& s.t. 
x_i = b_i - \\sum_{j \\in N} a_{ij}x_j, \\, \\forall i \\in B \\\\\n", 188 | "& \\quad \\quad x_i \\ge 0, \\, \\forall i \\in B\\cup N\n", 189 | "\\end{aligned}\n", 190 | "$$\n", 191 | "\n", 192 | "\n", 193 | "### Simplex\n", 194 | "\n", 195 | "* 基本解:考虑松弛型表示,令第一个约束右边变量(非基本变量)都为0,得到的一组解\n", 196 | "* 基本可行解:如果基本解也满足第二个约束,则是基本可行解\n", 197 | "* 单纯形法思路:先找一个基本可行解,然后每次都考虑变化一个最优化表达式中的非基本变量,使其刚好不违反约束而使得目标最优,然后把这个非基本变量$x_e$代换成一个基本变量$x_l$(pivot),然后重复进行\n", 198 | "\n", 199 | "```\n", 200 | "def pivot(N, B, A, b, c, v, l, e):\n", 201 | " b_new[e] = b[l] / a[l,e]\n", 202 | " for each j in N-{e}:\n", 203 | " a_new[e,j] = a[l,j] / a[l,e]\n", 204 | " a_new[e,l] = 1/a[l,e]\n", 205 | " \n", 206 | " for each i in B-{l}:\n", 207 | " b_new[i] = b[i] - a[i,e] * b_new[e]\n", 208 | " for each j in N-{e}:\n", 209 | " a_new[i,j] = a[i,j] - a[i,e] * a_new[e,j]\n", 210 | " a_new[i,l] = - a[i,e] a_new[e,l]\n", 211 | " \n", 212 | " v_new = v + c[e] * b_new[e]\n", 213 | " for each j in N-{e}:\n", 214 | " c_new[j] = c[j] - c[e] * a_new[e,j]\n", 215 | " c_new[l] = - c[e] * a_new[e,l]\n", 216 | " \n", 217 | " N_new = N - {e, l}\n", 218 | " B_new = B - {e, l}\n", 219 | " \n", 220 | " return N_new, B_new, A_new, b_new, c_new, v_new\n", 221 | "```\n", 222 | "\n", 223 | "其中需要一个程序initialize_simplex来找到第一个基本可行解。\n", 224 | "\n", 225 | "```\n", 226 | "def simplex(A, b, c):\n", 227 | " N, B, A, b, c, v = initialize_simplex(A, b, c)\n", 228 | " while some index e in N that c[e] > 0:\n", 229 | " for each i in B:\n", 230 | " if a[i, e] > 0:\n", 231 | " delta[i] = b[i] / a[i, e]\n", 232 | " else:\n", 233 | " delta[i] = inf\n", 234 | " choose l in B that minimize delta[i]\n", 235 | " if delta[l] == inf:\n", 236 | " return UNBOUNDED\n", 237 | " else:\n", 238 | " N, B, A, b, c, v = pivot(N, B, A, b, c, v, l, e)\n", 239 | " \n", 240 | " for i = 1 to n:\n", 241 | " if i in B:\n", 242 | " x[i] = b[i]\n", 243 | " else:\n", 244 | " x[i] = 0\n", 245 | " return x\n", 246 | "```" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": null, 252 | "metadata": { 253 | "collapsed": true 254 | }, 255 | "outputs": [], 256 | "source": [] 257 | } 258 | ], 259 | "metadata": { 260 | "kernelspec": { 261 | "display_name": "Python 3", 262 | "language": "python", 263 | "name": "python3" 264 | }, 265 | "language_info": { 266 | "codemirror_mode": { 267 | "name": "ipython", 268 | "version": 3 269 | }, 270 | "file_extension": ".py", 271 | "mimetype": "text/x-python", 272 | "name": "python", 273 | "nbconvert_exporter": "python", 274 | "pygments_lexer": "ipython3", 275 | "version": "3.4.2" 276 | } 277 | }, 278 | "nbformat": 4, 279 | "nbformat_minor": 2 280 | } 281 | -------------------------------------------------------------------------------- /Summary_Basic_compact.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Math\n", 8 | "\n", 9 | "$$\n", 10 | "\\newcommand{\\sumoversample}{\\sum_{i=1}^m}\n", 11 | "\\newcommand{\\samplexi}{x^{(1)}}\n", 12 | "\\newcommand{\\sampleyi}{y^{(1)}}\n", 13 | "$$\n", 14 | "\n", 15 | "* 【Union Bound】$P(A \\cup B) \\le P(A) + P(B)$\n", 16 | "* 【Chernoff Bound】$Z_1, \\cdots, Z_m \\in \\{0, 1\\}$ i.i.d. 
$p(z=1) = \\Phi, p(z=0) = 1 - \\Phi$, let $\\widehat{\\Phi} = \\dfrac{1}{m} \\sum_i Z_i$, we have \n", 17 | "$$P(|\\Phi - \\widehat{\\Phi}| > \\gamma) \\le 2\\exp (-2\\gamma^2m)$$\n", 18 | "* 【Jensen's Inequality】For convex function $f$ and random variable $X$, $\\mathbb{E}[f(X)] \\ge f(\\mathbb{E} X)$. Equality exists when $X = \\mathbb{E}X, \\ w.p.1$\n", 19 | "* 【Multivariant Gaussian Distribution】\n", 20 | "$$\n", 21 | "p(x;\\mu,\\Sigma) = \\dfrac{1}{(2\\pi)^{n/2} |\\Sigma|^{1/2}} \\exp(-\\dfrac{1}{2}(x-\\mu)^T \\Sigma^{-1} (x-\\mu))\n", 22 | "$$\n", 23 | "Derivatives:\n", 24 | "$$\n", 25 | "\\frac{ \\partial \\log p({\\boldsymbol x};{\\boldsymbol \\mu},{\\boldsymbol \\Sigma}) }{ \\partial {\\boldsymbol \\mu}}\n", 26 | "= {\\boldsymbol \\Sigma}^{-1} \\left( {\\boldsymbol x} - {\\boldsymbol \\mu} \\right) \n", 27 | "$$\n", 28 | "$$\n", 29 | "\\frac{ \\partial \\log p({\\boldsymbol x};{\\boldsymbol \\mu},{\\boldsymbol \\Sigma}) }{ \\partial {\\boldsymbol \\Sigma}}\n", 30 | "= \\frac{1}{2} \\left( \n", 31 | "{\\boldsymbol \\Sigma}^{-1} \n", 32 | "\\left( {\\boldsymbol x} - {\\boldsymbol \\mu} \\right)\n", 33 | "\\left( {\\boldsymbol x} - {\\boldsymbol \\mu} \\right)^T\n", 34 | "{\\boldsymbol \\Sigma}^{-1} \n", 35 | "- {\\boldsymbol \\Sigma}^{-1} \\right)\n", 36 | "$$\n", 37 | "Conditional distribution, $x_1|x_2 \\sim \\mathcal{N}(\\mu_{1|2}, \\Sigma_{1|2})$, wherre $\\mu_{1|2} = \\mu_1 + \\Sigma_{12} \\Sigma_{22}^{-1} (x_2 - \\mu_2)$ and $\\Sigma_{1|2} = \\Sigma_{11}- \\Sigma_{12}\\Sigma_{22}^{-1} \\Sigma_{21}$\n", 38 | "Marginal distribution, $x_1 \\sim \\mathcal{N}(\\mu_1, \\Sigma_{11})$\n", 39 | "\n", 40 | "* 【Matrix】$\\nabla_A tr(AB) = B^T$;$\\nabla_{A^T} f(A) = (\\nabla_A f(A))^T$;$\\nabla_A tr(ABA^TC) = CAB + C^T A B^T$;$\\nabla_A |A| = |A|(A^{-1})^T$\n", 41 | "\n", 42 | "* 【Bayes' Theorem】$P(A|B) = \\dfrac{P(B|A)P(A)}{P(B)}$\n", 43 | "\n", 44 | "* 【KL Divergence】$KL(P||Q) = -\\sum_i P(i) \\log \\dfrac{Q(i)}{P(i)}$\n", 45 | "\n", 46 | "* 【Hoeffding's Inequality】Define $p(H(n) \\le k) = \\sum_{i=0}^k C_n^i p^i (1-p)^{n-i}$, we have $p(H(n) \\le k) \\le exp(-2(p-\\dfrac{1}{2})^2 n)$ and $p(H(n) \\ge k) \\le exp(-2(\\dfrac{1}{2}-p)^2 n)$\n", 47 | "\n", 48 | "* 【Lagrange Duelity】\n", 49 | "$$\n", 50 | "\\min_w f(w) \\quad\n", 51 | "s.t. g_i(w) \\le 0, \\ \\forall i \\in [k] \\quad\n", 52 | "h_i(w) = 0, i \\in [l]\n", 53 | "$$\n", 54 | "Lagrange function $L(w,\\alpha,\\beta)=f(w) + \\sum \\alpha_i g_i(w) + \\sum \\beta_i h_i(w)$, define $\\theta_P(w) = \\max_{\\alpha, \\beta: \\alpha_i \\ge 0} L(w,\\alpha,\\beta)$ and $\\theta_D (\\alpha, \\beta) = \\min_w L(w,\\alpha,\\beta)$. We have $\\max_{\\alpha, \\beta: \\alpha_i \\ge 0} \\theta_D (\\alpha, \\beta) = d^* \\le p^* = \\min_w \\theta_P(w)$. Under certain conditions ($f$ and $g_i$ are convex; $\\{g_i\\}$ is feasible), $d^* = p^*$. 
At this optimal point, the KKT conditions should be satisfied\n",
"$$\n",
"\\dfrac{\\partial L }{\\partial w_i} = 0 \\quad\n",
"\\dfrac{\\partial L }{\\partial \\beta_i} = 0 \\quad\n",
"\\alpha_i g_i(w) = 0 \\quad\n",
"g_i(w) \\le 0 \\quad\n",
"\\alpha_i \\ge 0 \n",
"$$\n",
"* $\\sum_{i=1}^n \\dfrac{1}{i} = \\Theta(\\log n)$\n",
"\n",
"### Machine Learning\n",
"\n",
"$$\\newcommand{\\sumsamples}{\\sum_{i=1}^m}$$\n",
"\n",
"* 【Bias-Variance Decomposition】泛化误差的分解:$f(x;D)$-variance-$\\mathbb{E}_D[f(x)]$-bias-$y_{true}$-noise-$y_D$\n",
"* 【Cross Validation】\n",
"```\n",
"def model_selection_kfold(M1, M2, ..., Md):\n",
"    randomly split S into k sets S1, ..., Sk\n",
"    for each Mi:\n",
"        for j = 1 to k:\n",
"            train on S/Sj, test on Sj\n",
"        epsilon[Mi] = mean(epsilon[Mi, j], j)\n",
"    pick Mi with lowest epsilon[Mi] and retrain the model\n",
"```\n",
"* 【Linear Regression】推导闭式解可用最小化均方误差$\\sumsamples (y_i - wx_i -b)^2$,也可以转化为概率模型$p(y|x,w) \\sim \\mathcal{N}(wx, \\sigma^2)$最大化对数似然函数$L(w)=\\sumsamples \\log p(y|x,w)$;线性回归中,牛顿法$w^{(t+1)}=w^{(t)}-(\\dfrac{\\partial^2 E}{\\partial w^2})^{-1}(\\dfrac{\\partial E}{\\partial w})$可以一次得到最优解\n",
"* 【Logistic Regression】logistic回归假设$\\ln \\dfrac{p(y=1|x)}{p(y=0|x)} = w^T x + b$,写出其对数似然函数$L(w)=\\sumsamples \\log p(y_i|w,x_i)=\\sumsamples y_i\\log(1-g(wx_i)) + (1-y_i)\\log g(wx_i)$,其中$g(x) = \\dfrac{1}{1+e^x}$,求导可得到梯度下降公式$w\\leftarrow w -\\alpha \\sumsamples(1-g(wx_i) - y_i) x_i$\n",
"\n",
"* 【Decision Tree】信息增益公式$H(D)=-\\sum_{k=1}^K p_k \\log p_k$,$H^{split}(D) = \\sum_i \\dfrac{|D_i|}{|D|} H(D_i)$,增益$H(D) - H^{split}(D)$\n",
"* 【Gradient Boosting Tree】每步目标$obj^{(t)}=\\sumsamples l(y_i, \\widehat{y}^{t-1} + f_t(x_i)) + \\Omega(f_t) \\approx \\sumsamples (g_i f_t(x_i) + \\frac{1}{2} h_i f_t^2(x_i)) + \\gamma T + \\dfrac{\\lambda}{2} \\sum_{j=1}^T w_j^2$,结果$w^* = - \\dfrac{G_j}{H_j + \\lambda}$,$obj^* = -\\dfrac{1}{2} \\sum_{j=1}^T \\dfrac{G_j^2}{H_j + \\lambda}$,其中$G_j$和$H_j$分别是落在j节点上样本的一阶和二阶导数之和;分支划分依据$gain=\\dfrac{1}{2} [\\dfrac{G_L^2}{H_L + \\lambda} + \\dfrac{G_R^2}{H_R + \\lambda} - \\dfrac{(G_L+G_R)^2}{H_L + H_R + \\lambda}]-\\gamma$\n",
"* 【Additive Model】$f(x) = \\sum \\alpha_m G_m(x)$\n",
"\n",
"* 【BP】假设权重$W_j$连接j和j+1层,$\\dfrac{\\partial E}{\\partial W_j} = \\sumsamples 2(\\widehat{y}^{(i)} - y^{(i)}) \\cdot \\dfrac{\\partial \\widehat{y}^{(i)}}{\\partial x_{j+1}^{(i)}} \\circ x_{j+1}^{(i)} \\circ (1-x_{j+1}^{(i)}) \\otimes x_j$\n",
"* 【Pooling】使其不受轻微的scaling和rotation的干扰;可以让相邻的特征进行拼接;可以减小输出图像的大小\n",
"* 【ReLU】能有效传播梯度;使得受到激活的空间比较sparse,有利于不受噪声干扰;计算比较简便\n",
"* 【SVM】Start from\n",
"$$\n",
"\\max_{\\gamma, w, b} \\gamma \\quad s.t. y^{(i)}(w^T x^{(i)} + b) \\ge \\gamma, \\ ||w|| = 1\n",
"$$\n",
"Considering scaling invariance, we can fix the functional margin to $\\gamma = 1$ and maximize $\\dfrac{1}{||w||}$ instead, i.e.,\n",
"$$\n",
"\\min_{w, b} \\dfrac{1}{2} ||w||^2 \\quad s.t. y^{(i)}(w^T x^{(i)} + b) \\ge 1\n",
"$$\n",
"By using Lagrange duality,\n",
"$$\n",
"\\max_\\alpha W(\\alpha) = \\sum_i \\alpha_i - \\dfrac{1}{2} \\sum_{i,j} y^{(i)}y^{(j)}\\alpha_i\\alpha_j x^{(i)} \\cdot x^{(j)}\n",
"\\quad s.t. \\alpha_i \\ge 0, \\sum_i \\alpha_i y^{(i)} = 0\n",
"$$\n",
"For the soft-margin case,\n",
"$$\n",
"\\min_{w, b} \\dfrac{1}{2} ||w||^2 + C \\sum \\xi_i \\quad s.t. y^{(i)}(w^T x^{(i)} + b) \\ge 1 - \\xi_i, \\ \\xi_i \\ge 0\n",
"$$\n",
"we have\n",
"$$\n",
"\\max_\\alpha W(\\alpha) = \\sum_i \\alpha_i - \\dfrac{1}{2} \\sum_{i,j} y^{(i)}y^{(j)}\\alpha_i\\alpha_j x^{(i)} \\cdot x^{(j)}\n",
"\\quad s.t. 0 \\le \\alpha_i \\le C, \\sum_i \\alpha_i y^{(i)} = 0\n",
"$$\n",
"A small numerical sketch of solving this dual is given below.\n",
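"A minimal illustrative sketch (not from the derivation above): solving the soft-margin dual directly with scipy's generic SLSQP solver on a made-up toy dataset, then recovering $w$ and $b$ from the $\\alpha_i$; the names (`neg_dual`, etc.) are hypothetical, assuming numpy and scipy are available.\n",
"```python\n",
"import numpy as np\n",
"from scipy.optimize import minimize\n",
"\n",
"X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])\n",
"y = np.array([1.0, 1.0, -1.0, -1.0])\n",
"C = 10.0\n",
"K = X @ X.T                      # linear kernel, K[i, j] = x_i . x_j\n",
"\n",
"def neg_dual(alpha):             # minimize -W(alpha)\n",
"    v = alpha * y\n",
"    return -(alpha.sum() - 0.5 * v @ K @ v)\n",
"\n",
"cons = {'type': 'eq', 'fun': lambda a: a @ y}     # sum_i alpha_i y_i = 0\n",
"res = minimize(neg_dual, np.zeros(len(y)),\n",
"               bounds=[(0.0, C)] * len(y), constraints=cons)\n",
"\n",
"alpha = res.x\n",
"w = (alpha * y) @ X                               # w = sum_i alpha_i y_i x_i\n",
"sv = (alpha > 1e-6) & (alpha < C - 1e-6)          # on-margin support vectors\n",
"b = np.mean(y[sv] - X[sv] @ w)                    # from complementary slackness\n",
"print(w, b)\n",
"```\n",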
"\n",
"* 【SVM kernel】Kernel function corresponds to the inner product of feature vectors. Theorem: $K: \\mathbb{R}^n \\times \\mathbb{R}^n \\to \\mathbb{R}$ is a valid kernel $\\Longleftrightarrow \\forall \\{x^{(1)}, \\cdots, x^{(m)}\\}$ the corresponding kernel matrix is symmetric positive semidefinite.\n",
"* 【SVM SMO】SMO是用来解SVM的一个算法,主要思路是coordinate ascent,每次选定$\\alpha_1, \\alpha_2$,然后在不违反约束的情况下,最大化$W(\\alpha)$。需要满足的约束是$\\sum_i \\alpha_i y_i = 0 \\Rightarrow \\alpha_1 = (\\xi - \\alpha_2 y_2) y_1$和$\\alpha_1, \\alpha_2 \\in [0, C] \\Rightarrow L \\le \\alpha_2 \\le H$。把第一个约束条件代入,$W(\\alpha) = W((\\xi - \\alpha_2 y_2) y_1, \\alpha_2, \\cdots, \\alpha_m) = a\\alpha_2^2 + b\\alpha_2 + c$,二次型能够求出最值$\\alpha_2^*$,最后$\\alpha_2 \\leftarrow clip(\\alpha_2^*, L, H)$。迭代进行。\n",
"\n",
"* 【Full Bayes/MAP/ML】full Bayes:参数估计$p(\\theta|D)=\\dfrac{P(D|\\theta)P(\\theta)}{P(D)}\\propto p(\\theta) \\prod_{i=1}^m p(y_i|x_i,\\theta)$,样本预测$p(y|x;D)=\\int_\\theta p(y|x,\\theta)p(\\theta|D)d\\theta$,MAP和ML只取一个$\\theta$,MAP $\\theta = arg \\max p(\\theta) \\prod_{i=1}^m p(y_i|x_i,\\theta)$,ML $\\theta = arg \\max \\prod_{i=1}^m p(y_i|x_i,\\theta)$\n",
"\n",
"* 【Bayesian Network】基于有向无环图的概率图模型,联合概率分布$p(x_1,\\cdots,x_d) = \\prod_{i=1}^d p(x_i|\\pi_i)$,其中$\\pi_i$表示i的父节点,条件概率分布在图中给出;条件概率分布可由联合概率分布得出;给定两个变量,如果相互独立有$p(x,y)=p(x)p(y)$;1父2子,给定父节点,子节点相互独立;2父1子,子节点不给定时父节点独立,子节点给定的时候父节点不独立;x-y-z顺序结构,给定y时,xz独立\n",
"* 【Gibbs Sampling】吉布斯采样可以求给定证据变量情形下,待查询变量的期望。随机产生一个与证据变量E一致的各rv赋值,每次选取一个除证据变量以外的rv,根据其Markov Blanket中的变量数值计算该变量的条件概率分布,依此采样得到新的数值。待查询变量的期望为所有循环中,该变量的均值。MB包含:该节点的父节点、子节点和子节点的父节点。\n",
"* 【k-means EM】算法:repeat $c^{(i)} = arg \\min_j ||\\samplexi - \\mu_j||^2, \\ \\forall i \\in [m]$ and $\\mu_j = \\dfrac{\\sumoversample I(c^{(i)} = j) \\samplexi}{\\sumoversample I(c^{(i)} = j)}$;分析:k-means is exactly coordinate descent on $J(c, \\mu) = \\sumoversample ||\\samplexi - \\mu_{c^{(i)}}||^2$\n",
"* 【GMM EM】E-step:$w_{ij} = p(z_i = j| x_i; \\Phi, \\mu, \\Sigma)$;M-step:$\\Phi_j = \\frac{1}{m}\\sumsamples w_{ij}$,$\\mu_j = \\dfrac{\\sumsamples w_{ij} x_i}{\\sumsamples w_{ij}}$,$\\Sigma_j = \\dfrac{\\sumsamples w_{ij} (x_i - \\mu_j)(x_i - \\mu_j)^T }{\\sumsamples w_{ij}}$;预测:$p(z_i = j| x_i; \\Phi, \\mu, \\Sigma) = \\dfrac{p(x_i|z_i = j; \\mu, \\Sigma)p(z_i=j;\\Phi)}{\\sum_{l=1}^k p(x_i|z_i = l; \\mu, \\Sigma)p(z_i=l;\\Phi)}$\n",
"* 【EM】由Jensen不等式得到对数似然函数$l(\\theta) = \\sumoversample \\log \\sum_z p(\\samplexi, z, \\theta)$的下界$J(Q, \\theta) = \\sum_i \\sum_{z^{(i)}} Q(z^{(i)}) \\log \\dfrac{p(\\samplexi, z^{(i)}, \\theta)}{Q(z^{(i)})}$,当$Q(z^{(i)}) = p(z^{(i)} | \\samplexi, \\theta) = \\dfrac{p(\\samplexi, z^{(i)}, \\theta)}{\\sum_z p(\\samplexi, z, \\theta)}$(E-step)可以取到此下界;把Q固定可以优化参数$\\theta = arg\\max_\\theta \\sum_i \\sum_{z^{(i)}} Q(z^{(i)}) \\log \\dfrac{p(\\samplexi, z^{(i)}, \\theta)}{Q(z^{(i)})}$ (M-step)\n",
"* 【EM Convergence】证明每一轮之后对数似然函数都不小于上一轮\n",
"* 【Majority Vote】直接使用Hoeffding,错误率随分类器数目指数下降\n",
"* 【AdaBoost】最小化指数损失函数$L = \\mathbb{E}_{x\\sim D}[e^{-y f(x)}]$能够推导到权重更新公式$G_m = arg\\min\\sumsamples w_i^{(m)} I(y_i \\neq G_m(x_i))$,$\\alpha_m = \\log \\dfrac{1-err_m}{err_m}$,$w_i^{(m+1)} = w_i^{(m)} \\exp (\\alpha_m I 
(y_i \\neq G_m(x_i))$,其中$err_m = \\dfrac{\\sum I(\\neq) w_i^{(m)}}{\\sum w_i^{(m)}}$\n", 125 | "\n", 126 | "* 【Bagging】平均均方误差$E=\\mathbb{E}[\\dfrac{1}{M} \\sum \\epsilon_i^2]$;bagging是各个模型取平均,假设相互独立,误差$E=\\mathbb{E}[(\\dfrac{1}{M} \\sum \\epsilon_i)^2]$\n", 127 | "\n", 128 | "* 【Random Forest】每次随机抽样生成一个训练集(bootstrap),然后再随机抽取若干特征,训练一棵树,最后的结果取平均\n", 129 | "* 【kNN】如果样本足够密,其误差率不超过贝叶斯最优分类器的两倍:$c^* = arg\\max P(c|x)$,$E = 1 - \\sum_c p(c|x)p(c|nn(x)) = 1 - \\sum_c p(c|x)^2 \\le 1 - p(c^*|x) = (1+p(c^*|x))(1-p(c^*|x)) \\le 2 (1-p(c^*|x))$\n", 130 | "\n", 131 | "* 【PCA】PCA特征值分解的方法等价于找到某个向低维空间的投影,使得各个样本的重构误差最小;也等价于找到某个向低维空间的投影,使得各个样本投影的方差最大\n", 132 | " - $|x_n\\rangle = \\sum_1^d |u_i\\rangle\\langle u_i|x_n\\rangle + \\sum_{d+1}^p |u_i\\rangle\\langle u_i|\\bar{x}\\rangle$,重构误差$J = \\dfrac{1}{N} \\sumsamples || x_i - x_i^{recon}||^2$,转化为优化问题,然后用拉格朗日乘子法\n", 133 | " - 方差$Var = \\dfrac{1}{N} \\sumsamples ( \\langle x_i | u_j \\rangle - \\langle \\bar{x} | u_j \\rangle )^2$; 假设希望找到一个主轴方向为$u$,满足$||u||^2=1$,其实就是解最优化问题$\\max \\frac{1}{m} \\sumoversample ({\\samplexi}^T u)^2 = \\frac{1}{m} u^T \\Sigma u$,其中$\\Sigma =\\frac{1}{m}\\sumoversample {\\samplexi}^T \\samplexi \\in \\mathbb{R}^{n\\times n}$。称$\\Sigma$为Covariance matrix,设$\\{u_k\\}$是其前k个特征向量,因此可以用每个样本的特征向量和这k个特征向量分别做内积,得到的k个数字组成新的k维特征向量。\n", 134 | "\n", 135 | "* 【Empirical risk and generalization error】\n", 136 | " - empirical risk: $\\newcommand{\\epsilonhat}{\\widehat{\\epsilon}} \\epsilonhat(h) = \\dfrac{1}{m} \\sum_{i} I(h(x^{(i)}) \\neq y^{(i})$\n", 137 | " - generalization error: $\\epsilon(h) = P_{(x,y)\\sim D}(h(x)\\neq y)$\n", 138 | " - empirical risk minimization (ERM) is to find parameter $\\theta$ that minimize empirical risk, or find hypothesis among hypothesis space that minimize empirical risk.\n", 139 | "* 【PAC】\n", 140 | " - Theorem: 当样本数$m$满足以下关系时,能够保证empirical risk能够以大概率($1-\\delta$),近似$|\\epsilonhat - \\epsilon| \\le \\gamma$等于generalization error。\n", 141 | " - 证明:其实最后是要证明$P(\\neg \\exists h\\in \\mathcal{H} |\\epsilonhat(h) - \\epsilon(h)| > \\gamma) \\le 1 - 2 k \\exp(-2\\gamma^2 m)$,用Union Bound把它转化为单个的,再用Chernoff bound写出两者接近程度的bound。\n", 142 | " - Theorem:$\\epsilon(\\widehat{h}) \\le \\min_{h\\in\\mathcal{H}} \\epsilon(h) + 2 \\sqrt{\\dfrac{1}{2m} \\log(\\dfrac{2k}{\\delta})}$, where $\\widehat{h} = arg\\min \\epsilonhat(h)$, define $\\widehat{h^*} = arg\\min \\epsilon(h)$.\n", 143 | " - 证明:$\\epsilon(\\widehat{h}) \\le \\epsilonhat(\\widehat{h}) + \\gamma \\le \\epsilonhat(h^*) + \\gamma \\le \\epsilon(h^*) + 2 \\gamma$\n", 144 | "* 【VC Dimension】能够找到一个维度为d的样本集,使得该假设能够实现所有$2^d$种binary标签组合(打散);但是不能打散任意一个维度为d+1的样本组合;$\\gamma = |\\epsilon(h)-\\epsilonhat(h)| \\le O(\\dfrac{d}{m} \\log \\dfrac{m}{d} + \\dfrac{1}{m} \\log \\dfrac{1}{d})$\n", 145 | "\n", 146 | "* 【Generative and Discriminative Model】前者是学习$p(y|x)$,希望直接从input space $\\rightarrow$ label,后者是学习$p(x|y)$,通过贝叶斯公式$p(y|x)=p(x|y)p(y)/p(x)$来做判别\n", 147 | "* 【Guassian Discriminate Analysis】模型,结论,方法对数似然函数求导\n", 148 | "$$\n", 149 | "y \\sim Bernoulli(\\Phi) = p(y) = \\Phi^y (1-\\Phi)^{1-y} \\\\\n", 150 | "x|y=0 \\sim N(\\mu_0, \\Sigma) = \\dfrac{1}{(2\\pi)^{n/2} |\\Sigma|^{1/2}} \\exp(-\\dfrac{1}{2}(x-\\mu_0)^T \\Sigma^{-1} (x-\\mu_0)) \\quad\n", 151 | "x|y=1 \\sim N(\\mu_1, \\Sigma)\n", 152 | "$$\n", 153 | "$$\n", 154 | "\\Phi = \\dfrac{1}{m} \\sum_i y^{(i)} \\quad\n", 155 | "\\mu_0 = \\dfrac{\\sum_i (1-y^{(i)}) x^{(i)}}{\\sum_i (1-y^{(i)})} \\quad\n", 156 | "\\mu_1 = \\dfrac{\\sum_i y^{(i)} x^{(i)}}{\\sum_i y^{(i)}} \\quad\n", 157 | "\\Sigma = \\dfrac{1}{m} \\sum_i 
(x^{(i)} - \\mu_{y^{(i)}}) (x^{(i)} - \\mu_{y^{(i)}})^T\n", 158 | "$$\n", 159 | "* 【Hidden Markov Model】Joint distribution: $P(x_1,e_1,\\cdots, x_n, e_n) = P(x_1) P(e_1|x_1) \\prod_{i=1}^n P(x_i|x_{i-1})P(e_i|x_i)$. Forward Algorithm: Elapse Time $P(X_t|e_1\\cdots e_{t-1}) = \\sum_{x_{t-1}} P(X_t|x_{t-1})P(x_{t-1}|e_1\\cdots e_{t-1})$; Observe $P(X_t|e_1\\cdots e_t) = \\dfrac{P(e_t|X_t)P(X_t|e_1\\cdots e_{t-1})}{\\sum_{x_t}P(e_t|x_t)P(x_t|e_1\\cdots e_{t-1})}$\n", 160 | "* 【Markov Random Field】Joint distribution: $P(x) = \\frac{1}{Z} \\prod_{Q\\in C} \\psi_Q(x_Q)$, product over all clique; potential function $\\psi_Q(x_Q)=e^{-H_Q(x_Q)}$.\n", 161 | "* 【RL】Bellman Equation: $V^\\pi(S) = R(s) + \\gamma \\sum_{s' \\in S} p(s, \\pi(s), s') V^\\pi(s')$\n", 162 | "\n", 163 | "* 【Generalized Linear Model】Exponential Family:$p(y,\\eta) = b(y) \\exp(\\eta^T T(y) - a(\\eta))$;通常$T(y)=y$, $\\eta = \\theta^T x$(这就是为什么叫linear);用处:假设数据服从一个分布,用一组参数$\\Phi$表示;把它化为exponential family,写出$\\eta, a(\\eta), b(y)$分别是什么;然后可以写出$p(y|x,\\theta)$具体的形式\n", 164 | "\n", 165 | "* 【Naive Bayes】email spam,一篇文章用一个向量$x\\in \\mathbb{R}^{|V|}$表示,如果出现某个词就是1,如果没出现就是0;要学习的参数是$\\Phi = p(y=1)$,$\\Phi_{i|y=c} = p(x_i=1|y=c)$;取似然函数对数求导,得到结果,最后的结论是分子分母计数统计。\n", 166 | "\n", 167 | "* 【Online Learning】perceptron algorithm for online learning: define $h_\\theta(x) = sign(\\theta^T x)$ returns -1 or 1, label $y \\in \\{-1, +1\\}$, if the algorithm goes wrong at some example do $\\theta \\leftarrow \\theta + \\sampleyi \\samplexi$;\n", 168 | " - Theorem: $||\\samplexi|| \\le D$, $\\exists u$ s.t. $||u||^2 = 1$, $\\sampleyi (u^T \\samplexi) \\ge \\gamma$, total number of mistaks the algorithm makes $\\le (\\dfrac{D}{\\gamma})^2$\n", 169 | " - 证明:假设出现第k个错误时为$\\theta^{(k)}$,数学归纳法证${\\theta^{(k+1)}}^T u \\ge k\\gamma$, $||\\theta^{(k+1)}||^2 \\le k D^2$\n", 170 | "\n", 171 | "* 【Factor Analysis】如果样本的数目m没有输入样本的维度n多的时候,用Gaussian模型来求解的时候协方差矩阵很容易是Singlar Matrix。考虑用一个维度比较低的隐变量$z\\in\\mathbb{R}^k, \\ k:\n", 193 | " \n", 194 | " while:\n", 195 | " if frontier is empty:\n", 196 | " return FAILURE\n", 197 | " choose a leaf node and remove if from frontier\n", 198 | " if goal(node):\n", 199 | " return corresponding solution\n", 200 | " expand the node\n", 201 | " \n", 202 | " add the resulting nodes to frontier\n", 203 | " < ->add the resulting nodes to frontier if not in frontier of explored_set>\n", 204 | "```\n", 205 | "* 【搜索算法的衡量标准】 completeness:如果存在解,是否能够找到;optimality:是否能够找到最优解;time complexity / space complexity\n", 206 | "* 【BFS】complete if branching factor $b$ is finite;optimal if step costs are identical;time and space $O(b^d)$, where $d$ is depth\n", 207 | "* 【Uniform-cost Search】frontier采用一个优先队列,每次弹出cost最小的节点,并且扩展的时候如果节点比之前的cost小,会进行更新;在BFS的基础上这样做保证了step cost不相同时返回结果的最优性;complete if $b$ is finite and step cost $\\ge \\epsilon > 0$;always optimal;time and space $O(b^{1 + \\lfloor C^* / \\epsilon \\rfloor})$, where $C^*$ is the optimal solution\n", 208 | "* 【DFS (with tree search frame)】not complete: 由于tree search中不会记录已访问的态,因此可能陷入循环;如果记录的话,就没有相对于BFS的空间优势了;not optimal: 先访问到的不一定最优;time $O(b^m)$; space $O(bm)$ (frontier中最多装bm个节点),其中$m$为搜索的最深深度\n", 209 | "* 【Depth Limited DPS (with tree search frame)】和DFS相同,只不过$m$代表的是限制的深度\n", 210 | "* 【Iterative Deepening】complete if $b$ is finite;optimal if cost are identical;time $O(b^d)$; space $O(bd)$\n", 211 | "* 【Bidirectional Search】not applicable in all cases;complete if $b$ is finite and both directions are BFS;optimal if step costs are identical and BFS;space and time $O(b^{d/2})$\n", 212 | "* 
【A-Star】complete if #node equal or less than $C^*$ is finite, i.e. all step cost $\\ge \\epsilon$ and $b$ is finite;time and space $O(b^m)$;optimal: addmisible $h(n) \\le h^*(n)$ for tree-search; consistent $h(n) \\le c(n, a, n') + h(n')$ for graph-search\n", 213 | " - proof: 1. 证明路径上$f(n) = g(n) + h(n)$是越来越大的;2. 证明一个节点被找到,通往这个节点的最优路径就被找到了(反证法,由于f小的先被扩展)\n", 214 | "* 【Hill-climbing Search】每次只考虑邻域,如果结果变好就走到该邻域,使用random restart技术可以使得算法变得complete\n", 215 | "* 【Simulated Annealing】考虑邻域,如果结果变好就接受它,如果没有变好,以概率$e^{- |\\Delta E| / T}$接受\n", 216 | "* 【Local Beam Search】使用k个agent来一起做Hill-climbing Search,如果有某些agent发现哪里比较好,会让其他agent也来搜索这一块地方\n", 217 | "* 【Genetic Algorithm】random select by fitness func, mutate and reproduce\n", 218 | "* 【Searching with Nondeterministic Actions】环境的因素不确定的时候,使用AND-OR search的方法,OR节点表示agent可以自己选择的节点,AND节点表示环境随机选择的节点。目标是找到一棵树,使得每个叶子节点都是一个目标状态,在OR节点上表明要做出的选择,每个AND节点的分支都要考虑。\n", 219 | "* 【Searching with Partial Observation】在观察受限的时候,可以使用belief state,它是多个可能实际状态的集合;每一个action把一个belief state转到另一个,旧的belief state中每一个状态都包含在新的中;观察到新状态的部分信息之后,可以把belief state缩减为一个更小的belief state" 220 | ] 221 | } 222 | ], 223 | "metadata": { 224 | "kernelspec": { 225 | "display_name": "Python 3", 226 | "language": "python", 227 | "name": "python3" 228 | }, 229 | "language_info": { 230 | "codemirror_mode": { 231 | "name": "ipython", 232 | "version": 3 233 | }, 234 | "file_extension": ".py", 235 | "mimetype": "text/x-python", 236 | "name": "python", 237 | "nbconvert_exporter": "python", 238 | "pygments_lexer": "ipython3", 239 | "version": "3.4.2" 240 | } 241 | }, 242 | "nbformat": 4, 243 | "nbformat_minor": 2 244 | } 245 | -------------------------------------------------------------------------------- /Summary_Basic_Lite.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Math\n", 8 | "\n", 9 | "$$\n", 10 | "\\newcommand{\\sumoversample}{\\sum_{i=1}^m}\n", 11 | "\\newcommand{\\samplexi}{x^{(1)}}\n", 12 | "\\newcommand{\\sampleyi}{y^{(1)}}\n", 13 | "$$\n", 14 | "\n", 15 | "* 【Union Bound】$P(A \\cup B) \\le P(A) + P(B)$\n", 16 | "* 【Chernoff Bound】$Z_1, \\cdots, Z_m \\in \\{0, 1\\}$ i.i.d. $p(z=1) = \\Phi, p(z=0) = 1 - \\Phi$, let $\\widehat{\\Phi} = \\dfrac{1}{m} \\sum_i Z_i$, we have \n", 17 | "$$P(|\\Phi - \\widehat{\\Phi}| > \\gamma) \\le 2\\exp (-2\\gamma^2m)$$\n", 18 | "* 【Jensen's Inequality】For convex function $f$ and random variable $X$, $\\mathbb{E}[f(X)] \\ge f(\\mathbb{E} X)$. 
Equality exists when $X = \\mathbb{E}X, \\ w.p.1$\n", 19 | "* 【Multivariant Gaussian Distribution】\n", 20 | "$$\n", 21 | "p(x;\\mu,\\Sigma) = \\dfrac{1}{(2\\pi)^{n/2} |\\Sigma|^{1/2}} \\exp(-\\dfrac{1}{2}(x-\\mu)^T \\Sigma^{-1} (x-\\mu))\n", 22 | "$$\n", 23 | "Derivatives:\n", 24 | "$$\n", 25 | "\\frac{ \\partial \\log p({\\boldsymbol x};{\\boldsymbol \\mu},{\\boldsymbol \\Sigma}) }{ \\partial {\\boldsymbol \\mu}}\n", 26 | "= {\\boldsymbol \\Sigma}^{-1} \\left( {\\boldsymbol x} - {\\boldsymbol \\mu} \\right) \n", 27 | "$$\n", 28 | "$$\n", 29 | "\\frac{ \\partial \\log p({\\boldsymbol x};{\\boldsymbol \\mu},{\\boldsymbol \\Sigma}) }{ \\partial {\\boldsymbol \\Sigma}}\n", 30 | "= \\frac{1}{2} \\left( \n", 31 | "{\\boldsymbol \\Sigma}^{-1} \n", 32 | "\\left( {\\boldsymbol x} - {\\boldsymbol \\mu} \\right)\n", 33 | "\\left( {\\boldsymbol x} - {\\boldsymbol \\mu} \\right)^T\n", 34 | "{\\boldsymbol \\Sigma}^{-1} \n", 35 | "- {\\boldsymbol \\Sigma}^{-1} \\right)\n", 36 | "$$\n", 37 | "Conditional distribution, $x_1|x_2 \\sim \\mathcal{N}(\\mu_{1|2}, \\Sigma_{1|2})$, wherre $\\mu_{1|2} = \\mu_1 + \\Sigma_{12} \\Sigma_{22}^{-1} (x_2 - \\mu_2)$ and $\\Sigma_{1|2} = \\Sigma_{11}- \\Sigma_{12}\\Sigma_{22}^{-1} \\Sigma_{21}$\n", 38 | "Marginal distribution, $x_1 \\sim \\mathcal{N}(\\mu_1, \\Sigma_{11})$" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "* 【Matrix】$\\nabla_A tr(AB) = B^T$;$\\nabla_{A^T} f(A) = (\\nabla_A f(A))^T$;$\\nabla_A tr(ABA^TC) = CAB + C^T A B^T$;$\\nabla_A |A| = |A|(A^{-1})^T$" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "* 【Bayes' Theorem】$P(A|B) = \\dfrac{P(B|A)P(A)}{P(B)}$" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "* 【KL Divergence】$KL(P||Q) = -\\sum_i P(i) \\log \\dfrac{Q(i)}{P(i)}$" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "* 【Hoeffding's Inequality】Define $p(H(n) \\le k) = \\sum_{i=0}^k C_n^i p^i (1-p)^{n-i}$, we have $p(H(n) \\le k) \\le exp(-2(p-\\dfrac{1}{2})^2 n)$ and $p(H(n) \\ge k) \\le exp(-2(\\dfrac{1}{2}-p)^2 n)$" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "* 【Lagrange Duelity】\n", 74 | "$$\n", 75 | "\\min_w f(w) \\quad\n", 76 | "s.t. g_i(w) \\le 0, \\ \\forall i \\in [k] \\quad\n", 77 | "h_i(w) = 0, i \\in [l]\n", 78 | "$$\n", 79 | "Lagrange function $L(w,\\alpha,\\beta)=f(w) + \\sum \\alpha_i g_i(w) + \\sum \\beta_i h_i(w)$, define $\\theta_P(w) = \\max_{\\alpha, \\beta: \\alpha_i \\ge 0} L(w,\\alpha,\\beta)$ and $\\theta_D (\\alpha, \\beta) = \\min_w L(w,\\alpha,\\beta)$. We have $\\max_{\\alpha, \\beta: \\alpha_i \\ge 0} \\theta_D (\\alpha, \\beta) = d^* \\le p^* = \\min_w \\theta_P(w)$. Under certain conditions ($f$ and $g_i$ are convex; $\\{g_i\\}$ is feasible), $d^* = p^*$. 
On this optimal point, KKT condision should be satisfied\n", 80 | "$$\n", 81 | "\\dfrac{\\partial L }{\\partial w_i} = 0 \\quad\n", 82 | "\\dfrac{\\partial L }{\\partial \\beta_i} = 0 \\quad\n", 83 | "\\alpha_i g_i(w) = 0 \\quad\n", 84 | "g_i(w) \\le 0 \\quad\n", 85 | "a_i \\ge 0 \n", 86 | "$$" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "### Machine Learning\n", 94 | "\n", 95 | "$$\\newcommand{\\sumsamples}{\\sum_{i=1}^m}$$\n", 96 | "\n", 97 | "* 【Bias-Variance Decomposition】泛化误差的分解:$f(x;D)$-variance-$\\mathbb{E}_D[f(x)]$-bias-$y_{true}$-noise-$y_D$\n", 98 | "* 【Cross Validation】\n", 99 | "```\n", 100 | "def model_selectio_kfold(M1, M2, ..., Md):\n", 101 | " randomly split S into k set S1, ..., Sk\n", 102 | " for each Mi:\n", 103 | " for j = 1 to k:\n", 104 | " train on S/Sj test on Sj\n", 105 | " epsilon[Mi] = mean(epsilon[Mi, j], j)\n", 106 | " pick Mi with lowest epsilon[Mi] and retrain the model\n", 107 | "```\n", 108 | "* 【Linear Regression】推导闭式解可用最小化均方误差$\\sumsamples (y_i - wx_i -b)^2$,也可以转化为概率模型$p(y|x,w) \\sim \\mathbb{N}(wx, \\sigma^2)$最大化对数似然函数$L(w)=\\sumsamples \\log p(y|x,w)$;线性回归中,牛顿法$w^{(t+1)}=w^{(t)}-(\\dfrac{\\partial^2 E}{\\partial w^2})^{-1}(\\dfrac{\\partial E}{\\partial w})$可以一次得到最优解\n", 109 | "* 【Logistic Regression】logistic回归假设$\\ln \\dfrac{p(y=1|x)}{p(y=0|x)} = w^T x + b$,写出其对数似然函数$L(w)=\\sumsamples \\log p(y_i|w,x_i)=\\sumsamples y_i\\log(1-g(wx_i)) + (1-y_i)\\log g(wx_i)$,其中$g(x) = \\dfrac{1}{1+e^x}$,求导可得到梯度下降公式$w\\leftarrow w -\\alpha \\sumsamples(1-g(wx_i) - y_i) x_i$" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "* 【Decision Tree】信息增益公式$H(D)=-\\sum_{k=1}^K p_k \\log p_k$,$H^{split}(D) = \\sum_i \\dfrac{|D_i|}{|D|} H(D_i)$,增益$H^{split}(D) - H(D)$\n", 117 | "* 【Gradient Boosting Tree】每步目标$obj^{(t)}=\\sumsamples l(y_i, \\widehat{y}^{t-1} + f_t(x_i)) + \\Omega(f_t) \\approx \\sumsamples (g_i f_t(x_i) + \\frac{1}{2} h_i f_t^2(x_i)) + \\gamma T + \\dfrac{\\lambda}{2} \\sum_{j=1}^T w_j^2$,结果$w^* = - \\dfrac{G_j}{H_j + \\lambda}$,$obj^* = -\\dfrac{1}{2} \\sum_{j=1}^T \\dfrac{G_j^2}{H_j + \\lambda}$,其中$G_j$和$H_j$分别是落在j节点上样本的一阶和二阶导数之和;分支划分依据$gain=\\dfrac{1}{2} [\\dfrac{G_L^2}{H_L + \\lambda} + \\dfrac{G_R^2}{H_R + \\lambda} - \\dfrac{(G_L+G_R)^2}{H_L + H_R + \\lambda}]-\\gamma$\n", 118 | "* 【Additive Model】$f(x) = \\sum \\alpha_m G_m(x)$" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "* 【BP】假设权重$W_j$连接j和j+1层,$\\dfrac{\\partial E}{\\partial W_j} = \\sumsamples 2(\\widehat{y}^{(i)} - y^{(i)}) \\cdot \\dfrac{\\partial \\widehat{y}^{(i)}}{\\partial x_{j+1}^{(i)}} \\circ x_{j+1}^{(i)} \\circ (1-x_{j+1}^{(i)}) \\otimes x_j$" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "* 【SVM】Start from \n", 133 | "$$\n", 134 | "\\max_{\\gamma, w, b} \\gamma \\quad s.t. y^{(i)}(w^T x^{(i)} + b) \\ge \\gamma, \\ ||w|| = 1\n", 135 | "$$\n", 136 | "Considering scaling invariance, we can scale by $\\dfrac{1}{||w||}$ and let $\\gamma = 1$.\n", 137 | "$$\n", 138 | "\\max_{w, b} \\dfrac{1}{2} ||w||^2 \\quad s.t. y^{(i)}(w^T x^{(i)} + b) \\ge 1\n", 139 | "$$\n", 140 | "By using Lagrange duality,\n", 141 | "$$\n", 142 | "\\max_\\alpha W(\\alpha) = \\sum_i \\alpha_i - \\dfrac{1}{2} \\sum_{i,j} y^{(i)}y^{(j)}\\alpha_i\\alpha_j x^{(i)} \\cdot x^{(j)}\n", 143 | "\\quad s.t. 
\\alpha_i \\ge 0, \\sum_i \\alpha_i y^{(i)} = 0\n", 144 | "$$\n", 145 | "When faced by soft margin,\n", 146 | "$$\n", 147 | "\\max_{w, b} \\dfrac{1}{2} ||w||^2 + C \\sum \\xi_i \\quad s.t. y^{(i)}(w^T x^{(i)} + b) \\ge 1 - \\xi_i, \\ \\xi_i \\ge 0\n", 148 | "$$\n", 149 | "we have \n", 150 | "$$\n", 151 | "\\max_\\alpha W(\\alpha) = \\sum_i \\alpha_i - \\dfrac{1}{2} \\sum_{i,j} y^{(i)}y^{(j)}\\alpha_i\\alpha_j x^{(i)} \\cdot x^{(j)}\n", 152 | "\\quad s.t. 0 \\le \\alpha_i \\le C, \\sum_i \\alpha_i y^{(i)} = 0\n", 153 | "$$" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "* 【SVM kernel】Kernel function corresponds to the product of feature vector. Theorem: $K: \\mathbb{R}^n \\times \\mathbb{R}^n \\to \\mathbb{R}$ is a valid kernel $\\Longleftrightarrow \\forall \\{x^{(1)}, \\cdots, x^{(m)}\\}$ corresponding kernel matrix is symmetric positive semidefinite.\n", 161 | "* 【SVM SMO】SMO是用来解SVM的一个算法,主要思路是coordinate ascend,每次选定$\\alpha_1, \\alpha_2$,然后在不违反约束的情况下,最大化$W(\\alpha)$。需要满足的约束是$\\sum_i \\alpha_i y_i = 0 \\Rightarrow \\alpha_1 = (\\xi - \\alpha_2 y_2) y_1$和$\\alpha_1, \\alpha_2 \\in [0, C] \\Rightarrow L \\le \\alpha_2 \\le H$。把第一个约束条件带入,$W(\\alpha) = W((\\xi - \\alpha_2 y_2) y_1, \\alpha_2, \\cdots, \\alpha_m) = a\\alpha_2^2 + b\\alpha_2 + c$,二次型能够求出最值$\\alpha_2^*$,最后$\\alpha_2 \\leftarrow clip(\\alpha_2^*, L, H)$。迭代进行。" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "* 【Full Bayes/MAP/ML】full Bayes:参数估计$p(\\theta|D)=\\dfrac{P(D|\\theta)P(\\theta)}{P(D)}\\propto p(\\theta) \\prod_{i=1}^m p(y_i|x_i,\\theta)$,样本预测$p(y|x;D)=\\int_\\theta p(y|x,\\theta)p(\\theta|D)d\\theta$,MAP和ML只取一个$\\theta$,MAP $\\theta = arg \\max p(\\theta) \\prod_{i=1}^m p(y_i|x_i,\\theta)$,ML $\\theta = arg \\max \\prod_{i=1}^m p(y_i|x_i,\\theta)$" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "* 【Bayesian Network】基于无向图的概率图模型,联合概率分布$p(x_1,\\cdots,x_d) = \\prod_{i=1}^d p(x_i|\\pi_i)$,其中$\\pi_i$表示i的父节点,条件概率分布在图中给出;条件概率分布可由联合概率分布得出;给定两个变量,如果相互独立有$p(x,y)=p(x)p(y)$;1父2子,给定父节点,子节点相互独立;2父1子,子节点不给定时父节点独立,子节点给定的时候父节点不独立;x-y-z顺序结构,给定y时,xz独立\n", 176 | "* 【Gibbs Sampling】吉布斯采样可以求给定证据变量情形下,待查询变量的期望。随机产生一个与证据变量E一致的各rv赋值,每次选取一个除证据变量以外的rv,根据其Markov Blanket中的变量数值计算该变量的条件概率分布,依此采样得到新的数值。待查询变量的期望为所有循环中,该变量的均值。MB包含,该节点的父节点、子节点和子节点的父节点。\n", 177 | "* 【k-means EM】算法:repeat $c^{(i)} = arg \\min_j ||\\samplexi - \\mu_j||^2, \\ \\forall i \\in [m]$ and $\\mu_j = \\dfrac{\\sumoversample I(c^{(i)} = j) \\samplexi}{\\sumoversample I(c^{(i)} = j)}$;分析:k-means is exactly coordinate descent on $J(c, \\mu) = \\sumoversample ||\\samplexi - \\mu_{c^{(i)}}||^2$\n", 178 | "* 【GMM EM】E-step:$w_{ij} = p(z_i = j| x_i; \\Phi, \\mu, \\Sigma)$;M-step:$\\Phi_j = \\frac{1}{m}\\sumsamples w_{ij}$,$\\mu_j = \\dfrac{\\sumsamples w_{ij} x_i}{\\sumsamples w_{ij}}$,$\\Sigma_j = \\dfrac{\\sumsamples w_{ij} (x_i - \\mu_j)(x_i - \\mu_j)^T }{\\sumsamples w_{ij}}$;预测:$p(z_i = j| x_i; \\Phi, \\mu, \\Sigma) = \\dfrac{p(x_i|z_i = j; \\mu, \\Sigma)p(z_i=j;\\Phi)}{\\sum_{l=1}^k p(x_i|z_i = l; \\mu, \\Sigma)p(z_i=l;\\Phi)}$\n", 179 | "* 【EM】由Jesen不等式得到对数似然函数$l(\\theta) = \\sumoversample \\log \\sum_z p(x; x, \\theta)$的下界$J(Q, \\theta) = \\sum_i \\sum_{z^{(i)}} Q(z^{(i)}) \\log \\dfrac{p(\\samplexi, z^{(i)}, \\theta)}{Q(z^{(i)})}$,当$Q(z^{(i)}) = p(z^{(i)} | \\samplexi, \\theta) = \\dfrac{p(\\samplexi, z^{(i)}, \\theta)}{\\sum_z p(\\samplexi, z, \\theta)}$(E-step)可以取到此下界;把Q固定可以优化参数$\\theta = arg\\max_\\theta \\sum_i 
\\sum_{z^{(i)}} Q(z^{(i)}) \\log \\dfrac{p(\\samplexi, z^{(i)}, \\theta)}{Q(z^{(i)})}$ (M-step)\n", 180 | "* 【EM Covergence】证明每一轮之后对数似然函数都不小于上一轮\n", 181 | "* 【Majority Vote】直接使用Hoeffding,错误率随分类器数目指数下降\n", 182 | "* 【AdaBoost】最小化指数损失函数$L = \\mathbb{E}_{x\\sim D}[e^{-y f(x)}]$能够推导到权重更新公式$G_m = arg\\min\\sumsamples w_i^{(m)} I(y_i \\neq G_m(x_i))$,$\\alpha_m = \\log \\dfrac{1-err_m}{err_m}$,$w_i^{(m+1)} = w_i^{(m)} \\exp (\\alpha_m I (y_i \\neq G_m(x_i))$,其中$err_m = \\dfrac{\\sum I(\\neq) w_i^{(m)}}{\\sum w_i^{(m)}}$" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "* 【Bagging】平均均方误差$E=\\mathbb{E}[\\dfrac{1}{M} \\sum \\epsilon_i^2]$;bagging是各个模型取平均,假设相互独立,误差$E=\\mathbb{E}[(\\dfrac{1}{M} \\sum \\epsilon_i)^2]$" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "* 【Random Forest】每次随机抽样生成一个训练集(bootstrap),然后再随机抽取若干特征,训练一棵树,最后的结果取平均\n", 197 | "* 【kNN】如果样本足够密,其误差率不超过贝叶斯最优分类器的两倍:$c^* = arg\\max P(c|x)$,$E = 1 - \\sum_c p(c|x)p(c|nn(x)) = 1 - \\sum_c p(c|x)^2 \\le 1 - p(c^*|x) = (1+p(c^*|x))(1-p(c^*|x)) \\le 2 (1-p(c^*|x))$" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "* 【PCA】PCA特征值分解的方法等价于找到某个向低维空间的投影,使得各个样本的重构误差最小;也等价于找到某个向低维空间的投影,使得各个样本投影的方差最大\n", 205 | " - $|x_n\\rangle = \\sum_1^d |u_i\\rangle\\langle u_i|x_n\\rangle + \\sum_{d+1}^p |u_i\\rangle\\langle u_i|\\bar{x}\\rangle$,重构误差$J = \\dfrac{1}{N} \\sumsamples || x_i - x_i^{recon}||^2$,转化为优化问题,然后用拉格朗日乘子法\n", 206 | " - 方差$Var = \\dfrac{1}{N} \\sumsamples ( \\langle x_i | u_j \\rangle - \\langle \\bar{x} | u_j \\rangle )^2$; 假设希望找到一个主轴方向为$u$,满足$||u||^2=1$,其实就是解最优化问题$\\max \\frac{1}{m} \\sumoversample ({\\samplexi}^T u)^2 = \\frac{1}{m} u^T \\Sigma u$,其中$\\Sigma =\\frac{1}{m}\\sumoversample {\\samplexi}^T \\samplexi \\in \\mathbb{R}^{n\\times n}$。称$\\Sigma$为Covariance matrix,设$\\{u_k\\}$是其前k个特征向量,因此可以用每个样本的特征向量和这k个特征向量分别做内积,得到的k个数字组成新的k维特征向量。" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "* 【Empirical risk and generalization error】\n", 214 | " - empirical risk: $\\newcommand{\\epsilonhat}{\\widehat{\\epsilon}} \\epsilonhat(h) = \\dfrac{1}{m} \\sum_{i} I(h(x^{(i)}) \\neq y^{(i})$\n", 215 | " - generalization error: $\\epsilon(h) = P_{(x,y)\\sim D}(h(x)\\neq y)$\n", 216 | " - empirical risk minimization (ERM) is to find parameter $\\theta$ that minimize empirical risk, or find hypothesis among hypothesis space that minimize empirical risk.\n", 217 | "* 【PAC】\n", 218 | " - Theorem: 当样本数$m$满足以下关系时,能够保证empirical risk能够以大概率($1-\\delta$),近似$|\\epsilonhat - \\epsilon| \\le \\gamma$等于generalization error。\n", 219 | " - 证明:其实最后是要证明$P(\\neg \\exists h\\in \\mathcal{H} |\\epsilonhat(h) - \\epsilon(h)| > \\gamma) \\le 1 - 2 k \\exp(-2\\gamma^2 m)$,用Union Bound把它转化为单个的,再用Chernoff bound写出两者接近程度的bound。\n", 220 | " - Theorem:$\\epsilon(\\widehat{h}) \\le \\min_{h\\in\\mathcal{H}} \\epsilon(h) + 2 \\sqrt{\\dfrac{1}{2m} \\log(\\dfrac{2k}{\\delta})}$, where $\\widehat{h} = arg\\min \\epsilonhat(h)$, define $\\widehat{h^*} = arg\\min \\epsilon(h)$.\n", 221 | " - 证明:$\\epsilon(\\widehat{h}) \\le \\epsilonhat(\\widehat{h}) + \\gamma \\le \\epsilonhat(h^*) + \\gamma \\le \\epsilon(h^*) + 2 \\gamma$\n", 222 | "* 【VC Dimension】能够找到一个维度为d的样本集,使得该假设能够实现所有$2^d$种binary标签组合(打散);但是不能打散任意一个维度为d+1的样本组合;$\\gamma = |\\epsilon(h)-\\epsilonhat(h)| \\le O(\\dfrac{d}{m} \\log \\dfrac{m}{d} + \\dfrac{1}{m} \\log \\dfrac{1}{d})$" 223 | ] 224 | }, 225 | { 226 | "cell_type": 
"markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "* 【Generative and Discriminative Model】前者是学习$p(y|x)$,希望直接从input space $\\rightarrow$ label,后者是学习$p(x|y)$,通过贝叶斯公式$p(y|x)=p(x|y)p(y)/p(x)$来做判别\n", 230 | "* 【Guassian Discriminate Analysis】模型,结论,方法对数似然函数求导\n", 231 | "$$\n", 232 | "y \\sim Bernoulli(\\Phi) = p(y) = \\Phi^y (1-\\Phi)^{1-y} \\\\\n", 233 | "x|y=0 \\sim N(\\mu_0, \\Sigma) = \\dfrac{1}{(2\\pi)^{n/2} |\\Sigma|^{1/2}} \\exp(-\\dfrac{1}{2}(x-\\mu_0)^T \\Sigma^{-1} (x-\\mu_0)) \\quad\n", 234 | "x|y=1 \\sim N(\\mu_1, \\Sigma)\n", 235 | "$$\n", 236 | "$$\n", 237 | "\\Phi = \\dfrac{1}{m} \\sum_i y^{(i)} \\quad\n", 238 | "\\mu_0 = \\dfrac{\\sum_i (1-y^{(i)}) x^{(i)}}{\\sum_i (1-y^{(i)})} \\quad\n", 239 | "\\mu_1 = \\dfrac{\\sum_i y^{(i)} x^{(i)}}{\\sum_i y^{(i)}} \\quad\n", 240 | "\\Sigma = \\dfrac{1}{m} \\sum_i (x^{(i)} - \\mu_{y^{(i)}}) (x^{(i)} - \\mu_{y^{(i)}})^T\n", 241 | "$$" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "* 【RL】Bellman Equation: $V^\\pi(S) = R(s) + \\gamma \\sum_{s' \\in S} p(s, \\pi(s), s') V^\\pi(s')$" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": { 254 | "collapsed": true 255 | }, 256 | "source": [ 257 | "* 【Generalized Linear Model】Exponential Family:$p(y,\\eta) = b(y) \\exp(\\eta^T T(y) - a(\\eta))$;通常$T(y)=y$, $\\eta = \\theta^T x$(这就是为什么叫linear);用处:假设数据服从一个分布,用一组参数$\\Phi$表示;把它化为exponential family,写出$\\eta, a(\\eta), b(y)$分别是什么;然后可以写出$p(y|x,\\theta)$具体的形式" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "* 【Naive Bayes】email spam,一篇文章用一个向量$x\\in \\mathbb{R}^{|V|}$表示,如果出现某个词就是1,如果没出现就是0;要学习的参数是$\\Phi = p(y=1)$,$\\Phi_{i|y=c} = p(x_i=1|y=c)$;取似然函数对数求导,得到结果,最后的结论是分子分母计数统计。" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "* 【Online Learning】perceptron algorithm for online learning: define $h_\\theta(x) = sign(\\theta^T x)$ returns -1 or 1, label $y \\in \\{-1, +1\\}$, if the algorithm goes wrong at some example do $\\theta \\leftarrow \\theta + \\sampleyi \\samplexi$;\n", 272 | " - Theorem: $||\\samplexi|| \\le D$, $\\exists u$ s.t. $||u||^2 = 1$, $\\sampleyi (u^T \\samplexi) \\ge \\gamma$, total number of mistaks the algorithm makes $\\le (\\dfrac{D}{\\gamma})^2$\n", 273 | " - 证明:假设出现第k个错误时为$\\theta^{(k)}$,数学归纳法证${\\theta^{(k+1)}}^T u \\ge k\\gamma$, $||\\theta^{(k+1)}||^2 \\le k D^2$" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "* 【Factor Analysis】如果样本的数目m没有输入样本的维度n多的时候,用Gaussian模型来求解的时候协方差矩阵很容易是Singlar Matrix。考虑用一个维度比较低的隐变量$z\\in\\mathbb{R}^k, \\ k 0$;always optimal;time and space $O(b^{1 + \\lfloor C^* / \\epsilon \\rfloor})$, where $C^*$ is the optimal solution\n", 337 | "* 【DFS (with tree search frame)】not complete: 由于tree search中不会记录已访问的态,因此可能陷入循环;如果记录的话,就没有相对于BFS的空间优势了;not optimal: 先访问到的不一定最优;time $O(b^m)$; space $O(bm)$ (frontier中最多装bm个节点),其中$m$为搜索的最深深度\n", 338 | "* 【Depth Limited DPS (with tree search frame)】和DFS相同,只不过$m$代表的是限制的深度\n", 339 | "* 【Iterative Deepening】complete if $b$ is finite;optimal if cost are identical;time $O(b^d)$; space $O(bd)$\n", 340 | "* 【Bidirectional Search】not applicable in all cases;complete if $b$ is finite and both directions are BFS;optimal if step costs are identical and BFS;space and time $O(b^{d/2})$\n", 341 | "* 【A-Star】complete if #node equal or less than $C^*$ is finite, i.e. 
all step cost $\\ge \\epsilon$ and $b$ is finite;time and space $O(b^m)$;optimal: addmisible $h(n) \\le h^*(n)$ for tree-search; consistent $h(n) \\le c(n, a, n') + h(n')$ for graph-search\n", 342 | " - proof: 1. 证明路径上$f(n) = g(n) + h(n)$是越来越大的;2. 证明一个节点被找到,通往这个节点的最优路径就被找到了(反证法,由于f小的先被扩展)\n", 343 | "* 【Hill-climbing Search】每次只考虑邻域,如果结果变好就走到该邻域,使用random restart技术可以使得算法变得complete\n", 344 | "* 【Simulated Annealing】考虑邻域,如果结果变好就接受它,如果没有变好,以概率$e^{- |\\Delta E| / T}$接受\n", 345 | "* 【Local Beam Search】使用k个agent来一起做Hill-climbing Search,如果有某些agent发现哪里比较好,会让其他agent也来搜索这一块地方\n", 346 | "* 【Genetic Algorithm】random select by fitness func, mutate and reproduce\n", 347 | "* 【Searching with Nondeterministic Actions】环境的因素不确定的时候,使用AND-OR search的方法,OR节点表示agent可以自己选择的节点,AND节点表示环境随机选择的节点。目标是找到一棵树,使得每个叶子节点都是一个目标状态,在OR节点上表明要做出的选择,每个AND节点的分支都要考虑。\n", 348 | "* 【Searching with Partial Observation】在观察受限的时候,可以使用belief state,它是多个可能实际状态的集合;每一个action把一个belief state转到另一个,旧的belief state中每一个状态都包含在新的中;观察到新状态的部分信息之后,可以把belief state缩减为一个更小的belief state" 349 | ] 350 | } 351 | ], 352 | "metadata": { 353 | "kernelspec": { 354 | "display_name": "Python 3", 355 | "language": "python", 356 | "name": "python3" 357 | }, 358 | "language_info": { 359 | "codemirror_mode": { 360 | "name": "ipython", 361 | "version": 3 362 | }, 363 | "file_extension": ".py", 364 | "mimetype": "text/x-python", 365 | "name": "python", 366 | "nbconvert_exporter": "python", 367 | "pygments_lexer": "ipython3", 368 | "version": "3.6.3" 369 | } 370 | }, 371 | "nbformat": 4, 372 | "nbformat_minor": 2 373 | } 374 | -------------------------------------------------------------------------------- /Summary_Basic.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Math\n", 8 | "\n", 9 | "### Union Bound\n", 10 | "\n", 11 | "$$\n", 12 | "P(A \\cup B) \\le P(A) + P(B)\n", 13 | "$$\n", 14 | "\n", 15 | "### Chernoff Bound\n", 16 | "\n", 17 | "$Z_1, \\cdots, Z_m \\in \\{0, 1\\}$ i.i.d. $p(z=1) = \\Phi, p(z=0) = 1 - \\Phi$, let $\\widehat{\\Phi} = \\dfrac{1}{m} \\sum_i Z_i$, we have \n", 18 | "$$P(|\\Phi - \\widehat{\\Phi}| > \\gamma) \\le 2\\exp (-2\\gamma^2m)$$" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "### Jensen's Inequality\n", 26 | "\n", 27 | "For convex function $f$ and random variable $X$, $\\mathbb{E}[f(X)] \\ge f(\\mathbb{E} X)$. 
Equality exists when $X = \\mathbb{E}X, \\ w.p.1$" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "### Multivariant Gaussian Distribution\n", 35 | "\n", 36 | "$$\n", 37 | "p(x;\\mu,\\Sigma) = \\dfrac{1}{(2\\pi)^{n/2} |\\Sigma|^{1/2}} \\exp(-\\dfrac{1}{2}(x-\\mu)^T \\Sigma^{-1} (x-\\mu))\n", 38 | "$$\n", 39 | "\n", 40 | "Derivatives:\n", 41 | "\n", 42 | "$$\n", 43 | "\\frac{ \\partial \\log p({\\boldsymbol x};{\\boldsymbol \\mu},{\\boldsymbol \\Sigma}) }{ \\partial {\\boldsymbol \\mu}}\n", 44 | "= {\\boldsymbol \\Sigma}^{-1} \\left( {\\boldsymbol x} - {\\boldsymbol \\mu} \\right) \n", 45 | "$$\n", 46 | "\n", 47 | "$$\n", 48 | "\\frac{ \\partial \\log p({\\boldsymbol x};{\\boldsymbol \\mu},{\\boldsymbol \\Sigma}) }{ \\partial {\\boldsymbol \\Sigma}}\n", 49 | "= \\frac{1}{2} \\left( \n", 50 | "{\\boldsymbol \\Sigma}^{-1} \n", 51 | "\\left( {\\boldsymbol x} - {\\boldsymbol \\mu} \\right)\n", 52 | "\\left( {\\boldsymbol x} - {\\boldsymbol \\mu} \\right)^T\n", 53 | "{\\boldsymbol \\Sigma}^{-1} \n", 54 | "- {\\boldsymbol \\Sigma}^{-1} \\right)\n", 55 | "$$\n", 56 | "\n", 57 | "Conditional distribution, $x_1|x_2 \\sim \\mathcal{N}(\\mu_{1|2}, \\Sigma_{1|2})$, wherre $\\mu_{1|2} = \\mu_1 + \\Sigma_{12} \\Sigma_{22}^{-1} (x_2 - \\mu_2)$ and $\\Sigma_{1|2} = \\Sigma_{11}- \\Sigma_{12}\\Sigma_{22}^{-1} \\Sigma_{21}$\n", 58 | "\n", 59 | "Marginal distribution, $x_1 \\sim \\mathcal{N}(\\mu_1, \\Sigma_{11})$" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "### Matrix\n", 67 | "\n", 68 | "* $\\nabla_A tr(AB) = B^T$\n", 69 | "* $\\nabla_{A^T} f(A) = (\\nabla_A f(A))^T$\n", 70 | "* $\\nabla_A tr(ABA^TC) = CAB + C^T A B^T$\n", 71 | "* $\\nabla_A |A| = |A|(A^{-1})^T$" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "### Bayes' Theorem\n", 79 | "\n", 80 | "$$P(A|B) = \\dfrac{P(B|A)P(A)}{P(B)}$$" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "### KL Divergence\n", 88 | "\n", 89 | "$$\n", 90 | "KL(P||Q) = -\\sum_i P(i) \\log \\dfrac{Q(i)}{P(i)}\n", 91 | "$$" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "### Hoeffding's Inequality\n", 99 | "\n", 100 | "Define $p(H(n) \\le k) = \\sum_{i=0}^k C_n^i p^i (1-p)^{n-i}$, we have $p(H(n) \\le k) \\le exp(-2(p-\\dfrac{1}{2})^2 n)$ and $p(H(n) \\ge k) \\le exp(-2(\\dfrac{1}{2}-p)^2 n)$" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "## AI-Book Chapter 3\n", 108 | "\n", 109 | "### Searching\n", 110 | "\n", 111 | "* 两种搜索的范式:Tree Search和Graph Search\n", 112 | "\n", 113 | "```\n", 114 | "def tree_search:\n", 115 | " while:\n", 116 | " if frontier is empty:\n", 117 | " return FAILURE\n", 118 | " choose a leaf node and remove if from frontier\n", 119 | " if goal(node):\n", 120 | " return corresponding solution\n", 121 | " expand the node\n", 122 | " add the resulting nodes to frontier\n", 123 | "```\n", 124 | "\n", 125 | "```\n", 126 | "def graph_search:\n", 127 | " explored_set = []\n", 128 | " while:\n", 129 | " if frontier is empty:\n", 130 | " return FAILURE\n", 131 | " choose a leaf node and remove if from frontier\n", 132 | " if goal(node):\n", 133 | " return corresponding solution\n", 134 | " explored_set += [node]\n", 135 | " expand the node\n", 136 | " add the resulting nodes to frontier if not in frontier of explored_set\n", 137 | "```\n", 138 | "\n", 139 | "* 搜索算法的衡量标准\n", 
140 | " - completeness:如果存在解,是否能够找到\n", 141 | " - optimality:是否能够找到最优解\n", 142 | " - time complexity / space complexity\n", 143 | "\n", 144 | "### Uninformed Search Strategies\n", 145 | "\n", 146 | "* BFS\n", 147 | " - complete if branching factor $b$ is finite\n", 148 | " - optimal if step costs are identical\n", 149 | " - time and space $O(b^d)$, where $d$ is depth\n", 150 | "* Uniform-cost Search\n", 151 | " - frontier采用一个优先队列,每次弹出cost最小的节点,并且扩展的时候如果节点比之前的cost小,会进行更新;在BFS的基础上这样做保证了step cost不相同时返回结果的最优性\n", 152 | " - complete if $b$ is finite and step cost $\\ge \\epsilon > 0$\n", 153 | " - always optimal\n", 154 | " - time and space $O(b^{1 + \\lfloor C^* / \\epsilon \\rfloor})$, where $C^*$ is the optimal solution\n", 155 | "* DFS (with tree search frame)\n", 156 | " - not complete: 由于tree search中不会记录已访问的态,因此可能陷入循环;如果记录的话,就没有相对于BFS的空间优势了\n", 157 | " - not optimal: 先访问到的不一定最优\n", 158 | " - time $O(b^m)$; space $O(bm)$ (frontier中最多装bm个节点),其中$m$为搜索的最深深度\n", 159 | "* Depth Limited DPS (with tree search frame)\n", 160 | " - 和DFS相同,只不过$m$代表的是限制的深度\n", 161 | "* Iterative Deepening\n", 162 | " - complete if $b$ is finite\n", 163 | " - optimal if cost are identical\n", 164 | " - time $O(b^d)$; space $O(bd)$\n", 165 | "* Bidirectional Search: not applicable in all cases\n", 166 | " - complete if $b$ is finite and both directions are BFS\n", 167 | " - optimal if step costs are identical and BFS\n", 168 | " - space and time $O(b^{d/2})$\n", 169 | "\n", 170 | "### Informed (Heuristic) Search\n", 171 | "\n", 172 | "* A-Star\n", 173 | " - complete if #node equal or less than $C^*$ is finite, i.e. all step cost $\\ge \\epsilon$ and $b$ is finite\n", 174 | " - optimal: addmisible $h(n) \\le h^*(n)$ for tree-search; consistent $h(n) \\le c(n, a, n') + h(n')$ for graph-search\n", 175 | " - proof: 1. 证明路径上$f(n) = g(n) + h(n)$是越来越大的;2. 
证明一个节点被找到,通往这个节点的最优路径就被找到了(反证法,由于f小的先被扩展)\n",
176 | " - time and space $O(b^m)$\n",
177 | "\n",
178 | "## AI-Book Chapter 4\n",
179 | "\n",
180 | "### Local Search \n",
181 | "* Hill-climbing Search:每次只考虑邻域,如果结果变好就走到该邻域,使用random restart技术可以使得算法变得complete\n",
182 | "* Simulated Annealing:考虑邻域,如果结果变好就接受它,如果没有变好,以概率$e^{- |\\Delta E| / T}$接受\n",
183 | "* Local Beam Search:使用k个agent来一起做Hill-climbing Search,如果有某些agent发现哪里比较好,会让其他agent也来搜索这一块地方\n",
184 | "* Genetic Algorithm:random select by fitness func, mutate and reproduce\n",
185 | "\n",
186 | "### Searching with Nondeterministic Actions or Partial Observation\n",
187 | "* 环境的因素不确定的时候,使用AND-OR search的方法,OR节点表示agent可以自己选择的节点,AND节点表示环境随机选择的节点。目标是找到一棵树,使得每个叶子节点都是一个目标状态,在OR节点上表明要做出的选择,每个AND节点的分支都要考虑。\n",
188 | "* 在观察受限的时候,可以使用belief state,它是多个可能实际状态的集合;每一个action把一个belief state转到另一个,旧的belief state中每一个状态都包含在新的中;观察到新状态的部分信息之后,可以把belief state缩减为一个更小的belief state"
189 | ]
190 | },
191 | {
192 | "cell_type": "markdown",
193 | "metadata": {
194 | "collapsed": true
195 | },
196 | "source": [
197 | "## Andrew Ng Lecture 1\n",
198 | "\n",
199 | "### Minimize Loss Function/Maximize Log Likelihood\n",
200 | "\n",
201 | "* 对Linear Regression的损失函数求梯度\n",
202 | "* 对Linear Regression的似然函数求梯度(认为y随$\\theta^T x$正态分布,即正态噪声)\n",
203 | "* 对Logistic Regression的似然函数求梯度\n",
204 | "\n",
205 | "### Generalized Linear Model\n",
206 | "\n",
207 | "* Exponential Family:$p(y;\\eta) = b(y) \\exp(\\eta^T T(y) - a(\\eta))$\n",
208 | "* 通常$T(y)=y$, $\\eta = \\theta^T x$(这就是为什么叫linear)\n",
209 | "* 用处:假设数据服从一个分布,用一组参数$\\Phi$表示;把它化为exponential family,写出$\\eta, a(\\eta), b(y)$分别是什么;然后可以写出$p(y|x,\\theta)$具体的形式"
210 | ]
211 | },
212 | {
213 | "cell_type": "markdown",
214 | "metadata": {},
215 | "source": [
216 | "## Andrew Ng Lecture 2\n",
217 | "\n",
218 | "### Generative and Discriminative Model\n",
219 | "\n",
220 | "* Discriminative model 是学习$p(y|x)$,希望直接从input space $\\rightarrow$ label\n",
221 | "* Generative model 是学习$p(x|y)$,通过贝叶斯公式$p(y|x)=p(x|y)p(y)/p(x)$来做判别\n",
222 | "\n",
223 | "### Gaussian Discriminant Analysis\n",
224 | "\n",
225 | "$$\n",
226 | "\\begin{aligned}\n",
227 | "y & \\sim Bernoulli(\\Phi) = p(y) = \\Phi^y (1-\\Phi)^{1-y} \\\\\n",
228 | "x|y=0 & \\sim N(\\mu_0, \\Sigma) = \\dfrac{1}{(2\\pi)^{n/2} |\\Sigma|^{1/2}} \\exp(-\\dfrac{1}{2}(x-\\mu_0)^T \\Sigma^{-1} (x-\\mu_0)) \\\\\n",
229 | "x|y=1 & \\sim N(\\mu_1, \\Sigma)\n",
230 | "\\end{aligned}\n",
231 | "$$"
232 | ]
233 | },
234 | {
235 | "cell_type": "markdown",
236 | "metadata": {},
237 | "source": [
238 | "Write down the log-likelihood and take derivatives\n",
239 | "\n",
240 | "$$\n",
241 | "\\begin{aligned}\n",
242 | "\\Phi & = \\dfrac{1}{m} \\sum_i y^{(i)} \\\\\n",
243 | "\\mu_0 &= \\dfrac{\\sum_i (1-y^{(i)}) x^{(i)}}{\\sum_i (1-y^{(i)})} \\\\\n",
244 | "\\mu_1 &= \\dfrac{\\sum_i y^{(i)} x^{(i)}}{\\sum_i y^{(i)}} \\\\\n",
245 | "\\Sigma &= \\dfrac{1}{m} \\sum_i (x^{(i)} - \\mu_{y^{(i)}}) (x^{(i)} - \\mu_{y^{(i)}})^T\n",
246 | "\\end{aligned}\n",
247 | "$$"
248 | ]
249 | },
250 | {
251 | "cell_type": "markdown",
252 | "metadata": {},
253 | "source": [
254 | "GDA can be rewritten in the form of logistic regression. Relationship: GDA is stronger and more data-efficient when its modeling assumption is at least approximately correct; LR makes weaker assumptions but does better on non-Gaussian data, e.g. Poisson data."
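
A minimal NumPy sketch of these closed-form GDA estimates may help; the helper name `fit_gda` and the `(m, n)` data layout with 0/1 labels are assumptions for illustration, not part of the original notes:

```python
import numpy as np

def fit_gda(X, y):
    """MLE for GDA with a shared covariance, using the closed forms above.
    X: (m, n) design matrix; y: (m,) array of 0/1 labels."""
    m = X.shape[0]
    phi = y.mean()                      # Phi = (1/m) * sum_i y_i
    mu0 = X[y == 0].mean(axis=0)        # mean of the negative class
    mu1 = X[y == 1].mean(axis=0)        # mean of the positive class
    mu_y = np.where((y == 1)[:, None], mu1, mu0)  # mu_{y^{(i)}} for each sample
    diff = X - mu_y
    Sigma = diff.T @ diff / m           # shared covariance estimate
    return phi, mu0, mu1, Sigma
```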
255 | ]
256 | },
257 | {
258 | "cell_type": "markdown",
259 | "metadata": {},
260 | "source": [
261 | "### Naive Bayes and Multinomial Event Model\n",
262 | "\n",
263 | "* 都来做email spam任务\n",
264 | "* Naive Bayes:一篇文章用一个向量$x\\in \\mathbb{R}^{|V|}$表示,如果出现某个词就是1,如果没出现就是0;要学习的参数是$\\Phi = p(y=1)$,$\\Phi_{i|y=c} = p(x_i=1|y=c)$;取似然函数对数求导,得到结果,最后的结论是分子分母计数统计。\n",
265 | "* Multinomial Event Model:一篇文章用一个向量$x\\in \\mathbb{R}^{n_j}$表示,其中n是第j篇文章的长度,每一位代表这个词在词库中的编号;要学习的参数$\\Phi = p(y=1)$,$\\Phi_{i=k|y=c} = p(x_i=k|y=c)$,结论也差不多,就是计数统计。\n",
266 | "* Laplace Smoothing:有些词出现很少或者没出现过,所以在计数的时候认为每个基本的类都先出现过一次,即在分子+1,在分母加上类的个数。"
267 | ]
268 | },
269 | {
270 | "cell_type": "markdown",
271 | "metadata": {},
272 | "source": [
273 | "## Andrew Ng Lecture 3\n",
274 | "\n",
275 | "### Lagrange Duality\n",
276 | "\n",
277 | "$$\n",
278 | "\\begin{aligned}\n",
279 | "& \\min_w f(w) \\\\\n",
280 | "& s.t. g_i(w) \\le 0, \\ \\forall i \\in [k] \\\\\n",
281 | "& \\ \\ h_i(w) = 0, i \\in [l]\n",
282 | "\\end{aligned}\n",
283 | "$$\n",
284 | "\n",
285 | "Lagrange function $L(w,\\alpha,\\beta)=f(w) + \\sum \\alpha_i g_i(w) + \\sum \\beta_i h_i(w)$, define $\\theta_P(w) = \\max_{\\alpha, \\beta: \\alpha_i \\ge 0} L(w,\\alpha,\\beta)$ and $\\theta_D (\\alpha, \\beta) = \\min_w L(w,\\alpha,\\beta)$. We have $\\max_{\\alpha, \\beta: \\alpha_i \\ge 0} \\theta_D (\\alpha, \\beta) = d^* \\le p^* = \\min_w \\theta_P(w)$. Under certain conditions ($f$ and $g_i$ are convex; $\\{g_i\\}$ is feasible), $d^* = p^*$. At this optimal point, the KKT conditions must be satisfied\n",
286 | "\n",
287 | "$$\n",
288 | "\\begin{aligned}\n",
289 | "& \\dfrac{\\partial L }{\\partial w_i} = 0 \\\\\n",
290 | "& \\dfrac{\\partial L }{\\partial \\beta_i} = 0 \\\\\n",
291 | "& \\alpha_i g_i(w) = 0 \\\\\n",
292 | "& g_i(w) \\le 0 \\\\\n",
293 | "& \\alpha_i \\ge 0 \n",
294 | "\\end{aligned}\n",
295 | "$$"
296 | ]
297 | },
298 | {
299 | "cell_type": "markdown",
300 | "metadata": {},
301 | "source": [
302 | "### SVM Deduction\n",
303 | "\n",
304 | "Start from \n",
305 | "\n",
306 | "$$\n",
307 | "\\max_{\\gamma, w, b} \\gamma \\quad s.t. y^{(i)}(w^T x^{(i)} + b) \\ge \\gamma, \\ ||w|| = 1\n",
308 | "$$\n",
309 | "\n",
310 | "Considering scaling invariance, we can rescale $w, b$ so that $\\gamma = 1$; maximizing the margin $1/||w||$ is then equivalent to\n",
311 | "\n",
312 | "$$\n",
313 | "\\min_{w, b} \\dfrac{1}{2} ||w||^2 \\quad s.t. y^{(i)}(w^T x^{(i)} + b) \\ge 1\n",
314 | "$$\n",
315 | "\n",
316 | "By using Lagrange duality,\n",
317 | "\n",
318 | "$$\n",
319 | "\\max_\\alpha W(\\alpha) = \\sum_i \\alpha_i - \\dfrac{1}{2} \\sum_{i,j} y^{(i)}y^{(j)}\\alpha_i\\alpha_j x^{(i)} \\cdot x^{(j)}\n",
320 | "\\quad s.t. \\alpha_i \\ge 0, \\sum_i \\alpha_i y^{(i)} = 0\n",
321 | "$$\n",
322 | "\n",
323 | "For the soft-margin (linearly non-separable) case,\n",
324 | "\n",
325 | "$$\n",
326 | "\\min_{w, b} \\dfrac{1}{2} ||w||^2 + C \\sum \\xi_i \\quad s.t. y^{(i)}(w^T x^{(i)} + b) \\ge 1 - \\xi_i, \\ \\xi_i \\ge 0\n",
327 | "$$\n",
328 | "\n",
329 | "we have \n",
330 | "\n",
331 | "$$\n",
332 | "\\max_\\alpha W(\\alpha) = \\sum_i \\alpha_i - \\dfrac{1}{2} \\sum_{i,j} y^{(i)}y^{(j)}\\alpha_i\\alpha_j x^{(i)} \\cdot x^{(j)}\n",
333 | "\\quad s.t. 0 \\le \\alpha_i \\le C, \\sum_i \\alpha_i y^{(i)} = 0\n",
334 | "$$"
335 | ]
336 | },
337 | {
338 | "cell_type": "markdown",
339 | "metadata": {},
340 | "source": [
341 | "### Kernel\n",
342 | "\n",
343 | "A kernel function corresponds to an inner product of feature vectors. 
Theorem: $K: \\mathbb{R}^n \\times \\mathbb{R}^n \\to \\mathbb{R}$ is a valid kernel $\\Longleftrightarrow \\forall \\{x^{(1)}, \\cdots, x^{(m)}\\}$, the corresponding kernel matrix is symmetric positive semidefinite."
344 | ]
345 | },
346 | {
347 | "cell_type": "markdown",
348 | "metadata": {},
349 | "source": [
350 | "### SMO\n",
351 | "\n",
352 | "SMO是用来解SVM的一个算法,主要思路是coordinate ascent,每次选定$\\alpha_1, \\alpha_2$,然后在不违反约束的情况下,最大化$W(\\alpha)$。需要满足的约束是$\\sum_i \\alpha_i y_i = 0 \\Rightarrow \\alpha_1 = (\\xi - \\alpha_2 y_2) y_1$和$\\alpha_1, \\alpha_2 \\in [0, C] \\Rightarrow L \\le \\alpha_2 \\le H$。把第一个约束条件代入,$W(\\alpha) = W((\\xi - \\alpha_2 y_2) y_1, \\alpha_2, \\cdots, \\alpha_m) = a\\alpha_2^2 + b\\alpha_2 + c$,二次型能够求出最值$\\alpha_2^*$,最后$\\alpha_2 \\leftarrow clip(\\alpha_2^*, L, H)$。迭代进行。"
353 | ]
354 | },
355 | {
356 | "cell_type": "markdown",
357 | "metadata": {},
358 | "source": [
359 | "## Andrew Ng Lecture 4\n",
360 | "\n",
361 | "### Bias and variance\n",
362 | "\n",
363 | "* Bias: expected generalization error even if we fit the model on a very large training set. A simple model with few parameters suffers from large bias.\n",
364 | "* Variance: a model has large variance if it varies a lot when fit on different datasets drawn from the same distribution. A complex model with many parameters suffers from large variance.\n",
365 | "\n",
366 | "### Empirical risk and generalization error\n",
367 | "\n",
368 | "* empirical risk: $\\newcommand{\\epsilonhat}{\\widehat{\\epsilon}} \\epsilonhat(h) = \\dfrac{1}{m} \\sum_{i} I(h(x^{(i)}) \\neq y^{(i)})$\n",
369 | "* generalization error: $\\epsilon(h) = P_{(x,y)\\sim D}(h(x)\\neq y)$\n",
370 | "* empirical risk minimization (ERM) is to find the parameter $\\theta$ (or the hypothesis in the hypothesis space) that minimizes the empirical risk.\n",
371 | "\n",
372 | "### PAC\n",
373 | "\n",
374 | "* Theorem: 当样本数满足$m \\ge \\dfrac{1}{2\\gamma^2} \\log \\dfrac{2k}{\\delta}$(其中$k = |\\mathcal{H}|$)时,能够保证对所有假设,empirical risk以大概率($1-\\delta$)近似等于generalization error,即$|\\epsilonhat - \\epsilon| \\le \\gamma$。\n",
375 | "* 证明:其实最后是要证明$P(\\neg \\exists h\\in \\mathcal{H},\\ |\\epsilonhat(h) - \\epsilon(h)| > \\gamma) \\ge 1 - 2 k \\exp(-2\\gamma^2 m)$,用Union Bound把它转化为单个的,再用Chernoff bound写出两者接近程度的bound。\n",
376 | "\n",
377 | "* Theorem:$\\epsilon(\\widehat{h}) \\le \\min_{h\\in\\mathcal{H}} \\epsilon(h) + 2 \\sqrt{\\dfrac{1}{2m} \\log(\\dfrac{2k}{\\delta})}$, where $\\widehat{h} = arg\\min_{h\\in\\mathcal{H}} \\epsilonhat(h)$; define $h^* = arg\\min_{h\\in\\mathcal{H}} \\epsilon(h)$.\n",
378 | "* 证明:$\\epsilon(\\widehat{h}) \\le \\epsilonhat(\\widehat{h}) + \\gamma \\le \\epsilonhat(h^*) + \\gamma \\le \\epsilon(h^*) + 2 \\gamma$"
379 | ]
380 | },
381 | {
382 | "cell_type": "markdown",
383 | "metadata": {},
384 | "source": [
385 | "### VC-dimension\n",
386 | "\n",
387 | "* VC-dimension解决了上述理论里面$k=|\\mathcal{H}|$在参数连续情况下发散的情况\n",
388 | "* shatter:$\\mathcal{H}$ shatters $S$ if $\\mathcal{H}$ can realize any binary label on $S = \\{x^{(1)}, \\cdots, x^{(d)}\\}$\n",
389 | "* $VC(\\mathcal{H}) = k$ 表示最多能找到一个包含k个点的$S$使得$\\mathcal{H}$能够把$S$分开。注意,找到一个这样k个点的集合即可,并不要求任意的k个点的集合,$\\mathcal{H}$都能分开。\n",
390 | "* Theorem: with probability at least $1-\\delta$, $\\gamma = |\\epsilon(h)-\\epsilonhat(h)| \\le O(\\sqrt{\\dfrac{d}{m} \\log \\dfrac{m}{d} + \\dfrac{1}{m} \\log \\dfrac{1}{\\delta}})$, where $d$ is the VC-dimension"
391 | ]
392 | },
393 | {
394 | "cell_type": "markdown",
395 | "metadata": {},
396 | "source": [
397 | "## Andrew Ng Lecture 5\n",
398 | "\n",
399 | "### Model selection by k-fold CV\n",
400 | "\n",
401 | "```\n",
402 | "def model_selection_kfold(M1, M2, ..., Md):\n",
403 | " randomly split S into k sets S1, ..., Sk\n",
404 | " for each Mi:\n", 405 | " for j = 1 to k:\n", 406 | " train on S/Sj test on Sj\n", 407 | " epsilon[Mi] = mean(epsilon[Mi, j], j)\n", 408 | " pick Mi with lowest epsilon[Mi] and retrain the model\n", 409 | "```\n", 410 | "\n", 411 | "### Full Bayes, MAP and ML\n", 412 | "\n", 413 | "* Full Bayes:\n", 414 | "on training \n", 415 | "$$\n", 416 | "\\begin{aligned}\n", 417 | "\\newcommand{\\sampleyi}{y^{(i)}}\n", 418 | "\\newcommand{\\samplexi}{x^{(i)}}\n", 419 | "\\newcommand{\\sumoversample}{\\sum_{i=1}^m}\n", 420 | "\\newcommand{\\prodoversample}{\\prod_{i=1}^m}\n", 421 | "p(\\theta| S) & = \\dfrac{p(S|\\theta)p(\\theta)}{p(S)} \\\\\n", 422 | "& = \\dfrac{p(\\theta) \\prodoversample p(\\sampleyi | \\samplexi, \\theta)}{\\int_\\theta (\\prodoversample p(\\sampleyi | \\samplexi, \\theta) ) p(\\theta) d\\theta}\n", 423 | "\\end{aligned}\n", 424 | "$$\n", 425 | "on new example\n", 426 | "$$ p(y|x, S) = \\int_\\theta p(y|x,\\theta) p(\\theta|S) d\\theta $$" 427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "metadata": {}, 432 | "source": [ 433 | "* MAP (maximum a posteriori): choose only the maximizer $\\theta$, $\\theta_{MAP} = arg\\max_\\theta \\prodoversample p(\\sampleyi | \\samplexi, \\theta) p(\\theta)$. Choosing $p(\\theta) \\sim \\mathcal{N}(0, \\tau^2 I)$ serves as a regularization.\n", 434 | "* ML (maximum a likelihood): $\\theta_{ML} = arg\\max_\\theta \\prodoversample p(\\sampleyi | \\samplexi, \\theta)$" 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "## Andrew Ng Lecture 6\n", 442 | "\n", 443 | "### Online Learning\n", 444 | "\n", 445 | "* from batch learning to online learning\n", 446 | "* perceptron algorithm for online learning: define $h_\\theta(x) = sign(\\theta^T x)$ returns -1 or 1, label $y \\in \\{-1, +1\\}$, if the algorithm goes wrong at some example do $\\theta \\leftarrow \\theta + \\sampleyi \\samplexi$\n", 447 | "* Theorem: $||\\samplexi|| \\le D$, $\\exists u$ s.t. 
$||u||^2 = 1$, $\\sampleyi (u^T \\samplexi) \\ge \\gamma$, total number of mistaks the algorithm makes $\\le (\\dfrac{D}{\\gamma})^2$\n", 448 | "* 证明:假设出现第k个错误时为$\\theta^{(k)}$,数学归纳法证${\\theta^{(k+1)}}^T u \\ge k\\gamma$, $||\\theta^{(k+1)}||^2 \\le k D^2$" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "## Andrew Ng Lecture 7\n", 456 | "\n", 457 | "### k-means\n", 458 | "\n", 459 | "* 算法:repeat $c^{(i)} = arg \\min_j ||\\samplexi - \\mu_j||^2, \\ \\forall i \\in [m]$ and $\\mu_j = \\dfrac{\\sumoversample I(c^{(i)} = j) \\samplexi}{\\sumoversample I(c^{(i)} = j)}$\n", 460 | "* 分析:k-means is exactly coordinate descent on $J(c, \\mu) = \\sumoversample ||\\samplexi - \\mu_{c^{(i)}}||^2$" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": {}, 466 | "source": [ 467 | "### Mixture of Gaussian\n", 468 | "\n", 469 | "![](figures/fig20180407_EM.png)\n", 470 | "\n", 471 | "![](figures/fig2018027_EM.png)" 472 | ] 473 | }, 474 | { 475 | "cell_type": "markdown", 476 | "metadata": {}, 477 | "source": [ 478 | "## Andrew Ng Lecture 8\n", 479 | "\n", 480 | "### EM Algorithm\n", 481 | "\n", 482 | "当隐变量存在的时候,对数似然函数$l(\\theta) = \\sumoversample \\log \\sum_z p(x; x, \\theta)$。注意到log函数是凹函数,使用Jensen不等式,可以得到该似然函数的下界,令似然函数能够取到该下界,得到x的分布$Q(z^{(i)}) = p(z^{(i)} | \\samplexi, \\theta) = \\dfrac{p(\\samplexi, z^{(i)}, \\theta)}{\\sum_z p(\\samplexi, z, \\theta)}$(E-step)。在M-step求一个参数能够最大化该下界,$\\theta = arg\\max_\\theta \\sum_i \\sum_{z^{(i)}} Q(z^{(i)}) \\log \\dfrac{p(\\samplexi, z^{(i)}, \\theta)}{Q(z^{(i)})}$。\n", 483 | "\n", 484 | "EM算法还可以看做是对$J(Q, \\theta) = \\sum_i \\sum_{z^{(i)}} Q(z^{(i)}) \\log \\dfrac{p(\\samplexi, z^{(i)}, \\theta)}{Q(z^{(i)})}$做coordinate ascend。" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": {}, 490 | "source": [ 491 | "## Andrew Ng Lecture 9\n", 492 | "\n", 493 | "### Factor Analysis\n", 494 | "\n", 495 | "如果样本的数目m没有输入样本的维度n多的时候,用Gaussian模型来求解的时候协方差矩阵很容易是Singlar Matrix。考虑用一个维度比较低的隐变量$z\\in\\mathbb{R}^k, \\ ku->v,而是从s~>x(x in S)->y(y not in S)~>v,注意到每次先扩展最小的,所以到v的肯定比到y短,因此不可能从别的点绕一圈回到v更短。\n", 182 | "\n", 183 | "### Bellman-Ford Algorithm\n", 184 | "\n", 185 | "单源最短路径问题,权重可以为负数,但是不能有负环。动态规划,定义$opt[i,v]$为从v到t最多用i条边的最短路径,有状态转移方程\n", 186 | "\n", 187 | "$$\n", 188 | "opt[i,v] = \\min(opt[i-1, v], \\min_{w\\in V} (opt[i-1, w] + c[v, w])) \n", 189 | "$$\n", 190 | "\n", 191 | "计算的时候按照i递增的顺序计算即可。\n", 192 | "\n", 193 | "应用:用于网络路由,每个节点可以维护当前节点到其他节点的数值$M[t]=\\min_{w\\in V} (opt[i-1, w] + c[v, w])$,这样一个包在每次都只用选择相连的$M[t]$最小的节点即可。维护的方法是对于每个节点都去基于$M[v] = \\min(M[v], \\min_u M[u]+c[v,u])$去更新,如果更新了,该节点激活,用于提示相邻节点也看看要不要更新。" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "### Floyd-Warshall算法\n", 201 | "\n", 202 | "多源最短路径问题,权重可以为负数,但是不能有负环。动态规划,定义$d_{ij}^{(k)}$表示从顶点i到顶点j,并且中间路径中顶点属于[k]的最短路径。最开始$d_{ij}^{(0)}=w_{ij}$,如果顶点i和顶点j直接相连。有状态转移方程\n", 203 | "\n", 204 | "$$\n", 205 | "d_{ij}^{(k)} = \\min(d_{ij}^{(k-1)}. 
d_{ik}^{(k-1)} + d_{kj}^{(k-1)})\n", 206 | "$$\n", 207 | "\n", 208 | "三层循环,从外到内分别是k,i,j,运行时间$O(|V|^3)$。要想追踪具体的路径,可以利用一个前驱矩阵来每次记录选用的哪个k。" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "## Minimum spanning tree\n", 216 | "\n", 217 | "### Kruskal's algorithm\n", 218 | "\n", 219 | "* 方法:每次选取一条最短的边,并且这条边不和现有的边构成环\n", 220 | "* 性质:如果一条边$e=(u,v)$是从S到V-S最小的边,那么所以的最小生成树必须包含这条边。证明:如果不选e,那么最小生成树P中从u到v一定有一条路径,这条路径肯定横跨S和V-S,横跨的边为$e'$,把边从$e'$换成$e$可以使得代价更小。\n", 221 | "\n", 222 | "### Prim's algorithm\n", 223 | "\n", 224 | "* 方法:每次加入一条边到S中,每次加入最短的连接到V-S的边\n", 225 | "* 证明方法与上面类似" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "## Sort\n", 233 | "\n", 234 | "### Bubble Sort\n", 235 | "\n", 236 | "```\n", 237 | "for i = 1 to N-1:\n", 238 | " for j = 1 to N-i:\n", 239 | " if A[j] < A[j-1]:\n", 240 | " swap(A[j], A[j-1])\n", 241 | "```\n", 242 | "\n", 243 | "* 证明正确性:每个j循环之后,A[N-i:N]都变为有序的\n", 244 | "* 复杂度:不管什么情况都需要$O(n^2)$次判断,最好情况可以不做交换,平均和最坏需要$O(n^2)$次交换\n", 245 | "\n", 246 | "### Insertion Sort\n", 247 | "\n", 248 | "```\n", 249 | "for i = 2 to N:\n", 250 | " tmp = A[i]\n", 251 | " for j = i downto 2:\n", 252 | " if tmp >= A[j-1]:\n", 253 | " break\n", 254 | " A[j] = A[j-1]\n", 255 | " A[j] = tmp\n", 256 | "```\n", 257 | "\n", 258 | "* 证明正确性:每次j循环之后A[1:i]都变为有序的\n", 259 | "* 运行时间:$O(n^2)$\n", 260 | "\n", 261 | "### Heap Sort\n", 262 | "\n", 263 | "* 最大堆:一颗二叉树,每个父节点都比自己的子节点大\n", 264 | "* 一个最大堆,根节点i被修改了,通过max_heapify可以使得其在$O(\\log n)$时间内维护成最大堆\n", 265 | "\n", 266 | "```\n", 267 | "def max_heapify(A, i):\n", 268 | " l = left(i)\n", 269 | " r = right(i)\n", 270 | " if l <= heapsize and A[l] > A[i]:\n", 271 | " largest = l\n", 272 | " else:\n", 273 | " largest = i\n", 274 | " if r <= heapsize and A[r] > A[largest]\n", 275 | " largest = r\n", 276 | " if largest != i\n", 277 | " swap(A[i], A[largest])\n", 278 | " max_heapify(A, largest)\n", 279 | "```\n", 280 | "\n", 281 | "* 思路:先建立一个堆,由最大堆的性质可以知道,堆的根是最大的元素,把根和最后一个叶子互换,然后把那个叶子从堆中取走,再维护成一个最大堆。如此重复,可以得到排序\n", 282 | "\n", 283 | "```\n", 284 | "def build_heap(A):\n", 285 | " headsize(A) = len(A)\n", 286 | " for i = floor(len(A) / 2) downto 1:\n", 287 | " max_heapify(A, i)\n", 288 | "```\n", 289 | "\n", 290 | "```\n", 291 | "def heap_sort(A)\n", 292 | " build_heap(A)\n", 293 | " for i = N downto 2:\n", 294 | " swap(A[1], A[i])\n", 295 | " heapsize(A) -= 1\n", 296 | " max_heapify(A, 1)\n", 297 | "```" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "## Basic Data Structure\n", 305 | "\n", 306 | "### Binary Search Tree\n", 307 | "\n", 308 | "* 每个节点左子树上面元素的数值都比该节点小,右子树上面元素的数值都比该节点大。\n", 309 | "* 查询:递归判断大小觉得往左还是往右,运行时间为树的平均深度\n", 310 | "* 找最大和最小:一直往左或者往右,运行时间为树的深度\n", 311 | "* 给定一个节点找后继(比该节点数值大的节点中最小的):如果有右子树就找右子树中最大的,如果没有就找祖先中第一个把自己这一支当做左子树的节点\n", 312 | "* 插入:和查找类似,找到能找到的最近的叶子,在该叶子上开一个节点来放自己\n", 313 | "* 删除:如果是一个单节点,直接删除;如果有一个子节点,把子节点接到父节点上面;如果有两个子节点,找出自己的后继,用后继替代自己的位置\n", 314 | "\n", 315 | "### B-tree\n", 316 | "\n", 317 | "* 性质:\n", 318 | " - 每个节点有多个子节点,节点中顺序存放子节点的分割数值\n", 319 | " - 除了根节点之外,每个节点的子节点数目$[t-1, 2t-1]$,如果子节点数目达到$2t-1$就说这个节点是满的\n", 320 | "* 查询:显而易见\n", 321 | "* 插入:如果要插入的节点从上到下都不满,就直接插入;如果满了就需要把满的节点分裂。分裂的方式就是把节点中第t个元素提到父节点中,然后把[1:t-1]和[t+1:2t-1]元素分别分裂成两个分支\n", 322 | "* 删除:情况比较复杂,总体思想就是不能违反B树的性质\n", 323 | "\n", 324 | "### Union-Find Set\n", 325 | "\n", 326 | "* 作用:反映元素和集合之间的关系\n", 327 | "* 主要功能:\n", 328 | " - MAKE_SET(x):建立一个只包含元素x的集合\n", 329 | " - UNION(x,y):把x所在的集合和y所在的集合合并\n", 330 | " - 
FIND_SET(x):找出x所对应的集合\n",
331 | "* 实现:\n",
332 | " * 链表\n",
333 | " * 带路径压缩的树结构"
334 | ]
335 | },
336 | {
337 | "cell_type": "markdown",
338 | "metadata": {},
339 | "source": [
340 | "## Basic Graph\n",
341 | "\n",
342 | "### BFS\n",
343 | "\n",
344 | "```\n",
345 | "def BFS(G, s):\n",
346 | " for each u in G:\n",
347 | " color[u] = WHITE\n",
348 | " color[s] = GRAY\n",
349 | " Q = [s]\n",
350 | " while Q:\n",
351 | " u = Q.pop(0) # FIFO: dequeue from the front\n",
352 | " for each v adj to u:\n",
353 | " if color[v] == WHITE:\n",
354 | " color[v] = GRAY\n",
355 | " Q.append(v)\n",
356 | " color[u] = BLACK\n",
357 | "```\n",
358 | "\n",
359 | "### DFS\n",
360 | "\n",
361 | "```\n",
362 | "def DFS(G):\n",
363 | " for each u in G:\n",
364 | " color[u] = WHITE\n",
365 | " for each u in G:\n",
366 | " if color[u] == WHITE:\n",
367 | " DFS_visit(u)\n",
368 | " \n",
369 | "def DFS_visit(u):\n",
370 | " color[u] = GRAY\n",
371 | " for each v adj to u:\n",
372 | " if color[v] == WHITE:\n",
373 | " DFS_visit(v)\n",
374 | " color[u] = BLACK\n",
375 | "```\n",
376 | "\n",
377 | "### Topological Order\n",
378 | "\n",
379 | "* 性质:在有向无环图中,存在一种排序使得每条边都往后指\n",
380 | "* 运行DFS,并且按照每个节点被标记为黑色的时间先后顺序排序就是拓扑排序;只有某个节点的后继都标黑了它才可能标黑,所以标黑时间靠后的排在前面。\n",
381 | "\n",
382 | "### Strongly Connected Components\n",
383 | "\n",
384 | "* 定义:一个图中的一个节点集合,任意两个节点都可以互通,就叫做强连通分支;问题是给定一个图,找出其强连通分支的分解\n",
385 | "* 图的转置:就是把边都翻过来 $G^T=(V, E^T)$,$E^T = \\{(u,v) | (v, u)\\in E\\}$\n",
386 | "\n",
387 | "```\n",
388 | "def strongly_connected_components(G):\n",
389 | " DFS(G) record turn-black time f(u) for all u\n",
390 | " DFS(G^T) in decreasing order of f(u) and record the forest\n",
391 | " each tree in the forest is a strongly connected component\n",
392 | "```"
393 | ]
394 | },
395 | {
396 | "cell_type": "markdown",
397 | "metadata": {},
398 | "source": [
399 | "## Hashing Table\n",
400 | "\n",
401 | "* 关键字域U很大,但是实际的关键字集合K很小。这时如果要用**直接寻址法**储存每个关键字对应的数值,需要的内存开销为$|U|$。如果能够使用一个hash function $h: U \\rightarrow [m]$就可以把所需要的内存降低到m。\n",
402 | "* 坏处是可能发生不同的关键字被映射到同一个槽位上,即碰撞(collision)。解决方法有chaining和open addressing。\n",
403 | "\n",
404 | "### Chaining \n",
405 | "\n",
406 | "* 做法:每个槽位后面跟一个链表,插入和删除就直接对链表操作时间为$O(1)$\n",
407 | "* 查找的时间:通过计算期望得到$O(1+\\alpha)$,其中$\\alpha = n/m$,$m$为槽位数目,$n$为要储存的元素个数。即如果$m$和$n$成正比,就能在$O(1)$时间内完成查找。\n",
408 | "\n",
409 | "### Open Addressing\n",
410 | "\n",
411 | "* 思路:与chaining不同的是,开放寻址中所有的元素都存在槽位里面,如果发生冲突,就找散列函数对应的下一个槽位。因此其装载因子$\\alpha\\le 1$。\n",
412 | "* 做法:把之前的散列函数的输入加入一个探查号i,$h:U\\times [m] \\rightarrow [m]$,并且希望探查号$i=1,\\cdots,m$对应的槽位是所有槽位的一个排列,不同散列函数尽量对应不同的排列。\n",
413 | "* 具体散列函数:\n",
414 | " - 线性探查:$h(k,i)=(h'(k) + i) \\mod m$\n",
415 | " - 二次探查:$h(k,i)=(h'(k) + c_1i+c_2i^2) \\mod m$\n",
416 | " - 双重探查:$h(k,i)=(h_1(k) + i h_2(k)) \\mod m$\n",
417 | "\n",
418 | "### Universal Hashing\n",
419 | "\n",
420 | "* 思路:如果固定一个散列函数,对手都可以选择全部散列到同一个槽位的n个元素给你作对,使得算法的检索时间为$\\Theta(n)$。我们希望设计一组散列函数,每次随机选择一个散列函数,这样就有较好的平均性能。\n",
421 | "* 定义:对于任意的不同关键字$k,l\\in U$,满足$h(k) = h(l)$的散列函数$h\\in \\mathcal{H}$的个数至多为$|\\mathcal{H}|/m$。\n",
422 | "* 设计一个universal hashing:$\\mathcal{H}_{p,m} = \\{ h_{a,b} : a\\in[1:p-1], b\\in[0:p-1]\\}$,其中$h_{a,b}(k) =((ak+b) \\mod p) \\mod m$\n",
423 | "\n",
424 | "### Perfect Hashing\n",
425 | "\n",
426 | "* 定义:如果某一种散列技术在进行查找时,最坏情况的性能也是$O(1)$的话,称其为perfect hashing。仅在关键字是静态时,才有完全散列。\n",
427 | "* 做法:采用两级散列表,第一级散列表照常进行,第二级散列表选择的槽位数目$m_j=n_j^2$,可以证明大于$1/2$的概率可以一次选的一个没有碰撞的二级散列表(注意到关键字是静态的),通过几次尝试可以选择一个没有碰撞的散列函数。由于第二级散列表没有冲突,因此最坏查询时间也是$O(1)$。同时可以证明,总体存储空间也是$O(n)$,即$\\mathbb{E}[\\sum_{j=1}^{m-1}n_j^2] < 2n$。"
428 | ]
429 | },
430 | {
431
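
A small Python sketch of the universal family $\mathcal{H}_{p,m}$ defined above; the specific prime $p = 2^{31}-1$ and the helper name `make_universal_hash` are illustrative assumptions (keys are assumed to be integers smaller than $p$):

```python
import random

P = 2**31 - 1   # a Mersenne prime; p must exceed every key in U

def make_universal_hash(m, p=P):
    """Sample h_{a,b}(k) = ((a*k + b) mod p) mod m uniformly from H_{p,m}."""
    a = random.randint(1, p - 1)   # a in [1 : p-1]
    b = random.randint(0, p - 1)   # b in [0 : p-1]
    return lambda k: ((a * k + b) % p) % m

h = make_universal_hash(m=1024)
# For fixed distinct keys k != l, Pr over the draw of (a, b) that
# h(k) == h(l) is at most 1/m, which is the universality guarantee above.
print(h(42), h(1042))
```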
| "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "## Flow\n", 435 | "\n", 436 | "* 流网络:$G(V,E)$每条边关联一个非负容量c;存在单一源点$s\\in V$;存在单一汇点$t\\in V$。\n", 437 | "* 流函数:$f:E\\rightarrow\\mathbb{R}^+$,并且满足\n", 438 | " - 容量条件:对于每条边$e\\in E$,$0\\le f(e)\\le c[e]$\n", 439 | " - 守恒条件:$\\forall e\\in E \\setminus \\{s, t\\}$,$\\sum_{e\\text{ towards }v} f(e) = \\sum_{e\\text{ from }v} f(e)$\n", 440 | "* 最大流:使得$v(f) = f^{out}(s)$最大,其中$f^{out}(v) = \\sum_{e\\text{ from }v} f(e)$\n", 441 | "\n", 442 | "### Ford-Fulkerson\n", 443 | "\n", 444 | "* 剩余图:$G_f$的点集和$G$的点集相同,每条边$e=(u,v)$,如果$f(e)0$,则在$G_f$中构建一条容量为$f(e)$的前向边$(v,u)$\n", 445 | "\n", 446 | "* 对于一条简单路径P、现在已经存在的流f,去增加f的流量。其中``bottleneck(P,f)``是P上任何边关于f的最小剩余容量(还可以增大多少容量)。\n", 447 | "\n", 448 | "```\n", 449 | "def augment(f, P):\n", 450 | " b = bottleneck(P, f)\n", 451 | " for e in P:\n", 452 | " if e is forward edge in G_f:\n", 453 | " f(e) += b\n", 454 | " elif e is backward edge in G_f:\n", 455 | " f(e) -= b\n", 456 | " return f\n", 457 | "```\n", 458 | "\n", 459 | "```\n", 460 | "def max_flow:\n", 461 | " f(e) = 0 for all e in G\n", 462 | " while exist s-t path in G_f:\n", 463 | " P = the exist s-t path\n", 464 | " f' = augment(f, P)\n", 465 | " f = f'\n", 466 | " G_f = G_f'\n", 467 | " return f\n", 468 | "```\n", 469 | "\n", 470 | "* 分析:如果容量都是整数。每次产生的新的流,由于bottleneck>0,$v(f') = v(f) + bottleneck(P,f)$,所以每次循环都变好。假设最大流为C,因此算法肯定能够在$O(C(|V|+|E|))$内找到答案。" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "### Max Flow and Minimum Cut\n", 478 | "\n", 479 | "* s-t割:是对于结点集合V的一个划分(A,B),使得$s\\in A$,$t\\in B$。一个割的容量记为$c(A,B)=\\sum_{e from A} c[e]$\n", 480 | "* 性质:\n", 481 | " - f是任意s-t流,(A,B)是任意s-t割,那么$v(f) = f^{out}(A)-f^{in}(A) = f^{in}(B)-f^{out}(B)$,$v(f) \\le c(A,B)$\n", 482 | " - Ford-Fulkerson返回最大流:f是剩余图中没有s-t路径的一个s-t流,那么G中存在一个s-t割$(A^*,B^*)$使得$v(f) = c(A^*,B^*)$\n", 483 | " - 最大流-最小割定理:每个流网络中,存在一个流f和一个割(A,B)使得$v(f) = c(A,B)$;每个流网络中,s-t流的最大值等于s-t割的最小容量。" 484 | ] 485 | }, 486 | { 487 | "cell_type": "markdown", 488 | "metadata": {}, 489 | "source": [ 490 | "### Multiple Sources/Sinks\n", 491 | "\n", 492 | "* 每个节点都有供给或者需求,守恒条件变为需求条件$f^{in}(v) - f^{out}(v) = d_v$,求解的问题变为判断是否存在一个可行的解满足所有点的需求。\n", 493 | "* 解决方法:引入一个超源点$s^*$它向每个$d>0$的节点有一条容量$d$的边,引入一个超汇聚点$t^*$每个$d<0$的节点都有一条容量为$-d$的边进入它。解决新问题的最大流问题,如果最大流v满足$v = \\sum_{d>0} d = \\sum_{d<0} -d$则存在可行解。\n", 494 | "\n", 495 | "### With Capacity Lower Bound\n", 496 | "\n", 497 | "* 描述:在前面的基础上,如果修改容量条件为$l[e]\\le f(e) \\le c_e$,再判断是否存在可行解\n", 498 | "* 解法:固定最小的容量,并且修改相关节点的需求值,转化为上面等价的问题\n", 499 | "\n", 500 | "### Other Problems\n", 501 | "\n", 502 | "* 调查设计:X产品,Y顾客;X-Y连边:顾客i购买过产品j,容量1;s-X连边容量有上下界,表示每个产品需要的问卷数;Y-t连边容量有上下界,表示每个顾客问卷涉及的产品数目;存在可行的流等价于存在可行方案\n", 503 | "* 航线调度:用k个飞机执飞若干个航班;节点为出发地或者目的地;如果有一趟航班,则添加一条边具有下界1和容量1;如果有足够时间从一个目的地到下一个出发地,则添加一条边具有下界0和容量1;s到每一个出发地都有一条边下界0和容量1(可以从任何出发地开始这一天);相应的有t;源点-k的需求,汇点k的需求;是否存在可行解\n", 504 | "* 图像分割:问题描述是对于像素进行一个前景-后景的划分,$\\max \\sum_{i\\in A}a_i + \\sum_{i\\in B}b_i - \\sum_{(i,j)\\in E}p_{ij}$,其中$a_i,b_i$是前后景的概率,$p$是分割惩罚,$E$表示分割不同的相邻像素边。每个像素都是一个节点,节点之间的连边容量表示分割惩罚;s和所有节点相连,容量为$a_i$;所有节点和t相连,容量为$b_i$。找最大流对应的就是最小割,最小割刚好就是上面的优化目标。\n", 505 | "* 棒球排除:判断某个队z在当前情况下小组积分能否排第一。X:除z之外剩余比赛;Y:除z外的队伍;X-Y连边表示在比赛x中队伍y获胜,容量无限;连边s-X的容量表示还剩多少场比赛;连边Y-t容量表示要想z排第一其他队伍在剩下的比赛中不能获得超过此分数。如果存在最大流把s-X路径填满,则z可能获胜,否则不可能。" 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": {}, 511 | "source": [ 512 | "## Matching\n", 513 | "\n", 514 | "* 定义:图$G(V,E)$中的一个边集$M\\subseteq 
E$,每个结点都至多出现在M的一条边上。称M为匹配。如果每个结点恰好出现在M的一条边上,成为完全匹配。\n", 515 | " * matching: 不共点的边集M\n", 516 | " * matching number: 不共点边集的边的数量\n", 517 | " * maximum matching: matching number最大的\n", 518 | " * perfect matching: 能够覆盖所有点的\n", 519 | " * M-alternating path: $G=(V, E)$中一条交替出现在$M$和$E\\setminus M$中的路径\n", 520 | " * M-augmenting path: 路径中两端没有被覆盖\n", 521 | " \n", 522 | "### Hungarian Algorithm\n", 523 | "\n", 524 | "```\n", 525 | "start from any matching M\n", 526 | "if M is a perfect matching:\n", 527 | " return M\n", 528 | "x0 = a exposed vertex in X\n", 529 | "A = {x0} \n", 530 | "B = {}\n", 531 | "if N(A) == B:\n", 532 | " return NO_PERFECT_MATCHING\n", 533 | " # because |N(A)|=|B|=|A|-1<|A|, violate Hall's theorem\n", 534 | "else:\n", 535 | " y1 = a vertex in N(A) but not in B\n", 536 | " if y1 is covered by M:\n", 537 | " B += {y1}\n", 538 | " A += {x1: (y1,x1) in M}\n", 539 | " goto 'if N(A) == B'\n", 540 | " else:\n", 541 | " P = path from x0 to y is an alternating path\n", 542 | " replace M by new M'\n", 543 | " goto 'if M is a perfect matching'\n", 544 | "```\n", 545 | "\n", 546 | "### Hall's Theorem\n", 547 | "\n", 548 | "* 动机:如何在不求出最大流的情况下,找出一个证据说某二分图不存在完美匹配,即最大流小于n。根据最大流最小割定理,如果能够找到一个割容量小于n,即可说明。ps. 当然如果直接求出最大流,也能知道是否存在完美匹配。\n", 549 | "* 描述:对于一个子集$A\\subseteq X$,用$\\Gamma(A)\\subseteq Y$表示邻接A中节点的集合,如果二部图$(X,Y)$有完美匹配,那么对于所有的$A\\subseteq X$,都有$|\\Gamma(A)|\\ge |A|$。\n" 550 | ] 551 | }, 552 | { 553 | "cell_type": "markdown", 554 | "metadata": {}, 555 | "source": [ 556 | "## Linear Programming\n", 557 | "\n", 558 | "线性规划标准型表示:\n", 559 | "\n", 560 | "$$\n", 561 | "\\begin{aligned}\n", 562 | "& \\max\\ {\\bf C}^T {\\bf x} \\\\\n", 563 | "& s.t. {\\bf Ax} \\le {\\bf b} \\\\\n", 564 | "& \\quad \\quad {\\bf x} \\ge 0\n", 565 | "\\end{aligned}\n", 566 | "$$\n", 567 | "\n", 568 | "LP都可以转化为标准型。特别地,如果某元素$x_j$不满足$x_j \\ge 0$,可令$x'_j,x''_j \\ge 0$,代换$x_j = x'_j - x''_j$\n", 569 | "\n", 570 | "线性规划松弛型表示,相关的不等式约束都变成等式约束,只剩下关于单个元素的不等式约束:\n", 571 | "\n", 572 | "$$\n", 573 | "\\begin{aligned}\n", 574 | "& \\max\\ z = v + \\sum_{j\\in N}c_j x_j \\\\\n", 575 | "& s.t. 
x_i = b_i - \\sum_{j \\in N} a_{ij}x_j, \\, \\forall i \\in B \\\\\n", 576 | "& \\quad \\quad x_i \\ge 0, \\, \\forall i \\in B\\cup N\n", 577 | "\\end{aligned}\n", 578 | "$$\n" 579 | ] 580 | }, 581 | { 582 | "cell_type": "markdown", 583 | "metadata": {}, 584 | "source": [ 585 | "### Simplex\n", 586 | "\n", 587 | "* 基本解:考虑松弛型表示,令第一个约束右边变量(非基本变量)都为0,得到的一组解\n", 588 | "* 基本可行解:如果基本解也满足第二个约束,则是基本可行解\n", 589 | "* 单纯形法思路:先找一个基本可行解,然后每次都考虑变化一个最优化表达式中的非基本变量,使其刚好不违反约束而使得目标最优,然后把这个非基本变量$x_e$代换成一个基本变量$x_l$(pivot),然后重复进行\n", 590 | "\n", 591 | "```\n", 592 | "def pivot(N, B, A, b, c, v, l, e):\n", 593 | " b_new[e] = b[l] / a[l,e]\n", 594 | " for each j in N-{e}:\n", 595 | " a_new[e,j] = a[l,j] / a[l,e]\n", 596 | " a_new[e,l] = 1/a[l,e]\n", 597 | " \n", 598 | " for each i in B-{l}:\n", 599 | " b_new[i] = b[i] - a[i,e] * b_new[e]\n", 600 | " for each j in N-{e}:\n", 601 | " a_new[i,j] = a[i,j] - a[i,e] * a_new[e,j]\n", 602 | " a_new[i,l] = - a[i,e] a_new[e,l]\n", 603 | " \n", 604 | " v_new = v + c[e] * b_new[e]\n", 605 | " for each j in N-{e}:\n", 606 | " c_new[j] = c[j] - c[e] * a_new[e,j]\n", 607 | " c_new[l] = - c[e] * a_new[e,l]\n", 608 | " \n", 609 | " N_new = N - {e, l}\n", 610 | " B_new = B - {e, l}\n", 611 | " \n", 612 | " return N_new, B_new, A_new, b_new, c_new, v_new\n", 613 | "```\n", 614 | "\n", 615 | "其中需要一个程序initialize_simplex来找到第一个基本可行解。\n", 616 | "\n", 617 | "```\n", 618 | "def simplex(A, b, c):\n", 619 | " N, B, A, b, c, v = initialize_simplex(A, b, c)\n", 620 | " while some index e in N that c[e] > 0:\n", 621 | " for each i in B:\n", 622 | " if a[i, e] > 0:\n", 623 | " delta[i] = b[i] / a[i, e]\n", 624 | " else:\n", 625 | " delta[i] = inf\n", 626 | " choose l in B that minimize delta[i]\n", 627 | " if delta[l] == inf:\n", 628 | " return UNBOUNDED\n", 629 | " else:\n", 630 | " N, B, A, b, c, v = pivot(N, B, A, b, c, v, l, e)\n", 631 | " \n", 632 | " for i = 1 to n:\n", 633 | " if i in B:\n", 634 | " x[i] = b[i]\n", 635 | " else:\n", 636 | " x[i] = 0\n", 637 | " return x\n", 638 | "```\n", 639 | "\n", 640 | "### Duality\n", 641 | "\n", 642 | "LP\n", 643 | "$$\n", 644 | "\\begin{aligned}\n", 645 | "& \\max\\ {\\bf C}^T {\\bf x} \\\\\n", 646 | "& s.t. {\\bf Ax} \\le {\\bf b} \\\\\n", 647 | "& \\quad \\quad {\\bf x} \\ge 0\n", 648 | "\\end{aligned}\n", 649 | "$$\n", 650 | "\n", 651 | "DP\n", 652 | "$$\n", 653 | "\\begin{aligned}\n", 654 | "& \\min\\ {\\bf b}^T {\\bf y} \\\\\n", 655 | "& s.t. 
{\\bf A}^T y \\ge {\\bf C} \\\\\n", 656 | "& \\quad \\quad {\\bf y} \\ge 0\n", 657 | "\\end{aligned}\n", 658 | "$$" 659 | ] 660 | }, 661 | { 662 | "cell_type": "markdown", 663 | "metadata": {}, 664 | "source": [ 665 | "弱对偶:对偶问题的可行解$\\bf b^Ty$是原问题可行解$\\bf C^T x$的上界。在LP中,两者的最优解相等。" 666 | ] 667 | } 668 | ], 669 | "metadata": { 670 | "kernelspec": { 671 | "display_name": "Python 3", 672 | "language": "python", 673 | "name": "python3" 674 | }, 675 | "language_info": { 676 | "codemirror_mode": { 677 | "name": "ipython", 678 | "version": 3 679 | }, 680 | "file_extension": ".py", 681 | "mimetype": "text/x-python", 682 | "name": "python", 683 | "nbconvert_exporter": "python", 684 | "pygments_lexer": "ipython3", 685 | "version": "3.6.3" 686 | } 687 | }, 688 | "nbformat": 4, 689 | "nbformat_minor": 2 690 | } 691 | -------------------------------------------------------------------------------- /Summary_Algo_compact.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Greedy\n", 8 | "* 【Interval Scheduling】目标:选择最多的任务;解法:每次选择最早的ddl;证明:和OPT比较,前k个任务的结束时间SOL都比OPT早,如果OPT比SOL多,可以把OPT多出来的任务放到SOL尾巴上面。\n", 9 | "* 【Schedule to Minimize Lateness】目标:$\\min \\sum_i Relu(f[i]-d_i)$;解法:每次选择最早的ddl;证明:逆序指的是OPT中的紧邻的对,放在后面ddl比前面的早,可以说明,通过交换这样的逆序对,OPT变成SOL并且结果不会变差\n", 10 | "* 【Optimal Caching】目标:把某个元素从缓存中拿走的次数最少;解法:每次去掉最远会遇到的(Farthest in Future);online版本:去掉出现频率最小的(Least Recent Used);结论:$LRU\\le k FF + k$\n", 11 | "* 【非负单源最短路径】算法Dijistra:每次找最短的能够连接到已有shortest path tree的边,把它加入;证明:假设之前加入的node其树上的路径都是最短路径,并且最短路径较小的都已经加入到树里面了;那么新加入的也找到了其最短路径,因为假设有其他的最短路径,那么一定会从某个地方离开目前的树,这一步就已经大于现有的路径长度了。\n", 12 | "* 【Minimum Spanning Tree】算法Kruskal:只要不形成环,全局地每次添加长度最小的边;算法Prim:类似Dijistra,每次加入最小的边到已有的树中;算法ReverseDeletion:只要不破坏连通性,每次删除最长的边;性质:如果边e=(u,v)是S和E-S的最小割,那么每个MST里面都包含e;证明:找一个包含最小割的u-v路径和包含另一个割的u-v路径,uv分属S、E-S,然后把割换成最小割既不影响连通性,权重也更小\n", 13 | "* 【Union Find Set】用来实现Kruskal算法;union操作:把一个集合的代表元素指向另一个集合的代表元素;find操作:返回某元素的所在集合的代表元素\n", 14 | "* 【interval partitioning】given start time, end time, find minimum of m/c to finish all the tasks; greedily pick the task in order of increasing start time, if the task cannot be allocated, open a new m/c; property: #m/c can never be less than maximum depth, depth is number of overlapping tasks at a given time\n", 15 | "* 【Huffman coding】merge from least frequent node;合并叶子节点为一个节点,节点的频率为叶子节点频率之和,这样做新树和旧树的代价只相差一个常数,因此具有意义的最优结构;通过把频率最小换到最深的位置,可以证明不增加总体代价,由此证明最优解中最小频率的两个节点一定是最深的兄弟\n", 16 | "* 【Methods used to prove optimal】\n", 17 | " 1. Show that after each step of the greedy algorithm, its solution is at least as good as any other algorithm's.\n", 18 | " 1. Discover a simple \"structural\" bound asserting that every possible solution must have a certain value. Then show that your algorithm always achieves this bound. \n", 19 | " 1. Gradually transform any solution to the one found by the greedy algorithm without hurting its quality.\n", 20 | " 1. Show it is a matroid\n", 21 | "* 【Matroid】Matroid is a tuple $(S, I)$, where $S$ is a finite ground set, and $I$ is a family of *independent* subset of $S$, which has following properties 1)Hereditary property: $B\\in I, A\\subset B \\Rightarrow A\\in I$;2)Exchange property: $A\\in I, B\\in I, |A|<|B| \\Rightarrow \\exists x \\in B-A, s.t. A\\cup \\{ x \\} \\in I$\n", 22 | "* 【Weighted Matriod Problem】is a matroid where each element in $S$ assigned by a weight $w_i$ and the total weight of a independent subset of $S$ is maximized. 
This problem can be solved greedy algorithm.\n", 23 | "```\n", 24 | "Greedy(S, I, w):\n", 25 | "A = {}\n", 26 | "sort S decreasingly by w\n", 27 | "for x in S:\n", 28 | " if A + {x} in I:\n", 29 | " A = A + {x}\n", 30 | "```\n", 31 | "\n", 32 | "### Dynamic Programming\n", 33 | "* 【Weighted Interval Scheduling】先按照ddl排序,然后$OPT[j]$表示选到第j个任务时的最大收益;$OPT[j]=\\max(v_j+OPT[P_j], OPT[j-1])$,其中$P_j$代表ddl在任务j开始时间之前的最后任务序号\n", 34 | "* 【Segment Least Square 分段拟合】$OPT[j]$表示选到第j个数据点时候的最小误差;$OPT[j]=\\min_{1\\le i\\le j}(e_{ij} + c + OPT[i-1])$,$i$列举了前一个分段所有的可能位置,$e_{ij}$是i到j数据点的拟合误差\n", 35 | "* 【Knepsack】$OPT[i,w]$表示选前i个物品用w的空间能取得的最大价值;$OPT[i,w]=\\max (OPT[i-1,w], OPT[i-1,w-w_i]+v_i)$,是伪多项式,实际是一个NP-hard问题。\n", 36 | "* 【RNA配对】要求:配对iT,T满足1)T是树;2)$\\forall v\\in T$,v是G中顶点的集合;3)$\\forall (u,v)\\in E(G)$,一定有一个T中的顶点同时包含uv;4)对于包含G中同一顶点v的T的顶点,能够组成一颗subtree。定义tree width,$tw(T) = \\max_{v\\in T}|v|-1$。算法:定义$OPT[S,v]$表示考虑树节点v子树中最大IS的个数,假设IS的在原图中顶点的集合为I,这里还需要满足$I\\cap v = S$(对应之前的选择或者不选择此顶点,现在是选择这个子节点的哪一个子集)。$OPT[S, v] = |S| + \\sum_{v\\in children(u)} max\\{OPT[S',v] - |u\\cap S|\\}$\n", 40 | "* 【装配线调度问题】m个装配线,n个工序,装配线之间切换需要时间,装配线上的每个站完成工序需要不同的时间,在n个装配线下,最短需要多少时间完成;A[i,j]为在第i调装配线上完成第j个工序的最短时间\n", 41 | "* 【乘法链问题】矩阵乘法通过不同的括号组合可以减小其中标量乘法计算次数,求矩阵乘法计算顺序;A[i,j]为从第i个到第j个矩阵做乘法的最小标量乘法次数\n", 42 | "* 【最长公共子列】A[i,j]表示序列X[:i]和Y[:j]的最长公共子列长度\n", 43 | "* 【最优二叉查找树】大致描述:一堆叶子节点从小到大排序,每个叶子对应一定的查找概率;叶子的分割点也从小到大排序,每个分割点也对应一定的查找概率。期望代价为(叶子和分割点)的(查找概率乘以深度之和)。目标是期望代价最小。A[i,j]表示叶子i到叶子j之间构造最优查找树的期望代价。 \n", 44 | "* 【证明方法 最优子结构】cut-and-paste:最优解使用的子问题的解必须是最优的,否则,可以导出矛盾\n", 45 | "\n", 46 | "### Divide and Conquer\n", 47 | "\n", 48 | "* 【Quick Sort】最坏情况是每次找到的划分元素要么是最大的要么是最小的,即划分总是不对称,这是运行时间为$O(n^2)$;最好情况是每次都能找到比较对称的划分,运行时间为$O(n \\log n)$;平均每次划分有的好有的坏,最后也能达到$O(n \\log n)$的运行时间\n", 49 | "```\n", 50 | "def quicksort(A, p, r): if p1, $T(n) = \\Theta(n^{\\log_b a})$\n", 89 | " - if af(n/b) = f(n), $T(n) = \\Theta(f(n) \\log_b n)$\n", 90 | "\n", 91 | "* 【Sequence Alignment】类似EditDistance,gap一个字符的损失为$\\delta$,替换一个字符的损失为$a[x_i,y_j]$,可以使用DP解决$OPT[i,j] = \\min(a[x_i,y_j] + OPT[i-1,j-1], \\delta+OPT[i-1,j], \\delta+OPT[i,j-1])$。DP有一种线性空间的解法,只能找到数值,不能找到结果。利用跟这个方法,我们结合DC来在同样的时间复杂度下找到路径。\n", 92 | "```\n", 93 | "compute OPT[(0,0)->(i,n/2)] for all i (Cmn/2)\n", 94 | "compute OPT[(i,n/2)->(m,n)] for all i (Cmn/2)\n", 95 | "compute q = argmin(OPT[(0,0)->(q,n/2)] + OPT[(q,n/2)->(m,n)] (Cm)\n", 96 | "DC(x[1:q], y[1:n/2])\n", 97 | "DC(x[q+1:m], y[n/2+1:n])\n", 98 | "```\n", 99 | "\n", 100 | "* 【Closest Pair】平面内n个点,找到距离最近的两个点;分治法,分成两半,找左边的和右边的,d=min(d1, d2),合并的时候需要在合并分界线附近各画两排d/2的格子,来找是否有恰好横跨分界线的点,合并复杂度O(n)。最后O(nlogn)。\n", 101 | "* 【Multipilication of 2 Poly】目标找两个给定系数的n阶多项式A(x),B(x)的乘积的多项式系数C(x)。选取2n个参数$w_k = e^{2\\pi ik/2n}$,求这2n个参数带入到A(x)和B(x)中的数值,对于每个多项式求2n个数值,此操作可以分解为奇数项和偶数项递归求解,可以在O(nlogn)内求解;把求出来的2n个A和2n个B做pointwise乘积得到2n个C,在O(n)内完成;同理可以完成逆向的操作在O(nlogn)内还原C的系数。其核心在于选取的这2n个参数在次方、倒数等操作上封闭。\n", 102 | "* 【Median Selection】find k-th smallest number;解法:分为n/5组,每组5个数;找每组的中位数O(n);找中位数的中位数x O(n/5);令t为确定比x小的数的个数,如果$kt$,那么下面迭代计算(k-t)-th small,把确定比x小的去掉。$T(n)\\le T(n/5)+T(7n/10)+Cn$为线性时间,其中每次至少排除3/10的点。\n", 103 | "\n", 104 | "### NP\n", 105 | "\n", 106 | "* 【P与NP】P是指存在多项式算法能够解决的问题;NP是指存在多项式certifier能够在有一个解的时候判断这个解是否满足。针对decision problem而言,是NPC;针对优化问题而言是NP-hard。\n", 107 | "* 【NPC证明】待证明问题P,已经知道的NPC问题Q。一般证明方法就是 ``Q_input -poly-> P_input`` 并且 ``P_output -poly-> Q_output``。\n", 108 | "* 【常见NPC问题】\n", 109 | " - SAT:给定一个布尔语句,是否存在一个布尔变量的赋值使得该语句为真\n", 110 | " - 3SAT:给定一堆布尔变量和一堆子句(形如A或者B或者C),是否存在一个布尔变量的赋值是的所有子句为真。``SAT<3SAT``把所有的布尔语句都能转化为3SAT形式。\n", 111 | " - Independent 
Set:图G中的顶点集合S满足其中的任意两个顶点不共边,则称S为独立集;求解图G的最大独立集。``VCk|V| return False; 3) any e=(u,v) return CHECK(G-u,k-1) or CHECK(G-v,k-1).\n", 122 | "* 【Load Balancing approximate algorithm】n个工作,每个工作有一个完成时间$t_i$,m台机器,第j个机器的负载$T_j = \\sum_{i on j}t_i$,目标最小化makespan=$\\max_j T_j$。贪心算法1:每次挑一个任务放在当前负载最小的机器上(approx ratio=2);贪心算法2:按照$t_i$降序排列,每次挑一个任务放在负载最小的机器上(approx ratio=1.5)。注意$T^* \\ge \\max_i t_i$,$T^* \\ge \\sum_i t_i / m$,在第二个算法的时候考虑添加到最多那个机器的最后一个任务。\n", 123 | "* 【Set Cover approximate algorithm】每次选择包含没覆盖元素数目最多的集合,ratio=$H_n=1+\\frac{1}{2}+\\frac{1}{3}+\\cdots+\\frac{1}{n}$;证明:考虑每增加一个集合有1的代价,这个代价均摊到刚被覆盖的元素上,每个元素代价$C[e]$,$|C|=\\sum_{e\\in U}C[e]\\le\\sum_{S\\in C^*}\\sum_{e\\in S}C[e]\\le\\sum_{S\\in C^*}H(|S|)$\n", 124 | "* 【K-center approximate algorithm】目标:使得k个聚类中,聚类的最大半径最小;2-approximate:猜一个最优解r,每次选一个点,把2r范围内的点删掉。可以证明k-center的($2-\\epsilon$)-approximate是NP-hard的。Decision版本:给定一个图和r,告诉最优解是$OPT\\ge r$还是$OPT \\ge (2-\\epsilon) r$。规约问题【Dominate Set】一个集合$S\\subset V$,$\\forall v\\in V/S, \\exists (u,v) s.t. u\\in S$,找到最小的|S|。DS是NP-hard。规约方法:DS问题中原来的图边长为1,任意没有连边的节点之间添加长度为2的边。G has DS <= k iff k-center OPT is 1; G has DS > k iff k-center OPT is 2.\n", 125 | "\n", 126 | "* 【PTAS/FPTAS】FPTAS:given constant $\\epsilon>0$, can get $(1+\\epsilon)$-approx in time poly(n, $\\frac{1}{\\epsilon}$);PTAS:given constant $\\epsilon>0$, can get $(1+\\epsilon)$-approx in time poly(n)\n", 127 | "* 【FPTAS of knepsack】把物品价值离散化,$b=\\dfrac{\\epsilon}{2n}$,$\\widehat{v_i} = \\lceil\\dfrac{v_i}{b} \\rceil$,定义$OPT[i,v]$为取到[i]物品的时候,至少有v的价值所需要的最小容量。$OPT[i,v]=\\min(OPT[i-1, v], w[i]+OPT[i-1, max(0, v-\\widehat{v_i})])$。可以证明$\\sum_{i\\in SOL} v_i \\ge (1-\\epsilon/2) \\sum_{i\\in OPT} v_i$,$\\sum_{i\\in SOL} \\widehat{v_i} \\ge \\sum_{i\\in OPT} \\widehat{v_i}$\n", 128 | "\n", 129 | "### Network Flow\n", 130 | "\n", 131 | "* 【定义】流网络:$G(V,E)$每条边关联一个非负容量c;存在单一源点$s\\in V$;存在单一汇点$t\\in V$。流函数:$f:E\\rightarrow\\mathbb{R}^+$,并且满足1)容量条件:对于每条边$e\\in E$,$0\\le f(e)\\le c[e]$;2)守恒条件:$\\forall e\\in E \\setminus \\{s, t\\}$,$\\sum_{e\\text{ towards }v} f(e) = \\sum_{e\\text{ from }v} f(e)$;最大流:使得$v(f) = f^{out}(s)$最大,其中$f^{out}(v) = \\sum_{e\\text{ from }v} f(e)$\n", 132 | "* 【剩余图】$G_f$的点集和$G$的点集相同,每条边$e=(u,v)$,如果$f(e)0$,则在$G_f$中构建一条容量为$f(e)$的前向边$(v,u)$\n", 133 | "* 【Ford-Fulkerson】对于一条简单路径P、现在已经存在的流f,去增加f的流量。其中``bottleneck(P,f)``是P上任何边关于f的最小剩余容量(还可以增大多少容量)。分析:如果容量都是整数。每次产生的新的流,由于bottleneck>0,$v(f') = v(f) + bottleneck(P,f)$,所以每次循环都变好。假设最大流为C,因此算法肯定能够在$O(C(|V|+|E|))$内找到答案。\n", 134 | "```\n", 135 | "def augment(f, P):\n", 136 | " b = bottleneck(P, f)\n", 137 | " for e in P:\n", 138 | " if e is forward edge in G_f:\n", 139 | " f(e) += b\n", 140 | " elif e is backward edge in G_f:\n", 141 | " f(e) -= b\n", 142 | " return f\n", 143 | "```\n", 144 | "```\n", 145 | "def max_flow:\n", 146 | " f(e) = 0 for all e in G\n", 147 | " while exist s-t path in G_f:\n", 148 | " P = the exist s-t path\n", 149 | " f' = augment(f, P)\n", 150 | " f = f'\n", 151 | " G_f = G_f'\n", 152 | " return f\n", 153 | "```\n", 154 | "* 【最大流最小割】s-t割:是对于结点集合V的一个划分(A,B),使得$s\\in A$,$t\\in B$。一个割的容量记为$c(A,B)=\\sum_{e from A} c[e]$;性质:\n", 155 | " - f是任意s-t流,(A,B)是任意s-t割,那么$v(f) = f^{out}(A)-f^{in}(A) = f^{in}(B)-f^{out}(B)$,$v(f) \\le c(A,B)$\n", 156 | " - Ford-Fulkerson返回最大流:f是剩余图中没有s-t路径的一个s-t流,那么G中存在一个s-t割$(A^*,B^*)$使得$v(f) = c(A^*,B^*)$\n", 157 | " - 最大流-最小割定理:每个流网络中,存在一个流f和一个割(A,B)使得$v(f) = c(A,B)$;每个流网络中,s-t流的最大值等于s-t割的最小容量。\n", 158 | "* 【Multiple Sources/Sinks】每个节点都有供给或者需求,守恒条件变为需求条件$f^{in}(v) - f^{out}(v) = 
d_v$,求解的问题变为判断是否存在一个可行的解满足所有点的需求;解决方法:引入一个超源点$s^*$它向每个$d>0$的节点有一条容量$d$的边,引入一个超汇聚点$t^*$每个$d<0$的节点都有一条容量为$-d$的边进入它。解决新问题的最大流问题,如果最大流v满足$v = \\sum_{d>0} d = \\sum_{d<0} -d$则存在可行解。\n", 159 | "* 【With Capacity Lower Bound】描述:在前面的基础上,如果修改容量条件为$l[e]\\le f(e) \\le c_e$,再判断是否存在可行解;解法:固定最小的容量,并且修改相关节点的需求值,转化为上面等价的问题\n", 160 | "* 【调查设计】X产品,Y顾客;X-Y连边:顾客i购买过产品j,容量1;s-X连边容量有上下界,表示每个产品需要的问卷数;Y-t连边容量有上下界,表示每个顾客问卷涉及的产品数目;存在可行的流等价于存在可行方案\n", 161 | "* 【航线调度】用k个飞机执飞若干个航班;节点为出发地或者目的地;如果有一趟航班,则添加一条边具有下界1和容量1;如果有足够时间从一个目的地到下一个出发地,则添加一条边具有下界0和容量1;s到每一个出发地都有一条边下界0和容量1(可以从任何出发地开始这一天);相应的有t;源点-k的需求,汇点k的需求;是否存在可行解\n", 162 | "* 【图像分割】问题描述是对于像素进行一个前景-后景的划分,$\\max \\sum_{i\\in A}a_i + \\sum_{i\\in B}b_i - \\sum_{(i,j)\\in E}p_{ij}$,其中$a_i,b_i$是前后景的概率,$p$是分割惩罚,$E$表示分割不同的相邻像素边。每个像素都是一个节点,节点之间的连边容量表示分割惩罚;s和所有节点相连,容量为$a_i$;所有节点和t相连,容量为$b_i$。找最大流对应的就是最小割,最小割刚好就是上面的优化目标。\n", 163 | "* 【棒球排除】判断某个队z在当前情况下小组积分能否排第一。X:除z之外剩余比赛;Y:除z外的队伍;X-Y连边表示在比赛x中队伍y获胜,容量无限;连边s-X的容量表示还剩多少场比赛;连边Y-t容量表示要想z排第一其他队伍在剩下的比赛中不能获得超过此分数。如果存在最大流把s-X路径填满,则z可能获胜,否则不可能。\n", 164 | "* 【Project Selection】选课,有些课程有先修要求,这些要求由一个DAG表示,某个节点代表的课程指出去的结点是它的先修课程。每个有收益$p_i$,有正有负,要求一个选课方案,最大化总收益。转化为网络流问题,所有连边上的容量为inf;所有收益为正的点有一条从s出发到它的边,容量为收益;所有收益为负的点有一条从它出发到t的边,容量为收益绝对值。最大流等价于最小割,考虑一个割A(包含s)和V-A(包含t),最小化割$\\sum_{i\\notin A, p_i>0} p_i + \\sum_{i\\in A, p_i <0}(-p_i)$,等价于最大化收益$\\sum_{i\\in A, p_i>0} p_i + \\sum_{i\\in A, p_i <0}p_i = \\sum_{i\\in A} p_i$\n", 165 | "* 【Densest Subgraph】一个无向图里面选取$S\\subseteq V\\ s.t. \\dfrac{|E[S]|}{|S|}$ is maximized。构建一个二分图,左边代表每条边,右边代表每个顶点,相应的边和顶点之间有容量无限的连边,s到左边每条边的容量为1,右边到t每条边的容量为$\\rho$。观察:the graph has subgraph of density $\\le \\rho$, iff max flow $=|E|$。证明:$T=A\\cap R$,A为左边的割,R为右边的点,$cut(A,L\\cup R/A) = \\rho |T| + |E| - |E[T]|$,when $cut(A,(L\\cup R)/A)=|E|$, $\\rho = \\dfrac{|E[T]|}{|T|}$。解决方法:parametric flow,找到一个$\\rho$使得最大流刚好为$|E|$,可以跟解决单个最大流问题差不多快(Tarjan)\n", 166 | "\n", 167 | "### Linear Program\n", 168 | "\n", 169 | "* 【Duality】原问题$ \\max\\ {\\bf C}^T {\\bf x} \\quad s.t. {\\bf Ax} \\le {\\bf b} \\quad {\\bf x} \\ge 0$;对偶问题$\\min\\ {\\bf b}^T {\\bf y} \\quad s.t. {\\bf A}^T y \\ge {\\bf C} \\quad {\\bf y} \\ge 0$;弱对偶:对偶问题的可行解$\\bf b^Ty$是原问题可行解$\\bf C^T x$的上界。在LP中,两者的最优解相等(强对偶)。\n", 170 | "* 【解决方法】Simplex不是多项式时间,但是实际上比较快;Ellipsoid多项式时间,但是实际上比较慢;Iterior Point多项式时间,实际上也比较快。\n", 171 | "* 【Simplex】基本解:考虑松弛型表示,令第一个约束右边变量(非基本变量)都为0,得到的一组解;基本可行解:如果基本解也满足第二个约束,则是基本可行解;单纯形法思路:先找一个基本可行解,然后每次都考虑变化一个最优化表达式中的非基本变量,使其刚好不违反约束而使得目标最优,然后把这个非基本变量$x_e$代换成一个基本变量$x_l$(pivot),然后重复进行\n", 172 | "* 【Ellipsoid】考虑一个能够包含可行域A并且球心为待求解问题在较大范围内最优解的椭圆P,每次看球心在不在A中,如果不在,就返回有个超平面把球心和A分开,然后去掉被分开的那一半,在找一个刚好包含这个区域内的椭圆P';重复,知道球心位于A中,即为最优解\n", 173 | "* 【MaxFlow】$\\max \\sum_{(s,v)\\in E} f[s,v]\\quad s.t. \\forall v\\neq s, t \\ \\sum_u f[v,u]=\\sum_w f[w,v], \\ 0\\le f[v,u] \\le C[v,u]$\n", 174 | "* 【VertexCover】$\\min \\sum_v x[v] \\quad s.t. \\forall (u,v)\\in E x[v]+x[u]\\ge 1, \\ x[u]\\in\\{0,1\\}$ (Integer LP)\n", 175 | "* 【MinSpanTree】$\\min \\sum_{(i,j)\\in E} c_{ij} x_{ij} \\quad s.t. \\sum_{(i,j)\\in E} x_{ij}=n-1,\\ \\sum_{(i,j)\\in E, i,j\\in S} x_{ij} \\le |S|-1\\ \\forall S\\subseteq V, \\ x_{ij}\\ge 0,\\ \\forall (i,j)\\in E$,其中c矩阵表示边的连接关系,如果有连边就是1,否则为0;x表示所选择的连边,如果选择了就是1,否则是0;C矩阵是TUM,可用整数LP解决。\n", 176 | "* 【Matching】向量$x\\in\\mathbb{R}^{|E|}$表示每条边选不选,它是matching的限制条件是1)$x[e]\\ge 0,\\ \\forall e\\in E$;2)$\\sum_{e\\in\\delta(v)} x[e] \\le 1,\\ \\forall v\\in V$;3)$\\sum_{e\\in E(U)} x[e] \\le (|U|-1)/2,\\ \\forall U\\subseteq V, |U|$ is odd. 
所有Matching构成一个polytope $P=\\{x[e]\\}$\n", 177 | "* 【SetCover】基本元素集$E=\\{e_1, \\cdots, e_n\\}$,集合的集$F=\\{S_1,\\cdots,S_m\\}$;$\\min \\sum_{S\\in F} x[s]\\quad s.t. \\forall e\\in R,\\ \\sum_{S:e\\in S} x[s] \\ge 1,\\ \\forall e,\\ x[e]\\ge 0$\n", 178 | "* 【TUM】定义:Matrix $A\\in \\mathbb{Z}^{m\\times n}$ is totally unimodular if the determinate of each square submatrix of 0, +1 or -1. 性质:If A is totally unimodular, then every vertex solution of $Ax \\le b$ is integral;证明:考虑$A^{-1}$都是整数,而b也是整数,所以x也是整数。性质:$A\\in\\mathbb{R}^{m\\times n}$ is TUM iff for any $R\\subseteq [m]$, there is a partition $R_1\\cup R_2 =R,\\ R_1\\cap R_2 =\\emptyset\\quad s.t. \\forall j\\in [n] \\ \\sum_{i\\in R_1}a_{ij} - \\sum_{i\\in R_2}a_{ij} = 0,\\pm 1$;性质:如果原问题有最优整数解,那么对偶问题也有最优整数解。\n", 179 | "* 【TUM举例】邻接矩阵$|V|\\times|E|$,有连边关系为1,没有为0;行或者列上面连续1,其他为0的矩阵\n", 180 | "\n", 181 | "* 【TDI】定义:如果原问题里面对于任意的整数向量$C$,其最优值$C^Tx$都是有限的,并且其对偶问题的解$y$存在且为整数向量,称其为Totally Dual Integrality;性质:如果原问题是TDI,并且b是整数向量,那么原问题所有的解都是整数向量。\n", 182 | "\n", 183 | "### Hashing\n", 184 | "\n", 185 | "* 【Hashing Table】关键字域U很大,但是实际的关键字集合K很小。这时如果要用**直接寻址法**储存每个关键字对应的数值,需要的内存开销为$|U|$。如果能够使用一个hash function $h: U \\rightarrow [m]$就可以把所需要的内存降低到m。;坏处是可能发生不同的关键字被映射到同一个槽位上,即碰撞(collision)。解决方法有chaining和open addressing。\n", 186 | "* 【Chaining】做法:每个槽位后面跟一个链表,插入和删除就直接对链表操作时间为$O(1)$;查找的时间:通过计算期望得到$O(1+\\alpha)$,其中$\\alpha = n/m$,$m$为槽位数目,$n$为要储存的元素个数。即如果$m$和$n$成正比,就能在$O(1)$时间内完成查找。\n", 187 | "* 【Open Addressing】思路:与chaining不同的是,开放寻址中所有的元素都存在槽位里面,如果发生冲突,就找散列函数对应的下一个槽位。因此其装载因子$\\alpha\\le 1$。做法:把之前的散列函数的输入加入一个探查号i,$h:U\\times [m] \\rightarrow [m]$,并且希望探查号$i=1,\\cdots,m$对应的槽位是所有槽位的一个排列,不同散列函数尽量对应不同的排列。具体散列函数:线性探查:$h(k,i)=(h'(k) + i) \\mod m$;二次探查:$h(k,i)=(h'(k) + c_1i+c_2i^2) \\mod m$;双重探查:$h(k,i)=(h_1(k) + i h_2(k)) \\mod m$\n", 188 | "* 【Univeral Hashing】思路:如果固定一个散列函数,对手都可以选择全部散列到同一个槽位的n个元素给你作对,使得算法的检索时间为$\\Theta(n)$。我们希望设计一组散列函数,每次随机选择一个散列函数,这样就有较好的平均性能。定义:对于任意的不同关键字$k,l\\in U$,满足$h(k) = h(l)$的散列函数$h\\in \\mathcal{H}$的个数至多为$|\\mathcal{H}|/m$。设计一个universal hashing:$\\mathcal{H}_{p,m} = \\{ h_{a,b} : a\\in[1:p-1], b\\in[0:p-1]\\}$,其中$h_{a,b}(k) =((ak+b) \\mod p) \\mod m$\n", 189 | "* 【Perfect Hasing】定义:如果某一种散列技术在进行查找时,最坏情况的性能也是$O(1)$的话,称其为perfect mathing。仅在关键字是静态时,才有完全散列。做法:采用两级散列表,第一级散列表照常进行,第二级散列表选择的槽位数目$m_j=n_j^2$,可以证明大于$1/2$的概率可以一次选的一个没有碰撞的二级散列表(注意到关键字是静态的),通过几次尝试可以选择一个没有碰撞的散列函数。由于第二级散列表没有冲突,因此最坏查询时间也是$O(1)$。同时可以证明,总体存储空间也是$O(n)$,即$\\mathbb{E}[\\sum_{j=1}^{m-1}n_j^2] < 2n$。\n", 190 | "\n", 191 | "* 【Hashing实现】一个和n差不多大的质数p,以p为基底写出来$x=(x_1x_2\\cdots x_r)$,$r\\approx\\log |U| / \\log p$,$\\mathcal{H}=\\{h_a(x) = (\\sum_{i=1}^r a_ix_i) \\mod p| \\forall a\\}$。证明:把a看做随机变量,$Pr_a[h_a(x)=h_b(y)] = Pr_a[az=h(mod\\ p)] =1/p=1/n$,其中x和y在p基底下写出来总有不相等的位j,$x_j-y_j=z$,可以推导出一个大于零的数h。\n", 192 | "* 【Closest Pair】有n个点,找最近的一对。从任意一对点的距离$\\delta$开始,划分二维的$\\delta/2$的网格,每次来一个点,查找这个点附近的25个格子里面有没有其他的点,如果有再看距离,如果距离比$\\delta$小,那么从头再进行。直到完成一次全部的扫描。\n", 193 | "* 【Random Ball into Bins】n个球随机扔到n个格子里面,格子最大负载的期望$\\log n/\\log\\log n$\n", 194 | "\n", 195 | "### Shortest path\n", 196 | "\n", 197 | "* 【Dijkstra】单源最短路径问题,边权重为正数。每次找最短的能够连接到已有shortest path tree的边,把它加入。最优性证明:归纳法,考虑不从s~>u->v,而是从s~>x(x in S)->y(y not in S)~>v,注意到每次先扩展最小的,所以到v的肯定比到y短,因此不可能从别的点绕一圈回到v更短。\n", 198 | "* 【Bellman-Ford】单源最短路径问题,权重可以为负数,但是不能有负环。动态规划,定义$opt[i,v]$为从v到t最多用i条边的最短路径,有状态转移方程$opt[i,v] = \\min(opt[i-1, v], \\min_{w\\in V} (opt[i-1, w] + c[v, w]))$,计算的时候按照i递增的顺序计算即可。应用:用于网络路由,每个节点可以维护当前节点到其他节点的数值$M[t]=\\min_{w\\in V} (opt[i-1, w] + c[v, w])$,这样一个包在每次都只用选择相连的$M[t]$最小的节点即可。维护的方法是对于每个节点都去基于$M[v] = \\min(M[v], \\min_u 
M[u]+c[v,u])$去更新,如果更新了,该节点激活,用于提示相邻节点也看看要不要更新。\n", 199 | "* 【Floyd-Warshall】多源最短路径问题,权重可以为负数,但是不能有负环。动态规划,定义$d_{ij}^{(k)}$表示从顶点i到顶点j,并且中间路径中顶点属于[k]的最短路径。最开始$d_{ij}^{(0)}=w_{ij}$,如果顶点i和顶点j直接相连。有状态转移方程$d_{ij}^{(k)} = \\min(d_{ij}^{(k-1)}. d_{ik}^{(k-1)} + d_{kj}^{(k-1)})$三层循环,从外到内分别是k,i,j,运行时间$O(|V|^3)$。要想追踪具体的路径,可以利用一个前驱矩阵来每次记录选用的哪个k。\n", 200 | "\n", 201 | "### Basic Data Structure\n", 202 | "\n", 203 | "* 【Binary Search Tree】每个节点左子树上面元素的数值都比该节点小,右子树上面元素的数值都比该节点大。查询:递归判断大小觉得往左还是往右,运行时间为树的平均深度。找最大和最小:一直往左或者往右,运行时间为树的深度。给定一个节点找后继(比该节点数值大的节点中最小的):如果有右子树就找右子树中最大的,如果没有就找祖先中第一个把自己这一支当做左子树的节点。插入:和查找类似,找到能找到的最近的叶子,在该叶子上开一个节点来放自己。删除:如果是一个单节点,直接删除;如果有一个子节点,把子节点接到父节点上面;如果有两个子节点,找出自己的后继,用后继替代自己的位置\n", 204 | "* 【B-tree】性质:1)每个节点有多个子节点,节点中顺序存放子节点的分割数值;2)除了根节点之外,每个节点的子节点数目$[t-1, 2t-1]$,如果子节点数目达到$2t-1$就说这个节点是满的。查询:显而易见。插入:如果要插入的节点从上到下都不满,就直接插入;如果满了就需要把满的节点分裂。分裂的方式就是把节点中第t个元素提到父节点中,然后把[1:t-1]和[t+1:2t-1]元素分别分裂成两个分支。删除:情况比较复杂,总体思想就是不能违反B树的性质\n", 205 | "* 【Union-Find Set】作用:反映元素和集合之间的关系。主要功能:MAKE_SET(x):建立一个只包含元素x的集合;UNION(x,y):把x所在的集合和y所在的集合合并;FIND_SET(x):找出x所对应的集合。实现:链表;带路径压缩的树结构\n", 206 | "\n", 207 | "### Others/ProblemSet\n", 208 | "* 【Topological Order in DAG】算法:每次找没有进入边的节点,然后把它顺序加到order的列表里面,并且把它相关的边删掉\n", 209 | "* 【Stable Matching】a pair (a,b) is not stable if a prefer b than current match and b prefer a than current match; GS算法可以保证结束时都有匹配、没有不稳定匹配;每次找一个没有被匹配的男,找他list里面第一个他没有propose过的女,如果该女没有被匹配,就和他在一起,如果被匹配了,就选择和现任比更prefer的。\n", 210 | "* 【Coupon Collection】Initially, we have n empty bins. In each round, we throw a ball into a uniformly random chosen bin. Let T be number of rounds needed such that no bin is empty. Show that $\\mathbb{E}[T] = nH_n$, where$H_n = \\sum_{i=1}^n \\dfrac{1}{n}$. 假设已经有i-1个盒子里面有球了,出现一个新的盒子里面有球的概率是$p = [n-(i-1)]/n$,因此从i-1个盒子里面刚刚有球,到i个盒子里面有球的期望时间为$1/p = n/[n-(i-1)]$。总的时间$\\mathbb{E}(T) = n\\sum_{i=1}^n \\dfrac{1}{n-i+1} = n\\sum_{i=1}^n \\dfrac{1}{n}$\n", 211 | "* 【动态规划问题合集】Longest Path in DAG(先拓扑排序);Finding the maximum area polygon(圆上的顶点找面积最大;需要选择不同的顶点运行若干次;OPT为截止第i个顶点时的最大面积);Longest palindrome subsequence(OPT为从i到j的回文序列长度);Matrix-chain multiplication(如何选择矩阵乘法的顺序使得乘法运算量最小,OPT为从i到j的最小乘法量);Viterbi algorithm(Use $T[i, m]$ to represent the probability of observing sequence $\\langle \\sigma_1, \\cdots, \\sigma_m \\rangle$ with node $v_i$ labeling $\\sigma_m$,$T[i,m] = \\max_j (T[j, m-1] \\cdot A[j, i] \\cdot B[j, m])$,其中A是从状态j到状态i的概率,B是在状态j观察到m的概率);Edit Distance(``A[i, j] = min(A[i-1, j] + 1, A[i, j-1] + 1, A[i-1, j-1] + I(x[i] != y[j]))``分布代表delete,insert,mismatch/match);Longest Common Sequence(可以有节省储存空间的方法,弄成一维数组,不过需要一个临时数值存储上一步的值)Longest increasing subsequence(``A[i] = max(A[j] + (S[j] < S[i]) * 1) for all j < i``)\n", 212 | "* 【分治法问题合集】Monge Matrix找每行左数第一个最小数字的索引;\n", 213 | "* 【贪心问题合集】Unit tasks scheduling(每个任务单位时间能完成,超过ddl$d_i$会有惩罚$p_i$;按惩罚排序,并把任务依次放到其ddl前能放的最晚位置,如果放不下,就扔到最后);Coin changing(每次都选最大面额的,选不了再考虑小一点面额的;证明:该算法结果$a = (a_0, a_1, \\cdots, a_k)$,一个最优解$b = (b_0, b_1, \\cdots, b_k)$。先证$b_i \\le c-1,\\, \\forall 0\\le i \\le k-1$,不然可以换成更好的结果;$c^j \\le x < c^{j+1}$再证$\\sum_{i=0}^{j-1} b_i c^i \\le \\sum_{i=0}^{j-1} (c-1) c^i = c^j - 1 < c^j \\le x$,最后归纳法);\n", 214 | "* 【NPC证明问题合集】『先说明是NP』Feedback set(a feedback set is a set X ⊆ V with the property that G − X has no cycles,从顶点覆盖问题转化:对于一个顶点覆盖问题G,每个顶点和每条边都变为一个顶点,每条边连接相关的顶点和边。一个顶点覆盖对应转化之后图的feedback set);Multiple Interval Scheduling(有些任务冲突,最多能选多少任务去做,是NPC。规约Independent Set。每个任务是一个节点,如果两个任务不能同时进行,就把他们连一条边。如果有大小为k的Independent set,那么这k个任务就可以都被处理。)Densest k-Subgraph(大小为k的点集的子集S,是的E(S)最大;Decision 
version:是否存在一个大小为k的子点集S,G(S)内边数目为y?规约:CLIQUE问题,是否存在包含k个节点的CLIQUE(任两个节点两两相连);给定一个这个问题,问是否存在k个节点的CLIQUE使得边的数目为k(k-1)/2。)Maximum Coverage(和set cover类似,问的是挑选k个子集,能覆盖的最多元素数目是否大于y,从set Cover规约,选择k个set,是否能够包含所有的元素。令y=|U|,直接转化);Tiling Problem(从Bin Packing规约,给定Bin Packing的一个输入$x_1, x_2, \\cdots, x_n$,构造一个Tiling的输入,使得n个矩形的宽度分别为$x_i$,长度为2(任何大于1的数都可以),大举行的宽度为1,长度为$2k$。我们依次增加k,看看最少k等于多少时,能够把这些矩形都包含进去,返回这个k。这个k的数值就是Bin Packing问题里面最少需要的子集个数。注意到因为小矩形的长大于1,因此没法旋转,只能横向排。)\n", 215 | "* 【Recurrence】$T(n)=2T(\\dfrac{n}{2})+n\\log n \\Rightarrow \\Theta(n \\log n \\log n)$;$T(n)=2T(\\sqrt{n})+\\log n \\Rightarrow \\Theta(\\log n \\log n)$\n", 216 | "* 【Rearrangable Matrix】通过行交换和列交换能够把对角元都变为1的,称为RM。转化为二分图匹配,二分图左边顶点每个代表一行,右边每个代表一列,如果有一个元素等于1,就连一条边从左边到右边。如果存在一个完全匹配(perfect matching)就代表是RM。\n", 217 | "* 【Turan's Bound】Independent set的下界为$\\sum_{v \\in V} \\dfrac{1}{deg(v)+1}$。把顶点排成一排,考虑一个独立集,包含顶点v,如果这个顶点所有的相邻顶点都在它的后面。那么对于每个被选的顶点,它和它相邻的顶点平均贡献的独立集长度为$\\dfrac{1}{deg(v)+1}$。有的未被选择的顶点其实应该被count多次,但是我们这里只count一次,得到上面那个下界。因此这个独立集的大小的下界为$\\sum_{v \\in V} \\dfrac{1}{deg(v)+1}$,这还不是最大的独立集,因此一个最大的独立集比这个还大。\n", 218 | "* 【min-cost matching with exactly k edges】转化为网络流问题,引入源点s,s',中间的容量为k;s'到左边图连边容量为1;左右相连;右边图到t容量为1。然后使用最大流的算法来解决。\n", 219 | "* 【二分图maximum matching = minimum vertex cover】Kőnig's theorem: 从一个最大匹配构最小顶点覆盖的方法。注意最大匹配的性质是不能找到更长的alternating path。U: 左边L没有被匹配的节点;Z: 通过alternating path和U相连的节点;$K=(L \\backslash Z) \\cup (R \\cap Z)$是构造出来的顶点覆盖。首先,K是一个顶点覆盖。对于在匹配M中的边,如果它在AP中,那么它和$R \\cap Z$相接;如果不在AP中,那么和$L\\backslash Z$相接。对于不在M中的边,如果它在AP中,那么它和$R \\cap Z$相接;如果不在AP中,它的左端点肯定不在Z里面,因此也在K里面。其次,K的数目和M的数目一样多。每个K中的顶点都与M相接:左边的顶点中不相接的已经被剔除了;右边的顶点就是和Z相交得到的,因此也在AP中,如果不在M中,就能够扩展更长的路径了。M中每条边不可能两个端点都在K中:因为如果这条边在AP中,那么其左端点被去掉了;如果不在AP中,其右顶点不在K中。最后,K的数目不可能小于M的数目,因此,K是最小的顶点覆盖。因为,M包含互不相接的的|M|个边,要保证这|M|个边被覆盖,至少需要|M|个顶点。\n", 220 | "* 【二分图maximum independent set = maximum matching】K是最小顶点覆盖 $\\Leftrightarrow$ V-K是最大独立集。如果K是顶点覆盖,那么把K和其相关的边去掉那么就没有边了,因此V-K是独立集。如果V-K是独立集,那么它们之间没有任何的边,因此所有的边都和K有关,因此K是顶点覆盖。由于其等价,一个最小的时候,另一个也最大。\n", 221 | "* 【Eularian graph】证明所有顶点度数为偶数的存在遍历路径。方法一:从任意的一个点开始,每选择一条边之后就把这个边从图里面删除,当有多条边可以选择的时候,要选择删除这条边之后不会破坏图联通性的边。检测图连通性的方式最差可以使用一个DFS在多项式时间里面找出来。对于连通性的检测的次数最多为O(d|E|),其中d为图中最大的度数。方法二:从任意的一个点开始,随便走,最后肯定能回到这个点,只不过有些点访问不到。在刚刚的路径里面找一个点,它还有往外面连通的边,然后从这个点再随便走一个环,并且把这个环加到刚刚的环的某个位置上。重复此操作,可以找到最后的环。复杂度O(|E|)\n", 222 | "* 【平面上直线交点在某一范围内均匀采样】在之前的过程中,每一层的merge都记录下后半部分表内元素和能够和前半部分表内元素产生逆序对的格式,按照前序遍历把这些个数都记录下来。这样我们有一个数组$C_1, C_2, \\cdots, C_s$。假设总逆序对个数为$C=\\sum C_i$,那么就按照C个数中随机采样i个点,其中最小一个点的分布依次产生一个随机数(for i=q downto 1),这样能够在O(q)的时间内产生均匀分布的q个点。我们下一步要在O(q)的时间内计算出这q个数值中的每一个数值$x_i$对应的$a=\\max(\\{i: \\sum_{j=1}^i C_j \\le x\\})$和$b=x-\\sum_{j=1}^a C_j$,然后通过这个数组下储存的辅助信息就可以求得到对应的是哪两条直线,从而可以求出交点。要想实现找到a,b的过程,可以使用Van Emde Boas tree。\n", 223 | "\n", 224 | "### Sort\n", 225 | "\n", 226 | "* 【Bubble Sort】重复$N^2$轮,``if A[j] < A[j-1]: swap(A[j], A[j-1])``;证明正确性:每个j循环之后,A[N-i:N]都变为有序的;复杂度:不管什么情况都需要$O(n^2)$次判断,最好情况可以不做交换,平均和最坏需要$O(n^2)$次交换\n", 227 | "* 【Insertion Sort】``for i = 2 to N: {tmp = A[i]; for j = i downto 2: {if tmp >= A[j-1]: break; A[j] = A[j-1]} A[j] = tmp}``;证明正确性:每次j循环之后A[1:i]都变为有序的;运行时间:$O(n^2)$\n", 228 | "* 【Heap Sort】最大堆:一颗二叉树,每个父节点都比自己的子节点大;一个最大堆,根节点i被修改了,通过max_heapify可以使得其在$O(\\log n)$时间内维护成最大堆;思路:先建立一个堆,由最大堆的性质可以知道,堆的根是最大的元素,把根和最后一个叶子互换,然后把那个叶子从堆中取走,再维护成一个最大堆。如此重复,可以得到排序\n", 229 | "```\n", 230 | "def max_heapify(A, i):\n", 231 | " l = left(i)\n", 232 | " r = right(i)\n", 233 | " if l <= heapsize and A[l] > A[i]:\n", 234 | " largest = l\n", 235 | " else:\n", 236 | " largest = i\n", 237 | " if r <= heapsize and A[r] > A[largest]\n", 238 | " 
254 | "### Graph\n", 255 | "\n", 256 | "* 【BFS】\n", 257 | "```\n", 258 | "def BFS(G, s):\n", 259 | "    for each u in G:\n", 260 | "        color[u] = WHITE\n", 261 | "    color[s] = GRAY\n", 262 | "    Q = [s]\n", 263 | "    while Q:\n", 264 | "        u = Q.pop(0)  # dequeue from the front (FIFO)\n", 265 | "        for each v adj to u:\n", 266 | "            if color[v] == WHITE:\n", 267 | "                color[v] = GRAY\n", 268 | "                Q.append(v)\n", 269 | "        color[u] = BLACK\n", 270 | "```\n", 271 | "* 【DFS】\n", 272 | "```\n", 273 | "def DFS(G):\n", 274 | "    for each u in G:\n", 275 | "        color[u] = WHITE\n", 276 | "    for each u in G:\n", 277 | "        if color[u] == WHITE:\n", 278 | "            DFS_visit(u)\n", 279 | "def DFS_visit(u):\n", 280 | "    color[u] = GRAY\n", 281 | "    for each v adj to u:\n", 282 | "        if color[v] == WHITE:\n", 283 | "            DFS_visit(v)\n", 284 | "    color[u] = BLACK\n", 285 | "```\n", 286 | "* 【Topological Order】Property: in a DAG the vertices can be ordered so that every edge points forward. Run DFS and output the vertices in decreasing order of the time they are colored black: a node can only turn black after all its successors have turned black, so nodes blackened later come earlier in the order.\n", 287 | "* 【Strongly Connected Components】Definition: a vertex set in which any two vertices can reach each other is a strongly connected component; the problem is to decompose a given graph into its SCCs. The transpose of a graph reverses every edge: $G^T=(V, E^T)$, $E^T = \\{(u,v) | (v, u)\\in E\\}$\n", 288 | "```\n", 289 | "def strongly_connected_components(G):\n", 290 | "    DFS(G) record turn-black time f(u) for all u\n", 291 | "    DFS(G^T) in decreasing order of f(u) and record the forest\n", 292 | "    each tree in the forest is a strongly connected component\n", 293 | "```\n", 294 | "\n", 295 | "### Matching\n", 296 | "\n", 297 | "* 【Definition】A matching in a graph $G(V,E)$ is an edge set $M\\subseteq E$ such that every vertex appears in at most one edge of M. If every vertex appears in exactly one edge of M, it is a perfect matching. matching: a set M of pairwise non-adjacent edges; matching number: the number of edges in such a set; maximum matching: a matching with the largest matching number; perfect matching: a matching covering every vertex; M-alternating path: a path in $G=(V, E)$ whose edges alternate between $M$ and $E\\setminus M$; M-augmenting path: an M-alternating path whose two endpoints are both uncovered by M\n", 298 | "* 【Hungarian Algorithm】\n", 299 | "```\n", 300 | "start from any matching M\n", 301 | "if M is a perfect matching:\n", 302 | "    return M\n", 303 | "x0 = an exposed vertex in X\n", 304 | "A = {x0} \n", 305 | "B = {}\n", 306 | "if N(A) == B:\n", 307 | "    return NO_PERFECT_MATCHING\n", 308 | "    # because |N(A)|=|B|=|A|-1<|A|, violating Hall's theorem\n", 309 | "else:\n", 310 | "    y1 = a vertex in N(A) but not in B\n", 311 | "    if y1 is covered by M:\n", 312 | "        B += {y1}\n", 313 | "        A += {x1: (y1,x1) in M}\n", 314 | "        goto 'if N(A) == B'\n", 315 | "    else:\n", 316 | "        P = the alternating path from x0 to y1\n", 317 | "        replace M by M' = M augmented along P\n", 318 | "        goto 'if M is a perfect matching'\n", 319 | "```\n", 320 | "* 【Hall's Theorem】Motivation: how to certify, without computing a maximum flow, that a bipartite graph has no perfect matching, i.e. that the maximum flow is less than n. By max-flow min-cut, exhibiting a cut of capacity less than n suffices. (Of course, computing the maximum flow directly also answers the question.) Statement: for a subset $A\\subseteq X$, let $\\Gamma(A)\\subseteq Y$ denote the set of vertices adjacent to A; the bipartite graph $(X,Y)$ has a perfect matching if and only if $|\\Gamma(A)|\\ge |A|$ for all $A\\subseteq X$.\n", 321 | "\n", 322 | "\n",
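The alternating-path idea above can be made concrete with the classic augmenting-path routine for bipartite matching (Kuhn's algorithm); a minimal Python sketch, assuming an adjacency-list input format:

```python
def max_bipartite_matching(adj, n_left, n_right):
    """adj[u] lists the right-side vertices adjacent to left vertex u."""
    match_right = [-1] * n_right          # match_right[v] = left partner of v

    def try_augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                # v is free, or v's partner can be rematched elsewhere
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    matching = 0
    for u in range(n_left):
        if try_augment(u, [False] * n_right):
            matching += 1
    return matching, match_right

adj = [[0, 1], [0], [1, 2]]               # 3 left and 3 right vertices
print(max_bipartite_matching(adj, 3, 3))  # (3, [1, 0, 2]), a perfect matching
```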
323 | "### Linear Programming (Supplementary)\n", 324 | "\n", 325 | "* 【Representation】Standard form of a linear program:\n", 326 | "$$\n", 327 | "\\begin{aligned}\n", 328 | "& \\max\\ {\\bf C}^T {\\bf x} \\\\\n", 329 | "& s.t. \\quad {\\bf Ax} \\le {\\bf b} \\\\\n", 330 | "& \\quad \\quad {\\bf x} \\ge 0\n", 331 | "\\end{aligned}\n", 332 | "$$\n", 333 | "Every LP can be converted to standard form. In particular, if some variable $x_j$ is not constrained by $x_j \\ge 0$, introduce $x'_j,x''_j \\ge 0$ and substitute $x_j = x'_j - x''_j$. In the slack form, the joint inequality constraints become equalities, leaving only inequality constraints on individual variables:\n", 334 | "$$\n", 335 | "\\begin{aligned}\n", 336 | "& \\max\\ z = v + \\sum_{j\\in N}c_j x_j \\\\\n", 337 | "& s.t. x_i = b_i - \\sum_{j \\in N} a_{ij}x_j, \\, \\forall i \\in B \\\\\n", 338 | "& \\quad \\quad x_i \\ge 0, \\, \\forall i \\in B\\cup N\n", 339 | "\\end{aligned}\n", 340 | "$$\n", 341 | "* 【Simplex】Basic solution: in the slack form, set all variables on the right-hand side of the first constraint (the nonbasic variables) to 0. Basic feasible solution: a basic solution that also satisfies the second (nonnegativity) constraint. Idea of the simplex method: find an initial basic feasible solution; repeatedly pick a nonbasic variable in the objective and increase it as far as the constraints allow so as to improve the objective, then exchange this nonbasic variable $x_e$ with a basic variable $x_l$ (pivot); repeat. A routine initialize_simplex is needed to find the first basic feasible solution.\n", 342 | "```\n", 343 | "def pivot(N, B, A, b, c, v, l, e):\n", 344 | "    b_new[e] = b[l] / a[l,e]\n", 345 | "    for each j in N-{e}:\n", 346 | "        a_new[e,j] = a[l,j] / a[l,e]\n", 347 | "    a_new[e,l] = 1/a[l,e]\n", 348 | "    for each i in B-{l}:\n", 349 | "        b_new[i] = b[i] - a[i,e] * b_new[e]\n", 350 | "        for each j in N-{e}:\n", 351 | "            a_new[i,j] = a[i,j] - a[i,e] * a_new[e,j]\n", 352 | "        a_new[i,l] = - a[i,e] * a_new[e,l]\n", 353 | "    v_new = v + c[e] * b_new[e]\n", 354 | "    for each j in N-{e}:\n", 355 | "        c_new[j] = c[j] - c[e] * a_new[e,j]\n", 356 | "    c_new[l] = - c[e] * a_new[e,l]\n", 357 | "    N_new = N - {e} + {l}\n", 358 | "    B_new = B - {l} + {e}\n", 359 | "    return N_new, B_new, A_new, b_new, c_new, v_new\n", 360 | "def simplex(A, b, c):\n", 361 | "    N, B, A, b, c, v = initialize_simplex(A, b, c)\n", 362 | "    while there is some index e in N with c[e] > 0:\n", 363 | "        for each i in B:\n", 364 | "            if a[i, e] > 0:\n", 365 | "                delta[i] = b[i] / a[i, e]\n", 366 | "            else:\n", 367 | "                delta[i] = inf\n", 368 | "        choose l in B that minimizes delta[l]\n", 369 | "        if delta[l] == inf:\n", 370 | "            return UNBOUNDED\n", 371 | "        else:\n", 372 | "            N, B, A, b, c, v = pivot(N, B, A, b, c, v, l, e)\n", 373 | "    for i = 1 to n:\n", 374 | "        if i in B:\n", 375 | "            x[i] = b[i]\n", 376 | "        else:\n", 377 | "            x[i] = 0\n", 378 | "    return x\n", 379 | "```"
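To sanity-check a small standard-form instance against these notes, one can hand it to an off-the-shelf solver; a sketch assuming scipy is available (the objective is negated because `linprog` minimizes):

```python
from scipy.optimize import linprog

# maximize 3x1 + 2x2  s.t.  x1 + x2 <= 4, x1 + 3x2 <= 6, x >= 0
c = [-3, -2]                       # negated: linprog minimizes c @ x
A_ub = [[1, 1], [1, 3]]
b_ub = [4, 6]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(res.x, -res.fun)             # optimum x = [4, 0], value 12
```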
380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": { 386 | "collapsed": true 387 | }, 388 | "outputs": [], 389 | "source": [] 390 | } 391 | ], 392 | "metadata": { 393 | "kernelspec": { 394 | "display_name": "Python 3", 395 | "language": "python", 396 | "name": "python3" 397 | }, 398 | "language_info": { 399 | "codemirror_mode": { 400 | "name": "ipython", 401 | "version": 3 402 | }, 403 | "file_extension": ".py", 404 | "mimetype": "text/x-python", 405 | "name": "python", 406 | "nbconvert_exporter": "python", 407 | "pygments_lexer": "ipython3", 408 | "version": "3.4.2" 409 | } 410 | }, 411 | "nbformat": 4, 412 | "nbformat_minor": 2 413 | } 414 | -------------------------------------------------------------------------------- /ProblemSet.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Useful Resources\n", 8 | "\n", 9 | "https://www.cs.princeton.edu/courses/archive/spring13/cos423/lectures.php" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "## 1. Longest path in a DAG\n", 17 | "\n", 18 | "### Description\n", 19 | "\n", 20 | "We are given a directed acyclic graph G and two specific vertices s and t in G. Each edge in the graph has a length. Design a polynomial time algorithm that finds the longest path from s to t. (If there is no path from s to t, your algorithm should be able to detect the fact.)\n", 21 | "\n", 22 | "### Solution (DP on graph)\n", 23 | "\n", 24 | "```\n", 25 | "(V, E) = sort G in topological order\n", 26 | "\n", 27 | "for each node x in V:\n", 28 | "    x.dist = -inf\n", 29 | "    x.pred = null\n", 30 | "    \n", 31 | "ind_s = index of s in V\n", 32 | "V[ind_s].dist = 0\n", 33 | "\n", 34 | "for each node x in V[ind_s:]:\n", 35 | "    for each edge e = (y, x) in E:\n", 36 | "        if (x.dist < y.dist + e.weight):\n", 37 | "            x.dist = y.dist + e.weight\n", 38 | "            x.pred = y\n", 39 | "\n", 40 | "ind_t = index of t in V\n", 41 | "if V[ind_t].dist > -inf:\n", 42 | "    return V[ind_t].dist, path tracked by V[ind_t].pred\n", 43 | "else:\n", 44 | "    return no-path-from-s-to-t\n", 45 | "```\n", 46 | "\n", 47 | "Theorem: Topological order can be found in any DAG in linear time $O(|V|+|E|)$, e.g. by repeatedly removing a node with no incoming edge from V." 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "## 2. Finding the maximum area polygon\n", 55 | "\n", 56 | "### Description\n", 57 | "\n", 58 | "We are given a unit circle and n points on the circle. Design a polynomial time algorithm that, given a number m < n, finds m points (out of n points) such that the area of the polygon formed by the m points is maximized.\n", 59 | "\n", 60 | "### Solution (DP)\n", 61 | "\n", 62 | "Points are labeled 1, 2, ..., n. A[j, k] is the maximum area if we choose k points among points 1, 2, ..., j, where point 1 and j are chosen and fixed.\n", 63 | "\n", 64 | "We have\n", 65 | "\n", 66 | "$$\n", 67 | "A[j, k] = \\max_{1 < i < j}\\, (A[i, k-1] + \\mathrm{area}(1, i, j))\n", "$$\n", "\n", "If it is $> 0$ it is feasible and there exists a path. The path is given by tracing $P[i, m]$ back. Running time is at most $O(m|V|)$." 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "## 6. Edit distance\n", 234 | "\n", 235 | "### Description\n", 236 | "\n", 237 | "In order to transform one string x to a target string y, we can perform various edit operations. Our goal is, given x and y, to produce a series of edits that change x to y. We may choose from among edit operations:\n", 238 | "\n", 239 | "1. Insert a letter, (e.g., changing 100 to 1001 takes one insertion)\n", 240 | "1. Delete a letter, (e.g., changing 100 to 10 takes one deletion)\n", 241 | "1. Replace a letter by another (e.g., you need do one replacement to change 100 to 000).\n", 242 | "\n", 243 | "Design an efficient algorithm that finds a series of edit operations that change x to y and the total number of edits is minimized." 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "### Solution (DP)\n", 251 | "\n", 252 | "Also a DP problem. Define A[i,j] as the number of edit operations needed to change x[1:i] to y[1:j].\n", 253 | "\n", 254 | "We have\n", 255 | "\n", 256 | "```\n", 257 | "A[i, j] = min(\n", 258 | "    A[i-1, j] + 1, # delete\n", 259 | "    A[i, j-1] + 1, # insert\n", 260 | "    A[i-1, j-1] + I(x[i] != y[j]) # match or replace\n", 261 | ")\n", 262 | "```\n", 263 | "\n", 264 | "where $I(\\cdot)$ is the indicator function. The DP runs from $i+j = 2$ to $i+j = $ len(x) + len(y)." 265 | ] 266 | },
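A runnable version of this DP that also recovers one optimal edit script (a sketch; the operation labels are my own):

```python
def edit_script(x, y):
    m, n = len(x), len(y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = i                      # i deletions
    for j in range(n + 1):
        A[0][j] = j                      # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = min(A[i-1][j] + 1,                       # delete x[i-1]
                          A[i][j-1] + 1,                       # insert y[j-1]
                          A[i-1][j-1] + (x[i-1] != y[j-1]))    # match/replace
    ops, i, j = [], m, n
    while i > 0 or j > 0:                # trace one optimal path back
        if i > 0 and j > 0 and A[i][j] == A[i-1][j-1] + (x[i-1] != y[j-1]):
            if x[i-1] != y[j-1]:
                ops.append(f"replace {x[i-1]} -> {y[j-1]}")
            i, j = i - 1, j - 1
        elif i > 0 and A[i][j] == A[i-1][j] + 1:
            ops.append(f"delete {x[i-1]}"); i -= 1
        else:
            ops.append(f"insert {y[j-1]}"); j -= 1
    return A[m][n], ops[::-1]

print(edit_script("100", "000"))   # (1, ['replace 1 -> 0'])
```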
267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "## 7. Four Russians Speedup\n", 272 | "\n", 273 | "### Description\n", 274 | "\n", 275 | "In this problem, we explore an interesting trick to speed up a dynamic program. We use the edit distance problem as an example (see the last problem). We assume the size of the alphabet is constant. Suppose the two strings are $S_1$ and $S_2$, both of length $n$. To start with, your solution for the last problem must run in $O(n^2)$ time. If it is so, in the dynamic program for the problem, you need to fill a two-dimensional table $M$. In fact, we can fill out this table in a cleverer way such that the running time improves to $O(n^2/ \\log n)$ (I know, it is a small improvement and of little practical interest for this particular problem, but the same trick has been used elsewhere to make a huge difference). In this trick, we need a bit of preprocessing to make many small tables. Then we fill only a subset of the entries of the dynamic program table; the value of a table entry we are filling depends on the values of some entries we have already filled and on the small tables we made in the beginning.\n", 276 | "\n", 277 | "To make it a little bit more formal, we define a t-block to be a $t \\times t$ square in the dynamic programming table. Let $t = \\dfrac{\\log n}{4}$. We first observe that the distance values in a t-block starting in position $(i,j)$ are a function of the values of its first row and first column and the substrings $S_1[i,\\cdots ,i+t−1]$ and\n", 278 | "$S_2[j,\\cdots,j+t−1]$.\n", 279 | "\n", 280 | "Now, let us observe another interesting fact: in any row or column, the values of two adjacent cells differ by at most 1.\n", 281 | "\n", 282 | "* Please prove this fact.\n", 283 | "\n", 284 | "We say two t-blocks $B_1$, $B_2$ are offset-equivalent if there is a number $c$, such that $B_1[i, j] = B_2[i, j]+c$ for all $i$, $j$.\n", 285 | "\n", 286 | "* There are C types of t-blocks (up to offset-equivalence). Show how large C is.\n", 287 | "\n", 288 | "It is obvious we can fill a t-block in $O(t^2)$ time. Therefore, filling all C types of blocks takes $O(Ct^2)$ time. These are the small tables we produce for preprocessing. Now, we start to fill up the dynamic table $M$. We are going to fill up only $O(n^2/ \\log n)$ entries.\n", 289 | "\n", 290 | "* We have given you enough hints. Now, it is up to you to develop the entire algorithm, which should run in $O(n^2/ \\log n)$ time. In particular, you need to describe which entries of M need to be filled and how to compute their values." 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "### Solution (DP optimization)\n", 298 | "\n", 299 | "(0) \n", 300 | "\n", 301 | "Draw grids on the big DP table and divide it into blocks. The objective is to obtain $M[n,n]$, which means we do not care about the values inside the table. We can compute the table block by block. Input for each block is 1) the upper left value A; 2) the first column B; 3) the first row C; 4) the corresponding $S_1[i, \\cdots, i+t-1]$ and $S_2[j, \\cdots, j+t-1]$. The output of the block is 1) the last column D; 2) the last row E; 3) the lower right value F.\n", 302 | "\n", 303 | "We do the computation for all possible blocks in advance and use the results to fill the big DP table.\n", 304 | "\n", 305 | "(1)\n", 306 | "\n", 307 | "In any row or column, the values of two adjacent cells differ by at most 1. It is written as $M[i, j-1] - 1 \\le M[i, j] \\le M[i, j-1] + 1$.\n", 308 | "\n", 309 | "The second part is easy to prove by the transition equation $M[i,j] = \\min(M[i,j-1]+1, \\cdots)$. 
For the first part, suppose the value of $M[i,j]$ comes from a path ``M[i-k, j-1] -> M[i-k+1, j] -> M[i, j] ``. The first transition costs at least zero, and the second transition costs at least $k-1$. Thus, $M[i,j] \\ge M[i-k, j-1] + (k-1)$. Since another route ``M[i-k, j-1] -> M[i, j-1]`` costs at most $k$, we have $M[i-k, j-1] + k \\ge M[i, j-1]$. Combining the two, we have $M[i, j-1] - 1 \\le M[i, j]$.\n", 310 | "\n", 311 | "(2)\n", 312 | "\n", 313 | "By offset equivalence, a different upper-left value A does not produce a different block type. By the property in (1), the absolute values in B and C do not matter, so we can use increment sequences to denote B and C, e.g. B=$\\{+1, -1, +1, 0, -1, \\cdots \\}$. Each of $S_1[i, \\cdots, i+t-1]$ and $S_2[j, \\cdots, j+t-1]$ has $|V|^t$ different possibilities.\n", 314 | "\n", 315 | "Combined, we have $C = 3^t 3^t |V|^t |V|^t = 3^{2t}|V|^{2t}$\n", 316 | "\n", 317 | "(3)\n", 318 | "\n", 319 | "In the preprocessing step, we need to compute results for $C$ blocks. Each takes $O(t^2)$ time. Combined, we need $O[(3|V|)^{2t} t^2]$ time.\n", 320 | "\n", 321 | "In the computation step, there are $O[(\\dfrac{n}{t})^2]$ blocks to fill in the large table. The input and output size of each block is $O(t)$, therefore we need $O(t)$ time to look up a block. Combined, we need $O(\\dfrac{n^2}{t})$ time. \n", 322 | "\n", 323 | "By taking $t = \\dfrac{\\log_{3|V|} n}{2}$, the total running time is $O[\\dfrac{n^2}{t} + (3|V|)^{2t} t^2] = O(n^2 / \\log n)$.\n", 324 | "\n", 325 | "```\n", 326 | "# do the preprocessing\n", 327 | "for c in (all types of the block):\n", 328 | "    result = compute the result for the block c\n", 329 | "    B[c] = result\n", 330 | "    \n", 331 | "# do the computation (adjacent blocks share a boundary row/column)\n", 332 | "for i = 1 to n step t-1:\n", 333 | "    for j = 1 to n step t-1:\n", 334 | "        M[i+t-1, j+t-1], M[i+1:i+t-1, j+t-1], M[i+t-1, j+1:j+t-1] \\\n", 335 | "            = B(M[i, j], M[i, j+1:j+t-1], M[i+1:i+t-1, j], incr(S1[i:i+t-1]), incr(S2[j:j+t-1]))\n", 336 | "    \n", 337 | "return M[n,n]\n", 338 | "```\n", 339 | "\n", 340 | "### Reference\n", 341 | "\n", 342 | "* http://cs.au.dk/~cstorm/courses/AiBS_e12/slides/FourRussians.pdf" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "## 8. Solve the following two recurrences\n", 350 | "\n", 351 | "### Description\n", 352 | "\n", 353 | "(a) $T(n)=2T(\\dfrac{n}{2})+n\\log n$\n", 354 | "\n", 355 | "(b) $T(n)=2T(\\sqrt{n})+\\log n$" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "### Methods\n", 363 | "\n", 364 | "Methods to solve recurrences are \n", 365 | "\n", 366 | "1. Guess and confirm (can be used in all cases): guess and substitute to prove it's true; unroll to find the rule\n", 367 | "2. Recurrence tree (solves recurrences like $T(n) = aT(\\frac{n}{b}) + f(n)$): $T(n) = \\sum_{k=0}^L a^k f(\\dfrac{n}{b^k})$\n", 368 | "3. Master theorem\n", 369 | "\n", 370 | "![](figures/fig20180320_master_theorem.png)\n", 371 | "\n", 372 | "Something useful:\n", 373 | "* $\\sum_{i=1}^n \\dfrac{1}{i} = \\Theta(\\log n)$\n" 374 | ] 375 | },
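Before solving such recurrences by hand, a quick numeric check can confirm the guessed order of growth; a sketch for (a), with the base case T(1)=1 an arbitrary assumption (the printed ratio should be roughly constant if the guess $\Theta(n \log^2 n)$ is right):

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):                      # T(n) = 2 T(n/2) + n log n, T(1) = 1 (assumed)
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n * math.log2(n)

for n in [2**10, 2**14, 2**18]:
    print(n, T(n) / (n * math.log2(n) ** 2))   # ratio flattens near 0.5
```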
376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "### Solution\n", 381 | "\n", 382 | "(a)\n", 383 | "\n", 384 | "Use the recurrence tree formula\n", 385 | "\n", 386 | "$$\n", 387 | "\\begin{aligned}\n", 388 | "T(n) & = \\sum_{k=0}^L 2^k f(\\dfrac{n}{2^k}) = \\sum_{k=0}^L 2^k \\dfrac{n}{2^k} \\log(\\dfrac{n}{2^k}) \\\\\n", 389 | "& = \\sum_{k=0}^L n (\\log n - k) \\\\\n", 390 | "& = n \\sum_{j=0}^{\\log n} j \\\\\n", 391 | "& = \\Theta(n \\log n \\log n)\n", 392 | "\\end{aligned}\n", 393 | "$$" 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": {}, 399 | "source": [ 400 | "(b)\n", 401 | "\n", 402 | "By drawing the recurrence tree, we find the amount of computation on each layer is the same: $2^k \\log(n^{\\dfrac{1}{2^k}}) = \\log n$. The recursion bottoms out after $\\Theta(\\log \\log n)$ layers (when $n^{1/2^k}$ becomes constant), so $T(n) = \\Theta(\\log n \\log\\log n)$" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "## 9. Maximal Common Subsequence\n", 410 | "\n", 411 | "### Description\n", 412 | "\n", 413 | "String C is a subsequence of string A if C can be obtained by deleting some letters from A. For example, ``ade`` is a subsequence of ``abcde``. We are given two strings $A = a_1a_2 \\cdots a_m$ and $B = b_1b_2 \\cdots b_n$. Design an algorithm that finds a longest common subsequence of A and B using $O(mn)$ time and $O(m + n)$ space. (You will get half of the points if you can find a polynomial time algorithm whose running time and/or space are worse than stated.)" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "### Solution (DP and space optimization)\n", 421 | "\n", 422 | "Define ``P[i, j]`` to be the length of the longest common subsequence (LCS) of B[1:i] and A[1:j], and ``Q[i,j]`` to trace the path of this LCS. A DP formula can be easily written as\n", 423 | "\n", 424 | "$$\n", 425 | "P[i,j] = \\max(P[i-1, j], P[i,j-1], P[i-1, j-1] + I(B[i] == A[j]))\n", 426 | "$$\n", 427 | "\n", 428 | "This takes $O(mn)$ time and $O(mn)$ space. Further investigation shows that only the previous row of the DP matrix is needed, thus we can have the following algorithm running in $O(mn)$ and using $O(m+n)$ space to obtain the length of the LCS.\n", 429 | "\n", 430 | "```\n", 431 | "X = [0] * (m+1)\n", 432 | "Y = [0] * (m+1)\n", 433 | "Z = [0] * (m+1)\n", 434 | "\n", 435 | "for i = 1 to n:\n", 436 | "    for j = 1 to m:\n", 437 | "        X[j] = max(Y[j], X[j-1], Y[j-1] + 1 * (B[i] == A[j]))\n", 438 | "        if i == floor(n/2):\n", 439 | "            Z[j] = which term is taken in the above max function\n", 440 | "    Y = X\n", 441 | "```\n", 442 | "\n", 443 | "Notice that this alone cannot trace what the LCS is. Following Dan Hirschberg (1975), we can use another 1D array to note down where an optimal path crosses the middle row (the ``Z`` in the above) and find the crossing point k, then solve LCS(B[1:n/2], A[1:k]) and LCS(B[n/2+1:n], A[k+1:m]) recursively. Though more computation is done, the running time is still $O(mn)$.\n", 444 | "\n", 445 | "### Reference\n", 446 | "\n", 447 | "* https://www.ics.uci.edu/~eppstein/161/960229.html" 448 | ] 449 | },
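The two-row length computation above, as runnable Python (a sketch of the space-saving step only; Hirschberg's divide-and-conquer reconstruction is omitted):

```python
def lcs_length(A, B):
    # keep only the previous row Y and the current row X
    Y = [0] * (len(B) + 1)
    for a in A:
        X = [0]
        for j, b in enumerate(B, 1):
            X.append(max(Y[j], X[j-1], Y[j-1] + (a == b)))
        Y = X
    return Y[len(B)]

print(lcs_length("abcde", "ace"))   # 3
```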
The player who removes the last stick wins. Design a polynomial time algorithm (polynomial in n) that decides which player has a winning strategy (i.e., no matter how the opponent plays, the player can take certain moves to win the game).\n", 459 | "\n", 460 | "### Solution (induction)\n", 461 | "\n", 462 | "Simply tabulate small cases n = 1, 2, 3, ..., marking a position losing iff both moves (remove 1 or remove 4) lead to winning positions.\n", 463 | "\n", 464 | "The positions where the player to move (A) has no winning strategy turn out to be n = 2, 5, 7, 10, 12, 15, ...; in all other cases A has a winning strategy. We conjecture that B has a winning strategy iff ``n mod 5 == 0`` or ``n mod 5 == 2``, and otherwise A has a winning strategy. (Note the moves are exactly 1 or 4, not anything in between, so the losing positions are not simply the multiples of 5.)\n", 465 | "\n", 466 | "Next, we prove it. From a position with ``n mod 5`` in {0, 2}, removing 1 or 4 always yields ``n mod 5`` in {1, 3, 4}. Conversely, from ``n mod 5`` in {1, 3, 4} there is always a move back into {0, 2}: remove 1 if ``n mod 5`` is 1 or 3, and remove 4 if ``n mod 5`` is 4. Since n = 0 is a loss for the player to move (the opponent just took the last stick), induction shows the positions with ``n mod 5`` in {0, 2} are exactly the losing ones.\n", 467 | "\n", 468 | "Thus\n", 469 | "\n", 470 | "```\n", 471 | "if n mod 5 == 0 or n mod 5 == 2:\n", 472 | "    return B\n", 473 | "else:\n", 474 | "    return A\n", 475 | "```\n", 476 | "\n", 477 | "which is an $O(1)$ algorithm." 478 | ] 479 | },
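A brute-force tabulation confirming the pattern (a sketch; ``win[k]`` says whether the player to move wins with k sticks and moves {1, 4}):

```python
def winner(n):
    win = [False] * (n + 1)          # win[k]: player to move wins with k sticks
    for k in range(1, n + 1):
        win[k] = any(k >= m and not win[k - m] for m in (1, 4))
    return "A" if win[n] else "B"

print([n for n in range(1, 20) if winner(n) == "B"])
# [2, 5, 7, 10, 12, 15, 17]: exactly the n with n % 5 in {0, 2}
```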
480 | { 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "## 11. Monge Matrix\n", 485 | "\n", 486 | "### Description\n", 487 | "\n", 488 | "![](figures/fig20180320_monge_matrix.png)" 489 | ] 490 | }, 491 | { 492 | "cell_type": "markdown", 493 | "metadata": {}, 494 | "source": [ 495 | "### Solution (divide and conquer)\n", 496 | "\n", 497 | "(a)\n", 498 | "\n", 499 | "It can be proved by contradiction. Suppose there exists some $i < j$, s.t. $f(i)>f(j)$. Observe that ``A[i, f(j)] > A[i, f(i)]`` and ``A[j, f(i)] >= A[j, f(j)]``. We have ``A[i, f(j)] + A[j, f(i)] > A[i, f(i)] + A[j, f(j)]`` with ``i < j`` and ``f(j) < f(i)``, contradicting the Monge condition (which requires $A[i, f(j)] + A[j, f(i)] \\le A[i, f(i)] + A[j, f(j)]$).\n", "\n", "Coins come in denominations $c^0, c^1, \\cdots, c^k$ for some integers $c > 1$, $k \\ge 1$. You are asked to change for n cents using the fewest number of coins. Show that the greedy algorithm yields an optimal solution. (The greedy algorithm first tries the coin with the largest denomination, then the coin with the second largest denomination, and so on.)\n", 620 | "\n", 621 | "### Method\n", 622 | "\n", 623 | "```\n", 624 | "Greedy-choice property:\n", 625 | "    If element x cannot be used immediately by GREEDY, it can never be used later.\n", 626 | "```\n", 627 | "\n", 628 | "### Solution (greedy proof)\n", 629 | "\n", 630 | "Denote by $a = (a_0, a_1, \\cdots, a_k)$ the solution produced by the greedy algorithm, where $a_i$ is the number of coins of denomination $c^i$, and let $b = (b_0, b_1, \\cdots, b_k)$ be an optimal solution.\n", 631 | "\n", 632 | "First, we establish a property of optimal solutions: $b_i \\le c-1,\\, \\forall 0\\le i \\le k-1$. Otherwise, if some $b_i \\ge c$, we could exchange c coins of denomination $c^i$ for one coin of denomination $c^{i+1}$, leaving the others unchanged and decreasing the total number of coins. This contradicts optimality. \n", 633 | "\n", 634 | "Next, consider the moment a remaining amount x is faced, with $c^j \\le x < c^{j+1}$. We want to show that an optimal solution for x must contain a $c^j$ coin. Suppose $b_j = 0$ in an optimal solution. Obviously $b_i = 0$ for $i > j$ as well, or the value of the coins would exceed the required amount. Given the previous property, the total value this solution can represent is $\\sum_{i=0}^{j-1} b_i c^i \\le \\sum_{i=0}^{j-1} (c-1) c^i = c^j - 1 < c^j \\le x$, which cannot be a valid solution. Thus, when $c^j \\le x < c^{j+1}$ is faced, a $c^j$ coin must be contained in an optimal solution.\n", 635 | "\n", 636 | "By induction, the greedy algorithm is proved correct at every single step." 637 | ] 638 | },
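The greedy procedure itself is a few lines; a sketch for denominations $c^0, \ldots, c^k$ (returns coin counts from the largest denomination down):

```python
def change(n, c, k):
    counts = []
    for i in range(k, -1, -1):       # try the largest denomination first
        coin = c ** i
        counts.append(n // coin)
        n %= coin
    return counts

print(change(67, 4, 3))   # denominations 64, 16, 4, 1 -> [1, 0, 0, 3]
```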
639 | { 640 | "cell_type": "markdown", 641 | "metadata": {}, 642 | "source": [ 643 | "## 14. Schedule to minimize completion time\n", 644 | "\n", 645 | "### Description\n", 646 | "\n", 647 | "You are given a set $S = \\{a_1, a_2, \\cdots , a_n\\}$ of tasks. Task $a_i$ is released at time $r_i$ and requires $p_i$ units of time to process. Your machine can process one task at a time. Assume preemption is allowed, so that you can suspend a task and resume it at a later time. For a particular schedule $S$, we denote the completion time of task $a_i$ by $c_S(a_i)$. Give a polynomial time algorithm that finds a schedule $S$ of all tasks and minimizes the total completion time $\\sum^n_{i=1} c_S(a_i)$." 648 | ] 649 | }, 650 | { 651 | "cell_type": "markdown", 652 | "metadata": {}, 653 | "source": [ 654 | "### Solution (greedy)\n", 655 | "\n", 656 | "```\n", 657 | "# current time\n", 658 | "t = 0 \n", 659 | "# remaining processing time of each task\n", 660 | "r = p\n", 661 | "\n", 662 | "while some task is unfinished:\n", 663 | "    i = None\n", 664 | "    rmin = inf\n", 665 | "    for j in unfinished tasks released at or before t:\n", 666 | "        if r[j] < rmin:\n", 667 | "            i = j\n", 668 | "            rmin = r[j]\n", 669 | "\n", 670 | "    t_closest = earliest release time among tasks not yet released (inf if none)\n", 671 | "\n", 672 | "    if i is None:\n", 673 | "        t = t_closest\n", 674 | "    elif t + rmin < t_closest:\n", 675 | "        finish task i\n", 676 | "        r[i] = 0\n", 677 | "        t = t + rmin\n", 678 | "    else:\n", 679 | "        do task i till t_closest\n", 680 | "        r[i] -= (t_closest - t)\n", 681 | "        t = t_closest\n", 682 | "```\n", 683 | "\n", 684 | "Running time: each iteration of the main loop either jumps to a release time or finishes a job, so there are at most $O(n)$ iterations. Finding the shortest remaining time and the closest release time each takes $O(n)$. Thus $O(n^2)$.\n", 685 | "\n", 686 | "I cannot prove it.... tell me (zhangchuheng123@qq.com) if you can." 687 | ] 688 | }, 689 | { 690 | "cell_type": "markdown", 691 | "metadata": {}, 692 | "source": [ 693 | "## 15. Matching in graph\n", 694 | "\n", 695 | "### Description\n", 696 | "\n", 697 | "We are given an unweighted undirected graph G. Let $M$ be a matching in G that has no\n", 698 | "augmenting path of length smaller than $2t + 1$. Let $M^∗$ be the maximum matching in G. Show that\n", 699 | "$|M|\\ge \\dfrac{t}{t+1} |M^∗|$." 700 | ] 701 | }, 702 | { 703 | "cell_type": "markdown", 704 | "metadata": {}, 705 | "source": [ 706 | "### Methods\n", 707 | "\n", 708 | "* matching: a set M of pairwise non-adjacent edges\n", 709 | "* matching number: the number of edges in such a set\n", 710 | "* maximum matching: a matching with the largest matching number\n", 711 | "* perfect matching: a matching covering every vertex\n", 712 | "* M-alternating path: a path in $G=(V, E)$ whose edges alternate between $M$ and $E\\setminus M$\n", 713 | "* M-augmenting path: an M-alternating path whose two endpoints are both uncovered\n", 714 | "\n", 715 | "### Solution (graph)\n", 716 | "\n", 717 | "Let $|M^*|-|M|=m$ and define the edge set $P$ as the edges in $M^*$ but not in $M$. Since $M$ and $M^*$ are both valid matchings, the edges in $P$ are pairwise non-adjacent. Suppose the edges of $P$ together with $M$ form n M-augmenting paths; these paths are also pairwise disjoint. Let their lengths be $2p_i + 1$. Since the paths are disjoint, augmenting along each adds one edge to $M$, so $n=m$. Also, each edge of $M$ appears in at most one path, so $|M| \\ge \\sum_i p_i$ (a path of length $2p_i+1$ contains $p_i$ edges of $M$); and since the shortest augmenting path has length at least $2t+1$, we get $p_i \\ge t$ and $\\sum_i p_i \\ge n t$.\n", 718 | "\n", 719 | "Therefore \n", 720 | "$$\n", 721 | "\\dfrac{|M^*|-|M|}{|M|} = \\dfrac{m}{|M|} \\le \\dfrac{n}{\\sum_i p_i}\\le \\dfrac{n}{nt} = \\dfrac{1}{t}\n", 722 | "$$\n", 723 | "\n", 724 | "i.e. $|M^*| \\le \\dfrac{t+1}{t}|M|$, which is the claimed bound.\n", 725 | "\n", 726 | "### Reference\n", 727 | "\n", 728 | "http://www-sop.inria.fr/members/Frederic.Havet/Cours/matching.pdf" 729 | ] 730 | }, 731 | { 732 | "cell_type": "markdown", 733 | "metadata": {}, 734 | "source": [ 735 | "## 16.\n", 736 | "\n", 737 | "### Description\n", 738 | "\n", 739 | "We are given n jobs and one server. Each job $j$ is associated with a profit $p_j$, a release time $r_j$ and a completion time $c_j$. If we decide to schedule job $j$, the server has to process it continuously from time $r_j$ to $c_j$ and we can get a profit $p_j$. No partial profit can be obtained if the job is not finished. The server can process at most $k$ jobs at any time. Design a polynomial time algorithm that finds a feasible schedule such that the total profit we can get is maximized.\n", 740 | "\n", 741 | "### Solution\n", 742 | "\n" 743 | ] 744 | }, 745 | { 746 | "cell_type": "markdown", 747 | "metadata": {}, 748 | "source": [ 749 | "## 40.\n", 750 | "\n", 751 | "### Description\n", 752 | "\n", 753 | "Without the help of a computer or calculator, find the total sum of the digits in all integers from 1 to a million, inclusive. Write down the computation details.\n", 754 | "\n", 755 | "### Solution\n", 756 | "\n", 757 | "The digits 1 through 9 sum to 45. Writing every integer below one million with 6 digits, each of the 6 positions cycles through 0 to 9 exactly 100,000 times, so the total is $6*100000*45+1=27000001$, where the final 1 is the leading 1 of 1,000,000." 758 | ] 759 | },
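A one-line brute-force confirmation of this arithmetic (obviously not the intended by-hand computation):

```python
print(sum(sum(int(d) for d in str(i)) for i in range(1, 10**6 + 1)))
# 27000001
```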
760 | { 761 | "cell_type": "markdown", 762 | "metadata": {}, 763 | "source": [ 764 | "## 41. Rumor Spreading\n", 765 | "\n", 766 | "There are n people, each in possession of a different rumor. They want to share all the rumors through a series of bilateral conversations (e.g., via a telephone). Devise an efficient (in terms of the total number of conversations) algorithm for this task. Assume that in every conversation both parties exchange all the rumors they know at the time. (You can assume n is a power of 2 first.)" 767 | ] 768 | }, 769 | { 770 | "cell_type": "markdown", 771 | "metadata": {}, 772 | "source": [ 773 | "## 42. \n", 774 | "\n", 775 | "### Description\n", 776 | "\n", 777 | "We are given a sequence of distinct numbers $A_1, A_2, A_3, \\cdots, A_n$. For each number, its position in the sequence and its position in the sorted sequence (in increasing order) differ by at most k, where k is much smaller than n. Design an algorithm that sorts the sequence in time less than O(n log n). Of course, the running time of your algorithm should depend on k.\n", 778 | "\n", 779 | "### Solution\n", 780 | "\n", 781 | "```\n", 782 | "for i = 1 to k:\n", 783 | "    for j = 1 to n-1:\n", 784 | "        if A[j] > A[j+1]:\n", 785 | "            swap(A[j], A[j+1])\n", 786 | "```\n", 787 | "\n", 788 | "Running time O(kn): no element is more than k positions from its sorted position, and each bubble pass moves every element that still has to travel toward the front at least one step closer, so k passes suffice. This is faster than O(n log n) whenever k = o(log n)." 789 | ] 790 | }, 791 | { 792 | "cell_type": "markdown", 793 | "metadata": {}, 794 | "source": [ 795 | "## 46.\n", 796 | "\n", 797 | "### Description\n", 798 | "\n", 799 | "An undirected Eulerian graph is a connected graph in which all nodes have even degree. Your job is to design an efficient algorithm that traverses an undirected Eulerian graph so that each edge is visited exactly once.\n", 800 | "\n", 801 | "### Solution\n", 802 | "\n", 803 | "* Method 1: start from any vertex; delete each edge after traversing it, and when several edges are available, choose one whose deletion does not disconnect the graph. Connectivity can be checked with a DFS in polynomial time; at most O(d|E|) connectivity checks are needed, where d is the maximum degree of the graph.\n", 804 | "* Method 2: start from any vertex and walk arbitrarily; the walk must eventually return to the start, though some edges may be missed. Pick a vertex on the current cycle that still has unused incident edges, walk another arbitrary cycle from it, and splice that cycle into the current one at that point. Repeating this yields the final tour. Complexity O(|E|)." 805 | ] 806 | }, 807 | { 808 | "cell_type": "markdown", 809 | "metadata": {}, 810 | "source": [ 811 | "## 47.\n", 812 | "\n", 813 | "### Description\n", 814 | "\n", 815 | "A directed Eulerian graph is a strongly connected graph in which the indegree of each node is equal to its outdegree. Your job is to design an efficient algorithm that traverses a directed Eulerian graph so that each edge is visited exactly once.\n", 816 | "\n", 817 | "### Solution\n", 818 | "\n", 819 | "Similar to the previous problem." 820 | ] 821 | },
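Method 2 is essentially Hierholzer's algorithm; a compact iterative Python sketch for the undirected case (the edge-list input format is an assumption):

```python
from collections import defaultdict

def euler_tour(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    stack, tour = [edges[0][0]], []
    while stack:
        u = stack[-1]
        if adj[u]:
            v = adj[u].pop()           # walk an arbitrary unused edge
            adj[v].remove(u)           # remove its reverse copy
            stack.append(v)
        else:
            tour.append(stack.pop())   # dead end: backtrack and splice
    return tour[::-1]

# a 4-cycle plus a triangle sharing vertex 0 (all degrees even)
print(euler_tour([(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (4, 5), (5, 0)]))
```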
822 | { 823 | "cell_type": "markdown", 824 | "metadata": {}, 825 | "source": [ 826 | "## 57.\n", 827 | "\n", 828 | "### Description\n", 829 | "\n", 830 | "Given a set $\\mathcal{H}$ of halfspaces (in general positions) in $\\mathbb{R}^d$, design an efficient algorithm that decides whether $\\mathcal{H}$ can cover the whole space $\\mathbb{R}^d$. If $\\mathcal{H}$ can cover $\\mathbb{R}^d$, show there is always a subset of $\\mathcal{H}$ with at most d + 1 halfspaces that also covers $\\mathbb{R}^d$." 831 | ] 832 | }, 833 | { 834 | "cell_type": "markdown", 835 | "metadata": {}, 836 | "source": [ 837 | "### Solution\n", 838 | "\n", 839 | "Similar to Helly's Theorem but not the same, I have no idea." 840 | ] 841 | }, 842 | { 843 | "cell_type": "markdown", 844 | "metadata": {}, 845 | "source": [ 846 | "## 59.\n", 847 | "\n", 848 | "### Description\n", 849 | "\n", 850 | "There are n players. Each player holds a private 0/1 bit. Player 1 is the coordinator. The coordinator wants to compute the parity (i.e., determine whether the number of 1s is even or odd).\n", 851 | "\n", 852 | "In each round, your protocol can choose a player and let that player broadcast a bit (depending on your protocol, the bit can be either the player's own bit, or other encoded information). Upon the broadcast, each other player receives the bit with probability $1 − \\epsilon$ for some small constant $0 < \\epsilon < 1/3$, and the flipped bit with probability $\\epsilon$ (all flips are independent of each other). Show that if every player broadcasts her bit k = O(log n) times, the coordinator can compute the parity correctly with probability 0.9." 853 | ] 854 | }, 855 | { 856 | "cell_type": "markdown", 857 | "metadata": {}, 858 | "source": [ 859 | "### Solution\n", 860 | "\n", 861 | "Reading 1: after each broadcast, every player replaces her bit by the broadcast bit with probability $1-\\epsilon$ and by its flip with probability $\\epsilon$.\n", 862 | "\n", 863 | "One point is still unclear: the coordinator clearly cannot know global information, but can each player? If so, a player can broadcast whichever bit is currently held by fewer players; e.g. for 10101, broadcast 0: the holders of 1 all switch to 0 regardless, and the holders of 0 keep 0 with probability $1-\\epsilon$. Claim: bits that already agree stay agreeing, and the number of agreeing bits keeps growing.\n", 864 | "\n", 865 | "Suppose that before each round $x_i$ players hold the majority bit and $n-x_i$ hold the minority bit; after broadcasting the minority bit, the expected number of players holding the previously minority bit is $x_{i+1} = x_i+(n-x_i)(1-\\epsilon)$. After k rounds, with high probability everyone holds the same bit, so the total parity can be computed.\n", 866 | "\n", 867 | "Reading 2: each reception of a broadcast bit errs with probability $\\epsilon$, but each player's own bit stays fixed.\n", 868 | "\n", 869 | "Then let every player broadcast her own bit 2k+1 times, and let the coordinator take a majority vote over each player's broadcasts to decide that player's bit. The majority vote succeeds with probability $1-\\Phi(\\dfrac{k - 2k(1-\\epsilon)}{\\sqrt{2k\\epsilon(1-\\epsilon)}})$ (Moivre-Laplace theorem); finally, show that k=O(log n) recovers every player's bit correctly with high probability." 870 | ] 871 | }, 872 | { 873 | "cell_type": "markdown", 874 | "metadata": {}, 875 | "source": [ 876 | "## 60.\n", 877 | "\n", 878 | "### Description\n", 879 | "\n", 880 | "There are n straight lines in the $\\mathbb{R}^{2}$ plane. \n", 881 | "1. Design an $O(n\\log n)$ time algorithm for the following decision problem: given a vertical slab W bounded by the two vertical lines x = a and x = b, compute how many intersection points of $\\{l_i\\},\\ i\\in[n]$ lie in W.\n", 882 | "2. Given a vertical slab, show how to uniformly sample q intersection points in W in $O(n \\log n + q)$ time." 883 | ] 884 | }, 885 | { 886 | "cell_type": "markdown", 887 | "metadata": {}, 888 | "source": [ 889 | "### Solution\n", 890 | "\n", 891 | "1.\n", 892 | "\n", 893 | "1. find the intersection points of the n lines with x=a in O(n). \n", 894 | "2. argsort these points in O(n log n), resulting in a list of line indices $[i_1, i_2, \\cdots, i_n]$, where $i_1$ means that line $i_1$'s intersection with x=a has the maximum y. \n", 895 | "3. calculate the lines' intersections with x=b and note down $(j, y_{i_j})$ where line $i_j$ intersects at $(x,y)=(b,y_{i_j})$, in O(n)\n", 896 | "4. argsort these tuples by y in O(n log n), resulting in a list which is a permutation of $[n]$\n", 897 | "5. the number of intersection points in the slab is the number of inversions (逆序对) in this permutation\n", 898 | "6. we can count inversions in a merge-sort style with an added counter: the counter increases by the length of the remaining left-hand-side part whenever an element of the right-hand side is added to the merging queue. This can be done in O(n log n)" 899 | ] 900 | }, 901 | { 902 | "cell_type": "markdown", 903 | "metadata": {}, 904 | "source": [ 905 | "2.\n", 906 | "\n", 907 | "During each merge level of the process above, record for the elements of the right half how many elements of the left half they form inversions with, storing these counts in preorder as an array $C_1, C_2, \\cdots, C_s$ with total $C=\\sum C_i$. Sample q values uniformly from [C] in O(q) time by drawing, for i=q downto 1, a random number from the distribution of the current minimum order statistic. For each sampled value $x_i$, compute $a=\\max(\\{i: \\sum_{j=1}^i C_j \\le x\\})$ and $b=x-\\sum_{j=1}^a C_j$ in O(q) total time; the auxiliary information stored with the array then identifies the two lines involved, from which the intersection point is computed. A Van Emde Boas tree can be used to realize the lookup of a and b." 908 | ] 909 | },
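The inversion-counting merge sort of step 6, as runnable Python (a sketch):

```python
def count_inversions(a):
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, nl = count_inversions(a[:mid])
    right, nr = count_inversions(a[mid:])
    merged, inv, i, j = [], nl + nr, 0, 0
    while i < len(left) or j < len(right):
        if j == len(right) or (i < len(left) and left[i] <= right[j]):
            merged.append(left[i]); i += 1
        else:
            inv += len(left) - i       # right[j] inverts with the rest of left
            merged.append(right[j]); j += 1
    return merged, inv

print(count_inversions([2, 4, 1, 3, 5])[1])   # 3 inversions
```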
910 | { 911 | "cell_type": "markdown", 912 | "metadata": {}, 913 | "source": [ 914 | "## 64. Tiling Problem is NPC\n", 915 | "\n", 916 | "### Description\n", 917 | "\n", 918 | "We are given a finite set S of rectangles and a rectangle R in the plane. Is there a way of placing the rectangles of S inside R, so that no pair of the rectangles intersect, and all the rectangles have their edges parallel to the edges of R? Show the problem is NPC.\n", 919 | "\n", 920 | "### Method\n", 921 | "\n", 922 | "* General proof steps:\n", 923 | "  1. Show the problem is in NP (given a candidate solution, it can be verified in polynomial time)\n", 924 | "  2. Reduce a known NPC problem to it (transform any input of the NPC problem into an input of this problem in polynomial time, and transform this problem's output back into an answer for the NPC problem)\n", 925 | "* Common NPC problems:\n", 926 | "  1. 3SAT: given boolean variables and clauses (of the form A or B or C), is there an assignment making all clauses true?\n", 927 | "  2. Independent Set: a vertex set S of G in which no two vertices share an edge is an independent set; find a maximum independent set of G.\n", 928 | "  3. Vertex Cover: a vertex cover is a set of vertices such that every edge is incident to at least one vertex in the set; find a minimum vertex cover of G.\n", 929 | "  4. Integer Linear Programming: does a system of linear constraints have an integer solution?\n", 930 | "  5. Hamiltonian Cycle: does a graph contain a Hamiltonian cycle?\n", 931 | "  6. Subset Sum: given a set of integers, is there a subset whose sum equals K?\n", 932 | "  7. Bin Packing: given one-dimensional segments of length below 1, find a partition into the fewest subsets such that the segments in each subset sum to at most 1\n", 933 | " \n", 934 | "### Solution\n", 935 | "\n", 936 | "1. Given an assignment of the rectangles, we can verify it in polynomial time by checking pairwise that the rectangles do not intersect and that none exceeds the big rectangle, so the problem is in NP.\n", 937 | "2. We reduce from Bin Packing: given a Bin Packing input $x_1, x_2, \\cdots, x_n$, construct a Tiling input where the n rectangles have widths $x_i$ and length 2 (any number greater than 1 works), and the big rectangle has width 1 and length $2k$. Increase k one by one and find the least k for which all the rectangles fit; return that k. This k is exactly the minimum number of subsets needed in the Bin Packing instance. Note that since the small rectangles are longer than 1 they cannot be rotated and must lie lengthwise.\n" 938 | ] 939 | } 940 | ], 941 | "metadata": { 942 | "kernelspec": { 943 | "display_name": "Python 3", 944 | "language": "python", 945 | "name": "python3" 946 | }, 947 | "language_info": { 948 | "codemirror_mode": { 949 | "name": "ipython", 950 | "version": 3 951 | }, 952 | "file_extension": ".py", 953 | "mimetype": "text/x-python", 954 | "name": "python", 955 | "nbconvert_exporter": "python", 956 | "pygments_lexer": "ipython3", 957 | "version": "3.6.3" 958 | } 959 | }, 960 | "nbformat": 4, 961 | "nbformat_minor": 2 962 | } 963 | --------------------------------------------------------------------------------