├── 第10章 共轭方向法.ipynb ├── 第11章 拟牛顿法.ipynb ├── 第13章 无约束优化问题和神经网络.ipynb ├── 第1章 证明方法与相关记法.ipynb ├── 第20章 仅含等式约束的优化问题.ipynb ├── 第21章 含不等式约束的优化问题.ipynb ├── 第22章 凸优化问题.ipynb ├── 第23章 有约束优化问题的求解算法.ipynb ├── 第2章 向量空间与矩阵.ipynb ├── 第3章 变换.ipynb ├── 第4章 有关集合概念.ipynb ├── 第5章 微积分基础.ipynb ├── 第6章 集合约束和无约束优化问题的基础知识.ipynb ├── 第7章 一维搜索方法.ipynb ├── 第8章 梯度方法.ipynb └── 第9章 牛顿法.ipynb /第10章 共轭方向法.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Chapter 10: Conjugate Direction Methods" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "10.1 Introduction" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "In general, conjugate direction methods perform better than the method of steepest descent but not as well as Newton's method." 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Properties of conjugate direction methods:\n", 29 | "1. For an $n$-dimensional quadratic problem, they reach the solution in at most $n$ steps;\n", 30 | "2. The conjugate gradient algorithm, the best-known conjugate direction method, does not require the Hessian matrix;\n", 31 | "3. No $n\\times n$ matrix has to be stored or inverted." 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "Definition 10.1 Let $\\mathbf{Q}$ be a real symmetric $n\\times n$ matrix. The directions $\\mathbf{d}^{\\left(0\\right)},\\mathbf{d}^{\\left(1\\right)},\\dots,\\mathbf{d}^{\\left(m\\right)}$ are said to be $\\mathbf{Q}$-conjugate if $\\mathbf{d}^{\\left(i\\right)\\top}\\mathbf{Q}\\mathbf{d}^{\\left(j\\right)}=0$ for all $i\\neq j$." 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "10.2 The Basic Conjugate Direction Algorithm" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "Consider minimizing a quadratic function of $n$ variables:\n", 53 | "$$\\min f\\left(\\mathbf{x}\\right)=\\frac{1}{2}\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}-\\mathbf{x}^\\top\\mathbf{b}$$\n", 54 | "where $\\mathbf{Q}=\\mathbf{Q}^\\top>0$ and $\\mathbf{x}\\in\\mathbb{R}^n$. Because $\\mathbf{Q}>0$, the function $f$ has a unique global minimizer, which can be found by solving $\\mathbf{Q}\\mathbf{x}=\\mathbf{b}$." 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61
| "Basic conjugate direction algorithm:\n", 62 | "Given an initial point $\\mathbf{x}^{\\left(0\\right)}$ and a set of $\\mathbf{Q}$-conjugate directions $\\mathbf{d}^{\\left(0\\right)},\\mathbf{d}^{\\left(1\\right)},\\dots,\\mathbf{d}^{\\left(n-1\\right)}$, iterate\n", 63 | "$$\\mathbf{g}^{\\left(k\\right)}=\\nabla f\\left(\\mathbf{x}^{\\left(k\\right)}\\right)=\\mathbf{Q}\\mathbf{x}^{\\left(k\\right)}-\\mathbf{b} \\\\\n", 64 | "\\alpha_k=-\\frac{\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{d}^{\\left(k\\right)}}{\\mathbf{d}^{\\left(k\\right)\\top}\\mathbf{Q}\\mathbf{d}^{\\left(k\\right)}} \\\\\n", 65 | "\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}+\\alpha_k\\mathbf{d}^{\\left(k\\right)}$$" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "Theorem 10.1 For any initial point $\\mathbf{x}^{\\left(0\\right)}$, the basic conjugate direction algorithm converges to the unique global minimizer $\\mathbf{x}^*$ in at most $n$ iterations, that is, $\\mathbf{x}^{\\left(n\\right)}=\\mathbf{x}^*$." 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": { 78 | "collapsed": true 79 | }, 80 | "source": [ 81 | "10.3 The Conjugate Gradient Algorithm" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "Conjugate gradient algorithm:\n", 89 | "1. Set $k=0$ and select an initial point $\\mathbf{x}^{\\left(0\\right)}$.\n", 90 | "2. Compute $\\mathbf{g}^{\\left(0\\right)}=\\nabla f\\left(\\mathbf{x}^{\\left(0\\right)}\\right)$. If $\\mathbf{g}^{\\left(0\\right)}=\\mathbf{0}$, stop; otherwise set $\\mathbf{d}^{\\left(0\\right)}=-\\mathbf{g}^{\\left(0\\right)}$.\n", 91 | "3. Compute $\\alpha_k=-\\frac{\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{d}^{\\left(k\\right)}}{\\mathbf{d}^{\\left(k\\right)\\top}\\mathbf{Q}\\mathbf{d}^{\\left(k\\right)}}$.\n", 92 | "4. Compute $\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}+\\alpha_k\\mathbf{d}^{\\left(k\\right)}$.\n", 93 | "5. Compute $\\mathbf{g}^{\\left(k+1\\right)}=\\nabla f\\left(\\mathbf{x}^{\\left(k+1\\right)}\\right)$. If $\\mathbf{g}^{\\left(k+1\\right)}=\\mathbf{0}$, stop.\n", 94 | "6. 
Compute $\\beta_k=\\frac{\\mathbf{g}^{\\left(k+1\\right)\\top}\\mathbf{Q}\\mathbf{d}^{\\left(k\\right)}}{\\mathbf{d}^{\\left(k\\right)\\top}\\mathbf{Q}\\mathbf{d}^{\\left(k\\right)}}$.\n", 95 | "7. Compute $\\mathbf{d}^{\\left(k+1\\right)}=-\\mathbf{g}^{\\left(k+1\\right)}+\\beta_k\\mathbf{d}^{\\left(k\\right)}$.\n", 96 | "8. Set $k=k+1$ and go to step 3." 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "Proposition 10.1 The search directions $\\mathbf{d}^{\\left(0\\right)},\\mathbf{d}^{\\left(1\\right)},\\dots,\\mathbf{d}^{\\left(n-1\\right)}$ generated by the conjugate gradient algorithm are $\\mathbf{Q}$-conjugate." 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "10.4 The Conjugate Gradient Algorithm for Nonquadratic Problems" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "Hestenes-Stiefel formula: in the conjugate gradient algorithm,\n", 118 | "$$\\beta_k=\\frac{\\mathbf{g}^{\\left(k+1\\right)\\top}\\left[\\mathbf{g}^{\\left(k+1\\right)}-\\mathbf{g}^{\\left(k\\right)}\\right]}{\\mathbf{d}^{\\left(k\\right)\\top}\\left[\\mathbf{g}^{\\left(k+1\\right)}-\\mathbf{g}^{\\left(k\\right)}\\right]}$$" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "Polak-Ribiere formula: expanding the denominator of the Hestenes-Stiefel formula gives\n", 126 | "$$\\beta_k=\\frac{\\mathbf{g}^{\\left(k+1\\right)\\top}\\left[\\mathbf{g}^{\\left(k+1\\right)}-\\mathbf{g}^{\\left(k\\right)}\\right]}{\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{g}^{\\left(k\\right)}}$$" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "Fletcher-Reeves formula: expanding the numerator of the Polak-Ribiere formula gives\n", 134 | "$$\\beta_k=\\frac{\\mathbf{g}^{\\left(k+1\\right)\\top}\\mathbf{g}^{\\left(k+1\\right)}}{\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{g}^{\\left(k\\right)}}$$" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": null, 140 | "metadata": { 141 | "collapsed": true 142 | }, 143 | "outputs": [], 144 | "source": [] 145 | } 146 | ], 147 | "metadata": { 
148 | "kernelspec": { 149 | "display_name": "Python 2", 150 | "language": "python", 151 | "name": "python2" 152 | }, 153 | "language_info": { 154 | "codemirror_mode": { 155 | "name": "ipython", 156 | "version": 2 157 | }, 158 | "file_extension": ".py", 159 | "mimetype": "text/x-python", 160 | "name": "python", 161 | "nbconvert_exporter": "python", 162 | "pygments_lexer": "ipython2", 163 | "version": "2.7.13" 164 | } 165 | }, 166 | "nbformat": 4, 167 | "nbformat_minor": 2 168 | } 169 | -------------------------------------------------------------------------------- /第11章 拟牛顿法.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Chapter 11: Quasi-Newton Methods" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "11.1 Introduction" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "The idea behind quasi-Newton methods is to replace $\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)^{-1}$ in Newton's method with an approximation. The approximation is updated at every iteration so that it retains at least some of the properties of $\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)^{-1}$." 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Consider the iteration\n", 29 | "$$\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}-\\alpha\\mathbf{H}_k\\mathbf{g}^{\\left(k\\right)}$$\n", 30 | "where $\\mathbf{H}_k$ is an $n\\times n$ real matrix that stands in for $\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)^{-1}$, and $\\alpha>0$ is the step size." 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "Expanding $f$ to first order about $\\mathbf{x}^{\\left(k\\right)}$ and substituting $\\mathbf{x}^{\\left(k+1\\right)}$ gives\n", 38 | 
"$$f\\left(\\mathbf{x}^{\\left(k+1\\right)}\\right)=f\\left(\\mathbf{x}^{\\left(k\\right)}\\right)+\\mathbf{g}^{\\left(k\\right)\\top}\\left(\\mathbf{x}^{\\left(k+1\\right)}-\\mathbf{x}^{\\left(k\\right)}\\right)+o\\left(\\|\\mathbf{x}^{\\left(k+1\\right)}-\\mathbf{x}^{\\left(k\\right)}\\|\\right) \\\\\n", 39 | "=f\\left(\\mathbf{x}^{\\left(k\\right)}\\right)-\\alpha\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{H}_k\\mathbf{g}^{\\left(k\\right)}+o\\left(\\alpha\\|\\mathbf{H}_k\\mathbf{g}^{\\left(k\\right)}\\|\\right)$$" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "当$\\alpha$趋向于$0$时,上式等号右侧的第2项主导了第3项。因此,当$\\alpha$比较小时,为保证函数$f$从$\\mathbf{x}^{\\left(k\\right)}$到$\\mathbf{x}^{\\left(k+1\\right)}$是下降的,必须有\n", 47 | "$$\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{H}_k\\mathbf{g}^{\\left(k\\right)}>0$$\n", 48 | "为保证上式成立,最简单的方法就是保证$\\mathbf{H}_k$是正定的。" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "11.2 黑塞矩阵逆矩阵的近似" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "令$\\mathbf{H}_0,\\mathbf{H}_1,\\dots$表示黑塞矩阵逆矩阵$\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)^{-1}$的一系列近似矩阵。假定目标函数$f$的黑塞矩阵$\\mathbf{F}\\left(\\mathbf{x}\\right)$是常数矩阵,与$\\mathbf{x}$的取值无关,即目标函数是二次型函数,$\\mathbf{F}\\left(\\mathbf{x}\\right)=\\mathbf{Q}$,且$\\mathbf{Q}=\\mathbf{Q}^\\top$,则有\n", 63 | "$$\\mathbf{g}^{\\left(k+1\\right)}-\\mathbf{g}^{\\left(k\\right)}=\\mathbf{Q}\\left(\\mathbf{x}^{\\left(k+1\\right)}-\\mathbf{x}^{\\left(k\\right)}\\right)$$" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "令\n", 71 | "$$\\varDelta\\mathbf{g}^{\\left(k\\right)}\\triangleq\\mathbf{g}^{\\left(k+1\\right)}-\\mathbf{g}^{\\left(k\\right)} \\\\\n", 72 | "\\varDelta\\mathbf{x}^{\\left(k\\right)}\\triangleq\\mathbf{x}^{\\left(k+1\\right)}-\\mathbf{x}^{\\left(k\\right)} $$\n", 73 | "可得\n", 74 | 
"$$\\varDelta\\mathbf{g}^{\\left(k\\right)}=\\mathbf{Q}\\varDelta\\mathbf{x}^{\\left(k\\right)}$$" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "记对称正定矩阵$\\mathbf{H}_0$作为近似矩阵的初始矩阵,在给定的$k$下,矩阵$\\mathbf{Q}^{-1}$应满足\n", 82 | "$$\\mathbf{Q}^{-1}\\varDelta\\mathbf{g}^{\\left(i\\right)}=\\varDelta\\mathbf{x}^{\\left(i\\right)},\\quad 0\\leqslant i\\leqslant k$$" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "因此近似矩阵$\\mathbf{H}_{k+1}$应满足\n", 90 | "$$\\mathbf{H}_{k+1}\\varDelta\\mathbf{g}^{\\left(i\\right)}=\\varDelta\\mathbf{x}^{\\left(i\\right)},\\quad 0\\leqslant i\\leqslant k$$" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "如果共开展$n$次迭代,则近似矩阵$\\mathbf{H}_{n}$应满足\n", 98 | "$$\\mathbf{H}_{n}\\varDelta\\mathbf{g}^{\\left(i\\right)}=\\varDelta\\mathbf{x}^{\\left(i\\right)},\\quad 0\\leqslant i\\leqslant n-1$$" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "将其改写为\n", 106 | "$$\\mathbf{H}_n\\left[\\varDelta\\mathbf{g}^{\\left(0\\right)},\\varDelta\\mathbf{g}^{\\left(1\\right)}\\dots,\\varDelta\\mathbf{g}^{\\left(n-1\\right)}\\right]=\\left[\\varDelta\\mathbf{x}^{\\left(0\\right)},\\varDelta\\mathbf{x}^{\\left(1\\right)}\\dots,\\varDelta\\mathbf{x}^{\\left(n-1\\right)}\\right]$$" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "矩阵$\\mathbf{Q}$能够满足\n", 114 | "$$\\mathbf{Q}\\left[\\varDelta\\mathbf{x}^{\\left(0\\right)},\\varDelta\\mathbf{x}^{\\left(1\\right)}\\dots,\\varDelta\\mathbf{x}^{\\left(n-1\\right)}\\right]=\\left[\\varDelta\\mathbf{g}^{\\left(0\\right)},\\varDelta\\mathbf{g}^{\\left(1\\right)}\\dots,\\varDelta\\mathbf{g}^{\\left(n-1\\right)}\\right]$$\n", 115 | "和\n", 116 | 
"$$\\mathbf{Q}^{-1}\\left[\\varDelta\\mathbf{g}^{\\left(0\\right)},\\varDelta\\mathbf{g}^{\\left(1\\right)}\\dots,\\varDelta\\mathbf{g}^{\\left(n-1\\right)}\\right]=\\left[\\varDelta\\mathbf{x}^{\\left(0\\right)},\\varDelta\\mathbf{x}^{\\left(1\\right)}\\dots,\\varDelta\\mathbf{x}^{\\left(n-1\\right)}\\right]$$" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "说明,如果$\\left[\\varDelta\\mathbf{g}^{\\left(0\\right)},\\varDelta\\mathbf{g}^{\\left(1\\right)}\\dots,\\varDelta\\mathbf{g}^{\\left(n-1\\right)}\\right]$非奇异,那么矩阵$\\mathbf{Q}^{-1}$能够在$n$次迭代后唯一确定,即\n", 124 | "$$\\mathbf{Q}^{-1}=\\mathbf{H}_n=\\left[\\varDelta\\mathbf{x}^{\\left(0\\right)},\\varDelta\\mathbf{x}^{\\left(1\\right)}\\dots,\\varDelta\\mathbf{x}^{\\left(n-1\\right)}\\right]\\left[\\varDelta\\mathbf{g}^{\\left(0\\right)},\\varDelta\\mathbf{g}^{\\left(1\\right)}\\dots,\\varDelta\\mathbf{g}^{\\left(n-1\\right)}\\right]^{-1}$$" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "由此可得,如果$\\mathbf{H}_n$能够使方程$\\mathbf{H}_{n}\\varDelta\\mathbf{g}^{\\left(i\\right)}=\\varDelta\\mathbf{x}^{\\left(i\\right)},0\\leqslant i\\leqslant n-1$,那么利用迭代公式$\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}-\\alpha\\mathbf{H}_k\\mathbf{g}^{\\left(k\\right)},\\alpha_k=\\arg\\min_{\\alpha\\geqslant0}f\\left(\\mathbf{x}^{\\left(k\\right)}-\\alpha\\mathbf{H}_k\\mathbf{g}^{\\left(k\\right)}\\right)$求解$n$维二次型优化问题,可得$\\mathbf{x}^{\\left(n+1\\right)}=\\mathbf{x}^{\\left(n\\right)}-\\alpha\\mathbf{H}_n\\mathbf{g}^{\\left(n\\right)}$,这与牛顿迭代公式是一致的,说明一定能够在$n+1$次迭代内完成求解。" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "拟牛顿法的迭代公式为\n", 139 | "$$\\mathbf{d}^{\\left(k\\right)}=-\\mathbf{H}_k\\mathbf{g}^{\\left(k\\right)} \\\\\n", 140 | "\\alpha_k=\\arg\\min_{\\alpha\\geqslant0}f\\left(\\mathbf{x}^{\\left(k\\right)}-\\alpha\\mathbf{d}^{\\left(k\\right)}\\right) \\\\\n", 
141 | "\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}+\\alpha_k\\mathbf{d}^{\\left(k\\right)}$$\n", 142 | "where the matrices $\\mathbf{H}_0,\\mathbf{H}_1,\\dots$ are symmetric." 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "When the objective function is quadratic, these matrices must satisfy\n", 150 | "$$\\mathbf{H}_{k+1}\\varDelta\\mathbf{g}^{\\left(i\\right)}=\\varDelta\\mathbf{x}^{\\left(i\\right)},\\quad 0\\leqslant i\\leqslant k$$\n", 151 | "where $\\varDelta\\mathbf{x}^{\\left(i\\right)}=\\mathbf{x}^{\\left(i+1\\right)}-\\mathbf{x}^{\\left(i\\right)}=\\alpha_i\\mathbf{d}^{\\left(i\\right)}$ and $\\varDelta\\mathbf{g}^{\\left(i\\right)}=\\mathbf{g}^{\\left(i+1\\right)}-\\mathbf{g}^{\\left(i\\right)}=\\mathbf{Q}\\varDelta\\mathbf{x}^{\\left(i\\right)}$." 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "These conditions do not determine $\\mathbf{H}_{k+1}$ uniquely. The matrix $\\mathbf{H}_{k+1}$ can be obtained by adding a correction term to $\\mathbf{H}_{k}$." 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "11.3 The Rank One Correction Formula" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "In the rank one correction formula, the correction term is $\\alpha_k\\mathbf{z}^{\\left(k\\right)}\\mathbf{z}^{\\left(k\\right)\\top}$ with $\\alpha_k\\in\\mathbb{R},\\mathbf{z}^{\\left(k\\right)}\\in\\mathbb{R}^n$, which is a symmetric matrix. The update equation for the approximation is\n", 173 | "$$\\mathbf{H}_{k+1}=\\mathbf{H}_k+\\alpha_k\\mathbf{z}^{\\left(k\\right)}\\mathbf{z}^{\\left(k\\right)\\top}$$" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "Note that\n", 181 | "$$\\mathrm{rank}\\thinspace\\mathbf{z}^{\\left(k\\right)}\\mathbf{z}^{\\left(k\\right)\\top}=\\mathrm{rank}\\left(\\begin{bmatrix} z_1^{\\left(k\\right)} \\\\ z_2^{\\left(k\\right)} \\\\ \\vdots \\\\ z_n^{\\left(k\\right)} \\end{bmatrix}\\left[z_1^{\\left(k\\right)},z_2^{\\left(k\\right)},\\dots,z_n^{\\left(k\\right)}\\right]\\right)=1$$\n", 182 | "hence the name rank one correction." 183 | ] 
184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "Since the condition\n", 190 | "$$\\mathbf{H}_{k+1}\\varDelta\\mathbf{g}^{\\left(i\\right)}=\\varDelta\\mathbf{x}^{\\left(i\\right)},\\quad 0\\leqslant i\\leqslant k$$\n", 191 | "must hold, taking $i=k$ gives\n", 192 | "$$\\left(\\mathbf{H}_k+\\alpha_k\\mathbf{z}^{\\left(k\\right)}\\mathbf{z}^{\\left(k\\right)\\top}\\right)\\varDelta\\mathbf{g}^{\\left(k\\right)}=\\varDelta\\mathbf{x}^{\\left(k\\right)}$$" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "Note that $\\mathbf{z}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}$ is a scalar, so\n", 200 | "$$\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}=\\left(\\alpha_k\\mathbf{z}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)\\mathbf{z}^{\\left(k\\right)}$$\n", 201 | "and hence\n", 202 | "$$\\mathbf{z}^{\\left(k\\right)}=\\frac{\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}}{\\alpha_k\\left(\\mathbf{z}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)}$$" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "It follows that\n", 210 | "$$\\alpha_k\\mathbf{z}^{\\left(k\\right)}\\mathbf{z}^{\\left(k\\right)\\top}=\\frac{\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)^\\top}{\\alpha_k\\left(\\mathbf{z}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)^2}$$" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "This gives the intermediate update equation for the approximation\n", 218 | 
"$$\\mathbf{H}_{k+1}=\\mathbf{H}_k+\\frac{\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)^\\top}{\\alpha_k\\left(\\mathbf{z}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)^2}$$" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "为将上式中右端整理为只与$\\mathbf{H}_k$、$\\varDelta\\mathbf{g}^{\\left(k\\right)}$、$\\varDelta\\mathbf{x}^{\\left(k\\right)}$有关,在$\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}=\\left(\\alpha_k\\mathbf{z}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)\\mathbf{z}^{\\left(k\\right)}$两端同时左乘$\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}$,可得\n", 226 | "$$\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}=\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\alpha_k\\mathbf{z}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}\\mathbf{z}^{\\left(k\\right)}$$" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "由于$\\alpha_k$为标量,$\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{z}^{\\left(k\\right)}=\\varDelta\\mathbf{z}^{\\left(k\\right)\\top}\\mathbf{g}^{\\left(k\\right)}$为标量,因此\n", 234 | "$$\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}=\\alpha_k\\left(\\mathbf{z}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)^2$$" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "将上式带入中间更新方程,得最终更新方程\n", 242 | 
"$$\\mathbf{H}_{k+1}=\\mathbf{H}_k+\\frac{\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)^\\top}{\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)}$$" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "秩1算法:\n", 250 | "1. 令$k=0$;选择初始点$\\mathbf{x}^{\\left(0\\right)}$,任选一个对称正定实矩阵$\\mathbf{H}_0$。\n", 251 | "2. 如果$\\mathbf{g}^{\\left(k\\right)}=0$,停止迭代;否则,令$\\mathbf{d}^{\\left(k\\right)}=-\\mathbf{H}_k\\mathbf{g}^{\\left(k\\right)}$。\n", 252 | "3. 计算\n", 253 | "$$\\alpha_k=\\arg\\min_{\\alpha\\geqslant0}f\\left(\\mathbf{x}^{\\left(k\\right)}+\\alpha\\mathbf{d}^{\\left(k\\right)}\\right) \\\\\n", 254 | "\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}+\\alpha_k\\mathbf{d}^{\\left(k\\right)}$$\n", 255 | "4. 计算\n", 256 | "$$\\varDelta\\mathbf{x}^{\\left(k\\right)}=\\alpha_k\\mathbf{d}^{\\left(k\\right)} \\\\\n", 257 | "\\varDelta\\mathbf{g}^{\\left(k\\right)}=\\mathbf{g}^{\\left(k+1\\right)}-\\mathbf{g}^{\\left(k\\right)} \\\\\n", 258 | "\\mathbf{H}_{k+1}=\\mathbf{H}_k+\\frac{\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)^\\top}{\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\left(\\varDelta\\mathbf{x}^{\\left(k\\right)}-\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right)}$$\n", 259 | "5. 
Set $k=k+1$ and go to step 2." 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "The matrix $\\mathbf{H}_{k+1}$ produced by the rank one algorithm is not necessarily positive definite, so $\\mathbf{d}^{\\left(k+1\\right)}$ may fail to be a descent direction." 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "11.4 The DFP Algorithm (Variable Metric Method)" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": {}, 279 | "source": [ 280 | "1. Set $k=0$; select an initial point $\\mathbf{x}^{\\left(0\\right)}$ and any real symmetric positive definite matrix $\\mathbf{H}_0$.\n", 281 | "2. If $\\mathbf{g}^{\\left(k\\right)}=\\mathbf{0}$, stop; otherwise set $\\mathbf{d}^{\\left(k\\right)}=-\\mathbf{H}_k\\mathbf{g}^{\\left(k\\right)}$.\n", 282 | "3. Compute\n", 283 | "$$\\alpha_k=\\arg\\min_{\\alpha\\geqslant0}f\\left(\\mathbf{x}^{\\left(k\\right)}+\\alpha\\mathbf{d}^{\\left(k\\right)}\\right) \\\\\n", 284 | "\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}+\\alpha_k\\mathbf{d}^{\\left(k\\right)}$$\n", 285 | "4. Compute\n", 286 | "$$\\varDelta\\mathbf{x}^{\\left(k\\right)}=\\alpha_k\\mathbf{d}^{\\left(k\\right)} \\\\\n", 287 | "\\varDelta\\mathbf{g}^{\\left(k\\right)}=\\mathbf{g}^{\\left(k+1\\right)}-\\mathbf{g}^{\\left(k\\right)} \\\\\n", 288 | "\\mathbf{H}_{k+1}=\\mathbf{H}_k+\\frac{\\varDelta\\mathbf{x}^{\\left(k\\right)}\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}}{\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}}-\\frac{\\left[\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right]\\left[\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\right]^\\top}{\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}}$$\n", 289 | "5. 
Set $k=k+1$ and go to step 2." 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "In the DFP algorithm, $\\mathbf{H}_{k+1}$ is positive definite whenever $\\mathbf{H}_k$ is positive definite." 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": {}, 302 | "source": [ 303 | "On some larger nonquadratic problems, the matrix $\\mathbf{H}_k$ in the DFP algorithm can become nearly singular during the iterations, so that the algorithm gets stuck." 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "11.5 The BFGS Algorithm" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "Recall the DFP update for the approximation of the inverse Hessian:\n", 318 | "$$\\mathbf{H}^{DFP}_{k+1}=\\mathbf{H}_k+\\frac{\\varDelta\\mathbf{x}^{\\left(k\\right)}\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}}{\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}}-\\frac{\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{H}_k}{\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}}$$" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "By complementarity, that is, by interchanging the roles of $\\varDelta\\mathbf{x}^{\\left(k\\right)}$ and $\\varDelta\\mathbf{g}^{\\left(k\\right)}$ and replacing $\\mathbf{H}_k$ with $\\mathbf{B}_k$, the update for the approximation $\\mathbf{B}_k$ of the Hessian itself is\n", 326 | "$$\\mathbf{B}_{k+1}=\\mathbf{B}_k+\\frac{\\varDelta\\mathbf{g}^{\\left(k\\right)}\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}}{\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\varDelta\\mathbf{x}^{\\left(k\\right)}}-\\frac{\\mathbf{B}_k\\varDelta\\mathbf{x}^{\\left(k\\right)}\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}\\mathbf{B}_k}{\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}\\mathbf{B}_k\\varDelta\\mathbf{x}^{\\left(k\\right)}}$$" 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "metadata": {}, 332 | "source": [ 333 | "The BFGS update for the approximation of the inverse Hessian is then\n", 334 | 
"$$\\mathbf{H}^{BFGS}_{k+1}=\\left(\\mathbf{B}_{k+1}\\right)^{-1}=\\left(\\mathbf{B}_k+\\frac{\\varDelta\\mathbf{g}^{\\left(k\\right)}\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}}{\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\varDelta\\mathbf{x}^{\\left(k\\right)}}-\\frac{\\mathbf{H}_k\\varDelta\\mathbf{x}^{\\left(k\\right)}\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}\\mathbf{H}_k}{\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}\\mathbf{H}_k\\varDelta\\mathbf{x}^{\\left(k\\right)}}\\right)^{-1}$$" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "应用谢尔曼-莫里森矩阵求逆公式,得\n", 342 | "$$\\mathbf{H}^{BFGS}_{k+1}=\\mathbf{H}_{k}+\\left(1+\\frac{\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}}{\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\varDelta\\mathbf{x}^{\\left(k\\right)}}\\right)\\frac{\\varDelta\\mathbf{x}^{\\left(k\\right)}\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}}{\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}\\varDelta\\mathbf{g}^{\\left(k\\right)}} \\\\\n", 343 | "-\\frac{\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}+\\left(\\mathbf{H}_k\\varDelta\\mathbf{g}^{\\left(k\\right)}\\varDelta\\mathbf{x}^{\\left(k\\right)\\top}\\right)^\\top}{\\varDelta\\mathbf{g}^{\\left(k\\right)\\top}\\varDelta\\mathbf{x}^{\\left(k\\right)}}$$" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": {}, 349 | "source": [ 350 | "BFGS算法保持了拟牛顿法的共轭方向性质,也能够使近似矩阵一直保持正定。" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "当迭代过程中一维搜索的进度不高时,BFGS算法扔比较稳健。" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": {}, 363 | "source": [ 364 | "多数情况下,BFGS算法效率远超DFP算法。" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "对于非二次型问题,也需要对拟牛顿法进行一些修正。比如,可以每经过几次迭代(n或n+1),将搜索方向重置为梯度负方向,然后继续迭代,知道满足停止规则。" 372 | ] 373 | 
}, 374 | { 375 | "cell_type": "code", 376 | "execution_count": null, 377 | "metadata": { 378 | "collapsed": true 379 | }, 380 | "outputs": [], 381 | "source": [] 382 | } 383 | ], 384 | "metadata": { 385 | "kernelspec": { 386 | "display_name": "Python 2", 387 | "language": "python", 388 | "name": "python2" 389 | }, 390 | "language_info": { 391 | "codemirror_mode": { 392 | "name": "ipython", 393 | "version": 2 394 | }, 395 | "file_extension": ".py", 396 | "mimetype": "text/x-python", 397 | "name": "python", 398 | "nbconvert_exporter": "python", 399 | "pygments_lexer": "ipython2", 400 | "version": "2.7.13" 401 | } 402 | }, 403 | "nbformat": 4, 404 | "nbformat_minor": 2 405 | } 406 | -------------------------------------------------------------------------------- /第13章 无约束优化问题和神经网络.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Chapter 13: Unconstrained Optimization and Neural Networks" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "13.1 Introduction" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "At the core of a neural network are the connection weights between its neurons. The process of determining these weights is called training or learning." 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Neural networks are trained with the backpropagation algorithm, which formulates training as an unconstrained optimization problem and solves it with a gradient algorithm." 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "Each neuron represents a map, typically with multiple inputs and a single output. The output of a neuron is a function of the sum of its inputs; this function is usually called the activation function. The output of one neuron may serve as an input to several other neurons." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "A neural network consists of many interconnected neurons, where the inputs to each neuron are weighted outputs of other neurons. In a feedforward neural network the neurons are arranged in layers, and each neuron receives only the weighted outputs of neurons in the previous layer." 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "The first layer of the network is called the input layer, the last layer is the output layer, and the layers in between are hidden layers." 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | 
"给定一个映射$\\mathbf{F}:\\mathbb{R}^n\\to\\mathbb{R}^m$,可以采用特定结构的神经网络实现。" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "确定数据对$\\left(\\mathbf{x}_{d,1},\\mathbf{y}_{d,1}\\right),\\left(\\mathbf{x}_{d,1},\\mathbf{y}_{d,1}\\right),\\dots,\\left(\\mathbf{x}_{d,p},\\mathbf{y}_{d,p}\\right)\\in\\mathbb{R}^n\\times\\mathbb{R}^m$,其中,$\\mathbf{y}_{d,i}$为映射$\\mathbf{F}$的输出,对应输入为$\\mathbf{x}_{d,i}$,即$\\mathbf{y}_{d,i}=\\mathbf{F}\\left(\\mathbf{x}_{d,i}\\right)$,以数据对$\\{\\left(\\mathbf{x}_{d,1},\\mathbf{y}_{d,1}\\right),\\left(\\mathbf{x}_{d,1},\\mathbf{y}_{d,1}\\right),\\dots,\\left(\\mathbf{x}_{d,p},\\mathbf{y}_{d,p}\\right)\\}$作为训练集对神经网络进行训练。利用训练算法,以网络实际输出和制定输出之间的误差,即$\\mathbf{y}_{d,i}=\\mathbf{F}\\left(\\mathbf{y}_{d,i}\\right)$和神经网络在输入$\\mathbf{x}_{d,i}$下的输出之间的差值为依据,对连接权重进行调整。" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "神经网络训练问题可归纳为优化问题。" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "13.2 单个神经元" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "神经元可以由如下函数给出:\n", 85 | "$$y=f_a\\left(\\sum_{i=1}^{n}w_i x_i\\right)=f_a\\left(\\mathbf{x}^\\top\\mathbf{w}\\right)$$\n", 86 | "其中,$\\mathbf{x}=\\left[x_1,x_2,\\dots,x_n\\right]^\\top\\in\\mathbb{R}^n$表示输入向量,$y\\in\\mathbb{R}$为输出,$\\mathbf{w}=\\left[w_1,w_2,\\dots,w_n\\right]^\\top\\in\\mathbb{R}^n$为权重向量,$f_a$为任意可微激活函数。" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "给定映射$\\mathbf{F}:\\mathbb{R}^n\\to\\mathbb{R}$,希望通过训练$w_1,w_2,\\dots,w_n$,使得该神经元能够尽可能地逼近$\\mathbf{F}$。" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "13.3 反向传播算法" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | 
"设神经网络共有$n$个输入$x_i,i=1,2,\\dots,n$;$m$个输出$y_s,s=1,2,\\dots,m$。中间层包括$l$个神经元,中间层神经元的输出为$z_j,j=1,2,\\dots,l$。中间层神经元的激活函数为$f_j^h,j=1,2,\\dots,l$,输出层神经元的激活函数为$f_s^o,s=1,2,\\dots,m$。" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": { 113 | "collapsed": true 114 | }, 115 | "source": [ 116 | "令$w_{ji}^h,i=1,2,\\dots,n;j=1,2,\\dots,l$表示中间层输入对应的权重,$w_{sj}^o,j=1,2,\\dots,l;s=1,2,\\dots,m$表示输出层输入对应的权重。" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "令$v_j$表示中间层第$j$个神经元的输入,有$v_j=\\sum_{i=1}^n w_{ji}^h x_i$。" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "令$z_j$表示中间层第$j$个神经元的输出,有$z_j=f_j^h\\left(v_j\\right)=f_j^h\\left(\\sum_{i=1}^n w_{ji}^h x_i\\right)$。" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "输出层第$s$个神经元的输出为$y_s=f_s^o\\left(\\sum_{j=1}^lw_{sj}^oz_j\\right)$" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "输入$x_i,i=1,2,\\dots,n$和第$s$个输出$y_s$之间的关系为\n", 145 | "$$y_s=f_s^o\\left(\\sum_{j=1}^lw_{sj}^oz_j\\right) \\\\\n", 146 | "=f_s^o\\left(\\sum_{j=1}^lw_{sj}^o f_j^h\\left(v_j\\right)\\right) \\\\\n", 147 | "=f_s^o\\left(\\sum_{j=1}^lw_{sj}^o f_j^h\\left(\\sum_{i=1}^n w_{ji}^h x_i\\right)\\right)$$" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": { 153 | "collapsed": true 154 | }, 155 | "source": [ 156 | "假定训练数据对$\\left(\\mathbf{x}_d,\\mathbf{y}_d\\right)$,$\\mathbf{x}_d=\\left[x_{d1},x_{d2},\\dots,x_{dn}\\right]^\\top\\in\\mathbb{R}^n$,$\\mathbf{y}_d\\in\\mathbb{R}^m$。神经网络的训练指的是调整网络的连接权重,使得在给定的输入$\\mathbf{x}_d$下,输出能够尽可能地接近于$\\mathbf{y}_d$。" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "形式上,可以得到如下优化问题:\n", 164 | "$$\\min\\frac{1}{2}\\sum_{s=1}^m\\left(y_{ds}-y_s\\right)^2$$\n", 165 | 
"其中,$y_s,s=1,2,\\dots,m$表示神经网络在输入$x_{d1},x_{d2},\\dots,x_{dn}$下的实际输出:\n", 166 | "$$y_s=f_s^o\\left(\\sum_{j=1}^lw_{sj}^of_j^h\\left(\\sum_{i=1}^nw_{ji}^hx_i\\right)\\right)$$" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "决策变量为所有权重,即$w_{ji}^h,w_{sj}^o,i=1,2,\\dots,n;j=1,2,\\dots,l;s=1,2,\\dots,m$。" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "决策变量向量形式为$\\mathbf{w}=\\{w_{ji}^h,w_{sj}^o:i=1,2,\\dots,n,j=1,2,\\dots,l,s=1,2,\\dots,m\\}$" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "目标函数\n", 188 | "$$\\mathbf{E}\\left(\\mathbf{w}\\right)=\\frac{1}{2}\\sum_{s=1}^m\\left(y_{ds}-y_s\\right)^2 \\\\\n", 189 | "=\\frac{1}{2}\\sum_{s=1}^m\\left(y_{ds}-f_s^o\\left(\\sum_{j=1}^lw_{sj}^of_j^h\\left(\\sum_{i=1}^nw_{ji}^hx_i\\right)\\right)\\right)^2$$" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "可利用固定步长梯度法来求解。需要计算目标函数$\\mathbf{E}$关于$\\mathbf{w}$中每个元素的偏导数。" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "计算函数$\\mathbf{E}$关于$w_{sj}^o$的偏导数。固定$i$、$j$、$s$,将函数$\\mathbf{E}$改写为\n", 204 | "$$\\mathbf{E}\\left(\\mathbf{w}\\right)=\\frac{1}{2}\\sum_{p=1}^m\\left(y_{dp}-f_p^o\\left(\\sum_{q=1}^lw_{pq}^oz_q\\right)\\right)^2$$" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "对于$q=1,2,\\dots,l$,有\n", 212 | "$$z_q=f_q^h\\left(\\sum_{i=1}^nw_{qi}^hx_{di}\\right)$$" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": { 218 | "collapsed": true 219 | }, 220 | "source": [ 221 | "利用链式法则,可得\n", 222 | "$$\\frac{\\partial\\mathbf{E}}{\\partial w_{sj}^o}\\left(\\mathbf{w}\\right)=-\\left(y_{ds}-y_s\\right)f_s^{o'}\\left(\\sum_{q=1}^lw_{sq}^oz_q\\right)z_j$$\n", 223 | "其中,$f_s^{o'}:\\mathbb{R}\\to\\mathbb{R}$表示函数$f_s^{o}$的导数。" 224 | ] 225 | }, 226 | 
{ 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "为简化描述,定义\n", 231 | "$$\\delta_s=\\left(y_{ds}-y_s\\right)f_s^{o'}\\left(\\sum_{q=1}^lw_{sq}^oz_q\\right)$$\n", 232 | "$\\delta_s$是对输出误差(神经网络的实际输出$y_s$与要求输出$y_{ds}$之间的差值)进行缩放的结果,缩放因子为$f_s^{o'}\\left(\\sum_{q=1}^lw_{sq}^oz_q\\right)$。" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "利用$\\delta_s$的表达式,可得\n", 240 | "$$\\frac{\\partial\\mathbf{E}}{\\partial w_{sj}^o}\\left(\\mathbf{w}\\right)=-\\delta_sz_j$$" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "计算函数$\\mathbf{E}$关于$w_{ji}^h$的偏导数。固定$i$、$j$、$s$,将函数$\\mathbf{E}$改写为\n", 248 | "$$\\mathbf{E}\\left(\\mathbf{w}\\right)=\\frac{1}{2}\\sum_{p=1}^m\\left(y_{dp}-f_p^o\\left(\\sum_{q=1}^lw_{pq}^of_q^h\\left(\\sum_{r=1}^nw_{qr}^hx_{dr}\\right)\\right)\\right)^2$$" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "利用链式法则,可得\n", 256 | "$$\\frac{\\partial\\mathbf{E}}{\\partial w_{ji}^h}\\left(\\mathbf{w}\\right)=-\\sum_{p=1}^m\\left(y_{dp}-y_p\\right)f_p^{o'}\\left(\\sum_{q=1}^lw_{pq}^oz_q\\right)w_{pj}^of_j^{h'}\\left(\\sum_{r=1}^nw_{jr}^hx_{dr}\\right)x_{di}$$\n", 257 | "其中,$f_j^{h'}:\\mathbb{R}\\to\\mathbb{R}$表示函数$f_j^{h}$的导数。" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "利用$\\delta_s$的表达式,可得\n", 265 | "$$\\frac{\\partial\\mathbf{E}}{\\partial w_{ji}^h}\\left(\\mathbf{w}\\right)=-\\left(\\sum_{p=1}^m\\delta_pw_{pj}^o\\right)f_j^{h'}\\left(v_j\\right)x_{di}$$" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "权重$w_{sj}^o$和$w_{ji}^h$的迭代更新公式:\n", 273 | "$$w_{sj}^{o\\left(k+1\\right)}=w_{sj}^{o\\left(k\\right)}+\\eta\\delta_s^{\\left(k\\right)}z_j^{\\left(k\\right)} \\\\\n", 274 | 
"w_{ji}^{h\\left(k+1\\right)}=w_{ji}^{h\\left(k\\right)}+\\eta\\left(\\sum_{p=1}^m\\delta_p^{\\left(k\\right)}w_{pj}^{o\\left(k\\right)}\\right)f_j^{h'}\\left(v_j^{\\left(k\\right)}\\right)x_{di}$$\n", 275 | "其中,\n", 276 | "$$v_j^{\\left(k\\right)}=\\sum_{i=1}^nw_{ji}^{h\\left(k\\right)}x_{di} \\\\\n", 277 | "z_j^{\\left(k\\right)}=f_j^k\\left(v_j^{\\left(k\\right)}\\right) \\\\\n", 278 | "y_s^{\\left(k\\right)}=f_s^o\\left(\\sum_{q=1}^lw_{sq}^{o\\left(k\\right)}z_q^{\\left(k\\right)}\\right) \\\\\n", 279 | "\\delta_s^{\\left(k\\right)}=\\left(y_{ds}-y_s^{\\left(k\\right)}\\right)f_s^{o'}\\left(\\sum_{q=1}^lw_{sq}^{o\\left(k\\right)}z_q^{\\left(k\\right)}\\right)$$\n", 280 | "$\\eta$表示为固定步长。" 281 | ] 282 | } 283 | ], 284 | "metadata": { 285 | "kernelspec": { 286 | "display_name": "Python 2", 287 | "language": "python", 288 | "name": "python2" 289 | }, 290 | "language_info": { 291 | "codemirror_mode": { 292 | "name": "ipython", 293 | "version": 2 294 | }, 295 | "file_extension": ".py", 296 | "mimetype": "text/x-python", 297 | "name": "python", 298 | "nbconvert_exporter": "python", 299 | "pygments_lexer": "ipython2", 300 | "version": "2.7.13" 301 | } 302 | }, 303 | "nbformat": 4, 304 | "nbformat_minor": 2 305 | } 306 | -------------------------------------------------------------------------------- /第1章 证明方法与相关记法.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true 7 | }, 8 | "source": [ 9 | "# 第1章 证明方法与相关记法" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "1.1 证明方法" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "原命题:$A$ \n", 24 | "否命题(逻辑非):$\\neg A$ \n", 25 | "逻辑非真值表: \n", 26 | "\n", 27 | "| $A$ | $\\neg A$ |\n", 28 | "|------:|---------:|\n", 29 | "|$True$ | $False$|\n", 30 | "|$False$| $True$| " 31 | ] 32 | }, 33 | { 34 | "cell_type": 
"markdown", 35 | "metadata": { 36 | "collapsed": true 37 | }, 38 | "source": [ 39 | "逻辑与:$A \\land B$ \n", 40 | "逻辑或:$A \\lor B$ \n", 41 | "逻辑与、逻辑或真值表: \n", 42 | "\n", 43 | "| $A$ | $B$ |$A \\land B$ | $A \\lor B$ |\n", 44 | "|------:|------:|------------:|------------:|\n", 45 | "|$True$ |$True$ |$\\quad True$ |$\\quad True$ |\n", 46 | "|$True$ |$False$|$\\quad False$|$\\quad True$ |\n", 47 | "|$False$|$True$ |$\\quad False$|$\\quad True$ |\n", 48 | "|$False$|$False$|$\\quad False$|$\\quad False$|" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": { 54 | "collapsed": true 55 | }, 56 | "source": [ 57 | "充分条件:$A \\Rightarrow B$($A$蕴含$B$、$A$仅当$B$、如果$A$则$B$、$A$是$B$的充分条件) \n", 58 | "必要条件:$A \\Leftarrow B$ ($A$是$B$的必要条件) \n", 59 | "充要条件:$A \\Leftrightarrow B$ ($A$等价于$B、A$是$B$的充要条件) \n", 60 | "充分条件、必要条件、充要条件真值表: \n", 61 | "\n", 62 | "| $A$ | $B$ |$A \\Rightarrow B$|$A \\Leftarrow B$|$A \\Leftrightarrow B$|\n", 63 | "|------:|------:|----------------:|---------------:|--------------------:|\n", 64 | "|$True$ |$True$ |$\\qquad True$ |$\\qquad True$ |$\\qquad True$ |\n", 65 | "|$True$ |$False$|$\\qquad False$ |$\\qquad True$ |$\\qquad False$ |\n", 66 | "|$False$|$True$ |$\\qquad True$ |$\\qquad False$ |$\\qquad False$ |\n", 67 | "|$False$|$False$|$\\qquad True$ |$\\qquad True$ |$\\qquad True$ |" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "原命题:$A \\Rightarrow B$ \n", 75 | "逆否命题:$\\neg B \\Rightarrow \\neg A$ \n", 76 | "原命题与其逆否命题等价 \n", 77 | "$$ \\left( A \\Rightarrow B \\right) \\Leftrightarrow \\left( \\neg B \\Rightarrow \\neg A \\right)$$" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "德摩根定律: \n", 85 | "$$ \\neg \\left( A \\lor B \\right) \\Leftrightarrow \\neg A \\land \\neg B \\\\\n", 86 | " \\neg \\left( A \\land B \\right) \\Leftrightarrow \\neg A \\lor \\neg B $$" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | 
"source": [ 93 | "命题$A \\Rightarrow B$的证明: \n", 94 | "1)直接证明法:$A \\Rightarrow B$ \n", 95 | "2)对位证明法:$ \\left( A \\Rightarrow B \\right) \\Leftrightarrow \\left( \\neg B \\Rightarrow \\neg A \\right)$ \n", 96 | "3)反证法: $ \\left( A \\Rightarrow B \\right) \\Leftrightarrow \\neg \\left( A \\land \\neg B \\right)$ \n", 97 | "4)归纳法:设序列中各项的属性满足a)第1项具有该属性;b)如果第$n$项具有该属性,那么第$n+1$项也具有该属性,则序列中任意项均具有该属性。" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "1.2 记法" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "设$X$是一个集合,则$x \\in X$表示$x$是集合$X$的一个元素,$x \\notin X$表示$x$不是集合$X$的元素。 \n", 112 | "集合$\\{x_1,x_2,\\dots, x_n \\}$表示由$x_1,x_2,\\dots, x_n$元素组成的集合。 \n", 113 | "集合$\\{x:x\\in \\mathbb{R},x>5\\}$表示由实数集合中大于5的元素组成的集合,也可表示为$\\{x\\in \\mathbb{R}:x>5\\}$。 \n", 114 | "设$X$和$Y$是集合,则$X \\subset Y$表示$X$的元素也是$Y$的元素,即$X$是$Y$的子集。 \n", 115 | "设$X$和$Y$是集合,则$X \\backslash Y$表示由在集合$X$中但不在集合$Y$中的元素组成的集合,即$X$减$Y$。" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "符号“$f:X \\to Y$”表示$f$是一个从集合$X$到集合$Y$的函数。 \n", 123 | "符号“$x:=y$”表示一个算数赋值操作,即将$x$赋值给$y$。 \n", 124 | "符号“$\\triangleq$”表示定义为相等。" 125 | ] 126 | } 127 | ], 128 | "metadata": { 129 | "kernelspec": { 130 | "display_name": "Python 2", 131 | "language": "python", 132 | "name": "python2" 133 | }, 134 | "language_info": { 135 | "codemirror_mode": { 136 | "name": "ipython", 137 | "version": 2 138 | }, 139 | "file_extension": ".py", 140 | "mimetype": "text/x-python", 141 | "name": "python", 142 | "nbconvert_exporter": "python", 143 | "pygments_lexer": "ipython2", 144 | "version": "2.7.13" 145 | } 146 | }, 147 | "nbformat": 4, 148 | "nbformat_minor": 2 149 | } 150 | -------------------------------------------------------------------------------- /第20章 仅含等式约束的优化问题.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": 
"markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "第20章 仅含等式约束的优化问题" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "20.1 引言" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "有约束非线性优化问题\n", 22 | "$$\\min f\\left(\\mathbf{x}\\right) \\\\\n", 23 | "s.t. \\quad h_i\\left(\\mathbf{x}\\right)=0, i=1,\\dots,m \\\\\n", 24 | "\\quad g_j\\left(\\mathbf{x}\\right)\\leqslant0,j=1,\\dots,p$$\n", 25 | "其中,$\\mathbf{x}\\in\\mathbb{R}^n,f:\\mathbb{R}^n\\to\\mathbb{R},h_i:\\mathbb{R}^n\\to\\mathbb{R},g_j:\\mathbb{R}^n\\to\\mathbb{R},m\\leqslant n$。" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "向量表示的标准型\n", 33 | "$$\\min f\\left(\\mathbf{x}\\right) \\\\\n", 34 | "s.t. \\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0} \\\\\n", 35 | "\\quad \\mathbf{g}\\left(\\mathbf{x}\\right)\\leqslant \\mathbf{0}$$\n", 36 | "其中,$\\mathbf{h}:\\mathbb{R}^n\\to\\mathbb{R}^m,\\mathbf{g}:\\mathbb{R}^n\\to\\mathbb{R}^p$。" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "定义20.1 满足所有约束条件的点称为可行点,所有可行点组成的集合\n", 44 | "$$\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0},\\mathbf{g}\\leqslant\\mathbf{0}\\}$$\n", 45 | "称为可行集。" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "极大化问题可转化为极小划问题\n", 53 | "$$\\max f\\left(x\\right)=\\min\\;-f\\left(x\\right)$$" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "20.2 问题描述" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "仅含等式约束的优化问题\n", 68 | "$$\\min f\\left(\\mathbf{x}\\right) \\\\\n", 69 | "s.t. 
\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0} $$\n", 70 | "其中,$\\mathbf{x}\\in\\mathbb{R}^n,f:\\mathbb{R}^n\\to\\mathbb{R},\\mathbf{h}:\\mathbb{R}^n\\to\\mathbb{R}^m,\\mathbf{h}=\\left[h_1,\\dots,h_m\\right]^\\top,m\\leqslant n$。" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "定义20.2 对于满足约束$h_1\\left(\\mathbf{x}^*\\right)=0,\\dots,h_m\\left(\\mathbf{x}^*\\right)=0$的点$\\mathbf{x}^*$,如果梯度向量$\\nabla h_1\\left(\\mathbf{x}^*\\right),\\dots,\\nabla h_m\\left(\\mathbf{x}^*\\right)$是线性无关的,则称点$\\mathbf{x}^*$为该约束的一个正则点。其中假定函数$\\mathbf{h}$连续可微,即$\\mathbf{h}\\in\\mathcal{C}^1$。" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "令$D\\mathbf{h}\\left(\\mathbf{x}^*\\right)$为向量$\\mathbf{h}=\\left[h_1,\\dots,h_m\\right]^\\top$在$\\mathbf{x}^*$处的雅可比矩阵\n", 85 | "$$D\\mathbf{h}\\left(\\mathbf{x}^*\\right)=\\begin{bmatrix} Dh_1\\left(\\mathbf{x}^*\\right) \\\\ \\vdots \\\\ Dh_m\\left(\\mathbf{x}^*\\right) \\end{bmatrix}=\\begin{bmatrix} \\nabla h_1\\left(\\mathbf{x}^*\\right)^\\top \\\\ \\vdots \\\\ \\nabla h_m\\left(\\mathbf{x}^*\\right)^\\top \\end{bmatrix}$$\n", 86 | "则,当且仅当$\\mathrm{rank}D\\mathbf{h}\\left(\\mathbf{x}^*\\right)=m$(即雅可比矩阵行满秩)时,$\\mathbf{x}^*$是正则的。" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "等式约束的集合$h_1\\left(\\mathbf{x}\\right)=0,\\dots,h_m\\left(\\mathbf{x}\\right)=0,h_i:\\mathbb{R}^n\\to\\mathbb{R}$定义的是一个曲面:\n", 94 | "$$S=\\{\\mathbf{x}\\in\\mathbb{R}^n:h_1\\left(\\mathbf{x}\\right)=0,\\dots,h_m\\left(\\mathbf{x}\\right)=0\\}$$\n", 95 | "如果$S$上的所有点都是正则点,则曲面$S$的维数为$n-m$。" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "20.3 切线空间和法线空间" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "定义20.3 曲面$S$上的曲线$C$,是由$t\\in\\left(a,b\\right)$连续参数化的一组点构成的集合$\\{\\mathbf{x}\\left(t\\right)\\in 
S:t\\in\\left(a,b\\right)\\}$,即$\\mathbf{x}:\\left(a,b\\right)\\to S$是连续函数。" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "由曲线定义可知,曲线上的所有点都满足曲面方程。如果曲线$C$通过一个点$\\mathbf{x}^*$,则必然存在$t^*\\in\\left(a,b\\right)$,使得$\\mathbf{x}\\left(t^*\\right)=\\mathbf{x}^*$。" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "可以把曲线$C=\\{\\mathbf{x}\\left(t\\right):t\\in\\left(a,b\\right)\\}$当作点$\\mathbf{x}$在曲面$S$上运动时经过的路径,$\\mathbf{x}\\left(t\\right)$表示点在$t$时刻的位置。" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "定义20.4 如果对于所有$t\\in\\left(a,b\\right)$,\n", 131 | "$$\\dot{\\mathbf{x}}\\left(t\\right)=\\frac{\\mathrm{d}\\mathbf{x}}{\\mathrm{d}t}\\left(t\\right)=\\begin{bmatrix} \\dot{x}_1\\left(t\\right) \\\\ \\vdots \\\\ \\dot{x}_n\\left(t\\right) \\end{bmatrix}$$\n", 132 | "都存在,则曲线$C=\\{\\mathbf{x}\\left(t\\right):t\\in\\left(a,b\\right)\\}$是可微的。" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "如果对于所有$t\\in\\left(a,b\\right)$,\n", 140 | "$$\\ddot{\\mathbf{x}}\\left(t\\right)=\\frac{\\mathrm{d}^2\\mathbf{x}}{\\mathrm{d}t^2}\\left(t\\right)=\\begin{bmatrix} \\ddot{x}_1\\left(t\\right) \\\\ \\vdots \\\\ \\ddot{x}_n\\left(t\\right) \\end{bmatrix}$$\n", 141 | "都存在,则曲线$C=\\{\\mathbf{x}\\left(t\\right):t\\in\\left(a,b\\right)\\}$是二次可微的。" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "$\\dot{\\mathbf{x}}\\left(t\\right)$和$\\ddot{\\mathbf{x}}\\left(t\\right)$均是$n$维向量。可以把$\\dot{\\mathbf{x}}\\left(t\\right)$和$\\ddot{\\mathbf{x}}\\left(t\\right)$分别视为运动路径为$C$的某个点$\\mathbf{x}\\left(t\\right)$在$t$时刻的速度和加速度。向量$\\dot{\\mathbf{x}}\\left(t\\right)$指向$\\mathbf{x}\\left(t\\right)$的瞬时运动方向。因此向量$\\dot{\\mathbf{x}}\\left(t^*\\right)$在$\\mathbf{x}^*$处与曲线$C$相切。" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": 
{}, 154 | "source": [ 155 | "定义20.5 曲面$S=\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0}\\}$中的点$\\mathbf{x}^*$处的切线空间为集合$T\\left(\\mathbf{x^*}\\right)=\\{\\mathbf{y}:D\\mathbf{h}\\left(\\mathbf{x}^*\\right)\\mathbf{y}=\\mathbf{0}\\}$" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "切线空间$T\\left(\\mathbf{x}^*\\right)$是矩阵$D\\mathbf{h}\\left(\\mathbf{x}^*\\right)$的零空间\n", 163 | "$$T\\left(\\mathbf{x}^*\\right)=\\mathcal{N}\\left(D\\mathbf{h}\\left(\\mathbf{x}^*\\right)\\right)$$\n", 164 | "因此,切线空间是$\\mathbb{R}^n$的子空间。" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "假设$\\mathbf{x}^*$是正则点,则切线空间的维数为$n-m$,$m$是等式约束$h_i\\left(\\mathbf{x}^*\\right)=0$的数量。" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "切线空间经过原点,但常被描绘为一个经过点$\\mathbf{x}^*$的平面。点$\\mathbf{x}^*$处的切平面定义为\n", 179 | "$$TP\\left(\\mathbf{x}^*\\right)=T\\left(\\mathbf{x}^*\\right)+\\mathbf{x}^*=\\{\\mathbf{x}+\\mathbf{x}^*:\\mathbf{x}\\in T\\left(\\mathbf{x}^*\\right)\\}$$" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "定理20.1 假设$\\mathbf{x}^*\\in S$是一个正则点且$T\\left(\\mathbf{x}^*\\right)$是$\\mathbf{x}^*$处的切线空间,当且仅当曲面$S$中存在一条经过点$\\mathbf{x}^*$的可微曲线,其在$\\mathbf{x}^*$处的导数为$\\mathbf{y}$时,有$\\mathbf{y}\\in T\\left(\\mathbf{x}^*\\right)$成立。" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "定义20.6 曲面$S=\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0}\\}$中的点$\\mathbf{x}^*$处的法向量空间$N\\left(\\mathbf{x}^*\\right)$定义为$N\\left(\\mathbf{x}^*\\right)=\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{x}=D\\mathbf{h}\\left(\\mathbf{x}^*\\right)^\\top\\mathbf{z},\\mathbf{z}\\in\\mathbb{R}^m\\}$" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "法线空间可表示为\n", 
201 | "$$N\\left(\\mathbf{x}^*\\right)=\\mathcal{R}\\left(D\\mathbf{h}\\left(\\mathbf{x}^*\\right)^\\top\\right)$$\n", 202 | "即法线空间是矩阵$D\\mathbf{h}\\left(\\mathbf{x}^*\\right)^\\top$的值域。" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "法线空间$N\\left(\\mathbf{x}^*\\right)$是由向量$\\nabla h_1\\left(\\mathbf{x}^*\\right),\\dots,\\nabla h_m\\left(\\mathbf{x}^*\\right)$张成的子空间,即\n", 210 | "$$N\\left(\\mathbf{x}^*\\right)=span\\left[\\nabla h_1\\left(\\mathbf{x}^*\\right),\\dots,\\nabla h_m\\left(\\mathbf{x}^*\\right)\\right] \\\\\n", 211 | "\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad=\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{x}=z_1\\nabla h_1\\left(\\mathbf{x}^*\\right)+\\dots+z_m\\nabla h_m\\left(\\mathbf{x}^*\\right),z_1,\\dots,z_m\\in\\mathbb{R}\\}$$" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "当$\\mathbf{x}^*$是正则点时,法线空间$N\\left(\\mathbf{x}^*\\right)$的维数为$m$。" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "$\\mathbf{x}^*$处的法平面为\n", 226 | "$$NP\\left(\\mathbf{x}^*\\right)=N\\left(\\mathbf{x}^*\\right)+\\mathbf{x}^*=\\{\\mathbf{x}+\\mathbf{x}^*:\\mathbf{x}\\in N\\left(\\mathbf{x}^*\\right)\\}$$" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "切线空间和法线空间互为正交补。" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "可以对$\\mathbb{R}^n$进行直和分解\n", 241 | "$$\\mathbb{R}^n=N\\left(\\mathbf{x}^*\\right)\\oplus T\\left(\\mathbf{x}^*\\right)$$\n", 242 | "即对于任意向量$\\mathbf{v}\\in\\mathbb{R}^n$,存在仅有的一对向量$\\mathbf{w}\\in N\\left(\\mathbf{x}^*\\right)$和$\\mathbf{y}\\in T\\left(\\mathbf{x}^*\\right)$,使得$\\mathbf{v}=\\mathbf{w}+\\mathbf{y}$。" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": { 248 | "collapsed": true 249 | }, 250 | "source": [ 251 | "20.4 
拉格朗日条件" 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "令$\\mathbf{h}:\\mathbb{R}^2\\to\\mathbb{R}$为约束函数,已知函数定义域中的点$\\mathbf{x}$处的梯度$\\nabla\\mathbf{h}\\left(\\mathbf{x}\\right)$与通过该点的$\\mathbf{h}\\left(\\mathbf{x}\\right)$水平集正交。选择点$\\mathbf{x}^*=\\left[x_1^*,x_2^*\\right]^\\top$,使得$\\mathbf{h}\\left(\\mathbf{x^*}\\right)=0$,且$\\nabla\\mathbf{h}\\left(\\mathbf{x}^*\\right)\\neq\\mathbf{0}$,经过点$\\mathbf{x}^*$的水平集为集合$\\{\\mathbf{x}:\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0}\\}$。" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "利用曲线$\\{\\mathbf{x}\\left(t\\right)\\}$在$\\mathbf{x}^*$邻域内对水平集进行参数化,$\\mathbf{x}\\left(t\\right)$为一个连续可微的向量函数$\\mathbf{x}:\\mathbb{R}\\to\\mathbb{R}^2$\n", 266 | "$$\\mathbf{x}\\left(t\\right)=\\begin{bmatrix} x_1\\left(t\\right) \\\\ x_2\\left(t\\right) \\end{bmatrix},t\\in\\left(a,b\\right),\\mathbf{x}^*=\\mathbf{x}\\left(t^*\\right),\\dot{\\mathbf{x}}\\left(t^*\\right)\\neq\\mathbf{0},t^*\\in\\left(a,b\\right)$$" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "由于$\\mathbf{h}$在曲线$\\{\\mathbf{x}\\left(t\\right):t\\in\\left(a,b\\right)\\}$上是常数,即对于任意$t\\in\\left(a,b\\right)$,有\n", 274 | "$$\\mathbf{h}\\left(\\mathbf{x}\\left(t\\right)\\right)=0$$\n", 275 | "因此,对于任意$t\\in\\left(a,b\\right)$有\n", 276 | "$$\\frac{\\mathrm{d}}{\\mathrm{d}t}\\mathbf{h}\\left(\\mathbf{x}\\left(t\\right)\\right)=0$$" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "利用链式法则,可得\n", 284 | "$$\\frac{\\mathrm{d}}{\\mathrm{d}t}\\mathbf{h}\\left(\\mathbf{x}\\left(t\\right)\\right)=\\nabla\\mathbf{h}\\left(\\mathbf{x}\\left(t\\right)\\right)^\\top\\dot{\\mathbf{x}}\\left(t\\right)=0$$\n", 285 | 
"因此,$\\nabla\\mathbf{h}\\left(\\mathbf{x}\\left(t\\right)\\right)^\\top\\dot{\\mathbf{x}}\\left(t\\right)=0$,即$\\nabla\\mathbf{h}\\left(\\mathbf{x^*}\\right)$与$\\dot{\\mathbf{x}}\\left(t^*\\right)$正交。" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "构造关于$t$的复合函数\n", 293 | "$$\\phi\\left(t\\right)=\\mathbf{f}\\left(\\mathbf{x}\\left(t\\right)\\right)$$" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "该函数在$t=t^*$时取得极小值。根据无约束极值问题的一阶必要条件可知\n", 301 | "$$\\frac{\\mathrm{d}\\phi}{\\mathrm{d}t}\\left(t^*\\right)=0$$" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "利用链式法则,可得\n", 309 | "$$\\frac{\\mathrm{d}}{\\mathrm{d}t}\\phi\\left(t^*\\right)=\\nabla\\mathbf{f}\\left(\\mathbf{x}\\left(t^*\\right)\\right)^\\top\\dot{\\mathbf{x}}\\left(t^*\\right)=0$$\n", 310 | "因此,$\\nabla\\mathbf{f}\\left(\\mathbf{x}^*\\right)$与$\\dot{\\mathbf{x}}\\left(t^*\\right)$正交。" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "由于$\\dot{\\mathbf{x}}\\left(t^*\\right)$与曲线$\\{\\mathbf{x}\\left(t\\right)\\}$在点$\\mathbf{x}^*$处相切,因此$\\nabla\\mathbf{f}\\left(\\mathbf{x}^*\\right)$与曲线$\\{\\mathbf{x}\\left(t\\right)\\}$在点$\\mathbf{x}^*$处正交。" 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "metadata": {}, 323 | "source": [ 324 | "由于,$\\nabla\\mathbf{h}\\left(\\mathbf{x^*}\\right)$与$\\dot{\\mathbf{x}}\\left(t^*\\right)$正交,且$\\nabla\\mathbf{f}\\left(\\mathbf{x}^*\\right)$与$\\dot{\\mathbf{x}}\\left(t^*\\right)$正交,因此$\\nabla\\mathbf{h}\\left(\\mathbf{x^*}\\right)$与$\\nabla\\mathbf{f}\\left(\\mathbf{x}^*\\right)$平行,即$\\nabla\\mathbf{f}\\left(\\mathbf{x}^*\\right)$等于$\\nabla\\mathbf{h}\\left(\\mathbf{x^*}\\right)$与一个标量之积。" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "定理20.2 $n=2,m=1$时的拉格朗日定理 \n", 332 | 
"设点$\\mathbf{x}^*$是函数$\\mathbf{f}:\\mathbb{R}^2\\to\\mathbb{R}$的一个极小点,约束条件是$\\mathbf{h}\\left(\\mathbf{x}\\right)=0,\\mathbf{h}:\\mathbb{R}^2\\to\\mathbb{R}$,那么$\\nabla\\mathbf{h}\\left(\\mathbf{x^*}\\right)$与$\\nabla\\mathbf{f}\\left(\\mathbf{x}^*\\right)$平行,即如果$\\nabla\\mathbf{h}\\left(\\mathbf{x^*}\\right)\\neq0$,则存在标量$\\lambda^*$,使得\n", 333 | "$$\\nabla\\mathbf{f}\\left(\\mathbf{x}^*\\right)+\\lambda^*\\nabla\\mathbf{h}\\left(\\mathbf{x^*}\\right)=\\mathbf{0}$$\n", 334 | "其中,$\\lambda^*$称为拉格朗日乘子。" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "定理20.3 拉格朗日定理 \n", 342 | "设点$\\mathbf{x}^*$是函数$\\mathbf{f}:\\mathbb{R}^n\\to\\mathbb{R}$的一个极小点,约束条件是$\\mathbf{h}\\left(\\mathbf{x}\\right)=0,\\mathbf{h}:\\mathbb{R}^n\\to\\mathbb{R}^m,m\\leqslant n$。如果$\\mathbf{x}^*$是正则点,那么存在$\\boldsymbol{\\lambda}^*\\in\\mathbb{R}^m$,使得\n", 343 | "$$D\\mathbf{f}\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top}D\\mathbf{h}\\left(\\mathbf{x^*}\\right)=\\mathbf{0}^\\top$$" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": {}, 349 | "source": [ 350 | "拉格朗日定理表明,如果$\\mathbf{x}^*$是极值点,则目标函数$\\mathbf{f}$在该店处梯度可表示为关于约束函数在该点处梯度的线性组合。向量$\\boldsymbol{\\lambda}^*$称为拉格朗日乘子向量。拉格朗日条件是必要条件,而不是充分条件。" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "引入拉格朗日函数$l:\\mathbb{R}^n\\times\\mathbb{R}^m\\to\\mathbb{R}$\n", 358 | "$$l\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)\\triangleq\\mathbf{f}\\left(\\mathbf{x}\\right)+\\boldsymbol{\\lambda}^\\top\n", 359 | "\\mathbf{h}\\left(\\mathbf{x}\\right)$$" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "令$D_{\\mathbf{x}}l$表示$l$关于$\\mathbf{x}$的导数,$D_{\\mathbf{x}}l\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)=D\\mathbf{f}\\left(\\mathbf{x}\\right)+\\boldsymbol{\\lambda}^\\top 
D\\mathbf{h}\\left(\\mathbf{x}\\right)$;$D_{\\boldsymbol{\\lambda}}l$表示$l$关于$\\boldsymbol{\\lambda}$的导数,$D_{\\boldsymbol{\\lambda}}l\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)=\\mathbf{h}\\left(\\mathbf{x}\\right)^\\top$,有\n", 367 | "$$Dl\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)=\\left[D_{\\mathbf{x}}l\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right),D_{\\boldsymbol{\\lambda}}l\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)\\right]$$" 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": {}, 373 | "source": [ 374 | "局部极小点$\\mathbf{x}^*$的拉格朗日条件可以表达为存在$\\boldsymbol{\\lambda}^*$,满足\n", 375 | "$$D_{\\mathbf{x}}l\\left(\\mathbf{x}^*,\\boldsymbol{\\lambda}^*\\right)=D\\mathbf{f}\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top} D\\mathbf{h}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}^\\top \\\\\n", 376 | "D_{\\boldsymbol{\\lambda}}l\\left(\\mathbf{x}^*,\\boldsymbol{\\lambda}^*\\right)=\\mathbf{h}\\left(\\mathbf{x}^*\\right)^\\top=\\mathbf{0}^\\top$$\n", 377 | "即\n", 378 | "$$Dl\\left(\\mathbf{x}^*,\\boldsymbol{\\lambda}^*\\right)=\\mathbf{0}^\\top$$\n", 379 | "拉格朗日定理给定的必要条件,等价于将拉格朗日函数视为无约束优化问题的目标函数时对应的一阶必要条件。" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "通过求解拉格朗日条件\n", 387 | "$$D_{\\mathbf{x}}l\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)=\\mathbf{0}^\\top \\\\\n", 388 | "D_{\\boldsymbol{\\lambda}}l\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)=\\mathbf{0}^\\top$$\n", 389 | "可找出可能的极值点。拉格朗日条件是必要非充分条件,即满足上述方程的点$\\mathbf{x}^*$不一定是极值点。" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": {}, 395 | "source": [ 396 | "20.5 二阶条件" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "已知$f:\\mathbb{R}^n\\to\\mathbb{R}$和$\\mathbf{h}:\\mathbb{R}^n\\to\\mathbb{R}^m$是二次连续可微函数,即$f,\\mathbf{h}\\in\\mathcal{C}^2$。拉格朗日函数为\n", 404 | 
"$$l\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)=f\\left(\\mathbf{x}\\right)+\\boldsymbol{\\lambda}^\\top\\mathbf{h}\\left(\\mathbf{x}\\right)=f\\left(\\mathbf{x}\\right)+\\lambda_1h_1\\left(\\mathbf{x}\\right)+\\dots+\\lambda_mh_m\\left(\\mathbf{x}\\right)$$" 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": {}, 410 | "source": [ 411 | "记$\\mathbf{L}\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)$是$l\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)$关于$\\mathbf{x}$的黑塞矩阵\n", 412 | "$$\\mathbf{L}\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)=\\mathbf{F}\\left(\\mathbf{x}\\right)+\\lambda_1\\mathbf{H}_1\\left(\\mathbf{x}\\right)+\\dots+\\lambda_m\\mathbf{H}_m\\left(\\mathbf{x}\\right)$$\n", 413 | "其中,$\\mathbf{F}\\left(\\mathbf{x}\\right)$是$f$在$\\mathbf{x}$处的黑塞矩阵;$\\mathbf{H}_k\\left(\\mathbf{x}\\right)$是$h_k,k=1,\\dots,m$在$\\mathbf{x}$处的黑塞矩阵\n", 414 | "$$\\mathbf{H}_k\\left(\\mathbf{x}\\right)=\\begin{bmatrix} \\frac{\\partial^2h_k}{\\partial x_1^2}\\left(\\mathbf{x}\\right) & \\cdots & \\frac{\\partial^2h_k}{\\partial x_n \\partial x_1}\\left(\\mathbf{x}\\right) \\\\ \\vdots & & \\vdots\\\\ \\frac{\\partial^2h_k}{\\partial x_1 \\partial x_n}\\left(\\mathbf{x}\\right) & \\cdots & \\frac{\\partial^2h_k}{\\partial x_n^2 }\\left(\\mathbf{x}\\right) \\end{bmatrix}$$" 415 | ] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "metadata": {}, 420 | "source": [ 421 | "记\n", 422 | "$$\\left[\\boldsymbol{\\lambda}\\mathbf{H}\\left(\\mathbf{x}\\right)\\right]=\\lambda_1\\mathbf{H}_1\\left(\\mathbf{x}\\right)+\\dots+\\lambda_m\\mathbf{H}_m\\left(\\mathbf{x}\\right)$$" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "可将拉格朗日函数写为\n", 430 | "$$\\mathbf{L}\\left(\\mathbf{x},\\boldsymbol{\\lambda}\\right)=\\mathbf{F}\\left(\\mathbf{x}\\right)+\\left[\\boldsymbol{\\lambda}\\mathbf{H}\\left(\\mathbf{x}\\right)\\right]$$" 431 | ] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": 
{}, 436 | "source": [ 437 | "定理20.4 二阶必要条件 \n", 438 | "设$\\mathbf{x}^*$是$f:\\mathbb{R}^n\\to\\mathbb{R}$在约束条件$\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0},\\mathbf{h}:\\mathbb{R}^n\\to\\mathbb{R}^m,m\\leqslant n,f,\\mathbf{h}\\in\\mathcal{C}^2$下的局部极小点。如果$\\mathbf{x}^*$是正则点,那么存在$\\boldsymbol{\\lambda}^*\\in\\mathbb{R}^m$使得\n", 439 | "1. $Df\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top}D\\mathbf{h}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}^\\top$ \n", 440 | "2. 对于所有$\\mathbf{y}\\in T\\left(\\mathbf{x}^*\\right)$,都有$\\mathbf{y}^\\top\\mathbf{L}\\left(\\mathbf{x}^*,\\boldsymbol{\\lambda}^*\\right)\\mathbf{y}\\geqslant0$" 441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": {}, 446 | "source": [ 447 | "定理20.5 二阶充分条件 \n", 448 | "函数$f,\\mathbf{h}\\in\\mathcal{C}^2$,如果存在点$\\mathbf{x}^*\\in\\mathbb{R}^n$和$\\boldsymbol{\\lambda}^*\\in\\mathbb{R}^m$使得\n", 449 | "1. $Df\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top}D\\mathbf{h}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}^\\top$ \n", 450 | "2. 
对于所有$\\mathbf{y}\\in T\\left(\\mathbf{x}^*\\right),\\mathbf{y}\\neq\\mathbf{0}$,都有$\\mathbf{y}^\\top\\mathbf{L}\\left(\\mathbf{x}^*,\\boldsymbol{\\lambda}^*\\right)\\mathbf{y}>0$ \n", 451 | "\n", 452 | "那么,$\\mathbf{x}^*$是$f$在约束条件$\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0}$下的严格局部极小点。" 453 | ] 454 | }, 455 | { 456 | "cell_type": "markdown", 457 | "metadata": {}, 458 | "source": [ 459 | "20.6 线性约束下二次型函数的极小化" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "$$\\min\\frac{1}{2}\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x} \\\\\n", 467 | "s.t.\\mathbf{A}\\mathbf{x}=\\mathbf{b}$$\n", 468 | "其中,$\\mathbf{Q}>\\mathbf{0},\\mathbf{A}\\in\\mathbb{R}^{m\\times n},n>m$ … $\\mathbf{Q}>0$,且$\\mathbf{A}$的秩为$m$,可得\n", 508 | "$$\\boldsymbol{\\lambda}^*=\\left(\\mathbf{A}\\mathbf{Q}^{-1}\\mathbf{A}^\\top\\right)^{-1}\\mathbf{b}$$" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "由此可得\n", 516 | "$$\\mathbf{x}^*=\\mathbf{Q}^{-1}\\mathbf{A}^\\top\\left(\\mathbf{A}\\mathbf{Q}^{-1}\\mathbf{A}^\\top\\right)^{-1}\\mathbf{b}$$" 517 | ] 518 | }, 519 | { 520 | "cell_type": "markdown", 521 | "metadata": {}, 522 | "source": [ 523 | "由于拉格朗日函数在$\\left(\\mathbf{x}^*,\\boldsymbol{\\lambda}^*\\right)$处的黑塞矩阵\n", 524 | "$$\\mathbf{L}\\left(\\mathbf{x}^*,\\boldsymbol{\\lambda}^*\\right)=\\mathbf{Q}$$\n", 525 | "黑塞矩阵正定,因此$\\mathbf{x}^*$是严格局部极小点。" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": null, 531 | "metadata": { 532 | "collapsed": true 533 | }, 534 | "outputs": [], 535 | "source": [] 536 | } 537 | ], 538 | "metadata": { 539 | "kernelspec": { 540 | "display_name": "Python 2", 541 | "language": "python", 542 | "name": "python2" 543 | }, 544 | "language_info": { 545 | "codemirror_mode": { 546 | "name": "ipython", 547 | "version": 2 548 | }, 549 | "file_extension": ".py", 550 | "mimetype": "text/x-python", 551 | "name": 
"python", 552 | "nbconvert_exporter": "python", 553 | "pygments_lexer": "ipython2", 554 | "version": "2.7.13" 555 | } 556 | }, 557 | "nbformat": 4, 558 | "nbformat_minor": 2 559 | } 560 | -------------------------------------------------------------------------------- /第21章 含不等式约束的优化问题.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "第21章 含不等式约束的优化问题" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "21.1 卡罗需-库恩-塔克条件(KKT条件)" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "向量表示的一般形式的优化问题\n", 22 | "$$\\min f\\left(\\mathbf{x}\\right) \\\\\n", 23 | "s.t. \\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0} \\\\\n", 24 | "\\quad \\mathbf{g}\\left(\\mathbf{x}\\right)\\leqslant \\mathbf{0}$$\n", 25 | "其中,$f:\\mathbb{R}\\to\\mathbb{R},\\mathbf{h}:\\mathbb{R}^n\\to\\mathbb{R}^m,\\mathbf{g}:\\mathbb{R}^n\\to\\mathbb{R}^p$。" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "定义21.1 对于一个不等式约束$g_j(\\left(\\mathbf{x}\\right)\\leqslant0$,如果在$\\mathbf{x}^*$处$g_j(\\left(\\mathbf{x}\\right)=0$,那么称该不等式是$\\mathbf{x}^*$处的起作用的约束;如果在$\\mathbf{x}^*$处$g_j(\\left(\\mathbf{x}\\right)<0$,那么称该不等式是$\\mathbf{x}^*$处的不起作用的约束。" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "定义21.2 设$\\mathbf{x}^*$满足$\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0},\\mathbf{g}\\left(\\mathbf{x}\\right)\\leqslant\\mathbf{0}$,设$J\\left(\\mathbf{x}^*\\right)$为起作用不等式约束下标集\n", 40 | "$$J\\left(\\mathbf{x}^*\\right)\\triangleq\\{j:g_j\\left(\\mathbf{x}^*\\right)=0\\}$$" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "如果向量\n", 48 | "$$\\nabla h_i\\left(\\mathbf{x}^*\\right),\\nabla g_j\\left(\\mathbf{x}^*\\right),1\\leqslant i\\leqslant m,j\\in 
J\\left(\\mathbf{x}^*\\right)$$\n", 49 | "是线性无关的,则称$\\mathbf{x}^*$是一个正则点。" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "定理21.1 KKT条件 \n", 57 | "设$f,\\mathbf{h},\\mathbf{g}\\in\\mathcal{C}^1$,$\\mathbf{x}^*$是一个正则点和局部极小点,则必然存在$\\boldsymbol{\\lambda}^*\\in\\mathbb{R}^m,\\boldsymbol{\\mu}^*\\in\\mathbb{R}^p$,使得以下条件成立:\n", 58 | "1. $\\boldsymbol{\\mu}^*\\geqslant\\mathbf{0}$;\n", 59 | "2. $Df\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top}D\\mathbf{h}\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\mu}^{*\\top}D\\mathbf{g}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}^\\top$;\n", 60 | "3. $\\boldsymbol{\\mu}^{*\\top}\\mathbf{g}\\left(\\mathbf{x}^*\\right)=0$。\n", 61 | "\n", 62 | "其中,$\\boldsymbol{\\lambda}^*$为拉格朗日乘子向量,$\\boldsymbol{\\mu}^*$为KKT乘子向量,向量中的元素分别称为拉格朗日乘子和KKT乘子。" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "由定理可知 $\\boldsymbol{\\mu}^*\\geqslant\\mathbf{0}$和$g_j\\left(\\mathbf{x}^*\\right)\\leqslant0$,则条件\n", 70 | "$$\\boldsymbol{\\mu}^{*\\top}\\mathbf{g}\\left(\\mathbf{x}^*\\right)=\\mu_1^*g_1\\left(\\mathbf{x}^*\\right)+\\dots+\\mu_p^*g_p\\left(\\mathbf{x}^*\\right)=0$$\n", 71 | "意味着:如果$g_j\\left(\\mathbf{x}^*\\right)<0$,则$\\mu_j^*=0$。也就是对于所有的$j\\notin J\\left(\\mathbf{x}^*\\right),\\mu_j^*=0$恒成立,即不起作用约束对应的KKT乘子$\\mu_j^*$等于0;其他KKT乘子$\\mu_i^*,i\\in J\\left(\\mathbf{x}^*\\right)$是非负的,可以等于0或者不等于0。" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "KKT条件是极小点的必要条件,因此,应按照必要条件的使用方式利用KKT条件搜索满足条件的点,并把这些点作为极小点的候选对象。KKT条件由以下5部分组成:\n", 79 | "1. $\\boldsymbol{\\mu}^*\\geqslant\\mathbf{0}$;\n", 80 | "2. $Df\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top}D\\mathbf{h}\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\mu}^{*\\top}D\\mathbf{g}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}^\\top$;\n", 81 | "3. $\\boldsymbol{\\mu}^{*\\top}\\mathbf{g}\\left(\\mathbf{x}^*\\right)=0$;\n", 82 | "4. 
$\\mathbf{h}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}$;\n", 83 | "5. $\\mathbf{g}\\left(\\mathbf{x}^*\\right)\\leqslant\\mathbf{0}$。" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "21.2 二阶条件" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "定义矩阵\n", 98 | "$$\\mathbf{L}\\left(\\mathbf{x},\\boldsymbol{\\lambda},\\boldsymbol{\\mu}\\right)=\\mathbf{F}\\left(\\mathbf{x}\\right)+\\left[\\boldsymbol{\\lambda}\\mathbf{H}\\left(\\mathbf{x}\\right)\\right]+\\left[\\boldsymbol{\\mu}\\mathbf{G}\\left(\\mathbf{x}\\right)\\right]$$\n", 99 | "其中,$\\mathbf{F}\\left(\\mathbf{x}\\right)$是$f$在点$\\mathbf{x}$处的黑塞矩阵。" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "$\\left[\\boldsymbol{\\lambda}\\mathbf{H}\\left(\\mathbf{x}\\right)\\right]$表示\n", 107 | "$$\\left[\\boldsymbol{\\lambda}\\mathbf{H}\\left(\\mathbf{x}\\right)\\right]=\\lambda_1\\mathbf{H}_1\\left(\\mathbf{x}\\right)+\\dots+\\lambda_m\\mathbf{H}_m\\left(\\mathbf{x}\\right)$$\n", 108 | "$\\left[\\boldsymbol{\\mu}\\mathbf{G}\\left(\\mathbf{x}\\right)\\right]$表示\n", 109 | "$$\\left[\\boldsymbol{\\mu}\\mathbf{G}\\left(\\mathbf{x}\\right)\\right]=\\mu_1\\mathbf{G}_1\\left(\\mathbf{x}\\right)+\\dots+\\mu_p\\mathbf{G}_p\\left(\\mathbf{x}\\right)$$" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "$\\mathbf{G}_k\\left(\\mathbf{x}\\right)$是$g_k$在$\\mathbf{x}$处的黑塞矩阵\n", 117 | "$$\\mathbf{G}_k\\left(\\mathbf{x}\\right)=\\begin{bmatrix} \\frac{\\partial^2g_k}{\\partial x_1^2}\\left(\\mathbf{x}\\right) & \\cdots & \\frac{\\partial^2g_k}{\\partial x_n \\partial x_1}\\left(\\mathbf{x}\\right) \\\\ \\vdots & & \\vdots\\\\ \\frac{\\partial^2g_k}{\\partial x_1 \\partial x_n}\\left(\\mathbf{x}\\right) & \\cdots & \\frac{\\partial^2g_k}{\\partial x_n^2 }\\left(\\mathbf{x}\\right) \\end{bmatrix}$$" 118 | ] 119 | }, 120 | { 121 | "cell_type": 
"markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "记\n", 125 | "$$T\\left(\\mathbf{x}^*\\right)=\\{\\mathbf{y}\\in\\mathbb{R}^n,D\\mathbf{h}\\left(\\mathbf{x}^*\\right)\\mathbf{y}=\\mathbf{0},Dg_j\\left(\\mathbf{x}^*\\right)\\mathbf{y}=0,j\\in J\\left(\\mathbf{x}\\right)\\}$$\n", 126 | "代表由起作用约束所定义曲面的切线空间。" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "定理21.2 二阶必要条件 \n", 134 | "设$f:\\mathbb{R}^n\\to\\mathbb{R}^m,\\mathbf{h}:\\mathbb{R}^n\\to\\mathbb{R}^m\\left(m\\leqslant n\\right),\\mathbf{g}:\\mathbb{R}^n\\to\\mathbb{R}^p, f,\\mathbf{h},\\mathbf{g}\\in\\mathcal{C}^2$,$\\mathbf{x}^*$是一个正则点和局部极小点,则必然存在$\\boldsymbol{\\lambda}^*\\in\\mathbf{R}^m,\\boldsymbol{\\mu}^*\\in\\mathbf{R}^p$,使得以下条件成立:\n", 135 | "1. $\\boldsymbol{\\mu}^*\\geqslant\\mathbf{0}, Df\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top}D\\mathbf{h}\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\mu}^{*\\top}D\\mathbf{g}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}^\\top,\\boldsymbol{\\mu}^{*\\top}\\mathbf{g}\\left(\\mathbf{x}^*\\right)=0$;\n", 136 | "2. 
对于所有$\\mathbf{y}\\in T\\left(\\mathbf{x}^*\\right)$,都有$\\mathbf{y}^\\top\\mathbf{L}\\left(\\mathbf{x}^*,\\boldsymbol{\\lambda}^*,\\boldsymbol{\\mu}^*\\right)\\mathbf{y}\\geqslant0$。" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "记 \n", 144 | "$$\\tilde{T}\\left(\\mathbf{x}^*,\\boldsymbol{\\mu}^*\\right)=\\{\\mathbf{y}\\in\\mathbb{R}^n,D\\mathbf{h}\\left(\\mathbf{x}^*\\right)\\mathbf{y}=\\mathbf{0},Dg_i\\left(\\mathbf{x}^*\\right)\\mathbf{y}=0,i\\in \\tilde{J}\\left(\\mathbf{x}^*,\\boldsymbol{\\mu}^*\\right)\\}$$\n", 145 | "其中,$\\tilde{J}\\left(\\mathbf{x}^*,\\boldsymbol{\\mu}^*\\right)=\\{i:g_i\\left(\\mathbf{x}^*\\right)=0,\\mu_i^*>0\\}$" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "定理21.3 二阶充分条件 \n", 153 | "设$f,\\mathbf{h},\\mathbf{g}\\in\\mathcal{C}^2$,$\\mathbf{x}^*\\in\\mathbb{R}^n$是一个可行点,存在向量$\\boldsymbol{\\lambda}^*\\in\\mathbb{R}^m,\\boldsymbol{\\mu}^*\\in\\mathbb{R}^p$使得\n", 154 | "1. $\\boldsymbol{\\mu}^*\\geqslant\\mathbf{0}, Df\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top}D\\mathbf{h}\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\mu}^{*\\top}D\\mathbf{g}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}^\\top,\\boldsymbol{\\mu}^{*\\top}\\mathbf{g}\\left(\\mathbf{x}^*\\right)=0$;\n", 155 | "2. 
对于所有$\\mathbf{y}\\in \\tilde{T}\\left(\\mathbf{x}^*,\\boldsymbol{\\mu}^*\\right),\\mathbf{y}\\neq\\mathbf{0}$,都有$\\mathbf{y}^\\top\\mathbf{L}\\left(\\mathbf{x}^*,\\boldsymbol{\\lambda}^*,\\boldsymbol{\\mu}^*\\right)\\mathbf{y}>0$。\n", 156 | "\n", 157 | "则$\\mathbf{x}^*$是优化问题$\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0},\\mathbf{g}\\left(\\mathbf{x}\\right)\\leqslant \\mathbf{0}$的严格局部极小点。" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": { 164 | "collapsed": true 165 | }, 166 | "outputs": [], 167 | "source": [] 168 | } 169 | ], 170 | "metadata": { 171 | "kernelspec": { 172 | "display_name": "Python 2", 173 | "language": "python", 174 | "name": "python2" 175 | }, 176 | "language_info": { 177 | "codemirror_mode": { 178 | "name": "ipython", 179 | "version": 2 180 | }, 181 | "file_extension": ".py", 182 | "mimetype": "text/x-python", 183 | "name": "python", 184 | "nbconvert_exporter": "python", 185 | "pygments_lexer": "ipython2", 186 | "version": "2.7.13" 187 | } 188 | }, 189 | "nbformat": 4, 190 | "nbformat_minor": 2 191 | } 192 | -------------------------------------------------------------------------------- /第22章 凸优化问题.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "第22章 凸优化问题" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "22.1 引言" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "把优化问题的可行域限定为凸集。" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "22.2 凸函数" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "定义22.1 函数$f:\\Omega\\to\\mathbb{R},\\Omega\\subset\\mathbb{R}^n$的图像为集合$\\Omega\\times\\mathbb{R}\\subset\\mathbb{R}^{n+1}$中的点集:\n", 36 | "$$\\left\\{\\begin{bmatrix} \\mathbf{x} \\\\ 
f\\left(\\mathbf{x}\\right) \\end{bmatrix}:\\mathbf{x}\\in\\Omega\\right\\}$$ " 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "函数$f$的图像可以描绘成$f\\left(\\mathbf{x}\\right)$关于$\\mathbf{x}$的图形上的点集。" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "定义22.2 函数$f:\\Omega\\to\\mathbb{R},\\Omega\\subset\\mathbb{R}^n$的上图(epigraph),记为$epi\\left(f\\right)$,是集合$\\Omega\\times\\mathbb{R}$中的点集:\n", 51 | "$$epi\\left(f\\right)=\\left\\{\\begin{bmatrix} \\mathbf{x} \\\\ \\beta \\end{bmatrix}:\\mathbf{x}\\in\\Omega,\\beta\\in\\mathbb{R},\\beta\\geqslant f\\left(\\mathbf{x}\\right)\\right\\}$$" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "函数$f$的上图$epi\\left(f\\right)$是位于集合$\\Omega\\times\\mathbb{R}$中、在函数$f$的图像上和图像上方的点集。" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "定义22.3 如果函数$f:\\Omega\\to\\mathbb{R},\\Omega\\subset\\mathbb{R}^n$的上图是凸集,则函数$f$是集合$\\Omega$上的凸函数。" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "定理22.1 如果函数$f:\\Omega\\to\\mathbb{R},\\Omega\\subset\\mathbb{R}^n$是集合$\\Omega$上的凸函数,则集合$\\Omega$是凸集。" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "定理22.2 对于定义在凸集$\\Omega\\subset\\mathbb{R}^n$上的函数$f:\\Omega\\to\\mathbb{R}$,$f$是凸函数当且仅当对于任意$\\mathbf{x},\\mathbf{y}\\in\\Omega$和任意$\\alpha\\in\\left(0,1\\right)$,都有\n", 80 | "$$f\\left(\\alpha\\mathbf{x}+\\left(1-\\alpha\\right)\\mathbf{y}\\right)\\leqslant\\alpha f\\left(\\mathbf{x}\\right)+\\left(1-\\alpha\\right)f\\left(\\mathbf{y}\\right)$$" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | 
"如果函数$f:\\Omega\\to\\mathbb{R}$是定义在凸集$\\Omega$上的凸函数,那么对于任意$\\mathbf{x},\\mathbf{y}\\in\\Omega$,在空间$\\mathbb{R}^{n+1}$中连接两点$\\left[\\mathbf{x}^\\top,f\\left(\\mathbf{x}\\right)\\right]^\\top$和$\\left[\\mathbf{y}^\\top,f\\left(\\mathbf{y}\\right)\\right]^\\top$之间线段上的所有点,都位于函数$f$的图像或上图。" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "定理22.3 设函数$f,f_1,f_2$都是凸函数,则对于$\\forall a\\geqslant0$,函数$af$也是凸函数;$f_1+f_2$也是凸函数。" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "给定一组凸函数$f_1,\\dots,f_\\ell$和一组非负实数$c_1,\\dots,c_\\ell$,函数$c_1 f_1+\\dots+c_\\ell f_\\ell$也是凸函数;函数$\\max\\{f_1,\\dots,f_\\ell\\}$也是凸函数。" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "定义22.4 对于定义在凸集$\\Omega\\subset\\mathbb{R}^n$上的函数$f:\\Omega\\to\\mathbb{R}$,如果对于任意$\\mathbf{x},\\mathbf{y}\\in\\Omega,\\mathbf{x}\\neq\\mathbf{y}$和任意$\\alpha\\in\\left(0,1\\right)$,都有\n", 109 | "$$f\\left(\\alpha\\mathbf{x}+\\left(1-\\alpha\\right)\\mathbf{y}\\right)<\\alpha f\\left(\\mathbf{x}\\right)+\\left(1-\\alpha\\right)f\\left(\\mathbf{y}\\right)$$\n", 110 | "则函数$f$是$\\Omega$上的严格凸函数。" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "对于严格凸函数,连接两点$\\left[\\mathbf{x}^\\top,f\\left(\\mathbf{x}\\right)\\right]^\\top$和$\\left[\\mathbf{y}^\\top,f\\left(\\mathbf{y}\\right)\\right]^\\top$之间线段上的所有点(不包括两个断点),都严格位于函数$f$的上图。" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "定义22.5 对于定义在凸集$\\Omega\\subset\\mathbb{R}^n$上的函数$f:\\Omega\\to\\mathbb{R}$,当$-f$是(严格)凸函数时,$f$是(严格)凹函数。" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "命题22.1 如果函数$f:\\Omega\\to\\mathbb{R},\\Omega\\subset\\mathbb{R}^n$是二次型函数$f\\left(\\mathbf{x}\\right)=\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x},\\mathbf{Q}\\in\\mathbb{R}^{n\\times 
n},\\mathbf{Q}=\\mathbf{Q}^\\top$,那么$f$是$\\Omega$上的凸函数,当且仅当对所有$\\mathbf{x},\\mathbf{y}\\in\\Omega$,恒有$\\left(\\mathbf{x}-\\mathbf{y}\\right)^\\top\\mathbf{Q}\\left(\\mathbf{x}-\\mathbf{y}\\right)\\geqslant0$成立。" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "定理22.4 设$f:\\Omega\\to\\mathbb{R},f\\in\\mathcal{C}^1$是定义在开凸集$\\Omega\\subset\\mathbb{R}^n$上的可微函数,那么$f$是$\\Omega$上的凸函数,当且仅当对于任意$\\mathbf{x},\\mathbf{y}\\in\\Omega$,有\n", 139 | "$$f\\left(\\mathbf{y}\\right)\\geqslant f\\left(\\mathbf{x}\\right)+Df\\left(\\mathbf{x}\\right)\\left(\\mathbf{y}-\\mathbf{x}\\right)$$" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "令$\\mathbf{x}_0\\in\\Omega$,函数$\\ell\\left(\\mathbf{x}\\right)=f\\left(\\mathbf{x}_0\\right)+Df\\left(\\mathbf{x}_0\\right)\\left(\\mathbf{x}-\\mathbf{x}_0\\right)$是函数$f$在点$\\mathbf{x}_0$处的线性近似。 \n", 147 | "函数$f$的图像总是位于线性近似函数的上方,即在定义域内任意一点处,凸函数$f$的线性近似总是位于其上图$epi\\left(f\\right)$的下方。" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "函数$f:\\Omega\\to\\mathbb{R}$定义在开凸集$\\Omega\\subset\\mathbb{R}^n$上,如果对于所有$\\mathbf{y}\\in\\Omega$,都有\n", 155 | "$$f\\left(\\mathbf{y}\\right)\\geqslant f\\left(\\mathbf{x}\\right)+\\mathbf{g}^\\top\\left(\\mathbf{y}-\\mathbf{x}\\right)$$\n", 156 | "则称向量$\\mathbf{g}\\in\\mathbb{R}^n$为函数$f$在点$\\mathbf{x}\\in\\Omega$处的次梯度。" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "如果$\\mathbf{g}$是函数$f$在点$\\mathbf{x}_0\\in\\Omega$处的次梯度,那么函数$\\ell\\left(\\mathbf{x}\\right)=f\\left(\\mathbf{x}_0\\right)+\\mathbf{g}^\\top\\left(\\mathbf{x}-\\mathbf{x}_0\\right)$位于上图$epi\\left(f\\right)$的下方。" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "定理22.5 
函数$f:\\Omega\\to\\mathbb{R},f\\in\\mathcal{C}^2$是定义在开凸集$\\Omega\\subset\\mathbb{R}^n$上的凸函数,当且仅当对于任意$\\mathbf{x}\\in\\Omega$,函数$f$在点$\\mathbf{x}$处的黑塞矩阵$\\mathbf{F}\\left(\\mathbf{x}\\right)$半正定。" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "collapsed": true 177 | }, 178 | "source": [ 179 | "22.3 凸优化问题" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "凸优化问题(凸规划)是目标函数为凸函数、约束集为凸集的优化问题。" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "定理22.6 已知$f:\\Omega\\to\\mathbb{R}$是定义在凸集$\\Omega\\subset\\mathbb{R}^n$上的凸函数,集合$\\Omega$中某一点是$f$的全局极小点,当且仅当它是$f$的局部极小点。" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "引理22.2 函数$f:\\Omega\\to\\mathbb{R}$是定义在凸集$\\Omega\\subset\\mathbb{R}^n$上的凸函数,$f\\in\\mathcal{C}^1$定义在包含$\\Omega$的开集上。选定点$\\mathbf{x}^*\\in\\Omega$,如果对于任意$\\mathbf{x}\\in\\Omega,\\mathbf{x}\\neq\\mathbf{x}^*$,都有\n", 201 | "$$Df\\left(\\mathbf{x}^*\\right)\\left(\\mathbf{x}-\\mathbf{x}^*\\right)\\geqslant0$$\n", 202 | "则$\\mathbf{x}^*$是$f$在$\\Omega$上的全局极小点。" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "定理22.7 函数$f:\\Omega\\to\\mathbb{R}$是定义在凸集$\\Omega\\subset\\mathbb{R}^n$上的凸函数,$f\\in\\mathcal{C}^1$定义在包含$\\Omega$的开集上。选定点$\\mathbf{x}^*\\in\\Omega$,如果对于点$\\mathbf{x}^*$处的任意可行方向$\\mathbf{d}$,都有\n", 210 | "$$\\mathbf{d}^\\top\\nabla f\\left(\\mathbf{x}^*\\right)\\geqslant0$$\n", 211 | "则$\\mathbf{x}^*$是$f$在$\\Omega$上的全局极小点。" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "推论22.2 函数$f:\\Omega\\to\\mathbb{R}$是定义在凸集$\\Omega\\subset\\mathbb{R}^n$上的凸函数,$f\\in\\mathcal{C}^1$定义在包含$\\Omega$的开集上。如果存在点$\\mathbf{x}^*\\in\\Omega$,使得\n", 219 | "$$\\nabla f\\left(\\mathbf{x}^*\\right)=\\mathbf{0}$$\n", 220 | "则$\\mathbf{x}^*$是$f$在$\\Omega$上的全局极小点。" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | 
"metadata": {}, 226 | "source": [ 227 | "定理22.8 函数$f:\\mathbf{R}^n\\to\\mathbf{R},f\\in\\mathcal{C}^1$是可行域\n", 228 | "$$\\Omega=\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0}\\}$$\n", 229 | "上的凸函数。$\\mathbf{h}:\\mathbb{R}^n\\to\\mathbb{R}^m,\\mathbf{h}\\in\\mathcal{C}^1$,且$\\Omega$是凸集。假设存在点$\\mathbf{x}^*\\in\\Omega$和$\\boldsymbol{\\lambda}^*\\in\\mathbb{R}^m$,使得\n", 230 | "$$Df\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top}D\\mathbf{h}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}^\\top$$\n", 231 | "则$\\mathbf{x}^*$是在$\\Omega$上的全局极小点。" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "定理22.9 函数$f:\\mathbf{R}^n\\to\\mathbf{R},f\\in\\mathcal{C}^1$是可行域\n", 239 | "$$\\Omega=\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0},\\mathbf{g}\\left(\\mathbf{x}\\right)\\leqslant\\mathbf{0}\\}$$\n", 240 | "上的凸函数。$\\mathbf{h}:\\mathbb{R}^n\\to\\mathbb{R}^m,\\mathbf{g}:\\mathbb{R}^n\\to\\mathbb{R}^p,\\mathbf{h},\\mathbf{g}\\in\\mathcal{C}^1$,且$\\Omega$是凸集。假设存在点$\\mathbf{x}^*\\in\\Omega$和$\\boldsymbol{\\lambda}^*\\in\\mathbb{R}^m,\\boldsymbol{\\mu}^*\\in\\mathbb{R}^p$,使得\n", 241 | "1. $\\boldsymbol{\\mu}\\geqslant\\mathbf{0}$\n", 242 | "2. $Df\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\lambda}^{*\\top}D\\mathbf{h}\\left(\\mathbf{x}^*\\right)+\\boldsymbol{\\mu}^{*\\top}D\\mathbf{g}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}^\\top$\n", 243 | "3. 
$\\boldsymbol{\\mu}^{*\\top}\\mathbf{g}\\left(\\mathbf{x}^*\\right)=\\mathbf{0}$\n", 244 | "\n", 245 | "则$\\mathbf{x}^*$是在$\\Omega$上的全局极小点。" 246 | ] 247 | } 248 | ], 249 | "metadata": { 250 | "kernelspec": { 251 | "display_name": "Python 2", 252 | "language": "python", 253 | "name": "python2" 254 | }, 255 | "language_info": { 256 | "codemirror_mode": { 257 | "name": "ipython", 258 | "version": 2 259 | }, 260 | "file_extension": ".py", 261 | "mimetype": "text/x-python", 262 | "name": "python", 263 | "nbconvert_exporter": "python", 264 | "pygments_lexer": "ipython2", 265 | "version": "2.7.13" 266 | } 267 | }, 268 | "nbformat": 4, 269 | "nbformat_minor": 2 270 | } 271 | -------------------------------------------------------------------------------- /第23章 有约束优化问题的求解算法.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "第23章 有约束优化问题的求解算法" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "23.1 引言" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "特殊约束条件下的优化问题的求解算法。" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "23.2 投影法" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "优化问题的通用迭代公式\n", 36 | "$$\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}+\\alpha_k\\mathbf{d}^{\\left(k\\right)}$$\n", 37 | "其中,$\\mathbf{d}^{\\left(k\\right)}$是关于$\\nabla f\\left(\\mathbf{x}^{\\left(k\\right)}\\right)$的函数。" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "如果$\\mathbf{x}^{\\left(k\\right)}+\\alpha_k\\mathbf{d}^{\\left(k\\right)}$在约束集$\\Omega$内,则令$\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}+\\alpha_k\\mathbf{d}^{\\left(k\\right)}$;否则将其投影到$\\Omega$中,并将投影结果作为$\\mathbf{x}^{\\left(k+1\\right)}$。" 45 
| ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "在特殊约束集\n", 52 | "$$\\Omega=\\{\\mathbf{x}:l_i\\leqslant x_i\\leqslant u_i,i=1,\\dots,n\\}$$\n", 53 | "下,约束集$\\Omega$是$\\mathbb{R}^n$中的一个“方框”,称为框式约束。" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "对于点$\\mathbf{x}\\in\\mathbb{R}^n$,定义$\\mathbf{y}=\\Pi\\left[\\mathbf{x}\\right]\\in\\mathbb{R}^n$\n", 61 | "$$y_i=\\min\\{u_i,\\max\\{l_i,x_i\\}\\}=\\left\\{\n", 62 | "\\begin{aligned}\n", 63 | "u_i,\\quad x_i> u_i \\\\\n", 64 | "x_i,l_i\\leqslant x_i\\leqslant u_i \\\\\n", 65 | "l_i,\\quad x_i0\n", 342 | "\\end{aligned}\n", 343 | "\\right.$$\n", 344 | "称为绝对值罚函数,其等于$\\sum|g_i\\left(\\mathbf{x}\\right)|$,是对$\\mathbf{x}$所有无法满足的约束条件进行求和。" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "库朗-贝尔特拉米罚函数能够确保可微\n", 352 | "$$\\mathbf{P}\\left(\\mathbf{x}\\right)=\\sum_{i=1}^{p}\\left(g_i^+\\left(\\mathbf{x}\\right)\\right)^2$$" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "惩罚因子$\\gamma$越大,违反约束点受罚越重,近似解就与真正解越接近。当惩罚因子$\\gamma\\to\\infty$时,罚函数法得到的就是有约束问题的真正解。" 360 | ] 361 | } 362 | ], 363 | "metadata": { 364 | "kernelspec": { 365 | "display_name": "Python 2", 366 | "language": "python", 367 | "name": "python2" 368 | }, 369 | "language_info": { 370 | "codemirror_mode": { 371 | "name": "ipython", 372 | "version": 2 373 | }, 374 | "file_extension": ".py", 375 | "mimetype": "text/x-python", 376 | "name": "python", 377 | "nbconvert_exporter": "python", 378 | "pygments_lexer": "ipython2", 379 | "version": "2.7.13" 380 | } 381 | }, 382 | "nbformat": 4, 383 | "nbformat_minor": 2 384 | } 385 | -------------------------------------------------------------------------------- /第2章 向量空间与矩阵.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 
6 | "source": [ 7 | "第2章 向量空间与矩阵" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "2.1 向量与矩阵" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "$n$维列向量:含有$n$个数的数组,记为 \n", 22 | "$$\\mathbf{a}=\\begin{bmatrix} a_1 \\\\ a_2 \\\\ \\vdots \\\\ a_n \\end{bmatrix}$$ \n", 23 | "$a_i$表示向量$a$的第$i$个元素。" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "定义$\\mathbb{R}$为全体实数组成的集合,则由实数组成的$n$维列向量可表示为$\\mathbb{R}^n$,称为$n$维实数向量空间。" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "$n$维行向量记为 \n", 38 | "$$\\mathbf{a}=\\left[a_1, a_2, \\dots, a_n \\right]$$" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "向量$\\mathbf{a}$的转置记为$\\mathbf{a}^\\top$。如果\n", 46 | "$$\\mathbf{a}=\\begin{bmatrix} a_1 \\\\ a_2 \\\\ \\vdots \\\\ a_n \\end{bmatrix}$$ \n", 47 | "则\n", 48 | "$$\\mathbf{a}^\\top=\\left[a_1, a_2, \\dots, a_n \\right]$$ \n", 49 | "相应的,可记为 \n", 50 | "$$\\mathbf{a}=\\left[a_1, a_2, \\dots, a_n \\right]^\\top$$ " 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "给定向量$\\mathbf{a}=\\left[a_1, a_2, \\dots, a_n \\right]^\\top$和向量$\\mathbf{b}=\\left[b_1, b_2, \\dots, b_n \\right]^\\top$,如果$a_i=b_i,i=1,2,\\dots,n$,则两个向量相等。" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "向量加法运算:\n", 65 | "$$\\mathbf{a}+\\mathbf{b}=\\left[a_1+b_1, a_2+b_2, \\dots, a_n+b_n \\right]^\\top$$" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "向量加法运算的性质:\n", 73 | "1. 交换性 $\\mathbf{a}+\\mathbf{b}=\\mathbf{b}+\\mathbf{a}$\n", 74 | "2. 结合性 $\\left(\\mathbf{a}+\\mathbf{b}\\right)+\\mathbf{c}=\\mathbf{a}+\\left(\\mathbf{b}+\\mathbf{c}\\right)$\n", 75 | "3. 
存在零向量 \n", 76 | "$$\\mathbf{0}=\\left[0,0,\\dots,0\\right]^\\top$$ \n", 77 | "使得\n", 78 | "$$\\mathbf{a}+\\mathbf{0}=\\mathbf{0}+\\mathbf{a}=\\mathbf{a}$$" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "向量减法运算:\n", 86 | "$$\\mathbf{a}-\\mathbf{b}=\\left[a_1-b_1, a_2-b_2, \\dots, a_n-b_n \\right]^\\top$$\n", 87 | "$$\\mathbf{0}-\\mathbf{b}=-\\mathbf{b}$$" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "向量减法运算性质:\n", 95 | "$$\\mathbf{b}+\\left(\\mathbf{a}-\\mathbf{b}\\right)=\\mathbf{a} \\\\\n", 96 | "-\\left(-\\mathbf{b}\\right)=\\mathbf{b} \\\\\n", 97 | "-\\left(\\mathbf{a}-\\mathbf{b}\\right)=\\mathbf{b}-\\mathbf{a}$$" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "设$\\mathbf{x}=\\left[x_1,x_2,\\dots,x_n\\right]^\\top$是$\\mathbf{a}+\\mathbf{x}=\\mathbf{b}$的解,有 \n", 105 | "$$a_1+x_1=b_1 \\\\a_2+x_2=b_2 \\\\ \\vdots \\\\ a_n+x_n=b_n$$ \n", 106 | "则$$\\mathbf{x}=\\mathbf{b}-\\mathbf{a}$$\n", 107 | "即向量$\\mathbf{b}-\\mathbf{a}$是向量方程$\\mathbf{a}+\\mathbf{x}=\\mathbf{b}$的唯一解。" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "标量向量乘法运算:\n", 115 | "$$\\alpha\\mathbf{a}=\\left[\\alpha a_1, \\alpha a_2, \\dots, \\alpha a_n \\right]^\\top, \\quad \\alpha \\in \\mathbb{R}, \\mathbf{a} \\in \\mathbb{R}^n$$" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "标量向量乘法运算性质:\n", 123 | "1. 分配律 $$\\alpha\\left(\\mathbf{a}+\\mathbf{b}\\right)=\\alpha\\mathbf{a}+\\alpha\\mathbf{b} \\\\ \n", 124 | "\\left(\\alpha+\\beta\\right)\\mathbf{a}=\\alpha\\mathbf{a}+\\beta\\mathbf{a}$$ \n", 125 | "2. 结合性 $\\alpha\\left(\\beta\\mathbf{a}\\right)=\\left(\\alpha\\beta\\right)\\mathbf{a}$\n", 126 | "3. 标量1满足 $1\\mathbf{a}=\\mathbf{a}$\n", 127 | "4. 任意标量$\\alpha$满足 $\\alpha\\mathbf{0}=\\mathbf{0}$\n", 128 | "5. 标量0满足 $0\\mathbf{a}=\\mathbf{0}$\n", 129 | "6. 
标量-1满足 $\\left(-1\\right)\\mathbf{a}=-\\mathbf{a}$" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "如果方程$$\\alpha_1\\mathbf{a}_1+\\alpha_2\\mathbf{a}_2+\\dots+\\alpha_k\\mathbf{a}_k=\\mathbf{0}$$成立意味着所有的系数$\\alpha_i\\left(i=1,2,\\dots,k\\right)$都等于零,则称向量集合$\\{\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\}$是线性无关的,否则称为线性相关的." 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "如果向量集合中只包含一个$\\mathbf{0}$向量元素,由于对于任意$\\alpha\\neq0$,都有$\\alpha\\mathbf{0}=\\mathbf{0}$,因此该集合是线性相关的.所有包含$\\mathbf{0}$向量的集合都是线性相关的." 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "如果集合中只包括单个非零向量$\\mathbf{a}\\neq\\mathbf{0}$,只有$\\alpha=0$时,才有$\\alpha\\mathbf{a}=\\mathbf{0}$成立,因此该集合是线性无关的." 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "给定向量$\\mathbf{a}$,如果存在标量$\\alpha_1,\\alpha_2,\\dots,\\alpha_k$,使得\n", 158 | "$$\\mathbf{a}=\\alpha_1\\mathbf{a}_1+\\alpha_2\\mathbf{a}_2+\\dots+\\alpha_k\\mathbf{a}_k$$\n", 159 | "则称向量$\\mathbf{a}$为$\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k$的线性组合." 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "命题2.1 向量集合$\\{\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\}$是线性相关的,当且仅当集合中的一个向量可以表示为其他向量的线性组合." 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "令$\\mathcal{V}\\subset\\mathbb{R}^n$,如果对于$\\mathbf{a},\\mathbf{b}\\in\\mathcal{V}$,都有$\\mathbf{a}+\\mathbf{b}\\in\\mathcal{V},\\alpha\\mathbf{a}\\in\\mathcal{V}$($\\alpha$为任意标量),即$\\mathcal{V}$在向量加法运算和标量向量乘法运算下是封闭的,则称$\\mathcal{V}$为$\\mathbb{R}^n$的子空间." 
174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "令$\\mathbf{a}\\in\\mathcal{V}$,因为$\\left(-1\\right)\\mathbf{a}=-\\mathbf{a}$,所以$-\\mathbf{a}\\in\\mathcal{V}$;因为$\\mathbf{a}+\\left(-\\mathbf{a}\\right)=\\mathbf{0}$,所以$\\mathbf{0}\\in\\mathcal{V}$,即每个子空间都包含$\\mathbf{0}$向量." 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "设$\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\in\\mathbb{R}^n$,它们所有线性组合的集合记为\n", 188 | "$$\\mathrm{span}\\left[\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\right]=\\left\\{\\sum_{i=1}^{k}\\alpha_i\\mathbf{a}_i:\\alpha_1,\\alpha_2,\\dots,\\alpha_k\\in\\mathbb{R}\\right\\}$$ \n", 189 | "称为$\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k$张成的子空间." 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "对于向量$\\mathbf{a}$,子空间$\\mathrm{span}\\left[\\mathbf{a}\\right]$由向量$\\alpha\\mathbf{a}$组成,其中$\\alpha\\in\\mathbb{R}$." 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "如果向量$\\mathbf{a}$可表示为$\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k$的线性组合,则\n", 204 | "$$\\mathrm{span}\\left[\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k,\\mathbf{a}\\right]=\\mathrm{span}\\left[\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\right]$$" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "给定子空间$\\mathcal{V}$,如果存在线性无关的向量集合$\\{\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\}\\subset\\mathcal{V}$使得$\\mathcal{V}=\\mathrm{span}\\left[\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\right]$,则称$\\{\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\}$是子空间$\\mathcal{V}$的一组基.子空间$\\mathcal{V}$的所有基都包含相同数量的向量,这一数量称为$\\mathcal{V}$的维数,记为$\\mathrm{dim}\\mathcal{V}$." 
212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "命题2.2 如果$\\{\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\}$是子空间$\\mathcal{V}$的一组基,则$\\mathcal{V}$中的任意向量$\\mathbf{a}$都可以唯一的表示为\n", 219 | "$$\\mathbf{a}=\\alpha_1\\mathbf{a}_1+\\alpha_2\\mathbf{a}_2+\\dots+\\alpha_k\\mathbf{a}_k$$" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "给定$\\mathcal{V}$的一组基$\\{\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\}$和向量$\\mathbf{a}\\in\\mathcal{V}$,如果\n", 227 | "$$\\mathbf{a}=\\alpha_1\\mathbf{a}_1+\\alpha_2\\mathbf{a}_2+\\dots+\\alpha_k\\mathbf{a}_k$$ \n", 228 | "则系数$\\alpha_i,i=1,2,\\dots,k$称为向量$\\mathbf{a}$对应于基$\\{\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_k\\}$的坐标." 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "$\\mathbb{R}^n$的标准基定义为 \n", 236 | "$$\\mathbf{e}_1=\\begin{bmatrix} 1 \\\\ 0 \\\\ 0 \\\\ \\vdots \\\\ 0 \\end{bmatrix},\\mathbf{e}_2=\\begin{bmatrix} 0 \\\\ 1 \\\\ 0 \\\\ \\vdots \\\\ 0 \\end{bmatrix},\\dots,\\mathbf{e}_n=\\begin{bmatrix} 0 \\\\ 0 \\\\ 0 \\\\ \\vdots \\\\ 1 \\end{bmatrix}$$ " 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "在标准基下,向量$\\mathbf{x}$可表示为 \n", 244 | "$$\\mathbf{x}=\\begin{bmatrix} x_1 \\\\ x_2 \\\\ \\vdots \\\\ x_n \\end{bmatrix}=x_1\\mathbf{e}_1+x_2\\mathbf{e}_2+\\dots+x_n\\mathbf{e}_n$$" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "令$\\mathbb{C}$表示复数集合,$\\mathbb{C}^n$表示$n$维复数向量集合.集合$\\mathbb{C}^n$具有与$\\mathbb{R}^n$类似的属性,其中标量可以取复数." 
252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "A matrix is a rectangular array of numbers. A matrix with $m$ rows and $n$ columns is called an $m\\times n$ matrix, written\n", 259 | "$$\\mathbf{A}=\\begin{bmatrix} a_{11} & a_{12} & \\cdots & a_{1n} \\\\ a_{21} & a_{22} & \\cdots & a_{2n} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ a_{m1} & a_{m2} & \\cdots & a_{mn} \\end{bmatrix}$$ \n", 260 | "The real number $a_{ij}$ in the $i$th row and $j$th column is called the $\\left(i,j\\right)$th entry of the matrix." 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "If the matrix $\\mathbf{A}$ is viewed as a collection of $n$ column vectors, each column is a column vector in $\\mathbb{R}^m$. \n", 268 | "If the matrix $\\mathbf{A}$ is viewed as a collection of $m$ row vectors, each row is an $n$-dimensional row vector." 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "The transpose of an $m \\times n$ matrix $\\mathbf{A}$, denoted $\\mathbf{A}^\\top$, is the $n\\times m$ matrix  \n", 276 | "$$\\mathbf{A}^\\top=\\begin{bmatrix} a_{11} & a_{21} & \\cdots & a_{m1} \\\\ a_{12} & a_{22} & \\cdots & a_{m2} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ a_{1n} & a_{2n} & \\cdots & a_{mn} \\end{bmatrix}$$ " 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "The symbol $\\mathbb{R}^{m\\times n}$ denotes the set of $m\\times n$ matrices whose entries are all real numbers. \n", 284 | "A column vector in $\\mathbb{R}^n$ can be viewed as an element of $\\mathbb{R}^{n\\times 1}$, and an $n$-dimensional row vector as an element of $\\mathbb{R}^{1\\times n}$."
285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": { 290 | "collapsed": true 291 | }, 292 | "source": [ 293 | "2.2 Rank of a Matrix" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "For the $m\\times n$ matrix\n", 301 | "$$\\mathbf{A}=\\begin{bmatrix} a_{11} & a_{12} & \\cdots & a_{1n} \\\\ a_{21} & a_{22} & \\cdots & a_{2n} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ a_{m1} & a_{m2} & \\cdots & a_{mn} \\end{bmatrix}$$ \n", 302 | "the $k$th column is denoted by $\\mathbf{a}_k$: \n", 303 | "$$\\mathbf{a}_k=\\begin{bmatrix} a_{1k} \\\\ a_{2k} \\\\ \\vdots \\\\ a_{mk} \\end{bmatrix}$$ " 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "The maximal number of linearly independent columns of $\\mathbf{A}$ is called the rank of $\\mathbf{A}$, denoted $\\mathrm{rank}\\mathbf{A}$.  \n", 311 | "$$\\mathrm{rank}\\mathbf{A}=\\mathrm{dim}\\thinspace\\mathrm{span}\\left[\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_n\\right]$$" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "Proposition 2.3 The rank of a matrix $\\mathbf{A}$ is unchanged under the following operations:\n", 319 | "1. Multiplying some columns of $\\mathbf{A}$ by nonzero scalars;\n", 320 | "2. Interchanging the order of the columns;\n", 321 | "3. Appending to the matrix a column that is a linear combination of the other columns." 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": {}, 327 | "source": [ 328 | "If the number of rows of $\\mathbf{A}$ equals the number of columns, the matrix is called a square matrix." 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "The determinant is a scalar associated with a square matrix $\\mathbf{A}$, denoted $\\mathrm{det}\\mathbf{A}$ or $|\\mathbf{A}|$." 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "The determinant of a square matrix is a function of its columns and has the following properties:\n", 343 | "1. 
The determinant of a square matrix $\\mathbf{A}=\\left[\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_n\\right]$ is a linear function of each column; that is, for any $\\alpha,\\beta\\in\\mathbb{R}$ and $\\mathbf{a}_k^{\\left(1\\right)},\\mathbf{a}_k^{\\left(2\\right)}\\in\\mathbb{R}^n$,  \n", 344 | "$$\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\alpha\\mathbf{a}_k^{\\left(1\\right)}+\\beta\\mathbf{a}_k^{\\left(2\\right)},\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_n\\right] \\\\\n", 345 | "=\\alpha\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k^{\\left(1\\right)},\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_n\\right] \\\\ \n", 346 | "+\\beta\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k^{\\left(2\\right)},\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_n\\right]$$ \n", 347 | "2. If $\\mathbf{a}_k=\\mathbf{a}_{k+1}$ for some $k$, then \n", 348 | "$$\\mathrm{det}\\mathbf{A}=\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_k,\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_n\\right]=\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_k,\\mathbf{a}_k,\\dots,\\mathbf{a}_n\\right]=0$$ \n", 349 | "3. Let \n", 350 | "$$\\mathbf{I}_n=\\left[\\mathbf{e}_1,\\mathbf{e}_2,\\dots,\\mathbf{e}_n\\right]=\\begin{bmatrix} 1 & 0 & \\cdots & 0 \\\\ 0 & 1 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & 1 \\end{bmatrix}$$ \n", 351 | "where $\\{\\mathbf{e}_1,\\mathbf{e}_2,\\dots,\\mathbf{e}_n\\}$ is the standard basis of $\\mathbb{R}^n$. Then \n", 352 | "$$\\mathrm{det}\\mathbf{I}_n=1$$" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "If $\\alpha=\\beta=0$ in property 1, then \n", 360 | "$$\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{0},\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_n\\right]=0$$ \n", 361 | "that is, if one column of a square matrix is $\\mathbf{0}$, then its determinant equals 0." 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "If a scalar multiple of one column is added to another column of a square matrix, the value of the determinant does not change. 
\n", 369 | "$$\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k+\\alpha\\mathbf{a}_j,\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_j,\\dots,\\mathbf{a}_n\\right] \\\\\n", 370 | "=\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k,\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_j,\\dots,\\mathbf{a}_n\\right] \\\\\n", 371 | "+\\alpha\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_j,\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_j,\\dots,\\mathbf{a}_n\\right] \\\\\n", 372 | "=\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k,\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_j,\\dots,\\mathbf{a}_n\\right]$$" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "metadata": {}, 378 | "source": [ 379 | "If two columns of a square matrix are interchanged, the determinant changes sign. For adjacent columns $\\mathbf{a}_k$ and $\\mathbf{a}_{k+1}$:\n", 380 | "$$\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k,\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_n\\right] \\\\\n", 381 | "=\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k+\\mathbf{a}_{k+1},\\mathbf{a}_{k+1},\\dots,\\mathbf{a}_n\\right] \\\\\n", 382 | "=\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k+\\mathbf{a}_{k+1},\\mathbf{a}_{k+1}-\\left(\\mathbf{a}_k+\\mathbf{a}_{k+1}\\right),\\dots,\\mathbf{a}_n\\right] \\\\\n", 383 | "=\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k+\\mathbf{a}_{k+1},-\\mathbf{a}_k,\\dots,\\mathbf{a}_n\\right] \\\\\n", 384 | "=-\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k+\\mathbf{a}_{k+1},\\mathbf{a}_k,\\dots,\\mathbf{a}_n\\right] \\\\\n", 385 | "=-\\left(\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_k,\\mathbf{a}_k,\\dots,\\mathbf{a}_n\\right]+ \\\\ \\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_{k+1},\\mathbf{a}_k,\\dots,\\mathbf{a}_n\\right]\\right) \\\\\n", 386 | "=-\\mathrm{det}\\left[\\mathbf{a}_1,\\dots,\\mathbf{a}_{k-1},\\mathbf{a}_{k+1},\\mathbf{a}_k,\\dots,\\mathbf{a}_n\\right]$$" 
387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "Given an $m\\times n$ matrix $\\mathbf{A}$, a $p$th-order minor is the determinant of a $p\\times p$ matrix obtained from $\\mathbf{A}$ by deleting $m-p$ rows and $n-p$ columns, where $p\\leqslant\\min{\\{m,n\\}}$." 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": { 399 | "collapsed": true 400 | }, 401 | "source": [ 402 | "Proposition 2.4 If an $m\\times n\\left(m\\geqslant n\\right)$ matrix $\\mathbf{A}$ has a nonzero $n$th-order minor, then the columns of $\\mathbf{A}$ are linearly independent, that is, $\\mathrm{rank}\\mathbf{A}=n$." 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "If a matrix has a nonzero minor, then the columns associated with that nonzero minor are linearly independent." 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "metadata": {}, 415 | "source": [ 416 | "If a matrix $\\mathbf{A}$ has an $r$th-order minor $|\\mathbf{M}|$ with the following properties: 1. $|\\mathbf{M}|\\neq 0$; 2. every minor obtained by adding one row and one column of $\\mathbf{A}$ to $\\mathbf{M}$ is zero, then\n", 417 | "$$\\mathrm{rank}\\mathbf{A}=r$$ that is, the rank of $\\mathbf{A}$ equals the highest order of its nonzero minors." 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "A nonsingular (invertible) matrix is a square matrix whose determinant is nonzero."
425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "Let $\\mathbf{A}$ be an $n\\times n$ square matrix. $\\mathbf{A}$ is nonsingular if and only if there exists an $n\\times n$ matrix $\\mathbf{B}$ such that\n", 432 | "$$\\mathbf{A}\\mathbf{B}=\\mathbf{B}\\mathbf{A}=\\mathbf{I}_n$$\n", 433 | "where $\\mathbf{I}_n$ denotes the $n\\times n$ identity matrix:\n", 434 | "$$\\mathbf{I}_n=\\begin{bmatrix} 1 & 0 & \\cdots & 0 \\\\ 0 & 1 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & 1 \\end{bmatrix}$$ " 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "The matrix $\\mathbf{B}$ is called the inverse of $\\mathbf{A}$, written $\\mathbf{B}=\\mathbf{A}^{-1}$." 442 | ] 443 | }, 444 | { 445 | "cell_type": "markdown", 446 | "metadata": {}, 447 | "source": [ 448 | "2.3 Systems of Linear Equations" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "A system of $m$ linear equations in $n$ unknowns can be written as the vector equation\n", 456 | "$$x_1\\mathbf{a}_1+x_2\\mathbf{a}_2+\\dots+x_n\\mathbf{a}_n=\\mathbf{b}$$\n", 457 | "where\n", 458 | "$$\\mathbf{a}_j=\\begin{bmatrix} a_{1j} \\\\ a_{2j} \\\\ \\vdots \\\\ a_{mj} \\end{bmatrix},\\mathbf{b}=\\begin{bmatrix} b_1 \\\\ b_2 \\\\ \\vdots \\\\ b_m \\end{bmatrix}$$ " 459 | ] 460 | }, 461 | { 462 | "cell_type": "markdown", 463 | "metadata": {}, 464 | "source": [ 465 | "The system can be written in matrix form as\n", 466 | "$$\\mathbf{A}\\mathbf{x}=\\mathbf{b}$$\n", 467 | "where $\\mathbf{A}$ is the coefficient matrix\n", 468 | "$$\\mathbf{A}=\\left[\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_n\\right]$$ \n", 469 | "and $\\mathbf{x}$ is the vector of unknowns\n", 470 | "$$\\mathbf{x}=\\begin{bmatrix} x_1 \\\\ x_2 \\\\ \\vdots \\\\ x_n \\end{bmatrix}$$\n", 471 | "The augmented matrix is defined as\n", 472 | "$$\\left[\\mathbf{A},\\mathbf{b}\\right]=\\left[\\mathbf{a}_1,\\mathbf{a}_2,\\dots,\\mathbf{a}_n,\\mathbf{b}\\right]$$" 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "Theorem 2.1 The system $\\mathbf{A}\\mathbf{x}=\\mathbf{b}$ has a solution if and only if\n", 480 | "$$\\mathrm{rank}\\mathbf{A}=\\mathrm{rank}\\left[\\mathbf{A},\\mathbf{b}\\right]$$" 
481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "metadata": {}, 486 | "source": [ 487 | "Theorem 2.2 Consider the equation $\\mathbf{A}\\mathbf{x}=\\mathbf{b}$, where $\\mathbf{A}\\in\\mathbb{R}^{m\\times n}$ and $\\mathrm{rank}\\mathbf{A}=m$. A solution to $\\mathbf{A}\\mathbf{x}=\\mathbf{b}$ can be obtained by assigning arbitrary values to $n-m$ of the unknowns and solving for the remaining ones." 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "metadata": {}, 493 | "source": [ 494 | "2.4 Inner Products and Norms" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": {}, 500 | "source": [ 501 | "The absolute value of a real number $a$, denoted $|a|$, is defined as\n", 502 | "$$ |a|=\\left\\{\n", 503 | "\\begin{aligned}\n", 504 | "a,\\quad a\\geqslant 0 \\\\\n", 505 | "-a,\\quad a < 0\n", 506 | "\\end{aligned}\n", 507 | "\\right.\n", 508 | "$$" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "Properties of the absolute value of real numbers:\n", 516 | "1. $|a|=|-a|$ \n", 517 | "2. $-|a|\\leqslant a\\leqslant|a|$\n", 518 | "3. $|a+b|\\leqslant |a|+|b|$\n", 519 | "4. $||a|-|b||\\leqslant |a-b|\\leqslant|a|+|b|$\n", 520 | "5. $|ab|=|a||b|$\n", 521 | "6. If $|a|\\leqslant c$ and $|b|\\leqslant d$, then $|a+b|\\leqslant c+d$\n", 522 | "7. $|a|\\leqslant b \\Leftrightarrow -b\\leqslant a\\leqslant b$\n", 523 | "8. $|a|\\geqslant b \\Leftrightarrow \\left(a\\geqslant b\\lor -a\\geqslant b\\right)$" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "For $\\mathbf{x},\\mathbf{y}\\in\\mathbb{R}^n$, the Euclidean inner product is defined as\n", 531 | "$$\\langle\\mathbf{x},\\mathbf{y}\\rangle=\\sum_{i=1}^nx_i y_i=\\mathbf{x}^\\top\\mathbf{y}$$" 532 | ] 533 | }, 534 | { 535 | "cell_type": "markdown", 536 | "metadata": {}, 537 | "source": [ 538 | "The inner product is a real-valued function $\\langle\\cdot\\thinspace,\\cdot\\rangle:\\mathbb{R}^n\\times\\mathbb{R}^n\\to\\mathbb{R}$ with the following properties:\n", 539 | "1. Positivity: $\\langle\\mathbf{x},\\mathbf{x}\\rangle\\geqslant0$, and $\\langle\\mathbf{x},\\mathbf{x}\\rangle=0$ if and only if $\\mathbf{x}=\\mathbf{0}$. \n", 540 | "2. 
Symmetry: $\\langle\\mathbf{x},\\mathbf{y}\\rangle=\\langle\\mathbf{y},\\mathbf{x}\\rangle$ \n", 541 | "3. Additivity: $\\langle\\mathbf{x}+\\mathbf{y},\\mathbf{z}\\rangle=\\langle\\mathbf{x},\\mathbf{z}\\rangle+\\langle\\mathbf{y},\\mathbf{z}\\rangle$ \n", 542 | "4. Homogeneity: for every $r\\in\\mathbb{R}$, $\\langle r\\mathbf{x},\\mathbf{y}\\rangle=r\\langle\\mathbf{x},\\mathbf{y}\\rangle$" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": {}, 548 | "source": [ 549 | "Given vectors $\\mathbf{x}$ and $\\mathbf{y}$, if $\\langle\\mathbf{x},\\mathbf{y}\\rangle=0$, then $\\mathbf{x}$ and $\\mathbf{y}$ are said to be orthogonal." 550 | ] 551 | }, 552 | { 553 | "cell_type": "markdown", 554 | "metadata": {}, 555 | "source": [ 556 | "The Euclidean norm of a vector $\\mathbf{x}$ is defined as\n", 557 | "$$\\|\\mathbf{x}\\|=\\sqrt{\\langle\\mathbf{x},\\mathbf{x}\\rangle}=\\sqrt{\\mathbf{x}^\\top\\mathbf{x}}$$" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "metadata": {}, 563 | "source": [ 564 | "Theorem 2.3 (Cauchy-Schwarz inequality)  For any two vectors $\\mathbf{x}$ and $\\mathbf{y}$ in $\\mathbb{R}^n$, the inequality\n", 565 | "$$|\\langle\\mathbf{x},\\mathbf{y}\\rangle|\\leqslant\\|\\mathbf{x}\\|\\|\\mathbf{y}\\|$$\n", 566 | "holds. Furthermore, equality holds if and only if $\\mathbf{x}=\\alpha\\mathbf{y}$ for some $\\alpha\\in\\mathbb{R}$." 567 | ] 568 | }, 569 | { 570 | "cell_type": "markdown", 571 | "metadata": {}, 572 | "source": [ 573 | "The Euclidean norm $\\|\\mathbf{x}\\|$ of a vector $\\mathbf{x}$ has the following properties:\n", 574 | "1. Positivity: $\\|\\mathbf{x}\\|\\geqslant 0$, and $\\|\\mathbf{x}\\|= 0$ if and only if $\\mathbf{x}=\\mathbf{0}$; \n", 575 | "2. Homogeneity: $\\|r\\mathbf{x}\\|=|r|\\|\\mathbf{x}\\|,r\\in\\mathbb{R}$;\n", 576 | "3. Triangle inequality: $\\|\\mathbf{x}+\\mathbf{y}\\|\\leqslant\\|\\mathbf{x}\\|+\\|\\mathbf{y}\\|$." 
577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "The $p$-norm is defined as\n", 584 | "$$\\|\\mathbf{x}\\|_{p}=\\left\\{\n", 585 | "\\begin{aligned}\n", 586 | "\\left(|x_1|^p+|x_2|^p+\\dots+|x_n|^p\\right)^{1/p}, 1\\leqslant p<\\infty \\\\\n", 587 | "\\max{\\{|x_1|,|x_2|,\\dots,|x_n|\\}}, \\qquad p=\\infty\n", 588 | "\\end{aligned}\n", 589 | "\\right.$$" 590 | ] 591 | }, 592 | { 593 | "cell_type": "markdown", 594 | "metadata": {}, 595 | "source": [ 596 | "A function $\\mathbf{f}:\\mathbb{R}^n\\to\\mathbb{R}^m$ is continuous at a point $\\mathbf{x}$ if for every $\\varepsilon>0$ there exists a $\\delta>0$ such that $\\|\\mathbf{y}-\\mathbf{x}\\|<\\delta\\Rightarrow\\|\\mathbf{f}\\left(\\mathbf{y}\\right)-\\mathbf{f}\\left(\\mathbf{x}\\right)\\|<\\varepsilon$." 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": {}, 602 | "source": [ 603 | "If the function $\\mathbf{f}$ is continuous at every point of $\\mathbb{R}^n$, it is said to be continuous on $\\mathbb{R}^n$." 604 | ] 605 | }, 606 | { 607 | "cell_type": "markdown", 608 | "metadata": {}, 609 | "source": [ 610 | "For the complex space $\\mathbb{C}^n$, the inner product $\\langle\\mathbf{x},\\mathbf{y}\\rangle$ is defined as $\\sum_{i=1}^n x_i\\overline{y}_i$, where the overbar denotes complex conjugation." 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "metadata": {}, 616 | "source": [ 617 | "The inner product on the complex space $\\mathbb{C}^n$ is a complex-valued function with the following properties:\n", 618 | "1. $\\langle\\mathbf{x},\\mathbf{x}\\rangle\\geqslant0$, and $\\langle\\mathbf{x},\\mathbf{x}\\rangle=0$ if and only if $\\mathbf{x}=\\mathbf{0}$. \n", 619 | "2. $\\langle\\mathbf{x},\\mathbf{y}\\rangle=\\overline{\\langle\\mathbf{y},\\mathbf{x}\\rangle}$ \n", 620 | "3. $\\langle\\mathbf{x}+\\mathbf{y},\\mathbf{z}\\rangle=\\langle\\mathbf{x},\\mathbf{z}\\rangle+\\langle\\mathbf{y},\\mathbf{z}\\rangle$ \n", 621 | "4. 
For every $r\\in\\mathbb{C}$, $\\langle r\\mathbf{x},\\mathbf{y}\\rangle=r\\langle\\mathbf{x},\\mathbf{y}\\rangle$ " 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": null, 627 | "metadata": { 628 | "collapsed": true 629 | }, 630 | "outputs": [], 631 | "source": [] 632 | } 633 | ], 634 | "metadata": { 635 | "kernelspec": { 636 | "display_name": "Python 2", 637 | "language": "python", 638 | "name": "python2" 639 | }, 640 | "language_info": { 641 | "codemirror_mode": { 642 | "name": "ipython", 643 | "version": 2 644 | }, 645 | "file_extension": ".py", 646 | "mimetype": "text/x-python", 647 | "name": "python", 648 | "nbconvert_exporter": "python", 649 | "pygments_lexer": "ipython2", 650 | "version": "2.7.13" 651 | } 652 | }, 653 | "nbformat": 4, 654 | "nbformat_minor": 2 655 | } 656 | -------------------------------------------------------------------------------- /第3章 变换.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Chapter 3 Transformations" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "3.1 Linear Transformations" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "Given a function $\\mathcal{L}:\\mathbb{R}^n\\to\\mathbb{R}^m$, if \n", 22 | "1. for every $\\mathbf{x}\\in\\mathbb{R}^n$ and $\\alpha\\in\\mathbb{R}$, $\\mathcal{L}\\left(\\alpha\\mathbf{x}\\right)=\\alpha\\mathcal{L}\\left(\\mathbf{x}\\right)$; \n", 23 | "2. for every $\\mathbf{x},\\mathbf{y}\\in\\mathbb{R}^n$, $\\mathcal{L}\\left(\\mathbf{x}+\\mathbf{y}\\right)=\\mathcal{L}\\left(\\mathbf{x}\\right)+\\mathcal{L}\\left(\\mathbf{y}\\right)$; \n", 24 | " \n", 25 | "then $\\mathcal{L}$ is called a linear transformation." 
26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "For a vector $\\mathbf{x}\\in\\mathbb{R}^n$, let $\\mathbf{x}'$ denote its coordinate vector with respect to a given basis of $\\mathbb{R}^n$. If $\\mathbf{y}=\\mathcal{L}\\left(\\mathbf{x}\\right)$ and $\\mathbf{y}'$ is the coordinate vector of $\\mathbf{y}$ with respect to a given basis of $\\mathbb{R}^m$, then there exists a matrix $\\mathbf{A}\\in\\mathbb{R}^{m\\times n}$ such that \n", 33 | "$$\\mathbf{y}'=\\mathbf{A}\\mathbf{x}'$$\n", 34 | "That is, once a basis is fixed for each of $\\mathbb{R}^n$ and $\\mathbb{R}^m$, the linear transformation $\\mathcal{L}$ can be represented by a matrix. The matrix $\\mathbf{A}$ is called the matrix representation of $\\mathcal{L}$ with respect to the given bases of $\\mathbb{R}^n$ and $\\mathbb{R}^m$." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "If the bases chosen for $\\mathbb{R}^n$ and $\\mathbb{R}^m$ are the standard bases, the matrix representation $\\mathbf{A}$ satisfies\n", 42 | "$$\\mathbf{y}=\\mathcal{L}\\left(\\mathbf{x}\\right)=\\mathbf{A}\\mathbf{x}$$" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "Let $\\{\\mathbf{e}_1,\\mathbf{e}_2,\\dots,\\mathbf{e}_n\\}$ and $\\{\\mathbf{e}_1',\\mathbf{e}_2',\\dots,\\mathbf{e}_n'\\}$ be two bases of $\\mathbb{R}^n$. Define the matrix $\\mathbf{T}$ as\n", 50 | "$$\\mathbf{T}=\\left[\\mathbf{e}_1',\\mathbf{e}_2',\\dots,\\mathbf{e}_n'\\right]^{-1}\\left[\\mathbf{e}_1,\\mathbf{e}_2,\\dots,\\mathbf{e}_n\\right]$$ Then $\\mathbf{T}$ is called the transformation matrix from $\\{\\mathbf{e}_1,\\mathbf{e}_2,\\dots,\\mathbf{e}_n\\}$ to $\\{\\mathbf{e}_1',\\mathbf{e}_2',\\dots,\\mathbf{e}_n'\\}$." 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "Equivalently,\n", 58 | "$$\\left[\\mathbf{e}_1,\\mathbf{e}_2,\\dots,\\mathbf{e}_n\\right]=\\left[\\mathbf{e}_1',\\mathbf{e}_2',\\dots,\\mathbf{e}_n'\\right]\\mathbf{T}$$ that is, the $i$th column of $\\mathbf{T}$ is the coordinate vector of $\\mathbf{e}_i$ with respect to the basis $\\{\\mathbf{e}_1',\\mathbf{e}_2',\\dots,\\mathbf{e}_n'\\}$." 
59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "Consider a linear transformation $$\\mathcal{L}:\\mathbb{R}^n\\to\\mathbb{R}^n$$\n", 66 | "Let $\\mathbf{A}$ be the matrix representation of $\\mathcal{L}$ with respect to $\\{\\mathbf{e}_1,\\mathbf{e}_2,\\dots,\\mathbf{e}_n\\}$ and $\\mathbf{B}$ its matrix representation with respect to $\\{\\mathbf{e}_1',\\mathbf{e}_2',\\dots,\\mathbf{e}_n'\\}$. Let $\\mathbf{y}=\\mathbf{A}\\mathbf{x}$ and $\\mathbf{y}'=\\mathbf{B}\\mathbf{x}'$. Then $\\mathbf{y}'=\\mathbf{T}\\mathbf{y}=\\mathbf{T}\\mathbf{A}\\mathbf{x}=\\mathbf{B}\\mathbf{x}'=\\mathbf{B}\\mathbf{T}\\mathbf{x}$, which gives $\\mathbf{T}\\mathbf{A}=\\mathbf{B}\\mathbf{T}$, or $\\mathbf{A}=\\mathbf{T}^{-1}\\mathbf{B}\\mathbf{T}$." 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "Two $n\\times n$ matrices $\\mathbf{A}$ and $\\mathbf{B}$ are said to be similar if there exists a nonsingular matrix $\\mathbf{T}$ such that $\\mathbf{A}=\\mathbf{T}^{-1}\\mathbf{B}\\mathbf{T}$. Similar matrices represent the same linear transformation with respect to different bases." 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "3.2 Eigenvalues and Eigenvectors" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "Let $\\mathbf{A}$ be an $n\\times n$ real square matrix. If a scalar $\\lambda$ and a nonzero vector $\\boldsymbol{\\upsilon}$ satisfy the equation $\\mathbf{A}\\boldsymbol{\\upsilon}=\\lambda\\boldsymbol{\\upsilon}$, then $\\lambda$ is called an eigenvalue of $\\mathbf{A}$ and $\\boldsymbol{\\upsilon}$ an eigenvector of $\\mathbf{A}$." 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "A necessary and sufficient condition for $\\lambda$ to be an eigenvalue of $\\mathbf{A}$ is that the matrix $\\lambda\\mathbf{I}-\\mathbf{A}$ be singular, that is, $\\mathrm{det}\\left[\\lambda\\mathbf{I}-\\mathbf{A}\\right]=0$, where $\\mathbf{I}$ is the $n\\times n$ identity matrix. This leads to an $n$th-order polynomial equation:\n", 95 | "$$\\mathrm{det}\\left[\\lambda\\mathbf{I}-\\mathbf{A}\\right]=\\lambda^n+a_{n-1}\\lambda^{n-1}+\\dots+a_1\\lambda+a_0=0$$\n", 96 | "The polynomial $\\mathrm{det}\\left[\\lambda\\mathbf{I}-\\mathbf{A}\\right]$ is called the characteristic polynomial of $\\mathbf{A}$, and the equation above is called the characteristic equation." 
97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "Theorem 3.1 Suppose that the characteristic equation $\\mathrm{det}\\left[\\lambda\\mathbf{I}-\\mathbf{A}\\right]=0$ has $n$ distinct roots $\\lambda_1,\\lambda_2,\\dots,\\lambda_n$. Then there exist $n$ linearly independent vectors $\\boldsymbol{\\upsilon}_1,\\boldsymbol{\\upsilon}_2,\\dots,\\boldsymbol{\\upsilon}_n$ such that\n", 104 | "$$\\mathbf{A}\\boldsymbol{\\upsilon}_i=\\lambda_i\\boldsymbol{\\upsilon}_i,\\quad i=1,2,\\dots,n$$" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "Let\n", 112 | "$$\\mathbf{T}=\\left[\\boldsymbol{\\upsilon}_1,\\boldsymbol{\\upsilon}_2,\\dots,\\boldsymbol{\\upsilon}_n\\right]^{-1}$$\n", 113 | "Then\n", 114 | "$$\\mathbf{T}\\mathbf{A}\\mathbf{T}^{-1}=\\mathbf{T}\\mathbf{A}\\left[\\boldsymbol{\\upsilon}_1,\\boldsymbol{\\upsilon}_2,\\dots,\\boldsymbol{\\upsilon}_n\\right] \\\\\n", 115 | "=\\mathbf{T}\\left[\\mathbf{A}\\boldsymbol{\\upsilon}_1,\\mathbf{A}\\boldsymbol{\\upsilon}_2,\\dots,\\mathbf{A}\\boldsymbol{\\upsilon}_n\\right] \\\\\n", 116 | "=\\mathbf{T}\\left[\\lambda_1\\boldsymbol{\\upsilon}_1,\\lambda_2\\boldsymbol{\\upsilon}_2,\\dots,\\lambda_n\\boldsymbol{\\upsilon}_n\\right] \\\\\n", 117 | "=\\mathbf{T}\\mathbf{T}^{-1}\\begin{bmatrix} \\lambda_1 & & & 0 \\\\ & \\lambda_2 & & \\\\ & & \\ddots & \\\\ 0 & & & \\lambda_n \\end{bmatrix} \n", 118 | "=\\begin{bmatrix} \\lambda_1 & & & 0 \\\\ & \\lambda_2 & & \\\\ & & \\ddots & \\\\ 0 & & & \\lambda_n \\end{bmatrix}$$\n", 119 | "In the basis formed by the linearly independent eigenvectors $\\{\\boldsymbol{\\upsilon}_1,\\boldsymbol{\\upsilon}_2,\\dots,\\boldsymbol{\\upsilon}_n\\}$, the matrix $\\mathbf{A}$ is diagonalized; that is, the $\\left(i,j\\right)$th entry of the resulting diagonal matrix is zero for all $i\\neq j$." 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "A matrix $\\mathbf{A}$ is said to be symmetric if $\\mathbf{A}=\\mathbf{A}^\\top$." 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "Theorem 3.2 All eigenvalues of a real symmetric matrix are real." 
134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "Theorem 3.3 Any real symmetric $n\\times n$ matrix has a set of $n$ eigenvectors that are mutually orthogonal." 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "If $\\mathbf{A}$ is symmetric, then a set of its eigenvectors forms an orthogonal basis for $\\mathbb{R}^n$. If the orthogonal basis $\\{\\boldsymbol{\\upsilon}_1,\\boldsymbol{\\upsilon}_2,\\dots,\\boldsymbol{\\upsilon}_n\\}$ is normalized so that each vector $\\boldsymbol{\\upsilon}_i$ has norm 1, we can define the matrix\n", 148 | "$$\\mathbf{T}=\\left[\\boldsymbol{\\upsilon}_1,\\boldsymbol{\\upsilon}_2,\\dots,\\boldsymbol{\\upsilon}_n\\right]$$\n", 149 | "This matrix satisfies\n", 150 | "$$\\mathbf{T}^\\top\\mathbf{T}=\\mathbf{I}$$\n", 151 | "and hence\n", 152 | "$$\\mathbf{T}^\\top=\\mathbf{T}^{-1}$$" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "A matrix whose transpose equals its inverse is called an orthogonal matrix." 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": { 165 | "collapsed": true 166 | }, 167 | "source": [ 168 | "3.3 Orthogonal Projections" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "If $\\mathcal{V}$ is a subspace of $\\mathbb{R}^n$, the orthogonal complement of $\\mathcal{V}$, denoted $\\mathcal{V}^\\bot$, consists of all vectors that are orthogonal to every vector in $\\mathcal{V}$, that is,\n", 176 | "$$\\mathcal{V}^\\bot=\\{\\mathbf{x}:\\boldsymbol{\\upsilon}^\\top\\mathbf{x}=0, \\forall\\boldsymbol{\\upsilon}\\in\\mathcal{V}\\}$$" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "The orthogonal complement of $\\mathcal{V}$ is also a subspace." 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "Together, $\\mathcal{V}$ and $\\mathcal{V}^\\bot$ span $\\mathbb{R}^n$; that is, every vector $\\mathbf{x}\\in\\mathbb{R}^n$ can be represented uniquely as\n", 191 | "$$\\mathbf{x}=\\mathbf{x}_1+\\mathbf{x}_2$$\n", 192 | "where $\\mathbf{x}_1\\in\\mathcal{V},\\mathbf{x}_2\\in\\mathcal{V}^\\bot$. The vectors $\\mathbf{x}_1$ and $\\mathbf{x}_2$ are called the orthogonal projections of $\\mathbf{x}$ onto the subspaces $\\mathcal{V}$ and $\\mathcal{V}^\\bot$, respectively. This representation is called the orthogonal decomposition of $\\mathbf{x}$ with respect to $\\mathcal{V}$." 
193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "The notation $\\mathbb{R}^n=\\mathcal{V}\\oplus\\mathcal{V}^\\bot$ means that $\\mathbb{R}^n$ is the direct sum of $\\mathcal{V}$ and $\\mathcal{V}^\\bot$." 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "If for all $\\mathbf{x}\\in\\mathbb{R}^n$ we have $\\mathbf{P}\\mathbf{x}\\in\\mathcal{V}$ and $\\mathbf{x}-\\mathbf{P}\\mathbf{x}\\in\\mathcal{V}^\\bot$, then the linear transformation $\\mathbf{P}$ is called an orthogonal projector onto $\\mathcal{V}$." 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "Let $\\mathbf{A}\\in\\mathbb{R}^{m\\times n}$. The range (image) of $\\mathbf{A}$ is denoted\n", 214 | "$$\\mathcal{R}\\left(\\mathbf{A}\\right)\\triangleq\\{\\mathbf{A}\\mathbf{x}:\\mathbf{x}\\in\\mathbb{R}^n\\}$$\n", 215 | "The nullspace (kernel) of $\\mathbf{A}$ is denoted\n", 216 | "$$\\mathcal{N}\\left(\\mathbf{A}\\right)\\triangleq\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{A}\\mathbf{x}=\\mathbf{0}\\}$$\n", 217 | "Both $\\mathcal{R}\\left(\\mathbf{A}\\right)$ and $\\mathcal{N}\\left(\\mathbf{A}\\right)$ are subspaces." 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "Theorem 3.4 For any matrix $\\mathbf{A}$, we have $\\mathcal{R}\\left(\\mathbf{A}\\right)^\\bot=\\mathcal{N}\\left(\\mathbf{A}^\\top\\right)$ and $\\mathcal{N}\\left(\\mathbf{A}\\right)^\\bot=\\mathcal{R}\\left(\\mathbf{A}^\\top\\right)$." 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "For any subspace $\\mathcal{V}$, we have $\\left(\\mathcal{V}^\\bot\\right)^\\bot=\\mathcal{V}$." 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "If $\\mathbf{P}$ is an orthogonal projector onto $\\mathcal{V}$, then $\\mathbf{P}\\mathbf{x}=\\mathbf{x}$ for all $\\mathbf{x}\\in\\mathcal{V}$, and $\\mathcal{R}\\left(\\mathbf{P}\\right)=\\mathcal{V}$."
239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "Theorem 3.5 A matrix $\\mathbf{P}$ is an orthogonal projector onto the subspace $\\mathcal{V}=\\mathcal{R}\\left(\\mathbf{P}\\right)$ if and only if $\\mathbf{P}^2=\\mathbf{P}=\\mathbf{P}^\\top$." 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": { 251 | "collapsed": true 252 | }, 253 | "source": [ 254 | "3.4 Quadratic Forms" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "A quadratic form $f:\\mathbb{R}^n\\to\\mathbb{R}$ is a function\n", 262 | "$$f\\left(\\mathbf{x}\\right)=\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}$$\n", 263 | "where $\\mathbf{Q}$ is an $n\\times n$ real matrix. Without loss of generality, $\\mathbf{Q}$ can be assumed to be symmetric, that is, $\\mathbf{Q}=\\mathbf{Q}^\\top$." 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": { 269 | "collapsed": true 270 | }, 271 | "source": [ 272 | "If $\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}>0$ for every nonzero vector $\\mathbf{x}$, then the quadratic form $\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}$ is positive definite and the symmetric matrix $\\mathbf{Q}$ is positive definite, written $\\mathbf{Q}>0$. \n", 273 | "If $\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}\\geqslant0$ for every nonzero vector $\\mathbf{x}$, then the quadratic form $\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}$ is positive semidefinite and the symmetric matrix $\\mathbf{Q}$ is positive semidefinite, written $\\mathbf{Q}\\geqslant0$. \n", 274 | "If $\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}<0$ for every nonzero vector $\\mathbf{x}$, then the quadratic form $\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}$ is negative definite and the symmetric matrix $\\mathbf{Q}$ is negative definite, written $\\mathbf{Q}<0$. \n", 275 | "If $\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}\\leqslant0$ for every nonzero vector $\\mathbf{x}$, then the quadratic form $\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}$ is negative semidefinite and the symmetric matrix $\\mathbf{Q}$ is negative semidefinite, written $\\mathbf{Q}\\leqslant0$." 
276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "The leading principal minors of a matrix $\\mathbf{Q}$ are $\\mathrm{det}\\mathbf{Q}$ itself and the minors obtained by successively removing the last row and the last column of $\\mathbf{Q}$, that is,\n", 283 | "$$\\Delta_1=q_{11},\\Delta_2=\\mathrm{det}\\begin{bmatrix} q_{11} & q_{12} \\\\ q_{21} & q_{22} \\end{bmatrix}, \\Delta_3=\\mathrm{det}\\begin{bmatrix} q_{11} & q_{12} & q_{13} \\\\ q_{21} & q_{22} & q_{23} \\\\ q_{31} & q_{32} & q_{33} \\end{bmatrix},\\dots,\\Delta_n=\\mathrm{det}\\mathbf{Q} $$" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "Theorem 3.6 (Sylvester's criterion) A quadratic form $\\mathbf{x}^\\top\\mathbf{Q}\\mathbf{x}$ with $\\mathbf{Q}=\\mathbf{Q}^\\top$ is positive definite if and only if all the leading principal minors of $\\mathbf{Q}$ are positive." 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "Theorem 3.7 A symmetric matrix $\\mathbf{Q}$ is positive definite (positive semidefinite) if and only if all eigenvalues of $\\mathbf{Q}$ are positive (nonnegative)." 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "3.5 Matrix Norms" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": {}, 310 | "source": [ 311 | "The norm of a matrix $\\mathbf{A}$, denoted $\\|\\mathbf{A}\\|$, is any function $\\|\\cdot\\|$ that satisfies the following conditions: \n", 312 | "1. If $\\mathbf{A}\\neq\\mathbf{O}$, then $\\|\\mathbf{A}\\|>0$, and $\\|\\mathbf{O}\\|=0$, where $\\mathbf{O}$ is the zero matrix.\n", 313 | "2. For every $c\\in\\mathbb{R}$, $\\|c\\mathbf{A}\\|=|c|\\|\\mathbf{A}\\|$.\n", 314 | "3. $\\|\\mathbf{A}+\\mathbf{B}\\|\\leqslant\\|\\mathbf{A}\\|+\\|\\mathbf{B}\\|$ \n", 315 | "The Frobenius norm is one example of a matrix norm:\n", 316 | "$$\\|\\mathbf{A}\\|_F=\\left(\\sum_{i=1}^m\\sum_{j=1}^n\\left(a_{ij}\\right)^2\\right)^\\frac{1}{2}$$ \n", 317 | "where $\\mathbf{A}\\in\\mathbb{R}^{m\\times n}$. We consider only matrix norms that satisfy the following additional condition:\n", 318 | "4. 
$\\|\\mathbf{A}\\mathbf{B}\\|\\leqslant\\|\\mathbf{A}\\|\\|\\mathbf{B}\\|$" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "Let $\\|\\cdot\\|_{\\left(n\\right)}$ and $\\|\\cdot\\|_{\\left(m\\right)}$ be vector norms on $\\mathbb{R}^n$ and $\\mathbb{R}^m$, respectively. If for every matrix $\\mathbf{A}\\in\\mathbb{R}^{m\\times n}$ and every vector $\\mathbf{x}\\in\\mathbb{R}^n$ the inequality\n", 326 | "$$\\|\\mathbf{A}\\mathbf{x}\\|_{\\left(m\\right)}\\leqslant\\|\\mathbf{A}\\|\\|\\mathbf{x}\\|_{\\left(n\\right)}$$\n", 327 | "holds, then the matrix norm is said to be induced by, or compatible with, the given vector norms." 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "The induced matrix norm is defined as\n", 335 | "$$\\|\\mathbf{A}\\|=\\max_{\\|\\mathbf{x}\\|_{\\left(n\\right)}=1}\\|\\mathbf{A}\\mathbf{x}\\|_{\\left(m\\right)}$$ \n", 336 | "that is, $\\|\\mathbf{A}\\|$ is the maximum of the norms of the vectors $\\mathbf{A}\\mathbf{x}$, where $\\mathbf{x}$ ranges over all vectors of norm 1." 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "Theorem 3.8 Let\n", 344 | "$$\\|\\mathbf{x}\\|=\\left(\\sum_{k=1}^n|x_k|^2\\right)^{\\frac{1}{2}}=\\sqrt{\\langle\\mathbf{x},\\mathbf{x}\\rangle}$$\n", 345 | "Then the matrix norm induced by this vector norm is\n", 346 | "$$\\|\\mathbf{A}\\|=\\sqrt{\\lambda_1}$$ \n", 347 | "where $\\lambda_1$ is the largest eigenvalue of the matrix $\\mathbf{A}^\\top\\mathbf{A}$." 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "Rayleigh's inequality If the $n\\times n$ matrix $\\mathbf{P}$ is real symmetric positive definite, then \n", 355 | "$$\\lambda_\\min\\left(\\mathbf{P}\\right)\\|\\mathbf{x}\\|^2\\leqslant\\mathbf{x}^\\top\\mathbf{P}\\mathbf{x}\\leqslant\\lambda_\\max\\left(\\mathbf{P}\\right)\\|\\mathbf{x}\\|^2$$ \n", 356 | "where $\\lambda_\\min\\left(\\mathbf{P}\\right)$ denotes the smallest eigenvalue of $\\mathbf{P}$ and $\\lambda_\\max\\left(\\mathbf{P}\\right)$ denotes the largest eigenvalue of $\\mathbf{P}$." 
357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": null, 362 | "metadata": { 363 | "collapsed": true 364 | }, 365 | "outputs": [], 366 | "source": [] 367 | } 368 | ], 369 | "metadata": { 370 | "kernelspec": { 371 | "display_name": "Python 2", 372 | "language": "python", 373 | "name": "python2" 374 | }, 375 | "language_info": { 376 | "codemirror_mode": { 377 | "name": "ipython", 378 | "version": 2 379 | }, 380 | "file_extension": ".py", 381 | "mimetype": "text/x-python", 382 | "name": "python", 383 | "nbconvert_exporter": "python", 384 | "pygments_lexer": "ipython2", 385 | "version": "2.7.13" 386 | } 387 | }, 388 | "nbformat": 4, 389 | "nbformat_minor": 2 390 | } 391 | -------------------------------------------------------------------------------- /第4章 有关集合概念.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "第4章 有关集合概念" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "collapsed": true 14 | }, 15 | "source": [ 16 | "4.1 线段" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "collapsed": true 23 | }, 24 | "source": [ 25 | "$\\mathbf{x}$和$\\mathbf{y}$是空间$\\mathbb{R}^n$中的两点,两点之间的线段指的是连接$\\mathbf{x}$和$\\mathbf{y}$的直线上所有点的集合。如果$\\mathbf{z}$位于$\\mathbf{x}$和$\\mathbf{y}$之间的线段上,则\n", 26 | "$$\\frac{\\mathbf{z}-\\mathbf{y}}{\\mathbf{x}-\\mathbf{y}}=\\alpha,\\quad\\alpha\\in\\left[0,1\\right]$$\n", 27 | "有\n", 28 | "$$\\mathbf{z}-\\mathbf{y}=\\alpha\\left(\\mathbf{x}-\\mathbf{y}\\right) \\\\ \n", 29 | "\\mathbf{z}=\\alpha\\mathbf{x}+\\left(1-\\alpha\\right)\\mathbf{y}$$即$\\mathbf{x}$和$\\mathbf{y}$之间的线段可表示为\n", 30 | "$$\\{\\mathbf{z}-\\mathbf{y}=\\alpha\\left(\\mathbf{x}-\\mathbf{y}\\right):\\alpha\\in\\left[0,1\\right]\\}$$" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "4.2 超平面与线性簇" 38 | ] 39 | }, 40 | { 41 | 
"cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "令$u_1,u_2,\\dots,u_n,v\\in\\mathbb{R}$,其中至少存在一个$u_i$不为零。由所有满足线性方程\n", 45 | "$$u_1x_1+u_2x_2+\\dots+u_nx_n=v$$\n", 46 | "的点$\\mathbf{x}=\\left[x_1,x_2,\\dots,x_n\\right]^\\top$组成的集合称为空间$\\mathbb{R}^n$的超平面,记为\n", 47 | "$$\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{u}^\\top\\mathbf{x}=v\\}$$\n", 48 | "其中$\\mathbf{u}=\\left[u_1,u_2,\\dots,u_n\\right]^\\top$。" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "注意,由于超平面通常不包含原点,因此超平面不一定是空间$\\mathbb{R}^n$的子空间。" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "超平面$H=\\{\\mathbf{x}:u_1x_1+u_2x_2+\\dots+u_nx_n=v\\}$将$\\mathbb{R}^n$空间划分为两部分。其中一部分包含满足不等式$u_1x_1+u_2x_2+\\dots+u_nx_n\\geqslant v$的所有点,记为\n", 63 | "$$H_+ =\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{u}^\\top\\mathbf{x}\\geqslant v\\}$$\n", 64 | "称为正半空间;\n", 65 | "另外一部分包含满足不等式$u_1x_1+u_2x_2+\\dots+u_nx_n\\leqslant v$的所有点,记为\n", 66 | "$$H_- =\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{u}^\\top\\mathbf{x}\\leqslant v\\}$$\n", 67 | "称为负半空间。" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "令$\\mathbf{a}=\\left[a_1,a_2,\\dots,a_n\\right]^\\top$表示超平面$H$中任意一点,故有$\\mathbf{u}^\\top\\mathbf{a}-v=0$。对于任意$\\mathbf{x}\\in H$,可得\n", 75 | "$$\\mathbf{u}^\\top\\mathbf{x}-v=\\mathbf{u}^\\top\\mathbf{x}-v-\\left(\\mathbf{u}^\\top\\mathbf{a}-v\\right) \\\\\n", 76 | "=\\mathbf{u}^\\top \\left(\\mathbf{x}-\\mathbf{a}\\right) \\\\\n", 77 | "\\qquad\\qquad\\qquad\\qquad\\qquad\\qquad\\quad\\quad=u_1\\left(x_1-a_1\\right)+u_2\\left(x_2-a_2\\right)+\\dots+u_n\\left(x_n-a_n\\right)=0$$\n", 78 | "其中,$\\left(x_i-a_i\\right) \\left(i=1,\\dots,n\\right)$是向量$\\mathbf{x}-\\mathbf{a}$的元素。则超平面$H$包含满足方程$\\langle\\mathbf{u},\\mathbf{x}-\\mathbf{a}\\rangle=0$的所有点$\\mathbf{x}$,称向量$\\mathbf{u}$正交于超平面$H$。" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | 
"集合$H_+$包含所有满足$\\langle\\mathbf{u},\\mathbf{x}-\\mathbf{a}\\rangle\\geqslant0$的点;集合$H_-$包含所有满足$\\langle\\mathbf{u},\\mathbf{x}-\\mathbf{a}\\rangle\\leqslant0$的点。" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "线性簇为集合\n", 93 | "$$\\{\\mathbf{x}\\in\\mathbb{R}^n:\\mathbf{A}\\mathbf{x}=\\mathbf{b}\\}$$\n", 94 | "其中,矩阵$\\mathbf{A}\\in\\mathbb{R}^{m\\times n}$,向量$\\mathbf{b}\\in\\mathbb{R}^m$。如果$\\mathrm{dim}\\mathcal{N}\\left(\\mathbf{A}\\right)=r$,则称线性簇的维数为$r$。当且仅当$\\mathbf{b}=\\mathbf{0}$时,线性簇是一个子空间。如果$\\mathbf{A}=\\mathbf{0}$且$\\mathbf{b}=\\mathbf{0}$,则该线性簇是$\\mathbb{R}^n$。如果线性簇的维数小于$n$,则它是有限个超平面的交集。" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "4.3 凸集" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "已知两点$\\mathbf{u},\\mathbf{v}\\in\\mathbb{R}^n$之间的线段可表示为集合$\\{\\mathbf{w}\\in\\mathbb{R}^n:\\mathbf{w}=\\alpha\\mathbf{u}+\\left(1-\\alpha\\right)\\mathbf{v},\\alpha\\in\\left[0,1\\right]\\}$,点$\\mathbf{w}=\\alpha\\mathbf{u}+\\left(1-\\alpha\\right)\\mathbf{v}\\thinspace \\left(\\alpha\\in\\left[0,1\\right]\\right)$称为点$\\mathbf{u}$和点$\\mathbf{v}$的凸组合。" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "如果对于所有$\\mathbf{u},\\mathbf{v}\\in\\Theta$,$\\mathbf{u}$和$\\mathbf{v}$之间的线段都位于$\\Theta$内,则称集合$\\Theta\\subset\\mathbb{R}^n$为凸集。 \n", 116 | "凸集可以是\n", 117 | "+ 空集\n", 118 | "+ 单点组成的集合\n", 119 | "+ 一条直线或线段\n", 120 | "+ 子空间\n", 121 | "+ 超平面\n", 122 | "+ 线性簇\n", 123 | "+ 半空间\n", 124 | "+ $\\mathbb{R}^n$" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "定理4.1 $\\mathbb{R}^n$的凸子集具有如下性质: \n", 132 | "a. 如果$\\Theta$是一个凸集,且$\\beta$是一个实数,则集合\n", 133 | "$$\\beta\\Theta=\\{\\mathbf{x}:\\mathbf{x}=\\beta\\mathbf{v},\\mathbf{v}\\in\\Theta\\}$$\n", 134 | "也是凸集。 \n", 135 | "b. 
如果$\\Theta_1$和$\\Theta_2$都是凸集,则集合\n", 136 | "$$\\Theta_1+\\Theta_2=\\{\\mathbf{x}:\\mathbf{x}=\\mathbf{v}_1+\\mathbf{v}_2,\\mathbf{v}_1\\in\\Theta_1,\\mathbf{v}_2\\in\\Theta_2\\}$$\n", 137 | "也是凸集。 \n", 138 | "c. 任意多个凸集的交集也是凸集。" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "对于凸集$\\Theta$中的点$\\mathbf{x}$,如果不存在$\\Theta$中两个不同的点$\\mathbf{u}$和$\\mathbf{v}$,使得对于某个$\\alpha\\in\\left(0,1\\right)$有$\\mathbf{x}=\\alpha\\mathbf{u}+\\left(1-\\alpha\\right)\\mathbf{v}\\thinspace$,则称点$\\mathbf{x}$是凸集$\\Theta$的极点。" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": { 151 | "collapsed": true 152 | }, 153 | "source": [ 154 | "4.4 邻域" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "点$\\mathbf{x}\\in\\mathbb{R}^n$的邻域可表示为\n", 162 | "$$\\{\\mathbf{y}\\in\\mathbb{R}^n:\\|\\mathbf{y}-\\mathbf{x}\\|<\\varepsilon\\}$$\n", 163 | "其中,$\\varepsilon$为某个正数。邻域也可视为半径为$\\varepsilon$、中心为$\\mathbf{x}$的球体。" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "如果集合$S$包含$\\mathbf{x}$的某个邻域,即$\\mathbf{x}$的某个邻域的所有点都属于$S$,则点$\\mathbf{x}\\in S$称为集合$S$的内点。$S$所有内点的集合称为$S$的内部。 \n", 171 | "如果$\\mathbf{x}$的任意邻域都既包含$S$中的点,也包含$S$外的点,则称$\\mathbf{x}$为集合$S$的边界点。$S$的边界点可能是$S$中的元素,也可能不是$S$中的元素。$S$的所有边界点的集合称为$S$的边界。" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "如果集合$S$包含它的每个点的某个邻域,则该集合是开集。即如果$S$中的每个点都是内点,或$S$不包含任何边界点,则$S$是开集。 \n", 179 | "如果集合$S$包含它的所有边界点,则该集合是闭集。可以证明,一个集合是闭集,当且仅当它的补集是开集。" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "如果一个集合可以被一个有限半径的球体包围,则该集合称为有界集。如果一个集合既是有界集又是闭集,则该集合称为紧集。" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "定理4.2 魏尔斯特拉斯定理 \n", 194 | 
"假设$f:\\Omega\\to\\mathbb{R}$是一个连续函数,其中$\\Omega\\subset\\mathbb{R}^n$是紧集。则必定存在点$x_0\\in\\Omega$,使得对于所有$x\\in\\Omega$都有$f\\left(x_0\\right)\\leqslant f\\left(x\\right)$,即$f$能够在$\\Omega$上取得极小值。" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "4.5 多面体和多胞形" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "令$\\Theta$为一个凸集,$\\mathbf{y}$是$\\Theta$的一个边界点。某个经过点$\\mathbf{y}$的超平面将$\\mathbb{R}^n$空间分为两个半空间,如果$\\Theta$完全位于其中一个半空间,则称该超平面为集合$\\Theta$的支撑超平面。" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "如果一个集合可以表示为有限个半空间的交集,则称该集合为多面体。由于在$\\mathbb{R}^n$空间中的半空间$H_+$和$H_-$都是凸集,且任意多个凸集的交集也是凸集,所以多面体是凸集。 \n", 216 | "一个非空有界多面体称为多胞形。" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "对于每个凸多面体$\\Theta\\subset\\mathbb{R}^n$,都存在一个非负整数$k\\leqslant n$,使得$\\Theta$能够包含在一个维数为$k$的线性簇中,却无法完全包含于$\\mathbb{R}^n$的任意$k-1$维线性簇中。此外,包含$\\Theta$的$k$维线性簇是唯一的,该线性簇称为多面体$\\Theta$的包,$k$称为$\\Theta$的维数。任意$k\\left(k>0\\right)$维多面体的边界,包含有限数量的$k-1$维多面体。" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "如果$k-1$维多面体能构成$k$维多面体的边界,则这个$k-1$维多面体称为$k$维多面体的面。一个多面体的零维面称为顶点,一维面称为棱。" 231 | ] 232 | } 233 | ], 234 | "metadata": { 235 | "kernelspec": { 236 | "display_name": "Python 2", 237 | "language": "python", 238 | "name": "python2" 239 | }, 240 | "language_info": { 241 | "codemirror_mode": { 242 | "name": "ipython", 243 | "version": 2 244 | }, 245 | "file_extension": ".py", 246 | "mimetype": "text/x-python", 247 | "name": "python", 248 | "nbconvert_exporter": "python", 249 | "pygments_lexer": "ipython2", 250 | "version": "2.7.13" 251 | } 252 | }, 253 | "nbformat": 4, 254 | "nbformat_minor": 2 255 | } 256 | -------------------------------------------------------------------------------- /第5章 微积分基础.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "第5章 微积分基础" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "5.1 序列与极限" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "实数序列是一个函数,定义域是自然数$1,2,\\dots,k,\\dots$组成的集合,值域是$\\mathbb{R}$。实数序列可以写成集合$\\{x_1,x_2,\\dots,x_k,\\dots\\}$,常记为$\\{x_k\\}$。" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "如果$x_1<x_2<\\dots<x_k<\\dots$,则序列$\\{x_k\\}$是递增的,即对于所有的$k$,都有$x_k<x_{k+1}$。如果$x_k\\leqslant x_{k+1}$,则该序列是非减的。 \n", 29 | "如果$x_1>x_2>\\dots>x_k>\\dots$,则序列$\\{x_k\\}$是递减的,即对于所有的$k$,都有$x_k>x_{k+1}$。如果$x_k\\geqslant x_{k+1}$,则该序列是非增的。" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "如果对于任意正数$\\varepsilon$,存在一个数$K$,使得对于所有的$k>K$,都有$|x_k-x^*|<\\varepsilon$;即对所有$k>K$,$x_k$位于$x^*-\\varepsilon$和$x^*+\\varepsilon$之间,则称$x^*\\in\\mathbb{R}$为序列$\\{x_k\\}$的极限,记为\n", 37 | "$$x^*=\\lim_{k\\to\\infty} x_k$$\n", 38 | "或\n", 39 | "$$x_k\\to x^*$$" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "如果一个序列存在极限,则该序列称为收敛序列。" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "$\\mathbb{R}^n$中的序列是一个定义域是自然数$1,2,\\dots,k,\\dots$,值域是$\\mathbb{R}^n$的函数,记为$\\{\\mathbf{x}^{\\left(1\\right)},\\mathbf{x}^{\\left(2\\right)},\\dots,\\mathbf{x}^{\\left(k\\right)},\\dots\\}$或$\\{\\mathbf{x}^{\\left(k\\right)}\\}$。" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "如果对于任意正数$\\varepsilon$,存在一个数$K$,使得对于所有的$k>K$,都有$\\|x^{\\left(k\\right)}-x^*\\|<\\varepsilon$,则称$x^*$为序列$\\{x^{\\left(k\\right)}\\}$的极限,记为$x^*=\\lim_{k\\to\\infty} x^{\\left(k\\right)}$或$x^{\\left(k\\right)}\\to x^*$。" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "定理5.1 收敛序列的极限是唯一的。" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | 
"metadata": {}, 73 | "source": [ 74 | "定理5.2 任意收敛序列是有界的。" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "对于$\\mathbb{R}$中的序列$\\{x_k\\}$,如果对于所有$k=1,2,\\dots,$都有$x_k\\leqslant B$,则B称为该序列的上界。称序列$\\{x_k\\}$有上界。 \n", 82 | "对于$\\mathbb{R}$中的序列$\\{x_k\\}$,如果对于所有$k=1,2,\\dots,$都有$x_k\\geqslant B$,则B称为该序列的下界。称序列$\\{x_k\\}$有下界。 \n", 83 | "如果一个序列既有上界,又有下界,则该序列是有界的。" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "如果$\\mathbb{R}$中任意序列$\\{x_k\\}$有上界,则它有上确界(最小上界),即$\\{x_k\\}$上界B的最小值。 \n", 91 | "如果$\\mathbb{R}$中任意序列$\\{x_k\\}$有下界,则它有下确界(最大下界),即$\\{x_k\\}$下界B的最大值。 " 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "如果B是序列$\\{x_k\\}$的上确界,则对于所有$k$,都有$x_k\\leqslant B$,且对于任意$\\varepsilon>0$,都存在$K$,使得$x_K>B-\\varepsilon$。 \n", 99 | "如果B是序列$\\{x_k\\}$的下确界,则对于所有$k$,都有$x_k\\geqslant B$,且对于任意$\\varepsilon>0$,都存在$K$,使得$x_K0$和$\\delta>0$,使得如果$\\|\\mathbf{x}\\|<\\delta,\\mathbf{x}\\in\\Omega$,那么$\\frac{\\|\\mathbf{f}\\left(\\mathbf{x}\\right)\\|}{|g\\left(\\mathbf{x}\\right)|}\\leqslant K$。  \n", 500 | "$2. 
\\mathbf{f}\\left(\\mathbf{x}\\right)=o\\left(\\mathbf{g}\\left(\\mathbf{x}\\right)\\right)$则意味\n", 501 | "$$\\lim_{\\mathbf{x}\\to\\mathbf{0},\\mathbf{x}\\in\\Omega}\\frac{\\|\\mathbf{f}\\left(\\mathbf{x}\\right)\\|}{|g\\left(\\mathbf{x}\\right)|}=0$$" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": {}, 507 | "source": [ 508 | "符号$O\\left(\\mathbf{g}\\left(\\mathbf{x}\\right)\\right)$表示一个在原点$\\mathbf{0}$的邻域上有界的函数,这一边界可通过对函数$\\mathbf{g}\\left(\\mathbf{x}\\right)$进行适当缩放得到。以下函数属于此类:\n", 509 | "* $x=O\\left(x\\right)$\n", 510 | "* $\\begin{bmatrix} x^3 \\\\ 2x^2+3x^4 \\end{bmatrix}=O\\left(x^2\\right)$\n", 511 | "* $\\cos x = O\\left(1\\right)$\n", 512 | "* $\\sin x = O\\left(x\\right)$" 513 | ] 514 | }, 515 | { 516 | "cell_type": "markdown", 517 | "metadata": {}, 518 | "source": [ 519 | "符号$o\\left(g\\left(\\mathbf{x}\\right)\\right)$表示相对于函数$g\\left(\\mathbf{x}\\right)$能够更快地接近于零的函数,即$\\lim_{\\mathbf{x}\\to\\mathbf{0}}\\|o\\left(g\\left(\\mathbf{x}\\right)\\right)\\|/|g\\left(\\mathbf{x}\\right)|=0$。以下函数属于此类:\n", 520 | "* $x^2=o\\left(x\\right)$\n", 521 | "* $\\begin{bmatrix} x^3 \\\\ 2x^2+3x^4 \\end{bmatrix}=o\\left(x\\right)$\n", 522 | "* $x^3=o\\left(x^2\\right)$\n", 523 | "* $x=o\\left(1\\right)$" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "如果$\\mathbf{f}\\left(\\mathbf{x}\\right)=o\\left(g\\left(\\mathbf{x}\\right)\\right)$,则$\\mathbf{f}\\left(\\mathbf{x}\\right)=O\\left(g\\left(\\mathbf{x}\\right)\\right)$ \n", 531 | "如果$\\mathbf{f}\\left(\\mathbf{x}\\right)=O\\left(\\|\\mathbf{x}\\|^p\\right)$,则对于任意$\\varepsilon>0$,有$\\mathbf{f}\\left(\\mathbf{x}\\right)=o\\left(\\|\\mathbf{x}\\|^{p-\\varepsilon}\\right)$。" 532 | ] 533 | }, 534 | { 535 | "cell_type": "markdown", 536 | "metadata": {}, 537 | "source": [ 538 | "已知函数$f\\in\\mathcal{C}^m$,泰勒定理的余项为\n", 539 | "$$R_m=\\frac{h^m}{m!}f^{\\left(m\\right)}\\left(a+\\theta h\\right)$$\n", 540 | 
"其中,$\\theta\\in\\left(0,1\\right)$。将其带入泰勒公式,可得\n", 541 | "$$f\\left(b\\right)=f\\left(a\\right)+\\frac{h}{1!}f^{\\left(1\\right)}\\left(a\\right)+\\frac{h^2}{2!}f^{\\left(2\\right)}\\left(a\\right)+\\dots+\\frac{h^{m-1}}{\\left(m-1\\right)!}f^{\\left(m-1\\right)}\\left(a\\right)+\\frac{h^m}{m!}f^{\\left(m\\right)}\\left(a+\\theta h\\right)$$" 542 | ] 543 | }, 544 | { 545 | "cell_type": "markdown", 546 | "metadata": {}, 547 | "source": [ 548 | "利用$f^{\\left(m\\right)}$的连续性,可知当$h\\to 0$,有$f^{\\left(m\\right)}\\left(a+\\theta h\\right)\\to f^{\\left(m\\right)}\\left(a\\right)$,即$f^{\\left(m\\right)}\\left(a+\\theta h\\right)=f^{\\left(m\\right)}\\left(a\\right)+o\\left(1\\right)$。" 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "metadata": {}, 554 | "source": [ 555 | "因此,\n", 556 | "$$\\frac{h^m}{m!}f^{\\left(m\\right)}\\left(a+\\theta h\\right)=\\frac{h^m}{m!}f^{\\left(m\\right)}\\left(a\\right)+o\\left(h^m\\right)$$\n", 557 | "这里用到了$h^m o\\left(1\\right)=o\\left(h^m\\right)$。因此,泰勒公式可重写为\n", 558 | "$$f\\left(b\\right)=f\\left(a\\right)+\\frac{h}{1!}f^{\\left(1\\right)}\\left(a\\right)+\\frac{h^2}{2!}f^{\\left(2\\right)}\\left(a\\right)+\\dots+\\frac{h^m}{m!}f^{\\left(m\\right)}\\left(a\\right)+o\\left(h^m\\right)$$" 559 | ] 560 | }, 561 | { 562 | "cell_type": "markdown", 563 | "metadata": {}, 564 | "source": [ 565 | "假设$f\\in\\mathcal{C}^{m+1}$,则可将上式中$o\\left(h^m\\right)$替换为$O\\left(h^{m+1}\\right)$。因此,余项为$R_{m+1}$,对应的泰勒公式为\n", 566 | "$$f\\left(b\\right)=f\\left(a\\right)+\\frac{h}{1!}f^{\\left(1\\right)}\\left(a\\right)+\\frac{h^2}{2!}f^{\\left(2\\right)}\\left(a\\right)+\\dots+\\frac{h^{m-1}}{\\left(m-1\\right)!}f^{\\left(m-1\\right)}\\left(a\\right)+\\frac{h^m}{m!}f^{\\left(m\\right)}\\left(a\\right)+R_{m+1}$$\n", 567 | "其中,\n", 568 | "$$R_{m+1}=\\frac{h^{m+1}}{\\left(m+1\\right)!}f^{\\left(m+1\\right)}\\left(a+\\theta'h\\right)$$\n", 569 | "其中,$\\theta'\\in\\left(0,1\\right)$。" 570 | ] 571 | }, 572 | { 573 | "cell_type": "markdown", 574 | 
"metadata": {}, 575 | "source": [ 576 | "由于$f^{\\left(m+1\\right)}$在$\\left[a,b\\right]$上是有界的,可知\n", 577 | "$$R_{m+1}=O\\left(h^{m+1}\\right)$$" 578 | ] 579 | }, 580 | { 581 | "cell_type": "markdown", 582 | "metadata": {}, 583 | "source": [ 584 | "因此,对于函数$f\\in\\mathcal{C}^{m+1}$,泰勒公式为\n", 585 | "$$f\\left(b\\right)=f\\left(a\\right)+\\frac{h}{1!}f^{\\left(1\\right)}\\left(a\\right)+\\frac{h^2}{2!}f^{\\left(2\\right)}\\left(a\\right)+\\dots+\\frac{h^m}{m!}f^{\\left(m\\right)}\\left(a\\right)+O\\left(h^{m+1}\\right)$$" 586 | ] 587 | }, 588 | { 589 | "cell_type": "markdown", 590 | "metadata": {}, 591 | "source": [ 592 | "实值函数$\\mathbf{f}:\\mathbb{R}^n\\to\\mathbb{R}$在点$\\mathbf{x}_0\\in\\mathbb{R}^n$的泰勒展开式: \n", 593 | "假定$\\mathbf{f}\\in\\mathcal{C}^2$,令$\\mathbf{x}$和$\\mathbf{x}_0$表示$\\mathbb{R}^n$中的点,令$\\mathbf{z}\\left(\\alpha\\right)=\\mathbf{x}_0+\\alpha\\left(\\mathbf{x}-\\mathbf{x}_0\\right)/\\|\\mathbf{x}-\\mathbf{x}_0\\|$。定义函数$\\phi:\\mathbb{R}\\to\\mathbb{R}$为 \n", 594 | "$$\\phi\\left(\\alpha\\right)=f\\left(\\mathbf{z}\\left(\\alpha\\right)\\right)=f\\left(\\mathbf{x}_0+\\alpha\\left(\\mathbf{x}-\\mathbf{x}_0\\right)/\\|\\mathbf{x}-\\mathbf{x}_0\\|\\right)$$" 595 | ] 596 | }, 597 | { 598 | "cell_type": "markdown", 599 | "metadata": {}, 600 | "source": [ 601 | "利用链式法则,可的一阶导数为\n", 602 | "$$\\phi'\\left(\\alpha\\right)=\\frac{\\mathrm{d}\\phi}{\\mathrm{d}\\alpha}\\left(\\alpha\\right) \\\\\n", 603 | "\\qquad\\qquad\\qquad\\qquad\\qquad\\qquad\\qquad=Df\\left(\\mathbf{z}\\left(\\alpha\\right)\\right)D\\mathbf{z}\\left(\\alpha\\right)=Df\\left(\\mathbf{z}\\left(\\alpha\\right)\\right)\\frac{\\left(\\mathbf{x}-\\mathbf{x}_0\\right)}{\\|\\mathbf{x}-\\mathbf{x}_0\\|} \\\\\n", 604 | "\\qquad\\qquad\\qquad\\qquad\\qquad=\\left(\\mathbf{x}-\\mathbf{x}_0\\right)^\\top Df\\left(\\mathbf{z}\\left(\\alpha\\right)\\right)^\\top/\\|\\mathbf{x}-\\mathbf{x}_0\\|$$" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": {}, 610 | "source": [ 611 | 
"二阶导数为\n", 612 | "$$\\phi''\\left(\\alpha\\right)=\\frac{\\mathrm{d}^2\\phi}{\\mathrm{d}\\alpha^2}\\left(\\alpha\\right) \\\\\n", 613 | "\\quad\\quad\\quad\\quad\\quad=\\frac{\\mathrm{d}}{\\mathrm{d}\\alpha}\\left(\\frac{\\mathrm{d}\\phi}{\\mathrm{d}\\alpha}\\right)\\left(\\alpha\\right) \\\\\n", 614 | "\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad=\\frac{\\left(\\mathbf{x}-\\mathbf{x}_0\\right)^\\top}{\\|\\mathbf{x}-\\mathbf{x}_0\\|}\\frac{\\mathrm{d}}{\\mathrm{d}\\alpha}Df\\left(\\mathbf{z}\\left(\\alpha\\right)\\right)^\\top \\\\\n", 615 | "\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad=\\frac{\\left(\\mathbf{x}-\\mathbf{x}_0\\right)^\\top}{\\|\\mathbf{x}-\\mathbf{x}_0\\|}D\\left(Df\\right)\\left(\\mathbf{z}\\left(\\alpha\\right)\\right)^\\top\\frac{\\mathrm{d}\\mathbf{z}}{\\mathrm{d}\\alpha}\\left(\\alpha\\right) \\\\\n", 616 | "\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad=\\frac{1}{\\|\\mathbf{x}-\\mathbf{x}_0\\|}\\left(\\mathbf{x}-\\mathbf{x}_0\\right)^\\top D^2f\\left(\\mathbf{z}\\left(\\alpha\\right)\\right)^\\top\\left(\\mathbf{x}-\\mathbf{x}_0\\right) \\\\\n", 617 | "\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad=\\frac{1}{\\|\\mathbf{x}-\\mathbf{x}_0\\|}\\left(\\mathbf{x}-\\mathbf{x}_0\\right)^\\top D^2f\\left(\\mathbf{z}\\left(\\alpha\\right)\\right)\\left(\\mathbf{x}-\\mathbf{x}_0\\right)$$" 618 | ] 619 | }, 620 | { 621 | "cell_type": "markdown", 622 | "metadata": {}, 623 | "source": [ 624 | "由于\n", 625 | "$$f\\left(\\mathbf{x}\\right)=\\phi\\left(\\|\\mathbf{x}-\\mathbf{x}_0\\|\\right) \\\\\n", 626 | "\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad=\\phi\\left(0\\right)+\\frac{\\|\\mathbf{x}-\\mathbf{x}_0\\|}{1!}\\phi'\\left(0\\right)+\\frac{\\|\\mathbf{x}-\\mathbf{x}_0\\|^2}{2!}\\phi''\\left(0\\right)+o\\left(\\|\\mathbf{x}-\\mathbf{x}_0\\|^2\\right)$$\n", 627 | "则\n", 628 | 
"$$f\\left(\\mathbf{x}\\right)=f\\left(\\mathbf{x}_0\\right)+\\frac{1}{1!}Df\\left(\\mathbf{x}_0\\right)\\left(\\mathbf{x}-\\mathbf{x}_0\\right) \\\\\n", 629 | "\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad+\\frac{1}{2!}\\left(\\mathbf{x}-\\mathbf{x}_0\\right)^\\top D^2f\\left(\\mathbf{x}_0\\right)\\left(\\mathbf{x}-\\mathbf{x}_0\\right)+o\\left(\\|\\mathbf{x}-\\mathbf{x}_0\\|^2\\right)$$" 630 | ] 631 | }, 632 | { 633 | "cell_type": "markdown", 634 | "metadata": {}, 635 | "source": [ 636 | "如果函数$\\mathbf{f}\\in\\mathcal{C}^3$,则泰勒公式展开到余项$R_3$,即\n", 637 | "$$f\\left(\\mathbf{x}\\right)=f\\left(\\mathbf{x}_0\\right)+\\frac{1}{1!}Df\\left(\\mathbf{x}_0\\right)\\left(\\mathbf{x}-\\mathbf{x}_0\\right) \\\\\n", 638 | "\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad\\quad+\\frac{1}{2!}\\left(\\mathbf{x}-\\mathbf{x}_0\\right)^\\top D^2f\\left(\\mathbf{x}_0\\right)\\left(\\mathbf{x}-\\mathbf{x}_0\\right)+O\\left(\\|\\mathbf{x}-\\mathbf{x}_0\\|^3\\right)$$" 639 | ] 640 | }, 641 | { 642 | "cell_type": "markdown", 643 | "metadata": {}, 644 | "source": [ 645 | "定理5.9 中值定理 如果函数$\\mathbf{f}:\\mathbb{R}^n\\to\\mathbb{R}^m$在开集$\\Omega\\subset\\mathbb{R}^n$上可微,则对于任意两点$\\mathbf{x},\\mathbf{y}\\in\\Omega$,存在矩阵$\\mathbf{M}$,使得\n", 646 | "$$\\mathbf{f}\\left(\\mathbf{x}\\right)-\\mathbf{f}\\left(\\mathbf{y}\\right)=\\mathbf{M}\\left(\\mathbf{x}-\\mathbf{y}\\right)$$" 647 | ] 648 | }, 649 | { 650 | "cell_type": "markdown", 651 | "metadata": {}, 652 | "source": [ 653 | "将泰勒公式应用与$\\mathbf{f}$中的每个元素即可得到中值定理。针对$\\mathbf{x}$和$\\mathbf{y}$之间的线段中的点计算$\\mathbf{D}\\mathbf{f}$,矩阵$\\mathbf{M}$中的各行就是直接对应$\\mathbf{D}\\mathbf{f}$中的各行。" 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": null, 659 | "metadata": { 660 | "collapsed": true 661 | }, 662 | "outputs": [], 663 | "source": [] 664 | } 665 | ], 666 | "metadata": { 667 | "kernelspec": { 668 | "display_name": "Python 2", 669 | "language": "python", 670 | "name": "python2" 671 | }, 672 | 
"language_info": { 673 | "codemirror_mode": { 674 | "name": "ipython", 675 | "version": 2 676 | }, 677 | "file_extension": ".py", 678 | "mimetype": "text/x-python", 679 | "name": "python", 680 | "nbconvert_exporter": "python", 681 | "pygments_lexer": "ipython2", 682 | "version": "2.7.13" 683 | } 684 | }, 685 | "nbformat": 4, 686 | "nbformat_minor": 2 687 | } 688 | -------------------------------------------------------------------------------- /第6章 集合约束和无约束优化问题的基础知识.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "第6章 集合约束和无约束优化问题的基础知识" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "6.1 引言" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "优化问题\n", 22 | "$$minimize\\; f\\left(\\mathbf{x}\\right) \\\\\n", 23 | "subject\\;to\\;\\mathbf{x}\\in\\Omega$$\n", 24 | "其中,函数$f:\\mathbb{R}^n\\to\\mathbb{R}$称为目标函数或价值函数,是一个实值函数。" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "优化问题的含义是寻找合适的$\\mathbf{x}$,使得函数$f$达到最小。$\\mathbf{x}$是$n$维向量,表示为$\\mathbf{x}=\\left[x_1,x_2,\\dots,x_n\\right]^\\top\\in\\mathbb{R}^n$,$x_1,x_2,\\dots,x_n$相互独立,通常称为决策变量。集合$\\Omega$是$n$维实数空间$\\mathbb{R}^n$的一个子集,称为约束集或可行集,可表示为$\\Omega=\\{\\mathbf{x}:\\mathbf{h}\\left(\\mathbf{x}\\right)=\\mathbf{0},\\mathbf{g}\\left(\\mathbf{x}\\right)\\leqslant\\mathbf{0}\\}$,其中,$\\mathbf{h}$和$\\mathbf{g}$表示由函数组成的向量,称为函数约束。如果$\\Omega=\\mathbb{R}^n$,则该问题是无约束优化问题。" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "定义6.1 存在一个$n$元实值函数$f:\\mathbb{R}^n\\to\\mathbb{R}$,定义域为$\\Omega\\subset\\mathbb{R}^n$。对于定义域$\\Omega$中的点$\\mathbf{x}^*$,如果存在$\\varepsilon>0$,对于所有满足$\\|\\mathbf{x}-\\mathbf{x}^*\\|<\\varepsilon,\\mathbf{x}\\in\\Omega\\backslash 
\\{\\mathbf{x}^*\\}$的向量$\\mathbf{x}$,不等式$f\\left(\\mathbf{x}\\right)\\geqslant f\\left(\\mathbf{x}^*\\right)$都成立,则称$\\mathbf{x}^*$是函数$f$在定义域$\\Omega$上的一个局部极小点。如果对于所有$\\mathbf{x}\\in\\Omega\\backslash\\{\\mathbf{x}^*\\}$,不等式$f\\left(\\mathbf{x}\\right)\\geqslant f\\left(\\mathbf{x}^*\\right)$都成立,则称$\\mathbf{x}^*$是函数$f$在定义域$\\Omega$中的一个全局极小点。" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "如果将上述中的$f\\left(\\mathbf{x}\\right)\\geqslant f\\left(\\mathbf{x}^*\\right)$替换为$f\\left(\\mathbf{x}\\right)> f\\left(\\mathbf{x}^*\\right)$,则局部极小点和全局极小点对应称为严格局部极小点和严格全局极小点。" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "函数$f$在定义域$\\Omega$上的全局极小点$\\mathbf{x}^*$可以表示为$f\\left(\\mathbf{x}^*\\right)=\\min_{\\mathbf{x}\\in\\Omega}f\\left(\\mathbf{x}\\right)$或$\\mathbf{x}^*=\\arg\\min_{\\mathbf{x}\\in\\Omega}f\\left(\\mathbf{x}\\right)$。" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "一阶必要条件可以采用方向导数的形式表示,即对于所有的可行方向$\\mathbf{d}$,都有\n", 60 | "$$\\frac{\\partial f}{\\partial\\mathbf{d}}\\left(\\mathbf{x}^*\\right)\\geqslant0$$\n", 61 | "即对于局部极小点$\\mathbf{x}^*$,在约束集$\\Omega$内,函数$f$的值沿$\\mathbf{x}^*$处任意可行方向$\\mathbf{d}$的增长率都是非负的。" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "6.2 局部极小点的条件" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "定义6.2 对于向量$\\mathbf{d}\\in\\mathbb{R}^n,\\mathbf{d}\\neq\\mathbf{0}$和约束集中的某点$\\mathbf{x}\\in\\Omega$,如果存在一个实数$\\alpha_0>0$,使得对于所有$\\alpha\\in\\left[0,\\alpha_0\\right]$,$\\mathbf{x}+\\alpha\\mathbf{d}$仍然在约束集内,即$\\mathbf{x}+\\alpha\\mathbf{d}\\in\\Omega$,则称$\\mathbf{d}$为$\\mathbf{x}$处的可行方向。" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "$\\mathbf{d}$为$n$元实值函数$f:\\mathbb{R}^n\\to\\mathbb{R}$在$\\mathbf{x}\\in\\Omega$处的可行方向,则函数$f$沿方向$\\mathbf{d}$的方向导数可表示为\n", 83 | 
"$$\\frac{\\partial f}{\\partial\\mathbf{d}}\\left(\\mathbf{x}\\right)=\\lim_{\\alpha\\to 0}\\frac{f\\left(\\mathbf{x}+\\alpha\\mathbf{d}\\right)-f\\left(\\mathbf{x}\\right)}{\\alpha}$$" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "如果$\\|\\mathbf{d}\\|=1$,则方向导数$\\partial f/\\partial\\mathbf{d}$表示的是函数$f$的值在$\\mathbf{x}$处沿方向$\\mathbf{d}$的增长率。" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "设$\\mathbf{x}$和$\\mathbf{d}$已知,则$f\\left(\\mathbf{x}+\\alpha\\mathbf{d}\\right)$成为关于$\\alpha$的函数,有\n", 98 | "$$\\frac{\\partial f}{\\partial\\mathbf{d}}\\left(\\mathbf{x}\\right)=\\frac{\\mathrm{d}}{\\mathrm{d}\\alpha}f\\left(\\mathbf{x}+\\alpha\\mathbf{d}\\right)\\Big|_{\\alpha=0}$$\n", 99 | "应用链式法则,可得\n", 100 | "$$\\frac{\\partial f}{\\partial\\mathbf{d}}\\left(\\mathbf{x}\\right)=\\frac{\\mathrm{d}}{\\mathrm{d}\\alpha}f\\left(\\mathbf{x}+\\alpha\\mathbf{d}\\right)\\Big|_{\\alpha=0}=\\nabla f\\left(\\mathbf{x}\\right)^\\top\\mathbf{d}=\\langle\\nabla f\\left(\\mathbf{x}\\right),\\mathbf{d}\\rangle=\\mathbf{d}^\\top\\nabla f\\left(\\mathbf{x}\\right)$$" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "当$\\mathbf{d}$是一个单位向量$\\left(\\|\\mathbf{d}\\|=1\\right)$时,函数$f$的值在$\\mathbf{x}$处沿方向$\\mathbf{d}$的增长率可用内积$\\langle\\nabla f\\left(\\mathbf{x}\\right),\\mathbf{d}\\rangle$表示。" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "定理6.1 局部极小点的一阶必要条件 多元实值函数$f$在约束集$\\Omega$上一阶连续可微,即$f\\in\\mathcal{C}^1$,约束集$\\Omega$是$\\mathbb{R}^n$的子集。如果$\\mathbf{x}^*$是函数$f$在$\\Omega$上的局部极小点,则对于$\\mathbf{x}^*$处的任意可行方向$\\mathbf{d}$,都有\n", 115 | "$$\\mathbf{d}^\\top\\nabla f\\left(\\mathbf{x}^*\\right)\\geqslant 0$$\n", 116 | "成立。" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "推论6.1 局部极小点位于约束集内部一阶必要条件 
多元实值函数$f$在约束集$\\Omega$上一阶连续可微,即$f\\in\\mathcal{C}^1$,约束集$\\Omega$是$\\mathbb{R}^n$的子集。如果$\\mathbf{x}^*$是函数$f$在$\\Omega$上的局部极小点,且是$\\Omega$的内点,则有\n", 124 | "$$\\nabla f\\left(\\mathbf{x}^*\\right)=\\mathbf{0}$$\n", 125 | "成立。" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "定理6.2 局部极小点的二阶必要条件 多元实值函数$f$在约束集$\\Omega$上二阶连续可微,即$f\\in\\mathcal{C}^2$,约束集$\\Omega$是$\\mathbb{R}^n$的子集。如果$\\mathbf{x}^*$是函数$f$在$\\Omega$上的局部极小点,$\\mathbf{d}$是$\\mathbf{x}^*$处的一个可行方向,且$\\mathbf{d}^\\top\\nabla f\\left(\\mathbf{x}^*\\right)=0$,则有\n", 133 | "$$\\mathbf{d}^\\top\\mathbf{F}\\left(\\mathbf{x}^*\\right)\\mathbf{d}\\geqslant0$$\n", 134 | "其中,$\\mathbf{F}$是函数$f$的黑塞矩阵。" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "推论6.2 局部极小点位于约束集内部二阶必要条件 多元实值函数$f$在约束集$\\Omega$上二阶连续可微,即$f\\in\\mathcal{C}^2$,约束集$\\Omega$是$\\mathbb{R}^n$的子集。如果$\\mathbf{x}^*$是函数$f$在$\\Omega$上的局部极小点,且是$\\Omega$的内点,则有\n", 142 | "$$\\nabla f\\left(\\mathbf{x}^*\\right)=\\mathbf{0}$$\n", 143 | "且黑塞矩阵$\\mathbf{F}\\left(\\mathbf{x}^*\\right)$是半正定的$\\left(\\mathbf{F}\\left(\\mathbf{x}^*\\right)\\geqslant0\\right)$,即对于所有的向量$\\mathbf{d}\\in\\mathbb{R}^n$都有\n", 144 | "$$\\mathbf{d}^\\top\\mathbf{F}\\left(\\mathbf{x}^*\\right)\\mathbf{d}\\geqslant0$$" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": {}, 150 | "source": [ 151 | "定理6.3 局部极小点的二阶充分条件(局部极小点为内点) 多元实值函数$f$在约束集$\\Omega$上二阶连续可微,即$f\\in\\mathcal{C}^2$,$\\mathbf{x}^*$是约束集的一个内点,如果同时满足\n", 152 | "1. $\\nabla f\\left(\\mathbf{x}^*\\right)= 0$ \n", 153 | "2. 
$\\mathbf{F}\\left(\\mathbf{x}^*\\right)>0$ \n", 154 | "\n", 155 | "则$\\mathbf{x}^*$是函数$f$的一个严格局部极小点。" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": { 162 | "collapsed": true 163 | }, 164 | "outputs": [], 165 | "source": [] 166 | } 167 | ], 168 | "metadata": { 169 | "kernelspec": { 170 | "display_name": "Python 2", 171 | "language": "python", 172 | "name": "python2" 173 | }, 174 | "language_info": { 175 | "codemirror_mode": { 176 | "name": "ipython", 177 | "version": 2 178 | }, 179 | "file_extension": ".py", 180 | "mimetype": "text/x-python", 181 | "name": "python", 182 | "nbconvert_exporter": "python", 183 | "pygments_lexer": "ipython2", 184 | "version": "2.7.13" 185 | } 186 | }, 187 | "nbformat": 4, 188 | "nbformat_minor": 2 189 | } 190 | -------------------------------------------------------------------------------- /第7章 一维搜索方法.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "第7章 一维搜索方法" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "7.1 引言" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "目标函数为一元单值函数$f:\\mathbb{R}\\to\\mathbb{R}$时的最小优化问题的迭代求解方法,称为一维搜索法,也称为线性搜索法。" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "迭代算法从初始搜索点$x^{\\left(0\\right)}$出发,产生一个迭代序列$x^{\\left(1\\right)},x^{\\left(2\\right)},\\dots$。在第$k=1,2,\\dots$次迭代中,通过当前迭代点$x^{\\left(k\\right)}$和目标函数$f$构建下一个迭代点$x^{\\left(k+1\\right)}$。算法可能只需要迭代点处的目标函数值,还可能用到目标函数的一阶导数$f'$,甚至二阶导数$f''$。" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "一维搜索算法包括(但不限于):\n", 36 | "* 黄金分割法(只使用目标函数值)\n", 37 | "* 斐波那契数列法(只使用目标函数值)\n", 38 | "* 二分法(只使用目标函数的一阶导数$f'$)\n", 39 | "* 割线法(只使用目标函数的一阶导数$f'$)\n", 40 | "* 牛顿法(同时使用目标函数的一阶导数$f'$和二阶导数$f''$)" 41 | ] 42 | 
}, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "7.2 黄金分割法" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "用于求解一元单值函数$f:\\mathbb{R}\\to\\mathbb{R}$在区间$\\left[a_0,b_0\\right]$上的极小点。前提是目标函数$f$在区间$\\left[a_0,b_0\\right]$上是单峰的,即存在唯一的局部极小点。" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "collapsed": true 61 | }, 62 | "source": [ 63 | "方法思路为选择区间$\\left[a_0,b_0\\right]$中的点,计算对应的目标函数值,通过比较不断缩小极小点所在的区间。目标是利用尽可能少的计算次数来找出函数$f$的极小点,即不断压缩极小点所在的区间,直到达到足够的精度水平。" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": {}, 69 | "source": [ 70 | "每次需要计算两个点处的目标函数值$f$。可以按照对称压缩方式来缩小极小点所在区间,即\n", 71 | "$$a_1-a_0=b_0-b_1=\\rho\\left(b_0-a_0\\right)$$\n", 72 | "其中,$\\rho<\\frac{1}{2}$。" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "计算目标函数在中间点$a_1$和$b_1$处的值,如果$f\\left(a_1\\right)0$,则极小点位于$x^{\\left(0\\right)}$左侧,新区间为$\\left[a_0,x^{\\left(0\\right)}\\right]$,转1;\n", 233 | "3. 如果$f'\\left(x^{\\left(0\\right)}\\right)<0$,则极小点位于$x^{\\left(0\\right)}$右侧,新区间为$\\left[x^{\\left(0\\right)},b_0\\right]$,转1;\n", 234 | "4. 
如果$f'\\left(x^{\\left(0\\right)}\\right)=0$,则$x^{\\left(0\\right)}$为极小点。算法结束。" 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "二分法的总压缩比为$\\left(1/2\\right)^N$。" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "7.5 牛顿法" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": { 254 | "collapsed": true 255 | }, 256 | "source": [ 257 | "假设函数$f$连续二阶可微。可构造经过点$\\left(x^{\\left(k\\right)},f\\left(x^{\\left(k\\right)}\\right)\\right)$处的二次函数\n", 258 | "$$q\\left(x\\right)=f\\left(x^{\\left(k\\right)}\\right)+f'\\left(x^{\\left(k\\right)}\\right)\\left(x-x^{\\left(k\\right)}\\right)+\\frac{1}{2}f''\\left(x^{\\left(k\\right)}\\right)\\left(x-x^{\\left(k\\right)}\\right)^2$$\n", 259 | "显然,有$q\\left(x^{\\left(k\\right)}\\right)=f\\left(x^{\\left(k\\right)}\\right),q'\\left(x^{\\left(k\\right)}\\right)=f'\\left(x^{\\left(k\\right)}\\right),q''\\left(x^{\\left(k\\right)}\\right)=f''\\left(x^{\\left(k\\right)}\\right)$。" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "$q\\left(x\\right)$可认为是函数$f\\left(x\\right)$的近似,因此求函数$f$的极小点可近似于求解$q$的极小点。" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "函数$q$的极小点应满足一阶必要条件:\n", 274 | "$$q'\\left(x\\right)=f'\\left(x^{\\left(k\\right)}\\right)+f''\\left(x^{\\left(k\\right)}\\right)\\left(x-x^{\\left(k\\right)}\\right)=0$$" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "令$x=x^{\\left(k+1\\right)}$,可得\n", 282 | "$$x^{\\left(k+1\\right)}=x^{\\left(k\\right)}-\\frac{f'\\left(x^{\\left(k\\right)}\\right)}{f''\\left(x^{\\left(k\\right)}\\right)}$$" 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "当$f''\\left(x\\right)>0$对于区间内所有点$x$都成立时,牛顿法可收敛到极小点;当$f''\\left(x\\right)<0$在区间内某点$x$成立时,牛顿法可能收敛到极大点;" 290 | ] 291 | }, 
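上面推导的一维牛顿法迭代 $x^{(k+1)}=x^{(k)}-f'(x^{(k)})/f''(x^{(k)})$ 可以用下面这段简短的 Python 代码来示意。这只是一个在若干假设下的草图,并非原笔记中的实现:函数名 `newton_minimize_1d`、参数 `tol` 与 `max_iter`、以步长小于 `tol` 作为停止准则,以及示例目标函数 $f(x)=x^4/4-x$,都是为了说明而引入的假设。

```python
def newton_minimize_1d(df, d2f, x0, tol=1e-8, max_iter=50):
    """Sketch of the 1-D Newton iteration x_{k+1} = x_k - f'(x_k) / f''(x_k).

    df and d2f are callables returning f'(x) and f''(x).
    tol, max_iter and the step-size stopping rule are illustrative assumptions.
    """
    x = float(x0)
    for _ in range(max_iter):
        curvature = d2f(x)
        if curvature == 0:
            # The Newton step is undefined when the second derivative vanishes.
            raise ZeroDivisionError("f''(x) is zero; Newton step undefined")
        step = df(x) / curvature
        x -= step
        if abs(step) < tol:
            break
    return x


# Hypothetical example: f(x) = x**4 / 4 - x, so f'(x) = x**3 - 1 and f''(x) = 3 * x**2.
# The unique stationary point is x = 1, where f''(1) = 3 > 0 (a minimizer).
x_star = newton_minimize_1d(lambda x: x ** 3 - 1.0, lambda x: 3.0 * x ** 2, x0=2.0)
print(x_star)
```

在这个示例中,$f''(x)=3x^2$ 在各迭代点处均为正,因此迭代趋向极小点;如果 $f''$ 在某些迭代点处为负,同样的迭代公式可能会趋向极大点,实际使用时需要另行检查二阶导数的符号。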
292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "Setting $g\\left(x\\right)=f'\\left(x\\right)$ gives the iteration\n", 297 | "$$x^{\\left(k+1\\right)}=x^{\\left(k\\right)}-\\frac{g\\left(x^{\\left(k\\right)}\\right)}{g'\\left(x^{\\left(k\\right)}\\right)}$$\n", 298 | "for solving the equation $g\\left(x\\right)=0$, known as Newton's method of tangents." 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "7.6 Secant Method" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "If the second derivative of $f$ is not available, it can be approximated using first derivatives at different points." 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "Substituting the approximation\n", 320 | "$$f''\\left(x^{\\left(k\\right)}\\right)\\approx\\frac{f'\\left(x^{\\left(k\\right)}\\right)-f'\\left(x^{\\left(k-1\\right)}\\right)}{x^{\\left(k\\right)}-x^{\\left(k-1\\right)}}$$\n", 321 | "into the Newton iteration yields\n", 322 | "$$x^{\\left(k+1\\right)}=x^{\\left(k\\right)}-\\frac{x^{\\left(k\\right)}-x^{\\left(k-1\\right)}}{f'\\left(x^{\\left(k\\right)}\\right)-f'\\left(x^{\\left(k-1\\right)}\\right)}f'\\left(x^{\\left(k\\right)}\\right) \\\\\n", 323 | "=\\frac{f'\\left(x^{\\left(k\\right)}\\right)x^{\\left(k-1\\right)}-f'\\left(x^{\\left(k-1\\right)}\\right)x^{\\left(k\\right)}}{f'\\left(x^{\\left(k\\right)}\\right)-f'\\left(x^{\\left(k-1\\right)}\\right)}$$\n", 324 | "The method requires two initial points, $x^{\\left(-1\\right)}$ and $x^{\\left(0\\right)}$. The secant method uses the secant line between the $\\left(k-1\\right)$th and $k$th iterates to determine the $\\left(k+1\\right)$th iterate." 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "The secant method can also be used to solve the equation $g\\left(x\\right)=0$, with the iteration\n", 332 | "$$x^{\\left(k+1\\right)}=x^{\\left(k\\right)}-\\frac{x^{\\left(k\\right)}-x^{\\left(k-1\\right)}}{g\\left(x^{\\left(k\\right)}\\right)-g\\left(x^{\\left(k-1\\right)}\\right)}g\\left(x^{\\left(k\\right)}\\right) \\\\\n", 333 | 
"=\\frac{g\\left(x^{\\left(k\\right)}\\right)x^{\\left(k-1\\right)}-g\\left(x^{\\left(k-1\\right)}\\right)x^{\\left(k\\right)}}{g\\left(x^{\\left(k\\right)}\\right)-g\\left(x^{\\left(k-1\\right)}\\right)}$$" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "7.7 Bracketing" 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "A method for determining an initial interval (lower and upper bounds) that contains the minimizer of the objective function is called a bracketing method." 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "Pick three points $x_0 < x_1 < x_2$. If $f\\left(x_0\\right) > f\\left(x_1\\right) > f\\left(x_2\\right)$, pick a point $x_3>x_2$ and check whether $f\\left(x_2\\right) < f\\left(x_3\\right)$ holds. [...] $\\gamma>1,\\eta\\in\\left(\\varepsilon,1\\right)$. Requiring\n", 417 | "$$\\phi_k\\left(\\alpha_k\\right)\\leqslant\\phi_k\\left(0\\right)+\\varepsilon\\alpha_k\\phi_k'\\left(0\\right)$$\n", 418 | "ensures that $\\alpha_k$ is not too large. Requiring\n", 419 | "$$\\phi_k\\left(\\gamma\\alpha_k\\right)\\geqslant\\phi_k\\left(0\\right)+\\varepsilon\\gamma\\alpha_k\\phi_k'\\left(0\\right)$$\n", 420 | "ensures that $\\alpha_k$ is not too small." 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "Goldstein condition: \n", 428 | "$$\\phi_k\\left(\\alpha_k\\right)\\geqslant\\phi_k\\left(0\\right)+\\eta\\alpha_k\\phi_k'\\left(0\\right)$$" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "Armijo-Goldstein condition: \n", 436 | "$$\\phi_k\\left(\\alpha_k\\right)\\leqslant\\phi_k\\left(0\\right)+\\varepsilon\\alpha_k\\phi_k'\\left(0\\right) \\\\\n", 437 | "\\phi_k\\left(\\alpha_k\\right)\\geqslant\\phi_k\\left(0\\right)+\\eta\\alpha_k\\phi_k'\\left(0\\right)$$" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "Wolfe condition: \n", 445 | "$$\\phi_k'\\left(\\alpha_k\\right)\\geqslant\\eta\\phi_k'\\left(0\\right)$$" 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "Strong Wolfe condition: \n", 453 | 
"$$|\\phi_k'\\left(\\alpha_k\\right)|\\leqslant\\eta|\\phi_k'\\left(0\\right)|$$" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "Armijo backtracking: \n", 461 | "1. Choose a candidate value $\\alpha^{\\left(0\\right)}$;\n", 462 | "2. If the candidate satisfies the prescribed stopping condition (usually the first inequality of the Armijo condition), take it as the step size and stop;\n", 463 | "3. Otherwise, multiply it by $\\tau\\in\\left(0,1\\right)$ to obtain the new candidate $\\alpha^{\\left(k+1\\right)}=\\tau\\alpha^{\\left(k\\right)}$, and go to step 2." 464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": null, 469 | "metadata": { 470 | "collapsed": true 471 | }, 472 | "outputs": [], 473 | "source": [] 474 | } 475 | ], 476 | "metadata": { 477 | "kernelspec": { 478 | "display_name": "Python 2", 479 | "language": "python", 480 | "name": "python2" 481 | }, 482 | "language_info": { 483 | "codemirror_mode": { 484 | "name": "ipython", 485 | "version": 2 486 | }, 487 | "file_extension": ".py", 488 | "mimetype": "text/x-python", 489 | "name": "python", 490 | "nbconvert_exporter": "python", 491 | "pygments_lexer": "ipython2", 492 | "version": "2.7.13" 493 | } 494 | }, 495 | "nbformat": 4, 496 | "nbformat_minor": 2 497 | } 498 | -------------------------------------------------------------------------------- /第8章 梯度方法.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true 7 | }, 8 | "source": [ 9 | "Chapter 8 Gradient Methods" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "8.1 Introduction" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "If the gradient $\\nabla f\\left(\\mathbf{x}_0\\right)$ of $f$ at $\\mathbf{x}_0$ is not the zero vector, then it is orthogonal to the tangent vector of any smooth curve through $\\mathbf{x}_0$ that lies in the level set $f\\left(\\mathbf{x}\\right)=c$. Hence the direction of maximum rate of increase of a real-valued differentiable function at a point is orthogonal to the level set of the function through that point." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "The rate of increase of $f$ at the point $\\mathbf{x}$ in the direction $\\mathbf{d}$ is $\\langle\\nabla 
f\\left(\\mathbf{x}\\right),\\mathbf{d}\\rangle$, where $\\|\\mathbf{d}\\|=1$. Since $\\|\\mathbf{d}\\|=1$, the Cauchy-Schwarz inequality gives\n", 31 | "$$\\langle\\nabla f\\left(\\mathbf{x}\\right),\\mathbf{d}\\rangle\\leqslant\\|\\nabla f\\left(\\mathbf{x}\\right)\\|$$" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "Taking $\\mathbf{d}=\\nabla f\\left(\\mathbf{x}\\right)/\\|\\nabla f\\left(\\mathbf{x}\\right)\\|$ gives\n", 39 | "$$\\langle\\nabla f\\left(\\mathbf{x}\\right),\\frac{\\nabla f\\left(\\mathbf{x}\\right)}{\\|\\nabla f\\left(\\mathbf{x}\\right)\\|}\\rangle=\\|\\nabla f\\left(\\mathbf{x}\\right)\\|$$" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "Thus the gradient direction $\\nabla f\\left(\\mathbf{x}\\right)$ is the direction of maximum rate of increase of $f$ at $\\mathbf{x}$, and the negative gradient direction $-\\nabla f\\left(\\mathbf{x}\\right)$ is the direction of maximum rate of decrease of $f$ at $\\mathbf{x}$. A small displacement along the gradient direction increases the objective function more than the same displacement in any other direction. When searching for a minimizer, the negative gradient direction is therefore a natural choice." 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "Let $\\mathbf{x}^{\\left(0\\right)}$ be the starting point and construct a new point $\\mathbf{x}^{\\left(0\\right)}-\\alpha\\nabla f\\left(\\mathbf{x}^{\\left(0\\right)}\\right)$ along the negative gradient direction. By Taylor's theorem,\n", 54 | "$$f\\left(\\mathbf{x}^{\\left(0\\right)}-\\alpha\\nabla f\\left(\\mathbf{x}^{\\left(0\\right)}\\right)\\right)=f\\left(\\mathbf{x}^{\\left(0\\right)}\\right)-\\alpha\\|\\nabla f\\left(\\mathbf{x}^{\\left(0\\right)}\\right)\\|^2+o\\left(\\alpha\\right)$$" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "Hence, if $\\nabla f\\left(\\mathbf{x}^{\\left(0\\right)}\\right)\\neq\\mathbf{0}$, then for sufficiently small $\\alpha > 0$,\n", 62 | "$$f\\left( \\mathbf{x}^{\\left(0\\right)}-\\alpha\\nabla f\\left(\\mathbf{x}^{\\left(0\\right)}\\right)\\right) < f\\left(\\mathbf{x}^{\\left(0\\right)}\\right)$$ [...] If, for all $p>0$, $$\\lim_{k\\to\\infty}\\frac{\\|\\mathbf{x}^{\\left(k+1\\right)}-\\mathbf{x}^*\\|}{\\|\\mathbf{x}^{\\left(k\\right)}-\\mathbf{x}^*\\|^p}=0$$\n", 272 | "then the order of convergence is said to be $\\infty$." 273 | ] 274 | }, 275 | { 276 | "cell_type": 
"markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "The higher the order of convergence, the faster the convergence." 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "If $p=1$ (first-order convergence) and $\\lim_{k\\to\\infty}\\|\\mathbf{x}^{\\left(k+1\\right)}-\\mathbf{x}^*\\|/\\|\\mathbf{x}^{\\left(k\\right)}-\\mathbf{x}^*\\|^p=1$, the convergence is said to be sublinear.  \n", 287 | "If $p=1$ (first-order convergence) and $\\lim_{k\\to\\infty}\\|\\mathbf{x}^{\\left(k+1\\right)}-\\mathbf{x}^*\\|/\\|\\mathbf{x}^{\\left(k\\right)}-\\mathbf{x}^*\\|^p<1$, the convergence is said to be linear.  \n", 288 | "If $p>1$, the convergence is said to be superlinear.  \n", 289 | "If $p=2$ (second-order convergence), the convergence is said to be quadratic." 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "metadata": { 296 | "collapsed": true 297 | }, 298 | "outputs": [], 299 | "source": [] 300 | } 301 | ], 302 | "metadata": { 303 | "kernelspec": { 304 | "display_name": "Python 2", 305 | "language": "python", 306 | "name": "python2" 307 | }, 308 | "language_info": { 309 | "codemirror_mode": { 310 | "name": "ipython", 311 | "version": 2 312 | }, 313 | "file_extension": ".py", 314 | "mimetype": "text/x-python", 315 | "name": "python", 316 | "nbconvert_exporter": "python", 317 | "pygments_lexer": "ipython2", 318 | "version": "2.7.13" 319 | } 320 | }, 321 | "nbformat": 4, 322 | "nbformat_minor": 2 323 | } 324 | -------------------------------------------------------------------------------- /第9章 牛顿法.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Chapter 9 Newton's Method" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "9.1 Introduction" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "Newton's method uses both first and second derivatives to determine the search direction. When the initial point is sufficiently close to the minimizer of the objective function, it is more efficient than the method of steepest descent." 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | 
"When the objective function $f:\\mathbb{R}^n\\to\\mathbb{R}$ is twice continuously differentiable, expanding $f$ in a Taylor series about the point $\\mathbf{x}^{\\left(k\\right)}$ (neglecting terms of order three and higher) gives the quadratic approximation\n", 29 | "$$f\\left(\\mathbf{x}\\right)\\approx f\\left(\\mathbf{x}^{\\left(k\\right)}\\right)+\\left(\\mathbf{x}-\\mathbf{x}^{\\left(k\\right)}\\right)^\\top\\mathbf{g}^{\\left(k\\right)}+\\frac{1}{2}\\left(\\mathbf{x}-\\mathbf{x}^{\\left(k\\right)}\\right)^\\top\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)\\left(\\mathbf{x}-\\mathbf{x}^{\\left(k\\right)}\\right)\\triangleq q\\left(\\mathbf{x}\\right)$$" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "With $\\mathbf{g}^{\\left(k\\right)}=\\nabla f\\left(\\mathbf{x}^{\\left(k\\right)}\\right)$ and $\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)$ the Hessian of $f$ at $\\mathbf{x}^{\\left(k\\right)}$, applying the first-order necessary condition for a local minimizer of $q$ gives\n", 37 | "$$\\nabla q\\left(\\mathbf{x}\\right)=\\mathbf{g}^{\\left(k\\right)}+\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)\\left(\\mathbf{x}-\\mathbf{x}^{\\left(k\\right)}\\right)=\\mathbf{0}$$" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "If $\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)>0$, the Newton iteration is\n", 45 | "$$\\mathbf{x}^{\\left(k+1\\right)}=\\mathbf{x}^{\\left(k\\right)}-\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)^{-1}\\mathbf{g}^{\\left(k\\right)}$$" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "9.2 Analysis of Newton's Method" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "When the initial point $\\mathbf{x}^{\\left(0\\right)}$ is far from the minimizer, Newton's method is not necessarily a descent method even if $\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)>0$; that is, it is possible that $f\\left(\\mathbf{x}^{\\left(k+1\\right)}\\right)\\geqslant f\\left(\\mathbf{x}^{\\left(k\\right)}\\right)$." 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "Theorem 9.2 Let $\\{\\mathbf{x}^{\\left(k\\right)}\\}$ be the sequence of iterates generated by Newton's method for minimizing the objective function $f\\left(\\mathbf{x}\\right)$. If $\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)>0$ and $\\mathbf{g}^{\\left(k\\right)}=\\nabla 
f\\left(\\mathbf{x}^{\\left(k\\right)}\\right)\\neq\\mathbf{0}$, then the search direction from the point $\\mathbf{x}^{\\left(k\\right)}$ to the point $\\mathbf{x}^{\\left(k+1\\right)}$,\n", 67 | "$$\\mathbf{d}^{\\left(k\\right)}=-\\mathbf{F}\\left(\\mathbf{x}^{\\left(k\\right)}\\right)^{-1}\\mathbf{g}^{\\left(k\\right)}=\\mathbf{x}^{\\left(k+1\\right)}-\\mathbf{x}^{\\left(k\\right)}$$\n", 68 | "is a descent direction; that is, there exists an $\\bar{\\alpha}>0$ such that for all $\\alpha\\in\\left(0,\\bar{\\alpha}\\right)$,\n", 69 | "$$f\\left(\\mathbf{x}^{\\left(k\\right)}+\\alpha\\mathbf{d}^{\\left(k\\right)}\\right)