├── .gitbook
    └── assets
    │   ├── 10.1.png
    │   ├── 11.1.1.png
    │   ├── 11.1.2.png
    │   ├── 11.2.1.png
    │   ├── 11.2.2.png
    │   ├── 11.2.3.png
    │   ├── 11.2.4.png
    │   ├── 11.2.5.png
    │   ├── 11.3.1.png
    │   ├── 11.3.2.png
    │   ├── 11.4.1.png
    │   ├── 11.4.2.png
    │   ├── 2.2.1.png
    │   ├── 2.3.1.png
    │   ├── 3.2.1.jpg
    │   ├── 3.2.2.png
    │   ├── 3.2.3.png
    │   ├── 3.3.1.png
    │   ├── 3.3.2.png
    │   ├── 3.4.1.png
    │   ├── 3.5.1.png
    │   ├── 3.6.1.png
    │   ├── 4.2.1.png
    │   ├── 4.2.2.png
    │   ├── 4.3.1.png
    │   ├── 5.2.1.png
    │   ├── 6.2.1.png
    │   ├── 6.3.1.png
    │   ├── 6.3.2.png
    │   ├── 7.1.1.png
    │   ├── 7.1.2.png
    │   ├── 7.1.3.png
    │   ├── 8.1.1.png
    │   ├── 8.1.2.png
    │   ├── 8.1.3.png
    │   ├── 8.1.4.png
    │   ├── HMM1.png
    │   ├── HMM2.png
    │   ├── HMM参数.png
    │   ├── K-mean.png
    │   ├── KL1.png
    │   ├── KL2.png
    │   ├── acc.png
    │   ├── myplot.png
    │   ├── 判别函数1.png
    │   ├── 判别函数2.png
    │   ├── 判别函数3.png
    │   ├── 朴素贝叶斯1.png
    │   ├── 聚类1.png
    │   ├── 聚类2.png
    │   └── 贝叶斯.png
├── LICENSE
├── README.md
├── SUMMARY.md
├── di-ba-zhang-ju-lei
    ├── 8.1-ji-ben-gai-nian.md
    ├── 8.2-jing-dian-ju-lei-suan-fa.md
    └── fu-di-ba-zhang-zuo-ye.md
├── di-er-zhang-sheng-cheng-shi-fen-lei-qi
    ├── 2.1-mo-shi-shi-bie-yu-ji-qi-xue-xi-de-mu-biao.md
    ├── 2.2-zheng-tai-fen-bu-mo-shi-de-bei-ye-si-fen-lei-qi.md
    ├── 2.3-jun-zhi-xiang-liang-he-xie-fang-cha-ju-zhen-de-can-shu-gu-ji.md
    └── fu-di-er-zhang-zuo-ye.md
├── di-jiu-zhang-jiang-wei
    ├── 9.1-ji-ben-gai-nian.md
    ├── 9.2-wei-du-xuan-ze.md
    └── 9.3-wei-du-chou-qu.md
├── di-liu-zhang-you-jian-du-xue-xi
    ├── 6.1-you-jian-du-xue-xi.md
    ├── 6.2-hui-gui-ren-wu.md
    ├── 6.3-fen-lei-wen-ti.md
    └── fu-di-liu-zhang-zuo-ye.md
├── di-qi-zhang-zhi-chi-xiang-liang-ji
    ├── 7.1-xian-xing-zhi-chi-xiang-liang-ji.md
    ├── 7.2-he-zhi-chi-xiang-liang-ji.md
    ├── 7.3-xu-lie-zui-xiao-you-hua-suan-fa.md
    └── fu-di-qi-zhang-zuo-ye.md
├── di-san-zhang-pan-bie-shi-fen-lei-qi
    ├── 3.1-pan-bie-shi-fen-lei-qi-yu-sheng-cheng-shi-fen-lei-qi.md
    ├── 3.2-xian-xing-pan-bie-han-shu.md
    ├── 3.3-guang-yi-xian-xing-pan-bie-han-shu.md
    ├── 3.4-fisher-xian-xing-pan-bie.md
    ├── 3.5-gan-zhi-qi-suan-fa.md
    ├── 3.6-ke-xun-lian-de-que-ding-xing-fen-lei-qi-de-die-dai-suan-fa.md
    ├── 3.7-shi-han-shu-fa.md
    ├── 3.8-jue-ce-shu.md
    └── fu-di-san-zhang-zuo-ye.md
├── di-shi-er-zhang-ji-cheng-xue-xi
    ├── 12.1-jian-jie.md
    ├── 12.2-bagging.md
    ├── 12.3-boosting.md
    └── fu-di-shi-er-zhang-zuo-ye.md
├── di-shi-yi-zhang-gai-shuai-tu-mo-xing
    ├── 11.1-pgm-jian-jie.md
    ├── 11.2-you-xiang-tu-mo-xing-bei-ye-si-wang-luo.md
    ├── 11.3-wu-xiang-tu-mo-xing-ma-er-ke-fu-sui-ji-chang.md
    ├── 11.4-xue-xi-he-tui-duan.md
    ├── 11.5-dian-xing-gai-shuai-tu-mo-xing.md
    └── fu-di-shi-yi-zhang-zuo-ye.md
├── di-shi-zhang-ban-jian-du-xue-xi
    ├── 10.1-ji-ben-gai-nian.md
    └── 10.2-ban-jian-du-xue-xi-suan-fa.md
├── di-si-zhang-te-zheng-xuan-ze-he-ti-qu
    ├── 4.1-mo-shi-lei-bie-ke-fen-xing-de-ce-du.md
    ├── 4.2-te-zheng-xuan-ze.md
    ├── 4.3-li-san-kl-bian-huan.md
    └── fu-di-si-zhang-zuo-ye.md
├── di-wu-zhang-tong-ji-ji-qi-xue-xi
    ├── 5.1-ji-qi-xue-xi-jian-jie.md
    └── 5.2-tong-ji-ji-qi-xue-xi.md
├── di-yi-zhang-gai-shu
    └── 1.1-gai-shu.md
├── pull.bat
└── update.bat


/.gitbook/assets/10.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/10.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.1.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.1.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.1.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.1.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.2.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.2.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.2.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.2.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.2.3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.2.3.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.2.4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.2.4.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.2.5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.2.5.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.3.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.3.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.3.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.3.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.4.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.4.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/11.4.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/11.4.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/2.2.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/2.2.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/2.3.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/2.3.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/3.2.1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/3.2.1.jpg


--------------------------------------------------------------------------------
/.gitbook/assets/3.2.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/3.2.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/3.2.3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/3.2.3.png


--------------------------------------------------------------------------------
/.gitbook/assets/3.3.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/3.3.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/3.3.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/3.3.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/3.4.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/3.4.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/3.5.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/3.5.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/3.6.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/3.6.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/4.2.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/4.2.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/4.2.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/4.2.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/4.3.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/4.3.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/5.2.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/5.2.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/6.2.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/6.2.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/6.3.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/6.3.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/6.3.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/6.3.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/7.1.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/7.1.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/7.1.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/7.1.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/7.1.3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/7.1.3.png


--------------------------------------------------------------------------------
/.gitbook/assets/8.1.1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/8.1.1.png


--------------------------------------------------------------------------------
/.gitbook/assets/8.1.2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/8.1.2.png


--------------------------------------------------------------------------------
/.gitbook/assets/8.1.3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/8.1.3.png


--------------------------------------------------------------------------------
/.gitbook/assets/8.1.4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/8.1.4.png


--------------------------------------------------------------------------------
/.gitbook/assets/HMM1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/HMM1.png


--------------------------------------------------------------------------------
/.gitbook/assets/HMM2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/HMM2.png


--------------------------------------------------------------------------------
/.gitbook/assets/HMM参数.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/HMM参数.png


--------------------------------------------------------------------------------
/.gitbook/assets/K-mean.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/K-mean.png


--------------------------------------------------------------------------------
/.gitbook/assets/KL1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/KL1.png


--------------------------------------------------------------------------------
/.gitbook/assets/KL2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/KL2.png


--------------------------------------------------------------------------------
/.gitbook/assets/acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/acc.png


--------------------------------------------------------------------------------
/.gitbook/assets/myplot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/myplot.png


--------------------------------------------------------------------------------
/.gitbook/assets/判别函数1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/判别函数1.png


--------------------------------------------------------------------------------
/.gitbook/assets/判别函数2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/判别函数2.png


--------------------------------------------------------------------------------
/.gitbook/assets/判别函数3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/判别函数3.png


--------------------------------------------------------------------------------
/.gitbook/assets/朴素贝叶斯1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/朴素贝叶斯1.png


--------------------------------------------------------------------------------
/.gitbook/assets/聚类1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/聚类1.png


--------------------------------------------------------------------------------
/.gitbook/assets/聚类2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/聚类2.png


--------------------------------------------------------------------------------
/.gitbook/assets/贝叶斯.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Aye10032/PRMLNote/bddb83ab3aedb7f91490312f5d53ee1f7812b06b/.gitbook/assets/贝叶斯.png


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |                     GNU GENERAL PUBLIC LICENSE
  2 |                        Version 3, 29 June 2007
  3 | 
  4 |  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  5 |  Everyone is permitted to copy and distribute verbatim copies
  6 |  of this license document, but changing it is not allowed.
  7 | 
  8 |                             Preamble
  9 | 
 10 |   The GNU General Public License is a free, copyleft license for
 11 | software and other kinds of works.
 12 | 
 13 |   The licenses for most software and other practical works are designed
 14 | to take away your freedom to share and change the works.  By contrast,
 15 | the GNU General Public License is intended to guarantee your freedom to
 16 | share and change all versions of a program--to make sure it remains free
 17 | software for all its users.  We, the Free Software Foundation, use the
 18 | GNU General Public License for most of our software; it applies also to
 19 | any other work released this way by its authors.  You can apply it to
 20 | your programs, too.
 21 | 
 22 |   When we speak of free software, we are referring to freedom, not
 23 | price.  Our General Public Licenses are designed to make sure that you
 24 | have the freedom to distribute copies of free software (and charge for
 25 | them if you wish), that you receive source code or can get it if you
 26 | want it, that you can change the software or use pieces of it in new
 27 | free programs, and that you know you can do these things.
 28 | 
 29 |   To protect your rights, we need to prevent others from denying you
 30 | these rights or asking you to surrender the rights.  Therefore, you have
 31 | certain responsibilities if you distribute copies of the software, or if
 32 | you modify it: responsibilities to respect the freedom of others.
 33 | 
 34 |   For example, if you distribute copies of such a program, whether
 35 | gratis or for a fee, you must pass on to the recipients the same
 36 | freedoms that you received.  You must make sure that they, too, receive
 37 | or can get the source code.  And you must show them these terms so they
 38 | know their rights.
 39 | 
 40 |   Developers that use the GNU GPL protect your rights with two steps:
 41 | (1) assert copyright on the software, and (2) offer you this License
 42 | giving you legal permission to copy, distribute and/or modify it.
 43 | 
 44 |   For the developers' and authors' protection, the GPL clearly explains
 45 | that there is no warranty for this free software.  For both users' and
 46 | authors' sake, the GPL requires that modified versions be marked as
 47 | changed, so that their problems will not be attributed erroneously to
 48 | authors of previous versions.
 49 | 
 50 |   Some devices are designed to deny users access to install or run
 51 | modified versions of the software inside them, although the manufacturer
 52 | can do so.  This is fundamentally incompatible with the aim of
 53 | protecting users' freedom to change the software.  The systematic
 54 | pattern of such abuse occurs in the area of products for individuals to
 55 | use, which is precisely where it is most unacceptable.  Therefore, we
 56 | have designed this version of the GPL to prohibit the practice for those
 57 | products.  If such problems arise substantially in other domains, we
 58 | stand ready to extend this provision to those domains in future versions
 59 | of the GPL, as needed to protect the freedom of users.
 60 | 
 61 |   Finally, every program is threatened constantly by software patents.
 62 | States should not allow patents to restrict development and use of
 63 | software on general-purpose computers, but in those that do, we wish to
 64 | avoid the special danger that patents applied to a free program could
 65 | make it effectively proprietary.  To prevent this, the GPL assures that
 66 | patents cannot be used to render the program non-free.
 67 | 
 68 |   The precise terms and conditions for copying, distribution and
 69 | modification follow.
 70 | 
 71 |                        TERMS AND CONDITIONS
 72 | 
 73 |   0. Definitions.
 74 | 
 75 |   "This License" refers to version 3 of the GNU General Public License.
 76 | 
 77 |   "Copyright" also means copyright-like laws that apply to other kinds of
 78 | works, such as semiconductor masks.
 79 | 
 80 |   "The Program" refers to any copyrightable work licensed under this
 81 | License.  Each licensee is addressed as "you".  "Licensees" and
 82 | "recipients" may be individuals or organizations.
 83 | 
 84 |   To "modify" a work means to copy from or adapt all or part of the work
 85 | in a fashion requiring copyright permission, other than the making of an
 86 | exact copy.  The resulting work is called a "modified version" of the
 87 | earlier work or a work "based on" the earlier work.
 88 | 
 89 |   A "covered work" means either the unmodified Program or a work based
 90 | on the Program.
 91 | 
 92 |   To "propagate" a work means to do anything with it that, without
 93 | permission, would make you directly or secondarily liable for
 94 | infringement under applicable copyright law, except executing it on a
 95 | computer or modifying a private copy.  Propagation includes copying,
 96 | distribution (with or without modification), making available to the
 97 | public, and in some countries other activities as well.
 98 | 
 99 |   To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies.  Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 | 
103 |   An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License.  If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 | 
112 |   1. Source Code.
113 | 
114 |   The "source code" for a work means the preferred form of the work
115 | for making modifications to it.  "Object code" means any non-source
116 | form of a work.
117 | 
118 |   A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 | 
123 |   The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form.  A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 | 
134 |   The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities.  However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work.  For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 | 
147 |   The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 | 
151 |   The Corresponding Source for a work in source code form is that
152 | same work.
153 | 
154 |   2. Basic Permissions.
155 | 
156 |   All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met.  This License explicitly affirms your unlimited
159 | permission to run the unmodified Program.  The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work.  This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 | 
164 |   You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force.  You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright.  Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 | 
175 |   Conveying under any other circumstances is permitted solely under
176 | the conditions stated below.  Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 | 
179 |   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 | 
181 |   No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 | 
187 |   When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 | 
195 |   4. Conveying Verbatim Copies.
196 | 
197 |   You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 | 
205 |   You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 | 
208 |   5. Conveying Modified Source Versions.
209 | 
210 |   You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 | 
214 |     a) The work must carry prominent notices stating that you modified
215 |     it, and giving a relevant date.
216 | 
217 |     b) The work must carry prominent notices stating that it is
218 |     released under this License and any conditions added under section
219 |     7.  This requirement modifies the requirement in section 4 to
220 |     "keep intact all notices".
221 | 
222 |     c) You must license the entire work, as a whole, under this
223 |     License to anyone who comes into possession of a copy.  This
224 |     License will therefore apply, along with any applicable section 7
225 |     additional terms, to the whole of the work, and all its parts,
226 |     regardless of how they are packaged.  This License gives no
227 |     permission to license the work in any other way, but it does not
228 |     invalidate such permission if you have separately received it.
229 | 
230 |     d) If the work has interactive user interfaces, each must display
231 |     Appropriate Legal Notices; however, if the Program has interactive
232 |     interfaces that do not display Appropriate Legal Notices, your
233 |     work need not make them do so.
234 | 
235 |   A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit.  Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 | 
245 |   6. Conveying Non-Source Forms.
246 | 
247 |   You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 | 
252 |     a) Convey the object code in, or embodied in, a physical product
253 |     (including a physical distribution medium), accompanied by the
254 |     Corresponding Source fixed on a durable physical medium
255 |     customarily used for software interchange.
256 | 
257 |     b) Convey the object code in, or embodied in, a physical product
258 |     (including a physical distribution medium), accompanied by a
259 |     written offer, valid for at least three years and valid for as
260 |     long as you offer spare parts or customer support for that product
261 |     model, to give anyone who possesses the object code either (1) a
262 |     copy of the Corresponding Source for all the software in the
263 |     product that is covered by this License, on a durable physical
264 |     medium customarily used for software interchange, for a price no
265 |     more than your reasonable cost of physically performing this
266 |     conveying of source, or (2) access to copy the
267 |     Corresponding Source from a network server at no charge.
268 | 
269 |     c) Convey individual copies of the object code with a copy of the
270 |     written offer to provide the Corresponding Source.  This
271 |     alternative is allowed only occasionally and noncommercially, and
272 |     only if you received the object code with such an offer, in accord
273 |     with subsection 6b.
274 | 
275 |     d) Convey the object code by offering access from a designated
276 |     place (gratis or for a charge), and offer equivalent access to the
277 |     Corresponding Source in the same way through the same place at no
278 |     further charge.  You need not require recipients to copy the
279 |     Corresponding Source along with the object code.  If the place to
280 |     copy the object code is a network server, the Corresponding Source
281 |     may be on a different server (operated by you or a third party)
282 |     that supports equivalent copying facilities, provided you maintain
283 |     clear directions next to the object code saying where to find the
284 |     Corresponding Source.  Regardless of what server hosts the
285 |     Corresponding Source, you remain obligated to ensure that it is
286 |     available for as long as needed to satisfy these requirements.
287 | 
288 |     e) Convey the object code using peer-to-peer transmission, provided
289 |     you inform other peers where the object code and Corresponding
290 |     Source of the work are being offered to the general public at no
291 |     charge under subsection 6d.
292 | 
293 |   A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 | 
297 |   A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling.  In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage.  For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product.  A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 | 
310 |   "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source.  The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 | 
318 |   If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information.  But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 | 
329 |   The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed.  Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 | 
337 |   Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 | 
343 |   7. Additional Terms.
344 | 
345 |   "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law.  If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 | 
354 |   When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it.  (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.)  You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 | 
361 |   Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 | 
365 |     a) Disclaiming warranty or limiting liability differently from the
366 |     terms of sections 15 and 16 of this License; or
367 | 
368 |     b) Requiring preservation of specified reasonable legal notices or
369 |     author attributions in that material or in the Appropriate Legal
370 |     Notices displayed by works containing it; or
371 | 
372 |     c) Prohibiting misrepresentation of the origin of that material, or
373 |     requiring that modified versions of such material be marked in
374 |     reasonable ways as different from the original version; or
375 | 
376 |     d) Limiting the use for publicity purposes of names of licensors or
377 |     authors of the material; or
378 | 
379 |     e) Declining to grant rights under trademark law for use of some
380 |     trade names, trademarks, or service marks; or
381 | 
382 |     f) Requiring indemnification of licensors and authors of that
383 |     material by anyone who conveys the material (or modified versions of
384 |     it) with contractual assumptions of liability to the recipient, for
385 |     any liability that these contractual assumptions directly impose on
386 |     those licensors and authors.
387 | 
388 |   All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10.  If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term.  If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 | 
398 |   If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 | 
403 |   Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 | 
407 |   8. Termination.
408 | 
409 |   You may not propagate or modify a covered work except as expressly
410 | provided under this License.  Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 | 
415 |   However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 | 
422 |   Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 | 
429 |   Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License.  If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 | 
435 |   9. Acceptance Not Required for Having Copies.
436 | 
437 |   You are not required to accept this License in order to receive or
438 | run a copy of the Program.  Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance.  However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work.  These actions infringe copyright if you do
443 | not accept this License.  Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 | 
446 |   10. Automatic Licensing of Downstream Recipients.
447 | 
448 |   Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License.  You are not responsible
451 | for enforcing compliance by third parties with this License.
452 | 
453 |   An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations.  If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 | 
463 |   You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License.  For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 | 
471 |   11. Patents.
472 | 
473 |   A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based.  The
475 | work thus licensed is called the contributor's "contributor version".
476 | 
477 |   A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version.  For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 | 
487 |   Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 | 
492 |   In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement).  To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 | 
499 |   If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients.  "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 | 
513 |   If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 | 
521 |   A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License.  You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 | 
536 |   Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 | 
540 |   12. No Surrender of Others' Freedom.
541 | 
542 |   If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License.  If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all.  For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 | 
552 |   13. Use with the GNU Affero General Public License.
553 | 
554 |   Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work.  The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 | 
563 |   14. Revised Versions of this License.
564 | 
565 |   The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time.  Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 | 
570 |   Each version is given a distinguishing version number.  If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation.  If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 | 
579 |   If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 | 
584 |   Later license versions may give you additional or different
585 | permissions.  However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 | 
589 |   15. Disclaimer of Warranty.
590 | 
591 |   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 | 
600 |   16. Limitation of Liability.
601 | 
602 |   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 | 
612 |   17. Interpretation of Sections 15 and 16.
613 | 
614 |   If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 | 
621 |                      END OF TERMS AND CONDITIONS
622 | 
623 |             How to Apply These Terms to Your New Programs
624 | 
625 |   If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 | 
629 |   To do so, attach the following notices to the program.  It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 | 
634 |     <one line to give the program's name and a brief idea of what it does.>
635 |     Copyright (C) <year>  <name of author>
636 | 
637 |     This program is free software: you can redistribute it and/or modify
638 |     it under the terms of the GNU General Public License as published by
639 |     the Free Software Foundation, either version 3 of the License, or
640 |     (at your option) any later version.
641 | 
642 |     This program is distributed in the hope that it will be useful,
643 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
644 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
645 |     GNU General Public License for more details.
646 | 
647 |     You should have received a copy of the GNU General Public License
648 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
649 | 
650 | Also add information on how to contact you by electronic and paper mail.
651 | 
652 |   If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 | 
655 |     <program>  Copyright (C) <year>  <name of author>
656 |     This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 |     This is free software, and you are welcome to redistribute it
658 |     under certain conditions; type `show c' for details.
659 | 
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License.  Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 | 
664 |   You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 | 
669 |   The GNU General Public License does not permit incorporating your program
670 | into proprietary programs.  If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library.  If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License.  But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # 课程概况
  2 | 
  3 | **课程编号：** 180086081200P1001H-1
  4 | 
  5 | **课时：** 60
  6 | 
  7 | **学分：** 3.00
  8 | 
  9 | **课程属性：** 学科核心课
 10 | 
 11 | **主讲教师：** 黄庆明等
 12 | 
 13 | **课程名称：** 模式识别与机器学习23-24秋季
 14 | 
 15 | **课程英文名称：** Pattern Recognition and Machine Learning
 16 | 
 17 | **教学目的、要求**
 18 | 
 19 | 本课程为计算机应用技术专业硕士研究生的专业核心课，同时也可作为计算机其它专业及电子、自动化等学科研究生的专业普及课。模式识别是研究用机器代替人去识别、辨识客观事物的学科,而机器学习则研究如何构造算法从数据中学习或对数据做出预测。二者均具有系统的理论和方法,并有着千丝万缕的联系，也都在近年来得到了广泛的应用，成为人工智能领域最热门的学科方向之一。本课程将讲述模式识别与机器学习的基本概念、基本理论和方法、关键算法原理以及典型应用情况。预期通过理论讲授与应用案例分析，为本学科研究生今后从事相关方向的深入研究打下坚实的理论基础，为其他学科的研究生应用相关方法和技术提供理论和实践指导。要求学生不仅能够切实掌握本课程的基本理论和方法，并能在解决实际问题时灵活应用所学知识。
 20 | 
 21 | **预修课程**
 22 | 
 23 | 高等数学，线性代数，概率论与数理统计
 24 | 
 25 | **教材**
 26 | 
 27 | 无
 28 | 
 29 | **主要内容**
 30 | 
 31 | * 第一章 概述 3学时 黄庆明
 32 |   * 第1节 课程简介
 33 |   * 第2节 模式识别和机器学习的基本概念
 34 |   * 第3节 模式识别和机器学习发展简史
 35 |   * 第4节 模式识别和机器学习方法
 36 |   * 第5节 模式识别和机器学习系统
 37 |   * 第6节 模式识别和机器学习应用
 38 |   * 第7节 相关数学基础
 39 | * 第二章 生成式分类器 3学时 黄庆明
 40 |   * 第1节 贝叶斯判别准则
 41 |   * 第2节 最小风险判别
 42 |   * 第3节 朴素贝叶斯分类器
 43 |   * 第4节 概率分布参数估计（极大似然估计和贝叶斯估计）
 44 |   * 第5节 正态分布模式的贝叶斯分类器
 45 |   * 第6节 正态分布参数估计（均值和协方差矩阵）
 46 | * 第三章 判别式分类器 6学时 黄庆明
 47 |   * 第1节 判别式分类器 vs. 生成式分类器
 48 |   * 第2节 线性判别函数
 49 |   * 第3节 广义线性判别函数
 50 |   * 第4节 分段线性判别函数
 51 |   * 第5节 Fisher线性判别
 52 |   * 第6节 感知器算法
 53 |   * 第7节 最小平方误差(LMSE)算法
 54 |   * 第8节 决策树
 55 | * 第四章 特征提取 3学时 黄庆明
 56 |   * 第1节 特征选择
 57 |   * 第2节 特征变换
 58 | * 第五章 统计学习理论基础 3学时 苏荔
 59 |   * 第1节 统计学习框架
 60 |   * 第2节 经验风险与期望风险
 61 |   * 第3节 测试误差估计
 62 |   * 第4节 正则化方法
 63 |   * 第5节 偏差—方差分析
 64 |   * 第6节 统计学习理论
 65 | * 第六章 线性模型 3学时 苏荔
 66 |   * 第1节 线性回归模型
 67 |   * 第2节 逻辑回归模型
 68 | * 第七章 支持向量机 3学时 苏荔
 69 |   * 第1节 线性支持向量机
 70 |   * 第2节 软间隔的支持向量机
 71 |   * 第3节 核方法支持向量机
 72 |   * 第4节 支持向量回归
 73 | * 第八章 聚类 3学时 苏荔
 74 |   * 第1节 无监督学习与有监督学习对比
 75 |   * 第2节 距离计算
 76 |   * 第3节 聚类算法的评价方法
 77 |   * 第4节 经典聚类方法
 78 | * 第九章 降维 3学时 苏荔
 79 |   * 第1节 线性降维技术
 80 |   * 第2节 全局结构保持降维方法
 81 |   * 第3节 局部结构保持降维方法
 82 | * 第十章 半监督学习 5学时 苏荔
 83 |   * 第1节 自我训练
 84 |   * 第2节 多视角学习
 85 |   * 第3节 生成模型
 86 |   * 第4节 S3VMs
 87 |   * 第5节 基于图的算法
 88 |   * 第6节 半监督聚类
 89 | * 第十一章 概率图模型 4学时 苏荔
 90 |   * 第1节 有向概率图模型
 91 |   * 第2节 无向概率图模型
 92 |   * 第3节 学习和推断
 93 |   * 第4节 经典概率图模型（HMM\&CRF）
 94 | * 第十二章 集成学习 3学时 苏荔
 95 |   * 第1节 Bagging和随机森林
 96 |   * 第2节 Boosting和GBDT
 97 | * 第十三章 深度学习及应用 12学时 苏荔
 98 |   * 第1节 人工神经网络的生物原型
 99 |   * 第2节 生物视觉系统简介
100 |   * 第3节 卷积神经网络CNN源起与概述
101 |   * 第4节 典型卷积神经网络结构
102 |   * 第5节 循环神经网络
103 |   * 第6节 反向传播算法介绍
104 |   * 第7节 深度模型训练技巧
105 |   * 第8节 深度模型应用
106 |   * 第9节 深度学习新进展
107 | * 第十四章 课程复习 3学时 苏荔
108 |   * 第1节 课程复习
109 | * 第十五章 期末考试 3学时 苏荔
110 |   * 第1节 期末考试
111 | 
112 | **参考用书**
113 | 
114 | 1. 模式识别（模式识别与机器学习（第4版）） 张学工、汪小我 2021年9月
115 | 2. 机器学习从原理到应用 卿来云、黄庆明 2020年10月
116 | 3. 机器学习 周志华 2016年1月
117 | 4. 统计学习方法（第2版） 李航 2019年5月
118 | 5. 神经网络与深度学习 邱锡鹏 2020年5月
119 | 
120 | **课程教师信息**
121 | 
122 | 1. 黄庆明，中国科学院大学计算机科学与技术学院副院长、网络空间安全学院副院长（兼），教授（二级）、博士生导师，中国科学院计算技术研究所客座研究员、博士生导师，中国科学院信息工程研究所客座研究员、博士生导师。国家杰出青年科学基金获得者，百千万人才工程国家级人选并被授予“有突出贡献中青年专家”荣誉称号，享受国务院政府特殊津贴。现为IEEE Fellow，IEEE CASS北京分会主席，CCF理事，CCF会士，CCF多媒体技术专业委员会副主任，中国图象图形学学会常务理事，北京图象图形学学会副理事长。 主要研究方向为模式识别、多媒体计算、图像与视频分析、计算机视觉、机器学习等，主持承担了国家科技创新2030-”新一代人工智能“重大项目、国家自然科学基金重点项目和重点国际合作项目、863课题、973课题等国家和省部级项目的研究工作，在国内外权威期刊和重要国际会议上发表学术论文500余篇，申请国内外发明专利50余项，相关研究成果多次获得省部级奖励。自2004年起先后讲授了“模式识别”、“模式识别在图像与视频分析中的应用”、“模式识别与机器学习”、“视觉信息学习与分析”等课程。
123 | 2. 李国荣， 中国科学院大学计算机科学与技术学院，副教授，硕士生导师，中国科学院青促会会员。主要研究方向为图像与视频分析、多媒体内容分析与检索、模式识别等，已在CVPR、ECCV、ICCV、AAAI、ICDM、ACM Multimedia等相关国际权威会议和期刊上发表论文40余篇。作为项目负责人或研究骨干，参与了包括国家973课题、国家自然基金重点项目、国家自然科学基金国际合作；作为项目负责人，承担了国家自然基金面上和青年项目、中国博士后基金特等多项国家和省部级项目的研究。
124 | 3. 苏荔，中国科学院大学计算机科学与技术学院，教授，博士生导师，中国计算机学会(CCF)多媒体专委会委员，中国图象图形学学会(CSIG)多媒体专委会委员，中国数字音视频编解码技术标准(AVS)工作组成员。主要研究方向为多媒体计算、模式识别与机器学习等。自2009年起先后主讲了我校“数字图像处理”、“模式识别‘、“模式识别与机器学习”、“多媒体技术”、“图像处理与计算机视觉”等研究生专业课程。
125 | 


--------------------------------------------------------------------------------
/SUMMARY.md:
--------------------------------------------------------------------------------
 1 | # Table of contents
 2 | 
 3 | * [课程概况](README.md)
 4 | 
 5 | ## 第一章 概述
 6 | 
 7 | * [1.1 概述](di-yi-zhang-gai-shu/1.1-gai-shu.md)
 8 | 
 9 | ## 第二章 生成式分类器
10 | 
11 | * [2.1 模式识别与机器学习的目标](di-er-zhang-sheng-cheng-shi-fen-lei-qi/2.1-mo-shi-shi-bie-yu-ji-qi-xue-xi-de-mu-biao.md)
12 | * [2.2 正态分布模式的贝叶斯分类器](di-er-zhang-sheng-cheng-shi-fen-lei-qi/2.2-zheng-tai-fen-bu-mo-shi-de-bei-ye-si-fen-lei-qi.md)
13 | * [2.3 均值向量和协方差矩阵的参数估计](di-er-zhang-sheng-cheng-shi-fen-lei-qi/2.3-jun-zhi-xiang-liang-he-xie-fang-cha-ju-zhen-de-can-shu-gu-ji.md)
14 | * [附 第二章作业](di-er-zhang-sheng-cheng-shi-fen-lei-qi/fu-di-er-zhang-zuo-ye.md)
15 | 
16 | ## 第三章 判别式分类器
17 | 
18 | * [3.1 判别式分类器与生成式分类器](di-san-zhang-pan-bie-shi-fen-lei-qi/3.1-pan-bie-shi-fen-lei-qi-yu-sheng-cheng-shi-fen-lei-qi.md)
19 | * [3.2 线性判别函数](di-san-zhang-pan-bie-shi-fen-lei-qi/3.2-xian-xing-pan-bie-han-shu.md)
20 | * [3.3 广义线性判别函数](di-san-zhang-pan-bie-shi-fen-lei-qi/3.3-guang-yi-xian-xing-pan-bie-han-shu.md)
21 | * [3.4 Fisher线性判别](di-san-zhang-pan-bie-shi-fen-lei-qi/3.4-fisher-xian-xing-pan-bie.md)
22 | * [3.5 感知器算法](di-san-zhang-pan-bie-shi-fen-lei-qi/3.5-gan-zhi-qi-suan-fa.md)
23 | * [3.6 可训练的确定性分类器的迭代算法](di-san-zhang-pan-bie-shi-fen-lei-qi/3.6-ke-xun-lian-de-que-ding-xing-fen-lei-qi-de-die-dai-suan-fa.md)
24 | * [3.7 势函数法](di-san-zhang-pan-bie-shi-fen-lei-qi/3.7-shi-han-shu-fa.md)
25 | * [3.8 决策树](di-san-zhang-pan-bie-shi-fen-lei-qi/3.8-jue-ce-shu.md)
26 | * [附 第三章作业](di-san-zhang-pan-bie-shi-fen-lei-qi/fu-di-san-zhang-zuo-ye.md)
27 | 
28 | ## 第四章 特征选择和提取
29 | 
30 | * [4.1 模式类别可分性的测度](di-si-zhang-te-zheng-xuan-ze-he-ti-qu/4.1-mo-shi-lei-bie-ke-fen-xing-de-ce-du.md)
31 | * [4.2 特征选择](di-si-zhang-te-zheng-xuan-ze-he-ti-qu/4.2-te-zheng-xuan-ze.md)
32 | * [4.3 离散K-L变换](di-si-zhang-te-zheng-xuan-ze-he-ti-qu/4.3-li-san-kl-bian-huan.md)
33 | * [附 第四章作业](di-si-zhang-te-zheng-xuan-ze-he-ti-qu/fu-di-si-zhang-zuo-ye.md)
34 | 
35 | ## 第五章 统计机器学习
36 | 
37 | * [5.1 机器学习简介](di-wu-zhang-tong-ji-ji-qi-xue-xi/5.1-ji-qi-xue-xi-jian-jie.md)
38 | * [5.2 统计机器学习](di-wu-zhang-tong-ji-ji-qi-xue-xi/5.2-tong-ji-ji-qi-xue-xi.md)
39 | 
40 | ## 第六章 有监督学习
41 | 
42 | * [6.1 有监督学习](di-liu-zhang-you-jian-du-xue-xi/6.1-you-jian-du-xue-xi.md)
43 | * [6.2 回归任务](di-liu-zhang-you-jian-du-xue-xi/6.2-hui-gui-ren-wu.md)
44 | * [6.3 分类问题](di-liu-zhang-you-jian-du-xue-xi/6.3-fen-lei-wen-ti.md)
45 | * [附 第六章作业](di-liu-zhang-you-jian-du-xue-xi/fu-di-liu-zhang-zuo-ye.md)
46 | 
47 | ## 第七章 支持向量机
48 | 
49 | * [7.1 线性支持向量机](di-qi-zhang-zhi-chi-xiang-liang-ji/7.1-xian-xing-zhi-chi-xiang-liang-ji.md)
50 | * [7.2 核支持向量机](di-qi-zhang-zhi-chi-xiang-liang-ji/7.2-he-zhi-chi-xiang-liang-ji.md)
51 | * [7.3 序列最小优化算法](di-qi-zhang-zhi-chi-xiang-liang-ji/7.3-xu-lie-zui-xiao-you-hua-suan-fa.md)
52 | * [附 第七章作业](di-qi-zhang-zhi-chi-xiang-liang-ji/fu-di-qi-zhang-zuo-ye.md)
53 | 
54 | ## 第八章 聚类
55 | 
56 | * [8.1 基本概念](di-ba-zhang-ju-lei/8.1-ji-ben-gai-nian.md)
57 | * [8.2 经典聚类算法](di-ba-zhang-ju-lei/8.2-jing-dian-ju-lei-suan-fa.md)
58 | * [附 第八章作业](di-ba-zhang-ju-lei/fu-di-ba-zhang-zuo-ye.md)
59 | 
60 | ## 第九章 降维
61 | 
62 | * [9.1 基本概念](di-jiu-zhang-jiang-wei/9.1-ji-ben-gai-nian.md)
63 | * [9.2 维度选择](di-jiu-zhang-jiang-wei/9.2-wei-du-xuan-ze.md)
64 | * [9.3 维度抽取](di-jiu-zhang-jiang-wei/9.3-wei-du-chou-qu.md)
65 | 
66 | ## 第十章 半监督学习
67 | 
68 | * [10.1 基本概念](di-shi-zhang-ban-jian-du-xue-xi/10.1-ji-ben-gai-nian.md)
69 | * [10.2 半监督学习算法](di-shi-zhang-ban-jian-du-xue-xi/10.2-ban-jian-du-xue-xi-suan-fa.md)
70 | 
71 | ## 第十一章 概率图模型
72 | 
73 | * [11.1 PGM简介](di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.1-pgm-jian-jie.md)
74 | * [11.2 有向图模型（贝叶斯网络）](di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.2-you-xiang-tu-mo-xing-bei-ye-si-wang-luo.md)
75 | * [11.3 无向图模型（马尔科夫随机场）](di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.3-wu-xiang-tu-mo-xing-ma-er-ke-fu-sui-ji-chang.md)
76 | * [11.4 学习和推断](di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.4-xue-xi-he-tui-duan.md)
77 | * [11.5 典型概率图模型](di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.5-dian-xing-gai-shuai-tu-mo-xing.md)
78 | * [附 第十一章作业](di-shi-yi-zhang-gai-shuai-tu-mo-xing/fu-di-shi-yi-zhang-zuo-ye.md)
79 | 
80 | ## 第十二章 集成学习
81 | 
82 | * [12.1 简介](di-shi-er-zhang-ji-cheng-xue-xi/12.1-jian-jie.md)
83 | * [12.2 Bagging](di-shi-er-zhang-ji-cheng-xue-xi/12.2-bagging.md)
84 | * [12.3 Boosting](di-shi-er-zhang-ji-cheng-xue-xi/12.3-boosting.md)
85 | * [附 第十二章作业](di-shi-er-zhang-ji-cheng-xue-xi/fu-di-shi-er-zhang-zuo-ye.md)
86 | 


--------------------------------------------------------------------------------
/di-ba-zhang-ju-lei/8.1-ji-ben-gai-nian.md:
--------------------------------------------------------------------------------
  1 | # 8.1 基本概念
  2 | 
  3 | ## 8.1.1 什么是聚类？
  4 | 
  5 | 在无监督学习中，需要发现数据中分组聚集的结构，并根据数据中样本与样本之间的距离或相似度，将样本划分为若干组/类/簇（cluster）
  6 | 
  7 | 
  8 | 
  9 | <mark style="color:orange;">**划分的原则**</mark>：<mark style="color:red;">**类内样本距离小，类间样本距离大**</mark>
 10 | 
 11 | 
 12 | 
 13 | ### 一、聚类的类型
 14 | 
 15 | 聚类的结果是产生一组聚类的集合
 16 | 
 17 | - **基于划分的聚类**（<mark style="color:orange;">**无嵌套**</mark>）：每个样本仅属于一个簇
 18 | - **基于层次的聚类**（<mark style="color:orange;">**嵌套**</mark>）：树形的聚类结构，簇之间存在嵌套
 19 | 
 20 | ![](../.gitbook/assets/8.1.1.png)
 21 | 
 22 | 
 23 | 
 24 | 聚类中的簇集合还有一些其它的区别，包括：
 25 | 
 26 | - <mark style="color:purple;">**独占**</mark>和<mark style="color:purple;">**非独占**</mark>：非独占的簇中，样本可以属于多个簇
 27 | - <mark style="color:purple;">**模糊**</mark>和<mark style="color:purple;">**非模糊**</mark>
 28 |   - 模糊的簇表现为$$\sum p=1$$，而非模糊的簇中概率非0即1
 29 |   - 概率聚类有相似的特性
 30 | - <mark style="color:purple;">**部分**</mark>和<mark style="color:purple;">**完备**</mark>：在非完备的场景中，只聚类部分数据
 31 | - <mark style="color:purple;">**异质**</mark>和<mark style="color:purple;">**同质**</mark>：簇的大小、形状和密度是否有很大差别
 32 | 
 33 | 
 34 | 
 35 | ### 二、簇的类型
 36 | 
 37 | #### 1、基于中心的簇
 38 | 
 39 | {% hint style="success" %}
 40 | 
 41 | 簇内的点和其“中心”较为相近（或相似），和其他簇的“中心”较远，这样的一组样本形成的簇
 42 | 
 43 | {% endhint %}
 44 | 
 45 | 
 46 | 
 47 | 簇**中心**的表示：
 48 | 
 49 | - **质心**：簇内所有点的平均
 50 | - **中值点**：簇内最有代表性的点
 51 | 
 52 | ![](../.gitbook/assets/8.1.2.png)
 53 | 
 54 | 
 55 | 
 56 | #### 2、基于连续性和基于密度的簇
 57 | 
 58 | - **<mark style="color:orange;">连续性</mark>**：相比其他任何簇的点，每个点都至少和所属簇的某一个点更近
 59 | - **<mark style="color:orange;">密度</mark>**：簇是由高密度的区域形成的，簇之间是一些低密度的区域
 60 | 
 61 | ![](../.gitbook/assets/8.1.3.png)
 62 | 
 63 | #### 3、基于概念的簇
 64 | 
 65 | {% hint style="success" %}
 66 | 
 67 | 同一个簇共享某种性质，这个性质是从整个集合推导出来的
 68 | 
 69 | {% endhint %}
 70 | 
 71 | 这种簇通常较难分辨，因为它一般不是基于中心/密度的
 72 | 
 73 | 
 74 | 
 75 | ### 三、聚类分析的三要素
 76 | 
 77 | - **如何定义样本点之间的**<mark style="color:orange;">**远近**</mark>
 78 |   - <mark style="color:red;">**距离函数**</mark>
 79 | - **如何评价聚类得到的簇的**<mark style="color:orange;">**质量**</mark>
 80 |   - <mark style="color:red;">**评价函数**</mark>
 81 | - **如何获得聚类的簇**
 82 |   - 怎样表示簇
 83 |   - 怎样设计划分和优化算法
 84 |   - 算法何时停止
 85 | 
 86 | 
 87 | 
 88 | ## 8.1.2 数据预处理
 89 | 
 90 | 假设有n个样本，每一个有d个特征，则可以表示为一个n行d列的特征矩阵。对于这样的样本特征矩阵，有一些常见的数据预处理步骤：
 91 | 
 92 | ### 一、标准化（StandardScaler）
 93 | 
 94 | {% hint style="success" %}
 95 | 
 96 | 将输入特征的<mark style="color:purple;">**均值**</mark>变为0，<mark style="color:purple;">**方差**</mark>变为1
 97 | 
 98 | {% endhint %}
 99 | 
100 | ```python
101 | from sklearn.preprocessing import StandardScaler
102 | 
103 | # 构造输入特征的标准化器
104 | ss_X = StandardScaler()
105 | 
106 | # 分别对训练和测试数据的特征进行标准化处理
107 | X_train = ss_X.fit_transform(X_train)
108 | X_test = ss_X.transform(X_test)
109 | ```
110 | 
111 | 对于每一个维度的特征，有：
112 | $$
113 | \begin{align}
114 | &x_i^`=\frac{x_i-\mu}{\sigma} \nonumber\\
115 | \\
116 | &\mu = \frac1N\sum_{i=0}^Nx_i\\ 
117 | \\
118 | &\sigma = \sqrt{\frac1{N-1}\left(\sum_{i=1}^Nx_i-\mu\right)^2}
119 | \end{align}
120 | $$
121 | 
122 | 
123 | ### 二、区间缩放（Min-Max Scaling）
124 | 
125 | {% hint style="success" %}
126 | 
127 | 将特征的<mark style="color:purple;">**取值范围**</mark>缩放到$$[0,1]$$​区间
128 | 
129 | 有时也会缩放到$$[-1,1]$$
130 | 
131 | {% endhint %}
132 | 
133 | 
134 | 
135 | - 对非常小的标准偏差的特征鲁棒性更强
136 | - 能够在系数数据中保留零条目
137 | 
138 | 
139 | 
140 | ### 三、归一化（Normalization）
141 | 
142 | {% hint style="success" %}
143 | 
144 | 对于样本的不同特征，将它们缩放同样的单位向量
145 | 
146 | {% endhint %}
147 | 
148 | 即将样本的模的长度变为单位1：
149 | $$
150 | \begin{align}
151 | \boldsymbol x_i^` &= \frac{\boldsymbol x_i}{\Vert\boldsymbol x_i\Vert_2} \nonumber\\
152 | \\
153 | &= \frac{\boldsymbol x_i}{\sqrt{\sum\limits_{d=1}^D x_{id}^2}}
154 | \end{align}
155 | $$
156 | 在求欧氏距离和文本特征时常用到
157 | 
158 | 
159 | 
160 | ## 8.1.3 距离度量函数
161 | 
162 | 一个距离度量函数应当满足的特征：
163 | 
164 | - <mark style="color:orange;">**非负性**</mark>：$$dist(\boldsymbol x_i,\boldsymbol x_j)\geq 0$$
165 | - <mark style="color:orange;">**不可分的同一性**</mark>：$$dist(\boldsymbol x_i,\boldsymbol x_j) = 0\ if\ \boldsymbol x_i=\boldsymbol x_j$$
166 | - <mark style="color:orange;">**对称性**</mark>：$$dist(\boldsymbol x_i,\boldsymbol x_j) = dist(\boldsymbol x_j,\boldsymbol x_i)$$
167 | - <mark style="color:orange;">**三角不等式**</mark>：$$dist(\boldsymbol x_i,\boldsymbol x_j) \leq dist(\boldsymbol x_i,\boldsymbol x_k) + dist(\boldsymbol x_k,\boldsymbol x_j)$$
168 | 
169 | 
170 | 
171 | ### 一、闵可夫斯基（Minkowski）距离
172 | 
173 | $$
174 | dist(\boldsymbol x_i,\boldsymbol x_j)=\left(\sum_{d=1}^D\vert x_{id}-x_{jd}\vert^p\right)^\frac1p
175 | $$
176 | 
177 | 当$$p=2$$时，为<mark style="color:purple;">**欧氏距离**</mark>：
178 | $$
179 | dist(\boldsymbol x_i,\boldsymbol x_j)=\sqrt{\sum_{d=1}^D( x_{id}-x_{jd})^2}
180 | $$
181 | 当$$p=1$$时，为<mark style="color:purple;">**曼哈顿距离**</mark>：
182 | $$
183 | dist(\boldsymbol x_i,\boldsymbol x_j)=\sum_{d=1}^D\vert x_{id}-x_{jd}\vert
184 | $$
185 | 
186 | 
187 | - 对样本特征的<mark style="color:orange;">**旋转和平移变换不敏感**</mark>
188 | - 对样本特征的<mark style="color:orange;">**数值尺度敏感**</mark>
189 | - 当特征值尺度不一致时，需要进行标准化操作
190 | 
191 | 
192 | 
193 | ### 二、余弦相似度
194 | 
195 | ![](../.gitbook/assets/8.1.4.png)
196 | 
197 | 将两个变量看作高维空间的两个向量，通过夹角余弦评估其相似度：
198 | $$
199 | \begin{align}
200 | \cos(\theta) &= \frac{a\cdot b}{\Vert a\Vert\times\Vert b\Vert} \nonumber \\
201 | \\
202 | &= \frac{(x_1,y_1)\cdot(x_2,y_2)}{\sqrt{x_1^2+y_1^2}\times\sqrt{x_2^2+y_2^2}}\\
203 | \\
204 | &= \frac{x_1x_2+y_1y_2}{\sqrt{x_1^2+y_1^2}\times\sqrt{x_2^2+y_2^2}}
205 | \end{align}
206 | $$
207 | 
208 | 进而有：
209 | $$
210 | \cos(\theta)=\frac{\sum\limits_{i=1}^n(x_i\times y_i)}{\sqrt{\sum\limits_{i=1}^nx_i^2}\times\sqrt{\sum\limits_{i=1}^ny_i^2}}
211 | $$
212 | 
213 | ### 三、相关系数
214 | 
215 | 定义变量$$\boldsymbol x_i,\boldsymbol x_j$$的相关系数为：
216 | $$
217 | \begin{align}
218 | r(\boldsymbol x_i,\boldsymbol x_j) &= \frac{cov(\boldsymbol x_i,\boldsymbol x_j)}{\sigma_{x_i}\sigma_{x_j}}\nonumber \\
219 | \\
220 | &=\frac{\mathbb{E}\left[\left(\boldsymbol{x}_i-\boldsymbol{\mu}_i\right)\left(\boldsymbol{x}_i-\boldsymbol{\mu}_j\right)\right]}{\sigma_{\boldsymbol{x}_i} \sigma_{\boldsymbol{x}_j}}\\
221 | \\
222 | &=\frac{\sum\limits_{k=1}^D(x_{ik-\mu_{ik}})(x_{jk}-\mu_{jk})}{\sqrt{\sum\limits_{k=1}^D(x_{ik}-\mu_{ik})^2\sum\limits_{j=1}^D(x_{jk}-\mu_{jk})^2}}
223 | \end{align}
224 | $$
225 | {% hint style="success" %}
226 | 
227 | 当对数据做中心化后，相关系数等于余弦相似度
228 | 
229 | {% endhint %}
230 | 
231 | 
232 | 
233 | ## 8.1.4 聚类性能评价指标
234 | 
235 | 
236 | 
237 | 
238 | ## 绘图代码
239 | 
240 | ```python
241 | import numpy as np
242 | import matplotlib.pyplot as plt
243 | from matplotlib.patches import Arc
244 | from matplotlib.transforms import Bbox, IdentityTransform, TransformedBbox
245 | 
246 | class AngleAnnotation(Arc):
247 |     """
248 |     Draws an arc between two vectors which appears circular in display space.
249 |     """
250 |     def __init__(self, xy, p1, p2, size=75, unit="points", ax=None,
251 |                  text="", textposition="inside", text_kw=None, **kwargs):
252 |         """
253 |         Parameters
254 |         ----------
255 |         xy, p1, p2 : tuple or array of two floats
256 |             Center position and two points. Angle annotation is drawn between
257 |             the two vectors connecting *p1* and *p2* with *xy*, respectively.
258 |             Units are data coordinates.
259 | 
260 |         size : float
261 |             Diameter of the angle annotation in units specified by *unit*.
262 | 
263 |         unit : str
264 |             One of the following strings to specify the unit of *size*:
265 | 
266 |             * "pixels": pixels
267 |             * "points": points, use points instead of pixels to not have a
268 |               dependence on the DPI
269 |             * "axes width", "axes height": relative units of Axes width, height
270 |             * "axes min", "axes max": minimum or maximum of relative Axes
271 |               width, height
272 | 
273 |         ax : `matplotlib.axes.Axes`
274 |             The Axes to add the angle annotation to.
275 | 
276 |         text : str
277 |             The text to mark the angle with.
278 | 
279 |         textposition : {"inside", "outside", "edge"}
280 |             Whether to show the text in- or outside the arc. "edge" can be used
281 |             for custom positions anchored at the arc's edge.
282 | 
283 |         text_kw : dict
284 |             Dictionary of arguments passed to the Annotation.
285 | 
286 |         **kwargs
287 |             Further parameters are passed to `matplotlib.patches.Arc`. Use this
288 |             to specify, color, linewidth etc. of the arc.
289 | 
290 |         """
291 |         self.ax = ax or plt.gca()
292 |         self._xydata = xy  # in data coordinates
293 |         self.vec1 = p1
294 |         self.vec2 = p2
295 |         self.size = size
296 |         self.unit = unit
297 |         self.textposition = textposition
298 | 
299 |         super().__init__(self._xydata, size, size, angle=0.0,
300 |                          theta1=self.theta1, theta2=self.theta2, **kwargs)
301 | 
302 |         self.set_transform(IdentityTransform())
303 |         self.ax.add_patch(self)
304 | 
305 |         self.kw = dict(ha="center", va="center",
306 |                        xycoords=IdentityTransform(),
307 |                        xytext=(0, 0), textcoords="offset points",
308 |                        annotation_clip=True)
309 |         self.kw.update(text_kw or {})
310 |         self.text = ax.annotate(text, xy=self._center, **self.kw)
311 | 
312 |     def get_size(self):
313 |         factor = 1.
314 |         if self.unit == "points":
315 |             factor = self.ax.figure.dpi / 72.
316 |         elif self.unit[:4] == "axes":
317 |             b = TransformedBbox(Bbox.unit(), self.ax.transAxes)
318 |             dic = {"max": max(b.width, b.height),
319 |                    "min": min(b.width, b.height),
320 |                    "width": b.width, "height": b.height}
321 |             factor = dic[self.unit[5:]]
322 |         return self.size * factor
323 | 
324 |     def set_size(self, size):
325 |         self.size = size
326 | 
327 |     def get_center_in_pixels(self):
328 |         """return center in pixels"""
329 |         return self.ax.transData.transform(self._xydata)
330 | 
331 |     def set_center(self, xy):
332 |         """set center in data coordinates"""
333 |         self._xydata = xy
334 | 
335 |     def get_theta(self, vec):
336 |         vec_in_pixels = self.ax.transData.transform(vec) - self._center
337 |         return np.rad2deg(np.arctan2(vec_in_pixels[1], vec_in_pixels[0]))
338 | 
339 |     def get_theta1(self):
340 |         return self.get_theta(self.vec1)
341 | 
342 |     def get_theta2(self):
343 |         return self.get_theta(self.vec2)
344 | 
345 |     def set_theta(self, angle):
346 |         pass
347 | 
348 |     # Redefine attributes of the Arc to always give values in pixel space
349 |     _center = property(get_center_in_pixels, set_center)
350 |     theta1 = property(get_theta1, set_theta)
351 |     theta2 = property(get_theta2, set_theta)
352 |     width = property(get_size, set_size)
353 |     height = property(get_size, set_size)
354 | 
355 |     # The following two methods are needed to update the text position.
356 |     def draw(self, renderer):
357 |         self.update_text()
358 |         super().draw(renderer)
359 | 
360 |     def update_text(self):
361 |         c = self._center
362 |         s = self.get_size()
363 |         angle_span = (self.theta2 - self.theta1) % 360
364 |         angle = np.deg2rad(self.theta1 + angle_span / 2)
365 |         r = s / 2
366 |         if self.textposition == "inside":
367 |             r = s / np.interp(angle_span, [60, 90, 135, 180],
368 |                                           [3.3, 3.5, 3.8, 4])
369 |         self.text.xy = c + r * np.array([np.cos(angle), np.sin(angle)])
370 |         if self.textposition == "outside":
371 |             def R90(a, r, w, h):
372 |                 if a < np.arctan(h/2/(r+w/2)):
373 |                     return np.sqrt((r+w/2)**2 + (np.tan(a)*(r+w/2))**2)
374 |                 else:
375 |                     c = np.sqrt((w/2)**2+(h/2)**2)
376 |                     T = np.arcsin(c * np.cos(np.pi/2 - a + np.arcsin(h/2/c))/r)
377 |                     xy = r * np.array([np.cos(a + T), np.sin(a + T)])
378 |                     xy += np.array([w/2, h/2])
379 |                     return np.sqrt(np.sum(xy**2))
380 | 
381 |             def R(a, r, w, h):
382 |                 aa = (a % (np.pi/4))*((a % (np.pi/2)) <= np.pi/4) + \
383 |                      (np.pi/4 - (a % (np.pi/4)))*((a % (np.pi/2)) >= np.pi/4)
384 |                 return R90(aa, r, *[w, h][::int(np.sign(np.cos(2*a)))])
385 | 
386 |             bbox = self.text.get_window_extent()
387 |             X = R(angle, r, bbox.width, bbox.height)
388 |             trans = self.ax.figure.dpi_scale_trans.inverted()
389 |             offs = trans.transform(((X-s/2), 0))[0] * 72
390 |             self.text.set_position([offs*np.cos(angle), offs*np.sin(angle)])
391 |             
392 | 
393 | # 创建一个新的图形和轴
394 | fig, ax = plt.subplots()
395 | 
396 | # 设置矢量a的起点和终点
397 | a_start = np.array([0, 0])
398 | a_end = np.array([1, 2])
399 | 
400 | # 设置矢量b的起点和终点
401 | b_start = np.array([0, 0])
402 | b_end = np.array([3, 1])
403 | 
404 | # 绘制矢量a
405 | ax.plot([0, 1], [2, 2], color='red', linestyle='--', linewidth=1)
406 | ax.plot([1, 1], [0, 2], color='red', linestyle='--', linewidth=1)
407 | ax.annotate('', xy=a_end, xytext=a_start, arrowprops=dict(facecolor='red', edgecolor='red', arrowstyle='->', lw=2))
408 | 
409 | # 绘制矢量b
410 | ax.plot([0, 3], [1, 1], color='red', linestyle='--', linewidth=1)
411 | ax.plot([3, 3], [0, 1], color='red', linestyle='--', linewidth=1)
412 | ax.annotate('', xy=b_end, xytext=b_start, arrowprops=dict(facecolor='red', edgecolor='red', arrowstyle='->', lw=2))
413 | 
414 | # 标注点 (x1, y1)
415 | ax.text(a_end[0], a_end[1], '(x1, y1)', fontsize=12, color='red')
416 | 
417 | # 标注矢量的名称和角度θ
418 | ax.text((a_end[0] / 2)-0.1, (a_end[1] / 2)+0.1, 'a', fontsize=14, color='blue')
419 | ax.text(b_end[0] / 2, (b_end[1] / 2)+0.2, 'b', fontsize=14, color='blue')
420 | AngleAnnotation((0,0), b_end, a_end, ax=ax, size=35, text=r"$\theta$", textposition="outside")
421 | 
422 | # 设置坐标轴的范围和标签
423 | ax.set_xlim(0, 3.5)
424 | ax.set_ylim(0, 3.5)
425 | ax.set_xlabel('x')
426 | ax.set_ylabel('y')
427 | 
428 | # 绘制坐标轴
429 | ax.spines['left'].set_position('zero')
430 | ax.spines['left'].set_color('blue')
431 | ax.spines['bottom'].set_position('zero')
432 | ax.spines['bottom'].set_color('blue')
433 | ax.spines['right'].set_color('none')
434 | ax.spines['top'].set_color('none')
435 | 
436 | # 显示图形
437 | plt.show()
438 | ```
439 | 
440 | 


--------------------------------------------------------------------------------
/di-ba-zhang-ju-lei/8.2-jing-dian-ju-lei-suan-fa.md:
--------------------------------------------------------------------------------
1 | # 8.2 经典聚类算法
2 | 
3 | 


--------------------------------------------------------------------------------
/di-ba-zhang-ju-lei/fu-di-ba-zhang-zuo-ye.md:
--------------------------------------------------------------------------------
 1 | # 附 第八章作业
 2 | 
 3 | ## 作业1
 4 | 
 5 | ### 题目
 6 | 
 7 | 下图给出6个数据集A-F分别用两种算法得到的聚类结果，其 中一种是K均值聚类。请问哪些最可能是K均值聚类的结果？如 果K均值聚类结果不够理想，建议采用哪种聚类算法?
 8 | 
 9 | ![](../.gitbook/assets/聚类1.png)
10 | 
11 | 
12 | 
13 | ### 解答
14 | 
15 | 由于K-means聚类会表现为以簇中心的**中垂线分类**，因此很显然，上图中A2、B2、C2、D1、E1和F2属于使用K-means所得到的结果。
16 | 
17 | ![](../.gitbook/assets/K-mean.png)
18 | 
19 | 当K-means效果不佳时可以使用**高斯混合模型**等算法进行处理
20 | 
21 | 
22 | 
23 | 
24 | 
25 | ## 作业2
26 | 
27 | ### 题目
28 | 
29 | 对如图所示的数据集，采用K均值聚类。设K=3，3个聚类中心分别为$\mu_1=(6.2,3.2)^T$（红色），$\mu_2=(6.6,3.7)^T$（绿色），$\mu_3=(6.5,3.0)^T$（蓝色）
30 | 
31 | 请给出一次迭代后属于第一簇的样本及更新后的簇中心（保留两位小数）
32 | 
33 | ![](../.gitbook/assets/聚类2.png)
34 | 
35 | ### 解
36 | 
37 | 计算每一个点到三个聚类中心的距离，并将该样本归类到距离最短的聚类中。使用代码计算如下：
38 | 
39 | ```python
40 | import numpy as np
41 | 
42 | X = np.array([
43 |     [5.9, 3.2],
44 |     [4.6, 2.9],
45 |     [6.2, 2.8],
46 |     [4.7, 3.2],
47 |     [5.5, 4.2],
48 |     [5.0, 3.0],
49 |     [4.9, 3.1],
50 |     [6.7, 3.1],
51 |     [5.1, 3.8],
52 |     [6.0, 3.0]
53 | ])
54 | 
55 | mu = np.array([
56 |     [6.2, 3.2],
57 |     [6.6, 3.7],
58 |     [6.5, 3.0],
59 | ])
60 | 
61 | distances = np.linalg.norm(X[:, np.newaxis] - mu, axis=-1)
62 | 
63 | labels = np.argmin(distances, axis=-1)
64 | 
65 | new_mu = np.array([np.mean(X[labels == i], axis=0) for i in range(mu.shape[0])])
66 | 
67 | cluster1_samples = X[labels == 0]
68 | new_mu_1 = new_mu[0]
69 | 
70 | print('属于第一簇的样本：')
71 | print(cluster1_samples)
72 | print(f'更新后的第一簇中心：{np.round(new_mu_1, 2)}')
73 | ```
74 | 
75 | 
76 | 
77 | 得到结果：
78 | 
79 | 属于第一簇的样本：
80 | [[5.9 3.2]
81 |  [4.6 2.9]
82 |  [4.7 3.2]
83 |  [5.0 3.0]
84 |  [4.9 3.1]
85 |  [5.1 3.8]
86 |  [6.0  3.0]]
87 | 
88 | 更新后的第一簇中心：[5.17 3.17]
89 | 


--------------------------------------------------------------------------------
/di-er-zhang-sheng-cheng-shi-fen-lei-qi/2.1-mo-shi-shi-bie-yu-ji-qi-xue-xi-de-mu-biao.md:
--------------------------------------------------------------------------------
  1 | # 2.1 模式识别与机器学习的目标
  2 | 
  3 | **判别式的分类器**：
  4 | 
  5 | * 即建立一个映射$$y=F(x)$$
  6 | * 是非概率的，确定的
  7 | 
  8 | 但是现实中，并非所有事件都是因果对应的，而是概率性的，此时判别式的模式识别就不再能解决问题。需要用模式集的统计特征来分类，使得分类器发生错误的概论最小。
  9 | 
 10 | ### 2.1.1 贝叶斯判别原则
 11 | 
 12 | #### 贝叶斯公式
 13 | 
 14 | $$
 15 | P(A|B) = \frac{p(B|A)p(A)}{p(B)}
 16 | $$
 17 | 
 18 | #### 贝叶斯判别
 19 | 
 20 | 将实例带入其中，假设有两种模式$$\omega_1$$和$$\omega_2$$，需要分析$$x$$来自其中哪个，则有
 21 | 
 22 | $$
 23 | P(\omega_1|x)=\frac{p(x|\omega_1)p(\omega_1)}{p(x)}
 24 | \\ 
 25 | P(\omega_2|x)=\frac{p(x|\omega_2)p(\omega_2)}{p(x)}
 26 | $$
 27 | 
 28 | 以其中第一个式子举例，
 29 | 
 30 | * 要求的 $$P(\omega_1|x)%$$ 即为 $$x \in \omega_1$$ 的概率，称为<mark style="color:orange;">**后验概率**</mark>
 31 | * $$p(\omega_1)$$ 是来自数据集和历史数据，称为<mark style="color:orange;">**先验概率**</mark>
 32 | * $$p(x|\omega_1)$$ 是x的条件概率，这里也称为<mark style="color:orange;">**似然函数**</mark>
 33 | * $$p(x)$$ 是全概率
 34 | 
 35 | {% hint style="warning" %}
 36 | 
 37 | 这里全概率计算时也可能是使用条件概率来计算的，但是在贝叶斯判别中将其称为全概率
 38 | 
 39 | {% endhint %}
 40 | 
 41 | 
 42 | 
 43 | 实际上在使用中，由于每个后验概率的全概率是相同的，因此只需要比较分子即可，进一步说，比较似然函数和先验函数即可。
 44 | 
 45 | $$
 46 | 若P(\omega_1|x) > P(\omega_2|x)，则c\in \omega_1 
 47 | \\ 
 48 | 若P(\omega_1|x)< P(\omega_2|x)，则c\in \omega_2
 49 | $$
 50 | 
 51 | 特别的，将$$l_{12}(x)=\dfrac{p(x|\omega_1)}{p(x|\omega_2)}$$称为<mark style="color:orange;">**似然比**</mark>，将$$\theta_{21} = \dfrac{P(\omega_2)}{P(\omega_1)}$$称为似然比的<mark style="color:orange;">**判决阈值**</mark>，则将上式简化可得：
 52 | 
 53 | $$
 54 | 若l_{12}(x) > \theta_{21}，则c\in \omega_1 \\ 若l_{12}(x) < \theta_{21}，则c\in \omega_2
 55 | $$
 56 | 
 57 | 此判别就称为<mark style="color:orange;">**贝叶斯判别**</mark>。
 58 | 
 59 | {% hint style="info" %}
 60 | 
 61 | **例**：假设对地震进行分析，$$\omega_1$$表示地震，$$\omega_2$$表示正常，根据统计得知$$P(\omega_1)=0.2$$。而生物是否发生异常反应是与地震发生与否相关的，统计地震前一周生物是否发生异常，得到了以下数据：
 62 | 
 63 | - 地震前一周生物发生异常的概率为0.6
 64 | - 地震前一周生物没有发生异常的概率为0.4
 65 | - 没有发生地震但生物发生异常的概率为0.1
 66 | - 没有发生地震且生物没有异常的概率为0.9
 67 | 
 68 | 那么某日观测到生物发生异常，问是否会发生地震？
 69 | 
 70 | 
 71 | 
 72 | 由题意可知：
 73 | $$
 74 | \begin{align}
 75 | &P(\omega_1) = 0.2 \ \ P(\omega_2) = 0.8 \nonumber\\
 76 | &p(x=\text{异常}|\omega_1) =0.6 \ \ p(x=\text{正常}|\omega_1)=0.4\nonumber\\
 77 | &p(x=\text{异常}|\omega_2) =0.1 \ \ p(x=\text{正常}|\omega_2)=0.9\nonumber\\
 78 | \end{align}
 79 | $$
 80 | 带入贝叶斯公式，有：
 81 | $$
 82 | \begin{align}
 83 | P(\omega_1|x=异常) &= \frac{p(x=异常|\omega_1)P(\omega_1)}{p(x=异常)} \nonumber
 84 | \\
 85 | &=\frac{p(x=异常|\omega_1)P(\omega_1)}{p(x=异常|\omega_1)P(\omega_1) + p(x=异常|\omega_2)P(\omega_2)} \nonumber
 86 | \\
 87 | &= \frac{0.6\times0.2}{0.6\times0.2+0.1\times0.8} = 0.6 \nonumber
 88 | \end{align}
 89 | $$
 90 | 计算似然比与判决阈值：
 91 | $$
 92 | l_{12} = \frac{p(x=异常|\omega_1)}{p(x=异常|\omega_2)} = 6\\
 93 | \theta_{21} = \frac{P(\omega_2)}{P(\omega_1)} = 4
 94 | $$
 95 | 似然比大于判别阈值，因此会发生地震。
 96 | 
 97 | {% endhint %}
 98 | 
 99 | ### 2.1.2 贝叶斯最小风险判别
100 | 
101 | 实际上，不同模式<mark style="color:purple;">**误判的代价是不一样的**</mark>，因此需要对贝叶斯判别做一些修正，提出了<mark style="color:orange;">**条件平均风险**</mark> $$r_j(x)$$。
102 | 
103 | #### M类分类问题的平均条件风险
104 | 
105 | 对于M类分类问题，若样本被判定为属于$$\omega_j$$的平均风险为：
106 | 
107 | $$
108 | r_{ij}(x) = \sum_{i=1}^ML_{ij}P(\omega_i|x)
109 | $$
110 | 
111 | 其中，$$L_{ij}$$表示误判的损失，称为将属于$$\omega_i$$类的物品误判为$$\omega_j$$的<mark style="color:orange;">**是非代价**</mark>
112 | 
113 | 一般而言，是非代价表现为一个<mark style="color:purple;">**对称阵**</mark>，其中$$L_{ii}$$一般为0或负数，表示判定成功，其他值表示判定失败，用正数表示。
114 | 
115 | #### 最小平均风险
116 | 
117 | 按照贝叶斯公式，最小平均风险可以表示为：
118 | 
119 | $$
120 | r_{j}=\frac{1}{p(x)}\sum_{i=1}^{M} L_{ij} p(x|\omega_i)P(\omega_i)
121 | $$
122 | 
123 | 其中全概率可以省去，因此最小平均风险可以表示为：
124 | 
125 | $$
126 | r_{j}=\sum_{i=1}^{M} L_{ij} p(x|\omega_i)P(\omega_i)
127 | $$
128 | 
129 | #### 贝叶斯最小风险判别
130 | 
131 | 对于M分类的情况，若 $$r_i(x) < r_j(x),j=1,2,\dots,M,\ j\neq i$$ ，则有$$x \in \omega_i$$
132 | 
133 | 对于是非代价，取
134 | 
135 | $$
136 | L_{ij} = \begin{cases} 0& \text{when}\ i=j\\ 1& \text{when}\ i\neq j \end{cases}
137 | $$
138 | 
139 | 则条件平均风险表示为：
140 | 
141 | $$
142 | \begin{align} 
143 | r_{j}&=\sum_{i=1}^{M} L_{ij} p(x|\omega_i)P(\omega_i) \nonumber
144 | \\ 
145 | &=L_{1j}p(x|\omega_1)P(\omega_1) + L_{2j}p(x|\omega_2)P(\omega_2) + \cdots + L_{Mj}p(x|\omega_M)P(\omega_M) \nonumber
146 | \\ 
147 | &= \sum_{i=1}^Mp(x|\omega_i)P(\omega_i) - p(x|\omega_i)P(\omega_i) \nonumber
148 | \\ 
149 | &=p(x)-p(x|\omega_i)P(\omega_i) \nonumber
150 | \end{align}
151 | $$
152 | 
153 | 记$$d_i(x)=p(x|\omega_i)P(\omega_i),i=1,2,\dots,M$$，则有若$$d_i(x) > r_j(x)$$，则$$x \in \omega_i$$
154 | 


--------------------------------------------------------------------------------
/di-er-zhang-sheng-cheng-shi-fen-lei-qi/2.2-zheng-tai-fen-bu-mo-shi-de-bei-ye-si-fen-lei-qi.md:
--------------------------------------------------------------------------------
  1 | # 2.2 正态分布模式的贝叶斯分类器
  2 | 
  3 | ## 2.2.1 M种模式类别的正态密度函数
  4 | 
  5 | 具有M种模式类别的多变量正态密度函数为：
  6 | 
  7 | $$
  8 | p(x|\omega_i)=\frac{1}{(2\pi)^{\frac{n}{2}} (|C_i|)^{\frac{1}{2}}}e^{-\frac{1}{2}(x-m_i)^T C_i^{-1}(x-m_i)}
  9 | $$
 10 | 
 11 | 其中，
 12 | 
 13 | * $$n$$ 为<mark style="color:purple;">**模式向量的维度**</mark>
 14 | * $$m_i$$ 为<mark style="color:orange;">**均值向量**</mark>
 15 | * $$C_i$$ 为<mark style="color:orange;">**协方差矩阵**</mark>
 16 | * $$|C_i|$$ 为协方差矩阵的行列式
 17 | 
 18 | $$
 19 | \begin{align} 
 20 | m_i &= E_i\{x\} \nonumber
 21 | \\ 
 22 | C_i &= E_i\{(x-m_i)(x-m_i)^T\}\nonumber
 23 | \end{align}
 24 | $$
 25 | 
 26 | $$E_i{x}$$表示对类别属于$$\omega_i$$的模型的<mark style="color:purple;">**数学期望**</mark>
 27 | 
 28 | $$C_i$$是一个对称的正定阵，其对角线上的值代表元素的<mark style="color:purple;">**方差**</mark>，非对角线上为元素之间的<mark style="color:purple;">**协方差**</mark>。因此若元素之间全部<mark style="color:red;">**独立**</mark>时，多变量的正态概率密度函数可以简化为单个正态类密度函数的乘积。
 29 | 
 30 | 由于类别$$\omega_i$$的**判别函数**可以写为：
 31 | 
 32 | $$
 33 | d_i(x)=p(x|\omega_i)P(\omega_i),\ i=1,2,\dots,M
 34 | $$
 35 | 
 36 | 对于正态密度函数，可以取对数方便计算，则将正态类密度函数带入，可得：
 37 | 
 38 | $$
 39 | \begin{align} 
 40 | d_i(x) &= \ln[p(x|\omega_i)] + \ln(P(\omega_i)) \nonumber
 41 | \\ 
 42 | &= -[\frac{n}{2}\ln(2\pi) + \frac{1}{2}\ln(|C_i|)] -\frac{1}{2}(x-m_i)^T C_i^{-1}(x-m_i) + \ln(P[\omega_i)] \nonumber
 43 | \\ 
 44 | &= \ln[P(\omega_i)] - \frac{1}{2}\ln(|C_i|) -\frac{1}{2}(x-m_i)^T C_i^{-1}(x-m_i) - \frac{n}{2}\ln(2\pi) \nonumber
 45 | \end{align}
 46 | $$
 47 | 
 48 | 将其中与 $$i$$ 无关的项去除，即可得到<mark style="color:orange;">**正态分布模式的贝叶斯判别函数**</mark>：
 49 | 
 50 | $$
 51 | d_i(x) = \ln[P(\omega_i)] - \frac{1}{2}\ln(|C_i|) -\frac{1}{2}(x-m_i)^T C_i^{-1}(x-m_i),\ i=1,2,\dots,M
 52 | $$
 53 | 
 54 | ### 特点
 55 | 
 56 | * 判别函数是一个<mark style="color:purple;">**超二次曲面**</mark>
 57 | * 对于正态分布模式的贝叶斯判别器，将模式类别之间用一个二此判别界面分开，即可得到最优的分类结果
 58 | 
 59 | ## 2.2.2 符合正态分布的二分类问题
 60 | 
 61 | ### 当 $$C_1\neq C_2$$ 时
 62 | 
 63 | 假设两类模式的分布分别为$$N(m_1,C_1)$$和$$N(m_2,C_2)$$，则两类的判别函数分别为
 64 | 
 65 | $$
 66 | \begin{align} 
 67 | d_1(x) &= \ln P(\omega_1) - \frac{1}{2}\ln(|C_1|) -\frac{1}{2}(x-m_1)^T C_1^{-1}(x-m_1) \nonumber
 68 | \\ 
 69 | d_2(x) &= \ln P(\omega_2) - \frac{1}{2}\ln(|C_2|) -\frac{1}{2}(x-m_2)^T C_2^{-1}(x-m_2) \nonumber
 70 | \\ 
 71 | &d_1(x)-d_2(x) = 
 72 | \begin{cases}
 73 | >0& x \in \omega_1
 74 | \\ 
 75 | <0& x\in \omega_2 
 76 | \end{cases} \nonumber
 77 | \end{align}
 78 | $$
 79 | 
 80 | * <mark style="color:orange;">**判别界面**</mark>$$d_1(x)-d_2(x)=0$$是x的<mark style="color:purple;">**二次型方程**</mark>
 81 | * 当x是二维模式时，判别界面为二次曲线。如圆、椭圆、双曲线、抛物线等
 82 | 
 83 | ### 当 $$C_1=C_2=C$$ 时
 84 | 
 85 | {% hint style="warning" %}
 86 | 
 87 | 当两个模式的协方差矩阵相等时，意味着它们具有相同的方差和相同的线性关系。这可以解释为两个模式具有相似的变化模式，并且它们之间的相关性和方向相同。这种情况下，可以说这两个模式在数据中具有相似的特征和变化方式。
 88 | 
 89 | {% endhint %}
 90 | 
 91 | 由于$$C_1=C_2$$，上式可以简化为：
 92 | 
 93 | $$
 94 | d_1(x) - d_2(x) = \ln P(\omega_1) - \ln P(\omega_2) + (m_1 - m_2)^TC^{-1}x - \frac{1}{2}m_1^TC_{-1}m_1 + \frac{1}{2}m_2^TC^{-1}m_2
 95 | $$
 96 | 
 97 | * 判别界面为x的<mark style="color:purple;">**线性函数**</mark>，为一超平面
 98 | * 当x是二维时，判别界面为一条直线
 99 | 
100 | {% hint style="info" %}
101 | **例**：两类问题且模式均为正态分布的实例$$P(\omega_1) = P(\omega_2) = \frac{1}{2}$$，求判别界面
102 | 
103 | ![](../.gitbook/assets/2.2.1.png)
104 | 
105 | 计算均值向量和协方差矩阵，由大数定律：
106 | 
107 | $$
108 | \begin{align} 
109 | m_i &= E_i\{x\} \nonumber
110 | \\
111 | &= \frac{1}{N}\sum_{j=1}^{N_i}x_{ij} \nonumber
112 | \\ 
113 | C_i &= E_i\{(x-m_i)(x-m_i)^T\} \nonumber
114 | \\ 
115 | &=\frac{1}{N}\sum_{j=1}^{N_i}(x_{ij}-m_i)(x_{ij}-m_i)^T \nonumber
116 | \end{align}
117 | $$
118 | 
119 | 其中，$$N_i$$ 为 $$\omega_i$$ 中模式的数目，$$x_{ij}$$ 表示第i个类别中的第j个模式，可得：
120 | 
121 | $$
122 | m_1 = \frac{1}{4}(3\ 1\ 1)^T
123 | \\ 
124 | \\ 
125 | m_2 = \frac{1}{4}(1\ 3\ 3)^T
126 | \\
127 | \\
128 | C_1 = C_2 = C = \frac{1}{16} 
129 | \begin{pmatrix} 
130 | 3 & 1 &1\\ 
131 | 1 & 3 &-1\\ 
132 | 1 & -1 & 3 
133 | \end{pmatrix}
134 | \\ 
135 | \\ 
136 | C^{-1} = 4 
137 | \begin{pmatrix} 
138 | 2 &-1 & -1\\ 
139 | -1 & 2 & 1\\ 
140 | -1 & 1 & 2 
141 | \end{pmatrix}
142 | $$
143 | 
144 | 带入可得判别界面为：
145 | 
146 | $$
147 | \begin{align} 
148 | d_1(x) - d_2(x) &= \ln P(\omega_1) - \ln P(\omega_2) + (m_1 - m_2)^TC^{-1}x  - \frac{1}{2}m_1^TC_{-1}m_1 + \frac{1}{2}m_2^TC^{-1}m_2 \nonumber
149 | \\ 
150 | &=8x_1-8x_2 -8x_3 + 4 = 0 
151 | \end{align}
152 | $$
153 | {% endhint %}
154 | 
155 | {% hint style="warning" %}
156 | * 贝叶斯分类是基于统计规则的
157 | * 若样本量较少，一般难以获得最有效果
158 | 
159 | 
160 | 
161 | {% endhint %}
162 | 
163 | ## 2.2.3 朴素贝叶斯
164 | 
165 | 在特征 $$x=(x\_1,x\_2,x\_3,\dots,x\_d)$$ 是多维向量时，朴素贝叶斯算法假设各个特征之间<mark style="color:purple;">**相互独立**</mark>
166 | 
167 | $$
168 | p(x_1,x_2,\dots,x_d|\omega)= \prod^d p(x_i|\omega)
169 | $$
170 | 


--------------------------------------------------------------------------------
/di-er-zhang-sheng-cheng-shi-fen-lei-qi/2.3-jun-zhi-xiang-liang-he-xie-fang-cha-ju-zhen-de-can-shu-gu-ji.md:
--------------------------------------------------------------------------------
  1 | # 2.3 均值向量和协方差矩阵的参数估计
  2 | 
  3 | * 在贝叶斯分类器中，构造分类器需要知道类概率密度函数$$p(x|\omega_i)$$
  4 | * 如果按先验知识已知其分布，则只需知道分布的参数即可
  5 |   * 例如：类概率密度是正态分布，它完全由其均值向量和协方差矩阵所确定
  6 | 
  7 | 对均值向量和协方差矩阵的估计即为贝叶斯分类器中的一种<mark style="color:orange;">**参数估计**</mark>问题
  8 | 
  9 | {% hint style="success" %}
 10 | **参数估计的两种方式**
 11 | 
 12 | - 一种是将参数作为非随机变量来处理，例如<mark style="color:purple;">**矩估计**</mark>就是一种非随机参数的估计
 13 | - 另一种是随机参数的估计，即**把这些参数看成是随机变量**，例如<mark style="color:purple;">**贝叶斯参数估计**</mark>
 14 | 
 15 | {% endhint %}
 16 | 
 17 | ## 2.3.1 定义
 18 | 
 19 | ### 均值
 20 | 
 21 | 设模式的概率密度函数为$$p(x)$$，则均值的定义为：
 22 | 
 23 | $$
 24 | m = E(x) = \int_x xp(x)dx
 25 | $$
 26 | 
 27 | 其中，$$x=(x_1,x_2,\dots,x_n)^T$$，$$m=(m_1,m_2,\dots,m_n)^T$$
 28 | 
 29 | 由大数定律有，均值的估计量为：
 30 | 
 31 | $$
 32 | \hat{m} = \frac{1}{N}\sum^N_{j=1}x_j
 33 | $$
 34 | 
 35 | ### 协方差
 36 | 
 37 | 协方差矩阵为：
 38 | 
 39 | $$
 40 | C= \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n}\\ c_{21} & c_{22} & \cdots & c_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix}
 41 | $$
 42 | 
 43 | 其中，每个元素的定义为：
 44 | 
 45 | $$
 46 | \begin{align} 
 47 | c_{ij} &= E\{(x_i-m_i)(x_j-m_j)\} \nonumber
 48 | \\ 
 49 | &=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x_i-m_i)(x_j-m_j)p(x_i,x_j)dx_idx_j \nonumber
 50 | \end{align}
 51 | $$
 52 | 
 53 | 其中，$$x_i$$、$$x_j$$和$$m_i$$、$$m_j$$分别为**x**、**m**的第i和j个分量。
 54 | 
 55 | 将协方差矩阵写成向量的方式为：
 56 | 
 57 | $$
 58 | \begin{align} 
 59 | C&=E\{(x-m)(x-m)^T\} \nonumber
 60 | \\ 
 61 | &=E\{xx^T\} - mm^T \nonumber
 62 | \end{align}
 63 | $$
 64 | 
 65 | 则根据大数定律，协方差的估计量可以写为：
 66 | 
 67 | $$
 68 | \hat{C} \approx \frac{1}{N}\sum^{N}_{k=1}(x_k-\hat{m})(x_k-\hat{m})^T
 69 | $$
 70 | 
 71 | ## 2.3.2 迭代运算
 72 | 
 73 | ### 均值
 74 | 
 75 | 假设已经计算了N个样本的均值估计量，此时若新增一个样本，则新的估计量为：
 76 | $$
 77 | \begin{align}
 78 | \hat{m}(N+1) &= \frac{1}{N+1}\sum^{N+1}_{j=1}x_j \nonumber
 79 | \\
 80 | &= \frac{1}{N+1}\left[\sum_{j=1}^Nx_j + x_{N+1}\right] \nonumber
 81 | \\
 82 | &= \frac{1}{N+1}[N\hat{m}(N) + x_{N+1}]
 83 | \end{align}
 84 | $$
 85 | 迭代的初始化取$$\hat{m}(1)=x_1$$
 86 | 
 87 | 
 88 | 
 89 | ### 协方差
 90 | 
 91 | 协方差与均值类似，当前已知
 92 | $$
 93 | \hat{C}(N)=\frac{1}{N}\sum^N_{j=1}x_jx_j^T - \hat{m}(N)\hat{m}^T(N)
 94 | $$
 95 | 则新加入一个样本后：
 96 | $$
 97 | \begin{align}
 98 | \hat{C}(N+1) &= \frac{1}{N+1}\sum^{N+1}_{j=1}x_jx_j^T - \hat{m}(N+1)\hat{m}^T(N+1) \nonumber
 99 | \\
100 | &= \frac{1}{N+1}\left[\sum_{j=1}^Nx_jx_j^T + x_{N+1}x_{N+1}^T\right] - \hat{m}(N+1)\hat{m}^T(N+1) \nonumber
101 | \\
102 | &=\frac{1}{N+1}[N\hat{C}(N) + N\hat{m}(N)\hat{m}^T(N) + x_{N+1}x_{N+1}^T] - \nonumber
103 | \\
104 | &\ \frac{1}{(N+1)^2}[N\hat{m}(N) + x_{N+1}][N\hat{m}(N) + x_{N+1}]^T
105 | \end{align}
106 | $$
107 | 
108 | 
109 | 由于$$\hat{m}(1)=x_1$$，因此有$$\hat{C}(1) = 0$$
110 | 
111 | 
112 | 
113 | ## 2.3.4 贝叶斯学习
114 | 
115 | - 将概率密度函数的参数估计量看成是随机变量$$\theta$$，它可以是纯量、向量或矩阵
116 | - 按这些估计量统计特性的先验知识，可以先粗略地预选出它们的密度函数
117 | - 通过训练模式样本集$$\{x_i\}$$，利用贝叶斯公式设计一个迭代运算过程求出参数的后验概率密度$$p(\theta|x_i)$$
118 | - 当后验概率密度函数中的随机变量$$\theta$$的确定性提高时，可获得较准确的估计量
119 | 
120 | 具体而言，就是：
121 | $$
122 | p(\theta|x_1,\cdots,x_N) = \frac{p(x_N|\theta,x_1,\cdots,x_{N-1})p(\theta|x_1,\cdots,x_{N-1})}{p(x_N|x_1,\cdots,x_{N-1})}
123 | $$
124 | 其中，<mark style="color:orange;">**先验概率**</mark>$$p(\theta|x_1,\cdots,x_{N-1})$$由迭代计算而来，而<mark style="color:orange;">**全概率**</mark>则由以下方式计算：
125 | $$
126 | p(x_N|x_1,\cdots,x_{N-1})=\int_xp(x_N|\theta,x_1,\cdots,x_{N-1})p(\theta|x_1,\cdots,x_{N-1})d\theta
127 | $$
128 | 因此，实际上需要知道的就是初始的$$p(\theta)$$
129 | 
130 | 
131 | 
132 | ### 单变量正态密度的均值学习
133 | 
134 | {% hint style="info" %}
135 | 
136 | 假设有一个模式样本集，其概率密度函数是单变量正态分布$$N(\theta,\sigma^2)$$，均值$$\theta$$待求，即：
137 | $$
138 | p(x|\theta)=\frac{1}{\sqrt{2\pi}\sigma}\exp{\left[-\frac{1}{2}\left(\frac{x-\theta}{\sigma^2}\right)^2\right]}
139 | $$
140 | 给出N个训练样本$$\{x_1,x_2,\dots,x_N\}$$，用贝叶斯学习计算其均值估计量。
141 | 
142 | 
143 | 
144 | 对于初始条件，设 $$p(\theta)=N(\theta_0,\sigma^2_0)$$，$$p(x_1|\theta)=N(\theta,\sigma^2)$$，由贝叶斯公式可得：
145 | $$
146 | \begin{align}
147 | p(\theta|x_1) &= a\cdot p(x_1|\theta)p(\theta)\nonumber
148 | \\
149 | &= a\cdot \frac{1}{\sqrt{2\pi}\sigma}\exp{\left[-\frac{1}{2}\left(\frac{x-\theta}{\sigma^2}\right)^2\right]}\cdot \frac{1}{\sqrt{2\pi}\sigma_0}\exp{\left[-\frac{1}{2}\left(\frac{x-\theta_0}{\sigma_0^2}\right)^2\right]}\nonumber
150 | \end{align}
151 | $$
152 | 其中a是一定值。由贝叶斯法则有：
153 | $$
154 | p(\theta|x_1,\dots,x_N)=\frac{p(x_1,\dots,x_N|\theta)p(\theta)}{\int_\varphi p(x_1,\dots,x_N|\theta)p(\theta)d\theta}
155 | $$
156 | 
157 | 
158 | 此处$$\phi$$表示整个模式空间，由于每一次迭代是逐个从样本子集中抽取，因此N次运算是<mark style="color:orange;">**独立的**</mark>，上式由此可以写成：
159 | $$
160 | \begin{align}
161 | p(\theta|x_1,\dots,x_N)&=a\cdot\left\{\prod_{k=1}^Np(x_k|\theta)\right\}p(\theta)\nonumber
162 | \\
163 | &=a\cdot\left\{\prod_{k=1}^N\frac{1}{\sqrt{2\pi}\sigma}\exp{\left[-\frac{1}{2}\left(\frac{x_k-\theta}{\sigma^2}\right)^2\right]}\right\}\cdot\frac{1}{\sqrt{2\pi}\sigma_0}\exp{\left[-\frac{1}{2}\left(\frac{x-\theta_0}{\sigma_0^2}\right)^2\right]}\nonumber
164 | \\
165 | &=a^{'}\exp{\left[-\frac{1}{2}\left\{\sum_{k=1}^N\left(\frac{x_k-\theta}{\sigma}\right)^2\right\} + \left(\frac{x-\theta_0}{\sigma_0^2}\right)^2\right]}\nonumber
166 | \\
167 | &= a^{\prime \prime} \exp \left[-\frac{1}{2}\left\{\left(\frac{N}{\sigma^2}+\frac{1}{\sigma_0^2}\right) \theta^2-2\left(\frac{1}{\sigma^2} \sum_{k=1}^N x_k+\frac{\theta_0}{\sigma_0^2}\right) \theta\right\}\right]\nonumber
168 | \end{align}
169 | $$
170 | 将上式中所有与$$\theta$$无关的变量并入常数项$$a^{'}$$和$$a^{''}$$，则$$p(\theta|x_1,\dots,x_N)$$是$$\theta$$平方函数的指数集合，仍是<mark style="color:orange;">**正态密度函数**</mark>，写为$$N(\theta_N,\sigma_N^2)$$的形式，有：
171 | $$
172 | \begin{align}
173 | p(\theta|x_1,\dots,x_N) &= \frac{1}{\sqrt{2\pi}\sigma_N}\exp{\left[-\frac{1}{2}\left(\frac{\theta-\theta_N}{\sigma_N}\right)^2\right]}\nonumber
174 | \\
175 | &= a^{'''}\exp{\left[-\frac{1}{2}\left(\frac{\theta^2}{\sigma^2_N}-2\frac{\theta_N\theta}{\sigma^2_N}\right)\right]}\nonumber
176 | \end{align}
177 | $$
178 | 上述两式相比较，可得：
179 | $$
180 | \begin{align}
181 | \frac{1}{\sigma^2}&=\frac{N}{\sigma^2} + \frac{1}{\sigma^2_0}\nonumber
182 | \\
183 | \frac{\theta_N}{\sigma_N^2} &= \frac{1}{\sigma^2}\sum^N_{k=1}x_k + \frac{\theta_0}{\sigma_0^2}\nonumber
184 | \\
185 | &= \frac{N}{\sigma^2}\hat{m} + \frac{\theta_0}{\sigma_0^2}\nonumber
186 | \end{align}
187 | $$
188 | 解出$$\theta_N$$和$$\sigma_N$$，得：
189 | $$
190 | \begin{align}
191 | \theta_N &= \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\hat{m}_N + \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\nonumber
192 | \\
193 | \sigma_N^2 &= \frac{\sigma_0^2\sigma^2}{N\sigma_0^2 + \sigma^2}\nonumber
194 | \end{align}
195 | $$
196 | 即根据对样本的观测，求得均值$$\theta$$的<mark style="color:purple;">**后验概率密度**</mark>$$p(\theta|x_i)$$为$$N(\theta_N,\sigma_N^2)$$，其中：
197 | 
198 | $$\theta_N$$是先验信息（$$\theta_0,\sigma_0^2,\sigma^2$$）与训练样本所给信息（$$N,\hat{m}$$）适当结合的结果，是N个训练样本对均值的<mark style="color:purple;">**先验估计**</mark>$$\theta_0$$的补充
199 | 
200 | $$\sigma_N^2$$是<mark style="color:orange;">**对这个估计的不确定性的度量**</mark>，随着N的增加而减少，因此当$$N\to\infin$$时，$$\sigma_N \to 0$$，代入上式可知只要$$\sigma_0\neq0$$，则当N数量足够大时，$$\theta_N$$趋于样本均值的估计量$$\hat{m}$$
201 | 
202 | ![](../.gitbook/assets/2.3.1.png)
203 | 
204 | {% endhint %}
205 | 
206 | 


--------------------------------------------------------------------------------
/di-er-zhang-sheng-cheng-shi-fen-lei-qi/fu-di-er-zhang-zuo-ye.md:
--------------------------------------------------------------------------------
  1 | # 附 第二章作业
  2 | 
  3 | ## 题目
  4 | 
  5 | 设以下模式类别具有正态概率密度函数：  
  6 | $$
  7 | \omega_1 : \{(0\ 0)^T,(2\ 0)^T,(2\ 2)^T,(0\ 2)^T\}
  8 | \\
  9 | \omega_2 : \{(4\ 4)^T,(6\ 4)^T,(6\ 6)^T,(4\ 6)^T\}
 10 | $$
 11 | （1）设$$P(\omega_1)=P)\omega_2=\frac{1}{2}$$，求这两类模式之间的贝叶斯判别界面方程式。
 12 | 
 13 | （2）绘出判别界面。
 14 | 
 15 | 
 16 | 
 17 | ### 解
 18 | 
 19 | #### （1）判别界面方程式
 20 | 
 21 | 由
 22 | $$
 23 | m_i = \frac{1}{N}\sum_{j=1}^{N_i}x_{ij}
 24 | \\
 25 | C_i = \frac{1}{N}\sum_{j=1}^{N_i}(x_{ij}-m_i)(x_{ij}-m_i)^T
 26 | $$
 27 | 得：
 28 | $$
 29 | \begin{align}
 30 | &m_1 =\frac14\sum_{j=1}^4x_{1j}= (1\ 1)^T \nonumber
 31 | \\
 32 | &m_2 =\frac14\sum_{j=1}^4x_{2j}= (5\ 5)^T
 33 | \\
 34 | &C_1 = \frac14\sum_{j=1}^4(x_{1j}-m_1)(x_{1j}-m_1)^T=
 35 | \begin{pmatrix}
 36 | 1 & 0\\
 37 | 0 & 1
 38 | \end{pmatrix}
 39 | \\
 40 | &C_2 = \frac14\sum_{j=1}^4(x_{2j}-m_2)(x_{2j}-m_2)^T=
 41 | \begin{pmatrix}
 42 | 1 & 0\\
 43 | 0 & 1
 44 | \end{pmatrix}
 45 | \\
 46 | &C^{-1}=
 47 | \begin{pmatrix}
 48 | 1 & 0\\
 49 | 0 & 1
 50 | \end{pmatrix}
 51 | \end{align}
 52 | $$
 53 | 带入判别式有：
 54 | $$
 55 | \begin{align} 
 56 | d_1(x) - d_2(x) &= \ln P(\omega_1) - \ln P(\omega_2) + (m_1 - m_2)^TC^{-1}x  - \frac{1}{2}m_1^TC^{-1}m_1 + \frac{1}{2}m_2^TC^{-1}m_2 \nonumber
 57 | \\ 
 58 | &=-4x_1-4x_2 + 24 = 0 
 59 | \end{align}
 60 | $$
 61 | 
 62 | #### （2）绘图
 63 | 
 64 | 绘制图像如下图所示：
 65 | 
 66 | ![image-20230925093502582](../.gitbook/assets/贝叶斯.png)
 67 | 
 68 | ### 附加
 69 | 
 70 | 使用python实现：
 71 | 
 72 | ```python
 73 | import numpy as np
 74 | import matplotlib.pyplot as plt
 75 | 
 76 | w1 = np.array([[0, 0], [2, 0], [2, 2], [0, 2]])
 77 | w2 = np.array([[4, 4], [6, 4], [6, 6], [4, 6]])
 78 | 
 79 | m1 = np.mean(w1, axis=0)
 80 | m2 = np.mean(w2, axis=0)
 81 | 
 82 | cov1 = np.cov(w1.T, bias=True)
 83 | cov2 = np.cov(w2.T, bias=True)
 84 | 
 85 | inv_cov1 = np.linalg.inv(cov1)
 86 | inv_cov2 = np.linalg.inv(cov2)
 87 | 
 88 | w = np.dot((m1 - m2), inv_cov1)
 89 | b = -0.5 * np.dot(m1, np.dot(inv_cov1, m1)) + 0.5 * np.dot(m2, np.dot(inv_cov2, m2))
 90 | 
 91 | 
 92 | def discriminant_function(_x):
 93 |     return -(w[0] * _x + b) / w[1]
 94 | 
 95 | 
 96 | x = np.linspace(-2, 7, 100)
 97 | plt.plot(x, discriminant_function(x), 'r-', label='Discriminant')
 98 | plt.scatter(w1[:, 0], w1[:, 1], c='blue', label='ω1')
 99 | plt.scatter(w2[:, 0], w2[:, 1], c='green', label='ω2')
100 | plt.legend()
101 | 
102 | plt.show()
103 | ```
104 | 
105 | 


--------------------------------------------------------------------------------
/di-jiu-zhang-jiang-wei/9.1-ji-ben-gai-nian.md:
--------------------------------------------------------------------------------
1 | # 9.1 基本概念
2 | 
3 | 


--------------------------------------------------------------------------------
/di-jiu-zhang-jiang-wei/9.2-wei-du-xuan-ze.md:
--------------------------------------------------------------------------------
1 | # 9.2 维度选择
2 | 
3 | 


--------------------------------------------------------------------------------
/di-jiu-zhang-jiang-wei/9.3-wei-du-chou-qu.md:
--------------------------------------------------------------------------------
1 | # 9.3 维度抽取
2 | 
3 | 


--------------------------------------------------------------------------------
/di-liu-zhang-you-jian-du-xue-xi/6.1-you-jian-du-xue-xi.md:
--------------------------------------------------------------------------------
 1 | # 6.1 有监督学习
 2 | 
 3 | 
 4 | 
 5 | ## 6.1.1 什么是有监督学习
 6 | 
 7 | {% hint style="success" %}
 8 | 
 9 | 从<mark style="color:orange;">**有标记**</mark>的训练数据中学习推断函数。
10 | 
11 | {% endhint %}
12 | 
13 | - 有监督学习算法分析训练数据，产生<mark style="color:purple;">**推断函数**</mark>
14 | - 推断函数能够对新的样本进行预测
15 | - **最优的情形**：算法能够准确地对**没见过**的样本进行正确地分类
16 | - <mark style="color:purple;">**目标函数（target function）** </mark>：$$y=f(x)$$ 或 $$P(y\vert x)$$
17 | 
18 | 
19 | 
20 | 
21 | 
22 | ## 6.1.2 有监督学习的主要方法
23 | 
24 | ### 一、产生式模型
25 | 
26 | - 首先对联合分布进行推断：
27 | 
28 | $$
29 | p(x,y) = p(y)p(x\vert y)
30 | $$
31 | 
32 | > 其中$$p(y)$$为<mark style="color:purple;">**先验概率**</mark>，一半来自频次等等信息
33 | 
34 | 
35 | 
36 | - 接下来使用贝叶斯定理计算<mark style="color:purple;">**目标函数**</mark>条件分布$$p(y\vert x)$$
37 | 
38 | $$
39 | \begin{align}
40 | p(y\vert x) &=\frac{p(x,y)}{p(y)} \notag
41 | \\
42 | & = \frac{p(y)p(x\vert y)}{\int p(y)p(x\vert y)dy} \notag
43 | \end{align}
44 | $$
45 | 
46 | - 最后使用这个条件概率密度来进行预测
47 | 
48 | 
49 | 
50 | {% hint style="success" %}
51 | 
52 | 要确定某人所说语言的类别，产生式模型先学习所有语言，然后进行预测
53 | 
54 | {% endhint %}
55 | 
56 | 
57 | 
58 | ### 二、判别式模型
59 | 
60 | - 直接估计出概率分布$$P(y\vert x)$$或条件概率密度函数$$p(y\vert x)$$
61 | - 根据估计的函数确定输出
62 | 
63 | 
64 | 
65 | {% hint style="success" %}
66 | 
67 | 要确定某人所说语言的类别，判别式模型在不学习任何语言的情况下判别别语言的差异
68 | 
69 | {% endhint %}
70 | 
71 | 
72 | 
73 | 
74 | 
75 | ### 三、判别函数
76 | 
77 | - 寻找一个函数$$f(x)$$，将每个输入直接映射到目标输出
78 | - 概率不起直接作用
79 |   - 不能直接获取后验概率
80 |   - $$f(x)$$的目的通常旨在近似条件分布$$p(y\vert x)$$
81 | 
82 | 
83 | 
84 | 


--------------------------------------------------------------------------------
/di-liu-zhang-you-jian-du-xue-xi/6.2-hui-gui-ren-wu.md:
--------------------------------------------------------------------------------
  1 | # 6.2 回归任务
  2 | 
  3 | 
  4 | 
  5 | 线性回归的任务：
  6 | 
  7 | - **输入**：N个<mark style="color:orange;">**独立同分布（i.i.d）**</mark>的训练样本$$(\mathbf{x}^i,y^i)\in X\times R$$，$$i=1,2,\dots,N$$
  8 | - **目标函数**：$$f\in \mathcal{F}$$
  9 | - **损失函数**：$$L(f;x,y) = (f(x)-y)^2$$
 10 | - **期望风险**：$$\int (f(x)-y)^2dP(x,y)$$
 11 | 
 12 | 
 13 | 
 14 | ## 6.2.1 最小均方误差（LMS）
 15 | 
 16 | {% hint style="warning" %}
 17 | 
 18 | 最小均方误差在分类上属于判别函数
 19 | 
 20 | {% endhint %}
 21 | 
 22 | 
 23 | 
 24 | 当$$f$$是线性函数，则最优化问题为：
 25 | $$
 26 | \min_\limits{\mathbf{w}} J(\mathbf{w}) = \sum_{i=1}^N(\mathbf{w}^T\mathbf{x}^i - y^i)^2
 27 | $$
 28 | 
 29 | 也就是最小化<mark style="color:purple;">**经验风险**</mark>，在这里即为<mark style="color:orange;">**最小二乘/均方误差**</mark>
 30 | 
 31 | 
 32 | 
 33 | ### 批梯度下降
 34 | 
 35 | 对于上述最优化问题，采用梯度下降法进行更新，梯度为
 36 | $$
 37 | \frac{\partial J(\mathbf{w})}{\partial w_j} = 2\sum_{i=1}^Nx_j^i(\mathbf{w}^T\mathbf{x}^i - y^i)
 38 | $$
 39 | 对于<mark style="color:purple;">**批梯度下降法（BGD）**</mark>，更新规则为：
 40 | $$
 41 | w_j = w_j - 2\alpha\sum_{i=1}^Nx_j^i(\mathbf{w}^T\mathbf{x}^i - y^i),\ \alpha>0
 42 | $$
 43 | 这里$$\alpha$$为<mark style="color:orange;">**学习率**</mark>
 44 | 
 45 | 
 46 | 
 47 | - **优点**
 48 |   - 一次迭代是对所有样本进行计算，此时利用矩阵进行操作，实现了并行
 49 |   - 由全数据集确定的方向能够更好地代表样本总体，从而更准确地朝向极值所在的方向。当目标函数为凸函数时，BGD**一定能够得到全局最优**
 50 | - **缺点**
 51 |   - 当样本数目N很大时，每迭代一步都需要对所有样本计算，训练过程会很慢
 52 | 
 53 | 
 54 | 
 55 | ### 随机梯度下降
 56 | 
 57 | 对于批梯度下降的缺点，随机梯度下降采用了不同的更新规则：
 58 | $$
 59 | w_j = w_j - 2\alpha (\mathbf{w}^T\mathbf{x}^i - y^i)x_j^i,\ \alpha>0
 60 | $$
 61 | 也可以写作：
 62 | $$
 63 | \mathbf{w} = \mathbf{w} - 2\alpha \mathbf{X}^T\mathbf{b}
 64 | \\
 65 | \mathbf{b}=(b_1, b_2,\dots,b_N)^T\ \text{where}\  b_i = \mathbf{w}^T\mathbf{x}^i - y^i
 66 | $$
 67 | 区别在于，<mark style="color:purple;">**随机梯度下降（SGD）**</mark>每次迭代仅针对一个样本进行，而不像BGD每次对所有样本进行训练
 68 | 
 69 | 
 70 | 
 71 | 
 72 | 
 73 | ## 6.2.2 广义线性回归
 74 | 
 75 | 利用非线性基进行线性回归的思路就是对非线性基进行线性组合：
 76 | $$
 77 | f(\mathbf{w},\mathbf{x}) = w_0+ \sum_{j=1}^Kw_j\phi_j(\mathbf{x})
 78 | \\
 79 | 其中\ \Phi=(1,\phi_1,\dots,\phi_K)
 80 | $$
 81 | 
 82 | 
 83 | ### 常见的非线性基函数
 84 | 
 85 | - 多项式基函数
 86 | 
 87 | $$
 88 | \phi(\mathbf{x}) = (1,x,x^2,\dots,x^K)
 89 | $$
 90 | 
 91 | - 高斯函数
 92 | 
 93 | $$
 94 | \phi_j(\mathbf{x}) = \exp\left(-\frac{(x-\mu_j)^2}{2s^2}\right)
 95 | $$
 96 | 
 97 | - Sigmoid函数
 98 | 
 99 | $$
100 | \phi_j(\mathbf{x}) = \sigma\left(\frac{x-\mu_j}{s}\right)
101 | \\
102 | \sigma(a) = \frac{1}{1+\exp(-a)}
103 | $$
104 | 
105 | 
106 | 
107 | ### 广义线性回归的闭式解
108 | 
109 | **最优化问题**：
110 | $$
111 | \min\limits_w J(\mathbf{w}) = \sum_{i=1}^N(\mathbf{w}^T\phi(\mathbf{x}^i) - y^i)^2
112 | $$
113 | **梯度**：
114 | $$
115 | \frac{\partial J(\mathbf{w})}{\partial w_j} = 2\sum_{i=1}^N\phi_j(\mathbf{x}^i)(\mathbf{w}^T\phi(\mathbf{w^i})-y^i)
116 | $$
117 | **闭式解**：
118 | $$
119 | \mathbf{w}^* = (\Phi^T\Phi)^{-1}\Phi^T\mathbf{y}
120 | $$
121 | 其中，
122 | $$
123 | \begin{align}
124 | \Phi &= \begin{pmatrix}
125 | \phi_0(\mathbf{x}^1) & \dots & \phi_k(\mathbf{x}^1)
126 | \\
127 | \vdots & \vdots & \vdots
128 | \\
129 | \phi_0(\mathbf{x}^N) & \dots & \phi_k(\mathbf{x}^N)
130 | \end{pmatrix} \nonumber
131 | \\
132 | \\
133 | \mathbf{y} &= (y^1,\dots,y^N)^T \nonumber
134 | \end{align}
135 | $$
136 | 
137 | 
138 | ## 6.2.3 最大似然估计（MLE）
139 | 
140 | {% hint style="warning" %}
141 | 
142 | 最大似然估计在分类上属于判别式模型
143 | 
144 | {% endhint %}
145 | 
146 | 
147 | 
148 | 假设y是具有加性高斯噪声的确定函数$$f$$给出的标量，即$$y=f(\mathbf{x},\mathbf{w})+\varepsilon$$，$$\varepsilon$$是均值为0，方差为$$\beta^{-1}$$的高斯噪声
149 | 
150 | ![](../.gitbook/assets/6.2.1.png)
151 | 
152 | **训练数据**：$$(\mathbf{x}^i,y^i)$$，$$i=1,2,\dots,N$$
153 | 
154 | <mark style="color:purple;">**似然函数**</mark>：
155 | $$
156 | \begin{align}
157 | p(y\vert \mathbf{x},\mathbf{w},\beta^{-1}) &= \mathcal{N}(y\vert f(\mathbf{x},\mathbf{w}),\beta^{-1}) \nonumber
158 | \\
159 | &= \prod_{i=1}^N \mathcal{N}(y^i\vert \mathbf{w}^T\mathbf{x}^i,\beta^{-1})
160 | \end{align}
161 | $$
162 | <mark style="color:purple;">**对数似然函数**</mark>：
163 | $$
164 | \sum_{i=1}^N\ln \mathcal{N}(y^i\vert \mathbf{w}^T\mathbf{x}^i,\beta^{-1}) = \frac{N}{2}\ln\beta-\frac{N}{2}\ln2\pi-\frac{1}{2}\beta J(\mathbf{w})
165 | $$
166 | 其中，$$J(\mathbf{w}) = \sum\limits_{i=1}^N(\mathbf{w}^T\mathbf{x}^i - y^i)^2$$
167 | 
168 | 
169 | 
170 | {% hint style="success" %}
171 | 
172 | **结论**：在高斯噪声模型下，<mark style="color:red;">**最大化似然相当于最小化平方误差之和**</mark>
173 | 
174 | {% endhint %}
175 | 
176 | 最小二乘法实际上是在假设误差项满足高斯分布且独立同分布情况下，使似然性最大化。
177 | 
178 | 
179 | 
180 | ## 6.2.4 最大化后验概率（MAP）
181 | 
182 | {% hint style="warning" %}
183 | 
184 | 最大化后验概率在分类上属于生成式模型
185 | 
186 | {% endhint %}
187 | 
188 | 
189 | 
190 | - 采用<mark style="color:orange;">**正则项**</mark>的LMS问题：
191 | 
192 | $$
193 | \min\limits_w \sum_{i=1}^N(\mathbf{w}^T\mathbf{x}^i-y^i)^2+\lambda\mathbf{w}^T\mathbf{w}
194 | $$
195 | 
196 | - **闭式解**
197 | 
198 | $$
199 | \mathbf{w}^* = (\Phi^T\Phi+\lambda \mathbf{I})^{-1}\Phi^T\mathbf{y}
200 | $$
201 | 
202 | - **似然函数**
203 | 
204 | $$
205 | p(\mathbf{y}\vert\mathbf{X},\mathbf{w},\beta)=\prod_{i=1}^N\mathcal{N}(y^i\vert \mathbf{w}^T\mathbf{x}^i,\beta^{-1})
206 | $$
207 | 
208 | 接下来假设参数的<mark style="color:purple;">**先验概率**</mark>为<mark style="color:orange;">**多变量高斯分布**</mark>：
209 | $$
210 | p(\mathbf{w}) = \mathcal{N}(0,\mathbf{\alpha}^{-1}\mathbf{I})
211 | $$
212 | 这是因为根据贝叶斯公式，需要求似然与先验的联合分布，因此先验必须与似然同分布才能继续求解，则根据贝叶斯公式：
213 | $$
214 | p(\mathbf{w}\vert\mathbf{y})=\frac{p(\mathbf{y}\vert\mathbf{X},\mathbf{w},\beta)p(\mathbf{w})}{p(\mathbf{y})}
215 | $$
216 | 后验概率依然是高斯分布，对其取对数得：
217 | $$
218 | \ln(p(\mathbf{w}\vert\mathbf{y}))=-\beta\sum_{i=1}^N(y^i-\mathbf{w}^T\mathbf{x}^i)^2-\lambda\mathbf{w}^T\mathbf{w} + C
219 | $$
220 | 因此，<mark style="color:red;">**最大化后验等同于最小化带有正则项的平方和误差**</mark>
221 | 
222 | 
223 | 
224 | ## 6.2.5 MLE与MAP的比较
225 | 
226 | - MLE是判别式模型，其先验为一常数
227 | 
228 | $$
229 | \hat\theta_{MLE} = \arg \max\limits_\theta P(D\vert\theta)
230 | $$
231 | 
232 | - MAP是产生式模型
233 | 
234 | $$
235 | \begin{align}
236 | \hat\theta_{MAP} &= \arg\max\limits_\theta P(\theta\vert D)
237 | \\
238 | &= \arg\max\limits_\theta P(D\vert \theta)P(\theta)
239 | \end{align}
240 | $$
241 | 
242 | - MLE是频率学派的想法，MAP是贝叶斯学派的想法
243 | - 更多的数据会使得MLE拟合更好，但容易出现过拟合
244 | 


--------------------------------------------------------------------------------
/di-liu-zhang-you-jian-du-xue-xi/6.3-fen-lei-wen-ti.md:
--------------------------------------------------------------------------------
  1 | # 6.3 分类问题
  2 | 
  3 | 
  4 | 
  5 | 分类问题的任务：
  6 | 
  7 | - **输入**：N个<mark style="color:orange;">**独立同分布（i.i.d）**</mark>的训练样本$$(\mathbf{x}^i,y^i)\in X\times C$$，$$i=1,2,\dots,N$$
  8 | - **目标函数**：$$f\in\mathcal{F}$$
  9 | - **损失函数**：$$L(f;x,y)=I_{\{f(\mathbf{x}\neq y\}}$$
 10 | - **期望风险**：$$\int I_{\{f(\mathbf{x}\neq y\}}dP(\mathbf{x},y)=P(f(\mathbf{x})\neq y)$$
 11 | 
 12 | 
 13 | 
 14 | 对于二分类问题，y仅有两个取值-1和1，则其目标为求解一个$$\mathbf{w}$$用于预测y，例如
 15 | $$
 16 | f(\mathbf{x,w}) = \text{sgn}(\mathbf{w}^T\mathbf{x})=\begin{cases}
 17 | 1 &\mathbf{w}^T\mathbf{x}>0
 18 | \\
 19 | -1 &\mathbf{w}^T\mathbf{x}
 20 | \end{cases}
 21 | $$
 22 | 理论上，对于使用$$y=-1$$或$$y=1$$的所有样本，都应该有：
 23 | $$
 24 | \mathbf{w}^T\mathbf{x}y>0
 25 | $$
 26 | 因此，一些方法会试图最小化分错的样本，即最小化：
 27 | $$
 28 | \hat{E}_p(\mathbf{w}) = -\sum_{i\in T_M}\mathbf{w}^T\mathbf{x}^iy^i
 29 | $$
 30 | 其中，$$T_M$$为错分样本的集合
 31 | 
 32 | 
 33 | 
 34 | ### 一些常见的分布
 35 | 
 36 | #### 伯努利分布
 37 | 
 38 | 单个二进制变量$$x\in \{0,1\}$$的分布由单个连续参数$$\beta\in[0,1]$$控制：
 39 | $$
 40 | P(x\vert \beta) = \beta^x(1-\beta)^{1-x}
 41 | $$
 42 | 
 43 | 
 44 | #### 二项式分布
 45 | 
 46 | 给出N个服从伯努利分布的样本中观察到$$m$$次$$x=1$$的概率：
 47 | $$
 48 | P(m\vert N,\beta) = \begin{pmatrix}
 49 | N\\m
 50 | \end{pmatrix} \beta^m(1-\beta)^{N-m}
 51 | $$
 52 | 
 53 | 
 54 | #### 多项式分布
 55 | 
 56 | 多项式分布是二次分布的推广，变量可以取K个状态，第K个状态被观测到了$$m_k$$次的概率为：
 57 | $$
 58 | P(m_1,\dots,m_k\vert N,\beta) = \begin{pmatrix}
 59 | N\\m_1,\dots,m_k
 60 | \end{pmatrix}
 61 | \prod_{k=1}^K\beta_k^{m_k}
 62 | $$
 63 | 
 64 | #### 多变量正态分布
 65 | 
 66 | 对于$$x\sim N(\mu,\Sigma)$$，有
 67 | 
 68 | $$
 69 | p(x) = \frac{1}{\sqrt{(2\pi)^D\vert\Sigma\vert}}\exp\left(-\frac 12(\mathbf x-\mathbf\mu)^T\Sigma^{-1}(\mathbf x-\mathbf\mu)\right)
 70 | $$
 71 | 
 72 | 其中，$$\vert \Sigma\vert$$为协方差矩阵的行列式，此分布的图像如下图所示：
 73 | 
 74 | ![](../.gitbook/assets/6.3.2.png)
 75 | 
 76 | ## 6.3.1 Logistic 回归
 77 | 
 78 | {% hint style="warning" %}
 79 | 
 80 | Logistic回归属于判别式模型
 81 | 
 82 | {% endhint %}
 83 | 
 84 | Logistic回归使用<mark style="color:purple;">**Logistic函数**</mark>估计后验概率：
 85 | $$
 86 | \begin{align}
 87 | p(y=1\vert\mathbf{x}) &= f(\mathbf{x,w}) \nonumber
 88 | \\
 89 | &= g(\mathbf{w}^T\mathbf{x})
 90 | \\
 91 | &=\frac{1}{1+\exp(-\mathbf{w}^T\mathbf{x})}
 92 | \end{align}
 93 | $$
 94 | {% hint style="success" %}
 95 | 
 96 | 上式中的$$g(x)=\dfrac{1}{1+e^{-z}}$$即为<mark style="color:purple;">**Logistic函数**</mark>，它是一个经典的<mark style="color:purple;">**Sigmoid函数**</mark>
 97 | 
 98 | 其图像为：![](../.gitbook/assets/6.3.1.png)
 99 | 
100 | Logistic函数具有以下特性：
101 | 
102 | - $$z \to \infin$$时，$$g(z)\to 1$$
103 | - $$z \to -\infin$$时，$$g(z)\to 0$$
104 | - 取值在0到1之间
105 | - $$g'(z) = g(z)(1-g(z))$$
106 | 
107 | {% endhint %}
108 | 
109 | 
110 | 
111 | ### 一、使用最大似然估计求解Logistic回归
112 | 
113 | 对于二分类问题，可以假设样本的输出服从<mark style="color:purple;">**伯努利分布**</mark>（0-1分布），则对于$$y=0$$和$$y=1$$可以统一表达为：
114 | $$
115 | P(y\vert \mathbf{x,w})=(f(\mathbf{x,w}))^y(1-f(\mathbf{x,w}))^{1-y}
116 | $$
117 | 则可以得到其似然函数和对数似然函数为：
118 | $$
119 | L(\mathbf{w}) = \prod_{i=1}^NP(y^i\vert \mathbf{x}^i,\mathbf{w}) = \prod_{i=1}^N\left(f(\mathbf{x}^i,\mathbf{w})\right)^{y^i}\left(1-f(\mathbf{x}^i,\mathbf{w})\right)^{1-y^i}
120 | \\
121 | l(\mathbf{w})=\log L(\mathbf{w}) = \sum_{i=1}^N\left(y^i\log f(\mathbf{x}^i,\mathbf{w})+(1-y^i)\log\left(1- f\left(\mathbf{x}^i,\mathbf{w}\right)\right)\right)
122 | $$
123 | 梯度为：
124 | $$
125 | \frac{\partial l(\mathbf w)}{\partial w_j} = (y^i-f(\mathbf x^i,\mathbf w))x^i_j,\ \forall(\mathbf x^i,\mathbf w)
126 | $$
127 | 
128 | 
129 | 则可以使用梯度下降法更新参数：
130 | $$
131 | \textcolor{red}{w_j = w_j + \alpha (y^i - f(\mathbf x^i,\mathbf w))x^i_j}
132 | $$
133 | 
134 | 
135 | ### 二、多类Logistic回归
136 | 
137 | 思路是使用<mark style="color:purple;">**softmax函数**</mark>取代logistic sigmoid：
138 | $$
139 | P(C_k\vert\mathbf{w,x}) = \frac{\exp(\mathbf w^T_k\mathbf x)}{\sum\limits_{j=1}^N\exp(\mathbf w_j^T\mathbf x)}
140 | $$
141 | 而对于y，使用一个K维的<mark style="color:orange;">**独热表示**</mark>的向量来代替，它满足：
142 | $$
143 | \sum_{i=1}^KP(y_i\vert\mathbf{w,x}) = 1
144 | $$
145 | 同样的，对于样本的概率分布，采用<mark style="color:purple;">**广义伯努利分布**</mark>表示：
146 | $$
147 | \begin{align}
148 | &P(\mathbf{y|\mu}) = \prod_{i=1}^K\mathbf \mu_i^{y_i} \nonumber
149 | \\
150 | &\mu_i = P(y_i=1\vert \mathbf{w,x})
151 | \end{align}
152 | $$
153 | 这样就可以写出似然函数：
154 | $$
155 | P(\mathbf y^1,\mathbf y^2,L\mathbf y^N\vert\mathbf w_1,\dots,\mathbf w_K,\mathbf x^1,\dots,\mathbf x^N) = \prod_{i=1}^N\prod_{k=1}^KP(C_k\vert\mathbf w_k, \mathbf x^i) = \prod_{i=1}^N\prod_{k=1}^K\mu_{ik}^{y_k^i}
156 | $$
157 | 其中，$$\mu_{ik}$$为softmax函数：
158 | $$
159 | \mu_{ik} = \frac{\exp (\mathbf w_k^T\mathbf x^i)}{\sum\limits_{j=1}^K\exp(\mathbf w_j^T\mathbf x^i)}
160 | $$
161 | 那么，最优化问题就可以写成最小化对数似然函数：
162 | 
163 | 
164 | $$
165 | \begin{align}
166 | \min{E} &=\min -\ln P(\mathbf y^1,\dots,\mathbf y^N\vert\mathbf w_1,\dots,\mathbf w_K,\mathbf x^1,\dots,\mathbf x^N) \nonumber
167 | \\
168 | &= \min \color{red}{-\sum_{i=1}^N\sum_{k=1}^Ky_k^i\ln\mu_{ik}}
169 | \end{align}
170 | $$
171 | 
172 | 
173 | 上式中的$$E$$即为<mark style="color:purple;">**交叉熵损失函数**</mark>。
174 | 
175 | 对于这个损失函数，采用梯度下降法更新，梯度为：
176 | $$
177 | \frac{\partial E}{\partial \mathbf w_j} = \sum_{i=1}^N(\mu_{ij}-y^i_j)\mathbf x^i
178 | $$
179 | 
180 | 
181 | ## 6.3.2 高斯判别分析（GDA）
182 | 
183 | {% hint style="warning" %}
184 | 
185 | GDA属于生成式模型
186 | 
187 | {% endhint %}
188 | 
189 | GDA使用<mark style="color:purple;">**多变量正态分布**</mark>对$$p(\mathbf x\vert y)$$进行建模：
190 | $$
191 | \begin{align}
192 |  y&\sim Bernoulli(\beta) \nonumber
193 | \\
194 |  (x\vert y = 0) &\sim N(\mathbf\mu_0,\mathbf\Sigma_0)
195 |  \\
196 |   (x\vert y = 1) &\sim N(\mathbf\mu_1,\mathbf\Sigma_1)
197 | \end{align}
198 | $$
199 | 则对数似然函数为：
200 | $$
201 | \begin{align}
202 | L(\beta,\mathbf \mu_0,\mathbf \mu_1,\mathbf \Sigma) &= \log\prod_{i=1}^Np(\mathbf x^i,\mathbf y^i;\beta,\mathbf \mu_0,\mathbf \mu_1,\mathbf \Sigma) \nonumber
203 | \\
204 | &=\log\prod_{i=1}^Np(\mathbf y^i;\beta)p(\mathbf x^i\vert\mathbf y^i;\beta,\mathbf \mu_0,\mathbf \mu_1,\mathbf \Sigma)
205 | \end{align}
206 | $$
207 | 
208 | 使用MLE，估计各参数为：
209 | $$
210 | \begin{align}
211 | \beta &=\frac1N\sum_{i=1}^NI_{\{y^i=1\}} \nonumber
212 | \\
213 | \mu_k &= \frac{\sum\limits_{i=1}^NI_{\{y^i=k\}}x^i}{\sum\limits_{i=1}^NI_{\{y^i=k\}}} &k=\{0,1\}
214 | \\
215 | \Sigma&= \frac1N\sum_{i=1}^N(\mathbf x^i - \mathbf\mu_{y^i})(\mathbf x^i - \mathbf\mu_{y^i})^T
216 | \end{align}
217 | $$
218 | 
219 | 
220 | ### GDA与LR的区别
221 | 
222 | - GDA有<mark style="color:orange;">**很强的模型假设**</mark>：当假设是正确时，处理数据的效率更高
223 | - LR<mark style="color:orange;">**假设很弱**</mark>：对偏离假设的情况更具鲁棒性
224 | - 实际中，LR更常用
225 | - 在GDA中，特征向量x中的元素是连续的实数
226 |   - 若特征向量中元素是离散的：$$p(x_1,\dots,x_D\vert y)$$
227 | 
228 | 
229 | 
230 | 
231 | 
232 | ## 6.3.3 朴素贝叶斯（NB）
233 | 
234 | {% hint style="warning" %}
235 | 
236 | NB属于生成式模型
237 | 
238 | {% endhint %}
239 | 
240 | 
241 | 
242 | 假设给定y时，特征分量$$x_j$$<mark style="color:orange;">**相互独立**</mark>：
243 | $$
244 | p(x_1,\dots,x_D\vert y) = \prod_{i=1}^Dp(x_i\vert y)
245 | $$
246 | 那么对于给定的训练数据$$(\mathbf x^i,y^i)$$，$$i=1,\dots,N$$，对数似然为：
247 | $$
248 | L = \sum_{i=1}^N\sum_{j=1}^D\left[\log p(\mathbf x_j^i\vert y^i) + \log p(y^i)\right]
249 | $$
250 | 使用MLE，估计各参数为：
251 | $$
252 | \begin{align}
253 | &p(x_j=1\vert y=1) = \frac{\sum\limits_{i=1}^NI_{\{x_j^i=1,y^i=1\}}}{\sum\limits_{i=1}^NI_{\{y^i=1\}}} \nonumber
254 | \\
255 | &p(x_j=1\vert y=0) = \frac{\sum\limits_{i=1}^NI_{\{x_j^i=1,y^i=0\}}}{\sum\limits_{i=1}^NI_{\{y^i=0\}}}
256 | \\
257 | &p(y=1) = \frac{\sum\limits_{i=1}^NI_{\{y^i=1\}}}{N}
258 | \end{align}
259 | $$
260 | 
261 | 基于此，即可做出预测：
262 | $$
263 | \begin{align}
264 | p(y=1\vert x) &=\frac{p(x\vert y=1)p(y=1)}{p(x)} \nonumber
265 | \\
266 | &= \frac{\prod\limits_{j=1}^Dp(x_j\vert y=1)p(y=1)}{\prod\limits_{j=1}^Dp(x_j\vert y=1)p(y=1) + \prod\limits_{j=1}^Dp(x_j\vert y=0)p(y=0)}
267 | \end{align}
268 | $$
269 | 
270 | ### 平滑
271 | 
272 | 对于给定的训练集$$\{x^1,\dots,x^N\}$$，利用最大似然估计可以估计变量$$x$$取每个值的概率：
273 | $$
274 | p(x=j)=\frac{\sum\limits_{i=1}^NI_{\{\mathbf x^i=j\}}}{N},\ j=1,\dots,K
275 | $$
276 | 然而，如果训练集中某个类别的数据没有涵盖$$x$$的第$$m$$个取值的话，就无法估计相应的条件概率，从而导致模型可能会在测试集上产生误差
277 | 
278 | 对此，可以使用Laplace平滑，在各个估计中加入平滑项：
279 | $$
280 | p(x=j)=\frac{\sum\limits_{i=1}^NI_{\{\mathbf x^i=j\}}+1}{N+K},\ j=1,\dots,K
281 | $$
282 | 
283 | 
284 | ### NB与LR的区别
285 | 
286 | - **渐进比较**
287 |   - 当模型假设正确：NB与LR产生相似的分类器
288 |   - 当模型假设不正确
289 |     - LR不假设条件具有独立性，偏差较小
290 |     - LR的表现优于NB
291 | - **非渐进比较**
292 |   - 参数估计的收敛性
293 |     - NB：$$O(\log N)$$
294 |     - LR：$$O(N)$$
295 | 
296 | 
297 | 
298 | 
299 | 
300 | ## 本节绘图代码
301 | 
302 | ```python
303 | import numpy as np
304 | import matplotlib.pyplot as plt
305 | 
306 | np.random.seed(0)
307 | 
308 | mean = np.array([0, 0])
309 | cov = np.eye(2)
310 | data = np.random.multivariate_normal(mean, cov, (2, 2))
311 | 
312 | x = np.linspace(-4, 4, 30)
313 | y = np.linspace(-4, 4, 30)
314 | X, Y = np.meshgrid(x, y)
315 | 
316 | Z = np.zeros_like(X)
317 | for i in range(len(x)):
318 |     for j in range(len(y)):
319 |         point = np.array([x[i], y[j]])
320 |         Z[j, i] = np.exp(-0.5 * np.dot(np.dot((point - mean), np.linalg.inv(cov)), (point - mean).T))
321 | 
322 | fig = plt.figure()
323 | ax = fig.add_subplot(111, projection='3d')
324 | ax.plot_surface (X, Y, Z, cmap='viridis', shade=False)
325 | 
326 | plt.show()
327 | ```
328 | 
329 | 


--------------------------------------------------------------------------------
/di-liu-zhang-you-jian-du-xue-xi/fu-di-liu-zhang-zuo-ye.md:
--------------------------------------------------------------------------------
  1 | # 附 第六章作业
  2 | 
  3 | ## 作业1
  4 | 
  5 | ### 题目
  6 | 
  7 | 考虑一种特定类别的高斯朴素贝叶斯分类器。其中：
  8 | 
  9 | - y是一个布尔变量，服从伯努利分布，参数为$$\pi=P(y=1)$$，因此$$P(Y=1)=1-\pi$$
 10 | - $$x=\left[x_1,\dots,x_D\right]^T$$，其中每个特征$$x_i$$是一个连续随机变量。对于每个$$x_i$$，$$P(x_i\vert y=k)$$是一个高斯分布$$N(\mu_{ik},\sigma_i)$$，其中$$\sigma_i$$是高斯分布的标准差，不依赖于$$k$$
 11 | - 对于所有$$i\neq j$$，给定$$y$$，$$x_i$$和$$x_j$$是条件独立的（即所谓“朴素”分类器）
 12 | 
 13 | 问：证明上述这种高斯朴素贝叶斯判别器与逻辑回归得到的分类器形式是一致的
 14 | 
 15 | 
 16 | 
 17 | ### 解
 18 | 
 19 | 已知Logitstc回归的一般形式：
 20 | $$
 21 | P(Y=1\mid X)=\frac{1}{1+\exp(w_0+\sum\limits_{i=1}^nw_ix_i)}\\
 22 | \\
 23 | P(Y=0\mid X)=\frac{\exp(w_0+\sum\limits_{i=1}^nw_ix_i)}{1+\exp(w_0+\sum\limits_{i=1}^nw_ix_i)}
 24 | $$
 25 | 根据题目中高斯朴素贝叶斯分类器的假设，有：
 26 | $$
 27 | \begin{align}
 28 | P(Y=1\mid X) & \frac{P(Y=1)P(X\mid Y=1)}{P(Y=1)P(X\mid Y=1)+P(Y=0)P(X\mid Y=0)}\\ \nonumber
 29 | \\
 30 | &=\frac1{1+\frac{P(Y=0)P(X\mid Y=0)}{P(Y=1)p(X \mid Y=1)}}\\
 31 | \\
 32 | &=\frac1{1+\exp\left(\ln\frac{P(Y=0)P(X\mid Y=0)}{P(Y=1)P(X \mid Y=1)}\right)}
 33 | \end{align}
 34 | $$
 35 | 由给定的Y，x条件独立性假设，可得：
 36 | $$
 37 | \begin{align}
 38 | P(Y=1\mid X) &=\frac1{1+\exp\left(\ln\frac{P(Y=0)}{P(Y=1)}+\ln\left(\prod_i\frac{P(x_i\mid Y=0)}{P(x_i \mid Y=1)}\right)\right)}\\ \nonumber
 39 | \\
 40 | &=\frac1{1+\exp\left(\ln\frac{P(Y=0)}{P(Y=1)}+\sum_i\frac{P(x_i\mid Y=0)}{P(x_i \mid Y=1)}\right)}
 41 | \end{align}
 42 | $$
 43 | 
 44 | 由于$$P_i(x_i\mid Y=y_k)$$服从高斯分布$$\mathcal N(\mu_{ik},\sigma_i)$$，可得：
 45 | $$
 46 | \begin{align}
 47 | \sum_i \ln\frac{P(x_i\mid Y=0)}{P(x_i\mid Y=1)} &= \sum_i\ln\frac{\frac1{\sqrt{2\pi\sigma_i^2}}\exp\left(\frac{-(x_i-\mu_{i0})^2}{2\sigma_i^2}\right)}{\frac1{\sqrt{2\pi\sigma_i^2}}\exp\left(\frac{-(x_i-\mu_{i1})^2}{2\sigma_i^2}\right)}\nonumber\\
 48 | \\
 49 | &=\sum_i\ln\exp\left(\frac{\left(x_i-\mu_{i1})^2-(x_i-\mu_{i0}\right)^2}{2\sigma_i^2}\right)\\
 50 | \\
 51 | &=\sum_i\frac{\left(x_i-\mu_{i1})^2-(x_i-\mu_{i0}\right)^2}{2\sigma_i^2}\\
 52 | \\
 53 | &=\sum_i\left(\frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}x_i+\frac{\mu_{i1}^2-\mu_{i0}^2}{2\sigma_0^2}\right)
 54 | \end{align}
 55 | $$
 56 | 则：
 57 | $$
 58 | P(Y=1\mid X)=\frac1{1+\exp\left(\ln \frac{1-\pi}{\pi}+\sum_i\left(\frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}x_i+\frac{\mu_{i1}^2-\mu_{i0}^2}{2\sigma_0^2}\right)\right)}
 59 | $$
 60 | 等价于：
 61 | $$
 62 | P(Y=1\mid X) = \frac1{1+\exp(w_0+\sum_iw_ix_i)}
 63 | $$
 64 | 
 65 | 
 66 | 其中：
 67 | $$
 68 | w_0=\ln\frac{1-\pi}{\pi} + \sum_{i}\frac{\mu_{i1}^2-\mu_{i0}^2}{2\sigma_i^2}\\
 69 | \\
 70 | w_i = \frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2}
 71 | $$
 72 | 这一形式满足logistic回归的一般形式，因此判别式分类器与上述高斯朴素分类器之间的关系正是logistic回归的形式
 73 | 
 74 | 
 75 | 
 76 | 
 77 | # 作业二
 78 | 
 79 | ## 题目
 80 | 
 81 | - 去掉$p(x_i\vert y=k)$的标准差$\sigma_i$不依赖于k的假设，即对于每一个$x_i$，$P(x_i\vert y=k)$是一个高斯分布$N(\mu_{ik},\sigma_i)$，其中$i=1,\dots ,D$，$k=0$
 82 | 
 83 | 问：这个更一般的高斯朴素贝叶斯分类器所隐含的$P(x\vert y)$的新形式仍然是逻辑回归所使用的形式吗？推导$P(x\vert y)$的新形式来证明你的答案
 84 | 
 85 | 
 86 | 
 87 | ## 解
 88 | 
 89 | 在更一般的高斯朴素贝叶斯分类器中，每个特征$x_i$的条件概率分布$P(x_i|y=k)$都是一个高斯分布$N(\mu_{ik}, \sigma_i)$，其中 $i=1, ..., D$，$k=0$。
 90 | 
 91 | 则有：
 92 | 
 93 | $P(y=0|x) = \dfrac{P(x|y=0)P(y=0)}{P(x)} P(y=1|x) = \dfrac{P(x|y=1)P(y=1)}{P(x)}$
 94 | 
 95 | 由于是二元分类，所以$P(y=1) = 1 - P(y=0)$。因此有：
 96 | 
 97 | $$
 98 | \dfrac{P(y=0|x)}{P(y=1|x)} = \dfrac{P(x|y=0)}{P(x|y=1)} \times \dfrac{1 - \pi}{\pi}
 99 | $$
100 | 
101 | 将上式记作式1，其中，$\pi = P(y=1)$是类别为1的概率。
102 | 
103 | 由于$P(x\vert y=k)$是一个高斯分布，则对于$P(x_i\vert y=0)$和$P(x_i\vert y=1)$分别记作$N(\mu_{0k}, \sigma_i)$和$N(\mu_{1k}, \sigma_i)$，进而有联合概率分布：
104 | $$
105 | \begin{align}
106 | P(x_i|y=0) = P(x_1, ..., x_D|y=0) = \prod_{i=1}^{D} P(x_i|y=0) = \prod_{i=1}^{D} N(x_i|\mu_{0i}, \sigma_i) \nonumber
107 | \\
108 | P(x_i|y=1) = P(x_1, ..., x_D|y=1) = \prod_{i=1}^{D} P(x_i|y=1) = \prod_{i=1}^{D} N(x_i|\mu_{1i}, \sigma_i)
109 | \end{align}
110 | $$
111 | 
112 | 
113 | 将等式 (2) 和 (3) 代入等式 (1) ，得：
114 | 
115 | $$
116 | \frac{P(y=0|x)}{P(y=1|x)} = \frac{\prod\limits_{i=1}^{D} N(x_i|\mu_{0i}, \sigma_i)}{\prod\limits_{i=1}^{D} N(x_i|\mu_{1i}, \sigma_i)} \times \frac{1 - \pi}{\pi}
117 | $$
118 | 
119 | 等式两边同时取对数，得：
120 | 
121 | $$
122 | \begin{align}
123 | \log\left(\frac{P(y=0|x)}{P(y=1|x)}\right) &= \log\left(\frac{1 - \pi}{\pi}\right) + \sum_{i=1}^{D} \log\left(\frac{N(x_i|\mu_{0i}, \sigma_i)}{N(x_i|\mu_{1i}, \sigma_i)}\right)
124 | \\ \nonumber
125 | &= \log\left(\frac{1 - \pi}{\pi}\right) + \sum_{i=1}^{D} \left[\log\left(\frac{1}{\sqrt{2\pi}\sigma_i}\right) - \frac{(x_i-\mu_{0i})^2}{2\sigma_i^2} + \frac{(x_i-\mu_{1i})^2}{2\sigma_i^2}\right]
126 | \\
127 | &= \log\left(\frac{1 - \pi}{\pi}\right) + \sum_{i=1}^{D} \left[\frac{\mu_{1i}-\mu_{0i}}{\sigma_i^2}x_i - \frac{\mu_{1i}^2-\mu_{0i}^2}{2\sigma_i^2}\right] + C
128 | \end{align}
129 | $$
130 | 
131 | 其中C是与特征$x_i$无关的常数。继续整理等式，将其变为：
132 | $$
133 | \log\left(\frac{P(y=0|x)}{P(y=1|x)}\right) = \sum_{i=1}^{D} \left[\frac{\mu_{1i}-\mu_{0i}}{\sigma_i^2}x_i - \frac{\mu_{1i}^2-\mu_{0i}^2}{2\sigma_i^2}\right] + C'
134 | $$
135 | 
136 | 其中$C^{\\'}$是另一个与特征$x_i$无关的常数。现在，我们可以将对数比率和特征的线性组合形式结合起来：
137 | 
138 | $$
139 | \log\left(\frac{P(y=0|x)}{P(y=1|x)}\right) = \mathbf{w}^T \mathbf{x} + b
140 | $$
141 | 
142 | 这里，$\mathbf{w}=\left[\frac{\mu_{10}-\mu_{00}}{\sigma_1^2}, \frac{\mu_{11}-\mu_{01}}{\sigma_2^2}, ..., \frac{\mu_{1D}-\mu_{0D}}{\sigma_D^2}\right]$是一个参数向量，$b=\sum\limits_{i=1}^{D} \left[\frac{\mu_{1i}^2-\mu_{0i}^2}{2\sigma_i^2}\right] - \log\left(\frac{1-\pi}{\pi}\right)$是一个常数项。
143 | 
144 | 因此，通过上述推导，$P(x|y)$的新形式为：
145 | 
146 | 
147 | $$
148 | P(x|y) = \frac{1}{1+\exp(-(\mathbf{w}^T \mathbf{x} + b))}
149 | $$
150 | 
151 | 这正是逻辑回归模型中使用的形式。因此，更一般的高斯朴素贝叶斯分类器所隐含的P(x|y)的新形式与逻辑回归所使用的形式是一致的。
152 | 
153 | 
154 | 
155 | # 作业三
156 | 
157 | ## 题目
158 | 
159 | 现在，考虑我们的高斯贝叶斯分类器的以下假设（不是“朴素”的）：
160 | 
161 | - $y$是符合伯努利分布的布尔变量，参数$\pi=P(y=1)$，$P(y=0)=1-\pi$
162 | - $x=\left[x_1,x_2\right]^T$，即每个样本只考虑两个特征，每个特征为连续随机变量， 假设$P(x_1,x_2\vert y=k)$是一个二元高斯分布$N(\mu_{1k},\mu_{2k},\sigma_1, \sigma_2,\rho)$，其中$\mu_{1k}$和$\mu_{2k}$是$x_1$和$x_2$的均值，$\sigma_1$和$\sigma_2$是$x_1$和$x_2$的标准差，$\rho$是$x_1$和$x_2$的相关性。二元高斯分布的概率密度为：
163 | 
164 | $$
165 | P\left(x_1, x_2 \mid y=k\right)=\frac{1}{2 \pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp \left[-\frac{\sigma_2^2\left(x_1-\mu_{1 k}\right)^2+\sigma_1^2\left(x_2-\mu_{2 k}\right)^2-2 \rho \sigma_1 \sigma_2\left(x_1-\mu_{1 k}\right)\left(x_2-\mu_{2 k}\right)}{2\left(1-\rho^2\right) \sigma_1^2 \sigma_2^2}\right]
166 | $$
167 | 
168 | 问：这种不那么朴素的高斯贝叶斯分类器所隐含的$P(x\vert y)$的形式仍然是逻辑回归所使用的形式吗？推导$P(y\vert x)$的形式来证明你的答案
169 | 
170 | 
171 | 
172 | ## 解
173 | 
174 | 根据贝叶斯定理，有：
175 | 
176 | $$
177 | P(y|x) = \frac{P(x|y)P(y)}{P(x)}
178 | $$
179 | 
180 | 首先计算$P(x|y)$的概率密度函数：
181 | 
182 | $$
183 | P(x|y=k) = \frac{1}{2 \pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp \left[-\frac{\sigma_2^2\left(x_1-\mu_{1k}\right)^2+\sigma_1^2\left(x_2-\mu_{2k}\right)^2-2 \rho \sigma_1 \sigma_2\left(x_1-\mu_{1k}\right)\left(x_2-\mu_{2k}\right)}{2\left(1-\rho^2\right) \sigma_1^2 \sigma_2^2}\right]
184 | $$
185 | 
186 | 然后计算P(x)的边缘概率密度函数。由于只有两个特征，我们可以把它们视为多元高斯分布的条件下界的一部分。因此，可以计算P(x)如下：
187 | 
188 | $$
189 | P(x) = \sum_k P(x|y=k)P(y=k) = P(x|y=1)P(y=1) + P(x|y=0)P(y=0)
190 | $$
191 | 
192 | 接下来，计算$P(y=1|x)$和$P(y=0|x)$：
193 | 
194 | $$
195 | \begin{align}
196 | P(y=1|x) = \frac{P(x|y=1)P(y=1)}{P(x)} = \frac{P(x|y=1)P(y=1)}{P(x|y=1)P(y=1) + P(x|y=0)P(y=0)} \nonumber
197 | \\
198 | P(y=0|x) = \frac{P(x|y=0)P(y=0)}{P(x)} = \frac{P(x|y=0)P(y=0)}{P(x|y=1)P(y=1) + P(x|y=0)P(y=0)}
199 | \end{align}
200 | $$
201 | 
202 | 将P(x|y)的表达式代入上面的公式，并利用比例关系简化公式：
203 | 
204 | $$
205 | P(y=1|x) = \frac{1}{1 + \frac{P(x|y=0)P(y=0)}{P(x|y=1)P(y=1)} \cdot \frac{P(y=1)}{P(y=0)}}
206 | $$
207 | 
208 | 注意到$P(y=1)/P(y=0)$是一个常数，可以化简上式：
209 | 
210 | $$
211 | P(y=1|x) = \frac{1}{1 + \frac{P(x|y=0)P(y=0)}{P(x|y=1)P(y=1)} \cdot \frac{P(y=1)}{P(y=0)}} = \frac{1}{1 + \frac{P(x|y=0)P(y=0)}{P(x|y=1)P(y=1)} \cdot \frac{1-\pi}{\pi}}
212 | $$
213 | 
214 | 将$P(x|y=0)$和$P(x|y=1)$的表达式代入上式，并进行计算和化简。最终，我们会得到一个与逻辑回归形式相似的表达式，但其中的权重项**会受到先验概率和$P(x)$的影响**。因此，这种不那么朴素的高斯贝叶斯分类器所隐含的$P(x|y)$的形式不同于逻辑回归所使用的形式。
215 | 
216 | 
217 | 
218 | # 作业四
219 | 
220 | ## 题目
221 | 
222 | 利用表格中的数据训练朴素贝叶斯分类器
223 | $$
224 | \begin{align}
225 | x_1 &= \{1,2,3\} \nonumber
226 | \\
227 | x_2 &= \{S,M,L,N\}
228 | \\
229 | y &= \{1,-1\}
230 | \end{align}
231 | $$
232 | 
233 | |      |  1   |  2   |  3   |  4   |  5   |  6   |  7   |  8   |  9   |  10  |  11  |  12  |  13  |  14  |  15  |
234 | | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
235 | |  x1  |  1   |  1   |  1   |  1   |  1   |  2   |  2   |  2   |  2   |  2   |  3   |  3   |  3   |  3   |  3   |
236 | |  x2  |  S   |  M   |  M   |  S   |  S   |  S   |  M   |  M   |  L   |  L   |  L   |  M   |  M   |  L   |  L   |
237 | |  y   |  -1  |  -1  |  1   |  1   |  -1  |  -1  |  -1  |  1   |  1   |  1   |  1   |  1   |  1   |  1   |  -1  |
238 | 
239 | 给定测试样本$x=(2,S)^T$和$x=(1,N)^T$，请预测他们的标签
240 | 
241 | 
242 | 
243 | ## 解
244 | 
245 | ```python
246 | from sklearn.naive_bayes import GaussianNB
247 | import numpy as np
248 | 
249 | label = {'S': 0, 'M': 1, 'L': 2, 'N': 3}
250 | 
251 | x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
252 | x2 = ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L']
253 | y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
254 | 
255 | X = np.array([x1, [label.get(x, -1) for x in x2]]).T
256 | 
257 | clf = GaussianNB()
258 | 
259 | clf.fit(X, y)
260 | 
261 | test_samples = [[2, label['S']], [1, label['N']]]
262 | predictions = clf.predict(test_samples)
263 | 
264 | for index, predict in enumerate(predictions):
265 |     print(f'sample{index}: {predict}')
266 | ```
267 | 
268 | sample0: -1
269 | sample1: 1
270 | 
271 | 


--------------------------------------------------------------------------------
/di-qi-zhang-zhi-chi-xiang-liang-ji/7.1-xian-xing-zhi-chi-xiang-liang-ji.md:
--------------------------------------------------------------------------------
  1 | # 7.1 线性支持向量机
  2 | 
  3 | ## 7.1.1 间隔
  4 | 
  5 | ### 一、函数间隔
  6 | 
  7 | 对于一个训练样本$$(\mathbf x^i,y^i)$$，它到$$(\mathbf w,b)$$确定的超平面的<mark style="color:purple;">**函数间隔**</mark>为：
  8 | $$
  9 | \hat{\gamma}^i = y^i(\mathbf w^T\mathbf x^i+b)
 10 | $$
 11 | <mark style="color:orange;">**函数间隔与距离是正相关的**</mark>
 12 | 
 13 | 
 14 | 
 15 | - $$y^i=1$$，$$(\mathbf w^T\mathbf x^i + b)$$是一个大的正数
 16 | - $$y^i=-1$$，$$(\mathbf w^T\mathbf x^i + b)$$是一个比较小的负数
 17 | - $$y^i(\mathbf w^T\mathbf x^i+b)>0$$，说明模型对样本的预测是正确的
 18 | - <mark style="color:red;">**大的函数间隔→高的预测置信度**</mark>
 19 | 
 20 | 
 21 | 
 22 | 对于训练数据集$$S = \{(\mathbf x^i,y^i),\ i=1,\dots,N\}$$，它的函数间隔定义为所有样本中<mark style="color:orange;">**最小的**</mark>那个：
 23 | $$
 24 | \hat{\gamma} = \min_i \hat{\gamma}^i,\ i=1,\dots,N
 25 | $$
 26 | 
 27 | 
 28 | ### 二、几何间隔
 29 | 
 30 | ![](../.gitbook/assets/7.1.1.png)
 31 | 
 32 | 对于样本$$(\mathbf x^i,y^i)$$，$$y^i=1$$，它到决策面的距离$$\gamma^i$$是线段AB的长度
 33 | 
 34 | 其中，点B可以表示为：
 35 | $$
 36 | \mathbf x^i - \frac{\gamma^i\mathbf w}{\Vert \mathbf w\Vert_2}
 37 | $$
 38 | 由于点B在<mark style="color:purple;">**决策边界**</mark>上，即：
 39 | $$
 40 | \mathbf w^T\left(\mathbf x^i - \frac{\gamma^i\mathbf w}{\Vert \mathbf w\Vert_2}\right) + b = 0
 41 | $$
 42 | 求解此方程可以得到样本$$(\mathbf x^i,y^i)$$的<mark style="color:purple;">**几何间隔**</mark>为：
 43 | $$
 44 | \gamma^i = y^i\left(\left(\frac{\mathbf w}{\Vert\mathbf w\Vert_2}\right)^T\mathbf x^i + \frac{b}{\Vert\mathbf w\Vert_2}\right)
 45 | $$
 46 | 同样的，对于训练数据集$$S = \{(\mathbf x^i,y^i),\ i=1,\dots,N\}$$，关于判别界面的几何间隔为：
 47 | $$
 48 | \gamma = \min_i \gamma^i,\ i=1,\dots,N
 49 | $$
 50 | 
 51 | 
 52 | **函数间隔与几何间隔的关系**：
 53 | $$
 54 | \gamma^i = \frac{\hat{\gamma}^i}{\Vert \mathbf w\Vert_2}
 55 | \\
 56 | \gamma = \frac{\hat{\gamma}}{\Vert \mathbf w\Vert_2}
 57 | $$
 58 | 显然，若$$\Vert \mathbf w\Vert_2=1$$，则二者相等
 59 | 
 60 | 
 61 | 
 62 | <mark style="color:red;">**几何间隔具有不变性**</mark>
 63 | 
 64 | 
 65 | 
 66 | ### 三、最优间隔分类器
 67 | 
 68 | 假设数据是<mark style="color:orange;">**线性可分**</mark>的
 69 | 
 70 | 给定一个训练集，一个自然的想法是试图找到一个使<mark style="color:orange;">**几何间隔最大化**</mark>的决策边界，这表示对训练集的有可信的预测并且对训练数据的良好“拟合”
 71 | 
 72 | ![](../.gitbook/assets/7.1.2.png)
 73 | 
 74 | 那么就需要最大化间隔，即：
 75 | $$
 76 | \begin{align}
 77 | \max_{\gamma,\mathbf w,b}\ &\gamma \nonumber
 78 | \\
 79 | s.t.\ &y^i\left(\left(\frac{\mathbf w}{\Vert \mathbf w\Vert_2}\right)^T\mathbf x^i + \frac{b}{\Vert \mathbf w\Vert_2}\right)\geq\gamma,i=1,\dots,N
 80 | \end{align}
 81 | $$
 82 | 可以将问题转化为<mark style="color:purple;">**几何间隔**</mark>：
 83 | $$
 84 | \begin{align}
 85 | \max_{\gamma,\mathbf w,b}\ &\frac{\hat{\gamma}}{\Vert \mathbf w\Vert_2} \nonumber
 86 | \\
 87 | s.t.\ &y^i(\mathbf w^T\mathbf x^i+b)\geq\hat\gamma,i=1,\dots,N
 88 | \end{align}
 89 | $$
 90 | 进一步简化问题，**令几何间隔为单位1**：
 91 | $$
 92 | \begin{align}
 93 | \min_{\mathbf w,b}\ &\frac12\Vert\mathbf w\Vert_2^2 \nonumber
 94 | \\
 95 | s.t.\ &y^i(\mathbf w^T\mathbf x^i+b)\geq1,i=1,\dots,N
 96 | \end{align}
 97 | $$
 98 | 
 99 | 也就是说，在分类正确的情况下，样本到判别界面的距离应当大于单位1
100 | 
101 | 
102 | 
103 | 
104 | 
105 | ## 7.1.2 拉格朗日对偶性
106 | 
107 | ### 一、一般情况
108 | 
109 | 对于一般的含有等式和不等式约束的最优化问题：
110 | $$
111 | \begin{align}
112 | &\min_{w} \quad f(w) \nonumber
113 | \\
114 | &\begin{array}
115 | {r@{\quad} l@{\quad}l}
116 | s.t. &g_i(w)\leq0, &i=1,\dots,k
117 | \\
118 |  &h_i(w) = 0, &i=1,\dots,l
119 | \end{array}
120 | \end{align}
121 | $$
122 | 可以定义<mark style="color:purple;">**广义拉格朗日函数**</mark>：
123 | $$
124 | L(w,\alpha,\beta) = f(w) + \color{blue}\sum_{i=1}^k \color{red}{\alpha_i} \color{blue}g_i(w) + \sum_{i=1}^l\color{red}{\beta_i} \color{blue}h_i(w)
125 | $$
126 |  上式中，$$\alpha_i$$和$$\beta_i$$分别是等式约束和不等式约束的<mark style="color:purple;">**拉格朗日乘子**</mark>，并要求$$\color{red}\alpha_i\geq0$$
127 | 
128 | 那么考虑对于$$w$$的函数：
129 | $$
130 | \theta_P(w) = \max_{\alpha,\beta;\alpha_i\geq0}{L(w,\alpha,\beta)}
131 | $$
132 | 这里下标P表示<mark style="color:orange;">**原始问题**</mark>
133 | 
134 | 假设存在$$w_i$$使得约束条件不成立（即$$g_i(w)>0$$或$$h_i(w)\neq0$$），则可以通过令$$\alpha_i$$或$$\beta_i$$等于正无穷来使得$$\theta_P(w)=+\infin$$
135 | 
136 | 而若$$w$$满足全部约束，显然可以有$$\theta_P(w) = f(w)$$，即：
137 | $$
138 | \theta_P(w) = 
139 | \begin{cases}
140 | f(w) & w满足约束
141 | \\
142 | +\infin & 其它
143 | \end{cases}
144 | $$
145 | 那么如果考虑最小问题：
146 | $$
147 | \min_{w}\quad\theta_P(w) = \min_{w}\max_{\alpha,\beta;\alpha_i\geq0}{L(w,\alpha,\beta)}
148 | $$
149 | 它与原始最优化问题是等价的，问题$$\min\limits_{w}\max\limits_{\alpha,\beta;\alpha_i\geq0}{L({w,\alpha,\beta})}$$称为<mark style="color:purple;">**广义拉格朗日函数的极小极大问题**</mark>。这样就将原始问题的最优解转化为了拉格朗日函数的极小极大问题。定义原始问题的最优值
150 | $$
151 | p^* = \min_{x}\quad\theta_P(w)
152 | $$
153 | 为<mark style="color:purple;">**原始问题的值**</mark>
154 | 
155 | 
156 | 
157 | ### 二、对偶问题
158 | 
159 |  定义：
160 | $$
161 | \theta_D(\alpha,\beta) = \min_w\quad L(w,\alpha,\beta)
162 | $$
163 | 
164 | 考虑最大化$$\theta_D(\alpha,\beta)$$，即：
165 | $$
166 | \max_{\alpha,\beta;\alpha_i\geq0}\theta_D(\alpha,\beta) = \max_{\alpha,\beta;\alpha_i\geq0} \min_w\quad L(w,\alpha,\beta)
167 | $$
168 | 问题$$\max_{\alpha,\beta;\alpha_i\geq0} \min_w\quad L(w,\alpha,\beta)$$称为<mark style="color:purple;">**广义拉格朗日函数的极大极小问题**</mark>。可以将广义拉格朗日函数的极大极小问题表示为如下约束：
169 | $$
170 | \begin{align}
171 | \max_{\alpha,\beta}&\quad\theta_D(\alpha,\beta) = \max_{\alpha,\beta}\ \min_w\ L(w,\alpha,\beta) \nonumber
172 | \\
173 | \text{s.t.}&\quad\alpha_i\geq0,i=1,\dots,k
174 | \end{align}
175 | $$
176 | 这一优化问题就称为原始问题的<mark style="color:purple;">**对偶问题**</mark>，并定义对偶问题的最优解：
177 | $$
178 | d^* = \max_{\alpha,\beta;\alpha_i\geq0}\theta_D(\alpha,\beta)
179 | $$
180 | 为对偶问题的值
181 | 
182 | 
183 | 
184 | ### 三、原始问题和对偶问题的关系
185 | 
186 | 若原始问题和对偶问题都有最优解，则：
187 | $$
188 | d^* = \max_{\alpha,\beta;\alpha_i\geq0}\min_{w} L(w,\alpha,\beta) \leq \min_w \max_{\alpha,\beta;\alpha_i\geq0} L(w,\alpha,\beta) = p^*
189 | $$
190 | {% hint style="success" %}
191 | 
192 | 设$$w^*$$和$$\alpha^*$$，$$\beta^*$$分别为原始问题和对偶问题的可行解，若$$\color{red}d^*=p^*$$，则$$w^*$$和$$\alpha^*$$，$$\beta^*$$分别是原始问题和对偶问题的<mark style="color:orange;">**最优解**</mark>
193 | 
194 | {% endhint %}
195 | 
196 | 
197 | 
198 | ### 四、KKT条件
199 | 
200 | 对于原始问题和对偶问题，假设$$f(w)$$和$$g_i(w)$$是凸函数，$$h_i(w)$$是仿射函数，并且不等式约束是<mark style="color:orange;">**严格执行**</mark>的，即$$g_i(w)<0$$，则存在$$w^*,\alpha^*,\beta^*$$，使得$$w$$是原始问题的解，$$\alpha,\beta$$是对偶问题的解，且：
201 | $$
202 | p^*=d^*=L(w^*,\alpha^*,\beta^*)
203 | $$
204 | 它的充分必要条件是以下<mark style="color:purple;">**Karush-Kuhn-Tucker（KKT）条件**</mark>：
205 | $$
206 | \frac{\partial L(w^*,\alpha^*,\beta^*)}{\partial w} = 0
207 | \\
208 | \color{red}{\alpha_i^*g_i(w^*)=0,\quad i=1,\dots,k}
209 | \\
210 | g_i(w^*)\leq0,\quad i=1,\dots,k
211 | \\
212 | \alpha_i^*\geq0
213 | \\
214 | h_i(w^*) = 0,\quad i=1,\dots,l
215 | $$
216 | 
217 | 
218 | 其中，上式中标红的部分称为<mark style="color:purple;">**KKT的对偶互补条件**</mark>，总结下来就是：若<mark style="color:orange;">**强对偶**</mark>（$$\alpha_i^*>0$$），则$$\alpha_i^*g_i(w^*)=0$$
219 | 
220 | 
221 | 
222 | 
223 | 
224 | 
225 | ## 7.1.3 线性SVM
226 | 
227 | <mark style="color:red;">**支持向量**</mark>：距分离超平面<mark style="color:orange;">**最近**</mark>的样本
228 | 
229 | 
230 | 
231 | - **输入**：<mark style="color:orange;">**线性可分**</mark>的数据集$$S=\{(\mathbf x^i,y^i),i=1,\dots,N\}$$
232 | - **输出**：判别函数及决策/判别界面
233 | - **最优化问题**
234 | 
235 | $$
236 | \begin{align}
237 | \min_{\mathbf w,b}\ &\frac12\Vert\mathbf w\Vert_2^2 \nonumber
238 | \\
239 | s.t.\ &y^i(\mathbf w^T\mathbf x^i+b)\geq1,i=1,\dots,N
240 | \end{align}
241 | $$
242 | 
243 | - **分离超平面**：$$(\mathbf w^*)^T\mathbf x + b^* = 0$$
244 | - **判别函数**：$$f_{\mathbf w,b}(\mathbf x) = sign((\mathbf w^*)^T\mathbf x + b^*)$$
245 | 
246 | 
247 | 
248 | {% hint style="success" %}
249 | 
250 | **理论保证**：对于线性可分的训练数据集，最大间隔分类器存在且唯一
251 | 
252 | {% endhint %}
253 | 
254 | 
255 | 
256 | ### 一、最优间隔分类器的对偶解
257 | 
258 | ![](../.gitbook/assets/7.1.3.png)
259 | 
260 | 对于上面的最优化问题，其约束条件为：
261 | $$
262 | g_i(\mathbf w) = -y^i(\mathbf w^T\mathbf x^i+b) + 1 \leq 0,\quad i=1,\dots,N
263 | $$
264 | 由于KKT对偶互补条件为$$\alpha_i^*g_i(w)=0$$，而在本问题中显然至少有一个$$\alpha_i\neq0$$（证明略），因此满足KKT条件即要求：
265 | $$
266 | g_i(\mathbf w^*) = 0
267 | $$
268 | 而满足这一条件的样本就是<mark style="color:red;">**支持向量**</mark>，其距离到分离超平面的距离为单位1
269 | 
270 | 
271 | 
272 | {% hint style="warning" %}
273 | 
274 | 支持向量的数量远小于样本数量，因此可以大大减少训练成本
275 | 
276 | {% endhint %}
277 | 
278 | 
279 | 
280 | 将整个问题写为广义拉格朗日函数：
281 | $$
282 | L(\boldsymbol w,b,\boldsymbol \alpha) = \frac12\Vert\boldsymbol w\Vert_2^2 - \sum_{i=1}^N\alpha_i[y^i(\boldsymbol w^T\boldsymbol x^i + b)-1]
283 | $$
284 | 那么它的**对偶问题**为：
285 | $$
286 | \theta_D(\boldsymbol \alpha) = \min_{w,b}L(\boldsymbol w,b,\boldsymbol \alpha)
287 | $$
288 | 记对偶问题的最优值为$$d^*$$，令偏导数等于0，求解$$\boldsymbol w^*$$和$$b^*$$：
289 | $$
290 | \begin{align}
291 | &\because\ \frac{\partial}{\partial \boldsymbol w}L(\boldsymbol w,b,\boldsymbol \alpha) = \boldsymbol w - \sum_{i=1}^N\alpha_iy^i\boldsymbol x^i = 0 \nonumber
292 | \\
293 | &\therefore \boldsymbol w^* = \sum_{i=1}^N\alpha_iy^i\boldsymbol x^i
294 | \\
295 | &\quad\frac{\partial}{\partial b}L(\boldsymbol w,b,\boldsymbol \alpha) = \sum_{i=1}^N\alpha_iy^i = 0
296 | \end{align}
297 | $$
298 | 带入拉格朗日函数，即有：
299 | $$
300 | \color{red}\theta_D(\boldsymbol \alpha) = \sum_{i=1}^N\alpha_i - \frac12\sum_{i,j=1}^Ny^iy^j\alpha_i\alpha_j(\boldsymbol x^i)^T\boldsymbol x^j
301 | $$
302 | 
303 | ### 二、线性可分SVM的求解过程
304 | 
305 | 输入<mark style="color:orange;">**线性可分**</mark>的训练数据集$$S=\{(\boldsymbol x^i,y^i),i=1,\dots,N\}$$
306 | 
307 | 首先通过求解对偶问题，得到拉格朗日乘子的最优解$$\boldsymbol \alpha^*$$：
308 | $$
309 | \begin{align}
310 | &\max_{\boldsymbol \alpha}\quad \sum_{i=1}^N\alpha_i-\frac12\sum_{i,j=1}^Ny^iy^j\alpha_i\alpha_j(\boldsymbol x^i)^T\boldsymbol x^j \nonumber
311 | \\
312 | &\begin{array}
313 | {r@{\quad} l@{\quad}l}
314 | s.t. & \alpha_i \geq0 &i=1,\dots,N
315 | \\
316 | &\sum\limits_{i=1}^N\alpha_iy^i = 0
317 | \end{array}
318 | \end{align}
319 | $$
320 | 
321 | 
322 | 进而得到原问题的最优解：
323 | $$
324 | \begin{align}
325 | &\color{red}{\boldsymbol w^* = \sum_{i=1}^N\alpha_i^*y^i\boldsymbol x^i} \nonumber
326 | \\
327 | &\color{red}{b^* = y^j - \sum_{i=1}^N\alpha^*_iy^i(\boldsymbol x^i)^T\boldsymbol x^j}
328 | \end{align}
329 | $$
330 | 
331 | 
332 | - **分离超平面**：$$(\boldsymbol w^*)^T\boldsymbol x + b^*=0$$
333 | - **判别函数**：$$f_{\boldsymbol w,b}(\boldsymbol w^*)^T\boldsymbol x + b^*$$
334 | 
335 | 
336 | 
337 | {% hint style="warning" %}
338 | 
339 | - 在计算$$\boldsymbol w^*$$和$$b^*$$时，只需要利用那些$$\alpha_i>0$$的那些样本（<mark style="color:purple;">**支持向量**</mark>）来计算
340 | - **对偶技巧**：只需要计算训练样本与输入特征的<mark style="color:orange;">**内积**</mark>——$$(\boldsymbol x^i)^T\boldsymbol x = \boldsymbol x^i\cdot \boldsymbol x$$
341 | 
342 | {% endhint %}
343 | 
344 | 
345 | 
346 | ## 7.1.4 非线性可分SVM
347 | 
348 | 假设
349 | 
350 | 
351 | 
352 | ## 本文中的绘图代码：
353 | 
354 | ```python
355 | import numpy as np
356 | import matplotlib.pyplot as plt
357 | from sklearn.svm import SVC
358 | 
359 | X = np.array([[3, 3], [4, 3], [1, 1], [1, 0], [0, 1.8]])
360 | y = np.array([1, 1, -1, -1, -1])
361 | 
362 | svm = SVC(kernel='linear')
363 | svm.fit(X, y)
364 | 
365 | w = svm.coef_[0]
366 | b = svm.intercept_[0]
367 | 
368 | plt.scatter(X[:, 0], X[:, 1], c=['b' if label == 1 else 'orange' for label in y])
369 | ax = plt.gca()
370 | x_lim = ax.get_xlim()
371 | y_lim = ax.get_ylim()
372 | 
373 | # 创建网格以绘制分离超平面
374 | xx = np.linspace(-1, 5, 30)
375 | yy = np.linspace(-1, 5, 30)
376 | YY, XX = np.meshgrid(yy, xx)
377 | xy = np.vstack([XX.ravel(), YY.ravel()]).T
378 | Z = svm.decision_function(xy).reshape(XX.shape)
379 | 
380 | # 绘制分离超平面和间隔边界
381 | ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
382 |            linestyles=['--', '-', '--'])
383 | # 高亮显示支持向量
384 | ax.scatter(svm.support_vectors_[:, 0], svm.support_vectors_[:, 1], s=100,
385 |            linewidth=1, facecolors='none', edgecolors='r')
386 | plt.show()
387 | ```
388 | 
389 | 
390 | 
391 | ## 参考
392 | 
393 | [https://www.kaggle.com/code/alaapdhall/custom-svm-with-numpy-vs-sklearn](https://www.kaggle.com/code/alaapdhall/custom-svm-with-numpy-vs-sklearn)
394 | 


--------------------------------------------------------------------------------
/di-qi-zhang-zhi-chi-xiang-liang-ji/7.2-he-zhi-chi-xiang-liang-ji.md:
--------------------------------------------------------------------------------
1 | # 7.2 核支持向量机
2 | 
3 | 


--------------------------------------------------------------------------------
/di-qi-zhang-zhi-chi-xiang-liang-ji/7.3-xu-lie-zui-xiao-you-hua-suan-fa.md:
--------------------------------------------------------------------------------
1 | # 7.3 序列最小优化算法
2 | 
3 | 


--------------------------------------------------------------------------------
/di-qi-zhang-zhi-chi-xiang-liang-ji/fu-di-qi-zhang-zuo-ye.md:
--------------------------------------------------------------------------------
  1 | # 附 第七章作业
  2 | 
  3 | ## 作业1
  4 | 
  5 | ### 题目
  6 | 
  7 | 给定如下训练数据集
  8 | $$
  9 | x_1 = [3\ 3]\\ 
 10 | y^1 = 1\\
 11 | x_2 = [4\ 3]\\
 12 | y^2 = 1\\
 13 | x_3 = [1\ 1]\\
 14 | y^3 = -1
 15 | $$
 16 | 通过求解SVM的对偶问题来求解最大间隔的分类超平面    
 17 | 
 18 | 
 19 | 
 20 | ### 解
 21 | 
 22 | 首先，对偶问题为：
 23 | $$
 24 | \begin{align}
 25 | \max\limits_{\alpha} &\sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i,j=1}^N y^i y^j\alpha_i\alpha_j (\mathbf{x}^i)^T\mathbf{x}^j\nonumber
 26 | \\
 27 | \operatorname{s.t.}&\ \alpha_i\geq0,i=1,\dots,N
 28 | \\
 29 | &\sum_{i=1}^N \alpha_i y^i = 0
 30 | \end{align}
 31 | $$
 32 | 并求的其最优解$$\boldsymbol{\alpha}^*=(\alpha_1^*,\dots,\alpha_l^*)$$
 33 | 
 34 | 则得到原问题的最优解：
 35 | $$
 36 | \begin{align}
 37 | &\mathbf{w}^*=\sum_{i=1}^N\alpha_i^*y^i\mathbf{x}^i \nonumber
 38 | \\
 39 | &b^* = y^j-\sum_{i=1}^N\alpha_i^*y^i(\mathbf{x}^i)^T\mathbf{x}^j
 40 | \\
 41 | & \alpha_j^* > 0
 42 | \end{align}
 43 | $$
 44 | 进而可以得到分离超平面：
 45 | $$
 46 | (\mathbf{w}^*)^T\mathbf{x} + b^* = 0
 47 | $$
 48 | 
 49 | 
 50 | 
 51 | 对于给定的训练数据集，有：
 52 | 
 53 | $$
 54 | \begin{align} 
 55 | \mathbf{x}^1 &= [3\ 3] \\ \nonumber
 56 | \mathbf{x}^2 &= [4\ 3] \\ 
 57 | \mathbf{x}^3 &= [1\ 1] \\ 
 58 | y^1 &= 1 \\ y^2 &= 1 \\ y^3 &= -1 
 59 | \end{align}
 60 | $$
 61 | 
 62 | 则有：
 63 | $$
 64 | k = \mathbf x^T \cdot \mathbf x = \begin{pmatrix}
 65 | 18&21&6\\
 66 | 21&25&7\\
 67 | 6&7&2
 68 | \end{pmatrix}
 69 | $$
 70 | 
 71 | 
 72 | 
 73 | 目标函数为：
 74 | $$
 75 | \max\limits_{\alpha} \quad\sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i,j=1}^N y^i y^j\alpha_i\alpha_j k_{ij} 
 76 | $$
 77 | 
 78 | 约束条件为：
 79 | 
 80 | $$
 81 | \begin{align} 
 82 | &\alpha_i \geq 0, i=1,\dots,N \\ \nonumber
 83 | &\sum_{i=1}^N \alpha_i y^i = 0 
 84 | \end{align}
 85 | $$
 86 | 
 87 | 
 88 | 
 89 | ### 代码实现
 90 | 
 91 | #### 使用scipy求解最优化
 92 | 
 93 | ```python
 94 | import numpy as np
 95 | from scipy.optimize import minimize
 96 | 
 97 | X = np.array([[3, 3], [4, 3], [1, 1]])
 98 | y = np.array([1, 1, -1])
 99 | k = np.dot(X, X.T)
100 | 
101 | 
102 | def objective(alpha):
103 |     return -np.sum(alpha) + 0.5 * np.sum(np.outer(y, y) * np.outer(alpha, alpha) * k)
104 | 
105 | 
106 | def constraint1(alpha):
107 |     return alpha
108 | 
109 | 
110 | def constraint2(alpha):
111 |     return np.dot(alpha, y)
112 | 
113 | 
114 | N = X.shape[0]
115 | bounds = [(0, None)] * N
116 | constraints = [{'type': 'ineq', 'fun': constraint1}, {'type': 'eq', 'fun': constraint2}]
117 | 
118 | alpha_initial = np.zeros(N)
119 | result = minimize(objective, alpha_initial, bounds=bounds, constraints=constraints)
120 | 
121 | alphas = result.x
122 | print(f'Optimal alphas:[{alphas[0]},{alphas[1]},{alphas[2]}]')
123 | 
124 | w = np.dot(alphas * y, X)
125 | print(w)
126 | 
127 | b_index = np.argmax(alphas)
128 | b = y[b_index] - np.sum(alphas * y * np.dot(X, X[b_index]))
129 | print(b)
130 | ```
131 | 
132 | 输出结果为：
133 | 
134 | w=[0.49999996 0.49999996]
135 | b=-1.999999927629584
136 | 
137 | 
138 | 
139 | #### 使用sklearn中的SVM模块求解
140 | 
141 | ```python
142 | import numpy as np
143 | from sklearn.svm import SVC
144 | 
145 | X = np.array([[3, 3], [4, 3], [1, 1]])
146 | y = np.array([1, 1, -1])
147 | 
148 | svm = SVC(kernel='linear')
149 | svm.fit(X, y)
150 | 
151 | w = svm.coef_[0]
152 | b = svm.intercept_[0]
153 | 
154 | print(w)
155 | print(b)
156 | ```
157 | 
158 | 输出结果为：
159 | 
160 | w = [0.5 0.5]
161 | b = -2.0
162 | 
163 | 
164 | 
165 | ## 作业2
166 | 
167 | ### 题目
168 | 
169 | 高斯核有以下形式：
170 | $$
171 | K(x,z) = \exp \left(-\frac{\Vert x-z \Vert^2}{2\sigma^2}\right)
172 | $$
173 | 请证明高斯核函数可以表示为一个无穷维特征向量的内积
174 | 
175 | **提示**：利用以下展开式，将中间的因子展开为幂级数
176 | $$
177 | K(x,z) = \exp \left(-\frac{x^Tx}{2\sigma^2}\right) \exp \left(\frac{x^Tz}{2\sigma^2}\right) \exp \left(-\frac{z^Tz}{2\sigma^2}\right)
178 | $$
179 | 
180 | 
181 | ### 证明
182 | 
183 | 首先，有：
184 | $$
185 | \begin{align}
186 | K(x,z) &= \exp \left(-\frac{\Vert x-z \Vert^2}{2\sigma^2}\right) \nonumber
187 | \\
188 | &= \exp \left(-\frac{x^2+z^2-2xz}{2\sigma^2}\right)
189 | \\
190 | &= \exp \left(-\frac{x^2+z^2}{2\sigma^2}\right) \exp \left(\frac{xz}{\sigma^2}\right)
191 | \end{align}
192 | $$
193 | 记为式（1）
194 | 
195 | 由于函数$$e^x$$的幂级数展开式为：
196 | $$
197 | \begin{align}
198 | e^x &= \sum_{i=0}^\infin\frac{x^n}{n!}\\ \nonumber
199 | &= 1 + x + \frac{x^2}{2!} + \dots + \frac{x^n}{n!} + R_n
200 | \end{align}
201 | $$
202 | 因此，可以有：
203 | $$
204 | \begin{align}
205 | \exp \left(\frac{xz}{\sigma^2}\right) &= 1 + \left(\frac{xz}{\sigma^2}\right) + \frac{\left(\frac{xz}{\sigma^2}\right)^2}{2!} + \dots + \frac{(\frac{xz}{\sigma^2})^n}{n!} + \dots \nonumber
206 | \\
207 | &=1 + \frac1{\sigma^2}\cdot\frac{xz}{1!} + \left(\frac1{\sigma^2}\right)^2\cdot\frac{(xz)^2}{2!} + \dots + \left(\frac1{\sigma^2}\right)^n\cdot\frac{(xz)^n}{n!} + \dots
208 | \\
209 | &= 1\cdot1 + \frac1{1!}\frac x\sigma\cdot\frac z\sigma + \frac1{2!}\frac{x^2}{\sigma^2}\cdot\frac{z^2}{\sigma^2} +\dots+\frac1{n!}\frac{x^n}{\sigma^n}\cdot\frac{z^n}{\sigma^n} + \dots
210 | \end{align}
211 | $$
212 | 将其带回（1）式，有：
213 | $$
214 | \begin{align}
215 | K(x,z) &= \exp \left(-\frac{x^2+z^2}{2\sigma^2}\right)\cdot\left(1\cdot1 + \frac1{1!}\frac x\sigma\cdot\frac z\sigma + \frac1{2!}\frac{x^2}{\sigma^2}\cdot\frac{z^2}{\sigma^2} +\dots+\frac1{n!}\frac{x^n}{\sigma^n}\cdot\frac{z^n}{\sigma^n} + \dots\right) \nonumber
216 | \\
217 | &= \exp \left(-\frac{x^2}{2\sigma^2}\right)\cdot\exp \left(-\frac{z^2}{2\sigma^2}\right)\cdot\left(1\cdot1 + \frac1{1!}\frac x\sigma\cdot\frac z\sigma + \frac1{2!}\frac{x^2}{\sigma^2}\cdot\frac{z^2}{\sigma^2} +\dots+\frac1{n!}\frac{x^n}{\sigma^n}\cdot\frac{z^n}{\sigma^n} + \dots\right)
218 | \\
219 | &= \Phi(x)^T\cdot\Phi(z)
220 | \end{align}
221 | $$
222 | 其中，
223 | $$
224 | \Phi(x) = \exp \left(-\frac{x^2}{2\sigma^2}\right)\cdot\left(1 , \sqrt\frac1{1!}\frac x\sigma , \sqrt\frac1{2!}\frac{x^2}{\sigma^2} , \dots , \sqrt\frac1{n!}\frac{x^n}{\sigma^n} , \dots\right)
225 | $$
226 | 因此，高斯核函数可以表示为一个无穷维特征向量的内积
227 | 


--------------------------------------------------------------------------------
/di-san-zhang-pan-bie-shi-fen-lei-qi/3.1-pan-bie-shi-fen-lei-qi-yu-sheng-cheng-shi-fen-lei-qi.md:
--------------------------------------------------------------------------------
 1 | # 3.1 判别式分类器与生成式分类器
 2 | 
 3 | {% hint style="warning" %}
 4 | 
 5 | 分类是监督学习的主要任务之一
 6 | 
 7 | {% endhint %}
 8 | 
 9 | 
10 | 
11 | ## 3.1.1 分类器
12 | 
13 | - **分类器**：条件概率分别或判别函数
14 |   - **条件概率分布**$$P(y|x)$$：对于输入x，比较属于所有类的<mark style="color:purple;">**概率**</mark>，输出概率最大的最为x的类别
15 |   - **判别函数**$$y=f(x)$$：对于输入x，将输出y与<mark style="color:purple;">**阈值**</mark>比较，判定x属于哪个类
16 | 
17 | 
18 | 
19 | 
20 | 
21 | ### 生成式模型
22 | 
23 | - 要学习所有的样本
24 | - 利用条件概率进行预测
25 | 
26 | 
27 | 
28 | ### 判别式模型
29 | 
30 | - 直接估计分布函数
31 | - 利用分布函数确定输出类别
32 | - 不需要学习所有样本
33 | 
34 | 
35 | 
36 | ## 3.1.2 两者的区别
37 | 
38 | - 生成式模型学习了联合概率分布，可以从统计的角度表示数据的分布情况，能够反映同类数据本身的相似度，但是它<mark style="color:orange;">**不关心划分各类的边界**</mark>
39 | - 生成式模型的学习收敛速度更快，即当样本容易增加的时候，学到的模型可以更快地收敛于真实模型
40 | - 生成模型能够应付存在隐变量的情况
41 | - 联合分布能提供更多的信息，但也需要更多的样本和更多计算，尤其是为了更准确估计类别条件分布，需要增加样本的数目，而且类别条件概率的<mark style="color:orange;">**许多信息是我们做分类用不到**</mark>，因而如果我们只需要做分类任务，就浪费了计算资源
42 | - 实践中多数情况下判别模型效果更好
43 | 


--------------------------------------------------------------------------------
/di-san-zhang-pan-bie-shi-fen-lei-qi/3.2-xian-xing-pan-bie-han-shu.md:
--------------------------------------------------------------------------------
  1 | # 3.2 线性判别函数
  2 | 
  3 | ## 3.2.1 线性判别函数
  4 | 
  5 | ### 两类问题的判别函数
  6 | 
  7 | 假设x是二维模式样本$$x=(x_1\ x_2)^T$$，模式的平面图如下：
  8 | 
  9 | ![img](../.gitbook/assets/3.2.1.jpg)
 10 | 
 11 | 此时，分属于$$\omega_1$$和$$\omega_2$$两类的模式可以用一个直线方程进行划分：
 12 | $$
 13 | d(x) = w_1x_1 + w_2x_2 + w_3 = 0
 14 | $$
 15 | 其中，$$x_1$$、$$x_2$$称为<mark style="color:orange;">**坐标变量**</mark>，$$w_1$$、$$w_2$$、$$w_3$$称为<mark style="color:orange;">**参数方程**</mark>，则将一个位置模式带入，有：
 16 | 
 17 | - 若$$d(x)>0$$，则$$x\in \omega_1$$
 18 | - 若$$d(x)<0$$，则$$x\in \omega_2$$
 19 | 
 20 | 此处的$$d(x)=0$$就称为<mark style="color:orange;">**判别函数**</mark>
 21 | 
 22 | 
 23 | 
 24 | ### 用判别函数进行分类的两个因素
 25 | 
 26 | - **判别函数的<mark style="color:purple;">几何性质</mark>**
 27 |   - 线性的
 28 |   - 非线性的
 29 | - **判别函数的<mark style="color:purple;">系数</mark>**：使用给定的模式样本确定判别函数的系数
 30 | 
 31 | 
 32 | 
 33 | 
 34 | 
 35 | ### n维线性判别函数
 36 | 
 37 | 一个n维线性判别函数可以写为：
 38 | $$
 39 | d(x)=w_1x_1 + w_2x_2 + \cdots + w_nx_n + w_{n+1} = w_0^Tx + w_{n+1}
 40 | $$
 41 | 
 42 | 
 43 | 其中，$$\boldsymbol{w}_0=(w_1,w_2,\dots,w_n)^T$$称为<mark style="color:orange;">**权向量**</mark>
 44 | 
 45 | 此外，$$d(x)$$还可以写为：
 46 | $$
 47 | d(x)=w^Tx
 48 | $$
 49 | 其中，$$\boldsymbol{x}=(x_1,x_2,\dots,x_n,1)$$称为<mark style="color:orange;">**增广模式向量**</mark>，$$\boldsymbol{w}_0=(w_1,w_2,\dots,w_n,w_{n+1})^T$$称为<mark style="color:orange;">**增广权向量**</mark>
 50 | 
 51 | 
 52 | 
 53 | ## 3.2.2 线性判别函数的三种形式
 54 | 
 55 | ### 一、$$\omega_i\backslash\overline{\omega_i}$$两分法
 56 | 
 57 | 每条判别函数只区分是否属于某一类
 58 | 
 59 | ![image-20230928113317386](../.gitbook/assets/3.2.2.png)
 60 | 
 61 | 上图中，白色区域为分类失败区域
 62 | 
 63 | - 将M分类问题分为M个单独的分类问题
 64 | - 需要M条线
 65 | - 不一定能够找到判别函数区分开其它所有类别
 66 | 
 67 | 
 68 | 
 69 | ### 二、$$\omega_i\backslash\overline{\omega_j}$$两分法
 70 | 
 71 | 每一条线分开两种类别
 72 | 
 73 | ![image-20230928134949845](../.gitbook/assets/3.2.3.png)
 74 | 
 75 | 
 76 | 
 77 | ### 三、类别1转化为类别2
 78 | 
 79 | 可以通过以下方式将方式1的判别函数转化为方式2的：
 80 | $$
 81 | d_{12}(x) = d_1(x) - d_2(x) = 0
 82 | \\
 83 | d_{13}(x) = d_1(x) - d_3(x) = 0
 84 | \\
 85 | d_{23}(x) = d_2(x) - d_3(x) = 0
 86 | $$
 87 | 
 88 | 
 89 | 
 90 | 
 91 | ### 四、小结
 92 | 
 93 | - **线性可分**：模式分类若可以用任一<mark style="color:purple;">**线性函数**</mark>划分，则称这些模式是<mark style="color:orange;">**线性可分**</mark>的
 94 | - 一旦线性函数的系数$$w_k$$被确定，则此函数就可以作为分类的基础
 95 | - 两种分类法的比较
 96 |   - 对于M分类，法一需要M个判别函数，法二需要$$\frac{M(M-1)}{2}$$个判别函数，因此当模式较多时，法二需要更多的判别函数
 97 |   - 但是对于法一而言，并不是每一种情况都是线性可分的，因此法二对模式是线性可分的概率比法一大
 98 | 
 99 | ## 绘图代码
100 | 
101 | ```python
102 | import matplotlib.pyplot as plt
103 | import numpy as np
104 | 
105 | x = np.linspace(-5, 10, 100)
106 | 
107 | y1 = x
108 | y2 = -x + 5
109 | y3 = np.ones_like(x)
110 | 
111 | plt.plot(x, y1, label='$d_1(x)=-x_1+x_2 = 0$')
112 | plt.plot(x, y2, label='$d_2(x)=x_1+x_2-5 = 0$')
113 | plt.plot(x, y3, label='$d_3(x)=-x_2+1 = 0$')
114 | 
115 | plt.axhline(y=0, color='black', linestyle='--', linewidth=0.8)
116 | plt.xlabel('$x_1$')
117 | plt.ylabel('$x_2$')
118 | plt.axvline(x=0, color='black', linestyle='--', linewidth=0.8)
119 | 
120 | plt.fill_between(x, y2, np.maximum(y1, y3), where=(x <= 2.5), color='blue', alpha=0.2)
121 | plt.fill_between(x, y1, np.maximum(y2, y3), where=(x >= 2.5), color='orange', alpha=0.2)
122 | plt.fill([-5, 1, 4, 10], [-5, 1, 1, -5], color='green', alpha=0.2)
123 | 
124 | plt.annotate('$\\omega_1$\n$d_1(x)>0$\n$d_2(x)<0$\n$d_3(x)<0$',
125 |              xy=(-4, 6), xytext=(-4, 4), fontsize=12, color='black')
126 | plt.annotate('$\\omega_2$\n$d_2(x)>0$\n$d_1(x)<0$\n$d_3(x)<0$',
127 |              xy=(7, 6), xytext=(7, 3), fontsize=12, color='black')
128 | plt.annotate('$\\omega_3$\n$d_3(x)>0$\n$d_1(x)<0$\n$d_2(x)<0$',
129 |              xy=(-1, -4), xytext=(1, -4), fontsize=12, color='black')
130 | 
131 | plt.xlim(-5, 10)
132 | plt.ylim(-5, 10)
133 | 
134 | plt.legend(loc='lower right')
135 | 
136 | plt.show()
137 | ```
138 | 
139 | 
140 | 
141 | ```python
142 | import matplotlib.pyplot as plt
143 | import numpy as np
144 | 
145 | x = np.linspace(-5, 10, 100)
146 | y = np.linspace(-5, 10, 100)
147 | 
148 | y1 = x
149 | y2 = -x + 5
150 | x1 = np.full_like(y, 2)
151 | 
152 | plt.plot(x, y1, label='$d_{23}(x)=-x_1+x_2 = 0$')
153 | plt.plot(x, y2, label='$d_{12}(x)=x_1+x_2-5 = 0$')
154 | plt.plot(x1, y, label='$d_{13}(x)=-x_1+2 = 0$')
155 | 
156 | plt.axhline(y=0, color='black', linestyle='--', linewidth=0.8)
157 | plt.xlabel('$x_1$')
158 | plt.ylabel('$x_2$')
159 | plt.axvline(x=0, color='black', linestyle='--', linewidth=0.8)
160 | 
161 | plt.fill_between(x, np.maximum(y1, y2), 10 * np.ones_like(x), color='blue', alpha=0.2)
162 | plt.fill_between(x, y2, -5 * np.ones_like(x), where=(x <= 2), color='orange', alpha=0.2)
163 | plt.fill_between(x, y1, -5 * np.ones_like(x), where=(x > 1.9), color='green', alpha=0.2)
164 | 
165 | plt.annotate('$\\omega_1$\n$d_{12}(x)>0$\n$d_{13}(x)>0$',
166 |              xy=(-3, 2), xytext=(-3, 2), fontsize=12, color='black')
167 | plt.annotate('$\\omega_2$\n$d_{32}(x)>0$\n$d_{12}(x)>0$',
168 |              xy=(1, 6), xytext=(1, 5), fontsize=12, color='black')
169 | plt.annotate('$\\omega_3$\n$d_{32}(x)>0$\n$d_{13}(x)>0$',
170 |              xy=(6, 1), xytext=(6, 1), fontsize=12, color='black')
171 | 
172 | plt.xlim(-5, 10)
173 | plt.ylim(-5, 10)
174 | plt.grid(True)
175 | 
176 | plt.legend(loc='lower right')
177 | 
178 | plt.show()
179 | ```
180 | 
181 | 


--------------------------------------------------------------------------------
/di-san-zhang-pan-bie-shi-fen-lei-qi/3.3-guang-yi-xian-xing-pan-bie-han-shu.md:
--------------------------------------------------------------------------------
  1 | # 3.3 广义线性判别函数
  2 | 
  3 | ## 3.3.1 基本思想
  4 | 
  5 | 假设一个模式集$$\{x\}$$，在模式空间$$x$$中<mark style="color:blue;">**线性不可分**</mark>，但是在模式空间$$x^*$$中<mark style="color:blue;">**线性可分**</mark>，其中$$x^*$$中的各分量是$$x$$的<mark style="color:purple;">**单值实函数**</mark>，且$$x^*$$的维度<mark style="color:orange;">**高于**</mark>$$x$$的维度，即：
  6 | $$
  7 | x^* = (f_1(x),f_2(x),\dots,f_k(x),1)^T\ \ \ k>n
  8 | $$
  9 | 则若有非线性判别函数：
 10 | $$
 11 | d(x)=w_1f_1(x) + w_2f_2(x) + \cdots + w_kf_k(x) + w_{k+1}
 12 | $$
 13 | 
 14 | 
 15 | 该判别函数可以表示为：
 16 | $$
 17 | d(x^*)=w^Tx^*
 18 | $$
 19 | 
 20 | 
 21 | 此时非线性判别函数已经被转换为<mark style="color:orange;">**广义线性**</mark>
 22 | 
 23 | 
 24 | 
 25 | ## 3.3.2 f(x)的选择
 26 | 
 27 | ### 一、一次函数
 28 | 
 29 | 若取$$f_i(x)$$为一次函数，则变换后的模式$$x^*=x$$，$$x^*$$的维数k等于$$x$$的维数n，此时广义化后的线性判别式仍然为：
 30 | $$
 31 | d(x) = w^Tx = w_{n+1}
 32 | $$
 33 | 
 34 | 
 35 | ### 二、二次多项式函数
 36 | 
 37 | 设x的维度为n，则原判别函数为：
 38 | $$
 39 | d(x) = \sum_{j=1}^nw_{jj}x_j^2 + \sum_{j=1}^{n-1}\sum_{k=j+1}^nw_{jk}x_jx_k + \sum_{j=1}^nw_jx_j + w_{n+1}
 40 | $$
 41 | 式中包含$$x$$各分量的二次项、一次项和常数项，其中：
 42 | 
 43 | - 平方项$$n$$个
 44 | - 二次项$$\dfrac{n(n-1)}{2}$$个
 45 | - 一次项$$n$$个
 46 | - 常数项1个
 47 | 
 48 | 总的项数为：
 49 | $$
 50 | n+\frac{n(n-1)}{2} + n +1 = \frac{(n+1)(n+2)}{2} > n
 51 | $$
 52 | 显然对于$$x^*$$，其维数大于$$x$$的原维数，则$$x^*$$的各分量一般化为：
 53 | $$
 54 | f_i(x) = x_{p_1}^sx_{p_2}^t,\ p_1,p_2=1,2,\dots,n,\ s,t=0,1
 55 | $$
 56 | 
 57 | 
 58 | ### 三、r次多项式
 59 | 
 60 | 若$$f_i(x)$$为r次多项式函数，x为n维模式，则有：
 61 | $$
 62 | f_i(x) = x_{p_1}^{s_1}x_{p_2}^{s_2}\cdots x_{p_r}^{s_r},\ p_1,p_2,\dots,p_r=1,2,\dots,n,\ s_1,s_2,\dots,s_r=0,1
 63 | $$
 64 | 此时，判别函数$$d(x)$$可由以下递推关系给出：
 65 | $$
 66 | \begin{align}
 67 | 常数项:\ d^{(0)}(x)&=w_{n+1} \nonumber
 68 | \\
 69 | 一次项:\ d^{(1)}(x)&=\sum_{p_1=1}^nw_{p_1}x_{p_1} + d^{(0)}(x) \nonumber
 70 | \\
 71 | 二次项:\ d^{(2)}(x)&=\sum_{p_1=1}^n\sum_{p_2=p_1}^nw_{p_1p_2}x_{p_1}x_{p_2} + d^{(1)}(x) \nonumber
 72 | \\
 73 | & \cdots \nonumber
 74 | \\
 75 | r次项:\ d^{(r)}(x)&=\sum_{p_1=1}^n\sum_{p_2=p_1}^n\dots\sum_{p_r=p_{r-1}}^nw_{p_1p_2\dots p_r}x_{p_1}x_{p_2}\dots x_{p_r} + d^{(r-1)}(x) \nonumber
 76 | \end{align}
 77 | $$
 78 | {% hint style="info" %}
 79 | 
 80 | 例：当取r=2，n=2时，写出判别函数
 81 | $$
 82 | \begin{align}
 83 | \text{常数项}:\ d^{(0)}(x)&=w_{n+1} \nonumber
 84 | \\
 85 | &=w_3 \nonumber
 86 | \\
 87 | \text{一次项}:\ d^{(1)}(x)&=\sum_{p_1=1}^nw_{p_1}x_{p_1} + d^{(0)}(x) \nonumber
 88 | \\
 89 | &=w_1x_1+w_2x_2+w_3 \nonumber
 90 | \\
 91 | \text{二次项}:\ d^{(2)}(x)&=\sum_{p_1=1}^n\sum_{p_2=p_1}^nw_{p_1p_2}x_{p_1}x_{p_2} + d^{(1)}(x) \nonumber
 92 | \\
 93 | &=w_{11}x_1^2 + w_{12}x_1x_2 + w_{22}x_2^2 + w_1x_1+w_2x_2+w_3 \nonumber
 94 | \\
 95 | \end{align}
 96 | $$
 97 | {% endhint %}
 98 | 
 99 | ### 四、$$d(x)$$的项数
100 | 
101 | 对于n维x向量，若用r次多项式，$$d(x)$$的权系数的总项数为：
102 | $$
103 | N_w=C_{n+r}^r=\frac{(n+r)!}{r!n!}
104 | $$
105 | 可以看出$$d(x)$$的<mark style="color:orange;">**项数随着r和n的增大而迅速增大**</mark>，若采用次数较高的多项式变换，即使原来$$x$$的维数不高，也会使得变换后的$$x^*$$维数很高，给分类带来困难
106 | 
107 | 
108 | 
109 | {% hint style="warning" %}
110 | 
111 | 实际情况可只取r=2，或只选多项式的一部分，例如r=2时只取二次项，略去一次项，以减少$$x^*$$的维数。
112 | 
113 | {% endhint %}
114 | 
115 | 
116 | 
117 | 
118 | 
119 | {% hint style="info" %}
120 | 
121 | 例：设有一维样本空间X，所希望的分类是若$$x\leq b$$或$$x\geq a$$，则$$x \in \omega_1$$；若$$b<x<a$$，$$x \in \omega_1$$
122 | 
123 | ![](../.gitbook/assets/3.3.1.png)
124 | 
125 | 显然没有一个线性判别函数能在一维空间中解决上述问题。
126 | 
127 | 要在一维空间中分类，只有定义判别函数：
128 | $$
129 | d(x) = (x-a)(x-b) = x^2 - (a+b)x + ab
130 | $$
131 | 对应到二维空间，令：
132 | $$
133 | x_1 = f_1(x) = x^2
134 | \\
135 | x_2 = f_2(x) = x
136 | $$
137 | 则可以得到线性判别函数：
138 | $$
139 | d(x) = x_1 - (a+b)x_2 + ab = w^Tx
140 | $$
141 | 
142 | 
143 | 其中
144 | $$
145 | x = (x_1,\ x_2,\ 1)^T
146 | \\
147 | w = (1,\ -(a+b),\ ab)^T
148 | $$
149 | 
150 | 
151 | {% endhint %}
152 | 
153 | 
154 | 
155 | ## 3.3.3 分段线性判别函数
156 | 
157 | ### 一、出发点
158 | 
159 | - 线性判别函数在进行分类决策时是最简单有效的，但在实际应用中，常常会出现不能用线性判别函数直接进行分类的情况
160 | - 采用广义线性判别函数的概念，可以通过增加维数来得到线性判别，但维数的大量增加会使在低维空间里在解析和计算上行得通的方法在高维空间遇到困难，增加计算的复杂性
161 | - 引入分段线性判别函数的判别过程，它比一般的线性判别函数的错误率小，但又比非线性判别函数简单
162 | 
163 | 
164 | 
165 | {% hint style="warning" %}
166 | 
167 | 简单来说，就是用一个分段函数来逼近非线性的判别函数
168 | 
169 | {% endhint %}
170 | 
171 | 
172 | 
173 | ### 二、最小距离分类器
174 | 
175 | 设$$\mu_1$$和$$\mu_2$$为两个模式$$\omega_1$$和$$\omega_2$$的聚类中心，定义决策规则：
176 | $$
177 | \Vert x- \mu_1\Vert^2-\Vert x - \mu_2\Vert^2=
178 | \begin{cases}
179 | <0 & x\in \omega_1
180 | \\
181 | >0 & x\in\omega_2
182 | \end{cases}
183 | $$
184 | 此时的决策面是两类期望连线的<mark style="color:purple;">**垂直平分面**</mark>，这样的分类器称为<mark style="color:orange;">**最小距离分类器**</mark>
185 | 
186 | ![](../.gitbook/assets/3.3.2.png)
187 | 


--------------------------------------------------------------------------------
/di-san-zhang-pan-bie-shi-fen-lei-qi/3.4-fisher-xian-xing-pan-bie.md:
--------------------------------------------------------------------------------
  1 | # 3.4 Fisher线性判别
  2 | 
  3 | ## 3.4.1 概述
  4 | 
  5 | **出发点**
  6 | 
  7 | - 应用统计方法解决模式识别问题时，一再碰到的问题之一就是维数问题
  8 | - 在低维空间里解析上或计算上行得通的方法，在高维空间里往往行不通
  9 | - 因此，<mark style="color:purple;">**降低维数**</mark>有时就会成为处理实际问题的关键
 10 | 
 11 | 
 12 | 
 13 | **问题描述**
 14 | 
 15 | - 考虑把d维空间的样本投影到一条直线上，形成一维空间，即把维数压缩到一维
 16 | - 然而，即使样本在d维空间里形成若干紧凑的互相分得开的集群，当把它们投影到一条直线上时，也可能会是几类样本混在一起而变得无法识别
 17 | - 但是，在一般情况下，总可以找到某个方向，使在这个方向的直线上，样本的投影能分得开
 18 | 
 19 | 
 20 | 
 21 | {% hint style="success" %}
 22 | 
 23 | <mark style="color:orange;">**Fisher判别方法**</mark>所要解决的基本问题，就是如何根据实际情况找到一条最好的、最易于分类的投影线
 24 | 
 25 | {% endhint %}
 26 | 
 27 | 
 28 | 
 29 | ![](../.gitbook/assets/3.4.1.png)
 30 | 
 31 | 
 32 | 
 33 | ## 3.4.2 降维的数学原理
 34 | 
 35 | 从d维空间降到一维空间的一般数学变换方法：
 36 | 
 37 | - 设有一集合$$\Gamma$$包含N个d维样本$$x_1,x_2,\dots,x_N$$，其中$$N_1$$个属于$$\omega_1$$类的样本记为子集$$\Gamma_1$$，$$N_2$$个属于$$\omega_2$$类的样本记为子集$$\Gamma_2$$，若对$$x_n$$的分量做线性组合可得标量：
 38 | 
 39 | $$
 40 | y_n = \boldsymbol{w}^T\boldsymbol{x}_n,\ \ n=1,2,\dots,N
 41 | $$
 42 | 
 43 | - 这样可以得到N个一维样本$$y_n$$组成的集合，且可以分为两个子集$$\Gamma_1,\Gamma_2$$
 44 | - 这里关心的是$$\boldsymbol{w}$$的方向，即<mark style="color:purple;">**样本投影的方向**</mark>，而具体的值并不重要，只是一个比例因子
 45 | - 所以，抽象到数学层面，本质就是寻找最好的变换向量$$\boldsymbol{w}^*$$
 46 | 
 47 | 
 48 | 
 49 | ## 3.4.3 Fisher准则
 50 | 
 51 | ### 一、Fisher准则中的基本参量
 52 | 
 53 | **在高维空间X中：**
 54 | 
 55 | - 各样本的均值向量$$\boldsymbol{m}_i$$
 56 | 
 57 | $$
 58 | \boldsymbol{m}_i = \frac{1}{N_i}\sum_{x\in \Gamma_i}x,\ \ i=1,2
 59 | $$
 60 | 
 61 | - 样本类内离散度矩阵$$S_i$$和总样本类内离散度矩阵$$S_w$$
 62 | 
 63 | $$
 64 | \begin{align}
 65 | S_i &= \sum_{x\in\Gamma_i}(x-\boldsymbol{m}_i)(x-\boldsymbol{m}_i)^T,\ \ i=1,2 \nonumber
 66 | \\
 67 | S_w &= S_1 + S_2 \nonumber
 68 | \end{align}
 69 | $$
 70 | 
 71 | - 样本类间离散度矩阵$$S_b$$，$$S_b$$是一个<mark style="color:purple;">**对称半正定矩阵**</mark>
 72 | 
 73 | $$
 74 | S_b = (\boldsymbol{m}_1-\boldsymbol{m}_2)(\boldsymbol{m}_1-\boldsymbol{m}_2)^T
 75 | $$
 76 | 
 77 | 
 78 | 
 79 | **在一维空间Y中：**
 80 | 
 81 | - 各类样本的均值$$\widetilde{m}_i$$
 82 | 
 83 | $$
 84 | \widetilde{m}_i = \frac{1}{N_i}\sum_{y \in \Gamma_i}y,\ \ i=1,2
 85 | $$
 86 | 
 87 | - 样本类内离散度$$\widetilde{S}_i^2$$和总样本类内离散度$$\widetilde{S}_w$$
 88 | 
 89 | $$
 90 | \begin{align}
 91 | \widetilde{S}_i^2 &= \sum_{y\in \Gamma_i}(y-\widetilde{m}_i)^2,\ \ i=1,2\nonumber
 92 | \\
 93 | \widetilde{S}_w &= \widetilde{S}_1^2 + \widetilde{S}_2^2
 94 | \end{align}
 95 | $$
 96 | 
 97 | 
 98 | 
 99 | {% hint style="success" %}
100 | 
101 | 我们希望投影后，在一维Y空间中各类样本尽可能分得开些，同时各类样本内部尽量密集，实际上就是
102 | 
103 | - 两类之间的<mark style="color:purple;">**均值**</mark>相差<mark style="color:orange;">**越大越好**</mark>
104 | - 类内的<mark style="color:purple;">**离散度**</mark><mark style="color:orange;">**越小越好**</mark>
105 | 
106 | {% endhint %}
107 | 
108 | 
109 | 
110 | ## 3.4.3 Fisher准则函数的定义
111 | 
112 | Fisher准则函数定义为：
113 | $$
114 | J_F(\boldsymbol{w}) = \frac{(\widetilde{m}_1 - \widetilde{m}_2)^2}{\widetilde{S}_1^2 + \widetilde{S}_2^2}
115 | $$
116 | 而其中，样本均值可以写为：
117 | $$
118 | \begin{align}
119 | \widetilde{m}_i &= \frac{1}{N_i}\sum_{y \in \Gamma_i}y \nonumber
120 | \\
121 | &= \frac{1}{N_i}\sum_{x\in\Gamma_i}\boldsymbol{w}^Tx \nonumber
122 | \\
123 | &= \boldsymbol{w}^T\left(\frac{1}{N_i}\sum_{x\in\Gamma_i}x\right) \nonumber
124 | \\
125 | &= \boldsymbol{w}^T\boldsymbol{m}_i \nonumber
126 | \end{align}
127 | $$
128 | 则准则函数的分子可以写为：
129 | $$
130 | \begin{align}
131 | (\widetilde{m}_1 - \widetilde{m}_2)^2 &= (\boldsymbol{w}^T\boldsymbol{m}_1 - \boldsymbol{w}^T\boldsymbol{m}_2)^2 \nonumber
132 | \\
133 | &=(\boldsymbol{w}^T\boldsymbol{m}_1 - \boldsymbol{w}^T\boldsymbol{m}_2)(\boldsymbol{w}^T\boldsymbol{m}_1 - \boldsymbol{w}^T\boldsymbol{m}_2)^T \nonumber
134 | \\
135 | &= (\boldsymbol{w}^T\boldsymbol{m}_1 - \boldsymbol{w}^T\boldsymbol{m}_2)(\boldsymbol{m}_1^T\boldsymbol{w} - \boldsymbol{m}_2^T\boldsymbol{w}) \nonumber
136 | \\
137 | &=\boldsymbol{w}^T(\boldsymbol{m}_1-\boldsymbol{m}_2)(\boldsymbol{m}_1-\boldsymbol{m}_2)^T\boldsymbol{w} \nonumber
138 | \\
139 | &=\boldsymbol{w}^TS_b\boldsymbol{w}
140 | \end{align}
141 | $$
142 | 而由于
143 | $$
144 | \begin{align}
145 | \widetilde{S}_i^2 &= \sum_{y\in \Gamma_i}(y-\widetilde{m}_i)^2 \nonumber
146 | \\
147 | &= \sum_{x\in\Gamma_i}(\boldsymbol{w}^Tx-\boldsymbol{w}^T\boldsymbol{m}_i)^2 \nonumber
148 | \\
149 | &= \boldsymbol{w}^T\left[\sum_{x\in\Gamma_i}(x-\boldsymbol{m}_i)(x-\boldsymbol{m}_i)^T\right]\boldsymbol{w} \nonumber
150 | \\
151 | &= \boldsymbol{w}^TS_i\boldsymbol{w} \nonumber
152 | \end{align}
153 | $$
154 | 因此分母可以写成：
155 | $$
156 | \begin{align}
157 | \widetilde{S}_1^2 + \widetilde{S}_2^2 &= \boldsymbol{w}^T(S_1 + S_2)\boldsymbol{w} \nonumber
158 | \\
159 | &= \boldsymbol{w}^TS_w\boldsymbol{w}
160 | \end{align}
161 | $$
162 | 将上述各式带回$$J_F(\boldsymbol{w})$$，可得：
163 | $$
164 | J_F(\boldsymbol{w}) = \frac{\boldsymbol{w}^TS_b\boldsymbol{w}}{\boldsymbol{w}^TS_w\boldsymbol{w}}
165 | $$
166 | 
167 | 
168 | 
169 | 
170 | ## 3.4.4 最佳变换向量求解
171 | 
172 | 由于需要使得均值之差（即分子）尽可能大，同时使得样本内离散度（即分母）尽可能小，故实际上就是要使得<mark style="color:purple;">**准则函数**</mark><mark style="color:orange;">**尽可能的大**</mark>
173 | 
174 | 要求使得准则函数取极大值时的$$\boldsymbol{w}^*$$，可以采用<mark style="color:purple;">**拉格朗日乘数法**</mark>求解：
175 | 
176 | {% hint style="success" %}
177 | 
178 | **拉格朗日乘数法**
179 | 
180 | 基本思想是将等式约束条件下的最优化问题转化为无约束条件下的最优化问题
181 | 
182 | 
183 | 
184 | **问题：** 设目标函数为
185 | $$
186 | y = f(x),\ x=(x_1,x_2,\dots,x_n)
187 | $$
188 | 求其在$$m\ (m<n)$$个约束条件
189 | $$
190 | g_k(x) = 0,\ k=1,2,\dots,m
191 | $$
192 | 下的极值
193 | 
194 | 
195 | 
196 | **描述：** 引进函数
197 | $$
198 | L(x,\lambda_1,\lambda_2,\dots,\lambda_m) = f(x) + \sum_{k=1}^{m}\lambda_kg_k(x)
199 | $$
200 | 其中$$\lambda_k,\ k=1,2,\dots,m$$为待定常数，将$$L$$当作$$m+n$$个变量$$x_1,x_2,\dots,x_n$$和$$\lambda_1,\lambda_2,\dots,\lambda_m$$的无约束的函数，对其求一阶偏导数可得稳定点所需要的方程：
201 | $$
202 | \frac{\partial L}{\partial x_i} = 0,\ i=1,2,\dots,n
203 | \\
204 | g_k = 0,\ k=1,2,\dots,m
205 | $$
206 | 
207 | 
208 | {% endhint %}
209 | 
210 | 
211 | 
212 | 令分母等于非零常数，即：
213 | $$
214 | \boldsymbol{w}^TS_w\boldsymbol{w}=c\neq0
215 | $$
216 | 则定义拉格朗日函数为：
217 | 
218 | $$
219 | L(\boldsymbol{w},\lambda) = \boldsymbol{w}^TS_b\boldsymbol{w}-\lambda(\boldsymbol{w}^TS_w\boldsymbol{w}-c)
220 | $$
221 | 对$$\boldsymbol{w}^*$$求偏导，可得：
222 | $$
223 | \frac{\partial L(\boldsymbol{w},\lambda)}{\partial \boldsymbol{w}}= 2(S_b\boldsymbol{w}-\lambda S_w\boldsymbol{w})
224 | $$
225 | 令偏导数为0，有：
226 | $$
227 | \begin{align}
228 | & S_b \boldsymbol{w}^*-\lambda S_w\boldsymbol{w}^*=0 \nonumber
229 | \\
230 | & S_b\boldsymbol{w}^*=\lambda S_w \boldsymbol{w}^* \nonumber
231 | \end{align}
232 | $$
233 | 由于$$S_w$$非奇异，因此存在逆矩阵，可得：
234 | $$
235 | S_w^{-1}S_b\boldsymbol{w}^* = \lambda \boldsymbol{w}^*
236 | $$
237 | 此时本质即为求矩阵$$S_w^{-1}S_b$$的<mark style="color:purple;">**特征值**</mark>问题，将$$S_b=(\boldsymbol{m}_1-\boldsymbol{m}_2)(\boldsymbol{m}_1-\boldsymbol{m}_2)^T$$代入上式，可将$$S_b\boldsymbol{w}^*$$写为：
238 | $$
239 | \begin{align}
240 | S_b\boldsymbol{w}^* &= (\boldsymbol{m}_1-\boldsymbol{m}_2)(\boldsymbol{m}_1-\boldsymbol{m}_2)^T\boldsymbol{w}^* \nonumber\\
241 | &=(\boldsymbol{m}_1-\boldsymbol{m}_2)R
242 | \end{align}
243 | $$
244 | 其中$$R=(\boldsymbol{m}_1-\boldsymbol{m}_2)^T\boldsymbol{w}^*$$为一标量，因此$$S_b\boldsymbol{w}^*$$总是在向量$$(\boldsymbol{m}_1-\boldsymbol{m}_2)$$的方向上，故$$\lambda \boldsymbol{w}^*$$可以写成：
245 | $$
246 | \begin{align}
247 | \lambda \boldsymbol{w}^* &= S_w^{-1}(S_b\boldsymbol{w}^*) \nonumber
248 | \\
249 | &= S_w^{-1}(\boldsymbol{m}_1-\boldsymbol{m}_2)R \nonumber
250 | \end{align}
251 | $$
252 | 从而有：
253 | $$
254 | \boldsymbol{w}^* = \frac{R}{\lambda}S_w^{-1}(\boldsymbol{m}_1-\boldsymbol{m}_2)
255 | $$
256 | 由于只需要找最佳投影方向，因此可以忽略比例因子，有：
257 | $$
258 | \boldsymbol{w}^* = S_w^{-1}(\boldsymbol{m}_1-\boldsymbol{m}_2)
259 | $$
260 | 其中，$$S_w^{-1}$$为高维空间中的<mark style="color:purple;">**总样本类内离散度矩阵**</mark>的逆矩阵，$$\boldsymbol{m}_i$$为高维空间中<mark style="color:purple;">**各样本的均值向量**</mark>
261 | 
262 | 
263 | 
264 | ## 3.4.5 基于最佳变换向量$$\boldsymbol{m}^*$$的投影
265 | 
266 | $$\boldsymbol{w}^*$$是使Fisher准则函数$$J_F(\boldsymbol{w})$$取极大值时的解，也就是d维X空间到一维Y空间的最佳投影方向。有了$$\boldsymbol{w}^*$$，就可以把d维样本X投影到一维，这实际上是多维空间到一维空间的一种映射，这个一维空间的方向$$\boldsymbol{w}^*$$相对于Fisher准则函数$$J_F(\boldsymbol{w})$$是最好的。
267 | 
268 | 
269 | 
270 | 利用Fisher准则，就可以将d维分类问题转化为一维分类问题，然后，只要确定一个阈值T，将投影点$$y_n$$与T相比较，即可进行分类判别。
271 | 


--------------------------------------------------------------------------------
/di-san-zhang-pan-bie-shi-fen-lei-qi/3.5-gan-zhi-qi-suan-fa.md:
--------------------------------------------------------------------------------
  1 | # 3.5 感知器算法
  2 | 
  3 | ## 3.5.1 概述
  4 | 
  5 | - 一旦判别函数的形式确定下来，不管它是线性的还是非线性的，剩下的问题就是如何确定它的系数
  6 | - 在模式识别中，系数确定的一个主要方法就是通过对已知样本的训练和学习来得到
  7 | - 感知器算法就是通过训练样本模式的迭代和学习，产生线性（或广义线性）可分的模式判别函数
  8 | 
  9 | 
 10 | 
 11 | 采用<mark style="color:orange;">**感知器算法**</mark>（Perception Approach）能通过对训练模式样本集的“学习”得到判别函数的系数。
 12 | 
 13 | {% hint style="warning" %}
 14 | 
 15 | 这里采用的算法不需要对各类别中模式的统计性质做任何假设，因此称为<mark style="color:orange;">**确定性**</mark>的方法
 16 | 
 17 | {% endhint %}
 18 | 
 19 | 
 20 | 
 21 | ## 3.5.2 感知器的训练算法
 22 | 
 23 | 已知两个训练模式集分别属于$$\omega_1$$类和$$\omega_2$$类，权向量的初始值为$$\boldsymbol{w}(1)$$，可任意取值。则在用全部训练模式集进行迭代训练时，第k次的训练步骤为：
 24 | 
 25 | - 若$$x_k\in\omega_1$$且$$\boldsymbol{w}^T(k)x_k\leq0$$或$$x_k\in\omega_2$$且$$\boldsymbol{w}^T(k)x_k>0$$，则说明分类器对第k个模式做出了错误分类，此时应该<mark style="color:purple;">**校正权向量**</mark>，使得
 26 | 
 27 | $$
 28 | w(k+1)=w(k)-Cx_k
 29 | $$
 30 | ​	其中C为<mark style="color:orange;">**校正增量**</mark>
 31 | 
 32 | - 否则，则说明分类正确，因此权向量不变：
 33 | 
 34 | $$
 35 | w(k+1)=w(k)
 36 | $$
 37 | 
 38 | 
 39 | 
 40 | 若对$$x\in\omega_2$$的模式样本乘以-1，则有：
 41 | $$
 42 | \text{若}\boldsymbol{w}^T(k)x_k\leq0,\text{则}w(k+1)=w(k)+Cx_k
 43 | $$
 44 | 因此，感知器算法可以统一写成：
 45 | $$
 46 | w(k+1) = 
 47 | \begin{cases}
 48 | w(k) & \boldsymbol{w}^T(k)x_k > 0
 49 | \\
 50 | w(k) + Cx_k & \boldsymbol{w}^T(k)x_k\leq0
 51 | \end{cases}
 52 | $$
 53 | 
 54 | 
 55 | {% hint style="success" %}
 56 | 
 57 | 感知器算法本质上是一种<mark style="color:orange;">**赏罚过程**</mark>
 58 | 
 59 | - 对正确分类的“赏”本质是“不罚”
 60 | - 对错误分类的“罚”，本质是给权向量加上一个正比于$$x_k$$的分量
 61 | 
 62 | {% endhint %}
 63 | 
 64 | 
 65 | 
 66 | {% hint style="info" %}
 67 | 
 68 | ![](../.gitbook/assets/3.5.1.png)
 69 | 
 70 | 将属于$$\omega_2$$的训练样本乘以-1，并写作增广向量的形式，则有：
 71 | $$
 72 | \begin{align}
 73 | x_1&=(0,0,1)^T \nonumber
 74 | \\
 75 | x_2&=(0,1,1)^T\nonumber
 76 | \\
 77 | x_3&=(-1,0,-1)^T\nonumber
 78 | \\
 79 | x_4&=(-1,-1,-1)^T\nonumber
 80 | \end{align}
 81 | $$
 82 | 接下来开始迭代
 83 | 
 84 | 第一轮，取$$C=1,w(1)=(0,0,0)^T$$
 85 | 
 86 | $$\boldsymbol{w}^T(1)x_1=(0,0,0)(0,0,1)^T=0\not\gt 0$$，因此$$\boldsymbol{w}(2)=\boldsymbol{w}(1) + x_1 = (0,0,1)^T$$
 87 | 
 88 | $$\boldsymbol{w}^T(2)x_2=(0,0,1)(0,1,1)^T=1\gt 0$$，因此$$\boldsymbol{w}(3)=\boldsymbol{w}(2)= (0,0,1)^T$$
 89 | 
 90 | $$\boldsymbol{w}^T(3)x_3=(0,0,1)(-1,0,-1)^T=-1\not\gt 0$$，因此$$\boldsymbol{w}(4)=\boldsymbol{w}(3) + x_3 = (-1,0,0)^T$$
 91 | 
 92 | $$\boldsymbol{w}^T(4)x_4=(-1,0,0)(-1,-1,-1)^T=1\gt 0$$，因此$$\boldsymbol{w}(5)=\boldsymbol{w}(4) = (-1,0,0)^T$$
 93 | 
 94 | 
 95 | 
 96 | 由于上一轮迭代中还存在不大于0的情况，因此继续第二轮迭代：
 97 | 
 98 | $$\boldsymbol{w}^T(5)x_1=(-1,0,0)(0,0,1)^T=0\not\gt 0$$，因此$$\boldsymbol{w}(6)=\boldsymbol{w}(5) + x_1 = (-1,0,1)^T$$
 99 | 
100 | $$\boldsymbol{w}^T(6)x_2=(-1,0,1)(0,1,1)^T=1\gt 0$$，因此$$\boldsymbol{w}(7)=\boldsymbol{w}(6) = (-1,0,1)^T$$
101 | 
102 | $$\boldsymbol{w}^T(7)x_3=(-1,0,1)(-1,0,-1)^T=0\not\gt 0$$，因此$$\boldsymbol{w}(8)=\boldsymbol{w}(7) + x_3 = (-2,0,0)^T$$
103 | 
104 | $$\boldsymbol{w}^T(8)x_4=(-2,0,0)(-1,-1,-1)^T=2\gt 0$$，因此$$\boldsymbol{w}(9)=\boldsymbol{w}(8) = (-2,0,0)^T$$
105 | 
106 | 
107 | 
108 | 仍然不是全满足，继续第四轮迭代：
109 | 
110 | $$\boldsymbol{w}^T(9)x_1=(-2,0,0)(0,0,1)^T=0\not\gt 0$$，因此$$\boldsymbol{w}(10)=\boldsymbol{w}(9) + x_1 = (-2,0,1)^T$$
111 | 
112 | $$\boldsymbol{w}^T(10)x_2=(-2,0,1)(0,1,1)^T=1\gt 0$$，因此$$\boldsymbol{w}(11)=\boldsymbol{w}(10) = (-2,0,1)^T$$
113 | 
114 | $$\boldsymbol{w}^T(11)x_3=(-2,0,1)(-1,0,-1)^T=1\gt 0$$，因此$$\boldsymbol{w}(12)=\boldsymbol{w}(11) = (-2,0,1)^T$$
115 | 
116 | $$\boldsymbol{w}^T(12)x_4=(-2,0,1)(-1,-1,-1)^T=1\gt 0$$，因此$$\boldsymbol{w}(13)=\boldsymbol{w}(12) = (-2,0,1)^T$$
117 | 
118 | 
119 | 
120 | 仍然有一个不满足，继续第四轮迭代：
121 | 
122 | $$\boldsymbol{w}^T(13)x_1=(-2,0,1)(0,0,1)^T=1\gt 0$$，因此$$\boldsymbol{w}(14)=\boldsymbol{w}(13) = (-2,0,1)^T$$
123 | 
124 | $$\boldsymbol{w}^T(14)x_2=1\gt 0$$，因此$$\boldsymbol{w}(15)=\boldsymbol{w}(14) = (-2,0,1)^T$$
125 | 
126 | $$\boldsymbol{w}^T(15)x_3=1\gt 0$$，因此$$\boldsymbol{w}(16)=\boldsymbol{w}(15) = (-2,0,1)^T$$
127 | 
128 | $$\boldsymbol{w}^T(16)x_2=1\gt 0$$，因此解向量为$$(-2,0,1)^T$$，对应的判别函数即为：
129 | $$
130 | d(x) = -2x_1+1
131 | $$
132 | {% endhint %}
133 | 
134 | 
135 | 
136 | {% hint style="success" %}
137 | 
138 | 感知器算法的<mark style="color:orange;">**收敛性**</mark>
139 | 
140 | 只要模式类别是线性可分的，就可以在有限的迭代步数里求出权向量
141 | 
142 | {% endhint %}
143 | 
144 | 
145 | 
146 | ## 3.5.3 使用感知器算法的多模式分类
147 | 
148 | 采用3.2.2章节中第三种情况，对M类模式存在M个判别函数$$\{d_i, i = 1,2,\dots,M\}$$，若$$x\in\omega_1$$，则$$d_i\gt d_j,\forall j\neq i$$。由此将感知器算法推广到多类模式：
149 | 
150 | 设M种模式类别$$\omega_1,\omega_2,\dots,\omega_M$$，若在训练过程的第k次迭代时，一个属于$$\omega_i$$类的模式样本x送入分类器，则应先算出M个判别函数：
151 | $$
152 | d_j(k)=\omega_j(k)x,\ \ j=1,2,\dots,M
153 | $$
154 | 若$$d_i(k)\gt d_j(k),\ j=1,2,\dots,M,\ \forall j\neq i$$的条件成立，则权向量不变，即：
155 | $$
156 | w_j(k+1) = w_j(k)
157 | $$
158 | 若其中第一个权向量使得$$d_j(k)\leq d_1(k)$$，则相应的权向量应作调整，即：
159 | $$
160 | \begin{cases}
161 | w_i(k+1) = w_i(k)+C_x
162 | \\
163 | w_l(k+1) = w_l(k)-C_x
164 | \\
165 | w_j(k+1) = w_j(k) & j=1,2,\dots,M,j\neq i,j\neq l
166 | \end{cases}
167 | $$
168 | 其中C是一个正常数，权向量的初值可视情况任意选择。
169 | 
170 | 
171 | 
172 | {% hint style="info" %}
173 | 
174 | 例：给出三类模式的训练样本
175 | $$
176 | \omega_1:\{(0,0)^T\},\\ \omega_1:\{(1,1)^T\},\\ \omega_1:\{(-1,1)^T\}
177 | $$
178 | 将模式样本写成增广形式：
179 | $$
180 | x_1 = (0,0,1)^T
181 | \\
182 | x_2 = (1,1,1)^T
183 | \\
184 | x_3 = (-1,1,1)^T
185 | $$
186 | 取初始值$$w_1(1)=w_2(1)=w_3(1)=(0,0,0)^T$$，C=1
187 | 
188 | 
189 | 
190 | 第一轮迭代，以$$x_1$$为训练样本
191 | $$
192 | d_1(1)=w_1^T(1)x_1=(0,0,0)(0,0,1)^T=0
193 | \\
194 | d_2(1)=w_2^T(1)x_1=(0,0,0)(0,0,1)^T=0
195 | \\
196 | d_3(1)=w_3^T(1)x_1=(0,0,0)(0,0,1)^T=0
197 | $$
198 | 由于$$d_1(1)\not\gt d_2(1),d_1(1)\not\gt d_3(1)$$，因此：
199 | $$
200 | w_1(2) = w_1(1) + x_1=(0,0,1)^T
201 | \\
202 | w_2(2) = w_2(1) - x_1=(0,0,-1)^T
203 | \\
204 | w_3(2) = w_3(1) - x_1=(0,0,-1)^T
205 | $$
206 | 
207 | 
208 | 第二轮迭代，以$$x_2$$为训练样本
209 | $$
210 | d_1(2)=w_1^T(2)x_2=(0,0,1)(1,1,1)^T=1
211 | \\
212 | d_2(2)=w_2^T(2)x_2=(0,0,-1)(1,1,1)^T=-1
213 | \\
214 | d_3(2)=w_3^T(2)x_2=(0,0,-1)(1,1,1)^T=-1
215 | $$
216 | 由于$$d_2(2)\not\gt d_1(2),d_2(2)\not\gt d_3(2)$$，因此：
217 | $$
218 | w_1(3) = w_1(2) - x_2=(-1,-1,0)^T
219 | \\
220 | w_2(3) = w_2(2) + x_2=(1,1,0)^T
221 | \\
222 | w_3(3) = w_3(2) - x_2=(-1,-1,-2)^T
223 | $$
224 | 
225 | 
226 | 第三轮迭代，以$$x_3$$为训练样本
227 | $$
228 | d_1(3)=w_1^T(3)x_3=(-1,-1,0)(-1,1,1)^T=0
229 | \\
230 | d_2(3)=w_2^T(3)x_3=(1,1,0)(-1,1,1)^T=0
231 | \\
232 | d_3(3)=w_3^T(3)x_3=(-1,-1,-2)(-1,1,1)^T=-2
233 | $$
234 | 由于$$d_3(3)\not\gt d_1(3),d_3(3)\not\gt d_2(3)$$，因此：
235 | $$
236 | w_1(4) = w_1(3) - x_3=(0,-2,-1)^T
237 | \\
238 | w_2(4) = w_2(3) - x_3=(2,0,-1)^T
239 | \\
240 | w_3(4) = w_3(3) + x_3=(-2,0,-1)^T
241 | $$
242 | 
243 | 第四轮迭代，以$$x_1$$为训练样本
244 | $$
245 | d_1(4)=w_1^T(4)x_1=-1
246 | \\
247 | d_2(4)=w_2^T(4)x_1=-1
248 | \\
249 | d_3(4)=w_3^T(4)x_1=-1
250 | $$
251 | 由于$$d_1(4)\not\gt d_2(4),d_1(4)\not\gt d_3(4)$$，因此：
252 | $$
253 | w_1(5) = w_1(4) + x_1=(0,-2,0)^T
254 | \\
255 | w_2(5) = w_2(4) - x_1=(2,0,-2)^T
256 | \\
257 | w_3(5) = w_3(4) - x_1=(-2,0,-2)^T
258 | $$
259 | 
260 | 第五轮迭代，以$$x_2$$为训练样本
261 | $$
262 | d_1(5)=w_1^T(5)x_2=-2
263 | \\
264 | d_2(5)=w_2^T(5)x_2=0
265 | \\
266 | d_3(5)=w_3^T(5)x_2=-4
267 | $$
268 | 由于$$d_2(5)\gt d_1(5),d_2(5)\gt d_3(5)$$，因此：
269 | $$
270 | w_1(6) = w_1(5)
271 | \\
272 | w_2(6) = w_2(5)
273 | \\
274 | w_3(6) = w_3(5)
275 | $$
276 | 
277 | 第六轮迭代，以$$x_3$$为训练样本
278 | $$
279 | d_1(6)=w_1^T(6)x_3=-2
280 | \\
281 | d_2(6)=w_2^T(6)x_3=-4
282 | \\
283 | d_3(6)=w_3^T(6)x_3=0
284 | $$
285 | 由于$$d_3(6)\gt d_1(6),d_3(6)\gt d_2(6)$$，因此：
286 | $$
287 | w_1(7) = w_1(6)
288 | \\
289 | w_2(7) = w_2(6)
290 | \\
291 | w_3(7) = w_3(6)
292 | $$
293 | 
294 | 第七轮迭代，以$$x_1$$为训练样本
295 | $$
296 | d_1(7)=w_1^T(7)x_1=0
297 | \\
298 | d_2(7)=w_2^T(7)x_1=-2
299 | \\
300 | d_3(7)=w_3^T(7)x_1=-2
301 | $$
302 | 由于$$d_1(7)\gt d_2(7),d_1(7)\gt d_3(7)$$，因此权向量不变
303 | 
304 | 
305 | 
306 | 综上，可得最后的权向量：
307 | $$
308 | w_1=(0,-2,0)^T
309 | \\
310 | w_2=(2,0,-2)^T
311 | \\
312 | w_3=(-2,0,-2)^T
313 | $$
314 | 因此判别函数为：
315 | $$
316 | d_1(x) = -2x_2
317 | \\
318 | d_2(x) = 2x_1-2
319 | \\
320 | d_3(x) = -2x_1-2
321 | $$
322 | 
323 | 
324 | {% endhint %}
325 | 
326 | 
327 | 
328 | 这里的分类算法都是通过模式样本来确定判别函数的系数，但一个分类器的判断性能最终要受并未用于训练的那些未知样本来检验。
329 | 
330 | 要使一个分类器设计完善，必须采用**有代表性的训练数据**，它能够合理反映模式数据的整体。
331 | 
332 | {% hint style="success" %}
333 | 
334 | 一般来说，合适的样本数目可如下估计：
335 | 
336 |  若k是模式的维数，令C=2(k+1)，则通常选用的训练样本数目约为C的10~20倍。
337 | 
338 | {% endhint %}
339 | 


--------------------------------------------------------------------------------
/di-san-zhang-pan-bie-shi-fen-lei-qi/3.6-ke-xun-lian-de-que-ding-xing-fen-lei-qi-de-die-dai-suan-fa.md:
--------------------------------------------------------------------------------
 1 | # 3.6 可训练的确定性分类器的迭代算法
 2 | 
 3 | ## 3.6.1 梯度法
 4 | 
 5 | ### 定义
 6 | 
 7 | 设函数$$f(y)$$是向量$$y=(y_1,y_2,\dots,y_n)^T$$的函数，则$$f(y)$$的<mark style="color:orange;">**梯度**</mark>定义为：
 8 | $$
 9 | \nabla f(y) = \frac{d}{dy}f(y)=\left(\frac{\partial f}{\partial y_1},\frac{\partial f}{\partial y_2},\dots,\frac{\partial f}{\partial y_n}\right)^T
10 | $$
11 | 
12 | 
13 | - 梯度是一个向量，它的最重要性质就是指出了函数f在其自变量y增加时最大增长率的方向
14 | - 负梯度指出f的<mark style="color:orange;">**最陡下降方向**</mark>
15 | 
16 | 利用这个性质，可以设计一个迭代方案来寻找函数的最小值
17 | 
18 | 
19 | 
20 | ### 采用梯度法求解的一般思想
21 | 
22 | **首先**，对于感知器算法而言
23 | $$
24 | w (k + 1) = 
25 | \begin{cases} 
26 | w (k) & \text{if } w^T(k) x_k > 0 \\
27 | w (k) + C x_k & \text{if } w^T(k) x_k \leq 0
28 | \end{cases}
29 | $$
30 | 其中$$w(k)$$、$$x_k$$随着迭代次数$$k$$变化
31 | 
32 | 
33 | 
34 | **接下来**，定义一个对于错误分类敏感的准则函数$$J(w,x)$$。先任选一个初始权向量$$w(1)$$，计算准则函数$$J$$的梯度，然后从$$w(1)$$出发，在最陡方向（梯度方向）上移动某一距离得到下一个权向量$$w(2)$$ 。
35 | 
36 | 类似的，可以得到从$$w(k)$$导出$$w(k+1)$$的一般关系式：
37 | $$
38 | \begin{align}
39 | w(k+1) &= w(k) - C\left\{\frac{\partial J(w,x)}{\partial w}\right\}_{w=w(k)}\nonumber
40 | \\
41 | &=w(k)-C\cdot\nabla J
42 | \end{align}
43 | $$
44 | 其中C是<mark style="color:orange;">**步长**</mark>，为一个正的比例因子
45 | 
46 | 
47 | 
48 | ### 讨论
49 | 
50 | - 若正确地选择了准则函数$$J(w,x)$$，则当权向量w是一个解时，J达到极小值（此时J的梯度为零）。由于权向量是按J的梯度值减小，因此这种方法称为<mark style="color:purple;">**梯度法**</mark>（最速下降法）。
51 | - 为了使权向量能较快地收敛于一个使函数$$J$$极小的解，**C值的选择是很重要的**：
52 |   - 若C值太小，则收敛太慢
53 |   - 若C值太大，则搜索可能过头，引起发散
54 | 
55 | {% hint style="info" %}
56 | 
57 | ![](..\.gitbook\assets\3.6.1.png)
58 | 
59 | {% endhint %}
60 | 


--------------------------------------------------------------------------------
/di-san-zhang-pan-bie-shi-fen-lei-qi/3.7-shi-han-shu-fa.md:
--------------------------------------------------------------------------------
1 | # 3.7 势函数法
2 | 
3 | 


--------------------------------------------------------------------------------
/di-san-zhang-pan-bie-shi-fen-lei-qi/3.8-jue-ce-shu.md:
--------------------------------------------------------------------------------
1 | # 3.8 决策树
2 | 
3 | 


--------------------------------------------------------------------------------
/di-san-zhang-pan-bie-shi-fen-lei-qi/fu-di-san-zhang-zuo-ye.md:
--------------------------------------------------------------------------------
  1 | # 附 第三章作业
  2 | 
  3 | ## 作业（1）
  4 | 
  5 | ### 题目
  6 | 
  7 | 在一个10类的模式识别问题中，有3类单独满足多类情况1，其余的类别满足多类情况2。问该模式识别问题所需判别函数的最少数目是多少？
  8 | 
  9 | 
 10 | 
 11 | ### 解
 12 | 
 13 | - 对于3类满足情况1的，将剩下的7类合看作一类，则实际上是4分类问题，需要4个判别函数
 14 | - 剩下7个类别为情况2，需要$$\tfrac{M(M-1)}{2}$$条判别函数
 15 | - 故总共需要：
 16 | 
 17 | $$
 18 | 4 + \frac{7\times 6}{2} = 25
 19 | $$
 20 | 
 21 | 
 22 | 
 23 | {% hint style="warning" %}
 24 | 
 25 | 这里24个也是可以的，不再做一次额外的判断
 26 | 
 27 | {% endhint %}
 28 | 
 29 | 
 30 | 
 31 | ## 作业（2）
 32 | 
 33 | ### 题目
 34 | 
 35 | 一个三类问题，其判别函数如下：
 36 | 
 37 |  $$d_1(x)=-x_1, d_2(x)=x_1+x_2-1, d_3(x)=x_1-x_2-1$$
 38 | 
 39 | 1. 设这些函数是在多类情况1条件下确定的，绘出其判别界面和每一个模式类别的区域。
 40 | 
 41 | 2. 设为多类情况2，并使：$$d_{12}(x)= d_1(x), d_{13}(x)= d_2(x), d_{23}(x)= d_3(x)$$。绘出其判别界面和多类情况2的区域。
 42 | 
 43 | 3. 设$$d_1(x), d_2(x)$$和$$d_3(x)$$是在多类情况3的条件下确定的，绘出其判别界面和每类的区域。
 44 | 
 45 | 
 46 | 
 47 | ### 解
 48 | 
 49 | #### Q1
 50 | 
 51 | ![](../.gitbook/assets/判别函数1.png)
 52 | 
 53 | #### Q2
 54 | 
 55 | ![](../.gitbook/assets/判别函数2.png)
 56 | 
 57 | #### Q3
 58 | 
 59 | $$
 60 | \begin{align}
 61 | \because\ &d_1(x)=-x_1 \nonumber
 62 | \\
 63 | & d_2(x)=x_1+x_2-1
 64 | \\
 65 | & d_3(x)=x_1-x_2-1
 66 | \\
 67 | \\
 68 | \therefore\ & d_{12}(x) = d_1(x) - d_2(x) = -2x_1-x_2+1=0
 69 | \\
 70 | & d_{13}(x) = d_1(x)-d_3(x) = -2x_1+x_2+1
 71 | \\
 72 | & d_{23}(x) = d_2(x)-d_3(x) = 2x_2 = 0
 73 | \end{align}
 74 | $$
 75 | 
 76 | ![](../.gitbook/assets/判别函数3.png)
 77 | 
 78 | 
 79 | 
 80 | ## 作业（3）
 81 | 
 82 | ### 题目
 83 | 
 84 | 两类模式，每类包括 5 个 **3 维**不同的模式向量，且良好分布。如果它们是线性可分的，问权向量至少需要几个系数分量？
 85 | 
 86 | 假如要建立二次的多项式判别函数，又至少需要几个系数分量？（设模式的良好分布不因模式变化而改变）
 87 | 
 88 | 
 89 | 
 90 | ### 解
 91 | 
 92 | 代入公式
 93 | $$
 94 | N_w = C_{n+r}^r=\frac{(n+r)!}{r!n!}
 95 | $$
 96 | 则当线性可分时，r=1，n=3，$$N_w$$=4
 97 | 
 98 | 当采用二次多项式判别函数时，r=2，n=3，$$N_w$$=10
 99 | 
100 | {% hint style="warning" %}
101 | 
102 | 只看维度与最高幂
103 | 
104 | {% endhint %}
105 | 
106 | 
107 | 
108 | ## 作业（4）
109 | 
110 | ### 题目
111 | 
112 | 用感知器算法求下列模式分类的解向量$$\boldsymbol{w}$$:
113 | $$
114 | \omega_1:\{(0\ 0\ 0)^T,(1\ 0\ 0)^T,(1\ 0\ 1)^T,(1\ 1\ 0)^T\}
115 | \\
116 | \omega_2:\{(0\ 0\ 1)^T,(0\ 1\ 1)^T,(0\ 1\ 0)^T,(1\ 1\ 1)^T\}
117 | $$
118 | 
119 | 
120 | ### 解
121 | 
122 | 首先讲属于$$\omega_2$$的样本乘以-1，写成增广形式：
123 | $$
124 | \begin{align}
125 | &x_1 = (0\ 0\ 0\ 1)^T \nonumber
126 | \\
127 | &x_2 = (1\ 0\ 0\ 1)^T
128 | \\
129 | &x_3 = (1\ 0\ 1\ 1)^T
130 | \\
131 | &x_4 = (1\ 1\ 0\ 1)^T
132 | \\
133 | &x_5 = (0\ 0\ -1\ -1)^T
134 | \\
135 | &x_6 = (0\ -1\ -1\ -1)^T
136 | \\
137 | &x_7 = (0\ -1\ 0\ -1)^T
138 | \\
139 | &x_8 = (-1\ -1\ -1\ -1)^T
140 | \end{align}
141 | $$
142 | 接下来开始迭代，感知器算法的一般表达：
143 | $$
144 | w(k+1)=
145 | \begin{cases}
146 | w(k) & w^T(k)x^k>0
147 | \\
148 | w(k)+Cx^k &w^T(k)x^k \leq 0
149 | \end{cases}
150 | $$
151 | 
152 | 
153 | 由于步数较多，此处直接给出程序运行结果：
154 | 
155 | [[0. 0. 0. 1.]
156 |  [1. 0. 0. 1.]
157 |  [1. 0. 1. 1.]
158 |  [1. 1. 0. 1.]
159 |  [0. 0. 1. 1.]
160 |  [0. 1. 1. 1.]
161 |  [0. 1. 0. 1.]
162 |  [1. 1. 1. 1.]]
163 | 
164 | init w=[0. 0. 0. 0.]
165 | 
166 | epoch0:
167 |   for x0=[0. 0. 0. 1.],label=1, now w=[0. 0. 0. 0.], predicted_label=0.0,update w=[0. 0. 0. 1.]
168 |   for x1=[1. 0. 0. 1.],label=1, now w=[0. 0. 0. 1.], predicted_label=1.0,keep w
169 |   for x2=[1. 0. 1. 1.],label=1, now w=[0. 0. 0. 1.], predicted_label=1.0,keep w
170 |   for x3=[1. 1. 0. 1.],label=1, now w=[0. 0. 0. 1.], predicted_label=1.0,keep w
171 |   for x4=[0. 0. 1. 1.],label=-1, now w=[0. 0. 0. 1.], predicted_label=1.0,update w=[ 0.  0. -1.  0.]
172 |   for x5=[0. 1. 1. 1.],label=-1, now w=[ 0.  0. -1.  0.], predicted_label=-1.0,keep w
173 |   for x6=[0. 1. 0. 1.],label=-1, now w=[ 0.  0. -1.  0.], predicted_label=0.0,update w=[ 0. -1. -1. -1.]
174 |   for x7=[1. 1. 1. 1.],label=-1, now w=[ 0. -1. -1. -1.], predicted_label=-1.0,keep w
175 | epoch1:
176 |   for x0=[0. 0. 0. 1.],label=1, now w=[ 0. -1. -1. -1.], predicted_label=-1.0,update w=[ 0. -1. -1.  0.]
177 |   for x1=[1. 0. 0. 1.],label=1, now w=[ 0. -1. -1.  0.], predicted_label=0.0,update w=[ 1. -1. -1.  1.]
178 |   for x2=[1. 0. 1. 1.],label=1, now w=[ 1. -1. -1.  1.], predicted_label=1.0,keep w
179 |   for x3=[1. 1. 0. 1.],label=1, now w=[ 1. -1. -1.  1.], predicted_label=1.0,keep w
180 |   for x4=[0. 0. 1. 1.],label=-1, now w=[ 1. -1. -1.  1.], predicted_label=0.0,update w=[ 1. -1. -2.  0.]
181 |   for x5=[0. 1. 1. 1.],label=-1, now w=[ 1. -1. -2.  0.], predicted_label=-1.0,keep w
182 |   for x6=[0. 1. 0. 1.],label=-1, now w=[ 1. -1. -2.  0.], predicted_label=-1.0,keep w
183 |   for x7=[1. 1. 1. 1.],label=-1, now w=[ 1. -1. -2.  0.], predicted_label=-1.0,keep w
184 | epoch2:
185 |   for x0=[0. 0. 0. 1.],label=1, now w=[ 1. -1. -2.  0.], predicted_label=0.0,update w=[ 1. -1. -2.  1.]
186 |   for x1=[1. 0. 0. 1.],label=1, now w=[ 1. -1. -2.  1.], predicted_label=1.0,keep w
187 |   for x2=[1. 0. 1. 1.],label=1, now w=[ 1. -1. -2.  1.], predicted_label=0.0,update w=[ 2. -1. -1.  2.]
188 |   for x3=[1. 1. 0. 1.],label=1, now w=[ 2. -1. -1.  2.], predicted_label=1.0,keep w
189 |   for x4=[0. 0. 1. 1.],label=-1, now w=[ 2. -1. -1.  2.], predicted_label=1.0,update w=[ 2. -1. -2.  1.]
190 |   for x5=[0. 1. 1. 1.],label=-1, now w=[ 2. -1. -2.  1.], predicted_label=-1.0,keep w
191 |   for x6=[0. 1. 0. 1.],label=-1, now w=[ 2. -1. -2.  1.], predicted_label=0.0,update w=[ 2. -2. -2.  0.]
192 |   for x7=[1. 1. 1. 1.],label=-1, now w=[ 2. -2. -2.  0.], predicted_label=-1.0,keep w
193 | epoch3:
194 |   for x0=[0. 0. 0. 1.],label=1, now w=[ 2. -2. -2.  0.], predicted_label=0.0,update w=[ 2. -2. -2.  1.]
195 |   for x1=[1. 0. 0. 1.],label=1, now w=[ 2. -2. -2.  1.], predicted_label=1.0,keep w
196 |   for x2=[1. 0. 1. 1.],label=1, now w=[ 2. -2. -2.  1.], predicted_label=1.0,keep w
197 |   for x3=[1. 1. 0. 1.],label=1, now w=[ 2. -2. -2.  1.], predicted_label=1.0,keep w
198 |   for x4=[0. 0. 1. 1.],label=-1, now w=[ 2. -2. -2.  1.], predicted_label=-1.0,keep w
199 |   for x5=[0. 1. 1. 1.],label=-1, now w=[ 2. -2. -2.  1.], predicted_label=-1.0,keep w
200 |   for x6=[0. 1. 0. 1.],label=-1, now w=[ 2. -2. -2.  1.], predicted_label=-1.0,keep w
201 |   for x7=[1. 1. 1. 1.],label=-1, now w=[ 2. -2. -2.  1.], predicted_label=-1.0,keep w
202 | epoch4:
203 |   for x0=[0. 0. 0. 1.],label=1, now w=[ 2. -2. -2.  1.], predicted_label=1.0,keep w
204 |   for x1=[1. 0. 0. 1.],label=1, now w=[ 2. -2. -2.  1.], predicted_label=1.0,keep w
205 |   for x2=[1. 0. 1. 1.],label=1, now w=[ 2. -2. -2.  1.], predicted_label=1.0,keep w
206 |   for x3=[1. 1. 0. 1.],label=1, now w=[ 2. -2. -2.  1.], predicted_label=1.0,keep w
207 |   for x4=[0. 0. 1. 1.],label=-1, now w=[ 2. -2. -2.  1.], predicted_label=-1.0,keep w
208 |   for x5=[0. 1. 1. 1.],label=-1, now w=[ 2. -2. -2.  1.], predicted_label=-1.0,keep w
209 |   for x6=[0. 1. 0. 1.],label=-1, now w=[ 2. -2. -2.  1.], predicted_label=-1.0,keep w
210 |   for x7=[1. 1. 1. 1.],label=-1, now w=[ 2. -2. -2.  1.], predicted_label=-1.0,keep w
211 | w = [ 2. -2. -2.  1.]
212 | 
213 | 最终得到权向量为$$w=(2,-2,-2,1)^T$$，故判别函数为：
214 | $$
215 | d(x) = 2x_1-2x_2-2x_3+1
216 | $$
217 | 
218 | 
219 | 
220 | 上述结果的代码如下：
221 | 
222 | ```python
223 | import numpy as np
224 | 
225 | 
226 | def perceptron_train(_patterns, _labels, learning_rate, epochs):
227 |     _patterns = np.hstack((patterns, np.ones((_patterns.shape[0], 1))))
228 |     print(f'{_patterns}\n')
229 |     _w = np.zeros(_patterns.shape[1])
230 |     print(f'init w={_w}\n')
231 | 
232 |     for epoch in range(epochs):
233 |         print(f'epoch{epoch}:')
234 |         w_update = False
235 |         for i, pattern in enumerate(_patterns):
236 |             predicted_label = np.sign(np.dot(pattern, _w))
237 | 
238 |             print(f'  for x{i}={pattern},label={_labels[i]}, now w={_w}, predicted_label={predicted_label},', end='')
239 |             if predicted_label != _labels[i]:
240 |                 _w += learning_rate * _labels[i] * pattern
241 |                 w_update = True
242 |                 print(f'update w={_w}')
243 |             else:
244 |                 print(f'keep w')
245 | 
246 |         if not w_update:
247 |             break
248 | 
249 |     return _w
250 | 
251 | 
252 | patterns = np.array([[0, 0, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0],
253 |                      [0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 1]])
254 | labels = np.array([1, 1, 1, 1, -1, -1, -1, -1])
255 | 
256 | w = perceptron_train(patterns, labels, 1, 10)
257 | 
258 | print(f'w = {w}'')
259 | ```
260 | 
261 | 
262 | 
263 | 
264 | ## 作业（5）
265 | 
266 | ### 题目
267 | 
268 | 用多类感知器算法求下列模式的判别函数：
269 | $$
270 | \omega_1:(-1,-1)^T
271 | \\
272 | \omega_2:(0,0)^T
273 | \\
274 | \omega_3:(1,1)^T
275 | $$
276 | 
277 | ### 解
278 | 
279 | 由于递推字数较多，此处仍然直接给出程序运行结果：
280 | 
281 | [[-1. -1.  1.]
282 |  [ 0.  0.  1.]
283 |  [ 1.  1.  1.]]
284 | 
285 | init w=
286 | [[0. 0. 0.]
287 |  [0. 0. 0.]
288 |  [0. 0. 0.]]
289 | 
290 | epoch0:
291 |   for x1=[-1. -1.  1.], now w=[0. 0. 0. 0. 0. 0. 0. 0. 0.], max_label=[0 1 2], 
292 |     update w=[-1. -1.  1.  1.  1. -1.  1.  1. -1.]
293 | 
294 |   for x2=[0. 0. 1.], now w=[-1. -1.  1.  1.  1. -1.  1.  1. -1.], max_label=[0], 
295 |     update w=[-1. -1.  0.  1.  1.  0.  1.  1. -2.]
296 | 
297 |   for x3=[1. 1. 1.], now w=[-1. -1.  0.  1.  1.  0.  1.  1. -2.], max_label=[1], 
298 |     update w=[-2. -2. -1.  0.  0. -1.  2.  2. -1.]
299 | 
300 | epoch1:
301 |   for x1=[-1. -1.  1.], now w=[-2. -2. -1.  0.  0. -1.  2.  2. -1.], max_label=[0], 
302 |     keep w
303 | 
304 |   for x2=[0. 0. 1.], now w=[-2. -2. -1.  0.  0. -1.  2.  2. -1.], max_label=[0 1 2], 
305 |     update w=[-2. -2. -2.  0.  0.  0.  2.  2. -2.]
306 | 
307 |   for x3=[1. 1. 1.], now w=[-2. -2. -2.  0.  0.  0.  2.  2. -2.], max_label=[2], 
308 |     keep w
309 | 
310 | epoch2:
311 |   for x1=[-1. -1.  1.], now w=[-2. -2. -2.  0.  0.  0.  2.  2. -2.], max_label=[0], 
312 |     keep w
313 | 
314 |   for x2=[0. 0. 1.], now w=[-2. -2. -2.  0.  0.  0.  2.  2. -2.], max_label=[1], 
315 |     keep w
316 | 
317 |   for x3=[1. 1. 1.], now w=[-2. -2. -2.  0.  0.  0.  2.  2. -2.], max_label=[2], 
318 |     keep w
319 | 
320 | w = [[-2. -2. -2.]
321 |  [ 0.  0.  0.]
322 |  [ 2.  2. -2.]]
323 | 
324 | 
325 | 
326 | 综上，得到的判别函数为：
327 | $$
328 | d_1(x) = -2x_1-2x_2-2
329 | \\
330 | d_2(x) = 0
331 | \\
332 | d_3(x) = 2x_1+2x_2-2
333 | $$
334 | 所用程序如下：
335 | 
336 | ```python
337 | import numpy as np
338 | 
339 | 
340 | def perceptron_train(_patterns, _labels, learning_rate, epochs):
341 |     _patterns = np.hstack((patterns, np.ones((_patterns.shape[0], 1))))
342 |     print(f'{_patterns}\n')
343 |     _w = np.zeros((_patterns.shape[1], _patterns.shape[0]))
344 |     print(f'init w=\n{_w}\n')
345 | 
346 |     for epoch in range(epochs):
347 |         print(f'epoch{epoch}:')
348 |         w_update = False
349 |         for i, pattern in enumerate(_patterns):
350 |             _d = np.dot(_w, np.transpose(pattern))
351 |             max_label = np.where(_d == np.max(_d))[0]
352 | 
353 |             print(f'  for x{i + 1}={pattern}, now w={_w.flatten()}, max_label={max_label}, \n', end='')
354 | 
355 |             if max_label.__len__() == 1 and max_label[0] == i:
356 |                 print(f'    keep w\n')
357 |             else:
358 |                 _w += learning_rate * np.outer(labels[i], pattern)
359 |                 print(f'    update w={_w.flatten()}\n')
360 |                 w_update = True
361 | 
362 |         if not w_update:
363 |             break
364 | 
365 |     return _w
366 | 
367 | 
368 | patterns = np.array([[-1, -1],
369 |                      [0, 0],
370 |                      [1, 1]])
371 | labels = np.array([[1, -1, -1],
372 |                    [-1, 1, -1],
373 |                    [-1, -1, 1]])
374 | 
375 | w = perceptron_train(patterns, labels, 1, 10)
376 | 
377 | print(f'w = {w}')
378 | ```
379 | 
380 | 


--------------------------------------------------------------------------------
/di-shi-er-zhang-ji-cheng-xue-xi/12.1-jian-jie.md:
--------------------------------------------------------------------------------
1 | # 12.1 简介
2 | 
3 | 


--------------------------------------------------------------------------------
/di-shi-er-zhang-ji-cheng-xue-xi/12.2-bagging.md:
--------------------------------------------------------------------------------
1 | # 12.2 Bagging
2 | 
3 | 


--------------------------------------------------------------------------------
/di-shi-er-zhang-ji-cheng-xue-xi/12.3-boosting.md:
--------------------------------------------------------------------------------
1 | # 12.3 Boosting
2 | 
3 | 


--------------------------------------------------------------------------------
/di-shi-er-zhang-ji-cheng-xue-xi/fu-di-shi-er-zhang-zuo-ye.md:
--------------------------------------------------------------------------------
  1 | # 附 第十二章作业
  2 | 
  3 | ## 作业1
  4 | 
  5 | ### 题目
  6 | 
  7 | 模型复杂度过低/过高通常会导致Bias和Variance怎样的问题？
  8 | 
  9 | ### 答
 10 | 
 11 | - 模型简单会出现欠拟合，表现为偏差高，方差低
 12 | 
 13 |   - 当模型太简单，不能捕捉到数据中的复杂结构时，模型往往会出现⾼偏 差。这意味着模型在训练数据上的表现和在未知数据上的表现都不太好，因为它 没有很好地学习到数据的特征，从⽽导致错误的预测或分类
 14 | 
 15 |   - 由于模型简单，它对训练数据的⼩变化不太敏感，因此在不同的数据集 上的表现⽐较⼀致，导致低⽅差。但这种⼀致性是以牺牲准确性为代价的
 16 | 
 17 | - 模型复杂会出现过拟合，表现为偏差低，方差高
 18 |   - ⼀个复杂的模型能够很好地适应训练数据，⼏乎完美地捕获其所有特 征，从⽽在训练数据上表现出很低的偏差。它可以⾮常精确地预测训练数据中的 结果
 19 |   - 过于复杂的模型可能会对训练数据中的噪声和误差也进⾏学习， 这导致它对于新的、未⻅过的数据表现出⾼⽅差。这意味着模型在不同的数据集 上可能表现出很⼤的波动，即使这些数据集之间的差异很⼩
 20 | 
 21 | 
 22 | 
 23 | 
 24 | 
 25 | ## 作业2
 26 | 
 27 | ### 题目
 28 | 
 29 | 怎样判断、怎样缓解过拟合/欠拟合问题？
 30 | 
 31 | ### 答
 32 | 
 33 | #### 判断
 34 | 
 35 | 从理论上看，若模型的偏差高、方差低，则意味着存在欠拟合；反之则存在过拟合
 36 | 
 37 | 实际可以通过校验误差判断。校验误差随着模型复杂度的变化先减小，此时模型处于欠拟合状态；当模型复杂度超过一定值后，校验误差随模型复杂度增加而增大 ，此时模型进入过拟合状态
 38 | 
 39 | 看在训练集和测试集上的表现
 40 | 
 41 | #### 缓解
 42 | 
 43 | - **欠拟合**
 44 |   - 需要增减模型的复杂度
 45 |   - 增加训练时间
 46 |   - 减少正则化
 47 | - **过拟合**
 48 |   - 降低模型的复杂度
 49 |   - 扩大训练集
 50 |   - 添加正则项
 51 |   - 神经网络中增加Dropout
 52 | 
 53 | 
 54 | 
 55 | ## 作业3
 56 | 
 57 | ### 题目
 58 | 
 59 | 比较Bagging和Boosting算法的异同
 60 | 
 61 | ### 答
 62 | 
 63 | - **相同**
 64 |   - 都是集成学习的算法，本质思路都是通过组合多个弱学习器来构建一个强学习器
 65 |   - 都是旨在通过集成的方法降低泛化误差
 66 | 
 67 | - **不同**
 68 |   - **训练方式**：
 69 |     - bagging是并行的
 70 |     - boosting是顺序执行的
 71 | 
 72 |   - **主要目标**
 73 |     - bagging旨在降低方差，防止出现过拟合
 74 |     - boosting旨在降低偏差，提高模型在训练集上的表现
 75 | 
 76 |   - **权重**
 77 |     - bagging样本权重是一样的
 78 |     - boosting会对分类错误的样本增加权重
 79 | 
 80 |   - **对噪声的敏感度**
 81 |     - bagging模型之间是独立的，容错性更强
 82 |     - boosting对异常值更敏感
 83 | 
 84 | 
 85 | 
 86 | 
 87 | 
 88 | 
 89 | ## 作业4
 90 | 
 91 | ### 题目
 92 | 
 93 | 简述Adaboosting的流程
 94 | 
 95 | ### 答
 96 | 
 97 | 首先，用一个基础的学习器对数据集进行分类训练。接下来的每一次迭代中，增加分类错误的数据的权重，减轻分类正确的样本的权重，依此训练下一个分类器。
 98 | 
 99 | 最后，对于这一系列弱分类器，若该分类器错误率高，则权重较低；反之则权重较高。依此进行加权求和，得到最终的分类结果。
100 | 
101 | 
102 | 
103 | ## 作业5
104 | 
105 | ### 题目
106 | 
107 | 随机森林更适合采用那种决策树？
108 | 
109 | - A、性能好，深度较深
110 | - B、性能弱、深度较浅 
111 | 
112 | ### 答
113 | 
114 | **A**：较深的决策树容易发生过拟合问题，然而采用Bagging可以降低模型的方差，因此可以较好的缓解该问题
115 | 
116 | 
117 | 
118 | ## 作业6
119 | 
120 | ### 题目
121 | 
122 | 基于树的Boosting更适合采用那种决策树？
123 | 
124 | - A、性能好，深度较深
125 | - B、性能弱、深度较浅
126 | 
127 | ### 答
128 | 
129 | **B**：boosting是将许多弱学习器进行组合，形成强分类器。因此此时选择层数不深的决策树即可
130 | 
131 | 
132 | 
133 | 
134 | 
135 | ## 作业7
136 | 
137 | ### 题目
138 | 
139 | 如果对决策树采用Bagging方式进行集成学习，更适合采用哪种方法对决策树的超参（如树的深度）进行调优？
140 | 
141 | - A、交叉验证
142 | - B、包外估计
143 | 
144 | ### 答
145 | 
146 | **B**：在Bagging中，每个基学习器只在原始数据集的一部分上训练，所以可以不用交叉验证，直接采用包外估计
147 | 


--------------------------------------------------------------------------------
/di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.1-pgm-jian-jie.md:
--------------------------------------------------------------------------------
 1 | # 11.1 PGM简介
 2 | 
 3 | {% hint style="success" %}
 4 | 
 5 | <mark style="color:purple;">**概率图模型**</mark>（Probabilistic Graphical Models）
 6 | 
 7 | 多元随机变量的<mark style="color:orange;">**条件独立性**</mark>的概率模型
 8 | 
 9 | {% endhint %}
10 | 
11 | 
12 | 
13 | 它的**特点**是<mark style="color:orange;">**结构预测**</mark>，即输入输出是“序列→序列”的形式，是元素具有依赖约束的序列预测：
14 | $$
15 | \hat{\boldsymbol y} = \arg\max\limits_{\boldsymbol y}p(\boldsymbol{y\vert x})\quad \boldsymbol y\in \mathcal Y, \vert\mathcal Y\vert很大
16 | $$
17 | 
18 | 
19 | 而传统分类问题，y的取值代表分类，只有有限的几个值，并且不是序列：
20 | $$
21 | \hat y = \arg\max\limits_y p (y\vert x)\quad y\in\{1,-1\}
22 | $$
23 | 
24 | 
25 | ## 11.1.1 三大问题
26 | 
27 | - <mark style="color:red;">**表示**</mark>：能够用模型去描述随机变量之间依赖关系
28 |   - **联合概率**：$$P(X) = P(x_1,x_2,\dots,x_D)=P(X_O,X_H)$$
29 |   - **条件独立性**：$$\{x_i\}\perp \{x_j\}\vert\{x_n\}$$
30 | - <mark style="color:red;">**推断**</mark>：给定观测数据，逆向推理，回答非确定性问题
31 |   - **条件概率**：用已知观测变量推测未知变量分布$$P(X_H\vert X_O)$$
32 | - <mark style="color:red;">**学习**</mark>：给定观测数据，学习最佳模型（结构、参数）
33 |   - **联合概率**最大化时的M参数：
34 | 
35 | $$
36 | \Theta^*=\arg\max\limits_\theta P(X\vert\theta)
37 | $$
38 | 
39 | ### 一、表示
40 | 
41 | {% hint style="success" %}
42 | 
43 | 用图表示的概率分布
44 | 
45 | {% endhint %}
46 | 
47 | 
48 | 
49 | **节点**：表示随机变量/状态
50 | 
51 | **边**：表示概率关系
52 | 
53 | 
54 | 
55 | #### 类型
56 | 
57 | - <mark style="color:red;">**有向概率图模型**</mark>（<mark style="color:purple;">**贝叶斯网络**</mark>）：因果关系
58 | - <mark style="color:red;">**无向概率图模型**</mark>（<mark style="color:purple;">**马尔可夫随机场**</mark>）：关联关系
59 | 
60 | ![](../.gitbook/assets/11.1.1.png)
61 | 
62 | 
63 | 
64 | ### 二、推断
65 | 
66 | 如何根据模型和给定的数据回答问题？
67 | 
68 | 用已知的变量推断未知变量的分布：
69 | 
70 | - **边缘概率**：$$p(x_i)$$
71 | - **最大后验概率**：$$y^*=\arg\max p(y\vert x_1,x_2,\dots,x_D)$$
72 | 
73 | 
74 | 
75 | 
76 | 
77 | ### 三、学习
78 | 
79 | - **参数学习**：模型结构已知，求最佳参数
80 | - **结构学习**：变量间依赖关系未知，从数据中学习
81 | 
82 | $$
83 | \mathcal M^*=\arg\max\limits_{\mathcal M\in M} F(\mathcal D;\mathcal M)
84 | $$
85 | 
86 | 
87 | 
88 | ## 两类概率图模型
89 | 
90 | ![](../.gitbook/assets/11.1.2.png)
91 | 
92 | 
93 | 
94 | ## 附录：概率论相关知识
95 | 
96 | 


--------------------------------------------------------------------------------
/di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.2-you-xiang-tu-mo-xing-bei-ye-si-wang-luo.md:
--------------------------------------------------------------------------------
  1 | # 11.2 有向图模型（贝叶斯网络）
  2 | 
  3 | {% hint style="success" %}
  4 | 
  5 | 有向图模型可以表示<mark style="color:orange;">**因果关系**</mark>
  6 | 
  7 | 我们经常观察子变量并依此推断出父变量的分布
  8 | 
  9 | {% endhint %}
 10 | 
 11 | 
 12 | 
 13 | #### 有向图的例子
 14 | 
 15 | - **朴素贝叶斯**：假设在给定y的情况下，特征$$X_j$$之间条件独立
 16 | - 隐马尔科夫模型
 17 | - 卡尔曼滤波
 18 | - 因子分析
 19 | - 概率主成分分析
 20 | - 独立成分分析
 21 | - 混合高斯
 22 | - 转换成分分析
 23 | - 概率专家系统
 24 | - sigmoid信念网络
 25 | - ……
 26 | 
 27 | 
 28 | 
 29 | 
 30 | 
 31 | ## 11.2.1 贝叶斯网络
 32 | 
 33 | ![](../.gitbook/assets/11.2.1.png)
 34 | 
 35 | 1. **概率分布**：用于查询/推断
 36 | 2. **表示**：具体实现
 37 | 3. **条件独立**：模型的解释
 38 | 
 39 | 
 40 | 
 41 | ### 一、概率分布
 42 | 
 43 | 一个概率图模型对应着一族概率分布（a family of probability distribution）
 44 | 
 45 | 
 46 | 
 47 | 每一个<mark style="color:purple;">**节点**</mark>对应着一个<mark style="color:purple;">**条件概率分布**</mark>$$p(x_j\vert x_{\pi_j})$$，其中$$\pi_j$$表示节点j的<mark style="color:orange;">**父节点集合**</mark>
 48 | 
 49 | 联合概率分布可以表示为：
 50 | $$
 51 | p(x_1,x_2,\dots,x_D)= \prod_{j=1}^Dp(x_j\vert x_{\pi_j})
 52 | $$
 53 | 
 54 | 
 55 | 如对于上面的图而言，有：
 56 | $$
 57 | p\left(x_1, x_2, x_3, x_4, x_5, x_6\right)=p\left(x_1\right) p\left(x_2 \mid x_1\right) p\left(x_3 \mid x_1\right) p\left(x_4 \mid x_2\right) p\left(x_5 \mid x_3\right) p\left(x_6 \mid x_2, x_5\right)
 58 | $$
 59 | 
 60 | 
 61 | ### 二、表示
 62 | 
 63 | {% hint style="success" %}
 64 | 
 65 | 贝叶斯网络使用一些列变量间的<mark style="color:orange;">**局部关系**</mark>紧凑的表示<mark style="color:purple;">**联合分布**</mark>
 66 | 
 67 | {% endhint %}
 68 | 
 69 | 
 70 | 
 71 | ![](../.gitbook/assets/11.2.2.png)
 72 | 
 73 | 
 74 | 
 75 | 通过上面一节中的方式，可以将变量的数量由$$O(2^D)$$变为$$O(D\times2^k)$$
 76 | 
 77 | 而通常，变量数$$D$$是远大于状态数$$k$$的
 78 | 
 79 | 
 80 | 
 81 | 具体而言，每一个节点有一个<mark style="color:purple;">**条件概率表（CPT）**</mark>，如下面的例子中：
 82 | 
 83 | ![](../.gitbook/assets/11.2.3.png)
 84 | 
 85 | 很显然，变量的数量得以大大的减少了
 86 | 
 87 | 
 88 | 
 89 | ### 三、条件独立
 90 | 
 91 | **拓扑排序**：定义图G中节点的顺序$$I$$，若对于每个节点$$i\in V$$，它的父节点都在这个顺序中出现在它之前，则称$$I$$为<mark style="color:purple;">**拓扑排序**</mark>
 92 | 
 93 | 
 94 | 
 95 | 对于节点$$j$$，给定图G的拓扑排序$$I$$，假设$$v_j$$表示在$$I$$中除了$$\pi_j$$之外所有出现在节点$$j$$之前的节点，有以下定则：
 96 | 
 97 | {% hint style="warning" %}
 98 | 
 99 | 给定一个节点的父节点，则该节点和它的祖先节点<mark style="color:orange;">**条件独立**</mark>
100 | $$
101 | \{X_j \perp X_{v_j} \mid X_{\pi_j}\}
102 | $$
103 | {% endhint %}
104 | 
105 | 
106 | 
107 | ### 四、三种经典图
108 | 
109 | ![](../.gitbook/assets/11.2.4.png)
110 | 
111 | #### 解释消除
112 | 
113 | 对于多因一果（即上图中第三种），假设各种“因”之间是相互独立的，若已经确定发生了其中一种原因导致了结果，那么由于其他原因导致结果发生的概率就下降了，因此上面一图的独立是**没有条件的**
114 | 
115 | **例子**：结果是草地湿了，原因是下雨和园丁浇水。本来浇水和下雨是独立的，但若已经知道了草地湿是下雨导致的，那么浇水的概率就下降了
116 | 
117 | 
118 | 
119 | 
120 | 
121 | ## 11.2.2 条件独立的快速检验（贝叶斯球）
122 | 
123 | 
124 | 
125 | ### 一、贝叶斯球的动作
126 | 
127 | - <mark style="color:red;">**通过**</mark>：贝叶斯球可以从当前节点的子节点到达其父节点，或从其父节点到达其子节点
128 | - <mark style="color:red;">**反弹**</mark>：对于子节点，父节点反弹来自子节点的球，子节点可以到达其各个兄弟节点；父节点方向同理
129 | - <mark style="color:red;">**截止**</mark>：不对贝叶斯球有任何响应
130 | 
131 | 
132 | 
133 | ### 二、规则
134 | 
135 | - <mark style="color:orange;">**未知节点**</mark>：
136 |   - 总能使贝叶斯球<mark style="color:red;">**通过**</mark>
137 |   - <mark style="color:red;">**反弹**</mark>来自子节点的球
138 | - <mark style="color:orange;">**已知节点**</mark>
139 |   - <mark style="color:red;">**反弹**</mark>来自父节点的球
140 |   - <mark style="color:red;">**截止**</mark>来自子节点的球
141 | 
142 | ![](../.gitbook/assets/11.2.5.png)
143 | 
144 | 对于中间节点Y，若贝叶斯球不能由X经由Y到达Z（或由Z经由Y到达X），则称X和Z关于Y<mark style="color:orange;">**条件独立**</mark>
145 | 


--------------------------------------------------------------------------------
/di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.3-wu-xiang-tu-mo-xing-ma-er-ke-fu-sui-ji-chang.md:
--------------------------------------------------------------------------------
  1 | # 11.3 无向图模型（马尔科夫随机场）
  2 | 
  3 | 
  4 | 
  5 | ## 11.3.1 马尔科夫随机场
  6 | 
  7 | ### 一、概率分布
  8 | 
  9 | {% hint style="success" %}
 10 | 
 11 | <mark style="color:purple;">**团**</mark>：无向图中任何两个节点均有边相连的节点子集称为团
 12 | 
 13 | <mark style="color:purple;">**极大团**</mark>：若C是无向图G的一个团，并且不能再加入G中的任何一个节点使其称为更大的团，则称C为G的一个极大团
 14 | 
 15 | {% endhint %}
 16 | 
 17 | 
 18 | 
 19 | 将无向图模型的联合概率分布表示为其极大团上的随机变量的函数的乘积的形式，称为概率无向图模型的<mark style="color:orange;">**因子分解**</mark>
 20 | 
 21 | 给定无向图G，C为G上的极大团，$$X_C$$表示C对应到随机变量。则无向图模型的联合概率分布$$P(X)$$可以表示为图中所有<mark style="color:purple;">**极大团**</mark>上的函数$$\Psi_C(X_C)$$的乘积形式：
 22 | $$
 23 | P(X) = \frac1Z\prod_C\Psi_c(X_c)
 24 | $$
 25 | 
 26 | 
 27 | 其中，Z是<mark style="color:purple;">**归一化因子**</mark>：
 28 | $$
 29 | Z=\sum_X\prod_C\Psi_c(X_c)
 30 | $$
 31 | 
 32 | 在上式中，$$\Psi_C(X_C)$$称为<mark style="color:purple;">**势函数**</mark>。一般来说，势函数既不是条件概率也不是边际概率，这里一般要求势函数是严格正的，因此一般定义为指数函数：
 33 | $$
 34 | \Psi_c(X_c) = \exp\{-H_c(X_c)\}
 35 | $$
 36 | 
 37 | 
 38 | 
 39 | 
 40 | ### 二、表示
 41 | 
 42 | ![](../.gitbook/assets/11.3.1.png)
 43 | 
 44 | 同样的，通过利用局部参数去表示联合概率，大大的缩小了参数的量
 45 | 
 46 | 
 47 | 
 48 | ### 三、条件独立
 49 | 
 50 | 相较于有向图，无向图的条件独立较为简单：
 51 | 
 52 | ![](../.gitbook/assets/11.3.2.png)
 53 | 
 54 | 对于一个无向图，一个节点所有的邻居节点，构成该节点的<mark style="color:purple;">**马尔科夫包裹**</mark>。
 55 | 
 56 | 
 57 | 
 58 | {% hint style="warning" %}
 59 | 
 60 | 只要给定任一节点的邻居，则该节点和其余节点独立。
 61 | 
 62 | {% endhint %}
 63 | 
 64 | 
 65 | 
 66 | 
 67 | 
 68 | ## 11.3.2 小结
 69 | 
 70 | ### 定义一族概率分布的两种方式
 71 | 
 72 | - 通过枚举所有图上极大团的势函数的可能选择
 73 | 
 74 | $$
 75 | P(X)=\frac1Z\prod_{C\in G}\varPhi_c(X_c)
 76 | $$
 77 | 
 78 | 
 79 | 
 80 | - 通过声明图上的所有条件独立断言
 81 | 
 82 | $$
 83 | P(X_i\mid X_{G\text{\\}i}) = P(X_i\mid X_{N_i})
 84 | $$
 85 | 
 86 | 
 87 | 
 88 | ### Hammersley-Clifford 定理
 89 | 
 90 | {% hint style="success" %}
 91 | 
 92 | **Hammersley-Clifford 定理**：
 93 | 
 94 | 如果分布是严格正的并且满足<mark style="color:purple;">**局部马尔科夫性质**</mark>，那么它对应的<mark style="color:purple;">**无向图**</mark>G可以因子分解为定义在团上的<mark style="color:orange;">**正函数的乘积**</mark>，这些团覆盖了G的所有顶点和边
 95 | 
 96 | 
 97 | 
 98 | 对应于图$$G=(V,E)$$的一个分布具有**局部马尔科夫性**：
 99 | 
100 | 给定任意一节点的邻居，该节点和其余节点<mark style="color:orange;">**条件独立**</mark>
101 | 
102 | {% endhint %}
103 | 
104 | 
105 | 
106 | 基于此定理，可知上述两种方式是<mark style="color:red;">**等价的**</mark>
107 | 


--------------------------------------------------------------------------------
/di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.4-xue-xi-he-tui-duan.md:
--------------------------------------------------------------------------------
 1 | # 11.4 学习和推断
 2 | 
 3 | 我们已经使用概率图M描述了唯一的概率分布P，接下来，有两个典型任务：
 4 | 
 5 | 
 6 | 
 7 | - 我们如何回答关于$$\color{blue}P_M$$的查询，例如$$P_M(X\mid Y)$$？
 8 |   - 我们用<mark style="color:red;">**推断**</mark>来表示计算上述问题答案的过程
 9 | - 我们如何基于数据D估计<mark style="color:blue;">**合理的模型M**</mark>？
10 |   - 我们用<mark style="color:red;">**学习**</mark>来命名获得M的点估计过程
11 |   - 对于<mark style="color:purple;">**贝叶斯学派**</mark>，寻找$$P(M\mid D)$$实际上是一个<mark style="color:red;">**推断**</mark>过程
12 |   - 当不是所有变量都是可观察时，即使是计算M的点估计，也需要使用<mark style="color:red;">**推断**</mark>处理隐含变量
13 | 
14 | 
15 | 
16 | ## 11.4.1 推断
17 | 
18 | ### 一、可能性推断
19 | 
20 | - **给定因求果**：求<mark style="color:purple;">**边际概率**</mark>：
21 | 
22 | $$
23 | p(y) = \sum_xp(y\mid x)p(x)
24 | $$
25 | 
26 | 
27 | 
28 | - **已知果推因**：求<mark style="color:purple;">**后验概率**</mark>：
29 | 
30 | $$
31 | p(x\mid y)=\frac{p(y\mid x)p(x)}{p(y)}
32 | $$
33 | 
34 | 
35 | 
36 | 
37 | 
38 | ### 二、一般的推断方法
39 | 
40 | - **精确推断**：计算代价高
41 |   - 变量消去
42 |   - 信念传播
43 | - **近似推断**：计算代价较低
44 |   - 采样
45 |   - 变分推断
46 | 
47 | 
48 | 
49 | ## 11.4.2 变量消去法
50 | 
51 | 


--------------------------------------------------------------------------------
/di-shi-yi-zhang-gai-shuai-tu-mo-xing/11.5-dian-xing-gai-shuai-tu-mo-xing.md:
--------------------------------------------------------------------------------
 1 | # 11.5 典型概率图模型
 2 | 
 3 | ## 11.5.1 隐马尔科夫模型（HMM）
 4 | 
 5 | ![](../.gitbook/assets/11.4.1.png)
 6 | 
 7 | ### 一、HMM的结构
 8 | 
 9 | ![](../.gitbook/assets/11.4.2.png)
10 | 
11 | - <mark style="color:purple;">**状态节点**</mark>：顶层节点表示隐含变量 $$y_t$$
12 | - <mark style="color:purple;">**输出节点**</mark>：底层节点表示观测变量$$x_t$$
13 | 
14 | 
15 | 
16 | 这里定义$$x_t^j$$表示观测变量在t时刻取j的概率，隐含变量同理
17 | 
18 | 
19 | 
20 | ### 二、HMM的表示
21 | 
22 | 假设隐含变量$$y_t$$的取值范围为状态空间$$\{s_1,\dots,s_N\}$$，观测变量$$x_t$$的取值范围为$$\{o_1,\dots,o_M\}$$，则有：
23 | 
24 | 
25 | 
26 | - <mark style="color:green;">**初始状态分布**</mark>：隐含变量的初始概率分布
27 | 
28 | $$
29 | \boldsymbol \pi =(\pi_1,\dots,\pi_N),\quad\pi_i=P(y_1^i=1)
30 | $$
31 | 
32 | 
33 | 
34 | - <mark style="color:blue;">**状态转移矩阵**</mark>：大小为$$N^2$$
35 | 
36 | $$
37 | \boldsymbol A = \begin{pmatrix}
38 | a_{11} & \cdots & a_{1j} & \cdots &a_{1N}\\
39 | \vdots & \ddots & \vdots & &\vdots\\
40 | a_{i1} & \cdots & a_{ij} &\cdots & a_{iN}\\
41 | \vdots & & \vdots & \ddots &\vdots\\
42 | a_{N1} & \cdots & a_{Nj} & \cdots &a_{NN}
43 | \end{pmatrix}
44 | $$
45 | 
46 | 其中
47 | $$
48 | a_{ij} = P(y_{t+1}^j\mid y_t^i=1),\quad 1\leq i\leq N,1\leq j\leq N
49 | $$
50 | 表示t+1时刻从状态i变为状态j的概率
51 | 
52 | 
53 | 
54 | - <mark style="color:red;">**发射概率矩阵**</mark>：大小为$$N\times M$$
55 | 
56 | $$
57 | \boldsymbol B = \begin{pmatrix}
58 | b_{11} & \cdots & b_{1j} & \cdots & b_{1M}\\
59 | \vdots & \ddots & \vdots & &\vdots\\
60 | b_{i1} & \cdots & b_{ij} &\cdots & b_{iM}\\
61 | \vdots & & \vdots & \ddots &\vdots\\
62 | b_{N1} & \cdots & b_{Nj} & \cdots & b_{NM}
63 | \end{pmatrix}
64 | $$
65 | 
66 | 其中
67 | $$
68 | b_{ij}=P(x_t^j=1\mid y_t^i=1),\quad 1\leq i\leq N,1\leq j\leq M
69 | $$
70 | 表示若t时刻隐含变量处于i状态，观测到变量为j状态的概率
71 | 
72 | 
73 | 
74 | 因此，对于$$(\boldsymbol{x,y})=(x_0,x_1,\dots,x_T,y_0,y_1\dots,y_T)$$的<mark style="color:orange;">**联合概率**</mark>，可以表示为：
75 | $$
76 | \color{orange} p(x,y) = \color{green}p(y_1)\color{blue}\prod_{t=1}^{T-1}p(y_{t+1}\mid y_t)\color{red}\prod_{t=1}^Tp(x_t\mid y_t)
77 | $$
78 | 


--------------------------------------------------------------------------------
/di-shi-yi-zhang-gai-shuai-tu-mo-xing/fu-di-shi-yi-zhang-zuo-ye.md:
--------------------------------------------------------------------------------
  1 | # 附 第十一章作业
  2 | 
  3 | # 作业1
  4 | 
  5 | ## 题目
  6 | 
  7 | 假设我们要采用HMM实现一个英文的词性标注系统，系统中共有20种词性，则状态转移矩阵B的大小为（）
  8 | 
  9 | - A、20
 10 | - B、40
 11 | - C、400
 12 | 
 13 | 
 14 | 
 15 | ## 答
 16 | 
 17 | **C**：状态转移矩阵代表着从每一种状态在下一时刻向另一状态转移的概率，因此大小为$$20^2=400$$
 18 | 
 19 | 
 20 | 
 21 | # 作业2
 22 | 
 23 | ## 题目
 24 | 
 25 | 已知以下贝叶斯网络,包含 7 个变量，即 Season (季节)，Flu (流感)，Dehydration (脱水)，Chills (发 冷)，Headache (头疼)，Nausea (恶心)，Dizziness (头晕)，则下列（条件）独立成立的是（）
 26 | 
 27 | ![](../.gitbook/assets/HMM1.png)
 28 | 
 29 | - A、Season$$\perp$$Chills $$\mid$$ Flu
 30 | - B、Season$$\perp$$Chills
 31 | - C、Season$$\perp$$Headache $$\mid$$ Flu
 32 | 
 33 | 
 34 | 
 35 | ## 答
 36 | 
 37 | 若贝叶斯球不能从A到B，则有条件独立。
 38 | 
 39 | - A
 40 |   - 已知Flu，它会截止来自子节点的贝叶斯球，从Chills发出的球无法到达Season
 41 |   - Dehydration未知，来自Season的贝叶斯球可以通过它到达Headache
 42 |     - 已知Flu，它会截止来自子节点的贝叶斯球，从Headache出发的贝叶斯球无法到达Chills
 43 | - B
 44 |   - Flu未知，能够使贝叶斯球通过，Season和Chills之间可以到达
 45 | - C
 46 |   - Flu已知，根据A的推导，贝叶斯球可以从Season到达Headache
 47 | 
 48 | 
 49 | 
 50 | 因此**A**条件独立
 51 | 
 52 | 
 53 | 
 54 | # 作业3
 55 | 
 56 | ## 题目
 57 | 
 58 | 已知以下贝叶斯网络，包含4个二值变量，则该网络一共有（）个参数
 59 | 
 60 | ![](../.gitbook/assets/HMM2.png)
 61 | 
 62 | - A、4
 63 | - B、8
 64 | - C、9
 65 | - D、16
 66 | 
 67 | 
 68 | 
 69 | ## 答
 70 | 
 71 | **C**：1+2+4+2=9，具体如下：
 72 | 
 73 | ![](../.gitbook/assets/HMM参数.png)
 74 | 
 75 | 
 76 | 
 77 | 
 78 | 
 79 | ## 作业4
 80 | 
 81 | ### 题目
 82 | 
 83 | 假设你有三个盒子，每个盒子里都有一定数量的苹果和桔子。每次随机选择一个盒子，然后从盒子里选一个水果，并记录你的发现（a代表苹果，0代表橘子）。不幸的是，你忘了写下你所选的盒子，只是简单地记下了苹果和桔子。假设每个盒子中水果数量如下：
 84 | 
 85 | - 盒子一：2个苹果，2个橘子
 86 | - 盒子二：3个苹果，1个橘子
 87 | - 盒子三：1个苹果，3个橘子
 88 | 
 89 | （1）请给出HMM模型
 90 | 
 91 | （2）请给出水果序列$$x=(a,a,o,o,o)$$对应的最佳盒子序列
 92 | 
 93 | 
 94 | 
 95 | ### 答
 96 | 
 97 | （1）
 98 | 
 99 | 初始状态矩阵
100 | $$
101 | \boldsymbol\pi = (\frac13,\frac13,\frac13)
102 | $$
103 | 状态转移矩阵
104 | $$
105 | \boldsymbol A=\begin{pmatrix}
106 | \frac13 & \frac13 & \frac13\\
107 | \frac13 & \frac13 & \frac13\\
108 | \frac13 & \frac13 & \frac13
109 | \end{pmatrix}
110 | $$
111 | 发射概率矩阵
112 | $$
113 | \boldsymbol B = \begin{pmatrix}
114 | \frac12 & \frac12\\
115 | \frac34 & \frac14\\
116 | \frac14 & \frac34
117 | \end{pmatrix}
118 | $$
119 | 
120 | 
121 | （2）使用Viterbi解码
122 | 
123 | 初始化：
124 | $$
125 | V_1^k=p(x_1,y_1=s_k)=\pi_kb_{y_1,x_1}
126 | $$
127 | 则本题中，初始化为如下状态：
128 | 
129 | > [1/6 1/4 1/12]
130 | 
131 | 接下来，开始迭代运算：
132 | $$
133 | V_t^k=p(x_t\mid y_t^k=1)\ \max_i(a_{i,k}V_{t-1}^i)
134 | $$
135 | 最后回溯每次的最优解得到答案。
136 | 
137 | 使用numpy求解这部分运算：
138 | 
139 | ```python
140 | import numpy as np
141 | 
142 | from fractions import Fraction
143 | 
144 | np.set_printoptions(formatter={'all': lambda x: str(Fraction(x).limit_denominator())})
145 | 
146 | states = ['box1', 'box2', 'box3']
147 | observables = ['a', 'o']
148 | 
149 | pi = np.array([1 / 3, 1 / 3, 1 / 3])
150 | A = np.array([[1 / 3, 1 / 3, 1 / 3],
151 |               [1 / 3, 1 / 3, 1 / 3],
152 |               [1 / 3, 1 / 3, 1 / 3]])
153 | B = np.array([[1 / 2, 1 / 2],
154 |               [3 / 4, 1 / 4],
155 |               [1 / 4, 3 / 4]])
156 | obs_sequence = ['a', 'a', 'o', 'o', 'o']
157 | obs_idx = [observables.index(x) for x in obs_sequence]
158 | 
159 | 
160 | def viterbi(_obs_sequence, _pi, _A, _B):
161 |     T = len(_obs_sequence)
162 |     N = len(states)
163 |     V = np.zeros((T, N))
164 |     path = np.zeros((T, N), dtype=int)
165 |     for t in range(T):
166 |         if t == 0:
167 |             V[t] = _pi * _B[:, obs_idx[t]]
168 | 
169 |             print('----------init-----------')
170 |             print(V[0])
171 |         else:
172 |             print(f'--------iter {t}----------')
173 |             for j in range(N):
174 |                 V[t, j] = np.max(V[t - 1] * A[:, j]) * B[j, obs_idx[t]]
175 |                 path[t, j] = np.argmax(V[t - 1] * A[:, j])
176 |             print(V[t])
177 | 
178 |     print('----------end-----------')
179 |     print(V.T)
180 |     print('------------------------')
181 |     print(path.T)
182 | 
183 |     print('----------回溯-----------')
184 |     y = np.zeros(T, dtype=int)
185 |     y[-1] = np.argmax(V[-1])
186 | 
187 |     for t in range(T - 1, 0, -1):
188 |         y[t - 1] = path[t, y[t]]
189 | 
190 |     print(y)
191 | 
192 | 
193 | viterbi(obs_sequence, pi, A, B)
194 | ```
195 | 
196 | 输出如下：
197 | 
198 | ```bash
199 | ----------init-----------
200 | [1/6 1/4 1/12]
201 | --------iter 1----------
202 | [1/24 1/16 1/48]
203 | --------iter 2----------
204 | [1/96 1/192 1/64]
205 | --------iter 3----------
206 | [1/384 1/768 1/256]
207 | --------iter 4----------
208 | [1/1536 1/3072 1/1024]
209 | ----------end-----------
210 | [[1/6 1/24 1/96 1/384 1/1536]
211 |  [1/4 1/16 1/192 1/768 1/3072]
212 |  [1/12 1/48 1/64 1/256 1/1024]]
213 | ------------------------
214 | [[0 1 1 2 2]
215 |  [0 1 1 2 2]
216 |  [0 1 1 2 2]]
217 | ----------回溯-----------
218 | [1 1 2 2 2]
219 | ```
220 | 
221 | 


--------------------------------------------------------------------------------
/di-shi-zhang-ban-jian-du-xue-xi/10.1-ji-ben-gai-nian.md:
--------------------------------------------------------------------------------
 1 | # 10.1 基本概念
 2 | 
 3 | ## 10.1.1 常见的学习方式
 4 | 
 5 | ![](../.gitbook/assets/10.1.png)
 6 | 
 7 | ### 一、归纳学习与直推式学习
 8 | 
 9 | - <mark style="color:purple;">**归纳学习（Inductive learning）**</mark>：能够处理全新的数据
10 |   - 给定训练数据集$$D=\{(\boldsymbol x_1, y_1),\dots,(\boldsymbol x_L,yL)\}$$，<mark style="color:orange;">**无标注数据**</mark>$$D_U=\{\boldsymbol x_{L+1},\dots\boldsymbol x_{L+U}\}$$（$$U\gg L$$）
11 |   - 学习一个函数$$f$$用于预测<mark style="color:orange;">**新来的测试数据**</mark>的标签
12 | - <mark style="color:purple;">**直推式学习（Transductive learning）**</mark>：只能处理见过的数据，对于新的数据需要重新训练模型
13 |   - 给定训练数据集$$D=\{(\boldsymbol x_1, y_1),\dots,(\boldsymbol x_L,yL)\}$$，<mark style="color:orange;">**无标注数据**</mark>$$D_U=\{\boldsymbol x_{L+1},\dots\boldsymbol x_{L+U}\}$$（$$U\gg L$$）
14 |   - 可以没有显式的学习函数，关心的是在$$D_U$$上的预测
15 | 
16 | 
17 | 
18 | ### 二、半监督学习
19 | 
20 | - **通用想法**：同时利用有标签数据和无标记数据进行训练
21 | - **半监督分类/回归**
22 |   - 给定训练数据集$$D=\{(\boldsymbol x_1, y_1),\dots,(\boldsymbol x_L,yL)\}$$，无标注数据$$D_U=\{\boldsymbol x_{L+1},\dots\boldsymbol x_{L+U}\}$$（$$U\gg L$$）
23 |   - **目标**：学习一个分类器$$f$$，比只用标记数据训练效果更好
24 | - **半监督聚类/降维**
25 |   - 给定标注数据$$\{\boldsymbol x_i\}_{i=1}^N$$
26 |   - **目的**：聚类或降维
27 |   - **限制**
28 |     - 两个点必须在一个簇，或两个点一定不在一个簇
29 |     - 两个点降维后必须接近
30 | 
31 | 
32 | 
33 | ## 10.1.2 几种假设
34 | 
35 | ### 一、平滑假设
36 | 
37 | **半监督学习的平滑假设**：如果<mark style="color:orange;">**高密度**</mark>空间中两个点$$\boldsymbol x_1,\boldsymbol x_2$$距离较近，则对应的输出$$y_1,y_2$$也应该接近
38 | 
39 | **监督学习的平滑假设**：如果空间中两个点$$\boldsymbol x_1,\boldsymbol x_2$$距离较近，那么对应到输出$$y_1,y_2$$也应该接近
40 | 
41 | 
42 | 
43 | 


--------------------------------------------------------------------------------
/di-shi-zhang-ban-jian-du-xue-xi/10.2-ban-jian-du-xue-xi-suan-fa.md:
--------------------------------------------------------------------------------
1 | # 10.2 半监督学习算法
2 | 
3 | 


--------------------------------------------------------------------------------
/di-si-zhang-te-zheng-xuan-ze-he-ti-qu/4.1-mo-shi-lei-bie-ke-fen-xing-de-ce-du.md:
--------------------------------------------------------------------------------
  1 | # 4.1 模式类别可分性的测度
  2 | 
  3 | ## 4.1.1 概述
  4 | 
  5 | {% hint style="warning" %}
  6 | 
  7 | 特征选择和提取是模式识别中的一个关键问题
  8 | 
  9 | {% endhint %}
 10 | 
 11 | 
 12 | 
 13 | - 如果将数目很多的测量值不做分析，全部直接用作分类特征，不但耗时，而且会影响到分类的效果，产生<mark style="color:purple;">**特征维数灾难**</mark>问题
 14 | - 为了设计出效果好的分类器，通常需要对原始的测量值集合进行分析，经过选择或变换处理，组成有效的识别特征
 15 | - 在保证一定分类精度的前提下，减少特征维数，即进行<mark style="color:orange;">**降维**</mark>处理，使分类器实现快速、准确和高效的分类
 16 | - 为达到上述目的，关键是所提供的识别特征应具有很好的可分性，使分类器容易判别。为此，需对特征进行选择
 17 |   - 去掉模棱两可、不易判别的特征
 18 |   - 所提供的特征不要重复，即去掉那些相关性强且没有增加更多分类信息的特征
 19 | 
 20 | 
 21 | 
 22 | 
 23 | 
 24 | ## 4.1.2 特征选择和提取
 25 | 
 26 | **特征选择**：从$$n$$个度量值集合$$\left\{x_1,x_2,\dots,x_n\right\}$$中，按照某一准则<mark style="color:orange;">**选取**</mark>出供分类的<mark style="color:purple;">**子集**</mark>，作为降维的分类特征
 27 | 
 28 | 
 29 | 
 30 | **特征提取**：使$$\{x_1,x_2,\dots,x_n\}$$通过某种<mark style="color:orange;">**变换**</mark>，产生$$m$$个特征作为新的分类特征（也称为<mark style="color:purple;">**二次特征**</mark>）
 31 | 
 32 | 
 33 | 
 34 | 上述两种方法的目的都是为了在尽可能保留识别信息的前提下，降低特征空间的维数，已达到有效的分类。
 35 | 
 36 | 
 37 | 
 38 | ## 4.1.3 模式类别可分性的测度
 39 | 
 40 | ### 一、点到点之间的距离
 41 | 
 42 | 在n维空间中，两点a、b之间的欧式距离为：
 43 | $$
 44 | D(a,b)= \Vert a-b\Vert
 45 | $$
 46 | 写成距离平方的形式：
 47 | $$
 48 | \begin{align}
 49 | D^2(a,b)&=(a-b)^T(a-b) \nonumber
 50 | \\
 51 | &=\sum_{k=1}^n(a_k-b_k)^2 \nonumber
 52 | \end{align}
 53 | $$
 54 | 其中，$$a_k$$、$$b_k$$为向量$$\boldsymbol{a},\boldsymbol{b}$$的第k个分量
 55 | 
 56 | 
 57 | 
 58 | ### 二、点到点集之间的距离
 59 | 
 60 | 在n维空间中，点$$x$$到点$$a^{(i)}$$之间的距离平方为：
 61 | $$
 62 | D^2(x,a^{(i)})=\sum_{k-=1}^n(x_k-a_k^{(i)})^2
 63 | $$
 64 | 带入得点$$x$$到点集$$\{a^{(i)}\}_{i=1,2,\dots,k}$$之间的<mark style="color:orange;">**均方距离**</mark>为：
 65 | $$
 66 | \begin{align}
 67 | \overline{D^2(x,a^{(i)})} &= \frac{1}{K}\sum_{i=1}^KD^2(x,a^{(i)}) \nonumber
 68 | \\
 69 | &= \frac{1}{K}\sum_{i=1}^K\left\{\sum_{k-=1}^n(x_k-a_k^{(i)})^2\right\}
 70 | \end{align}
 71 | $$
 72 | 
 73 | 
 74 | ### 三、类内距离
 75 | 
 76 | n维空间中同一类内各模式样本点集$$\{a^{(i)}\}_{i=1,2,\dots,K}$$，其内部各点的<mark style="color:purple;">**均方距离**</mark>为：
 77 | $$
 78 | \overline{D^2(\{a^{(j)}\}, \{a^{(i)}\})} = \frac{1}{K}\sum_{j=1}^K\left[\frac{1}{K-1}\sum_{\substack{i=1\\i\neq j}}^K\sum_{k=1}^n(a_k^{(j)}-a_k^{(i)})^2\right]
 79 | $$
 80 | 此外，可证明：
 81 | $$
 82 | \overline{D^2}=2\sum_{k=1}^n\sigma_k^2
 83 | $$
 84 | 其中，$$\sigma_k^2$$为$$\{a^{(i)}\}$$在第k个份量上的<mark style="color:purple;">**无偏方差**</mark>：
 85 | $$
 86 | \sigma_k^2=\frac{1}{K-1}\sum_{i=1}^K(a_k^{(i)}-\overline{a_k})^2
 87 | $$
 88 | 其中，$$\overline{a_k}$$为$$a^{(i)}$$在第k个分量上的<mark style="color:purple;">**均值**</mark>：
 89 | $$
 90 | \overline{a_k} = \frac{1}{K}\sum_{i=1}^Ka_k^{(i)}
 91 | $$
 92 | 证明略
 93 | 
 94 | 
 95 | 
 96 | ### 四、类内散布矩阵
 97 | 
 98 | 一类内各模式样本点集$$\{a^{(i)}\}_{i=1,2,\dots,K}$$，其<mark style="color:orange;">**类内散布矩阵**</mark>为：
 99 | $$
100 | S=\sum_{i=1}^K\{(a^{(i)}-m)(a^{(i)}-m)^T\}
101 | $$
102 | 其中
103 | $$
104 | m=\frac{1}{K}\sum_{i=1}^Ka^{(i)}
105 | $$
106 | {% hint style="success" %}
107 | 
108 | 类内散布矩阵表示各样本点围绕其均值周围的散布情况
109 | 
110 | {% endhint %}
111 | 
112 | 
113 | 
114 | ### 五、类间距离和类间散布矩阵
115 | 
116 | 两个点集的距离$$\overline{D^2(\{a^{(i)}\}, \{b^{(j)}\})}_{i=1,2,\dots,K_a;j=1,2,\dots,K_b}$$对类别的可分性起着重要的作用，为简化起见，常用两类样本各自质心间的距离作为<mark style="color:orange;">**类间距离**</mark>，并假设两类样本出现的概率相等，则：
117 | $$
118 | D^2=\sum_{k=1}^n(\boldsymbol{m}_{1_k}-\boldsymbol{m}_{2_k})^2
119 | $$
120 | 其中，$$\boldsymbol{m}_1$$和$$\boldsymbol{m}_2$$为两类模式样本集各自的<mark style="color:purple;">**均值向量**</mark>，$$\boldsymbol{m}_{1_k}$$和$$\boldsymbol{m}_{2_k}$$为各自的第k个<mark style="color:purple;">**分量**</mark>，n为<mark style="color:purple;">**维数**</mark>
121 | 
122 | 
123 | 
124 | 这两个模式的<mark style="color:purple;">**类间散布矩阵**</mark>为：
125 | $$
126 | S_{b2}=(m_1-m_2)(m_1-m_2)^T
127 | $$
128 | 
129 | 
130 | 扩展到三个以上的类别，<mark style="color:orange;">**类间散布矩阵**</mark>可以写作：
131 | $$
132 | S_b = \sum_{i=1}^cP(\omega_i)(m_i-m_0)(m_i-m_0)^T
133 | $$
134 | 其中，$$m_0$$为多类模式分布的<mark style="color:purple;">**总体均值向量**</mark>，c为类别数量：
135 | $$
136 | m_0=E\{x\}=\sum_{i=1}^cp(\omega_i)m_i,\ \forall\omega_i,i=1,2,\dots,c
137 | $$
138 | 
139 | 
140 | ### 六、多类模式集散布矩阵
141 | 
142 | <mark style="color:red;">**多类情况**</mark>的<mark style="color:orange;">**类内散布矩阵**</mark>，可以写成各类的类内散布矩阵的<mark style="color:purple;">**先验概率的加权和**</mark>：
143 | $$
144 | \begin{align}
145 | S_w &=\sum_{i=1}^cP(\omega_1)E\{(x-m_i)(x-m_i)^T\vert\omega_i\} \nonumber
146 | \\
147 | &=\sum_{i=1}^cP(\omega_i)C_i \nonumber
148 | \end{align}
149 | $$
150 | 其中，$$C_i$$是第i类的<mark style="color:purple;">**协方差矩阵**</mark>
151 | 
152 | 
153 | 
154 | 有时，使用多类模式<mark style="color:purple;">**总体分布的散布矩阵**</mark>来反映其可分性，即：
155 | $$
156 | S_t = E\{(x-m_0)(x-m_0)^T\},\ \ x\in\forall,i=1,2,\dots,c
157 | $$
158 | 其中$$\boldsymbol{m}_0$$为多类模式分布的<mark style="color:purple;">**总体均值向量**</mark>
159 | 
160 | 
161 | 
162 | ### 七、关系
163 | 
164 | {% hint style="success" %}
165 | 
166 | 总体散布矩阵是各类类内散布矩阵与类间散布矩阵之和
167 | 
168 | {% endhint %}
169 | 
170 | $$
171 | S_t = S_w+S_b
172 | $$
173 | 


--------------------------------------------------------------------------------
/di-si-zhang-te-zheng-xuan-ze-he-ti-qu/4.2-te-zheng-xuan-ze.md:
--------------------------------------------------------------------------------
  1 | # 4.2 特征选择
  2 | 
  3 | ## 4.2.1 概念
  4 | 
  5 | 设有n个可用作分类的测量值，为了在尽量不降低分类精度的前提下，减小特征空间的维数以减少计算量，需从中直接选出m个作为分类的特征。
  6 | 
  7 | 
  8 | 
  9 | **那么，怎么选呢？**
 10 | 
 11 | 
 12 | 
 13 | 要从n个特征值中选出m个，共有$$C_n^m=\dfrac{n!}{m!(n-m)!}$$种选法，使用穷举法对每种选法进行测试耗时过大，因此需要寻找一种简便的可分性准则，间接判断每种子集的优劣
 14 | 
 15 | 
 16 | 
 17 | ## 4.2.2 类间可分性准则
 18 | 
 19 | {% hint style="success" %}
 20 | 
 21 | - 对于不同类别模式之间，<mark style="color:purple;">**均值向量**</mark>间的距离应该尽可能的大
 22 | - 对于同一类的模式特征，<mark style="color:purple;">**方差**</mark>之和应该尽可能的小
 23 | 
 24 | {% endhint %}
 25 | 
 26 | 
 27 | 
 28 | 假设各原始特征测量值是统计独立的，此时，只需对训练样本的n个测量值独立地进行分析，从中选出m个最好的作为分类特征即可。
 29 | 
 30 | 
 31 | 
 32 | {% hint style="info" %}
 33 | 
 34 | 例：对于$$\omega_i$$和$$\omega_j$$两类训练样本，设其均值向量为$$\boldsymbol{m}_i$$和$$\boldsymbol{m}_j$$，其在k维度方向上的分量为$$m_{ik}$$、$$m_{jk}$$，方差为$$\sigma_{ik}^2$$和$$\sigma_{jk}^2$$
 35 | 
 36 | 
 37 | 
 38 | 则定义可分性准则函数：
 39 | $$
 40 | G_K=\frac{(m_{ik}-m_{jk})^2}{\sigma_{ik}^2+\sigma_{jk}^2},\ k=1,2,\dots,n
 41 | $$
 42 | 若$$G_k$$越大，代表测度值的第k个分量对分离两类越有效。将$$G_K,\ k=1,2,\dots,n$$按照大小分类，选出最大的m个对应的测度值既可作为分类特征。
 43 | 
 44 | {% endhint %}
 45 | 
 46 | 
 47 | 
 48 | 
 49 | 
 50 | ## 4.2.3 可分性准则的适用范围
 51 | 
 52 | ![](../.gitbook/assets/4.2.1.png)
 53 | 
 54 | - 对于（a）中的特征$$x_k$$，其分布有着很好的可分性，通过它可以分离两种类别
 55 | - 对于（b）中的特征$$x_k$$，其分布存在很大的重叠，单靠$$x_k$$不足以打到较好的分类，需要添加其他特征
 56 | 
 57 | ![](../.gitbook/assets/4.2.2.png)
 58 | 
 59 | - 对于（c）中$$\omega_i$$的特征$$x_k$$，它的分布有两个最大值，虽然与$$\omega_j$$不存在重叠，但是由于计算出来$$G_k$$约等于0，因此它作为可分性准则已经不再合适
 60 | 
 61 | 
 62 | 
 63 | {% hint style="success" %}
 64 | 
 65 | **总结**：假若类概率密度函数不是或不近似正态分布，均值和方差就不足以用来估计类别的可分性，此时该准则函数不完全适用
 66 | 
 67 | {% endhint %}
 68 | 
 69 | 
 70 | 
 71 | ## 4.2.4 一般特征的散布矩阵准则
 72 | 
 73 | <mark style="color:purple;">**类内离散度矩阵**</mark>：
 74 | $$
 75 | S_w=\sum_{i=1}^cP(\omega_i)E\{(x-m_i)(x-m_i)^T\vert\omega_i\}
 76 | $$
 77 | <mark style="color:purple;">**类间离散度矩阵**</mark>：
 78 | $$
 79 | S_b = \sum_{i=1}^cP(\omega_i)(m_i-m_0)(m_i-m_0)^T
 80 | $$
 81 | 由上可以推出散布矩阵准则采用以下两种形式：
 82 | 
 83 | - 行列式形式
 84 | 
 85 | $$
 86 | J_1=\det(S_w^{-1}S_b)=\prod_{i}\lambda_i
 87 | $$
 88 | 
 89 | - 迹形式
 90 | 
 91 | $$
 92 | J_2=\text{tr}(S_w^{-1}S_b)=\sum_{i}\lambda_i
 93 | $$
 94 | 
 95 | 其中，$$\lambda_i$$是矩阵$$S_w^{-1}S_b$$的<mark style="color:purple;">**特征值**</mark>，使得$$J_1$$和$$J_2$$最大的子集可以作为可选择的分类特征。
 96 | 
 97 | 
 98 | 
 99 | {% hint style="warning" %}
100 | 
101 | 这里计算的散布矩阵不受模式分布形式的限制，但需要有足够数量的模式样本才能获得有效的结果
102 | 
103 | {% endhint %}
104 | 
105 | 


--------------------------------------------------------------------------------
/di-si-zhang-te-zheng-xuan-ze-he-ti-qu/4.3-li-san-kl-bian-huan.md:
--------------------------------------------------------------------------------
  1 | # 4.3 离散K-L变换
  2 | 
  3 | - 前面讨论的特征选择是在一定准则下，从n个特征中选出k个来反映原有模式
  4 | - 这种简单删掉某n-k个特征的做法并不十分理想，因为一般来说，原来的n个数据各自在不同程度上反映了识别对象的某些特征，简单地删去某些特征可能会丢失较多的有用信息
  5 | - 如果将原来的特征做正交变换，获得的每个数据都是原来n个数据的线性组合，然后从新的数据中选出少数几个，使其尽可能多地反映各类模式之间的差异，而这些特征间又尽可能相互独立，则比单纯的选择方法更灵活、更有效
  6 | 
  7 | 
  8 | 
  9 | {% hint style="success" %}
 10 | 
 11 | K-L变换（Karhunen-Loeve变换）就是一种适用于任意概率密度函数的<mark style="color:orange;">**正交变换**</mark>
 12 | 
 13 | {% endhint %}
 14 | 
 15 | 
 16 | 
 17 | ## 4.3.1 离散的有限K-L展开
 18 | 
 19 | ### 一、展开式的推导
 20 | 
 21 | 设有一连续的随机实函数$$x(t)$$，$$T_1\leq t\leq T_2$$，则$$x(t)$$可用已知的正交函数集$$\{\varphi_j(t),\ j=1,2,\dots\}$$的线性组合展开，即：
 22 | 
 23 | 
 24 | $$
 25 | \begin{align}
 26 | x(t) &= a_1\varphi_1(t) + a_2\varphi_2(t) + \cdots+a_j\varphi_j(t) + \cdots \nonumber
 27 | \\
 28 | &= \sum_{j=1}^\infty a_j\varphi_j(t),\ \ T_1\leq t\leq T_2 \nonumber
 29 | \end{align}
 30 | $$
 31 | 
 32 | 
 33 | 其中$$a_j$$为展开式的<mark style="color:purple;">**随即系数**</mark>，$$\varphi_j(t)$$为一<mark style="color:purple;">**连续的正交函数**</mark>，满足：
 34 | $$
 35 | \int_{T_1}^{T_2}\varphi_n(t)\tilde{\varphi}_m(t)dt=\begin{cases}
 36 | 1 & m=n
 37 | \\
 38 | 0 & m\neq n
 39 | \end{cases}
 40 | $$
 41 | 其中，$$\tilde{\varphi}_m(t)$$为$$\varphi_m(t)$$的<mark style="color:purple;">**共轭复数**</mark>形式
 42 | 
 43 | 
 44 | 
 45 | 将上式写成离散的正交函数形式，使得连续随机函数$$x(t)$$和连续正交函数$$\varphi_j(t)$$在区间$$T_1\leq t \leq T_2$$内被<mark style="color:orange;">**等间隔采样**</mark>为n个离散点，即：
 46 | $$
 47 | x(t) \to \{x(1),x(2),\cdots,x(n)\}
 48 | \\
 49 | \varphi_j(t) \to \{\varphi_j(1),\varphi_j(2),\cdots,\varphi_j(n)\}
 50 | $$
 51 | 写成向量形式，则有：
 52 | $$
 53 | x = (x(1),x(2), \cdots, x(n))^T
 54 | \\
 55 | \varphi_j=(\varphi_j(1),\varphi_j(2),\cdots,\varphi_j(n))^T
 56 | $$
 57 | 
 58 | 
 59 | 则可以对公式$$(1)$$取n项近似，并写成离散展开形式：
 60 | $$
 61 | x=\sum_{j=1}^na_j\varphi_j = \varphi a,\ \ T_1\leq t\leq T_2
 62 | $$
 63 | 其中，$$a$$为展开式中的随即系数的向量形式，$$\varphi$$为一$$n\times n$$矩阵，即：
 64 | $$
 65 | a=(a_1,a_2,\dots,a_j,\dots,a_n)^T
 66 | \\
 67 | \\
 68 | \varphi = 
 69 | \begin{bmatrix}
 70 | \varphi_1(1) & \varphi_2(1) & \cdots & \varphi_n(1)
 71 | \\
 72 | \varphi_1(2) & \varphi_2(2) & \cdots & \varphi_n(2)
 73 | \\
 74 | \vdots & \vdots & \ddots & \vdots
 75 | \\
 76 | \varphi_1(n) & \varphi_2(n) & \cdots & \varphi_n(n)
 77 | \end{bmatrix}
 78 | $$
 79 | 可以看出，$$\varphi$$本质上是一个<mark style="color:orange;">**正交变换矩阵**</mark>，它将$$x$$变换成$$a$$。
 80 | 
 81 | 
 82 | 
 83 | ### 二、K-L展开式的性质
 84 | 
 85 | {% hint style="success" %}
 86 | 
 87 | 如果对c种模式类别$$\{\omega_i\}_{i=1,\dots,c}$$做离散正交展开，则对每一模式可分别写成：$$x_i=\varphi a_i$$，其中矩阵$$\varphi$$取决于所选用的正交函数
 88 | 
 89 | 对各个模式类别，正交函数都是<mark style="color:orange;">**相同的**</mark>，但其展开系数向量$$a_i$$则因类别的不同模式分布而异
 90 | 
 91 | {% endhint %}
 92 | 
 93 | 
 94 | 
 95 | K-L展开式的根本性质是将随机向量x展开为另一组正交向量$$\varphi_j$$的线性和，且其展开式系数$$a_j$$（即系数向量a的各个分量）具有不同的性质。
 96 | 
 97 | 
 98 | 
 99 | ### 三、变换矩阵和系数的计算
100 | 
101 | 设随机向量x的<mark style="color:purple;">**总体自相关矩阵**</mark>为$$R=E\{xx^T\}$$，由：
102 | $$
103 | x=\sum_{j=1}^n a_j\varphi_j=\varphi a,\ \ T_1\leq t\leq T_2 \tag{2}
104 | $$
105 | 
106 | 
107 | 将$$X=\varphi a$$带入自相关矩阵，可得：
108 | $$
109 | \begin{align}
110 | R &= E\{\varphi a a^T\varphi^T\} \notag
111 | \\
112 | &= \varphi (E\{aa^T\})\varphi^T \notag
113 | \end{align}
114 | $$
115 | 要求系数向量$$\boldsymbol{a}$$的各个不同分量应<mark style="color:orange;">**独立**</mark>，即应使$$(a_1, a_2, …, a_j, …, a_n)$$满足如下关系：
116 | $$
117 | E(a_ja_k) = \begin{cases}
118 | \lambda_j & j=k
119 | \\
120 | 0 & j\neq k
121 | \end{cases}
122 | $$
123 | 写成矩阵形式，应使：$$E\{a a^T\} = D_\lambda$$，其中$$D_\lambda$$为对角形矩阵，其互相关成分均为0，即：
124 | $$
125 | D_\lambda = 
126 | \begin{bmatrix}
127 | \lambda_1 & 0 & \cdots & \cdots 0
128 | \\
129 | 0 & \ddots & 0 & \cdots & 0
130 | \\
131 | 0 & \cdots & \lambda_j & \cdots & 0
132 | \\
133 | 0 & \cdots & 0 & \ddots & 0
134 | \\
135 | 0 & \cdots & \cdots & 0 & \lambda_n
136 | \end{bmatrix}
137 | $$
138 | 则自相关矩阵可以写成：
139 | $$
140 | R=\varphi D_\lambda \varphi^T
141 | $$
142 | 由于$$\varphi$$中各个向量都相互<mark style="color:orange;">**归一正交**</mark>，因此有：
143 | $$
144 | R\varphi = \varphi D_\lambda \varphi^T \varphi = \varphi D_\lambda
145 | $$
146 | 其中，$$\varphi_j$$向量对应为：
147 | $$
148 | R \varphi_j=\lambda_j\varphi_j
149 | $$
150 | 可以看出，$$\lambda_j$$是x的自相关矩阵R的<mark style="color:orange;">**特征值**</mark>，$$\varphi_j$$是对应的<mark style="color:orange;">**特征向量**</mark>。因为R是实对称矩阵，其不同特征值对应的特征向量应正交，即：
151 | $$
152 | \varphi_j^T\varphi_k = 
153 | \begin{cases}
154 | 1 & j=k
155 | \\
156 | 0 & j\neq k
157 | \end{cases}
158 | $$
159 | 带入式$$(2)$$，K-L展开式的系数为：
160 | $$
161 | a = \varphi^Tx
162 | $$
163 | 
164 | 
165 | {% hint style="success" %}
166 | 
167 | 总结而言，计算l K-L展开式系数可以分为以下三步：
168 | 
169 | 1. 求随机向量x的自相关矩阵：$$R = E\{xx^T\}$$
170 | 2. 求R的特征值和特征向量，得到矩阵$$\varphi=(\varphi_1,\varphi_2,\dots,\varphi_n)$$
171 | 3. 求展开式系数：$$a=\varphi^Tx$$
172 | 
173 | {% endhint %}
174 | 
175 | 
176 | 
177 | 
178 | 
179 | ## 4.3.2 按照K-L展开式选择特征
180 | 
181 | - K-L展开式用于特征选择相当于一种线性变换
182 | - 若从n个特征向量中取出m个组成变换矩阵$$\varphi$$，即：
183 | 
184 | $$
185 | \varphi = (\varphi_1\ \varphi_2\ \dots\ \varphi_m),\ m<n
186 | $$
187 | 
188 | 此时，$$\varphi$$是一个n*m维矩阵，x是n维向量，经过$$\varphi^Tx$$变换，即得到降维为m的新向量
189 | 
190 | 
191 | 
192 | ### 推导
193 | 
194 | **问题**：选取变换矩阵$$\varphi$$，使得降维后的新向量在<mark style="color:orange;">**最小均方差条件下**</mark>接近原来的向量x
195 | 
196 | 
197 | 
198 | 对于$$x=\sum\limits_{j=1}^n$$，现在仅仅取m项，**对略去的系数项用预先选定的常数b代替**，此时对x的估计值为：
199 | $$
200 | \hat{x} = \sum_{j=1}^ma_i\varphi_i + \sum_{j=m+1}^nb\varphi_j
201 | $$
202 | 则产生的误差为：
203 | $$
204 | \Delta x = x - \hat{x} = \sum_{j=m+1}^n(a_j-b)\varphi_j
205 | $$
206 | 此时$$\Delta x$$的<mark style="color:purple;">**均方差**</mark>为：
207 | 
208 | $$
209 | \begin{align}
210 | \overline{\varepsilon^2} &= E\{\Vert\Delta x\Vert\}^2 \nonumber
211 | \\
212 | &= \sum_{j=m+1}^n\{E(a_j-b)^2\} \nonumber
213 | \end{align}
214 | $$
215 | 要使得$$\overline{\varepsilon^2}$$最小，则对b的选择应满足：
216 | $$
217 | \begin{align}
218 | \frac{\partial}{\partial b}[E(a_j-b)^2] &= \frac{\partial}{\partial b}[E(a_j^2-2a_jb+b^2)] \nonumber
219 | \\
220 | &= -2[E(a_j)-b] \nonumber
221 | \\
222 | &=0 \nonumber
223 | \end{align}
224 | $$
225 | 因此，$$b=E[a_j]$$，即<mark style="color:orange;">**对省略掉的a中的分量，应使用它们的数学期望来代替**</mark>，此时的误差为：
226 | $$
227 | \begin{align}
228 | \overline{\varepsilon^2} &= \sum_{j=m+1}^nE[(a_j-E\{a_j\})^2] \nonumber
229 | \\
230 | &= \sum_{j=m+1}^n\varphi^T_jE[(x-E\{x\})(x-E\{x\})^T]\varphi_j \nonumber
231 | \\
232 | &= \sum_{j=m+1}^n\varphi_j^TC_x\varphi_j \nonumber
233 | \end{align}
234 | $$
235 | 其中，$$C_x$$为x的<mark style="color:purple;">**协方差矩阵**</mark>
236 | 
237 | 
238 | 
239 | 设$$\lambda_j$$为$$C_x$$的第j个特征值，$$\varphi_j$$是与$$\lambda_j$$对应的特征向量，则：
240 | $$
241 | C_x\varphi_j = \lambda_j\varphi_j
242 | $$
243 | 由于$$\varphi_j$$是一个正交阵，因此有：
244 | $$
245 | \varphi_j^T\varphi_j=1
246 | $$
247 | 从而
248 | $$
249 | \varphi_j^TC_x\varphi_j = \lambda_j
250 | $$
251 | 因此
252 | $$
253 | \begin{align}
254 | \overline{\varepsilon^2} &= \sum_{j=m+1}^n\varphi_j^TC_x\varphi_j \nonumber
255 | \\
256 | &= \sum_{j=m+1}^n\lambda_j
257 | \end{align}
258 | $$
259 | 
260 | 
261 | 由此可以看出，<mark style="color:orange;">**特征值越小，误差也越小**</mark>
262 | 
263 | 
264 | 
265 | ### 结论
266 | 
267 | 从K-L展开式的性质和按最小均方差的准则来选择特征，应使$$E[a_j]=0$$。由于$$E[a]=E[\varphi^Tx]= \varphi^TE[x]$$，故应使$$E[x]=0$$。基于这一条件，在将整体模式进行K-L变换之前，应先将其均值作为新坐标轴的原点，采用协方差矩阵C或自相关矩阵R来计算特征值。如果$$E[x]\neq0$$，则只能得到“次最佳”的结果。
268 | 
269 | 将K-L展开式系数$$a_j$$（亦即变换后的特征）用$$y_j$$表示，写成向量形式：$$y= \varphi^Tx$$。此时变换矩阵$$\varphi$$用m个特征向量组成。为使误差最小，不采用的特征向量，其对应的<mark style="color:orange;">**特征值应尽可能小**</mark>。因此，将特征值按大小次序标号，即
270 | $$
271 | \lambda_1\gt\lambda_2\gt\dots\gt\lambda_m\gt\dots\gt\lambda_n\geq0
272 | $$
273 | 若首先采用前面的m个特征向量，便可使变换误差最小。此时的变换矩阵为：
274 | $$
275 | \varphi^T = \begin{pmatrix}
276 | \varphi_1^T\\
277 | \varphi_2^T\\
278 | \vdots\\
279 | \varphi_m^T
280 | \end{pmatrix}
281 | $$
282 | K-L变换是在均方误差最小的意义下获得数据压缩（降维）的最佳变换，且不受模式分布的限制。对于一种类别的模式特征提取，它不存在特征分类问题，只是实现用低维的m个特征来表示原来高维的n个特征，使其误差最小，亦即使其整个模式分布结构尽可能保持不变。
283 | 
284 | 通过K-L变换能获得互不相关的新特征。若采用较大特征值对应的特征向量组成变换矩阵，则能对应地保留原模式中方差最大的特征成分，所以K-L变换起到了减小相关性、突出差异性的效果。在此情况下， K-L变换也称为<mark style="color:orange;">**主成分变换**（**PCA变换**）</mark>。
285 | 
286 | {% hint style="warning" %}
287 | 
288 | 需要指出的是，采用K-L变换作为模式分类的特征提取时，要特别注意保留不同类别的模式分类鉴别信息，仅单纯考虑尽可能代表原来模式的主成分，有时并不一定有利于分类的鉴别。
289 | 
290 | {% endhint %}
291 | 
292 | 
293 | 
294 | #### K-L变换的一般步骤
295 | 
296 | 给定N个样本$$x^1,\dots,x^N$$，利用KL变换将其降至m维的步骤：
297 | 
298 | 1. 计算样本均值：$$\boldsymbol m =E(\boldsymbol x)$$
299 | 2. 平移样本：$$\boldsymbol{z=x-m}$$
300 | 3. 计算$$\boldsymbol z$$的自相关矩阵：$$R(z)=\sum\limits_{i=1}^N p(\omega_i)E(\boldsymbol{z\ z}^T)=C_x$$
301 | 4. 求协方差矩阵$$C_x$$的特征向量，取最大的m个特征值对应的m个特征向量构成变换矩阵$$\Phi$$
302 | 
303 | 
304 | 
305 | {% hint style="info" %}
306 | 
307 | 给定两类模式，其分布如图所示，试用K-L变换实现一维的特征提取（假定两类模式出现的概率相等）
308 | 
309 | ![](../.gitbook/assets/4.3.1.png)
310 | $$
311 | m = \frac{1}{5}\sum_{j=1}^{N_i}x_{1j}+\frac{1}{5}\sum_{j=1}^{N_i}x_{2j}=0
312 | $$
313 | 这符合K-L变换进行特征压缩的最佳条件。
314 | 
315 | 由于$$P(\omega_1)=P(\omega_2)=0.5$$，故：
316 | 
317 | 
318 | $$
319 | \begin{align}
320 | R &= \sum_{j=1}^2P(\omega_i)E\{xx^T\} \nonumber
321 | \\
322 | &= \frac{1}{2}\left[\frac{1}{5}\sum_{j=1}^5x_{1j}x_{1j}^T\right] + \frac{1}{2}\left[\frac{1}{5}\sum_{j=1}^5x_{2j}x_{2j}^T\right] \nonumber
323 | \\
324 | \\
325 | &= \begin{pmatrix}
326 | 25.4 & 25.0\\
327 | 25.0 & 25.4
328 | \end{pmatrix}
329 | \end{align}
330 | $$
331 | 
332 | 
333 | 解特征值方程$$\vert R-\lambda E\vert$$，求R的特征值：$$\lambda_1=50.4,\lambda_2=0.4$$，求解对应的特征向量，得到：
334 | 
335 | 
336 | $$
337 | \varphi_1=\frac{1}{\sqrt{2}}
338 | \begin{pmatrix}
339 | 1
340 | \\
341 | 1
342 | \end{pmatrix}
343 | \\
344 | \varphi_2=\frac{1}{\sqrt{2}}
345 | \begin{pmatrix}
346 | 1
347 | \\
348 | -1
349 | \end{pmatrix}
350 | $$
351 | 
352 | 
353 | 取较大的特征值$$\lambda_1$$对应的变换向量作为变换矩阵，由$$y=\varphi^Tx$$得到变换后的一维模式特征为：
354 | $$
355 | \omega_1:\left\{-\frac{10}{\sqrt2},-\frac{9}{\sqrt2},-\frac{9}{\sqrt2},-\frac{11}{\sqrt2},-\frac{11}{\sqrt2}\right\}
356 | \\
357 | \omega_2:\left\{\frac{10}{\sqrt2},\frac{11}{\sqrt2},\frac{11}{\sqrt2},\frac{9}{\sqrt2},\frac{9}{\sqrt2}\right\}
358 | $$
359 | {% endhint %}
360 | 
361 | 上面的例子中可以看到
362 | 
363 | 
364 | 
365 | ## 绘图代码
366 | 
367 | ```python
368 | import matplotlib.pyplot as plt
369 | import numpy as np
370 | 
371 | x = [4, 5, 5, 5, 6, -4, -5, -5, -5, -6]
372 | y = [5, 4, 5, 6, 5, -5, -4, -5, -6, -5]
373 | 
374 | plt.plot([5, -5], [5, -5], color='red', linestyle='--')
375 | plt.arrow(0, 0, 2, 2, head_width=0.2, head_length=0.5, fc='black', ec='black', zorder=2)
376 | plt.arrow(0, 0, 2, -2, head_width=0.2, head_length=0.5, fc='black', ec='black')
377 | plt.plot(x[0:5], y[0:5], 'o', color='white', markeredgecolor='blue')
378 | plt.plot(x[5:10], y[5:10], 'o', color='blue')
379 | 
380 | 
381 | plt.annotate('$\\phi_1$', xy=(2, 1.2), xytext=(2, 1.2), fontsize=12)
382 | plt.annotate('$\\phi_2$', xy=(2, -3), xytext=(2, -3), fontsize=12)
383 | 
384 | plt.axhline(0, color='black', linewidth=0.5)
385 | plt.axvline(0, color='black', linewidth=0.5)
386 | 
387 | plt.gca().set_aspect('equal', adjustable='box')
388 | plt.grid(True)
389 | 
390 | plt.show()
391 | ```
392 | 
393 | 


--------------------------------------------------------------------------------
/di-si-zhang-te-zheng-xuan-ze-he-ti-qu/fu-di-si-zhang-zuo-ye.md:
--------------------------------------------------------------------------------
  1 | # 附 第四章作业
  2 | 
  3 | ## Q1
  4 | 
  5 | ### 题目
  6 | 
  7 | 设有如下三类模式样本集$$\omega_1$$，$$\omega_2$$和$$\omega_3$$，其先验概率相等，求$$S_w$$和$$S_b$$
  8 | $$
  9 | \begin{align}
 10 | &\omega_1: \{(1\ 0)^T,(2\ 0)^T,(1\ 1)^T\}\\ \nonumber
 11 | &\omega_2: \{(-1\ 0)^T,(0\ 1)^T,(-1\ 1)^T\}\\
 12 | &\omega_3: \{(-1\ -1)^T,(0\ -1)^T,(0\ -2)^T\}
 13 | \end{align}
 14 | $$
 15 | 
 16 | ### 解
 17 | 
 18 | 由题意可知
 19 | $$
 20 | P(\omega_1)=P(\omega_2)=P(\omega_3) = \frac{1}{3}
 21 | $$
 22 | 先算出样本均值：
 23 | $$
 24 | m_1=\left(\frac{4}{3}\ \frac{1}{3}\right)^T
 25 | \\
 26 | m_2=\left(-\frac{2}{3}\ \frac{2}{3}\right)^T
 27 | \\
 28 | m_3=\left(-\frac{1}{3}\ -\frac{4}{3}\right)^T
 29 | $$
 30 | 则可得总体均值：
 31 | $$
 32 | m_0=E\{x\}=\sum_{j=1}^3P(\omega_i)m_i=\left(\frac19\ -\frac19\right)^T
 33 | $$
 34 | 类内离散度矩阵：
 35 | $$
 36 | \begin{align}
 37 | S_w &= \sum_{i=1}^3 P(\omega_i)E\{(\boldsymbol x-m_i)(\boldsymbol x-m_i)^T\mid \omega_i\}\\ \nonumber
 38 | \\
 39 | &=\sum_{i=1}^3P(\omega_i)\frac1N\sum_{k=1}^{N_i}(x_i^k-m_i)(k_i^k-m_i)^T\\
 40 | \\
 41 | &=\frac13
 42 | \begin{pmatrix}
 43 | \frac29 & -\frac19\\
 44 | -\frac19 & \frac29
 45 | \end{pmatrix}+
 46 | \frac13
 47 | \begin{pmatrix}
 48 | \frac29 & \frac19\\
 49 | \frac19 & \frac29
 50 | \end{pmatrix}+
 51 | \frac13
 52 | \begin{pmatrix}
 53 | \frac29 & -\frac19\\
 54 | -\frac19 & \frac29
 55 | \end{pmatrix}
 56 | \end{align}
 57 | $$
 58 | 类间离散度矩阵：
 59 | $$
 60 | S_b=\sum_{i=1}^cP(\omega_i)(m_i-m_0)(m_i-m_0)^T
 61 | $$
 62 | 具体计算我这里通过numpy计算得到：
 63 | 
 64 | ```python
 65 | from fractions import Fraction
 66 | 
 67 | import numpy as np
 68 | 
 69 | np.set_printoptions(formatter={'all': lambda x: str(Fraction(x).limit_denominator())})
 70 | 
 71 | w = np.array([
 72 |     [[1, 0], [2, 0], [1, 1]],
 73 |     [[-1, 0], [0, 1], [-1, 1]],
 74 |     [[-1, -1], [0, -1], [0, -2]]
 75 | ])
 76 | 
 77 | m = np.mean(w, axis=1)
 78 | m0 = np.mean(m, axis=0)
 79 | 
 80 | C = np.zeros((3, 2, 2))
 81 | for i in range(3):
 82 |     diff = w[i] - m[i]
 83 |     C[i] = np.dot(diff.T, diff)
 84 | C = (1 / 3) * C
 85 | 
 86 | sw = np.zeros((2, 2))
 87 | for i in range(3):
 88 |     sw = np.add(sw, (1 / 3) * C[i])
 89 | print('Sw=')
 90 | print(sw)
 91 | 
 92 | sb = np.zeros((2, 2))
 93 | for i in range(3):
 94 |     sb = np.add(sb, np.outer((m[i] - m0), (m[i] - m0)))
 95 | sb = (1 / 3) * sb
 96 | print('Sb = ')
 97 | print(sb)
 98 | ```
 99 | 
100 | Sw=
101 | [[2/9 -1/27]
102 |  [-1/27 2/9]]
103 | 
104 | Sb = 
105 | [[62/81 13/81]
106 |  [13/81 62/81]]
107 | 
108 | 
109 | 
110 | ## Q2
111 | 
112 | ### 题目
113 | 
114 | 设有如下两类样本集，其出现概率相等：
115 | $$
116 | \omega_1:\quad\{(0\ 0\ 0)^T,(1\ 0\ 0)^T,(1\ 0\ 1)^T,(1\ 1\ 0)^T\}
117 | \\
118 | \omega_2:\quad\{(0\ 0\ 1)^T,(0\ 1\ 0)^T,(0\ 1\ 1)^T,(1\ 1\ 1)^T\}
119 | $$
120 | 用K-L变换，分别把特征空间维数降到二维和一维，并画出样本在该空间中的位置
121 | 
122 | 
123 | 
124 | ### 解
125 | 
126 | 求总体均值
127 | $$
128 | \begin{align}
129 | \boldsymbol m &= E\{\boldsymbol x\}\\ \nonumber
130 | &=0.5\times\frac14\sum_{j=1}^4x_{1j} + 0.5\times\frac14\sum_{j=1}^4x_{2j}\\
131 | &=\left(\frac12\ \frac12\ \frac12\right)^T
132 | \end{align}
133 | $$
134 | 平移样本到原点：
135 | $$
136 | \boldsymbol z = \boldsymbol {x-m}
137 | $$
138 | 
139 | 
140 | 求协方差矩阵：
141 | $$
142 | \begin{align}
143 | R &= \sum_{i=1}^2P(\omega_i)E(\boldsymbol z_i\ \boldsymbol z_i^T)\\ \nonumber
144 | \\
145 | &=\sum_{i=1}^2P(\omega_i)\frac1N\sum_{j=1}^N(z_{ij}\ z_{ij}^T)\\
146 | \\
147 | &=\frac12\left[\frac14\sum_{j=1}^4z_{1j}z_{1j}^T\right] + \frac12\left[\frac14\sum_{j=1}^4z_{2j}z_{2j}^T\right]\\
148 | \\
149 | &=\begin{pmatrix}
150 | \frac14 & 0 & 0\\
151 | 0 & \frac14 & 0\\
152 | 0 & 0 & \frac14
153 | \end{pmatrix}
154 | \end{align}
155 | $$
156 | 
157 | 
158 | 求特征值和特征向量：
159 | $$
160 | \lambda_1=\lambda_2=\lambda_3=\frac14\\
161 | \phi_1=
162 | \begin{pmatrix}
163 | 1\\
164 | 0\\
165 | 0
166 | \end{pmatrix}
167 | \\
168 | \phi_2=
169 | \begin{pmatrix}
170 | 0\\
171 | 1\\
172 | 0
173 | \end{pmatrix}
174 | \\
175 | \phi_3=
176 | \begin{pmatrix}
177 | 0\\
178 | 0\\
179 | 1
180 | \end{pmatrix}
181 | $$
182 | 
183 | #### (1) 降到二维
184 | 
185 | 取前两大的特征值对应的特征向量组成转换矩阵：
186 | $$
187 | \Phi = \begin{pmatrix}
188 | 1 & 0\\
189 | 0 & 1\\
190 | 0 & 0
191 | \end{pmatrix}
192 | $$
193 | 则可以得到降维后的$$y=\Phi^Tx$$：
194 | $$
195 | \omega_1:\left\{\left(-\frac12\ -\frac12\right)^T,\left(\frac12\ -\frac12\right)^T, \left(\frac12\ -\frac12\right)^T, \left(\frac12\ \frac12\right)^T\right\}\\
196 | \\
197 | \omega_2:\left\{\left(-\frac12\ -\frac12\right)^T,\left(-\frac12\ \frac12\right)^T, \left(-\frac12\ \frac12\right)^T, \left(\frac12\ \frac12\right)^T\right\}
198 | $$
199 | 则绘制出图片：
200 | 
201 | ![](../.gitbook/assets/KL1.png)
202 | 
203 | 
204 | 
205 | #### (2) 降到一维
206 | 
207 | 同理，取第一大的特征值对应的特征向量作为转换矩阵，即可得到将为结果：
208 | 
209 | ![](../.gitbook/assets/KL2.png)
210 | 
211 | 
212 | 
213 | ### 代码
214 | 
215 | ```python
216 | from fractions import Fraction
217 | 
218 | import numpy as np
219 | from matplotlib import pyplot as plt
220 | 
221 | np.set_printoptions(formatter={'all': lambda x: str(Fraction(x).limit_denominator())})
222 | 
223 | 
224 | x = np.array([
225 |     [[0, 0, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0]],
226 |     [[0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 1, 1]]
227 | ])
228 | p = np.array([1 / 2, 1 / 2])
229 | 
230 | m = np.mean(x, axis=1)
231 | m0 = np.zeros(3)
232 | for i in range(2):
233 |     m0 = np.add(m0, m[i] * p[i])
234 | 
235 | z = x - m0
236 | 
237 | R = np.zeros((3, 3))
238 | for i in range(z.shape[0]):
239 |     R += p[i] * np.mean(np.einsum('ij,ik->ijk', z[i], z[i]), axis=0)
240 | 
241 | eig_val, eig_vec = np.linalg.eig(R)
242 | 
243 | sort_indices = np.argsort(-eig_val)
244 | eig_val = eig_val[sort_indices]
245 | eig_vec = eig_vec[sort_indices, :]
246 | 
247 | 
248 | # 降到2维
249 | transform = eig_vec[:2].T
250 | 
251 | y = np.dot(z, transform)
252 | 
253 | class1_data = y[0]
254 | class2_data = y[1]
255 | 
256 | 
257 | plt.scatter(class1_data[:, 0], class1_data[:, 1], marker='^', color='red', alpha=0.5, label='class 1')
258 | plt.scatter(class2_data[:, 0], class2_data[:, 1], marker='o', color='blue', alpha=0.5, label='class 2')
259 | 
260 | plt.title('KL Transformation 2D')
261 | plt.legend()
262 | plt.show()
263 | 
264 | 
265 | # 降到1维
266 | transform = eig_vec[:1].T
267 | 
268 | y = np.dot(z, transform)
269 | 
270 | class1_data = y[0]
271 | class2_data = y[1]
272 | 
273 | 
274 | plt.scatter(class1_data, np.zeros_like(class1_data), marker='^', color='red', alpha=0.5, label='class 1')
275 | plt.scatter(class2_data, np.zeros_like(class2_data), marker='o', color='blue', alpha=0.5, label='class 2')
276 | 
277 | plt.title('KL Transformation 1D')
278 | plt.legend()
279 | plt.show()
280 | ```
281 | 
282 | 


--------------------------------------------------------------------------------
/di-wu-zhang-tong-ji-ji-qi-xue-xi/5.1-ji-qi-xue-xi-jian-jie.md:
--------------------------------------------------------------------------------
 1 | # 5.1 机器学习简介
 2 | 
 3 | > 桑克（R. Shank） 
 4 | >
 5 | > “一台计算机若不会学习，就不能说它具有智能。”
 6 | 
 7 | 
 8 | 
 9 | ## 5.1.1 统计机器学习
10 | 
11 | - 机器学习
12 |   - 更强调面向算法
13 |   - 机器学习强调算法的结果要好，所以机器学习很关注损失函数
14 | - 统计学
15 |   - 更偏重于面向模型
16 |   - 统计学要先扔出来一大堆模型假设，然后站在模型上面通 过严格的数学推导做出结果
17 | 
18 | 
19 | 
20 | {% hint style="success" %}
21 | 
22 | <mark style="color:orange;">**统计机器学习**</mark>：是基于数据构建概率统计模型并运用模型对数据进行预测分析的一门学科
23 | 
24 | {% endhint %}
25 | 
26 | 
27 | 
28 | ## 5.1.2 机器学习三要素
29 | 
30 | {% hint style="success" %}
31 | 
32 | "A computer program is said to learn  from <mark style="color:red;">**experience E**</mark> with respect to some class of <mark style="color:red;">**tasks T**</mark> and  <mark style="color:red;">**performance measure P**</mark>, if its performance at tasks in T, as  measured by P, improves with experience E"
33 | 
34 | --Tom M. Mitchell
35 | 
36 | {% endhint %}
37 | 
38 | 
39 | 
40 | - <mark style="color:orange;">**经验（E）**</mark>：训练数据
41 | - <mark style="color:orange;">**模型（T）**</mark>：—需要学习的目标函数
42 | - 学习算法: 怎么样从经验中推断出模型
43 | - <mark style="color:orange;">**评价（P）**</mark>：测试数据
44 | 
45 | 
46 | 
47 | {% hint style="success" %}
48 | 
49 | 机器学习的任务：Improve on task（T）,  with respect to performance metric（P）,  based on experience（E）
50 | 
51 | {% endhint %}
52 | 
53 | 
54 | 
55 | ## 5.1.3 机器学习的特点
56 | 
57 | - <mark style="color:purple;">**数据**</mark>大量、廉价；<mark style="color:purple;">**知识**</mark>昂贵、稀少
58 | - 数据产生过程的细节是未知的，但是数据产生的过程不是完全随机的
59 | - 通过利用数据中的某些模式或规律从数据中<mark style="color:orange;">**学习模型**</mark>：反推数据生成路径
60 | - 模型通常不是完整过程的精确复制品，而是一种良好且有用的<mark style="color:orange;">**近似**</mark>：（George Box: “All models are wrong, but  some are useful.”) 
61 | - 模型可以<mark style="color:orange;">**描述**</mark>从数据中获取知识，或<mark style="color:orange;">**预测将来**</mark>（具有预测性），或者两者兼而有之
62 | - 几乎所有的科学都关注于<mark style="color:orange;">**用模型拟合数据**</mark>：推理
63 | 
64 | 
65 | 
66 | ## 5.1.4 机器学习的分类
67 | 
68 | - <mark style="color:red;">**有监督学习**</mark>：有标记数据，e.g. Fisher，感知器算法，线性判别分析
69 | - <mark style="color:red;">**无监督学习**</mark>：无标注数据，降维方法K-L
70 | - <mark style="color:red;">**半监督学习**</mark>：无标注数据+有标注数据
71 | - <mark style="color:blue;">**多任务学习**</mark>：共享相关任务之间的表征
72 | - <mark style="color:blue;">**迁移学习**</mark>：训练数据与测试数据不是同分布的
73 | - <mark style="color:blue;">**增强学习**</mark>：间接的标注数据（状态和对应的reward ）
74 | - <mark style="color:blue;">**主动学习**</mark>：主动选择训练数据
75 | - <mark style="color:blue;">**自监督学习**</mark>：从无标注数据提取监督信号
76 | 


--------------------------------------------------------------------------------
/di-wu-zhang-tong-ji-ji-qi-xue-xi/5.2-tong-ji-ji-qi-xue-xi.md:
--------------------------------------------------------------------------------
  1 | # 5.2 统计机器学习
  2 | 
  3 | ## 5.2.1 统计机器学习的框架
  4 | 
  5 | - **输入**：<mark style="color:orange;">**独立同分布**</mark>的训练样本$$(x_i,y_i)\in X\times Y,i=1,2,\dots,N$$
  6 |   - 回归问题：Y是连续的
  7 |   - 分类问题：Y是类别
  8 |   - 排序问题：Y是序数
  9 | - **目标函数**：$$f\in \mathcal{F}$$
 10 | - **损失函数**：$$L(f;x,y)$$
 11 | - **期望风险**：$$\int L(f;x,y)dP(x,y)$$
 12 | 
 13 | 
 14 | 
 15 | ## 5.2.2 回归及分类问题的最优函数
 16 | 
 17 | ### 一、回归问题
 18 | 
 19 | - **输入**：<mark style="color:orange;">**独立同分布**</mark>的训练样本$$(x_i,y_i)\in X\times Y,i=1,2,\dots,N$$
 20 | - **目标函数**：$$f\in \mathcal{F}$$
 21 |   - 线性回归：f是线性的
 22 |   - 广义线性：f是非线性的
 23 | - **损失函数**：$$L(f;x,y)=(f(x)-y)^2$$
 24 | - **期望风险**：$$\int (f(x)-y)^2dP(x,y)$$
 25 | 
 26 | 
 27 | 
 28 | ### 二、回归问题的最优函数
 29 | 
 30 | $$
 31 | \begin{align}
 32 | &\int (f(x)-y)^2dP(x,y) \nonumber
 33 | \\
 34 | =&\iint(f(x) - y)^2p(x,y)dxdy \nonumber
 35 | \\
 36 | =&\iint(f^2(x) - 2yf(x) + y^2)p(y\vert x)p(x)dxdy \nonumber
 37 | \\
 38 | =&\int\left[\int (f^2(x) - 2yf(x) + y^2)p(y\vert x)p(x)dy\right]dx \nonumber
 39 | \\
 40 | =&\int Q(f(x),y)p(x)dx \nonumber
 41 | \end{align}
 42 | $$
 43 | 
 44 | 其中，$$Q(f(x),y)=f^2(x)-2E(y\vert x)f(x) + E(y^2\vert x)$$
 45 | 
 46 | 关于$$f(x)$$求导并令其等于0，即可得到上述问题的解：
 47 | $$
 48 | f(x) = E(y\vert x)=\int yp(y\vert x)dy
 49 | $$
 50 | 
 51 | 
 52 | {% hint style="success" %}
 53 | 
 54 | <mark style="color:orange;">**最小化均方误差**</mark>（MSE）的回归函数是由有条件分布$$p(y\vert x)$$的y的均值给出
 55 | 
 56 | {% endhint %}
 57 | 
 58 | 
 59 | 
 60 | ### 三、分类问题
 61 | 
 62 | - **输入**：<mark style="color:orange;">**独立同分布**</mark>的训练样本$$(x_i,y_i)\in X\times Y,i=1,2,\dots,N$$
 63 | - **目标函数**：$$f\in \mathcal{F}$$
 64 | - **损失函数**：$$L(f;x,y)=I_{\{f(x)\neq y\}}$$
 65 | - **期望风险**：$$\int I_{\{f(x)\neq y\}}dP(x,y)=P(f(x)\neq y)$$
 66 | 
 67 | 
 68 | 
 69 | ### 四、分类问题的最优函数
 70 | 
 71 | 要求的是<mark style="color:purple;">**最小期望风险**</mark>：
 72 | $$
 73 | \begin{align}
 74 | & \int I_{\{f(x)\neq y\}}dP(x,y) \nonumber
 75 | \\
 76 | =& P(f(x)\neq y) \nonumber
 77 | \\
 78 | =&\sum_{f(x)\neq C_i}P(C_i \vert x)p(x) \nonumber
 79 | \end{align}
 80 | $$
 81 | {% hint style="warning" %}
 82 | 
 83 | 这里其实是求的<mark style="color:red;">**分类错误的概率**</mark>，因此需要将其最小化
 84 | 
 85 | {% endhint %}
 86 | 
 87 | 因此，目标函数就是$$f(x)=\max\limits_{C_i}P(C_i\vert x)$$
 88 | 
 89 | {% hint style="success" %}
 90 | 
 91 | 最小化0-损失的贝叶斯分类器选择具有最大条件分布$$p(y\vert x)$$的类标签
 92 | 
 93 | {% endhint %}
 94 | 
 95 | 
 96 | $$
 97 | \text{choose}\ C_i\ if P(C_i\vert x) = \max\limits_{k}P(C_k\vert x)
 98 | $$
 99 | 
100 | 
101 | 
102 | 
103 | ## 5.2.3 过拟合和正则化
104 | 
105 | ### 一、风险最小化
106 | 
107 | <mark style="color:orange;">**期望风险**</mark>最小化：
108 | $$
109 | R_{exp} = \int L(f;x,y)dP(x,y)
110 | $$
111 | 
112 | 
113 | <mark style="color:orange;">**经验风险**</mark>最小化：
114 | $$
115 | R_{emp}(f)=\frac{1}{N}\sum_{i=1}^NL(f;x_i,y_i)
116 | $$
117 | 
118 | 
119 | <mark style="color:orange;">**结构风险**</mark>最小化：
120 | $$
121 | R_{srm}(f) = \frac{1}{N}\sum_{i=1}^NL(f;x,y) + \lambda J(f)
122 | $$
123 | 上式中的$$\lambda J(f)$$称为<mark style="color:orange;">**正则项**</mark>或<mark style="color:orange;">**惩罚函数**</mark>
124 | 
125 | 
126 | 
127 | ### 二、过拟合
128 | 
129 | ![](../.gitbook/assets/5.2.1.png)
130 | 
131 | 
132 | 
133 | 
134 | 
135 | ## 5.2.4 泛化能力分析
136 | 


--------------------------------------------------------------------------------
/di-yi-zhang-gai-shu/1.1-gai-shu.md:
--------------------------------------------------------------------------------
 1 | # 1.1 概述
 2 | 
 3 | ### 一、概念
 4 | 
 5 | #### 1.1 什么是模式？
 6 | 
 7 | {% hint style="success" %}
 8 | 
 9 | 模式所指的不是事物本身，而是从事物获得的信息，因此，模式往往表现为具有时间和空间分布的信息
10 | {% endhint %}
11 | 
12 | #### 1.2 模式的直观特征
13 | 
14 | - 可观察性
15 | - 可区分性
16 | - 相似性
17 | 
18 | #### 1.3 模式识别与机器学习的目的
19 | 
20 | 利用计算机对物理对象进行分类，在错误概率最小的条件下，使识别的结果尽量与客观物体相符合。
21 | 
22 | $$
23 | Y = F(X)
24 | $$
25 | 
26 | 
27 | - X：定义域取自特征集
28 | - Y：值域为类别的标号集
29 | - F：模式识别的判别方法
30 | 
31 | {% hint style="warning" %}
32 | 机器学习利用大量的训练数据可以获得更好的预测结果。
33 | {% endhint %}
34 | 
35 | #### 1.4 什么是机器学习？
36 | 
37 | 
38 | 
39 | {% hint style="success" %}
40 | 机器学习是研究如何构造理论、算法和计算机系统，让机器通过从数据中学习后可以进行如下工作：分类和识别事物、推理决策、预测未来等。
41 | {% endhint %}
42 | 
43 | ### 二、方法
44 | 
45 | #### 2.1 模式识别与机器学习的目标
46 | 
47 | - **模式识别**：在特征空间和解释空间之间找到一种映射关系，这种映射也称之为假说
48 | - **机器学习**：针对某类任务T，用P衡量性能，根据经验来学习和自我完善，提高性能
49 | 
50 | {% hint style="success" %}
51 | **特征空间** ：从模式得到的对分类有用的度量、属性或基元构成的空间     
52 |  **解释空间** ：将c个类别表示为$$ωi∈Ω,i=1,2,⋯,c\omega_{i} \in \Omega, i=1,2,\cdots,c$$。其中，$$\Omega$$为所属类别的集合，称为解释空间
53 | {% endhint %}
54 | 
55 | #### 2.2 获得假说的两种方法
56 | 
57 | - **监督学习**：<mark style="color:orange;">**在特征空间中找到一个与解释空间的结构相对应的假说**</mark>。在给定模式下假定一个解决方案，任何在训练集中接近目标的假说也都必须在“未知”的样本上得到近似的结果
58 |   - 依靠已知所属类别的训练样本集，按它们特征向量的分布来确定假说 （通常为一个判别函数），在判别函数确定之后能用它对未知的模式进行分类
59 |   - 对分类的模式要有足够的先验知识，通常需要采集足够数量的具有典型性的样本进行训练
60 | - **非监督学习**：<mark style="color:orange;">**在解释空间中找到一个与特征空间的结构相对应的假说**</mark>。这种方法试图找到一种只以特征空间中的相似关系为基础的有效假说
61 |   - 在没有先验知识的情况下，通常采用聚类分析方法，基于“物以类聚”的观点，用数学方法分析各特征向量之间的距离及分散情况
62 |   - 如果特征向量集聚集若干个群，可按群间距离远近把它们划分成类
63 |   - 这种按各类之间的亲疏程度的划分，若事先能知道应划分成几类，则可获得更好的分类结果
64 | 
65 | #### 2.3 主要的分类和学习方法
66 | 
67 | - **数据聚类**
68 |   - **目标**：用某种相似性度量的方法将原始数据组织成有意义的和有用的各种数据集
69 |   - **性质**：是一种非监督学习的方法，解决方案是数据驱动的
70 | - **统计分类**
71 |   - 基于概率统计模型得到各类别的特征向量的分布，以取得分类的方法
72 |   - 特征向量分布的获得是基于一个**类别已知**的训练样本集
73 |   - 是一种监督分类的方法，分类器是概念驱动的
74 | - 结构模式识别
75 | - 神经网络
76 | - 监督学习
77 | - 无监督学习
78 | - 半监督学习
79 | - 增强学习
80 | - 集成学习
81 | - 深度学习
82 | - 元学习
83 | - 多任务学习
84 | - 多标记学习
85 | - 对抗学习
86 | 


--------------------------------------------------------------------------------
/pull.bat:
--------------------------------------------------------------------------------
1 | git pull


--------------------------------------------------------------------------------
/update.bat:
--------------------------------------------------------------------------------
1 | git add .
2 | git commit -m "update"
3 | git push origin -u main


--------------------------------------------------------------------------------