├── .gitignore
├── LICENSE
├── README.md
├── articles
│   ├── 00-introduction
│   │   ├── image
│   │   │   └── introduction
│   │   │       ├── 1694248366853.png
│   │   │       ├── 1694249048730.png
│   │   │       └── 1694249575455.png
│   │   ├── imgs
│   │   │   ├── langchain+chatglm.png
│   │   │   ├── 大模型技术栈-实战与应用.png
│   │   │   └── 大模型技术栈-算法与原理.png
│   │   ├── introduction.docx
│   │   └── introduction.md
│   ├── 01-agent
│   │   ├── imgs
│   │   │   ├── image-20230909103314020.png
│   │   │   ├── image-20230909103849920.png
│   │   │   ├── image-20230909111629069.png
│   │   │   ├── image-20230909112136592.png
│   │   │   ├── react_font.png
│   │   │   ├── 行动决策与环境感知的任务协同分析-第二篇
│   │   │   │   ├── image-20230909103849920.png
│   │   │   │   ├── image-20231021205743582.png
│   │   │   │   ├── image-20231021213049923.png
│   │   │   │   ├── image-20231021213107635.png
│   │   │   │   ├── image-20231021221344238.png
│   │   │   │   └── images.jpeg
│   │   │   └── 行动决策与环境感知的任务协同分析
│   │   │       ├── image-20230909103314020.png
│   │   │       ├── image-20230909103849920.png
│   │   │       ├── image-20230909111629069.png
│   │   │       ├── image-20230909133639650.png
│   │   │       ├── image-20230909152745682.png
│   │   │       ├── image-20230909152808532.png
│   │   │       ├── image-20230909160034084.png
│   │   │       ├── image-20230909160047075.png
│   │   │       ├── image-20230909160058793.png
│   │   │       ├── image-20230909160334508.png
│   │   │       └── image-20230909160347941.png
│   │   ├── 关于REACT中行动决策与环境感知的任务协同分析.pptx
│   │   ├── 此处是一个高维空间在同空间下,低维的子空间映射图.kra
│   │   ├── 行动决策与环境感知的任务协同分析-第一篇.md
│   │   └── 行动决策与环境感知的任务协同分析-第二篇.md
│   ├── 02-chatchat加载p-tuning
│   │   └── chatchat加载ptuning.md
│   ├── 03-大模型技术栈概览
│   │   ├── 大模型技术栈-算法与原理.docx
│   │   └── 大模型技术栈-算法与原理.png
│   ├── 04-大模型推理优化策略
│   │   ├── 大模型推理优化策略.docx
│   │   └── 大模型推理优化策略.png
│   ├── 05-大模型指令对齐训练
│   │   ├── 大模型指令对齐训练原理.docx
│   │   └── 大模型指令对齐训练原理.png
│   ├── 06-大模型分布式训练技术
│   │   ├── 分布式训练技术原理.docx
│   │   └── 分布式训练技术原理.png
│   ├── 07-大模型应用技术原理
│   │   ├── 大模型应用技术原理.docx
│   │   └── 大模型应用技术原理.png
│   ├── 08-强化学习简介
│   │   └── 强化学习简介.docx
│   ├── 09-AIOps调研报告
│   │   ├── AIOps.docx
│   │   └── AIOps.png
│   ├── 10-AIOps方法论
│   │   ├── AIOps方法论.docx
│   │   └── AIOps方法论.png
│   ├── 11-RCA根因分析技术
│   │   ├── RCA Survey.docx
│   │   └── RCA Survey.png
│   └── 12-有限马尔可夫过程简介
│       └── 有限马尔可夫过程.md
└── chatchat-qrcode.jpg
/.gitignore:
--------------------------------------------------------------------------------
1 | # Default ignored files
2 |
3 | /.idea
4 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # chatchat-knowledgebase
2 | The WeChat official account "查特查特" (ChatChat), an account that belongs to everyone, is now live! New questions, new methods, new discoveries: PRs are welcome!
3 |
--------------------------------------------------------------------------------
/articles/00-introduction/image/introduction/1694248366853.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/00-introduction/image/introduction/1694248366853.png
--------------------------------------------------------------------------------
/articles/00-introduction/image/introduction/1694249048730.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/00-introduction/image/introduction/1694249048730.png
--------------------------------------------------------------------------------
/articles/00-introduction/image/introduction/1694249575455.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/00-introduction/image/introduction/1694249575455.png
--------------------------------------------------------------------------------
/articles/00-introduction/imgs/langchain+chatglm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/00-introduction/imgs/langchain+chatglm.png
--------------------------------------------------------------------------------
/articles/00-introduction/imgs/大模型技术栈-实战与应用.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/00-introduction/imgs/大模型技术栈-实战与应用.png
--------------------------------------------------------------------------------
/articles/00-introduction/imgs/大模型技术栈-算法与原理.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/00-introduction/imgs/大模型技术栈-算法与原理.png
--------------------------------------------------------------------------------
/articles/00-introduction/introduction.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/00-introduction/introduction.docx
--------------------------------------------------------------------------------
/articles/00-introduction/introduction.md:
--------------------------------------------------------------------------------
1 | # About This Official Account
2 |
3 | **Hello there!**
4 |
5 | Everybody, put your hands up! The ~~awaited by no one~~ eagerly awaited official langchain-chatchat subscription account is finally live!!!
6 |
7 | This account is planned around three columns:
8 |
9 | * Official langchain-chatchat announcements
10 | * LLM tech stack: algorithms and principles
11 | * LLM tech stack: practice and applications
12 |
13 | ## 1. Official langchain-chatchat announcements
14 |
15 | 
16 |
17 | **Figure 1: Overview of the langchain-chatchat framework**
18 |
19 | This column mainly publishes project-related news, planned to include but not limited to:
20 |
21 | * Release-update briefings
22 | * Emergency patch announcements
23 | * Diagnosis and solutions for common errors
24 | * Brief updates on important dependency packages
25 | * Hard and soft ads for the generous sponsors who keep the dev team supplied with free resources
26 | * Other content we haven't thought of yet
27 |
28 | ## 2. LLM tech stack: algorithms and principles
29 |
30 | 
31 |
32 | **Figure 2: LLM tech stack, algorithms and principles**
33 |
34 | This column focuses on explaining the algorithms and principles across the full LLM pipeline, planned to include but not limited to:
35 |
36 | * Tokenizer techniques
37 | * Position encoding and context-length extrapolation techniques
38 | * Efficient attention mechanisms commonly used in LLMs
39 | * Distributed training techniques for LLMs
40 | * Various PEFT techniques
41 | * Model compression techniques, chiefly quantization
42 | * Inference acceleration and GPU-memory optimization techniques
43 | * Reasoning-enhancement schemes for LLMs
44 | * Apology letters for misreadings caused by the authors' limited abilities
45 |
46 | ## 3. LLM tech stack: practice and applications
47 |
48 | 
49 |
50 | **Figure 3: LLM tech stack, practice and applications**
51 |
52 | This column focuses on dissecting open-source frameworks for putting LLMs into production, planned to include but not limited to:
53 |
54 | * LLM training frameworks
55 | * LLM inference frameworks
56 | * LLM compression frameworks
57 | * Vector databases
58 | * LLM application frameworks
59 | * Mainstream Python front-end frameworks
60 | * Mainstream Python API tools
61 | * Other technical solutions and frameworks relevant to LLM deployment
62 | * Issues unrelated to LLM deployment that the authors simply must get off their chests
63 |
64 | ## 4. A few important notes
65 |
66 | 1. **Since the authors all hold regular day jobs, the update frequency cannot be guaranteed;**
67 | 2. **Given the authors' limited abilities, misreadings may occur; we sincerely hope developers will not hesitate to point them out;**
68 | 3. **Columns 2 and 3 are the customary pie-in-the-sky roadmap; don't take them too seriously;**
69 | 4. **Ad space available for long-term rent;**
70 |
71 | ## 5. Summary
72 |
73 | All in all, and in all all: the thing is what it is and the situation is what it is. The langchain-chatchat dev team has opened a subscription account to ~~solicit sponsorship~~ share technology, and we hope everyone will ~~tip generously~~ support us.
74 |
--------------------------------------------------------------------------------
/articles/01-agent/imgs/image-20230909103314020.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/image-20230909103314020.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/image-20230909103849920.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/image-20230909103849920.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/image-20230909111629069.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/image-20230909111629069.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/image-20230909112136592.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/image-20230909112136592.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/react_font.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/react_font.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20230909103849920.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20230909103849920.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20231021205743582.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20231021205743582.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20231021213049923.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20231021213049923.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20231021213107635.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20231021213107635.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20231021221344238.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/image-20231021221344238.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/images.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析-第二篇/images.jpeg
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909103314020.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909103314020.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909103849920.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909103849920.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909111629069.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909111629069.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909133639650.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909133639650.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909152745682.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909152745682.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909152808532.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909152808532.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160034084.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160034084.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160047075.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160047075.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160058793.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160058793.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160334508.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160334508.png
--------------------------------------------------------------------------------
/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160347941.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/imgs/行动决策与环境感知的任务协同分析/image-20230909160347941.png
--------------------------------------------------------------------------------
/articles/01-agent/关于REACT中行动决策与环境感知的任务协同分析.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/关于REACT中行动决策与环境感知的任务协同分析.pptx
--------------------------------------------------------------------------------
/articles/01-agent/此处是一个高维空间在同空间下,低维的子空间映射图.kra:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/01-agent/此处是一个高维空间在同空间下,低维的子空间映射图.kra
--------------------------------------------------------------------------------
/articles/01-agent/行动决策与环境感知的任务协同分析-第一篇.md:
--------------------------------------------------------------------------------
1 | #### Task Synergy Between Action Decision-Making and Environment Perception, Part 1
2 |
3 |
4 |
5 | Agents have been the talk of the town lately and have certainly drawn everyone's attention, so today let's dig in and pad out a piece of our own.
6 |
7 | Let's open the article with a few questions:
8 |
9 | > Are agents a method that replaces supervised learning and reinforcement learning?
10 | >
11 | > In our traditional NLP tasks, is intent recognition really limited to hard-coding?
12 | >
13 | > Can agents be used in production?
14 |
15 | In this installment we first discuss whether agents are a method that replaces supervised learning and reinforcement learning.
16 |
17 | The content mainly draws on 《关于REACT中行动决策与环境感知的任务协同分析.pptx》 and the works it cites; the related material can be found in the references at the end of this article.
18 |
19 | 
20 |
21 |
22 |
23 |
24 |
25 | Most readers probably first encountered the term in the langchain toolkit, which ships a `ReAct` module. Let's spend this first installment on the paper behind that module (link: `https://react-lm.github.io/`); a minimal usage sketch of the module follows.
26 |
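A minimal sketch of driving a ReAct-style agent through langchain's 2023-era agents API; the OpenAI LLM and the `llm-math` tool here are illustrative choices, not the article's setup.

```python
# A ReAct-style agent in langchain (API as of the 2023 releases this article refers to).
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)                # any LLM wrapper works here
tools = load_tools(["llm-math"], llm=llm)  # tools the agent may act with
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What is 2 raised to the 10th power?")  # interleaves Thought/Action/Observation steps
```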
27 | 
28 |
29 |
30 |
31 |
32 |
33 | The word `ReAct` sounds all-encompassing, and the famous front-end framework `React` is surely the one people have heard most.
34 |
35 | On the left is an idea for describing program behavior and aiding program decision-making; on the right is the front-end framework. __These are in fact two different things.__
36 |
37 | ReAct (left) vs. React (right)
38 |
39 | ![image](imgs/image-20230909112136592.png)
40 |
41 |
42 |
43 |
44 |
45 | __So is there anything here worth writing about...__
46 |
47 | Let's discuss how `ReAct` was proposed. The paper provides language examples in the following four forms:
48 |
49 | - (a) Standard
50 |
51 | - (b) Chain-of-thought (CoT, Reason Only)
52 |
53 | - (c) Act-only
54 |
55 | - (d) ReAct (Reason+Act)
56 |
57 | As we can see, the paper takes the three previously proposed approaches to the model-prediction problem and lays out the drawbacks of each.
58 |
59 | Take "chain-of-thought" as the baseline: this style of reasoning is a static black box, because the model uses its own internal representations to produce thoughts without grounding them in the external world, which limits its ability to reason reactively or to update its knowledge. This can lead to problems such as fact hallucination.
60 |
61 | __The other approaches suffer analogous limitations.__
62 |
63 | 
64 |
65 |
66 |
67 | From the examples in the `ReAct` paper, we observe that the problem-solving process `ReAct` demonstrates is more factual and grounded, whereas CoT is more accurate at constructing reasoning structure but is prone to hallucinated facts or thoughts.
68 |
69 | The paper therefore proposes combining ReAct with CoT-SC and letting the model decide when to switch to the other method, based on the following heuristics (sketched in code below):
70 |
71 | A) ReAct → CoT-SC: when ReAct fails to return an answer within the given number of steps, switch to CoT-SC. The step budgets are 7 for HotpotQA and 5 for FEVER, since adding more steps was found not to improve ReAct's performance.
72 |
73 | B) CoT-SC → ReAct: when the majority answer among n CoT-SC samples occurs fewer than n/2 times (i.e., internal knowledge may not support the task confidently), switch back to ReAct.
74 |
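A minimal sketch of the two switching rules above, assuming hypothetical `react` and `cot` callables; the paper's actual prompting machinery is not reproduced here.

```python
from collections import Counter
from typing import Callable, Optional

def answer_with_switching(
    question: str,
    react: Callable[[str, int], Optional[str]],  # returns None if no answer within the budget
    cot: Callable[[str], str],                   # one sampled chain-of-thought answer
    react_steps: int = 7,                        # 7 for HotpotQA, 5 for FEVER
    n_samples: int = 21,
) -> Optional[str]:
    # A) ReAct -> CoT-SC: fall back when ReAct exhausts its step budget.
    if (answer := react(question, react_steps)) is not None:
        return answer
    # B) CoT-SC -> ReAct: keep the majority CoT-SC answer only if it occurs
    # more than n/2 times; otherwise internal knowledge is deemed unreliable
    # and we switch back to ReAct (here with a doubled budget).
    answer, count = Counter(cot(question) for _ in range(n_samples)).most_common(1)[0]
    return answer if count > n_samples / 2 else react(question, 2 * react_steps)
```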
75 | 
76 |
77 | **Conclusion:** In knowledge-intensive reasoning tasks, multi-hop question answering and factual accuracy must be designed for; a system that cannot stay factually accurate and cannot be explained logically is necessarily a system that cannot be used, a point that the LLM papers of recent years make again and again. From `ReAct` we draw this conclusion: large language models (LLMs) can, through a specific mode of thinking, use logical structure to interpret language and to make interactive decisions, which is an impressive capability, *but their reasoning abilities (e.g., chain-of-thought prompting) and acting abilities (e.g., action plan generation) need to be studied as separate topics.*
78 |
79 |
80 |
81 | Still, arriving at this conclusion is *pretty amazing*; so what can we actually do with it?
82 |
83 | 
84 |
85 |
86 |
87 | At first glance these technical terms look complicated and a little intimidating. But viewed from a different angle, they are actually not hard to understand.
88 |
89 | Consider this reinforcement learning example: fix a policy π and study the behavior associated with it. The machine's logical control unit seeks the policy that maximizes value, a notion closely tied to that of an optimal path; the standard definitions are written out below.
90 |
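For reference (textbook definitions, added here rather than taken from the slide deck): for a fixed policy $\pi$, the state value is the expected discounted return, and the sought-after policy maximizes it:

$$
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0}=s\right],
\qquad
\pi^{*} = \arg\max_{\pi} V^{\pi}(s)
$$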
91 | 
92 |
93 |
94 |
95 |
96 |
97 |
98 |
99 | *We observe the value distributions corresponding to a family of optimal policies; a policy-distribution plot shows this intuitively.*
100 |
101 | So how is any of this "not hard"? Designing reinforcement learning admittedly takes serious domain expertise, but the problems reinforcement learning solves are really the behavioral logic of everyday people. With that premise in place, let's explain what the figure is saying.
102 |
103 | The figure comes from "A Distributional Perspective on Reinforcement Learning". The authors present theoretical results on policy evaluation and control that enable risk-aware behavior. The core point is the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This differs from traditional reinforcement learning methods, which typically model the expectation of that return, i.e., the value.
104 |
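To make the contrast concrete (the paper's central equations, stated here from the paper itself rather than from the original figure): classical RL models the expected return $Q$, while distributional RL treats the return $Z$ as a random variable,

$$
Q(s,a) = \mathbb{E}[R(s,a)] + \gamma\,\mathbb{E}\big[Q(S',A')\big],
\qquad
Z(s,a) \overset{D}{=} R(s,a) + \gamma\, Z(S',A'),
$$

where $\overset{D}{=}$ denotes equality in distribution and $Q = \mathbb{E}[Z]$.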
105 | After all that, what is risk-aware behavior? In practice it is a discussion of how danger gets identified. From a human perspective, recognizing risk is a learning process: when we see a yellow sign or a red road marker, or hear a high-pitched sound, we instinctively assume danger is coming.
106 |
107 |
108 |
109 | Let's roughly summarize the content so far.
110 |
111 |
112 |
113 | 
114 |
115 |
116 |
117 | The paper reports that fine-tuning with ReAct yields better performance than prompting, and that a smaller model fine-tuned with ReAct can outperform a larger model that is merely prompted. This suggests that ReAct fine-tuning improves model performance and can deliver better results even under resource constraints.
118 |
119 | Although large language models (LLMs) have demonstrated impressive abilities in tasks such as language understanding and interactive decision-making, their reasoning abilities (e.g., chain-of-thought prompting) and acting abilities (e.g., action plan generation) need to be studied as separate topics.
120 |
121 |
122 |
123 | *Discussing scholarship is not appealing to authority; the author has not reproduced everything covered here. May you keep clear eyes and a bright mind, and read this just for fun.*
124 |
125 |
126 |
127 | References:
128 |
129 | 1. https://arxiv.org/pdf/2210.03629.pdf
130 |
131 | 2. https://arxiv.org/abs/1707.06887
132 |
133 | 3. https://arxiv.org/abs/2201.11903
134 |
135 | 4. https://blog.csdn.net/qq_38293297/article/details/108711288
136 |
137 | 5. 《关于REACT中行动决策与环境感知的任务协同分析.pptx》 in [chatchat-space/chatchat-knowledgebase](https://github.com/chatchat-space/chatchat-knowledgebase.git)
138 |
139 |
140 |
141 |
--------------------------------------------------------------------------------
/articles/01-agent/行动决策与环境感知的任务协同分析-第二篇.md:
--------------------------------------------------------------------------------
1 | #### Task Synergy Between Action Decision-Making and Environment Perception, Part 2
2 |
3 | The previous installment discussed reinforcement learning and agents, noting that while reinforcement learning looks on the surface like it demands specialist knowledge, it actually involves the behavioral logic of everyday human life.
4 |
5 | From the standpoint of *whether agents are a method that replaces supervised learning and reinforcement learning*, let's raise a few preliminary questions:
6 |
7 | > How do a machine agent's risk perception and decision-making differ from reinforcement learning?
8 | >
9 | > Can the new perspective deliver better results in practical applications?
10 |
11 | This seemingly new perspective changes traditional reinforcement learning by putting the emphasis on the value distribution, and it gives risk-aware behavior in machine agents a theoretical footing. That has potential implications both for improving the performance of reinforcement learning algorithms and for applying them to a wider range of domains.
12 |
13 | PS: Discussing this question is frankly a bit silly, but we've run out of material~~~
14 |
15 | 
16 |
17 | (A unified cover image)
18 |
19 |
20 |
21 | To keep the discussion of these questions at the right level, we need to approach them from a professional standpoint: "how is a model evaluated" and "how are metrics judged". The following draws in part on Zhou Zhihua's "watermelon book", Machine Learning.
22 |
23 | Starting from model training: a model's empirical error (factual error) and overfitting (logical error) are often determined early, when the training parameters are being arranged. What we actually want is a learner that performs well on new samples; to achieve that, it should learn from the training samples, as far as possible, the "general law" that applies to all potential samples. Deviations from this expectation polarize into two concepts, "underfitting" and "overfitting".
24 |
25 | The problems machine learning faces are typically NP problems, *the definition and computation of distance*, while an effective learning algorithm must run to completion in polynomial time. Usually one seeks, within an effective structure, an "empirical result" that satisfies the model's task. Common empirical functions include nearest-neighbor algorithms such as KNN/1NN, whose purpose is to map samples into a high-dimensional "feature space" and apply linear or non-linear transformations in that space. OK, that was a lot of rather dry material; a tiny sketch follows.
26 |
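A minimal 1-NN sketch of the "empirical function" idea above (plain NumPy; not code from the original article):

```python
import numpy as np

def one_nn(train_x: np.ndarray, train_y: np.ndarray, query: np.ndarray):
    """Classify `query` by the label of its nearest training sample."""
    dists = np.linalg.norm(train_x - query, axis=1)  # Euclidean distance to every sample
    return train_y[np.argmin(dists)]                 # label of the closest one
```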
27 | Going back to the questions above, we now have a different angle, namely: reinforcement learning and supervised learning really differ only in their empirical functions and their spatial computation functions (activation functions).
28 |
29 | As for how they differ, let's pad that out a bit.
30 |
31 | 
32 |
33 | (A meme about having done something awesome)
34 |
35 | As our (second-grade) textbook introduces it, nearest-neighbor methods are supervised learning algorithms commonly used in tasks such as data mining and target matching, while reinforcement learning algorithms are a family of algorithms for training an agent to learn the best decision policy through interaction with its environment. This already hints at the point: to make a supervised learning task more accurate, you inevitably need to design a reinforcement learning policy to serve the task (an old chestnut by now).
36 |
37 |
38 |
39 | #### Low-dimensional embedding
40 |
41 | In real applications we already have an LLM (a large language model), and of course the model has its biases; the reason was explained above, so we won't repeat it. Whenever we use a large model we run into the same problem: it seems not to know the concept I'm talking about. For example, the sentence "the apple falls": to Newton it means universal gravitation; to a piglet it means dinner. The conclusion is obvious: the model needs tuning, or rather, the model needs a hint about the spatial dimension.
42 |
43 | 
44 |
45 | Back to the classroom. Suppose the attribute dimension is 20; for the samples to satisfy the dense-sampling condition, at least $(10^3)^{20} = 10^{60}$ samples are needed. In real applications the attribute dimension often runs into the thousands or beyond, so the number of samples required for dense sampling is an unreachable astronomical figure. In fact, the data sparsity and the difficulty of distance computation that arise in high dimensions are serious obstacles shared by all machine learning methods, known as the "curse of dimensionality"; the problem can be resolved by dimensionality reduction, i.e., in a low-dimensional space.
46 |
47 | This is genuinely a bit hard to grasp, so as before let's first pose a few questions: what is a high-dimensional space, what is a low-dimensional space, what is a space mapping, and how does a space mapping happen?
48 |
49 | Opening the textbook we never read: we've touched on the word "dimension" before, and it admits several readings, spatial dimension, data dimension, vector-space dimension, and so on.
50 |
51 | 
52 |
53 | (Here: a diagram of a high-dimensional space mapped, within the same space, to a low-dimensional subspace)
54 |
55 | The figure shows the transformation formula between world coordinates and camera coordinates. Not everyone will have seen this formula, and it is rather hard to explain, heh heh heh, go look it up yourselves~ (the standard form is reproduced just below).
56 |
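For readers who would rather not look it up: the standard rigid-body form (a well-known result, reproduced here for reference rather than from the article's figure) maps a point from world coordinates into camera coordinates via a rotation $R$ and a translation $t$:

$$
\begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix}
=
\begin{bmatrix} R_{3\times3} & t_{3\times1} \\ 0^{\top} & 1 \end{bmatrix}
\begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}
$$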
57 | Let's explain why this formula makes a good example. Imagine the formula did not exist: how would you store the positional relationship of a spatial transform (the relationship between the origin coordinates and world coordinates), or the relationship between the original coordinates and the normals after a position changes? Storing it point by point in matrices would consume enormous space, and the coordinate computations would become incalculably expensive. Introducing the formula above is like finding an invisible transformation relation, and the result naturally appears at the `projection point`.
58 |
59 | 
60 |
61 | Back to the LLM question: how do we give the model a hint about the spatial dimension (the projected-coordinate relationship)? Back to the clustering question: once we try adding semantic explanations to the text, the cluster grouping emerges, as in the sketch below.
62 |
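A toy sketch of that idea; `embed` is a hypothetical stand-in for any sentence-embedding model:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_texts(texts, embed, k=2):
    """Group texts by clustering their embeddings in feature space."""
    vectors = np.stack([embed(t) for t in texts])  # map each text to a vector
    return KMeans(n_clusters=k, n_init=10).fit_predict(vectors)
```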
63 | The discussion above used intuitive distances in linear settings, but much of the time the tasks a model handles are non-linear.
64 |
--------------------------------------------------------------------------------
/articles/02-chatchat加载p-tuning/chatchat加载ptuning.md:
--------------------------------------------------------------------------------
1 | # Guide to loading p-tuning in chatchat
2 |
3 | Although p-tuning is a PEFT method, it is not compatible with Hugging Face's peft Python package, and fastchat hard-codes model loading via string matching in several places, so fastchat and chatchat cannot load p-tuning out of the box. After repeated attempts, the langchain-chatchat dev team offers the following guide for loading p-tuning.
4 |
5 | # 1. Modify the peft folder
6 |
7 | 1. Rename the config.json file to adapter_config.json;
8 | 2. Make sure the folder contains the pytorch_model.bin file;
9 | 3. Rename the folder so that its name contains the word 'peft';
10 | 4. Add the following fields to adapter_config.json:
11 |
12 | ```json
13 | "base_model_name_or_path": "/root/model/chatglm2-6b/"
14 | "task_type": "CAUSAL_LM",
15 | "peft_type": "PREFIX_TUNING",
16 | "inference_mode": true,
17 | "revision": "main",
18 | "num_virtual_tokens": 16
19 | ```
20 |
21 | **Here, "base_model_name_or_path" is the location of the base model**;
22 | 5. Move the folder into the project folder, e.g., under the Langchain-Chatchat project directory; a resulting layout is sketched below.
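A hypothetical resulting layout (the folder and path names are examples only):

```
Langchain-Chatchat/
└── chatglm2-6b-peft/
    ├── adapter_config.json   # renamed from config.json, with the fields above added
    └── pytorch_model.bin
```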
23 |
24 | # 2. Modify the fastchat package code
25 |
26 | ## 2.1 Changes to fastchat.model.model_adapter
27 |
28 | 1. Change the load_model function in fastchat/model/model_adapter.py to:
29 |
30 | ```python
31 | def load_model(
32 |     model_path: str,
33 |     device: str = "cuda",
34 |     num_gpus: int = 1,
35 |     max_gpu_memory: Optional[str] = None,
36 |     dtype: Optional[torch.dtype] = None,
37 |     load_8bit: bool = False,
38 |     cpu_offloading: bool = False,
39 |     gptq_config: Optional[GptqConfig] = None,
40 |     awq_config: Optional[AWQConfig] = None,
41 |     revision: str = "main",
42 |     debug: bool = False,
43 |     load_kwargs={},
44 | ):
45 |     """Load a model from Hugging Face."""
46 |     # get model adapter
47 |     adapter = get_model_adapter(model_path)
48 |     kwargs = load_kwargs
49 |     # Handle device mapping
50 |     cpu_offloading = raise_warning_for_incompatible_cpu_offloading_configuration(
51 |         device, load_8bit, cpu_offloading
52 |     )
53 |     if device == "cpu":
54 |         kwargs["torch_dtype"] = torch.float32
55 |         if CPU_ISA in ["avx512_bf16", "amx"]:
56 |             try:
57 |                 import intel_extension_for_pytorch as ipex
58 |
59 |                 kwargs["torch_dtype"] = torch.bfloat16
60 |             except ImportError:
61 |                 warnings.warn(
62 |                     "Intel Extension for PyTorch is not installed, it can be installed to accelerate cpu inference"
63 |                 )
64 |     elif device == "cuda":
65 |         kwargs["torch_dtype"] = torch.float16
66 |         if num_gpus != 1:
67 |             kwargs["device_map"] = "auto"
68 |             if max_gpu_memory is None:
69 |                 kwargs[
70 |                     "device_map"
71 |                 ] = "sequential"  # This is important when the GPUs have different VRAM sizes
72 |                 available_gpu_memory = get_gpu_memory(num_gpus)
73 |                 kwargs["max_memory"] = {
74 |                     i: str(int(available_gpu_memory[i] * 0.85)) + "GiB"
75 |                     for i in range(num_gpus)
76 |                 }
77 |             else:
78 |                 kwargs["max_memory"] = {i: max_gpu_memory for i in range(num_gpus)}
79 |     elif device == "mps":
80 |         kwargs["torch_dtype"] = torch.float16
81 |         # Avoid bugs in mps backend by not using in-place operations.
82 |         replace_llama_attn_with_non_inplace_operations()
83 |     elif device == "xpu":
84 |         kwargs["torch_dtype"] = torch.bfloat16
85 |         # Try to load ipex, while it looks unused, it links into torch for xpu support
86 |         try:
87 |             import intel_extension_for_pytorch as ipex
88 |         except ImportError:
89 |             warnings.warn(
90 |                 "Intel Extension for PyTorch is not installed, but is required for xpu inference."
91 |             )
92 |     elif device == "npu":
93 |         kwargs["torch_dtype"] = torch.float16
94 |         # Try to load torch_npu; it links into torch for Ascend NPU support
95 |         try:
96 |             import torch_npu
97 |         except ImportError:
98 |             warnings.warn("Ascend Extension for PyTorch is not installed.")
99 |     else:
100 |         raise ValueError(f"Invalid device: {device}")
101 |
102 |     if cpu_offloading:
103 |         # raises an error on incompatible platforms
104 |         from transformers import BitsAndBytesConfig
105 |
106 |         if "max_memory" in kwargs:
107 |             kwargs["max_memory"]["cpu"] = (
108 |                 str(math.floor(psutil.virtual_memory().available / 2**20)) + "Mib"
109 |             )
110 |         kwargs["quantization_config"] = BitsAndBytesConfig(
111 |             load_in_8bit_fp32_cpu_offload=cpu_offloading
112 |         )
113 |         kwargs["load_in_8bit"] = load_8bit
114 |     elif load_8bit:
115 |         if num_gpus != 1:
116 |             warnings.warn(
117 |                 "8-bit quantization is not supported for multi-gpu inference."
118 |             )
119 |         else:
120 |             model, tokenizer = adapter.load_compress_model(
121 |                 model_path=model_path,
122 |                 device=device,
123 |                 torch_dtype=kwargs["torch_dtype"],
124 |                 revision=revision,
125 |             )
126 |             if debug:
127 |                 print(model)
128 |             return model, tokenizer
129 |     elif awq_config and awq_config.wbits < 16:
130 |         assert (
131 |             awq_config.wbits == 4
132 |         ), "Currently we only support 4-bit inference for AWQ."
133 |         model, tokenizer = load_awq_quantized(model_path, awq_config, device)
134 |         if num_gpus != 1:
135 |             device_map = accelerate.infer_auto_device_map(
136 |                 model,
137 |                 max_memory=kwargs["max_memory"],
138 |                 no_split_module_classes=[
139 |                     "OPTDecoderLayer",
140 |                     "LlamaDecoderLayer",
141 |                     "BloomBlock",
142 |                     "MPTBlock",
143 |                     "DecoderLayer",
144 |                 ],
145 |             )
146 |             model = accelerate.dispatch_model(
147 |                 model, device_map=device_map, offload_buffers=True
148 |             )
149 |         else:
150 |             model.to(device)
151 |         return model, tokenizer
152 |     elif gptq_config and gptq_config.wbits < 16:
153 |         model, tokenizer = load_gptq_quantized(model_path, gptq_config)
154 |         if num_gpus != 1:
155 |             device_map = accelerate.infer_auto_device_map(
156 |                 model,
157 |                 max_memory=kwargs["max_memory"],
158 |                 no_split_module_classes=["LlamaDecoderLayer"],
159 |             )
160 |             model = accelerate.dispatch_model(
161 |                 model, device_map=device_map, offload_buffers=True
162 |             )
163 |         else:
164 |             model.to(device)
165 |         return model, tokenizer
166 |     kwargs["revision"] = revision
167 |
168 |     if dtype is not None:  # Overwrite dtype if it is provided in the arguments.
169 |         kwargs["torch_dtype"] = dtype
170 |
171 |     # Load model
172 |     model, tokenizer = adapter.load_model(model_path, kwargs)
173 |
174 |     if (
175 |         device == "cpu"
176 |         and kwargs["torch_dtype"] is torch.bfloat16
177 |         and CPU_ISA is not None
178 |     ):
179 |         model = ipex.optimize(model, dtype=kwargs["torch_dtype"])
180 |
181 |     if (device == "cuda" and num_gpus == 1 and not cpu_offloading) or device in (
182 |         "mps",
183 |         "xpu",
184 |         "npu",
185 |     ):
186 |         model.to(device)
187 |
188 |     if device == "xpu":
189 |         model = torch.xpu.optimize(model, dtype=kwargs["torch_dtype"], inplace=True)
190 |
191 |     if debug:
192 |         print(model)
193 |
194 |     return model, tokenizer
195 | ```
196 | 2. Change the get_generate_stream_function function in fastchat/model/model_adapter.py to:
197 |
198 | ```python
199 | def get_generate_stream_function(model: torch.nn.Module, model_path: str):
200 |     """Get the generate_stream function for inference."""
201 |     from fastchat.serve.inference import generate_stream
202 |
203 |     model_type = str(type(model)).lower()
204 |
205 |     is_chatglm = "chatglm" in model_type
206 |     is_falcon = "rwforcausallm" in model_type
207 |     is_codet5p = "codet5p" in model_type
208 |     is_peft = "peft" in model_type
209 |
210 |     if is_chatglm:
211 |         return generate_stream_chatglm
212 |     elif is_falcon:
213 |         return generate_stream_falcon
214 |     elif is_codet5p:
215 |         return generate_stream_codet5p
216 |     elif peft_share_base_weights and is_peft:
217 |         # Return a curried stream function that loads the right adapter
218 |         # according to the model_name available in this context. This ensures
219 |         # the right weights are available.
220 |         @torch.inference_mode()
221 |         def generate_stream_peft(
222 |             model,
223 |             tokenizer,
224 |             params: Dict,
225 |             device: str,
226 |             context_len: int,
227 |             stream_interval: int = 2,
228 |             judge_sent_end: bool = False,
229 |         ):
230 |
231 |             model.set_adapter(model_path)
232 |             if "chatglm" in str(type(model.base_model)).lower():
233 |                 model.disable_adapter()
234 |                 prefix_state_dict = torch.load(os.path.join(model_path, "pytorch_model.bin"))
235 |                 new_prefix_state_dict = {}
236 |
237 |                 for k, v in prefix_state_dict.items():
238 |                     if k.startswith("transformer.prefix_encoder."):
239 |                         new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
240 |                     elif k.startswith("transformer.prompt_encoder."):
241 |                         new_prefix_state_dict[k[len("transformer.prompt_encoder."):]] = v
242 |                 model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
243 |                 for x in generate_stream_chatglm(
244 |                     model,
245 |                     tokenizer,
246 |                     params,
247 |                     device,
248 |                     context_len,
249 |                     stream_interval,
250 |                     judge_sent_end,
251 |                 ):
252 |                     yield x
253 |             elif "rwforcausallm" in str(type(model.base_model)).lower():
254 |
255 |                 for x in generate_stream_falcon(
256 |                     model,
257 |                     tokenizer,
258 |                     params,
259 |                     device,
260 |                     context_len,
261 |                     stream_interval,
262 |                     judge_sent_end,
263 |                 ):
264 |                     yield x
265 |             elif "codet5p" in str(type(model.base_model)).lower():
266 |
267 |                 for x in generate_stream_codet5p(
268 |                     model,
269 |                     tokenizer,
270 |                     params,
271 |                     device,
272 |                     context_len,
273 |                     stream_interval,
274 |                     judge_sent_end,
275 |                 ):
276 |                     yield x
277 |             else:
278 |
279 |                 for x in generate_stream(
280 |                     model,
281 |                     tokenizer,
282 |                     params,
283 |                     device,
284 |                     context_len,
285 |                     stream_interval,
286 |                     judge_sent_end,
287 |                 ):
288 |                     yield x
289 |
290 |         return generate_stream_peft
291 |     else:
292 |         return generate_stream
293 | ```
294 | 3. Change the load_model method of the PeftModelAdapter class in fastchat/model/model_adapter.py to:
295 |
296 | ```python
297 | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
298 |     """Loads the base model then the (peft) adapter weights"""
299 |     from peft import PeftConfig, PeftModel
300 |
301 |     config = PeftConfig.from_pretrained(model_path)
302 |     base_model_path = config.base_model_name_or_path
303 |     if "peft" in base_model_path:
304 |         raise ValueError(
305 |             f"PeftModelAdapter cannot load a base model with 'peft' in the name: {config.base_model_name_or_path}"
306 |         )
307 |
308 |     # Basic proof of concept for loading peft adapters that share the base
309 |     # weights. This is pretty messy because Peft re-writes the underlying
310 |     # base model and internally stores a map of adapter layers.
311 |     # So, to make this work we:
312 |     # 1. Cache the first peft model loaded for a given base models.
313 |     # 2. Call `load_model` for any follow on Peft models.
314 |     # 3. Make sure we load the adapters by the model_path. Why? This is
315 |     #    what's accessible during inference time.
316 |     # 4. In get_generate_stream_function, make sure we load the right
317 |     #    adapter before doing inference. This *should* be safe when calls
318 |     #    are blocked by the same semaphore.
319 |     if peft_share_base_weights:
320 |         if base_model_path in peft_model_cache:
321 |             model, tokenizer = peft_model_cache[base_model_path]
322 |             # Super important: make sure we use model_path as the
323 |             # `adapter_name`.
324 |             model.load_adapter(model_path, adapter_name=model_path)
325 |         else:
326 |             base_adapter = get_model_adapter(base_model_path)
327 |             base_model, tokenizer = base_adapter.load_model(
328 |                 base_model_path, from_pretrained_kwargs
329 |             )
330 |             # Super important: make sure we use model_path as the
331 |             # `adapter_name`.
332 |             from peft import get_peft_model
333 |             model = get_peft_model(base_model, config, adapter_name=model_path)
334 |             peft_model_cache[base_model_path] = (model, tokenizer)
335 |         return model, tokenizer
336 |
337 |     # In the normal case, load up the base model weights again.
338 |     base_adapter = get_model_adapter(base_model_path)
339 |     base_model, tokenizer = base_adapter.load_model(
340 |         base_model_path, from_pretrained_kwargs
341 |     )
342 |     from peft import get_peft_model
343 |     model = get_peft_model(base_model, config, adapter_name=model_path)
344 |     return model, tokenizer
345 |
346 | ```
347 | 4. Change the load_model method of the ChatglmAdapter class in fastchat/model/model_adapter.py to:
348 |
349 | ```python
350 | def load_model(self, model_path: str, from_pretrained_kwargs: dict):
351 |     revision = from_pretrained_kwargs.get("revision", "main")
352 |     tokenizer = AutoTokenizer.from_pretrained(
353 |         model_path, trust_remote_code=True, revision=revision
354 |     )
355 |     config = AutoConfig.from_pretrained(model_path, trust_remote_code=True, **from_pretrained_kwargs)
356 |     model = AutoModel.from_pretrained(
357 |         model_path, trust_remote_code=True, config=config
358 |     )
359 |     return model, tokenizer
360 | ```
361 |
362 | ## 2.2 Changes to fastchat.serve.model_worker
363 |
364 | 1. Change the __init__ method of ModelWorker in fastchat/serve/model_worker.py as follows:
365 |
366 | ```python
367 | class ModelWorker(BaseModelWorker):
368 |     def __init__(
369 |         self,
370 |         controller_addr: str,
371 |         worker_addr: str,
372 |         worker_id: str,
373 |         model_path: str,
374 |         model_names: List[str],
375 |         limit_worker_concurrency: int,
376 |         no_register: bool,
377 |         device: str,
378 |         num_gpus: int,
379 |         max_gpu_memory: str,
380 |         dtype: Optional[torch.dtype] = None,
381 |         load_8bit: bool = False,
382 |         cpu_offloading: bool = False,
383 |         gptq_config: Optional[GptqConfig] = None,
384 |         awq_config: Optional[AWQConfig] = None,
385 |         stream_interval: int = 2,
386 |         conv_template: Optional[str] = None,
387 |         embed_in_truncate: bool = False,
388 |         seed: Optional[int] = None,
389 |         load_kwargs={},  # change: new argument
390 |         **kwargs,
391 |     ):
392 |         super().__init__(
393 |             controller_addr,
394 |             worker_addr,
395 |             worker_id,
396 |             model_path,
397 |             model_names,
398 |             limit_worker_concurrency,
399 |             conv_template=conv_template,
400 |         )
401 |
402 |         logger.info(f"Loading the model {self.model_names} on worker {worker_id} ...")
403 |         self.model, self.tokenizer = load_model(
404 |             model_path,
405 |             device=device,
406 |             num_gpus=num_gpus,
407 |             max_gpu_memory=max_gpu_memory,
408 |             dtype=dtype,
409 |             load_8bit=load_8bit,
410 |             cpu_offloading=cpu_offloading,
411 |             gptq_config=gptq_config,
412 |             awq_config=awq_config,
413 |             load_kwargs=load_kwargs,  # change: pass the new argument through
414 |         )
415 |         self.device = device
416 |         if self.tokenizer.pad_token is None:
417 |             self.tokenizer.pad_token = self.tokenizer.eos_token
418 |         self.context_len = get_context_length(self.model.config)
419 |         print("**" * 100)  # debug output
420 |         self.generate_stream_func = get_generate_stream_function(self.model, model_path)
421 |         print(f"self.generate_stream_func{self.generate_stream_func}")
422 |         print("*" * 100)
423 |         self.stream_interval = stream_interval
424 |         self.embed_in_truncate = embed_in_truncate
425 |         self.seed = seed
426 |
427 |         if not no_register:
428 |             self.init_heart_beat()
429 | ```
430 | 2. In fastchat/serve/model_worker.py, add the following argument in create_model_worker:
431 |
432 | ```python
433 | parser.add_argument("--load_kwargs",type=dict,default={})
434 | ```
435 |
436 | and change the following statement:
437 |
438 | ```python
439 | worker = ModelWorker(
440 |     args.controller_address,
441 |     args.worker_address,
442 |     worker_id,
443 |     args.model_path,
444 |     args.model_names,
445 |     args.limit_worker_concurrency,
446 |     no_register=args.no_register,
447 |     device=args.device,
448 |     num_gpus=args.num_gpus,
449 |     max_gpu_memory=args.max_gpu_memory,
450 |     dtype=str_to_torch_dtype(args.dtype),
451 |     load_8bit=args.load_8bit,
452 |     cpu_offloading=args.cpu_offloading,
453 |     gptq_config=gptq_config,
454 |     awq_config=awq_config,
455 |     stream_interval=args.stream_interval,
456 |     conv_template=args.conv_template,
457 |     embed_in_truncate=args.embed_in_truncate,
458 |     seed=args.seed,
459 | )
460 | ```
461 |
462 | to:
463 |
464 | ```python
465 | worker = ModelWorker(
466 |     args.controller_address,
467 |     args.worker_address,
468 |     worker_id,
469 |     args.model_path,
470 |     args.model_names,
471 |     args.limit_worker_concurrency,
472 |     no_register=args.no_register,
473 |     device=args.device,
474 |     num_gpus=args.num_gpus,
475 |     max_gpu_memory=args.max_gpu_memory,
476 |     dtype=str_to_torch_dtype(args.dtype),
477 |     load_8bit=args.load_8bit,
478 |     cpu_offloading=args.cpu_offloading,
479 |     gptq_config=gptq_config,
480 |     awq_config=awq_config,
481 |     stream_interval=args.stream_interval,
482 |     conv_template=args.conv_template,
483 |     embed_in_truncate=args.embed_in_truncate,
484 |     seed=args.seed,
485 |     load_kwargs=args.load_kwargs,  # change: forward the new argument
486 | )
487 | ```
488 |
489 | With that, all fastchat changes for loading p-tuning are complete. When invoking fastchat to load p-tuning, set `PEFT_SHARE_BASE_WEIGHTS=true` and supply the --load_kwargs argument as a dict carrying the pre_seq_len value used when the p-tuning weights were trained. For example, change the `parser.add_argument("--load_kwargs",type=dict,default={})` from step 2 of section 2.2 to:
490 |
491 | `parser.add_argument("--load_kwargs",type=dict,default={"pre_seq_len":16})`
492 |
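If you drive the fastchat worker directly rather than through chatchat, a hypothetical invocation (using fastchat's standard --model-path flag and the example adapter folder from section 1) looks like:

```shell
PEFT_SHARE_BASE_WEIGHTS=true python -m fastchat.serve.model_worker --model-path ./chatglm2-6b-peft
```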
493 | # 3. Modify the langchain-chatchat code
494 |
495 | 1. Add the following field to the FSCHAT_MODEL_WORKERS dict in configs/serve_config.py:
496 |
497 | ```
498 | "load_kwargs": {"pre_seq_len": 16} #值修改为adapter_config.json中的pre_seq_len值
499 | ```
500 | 2. Change create_model_worker_app in startup.py to:
501 |
502 | ```python
503 | def create_model_worker_app(log_level: str = "INFO", **kwargs) -> FastAPI:
504 |     """
505 |     kwargs may contain the following fields:
506 |     host:
507 |     port:
508 |     model_names: [`model_name`]
509 |     controller_address:
510 |     worker_address:
511 |
512 |
513 |     For online APIs:
514 |     online_api: True
515 |     worker_class: `provider`
516 |     For local models:
517 |     model_path: `model_name_or_path`, a Hugging Face repo id or a local path
518 |     device: `LLM_DEVICE`
519 |     """
520 |     import fastchat.constants
521 |     fastchat.constants.LOGDIR = LOG_PATH
522 |     from fastchat.serve.model_worker import worker_id, logger
523 |     import argparse
524 |     logger.setLevel(log_level)
525 |
526 |     parser = argparse.ArgumentParser()
527 |     args = parser.parse_args([])
528 |
529 |     for k, v in kwargs.items():
530 |         setattr(args, k, v)
531 |
532 |     # Online model API
533 |     if worker_class := kwargs.get("worker_class"):
534 |         from fastchat.serve.model_worker import app
535 |         worker = worker_class(model_names=args.model_names,
536 |                               controller_addr=args.controller_address,
537 |                               worker_addr=args.worker_address)
538 |         sys.modules["fastchat.serve.model_worker"].worker = worker
539 |     # Local model
540 |     else:
541 |         from configs.model_config import VLLM_MODEL_DICT
542 |         if kwargs["model_names"][0] in VLLM_MODEL_DICT and args.infer_turbo == "vllm":
543 |             import fastchat.serve.vllm_worker
544 |             from fastchat.serve.vllm_worker import VLLMWorker, app
545 |             from vllm import AsyncLLMEngine
546 |             from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
547 |             args.tokenizer = args.model_path  # set here if the tokenizer differs from model_path
548 |             args.tokenizer_mode = 'auto'
549 |             args.trust_remote_code = True
550 |             args.download_dir = None
551 |             args.load_format = 'auto'
552 |             args.dtype = 'auto'
553 |             args.seed = 0
554 |             args.worker_use_ray = False
555 |             args.pipeline_parallel_size = 1
556 |             args.tensor_parallel_size = 1
557 |             args.block_size = 16
558 |             args.swap_space = 4  # GiB
559 |             args.gpu_memory_utilization = 0.90
560 |             args.max_num_batched_tokens = 2560
561 |             args.max_num_seqs = 256
562 |             args.disable_log_stats = False
563 |             args.conv_template = None
564 |             args.limit_worker_concurrency = 5
565 |             args.no_register = False
566 |             args.num_gpus = 1  # vllm workers shard by tensor parallelism; set the number of GPUs here
567 |             args.engine_use_ray = False
568 |             args.disable_log_requests = False
569 |             if args.model_path:
570 |                 args.model = args.model_path
571 |             if args.num_gpus > 1:
572 |                 args.tensor_parallel_size = args.num_gpus
573 |
574 |             for k, v in kwargs.items():
575 |                 setattr(args, k, v)
576 |
577 |             engine_args = AsyncEngineArgs.from_cli_args(args)
578 |             engine = AsyncLLMEngine.from_engine_args(engine_args)
579 |
580 |             worker = VLLMWorker(
581 |                 controller_addr=args.controller_address,
582 |                 worker_addr=args.worker_address,
583 |                 worker_id=worker_id,
584 |                 model_path=args.model_path,
585 |                 model_names=args.model_names,
586 |                 limit_worker_concurrency=args.limit_worker_concurrency,
587 |                 no_register=args.no_register,
588 |                 llm_engine=engine,
589 |                 conv_template=args.conv_template,
590 |             )
591 |             sys.modules["fastchat.serve.vllm_worker"].engine = engine
592 |             sys.modules["fastchat.serve.vllm_worker"].worker = worker
593 |
594 |         else:
595 |             from fastchat.serve.model_worker import app, GptqConfig, AWQConfig, ModelWorker
596 |             args.gpus = "0"  # GPU ids; with multiple GPUs this can be e.g. "0,1,2,3"
597 |             args.max_gpu_memory = "20GiB"
598 |             args.num_gpus = 1  # model workers shard by model parallelism; set the number of GPUs here
599 |
600 |             args.load_8bit = False
601 |             args.cpu_offloading = None
602 |             args.gptq_ckpt = None
603 |             args.gptq_wbits = 16
604 |             args.gptq_groupsize = -1
605 |             args.gptq_act_order = False
606 |             args.awq_ckpt = None
607 |             args.awq_wbits = 16
608 |             args.awq_groupsize = -1
609 |             args.model_names = []
610 |             args.conv_template = None
611 |             args.limit_worker_concurrency = 5
612 |             args.stream_interval = 2
613 |             args.no_register = False
614 |             args.embed_in_truncate = False
615 |             args.load_kwargs = {"pre_seq_len": 16}  # change: set to the pre_seq_len from adapter_config.json
616 |             for k, v in kwargs.items():
617 |                 setattr(args, k, v)
618 |             if args.gpus:
619 |                 if args.num_gpus is None:
620 |                     args.num_gpus = len(args.gpus.split(','))
621 |                 if len(args.gpus.split(",")) < args.num_gpus:
622 |                     raise ValueError(
623 |                         f"Larger --num-gpus ({args.num_gpus}) than --gpus {args.gpus}!"
624 |                     )
625 |                 os.environ["CUDA_VISIBLE_DEVICES"] = args.gpus
626 |             gptq_config = GptqConfig(
627 |                 ckpt=args.gptq_ckpt or args.model_path,
628 |                 wbits=args.gptq_wbits,
629 |                 groupsize=args.gptq_groupsize,
630 |                 act_order=args.gptq_act_order,
631 |             )
632 |             awq_config = AWQConfig(
633 |                 ckpt=args.awq_ckpt or args.model_path,
634 |                 wbits=args.awq_wbits,
635 |                 groupsize=args.awq_groupsize,
636 |             )
637 |
638 |             worker = ModelWorker(
639 |                 controller_addr=args.controller_address,
640 |                 worker_addr=args.worker_address,
641 |                 worker_id=worker_id,
642 |                 model_path=args.model_path,
643 |                 model_names=args.model_names,
644 |                 limit_worker_concurrency=args.limit_worker_concurrency,
645 |                 no_register=args.no_register,
646 |                 device=args.device,
647 |                 num_gpus=args.num_gpus,
648 |                 max_gpu_memory=args.max_gpu_memory,
649 |                 load_8bit=args.load_8bit,
650 |                 cpu_offloading=args.cpu_offloading,
651 |                 gptq_config=gptq_config,
652 |                 awq_config=awq_config,
653 |                 stream_interval=args.stream_interval,
654 |                 conv_template=args.conv_template,
655 |                 embed_in_truncate=args.embed_in_truncate,
656 |                 load_kwargs=args.load_kwargs,  # change: forward the new argument
657 |             )
658 |             sys.modules["fastchat.serve.model_worker"].args = args
659 |             sys.modules["fastchat.serve.model_worker"].gptq_config = gptq_config
660 |
661 |             sys.modules["fastchat.serve.model_worker"].worker = worker
662 |
663 |     MakeFastAPIOffline(app)
664 |     app.title = f"FastChat LLM Server ({args.model_names[0]})"
665 |     app._worker = worker
666 |     return app
667 | ```
668 |
669 | With that, the langchain-chatchat side of loading p-tuning is complete; p-tuning can then be loaded like this:
670 |
671 | ```shell
672 | PEFT_SHARE_BASE_WEIGHTS=true python startup.py -a
673 |
674 | ```
675 |
--------------------------------------------------------------------------------
/articles/03-大模型技术栈概览/大模型技术栈-算法与原理.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/03-大模型技术栈概览/大模型技术栈-算法与原理.docx
--------------------------------------------------------------------------------
/articles/03-大模型技术栈概览/大模型技术栈-算法与原理.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/03-大模型技术栈概览/大模型技术栈-算法与原理.png
--------------------------------------------------------------------------------
/articles/04-大模型推理优化策略/大模型推理优化策略.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/04-大模型推理优化策略/大模型推理优化策略.docx
--------------------------------------------------------------------------------
/articles/04-大模型推理优化策略/大模型推理优化策略.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/04-大模型推理优化策略/大模型推理优化策略.png
--------------------------------------------------------------------------------
/articles/05-大模型指令对齐训练/大模型指令对齐训练原理.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/05-大模型指令对齐训练/大模型指令对齐训练原理.docx
--------------------------------------------------------------------------------
/articles/05-大模型指令对齐训练/大模型指令对齐训练原理.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/05-大模型指令对齐训练/大模型指令对齐训练原理.png
--------------------------------------------------------------------------------
/articles/06-大模型分布式训练技术/分布式训练技术原理.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/06-大模型分布式训练技术/分布式训练技术原理.docx
--------------------------------------------------------------------------------
/articles/06-大模型分布式训练技术/分布式训练技术原理.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/06-大模型分布式训练技术/分布式训练技术原理.png
--------------------------------------------------------------------------------
/articles/07-大模型应用技术原理/大模型应用技术原理.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/07-大模型应用技术原理/大模型应用技术原理.docx
--------------------------------------------------------------------------------
/articles/07-大模型应用技术原理/大模型应用技术原理.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/07-大模型应用技术原理/大模型应用技术原理.png
--------------------------------------------------------------------------------
/articles/08-强化学习简介/强化学习简介.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/08-强化学习简介/强化学习简介.docx
--------------------------------------------------------------------------------
/articles/09-AIOps调研报告/AIOps.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/09-AIOps调研报告/AIOps.docx
--------------------------------------------------------------------------------
/articles/09-AIOps调研报告/AIOps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/09-AIOps调研报告/AIOps.png
--------------------------------------------------------------------------------
/articles/10-AIOps方法论/AIOps方法论.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/10-AIOps方法论/AIOps方法论.docx
--------------------------------------------------------------------------------
/articles/10-AIOps方法论/AIOps方法论.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/10-AIOps方法论/AIOps方法论.png
--------------------------------------------------------------------------------
/articles/11-RCA根因分析技术/RCA Survey.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/11-RCA根因分析技术/RCA Survey.docx
--------------------------------------------------------------------------------
/articles/11-RCA根因分析技术/RCA Survey.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/articles/11-RCA根因分析技术/RCA Survey.png
--------------------------------------------------------------------------------
/articles/12-有限马尔可夫过程简介/有限马尔可夫过程.md:
--------------------------------------------------------------------------------
1 | # Finite Markov Decision Processes
2 |
3 | MDPs are a classical formalization of sequential decision making, in which actions affect not only the immediate reward but also, through future rewards, the subsequent situations and states; that is, decisions and the environment interact over time.
4 |
5 | ## 3.1 the agent-environment interface
6 |
7 | The boundary between agent and environment: **anything that cannot be changed arbitrarily by the agent is considered part of the environment.** If the agent were able to change the environment's rewards, it would tend to learn a representation of the reward itself rather than how to obtain reward through its actions.
8 |
9 | ## 3.2 goals and rewards
10 |
11 | The agent's goal is quantified by the reward signal delivered by the environment: maximize the cumulative sum of rewards received.
12 |
13 | ## 3.3 return
14 |
15 | Define $G_t$ as the expected return after time step t. The simplest form of return is the sum of the sequence of future rewards: $G_t=R_{t+1}+R_{t+2}+...+R_T$.
16 |
17 | ### episodic tasks
18 |
19 | episodes: self-contained, repeatable stretches of interaction between the agent and the environment. terminal state: a special state of each episode after which time is reset and a new episode begins. Define $\mathcal{S}$ as the set of non-terminal states and $\mathcal{S}^{+}$ as the full set including terminal states.
20 |
21 | ### continuing tasks
22 |
23 | For continuing tasks, $T=\infty$, so $G_t$ may grow without bound and the agent cannot compare returns in order to learn. A discount factor $\gamma$ is therefore introduced:
24 |
25 | $$
26 | G_t=\sum_{k=0}^{\infty}{\gamma^kR_{t+k+1}}
27 | $$
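As a quick worked example (not in the original text): with a constant reward of 1 at every step and $\gamma<1$, the return is a geometric series,

$$
G_t=\sum_{k=0}^{\infty}\gamma^k=\frac{1}{1-\gamma},
$$

so for $\gamma=0.9$ we get $G_t=10$. Discounting turns an otherwise divergent sum into a finite quantity by which different behaviours can be compared.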
28 |
29 | ## 3.4 unified notation for episodic and continuing tasks
30 |
31 | For episodic tasks, introduce a special absorbing state whose transitions all return to itself with reward 0. Episodic and continuing tasks can then be unified as:
32 |
33 | $$
34 | G_t=\sum_{k=t+1}^{T}{\gamma^{k-t-1}R_{k}}
35 | $$
36 | where $\gamma=1$ and $T=\infty$ cannot both hold.
37 |
38 | ## 3.5 the markov property
39 |
40 | $$
41 | p(s',r|s,a)=P(S_{t+1}=s',R_{t+1}=r\mid S_0,A_0,R_1,\dots,R_t,S_t=s,A_t=a)
42 | $$
43 |
44 | The current state contains all the information carried by the full history of past states.
45 | Here, $\sum_{s'\in S}{\sum_{r\in R}{p(s',r|s,a)}}=1$.
46 |
47 | ## 3.6 MDP
48 |
49 | If a reinforcement learning task satisfies the Markov property, we call it a Markov decision process (MDP); if, in addition, **both the state space and the action space are finite**, we call it a finite MDP.
50 | From $p(s',r|s,a)=P\{S_t=s',R_t=r|S_{t-1}=s,A_{t-1}=a\}$ we can derive:
51 |
52 | * State-transition probabilities
53 |
54 | $$
55 | p(s'|s,a)=P(S_t=s'|S_{t-1}=s,A_{t-1}=a)=\sum_{r\in R}{p(s',r|s,a)}
56 | $$
57 |
58 | * Expected reward for a given state-action pair
59 |
60 | $$
61 | r(s,a)=E[R_t|S_{t-1}=s,A_{t-1}=a]=\sum_{r\in R}{r\sum_{s'\in S}{p(s',r|s,a)}}
62 | $$
63 |
64 | * Expected reward for a given state, action, and next state:
65 |
66 | $$
67 | r(s,a,s')=E[R_t|S_{t-1}=s,A_{t-1}=a,S_t=s']=\sum_{r\in R}{r\frac{p(s',r|s,a)}{p(s'|s,a)}}
68 | $$
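To make these derived quantities concrete, here is a minimal sketch; the two-state MDP and its `dynamics` table are invented purely for illustration:

```python
# Hypothetical dynamics table: (s, a) -> list of (s_next, reward, probability).
dynamics = {
    ("s0", "go"): [("s0", 0.0, 0.3), ("s1", 1.0, 0.7)],
    ("s1", "go"): [("s0", 0.0, 1.0)],
}

def transition_prob(s, a, s_next):
    """p(s'|s,a): marginalize the reward out of p(s',r|s,a)."""
    return sum(p for (sn, _r, p) in dynamics[(s, a)] if sn == s_next)

def expected_reward(s, a):
    """r(s,a): expectation of the reward for taking a in s."""
    return sum(r * p for (_sn, r, p) in dynamics[(s, a)])

print(transition_prob("s0", "go", "s1"))  # 0.7
print(expected_reward("s0", "go"))        # 0.7
```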
69 |
70 | ## 3.7 value functions
71 |
72 | A value function quantifies how good it is to act under given circumstances. We can estimate it as the expectation of the return; the agent's policy determines how that expectation is computed. A policy is a mapping from the state and action spaces to a probability space. For an MDP, we define the state-value function as the expected cumulative reward obtained by starting in state s and following policy $\pi$:
73 |
74 | $$
75 | v_{\pi}(s)=E_{\pi}[G_t|S_t=s]=E_{\pi}\left[\sum_{k=0}^{\infty}{\gamma^kR_{t+k+1}}\middle|S_t=s\right]
76 | $$
77 |
78 | Define the action-value function as the expected cumulative reward obtained by starting in state s, taking action a, and following policy $\pi$ thereafter:
79 |
80 | $$
81 | q_{\pi}(s,a)=E_{\pi}[G_t|S_t=s,A_t=a]=E_{\pi}\left[\sum_{k=0}^{\infty}{\gamma^kR_{t+k+1}}\middle|S_t=s,A_t=a\right]
82 | $$
83 |
84 | ### Monte Carlo Methods
85 |
86 | * Empirically, if an agent following policy $\pi$ keeps, for every starting state s, the average of the actual returns observed afterwards, this average converges to $v_{\pi}(s)$ as the number of trials grows.
87 | * Similarly, if averages of the actual returns are kept separately for every starting state s and every action a taken there, they converge to $q_{\pi}(s,a)$.
88 | When the state space is large, however, Monte Carlo simulation becomes impractical. We can then treat $v_{\pi},q_{\pi}$ as parameterized functions and tune the parameters to approach the true values, still obtaining fairly accurate estimates.
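A minimal sketch of this tabulate-and-average idea; `sample_episode` is a hypothetical stand-in for any function that rolls out one episode under the policy $\pi$ being evaluated:

```python
from collections import defaultdict

def mc_state_values(sample_episode, gamma=0.9, n_episodes=10000):
    """Every-visit Monte Carlo estimate of v_pi.

    sample_episode() must return a list of (state, reward) pairs,
    where reward is the reward received after leaving that state.
    """
    totals = defaultdict(float)  # sum of observed returns per state
    counts = defaultdict(int)    # number of observed returns per state
    for _ in range(n_episodes):
        g = 0.0
        # Walk the episode backwards, accumulating the discounted return.
        for state, reward in reversed(sample_episode()):
            g = reward + gamma * g
            totals[state] += g
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}
```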
89 |
90 | ### Bellman equation
91 |
92 | $$
93 | v_{\pi}(s)=\sum_{a}{\pi(a|s)\sum_{s',r}{p(s',r|s,a)[r+\gamma v_{\pi}(s')]}}
94 | $$
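The Bellman equation is a fixed-point condition, so it can be turned directly into an update rule and swept until the values stop changing. A minimal sketch of iterative policy evaluation, assuming the same hypothetical `dynamics` table format as in the sketch above and a stochastic policy given as nested dicts `pi[s][a]`:

```python
def policy_evaluation(states, actions, dynamics, pi, gamma=0.9, tol=1e-8):
    """Sweep the Bellman equation for v_pi until convergence."""
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Expectation over actions under pi, then over (s', r) under p.
            new_v = sum(
                pi[s][a] * sum(p * (r + gamma * v[sn])
                               for (sn, r, p) in dynamics[(s, a)])
                for a in actions
            )
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v
```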
95 |
96 | ## 3.8 optimal value functions
97 |
98 | $$
99 | q_*(s,a)=E[R_{t+1}+\gamma v_*(S_{t+1})|S_t=s,A_t=a]
100 | $$
101 |
102 | ### Bellman optimal equation
103 |
104 | The Bellman optimality equation states that, under an optimal policy, the value of a state must equal the expected return of the best action available in that state:
105 |
106 | $$
107 | v_*(s)=\max_{a\in \mathcal{A}(s)}q_{\pi_*}(s,a)=\max_{a} \sum_{s',r}{p(s',r|s,a)[r+\gamma v_*(s')]}
108 | $$
109 |
110 | $$
111 | q_*(s,a)= \sum_{s',r}p(s',r|s,a)[r+\gamma \max_{a'}q_*(s',a')]
112 | $$
113 |
114 | > Optimal policy: at least one action must attain the value $v_*$; a policy that assigns non-zero probability only to such actions is called an optimal policy.
117 |
118 | If at every step we act greedily, choosing the next action purely according to $v_*$, the chosen action turns out to be exactly the optimal one; this is the elegance of $v_*$. It works because $v_*$ already accounts for all future possibilities, so a seemingly greedy one-step search yields globally optimal actions.
119 | If we go further and solve for $q_*$, the agent does not even need the one-step search: for any state s, it suffices to find $a_0$ such that $q_*(s,a_0)=\max_{a}q_*(s,a)$. The extra work done earlier caches the one-step search information inside $q_*$, which therefore carries more information than $v_*$.
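A sketch combining both observations, reusing the hypothetical `dynamics` table format from above: value iteration turns the Bellman optimality equation into an update rule, and the greedy one-step search then reads an optimal deterministic policy off the converged values:

```python
def value_iteration(states, actions, dynamics, gamma=0.9, tol=1e-8):
    """Solve the Bellman optimality equation by repeated sweeps."""
    def backup(s, a, v):
        # One-step lookahead: expected reward plus discounted next value.
        return sum(p * (r + gamma * v[sn]) for (sn, r, p) in dynamics[(s, a)])

    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(backup(s, a, v) for a in actions)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    # The seemingly greedy one-step search is globally optimal,
    # because v_* already accounts for all future possibilities.
    policy = {s: max(actions, key=lambda a: backup(s, a, v)) for s in states}
    return v, policy
```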
120 |
121 | ## 3.9 optimality and approximation
122 |
123 | As noted earlier, an optimal policy can only be obtained at extreme computational cost, which forces us to consider methods that approximate the functions above.
124 | When approximating optimal behaviour, it is easy to see that many states occur only with very low probability, so computing their optimal actions buys little; substituting a locally good action for the optimal one there changes the overall expected reward very little, while saving a great deal of computation. For states that occur frequently, by contrast, we must still find the optimal action. This is an important property that distinguishes reinforcement learning from other approximate methods for solving MDPs.
125 |
--------------------------------------------------------------------------------
/chatchat-qrcode.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chatchat-space/chatchat-knowledgebase/30e19bc82557c004c7cc0978d25bc46523a75b49/chatchat-qrcode.jpg
--------------------------------------------------------------------------------