├── .gitignore ├── LICENSE ├── README.md ├── assets ├── demo.png ├── train_success1.png └── train_success2.png ├── attention_rl.md ├── log.txt ├── rl_demo └── CartPoleSimscapeModel.m ├── some_simulink_model ├── SHTL1_TestLevel_2_MODEL.slx ├── SHTL1_TestLevel_2_MODEL.slx.autosave ├── example.slx ├── rlCartPoleSimscapeModel.slx └── rlSimplePendulumModel.slx ├── tcp_connection_attempt ├── matlab_client.m ├── matlab_service.m ├── tcp_client.py └── tcp_service.py └── test ├── test_cartpole ├── __init__.py ├── dqn_test.py └── environment.py └── test_origin ├── __init__.py ├── dqn_test.png ├── dqn_test.py ├── environment.py └── readme.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | # Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | .mypy_cache/ 105 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 
134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 
193 | You may obtain a copy of the License at
194 | 
195 | http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # simulink_python
2 | 
3 | Simulate the environment with Simulink and write the reinforcement learning code in Python.
4 | 
5 | ## Quick start
6 | 
7 | ## Project overview
8 | 
9 | * ##### TCP communication module test
10 | 
11 |   MATLAB and Python talk to each other locally over TCP in blocking mode: MATLAB only advances the Simulink simulation after it has received a message from the Python side (the simulation step-size problem is still unsolved).
12 | 
13 |   Both directions were tried, with MATLAB and Python each acting as client and server. With MATLAB as the client, simulating 100 steps takes about 20 s; with Python as the client, 100 steps take about 2 min. The test code is [here](./tcp_connection_attempt).
14 | 
15 |   ![image](./assets/demo.png)
16 | 
17 | * ##### RL module test
18 | 
19 |   ~~Used the classic [CartPole](./some_simulink_model/rlCartPoleSimscapeModel.slx) model~~
20 | 
21 |   After debugging it without success, the plan is to first try this [project](https://github.com/qLience/AI-Pump-for-Underfloor-Heating-systems) instead.
22 | 
23 | * ##### Trying out that project
24 | 
25 |   The project is missing the 'svdutilitieslib' MATLAB library and prompts to install the 'Embedded Coder Support Package for ARM Cortex-A Processors'; after installing it, MATLAB would no longer open.
26 | 
27 |   Changing the Chinese user name to an English one fixed that. Once MATLAB opened again, the installed package turned out to have no effect, so other potentially useful packages are being installed.
28 | 
29 | * ##### Deploying MATLAB on a server
30 | 
31 |   Found two good blog posts, one on [how to install it]() and one on [using MATLAB more effectively](); if the local install stays unusable, the MATLAB on the server will be used instead.
32 | 
33 | * ##### Debugging and starting training
34 | 
35 |   Ran both the PyTorch and the TensorFlow versions of DQN.
36 | 
37 |   1. The PyTorch version is very outdated, so it will be rewritten from scratch.
38 |   2. The TensorFlow version itself is fine, but it fails with a permission error; the plan is to chmod that directory. The GPU seems not free right now, so the run is postponed.
39 |   3. The error logs of both runs are [here](log.txt). B.T.W., opening MATLAB on the server takes about 10 s; apart from being less convenient to operate, there is no real downside.
40 | 
41 | * ##### Training succeeded
42 | 
43 |   After fixing the directory permissions, the TensorFlow version ran through, although it still reported a "Write failed because file could not be opened" error. The next step is to make some modifications (to both the environment and the network).
44 | 
45 |   Screenshots:
46 | 
47 |   ![train_success1](./assets/train_success1.png)
48 | 
49 |   ![train_success2](./assets/train_success2.png)
50 | 
51 | * [Some papers on rl_attention](./attention_rl.md), to read when there is time.
52 | 
53 | * ##### Rewriting the code and training
54 | 
55 |   1. The code lives in the [test folder](./test); for now the action space and the state space are both set to 3. The communication frequency and intermittent communication still need to be handled. After that, the CartPole model written in MATLAB will be used for testing.
56 |   2. One problem found: when Python reads the state, Simulink sends every state sample accumulated since the last exchange, while Simulink acts as soon as it receives Python's action. The current workaround is to average all accumulated state samples into one state (see the sketch after this README); as a side effect this also solves the problem of keeping Simulink idle between actions.
57 |   3. The temperature-control test program has finished running; its code is in the test folder.
58 |   4. Solved a few issues: 1) the communication mode can be chosen freely, and stop-and-wait is used for now; the Simulink side sets a timeout and raises an error if Python does not reply within it, and Simulink does not simulate while it waits. 2) The time covered by each step is adjusted through the Pacing option, so that each exchange simulates exactly one step.
59 | 
60 | * ##### Tips:
61 | 
62 |   A Google search turned up no RL project that builds its model with Pymodelica, so Modelica will not be the starting point.
63 | 
64 |   [A survey on attention](./attention_rl)
65 | 
66 | ### References
67 | 
68 | * [UDP & TCP communication test]()
69 | * [Calling Simulink & communication]()
70 | * [Create Simulink Environments for Reinforcement Learning]()
71 | * [Load Predefined Simulink Environments]()
72 | * [Train DDPG Agent to Swing Up and Balance Cart-Pole System]()
73 | 
74 | * [qLience / AI-Pump-for-Underfloor-Heating-systems](https://github.com/qLience/AI-Pump-for-Underfloor-Heating-systems)
75 | 
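The averaging mentioned in item 2 of the "Rewriting the code and training" list above boils down to a couple of NumPy calls. A minimal sketch of the idea (assuming, as in the environment.py files under test/, that every sample consists of three float64 values; the helper name is made up):

```python
import array
import numpy as np

def average_states(raw_bytes):
    """Collapse a burst of accumulated Simulink samples into a single state."""
    samples = np.array(array.array('d', raw_bytes)).reshape(-1, 3)  # one row per 3-value sample
    return list(samples.mean(axis=0))                               # element-wise average
```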
--------------------------------------------------------------------------------
/assets/demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/assets/demo.png
--------------------------------------------------------------------------------
/assets/train_success1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/assets/train_success1.png
--------------------------------------------------------------------------------
/assets/train_success2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/assets/train_success2.png
--------------------------------------------------------------------------------
/attention_rl.md:
--------------------------------------------------------------------------------
1 | 1. [Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms]()
2 | 
3 | 2. [Reinforcement Learning with Attention that Works: A Self-Supervised Approach]()
4 | 
5 | 3. [VMAV-C: A Deep Attention-based Reinforcement Learning Algorithm for Model-based Control]()
6 | 
7 | 4. [Attention and Reinforcement Learning: Constructing Representations from Indirect Feedback]()
8 | 
9 | 5. [Better deep visual attention with reinforcement learning in action recognition]()
10 | 
11 | 6. [Attention-based Deep Reinforcement Learning for Multi-view Environments]()
12 | 
13 | 7. [Recurrent Models of Visual Attention]()
14 | 
15 | 
--------------------------------------------------------------------------------
/log.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/log.txt
--------------------------------------------------------------------------------
/rl_demo/CartPoleSimscapeModel.m:
--------------------------------------------------------------------------------
1 | open_system('example')
--------------------------------------------------------------------------------
/some_simulink_model/SHTL1_TestLevel_2_MODEL.slx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/SHTL1_TestLevel_2_MODEL.slx
--------------------------------------------------------------------------------
/some_simulink_model/SHTL1_TestLevel_2_MODEL.slx.autosave:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/SHTL1_TestLevel_2_MODEL.slx.autosave
--------------------------------------------------------------------------------
/some_simulink_model/example.slx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/example.slx
--------------------------------------------------------------------------------
/some_simulink_model/rlCartPoleSimscapeModel.slx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/rlCartPoleSimscapeModel.slx
--------------------------------------------------------------------------------
/some_simulink_model/rlSimplePendulumModel.slx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/rlSimplePendulumModel.slx
--------------------------------------------------------------------------------
/tcp_connection_attempt/matlab_client.m:
--------------------------------------------------------------------------------
1 | clc;
2 | clear;
3 | close all;
4 | open_system('../some_simulink_model/example.slx')
5 | 
6 | sim_time_step = 0.01;
7 | 
8 | % start the simulation and pause it, waiting for a signal from python
9 | set_param(gcs,'SimulationCommand','start','SimulationCommand','pause');
10 | 
11 | % open a TCP client connection to the python server (the commented line is the server variant)
12 | %s = tcpip('127.0.0.1', 54320, 'NetworkRole', 'server');
13 | s = tcpip('127.0.0.1', 54320, 'Timeout', 60,'InputBufferSize',10240);
14 | fopen(s);
15 | % main loop
16 | while(1) % can be changed
17 |     while(1) % loop until some data has been read
18 |         nBytes = get(s,'BytesAvailable');
19 |         if nBytes>0
20 |             break;
21 |         end
22 |     end
23 |     command = fread(s,nBytes);
24 |     data=str2num(char(command()'));
25 |     if isempty(data)
26 |         data=0;
27 |     end
28 | 
29 |     % set a parameter in the simulink model using the data received from python
30 |     set_param('example/K','Gain',num2str(data))
31 | 
32 |     % run the simulink model for one step
33 |     set_param(gcs, 'SimulationCommand', 'step');
34 | 
35 |     % pause the simulink model and send some data to python
36 |     pause(1);
37 |     u=states.data(end,:);
38 |     fwrite(s, jsonencode(u));
39 | end
40 | 
41 | 
42 | 
--------------------------------------------------------------------------------
/tcp_connection_attempt/matlab_service.m:
--------------------------------------------------------------------------------
1 | clc;
2 | clear;
3 | close all;
4 | mdl = '../some_simulink_model/example.slx';
5 | open_system(mdl)
6 | 
7 | sim_time_step = 0.01;
8 | 
9 | % start the simulation and pause it, waiting for a signal from python
10 | set_param(gcs,'SimulationCommand','start','SimulationCommand','pause');
11 | 
12 | % open a server; it will block until a client connects to it
13 | s = tcpip('127.0.0.1', 54320, 'NetworkRole', 'server');
14 | fopen(s);
15 | 
16 | count=0;
17 | % main loop
18 | while count<100 % can be changed
19 |     while(1) % loop until some data has been read
20 |         nBytes = get(s,'BytesAvailable');
21 |         if nBytes>0
22 |             break;
23 |         end
24 |     end
25 |     command = fread(s,nBytes);
26 |     data=str2num(char(command()'));
27 |     if isempty(data)
28 |         data=0;
29 |     end
30 |     % set a parameter in the simulink model using the data received from python
31 |     set_param('example/K','Gain',num2str(data));
32 | 
33 |     % run the simulink model for one step
34 |     set_param(gcs, 'SimulationCommand', 'step');
35 | 
36 |     % pause the simulink model and send some data to python
37 |     pause(0.1);
38 |     u=states.data(end,:);
39 |     fwrite(s, jsonencode(u));
40 |     count=count+1;
41 | end
42 | fclose(s);
43 | 
44 | 
--------------------------------------------------------------------------------
/tcp_connection_attempt/tcp_client.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import socket
5 | import sys
6 | import time
7 | import json
8 | 
9 | def tcp_sim():
10 | 
11 |     server_port = ('127.0.0.1', 54320)
12 | 
13 |     try:
14 |         # create an AF_INET, STREAM socket (TCP)
15 |         sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
16 |     except:
17 |         print('Failed to create a socket. ')
18 |         sys.exit()
19 |     print('Socket Created :)')
20 | 
21 |     sock.connect(server_port)
22 |     print("start a client")
23 |     time.sleep(0.01)
24 | 
25 |     s = "start"
26 |     s = bytes(s, encoding = "utf8")
27 |     sock.send(s)
28 |     print(s)
29 | 
30 | 
31 |     while 1:
32 |         buf = sock.recv(1000)
33 |         #print(buf)
34 |         buf_l = json.loads(buf)
35 |         print(buf_l)
36 |         control_signal = buf_l[0]*1
37 |         # s=str(control_signal)
38 |         s = bytes(str(control_signal), encoding = "utf8")
39 |         sock.send(s)
40 |         print(s)
41 |         #sock.send(control_signal)
42 |         #print(control_signal)
43 |     sock.close()
44 | 
45 | 
46 | 
47 | 
48 | if __name__ == '__main__':
49 | 
50 |     tcp_sim()
51 | 
--------------------------------------------------------------------------------
/tcp_connection_attempt/tcp_service.py:
--------------------------------------------------------------------------------
1 | import socket
2 | import time
3 | import json
4 | 
5 | sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM) # IPv4, TCP
6 | sock.bind(('127.0.0.1', 54320)) # bind ip and port; bind takes a tuple
7 | sock.listen(5) # listen with a backlog of 5, so up to 5+1 clients can be connected to the server
8 | 
9 | print("start server")
10 | connection, address = sock.accept() # wait for a client request
11 | print("client ip is:",address) # print the client address
12 | 
13 | s = bytes("start", encoding = "utf8") # send "start" to kick things off
14 | connection.send(s)
15 | print(s)
16 | 
17 | while True:
18 |     buf = connection.recv(1000) # receive data
19 |     print(buf)
20 |     buf_l = json.loads(buf)
21 |     control_signal = buf_l[0]*1
22 |     # s=str(control_signal)
23 |     connection.send(bytes(str(control_signal), encoding = "utf8"))
24 |     #connection.close() # close the connection
25 |     time.sleep(1)
26 | sock.close() # close the server
27 | 
--------------------------------------------------------------------------------
/test/test_cartpole/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/test/test_cartpole/__init__.py
--------------------------------------------------------------------------------
/test/test_cartpole/dqn_test.py:
--------------------------------------------------------------------------------
1 | from environment import Environment
2 | import time
3 | env = Environment("test_dqn")
4 | env.create_sockets_server()
5 | print(env.reset())
6 | print(env.step(0),' 0')
7 | print(env.step(1),' 1')
8 | print(env.step(2),' 2')
9 | time.sleep(1)
10 | print(env.step(0),' 0')
11 | time.sleep(2)
12 | print(env.step(1),' 1')
13 | time.sleep(3)
14 | print(env.step(2),' 2')
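The test driver above needs a live Simulink model on the other end of the two sockets. For a quick protocol check without MATLAB, a throwaway stand-in along the following lines can answer it (a sketch only: fake_simulink.py and its toy dynamics are invented here, and it assumes the framing used by environment.py below, one uint32 action read from port 50000 and bursts of float64 triples written to port 50001):

```python
# fake_simulink.py -- hypothetical helper, not part of the repo.
import socket
import struct
import time

action_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
action_sock.connect(('localhost', 50000))   # Environment sends actions here
time.sleep(0.5)                             # give Environment time to open its second listener
state_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
state_sock.connect(('localhost', 50001))    # Environment reads states here

T1, Tmix, Treturn = 16.0, 23.0, 16.0
while True:
    raw = action_sock.recv(4)                # one uint32 action per step
    if not raw:
        break
    (action,) = struct.unpack("I", raw)
    T1 += (action - 1) * 0.1                 # toy dynamics: 0 cools, 1 holds, 2 heats
    # reply with four accumulated samples, mimicking Simulink flushing its buffer
    state_sock.sendall(struct.pack("12d", *([T1, Tmix, Treturn] * 4)))
```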
--------------------------------------------------------------------------------
/test/test_cartpole/environment.py:
--------------------------------------------------------------------------------
1 | import socket
2 | import sys
3 | import struct
4 | import array
5 | import time
6 | import random
7 | import numpy as np
8 | 
9 | class Environment:
10 |     def __init__(self, env_name):
11 |         self.env_name = env_name
12 |         self.sendConn = 0 # socket object used for sending actions
13 |         self.send_and_recv_host = 'localhost'
14 |         self.sendPort = 50000
15 |         self.recvConn = 0 # socket object used for receiving data
16 |         self.recvPort = 50001
17 | 
18 |         self.current_state = [0,0,0]
19 | 
20 |     # create the socket servers
21 |     def create_sockets_server(self):
22 | 
23 |         # create the sending-side socket server
24 |         sockets_server_send = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
25 |         sockets_server_send.bind((self.send_and_recv_host, self.sendPort))
26 |         print('bind server port success')
27 | 
28 |         sockets_server_send.listen(1)
29 |         print("Wait 20 seconds for a response from client to server {} ".format(self.sendPort))
30 |         sockets_server_send.settimeout(20)
31 | 
32 |         try:
33 |             self.sendConn, addr = sockets_server_send.accept()
34 |         except socket.timeout:
35 |             print("connection timeout")
36 |             sys.exit()
37 |         print("Server connection success! Address: {}, port: {}".format(addr, self.sendPort))
38 | 
39 |         # create the receiving-side socket server
40 |         sockets_server_recv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
41 |         sockets_server_recv.bind((self.send_and_recv_host, self.recvPort))
42 |         print("The receiver port binding succeeded")
43 | 
44 |         sockets_server_recv.listen(1)
45 |         self.recvConn, addr = sockets_server_recv.accept()
46 |         print("receiving connection success! Address: {}, port: {}".format(addr, self.recvPort))
47 | 
48 |     # send the action value to simulink
49 |     def _send_action(self, action):
50 |         action = struct.pack("I", action)
51 |         self.sendConn.sendall(action)
52 | 
53 |     # receive the state information coming from simulink
54 |     def _receive_state(self):
55 |         data = self.recvConn.recv(2048)
56 |         data = array.array('d', data)
57 |         # average the batch of accumulated state samples that was returned
58 |         data = np.array(data).reshape(-1,3)
59 |         print(data.shape[0])
60 |         return list(data.mean(axis=0))
61 | 
62 |     # calculate the reward
63 |     def _calculate_reward(self):
64 |         T1, Tmix, Treturn = self.current_state[0],self.current_state[1],self.current_state[2]
65 |         room_goal = 22 # target temperature of room one
66 |         room_LL = 15 # lower limit
67 |         room_UL = 29 # upper limit
68 |         Tmix_LL = 15 # lower limit
69 |         Tmix_UL = 44 # upper limit
70 | 
71 |         distance = abs(room_goal - T1)
72 |         if distance > 7 or Tmix < Tmix_LL or Tmix > Tmix_UL:
73 |             reward = -1
74 |         elif 0.5 < distance <= 7:
75 |             reward = (7 - distance) * 0.5
76 |         else:
77 |             reward = 7 - distance
78 | 
79 |         return reward
80 | 
81 |     def step(self, action):
82 |         self._send_action(action)
83 |         time.sleep(0.1)
84 |         env_values = self._receive_state()
85 |         if env_values is not None:
86 |             self.current_state = env_values
87 |         reward = self._calculate_reward()
88 |         done = False
89 |         info = "normal"
90 |         return self.current_state, reward, done, info
91 | 
92 |     def reset(self):
93 |         self._send_action(random.randint(0,2))
94 |         time.sleep(0.1)
95 |         env_values = self._receive_state()
96 |         if env_values is not None:
97 |             self.current_state = env_values
98 |         print("current state T1: {} ,Tmix: {} ,Treturn: {} ".format(self.current_state[0],self.current_state[1],self.current_state[2]))
99 |         return self.current_state
100 | 
101 | 
--------------------------------------------------------------------------------
/test/test_origin/__init__.py:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/test/test_origin/__init__.py -------------------------------------------------------------------------------- /test/test_origin/dqn_test.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/test/test_origin/dqn_test.png -------------------------------------------------------------------------------- /test/test_origin/dqn_test.py: -------------------------------------------------------------------------------- 1 | from environment import Environment 2 | import os 3 | import matplotlib.pyplot as plt 4 | import numpy as np 5 | import random 6 | import torch 7 | import torch.nn as nn 8 | import torch.optim as optim 9 | from torch.nn.utils import clip_grad_norm_ 10 | 11 | 12 | class ReplayBuffer: 13 | def __init__(self, column, max_size, batch_size): 14 | self.current_state = np.zeros((max_size, column), dtype=np.float32) 15 | self.next_state = np.zeros((max_size, column), dtype=np.float32) 16 | self.action = np.zeros(max_size, dtype=np.float32) 17 | self.reward = np.zeros(max_size, dtype=np.float32) 18 | self.done = np.zeros(max_size,dtype=np.float32) 19 | self.max_size, self.batch_size = max_size, batch_size 20 | self.size, self.current_index = 0, 0 21 | 22 | def store(self, current_state, action, next_state, reward, done): 23 | self.current_state[self.current_index] = current_state 24 | self.action[self.current_index] = action 25 | self.next_state[self.current_index] = next_state 26 | self.reward[self.current_index] = reward 27 | self.done[self.current_index] = done 28 | self.current_index = (self.current_index + 1) % self.max_size 29 | self.size = min(self.size + 1, self.max_size) 30 | 31 | def sample_batch(self): 32 | ptr = np.random.choice(self.size, self.batch_size) 33 | return dict(current_state=self.current_state[ptr], 34 | next_state=self.next_state[ptr], 35 | action=self.action[ptr], 36 | reward=self.reward[ptr], 37 | done=self.done[ptr] 38 | ) 39 | 40 | def __len__(self): 41 | return self.size 42 | 43 | 44 | class Network(nn.Module): 45 | def __init__(self, in_dim, out_dim): 46 | super(Network, self).__init__() 47 | 48 | self.layers = nn.Sequential( 49 | nn.Linear(in_dim, 128), 50 | nn.ReLU(), 51 | nn.Linear(128, 128), 52 | nn.ReLU(), 53 | nn.Linear(128, out_dim) 54 | ) 55 | def forward(self,x): 56 | return self.layers(x) 57 | 58 | 59 | min_epsilon = 0.05 60 | max_epsilon = 1 61 | epsilon_decay = 80 62 | epsilon_episode = lambda episode : min_epsilon + np.exp(-episode / epsilon_decay)*0.95 63 | 64 | 65 | env = Environment("test_you") 66 | state_space = 3 67 | action_space = 3 68 | 69 | batch_size = 32 70 | max_size = 1000 71 | memory = ReplayBuffer(state_space, max_size, batch_size) 72 | 73 | device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu") 74 | 75 | network = Network(state_space,action_space).to(device) 76 | target_network = Network(state_space,action_space).to(device) 77 | target_network.load_state_dict(network.state_dict()) 78 | target_network.eval() 79 | 80 | optimizer = optim.Adam(network.parameters()) 81 | 82 | gamma = 0.99 83 | target_update = 200 84 | 85 | 86 | def select_action(episode, state): 87 | if np.random.random_sample() > epsilon_episode(episode): 88 | selected_action = 
network(torch.FloatTensor(state).to(device)).argmax().detach().item() 89 | else: 90 | selected_action = random.randint(0,2) 91 | return selected_action 92 | 93 | 94 | def train(): 95 | samples = memory.sample_batch() 96 | state = torch.FloatTensor(samples["current_state"]).to(device).to(device) 97 | next_state = torch.FloatTensor(samples["next_state"]).to(device) 98 | action = torch.LongTensor(samples["action"].reshape(-1, 1)).to(device) 99 | reward = torch.FloatTensor(samples["reward"].reshape(-1, 1)).to(device) 100 | done = torch.FloatTensor(samples["done"].reshape(-1, 1)).to(device) 101 | 102 | current_Q_value = network(state).gather(1, action) 103 | next_Q_value = target_network(next_state).max(dim=1,keepdim=True)[0].detach() 104 | target = (reward + gamma*next_Q_value*(1 - done)).to(device) 105 | loss = ((target - current_Q_value).pow(2)).mean() 106 | optimizer.zero_grad() 107 | loss.backward() 108 | clip_grad_norm_(network.parameters(),1.0,norm_type=1) # Gradient clipping(增加稳定性) 109 | optimizer.step() 110 | return loss.item() 111 | 112 | 113 | def plot_and_save(frame_idx, rewards, losses): 114 | rewards_factor = 10 115 | losses_smooth_x = np.arange(len(losses)) 116 | losses_smooth = [np.mean(losses[i:i+rewards_factor]) if i > rewards_factor else np.mean(losses[0:i+1]) 117 | for i in range(len(losses))] 118 | rewards_smooth_x = np.arange(len(rewards)) 119 | rewards_smooth = [np.mean(rewards[i:i+rewards_factor]) if i > rewards_factor else np.mean(rewards[0:i+1]) 120 | for i in range(len(rewards))] 121 | 122 | for i in range(len(losses)//3000): 123 | losses_smooth = losses_smooth[::2] 124 | losses_smooth_x = losses_smooth_x[::2] 125 | for i in range(len(rewards)//200): 126 | rewards_smooth = rewards_smooth[::2] 127 | rewards_smooth_x = rewards_smooth_x[::2] 128 | 129 | plt.figure(figsize=(18,10)) 130 | plt.subplot(211) 131 | plt.xlabel("episode") 132 | plt.ylabel("episode_rewards") 133 | plt.title('episode %s. 
rewards: %s' % (frame_idx, rewards[-1]))
134 |     plt.plot(rewards, label="Rewards",color='lightsteelblue',linewidth='1')
135 |     plt.plot(rewards_smooth_x, rewards_smooth,
136 |              label="Smoothed_Rewards",color='darkorange',linewidth='3')
137 |     plt.legend(loc='best')
138 | 
139 |     plt.subplot(212)
140 |     plt.title('loss')
141 |     plt.plot(losses,label="Losses",color='lightsteelblue',linewidth='1')
142 |     plt.plot(losses_smooth_x, losses_smooth,
143 |              label="Smoothed_Losses",color='darkorange',linewidth='3')
144 |     plt.legend(loc='best')
145 | 
146 |     plt.savefig("dqn_test.png")
147 | 
148 | all_rewards = []
149 | losses = []
150 | update_count = 0
151 | 
152 | env.create_sockets_server()
153 | state = env.reset()
154 | for episode in range(200):
155 |     rewards = 0
156 |     for i in range(100):
157 |         action = select_action(episode, state)
158 |         next_state, reward, done, _ = env.step(action)
159 |         print("episode:",episode,"state:",next_state,"reward:", reward,"action:",action )
160 |         memory.store(state, action, next_state, reward, done)
161 |         state = next_state
162 |         rewards += reward
163 |         #if done:
164 |             #break
165 |         if len(memory) > batch_size:
166 |             loss = train()
167 |             update_count += 1
168 |             losses.append(loss)
169 |             if update_count % target_update == 0:
170 |                 target_network.load_state_dict(network.state_dict())
171 |     all_rewards.append(rewards)
172 |     plot_and_save(episode , [round(all_rewards[i], 3) for i in range(len(all_rewards))], losses)
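The exploration schedule defined near the top of dqn_test.py starts fully random and decays towards min_epsilon = 0.05. A quick check of a few of its values (a sketch; outputs rounded to three decimals):

```python
import numpy as np

min_epsilon, epsilon_decay = 0.05, 80
epsilon_episode = lambda episode: min_epsilon + np.exp(-episode / epsilon_decay) * 0.95

for episode in (0, 40, 80, 160):
    print(episode, round(float(epsilon_episode(episode)), 3))
# 0 1.0
# 40 0.626
# 80 0.399
# 160 0.179
```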
--------------------------------------------------------------------------------
/test/test_origin/environment.py:
--------------------------------------------------------------------------------
1 | import socket
2 | import sys
3 | import struct
4 | import array
5 | import time
6 | import random
7 | import numpy as np
8 | 
9 | class Environment:
10 |     def __init__(self, env_name):
11 |         self.env_name = env_name
12 |         self.sendConn = 0 # socket object used for sending actions
13 |         self.send_and_recv_host = 'localhost'
14 |         self.sendPort = 50000
15 |         self.recvConn = 0 # socket object used for receiving data
16 |         self.recvPort = 50001
17 | 
18 |         self.current_state = [0,0,0]
19 | 
20 |     # create the socket servers
21 |     def create_sockets_server(self):
22 | 
23 |         # create the sending-side socket server
24 |         sockets_server_send = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
25 |         sockets_server_send.bind((self.send_and_recv_host, self.sendPort))
26 |         print('bind server port success')
27 | 
28 |         sockets_server_send.listen(1)
29 |         print("Wait 20 seconds for a response from client to server {} ".format(self.sendPort))
30 |         sockets_server_send.settimeout(20)
31 | 
32 |         try:
33 |             self.sendConn, addr = sockets_server_send.accept()
34 |         except socket.timeout:
35 |             print("connection timeout")
36 |             sys.exit()
37 |         print("Server connection success! Address: {}, port: {}".format(addr, self.sendPort))
38 | 
39 |         # create the receiving-side socket server
40 |         sockets_server_recv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
41 |         sockets_server_recv.bind((self.send_and_recv_host, self.recvPort))
42 |         print("The receiver port binding succeeded")
43 | 
44 |         sockets_server_recv.listen(1)
45 |         self.recvConn, addr = sockets_server_recv.accept()
46 |         print("receiving connection success! Address: {}, port: {}".format(addr, self.recvPort))
47 | 
48 |     # send the action value to simulink
49 |     def _send_action(self, action):
50 |         action = struct.pack("I", action)
51 |         self.sendConn.sendall(action)
52 | 
53 |     # receive the state information coming from simulink
54 |     def _receive_state(self):
55 |         data = self.recvConn.recv(2048)
56 |         data = array.array('d', data)
57 |         # average the batch of accumulated state samples that was returned
58 |         data = np.array(data).reshape(-1,3)
59 |         data = list(data.mean(axis=0))
60 |         data = [round(i, 2) for i in data]
61 |         return data
62 | 
63 |     # calculate the reward
64 |     def _calculate_reward(self):
65 |         T1, Tmix, Treturn = self.current_state[0],self.current_state[1],self.current_state[2]
66 |         room_goal = 22 # target temperature of room one
67 |         room_LL = 15 # lower limit
68 |         room_UL = 29 # upper limit
69 |         Tmix_LL = 15 # lower limit
70 |         Tmix_UL = 44 # upper limit
71 | 
72 |         distance = abs(room_goal - T1)
73 |         if Tmix < Tmix_LL or Tmix > Tmix_UL :
74 |             if distance >= 7 :
75 |                 reward = -1 * distance * 0.1
76 |             else :
77 |                 reward = -1
78 |         else :
79 |             if distance >= 7 :
80 |                 reward = -1 * distance * 0.1
81 |             elif 5 < distance < 7:
82 |                 reward = (7 - distance) * 0.2
83 |             elif 1 < distance <= 5:
84 |                 reward = (7 - distance) * 0.5
85 |             elif 0 <= distance <= 1:
86 |                 reward = 7 - distance
87 | 
88 |         return round(reward, 3)
89 | 
90 |     def step(self, action):
91 |         self._send_action(action)
92 |         time.sleep(0.1)
93 |         env_values = self._receive_state()
94 |         if env_values is not None:
95 |             self.current_state = env_values
96 |         reward = self._calculate_reward()
97 |         done = False
98 |         info = "normal"
99 |         return self.current_state, reward, done, info
100 | 
101 |     def reset(self):
102 |         self._send_action(random.randint(0,2))
103 |         time.sleep(0.1)
104 |         env_values = self._receive_state()
105 |         if env_values is not None:
106 |             self.current_state = env_values
107 |         print("current state T1: {} ,Tmix: {} ,Treturn: {} ".format(self.current_state[0],self.current_state[1],self.current_state[2]))
108 |         return self.current_state
109 | 
110 | 
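A quick spot check of the reward shaping above (a sketch; Environment.__init__ only assigns attributes, so _calculate_reward can be exercised without opening any sockets):

```python
from environment import Environment   # the test_origin version above

env = Environment("reward_check")
for state in ([22.0, 30.0, 20.0],   # T1 on target
              [18.5, 30.0, 20.0],   # 3.5 degrees off
              [14.0, 30.0, 20.0],   # far off: distance >= 7
              [22.0, 50.0, 20.0]):  # Tmix outside [15, 44]
    env.current_state = state
    print(state, env._calculate_reward())
# [22.0, 30.0, 20.0] 7.0
# [18.5, 30.0, 20.0] 1.75
# [14.0, 30.0, 20.0] -0.8
# [22.0, 50.0, 20.0] -1
```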
--------------------------------------------------------------------------------
/test/test_origin/readme.md:
--------------------------------------------------------------------------------
1 | ### Step1
2 | 
3 | 1. Found that apart from the first exchange, which carries 10 state samples, every later exchange carries 4 state samples.
4 | 
5 | 2. After averaging them, it turned out that taking the same action can still produce states with different trends, so the environment is not stable; the plan is to get this running first and only then try the CartPole environment.
6 | 
7 | 3. The log of the dqn_test.py test script looks like this:
8 | 
9 | >bind server port success
10 | Wait 20 seconds for a response from client to server 50000
11 | Server connnection success ! Address :('127.0.0.1', 48334) , port :50000
12 | The receiver port binding success
13 | receiving connect success!Address :('127.0.0.1', 40358) ,port :50001
14 | 1
15 | current state T1: 16.0 ,Tmix: 23.2 ,Treturn: 16.0
16 | [16.0, 23.2, 16.0]
17 | 1
18 | ([15.0, 22.400000000000002, 16.88], 0.0, False, 'normal') 0
19 | 10
20 | ([14.343, 18.060000000000002, 16.586000000000002], -1, False, 'normal') 1
21 | 4
22 | ([14.245000000000001, 17.000000000000004, 15.962499999999999], -1, False, 'normal') 2
23 | 4
24 | ([14.21, 20.2, 16.487499999999997], -1, False, 'normal') 0
25 | 4
26 | ([14.200000000000001, 19.400000000000002, 16.7325], -1, False, 'normal') 1
27 | 4
28 | ([14.192499999999999, 20.2, 16.5375], -1, False, 'normal') 2
29 | 
30 | ### Step2
31 | 
32 | 1. Adapting the CartPole environment will take some time, so the origin environment is used for testing first; the findings are points 2-5 below.
33 | 
34 | 2. Reducing the step size improves the accuracy of the results, but the total simulated time also gets longer, so the step size cannot be used to change how many simulated steps one exchange covers. Changing the number of simulated steps per exchange means changing the duration of one step, which is what the Pacing option is opened and tuned for.
35 | 
36 | 3. Blocking communication (tick "Enable blocking mode"): Simulink can only advance after it has received the action sent by Python, and Python waits 0.1 s after sending the action before it reads the state sent back by Simulink.
37 | 
38 | 4. The TCP/IP Receive block has two outputs, data and status; status (0/1) indicates whether anything was received.
39 | 
40 | 5. The block sample time, i.e. the simulation step of this block, only has to lie between the max step size and the min step size.
41 | 
42 | ### Questions
43 | 
44 | 1. The reward curve fluctuates a lot. It turns out that an unstable reward is normal here: there are several evaluation criteria, and as soon as one of them goes out of range a minus sign appears and gets multiplied by the measured quantity, so the reward swings. On top of that, the plots use the total reward per episode, which amplifies the swings. Also, in gym environments an episode that keeps acting badly gets terminated ("done") early, so that epoch's data look especially poor. None of this really matters, because such data can still be learned from once they are in the replay buffer; for learning, mistakes matter more than correct behaviour, and both machines and people learn from mistakes.
45 | **Learn from correct behaviour? Or rather, create mistakes out of correct behaviour and learn from those?**
46 | 
47 | 2. Vanishing gradients are also closely tied to how the reward is designed. Roughly speaking, wherever there is a difference in reward there is a gradient. Suppose some evaluation quantity has to change by two units before the reward changes by one unit, say the quantity moves from 18.33 to 18.35 before the reward moves from 1.81 to 1.82, a two-to-one relation, and after passing through the loss computation this reward difference becomes larger or smaller still (**how exactly does the loss function shape the gradient of the reward?**). If the reward depends only on that quantity, the gradient can vanish. The consequence is that convergence becomes hard, i.e. the network does not learn quickly. And if the problem cannot be solved gradually (by following those gradients), the environment drifts back from its current state, i.e. it regresses. Why? With no gradient around, the agent has to create one in order to collect more reward: since the region it is in is flat, it walks back, borrows the gradient of a few earlier states for momentum, and tries to rush through to the next higher-reward state. That raises the next problem: how far back should the state go? Go back too far and it may get stuck cycling through a few states of that stage, the so-called **local optimum**. A basin is the best picture for a local optimum: the speed was not enough to climb out, so it rolls back down to the other side of the basin to use that slope to pick up speed, and whether it clears the rim next time depends on whether the speed is enough. But there is more than one such basin, and in a high-dimensional space they sit in different dimensions. So when the reward keeps dropping after reaching some point, do not panic: the network may be working out how much speed, that is how much gradient, it needs to clear the basin, i.e. how far to back off. What it may not know is that this makes the problem even harder, because the data keep changing and the network changes after every update, so it may no longer even find the original basin. **Why would the network's gradient choose to back off at all?**
48 | Of course, with enough data none of this is an issue: with a large enough sample (for example DQN's batch sampling), and as long as the data themselves are sound, the problem largely goes away; enough data, and the gradient is naturally there.
49 | 
50 | 3. This also explains nicely the earlier observation that a model which trained well can perform poorly when actually deployed: during the last many epochs of training, the experience used consists of well-played transitions, so most of the mapping stored in the network ends up occupied by those well-played experiences. **Could the already-trained part of the network be frozen, with another network used for control?**
--------------------------------------------------------------------------------