├── .gitignore ├── LICENSE ├── README.md ├── assets ├── demo.png ├── train_success1.png └── train_success2.png ├── attention_rl.md ├── log.txt ├── rl_demo └── CartPoleSimscapeModel.m ├── some_simulink_model ├── SHTL1_TestLevel_2_MODEL.slx ├── SHTL1_TestLevel_2_MODEL.slx.autosave ├── example.slx ├── rlCartPoleSimscapeModel.slx └── rlSimplePendulumModel.slx ├── tcp_connection_attempt ├── matlab_client.m ├── matlab_service.m ├── tcp_client.py └── tcp_service.py └── test ├── test_cartpole ├── __init__.py ├── dqn_test.py └── environment.py └── test_origin ├── __init__.py ├── dqn_test.png ├── dqn_test.py ├── environment.py └── readme.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | # Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | .mypy_cache/ 105 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 
134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 
193 | You may obtain a copy of the License at
194 | 
195 | http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # simulink_python
2 | 
3 | Simulate the environment with Simulink and write the reinforcement learning code in Python.
4 | 
5 | ## Quick start
6 | 
7 | ## Project overview
8 | 
9 | * ##### TCP communication module test
10 | 
11 |   MATLAB and Python talk to each other locally over TCP in blocking mode: MATLAB only advances the Simulink simulation after it has received a message from the Python side (the simulation step-size problem is still unsolved).
12 | 
13 |   Both directions were tried, with MATLAB and Python each acting as client and server. With MATLAB as the client, simulating 100 steps takes about 20 s; with Python as the client, 100 steps take about 2 min. The test code is [here](./tcp_connection_attempt).
14 | 
15 |   ![image](./assets/demo.png)
16 | 
17 | * ##### RL module test
18 | 
19 |   ~~Used the classic [CartPole](./some_simulink_model/rlCartPoleSimscapeModel.slx) model~~
20 | 
21 |   After debugging it without success, the plan is to first try this [project](https://github.com/qLience/AI-Pump-for-Underfloor-Heating-systems) instead.
22 | 
23 | * ##### Trying out that project
24 | 
25 |   The project is missing the 'svdutilitieslib' MATLAB library and prompts to install the 'Embedded Coder Support Package for ARM Cortex-A Processors'; after installing it, MATLAB would no longer open.
26 | 
27 |   Changing the Chinese user name to an English one fixed that. Once MATLAB opened again, the installed package turned out to have no effect, so other potentially useful packages are being installed.
28 | 
29 | * ##### Deploying MATLAB on a server
30 | 
31 |   Found two good blog posts, one on [how to install it]() and one on [using MATLAB more effectively](); if the local install stays unusable, the MATLAB on the server will be used instead.
32 | 
33 | * ##### Debugging and starting training
34 | 
35 |   Ran both the PyTorch and the TensorFlow versions of DQN.
36 | 
37 |   1. The PyTorch version is very outdated, so it will be rewritten from scratch.
38 |   2. The TensorFlow version itself is fine, but it fails with a permission error; the plan is to chmod that directory. The GPU seems not free right now, so the run is postponed.
39 |   3. The error logs of both runs are [here](log.txt). B.T.W., opening MATLAB on the server takes about 10 s; apart from being less convenient to operate, there is no real downside.
40 | 
41 | * ##### Training succeeded
42 | 
43 |   After fixing the directory permissions, the TensorFlow version ran through, although it still reported a "Write failed because file could not be opened" error. The next step is to make some modifications (to both the environment and the network).
44 | 
45 |   Screenshots:
46 | 
47 |   ![train_success1](./assets/train_success1.png)
48 | 
49 |   ![train_success2](./assets/train_success2.png)
50 | 
51 | * [Some papers on rl_attention](./attention_rl.md), to read when there is time.
52 | 
53 | * ##### Rewriting the code and training
54 | 
55 |   1. The code lives in the [test folder](./test); for now the action space and the state space are both set to 3. The communication frequency and intermittent communication still need to be handled. After that, the CartPole model written in MATLAB will be used for testing.
56 |   2. One problem found: when Python reads the state, Simulink sends every state sample accumulated since the last exchange, while Simulink acts as soon as it receives Python's action. The current workaround is to average all accumulated state samples into one state (see the sketch after this README); as a side effect this also solves the problem of keeping Simulink idle between actions.
57 |   3. The temperature-control test program has finished running; its code is in the test folder.
58 |   4. Solved a few issues: 1) the communication mode can be chosen freely, and stop-and-wait is used for now; the Simulink side sets a timeout and raises an error if Python does not reply within it, and Simulink does not simulate while it waits. 2) The time covered by each step is adjusted through the Pacing option, so that each exchange simulates exactly one step.
59 | 
60 | * ##### Tips:
61 | 
62 |   A Google search turned up no RL project that builds its model with Pymodelica, so Modelica will not be the starting point.
63 | 
64 |   [A survey on attention](./attention_rl)
65 | 
66 | ### References
67 | 
68 | * [UDP & TCP communication test]()
69 | * [Calling Simulink & communication]()
70 | * [Create Simulink Environments for Reinforcement Learning]()
71 | * [Load Predefined Simulink Environments]()
72 | * [Train DDPG Agent to Swing Up and Balance Cart-Pole System]()
73 | 
74 | * [qLience / AI-Pump-for-Underfloor-Heating-systems](https://github.com/qLience/AI-Pump-for-Underfloor-Heating-systems)
75 | 
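The averaging mentioned in item 2 of the "Rewriting the code and training" list above boils down to a couple of NumPy calls. A minimal sketch of the idea (assuming, as in the environment.py files under test/, that every sample consists of three float64 values; the helper name is made up):

```python
import array
import numpy as np

def average_states(raw_bytes):
    """Collapse a burst of accumulated Simulink samples into a single state."""
    samples = np.array(array.array('d', raw_bytes)).reshape(-1, 3)  # one row per 3-value sample
    return list(samples.mean(axis=0))                               # element-wise average
```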
--------------------------------------------------------------------------------
/assets/demo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/assets/demo.png
--------------------------------------------------------------------------------
/assets/train_success1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/assets/train_success1.png
--------------------------------------------------------------------------------
/assets/train_success2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/assets/train_success2.png
--------------------------------------------------------------------------------
/attention_rl.md:
--------------------------------------------------------------------------------
1 | 1. [Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms]()
2 | 
3 | 2. [Reinforcement Learning with Attention that Works: A Self-Supervised Approach]()
4 | 
5 | 3. [VMAV-C: A Deep Attention-based Reinforcement Learning Algorithm for Model-based Control]()
6 | 
7 | 4. [Attention and Reinforcement Learning: Constructing Representations from Indirect Feedback]()
8 | 
9 | 5. [Better deep visual attention with reinforcement learning in action recognition]()
10 | 
11 | 6. [Attention-based Deep Reinforcement Learning for Multi-view Environments]()
12 | 
13 | 7. [Recurrent Models of Visual Attention]()
14 | 
15 | 
--------------------------------------------------------------------------------
/log.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/log.txt
--------------------------------------------------------------------------------
/rl_demo/CartPoleSimscapeModel.m:
--------------------------------------------------------------------------------
1 | open_system('example')
--------------------------------------------------------------------------------
/some_simulink_model/SHTL1_TestLevel_2_MODEL.slx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/SHTL1_TestLevel_2_MODEL.slx
--------------------------------------------------------------------------------
/some_simulink_model/SHTL1_TestLevel_2_MODEL.slx.autosave:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/SHTL1_TestLevel_2_MODEL.slx.autosave
--------------------------------------------------------------------------------
/some_simulink_model/example.slx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/example.slx
--------------------------------------------------------------------------------
/some_simulink_model/rlCartPoleSimscapeModel.slx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/rlCartPoleSimscapeModel.slx
--------------------------------------------------------------------------------
/some_simulink_model/rlSimplePendulumModel.slx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/some_simulink_model/rlSimplePendulumModel.slx
--------------------------------------------------------------------------------
/tcp_connection_attempt/matlab_client.m:
--------------------------------------------------------------------------------
1 | clc;
2 | clear;
3 | close all;
4 | open_system('../some_simulink_model/example.slx')
5 | 
6 | sim_time_step = 0.01;
7 | 
8 | % start the simulation and pause it, waiting for a signal from python
9 | set_param(gcs,'SimulationCommand','start','SimulationCommand','pause');
10 | 
11 | % open a TCP client connection to the python server (the commented line is the server variant)
12 | %s = tcpip('127.0.0.1', 54320, 'NetworkRole', 'server');
13 | s = tcpip('127.0.0.1', 54320, 'Timeout', 60,'InputBufferSize',10240);
14 | fopen(s);
15 | % main loop
16 | while(1) % can be changed
17 |     while(1) % loop until some data has been read
18 |         nBytes = get(s,'BytesAvailable');
19 |         if nBytes>0
20 |             break;
21 |         end
22 |     end
23 |     command = fread(s,nBytes);
24 |     data=str2num(char(command()'));
25 |     if isempty(data)
26 |         data=0;
27 |     end
28 | 
29 |     % set a parameter in the simulink model using the data received from python
30 |     set_param('example/K','Gain',num2str(data))
31 | 
32 |     % run the simulink model for one step
33 |     set_param(gcs, 'SimulationCommand', 'step');
34 | 
35 |     % pause the simulink model and send some data to python
36 |     pause(1);
37 |     u=states.data(end,:);
38 |     fwrite(s, jsonencode(u));
39 | end
40 | 
41 | 
42 | 
--------------------------------------------------------------------------------
/tcp_connection_attempt/matlab_service.m:
--------------------------------------------------------------------------------
1 | clc;
2 | clear;
3 | close all;
4 | mdl = '../some_simulink_model/example.slx';
5 | open_system(mdl)
6 | 
7 | sim_time_step = 0.01;
8 | 
9 | % start the simulation and pause it, waiting for a signal from python
10 | set_param(gcs,'SimulationCommand','start','SimulationCommand','pause');
11 | 
12 | % open a server; it will block until a client connects to it
13 | s = tcpip('127.0.0.1', 54320, 'NetworkRole', 'server');
14 | fopen(s);
15 | 
16 | count=0;
17 | % main loop
18 | while count<100 % can be changed
19 |     while(1) % loop until some data has been read
20 |         nBytes = get(s,'BytesAvailable');
21 |         if nBytes>0
22 |             break;
23 |         end
24 |     end
25 |     command = fread(s,nBytes);
26 |     data=str2num(char(command()'));
27 |     if isempty(data)
28 |         data=0;
29 |     end
30 |     % set a parameter in the simulink model using the data received from python
31 |     set_param('example/K','Gain',num2str(data));
32 | 
33 |     % run the simulink model for one step
34 |     set_param(gcs, 'SimulationCommand', 'step');
35 | 
36 |     % pause the simulink model and send some data to python
37 |     pause(0.1);
38 |     u=states.data(end,:);
39 |     fwrite(s, jsonencode(u));
40 |     count=count+1;
41 | end
42 | fclose(s);
43 | 
44 | 
--------------------------------------------------------------------------------
/tcp_connection_attempt/tcp_client.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import socket
5 | import sys
6 | import time
7 | import json
8 | 
9 | def tcp_sim():
10 | 
11 |     server_port = ('127.0.0.1', 54320)
12 | 
13 |     try:
14 |         # create an AF_INET, STREAM socket (TCP)
15 |         sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
16 |     except:
17 |         print('Failed to create a socket. ')
18 |         sys.exit()
19 |     print('Socket Created :)')
20 | 
21 |     sock.connect(server_port)
22 |     print("start a client")
23 |     time.sleep(0.01)
24 | 
25 |     s = "start"
26 |     s = bytes(s, encoding = "utf8")
27 |     sock.send(s)
28 |     print(s)
29 | 
30 | 
31 |     while 1:
32 |         buf = sock.recv(1000)
33 |         #print(buf)
34 |         buf_l = json.loads(buf)
35 |         print(buf_l)
36 |         control_signal = buf_l[0]*1
37 |         # s=str(control_signal)
38 |         s = bytes(str(control_signal), encoding = "utf8")
39 |         sock.send(s)
40 |         print(s)
41 |         #sock.send(control_signal)
42 |         #print(control_signal)
43 |     sock.close()
44 | 
45 | 
46 | 
47 | 
48 | if __name__ == '__main__':
49 | 
50 |     tcp_sim()
51 | 
--------------------------------------------------------------------------------
/tcp_connection_attempt/tcp_service.py:
--------------------------------------------------------------------------------
1 | import socket
2 | import time
3 | import json
4 | 
5 | sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM) # IPv4, TCP
6 | sock.bind(('127.0.0.1', 54320)) # bind ip and port; bind takes a tuple
7 | sock.listen(5) # listen with a backlog of 5, so up to 5+1 clients can be connected to the server
8 | 
9 | print("start server")
10 | connection, address = sock.accept() # wait for a client request
11 | print("client ip is:",address) # print the client address
12 | 
13 | s = bytes("start", encoding = "utf8") # send "start" to kick things off
14 | connection.send(s)
15 | print(s)
16 | 
17 | while True:
18 |     buf = connection.recv(1000) # receive data
19 |     print(buf)
20 |     buf_l = json.loads(buf)
21 |     control_signal = buf_l[0]*1
22 |     # s=str(control_signal)
23 |     connection.send(bytes(str(control_signal), encoding = "utf8"))
24 |     #connection.close() # close the connection
25 |     time.sleep(1)
26 | sock.close() # close the server
27 | 
--------------------------------------------------------------------------------
/test/test_cartpole/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/test/test_cartpole/__init__.py
--------------------------------------------------------------------------------
/test/test_cartpole/dqn_test.py:
--------------------------------------------------------------------------------
1 | from environment import Environment
2 | import time
3 | env = Environment("test_dqn")
4 | env.create_sockets_server()
5 | print(env.reset())
6 | print(env.step(0),' 0')
7 | print(env.step(1),' 1')
8 | print(env.step(2),' 2')
9 | time.sleep(1)
10 | print(env.step(0),' 0')
11 | time.sleep(2)
12 | print(env.step(1),' 1')
13 | time.sleep(3)
14 | print(env.step(2),' 2')
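The test driver above needs a live Simulink model on the other end of the two sockets. For a quick protocol check without MATLAB, a throwaway stand-in along the following lines can answer it (a sketch only: fake_simulink.py and its toy dynamics are invented here, and it assumes the framing used by environment.py below, one uint32 action read from port 50000 and bursts of float64 triples written to port 50001):

```python
# fake_simulink.py -- hypothetical helper, not part of the repo.
import socket
import struct
import time

action_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
action_sock.connect(('localhost', 50000))   # Environment sends actions here
time.sleep(0.5)                             # give Environment time to open its second listener
state_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
state_sock.connect(('localhost', 50001))    # Environment reads states here

T1, Tmix, Treturn = 16.0, 23.0, 16.0
while True:
    raw = action_sock.recv(4)                # one uint32 action per step
    if not raw:
        break
    (action,) = struct.unpack("I", raw)
    T1 += (action - 1) * 0.1                 # toy dynamics: 0 cools, 1 holds, 2 heats
    # reply with four accumulated samples, mimicking Simulink flushing its buffer
    state_sock.sendall(struct.pack("12d", *([T1, Tmix, Treturn] * 4)))
```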
--------------------------------------------------------------------------------
/test/test_cartpole/environment.py:
--------------------------------------------------------------------------------
1 | import socket
2 | import sys
3 | import struct
4 | import array
5 | import time
6 | import random
7 | import numpy as np
8 | 
9 | class Environment:
10 |     def __init__(self, env_name):
11 |         self.env_name = env_name
12 |         self.sendConn = 0 # socket object used for sending actions
13 |         self.send_and_recv_host = 'localhost'
14 |         self.sendPort = 50000
15 |         self.recvConn = 0 # socket object used for receiving data
16 |         self.recvPort = 50001
17 | 
18 |         self.current_state = [0,0,0]
19 | 
20 |     # create the socket servers
21 |     def create_sockets_server(self):
22 | 
23 |         # create the sending-side socket server
24 |         sockets_server_send = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
25 |         sockets_server_send.bind((self.send_and_recv_host, self.sendPort))
26 |         print('bind server port success')
27 | 
28 |         sockets_server_send.listen(1)
29 |         print("Wait 20 seconds for a response from client to server {} ".format(self.sendPort))
30 |         sockets_server_send.settimeout(20)
31 | 
32 |         try:
33 |             self.sendConn, addr = sockets_server_send.accept()
34 |         except socket.timeout:
35 |             print("connection timeout")
36 |             sys.exit()
37 |         print("Server connection success! Address: {}, port: {}".format(addr, self.sendPort))
38 | 
39 |         # create the receiving-side socket server
40 |         sockets_server_recv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
41 |         sockets_server_recv.bind((self.send_and_recv_host, self.recvPort))
42 |         print("The receiver port binding succeeded")
43 | 
44 |         sockets_server_recv.listen(1)
45 |         self.recvConn, addr = sockets_server_recv.accept()
46 |         print("receiving connection success! Address: {}, port: {}".format(addr, self.recvPort))
47 | 
48 |     # send the action value to simulink
49 |     def _send_action(self, action):
50 |         action = struct.pack("I", action)
51 |         self.sendConn.sendall(action)
52 | 
53 |     # receive the state information coming from simulink
54 |     def _receive_state(self):
55 |         data = self.recvConn.recv(2048)
56 |         data = array.array('d', data)
57 |         # average the batch of accumulated state samples that was returned
58 |         data = np.array(data).reshape(-1,3)
59 |         print(data.shape[0])
60 |         return list(data.mean(axis=0))
61 | 
62 |     # calculate the reward
63 |     def _calculate_reward(self):
64 |         T1, Tmix, Treturn = self.current_state[0],self.current_state[1],self.current_state[2]
65 |         room_goal = 22 # target temperature of room one
66 |         room_LL = 15 # lower limit
67 |         room_UL = 29 # upper limit
68 |         Tmix_LL = 15 # lower limit
69 |         Tmix_UL = 44 # upper limit
70 | 
71 |         distance = abs(room_goal - T1)
72 |         if distance > 7 or Tmix < Tmix_LL or Tmix > Tmix_UL:
73 |             reward = -1
74 |         elif 0.5 < distance <= 7:
75 |             reward = (7 - distance) * 0.5
76 |         else:
77 |             reward = 7 - distance
78 | 
79 |         return reward
80 | 
81 |     def step(self, action):
82 |         self._send_action(action)
83 |         time.sleep(0.1)
84 |         env_values = self._receive_state()
85 |         if env_values is not None:
86 |             self.current_state = env_values
87 |         reward = self._calculate_reward()
88 |         done = False
89 |         info = "normal"
90 |         return self.current_state, reward, done, info
91 | 
92 |     def reset(self):
93 |         self._send_action(random.randint(0,2))
94 |         time.sleep(0.1)
95 |         env_values = self._receive_state()
96 |         if env_values is not None:
97 |             self.current_state = env_values
98 |         print("current state T1: {} ,Tmix: {} ,Treturn: {} ".format(self.current_state[0],self.current_state[1],self.current_state[2]))
99 |         return self.current_state
100 | 
101 | 
--------------------------------------------------------------------------------
/test/test_origin/__init__.py:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/test/test_origin/__init__.py -------------------------------------------------------------------------------- /test/test_origin/dqn_test.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sunnyswag/simulink_python/481cc075cbf1036393b66a668676489eac3bc088/test/test_origin/dqn_test.png -------------------------------------------------------------------------------- /test/test_origin/dqn_test.py: -------------------------------------------------------------------------------- 1 | from environment import Environment 2 | import os 3 | import matplotlib.pyplot as plt 4 | import numpy as np 5 | import random 6 | import torch 7 | import torch.nn as nn 8 | import torch.optim as optim 9 | from torch.nn.utils import clip_grad_norm_ 10 | 11 | 12 | class ReplayBuffer: 13 | def __init__(self, column, max_size, batch_size): 14 | self.current_state = np.zeros((max_size, column), dtype=np.float32) 15 | self.next_state = np.zeros((max_size, column), dtype=np.float32) 16 | self.action = np.zeros(max_size, dtype=np.float32) 17 | self.reward = np.zeros(max_size, dtype=np.float32) 18 | self.done = np.zeros(max_size,dtype=np.float32) 19 | self.max_size, self.batch_size = max_size, batch_size 20 | self.size, self.current_index = 0, 0 21 | 22 | def store(self, current_state, action, next_state, reward, done): 23 | self.current_state[self.current_index] = current_state 24 | self.action[self.current_index] = action 25 | self.next_state[self.current_index] = next_state 26 | self.reward[self.current_index] = reward 27 | self.done[self.current_index] = done 28 | self.current_index = (self.current_index + 1) % self.max_size 29 | self.size = min(self.size + 1, self.max_size) 30 | 31 | def sample_batch(self): 32 | ptr = np.random.choice(self.size, self.batch_size) 33 | return dict(current_state=self.current_state[ptr], 34 | next_state=self.next_state[ptr], 35 | action=self.action[ptr], 36 | reward=self.reward[ptr], 37 | done=self.done[ptr] 38 | ) 39 | 40 | def __len__(self): 41 | return self.size 42 | 43 | 44 | class Network(nn.Module): 45 | def __init__(self, in_dim, out_dim): 46 | super(Network, self).__init__() 47 | 48 | self.layers = nn.Sequential( 49 | nn.Linear(in_dim, 128), 50 | nn.ReLU(), 51 | nn.Linear(128, 128), 52 | nn.ReLU(), 53 | nn.Linear(128, out_dim) 54 | ) 55 | def forward(self,x): 56 | return self.layers(x) 57 | 58 | 59 | min_epsilon = 0.05 60 | max_epsilon = 1 61 | epsilon_decay = 80 62 | epsilon_episode = lambda episode : min_epsilon + np.exp(-episode / epsilon_decay)*0.95 63 | 64 | 65 | env = Environment("test_you") 66 | state_space = 3 67 | action_space = 3 68 | 69 | batch_size = 32 70 | max_size = 1000 71 | memory = ReplayBuffer(state_space, max_size, batch_size) 72 | 73 | device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu") 74 | 75 | network = Network(state_space,action_space).to(device) 76 | target_network = Network(state_space,action_space).to(device) 77 | target_network.load_state_dict(network.state_dict()) 78 | target_network.eval() 79 | 80 | optimizer = optim.Adam(network.parameters()) 81 | 82 | gamma = 0.99 83 | target_update = 200 84 | 85 | 86 | def select_action(episode, state): 87 | if np.random.random_sample() > epsilon_episode(episode): 88 | selected_action = 
network(torch.FloatTensor(state).to(device)).argmax().detach().item() 89 | else: 90 | selected_action = random.randint(0,2) 91 | return selected_action 92 | 93 | 94 | def train(): 95 | samples = memory.sample_batch() 96 | state = torch.FloatTensor(samples["current_state"]).to(device).to(device) 97 | next_state = torch.FloatTensor(samples["next_state"]).to(device) 98 | action = torch.LongTensor(samples["action"].reshape(-1, 1)).to(device) 99 | reward = torch.FloatTensor(samples["reward"].reshape(-1, 1)).to(device) 100 | done = torch.FloatTensor(samples["done"].reshape(-1, 1)).to(device) 101 | 102 | current_Q_value = network(state).gather(1, action) 103 | next_Q_value = target_network(next_state).max(dim=1,keepdim=True)[0].detach() 104 | target = (reward + gamma*next_Q_value*(1 - done)).to(device) 105 | loss = ((target - current_Q_value).pow(2)).mean() 106 | optimizer.zero_grad() 107 | loss.backward() 108 | clip_grad_norm_(network.parameters(),1.0,norm_type=1) # Gradient clipping(增加稳定性) 109 | optimizer.step() 110 | return loss.item() 111 | 112 | 113 | def plot_and_save(frame_idx, rewards, losses): 114 | rewards_factor = 10 115 | losses_smooth_x = np.arange(len(losses)) 116 | losses_smooth = [np.mean(losses[i:i+rewards_factor]) if i > rewards_factor else np.mean(losses[0:i+1]) 117 | for i in range(len(losses))] 118 | rewards_smooth_x = np.arange(len(rewards)) 119 | rewards_smooth = [np.mean(rewards[i:i+rewards_factor]) if i > rewards_factor else np.mean(rewards[0:i+1]) 120 | for i in range(len(rewards))] 121 | 122 | for i in range(len(losses)//3000): 123 | losses_smooth = losses_smooth[::2] 124 | losses_smooth_x = losses_smooth_x[::2] 125 | for i in range(len(rewards)//200): 126 | rewards_smooth = rewards_smooth[::2] 127 | rewards_smooth_x = rewards_smooth_x[::2] 128 | 129 | plt.figure(figsize=(18,10)) 130 | plt.subplot(211) 131 | plt.xlabel("episode") 132 | plt.ylabel("episode_rewards") 133 | plt.title('episode %s. 
rewards: %s' % (frame_idx, rewards[-1]))
134 |     plt.plot(rewards, label="Rewards",color='lightsteelblue',linewidth='1')
135 |     plt.plot(rewards_smooth_x, rewards_smooth,
136 |              label="Smoothed_Rewards",color='darkorange',linewidth='3')
137 |     plt.legend(loc='best')
138 | 
139 |     plt.subplot(212)
140 |     plt.title('loss')
141 |     plt.plot(losses,label="Losses",color='lightsteelblue',linewidth='1')
142 |     plt.plot(losses_smooth_x, losses_smooth,
143 |              label="Smoothed_Losses",color='darkorange',linewidth='3')
144 |     plt.legend(loc='best')
145 | 
146 |     plt.savefig("dqn_test.png")
147 | 
148 | all_rewards = []
149 | losses = []
150 | update_count = 0
151 | 
152 | env.create_sockets_server()
153 | state = env.reset()
154 | for episode in range(200):
155 |     rewards = 0
156 |     for i in range(100):
157 |         action = select_action(episode, state)
158 |         next_state, reward, done, _ = env.step(action)
159 |         print("episode:",episode,"state:",next_state,"reward:", reward,"action:",action )
160 |         memory.store(state, action, next_state, reward, done)
161 |         state = next_state
162 |         rewards += reward
163 |         #if done:
164 |             #break
165 |         if len(memory) > batch_size:
166 |             loss = train()
167 |             update_count += 1
168 |             losses.append(loss)
169 |             if update_count % target_update == 0:
170 |                 target_network.load_state_dict(network.state_dict())
171 |     all_rewards.append(rewards)
172 |     plot_and_save(episode , [round(all_rewards[i], 3) for i in range(len(all_rewards))], losses)
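The exploration schedule defined near the top of dqn_test.py starts fully random and decays towards min_epsilon = 0.05. A quick check of a few of its values (a sketch; outputs rounded to three decimals):

```python
import numpy as np

min_epsilon, epsilon_decay = 0.05, 80
epsilon_episode = lambda episode: min_epsilon + np.exp(-episode / epsilon_decay) * 0.95

for episode in (0, 40, 80, 160):
    print(episode, round(float(epsilon_episode(episode)), 3))
# 0 1.0
# 40 0.626
# 80 0.399
# 160 0.179
```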
--------------------------------------------------------------------------------
/test/test_origin/environment.py:
--------------------------------------------------------------------------------
1 | import socket
2 | import sys
3 | import struct
4 | import array
5 | import time
6 | import random
7 | import numpy as np
8 | 
9 | class Environment:
10 |     def __init__(self, env_name):
11 |         self.env_name = env_name
12 |         self.sendConn = 0 # socket object used for sending actions
13 |         self.send_and_recv_host = 'localhost'
14 |         self.sendPort = 50000
15 |         self.recvConn = 0 # socket object used for receiving data
16 |         self.recvPort = 50001
17 | 
18 |         self.current_state = [0,0,0]
19 | 
20 |     # create the socket servers
21 |     def create_sockets_server(self):
22 | 
23 |         # create the sending-side socket server
24 |         sockets_server_send = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
25 |         sockets_server_send.bind((self.send_and_recv_host, self.sendPort))
26 |         print('bind server port success')
27 | 
28 |         sockets_server_send.listen(1)
29 |         print("Wait 20 seconds for a response from client to server {} ".format(self.sendPort))
30 |         sockets_server_send.settimeout(20)
31 | 
32 |         try:
33 |             self.sendConn, addr = sockets_server_send.accept()
34 |         except socket.timeout:
35 |             print("connection timeout")
36 |             sys.exit()
37 |         print("Server connection success! Address: {}, port: {}".format(addr, self.sendPort))
38 | 
39 |         # create the receiving-side socket server
40 |         sockets_server_recv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
41 |         sockets_server_recv.bind((self.send_and_recv_host, self.recvPort))
42 |         print("The receiver port binding succeeded")
43 | 
44 |         sockets_server_recv.listen(1)
45 |         self.recvConn, addr = sockets_server_recv.accept()
46 |         print("receiving connection success! Address: {}, port: {}".format(addr, self.recvPort))
47 | 
48 |     # send the action value to simulink
49 |     def _send_action(self, action):
50 |         action = struct.pack("I", action)
51 |         self.sendConn.sendall(action)
52 | 
53 |     # receive the state information coming from simulink
54 |     def _receive_state(self):
55 |         data = self.recvConn.recv(2048)
56 |         data = array.array('d', data)
57 |         # average the batch of accumulated state samples that was returned
58 |         data = np.array(data).reshape(-1,3)
59 |         data = list(data.mean(axis=0))
60 |         data = [round(i, 2) for i in data]
61 |         return data
62 | 
63 |     # calculate the reward
64 |     def _calculate_reward(self):
65 |         T1, Tmix, Treturn = self.current_state[0],self.current_state[1],self.current_state[2]
66 |         room_goal = 22 # target temperature of room one
67 |         room_LL = 15 # lower limit
68 |         room_UL = 29 # upper limit
69 |         Tmix_LL = 15 # lower limit
70 |         Tmix_UL = 44 # upper limit
71 | 
72 |         distance = abs(room_goal - T1)
73 |         if Tmix < Tmix_LL or Tmix > Tmix_UL :
74 |             if distance >= 7 :
75 |                 reward = -1 * distance * 0.1
76 |             else :
77 |                 reward = -1
78 |         else :
79 |             if distance >= 7 :
80 |                 reward = -1 * distance * 0.1
81 |             elif 5 < distance < 7:
82 |                 reward = (7 - distance) * 0.2
83 |             elif 1 < distance <= 5:
84 |                 reward = (7 - distance) * 0.5
85 |             elif 0 <= distance <= 1:
86 |                 reward = 7 - distance
87 | 
88 |         return round(reward, 3)
89 | 
90 |     def step(self, action):
91 |         self._send_action(action)
92 |         time.sleep(0.1)
93 |         env_values = self._receive_state()
94 |         if env_values is not None:
95 |             self.current_state = env_values
96 |         reward = self._calculate_reward()
97 |         done = False
98 |         info = "normal"
99 |         return self.current_state, reward, done, info
100 | 
101 |     def reset(self):
102 |         self._send_action(random.randint(0,2))
103 |         time.sleep(0.1)
104 |         env_values = self._receive_state()
105 |         if env_values is not None:
106 |             self.current_state = env_values
107 |         print("current state T1: {} ,Tmix: {} ,Treturn: {} ".format(self.current_state[0],self.current_state[1],self.current_state[2]))
108 |         return self.current_state
109 | 
110 | 
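A quick spot check of the reward shaping above (a sketch; Environment.__init__ only assigns attributes, so _calculate_reward can be exercised without opening any sockets):

```python
from environment import Environment   # the test_origin version above

env = Environment("reward_check")
for state in ([22.0, 30.0, 20.0],   # T1 on target
              [18.5, 30.0, 20.0],   # 3.5 degrees off
              [14.0, 30.0, 20.0],   # far off: distance >= 7
              [22.0, 50.0, 20.0]):  # Tmix outside [15, 44]
    env.current_state = state
    print(state, env._calculate_reward())
# [22.0, 30.0, 20.0] 7.0
# [18.5, 30.0, 20.0] 1.75
# [14.0, 30.0, 20.0] -0.8
# [22.0, 50.0, 20.0] -1
```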
--------------------------------------------------------------------------------
/test/test_origin/readme.md:
--------------------------------------------------------------------------------
1 | ### Step1
2 | 
3 | 1. Found that apart from the first exchange, which carries 10 state samples, every later exchange carries 4 state samples.
4 | 
5 | 2. After averaging them, it turned out that taking the same action can still produce states with different trends, so the environment is not stable; the plan is to get this running first and only then try the CartPole environment.
6 | 
7 | 3. The log of the dqn_test.py test script looks like this:
8 | 
9 | >bind server port success
10 | Wait 20 seconds for a response from client to server 50000
11 | Server connnection success ! Address :('127.0.0.1', 48334) , port :50000
12 | The receiver port binding success
13 | receiving connect success!Address :('127.0.0.1', 40358) ,port :50001
14 | 1
15 | current state T1: 16.0 ,Tmix: 23.2 ,Treturn: 16.0
16 | [16.0, 23.2, 16.0]
17 | 1
18 | ([15.0, 22.400000000000002, 16.88], 0.0, False, 'normal') 0
19 | 10
20 | ([14.343, 18.060000000000002, 16.586000000000002], -1, False, 'normal') 1
21 | 4
22 | ([14.245000000000001, 17.000000000000004, 15.962499999999999], -1, False, 'normal') 2
23 | 4
24 | ([14.21, 20.2, 16.487499999999997], -1, False, 'normal') 0
25 | 4
26 | ([14.200000000000001, 19.400000000000002, 16.7325], -1, False, 'normal') 1
27 | 4
28 | ([14.192499999999999, 20.2, 16.5375], -1, False, 'normal') 2
29 | 
30 | ### Step2
31 | 
32 | 1. Adapting the CartPole environment will take some time, so the origin environment is used for testing first; the findings are points 2-5 below.
33 | 
34 | 2. Reducing the step size improves the accuracy of the results, but the total simulated time also gets longer, so the step size cannot be used to change how many simulated steps one exchange covers. Changing the number of simulated steps per exchange means changing the duration of one step, which is what the Pacing option is opened and tuned for.
35 | 
36 | 3. Blocking communication (tick "Enable blocking mode"): Simulink can only advance after it has received the action sent by Python, and Python waits 0.1 s after sending the action before it reads the state sent back by Simulink.
37 | 
38 | 4. The TCP/IP Receive block has two outputs, data and status; status (0/1) indicates whether anything was received.
39 | 
40 | 5. The block sample time, i.e. the simulation step of this block, only has to lie between the max step size and the min step size.
41 | 
42 | ### Questions
43 | 
44 | 1. The reward curve fluctuates a lot. It turns out that an unstable reward is normal here: there are several evaluation criteria, and as soon as one of them goes out of range a minus sign appears and gets multiplied by the measured quantity, so the reward swings. On top of that, the plots use the total reward per episode, which amplifies the swings. Also, in gym environments an episode that keeps acting badly gets terminated ("done") early, so that epoch's data look especially poor. None of this really matters, because such data can still be learned from once they are in the replay buffer; for learning, mistakes matter more than correct behaviour, and both machines and people learn from mistakes.
45 | **Learn from correct behaviour? Or rather, create mistakes out of correct behaviour and learn from those?**
46 | 
47 | 2. Vanishing gradients are also closely tied to how the reward is designed. Roughly speaking, wherever there is a difference in reward there is a gradient. Suppose some evaluation quantity has to change by two units before the reward changes by one unit, say the quantity moves from 18.33 to 18.35 before the reward moves from 1.81 to 1.82, a two-to-one relation, and after passing through the loss computation this reward difference becomes larger or smaller still (**how exactly does the loss function shape the gradient of the reward?**). If the reward depends only on that quantity, the gradient can vanish. The consequence is that convergence becomes hard, i.e. the network does not learn quickly. And if the problem cannot be solved gradually (by following those gradients), the environment drifts back from its current state, i.e. it regresses. Why? With no gradient around, the agent has to create one in order to collect more reward: since the region it is in is flat, it walks back, borrows the gradient of a few earlier states for momentum, and tries to rush through to the next higher-reward state. That raises the next problem: how far back should the state go? Go back too far and it may get stuck cycling through a few states of that stage, the so-called **local optimum**. A basin is the best picture for a local optimum: the speed was not enough to climb out, so it rolls back down to the other side of the basin to use that slope to pick up speed, and whether it clears the rim next time depends on whether the speed is enough. But there is more than one such basin, and in a high-dimensional space they sit in different dimensions. So when the reward keeps dropping after reaching some point, do not panic: the network may be working out how much speed, that is how much gradient, it needs to clear the basin, i.e. how far to back off. What it may not know is that this makes the problem even harder, because the data keep changing and the network changes after every update, so it may no longer even find the original basin. **Why would the network's gradient choose to back off at all?**
48 | Of course, with enough data none of this is an issue: with a large enough sample (for example DQN's batch sampling), and as long as the data themselves are sound, the problem largely goes away; enough data, and the gradient is naturally there.
49 | 
50 | 3. This also explains nicely the earlier observation that a model which trained well can perform poorly when actually deployed: during the last many epochs of training, the experience used consists of well-played transitions, so most of the mapping stored in the network ends up occupied by those well-played experiences. **Could the already-trained part of the network be frozen, with another network used for control?**
--------------------------------------------------------------------------------