├── .gitignore
├── README.md
├── environment.yml
├── images
    ├── DQN_traces.png
    ├── DRQN_traces.png
    ├── DTQN_traces.png
    ├── mean_loss_comparison.png
    └── mean_scores_comparison.png
├── out
    ├── trace_DQN_1.txt
    ├── trace_DQN_10.txt
    ├── trace_DQN_2.txt
    ├── trace_DQN_3.txt
    ├── trace_DQN_4.txt
    ├── trace_DQN_5.txt
    ├── trace_DQN_6.txt
    ├── trace_DQN_7.txt
    ├── trace_DQN_8.txt
    ├── trace_DQN_9.txt
    ├── trace_DRQN_1.txt
    ├── trace_DRQN_10.txt
    ├── trace_DRQN_2.txt
    ├── trace_DRQN_3.txt
    ├── trace_DRQN_4.txt
    ├── trace_DRQN_5.txt
    ├── trace_DRQN_6.txt
    ├── trace_DRQN_7.txt
    ├── trace_DRQN_8.txt
    ├── trace_DRQN_9.txt
    ├── trace_DTQN_1.txt
    ├── trace_DTQN_10.txt
    ├── trace_DTQN_2.txt
    ├── trace_DTQN_3.txt
    ├── trace_DTQN_4.txt
    ├── trace_DTQN_5.txt
    ├── trace_DTQN_6.txt
    ├── trace_DTQN_7.txt
    ├── trace_DTQN_8.txt
    └── trace_DTQN_9.txt
└── src
    ├── .ipynb_checkpoints
        └── plot_graphs-checkpoint.ipynb
    ├── __pycache__
        ├── config_DQN.cpython-37.pyc
        ├── config_DRQN.cpython-37.pyc
        ├── config_DTQN.cpython-37.pyc
        ├── memory.cpython-37.pyc
        ├── model_DQN.cpython-37.pyc
        ├── model_DRQN.cpython-37.pyc
        └── model_DTQN.cpython-37.pyc
    ├── bash_gen_trace.sh
    ├── config_DQN.py
    ├── config_DRQN.py
    ├── config_DTQN.py
    ├── memory.py
    ├── model_DQN.py
    ├── model_DRQN.py
    ├── model_DTQN.py
    ├── plot_graphs.ipynb
    ├── train_DQN.py
    ├── train_DRQN.py
    └── train_DTQN.py

/.gitignore:
--------------------------------------------------------------------------------
1 | src/logs/*
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Transformer Based Reinforcement Learning for Games
2 |
3 | This repository contains experimental PyTorch models that incorporate transformers into Deep Q-Learning, to test whether they perform better than the RNN-based variant (DRQN) or plain DQN.
4 |
5 |
6 | # Requirements
7 | ```
8 | * OpenAI Gym
9 | * PyTorch >= 1.0.0
10 | * Python 3.6+
11 | * Conda (suggested for building environment etc)
12 | * tensorboardx==1.9
13 | * tensorflow==1.14.0 (non-gpu version will do, only needed for tensorboard)
14 |
15 | (environment.yml provides the detailed list of dependencies)
16 | ```
17 |
18 | # How to run experiments?
19 |
20 | Currently, we experiment on the `cartpole` environment with three algorithms:
21 | DQN, DRQN (using LSTM), and a transformer-based model called DTQN.
22 |
23 | The repo is structured in the following manner:
24 | ```
25 | -src/
26 | |-config_*.py (config files of a particular algorithm)
27 | |-model_*.py (model definition for a particular algorithm)
28 | |-train_*.py (training file of a particular algorithm)
29 | |-memory.py (action replay memory buffer)
30 |
31 | -out/
32 | |-trace_*.txt (traces obtained by different algorithms)
33 | ```
34 |
35 | To run a particular algorithm (say DQN), run ``python train_DQN.py``; this generates the trace for that algorithm.
36 |
37 | # Results
38 |
39 | We ran multiple experiments for each algorithm (DQN, DRQN, and DTQN). Each algorithm was trained for 5000 episodes, and we ran 10 instances of each with random initialization.
The following plots show the scores over episodes for the different runs.
40 |
41 | ![Scores vs Episodes for Multiple runs of DQN](https://github.com/udion/Transformer-RL/blob/master/images/DQN_traces.png)
42 | ![Scores vs Episodes for Multiple runs of DRQN](https://github.com/udion/Transformer-RL/blob/master/images/DRQN_traces.png)
43 | ![Scores vs Episodes for Multiple runs of DTQN](https://github.com/udion/Transformer-RL/blob/master/images/DTQN_traces.png)
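The trace files under `out/` log one line every 10 episodes in the form `<n> episode | score: <s> | loss: <l> | epsilon: <e>`. As a minimal sketch (not the code in `plot_graphs.ipynb`, which is not shown here), the traces could be parsed and averaged across runs as below. The helper names `parse_trace` and `mean_scores` are hypothetical, and the regular expression assumes non-negative decimal values like those in the sample traces.

```python
import re
from statistics import mean

# Matches lines such as
# "1000 episode | score: 11.01 | loss: 2.61911 | epsilon: 0.22".
# Assumption: values are plain non-negative decimals, as in the sample traces.
TRACE_LINE = re.compile(
    r"(\d+) episode \| score: ([\d.]+) \| loss: ([\d.]+) \| epsilon: ([\d.]+)"
)


def parse_trace(text):
    """Return (episode, score, loss, epsilon) tuples found in one trace."""
    return [
        (int(ep), float(score), float(loss), float(eps))
        for ep, score, loss, eps in TRACE_LINE.findall(text)
    ]


def mean_scores(traces):
    """Average the logged score per episode across several parsed traces."""
    by_episode = {}
    for trace in traces:
        for ep, score, _loss, _eps in trace:
            by_episode.setdefault(ep, []).append(score)
    return {ep: mean(scores) for ep, scores in sorted(by_episode.items())}
```

A typical use would be to read each `out/trace_DTQN_*.txt` file, pass its text to `parse_trace`, and hand the resulting list of traces to `mean_scores` to get one averaged curve per algorithm. Header lines such as `state size: 2` do not match the pattern and are skipped.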
44 | 45 | 46 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: cixd_rl 2 | channels: 3 | - pytorch 4 | - defaults 5 | dependencies: 6 | - _libgcc_mutex=0.1=main 7 | - blas=1.0=mkl 8 | - ca-certificates=2019.8.28=0 9 | - certifi=2019.9.11=py37_0 10 | - cffi=1.12.3=py37h2e261b9_0 11 | - cudatoolkit=10.0.130=0 12 | - freetype=2.9.1=h8a8886c_1 13 | - intel-openmp=2019.4=243 14 | - jpeg=9b=h024ee3a_2 15 | - libedit=3.1.20181209=hc058e9b_0 16 | - libffi=3.2.1=hd88cf55_4 17 | - libgcc-ng=9.1.0=hdf63c60_0 18 | - libgfortran-ng=7.3.0=hdf63c60_0 19 | - libpng=1.6.37=hbc83047_0 20 | - libstdcxx-ng=9.1.0=hdf63c60_0 21 | - libtiff=4.0.10=h2733197_2 22 | - mkl=2019.4=243 23 | - mkl-service=2.3.0=py37he904b0f_0 24 | - mkl_fft=1.0.14=py37ha843d7b_0 25 | - mkl_random=1.1.0=py37hd6b4f25_0 26 | - ncurses=6.1=he6710b0_1 27 | - ninja=1.9.0=py37hfd86e86_0 28 | - numpy=1.16.5=py37h7e9f1db_0 29 | - numpy-base=1.16.5=py37hde5b4d6_0 30 | - olefile=0.46=py37_0 31 | - openssl=1.1.1d=h7b6447c_1 32 | - pillow=6.1.0=py37h34e0f95_0 33 | - pip=19.2.3=py37_0 34 | - pycparser=2.19=py37_0 35 | - python=3.7.4=h265db76_1 36 | - readline=7.0=h7b6447c_5 37 | - setuptools=41.2.0=py37_0 38 | - six=1.12.0=py37_0 39 | - sqlite=3.29.0=h7b6447c_0 40 | - tk=8.6.8=hbc83047_0 41 | - wheel=0.33.6=py37_0 42 | - xz=5.2.4=h14c3975_4 43 | - zlib=1.2.11=h7b6447c_3 44 | - zstd=1.3.7=h0b5b093_0 45 | - pytorch=1.2.0=py3.7_cuda10.0.130_cudnn7.6.2_0 46 | - torchvision=0.4.0=py37_cu100 47 | - pip: 48 | - absl-py==0.8.1 49 | - astor==0.8.0 50 | - atari-py==0.2.6 51 | - attrs==19.3.0 52 | - backcall==0.1.0 53 | - baselines==0.1.6 54 | - bleach==3.1.0 55 | - click==7.0 56 | - cloudpickle==1.2.2 57 | - cycler==0.10.0 58 | - decorator==4.4.1 59 | - defusedxml==0.6.0 60 | - entrypoints==0.3 61 | - future==0.18.2 62 | - gast==0.3.2 63 | - google-pasta==0.1.8 64 | - grpcio==1.25.0 65 | - 
gym==0.15.4 66 | - gym-ple==0.3 67 | - h5py==2.10.0 68 | - importlib-metadata==0.23 69 | - ipykernel==5.1.3 70 | - ipython==7.9.0 71 | - ipython-genutils==0.2.0 72 | - ipywidgets==7.5.1 73 | - jedi==0.15.1 74 | - jinja2==2.10.3 75 | - joblib==0.14.0 76 | - jsonschema==3.1.1 77 | - jupyter==1.0.0 78 | - jupyter-client==5.3.4 79 | - jupyter-console==6.0.0 80 | - jupyter-core==4.6.1 81 | - keras-applications==1.0.8 82 | - keras-preprocessing==1.1.0 83 | - kiwisolver==1.1.0 84 | - markdown==3.1.1 85 | - markupsafe==1.1.1 86 | - matplotlib==3.1.1 87 | - mistune==0.8.4 88 | - more-itertools==7.2.0 89 | - nbconvert==5.6.1 90 | - nbformat==4.4.0 91 | - notebook==6.0.2 92 | - opencv-python==4.1.1.26 93 | - pandas==0.25.3 94 | - pandocfilters==1.4.2 95 | - parso==0.5.1 96 | - pexpect==4.7.0 97 | - pickleshare==0.7.5 98 | - ple==0.0.1 99 | - prometheus-client==0.7.1 100 | - prompt-toolkit==2.0.10 101 | - protobuf==3.10.0 102 | - ptyprocess==0.6.0 103 | - pyglet==1.3.2 104 | - pygments==2.4.2 105 | - pyparsing==2.4.5 106 | - pyrsistent==0.15.5 107 | - python-dateutil==2.8.1 108 | - pytz==2019.3 109 | - pyzmq==18.1.1 110 | - qtconsole==4.5.5 111 | - scipy==1.3.2 112 | - seaborn==0.9.0 113 | - send2trash==1.5.0 114 | - tensorboard==1.14.0 115 | - tensorboardx==1.9 116 | - tensorflow==1.14.0 117 | - tensorflow-estimator==1.14.0 118 | - termcolor==1.1.0 119 | - terminado==0.8.3 120 | - testpath==0.4.4 121 | - torch==1.2.0 122 | - tornado==6.0.3 123 | - tqdm==4.38.0 124 | - traitlets==4.3.3 125 | - wcwidth==0.1.7 126 | - webencodings==0.5.1 127 | - werkzeug==0.16.0 128 | - widgetsnbextension==3.5.1 129 | - wrapt==1.11.2 130 | - zipp==0.6.0 131 | 132 | -------------------------------------------------------------------------------- /images/DQN_traces.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/DQN_traces.png 
-------------------------------------------------------------------------------- /images/DRQN_traces.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/DRQN_traces.png -------------------------------------------------------------------------------- /images/DTQN_traces.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/DTQN_traces.png -------------------------------------------------------------------------------- /images/mean_loss_comparison.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/mean_loss_comparison.png -------------------------------------------------------------------------------- /images/mean_scores_comparison.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/mean_scores_comparison.png -------------------------------------------------------------------------------- /out/trace_DTQN_1.txt: -------------------------------------------------------------------------------- 1 | state size: 2 2 | action size: 2 3 | 0 episode | score: 42.00 | loss: 0.00000 | epsilon: 1.00 4 | 10 episode | score: 40.06 | loss: 0.00000 | epsilon: 1.00 5 | 20 episode | score: 38.51 | loss: 0.00000 | epsilon: 1.00 6 | 30 episode | score: 37.15 | loss: 0.00000 | epsilon: 1.00 7 | 40 episode | score: 35.32 | loss: 0.00000 | epsilon: 1.00 8 | 50 episode | score: 34.80 | loss: 0.16093 | epsilon: 0.99 9 | 60 episode | score: 34.03 | loss: 0.16544 | epsilon: 0.97 10 | 70 episode | score: 32.72 | loss: 0.41782 | epsilon: 0.96 11 | 80 episode | 
score: 30.96 | loss: 0.35763 | epsilon: 0.95 12 | 90 episode | score: 30.62 | loss: 1.54800 | epsilon: 0.94 13 | 100 episode | score: 29.44 | loss: 0.29163 | epsilon: 0.93 14 | 110 episode | score: 28.56 | loss: 0.46364 | epsilon: 0.92 15 | 120 episode | score: 27.74 | loss: 0.44584 | epsilon: 0.91 16 | 130 episode | score: 27.39 | loss: 0.37214 | epsilon: 0.90 17 | 140 episode | score: 27.27 | loss: 0.23095 | epsilon: 0.88 18 | 150 episode | score: 26.95 | loss: 0.43954 | epsilon: 0.87 19 | 160 episode | score: 26.27 | loss: 0.45517 | epsilon: 0.86 20 | 170 episode | score: 25.34 | loss: 1.56370 | epsilon: 0.85 21 | 180 episode | score: 24.77 | loss: 0.74127 | epsilon: 0.84 22 | 190 episode | score: 24.02 | loss: 0.54314 | epsilon: 0.83 23 | 200 episode | score: 23.12 | loss: 0.88684 | epsilon: 0.82 24 | 210 episode | score: 22.73 | loss: 1.21659 | epsilon: 0.81 25 | 220 episode | score: 22.60 | loss: 1.68564 | epsilon: 0.80 26 | 230 episode | score: 22.24 | loss: 0.63498 | epsilon: 0.79 27 | 240 episode | score: 22.67 | loss: 0.67226 | epsilon: 0.78 28 | 250 episode | score: 22.03 | loss: 0.64331 | epsilon: 0.77 29 | 260 episode | score: 22.30 | loss: 0.37808 | epsilon: 0.76 30 | 270 episode | score: 21.48 | loss: 1.64759 | epsilon: 0.75 31 | 280 episode | score: 21.68 | loss: 1.39732 | epsilon: 0.74 32 | 290 episode | score: 21.12 | loss: 1.54835 | epsilon: 0.73 33 | 300 episode | score: 21.05 | loss: 0.41286 | epsilon: 0.72 34 | 310 episode | score: 20.76 | loss: 1.75307 | epsilon: 0.71 35 | 320 episode | score: 20.23 | loss: 0.77339 | epsilon: 0.70 36 | 330 episode | score: 19.58 | loss: 1.25978 | epsilon: 0.69 37 | 340 episode | score: 19.76 | loss: 0.42972 | epsilon: 0.68 38 | 350 episode | score: 19.78 | loss: 0.81583 | epsilon: 0.67 39 | 360 episode | score: 20.31 | loss: 1.31229 | epsilon: 0.66 40 | 370 episode | score: 20.49 | loss: 0.85073 | epsilon: 0.65 41 | 380 episode | score: 20.23 | loss: 1.30102 | epsilon: 0.64 42 | 390 episode | score: 19.76 | 
loss: 0.87183 | epsilon: 0.63 43 | 400 episode | score: 19.27 | loss: 2.18815 | epsilon: 0.62 44 | 410 episode | score: 18.73 | loss: 1.33286 | epsilon: 0.62 45 | 420 episode | score: 18.34 | loss: 1.35239 | epsilon: 0.61 46 | 430 episode | score: 17.94 | loss: 3.20092 | epsilon: 0.60 47 | 440 episode | score: 17.70 | loss: 2.29407 | epsilon: 0.59 48 | 450 episode | score: 17.30 | loss: 2.79657 | epsilon: 0.58 49 | 460 episode | score: 16.82 | loss: 0.52218 | epsilon: 0.58 50 | 470 episode | score: 16.39 | loss: 2.35247 | epsilon: 0.57 51 | 480 episode | score: 16.28 | loss: 1.91684 | epsilon: 0.56 52 | 490 episode | score: 16.54 | loss: 3.37298 | epsilon: 0.55 53 | 500 episode | score: 16.32 | loss: 2.42507 | epsilon: 0.55 54 | 510 episode | score: 16.00 | loss: 2.96815 | epsilon: 0.54 55 | 520 episode | score: 15.60 | loss: 1.01627 | epsilon: 0.53 56 | 530 episode | score: 15.35 | loss: 2.97971 | epsilon: 0.53 57 | 540 episode | score: 15.45 | loss: 5.00929 | epsilon: 0.52 58 | 550 episode | score: 15.29 | loss: 0.59140 | epsilon: 0.51 59 | 560 episode | score: 15.29 | loss: 2.05343 | epsilon: 0.50 60 | 570 episode | score: 14.97 | loss: 1.67497 | epsilon: 0.49 61 | 580 episode | score: 14.70 | loss: 1.11788 | epsilon: 0.49 62 | 590 episode | score: 14.39 | loss: 2.09961 | epsilon: 0.48 63 | 600 episode | score: 14.15 | loss: 1.91011 | epsilon: 0.47 64 | 610 episode | score: 14.05 | loss: 1.59686 | epsilon: 0.47 65 | 620 episode | score: 13.69 | loss: 4.83170 | epsilon: 0.46 66 | 630 episode | score: 13.44 | loss: 2.16104 | epsilon: 0.46 67 | 640 episode | score: 13.62 | loss: 1.68080 | epsilon: 0.45 68 | 650 episode | score: 13.47 | loss: 4.89931 | epsilon: 0.44 69 | 660 episode | score: 13.48 | loss: 4.99668 | epsilon: 0.43 70 | 670 episode | score: 13.80 | loss: 2.77219 | epsilon: 0.43 71 | 680 episode | score: 13.58 | loss: 3.95906 | epsilon: 0.42 72 | 690 episode | score: 13.37 | loss: 3.97788 | epsilon: 0.41 73 | 700 episode | score: 13.37 | loss: 3.35664 | 
epsilon: 0.41 74 | 710 episode | score: 13.21 | loss: 2.86256 | epsilon: 0.40 75 | 720 episode | score: 12.90 | loss: 3.48034 | epsilon: 0.39 76 | 730 episode | score: 12.65 | loss: 5.11371 | epsilon: 0.39 77 | 740 episode | score: 12.52 | loss: 2.88403 | epsilon: 0.38 78 | 750 episode | score: 12.88 | loss: 3.52321 | epsilon: 0.37 79 | 760 episode | score: 12.65 | loss: 5.20732 | epsilon: 0.37 80 | 770 episode | score: 12.58 | loss: 5.21305 | epsilon: 0.36 81 | 780 episode | score: 12.32 | loss: 5.81512 | epsilon: 0.36 82 | 790 episode | score: 12.32 | loss: 5.29636 | epsilon: 0.35 83 | 800 episode | score: 12.15 | loss: 3.56297 | epsilon: 0.34 84 | 810 episode | score: 12.21 | loss: 3.53993 | epsilon: 0.34 85 | 820 episode | score: 12.08 | loss: 4.17779 | epsilon: 0.33 86 | 830 episode | score: 11.95 | loss: 6.10416 | epsilon: 0.32 87 | 840 episode | score: 11.68 | loss: 2.46521 | epsilon: 0.32 88 | 850 episode | score: 11.59 | loss: 4.23309 | epsilon: 0.31 89 | 860 episode | score: 11.50 | loss: 4.83049 | epsilon: 0.31 90 | 870 episode | score: 11.56 | loss: 7.28215 | epsilon: 0.30 91 | 880 episode | score: 11.42 | loss: 6.71525 | epsilon: 0.30 92 | 890 episode | score: 11.60 | loss: 4.95379 | epsilon: 0.29 93 | 900 episode | score: 11.63 | loss: 4.90676 | epsilon: 0.28 94 | 910 episode | score: 11.52 | loss: 5.55674 | epsilon: 0.28 95 | 920 episode | score: 11.38 | loss: 6.19411 | epsilon: 0.27 96 | 930 episode | score: 11.26 | loss: 3.80527 | epsilon: 0.27 97 | 940 episode | score: 11.34 | loss: 5.21086 | epsilon: 0.26 98 | 950 episode | score: 11.27 | loss: 6.26882 | epsilon: 0.25 99 | 960 episode | score: 11.08 | loss: 3.15750 | epsilon: 0.25 100 | 970 episode | score: 11.38 | loss: 5.66316 | epsilon: 0.24 101 | 980 episode | score: 11.24 | loss: 6.99554 | epsilon: 0.23 102 | 990 episode | score: 11.03 | loss: 5.79221 | epsilon: 0.23 103 | 1000 episode | score: 11.01 | loss: 2.61911 | epsilon: 0.22 104 | 1010 episode | score: 10.89 | loss: 4.51524 | epsilon: 
0.22 105 | 1020 episode | score: 10.75 | loss: 4.46806 | epsilon: 0.21 106 | 1030 episode | score: 10.56 | loss: 7.10737 | epsilon: 0.21 107 | 1040 episode | score: 10.48 | loss: 6.40347 | epsilon: 0.20 108 | 1050 episode | score: 10.52 | loss: 5.79626 | epsilon: 0.20 109 | 1060 episode | score: 10.48 | loss: 5.86367 | epsilon: 0.19 110 | 1070 episode | score: 10.32 | loss: 7.76478 | epsilon: 0.19 111 | 1080 episode | score: 10.26 | loss: 2.62518 | epsilon: 0.18 112 | 1090 episode | score: 10.23 | loss: 4.57249 | epsilon: 0.18 113 | 1100 episode | score: 10.12 | loss: 8.46477 | epsilon: 0.17 114 | 1110 episode | score: 9.98 | loss: 4.59900 | epsilon: 0.17 115 | 1120 episode | score: 9.91 | loss: 6.57322 | epsilon: 0.16 116 | 1130 episode | score: 9.92 | loss: 7.20591 | epsilon: 0.16 117 | 1140 episode | score: 9.82 | loss: 8.53715 | epsilon: 0.15 118 | 1150 episode | score: 9.81 | loss: 6.61314 | epsilon: 0.15 119 | 1160 episode | score: 9.72 | loss: 6.60741 | epsilon: 0.14 120 | 1170 episode | score: 9.86 | loss: 8.11260 | epsilon: 0.13 121 | 1180 episode | score: 9.93 | loss: 6.63596 | epsilon: 0.13 122 | 1190 episode | score: 9.79 | loss: 6.67112 | epsilon: 0.12 123 | 1200 episode | score: 9.77 | loss: 6.09025 | epsilon: 0.12 124 | 1210 episode | score: 9.65 | loss: 4.76503 | epsilon: 0.11 125 | 1220 episode | score: 9.56 | loss: 4.09326 | epsilon: 0.11 126 | 1230 episode | score: 9.73 | loss: 8.71921 | epsilon: 0.10 127 | 1240 episode | score: 9.89 | loss: 8.09039 | epsilon: 0.10 128 | 1250 episode | score: 9.74 | loss: 7.44783 | epsilon: 0.09 129 | 1260 episode | score: 9.70 | loss: 7.45099 | epsilon: 0.09 130 | 1270 episode | score: 9.69 | loss: 8.78152 | epsilon: 0.08 131 | 1280 episode | score: 9.59 | loss: 8.14003 | epsilon: 0.08 132 | 1290 episode | score: 9.50 | loss: 6.83141 | epsilon: 0.07 133 | 1300 episode | score: 9.38 | loss: 8.14618 | epsilon: 0.07 134 | 1310 episode | score: 9.53 | loss: 8.83955 | epsilon: 0.06 135 | 1320 episode | score: 9.47 | 
loss: 10.88557 | epsilon: 0.06 136 | 1330 episode | score: 9.53 | loss: 10.25268 | epsilon: 0.05 137 | 1340 episode | score: 9.47 | loss: 7.50952 | epsilon: 0.05 138 | 1350 episode | score: 9.41 | loss: 6.84119 | epsilon: 0.04 139 | 1360 episode | score: 9.33 | loss: 7.62696 | epsilon: 0.04 140 | 1370 episode | score: 9.28 | loss: 8.29397 | epsilon: 0.03 141 | 1380 episode | score: 9.23 | loss: 8.93910 | epsilon: 0.03 142 | 1390 episode | score: 9.21 | loss: 8.32343 | epsilon: 0.02 143 | 1400 episode | score: 9.16 | loss: 11.03749 | epsilon: 0.02 144 | 1410 episode | score: 9.22 | loss: 11.10894 | epsilon: 0.01 145 | 1420 episode | score: 9.10 | loss: 9.04405 | epsilon: 0.01 146 | 1430 episode | score: 9.03 | loss: 10.40970 | epsilon: 0.01 147 | 1440 episode | score: 8.99 | loss: 9.06423 | epsilon: 0.01 148 | 1450 episode | score: 8.95 | loss: 9.79906 | epsilon: 0.01 149 | 1460 episode | score: 8.91 | loss: 10.45152 | epsilon: 0.01 150 | 1470 episode | score: 8.84 | loss: 11.17154 | epsilon: 0.01 151 | 1480 episode | score: 8.81 | loss: 13.30375 | epsilon: 0.01 152 | 1490 episode | score: 8.78 | loss: 10.50567 | epsilon: 0.01 153 | 1500 episode | score: 8.75 | loss: 11.23524 | epsilon: 0.01 154 | 1510 episode | score: 8.91 | loss: 10.59758 | epsilon: 0.01 155 | 1520 episode | score: 8.88 | loss: 7.06622 | epsilon: 0.01 156 | 1530 episode | score: 8.87 | loss: 11.27084 | epsilon: 0.01 157 | 1540 episode | score: 8.81 | loss: 13.44193 | epsilon: 0.01 158 | 1550 episode | score: 8.94 | loss: 6.37859 | epsilon: 0.01 159 | 1560 episode | score: 9.02 | loss: 11.31540 | epsilon: 0.01 160 | 1570 episode | score: 8.95 | loss: 9.99572 | epsilon: 0.01 161 | 1580 episode | score: 8.93 | loss: 9.93918 | epsilon: 0.01 162 | 1590 episode | score: 8.90 | loss: 7.20135 | epsilon: 0.01 163 | 1600 episode | score: 8.87 | loss: 7.13101 | epsilon: 0.01 164 | 1610 episode | score: 8.82 | loss: 9.29032 | epsilon: 0.01 165 | 1620 episode | score: 8.86 | loss: 9.27523 | epsilon: 0.01 166 | 
1630 episode | score: 8.83 | loss: 9.26597 | epsilon: 0.01 167 | 1640 episode | score: 8.76 | loss: 11.53603 | epsilon: 0.01 168 | 1650 episode | score: 8.92 | loss: 10.00630 | epsilon: 0.01 169 | 1660 episode | score: 8.89 | loss: 7.93438 | epsilon: 0.01 170 | 1670 episode | score: 8.86 | loss: 10.85615 | epsilon: 0.01 171 | 1680 episode | score: 9.04 | loss: 9.40141 | epsilon: 0.01 172 | 1690 episode | score: 9.07 | loss: 8.03816 | epsilon: 0.01 173 | 1700 episode | score: 9.02 | loss: 9.38599 | epsilon: 0.01 174 | 1710 episode | score: 8.96 | loss: 10.12590 | epsilon: 0.01 175 | 1720 episode | score: 9.00 | loss: 10.15560 | epsilon: 0.01 176 | 1730 episode | score: 8.91 | loss: 8.75969 | epsilon: 0.01 177 | 1740 episode | score: 8.86 | loss: 11.63796 | epsilon: 0.01 178 | 1750 episode | score: 8.80 | loss: 8.73201 | epsilon: 0.01 179 | 1760 episode | score: 8.88 | loss: 8.85375 | epsilon: 0.01 180 | 1770 episode | score: 8.83 | loss: 10.21729 | epsilon: 0.01 181 | 1780 episode | score: 8.84 | loss: 10.34629 | epsilon: 0.01 182 | 1790 episode | score: 8.75 | loss: 10.94930 | epsilon: 0.01 183 | 1800 episode | score: 8.76 | loss: 8.05821 | epsilon: 0.01 184 | 1810 episode | score: 8.72 | loss: 8.11742 | epsilon: 0.01 185 | 1820 episode | score: 8.63 | loss: 13.22092 | epsilon: 0.01 186 | 1830 episode | score: 8.66 | loss: 12.51130 | epsilon: 0.01 187 | 1840 episode | score: 8.60 | loss: 11.05582 | epsilon: 0.01 188 | 1850 episode | score: 8.56 | loss: 10.29749 | epsilon: 0.01 189 | 1860 episode | score: 8.51 | loss: 9.62049 | epsilon: 0.01 190 | 1870 episode | score: 8.52 | loss: 11.11880 | epsilon: 0.01 191 | 1880 episode | score: 8.62 | loss: 12.55165 | epsilon: 0.01 192 | 1890 episode | score: 8.60 | loss: 12.51082 | epsilon: 0.01 193 | 1900 episode | score: 8.77 | loss: 9.59667 | epsilon: 0.01 194 | 1910 episode | score: 8.71 | loss: 11.08668 | epsilon: 0.01 195 | 1920 episode | score: 8.66 | loss: 7.43915 | epsilon: 0.01 196 | 1930 episode | score: 8.66 | 
loss: 9.64025 | epsilon: 0.01 197 | 1940 episode | score: 9.45 | loss: 11.13645 | epsilon: 0.01 198 | 1950 episode | score: 9.37 | loss: 7.49247 | epsilon: 0.01 199 | 1960 episode | score: 9.33 | loss: 9.59682 | epsilon: 0.01 200 | 1970 episode | score: 9.28 | loss: 14.80193 | epsilon: 0.01 201 | 1980 episode | score: 9.28 | loss: 10.48195 | epsilon: 0.01 202 | 1990 episode | score: 9.21 | loss: 7.45852 | epsilon: 0.01 203 | 2000 episode | score: 9.22 | loss: 5.95186 | epsilon: 0.01 204 | 2010 episode | score: 9.19 | loss: 8.95806 | epsilon: 0.01 205 | 2020 episode | score: 9.08 | loss: 9.78051 | epsilon: 0.01 206 | 2030 episode | score: 9.03 | loss: 11.93187 | epsilon: 0.01 207 | 2040 episode | score: 8.98 | loss: 11.95851 | epsilon: 0.01 208 | 2050 episode | score: 8.92 | loss: 12.71336 | epsilon: 0.01 209 | 2060 episode | score: 8.99 | loss: 10.47637 | epsilon: 0.01 210 | 2070 episode | score: 8.90 | loss: 6.80452 | epsilon: 0.01 211 | 2080 episode | score: 8.84 | loss: 14.29578 | epsilon: 0.01 212 | 2090 episode | score: 8.78 | loss: 11.28092 | epsilon: 0.01 213 | 2100 episode | score: 8.73 | loss: 9.88292 | epsilon: 0.01 214 | 2110 episode | score: 8.71 | loss: 13.52903 | epsilon: 0.01 215 | 2120 episode | score: 9.26 | loss: 8.32086 | epsilon: 0.01 216 | 2130 episode | score: 9.29 | loss: 12.94514 | epsilon: 0.01 217 | 2140 episode | score: 9.24 | loss: 6.88041 | epsilon: 0.01 218 | 2150 episode | score: 9.20 | loss: 7.53566 | epsilon: 0.01 219 | 2160 episode | score: 9.15 | loss: 10.57451 | epsilon: 0.01 220 | 2170 episode | score: 9.07 | loss: 12.07813 | epsilon: 0.01 221 | 2180 episode | score: 9.00 | loss: 6.05306 | epsilon: 0.01 222 | 2190 episode | score: 8.92 | loss: 7.59512 | epsilon: 0.01 223 | 2200 episode | score: 8.93 | loss: 9.81163 | epsilon: 0.01 224 | 2210 episode | score: 9.21 | loss: 9.08550 | epsilon: 0.01 225 | 2220 episode | score: 9.15 | loss: 12.10806 | epsilon: 0.01 226 | 2230 episode | score: 9.06 | loss: 13.57111 | epsilon: 0.01 227 
| 2240 episode | score: 9.03 | loss: 7.54174 | epsilon: 0.01 228 | 2250 episode | score: 8.96 | loss: 8.35778 | epsilon: 0.01 229 | 2260 episode | score: 8.94 | loss: 12.08924 | epsilon: 0.01 230 | 2270 episode | score: 8.84 | loss: 14.33356 | epsilon: 0.01 231 | 2280 episode | score: 9.28 | loss: 8.35518 | epsilon: 0.01 232 | 2290 episode | score: 9.19 | loss: 10.67279 | epsilon: 0.01 233 | 2300 episode | score: 9.10 | loss: 8.38233 | epsilon: 0.01 234 | 2310 episode | score: 9.05 | loss: 11.37863 | epsilon: 0.01 235 | 2320 episode | score: 8.98 | loss: 7.61795 | epsilon: 0.01 236 | 2330 episode | score: 8.93 | loss: 9.88662 | epsilon: 0.01 237 | 2340 episode | score: 8.86 | loss: 10.59986 | epsilon: 0.01 238 | 2350 episode | score: 8.81 | loss: 10.60581 | epsilon: 0.01 239 | 2360 episode | score: 8.80 | loss: 11.37358 | epsilon: 0.01 240 | 2370 episode | score: 8.71 | loss: 9.16223 | epsilon: 0.01 241 | 2380 episode | score: 8.69 | loss: 12.14555 | epsilon: 0.01 242 | 2390 episode | score: 8.69 | loss: 7.63645 | epsilon: 0.01 243 | 2400 episode | score: 8.69 | loss: 15.23756 | epsilon: 0.01 244 | 2410 episode | score: 8.67 | loss: 12.32878 | epsilon: 0.01 245 | 2420 episode | score: 8.64 | loss: 10.72119 | epsilon: 0.01 246 | 2430 episode | score: 8.64 | loss: 12.24568 | epsilon: 0.01 247 | 2440 episode | score: 8.60 | loss: 14.54856 | epsilon: 0.01 248 | 2450 episode | score: 8.69 | loss: 12.30029 | epsilon: 0.01 249 | 2460 episode | score: 8.64 | loss: 7.73817 | epsilon: 0.01 250 | 2470 episode | score: 8.60 | loss: 13.07990 | epsilon: 0.01 251 | 2480 episode | score: 8.75 | loss: 10.10155 | epsilon: 0.01 252 | 2490 episode | score: 8.79 | loss: 9.20475 | epsilon: 0.01 253 | 2500 episode | score: 8.89 | loss: 10.77987 | epsilon: 0.01 254 | 2510 episode | score: 8.91 | loss: 10.04893 | epsilon: 0.01 255 | 2520 episode | score: 8.87 | loss: 10.77398 | epsilon: 0.01 256 | 2530 episode | score: 8.95 | loss: 8.53565 | epsilon: 0.01 257 | 2540 episode | score: 8.95 | 
loss: 7.73821 | epsilon: 0.01 258 | 2550 episode | score: 9.63 | loss: 13.85527 | epsilon: 0.01 259 | 2560 episode | score: 9.53 | loss: 13.11086 | epsilon: 0.01 260 | 2570 episode | score: 9.42 | loss: 13.81915 | epsilon: 0.01 261 | 2580 episode | score: 9.37 | loss: 6.15294 | epsilon: 0.01 262 | 2590 episode | score: 9.33 | loss: 6.17238 | epsilon: 0.01 263 | 2600 episode | score: 9.27 | loss: 4.73876 | epsilon: 0.01 264 | 2610 episode | score: 9.20 | loss: 11.56850 | epsilon: 0.01 265 | 2620 episode | score: 9.09 | loss: 10.02787 | epsilon: 0.01 266 | 2630 episode | score: 8.98 | loss: 12.40529 | epsilon: 0.01 267 | 2640 episode | score: 8.97 | loss: 9.43007 | epsilon: 0.01 268 | 2650 episode | score: 8.91 | loss: 7.74540 | epsilon: 0.01 269 | 2660 episode | score: 8.87 | loss: 7.75366 | epsilon: 0.01 270 | 2670 episode | score: 8.88 | loss: 10.15491 | epsilon: 0.01 271 | 2680 episode | score: 8.84 | loss: 13.18449 | epsilon: 0.01 272 | 2690 episode | score: 8.85 | loss: 13.98673 | epsilon: 0.01 273 | 2700 episode | score: 8.78 | loss: 13.95599 | epsilon: 0.01 274 | 2710 episode | score: 8.77 | loss: 13.96089 | epsilon: 0.01 275 | 2720 episode | score: 8.76 | loss: 13.20681 | epsilon: 0.01 276 | 2730 episode | score: 8.72 | loss: 14.73774 | epsilon: 0.01 277 | 2740 episode | score: 8.68 | loss: 10.11293 | epsilon: 0.01 278 | 2750 episode | score: 8.66 | loss: 11.65639 | epsilon: 0.01 279 | 2760 episode | score: 8.65 | loss: 11.73176 | epsilon: 0.01 280 | 2770 episode | score: 8.58 | loss: 9.37598 | epsilon: 0.01 281 | 2780 episode | score: 8.52 | loss: 10.91862 | epsilon: 0.01 282 | 2790 episode | score: 8.54 | loss: 12.41489 | epsilon: 0.01 283 | 2800 episode | score: 8.53 | loss: 13.98008 | epsilon: 0.01 284 | 2810 episode | score: 8.52 | loss: 11.63640 | epsilon: 0.01 285 | 2820 episode | score: 8.51 | loss: 12.39264 | epsilon: 0.01 286 | 2830 episode | score: 8.52 | loss: 11.66165 | epsilon: 0.01 287 | 2840 episode | score: 8.51 | loss: 11.61177 | epsilon: 
0.01 288 | 2850 episode | score: 8.49 | loss: 10.06735 | epsilon: 0.01 289 | 2860 episode | score: 8.44 | loss: 10.97625 | epsilon: 0.01 290 | 2870 episode | score: 8.45 | loss: 12.41634 | epsilon: 0.01 291 | 2880 episode | score: 8.42 | loss: 11.74682 | epsilon: 0.01 292 | 2890 episode | score: 8.49 | loss: 7.79012 | epsilon: 0.01 293 | 2900 episode | score: 8.50 | loss: 9.37829 | epsilon: 0.01 294 | 2910 episode | score: 8.61 | loss: 9.34004 | epsilon: 0.01 295 | 2920 episode | score: 8.68 | loss: 7.81453 | epsilon: 0.01 296 | 2930 episode | score: 8.66 | loss: 14.08928 | epsilon: 0.01 297 | 2940 episode | score: 8.75 | loss: 12.54502 | epsilon: 0.01 298 | 2950 episode | score: 9.07 | loss: 13.32049 | epsilon: 0.01 299 | 2960 episode | score: 8.97 | loss: 14.04430 | epsilon: 0.01 300 | 2970 episode | score: 8.92 | loss: 6.30872 | epsilon: 0.01 301 | 2980 episode | score: 8.90 | loss: 7.05106 | epsilon: 0.01 302 | 2990 episode | score: 8.88 | loss: 8.59156 | epsilon: 0.01 303 | 3000 episode | score: 8.84 | loss: 10.28142 | epsilon: 0.01 304 | 3010 episode | score: 8.81 | loss: 9.38775 | epsilon: 0.01 305 | 3020 episode | score: 8.82 | loss: 10.99244 | epsilon: 0.01 306 | 3030 episode | score: 8.78 | loss: 11.80965 | epsilon: 0.01 307 | 3040 episode | score: 8.70 | loss: 11.76866 | epsilon: 0.01 308 | 3050 episode | score: 8.67 | loss: 10.16454 | epsilon: 0.01 309 | 3060 episode | score: 8.66 | loss: 11.08585 | epsilon: 0.01 310 | 3070 episode | score: 8.62 | loss: 10.20813 | epsilon: 0.01 311 | 3080 episode | score: 8.62 | loss: 10.95652 | epsilon: 0.01 312 | 3090 episode | score: 8.58 | loss: 15.65759 | epsilon: 0.01 313 | 3100 episode | score: 8.55 | loss: 14.13209 | epsilon: 0.01 314 | 3110 episode | score: 8.66 | loss: 10.94310 | epsilon: 0.01 315 | 3120 episode | score: 8.61 | loss: 14.11647 | epsilon: 0.01 316 | 3130 episode | score: 8.65 | loss: 11.84604 | epsilon: 0.01 317 | 3140 episode | score: 8.63 | loss: 12.52877 | epsilon: 0.01 318 | 3150 episode | 
score: 8.64 | loss: 12.53898 | epsilon: 0.01 319 | 3160 episode | score: 8.61 | loss: 11.78446 | epsilon: 0.01 320 | 3170 episode | score: 8.63 | loss: 15.68868 | epsilon: 0.01 321 | 3180 episode | score: 8.64 | loss: 11.01862 | epsilon: 0.01 322 | 3190 episode | score: 8.72 | loss: 15.68075 | epsilon: 0.01 323 | 3200 episode | score: 8.74 | loss: 9.40447 | epsilon: 0.01 324 | 3210 episode | score: 8.70 | loss: 8.67002 | epsilon: 0.01 325 | 3220 episode | score: 8.68 | loss: 9.44660 | epsilon: 0.01 326 | 3230 episode | score: 8.61 | loss: 10.25746 | epsilon: 0.01 327 | 3240 episode | score: 8.59 | loss: 15.70494 | epsilon: 0.01 328 | 3250 episode | score: 8.60 | loss: 10.97750 | epsilon: 0.01 329 | 3260 episode | score: 8.57 | loss: 8.68631 | epsilon: 0.01 330 | 3270 episode | score: 8.60 | loss: 11.00549 | epsilon: 0.01 331 | 3280 episode | score: 8.59 | loss: 10.23621 | epsilon: 0.01 332 | 3290 episode | score: 8.55 | loss: 11.06288 | epsilon: 0.01 333 | 3300 episode | score: 8.53 | loss: 11.07038 | epsilon: 0.01 334 | 3310 episode | score: 8.49 | loss: 9.51873 | epsilon: 0.01 335 | 3320 episode | score: 8.51 | loss: 10.22602 | epsilon: 0.01 336 | 3330 episode | score: 8.49 | loss: 10.26356 | epsilon: 0.01 337 | 3340 episode | score: 8.52 | loss: 10.24619 | epsilon: 0.01 338 | 3350 episode | score: 8.53 | loss: 11.02440 | epsilon: 0.01 339 | 3360 episode | score: 8.65 | loss: 14.21980 | epsilon: 0.01 340 | 3370 episode | score: 8.63 | loss: 9.46559 | epsilon: 0.01 341 | 3380 episode | score: 8.65 | loss: 9.56488 | epsilon: 0.01 342 | 3390 episode | score: 8.63 | loss: 11.02502 | epsilon: 0.01 343 | 3400 episode | score: 8.57 | loss: 11.85219 | epsilon: 0.01 344 | 3410 episode | score: 8.56 | loss: 13.38109 | epsilon: 0.01 345 | 3420 episode | score: 8.65 | loss: 7.90535 | epsilon: 0.01 346 | 3430 episode | score: 8.64 | loss: 15.00816 | epsilon: 0.01 347 | 3440 episode | score: 8.61 | loss: 14.24374 | epsilon: 0.01 348 | 3450 episode | score: 8.57 | loss: 
12.63750 | epsilon: 0.01 349 | 3460 episode | score: 8.56 | loss: 11.11115 | epsilon: 0.01 350 | 3470 episode | score: 8.56 | loss: 10.27376 | epsilon: 0.01 351 | 3480 episode | score: 8.62 | loss: 11.87048 | epsilon: 0.01 352 | 3490 episode | score: 8.59 | loss: 8.70988 | epsilon: 0.01 353 | 3500 episode | score: 8.58 | loss: 9.51818 | epsilon: 0.01 354 | 3510 episode | score: 8.61 | loss: 11.09497 | epsilon: 0.01 355 | 3520 episode | score: 8.57 | loss: 15.90761 | epsilon: 0.01 356 | 3530 episode | score: 8.66 | loss: 8.75900 | epsilon: 0.01 357 | 3540 episode | score: 8.63 | loss: 11.89002 | epsilon: 0.01 358 | 3550 episode | score: 8.60 | loss: 11.09713 | epsilon: 0.01 359 | 3560 episode | score: 8.59 | loss: 13.51019 | epsilon: 0.01 360 | 3570 episode | score: 8.53 | loss: 8.81450 | epsilon: 0.01 361 | 3580 episode | score: 8.52 | loss: 7.92999 | epsilon: 0.01 362 | 3590 episode | score: 8.47 | loss: 12.68926 | epsilon: 0.01 363 | 3600 episode | score: 8.44 | loss: 15.07101 | epsilon: 0.01 364 | 3610 episode | score: 8.43 | loss: 11.09076 | epsilon: 0.01 365 | 3620 episode | score: 8.42 | loss: 12.73851 | epsilon: 0.01 366 | 3630 episode | score: 8.46 | loss: 11.88773 | epsilon: 0.01 367 | 3640 episode | score: 8.46 | loss: 12.73633 | epsilon: 0.01 368 | 3650 episode | score: 8.48 | loss: 13.50091 | epsilon: 0.01 369 | 3660 episode | score: 8.50 | loss: 13.49799 | epsilon: 0.01 370 | 3670 episode | score: 8.46 | loss: 11.94654 | epsilon: 0.01 371 | 3680 episode | score: 8.54 | loss: 12.04877 | epsilon: 0.01 372 | 3690 episode | score: 8.55 | loss: 12.80853 | epsilon: 0.01 373 | 3700 episode | score: 8.54 | loss: 11.15186 | epsilon: 0.01 374 | 3710 episode | score: 8.55 | loss: 13.55354 | epsilon: 0.01 375 | 3720 episode | score: 8.65 | loss: 7.96912 | epsilon: 0.01 376 | 3730 episode | score: 8.63 | loss: 10.31971 | epsilon: 0.01 377 | 3740 episode | score: 8.69 | loss: 11.10972 | epsilon: 0.01 378 | 3750 episode | score: 8.70 | loss: 15.06225 | epsilon: 0.01 
379 | 3760 episode | score: 8.69 | loss: 12.18484 | epsilon: 0.01 380 | 3770 episode | score: 8.69 | loss: 10.39990 | epsilon: 0.01 381 | 3780 episode | score: 8.70 | loss: 9.68053 | epsilon: 0.01 382 | 3790 episode | score: 8.74 | loss: 7.95677 | epsilon: 0.01 383 | 3800 episode | score: 8.76 | loss: 12.74118 | epsilon: 0.01 384 | 3810 episode | score: 8.78 | loss: 11.16556 | epsilon: 0.01 385 | 3820 episode | score: 8.93 | loss: 11.24375 | epsilon: 0.01 386 | 3830 episode | score: 9.24 | loss: 8.01321 | epsilon: 0.01 387 | 3840 episode | score: 9.17 | loss: 9.61550 | epsilon: 0.01 388 | 3850 episode | score: 9.11 | loss: 13.57032 | epsilon: 0.01 389 | 3860 episode | score: 9.05 | loss: 9.55489 | epsilon: 0.01 390 | 3870 episode | score: 9.07 | loss: 11.13208 | epsilon: 0.01 391 | 3880 episode | score: 9.05 | loss: 11.19456 | epsilon: 0.01 392 | 3890 episode | score: 9.10 | loss: 8.16878 | epsilon: 0.01 393 | 3900 episode | score: 9.02 | loss: 10.33400 | epsilon: 0.01 394 | 3910 episode | score: 8.95 | loss: 11.69202 | epsilon: 0.01 395 | 3920 episode | score: 8.91 | loss: 6.57852 | epsilon: 0.01 396 | 3930 episode | score: 8.88 | loss: 12.02832 | epsilon: 0.01 397 | 3940 episode | score: 8.82 | loss: 9.25438 | epsilon: 0.01 398 | 3950 episode | score: 8.81 | loss: 11.35655 | epsilon: 0.01 399 | 3960 episode | score: 8.72 | loss: 13.42752 | epsilon: 0.01 400 | 3970 episode | score: 8.65 | loss: 14.05224 | epsilon: 0.01 401 | 3980 episode | score: 8.63 | loss: 12.41926 | epsilon: 0.01 402 | 3990 episode | score: 8.66 | loss: 9.58528 | epsilon: 0.01 403 | 4000 episode | score: 8.72 | loss: 9.59237 | epsilon: 0.01 404 | 4010 episode | score: 8.71 | loss: 9.69728 | epsilon: 0.01 405 | 4020 episode | score: 8.67 | loss: 13.23359 | epsilon: 0.01 406 | 4030 episode | score: 8.64 | loss: 10.66293 | epsilon: 0.01 407 | 4040 episode | score: 8.61 | loss: 13.73763 | epsilon: 0.01 408 | 4050 episode | score: 8.54 | loss: 17.78467 | epsilon: 0.01 409 | 4060 episode | score: 
8.57 | loss: 9.59514 | epsilon: 0.01 410 | 4070 episode | score: 8.51 | loss: 12.94440 | epsilon: 0.01 411 | 4080 episode | score: 8.49 | loss: 9.02911 | epsilon: 0.01 412 | 4090 episode | score: 8.45 | loss: 12.79924 | epsilon: 0.01 413 | 4100 episode | score: 8.44 | loss: 15.17093 | epsilon: 0.01 414 | 4110 episode | score: 8.44 | loss: 14.67449 | epsilon: 0.01 415 | 4120 episode | score: 8.41 | loss: 13.59448 | epsilon: 0.01 416 | 4130 episode | score: 8.46 | loss: 16.86084 | epsilon: 0.01 417 | 4140 episode | score: 8.55 | loss: 15.98142 | epsilon: 0.01 418 | 4150 episode | score: 8.56 | loss: 12.12033 | epsilon: 0.01 419 | 4160 episode | score: 8.56 | loss: 12.86493 | epsilon: 0.01 420 | 4170 episode | score: 8.51 | loss: 13.62834 | epsilon: 0.01 421 | 4180 episode | score: 8.49 | loss: 17.55323 | epsilon: 0.01 422 | 4190 episode | score: 8.44 | loss: 12.87850 | epsilon: 0.01 423 | 4200 episode | score: 8.42 | loss: 12.78649 | epsilon: 0.01 424 | 4210 episode | score: 8.46 | loss: 11.28722 | epsilon: 0.01 425 | 4220 episode | score: 8.44 | loss: 9.79293 | epsilon: 0.01 426 | 4230 episode | score: 8.67 | loss: 13.11148 | epsilon: 0.01 427 | 4240 episode | score: 8.73 | loss: 11.36636 | epsilon: 0.01 428 | 4250 episode | score: 8.68 | loss: 18.31320 | epsilon: 0.01 429 | 4260 episode | score: 8.64 | loss: 15.54791 | epsilon: 0.01 430 | 4270 episode | score: 8.64 | loss: 10.45561 | epsilon: 0.01 431 | 4280 episode | score: 8.58 | loss: 14.11239 | epsilon: 0.01 432 | 4290 episode | score: 8.57 | loss: 12.93595 | epsilon: 0.01 433 | 4300 episode | score: 8.60 | loss: 15.43336 | epsilon: 0.01 434 | 4310 episode | score: 8.85 | loss: 10.41784 | epsilon: 0.01 435 | 4320 episode | score: 8.97 | loss: 14.09765 | epsilon: 0.01 436 | 4330 episode | score: 9.05 | loss: 11.35991 | epsilon: 0.01 437 | 4340 episode | score: 9.02 | loss: 11.48805 | epsilon: 0.01 438 | 4350 episode | score: 8.93 | loss: 12.31691 | epsilon: 0.01 439 | 4360 episode | score: 8.87 | loss: 10.58508 
| epsilon: 0.01 440 | 4370 episode | score: 8.82 | loss: 8.43289 | epsilon: 0.01 441 | 4380 episode | score: 8.83 | loss: 8.85667 | epsilon: 0.01 442 | 4390 episode | score: 8.92 | loss: 8.89580 | epsilon: 0.01 443 | 4400 episode | score: 8.96 | loss: 12.08995 | epsilon: 0.01 444 | 4410 episode | score: 8.91 | loss: 11.35356 | epsilon: 0.01 445 | 4420 episode | score: 8.87 | loss: 11.24321 | epsilon: 0.01 446 | 4430 episode | score: 8.80 | loss: 11.34462 | epsilon: 0.01 447 | 4440 episode | score: 8.83 | loss: 15.81705 | epsilon: 0.01 448 | 4450 episode | score: 8.83 | loss: 11.94490 | epsilon: 0.01 449 | 4460 episode | score: 8.81 | loss: 13.50714 | epsilon: 0.01 450 | 4470 episode | score: 8.73 | loss: 15.08722 | epsilon: 0.01 451 | 4480 episode | score: 8.71 | loss: 10.57810 | epsilon: 0.01 452 | 4490 episode | score: 8.86 | loss: 7.95716 | epsilon: 0.01 453 | 4500 episode | score: 8.81 | loss: 9.58130 | epsilon: 0.01 454 | 4510 episode | score: 8.74 | loss: 14.31959 | epsilon: 0.01 455 | 4520 episode | score: 8.71 | loss: 12.78611 | epsilon: 0.01 456 | 4530 episode | score: 8.70 | loss: 15.13371 | epsilon: 0.01 457 | 4540 episode | score: 8.67 | loss: 11.13598 | epsilon: 0.01 458 | 4550 episode | score: 8.73 | loss: 9.87144 | epsilon: 0.01 459 | 4560 episode | score: 8.69 | loss: 15.18121 | epsilon: 0.01 460 | 4570 episode | score: 8.71 | loss: 17.76075 | epsilon: 0.01 461 | 4580 episode | score: 8.72 | loss: 17.46336 | epsilon: 0.01 462 | 4590 episode | score: 8.70 | loss: 16.82282 | epsilon: 0.01 463 | 4600 episode | score: 8.67 | loss: 11.97398 | epsilon: 0.01 464 | 4610 episode | score: 8.61 | loss: 5.62248 | epsilon: 0.01 465 | 4620 episode | score: 8.62 | loss: 12.76912 | epsilon: 0.01 466 | 4630 episode | score: 8.60 | loss: 8.81276 | epsilon: 0.01 467 | 4640 episode | score: 8.59 | loss: 10.44362 | epsilon: 0.01 468 | 4650 episode | score: 8.60 | loss: 15.01834 | epsilon: 0.01 469 | 4660 episode | score: 8.62 | loss: 17.10469 | epsilon: 0.01 470 | 4670 
episode | score: 8.65 | loss: 14.30819 | epsilon: 0.01 471 | 4680 episode | score: 8.64 | loss: 14.29096 | epsilon: 0.01 472 | 4690 episode | score: 8.60 | loss: 12.26738 | epsilon: 0.01 473 | 4700 episode | score: 8.62 | loss: 11.95053 | epsilon: 0.01 474 | 4710 episode | score: 8.62 | loss: 10.07705 | epsilon: 0.01 475 | 4720 episode | score: 8.59 | loss: 4.77848 | epsilon: 0.01 476 | 4730 episode | score: 8.63 | loss: 10.37387 | epsilon: 0.01 477 | 4740 episode | score: 8.87 | loss: 11.96993 | epsilon: 0.01 478 | 4750 episode | score: 8.79 | loss: 8.79129 | epsilon: 0.01 479 | 4760 episode | score: 8.78 | loss: 7.22931 | epsilon: 0.01 480 | 4770 episode | score: 8.74 | loss: 11.15121 | epsilon: 0.01 481 | 4780 episode | score: 8.72 | loss: 12.72968 | epsilon: 0.01 482 | 4790 episode | score: 8.82 | loss: 10.33394 | epsilon: 0.01 483 | 4800 episode | score: 8.78 | loss: 10.33138 | epsilon: 0.01 484 | 4810 episode | score: 8.75 | loss: 15.39932 | epsilon: 0.01 485 | 4820 episode | score: 8.76 | loss: 11.96936 | epsilon: 0.01 486 | 4830 episode | score: 8.73 | loss: 24.04654 | epsilon: 0.01 487 | 4840 episode | score: 8.75 | loss: 15.82520 | epsilon: 0.01 488 | 4850 episode | score: 8.70 | loss: 15.51851 | epsilon: 0.01 489 | 4860 episode | score: 8.64 | loss: 6.37486 | epsilon: 0.01 490 | 4870 episode | score: 8.60 | loss: 19.06402 | epsilon: 0.01 491 | 4880 episode | score: 8.56 | loss: 15.11088 | epsilon: 0.01 492 | 4890 episode | score: 8.60 | loss: 13.55245 | epsilon: 0.01 493 | 4900 episode | score: 8.57 | loss: 16.86619 | epsilon: 0.01 494 | 4910 episode | score: 8.55 | loss: 15.18091 | epsilon: 0.01 495 | 4920 episode | score: 8.53 | loss: 14.59640 | epsilon: 0.01 496 | 4930 episode | score: 8.49 | loss: 12.74775 | epsilon: 0.01 497 | 4940 episode | score: 8.48 | loss: 12.72942 | epsilon: 0.01 498 | 4950 episode | score: 8.52 | loss: 14.33515 | epsilon: 0.01 499 | 4960 episode | score: 8.54 | loss: 9.65471 | epsilon: 0.01 500 | 4970 episode | score: 8.51 | 
loss: 18.19523 | epsilon: 0.01 501 | 4980 episode | score: 8.45 | loss: 15.07701 | epsilon: 0.01 502 | 4990 episode | score: 8.59 | loss: 13.52352 | epsilon: 0.01 503 |
--------------------------------------------------------------------------------
/out/trace_DTQN_10.txt:
--------------------------------------------------------------------------------
1 | state size: 2 2 | action size: 2 3 | 0 episode | score: 16.00 | loss: 0.00000 | epsilon: 1.00 4 | 10 episode | score: 16.48 | loss: 0.00000 | epsilon: 1.00 5 | 20 episode | score: 17.15 | loss: 0.00000 | epsilon: 1.00 6 | 30 episode | score: 17.02 | loss: 0.00000 | epsilon: 1.00 7 | 40 episode | score: 17.05 | loss: 0.00000 | epsilon: 1.00 8 | 50 episode | score: 17.94 | loss: 0.18528 | epsilon: 0.99 9 | 60 episode | score: 18.16 | loss: 0.17950 | epsilon: 0.98 10 | 70 episode | score: 18.58 | loss: 0.12416 | epsilon: 0.97 11 | 80 episode | score: 18.75 | loss: 0.20064 | epsilon: 0.96 12 | 90 episode | score: 19.12 | loss: 0.85149 | epsilon: 0.95 13 | 100 episode | score: 20.25 | loss: 0.31414 | epsilon: 0.93 14 | 110 episode | score: 20.22 | loss: 0.46559 | epsilon: 0.92 15 | 120 episode | score: 20.24 | loss: 0.34208 | epsilon: 0.91 16 | 130 episode | score: 20.15 | loss: 0.20250 | epsilon: 0.90 17 | 140 episode | score: 20.55 | loss: 1.23666 | epsilon: 0.89 18 | 150 episode | score: 20.87 | loss: 0.61966 | epsilon: 0.88 19 | 160 episode | score: 21.53 | loss: 0.47718 | epsilon: 0.86 20 | 170 episode | score: 21.49 | loss: 1.12160 | epsilon: 0.85 21 | 180 episode | score: 21.84 | loss: 1.05996 | epsilon: 0.84 22 | 190 episode | score: 21.98 | loss: 0.28847 | epsilon: 0.83 23 | 200 episode | score: 22.02 | loss: 0.80320 | epsilon: 0.82 24 | 210 episode | score: 21.73 | loss: 0.93839 | epsilon: 0.80 25 | 220 episode | score: 22.06 | loss: 0.60672 | epsilon: 0.79 26 | 230 episode | score: 21.88 | loss: 0.04399 | epsilon: 0.78 27 | 240 episode | score: 21.35 | loss: 0.95139 | epsilon: 0.77 28 | 250 episode | score: 
20.59 | loss: 0.65200 | epsilon: 0.77 29 | 260 episode | score: 20.06 | loss: 0.98826 | epsilon: 0.76 30 | 270 episode | score: 20.17 | loss: 1.66683 | epsilon: 0.75 31 | 280 episode | score: 19.94 | loss: 1.04465 | epsilon: 0.74 32 | 290 episode | score: 19.98 | loss: 2.11482 | epsilon: 0.73 33 | 300 episode | score: 19.88 | loss: 1.09425 | epsilon: 0.72 34 | 310 episode | score: 19.42 | loss: 1.52128 | epsilon: 0.71 35 | 320 episode | score: 19.30 | loss: 1.16642 | epsilon: 0.70 36 | 330 episode | score: 19.24 | loss: 1.19288 | epsilon: 0.69 37 | 340 episode | score: 18.74 | loss: 1.56613 | epsilon: 0.68 38 | 350 episode | score: 18.84 | loss: 1.20542 | epsilon: 0.67 39 | 360 episode | score: 18.53 | loss: 1.62247 | epsilon: 0.66 40 | 370 episode | score: 18.38 | loss: 0.44018 | epsilon: 0.65 41 | 380 episode | score: 18.52 | loss: 1.69055 | epsilon: 0.64 42 | 390 episode | score: 18.54 | loss: 0.88652 | epsilon: 0.63 43 | 400 episode | score: 18.25 | loss: 0.88762 | epsilon: 0.62 44 | 410 episode | score: 18.11 | loss: 0.91311 | epsilon: 0.62 45 | 420 episode | score: 17.75 | loss: 1.78714 | epsilon: 0.61 46 | 430 episode | score: 17.58 | loss: 2.47997 | epsilon: 0.60 47 | 440 episode | score: 17.88 | loss: 2.39753 | epsilon: 0.59 48 | 450 episode | score: 17.60 | loss: 1.39301 | epsilon: 0.58 49 | 460 episode | score: 17.32 | loss: 2.81882 | epsilon: 0.57 50 | 470 episode | score: 17.52 | loss: 1.44277 | epsilon: 0.56 51 | 480 episode | score: 17.03 | loss: 0.96965 | epsilon: 0.56 52 | 490 episode | score: 16.77 | loss: 3.42537 | epsilon: 0.55 53 | 500 episode | score: 16.33 | loss: 2.43177 | epsilon: 0.54 54 | 510 episode | score: 16.80 | loss: 1.97684 | epsilon: 0.53 55 | 520 episode | score: 16.40 | loss: 2.07802 | epsilon: 0.52 56 | 530 episode | score: 16.17 | loss: 2.02413 | epsilon: 0.52 57 | 540 episode | score: 15.87 | loss: 4.07846 | epsilon: 0.51 58 | 550 episode | score: 15.64 | loss: 3.07883 | epsilon: 0.50 59 | 560 episode | score: 15.57 | loss: 
1.05649 | epsilon: 0.49 60 | 570 episode | score: 15.19 | loss: 1.05230 | epsilon: 0.49 61 | 580 episode | score: 15.15 | loss: 3.65838 | epsilon: 0.48 62 | 590 episode | score: 14.71 | loss: 2.22756 | epsilon: 0.47 63 | 600 episode | score: 14.31 | loss: 3.75582 | epsilon: 0.47 64 | 610 episode | score: 14.65 | loss: 3.74815 | epsilon: 0.46 65 | 620 episode | score: 14.83 | loss: 3.76151 | epsilon: 0.45 66 | 630 episode | score: 14.87 | loss: 2.75478 | epsilon: 0.44 67 | 640 episode | score: 14.56 | loss: 1.11075 | epsilon: 0.44 68 | 650 episode | score: 14.56 | loss: 3.89268 | epsilon: 0.43 69 | 660 episode | score: 14.25 | loss: 2.23375 | epsilon: 0.42 70 | 670 episode | score: 14.22 | loss: 2.98288 | epsilon: 0.41 71 | 680 episode | score: 13.91 | loss: 1.69801 | epsilon: 0.41 72 | 690 episode | score: 13.67 | loss: 2.29384 | epsilon: 0.40 73 | 700 episode | score: 13.45 | loss: 4.51319 | epsilon: 0.40 74 | 710 episode | score: 13.51 | loss: 2.84723 | epsilon: 0.39 75 | 720 episode | score: 13.40 | loss: 3.99457 | epsilon: 0.38 76 | 730 episode | score: 13.17 | loss: 5.72311 | epsilon: 0.38 77 | 740 episode | score: 12.95 | loss: 4.85307 | epsilon: 0.37 78 | 750 episode | score: 12.84 | loss: 4.10744 | epsilon: 0.36 79 | 760 episode | score: 12.79 | loss: 5.27431 | epsilon: 0.36 80 | 770 episode | score: 12.73 | loss: 4.68064 | epsilon: 0.35 81 | 780 episode | score: 12.64 | loss: 4.18007 | epsilon: 0.34 82 | 790 episode | score: 12.63 | loss: 4.76635 | epsilon: 0.34 83 | 800 episode | score: 12.57 | loss: 2.98694 | epsilon: 0.33 84 | 810 episode | score: 12.43 | loss: 3.58431 | epsilon: 0.32 85 | 820 episode | score: 12.58 | loss: 0.61492 | epsilon: 0.32 86 | 830 episode | score: 12.51 | loss: 3.59966 | epsilon: 0.31 87 | 840 episode | score: 12.72 | loss: 5.47352 | epsilon: 0.30 88 | 850 episode | score: 12.68 | loss: 4.25853 | epsilon: 0.30 89 | 860 episode | score: 12.53 | loss: 3.09318 | epsilon: 0.29 90 | 870 episode | score: 12.31 | loss: 2.08403 | 
epsilon: 0.28 91 | 880 episode | score: 12.14 | loss: 7.99741 | epsilon: 0.28 92 | 890 episode | score: 11.92 | loss: 2.47761 | epsilon: 0.27 93 | 900 episode | score: 12.51 | loss: 1.89428 | epsilon: 0.26 94 | 910 episode | score: 12.41 | loss: 3.13464 | epsilon: 0.26 95 | 920 episode | score: 12.14 | loss: 5.61082 | epsilon: 0.25 96 | 930 episode | score: 11.93 | loss: 3.76492 | epsilon: 0.25 97 | 940 episode | score: 11.98 | loss: 5.19080 | epsilon: 0.24 98 | 950 episode | score: 11.94 | loss: 6.96484 | epsilon: 0.23 99 | 960 episode | score: 11.99 | loss: 3.80788 | epsilon: 0.23 100 | 970 episode | score: 11.70 | loss: 9.48063 | epsilon: 0.22 101 | 980 episode | score: 11.76 | loss: 3.18651 | epsilon: 0.22 102 | 990 episode | score: 12.31 | loss: 5.74552 | epsilon: 0.21 103 | 1000 episode | score: 12.32 | loss: 7.71358 | epsilon: 0.20 104 | 1010 episode | score: 12.02 | loss: 3.31016 | epsilon: 0.19 105 | 1020 episode | score: 11.75 | loss: 3.88349 | epsilon: 0.19 106 | 1030 episode | score: 11.42 | loss: 4.62133 | epsilon: 0.18 107 | 1040 episode | score: 11.22 | loss: 6.51512 | epsilon: 0.18 108 | 1050 episode | score: 11.11 | loss: 3.32236 | epsilon: 0.17 109 | 1060 episode | score: 11.02 | loss: 7.15486 | epsilon: 0.17 110 | 1070 episode | score: 10.90 | loss: 5.20952 | epsilon: 0.16 111 | 1080 episode | score: 10.76 | loss: 7.23930 | epsilon: 0.16 112 | 1090 episode | score: 10.72 | loss: 6.55639 | epsilon: 0.15 113 | 1100 episode | score: 10.57 | loss: 7.21093 | epsilon: 0.15 114 | 1110 episode | score: 10.37 | loss: 7.88799 | epsilon: 0.14 115 | 1120 episode | score: 10.38 | loss: 7.39938 | epsilon: 0.14 116 | 1130 episode | score: 10.24 | loss: 5.30793 | epsilon: 0.13 117 | 1140 episode | score: 10.10 | loss: 8.76813 | epsilon: 0.13 118 | 1150 episode | score: 10.33 | loss: 5.46215 | epsilon: 0.12 119 | 1160 episode | score: 10.22 | loss: 7.47411 | epsilon: 0.11 120 | 1170 episode | score: 10.06 | loss: 6.90808 | epsilon: 0.11 121 | 1180 episode | 
score: 9.96 | loss: 6.75095 | epsilon: 0.11 122 | 1190 episode | score: 9.86 | loss: 9.46002 | epsilon: 0.10 123 | 1200 episode | score: 9.78 | loss: 10.21884 | epsilon: 0.10 124 | 1210 episode | score: 9.71 | loss: 6.17822 | epsilon: 0.09 125 | 1220 episode | score: 9.62 | loss: 8.82867 | epsilon: 0.09 126 | 1230 episode | score: 9.67 | loss: 9.50774 | epsilon: 0.08 127 | 1240 episode | score: 9.64 | loss: 8.20357 | epsilon: 0.07 128 | 1250 episode | score: 9.57 | loss: 7.58357 | epsilon: 0.07 129 | 1260 episode | score: 9.55 | loss: 7.55123 | epsilon: 0.06 130 | 1270 episode | score: 9.44 | loss: 10.22959 | epsilon: 0.06 131 | 1280 episode | score: 9.51 | loss: 8.21039 | epsilon: 0.05 132 | 1290 episode | score: 9.37 | loss: 7.57248 | epsilon: 0.05 133 | 1300 episode | score: 9.25 | loss: 8.99998 | epsilon: 0.04 134 | 1310 episode | score: 9.15 | loss: 8.93621 | epsilon: 0.04 135 | 1320 episode | score: 9.08 | loss: 6.27452 | epsilon: 0.04 136 | 1330 episode | score: 9.16 | loss: 9.62946 | epsilon: 0.03 137 | 1340 episode | score: 9.63 | loss: 7.61284 | epsilon: 0.02 138 | 1350 episode | score: 9.48 | loss: 12.47634 | epsilon: 0.02 139 | 1360 episode | score: 9.36 | loss: 10.38395 | epsilon: 0.01 140 | 1370 episode | score: 9.25 | loss: 6.25999 | epsilon: 0.01 141 | 1380 episode | score: 9.19 | loss: 9.05333 | epsilon: 0.01 142 | 1390 episode | score: 9.07 | loss: 8.36740 | epsilon: 0.01 143 | 1400 episode | score: 9.10 | loss: 13.26970 | epsilon: 0.01 144 | 1410 episode | score: 9.06 | loss: 8.44920 | epsilon: 0.01 145 | 1420 episode | score: 9.08 | loss: 8.39362 | epsilon: 0.01 146 | 1430 episode | score: 9.05 | loss: 9.86799 | epsilon: 0.01 147 | 1440 episode | score: 9.13 | loss: 11.96962 | epsilon: 0.01 148 | 1450 episode | score: 9.06 | loss: 11.25040 | epsilon: 0.01 149 | 1460 episode | score: 9.08 | loss: 9.15750 | epsilon: 0.01 150 | 1470 episode | score: 9.02 | loss: 10.56983 | epsilon: 0.01 151 | 1480 episode | score: 8.97 | loss: 8.49944 | epsilon: 
0.01 152 | 1490 episode | score: 8.91 | loss: 7.08509 | epsilon: 0.01 153 | 1500 episode | score: 8.91 | loss: 9.17531 | epsilon: 0.01 154 | 1510 episode | score: 8.86 | loss: 7.78239 | epsilon: 0.01 155 | 1520 episode | score: 8.82 | loss: 10.60622 | epsilon: 0.01 156 | 1530 episode | score: 8.77 | loss: 10.68818 | epsilon: 0.01 157 | 1540 episode | score: 8.72 | loss: 9.89931 | epsilon: 0.01 158 | 1550 episode | score: 8.71 | loss: 13.45332 | epsilon: 0.01 159 | 1560 episode | score: 8.69 | loss: 9.95345 | epsilon: 0.01 160 | 1570 episode | score: 8.70 | loss: 12.07025 | epsilon: 0.01 161 | 1580 episode | score: 8.94 | loss: 9.34593 | epsilon: 0.01 162 | 1590 episode | score: 8.89 | loss: 12.17938 | epsilon: 0.01 163 | 1600 episode | score: 8.83 | loss: 10.72823 | epsilon: 0.01 164 | 1610 episode | score: 8.79 | loss: 6.47576 | epsilon: 0.01 165 | 1620 episode | score: 8.72 | loss: 10.74697 | epsilon: 0.01 166 | 1630 episode | score: 8.71 | loss: 5.77982 | epsilon: 0.01 167 | 1640 episode | score: 8.70 | loss: 11.54216 | epsilon: 0.01 168 | 1650 episode | score: 8.69 | loss: 8.69251 | epsilon: 0.01 169 | 1660 episode | score: 8.81 | loss: 10.78316 | epsilon: 0.01 170 | 1670 episode | score: 8.78 | loss: 6.49401 | epsilon: 0.01 171 | 1680 episode | score: 8.77 | loss: 7.22237 | epsilon: 0.01 172 | 1690 episode | score: 8.89 | loss: 8.70512 | epsilon: 0.01 173 | 1700 episode | score: 9.02 | loss: 8.67514 | epsilon: 0.01 174 | 1710 episode | score: 8.95 | loss: 6.52219 | epsilon: 0.01 175 | 1720 episode | score: 8.88 | loss: 8.06107 | epsilon: 0.01 176 | 1730 episode | score: 8.81 | loss: 7.99918 | epsilon: 0.01 177 | 1740 episode | score: 8.86 | loss: 8.01453 | epsilon: 0.01 178 | 1750 episode | score: 8.84 | loss: 7.27068 | epsilon: 0.01 179 | 1760 episode | score: 8.82 | loss: 10.14679 | epsilon: 0.01 180 | 1770 episode | score: 8.78 | loss: 10.91075 | epsilon: 0.01 181 | 1780 episode | score: 8.77 | loss: 10.94096 | epsilon: 0.01 182 | 1790 episode | score: 8.77 
| loss: 8.04180 | epsilon: 0.01 183 | 1800 episode | score: 8.92 | loss: 6.63493 | epsilon: 0.01 184 | 1810 episode | score: 8.83 | loss: 10.22885 | epsilon: 0.01 185 | 1820 episode | score: 8.75 | loss: 8.09923 | epsilon: 0.01 186 | 1830 episode | score: 8.76 | loss: 7.35126 | epsilon: 0.01 187 | 1840 episode | score: 9.33 | loss: 9.55382 | epsilon: 0.01 188 | 1850 episode | score: 9.34 | loss: 8.09047 | epsilon: 0.01 189 | 1860 episode | score: 9.30 | loss: 6.61888 | epsilon: 0.01 190 | 1870 episode | score: 9.24 | loss: 9.54714 | epsilon: 0.01 191 | 1880 episode | score: 9.60 | loss: 10.28490 | epsilon: 0.01 192 | 1890 episode | score: 9.48 | loss: 6.67569 | epsilon: 0.01 193 | 1900 episode | score: 9.40 | loss: 11.07699 | epsilon: 0.01 194 | 1910 episode | score: 9.29 | loss: 7.40748 | epsilon: 0.01 195 | 1920 episode | score: 9.26 | loss: 8.09952 | epsilon: 0.01 196 | 1930 episode | score: 9.23 | loss: 8.16106 | epsilon: 0.01 197 | 1940 episode | score: 9.16 | loss: 9.59923 | epsilon: 0.01 198 | 1950 episode | score: 9.07 | loss: 11.85551 | epsilon: 0.01 199 | 1960 episode | score: 8.93 | loss: 9.61497 | epsilon: 0.01 200 | 1970 episode | score: 8.90 | loss: 9.60651 | epsilon: 0.01 201 | 1980 episode | score: 8.83 | loss: 10.42463 | epsilon: 0.01 202 | 1990 episode | score: 8.81 | loss: 6.69849 | epsilon: 0.01 203 | 2000 episode | score: 8.76 | loss: 8.90158 | epsilon: 0.01 204 | 2010 episode | score: 8.78 | loss: 11.14114 | epsilon: 0.01 205 | 2020 episode | score: 8.74 | loss: 11.22850 | epsilon: 0.01 206 | 2030 episode | score: 8.74 | loss: 13.42511 | epsilon: 0.01 207 | 2040 episode | score: 8.70 | loss: 10.42988 | epsilon: 0.01 208 | 2050 episode | score: 8.70 | loss: 11.23738 | epsilon: 0.01 209 | 2060 episode | score: 8.64 | loss: 11.95969 | epsilon: 0.01 210 | 2070 episode | score: 8.71 | loss: 10.46605 | epsilon: 0.01 211 | 2080 episode | score: 8.71 | loss: 12.78350 | epsilon: 0.01 212 | 2090 episode | score: 8.70 | loss: 12.75578 | epsilon: 0.01 213 
| 2100 episode | score: 8.68 | loss: 7.51685 | epsilon: 0.01 214 | 2110 episode | score: 8.81 | loss: 12.03533 | epsilon: 0.01 215 | 2120 episode | score: 8.81 | loss: 9.81312 | epsilon: 0.01 216 | 2130 episode | score: 8.92 | loss: 9.80107 | epsilon: 0.01 217 | 2140 episode | score: 8.82 | loss: 7.60721 | epsilon: 0.01 218 | 2150 episode | score: 8.78 | loss: 9.06595 | epsilon: 0.01 219 | 2160 episode | score: 8.78 | loss: 9.80017 | epsilon: 0.01 220 | 2170 episode | score: 8.74 | loss: 12.85833 | epsilon: 0.01 221 | 2180 episode | score: 8.69 | loss: 11.30474 | epsilon: 0.01 222 | 2190 episode | score: 8.80 | loss: 11.48147 | epsilon: 0.01 223 | 2200 episode | score: 8.78 | loss: 7.64000 | epsilon: 0.01 224 | 2210 episode | score: 8.73 | loss: 9.82936 | epsilon: 0.01 225 | 2220 episode | score: 8.69 | loss: 10.60633 | epsilon: 0.01 226 | 2230 episode | score: 8.66 | loss: 12.14406 | epsilon: 0.01 227 | 2240 episode | score: 8.69 | loss: 12.14394 | epsilon: 0.01 228 | 2250 episode | score: 8.66 | loss: 13.65806 | epsilon: 0.01 229 | 2260 episode | score: 8.62 | loss: 9.91018 | epsilon: 0.01 230 | 2270 episode | score: 8.65 | loss: 10.62868 | epsilon: 0.01 231 | 2280 episode | score: 8.61 | loss: 13.67840 | epsilon: 0.01 232 | 2290 episode | score: 8.68 | loss: 11.39243 | epsilon: 0.01 233 | 2300 episode | score: 8.63 | loss: 11.43112 | epsilon: 0.01 234 | 2310 episode | score: 8.66 | loss: 10.68759 | epsilon: 0.01 235 | 2320 episode | score: 8.62 | loss: 12.98507 | epsilon: 0.01 236 | 2330 episode | score: 8.60 | loss: 9.93560 | epsilon: 0.01 237 | 2340 episode | score: 8.59 | loss: 13.80500 | epsilon: 0.01 238 | 2350 episode | score: 8.63 | loss: 8.43398 | epsilon: 0.01 239 | 2360 episode | score: 8.63 | loss: 7.72701 | epsilon: 0.01 240 | 2370 episode | score: 8.60 | loss: 9.18573 | epsilon: 0.01 241 | 2380 episode | score: 8.62 | loss: 10.83447 | epsilon: 0.01 242 | 2390 episode | score: 8.57 | loss: 11.49614 | epsilon: 0.01 243 | 2400 episode | score: 8.61 | 
loss: 6.16400 | epsilon: 0.01 244 | 2410 episode | score: 8.61 | loss: 10.03061 | epsilon: 0.01 245 | 2420 episode | score: 8.59 | loss: 10.77323 | epsilon: 0.01 246 | 2430 episode | score: 8.64 | loss: 9.24260 | epsilon: 0.01 247 | 2440 episode | score: 8.60 | loss: 10.04683 | epsilon: 0.01 248 | 2450 episode | score: 8.69 | loss: 12.27101 | epsilon: 0.01 249 | 2460 episode | score: 9.43 | loss: 9.24598 | epsilon: 0.01 250 | 2470 episode | score: 9.42 | loss: 8.50105 | epsilon: 0.01 251 | 2480 episode | score: 9.33 | loss: 9.97533 | epsilon: 0.01 252 | 2490 episode | score: 9.33 | loss: 6.14554 | epsilon: 0.01 253 | 2500 episode | score: 9.33 | loss: 5.40624 | epsilon: 0.01 254 | 2510 episode | score: 9.21 | loss: 10.75754 | epsilon: 0.01 255 | 2520 episode | score: 9.13 | loss: 6.99077 | epsilon: 0.01 256 | 2530 episode | score: 9.07 | loss: 8.52315 | epsilon: 0.01 257 | 2540 episode | score: 9.05 | loss: 6.93344 | epsilon: 0.01 258 | 2550 episode | score: 8.99 | loss: 6.93618 | epsilon: 0.01 259 | 2560 episode | score: 8.95 | loss: 13.07306 | epsilon: 0.01 260 | 2570 episode | score: 8.94 | loss: 11.62716 | epsilon: 0.01 261 | 2580 episode | score: 8.85 | loss: 10.78042 | epsilon: 0.01 262 | 2590 episode | score: 8.81 | loss: 11.56523 | epsilon: 0.01 263 | 2600 episode | score: 8.80 | loss: 9.23891 | epsilon: 0.01 264 | 2610 episode | score: 8.78 | loss: 10.78324 | epsilon: 0.01 265 | 2620 episode | score: 8.70 | loss: 10.02428 | epsilon: 0.01 266 | 2630 episode | score: 8.65 | loss: 10.77991 | epsilon: 0.01 267 | 2640 episode | score: 8.65 | loss: 13.08437 | epsilon: 0.01 268 | 2650 episode | score: 8.65 | loss: 13.09908 | epsilon: 0.01 269 | 2660 episode | score: 8.63 | loss: 17.00031 | epsilon: 0.01 270 | 2670 episode | score: 8.65 | loss: 10.79809 | epsilon: 0.01 271 | 2680 episode | score: 8.67 | loss: 7.69315 | epsilon: 0.01 272 | 2690 episode | score: 8.64 | loss: 8.46132 | epsilon: 0.01 273 | 2700 episode | score: 8.63 | loss: 16.19745 | epsilon: 0.01 
274 | 2710 episode | score: 8.61 | loss: 12.32985 | epsilon: 0.01 275 | 2720 episode | score: 8.57 | loss: 9.25499 | epsilon: 0.01 276 | 2730 episode | score: 8.55 | loss: 12.31944 | epsilon: 0.01 277 | 2740 episode | score: 8.57 | loss: 11.62032 | epsilon: 0.01 278 | 2750 episode | score: 8.56 | loss: 11.63290 | epsilon: 0.01 279 | 2760 episode | score: 8.52 | loss: 13.10786 | epsilon: 0.01 280 | 2770 episode | score: 8.51 | loss: 12.36207 | epsilon: 0.01 281 | 2780 episode | score: 8.50 | loss: 15.42316 | epsilon: 0.01 282 | 2790 episode | score: 8.56 | loss: 13.21941 | epsilon: 0.01 283 | 2800 episode | score: 8.57 | loss: 10.06483 | epsilon: 0.01 284 | 2810 episode | score: 8.57 | loss: 11.56093 | epsilon: 0.01 285 | 2820 episode | score: 8.55 | loss: 14.74690 | epsilon: 0.01 286 | 2830 episode | score: 8.54 | loss: 11.64284 | epsilon: 0.01 287 | 2840 episode | score: 8.52 | loss: 10.13775 | epsilon: 0.01 288 | 2850 episode | score: 8.51 | loss: 15.48453 | epsilon: 0.01 289 | 2860 episode | score: 8.53 | loss: 13.18159 | epsilon: 0.01 290 | 2870 episode | score: 8.58 | loss: 10.06775 | epsilon: 0.01 291 | 2880 episode | score: 8.61 | loss: 9.30755 | epsilon: 0.01 292 | 2890 episode | score: 8.65 | loss: 8.55476 | epsilon: 0.01 293 | 2900 episode | score: 8.63 | loss: 10.87585 | epsilon: 0.01 294 | 2910 episode | score: 8.64 | loss: 13.18019 | epsilon: 0.01 295 | 2920 episode | score: 8.60 | loss: 10.19342 | epsilon: 0.01 296 | 2930 episode | score: 8.60 | loss: 10.07251 | epsilon: 0.01 297 | 2940 episode | score: 8.68 | loss: 11.60551 | epsilon: 0.01 298 | 2950 episode | score: 8.64 | loss: 8.51660 | epsilon: 0.01 299 | 2960 episode | score: 8.73 | loss: 10.87706 | epsilon: 0.01 300 | 2970 episode | score: 8.70 | loss: 12.41756 | epsilon: 0.01 301 | 2980 episode | score: 8.63 | loss: 11.63462 | epsilon: 0.01 302 | 2990 episode | score: 8.61 | loss: 12.45533 | epsilon: 0.01 303 | 3000 episode | score: 8.62 | loss: 13.31752 | epsilon: 0.01 304 | 3010 episode | 
score: 8.64 | loss: 11.63569 | epsilon: 0.01 305 | 3020 episode | score: 8.78 | loss: 9.32524 | epsilon: 0.01 306 | 3030 episode | score: 8.76 | loss: 11.73422 | epsilon: 0.01 307 | 3040 episode | score: 8.75 | loss: 13.20695 | epsilon: 0.01 308 | 3050 episode | score: 8.74 | loss: 7.05605 | epsilon: 0.01 309 | 3060 episode | score: 8.69 | loss: 8.57716 | epsilon: 0.01 310 | 3070 episode | score: 8.66 | loss: 10.19580 | epsilon: 0.01 311 | 3080 episode | score: 8.66 | loss: 14.06568 | epsilon: 0.01 312 | 3090 episode | score: 8.64 | loss: 12.54142 | epsilon: 0.01 313 | 3100 episode | score: 8.62 | loss: 8.65207 | epsilon: 0.01 314 | 3110 episode | score: 8.59 | loss: 11.02742 | epsilon: 0.01 315 | 3120 episode | score: 8.58 | loss: 10.94348 | epsilon: 0.01 316 | 3130 episode | score: 8.47 | loss: 14.08981 | epsilon: 0.01 317 | 3140 episode | score: 8.49 | loss: 12.66682 | epsilon: 0.01 318 | 3150 episode | score: 8.48 | loss: 12.50308 | epsilon: 0.01 319 | 3160 episode | score: 8.52 | loss: 11.85718 | epsilon: 0.01 320 | 3170 episode | score: 8.52 | loss: 9.50961 | epsilon: 0.01 321 | 3180 episode | score: 8.53 | loss: 12.61227 | epsilon: 0.01 322 | 3190 episode | score: 8.56 | loss: 9.45285 | epsilon: 0.01 323 | 3200 episode | score: 8.60 | loss: 11.76034 | epsilon: 0.01 324 | 3210 episode | score: 8.68 | loss: 14.85800 | epsilon: 0.01 325 | 3220 episode | score: 8.66 | loss: 11.74906 | epsilon: 0.01 326 | 3230 episode | score: 8.76 | loss: 10.19856 | epsilon: 0.01 327 | 3240 episode | score: 8.69 | loss: 7.87378 | epsilon: 0.01 328 | 3250 episode | score: 8.80 | loss: 10.24053 | epsilon: 0.01 329 | 3260 episode | score: 8.75 | loss: 6.31049 | epsilon: 0.01 330 | 3270 episode | score: 8.72 | loss: 11.75972 | epsilon: 0.01 331 | 3280 episode | score: 8.68 | loss: 11.00718 | epsilon: 0.01 332 | 3290 episode | score: 8.66 | loss: 8.77038 | epsilon: 0.01 333 | 3300 episode | score: 8.70 | loss: 11.11809 | epsilon: 0.01 334 | 3310 episode | score: 8.71 | loss: 19.04940 
| epsilon: 0.01 335 | 3320 episode | score: 8.70 | loss: 11.57778 | epsilon: 0.01 336 | 3330 episode | score: 8.68 | loss: 17.37124 | epsilon: 0.01 337 | 3340 episode | score: 8.62 | loss: 13.05712 | epsilon: 0.01 338 | 3350 episode | score: 8.58 | loss: 14.91305 | epsilon: 0.01 339 | 3360 episode | score: 8.56 | loss: 14.71657 | epsilon: 0.01 340 | 3370 episode | score: 8.53 | loss: 16.40873 | epsilon: 0.01 341 | 3380 episode | score: 8.54 | loss: 14.87654 | epsilon: 0.01 342 | 3390 episode | score: 8.50 | loss: 12.68097 | epsilon: 0.01 343 | 3400 episode | score: 8.53 | loss: 12.65293 | epsilon: 0.01 344 | 3410 episode | score: 8.54 | loss: 10.97103 | epsilon: 0.01 345 | 3420 episode | score: 8.53 | loss: 10.18955 | epsilon: 0.01 346 | 3430 episode | score: 8.54 | loss: 9.49034 | epsilon: 0.01 347 | 3440 episode | score: 8.53 | loss: 7.01580 | epsilon: 0.01 348 | 3450 episode | score: 8.52 | loss: 14.17627 | epsilon: 0.01 349 | 3460 episode | score: 8.52 | loss: 13.27503 | epsilon: 0.01 350 | 3470 episode | score: 8.52 | loss: 9.44508 | epsilon: 0.01 351 | 3480 episode | score: 8.52 | loss: 13.33412 | epsilon: 0.01 352 | 3490 episode | score: 8.49 | loss: 15.74604 | epsilon: 0.01 353 | 3500 episode | score: 8.47 | loss: 15.71954 | epsilon: 0.01 354 | 3510 episode | score: 8.47 | loss: 10.21629 | epsilon: 0.01 355 | 3520 episode | score: 8.47 | loss: 8.66194 | epsilon: 0.01 356 | 3530 episode | score: 8.65 | loss: 12.74928 | epsilon: 0.01 357 | 3540 episode | score: 8.79 | loss: 7.37917 | epsilon: 0.01 358 | 3550 episode | score: 8.80 | loss: 13.15130 | epsilon: 0.01 359 | 3560 episode | score: 8.84 | loss: 7.93903 | epsilon: 0.01 360 | 3570 episode | score: 8.85 | loss: 12.73963 | epsilon: 0.01 361 | 3580 episode | score: 9.59 | loss: 12.13551 | epsilon: 0.01 362 | 3590 episode | score: 9.49 | loss: 15.21393 | epsilon: 0.01 363 | 3600 episode | score: 9.71 | loss: 14.52880 | epsilon: 0.01 364 | 3610 episode | score: 9.87 | loss: 7.44391 | epsilon: 0.01 365 | 3620 
episode | score: 10.68 | loss: 7.64639 | epsilon: 0.01 366 | 3630 episode | score: 10.61 | loss: 12.73185 | epsilon: 0.01 367 | 3640 episode | score: 10.62 | loss: 8.65501 | epsilon: 0.01 368 | 3650 episode | score: 10.40 | loss: 18.17788 | epsilon: 0.01 369 | 3660 episode | score: 10.62 | loss: 9.56333 | epsilon: 0.01 370 | 3670 episode | score: 10.49 | loss: 5.19324 | epsilon: 0.01 371 | 3680 episode | score: 10.69 | loss: 5.95822 | epsilon: 0.01 372 | 3690 episode | score: 10.74 | loss: 5.43674 | epsilon: 0.01 373 | 3700 episode | score: 11.05 | loss: 11.38299 | epsilon: 0.01 374 | 3710 episode | score: 11.20 | loss: 7.25396 | epsilon: 0.01 375 | 3720 episode | score: 10.94 | loss: 5.58109 | epsilon: 0.01 376 | 3730 episode | score: 10.70 | loss: 10.16100 | epsilon: 0.01 377 | 3740 episode | score: 10.63 | loss: 5.46199 | epsilon: 0.01 378 | 3750 episode | score: 10.43 | loss: 6.40074 | epsilon: 0.01 379 | 3760 episode | score: 10.29 | loss: 5.79258 | epsilon: 0.01 380 | 3770 episode | score: 10.58 | loss: 9.68128 | epsilon: 0.01 381 | 3780 episode | score: 10.54 | loss: 10.07901 | epsilon: 0.01 382 | 3790 episode | score: 10.32 | loss: 10.71799 | epsilon: 0.01 383 | 3800 episode | score: 10.16 | loss: 4.50272 | epsilon: 0.01 384 | 3810 episode | score: 9.98 | loss: 11.63281 | epsilon: 0.01 385 | 3820 episode | score: 9.98 | loss: 12.97600 | epsilon: 0.01 386 | 3830 episode | score: 9.84 | loss: 6.84899 | epsilon: 0.01 387 | 3840 episode | score: 10.17 | loss: 10.55282 | epsilon: 0.01 388 | 3850 episode | score: 10.25 | loss: 6.92412 | epsilon: 0.01 389 | 3860 episode | score: 12.36 | loss: 8.25691 | epsilon: 0.01 390 | 3870 episode | score: 12.19 | loss: 6.05455 | epsilon: 0.01 391 | 3880 episode | score: 12.74 | loss: 5.93086 | epsilon: 0.01 392 | 3890 episode | score: 12.99 | loss: 7.37379 | epsilon: 0.01 393 | 3900 episode | score: 13.56 | loss: 6.59420 | epsilon: 0.01 394 | 3910 episode | score: 13.96 | loss: 12.78928 | epsilon: 0.01 395 | 3920 episode | 
score: 14.24 | loss: 9.47036 | epsilon: 0.01 396 | 3930 episode | score: 13.80 | loss: 5.10357 | epsilon: 0.01 397 | 3940 episode | score: 13.83 | loss: 4.61695 | epsilon: 0.01 398 | 3950 episode | score: 13.77 | loss: 6.49820 | epsilon: 0.01 399 | 3960 episode | score: 13.54 | loss: 8.94904 | epsilon: 0.01 400 | 3970 episode | score: 13.80 | loss: 4.33383 | epsilon: 0.01 401 | 3980 episode | score: 13.58 | loss: 6.72084 | epsilon: 0.01 402 | 3990 episode | score: 13.11 | loss: 5.56996 | epsilon: 0.01 403 | 4000 episode | score: 14.05 | loss: 6.95229 | epsilon: 0.01 404 | 4010 episode | score: 15.22 | loss: 6.26257 | epsilon: 0.01 405 | 4020 episode | score: 14.96 | loss: 8.59238 | epsilon: 0.01 406 | 4030 episode | score: 14.81 | loss: 5.12621 | epsilon: 0.01 407 | 4040 episode | score: 16.56 | loss: 5.35934 | epsilon: 0.01 408 | 4050 episode | score: 17.87 | loss: 5.79597 | epsilon: 0.01 409 | 4060 episode | score: 18.46 | loss: 5.03159 | epsilon: 0.01 410 | 4070 episode | score: 17.79 | loss: 4.54399 | epsilon: 0.01 411 | 4080 episode | score: 17.09 | loss: 5.42259 | epsilon: 0.01 412 | 4090 episode | score: 16.82 | loss: 6.25688 | epsilon: 0.01 413 | 4100 episode | score: 16.47 | loss: 4.95715 | epsilon: 0.01 414 | 4110 episode | score: 16.43 | loss: 6.76208 | epsilon: 0.01 415 | 4120 episode | score: 15.76 | loss: 5.24656 | epsilon: 0.01 416 | 4130 episode | score: 15.09 | loss: 4.21207 | epsilon: 0.01 417 | 4140 episode | score: 14.77 | loss: 4.31235 | epsilon: 0.01 418 | 4150 episode | score: 14.27 | loss: 7.29699 | epsilon: 0.01 419 | 4160 episode | score: 14.58 | loss: 8.66764 | epsilon: 0.01 420 | 4170 episode | score: 13.98 | loss: 9.69282 | epsilon: 0.01 421 | 4180 episode | score: 13.57 | loss: 11.01928 | epsilon: 0.01 422 | 4190 episode | score: 13.16 | loss: 7.08015 | epsilon: 0.01 423 | 4200 episode | score: 12.75 | loss: 11.21581 | epsilon: 0.01 424 | 4210 episode | score: 12.74 | loss: 8.57672 | epsilon: 0.01 425 | 4220 episode | score: 12.66 | 
loss: 7.83942 | epsilon: 0.01 426 | 4230 episode | score: 12.39 | loss: 7.03415 | epsilon: 0.01 427 | 4240 episode | score: 12.18 | loss: 9.00213 | epsilon: 0.01 428 | 4250 episode | score: 11.87 | loss: 5.73147 | epsilon: 0.01 429 | 4260 episode | score: 11.55 | loss: 10.40915 | epsilon: 0.01 430 | 4270 episode | score: 11.38 | loss: 7.70147 | epsilon: 0.01 431 | 4280 episode | score: 11.92 | loss: 10.10661 | epsilon: 0.01 432 | 4290 episode | score: 12.47 | loss: 10.77976 | epsilon: 0.01 433 | 4300 episode | score: 12.25 | loss: 11.34655 | epsilon: 0.01 434 | 4310 episode | score: 12.67 | loss: 8.98248 | epsilon: 0.01 435 | 4320 episode | score: 12.81 | loss: 5.49514 | epsilon: 0.01 436 | 4330 episode | score: 12.64 | loss: 10.40312 | epsilon: 0.01 437 | 4340 episode | score: 12.71 | loss: 8.04442 | epsilon: 0.01 438 | 4350 episode | score: 12.84 | loss: 5.51279 | epsilon: 0.01 439 | 4360 episode | score: 13.04 | loss: 5.63618 | epsilon: 0.01 440 | 4370 episode | score: 14.36 | loss: 7.06970 | epsilon: 0.01 441 | 4380 episode | score: 15.96 | loss: 3.44520 | epsilon: 0.01 442 | 4390 episode | score: 15.99 | loss: 4.16670 | epsilon: 0.01 443 | 4400 episode | score: 15.70 | loss: 6.04997 | epsilon: 0.01 444 | 4410 episode | score: 16.32 | loss: 3.34858 | epsilon: 0.01 445 | 4420 episode | score: 16.23 | loss: 3.47187 | epsilon: 0.01 446 | 4430 episode | score: 15.74 | loss: 2.59578 | epsilon: 0.01 447 | 4440 episode | score: 15.06 | loss: 4.78465 | epsilon: 0.01 448 | 4450 episode | score: 14.43 | loss: 4.92613 | epsilon: 0.01 449 | 4460 episode | score: 13.87 | loss: 5.04869 | epsilon: 0.01 450 | 4470 episode | score: 13.38 | loss: 4.26093 | epsilon: 0.01 451 | 4480 episode | score: 12.92 | loss: 8.86478 | epsilon: 0.01 452 | 4490 episode | score: 12.60 | loss: 7.41720 | epsilon: 0.01 453 | 4500 episode | score: 12.27 | loss: 6.93199 | epsilon: 0.01 454 | 4510 episode | score: 12.09 | loss: 8.05820 | epsilon: 0.01 455 | 4520 episode | score: 12.52 | loss: 11.38961 
| epsilon: 0.01 456 | 4530 episode | score: 12.23 | loss: 3.42938 | epsilon: 0.01 457 | 4540 episode | score: 11.87 | loss: 6.62923 | epsilon: 0.01 458 | 4550 episode | score: 11.51 | loss: 9.16134 | epsilon: 0.01 459 | 4560 episode | score: 12.50 | loss: 7.77041 | epsilon: 0.01 460 | 4570 episode | score: 12.57 | loss: 5.53519 | epsilon: 0.01 461 | 4580 episode | score: 12.25 | loss: 4.22186 | epsilon: 0.01 462 | 4590 episode | score: 12.01 | loss: 7.64343 | epsilon: 0.01 463 | 4600 episode | score: 11.69 | loss: 9.08254 | epsilon: 0.01 464 | 4610 episode | score: 12.21 | loss: 9.23807 | epsilon: 0.01 465 | 4620 episode | score: 12.35 | loss: 7.85843 | epsilon: 0.01 466 | 4630 episode | score: 12.43 | loss: 4.00312 | epsilon: 0.01 467 | 4640 episode | score: 12.37 | loss: 6.80646 | epsilon: 0.01 468 | 4650 episode | score: 12.02 | loss: 5.93498 | epsilon: 0.01 469 | 4660 episode | score: 11.71 | loss: 8.09913 | epsilon: 0.01 470 | 4670 episode | score: 11.43 | loss: 6.34250 | epsilon: 0.01 471 | 4680 episode | score: 11.56 | loss: 4.99433 | epsilon: 0.01 472 | 4690 episode | score: 12.29 | loss: 9.94151 | epsilon: 0.01 473 | 4700 episode | score: 11.97 | loss: 7.68697 | epsilon: 0.01 474 | 4710 episode | score: 11.65 | loss: 4.49063 | epsilon: 0.01 475 | 4720 episode | score: 11.32 | loss: 7.65144 | epsilon: 0.01 476 | 4730 episode | score: 11.11 | loss: 7.93525 | epsilon: 0.01 477 | 4740 episode | score: 11.40 | loss: 10.41766 | epsilon: 0.01 478 | 4750 episode | score: 11.14 | loss: 8.34215 | epsilon: 0.01 479 | 4760 episode | score: 11.02 | loss: 8.46258 | epsilon: 0.01 480 | 4770 episode | score: 10.77 | loss: 9.06955 | epsilon: 0.01 481 | 4780 episode | score: 11.02 | loss: 6.47147 | epsilon: 0.01 482 | 4790 episode | score: 11.77 | loss: 11.79726 | epsilon: 0.01 483 | 4800 episode | score: 11.89 | loss: 6.76751 | epsilon: 0.01 484 | 4810 episode | score: 11.53 | loss: 3.67404 | epsilon: 0.01 485 | 4820 episode | score: 11.22 | loss: 8.01371 | epsilon: 0.01 
486 | 4830 episode | score: 11.01 | loss: 5.74920 | epsilon: 0.01
487 | 4840 episode | score: 10.85 | loss: 6.84008 | epsilon: 0.01
488 | 4850 episode | score: 10.70 | loss: 7.53777 | epsilon: 0.01
489 | 4860 episode | score: 10.54 | loss: 7.15283 | epsilon: 0.01
490 | 4870 episode | score: 10.38 | loss: 4.72300 | epsilon: 0.01
491 | 4880 episode | score: 10.19 | loss: 8.11345 | epsilon: 0.01
492 | 4890 episode | score: 10.07 | loss: 11.14361 | epsilon: 0.01
493 | 4900 episode | score: 9.91 | loss: 11.17979 | epsilon: 0.01
494 | 4910 episode | score: 10.35 | loss: 10.31738 | epsilon: 0.01
495 | 4920 episode | score: 10.26 | loss: 9.51637 | epsilon: 0.01
496 | 4930 episode | score: 10.12 | loss: 9.10228 | epsilon: 0.01
497 | 4940 episode | score: 10.10 | loss: 8.66119 | epsilon: 0.01
498 | 4950 episode | score: 9.98 | loss: 12.24035 | epsilon: 0.01
499 | 4960 episode | score: 10.12 | loss: 8.12571 | epsilon: 0.01
500 | 4970 episode | score: 9.96 | loss: 9.27201 | epsilon: 0.01
501 | 4980 episode | score: 9.76 | loss: 6.27622 | epsilon: 0.01
502 | 4990 episode | score: 10.96 | loss: 8.89670 | epsilon: 0.01
503 |
--------------------------------------------------------------------------------
/out/trace_DTQN_3.txt:
--------------------------------------------------------------------------------
1 | state size: 2
2 | action size: 2
3 | 0 episode | score: 29.00 | loss: 0.00000 | epsilon: 1.00
4 | 10 episode | score: 28.05 | loss: 0.00000 | epsilon: 1.00
5 | 20 episode | score: 27.76 | loss: 0.00000 | epsilon: 1.00
6 | 30 episode | score: 26.79 | loss: 0.00000 | epsilon: 1.00
7 | 40 episode | score: 26.39 | loss: 0.00000 | epsilon: 1.00
8 | 50 episode | score: 25.36 | loss: 0.28919 | epsilon: 1.00
9 | 60 episode | score: 25.46 | loss: 0.23288 | epsilon: 0.98
10 | 70 episode | score: 24.84 | loss: 0.11278 | epsilon: 0.97
11 | 80 episode | score: 25.18 | loss: 0.06212 | epsilon: 0.96
12 | 90 episode | score: 24.90 | loss: 0.24484 | epsilon: 0.95
13 | 100 episode | score:
24.19 | loss: 0.17145 | epsilon: 0.94 14 | 110 episode | score: 23.67 | loss: 0.42094 | epsilon: 0.93 15 | 120 episode | score: 23.28 | loss: 0.60395 | epsilon: 0.92 16 | 130 episode | score: 23.17 | loss: 0.96942 | epsilon: 0.90 17 | 140 episode | score: 23.19 | loss: 0.56008 | epsilon: 0.89 18 | 150 episode | score: 22.70 | loss: 0.59539 | epsilon: 0.88 19 | 160 episode | score: 22.48 | loss: 0.44283 | epsilon: 0.87 20 | 170 episode | score: 22.14 | loss: 0.87171 | epsilon: 0.86 21 | 180 episode | score: 21.63 | loss: 0.70601 | epsilon: 0.85 22 | 190 episode | score: 21.39 | loss: 0.28002 | epsilon: 0.84 23 | 200 episode | score: 22.30 | loss: 1.02817 | epsilon: 0.83 24 | 210 episode | score: 21.70 | loss: 1.63453 | epsilon: 0.82 25 | 220 episode | score: 22.18 | loss: 1.24204 | epsilon: 0.80 26 | 230 episode | score: 22.39 | loss: 1.17515 | epsilon: 0.79 27 | 240 episode | score: 22.06 | loss: 0.93853 | epsilon: 0.78 28 | 250 episode | score: 21.61 | loss: 0.95911 | epsilon: 0.77 29 | 260 episode | score: 21.10 | loss: 2.43419 | epsilon: 0.76 30 | 270 episode | score: 20.45 | loss: 0.67385 | epsilon: 0.76 31 | 280 episode | score: 19.91 | loss: 1.40381 | epsilon: 0.75 32 | 290 episode | score: 19.91 | loss: 1.05976 | epsilon: 0.74 33 | 300 episode | score: 20.05 | loss: 1.42052 | epsilon: 0.73 34 | 310 episode | score: 20.11 | loss: 0.39303 | epsilon: 0.72 35 | 320 episode | score: 19.67 | loss: 1.86086 | epsilon: 0.71 36 | 330 episode | score: 19.71 | loss: 0.78429 | epsilon: 0.70 37 | 340 episode | score: 19.21 | loss: 1.49736 | epsilon: 0.69 38 | 350 episode | score: 18.73 | loss: 0.79869 | epsilon: 0.68 39 | 360 episode | score: 18.79 | loss: 1.20941 | epsilon: 0.67 40 | 370 episode | score: 18.69 | loss: 1.23513 | epsilon: 0.66 41 | 380 episode | score: 18.26 | loss: 1.24348 | epsilon: 0.65 42 | 390 episode | score: 18.08 | loss: 0.44699 | epsilon: 0.65 43 | 400 episode | score: 18.15 | loss: 2.12892 | epsilon: 0.64 44 | 410 episode | score: 18.10 | loss: 
1.73747 | epsilon: 0.63 45 | 420 episode | score: 17.90 | loss: 0.91533 | epsilon: 0.62 46 | 430 episode | score: 17.50 | loss: 2.23212 | epsilon: 0.61 47 | 440 episode | score: 18.26 | loss: 0.95165 | epsilon: 0.60 48 | 450 episode | score: 17.96 | loss: 1.67144 | epsilon: 0.59 49 | 460 episode | score: 17.76 | loss: 0.48981 | epsilon: 0.58 50 | 470 episode | score: 17.69 | loss: 1.89966 | epsilon: 0.57 51 | 480 episode | score: 17.55 | loss: 1.95501 | epsilon: 0.56 52 | 490 episode | score: 17.43 | loss: 0.83967 | epsilon: 0.55 53 | 500 episode | score: 17.33 | loss: 1.94789 | epsilon: 0.55 54 | 510 episode | score: 16.99 | loss: 2.94673 | epsilon: 0.54 55 | 520 episode | score: 16.67 | loss: 2.02298 | epsilon: 0.53 56 | 530 episode | score: 16.61 | loss: 2.01256 | epsilon: 0.52 57 | 540 episode | score: 16.47 | loss: 3.10653 | epsilon: 0.51 58 | 550 episode | score: 16.94 | loss: 2.09790 | epsilon: 0.50 59 | 560 episode | score: 16.67 | loss: 2.59242 | epsilon: 0.50 60 | 570 episode | score: 16.50 | loss: 3.64957 | epsilon: 0.49 61 | 580 episode | score: 16.16 | loss: 0.54352 | epsilon: 0.48 62 | 590 episode | score: 15.85 | loss: 3.84055 | epsilon: 0.47 63 | 600 episode | score: 15.43 | loss: 2.66422 | epsilon: 0.47 64 | 610 episode | score: 15.15 | loss: 2.68563 | epsilon: 0.46 65 | 620 episode | score: 14.77 | loss: 4.83064 | epsilon: 0.46 66 | 630 episode | score: 14.36 | loss: 3.35041 | epsilon: 0.45 67 | 640 episode | score: 14.00 | loss: 3.83584 | epsilon: 0.44 68 | 650 episode | score: 14.20 | loss: 4.90510 | epsilon: 0.44 69 | 660 episode | score: 13.95 | loss: 4.43448 | epsilon: 0.43 70 | 670 episode | score: 13.63 | loss: 3.32779 | epsilon: 0.42 71 | 680 episode | score: 13.54 | loss: 2.79640 | epsilon: 0.42 72 | 690 episode | score: 13.35 | loss: 3.95378 | epsilon: 0.41 73 | 700 episode | score: 13.16 | loss: 2.83241 | epsilon: 0.40 74 | 710 episode | score: 13.19 | loss: 4.54109 | epsilon: 0.40 75 | 720 episode | score: 13.05 | loss: 3.42609 | 
epsilon: 0.39 76 | 730 episode | score: 13.22 | loss: 2.38219 | epsilon: 0.38 77 | 740 episode | score: 12.99 | loss: 4.02707 | epsilon: 0.38 78 | 750 episode | score: 13.24 | loss: 2.98577 | epsilon: 0.37 79 | 760 episode | score: 13.19 | loss: 4.69834 | epsilon: 0.36 80 | 770 episode | score: 13.21 | loss: 4.26403 | epsilon: 0.35 81 | 780 episode | score: 12.95 | loss: 5.31904 | epsilon: 0.35 82 | 790 episode | score: 12.87 | loss: 4.12925 | epsilon: 0.34 83 | 800 episode | score: 12.56 | loss: 1.19411 | epsilon: 0.34 84 | 810 episode | score: 12.33 | loss: 5.39290 | epsilon: 0.33 85 | 820 episode | score: 12.34 | loss: 5.47288 | epsilon: 0.32 86 | 830 episode | score: 12.19 | loss: 3.06479 | epsilon: 0.32 87 | 840 episode | score: 12.10 | loss: 4.25964 | epsilon: 0.31 88 | 850 episode | score: 11.98 | loss: 6.09201 | epsilon: 0.31 89 | 860 episode | score: 11.82 | loss: 6.12898 | epsilon: 0.30 90 | 870 episode | score: 11.67 | loss: 3.80265 | epsilon: 0.29 91 | 880 episode | score: 11.57 | loss: 5.54596 | epsilon: 0.29 92 | 890 episode | score: 11.67 | loss: 7.39544 | epsilon: 0.28 93 | 900 episode | score: 11.83 | loss: 5.60033 | epsilon: 0.28 94 | 910 episode | score: 11.80 | loss: 6.22728 | epsilon: 0.27 95 | 920 episode | score: 11.77 | loss: 6.29498 | epsilon: 0.26 96 | 930 episode | score: 11.73 | loss: 6.87828 | epsilon: 0.26 97 | 940 episode | score: 11.57 | loss: 3.14054 | epsilon: 0.25 98 | 950 episode | score: 11.44 | loss: 4.42497 | epsilon: 0.25 99 | 960 episode | score: 11.75 | loss: 6.47100 | epsilon: 0.24 100 | 970 episode | score: 12.03 | loss: 2.65088 | epsilon: 0.23 101 | 980 episode | score: 11.77 | loss: 3.27828 | epsilon: 0.22 102 | 990 episode | score: 11.52 | loss: 5.82138 | epsilon: 0.22 103 | 1000 episode | score: 11.37 | loss: 5.23701 | epsilon: 0.21 104 | 1010 episode | score: 11.30 | loss: 5.16171 | epsilon: 0.21 105 | 1020 episode | score: 11.14 | loss: 4.57056 | epsilon: 0.20 106 | 1030 episode | score: 10.93 | loss: 8.42826 | 
epsilon: 0.20 107 | 1040 episode | score: 10.83 | loss: 3.27311 | epsilon: 0.19 108 | 1050 episode | score: 10.78 | loss: 6.52097 | epsilon: 0.19 109 | 1060 episode | score: 10.73 | loss: 6.55692 | epsilon: 0.18 110 | 1070 episode | score: 10.52 | loss: 10.46111 | epsilon: 0.18 111 | 1080 episode | score: 10.43 | loss: 5.27200 | epsilon: 0.17 112 | 1090 episode | score: 10.31 | loss: 6.59293 | epsilon: 0.17 113 | 1100 episode | score: 10.26 | loss: 7.88621 | epsilon: 0.16 114 | 1110 episode | score: 10.48 | loss: 5.32516 | epsilon: 0.15 115 | 1120 episode | score: 10.32 | loss: 5.33982 | epsilon: 0.15 116 | 1130 episode | score: 10.32 | loss: 8.01697 | epsilon: 0.14 117 | 1140 episode | score: 10.26 | loss: 4.68853 | epsilon: 0.14 118 | 1150 episode | score: 10.28 | loss: 8.93066 | epsilon: 0.13 119 | 1160 episode | score: 10.23 | loss: 7.86792 | epsilon: 0.13 120 | 1170 episode | score: 10.24 | loss: 3.57884 | epsilon: 0.12 121 | 1180 episode | score: 10.76 | loss: 6.14561 | epsilon: 0.11 122 | 1190 episode | score: 10.62 | loss: 7.47268 | epsilon: 0.11 123 | 1200 episode | score: 10.99 | loss: 8.17394 | epsilon: 0.10 124 | 1210 episode | score: 10.86 | loss: 5.48570 | epsilon: 0.09 125 | 1220 episode | score: 11.02 | loss: 4.13661 | epsilon: 0.09 126 | 1230 episode | score: 11.16 | loss: 6.14135 | epsilon: 0.08 127 | 1240 episode | score: 11.00 | loss: 5.50650 | epsilon: 0.08 128 | 1250 episode | score: 11.53 | loss: 4.82163 | epsilon: 0.07 129 | 1260 episode | score: 11.25 | loss: 4.83445 | epsilon: 0.06 130 | 1270 episode | score: 11.17 | loss: 6.16942 | epsilon: 0.06 131 | 1280 episode | score: 11.03 | loss: 4.15855 | epsilon: 0.05 132 | 1290 episode | score: 10.96 | loss: 6.19270 | epsilon: 0.05 133 | 1300 episode | score: 10.77 | loss: 10.95965 | epsilon: 0.04 134 | 1310 episode | score: 10.54 | loss: 7.58096 | epsilon: 0.04 135 | 1320 episode | score: 10.33 | loss: 7.57102 | epsilon: 0.03 136 | 1330 episode | score: 10.18 | loss: 6.90589 | epsilon: 0.03 137 
| 1340 episode | score: 10.03 | loss: 8.31337 | epsilon: 0.02 138 | 1350 episode | score: 9.90 | loss: 7.61505 | epsilon: 0.02 139 | 1360 episode | score: 9.80 | loss: 9.71177 | epsilon: 0.01 140 | 1370 episode | score: 9.65 | loss: 8.40706 | epsilon: 0.01 141 | 1380 episode | score: 9.55 | loss: 11.78060 | epsilon: 0.01 142 | 1390 episode | score: 9.41 | loss: 8.36876 | epsilon: 0.01 143 | 1400 episode | score: 9.38 | loss: 10.45550 | epsilon: 0.01 144 | 1410 episode | score: 9.42 | loss: 4.21109 | epsilon: 0.01 145 | 1420 episode | score: 9.33 | loss: 10.47225 | epsilon: 0.01 146 | 1430 episode | score: 9.36 | loss: 7.83792 | epsilon: 0.01 147 | 1440 episode | score: 9.24 | loss: 7.73365 | epsilon: 0.01 148 | 1450 episode | score: 9.28 | loss: 7.04406 | epsilon: 0.01 149 | 1460 episode | score: 9.22 | loss: 6.33223 | epsilon: 0.01 150 | 1470 episode | score: 9.11 | loss: 11.27962 | epsilon: 0.01 151 | 1480 episode | score: 9.09 | loss: 8.48455 | epsilon: 0.01 152 | 1490 episode | score: 9.11 | loss: 9.19448 | epsilon: 0.01 153 | 1500 episode | score: 9.05 | loss: 9.20194 | epsilon: 0.01 154 | 1510 episode | score: 8.97 | loss: 10.62578 | epsilon: 0.01 155 | 1520 episode | score: 9.04 | loss: 8.54028 | epsilon: 0.01 156 | 1530 episode | score: 9.00 | loss: 8.52981 | epsilon: 0.01 157 | 1540 episode | score: 9.02 | loss: 7.83667 | epsilon: 0.01 158 | 1550 episode | score: 8.95 | loss: 12.84651 | epsilon: 0.01 159 | 1560 episode | score: 9.13 | loss: 7.17148 | epsilon: 0.01 160 | 1570 episode | score: 9.27 | loss: 8.56434 | epsilon: 0.01 161 | 1580 episode | score: 9.35 | loss: 11.46341 | epsilon: 0.01 162 | 1590 episode | score: 9.27 | loss: 8.61150 | epsilon: 0.01 163 | 1600 episode | score: 9.65 | loss: 10.73893 | epsilon: 0.01 164 | 1610 episode | score: 9.61 | loss: 7.24374 | epsilon: 0.01 165 | 1620 episode | score: 9.56 | loss: 7.25160 | epsilon: 0.01 166 | 1630 episode | score: 9.42 | loss: 6.48143 | epsilon: 0.01 167 | 1640 episode | score: 9.36 | loss: 
8.69397 | epsilon: 0.01 168 | 1650 episode | score: 9.29 | loss: 7.95489 | epsilon: 0.01 169 | 1660 episode | score: 9.34 | loss: 8.01149 | epsilon: 0.01 170 | 1670 episode | score: 9.89 | loss: 9.41617 | epsilon: 0.01 171 | 1680 episode | score: 9.77 | loss: 10.11328 | epsilon: 0.01 172 | 1690 episode | score: 10.21 | loss: 10.17882 | epsilon: 0.01 173 | 1700 episode | score: 10.01 | loss: 7.34649 | epsilon: 0.01 174 | 1710 episode | score: 9.93 | loss: 8.06988 | epsilon: 0.01 175 | 1720 episode | score: 9.95 | loss: 5.81043 | epsilon: 0.01 176 | 1730 episode | score: 9.81 | loss: 5.08927 | epsilon: 0.01 177 | 1740 episode | score: 9.82 | loss: 8.00304 | epsilon: 0.01 178 | 1750 episode | score: 9.66 | loss: 6.57829 | epsilon: 0.01 179 | 1760 episode | score: 9.54 | loss: 7.31608 | epsilon: 0.01 180 | 1770 episode | score: 9.43 | loss: 8.07021 | epsilon: 0.01 181 | 1780 episode | score: 9.34 | loss: 8.83427 | epsilon: 0.01 182 | 1790 episode | score: 9.30 | loss: 8.83933 | epsilon: 0.01 183 | 1800 episode | score: 9.24 | loss: 5.92690 | epsilon: 0.01 184 | 1810 episode | score: 9.16 | loss: 11.05879 | epsilon: 0.01 185 | 1820 episode | score: 9.08 | loss: 11.79762 | epsilon: 0.01 186 | 1830 episode | score: 9.00 | loss: 12.45628 | epsilon: 0.01 187 | 1840 episode | score: 9.03 | loss: 8.79590 | epsilon: 0.01 188 | 1850 episode | score: 9.01 | loss: 10.99454 | epsilon: 0.01 189 | 1860 episode | score: 9.05 | loss: 8.92821 | epsilon: 0.01 190 | 1870 episode | score: 9.17 | loss: 10.29554 | epsilon: 0.01 191 | 1880 episode | score: 9.10 | loss: 12.52881 | epsilon: 0.01 192 | 1890 episode | score: 9.00 | loss: 11.12002 | epsilon: 0.01 193 | 1900 episode | score: 8.95 | loss: 9.59325 | epsilon: 0.01 194 | 1910 episode | score: 8.93 | loss: 13.28574 | epsilon: 0.01 195 | 1920 episode | score: 8.91 | loss: 10.31185 | epsilon: 0.01 196 | 1930 episode | score: 8.89 | loss: 12.52143 | epsilon: 0.01 197 | 1940 episode | score: 8.91 | loss: 9.66494 | epsilon: 0.01 198 | 1950 
episode | score: 8.88 | loss: 11.86391 | epsilon: 0.01 199 | 1960 episode | score: 8.77 | loss: 9.59976 | epsilon: 0.01 200 | 1970 episode | score: 8.72 | loss: 14.83946 | epsilon: 0.01 201 | 1980 episode | score: 8.71 | loss: 12.61964 | epsilon: 0.01 202 | 1990 episode | score: 8.65 | loss: 14.11471 | epsilon: 0.01 203 | 2000 episode | score: 8.64 | loss: 10.42087 | epsilon: 0.01 204 | 2010 episode | score: 8.56 | loss: 11.18098 | epsilon: 0.01 205 | 2020 episode | score: 8.56 | loss: 7.49533 | epsilon: 0.01 206 | 2030 episode | score: 8.52 | loss: 11.23166 | epsilon: 0.01 207 | 2040 episode | score: 8.49 | loss: 13.45408 | epsilon: 0.01 208 | 2050 episode | score: 8.50 | loss: 15.70362 | epsilon: 0.01 209 | 2060 episode | score: 8.47 | loss: 8.98237 | epsilon: 0.01 210 | 2070 episode | score: 8.48 | loss: 9.80346 | epsilon: 0.01 211 | 2080 episode | score: 8.47 | loss: 17.99310 | epsilon: 0.01 212 | 2090 episode | score: 8.51 | loss: 9.07208 | epsilon: 0.01 213 | 2100 episode | score: 8.52 | loss: 12.03263 | epsilon: 0.01 214 | 2110 episode | score: 8.50 | loss: 11.25899 | epsilon: 0.01 215 | 2120 episode | score: 8.47 | loss: 15.76247 | epsilon: 0.01 216 | 2130 episode | score: 8.47 | loss: 9.03900 | epsilon: 0.01 217 | 2140 episode | score: 8.44 | loss: 7.63310 | epsilon: 0.01 218 | 2150 episode | score: 8.45 | loss: 11.31914 | epsilon: 0.01 219 | 2160 episode | score: 8.50 | loss: 10.52649 | epsilon: 0.01 220 | 2170 episode | score: 8.49 | loss: 9.82745 | epsilon: 0.01 221 | 2180 episode | score: 8.46 | loss: 9.84642 | epsilon: 0.01 222 | 2190 episode | score: 8.47 | loss: 11.29416 | epsilon: 0.01 223 | 2200 episode | score: 8.43 | loss: 12.78517 | epsilon: 0.01 224 | 2210 episode | score: 8.44 | loss: 12.05975 | epsilon: 0.01 225 | 2220 episode | score: 8.43 | loss: 9.84734 | epsilon: 0.01 226 | 2230 episode | score: 8.45 | loss: 9.78824 | epsilon: 0.01 227 | 2240 episode | score: 8.44 | loss: 9.05212 | epsilon: 0.01 228 | 2250 episode | score: 8.41 | loss: 
15.08025 | epsilon: 0.01 229 | 2260 episode | score: 8.61 | loss: 9.82519 | epsilon: 0.01 230 | 2270 episode | score: 8.55 | loss: 13.58749 | epsilon: 0.01 231 | 2280 episode | score: 8.55 | loss: 13.59709 | epsilon: 0.01 232 | 2290 episode | score: 8.51 | loss: 14.37598 | epsilon: 0.01 233 | 2300 episode | score: 8.51 | loss: 13.62597 | epsilon: 0.01 234 | 2310 episode | score: 8.47 | loss: 11.37246 | epsilon: 0.01 235 | 2320 episode | score: 8.46 | loss: 12.93501 | epsilon: 0.01 236 | 2330 episode | score: 8.50 | loss: 12.89200 | epsilon: 0.01 237 | 2340 episode | score: 8.47 | loss: 10.60221 | epsilon: 0.01 238 | 2350 episode | score: 8.56 | loss: 12.27176 | epsilon: 0.01 239 | 2360 episode | score: 8.64 | loss: 10.60565 | epsilon: 0.01 240 | 2370 episode | score: 8.69 | loss: 8.35473 | epsilon: 0.01 241 | 2380 episode | score: 8.67 | loss: 12.27118 | epsilon: 0.01 242 | 2390 episode | score: 8.62 | loss: 7.66543 | epsilon: 0.01 243 | 2400 episode | score: 8.61 | loss: 6.15029 | epsilon: 0.01 244 | 2410 episode | score: 8.59 | loss: 9.96755 | epsilon: 0.01 245 | 2420 episode | score: 8.71 | loss: 12.24784 | epsilon: 0.01 246 | 2430 episode | score: 8.73 | loss: 9.98101 | epsilon: 0.01 247 | 2440 episode | score: 8.79 | loss: 11.48864 | epsilon: 0.01 248 | 2450 episode | score: 8.72 | loss: 7.72658 | epsilon: 0.01 249 | 2460 episode | score: 8.71 | loss: 8.52732 | epsilon: 0.01 250 | 2470 episode | score: 8.69 | loss: 12.31986 | epsilon: 0.01 251 | 2480 episode | score: 8.66 | loss: 9.25687 | epsilon: 0.01 252 | 2490 episode | score: 8.84 | loss: 6.97792 | epsilon: 0.01 253 | 2500 episode | score: 8.88 | loss: 7.70792 | epsilon: 0.01 254 | 2510 episode | score: 8.82 | loss: 7.77612 | epsilon: 0.01 255 | 2520 episode | score: 8.78 | loss: 10.07798 | epsilon: 0.01 256 | 2530 episode | score: 8.78 | loss: 4.70675 | epsilon: 0.01 257 | 2540 episode | score: 8.81 | loss: 10.03284 | epsilon: 0.01 258 | 2550 episode | score: 8.77 | loss: 10.06382 | epsilon: 0.01 259 | 
2560 episode | score: 8.72 | loss: 7.71937 | epsilon: 0.01 260 | 2570 episode | score: 8.70 | loss: 12.33020 | epsilon: 0.01 261 | 2580 episode | score: 8.67 | loss: 10.13223 | epsilon: 0.01 262 | 2590 episode | score: 8.63 | loss: 16.95634 | epsilon: 0.01 263 | 2600 episode | score: 8.67 | loss: 6.94287 | epsilon: 0.01 264 | 2610 episode | score: 8.63 | loss: 10.93829 | epsilon: 0.01 265 | 2620 episode | score: 8.64 | loss: 10.09154 | epsilon: 0.01 266 | 2630 episode | score: 8.65 | loss: 11.65590 | epsilon: 0.01 267 | 2640 episode | score: 8.73 | loss: 8.57849 | epsilon: 0.01 268 | 2650 episode | score: 8.69 | loss: 10.87277 | epsilon: 0.01 269 | 2660 episode | score: 8.80 | loss: 9.33172 | epsilon: 0.01 270 | 2670 episode | score: 8.80 | loss: 7.81524 | epsilon: 0.01 271 | 2680 episode | score: 8.82 | loss: 8.58555 | epsilon: 0.01 272 | 2690 episode | score: 8.81 | loss: 7.02390 | epsilon: 0.01 273 | 2700 episode | score: 8.74 | loss: 8.66500 | epsilon: 0.01 274 | 2710 episode | score: 8.69 | loss: 11.73996 | epsilon: 0.01 275 | 2720 episode | score: 8.68 | loss: 7.07774 | epsilon: 0.01 276 | 2730 episode | score: 8.66 | loss: 10.19452 | epsilon: 0.01 277 | 2740 episode | score: 8.65 | loss: 7.81020 | epsilon: 0.01 278 | 2750 episode | score: 8.60 | loss: 10.97668 | epsilon: 0.01 279 | 2760 episode | score: 8.70 | loss: 10.99998 | epsilon: 0.01 280 | 2770 episode | score: 8.63 | loss: 9.45322 | epsilon: 0.01 281 | 2780 episode | score: 8.66 | loss: 10.94385 | epsilon: 0.01 282 | 2790 episode | score: 8.67 | loss: 14.88072 | epsilon: 0.01 283 | 2800 episode | score: 8.72 | loss: 12.54147 | epsilon: 0.01 284 | 2810 episode | score: 8.68 | loss: 10.23026 | epsilon: 0.01 285 | 2820 episode | score: 8.66 | loss: 12.69516 | epsilon: 0.01 286 | 2830 episode | score: 8.63 | loss: 13.34911 | epsilon: 0.01 287 | 2840 episode | score: 8.61 | loss: 11.87338 | epsilon: 0.01 288 | 2850 episode | score: 8.59 | loss: 11.08297 | epsilon: 0.01 289 | 2860 episode | score: 8.57 | 
loss: 11.82590 | epsilon: 0.01 290 | 2870 episode | score: 8.87 | loss: 9.44830 | epsilon: 0.01 291 | 2880 episode | score: 8.81 | loss: 10.29489 | epsilon: 0.01 292 | 2890 episode | score: 8.80 | loss: 10.28550 | epsilon: 0.01 293 | 2900 episode | score: 8.76 | loss: 11.07845 | epsilon: 0.01 294 | 2910 episode | score: 8.72 | loss: 11.16145 | epsilon: 0.01 295 | 2920 episode | score: 8.71 | loss: 9.51274 | epsilon: 0.01 296 | 2930 episode | score: 8.71 | loss: 15.77555 | epsilon: 0.01 297 | 2940 episode | score: 8.68 | loss: 9.46159 | epsilon: 0.01 298 | 2950 episode | score: 8.71 | loss: 11.06020 | epsilon: 0.01 299 | 2960 episode | score: 8.70 | loss: 12.72564 | epsilon: 0.01 300 | 2970 episode | score: 8.64 | loss: 10.22397 | epsilon: 0.01 301 | 2980 episode | score: 8.95 | loss: 8.74962 | epsilon: 0.01 302 | 2990 episode | score: 9.00 | loss: 7.99252 | epsilon: 0.01 303 | 3000 episode | score: 8.98 | loss: 8.66277 | epsilon: 0.01 304 | 3010 episode | score: 9.04 | loss: 11.75993 | epsilon: 0.01 305 | 3020 episode | score: 8.98 | loss: 11.04120 | epsilon: 0.01 306 | 3030 episode | score: 8.92 | loss: 9.44570 | epsilon: 0.01 307 | 3040 episode | score: 8.92 | loss: 14.13674 | epsilon: 0.01 308 | 3050 episode | score: 8.82 | loss: 14.19668 | epsilon: 0.01 309 | 3060 episode | score: 8.80 | loss: 8.63911 | epsilon: 0.01 310 | 3070 episode | score: 8.78 | loss: 10.21673 | epsilon: 0.01 311 | 3080 episode | score: 8.73 | loss: 10.22470 | epsilon: 0.01 312 | 3090 episode | score: 8.71 | loss: 11.04682 | epsilon: 0.01 313 | 3100 episode | score: 8.68 | loss: 9.47171 | epsilon: 0.01 314 | 3110 episode | score: 8.65 | loss: 13.38438 | epsilon: 0.01 315 | 3120 episode | score: 8.66 | loss: 11.09533 | epsilon: 0.01 316 | 3130 episode | score: 8.71 | loss: 13.38132 | epsilon: 0.01 317 | 3140 episode | score: 8.68 | loss: 8.69981 | epsilon: 0.01 318 | 3150 episode | score: 8.65 | loss: 11.82153 | epsilon: 0.01 319 | 3160 episode | score: 8.65 | loss: 7.90522 | epsilon: 0.01 
320 | 3170 episode | score: 8.74 | loss: 10.32336 | epsilon: 0.01 321 | 3180 episode | score: 8.68 | loss: 9.49167 | epsilon: 0.01 322 | 3190 episode | score: 8.68 | loss: 13.40731 | epsilon: 0.01 323 | 3200 episode | score: 8.67 | loss: 14.95716 | epsilon: 0.01 324 | 3210 episode | score: 8.64 | loss: 14.95343 | epsilon: 0.01 325 | 3220 episode | score: 8.62 | loss: 9.44335 | epsilon: 0.01 326 | 3230 episode | score: 8.55 | loss: 12.65490 | epsilon: 0.01 327 | 3240 episode | score: 8.51 | loss: 14.23077 | epsilon: 0.01 328 | 3250 episode | score: 8.60 | loss: 11.86367 | epsilon: 0.01 329 | 3260 episode | score: 8.58 | loss: 12.63986 | epsilon: 0.01 330 | 3270 episode | score: 8.63 | loss: 14.20297 | epsilon: 0.01 331 | 3280 episode | score: 8.62 | loss: 15.74779 | epsilon: 0.01 332 | 3290 episode | score: 8.59 | loss: 9.47025 | epsilon: 0.01 333 | 3300 episode | score: 8.72 | loss: 11.01989 | epsilon: 0.01 334 | 3310 episode | score: 8.71 | loss: 7.90248 | epsilon: 0.01 335 | 3320 episode | score: 8.70 | loss: 8.67504 | epsilon: 0.01 336 | 3330 episode | score: 8.68 | loss: 9.49761 | epsilon: 0.01 337 | 3340 episode | score: 8.69 | loss: 11.00257 | epsilon: 0.01 338 | 3350 episode | score: 8.69 | loss: 15.71583 | epsilon: 0.01 339 | 3360 episode | score: 8.82 | loss: 11.00038 | epsilon: 0.01 340 | 3370 episode | score: 8.78 | loss: 13.48424 | epsilon: 0.01 341 | 3380 episode | score: 8.76 | loss: 12.62969 | epsilon: 0.01 342 | 3390 episode | score: 8.70 | loss: 10.23487 | epsilon: 0.01 343 | 3400 episode | score: 8.71 | loss: 11.04372 | epsilon: 0.01 344 | 3410 episode | score: 8.68 | loss: 13.37738 | epsilon: 0.01 345 | 3420 episode | score: 8.65 | loss: 11.09719 | epsilon: 0.01 346 | 3430 episode | score: 8.62 | loss: 13.40267 | epsilon: 0.01 347 | 3440 episode | score: 8.55 | loss: 8.69839 | epsilon: 0.01 348 | 3450 episode | score: 8.55 | loss: 11.86583 | epsilon: 0.01 349 | 3460 episode | score: 8.57 | loss: 13.41148 | epsilon: 0.01 350 | 3470 episode | 
score: 8.59 | loss: 10.25880 | epsilon: 0.01 351 | 3480 episode | score: 8.58 | loss: 10.23511 | epsilon: 0.01 352 | 3490 episode | score: 8.56 | loss: 11.02219 | epsilon: 0.01 353 | 3500 episode | score: 8.53 | loss: 14.15510 | epsilon: 0.01 354 | 3510 episode | score: 8.51 | loss: 10.25200 | epsilon: 0.01 355 | 3520 episode | score: 8.50 | loss: 9.49207 | epsilon: 0.01 356 | 3530 episode | score: 8.45 | loss: 13.36084 | epsilon: 0.01 357 | 3540 episode | score: 8.59 | loss: 11.04989 | epsilon: 0.01 358 | 3550 episode | score: 8.66 | loss: 11.04657 | epsilon: 0.01 359 | 3560 episode | score: 8.73 | loss: 13.35894 | epsilon: 0.01 360 | 3570 episode | score: 8.70 | loss: 8.79981 | epsilon: 0.01 361 | 3580 episode | score: 8.66 | loss: 15.00119 | epsilon: 0.01 362 | 3590 episode | score: 8.61 | loss: 14.17593 | epsilon: 0.01 363 | 3600 episode | score: 8.57 | loss: 14.23392 | epsilon: 0.01 364 | 3610 episode | score: 8.56 | loss: 11.84069 | epsilon: 0.01 365 | 3620 episode | score: 8.67 | loss: 11.06954 | epsilon: 0.01 366 | 3630 episode | score: 8.61 | loss: 7.12155 | epsilon: 0.01 367 | 3640 episode | score: 8.59 | loss: 14.26435 | epsilon: 0.01 368 | 3650 episode | score: 8.60 | loss: 11.10530 | epsilon: 0.01 369 | 3660 episode | score: 8.71 | loss: 13.45249 | epsilon: 0.01 370 | 3670 episode | score: 8.74 | loss: 10.33090 | epsilon: 0.01 371 | 3680 episode | score: 8.69 | loss: 5.57306 | epsilon: 0.01 372 | 3690 episode | score: 8.80 | loss: 10.29642 | epsilon: 0.01 373 | 3700 episode | score: 8.76 | loss: 10.33631 | epsilon: 0.01 374 | 3710 episode | score: 8.74 | loss: 10.33454 | epsilon: 0.01 375 | 3720 episode | score: 8.72 | loss: 10.36367 | epsilon: 0.01 376 | 3730 episode | score: 8.69 | loss: 12.73713 | epsilon: 0.01 377 | 3740 episode | score: 8.68 | loss: 9.54919 | epsilon: 0.01 378 | 3750 episode | score: 8.79 | loss: 10.33279 | epsilon: 0.01 379 | 3760 episode | score: 8.92 | loss: 11.97439 | epsilon: 0.01 380 | 3770 episode | score: 8.85 | loss: 
12.01616 | epsilon: 0.01 381 | 3780 episode | score: 8.86 | loss: 11.89962 | epsilon: 0.01 382 | 3790 episode | score: 8.76 | loss: 12.68493 | epsilon: 0.01 383 | 3800 episode | score: 8.75 | loss: 14.28559 | epsilon: 0.01 384 | 3810 episode | score: 8.68 | loss: 9.54066 | epsilon: 0.01 385 | 3820 episode | score: 8.70 | loss: 12.69197 | epsilon: 0.01 386 | 3830 episode | score: 8.67 | loss: 14.34066 | epsilon: 0.01 387 | 3840 episode | score: 8.61 | loss: 8.75319 | epsilon: 0.01 388 | 3850 episode | score: 8.60 | loss: 10.33837 | epsilon: 0.01 389 | 3860 episode | score: 8.57 | loss: 12.74596 | epsilon: 0.01 390 | 3870 episode | score: 8.67 | loss: 11.18132 | epsilon: 0.01 391 | 3880 episode | score: 8.63 | loss: 12.75840 | epsilon: 0.01 392 | 3890 episode | score: 8.63 | loss: 12.73372 | epsilon: 0.01 393 | 3900 episode | score: 8.60 | loss: 9.57410 | epsilon: 0.01 394 | 3910 episode | score: 8.57 | loss: 11.99219 | epsilon: 0.01 395 | 3920 episode | score: 8.53 | loss: 12.74140 | epsilon: 0.01 396 | 3930 episode | score: 8.50 | loss: 13.56777 | epsilon: 0.01 397 | 3940 episode | score: 8.52 | loss: 9.62430 | epsilon: 0.01 398 | 3950 episode | score: 8.51 | loss: 15.94888 | epsilon: 0.01 399 | 3960 episode | score: 8.53 | loss: 14.44929 | epsilon: 0.01 400 | 3970 episode | score: 8.51 | loss: 14.33287 | epsilon: 0.01 401 | 3980 episode | score: 8.54 | loss: 11.23067 | epsilon: 0.01 402 | 3990 episode | score: 8.59 | loss: 14.34064 | epsilon: 0.01 403 | 4000 episode | score: 8.52 | loss: 8.06927 | epsilon: 0.01 404 | 4010 episode | score: 8.86 | loss: 4.83090 | epsilon: 0.01 405 | 4020 episode | score: 9.16 | loss: 9.64705 | epsilon: 0.01 406 | 4030 episode | score: 9.07 | loss: 8.81061 | epsilon: 0.01 407 | 4040 episode | score: 9.02 | loss: 9.68764 | epsilon: 0.01 408 | 4050 episode | score: 8.91 | loss: 15.24696 | epsilon: 0.01 409 | 4060 episode | score: 8.87 | loss: 12.83159 | epsilon: 0.01 410 | 4070 episode | score: 8.85 | loss: 8.83373 | epsilon: 0.01 411 
| 4080 episode | score: 8.83 | loss: 8.02803 | epsilon: 0.01 412 | 4090 episode | score: 8.82 | loss: 11.28403 | epsilon: 0.01 413 | 4100 episode | score: 8.88 | loss: 10.47824 | epsilon: 0.01 414 | 4110 episode | score: 8.82 | loss: 13.57903 | epsilon: 0.01 415 | 4120 episode | score: 8.75 | loss: 7.18897 | epsilon: 0.01 416 | 4130 episode | score: 8.72 | loss: 9.63034 | epsilon: 0.01 417 | 4140 episode | score: 8.66 | loss: 8.00251 | epsilon: 0.01 418 | 4150 episode | score: 8.63 | loss: 9.60776 | epsilon: 0.01 419 | 4160 episode | score: 8.58 | loss: 11.98508 | epsilon: 0.01 420 | 4170 episode | score: 8.60 | loss: 12.10333 | epsilon: 0.01 421 | 4180 episode | score: 8.57 | loss: 10.45281 | epsilon: 0.01 422 | 4190 episode | score: 8.62 | loss: 11.28251 | epsilon: 0.01 423 | 4200 episode | score: 8.60 | loss: 8.83660 | epsilon: 0.01 424 | 4210 episode | score: 8.59 | loss: 11.22020 | epsilon: 0.01 425 | 4220 episode | score: 8.62 | loss: 12.84902 | epsilon: 0.01 426 | 4230 episode | score: 8.61 | loss: 12.86111 | epsilon: 0.01 427 | 4240 episode | score: 8.60 | loss: 15.39887 | epsilon: 0.01 428 | 4250 episode | score: 8.61 | loss: 8.85378 | epsilon: 0.01 429 | 4260 episode | score: 8.69 | loss: 8.83344 | epsilon: 0.01 430 | 4270 episode | score: 8.67 | loss: 5.69113 | epsilon: 0.01 431 | 4280 episode | score: 8.68 | loss: 9.65333 | epsilon: 0.01 432 | 4290 episode | score: 8.79 | loss: 9.64854 | epsilon: 0.01 433 | 4300 episode | score: 8.78 | loss: 8.08716 | epsilon: 0.01 434 | 4310 episode | score: 8.75 | loss: 8.83302 | epsilon: 0.01 435 | 4320 episode | score: 8.71 | loss: 12.05873 | epsilon: 0.01 436 | 4330 episode | score: 8.68 | loss: 10.39692 | epsilon: 0.01 437 | 4340 episode | score: 8.68 | loss: 9.61883 | epsilon: 0.01 438 | 4350 episode | score: 8.63 | loss: 12.08126 | epsilon: 0.01 439 | 4360 episode | score: 8.61 | loss: 12.13505 | epsilon: 0.01 440 | 4370 episode | score: 8.60 | loss: 13.63374 | epsilon: 0.01 441 | 4380 episode | score: 8.61 | 
loss: 15.19112 | epsilon: 0.01 442 | 4390 episode | score: 8.62 | loss: 15.24961 | epsilon: 0.01 443 | 4400 episode | score: 8.59 | loss: 11.25627 | epsilon: 0.01 444 | 4410 episode | score: 8.58 | loss: 13.66731 | epsilon: 0.01 445 | 4420 episode | score: 8.50 | loss: 11.30068 | epsilon: 0.01 446 | 4430 episode | score: 8.50 | loss: 13.64186 | epsilon: 0.01 447 | 4440 episode | score: 8.51 | loss: 10.41696 | epsilon: 0.01 448 | 4450 episode | score: 8.53 | loss: 9.64521 | epsilon: 0.01 449 | 4460 episode | score: 8.67 | loss: 9.70868 | epsilon: 0.01 450 | 4470 episode | score: 8.63 | loss: 10.40866 | epsilon: 0.01 451 | 4480 episode | score: 8.63 | loss: 12.10616 | epsilon: 0.01 452 | 4490 episode | score: 8.75 | loss: 13.58418 | epsilon: 0.01 453 | 4500 episode | score: 8.72 | loss: 11.95841 | epsilon: 0.01 454 | 4510 episode | score: 8.71 | loss: 9.60635 | epsilon: 0.01 455 | 4520 episode | score: 8.66 | loss: 11.95144 | epsilon: 0.01 456 | 4530 episode | score: 8.81 | loss: 14.35493 | epsilon: 0.01 457 | 4540 episode | score: 8.76 | loss: 10.48997 | epsilon: 0.01 458 | 4550 episode | score: 8.75 | loss: 10.43027 | epsilon: 0.01 459 | 4560 episode | score: 8.72 | loss: 12.12173 | epsilon: 0.01 460 | 4570 episode | score: 8.72 | loss: 12.01201 | epsilon: 0.01 461 | 4580 episode | score: 8.72 | loss: 9.61949 | epsilon: 0.01 462 | 4590 episode | score: 8.67 | loss: 12.77416 | epsilon: 0.01 463 | 4600 episode | score: 8.63 | loss: 16.01569 | epsilon: 0.01 464 | 4610 episode | score: 8.65 | loss: 13.59569 | epsilon: 0.01 465 | 4620 episode | score: 8.61 | loss: 8.00971 | epsilon: 0.01 466 | 4630 episode | score: 8.62 | loss: 9.68583 | epsilon: 0.01 467 | 4640 episode | score: 8.57 | loss: 11.19599 | epsilon: 0.01 468 | 4650 episode | score: 8.61 | loss: 9.61889 | epsilon: 0.01 469 | 4660 episode | score: 8.62 | loss: 7.23984 | epsilon: 0.01 470 | 4670 episode | score: 8.62 | loss: 11.21859 | epsilon: 0.01 471 | 4680 episode | score: 8.60 | loss: 11.23654 | epsilon: 
0.01 472 | 4690 episode | score: 8.56 | loss: 12.08710 | epsilon: 0.01 473 | 4700 episode | score: 8.70 | loss: 13.60349 | epsilon: 0.01 474 | 4710 episode | score: 8.66 | loss: 12.83391 | epsilon: 0.01 475 | 4720 episode | score: 8.80 | loss: 15.21070 | epsilon: 0.01 476 | 4730 episode | score: 8.81 | loss: 11.25304 | epsilon: 0.01 477 | 4740 episode | score: 8.78 | loss: 9.60553 | epsilon: 0.01 478 | 4750 episode | score: 8.77 | loss: 8.88370 | epsilon: 0.01 479 | 4760 episode | score: 8.80 | loss: 11.28654 | epsilon: 0.01 480 | 4770 episode | score: 8.75 | loss: 13.70106 | epsilon: 0.01 481 | 4780 episode | score: 8.71 | loss: 12.84049 | epsilon: 0.01 482 | 4790 episode | score: 8.72 | loss: 9.73600 | epsilon: 0.01 483 | 4800 episode | score: 8.68 | loss: 11.25344 | epsilon: 0.01 484 | 4810 episode | score: 8.62 | loss: 9.74465 | epsilon: 0.01 485 | 4820 episode | score: 8.58 | loss: 12.82181 | epsilon: 0.01 486 | 4830 episode | score: 8.56 | loss: 12.08630 | epsilon: 0.01 487 | 4840 episode | score: 8.50 | loss: 13.64277 | epsilon: 0.01 488 | 4850 episode | score: 8.59 | loss: 9.67230 | epsilon: 0.01 489 | 4860 episode | score: 8.66 | loss: 15.23198 | epsilon: 0.01 490 | 4870 episode | score: 8.65 | loss: 15.16864 | epsilon: 0.01 491 | 4880 episode | score: 8.64 | loss: 14.38031 | epsilon: 0.01 492 | 4890 episode | score: 8.78 | loss: 11.27028 | epsilon: 0.01 493 | 4900 episode | score: 8.83 | loss: 11.98241 | epsilon: 0.01 494 | 4910 episode | score: 8.88 | loss: 11.29107 | epsilon: 0.01 495 | 4920 episode | score: 8.83 | loss: 11.27844 | epsilon: 0.01 496 | 4930 episode | score: 8.77 | loss: 13.63153 | epsilon: 0.01 497 | 4940 episode | score: 8.74 | loss: 8.10678 | epsilon: 0.01 498 | 4950 episode | score: 8.71 | loss: 11.96388 | epsilon: 0.01 499 | 4960 episode | score: 8.67 | loss: 11.19639 | epsilon: 0.01 500 | 4970 episode | score: 8.64 | loss: 10.38944 | epsilon: 0.01 501 | 4980 episode | score: 8.60 | loss: 9.59142 | epsilon: 0.01 502 | 4990 episode | 
score: 8.69 | loss: 16.00748 | epsilon: 0.01 503 | -------------------------------------------------------------------------------- /out/trace_DTQN_5.txt: -------------------------------------------------------------------------------- 1 | state size: 2 2 | action size: 2 3 | 0 episode | score: 42.00 | loss: 0.00000 | epsilon: 1.00 4 | 10 episode | score: 39.48 | loss: 0.00000 | epsilon: 1.00 5 | 20 episode | score: 37.85 | loss: 0.00000 | epsilon: 1.00 6 | 30 episode | score: 36.05 | loss: 0.00000 | epsilon: 1.00 7 | 40 episode | score: 34.11 | loss: 0.00000 | epsilon: 1.00 8 | 50 episode | score: 32.68 | loss: 0.33304 | epsilon: 1.00 9 | 60 episode | score: 31.11 | loss: 0.20664 | epsilon: 0.99 10 | 70 episode | score: 30.30 | loss: 0.17808 | epsilon: 0.98 11 | 80 episode | score: 29.67 | loss: 0.25635 | epsilon: 0.97 12 | 90 episode | score: 29.02 | loss: 0.52714 | epsilon: 0.95 13 | 100 episode | score: 27.89 | loss: 0.79162 | epsilon: 0.95 14 | 110 episode | score: 27.20 | loss: 0.32172 | epsilon: 0.93 15 | 120 episode | score: 27.31 | loss: 0.88816 | epsilon: 0.92 16 | 130 episode | score: 26.59 | loss: 0.67432 | epsilon: 0.91 17 | 140 episode | score: 25.59 | loss: 0.21146 | epsilon: 0.90 18 | 150 episode | score: 25.58 | loss: 0.57946 | epsilon: 0.89 19 | 160 episode | score: 25.57 | loss: 0.75138 | epsilon: 0.87 20 | 170 episode | score: 24.81 | loss: 0.45107 | epsilon: 0.87 21 | 180 episode | score: 24.31 | loss: 1.11557 | epsilon: 0.86 22 | 190 episode | score: 24.40 | loss: 0.73569 | epsilon: 0.84 23 | 200 episode | score: 23.95 | loss: 0.52971 | epsilon: 0.83 24 | 210 episode | score: 23.44 | loss: 0.29357 | epsilon: 0.82 25 | 220 episode | score: 23.33 | loss: 0.56872 | epsilon: 0.81 26 | 230 episode | score: 22.69 | loss: 1.67684 | epsilon: 0.80 27 | 240 episode | score: 22.04 | loss: 0.90344 | epsilon: 0.79 28 | 250 episode | score: 21.78 | loss: 1.22213 | epsilon: 0.78 29 | 260 episode | score: 22.02 | loss: 1.55604 | epsilon: 0.77 30 | 270 episode 
| score: 21.59 | loss: 1.91622 | epsilon: 0.76 31 | 280 episode | score: 21.56 | loss: 1.86581 | epsilon: 0.75 32 | 290 episode | score: 21.94 | loss: 1.05475 | epsilon: 0.74 33 | 300 episode | score: 21.63 | loss: 2.10064 | epsilon: 0.73 34 | 310 episode | score: 21.41 | loss: 0.74282 | epsilon: 0.72 35 | 320 episode | score: 21.19 | loss: 1.13760 | epsilon: 0.71 36 | 330 episode | score: 21.51 | loss: 0.42925 | epsilon: 0.69 37 | 340 episode | score: 20.95 | loss: 0.79201 | epsilon: 0.69 38 | 350 episode | score: 20.47 | loss: 0.81940 | epsilon: 0.68 39 | 360 episode | score: 20.35 | loss: 2.00666 | epsilon: 0.67 40 | 370 episode | score: 19.69 | loss: 1.29173 | epsilon: 0.66 41 | 380 episode | score: 19.51 | loss: 1.24970 | epsilon: 0.65 42 | 390 episode | score: 19.22 | loss: 1.70168 | epsilon: 0.64 43 | 400 episode | score: 18.73 | loss: 1.36878 | epsilon: 0.63 44 | 410 episode | score: 18.40 | loss: 2.58290 | epsilon: 0.63 45 | 420 episode | score: 18.50 | loss: 2.18710 | epsilon: 0.62 46 | 430 episode | score: 18.04 | loss: 4.43702 | epsilon: 0.61 47 | 440 episode | score: 17.68 | loss: 2.25777 | epsilon: 0.60 48 | 450 episode | score: 17.39 | loss: 2.31119 | epsilon: 0.59 49 | 460 episode | score: 17.14 | loss: 1.26720 | epsilon: 0.58 50 | 470 episode | score: 17.04 | loss: 1.41837 | epsilon: 0.58 51 | 480 episode | score: 16.55 | loss: 1.97171 | epsilon: 0.57 52 | 490 episode | score: 16.70 | loss: 0.96602 | epsilon: 0.56 53 | 500 episode | score: 16.50 | loss: 2.92387 | epsilon: 0.55 54 | 510 episode | score: 16.59 | loss: 0.99988 | epsilon: 0.54 55 | 520 episode | score: 16.23 | loss: 1.98007 | epsilon: 0.54 56 | 530 episode | score: 16.07 | loss: 1.51870 | epsilon: 0.53 57 | 540 episode | score: 16.09 | loss: 3.35758 | epsilon: 0.52 58 | 550 episode | score: 15.84 | loss: 0.52732 | epsilon: 0.51 59 | 560 episode | score: 15.46 | loss: 3.56532 | epsilon: 0.51 60 | 570 episode | score: 15.54 | loss: 4.62709 | epsilon: 0.50 61 | 580 episode | score: 15.15 
| loss: 3.11519 | epsilon: 0.49 62 | 590 episode | score: 14.99 | loss: 2.29960 | epsilon: 0.48 63 | 600 episode | score: 14.67 | loss: 2.66678 | epsilon: 0.48 64 | 610 episode | score: 14.30 | loss: 3.19330 | epsilon: 0.47 65 | 620 episode | score: 14.76 | loss: 2.15541 | epsilon: 0.46 66 | 630 episode | score: 15.34 | loss: 2.69951 | epsilon: 0.45 67 | 640 episode | score: 14.88 | loss: 4.35402 | epsilon: 0.45 68 | 650 episode | score: 14.53 | loss: 3.36075 | epsilon: 0.44 69 | 660 episode | score: 14.26 | loss: 4.38827 | epsilon: 0.43 70 | 670 episode | score: 13.98 | loss: 2.76526 | epsilon: 0.43 71 | 680 episode | score: 13.68 | loss: 2.82026 | epsilon: 0.42 72 | 690 episode | score: 13.41 | loss: 5.13891 | epsilon: 0.41 73 | 700 episode | score: 13.38 | loss: 4.49162 | epsilon: 0.41 74 | 710 episode | score: 13.31 | loss: 2.81179 | epsilon: 0.40 75 | 720 episode | score: 13.11 | loss: 3.50450 | epsilon: 0.39 76 | 730 episode | score: 12.86 | loss: 4.08011 | epsilon: 0.39 77 | 740 episode | score: 12.95 | loss: 4.57233 | epsilon: 0.38 78 | 750 episode | score: 12.61 | loss: 5.19284 | epsilon: 0.38 79 | 760 episode | score: 12.57 | loss: 3.67743 | epsilon: 0.37 80 | 770 episode | score: 12.77 | loss: 5.29215 | epsilon: 0.36 81 | 780 episode | score: 12.68 | loss: 4.67841 | epsilon: 0.36 82 | 790 episode | score: 12.78 | loss: 3.58484 | epsilon: 0.35 83 | 800 episode | score: 12.50 | loss: 4.14623 | epsilon: 0.34 84 | 810 episode | score: 12.47 | loss: 4.75443 | epsilon: 0.34 85 | 820 episode | score: 12.26 | loss: 5.34280 | epsilon: 0.33 86 | 830 episode | score: 12.21 | loss: 4.85460 | epsilon: 0.32 87 | 840 episode | score: 11.93 | loss: 3.64325 | epsilon: 0.32 88 | 850 episode | score: 11.83 | loss: 4.22764 | epsilon: 0.31 89 | 860 episode | score: 12.12 | loss: 4.29264 | epsilon: 0.31 90 | 870 episode | score: 12.38 | loss: 3.06890 | epsilon: 0.30 91 | 880 episode | score: 12.41 | loss: 3.66894 | epsilon: 0.29 92 | 890 episode | score: 12.24 | loss: 4.93459 
| epsilon: 0.28 93 | 900 episode | score: 11.99 | loss: 3.81321 | epsilon: 0.28 94 | 910 episode | score: 11.82 | loss: 3.75304 | epsilon: 0.27 95 | 920 episode | score: 11.84 | loss: 4.98607 | epsilon: 0.27 96 | 930 episode | score: 11.69 | loss: 3.75075 | epsilon: 0.26 97 | 940 episode | score: 11.43 | loss: 3.77254 | epsilon: 0.26 98 | 950 episode | score: 11.36 | loss: 3.76858 | epsilon: 0.25 99 | 960 episode | score: 11.16 | loss: 4.41588 | epsilon: 0.25 100 | 970 episode | score: 10.97 | loss: 6.90373 | epsilon: 0.24 101 | 980 episode | score: 10.81 | loss: 5.72781 | epsilon: 0.24 102 | 990 episode | score: 10.76 | loss: 7.01954 | epsilon: 0.23 103 | 1000 episode | score: 10.64 | loss: 6.46829 | epsilon: 0.22 104 | 1010 episode | score: 10.69 | loss: 5.75487 | epsilon: 0.22 105 | 1020 episode | score: 10.74 | loss: 6.37494 | epsilon: 0.21 106 | 1030 episode | score: 10.59 | loss: 5.13310 | epsilon: 0.21 107 | 1040 episode | score: 10.41 | loss: 7.83554 | epsilon: 0.20 108 | 1050 episode | score: 10.42 | loss: 7.08298 | epsilon: 0.20 109 | 1060 episode | score: 10.33 | loss: 5.80025 | epsilon: 0.19 110 | 1070 episode | score: 10.32 | loss: 2.59830 | epsilon: 0.19 111 | 1080 episode | score: 10.55 | loss: 7.13071 | epsilon: 0.18 112 | 1090 episode | score: 10.41 | loss: 5.87516 | epsilon: 0.17 113 | 1100 episode | score: 10.26 | loss: 7.79912 | epsilon: 0.17 114 | 1110 episode | score: 10.19 | loss: 9.12289 | epsilon: 0.16 115 | 1120 episode | score: 10.19 | loss: 8.51380 | epsilon: 0.16 116 | 1130 episode | score: 10.19 | loss: 4.64488 | epsilon: 0.15 117 | 1140 episode | score: 10.19 | loss: 3.95919 | epsilon: 0.15 118 | 1150 episode | score: 10.22 | loss: 9.88921 | epsilon: 0.14 119 | 1160 episode | score: 10.15 | loss: 5.28739 | epsilon: 0.14 120 | 1170 episode | score: 10.00 | loss: 5.29736 | epsilon: 0.13 121 | 1180 episode | score: 9.94 | loss: 6.01434 | epsilon: 0.13 122 | 1190 episode | score: 10.00 | loss: 5.98966 | epsilon: 0.12 123 | 1200 episode | 
score: 9.93 | loss: 7.42015 | epsilon: 0.11 124 | 1210 episode | score: 9.94 | loss: 4.78921 | epsilon: 0.11 125 | 1220 episode | score: 9.94 | loss: 6.73261 | epsilon: 0.10 126 | 1230 episode | score: 9.86 | loss: 6.77203 | epsilon: 0.10 127 | 1240 episode | score: 9.86 | loss: 10.79873 | epsilon: 0.09 128 | 1250 episode | score: 9.80 | loss: 5.41691 | epsilon: 0.09 129 | 1260 episode | score: 9.76 | loss: 7.49157 | epsilon: 0.08 130 | 1270 episode | score: 9.77 | loss: 4.80807 | epsilon: 0.08 131 | 1280 episode | score: 9.67 | loss: 8.86116 | epsilon: 0.07 132 | 1290 episode | score: 9.74 | loss: 8.23815 | epsilon: 0.07 133 | 1300 episode | score: 9.82 | loss: 6.13960 | epsilon: 0.06 134 | 1310 episode | score: 9.73 | loss: 6.84984 | epsilon: 0.06 135 | 1320 episode | score: 9.66 | loss: 10.96124 | epsilon: 0.05 136 | 1330 episode | score: 9.76 | loss: 6.19592 | epsilon: 0.05 137 | 1340 episode | score: 9.69 | loss: 8.32158 | epsilon: 0.04 138 | 1350 episode | score: 9.62 | loss: 8.96192 | epsilon: 0.04 139 | 1360 episode | score: 9.54 | loss: 10.39995 | epsilon: 0.03 140 | 1370 episode | score: 9.42 | loss: 6.96574 | epsilon: 0.03 141 | 1380 episode | score: 9.33 | loss: 9.09122 | epsilon: 0.02 142 | 1390 episode | score: 9.24 | loss: 11.82153 | epsilon: 0.02 143 | 1400 episode | score: 9.15 | loss: 8.40250 | epsilon: 0.01 144 | 1410 episode | score: 9.12 | loss: 11.16653 | epsilon: 0.01 145 | 1420 episode | score: 9.13 | loss: 8.46265 | epsilon: 0.01 146 | 1430 episode | score: 9.10 | loss: 8.38696 | epsilon: 0.01 147 | 1440 episode | score: 9.05 | loss: 7.71308 | epsilon: 0.01 148 | 1450 episode | score: 8.98 | loss: 10.51652 | epsilon: 0.01 149 | 1460 episode | score: 8.92 | loss: 8.40514 | epsilon: 0.01 150 | 1470 episode | score: 8.88 | loss: 9.84612 | epsilon: 0.01 151 | 1480 episode | score: 9.01 | loss: 12.71185 | epsilon: 0.01 152 | 1490 episode | score: 8.99 | loss: 9.84389 | epsilon: 0.01 153 | 1500 episode | score: 8.93 | loss: 13.35950 | epsilon: 
0.01 154 | 1510 episode | score: 8.89 | loss: 8.45604 | epsilon: 0.01 155 | 1520 episode | score: 8.86 | loss: 9.86826 | epsilon: 0.01 156 | 1530 episode | score: 8.82 | loss: 11.98928 | epsilon: 0.01 157 | 1540 episode | score: 8.79 | loss: 9.35221 | epsilon: 0.01 158 | 1550 episode | score: 8.71 | loss: 12.74729 | epsilon: 0.01 159 | 1560 episode | score: 8.70 | loss: 12.13309 | epsilon: 0.01 160 | 1570 episode | score: 8.67 | loss: 11.34081 | epsilon: 0.01 161 | 1580 episode | score: 8.67 | loss: 10.66604 | epsilon: 0.01 162 | 1590 episode | score: 8.68 | loss: 9.95511 | epsilon: 0.01 163 | 1600 episode | score: 8.80 | loss: 10.72554 | epsilon: 0.01 164 | 1610 episode | score: 8.76 | loss: 12.85478 | epsilon: 0.01 165 | 1620 episode | score: 8.83 | loss: 9.37244 | epsilon: 0.01 166 | 1630 episode | score: 8.81 | loss: 10.04362 | epsilon: 0.01 167 | 1640 episode | score: 9.16 | loss: 10.84129 | epsilon: 0.01 168 | 1650 episode | score: 9.13 | loss: 13.66985 | epsilon: 0.01 169 | 1660 episode | score: 9.04 | loss: 10.13253 | epsilon: 0.01 170 | 1670 episode | score: 8.98 | loss: 8.66332 | epsilon: 0.01 171 | 1680 episode | score: 8.94 | loss: 11.53072 | epsilon: 0.01 172 | 1690 episode | score: 8.89 | loss: 10.12132 | epsilon: 0.01 173 | 1700 episode | score: 8.97 | loss: 9.47163 | epsilon: 0.01 174 | 1710 episode | score: 8.91 | loss: 10.89018 | epsilon: 0.01 175 | 1720 episode | score: 8.84 | loss: 9.44108 | epsilon: 0.01 176 | 1730 episode | score: 8.81 | loss: 9.40377 | epsilon: 0.01 177 | 1740 episode | score: 9.06 | loss: 6.60606 | epsilon: 0.01 178 | 1750 episode | score: 9.02 | loss: 8.85948 | epsilon: 0.01 179 | 1760 episode | score: 8.95 | loss: 13.07458 | epsilon: 0.01 180 | 1770 episode | score: 8.92 | loss: 5.84995 | epsilon: 0.01 181 | 1780 episode | score: 8.88 | loss: 10.89841 | epsilon: 0.01 182 | 1790 episode | score: 8.87 | loss: 10.92721 | epsilon: 0.01 183 | 1800 episode | score: 8.85 | loss: 10.95057 | epsilon: 0.01 184 | 1810 episode | 
score: 8.91 | loss: 5.12229 | epsilon: 0.01 185 | 1820 episode | score: 8.85 | loss: 12.39469 | epsilon: 0.01 186 | 1830 episode | score: 8.81 | loss: 7.34548 | epsilon: 0.01 187 | 1840 episode | score: 8.79 | loss: 8.81931 | epsilon: 0.01 188 | 1850 episode | score: 8.73 | loss: 11.68220 | epsilon: 0.01 189 | 1860 episode | score: 8.66 | loss: 9.51998 | epsilon: 0.01 190 | 1870 episode | score: 8.61 | loss: 11.85967 | epsilon: 0.01 191 | 1880 episode | score: 8.53 | loss: 12.43571 | epsilon: 0.01 192 | 1890 episode | score: 8.51 | loss: 13.20536 | epsilon: 0.01 193 | 1900 episode | score: 8.61 | loss: 12.49828 | epsilon: 0.01 194 | 1910 episode | score: 8.64 | loss: 14.00794 | epsilon: 0.01 195 | 1920 episode | score: 8.62 | loss: 11.06345 | epsilon: 0.01 196 | 1930 episode | score: 8.62 | loss: 12.55268 | epsilon: 0.01 197 | 1940 episode | score: 8.61 | loss: 10.36380 | epsilon: 0.01 198 | 1950 episode | score: 8.57 | loss: 9.59783 | epsilon: 0.01 199 | 1960 episode | score: 8.62 | loss: 11.87677 | epsilon: 0.01 200 | 1970 episode | score: 8.56 | loss: 8.95421 | epsilon: 0.01 201 | 1980 episode | score: 8.57 | loss: 11.90256 | epsilon: 0.01 202 | 1990 episode | score: 8.56 | loss: 12.88654 | epsilon: 0.01 203 | 2000 episode | score: 8.52 | loss: 22.89971 | epsilon: 0.01 204 | 2010 episode | score: 8.45 | loss: 11.69317 | epsilon: 0.01 205 | 2020 episode | score: 8.46 | loss: 13.89884 | epsilon: 0.01 206 | 2030 episode | score: 8.47 | loss: 12.21345 | epsilon: 0.01 207 | 2040 episode | score: 8.60 | loss: 12.74696 | epsilon: 0.01 208 | 2050 episode | score: 9.39 | loss: 9.73934 | epsilon: 0.01 209 | 2060 episode | score: 9.53 | loss: 5.97629 | epsilon: 0.01 210 | 2070 episode | score: 9.51 | loss: 7.89748 | epsilon: 0.01 211 | 2080 episode | score: 10.17 | loss: 11.30224 | epsilon: 0.01 212 | 2090 episode | score: 10.16 | loss: 11.95127 | epsilon: 0.01 213 | 2100 episode | score: 9.98 | loss: 9.02979 | epsilon: 0.01 214 | 2110 episode | score: 9.85 | loss: 6.85986 
| epsilon: 0.01 215 | 2120 episode | score: 9.70 | loss: 7.56394 | epsilon: 0.01 216 | 2130 episode | score: 9.71 | loss: 6.06545 | epsilon: 0.01 217 | 2140 episode | score: 9.55 | loss: 4.56698 | epsilon: 0.01 218 | 2150 episode | score: 9.45 | loss: 11.08675 | epsilon: 0.01 219 | 2160 episode | score: 9.39 | loss: 5.97940 | epsilon: 0.01 220 | 2170 episode | score: 9.50 | loss: 6.68500 | epsilon: 0.01 221 | 2180 episode | score: 9.44 | loss: 10.37776 | epsilon: 0.01 222 | 2190 episode | score: 10.14 | loss: 8.89424 | epsilon: 0.01 223 | 2200 episode | score: 10.23 | loss: 6.68299 | epsilon: 0.01 224 | 2210 episode | score: 10.10 | loss: 8.16638 | epsilon: 0.01 225 | 2220 episode | score: 9.93 | loss: 8.17309 | epsilon: 0.01 226 | 2230 episode | score: 9.76 | loss: 8.17456 | epsilon: 0.01 227 | 2240 episode | score: 9.68 | loss: 11.12660 | epsilon: 0.01 228 | 2250 episode | score: 9.52 | loss: 8.20828 | epsilon: 0.01 229 | 2260 episode | score: 9.49 | loss: 6.74850 | epsilon: 0.01 230 | 2270 episode | score: 9.34 | loss: 8.91310 | epsilon: 0.01 231 | 2280 episode | score: 9.26 | loss: 9.66050 | epsilon: 0.01 232 | 2290 episode | score: 9.36 | loss: 7.45649 | epsilon: 0.01 233 | 2300 episode | score: 9.35 | loss: 14.13468 | epsilon: 0.01 234 | 2310 episode | score: 9.29 | loss: 8.20157 | epsilon: 0.01 235 | 2320 episode | score: 9.20 | loss: 13.40456 | epsilon: 0.01 236 | 2330 episode | score: 9.12 | loss: 10.44636 | epsilon: 0.01 237 | 2340 episode | score: 9.31 | loss: 10.45737 | epsilon: 0.01 238 | 2350 episode | score: 9.32 | loss: 8.96152 | epsilon: 0.01 239 | 2360 episode | score: 9.37 | loss: 12.77146 | epsilon: 0.01 240 | 2370 episode | score: 9.33 | loss: 11.96437 | epsilon: 0.01 241 | 2380 episode | score: 9.25 | loss: 8.28421 | epsilon: 0.01 242 | 2390 episode | score: 9.20 | loss: 11.25395 | epsilon: 0.01 243 | 2400 episode | score: 9.14 | loss: 12.75365 | epsilon: 0.01 244 | 2410 episode | score: 9.23 | loss: 11.28267 | epsilon: 0.01 245 | 2420 episode 
| score: 9.58 | loss: 9.01897 | epsilon: 0.01 246 | 2430 episode | score: 9.49 | loss: 9.78657 | epsilon: 0.01 247 | 2440 episode | score: 9.41 | loss: 12.01540 | epsilon: 0.01 248 | 2450 episode | score: 9.40 | loss: 9.77980 | epsilon: 0.01 249 | 2460 episode | score: 9.51 | loss: 9.04971 | epsilon: 0.01 250 | 2470 episode | score: 9.48 | loss: 9.81820 | epsilon: 0.01 251 | 2480 episode | score: 9.39 | loss: 12.04065 | epsilon: 0.01 252 | 2490 episode | score: 9.27 | loss: 8.29423 | epsilon: 0.01 253 | 2500 episode | score: 9.15 | loss: 7.55239 | epsilon: 0.01 254 | 2510 episode | score: 9.10 | loss: 11.31253 | epsilon: 0.01 255 | 2520 episode | score: 9.03 | loss: 11.32649 | epsilon: 0.01 256 | 2530 episode | score: 8.93 | loss: 10.57569 | epsilon: 0.01 257 | 2540 episode | score: 8.96 | loss: 12.10625 | epsilon: 0.01 258 | 2550 episode | score: 8.97 | loss: 10.59332 | epsilon: 0.01 259 | 2560 episode | score: 8.96 | loss: 12.86546 | epsilon: 0.01 260 | 2570 episode | score: 8.92 | loss: 13.63366 | epsilon: 0.01 261 | 2580 episode | score: 8.85 | loss: 12.12462 | epsilon: 0.01 262 | 2590 episode | score: 8.87 | loss: 9.89139 | epsilon: 0.01 263 | 2600 episode | score: 8.94 | loss: 6.10569 | epsilon: 0.01 264 | 2610 episode | score: 8.91 | loss: 11.38791 | epsilon: 0.01 265 | 2620 episode | score: 8.89 | loss: 9.11608 | epsilon: 0.01 266 | 2630 episode | score: 8.93 | loss: 12.91580 | epsilon: 0.01 267 | 2640 episode | score: 9.68 | loss: 7.73742 | epsilon: 0.01 268 | 2650 episode | score: 9.54 | loss: 6.88130 | epsilon: 0.01 269 | 2660 episode | score: 9.59 | loss: 10.00360 | epsilon: 0.01 270 | 2670 episode | score: 9.44 | loss: 9.20884 | epsilon: 0.01 271 | 2680 episode | score: 9.30 | loss: 7.65980 | epsilon: 0.01 272 | 2690 episode | score: 9.19 | loss: 8.41324 | epsilon: 0.01 273 | 2700 episode | score: 9.14 | loss: 12.98391 | epsilon: 0.01 274 | 2710 episode | score: 9.13 | loss: 6.90586 | epsilon: 0.01 275 | 2720 episode | score: 9.08 | loss: 6.90842 | 
epsilon: 0.01 276 | 2730 episode | score: 8.99 | loss: 9.21201 | epsilon: 0.01 277 | 2740 episode | score: 9.62 | loss: 10.79367 | epsilon: 0.01 278 | 2750 episode | score: 9.50 | loss: 13.79613 | epsilon: 0.01 279 | 2760 episode | score: 9.42 | loss: 6.24562 | epsilon: 0.01 280 | 2770 episode | score: 9.47 | loss: 10.74018 | epsilon: 0.01 281 | 2780 episode | score: 9.56 | loss: 10.00158 | epsilon: 0.01 282 | 2790 episode | score: 10.28 | loss: 8.44825 | epsilon: 0.01 283 | 2800 episode | score: 10.16 | loss: 8.44763 | epsilon: 0.01 284 | 2810 episode | score: 10.02 | loss: 10.75428 | epsilon: 0.01 285 | 2820 episode | score: 10.11 | loss: 6.92661 | epsilon: 0.01 286 | 2830 episode | score: 10.18 | loss: 9.22772 | epsilon: 0.01 287 | 2840 episode | score: 10.04 | loss: 8.44034 | epsilon: 0.01 288 | 2850 episode | score: 9.87 | loss: 8.47338 | epsilon: 0.01 289 | 2860 episode | score: 9.70 | loss: 10.01442 | epsilon: 0.01 290 | 2870 episode | score: 9.60 | loss: 5.43404 | epsilon: 0.01 291 | 2880 episode | score: 9.46 | loss: 6.15170 | epsilon: 0.01 292 | 2890 episode | score: 9.42 | loss: 6.16667 | epsilon: 0.01 293 | 2900 episode | score: 9.36 | loss: 7.73600 | epsilon: 0.01 294 | 2910 episode | score: 9.29 | loss: 12.33317 | epsilon: 0.01 295 | 2920 episode | score: 9.23 | loss: 9.23750 | epsilon: 0.01 296 | 2930 episode | score: 9.43 | loss: 10.80528 | epsilon: 0.01 297 | 2940 episode | score: 9.69 | loss: 14.62934 | epsilon: 0.01 298 | 2950 episode | score: 9.58 | loss: 9.26545 | epsilon: 0.01 299 | 2960 episode | score: 9.61 | loss: 7.04498 | epsilon: 0.01 300 | 2970 episode | score: 9.50 | loss: 10.81926 | epsilon: 0.01 301 | 2980 episode | score: 9.53 | loss: 12.33160 | epsilon: 0.01 302 | 2990 episode | score: 9.43 | loss: 9.26536 | epsilon: 0.01 303 | 3000 episode | score: 9.34 | loss: 6.20229 | epsilon: 0.01 304 | 3010 episode | score: 9.36 | loss: 8.49146 | epsilon: 0.01 305 | 3020 episode | score: 9.26 | loss: 12.36944 | epsilon: 0.01 306 | 3030 
episode | score: 9.18 | loss: 10.04898 | epsilon: 0.01 307 | 3040 episode | score: 9.07 | loss: 10.06615 | epsilon: 0.01 308 | 3050 episode | score: 9.03 | loss: 11.62272 | epsilon: 0.01 309 | 3060 episode | score: 8.99 | loss: 13.16744 | epsilon: 0.01 310 | 3070 episode | score: 8.95 | loss: 12.43446 | epsilon: 0.01 311 | 3080 episode | score: 8.94 | loss: 9.40432 | epsilon: 0.01 312 | 3090 episode | score: 8.92 | loss: 14.01531 | epsilon: 0.01 313 | 3100 episode | score: 8.88 | loss: 7.11171 | epsilon: 0.01 314 | 3110 episode | score: 8.86 | loss: 10.89154 | epsilon: 0.01 315 | 3120 episode | score: 8.82 | loss: 9.32153 | epsilon: 0.01 316 | 3130 episode | score: 8.69 | loss: 11.73717 | epsilon: 0.01 317 | 3140 episode | score: 8.71 | loss: 9.37411 | epsilon: 0.01 318 | 3150 episode | score: 8.73 | loss: 11.65305 | epsilon: 0.01 319 | 3160 episode | score: 8.72 | loss: 9.43469 | epsilon: 0.01 320 | 3170 episode | score: 8.86 | loss: 14.79311 | epsilon: 0.01 321 | 3180 episode | score: 8.79 | loss: 13.25291 | epsilon: 0.01 322 | 3190 episode | score: 8.73 | loss: 11.71912 | epsilon: 0.01 323 | 3200 episode | score: 8.70 | loss: 8.59743 | epsilon: 0.01 324 | 3210 episode | score: 8.70 | loss: 11.68095 | epsilon: 0.01 325 | 3220 episode | score: 8.67 | loss: 12.51386 | epsilon: 0.01 326 | 3230 episode | score: 8.68 | loss: 9.40448 | epsilon: 0.01 327 | 3240 episode | score: 8.64 | loss: 9.40135 | epsilon: 0.01 328 | 3250 episode | score: 8.65 | loss: 10.15096 | epsilon: 0.01 329 | 3260 episode | score: 8.67 | loss: 9.35014 | epsilon: 0.01 330 | 3270 episode | score: 8.79 | loss: 12.45504 | epsilon: 0.01 331 | 3280 episode | score: 8.79 | loss: 8.64415 | epsilon: 0.01 332 | 3290 episode | score: 8.75 | loss: 10.20365 | epsilon: 0.01 333 | 3300 episode | score: 8.72 | loss: 7.86599 | epsilon: 0.01 334 | 3310 episode | score: 8.74 | loss: 10.96756 | epsilon: 0.01 335 | 3320 episode | score: 8.68 | loss: 13.34599 | epsilon: 0.01 336 | 3330 episode | score: 8.66 | loss: 
10.94800 | epsilon: 0.01 337 | 3340 episode | score: 8.62 | loss: 14.05154 | epsilon: 0.01 338 | 3350 episode | score: 8.60 | loss: 10.23919 | epsilon: 0.01 339 | 3360 episode | score: 8.59 | loss: 14.03212 | epsilon: 0.01 340 | 3370 episode | score: 8.55 | loss: 11.77572 | epsilon: 0.01 341 | 3380 episode | score: 8.68 | loss: 10.25810 | epsilon: 0.01 342 | 3390 episode | score: 8.75 | loss: 8.62217 | epsilon: 0.01 343 | 3400 episode | score: 8.75 | loss: 12.53079 | epsilon: 0.01 344 | 3410 episode | score: 8.81 | loss: 7.07102 | epsilon: 0.01 345 | 3420 episode | score: 8.79 | loss: 15.74056 | epsilon: 0.01 346 | 3430 episode | score: 8.84 | loss: 11.75673 | epsilon: 0.01 347 | 3440 episode | score: 8.89 | loss: 10.96829 | epsilon: 0.01 348 | 3450 episode | score: 8.81 | loss: 12.55960 | epsilon: 0.01 349 | 3460 episode | score: 8.82 | loss: 12.55970 | epsilon: 0.01 350 | 3470 episode | score: 8.76 | loss: 10.98289 | epsilon: 0.01 351 | 3480 episode | score: 8.84 | loss: 11.01114 | epsilon: 0.01 352 | 3490 episode | score: 8.82 | loss: 11.75950 | epsilon: 0.01 353 | 3500 episode | score: 8.85 | loss: 13.32741 | epsilon: 0.01 354 | 3510 episode | score: 8.80 | loss: 12.60781 | epsilon: 0.01 355 | 3520 episode | score: 8.76 | loss: 13.35716 | epsilon: 0.01 356 | 3530 episode | score: 8.70 | loss: 11.76125 | epsilon: 0.01 357 | 3540 episode | score: 8.73 | loss: 14.90952 | epsilon: 0.01 358 | 3550 episode | score: 8.70 | loss: 11.77749 | epsilon: 0.01 359 | 3560 episode | score: 8.72 | loss: 12.56740 | epsilon: 0.01 360 | 3570 episode | score: 8.73 | loss: 14.99122 | epsilon: 0.01 361 | 3580 episode | score: 8.80 | loss: 17.37853 | epsilon: 0.01 362 | 3590 episode | score: 8.72 | loss: 8.76137 | epsilon: 0.01 363 | 3600 episode | score: 8.67 | loss: 10.23885 | epsilon: 0.01 364 | 3610 episode | score: 8.65 | loss: 11.01727 | epsilon: 0.01 365 | 3620 episode | score: 8.62 | loss: 14.13370 | epsilon: 0.01 366 | 3630 episode | score: 8.72 | loss: 11.00491 | epsilon: 
0.01 367 | 3640 episode | score: 8.69 | loss: 8.70311 | epsilon: 0.01 368 | 3650 episode | score: 8.66 | loss: 13.38176 | epsilon: 0.01 369 | 3660 episode | score: 8.63 | loss: 7.11742 | epsilon: 0.01 370 | 3670 episode | score: 8.56 | loss: 14.16870 | epsilon: 0.01 371 | 3680 episode | score: 8.63 | loss: 11.78554 | epsilon: 0.01 372 | 3690 episode | score: 8.66 | loss: 10.24880 | epsilon: 0.01 373 | 3700 episode | score: 8.65 | loss: 13.40398 | epsilon: 0.01 374 | 3710 episode | score: 8.65 | loss: 12.64166 | epsilon: 0.01 375 | 3720 episode | score: 8.64 | loss: 9.59502 | epsilon: 0.01 376 | 3730 episode | score: 8.64 | loss: 12.61637 | epsilon: 0.01 377 | 3740 episode | score: 8.70 | loss: 7.87491 | epsilon: 0.01 378 | 3750 episode | score: 8.66 | loss: 7.88004 | epsilon: 0.01 379 | 3760 episode | score: 8.64 | loss: 11.78762 | epsilon: 0.01 380 | 3770 episode | score: 8.74 | loss: 9.46066 | epsilon: 0.01 381 | 3780 episode | score: 8.73 | loss: 7.96536 | epsilon: 0.01 382 | 3790 episode | score: 8.71 | loss: 11.97221 | epsilon: 0.01 383 | 3800 episode | score: 8.67 | loss: 11.88898 | epsilon: 0.01 384 | 3810 episode | score: 8.60 | loss: 11.85641 | epsilon: 0.01 385 | 3820 episode | score: 8.58 | loss: 14.25775 | epsilon: 0.01 386 | 3830 episode | score: 8.62 | loss: 11.08920 | epsilon: 0.01 387 | 3840 episode | score: 8.64 | loss: 11.07313 | epsilon: 0.01 388 | 3850 episode | score: 8.76 | loss: 10.28849 | epsilon: 0.01 389 | 3860 episode | score: 8.70 | loss: 11.09194 | epsilon: 0.01 390 | 3870 episode | score: 8.83 | loss: 7.27343 | epsilon: 0.01 391 | 3880 episode | score: 8.78 | loss: 12.70216 | epsilon: 0.01 392 | 3890 episode | score: 8.76 | loss: 12.71587 | epsilon: 0.01 393 | 3900 episode | score: 8.80 | loss: 11.92015 | epsilon: 0.01 394 | 3910 episode | score: 8.91 | loss: 13.49285 | epsilon: 0.01 395 | 3920 episode | score: 8.87 | loss: 14.29325 | epsilon: 0.01 396 | 3930 episode | score: 8.83 | loss: 10.36528 | epsilon: 0.01 397 | 3940 episode | 
score: 8.78 | loss: 11.20214 | epsilon: 0.01 398 | 3950 episode | score: 8.72 | loss: 12.78704 | epsilon: 0.01 399 | 3960 episode | score: 8.91 | loss: 8.74634 | epsilon: 0.01 400 | 3970 episode | score: 8.80 | loss: 11.15318 | epsilon: 0.01 401 | 3980 episode | score: 8.76 | loss: 11.17389 | epsilon: 0.01 402 | 3990 episode | score: 8.70 | loss: 10.37470 | epsilon: 0.01 403 | 4000 episode | score: 8.63 | loss: 15.13234 | epsilon: 0.01 404 | 4010 episode | score: 8.61 | loss: 11.99569 | epsilon: 0.01 405 | 4020 episode | score: 8.58 | loss: 8.01051 | epsilon: 0.01 406 | 4030 episode | score: 8.59 | loss: 14.32788 | epsilon: 0.01 407 | 4040 episode | score: 8.59 | loss: 12.81188 | epsilon: 0.01 408 | 4050 episode | score: 8.54 | loss: 14.31777 | epsilon: 0.01 409 | 4060 episode | score: 8.54 | loss: 9.59531 | epsilon: 0.01 410 | 4070 episode | score: 8.54 | loss: 14.39224 | epsilon: 0.01 411 | 4080 episode | score: 8.52 | loss: 14.33717 | epsilon: 0.01 412 | 4090 episode | score: 8.46 | loss: 12.74352 | epsilon: 0.01 413 | 4100 episode | score: 8.45 | loss: 9.54306 | epsilon: 0.01 414 | 4110 episode | score: 8.46 | loss: 8.00438 | epsilon: 0.01 415 | 4120 episode | score: 8.44 | loss: 12.01585 | epsilon: 0.01 416 | 4130 episode | score: 8.42 | loss: 8.08862 | epsilon: 0.01 417 | 4140 episode | score: 8.44 | loss: 11.19263 | epsilon: 0.01 418 | 4150 episode | score: 8.60 | loss: 12.77767 | epsilon: 0.01 419 | 4160 episode | score: 8.61 | loss: 9.61339 | epsilon: 0.01 420 | 4170 episode | score: 8.66 | loss: 8.78759 | epsilon: 0.01 421 | 4180 episode | score: 8.63 | loss: 9.59737 | epsilon: 0.01 422 | 4190 episode | score: 8.66 | loss: 11.22626 | epsilon: 0.01 423 | 4200 episode | score: 8.63 | loss: 8.79422 | epsilon: 0.01 424 | 4210 episode | score: 8.76 | loss: 7.25631 | epsilon: 0.01 425 | 4220 episode | score: 8.87 | loss: 5.62890 | epsilon: 0.01 426 | 4230 episode | score: 8.82 | loss: 8.87474 | epsilon: 0.01 427 | 4240 episode | score: 8.83 | loss: 12.82258 | 
epsilon: 0.01 428 | 4250 episode | score: 8.81 | loss: 12.82549 | epsilon: 0.01 429 | 4260 episode | score: 8.78 | loss: 12.02019 | epsilon: 0.01 430 | 4270 episode | score: 8.77 | loss: 13.64278 | epsilon: 0.01 431 | 4280 episode | score: 8.93 | loss: 8.82284 | epsilon: 0.01 432 | 4290 episode | score: 8.90 | loss: 8.86403 | epsilon: 0.01 433 | 4300 episode | score: 8.83 | loss: 9.78740 | epsilon: 0.01 434 | 4310 episode | score: 8.75 | loss: 15.24880 | epsilon: 0.01 435 | 4320 episode | score: 8.73 | loss: 13.59393 | epsilon: 0.01 436 | 4330 episode | score: 8.68 | loss: 13.73093 | epsilon: 0.01 437 | 4340 episode | score: 8.69 | loss: 11.38526 | epsilon: 0.01 438 | 4350 episode | score: 8.63 | loss: 12.88845 | epsilon: 0.01 439 | 4360 episode | score: 8.63 | loss: 15.29476 | epsilon: 0.01 440 | 4370 episode | score: 8.61 | loss: 9.62656 | epsilon: 0.01 441 | 4380 episode | score: 8.61 | loss: 11.30112 | epsilon: 0.01 442 | 4390 episode | score: 8.59 | loss: 9.64459 | epsilon: 0.01 443 | 4400 episode | score: 8.59 | loss: 11.23779 | epsilon: 0.01 444 | 4410 episode | score: 8.65 | loss: 9.63661 | epsilon: 0.01 445 | 4420 episode | score: 8.57 | loss: 12.04601 | epsilon: 0.01 446 | 4430 episode | score: 8.59 | loss: 12.01326 | epsilon: 0.01 447 | 4440 episode | score: 8.68 | loss: 10.37475 | epsilon: 0.01 448 | 4450 episode | score: 8.69 | loss: 15.99613 | epsilon: 0.01 449 | 4460 episode | score: 8.64 | loss: 11.96889 | epsilon: 0.01 450 | 4470 episode | score: 8.65 | loss: 12.04646 | epsilon: 0.01 451 | 4480 episode | score: 8.63 | loss: 8.90685 | epsilon: 0.01 452 | 4490 episode | score: 8.62 | loss: 10.39016 | epsilon: 0.01 453 | 4500 episode | score: 8.60 | loss: 11.22810 | epsilon: 0.01 454 | 4510 episode | score: 8.56 | loss: 11.23388 | epsilon: 0.01 455 | 4520 episode | score: 8.63 | loss: 12.09711 | epsilon: 0.01 456 | 4530 episode | score: 8.64 | loss: 6.43348 | epsilon: 0.01 457 | 4540 episode | score: 8.72 | loss: 11.20046 | epsilon: 0.01 458 | 4550 
episode | score: 8.70 | loss: 10.46217 | epsilon: 0.01 459 | 4560 episode | score: 8.74 | loss: 10.44539 | epsilon: 0.01 460 | 4570 episode | score: 8.74 | loss: 8.07104 | epsilon: 0.01 461 | 4580 episode | score: 8.78 | loss: 14.40675 | epsilon: 0.01 462 | 4590 episode | score: 8.84 | loss: 8.82934 | epsilon: 0.01 463 | 4600 episode | score: 8.92 | loss: 11.25667 | epsilon: 0.01 464 | 4610 episode | score: 8.89 | loss: 12.83683 | epsilon: 0.01 465 | 4620 episode | score: 8.83 | loss: 8.06613 | epsilon: 0.01 466 | 4630 episode | score: 8.78 | loss: 14.43854 | epsilon: 0.01 467 | 4640 episode | score: 8.75 | loss: 12.05898 | epsilon: 0.01 468 | 4650 episode | score: 8.78 | loss: 10.55615 | epsilon: 0.01 469 | 4660 episode | score: 8.78 | loss: 9.73167 | epsilon: 0.01 470 | 4670 episode | score: 8.78 | loss: 9.68056 | epsilon: 0.01 471 | 4680 episode | score: 9.01 | loss: 11.23431 | epsilon: 0.01 472 | 4690 episode | score: 9.05 | loss: 11.26516 | epsilon: 0.01 473 | 4700 episode | score: 9.01 | loss: 12.04785 | epsilon: 0.01 474 | 4710 episode | score: 9.06 | loss: 12.07072 | epsilon: 0.01 475 | 4720 episode | score: 9.03 | loss: 9.62078 | epsilon: 0.01 476 | 4730 episode | score: 8.99 | loss: 8.94808 | epsilon: 0.01 477 | 4740 episode | score: 8.94 | loss: 14.43412 | epsilon: 0.01 478 | 4750 episode | score: 8.92 | loss: 6.44148 | epsilon: 0.01 479 | 4760 episode | score: 8.99 | loss: 8.02015 | epsilon: 0.01 480 | 4770 episode | score: 8.93 | loss: 6.42963 | epsilon: 0.01 481 | 4780 episode | score: 8.96 | loss: 9.68000 | epsilon: 0.01 482 | 4790 episode | score: 8.98 | loss: 7.21564 | epsilon: 0.01 483 | 4800 episode | score: 8.94 | loss: 10.49922 | epsilon: 0.01 484 | 4810 episode | score: 8.86 | loss: 12.90248 | epsilon: 0.01 485 | 4820 episode | score: 8.82 | loss: 12.01941 | epsilon: 0.01 486 | 4830 episode | score: 8.78 | loss: 7.24218 | epsilon: 0.01 487 | 4840 episode | score: 8.78 | loss: 7.23795 | epsilon: 0.01 488 | 4850 episode | score: 8.75 | loss: 
12.03610 | epsilon: 0.01 489 | 4860 episode | score: 8.76 | loss: 11.21964 | epsilon: 0.01 490 | 4870 episode | score: 8.69 | loss: 16.00153 | epsilon: 0.01 491 | 4880 episode | score: 8.79 | loss: 12.86883 | epsilon: 0.01 492 | 4890 episode | score: 8.73 | loss: 14.43297 | epsilon: 0.01 493 | 4900 episode | score: 8.71 | loss: 12.91766 | epsilon: 0.01 494 | 4910 episode | score: 8.67 | loss: 11.28181 | epsilon: 0.01 495 | 4920 episode | score: 8.65 | loss: 12.87584 | epsilon: 0.01 496 | 4930 episode | score: 8.64 | loss: 9.59836 | epsilon: 0.01 497 | 4940 episode | score: 8.62 | loss: 16.85092 | epsilon: 0.01 498 | 4950 episode | score: 8.69 | loss: 12.79394 | epsilon: 0.01 499 | 4960 episode | score: 8.70 | loss: 10.47979 | epsilon: 0.01 500 | 4970 episode | score: 8.78 | loss: 10.40397 | epsilon: 0.01 501 | 4980 episode | score: 8.80 | loss: 15.97149 | epsilon: 0.01 502 | 4990 episode | score: 8.76 | loss: 11.32307 | epsilon: 0.01 503 | -------------------------------------------------------------------------------- /out/trace_DTQN_7.txt: -------------------------------------------------------------------------------- 1 | state size: 2 2 | action size: 2 3 | 0 episode | score: 35.00 | loss: 0.00000 | epsilon: 1.00 4 | 10 episode | score: 33.26 | loss: 0.00000 | epsilon: 1.00 5 | 20 episode | score: 31.47 | loss: 0.00000 | epsilon: 1.00 6 | 30 episode | score: 30.19 | loss: 0.00000 | epsilon: 1.00 7 | 40 episode | score: 29.01 | loss: 0.00000 | epsilon: 1.00 8 | 50 episode | score: 28.62 | loss: 0.43852 | epsilon: 1.00 9 | 60 episode | score: 27.80 | loss: 0.11433 | epsilon: 0.99 10 | 70 episode | score: 26.98 | loss: 0.19428 | epsilon: 0.98 11 | 80 episode | score: 26.27 | loss: 0.14741 | epsilon: 0.97 12 | 90 episode | score: 25.43 | loss: 0.33766 | epsilon: 0.96 13 | 100 episode | score: 24.73 | loss: 1.15387 | epsilon: 0.95 14 | 110 episode | score: 24.66 | loss: 0.26772 | epsilon: 0.94 15 | 120 episode | score: 24.23 | loss: 0.42548 | epsilon: 0.93 16 | 130 
episode | score: 23.77 | loss: 0.44916 | epsilon: 0.92 17 | 140 episode | score: 23.46 | loss: 0.78435 | epsilon: 0.91 18 | 150 episode | score: 23.00 | loss: 0.53330 | epsilon: 0.90 19 | 160 episode | score: 23.18 | loss: 0.24204 | epsilon: 0.88 20 | 170 episode | score: 22.86 | loss: 1.22462 | epsilon: 0.87 21 | 180 episode | score: 22.61 | loss: 0.45706 | epsilon: 0.86 22 | 190 episode | score: 22.25 | loss: 0.70294 | epsilon: 0.85 23 | 200 episode | score: 21.77 | loss: 0.51427 | epsilon: 0.84 24 | 210 episode | score: 21.73 | loss: 0.53417 | epsilon: 0.83 25 | 220 episode | score: 21.82 | loss: 1.43799 | epsilon: 0.82 26 | 230 episode | score: 21.69 | loss: 0.90087 | epsilon: 0.81 27 | 240 episode | score: 21.73 | loss: 0.87772 | epsilon: 0.80 28 | 250 episode | score: 21.27 | loss: 0.33676 | epsilon: 0.79 29 | 260 episode | score: 20.94 | loss: 0.46457 | epsilon: 0.78 30 | 270 episode | score: 21.20 | loss: 0.96972 | epsilon: 0.77 31 | 280 episode | score: 21.32 | loss: 1.31934 | epsilon: 0.76 32 | 290 episode | score: 21.42 | loss: 0.40225 | epsilon: 0.74 33 | 300 episode | score: 21.46 | loss: 1.74762 | epsilon: 0.73 34 | 310 episode | score: 21.15 | loss: 0.75224 | epsilon: 0.72 35 | 320 episode | score: 20.52 | loss: 0.96829 | epsilon: 0.71 36 | 330 episode | score: 20.51 | loss: 0.85872 | epsilon: 0.70 37 | 340 episode | score: 19.86 | loss: 1.52210 | epsilon: 0.70 38 | 350 episode | score: 19.73 | loss: 0.79390 | epsilon: 0.69 39 | 360 episode | score: 20.06 | loss: 2.06629 | epsilon: 0.67 40 | 370 episode | score: 19.69 | loss: 2.02081 | epsilon: 0.67 41 | 380 episode | score: 19.52 | loss: 0.84718 | epsilon: 0.66 42 | 390 episode | score: 19.29 | loss: 2.10416 | epsilon: 0.65 43 | 400 episode | score: 19.11 | loss: 3.44531 | epsilon: 0.64 44 | 410 episode | score: 19.15 | loss: 2.24009 | epsilon: 0.63 45 | 420 episode | score: 19.00 | loss: 1.08297 | epsilon: 0.62 46 | 430 episode | score: 18.72 | loss: 0.46196 | epsilon: 0.61 47 | 440 episode | 
score: 18.33 | loss: 1.37073 | epsilon: 0.60 48 | 450 episode | score: 18.07 | loss: 2.04425 | epsilon: 0.59 49 | 460 episode | score: 17.45 | loss: 3.26727 | epsilon: 0.59 50 | 470 episode | score: 16.91 | loss: 0.94458 | epsilon: 0.58 51 | 480 episode | score: 16.62 | loss: 1.90392 | epsilon: 0.57 52 | 490 episode | score: 16.39 | loss: 3.33472 | epsilon: 0.57 53 | 500 episode | score: 16.01 | loss: 2.28937 | epsilon: 0.56 54 | 510 episode | score: 15.87 | loss: 3.86214 | epsilon: 0.55 55 | 520 episode | score: 16.01 | loss: 3.00679 | epsilon: 0.54 56 | 530 episode | score: 15.89 | loss: 2.45940 | epsilon: 0.54 57 | 540 episode | score: 15.65 | loss: 2.52061 | epsilon: 0.53 58 | 550 episode | score: 15.56 | loss: 0.55708 | epsilon: 0.52 59 | 560 episode | score: 15.83 | loss: 1.52446 | epsilon: 0.51 60 | 570 episode | score: 15.89 | loss: 4.10431 | epsilon: 0.50 61 | 580 episode | score: 15.67 | loss: 2.77351 | epsilon: 0.49 62 | 590 episode | score: 15.30 | loss: 2.61357 | epsilon: 0.49 63 | 600 episode | score: 15.50 | loss: 2.11396 | epsilon: 0.48 64 | 610 episode | score: 15.22 | loss: 2.12927 | epsilon: 0.47 65 | 620 episode | score: 14.91 | loss: 3.18743 | epsilon: 0.47 66 | 630 episode | score: 14.83 | loss: 2.16025 | epsilon: 0.46 67 | 640 episode | score: 14.46 | loss: 3.78357 | epsilon: 0.45 68 | 650 episode | score: 14.21 | loss: 4.93032 | epsilon: 0.45 69 | 660 episode | score: 14.06 | loss: 2.79694 | epsilon: 0.44 70 | 670 episode | score: 13.77 | loss: 3.85071 | epsilon: 0.43 71 | 680 episode | score: 13.88 | loss: 3.61345 | epsilon: 0.42 72 | 690 episode | score: 13.87 | loss: 3.37426 | epsilon: 0.42 73 | 700 episode | score: 13.58 | loss: 4.50640 | epsilon: 0.41 74 | 710 episode | score: 13.27 | loss: 2.85008 | epsilon: 0.41 75 | 720 episode | score: 13.29 | loss: 3.97639 | epsilon: 0.40 76 | 730 episode | score: 13.00 | loss: 2.30354 | epsilon: 0.39 77 | 740 episode | score: 12.89 | loss: 3.43699 | epsilon: 0.39 78 | 750 episode | score: 12.67 | 
loss: 6.29570 | epsilon: 0.38 79 | 760 episode | score: 12.72 | loss: 5.19002 | epsilon: 0.37 80 | 770 episode | score: 12.44 | loss: 2.41706 | epsilon: 0.37 81 | 780 episode | score: 12.31 | loss: 5.80734 | epsilon: 0.36 82 | 790 episode | score: 12.13 | loss: 7.02250 | epsilon: 0.36 83 | 800 episode | score: 11.95 | loss: 2.98520 | epsilon: 0.35 84 | 810 episode | score: 11.85 | loss: 4.71853 | epsilon: 0.35 85 | 820 episode | score: 11.66 | loss: 6.04004 | epsilon: 0.34 86 | 830 episode | score: 11.56 | loss: 7.74889 | epsilon: 0.33 87 | 840 episode | score: 11.71 | loss: 6.60510 | epsilon: 0.33 88 | 850 episode | score: 11.60 | loss: 3.04761 | epsilon: 0.32 89 | 860 episode | score: 11.72 | loss: 1.87209 | epsilon: 0.31 90 | 870 episode | score: 11.69 | loss: 6.05593 | epsilon: 0.31 91 | 880 episode | score: 11.53 | loss: 2.53926 | epsilon: 0.30 92 | 890 episode | score: 11.43 | loss: 5.47966 | epsilon: 0.30 93 | 900 episode | score: 11.71 | loss: 3.68325 | epsilon: 0.29 94 | 910 episode | score: 11.68 | loss: 3.10708 | epsilon: 0.28 95 | 920 episode | score: 12.07 | loss: 4.34701 | epsilon: 0.27 96 | 930 episode | score: 12.16 | loss: 6.20620 | epsilon: 0.27 97 | 940 episode | score: 12.13 | loss: 4.34877 | epsilon: 0.26 98 | 950 episode | score: 11.88 | loss: 6.85048 | epsilon: 0.26 99 | 960 episode | score: 11.64 | loss: 4.99166 | epsilon: 0.25 100 | 970 episode | score: 11.52 | loss: 4.50395 | epsilon: 0.24 101 | 980 episode | score: 13.71 | loss: 3.18380 | epsilon: 0.23 102 | 990 episode | score: 13.61 | loss: 3.20212 | epsilon: 0.22 103 | 1000 episode | score: 13.54 | loss: 3.86552 | epsilon: 0.21 104 | 1010 episode | score: 13.33 | loss: 3.24438 | epsilon: 0.21 105 | 1020 episode | score: 13.63 | loss: 2.62725 | epsilon: 0.20 106 | 1030 episode | score: 13.32 | loss: 3.26358 | epsilon: 0.19 107 | 1040 episode | score: 14.30 | loss: 3.29924 | epsilon: 0.18 108 | 1050 episode | score: 14.70 | loss: 3.92763 | epsilon: 0.17 109 | 1060 episode | score: 14.19 
| loss: 3.28930 | epsilon: 0.17 110 | 1070 episode | score: 14.18 | loss: 4.59915 | epsilon: 0.16 111 | 1080 episode | score: 14.07 | loss: 5.24100 | epsilon: 0.15 112 | 1090 episode | score: 13.64 | loss: 6.55998 | epsilon: 0.15 113 | 1100 episode | score: 13.51 | loss: 0.09130 | epsilon: 0.14 114 | 1110 episode | score: 13.08 | loss: 6.60836 | epsilon: 0.13 115 | 1120 episode | score: 12.69 | loss: 3.33006 | epsilon: 0.13 116 | 1130 episode | score: 12.30 | loss: 6.59902 | epsilon: 0.12 117 | 1140 episode | score: 12.53 | loss: 5.47107 | epsilon: 0.12 118 | 1150 episode | score: 12.22 | loss: 4.05350 | epsilon: 0.11 119 | 1160 episode | score: 11.94 | loss: 9.41852 | epsilon: 0.11 120 | 1170 episode | score: 11.62 | loss: 6.70508 | epsilon: 0.10 121 | 1180 episode | score: 11.42 | loss: 4.86226 | epsilon: 0.10 122 | 1190 episode | score: 11.18 | loss: 5.39657 | epsilon: 0.09 123 | 1200 episode | score: 11.19 | loss: 6.11098 | epsilon: 0.08 124 | 1210 episode | score: 10.96 | loss: 7.41067 | epsilon: 0.08 125 | 1220 episode | score: 10.74 | loss: 8.72433 | epsilon: 0.08 126 | 1230 episode | score: 10.65 | loss: 6.14560 | epsilon: 0.07 127 | 1240 episode | score: 10.44 | loss: 8.21484 | epsilon: 0.06 128 | 1250 episode | score: 10.30 | loss: 8.88006 | epsilon: 0.06 129 | 1260 episode | score: 10.19 | loss: 8.21490 | epsilon: 0.05 130 | 1270 episode | score: 10.03 | loss: 6.78065 | epsilon: 0.05 131 | 1280 episode | score: 9.92 | loss: 10.19846 | epsilon: 0.05 132 | 1290 episode | score: 9.88 | loss: 10.88771 | epsilon: 0.04 133 | 1300 episode | score: 9.80 | loss: 6.82685 | epsilon: 0.03 134 | 1310 episode | score: 9.65 | loss: 10.89788 | epsilon: 0.03 135 | 1320 episode | score: 9.55 | loss: 4.12844 | epsilon: 0.03 136 | 1330 episode | score: 9.46 | loss: 10.29858 | epsilon: 0.02 137 | 1340 episode | score: 9.44 | loss: 8.25109 | epsilon: 0.02 138 | 1350 episode | score: 9.38 | loss: 7.56875 | epsilon: 0.01 139 | 1360 episode | score: 9.31 | loss: 11.00283 | 
epsilon: 0.01 140 | 1370 episode | score: 9.38 | loss: 10.34631 | epsilon: 0.01 141 | 1380 episode | score: 9.26 | loss: 11.74283 | epsilon: 0.01 142 | 1390 episode | score: 9.23 | loss: 9.70764 | epsilon: 0.01 143 | 1400 episode | score: 9.18 | loss: 11.77427 | epsilon: 0.01 144 | 1410 episode | score: 9.15 | loss: 11.11504 | epsilon: 0.01 145 | 1420 episode | score: 9.04 | loss: 11.21211 | epsilon: 0.01 146 | 1430 episode | score: 8.95 | loss: 12.53273 | epsilon: 0.01 147 | 1440 episode | score: 8.91 | loss: 7.00767 | epsilon: 0.01 148 | 1450 episode | score: 9.12 | loss: 10.50389 | epsilon: 0.01 149 | 1460 episode | score: 9.36 | loss: 11.23039 | epsilon: 0.01 150 | 1470 episode | score: 9.24 | loss: 7.05162 | epsilon: 0.01 151 | 1480 episode | score: 9.23 | loss: 7.01815 | epsilon: 0.01 152 | 1490 episode | score: 9.19 | loss: 9.16645 | epsilon: 0.01 153 | 1500 episode | score: 9.10 | loss: 9.85660 | epsilon: 0.01 154 | 1510 episode | score: 9.05 | loss: 9.85440 | epsilon: 0.01 155 | 1520 episode | score: 9.14 | loss: 9.93323 | epsilon: 0.01 156 | 1530 episode | score: 9.06 | loss: 9.91487 | epsilon: 0.01 157 | 1540 episode | score: 9.04 | loss: 6.37352 | epsilon: 0.01 158 | 1550 episode | score: 8.95 | loss: 6.42716 | epsilon: 0.01 159 | 1560 episode | score: 9.02 | loss: 12.11458 | epsilon: 0.01 160 | 1570 episode | score: 8.95 | loss: 9.95084 | epsilon: 0.01 161 | 1580 episode | score: 9.03 | loss: 7.11366 | epsilon: 0.01 162 | 1590 episode | score: 8.96 | loss: 5.71430 | epsilon: 0.01 163 | 1600 episode | score: 9.02 | loss: 9.28524 | epsilon: 0.01 164 | 1610 episode | score: 9.12 | loss: 9.29310 | epsilon: 0.01 165 | 1620 episode | score: 9.03 | loss: 7.89838 | epsilon: 0.01 166 | 1630 episode | score: 8.98 | loss: 9.31618 | epsilon: 0.01 167 | 1640 episode | score: 8.95 | loss: 10.05182 | epsilon: 0.01 168 | 1650 episode | score: 9.48 | loss: 7.97580 | epsilon: 0.01 169 | 1660 episode | score: 9.59 | loss: 10.10497 | epsilon: 0.01 170 | 1670 episode | 
score: 9.66 | loss: 12.99237 | epsilon: 0.01 171 | 1680 episode | score: 9.73 | loss: 5.83085 | epsilon: 0.01 172 | 1690 episode | score: 9.64 | loss: 8.68751 | epsilon: 0.01 173 | 1700 episode | score: 9.64 | loss: 7.96195 | epsilon: 0.01 174 | 1710 episode | score: 9.54 | loss: 12.32486 | epsilon: 0.01 175 | 1720 episode | score: 9.45 | loss: 7.23098 | epsilon: 0.01 176 | 1730 episode | score: 9.49 | loss: 7.36754 | epsilon: 0.01 177 | 1740 episode | score: 9.42 | loss: 7.25744 | epsilon: 0.01 178 | 1750 episode | score: 9.36 | loss: 6.53222 | epsilon: 0.01 179 | 1760 episode | score: 9.31 | loss: 10.92942 | epsilon: 0.01 180 | 1770 episode | score: 9.29 | loss: 9.51754 | epsilon: 0.01 181 | 1780 episode | score: 9.39 | loss: 10.25551 | epsilon: 0.01 182 | 1790 episode | score: 9.25 | loss: 12.39059 | epsilon: 0.01 183 | 1800 episode | score: 9.21 | loss: 7.43139 | epsilon: 0.01 184 | 1810 episode | score: 9.13 | loss: 8.77167 | epsilon: 0.01 185 | 1820 episode | score: 9.06 | loss: 10.92920 | epsilon: 0.01 186 | 1830 episode | score: 9.01 | loss: 7.37547 | epsilon: 0.01 187 | 1840 episode | score: 8.96 | loss: 9.52767 | epsilon: 0.01 188 | 1850 episode | score: 8.92 | loss: 13.15261 | epsilon: 0.01 189 | 1860 episode | score: 8.84 | loss: 6.61016 | epsilon: 0.01 190 | 1870 episode | score: 8.81 | loss: 11.70613 | epsilon: 0.01 191 | 1880 episode | score: 8.78 | loss: 11.75186 | epsilon: 0.01 192 | 1890 episode | score: 8.71 | loss: 11.78291 | epsilon: 0.01 193 | 1900 episode | score: 8.69 | loss: 11.03975 | epsilon: 0.01 194 | 1910 episode | score: 8.69 | loss: 11.00511 | epsilon: 0.01 195 | 1920 episode | score: 8.79 | loss: 9.61245 | epsilon: 0.01 196 | 1930 episode | score: 8.76 | loss: 8.90461 | epsilon: 0.01 197 | 1940 episode | score: 8.70 | loss: 11.79481 | epsilon: 0.01 198 | 1950 episode | score: 8.82 | loss: 10.33012 | epsilon: 0.01 199 | 1960 episode | score: 8.74 | loss: 12.54107 | epsilon: 0.01 200 | 1970 episode | score: 8.67 | loss: 10.37337 | 
epsilon: 0.01 201 | 1980 episode | score: 8.65 | loss: 14.00276 | epsilon: 0.01 202 | 1990 episode | score: 8.65 | loss: 16.27757 | epsilon: 0.01 203 | 2000 episode | score: 8.64 | loss: 11.12934 | epsilon: 0.01 204 | 2010 episode | score: 8.56 | loss: 12.59322 | epsilon: 0.01 205 | 2020 episode | score: 8.57 | loss: 8.91485 | epsilon: 0.01 206 | 2030 episode | score: 8.57 | loss: 13.42655 | epsilon: 0.01 207 | 2040 episode | score: 8.57 | loss: 9.68667 | epsilon: 0.01 208 | 2050 episode | score: 8.56 | loss: 13.31827 | epsilon: 0.01 209 | 2060 episode | score: 8.67 | loss: 8.91400 | epsilon: 0.01 210 | 2070 episode | score: 8.63 | loss: 10.44289 | epsilon: 0.01 211 | 2080 episode | score: 8.59 | loss: 13.33609 | epsilon: 0.01 212 | 2090 episode | score: 8.71 | loss: 13.34085 | epsilon: 0.01 213 | 2100 episode | score: 8.69 | loss: 13.33628 | epsilon: 0.01 214 | 2110 episode | score: 8.68 | loss: 11.17124 | epsilon: 0.01 215 | 2120 episode | score: 8.63 | loss: 12.62965 | epsilon: 0.01 216 | 2130 episode | score: 8.62 | loss: 14.87161 | epsilon: 0.01 217 | 2140 episode | score: 8.60 | loss: 11.95751 | epsilon: 0.01 218 | 2150 episode | score: 8.67 | loss: 14.16819 | epsilon: 0.01 219 | 2160 episode | score: 8.81 | loss: 12.69814 | epsilon: 0.01 220 | 2170 episode | score: 8.98 | loss: 9.73720 | epsilon: 0.01 221 | 2180 episode | score: 8.93 | loss: 12.00353 | epsilon: 0.01 222 | 2190 episode | score: 8.85 | loss: 9.09578 | epsilon: 0.01 223 | 2200 episode | score: 8.83 | loss: 10.47638 | epsilon: 0.01 224 | 2210 episode | score: 8.84 | loss: 6.82982 | epsilon: 0.01 225 | 2220 episode | score: 8.79 | loss: 12.02168 | epsilon: 0.01 226 | 2230 episode | score: 8.76 | loss: 10.52687 | epsilon: 0.01 227 | 2240 episode | score: 8.72 | loss: 8.28426 | epsilon: 0.01 228 | 2250 episode | score: 8.69 | loss: 9.21002 | epsilon: 0.01 229 | 2260 episode | score: 8.70 | loss: 10.57914 | epsilon: 0.01 230 | 2270 episode | score: 8.63 | loss: 9.84609 | epsilon: 0.01 231 | 2280 
episode | score: 8.62 | loss: 10.56394 | epsilon: 0.01 232 | 2290 episode | score: 8.61 | loss: 9.90382 | epsilon: 0.01 233 | 2300 episode | score: 8.60 | loss: 9.81931 | epsilon: 0.01 234 | 2310 episode | score: 8.63 | loss: 12.82243 | epsilon: 0.01 235 | 2320 episode | score: 8.61 | loss: 10.57878 | epsilon: 0.01 236 | 2330 episode | score: 8.58 | loss: 9.15038 | epsilon: 0.01 237 | 2340 episode | score: 8.53 | loss: 8.33555 | epsilon: 0.01 238 | 2350 episode | score: 8.49 | loss: 15.08654 | epsilon: 0.01 239 | 2360 episode | score: 8.50 | loss: 9.81196 | epsilon: 0.01 240 | 2370 episode | score: 8.43 | loss: 13.60335 | epsilon: 0.01 241 | 2380 episode | score: 8.48 | loss: 11.37627 | epsilon: 0.01 242 | 2390 episode | score: 8.55 | loss: 14.38280 | epsilon: 0.01 243 | 2400 episode | score: 8.59 | loss: 11.38187 | epsilon: 0.01 244 | 2410 episode | score: 8.58 | loss: 9.87547 | epsilon: 0.01 245 | 2420 episode | score: 8.57 | loss: 8.44234 | epsilon: 0.01 246 | 2430 episode | score: 8.58 | loss: 12.23629 | epsilon: 0.01 247 | 2440 episode | score: 8.53 | loss: 7.65077 | epsilon: 0.01 248 | 2450 episode | score: 8.58 | loss: 13.13559 | epsilon: 0.01 249 | 2460 episode | score: 8.56 | loss: 10.70614 | epsilon: 0.01 250 | 2470 episode | score: 8.62 | loss: 9.23430 | epsilon: 0.01 251 | 2480 episode | score: 8.59 | loss: 7.70159 | epsilon: 0.01 252 | 2490 episode | score: 8.60 | loss: 12.97268 | epsilon: 0.01 253 | 2500 episode | score: 9.08 | loss: 6.14232 | epsilon: 0.01 254 | 2510 episode | score: 9.05 | loss: 9.98390 | epsilon: 0.01 255 | 2520 episode | score: 9.06 | loss: 7.68319 | epsilon: 0.01 256 | 2530 episode | score: 9.04 | loss: 7.66394 | epsilon: 0.01 257 | 2540 episode | score: 9.04 | loss: 8.42431 | epsilon: 0.01 258 | 2550 episode | score: 9.05 | loss: 10.74949 | epsilon: 0.01 259 | 2560 episode | score: 9.03 | loss: 10.00972 | epsilon: 0.01 260 | 2570 episode | score: 9.05 | loss: 10.00111 | epsilon: 0.01 261 | 2580 episode | score: 9.03 | loss: 
8.57820 | epsilon: 0.01 262 | 2590 episode | score: 9.00 | loss: 9.22158 | epsilon: 0.01 263 | 2600 episode | score: 9.08 | loss: 11.53948 | epsilon: 0.01 264 | 2610 episode | score: 9.01 | loss: 8.47736 | epsilon: 0.01 265 | 2620 episode | score: 8.99 | loss: 13.89917 | epsilon: 0.01 266 | 2630 episode | score: 8.94 | loss: 10.78089 | epsilon: 0.01 267 | 2640 episode | score: 8.94 | loss: 9.23720 | epsilon: 0.01 268 | 2650 episode | score: 9.09 | loss: 13.82086 | epsilon: 0.01 269 | 2660 episode | score: 9.17 | loss: 10.81388 | epsilon: 0.01 270 | 2670 episode | score: 9.07 | loss: 7.08669 | epsilon: 0.01 271 | 2680 episode | score: 8.99 | loss: 8.57375 | epsilon: 0.01 272 | 2690 episode | score: 9.00 | loss: 7.73258 | epsilon: 0.01 273 | 2700 episode | score: 8.95 | loss: 7.82625 | epsilon: 0.01 274 | 2710 episode | score: 8.91 | loss: 10.05326 | epsilon: 0.01 275 | 2720 episode | score: 8.84 | loss: 7.74642 | epsilon: 0.01 276 | 2730 episode | score: 8.80 | loss: 13.19881 | epsilon: 0.01 277 | 2740 episode | score: 8.82 | loss: 10.07025 | epsilon: 0.01 278 | 2750 episode | score: 8.89 | loss: 12.45670 | epsilon: 0.01 279 | 2760 episode | score: 8.85 | loss: 11.64434 | epsilon: 0.01 280 | 2770 episode | score: 8.76 | loss: 7.01363 | epsilon: 0.01 281 | 2780 episode | score: 8.79 | loss: 13.19227 | epsilon: 0.01 282 | 2790 episode | score: 8.75 | loss: 14.75411 | epsilon: 0.01 283 | 2800 episode | score: 8.70 | loss: 9.51022 | epsilon: 0.01 284 | 2810 episode | score: 8.72 | loss: 13.20181 | epsilon: 0.01 285 | 2820 episode | score: 9.11 | loss: 10.83381 | epsilon: 0.01 286 | 2830 episode | score: 9.16 | loss: 9.31269 | epsilon: 0.01 287 | 2840 episode | score: 9.09 | loss: 7.04241 | epsilon: 0.01 288 | 2850 episode | score: 9.12 | loss: 8.60452 | epsilon: 0.01 289 | 2860 episode | score: 9.02 | loss: 10.94150 | epsilon: 0.01 290 | 2870 episode | score: 8.93 | loss: 12.45783 | epsilon: 0.01 291 | 2880 episode | score: 9.00 | loss: 11.71508 | epsilon: 0.01 292 | 
2890 episode | score: 8.98 | loss: 8.59419 | epsilon: 0.01 293 | 2900 episode | score: 8.92 | loss: 8.64907 | epsilon: 0.01 294 | 2910 episode | score: 8.90 | loss: 10.14799 | epsilon: 0.01 295 | 2920 episode | score: 8.87 | loss: 13.24641 | epsilon: 0.01 296 | 2930 episode | score: 8.85 | loss: 11.01173 | epsilon: 0.01 297 | 2940 episode | score: 8.82 | loss: 10.20390 | epsilon: 0.01 298 | 2950 episode | score: 8.78 | loss: 13.23380 | epsilon: 0.01 299 | 2960 episode | score: 8.72 | loss: 10.24518 | epsilon: 0.01 300 | 2970 episode | score: 8.69 | loss: 10.95838 | epsilon: 0.01 301 | 2980 episode | score: 8.74 | loss: 7.84115 | epsilon: 0.01 302 | 2990 episode | score: 8.75 | loss: 11.73325 | epsilon: 0.01 303 | 3000 episode | score: 8.74 | loss: 11.66862 | epsilon: 0.01 304 | 3010 episode | score: 8.69 | loss: 7.82610 | epsilon: 0.01 305 | 3020 episode | score: 8.66 | loss: 10.22930 | epsilon: 0.01 306 | 3030 episode | score: 8.76 | loss: 7.79331 | epsilon: 0.01 307 | 3040 episode | score: 8.82 | loss: 7.90085 | epsilon: 0.01 308 | 3050 episode | score: 8.81 | loss: 13.34812 | epsilon: 0.01 309 | 3060 episode | score: 8.85 | loss: 10.22669 | epsilon: 0.01 310 | 3070 episode | score: 8.78 | loss: 6.30953 | epsilon: 0.01 311 | 3080 episode | score: 8.78 | loss: 13.25165 | epsilon: 0.01 312 | 3090 episode | score: 8.78 | loss: 8.65888 | epsilon: 0.01 313 | 3100 episode | score: 8.74 | loss: 14.08551 | epsilon: 0.01 314 | 3110 episode | score: 8.76 | loss: 7.86103 | epsilon: 0.01 315 | 3120 episode | score: 8.72 | loss: 11.01455 | epsilon: 0.01 316 | 3130 episode | score: 8.72 | loss: 14.21833 | epsilon: 0.01 317 | 3140 episode | score: 8.68 | loss: 8.61008 | epsilon: 0.01 318 | 3150 episode | score: 8.62 | loss: 13.33219 | epsilon: 0.01 319 | 3160 episode | score: 8.76 | loss: 4.78636 | epsilon: 0.01 320 | 3170 episode | score: 8.99 | loss: 7.05520 | epsilon: 0.01 321 | 3180 episode | score: 8.96 | loss: 11.86772 | epsilon: 0.01 322 | 3190 episode | score: 8.98 | 
loss: 10.17817 | epsilon: 0.01 323 | 3200 episode | score: 8.98 | loss: 12.53637 | epsilon: 0.01 324 | 3210 episode | score: 8.91 | loss: 10.12809 | epsilon: 0.01 325 | 3220 episode | score: 8.88 | loss: 8.65906 | epsilon: 0.01 326 | 3230 episode | score: 8.98 | loss: 10.20646 | epsilon: 0.01 327 | 3240 episode | score: 8.91 | loss: 10.96940 | epsilon: 0.01 328 | 3250 episode | score: 8.87 | loss: 10.18598 | epsilon: 0.01 329 | 3260 episode | score: 8.83 | loss: 8.61975 | epsilon: 0.01 330 | 3270 episode | score: 8.78 | loss: 10.19128 | epsilon: 0.01 331 | 3280 episode | score: 8.73 | loss: 8.62359 | epsilon: 0.01 332 | 3290 episode | score: 8.71 | loss: 11.02425 | epsilon: 0.01 333 | 3300 episode | score: 8.68 | loss: 14.13260 | epsilon: 0.01 334 | 3310 episode | score: 8.62 | loss: 9.41409 | epsilon: 0.01 335 | 3320 episode | score: 8.62 | loss: 12.54254 | epsilon: 0.01 336 | 3330 episode | score: 8.60 | loss: 11.82084 | epsilon: 0.01 337 | 3340 episode | score: 8.61 | loss: 10.22156 | epsilon: 0.01 338 | 3350 episode | score: 8.61 | loss: 11.80009 | epsilon: 0.01 339 | 3360 episode | score: 8.67 | loss: 11.07604 | epsilon: 0.01 340 | 3370 episode | score: 8.66 | loss: 11.78597 | epsilon: 0.01 341 | 3380 episode | score: 8.77 | loss: 11.01300 | epsilon: 0.01 342 | 3390 episode | score: 8.96 | loss: 10.25491 | epsilon: 0.01 343 | 3400 episode | score: 8.94 | loss: 11.80630 | epsilon: 0.01 344 | 3410 episode | score: 9.23 | loss: 13.34089 | epsilon: 0.01 345 | 3420 episode | score: 9.64 | loss: 7.89139 | epsilon: 0.01 346 | 3430 episode | score: 9.66 | loss: 8.69850 | epsilon: 0.01 347 | 3440 episode | score: 9.52 | loss: 9.43483 | epsilon: 0.01 348 | 3450 episode | score: 9.39 | loss: 10.31454 | epsilon: 0.01 349 | 3460 episode | score: 9.32 | loss: 10.27419 | epsilon: 0.01 350 | 3470 episode | score: 9.21 | loss: 6.42270 | epsilon: 0.01 351 | 3480 episode | score: 9.27 | loss: 5.52721 | epsilon: 0.01 352 | 3490 episode | score: 9.21 | loss: 12.69482 | epsilon: 
0.01 353 | 3500 episode | score: 9.11 | loss: 12.71606 | epsilon: 0.01 354 | 3510 episode | score: 9.60 | loss: 11.90306 | epsilon: 0.01 355 | 3520 episode | score: 9.49 | loss: 15.78254 | epsilon: 0.01 356 | 3530 episode | score: 9.42 | loss: 9.48232 | epsilon: 0.01 357 | 3540 episode | score: 9.32 | loss: 9.52225 | epsilon: 0.01 358 | 3550 episode | score: 9.22 | loss: 14.21484 | epsilon: 0.01 359 | 3560 episode | score: 9.14 | loss: 11.09772 | epsilon: 0.01 360 | 3570 episode | score: 9.17 | loss: 8.73580 | epsilon: 0.01 361 | 3580 episode | score: 9.08 | loss: 9.49013 | epsilon: 0.01 362 | 3590 episode | score: 8.98 | loss: 11.07365 | epsilon: 0.01 363 | 3600 episode | score: 8.90 | loss: 7.99541 | epsilon: 0.01 364 | 3610 episode | score: 8.85 | loss: 11.86866 | epsilon: 0.01 365 | 3620 episode | score: 8.82 | loss: 15.04194 | epsilon: 0.01 366 | 3630 episode | score: 8.77 | loss: 10.41370 | epsilon: 0.01 367 | 3640 episode | score: 8.74 | loss: 13.54768 | epsilon: 0.01 368 | 3650 episode | score: 8.76 | loss: 11.88636 | epsilon: 0.01 369 | 3660 episode | score: 8.97 | loss: 10.68938 | epsilon: 0.01 370 | 3670 episode | score: 8.99 | loss: 18.81310 | epsilon: 0.01 371 | 3680 episode | score: 8.99 | loss: 2.19543 | epsilon: 0.01 372 | 3690 episode | score: 9.09 | loss: 5.16908 | epsilon: 0.01 373 | 3700 episode | score: 9.38 | loss: 6.65977 | epsilon: 0.01 374 | 3710 episode | score: 9.35 | loss: 4.14644 | epsilon: 0.01 375 | 3720 episode | score: 9.46 | loss: 14.50631 | epsilon: 0.01 376 | 3730 episode | score: 9.42 | loss: 9.58960 | epsilon: 0.01 377 | 3740 episode | score: 9.85 | loss: 8.54436 | epsilon: 0.01 378 | 3750 episode | score: 10.21 | loss: 10.65839 | epsilon: 0.01 379 | 3760 episode | score: 10.45 | loss: 9.73674 | epsilon: 0.01 380 | 3770 episode | score: 10.91 | loss: 7.87997 | epsilon: 0.01 381 | 3780 episode | score: 11.16 | loss: 7.26872 | epsilon: 0.01 382 | 3790 episode | score: 11.52 | loss: 5.94254 | epsilon: 0.01 383 | 3800 episode | 
score: 11.62 | loss: 5.46625 | epsilon: 0.01 384 | 3810 episode | score: 11.71 | loss: 10.21619 | epsilon: 0.01 385 | 3820 episode | score: 11.96 | loss: 9.50844 | epsilon: 0.01 386 | 3830 episode | score: 11.91 | loss: 4.78348 | epsilon: 0.01 387 | 3840 episode | score: 12.67 | loss: 5.18083 | epsilon: 0.01 388 | 3850 episode | score: 12.43 | loss: 6.25089 | epsilon: 0.01 389 | 3860 episode | score: 13.73 | loss: 4.49952 | epsilon: 0.01 390 | 3870 episode | score: 15.31 | loss: 5.31932 | epsilon: 0.01 391 | 3880 episode | score: 14.92 | loss: 8.60513 | epsilon: 0.01 392 | 3890 episode | score: 14.59 | loss: 7.08908 | epsilon: 0.01 393 | 3900 episode | score: 14.69 | loss: 3.78458 | epsilon: 0.01 394 | 3910 episode | score: 14.54 | loss: 7.11351 | epsilon: 0.01 395 | 3920 episode | score: 15.00 | loss: 4.39285 | epsilon: 0.01 396 | 3930 episode | score: 14.80 | loss: 5.84840 | epsilon: 0.01 397 | 3940 episode | score: 14.21 | loss: 5.61221 | epsilon: 0.01 398 | 3950 episode | score: 14.41 | loss: 5.26324 | epsilon: 0.01 399 | 3960 episode | score: 14.78 | loss: 4.12885 | epsilon: 0.01 400 | 3970 episode | score: 14.46 | loss: 10.88114 | epsilon: 0.01 401 | 3980 episode | score: 13.96 | loss: 3.99816 | epsilon: 0.01 402 | 3990 episode | score: 14.11 | loss: 2.82461 | epsilon: 0.01 403 | 4000 episode | score: 14.45 | loss: 5.42744 | epsilon: 0.01 404 | 4010 episode | score: 13.89 | loss: 9.51669 | epsilon: 0.01 405 | 4020 episode | score: 14.08 | loss: 9.78605 | epsilon: 0.01 406 | 4030 episode | score: 13.66 | loss: 12.18577 | epsilon: 0.01 407 | 4040 episode | score: 13.18 | loss: 5.42101 | epsilon: 0.01 408 | 4050 episode | score: 12.86 | loss: 7.65784 | epsilon: 0.01 409 | 4060 episode | score: 12.72 | loss: 6.49174 | epsilon: 0.01 410 | 4070 episode | score: 12.51 | loss: 5.89485 | epsilon: 0.01 411 | 4080 episode | score: 13.01 | loss: 9.56154 | epsilon: 0.01 412 | 4090 episode | score: 12.53 | loss: 4.01989 | epsilon: 0.01 413 | 4100 episode | score: 12.17 | 
loss: 6.92747 | epsilon: 0.01 414 | 4110 episode | score: 12.99 | loss: 8.53241 | epsilon: 0.01 415 | 4120 episode | score: 12.85 | loss: 10.38810 | epsilon: 0.01 416 | 4130 episode | score: 13.48 | loss: 10.76646 | epsilon: 0.01 417 | 4140 episode | score: 13.00 | loss: 6.65141 | epsilon: 0.01 418 | 4150 episode | score: 12.71 | loss: 11.13456 | epsilon: 0.01 419 | 4160 episode | score: 12.36 | loss: 8.49248 | epsilon: 0.01 420 | 4170 episode | score: 13.35 | loss: 9.90515 | epsilon: 0.01 421 | 4180 episode | score: 12.97 | loss: 8.97677 | epsilon: 0.01 422 | 4190 episode | score: 12.56 | loss: 8.43617 | epsilon: 0.01 423 | 4200 episode | score: 12.14 | loss: 9.76145 | epsilon: 0.01 424 | 4210 episode | score: 13.05 | loss: 7.90451 | epsilon: 0.01 425 | 4220 episode | score: 13.12 | loss: 8.31651 | epsilon: 0.01 426 | 4230 episode | score: 12.83 | loss: 7.39902 | epsilon: 0.01 427 | 4240 episode | score: 12.49 | loss: 4.18256 | epsilon: 0.01 428 | 4250 episode | score: 12.36 | loss: 9.26329 | epsilon: 0.01 429 | 4260 episode | score: 12.03 | loss: 7.76012 | epsilon: 0.01 430 | 4270 episode | score: 12.15 | loss: 9.94203 | epsilon: 0.01 431 | 4280 episode | score: 11.87 | loss: 10.34129 | epsilon: 0.01 432 | 4290 episode | score: 11.74 | loss: 8.36905 | epsilon: 0.01 433 | 4300 episode | score: 11.44 | loss: 6.75636 | epsilon: 0.01 434 | 4310 episode | score: 11.15 | loss: 7.25203 | epsilon: 0.01 435 | 4320 episode | score: 10.90 | loss: 11.51711 | epsilon: 0.01 436 | 4330 episode | score: 11.25 | loss: 6.79873 | epsilon: 0.01 437 | 4340 episode | score: 11.18 | loss: 8.40619 | epsilon: 0.01 438 | 4350 episode | score: 11.49 | loss: 7.93133 | epsilon: 0.01 439 | 4360 episode | score: 11.51 | loss: 3.24878 | epsilon: 0.01 440 | 4370 episode | score: 11.87 | loss: 9.02317 | epsilon: 0.01 441 | 4380 episode | score: 11.68 | loss: 6.11406 | epsilon: 0.01 442 | 4390 episode | score: 11.57 | loss: 10.30894 | epsilon: 0.01 443 | 4400 episode | score: 11.47 | loss: 8.55706 
| epsilon: 0.01 444 | 4410 episode | score: 11.16 | loss: 5.48418 | epsilon: 0.01 445 | 4420 episode | score: 10.92 | loss: 7.56599 | epsilon: 0.01 446 | 4430 episode | score: 10.87 | loss: 8.38144 | epsilon: 0.01 447 | 4440 episode | score: 10.64 | loss: 7.39350 | epsilon: 0.01 448 | 4450 episode | score: 10.49 | loss: 10.57561 | epsilon: 0.01 449 | 4460 episode | score: 10.33 | loss: 9.50152 | epsilon: 0.01 450 | 4470 episode | score: 11.47 | loss: 10.41020 | epsilon: 0.01 451 | 4480 episode | score: 11.20 | loss: 6.86953 | epsilon: 0.01 452 | 4490 episode | score: 11.00 | loss: 8.93858 | epsilon: 0.01 453 | 4500 episode | score: 10.81 | loss: 7.15512 | epsilon: 0.01 454 | 4510 episode | score: 10.72 | loss: 6.55015 | epsilon: 0.01 455 | 4520 episode | score: 10.76 | loss: 11.14064 | epsilon: 0.01 456 | 4530 episode | score: 11.60 | loss: 6.40759 | epsilon: 0.01 457 | 4540 episode | score: 11.65 | loss: 6.36440 | epsilon: 0.01 458 | 4550 episode | score: 11.42 | loss: 9.43788 | epsilon: 0.01 459 | 4560 episode | score: 12.11 | loss: 5.23221 | epsilon: 0.01 460 | 4570 episode | score: 11.99 | loss: 7.09730 | epsilon: 0.01 461 | 4580 episode | score: 11.70 | loss: 4.86378 | epsilon: 0.01 462 | 4590 episode | score: 11.38 | loss: 6.45714 | epsilon: 0.01 463 | 4600 episode | score: 11.07 | loss: 9.09950 | epsilon: 0.01 464 | 4610 episode | score: 12.26 | loss: 5.60530 | epsilon: 0.01 465 | 4620 episode | score: 11.87 | loss: 11.84906 | epsilon: 0.01 466 | 4630 episode | score: 11.75 | loss: 9.31334 | epsilon: 0.01 467 | 4640 episode | score: 12.19 | loss: 9.48375 | epsilon: 0.01 468 | 4650 episode | score: 11.92 | loss: 7.69349 | epsilon: 0.01 469 | 4660 episode | score: 11.93 | loss: 7.12739 | epsilon: 0.01 470 | 4670 episode | score: 12.61 | loss: 8.63105 | epsilon: 0.01 471 | 4680 episode | score: 12.30 | loss: 13.35111 | epsilon: 0.01 472 | 4690 episode | score: 12.02 | loss: 10.32689 | epsilon: 0.01 473 | 4700 episode | score: 11.71 | loss: 8.77026 | epsilon: 
0.01 474 | 4710 episode | score: 11.43 | loss: 7.19243 | epsilon: 0.01 475 | 4720 episode | score: 11.27 | loss: 8.00877 | epsilon: 0.01 476 | 4730 episode | score: 11.03 | loss: 8.81402 | epsilon: 0.01 477 | 4740 episode | score: 10.76 | loss: 11.19737 | epsilon: 0.01 478 | 4750 episode | score: 10.59 | loss: 7.21122 | epsilon: 0.01 479 | 4760 episode | score: 10.39 | loss: 6.38997 | epsilon: 0.01 480 | 4770 episode | score: 10.22 | loss: 9.57317 | epsilon: 0.01 481 | 4780 episode | score: 10.06 | loss: 7.35299 | epsilon: 0.01 482 | 4790 episode | score: 9.93 | loss: 9.67430 | epsilon: 0.01 483 | 4800 episode | score: 9.76 | loss: 11.25577 | epsilon: 0.01 484 | 4810 episode | score: 9.65 | loss: 11.93548 | epsilon: 0.01 485 | 4820 episode | score: 9.67 | loss: 11.97796 | epsilon: 0.01 486 | 4830 episode | score: 9.63 | loss: 15.08483 | epsilon: 0.01 487 | 4840 episode | score: 9.53 | loss: 8.80377 | epsilon: 0.01 488 | 4850 episode | score: 9.47 | loss: 12.76648 | epsilon: 0.01 489 | 4860 episode | score: 9.45 | loss: 6.41104 | epsilon: 0.01 490 | 4870 episode | score: 9.38 | loss: 10.42523 | epsilon: 0.01 491 | 4880 episode | score: 9.29 | loss: 12.79074 | epsilon: 0.01 492 | 4890 episode | score: 9.19 | loss: 8.76899 | epsilon: 0.01 493 | 4900 episode | score: 9.12 | loss: 12.03445 | epsilon: 0.01 494 | 4910 episode | score: 9.12 | loss: 8.84027 | epsilon: 0.01 495 | 4920 episode | score: 9.06 | loss: 9.61884 | epsilon: 0.01 496 | 4930 episode | score: 9.09 | loss: 12.78619 | epsilon: 0.01 497 | 4940 episode | score: 9.07 | loss: 16.75804 | epsilon: 0.01 498 | 4950 episode | score: 9.00 | loss: 13.62132 | epsilon: 0.01 499 | 4960 episode | score: 8.98 | loss: 8.02409 | epsilon: 0.01 500 | 4970 episode | score: 8.97 | loss: 11.97446 | epsilon: 0.01 501 | 4980 episode | score: 8.87 | loss: 16.05861 | epsilon: 0.01 502 | 4990 episode | score: 8.82 | loss: 10.44726 | epsilon: 0.01 503 | -------------------------------------------------------------------------------- 
/out/trace_DTQN_9.txt: -------------------------------------------------------------------------------- 1 | state size: 2 2 | action size: 2 3 | 0 episode | score: 21.00 | loss: 0.00000 | epsilon: 1.00 4 | 10 episode | score: 20.96 | loss: 0.00000 | epsilon: 1.00 5 | 20 episode | score: 20.74 | loss: 0.00000 | epsilon: 1.00 6 | 30 episode | score: 21.32 | loss: 0.00000 | epsilon: 1.00 7 | 40 episode | score: 21.30 | loss: 0.00000 | epsilon: 1.00 8 | 50 episode | score: 21.32 | loss: 0.12494 | epsilon: 0.99 9 | 60 episode | score: 21.28 | loss: 0.13065 | epsilon: 0.98 10 | 70 episode | score: 21.59 | loss: 0.26331 | epsilon: 0.97 11 | 80 episode | score: 21.54 | loss: 0.06410 | epsilon: 0.96 12 | 90 episode | score: 21.02 | loss: 0.17379 | epsilon: 0.95 13 | 100 episode | score: 20.89 | loss: 0.51764 | epsilon: 0.94 14 | 110 episode | score: 21.03 | loss: 0.54145 | epsilon: 0.93 15 | 120 episode | score: 20.77 | loss: 0.86984 | epsilon: 0.92 16 | 130 episode | score: 20.59 | loss: 0.20401 | epsilon: 0.91 17 | 140 episode | score: 21.17 | loss: 0.24127 | epsilon: 0.89 18 | 150 episode | score: 20.46 | loss: 0.39601 | epsilon: 0.89 19 | 160 episode | score: 20.47 | loss: 0.41627 | epsilon: 0.88 20 | 170 episode | score: 20.81 | loss: 0.66734 | epsilon: 0.86 21 | 180 episode | score: 20.67 | loss: 0.91335 | epsilon: 0.85 22 | 190 episode | score: 20.47 | loss: 1.18509 | epsilon: 0.84 23 | 200 episode | score: 20.98 | loss: 1.06382 | epsilon: 0.83 24 | 210 episode | score: 21.77 | loss: 0.59437 | epsilon: 0.81 25 | 220 episode | score: 21.55 | loss: 0.59339 | epsilon: 0.80 26 | 230 episode | score: 21.59 | loss: 1.17238 | epsilon: 0.79 27 | 240 episode | score: 21.35 | loss: 1.20873 | epsilon: 0.78 28 | 250 episode | score: 20.83 | loss: 0.05833 | epsilon: 0.77 29 | 260 episode | score: 20.76 | loss: 0.99572 | epsilon: 0.76 30 | 270 episode | score: 20.61 | loss: 1.66400 | epsilon: 0.75 31 | 280 episode | score: 21.10 | loss: 0.45072 | epsilon: 0.74 32 | 290 episode | 
score: 21.42 | loss: 0.72571 | epsilon: 0.73 33 | 300 episode | score: 20.94 | loss: 1.10599 | epsilon: 0.72 34 | 310 episode | score: 21.22 | loss: 0.40100 | epsilon: 0.71 35 | 320 episode | score: 21.19 | loss: 0.78178 | epsilon: 0.70 36 | 330 episode | score: 20.79 | loss: 2.69212 | epsilon: 0.69 37 | 340 episode | score: 20.73 | loss: 1.19997 | epsilon: 0.68 38 | 350 episode | score: 20.54 | loss: 0.42472 | epsilon: 0.67 39 | 360 episode | score: 20.33 | loss: 0.43453 | epsilon: 0.66 40 | 370 episode | score: 20.10 | loss: 0.85835 | epsilon: 0.65 41 | 380 episode | score: 20.22 | loss: 1.28385 | epsilon: 0.64 42 | 390 episode | score: 19.61 | loss: 0.47136 | epsilon: 0.63 43 | 400 episode | score: 19.93 | loss: 1.76091 | epsilon: 0.62 44 | 410 episode | score: 20.27 | loss: 2.30934 | epsilon: 0.60 45 | 420 episode | score: 19.88 | loss: 2.36865 | epsilon: 0.59 46 | 430 episode | score: 19.63 | loss: 1.84405 | epsilon: 0.59 47 | 440 episode | score: 19.30 | loss: 1.41989 | epsilon: 0.58 48 | 450 episode | score: 18.62 | loss: 0.48912 | epsilon: 0.57 49 | 460 episode | score: 18.05 | loss: 1.45394 | epsilon: 0.56 50 | 470 episode | score: 17.77 | loss: 1.01078 | epsilon: 0.56 51 | 480 episode | score: 17.36 | loss: 1.95200 | epsilon: 0.55 52 | 490 episode | score: 16.83 | loss: 1.95511 | epsilon: 0.54 53 | 500 episode | score: 16.72 | loss: 2.07467 | epsilon: 0.53 54 | 510 episode | score: 16.63 | loss: 2.51130 | epsilon: 0.53 55 | 520 episode | score: 16.57 | loss: 3.51992 | epsilon: 0.52 56 | 530 episode | score: 16.92 | loss: 4.61643 | epsilon: 0.51 57 | 540 episode | score: 16.41 | loss: 3.26759 | epsilon: 0.50 58 | 550 episode | score: 16.72 | loss: 4.22791 | epsilon: 0.49 59 | 560 episode | score: 17.05 | loss: 2.65086 | epsilon: 0.48 60 | 570 episode | score: 16.59 | loss: 3.17366 | epsilon: 0.47 61 | 580 episode | score: 16.34 | loss: 1.80297 | epsilon: 0.46 62 | 590 episode | score: 16.24 | loss: 1.09296 | epsilon: 0.46 63 | 600 episode | score: 15.85 | 
loss: 3.24726 | epsilon: 0.45 64 | 610 episode | score: 15.53 | loss: 4.36065 | epsilon: 0.44 65 | 620 episode | score: 15.64 | loss: 2.81037 | epsilon: 0.43 66 | 630 episode | score: 15.51 | loss: 2.22702 | epsilon: 0.43 67 | 640 episode | score: 15.46 | loss: 2.28287 | epsilon: 0.42 68 | 650 episode | score: 15.15 | loss: 2.27961 | epsilon: 0.41 69 | 660 episode | score: 15.35 | loss: 2.82823 | epsilon: 0.40 70 | 670 episode | score: 14.94 | loss: 3.45791 | epsilon: 0.40 71 | 680 episode | score: 14.55 | loss: 2.88071 | epsilon: 0.39 72 | 690 episode | score: 14.30 | loss: 2.88437 | epsilon: 0.38 73 | 700 episode | score: 14.20 | loss: 2.34074 | epsilon: 0.38 74 | 710 episode | score: 13.91 | loss: 4.06448 | epsilon: 0.37 75 | 720 episode | score: 13.84 | loss: 3.58162 | epsilon: 0.36 76 | 730 episode | score: 13.49 | loss: 1.78119 | epsilon: 0.36 77 | 740 episode | score: 13.53 | loss: 2.95448 | epsilon: 0.35 78 | 750 episode | score: 13.15 | loss: 2.96639 | epsilon: 0.35 79 | 760 episode | score: 13.43 | loss: 3.59416 | epsilon: 0.34 80 | 770 episode | score: 13.17 | loss: 1.20836 | epsilon: 0.33 81 | 780 episode | score: 13.01 | loss: 4.78889 | epsilon: 0.33 82 | 790 episode | score: 12.99 | loss: 4.84344 | epsilon: 0.32 83 | 800 episode | score: 12.78 | loss: 1.82502 | epsilon: 0.31 84 | 810 episode | score: 12.68 | loss: 6.67286 | epsilon: 0.31 85 | 820 episode | score: 12.46 | loss: 1.84013 | epsilon: 0.30 86 | 830 episode | score: 13.15 | loss: 5.51109 | epsilon: 0.29 87 | 840 episode | score: 12.85 | loss: 5.63518 | epsilon: 0.28 88 | 850 episode | score: 12.89 | loss: 3.74254 | epsilon: 0.28 89 | 860 episode | score: 12.65 | loss: 4.37036 | epsilon: 0.27 90 | 870 episode | score: 12.62 | loss: 3.14334 | epsilon: 0.27 91 | 880 episode | score: 12.55 | loss: 5.11216 | epsilon: 0.26 92 | 890 episode | score: 12.24 | loss: 4.41962 | epsilon: 0.25 93 | 900 episode | score: 12.23 | loss: 5.09165 | epsilon: 0.25 94 | 910 episode | score: 12.11 | loss: 5.07642 | 
epsilon: 0.24 95 | 920 episode | score: 11.94 | loss: 4.43827 | epsilon: 0.24 96 | 930 episode | score: 11.73 | loss: 6.97180 | epsilon: 0.23 97 | 940 episode | score: 11.53 | loss: 5.86489 | epsilon: 0.22 98 | 950 episode | score: 11.39 | loss: 5.19568 | epsilon: 0.22 99 | 960 episode | score: 11.24 | loss: 7.70984 | epsilon: 0.21 100 | 970 episode | score: 11.20 | loss: 5.80304 | epsilon: 0.21 101 | 980 episode | score: 11.66 | loss: 5.24795 | epsilon: 0.20 102 | 990 episode | score: 11.42 | loss: 5.90977 | epsilon: 0.19 103 | 1000 episode | score: 11.22 | loss: 3.26386 | epsilon: 0.19 104 | 1010 episode | score: 11.15 | loss: 7.18909 | epsilon: 0.18 105 | 1020 episode | score: 11.03 | loss: 5.88403 | epsilon: 0.18 106 | 1030 episode | score: 10.86 | loss: 5.28317 | epsilon: 0.17 107 | 1040 episode | score: 10.74 | loss: 7.23439 | epsilon: 0.17 108 | 1050 episode | score: 10.56 | loss: 7.89468 | epsilon: 0.16 109 | 1060 episode | score: 10.49 | loss: 7.96061 | epsilon: 0.16 110 | 1070 episode | score: 10.33 | loss: 6.62946 | epsilon: 0.15 111 | 1080 episode | score: 10.28 | loss: 7.93479 | epsilon: 0.15 112 | 1090 episode | score: 10.57 | loss: 8.63225 | epsilon: 0.14 113 | 1100 episode | score: 10.52 | loss: 6.12925 | epsilon: 0.13 114 | 1110 episode | score: 10.31 | loss: 4.05030 | epsilon: 0.13 115 | 1120 episode | score: 10.16 | loss: 6.83749 | epsilon: 0.12 116 | 1130 episode | score: 10.25 | loss: 6.74951 | epsilon: 0.12 117 | 1140 episode | score: 10.37 | loss: 6.06576 | epsilon: 0.11 118 | 1150 episode | score: 10.26 | loss: 6.73673 | epsilon: 0.11 119 | 1160 episode | score: 10.23 | loss: 6.74529 | epsilon: 0.10 120 | 1170 episode | score: 10.22 | loss: 8.10608 | epsilon: 0.10 121 | 1180 episode | score: 10.06 | loss: 8.78829 | epsilon: 0.09 122 | 1190 episode | score: 10.00 | loss: 8.12166 | epsilon: 0.09 123 | 1200 episode | score: 9.95 | loss: 8.14971 | epsilon: 0.08 124 | 1210 episode | score: 9.94 | loss: 7.47524 | epsilon: 0.08 125 | 1220 episode | 
score: 9.77 | loss: 8.87012 | epsilon: 0.07 126 | 1230 episode | score: 9.81 | loss: 4.11322 | epsilon: 0.07 127 | 1240 episode | score: 9.65 | loss: 9.55529 | epsilon: 0.06 128 | 1250 episode | score: 9.59 | loss: 7.53394 | epsilon: 0.06 129 | 1260 episode | score: 9.52 | loss: 5.47745 | epsilon: 0.05 130 | 1270 episode | score: 9.44 | loss: 8.96674 | epsilon: 0.05 131 | 1280 episode | score: 9.81 | loss: 9.69322 | epsilon: 0.04 132 | 1290 episode | score: 9.65 | loss: 4.15783 | epsilon: 0.03 133 | 1300 episode | score: 9.58 | loss: 7.06649 | epsilon: 0.03 134 | 1310 episode | score: 9.54 | loss: 7.65880 | epsilon: 0.02 135 | 1320 episode | score: 9.41 | loss: 7.74817 | epsilon: 0.02 136 | 1330 episode | score: 9.34 | loss: 7.74973 | epsilon: 0.01 137 | 1340 episode | score: 9.25 | loss: 8.37979 | epsilon: 0.01 138 | 1350 episode | score: 9.20 | loss: 7.64794 | epsilon: 0.01 139 | 1360 episode | score: 9.16 | loss: 13.24160 | epsilon: 0.01 140 | 1370 episode | score: 9.11 | loss: 9.75682 | epsilon: 0.01 141 | 1380 episode | score: 9.15 | loss: 7.03971 | epsilon: 0.01 142 | 1390 episode | score: 9.08 | loss: 7.71366 | epsilon: 0.01 143 | 1400 episode | score: 9.13 | loss: 11.21983 | epsilon: 0.01 144 | 1410 episode | score: 9.04 | loss: 11.23451 | epsilon: 0.01 145 | 1420 episode | score: 9.01 | loss: 6.33937 | epsilon: 0.01 146 | 1430 episode | score: 8.93 | loss: 7.11611 | epsilon: 0.01 147 | 1440 episode | score: 8.85 | loss: 12.66293 | epsilon: 0.01 148 | 1450 episode | score: 8.81 | loss: 10.55220 | epsilon: 0.01 149 | 1460 episode | score: 8.79 | loss: 9.18106 | epsilon: 0.01 150 | 1470 episode | score: 8.76 | loss: 11.31221 | epsilon: 0.01 151 | 1480 episode | score: 8.75 | loss: 8.49157 | epsilon: 0.01 152 | 1490 episode | score: 8.77 | loss: 7.81996 | epsilon: 0.01 153 | 1500 episode | score: 8.72 | loss: 12.06588 | epsilon: 0.01 154 | 1510 episode | score: 8.78 | loss: 9.32470 | epsilon: 0.01 155 | 1520 episode | score: 8.76 | loss: 12.09021 | epsilon: 
0.01 156 | 1530 episode | score: 8.72 | loss: 7.13991 | epsilon: 0.01 157 | 1540 episode | score: 8.65 | loss: 11.45438 | epsilon: 0.01 158 | 1550 episode | score: 8.62 | loss: 8.64568 | epsilon: 0.01 159 | 1560 episode | score: 8.62 | loss: 7.22120 | epsilon: 0.01 160 | 1570 episode | score: 8.63 | loss: 5.76618 | epsilon: 0.01 161 | 1580 episode | score: 8.63 | loss: 10.17529 | epsilon: 0.01 162 | 1590 episode | score: 8.65 | loss: 7.94844 | epsilon: 0.01 163 | 1600 episode | score: 8.62 | loss: 9.36637 | epsilon: 0.01 164 | 1610 episode | score: 8.59 | loss: 10.79890 | epsilon: 0.01 165 | 1620 episode | score: 8.57 | loss: 8.65811 | epsilon: 0.01 166 | 1630 episode | score: 8.56 | loss: 10.83473 | epsilon: 0.01 167 | 1640 episode | score: 8.90 | loss: 10.93063 | epsilon: 0.01 168 | 1650 episode | score: 8.99 | loss: 13.74893 | epsilon: 0.01 169 | 1660 episode | score: 8.96 | loss: 8.70530 | epsilon: 0.01 170 | 1670 episode | score: 8.98 | loss: 9.46762 | epsilon: 0.01 171 | 1680 episode | score: 8.95 | loss: 7.37468 | epsilon: 0.01 172 | 1690 episode | score: 8.90 | loss: 3.67320 | epsilon: 0.01 173 | 1700 episode | score: 8.89 | loss: 10.23026 | epsilon: 0.01 174 | 1710 episode | score: 8.84 | loss: 11.70730 | epsilon: 0.01 175 | 1720 episode | score: 8.78 | loss: 5.85139 | epsilon: 0.01 176 | 1730 episode | score: 8.75 | loss: 11.75461 | epsilon: 0.01 177 | 1740 episode | score: 8.72 | loss: 9.58267 | epsilon: 0.01 178 | 1750 episode | score: 8.78 | loss: 10.97702 | epsilon: 0.01 179 | 1760 episode | score: 8.76 | loss: 10.28451 | epsilon: 0.01 180 | 1770 episode | score: 9.18 | loss: 13.91956 | epsilon: 0.01 181 | 1780 episode | score: 9.15 | loss: 8.86558 | epsilon: 0.01 182 | 1790 episode | score: 9.41 | loss: 10.27811 | epsilon: 0.01 183 | 1800 episode | score: 9.34 | loss: 5.93619 | epsilon: 0.01 184 | 1810 episode | score: 9.23 | loss: 8.81983 | epsilon: 0.01 185 | 1820 episode | score: 9.15 | loss: 8.83106 | epsilon: 0.01 186 | 1830 episode | score: 
9.23 | loss: 11.10988 | epsilon: 0.01 187 | 1840 episode | score: 9.15 | loss: 8.85159 | epsilon: 0.01 188 | 1850 episode | score: 9.09 | loss: 12.58249 | epsilon: 0.01 189 | 1860 episode | score: 9.05 | loss: 9.61852 | epsilon: 0.01 190 | 1870 episode | score: 8.98 | loss: 8.87381 | epsilon: 0.01 191 | 1880 episode | score: 8.90 | loss: 10.43737 | epsilon: 0.01 192 | 1890 episode | score: 8.86 | loss: 11.14335 | epsilon: 0.01 193 | 1900 episode | score: 8.82 | loss: 11.12453 | epsilon: 0.01 194 | 1910 episode | score: 8.86 | loss: 10.44158 | epsilon: 0.01 195 | 1920 episode | score: 8.84 | loss: 12.64781 | epsilon: 0.01 196 | 1930 episode | score: 8.86 | loss: 11.15122 | epsilon: 0.01 197 | 1940 episode | score: 8.83 | loss: 11.15670 | epsilon: 0.01 198 | 1950 episode | score: 8.78 | loss: 8.95111 | epsilon: 0.01 199 | 1960 episode | score: 8.75 | loss: 10.43592 | epsilon: 0.01 200 | 1970 episode | score: 8.72 | loss: 7.49042 | epsilon: 0.01 201 | 1980 episode | score: 8.70 | loss: 12.69947 | epsilon: 0.01 202 | 1990 episode | score: 8.68 | loss: 14.24218 | epsilon: 0.01 203 | 2000 episode | score: 8.65 | loss: 9.06082 | epsilon: 0.01 204 | 2010 episode | score: 8.57 | loss: 9.78840 | epsilon: 0.01 205 | 2020 episode | score: 8.71 | loss: 11.30073 | epsilon: 0.01 206 | 2030 episode | score: 8.88 | loss: 12.03438 | epsilon: 0.01 207 | 2040 episode | score: 8.82 | loss: 8.31563 | epsilon: 0.01 208 | 2050 episode | score: 8.77 | loss: 13.61250 | epsilon: 0.01 209 | 2060 episode | score: 8.76 | loss: 10.57885 | epsilon: 0.01 210 | 2070 episode | score: 8.70 | loss: 7.57164 | epsilon: 0.01 211 | 2080 episode | score: 8.84 | loss: 11.41372 | epsilon: 0.01 212 | 2090 episode | score: 8.83 | loss: 9.85300 | epsilon: 0.01 213 | 2100 episode | score: 8.80 | loss: 9.90245 | epsilon: 0.01 214 | 2110 episode | score: 8.80 | loss: 9.11386 | epsilon: 0.01 215 | 2120 episode | score: 8.80 | loss: 11.46561 | epsilon: 0.01 216 | 2130 episode | score: 8.89 | loss: 9.88510 | epsilon: 
0.01 217 | 2140 episode | score: 8.84 | loss: 11.45000 | epsilon: 0.01 218 | 2150 episode | score: 8.91 | loss: 9.92751 | epsilon: 0.01 219 | 2160 episode | score: 8.85 | loss: 12.21367 | epsilon: 0.01 220 | 2170 episode | score: 8.95 | loss: 13.72579 | epsilon: 0.01 221 | 2180 episode | score: 8.88 | loss: 11.49756 | epsilon: 0.01 222 | 2190 episode | score: 8.99 | loss: 9.18024 | epsilon: 0.01 223 | 2200 episode | score: 8.90 | loss: 9.31785 | epsilon: 0.01 224 | 2210 episode | score: 8.91 | loss: 9.93547 | epsilon: 0.01 225 | 2220 episode | score: 8.85 | loss: 9.92336 | epsilon: 0.01 226 | 2230 episode | score: 8.91 | loss: 7.66453 | epsilon: 0.01 227 | 2240 episode | score: 9.03 | loss: 8.46121 | epsilon: 0.01 228 | 2250 episode | score: 8.97 | loss: 13.07337 | epsilon: 0.01 229 | 2260 episode | score: 8.93 | loss: 9.23427 | epsilon: 0.01 230 | 2270 episode | score: 8.94 | loss: 9.31744 | epsilon: 0.01 231 | 2280 episode | score: 8.98 | loss: 9.97877 | epsilon: 0.01 232 | 2290 episode | score: 8.94 | loss: 8.58323 | epsilon: 0.01 233 | 2300 episode | score: 8.88 | loss: 9.21149 | epsilon: 0.01 234 | 2310 episode | score: 8.88 | loss: 12.25584 | epsilon: 0.01 235 | 2320 episode | score: 8.95 | loss: 9.25100 | epsilon: 0.01 236 | 2330 episode | score: 8.89 | loss: 6.96148 | epsilon: 0.01 237 | 2340 episode | score: 8.85 | loss: 10.03903 | epsilon: 0.01 238 | 2350 episode | score: 8.82 | loss: 10.77144 | epsilon: 0.01 239 | 2360 episode | score: 8.97 | loss: 8.43672 | epsilon: 0.01 240 | 2370 episode | score: 9.06 | loss: 13.85954 | epsilon: 0.01 241 | 2380 episode | score: 9.01 | loss: 9.24928 | epsilon: 0.01 242 | 2390 episode | score: 8.97 | loss: 12.36560 | epsilon: 0.01 243 | 2400 episode | score: 8.95 | loss: 6.94651 | epsilon: 0.01 244 | 2410 episode | score: 8.90 | loss: 12.32122 | epsilon: 0.01 245 | 2420 episode | score: 8.85 | loss: 12.33952 | epsilon: 0.01 246 | 2430 episode | score: 8.91 | loss: 9.31118 | epsilon: 0.01 247 | 2440 episode | score: 8.92 
| loss: 10.07508 | epsilon: 0.01 248 | 2450 episode | score: 8.93 | loss: 9.35474 | epsilon: 0.01 249 | 2460 episode | score: 8.87 | loss: 10.87870 | epsilon: 0.01 250 | 2470 episode | score: 8.82 | loss: 8.50910 | epsilon: 0.01 251 | 2480 episode | score: 8.91 | loss: 10.12521 | epsilon: 0.01 252 | 2490 episode | score: 8.88 | loss: 11.73552 | epsilon: 0.01 253 | 2500 episode | score: 8.80 | loss: 8.57602 | epsilon: 0.01 254 | 2510 episode | score: 8.79 | loss: 10.88338 | epsilon: 0.01 255 | 2520 episode | score: 8.89 | loss: 8.54393 | epsilon: 0.01 256 | 2530 episode | score: 8.89 | loss: 10.91774 | epsilon: 0.01 257 | 2540 episode | score: 9.12 | loss: 10.86364 | epsilon: 0.01 258 | 2550 episode | score: 9.05 | loss: 15.57212 | epsilon: 0.01 259 | 2560 episode | score: 9.01 | loss: 7.00552 | epsilon: 0.01 260 | 2570 episode | score: 8.98 | loss: 9.33577 | epsilon: 0.01 261 | 2580 episode | score: 9.01 | loss: 9.40912 | epsilon: 0.01 262 | 2590 episode | score: 8.98 | loss: 8.62148 | epsilon: 0.01 263 | 2600 episode | score: 8.91 | loss: 7.81482 | epsilon: 0.01 264 | 2610 episode | score: 8.90 | loss: 10.24356 | epsilon: 0.01 265 | 2620 episode | score: 8.90 | loss: 7.01662 | epsilon: 0.01 266 | 2630 episode | score: 9.20 | loss: 7.82972 | epsilon: 0.01 267 | 2640 episode | score: 9.32 | loss: 13.31156 | epsilon: 0.01 268 | 2650 episode | score: 9.25 | loss: 13.25241 | epsilon: 0.01 269 | 2660 episode | score: 9.18 | loss: 9.42001 | epsilon: 0.01 270 | 2670 episode | score: 9.17 | loss: 6.25177 | epsilon: 0.01 271 | 2680 episode | score: 9.25 | loss: 13.35298 | epsilon: 0.01 272 | 2690 episode | score: 9.13 | loss: 11.79432 | epsilon: 0.01 273 | 2700 episode | score: 9.02 | loss: 13.34889 | epsilon: 0.01 274 | 2710 episode | score: 8.97 | loss: 7.13101 | epsilon: 0.01 275 | 2720 episode | score: 9.04 | loss: 8.60090 | epsilon: 0.01 276 | 2730 episode | score: 8.98 | loss: 7.03793 | epsilon: 0.01 277 | 2740 episode | score: 8.90 | loss: 10.19670 | epsilon: 0.01 
278 | 2750 episode | score: 8.93 | loss: 10.17758 | epsilon: 0.01 279 | 2760 episode | score: 8.89 | loss: 10.98469 | epsilon: 0.01 280 | 2770 episode | score: 8.79 | loss: 6.37543 | epsilon: 0.01 281 | 2780 episode | score: 8.78 | loss: 9.44218 | epsilon: 0.01 282 | 2790 episode | score: 8.72 | loss: 8.68750 | epsilon: 0.01 283 | 2800 episode | score: 8.67 | loss: 11.84710 | epsilon: 0.01 284 | 2810 episode | score: 8.63 | loss: 10.20570 | epsilon: 0.01 285 | 2820 episode | score: 8.63 | loss: 12.58235 | epsilon: 0.01 286 | 2830 episode | score: 8.69 | loss: 11.04398 | epsilon: 0.01 287 | 2840 episode | score: 8.67 | loss: 10.31757 | epsilon: 0.01 288 | 2850 episode | score: 8.63 | loss: 10.31371 | epsilon: 0.01 289 | 2860 episode | score: 8.77 | loss: 14.91368 | epsilon: 0.01 290 | 2870 episode | score: 8.82 | loss: 15.00642 | epsilon: 0.01 291 | 2880 episode | score: 8.80 | loss: 12.59427 | epsilon: 0.01 292 | 2890 episode | score: 8.82 | loss: 13.39069 | epsilon: 0.01 293 | 2900 episode | score: 8.84 | loss: 11.82799 | epsilon: 0.01 294 | 2910 episode | score: 9.20 | loss: 9.44360 | epsilon: 0.01 295 | 2920 episode | score: 9.71 | loss: 8.69737 | epsilon: 0.01 296 | 2930 episode | score: 9.61 | loss: 6.31859 | epsilon: 0.01 297 | 2940 episode | score: 9.48 | loss: 9.55086 | epsilon: 0.01 298 | 2950 episode | score: 9.41 | loss: 9.44469 | epsilon: 0.01 299 | 2960 episode | score: 9.28 | loss: 7.95107 | epsilon: 0.01 300 | 2970 episode | score: 9.26 | loss: 7.17244 | epsilon: 0.01 301 | 2980 episode | score: 9.21 | loss: 7.11789 | epsilon: 0.01 302 | 2990 episode | score: 9.15 | loss: 7.16511 | epsilon: 0.01 303 | 3000 episode | score: 9.10 | loss: 13.47608 | epsilon: 0.01 304 | 3010 episode | score: 9.15 | loss: 7.94981 | epsilon: 0.01 305 | 3020 episode | score: 9.10 | loss: 11.87295 | epsilon: 0.01 306 | 3030 episode | score: 9.04 | loss: 10.43319 | epsilon: 0.01 307 | 3040 episode | score: 9.11 | loss: 11.96408 | epsilon: 0.01 308 | 3050 episode | score: 9.03 
| loss: 10.34738 | epsilon: 0.01 309 | 3060 episode | score: 8.99 | loss: 11.88029 | epsilon: 0.01 310 | 3070 episode | score: 8.93 | loss: 10.37521 | epsilon: 0.01 311 | 3080 episode | score: 8.84 | loss: 11.11475 | epsilon: 0.01 312 | 3090 episode | score: 8.80 | loss: 10.34103 | epsilon: 0.01 313 | 3100 episode | score: 8.89 | loss: 12.71417 | epsilon: 0.01 314 | 3110 episode | score: 8.85 | loss: 9.57880 | epsilon: 0.01 315 | 3120 episode | score: 8.78 | loss: 11.87391 | epsilon: 0.01 316 | 3130 episode | score: 8.81 | loss: 11.11053 | epsilon: 0.01 317 | 3140 episode | score: 9.23 | loss: 7.96314 | epsilon: 0.01 318 | 3150 episode | score: 9.42 | loss: 14.31875 | epsilon: 0.01 319 | 3160 episode | score: 9.36 | loss: 9.55417 | epsilon: 0.01 320 | 3170 episode | score: 9.36 | loss: 12.69406 | epsilon: 0.01 321 | 3180 episode | score: 9.31 | loss: 12.02454 | epsilon: 0.01 322 | 3190 episode | score: 9.29 | loss: 12.67994 | epsilon: 0.01 323 | 3200 episode | score: 9.23 | loss: 10.64494 | epsilon: 0.01 324 | 3210 episode | score: 9.14 | loss: 10.39057 | epsilon: 0.01 325 | 3220 episode | score: 9.08 | loss: 8.77181 | epsilon: 0.01 326 | 3230 episode | score: 9.00 | loss: 8.75515 | epsilon: 0.01 327 | 3240 episode | score: 8.94 | loss: 9.55666 | epsilon: 0.01 328 | 3250 episode | score: 8.92 | loss: 11.69119 | epsilon: 0.01 329 | 3260 episode | score: 8.85 | loss: 18.40750 | epsilon: 0.01 330 | 3270 episode | score: 8.79 | loss: 23.38196 | epsilon: 0.01 331 | 3280 episode | score: 8.83 | loss: 6.67032 | epsilon: 0.01 332 | 3290 episode | score: 9.01 | loss: 11.30644 | epsilon: 0.01 333 | 3300 episode | score: 8.98 | loss: 9.37187 | epsilon: 0.01 334 | 3310 episode | score: 8.98 | loss: 9.69363 | epsilon: 0.01 335 | 3320 episode | score: 9.07 | loss: 8.18816 | epsilon: 0.01 336 | 3330 episode | score: 9.09 | loss: 8.10379 | epsilon: 0.01 337 | 3340 episode | score: 9.08 | loss: 8.61894 | epsilon: 0.01 338 | 3350 episode | score: 9.00 | loss: 11.35091 | epsilon: 
0.01 339 | 3360 episode | score: 8.93 | loss: 10.41170 | epsilon: 0.01 340 | 3370 episode | score: 8.90 | loss: 14.24506 | epsilon: 0.01 341 | 3380 episode | score: 9.33 | loss: 12.81649 | epsilon: 0.01 342 | 3390 episode | score: 9.27 | loss: 7.33425 | epsilon: 0.01 343 | 3400 episode | score: 9.75 | loss: 12.60556 | epsilon: 0.01 344 | 3410 episode | score: 9.63 | loss: 12.47419 | epsilon: 0.01 345 | 3420 episode | score: 9.98 | loss: 13.25487 | epsilon: 0.01 346 | 3430 episode | score: 9.89 | loss: 5.03355 | epsilon: 0.01 347 | 3440 episode | score: 9.73 | loss: 8.11193 | epsilon: 0.01 348 | 3450 episode | score: 9.60 | loss: 8.43487 | epsilon: 0.01 349 | 3460 episode | score: 9.51 | loss: 13.05702 | epsilon: 0.01 350 | 3470 episode | score: 9.48 | loss: 3.30770 | epsilon: 0.01 351 | 3480 episode | score: 9.73 | loss: 9.20855 | epsilon: 0.01 352 | 3490 episode | score: 9.64 | loss: 8.84441 | epsilon: 0.01 353 | 3500 episode | score: 9.56 | loss: 12.18956 | epsilon: 0.01 354 | 3510 episode | score: 9.77 | loss: 7.42808 | epsilon: 0.01 355 | 3520 episode | score: 9.67 | loss: 8.75190 | epsilon: 0.01 356 | 3530 episode | score: 9.52 | loss: 9.91304 | epsilon: 0.01 357 | 3540 episode | score: 9.42 | loss: 7.19323 | epsilon: 0.01 358 | 3550 episode | score: 9.86 | loss: 9.61124 | epsilon: 0.01 359 | 3560 episode | score: 11.48 | loss: 9.93398 | epsilon: 0.01 360 | 3570 episode | score: 11.56 | loss: 9.95154 | epsilon: 0.01 361 | 3580 episode | score: 12.09 | loss: 5.97820 | epsilon: 0.01 362 | 3590 episode | score: 11.88 | loss: 11.61210 | epsilon: 0.01 363 | 3600 episode | score: 12.62 | loss: 5.66679 | epsilon: 0.01 364 | 3610 episode | score: 12.25 | loss: 4.34159 | epsilon: 0.01 365 | 3620 episode | score: 13.68 | loss: 6.75365 | epsilon: 0.01 366 | 3630 episode | score: 15.21 | loss: 5.62228 | epsilon: 0.01 367 | 3640 episode | score: 14.70 | loss: 2.85260 | epsilon: 0.01 368 | 3650 episode | score: 14.35 | loss: 8.27749 | epsilon: 0.01 369 | 3660 episode | 
score: 13.93 | loss: 4.46703 | epsilon: 0.01 370 | 3670 episode | score: 13.44 | loss: 4.22112 | epsilon: 0.01 371 | 3680 episode | score: 13.32 | loss: 6.51251 | epsilon: 0.01 372 | 3690 episode | score: 13.15 | loss: 3.76290 | epsilon: 0.01 373 | 3700 episode | score: 13.54 | loss: 4.41752 | epsilon: 0.01 374 | 3710 episode | score: 13.34 | loss: 4.64118 | epsilon: 0.01 375 | 3720 episode | score: 13.35 | loss: 5.92149 | epsilon: 0.01 376 | 3730 episode | score: 13.07 | loss: 6.23527 | epsilon: 0.01 377 | 3740 episode | score: 12.62 | loss: 5.54841 | epsilon: 0.01 378 | 3750 episode | score: 13.63 | loss: 10.53116 | epsilon: 0.01 379 | 3760 episode | score: 14.01 | loss: 3.18782 | epsilon: 0.01 380 | 3770 episode | score: 14.20 | loss: 8.02784 | epsilon: 0.01 381 | 3780 episode | score: 15.00 | loss: 2.93599 | epsilon: 0.01 382 | 3790 episode | score: 14.71 | loss: 4.80426 | epsilon: 0.01 383 | 3800 episode | score: 15.71 | loss: 7.91253 | epsilon: 0.01 384 | 3810 episode | score: 15.91 | loss: 4.73834 | epsilon: 0.01 385 | 3820 episode | score: 15.61 | loss: 0.79304 | epsilon: 0.01 386 | 3830 episode | score: 15.49 | loss: 4.78924 | epsilon: 0.01 387 | 3840 episode | score: 15.02 | loss: 4.90685 | epsilon: 0.01 388 | 3850 episode | score: 14.68 | loss: 2.36603 | epsilon: 0.01 389 | 3860 episode | score: 14.17 | loss: 6.88053 | epsilon: 0.01 390 | 3870 episode | score: 13.97 | loss: 5.15767 | epsilon: 0.01 391 | 3880 episode | score: 14.36 | loss: 4.82018 | epsilon: 0.01 392 | 3890 episode | score: 14.01 | loss: 2.58375 | epsilon: 0.01 393 | 3900 episode | score: 15.21 | loss: 4.39568 | epsilon: 0.01 394 | 3910 episode | score: 15.64 | loss: 4.94824 | epsilon: 0.01 395 | 3920 episode | score: 15.34 | loss: 5.53893 | epsilon: 0.01 396 | 3930 episode | score: 15.41 | loss: 6.22380 | epsilon: 0.01 397 | 3940 episode | score: 15.19 | loss: 6.26805 | epsilon: 0.01 398 | 3950 episode | score: 14.76 | loss: 4.59807 | epsilon: 0.01 399 | 3960 episode | score: 14.71 | 
loss: 7.60362 | epsilon: 0.01 400 | 3970 episode | score: 14.59 | loss: 6.93035 | epsilon: 0.01 401 | 3980 episode | score: 15.08 | loss: 6.01341 | epsilon: 0.01 402 | 3990 episode | score: 14.44 | loss: 5.23150 | epsilon: 0.01 403 | 4000 episode | score: 14.43 | loss: 3.90165 | epsilon: 0.01 404 | 4010 episode | score: 14.15 | loss: 4.73526 | epsilon: 0.01 405 | 4020 episode | score: 13.75 | loss: 6.11941 | epsilon: 0.01 406 | 4030 episode | score: 13.47 | loss: 7.32031 | epsilon: 0.01 407 | 4040 episode | score: 13.28 | loss: 3.62526 | epsilon: 0.01 408 | 4050 episode | score: 14.35 | loss: 1.43349 | epsilon: 0.01 409 | 4060 episode | score: 15.05 | loss: 3.68072 | epsilon: 0.01 410 | 4070 episode | score: 14.74 | loss: 4.02725 | epsilon: 0.01 411 | 4080 episode | score: 14.60 | loss: 6.00835 | epsilon: 0.01 412 | 4090 episode | score: 15.40 | loss: 4.19405 | epsilon: 0.01 413 | 4100 episode | score: 16.11 | loss: 2.98184 | epsilon: 0.01 414 | 4110 episode | score: 16.46 | loss: 3.63669 | epsilon: 0.01 415 | 4120 episode | score: 16.70 | loss: 2.74856 | epsilon: 0.01 416 | 4130 episode | score: 16.46 | loss: 3.80108 | epsilon: 0.01 417 | 4140 episode | score: 16.42 | loss: 5.00285 | epsilon: 0.01 418 | 4150 episode | score: 15.81 | loss: 1.60470 | epsilon: 0.01 419 | 4160 episode | score: 15.18 | loss: 4.54753 | epsilon: 0.01 420 | 4170 episode | score: 14.66 | loss: 3.38707 | epsilon: 0.01 421 | 4180 episode | score: 14.33 | loss: 2.86823 | epsilon: 0.01 422 | 4190 episode | score: 13.96 | loss: 3.27965 | epsilon: 0.01 423 | 4200 episode | score: 13.44 | loss: 6.32444 | epsilon: 0.01 424 | 4210 episode | score: 13.13 | loss: 7.85530 | epsilon: 0.01 425 | 4220 episode | score: 13.60 | loss: 7.71556 | epsilon: 0.01 426 | 4230 episode | score: 13.40 | loss: 8.20314 | epsilon: 0.01 427 | 4240 episode | score: 13.91 | loss: 0.95849 | epsilon: 0.01 428 | 4250 episode | score: 13.88 | loss: 8.55000 | epsilon: 0.01 429 | 4260 episode | score: 13.95 | loss: 3.74746 | 
epsilon: 0.01
4270 episode | score: 13.61 | loss: 9.64572 | epsilon: 0.01
4280 episode | score: 14.63 | loss: 4.42514 | epsilon: 0.01
4290 episode | score: 14.96 | loss: 6.65158 | epsilon: 0.01
4300 episode | score: 14.47 | loss: 4.77864 | epsilon: 0.01
4310 episode | score: 14.14 | loss: 4.92499 | epsilon: 0.01
4320 episode | score: 13.82 | loss: 7.58319 | epsilon: 0.01
4330 episode | score: 13.53 | loss: 6.55989 | epsilon: 0.01
4340 episode | score: 13.27 | loss: 6.31598 | epsilon: 0.01
4350 episode | score: 12.89 | loss: 7.07195 | epsilon: 0.01
4360 episode | score: 12.64 | loss: 11.24029 | epsilon: 0.01
4370 episode | score: 12.29 | loss: 8.43339 | epsilon: 0.01
4380 episode | score: 11.98 | loss: 9.08022 | epsilon: 0.01
4390 episode | score: 11.71 | loss: 9.21856 | epsilon: 0.01
4400 episode | score: 11.38 | loss: 13.00905 | epsilon: 0.01
4410 episode | score: 11.13 | loss: 6.91477 | epsilon: 0.01
4420 episode | score: 10.97 | loss: 9.24840 | epsilon: 0.01
4430 episode | score: 10.79 | loss: 5.93028 | epsilon: 0.01
4440 episode | score: 10.67 | loss: 11.75588 | epsilon: 0.01
4450 episode | score: 10.62 | loss: 7.53685 | epsilon: 0.01
4460 episode | score: 10.42 | loss: 8.77642 | epsilon: 0.01
4470 episode | score: 10.38 | loss: 12.00876 | epsilon: 0.01
4480 episode | score: 10.28 | loss: 12.04044 | epsilon: 0.01
4490 episode | score: 10.33 | loss: 10.63362 | epsilon: 0.01
4500 episode | score: 10.17 | loss: 12.11068 | epsilon: 0.01
4510 episode | score: 10.04 | loss: 9.83064 | epsilon: 0.01
4520 episode | score: 9.96 | loss: 12.98160 | epsilon: 0.01
4530 episode | score: 9.78 | loss: 9.32460 | epsilon: 0.01
4540 episode | score: 9.67 | loss: 11.56910 | epsilon: 0.01
4550 episode | score: 9.70 | loss: 13.24871 | epsilon: 0.01
4560 episode | score: 9.53 | loss: 10.25375 | epsilon: 0.01
4570 episode | score: 9.51 | loss: 11.35277 | epsilon: 0.01
4580 episode | score: 9.49 | loss: 10.24683 | epsilon: 0.01
4590 episode | score: 9.61 | loss: 15.10229 | epsilon: 0.01
4600 episode | score: 9.56 | loss: 10.60676 | epsilon: 0.01
4610 episode | score: 9.63 | loss: 12.66099 | epsilon: 0.01
4620 episode | score: 10.00 | loss: 9.28884 | epsilon: 0.01
4630 episode | score: 9.89 | loss: 7.29501 | epsilon: 0.01
4640 episode | score: 10.42 | loss: 8.61101 | epsilon: 0.01
4650 episode | score: 10.34 | loss: 7.17378 | epsilon: 0.01
4660 episode | score: 10.53 | loss: 8.58301 | epsilon: 0.01
4670 episode | score: 10.51 | loss: 5.47748 | epsilon: 0.01
4680 episode | score: 10.47 | loss: 6.90692 | epsilon: 0.01
4690 episode | score: 10.29 | loss: 9.03390 | epsilon: 0.01
4700 episode | score: 10.20 | loss: 5.70795 | epsilon: 0.01
4710 episode | score: 10.72 | loss: 9.54066 | epsilon: 0.01
4720 episode | score: 10.60 | loss: 12.63967 | epsilon: 0.01
4730 episode | score: 10.43 | loss: 4.03638 | epsilon: 0.01
4740 episode | score: 10.49 | loss: 6.36343 | epsilon: 0.01
4750 episode | score: 10.51 | loss: 7.95718 | epsilon: 0.01
4760 episode | score: 10.43 | loss: 7.94708 | epsilon: 0.01
4770 episode | score: 10.52 | loss: 4.00741 | epsilon: 0.01
4780 episode | score: 10.51 | loss: 9.49057 | epsilon: 0.01
4790 episode | score: 10.81 | loss: 7.92099 | epsilon: 0.01
4800 episode | score: 10.74 | loss: 3.20152 | epsilon: 0.01
4810 episode | score: 10.62 | loss: 5.55500 | epsilon: 0.01
4820 episode | score: 11.10 | loss: 4.79100 | epsilon: 0.01
4830 episode | score: 11.42 | loss: 7.12332 | epsilon: 0.01
4840 episode | score: 11.13 | loss: 4.76933 | epsilon: 0.01
4850 episode | score: 12.03 | loss: 7.91687 | epsilon: 0.01
4860 episode | score: 11.75 | loss: 4.77076 | epsilon: 0.01
4870 episode | score: 11.48 | loss: 7.90673 | epsilon: 0.01
4880 episode | score: 11.24 | loss: 8.69827 | epsilon: 0.01
4890 episode | score: 11.08 | loss: 7.91009 | epsilon: 0.01
4900 episode | score: 10.85 | loss: 6.32833 | epsilon: 0.01
4910 episode | score: 10.73 | loss: 7.97584 | epsilon: 0.01
4920 episode | score: 10.65 | loss: 7.13840 | epsilon: 0.01
4930 episode | score: 10.61 | loss: 7.24520 | epsilon: 0.01
4940 episode | score: 10.45 | loss: 11.09426 | epsilon: 0.01
4950 episode | score: 10.63 | loss: 8.69860 | epsilon: 0.01
4960 episode | score: 10.46 | loss: 9.52120 | epsilon: 0.01
4970 episode | score: 10.31 | loss: 8.69969 | epsilon: 0.01
4980 episode | score: 10.18 | loss: 7.11379 | epsilon: 0.01
4990 episode | score: 10.52 | loss: 11.04724 | epsilon: 0.01

--------------------------------------------------------------------------------
/src/.ipynb_checkpoints/plot_graphs-checkpoint.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}

--------------------------------------------------------------------------------
/src/__pycache__/config_DQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/config_DQN.cpython-37.pyc

--------------------------------------------------------------------------------
/src/__pycache__/config_DRQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/config_DRQN.cpython-37.pyc

--------------------------------------------------------------------------------
/src/__pycache__/config_DTQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/config_DTQN.cpython-37.pyc

--------------------------------------------------------------------------------
/src/__pycache__/memory.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/memory.cpython-37.pyc

--------------------------------------------------------------------------------
/src/__pycache__/model_DQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/model_DQN.cpython-37.pyc

--------------------------------------------------------------------------------
/src/__pycache__/model_DRQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/model_DRQN.cpython-37.pyc

--------------------------------------------------------------------------------
/src/__pycache__/model_DTQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/model_DTQN.cpython-37.pyc

--------------------------------------------------------------------------------
/src/bash_gen_trace.sh:
--------------------------------------------------------------------------------
for i in {1..10}
do
    echo "training DQN trace $i ..."
    python train_DQN.py > ../out/trace_DQN_"$i".txt
done

for i in {1..10}
do
    echo "training DRQN trace $i ..."
    python train_DRQN.py > ../out/trace_DRQN_"$i".txt
done

for i in {1..10}
do
    echo "training DTQN trace $i ..."
    python train_DTQN.py > ../out/trace_DTQN_"$i".txt
done

--------------------------------------------------------------------------------
/src/config_DQN.py:
--------------------------------------------------------------------------------
import torch

env_name = 'CartPole-v1'
gamma = 0.99
batch_size = 32
lr = 0.0001
initial_exploration = 1000
goal_score = 200
log_interval = 10
update_target = 100
replay_memory_capacity = 1000
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

sequence_length = 4

--------------------------------------------------------------------------------
/src/config_DRQN.py:
--------------------------------------------------------------------------------
import torch

env_name = 'CartPole-v1'
gamma = 0.99
batch_size = 32
lr = 0.001
initial_exploration = 1000
goal_score = 200
log_interval = 10
update_target = 100
replay_memory_capacity = 100
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

sequence_length = 8
burn_in_length = 4

--------------------------------------------------------------------------------
/src/config_DTQN.py:
--------------------------------------------------------------------------------
import torch

env_name = 'CartPole-v1'
gamma = 0.99
batch_size = 32
lr = 0.001
initial_exploration = 1000
goal_score = 200
log_interval = 10
update_target = 100
replay_memory_capacity = 100
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

sequence_length = 8
burn_in_length = 4

--------------------------------------------------------------------------------
/src/memory.py:
--------------------------------------------------------------------------------
import random
from collections import namedtuple, deque
from config_DQN import sequence_length as sequence_length_DQN
from config_DRQN import sequence_length as sequence_length_DRQN
from config_DTQN import sequence_length as sequence_length_DTQN
import numpy as np
import torch

Transition = namedtuple(
    'Transition', ('state', 'next_state', 'action', 'reward', 'mask')
)

class Memory_DQN(object):
    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)
        self.capacity = capacity

    def push(self, state, next_state, action, reward, mask):
        # state and next_state are deques of frames; stack each into one tensor
        self.memory.append(Transition(torch.stack(list(state)), torch.stack(list(next_state)), action, reward, mask))

    def sample(self, batch_size):
        transitions = random.sample(self.memory, batch_size)
        batch = Transition(*zip(*transitions))
        return batch

    def __len__(self):
        return len(self.memory)

class Memory_DRQN(object):
    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)
        self.local_memory = []
        self.capacity = capacity

    def push(self, state, next_state, action, reward, mask):
        self.local_memory.append(Transition(state, next_state, action, reward, mask))
        if mask == 0:
            # Episode ended: left-pad short episodes with zero transitions,
            # then store the whole episode as one replay entry.
            while len(self.local_memory) < sequence_length_DRQN:
                self.local_memory.insert(0, Transition(
                    torch.Tensor([0, 0]),
                    torch.Tensor([0, 0]),
                    0,
                    0,
                    0,
                ))
            self.memory.append(self.local_memory)
            self.local_memory = []

    def sample(self, batch_size):
        batch_state, batch_next_state, batch_action, batch_reward, batch_mask = [], [], [], [], []
        # Sample episodes with probability proportional to their length
        p = np.array([len(episode) for episode in self.memory])
        p = p / p.sum()

        batch_indexes = np.random.choice(np.arange(len(self.memory)), batch_size, p=p)

        for batch_idx in batch_indexes:
            episode = self.memory[batch_idx]

            # Take a random contiguous window of sequence_length_DRQN steps
            start = random.randint(0, len(episode) - sequence_length_DRQN)
            transitions = episode[start:start + sequence_length_DRQN]
            batch = Transition(*zip(*transitions))

            # print(batch.state)
            batch_state.append(torch.stack( list(map(lambda s: s.to('cpu'), list(batch.state))) ))
            batch_next_state.append(torch.stack( list(map(lambda s: s.to('cpu'), list(batch.next_state))) ))
            batch_action.append(torch.Tensor(list(batch.action) ))
            batch_reward.append(torch.Tensor(list(batch.reward)))
            batch_mask.append(torch.Tensor(list(batch.mask)))

        return Transition(batch_state, batch_next_state, batch_action, batch_reward, batch_mask)

    def __len__(self):
        return len(self.memory)

class Memory_DTQN(object):
    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)
        self.local_memory = []
        self.capacity = capacity

    def push(self, state, next_state, action, reward, mask):
        self.local_memory.append(Transition(state, next_state, action, reward, mask))
        if mask == 0:
            # Episode ended: left-pad short episodes with zero transitions,
            # then store the whole episode as one replay entry.
            while len(self.local_memory) < sequence_length_DTQN:
                self.local_memory.insert(0, Transition(
                    torch.Tensor([0, 0]),
                    torch.Tensor([0, 0]),
                    0,
                    0,
                    0,
                ))
            self.memory.append(self.local_memory)
            self.local_memory = []

    def sample(self, batch_size):
        batch_state, batch_next_state, batch_action, batch_reward, batch_mask = [], [], [], [], []
        # Sample episodes with probability proportional to their length
        p = np.array([len(episode) for episode in self.memory])
        p = p / p.sum()

        batch_indexes = np.random.choice(np.arange(len(self.memory)), batch_size, p=p)

        for batch_idx in batch_indexes:
            episode = self.memory[batch_idx]

            # Take a random contiguous window of sequence_length_DTQN steps
            start = random.randint(0, len(episode) - sequence_length_DTQN)
            transitions = episode[start:start + sequence_length_DTQN]
            batch = Transition(*zip(*transitions))

            # print(batch.state)
            batch_state.append(torch.stack( list(map(lambda s: s.to('cpu'), list(batch.state))) ))
            batch_next_state.append(torch.stack( list(map(lambda s: s.to('cpu'), list(batch.next_state))) ))
            batch_action.append(torch.Tensor(list(batch.action) ))
            batch_reward.append(torch.Tensor(list(batch.reward)))
            batch_mask.append(torch.Tensor(list(batch.mask)))

        return Transition(batch_state, batch_next_state, batch_action, batch_reward, batch_mask)

    def __len__(self):
        return len(self.memory)

--------------------------------------------------------------------------------
/src/model_DQN.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F
from config_DQN import gamma, sequence_length, device
# torch.manual_seed(0)

class QNet(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(QNet, self).__init__()
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs

        self.fc1 = nn.Linear(num_inputs * sequence_length, 128)
        self.fc2 = nn.Linear(128, num_outputs)

        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)

    def forward(self, x):
        # print(1, x.shape)
        seq_length = x.size(1)
        if seq_length != sequence_length:
            # Tile the input along dim 1 when it is shorter than sequence_length
            x = torch.cat([x]*(sequence_length-seq_length+1), dim=1)
            # print('in', x.shape)
        x = x.view(-1, self.num_inputs * sequence_length)
        # print(2, x.shape)
        x = F.relu(self.fc1(x))
        # print(3, x.shape)
        qvalue = self.fc2(x)
        return qvalue

    @classmethod
    def train_model(cls, online_net, target_net, optimizer, batch):
        states = torch.stack(batch.state).to(device)
        next_states = torch.stack(batch.next_state).to(device)
        actions = torch.Tensor(batch.action).float().to(device)
        rewards = torch.Tensor(batch.reward).to(device)
        masks = torch.Tensor(batch.mask).to(device)

        pred = online_net(states)
        next_pred = target_net(next_states)

        # actions are one-hot, so this selects Q(s, a) for the action taken
        pred = torch.sum(pred.mul(actions), dim=1)

        target = rewards + masks * gamma * next_pred.max(1)[0]

        loss = F.l1_loss(pred, target.detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        return loss

    def get_action(self, input):
        qvalue = self.forward(input)
        _, action = torch.max(qvalue, 1)
        return action.cpu().numpy()[0]

--------------------------------------------------------------------------------
/src/model_DRQN.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F

from config_DRQN import gamma, device, batch_size, sequence_length, burn_in_length
# torch.manual_seed(0)

class DRQN(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(DRQN, self).__init__()
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs

        self.lstm = nn.LSTM(input_size=num_inputs, hidden_size=128, batch_first=True)
        self.fc1 = nn.Linear(128, 256)
        self.fc2 = nn.Linear(256, num_outputs)

        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)

    def forward(self, x, hidden=None):
        # x [batch_size, sequence_length, num_inputs]

        if hidden is not None:
            out, hidden = self.lstm(x, hidden)
            # print('if', out.shape, hidden[0].shape, x.shape)
        else:
            out, hidden = self.lstm(x)
            # print('else', out.shape, hidden[0].shape, x.shape)
        out = F.relu(self.fc1(out))
        qvalue = self.fc2(out)

        return qvalue, hidden


    @classmethod
    def train_model(cls, online_net, target_net, optimizer, batch):
        def slice_burn_in(item):
            # Drop the first burn_in_length steps of every sequence
            return item[:, burn_in_length:, :]
        states = torch.stack(batch.state).view(batch_size, sequence_length, online_net.num_inputs).to(device)
        next_states = torch.stack(batch.next_state).view(batch_size, sequence_length, online_net.num_inputs).to(device)
        actions = torch.stack(batch.action).view(batch_size, sequence_length, -1).long().to(device)
        rewards = torch.stack(batch.reward).view(batch_size, sequence_length, -1).to(device)
        masks = torch.stack(batch.mask).view(batch_size, sequence_length, -1).to(device)

        pred, _ = online_net(states)
        next_pred, _ = target_net(next_states)

        pred = slice_burn_in(pred)
        next_pred = slice_burn_in(next_pred)
        actions = slice_burn_in(actions)
        rewards = slice_burn_in(rewards)
        masks = slice_burn_in(masks)

        pred = pred.gather(2, actions)
        # print('dbg', rewards.shape, masks.shape, next_states.shape, next_pred.shape)
        target = rewards + masks * gamma * next_pred.max(2, keepdim=True)[0]

        loss = F.l1_loss(pred, target.detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        return loss

    def get_action(self, state, hidden):
        # Add batch and sequence dimensions: [num_inputs] -> [1, 1, num_inputs]
        state = state.unsqueeze(0).unsqueeze(0)

        qvalue, hidden = self.forward(state, hidden)

        _, action = torch.max(qvalue, 2)

        return action.cpu().numpy()[0][0], hidden

--------------------------------------------------------------------------------
/src/model_DTQN.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F

from config_DTQN import gamma, device, batch_size, sequence_length, burn_in_length
# torch.manual_seed(0)

class DTQN(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(DTQN, self).__init__()
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs

        # Embed each observation into the transformer width (d_model=64)
        self.fc = nn.Linear(num_inputs, 64)
        self.Tlayer = nn.TransformerEncoderLayer(d_model=64, nhead=2)
        self.transformerE = nn.TransformerEncoder(self.Tlayer, num_layers=3)

        self.fc1 = nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, num_outputs)

        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)

    def forward(self, x, hidden=None):
        # nn.TransformerEncoder defaults to [sequence_length, batch_size, d_model]
        x = x.transpose(0,1)
        x = self.fc(x)
        out = self.transformerE(x)
        out = out.transpose(0,1)
        out = F.relu(self.fc1(out))
        qvalue = self.fc2(out)

        # hidden is passed through unchanged to mirror the DRQN interface
        return qvalue, hidden


    @classmethod
    def train_model(cls, online_net, target_net, optimizer, batch):
        def slice_burn_in(item):
            # Drop the first burn_in_length steps of every sequence
            return item[:, burn_in_length:, :]
        states = torch.stack(batch.state).view(batch_size, sequence_length, online_net.num_inputs).to(device)
        next_states = torch.stack(batch.next_state).view(batch_size, sequence_length, online_net.num_inputs).to(device)
        actions = torch.stack(batch.action).view(batch_size, sequence_length, -1).long().to(device)
        rewards = torch.stack(batch.reward).view(batch_size, sequence_length, -1).to(device)
        masks = torch.stack(batch.mask).view(batch_size, sequence_length, -1).to(device)

        pred, _ = online_net(states)
        next_pred, _ = target_net(next_states)

        pred = slice_burn_in(pred)
        next_pred = slice_burn_in(next_pred)
        actions = slice_burn_in(actions)
        rewards = slice_burn_in(rewards)
        masks = slice_burn_in(masks)

        pred = pred.gather(2, actions)

        target = rewards + masks * gamma * next_pred.max(2, keepdim=True)[0]

        loss = F.l1_loss(pred, target.detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        return loss

    def get_action(self, state, hidden):
        # Add batch and sequence dimensions: [num_inputs] -> [1, 1, num_inputs]
        state = state.unsqueeze(0).unsqueeze(0)

        qvalue, hidden = self.forward(state, hidden)

        _, action = torch.max(qvalue, 2)

        return action.cpu().numpy()[0][0], hidden

--------------------------------------------------------------------------------
/src/train_DQN.py:
--------------------------------------------------------------------------------
import os
import sys
import gym
import random
import numpy as np

import torch
import torch.optim as optim
import torch.nn.functional as F
from model_DQN import QNet
from memory import Memory_DQN as Memory
from tensorboardX import SummaryWriter

from config_DQN import env_name, initial_exploration, batch_size, update_target, goal_score, log_interval, device, replay_memory_capacity, lr, sequence_length
from collections import deque

# torch.manual_seed(0)
# random.seed(0)
# np.random.seed(0)

def get_action(state_series, target_net, epsilon, env):
    if np.random.rand() <= epsilon or len(state_series) < sequence_length:
        return env.action_space.sample()
    else:
        # NOTE: the stacked input is [sequence_length, num_inputs] with no batch
        # dimension, so QNet.forward reads num_inputs as the sequence length.
        return target_net.get_action(torch.stack(list(state_series)))

def update_target_model(online_net, target_net):
    # Target <- Net
    target_net.load_state_dict(online_net.state_dict())

def state_to_partial_observability(state):
    # Keep only observation indices 0 and 2
    state = state[[0, 2]]
    return state

def main():
    env = gym.make(env_name)
    env.seed(500)
    torch.manual_seed(500)

    num_inputs = 2
    num_actions = env.action_space.n
    print('state size:', num_inputs)
    print('action size:', num_actions)

    online_net = QNet(num_inputs, num_actions)
    target_net = QNet(num_inputs, num_actions)
    update_target_model(online_net, target_net)

    optimizer = optim.Adam(online_net.parameters(), lr=lr)
    N_EPISODES = 5000
    writer = SummaryWriter('logs')

    online_net.to(device)
    target_net.to(device)
    online_net.train()
    target_net.train()
    memory = Memory(replay_memory_capacity)
    running_score = 0
    epsilon = 1.0
    steps = 0
    loss = 0

    for e in range(N_EPISODES):
        done = False

        state_series = deque(maxlen=sequence_length)
        next_state_series = deque(maxlen=sequence_length)
        score = 0
        state = env.reset()

        state = state_to_partial_observability(state)
        state = torch.Tensor(state).to(device)

        # NOTE: next_state_series is only seeded here; nothing is appended to it
        # inside the loop, so stored next-state windows hold this single frame.
        next_state_series.append(state)
        while not done:
            steps += 1
            state_series.append(state)
            action = get_action(state_series, target_net, epsilon, env)
            next_state, reward, done, _ = env.step(action)

            next_state = state_to_partial_observability(next_state)
            next_state = torch.Tensor(next_state).to(device)


            mask = 0 if done else 1
            reward = reward if not done or score == 499 else -1
            action_one_hot = np.zeros(2)
            action_one_hot[action] = 1
            if len(state_series) >= sequence_length:
                memory.push(state_series, next_state_series, action_one_hot, reward, mask)

            score += reward
            state = next_state

            if steps > initial_exploration:
                epsilon -= 0.000005
                epsilon = max(epsilon, 0.01)

                batch = memory.sample(batch_size)
                loss = QNet.train_model(online_net, target_net, optimizer, batch)

                if steps % update_target == 0:
                    update_target_model(online_net, target_net)

        score = score if score == 500.0 else score + 1
        if running_score == 0:
            running_score = score
        else:
            # Exponential moving average of the episode score
            running_score = 0.99 * running_score + 0.01 * score
        if e % log_interval == 0:
            print('{} episode | score: {:.2f} | loss: {:.5f} | epsilon: {:.2f}'.format(
                e, running_score, loss, epsilon))
            writer.add_scalar('log/score', float(running_score), e)
            writer.add_scalar('log/loss', float(loss), e)

        if running_score > goal_score:
            break

if __name__=="__main__":
    main()

--------------------------------------------------------------------------------
/src/train_DRQN.py:
--------------------------------------------------------------------------------
import os
import sys
import gym
import random
import numpy as np

import torch
import torch.optim as optim
import torch.nn.functional as F
from model_DRQN import DRQN
from memory import Memory_DRQN as Memory
from tensorboardX import SummaryWriter

from config_DRQN import env_name, initial_exploration, batch_size, update_target, goal_score, log_interval, device, replay_memory_capacity, lr, sequence_length

from collections import deque
# torch.manual_seed(0)
# random.seed(0)
# np.random.seed(0)

def get_action(state, target_net, epsilon, env, hidden):
    # Always run the network so the recurrent hidden state stays up to date
    action, hidden = target_net.get_action(state, hidden)
    if np.random.rand() <= epsilon:
        return env.action_space.sample(), hidden
    else:
        return action, hidden

def update_target_model(online_net, target_net):
    # Target <- Net
    target_net.load_state_dict(online_net.state_dict())

def state_to_partial_observability(state):
    # print(state)
    state = state[[0, 2]]
    # print(state)
    return state

def main():
    env = gym.make(env_name)
    env.seed(500)
    torch.manual_seed(500)

    # num_inputs = env.observation_space.shape[0]
    num_inputs = 2
    num_actions = env.action_space.n
    print('state size:', num_inputs)
    print('action size:', num_actions)

    online_net = DRQN(num_inputs, num_actions)
    target_net = DRQN(num_inputs, num_actions)
    update_target_model(online_net, target_net)

    optimizer = optim.Adam(online_net.parameters(), lr=lr)
    N_EPISODES = 5000
    # scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, N_EPISODES)
    writer = SummaryWriter('logs')

    online_net.to(device)
    target_net.to(device)
    online_net.train()
    target_net.train()
    memory = Memory(replay_memory_capacity)
    running_score = 0
    epsilon = 1.0
    steps = 0
    loss = 0

    for e in range(N_EPISODES):
        done = False

        score = 0
        state = env.reset()
        state = state_to_partial_observability(state)
        state = torch.Tensor(state).to(device)

        hidden = None

        while not done:
            steps += 1

            # print(state.type(), hidden)
            action, hidden = get_action(state, target_net, epsilon, env, hidden)
            next_state, reward, done, _ = env.step(action)

            next_state = state_to_partial_observability(next_state)
            next_state = torch.Tensor(next_state).to(device)

            mask = 0 if done else 1
            reward = reward if not done or score == 499 else -1

            memory.push(state, next_state, action, reward, mask)

            score += reward
            state = next_state


            if steps > initial_exploration and len(memory) > batch_size:
                epsilon -= 0.00005
                epsilon = max(epsilon, 0.01)

                batch = memory.sample(batch_size)
                loss = DRQN.train_model(online_net, target_net, optimizer, batch)

                if steps % update_target == 0:
                    update_target_model(online_net, target_net)
                # scheduler.step()

        score = score if score == 500.0 else score + 1
        if running_score == 0:
            running_score = score
        else:
            # Exponential moving average of the episode score
            running_score = 0.99 * running_score + 0.01 * score
        if e % log_interval == 0:
            print('{} episode | score: {:.2f} | loss: {:.5f} | epsilon: {:.2f}'.format(
                e, running_score, loss, epsilon))
            writer.add_scalar('log/score', float(running_score), e)
            writer.add_scalar('log/loss', float(loss), e)

        if running_score > goal_score:
            break

if __name__=="__main__":
    main()

--------------------------------------------------------------------------------
/src/train_DTQN.py:
--------------------------------------------------------------------------------
import os
import sys
import gym
import random
import numpy as np

import torch
import torch.optim as optim
import torch.nn.functional as F
from model_DTQN import DTQN
from memory import Memory_DTQN as Memory
from tensorboardX import SummaryWriter

# Read the DTQN settings from config_DTQN (the values match config_DRQN)
from config_DTQN import env_name, initial_exploration, batch_size, update_target, goal_score, log_interval, device, replay_memory_capacity, lr, sequence_length

from collections import deque
# torch.manual_seed(0)
# random.seed(0)
# np.random.seed(0)

def get_action(state, target_net, epsilon, env, hidden):
    action, hidden = target_net.get_action(state, hidden)
    if np.random.rand() <= epsilon:
        return env.action_space.sample(), hidden
    else:
        return action, hidden

def update_target_model(online_net, target_net):
    # Target <- Net
    target_net.load_state_dict(online_net.state_dict())

def state_to_partial_observability(state):
    # print(state)
    state = state[[0, 2]]
    # print(state)
    return state

def main():
    env = gym.make(env_name)
    env.seed(500)
    torch.manual_seed(500)

    # num_inputs = env.observation_space.shape[0]
    num_inputs = 2
    num_actions = env.action_space.n
    print('state size:', num_inputs)
    print('action size:', num_actions)

    online_net = DTQN(num_inputs, num_actions)
    target_net = DTQN(num_inputs, num_actions)
    update_target_model(online_net, target_net)

    optimizer = optim.Adam(online_net.parameters(), lr=lr)
    N_EPISODES = 5000
    writer = SummaryWriter('logs')

    online_net.to(device)
    target_net.to(device)
    online_net.train()
    target_net.train()
    memory = Memory(replay_memory_capacity)
    running_score = 0
    epsilon = 1.0
    steps = 0
    loss = 0

    for e in range(N_EPISODES):
        done = False

        score = 0
        state = env.reset()
        state = state_to_partial_observability(state)
        state = torch.Tensor(state).to(device)

        hidden = None

        while not done:
            steps += 1

            action, hidden = get_action(state, target_net, epsilon, env, hidden)
            next_state, reward, done, _ = env.step(action)

            next_state = state_to_partial_observability(next_state)
            next_state = torch.Tensor(next_state).to(device)

            mask = 0 if done else 1
            reward = reward if not done or score == 499 else -1

            memory.push(state, next_state, action, reward, mask)

            score += reward
            state = next_state


            if steps > initial_exploration and len(memory) > batch_size:
                epsilon -= 0.00005
                epsilon = max(epsilon, 0.01)

                batch = memory.sample(batch_size)
                loss = DTQN.train_model(online_net, target_net, optimizer, batch)

                if steps % update_target == 0:
                    update_target_model(online_net, target_net)

        score = score if score == 500.0 else score + 1
        if running_score == 0:
            running_score = score
        else:
            # Exponential moving average of the episode score
            running_score = 0.99 * running_score + 0.01 * score
        if e % log_interval == 0:
            print('{} episode | score: {:.2f} | loss: {:.5f} | epsilon: {:.2f}'.format(
                e, running_score, loss, epsilon))
            writer.add_scalar('log/score', float(running_score), e)
            writer.add_scalar('log/loss', float(loss), e)

        if running_score > goal_score:
            break


if __name__=="__main__":
    main()

--------------------------------------------------------------------------------
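The `train_model` methods of `QNet`, `DRQN` and `DTQN` all build the same one-step Q-learning target, `rewards + masks * gamma * next_pred.max(...)`, and minimise an L1 loss against it. The arithmetic can be sketched in plain Python without PyTorch. This is only an illustration under stated assumptions: the helper names `td_target` and `l1_loss` are hypothetical and do not exist in the repository, and `GAMMA` simply mirrors `gamma = 0.99` from the config files.

```python
GAMMA = 0.99  # assumed to mirror `gamma` in config_DQN/DRQN/DTQN.py


def td_target(reward, mask, next_q_values, gamma=GAMMA):
    """One-step target: reward + mask * gamma * max over next-state Q-values.

    `mask` is 0 on the terminal step and 1 otherwise, so the bootstrap
    term is dropped when the episode has ended.
    """
    return reward + mask * gamma * max(next_q_values)


def l1_loss(predictions, targets):
    """Mean absolute error between predicted Q-values and their targets."""
    total = sum(abs(p - t) for p, t in zip(predictions, targets))
    return total / len(predictions)


if __name__ == "__main__":
    # Non-terminal step: the target bootstraps from the best next action.
    print(td_target(reward=1.0, mask=1, next_q_values=[2.0, 3.0]))
    # Terminal step: the target reduces to the reward alone.
    print(td_target(reward=-1.0, mask=0, next_q_values=[2.0, 3.0]))
    # Loss over a toy batch of two predictions.
    print(l1_loss([1.0, 2.0], [2.0, 4.0]))
```

With `mask = 0` the bootstrap term vanishes, which is why the training loops pair an early terminal step with `mask = 0` and a reward of `-1`: the target for that transition is just the penalty.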