├── .gitignore
├── README.md
├── environment.yml
├── images
│   ├── DQN_traces.png
│   ├── DRQN_traces.png
│   ├── DTQN_traces.png
│   ├── mean_loss_comparison.png
│   └── mean_scores_comparison.png
├── out
│   ├── trace_DQN_1.txt
│   ├── trace_DQN_10.txt
│   ├── trace_DQN_2.txt
│   ├── trace_DQN_3.txt
│   ├── trace_DQN_4.txt
│   ├── trace_DQN_5.txt
│   ├── trace_DQN_6.txt
│   ├── trace_DQN_7.txt
│   ├── trace_DQN_8.txt
│   ├── trace_DQN_9.txt
│   ├── trace_DRQN_1.txt
│   ├── trace_DRQN_10.txt
│   ├── trace_DRQN_2.txt
│   ├── trace_DRQN_3.txt
│   ├── trace_DRQN_4.txt
│   ├── trace_DRQN_5.txt
│   ├── trace_DRQN_6.txt
│   ├── trace_DRQN_7.txt
│   ├── trace_DRQN_8.txt
│   ├── trace_DRQN_9.txt
│   ├── trace_DTQN_1.txt
│   ├── trace_DTQN_10.txt
│   ├── trace_DTQN_2.txt
│   ├── trace_DTQN_3.txt
│   ├── trace_DTQN_4.txt
│   ├── trace_DTQN_5.txt
│   ├── trace_DTQN_6.txt
│   ├── trace_DTQN_7.txt
│   ├── trace_DTQN_8.txt
│   └── trace_DTQN_9.txt
└── src
    ├── .ipynb_checkpoints
    │   └── plot_graphs-checkpoint.ipynb
    ├── __pycache__
    │   ├── config_DQN.cpython-37.pyc
    │   ├── config_DRQN.cpython-37.pyc
    │   ├── config_DTQN.cpython-37.pyc
    │   ├── memory.cpython-37.pyc
    │   ├── model_DQN.cpython-37.pyc
    │   ├── model_DRQN.cpython-37.pyc
    │   └── model_DTQN.cpython-37.pyc
    ├── bash_gen_trace.sh
    ├── config_DQN.py
    ├── config_DRQN.py
    ├── config_DTQN.py
    ├── memory.py
    ├── model_DQN.py
    ├── model_DRQN.py
    ├── model_DTQN.py
    ├── plot_graphs.ipynb
    ├── train_DQN.py
    ├── train_DRQN.py
    └── train_DTQN.py
/.gitignore:
--------------------------------------------------------------------------------
1 | src/logs/*
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Transformer Based Reinforcement Learning for Games
2 |
3 | This repository contains experimental models, written in PyTorch, that incorporate transformers into Deep Q-Learning, to test whether they perform better than the RNN-based variant (DRQN) or a plain DQN.
4 |
5 |
6 | # Requirements
7 | ```
8 | * OpenAI Gym
9 | * PyTorch >= 1.0.0
10 | * Python 3.6+
11 | * Conda (recommended for building the environment)
12 | * tensorboardx==1.9
13 | * tensorflow==1.14.0 (the CPU-only version is enough; it is only needed for TensorBoard)
14 |
15 | (environment.yml provides the full list of dependencies)
16 | ```
17 |
18 | # How to run experiments?
19 |
20 | Currently, we experiment with the `cartpole` environment and compare three
21 | algorithms: DQN, DRQN (using an LSTM) and a transformer-based model called DTQN.
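All three agents are Q-learning methods that presumably differ mainly in the network mapping observations (or observation histories) to Q-values. As a framework-free illustration of the learning signal they share, the sketch below computes the standard one-step DQN target r + gamma * max_a' Q_target(s', a') for a batch. The function name, the discount `gamma=0.99` and the plain-list representation are assumptions for illustration, not taken from train_*.py or config_*.py.

```python
from typing import List, Sequence


def q_learning_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,  # assumed discount factor, not read from config_*.py
) -> List[float]:
    """One-step DQN targets: y = r + gamma * max_a' Q_target(s', a') * (1 - done)."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # Terminal transitions do not bootstrap from the next state.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


if __name__ == "__main__":
    # Two transitions with 2 actions each; the second one ends the episode.
    print(q_learning_targets([1.0, 1.0], [[0.5, 2.0], [3.0, 1.0]], [False, True]))
```

The network (MLP, LSTM or transformer) only changes how `next_q_values` are produced; the target itself stays the same.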
22 |
23 | The repository is structured as follows:
24 | ```
25 | -src/
26 | |-config_*.py (config files of a particular algorithm)
27 | |-model_*.py (model definition for a particular algorithm)
28 | |-train_*.py (training file of a particular algorithm)
29 | |-memory.py (experience replay memory buffer)
30 |
31 | -out/
32 | |-trace_*.txt (traces obtained by different algorithms)
33 | ```
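memory.py holds the replay buffer shared by the training scripts. A minimal sketch of such a buffer, assuming a fixed capacity and uniform sampling, is below. The class name `ReplayMemory`, the `Transition` fields and the `sample_sequences` helper (contiguous windows, the kind of input recurrent or transformer Q-networks typically consume) are hypothetical and not taken from the repository's memory.py.

```python
import random
from collections import deque, namedtuple
from typing import List

# Hypothetical transition layout; memory.py may store different fields.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayMemory:
    """Fixed-size FIFO experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, *args) -> None:
        self.buffer.append(Transition(*args))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, as in standard DQN.
        return random.sample(list(self.buffer), batch_size)

    def sample_sequences(self, batch_size: int, seq_len: int) -> List[List[Transition]]:
        # Contiguous windows of transitions for sequence models (DRQN/DTQN-style input).
        # In this simple version a window may straddle an episode boundary.
        data = list(self.buffer)
        starts = random.sample(range(len(data) - seq_len + 1), batch_size)
        return [data[s:s + seq_len] for s in starts]

    def __len__(self) -> int:
        return len(self.buffer)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=1000)
    for step in range(20):
        memory.push([0.0, 0.0], step % 2, 1.0, [0.0, 0.0], False)
    print(len(memory), len(memory.sample(8)), len(memory.sample_sequences(4, 5)))
```

A recurrent or attention-based Q-network needs a history of observations per sample, which is why a sequence sampler sits alongside the usual transition sampler.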
34 |
35 | To run a particular algorithm (say DQN), run ``python train_DQN.py`` from inside `src/`; this will generate the trace for that algorithm.
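The out/ directory holds ten traces per algorithm, named trace_<ALG>_<run>.txt. One way to produce them from inside `src/` is to run each training script repeatedly and redirect its stdout to the matching file. The sketch below assumes the training scripts print their log to stdout and take no arguments. It is not the repository's bash_gen_trace.sh, whose contents are not shown here, and `build_jobs`/`run_jobs` are hypothetical names.

```python
import os
import subprocess
import sys
from typing import List, Tuple

ALGORITHMS = ["DQN", "DRQN", "DTQN"]


def build_jobs(
    algorithms: List[str], n_runs: int, out_dir: str = os.path.join("..", "out")
) -> List[Tuple[List[str], str]]:
    """Return (command, trace path) pairs following the out/trace_<ALG>_<run>.txt naming."""
    jobs = []
    for alg in algorithms:
        for run in range(1, n_runs + 1):
            cmd = [sys.executable, f"train_{alg}.py"]
            path = os.path.join(out_dir, f"trace_{alg}_{run}.txt")
            jobs.append((cmd, path))
    return jobs


def run_jobs(jobs: List[Tuple[List[str], str]]) -> None:
    """Run each training script sequentially, writing its stdout to its trace file."""
    for cmd, path in jobs:
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w") as trace:
            subprocess.run(cmd, stdout=trace, check=True)


if __name__ == "__main__":
    # Assumes the current working directory is src/, next to the train_*.py scripts.
    run_jobs(build_jobs(ALGORITHMS, n_runs=10))
```

Keeping job construction separate from execution makes it easy to inspect or parallelize the 30 runs before launching them.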
36 |
37 | # Results
38 |
39 | We ran multiple experiments for each algorithm (DQN, DRQN and DTQN). Each algorithm was trained for 5000 episodes, and we ran 10 independent instances of each algorithm with different random initializations. The following plots show the score over episodes for the different runs.
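Each trace in out/ starts with two header lines (`state size`, `action size`) followed by one line every 10 episodes of the form `<episode> episode | score: <s> | loss: <l> | epsilon: <e>`. The plotting lives in plot_graphs.ipynb, which is not shown here. As a hedged sketch of how per-run traces could be averaged into a mean score curve like the one in images/mean_scores_comparison.png, the hypothetical functions `parse_trace` and `mean_curve` below parse that line format and average a metric across runs at each logged episode.

```python
import glob
import re
from typing import Dict, Iterable, List

# Matches lines such as "10 episode | score: 40.06 | loss: 0.00000 | epsilon: 1.00".
LINE_RE = re.compile(
    r"^(\d+) episode \| score: ([\d.]+) \| loss: ([\d.]+) \| epsilon: ([\d.]+)"
)


def parse_trace(lines: Iterable[str]) -> Dict[int, Dict[str, float]]:
    """Map episode -> {score, loss, epsilon}; header lines are skipped."""
    records = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ep, score, loss, eps = m.groups()
            records[int(ep)] = {"score": float(score), "loss": float(loss), "epsilon": float(eps)}
    return records


def mean_curve(runs: List[Dict[int, Dict[str, float]]], key: str = "score") -> Dict[int, float]:
    """Average `key` over runs at every episode that all runs logged."""
    common = set.intersection(*(set(r) for r in runs))
    return {ep: sum(r[ep][key] for r in runs) / len(runs) for ep in sorted(common)}


if __name__ == "__main__":
    # Assumes this is run from the repository root, next to out/.
    runs = []
    for path in sorted(glob.glob("out/trace_DTQN_*.txt")):
        with open(path) as f:
            runs.append(parse_trace(f))
    if runs:
        print(list(mean_curve(runs).items())[:3])
```

Passing `key="loss"` instead gives the data behind a mean loss comparison.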
40 |
41 | 
42 | 
43 | 
44 |
45 |
46 |
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | name: cixd_rl
2 | channels:
3 | - pytorch
4 | - defaults
5 | dependencies:
6 | - _libgcc_mutex=0.1=main
7 | - blas=1.0=mkl
8 | - ca-certificates=2019.8.28=0
9 | - certifi=2019.9.11=py37_0
10 | - cffi=1.12.3=py37h2e261b9_0
11 | - cudatoolkit=10.0.130=0
12 | - freetype=2.9.1=h8a8886c_1
13 | - intel-openmp=2019.4=243
14 | - jpeg=9b=h024ee3a_2
15 | - libedit=3.1.20181209=hc058e9b_0
16 | - libffi=3.2.1=hd88cf55_4
17 | - libgcc-ng=9.1.0=hdf63c60_0
18 | - libgfortran-ng=7.3.0=hdf63c60_0
19 | - libpng=1.6.37=hbc83047_0
20 | - libstdcxx-ng=9.1.0=hdf63c60_0
21 | - libtiff=4.0.10=h2733197_2
22 | - mkl=2019.4=243
23 | - mkl-service=2.3.0=py37he904b0f_0
24 | - mkl_fft=1.0.14=py37ha843d7b_0
25 | - mkl_random=1.1.0=py37hd6b4f25_0
26 | - ncurses=6.1=he6710b0_1
27 | - ninja=1.9.0=py37hfd86e86_0
28 | - numpy=1.16.5=py37h7e9f1db_0
29 | - numpy-base=1.16.5=py37hde5b4d6_0
30 | - olefile=0.46=py37_0
31 | - openssl=1.1.1d=h7b6447c_1
32 | - pillow=6.1.0=py37h34e0f95_0
33 | - pip=19.2.3=py37_0
34 | - pycparser=2.19=py37_0
35 | - python=3.7.4=h265db76_1
36 | - readline=7.0=h7b6447c_5
37 | - setuptools=41.2.0=py37_0
38 | - six=1.12.0=py37_0
39 | - sqlite=3.29.0=h7b6447c_0
40 | - tk=8.6.8=hbc83047_0
41 | - wheel=0.33.6=py37_0
42 | - xz=5.2.4=h14c3975_4
43 | - zlib=1.2.11=h7b6447c_3
44 | - zstd=1.3.7=h0b5b093_0
45 | - pytorch=1.2.0=py3.7_cuda10.0.130_cudnn7.6.2_0
46 | - torchvision=0.4.0=py37_cu100
47 | - pip:
48 | - absl-py==0.8.1
49 | - astor==0.8.0
50 | - atari-py==0.2.6
51 | - attrs==19.3.0
52 | - backcall==0.1.0
53 | - baselines==0.1.6
54 | - bleach==3.1.0
55 | - click==7.0
56 | - cloudpickle==1.2.2
57 | - cycler==0.10.0
58 | - decorator==4.4.1
59 | - defusedxml==0.6.0
60 | - entrypoints==0.3
61 | - future==0.18.2
62 | - gast==0.3.2
63 | - google-pasta==0.1.8
64 | - grpcio==1.25.0
65 | - gym==0.15.4
66 | - gym-ple==0.3
67 | - h5py==2.10.0
68 | - importlib-metadata==0.23
69 | - ipykernel==5.1.3
70 | - ipython==7.9.0
71 | - ipython-genutils==0.2.0
72 | - ipywidgets==7.5.1
73 | - jedi==0.15.1
74 | - jinja2==2.10.3
75 | - joblib==0.14.0
76 | - jsonschema==3.1.1
77 | - jupyter==1.0.0
78 | - jupyter-client==5.3.4
79 | - jupyter-console==6.0.0
80 | - jupyter-core==4.6.1
81 | - keras-applications==1.0.8
82 | - keras-preprocessing==1.1.0
83 | - kiwisolver==1.1.0
84 | - markdown==3.1.1
85 | - markupsafe==1.1.1
86 | - matplotlib==3.1.1
87 | - mistune==0.8.4
88 | - more-itertools==7.2.0
89 | - nbconvert==5.6.1
90 | - nbformat==4.4.0
91 | - notebook==6.0.2
92 | - opencv-python==4.1.1.26
93 | - pandas==0.25.3
94 | - pandocfilters==1.4.2
95 | - parso==0.5.1
96 | - pexpect==4.7.0
97 | - pickleshare==0.7.5
98 | - ple==0.0.1
99 | - prometheus-client==0.7.1
100 | - prompt-toolkit==2.0.10
101 | - protobuf==3.10.0
102 | - ptyprocess==0.6.0
103 | - pyglet==1.3.2
104 | - pygments==2.4.2
105 | - pyparsing==2.4.5
106 | - pyrsistent==0.15.5
107 | - python-dateutil==2.8.1
108 | - pytz==2019.3
109 | - pyzmq==18.1.1
110 | - qtconsole==4.5.5
111 | - scipy==1.3.2
112 | - seaborn==0.9.0
113 | - send2trash==1.5.0
114 | - tensorboard==1.14.0
115 | - tensorboardx==1.9
116 | - tensorflow==1.14.0
117 | - tensorflow-estimator==1.14.0
118 | - termcolor==1.1.0
119 | - terminado==0.8.3
120 | - testpath==0.4.4
121 | - torch==1.2.0
122 | - tornado==6.0.3
123 | - tqdm==4.38.0
124 | - traitlets==4.3.3
125 | - wcwidth==0.1.7
126 | - webencodings==0.5.1
127 | - werkzeug==0.16.0
128 | - widgetsnbextension==3.5.1
129 | - wrapt==1.11.2
130 | - zipp==0.6.0
131 |
132 |
--------------------------------------------------------------------------------
/images/DQN_traces.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/DQN_traces.png
--------------------------------------------------------------------------------
/images/DRQN_traces.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/DRQN_traces.png
--------------------------------------------------------------------------------
/images/DTQN_traces.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/DTQN_traces.png
--------------------------------------------------------------------------------
/images/mean_loss_comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/mean_loss_comparison.png
--------------------------------------------------------------------------------
/images/mean_scores_comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/images/mean_scores_comparison.png
--------------------------------------------------------------------------------
/out/trace_DTQN_1.txt:
--------------------------------------------------------------------------------
1 | state size: 2
2 | action size: 2
3 | 0 episode | score: 42.00 | loss: 0.00000 | epsilon: 1.00
4 | 10 episode | score: 40.06 | loss: 0.00000 | epsilon: 1.00
5 | 20 episode | score: 38.51 | loss: 0.00000 | epsilon: 1.00
6 | 30 episode | score: 37.15 | loss: 0.00000 | epsilon: 1.00
7 | 40 episode | score: 35.32 | loss: 0.00000 | epsilon: 1.00
8 | 50 episode | score: 34.80 | loss: 0.16093 | epsilon: 0.99
9 | 60 episode | score: 34.03 | loss: 0.16544 | epsilon: 0.97
10 | 70 episode | score: 32.72 | loss: 0.41782 | epsilon: 0.96
11 | 80 episode | score: 30.96 | loss: 0.35763 | epsilon: 0.95
12 | 90 episode | score: 30.62 | loss: 1.54800 | epsilon: 0.94
13 | 100 episode | score: 29.44 | loss: 0.29163 | epsilon: 0.93
14 | 110 episode | score: 28.56 | loss: 0.46364 | epsilon: 0.92
15 | 120 episode | score: 27.74 | loss: 0.44584 | epsilon: 0.91
16 | 130 episode | score: 27.39 | loss: 0.37214 | epsilon: 0.90
17 | 140 episode | score: 27.27 | loss: 0.23095 | epsilon: 0.88
18 | 150 episode | score: 26.95 | loss: 0.43954 | epsilon: 0.87
19 | 160 episode | score: 26.27 | loss: 0.45517 | epsilon: 0.86
20 | 170 episode | score: 25.34 | loss: 1.56370 | epsilon: 0.85
21 | 180 episode | score: 24.77 | loss: 0.74127 | epsilon: 0.84
22 | 190 episode | score: 24.02 | loss: 0.54314 | epsilon: 0.83
23 | 200 episode | score: 23.12 | loss: 0.88684 | epsilon: 0.82
24 | 210 episode | score: 22.73 | loss: 1.21659 | epsilon: 0.81
25 | 220 episode | score: 22.60 | loss: 1.68564 | epsilon: 0.80
26 | 230 episode | score: 22.24 | loss: 0.63498 | epsilon: 0.79
27 | 240 episode | score: 22.67 | loss: 0.67226 | epsilon: 0.78
28 | 250 episode | score: 22.03 | loss: 0.64331 | epsilon: 0.77
29 | 260 episode | score: 22.30 | loss: 0.37808 | epsilon: 0.76
30 | 270 episode | score: 21.48 | loss: 1.64759 | epsilon: 0.75
31 | 280 episode | score: 21.68 | loss: 1.39732 | epsilon: 0.74
32 | 290 episode | score: 21.12 | loss: 1.54835 | epsilon: 0.73
33 | 300 episode | score: 21.05 | loss: 0.41286 | epsilon: 0.72
34 | 310 episode | score: 20.76 | loss: 1.75307 | epsilon: 0.71
35 | 320 episode | score: 20.23 | loss: 0.77339 | epsilon: 0.70
36 | 330 episode | score: 19.58 | loss: 1.25978 | epsilon: 0.69
37 | 340 episode | score: 19.76 | loss: 0.42972 | epsilon: 0.68
38 | 350 episode | score: 19.78 | loss: 0.81583 | epsilon: 0.67
39 | 360 episode | score: 20.31 | loss: 1.31229 | epsilon: 0.66
40 | 370 episode | score: 20.49 | loss: 0.85073 | epsilon: 0.65
41 | 380 episode | score: 20.23 | loss: 1.30102 | epsilon: 0.64
42 | 390 episode | score: 19.76 | loss: 0.87183 | epsilon: 0.63
43 | 400 episode | score: 19.27 | loss: 2.18815 | epsilon: 0.62
44 | 410 episode | score: 18.73 | loss: 1.33286 | epsilon: 0.62
45 | 420 episode | score: 18.34 | loss: 1.35239 | epsilon: 0.61
46 | 430 episode | score: 17.94 | loss: 3.20092 | epsilon: 0.60
47 | 440 episode | score: 17.70 | loss: 2.29407 | epsilon: 0.59
48 | 450 episode | score: 17.30 | loss: 2.79657 | epsilon: 0.58
49 | 460 episode | score: 16.82 | loss: 0.52218 | epsilon: 0.58
50 | 470 episode | score: 16.39 | loss: 2.35247 | epsilon: 0.57
51 | 480 episode | score: 16.28 | loss: 1.91684 | epsilon: 0.56
52 | 490 episode | score: 16.54 | loss: 3.37298 | epsilon: 0.55
53 | 500 episode | score: 16.32 | loss: 2.42507 | epsilon: 0.55
54 | 510 episode | score: 16.00 | loss: 2.96815 | epsilon: 0.54
55 | 520 episode | score: 15.60 | loss: 1.01627 | epsilon: 0.53
56 | 530 episode | score: 15.35 | loss: 2.97971 | epsilon: 0.53
57 | 540 episode | score: 15.45 | loss: 5.00929 | epsilon: 0.52
58 | 550 episode | score: 15.29 | loss: 0.59140 | epsilon: 0.51
59 | 560 episode | score: 15.29 | loss: 2.05343 | epsilon: 0.50
60 | 570 episode | score: 14.97 | loss: 1.67497 | epsilon: 0.49
61 | 580 episode | score: 14.70 | loss: 1.11788 | epsilon: 0.49
62 | 590 episode | score: 14.39 | loss: 2.09961 | epsilon: 0.48
63 | 600 episode | score: 14.15 | loss: 1.91011 | epsilon: 0.47
64 | 610 episode | score: 14.05 | loss: 1.59686 | epsilon: 0.47
65 | 620 episode | score: 13.69 | loss: 4.83170 | epsilon: 0.46
66 | 630 episode | score: 13.44 | loss: 2.16104 | epsilon: 0.46
67 | 640 episode | score: 13.62 | loss: 1.68080 | epsilon: 0.45
68 | 650 episode | score: 13.47 | loss: 4.89931 | epsilon: 0.44
69 | 660 episode | score: 13.48 | loss: 4.99668 | epsilon: 0.43
70 | 670 episode | score: 13.80 | loss: 2.77219 | epsilon: 0.43
71 | 680 episode | score: 13.58 | loss: 3.95906 | epsilon: 0.42
72 | 690 episode | score: 13.37 | loss: 3.97788 | epsilon: 0.41
73 | 700 episode | score: 13.37 | loss: 3.35664 | epsilon: 0.41
74 | 710 episode | score: 13.21 | loss: 2.86256 | epsilon: 0.40
75 | 720 episode | score: 12.90 | loss: 3.48034 | epsilon: 0.39
76 | 730 episode | score: 12.65 | loss: 5.11371 | epsilon: 0.39
77 | 740 episode | score: 12.52 | loss: 2.88403 | epsilon: 0.38
78 | 750 episode | score: 12.88 | loss: 3.52321 | epsilon: 0.37
79 | 760 episode | score: 12.65 | loss: 5.20732 | epsilon: 0.37
80 | 770 episode | score: 12.58 | loss: 5.21305 | epsilon: 0.36
81 | 780 episode | score: 12.32 | loss: 5.81512 | epsilon: 0.36
82 | 790 episode | score: 12.32 | loss: 5.29636 | epsilon: 0.35
83 | 800 episode | score: 12.15 | loss: 3.56297 | epsilon: 0.34
84 | 810 episode | score: 12.21 | loss: 3.53993 | epsilon: 0.34
85 | 820 episode | score: 12.08 | loss: 4.17779 | epsilon: 0.33
86 | 830 episode | score: 11.95 | loss: 6.10416 | epsilon: 0.32
87 | 840 episode | score: 11.68 | loss: 2.46521 | epsilon: 0.32
88 | 850 episode | score: 11.59 | loss: 4.23309 | epsilon: 0.31
89 | 860 episode | score: 11.50 | loss: 4.83049 | epsilon: 0.31
90 | 870 episode | score: 11.56 | loss: 7.28215 | epsilon: 0.30
91 | 880 episode | score: 11.42 | loss: 6.71525 | epsilon: 0.30
92 | 890 episode | score: 11.60 | loss: 4.95379 | epsilon: 0.29
93 | 900 episode | score: 11.63 | loss: 4.90676 | epsilon: 0.28
94 | 910 episode | score: 11.52 | loss: 5.55674 | epsilon: 0.28
95 | 920 episode | score: 11.38 | loss: 6.19411 | epsilon: 0.27
96 | 930 episode | score: 11.26 | loss: 3.80527 | epsilon: 0.27
97 | 940 episode | score: 11.34 | loss: 5.21086 | epsilon: 0.26
98 | 950 episode | score: 11.27 | loss: 6.26882 | epsilon: 0.25
99 | 960 episode | score: 11.08 | loss: 3.15750 | epsilon: 0.25
100 | 970 episode | score: 11.38 | loss: 5.66316 | epsilon: 0.24
101 | 980 episode | score: 11.24 | loss: 6.99554 | epsilon: 0.23
102 | 990 episode | score: 11.03 | loss: 5.79221 | epsilon: 0.23
103 | 1000 episode | score: 11.01 | loss: 2.61911 | epsilon: 0.22
104 | 1010 episode | score: 10.89 | loss: 4.51524 | epsilon: 0.22
105 | 1020 episode | score: 10.75 | loss: 4.46806 | epsilon: 0.21
106 | 1030 episode | score: 10.56 | loss: 7.10737 | epsilon: 0.21
107 | 1040 episode | score: 10.48 | loss: 6.40347 | epsilon: 0.20
108 | 1050 episode | score: 10.52 | loss: 5.79626 | epsilon: 0.20
109 | 1060 episode | score: 10.48 | loss: 5.86367 | epsilon: 0.19
110 | 1070 episode | score: 10.32 | loss: 7.76478 | epsilon: 0.19
111 | 1080 episode | score: 10.26 | loss: 2.62518 | epsilon: 0.18
112 | 1090 episode | score: 10.23 | loss: 4.57249 | epsilon: 0.18
113 | 1100 episode | score: 10.12 | loss: 8.46477 | epsilon: 0.17
114 | 1110 episode | score: 9.98 | loss: 4.59900 | epsilon: 0.17
115 | 1120 episode | score: 9.91 | loss: 6.57322 | epsilon: 0.16
116 | 1130 episode | score: 9.92 | loss: 7.20591 | epsilon: 0.16
117 | 1140 episode | score: 9.82 | loss: 8.53715 | epsilon: 0.15
118 | 1150 episode | score: 9.81 | loss: 6.61314 | epsilon: 0.15
119 | 1160 episode | score: 9.72 | loss: 6.60741 | epsilon: 0.14
120 | 1170 episode | score: 9.86 | loss: 8.11260 | epsilon: 0.13
121 | 1180 episode | score: 9.93 | loss: 6.63596 | epsilon: 0.13
122 | 1190 episode | score: 9.79 | loss: 6.67112 | epsilon: 0.12
123 | 1200 episode | score: 9.77 | loss: 6.09025 | epsilon: 0.12
124 | 1210 episode | score: 9.65 | loss: 4.76503 | epsilon: 0.11
125 | 1220 episode | score: 9.56 | loss: 4.09326 | epsilon: 0.11
126 | 1230 episode | score: 9.73 | loss: 8.71921 | epsilon: 0.10
127 | 1240 episode | score: 9.89 | loss: 8.09039 | epsilon: 0.10
128 | 1250 episode | score: 9.74 | loss: 7.44783 | epsilon: 0.09
129 | 1260 episode | score: 9.70 | loss: 7.45099 | epsilon: 0.09
130 | 1270 episode | score: 9.69 | loss: 8.78152 | epsilon: 0.08
131 | 1280 episode | score: 9.59 | loss: 8.14003 | epsilon: 0.08
132 | 1290 episode | score: 9.50 | loss: 6.83141 | epsilon: 0.07
133 | 1300 episode | score: 9.38 | loss: 8.14618 | epsilon: 0.07
134 | 1310 episode | score: 9.53 | loss: 8.83955 | epsilon: 0.06
135 | 1320 episode | score: 9.47 | loss: 10.88557 | epsilon: 0.06
136 | 1330 episode | score: 9.53 | loss: 10.25268 | epsilon: 0.05
137 | 1340 episode | score: 9.47 | loss: 7.50952 | epsilon: 0.05
138 | 1350 episode | score: 9.41 | loss: 6.84119 | epsilon: 0.04
139 | 1360 episode | score: 9.33 | loss: 7.62696 | epsilon: 0.04
140 | 1370 episode | score: 9.28 | loss: 8.29397 | epsilon: 0.03
141 | 1380 episode | score: 9.23 | loss: 8.93910 | epsilon: 0.03
142 | 1390 episode | score: 9.21 | loss: 8.32343 | epsilon: 0.02
143 | 1400 episode | score: 9.16 | loss: 11.03749 | epsilon: 0.02
144 | 1410 episode | score: 9.22 | loss: 11.10894 | epsilon: 0.01
145 | 1420 episode | score: 9.10 | loss: 9.04405 | epsilon: 0.01
146 | 1430 episode | score: 9.03 | loss: 10.40970 | epsilon: 0.01
147 | 1440 episode | score: 8.99 | loss: 9.06423 | epsilon: 0.01
148 | 1450 episode | score: 8.95 | loss: 9.79906 | epsilon: 0.01
149 | 1460 episode | score: 8.91 | loss: 10.45152 | epsilon: 0.01
150 | 1470 episode | score: 8.84 | loss: 11.17154 | epsilon: 0.01
151 | 1480 episode | score: 8.81 | loss: 13.30375 | epsilon: 0.01
152 | 1490 episode | score: 8.78 | loss: 10.50567 | epsilon: 0.01
153 | 1500 episode | score: 8.75 | loss: 11.23524 | epsilon: 0.01
154 | 1510 episode | score: 8.91 | loss: 10.59758 | epsilon: 0.01
155 | 1520 episode | score: 8.88 | loss: 7.06622 | epsilon: 0.01
156 | 1530 episode | score: 8.87 | loss: 11.27084 | epsilon: 0.01
157 | 1540 episode | score: 8.81 | loss: 13.44193 | epsilon: 0.01
158 | 1550 episode | score: 8.94 | loss: 6.37859 | epsilon: 0.01
159 | 1560 episode | score: 9.02 | loss: 11.31540 | epsilon: 0.01
160 | 1570 episode | score: 8.95 | loss: 9.99572 | epsilon: 0.01
161 | 1580 episode | score: 8.93 | loss: 9.93918 | epsilon: 0.01
162 | 1590 episode | score: 8.90 | loss: 7.20135 | epsilon: 0.01
163 | 1600 episode | score: 8.87 | loss: 7.13101 | epsilon: 0.01
164 | 1610 episode | score: 8.82 | loss: 9.29032 | epsilon: 0.01
165 | 1620 episode | score: 8.86 | loss: 9.27523 | epsilon: 0.01
166 | 1630 episode | score: 8.83 | loss: 9.26597 | epsilon: 0.01
167 | 1640 episode | score: 8.76 | loss: 11.53603 | epsilon: 0.01
168 | 1650 episode | score: 8.92 | loss: 10.00630 | epsilon: 0.01
169 | 1660 episode | score: 8.89 | loss: 7.93438 | epsilon: 0.01
170 | 1670 episode | score: 8.86 | loss: 10.85615 | epsilon: 0.01
171 | 1680 episode | score: 9.04 | loss: 9.40141 | epsilon: 0.01
172 | 1690 episode | score: 9.07 | loss: 8.03816 | epsilon: 0.01
173 | 1700 episode | score: 9.02 | loss: 9.38599 | epsilon: 0.01
174 | 1710 episode | score: 8.96 | loss: 10.12590 | epsilon: 0.01
175 | 1720 episode | score: 9.00 | loss: 10.15560 | epsilon: 0.01
176 | 1730 episode | score: 8.91 | loss: 8.75969 | epsilon: 0.01
177 | 1740 episode | score: 8.86 | loss: 11.63796 | epsilon: 0.01
178 | 1750 episode | score: 8.80 | loss: 8.73201 | epsilon: 0.01
179 | 1760 episode | score: 8.88 | loss: 8.85375 | epsilon: 0.01
180 | 1770 episode | score: 8.83 | loss: 10.21729 | epsilon: 0.01
181 | 1780 episode | score: 8.84 | loss: 10.34629 | epsilon: 0.01
182 | 1790 episode | score: 8.75 | loss: 10.94930 | epsilon: 0.01
183 | 1800 episode | score: 8.76 | loss: 8.05821 | epsilon: 0.01
184 | 1810 episode | score: 8.72 | loss: 8.11742 | epsilon: 0.01
185 | 1820 episode | score: 8.63 | loss: 13.22092 | epsilon: 0.01
186 | 1830 episode | score: 8.66 | loss: 12.51130 | epsilon: 0.01
187 | 1840 episode | score: 8.60 | loss: 11.05582 | epsilon: 0.01
188 | 1850 episode | score: 8.56 | loss: 10.29749 | epsilon: 0.01
189 | 1860 episode | score: 8.51 | loss: 9.62049 | epsilon: 0.01
190 | 1870 episode | score: 8.52 | loss: 11.11880 | epsilon: 0.01
191 | 1880 episode | score: 8.62 | loss: 12.55165 | epsilon: 0.01
192 | 1890 episode | score: 8.60 | loss: 12.51082 | epsilon: 0.01
193 | 1900 episode | score: 8.77 | loss: 9.59667 | epsilon: 0.01
194 | 1910 episode | score: 8.71 | loss: 11.08668 | epsilon: 0.01
195 | 1920 episode | score: 8.66 | loss: 7.43915 | epsilon: 0.01
196 | 1930 episode | score: 8.66 | loss: 9.64025 | epsilon: 0.01
197 | 1940 episode | score: 9.45 | loss: 11.13645 | epsilon: 0.01
198 | 1950 episode | score: 9.37 | loss: 7.49247 | epsilon: 0.01
199 | 1960 episode | score: 9.33 | loss: 9.59682 | epsilon: 0.01
200 | 1970 episode | score: 9.28 | loss: 14.80193 | epsilon: 0.01
201 | 1980 episode | score: 9.28 | loss: 10.48195 | epsilon: 0.01
202 | 1990 episode | score: 9.21 | loss: 7.45852 | epsilon: 0.01
203 | 2000 episode | score: 9.22 | loss: 5.95186 | epsilon: 0.01
204 | 2010 episode | score: 9.19 | loss: 8.95806 | epsilon: 0.01
205 | 2020 episode | score: 9.08 | loss: 9.78051 | epsilon: 0.01
206 | 2030 episode | score: 9.03 | loss: 11.93187 | epsilon: 0.01
207 | 2040 episode | score: 8.98 | loss: 11.95851 | epsilon: 0.01
208 | 2050 episode | score: 8.92 | loss: 12.71336 | epsilon: 0.01
209 | 2060 episode | score: 8.99 | loss: 10.47637 | epsilon: 0.01
210 | 2070 episode | score: 8.90 | loss: 6.80452 | epsilon: 0.01
211 | 2080 episode | score: 8.84 | loss: 14.29578 | epsilon: 0.01
212 | 2090 episode | score: 8.78 | loss: 11.28092 | epsilon: 0.01
213 | 2100 episode | score: 8.73 | loss: 9.88292 | epsilon: 0.01
214 | 2110 episode | score: 8.71 | loss: 13.52903 | epsilon: 0.01
215 | 2120 episode | score: 9.26 | loss: 8.32086 | epsilon: 0.01
216 | 2130 episode | score: 9.29 | loss: 12.94514 | epsilon: 0.01
217 | 2140 episode | score: 9.24 | loss: 6.88041 | epsilon: 0.01
218 | 2150 episode | score: 9.20 | loss: 7.53566 | epsilon: 0.01
219 | 2160 episode | score: 9.15 | loss: 10.57451 | epsilon: 0.01
220 | 2170 episode | score: 9.07 | loss: 12.07813 | epsilon: 0.01
221 | 2180 episode | score: 9.00 | loss: 6.05306 | epsilon: 0.01
222 | 2190 episode | score: 8.92 | loss: 7.59512 | epsilon: 0.01
223 | 2200 episode | score: 8.93 | loss: 9.81163 | epsilon: 0.01
224 | 2210 episode | score: 9.21 | loss: 9.08550 | epsilon: 0.01
225 | 2220 episode | score: 9.15 | loss: 12.10806 | epsilon: 0.01
226 | 2230 episode | score: 9.06 | loss: 13.57111 | epsilon: 0.01
227 | 2240 episode | score: 9.03 | loss: 7.54174 | epsilon: 0.01
228 | 2250 episode | score: 8.96 | loss: 8.35778 | epsilon: 0.01
229 | 2260 episode | score: 8.94 | loss: 12.08924 | epsilon: 0.01
230 | 2270 episode | score: 8.84 | loss: 14.33356 | epsilon: 0.01
231 | 2280 episode | score: 9.28 | loss: 8.35518 | epsilon: 0.01
232 | 2290 episode | score: 9.19 | loss: 10.67279 | epsilon: 0.01
233 | 2300 episode | score: 9.10 | loss: 8.38233 | epsilon: 0.01
234 | 2310 episode | score: 9.05 | loss: 11.37863 | epsilon: 0.01
235 | 2320 episode | score: 8.98 | loss: 7.61795 | epsilon: 0.01
236 | 2330 episode | score: 8.93 | loss: 9.88662 | epsilon: 0.01
237 | 2340 episode | score: 8.86 | loss: 10.59986 | epsilon: 0.01
238 | 2350 episode | score: 8.81 | loss: 10.60581 | epsilon: 0.01
239 | 2360 episode | score: 8.80 | loss: 11.37358 | epsilon: 0.01
240 | 2370 episode | score: 8.71 | loss: 9.16223 | epsilon: 0.01
241 | 2380 episode | score: 8.69 | loss: 12.14555 | epsilon: 0.01
242 | 2390 episode | score: 8.69 | loss: 7.63645 | epsilon: 0.01
243 | 2400 episode | score: 8.69 | loss: 15.23756 | epsilon: 0.01
244 | 2410 episode | score: 8.67 | loss: 12.32878 | epsilon: 0.01
245 | 2420 episode | score: 8.64 | loss: 10.72119 | epsilon: 0.01
246 | 2430 episode | score: 8.64 | loss: 12.24568 | epsilon: 0.01
247 | 2440 episode | score: 8.60 | loss: 14.54856 | epsilon: 0.01
248 | 2450 episode | score: 8.69 | loss: 12.30029 | epsilon: 0.01
249 | 2460 episode | score: 8.64 | loss: 7.73817 | epsilon: 0.01
250 | 2470 episode | score: 8.60 | loss: 13.07990 | epsilon: 0.01
251 | 2480 episode | score: 8.75 | loss: 10.10155 | epsilon: 0.01
252 | 2490 episode | score: 8.79 | loss: 9.20475 | epsilon: 0.01
253 | 2500 episode | score: 8.89 | loss: 10.77987 | epsilon: 0.01
254 | 2510 episode | score: 8.91 | loss: 10.04893 | epsilon: 0.01
255 | 2520 episode | score: 8.87 | loss: 10.77398 | epsilon: 0.01
256 | 2530 episode | score: 8.95 | loss: 8.53565 | epsilon: 0.01
257 | 2540 episode | score: 8.95 | loss: 7.73821 | epsilon: 0.01
258 | 2550 episode | score: 9.63 | loss: 13.85527 | epsilon: 0.01
259 | 2560 episode | score: 9.53 | loss: 13.11086 | epsilon: 0.01
260 | 2570 episode | score: 9.42 | loss: 13.81915 | epsilon: 0.01
261 | 2580 episode | score: 9.37 | loss: 6.15294 | epsilon: 0.01
262 | 2590 episode | score: 9.33 | loss: 6.17238 | epsilon: 0.01
263 | 2600 episode | score: 9.27 | loss: 4.73876 | epsilon: 0.01
264 | 2610 episode | score: 9.20 | loss: 11.56850 | epsilon: 0.01
265 | 2620 episode | score: 9.09 | loss: 10.02787 | epsilon: 0.01
266 | 2630 episode | score: 8.98 | loss: 12.40529 | epsilon: 0.01
267 | 2640 episode | score: 8.97 | loss: 9.43007 | epsilon: 0.01
268 | 2650 episode | score: 8.91 | loss: 7.74540 | epsilon: 0.01
269 | 2660 episode | score: 8.87 | loss: 7.75366 | epsilon: 0.01
270 | 2670 episode | score: 8.88 | loss: 10.15491 | epsilon: 0.01
271 | 2680 episode | score: 8.84 | loss: 13.18449 | epsilon: 0.01
272 | 2690 episode | score: 8.85 | loss: 13.98673 | epsilon: 0.01
273 | 2700 episode | score: 8.78 | loss: 13.95599 | epsilon: 0.01
274 | 2710 episode | score: 8.77 | loss: 13.96089 | epsilon: 0.01
275 | 2720 episode | score: 8.76 | loss: 13.20681 | epsilon: 0.01
276 | 2730 episode | score: 8.72 | loss: 14.73774 | epsilon: 0.01
277 | 2740 episode | score: 8.68 | loss: 10.11293 | epsilon: 0.01
278 | 2750 episode | score: 8.66 | loss: 11.65639 | epsilon: 0.01
279 | 2760 episode | score: 8.65 | loss: 11.73176 | epsilon: 0.01
280 | 2770 episode | score: 8.58 | loss: 9.37598 | epsilon: 0.01
281 | 2780 episode | score: 8.52 | loss: 10.91862 | epsilon: 0.01
282 | 2790 episode | score: 8.54 | loss: 12.41489 | epsilon: 0.01
283 | 2800 episode | score: 8.53 | loss: 13.98008 | epsilon: 0.01
284 | 2810 episode | score: 8.52 | loss: 11.63640 | epsilon: 0.01
285 | 2820 episode | score: 8.51 | loss: 12.39264 | epsilon: 0.01
286 | 2830 episode | score: 8.52 | loss: 11.66165 | epsilon: 0.01
287 | 2840 episode | score: 8.51 | loss: 11.61177 | epsilon: 0.01
288 | 2850 episode | score: 8.49 | loss: 10.06735 | epsilon: 0.01
289 | 2860 episode | score: 8.44 | loss: 10.97625 | epsilon: 0.01
290 | 2870 episode | score: 8.45 | loss: 12.41634 | epsilon: 0.01
291 | 2880 episode | score: 8.42 | loss: 11.74682 | epsilon: 0.01
292 | 2890 episode | score: 8.49 | loss: 7.79012 | epsilon: 0.01
293 | 2900 episode | score: 8.50 | loss: 9.37829 | epsilon: 0.01
294 | 2910 episode | score: 8.61 | loss: 9.34004 | epsilon: 0.01
295 | 2920 episode | score: 8.68 | loss: 7.81453 | epsilon: 0.01
296 | 2930 episode | score: 8.66 | loss: 14.08928 | epsilon: 0.01
297 | 2940 episode | score: 8.75 | loss: 12.54502 | epsilon: 0.01
298 | 2950 episode | score: 9.07 | loss: 13.32049 | epsilon: 0.01
299 | 2960 episode | score: 8.97 | loss: 14.04430 | epsilon: 0.01
300 | 2970 episode | score: 8.92 | loss: 6.30872 | epsilon: 0.01
301 | 2980 episode | score: 8.90 | loss: 7.05106 | epsilon: 0.01
302 | 2990 episode | score: 8.88 | loss: 8.59156 | epsilon: 0.01
303 | 3000 episode | score: 8.84 | loss: 10.28142 | epsilon: 0.01
304 | 3010 episode | score: 8.81 | loss: 9.38775 | epsilon: 0.01
305 | 3020 episode | score: 8.82 | loss: 10.99244 | epsilon: 0.01
306 | 3030 episode | score: 8.78 | loss: 11.80965 | epsilon: 0.01
307 | 3040 episode | score: 8.70 | loss: 11.76866 | epsilon: 0.01
308 | 3050 episode | score: 8.67 | loss: 10.16454 | epsilon: 0.01
309 | 3060 episode | score: 8.66 | loss: 11.08585 | epsilon: 0.01
310 | 3070 episode | score: 8.62 | loss: 10.20813 | epsilon: 0.01
311 | 3080 episode | score: 8.62 | loss: 10.95652 | epsilon: 0.01
312 | 3090 episode | score: 8.58 | loss: 15.65759 | epsilon: 0.01
313 | 3100 episode | score: 8.55 | loss: 14.13209 | epsilon: 0.01
314 | 3110 episode | score: 8.66 | loss: 10.94310 | epsilon: 0.01
315 | 3120 episode | score: 8.61 | loss: 14.11647 | epsilon: 0.01
316 | 3130 episode | score: 8.65 | loss: 11.84604 | epsilon: 0.01
317 | 3140 episode | score: 8.63 | loss: 12.52877 | epsilon: 0.01
318 | 3150 episode | score: 8.64 | loss: 12.53898 | epsilon: 0.01
319 | 3160 episode | score: 8.61 | loss: 11.78446 | epsilon: 0.01
320 | 3170 episode | score: 8.63 | loss: 15.68868 | epsilon: 0.01
321 | 3180 episode | score: 8.64 | loss: 11.01862 | epsilon: 0.01
322 | 3190 episode | score: 8.72 | loss: 15.68075 | epsilon: 0.01
323 | 3200 episode | score: 8.74 | loss: 9.40447 | epsilon: 0.01
324 | 3210 episode | score: 8.70 | loss: 8.67002 | epsilon: 0.01
325 | 3220 episode | score: 8.68 | loss: 9.44660 | epsilon: 0.01
326 | 3230 episode | score: 8.61 | loss: 10.25746 | epsilon: 0.01
327 | 3240 episode | score: 8.59 | loss: 15.70494 | epsilon: 0.01
328 | 3250 episode | score: 8.60 | loss: 10.97750 | epsilon: 0.01
329 | 3260 episode | score: 8.57 | loss: 8.68631 | epsilon: 0.01
330 | 3270 episode | score: 8.60 | loss: 11.00549 | epsilon: 0.01
331 | 3280 episode | score: 8.59 | loss: 10.23621 | epsilon: 0.01
332 | 3290 episode | score: 8.55 | loss: 11.06288 | epsilon: 0.01
333 | 3300 episode | score: 8.53 | loss: 11.07038 | epsilon: 0.01
334 | 3310 episode | score: 8.49 | loss: 9.51873 | epsilon: 0.01
335 | 3320 episode | score: 8.51 | loss: 10.22602 | epsilon: 0.01
336 | 3330 episode | score: 8.49 | loss: 10.26356 | epsilon: 0.01
337 | 3340 episode | score: 8.52 | loss: 10.24619 | epsilon: 0.01
338 | 3350 episode | score: 8.53 | loss: 11.02440 | epsilon: 0.01
339 | 3360 episode | score: 8.65 | loss: 14.21980 | epsilon: 0.01
340 | 3370 episode | score: 8.63 | loss: 9.46559 | epsilon: 0.01
341 | 3380 episode | score: 8.65 | loss: 9.56488 | epsilon: 0.01
342 | 3390 episode | score: 8.63 | loss: 11.02502 | epsilon: 0.01
343 | 3400 episode | score: 8.57 | loss: 11.85219 | epsilon: 0.01
344 | 3410 episode | score: 8.56 | loss: 13.38109 | epsilon: 0.01
345 | 3420 episode | score: 8.65 | loss: 7.90535 | epsilon: 0.01
346 | 3430 episode | score: 8.64 | loss: 15.00816 | epsilon: 0.01
347 | 3440 episode | score: 8.61 | loss: 14.24374 | epsilon: 0.01
348 | 3450 episode | score: 8.57 | loss: 12.63750 | epsilon: 0.01
349 | 3460 episode | score: 8.56 | loss: 11.11115 | epsilon: 0.01
350 | 3470 episode | score: 8.56 | loss: 10.27376 | epsilon: 0.01
351 | 3480 episode | score: 8.62 | loss: 11.87048 | epsilon: 0.01
352 | 3490 episode | score: 8.59 | loss: 8.70988 | epsilon: 0.01
353 | 3500 episode | score: 8.58 | loss: 9.51818 | epsilon: 0.01
354 | 3510 episode | score: 8.61 | loss: 11.09497 | epsilon: 0.01
355 | 3520 episode | score: 8.57 | loss: 15.90761 | epsilon: 0.01
356 | 3530 episode | score: 8.66 | loss: 8.75900 | epsilon: 0.01
357 | 3540 episode | score: 8.63 | loss: 11.89002 | epsilon: 0.01
358 | 3550 episode | score: 8.60 | loss: 11.09713 | epsilon: 0.01
359 | 3560 episode | score: 8.59 | loss: 13.51019 | epsilon: 0.01
360 | 3570 episode | score: 8.53 | loss: 8.81450 | epsilon: 0.01
361 | 3580 episode | score: 8.52 | loss: 7.92999 | epsilon: 0.01
362 | 3590 episode | score: 8.47 | loss: 12.68926 | epsilon: 0.01
363 | 3600 episode | score: 8.44 | loss: 15.07101 | epsilon: 0.01
364 | 3610 episode | score: 8.43 | loss: 11.09076 | epsilon: 0.01
365 | 3620 episode | score: 8.42 | loss: 12.73851 | epsilon: 0.01
366 | 3630 episode | score: 8.46 | loss: 11.88773 | epsilon: 0.01
367 | 3640 episode | score: 8.46 | loss: 12.73633 | epsilon: 0.01
368 | 3650 episode | score: 8.48 | loss: 13.50091 | epsilon: 0.01
369 | 3660 episode | score: 8.50 | loss: 13.49799 | epsilon: 0.01
370 | 3670 episode | score: 8.46 | loss: 11.94654 | epsilon: 0.01
371 | 3680 episode | score: 8.54 | loss: 12.04877 | epsilon: 0.01
372 | 3690 episode | score: 8.55 | loss: 12.80853 | epsilon: 0.01
373 | 3700 episode | score: 8.54 | loss: 11.15186 | epsilon: 0.01
374 | 3710 episode | score: 8.55 | loss: 13.55354 | epsilon: 0.01
375 | 3720 episode | score: 8.65 | loss: 7.96912 | epsilon: 0.01
376 | 3730 episode | score: 8.63 | loss: 10.31971 | epsilon: 0.01
377 | 3740 episode | score: 8.69 | loss: 11.10972 | epsilon: 0.01
378 | 3750 episode | score: 8.70 | loss: 15.06225 | epsilon: 0.01
379 | 3760 episode | score: 8.69 | loss: 12.18484 | epsilon: 0.01
380 | 3770 episode | score: 8.69 | loss: 10.39990 | epsilon: 0.01
381 | 3780 episode | score: 8.70 | loss: 9.68053 | epsilon: 0.01
382 | 3790 episode | score: 8.74 | loss: 7.95677 | epsilon: 0.01
383 | 3800 episode | score: 8.76 | loss: 12.74118 | epsilon: 0.01
384 | 3810 episode | score: 8.78 | loss: 11.16556 | epsilon: 0.01
385 | 3820 episode | score: 8.93 | loss: 11.24375 | epsilon: 0.01
386 | 3830 episode | score: 9.24 | loss: 8.01321 | epsilon: 0.01
387 | 3840 episode | score: 9.17 | loss: 9.61550 | epsilon: 0.01
388 | 3850 episode | score: 9.11 | loss: 13.57032 | epsilon: 0.01
389 | 3860 episode | score: 9.05 | loss: 9.55489 | epsilon: 0.01
390 | 3870 episode | score: 9.07 | loss: 11.13208 | epsilon: 0.01
391 | 3880 episode | score: 9.05 | loss: 11.19456 | epsilon: 0.01
392 | 3890 episode | score: 9.10 | loss: 8.16878 | epsilon: 0.01
393 | 3900 episode | score: 9.02 | loss: 10.33400 | epsilon: 0.01
394 | 3910 episode | score: 8.95 | loss: 11.69202 | epsilon: 0.01
395 | 3920 episode | score: 8.91 | loss: 6.57852 | epsilon: 0.01
396 | 3930 episode | score: 8.88 | loss: 12.02832 | epsilon: 0.01
397 | 3940 episode | score: 8.82 | loss: 9.25438 | epsilon: 0.01
398 | 3950 episode | score: 8.81 | loss: 11.35655 | epsilon: 0.01
399 | 3960 episode | score: 8.72 | loss: 13.42752 | epsilon: 0.01
400 | 3970 episode | score: 8.65 | loss: 14.05224 | epsilon: 0.01
401 | 3980 episode | score: 8.63 | loss: 12.41926 | epsilon: 0.01
402 | 3990 episode | score: 8.66 | loss: 9.58528 | epsilon: 0.01
403 | 4000 episode | score: 8.72 | loss: 9.59237 | epsilon: 0.01
404 | 4010 episode | score: 8.71 | loss: 9.69728 | epsilon: 0.01
405 | 4020 episode | score: 8.67 | loss: 13.23359 | epsilon: 0.01
406 | 4030 episode | score: 8.64 | loss: 10.66293 | epsilon: 0.01
407 | 4040 episode | score: 8.61 | loss: 13.73763 | epsilon: 0.01
408 | 4050 episode | score: 8.54 | loss: 17.78467 | epsilon: 0.01
409 | 4060 episode | score: 8.57 | loss: 9.59514 | epsilon: 0.01
410 | 4070 episode | score: 8.51 | loss: 12.94440 | epsilon: 0.01
411 | 4080 episode | score: 8.49 | loss: 9.02911 | epsilon: 0.01
412 | 4090 episode | score: 8.45 | loss: 12.79924 | epsilon: 0.01
413 | 4100 episode | score: 8.44 | loss: 15.17093 | epsilon: 0.01
414 | 4110 episode | score: 8.44 | loss: 14.67449 | epsilon: 0.01
415 | 4120 episode | score: 8.41 | loss: 13.59448 | epsilon: 0.01
416 | 4130 episode | score: 8.46 | loss: 16.86084 | epsilon: 0.01
417 | 4140 episode | score: 8.55 | loss: 15.98142 | epsilon: 0.01
418 | 4150 episode | score: 8.56 | loss: 12.12033 | epsilon: 0.01
419 | 4160 episode | score: 8.56 | loss: 12.86493 | epsilon: 0.01
420 | 4170 episode | score: 8.51 | loss: 13.62834 | epsilon: 0.01
421 | 4180 episode | score: 8.49 | loss: 17.55323 | epsilon: 0.01
422 | 4190 episode | score: 8.44 | loss: 12.87850 | epsilon: 0.01
423 | 4200 episode | score: 8.42 | loss: 12.78649 | epsilon: 0.01
424 | 4210 episode | score: 8.46 | loss: 11.28722 | epsilon: 0.01
425 | 4220 episode | score: 8.44 | loss: 9.79293 | epsilon: 0.01
426 | 4230 episode | score: 8.67 | loss: 13.11148 | epsilon: 0.01
427 | 4240 episode | score: 8.73 | loss: 11.36636 | epsilon: 0.01
428 | 4250 episode | score: 8.68 | loss: 18.31320 | epsilon: 0.01
429 | 4260 episode | score: 8.64 | loss: 15.54791 | epsilon: 0.01
430 | 4270 episode | score: 8.64 | loss: 10.45561 | epsilon: 0.01
431 | 4280 episode | score: 8.58 | loss: 14.11239 | epsilon: 0.01
432 | 4290 episode | score: 8.57 | loss: 12.93595 | epsilon: 0.01
433 | 4300 episode | score: 8.60 | loss: 15.43336 | epsilon: 0.01
434 | 4310 episode | score: 8.85 | loss: 10.41784 | epsilon: 0.01
435 | 4320 episode | score: 8.97 | loss: 14.09765 | epsilon: 0.01
436 | 4330 episode | score: 9.05 | loss: 11.35991 | epsilon: 0.01
437 | 4340 episode | score: 9.02 | loss: 11.48805 | epsilon: 0.01
438 | 4350 episode | score: 8.93 | loss: 12.31691 | epsilon: 0.01
439 | 4360 episode | score: 8.87 | loss: 10.58508 | epsilon: 0.01
440 | 4370 episode | score: 8.82 | loss: 8.43289 | epsilon: 0.01
441 | 4380 episode | score: 8.83 | loss: 8.85667 | epsilon: 0.01
442 | 4390 episode | score: 8.92 | loss: 8.89580 | epsilon: 0.01
443 | 4400 episode | score: 8.96 | loss: 12.08995 | epsilon: 0.01
444 | 4410 episode | score: 8.91 | loss: 11.35356 | epsilon: 0.01
445 | 4420 episode | score: 8.87 | loss: 11.24321 | epsilon: 0.01
446 | 4430 episode | score: 8.80 | loss: 11.34462 | epsilon: 0.01
447 | 4440 episode | score: 8.83 | loss: 15.81705 | epsilon: 0.01
448 | 4450 episode | score: 8.83 | loss: 11.94490 | epsilon: 0.01
449 | 4460 episode | score: 8.81 | loss: 13.50714 | epsilon: 0.01
450 | 4470 episode | score: 8.73 | loss: 15.08722 | epsilon: 0.01
451 | 4480 episode | score: 8.71 | loss: 10.57810 | epsilon: 0.01
452 | 4490 episode | score: 8.86 | loss: 7.95716 | epsilon: 0.01
453 | 4500 episode | score: 8.81 | loss: 9.58130 | epsilon: 0.01
454 | 4510 episode | score: 8.74 | loss: 14.31959 | epsilon: 0.01
455 | 4520 episode | score: 8.71 | loss: 12.78611 | epsilon: 0.01
456 | 4530 episode | score: 8.70 | loss: 15.13371 | epsilon: 0.01
457 | 4540 episode | score: 8.67 | loss: 11.13598 | epsilon: 0.01
458 | 4550 episode | score: 8.73 | loss: 9.87144 | epsilon: 0.01
459 | 4560 episode | score: 8.69 | loss: 15.18121 | epsilon: 0.01
460 | 4570 episode | score: 8.71 | loss: 17.76075 | epsilon: 0.01
461 | 4580 episode | score: 8.72 | loss: 17.46336 | epsilon: 0.01
462 | 4590 episode | score: 8.70 | loss: 16.82282 | epsilon: 0.01
463 | 4600 episode | score: 8.67 | loss: 11.97398 | epsilon: 0.01
464 | 4610 episode | score: 8.61 | loss: 5.62248 | epsilon: 0.01
465 | 4620 episode | score: 8.62 | loss: 12.76912 | epsilon: 0.01
466 | 4630 episode | score: 8.60 | loss: 8.81276 | epsilon: 0.01
467 | 4640 episode | score: 8.59 | loss: 10.44362 | epsilon: 0.01
468 | 4650 episode | score: 8.60 | loss: 15.01834 | epsilon: 0.01
469 | 4660 episode | score: 8.62 | loss: 17.10469 | epsilon: 0.01
470 | 4670 episode | score: 8.65 | loss: 14.30819 | epsilon: 0.01
471 | 4680 episode | score: 8.64 | loss: 14.29096 | epsilon: 0.01
472 | 4690 episode | score: 8.60 | loss: 12.26738 | epsilon: 0.01
473 | 4700 episode | score: 8.62 | loss: 11.95053 | epsilon: 0.01
474 | 4710 episode | score: 8.62 | loss: 10.07705 | epsilon: 0.01
475 | 4720 episode | score: 8.59 | loss: 4.77848 | epsilon: 0.01
476 | 4730 episode | score: 8.63 | loss: 10.37387 | epsilon: 0.01
477 | 4740 episode | score: 8.87 | loss: 11.96993 | epsilon: 0.01
478 | 4750 episode | score: 8.79 | loss: 8.79129 | epsilon: 0.01
479 | 4760 episode | score: 8.78 | loss: 7.22931 | epsilon: 0.01
480 | 4770 episode | score: 8.74 | loss: 11.15121 | epsilon: 0.01
481 | 4780 episode | score: 8.72 | loss: 12.72968 | epsilon: 0.01
482 | 4790 episode | score: 8.82 | loss: 10.33394 | epsilon: 0.01
483 | 4800 episode | score: 8.78 | loss: 10.33138 | epsilon: 0.01
484 | 4810 episode | score: 8.75 | loss: 15.39932 | epsilon: 0.01
485 | 4820 episode | score: 8.76 | loss: 11.96936 | epsilon: 0.01
486 | 4830 episode | score: 8.73 | loss: 24.04654 | epsilon: 0.01
487 | 4840 episode | score: 8.75 | loss: 15.82520 | epsilon: 0.01
488 | 4850 episode | score: 8.70 | loss: 15.51851 | epsilon: 0.01
489 | 4860 episode | score: 8.64 | loss: 6.37486 | epsilon: 0.01
490 | 4870 episode | score: 8.60 | loss: 19.06402 | epsilon: 0.01
491 | 4880 episode | score: 8.56 | loss: 15.11088 | epsilon: 0.01
492 | 4890 episode | score: 8.60 | loss: 13.55245 | epsilon: 0.01
493 | 4900 episode | score: 8.57 | loss: 16.86619 | epsilon: 0.01
494 | 4910 episode | score: 8.55 | loss: 15.18091 | epsilon: 0.01
495 | 4920 episode | score: 8.53 | loss: 14.59640 | epsilon: 0.01
496 | 4930 episode | score: 8.49 | loss: 12.74775 | epsilon: 0.01
497 | 4940 episode | score: 8.48 | loss: 12.72942 | epsilon: 0.01
498 | 4950 episode | score: 8.52 | loss: 14.33515 | epsilon: 0.01
499 | 4960 episode | score: 8.54 | loss: 9.65471 | epsilon: 0.01
500 | 4970 episode | score: 8.51 | loss: 18.19523 | epsilon: 0.01
501 | 4980 episode | score: 8.45 | loss: 15.07701 | epsilon: 0.01
502 | 4990 episode | score: 8.59 | loss: 13.52352 | epsilon: 0.01
503 |
--------------------------------------------------------------------------------
/out/trace_DTQN_10.txt:
--------------------------------------------------------------------------------
1 | state size: 2
2 | action size: 2
3 | 0 episode | score: 16.00 | loss: 0.00000 | epsilon: 1.00
4 | 10 episode | score: 16.48 | loss: 0.00000 | epsilon: 1.00
5 | 20 episode | score: 17.15 | loss: 0.00000 | epsilon: 1.00
6 | 30 episode | score: 17.02 | loss: 0.00000 | epsilon: 1.00
7 | 40 episode | score: 17.05 | loss: 0.00000 | epsilon: 1.00
8 | 50 episode | score: 17.94 | loss: 0.18528 | epsilon: 0.99
9 | 60 episode | score: 18.16 | loss: 0.17950 | epsilon: 0.98
10 | 70 episode | score: 18.58 | loss: 0.12416 | epsilon: 0.97
11 | 80 episode | score: 18.75 | loss: 0.20064 | epsilon: 0.96
12 | 90 episode | score: 19.12 | loss: 0.85149 | epsilon: 0.95
13 | 100 episode | score: 20.25 | loss: 0.31414 | epsilon: 0.93
14 | 110 episode | score: 20.22 | loss: 0.46559 | epsilon: 0.92
15 | 120 episode | score: 20.24 | loss: 0.34208 | epsilon: 0.91
16 | 130 episode | score: 20.15 | loss: 0.20250 | epsilon: 0.90
17 | 140 episode | score: 20.55 | loss: 1.23666 | epsilon: 0.89
18 | 150 episode | score: 20.87 | loss: 0.61966 | epsilon: 0.88
19 | 160 episode | score: 21.53 | loss: 0.47718 | epsilon: 0.86
20 | 170 episode | score: 21.49 | loss: 1.12160 | epsilon: 0.85
21 | 180 episode | score: 21.84 | loss: 1.05996 | epsilon: 0.84
22 | 190 episode | score: 21.98 | loss: 0.28847 | epsilon: 0.83
23 | 200 episode | score: 22.02 | loss: 0.80320 | epsilon: 0.82
24 | 210 episode | score: 21.73 | loss: 0.93839 | epsilon: 0.80
25 | 220 episode | score: 22.06 | loss: 0.60672 | epsilon: 0.79
26 | 230 episode | score: 21.88 | loss: 0.04399 | epsilon: 0.78
27 | 240 episode | score: 21.35 | loss: 0.95139 | epsilon: 0.77
28 | 250 episode | score: 20.59 | loss: 0.65200 | epsilon: 0.77
29 | 260 episode | score: 20.06 | loss: 0.98826 | epsilon: 0.76
30 | 270 episode | score: 20.17 | loss: 1.66683 | epsilon: 0.75
31 | 280 episode | score: 19.94 | loss: 1.04465 | epsilon: 0.74
32 | 290 episode | score: 19.98 | loss: 2.11482 | epsilon: 0.73
33 | 300 episode | score: 19.88 | loss: 1.09425 | epsilon: 0.72
34 | 310 episode | score: 19.42 | loss: 1.52128 | epsilon: 0.71
35 | 320 episode | score: 19.30 | loss: 1.16642 | epsilon: 0.70
36 | 330 episode | score: 19.24 | loss: 1.19288 | epsilon: 0.69
37 | 340 episode | score: 18.74 | loss: 1.56613 | epsilon: 0.68
38 | 350 episode | score: 18.84 | loss: 1.20542 | epsilon: 0.67
39 | 360 episode | score: 18.53 | loss: 1.62247 | epsilon: 0.66
40 | 370 episode | score: 18.38 | loss: 0.44018 | epsilon: 0.65
41 | 380 episode | score: 18.52 | loss: 1.69055 | epsilon: 0.64
42 | 390 episode | score: 18.54 | loss: 0.88652 | epsilon: 0.63
43 | 400 episode | score: 18.25 | loss: 0.88762 | epsilon: 0.62
44 | 410 episode | score: 18.11 | loss: 0.91311 | epsilon: 0.62
45 | 420 episode | score: 17.75 | loss: 1.78714 | epsilon: 0.61
46 | 430 episode | score: 17.58 | loss: 2.47997 | epsilon: 0.60
47 | 440 episode | score: 17.88 | loss: 2.39753 | epsilon: 0.59
48 | 450 episode | score: 17.60 | loss: 1.39301 | epsilon: 0.58
49 | 460 episode | score: 17.32 | loss: 2.81882 | epsilon: 0.57
50 | 470 episode | score: 17.52 | loss: 1.44277 | epsilon: 0.56
51 | 480 episode | score: 17.03 | loss: 0.96965 | epsilon: 0.56
52 | 490 episode | score: 16.77 | loss: 3.42537 | epsilon: 0.55
53 | 500 episode | score: 16.33 | loss: 2.43177 | epsilon: 0.54
54 | 510 episode | score: 16.80 | loss: 1.97684 | epsilon: 0.53
55 | 520 episode | score: 16.40 | loss: 2.07802 | epsilon: 0.52
56 | 530 episode | score: 16.17 | loss: 2.02413 | epsilon: 0.52
57 | 540 episode | score: 15.87 | loss: 4.07846 | epsilon: 0.51
58 | 550 episode | score: 15.64 | loss: 3.07883 | epsilon: 0.50
59 | 560 episode | score: 15.57 | loss: 1.05649 | epsilon: 0.49
60 | 570 episode | score: 15.19 | loss: 1.05230 | epsilon: 0.49
61 | 580 episode | score: 15.15 | loss: 3.65838 | epsilon: 0.48
62 | 590 episode | score: 14.71 | loss: 2.22756 | epsilon: 0.47
63 | 600 episode | score: 14.31 | loss: 3.75582 | epsilon: 0.47
64 | 610 episode | score: 14.65 | loss: 3.74815 | epsilon: 0.46
65 | 620 episode | score: 14.83 | loss: 3.76151 | epsilon: 0.45
66 | 630 episode | score: 14.87 | loss: 2.75478 | epsilon: 0.44
67 | 640 episode | score: 14.56 | loss: 1.11075 | epsilon: 0.44
68 | 650 episode | score: 14.56 | loss: 3.89268 | epsilon: 0.43
69 | 660 episode | score: 14.25 | loss: 2.23375 | epsilon: 0.42
70 | 670 episode | score: 14.22 | loss: 2.98288 | epsilon: 0.41
71 | 680 episode | score: 13.91 | loss: 1.69801 | epsilon: 0.41
72 | 690 episode | score: 13.67 | loss: 2.29384 | epsilon: 0.40
73 | 700 episode | score: 13.45 | loss: 4.51319 | epsilon: 0.40
74 | 710 episode | score: 13.51 | loss: 2.84723 | epsilon: 0.39
75 | 720 episode | score: 13.40 | loss: 3.99457 | epsilon: 0.38
76 | 730 episode | score: 13.17 | loss: 5.72311 | epsilon: 0.38
77 | 740 episode | score: 12.95 | loss: 4.85307 | epsilon: 0.37
78 | 750 episode | score: 12.84 | loss: 4.10744 | epsilon: 0.36
79 | 760 episode | score: 12.79 | loss: 5.27431 | epsilon: 0.36
80 | 770 episode | score: 12.73 | loss: 4.68064 | epsilon: 0.35
81 | 780 episode | score: 12.64 | loss: 4.18007 | epsilon: 0.34
82 | 790 episode | score: 12.63 | loss: 4.76635 | epsilon: 0.34
83 | 800 episode | score: 12.57 | loss: 2.98694 | epsilon: 0.33
84 | 810 episode | score: 12.43 | loss: 3.58431 | epsilon: 0.32
85 | 820 episode | score: 12.58 | loss: 0.61492 | epsilon: 0.32
86 | 830 episode | score: 12.51 | loss: 3.59966 | epsilon: 0.31
87 | 840 episode | score: 12.72 | loss: 5.47352 | epsilon: 0.30
88 | 850 episode | score: 12.68 | loss: 4.25853 | epsilon: 0.30
89 | 860 episode | score: 12.53 | loss: 3.09318 | epsilon: 0.29
90 | 870 episode | score: 12.31 | loss: 2.08403 | epsilon: 0.28
91 | 880 episode | score: 12.14 | loss: 7.99741 | epsilon: 0.28
92 | 890 episode | score: 11.92 | loss: 2.47761 | epsilon: 0.27
93 | 900 episode | score: 12.51 | loss: 1.89428 | epsilon: 0.26
94 | 910 episode | score: 12.41 | loss: 3.13464 | epsilon: 0.26
95 | 920 episode | score: 12.14 | loss: 5.61082 | epsilon: 0.25
96 | 930 episode | score: 11.93 | loss: 3.76492 | epsilon: 0.25
97 | 940 episode | score: 11.98 | loss: 5.19080 | epsilon: 0.24
98 | 950 episode | score: 11.94 | loss: 6.96484 | epsilon: 0.23
99 | 960 episode | score: 11.99 | loss: 3.80788 | epsilon: 0.23
100 | 970 episode | score: 11.70 | loss: 9.48063 | epsilon: 0.22
101 | 980 episode | score: 11.76 | loss: 3.18651 | epsilon: 0.22
102 | 990 episode | score: 12.31 | loss: 5.74552 | epsilon: 0.21
103 | 1000 episode | score: 12.32 | loss: 7.71358 | epsilon: 0.20
104 | 1010 episode | score: 12.02 | loss: 3.31016 | epsilon: 0.19
105 | 1020 episode | score: 11.75 | loss: 3.88349 | epsilon: 0.19
106 | 1030 episode | score: 11.42 | loss: 4.62133 | epsilon: 0.18
107 | 1040 episode | score: 11.22 | loss: 6.51512 | epsilon: 0.18
108 | 1050 episode | score: 11.11 | loss: 3.32236 | epsilon: 0.17
109 | 1060 episode | score: 11.02 | loss: 7.15486 | epsilon: 0.17
110 | 1070 episode | score: 10.90 | loss: 5.20952 | epsilon: 0.16
111 | 1080 episode | score: 10.76 | loss: 7.23930 | epsilon: 0.16
112 | 1090 episode | score: 10.72 | loss: 6.55639 | epsilon: 0.15
113 | 1100 episode | score: 10.57 | loss: 7.21093 | epsilon: 0.15
114 | 1110 episode | score: 10.37 | loss: 7.88799 | epsilon: 0.14
115 | 1120 episode | score: 10.38 | loss: 7.39938 | epsilon: 0.14
116 | 1130 episode | score: 10.24 | loss: 5.30793 | epsilon: 0.13
117 | 1140 episode | score: 10.10 | loss: 8.76813 | epsilon: 0.13
118 | 1150 episode | score: 10.33 | loss: 5.46215 | epsilon: 0.12
119 | 1160 episode | score: 10.22 | loss: 7.47411 | epsilon: 0.11
120 | 1170 episode | score: 10.06 | loss: 6.90808 | epsilon: 0.11
121 | 1180 episode | score: 9.96 | loss: 6.75095 | epsilon: 0.11
122 | 1190 episode | score: 9.86 | loss: 9.46002 | epsilon: 0.10
123 | 1200 episode | score: 9.78 | loss: 10.21884 | epsilon: 0.10
124 | 1210 episode | score: 9.71 | loss: 6.17822 | epsilon: 0.09
125 | 1220 episode | score: 9.62 | loss: 8.82867 | epsilon: 0.09
126 | 1230 episode | score: 9.67 | loss: 9.50774 | epsilon: 0.08
127 | 1240 episode | score: 9.64 | loss: 8.20357 | epsilon: 0.07
128 | 1250 episode | score: 9.57 | loss: 7.58357 | epsilon: 0.07
129 | 1260 episode | score: 9.55 | loss: 7.55123 | epsilon: 0.06
130 | 1270 episode | score: 9.44 | loss: 10.22959 | epsilon: 0.06
131 | 1280 episode | score: 9.51 | loss: 8.21039 | epsilon: 0.05
132 | 1290 episode | score: 9.37 | loss: 7.57248 | epsilon: 0.05
133 | 1300 episode | score: 9.25 | loss: 8.99998 | epsilon: 0.04
134 | 1310 episode | score: 9.15 | loss: 8.93621 | epsilon: 0.04
135 | 1320 episode | score: 9.08 | loss: 6.27452 | epsilon: 0.04
136 | 1330 episode | score: 9.16 | loss: 9.62946 | epsilon: 0.03
137 | 1340 episode | score: 9.63 | loss: 7.61284 | epsilon: 0.02
138 | 1350 episode | score: 9.48 | loss: 12.47634 | epsilon: 0.02
139 | 1360 episode | score: 9.36 | loss: 10.38395 | epsilon: 0.01
140 | 1370 episode | score: 9.25 | loss: 6.25999 | epsilon: 0.01
141 | 1380 episode | score: 9.19 | loss: 9.05333 | epsilon: 0.01
142 | 1390 episode | score: 9.07 | loss: 8.36740 | epsilon: 0.01
143 | 1400 episode | score: 9.10 | loss: 13.26970 | epsilon: 0.01
144 | 1410 episode | score: 9.06 | loss: 8.44920 | epsilon: 0.01
145 | 1420 episode | score: 9.08 | loss: 8.39362 | epsilon: 0.01
146 | 1430 episode | score: 9.05 | loss: 9.86799 | epsilon: 0.01
147 | 1440 episode | score: 9.13 | loss: 11.96962 | epsilon: 0.01
148 | 1450 episode | score: 9.06 | loss: 11.25040 | epsilon: 0.01
149 | 1460 episode | score: 9.08 | loss: 9.15750 | epsilon: 0.01
150 | 1470 episode | score: 9.02 | loss: 10.56983 | epsilon: 0.01
151 | 1480 episode | score: 8.97 | loss: 8.49944 | epsilon: 0.01
152 | 1490 episode | score: 8.91 | loss: 7.08509 | epsilon: 0.01
153 | 1500 episode | score: 8.91 | loss: 9.17531 | epsilon: 0.01
154 | 1510 episode | score: 8.86 | loss: 7.78239 | epsilon: 0.01
155 | 1520 episode | score: 8.82 | loss: 10.60622 | epsilon: 0.01
156 | 1530 episode | score: 8.77 | loss: 10.68818 | epsilon: 0.01
157 | 1540 episode | score: 8.72 | loss: 9.89931 | epsilon: 0.01
158 | 1550 episode | score: 8.71 | loss: 13.45332 | epsilon: 0.01
159 | 1560 episode | score: 8.69 | loss: 9.95345 | epsilon: 0.01
160 | 1570 episode | score: 8.70 | loss: 12.07025 | epsilon: 0.01
161 | 1580 episode | score: 8.94 | loss: 9.34593 | epsilon: 0.01
162 | 1590 episode | score: 8.89 | loss: 12.17938 | epsilon: 0.01
163 | 1600 episode | score: 8.83 | loss: 10.72823 | epsilon: 0.01
164 | 1610 episode | score: 8.79 | loss: 6.47576 | epsilon: 0.01
165 | 1620 episode | score: 8.72 | loss: 10.74697 | epsilon: 0.01
166 | 1630 episode | score: 8.71 | loss: 5.77982 | epsilon: 0.01
167 | 1640 episode | score: 8.70 | loss: 11.54216 | epsilon: 0.01
168 | 1650 episode | score: 8.69 | loss: 8.69251 | epsilon: 0.01
169 | 1660 episode | score: 8.81 | loss: 10.78316 | epsilon: 0.01
170 | 1670 episode | score: 8.78 | loss: 6.49401 | epsilon: 0.01
171 | 1680 episode | score: 8.77 | loss: 7.22237 | epsilon: 0.01
172 | 1690 episode | score: 8.89 | loss: 8.70512 | epsilon: 0.01
173 | 1700 episode | score: 9.02 | loss: 8.67514 | epsilon: 0.01
174 | 1710 episode | score: 8.95 | loss: 6.52219 | epsilon: 0.01
175 | 1720 episode | score: 8.88 | loss: 8.06107 | epsilon: 0.01
176 | 1730 episode | score: 8.81 | loss: 7.99918 | epsilon: 0.01
177 | 1740 episode | score: 8.86 | loss: 8.01453 | epsilon: 0.01
178 | 1750 episode | score: 8.84 | loss: 7.27068 | epsilon: 0.01
179 | 1760 episode | score: 8.82 | loss: 10.14679 | epsilon: 0.01
180 | 1770 episode | score: 8.78 | loss: 10.91075 | epsilon: 0.01
181 | 1780 episode | score: 8.77 | loss: 10.94096 | epsilon: 0.01
182 | 1790 episode | score: 8.77 | loss: 8.04180 | epsilon: 0.01
183 | 1800 episode | score: 8.92 | loss: 6.63493 | epsilon: 0.01
184 | 1810 episode | score: 8.83 | loss: 10.22885 | epsilon: 0.01
185 | 1820 episode | score: 8.75 | loss: 8.09923 | epsilon: 0.01
186 | 1830 episode | score: 8.76 | loss: 7.35126 | epsilon: 0.01
187 | 1840 episode | score: 9.33 | loss: 9.55382 | epsilon: 0.01
188 | 1850 episode | score: 9.34 | loss: 8.09047 | epsilon: 0.01
189 | 1860 episode | score: 9.30 | loss: 6.61888 | epsilon: 0.01
190 | 1870 episode | score: 9.24 | loss: 9.54714 | epsilon: 0.01
191 | 1880 episode | score: 9.60 | loss: 10.28490 | epsilon: 0.01
192 | 1890 episode | score: 9.48 | loss: 6.67569 | epsilon: 0.01
193 | 1900 episode | score: 9.40 | loss: 11.07699 | epsilon: 0.01
194 | 1910 episode | score: 9.29 | loss: 7.40748 | epsilon: 0.01
195 | 1920 episode | score: 9.26 | loss: 8.09952 | epsilon: 0.01
196 | 1930 episode | score: 9.23 | loss: 8.16106 | epsilon: 0.01
197 | 1940 episode | score: 9.16 | loss: 9.59923 | epsilon: 0.01
198 | 1950 episode | score: 9.07 | loss: 11.85551 | epsilon: 0.01
199 | 1960 episode | score: 8.93 | loss: 9.61497 | epsilon: 0.01
200 | 1970 episode | score: 8.90 | loss: 9.60651 | epsilon: 0.01
201 | 1980 episode | score: 8.83 | loss: 10.42463 | epsilon: 0.01
202 | 1990 episode | score: 8.81 | loss: 6.69849 | epsilon: 0.01
203 | 2000 episode | score: 8.76 | loss: 8.90158 | epsilon: 0.01
204 | 2010 episode | score: 8.78 | loss: 11.14114 | epsilon: 0.01
205 | 2020 episode | score: 8.74 | loss: 11.22850 | epsilon: 0.01
206 | 2030 episode | score: 8.74 | loss: 13.42511 | epsilon: 0.01
207 | 2040 episode | score: 8.70 | loss: 10.42988 | epsilon: 0.01
208 | 2050 episode | score: 8.70 | loss: 11.23738 | epsilon: 0.01
209 | 2060 episode | score: 8.64 | loss: 11.95969 | epsilon: 0.01
210 | 2070 episode | score: 8.71 | loss: 10.46605 | epsilon: 0.01
211 | 2080 episode | score: 8.71 | loss: 12.78350 | epsilon: 0.01
212 | 2090 episode | score: 8.70 | loss: 12.75578 | epsilon: 0.01
213 | 2100 episode | score: 8.68 | loss: 7.51685 | epsilon: 0.01
214 | 2110 episode | score: 8.81 | loss: 12.03533 | epsilon: 0.01
215 | 2120 episode | score: 8.81 | loss: 9.81312 | epsilon: 0.01
216 | 2130 episode | score: 8.92 | loss: 9.80107 | epsilon: 0.01
217 | 2140 episode | score: 8.82 | loss: 7.60721 | epsilon: 0.01
218 | 2150 episode | score: 8.78 | loss: 9.06595 | epsilon: 0.01
219 | 2160 episode | score: 8.78 | loss: 9.80017 | epsilon: 0.01
220 | 2170 episode | score: 8.74 | loss: 12.85833 | epsilon: 0.01
221 | 2180 episode | score: 8.69 | loss: 11.30474 | epsilon: 0.01
222 | 2190 episode | score: 8.80 | loss: 11.48147 | epsilon: 0.01
223 | 2200 episode | score: 8.78 | loss: 7.64000 | epsilon: 0.01
224 | 2210 episode | score: 8.73 | loss: 9.82936 | epsilon: 0.01
225 | 2220 episode | score: 8.69 | loss: 10.60633 | epsilon: 0.01
226 | 2230 episode | score: 8.66 | loss: 12.14406 | epsilon: 0.01
227 | 2240 episode | score: 8.69 | loss: 12.14394 | epsilon: 0.01
228 | 2250 episode | score: 8.66 | loss: 13.65806 | epsilon: 0.01
229 | 2260 episode | score: 8.62 | loss: 9.91018 | epsilon: 0.01
230 | 2270 episode | score: 8.65 | loss: 10.62868 | epsilon: 0.01
231 | 2280 episode | score: 8.61 | loss: 13.67840 | epsilon: 0.01
232 | 2290 episode | score: 8.68 | loss: 11.39243 | epsilon: 0.01
233 | 2300 episode | score: 8.63 | loss: 11.43112 | epsilon: 0.01
234 | 2310 episode | score: 8.66 | loss: 10.68759 | epsilon: 0.01
235 | 2320 episode | score: 8.62 | loss: 12.98507 | epsilon: 0.01
236 | 2330 episode | score: 8.60 | loss: 9.93560 | epsilon: 0.01
237 | 2340 episode | score: 8.59 | loss: 13.80500 | epsilon: 0.01
238 | 2350 episode | score: 8.63 | loss: 8.43398 | epsilon: 0.01
239 | 2360 episode | score: 8.63 | loss: 7.72701 | epsilon: 0.01
240 | 2370 episode | score: 8.60 | loss: 9.18573 | epsilon: 0.01
241 | 2380 episode | score: 8.62 | loss: 10.83447 | epsilon: 0.01
242 | 2390 episode | score: 8.57 | loss: 11.49614 | epsilon: 0.01
243 | 2400 episode | score: 8.61 | loss: 6.16400 | epsilon: 0.01
244 | 2410 episode | score: 8.61 | loss: 10.03061 | epsilon: 0.01
245 | 2420 episode | score: 8.59 | loss: 10.77323 | epsilon: 0.01
246 | 2430 episode | score: 8.64 | loss: 9.24260 | epsilon: 0.01
247 | 2440 episode | score: 8.60 | loss: 10.04683 | epsilon: 0.01
248 | 2450 episode | score: 8.69 | loss: 12.27101 | epsilon: 0.01
249 | 2460 episode | score: 9.43 | loss: 9.24598 | epsilon: 0.01
250 | 2470 episode | score: 9.42 | loss: 8.50105 | epsilon: 0.01
251 | 2480 episode | score: 9.33 | loss: 9.97533 | epsilon: 0.01
252 | 2490 episode | score: 9.33 | loss: 6.14554 | epsilon: 0.01
253 | 2500 episode | score: 9.33 | loss: 5.40624 | epsilon: 0.01
254 | 2510 episode | score: 9.21 | loss: 10.75754 | epsilon: 0.01
255 | 2520 episode | score: 9.13 | loss: 6.99077 | epsilon: 0.01
256 | 2530 episode | score: 9.07 | loss: 8.52315 | epsilon: 0.01
257 | 2540 episode | score: 9.05 | loss: 6.93344 | epsilon: 0.01
258 | 2550 episode | score: 8.99 | loss: 6.93618 | epsilon: 0.01
259 | 2560 episode | score: 8.95 | loss: 13.07306 | epsilon: 0.01
260 | 2570 episode | score: 8.94 | loss: 11.62716 | epsilon: 0.01
261 | 2580 episode | score: 8.85 | loss: 10.78042 | epsilon: 0.01
262 | 2590 episode | score: 8.81 | loss: 11.56523 | epsilon: 0.01
263 | 2600 episode | score: 8.80 | loss: 9.23891 | epsilon: 0.01
264 | 2610 episode | score: 8.78 | loss: 10.78324 | epsilon: 0.01
265 | 2620 episode | score: 8.70 | loss: 10.02428 | epsilon: 0.01
266 | 2630 episode | score: 8.65 | loss: 10.77991 | epsilon: 0.01
267 | 2640 episode | score: 8.65 | loss: 13.08437 | epsilon: 0.01
268 | 2650 episode | score: 8.65 | loss: 13.09908 | epsilon: 0.01
269 | 2660 episode | score: 8.63 | loss: 17.00031 | epsilon: 0.01
270 | 2670 episode | score: 8.65 | loss: 10.79809 | epsilon: 0.01
271 | 2680 episode | score: 8.67 | loss: 7.69315 | epsilon: 0.01
272 | 2690 episode | score: 8.64 | loss: 8.46132 | epsilon: 0.01
273 | 2700 episode | score: 8.63 | loss: 16.19745 | epsilon: 0.01
274 | 2710 episode | score: 8.61 | loss: 12.32985 | epsilon: 0.01
275 | 2720 episode | score: 8.57 | loss: 9.25499 | epsilon: 0.01
276 | 2730 episode | score: 8.55 | loss: 12.31944 | epsilon: 0.01
277 | 2740 episode | score: 8.57 | loss: 11.62032 | epsilon: 0.01
278 | 2750 episode | score: 8.56 | loss: 11.63290 | epsilon: 0.01
279 | 2760 episode | score: 8.52 | loss: 13.10786 | epsilon: 0.01
280 | 2770 episode | score: 8.51 | loss: 12.36207 | epsilon: 0.01
281 | 2780 episode | score: 8.50 | loss: 15.42316 | epsilon: 0.01
282 | 2790 episode | score: 8.56 | loss: 13.21941 | epsilon: 0.01
283 | 2800 episode | score: 8.57 | loss: 10.06483 | epsilon: 0.01
284 | 2810 episode | score: 8.57 | loss: 11.56093 | epsilon: 0.01
285 | 2820 episode | score: 8.55 | loss: 14.74690 | epsilon: 0.01
286 | 2830 episode | score: 8.54 | loss: 11.64284 | epsilon: 0.01
287 | 2840 episode | score: 8.52 | loss: 10.13775 | epsilon: 0.01
288 | 2850 episode | score: 8.51 | loss: 15.48453 | epsilon: 0.01
289 | 2860 episode | score: 8.53 | loss: 13.18159 | epsilon: 0.01
290 | 2870 episode | score: 8.58 | loss: 10.06775 | epsilon: 0.01
291 | 2880 episode | score: 8.61 | loss: 9.30755 | epsilon: 0.01
292 | 2890 episode | score: 8.65 | loss: 8.55476 | epsilon: 0.01
293 | 2900 episode | score: 8.63 | loss: 10.87585 | epsilon: 0.01
294 | 2910 episode | score: 8.64 | loss: 13.18019 | epsilon: 0.01
295 | 2920 episode | score: 8.60 | loss: 10.19342 | epsilon: 0.01
296 | 2930 episode | score: 8.60 | loss: 10.07251 | epsilon: 0.01
297 | 2940 episode | score: 8.68 | loss: 11.60551 | epsilon: 0.01
298 | 2950 episode | score: 8.64 | loss: 8.51660 | epsilon: 0.01
299 | 2960 episode | score: 8.73 | loss: 10.87706 | epsilon: 0.01
300 | 2970 episode | score: 8.70 | loss: 12.41756 | epsilon: 0.01
301 | 2980 episode | score: 8.63 | loss: 11.63462 | epsilon: 0.01
302 | 2990 episode | score: 8.61 | loss: 12.45533 | epsilon: 0.01
303 | 3000 episode | score: 8.62 | loss: 13.31752 | epsilon: 0.01
304 | 3010 episode | score: 8.64 | loss: 11.63569 | epsilon: 0.01
305 | 3020 episode | score: 8.78 | loss: 9.32524 | epsilon: 0.01
306 | 3030 episode | score: 8.76 | loss: 11.73422 | epsilon: 0.01
307 | 3040 episode | score: 8.75 | loss: 13.20695 | epsilon: 0.01
308 | 3050 episode | score: 8.74 | loss: 7.05605 | epsilon: 0.01
309 | 3060 episode | score: 8.69 | loss: 8.57716 | epsilon: 0.01
310 | 3070 episode | score: 8.66 | loss: 10.19580 | epsilon: 0.01
311 | 3080 episode | score: 8.66 | loss: 14.06568 | epsilon: 0.01
312 | 3090 episode | score: 8.64 | loss: 12.54142 | epsilon: 0.01
313 | 3100 episode | score: 8.62 | loss: 8.65207 | epsilon: 0.01
314 | 3110 episode | score: 8.59 | loss: 11.02742 | epsilon: 0.01
315 | 3120 episode | score: 8.58 | loss: 10.94348 | epsilon: 0.01
316 | 3130 episode | score: 8.47 | loss: 14.08981 | epsilon: 0.01
317 | 3140 episode | score: 8.49 | loss: 12.66682 | epsilon: 0.01
318 | 3150 episode | score: 8.48 | loss: 12.50308 | epsilon: 0.01
319 | 3160 episode | score: 8.52 | loss: 11.85718 | epsilon: 0.01
320 | 3170 episode | score: 8.52 | loss: 9.50961 | epsilon: 0.01
321 | 3180 episode | score: 8.53 | loss: 12.61227 | epsilon: 0.01
322 | 3190 episode | score: 8.56 | loss: 9.45285 | epsilon: 0.01
323 | 3200 episode | score: 8.60 | loss: 11.76034 | epsilon: 0.01
324 | 3210 episode | score: 8.68 | loss: 14.85800 | epsilon: 0.01
325 | 3220 episode | score: 8.66 | loss: 11.74906 | epsilon: 0.01
326 | 3230 episode | score: 8.76 | loss: 10.19856 | epsilon: 0.01
327 | 3240 episode | score: 8.69 | loss: 7.87378 | epsilon: 0.01
328 | 3250 episode | score: 8.80 | loss: 10.24053 | epsilon: 0.01
329 | 3260 episode | score: 8.75 | loss: 6.31049 | epsilon: 0.01
330 | 3270 episode | score: 8.72 | loss: 11.75972 | epsilon: 0.01
331 | 3280 episode | score: 8.68 | loss: 11.00718 | epsilon: 0.01
332 | 3290 episode | score: 8.66 | loss: 8.77038 | epsilon: 0.01
333 | 3300 episode | score: 8.70 | loss: 11.11809 | epsilon: 0.01
334 | 3310 episode | score: 8.71 | loss: 19.04940 | epsilon: 0.01
335 | 3320 episode | score: 8.70 | loss: 11.57778 | epsilon: 0.01
336 | 3330 episode | score: 8.68 | loss: 17.37124 | epsilon: 0.01
337 | 3340 episode | score: 8.62 | loss: 13.05712 | epsilon: 0.01
338 | 3350 episode | score: 8.58 | loss: 14.91305 | epsilon: 0.01
339 | 3360 episode | score: 8.56 | loss: 14.71657 | epsilon: 0.01
340 | 3370 episode | score: 8.53 | loss: 16.40873 | epsilon: 0.01
341 | 3380 episode | score: 8.54 | loss: 14.87654 | epsilon: 0.01
342 | 3390 episode | score: 8.50 | loss: 12.68097 | epsilon: 0.01
343 | 3400 episode | score: 8.53 | loss: 12.65293 | epsilon: 0.01
344 | 3410 episode | score: 8.54 | loss: 10.97103 | epsilon: 0.01
345 | 3420 episode | score: 8.53 | loss: 10.18955 | epsilon: 0.01
346 | 3430 episode | score: 8.54 | loss: 9.49034 | epsilon: 0.01
347 | 3440 episode | score: 8.53 | loss: 7.01580 | epsilon: 0.01
348 | 3450 episode | score: 8.52 | loss: 14.17627 | epsilon: 0.01
349 | 3460 episode | score: 8.52 | loss: 13.27503 | epsilon: 0.01
350 | 3470 episode | score: 8.52 | loss: 9.44508 | epsilon: 0.01
351 | 3480 episode | score: 8.52 | loss: 13.33412 | epsilon: 0.01
352 | 3490 episode | score: 8.49 | loss: 15.74604 | epsilon: 0.01
353 | 3500 episode | score: 8.47 | loss: 15.71954 | epsilon: 0.01
354 | 3510 episode | score: 8.47 | loss: 10.21629 | epsilon: 0.01
355 | 3520 episode | score: 8.47 | loss: 8.66194 | epsilon: 0.01
356 | 3530 episode | score: 8.65 | loss: 12.74928 | epsilon: 0.01
357 | 3540 episode | score: 8.79 | loss: 7.37917 | epsilon: 0.01
358 | 3550 episode | score: 8.80 | loss: 13.15130 | epsilon: 0.01
359 | 3560 episode | score: 8.84 | loss: 7.93903 | epsilon: 0.01
360 | 3570 episode | score: 8.85 | loss: 12.73963 | epsilon: 0.01
361 | 3580 episode | score: 9.59 | loss: 12.13551 | epsilon: 0.01
362 | 3590 episode | score: 9.49 | loss: 15.21393 | epsilon: 0.01
363 | 3600 episode | score: 9.71 | loss: 14.52880 | epsilon: 0.01
364 | 3610 episode | score: 9.87 | loss: 7.44391 | epsilon: 0.01
365 | 3620 episode | score: 10.68 | loss: 7.64639 | epsilon: 0.01
366 | 3630 episode | score: 10.61 | loss: 12.73185 | epsilon: 0.01
367 | 3640 episode | score: 10.62 | loss: 8.65501 | epsilon: 0.01
368 | 3650 episode | score: 10.40 | loss: 18.17788 | epsilon: 0.01
369 | 3660 episode | score: 10.62 | loss: 9.56333 | epsilon: 0.01
370 | 3670 episode | score: 10.49 | loss: 5.19324 | epsilon: 0.01
371 | 3680 episode | score: 10.69 | loss: 5.95822 | epsilon: 0.01
372 | 3690 episode | score: 10.74 | loss: 5.43674 | epsilon: 0.01
373 | 3700 episode | score: 11.05 | loss: 11.38299 | epsilon: 0.01
374 | 3710 episode | score: 11.20 | loss: 7.25396 | epsilon: 0.01
375 | 3720 episode | score: 10.94 | loss: 5.58109 | epsilon: 0.01
376 | 3730 episode | score: 10.70 | loss: 10.16100 | epsilon: 0.01
377 | 3740 episode | score: 10.63 | loss: 5.46199 | epsilon: 0.01
378 | 3750 episode | score: 10.43 | loss: 6.40074 | epsilon: 0.01
379 | 3760 episode | score: 10.29 | loss: 5.79258 | epsilon: 0.01
380 | 3770 episode | score: 10.58 | loss: 9.68128 | epsilon: 0.01
381 | 3780 episode | score: 10.54 | loss: 10.07901 | epsilon: 0.01
382 | 3790 episode | score: 10.32 | loss: 10.71799 | epsilon: 0.01
383 | 3800 episode | score: 10.16 | loss: 4.50272 | epsilon: 0.01
384 | 3810 episode | score: 9.98 | loss: 11.63281 | epsilon: 0.01
385 | 3820 episode | score: 9.98 | loss: 12.97600 | epsilon: 0.01
386 | 3830 episode | score: 9.84 | loss: 6.84899 | epsilon: 0.01
387 | 3840 episode | score: 10.17 | loss: 10.55282 | epsilon: 0.01
388 | 3850 episode | score: 10.25 | loss: 6.92412 | epsilon: 0.01
389 | 3860 episode | score: 12.36 | loss: 8.25691 | epsilon: 0.01
390 | 3870 episode | score: 12.19 | loss: 6.05455 | epsilon: 0.01
391 | 3880 episode | score: 12.74 | loss: 5.93086 | epsilon: 0.01
392 | 3890 episode | score: 12.99 | loss: 7.37379 | epsilon: 0.01
393 | 3900 episode | score: 13.56 | loss: 6.59420 | epsilon: 0.01
394 | 3910 episode | score: 13.96 | loss: 12.78928 | epsilon: 0.01
395 | 3920 episode | score: 14.24 | loss: 9.47036 | epsilon: 0.01
396 | 3930 episode | score: 13.80 | loss: 5.10357 | epsilon: 0.01
397 | 3940 episode | score: 13.83 | loss: 4.61695 | epsilon: 0.01
398 | 3950 episode | score: 13.77 | loss: 6.49820 | epsilon: 0.01
399 | 3960 episode | score: 13.54 | loss: 8.94904 | epsilon: 0.01
400 | 3970 episode | score: 13.80 | loss: 4.33383 | epsilon: 0.01
401 | 3980 episode | score: 13.58 | loss: 6.72084 | epsilon: 0.01
402 | 3990 episode | score: 13.11 | loss: 5.56996 | epsilon: 0.01
403 | 4000 episode | score: 14.05 | loss: 6.95229 | epsilon: 0.01
404 | 4010 episode | score: 15.22 | loss: 6.26257 | epsilon: 0.01
405 | 4020 episode | score: 14.96 | loss: 8.59238 | epsilon: 0.01
406 | 4030 episode | score: 14.81 | loss: 5.12621 | epsilon: 0.01
407 | 4040 episode | score: 16.56 | loss: 5.35934 | epsilon: 0.01
408 | 4050 episode | score: 17.87 | loss: 5.79597 | epsilon: 0.01
409 | 4060 episode | score: 18.46 | loss: 5.03159 | epsilon: 0.01
410 | 4070 episode | score: 17.79 | loss: 4.54399 | epsilon: 0.01
411 | 4080 episode | score: 17.09 | loss: 5.42259 | epsilon: 0.01
412 | 4090 episode | score: 16.82 | loss: 6.25688 | epsilon: 0.01
413 | 4100 episode | score: 16.47 | loss: 4.95715 | epsilon: 0.01
414 | 4110 episode | score: 16.43 | loss: 6.76208 | epsilon: 0.01
415 | 4120 episode | score: 15.76 | loss: 5.24656 | epsilon: 0.01
416 | 4130 episode | score: 15.09 | loss: 4.21207 | epsilon: 0.01
417 | 4140 episode | score: 14.77 | loss: 4.31235 | epsilon: 0.01
418 | 4150 episode | score: 14.27 | loss: 7.29699 | epsilon: 0.01
419 | 4160 episode | score: 14.58 | loss: 8.66764 | epsilon: 0.01
420 | 4170 episode | score: 13.98 | loss: 9.69282 | epsilon: 0.01
421 | 4180 episode | score: 13.57 | loss: 11.01928 | epsilon: 0.01
422 | 4190 episode | score: 13.16 | loss: 7.08015 | epsilon: 0.01
423 | 4200 episode | score: 12.75 | loss: 11.21581 | epsilon: 0.01
424 | 4210 episode | score: 12.74 | loss: 8.57672 | epsilon: 0.01
425 | 4220 episode | score: 12.66 | loss: 7.83942 | epsilon: 0.01
426 | 4230 episode | score: 12.39 | loss: 7.03415 | epsilon: 0.01
427 | 4240 episode | score: 12.18 | loss: 9.00213 | epsilon: 0.01
428 | 4250 episode | score: 11.87 | loss: 5.73147 | epsilon: 0.01
429 | 4260 episode | score: 11.55 | loss: 10.40915 | epsilon: 0.01
430 | 4270 episode | score: 11.38 | loss: 7.70147 | epsilon: 0.01
431 | 4280 episode | score: 11.92 | loss: 10.10661 | epsilon: 0.01
432 | 4290 episode | score: 12.47 | loss: 10.77976 | epsilon: 0.01
433 | 4300 episode | score: 12.25 | loss: 11.34655 | epsilon: 0.01
434 | 4310 episode | score: 12.67 | loss: 8.98248 | epsilon: 0.01
435 | 4320 episode | score: 12.81 | loss: 5.49514 | epsilon: 0.01
436 | 4330 episode | score: 12.64 | loss: 10.40312 | epsilon: 0.01
437 | 4340 episode | score: 12.71 | loss: 8.04442 | epsilon: 0.01
438 | 4350 episode | score: 12.84 | loss: 5.51279 | epsilon: 0.01
439 | 4360 episode | score: 13.04 | loss: 5.63618 | epsilon: 0.01
440 | 4370 episode | score: 14.36 | loss: 7.06970 | epsilon: 0.01
441 | 4380 episode | score: 15.96 | loss: 3.44520 | epsilon: 0.01
442 | 4390 episode | score: 15.99 | loss: 4.16670 | epsilon: 0.01
443 | 4400 episode | score: 15.70 | loss: 6.04997 | epsilon: 0.01
444 | 4410 episode | score: 16.32 | loss: 3.34858 | epsilon: 0.01
445 | 4420 episode | score: 16.23 | loss: 3.47187 | epsilon: 0.01
446 | 4430 episode | score: 15.74 | loss: 2.59578 | epsilon: 0.01
447 | 4440 episode | score: 15.06 | loss: 4.78465 | epsilon: 0.01
448 | 4450 episode | score: 14.43 | loss: 4.92613 | epsilon: 0.01
449 | 4460 episode | score: 13.87 | loss: 5.04869 | epsilon: 0.01
450 | 4470 episode | score: 13.38 | loss: 4.26093 | epsilon: 0.01
451 | 4480 episode | score: 12.92 | loss: 8.86478 | epsilon: 0.01
452 | 4490 episode | score: 12.60 | loss: 7.41720 | epsilon: 0.01
453 | 4500 episode | score: 12.27 | loss: 6.93199 | epsilon: 0.01
454 | 4510 episode | score: 12.09 | loss: 8.05820 | epsilon: 0.01
455 | 4520 episode | score: 12.52 | loss: 11.38961 | epsilon: 0.01
456 | 4530 episode | score: 12.23 | loss: 3.42938 | epsilon: 0.01
457 | 4540 episode | score: 11.87 | loss: 6.62923 | epsilon: 0.01
458 | 4550 episode | score: 11.51 | loss: 9.16134 | epsilon: 0.01
459 | 4560 episode | score: 12.50 | loss: 7.77041 | epsilon: 0.01
460 | 4570 episode | score: 12.57 | loss: 5.53519 | epsilon: 0.01
461 | 4580 episode | score: 12.25 | loss: 4.22186 | epsilon: 0.01
462 | 4590 episode | score: 12.01 | loss: 7.64343 | epsilon: 0.01
463 | 4600 episode | score: 11.69 | loss: 9.08254 | epsilon: 0.01
464 | 4610 episode | score: 12.21 | loss: 9.23807 | epsilon: 0.01
465 | 4620 episode | score: 12.35 | loss: 7.85843 | epsilon: 0.01
466 | 4630 episode | score: 12.43 | loss: 4.00312 | epsilon: 0.01
467 | 4640 episode | score: 12.37 | loss: 6.80646 | epsilon: 0.01
468 | 4650 episode | score: 12.02 | loss: 5.93498 | epsilon: 0.01
469 | 4660 episode | score: 11.71 | loss: 8.09913 | epsilon: 0.01
470 | 4670 episode | score: 11.43 | loss: 6.34250 | epsilon: 0.01
471 | 4680 episode | score: 11.56 | loss: 4.99433 | epsilon: 0.01
472 | 4690 episode | score: 12.29 | loss: 9.94151 | epsilon: 0.01
473 | 4700 episode | score: 11.97 | loss: 7.68697 | epsilon: 0.01
474 | 4710 episode | score: 11.65 | loss: 4.49063 | epsilon: 0.01
475 | 4720 episode | score: 11.32 | loss: 7.65144 | epsilon: 0.01
476 | 4730 episode | score: 11.11 | loss: 7.93525 | epsilon: 0.01
477 | 4740 episode | score: 11.40 | loss: 10.41766 | epsilon: 0.01
478 | 4750 episode | score: 11.14 | loss: 8.34215 | epsilon: 0.01
479 | 4760 episode | score: 11.02 | loss: 8.46258 | epsilon: 0.01
480 | 4770 episode | score: 10.77 | loss: 9.06955 | epsilon: 0.01
481 | 4780 episode | score: 11.02 | loss: 6.47147 | epsilon: 0.01
482 | 4790 episode | score: 11.77 | loss: 11.79726 | epsilon: 0.01
483 | 4800 episode | score: 11.89 | loss: 6.76751 | epsilon: 0.01
484 | 4810 episode | score: 11.53 | loss: 3.67404 | epsilon: 0.01
485 | 4820 episode | score: 11.22 | loss: 8.01371 | epsilon: 0.01
486 | 4830 episode | score: 11.01 | loss: 5.74920 | epsilon: 0.01
487 | 4840 episode | score: 10.85 | loss: 6.84008 | epsilon: 0.01
488 | 4850 episode | score: 10.70 | loss: 7.53777 | epsilon: 0.01
489 | 4860 episode | score: 10.54 | loss: 7.15283 | epsilon: 0.01
490 | 4870 episode | score: 10.38 | loss: 4.72300 | epsilon: 0.01
491 | 4880 episode | score: 10.19 | loss: 8.11345 | epsilon: 0.01
492 | 4890 episode | score: 10.07 | loss: 11.14361 | epsilon: 0.01
493 | 4900 episode | score: 9.91 | loss: 11.17979 | epsilon: 0.01
494 | 4910 episode | score: 10.35 | loss: 10.31738 | epsilon: 0.01
495 | 4920 episode | score: 10.26 | loss: 9.51637 | epsilon: 0.01
496 | 4930 episode | score: 10.12 | loss: 9.10228 | epsilon: 0.01
497 | 4940 episode | score: 10.10 | loss: 8.66119 | epsilon: 0.01
498 | 4950 episode | score: 9.98 | loss: 12.24035 | epsilon: 0.01
499 | 4960 episode | score: 10.12 | loss: 8.12571 | epsilon: 0.01
500 | 4970 episode | score: 9.96 | loss: 9.27201 | epsilon: 0.01
501 | 4980 episode | score: 9.76 | loss: 6.27622 | epsilon: 0.01
502 | 4990 episode | score: 10.96 | loss: 8.89670 | epsilon: 0.01
503 |
--------------------------------------------------------------------------------
/out/trace_DTQN_3.txt:
--------------------------------------------------------------------------------
1 | state size: 2
2 | action size: 2
3 | 0 episode | score: 29.00 | loss: 0.00000 | epsilon: 1.00
4 | 10 episode | score: 28.05 | loss: 0.00000 | epsilon: 1.00
5 | 20 episode | score: 27.76 | loss: 0.00000 | epsilon: 1.00
6 | 30 episode | score: 26.79 | loss: 0.00000 | epsilon: 1.00
7 | 40 episode | score: 26.39 | loss: 0.00000 | epsilon: 1.00
8 | 50 episode | score: 25.36 | loss: 0.28919 | epsilon: 1.00
9 | 60 episode | score: 25.46 | loss: 0.23288 | epsilon: 0.98
10 | 70 episode | score: 24.84 | loss: 0.11278 | epsilon: 0.97
11 | 80 episode | score: 25.18 | loss: 0.06212 | epsilon: 0.96
12 | 90 episode | score: 24.90 | loss: 0.24484 | epsilon: 0.95
13 | 100 episode | score: 24.19 | loss: 0.17145 | epsilon: 0.94
14 | 110 episode | score: 23.67 | loss: 0.42094 | epsilon: 0.93
15 | 120 episode | score: 23.28 | loss: 0.60395 | epsilon: 0.92
16 | 130 episode | score: 23.17 | loss: 0.96942 | epsilon: 0.90
17 | 140 episode | score: 23.19 | loss: 0.56008 | epsilon: 0.89
18 | 150 episode | score: 22.70 | loss: 0.59539 | epsilon: 0.88
19 | 160 episode | score: 22.48 | loss: 0.44283 | epsilon: 0.87
20 | 170 episode | score: 22.14 | loss: 0.87171 | epsilon: 0.86
21 | 180 episode | score: 21.63 | loss: 0.70601 | epsilon: 0.85
22 | 190 episode | score: 21.39 | loss: 0.28002 | epsilon: 0.84
23 | 200 episode | score: 22.30 | loss: 1.02817 | epsilon: 0.83
24 | 210 episode | score: 21.70 | loss: 1.63453 | epsilon: 0.82
25 | 220 episode | score: 22.18 | loss: 1.24204 | epsilon: 0.80
26 | 230 episode | score: 22.39 | loss: 1.17515 | epsilon: 0.79
27 | 240 episode | score: 22.06 | loss: 0.93853 | epsilon: 0.78
28 | 250 episode | score: 21.61 | loss: 0.95911 | epsilon: 0.77
29 | 260 episode | score: 21.10 | loss: 2.43419 | epsilon: 0.76
30 | 270 episode | score: 20.45 | loss: 0.67385 | epsilon: 0.76
31 | 280 episode | score: 19.91 | loss: 1.40381 | epsilon: 0.75
32 | 290 episode | score: 19.91 | loss: 1.05976 | epsilon: 0.74
33 | 300 episode | score: 20.05 | loss: 1.42052 | epsilon: 0.73
34 | 310 episode | score: 20.11 | loss: 0.39303 | epsilon: 0.72
35 | 320 episode | score: 19.67 | loss: 1.86086 | epsilon: 0.71
36 | 330 episode | score: 19.71 | loss: 0.78429 | epsilon: 0.70
37 | 340 episode | score: 19.21 | loss: 1.49736 | epsilon: 0.69
38 | 350 episode | score: 18.73 | loss: 0.79869 | epsilon: 0.68
39 | 360 episode | score: 18.79 | loss: 1.20941 | epsilon: 0.67
40 | 370 episode | score: 18.69 | loss: 1.23513 | epsilon: 0.66
41 | 380 episode | score: 18.26 | loss: 1.24348 | epsilon: 0.65
42 | 390 episode | score: 18.08 | loss: 0.44699 | epsilon: 0.65
43 | 400 episode | score: 18.15 | loss: 2.12892 | epsilon: 0.64
44 | 410 episode | score: 18.10 | loss: 1.73747 | epsilon: 0.63
45 | 420 episode | score: 17.90 | loss: 0.91533 | epsilon: 0.62
46 | 430 episode | score: 17.50 | loss: 2.23212 | epsilon: 0.61
47 | 440 episode | score: 18.26 | loss: 0.95165 | epsilon: 0.60
48 | 450 episode | score: 17.96 | loss: 1.67144 | epsilon: 0.59
49 | 460 episode | score: 17.76 | loss: 0.48981 | epsilon: 0.58
50 | 470 episode | score: 17.69 | loss: 1.89966 | epsilon: 0.57
51 | 480 episode | score: 17.55 | loss: 1.95501 | epsilon: 0.56
52 | 490 episode | score: 17.43 | loss: 0.83967 | epsilon: 0.55
53 | 500 episode | score: 17.33 | loss: 1.94789 | epsilon: 0.55
54 | 510 episode | score: 16.99 | loss: 2.94673 | epsilon: 0.54
55 | 520 episode | score: 16.67 | loss: 2.02298 | epsilon: 0.53
56 | 530 episode | score: 16.61 | loss: 2.01256 | epsilon: 0.52
57 | 540 episode | score: 16.47 | loss: 3.10653 | epsilon: 0.51
58 | 550 episode | score: 16.94 | loss: 2.09790 | epsilon: 0.50
59 | 560 episode | score: 16.67 | loss: 2.59242 | epsilon: 0.50
60 | 570 episode | score: 16.50 | loss: 3.64957 | epsilon: 0.49
61 | 580 episode | score: 16.16 | loss: 0.54352 | epsilon: 0.48
62 | 590 episode | score: 15.85 | loss: 3.84055 | epsilon: 0.47
63 | 600 episode | score: 15.43 | loss: 2.66422 | epsilon: 0.47
64 | 610 episode | score: 15.15 | loss: 2.68563 | epsilon: 0.46
65 | 620 episode | score: 14.77 | loss: 4.83064 | epsilon: 0.46
66 | 630 episode | score: 14.36 | loss: 3.35041 | epsilon: 0.45
67 | 640 episode | score: 14.00 | loss: 3.83584 | epsilon: 0.44
68 | 650 episode | score: 14.20 | loss: 4.90510 | epsilon: 0.44
69 | 660 episode | score: 13.95 | loss: 4.43448 | epsilon: 0.43
70 | 670 episode | score: 13.63 | loss: 3.32779 | epsilon: 0.42
71 | 680 episode | score: 13.54 | loss: 2.79640 | epsilon: 0.42
72 | 690 episode | score: 13.35 | loss: 3.95378 | epsilon: 0.41
73 | 700 episode | score: 13.16 | loss: 2.83241 | epsilon: 0.40
74 | 710 episode | score: 13.19 | loss: 4.54109 | epsilon: 0.40
75 | 720 episode | score: 13.05 | loss: 3.42609 | epsilon: 0.39
76 | 730 episode | score: 13.22 | loss: 2.38219 | epsilon: 0.38
77 | 740 episode | score: 12.99 | loss: 4.02707 | epsilon: 0.38
78 | 750 episode | score: 13.24 | loss: 2.98577 | epsilon: 0.37
79 | 760 episode | score: 13.19 | loss: 4.69834 | epsilon: 0.36
80 | 770 episode | score: 13.21 | loss: 4.26403 | epsilon: 0.35
81 | 780 episode | score: 12.95 | loss: 5.31904 | epsilon: 0.35
82 | 790 episode | score: 12.87 | loss: 4.12925 | epsilon: 0.34
83 | 800 episode | score: 12.56 | loss: 1.19411 | epsilon: 0.34
84 | 810 episode | score: 12.33 | loss: 5.39290 | epsilon: 0.33
85 | 820 episode | score: 12.34 | loss: 5.47288 | epsilon: 0.32
86 | 830 episode | score: 12.19 | loss: 3.06479 | epsilon: 0.32
87 | 840 episode | score: 12.10 | loss: 4.25964 | epsilon: 0.31
88 | 850 episode | score: 11.98 | loss: 6.09201 | epsilon: 0.31
89 | 860 episode | score: 11.82 | loss: 6.12898 | epsilon: 0.30
90 | 870 episode | score: 11.67 | loss: 3.80265 | epsilon: 0.29
91 | 880 episode | score: 11.57 | loss: 5.54596 | epsilon: 0.29
92 | 890 episode | score: 11.67 | loss: 7.39544 | epsilon: 0.28
93 | 900 episode | score: 11.83 | loss: 5.60033 | epsilon: 0.28
94 | 910 episode | score: 11.80 | loss: 6.22728 | epsilon: 0.27
95 | 920 episode | score: 11.77 | loss: 6.29498 | epsilon: 0.26
96 | 930 episode | score: 11.73 | loss: 6.87828 | epsilon: 0.26
97 | 940 episode | score: 11.57 | loss: 3.14054 | epsilon: 0.25
98 | 950 episode | score: 11.44 | loss: 4.42497 | epsilon: 0.25
99 | 960 episode | score: 11.75 | loss: 6.47100 | epsilon: 0.24
100 | 970 episode | score: 12.03 | loss: 2.65088 | epsilon: 0.23
101 | 980 episode | score: 11.77 | loss: 3.27828 | epsilon: 0.22
102 | 990 episode | score: 11.52 | loss: 5.82138 | epsilon: 0.22
103 | 1000 episode | score: 11.37 | loss: 5.23701 | epsilon: 0.21
104 | 1010 episode | score: 11.30 | loss: 5.16171 | epsilon: 0.21
105 | 1020 episode | score: 11.14 | loss: 4.57056 | epsilon: 0.20
106 | 1030 episode | score: 10.93 | loss: 8.42826 | epsilon: 0.20
107 | 1040 episode | score: 10.83 | loss: 3.27311 | epsilon: 0.19
108 | 1050 episode | score: 10.78 | loss: 6.52097 | epsilon: 0.19
109 | 1060 episode | score: 10.73 | loss: 6.55692 | epsilon: 0.18
110 | 1070 episode | score: 10.52 | loss: 10.46111 | epsilon: 0.18
111 | 1080 episode | score: 10.43 | loss: 5.27200 | epsilon: 0.17
112 | 1090 episode | score: 10.31 | loss: 6.59293 | epsilon: 0.17
113 | 1100 episode | score: 10.26 | loss: 7.88621 | epsilon: 0.16
114 | 1110 episode | score: 10.48 | loss: 5.32516 | epsilon: 0.15
115 | 1120 episode | score: 10.32 | loss: 5.33982 | epsilon: 0.15
116 | 1130 episode | score: 10.32 | loss: 8.01697 | epsilon: 0.14
117 | 1140 episode | score: 10.26 | loss: 4.68853 | epsilon: 0.14
118 | 1150 episode | score: 10.28 | loss: 8.93066 | epsilon: 0.13
119 | 1160 episode | score: 10.23 | loss: 7.86792 | epsilon: 0.13
120 | 1170 episode | score: 10.24 | loss: 3.57884 | epsilon: 0.12
121 | 1180 episode | score: 10.76 | loss: 6.14561 | epsilon: 0.11
122 | 1190 episode | score: 10.62 | loss: 7.47268 | epsilon: 0.11
123 | 1200 episode | score: 10.99 | loss: 8.17394 | epsilon: 0.10
124 | 1210 episode | score: 10.86 | loss: 5.48570 | epsilon: 0.09
125 | 1220 episode | score: 11.02 | loss: 4.13661 | epsilon: 0.09
126 | 1230 episode | score: 11.16 | loss: 6.14135 | epsilon: 0.08
127 | 1240 episode | score: 11.00 | loss: 5.50650 | epsilon: 0.08
128 | 1250 episode | score: 11.53 | loss: 4.82163 | epsilon: 0.07
129 | 1260 episode | score: 11.25 | loss: 4.83445 | epsilon: 0.06
130 | 1270 episode | score: 11.17 | loss: 6.16942 | epsilon: 0.06
131 | 1280 episode | score: 11.03 | loss: 4.15855 | epsilon: 0.05
132 | 1290 episode | score: 10.96 | loss: 6.19270 | epsilon: 0.05
133 | 1300 episode | score: 10.77 | loss: 10.95965 | epsilon: 0.04
134 | 1310 episode | score: 10.54 | loss: 7.58096 | epsilon: 0.04
135 | 1320 episode | score: 10.33 | loss: 7.57102 | epsilon: 0.03
136 | 1330 episode | score: 10.18 | loss: 6.90589 | epsilon: 0.03
137 | 1340 episode | score: 10.03 | loss: 8.31337 | epsilon: 0.02
138 | 1350 episode | score: 9.90 | loss: 7.61505 | epsilon: 0.02
139 | 1360 episode | score: 9.80 | loss: 9.71177 | epsilon: 0.01
140 | 1370 episode | score: 9.65 | loss: 8.40706 | epsilon: 0.01
141 | 1380 episode | score: 9.55 | loss: 11.78060 | epsilon: 0.01
142 | 1390 episode | score: 9.41 | loss: 8.36876 | epsilon: 0.01
143 | 1400 episode | score: 9.38 | loss: 10.45550 | epsilon: 0.01
144 | 1410 episode | score: 9.42 | loss: 4.21109 | epsilon: 0.01
145 | 1420 episode | score: 9.33 | loss: 10.47225 | epsilon: 0.01
146 | 1430 episode | score: 9.36 | loss: 7.83792 | epsilon: 0.01
147 | 1440 episode | score: 9.24 | loss: 7.73365 | epsilon: 0.01
148 | 1450 episode | score: 9.28 | loss: 7.04406 | epsilon: 0.01
149 | 1460 episode | score: 9.22 | loss: 6.33223 | epsilon: 0.01
150 | 1470 episode | score: 9.11 | loss: 11.27962 | epsilon: 0.01
151 | 1480 episode | score: 9.09 | loss: 8.48455 | epsilon: 0.01
152 | 1490 episode | score: 9.11 | loss: 9.19448 | epsilon: 0.01
153 | 1500 episode | score: 9.05 | loss: 9.20194 | epsilon: 0.01
154 | 1510 episode | score: 8.97 | loss: 10.62578 | epsilon: 0.01
155 | 1520 episode | score: 9.04 | loss: 8.54028 | epsilon: 0.01
156 | 1530 episode | score: 9.00 | loss: 8.52981 | epsilon: 0.01
157 | 1540 episode | score: 9.02 | loss: 7.83667 | epsilon: 0.01
158 | 1550 episode | score: 8.95 | loss: 12.84651 | epsilon: 0.01
159 | 1560 episode | score: 9.13 | loss: 7.17148 | epsilon: 0.01
160 | 1570 episode | score: 9.27 | loss: 8.56434 | epsilon: 0.01
161 | 1580 episode | score: 9.35 | loss: 11.46341 | epsilon: 0.01
162 | 1590 episode | score: 9.27 | loss: 8.61150 | epsilon: 0.01
163 | 1600 episode | score: 9.65 | loss: 10.73893 | epsilon: 0.01
164 | 1610 episode | score: 9.61 | loss: 7.24374 | epsilon: 0.01
165 | 1620 episode | score: 9.56 | loss: 7.25160 | epsilon: 0.01
166 | 1630 episode | score: 9.42 | loss: 6.48143 | epsilon: 0.01
167 | 1640 episode | score: 9.36 | loss: 8.69397 | epsilon: 0.01
168 | 1650 episode | score: 9.29 | loss: 7.95489 | epsilon: 0.01
169 | 1660 episode | score: 9.34 | loss: 8.01149 | epsilon: 0.01
170 | 1670 episode | score: 9.89 | loss: 9.41617 | epsilon: 0.01
171 | 1680 episode | score: 9.77 | loss: 10.11328 | epsilon: 0.01
172 | 1690 episode | score: 10.21 | loss: 10.17882 | epsilon: 0.01
173 | 1700 episode | score: 10.01 | loss: 7.34649 | epsilon: 0.01
174 | 1710 episode | score: 9.93 | loss: 8.06988 | epsilon: 0.01
175 | 1720 episode | score: 9.95 | loss: 5.81043 | epsilon: 0.01
176 | 1730 episode | score: 9.81 | loss: 5.08927 | epsilon: 0.01
177 | 1740 episode | score: 9.82 | loss: 8.00304 | epsilon: 0.01
178 | 1750 episode | score: 9.66 | loss: 6.57829 | epsilon: 0.01
179 | 1760 episode | score: 9.54 | loss: 7.31608 | epsilon: 0.01
180 | 1770 episode | score: 9.43 | loss: 8.07021 | epsilon: 0.01
181 | 1780 episode | score: 9.34 | loss: 8.83427 | epsilon: 0.01
182 | 1790 episode | score: 9.30 | loss: 8.83933 | epsilon: 0.01
183 | 1800 episode | score: 9.24 | loss: 5.92690 | epsilon: 0.01
184 | 1810 episode | score: 9.16 | loss: 11.05879 | epsilon: 0.01
185 | 1820 episode | score: 9.08 | loss: 11.79762 | epsilon: 0.01
186 | 1830 episode | score: 9.00 | loss: 12.45628 | epsilon: 0.01
187 | 1840 episode | score: 9.03 | loss: 8.79590 | epsilon: 0.01
188 | 1850 episode | score: 9.01 | loss: 10.99454 | epsilon: 0.01
189 | 1860 episode | score: 9.05 | loss: 8.92821 | epsilon: 0.01
190 | 1870 episode | score: 9.17 | loss: 10.29554 | epsilon: 0.01
191 | 1880 episode | score: 9.10 | loss: 12.52881 | epsilon: 0.01
192 | 1890 episode | score: 9.00 | loss: 11.12002 | epsilon: 0.01
193 | 1900 episode | score: 8.95 | loss: 9.59325 | epsilon: 0.01
194 | 1910 episode | score: 8.93 | loss: 13.28574 | epsilon: 0.01
195 | 1920 episode | score: 8.91 | loss: 10.31185 | epsilon: 0.01
196 | 1930 episode | score: 8.89 | loss: 12.52143 | epsilon: 0.01
197 | 1940 episode | score: 8.91 | loss: 9.66494 | epsilon: 0.01
198 | 1950 episode | score: 8.88 | loss: 11.86391 | epsilon: 0.01
199 | 1960 episode | score: 8.77 | loss: 9.59976 | epsilon: 0.01
200 | 1970 episode | score: 8.72 | loss: 14.83946 | epsilon: 0.01
201 | 1980 episode | score: 8.71 | loss: 12.61964 | epsilon: 0.01
202 | 1990 episode | score: 8.65 | loss: 14.11471 | epsilon: 0.01
203 | 2000 episode | score: 8.64 | loss: 10.42087 | epsilon: 0.01
204 | 2010 episode | score: 8.56 | loss: 11.18098 | epsilon: 0.01
205 | 2020 episode | score: 8.56 | loss: 7.49533 | epsilon: 0.01
206 | 2030 episode | score: 8.52 | loss: 11.23166 | epsilon: 0.01
207 | 2040 episode | score: 8.49 | loss: 13.45408 | epsilon: 0.01
208 | 2050 episode | score: 8.50 | loss: 15.70362 | epsilon: 0.01
209 | 2060 episode | score: 8.47 | loss: 8.98237 | epsilon: 0.01
210 | 2070 episode | score: 8.48 | loss: 9.80346 | epsilon: 0.01
211 | 2080 episode | score: 8.47 | loss: 17.99310 | epsilon: 0.01
212 | 2090 episode | score: 8.51 | loss: 9.07208 | epsilon: 0.01
213 | 2100 episode | score: 8.52 | loss: 12.03263 | epsilon: 0.01
214 | 2110 episode | score: 8.50 | loss: 11.25899 | epsilon: 0.01
215 | 2120 episode | score: 8.47 | loss: 15.76247 | epsilon: 0.01
216 | 2130 episode | score: 8.47 | loss: 9.03900 | epsilon: 0.01
217 | 2140 episode | score: 8.44 | loss: 7.63310 | epsilon: 0.01
218 | 2150 episode | score: 8.45 | loss: 11.31914 | epsilon: 0.01
219 | 2160 episode | score: 8.50 | loss: 10.52649 | epsilon: 0.01
220 | 2170 episode | score: 8.49 | loss: 9.82745 | epsilon: 0.01
221 | 2180 episode | score: 8.46 | loss: 9.84642 | epsilon: 0.01
222 | 2190 episode | score: 8.47 | loss: 11.29416 | epsilon: 0.01
223 | 2200 episode | score: 8.43 | loss: 12.78517 | epsilon: 0.01
224 | 2210 episode | score: 8.44 | loss: 12.05975 | epsilon: 0.01
225 | 2220 episode | score: 8.43 | loss: 9.84734 | epsilon: 0.01
226 | 2230 episode | score: 8.45 | loss: 9.78824 | epsilon: 0.01
227 | 2240 episode | score: 8.44 | loss: 9.05212 | epsilon: 0.01
228 | 2250 episode | score: 8.41 | loss: 15.08025 | epsilon: 0.01
229 | 2260 episode | score: 8.61 | loss: 9.82519 | epsilon: 0.01
230 | 2270 episode | score: 8.55 | loss: 13.58749 | epsilon: 0.01
231 | 2280 episode | score: 8.55 | loss: 13.59709 | epsilon: 0.01
232 | 2290 episode | score: 8.51 | loss: 14.37598 | epsilon: 0.01
233 | 2300 episode | score: 8.51 | loss: 13.62597 | epsilon: 0.01
234 | 2310 episode | score: 8.47 | loss: 11.37246 | epsilon: 0.01
235 | 2320 episode | score: 8.46 | loss: 12.93501 | epsilon: 0.01
236 | 2330 episode | score: 8.50 | loss: 12.89200 | epsilon: 0.01
237 | 2340 episode | score: 8.47 | loss: 10.60221 | epsilon: 0.01
238 | 2350 episode | score: 8.56 | loss: 12.27176 | epsilon: 0.01
239 | 2360 episode | score: 8.64 | loss: 10.60565 | epsilon: 0.01
240 | 2370 episode | score: 8.69 | loss: 8.35473 | epsilon: 0.01
241 | 2380 episode | score: 8.67 | loss: 12.27118 | epsilon: 0.01
242 | 2390 episode | score: 8.62 | loss: 7.66543 | epsilon: 0.01
243 | 2400 episode | score: 8.61 | loss: 6.15029 | epsilon: 0.01
244 | 2410 episode | score: 8.59 | loss: 9.96755 | epsilon: 0.01
245 | 2420 episode | score: 8.71 | loss: 12.24784 | epsilon: 0.01
246 | 2430 episode | score: 8.73 | loss: 9.98101 | epsilon: 0.01
247 | 2440 episode | score: 8.79 | loss: 11.48864 | epsilon: 0.01
248 | 2450 episode | score: 8.72 | loss: 7.72658 | epsilon: 0.01
249 | 2460 episode | score: 8.71 | loss: 8.52732 | epsilon: 0.01
250 | 2470 episode | score: 8.69 | loss: 12.31986 | epsilon: 0.01
251 | 2480 episode | score: 8.66 | loss: 9.25687 | epsilon: 0.01
252 | 2490 episode | score: 8.84 | loss: 6.97792 | epsilon: 0.01
253 | 2500 episode | score: 8.88 | loss: 7.70792 | epsilon: 0.01
254 | 2510 episode | score: 8.82 | loss: 7.77612 | epsilon: 0.01
255 | 2520 episode | score: 8.78 | loss: 10.07798 | epsilon: 0.01
256 | 2530 episode | score: 8.78 | loss: 4.70675 | epsilon: 0.01
257 | 2540 episode | score: 8.81 | loss: 10.03284 | epsilon: 0.01
258 | 2550 episode | score: 8.77 | loss: 10.06382 | epsilon: 0.01
259 | 2560 episode | score: 8.72 | loss: 7.71937 | epsilon: 0.01
260 | 2570 episode | score: 8.70 | loss: 12.33020 | epsilon: 0.01
261 | 2580 episode | score: 8.67 | loss: 10.13223 | epsilon: 0.01
262 | 2590 episode | score: 8.63 | loss: 16.95634 | epsilon: 0.01
263 | 2600 episode | score: 8.67 | loss: 6.94287 | epsilon: 0.01
264 | 2610 episode | score: 8.63 | loss: 10.93829 | epsilon: 0.01
265 | 2620 episode | score: 8.64 | loss: 10.09154 | epsilon: 0.01
266 | 2630 episode | score: 8.65 | loss: 11.65590 | epsilon: 0.01
267 | 2640 episode | score: 8.73 | loss: 8.57849 | epsilon: 0.01
268 | 2650 episode | score: 8.69 | loss: 10.87277 | epsilon: 0.01
269 | 2660 episode | score: 8.80 | loss: 9.33172 | epsilon: 0.01
270 | 2670 episode | score: 8.80 | loss: 7.81524 | epsilon: 0.01
271 | 2680 episode | score: 8.82 | loss: 8.58555 | epsilon: 0.01
272 | 2690 episode | score: 8.81 | loss: 7.02390 | epsilon: 0.01
273 | 2700 episode | score: 8.74 | loss: 8.66500 | epsilon: 0.01
274 | 2710 episode | score: 8.69 | loss: 11.73996 | epsilon: 0.01
275 | 2720 episode | score: 8.68 | loss: 7.07774 | epsilon: 0.01
276 | 2730 episode | score: 8.66 | loss: 10.19452 | epsilon: 0.01
277 | 2740 episode | score: 8.65 | loss: 7.81020 | epsilon: 0.01
278 | 2750 episode | score: 8.60 | loss: 10.97668 | epsilon: 0.01
279 | 2760 episode | score: 8.70 | loss: 10.99998 | epsilon: 0.01
280 | 2770 episode | score: 8.63 | loss: 9.45322 | epsilon: 0.01
281 | 2780 episode | score: 8.66 | loss: 10.94385 | epsilon: 0.01
282 | 2790 episode | score: 8.67 | loss: 14.88072 | epsilon: 0.01
283 | 2800 episode | score: 8.72 | loss: 12.54147 | epsilon: 0.01
284 | 2810 episode | score: 8.68 | loss: 10.23026 | epsilon: 0.01
285 | 2820 episode | score: 8.66 | loss: 12.69516 | epsilon: 0.01
286 | 2830 episode | score: 8.63 | loss: 13.34911 | epsilon: 0.01
287 | 2840 episode | score: 8.61 | loss: 11.87338 | epsilon: 0.01
288 | 2850 episode | score: 8.59 | loss: 11.08297 | epsilon: 0.01
289 | 2860 episode | score: 8.57 | loss: 11.82590 | epsilon: 0.01
290 | 2870 episode | score: 8.87 | loss: 9.44830 | epsilon: 0.01
291 | 2880 episode | score: 8.81 | loss: 10.29489 | epsilon: 0.01
292 | 2890 episode | score: 8.80 | loss: 10.28550 | epsilon: 0.01
293 | 2900 episode | score: 8.76 | loss: 11.07845 | epsilon: 0.01
294 | 2910 episode | score: 8.72 | loss: 11.16145 | epsilon: 0.01
295 | 2920 episode | score: 8.71 | loss: 9.51274 | epsilon: 0.01
296 | 2930 episode | score: 8.71 | loss: 15.77555 | epsilon: 0.01
297 | 2940 episode | score: 8.68 | loss: 9.46159 | epsilon: 0.01
298 | 2950 episode | score: 8.71 | loss: 11.06020 | epsilon: 0.01
299 | 2960 episode | score: 8.70 | loss: 12.72564 | epsilon: 0.01
300 | 2970 episode | score: 8.64 | loss: 10.22397 | epsilon: 0.01
301 | 2980 episode | score: 8.95 | loss: 8.74962 | epsilon: 0.01
302 | 2990 episode | score: 9.00 | loss: 7.99252 | epsilon: 0.01
303 | 3000 episode | score: 8.98 | loss: 8.66277 | epsilon: 0.01
304 | 3010 episode | score: 9.04 | loss: 11.75993 | epsilon: 0.01
305 | 3020 episode | score: 8.98 | loss: 11.04120 | epsilon: 0.01
306 | 3030 episode | score: 8.92 | loss: 9.44570 | epsilon: 0.01
307 | 3040 episode | score: 8.92 | loss: 14.13674 | epsilon: 0.01
308 | 3050 episode | score: 8.82 | loss: 14.19668 | epsilon: 0.01
309 | 3060 episode | score: 8.80 | loss: 8.63911 | epsilon: 0.01
310 | 3070 episode | score: 8.78 | loss: 10.21673 | epsilon: 0.01
311 | 3080 episode | score: 8.73 | loss: 10.22470 | epsilon: 0.01
312 | 3090 episode | score: 8.71 | loss: 11.04682 | epsilon: 0.01
313 | 3100 episode | score: 8.68 | loss: 9.47171 | epsilon: 0.01
314 | 3110 episode | score: 8.65 | loss: 13.38438 | epsilon: 0.01
315 | 3120 episode | score: 8.66 | loss: 11.09533 | epsilon: 0.01
316 | 3130 episode | score: 8.71 | loss: 13.38132 | epsilon: 0.01
317 | 3140 episode | score: 8.68 | loss: 8.69981 | epsilon: 0.01
318 | 3150 episode | score: 8.65 | loss: 11.82153 | epsilon: 0.01
319 | 3160 episode | score: 8.65 | loss: 7.90522 | epsilon: 0.01
320 | 3170 episode | score: 8.74 | loss: 10.32336 | epsilon: 0.01
321 | 3180 episode | score: 8.68 | loss: 9.49167 | epsilon: 0.01
322 | 3190 episode | score: 8.68 | loss: 13.40731 | epsilon: 0.01
323 | 3200 episode | score: 8.67 | loss: 14.95716 | epsilon: 0.01
324 | 3210 episode | score: 8.64 | loss: 14.95343 | epsilon: 0.01
325 | 3220 episode | score: 8.62 | loss: 9.44335 | epsilon: 0.01
326 | 3230 episode | score: 8.55 | loss: 12.65490 | epsilon: 0.01
327 | 3240 episode | score: 8.51 | loss: 14.23077 | epsilon: 0.01
328 | 3250 episode | score: 8.60 | loss: 11.86367 | epsilon: 0.01
329 | 3260 episode | score: 8.58 | loss: 12.63986 | epsilon: 0.01
330 | 3270 episode | score: 8.63 | loss: 14.20297 | epsilon: 0.01
331 | 3280 episode | score: 8.62 | loss: 15.74779 | epsilon: 0.01
332 | 3290 episode | score: 8.59 | loss: 9.47025 | epsilon: 0.01
333 | 3300 episode | score: 8.72 | loss: 11.01989 | epsilon: 0.01
334 | 3310 episode | score: 8.71 | loss: 7.90248 | epsilon: 0.01
335 | 3320 episode | score: 8.70 | loss: 8.67504 | epsilon: 0.01
336 | 3330 episode | score: 8.68 | loss: 9.49761 | epsilon: 0.01
337 | 3340 episode | score: 8.69 | loss: 11.00257 | epsilon: 0.01
338 | 3350 episode | score: 8.69 | loss: 15.71583 | epsilon: 0.01
339 | 3360 episode | score: 8.82 | loss: 11.00038 | epsilon: 0.01
340 | 3370 episode | score: 8.78 | loss: 13.48424 | epsilon: 0.01
341 | 3380 episode | score: 8.76 | loss: 12.62969 | epsilon: 0.01
342 | 3390 episode | score: 8.70 | loss: 10.23487 | epsilon: 0.01
343 | 3400 episode | score: 8.71 | loss: 11.04372 | epsilon: 0.01
344 | 3410 episode | score: 8.68 | loss: 13.37738 | epsilon: 0.01
345 | 3420 episode | score: 8.65 | loss: 11.09719 | epsilon: 0.01
346 | 3430 episode | score: 8.62 | loss: 13.40267 | epsilon: 0.01
347 | 3440 episode | score: 8.55 | loss: 8.69839 | epsilon: 0.01
348 | 3450 episode | score: 8.55 | loss: 11.86583 | epsilon: 0.01
349 | 3460 episode | score: 8.57 | loss: 13.41148 | epsilon: 0.01
350 | 3470 episode | score: 8.59 | loss: 10.25880 | epsilon: 0.01
351 | 3480 episode | score: 8.58 | loss: 10.23511 | epsilon: 0.01
352 | 3490 episode | score: 8.56 | loss: 11.02219 | epsilon: 0.01
353 | 3500 episode | score: 8.53 | loss: 14.15510 | epsilon: 0.01
354 | 3510 episode | score: 8.51 | loss: 10.25200 | epsilon: 0.01
355 | 3520 episode | score: 8.50 | loss: 9.49207 | epsilon: 0.01
356 | 3530 episode | score: 8.45 | loss: 13.36084 | epsilon: 0.01
357 | 3540 episode | score: 8.59 | loss: 11.04989 | epsilon: 0.01
358 | 3550 episode | score: 8.66 | loss: 11.04657 | epsilon: 0.01
359 | 3560 episode | score: 8.73 | loss: 13.35894 | epsilon: 0.01
360 | 3570 episode | score: 8.70 | loss: 8.79981 | epsilon: 0.01
361 | 3580 episode | score: 8.66 | loss: 15.00119 | epsilon: 0.01
362 | 3590 episode | score: 8.61 | loss: 14.17593 | epsilon: 0.01
363 | 3600 episode | score: 8.57 | loss: 14.23392 | epsilon: 0.01
364 | 3610 episode | score: 8.56 | loss: 11.84069 | epsilon: 0.01
365 | 3620 episode | score: 8.67 | loss: 11.06954 | epsilon: 0.01
366 | 3630 episode | score: 8.61 | loss: 7.12155 | epsilon: 0.01
367 | 3640 episode | score: 8.59 | loss: 14.26435 | epsilon: 0.01
368 | 3650 episode | score: 8.60 | loss: 11.10530 | epsilon: 0.01
369 | 3660 episode | score: 8.71 | loss: 13.45249 | epsilon: 0.01
370 | 3670 episode | score: 8.74 | loss: 10.33090 | epsilon: 0.01
371 | 3680 episode | score: 8.69 | loss: 5.57306 | epsilon: 0.01
372 | 3690 episode | score: 8.80 | loss: 10.29642 | epsilon: 0.01
373 | 3700 episode | score: 8.76 | loss: 10.33631 | epsilon: 0.01
374 | 3710 episode | score: 8.74 | loss: 10.33454 | epsilon: 0.01
375 | 3720 episode | score: 8.72 | loss: 10.36367 | epsilon: 0.01
376 | 3730 episode | score: 8.69 | loss: 12.73713 | epsilon: 0.01
377 | 3740 episode | score: 8.68 | loss: 9.54919 | epsilon: 0.01
378 | 3750 episode | score: 8.79 | loss: 10.33279 | epsilon: 0.01
379 | 3760 episode | score: 8.92 | loss: 11.97439 | epsilon: 0.01
380 | 3770 episode | score: 8.85 | loss: 12.01616 | epsilon: 0.01
381 | 3780 episode | score: 8.86 | loss: 11.89962 | epsilon: 0.01
382 | 3790 episode | score: 8.76 | loss: 12.68493 | epsilon: 0.01
383 | 3800 episode | score: 8.75 | loss: 14.28559 | epsilon: 0.01
384 | 3810 episode | score: 8.68 | loss: 9.54066 | epsilon: 0.01
385 | 3820 episode | score: 8.70 | loss: 12.69197 | epsilon: 0.01
386 | 3830 episode | score: 8.67 | loss: 14.34066 | epsilon: 0.01
387 | 3840 episode | score: 8.61 | loss: 8.75319 | epsilon: 0.01
388 | 3850 episode | score: 8.60 | loss: 10.33837 | epsilon: 0.01
389 | 3860 episode | score: 8.57 | loss: 12.74596 | epsilon: 0.01
390 | 3870 episode | score: 8.67 | loss: 11.18132 | epsilon: 0.01
391 | 3880 episode | score: 8.63 | loss: 12.75840 | epsilon: 0.01
392 | 3890 episode | score: 8.63 | loss: 12.73372 | epsilon: 0.01
393 | 3900 episode | score: 8.60 | loss: 9.57410 | epsilon: 0.01
394 | 3910 episode | score: 8.57 | loss: 11.99219 | epsilon: 0.01
395 | 3920 episode | score: 8.53 | loss: 12.74140 | epsilon: 0.01
396 | 3930 episode | score: 8.50 | loss: 13.56777 | epsilon: 0.01
397 | 3940 episode | score: 8.52 | loss: 9.62430 | epsilon: 0.01
398 | 3950 episode | score: 8.51 | loss: 15.94888 | epsilon: 0.01
399 | 3960 episode | score: 8.53 | loss: 14.44929 | epsilon: 0.01
400 | 3970 episode | score: 8.51 | loss: 14.33287 | epsilon: 0.01
401 | 3980 episode | score: 8.54 | loss: 11.23067 | epsilon: 0.01
402 | 3990 episode | score: 8.59 | loss: 14.34064 | epsilon: 0.01
403 | 4000 episode | score: 8.52 | loss: 8.06927 | epsilon: 0.01
404 | 4010 episode | score: 8.86 | loss: 4.83090 | epsilon: 0.01
405 | 4020 episode | score: 9.16 | loss: 9.64705 | epsilon: 0.01
406 | 4030 episode | score: 9.07 | loss: 8.81061 | epsilon: 0.01
407 | 4040 episode | score: 9.02 | loss: 9.68764 | epsilon: 0.01
408 | 4050 episode | score: 8.91 | loss: 15.24696 | epsilon: 0.01
409 | 4060 episode | score: 8.87 | loss: 12.83159 | epsilon: 0.01
410 | 4070 episode | score: 8.85 | loss: 8.83373 | epsilon: 0.01
411 | 4080 episode | score: 8.83 | loss: 8.02803 | epsilon: 0.01
412 | 4090 episode | score: 8.82 | loss: 11.28403 | epsilon: 0.01
413 | 4100 episode | score: 8.88 | loss: 10.47824 | epsilon: 0.01
414 | 4110 episode | score: 8.82 | loss: 13.57903 | epsilon: 0.01
415 | 4120 episode | score: 8.75 | loss: 7.18897 | epsilon: 0.01
416 | 4130 episode | score: 8.72 | loss: 9.63034 | epsilon: 0.01
417 | 4140 episode | score: 8.66 | loss: 8.00251 | epsilon: 0.01
418 | 4150 episode | score: 8.63 | loss: 9.60776 | epsilon: 0.01
419 | 4160 episode | score: 8.58 | loss: 11.98508 | epsilon: 0.01
420 | 4170 episode | score: 8.60 | loss: 12.10333 | epsilon: 0.01
421 | 4180 episode | score: 8.57 | loss: 10.45281 | epsilon: 0.01
422 | 4190 episode | score: 8.62 | loss: 11.28251 | epsilon: 0.01
423 | 4200 episode | score: 8.60 | loss: 8.83660 | epsilon: 0.01
424 | 4210 episode | score: 8.59 | loss: 11.22020 | epsilon: 0.01
425 | 4220 episode | score: 8.62 | loss: 12.84902 | epsilon: 0.01
426 | 4230 episode | score: 8.61 | loss: 12.86111 | epsilon: 0.01
427 | 4240 episode | score: 8.60 | loss: 15.39887 | epsilon: 0.01
428 | 4250 episode | score: 8.61 | loss: 8.85378 | epsilon: 0.01
429 | 4260 episode | score: 8.69 | loss: 8.83344 | epsilon: 0.01
430 | 4270 episode | score: 8.67 | loss: 5.69113 | epsilon: 0.01
431 | 4280 episode | score: 8.68 | loss: 9.65333 | epsilon: 0.01
432 | 4290 episode | score: 8.79 | loss: 9.64854 | epsilon: 0.01
433 | 4300 episode | score: 8.78 | loss: 8.08716 | epsilon: 0.01
434 | 4310 episode | score: 8.75 | loss: 8.83302 | epsilon: 0.01
435 | 4320 episode | score: 8.71 | loss: 12.05873 | epsilon: 0.01
436 | 4330 episode | score: 8.68 | loss: 10.39692 | epsilon: 0.01
437 | 4340 episode | score: 8.68 | loss: 9.61883 | epsilon: 0.01
438 | 4350 episode | score: 8.63 | loss: 12.08126 | epsilon: 0.01
439 | 4360 episode | score: 8.61 | loss: 12.13505 | epsilon: 0.01
440 | 4370 episode | score: 8.60 | loss: 13.63374 | epsilon: 0.01
441 | 4380 episode | score: 8.61 | loss: 15.19112 | epsilon: 0.01
442 | 4390 episode | score: 8.62 | loss: 15.24961 | epsilon: 0.01
443 | 4400 episode | score: 8.59 | loss: 11.25627 | epsilon: 0.01
444 | 4410 episode | score: 8.58 | loss: 13.66731 | epsilon: 0.01
445 | 4420 episode | score: 8.50 | loss: 11.30068 | epsilon: 0.01
446 | 4430 episode | score: 8.50 | loss: 13.64186 | epsilon: 0.01
447 | 4440 episode | score: 8.51 | loss: 10.41696 | epsilon: 0.01
448 | 4450 episode | score: 8.53 | loss: 9.64521 | epsilon: 0.01
449 | 4460 episode | score: 8.67 | loss: 9.70868 | epsilon: 0.01
450 | 4470 episode | score: 8.63 | loss: 10.40866 | epsilon: 0.01
451 | 4480 episode | score: 8.63 | loss: 12.10616 | epsilon: 0.01
452 | 4490 episode | score: 8.75 | loss: 13.58418 | epsilon: 0.01
453 | 4500 episode | score: 8.72 | loss: 11.95841 | epsilon: 0.01
454 | 4510 episode | score: 8.71 | loss: 9.60635 | epsilon: 0.01
455 | 4520 episode | score: 8.66 | loss: 11.95144 | epsilon: 0.01
456 | 4530 episode | score: 8.81 | loss: 14.35493 | epsilon: 0.01
457 | 4540 episode | score: 8.76 | loss: 10.48997 | epsilon: 0.01
458 | 4550 episode | score: 8.75 | loss: 10.43027 | epsilon: 0.01
459 | 4560 episode | score: 8.72 | loss: 12.12173 | epsilon: 0.01
460 | 4570 episode | score: 8.72 | loss: 12.01201 | epsilon: 0.01
461 | 4580 episode | score: 8.72 | loss: 9.61949 | epsilon: 0.01
462 | 4590 episode | score: 8.67 | loss: 12.77416 | epsilon: 0.01
463 | 4600 episode | score: 8.63 | loss: 16.01569 | epsilon: 0.01
464 | 4610 episode | score: 8.65 | loss: 13.59569 | epsilon: 0.01
465 | 4620 episode | score: 8.61 | loss: 8.00971 | epsilon: 0.01
466 | 4630 episode | score: 8.62 | loss: 9.68583 | epsilon: 0.01
467 | 4640 episode | score: 8.57 | loss: 11.19599 | epsilon: 0.01
468 | 4650 episode | score: 8.61 | loss: 9.61889 | epsilon: 0.01
469 | 4660 episode | score: 8.62 | loss: 7.23984 | epsilon: 0.01
470 | 4670 episode | score: 8.62 | loss: 11.21859 | epsilon: 0.01
471 | 4680 episode | score: 8.60 | loss: 11.23654 | epsilon: 0.01
472 | 4690 episode | score: 8.56 | loss: 12.08710 | epsilon: 0.01
473 | 4700 episode | score: 8.70 | loss: 13.60349 | epsilon: 0.01
474 | 4710 episode | score: 8.66 | loss: 12.83391 | epsilon: 0.01
475 | 4720 episode | score: 8.80 | loss: 15.21070 | epsilon: 0.01
476 | 4730 episode | score: 8.81 | loss: 11.25304 | epsilon: 0.01
477 | 4740 episode | score: 8.78 | loss: 9.60553 | epsilon: 0.01
478 | 4750 episode | score: 8.77 | loss: 8.88370 | epsilon: 0.01
479 | 4760 episode | score: 8.80 | loss: 11.28654 | epsilon: 0.01
480 | 4770 episode | score: 8.75 | loss: 13.70106 | epsilon: 0.01
481 | 4780 episode | score: 8.71 | loss: 12.84049 | epsilon: 0.01
482 | 4790 episode | score: 8.72 | loss: 9.73600 | epsilon: 0.01
483 | 4800 episode | score: 8.68 | loss: 11.25344 | epsilon: 0.01
484 | 4810 episode | score: 8.62 | loss: 9.74465 | epsilon: 0.01
485 | 4820 episode | score: 8.58 | loss: 12.82181 | epsilon: 0.01
486 | 4830 episode | score: 8.56 | loss: 12.08630 | epsilon: 0.01
487 | 4840 episode | score: 8.50 | loss: 13.64277 | epsilon: 0.01
488 | 4850 episode | score: 8.59 | loss: 9.67230 | epsilon: 0.01
489 | 4860 episode | score: 8.66 | loss: 15.23198 | epsilon: 0.01
490 | 4870 episode | score: 8.65 | loss: 15.16864 | epsilon: 0.01
491 | 4880 episode | score: 8.64 | loss: 14.38031 | epsilon: 0.01
492 | 4890 episode | score: 8.78 | loss: 11.27028 | epsilon: 0.01
493 | 4900 episode | score: 8.83 | loss: 11.98241 | epsilon: 0.01
494 | 4910 episode | score: 8.88 | loss: 11.29107 | epsilon: 0.01
495 | 4920 episode | score: 8.83 | loss: 11.27844 | epsilon: 0.01
496 | 4930 episode | score: 8.77 | loss: 13.63153 | epsilon: 0.01
497 | 4940 episode | score: 8.74 | loss: 8.10678 | epsilon: 0.01
498 | 4950 episode | score: 8.71 | loss: 11.96388 | epsilon: 0.01
499 | 4960 episode | score: 8.67 | loss: 11.19639 | epsilon: 0.01
500 | 4970 episode | score: 8.64 | loss: 10.38944 | epsilon: 0.01
501 | 4980 episode | score: 8.60 | loss: 9.59142 | epsilon: 0.01
502 | 4990 episode | score: 8.69 | loss: 16.00748 | epsilon: 0.01
503 |
--------------------------------------------------------------------------------
/out/trace_DTQN_5.txt:
--------------------------------------------------------------------------------
1 | state size: 2
2 | action size: 2
3 | 0 episode | score: 42.00 | loss: 0.00000 | epsilon: 1.00
4 | 10 episode | score: 39.48 | loss: 0.00000 | epsilon: 1.00
5 | 20 episode | score: 37.85 | loss: 0.00000 | epsilon: 1.00
6 | 30 episode | score: 36.05 | loss: 0.00000 | epsilon: 1.00
7 | 40 episode | score: 34.11 | loss: 0.00000 | epsilon: 1.00
8 | 50 episode | score: 32.68 | loss: 0.33304 | epsilon: 1.00
9 | 60 episode | score: 31.11 | loss: 0.20664 | epsilon: 0.99
10 | 70 episode | score: 30.30 | loss: 0.17808 | epsilon: 0.98
11 | 80 episode | score: 29.67 | loss: 0.25635 | epsilon: 0.97
12 | 90 episode | score: 29.02 | loss: 0.52714 | epsilon: 0.95
13 | 100 episode | score: 27.89 | loss: 0.79162 | epsilon: 0.95
14 | 110 episode | score: 27.20 | loss: 0.32172 | epsilon: 0.93
15 | 120 episode | score: 27.31 | loss: 0.88816 | epsilon: 0.92
16 | 130 episode | score: 26.59 | loss: 0.67432 | epsilon: 0.91
17 | 140 episode | score: 25.59 | loss: 0.21146 | epsilon: 0.90
18 | 150 episode | score: 25.58 | loss: 0.57946 | epsilon: 0.89
19 | 160 episode | score: 25.57 | loss: 0.75138 | epsilon: 0.87
20 | 170 episode | score: 24.81 | loss: 0.45107 | epsilon: 0.87
21 | 180 episode | score: 24.31 | loss: 1.11557 | epsilon: 0.86
22 | 190 episode | score: 24.40 | loss: 0.73569 | epsilon: 0.84
23 | 200 episode | score: 23.95 | loss: 0.52971 | epsilon: 0.83
24 | 210 episode | score: 23.44 | loss: 0.29357 | epsilon: 0.82
25 | 220 episode | score: 23.33 | loss: 0.56872 | epsilon: 0.81
26 | 230 episode | score: 22.69 | loss: 1.67684 | epsilon: 0.80
27 | 240 episode | score: 22.04 | loss: 0.90344 | epsilon: 0.79
28 | 250 episode | score: 21.78 | loss: 1.22213 | epsilon: 0.78
29 | 260 episode | score: 22.02 | loss: 1.55604 | epsilon: 0.77
30 | 270 episode | score: 21.59 | loss: 1.91622 | epsilon: 0.76
31 | 280 episode | score: 21.56 | loss: 1.86581 | epsilon: 0.75
32 | 290 episode | score: 21.94 | loss: 1.05475 | epsilon: 0.74
33 | 300 episode | score: 21.63 | loss: 2.10064 | epsilon: 0.73
34 | 310 episode | score: 21.41 | loss: 0.74282 | epsilon: 0.72
35 | 320 episode | score: 21.19 | loss: 1.13760 | epsilon: 0.71
36 | 330 episode | score: 21.51 | loss: 0.42925 | epsilon: 0.69
37 | 340 episode | score: 20.95 | loss: 0.79201 | epsilon: 0.69
38 | 350 episode | score: 20.47 | loss: 0.81940 | epsilon: 0.68
39 | 360 episode | score: 20.35 | loss: 2.00666 | epsilon: 0.67
40 | 370 episode | score: 19.69 | loss: 1.29173 | epsilon: 0.66
41 | 380 episode | score: 19.51 | loss: 1.24970 | epsilon: 0.65
42 | 390 episode | score: 19.22 | loss: 1.70168 | epsilon: 0.64
43 | 400 episode | score: 18.73 | loss: 1.36878 | epsilon: 0.63
44 | 410 episode | score: 18.40 | loss: 2.58290 | epsilon: 0.63
45 | 420 episode | score: 18.50 | loss: 2.18710 | epsilon: 0.62
46 | 430 episode | score: 18.04 | loss: 4.43702 | epsilon: 0.61
47 | 440 episode | score: 17.68 | loss: 2.25777 | epsilon: 0.60
48 | 450 episode | score: 17.39 | loss: 2.31119 | epsilon: 0.59
49 | 460 episode | score: 17.14 | loss: 1.26720 | epsilon: 0.58
50 | 470 episode | score: 17.04 | loss: 1.41837 | epsilon: 0.58
51 | 480 episode | score: 16.55 | loss: 1.97171 | epsilon: 0.57
52 | 490 episode | score: 16.70 | loss: 0.96602 | epsilon: 0.56
53 | 500 episode | score: 16.50 | loss: 2.92387 | epsilon: 0.55
54 | 510 episode | score: 16.59 | loss: 0.99988 | epsilon: 0.54
55 | 520 episode | score: 16.23 | loss: 1.98007 | epsilon: 0.54
56 | 530 episode | score: 16.07 | loss: 1.51870 | epsilon: 0.53
57 | 540 episode | score: 16.09 | loss: 3.35758 | epsilon: 0.52
58 | 550 episode | score: 15.84 | loss: 0.52732 | epsilon: 0.51
59 | 560 episode | score: 15.46 | loss: 3.56532 | epsilon: 0.51
60 | 570 episode | score: 15.54 | loss: 4.62709 | epsilon: 0.50
61 | 580 episode | score: 15.15 | loss: 3.11519 | epsilon: 0.49
62 | 590 episode | score: 14.99 | loss: 2.29960 | epsilon: 0.48
63 | 600 episode | score: 14.67 | loss: 2.66678 | epsilon: 0.48
64 | 610 episode | score: 14.30 | loss: 3.19330 | epsilon: 0.47
65 | 620 episode | score: 14.76 | loss: 2.15541 | epsilon: 0.46
66 | 630 episode | score: 15.34 | loss: 2.69951 | epsilon: 0.45
67 | 640 episode | score: 14.88 | loss: 4.35402 | epsilon: 0.45
68 | 650 episode | score: 14.53 | loss: 3.36075 | epsilon: 0.44
69 | 660 episode | score: 14.26 | loss: 4.38827 | epsilon: 0.43
70 | 670 episode | score: 13.98 | loss: 2.76526 | epsilon: 0.43
71 | 680 episode | score: 13.68 | loss: 2.82026 | epsilon: 0.42
72 | 690 episode | score: 13.41 | loss: 5.13891 | epsilon: 0.41
73 | 700 episode | score: 13.38 | loss: 4.49162 | epsilon: 0.41
74 | 710 episode | score: 13.31 | loss: 2.81179 | epsilon: 0.40
75 | 720 episode | score: 13.11 | loss: 3.50450 | epsilon: 0.39
76 | 730 episode | score: 12.86 | loss: 4.08011 | epsilon: 0.39
77 | 740 episode | score: 12.95 | loss: 4.57233 | epsilon: 0.38
78 | 750 episode | score: 12.61 | loss: 5.19284 | epsilon: 0.38
79 | 760 episode | score: 12.57 | loss: 3.67743 | epsilon: 0.37
80 | 770 episode | score: 12.77 | loss: 5.29215 | epsilon: 0.36
81 | 780 episode | score: 12.68 | loss: 4.67841 | epsilon: 0.36
82 | 790 episode | score: 12.78 | loss: 3.58484 | epsilon: 0.35
83 | 800 episode | score: 12.50 | loss: 4.14623 | epsilon: 0.34
84 | 810 episode | score: 12.47 | loss: 4.75443 | epsilon: 0.34
85 | 820 episode | score: 12.26 | loss: 5.34280 | epsilon: 0.33
86 | 830 episode | score: 12.21 | loss: 4.85460 | epsilon: 0.32
87 | 840 episode | score: 11.93 | loss: 3.64325 | epsilon: 0.32
88 | 850 episode | score: 11.83 | loss: 4.22764 | epsilon: 0.31
89 | 860 episode | score: 12.12 | loss: 4.29264 | epsilon: 0.31
90 | 870 episode | score: 12.38 | loss: 3.06890 | epsilon: 0.30
91 | 880 episode | score: 12.41 | loss: 3.66894 | epsilon: 0.29
92 | 890 episode | score: 12.24 | loss: 4.93459 | epsilon: 0.28
93 | 900 episode | score: 11.99 | loss: 3.81321 | epsilon: 0.28
94 | 910 episode | score: 11.82 | loss: 3.75304 | epsilon: 0.27
95 | 920 episode | score: 11.84 | loss: 4.98607 | epsilon: 0.27
96 | 930 episode | score: 11.69 | loss: 3.75075 | epsilon: 0.26
97 | 940 episode | score: 11.43 | loss: 3.77254 | epsilon: 0.26
98 | 950 episode | score: 11.36 | loss: 3.76858 | epsilon: 0.25
99 | 960 episode | score: 11.16 | loss: 4.41588 | epsilon: 0.25
100 | 970 episode | score: 10.97 | loss: 6.90373 | epsilon: 0.24
101 | 980 episode | score: 10.81 | loss: 5.72781 | epsilon: 0.24
102 | 990 episode | score: 10.76 | loss: 7.01954 | epsilon: 0.23
103 | 1000 episode | score: 10.64 | loss: 6.46829 | epsilon: 0.22
104 | 1010 episode | score: 10.69 | loss: 5.75487 | epsilon: 0.22
105 | 1020 episode | score: 10.74 | loss: 6.37494 | epsilon: 0.21
106 | 1030 episode | score: 10.59 | loss: 5.13310 | epsilon: 0.21
107 | 1040 episode | score: 10.41 | loss: 7.83554 | epsilon: 0.20
108 | 1050 episode | score: 10.42 | loss: 7.08298 | epsilon: 0.20
109 | 1060 episode | score: 10.33 | loss: 5.80025 | epsilon: 0.19
110 | 1070 episode | score: 10.32 | loss: 2.59830 | epsilon: 0.19
111 | 1080 episode | score: 10.55 | loss: 7.13071 | epsilon: 0.18
112 | 1090 episode | score: 10.41 | loss: 5.87516 | epsilon: 0.17
113 | 1100 episode | score: 10.26 | loss: 7.79912 | epsilon: 0.17
114 | 1110 episode | score: 10.19 | loss: 9.12289 | epsilon: 0.16
115 | 1120 episode | score: 10.19 | loss: 8.51380 | epsilon: 0.16
116 | 1130 episode | score: 10.19 | loss: 4.64488 | epsilon: 0.15
117 | 1140 episode | score: 10.19 | loss: 3.95919 | epsilon: 0.15
118 | 1150 episode | score: 10.22 | loss: 9.88921 | epsilon: 0.14
119 | 1160 episode | score: 10.15 | loss: 5.28739 | epsilon: 0.14
120 | 1170 episode | score: 10.00 | loss: 5.29736 | epsilon: 0.13
121 | 1180 episode | score: 9.94 | loss: 6.01434 | epsilon: 0.13
122 | 1190 episode | score: 10.00 | loss: 5.98966 | epsilon: 0.12
123 | 1200 episode | score: 9.93 | loss: 7.42015 | epsilon: 0.11
124 | 1210 episode | score: 9.94 | loss: 4.78921 | epsilon: 0.11
125 | 1220 episode | score: 9.94 | loss: 6.73261 | epsilon: 0.10
126 | 1230 episode | score: 9.86 | loss: 6.77203 | epsilon: 0.10
127 | 1240 episode | score: 9.86 | loss: 10.79873 | epsilon: 0.09
128 | 1250 episode | score: 9.80 | loss: 5.41691 | epsilon: 0.09
129 | 1260 episode | score: 9.76 | loss: 7.49157 | epsilon: 0.08
130 | 1270 episode | score: 9.77 | loss: 4.80807 | epsilon: 0.08
131 | 1280 episode | score: 9.67 | loss: 8.86116 | epsilon: 0.07
132 | 1290 episode | score: 9.74 | loss: 8.23815 | epsilon: 0.07
133 | 1300 episode | score: 9.82 | loss: 6.13960 | epsilon: 0.06
134 | 1310 episode | score: 9.73 | loss: 6.84984 | epsilon: 0.06
135 | 1320 episode | score: 9.66 | loss: 10.96124 | epsilon: 0.05
136 | 1330 episode | score: 9.76 | loss: 6.19592 | epsilon: 0.05
137 | 1340 episode | score: 9.69 | loss: 8.32158 | epsilon: 0.04
138 | 1350 episode | score: 9.62 | loss: 8.96192 | epsilon: 0.04
139 | 1360 episode | score: 9.54 | loss: 10.39995 | epsilon: 0.03
140 | 1370 episode | score: 9.42 | loss: 6.96574 | epsilon: 0.03
141 | 1380 episode | score: 9.33 | loss: 9.09122 | epsilon: 0.02
142 | 1390 episode | score: 9.24 | loss: 11.82153 | epsilon: 0.02
143 | 1400 episode | score: 9.15 | loss: 8.40250 | epsilon: 0.01
144 | 1410 episode | score: 9.12 | loss: 11.16653 | epsilon: 0.01
145 | 1420 episode | score: 9.13 | loss: 8.46265 | epsilon: 0.01
146 | 1430 episode | score: 9.10 | loss: 8.38696 | epsilon: 0.01
147 | 1440 episode | score: 9.05 | loss: 7.71308 | epsilon: 0.01
148 | 1450 episode | score: 8.98 | loss: 10.51652 | epsilon: 0.01
149 | 1460 episode | score: 8.92 | loss: 8.40514 | epsilon: 0.01
150 | 1470 episode | score: 8.88 | loss: 9.84612 | epsilon: 0.01
151 | 1480 episode | score: 9.01 | loss: 12.71185 | epsilon: 0.01
152 | 1490 episode | score: 8.99 | loss: 9.84389 | epsilon: 0.01
153 | 1500 episode | score: 8.93 | loss: 13.35950 | epsilon: 0.01
154 | 1510 episode | score: 8.89 | loss: 8.45604 | epsilon: 0.01
155 | 1520 episode | score: 8.86 | loss: 9.86826 | epsilon: 0.01
156 | 1530 episode | score: 8.82 | loss: 11.98928 | epsilon: 0.01
157 | 1540 episode | score: 8.79 | loss: 9.35221 | epsilon: 0.01
158 | 1550 episode | score: 8.71 | loss: 12.74729 | epsilon: 0.01
159 | 1560 episode | score: 8.70 | loss: 12.13309 | epsilon: 0.01
160 | 1570 episode | score: 8.67 | loss: 11.34081 | epsilon: 0.01
161 | 1580 episode | score: 8.67 | loss: 10.66604 | epsilon: 0.01
162 | 1590 episode | score: 8.68 | loss: 9.95511 | epsilon: 0.01
163 | 1600 episode | score: 8.80 | loss: 10.72554 | epsilon: 0.01
164 | 1610 episode | score: 8.76 | loss: 12.85478 | epsilon: 0.01
165 | 1620 episode | score: 8.83 | loss: 9.37244 | epsilon: 0.01
166 | 1630 episode | score: 8.81 | loss: 10.04362 | epsilon: 0.01
167 | 1640 episode | score: 9.16 | loss: 10.84129 | epsilon: 0.01
168 | 1650 episode | score: 9.13 | loss: 13.66985 | epsilon: 0.01
169 | 1660 episode | score: 9.04 | loss: 10.13253 | epsilon: 0.01
170 | 1670 episode | score: 8.98 | loss: 8.66332 | epsilon: 0.01
171 | 1680 episode | score: 8.94 | loss: 11.53072 | epsilon: 0.01
172 | 1690 episode | score: 8.89 | loss: 10.12132 | epsilon: 0.01
173 | 1700 episode | score: 8.97 | loss: 9.47163 | epsilon: 0.01
174 | 1710 episode | score: 8.91 | loss: 10.89018 | epsilon: 0.01
175 | 1720 episode | score: 8.84 | loss: 9.44108 | epsilon: 0.01
176 | 1730 episode | score: 8.81 | loss: 9.40377 | epsilon: 0.01
177 | 1740 episode | score: 9.06 | loss: 6.60606 | epsilon: 0.01
178 | 1750 episode | score: 9.02 | loss: 8.85948 | epsilon: 0.01
179 | 1760 episode | score: 8.95 | loss: 13.07458 | epsilon: 0.01
180 | 1770 episode | score: 8.92 | loss: 5.84995 | epsilon: 0.01
181 | 1780 episode | score: 8.88 | loss: 10.89841 | epsilon: 0.01
182 | 1790 episode | score: 8.87 | loss: 10.92721 | epsilon: 0.01
183 | 1800 episode | score: 8.85 | loss: 10.95057 | epsilon: 0.01
184 | 1810 episode | score: 8.91 | loss: 5.12229 | epsilon: 0.01
185 | 1820 episode | score: 8.85 | loss: 12.39469 | epsilon: 0.01
186 | 1830 episode | score: 8.81 | loss: 7.34548 | epsilon: 0.01
187 | 1840 episode | score: 8.79 | loss: 8.81931 | epsilon: 0.01
188 | 1850 episode | score: 8.73 | loss: 11.68220 | epsilon: 0.01
189 | 1860 episode | score: 8.66 | loss: 9.51998 | epsilon: 0.01
190 | 1870 episode | score: 8.61 | loss: 11.85967 | epsilon: 0.01
191 | 1880 episode | score: 8.53 | loss: 12.43571 | epsilon: 0.01
192 | 1890 episode | score: 8.51 | loss: 13.20536 | epsilon: 0.01
193 | 1900 episode | score: 8.61 | loss: 12.49828 | epsilon: 0.01
194 | 1910 episode | score: 8.64 | loss: 14.00794 | epsilon: 0.01
195 | 1920 episode | score: 8.62 | loss: 11.06345 | epsilon: 0.01
196 | 1930 episode | score: 8.62 | loss: 12.55268 | epsilon: 0.01
197 | 1940 episode | score: 8.61 | loss: 10.36380 | epsilon: 0.01
198 | 1950 episode | score: 8.57 | loss: 9.59783 | epsilon: 0.01
199 | 1960 episode | score: 8.62 | loss: 11.87677 | epsilon: 0.01
200 | 1970 episode | score: 8.56 | loss: 8.95421 | epsilon: 0.01
201 | 1980 episode | score: 8.57 | loss: 11.90256 | epsilon: 0.01
202 | 1990 episode | score: 8.56 | loss: 12.88654 | epsilon: 0.01
203 | 2000 episode | score: 8.52 | loss: 22.89971 | epsilon: 0.01
204 | 2010 episode | score: 8.45 | loss: 11.69317 | epsilon: 0.01
205 | 2020 episode | score: 8.46 | loss: 13.89884 | epsilon: 0.01
206 | 2030 episode | score: 8.47 | loss: 12.21345 | epsilon: 0.01
207 | 2040 episode | score: 8.60 | loss: 12.74696 | epsilon: 0.01
208 | 2050 episode | score: 9.39 | loss: 9.73934 | epsilon: 0.01
209 | 2060 episode | score: 9.53 | loss: 5.97629 | epsilon: 0.01
210 | 2070 episode | score: 9.51 | loss: 7.89748 | epsilon: 0.01
211 | 2080 episode | score: 10.17 | loss: 11.30224 | epsilon: 0.01
212 | 2090 episode | score: 10.16 | loss: 11.95127 | epsilon: 0.01
213 | 2100 episode | score: 9.98 | loss: 9.02979 | epsilon: 0.01
214 | 2110 episode | score: 9.85 | loss: 6.85986 | epsilon: 0.01
215 | 2120 episode | score: 9.70 | loss: 7.56394 | epsilon: 0.01
216 | 2130 episode | score: 9.71 | loss: 6.06545 | epsilon: 0.01
217 | 2140 episode | score: 9.55 | loss: 4.56698 | epsilon: 0.01
218 | 2150 episode | score: 9.45 | loss: 11.08675 | epsilon: 0.01
219 | 2160 episode | score: 9.39 | loss: 5.97940 | epsilon: 0.01
220 | 2170 episode | score: 9.50 | loss: 6.68500 | epsilon: 0.01
221 | 2180 episode | score: 9.44 | loss: 10.37776 | epsilon: 0.01
222 | 2190 episode | score: 10.14 | loss: 8.89424 | epsilon: 0.01
223 | 2200 episode | score: 10.23 | loss: 6.68299 | epsilon: 0.01
224 | 2210 episode | score: 10.10 | loss: 8.16638 | epsilon: 0.01
225 | 2220 episode | score: 9.93 | loss: 8.17309 | epsilon: 0.01
226 | 2230 episode | score: 9.76 | loss: 8.17456 | epsilon: 0.01
227 | 2240 episode | score: 9.68 | loss: 11.12660 | epsilon: 0.01
228 | 2250 episode | score: 9.52 | loss: 8.20828 | epsilon: 0.01
229 | 2260 episode | score: 9.49 | loss: 6.74850 | epsilon: 0.01
230 | 2270 episode | score: 9.34 | loss: 8.91310 | epsilon: 0.01
231 | 2280 episode | score: 9.26 | loss: 9.66050 | epsilon: 0.01
232 | 2290 episode | score: 9.36 | loss: 7.45649 | epsilon: 0.01
233 | 2300 episode | score: 9.35 | loss: 14.13468 | epsilon: 0.01
234 | 2310 episode | score: 9.29 | loss: 8.20157 | epsilon: 0.01
235 | 2320 episode | score: 9.20 | loss: 13.40456 | epsilon: 0.01
236 | 2330 episode | score: 9.12 | loss: 10.44636 | epsilon: 0.01
237 | 2340 episode | score: 9.31 | loss: 10.45737 | epsilon: 0.01
238 | 2350 episode | score: 9.32 | loss: 8.96152 | epsilon: 0.01
239 | 2360 episode | score: 9.37 | loss: 12.77146 | epsilon: 0.01
240 | 2370 episode | score: 9.33 | loss: 11.96437 | epsilon: 0.01
241 | 2380 episode | score: 9.25 | loss: 8.28421 | epsilon: 0.01
242 | 2390 episode | score: 9.20 | loss: 11.25395 | epsilon: 0.01
243 | 2400 episode | score: 9.14 | loss: 12.75365 | epsilon: 0.01
244 | 2410 episode | score: 9.23 | loss: 11.28267 | epsilon: 0.01
245 | 2420 episode | score: 9.58 | loss: 9.01897 | epsilon: 0.01
246 | 2430 episode | score: 9.49 | loss: 9.78657 | epsilon: 0.01
247 | 2440 episode | score: 9.41 | loss: 12.01540 | epsilon: 0.01
248 | 2450 episode | score: 9.40 | loss: 9.77980 | epsilon: 0.01
249 | 2460 episode | score: 9.51 | loss: 9.04971 | epsilon: 0.01
250 | 2470 episode | score: 9.48 | loss: 9.81820 | epsilon: 0.01
251 | 2480 episode | score: 9.39 | loss: 12.04065 | epsilon: 0.01
252 | 2490 episode | score: 9.27 | loss: 8.29423 | epsilon: 0.01
253 | 2500 episode | score: 9.15 | loss: 7.55239 | epsilon: 0.01
254 | 2510 episode | score: 9.10 | loss: 11.31253 | epsilon: 0.01
255 | 2520 episode | score: 9.03 | loss: 11.32649 | epsilon: 0.01
256 | 2530 episode | score: 8.93 | loss: 10.57569 | epsilon: 0.01
257 | 2540 episode | score: 8.96 | loss: 12.10625 | epsilon: 0.01
258 | 2550 episode | score: 8.97 | loss: 10.59332 | epsilon: 0.01
259 | 2560 episode | score: 8.96 | loss: 12.86546 | epsilon: 0.01
260 | 2570 episode | score: 8.92 | loss: 13.63366 | epsilon: 0.01
261 | 2580 episode | score: 8.85 | loss: 12.12462 | epsilon: 0.01
262 | 2590 episode | score: 8.87 | loss: 9.89139 | epsilon: 0.01
263 | 2600 episode | score: 8.94 | loss: 6.10569 | epsilon: 0.01
264 | 2610 episode | score: 8.91 | loss: 11.38791 | epsilon: 0.01
265 | 2620 episode | score: 8.89 | loss: 9.11608 | epsilon: 0.01
266 | 2630 episode | score: 8.93 | loss: 12.91580 | epsilon: 0.01
267 | 2640 episode | score: 9.68 | loss: 7.73742 | epsilon: 0.01
268 | 2650 episode | score: 9.54 | loss: 6.88130 | epsilon: 0.01
269 | 2660 episode | score: 9.59 | loss: 10.00360 | epsilon: 0.01
270 | 2670 episode | score: 9.44 | loss: 9.20884 | epsilon: 0.01
271 | 2680 episode | score: 9.30 | loss: 7.65980 | epsilon: 0.01
272 | 2690 episode | score: 9.19 | loss: 8.41324 | epsilon: 0.01
273 | 2700 episode | score: 9.14 | loss: 12.98391 | epsilon: 0.01
274 | 2710 episode | score: 9.13 | loss: 6.90586 | epsilon: 0.01
275 | 2720 episode | score: 9.08 | loss: 6.90842 | epsilon: 0.01
276 | 2730 episode | score: 8.99 | loss: 9.21201 | epsilon: 0.01
277 | 2740 episode | score: 9.62 | loss: 10.79367 | epsilon: 0.01
278 | 2750 episode | score: 9.50 | loss: 13.79613 | epsilon: 0.01
279 | 2760 episode | score: 9.42 | loss: 6.24562 | epsilon: 0.01
280 | 2770 episode | score: 9.47 | loss: 10.74018 | epsilon: 0.01
281 | 2780 episode | score: 9.56 | loss: 10.00158 | epsilon: 0.01
282 | 2790 episode | score: 10.28 | loss: 8.44825 | epsilon: 0.01
283 | 2800 episode | score: 10.16 | loss: 8.44763 | epsilon: 0.01
284 | 2810 episode | score: 10.02 | loss: 10.75428 | epsilon: 0.01
285 | 2820 episode | score: 10.11 | loss: 6.92661 | epsilon: 0.01
286 | 2830 episode | score: 10.18 | loss: 9.22772 | epsilon: 0.01
287 | 2840 episode | score: 10.04 | loss: 8.44034 | epsilon: 0.01
288 | 2850 episode | score: 9.87 | loss: 8.47338 | epsilon: 0.01
289 | 2860 episode | score: 9.70 | loss: 10.01442 | epsilon: 0.01
290 | 2870 episode | score: 9.60 | loss: 5.43404 | epsilon: 0.01
291 | 2880 episode | score: 9.46 | loss: 6.15170 | epsilon: 0.01
292 | 2890 episode | score: 9.42 | loss: 6.16667 | epsilon: 0.01
293 | 2900 episode | score: 9.36 | loss: 7.73600 | epsilon: 0.01
294 | 2910 episode | score: 9.29 | loss: 12.33317 | epsilon: 0.01
295 | 2920 episode | score: 9.23 | loss: 9.23750 | epsilon: 0.01
296 | 2930 episode | score: 9.43 | loss: 10.80528 | epsilon: 0.01
297 | 2940 episode | score: 9.69 | loss: 14.62934 | epsilon: 0.01
298 | 2950 episode | score: 9.58 | loss: 9.26545 | epsilon: 0.01
299 | 2960 episode | score: 9.61 | loss: 7.04498 | epsilon: 0.01
300 | 2970 episode | score: 9.50 | loss: 10.81926 | epsilon: 0.01
301 | 2980 episode | score: 9.53 | loss: 12.33160 | epsilon: 0.01
302 | 2990 episode | score: 9.43 | loss: 9.26536 | epsilon: 0.01
303 | 3000 episode | score: 9.34 | loss: 6.20229 | epsilon: 0.01
304 | 3010 episode | score: 9.36 | loss: 8.49146 | epsilon: 0.01
305 | 3020 episode | score: 9.26 | loss: 12.36944 | epsilon: 0.01
306 | 3030 episode | score: 9.18 | loss: 10.04898 | epsilon: 0.01
307 | 3040 episode | score: 9.07 | loss: 10.06615 | epsilon: 0.01
308 | 3050 episode | score: 9.03 | loss: 11.62272 | epsilon: 0.01
309 | 3060 episode | score: 8.99 | loss: 13.16744 | epsilon: 0.01
310 | 3070 episode | score: 8.95 | loss: 12.43446 | epsilon: 0.01
311 | 3080 episode | score: 8.94 | loss: 9.40432 | epsilon: 0.01
312 | 3090 episode | score: 8.92 | loss: 14.01531 | epsilon: 0.01
313 | 3100 episode | score: 8.88 | loss: 7.11171 | epsilon: 0.01
314 | 3110 episode | score: 8.86 | loss: 10.89154 | epsilon: 0.01
315 | 3120 episode | score: 8.82 | loss: 9.32153 | epsilon: 0.01
316 | 3130 episode | score: 8.69 | loss: 11.73717 | epsilon: 0.01
317 | 3140 episode | score: 8.71 | loss: 9.37411 | epsilon: 0.01
318 | 3150 episode | score: 8.73 | loss: 11.65305 | epsilon: 0.01
319 | 3160 episode | score: 8.72 | loss: 9.43469 | epsilon: 0.01
320 | 3170 episode | score: 8.86 | loss: 14.79311 | epsilon: 0.01
321 | 3180 episode | score: 8.79 | loss: 13.25291 | epsilon: 0.01
322 | 3190 episode | score: 8.73 | loss: 11.71912 | epsilon: 0.01
323 | 3200 episode | score: 8.70 | loss: 8.59743 | epsilon: 0.01
324 | 3210 episode | score: 8.70 | loss: 11.68095 | epsilon: 0.01
325 | 3220 episode | score: 8.67 | loss: 12.51386 | epsilon: 0.01
326 | 3230 episode | score: 8.68 | loss: 9.40448 | epsilon: 0.01
327 | 3240 episode | score: 8.64 | loss: 9.40135 | epsilon: 0.01
328 | 3250 episode | score: 8.65 | loss: 10.15096 | epsilon: 0.01
329 | 3260 episode | score: 8.67 | loss: 9.35014 | epsilon: 0.01
330 | 3270 episode | score: 8.79 | loss: 12.45504 | epsilon: 0.01
331 | 3280 episode | score: 8.79 | loss: 8.64415 | epsilon: 0.01
332 | 3290 episode | score: 8.75 | loss: 10.20365 | epsilon: 0.01
333 | 3300 episode | score: 8.72 | loss: 7.86599 | epsilon: 0.01
334 | 3310 episode | score: 8.74 | loss: 10.96756 | epsilon: 0.01
335 | 3320 episode | score: 8.68 | loss: 13.34599 | epsilon: 0.01
336 | 3330 episode | score: 8.66 | loss: 10.94800 | epsilon: 0.01
337 | 3340 episode | score: 8.62 | loss: 14.05154 | epsilon: 0.01
338 | 3350 episode | score: 8.60 | loss: 10.23919 | epsilon: 0.01
339 | 3360 episode | score: 8.59 | loss: 14.03212 | epsilon: 0.01
340 | 3370 episode | score: 8.55 | loss: 11.77572 | epsilon: 0.01
341 | 3380 episode | score: 8.68 | loss: 10.25810 | epsilon: 0.01
342 | 3390 episode | score: 8.75 | loss: 8.62217 | epsilon: 0.01
343 | 3400 episode | score: 8.75 | loss: 12.53079 | epsilon: 0.01
344 | 3410 episode | score: 8.81 | loss: 7.07102 | epsilon: 0.01
345 | 3420 episode | score: 8.79 | loss: 15.74056 | epsilon: 0.01
346 | 3430 episode | score: 8.84 | loss: 11.75673 | epsilon: 0.01
347 | 3440 episode | score: 8.89 | loss: 10.96829 | epsilon: 0.01
348 | 3450 episode | score: 8.81 | loss: 12.55960 | epsilon: 0.01
349 | 3460 episode | score: 8.82 | loss: 12.55970 | epsilon: 0.01
350 | 3470 episode | score: 8.76 | loss: 10.98289 | epsilon: 0.01
351 | 3480 episode | score: 8.84 | loss: 11.01114 | epsilon: 0.01
352 | 3490 episode | score: 8.82 | loss: 11.75950 | epsilon: 0.01
353 | 3500 episode | score: 8.85 | loss: 13.32741 | epsilon: 0.01
354 | 3510 episode | score: 8.80 | loss: 12.60781 | epsilon: 0.01
355 | 3520 episode | score: 8.76 | loss: 13.35716 | epsilon: 0.01
356 | 3530 episode | score: 8.70 | loss: 11.76125 | epsilon: 0.01
357 | 3540 episode | score: 8.73 | loss: 14.90952 | epsilon: 0.01
358 | 3550 episode | score: 8.70 | loss: 11.77749 | epsilon: 0.01
359 | 3560 episode | score: 8.72 | loss: 12.56740 | epsilon: 0.01
360 | 3570 episode | score: 8.73 | loss: 14.99122 | epsilon: 0.01
361 | 3580 episode | score: 8.80 | loss: 17.37853 | epsilon: 0.01
362 | 3590 episode | score: 8.72 | loss: 8.76137 | epsilon: 0.01
363 | 3600 episode | score: 8.67 | loss: 10.23885 | epsilon: 0.01
364 | 3610 episode | score: 8.65 | loss: 11.01727 | epsilon: 0.01
365 | 3620 episode | score: 8.62 | loss: 14.13370 | epsilon: 0.01
366 | 3630 episode | score: 8.72 | loss: 11.00491 | epsilon: 0.01
367 | 3640 episode | score: 8.69 | loss: 8.70311 | epsilon: 0.01
368 | 3650 episode | score: 8.66 | loss: 13.38176 | epsilon: 0.01
369 | 3660 episode | score: 8.63 | loss: 7.11742 | epsilon: 0.01
370 | 3670 episode | score: 8.56 | loss: 14.16870 | epsilon: 0.01
371 | 3680 episode | score: 8.63 | loss: 11.78554 | epsilon: 0.01
372 | 3690 episode | score: 8.66 | loss: 10.24880 | epsilon: 0.01
373 | 3700 episode | score: 8.65 | loss: 13.40398 | epsilon: 0.01
374 | 3710 episode | score: 8.65 | loss: 12.64166 | epsilon: 0.01
375 | 3720 episode | score: 8.64 | loss: 9.59502 | epsilon: 0.01
376 | 3730 episode | score: 8.64 | loss: 12.61637 | epsilon: 0.01
377 | 3740 episode | score: 8.70 | loss: 7.87491 | epsilon: 0.01
378 | 3750 episode | score: 8.66 | loss: 7.88004 | epsilon: 0.01
379 | 3760 episode | score: 8.64 | loss: 11.78762 | epsilon: 0.01
380 | 3770 episode | score: 8.74 | loss: 9.46066 | epsilon: 0.01
381 | 3780 episode | score: 8.73 | loss: 7.96536 | epsilon: 0.01
382 | 3790 episode | score: 8.71 | loss: 11.97221 | epsilon: 0.01
383 | 3800 episode | score: 8.67 | loss: 11.88898 | epsilon: 0.01
384 | 3810 episode | score: 8.60 | loss: 11.85641 | epsilon: 0.01
385 | 3820 episode | score: 8.58 | loss: 14.25775 | epsilon: 0.01
386 | 3830 episode | score: 8.62 | loss: 11.08920 | epsilon: 0.01
387 | 3840 episode | score: 8.64 | loss: 11.07313 | epsilon: 0.01
388 | 3850 episode | score: 8.76 | loss: 10.28849 | epsilon: 0.01
389 | 3860 episode | score: 8.70 | loss: 11.09194 | epsilon: 0.01
390 | 3870 episode | score: 8.83 | loss: 7.27343 | epsilon: 0.01
391 | 3880 episode | score: 8.78 | loss: 12.70216 | epsilon: 0.01
392 | 3890 episode | score: 8.76 | loss: 12.71587 | epsilon: 0.01
393 | 3900 episode | score: 8.80 | loss: 11.92015 | epsilon: 0.01
394 | 3910 episode | score: 8.91 | loss: 13.49285 | epsilon: 0.01
395 | 3920 episode | score: 8.87 | loss: 14.29325 | epsilon: 0.01
396 | 3930 episode | score: 8.83 | loss: 10.36528 | epsilon: 0.01
397 | 3940 episode | score: 8.78 | loss: 11.20214 | epsilon: 0.01
398 | 3950 episode | score: 8.72 | loss: 12.78704 | epsilon: 0.01
399 | 3960 episode | score: 8.91 | loss: 8.74634 | epsilon: 0.01
400 | 3970 episode | score: 8.80 | loss: 11.15318 | epsilon: 0.01
401 | 3980 episode | score: 8.76 | loss: 11.17389 | epsilon: 0.01
402 | 3990 episode | score: 8.70 | loss: 10.37470 | epsilon: 0.01
403 | 4000 episode | score: 8.63 | loss: 15.13234 | epsilon: 0.01
404 | 4010 episode | score: 8.61 | loss: 11.99569 | epsilon: 0.01
405 | 4020 episode | score: 8.58 | loss: 8.01051 | epsilon: 0.01
406 | 4030 episode | score: 8.59 | loss: 14.32788 | epsilon: 0.01
407 | 4040 episode | score: 8.59 | loss: 12.81188 | epsilon: 0.01
408 | 4050 episode | score: 8.54 | loss: 14.31777 | epsilon: 0.01
409 | 4060 episode | score: 8.54 | loss: 9.59531 | epsilon: 0.01
410 | 4070 episode | score: 8.54 | loss: 14.39224 | epsilon: 0.01
411 | 4080 episode | score: 8.52 | loss: 14.33717 | epsilon: 0.01
412 | 4090 episode | score: 8.46 | loss: 12.74352 | epsilon: 0.01
413 | 4100 episode | score: 8.45 | loss: 9.54306 | epsilon: 0.01
414 | 4110 episode | score: 8.46 | loss: 8.00438 | epsilon: 0.01
415 | 4120 episode | score: 8.44 | loss: 12.01585 | epsilon: 0.01
416 | 4130 episode | score: 8.42 | loss: 8.08862 | epsilon: 0.01
417 | 4140 episode | score: 8.44 | loss: 11.19263 | epsilon: 0.01
418 | 4150 episode | score: 8.60 | loss: 12.77767 | epsilon: 0.01
419 | 4160 episode | score: 8.61 | loss: 9.61339 | epsilon: 0.01
420 | 4170 episode | score: 8.66 | loss: 8.78759 | epsilon: 0.01
421 | 4180 episode | score: 8.63 | loss: 9.59737 | epsilon: 0.01
422 | 4190 episode | score: 8.66 | loss: 11.22626 | epsilon: 0.01
423 | 4200 episode | score: 8.63 | loss: 8.79422 | epsilon: 0.01
424 | 4210 episode | score: 8.76 | loss: 7.25631 | epsilon: 0.01
425 | 4220 episode | score: 8.87 | loss: 5.62890 | epsilon: 0.01
426 | 4230 episode | score: 8.82 | loss: 8.87474 | epsilon: 0.01
427 | 4240 episode | score: 8.83 | loss: 12.82258 | epsilon: 0.01
428 | 4250 episode | score: 8.81 | loss: 12.82549 | epsilon: 0.01
429 | 4260 episode | score: 8.78 | loss: 12.02019 | epsilon: 0.01
430 | 4270 episode | score: 8.77 | loss: 13.64278 | epsilon: 0.01
431 | 4280 episode | score: 8.93 | loss: 8.82284 | epsilon: 0.01
432 | 4290 episode | score: 8.90 | loss: 8.86403 | epsilon: 0.01
433 | 4300 episode | score: 8.83 | loss: 9.78740 | epsilon: 0.01
434 | 4310 episode | score: 8.75 | loss: 15.24880 | epsilon: 0.01
435 | 4320 episode | score: 8.73 | loss: 13.59393 | epsilon: 0.01
436 | 4330 episode | score: 8.68 | loss: 13.73093 | epsilon: 0.01
437 | 4340 episode | score: 8.69 | loss: 11.38526 | epsilon: 0.01
438 | 4350 episode | score: 8.63 | loss: 12.88845 | epsilon: 0.01
439 | 4360 episode | score: 8.63 | loss: 15.29476 | epsilon: 0.01
440 | 4370 episode | score: 8.61 | loss: 9.62656 | epsilon: 0.01
441 | 4380 episode | score: 8.61 | loss: 11.30112 | epsilon: 0.01
442 | 4390 episode | score: 8.59 | loss: 9.64459 | epsilon: 0.01
443 | 4400 episode | score: 8.59 | loss: 11.23779 | epsilon: 0.01
444 | 4410 episode | score: 8.65 | loss: 9.63661 | epsilon: 0.01
445 | 4420 episode | score: 8.57 | loss: 12.04601 | epsilon: 0.01
446 | 4430 episode | score: 8.59 | loss: 12.01326 | epsilon: 0.01
447 | 4440 episode | score: 8.68 | loss: 10.37475 | epsilon: 0.01
448 | 4450 episode | score: 8.69 | loss: 15.99613 | epsilon: 0.01
449 | 4460 episode | score: 8.64 | loss: 11.96889 | epsilon: 0.01
450 | 4470 episode | score: 8.65 | loss: 12.04646 | epsilon: 0.01
451 | 4480 episode | score: 8.63 | loss: 8.90685 | epsilon: 0.01
452 | 4490 episode | score: 8.62 | loss: 10.39016 | epsilon: 0.01
453 | 4500 episode | score: 8.60 | loss: 11.22810 | epsilon: 0.01
454 | 4510 episode | score: 8.56 | loss: 11.23388 | epsilon: 0.01
455 | 4520 episode | score: 8.63 | loss: 12.09711 | epsilon: 0.01
456 | 4530 episode | score: 8.64 | loss: 6.43348 | epsilon: 0.01
457 | 4540 episode | score: 8.72 | loss: 11.20046 | epsilon: 0.01
458 | 4550 episode | score: 8.70 | loss: 10.46217 | epsilon: 0.01
459 | 4560 episode | score: 8.74 | loss: 10.44539 | epsilon: 0.01
460 | 4570 episode | score: 8.74 | loss: 8.07104 | epsilon: 0.01
461 | 4580 episode | score: 8.78 | loss: 14.40675 | epsilon: 0.01
462 | 4590 episode | score: 8.84 | loss: 8.82934 | epsilon: 0.01
463 | 4600 episode | score: 8.92 | loss: 11.25667 | epsilon: 0.01
464 | 4610 episode | score: 8.89 | loss: 12.83683 | epsilon: 0.01
465 | 4620 episode | score: 8.83 | loss: 8.06613 | epsilon: 0.01
466 | 4630 episode | score: 8.78 | loss: 14.43854 | epsilon: 0.01
467 | 4640 episode | score: 8.75 | loss: 12.05898 | epsilon: 0.01
468 | 4650 episode | score: 8.78 | loss: 10.55615 | epsilon: 0.01
469 | 4660 episode | score: 8.78 | loss: 9.73167 | epsilon: 0.01
470 | 4670 episode | score: 8.78 | loss: 9.68056 | epsilon: 0.01
471 | 4680 episode | score: 9.01 | loss: 11.23431 | epsilon: 0.01
472 | 4690 episode | score: 9.05 | loss: 11.26516 | epsilon: 0.01
473 | 4700 episode | score: 9.01 | loss: 12.04785 | epsilon: 0.01
474 | 4710 episode | score: 9.06 | loss: 12.07072 | epsilon: 0.01
475 | 4720 episode | score: 9.03 | loss: 9.62078 | epsilon: 0.01
476 | 4730 episode | score: 8.99 | loss: 8.94808 | epsilon: 0.01
477 | 4740 episode | score: 8.94 | loss: 14.43412 | epsilon: 0.01
478 | 4750 episode | score: 8.92 | loss: 6.44148 | epsilon: 0.01
479 | 4760 episode | score: 8.99 | loss: 8.02015 | epsilon: 0.01
480 | 4770 episode | score: 8.93 | loss: 6.42963 | epsilon: 0.01
481 | 4780 episode | score: 8.96 | loss: 9.68000 | epsilon: 0.01
482 | 4790 episode | score: 8.98 | loss: 7.21564 | epsilon: 0.01
483 | 4800 episode | score: 8.94 | loss: 10.49922 | epsilon: 0.01
484 | 4810 episode | score: 8.86 | loss: 12.90248 | epsilon: 0.01
485 | 4820 episode | score: 8.82 | loss: 12.01941 | epsilon: 0.01
486 | 4830 episode | score: 8.78 | loss: 7.24218 | epsilon: 0.01
487 | 4840 episode | score: 8.78 | loss: 7.23795 | epsilon: 0.01
488 | 4850 episode | score: 8.75 | loss: 12.03610 | epsilon: 0.01
489 | 4860 episode | score: 8.76 | loss: 11.21964 | epsilon: 0.01
490 | 4870 episode | score: 8.69 | loss: 16.00153 | epsilon: 0.01
491 | 4880 episode | score: 8.79 | loss: 12.86883 | epsilon: 0.01
492 | 4890 episode | score: 8.73 | loss: 14.43297 | epsilon: 0.01
493 | 4900 episode | score: 8.71 | loss: 12.91766 | epsilon: 0.01
494 | 4910 episode | score: 8.67 | loss: 11.28181 | epsilon: 0.01
495 | 4920 episode | score: 8.65 | loss: 12.87584 | epsilon: 0.01
496 | 4930 episode | score: 8.64 | loss: 9.59836 | epsilon: 0.01
497 | 4940 episode | score: 8.62 | loss: 16.85092 | epsilon: 0.01
498 | 4950 episode | score: 8.69 | loss: 12.79394 | epsilon: 0.01
499 | 4960 episode | score: 8.70 | loss: 10.47979 | epsilon: 0.01
500 | 4970 episode | score: 8.78 | loss: 10.40397 | epsilon: 0.01
501 | 4980 episode | score: 8.80 | loss: 15.97149 | epsilon: 0.01
502 | 4990 episode | score: 8.76 | loss: 11.32307 | epsilon: 0.01
503 |
--------------------------------------------------------------------------------
/out/trace_DTQN_7.txt:
--------------------------------------------------------------------------------
1 | state size: 2
2 | action size: 2
3 | 0 episode | score: 35.00 | loss: 0.00000 | epsilon: 1.00
4 | 10 episode | score: 33.26 | loss: 0.00000 | epsilon: 1.00
5 | 20 episode | score: 31.47 | loss: 0.00000 | epsilon: 1.00
6 | 30 episode | score: 30.19 | loss: 0.00000 | epsilon: 1.00
7 | 40 episode | score: 29.01 | loss: 0.00000 | epsilon: 1.00
8 | 50 episode | score: 28.62 | loss: 0.43852 | epsilon: 1.00
9 | 60 episode | score: 27.80 | loss: 0.11433 | epsilon: 0.99
10 | 70 episode | score: 26.98 | loss: 0.19428 | epsilon: 0.98
11 | 80 episode | score: 26.27 | loss: 0.14741 | epsilon: 0.97
12 | 90 episode | score: 25.43 | loss: 0.33766 | epsilon: 0.96
13 | 100 episode | score: 24.73 | loss: 1.15387 | epsilon: 0.95
14 | 110 episode | score: 24.66 | loss: 0.26772 | epsilon: 0.94
15 | 120 episode | score: 24.23 | loss: 0.42548 | epsilon: 0.93
16 | 130 episode | score: 23.77 | loss: 0.44916 | epsilon: 0.92
17 | 140 episode | score: 23.46 | loss: 0.78435 | epsilon: 0.91
18 | 150 episode | score: 23.00 | loss: 0.53330 | epsilon: 0.90
19 | 160 episode | score: 23.18 | loss: 0.24204 | epsilon: 0.88
20 | 170 episode | score: 22.86 | loss: 1.22462 | epsilon: 0.87
21 | 180 episode | score: 22.61 | loss: 0.45706 | epsilon: 0.86
22 | 190 episode | score: 22.25 | loss: 0.70294 | epsilon: 0.85
23 | 200 episode | score: 21.77 | loss: 0.51427 | epsilon: 0.84
24 | 210 episode | score: 21.73 | loss: 0.53417 | epsilon: 0.83
25 | 220 episode | score: 21.82 | loss: 1.43799 | epsilon: 0.82
26 | 230 episode | score: 21.69 | loss: 0.90087 | epsilon: 0.81
27 | 240 episode | score: 21.73 | loss: 0.87772 | epsilon: 0.80
28 | 250 episode | score: 21.27 | loss: 0.33676 | epsilon: 0.79
29 | 260 episode | score: 20.94 | loss: 0.46457 | epsilon: 0.78
30 | 270 episode | score: 21.20 | loss: 0.96972 | epsilon: 0.77
31 | 280 episode | score: 21.32 | loss: 1.31934 | epsilon: 0.76
32 | 290 episode | score: 21.42 | loss: 0.40225 | epsilon: 0.74
33 | 300 episode | score: 21.46 | loss: 1.74762 | epsilon: 0.73
34 | 310 episode | score: 21.15 | loss: 0.75224 | epsilon: 0.72
35 | 320 episode | score: 20.52 | loss: 0.96829 | epsilon: 0.71
36 | 330 episode | score: 20.51 | loss: 0.85872 | epsilon: 0.70
37 | 340 episode | score: 19.86 | loss: 1.52210 | epsilon: 0.70
38 | 350 episode | score: 19.73 | loss: 0.79390 | epsilon: 0.69
39 | 360 episode | score: 20.06 | loss: 2.06629 | epsilon: 0.67
40 | 370 episode | score: 19.69 | loss: 2.02081 | epsilon: 0.67
41 | 380 episode | score: 19.52 | loss: 0.84718 | epsilon: 0.66
42 | 390 episode | score: 19.29 | loss: 2.10416 | epsilon: 0.65
43 | 400 episode | score: 19.11 | loss: 3.44531 | epsilon: 0.64
44 | 410 episode | score: 19.15 | loss: 2.24009 | epsilon: 0.63
45 | 420 episode | score: 19.00 | loss: 1.08297 | epsilon: 0.62
46 | 430 episode | score: 18.72 | loss: 0.46196 | epsilon: 0.61
47 | 440 episode | score: 18.33 | loss: 1.37073 | epsilon: 0.60
48 | 450 episode | score: 18.07 | loss: 2.04425 | epsilon: 0.59
49 | 460 episode | score: 17.45 | loss: 3.26727 | epsilon: 0.59
50 | 470 episode | score: 16.91 | loss: 0.94458 | epsilon: 0.58
51 | 480 episode | score: 16.62 | loss: 1.90392 | epsilon: 0.57
52 | 490 episode | score: 16.39 | loss: 3.33472 | epsilon: 0.57
53 | 500 episode | score: 16.01 | loss: 2.28937 | epsilon: 0.56
54 | 510 episode | score: 15.87 | loss: 3.86214 | epsilon: 0.55
55 | 520 episode | score: 16.01 | loss: 3.00679 | epsilon: 0.54
56 | 530 episode | score: 15.89 | loss: 2.45940 | epsilon: 0.54
57 | 540 episode | score: 15.65 | loss: 2.52061 | epsilon: 0.53
58 | 550 episode | score: 15.56 | loss: 0.55708 | epsilon: 0.52
59 | 560 episode | score: 15.83 | loss: 1.52446 | epsilon: 0.51
60 | 570 episode | score: 15.89 | loss: 4.10431 | epsilon: 0.50
61 | 580 episode | score: 15.67 | loss: 2.77351 | epsilon: 0.49
62 | 590 episode | score: 15.30 | loss: 2.61357 | epsilon: 0.49
63 | 600 episode | score: 15.50 | loss: 2.11396 | epsilon: 0.48
64 | 610 episode | score: 15.22 | loss: 2.12927 | epsilon: 0.47
65 | 620 episode | score: 14.91 | loss: 3.18743 | epsilon: 0.47
66 | 630 episode | score: 14.83 | loss: 2.16025 | epsilon: 0.46
67 | 640 episode | score: 14.46 | loss: 3.78357 | epsilon: 0.45
68 | 650 episode | score: 14.21 | loss: 4.93032 | epsilon: 0.45
69 | 660 episode | score: 14.06 | loss: 2.79694 | epsilon: 0.44
70 | 670 episode | score: 13.77 | loss: 3.85071 | epsilon: 0.43
71 | 680 episode | score: 13.88 | loss: 3.61345 | epsilon: 0.42
72 | 690 episode | score: 13.87 | loss: 3.37426 | epsilon: 0.42
73 | 700 episode | score: 13.58 | loss: 4.50640 | epsilon: 0.41
74 | 710 episode | score: 13.27 | loss: 2.85008 | epsilon: 0.41
75 | 720 episode | score: 13.29 | loss: 3.97639 | epsilon: 0.40
76 | 730 episode | score: 13.00 | loss: 2.30354 | epsilon: 0.39
77 | 740 episode | score: 12.89 | loss: 3.43699 | epsilon: 0.39
78 | 750 episode | score: 12.67 | loss: 6.29570 | epsilon: 0.38
79 | 760 episode | score: 12.72 | loss: 5.19002 | epsilon: 0.37
80 | 770 episode | score: 12.44 | loss: 2.41706 | epsilon: 0.37
81 | 780 episode | score: 12.31 | loss: 5.80734 | epsilon: 0.36
82 | 790 episode | score: 12.13 | loss: 7.02250 | epsilon: 0.36
83 | 800 episode | score: 11.95 | loss: 2.98520 | epsilon: 0.35
84 | 810 episode | score: 11.85 | loss: 4.71853 | epsilon: 0.35
85 | 820 episode | score: 11.66 | loss: 6.04004 | epsilon: 0.34
86 | 830 episode | score: 11.56 | loss: 7.74889 | epsilon: 0.33
87 | 840 episode | score: 11.71 | loss: 6.60510 | epsilon: 0.33
88 | 850 episode | score: 11.60 | loss: 3.04761 | epsilon: 0.32
89 | 860 episode | score: 11.72 | loss: 1.87209 | epsilon: 0.31
90 | 870 episode | score: 11.69 | loss: 6.05593 | epsilon: 0.31
91 | 880 episode | score: 11.53 | loss: 2.53926 | epsilon: 0.30
92 | 890 episode | score: 11.43 | loss: 5.47966 | epsilon: 0.30
93 | 900 episode | score: 11.71 | loss: 3.68325 | epsilon: 0.29
94 | 910 episode | score: 11.68 | loss: 3.10708 | epsilon: 0.28
95 | 920 episode | score: 12.07 | loss: 4.34701 | epsilon: 0.27
96 | 930 episode | score: 12.16 | loss: 6.20620 | epsilon: 0.27
97 | 940 episode | score: 12.13 | loss: 4.34877 | epsilon: 0.26
98 | 950 episode | score: 11.88 | loss: 6.85048 | epsilon: 0.26
99 | 960 episode | score: 11.64 | loss: 4.99166 | epsilon: 0.25
100 | 970 episode | score: 11.52 | loss: 4.50395 | epsilon: 0.24
101 | 980 episode | score: 13.71 | loss: 3.18380 | epsilon: 0.23
102 | 990 episode | score: 13.61 | loss: 3.20212 | epsilon: 0.22
103 | 1000 episode | score: 13.54 | loss: 3.86552 | epsilon: 0.21
104 | 1010 episode | score: 13.33 | loss: 3.24438 | epsilon: 0.21
105 | 1020 episode | score: 13.63 | loss: 2.62725 | epsilon: 0.20
106 | 1030 episode | score: 13.32 | loss: 3.26358 | epsilon: 0.19
107 | 1040 episode | score: 14.30 | loss: 3.29924 | epsilon: 0.18
108 | 1050 episode | score: 14.70 | loss: 3.92763 | epsilon: 0.17
109 | 1060 episode | score: 14.19 | loss: 3.28930 | epsilon: 0.17
110 | 1070 episode | score: 14.18 | loss: 4.59915 | epsilon: 0.16
111 | 1080 episode | score: 14.07 | loss: 5.24100 | epsilon: 0.15
112 | 1090 episode | score: 13.64 | loss: 6.55998 | epsilon: 0.15
113 | 1100 episode | score: 13.51 | loss: 0.09130 | epsilon: 0.14
114 | 1110 episode | score: 13.08 | loss: 6.60836 | epsilon: 0.13
115 | 1120 episode | score: 12.69 | loss: 3.33006 | epsilon: 0.13
116 | 1130 episode | score: 12.30 | loss: 6.59902 | epsilon: 0.12
117 | 1140 episode | score: 12.53 | loss: 5.47107 | epsilon: 0.12
118 | 1150 episode | score: 12.22 | loss: 4.05350 | epsilon: 0.11
119 | 1160 episode | score: 11.94 | loss: 9.41852 | epsilon: 0.11
120 | 1170 episode | score: 11.62 | loss: 6.70508 | epsilon: 0.10
121 | 1180 episode | score: 11.42 | loss: 4.86226 | epsilon: 0.10
122 | 1190 episode | score: 11.18 | loss: 5.39657 | epsilon: 0.09
123 | 1200 episode | score: 11.19 | loss: 6.11098 | epsilon: 0.08
124 | 1210 episode | score: 10.96 | loss: 7.41067 | epsilon: 0.08
125 | 1220 episode | score: 10.74 | loss: 8.72433 | epsilon: 0.08
126 | 1230 episode | score: 10.65 | loss: 6.14560 | epsilon: 0.07
127 | 1240 episode | score: 10.44 | loss: 8.21484 | epsilon: 0.06
128 | 1250 episode | score: 10.30 | loss: 8.88006 | epsilon: 0.06
129 | 1260 episode | score: 10.19 | loss: 8.21490 | epsilon: 0.05
130 | 1270 episode | score: 10.03 | loss: 6.78065 | epsilon: 0.05
131 | 1280 episode | score: 9.92 | loss: 10.19846 | epsilon: 0.05
132 | 1290 episode | score: 9.88 | loss: 10.88771 | epsilon: 0.04
133 | 1300 episode | score: 9.80 | loss: 6.82685 | epsilon: 0.03
134 | 1310 episode | score: 9.65 | loss: 10.89788 | epsilon: 0.03
135 | 1320 episode | score: 9.55 | loss: 4.12844 | epsilon: 0.03
136 | 1330 episode | score: 9.46 | loss: 10.29858 | epsilon: 0.02
137 | 1340 episode | score: 9.44 | loss: 8.25109 | epsilon: 0.02
138 | 1350 episode | score: 9.38 | loss: 7.56875 | epsilon: 0.01
139 | 1360 episode | score: 9.31 | loss: 11.00283 | epsilon: 0.01
140 | 1370 episode | score: 9.38 | loss: 10.34631 | epsilon: 0.01
141 | 1380 episode | score: 9.26 | loss: 11.74283 | epsilon: 0.01
142 | 1390 episode | score: 9.23 | loss: 9.70764 | epsilon: 0.01
143 | 1400 episode | score: 9.18 | loss: 11.77427 | epsilon: 0.01
144 | 1410 episode | score: 9.15 | loss: 11.11504 | epsilon: 0.01
145 | 1420 episode | score: 9.04 | loss: 11.21211 | epsilon: 0.01
146 | 1430 episode | score: 8.95 | loss: 12.53273 | epsilon: 0.01
147 | 1440 episode | score: 8.91 | loss: 7.00767 | epsilon: 0.01
148 | 1450 episode | score: 9.12 | loss: 10.50389 | epsilon: 0.01
149 | 1460 episode | score: 9.36 | loss: 11.23039 | epsilon: 0.01
150 | 1470 episode | score: 9.24 | loss: 7.05162 | epsilon: 0.01
151 | 1480 episode | score: 9.23 | loss: 7.01815 | epsilon: 0.01
152 | 1490 episode | score: 9.19 | loss: 9.16645 | epsilon: 0.01
153 | 1500 episode | score: 9.10 | loss: 9.85660 | epsilon: 0.01
154 | 1510 episode | score: 9.05 | loss: 9.85440 | epsilon: 0.01
155 | 1520 episode | score: 9.14 | loss: 9.93323 | epsilon: 0.01
156 | 1530 episode | score: 9.06 | loss: 9.91487 | epsilon: 0.01
157 | 1540 episode | score: 9.04 | loss: 6.37352 | epsilon: 0.01
158 | 1550 episode | score: 8.95 | loss: 6.42716 | epsilon: 0.01
159 | 1560 episode | score: 9.02 | loss: 12.11458 | epsilon: 0.01
160 | 1570 episode | score: 8.95 | loss: 9.95084 | epsilon: 0.01
161 | 1580 episode | score: 9.03 | loss: 7.11366 | epsilon: 0.01
162 | 1590 episode | score: 8.96 | loss: 5.71430 | epsilon: 0.01
163 | 1600 episode | score: 9.02 | loss: 9.28524 | epsilon: 0.01
164 | 1610 episode | score: 9.12 | loss: 9.29310 | epsilon: 0.01
165 | 1620 episode | score: 9.03 | loss: 7.89838 | epsilon: 0.01
166 | 1630 episode | score: 8.98 | loss: 9.31618 | epsilon: 0.01
167 | 1640 episode | score: 8.95 | loss: 10.05182 | epsilon: 0.01
168 | 1650 episode | score: 9.48 | loss: 7.97580 | epsilon: 0.01
169 | 1660 episode | score: 9.59 | loss: 10.10497 | epsilon: 0.01
170 | 1670 episode | score: 9.66 | loss: 12.99237 | epsilon: 0.01
171 | 1680 episode | score: 9.73 | loss: 5.83085 | epsilon: 0.01
172 | 1690 episode | score: 9.64 | loss: 8.68751 | epsilon: 0.01
173 | 1700 episode | score: 9.64 | loss: 7.96195 | epsilon: 0.01
174 | 1710 episode | score: 9.54 | loss: 12.32486 | epsilon: 0.01
175 | 1720 episode | score: 9.45 | loss: 7.23098 | epsilon: 0.01
176 | 1730 episode | score: 9.49 | loss: 7.36754 | epsilon: 0.01
177 | 1740 episode | score: 9.42 | loss: 7.25744 | epsilon: 0.01
178 | 1750 episode | score: 9.36 | loss: 6.53222 | epsilon: 0.01
179 | 1760 episode | score: 9.31 | loss: 10.92942 | epsilon: 0.01
180 | 1770 episode | score: 9.29 | loss: 9.51754 | epsilon: 0.01
181 | 1780 episode | score: 9.39 | loss: 10.25551 | epsilon: 0.01
182 | 1790 episode | score: 9.25 | loss: 12.39059 | epsilon: 0.01
183 | 1800 episode | score: 9.21 | loss: 7.43139 | epsilon: 0.01
184 | 1810 episode | score: 9.13 | loss: 8.77167 | epsilon: 0.01
185 | 1820 episode | score: 9.06 | loss: 10.92920 | epsilon: 0.01
186 | 1830 episode | score: 9.01 | loss: 7.37547 | epsilon: 0.01
187 | 1840 episode | score: 8.96 | loss: 9.52767 | epsilon: 0.01
188 | 1850 episode | score: 8.92 | loss: 13.15261 | epsilon: 0.01
189 | 1860 episode | score: 8.84 | loss: 6.61016 | epsilon: 0.01
190 | 1870 episode | score: 8.81 | loss: 11.70613 | epsilon: 0.01
191 | 1880 episode | score: 8.78 | loss: 11.75186 | epsilon: 0.01
192 | 1890 episode | score: 8.71 | loss: 11.78291 | epsilon: 0.01
193 | 1900 episode | score: 8.69 | loss: 11.03975 | epsilon: 0.01
194 | 1910 episode | score: 8.69 | loss: 11.00511 | epsilon: 0.01
195 | 1920 episode | score: 8.79 | loss: 9.61245 | epsilon: 0.01
196 | 1930 episode | score: 8.76 | loss: 8.90461 | epsilon: 0.01
197 | 1940 episode | score: 8.70 | loss: 11.79481 | epsilon: 0.01
198 | 1950 episode | score: 8.82 | loss: 10.33012 | epsilon: 0.01
199 | 1960 episode | score: 8.74 | loss: 12.54107 | epsilon: 0.01
200 | 1970 episode | score: 8.67 | loss: 10.37337 | epsilon: 0.01
201 | 1980 episode | score: 8.65 | loss: 14.00276 | epsilon: 0.01
202 | 1990 episode | score: 8.65 | loss: 16.27757 | epsilon: 0.01
203 | 2000 episode | score: 8.64 | loss: 11.12934 | epsilon: 0.01
204 | 2010 episode | score: 8.56 | loss: 12.59322 | epsilon: 0.01
205 | 2020 episode | score: 8.57 | loss: 8.91485 | epsilon: 0.01
206 | 2030 episode | score: 8.57 | loss: 13.42655 | epsilon: 0.01
207 | 2040 episode | score: 8.57 | loss: 9.68667 | epsilon: 0.01
208 | 2050 episode | score: 8.56 | loss: 13.31827 | epsilon: 0.01
209 | 2060 episode | score: 8.67 | loss: 8.91400 | epsilon: 0.01
210 | 2070 episode | score: 8.63 | loss: 10.44289 | epsilon: 0.01
211 | 2080 episode | score: 8.59 | loss: 13.33609 | epsilon: 0.01
212 | 2090 episode | score: 8.71 | loss: 13.34085 | epsilon: 0.01
213 | 2100 episode | score: 8.69 | loss: 13.33628 | epsilon: 0.01
214 | 2110 episode | score: 8.68 | loss: 11.17124 | epsilon: 0.01
215 | 2120 episode | score: 8.63 | loss: 12.62965 | epsilon: 0.01
216 | 2130 episode | score: 8.62 | loss: 14.87161 | epsilon: 0.01
217 | 2140 episode | score: 8.60 | loss: 11.95751 | epsilon: 0.01
218 | 2150 episode | score: 8.67 | loss: 14.16819 | epsilon: 0.01
219 | 2160 episode | score: 8.81 | loss: 12.69814 | epsilon: 0.01
220 | 2170 episode | score: 8.98 | loss: 9.73720 | epsilon: 0.01
221 | 2180 episode | score: 8.93 | loss: 12.00353 | epsilon: 0.01
222 | 2190 episode | score: 8.85 | loss: 9.09578 | epsilon: 0.01
223 | 2200 episode | score: 8.83 | loss: 10.47638 | epsilon: 0.01
224 | 2210 episode | score: 8.84 | loss: 6.82982 | epsilon: 0.01
225 | 2220 episode | score: 8.79 | loss: 12.02168 | epsilon: 0.01
226 | 2230 episode | score: 8.76 | loss: 10.52687 | epsilon: 0.01
227 | 2240 episode | score: 8.72 | loss: 8.28426 | epsilon: 0.01
228 | 2250 episode | score: 8.69 | loss: 9.21002 | epsilon: 0.01
229 | 2260 episode | score: 8.70 | loss: 10.57914 | epsilon: 0.01
230 | 2270 episode | score: 8.63 | loss: 9.84609 | epsilon: 0.01
231 | 2280 episode | score: 8.62 | loss: 10.56394 | epsilon: 0.01
232 | 2290 episode | score: 8.61 | loss: 9.90382 | epsilon: 0.01
233 | 2300 episode | score: 8.60 | loss: 9.81931 | epsilon: 0.01
234 | 2310 episode | score: 8.63 | loss: 12.82243 | epsilon: 0.01
235 | 2320 episode | score: 8.61 | loss: 10.57878 | epsilon: 0.01
236 | 2330 episode | score: 8.58 | loss: 9.15038 | epsilon: 0.01
237 | 2340 episode | score: 8.53 | loss: 8.33555 | epsilon: 0.01
238 | 2350 episode | score: 8.49 | loss: 15.08654 | epsilon: 0.01
239 | 2360 episode | score: 8.50 | loss: 9.81196 | epsilon: 0.01
240 | 2370 episode | score: 8.43 | loss: 13.60335 | epsilon: 0.01
241 | 2380 episode | score: 8.48 | loss: 11.37627 | epsilon: 0.01
242 | 2390 episode | score: 8.55 | loss: 14.38280 | epsilon: 0.01
243 | 2400 episode | score: 8.59 | loss: 11.38187 | epsilon: 0.01
244 | 2410 episode | score: 8.58 | loss: 9.87547 | epsilon: 0.01
245 | 2420 episode | score: 8.57 | loss: 8.44234 | epsilon: 0.01
246 | 2430 episode | score: 8.58 | loss: 12.23629 | epsilon: 0.01
247 | 2440 episode | score: 8.53 | loss: 7.65077 | epsilon: 0.01
248 | 2450 episode | score: 8.58 | loss: 13.13559 | epsilon: 0.01
249 | 2460 episode | score: 8.56 | loss: 10.70614 | epsilon: 0.01
250 | 2470 episode | score: 8.62 | loss: 9.23430 | epsilon: 0.01
251 | 2480 episode | score: 8.59 | loss: 7.70159 | epsilon: 0.01
252 | 2490 episode | score: 8.60 | loss: 12.97268 | epsilon: 0.01
253 | 2500 episode | score: 9.08 | loss: 6.14232 | epsilon: 0.01
254 | 2510 episode | score: 9.05 | loss: 9.98390 | epsilon: 0.01
255 | 2520 episode | score: 9.06 | loss: 7.68319 | epsilon: 0.01
256 | 2530 episode | score: 9.04 | loss: 7.66394 | epsilon: 0.01
257 | 2540 episode | score: 9.04 | loss: 8.42431 | epsilon: 0.01
258 | 2550 episode | score: 9.05 | loss: 10.74949 | epsilon: 0.01
259 | 2560 episode | score: 9.03 | loss: 10.00972 | epsilon: 0.01
260 | 2570 episode | score: 9.05 | loss: 10.00111 | epsilon: 0.01
261 | 2580 episode | score: 9.03 | loss: 8.57820 | epsilon: 0.01
262 | 2590 episode | score: 9.00 | loss: 9.22158 | epsilon: 0.01
263 | 2600 episode | score: 9.08 | loss: 11.53948 | epsilon: 0.01
264 | 2610 episode | score: 9.01 | loss: 8.47736 | epsilon: 0.01
265 | 2620 episode | score: 8.99 | loss: 13.89917 | epsilon: 0.01
266 | 2630 episode | score: 8.94 | loss: 10.78089 | epsilon: 0.01
267 | 2640 episode | score: 8.94 | loss: 9.23720 | epsilon: 0.01
268 | 2650 episode | score: 9.09 | loss: 13.82086 | epsilon: 0.01
269 | 2660 episode | score: 9.17 | loss: 10.81388 | epsilon: 0.01
270 | 2670 episode | score: 9.07 | loss: 7.08669 | epsilon: 0.01
271 | 2680 episode | score: 8.99 | loss: 8.57375 | epsilon: 0.01
272 | 2690 episode | score: 9.00 | loss: 7.73258 | epsilon: 0.01
273 | 2700 episode | score: 8.95 | loss: 7.82625 | epsilon: 0.01
274 | 2710 episode | score: 8.91 | loss: 10.05326 | epsilon: 0.01
275 | 2720 episode | score: 8.84 | loss: 7.74642 | epsilon: 0.01
276 | 2730 episode | score: 8.80 | loss: 13.19881 | epsilon: 0.01
277 | 2740 episode | score: 8.82 | loss: 10.07025 | epsilon: 0.01
278 | 2750 episode | score: 8.89 | loss: 12.45670 | epsilon: 0.01
279 | 2760 episode | score: 8.85 | loss: 11.64434 | epsilon: 0.01
280 | 2770 episode | score: 8.76 | loss: 7.01363 | epsilon: 0.01
281 | 2780 episode | score: 8.79 | loss: 13.19227 | epsilon: 0.01
282 | 2790 episode | score: 8.75 | loss: 14.75411 | epsilon: 0.01
283 | 2800 episode | score: 8.70 | loss: 9.51022 | epsilon: 0.01
284 | 2810 episode | score: 8.72 | loss: 13.20181 | epsilon: 0.01
285 | 2820 episode | score: 9.11 | loss: 10.83381 | epsilon: 0.01
286 | 2830 episode | score: 9.16 | loss: 9.31269 | epsilon: 0.01
287 | 2840 episode | score: 9.09 | loss: 7.04241 | epsilon: 0.01
288 | 2850 episode | score: 9.12 | loss: 8.60452 | epsilon: 0.01
289 | 2860 episode | score: 9.02 | loss: 10.94150 | epsilon: 0.01
290 | 2870 episode | score: 8.93 | loss: 12.45783 | epsilon: 0.01
291 | 2880 episode | score: 9.00 | loss: 11.71508 | epsilon: 0.01
292 | 2890 episode | score: 8.98 | loss: 8.59419 | epsilon: 0.01
293 | 2900 episode | score: 8.92 | loss: 8.64907 | epsilon: 0.01
294 | 2910 episode | score: 8.90 | loss: 10.14799 | epsilon: 0.01
295 | 2920 episode | score: 8.87 | loss: 13.24641 | epsilon: 0.01
296 | 2930 episode | score: 8.85 | loss: 11.01173 | epsilon: 0.01
297 | 2940 episode | score: 8.82 | loss: 10.20390 | epsilon: 0.01
298 | 2950 episode | score: 8.78 | loss: 13.23380 | epsilon: 0.01
299 | 2960 episode | score: 8.72 | loss: 10.24518 | epsilon: 0.01
300 | 2970 episode | score: 8.69 | loss: 10.95838 | epsilon: 0.01
301 | 2980 episode | score: 8.74 | loss: 7.84115 | epsilon: 0.01
302 | 2990 episode | score: 8.75 | loss: 11.73325 | epsilon: 0.01
303 | 3000 episode | score: 8.74 | loss: 11.66862 | epsilon: 0.01
304 | 3010 episode | score: 8.69 | loss: 7.82610 | epsilon: 0.01
305 | 3020 episode | score: 8.66 | loss: 10.22930 | epsilon: 0.01
306 | 3030 episode | score: 8.76 | loss: 7.79331 | epsilon: 0.01
307 | 3040 episode | score: 8.82 | loss: 7.90085 | epsilon: 0.01
308 | 3050 episode | score: 8.81 | loss: 13.34812 | epsilon: 0.01
309 | 3060 episode | score: 8.85 | loss: 10.22669 | epsilon: 0.01
310 | 3070 episode | score: 8.78 | loss: 6.30953 | epsilon: 0.01
311 | 3080 episode | score: 8.78 | loss: 13.25165 | epsilon: 0.01
312 | 3090 episode | score: 8.78 | loss: 8.65888 | epsilon: 0.01
313 | 3100 episode | score: 8.74 | loss: 14.08551 | epsilon: 0.01
314 | 3110 episode | score: 8.76 | loss: 7.86103 | epsilon: 0.01
315 | 3120 episode | score: 8.72 | loss: 11.01455 | epsilon: 0.01
316 | 3130 episode | score: 8.72 | loss: 14.21833 | epsilon: 0.01
317 | 3140 episode | score: 8.68 | loss: 8.61008 | epsilon: 0.01
318 | 3150 episode | score: 8.62 | loss: 13.33219 | epsilon: 0.01
319 | 3160 episode | score: 8.76 | loss: 4.78636 | epsilon: 0.01
320 | 3170 episode | score: 8.99 | loss: 7.05520 | epsilon: 0.01
321 | 3180 episode | score: 8.96 | loss: 11.86772 | epsilon: 0.01
322 | 3190 episode | score: 8.98 | loss: 10.17817 | epsilon: 0.01
323 | 3200 episode | score: 8.98 | loss: 12.53637 | epsilon: 0.01
324 | 3210 episode | score: 8.91 | loss: 10.12809 | epsilon: 0.01
325 | 3220 episode | score: 8.88 | loss: 8.65906 | epsilon: 0.01
326 | 3230 episode | score: 8.98 | loss: 10.20646 | epsilon: 0.01
327 | 3240 episode | score: 8.91 | loss: 10.96940 | epsilon: 0.01
328 | 3250 episode | score: 8.87 | loss: 10.18598 | epsilon: 0.01
329 | 3260 episode | score: 8.83 | loss: 8.61975 | epsilon: 0.01
330 | 3270 episode | score: 8.78 | loss: 10.19128 | epsilon: 0.01
331 | 3280 episode | score: 8.73 | loss: 8.62359 | epsilon: 0.01
332 | 3290 episode | score: 8.71 | loss: 11.02425 | epsilon: 0.01
333 | 3300 episode | score: 8.68 | loss: 14.13260 | epsilon: 0.01
334 | 3310 episode | score: 8.62 | loss: 9.41409 | epsilon: 0.01
335 | 3320 episode | score: 8.62 | loss: 12.54254 | epsilon: 0.01
336 | 3330 episode | score: 8.60 | loss: 11.82084 | epsilon: 0.01
337 | 3340 episode | score: 8.61 | loss: 10.22156 | epsilon: 0.01
338 | 3350 episode | score: 8.61 | loss: 11.80009 | epsilon: 0.01
339 | 3360 episode | score: 8.67 | loss: 11.07604 | epsilon: 0.01
340 | 3370 episode | score: 8.66 | loss: 11.78597 | epsilon: 0.01
341 | 3380 episode | score: 8.77 | loss: 11.01300 | epsilon: 0.01
342 | 3390 episode | score: 8.96 | loss: 10.25491 | epsilon: 0.01
343 | 3400 episode | score: 8.94 | loss: 11.80630 | epsilon: 0.01
344 | 3410 episode | score: 9.23 | loss: 13.34089 | epsilon: 0.01
345 | 3420 episode | score: 9.64 | loss: 7.89139 | epsilon: 0.01
346 | 3430 episode | score: 9.66 | loss: 8.69850 | epsilon: 0.01
347 | 3440 episode | score: 9.52 | loss: 9.43483 | epsilon: 0.01
348 | 3450 episode | score: 9.39 | loss: 10.31454 | epsilon: 0.01
349 | 3460 episode | score: 9.32 | loss: 10.27419 | epsilon: 0.01
350 | 3470 episode | score: 9.21 | loss: 6.42270 | epsilon: 0.01
351 | 3480 episode | score: 9.27 | loss: 5.52721 | epsilon: 0.01
352 | 3490 episode | score: 9.21 | loss: 12.69482 | epsilon: 0.01
353 | 3500 episode | score: 9.11 | loss: 12.71606 | epsilon: 0.01
354 | 3510 episode | score: 9.60 | loss: 11.90306 | epsilon: 0.01
355 | 3520 episode | score: 9.49 | loss: 15.78254 | epsilon: 0.01
356 | 3530 episode | score: 9.42 | loss: 9.48232 | epsilon: 0.01
357 | 3540 episode | score: 9.32 | loss: 9.52225 | epsilon: 0.01
358 | 3550 episode | score: 9.22 | loss: 14.21484 | epsilon: 0.01
359 | 3560 episode | score: 9.14 | loss: 11.09772 | epsilon: 0.01
360 | 3570 episode | score: 9.17 | loss: 8.73580 | epsilon: 0.01
361 | 3580 episode | score: 9.08 | loss: 9.49013 | epsilon: 0.01
362 | 3590 episode | score: 8.98 | loss: 11.07365 | epsilon: 0.01
363 | 3600 episode | score: 8.90 | loss: 7.99541 | epsilon: 0.01
364 | 3610 episode | score: 8.85 | loss: 11.86866 | epsilon: 0.01
365 | 3620 episode | score: 8.82 | loss: 15.04194 | epsilon: 0.01
366 | 3630 episode | score: 8.77 | loss: 10.41370 | epsilon: 0.01
367 | 3640 episode | score: 8.74 | loss: 13.54768 | epsilon: 0.01
368 | 3650 episode | score: 8.76 | loss: 11.88636 | epsilon: 0.01
369 | 3660 episode | score: 8.97 | loss: 10.68938 | epsilon: 0.01
370 | 3670 episode | score: 8.99 | loss: 18.81310 | epsilon: 0.01
371 | 3680 episode | score: 8.99 | loss: 2.19543 | epsilon: 0.01
372 | 3690 episode | score: 9.09 | loss: 5.16908 | epsilon: 0.01
373 | 3700 episode | score: 9.38 | loss: 6.65977 | epsilon: 0.01
374 | 3710 episode | score: 9.35 | loss: 4.14644 | epsilon: 0.01
375 | 3720 episode | score: 9.46 | loss: 14.50631 | epsilon: 0.01
376 | 3730 episode | score: 9.42 | loss: 9.58960 | epsilon: 0.01
377 | 3740 episode | score: 9.85 | loss: 8.54436 | epsilon: 0.01
378 | 3750 episode | score: 10.21 | loss: 10.65839 | epsilon: 0.01
379 | 3760 episode | score: 10.45 | loss: 9.73674 | epsilon: 0.01
380 | 3770 episode | score: 10.91 | loss: 7.87997 | epsilon: 0.01
381 | 3780 episode | score: 11.16 | loss: 7.26872 | epsilon: 0.01
382 | 3790 episode | score: 11.52 | loss: 5.94254 | epsilon: 0.01
383 | 3800 episode | score: 11.62 | loss: 5.46625 | epsilon: 0.01
384 | 3810 episode | score: 11.71 | loss: 10.21619 | epsilon: 0.01
385 | 3820 episode | score: 11.96 | loss: 9.50844 | epsilon: 0.01
386 | 3830 episode | score: 11.91 | loss: 4.78348 | epsilon: 0.01
387 | 3840 episode | score: 12.67 | loss: 5.18083 | epsilon: 0.01
388 | 3850 episode | score: 12.43 | loss: 6.25089 | epsilon: 0.01
389 | 3860 episode | score: 13.73 | loss: 4.49952 | epsilon: 0.01
390 | 3870 episode | score: 15.31 | loss: 5.31932 | epsilon: 0.01
391 | 3880 episode | score: 14.92 | loss: 8.60513 | epsilon: 0.01
392 | 3890 episode | score: 14.59 | loss: 7.08908 | epsilon: 0.01
393 | 3900 episode | score: 14.69 | loss: 3.78458 | epsilon: 0.01
394 | 3910 episode | score: 14.54 | loss: 7.11351 | epsilon: 0.01
395 | 3920 episode | score: 15.00 | loss: 4.39285 | epsilon: 0.01
396 | 3930 episode | score: 14.80 | loss: 5.84840 | epsilon: 0.01
397 | 3940 episode | score: 14.21 | loss: 5.61221 | epsilon: 0.01
398 | 3950 episode | score: 14.41 | loss: 5.26324 | epsilon: 0.01
399 | 3960 episode | score: 14.78 | loss: 4.12885 | epsilon: 0.01
400 | 3970 episode | score: 14.46 | loss: 10.88114 | epsilon: 0.01
401 | 3980 episode | score: 13.96 | loss: 3.99816 | epsilon: 0.01
402 | 3990 episode | score: 14.11 | loss: 2.82461 | epsilon: 0.01
403 | 4000 episode | score: 14.45 | loss: 5.42744 | epsilon: 0.01
404 | 4010 episode | score: 13.89 | loss: 9.51669 | epsilon: 0.01
405 | 4020 episode | score: 14.08 | loss: 9.78605 | epsilon: 0.01
406 | 4030 episode | score: 13.66 | loss: 12.18577 | epsilon: 0.01
407 | 4040 episode | score: 13.18 | loss: 5.42101 | epsilon: 0.01
408 | 4050 episode | score: 12.86 | loss: 7.65784 | epsilon: 0.01
409 | 4060 episode | score: 12.72 | loss: 6.49174 | epsilon: 0.01
410 | 4070 episode | score: 12.51 | loss: 5.89485 | epsilon: 0.01
411 | 4080 episode | score: 13.01 | loss: 9.56154 | epsilon: 0.01
412 | 4090 episode | score: 12.53 | loss: 4.01989 | epsilon: 0.01
413 | 4100 episode | score: 12.17 | loss: 6.92747 | epsilon: 0.01
414 | 4110 episode | score: 12.99 | loss: 8.53241 | epsilon: 0.01
415 | 4120 episode | score: 12.85 | loss: 10.38810 | epsilon: 0.01
416 | 4130 episode | score: 13.48 | loss: 10.76646 | epsilon: 0.01
417 | 4140 episode | score: 13.00 | loss: 6.65141 | epsilon: 0.01
418 | 4150 episode | score: 12.71 | loss: 11.13456 | epsilon: 0.01
419 | 4160 episode | score: 12.36 | loss: 8.49248 | epsilon: 0.01
420 | 4170 episode | score: 13.35 | loss: 9.90515 | epsilon: 0.01
421 | 4180 episode | score: 12.97 | loss: 8.97677 | epsilon: 0.01
422 | 4190 episode | score: 12.56 | loss: 8.43617 | epsilon: 0.01
423 | 4200 episode | score: 12.14 | loss: 9.76145 | epsilon: 0.01
424 | 4210 episode | score: 13.05 | loss: 7.90451 | epsilon: 0.01
425 | 4220 episode | score: 13.12 | loss: 8.31651 | epsilon: 0.01
426 | 4230 episode | score: 12.83 | loss: 7.39902 | epsilon: 0.01
427 | 4240 episode | score: 12.49 | loss: 4.18256 | epsilon: 0.01
428 | 4250 episode | score: 12.36 | loss: 9.26329 | epsilon: 0.01
429 | 4260 episode | score: 12.03 | loss: 7.76012 | epsilon: 0.01
430 | 4270 episode | score: 12.15 | loss: 9.94203 | epsilon: 0.01
431 | 4280 episode | score: 11.87 | loss: 10.34129 | epsilon: 0.01
432 | 4290 episode | score: 11.74 | loss: 8.36905 | epsilon: 0.01
433 | 4300 episode | score: 11.44 | loss: 6.75636 | epsilon: 0.01
434 | 4310 episode | score: 11.15 | loss: 7.25203 | epsilon: 0.01
435 | 4320 episode | score: 10.90 | loss: 11.51711 | epsilon: 0.01
436 | 4330 episode | score: 11.25 | loss: 6.79873 | epsilon: 0.01
437 | 4340 episode | score: 11.18 | loss: 8.40619 | epsilon: 0.01
438 | 4350 episode | score: 11.49 | loss: 7.93133 | epsilon: 0.01
439 | 4360 episode | score: 11.51 | loss: 3.24878 | epsilon: 0.01
440 | 4370 episode | score: 11.87 | loss: 9.02317 | epsilon: 0.01
441 | 4380 episode | score: 11.68 | loss: 6.11406 | epsilon: 0.01
442 | 4390 episode | score: 11.57 | loss: 10.30894 | epsilon: 0.01
443 | 4400 episode | score: 11.47 | loss: 8.55706 | epsilon: 0.01
444 | 4410 episode | score: 11.16 | loss: 5.48418 | epsilon: 0.01
445 | 4420 episode | score: 10.92 | loss: 7.56599 | epsilon: 0.01
446 | 4430 episode | score: 10.87 | loss: 8.38144 | epsilon: 0.01
447 | 4440 episode | score: 10.64 | loss: 7.39350 | epsilon: 0.01
448 | 4450 episode | score: 10.49 | loss: 10.57561 | epsilon: 0.01
449 | 4460 episode | score: 10.33 | loss: 9.50152 | epsilon: 0.01
450 | 4470 episode | score: 11.47 | loss: 10.41020 | epsilon: 0.01
451 | 4480 episode | score: 11.20 | loss: 6.86953 | epsilon: 0.01
452 | 4490 episode | score: 11.00 | loss: 8.93858 | epsilon: 0.01
453 | 4500 episode | score: 10.81 | loss: 7.15512 | epsilon: 0.01
454 | 4510 episode | score: 10.72 | loss: 6.55015 | epsilon: 0.01
455 | 4520 episode | score: 10.76 | loss: 11.14064 | epsilon: 0.01
456 | 4530 episode | score: 11.60 | loss: 6.40759 | epsilon: 0.01
457 | 4540 episode | score: 11.65 | loss: 6.36440 | epsilon: 0.01
458 | 4550 episode | score: 11.42 | loss: 9.43788 | epsilon: 0.01
459 | 4560 episode | score: 12.11 | loss: 5.23221 | epsilon: 0.01
460 | 4570 episode | score: 11.99 | loss: 7.09730 | epsilon: 0.01
461 | 4580 episode | score: 11.70 | loss: 4.86378 | epsilon: 0.01
462 | 4590 episode | score: 11.38 | loss: 6.45714 | epsilon: 0.01
463 | 4600 episode | score: 11.07 | loss: 9.09950 | epsilon: 0.01
464 | 4610 episode | score: 12.26 | loss: 5.60530 | epsilon: 0.01
465 | 4620 episode | score: 11.87 | loss: 11.84906 | epsilon: 0.01
466 | 4630 episode | score: 11.75 | loss: 9.31334 | epsilon: 0.01
467 | 4640 episode | score: 12.19 | loss: 9.48375 | epsilon: 0.01
468 | 4650 episode | score: 11.92 | loss: 7.69349 | epsilon: 0.01
469 | 4660 episode | score: 11.93 | loss: 7.12739 | epsilon: 0.01
470 | 4670 episode | score: 12.61 | loss: 8.63105 | epsilon: 0.01
471 | 4680 episode | score: 12.30 | loss: 13.35111 | epsilon: 0.01
472 | 4690 episode | score: 12.02 | loss: 10.32689 | epsilon: 0.01
473 | 4700 episode | score: 11.71 | loss: 8.77026 | epsilon: 0.01
474 | 4710 episode | score: 11.43 | loss: 7.19243 | epsilon: 0.01
475 | 4720 episode | score: 11.27 | loss: 8.00877 | epsilon: 0.01
476 | 4730 episode | score: 11.03 | loss: 8.81402 | epsilon: 0.01
477 | 4740 episode | score: 10.76 | loss: 11.19737 | epsilon: 0.01
478 | 4750 episode | score: 10.59 | loss: 7.21122 | epsilon: 0.01
479 | 4760 episode | score: 10.39 | loss: 6.38997 | epsilon: 0.01
480 | 4770 episode | score: 10.22 | loss: 9.57317 | epsilon: 0.01
481 | 4780 episode | score: 10.06 | loss: 7.35299 | epsilon: 0.01
482 | 4790 episode | score: 9.93 | loss: 9.67430 | epsilon: 0.01
483 | 4800 episode | score: 9.76 | loss: 11.25577 | epsilon: 0.01
484 | 4810 episode | score: 9.65 | loss: 11.93548 | epsilon: 0.01
485 | 4820 episode | score: 9.67 | loss: 11.97796 | epsilon: 0.01
486 | 4830 episode | score: 9.63 | loss: 15.08483 | epsilon: 0.01
487 | 4840 episode | score: 9.53 | loss: 8.80377 | epsilon: 0.01
488 | 4850 episode | score: 9.47 | loss: 12.76648 | epsilon: 0.01
489 | 4860 episode | score: 9.45 | loss: 6.41104 | epsilon: 0.01
490 | 4870 episode | score: 9.38 | loss: 10.42523 | epsilon: 0.01
491 | 4880 episode | score: 9.29 | loss: 12.79074 | epsilon: 0.01
492 | 4890 episode | score: 9.19 | loss: 8.76899 | epsilon: 0.01
493 | 4900 episode | score: 9.12 | loss: 12.03445 | epsilon: 0.01
494 | 4910 episode | score: 9.12 | loss: 8.84027 | epsilon: 0.01
495 | 4920 episode | score: 9.06 | loss: 9.61884 | epsilon: 0.01
496 | 4930 episode | score: 9.09 | loss: 12.78619 | epsilon: 0.01
497 | 4940 episode | score: 9.07 | loss: 16.75804 | epsilon: 0.01
498 | 4950 episode | score: 9.00 | loss: 13.62132 | epsilon: 0.01
499 | 4960 episode | score: 8.98 | loss: 8.02409 | epsilon: 0.01
500 | 4970 episode | score: 8.97 | loss: 11.97446 | epsilon: 0.01
501 | 4980 episode | score: 8.87 | loss: 16.05861 | epsilon: 0.01
502 | 4990 episode | score: 8.82 | loss: 10.44726 | epsilon: 0.01
503 |
--------------------------------------------------------------------------------
/out/trace_DTQN_9.txt:
--------------------------------------------------------------------------------
1 | state size: 2
2 | action size: 2
3 | 0 episode | score: 21.00 | loss: 0.00000 | epsilon: 1.00
4 | 10 episode | score: 20.96 | loss: 0.00000 | epsilon: 1.00
5 | 20 episode | score: 20.74 | loss: 0.00000 | epsilon: 1.00
6 | 30 episode | score: 21.32 | loss: 0.00000 | epsilon: 1.00
7 | 40 episode | score: 21.30 | loss: 0.00000 | epsilon: 1.00
8 | 50 episode | score: 21.32 | loss: 0.12494 | epsilon: 0.99
9 | 60 episode | score: 21.28 | loss: 0.13065 | epsilon: 0.98
10 | 70 episode | score: 21.59 | loss: 0.26331 | epsilon: 0.97
11 | 80 episode | score: 21.54 | loss: 0.06410 | epsilon: 0.96
12 | 90 episode | score: 21.02 | loss: 0.17379 | epsilon: 0.95
13 | 100 episode | score: 20.89 | loss: 0.51764 | epsilon: 0.94
14 | 110 episode | score: 21.03 | loss: 0.54145 | epsilon: 0.93
15 | 120 episode | score: 20.77 | loss: 0.86984 | epsilon: 0.92
16 | 130 episode | score: 20.59 | loss: 0.20401 | epsilon: 0.91
17 | 140 episode | score: 21.17 | loss: 0.24127 | epsilon: 0.89
18 | 150 episode | score: 20.46 | loss: 0.39601 | epsilon: 0.89
19 | 160 episode | score: 20.47 | loss: 0.41627 | epsilon: 0.88
20 | 170 episode | score: 20.81 | loss: 0.66734 | epsilon: 0.86
21 | 180 episode | score: 20.67 | loss: 0.91335 | epsilon: 0.85
22 | 190 episode | score: 20.47 | loss: 1.18509 | epsilon: 0.84
23 | 200 episode | score: 20.98 | loss: 1.06382 | epsilon: 0.83
24 | 210 episode | score: 21.77 | loss: 0.59437 | epsilon: 0.81
25 | 220 episode | score: 21.55 | loss: 0.59339 | epsilon: 0.80
26 | 230 episode | score: 21.59 | loss: 1.17238 | epsilon: 0.79
27 | 240 episode | score: 21.35 | loss: 1.20873 | epsilon: 0.78
28 | 250 episode | score: 20.83 | loss: 0.05833 | epsilon: 0.77
29 | 260 episode | score: 20.76 | loss: 0.99572 | epsilon: 0.76
30 | 270 episode | score: 20.61 | loss: 1.66400 | epsilon: 0.75
31 | 280 episode | score: 21.10 | loss: 0.45072 | epsilon: 0.74
32 | 290 episode | score: 21.42 | loss: 0.72571 | epsilon: 0.73
33 | 300 episode | score: 20.94 | loss: 1.10599 | epsilon: 0.72
34 | 310 episode | score: 21.22 | loss: 0.40100 | epsilon: 0.71
35 | 320 episode | score: 21.19 | loss: 0.78178 | epsilon: 0.70
36 | 330 episode | score: 20.79 | loss: 2.69212 | epsilon: 0.69
37 | 340 episode | score: 20.73 | loss: 1.19997 | epsilon: 0.68
38 | 350 episode | score: 20.54 | loss: 0.42472 | epsilon: 0.67
39 | 360 episode | score: 20.33 | loss: 0.43453 | epsilon: 0.66
40 | 370 episode | score: 20.10 | loss: 0.85835 | epsilon: 0.65
41 | 380 episode | score: 20.22 | loss: 1.28385 | epsilon: 0.64
42 | 390 episode | score: 19.61 | loss: 0.47136 | epsilon: 0.63
43 | 400 episode | score: 19.93 | loss: 1.76091 | epsilon: 0.62
44 | 410 episode | score: 20.27 | loss: 2.30934 | epsilon: 0.60
45 | 420 episode | score: 19.88 | loss: 2.36865 | epsilon: 0.59
46 | 430 episode | score: 19.63 | loss: 1.84405 | epsilon: 0.59
47 | 440 episode | score: 19.30 | loss: 1.41989 | epsilon: 0.58
48 | 450 episode | score: 18.62 | loss: 0.48912 | epsilon: 0.57
49 | 460 episode | score: 18.05 | loss: 1.45394 | epsilon: 0.56
50 | 470 episode | score: 17.77 | loss: 1.01078 | epsilon: 0.56
51 | 480 episode | score: 17.36 | loss: 1.95200 | epsilon: 0.55
52 | 490 episode | score: 16.83 | loss: 1.95511 | epsilon: 0.54
53 | 500 episode | score: 16.72 | loss: 2.07467 | epsilon: 0.53
54 | 510 episode | score: 16.63 | loss: 2.51130 | epsilon: 0.53
55 | 520 episode | score: 16.57 | loss: 3.51992 | epsilon: 0.52
56 | 530 episode | score: 16.92 | loss: 4.61643 | epsilon: 0.51
57 | 540 episode | score: 16.41 | loss: 3.26759 | epsilon: 0.50
58 | 550 episode | score: 16.72 | loss: 4.22791 | epsilon: 0.49
59 | 560 episode | score: 17.05 | loss: 2.65086 | epsilon: 0.48
60 | 570 episode | score: 16.59 | loss: 3.17366 | epsilon: 0.47
61 | 580 episode | score: 16.34 | loss: 1.80297 | epsilon: 0.46
62 | 590 episode | score: 16.24 | loss: 1.09296 | epsilon: 0.46
63 | 600 episode | score: 15.85 | loss: 3.24726 | epsilon: 0.45
64 | 610 episode | score: 15.53 | loss: 4.36065 | epsilon: 0.44
65 | 620 episode | score: 15.64 | loss: 2.81037 | epsilon: 0.43
66 | 630 episode | score: 15.51 | loss: 2.22702 | epsilon: 0.43
67 | 640 episode | score: 15.46 | loss: 2.28287 | epsilon: 0.42
68 | 650 episode | score: 15.15 | loss: 2.27961 | epsilon: 0.41
69 | 660 episode | score: 15.35 | loss: 2.82823 | epsilon: 0.40
70 | 670 episode | score: 14.94 | loss: 3.45791 | epsilon: 0.40
71 | 680 episode | score: 14.55 | loss: 2.88071 | epsilon: 0.39
72 | 690 episode | score: 14.30 | loss: 2.88437 | epsilon: 0.38
73 | 700 episode | score: 14.20 | loss: 2.34074 | epsilon: 0.38
74 | 710 episode | score: 13.91 | loss: 4.06448 | epsilon: 0.37
75 | 720 episode | score: 13.84 | loss: 3.58162 | epsilon: 0.36
76 | 730 episode | score: 13.49 | loss: 1.78119 | epsilon: 0.36
77 | 740 episode | score: 13.53 | loss: 2.95448 | epsilon: 0.35
78 | 750 episode | score: 13.15 | loss: 2.96639 | epsilon: 0.35
79 | 760 episode | score: 13.43 | loss: 3.59416 | epsilon: 0.34
80 | 770 episode | score: 13.17 | loss: 1.20836 | epsilon: 0.33
81 | 780 episode | score: 13.01 | loss: 4.78889 | epsilon: 0.33
82 | 790 episode | score: 12.99 | loss: 4.84344 | epsilon: 0.32
83 | 800 episode | score: 12.78 | loss: 1.82502 | epsilon: 0.31
84 | 810 episode | score: 12.68 | loss: 6.67286 | epsilon: 0.31
85 | 820 episode | score: 12.46 | loss: 1.84013 | epsilon: 0.30
86 | 830 episode | score: 13.15 | loss: 5.51109 | epsilon: 0.29
87 | 840 episode | score: 12.85 | loss: 5.63518 | epsilon: 0.28
88 | 850 episode | score: 12.89 | loss: 3.74254 | epsilon: 0.28
89 | 860 episode | score: 12.65 | loss: 4.37036 | epsilon: 0.27
90 | 870 episode | score: 12.62 | loss: 3.14334 | epsilon: 0.27
91 | 880 episode | score: 12.55 | loss: 5.11216 | epsilon: 0.26
92 | 890 episode | score: 12.24 | loss: 4.41962 | epsilon: 0.25
93 | 900 episode | score: 12.23 | loss: 5.09165 | epsilon: 0.25
94 | 910 episode | score: 12.11 | loss: 5.07642 | epsilon: 0.24
95 | 920 episode | score: 11.94 | loss: 4.43827 | epsilon: 0.24
96 | 930 episode | score: 11.73 | loss: 6.97180 | epsilon: 0.23
97 | 940 episode | score: 11.53 | loss: 5.86489 | epsilon: 0.22
98 | 950 episode | score: 11.39 | loss: 5.19568 | epsilon: 0.22
99 | 960 episode | score: 11.24 | loss: 7.70984 | epsilon: 0.21
100 | 970 episode | score: 11.20 | loss: 5.80304 | epsilon: 0.21
101 | 980 episode | score: 11.66 | loss: 5.24795 | epsilon: 0.20
102 | 990 episode | score: 11.42 | loss: 5.90977 | epsilon: 0.19
103 | 1000 episode | score: 11.22 | loss: 3.26386 | epsilon: 0.19
104 | 1010 episode | score: 11.15 | loss: 7.18909 | epsilon: 0.18
105 | 1020 episode | score: 11.03 | loss: 5.88403 | epsilon: 0.18
106 | 1030 episode | score: 10.86 | loss: 5.28317 | epsilon: 0.17
107 | 1040 episode | score: 10.74 | loss: 7.23439 | epsilon: 0.17
108 | 1050 episode | score: 10.56 | loss: 7.89468 | epsilon: 0.16
109 | 1060 episode | score: 10.49 | loss: 7.96061 | epsilon: 0.16
110 | 1070 episode | score: 10.33 | loss: 6.62946 | epsilon: 0.15
111 | 1080 episode | score: 10.28 | loss: 7.93479 | epsilon: 0.15
112 | 1090 episode | score: 10.57 | loss: 8.63225 | epsilon: 0.14
113 | 1100 episode | score: 10.52 | loss: 6.12925 | epsilon: 0.13
114 | 1110 episode | score: 10.31 | loss: 4.05030 | epsilon: 0.13
115 | 1120 episode | score: 10.16 | loss: 6.83749 | epsilon: 0.12
116 | 1130 episode | score: 10.25 | loss: 6.74951 | epsilon: 0.12
117 | 1140 episode | score: 10.37 | loss: 6.06576 | epsilon: 0.11
118 | 1150 episode | score: 10.26 | loss: 6.73673 | epsilon: 0.11
119 | 1160 episode | score: 10.23 | loss: 6.74529 | epsilon: 0.10
120 | 1170 episode | score: 10.22 | loss: 8.10608 | epsilon: 0.10
121 | 1180 episode | score: 10.06 | loss: 8.78829 | epsilon: 0.09
122 | 1190 episode | score: 10.00 | loss: 8.12166 | epsilon: 0.09
123 | 1200 episode | score: 9.95 | loss: 8.14971 | epsilon: 0.08
124 | 1210 episode | score: 9.94 | loss: 7.47524 | epsilon: 0.08
125 | 1220 episode | score: 9.77 | loss: 8.87012 | epsilon: 0.07
126 | 1230 episode | score: 9.81 | loss: 4.11322 | epsilon: 0.07
127 | 1240 episode | score: 9.65 | loss: 9.55529 | epsilon: 0.06
128 | 1250 episode | score: 9.59 | loss: 7.53394 | epsilon: 0.06
129 | 1260 episode | score: 9.52 | loss: 5.47745 | epsilon: 0.05
130 | 1270 episode | score: 9.44 | loss: 8.96674 | epsilon: 0.05
131 | 1280 episode | score: 9.81 | loss: 9.69322 | epsilon: 0.04
132 | 1290 episode | score: 9.65 | loss: 4.15783 | epsilon: 0.03
133 | 1300 episode | score: 9.58 | loss: 7.06649 | epsilon: 0.03
134 | 1310 episode | score: 9.54 | loss: 7.65880 | epsilon: 0.02
135 | 1320 episode | score: 9.41 | loss: 7.74817 | epsilon: 0.02
136 | 1330 episode | score: 9.34 | loss: 7.74973 | epsilon: 0.01
137 | 1340 episode | score: 9.25 | loss: 8.37979 | epsilon: 0.01
138 | 1350 episode | score: 9.20 | loss: 7.64794 | epsilon: 0.01
139 | 1360 episode | score: 9.16 | loss: 13.24160 | epsilon: 0.01
140 | 1370 episode | score: 9.11 | loss: 9.75682 | epsilon: 0.01
141 | 1380 episode | score: 9.15 | loss: 7.03971 | epsilon: 0.01
142 | 1390 episode | score: 9.08 | loss: 7.71366 | epsilon: 0.01
143 | 1400 episode | score: 9.13 | loss: 11.21983 | epsilon: 0.01
144 | 1410 episode | score: 9.04 | loss: 11.23451 | epsilon: 0.01
145 | 1420 episode | score: 9.01 | loss: 6.33937 | epsilon: 0.01
146 | 1430 episode | score: 8.93 | loss: 7.11611 | epsilon: 0.01
147 | 1440 episode | score: 8.85 | loss: 12.66293 | epsilon: 0.01
148 | 1450 episode | score: 8.81 | loss: 10.55220 | epsilon: 0.01
149 | 1460 episode | score: 8.79 | loss: 9.18106 | epsilon: 0.01
150 | 1470 episode | score: 8.76 | loss: 11.31221 | epsilon: 0.01
151 | 1480 episode | score: 8.75 | loss: 8.49157 | epsilon: 0.01
152 | 1490 episode | score: 8.77 | loss: 7.81996 | epsilon: 0.01
153 | 1500 episode | score: 8.72 | loss: 12.06588 | epsilon: 0.01
154 | 1510 episode | score: 8.78 | loss: 9.32470 | epsilon: 0.01
155 | 1520 episode | score: 8.76 | loss: 12.09021 | epsilon: 0.01
156 | 1530 episode | score: 8.72 | loss: 7.13991 | epsilon: 0.01
157 | 1540 episode | score: 8.65 | loss: 11.45438 | epsilon: 0.01
158 | 1550 episode | score: 8.62 | loss: 8.64568 | epsilon: 0.01
159 | 1560 episode | score: 8.62 | loss: 7.22120 | epsilon: 0.01
160 | 1570 episode | score: 8.63 | loss: 5.76618 | epsilon: 0.01
161 | 1580 episode | score: 8.63 | loss: 10.17529 | epsilon: 0.01
162 | 1590 episode | score: 8.65 | loss: 7.94844 | epsilon: 0.01
163 | 1600 episode | score: 8.62 | loss: 9.36637 | epsilon: 0.01
164 | 1610 episode | score: 8.59 | loss: 10.79890 | epsilon: 0.01
165 | 1620 episode | score: 8.57 | loss: 8.65811 | epsilon: 0.01
166 | 1630 episode | score: 8.56 | loss: 10.83473 | epsilon: 0.01
167 | 1640 episode | score: 8.90 | loss: 10.93063 | epsilon: 0.01
168 | 1650 episode | score: 8.99 | loss: 13.74893 | epsilon: 0.01
169 | 1660 episode | score: 8.96 | loss: 8.70530 | epsilon: 0.01
170 | 1670 episode | score: 8.98 | loss: 9.46762 | epsilon: 0.01
171 | 1680 episode | score: 8.95 | loss: 7.37468 | epsilon: 0.01
172 | 1690 episode | score: 8.90 | loss: 3.67320 | epsilon: 0.01
173 | 1700 episode | score: 8.89 | loss: 10.23026 | epsilon: 0.01
174 | 1710 episode | score: 8.84 | loss: 11.70730 | epsilon: 0.01
175 | 1720 episode | score: 8.78 | loss: 5.85139 | epsilon: 0.01
176 | 1730 episode | score: 8.75 | loss: 11.75461 | epsilon: 0.01
177 | 1740 episode | score: 8.72 | loss: 9.58267 | epsilon: 0.01
178 | 1750 episode | score: 8.78 | loss: 10.97702 | epsilon: 0.01
179 | 1760 episode | score: 8.76 | loss: 10.28451 | epsilon: 0.01
180 | 1770 episode | score: 9.18 | loss: 13.91956 | epsilon: 0.01
181 | 1780 episode | score: 9.15 | loss: 8.86558 | epsilon: 0.01
182 | 1790 episode | score: 9.41 | loss: 10.27811 | epsilon: 0.01
183 | 1800 episode | score: 9.34 | loss: 5.93619 | epsilon: 0.01
184 | 1810 episode | score: 9.23 | loss: 8.81983 | epsilon: 0.01
185 | 1820 episode | score: 9.15 | loss: 8.83106 | epsilon: 0.01
186 | 1830 episode | score: 9.23 | loss: 11.10988 | epsilon: 0.01
187 | 1840 episode | score: 9.15 | loss: 8.85159 | epsilon: 0.01
188 | 1850 episode | score: 9.09 | loss: 12.58249 | epsilon: 0.01
189 | 1860 episode | score: 9.05 | loss: 9.61852 | epsilon: 0.01
190 | 1870 episode | score: 8.98 | loss: 8.87381 | epsilon: 0.01
191 | 1880 episode | score: 8.90 | loss: 10.43737 | epsilon: 0.01
192 | 1890 episode | score: 8.86 | loss: 11.14335 | epsilon: 0.01
193 | 1900 episode | score: 8.82 | loss: 11.12453 | epsilon: 0.01
194 | 1910 episode | score: 8.86 | loss: 10.44158 | epsilon: 0.01
195 | 1920 episode | score: 8.84 | loss: 12.64781 | epsilon: 0.01
196 | 1930 episode | score: 8.86 | loss: 11.15122 | epsilon: 0.01
197 | 1940 episode | score: 8.83 | loss: 11.15670 | epsilon: 0.01
198 | 1950 episode | score: 8.78 | loss: 8.95111 | epsilon: 0.01
199 | 1960 episode | score: 8.75 | loss: 10.43592 | epsilon: 0.01
200 | 1970 episode | score: 8.72 | loss: 7.49042 | epsilon: 0.01
201 | 1980 episode | score: 8.70 | loss: 12.69947 | epsilon: 0.01
202 | 1990 episode | score: 8.68 | loss: 14.24218 | epsilon: 0.01
203 | 2000 episode | score: 8.65 | loss: 9.06082 | epsilon: 0.01
204 | 2010 episode | score: 8.57 | loss: 9.78840 | epsilon: 0.01
205 | 2020 episode | score: 8.71 | loss: 11.30073 | epsilon: 0.01
206 | 2030 episode | score: 8.88 | loss: 12.03438 | epsilon: 0.01
207 | 2040 episode | score: 8.82 | loss: 8.31563 | epsilon: 0.01
208 | 2050 episode | score: 8.77 | loss: 13.61250 | epsilon: 0.01
209 | 2060 episode | score: 8.76 | loss: 10.57885 | epsilon: 0.01
210 | 2070 episode | score: 8.70 | loss: 7.57164 | epsilon: 0.01
211 | 2080 episode | score: 8.84 | loss: 11.41372 | epsilon: 0.01
212 | 2090 episode | score: 8.83 | loss: 9.85300 | epsilon: 0.01
213 | 2100 episode | score: 8.80 | loss: 9.90245 | epsilon: 0.01
214 | 2110 episode | score: 8.80 | loss: 9.11386 | epsilon: 0.01
215 | 2120 episode | score: 8.80 | loss: 11.46561 | epsilon: 0.01
216 | 2130 episode | score: 8.89 | loss: 9.88510 | epsilon: 0.01
217 | 2140 episode | score: 8.84 | loss: 11.45000 | epsilon: 0.01
218 | 2150 episode | score: 8.91 | loss: 9.92751 | epsilon: 0.01
219 | 2160 episode | score: 8.85 | loss: 12.21367 | epsilon: 0.01
220 | 2170 episode | score: 8.95 | loss: 13.72579 | epsilon: 0.01
221 | 2180 episode | score: 8.88 | loss: 11.49756 | epsilon: 0.01
222 | 2190 episode | score: 8.99 | loss: 9.18024 | epsilon: 0.01
223 | 2200 episode | score: 8.90 | loss: 9.31785 | epsilon: 0.01
224 | 2210 episode | score: 8.91 | loss: 9.93547 | epsilon: 0.01
225 | 2220 episode | score: 8.85 | loss: 9.92336 | epsilon: 0.01
226 | 2230 episode | score: 8.91 | loss: 7.66453 | epsilon: 0.01
227 | 2240 episode | score: 9.03 | loss: 8.46121 | epsilon: 0.01
228 | 2250 episode | score: 8.97 | loss: 13.07337 | epsilon: 0.01
229 | 2260 episode | score: 8.93 | loss: 9.23427 | epsilon: 0.01
230 | 2270 episode | score: 8.94 | loss: 9.31744 | epsilon: 0.01
231 | 2280 episode | score: 8.98 | loss: 9.97877 | epsilon: 0.01
232 | 2290 episode | score: 8.94 | loss: 8.58323 | epsilon: 0.01
233 | 2300 episode | score: 8.88 | loss: 9.21149 | epsilon: 0.01
234 | 2310 episode | score: 8.88 | loss: 12.25584 | epsilon: 0.01
235 | 2320 episode | score: 8.95 | loss: 9.25100 | epsilon: 0.01
236 | 2330 episode | score: 8.89 | loss: 6.96148 | epsilon: 0.01
237 | 2340 episode | score: 8.85 | loss: 10.03903 | epsilon: 0.01
238 | 2350 episode | score: 8.82 | loss: 10.77144 | epsilon: 0.01
239 | 2360 episode | score: 8.97 | loss: 8.43672 | epsilon: 0.01
240 | 2370 episode | score: 9.06 | loss: 13.85954 | epsilon: 0.01
241 | 2380 episode | score: 9.01 | loss: 9.24928 | epsilon: 0.01
242 | 2390 episode | score: 8.97 | loss: 12.36560 | epsilon: 0.01
243 | 2400 episode | score: 8.95 | loss: 6.94651 | epsilon: 0.01
244 | 2410 episode | score: 8.90 | loss: 12.32122 | epsilon: 0.01
245 | 2420 episode | score: 8.85 | loss: 12.33952 | epsilon: 0.01
246 | 2430 episode | score: 8.91 | loss: 9.31118 | epsilon: 0.01
247 | 2440 episode | score: 8.92 | loss: 10.07508 | epsilon: 0.01
248 | 2450 episode | score: 8.93 | loss: 9.35474 | epsilon: 0.01
249 | 2460 episode | score: 8.87 | loss: 10.87870 | epsilon: 0.01
250 | 2470 episode | score: 8.82 | loss: 8.50910 | epsilon: 0.01
251 | 2480 episode | score: 8.91 | loss: 10.12521 | epsilon: 0.01
252 | 2490 episode | score: 8.88 | loss: 11.73552 | epsilon: 0.01
253 | 2500 episode | score: 8.80 | loss: 8.57602 | epsilon: 0.01
254 | 2510 episode | score: 8.79 | loss: 10.88338 | epsilon: 0.01
255 | 2520 episode | score: 8.89 | loss: 8.54393 | epsilon: 0.01
256 | 2530 episode | score: 8.89 | loss: 10.91774 | epsilon: 0.01
257 | 2540 episode | score: 9.12 | loss: 10.86364 | epsilon: 0.01
258 | 2550 episode | score: 9.05 | loss: 15.57212 | epsilon: 0.01
259 | 2560 episode | score: 9.01 | loss: 7.00552 | epsilon: 0.01
260 | 2570 episode | score: 8.98 | loss: 9.33577 | epsilon: 0.01
261 | 2580 episode | score: 9.01 | loss: 9.40912 | epsilon: 0.01
262 | 2590 episode | score: 8.98 | loss: 8.62148 | epsilon: 0.01
263 | 2600 episode | score: 8.91 | loss: 7.81482 | epsilon: 0.01
264 | 2610 episode | score: 8.90 | loss: 10.24356 | epsilon: 0.01
265 | 2620 episode | score: 8.90 | loss: 7.01662 | epsilon: 0.01
266 | 2630 episode | score: 9.20 | loss: 7.82972 | epsilon: 0.01
267 | 2640 episode | score: 9.32 | loss: 13.31156 | epsilon: 0.01
268 | 2650 episode | score: 9.25 | loss: 13.25241 | epsilon: 0.01
269 | 2660 episode | score: 9.18 | loss: 9.42001 | epsilon: 0.01
270 | 2670 episode | score: 9.17 | loss: 6.25177 | epsilon: 0.01
271 | 2680 episode | score: 9.25 | loss: 13.35298 | epsilon: 0.01
272 | 2690 episode | score: 9.13 | loss: 11.79432 | epsilon: 0.01
273 | 2700 episode | score: 9.02 | loss: 13.34889 | epsilon: 0.01
274 | 2710 episode | score: 8.97 | loss: 7.13101 | epsilon: 0.01
275 | 2720 episode | score: 9.04 | loss: 8.60090 | epsilon: 0.01
276 | 2730 episode | score: 8.98 | loss: 7.03793 | epsilon: 0.01
277 | 2740 episode | score: 8.90 | loss: 10.19670 | epsilon: 0.01
278 | 2750 episode | score: 8.93 | loss: 10.17758 | epsilon: 0.01
279 | 2760 episode | score: 8.89 | loss: 10.98469 | epsilon: 0.01
280 | 2770 episode | score: 8.79 | loss: 6.37543 | epsilon: 0.01
281 | 2780 episode | score: 8.78 | loss: 9.44218 | epsilon: 0.01
282 | 2790 episode | score: 8.72 | loss: 8.68750 | epsilon: 0.01
283 | 2800 episode | score: 8.67 | loss: 11.84710 | epsilon: 0.01
284 | 2810 episode | score: 8.63 | loss: 10.20570 | epsilon: 0.01
285 | 2820 episode | score: 8.63 | loss: 12.58235 | epsilon: 0.01
286 | 2830 episode | score: 8.69 | loss: 11.04398 | epsilon: 0.01
287 | 2840 episode | score: 8.67 | loss: 10.31757 | epsilon: 0.01
288 | 2850 episode | score: 8.63 | loss: 10.31371 | epsilon: 0.01
289 | 2860 episode | score: 8.77 | loss: 14.91368 | epsilon: 0.01
290 | 2870 episode | score: 8.82 | loss: 15.00642 | epsilon: 0.01
291 | 2880 episode | score: 8.80 | loss: 12.59427 | epsilon: 0.01
292 | 2890 episode | score: 8.82 | loss: 13.39069 | epsilon: 0.01
293 | 2900 episode | score: 8.84 | loss: 11.82799 | epsilon: 0.01
294 | 2910 episode | score: 9.20 | loss: 9.44360 | epsilon: 0.01
295 | 2920 episode | score: 9.71 | loss: 8.69737 | epsilon: 0.01
296 | 2930 episode | score: 9.61 | loss: 6.31859 | epsilon: 0.01
297 | 2940 episode | score: 9.48 | loss: 9.55086 | epsilon: 0.01
298 | 2950 episode | score: 9.41 | loss: 9.44469 | epsilon: 0.01
299 | 2960 episode | score: 9.28 | loss: 7.95107 | epsilon: 0.01
300 | 2970 episode | score: 9.26 | loss: 7.17244 | epsilon: 0.01
301 | 2980 episode | score: 9.21 | loss: 7.11789 | epsilon: 0.01
302 | 2990 episode | score: 9.15 | loss: 7.16511 | epsilon: 0.01
303 | 3000 episode | score: 9.10 | loss: 13.47608 | epsilon: 0.01
304 | 3010 episode | score: 9.15 | loss: 7.94981 | epsilon: 0.01
305 | 3020 episode | score: 9.10 | loss: 11.87295 | epsilon: 0.01
306 | 3030 episode | score: 9.04 | loss: 10.43319 | epsilon: 0.01
307 | 3040 episode | score: 9.11 | loss: 11.96408 | epsilon: 0.01
308 | 3050 episode | score: 9.03 | loss: 10.34738 | epsilon: 0.01
309 | 3060 episode | score: 8.99 | loss: 11.88029 | epsilon: 0.01
310 | 3070 episode | score: 8.93 | loss: 10.37521 | epsilon: 0.01
311 | 3080 episode | score: 8.84 | loss: 11.11475 | epsilon: 0.01
312 | 3090 episode | score: 8.80 | loss: 10.34103 | epsilon: 0.01
313 | 3100 episode | score: 8.89 | loss: 12.71417 | epsilon: 0.01
314 | 3110 episode | score: 8.85 | loss: 9.57880 | epsilon: 0.01
315 | 3120 episode | score: 8.78 | loss: 11.87391 | epsilon: 0.01
316 | 3130 episode | score: 8.81 | loss: 11.11053 | epsilon: 0.01
317 | 3140 episode | score: 9.23 | loss: 7.96314 | epsilon: 0.01
318 | 3150 episode | score: 9.42 | loss: 14.31875 | epsilon: 0.01
319 | 3160 episode | score: 9.36 | loss: 9.55417 | epsilon: 0.01
320 | 3170 episode | score: 9.36 | loss: 12.69406 | epsilon: 0.01
321 | 3180 episode | score: 9.31 | loss: 12.02454 | epsilon: 0.01
322 | 3190 episode | score: 9.29 | loss: 12.67994 | epsilon: 0.01
323 | 3200 episode | score: 9.23 | loss: 10.64494 | epsilon: 0.01
324 | 3210 episode | score: 9.14 | loss: 10.39057 | epsilon: 0.01
325 | 3220 episode | score: 9.08 | loss: 8.77181 | epsilon: 0.01
326 | 3230 episode | score: 9.00 | loss: 8.75515 | epsilon: 0.01
327 | 3240 episode | score: 8.94 | loss: 9.55666 | epsilon: 0.01
328 | 3250 episode | score: 8.92 | loss: 11.69119 | epsilon: 0.01
329 | 3260 episode | score: 8.85 | loss: 18.40750 | epsilon: 0.01
330 | 3270 episode | score: 8.79 | loss: 23.38196 | epsilon: 0.01
331 | 3280 episode | score: 8.83 | loss: 6.67032 | epsilon: 0.01
332 | 3290 episode | score: 9.01 | loss: 11.30644 | epsilon: 0.01
333 | 3300 episode | score: 8.98 | loss: 9.37187 | epsilon: 0.01
334 | 3310 episode | score: 8.98 | loss: 9.69363 | epsilon: 0.01
335 | 3320 episode | score: 9.07 | loss: 8.18816 | epsilon: 0.01
336 | 3330 episode | score: 9.09 | loss: 8.10379 | epsilon: 0.01
337 | 3340 episode | score: 9.08 | loss: 8.61894 | epsilon: 0.01
338 | 3350 episode | score: 9.00 | loss: 11.35091 | epsilon: 0.01
339 | 3360 episode | score: 8.93 | loss: 10.41170 | epsilon: 0.01
340 | 3370 episode | score: 8.90 | loss: 14.24506 | epsilon: 0.01
341 | 3380 episode | score: 9.33 | loss: 12.81649 | epsilon: 0.01
342 | 3390 episode | score: 9.27 | loss: 7.33425 | epsilon: 0.01
343 | 3400 episode | score: 9.75 | loss: 12.60556 | epsilon: 0.01
344 | 3410 episode | score: 9.63 | loss: 12.47419 | epsilon: 0.01
345 | 3420 episode | score: 9.98 | loss: 13.25487 | epsilon: 0.01
346 | 3430 episode | score: 9.89 | loss: 5.03355 | epsilon: 0.01
347 | 3440 episode | score: 9.73 | loss: 8.11193 | epsilon: 0.01
348 | 3450 episode | score: 9.60 | loss: 8.43487 | epsilon: 0.01
349 | 3460 episode | score: 9.51 | loss: 13.05702 | epsilon: 0.01
350 | 3470 episode | score: 9.48 | loss: 3.30770 | epsilon: 0.01
351 | 3480 episode | score: 9.73 | loss: 9.20855 | epsilon: 0.01
352 | 3490 episode | score: 9.64 | loss: 8.84441 | epsilon: 0.01
353 | 3500 episode | score: 9.56 | loss: 12.18956 | epsilon: 0.01
354 | 3510 episode | score: 9.77 | loss: 7.42808 | epsilon: 0.01
355 | 3520 episode | score: 9.67 | loss: 8.75190 | epsilon: 0.01
356 | 3530 episode | score: 9.52 | loss: 9.91304 | epsilon: 0.01
357 | 3540 episode | score: 9.42 | loss: 7.19323 | epsilon: 0.01
358 | 3550 episode | score: 9.86 | loss: 9.61124 | epsilon: 0.01
359 | 3560 episode | score: 11.48 | loss: 9.93398 | epsilon: 0.01
360 | 3570 episode | score: 11.56 | loss: 9.95154 | epsilon: 0.01
361 | 3580 episode | score: 12.09 | loss: 5.97820 | epsilon: 0.01
362 | 3590 episode | score: 11.88 | loss: 11.61210 | epsilon: 0.01
363 | 3600 episode | score: 12.62 | loss: 5.66679 | epsilon: 0.01
364 | 3610 episode | score: 12.25 | loss: 4.34159 | epsilon: 0.01
365 | 3620 episode | score: 13.68 | loss: 6.75365 | epsilon: 0.01
366 | 3630 episode | score: 15.21 | loss: 5.62228 | epsilon: 0.01
367 | 3640 episode | score: 14.70 | loss: 2.85260 | epsilon: 0.01
368 | 3650 episode | score: 14.35 | loss: 8.27749 | epsilon: 0.01
369 | 3660 episode | score: 13.93 | loss: 4.46703 | epsilon: 0.01
370 | 3670 episode | score: 13.44 | loss: 4.22112 | epsilon: 0.01
371 | 3680 episode | score: 13.32 | loss: 6.51251 | epsilon: 0.01
372 | 3690 episode | score: 13.15 | loss: 3.76290 | epsilon: 0.01
373 | 3700 episode | score: 13.54 | loss: 4.41752 | epsilon: 0.01
374 | 3710 episode | score: 13.34 | loss: 4.64118 | epsilon: 0.01
375 | 3720 episode | score: 13.35 | loss: 5.92149 | epsilon: 0.01
376 | 3730 episode | score: 13.07 | loss: 6.23527 | epsilon: 0.01
377 | 3740 episode | score: 12.62 | loss: 5.54841 | epsilon: 0.01
378 | 3750 episode | score: 13.63 | loss: 10.53116 | epsilon: 0.01
379 | 3760 episode | score: 14.01 | loss: 3.18782 | epsilon: 0.01
380 | 3770 episode | score: 14.20 | loss: 8.02784 | epsilon: 0.01
381 | 3780 episode | score: 15.00 | loss: 2.93599 | epsilon: 0.01
382 | 3790 episode | score: 14.71 | loss: 4.80426 | epsilon: 0.01
383 | 3800 episode | score: 15.71 | loss: 7.91253 | epsilon: 0.01
384 | 3810 episode | score: 15.91 | loss: 4.73834 | epsilon: 0.01
385 | 3820 episode | score: 15.61 | loss: 0.79304 | epsilon: 0.01
386 | 3830 episode | score: 15.49 | loss: 4.78924 | epsilon: 0.01
387 | 3840 episode | score: 15.02 | loss: 4.90685 | epsilon: 0.01
388 | 3850 episode | score: 14.68 | loss: 2.36603 | epsilon: 0.01
389 | 3860 episode | score: 14.17 | loss: 6.88053 | epsilon: 0.01
390 | 3870 episode | score: 13.97 | loss: 5.15767 | epsilon: 0.01
391 | 3880 episode | score: 14.36 | loss: 4.82018 | epsilon: 0.01
392 | 3890 episode | score: 14.01 | loss: 2.58375 | epsilon: 0.01
393 | 3900 episode | score: 15.21 | loss: 4.39568 | epsilon: 0.01
394 | 3910 episode | score: 15.64 | loss: 4.94824 | epsilon: 0.01
395 | 3920 episode | score: 15.34 | loss: 5.53893 | epsilon: 0.01
396 | 3930 episode | score: 15.41 | loss: 6.22380 | epsilon: 0.01
397 | 3940 episode | score: 15.19 | loss: 6.26805 | epsilon: 0.01
398 | 3950 episode | score: 14.76 | loss: 4.59807 | epsilon: 0.01
399 | 3960 episode | score: 14.71 | loss: 7.60362 | epsilon: 0.01
400 | 3970 episode | score: 14.59 | loss: 6.93035 | epsilon: 0.01
401 | 3980 episode | score: 15.08 | loss: 6.01341 | epsilon: 0.01
402 | 3990 episode | score: 14.44 | loss: 5.23150 | epsilon: 0.01
403 | 4000 episode | score: 14.43 | loss: 3.90165 | epsilon: 0.01
404 | 4010 episode | score: 14.15 | loss: 4.73526 | epsilon: 0.01
405 | 4020 episode | score: 13.75 | loss: 6.11941 | epsilon: 0.01
406 | 4030 episode | score: 13.47 | loss: 7.32031 | epsilon: 0.01
407 | 4040 episode | score: 13.28 | loss: 3.62526 | epsilon: 0.01
408 | 4050 episode | score: 14.35 | loss: 1.43349 | epsilon: 0.01
409 | 4060 episode | score: 15.05 | loss: 3.68072 | epsilon: 0.01
410 | 4070 episode | score: 14.74 | loss: 4.02725 | epsilon: 0.01
411 | 4080 episode | score: 14.60 | loss: 6.00835 | epsilon: 0.01
412 | 4090 episode | score: 15.40 | loss: 4.19405 | epsilon: 0.01
413 | 4100 episode | score: 16.11 | loss: 2.98184 | epsilon: 0.01
414 | 4110 episode | score: 16.46 | loss: 3.63669 | epsilon: 0.01
415 | 4120 episode | score: 16.70 | loss: 2.74856 | epsilon: 0.01
416 | 4130 episode | score: 16.46 | loss: 3.80108 | epsilon: 0.01
417 | 4140 episode | score: 16.42 | loss: 5.00285 | epsilon: 0.01
418 | 4150 episode | score: 15.81 | loss: 1.60470 | epsilon: 0.01
419 | 4160 episode | score: 15.18 | loss: 4.54753 | epsilon: 0.01
420 | 4170 episode | score: 14.66 | loss: 3.38707 | epsilon: 0.01
421 | 4180 episode | score: 14.33 | loss: 2.86823 | epsilon: 0.01
422 | 4190 episode | score: 13.96 | loss: 3.27965 | epsilon: 0.01
423 | 4200 episode | score: 13.44 | loss: 6.32444 | epsilon: 0.01
424 | 4210 episode | score: 13.13 | loss: 7.85530 | epsilon: 0.01
425 | 4220 episode | score: 13.60 | loss: 7.71556 | epsilon: 0.01
426 | 4230 episode | score: 13.40 | loss: 8.20314 | epsilon: 0.01
427 | 4240 episode | score: 13.91 | loss: 0.95849 | epsilon: 0.01
428 | 4250 episode | score: 13.88 | loss: 8.55000 | epsilon: 0.01
429 | 4260 episode | score: 13.95 | loss: 3.74746 | epsilon: 0.01
430 | 4270 episode | score: 13.61 | loss: 9.64572 | epsilon: 0.01
431 | 4280 episode | score: 14.63 | loss: 4.42514 | epsilon: 0.01
432 | 4290 episode | score: 14.96 | loss: 6.65158 | epsilon: 0.01
433 | 4300 episode | score: 14.47 | loss: 4.77864 | epsilon: 0.01
434 | 4310 episode | score: 14.14 | loss: 4.92499 | epsilon: 0.01
435 | 4320 episode | score: 13.82 | loss: 7.58319 | epsilon: 0.01
436 | 4330 episode | score: 13.53 | loss: 6.55989 | epsilon: 0.01
437 | 4340 episode | score: 13.27 | loss: 6.31598 | epsilon: 0.01
438 | 4350 episode | score: 12.89 | loss: 7.07195 | epsilon: 0.01
439 | 4360 episode | score: 12.64 | loss: 11.24029 | epsilon: 0.01
440 | 4370 episode | score: 12.29 | loss: 8.43339 | epsilon: 0.01
441 | 4380 episode | score: 11.98 | loss: 9.08022 | epsilon: 0.01
442 | 4390 episode | score: 11.71 | loss: 9.21856 | epsilon: 0.01
443 | 4400 episode | score: 11.38 | loss: 13.00905 | epsilon: 0.01
444 | 4410 episode | score: 11.13 | loss: 6.91477 | epsilon: 0.01
445 | 4420 episode | score: 10.97 | loss: 9.24840 | epsilon: 0.01
446 | 4430 episode | score: 10.79 | loss: 5.93028 | epsilon: 0.01
447 | 4440 episode | score: 10.67 | loss: 11.75588 | epsilon: 0.01
448 | 4450 episode | score: 10.62 | loss: 7.53685 | epsilon: 0.01
449 | 4460 episode | score: 10.42 | loss: 8.77642 | epsilon: 0.01
450 | 4470 episode | score: 10.38 | loss: 12.00876 | epsilon: 0.01
451 | 4480 episode | score: 10.28 | loss: 12.04044 | epsilon: 0.01
452 | 4490 episode | score: 10.33 | loss: 10.63362 | epsilon: 0.01
453 | 4500 episode | score: 10.17 | loss: 12.11068 | epsilon: 0.01
454 | 4510 episode | score: 10.04 | loss: 9.83064 | epsilon: 0.01
455 | 4520 episode | score: 9.96 | loss: 12.98160 | epsilon: 0.01
456 | 4530 episode | score: 9.78 | loss: 9.32460 | epsilon: 0.01
457 | 4540 episode | score: 9.67 | loss: 11.56910 | epsilon: 0.01
458 | 4550 episode | score: 9.70 | loss: 13.24871 | epsilon: 0.01
459 | 4560 episode | score: 9.53 | loss: 10.25375 | epsilon: 0.01
460 | 4570 episode | score: 9.51 | loss: 11.35277 | epsilon: 0.01
461 | 4580 episode | score: 9.49 | loss: 10.24683 | epsilon: 0.01
462 | 4590 episode | score: 9.61 | loss: 15.10229 | epsilon: 0.01
463 | 4600 episode | score: 9.56 | loss: 10.60676 | epsilon: 0.01
464 | 4610 episode | score: 9.63 | loss: 12.66099 | epsilon: 0.01
465 | 4620 episode | score: 10.00 | loss: 9.28884 | epsilon: 0.01
466 | 4630 episode | score: 9.89 | loss: 7.29501 | epsilon: 0.01
467 | 4640 episode | score: 10.42 | loss: 8.61101 | epsilon: 0.01
468 | 4650 episode | score: 10.34 | loss: 7.17378 | epsilon: 0.01
469 | 4660 episode | score: 10.53 | loss: 8.58301 | epsilon: 0.01
470 | 4670 episode | score: 10.51 | loss: 5.47748 | epsilon: 0.01
471 | 4680 episode | score: 10.47 | loss: 6.90692 | epsilon: 0.01
472 | 4690 episode | score: 10.29 | loss: 9.03390 | epsilon: 0.01
473 | 4700 episode | score: 10.20 | loss: 5.70795 | epsilon: 0.01
474 | 4710 episode | score: 10.72 | loss: 9.54066 | epsilon: 0.01
475 | 4720 episode | score: 10.60 | loss: 12.63967 | epsilon: 0.01
476 | 4730 episode | score: 10.43 | loss: 4.03638 | epsilon: 0.01
477 | 4740 episode | score: 10.49 | loss: 6.36343 | epsilon: 0.01
478 | 4750 episode | score: 10.51 | loss: 7.95718 | epsilon: 0.01
479 | 4760 episode | score: 10.43 | loss: 7.94708 | epsilon: 0.01
480 | 4770 episode | score: 10.52 | loss: 4.00741 | epsilon: 0.01
481 | 4780 episode | score: 10.51 | loss: 9.49057 | epsilon: 0.01
482 | 4790 episode | score: 10.81 | loss: 7.92099 | epsilon: 0.01
483 | 4800 episode | score: 10.74 | loss: 3.20152 | epsilon: 0.01
484 | 4810 episode | score: 10.62 | loss: 5.55500 | epsilon: 0.01
485 | 4820 episode | score: 11.10 | loss: 4.79100 | epsilon: 0.01
486 | 4830 episode | score: 11.42 | loss: 7.12332 | epsilon: 0.01
487 | 4840 episode | score: 11.13 | loss: 4.76933 | epsilon: 0.01
488 | 4850 episode | score: 12.03 | loss: 7.91687 | epsilon: 0.01
489 | 4860 episode | score: 11.75 | loss: 4.77076 | epsilon: 0.01
490 | 4870 episode | score: 11.48 | loss: 7.90673 | epsilon: 0.01
491 | 4880 episode | score: 11.24 | loss: 8.69827 | epsilon: 0.01
492 | 4890 episode | score: 11.08 | loss: 7.91009 | epsilon: 0.01
493 | 4900 episode | score: 10.85 | loss: 6.32833 | epsilon: 0.01
494 | 4910 episode | score: 10.73 | loss: 7.97584 | epsilon: 0.01
495 | 4920 episode | score: 10.65 | loss: 7.13840 | epsilon: 0.01
496 | 4930 episode | score: 10.61 | loss: 7.24520 | epsilon: 0.01
497 | 4940 episode | score: 10.45 | loss: 11.09426 | epsilon: 0.01
498 | 4950 episode | score: 10.63 | loss: 8.69860 | epsilon: 0.01
499 | 4960 episode | score: 10.46 | loss: 9.52120 | epsilon: 0.01
500 | 4970 episode | score: 10.31 | loss: 8.69969 | epsilon: 0.01
501 | 4980 episode | score: 10.18 | loss: 7.11379 | epsilon: 0.01
502 | 4990 episode | score: 10.52 | loss: 11.04724 | epsilon: 0.01
503 |
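Every logged line in these traces has the form `<episode> episode | score: <s> | loss: <l> | epsilon: <e>`, printed every `log_interval` (10) episodes. Non-matching lines such as `state size: 2` are mixed in.

The sketch below parses such a file. It assumes exactly that line format. The names `TraceRecord`, `parse_trace_line` and `parse_trace` are hypothetical helpers, not part of the repository; the repo's own plotting lives in `plot_graphs.ipynb`.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches e.g. "2600 episode | score: 8.91 | loss: 7.81482 | epsilon: 0.01"
_TRACE_RE = re.compile(
    r"^\s*(\d+) episode \| score: ([-\d.]+) \| loss: ([-\d.]+) \| epsilon: ([-\d.]+)\s*$"
)


class TraceRecord(NamedTuple):
    episode: int
    score: float
    loss: float
    epsilon: float


def parse_trace_line(line: str) -> Optional[TraceRecord]:
    """Return a TraceRecord, or None for other lines such as 'state size: 2'."""
    m = _TRACE_RE.match(line)
    if m is None:
        return None
    ep, score, loss, eps = m.groups()
    return TraceRecord(int(ep), float(score), float(loss), float(eps))


def parse_trace(lines: Iterable[str]) -> List[TraceRecord]:
    """Parse every matching line of a trace, skipping everything else."""
    records = []
    for line in lines:
        rec = parse_trace_line(line)
        if rec is not None:
            records.append(rec)
    return records
```

Usage, assuming the working directory is `src/` as in `bash_gen_trace.sh`: `with open("../out/trace_DQN_1.txt") as f: records = parse_trace(f)`.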
--------------------------------------------------------------------------------
/src/.ipynb_checkpoints/plot_graphs-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [],
3 | "metadata": {},
4 | "nbformat": 4,
5 | "nbformat_minor": 2
6 | }
7 |
--------------------------------------------------------------------------------
/src/__pycache__/config_DQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/config_DQN.cpython-37.pyc
--------------------------------------------------------------------------------
/src/__pycache__/config_DRQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/config_DRQN.cpython-37.pyc
--------------------------------------------------------------------------------
/src/__pycache__/config_DTQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/config_DTQN.cpython-37.pyc
--------------------------------------------------------------------------------
/src/__pycache__/memory.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/memory.cpython-37.pyc
--------------------------------------------------------------------------------
/src/__pycache__/model_DQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/model_DQN.cpython-37.pyc
--------------------------------------------------------------------------------
/src/__pycache__/model_DRQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/model_DRQN.cpython-37.pyc
--------------------------------------------------------------------------------
/src/__pycache__/model_DTQN.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udion/Transformer-RL/2cdc29f6c31375f69d1ca1e2ce74b01300b94044/src/__pycache__/model_DTQN.cpython-37.pyc
--------------------------------------------------------------------------------
/src/bash_gen_trace.sh:
--------------------------------------------------------------------------------
1 | for i in {1..10}
2 | do
3 | echo "training DQN trace $i ..."
4 | python train_DQN.py > ../out/trace_DQN_"$i".txt
5 | done
6 |
7 | for i in {1..10}
8 | do
9 | echo "training DRQN trace $i ..."
10 | python train_DRQN.py > ../out/trace_DRQN_"$i".txt
11 | done
12 |
13 | for i in {1..10}
14 | do
15 | echo "training DTQN trace $i ..."
16 | python train_DTQN.py > ../out/trace_DTQN_"$i".txt
17 | done
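The script trains each model ten times and redirects stdout to `../out/trace_<MODEL>_<i>.txt`. Comparing models then means averaging the ten runs at each logged episode.

The sketch below assumes each run has already been reduced to a list of `(episode, score)` pairs. `mean_std_by_episode` is a hypothetical helper. Keeping only episodes that every run logged is also an assumption, made so that each mean uses the same number of runs even if some runs stop early.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def mean_std_by_episode(
    runs: Sequence[Sequence[Tuple[int, float]]],
) -> List[Tuple[int, float, float]]:
    """Average scores across runs at each logged episode.

    Returns (episode, mean, population std) tuples sorted by episode,
    restricted to episodes present in every run.
    """
    if not runs:
        return []
    per_run: List[Dict[int, float]] = [dict(run) for run in runs]
    common = set(per_run[0])
    for d in per_run[1:]:
        common &= set(d)  # keep episodes logged by every run
    out = []
    for ep in sorted(common):
        values = [d[ep] for d in per_run]
        out.append((ep, mean(values), pstdev(values)))
    return out
```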
--------------------------------------------------------------------------------
/src/config_DQN.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | env_name = 'CartPole-v1'
4 | gamma = 0.99
5 | batch_size = 32
6 | lr = 0.0001
7 | initial_exploration = 1000
8 | goal_score = 200
9 | log_interval = 10
10 | update_target = 100
11 | replay_memory_capacity = 1000
12 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
13 |
14 | sequence_length = 4
--------------------------------------------------------------------------------
/src/config_DRQN.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | env_name = 'CartPole-v1'
4 | gamma = 0.99
5 | batch_size = 32
6 | lr = 0.001
7 | initial_exploration = 1000
8 | goal_score = 200
9 | log_interval = 10
10 | update_target = 100
11 | replay_memory_capacity = 100
12 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
13 |
14 | sequence_length = 8
15 | burn_in_length = 4
--------------------------------------------------------------------------------
/src/config_DTQN.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | env_name = 'CartPole-v1'
4 | gamma = 0.99
5 | batch_size = 32
6 | lr = 0.001
7 | initial_exploration = 1000
8 | goal_score = 200
9 | log_interval = 10
10 | update_target = 100
11 | replay_memory_capacity = 100
12 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
13 |
14 | sequence_length = 8
15 | burn_in_length = 4
--------------------------------------------------------------------------------
/src/memory.py:
--------------------------------------------------------------------------------
1 | import random
2 | from collections import namedtuple, deque
3 | from config_DQN import sequence_length as sequence_length_DQN
4 | from config_DRQN import sequence_length as sequence_length_DRQN
5 | from config_DTQN import sequence_length as sequence_length_DTQN
6 | import numpy as np
7 | import torch
8 |
9 | Transition = namedtuple(
10 | 'Transition', ('state', 'next_state', 'action', 'reward', 'mask')
11 | )
12 |
13 | class Memory_DQN(object):
14 | def __init__(self, capacity):
15 | self.memory = deque(maxlen=capacity)
16 | self.capacity = capacity
17 |
18 | def push(self, state, next_state, action, reward, mask):
19 | self.memory.append(Transition(torch.stack(list(state)), torch.stack(list(next_state)), action, reward, mask))
20 |
21 | def sample(self, batch_size):
22 | transitions = random.sample(self.memory, batch_size)
23 | batch = Transition(*zip(*transitions))
24 | return batch
25 |
26 | def __len__(self):
27 | return len(self.memory)
28 |
29 | class Memory_DRQN(object):
30 | def __init__(self, capacity):
31 | self.memory = deque(maxlen=capacity)
32 | self.local_memory = []
33 | self.capacity = capacity
34 |
35 | def push(self, state, next_state, action, reward, mask):
36 | self.local_memory.append(Transition(state, next_state, action, reward, mask))
37 | if mask == 0:
38 | while len(self.local_memory) < sequence_length_DRQN:
39 | self.local_memory.insert(0, Transition(
40 | torch.Tensor([0, 0]),
41 | torch.Tensor([0, 0]),
42 | 0,
43 | 0,
44 | 0,
45 | ))
46 | self.memory.append(self.local_memory)
47 | self.local_memory = []
48 |
49 | def sample(self, batch_size):
50 | batch_state, batch_next_state, batch_action, batch_reward, batch_mask = [], [], [], [], []
51 | p = np.array([len(episode) for episode in self.memory])
52 | p = p / p.sum()
53 |
54 | batch_indexes = np.random.choice(np.arange(len(self.memory)), batch_size, p=p)
55 |
56 | for batch_idx in batch_indexes:
57 | episode = self.memory[batch_idx]
58 |
59 | start = random.randint(0, len(episode) - sequence_length_DRQN)
60 | transitions = episode[start:start + sequence_length_DRQN]
61 | batch = Transition(*zip(*transitions))
62 |
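63 | # stack each field over time; states are moved to CPU first because padding transitions are created on CPU while real states may live on device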
64 | batch_state.append(torch.stack( list(map(lambda s: s.to('cpu'), list(batch.state))) ))
65 | batch_next_state.append(torch.stack( list(map(lambda s: s.to('cpu'), list(batch.next_state))) ))
66 | batch_action.append(torch.Tensor(list(batch.action) ))
67 | batch_reward.append(torch.Tensor(list(batch.reward)))
68 | batch_mask.append(torch.Tensor(list(batch.mask)))
69 |
70 | return Transition(batch_state, batch_next_state, batch_action, batch_reward, batch_mask)
71 |
72 | def __len__(self):
73 | return len(self.memory)
74 |
75 | class Memory_DTQN(object):
76 | def __init__(self, capacity):
77 | self.memory = deque(maxlen=capacity)
78 | self.local_memory = []
79 | self.capacity = capacity
80 |
81 | def push(self, state, next_state, action, reward, mask):
82 | self.local_memory.append(Transition(state, next_state, action, reward, mask))
83 | if mask == 0:
84 | while len(self.local_memory) < sequence_length_DTQN:
85 | self.local_memory.insert(0, Transition(
86 | torch.Tensor([0, 0]),
87 | torch.Tensor([0, 0]),
88 | 0,
89 | 0,
90 | 0,
91 | ))
92 | self.memory.append(self.local_memory)
93 | self.local_memory = []
94 |
95 | def sample(self, batch_size):
96 | batch_state, batch_next_state, batch_action, batch_reward, batch_mask = [], [], [], [], []
97 | p = np.array([len(episode) for episode in self.memory])
98 | p = p / p.sum()
99 |
100 | batch_indexes = np.random.choice(np.arange(len(self.memory)), batch_size, p=p)
101 |
102 | for batch_idx in batch_indexes:
103 | episode = self.memory[batch_idx]
104 |
105 | start = random.randint(0, len(episode) - sequence_length_DTQN)
106 | transitions = episode[start:start + sequence_length_DTQN]
107 | batch = Transition(*zip(*transitions))
108 |
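109 | # stack each field over time; states are moved to CPU first because padding transitions are created on CPU while real states may live on device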
110 | batch_state.append(torch.stack( list(map(lambda s: s.to('cpu'), list(batch.state))) ))
111 | batch_next_state.append(torch.stack( list(map(lambda s: s.to('cpu'), list(batch.next_state))) ))
112 | batch_action.append(torch.Tensor(list(batch.action) ))
113 | batch_reward.append(torch.Tensor(list(batch.reward)))
114 | batch_mask.append(torch.Tensor(list(batch.mask)))
115 |
116 | return Transition(batch_state, batch_next_state, batch_action, batch_reward, batch_mask)
117 |
118 | def __len__(self):
119 | return len(self.memory)
120 |
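`Memory_DRQN` and `Memory_DTQN` store whole episodes:

- Transitions accumulate in `local_memory`.
- When `mask == 0`, an episode shorter than `sequence_length` is left-padded with zero transitions and then stored.
- `sample` picks episodes with probability proportional to their length, then cuts a random contiguous window from each.

The sketch below is a dependency-free version of the same idea using plain values instead of tensors. The class name `EpisodeWindowBuffer` and its `pad` argument are illustrative, not part of the repository. It also swaps `random.choices` in for `np.random.choice`; both sample with replacement.

```python
import random
from collections import deque
from typing import Any, List


class EpisodeWindowBuffer:
    """Stores whole episodes and samples fixed-length windows from them."""

    def __init__(self, capacity: int, sequence_length: int, pad: Any):
        self.episodes: deque = deque(maxlen=capacity)  # oldest episodes are evicted first
        self.local: List[Any] = []                     # transitions of the current episode
        self.sequence_length = sequence_length
        self.pad = pad                                 # placeholder used for left padding

    def push(self, transition: Any, done: bool) -> None:
        self.local.append(transition)
        if done:
            # Left-pad short episodes so every stored episode yields a full window.
            missing = self.sequence_length - len(self.local)
            if missing > 0:
                self.local = [self.pad] * missing + self.local
            self.episodes.append(self.local)
            self.local = []

    def sample(self, batch_size: int) -> List[List[Any]]:
        # Longer episodes are chosen more often (probability proportional to length).
        weights = [len(ep) for ep in self.episodes]
        chosen = random.choices(list(self.episodes), weights=weights, k=batch_size)
        windows = []
        for ep in chosen:
            start = random.randint(0, len(ep) - self.sequence_length)
            windows.append(ep[start:start + self.sequence_length])
        return windows

    def __len__(self) -> int:
        return len(self.episodes)
```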
--------------------------------------------------------------------------------
/src/model_DQN.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | from config_DQN import gamma, sequence_length, device
5 | # torch.manual_seed(0)
6 |
7 | class QNet(nn.Module):
8 | def __init__(self, num_inputs, num_outputs):
9 | super(QNet, self).__init__()
10 | self.num_inputs = num_inputs
11 | self.num_outputs = num_outputs
12 |
13 | self.fc1 = nn.Linear(num_inputs * sequence_length, 128)
14 | self.fc2 = nn.Linear(128, num_outputs)
15 |
16 | for m in self.modules():
17 | if isinstance(m, nn.Linear):
18 | nn.init.xavier_uniform_(m.weight)
19 |
20 | def forward(self, x):
21 | # x: [batch, seq_len, num_inputs]
22 | seq_length = x.size(1)
23 | if seq_length != sequence_length:
24 | x = torch.cat([x[:, :1]] * (sequence_length - seq_length) + [x], dim=1)
25 | # left-pad shorter histories by repeating their first frame -> [batch, sequence_length, num_inputs]
26 | x = x.view(-1, self.num_inputs * sequence_length)
27 | # x: [batch, num_inputs * sequence_length] (frames concatenated)
28 | x = F.relu(self.fc1(x))
29 | # x: [batch, 128]
30 | qvalue = self.fc2(x)
31 | return qvalue
32 |
33 | @classmethod
34 | def train_model(cls, online_net, target_net, optimizer, batch):
35 | states = torch.stack(batch.state).to(device)
36 | next_states = torch.stack(batch.next_state).to(device)
37 | actions = torch.Tensor(batch.action).float().to(device)
38 | rewards = torch.Tensor(batch.reward).to(device)
39 | masks = torch.Tensor(batch.mask).to(device)
40 |
41 | pred = online_net(states)
42 | next_pred = target_net(next_states)
43 |
44 | pred = torch.sum(pred.mul(actions), dim=1)
45 |
46 | target = rewards + masks * gamma * next_pred.max(1)[0]
47 |
48 | loss = F.l1_loss(pred, target.detach())
49 | optimizer.zero_grad()
50 | loss.backward()
51 | optimizer.step()
52 |
53 | return loss
54 |
55 | def get_action(self, input):
56 | qvalue = self.forward(input.unsqueeze(0))  # add batch dim: [seq, num_inputs] -> [1, seq, num_inputs]
57 | _, action = torch.max(qvalue, 1)
58 | return action.cpu().numpy()[0]
59 |
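`QNet.train_model` works in two steps:

1. It picks the Q-value of the taken action by multiplying the predictions with the one-hot action vector and summing over actions.
2. It regresses that value, with an L1 loss, onto the one-step target `r + mask * gamma * max_a' Q_target(s', a')`.

`mask` is 0 on the terminal step, where `train_DQN.py` also sets the reward to -1.

The NumPy sketch below reproduces that arithmetic on a toy batch with made-up Q-values. The function name `dqn_l1_loss` is hypothetical.

```python
import numpy as np


def dqn_l1_loss(pred, next_pred, actions_one_hot, rewards, masks, gamma=0.99):
    """Mean absolute TD error, following the arithmetic in QNet.train_model.

    pred, next_pred: [batch, num_actions] Q-values from the online / target nets.
    actions_one_hot: [batch, num_actions]; rewards, masks: [batch].
    """
    q_taken = np.sum(pred * actions_one_hot, axis=1)          # Q(s, a) of the taken action
    target = rewards + masks * gamma * next_pred.max(axis=1)  # no bootstrapping when mask == 0
    return np.mean(np.abs(q_taken - target)), q_taken, target


loss, q_taken, target = dqn_l1_loss(
    pred=np.array([[1.0, 2.0], [0.5, 0.0]]),
    next_pred=np.array([[3.0, 1.0], [4.0, 2.0]]),
    actions_one_hot=np.array([[0.0, 1.0], [1.0, 0.0]]),
    rewards=np.array([1.0, -1.0]),
    masks=np.array([1.0, 0.0]),  # second transition is terminal
)
```

Here `q_taken` is `[2.0, 0.5]`. `target` is `[1 + 0.99 * 3.0, -1.0] = [3.97, -1.0]`, so the L1 loss is `(1.97 + 1.5) / 2 = 1.735`.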
--------------------------------------------------------------------------------
/src/model_DRQN.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 | from config_DRQN import gamma, device, batch_size, sequence_length, burn_in_length
6 | # torch.manual_seed(0)
7 |
8 | class DRQN(nn.Module):
9 | def __init__(self, num_inputs, num_outputs):
10 | super(DRQN, self).__init__()
11 | self.num_inputs = num_inputs
12 | self.num_outputs = num_outputs
13 |
14 | self.lstm = nn.LSTM(input_size=num_inputs, hidden_size=128, batch_first=True)
15 | self.fc1 = nn.Linear(128, 256)
16 | self.fc2 = nn.Linear(256, num_outputs)
17 |
18 | for m in self.modules():
19 | if isinstance(m, nn.Linear):
20 | nn.init.xavier_uniform_(m.weight)
21 |
22 | def forward(self, x, hidden=None):
23 | # x [batch_size, sequence_length, num_inputs]
24 |
25 | if hidden is not None:
26 | out, hidden = self.lstm(x, hidden)
27 | # continue from the given (h, c) state, as in get_action
28 | else:
29 | out, hidden = self.lstm(x)
30 | # no state given: the LSTM starts from zeros, as in train_model
31 | out = F.relu(self.fc1(out))
32 | qvalue = self.fc2(out)
33 |
34 | return qvalue, hidden
35 |
36 |
37 | @classmethod
38 | def train_model(cls, online_net, target_net, optimizer, batch):
39 | def slice_burn_in(item):
40 | return item[:, burn_in_length:, :]
41 | states = torch.stack(batch.state).view(batch_size, sequence_length, online_net.num_inputs).to(device)
42 | next_states = torch.stack(batch.next_state).view(batch_size, sequence_length, online_net.num_inputs).to(device)
43 | actions = torch.stack(batch.action).view(batch_size, sequence_length, -1).long().to(device)
44 | rewards = torch.stack(batch.reward).view(batch_size, sequence_length, -1).to(device)
45 | masks = torch.stack(batch.mask).view(batch_size, sequence_length, -1).to(device)
46 |
47 | pred, _ = online_net(states)
48 | next_pred, _ = target_net(next_states)
49 |
50 | pred = slice_burn_in(pred)
51 | next_pred = slice_burn_in(next_pred)
52 | actions = slice_burn_in(actions)
53 | rewards = slice_burn_in(rewards)
54 | masks = slice_burn_in(masks)
55 |
56 | pred = pred.gather(2, actions)
57 | # one-step target r + gamma * max_a' Q_target(s', a'); mask = 0 cuts bootstrapping at episode end
58 | target = rewards + masks * gamma * next_pred.max(2, keepdim=True)[0]
59 |
60 | loss = F.l1_loss(pred, target.detach())
61 | optimizer.zero_grad()
62 | loss.backward()
63 | optimizer.step()
64 |
65 | return loss
66 |
67 | def get_action(self, state, hidden):
68 | state = state.unsqueeze(0).unsqueeze(0)
69 |
70 | qvalue, hidden = self.forward(state, hidden)
71 |
72 | _, action = torch.max(qvalue, 2)
73 |
74 | return action.cpu().numpy()[0][0], hidden
75 |
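With `sequence_length = 8` and `burn_in_length = 4`, `train_model` runs the LSTM over all 8 steps from a zero hidden state. Only the last 4 steps enter the loss; the first 4 only warm up the hidden state.

The NumPy sketch below shows the shape bookkeeping using random values. `slice_burn_in` mirrors the helper inside `train_model`, and `np.take_along_axis` stands in for `torch.Tensor.gather` along the action axis.

```python
import numpy as np

batch_size, sequence_length, burn_in_length, num_actions = 32, 8, 4, 2
gamma = 0.99


def slice_burn_in(item):
    # Drop the first burn_in_length time steps: [B, T, X] -> [B, T - burn_in, X]
    return item[:, burn_in_length:, :]


rng = np.random.default_rng(0)
pred = rng.normal(size=(batch_size, sequence_length, num_actions))       # online Q(s_t, .)
next_pred = rng.normal(size=(batch_size, sequence_length, num_actions))  # target Q(s_{t+1}, .)
actions = rng.integers(0, num_actions, size=(batch_size, sequence_length, 1))
rewards = np.ones((batch_size, sequence_length, 1))
masks = np.ones((batch_size, sequence_length, 1))

pred, next_pred = slice_burn_in(pred), slice_burn_in(next_pred)
actions, rewards, masks = slice_burn_in(actions), slice_burn_in(rewards), slice_burn_in(masks)

q_taken = np.take_along_axis(pred, actions, axis=2)  # like pred.gather(2, actions)
target = rewards + masks * gamma * next_pred.max(axis=2, keepdims=True)
print(q_taken.shape, target.shape)  # (32, 4, 1) (32, 4, 1)
```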
--------------------------------------------------------------------------------
/src/model_DTQN.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 | from config_DTQN import gamma, device, batch_size, sequence_length, burn_in_length
6 | # torch.manual_seed(0)
7 |
8 | class DTQN(nn.Module):
9 | def __init__(self, num_inputs, num_outputs):
10 | super(DTQN, self).__init__()
11 | self.num_inputs = num_inputs
12 | self.num_outputs = num_outputs
13 |
14 | self.fc = nn.Linear(num_inputs, 64)  # embed each observation to d_model = 64
15 | self.Tlayer = nn.TransformerEncoderLayer(d_model=64, nhead=2)
16 | self.transformerE = nn.TransformerEncoder(self.Tlayer, num_layers=3)
17 |
18 | self.fc1 = nn.Linear(64, 32)
19 | self.fc2 = nn.Linear(32, num_outputs)
20 |
21 | for m in self.modules():
22 | if isinstance(m, nn.Linear):
23 | nn.init.xavier_uniform_(m.weight)
24 |
25 | def forward(self, x, hidden=None):
26 | x = x.transpose(0,1)  # [batch, seq, in] -> [seq, batch, in]: the encoder expects sequence-first input by default
27 | x = self.fc(x)
28 | out = self.transformerE(x)
29 | out = out.transpose(0,1)
30 | out = F.relu(self.fc1(out))
31 | qvalue = self.fc2(out)
32 |
33 | return qvalue, hidden
34 |
35 |
36 | @classmethod
37 | def train_model(cls, online_net, target_net, optimizer, batch):
38 | def slice_burn_in(item):
39 | return item[:, burn_in_length:, :]
40 | states = torch.stack(batch.state).view(batch_size, sequence_length, online_net.num_inputs).to(device)
41 | next_states = torch.stack(batch.next_state).view(batch_size, sequence_length, online_net.num_inputs).to(device)
42 | actions = torch.stack(batch.action).view(batch_size, sequence_length, -1).long().to(device)
43 | rewards = torch.stack(batch.reward).view(batch_size, sequence_length, -1).to(device)
44 | masks = torch.stack(batch.mask).view(batch_size, sequence_length, -1).to(device)
45 |
46 | pred, _ = online_net(states)
47 | next_pred, _ = target_net(next_states)
48 |
49 | pred = slice_burn_in(pred)
50 | next_pred = slice_burn_in(next_pred)
51 | actions = slice_burn_in(actions)
52 | rewards = slice_burn_in(rewards)
53 | masks = slice_burn_in(masks)
54 |
55 | pred = pred.gather(2, actions)
56 |
57 | target = rewards + masks * gamma * next_pred.max(2, keepdim=True)[0]
58 |
59 | loss = F.l1_loss(pred, target.detach())
60 | optimizer.zero_grad()
61 | loss.backward()
62 | optimizer.step()
63 |
64 | return loss
65 |
66 | def get_action(self, state, hidden):
67 | state = state.unsqueeze(0).unsqueeze(0)
68 |
69 | qvalue, hidden = self.forward(state, hidden)
70 |
71 | _, action = torch.max(qvalue, 2)
72 |
73 | return action.cpu().numpy()[0][0], hidden
74 |
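As written, `DTQN.forward` passes the linear embeddings into `nn.TransformerEncoder` without positional information or an attention mask. This has three consequences:

- Self-attention is permutation-equivariant, so the encoder cannot tell the order of observations within a window.
- During training, each position can attend to later observations in the same window.
- `get_action` feeds only the current observation, a length-1 sequence, so acting sees no history at all.

A common remedy, shown here only as an assumption and not as part of this repository, is to add sinusoidal positional encodings and pass a causal mask. The NumPy sketch below builds both arrays. `sinusoidal_positional_encoding` and `causal_mask` are hypothetical helpers, and an even `d_model` is assumed.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    positions = np.arange(seq_len)[:, None]   # [seq_len, 1]
    dims = np.arange(0, d_model, 2)[None, :]  # [1, d_model / 2], the values 2i
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def causal_mask(seq_len: int) -> np.ndarray:
    """Additive mask: 0 where position i may attend to j (j <= i), -inf for future positions."""
    mask = np.zeros((seq_len, seq_len))
    mask[np.triu_indices(seq_len, k=1)] = -np.inf
    return mask


pe = sinusoidal_positional_encoding(seq_len=8, d_model=64)  # would be added as pe[:, None, :] to [seq, batch, 64]
mask = causal_mask(8)
```

In PyTorch, such a float mask would be converted to a tensor and passed as `mask=` to `nn.TransformerEncoder.forward`. It has the same layout that `nn.Transformer.generate_square_subsequent_mask` produces.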
--------------------------------------------------------------------------------
/src/train_DQN.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import gym
4 | import random
5 | import numpy as np
6 |
7 | import torch
8 | import torch.optim as optim
9 | import torch.nn.functional as F
10 | from model_DQN import QNet
11 | from memory import Memory_DQN as Memory
12 | from tensorboardX import SummaryWriter
13 |
14 | from config_DQN import env_name, initial_exploration, batch_size, update_target, goal_score, log_interval, device, replay_memory_capacity, lr, sequence_length
15 | from collections import deque
16 |
17 | # torch.manual_seed(0)
18 | # random.seed(0)
19 | # np.random.seed(0)
20 |
21 | def get_action(state_series, target_net, epsilon, env):
22 | if np.random.rand() <= epsilon or len(state_series) < sequence_length:
23 | return env.action_space.sample()
24 | else:
25 | return target_net.get_action(torch.stack(list(state_series)))
26 |
27 | def update_target_model(online_net, target_net):
28 | # Target <- Net
29 | target_net.load_state_dict(online_net.state_dict())
30 |
31 | def state_to_partial_observability(state):
32 | state = state[[0, 2]]
33 | return state
34 |
35 | def main():
36 | env = gym.make(env_name)
37 | env.seed(500)
38 | torch.manual_seed(500)
39 |
40 | num_inputs = 2
41 | num_actions = env.action_space.n
42 | print('state size:', num_inputs)
43 | print('action size:', num_actions)
44 |
45 | online_net = QNet(num_inputs, num_actions)
46 | target_net = QNet(num_inputs, num_actions)
47 | update_target_model(online_net, target_net)
48 |
49 | optimizer = optim.Adam(online_net.parameters(), lr=lr)
50 | N_EPISODES = 5000
51 | writer = SummaryWriter('logs')
52 |
53 | online_net.to(device)
54 | target_net.to(device)
55 | online_net.train()
56 | target_net.train()
57 | memory = Memory(replay_memory_capacity)
58 | running_score = 0
59 | epsilon = 1.0
60 | steps = 0
61 | loss = 0
62 |
63 | for e in range(N_EPISODES):
64 | done = False
65 |
66 | state_series = deque(maxlen=sequence_length)
67 | next_state_series = deque(maxlen=sequence_length)
68 | score = 0
69 | state = env.reset()
70 |
71 | state = state_to_partial_observability(state)
72 | state = torch.Tensor(state).to(device)
73 |
74 | next_state_series.append(state)
75 | while not done:
76 | steps += 1
77 | state_series.append(state)
78 | action = get_action(state_series, target_net, epsilon, env)
79 | next_state, reward, done, _ = env.step(action)
80 |
81 | next_state = state_to_partial_observability(next_state)
82 | next_state = torch.Tensor(next_state).to(device)
83 |
84 |
85 | mask = 0 if done else 1
86 | reward = reward if not done or score == 499 else -1
88 | action_one_hot = np.zeros(num_actions)
88 | action_one_hot[action] = 1
89 | if len(state_series) >= sequence_length:
90 | memory.push(state_series.copy(), next_state_series.copy(), action_one_hot, reward, mask)
91 |
92 | score += reward
93 | state = next_state
94 |
95 | if steps > initial_exploration:
96 | epsilon -= 0.000005
97 | epsilon = max(epsilon, 0.01)
98 |
99 | batch = memory.sample(batch_size)
100 | loss = QNet.train_model(online_net, target_net, optimizer, batch)
101 |
102 | if steps % update_target == 0:
103 | update_target_model(online_net, target_net)
104 |
105 | score = score if score == 500.0 else score + 1
106 | if running_score == 0:
107 | running_score = score
108 | else:
109 | running_score = 0.99 * running_score + 0.01 * score
110 | if e % log_interval == 0:
111 | print('{} episode | score: {:.2f} | loss: {:.5f} | epsilon: {:.2f}'.format(
112 | e, running_score, loss, epsilon))
113 | writer.add_scalar('log/score', float(running_score), e)
114 | writer.add_scalar('log/loss', float(loss), e)
115 |
116 | if running_score > goal_score:
117 | break
118 |
119 | if __name__=="__main__":
120 | main()
--------------------------------------------------------------------------------
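In `train_DQN.py`, the feed-forward `QNet` sees the last `sequence_length` partial observations stacked together. Each stored transition should pair that window with the same window shifted one step forward. The sketch below reproduces the two `deque` windows with integers standing in for observation tensors. It assumes two things: the next-state window is extended with `next_state` on every step, and the replay memory needs its own snapshot of each window. `windowed_transitions` is a hypothetical helper, not part of the repository.

```python
from collections import deque
from typing import List, Sequence, Tuple


def windowed_transitions(
    observations: Sequence[int], sequence_length: int
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: (state window, next-state window) pairs.

    observations[0] is the reset observation and observations[t + 1] is the
    observation returned by env.step at step t.
    """
    state_series = deque(maxlen=sequence_length)
    next_state_series = deque(maxlen=sequence_length)
    next_state_series.append(observations[0])

    pairs = []
    for t in range(len(observations) - 1):
        state_series.append(observations[t])
        next_state_series.append(observations[t + 1])
        if len(state_series) >= sequence_length:
            # Snapshot both windows: the deques keep changing on later steps.
            pairs.append((list(state_series), list(next_state_series)))
    return pairs


print(windowed_transitions([0, 1, 2, 3, 4], sequence_length=3))
# [([0, 1, 2], [1, 2, 3]), ([1, 2, 3], [2, 3, 4])]
```

Seeding `next_state_series` with the reset observation keeps it exactly one step ahead of `state_series`. With `maxlen` set, the oldest entry falls out automatically.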
/src/train_DRQN.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import gym
4 | import random
5 | import numpy as np
6 |
7 | import torch
8 | import torch.optim as optim
9 | import torch.nn.functional as F
10 | from model_DRQN import DRQN
11 | from memory import Memory_DRQN as Memory
12 | from tensorboardX import SummaryWriter
13 |
14 | from config_DRQN import env_name, initial_exploration, batch_size, update_target, goal_score, log_interval, device, replay_memory_capacity, lr, sequence_length
15 |
16 | from collections import deque
17 | # torch.manual_seed(0)
18 | # random.seed(0)
19 | # np.random.seed(0)
20 |
21 | def get_action(state, target_net, epsilon, env, hidden):
22 | action, hidden = target_net.get_action(state, hidden)
23 | if np.random.rand() <= epsilon:
24 | return env.action_space.sample(), hidden
25 | else:
26 | return action, hidden
27 |
28 | def update_target_model(online_net, target_net):
29 | # Target <- Net
30 | target_net.load_state_dict(online_net.state_dict())
31 |
32 | def state_to_partial_observability(state):
33 | # print(state)
34 | state = state[[0, 2]]
35 | # print(state)
36 | return state
37 |
38 | def main():
39 | env = gym.make(env_name)
40 | env.seed(500)
41 | torch.manual_seed(500)
42 |
43 | # num_inputs = env.observation_space.shape[0]
44 | num_inputs = 2
45 | num_actions = env.action_space.n
46 | print('state size:', num_inputs)
47 | print('action size:', num_actions)
48 |
49 | online_net = DRQN(num_inputs, num_actions)
50 | target_net = DRQN(num_inputs, num_actions)
51 | update_target_model(online_net, target_net)
52 |
53 | optimizer = optim.Adam(online_net.parameters(), lr=lr)
54 | N_EPISODES = 5000
55 | # scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, N_EPISODES)
56 | writer = SummaryWriter('logs')
57 |
58 | online_net.to(device)
59 | target_net.to(device)
60 | online_net.train()
61 | target_net.train()
62 | memory = Memory(replay_memory_capacity)
63 | running_score = 0
64 | epsilon = 1.0
65 | steps = 0
66 | loss = 0
67 |
68 | for e in range(N_EPISODES):
69 | done = False
70 |
71 | score = 0
72 | state = env.reset()
73 | state = state_to_partial_observability(state)
74 | state = torch.Tensor(state).to(device)
75 |
76 | hidden = None
77 |
78 | while not done:
79 | steps += 1
80 |
81 | # print(state.type(), hidden)
82 | action, hidden = get_action(state, target_net, epsilon, env, hidden)
83 | next_state, reward, done, _ = env.step(action)
84 |
85 | next_state = state_to_partial_observability(next_state)
86 | next_state = torch.Tensor(next_state).to(device)
87 |
88 | mask = 0 if done else 1
89 | reward = reward if not done or score == 499 else -1
90 |
91 | memory.push(state, next_state, action, reward, mask)
92 |
93 | score += reward
94 | state = next_state
95 |
96 |
97 | if steps > initial_exploration and len(memory) > batch_size:
98 | epsilon -= 0.00005
99 | epsilon = max(epsilon, 0.01)
100 |
101 | batch = memory.sample(batch_size)
102 | loss = DRQN.train_model(online_net, target_net, optimizer, batch)
103 |
104 | if steps % update_target == 0:
105 | update_target_model(online_net, target_net)
106 | # scheduler.step()
107 |
108 | score = score if score == 500.0 else score + 1
109 | if running_score == 0:
110 | running_score = score
111 | else:
112 | running_score = 0.99 * running_score + 0.01 * score
113 | if e % log_interval == 0:
114 | print('{} episode | score: {:.2f} | loss: {:.5f} | epsilon: {:.2f}'.format(
115 | e, running_score, loss, epsilon))
116 | writer.add_scalar('log/score', float(running_score), e)
117 | writer.add_scalar('log/loss', float(loss), e)
118 |
119 | if running_score > goal_score:
120 | break
121 |
122 | if __name__=="__main__":
123 | main()
124 |
--------------------------------------------------------------------------------
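`get_action` in `train_DRQN.py` (and its copy in `train_DTQN.py`) queries the network before deciding whether to explore. That ordering matters for a recurrent agent: the returned `hidden` has to absorb every observation of the episode, including the ones on which a random action is taken. The sketch below isolates that pattern. `epsilon_greedy_recurrent` and `toy_policy_step` are hypothetical names. The latter stands in for `DRQN.get_action`, and its "hidden state" is just an observation counter.

```python
import random
from typing import Callable, Optional, Tuple

# A policy step maps (observation, hidden) -> (greedy action, new hidden).
PolicyStep = Callable[[float, Optional[int]], Tuple[int, int]]


def epsilon_greedy_recurrent(
    policy_step: PolicyStep,
    state: float,
    hidden: Optional[int],
    epsilon: float,
    num_actions: int,
    rng: random.Random,
) -> Tuple[int, int]:
    # Always run the model first so its memory absorbs this observation,
    # even if the action that is finally returned is random.
    greedy_action, hidden = policy_step(state, hidden)
    if rng.random() <= epsilon:
        return rng.randrange(num_actions), hidden
    return greedy_action, hidden


def toy_policy_step(state: float, hidden: Optional[int]) -> Tuple[int, int]:
    """Hypothetical stand-in for DRQN.get_action: hidden counts observations."""
    count = 0 if hidden is None else hidden
    return (1 if state > 0 else 0), count + 1


rng = random.Random(0)
hidden = None
for obs in [0.3, -0.2, 0.5]:
    action, hidden = epsilon_greedy_recurrent(toy_policy_step, obs, hidden, 1.0, 2, rng)
print(hidden)  # 3: every observation reached the memory despite epsilon = 1.0
```

If the random branch returned before calling the model, as the feed-forward `get_action` in `train_DQN.py` can afford to, the recurrent state would silently skip those observations.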
/src/train_DTQN.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import gym
4 | import random
5 | import numpy as np
6 |
7 | import torch
8 | import torch.optim as optim
9 | import torch.nn.functional as F
10 | from model_DTQN import DTQN
11 | from memory import Memory_DTQN as Memory
12 | from tensorboardX import SummaryWriter
13 |
14 | from config_DTQN import env_name, initial_exploration, batch_size, update_target, goal_score, log_interval, device, replay_memory_capacity, lr, sequence_length
15 |
16 | from collections import deque
17 | # torch.manual_seed(0)
18 | # random.seed(0)
19 | # np.random.seed(0)
20 |
21 | def get_action(state, target_net, epsilon, env, hidden):
22 | action, hidden = target_net.get_action(state, hidden)
23 | if np.random.rand() <= epsilon:
24 | return env.action_space.sample(), hidden
25 | else:
26 | return action, hidden
27 |
28 | def update_target_model(online_net, target_net):
29 | # Target <- Net
30 | target_net.load_state_dict(online_net.state_dict())
31 |
32 | def state_to_partial_observability(state):
33 | # print(state)
34 | state = state[[0, 2]]
35 | # print(state)
36 | return state
37 |
38 | def main():
39 | env = gym.make(env_name)
40 | env.seed(500)
41 | torch.manual_seed(500)
42 |
43 | # num_inputs = env.observation_space.shape[0]
44 | num_inputs = 2
45 | num_actions = env.action_space.n
46 | print('state size:', num_inputs)
47 | print('action size:', num_actions)
48 |
49 | online_net = DTQN(num_inputs, num_actions)
50 | target_net = DTQN(num_inputs, num_actions)
51 | update_target_model(online_net, target_net)
52 |
53 | optimizer = optim.Adam(online_net.parameters(), lr=lr)
54 | N_EPISODES = 5000
55 | writer = SummaryWriter('logs')
56 |
57 | online_net.to(device)
58 | target_net.to(device)
59 | online_net.train()
60 | target_net.train()
61 | memory = Memory(replay_memory_capacity)
62 | running_score = 0
63 | epsilon = 1.0
64 | steps = 0
65 | loss = 0
66 |
67 | for e in range(N_EPISODES):
68 | done = False
69 |
70 | score = 0
71 | state = env.reset()
72 | state = state_to_partial_observability(state)
73 | state = torch.Tensor(state).to(device)
74 |
75 | hidden = None
76 |
77 | while not done:
78 | steps += 1
79 |
80 | action, hidden = get_action(state, target_net, epsilon, env, hidden)
81 | next_state, reward, done, _ = env.step(action)
82 |
83 | next_state = state_to_partial_observability(next_state)
84 | next_state = torch.Tensor(next_state).to(device)
85 |
86 | mask = 0 if done else 1
87 | reward = reward if not done or score == 499 else -1
88 |
89 | memory.push(state, next_state, action, reward, mask)
90 |
91 | score += reward
92 | state = next_state
93 |
94 |
95 | if steps > initial_exploration and len(memory) > batch_size:
96 | epsilon -= 0.00005
97 | epsilon = max(epsilon, 0.01)
98 |
99 | batch = memory.sample(batch_size)
100 | loss = DTQN.train_model(online_net, target_net, optimizer, batch)
101 |
102 | if steps % update_target == 0:
103 | update_target_model(online_net, target_net)
104 |
105 | score = score if score == 500.0 else score + 1
106 | if running_score == 0:
107 | running_score = score
108 | else:
109 | running_score = 0.99 * running_score + 0.01 * score
110 | if e % log_interval == 0:
111 | print('{} episode | score: {:.2f} | loss: {:.5f} | epsilon: {:.2f}'.format(
112 | e, running_score, loss, epsilon))
113 | writer.add_scalar('log/score', float(running_score), e)
114 | writer.add_scalar('log/loss', float(loss), e)
115 |
116 | if running_score > goal_score:
117 | break
118 |
119 |
120 | if __name__=="__main__":
121 | main()
122 |
--------------------------------------------------------------------------------
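All three training scripts share the same bookkeeping. After `initial_exploration` steps, epsilon falls by a fixed amount per training step down to 0.01: by 0.000005 in `train_DQN.py` and by 0.00005 in `train_DRQN.py` and `train_DTQN.py`. Training stops once the exponential moving average of episode scores exceeds `goal_score`. The sketch below restates both rules as small functions with hypothetical names. It also works out how many training steps each decay rate needs to reach the floor.

```python
def epsilon_after(train_steps: int, decay: float, floor: float = 0.01) -> float:
    """Closed form (up to float rounding) of repeating
    `epsilon -= decay; epsilon = max(epsilon, floor)` from epsilon = 1.0."""
    return max(1.0 - decay * train_steps, floor)


def update_running_score(running_score: float, score: float) -> float:
    """Exponential moving average compared against goal_score."""
    if running_score == 0:
        return score  # the first episode seeds the average
    return 0.99 * running_score + 0.01 * score


# Training steps needed to reach the floor: (1.0 - 0.01) / decay.
for name, decay in [("DQN", 0.000005), ("DRQN/DTQN", 0.00005)]:
    print(name, round((1.0 - 0.01) / decay))
# DQN 198000
# DRQN/DTQN 19800

running = 0.0
for score in [20.0, 120.0]:
    running = update_running_score(running, score)
print(round(running, 6))  # 21.0
```

The tenfold smaller decay in `train_DQN.py` means its agent keeps exploring for about ten times as many training steps. This is worth keeping in mind when comparing the three sets of traces.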