├── tests ├── __init__.py └── test_env.py ├── nfq ├── __init__.py ├── networks.py └── agents.py ├── paper.pdf ├── environments ├── __init__.py └── cartpole.py ├── cartpole.conf ├── .gitmodules ├── .travis.yml ├── requirements-dev.txt ├── setup.cfg ├── Makefile ├── requirements.txt ├── .pre-commit-config.yaml ├── LICENSE ├── .gitignore ├── README.md └── train_eval.py /tests/__init__.py: -------------------------------------------------------------------------------- 1 | # flake8: noqa 2 | -------------------------------------------------------------------------------- /nfq/__init__.py: -------------------------------------------------------------------------------- 1 | """Neural Fitted Q-Iteration.""" 2 | -------------------------------------------------------------------------------- /paper.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/seungjaeryanlee/implementations-nfq/HEAD/paper.pdf -------------------------------------------------------------------------------- /environments/__init__.py: -------------------------------------------------------------------------------- 1 | """Environments specified in NFQ paper.""" 2 | # flake8: noqa 3 | from .cartpole import CartPoleRegulatorEnv 4 | -------------------------------------------------------------------------------- /cartpole.conf: -------------------------------------------------------------------------------- 1 | EPOCH = 2000 2 | TRAIN_ENV_MAX_STEPS = 100 3 | EVAL_ENV_MAX_STEPS = 3000 4 | DISCOUNT = 0.95 5 | INIT_EXPERIENCE = 0 6 | 7 | INCREMENT_EXPERIENCE 8 | HINT_TO_GOAL 9 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "tests/utils"] 2 | path = tests/utils 3 | url = https://github.com/seungjaeryanlee/implementations-utils-tests.git 4 | [submodule "utils"] 5 | path = utils 6 | url = https://github.com/seungjaeryanlee/implementations-utils.git 7 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | python: 3 | - "3.6" 4 | install: 5 | - pip install -r requirements.txt --quiet 6 | - pip install -r requirements-dev.txt --quiet 7 | script: 8 | - black --check . 
9 | - flake8 10 | - isort **/*.py -c -vb 11 | - pytest 12 | cache: pip 13 | -------------------------------------------------------------------------------- /requirements-dev.txt: -------------------------------------------------------------------------------- 1 | black>=19.3b0 2 | flake8>=3.7.8 3 | flake8-bugbear>=19.3.0 4 | flake8-docstrings>=1.3.0 5 | isort>=4.3.21 6 | pytest>=5.0.1 7 | seed-isort-config>=1.9.2 8 | pre-commit==1.17.0 9 | 10 | # https://github.com/pytest-dev/pytest-cov/issues/252 11 | pytest-remotedata>=0.3.1 12 | # https://github.com/PyCQA/pydocstyle/issues/375 13 | pydocstyle<4.0.0 14 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [flake8] 2 | max-line-length = 88 3 | select = C,E,F,W,B,B950,D 4 | ignore = 5 | E203, 6 | E501, 7 | W503, 8 | D101, # Missing docstring in public class 9 | D105, 10 | D107, 11 | D202, # No blank lines allowed after function docstring 12 | exclude = 13 | .git, 14 | __pycache__, 15 | .ipynb_checkpoints, 16 | 17 | [isort] 18 | known_third_party=configargparse,gym,numpy,pytest,torch 19 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # Install dependencies 2 | dep: 3 | pip install -r requirements.txt 4 | 5 | # Install developer dependencies 6 | dev: 7 | pip install -r requirements.txt 8 | pip install -r requirements-dev.txt 9 | pre-commit install 10 | 11 | # Format code with black and isort 12 | format: 13 | black . 14 | seed-isort-config 15 | isort -y 16 | 17 | # Test code with black, flake8, isort, mypy, and pytest. 18 | test: 19 | pytest -v 20 | black --check . 21 | isort **/*.py -c 22 | flake8 23 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # ConfigArgParse allows using config files with argparse. 2 | ConfigArgParse==0.14.0 3 | 4 | # OpenAI Gym is de-facto standard for RL agent-environment interface. 5 | gym==0.14.0 6 | 7 | # PyTorch v1.1+ supports TensorBoard. 8 | torch==1.1.0 9 | 10 | # TensorBoard v1.14+ supports PyTorch. 11 | tensorboard==1.14.0 12 | 13 | # Weights & Biases allows visualizing data online. 14 | wandb==0.8.5 15 | 16 | # coloredlogs provide colors to Python's logger. 17 | coloredlogs==10.0 18 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/pre-commit/pre-commit-hooks 3 | rev: v2.1.0 # This is pre-commit-hooks version, NOT flake8 version! 
4 | hooks: 5 | - id: flake8 6 | - repo: https://github.com/ambv/black 7 | rev: stable 8 | hooks: 9 | - id: black 10 | - repo: https://github.com/asottile/seed-isort-config 11 | rev: v1.5.0 12 | hooks: 13 | - id: seed-isort-config 14 | - repo: https://github.com/pre-commit/mirrors-isort 15 | rev: 'v4.3.21' 16 | hooks: 17 | - id: isort 18 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Seungjae Ryan Lee 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /nfq/networks.py: -------------------------------------------------------------------------------- 1 | """Networks for NFQ.""" 2 | import torch 3 | import torch.nn as nn 4 | 5 | 6 | class NFQNetwork(nn.Module): 7 | def __init__(self): 8 | """Networks for NFQ.""" 9 | super().__init__() 10 | self.layers = nn.Sequential( 11 | nn.Linear(5, 5), 12 | nn.Sigmoid(), 13 | nn.Linear(5, 5), 14 | nn.Sigmoid(), 15 | nn.Linear(5, 1), 16 | nn.Sigmoid(), 17 | ) 18 | 19 | # Initialize weights to [-0.5, 0.5] 20 | def init_weights(m): 21 | if type(m) == nn.Linear: 22 | torch.nn.init.uniform_(m.weight, -0.5, 0.5) 23 | # TODO(seungjaeryanlee): What about bias? 24 | 25 | self.layers.apply(init_weights) 26 | 27 | def forward(self, x: torch.Tensor) -> torch.Tensor: 28 | """ 29 | Forward propagation. 30 | 31 | Parameters 32 | ---------- 33 | x : torch.Tensor 34 | Input tensor of observation and action concatenated. 35 | 36 | Returns 37 | ------- 38 | y : torch.Tensor 39 | Forward-propagated observation predicting Q-value. 
40 | 41 | """ 42 | return self.layers(x) 43 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ## Custom 2 | tensorboard_logs/ 3 | wandb/ 4 | saves/ 5 | 6 | # Byte-compiled / optimized / DLL files 7 | __pycache__/ 8 | *.py[cod] 9 | *$py.class 10 | 11 | # C extensions 12 | *.so 13 | 14 | # Distribution / packaging 15 | .Python 16 | build/ 17 | develop-eggs/ 18 | dist/ 19 | downloads/ 20 | eggs/ 21 | .eggs/ 22 | lib/ 23 | lib64/ 24 | parts/ 25 | sdist/ 26 | var/ 27 | wheels/ 28 | *.egg-info/ 29 | .installed.cfg 30 | *.egg 31 | MANIFEST 32 | 33 | # PyInstaller 34 | # Usually these files are written by a python script from a template 35 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 36 | *.manifest 37 | *.spec 38 | 39 | # Installer logs 40 | pip-log.txt 41 | pip-delete-this-directory.txt 42 | 43 | # Unit test / coverage reports 44 | htmlcov/ 45 | .tox/ 46 | .coverage 47 | .coverage.* 48 | .cache 49 | nosetests.xml 50 | coverage.xml 51 | *.cover 52 | .hypothesis/ 53 | .pytest_cache/ 54 | 55 | # Translations 56 | *.mo 57 | *.pot 58 | 59 | # Django stuff: 60 | *.log 61 | local_settings.py 62 | db.sqlite3 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # pyenv 81 | .python-version 82 | 83 | # celery beat schedule file 84 | celerybeat-schedule 85 | 86 | # SageMath parsed files 87 | *.sage.py 88 | 89 | # Environments 90 | .env 91 | .venv 92 | env/ 93 | venv/ 94 | ENV/ 95 | env.bak/ 96 | venv.bak/ 97 | 98 | # Spyder project settings 99 | .spyderproject 100 | .spyproject 101 | 102 | # Rope project settings 103 | .ropeproject 104 | 105 | # mkdocs documentation 106 | /site 107 | 108 | # mypy 109 | .mypy_cache/ 110 | -------------------------------------------------------------------------------- /tests/test_env.py: -------------------------------------------------------------------------------- 1 | """Stub unit tests.""" 2 | import numpy as np 3 | import pytest 4 | 5 | from environments import CartPoleRegulatorEnv as Env 6 | 7 | 8 | class TestCartPoleRegulatorEnv: 9 | def test_train_mode_reset(self): 10 | """Test reset() in train mode.""" 11 | train_env = Env(mode="train") 12 | x, x_, theta, theta_ = train_env.reset() 13 | 14 | assert abs(x) <= 2.3 15 | assert x_ == 0 16 | assert abs(theta) <= 0.3 17 | assert theta_ == 0 18 | 19 | def test_eval_mode_reset(self): 20 | """Test reset() in eval mode.""" 21 | eval_env = Env(mode="eval") 22 | x, x_, theta, theta_ = eval_env.reset() 23 | 24 | assert abs(x) <= 1.0 25 | assert x_ == 0 26 | assert abs(theta) <= 0.3 27 | assert theta_ == 0 28 | 29 | @pytest.mark.parametrize("env", [Env(mode="train"), Env(mode="eval")]) 30 | def test_get_goal_pattern_set(self, env): 31 | """Test get_goal_pattern_set().""" 32 | goal_state_action_b, goal_target_q_values = env.get_goal_pattern_set() 33 | 34 | for x, _, theta, _, action in goal_state_action_b: 35 | assert abs(x) <= env.x_success_range 36 | assert abs(theta) <= env.theta_success_range 37 | assert action in [0, 1] 38 | for target in goal_target_q_values: 39 | assert target == 0 40 | 41 | @pytest.mark.parametrize("env", [Env(mode="train"), Env(mode="eval")]) 42 | @pytest.mark.parametrize("get_best_action", [None, lambda x: 0]) 43 | def 
test_generate_rollout_next_obs(self, env, get_best_action): 44 | """Test generate_rollout() generates continued observation.""" 45 | 46 | rollout, episode_cost = env.generate_rollout(get_best_action) 47 | 48 | prev_next_obs = rollout[0][3] 49 | for obs, _, _, next_obs, _ in rollout[1:]: 50 | assert np.array_equal(prev_next_obs, obs) 51 | prev_next_obs = next_obs 52 | 53 | @pytest.mark.parametrize("env", [Env(mode="train"), Env(mode="eval")]) 54 | @pytest.mark.parametrize("get_best_action", [None, lambda x: 0]) 55 | def test_generate_rollout_cost_threshold(self, env, get_best_action): 56 | """Test generate_rollout() does not have a cost over 1.""" 57 | 58 | rollout, episode_cost = env.generate_rollout(get_best_action) 59 | 60 | for (_, _, cost, _, _) in rollout: 61 | assert 0 <= cost <= 1 62 | 63 | @pytest.mark.parametrize("env", [Env(mode="train"), Env(mode="eval")]) 64 | @pytest.mark.parametrize("get_best_action", [None, lambda x: 0]) 65 | def test_generate_rollout_episode_cost(self, env, get_best_action): 66 | """Test generate_rollout()'s second return value episode_cost.""" 67 | 68 | rollout, episode_cost = env.generate_rollout(get_best_action) 69 | 70 | total_cost = 0 71 | for _, _, cost, _, _ in rollout: 72 | total_cost += cost 73 | assert episode_cost == total_cost 74 | 75 | @pytest.mark.parametrize("env", [Env(mode="train"), Env(mode="eval")]) 76 | @pytest.mark.parametrize("get_best_action", [None, lambda x: 0]) 77 | def test_generate_rollout_with_random_action_done_value(self, env, get_best_action): 78 | """Test done values of generate_rollout().""" 79 | 80 | rollout, episode_cost = env.generate_rollout(get_best_action) 81 | 82 | for i, (_, _, _, _, done) in enumerate(rollout): 83 | if i + 1 < len(rollout): 84 | assert not done 85 | else: 86 | assert done or len(rollout) == env.max_steps 87 | 88 | @pytest.mark.parametrize("env", [Env(mode="train"), Env(mode="eval")]) 89 | def test_generate_rollout_get_best_action(self, env): 90 | """Test generate_rollout() uses get_best_action correctly.""" 91 | 92 | rollout, _ = env.generate_rollout(get_best_action=lambda x: 0) 93 | 94 | for _, action, _, _, _ in rollout: 95 | assert action == 0 96 | -------------------------------------------------------------------------------- /nfq/agents.py: -------------------------------------------------------------------------------- 1 | """Reinforcement learning agents.""" 2 | from typing import List, Tuple 3 | 4 | import gym 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | import torch.optim as optim 10 | 11 | 12 | class NFQAgent: 13 | def __init__(self, nfq_net: nn.Module, optimizer: optim.Optimizer): 14 | """ 15 | Neural Fitted Q-Iteration agent. 16 | 17 | Parameters 18 | ---------- 19 | nfq_net : nn.Module 20 | The Q-Network that returns estimated cost given observation and action. 21 | optimizer : optim.Optimizer 22 | Optimizer for training the NFQ network. 23 | 24 | """ 25 | self._nfq_net = nfq_net 26 | self._optimizer = optimizer 27 | 28 | def get_best_action(self, obs: np.array) -> int: 29 | """ 30 | Return best action for given observation according to the neural network. 31 | 32 | Parameters 33 | ---------- 34 | obs : np.array 35 | An observation to find the best action for. 36 | 37 | Returns 38 | ------- 39 | action : int 40 | The action chosen by greedy selection.
41 | 42 | """ 43 | q_left = self._nfq_net( 44 | torch.cat([torch.FloatTensor(obs), torch.FloatTensor([0])], dim=0) 45 | ) 46 | q_right = self._nfq_net( 47 | torch.cat([torch.FloatTensor(obs), torch.FloatTensor([1])], dim=0) 48 | ) 49 | 50 | # Best action has lower "Q" value since it estimates cumulative cost. 51 | return 1 if q_left >= q_right else 0 52 | 53 | def generate_pattern_set( 54 | self, 55 | rollouts: List[Tuple[np.array, int, int, np.array, bool]], 56 | gamma: float = 0.95, 57 | ): 58 | """Generate pattern set. 59 | 60 | Parameters 61 | ---------- 62 | rollouts : list of tuple 63 | Generated rollouts, which is a tuple of state, action, cost, next state, and done. 64 | gamma : float 65 | Discount factor. Defaults to 0.95. 66 | 67 | Returns 68 | ------- 69 | pattern_set : tuple of torch.Tensor 70 | Pattern set to train the NFQ network. 71 | 72 | """ 73 | # _b denotes batch 74 | state_b, action_b, cost_b, next_state_b, done_b = zip(*rollouts) 75 | state_b = torch.FloatTensor(state_b) 76 | action_b = torch.FloatTensor(action_b) 77 | cost_b = torch.FloatTensor(cost_b) 78 | next_state_b = torch.FloatTensor(next_state_b) 79 | done_b = torch.FloatTensor(done_b) 80 | 81 | state_action_b = torch.cat([state_b, action_b.unsqueeze(1)], 1) 82 | assert state_action_b.shape == (len(rollouts), state_b.shape[1] + 1) 83 | 84 | # Compute min_a Q(s', a) 85 | q_next_state_left_b = self._nfq_net( 86 | torch.cat([next_state_b, torch.zeros(len(rollouts), 1)], 1) 87 | ).squeeze() 88 | q_next_state_right_b = self._nfq_net( 89 | torch.cat([next_state_b, torch.ones(len(rollouts), 1)], 1) 90 | ).squeeze() 91 | q_next_state_b = torch.min(q_next_state_left_b, q_next_state_right_b) 92 | 93 | # If goal state (S+): target = 0 + gamma * min Q 94 | # If forbidden state (S-): target = 1 95 | # If neither: target = c_trans + gamma * min Q 96 | # NOTE(seungjaeryanlee): done is True only when the episode terminated 97 | # due to entering forbidden state. It is not 98 | # True if it terminated due to maximum timestep. 99 | with torch.no_grad(): 100 | target_q_values = cost_b + gamma * q_next_state_b * (1 - done_b) 101 | 102 | return state_action_b, target_q_values 103 | 104 | def train(self, pattern_set: Tuple[torch.Tensor, torch.Tensor]) -> float: 105 | """Train neural network with a given pattern set. 106 | 107 | Parameters 108 | ---------- 109 | pattern_set : tuple of torch.Tensor 110 | Pattern set to train the NFQ network. 111 | 112 | Returns 113 | ------- 114 | loss : float 115 | Training loss. 116 | 117 | """ 118 | state_action_b, target_q_values = pattern_set 119 | predicted_q_values = self._nfq_net(state_action_b).squeeze() 120 | loss = F.mse_loss(predicted_q_values, target_q_values) 121 | 122 | self._optimizer.zero_grad() 123 | loss.backward() 124 | self._optimizer.step() 125 | 126 | return loss.item() 127 | 128 | def evaluate(self, eval_env: gym.Env, render: bool) -> Tuple[int, str, float]: 129 | """Evaluate NFQ agent on evaluation environment. 130 | 131 | Parameters 132 | ---------- 133 | eval_env : gym.Env 134 | Environment to evaluate the agent. 135 | render: bool 136 | If true, render environment. 137 | 138 | Returns 139 | ------- 140 | episode_length : int 141 | Number of steps the agent took. 142 | success : bool 143 | True if the agent was terminated due to max timestep. 144 | episode_cost : float 145 | Total cost accumulated from the evaluation episode. 
146 | 147 | """ 148 | episode_length = 0 149 | obs = eval_env.reset() 150 | done = False 151 | info = {"time_limit": False} 152 | episode_cost = 0 153 | while not done and not info["time_limit"]: 154 | action = self.get_best_action(obs) 155 | obs, cost, done, info = eval_env.step(action) 156 | episode_cost += cost 157 | episode_length += 1 158 | 159 | if render: 160 | eval_env.render() 161 | 162 | success = ( 163 | episode_length == eval_env.max_steps 164 | and abs(obs[0]) <= eval_env.x_success_range 165 | ) 166 | 167 | return episode_length, success, episode_cost 168 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method 2 | 3 | [black](https://black.readthedocs.io/en/stable/) 4 | [flake8](http://flake8.pycqa.org/en/latest/) 5 | [isort](https://pypi.org/project/isort/) 6 | [pytest](https://docs.pytest.org/en/latest/) 7 | 8 | [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) 9 | [pre-commit](https://pre-commit.com/) 10 | 11 | This repository is an implementation of the paper [Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method (Riedmiller, 2005)](/paper.pdf). 12 | 13 | **Please ⭐ this repository if you found it useful!** 14 | 15 | 16 | --- 17 | 18 | ### Table of Contents 📜 19 | 20 | - [Summary](#summary-) 21 | - [Installation](#installation-) 22 | - [Running](#running-) 23 | - [Results](#results-) 24 | - [Differences from the Paper](#differences-from-the-paper-) 25 | - [Reproducibility](#reproducibility-) 26 | 27 | For implementations of other deep learning papers, check the **[implementations](https://github.com/seungjaeryanlee/implementations) repository**! 28 | 29 | --- 30 | 31 | ### Summary 📝 32 | 33 | Neural Fitted Q-Iteration uses a neural network as the Q-function, with its input being an observation (s) and an action (a) and its output being the estimated action value (Q(s, a)). Instead of online Q-learning, the paper proposes **batch offline updates**: experience is collected throughout the episode and the network is updated with that whole batch. The paper also suggests the **hint-to-goal** method, where the neural network is explicitly trained in the goal region so that it can correctly estimate the value of the goal region. 34 | 35 | ### Installation 🧱 36 | 37 | First, clone this repository from GitHub. Since this repository contains submodules, you should use the `--recursive` flag. 38 | 39 | ```bash 40 | git clone --recursive https://github.com/seungjaeryanlee/implementations-nfq.git 41 | ``` 42 | 43 | If you already cloned the repository without the flag, you can download the submodules separately with the `git submodule` command: 44 | 45 | ```bash 46 | git clone https://github.com/seungjaeryanlee/implementations-nfq.git 47 | git submodule update --init --recursive 48 | ``` 49 | 50 | After cloning the repository, use [requirements.txt](/requirements.txt) to install the required PyPI packages. 51 | 52 | ```bash 53 | pip install -r requirements.txt 54 | ``` 55 | 56 | You can read more about each package in the comments of the [requirements.txt](/requirements.txt) file! 57 | 58 | ### Running 🏃 59 | 60 | You can train the NFQ agent on Cartpole Regulator using the given configuration file with the command below: 61 | ``` 62 | python train_eval.py -c cartpole.conf 63 | ``` 64 | 65 | For a reproducible run, use the `--RANDOM_SEED` flag.
66 | ``` 67 | python train_eval.py -c cartpole.conf --RANDOM_SEED=1 68 | ``` 69 | 70 | To save a trained agent, use the `--SAVE_PATH` flag. 71 | ``` 72 | python train_eval.py -c cartpole.conf --SAVE_PATH=saves/cartpole.pth 73 | ``` 74 | 75 | To load a trained agent, use the `--LOAD_PATH` flag. 76 | ``` 77 | python train_eval.py -c cartpole.conf --LOAD_PATH=saves/cartpole.pth 78 | ``` 79 | 80 | To enable logging to TensorBoard or W&B, use the `--USE_TENSORBOARD` and `--USE_WANDB` flags. 81 | ``` 82 | python train_eval.py -c cartpole.conf --USE_TENSORBOARD --USE_WANDB 83 | ``` 84 | 85 | ### Results 📊 86 | 87 | This repository uses **TensorBoard** for offline logging and **Weights & Biases** for online logging. You can see all the metrics in [my summary report at Weights & Biases](https://app.wandb.ai/seungjaeryanlee/implementations-nfq/reports?view=seungjaeryanlee%2FSummary)! 88 | 89 |
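`train_eval.py` itself is not reproduced in this excerpt, but the pieces above show how it fits together. The following is a simplified sketch (not the actual script) of a training loop built from the `NFQNetwork`, `NFQAgent`, and `CartPoleRegulatorEnv` APIs in this repository; the choice of Rprop follows the paper, and the epoch count and discount factor mirror `cartpole.conf`. Calling `train()` once per epoch is an illustrative simplification.

```python
"""Simplified NFQ training loop; see train_eval.py for the real implementation."""
import torch.optim as optim

from environments import CartPoleRegulatorEnv
from nfq.agents import NFQAgent
from nfq.networks import NFQNetwork

train_env = CartPoleRegulatorEnv(mode="train")
eval_env = CartPoleRegulatorEnv(mode="eval")

nfq_net = NFQNetwork()
optimizer = optim.Rprop(nfq_net.parameters())  # The paper fits each batch with Rprop.
nfq_agent = NFQAgent(nfq_net, optimizer)

all_rollouts = []  # Growing batch of (obs, action, cost, next_obs, done) transitions.
for epoch in range(2000):  # EPOCH in cartpole.conf
    # Collect one episode with the current greedy policy (INCREMENT_EXPERIENCE).
    new_rollout, episode_cost = train_env.generate_rollout(nfq_agent.get_best_action)
    all_rollouts.extend(new_rollout)

    # Batch offline update: recompute targets over the whole pattern set, then fit.
    pattern_set = nfq_agent.generate_pattern_set(all_rollouts, gamma=0.95)
    loss = nfq_agent.train(pattern_set)

    # Evaluate the greedy policy on the longer evaluation episode.
    episode_length, success, eval_cost = nfq_agent.evaluate(eval_env, render=False)
    if success:
        break
```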
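The `HINT_TO_GOAL` option in `cartpole.conf` can be sketched the same way (continuing the snippet above). The environment's `get_goal_pattern_set()`, exercised in `tests/test_env.py`, returns artificial goal-region state-action pairs with a target cost of 0. One way to use them, shown here as an illustrative assumption rather than what `train_eval.py` necessarily does, is to append them to the pattern set before each fit:

```python
import torch

# Hint-to-goal: add artificial patterns that pin the goal region's cost to 0.
goal_state_action_b, goal_target_q_values = train_env.get_goal_pattern_set()
# Convert defensively, in case the helper returns lists or NumPy arrays rather than tensors.
goal_state_action_b = torch.as_tensor(goal_state_action_b, dtype=torch.float32)
goal_target_q_values = torch.as_tensor(goal_target_q_values, dtype=torch.float32)

state_action_b, target_q_values = nfq_agent.generate_pattern_set(all_rollouts)
state_action_b = torch.cat([state_action_b, goal_state_action_b], dim=0)
target_q_values = torch.cat([target_q_values, goal_target_q_values], dim=0)
loss = nfq_agent.train((state_action_b, target_q_values))
```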