5 |
6 | # Module: train
7 |
8 | The entry point for running an agent on an Atari 2600 domain.
9 |
10 |
11 | ## Functions
12 |
13 | [`create_agent(...)`](./train/create_agent.md): Creates a DQN agent.
14 |
15 | [`create_runner(...)`](./train/create_runner.md): Creates an experiment Runner.
16 |
17 | [`launch_experiment(...)`](./train/launch_experiment.md): Launches the
18 | experiment.
19 |
--------------------------------------------------------------------------------
/docs/api_docs/python/utils/get_latest_file.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # utils.get_latest_file
7 |
8 | ```python
9 | utils.get_latest_file(path)
10 | ```
11 |
12 | Return the file named 'path_[0-9]*' with the largest such number.
13 |
14 | #### Args:
15 |
16 | * `path`: The base path (including directory and base name) to search.
17 |
18 | #### Returns:
19 |
20 | The latest file (in terms of given numbers).
21 |
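A minimal usage sketch, assuming `utils` refers to `dopamine/colab/utils.py` (as noted in the colab README); the directory below is hypothetical and simply follows the `path_[0-9]*` naming convention described above.

```python
# Hypothetical usage sketch; the directory is made up and only illustrates the
# 'path_[0-9]*' naming convention described above.
from dopamine.colab import utils

# If the directory contains log_0, log_1, ..., log_57, this returns
# '/tmp/dopamine_run/logs/log_57'.
latest = utils.get_latest_file('/tmp/dopamine_run/logs/log')
```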
--------------------------------------------------------------------------------
/docs/api_docs/python/implicit_quantile_agent.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # Module: implicit_quantile_agent
7 |
8 | The implicit quantile networks (IQN) agent.
9 |
10 | The agent follows the description given in "Implicit Quantile Networks for
11 | Distributional RL" (Dabney et al., 2018).
12 |
13 | ## Classes
14 |
15 | [`class ImplicitQuantileAgent`](./implicit_quantile_agent/ImplicitQuantileAgent.md):
16 | An extension of Rainbow to perform implicit quantile regression.
17 |
--------------------------------------------------------------------------------
/dopamine/common/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
--------------------------------------------------------------------------------
/dopamine/agents/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
--------------------------------------------------------------------------------
/dopamine/atari/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
--------------------------------------------------------------------------------
/dopamine/colab/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
--------------------------------------------------------------------------------
/dopamine/utils/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
--------------------------------------------------------------------------------
/dopamine/agents/dqn/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
--------------------------------------------------------------------------------
/dopamine/agents/rainbow/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
--------------------------------------------------------------------------------
/dopamine/replay_memory/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
--------------------------------------------------------------------------------
/dopamine/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | name = 'dopamine'
15 |
--------------------------------------------------------------------------------
/dopamine/agents/implicit_quantile/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 |
--------------------------------------------------------------------------------
/docs/api_docs/python/train/create_agent.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/api_docs/python/utils/get_latest_iteration.md:
--------------------------------------------------------------------------------
5 |
6 | # utils.get_latest_iteration
7 |
8 | ```python
9 | utils.get_latest_iteration(path)
10 | ```
11 |
12 | Return the largest iteration number corresponding to the given path.
13 |
14 | #### Args:
15 |
16 | * `path`: The base path (including directory and base name) to search.
17 |
18 | #### Returns:
19 |
20 | The latest iteration number.
21 |
22 | #### Raises:
23 |
24 | * `ValueError`: if there is no log data available at the given path.
25 |
--------------------------------------------------------------------------------
/docs/api_docs/python/utils/load_baselines.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # utils.load_baselines
7 |
8 | ```python
9 | utils.load_baselines(
10 | base_dir,
11 | verbose=False
12 | )
13 | ```
14 |
15 | Reads in the baseline experimental data from a specified base directory.
16 |
17 | #### Args:
18 |
19 | * `base_dir`: string, base directory where to read data from.
20 | * `verbose`: bool, whether to print warning messages.
21 |
22 | #### Returns:
23 |
24 | A dict containing pandas DataFrames for all available agents and games.
25 |
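A minimal sketch, assuming `utils` is `dopamine/colab/utils.py` and that the baseline data has already been downloaded into the (hypothetical) directory below.

```python
# Minimal sketch; the directory is hypothetical and must already contain the
# downloaded baseline data.
from dopamine.colab import utils

baselines = utils.load_baselines('/tmp/dopamine_baselines', verbose=True)
# As documented above, the result is a dict of pandas DataFrames covering the
# available agents and games.
for name, df in baselines.items():
  print(name, df.shape)
```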
--------------------------------------------------------------------------------
/docs/api_docs/python/train/launch_experiment.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # train.launch_experiment
7 |
8 | ```python
9 | train.launch_experiment(
10 | create_runner_fn,
11 | create_agent_fn
12 | )
13 | ```
14 |
15 | Launches the experiment.
16 |
17 | #### Args:
18 |
19 | * `create_runner_fn`: A function that takes as args a base directory
20 | and a function for creating an agent and returns a `Runner`-like object.
21 | * `create_agent_fn`: A function that takes as args a Tensorflow session
22 | and an Atari 2600 Gym environment, and returns an agent.
23 |
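For context, a hedged sketch of how the two factory functions documented in this module plug together; it assumes the gin configuration files (e.g. `dqn.gin`) have already been parsed, as the `train` entry point does before launching.

```python
# Hedged sketch: launch_experiment wires the runner factory to the agent factory.
# Gin configs (e.g. dqn.gin) are assumed to have been parsed already.
from dopamine.atari import train

train.launch_experiment(train.create_runner, train.create_agent)
```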
--------------------------------------------------------------------------------
/docs/api_docs/python/train/create_runner.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # train.create_runner
7 |
8 | ```python
9 | train.create_runner(
10 | base_dir,
11 | create_agent_fn
12 | )
13 | ```
14 |
15 | Creates an experiment Runner.
16 |
17 | #### Args:
18 |
19 | * `base_dir`: str, base directory for hosting all subdirectories.
20 | * `create_agent_fn`: A function that takes as args a Tensorflow session
21 | and an Atari 2600 Gym environment, and returns an agent.
22 |
23 | #### Returns:
24 |
25 | * `runner`: A `run_experiment.Runner`-like object.
28 |
29 | #### Raises:
30 |
31 | * `ValueError`: When an unknown schedule is encountered.
32 |
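A minimal sketch of using the returned runner directly, assuming the relevant gin files (which bind `Runner.game_name` and the agent hyperparameters) have already been parsed; the base directory is hypothetical.

```python
# Minimal sketch, assuming gin configs such as dqn.gin have already been parsed.
from dopamine.atari import train

runner = train.create_runner('/tmp/dopamine_example', train.create_agent)
runner.run_experiment()  # runs the full training/evaluation schedule
```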
--------------------------------------------------------------------------------
/docs/api_docs/python/utils/summarize_data.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # utils.summarize_data
7 |
8 | ```python
9 | utils.summarize_data(
10 | data,
11 | summary_keys
12 | )
13 | ```
14 |
15 | Processes log data into a per-iteration summary.
16 |
17 | #### Args:
18 |
19 | * `data`: Dictionary loaded by load_statistics describing the data.
20 | This dictionary has keys iteration_0, iteration_1, ... describing
21 | per-iteration data.
22 | * `summary_keys`: List of per-iteration data to be summarized.
23 |
24 | Example: data = load_statistics(...); summarize_data(data,
25 | ['train_episode_returns', 'eval_episode_returns'])
26 |
27 | #### Returns:
28 |
29 | A dictionary mapping each key in summary_keys to a per-iteration summary.
30 |
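Putting this together with `load_statistics`, a hedged sketch of the typical pipeline; the log directory is hypothetical.

```python
# Hedged sketch of the typical pipeline; the log directory is hypothetical.
from dopamine.colab import utils

data, iteration = utils.load_statistics('/tmp/dopamine_run/logs', verbose=False)
summary = utils.summarize_data(
    data, ['train_episode_returns', 'eval_episode_returns'])
# summary['train_episode_returns'] holds one summarized value per iteration.
```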
--------------------------------------------------------------------------------
/docs/api_docs/python/prioritized_replay_buffer.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # Module: prioritized_replay_buffer
7 |
8 | An implementation of Prioritized Experience Replay (PER).
9 |
10 | This implementation is based on the paper "Prioritized Experience Replay" by Tom
11 | Schaul et al. (2015). Many thanks to Tom Schaul, John Quan, and Matteo Hessel
12 | for providing useful pointers on the algorithm and its implementation.
13 |
14 | ## Classes
15 |
16 | [`class OutOfGraphPrioritizedReplayBuffer`](./prioritized_replay_buffer/OutOfGraphPrioritizedReplayBuffer.md):
17 | An out-of-graph Replay Buffer for Prioritized Experience Replay.
18 |
19 | [`class WrappedPrioritizedReplayBuffer`](./prioritized_replay_buffer/WrappedPrioritizedReplayBuffer.md):
20 | Wrapper of OutOfGraphPrioritizedReplayBuffer with in-graph sampling.
21 |
--------------------------------------------------------------------------------
/docs/api_docs/python/circular_replay_buffer.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # Module: circular_replay_buffer
7 |
8 | The standard DQN replay memory.
9 |
10 | This implementation is an out-of-graph replay memory + in-graph wrapper. It
11 | supports vanilla n-step updates of the form typically found in the literature,
12 | i.e. where rewards are accumulated for n steps and the intermediate trajectory
13 | is not exposed to the agent. This does not allow, for example, performing
14 | off-policy corrections.
15 |
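To make the vanilla n-step update concrete, below is an illustrative sketch (not Dopamine's internal code) of how rewards are accumulated over the update horizon before a transition is handed to the agent; `gamma` and the reward sequence are made-up values.

```python
# Illustrative sketch only (not Dopamine's internal code): the replay memory
# exposes (s_t, a_t, R, s_{t+n}), where R is the discounted sum of the n
# intermediate rewards.
def n_step_return(rewards, gamma=0.99):
  """Discounted sum of an n-step reward sequence."""
  return sum((gamma ** i) * r for i, r in enumerate(rewards))

print(n_step_return([1.0, 0.0, 1.0]))  # 1.0 + 0.99**2 * 1.0 = 1.9801
```
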
16 | ## Classes
17 |
18 | [`class OutOfGraphReplayBuffer`](./circular_replay_buffer/OutOfGraphReplayBuffer.md):
19 | A simple out-of-graph Replay Buffer.
20 |
21 | [`class WrappedReplayBuffer`](./circular_replay_buffer/WrappedReplayBuffer.md):
22 | Wrapper of OutOfGraphReplayBuffer with an in-graph sampling mechanism.
23 |
--------------------------------------------------------------------------------
/docs/api_docs/python/utils/load_statistics.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # utils.load_statistics
7 |
8 | ```python
9 | utils.load_statistics(
10 | log_path,
11 | iteration_number=None,
12 | verbose=True
13 | )
14 | ```
15 |
16 | Reads in a statistics object from log_path.
17 |
18 | #### Args:
19 |
20 | * `log_path`: string, provides the full path to the training/eval
21 | statistics.
22 | * `iteration_number`: The iteration number of the statistics object we
23 | want to read. If set to None, load the latest version.
24 | * `verbose`: Whether to output information about the load procedure.
25 |
26 | #### Returns:
27 |
28 | * `data`: The requested statistics object.
29 | * `iteration`: The corresponding iteration number.
30 |
31 | #### Raises:
32 |
33 | * `Exception`: if data is not present.
34 |
--------------------------------------------------------------------------------
/docs/api_docs/python/utils.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # Module: utils
7 |
8 | This provides utilities for dealing with Dopamine data.
9 |
10 | See: dopamine/common/logger.py.
11 |
12 | ## Functions
13 |
14 | [`get_latest_file(...)`](./utils/get_latest_file.md): Return the file named
15 | 'path_[0-9]*' with the largest such number.
16 |
17 | [`get_latest_iteration(...)`](./utils/get_latest_iteration.md): Return the
18 | largest iteration number corresponding to the given path.
19 |
20 | [`load_baselines(...)`](./utils/load_baselines.md): Reads in the baseline
21 | experimental data from a specified base directory.
22 |
23 | [`load_statistics(...)`](./utils/load_statistics.md): Reads in a statistics
24 | object from log_path.
25 |
26 | [`read_experiment(...)`](./utils/read_experiment.md): Reads in a set of
27 | experimental results from log_path.
28 |
29 | [`summarize_data(...)`](./utils/summarize_data.md): Processes log data into a
30 | per-iteration summary.
31 |
--------------------------------------------------------------------------------
/dopamine/colab/README.md:
--------------------------------------------------------------------------------
1 | # Colabs
2 |
3 | This directory contains
4 | [`utils.py`](https://github.com/google/dopamine/blob/master/dopamine/colab/utils.py),
5 | which provides a number of useful utilities for loading experiment statistics.
6 |
7 | We also provide a set of colabs to help illustrate how you can use Dopamine.
8 |
9 | ## Agents
10 |
11 | In this
12 | [colab](https://colab.research.google.com/github/google/dopamine/blob/master/dopamine/colab/agents.ipynb)
13 | we illustrate how to create a new agent by either subclassing
14 | [`DQN`](https://github.com/google/dopamine/blob/master/dopamine/agents/dqn/dqn_agent.py)
15 | or by creating a new agent from scratch.
16 |
17 | ## Loading statistics
18 |
19 | In this
20 | [colab](https://colab.research.google.com/github/google/dopamine/blob/master/dopamine/colab/load_statistics.ipynb)
21 | we illustrate how to load and visualize the log data produced by Dopamine.
22 |
23 | ## Visualizing with Tensorboard
24 | In this
25 | [colab](https://colab.research.google.com/github/google/dopamine/blob/master/dopamine/colab/tensorboard.ipynb)
26 | we illustrate how to download and visualize different agents with Tensorboard.
27 |
--------------------------------------------------------------------------------
/dopamine/utils/test_utils.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | """Common testing utilities shared across agents."""
15 |
16 | from __future__ import absolute_import
17 | from __future__ import division
18 | from __future__ import print_function
19 |
20 |
21 |
22 | import mock
23 | import tensorflow as tf
24 |
25 |
26 | class MockReplayBuffer(object):
27 | """Mock ReplayBuffer to verify the way the agent interacts with it."""
28 |
29 | def __init__(self):
30 | with tf.variable_scope('MockReplayBuffer', reuse=tf.AUTO_REUSE):
31 | self.add = mock.Mock()
32 | self.memory = mock.Mock()
33 | self.memory.add_count = 0
34 |
--------------------------------------------------------------------------------
/docs/api_docs/python/rainbow_agent.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # Module: rainbow_agent
7 |
8 | Compact implementation of a simplified Rainbow agent.
9 |
10 | Specifically, we implement the following components from Rainbow:
11 |
12 | * n-step updates;
13 | * prioritized replay; and
14 | * distributional RL.
15 |
16 | These three components were found to significantly impact the performance of the
17 | Atari game-playing agent.
18 |
19 | Furthermore, our implementation does away with some minor hyperparameter
20 | choices. Specifically, we
21 |
22 | * keep the beta exponent fixed at beta=0.5, rather than increase it linearly;
23 | * remove the alpha parameter, which was set to alpha=0.5 throughout the paper.
24 |
25 | Details in "Rainbow: Combining Improvements in Deep Reinforcement Learning" by
26 | Hessel et al. (2018).
27 |
28 | ## Classes
29 |
30 | [`class RainbowAgent`](./rainbow_agent/RainbowAgent.md): A compact
31 | implementation of a simplified Rainbow agent.
32 |
33 | ## Functions
34 |
35 | [`project_distribution(...)`](./rainbow_agent/project_distribution.md): Projects
36 | a batch of (support, weights) onto target_support.
37 |
--------------------------------------------------------------------------------
/docs/api_docs/python/iteration_statistics/IterationStatistics.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 | # iteration_statistics.IterationStatistics
9 |
10 | ## Class `IterationStatistics`
11 |
12 | A class for storing iteration-specific metrics.
13 |
14 | The internal format is as follows: we maintain a mapping from keys to lists.
15 | Each list contains all the values corresponding to the given key.
16 |
17 | For example, self.data_lists['train_episode_returns'] might contain the
18 | per-episode returns achieved during this iteration.
19 |
20 | #### Attributes:
21 |
22 | * `data_lists`: dict mapping each metric_name (str) to a list of said
23 | metric across episodes.
24 |
25 | ## Methods
26 |
27 |
34 |
35 | ```python
36 | append(data_pairs)
37 | ```
38 |
39 | Add the given values to their corresponding key-indexed lists.
40 |
41 | #### Args:
42 |
43 | * `data_pairs`: A dictionary of key-value pairs to be recorded.
44 |
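A minimal usage sketch based on the description above; the metric names and values are made up.

```python
# Minimal usage sketch; metric names and values are made up.
from dopamine.common import iteration_statistics

stats = iteration_statistics.IterationStatistics()
stats.append({'train_episode_returns': 12.0, 'train_episode_lengths': 250})
stats.append({'train_episode_returns': 15.5})
print(stats.data_lists['train_episode_returns'])  # [12.0, 15.5]
```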
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # How to Contribute
2 |
3 | # Issues
4 |
5 | * Please tag your issue with `bug`, `feature request`, or `question` to help us
6 | effectively respond.
7 | * Please include the version of Dopamine you are running
8 | (run `pip list | grep dopamine`)
9 | * Please provide the command line you ran as well as the log output.
10 |
11 | # Pull Requests
12 |
13 | We'd love to accept your patches and contributions to this project. There are
14 | just a few small guidelines you need to follow.
15 |
16 | ## Contributor License Agreement
17 |
18 | Contributions to this project must be accompanied by a Contributor License
19 | Agreement. You (or your employer) retain the copyright to your contribution;
20 | this simply gives us permission to use and redistribute your contributions as
21 | part of the project. Head over to <https://cla.developers.google.com/> to see
22 | your current agreements on file or to sign a new one.
23 |
24 | You generally only need to submit a CLA once, so if you've already submitted one
25 | (even if it was for a different project), you probably don't need to do it
26 | again.
27 |
28 | ## Code reviews
29 |
30 | All submissions, including submissions by project members, require review. We
31 | use GitHub pull requests for this purpose. Consult
32 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
33 | information on using pull requests.
34 |
--------------------------------------------------------------------------------
/docs/api_docs/python/run_experiment/TrainRunner.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 | # run_experiment.TrainRunner
9 |
10 | ## Class `TrainRunner`
11 |
12 | Inherits From: [`Runner`](../run_experiment/Runner.md)
13 |
14 | Object that handles running Atari 2600 experiments.
15 |
16 | The `TrainRunner` differs from the base `Runner` class in that it does not run the
17 | evaluation phase. Checkpointing and logging for the train phase are preserved as
18 | before.
19 |
20 | ## Methods
21 |
22 |
__init__
23 |
24 | ```python
25 | __init__(
26 | *args,
27 | **kwargs
28 | )
29 | ```
30 |
31 | Initialize the TrainRunner object in charge of running a full experiment.
32 |
33 | #### Args:
34 |
35 | * `base_dir`: str, the base directory to host all required
36 | sub-directories.
37 | * `create_agent_fn`: A function that takes as args a Tensorflow session
38 | and an Atari 2600 Gym environment, and returns an agent.
39 |
40 |
--------------------------------------------------------------------------------
/docs/api_docs/python/checkpointer.md:
--------------------------------------------------------------------------------
5 |
6 | # Module: checkpointer
7 |
8 | A checkpointing mechanism for Dopamine agents.
9 |
10 | This Checkpointer expects a base directory where checkpoints for different
11 | iterations are stored. Specifically, Checkpointer.save_checkpoint() takes in as
12 | input a dictionary 'data' to be pickled to disk. At each iteration, we write a
13 | file called 'ckpt.#', where # is the iteration number. The Checkpointer also
14 | cleans up old files, maintaining up to the CHECKPOINT_DURATION most recent
15 | iterations.
16 |
17 | The Checkpointer writes a sentinel file to indicate that checkpointing was
18 | globally successful. This means that all other checkpointing activities (saving
19 | the Tensorflow graph, the replay buffer) should be performed *prior* to calling
20 | Checkpointer.save_checkpoint(). This allows the Checkpointer to detect
21 | incomplete checkpoints.
22 |
23 | #### Example
24 |
25 | After running 10 iterations (numbered 0...9) with base_directory='/checkpoint',
26 | the following files will exist: `/checkpoint/ckpt.6 /checkpoint/ckpt.7
27 | /checkpoint/ckpt.8 /checkpoint/ckpt.9 /checkpoint/sentinel_checkpoint_complete.6
28 | /checkpoint/sentinel_checkpoint_complete.7
29 | /checkpoint/sentinel_checkpoint_complete.8
30 | /checkpoint/sentinel_checkpoint_complete.9`
31 |
32 | ## Classes
33 |
34 | [`class Checkpointer`](./checkpointer/Checkpointer.md): Class for managing
35 | checkpoints for Dopamine agents.
36 |
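A hedged usage sketch of the workflow described above; the import path assumes the module lives at `dopamine/common/checkpointer.py`, and the directory and payload are hypothetical.

```python
# Hedged sketch; the import path is an assumption and the directory/payload are
# hypothetical.
from dopamine.common import checkpointer

ckpt = checkpointer.Checkpointer('/tmp/dopamine_example/checkpoints')
# Save the TF graph, replay buffer, etc. *before* this call, so the sentinel
# file only appears once the whole iteration has been persisted.
ckpt.save_checkpoint(0, {'current_iteration': 0, 'logs': {}})
restored = ckpt.load_checkpoint(0)  # the pickled dict, or None if absent
```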
--------------------------------------------------------------------------------
/tests/atari_init_test.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | """A simple test for validating that the Atari env initializes."""
15 |
16 | import datetime
17 | import os
18 | import shutil
19 |
20 |
21 |
22 | from absl import flags
23 | from dopamine.atari import train
24 | import tensorflow as tf
25 |
26 |
27 | FLAGS = flags.FLAGS
28 |
29 |
30 | class AtariInitTest(tf.test.TestCase):
31 |
32 | def setUp(self):
33 | FLAGS.base_dir = os.path.join(
34 | '/tmp/dopamine_tests',
35 | datetime.datetime.utcnow().strftime('run_%Y_%m_%d_%H_%M_%S'))
36 | FLAGS.agent_name = 'dqn'
37 | FLAGS.gin_files = ['dopamine/agents/dqn/configs/dqn.gin']
38 | # `num_iterations` set to zero to prevent runner execution.
39 | FLAGS.gin_bindings = [
40 | 'Runner.num_iterations=0',
41 | 'WrappedReplayBuffer.replay_capacity = 100' # To prevent OOM.
42 | ]
43 | FLAGS.alsologtostderr = True
44 |
45 | def test_atari_init(self):
46 | """Tests that a DQN agent is initialized."""
47 | train.main([])
48 | shutil.rmtree(FLAGS.base_dir)
49 |
50 |
51 | if __name__ == '__main__':
52 | tf.test.main()
53 |
--------------------------------------------------------------------------------
/dopamine/agents/implicit_quantile/configs/implicit_quantile.gin:
--------------------------------------------------------------------------------
1 | # Hyperparameters follow Dabney et al. (2018), but we modify as necessary to
2 | # match those used in Rainbow (Hessel et al., 2018), to ensure apples-to-apples
3 | # comparison.
4 |
5 | import dopamine.agents.implicit_quantile.implicit_quantile_agent
6 | import dopamine.agents.rainbow.rainbow_agent
7 | import dopamine.atari.run_experiment
8 | import dopamine.replay_memory.prioritized_replay_buffer
9 | import gin.tf.external_configurables
10 |
11 | ImplicitQuantileAgent.kappa = 1.0
12 | ImplicitQuantileAgent.num_tau_samples = 64
13 | ImplicitQuantileAgent.num_tau_prime_samples = 64
14 | ImplicitQuantileAgent.num_quantile_samples = 32
15 | RainbowAgent.gamma = 0.99
16 | RainbowAgent.update_horizon = 3
17 | RainbowAgent.min_replay_history = 20000 # agent steps
18 | RainbowAgent.update_period = 4
19 | RainbowAgent.target_update_period = 8000 # agent steps
20 | RainbowAgent.epsilon_train = 0.01
21 | RainbowAgent.epsilon_eval = 0.001
22 | RainbowAgent.epsilon_decay_period = 250000 # agent steps
23 | # IQN currently does not support prioritized replay.
24 | RainbowAgent.replay_scheme = 'uniform'
25 | RainbowAgent.tf_device = '/gpu:0'  # use '/cpu:*' for non-GPU version
26 | RainbowAgent.optimizer = @tf.train.AdamOptimizer()
27 |
28 | tf.train.AdamOptimizer.learning_rate = 0.0000625
29 | tf.train.AdamOptimizer.epsilon = 0.00015
30 |
31 | Runner.game_name = 'Pong'
32 | # Sticky actions with probability 0.25, as suggested by (Machado et al., 2017).
33 | Runner.sticky_actions = True
34 | Runner.num_iterations = 200
35 | Runner.training_steps = 250000
36 | Runner.evaluation_steps = 125000
37 | Runner.max_steps_per_episode = 27000
38 |
39 | WrappedPrioritizedReplayBuffer.replay_capacity = 1000000
40 | WrappedPrioritizedReplayBuffer.batch_size = 32
41 |
--------------------------------------------------------------------------------
/dopamine/common/iteration_statistics.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | """A class for storing iteration-specific metrics.
15 | """
16 |
17 | from __future__ import absolute_import
18 | from __future__ import division
19 | from __future__ import print_function
20 |
21 |
22 | class IterationStatistics(object):
23 | """A class for storing iteration-specific metrics.
24 |
25 | The internal format is as follows: we maintain a mapping from keys to lists.
26 | Each list contains all the values corresponding to the given key.
27 |
28 | For example, self.data_lists['train_episode_returns'] might contain the
29 | per-episode returns achieved during this iteration.
30 |
31 | Attributes:
32 | data_lists: dict mapping each metric_name (str) to a list of said metric
33 | across episodes.
34 | """
35 |
36 | def __init__(self):
37 | self.data_lists = {}
38 |
39 | def append(self, data_pairs):
40 | """Add the given values to their corresponding key-indexed lists.
41 |
42 | Args:
43 | data_pairs: A dictionary of key-value pairs to be recorded.
44 | """
45 | for key, value in data_pairs.items():
46 | if key not in self.data_lists:
47 | self.data_lists[key] = []
48 | self.data_lists[key].append(value)
49 |
--------------------------------------------------------------------------------
/docs/api_docs/python/logger/Logger.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 | # logger.Logger
11 |
12 | ## Class `Logger`
13 |
14 | Class for maintaining a dictionary of data to log.
15 |
16 | ## Methods
17 |
18 |
31 |
32 | ```python
33 | __setitem__(
34 | key,
35 | value
36 | )
37 | ```
38 |
39 | This method will set an entry at key with value in the dictionary.
40 |
41 | It will effectively overwrite any previous data at the same key.
42 |
43 | #### Args:
44 |
45 | * `key`: str, indicating key where to write the entry.
46 | * `value`: A python object to store.
47 |
48 |
57 |
58 | ```python
59 | log_to_file(
60 | filename_prefix,
61 | iteration_number
62 | )
63 | ```
64 |
65 | Save the pickled dictionary to a file.
66 |
67 | #### Args:
68 |
69 | * `filename_prefix`: str, name of the file to use (without iteration
70 | number).
71 | * `iteration_number`: int, the iteration number, appended to the end of
72 | filename_prefix.
73 |
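A hedged usage sketch; the constructor taking a logging directory is an assumption based on `dopamine/common/logger.py`, and the paths are hypothetical.

```python
# Hedged sketch; the constructor argument is an assumption and the paths are
# hypothetical.
from dopamine.common import logger

experiment_logger = logger.Logger('/tmp/dopamine_example/logs')
experiment_logger['iteration_0'] = {'train_episode_returns': [12.0, 15.5]}
# Writes the pickled dictionary to /tmp/dopamine_example/logs/log_0.
experiment_logger.log_to_file('log', 0)
```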
--------------------------------------------------------------------------------
/docs/api_docs/python/utils/read_experiment.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # utils.read_experiment
7 |
8 | ```python
9 | utils.read_experiment(
10 | log_path,
11 | parameter_set=None,
12 | job_descriptor='',
13 | iteration_number=None,
14 | summary_keys=('train_episode_returns', 'eval_episode_returns'),
15 | verbose=False
16 | )
17 | ```
18 |
19 | Reads in a set of experimental results from log_path.
20 |
21 | The provided parameter_set is an ordered_dict which 1) defines the parameters of
22 | this experiment, 2) defines the order in which they occur in the job descriptor.
23 |
24 | The method reads all experiments of the form
25 |
26 | ${log_path}/${job_descriptor}.format(params)/logs,
27 |
28 | where params is constructed from the cross product of the elements in the
29 | parameter_set.
30 |
31 | For example, with parameter_set = collections.OrderedDict([('game', ['Asterix',
32 | 'Pong']), ('epsilon', ['0', '0.1'])]), calling read_experiment('/tmp/logs',
33 | parameter_set, job_descriptor='{}_{}') will try to read logs from:
34 | /tmp/logs/Asterix_0/logs, /tmp/logs/Asterix_0.1/logs, /tmp/logs/Pong_0/logs, and
35 | /tmp/logs/Pong_0.1/logs.
36 |
37 | #### Args:
38 |
39 | * `log_path`: string, base path specifying where results live.
40 | * `parameter_set`: An ordered_dict mapping parameter names to allowable
41 | values.
42 | * `job_descriptor`: A job descriptor string which is used to construct
43 | the full path for each trial within an experiment.
44 | * `iteration_number`: Int, if not None determines the iteration number
45 | at which we read in results.
46 | * `summary_keys`: Iterable of strings, iteration statistics to
47 | summarize.
48 | * `verbose`: If True, print out additional information.
49 |
50 | #### Returns:
51 |
52 | A Pandas dataframe containing experimental results.
53 |
--------------------------------------------------------------------------------
/docs/api_docs/python/checkpointer/Checkpointer.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 | # checkpointer.Checkpointer
10 |
11 | ## Class `Checkpointer`
12 |
13 | Class for managing checkpoints for Dopamine agents.
14 |
15 | ## Methods
16 |
17 |
__init__
18 |
19 | ```python
20 | __init__(
21 | base_directory,
22 | checkpoint_file_prefix='ckpt',
23 | checkpoint_frequency=1
24 | )
25 | ```
26 |
27 | Initializes Checkpointer.
28 |
29 | #### Args:
30 |
31 | * `base_directory`: str, directory where all checkpoints are
32 | saved/loaded.
33 | * `checkpoint_file_prefix`: str, prefix to use for naming checkpoint
34 | files.
35 | * `checkpoint_frequency`: int, the frequency at which to checkpoint.
36 |
37 | #### Raises:
38 |
39 | * `ValueError`: if base_directory is empty, or not creatable.
40 |
41 |
load_checkpoint
42 |
43 | ```python
44 | load_checkpoint(iteration_number)
45 | ```
46 |
47 | Tries to reload a checkpoint at the selected iteration number.
48 |
49 | #### Args:
50 |
51 | * `iteration_number`: The checkpoint iteration number to try to load.
52 |
53 | #### Returns:
54 |
55 | If the checkpoint files exist, the unpickled object that was passed in as data
56 | to save_checkpoint; returns None if the files do not exist.
57 |
58 |
save_checkpoint
59 |
60 | ```python
61 | save_checkpoint(
62 | iteration_number,
63 | data
64 | )
65 | ```
66 |
67 | Saves a new checkpoint at the current iteration_number.
68 |
69 | #### Args:
70 |
71 | * `iteration_number`: int, the current iteration number for this
72 | checkpoint.
73 | * `data`: Any (picklable) python object containing the data to store in
74 | the checkpoint.
75 |
--------------------------------------------------------------------------------
/docs/api_docs/python/rainbow_agent/project_distribution.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | # rainbow_agent.project_distribution
7 |
8 | ```python
9 | rainbow_agent.project_distribution(
10 | supports,
11 | weights,
12 | target_support,
13 | validate_args=False
14 | )
15 | ```
16 |
17 | Projects a batch of (support, weights) onto target_support.
18 |
19 | Based on equation (7) in (Bellemare et al., 2017):
20 | https://arxiv.org/abs/1707.06887. In the rest of the comments, we will refer to
21 | this equation simply as Eq7.
22 |
23 | This code is not easy to digest, so we will use a running example to clarify
24 | what is going on, with the following sample inputs:
25 |
26 | * supports = [[0, 2, 4, 6, 8], [1, 3, 4, 5, 6]]
27 | * weights = [[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.2, 0.5, 0.1, 0.1]]
28 | * target_support = [4, 5, 6, 7, 8]
29 |
30 | In the code below, comments preceded with 'Ex:' will be referencing the above
31 | values.
32 |
33 | #### Args:
34 |
35 | * `supports`: Tensor of shape (batch_size, num_dims) defining supports
36 | for the distribution.
37 | * `weights`: Tensor of shape (batch_size, num_dims) defining weights on
38 | the original support points. Although for the CategoricalDQN agent these
39 | weights are probabilities, it is not required that they are.
40 | * `target_support`: Tensor of shape (num_dims) defining support of the
41 | projected distribution. The values must be monotonically increasing. Vmin
42 | and Vmax will be inferred from the first and last elements of this tensor,
43 | respectively. The values in this tensor must be equally spaced.
44 | * `validate_args`: Whether we will verify the contents of the
45 | target_support parameter.
46 |
47 | #### Returns:
48 |
49 | A Tensor of shape (batch_size, num_dims) with the projection of a batch of
50 | (support, weights) onto target_support.
51 |
52 | #### Raises:
53 |
54 | * `ValueError`: If target_support has no dimensions, or if shapes of
55 | supports, weights, and target_support are incompatible.
56 |
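A usage sketch that runs the documented example inputs through the function in TF1 graph mode, matching the rest of this codebase.

```python
# Usage sketch: run the documented example inputs through project_distribution
# (TF1 graph mode, as used throughout this codebase).
import tensorflow as tf
from dopamine.agents.rainbow import rainbow_agent

supports = tf.constant([[0., 2., 4., 6., 8.], [1., 3., 4., 5., 6.]])
weights = tf.constant([[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.2, 0.5, 0.1, 0.1]])
target_support = tf.constant([4., 5., 6., 7., 8.])

projection = rainbow_agent.project_distribution(supports, weights, target_support)
with tf.Session() as sess:
  # Each output row has shape (5,) and keeps the same total mass as its input.
  print(sess.run(projection))
```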
--------------------------------------------------------------------------------
/docs/api_docs/python/run_experiment/Runner.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 | # run_experiment.Runner
9 |
10 | ## Class `Runner`
11 |
12 | Object that handles running Atari 2600 experiments.
13 |
14 | Here we use the term 'experiment' to mean simulating interactions between the
15 | agent and the environment and reporting some statistics pertaining to these
16 | interactions.
17 |
18 | A simple scenario to train a DQN agent is as follows:
19 |
20 | ```python
21 | base_dir = '/tmp/simple_example'
22 | def create_agent(sess, environment):
23 | return dqn_agent.DQNAgent(sess, num_actions=environment.action_space.n)
24 | runner = Runner(base_dir, create_agent, game_name='Pong')
25 | runner.run()
26 | ```
27 |
28 | ## Methods
29 |
30 |
__init__
31 |
32 | ```python
33 | __init__(
34 | *args,
35 | **kwargs
36 | )
37 | ```
38 |
39 | Initialize the Runner object in charge of running a full experiment.
40 |
41 | #### Args:
42 |
43 | * `base_dir`: str, the base directory to host all required
44 | sub-directories.
45 | * `create_agent_fn`: A function that takes as args a Tensorflow session
46 | and an Atari 2600 Gym environment, and returns an agent.
47 | * `create_environment_fn`: A function which receives a game name and
48 | creates an Atari 2600 Gym environment.
49 | * `game_name`: str, name of the Atari 2600 domain to run (required).
50 | * `sticky_actions`: bool, whether to enable sticky actions in the
51 | environment.
52 | * `checkpoint_file_prefix`: str, the prefix to use for checkpoint
53 | files.
54 | * `logging_file_prefix`: str, prefix to use for the log files.
55 | * `log_every_n`: int, the frequency for writing logs.
56 | * `num_iterations`: int, the iteration number threshold (must be
57 | greater than start_iteration).
58 | * `training_steps`: int, the number of training steps to perform.
59 | * `evaluation_steps`: int, the number of evaluation steps to perform.
60 | * `max_steps_per_episode`: int, maximum number of steps after which an
61 | episode terminates.
62 |
63 | This constructor will take the following actions: initialize an environment;
64 | initialize a `tf.Session`; initialize a logger; initialize an agent; and
65 | reload from the latest checkpoint, if available, and initialize the
66 | Checkpointer object.
67 |
68 |
run_experiment
69 |
70 | ```python
71 | run_experiment()
72 | ```
73 |
74 | Runs a full experiment, spread over multiple iterations.
75 |
--------------------------------------------------------------------------------
/tests/common/iteration_statistics_test.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | """Tests for dopamine.common.iteration_statistics."""
15 |
16 | from __future__ import absolute_import
17 | from __future__ import division
18 | from __future__ import print_function
19 |
20 |
21 |
22 | from dopamine.common import iteration_statistics
23 | import tensorflow as tf
24 |
25 |
26 | class IterationStatisticsTest(tf.test.TestCase):
27 |
28 | def setUp(self):
29 | pass
30 |
31 | def testMissingValue(self):
32 | statistics = iteration_statistics.IterationStatistics()
33 | with self.assertRaises(KeyError):
34 | _ = statistics.data_lists['missing_key']
35 |
36 | def testAddOneValue(self):
37 | statistics = iteration_statistics.IterationStatistics()
38 |
39 | # The statistics data structure should be empty a-priori.
40 | self.assertEqual(0, len(statistics.data_lists))
41 |
42 | statistics.append({'key1': 0})
43 | # We should have exactly one list, containing one value.
44 | self.assertEqual(1, len(statistics.data_lists))
45 | self.assertEqual(1, len(statistics.data_lists['key1']))
46 | self.assertEqual(0, statistics.data_lists['key1'][0])
47 |
48 | def testAddManyValues(self):
49 | my_pi = 3.14159
50 |
51 | statistics = iteration_statistics.IterationStatistics()
52 |
53 | # Add a number of items. Each item is added to the list corresponding to its
54 | # given key.
55 | statistics.append({'rewards': 0,
56 | 'nouns': 'reinforcement',
57 | 'angles': my_pi})
58 | # Add a second item to the 'nouns' list.
59 | statistics.append({'nouns': 'learning'})
60 |
61 | # There are three lists.
62 | self.assertEqual(3, len(statistics.data_lists))
63 | self.assertEqual(1, len(statistics.data_lists['rewards']))
64 | self.assertEqual(2, len(statistics.data_lists['nouns']))
65 | self.assertEqual(1, len(statistics.data_lists['angles']))
66 |
67 | self.assertEqual(0, statistics.data_lists['rewards'][0])
68 | self.assertEqual('reinforcement', statistics.data_lists['nouns'][0])
69 | self.assertEqual('learning', statistics.data_lists['nouns'][1])
70 | self.assertEqual(my_pi, statistics.data_lists['angles'][0])
71 |
72 | if __name__ == '__main__':
73 | tf.test.main()
74 |
--------------------------------------------------------------------------------
/tests/train_runner_integration_test.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | """End to end tests for TrainRunner."""
15 |
16 | import datetime
17 | import os
18 | import shutil
19 |
20 |
21 |
22 | from absl import flags
23 |
24 | from dopamine.atari import train
25 | import tensorflow as tf
26 |
27 | FLAGS = flags.FLAGS
28 |
29 |
30 | class TrainRunnerIntegrationTest(tf.test.TestCase):
31 | """Tests for Atari environment with various agents.
32 |
33 | """
34 |
35 | def setUp(self):
36 | FLAGS.base_dir = os.path.join(
37 | '/tmp/dopamine_tests',
38 | datetime.datetime.utcnow().strftime('run_%Y_%m_%d_%H_%M_%S'))
39 | self._checkpoint_dir = os.path.join(FLAGS.base_dir, 'checkpoints')
40 | self._logging_dir = os.path.join(FLAGS.base_dir, 'logs')
41 | FLAGS.schedule = 'continuous_train'
42 |
43 | def quickDqnFlags(self):
44 | """Assign flags for a quick run of DQN agent."""
45 | FLAGS.agent_name = 'dqn'
46 | FLAGS.gin_files = ['dopamine/agents/dqn/configs/dqn.gin']
47 | FLAGS.gin_bindings = [
48 | 'Runner.training_steps=100', 'Runner.evaluation_steps=10',
49 | 'Runner.num_iterations=1', 'Runner.max_steps_per_episode=100',
50 | 'dqn_agent.DQNAgent.min_replay_history=500',
51 | 'WrappedReplayBuffer.replay_capacity=100'
52 | ]
53 | FLAGS.alsologtostderr = True
54 |
55 | def verifyFilesCreated(self, base_dir):
56 | """Verify that files have been created."""
57 | # Check checkpoint files
58 | self.assertTrue(
59 | os.path.exists(os.path.join(self._checkpoint_dir, 'ckpt.0')))
60 | self.assertTrue(
61 | os.path.exists(os.path.join(self._checkpoint_dir, 'checkpoint')))
62 | self.assertTrue(
63 | os.path.exists(
64 | os.path.join(self._checkpoint_dir,
65 | 'sentinel_checkpoint_complete.0')))
66 | # Check log files
67 | self.assertTrue(os.path.exists(os.path.join(self._logging_dir, 'log_0')))
68 |
69 | def testIntegrationDqn(self):
70 | """Test the DQN agent."""
71 | tf.logging.info('####### Training the DQN agent #####')
72 | tf.logging.info('####### DQN base_dir: {}'.format(FLAGS.base_dir))
73 | self.quickDqnFlags()
74 | train.main([])
75 | self.verifyFilesCreated(FLAGS.base_dir)
76 | shutil.rmtree(FLAGS.base_dir)
77 |
78 |
79 | if __name__ == '__main__':
80 | tf.test.main()
81 |
--------------------------------------------------------------------------------
/docs/api_docs/python/index.md:
--------------------------------------------------------------------------------
1 | # All symbols in Dopamine
2 |
3 | * checkpointer
4 | * checkpointer.Checkpointer
5 | * circular_replay_buffer
6 | * circular_replay_buffer.OutOfGraphReplayBuffer
7 | * circular_replay_buffer.WrappedReplayBuffer
8 | * dqn_agent
9 | * dqn_agent.DQNAgent
10 | * implicit_quantile_agent
11 | * implicit_quantile_agent.ImplicitQuantileAgent
12 | * iteration_statistics
13 | * iteration_statistics.IterationStatistics
14 | * logger
15 | * logger.Logger
16 | * prioritized_replay_buffer
17 | * prioritized_replay_buffer.OutOfGraphPrioritizedReplayBuffer
18 | * prioritized_replay_buffer.WrappedPrioritizedReplayBuffer
19 | * rainbow_agent
20 | * rainbow_agent.RainbowAgent
21 | * rainbow_agent.project_distribution
22 | * run_experiment
23 | * run_experiment.Runner
24 | * run_experiment.TrainRunner
25 | * train
26 | * train.create_agent
27 | * train.create_runner
28 | * train.launch_experiment
29 | * utils
30 | * utils.get_latest_file
31 | * utils.get_latest_iteration
32 | * utils.load_baselines
33 | * utils.load_statistics
34 | * utils.read_experiment
35 | * utils.summarize_data
36 |
--------------------------------------------------------------------------------
/baselines/README.md:
--------------------------------------------------------------------------------
1 | # Baseline data
2 |
3 | This directory provides information about the baseline data provided by
4 | Dopamine. The default hyperparameter configuration for the 4 agents we provide
5 | yields a standardized "apples to apples" comparison between them.
6 |
7 | The default configuration files for each agent (set up with the
8 | [gin configuration framework](https://github.com/google/gin-config)) are:
9 |
10 | * [`dopamine/agents/dqn/configs/dqn.gin`](https://github.com/google/dopamine/blob/master/dopamine/agents/dqn/configs/dqn.gin)
11 | * [`dopamine/agents/rainbow/configs/c51.gin`](https://github.com/google/dopamine/blob/master/dopamine/agents/rainbow/configs/c51.gin)
12 | * [`dopamine/agents/rainbow/configs/rainbow.gin`](https://github.com/google/dopamine/blob/master/dopamine/agents/rainbow/configs/rainbow.gin)
13 | * [`dopamine/agents/implicit_quantile/configs/implicit_quantile.gin`](https://github.com/google/dopamine/blob/master/dopamine/agents/implicit_quantile/configs/implicit_quantile.gin)
14 |
15 | ## Hyperparameter comparison
16 | Our results compare the agents with the same hyperparameters: target
17 | network update frequency, frequency at which exploratory actions are selected (ε), the
18 | length of the schedule over which ε is annealed, and the number of agent steps
19 | before training occurs. Changing these parameters can significantly affect
20 | performance, without necessarily being indicative of an algorithmic difference.
21 | Unsurprisingly, DQN performs much better when trained with 1% of exploratory
22 | actions instead of 10% (as used in the original Nature paper). Step size and
23 | optimizer were taken as published. The table below summarizes our choices. All
24 | numbers are in ALE frames.
25 |
26 | | | Our baseline results | [DQN][dqn] | [C51][c51] | [Rainbow][rainbow] | [IQN][iqn] |
27 | | :---------------------------------- | :------------------: | :--------: | :--------: | :----------------: | :--------: |
28 | | **Training ε** | 0.01 | 0.1 | 0.01 | 0.01 | 0.01 |
29 | | **Evaluation ε** | 0.001 | 0.01 | 0.001 | * | 0.001 |
30 | | **ε decay schedule** | 1,000,000 frames | 4,000,000 frames | 4,000,000 frames | 1,000,000 frames | 4,000,000 frames |
31 | | **Min. history to start learning** | 80,000 frames | 200,000 frames | 200,000 frames | 80,000 frames | 200,000 frames |
32 | | **Target network update frequency** | 32,000 frames | 40,000 frames | 40,000 frames | 32,000 frames | 40,000 frames |
33 |
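Note that the gin configurations above are expressed in agent steps, while this table is in ALE frames; with Dopamine's Atari frame skip of 4, the conversion is frames = 4 * agent steps. A quick sanity check against `implicit_quantile.gin`:

```python
# Sanity check: gin values are in agent steps, the table above is in ALE frames,
# and the Atari preprocessing uses a frame skip of 4.
FRAME_SKIP = 4

print(20000 * FRAME_SKIP)    # min_replay_history   -> 80,000 frames
print(8000 * FRAME_SKIP)     # target_update_period -> 32,000 frames
print(250000 * FRAME_SKIP)   # epsilon_decay_period -> 1,000,000 frames
```
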
34 | ## Visualization
35 | We provide a [website](https://google.github.io/dopamine/baselines/plots.html)
36 | where you can quickly visualize the training runs for all our default agents.
37 |
38 | The plots are rendered from a set of
39 | [JSON files](https://github.com/google/dopamine/tree/master/baselines/data)
40 | which we compiled. These may prove useful in their own right to compare
41 | against results obtained from other frameworks.
42 |
43 |
44 | [dqn]: https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
45 | [c51]: https://arxiv.org/abs/1707.06887
46 | [rainbow]: https://arxiv.org/abs/1710.02298
47 | [iqn]: https://arxiv.org/abs/1806.06923
48 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | """Setup script for Dopamine.
15 |
16 | This script will install Dopamine as a Python module.
17 |
18 | See: https://github.com/google/dopamine
19 |
20 | """
21 |
22 | import codecs
23 | from os import path
24 | from setuptools import find_packages
25 | from setuptools import setup
26 |
27 | here = path.abspath(path.dirname(__file__))
28 |
29 | # Get the long description from the README file.
30 | with codecs.open(path.join(here, 'README.md'), encoding='utf-8') as f:
31 | long_description = f.read()
32 |
33 | install_requires = ['gin-config >= 0.1.1', 'absl-py >= 0.2.2',
34 | 'tensorflow', 'opencv-python >= 3.4.1.15',
35 | 'gym >= 0.10.5']
36 | tests_require = ['gin-config >= 0.1.1', 'absl-py >= 0.2.2',
37 | 'tensorflow >= 1.9.0', 'opencv-python >= 3.4.1.15',
38 | 'gym >= 0.10.5', 'mock >= 1.0.0']
39 |
40 | dopamine_description = (
41 | 'Dopamine: A framework for flexible Reinforcement Learning research')
42 |
43 | setup(
44 | name='dopamine_rl',
45 | version='1.0.2',
46 | include_package_data=True,
47 | packages=find_packages(exclude=['docs']), # Required
48 | package_data={'testdata': ['testdata/*.gin']},
49 | install_requires=install_requires,
50 | tests_require=tests_require,
51 | description=dopamine_description,
52 | long_description=long_description,
53 | url='https://github.com/google/dopamine', # Optional
54 | author='The Dopamine Team', # Optional
55 | author_email='opensource@google.com',
56 | classifiers=[ # Optional
57 | 'Development Status :: 4 - Beta',
58 |
59 | # Indicate who your project is intended for
60 | 'Intended Audience :: Developers',
61 | 'Intended Audience :: Education',
62 | 'Intended Audience :: Science/Research',
63 |
64 | # Pick your license as you wish
65 | 'License :: OSI Approved :: Apache Software License',
66 |
67 | # Specify the Python versions you support here. In particular, ensure
68 | # that you indicate whether you support Python 2, Python 3 or both.
69 | 'Programming Language :: Python :: 2',
70 | 'Programming Language :: Python :: 2.7',
71 | 'Programming Language :: Python :: 3',
72 | 'Programming Language :: Python :: 3.4',
73 | 'Programming Language :: Python :: 3.5',
74 | 'Programming Language :: Python :: 3.6',
75 |
76 | 'Topic :: Scientific/Engineering',
77 | 'Topic :: Scientific/Engineering :: Mathematics',
78 | 'Topic :: Scientific/Engineering :: Artificial Intelligence',
79 | 'Topic :: Software Development',
80 | 'Topic :: Software Development :: Libraries',
81 | 'Topic :: Software Development :: Libraries :: Python Modules',
82 |
83 | ],
84 | project_urls={ # Optional
85 | 'Documentation': 'https://github.com/google/dopamine',
86 | 'Bug Reports': 'https://github.com/google/dopamine/issues',
87 | 'Source': 'https://github.com/google/dopamine',
88 | },
89 | license='Apache 2.0',
90 | keywords='dopamine reinforcement-learning python machine learning'
91 | )
92 |
--------------------------------------------------------------------------------
/docs/api_docs/python/_toc.yaml:
--------------------------------------------------------------------------------
1 | # Automatically generated file; please do not edit
2 | toc:
3 | - title: checkpointer
4 | section:
5 | - title: Overview
6 | path: /dopamine/api_docs/python/checkpointer
7 | - title: Checkpointer
8 | path: /dopamine/api_docs/python/checkpointer/Checkpointer
9 | - title: circular_replay_buffer
10 | section:
11 | - title: Overview
12 | path: /dopamine/api_docs/python/circular_replay_buffer
13 | - title: OutOfGraphReplayBuffer
14 | path: /dopamine/api_docs/python/circular_replay_buffer/OutOfGraphReplayBuffer
15 | - title: WrappedReplayBuffer
16 | path: /dopamine/api_docs/python/circular_replay_buffer/WrappedReplayBuffer
17 | - title: dqn_agent
18 | section:
19 | - title: Overview
20 | path: /dopamine/api_docs/python/dqn_agent
21 | - title: DQNAgent
22 | path: /dopamine/api_docs/python/dqn_agent/DQNAgent
23 | - title: implicit_quantile_agent
24 | section:
25 | - title: Overview
26 | path: /dopamine/api_docs/python/implicit_quantile_agent
27 | - title: ImplicitQuantileAgent
28 | path: /dopamine/api_docs/python/implicit_quantile_agent/ImplicitQuantileAgent
29 | - title: iteration_statistics
30 | section:
31 | - title: Overview
32 | path: /dopamine/api_docs/python/iteration_statistics
33 | - title: IterationStatistics
34 | path: /dopamine/api_docs/python/iteration_statistics/IterationStatistics
35 | - title: logger
36 | section:
37 | - title: Overview
38 | path: /dopamine/api_docs/python/logger
39 | - title: Logger
40 | path: /dopamine/api_docs/python/logger/Logger
41 | - title: prioritized_replay_buffer
42 | section:
43 | - title: Overview
44 | path: /dopamine/api_docs/python/prioritized_replay_buffer
45 | - title: OutOfGraphPrioritizedReplayBuffer
46 | path: /dopamine/api_docs/python/prioritized_replay_buffer/OutOfGraphPrioritizedReplayBuffer
47 | - title: WrappedPrioritizedReplayBuffer
48 | path: /dopamine/api_docs/python/prioritized_replay_buffer/WrappedPrioritizedReplayBuffer
49 | - title: rainbow_agent
50 | section:
51 | - title: Overview
52 | path: /dopamine/api_docs/python/rainbow_agent
53 | - title: project_distribution
54 | path: /dopamine/api_docs/python/rainbow_agent/project_distribution
55 | - title: RainbowAgent
56 | path: /dopamine/api_docs/python/rainbow_agent/RainbowAgent
57 | - title: run_experiment
58 | section:
59 | - title: Overview
60 | path: /dopamine/api_docs/python/run_experiment
61 | - title: Runner
62 | path: /dopamine/api_docs/python/run_experiment/Runner
63 | - title: TrainRunner
64 | path: /dopamine/api_docs/python/run_experiment/TrainRunner
65 | - title: train
66 | section:
67 | - title: Overview
68 | path: /dopamine/api_docs/python/train
69 | - title: create_agent
70 | path: /dopamine/api_docs/python/train/create_agent
71 | - title: create_runner
72 | path: /dopamine/api_docs/python/train/create_runner
73 | - title: launch_experiment
74 | path: /dopamine/api_docs/python/train/launch_experiment
75 | - title: utils
76 | section:
77 | - title: Overview
78 | path: /dopamine/api_docs/python/utils
79 | - title: get_latest_file
80 | path: /dopamine/api_docs/python/utils/get_latest_file
81 | - title: get_latest_iteration
82 | path: /dopamine/api_docs/python/utils/get_latest_iteration
83 | - title: load_baselines
84 | path: /dopamine/api_docs/python/utils/load_baselines
85 | - title: load_statistics
86 | path: /dopamine/api_docs/python/utils/load_statistics
87 | - title: read_experiment
88 | path: /dopamine/api_docs/python/utils/read_experiment
89 | - title: summarize_data
90 | path: /dopamine/api_docs/python/utils/summarize_data
91 |
--------------------------------------------------------------------------------
/dopamine/common/logger.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | """A lightweight logging mechanism for dopamine agents."""
15 |
16 | from __future__ import absolute_import
17 | from __future__ import division
18 | from __future__ import print_function
19 |
20 | import os
21 | import pickle
22 | import tensorflow as tf
23 |
24 |
25 | CHECKPOINT_DURATION = 4
26 |
27 |
28 | class Logger(object):
29 | """Class for maintaining a dictionary of data to log."""
30 |
31 | def __init__(self, logging_dir):
32 | """Initializes Logger.
33 |
34 | Args:
35 | logging_dir: str, Directory to which logs are written.
36 | """
37 | # Dict used by logger to store data.
38 | self.data = {}
39 | self._logging_enabled = True
40 |
41 | if not logging_dir:
42 | tf.logging.info('Logging directory not specified, will not log.')
43 | self._logging_enabled = False
44 | return
45 | # Try to create logging directory.
46 | try:
47 | tf.gfile.MakeDirs(logging_dir)
48 | except tf.errors.PermissionDeniedError:
49 | # If it already exists, ignore exception.
50 | pass
51 | if not tf.gfile.Exists(logging_dir):
52 | tf.logging.warning(
53 | 'Could not create directory %s, logging will be disabled.',
54 | logging_dir)
55 | self._logging_enabled = False
56 | return
57 | self._logging_dir = logging_dir
58 |
59 | def __setitem__(self, key, value):
60 | """This method will set an entry at key with value in the dictionary.
61 |
62 | It will effectively overwrite any previous data at the same key.
63 |
64 | Args:
65 | key: str, indicating key where to write the entry.
66 | value: A python object to store.
67 | """
68 | if self._logging_enabled:
69 | self.data[key] = value
70 |
71 | def _generate_filename(self, filename_prefix, iteration_number):
72 | filename = '{}_{}'.format(filename_prefix, iteration_number)
73 | return os.path.join(self._logging_dir, filename)
74 |
75 | def log_to_file(self, filename_prefix, iteration_number):
76 | """Save the pickled dictionary to a file.
77 |
78 | Args:
79 | filename_prefix: str, name of the file to use (without iteration
80 | number).
81 | iteration_number: int, the iteration number, appended to the end of
82 | filename_prefix.
83 | """
84 | if not self._logging_enabled:
85 | tf.logging.warning('Logging is disabled.')
86 | return
87 | log_file = self._generate_filename(filename_prefix, iteration_number)
88 | with tf.gfile.GFile(log_file, 'w') as fout:
89 | pickle.dump(self.data, fout, protocol=pickle.HIGHEST_PROTOCOL)
90 | # After writing a checkpoint file, we garbage collect the log file
91 | # that is CHECKPOINT_DURATION versions old.
92 | stale_iteration_number = iteration_number - CHECKPOINT_DURATION
93 | if stale_iteration_number >= 0:
94 | stale_file = self._generate_filename(filename_prefix,
95 | stale_iteration_number)
96 | try:
97 | tf.gfile.Remove(stale_file)
98 | except tf.errors.NotFoundError:
99 | # Ignore if file not found.
100 | pass
101 |
102 | def is_logging_enabled(self):
103 | """Return if logging is enabled."""
104 | return self._logging_enabled
105 |
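A minimal usage sketch of the `Logger` class above; the directory path and keys are illustrative assumptions, not part of the module.

```python
# Illustrative sketch: '/tmp/dopamine_logs' and the keys below are assumptions.
from dopamine.common import logger

experiment_logger = logger.Logger('/tmp/dopamine_logs')
experiment_logger['iteration_0'] = {'average_return': 12.3}

if experiment_logger.is_logging_enabled():
  # Writes the pickled dict to /tmp/dopamine_logs/log_0; log files older than
  # CHECKPOINT_DURATION iterations are garbage collected.
  experiment_logger.log_to_file('log', 0)
```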
--------------------------------------------------------------------------------
/tests/atari/preprocessing_test.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | """Tests for dopamine.atari.run_experiment."""
15 |
16 | from __future__ import absolute_import
17 | from __future__ import division
18 | from __future__ import print_function
19 |
20 |
21 |
22 | from absl import flags
23 | from dopamine.atari import preprocessing
24 | import numpy as np
25 | import tensorflow as tf
26 |
27 | FLAGS = flags.FLAGS
28 |
29 |
30 | class MockALE(object):
31 | """Mock internal ALE for testing."""
32 |
33 | def __init__(self):
34 | pass
35 |
36 | def lives(self):
37 | return 1
38 |
39 | def getScreenGrayscale(self, screen): # pylint: disable=invalid-name
40 | screen.fill(self.screen_value)
41 |
42 |
43 | class MockEnvironment(object):
44 | """Mock environment for testing."""
45 |
46 | def __init__(self, screen_size=10, max_steps=10):
47 | self.max_steps = max_steps
48 | self.screen_size = screen_size
49 | self.ale = MockALE()
50 | self.observation_space = np.empty((screen_size, screen_size))
51 | self.game_over = False
52 |
53 | def reset(self):
54 | self.ale.screen_value = 10
55 | self.num_steps = 0
56 | return self.get_observation()
57 |
58 | def get_observation(self):
59 | observation = np.empty((self.screen_size, self.screen_size))
60 | return self.ale.getScreenGrayscale(observation)
61 |
62 | def step(self, action):
63 | reward = -1. if action > 0 else 1.
64 | self.num_steps += 1
65 | is_terminal = self.num_steps >= self.max_steps
66 |
67 | unused = 0
68 | self.ale.screen_value -= 2
69 | return (self.get_observation(), reward, is_terminal, unused)
70 |
71 | def render(self, mode):
72 | pass
73 |
74 |
75 | class AtariPreprocessingTest(tf.test.TestCase):
76 |
77 | def testResetPassesObservation(self):
78 | env = MockEnvironment()
79 | env = preprocessing.AtariPreprocessing(env, frame_skip=1, screen_size=16)
80 | observation = env.reset()
81 |
82 | self.assertEqual(observation.shape, (16, 16, 1))
83 |
84 | def testTerminalPassedThrough(self):
85 | max_steps = 10
86 | env = MockEnvironment(max_steps=max_steps)
87 | env = preprocessing.AtariPreprocessing(env, frame_skip=1)
88 | env.reset()
89 |
90 | # Make sure we get the right number of steps.
91 | for _ in range(max_steps - 1):
92 | _, _, is_terminal, _ = env.step(0)
93 | self.assertFalse(is_terminal)
94 |
95 | _, _, is_terminal, _ = env.step(0)
96 | self.assertTrue(is_terminal)
97 |
98 | def testFrameSkipAccumulatesReward(self):
99 | frame_skip = 2
100 | env = MockEnvironment()
101 | env = preprocessing.AtariPreprocessing(env, frame_skip=frame_skip)
102 | env.reset()
103 |
104 |     # Reward is 1 per frame when we pass in action 0, so the reward
105 |     # accumulated over frame_skip frames equals frame_skip.
106 | _, reward, _, _ = env.step(0)
107 | self.assertEqual(reward, frame_skip)
108 |
109 | def testMaxFramePooling(self):
110 | frame_skip = 2
111 | env = MockEnvironment()
112 | env = preprocessing.AtariPreprocessing(env, frame_skip=frame_skip)
113 | env.reset()
114 |
115 |     # The first frame has value 8, the second 6; max pooling keeps 8.
116 | observation, _, _, _ = env.step(0)
117 | self.assertTrue((observation == 8).all())
118 |
119 | if __name__ == '__main__':
120 | tf.test.main()
121 |
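The mocks above exercise the wrapper in isolation. As a rough sketch of the same preprocessing applied to a real Gym Atari environment (assuming `gym` with the Atari extras is installed and `PongNoFrameskip-v4` is available):

```python
# Illustrative sketch under the assumptions stated above.
import gym
from dopamine.atari import preprocessing

# Strip gym's TimeLimit wrapper, then apply the standard DQN preprocessing:
# grayscale, resize to 84x84, frame skipping with max pooling.
environment = gym.make('PongNoFrameskip-v4').env
environment = preprocessing.AtariPreprocessing(
    environment, frame_skip=4, screen_size=84)

observation = environment.reset()  # shape (84, 84, 1)
observation, reward, is_terminal, _ = environment.step(0)
```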
--------------------------------------------------------------------------------
/tests/integration_test.py:
--------------------------------------------------------------------------------
1 | # Copyright 2018 The Dopamine Authors.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | """End to end integration tests for Dopamine package."""
15 |
16 | import datetime
17 | import os
18 | import shutil
19 |
20 |
21 |
22 | from absl import flags
23 | from dopamine.atari import train
24 | import tensorflow as tf
25 |
26 | import gin.tf
27 |
28 |
29 | FLAGS = flags.FLAGS
30 |
31 |
32 | class AtariIntegrationTest(tf.test.TestCase):
33 | """Tests for Atari environment with various agents.
34 |
35 | """
36 |
37 | def setUp(self):
38 | FLAGS.base_dir = os.path.join(
39 | '/tmp/dopamine_tests',
40 | datetime.datetime.utcnow().strftime('run_%Y_%m_%d_%H_%M_%S'))
41 | self._checkpoint_dir = os.path.join(FLAGS.base_dir, 'checkpoints')
42 | self._logging_dir = os.path.join(FLAGS.base_dir, 'logs')
43 | FLAGS.alsologtostderr = True
44 | gin.clear_config()
45 |
46 | def quickDqnFlags(self):
47 | """Assign flags for a quick run of DQNAgent."""
48 | FLAGS.agent_name = 'dqn'
49 | FLAGS.gin_files = ['dopamine/agents/dqn/configs/dqn.gin']
50 | FLAGS.gin_bindings = [
51 | 'Runner.training_steps=100', 'Runner.evaluation_steps=10',
52 | 'Runner.num_iterations=1', 'Runner.max_steps_per_episode=100',
53 | 'dqn_agent.DQNAgent.min_replay_history=500',
54 | 'WrappedReplayBuffer.replay_capacity=100'
55 | ]
56 |
57 | def quickRainbowFlags(self):
58 | """Assign flags for a quick run of RainbowAgent."""
59 | FLAGS.agent_name = 'rainbow'
60 | FLAGS.gin_files = [
61 | 'dopamine/agents/rainbow/configs/rainbow.gin'
62 | ]
63 | FLAGS.gin_bindings = [
64 | 'Runner.training_steps=100', 'Runner.evaluation_steps=10',
65 | 'Runner.num_iterations=1', 'Runner.max_steps_per_episode=100',
66 | "rainbow_agent.RainbowAgent.replay_scheme='prioritized'",
67 | 'rainbow_agent.RainbowAgent.min_replay_history=500',
68 | 'WrappedReplayBuffer.replay_capacity=100'
69 | ]
70 |
71 | def verifyFilesCreated(self, base_dir):
72 | """Verify that files have been created."""
73 | # Check checkpoint files
74 | self.assertTrue(
75 | os.path.exists(os.path.join(self._checkpoint_dir, 'ckpt.0')))
76 | self.assertTrue(
77 | os.path.exists(os.path.join(self._checkpoint_dir, 'checkpoint')))
78 | self.assertTrue(
79 | os.path.exists(
80 | os.path.join(self._checkpoint_dir,
81 | 'sentinel_checkpoint_complete.0')))
82 | # Check log files
83 | self.assertTrue(os.path.exists(os.path.join(self._logging_dir, 'log_0')))
84 |
85 | def testIntegrationDqn(self):
86 | """Test the DQN agent."""
87 | tf.logging.info('####### Training the DQN agent #####')
88 | tf.logging.info('####### DQN base_dir: {}'.format(FLAGS.base_dir))
89 | self.quickDqnFlags()
90 | train.main([])
91 | self.verifyFilesCreated(FLAGS.base_dir)
92 | shutil.rmtree(FLAGS.base_dir)
93 |
94 | def testIntegrationRainbow(self):
95 | """Test the rainbow agent."""
96 | tf.logging.info('####### Training the Rainbow agent #####')
97 | tf.logging.info('####### Rainbow base_dir: {}'.format(FLAGS.base_dir))
98 | self.quickRainbowFlags()
99 | train.main([])
100 | self.verifyFilesCreated(FLAGS.base_dir)
101 | shutil.rmtree(FLAGS.base_dir)
102 |
103 |
104 | if __name__ == '__main__':
105 | tf.test.main()
106 |
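A sketch of launching the same kind of quick run outside the test harness; the flag parsing and `/tmp` path are assumptions, and an Atari/ALE installation is still required for the run to actually execute:

```python
# Illustrative sketch mirroring quickDqnFlags() above.
from absl import flags
from dopamine.atari import train

FLAGS = flags.FLAGS
FLAGS(['quick_run'])  # Parse default flag values before overriding them.
FLAGS.base_dir = '/tmp/dopamine_quick_run'
FLAGS.agent_name = 'dqn'
FLAGS.gin_files = ['dopamine/agents/dqn/configs/dqn.gin']
FLAGS.gin_bindings = [
    'Runner.training_steps=100', 'Runner.evaluation_steps=10',
    'Runner.num_iterations=1', 'Runner.max_steps_per_episode=100',
]
train.main([])
```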
--------------------------------------------------------------------------------
/baselines/plots.html:
--------------------------------------------------------------------------------
36 | Baseline plots
37 | This page provides a quick visualization of the training runs for all our default agents.
--------------------------------------------------------------------------------
/docs/api_docs/python/implicit_quantile_agent/ImplicitQuantileAgent.md:
--------------------------------------------------------------------------------
23 |
24 | ```python
25 | __init__(
26 | *args,
27 | **kwargs
28 | )
29 | ```
30 |
31 | Initializes the agent and constructs the Graph.
32 |
33 | Most of this constructor's parameters are IQN-specific hyperparameters whose
34 | values are taken from Dabney et al. (2018).
35 |
36 | #### Args:
37 |
38 | * `sess`: `tf.Session` object for running associated ops.
39 | * `num_actions`: int, number of actions the agent can take at any
40 | state.
41 | * `kappa`: float, Huber loss cutoff.
42 | * `num_tau_samples`: int, number of online quantile samples for loss
43 | estimation.
44 | * `num_tau_prime_samples`: int, number of target quantile samples for
45 | loss estimation.
46 | * `num_quantile_samples`: int, number of quantile samples for computing
47 | Q-values.
48 | * `quantile_embedding_dim`: int, embedding dimension for the quantile
49 | input.
50 |
51 | ### `begin_episode`
52 |
53 | ```python
54 | begin_episode(observation)
55 | ```
56 |
57 | Returns the agent's first action for this episode.
58 |
59 | #### Args:
60 |
61 | * `observation`: numpy array, the environment's initial observation.
62 |
63 | #### Returns:
64 |
65 | int, the selected action.
66 |
67 | ### `bundle_and_checkpoint`
68 |
69 | ```python
70 | bundle_and_checkpoint(
71 | checkpoint_dir,
72 | iteration_number
73 | )
74 | ```
75 |
76 | Returns a self-contained bundle of the agent's state.
77 |
78 | This is used for checkpointing. It will return a dictionary containing all
79 | non-TensorFlow objects (to be saved into a file by the caller), and it saves all
80 | TensorFlow objects into a checkpoint file.
81 |
82 | #### Args:
83 |
84 | * `checkpoint_dir`: str, directory where TensorFlow objects will be
85 | saved.
86 | * `iteration_number`: int, iteration number to use for naming the
87 | checkpoint file.
88 |
89 | #### Returns:
90 |
91 | A dict containing additional Python objects to be checkpointed by the
92 | experiment. If the checkpoint directory does not exist, returns None.
93 |
94 | ### `end_episode`
95 |
96 | ```python
97 | end_episode(reward)
98 | ```
99 |
100 | Signals the end of the episode to the agent.
101 |
102 | We store the observation of the current time step, which is the last observation
103 | of the episode.
104 |
105 | #### Args:
106 |
107 | * `reward`: float, the last reward from the environment.
108 |
109 | ### `step`
110 |
111 | ```python
112 | step(
113 | reward,
114 | observation
115 | )
116 | ```
117 |
118 | Records the most recent transition and returns the agent's next action.
119 |
120 | We store the observation of the last time step since we want to store it with
121 | the reward.
122 |
123 | #### Args:
124 |
125 | * `reward`: float, the reward received from the agent's most recent
126 | action.
127 | * `observation`: numpy array, the most recent observation.
128 |
129 | #### Returns:
130 |
131 | int, the selected action.
132 |
133 | ### `unbundle`
134 |
135 | ```python
136 | unbundle(
137 | checkpoint_dir,
138 | iteration_number,
139 | bundle_dictionary
140 | )
141 | ```
142 |
143 | Restores the agent from a checkpoint.
144 |
145 | Restores the agent's Python objects to those specified in bundle_dictionary, and
146 | restores the TensorFlow objects to those specified in the checkpoint_dir. If the
147 | checkpoint_dir does not exist, will not reset the agent's state.
148 |
149 | #### Args:
150 |
151 | *   `checkpoint_dir`: str, path to the checkpoint saved by tf.train.Saver.
152 | * `iteration_number`: int, checkpoint version, used when restoring
153 | replay buffer.
154 | * `bundle_dictionary`: dict, containing additional Python objects owned
155 | by the agent.
156 |
157 | #### Returns:
158 |
159 | bool, True if unbundling was successful.
160 |
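A brief sketch of how the methods documented above fit together in an episode loop. The environment construction, game choice, and session handling are assumptions for illustration; in practice the Runner in run_experiment drives this loop.

```python
# Illustrative sketch under the assumptions stated above.
import gym
import tensorflow as tf
from dopamine.agents.implicit_quantile import implicit_quantile_agent
from dopamine.atari import preprocessing

environment = preprocessing.AtariPreprocessing(
    gym.make('PongNoFrameskip-v4').env)

sess = tf.Session()
agent = implicit_quantile_agent.ImplicitQuantileAgent(
    sess, num_actions=environment.action_space.n)
sess.run(tf.global_variables_initializer())

observation = environment.reset()
action = agent.begin_episode(observation)
total_reward = 0.
while True:
  observation, reward, is_terminal, _ = environment.step(action)
  total_reward += reward
  if is_terminal:
    agent.end_episode(reward)
    break
  action = agent.step(reward, observation)
```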
--------------------------------------------------------------------------------
/dopamine/colab/tensorboard.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "tensorboard.ipynb",
7 | "version": "0.3.2",
8 | "provenance": []
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | }
14 | },
15 | "cells": [
16 | {
17 | "metadata": {
18 | "id": "VYNA79KmgvbY",
19 | "colab_type": "text"
20 | },
21 | "cell_type": "markdown",
22 | "source": [
23 | "Copyright 2018 The Dopamine Authors.\n",
24 | "\n",
25 | "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at\n",
26 | "\n",
27 | "https://www.apache.org/licenses/LICENSE-2.0\n",
28 | "\n",
29 | "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
30 | ]
31 | },
32 | {
33 | "metadata": {
34 | "id": "Ctd9k0h6wnqT",
35 | "colab_type": "text"
36 | },
37 | "cell_type": "markdown",
38 | "source": [
39 | "# Visualize Dopamine baselines with Tensorboard\n",
40 | "This colab allows you to easily view the trained baselines with Tensorboard (even if you don't have Tensorboard on your local machine!).\n",
41 | "\n",
42 | "Simply specify the game you would like to visualize and then run the cells in order.\n",
43 | "\n",
44 | "_The instructions for setting up Tensorboard were obtained from https://www.dlology.com/blog/quick-guide-to-run-tensorboard-in-google-colab/_"
45 | ]
46 | },
47 | {
48 | "metadata": {
49 | "id": "s8r_45_0qpmb",
50 | "colab_type": "code",
51 | "colab": {},
52 | "cellView": "form"
53 | },
54 | "cell_type": "code",
55 | "source": [
56 | "# @title Prepare all necessary files and binaries.\n",
57 | "!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip\n",
58 | "!unzip ngrok-stable-linux-amd64.zip\n",
59 | "!gsutil -q -m cp -R gs://download-dopamine-rl/compiled_tb_event_files.tar.gz /content/\n",
60 | "!tar -xvzf /content/compiled_tb_event_files.tar.gz"
61 | ],
62 | "execution_count": 0,
63 | "outputs": []
64 | },
65 | {
66 | "metadata": {
67 | "id": "D-oZRzeWwHZN",
68 | "colab_type": "code",
69 | "colab": {},
70 | "cellView": "form"
71 | },
72 | "cell_type": "code",
73 | "source": [
74 | "# @title Select which game to visualize.\n",
75 | "game = 'Asterix' # @param['AirRaid', 'Alien', 'Amidar', 'Assault', 'Asterix', 'Asteroids', 'Atlantis', 'BankHeist', 'BattleZone', 'BeamRider', 'Berzerk', 'Bowling', 'Boxing', 'Breakout', 'Carnival', 'Centipede', 'ChopperCommand', 'CrazyClimber', 'DemonAttack', 'DoubleDunk', 'ElevatorAction', 'Enduro', 'FishingDerby', 'Freeway', 'Frostbite', 'Gopher', 'Gravitar', 'Hero', 'IceHockey', 'Jamesbond', 'JourneyEscape', 'Kangaroo', 'Krull', 'KungFuMaster', 'MontezumaRevenge', 'MsPacman', 'NameThisGame', 'Phoenix', 'Pitfall', 'Pong', 'Pooyan', 'PrivateEye', 'Qbert', 'Riverraid', 'RoadRunner', 'Robotank', 'Seaquest', 'Skiing', 'Solaris', 'SpaceInvaders', 'StarGunner', 'Tennis', 'TimePilot', 'Tutankham', 'UpNDown', 'Venture', 'VideoPinball', 'WizardOfWor', 'YarsRevenge', 'Zaxxon']\n",
76 | "agents = ['dqn', 'c51', 'rainbow', 'implicit_quantile']\n",
77 | "for agent in agents:\n",
78 | " for run in range(1, 6):\n",
79 | " !mkdir -p \"/content/$game/$agent/$run\"\n",
80 | " !cp -r \"/content/$agent/$game/$run\" \"/content/$game/$agent/$run\"\n",
81 | "LOG_DIR = '/content/{}'.format(game)\n",
82 | "get_ipython().system_raw(\n",
83 | " 'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'\n",
84 | " .format(LOG_DIR)\n",
85 | ")"
86 | ],
87 | "execution_count": 0,
88 | "outputs": []
89 | },
90 | {
91 | "metadata": {
92 | "id": "zlKKnaP4y9FA",
93 | "colab_type": "code",
94 | "colab": {
95 | "base_uri": "https://localhost:8080/",
96 | "height": 35
97 | },
98 | "cellView": "form",
99 | "outputId": "3abff714-c484-436e-dc5f-88b15511f4f2"
100 | },
101 | "cell_type": "code",
102 | "source": [
103 | "# @title Start the tensorboard\n",
104 | "get_ipython().system_raw('./ngrok http 6006 &')\n",
105 | "! curl -s http://localhost:4040/api/tunnels | python3 -c \\\n",
106 | " \"import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])\""
107 | ],
108 | "execution_count": 0,
109 | "outputs": []
110 | }
111 | ]
112 | }
113 |
--------------------------------------------------------------------------------
/docs/api_docs/python/circular_replay_buffer/WrappedReplayBuffer.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 | # circular_replay_buffer.WrappedReplayBuffer
13 |
14 | ## Class `WrappedReplayBuffer`
15 |
16 | Wrapper of OutOfGraphReplayBuffer with an in-graph sampling mechanism.
17 |
18 | Usage: To add a transition: call the add function.
19 |
20 | To sample a batch: construct operations that depend on any of the tensors in the
21 | transition dictionary. Every sess.run that requires any of these tensors will
22 | sample a new transition.
23 |
24 | ## Methods
25 |
26 | ### `__init__`
27 |
28 | ```python
29 | __init__(
30 | *args,
31 | **kwargs
32 | )
33 | ```
34 |
35 | Initializes WrappedReplayBuffer.
36 |
37 | #### Args:
38 |
39 | * `observation_shape`: tuple or int. If int, the observation is assumed
40 | to be a 2D square.
41 | * `stack_size`: int, number of frames to use in state stack.
42 | *   `use_staging`: bool, when True, a staging area is used to
43 |     prefetch the next sampling batch.
44 | * `replay_capacity`: int, number of transitions to keep in memory.
45 | * `batch_size`: int.
46 | * `update_horizon`: int, length of update ('n' in n-step update).
47 | *   `gamma`: float, the discount factor.
48 | * `wrapped_memory`: The 'inner' memory data structure. If None, it
49 | creates the standard DQN replay memory.
50 | * `max_sample_attempts`: int, the maximum number of attempts allowed to
51 | get a sample.
52 | * `extra_storage_types`: list of ReplayElements defining the type of
53 | the extra contents that will be stored and returned by
54 | sample_transition_batch.
55 | * `observation_dtype`: np.dtype, type of the observations. Defaults to
56 | np.uint8 for Atari 2600.
57 |
58 | #### Raises:
59 |
60 | * `ValueError`: If update_horizon is not positive.
61 | * `ValueError`: If discount factor is not in [0, 1].
62 |
63 | ### `add`
64 |
65 | ```python
66 | add(
67 | observation,
68 | action,
69 | reward,
70 | terminal,
71 | *args
72 | )
73 | ```
74 |
75 | Adds a transition to the replay memory.
76 |
77 | Since the next_observation in the transition will be the observation added next,
78 | there is no need to pass it.
79 |
80 | If the replay memory is at capacity the oldest transition will be discarded.
81 |
82 | #### Args:
83 |
84 | * `observation`: np.array with shape observation_shape.
85 | * `action`: int, the action in the transition.
86 | * `reward`: float, the reward received in the transition.
87 | * `terminal`: A uint8 acting as a boolean indicating whether the
88 | transition was terminal (1) or not (0).
89 | * `*args`: extra contents with shapes and dtypes according to
90 | extra_storage_types.
91 |
92 | ### `create_sampling_ops`
93 |
94 | ```python
95 | create_sampling_ops(use_staging)
96 | ```
97 |
98 | Creates the ops necessary to sample from the replay buffer.
99 |
100 | Creates the transition dictionary containing the sampling tensors.
101 |
102 | #### Args:
103 |
104 | *   `use_staging`: bool, when True, a staging area is used to
105 |     prefetch the next sampling batch.
106 |
107 | ### `load`
108 |
109 | ```python
110 | load(
111 | checkpoint_dir,
112 | suffix
113 | )
114 | ```
115 |
116 | Loads the replay buffer's state from a saved file.
117 |
118 | #### Args:
119 |
120 | *   `checkpoint_dir`: str, the directory from which to read the numpy
121 |     checkpoint files.
122 | * `suffix`: str, the suffix to use in numpy checkpoint files.
123 |
124 | ### `save`
125 |
126 | ```python
127 | save(
128 | checkpoint_dir,
129 | iteration_number
130 | )
131 | ```
132 |
133 | Saves the underlying replay buffer's contents to a file.
134 |
135 | #### Args:
136 |
137 | *   `checkpoint_dir`: str, the directory where the numpy checkpoint
138 |     files will be written.
139 | * `iteration_number`: int, the iteration_number to use as a suffix in
140 | naming numpy checkpoint files.
141 |
142 |
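A rough usage sketch of the add/sample pattern described above. The import path, shapes, and hyperparameters are assumptions for illustration; `use_staging=False` keeps the example free of prefetching machinery.

```python
# Illustrative sketch under the assumptions stated above.
import numpy as np
import tensorflow as tf
from dopamine.replay_memory import circular_replay_buffer

replay = circular_replay_buffer.WrappedReplayBuffer(
    observation_shape=84, stack_size=4, use_staging=False,
    replay_capacity=1000, batch_size=32)

# Store dummy transitions; next_observation is implicit (see add() above).
for _ in range(100):
  replay.add(np.zeros((84, 84), dtype=np.uint8), 0, 0.0, 0)

# Any sess.run that touches the transition tensors samples a fresh batch.
with tf.Session() as sess:
  states, actions, rewards = sess.run(
      [replay.states, replay.actions, replay.rewards])
```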
--------------------------------------------------------------------------------
/docs/api_docs/python/dqn_agent/DQNAgent.md:
--------------------------------------------------------------------------------
11 |
12 | # dqn_agent.DQNAgent
13 |
14 | ## Class `DQNAgent`
15 |
16 | An implementation of the DQN agent.
17 |
18 | ## Methods
19 |
20 | ### `__init__`
21 |
22 | ```python
23 | __init__(
24 | *args,
25 | **kwargs
26 | )
27 | ```
28 |
29 | Initializes the agent and constructs the components of its graph.
30 |
31 | #### Args:
32 |
33 | * `sess`: `tf.Session`, for executing ops.
34 | * `num_actions`: int, number of actions the agent can take at any
35 | state.
36 | * `gamma`: float, discount factor with the usual RL meaning.
37 | * `update_horizon`: int, horizon at which updates are performed, the
38 | 'n' in n-step update.
39 | * `min_replay_history`: int, number of transitions that should be
40 | experienced before the agent begins training its value function.
41 | * `update_period`: int, period between DQN updates.
42 | * `target_update_period`: int, update period for the target network.
43 | * `epsilon_fn`: function expecting 4 parameters: (decay_period, step,
44 | warmup_steps, epsilon). This function should return the epsilon value used
45 | for exploration during training.
46 | * `epsilon_train`: float, the value to which the agent's epsilon is
47 | eventually decayed during training.
48 | * `epsilon_eval`: float, epsilon used when evaluating the agent.
49 | * `epsilon_decay_period`: int, length of the epsilon decay schedule.
50 | * `tf_device`: str, Tensorflow device on which the agent's graph is
51 | executed.
52 | * `use_staging`: bool, when True use a staging area to prefetch the
53 | next training batch, speeding training up by about 30%.
54 | * `max_tf_checkpoints_to_keep`: int, the number of TensorFlow
55 | checkpoints to keep.
56 | * `optimizer`: `tf.train.Optimizer`, for training the value function.
57 |
58 | ### `begin_episode`
59 |
60 | ```python
61 | begin_episode(observation)
62 | ```
63 |
64 | Returns the agent's first action for this episode.
65 |
66 | #### Args:
67 |
68 | * `observation`: numpy array, the environment's initial observation.
69 |
70 | #### Returns:
71 |
72 | int, the selected action.
73 |
74 | ### `bundle_and_checkpoint`
75 |
76 | ```python
77 | bundle_and_checkpoint(
78 | checkpoint_dir,
79 | iteration_number
80 | )
81 | ```
82 |
83 | Returns a self-contained bundle of the agent's state.
84 |
85 | This is used for checkpointing. It will return a dictionary containing all
86 | non-TensorFlow objects (to be saved into a file by the caller), and it saves all
87 | TensorFlow objects into a checkpoint file.
88 |
89 | #### Args:
90 |
91 | * `checkpoint_dir`: str, directory where TensorFlow objects will be
92 | saved.
93 | * `iteration_number`: int, iteration number to use for naming the
94 | checkpoint file.
95 |
96 | #### Returns:
97 |
98 | A dict containing additional Python objects to be checkpointed by the
99 | experiment. If the checkpoint directory does not exist, returns None.
100 |
101 | ### `end_episode`
102 |
103 | ```python
104 | end_episode(reward)
105 | ```
106 |
107 | Signals the end of the episode to the agent.
108 |
109 | We store the observation of the current time step, which is the last observation
110 | of the episode.
111 |
112 | #### Args:
113 |
114 | * `reward`: float, the last reward from the environment.
115 |
116 | ### `step`
117 |
118 | ```python
119 | step(
120 | reward,
121 | observation
122 | )
123 | ```
124 |
125 | Records the most recent transition and returns the agent's next action.
126 |
127 | We store the observation of the last time step since we want to store it with
128 | the reward.
129 |
130 | #### Args:
131 |
132 | * `reward`: float, the reward received from the agent's most recent
133 | action.
134 | * `observation`: numpy array, the most recent observation.
135 |
136 | #### Returns:
137 |
138 | int, the selected action.
139 |
140 | ### `unbundle`
141 |
142 | ```python
143 | unbundle(
144 | checkpoint_dir,
145 | iteration_number,
146 | bundle_dictionary
147 | )
148 | ```
149 |
150 | Restores the agent from a checkpoint.
151 |
152 | Restores the agent's Python objects to those specified in bundle_dictionary, and
153 | restores the TensorFlow objects to those specified in the checkpoint_dir. If the
154 | checkpoint_dir does not exist, will not reset the agent's state.
155 |
156 | #### Args:
157 |
158 | *   `checkpoint_dir`: str, path to the checkpoint saved by tf.train.Saver.
159 | * `iteration_number`: int, checkpoint version, used when restoring
160 | replay buffer.
161 | * `bundle_dictionary`: dict, containing additional Python objects owned
162 | by the agent.
163 |
164 | #### Returns:
165 |
166 | bool, True if unbundling was successful.
167 |
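A compact sketch of the checkpoint round trip formed by `bundle_and_checkpoint` and `unbundle`, paired with the `Checkpointer` class documented elsewhere in these API docs. The agent construction and directory path are assumptions; the Runner performs this same sequence once per iteration.

```python
# Illustrative sketch under the assumptions stated above.
import tensorflow as tf
from dopamine.agents.dqn import dqn_agent
from dopamine.common import checkpointer

sess = tf.Session()
agent = dqn_agent.DQNAgent(sess, num_actions=4)
sess.run(tf.global_variables_initializer())

checkpoint_dir = '/tmp/dopamine_checkpoints'
ckpt = checkpointer.Checkpointer(checkpoint_dir)

# Save: TF variables go to a checkpoint file; everything else is returned
# as a dict for the Checkpointer to pickle.
bundle = agent.bundle_and_checkpoint(checkpoint_dir, iteration_number=0)
if bundle is not None:
  ckpt.save_checkpoint(0, bundle)

# Restore: True only if both the bundle and the TF checkpoint load.
if not agent.unbundle(checkpoint_dir, 0, ckpt.load_checkpoint(0)):
  raise ValueError('Could not restore the agent from the checkpoint.')
```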
--------------------------------------------------------------------------------