This repository was archived by the owner on Dec 11, 2022. It is now read-only.
Merged
Changes from 10 commits
1 change: 1 addition & 0 deletions README.md
@@ -277,6 +277,7 @@ dashboard
* [Clipped Proximal Policy Optimization (CPPO)](https://arxiv.org/pdf/1707.06347.pdf) | **Multi Worker Single Node** ([code](rl_coach/agents/clipped_ppo_agent.py))
* [Generalized Advantage Estimation (GAE)](https://arxiv.org/abs/1506.02438) ([code](rl_coach/agents/actor_critic_agent.py#L86))
* [Sample Efficient Actor-Critic with Experience Replay (ACER)](https://arxiv.org/abs/1611.01224) | **Multi Worker Single Node** ([code](rl_coach/agents/acer_agent.py))
* [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1801.01290) ([code](rl_coach/agents/soft_actor_critic_agent.py))

### General Agents
* [Direct Future Prediction (DFP)](https://arxiv.org/abs/1611.01779) | **Multi Worker Single Node** ([code](rl_coach/agents/dfp_agent.py))
1 change: 1 addition & 0 deletions benchmarks/README.md
@@ -37,6 +37,7 @@ The environments that were used for testing include:
|**[ACER](acer)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Atari | |
|**[Clipped PPO](clipped_ppo)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Mujoco | |
|**[DDPG](ddpg)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Mujoco | |
|**[SAC](sac)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Mujoco | |
|**[NEC](nec)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Atari | |
|**[HER](ddpg_her)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Fetch | |
|**[DFP](dfp)** | ![#ceffad](https://placehold.it/15/ceffad/000000?text=+) |Doom | Doom Battle was not verified |
2 changes: 1 addition & 1 deletion benchmarks/clipped_ppo/README.md
@@ -1,6 +1,6 @@
# Clipped PPO

-Each experiment uses 3 seeds and is trained for 10k environment steps.
+Each experiment uses 3 seeds and is trained for 10M environment steps.
The parameters used for Clipped PPO are the same parameters as described in the [original paper](https://arxiv.org/abs/1707.06347).

### Inverted Pendulum Clipped PPO - single worker
2 changes: 1 addition & 1 deletion benchmarks/ddpg/README.md
@@ -1,6 +1,6 @@
# DDPG

-Each experiment uses 3 seeds and is trained for 2k environment steps.
+Each experiment uses 3 seeds and is trained for 2M environment steps.
The parameters used for DDPG are the same parameters as described in the [original paper](https://arxiv.org/abs/1509.02971).

### Inverted Pendulum DDPG - single worker
48 changes: 48 additions & 0 deletions benchmarks/sac/README.md
@@ -0,0 +1,48 @@
# Soft Actor-Critic

Each experiment uses 3 seeds and is trained for 3M environment steps.
The parameters used for SAC are the same parameters as described in the [original paper](https://arxiv.org/abs/1801.01290).

### Inverted Pendulum SAC - single worker

```bash
coach -p Mujoco_SAC -lvl inverted_pendulum
```

<img src="inverted_pendulum_sac.png" alt="Inverted Pendulum SAC" width="800"/>


### Hopper SAC - single worker

```bash
coach -p Mujoco_SAC -lvl hopper
```

<img src="hopper_sac.png" alt="Hopper SAC" width="800"/>


### Half Cheetah SAC - single worker

```bash
coach -p Mujoco_SAC -lvl half_cheetah
```

<img src="half_cheetah_sac.png" alt="Half Cheetah SAC" width="800"/>


### Walker 2D SAC - single worker

```bash
coach -p Mujoco_SAC -lvl walker2d
```

<img src="walker2d_sac.png" alt="Walker 2D SAC" width="800"/>


### Humanoid SAC - single worker

```bash
coach -p Mujoco_SAC -lvl humanoid
```

<img src="humanoid_sac.png" alt="Humanoid SAC" width="800"/>
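Each benchmark above runs the same `Mujoco_SAC` preset and varies only `-lvl`, and the README states 3 seeds per experiment. The sketch below shows one way to script all 15 runs from Python. It assumes your coach version accepts `--seed` and `-e` (experiment name); confirm with `coach -h` before relying on it. `build_commands`, `run_all`, the seed values, and the experiment-name scheme are hypothetical and not part of the repository.

```python
import shlex
import subprocess
from typing import List

# Levels benchmarked in this README; each is passed to coach via -lvl.
LEVELS = ["inverted_pendulum", "hopper", "half_cheetah", "walker2d", "humanoid"]
# "3 seeds" per the README; the specific seed values are an arbitrary choice.
SEEDS = [0, 1, 2]


def build_commands(preset: str = "Mujoco_SAC") -> List[List[str]]:
    """Return one coach argv list per (level, seed) pair.

    Assumption: the installed coach CLI accepts --seed and -e (check `coach -h`).
    """
    commands = []
    for level in LEVELS:
        for seed in SEEDS:
            commands.append([
                "coach", "-p", preset, "-lvl", level,
                "--seed", str(seed),
                # Hypothetical naming scheme so each run gets its own experiment directory.
                "-e", f"sac_{level}_seed{seed}",
            ])
    return commands


def run_all(dry_run: bool = True) -> None:
    """Print every command; actually execute them (sequentially) only if dry_run is False."""
    for argv in build_commands():
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands. When it is `False` the runs execute one after another, so on a machine with spare cores you may prefer launching them in parallel instead.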
Binary file added benchmarks/sac/half_cheetah_sac.png
Binary file added benchmarks/sac/hopper_sac.png
Binary file added benchmarks/sac/humanoid_sac.png
Binary file added benchmarks/sac/inverted_pendulum_sac.png
Binary file added benchmarks/sac/walker2d_sac.png
Binary file modified docs/_images/algorithms.png
Binary file added docs/_images/sac.png
50 changes: 20 additions & 30 deletions docs/_modules/index.html
@@ -17,35 +17,38 @@





<script type="text/javascript" src="../_static/js/modernizr.min.js"></script>


<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/language_data.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>

<script type="text/javascript" src="../_static/js/theme.js"></script>




<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/css/custom.css" type="text/css" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link href="../_static/css/custom.css" rel="stylesheet" type="text/css">



<script src="../_static/js/modernizr.min.js"></script>

</head>

<body class="wy-body-for-nav">


<div class="wy-grid-for-nav">


<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<div class="wy-side-nav-search" >



@@ -179,6 +182,7 @@ <h1>All modules for which code is available</h1>
<ul><li><a href="rl_coach/agents/acer_agent.html">rl_coach.agents.acer_agent</a></li>
<li><a href="rl_coach/agents/actor_critic_agent.html">rl_coach.agents.actor_critic_agent</a></li>
<li><a href="rl_coach/agents/agent.html">rl_coach.agents.agent</a></li>
<li><a href="rl_coach/agents/agent_interface.html">rl_coach.agents.agent_interface</a></li>
<li><a href="rl_coach/agents/bc_agent.html">rl_coach.agents.bc_agent</a></li>
<li><a href="rl_coach/agents/categorical_dqn_agent.html">rl_coach.agents.categorical_dqn_agent</a></li>
<li><a href="rl_coach/agents/cil_agent.html">rl_coach.agents.cil_agent</a></li>
@@ -195,19 +199,17 @@ <h1>All modules for which code is available</h1>
<li><a href="rl_coach/agents/ppo_agent.html">rl_coach.agents.ppo_agent</a></li>
<li><a href="rl_coach/agents/qr_dqn_agent.html">rl_coach.agents.qr_dqn_agent</a></li>
<li><a href="rl_coach/agents/rainbow_dqn_agent.html">rl_coach.agents.rainbow_dqn_agent</a></li>
<li><a href="rl_coach/agents/soft_actor_critic_agent.html">rl_coach.agents.soft_actor_critic_agent</a></li>
<li><a href="rl_coach/agents/value_optimization_agent.html">rl_coach.agents.value_optimization_agent</a></li>
<li><a href="rl_coach/architectures/architecture.html">rl_coach.architectures.architecture</a></li>
<li><a href="rl_coach/architectures/network_wrapper.html">rl_coach.architectures.network_wrapper</a></li>
<li><a href="rl_coach/base_parameters.html">rl_coach.base_parameters</a></li>
<li><a href="rl_coach/core_types.html">rl_coach.core_types</a></li>
<li><a href="rl_coach/data_stores/nfs_data_store.html">rl_coach.data_stores.nfs_data_store</a></li>
<li><a href="rl_coach/data_stores/s3_data_store.html">rl_coach.data_stores.s3_data_store</a></li>
<li><a href="rl_coach/environments/carla_environment.html">rl_coach.environments.carla_environment</a></li>
<li><a href="rl_coach/environments/control_suite_environment.html">rl_coach.environments.control_suite_environment</a></li>
<li><a href="rl_coach/environments/doom_environment.html">rl_coach.environments.doom_environment</a></li>
<li><a href="rl_coach/environments/environment.html">rl_coach.environments.environment</a></li>
<li><a href="rl_coach/environments/gym_environment.html">rl_coach.environments.gym_environment</a></li>
<li><a href="rl_coach/environments/starcraft2_environment.html">rl_coach.environments.starcraft2_environment</a></li>
<li><a href="rl_coach/exploration_policies/additive_noise.html">rl_coach.exploration_policies.additive_noise</a></li>
<li><a href="rl_coach/exploration_policies/boltzmann.html">rl_coach.exploration_policies.boltzmann</a></li>
<li><a href="rl_coach/exploration_policies/bootstrapped.html">rl_coach.exploration_policies.bootstrapped</a></li>
@@ -250,7 +252,6 @@ <h1>All modules for which code is available</h1>
<li><a href="rl_coach/memories/non_episodic/experience_replay.html">rl_coach.memories.non_episodic.experience_replay</a></li>
<li><a href="rl_coach/memories/non_episodic/prioritized_experience_replay.html">rl_coach.memories.non_episodic.prioritized_experience_replay</a></li>
<li><a href="rl_coach/memories/non_episodic/transition_collection.html">rl_coach.memories.non_episodic.transition_collection</a></li>
<li><a href="rl_coach/orchestrators/kubernetes_orchestrator.html">rl_coach.orchestrators.kubernetes_orchestrator</a></li>
<li><a href="rl_coach/spaces.html">rl_coach.spaces</a></li>
</ul>

@@ -281,27 +282,16 @@ <h1>All modules for which code is available</h1>







<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/language_data.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>




<script type="text/javascript" src="../_static/js/theme.js"></script>

<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</script>






</body>
</html>
46 changes: 19 additions & 27 deletions docs/_modules/rl_coach/agents/acer_agent.html
@@ -17,35 +17,38 @@





<script type="text/javascript" src="../../../_static/js/modernizr.min.js"></script>


<script type="text/javascript" id="documentation_options" data-url_root="../../../" src="../../../_static/documentation_options.js"></script>
<script type="text/javascript" src="../../../_static/jquery.js"></script>
<script type="text/javascript" src="../../../_static/underscore.js"></script>
<script type="text/javascript" src="../../../_static/doctools.js"></script>
<script type="text/javascript" src="../../../_static/language_data.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>

<script type="text/javascript" src="../../../_static/js/theme.js"></script>




<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/custom.css" type="text/css" />
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link href="../../../_static/css/custom.css" rel="stylesheet" type="text/css">



<script src="../../../_static/js/modernizr.min.js"></script>

</head>

<body class="wy-body-for-nav">


<div class="wy-grid-for-nav">


<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<div class="wy-side-nav-search" >



@@ -248,7 +251,7 @@ <h1>Source code for rl_coach.agents.acer_agent</h1><div class="highlight"><pre>
<span class="bp">self</span><span class="o">.</span><span class="n">num_steps_between_gradient_updates</span> <span class="o">=</span> <span class="mi">5000</span>
<span class="bp">self</span><span class="o">.</span><span class="n">ratio_of_replay</span> <span class="o">=</span> <span class="mi">4</span>
<span class="bp">self</span><span class="o">.</span><span class="n">num_transitions_to_start_replay</span> <span class="o">=</span> <span class="mi">10000</span>
-<span class="bp">self</span><span class="o">.</span><span class="n">rate_for_copying_weights_to_target</span> <span class="o">=</span> <span class="mf">0.99</span>
+<span class="bp">self</span><span class="o">.</span><span class="n">rate_for_copying_weights_to_target</span> <span class="o">=</span> <span class="mf">0.01</span>
<span class="bp">self</span><span class="o">.</span><span class="n">importance_weight_truncation</span> <span class="o">=</span> <span class="mf">10.0</span>
<span class="bp">self</span><span class="o">.</span><span class="n">use_trust_region_optimization</span> <span class="o">=</span> <span class="kc">True</span>
<span class="bp">self</span><span class="o">.</span><span class="n">max_KL_divergence</span> <span class="o">=</span> <span class="mf">1.0</span>
@@ -405,27 +408,16 @@ <h1>Source code for rl_coach.agents.acer_agent</h1><div class="highlight"><pre>







<script type="text/javascript" id="documentation_options" data-url_root="../../../" src="../../../_static/documentation_options.js"></script>
<script type="text/javascript" src="../../../_static/jquery.js"></script>
<script type="text/javascript" src="../../../_static/underscore.js"></script>
<script type="text/javascript" src="../../../_static/doctools.js"></script>
<script type="text/javascript" src="../../../_static/language_data.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>




<script type="text/javascript" src="../../../_static/js/theme.js"></script>

<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</script>






</body>
</html>
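The ACER hunk changes `rate_for_copying_weights_to_target` from 0.99 to 0.01. Assuming coach applies this rate as a Polyak coefficient τ, i.e. θ_target ← τ·θ_online + (1 − τ)·θ_target (consistent with the parameter's name, but verify against the architecture code), 0.99 turns every update into a near-hard copy, while 0.01 lets the target network trail the online network slowly. The framework-free sketch below illustrates the difference; `soft_update` and `gap_after` are hypothetical helpers, not coach's API.

```python
from typing import List


def soft_update(target: List[float], online: List[float], rate: float) -> List[float]:
    """Polyak averaging: move each target weight a fraction `rate` toward its online weight."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    return [rate * w + (1.0 - rate) * t for t, w in zip(target, online)]


def gap_after(updates: int, rate: float) -> float:
    """Target/online gap left after `updates` soft updates toward a fixed online weight of 1.0,
    starting from a target weight of 0.0. Analytically this equals (1 - rate) ** updates."""
    target, online = [0.0], [1.0]
    for _ in range(updates):
        target = soft_update(target, online, rate)
    return online[0] - target[0]


if __name__ == "__main__":
    for rate in (0.99, 0.01):
        print(f"rate={rate}: gap after 1 update = {gap_after(1, rate):.4f}, "
              f"after 100 updates = {gap_after(100, rate):.4f}")
```

With τ = 0.01 the gap shrinks by a factor of 0.99 per update, so about 0.99^100 ≈ 0.37 of it remains after 100 updates; with τ = 0.99 it is essentially gone after a single update.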
44 changes: 18 additions & 26 deletions docs/_modules/rl_coach/agents/actor_critic_agent.html
@@ -17,35 +17,38 @@





<script type="text/javascript" src="../../../_static/js/modernizr.min.js"></script>


<script type="text/javascript" id="documentation_options" data-url_root="../../../" src="../../../_static/documentation_options.js"></script>
<script type="text/javascript" src="../../../_static/jquery.js"></script>
<script type="text/javascript" src="../../../_static/underscore.js"></script>
<script type="text/javascript" src="../../../_static/doctools.js"></script>
<script type="text/javascript" src="../../../_static/language_data.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>

<script type="text/javascript" src="../../../_static/js/theme.js"></script>




<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/custom.css" type="text/css" />
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link href="../../../_static/css/custom.css" rel="stylesheet" type="text/css">



<script src="../../../_static/js/modernizr.min.js"></script>

</head>

<body class="wy-body-for-nav">


<div class="wy-grid-for-nav">


<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<div class="wy-side-nav-search" >



@@ -393,27 +396,16 @@ <h1>Source code for rl_coach.agents.actor_critic_agent</h1><div class="highlight







<script type="text/javascript" id="documentation_options" data-url_root="../../../" src="../../../_static/documentation_options.js"></script>
<script type="text/javascript" src="../../../_static/jquery.js"></script>
<script type="text/javascript" src="../../../_static/underscore.js"></script>
<script type="text/javascript" src="../../../_static/doctools.js"></script>
<script type="text/javascript" src="../../../_static/language_data.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>




<script type="text/javascript" src="../../../_static/js/theme.js"></script>

<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</script>






</body>
</html>