This repository was archived by the owner on Dec 11, 2022. It is now read-only.

Commit 74db141

guyk1971 authored and shadiendrawis committed
SAC algorithm (#282)

* SAC algorithm
* SAC - updates to the agent (learn_from_batch), sac_head and sac_q_head to fix a problem in the gradient calculation; the SAC agent is now able to train. gym_environment - fixed an error in access to gym.spaces
* Soft Actor Critic - code cleanup
* code cleanup
* V-head initialization fix
* SAC benchmarks
* SAC documentation
* typo fix
* documentation fixes
* documentation and version update
* README typo
1 parent 33dc29e commit 74db141
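
For orientation, the targets that a SAC `learn_from_batch` step regresses the Q-heads and V-head onto are defined in the paper linked in the diff below (https://arxiv.org/abs/1801.01290). The following NumPy sketch restates them; it is illustrative only and is not the code added in this commit — the function name `sac_targets` and the fixed temperature `alpha=0.2` are assumptions.

```python
import numpy as np

def sac_targets(r, done, v_next_target, q1, q2, log_pi, gamma=0.99, alpha=0.2):
    """Illustrative SAC regression targets (paper notation), not Coach's code.

    All arguments are 1-D arrays over a sampled replay batch:
      r             -- rewards
      done          -- episode-termination flags (0/1)
      v_next_target -- target V-network values for the next states, V_bar(s')
      q1, q2        -- the two Q-heads evaluated at fresh actions a ~ pi(.|s)
      log_pi        -- log pi(a|s) for those fresh actions
    """
    # Q-head target: y_Q = r + gamma * (1 - done) * V_bar(s')
    y_q = r + gamma * (1.0 - done) * v_next_target

    # V-head target: y_V = min(Q1, Q2) - alpha * log pi(a|s)
    # (the entropy term is what makes the value function "soft")
    y_v = np.minimum(q1, q2) - alpha * log_pi
    return y_q, y_v
```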

File tree: 92 files changed, +2813 −403 lines


README.md

Lines changed: 3 additions & 2 deletions
```diff
@@ -25,11 +25,11 @@ coach -p CartPole_DQN -r
 <img src="img/doom_health.gif" alt="Doom Health Gathering"/> <img src="img/minitaur.gif" alt="PyBullet Minitaur" width = "249" height ="200"/> <img src="img/ant.gif" alt="Gym Extensions Ant"/>
 <br><br>
 
-Blog posts from the Intel® AI website:
 * [Release 0.8.0](https://ai.intel.com/reinforcement-learning-coach-intel/) (initial release)
 * [Release 0.9.0](https://ai.intel.com/reinforcement-learning-coach-carla-qr-dqn/)
 * [Release 0.10.0](https://ai.intel.com/introducing-reinforcement-learning-coach-0-10-0/)
-* [Release 0.11.0](https://ai.intel.com/rl-coach-data-science-at-scale) (current release)
+* [Release 0.11.0](https://ai.intel.com/rl-coach-data-science-at-scale)
+* Release 0.12.0 (current release)
 
 Contacting the Coach development team is also possible through the email [[email protected]]([email protected])
 
@@ -277,6 +277,7 @@ dashboard
 * [Clipped Proximal Policy Optimization (CPPO)](https://arxiv.org/pdf/1707.06347.pdf) | **Multi Worker Single Node** ([code](rl_coach/agents/clipped_ppo_agent.py))
 * [Generalized Advantage Estimation (GAE)](https://arxiv.org/abs/1506.02438) ([code](rl_coach/agents/actor_critic_agent.py#L86))
 * [Sample Efficient Actor-Critic with Experience Replay (ACER)](https://arxiv.org/abs/1611.01224) | **Multi Worker Single Node** ([code](rl_coach/agents/acer_agent.py))
+* [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1801.01290) ([code](rl_coach/agents/soft_actor_critic_agent.py))
 
 ### General Agents
 * [Direct Future Prediction (DFP)](https://arxiv.org/abs/1611.01779) | **Multi Worker Single Node** ([code](rl_coach/agents/dfp_agent.py))
```
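
The new list entry points at `rl_coach/agents/soft_actor_critic_agent.py`. A minimal preset wiring that agent to a Gym environment might look like the sketch below, assuming the agent exposes a `SoftActorCriticAgentParameters` class per Coach's usual `<Agent>Parameters` naming; the level string `'Hopper-v2'` is likewise just an example:

```python
# Hypothetical preset sketch. SoftActorCriticAgentParameters is assumed from
# Coach's <Agent>Parameters convention -- verify against
# rl_coach/agents/soft_actor_critic_agent.py before relying on it.
from rl_coach.agents.soft_actor_critic_agent import SoftActorCriticAgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

graph_manager = BasicRLGraphManager(
    agent_params=SoftActorCriticAgentParameters(),
    env_params=GymVectorEnvironment(level='Hopper-v2'),  # example level
    schedule_params=SimpleSchedule())
```

In practice the `Mujoco_SAC` preset added by this commit (see the benchmark READMEs below) is the supported entry point.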

benchmarks/README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -37,6 +37,7 @@ The environments that were used for testing include:
 |**[ACER](acer)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Atari | |
 |**[Clipped PPO](clipped_ppo)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Mujoco | |
 |**[DDPG](ddpg)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Mujoco | |
+|**[SAC](sac)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Mujoco | |
 |**[NEC](nec)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Atari | |
 |**[HER](ddpg_her)** | ![#2E8B57](https://placehold.it/15/2E8B57/000000?text=+) |Fetch | |
 |**[DFP](dfp)** | ![#ceffad](https://placehold.it/15/ceffad/000000?text=+) |Doom | Doom Battle was not verified |
```

benchmarks/clipped_ppo/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 # Clipped PPO
 
-Each experiment uses 3 seeds and is trained for 10k environment steps.
+Each experiment uses 3 seeds and is trained for 10M environment steps.
 The parameters used for Clipped PPO are the same parameters as described in the [original paper](https://arxiv.org/abs/1707.06347).
 
 ### Inverted Pendulum Clipped PPO - single worker
```

benchmarks/ddpg/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 # DDPG
 
-Each experiment uses 3 seeds and is trained for 2k environment steps.
+Each experiment uses 3 seeds and is trained for 2M environment steps.
 The parameters used for DDPG are the same parameters as described in the [original paper](https://arxiv.org/abs/1509.02971).
 
 ### Inverted Pendulum DDPG - single worker
```

benchmarks/sac/README.md

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
# Soft Actor Critic
2+
3+
Each experiment uses 3 seeds and is trained for 3M environment steps.
4+
The parameters used for SAC are the same parameters as described in the [original paper](https://arxiv.org/abs/1801.01290).
5+
6+
### Inverted Pendulum SAC - single worker
7+
8+
```bash
9+
coach -p Mujoco_SAC -lvl inverted_pendulum
10+
```
11+
12+
<img src="inverted_pendulum_sac.png" alt="Inverted Pendulum SAC" width="800"/>
13+
14+
15+
### Hopper Clipped SAC - single worker
16+
17+
```bash
18+
coach -p Mujoco_SAC -lvl hopper
19+
```
20+
21+
<img src="hopper_sac.png" alt="Hopper SAC" width="800"/>
22+
23+
24+
### Half Cheetah Clipped SAC - single worker
25+
26+
```bash
27+
coach -p Mujoco_SAC -lvl half_cheetah
28+
```
29+
30+
<img src="half_cheetah_sac.png" alt="Half Cheetah SAC" width="800"/>
31+
32+
33+
### Walker 2D Clipped SAC - single worker
34+
35+
```bash
36+
coach -p Mujoco_SAC -lvl walker2d
37+
```
38+
39+
<img src="walker2d_sac.png" alt="Walker 2D SAC" width="800"/>
40+
41+
42+
### Humanoid Clipped SAC - single worker
43+
44+
```bash
45+
coach -p Mujoco_SAC -lvl humanoid
46+
```
47+
48+
<img src="humanoid_sac.png" alt="Humanoid SAC" width="800"/>
benchmarks/sac/half_cheetah_sac.png (new image, 65.6 KB)

benchmarks/sac/hopper_sac.png (new image, 97.3 KB)

benchmarks/sac/humanoid_sac.png (new image, 89.6 KB)

benchmarks/sac/inverted_pendulum_sac.png (new image, 48.9 KB)

benchmarks/sac/walker2d_sac.png (new image, 76.6 KB)

0 commit comments