Commit e3875b5: Stable-Baselines3 v1.0 (#354)

Authored by araffin and Adam Gleave

* Bump version and update doc
* Fix name
* Apply suggestions from code review (Co-authored-by: Adam Gleave <[email protected]>)
* Update docs/index.rst (Co-authored-by: Adam Gleave <[email protected]>)
* Update wording for RL zoo (Co-authored-by: Adam Gleave <[email protected]>)

1 parent: 237223f

File tree: 11 files changed (+75, -17 lines)

README.md

Lines changed: 8 additions & 6 deletions
@@ -36,7 +36,7 @@ you can take a look at the issues [#48](https://github.com/DLR-RM/stable-baselin
 | Type hints | :heavy_check_mark: |


-### Planned features (v1.1+)
+### Planned features

 Please take a look at the [Roadmap](https://github.com/DLR-RM/stable-baselines3/issues/1) and [Milestones](https://github.com/DLR-RM/stable-baselines3/milestones).

@@ -48,11 +48,13 @@ A migration guide from SB2 to SB3 can be found in the [documentation](https://st

 Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

-## RL Baselines3 Zoo: A Collection of Trained RL Agents
+## RL Baselines3 Zoo: A Training Framework for Stable Baselines3 Reinforcement Learning Agents

-[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo). is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines3.
+[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL).

-It also provides basic scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
+It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
+
+In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.

 Goals of this repository:

@@ -110,9 +112,9 @@ import gym

 from stable_baselines3 import PPO

-env = gym.make('CartPole-v1')
+env = gym.make("CartPole-v1")

-model = PPO('MlpPolicy', env, verbose=1)
+model = PPO("MlpPolicy", env, verbose=1)
 model.learn(total_timesteps=10000)

 obs = env.reset()

docs/_static/img/net_arch.png (new binary file, 135 KB)

docs/_static/img/sb3_loop.png (new binary file, 165 KB)

docs/_static/img/sb3_policy.png (new binary file, 176 KB)

docs/guide/custom_policy.rst

Lines changed: 42 additions & 2 deletions
@@ -13,9 +13,49 @@ and other type of input features (MlpPolicies).
 which handles bounds more correctly.


+SB3 Policy
+^^^^^^^^^^

-Custom Policy Architecture
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+SB3 networks are separated into two main parts (see figure below):
+
+- A features extractor (usually shared between actor and critic when applicable, to save computation)
+  whose role is to extract features (i.e. convert to a feature vector) from high-dimensional observations, for instance, a CNN that extracts features from images.
+  This is the ``features_extractor_class`` parameter. You can change the default parameters of that features extractor
+  by passing a ``features_extractor_kwargs`` parameter.
+
+- A (fully-connected) network that maps the features to actions/values. Its architecture is controlled by the ``net_arch`` parameter.
+
+
+.. note::
+
+    All observations are first pre-processed (e.g. images are normalized, discrete observations are converted to one-hot vectors, ...) before being fed to the features extractor.
+    In the case of vector observations, the features extractor is just a ``Flatten`` layer.
+
+
+.. image:: ../_static/img/net_arch.png
+
+
+SB3 policies are usually composed of several networks (actor/critic networks + target networks when applicable) together
+with the associated optimizers.
+
+Each of these networks has a features extractor followed by a fully-connected network.
+
+.. note::
+
+    When we refer to "policy" in Stable-Baselines3, this is usually an abuse of language compared to RL terminology.
+    In SB3, "policy" refers to the class that handles all the networks useful for training,
+    not only the network used to predict actions (the "learned controller").
+
+
+.. image:: ../_static/img/sb3_policy.png
+
+
+.. .. figure:: https://cdn-images-1.medium.com/max/960/1*h4WTQNVIsvMXJTCpXm_TAw.gif
+
+
+Custom Network Architecture
+^^^^^^^^^^^^^^^^^^^^^^^^^^^

 One way of customising the policy network architecture is to pass arguments when creating the model,
 using the ``policy_kwargs`` parameter:

docs/guide/developer.rst

Lines changed: 3 additions & 0 deletions
@@ -31,6 +31,9 @@ Each algorithm has two main methods:
 - ``.train()`` which updates the parameters using samples from the buffer


+.. image:: ../_static/img/sb3_loop.png
+
+
 Where to start?
 ===============

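The alternation between those two methods is the core of the training loop that the new figure illustrates. A toy, self-contained sketch of that control flow; ``ToyAlgorithm`` and its attributes are invented stand-ins for illustration, not the actual SB3 classes:

```python
import random

class ToyAlgorithm:
    """Toy stand-in mimicking the collect_rollouts()/train() alternation;
    not the real SB3 implementation."""

    def __init__(self, rollout_len=8):
        self.buffer = []              # stands in for the rollout/replay buffer
        self.rollout_len = rollout_len
        self.num_updates = 0

    def collect_rollouts(self):
        # Interact with a (dummy) environment and store transitions
        self.buffer = [random.random() for _ in range(self.rollout_len)]

    def train(self):
        # Update parameters using samples from the buffer
        assert self.buffer, "collect_rollouts() must run before train()"
        self.num_updates += 1

    def learn(self, total_timesteps):
        steps = 0
        while steps < total_timesteps:
            self.collect_rollouts()   # 1. gather experience
            steps += len(self.buffer)
            self.train()              # 2. update from the buffer
        return self

algo = ToyAlgorithm()
algo.learn(total_timesteps=32)        # 32 steps / 8 per rollout = 4 updates
```
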
docs/guide/migration.rst

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ Base-class (all algorithms)
 Policies
 ^^^^^^^^

-- ``cnn_extractor`` -> ``feature_extractor``, as ``feature_extractor`` is now used with ``MlpPolicy`` too
+- ``cnn_extractor`` -> ``features_extractor``, as ``features_extractor`` is now used with ``MlpPolicy`` too

 A2C
 ^^^

docs/guide/rl_zoo.rst

Lines changed: 5 additions & 3 deletions
@@ -4,9 +4,11 @@
 RL Baselines3 Zoo
 ==================

-`RL Baselines3 Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_. is a collection of pre-trained Reinforcement Learning agents using
-Stable-Baselines3.
-It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.
+`RL Baselines3 Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_ is a training framework for Reinforcement Learning (RL).
+
+It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.
+
+In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.

 Goals of this repository:

docs/index.rst

Lines changed: 2 additions & 2 deletions
@@ -12,9 +12,9 @@ It is the next major version of `Stable Baselines <https://github.com/hill-a/sta

 Github repository: https://github.com/DLR-RM/stable-baselines3

-RL Baselines3 Zoo (collection of pre-trained agents): https://github.com/DLR-RM/rl-baselines3-zoo
+RL Baselines3 Zoo (training framework for SB3): https://github.com/DLR-RM/rl-baselines3-zoo

-RL Baselines3 Zoo also offers a simple interface to train, evaluate agents and do hyperparameter tuning.
+RL Baselines3 Zoo provides a collection of pre-trained agents, scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

 SB3 Contrib (experimental RL code, latest algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
docs/misc/changelog.rst

Lines changed: 13 additions & 2 deletions
@@ -3,13 +3,22 @@
 Changelog
 ==========

-Release 1.0rc2 (WIP)
+Release 1.0 (2021-03-15)
 -------------------------------

+**First Major Version**
+
 Breaking Changes:
 ^^^^^^^^^^^^^^^^^
 - Removed ``stable_baselines3.common.cmd_util`` (already deprecated), please use ``env_util`` instead

+.. warning::
+
+    A refactoring of the ``HER`` algorithm is planned together with support for dictionary observations
+    (see `PR #243 <https://github.com/DLR-RM/stable-baselines3/pull/243>`_ and `#351 <https://github.com/DLR-RM/stable-baselines3/pull/351>`_).
+    This will be a backward-incompatible change (models trained with a previous version of ``HER`` won't work with the new version).
+
+
 New Features:
 ^^^^^^^^^^^^^
 - Added support for ``custom_objects`` when loading models

@@ -24,7 +33,9 @@ Documentation:
 - Added new project using SB3: rl_reach (@PierreExeter)
 - Added note about slow-down when switching to PyTorch
 - Add a note on continual learning and resetting environment
-
+- Updated RL-Zoo docs to reflect that it is more than a collection of trained agents
+- Added images to illustrate the training loop and custom policies (created with https://excalidraw.com/)
+- Updated the custom policy section

 Pre-Release 0.11.1 (2021-02-27)
 -------------------------------
