[Question] HER+SAC different results to SB2

Hi,
I was training on a custom environment with SB2 and wanted to switch to SB3 (mainly because having PyTorch would probably make my deployment easier).

So I trained with HER+SAC on SB3 using the same hyperparameters, but got different results. Is this to be expected because of a different SAC implementation, or could there be another reason?
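To make "same hyperparameters" concrete, here is a small sketch that compares the keyword arguments of the two `HER(...)` calls shown below. The dictionaries are copied by hand from those calls; `diff_kwargs` is a hypothetical helper, not part of either library. Anything not passed explicitly falls back to each library's defaults, which this sketch does not inspect.

```python
# Keyword arguments copied by hand from the two HER(...) calls in this post.
sb2_kwargs = dict(
    n_sampled_goal=4, goal_selection_strategy="future", verbose=1,
    policy_kwargs=dict(layers=[512, 512]), buffer_size=1000000,
    batch_size=256, gamma=0.99, random_exploration=0.0,
    ent_coef="auto", gradient_steps=1,
)
sb3_kwargs = dict(
    n_sampled_goal=4, online_sampling=False, goal_selection_strategy="future",
    verbose=1, policy_kwargs=dict(net_arch=[512, 512]), buffer_size=1000000,
    batch_size=256, gamma=0.99, ent_coef="auto", gradient_steps=1,
    max_episode_length=1000,
)


def diff_kwargs(a, b):
    """Return keys only in a, keys only in b, and shared keys whose values differ."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed


only_sb2, only_sb3, changed = diff_kwargs(sb2_kwargs, sb3_kwargs)
print("only in SB2 call:", only_sb2)
print("only in SB3 call:", only_sb3)
print("different values:", changed)
```

The explicitly passed values agree except for `random_exploration` (SB2 only), `online_sampling` and `max_episode_length` (SB3 only), and the renamed network key inside `policy_kwargs` (`layers` vs. `net_arch`).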

SB2 code
```
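import gym
from stable_baselines import HER, SAC
from stable_baselines.common.callbacks import EvalCallback
from stable_baselines.her import HERGoalEnvWrapper

# `path`, `TIMESTEPS` and the registration of the custom env are not shown in this snippet
env = gym.make('armflex-v4')
# Evaluation uses the same env instance, wrapped for HER
eval_env = HERGoalEnvWrapper(env)
eval_callback = EvalCallback(eval_env, best_model_save_path=path,
                             log_path=path, eval_freq=20000,
                             deterministic=True, render=False, n_eval_episodes=15)
model_class = SAC
goal_selection_strategy = 'future'
model = HER('MlpPolicy', env, model_class, n_sampled_goal=4,
            goal_selection_strategy=goal_selection_strategy, verbose=1,
            policy_kwargs=dict(layers=[512, 512]), buffer_size=1000000, batch_size=256,
            gamma=0.99, random_exploration=0.0, ent_coef='auto', gradient_steps=1)

model.learn(total_timesteps=TIMESTEPS, callback=eval_callback, log_interval=1)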
```
![sb2](https://user-images.githubusercontent.com/31765062/109496390-e1605c00-7a90-11eb-936b-c9bdfff35dee.png)

SB3 code
```
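# Imports assume an SB3 release that still provides the HER class and ObsDictWrapper
from stable_baselines3 import HER, SAC
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env.obs_dict_wrapper import ObsDictWrapper

# `env_name`, `eval_env`, `path` and `TIMESTEPS` are not defined in this snippet
env = make_vec_env(env_name, n_envs=1)
env = ObsDictWrapper(env)
eval_callback = EvalCallback(eval_env, best_model_save_path=path,
                             log_path=path, eval_freq=20000,
                             deterministic=True, render=False, n_eval_episodes=15)
model_class = SAC
goal_selection_strategy = 'future'
model = HER('MlpPolicy', env, model_class, n_sampled_goal=4, online_sampling=False,
            goal_selection_strategy=goal_selection_strategy, verbose=1,
            policy_kwargs=dict(net_arch=[512, 512]), buffer_size=1000000, batch_size=256,
            gamma=0.99, ent_coef='auto', gradient_steps=1, max_episode_length=1000)

model.learn(total_timesteps=TIMESTEPS, callback=eval_callback, log_interval=1)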
```
![sb3](https://user-images.githubusercontent.com/31765062/109496566-1d93bc80-7a91-11eb-8d47-ec34f90b472a.png)
This plot should also show the mean over 100 episodes.
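For reference, a mean over 100 episodes of the kind mentioned above could be computed roughly as follows. This is only a sketch: `running_mean` is a hypothetical helper, not the code that produced the plots, and the sample returns are made up.

```python
from collections import deque


def running_mean(episode_returns, window=100):
    """Mean over the most recent `window` episodes (fewer at the start)."""
    recent = deque(maxlen=window)
    total = 0.0
    means = []
    for ret in episode_returns:
        if len(recent) == window:
            total -= recent[0]  # oldest value is evicted by the append below
        recent.append(ret)
        total += ret
        means.append(total / len(recent))
    return means


# Made-up per-episode returns, only to show the call.
example_returns = [float(i) for i in range(250)]
smoothed = running_mean(example_returns, window=100)
print(len(smoothed), smoothed[-1])
```

Note that the first 99 points average over fewer than 100 episodes, so two curves are only comparable if both are smoothed with the same window.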



[Question] HER+SAC different results to SB2 #335
