[Question] HER+SAC different results to SB2 #335

@Ludilu

Hi,
I was previously training on a custom environment with SB2 and wanted to switch to SB3, mainly because PyTorch would probably make my deployment easier.

So I trained on SB3 with HER+SAC and the same hyperparameters, but got different results. Is this to be expected because of the different SAC implementation, or what else could be the reason?
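For context, both snippets below set goal_selection_strategy='future' with n_sampled_goal=4. A minimal plain-Python sketch of what that relabeling is commonly understood to do follows. It is not the SB2 or SB3 implementation; the function name `relabel_future`, the dictionary keys, and the sparse 0/-1 reward are assumptions made only for illustration.

```python
import random


def relabel_future(episode, n_sampled_goal=4, rng=None):
    """Create virtual transitions whose goal is a goal achieved later on.

    `episode` is a list of dicts with keys "obs", "action" and
    "achieved_goal" (the goal reached after taking the action).
    All names here are hypothetical, chosen for this sketch.
    """
    if rng is None:
        rng = random.Random(0)
    relabeled = []
    for t, step in enumerate(episode):
        for _ in range(n_sampled_goal):
            # 'future': pick a timestep from t up to the end of the episode
            future_t = rng.randrange(t, len(episode))
            new_goal = episode[future_t]["achieved_goal"]
            # Assumed sparse reward: 0 when the substituted goal is reached
            reward = 0.0 if step["achieved_goal"] == new_goal else -1.0
            relabeled.append({
                "t": t,
                "future_t": future_t,
                "obs": step["obs"],
                "action": step["action"],
                "desired_goal": new_goal,
                "reward": reward,
            })
    return relabeled
```

With n_sampled_goal=4, this sketch produces four virtual transitions per real transition, so differences in how and when such transitions are sampled could plausibly affect learning curves.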

SB2 code

    import gym
    from stable_baselines import HER, SAC
    from stable_baselines.common.callbacks import EvalCallback
    from stable_baselines.her import HERGoalEnvWrapper

    # `path` and `TIMESTEPS` are set elsewhere (not shown)
    env = gym.make('armflex-v4')  # custom environment
    eval_env = HERGoalEnvWrapper(env)
    eval_callback = EvalCallback(eval_env, best_model_save_path=path,
                                 log_path=path, eval_freq=20000,
                                 deterministic=True, render=False,
                                 n_eval_episodes=15)
    model_class = SAC
    goal_selection_strategy = 'future'
    model = HER('MlpPolicy', env, model_class, n_sampled_goal=4,
                goal_selection_strategy=goal_selection_strategy, verbose=1,
                policy_kwargs=dict(layers=[512, 512]), buffer_size=1000000,
                batch_size=256, gamma=0.99, random_exploration=0.0,
                ent_coef='auto', gradient_steps=1)

    model.learn(total_timesteps=TIMESTEPS, callback=eval_callback, log_interval=1)

[Figure: sb2 training results]

SB3 code

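    # Import paths assumed for the SB3 release in use; they may differ by version
    from stable_baselines3 import HER, SAC
    from stable_baselines3.common.callbacks import EvalCallback
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env.obs_dict_wrapper import ObsDictWrapper

    # `env_name`, `eval_env`, `path` and `TIMESTEPS` are set elsewhere (not shown)
    env = make_vec_env(env_name, n_envs=1)
    env = ObsDictWrapper(env)
    eval_callback = EvalCallback(eval_env, best_model_save_path=path,
                                 log_path=path, eval_freq=20000,
                                 deterministic=True, render=False,
                                 n_eval_episodes=15)
    model_class = SAC
    goal_selection_strategy = 'future'
    model = HER('MlpPolicy', env, model_class, n_sampled_goal=4, online_sampling=False,
                goal_selection_strategy=goal_selection_strategy, verbose=1,
                policy_kwargs=dict(net_arch=[512, 512]), buffer_size=1000000,
                batch_size=256, gamma=0.99, ent_coef='auto', gradient_steps=1,
                max_episode_length=1000)

    model.learn(total_timesteps=TIMESTEPS, callback=eval_callback, log_interval=1)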

[Figure: sb3 training results]
This should also be the mean over 100 episodes.
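To make "mean over 100 episodes" concrete, here is a minimal sketch of such a smoothing, assuming the plotted value is a plain mean over the most recent 100 episode returns. The function name `rolling_mean` and its `window` parameter are hypothetical and are not taken from either library's logger.

```python
from collections import deque


def rolling_mean(episode_rewards, window=100):
    """Return the running mean over the last `window` episode rewards."""
    recent = deque(maxlen=window)
    total = 0.0
    means = []
    for reward in episode_rewards:
        if len(recent) == window:
            # The oldest value is about to be evicted by append()
            total -= recent[0]
        recent.append(reward)
        total += reward
        means.append(total / len(recent))
    return means
```

Applying the same window to both runs would help ensure the two curves are compared on equal terms; before `window` episodes have finished, the mean is taken over however many episodes exist so far.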

Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)