Closed
Labels
custom gym env, question
Description
Hi,
I was previously training on a custom environment with SB2 and wanted to switch to SB3 (mainly because having PyTorch would probably make my deployment easier).
So I trained on SB3 with HER+SAC and the same hyperparameters, but got different results. Is this to be expected because of a different SAC implementation, or what else could be the reason?
SB2 code
import gym
from stable_baselines import HER, SAC
from stable_baselines.common.callbacks import EvalCallback
from stable_baselines.her import HERGoalEnvWrapper

# path and TIMESTEPS are defined elsewhere in the script
env = gym.make('armflex-v4')
eval_env = HERGoalEnvWrapper(env)
eval_callback = EvalCallback(eval_env, best_model_save_path=path,
                             log_path=path, eval_freq=20000,
                             deterministic=True, render=False, n_eval_episodes=15)
model_class = SAC
goal_selection_strategy = 'future'
model = HER('MlpPolicy', env, model_class, n_sampled_goal=4,
            goal_selection_strategy=goal_selection_strategy, verbose=1,
            policy_kwargs=dict(layers=[512, 512]), buffer_size=1000000,
            batch_size=256, gamma=0.99, random_exploration=0.0,
            ent_coef='auto', gradient_steps=1)
model.learn(total_timesteps=TIMESTEPS, callback=eval_callback, log_interval=1)
SB3 code
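from stable_baselines3 import HER, SAC
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env.obs_dict_wrapper import ObsDictWrapper

# env_name, path and TIMESTEPS are defined elsewhere in the script
# NOTE: eval_env is not defined in this snippet
env = make_vec_env(env_name, n_envs=1)
env = ObsDictWrapper(env)
eval_callback = EvalCallback(eval_env, best_model_save_path=path,
                             log_path=path, eval_freq=20000,
                             deterministic=True, render=False, n_eval_episodes=15)
model_class = SAC
goal_selection_strategy = 'future'
model = HER('MlpPolicy', env, model_class, n_sampled_goal=4, online_sampling=False,
            goal_selection_strategy=goal_selection_strategy, verbose=1,
            policy_kwargs=dict(net_arch=[512, 512]), buffer_size=1000000,
            batch_size=256, gamma=0.99, ent_coef='auto', gradient_steps=1,
            max_episode_length=1000)
model.learn(total_timesteps=TIMESTEPS, callback=eval_callback, log_interval=1)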
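To make the comparison between the two calls explicit, here is a minimal sketch in plain Python (it does not import either library) that diffs the keyword arguments passed to `HER(...)` in the SB2 and SB3 snippets. The helper name `diff_kwargs` is hypothetical and only for illustration. It compares only the arguments written out above, so any defaults that differ between the two libraries would not show up here.

```python
# Keyword arguments copied from the two HER(...) calls above.
sb2_kwargs = dict(
    n_sampled_goal=4, goal_selection_strategy="future", verbose=1,
    policy_kwargs=dict(layers=[512, 512]), buffer_size=1000000,
    batch_size=256, gamma=0.99, random_exploration=0.0,
    ent_coef="auto", gradient_steps=1,
)
sb3_kwargs = dict(
    n_sampled_goal=4, online_sampling=False,
    goal_selection_strategy="future", verbose=1,
    policy_kwargs=dict(net_arch=[512, 512]), buffer_size=1000000,
    batch_size=256, gamma=0.99, ent_coef="auto", gradient_steps=1,
    max_episode_length=1000,
)


def diff_kwargs(a, b):
    """Hypothetical helper: return (only_in_a, only_in_b, changed)."""
    only_a = {k: a[k] for k in a.keys() - b.keys()}
    only_b = {k: b[k] for k in b.keys() - a.keys()}
    # Keys present in both calls but with different values.
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_a, only_b, changed


only_sb2, only_sb3, changed = diff_kwargs(sb2_kwargs, sb3_kwargs)
print("only in SB2 call:", only_sb2)
print("only in SB3 call:", only_sb3)
print("different values:", changed)
```

For the two snippets above, this reports `random_exploration` as passed only in the SB2 call, `online_sampling` and `max_episode_length` as passed only in the SB3 call, and `policy_kwargs` as differing in its key name (`layers` vs `net_arch`).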

