What happens when episode length is less than n_steps parameter in PPO #560

@Zymoo

Description

Hello!

In my research with PPO I have 365 samples, essentially one feature vector for each day of the year. When training the model on this data I kept the default n_steps parameter of 2048 and simply set total_timesteps to 100,000.

However, I do not quite understand what happens when the agent finishes these 365 steps. Does it keep restarting the environment until it has collected 2048 steps overall, and only then update its policy?

The results I got seemed slightly better with 2048 steps than with 365, keeping total_timesteps the same in both cases. Could the reason be that updating the policy less often (every 2048 steps) is more stable than updating it every 365 steps?

I would greatly appreciate any tips regarding these parameters, and especially an explanation of how 2048 steps are reached with an episode of only 365!

Here is the sample code for the parameter settings:

  from stable_baselines3 import PPO

  # case I: default n_steps=2048
  model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=logdir)
  model.learn(total_timesteps=100000)

  # case II: n_steps matched to the episode length
  model = PPO('MlpPolicy', env, n_steps=365, verbose=1, tensorboard_log=logdir)
  model.learn(total_timesteps=100000)
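To make the question concrete, here is a toy simulation of the behaviour I am asking about (plain Python, not Stable-Baselines3 internals; the counter names are my own). It assumes that PPO always collects exactly n_steps transitions per rollout, letting the environment auto-reset whenever the 365-step episode ends mid-rollout, and performs one policy update per rollout:

```python
n_steps = 2048        # rollout buffer size per update (SB3 default)
episode_len = 365     # one transition per day of the year
total_timesteps = 100_000

steps = 0             # total environment steps taken
episode_step = 0      # position within the current 365-step episode
resets = 0            # how many times the episode ended and restarted
updates = 0           # how many policy updates were performed

while steps < total_timesteps:
    # collect one rollout of exactly n_steps transitions
    for _ in range(n_steps):
        episode_step += 1
        steps += 1
        if episode_step == episode_len:  # episode ends mid-rollout
            episode_step = 0             # auto-reset, keep collecting
            resets += 1
    updates += 1                         # one PPO update per full rollout

print(updates, resets, steps)  # → 49 274 100352
```

Under these assumptions, case I spans several episodes per update (2048 / 365 ≈ 5.6), so each rollout mixes data from roughly six restarts of the year, whereas case II would update exactly once per episode.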
