Description
Hello!
In my research with PPO I have 365 samples, which is basically one feature vector for every day of the year. When training the model on this data I kept the default parameter n_steps=2048 and simply set total_timesteps to 100,000.
However, I do not quite understand what happens when the agent finishes these 365 steps. Does the environment keep restarting on the same data until the rollout reaches 2048 steps overall, and only then update the policy?
The results I got seemed a bit better with n_steps=2048 than with n_steps=365, keeping total_timesteps the same in both cases. Could the reason be that updating the policy less often (every 2048 steps, so roughly 49 updates over 100,000 timesteps) is more stable than updating every 365 steps (roughly 274 updates)?
I would greatly appreciate any tips regarding these parameters, and especially an explanation of how a rollout reaches 2048 steps when an episode is only 365 steps long!
Here is sample code for the two parameter settings (env and logdir are defined earlier in my script):

from stable_baselines3 import PPO

# case I: default n_steps (2048)
model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=logdir)
model.learn(total_timesteps=100000)

# case II: n_steps matched to the episode length
model = PPO('MlpPolicy', env, n_steps=365, verbose=1, tensorboard_log=logdir)
model.learn(total_timesteps=100000)
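
For reference, here is a toy check I put together to see how episode resets interact with the rollout (YearEnv is a hypothetical stand-in for my real environment, assuming a recent stable_baselines3 that uses gymnasium): it counts how many times the env is reset during a single 2048-step rollout. If collection really continues across episode boundaries, I would expect about 2048 // 365 = 5 mid-rollout resets plus the initial one.

import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO

class YearEnv(gym.Env):
    """Toy episodic env that terminates after 365 steps, one per 'day'."""
    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self.day = 0
        self.resets = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.day = 0
        self.resets += 1  # count every reset, including mid-rollout auto-resets
        return self.observation_space.sample(), {}

    def step(self, action):
        self.day += 1
        terminated = self.day >= 365  # episode ends after one "year"
        return self.observation_space.sample(), 0.0, terminated, False, {}

env = YearEnv()
model = PPO('MlpPolicy', env, n_steps=2048, verbose=0)
model.learn(total_timesteps=2048)  # exactly one rollout before the first update
# expecting 1 initial reset + 5 episode ends inside the rollout = 6
print(env.resets)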