igor-cheb/RL_practice_problems

Several practice problems for key RL methods.

Tabular methods

  • 10-armed bandit problem (Sutton/Barto, section 2.3)
  • Simple gridworld (Sutton/Barto, fig. 4.1) - dynamic programming, policy improvement
  • Jack's car rental (Sutton/Barto, ex. 4.7) - dynamic programming, policy iteration
  • Racetrack (Sutton/Barto, ex. 5.12, p. 111) - off-policy Monte Carlo (MC)
  • Windy Gridworld with King’s Moves (Sutton/Barto, ex. 6.9) - SARSA (TD)
  • Taxi-dispatch (https://gym.openai.com/envs/Taxi-v3/) - Q-learning
  • Mazeworld - Dyna-Q
  • Hard Racetrack - Dyna-Q with prioritised sweeping
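
As a rough illustration of the first problem in the list, here is a minimal sketch of an epsilon-greedy agent with sample-average value estimates on a stationary 10-armed bandit. It is not taken from this repository: the function name `run_bandit`, its parameters, and the Gaussian reward setup are assumptions chosen for the example, and only the Python standard library is used.

```python
import random


def run_bandit(n_arms=10, steps=1000, epsilon=0.1, seed=0):
    """Hypothetical helper: epsilon-greedy agent on a stationary bandit.

    Assumed setup: true action values ~ N(0, 1), rewards ~ N(true value, 1).
    """
    rng = random.Random(seed)
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]

    estimates = [0.0] * n_arms  # current value estimate per arm
    counts = [0] * n_arms       # number of times each arm was pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick a greedy arm, breaking ties at random.
            best = max(estimates)
            action = rng.choice(
                [a for a, q in enumerate(estimates) if q == best]
            )

        reward = rng.gauss(true_values[action], 1.0)

        # Incremental sample-average update: Q <- Q + (R - Q) / N.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return true_values, estimates, counts, total_reward / steps


if __name__ == "__main__":
    true_values, estimates, counts, avg_reward = run_bandit()
    print("most pulled arm:", counts.index(max(counts)))
    print("average reward per step:", round(avg_reward, 3))
```

Setting `epsilon=0.0` gives a purely greedy agent, which makes it easy to compare exploration settings on the same seed.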

Policy gradient methods
