Sahel13/particle-pomdp

Particle POMDP Policy Optimization (P3O)

Implements the P3O algorithm from the NeurIPS 2025 paper Sequential Monte Carlo for Policy Optimization in Continuous POMDPs. This code was written by Sahel Iqbal and Hany Abdulsamad.

P3O is a policy optimization algorithm for partially observable Markov decision processes (POMDPs) with continuous state, action and observation spaces. See the scripts in examples/ for demonstrations of how to train policies using P3O.

Installation

Install JAX for the available hardware. Then run

$ pip install -e .

for an editable install.
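
After installing, it can help to confirm which backend JAX picked up. The snippet below is a minimal sketch and not part of the repository; the helper name describe_jax_backend is hypothetical. It relies only on the public jax.__version__, jax.default_backend() and jax.devices(), and returns a hint instead of failing when JAX is missing.

```python
import importlib.util


def describe_jax_backend() -> str:
    """Summarize the local JAX install, or say that JAX is missing.

    Hypothetical helper for illustration; it is not shipped with this repository.
    """
    if importlib.util.find_spec("jax") is None:
        return "jax is not installed"
    import jax

    n_devices = len(jax.devices())
    return f"jax {jax.__version__}, backend={jax.default_backend()}, devices={n_devices}"


print(describe_jax_backend())
```

If the reported backend is cpu on a machine with an accelerator, the hardware-specific JAX build is probably not the one that was installed.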

Examples

We provide several environments to test P3O's optimal information-gathering behavior:

  • pendulum: A pendulum swing-up task, where only the angular position is observable.
  • cartpole: A cart-pole swing-up task, where only the angular and Cartesian positions are observable.
  • light-dark-2d: A 2D navigation task with location-dependent noise.
  • triangulation: A 2D navigation task with heading-only observations.
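
To make "location-dependent noise" concrete, the following is a minimal sketch of a light-dark style observation model and a single particle reweighting step. It is plain standard-library Python rather than JAX so that it runs anywhere, and it is not taken from the repository: LIGHT_X, obs_std, observe and reweight are hypothetical names, and the linear noise schedule is an assumption chosen only for illustration.

```python
import math
import random

LIGHT_X = 5.0  # assumed x-coordinate of the well-lit region (hypothetical)


def obs_std(x: float) -> float:
    """Assumed noise scale: small near the light, growing with distance from it."""
    return 0.1 + 0.5 * abs(x - LIGHT_X)


def observe(pos, rng):
    """Noisy 2D position reading whose noise scale depends on the true x-coordinate."""
    s = obs_std(pos[0])
    return (pos[0] + rng.gauss(0.0, s), pos[1] + rng.gauss(0.0, s))


def reweight(particles, obs):
    """Normalized importance weights of the particles for one observation."""
    log_w = []
    for px, py in particles:
        s = obs_std(px)
        sq_dist = (obs[0] - px) ** 2 + (obs[1] - py) ** 2
        # Log-density of an isotropic 2D Gaussian, up to an additive constant.
        log_w.append(-sq_dist / (2.0 * s * s) - 2.0 * math.log(s))
    max_log_w = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


rng = random.Random(0)
particles = [(rng.uniform(0.0, 10.0), rng.uniform(-2.0, 2.0)) for _ in range(500)]
weights = reweight(particles, observe((5.0, 0.0), rng))
```

Under this assumed model, observations taken near LIGHT_X are far more informative than those taken in the dark, so an agent can reduce its uncertainty by first moving toward the light. This is the kind of information-gathering behavior these environments are meant to test.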

Each environment can be run with two policies:

  • recurrent: a policy with history inputs.
  • attention: a policy with belief state inputs.

For example, to train on the light-dark environment, run:

python examples/lightdark2d/p3o_recurrent.py

or

python examples/lightdark2d/p3o_attention.py

Baselines

We provide the following baselines for comparison:

  1. Deep Variational Reinforcement Learning for POMDPs (DVRL) - See baselines/dvrl.
  2. Stochastic Latent Actor-Critic (SLAC) - See baselines/slac.
  3. DualSMC - See baselines/dsmc.

See baselines/README.md for details.

Citation

If you find the code useful, please cite our paper:

@inproceedings{abdulsamad2025sequential,
  title = {Sequential {Monte Carlo} for policy optimization in continuous {POMDPs}},
  author = {Hany Abdulsamad and Sahel Iqbal and Simo S{\"a}rkk{\"a}},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2025},
}
