Implements the P3O algorithm from the NeurIPS 2025 paper Sequential Monte Carlo for Policy Optimization in Continuous POMDPs. This code was written by Sahel Iqbal and Hany Abdulsamad.
P3O is a policy optimization algorithm for partially observable Markov decision processes (POMDPs) with continuous state, action, and observation spaces. See the scripts in `examples/` for demonstrations of how to train policies using P3O.
Install JAX for the available hardware. Then run

```bash
pip install -e .
```

for an editable install.
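To check that JAX was installed for the intended backend, a quick sanity check such as the following can help (the printed device list depends on your hardware):

```python
import jax
import jax.numpy as jnp

# Show which devices JAX can use (CPU, GPU, or TPU, depending on the install).
print(jax.devices())

# Run a tiny jitted computation to confirm the backend works end to end.
print(jax.jit(lambda x: (x ** 2).sum())(jnp.arange(4.0)))
```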
We provide multiple environments to test P3O's optimal information-gathering behavior:

- `pendulum`: a pendulum swing-up task, where only the angular position is observable.
- `cartpole`: a cart-pole swing-up task, where only the angular and Cartesian positions are observable.
- `light-dark-2d`: a 2D navigation task with location-dependent noise (see the sketch below for the general idea).
- `triangulation`: a 2D navigation task with heading-only observations.
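As a rough illustration of what location-dependent noise means in a light-dark task, here is a minimal sketch in JAX. This is not the observation model used in this repository; the light position, noise scaling, and function name are assumptions made for illustration only:

```python
import jax
import jax.numpy as jnp

LIGHT_X = 5.0  # hypothetical x-coordinate of the bright region

def noisy_observation(key, position):
    """Observe a 2D position with noise that grows away from the light.

    `position` is the true (x, y) location; the further it lies from the
    bright region, the noisier the observation, so a good policy detours
    toward the light to localize itself before heading to the goal.
    """
    noise_scale = 0.1 + 0.5 * jnp.abs(position[0] - LIGHT_X)
    return position + noise_scale * jax.random.normal(key, shape=position.shape)

obs = noisy_observation(jax.random.PRNGKey(0), jnp.array([1.0, 2.0]))
```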
Each environment can be run with two policies:

- a policy with history inputs (`recurrent`)
- a policy with belief-state inputs (`attention`)
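For a sense of how the two input types differ, below is a minimal sketch in plain JAX. The encoders, dimensions, and parameter names are illustrative assumptions, not the architectures used in this repository: the recurrent encoder folds an observation history into a fixed-size vector, while the attention encoder summarizes a weighted particle belief with a single permutation-invariant readout.

```python
import jax
import jax.numpy as jnp

# Hypothetical dimensions and parameters, for illustration only.
OBS_DIM, STATE_DIM, HIDDEN_DIM = 2, 2, 16
key = jax.random.PRNGKey(0)
k1, k2, k3, k4, k5 = jax.random.split(key, 5)
params = {
    "W_h": 0.1 * jax.random.normal(k1, (HIDDEN_DIM, HIDDEN_DIM)),
    "W_o": 0.1 * jax.random.normal(k2, (HIDDEN_DIM, OBS_DIM)),
    "W_k": 0.1 * jax.random.normal(k3, (STATE_DIM, HIDDEN_DIM)),
    "W_v": 0.1 * jax.random.normal(k4, (STATE_DIM, HIDDEN_DIM)),
    "query": jax.random.normal(k5, (HIDDEN_DIM,)),
}

def recurrent_encoder(history):
    """Fold an observation history of shape (T, OBS_DIM) into one hidden vector."""
    def step(h, obs):
        h = jnp.tanh(params["W_h"] @ h + params["W_o"] @ obs)
        return h, None
    h_final, _ = jax.lax.scan(step, jnp.zeros(HIDDEN_DIM), history)
    return h_final  # would be fed to an action head

def attention_encoder(particles, weights):
    """Summarize a weighted particle belief of shape (N, STATE_DIM) with one
    attention readout, so the encoding is permutation-invariant in the particles."""
    scores = (particles @ params["W_k"]) @ params["query"]   # (N,)
    attn = jax.nn.softmax(scores + jnp.log(weights + 1e-8))
    return attn @ (particles @ params["W_v"])                # (HIDDEN_DIM,)

# Example inputs: a 10-step observation history and a 100-particle belief.
history = jnp.zeros((10, OBS_DIM))
particles = jax.random.normal(key, (100, STATE_DIM))
weights = jnp.ones(100) / 100
print(recurrent_encoder(history).shape, attention_encoder(particles, weights).shape)
```

Either summary vector would then be passed to an action head; the attention variant has the added property of being invariant to the ordering of the particles in the belief.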
For example, for the light-dark environment run

```bash
python examples/lightdark2d/p3o_recurrent.py
```

or

```bash
python examples/lightdark2d/p3o_attention.py
```

We provide the following baselines for comparison:
- Deep Variational Reinforcement Learning for POMDPs (DVRL) - see `baselines/dvrl`.
- Stochastic Latent Actor-Critic (SLAC) - see `baselines/slac`.
- DualSMC - see `baselines/dsmc`.
See `baselines/README.md` for details.
If you find the code useful, please cite our paper:
```bibtex
@inproceedings{abdulsamad2025sequential,
  title     = {Sequential {Monte Carlo} for policy optimization in continuous {POMDPs}},
  author    = {Hany Abdulsamad and Sahel Iqbal and Simo S{\"a}rkk{\"a}},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
}
```