Changes from all commits (37 commits)
2d6de10  first push (akbir, Oct 27, 2022)
6e085aa  first push - add IteratedMatrixGame (akbir, Oct 27, 2022)
6e8f301  first push - add IteratedMatrixGame (akbir, Oct 27, 2022)
114c618  Merge branch 'main' of github.com:akbir/pax into clean_up_envs (akbir, Oct 27, 2022)
d89f9d7  first push - add IteratedMatrixGame (akbir, Oct 27, 2022)
655e278  first push - add IteratedMatrixGame (akbir, Oct 27, 2022)
83a8c88  first test passes! (akbir, Oct 27, 2022)
9a32d11  changed everything up to tft (akbir, Oct 27, 2022)
75c03f2  do not use gymnax.environments because it makes single agent assumptions (akbir, Oct 27, 2022)
5e48408  titfortat now works (akbir, Oct 27, 2022)
c29f94c  all tests pass for iterated_matrix_game (akbir, Oct 27, 2022)
1f96f89  first push for infinite matrix game (akbir, Oct 27, 2022)
0877855  added tests for infinite matrix game (akbir, Oct 27, 2022)
1a739ab  added tests for infinite matrix game (akbir, Oct 27, 2022)
077fe48  shout out chris (akbir, Oct 27, 2022)
fa9ca39  updated to include coingame (akbir, Oct 28, 2022)
c087d89  moved some arguments back to kwargs (akbir, Oct 28, 2022)
6808c8c  revert existing envs (akbir, Oct 28, 2022)
c1d5ed2  fixed render (akbir, Oct 28, 2022)
01e3a13  WIP: first push (akbir, Oct 30, 2022)
3b6de16  changed inner/outer/loop (akbir, Oct 31, 2022)
28c0354  getting rid of timestep for ppo (Aidandos, Oct 31, 2022)
f616a6c  updated runner (akbir, Oct 31, 2022)
468e841  removing timestep from ppo, ppogru, and mfos (Aidandos, Oct 31, 2022)
e6c0464  change strategies update interface (Aidandos, Oct 31, 2022)
58e12ba  updated runner (akbir, Oct 31, 2022)
23ec810  updated runner (akbir, Oct 31, 2022)
4c699f7  fixed for mfos (akbir, Oct 31, 2022)
3d3dbc1  fixed for mfos (akbir, Oct 31, 2022)
29260e1  minor changes to mfos (akbir, Oct 31, 2022)
dfdf06e  rng fixes (akbir, Oct 31, 2022)
6a186ee  evo runs (akbir, Oct 31, 2022)
a5c8168  updated configs (akbir, Oct 31, 2022)
c159bb8  updated eval runner (akbir, Oct 31, 2022)
0299331  updated to work for cg (akbir, Oct 31, 2022)
b3f8c50  updated to work for cg (akbir, Oct 31, 2022)
e637fb9  fixed sequential bug (akbir, Oct 31, 2022)
3 changes: 2 additions & 1 deletion pax/conf/experiment/cg/sanity.yaml
@@ -8,8 +8,9 @@ agent2: 'PPO_memory'
 env_id: coin_game
 env_type: sequential
 egocentric: True
+cnn: False
 env_discount: 0.96
-payoff: [[-1, -1], [-3, 0], [0, -3], [-2, -2]]
+payoff: [[1, 1, -2], [1, 1, -2]]
 runner: rl

 # Training hyperparameters
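For context on the payoff change above: the old value is the standard 4x2 iterated-prisoners-dilemma table, which this diff replaces with a coin-game-style [[1, 1, -2], [1, 1, -2]] layout. A minimal sketch of how a 4x2 table like the old one is conventionally indexed; the row order [CC, CD, DC, DD] and the helper function are assumptions for illustration, not the Pax API:

```python
# Hypothetical sketch (not taken from the Pax codebase).
# Assumed row order: [CC, CD, DC, DD], actions 0 = cooperate, 1 = defect.
IPD_PAYOFF = [[-1, -1], [-3, 0], [0, -3], [-2, -2]]


def joint_rewards(a1: int, a2: int) -> tuple[int, int]:
    """Map two binary actions to the per-player rewards of that joint outcome."""
    row = IPD_PAYOFF[2 * a1 + a2]
    return row[0], row[1]


print(joint_rewards(0, 0))  # (-1, -1): mutual cooperation
print(joint_rewards(1, 0))  # (0, -3): player 1 defects, player 2 cooperates
```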
1 change: 0 additions & 1 deletion pax/conf/experiment/ipd/earl_v_ppo.yaml
@@ -26,7 +26,6 @@ num_generations: 5000
 total_timesteps: 1e11

 # Evaluation
-num_seeds: 20
 # # EARL vs. PPO trained on seed=0
 # run_path: ucl-dark/ipd/13o3v95p
 # model_path: exp/EARL-PPO_memory-vs-PPO/run-seed-0-OpenES-pop-size-1000-num-opps-1/2022-09-15_00.15.31.908871/generation_2900
7 changes: 3 additions & 4 deletions pax/conf/experiment/ipd/mfos_v_ppo.yaml
@@ -11,19 +11,18 @@ env_discount: 0.96
 payoff: [[-1, -1], [-3, 0], [0, -3], [-2, -2]]

 # Runner
-runner: rl
-
+runner: evo

 # Training
 top_k: 5
 popsize: 1000
-num_envs: 2
+num_envs: 100
 num_opps: 1
 num_steps: 10_000
 num_inner_steps: 100
 num_generations: 5000
 total_timesteps: 1e11
-num_devices: 1
+num_devices: 2

 # PPO agent parameters
 ppo:
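The evo settings in this diff pair a population of 1000 with two devices and 100 parallel environments. A back-of-envelope check of what that implies for sharding; the arithmetic is an assumption about how an evosax-style runner would split the population, not code from the Pax repository:

```python
# Hypothetical sharding check; names mirror the config keys above,
# but the split logic is an assumption, not the Pax implementation.
popsize = 1000
num_devices = 2
num_envs = 100
num_opps = 1

# A pmap-style runner typically needs the population to divide evenly.
assert popsize % num_devices == 0, "population must split evenly across devices"

members_per_device = popsize // num_devices  # population shard per device
envs_per_member = num_envs * num_opps        # parallel rollouts per member

print(members_per_device, envs_per_member)  # 500 100
```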
6 changes: 2 additions & 4 deletions pax/conf/experiment/ipd/mfos_v_ppo_mem.yaml
@@ -11,8 +11,7 @@ env_discount: 0.96
 payoff: [[-1, -1], [-3, 0], [0, -3], [-2, -2]]

 # Runner
-evo: True
-eval: False
+runner: evo

 # Training
 top_k: 5
@@ -23,8 +22,7 @@ num_steps: 10_000
 num_inner_steps: 100
 num_generations: 5000
 total_timesteps: 1e11
-num_devices: 1
-runner: rl
+num_devices: 2

 # PPO agent parameters
 ppo:
6 changes: 2 additions & 4 deletions pax/conf/experiment/ipd/mfos_v_tabular.yaml
@@ -11,7 +11,7 @@ env_discount: 0.96
 payoff: [[-1, -1], [-3, 0], [0, -3], [-2, -2]]

 # Runner
-runner: rl
+runner: evo

 # Training
 top_k: 5
@@ -22,10 +22,8 @@ num_steps: 10_000
 num_inner_steps: 100
 num_generations: 5000
 total_timesteps: 1e11
-num_devices: 1
+num_devices: 2

-# Evaluation
-num_seeds: 20
 # MFOS vs. Tabular trained on seed = 0
 run_path: ucl-dark/ipd/1r9txdso
 model_path: exp/GS-MFOS-vs-Tabular/run-seed-0-pop-size-1000/2022-09-25_20.32.20.821162/generation_4400
1 change: 1 addition & 0 deletions pax/conf/experiment/ipd/ppo.yaml
@@ -17,6 +17,7 @@ runner: rl
 num_envs: 100
 num_opps: 1
 num_steps: 150 # number of steps per episode
+num_inner_steps: 150
 total_timesteps: 1_000_000

 # Evaluation
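The added num_inner_steps: 150 mirrors num_steps, the per-episode length, in this config. A quick sanity check of what these numbers imply for training length; the names mirror the config keys, but the update arithmetic is an assumption about a typical PPO runner, not taken from Pax:

```python
# Hypothetical back-of-envelope check (assumed runner arithmetic,
# not the Pax implementation).
num_envs = 100
num_steps = 150            # steps per episode (and now num_inner_steps)
total_timesteps = 1_000_000

# One rollout batch gathers one episode from every parallel environment.
steps_per_update = num_envs * num_steps       # 15_000 env steps per batch
num_updates = total_timesteps // steps_per_update

print(num_updates)  # 66 update cycles under these settings
```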
1 change: 1 addition & 0 deletions pax/conf/experiment/ipd/ppo_memory.yaml
@@ -20,6 +20,7 @@ eval: False
 num_envs: 100
 num_opps: 1
 num_steps: 150 # number of steps per episode
+num_inner_steps: 150
 total_timesteps: 2e7

 # Useful information
1 change: 0 additions & 1 deletion pax/conf/experiment/mp/earl_v_ppo.yaml
@@ -25,7 +25,6 @@ total_timesteps: 1e11
 num_devices: 1

 # Evaluation
-num_seeds: 20
 # # EARL vs. PPO trained on seed=0
 # run_path: ucl-dark/ipd/13o3v95p
 # model_path: exp/EARL-PPO_memory-vs-PPO/run-seed-0-OpenES-pop-size-1000-num-opps-1/2022-09-15_00.15.31.908871/generation_2900
1 change: 0 additions & 1 deletion pax/conf/experiment/mp/earl_v_ppo_mem.yaml
@@ -25,7 +25,6 @@ total_timesteps: 1e11
 num_devices: 1

 # Evaluation
-num_seeds: 20
 # # EARL vs. PPO trained on seed=0
 # run_path: ucl-dark/ipd/13o3v95p
 # model_path: exp/EARL-PPO_memory-vs-PPO/run-seed-0-OpenES-pop-size-1000-num-opps-1/2022-09-15_00.15.31.908871/generation_2900
4 changes: 1 addition & 3 deletions pax/conf/experiment/mp/gs_v_ppo_mem.yaml
@@ -24,9 +24,7 @@ num_generations: 5000
 total_timesteps: 1e11
 num_devices: 1

-# Evaluation
-num_seeds: 20
-# # EARL vs. PPO trained on seed=0
+# EARL vs. PPO trained on seed=0
 # run_path: ucl-dark/ipd/13o3v95p
 # model_path: exp/EARL-PPO_memory-vs-PPO/run-seed-0-OpenES-pop-size-1000-num-opps-1/2022-09-15_00.15.31.908871/generation_2900
 # EARL vs. PPO trained on seed=1
4 changes: 1 addition & 3 deletions pax/conf/experiment/mp/gs_v_tabular.yaml
@@ -24,9 +24,7 @@ num_generations: 5000
 total_timesteps: 1e11
 num_devices: 1

-# Evaluation
-num_seeds: 20
-# # EARL vs. PPO trained on seed=0
+# EARL vs. PPO trained on seed=0
 # run_path: ucl-dark/ipd/13o3v95p
 # model_path: exp/EARL-PPO_memory-vs-PPO/run-seed-0-OpenES-pop-size-1000-num-opps-1/2022-09-15_00.15.31.908871/generation_2900
 # EARL vs. PPO trained on seed=1
5 changes: 2 additions & 3 deletions pax/conf/experiment/mp/mfos_v_tabular.yaml
@@ -24,9 +24,8 @@ num_generations: 5000
 total_timesteps: 1e11
 num_devices: 1

-# Evaluation
-num_seeds: 20
-# # EARL vs. PPO trained on seed=0
+
+# EARL vs. PPO trained on seed=0
 # run_path: ucl-dark/ipd/13o3v95p
 # model_path: exp/EARL-PPO_memory-vs-PPO/run-seed-0-OpenES-pop-size-1000-num-opps-1/2022-09-15_00.15.31.908871/generation_2900
 # EARL vs. PPO trained on seed=1
Empty file added pax/envs/__init__.py