[Bug]: DAAC trained on MultiBinary envs but returns floats when doing inference? #21

@Disastorm

🐛 Bug

Note: this is on the current pip version; I haven't tried the git repo version.
Also note that I have newer versions of gymnasium and matplotlib than this repo specifies:

rllte-core 0.0.1b3 has requirement gymnasium[accept-rom-license]==0.28.1, but you have gymnasium 0.29.0.
rllte-core 0.0.1b3 has requirement matplotlib==3.6.0, but you have matplotlib 3.7.1.
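
To double-check which versions are actually installed against the pins in the pip warnings above, something like the following could be used. This is only a minimal sketch using the standard library; `compare_pins` and `PINNED` are hypothetical names I made up for illustration, not part of rllte.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins reported by pip for rllte-core 0.0.1b3 (copied from the warnings above).
PINNED = {"gymnasium": "0.28.1", "matplotlib": "3.6.0"}


def compare_pins(pinned):
    """Return {package: (pinned_version, installed_version_or_None, matches)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_pins(PINNED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: pinned {wanted}, installed {installed} -> {status}")
```

The comparison is an exact string match, which mirrors the `==` pins in the warnings; it does not try to interpret version ranges.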

I'll start out by explaining the shapes in the training env.

...
print("SHAPE1 " + repr(env.action_space))
envs = [make_env(env_id, seed + i) for i in range(num_envs)]
envs = AsyncVectorEnv(envs)
envs = RecordEpisodeStatistics(envs)
envs = TransformReward(envs, lambda reward: np.sign(reward))

print("SHAPE2 " + repr(envs.action_space))
return TorchVecEnvWrapper(envs, device)

The printout of the above is:

SHAPE1 MultiBinary(12)
SHAPE2 Box(0, 1, (8, 12), int8)

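The snippet above is the end of a function I wrote called make_retro_env; the function returns the wrapped vectorized env. So a single env has a MultiBinary(12) action space, and the 8-env vectorized version reports it as a batched Box of shape (8, 12).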

After training using

env = make_retro_env("SuperHangOn-Genesis", "junior-map", num_envs=8, distributed=False)
eval_env = make_retro_env("SuperHangOn-Genesis", "junior-map", num_envs=1, distributed=False)
print("SHAPE3 " + repr(env.action_space))

model = DAAC(
    env=env,
    eval_env=eval_env,
    device='cuda',
)
model.train(num_train_steps=1000000)

Note that this prints "SHAPE3 MultiBinary(12)".

When I load the .pth that was automatically saved via the training, using

agent = th.load("./logs/default/2023-08-02-12-49-12/model/agent.pth", map_location=th.device('cuda'))
action = agent(obs)
print("action " + repr(action))

The tensors look like this:

action tensor([[ 9.9971e-02, -2.7629e-01,  4.2010e-03,  3.1142e-02, -1.2863e-01,
          3.5272e-04,  1.9941e-01,  2.8625e-01,  3.2863e-01, -6.0946e-01,
         -1.7830e-01,  1.3129e-01]], device='cuda:0')
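
For reference, here is what I would have expected to be able to do with this output. If these 12 values are unnormalized logits of independent Bernoulli distributions, one per button (this is my assumption; I have not confirmed it against the rllte source), then a MultiBinary action could be recovered by applying a sigmoid and thresholding at 0.5. A minimal pure-Python sketch on the values printed above; `logits_to_multibinary` is a hypothetical helper, not an rllte function:

```python
import math

# Raw output copied from the printout above (one row, 12 values).
raw = [9.9971e-02, -2.7629e-01, 4.2010e-03, 3.1142e-02, -1.2863e-01,
       3.5272e-04, 1.9941e-01, 2.8625e-01, 3.2863e-01, -6.0946e-01,
       -1.7830e-01, 1.3129e-01]


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logits_to_multibinary(logits, threshold=0.5):
    """Deterministic decoding: 1 where sigmoid(logit) > threshold, else 0.

    ASSUMPTION: the inputs are Bernoulli logits, one per binary action.
    """
    return [1 if sigmoid(v) > threshold else 0 for v in logits]


action = logits_to_multibinary(raw)
print(action)  # -> [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
```

On the tensor itself, the same thing would be `(th.sigmoid(action) > 0.5).to(th.int8)`, and a stochastic alternative would be `th.distributions.Bernoulli(logits=action).sample()`. Both rely on the same unconfirmed assumption that the network output is logits.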

I'm not sure if I did something wrong, or if perhaps this bug is fixed in the current repo or related to my library versions. If that is the case, let me know.
Thanks.

To Reproduce

No response

Relevant log output / Error message

No response

System Info

- OS: Windows-10-10.0.19045-SP0 10.0.19045
- Python: 3.8.16
- Stable-Baselines3: 2.0.0
- PyTorch: 2.0.0
- GPU Enabled: True
- Numpy: 1.23.5
- Cloudpickle: 2.2.1
- Gymnasium: 0.29.0
- OpenAI Gym: 0.26.2

Checklist

  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I've used the markdown code blocks for both code and stack traces.

Labels

bug: Something isn't working