[Bug]: DAAC trained on MultiBinary envs but returns floats when doing inference?

### 🐛 Bug

Note: this is on the current pip version; I haven't tried the version from the git repo.
Also note that I have newer versions of gymnasium and matplotlib than this repo specifies:

```
rllte-core 0.0.1b3 has requirement gymnasium[accept-rom-license]==0.28.1, but you have gymnasium 0.29.0.
rllte-core 0.0.1b3 has requirement matplotlib==3.6.0, but you have matplotlib 3.7.1.
```

I'll start out by explaining the shapes in the training env.

```
...
print("SHAPE1 " + repr(env.action_space))
envs = [make_env(env_id, seed + i) for i in range(num_envs)]
envs = AsyncVectorEnv(envs)
envs = RecordEpisodeStatistics(envs)
envs = TransformReward(envs, lambda reward: np.sign(reward))

print("SHAPE2 " + repr(envs.action_space))
return TorchVecEnvWrapper(envs, device)
```

The printout of the above is:

SHAPE1 MultiBinary(12)
SHAPE2 Box(0, 1, (8, 12), int8)

The code snippet above is the end of a function I wrote called `make_retro_env`; the wrapped env is what it returns.

I then trained with:

```
env = make_retro_env("SuperHangOn-Genesis", "junior-map", num_envs=8, distributed=False)
eval_env = make_retro_env("SuperHangOn-Genesis", "junior-map", num_envs=1, distributed=False)
print("SHAPE3 " + repr(env.action_space))

model = DAAC(env=env, 
              eval_env=eval_env, 
              device='cuda',
              )
model.train(num_train_steps=1000000)
```

Note that this prints "SHAPE3 MultiBinary(12)".

When I load the .pth that training saved automatically and run it on an observation, using

```
agent = th.load("./logs/default/2023-08-02-12-49-12/model/agent.pth", map_location=th.device('cuda'))
action = agent(obs)
print("action " + repr(action))
```

The tensors look like this:
```
action tensor([[ 9.9971e-02, -2.7629e-01,  4.2010e-03,  3.1142e-02, -1.2863e-01,
          3.5272e-04,  1.9941e-01,  2.8625e-01,  3.2863e-01, -6.0946e-01,
         -1.7830e-01,  1.3129e-01]], device='cuda:0')
```
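
My guess is that `agent.pth` holds only the raw policy network, so these values may be per-button logits rather than actions. I have not confirmed this against the rllte source. If that assumption holds, something like the sketch below would map them back to a MultiBinary(12) action. `logits_to_multibinary` is a hypothetical helper, not part of rllte, and it is written in plain Python so it runs without torch.

```python
import math
import random


def logits_to_multibinary(logits, deterministic=True, rng=None):
    """Map per-button logits to 0/1 actions.

    ASSUMPTION: the floats are Bernoulli logits, one per button.
    This is not confirmed by the rllte docs or source.
    """
    rng = rng or random.Random()
    actions = []
    for x in logits:
        p = 1.0 / (1.0 + math.exp(-x))  # sigmoid: logit -> probability of pressing
        if deterministic:
            actions.append(1 if p > 0.5 else 0)  # greedy choice
        else:
            actions.append(1 if rng.random() < p else 0)  # sample from Bernoulli(p)
    return actions


# Values copied from the printout above.
logits = [9.9971e-02, -2.7629e-01, 4.2010e-03, 3.1142e-02, -1.2863e-01,
          3.5272e-04, 1.9941e-01, 2.8625e-01, 3.2863e-01, -6.0946e-01,
          -1.7830e-01, 1.3129e-01]

action = logits_to_multibinary(logits)
print(action)  # -> [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
```

On the tensor itself, the equivalent would presumably be `(action > 0).to(th.int8)` for the greedy choice, or `th.distributions.Bernoulli(logits=action).sample()` to sample, again assuming the outputs really are logits.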

I'm not sure if I did something wrong, or if perhaps this bug is fixed in the current repo or related to my library versions. If that is the case, let me know.
Thanks.

### To Reproduce

_No response_

### Relevant log output / Error message

_No response_

### System Info

- OS: Windows-10-10.0.19045-SP0 10.0.19045
- Python: 3.8.16
- Stable-Baselines3: 2.0.0
- PyTorch: 2.0.0
- GPU Enabled: True
- Numpy: 1.23.5
- Cloudpickle: 2.2.1
- Gymnasium: 0.29.0
- OpenAI Gym: 0.26.2

### Checklist

- [X] I have checked that there is no similar [issue](https://github.com/RLE-Foundation/rllte/issues) in the repo
- [X] I have read the [documentation](https://docs.rllte.dev/)
- [X] I have provided a minimal working example to reproduce the bug
- [X] I've used the [markdown code blocks](https://help.github.com/en/articles/creating-and-highlighting-code-blocks) for both code and stack traces.
