Asymmetric Actor Critic and Related Memory Processing #180
Currently, it is necessary to modify several components in skrl to support asymmetric learning, for example:
I'm working on separating (starting with the environment wrappers on this branch) the concepts of observation and state (currently mixed in skrl) to support asymmetric learning, but it may take some time.
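Conceptually, the separation looks something like this (just an illustrative sketch of the intent, not the final skrl API; the `states` key lookup is a placeholder):

```python
import torch

# Illustrative sketch only (not the final skrl API): the point is that the
# wrapper exposes the partial observation (for the policy) and the full,
# privileged state (for the value function) as two separate tensors.
class AsymmetricWrapperSketch:
    def __init__(self, env):
        self._env = env

    def step(self, actions: torch.Tensor):
        obs, reward, terminated, truncated, info = self._env.step(actions)
        # hypothetical: the underlying simulator reports the privileged state in info
        state = info.get("states", obs)
        return obs, state, reward, terminated, truncated, info
```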
Hi @Toni-SM, I tried to access the branch, but it seems to be no longer available. I'm currently trying to implement the same asymmetric actor-critic setup to train an agent with IsaacLab, and I wanted to ask if this functionality has already been implemented or if there's any ongoing work on it.
@Toni-SM, thanks so much for your work on this library. I'm trying to use it to train an agent with IsaacGym as a simulator, and I wanted to use the asymmetric actor-critic variant of PPO as is done in the IsaacGymEnvs repo (for example, in the IndustReal environments). Because of this, my observation is currently a dict that looks like:
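(roughly; the `obs`/`states` keys follow the IsaacGymEnvs convention, and the shapes here are just placeholders)

```python
import torch

num_envs = 64  # placeholder

obs_dict = {
    # per-environment policy observation (what the actor sees)
    "obs": torch.zeros(num_envs, 32),
    # per-environment privileged state (what the critic sees), larger than the observation
    "states": torch.zeros(num_envs, 96),
}
```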
I'm using PPO_RNN as the agent. The difficulty I run into is that the memory class is built specifically with raw tensors in mind. I wrote a subclass of the `RandomMemory` class to handle the storage of these elements (the state) separately, but this loop seems to check every element of the memory to see whether it is a float tensor and fill it with NaNs, which causes an error when it gets to my shoehorned dict. Currently I've resolved this by commenting out those lines in my installation of skrl, but I was wondering whether they need to be there at all. The last commit on those lines mentions that they exist for backwards compatibility with old versions of torch, but I didn't follow why exactly that is needed to support old versions of torch.

Another issue I ran into is this line in the `PPO_RNN` class itself. The cast to `float` breaks my code because the observation is a dict, not a tensor. I was wondering whether that cast needs to be there at all, since the user controls the type of the state when they write the environment, so they can just ensure they're passing float tensors as input.

Please also let me know if there's a better way to implement asymmetric actor-critic models in skrl; I didn't see anything in the docs, but it's possible I just missed it.
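For reference, a minimal sketch of one possible workaround (dimensions and helper names are placeholders, not skrl API): pack both pieces into a single flat float tensor before it reaches the memory, and split it back into the actor and critic views inside the models.

```python
import torch

# Minimal sketch of a possible workaround (not skrl API): keep the memory and
# the float cast happy by handing them one flat float tensor, and recover the
# actor/critic views inside the models. Dimensions below are placeholders.
OBS_DIM = 32     # policy (actor) observation size -- placeholder
STATE_DIM = 96   # privileged (critic) state size -- placeholder

def pack(obs_dict: dict) -> torch.Tensor:
    """Concatenate the policy observation and the privileged state."""
    return torch.cat([obs_dict["obs"].float(), obs_dict["states"].float()], dim=-1)

def split(packed: torch.Tensor) -> tuple:
    """Split the packed tensor back into (observation, privileged state)."""
    return packed[..., :OBS_DIM], packed[..., OBS_DIM:OBS_DIM + STATE_DIM]
```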
Thanks for your time, and thanks again for all the work on the repo! It's been very readable and easy to work with.