Open
Description
Describe the bug
In PointMaze with continuing_task=True, the observation and info returned by the step on which the goal is reached are not updated: they still contain the old goal, even though the environment has already switched to a new one. This is not the intended semantics. In a typical RL loop, the agent feeds the returned observation to its policy to choose the next action, so it will head for the old goal instead of the new one.
See related issue: Farama-Foundation/Minari#265
See Gymnasium-Robotics/gymnasium_robotics/envs/maze/point_maze.py, lines 392 to 406 in 3719d9d:

    def step(self, action):
        obs, _, _, _, info = self.point_env.step(action)
        obs_dict = self._get_obs(obs)
        reward = self.compute_reward(obs_dict["achieved_goal"], self.goal, info)
        terminated = self.compute_terminated(obs_dict["achieved_goal"], self.goal, info)
        truncated = self.compute_truncated(obs_dict["achieved_goal"], self.goal, info)
        info["success"] = bool(
            np.linalg.norm(obs_dict["achieved_goal"] - self.goal) <= 0.45
        )
        # Update the goal position if necessary
        self.update_goal(obs_dict["achieved_goal"])
        return obs_dict, reward, terminated, truncated, info
Code example
You need an expert policy to see this; check https://github.com/Farama-Foundation/minari-dataset-generation-scripts/blob/main/scripts/pointmaze/create_pointmaze_dataset.py
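Without an expert policy, the ordering problem can also be shown with a toy stand-in. The sketch below is not the real PointMaze code: `ToyContinuingMaze`, `step_stale`, and `step_fresh` are hypothetical names, the goal list is made up, and only the 0.45 threshold and the build-observation-then-update-goal ordering are taken from the snippet above. `step_fresh` is one possible reordering, assumed here for illustration, not a confirmed fix.

```python
import math


class ToyContinuingMaze:
    """Hypothetical stand-in for the goal-update ordering; not the real env."""

    def __init__(self, goals):
        self._goals = list(goals)
        self._goal_idx = 0
        self.goal = self._goals[0]
        self.pos = (0.0, 0.0)

    def _get_obs(self):
        # Snapshot of the current position and the current goal.
        return {"achieved_goal": self.pos, "desired_goal": self.goal}

    def update_goal(self, achieved_goal):
        # Switch to the next goal once the current one is reached.
        if math.dist(achieved_goal, self.goal) <= 0.45:
            self._goal_idx = (self._goal_idx + 1) % len(self._goals)
            self.goal = self._goals[self._goal_idx]

    def _move(self, action):
        self.pos = (self.pos[0] + action[0], self.pos[1] + action[1])

    def step_stale(self, action):
        # Same ordering as the quoted step(): observation first, goal update second.
        self._move(action)
        obs = self._get_obs()
        self.update_goal(obs["achieved_goal"])
        return obs

    def step_fresh(self, action):
        # Assumed alternative: refresh desired_goal after the goal update.
        self._move(action)
        obs = self._get_obs()
        self.update_goal(obs["achieved_goal"])
        obs["desired_goal"] = self.goal
        return obs


stale_env = ToyContinuingMaze(goals=[(1.0, 0.0), (5.0, 0.0)])
stale_obs = stale_env.step_stale((1.0, 0.0))
print(stale_obs["desired_goal"], stale_env.goal)  # (1.0, 0.0) (5.0, 0.0)

fresh_env = ToyContinuingMaze(goals=[(1.0, 0.0), (5.0, 0.0)])
fresh_obs = fresh_env.step_fresh((1.0, 0.0))
print(fresh_obs["desired_goal"], fresh_env.goal)  # (5.0, 0.0) (5.0, 0.0)
```

In the stale variant, a policy conditioned on `desired_goal` would keep steering toward `(1.0, 0.0)` for one more step. In a real fix, the reward and `info["success"]` would presumably still be computed against the old goal, with only the returned observation and info refreshed.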