[Bug Report] Obs and info semantics in PointMaze with continuing_task #258

**Describe the bug**
When PointMaze runs with `continuing_task=True`, the observation and info returned by the step on which the goal is reached aren't updated, i.e. they still contain the old goal. This is not the usual semantics: in a common RL loop, the agent feeds the returned observation to its policy, so its next action is aimed at the old goal instead of the new one.

Related issue: https://github.com/Farama-Foundation/Minari/issues/265
Relevant code: https://github.com/Farama-Foundation/Gymnasium-Robotics/blob/3719d9d79e4f6c5e1ff4f9cbdda821fca880c455/gymnasium_robotics/envs/maze/point_maze.py#L392-L406


**Code example**
Reproducing this requires an expert policy that actually reaches the goal; see https://github.com/Farama-Foundation/minari-dataset-generation-scripts/blob/main/scripts/pointmaze/create_pointmaze_dataset.py
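For anyone who wants to check this without reading the dataset script, here is a rough sketch of a helper that flags the affected steps. It is not taken from the repository: `find_stale_goal_steps` is a made-up name, it assumes the observation dict has a `desired_goal` key (the usual goal-env convention), and it assumes the object passed in exposes the `goal` attribute that `step` uses as `self.goal`.

```python
import numpy as np


def find_stale_goal_steps(env, policy, n_steps):
    """Return the indices of steps whose returned observation carries a
    goal that differs from the goal stored on the env after the step.

    Assumptions (not confirmed by the library docs):
      * obs is a dict with a "desired_goal" entry
      * env exposes the current goal as `env.goal`
    """
    stale = []
    obs, info = env.reset()
    for t in range(n_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        # After step() returns, env.goal is already the *new* goal if the
        # old one was reached; a mismatch means obs still shows the old one.
        if not np.allclose(obs["desired_goal"], env.goal):
            stale.append(t)
        if terminated or truncated:
            obs, info = env.reset()
    return stale
```

With the real environment you would probably need to pass the unwrapped env so that `goal` is reachable, and `policy` would be the expert policy from the script linked above.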





The `step` method at the permalink above builds `obs_dict` before `update_goal` is called, so the returned observation still holds the old goal:

    def step(self, action):
        obs, _, _, _, info = self.point_env.step(action)
        obs_dict = self._get_obs(obs)

        reward = self.compute_reward(obs_dict["achieved_goal"], self.goal, info)
        terminated = self.compute_terminated(obs_dict["achieved_goal"], self.goal, info)
        truncated = self.compute_truncated(obs_dict["achieved_goal"], self.goal, info)
        info["success"] = bool(
            np.linalg.norm(obs_dict["achieved_goal"] - self.goal) <= 0.45
        )

        # Update the goal position if necessary
        self.update_goal(obs_dict["achieved_goal"])

        return obs_dict, reward, terminated, truncated, info
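One possible direction for a fix is to rebuild the observation after the goal update. The sketch below only illustrates the ordering on a toy 1-D environment; `ToyContinuingEnv`, its `refresh_obs` flag and the `desired_goal` key are all hypothetical stand-ins, not the PointMaze code. Reward and `success` are still computed against the old goal here, and what `info` should contain after the switch is left open.

```python
import numpy as np


class ToyContinuingEnv:
    """Hypothetical 1-D stand-in for a continuing-task goal environment."""

    def __init__(self, goals, refresh_obs):
        self.goals = [np.array([g], dtype=float) for g in goals]
        self.goal_idx = 0
        self.goal = self.goals[0]
        self.pos = np.zeros(1)
        # If True, rebuild the observation after the goal has been updated.
        self.refresh_obs = refresh_obs

    def _get_obs(self):
        return {
            "achieved_goal": self.pos.copy(),
            "desired_goal": self.goal.copy(),
        }

    def update_goal(self, achieved_goal):
        # Switch to the next goal once the current one is reached.
        if np.linalg.norm(achieved_goal - self.goal) <= 0.45:
            self.goal_idx = (self.goal_idx + 1) % len(self.goals)
            self.goal = self.goals[self.goal_idx]

    def step(self, action):
        self.pos = self.pos + np.asarray(action, dtype=float)
        obs = self._get_obs()

        # Reward and success refer to the goal that was active for this step.
        success = bool(np.linalg.norm(obs["achieved_goal"] - self.goal) <= 0.45)
        info = {"success": success}

        self.update_goal(obs["achieved_goal"])
        if self.refresh_obs:
            # Proposed ordering: the returned observation shows the new goal.
            obs = self._get_obs()

        return obs, float(success), False, False, info


stale_env = ToyContinuingEnv([1.0, -1.0], refresh_obs=False)
fresh_env = ToyContinuingEnv([1.0, -1.0], refresh_obs=True)
obs_stale, reward_stale, *_ = stale_env.step([1.0])
obs_fresh, reward_fresh, *_ = fresh_env.step([1.0])
print(obs_stale["desired_goal"], obs_fresh["desired_goal"])
```

With `refresh_obs=False` the returned `desired_goal` is the goal that was just reached, which mirrors the current behaviour; with `refresh_obs=True` it is the goal the agent should head for next.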
