
Conversation

@wuxibin89 (Collaborator):

Metrics should be in non_tensor_batch instead of meta_info, as DataProto does not concat meta_info. If they are set in meta_info, PPOTrainer only gets the metrics of DP rank 0.

# TODO: here, we should return all metrics
output = DataProto(meta_info={'metrics': metrics})
output = output.to('cpu')
output = DataProto(non_tensor_batch={metric: np.array(value) for metric, value in metrics.items()})
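
To illustrate the failure mode with a hedged sketch (made-up values and assumed structure, not the actual verl collection code): if each DP rank returns its metrics only in meta_info and the collected meta_info is taken from a single chunk, every other rank's metrics are silently dropped.

```python
# Illustration only: assumed structure, not the real DataProto/collection implementation.
rank_outputs = [
    {"meta_info": {"metrics": {"critic/loss": 0.25}}},  # DP rank 0
    {"meta_info": {"metrics": {"critic/loss": 0.75}}},  # DP rank 1
]
# If collection keeps meta_info from only one chunk, rank 1's metrics never reach the trainer.
collected_meta_info = rank_outputs[0]["meta_info"]
print(collected_meta_info)  # {'metrics': {'critic/loss': 0.25}}
```
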
Collaborator:

I guess it's better to put the metrics under a unified namespace, in case we later want to add non-metric data into the non_tensor_batch.

Collaborator:

Also, I guess it's problematic because non_tensor_batch is required to have the same batch size as batch.

Collaborator Author:

Metrics are so general that I suggest adding a new .metrics field to DataProto. The reasons are:

  • metrics can't be put in .non_tensor_batch alongside .batch, since these two fields must have the same batch size
  • metrics can't be put in .meta_info, since this field should not be concatenated

The new .metrics field should behave much like .meta_info, except that it would be concatenated in DataProto.concat.
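
A minimal sketch of what such a field could look like (field names, defaults, and the concat logic here are assumptions for illustration, not the actual DataProto implementation):

```python
# Sketch only: a DataProto-like container with a dedicated .metrics field that,
# unlike .meta_info, is concatenated across DP ranks by concat().
from dataclasses import dataclass, field

@dataclass
class DataProtoSketch:
    batch: dict = field(default_factory=dict)             # tensors, fixed batch size
    non_tensor_batch: dict = field(default_factory=dict)  # numpy arrays, same batch size as batch
    meta_info: dict = field(default_factory=dict)         # arbitrary objects, not concatenated
    metrics: dict = field(default_factory=dict)           # metric name -> list of scalars, concatenated

    @staticmethod
    def concat(chunks):
        merged = {}
        for chunk in chunks:
            for name, values in chunk.metrics.items():
                merged.setdefault(name, []).extend(values)
        # batch / non_tensor_batch concatenation omitted for brevity
        return DataProtoSketch(meta_info=chunks[0].meta_info, metrics=merged)
```

Under this sketch, concat([a, b]).metrics would hold every rank's values, while meta_info keeps its current non-concatenated semantics.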

Collaborator Author:

Furthermore, should we remove .non_tensor_batch? It's designed as a dictionary of numpy arrays with the same batch size as .batch. For now it's mainly used to store multi_modal_inputs; could we store those in .batch instead? cc @hiyouga

Collaborator:

We cannot pad the multi-modal features, because the padded tensors would consume too much VRAM/RAM.

Collaborator Author:

I guess we can either create a new field in DataProto that is concatenated during collection, or modify the dispatch function to aggregate meta_info. However, it's hard to generalize what we should aggregate in meta_info.

It's not a good idea to aggregate meta_info; it may contain arbitrary Python objects.

Collaborator:

Yeah. We expect the input and output of DataProto dispatch to be structurally identical; simply concatenating the meta_info breaks that rule.

@vermouth1992 (Collaborator), Mar 14, 2025:

It seems that the best approach might be to perform allgather/allreduce inside workers as if there is no single controller :)
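
For context, an in-worker reduction would look roughly like the sketch below (illustrative only, assuming an already-initialized torch.distributed process group; not verl code). Note that the collective call blocks until every rank reaches it, which is the barrier the next comment objects to.

```python
# Illustration, not verl code: average one metric across DP ranks inside the worker
# via torch.distributed, so the driver only ever sees the already-reduced value.
import torch
import torch.distributed as dist

def allreduce_metric_mean(value: float) -> float:
    t = torch.tensor([value], dtype=torch.float64)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # collective: blocks until every rank arrives
    return (t / dist.get_world_size()).item()
```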

@wuxibin89 (Collaborator, author), Mar 14, 2025:

I'd rather not allreduce metrics inside the workers, because it introduces a hard barrier across all workers and makes asynchronous execution from the driver impossible.

@wuxibin89 changed the title from "fix: set metrics in non_tensor_batch instead of meta_info" to "fix: concat metrics in DataProto.concat" on Mar 15, 2025
@hiyouga (Collaborator) commented on Mar 26, 2025:

@wuxibin89 is this PR ready for merge?

@eric-haibin-lin deleted the wuxibin/fix_metrics branch on June 18, 2025 at 17:14
szrlee added a commit to szrlee/verl that referenced this pull request Oct 9, 2025
When multiple workers return metrics, DataProto.concat() now flattens the list
of metric dicts into a dict of lists using list_of_dict_to_dict_of_list(). This
ensures metrics have a consistent structure regardless of whether they come from
1 or N workers, and allows reduce_metrics() to work without modification.

This is a cleaner solution than handling list input in reduce_metrics(), as it:
- Keeps metrics aggregation logic in the data layer (single responsibility)
- Maintains a consistent API where meta_info["metrics"] is always dict[str, list]
- Avoids leaking DataProto's concat behavior to all metrics consumers

Changes:
- DataProto.concat(): Flatten list of metric dicts to dict of lists
- Update tests to expect flattened metrics format

Fixes the error:
  File "verl/trainer/ppo/ray_trainer.py", line 1129, in fit
    critic_output_metrics = reduce_metrics(critic_output.meta_info["metrics"])
  AttributeError: 'list' object has no attribute 'items'

Related: volcengine#602
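
For readers following along, a rough sketch of the flattening plus reduction described in this commit message (simplified, assumed behavior; the real helpers live in the verl codebase):

```python
# Simplified sketch of the described flattening/reduction; not the actual verl helpers.
import numpy as np

def list_of_dict_to_dict_of_list(list_of_dict):
    """[{'loss': [0.25]}, {'loss': [0.75]}] -> {'loss': [0.25, 0.75]}"""
    out = {}
    for d in list_of_dict:
        for key, values in d.items():
            out.setdefault(key, []).extend(values)
    return out

def reduce_metrics(metrics):
    """Average each metric list, e.g. {'loss': [0.25, 0.75]} -> {'loss': 0.5}."""
    return {key: np.mean(values) for key, values in metrics.items()}

per_worker = [{"critic/loss": [0.25]}, {"critic/loss": [0.75]}]  # one dict per DP worker
flat = list_of_dict_to_dict_of_list(per_worker)  # {'critic/loss': [0.25, 0.75]}
print(reduce_metrics(flat))                      # critic/loss averages to 0.5
```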