[Bug] SGLang async sampling parameter corruption #2087

@ummavi

We tried to train a system similar to #1682 using verl and noticed a sudden drop in performance and entropy, a spike in KL and, most prominently, a huge spike in response length.

On further investigation, we found that this behavior matches the profile of the merged implementation #1682 (see their wandb), as well as results from several other projects that use verl and issues such as #1967.

I traced the problem to the fact that it always appeared after the first evaluation cycle, which warranted a closer look at the sampling params. We found that after the first eval/test cycle, the temperature parameter is not properly restored, and training continues with a temperature of 0, sampling completely greedily for the rest of training. This explains the spike in response length, the drop in entropy, the tendency to collapse, etc.
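A toy reproduction of how overlapping contexts can leave the temperature stuck at 0 is sketched below. `ToyRollout` and `validate` are hypothetical names, not verl code, and the assumption that two validation contexts overlap on one shared parameter dict is an illustration of the failure mode rather than a confirmed trace of what verl does.

```python
import asyncio
from contextlib import contextmanager


class ToyRollout:
    """Hypothetical stand-in for a rollout worker; not verl's actual class."""

    def __init__(self):
        self.sampling_params = {"temperature": 1.0, "top_p": 1.0}

    @contextmanager
    def update_sampling_params(self, **kwargs):
        # Snapshot whatever is in the shared dict right now.
        old = {k: self.sampling_params[k] for k in kwargs if k in self.sampling_params}
        self.sampling_params.update(kwargs)
        try:
            yield
        finally:
            # Restore the snapshot, even if it was taken from an already-overridden state.
            self.sampling_params.update(old)


async def validate(rollout, delay):
    # Greedy decoding for validation, as in an eval/test cycle.
    with rollout.update_sampling_params(temperature=0.0):
        await asyncio.sleep(delay)  # stands in for an awaited generation call


async def run_overlapping_validation():
    rollout = ToyRollout()
    # The second context enters while the first is still open, so it snapshots 0.0.
    await asyncio.gather(validate(rollout, 0.01), validate(rollout, 0.02))
    return rollout.sampling_params["temperature"]


final_temperature = asyncio.run(run_overlapping_validation())
print(final_temperature)
```

The first context restores 1.0 when it exits, but the second one exits later and writes back the 0.0 it captured on entry, so the shared dict ends up greedy.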

I tried to implement a proper fix and open a PR, but couldn't adequately handle multiple contexts each trying to restore already-corrupted parameters.

For the moment, you can test this hypothesis with a rather dirty fix that got rid of all of these issues for me:

from contextlib import contextmanager

@contextmanager
def update_sampling_params(self, **kwargs):
    # ... existing context manager code ...
    try:
        yield
    finally:
        # roll back to previous sampling params
        for key, value in old_sampling_params_args.items():
            self.sampling_params[key] = value

        # NUCLEAR FIX: if temperature is still 0 after the rollback,
        # force it back to 1.0 (assumes training samples at temperature 1.0)
        if 'temperature' in self.sampling_params and self.sampling_params['temperature'] == 0:
            self.sampling_params['temperature'] = 1.0
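One possible direction for a cleaner fix, sketched here under assumptions, is to stop mutating shared state and instead build a fresh parameter dict per request, so there is nothing to roll back. `resolve_sampling_params`, `BASE_PARAMS` and `generate` are hypothetical names for illustration; this is not how verl is implemented, and wiring it into the real rollout would need more work.

```python
import asyncio
from types import MappingProxyType


def resolve_sampling_params(base, **overrides):
    """Return a new dict of base params with per-call overrides applied."""
    merged = dict(base)
    merged.update(overrides)
    return merged


# Read-only view, so no caller can overwrite the training defaults by accident.
BASE_PARAMS = MappingProxyType({"temperature": 1.0, "top_p": 1.0})


async def generate(overrides, delay):
    # Each request gets its own copy; nothing shared is modified.
    params = resolve_sampling_params(BASE_PARAMS, **overrides)
    await asyncio.sleep(delay)  # stands in for an awaited generation call
    return params


async def main():
    # A validation request and a training request in flight at the same time.
    return await asyncio.gather(
        generate({"temperature": 0.0}, 0.02),
        generate({}, 0.01),
    )


val_params, train_params = asyncio.run(main())
print(val_params["temperature"], train_params["temperature"])
```

Because every request carries its own parameters, overlapping validation and training calls cannot interfere with each other, whatever order they finish in.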

A separate, only loosely related issue:
SGLang supports sampling parameters such as stop that I would have expected to be configurable simply by setting actor_rollout_ref.rollout.stop (since most of the others are passed as-is). It would be nice if these could be passed through to SGLang dynamically, so that each parameter doesn't have to be supported explicitly.
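A rough sketch of what such a pass-through could look like: forward every rollout config entry that is set and is not an engine-level option. `build_sampling_params`, `ENGINE_ONLY_KEYS` and the example config keys are assumptions made for illustration; the real verl config has more fields, and the right split between engine and sampling options would have to be decided there.

```python
def build_sampling_params(rollout_config, engine_only_keys):
    """Forward every set, non-engine entry as a sampling parameter."""
    return {
        key: value
        for key, value in rollout_config.items()
        if key not in engine_only_keys and value is not None
    }


# Hypothetical rollout config; key names are illustrative only.
rollout_config = {
    "name": "sglang",
    "gpu_memory_utilization": 0.5,
    "temperature": 1.0,
    "top_p": 1.0,
    "stop": ["</answer>"],
    "stop_token_ids": None,  # unset values are not forwarded
}

# Hypothetical denylist of options that configure the engine, not sampling.
ENGINE_ONLY_KEYS = frozenset({"name", "gpu_memory_utilization"})

sampling_params = build_sampling_params(rollout_config, ENGINE_ONLY_KEYS)
print(sampling_params)
```

With a denylist like this, a newly supported SGLang sampling option would only need to appear in the config to be forwarded, at the cost of surfacing typos as errors in the backend rather than at config-validation time.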
