Description
We tried to train a system similar to #1682 using VeRL and noticed a curious situation: a sudden drop in performance and entropy, a spike in KL, and, most prominently, a HUGE spike in response length.
On further investigation we found that our behavior matches the profile of the merged implementation in #1682 (see their wandb), as well as results from several other projects that use verl and issues like #1967.
I traced this down to the fact that the problem always appeared right after the first evaluation cycle, which warranted a closer look at the sampling params. It turns out that after the first eval/test cycle the temperature parameter is not properly restored, so training continues with a temperature of 0 and the policy acts completely greedily for the rest of the run. This explains the spike in response lengths, the drop in entropy, the tendency to collapse, etc.
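To make the failure mode concrete, here is a minimal, self-contained sketch. This is not verl's actual code; the class and the snapshot logic are invented for illustration. It shows how a context manager whose snapshot is corrupted (here, taken after the override is applied) leaves the eval-time temperature of 0 in place even though the "restore" path runs:

```python
from contextlib import contextmanager

class RolloutSketch:
    """Toy stand-in for a rollout worker; all names here are hypothetical."""

    def __init__(self):
        self.sampling_params = {"temperature": 1.0, "top_p": 1.0}

    @contextmanager
    def update_sampling_params(self, **kwargs):
        self.sampling_params.update(kwargs)
        # Corrupted snapshot: taken AFTER the override was applied, so the
        # "restore" below just re-writes the eval-time values.
        old = {k: self.sampling_params[k] for k in kwargs}
        try:
            yield
        finally:
            for k, v in old.items():
                self.sampling_params[k] = v

rollout = RolloutSketch()
with rollout.update_sampling_params(temperature=0):  # greedy eval pass
    pass
print(rollout.sampling_params["temperature"])  # 0 -> all later training is greedy
```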
I tried to implement a proper fix and open a PR, but couldn't cleanly handle multiple nested contexts each trying to restore already-corrupted parameters.
For the moment, you can test this hypothesis with a rather dirty fix that got rid of all of these issues for me:
```python
@contextmanager
def update_sampling_params(self, **kwargs):
    # ... existing context manager code ...
    try:
        yield
    finally:
        # roll back to previous sampling params
        for key, value in old_sampling_params_args.items():
            self.sampling_params[key] = value
        # NUCLEAR FIX: Ensure temperature is always 1.0 for training
        if 'temperature' in self.sampling_params and self.sampling_params['temperature'] == 0:
            self.sampling_params['temperature'] = 1.0
```
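If you want to confirm the hypothesis in your own run without patching verl, a cheap guard called before each training rollout turns the silent regression into a loud failure. This is a hypothetical helper, not part of verl:

```python
def assert_training_temperature(sampling_params: dict, expected: float = 1.0) -> None:
    """Hypothetical guard: fail fast if an eval-time override leaked into training."""
    temp = sampling_params.get("temperature")
    if temp is not None and temp != expected:
        raise RuntimeError(
            f"sampling temperature is {temp}, expected {expected}; "
            "an eval override was probably not rolled back"
        )
```

Calling this at the top of the training-time generation path should raise on the first step after the faulty eval cycle.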
And another slightly unrelated issue:
There are sampling parameters such as stop that SGLang supports and that I'd have expected to be configurable by simply setting actor_rollout_ref.rollout.stop (since most other parameters are passed through as-is). It would be nice if unrecognized parameters could just be forwarded to SGLang dynamically, so that each one doesn't have to be supported explicitly; a rough sketch follows.
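This is what such a pass-through could look like, assuming the rollout config behaves like a mapping. The helper name and the key set are made up for illustration; verl's actual config handling differs:

```python
# Keys the trainer consumes itself; everything else is forwarded verbatim.
TRAINER_ONLY_KEYS = {"prompt_length", "response_length", "dtype"}  # illustrative

def build_sglang_sampling_params(rollout_config: dict) -> dict:
    """Forward any unrecognized rollout option (e.g. `stop`) straight to SGLang."""
    return {k: v for k, v in rollout_config.items() if k not in TRAINER_ONLY_KEYS}

params = build_sglang_sampling_params(
    {"temperature": 1.0, "top_p": 0.9, "stop": ["</answer>"], "response_length": 1024}
)
# -> {"temperature": 1.0, "top_p": 0.9, "stop": ["</answer>"]}
```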