
fix: clear user sessions between load stages #459

Merged
achandrasekar merged 1 commit into kubernetes-sigs:main from alonh:fix/clear-user-sessions-between-stages
Apr 29, 2026

@alonh commented Apr 29, 2026

Summary

  • `LocalUserSession._instances` was never cleared between load stages. Session context therefore accumulated across stage boundaries, and prompts grew indefinitely in multi-turn shared-prefix workloads.
  • Non-mp path (`num_workers=0`): call `LocalUserSession.clear_instances()` between stages in `LoadGenerator.run()`.
  • Mp path (`num_workers>0`): add an `mp.Barrier` to synchronize the workers and the main process at stage boundaries. Workers finish their in-flight requests, clear their sessions, then wait at the barrier. Main waits at the barrier after `request_queue.join()`, which guarantees that every worker has cleaned up before the next stage begins. Without the barrier, `request_phase` can be cleared and re-set before the workers notice, so they miss the stage boundary entirely.
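The mp-path handshake above can be sketched in a single process with `threading.Barrier`. The `multiprocessing` docs describe `mp.Barrier` as a clone of `threading.Barrier`. This is a minimal sketch under stated assumptions, not inference-perf's implementation:

- `Session`, `SessionRegistry`, `worker` and `run` are hypothetical names.
- Each worker thread owns its own registry, to mimic the per-process `_instances`.
- The `phase` event stands in for `request_phase`.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for a LocalUserSession: one user's turns so far."""

    user_id: str
    context: list[str] = field(default_factory=list)


class SessionRegistry:
    """Hypothetical per-worker registry playing the role of LocalUserSession._instances."""

    def __init__(self) -> None:
        self._instances: dict[str, Session] = {}

    def get_instance(self, user_id: str) -> Session:
        # Lazily create a session the first time a user id is seen.
        if user_id not in self._instances:
            self._instances[user_id] = Session(user_id)
        return self._instances[user_id]

    def clear_instances(self) -> None:
        self._instances.clear()


def worker(num_stages: int, requests: "queue.Queue[tuple[str, str]]",
           phase: threading.Event, barrier: threading.Barrier,
           prompts: list[tuple[int, str]]) -> None:
    registry = SessionRegistry()  # a real worker process would own its own registry
    for stage in range(num_stages):
        while True:
            try:
                user_id, turn = requests.get(timeout=0.01)
            except queue.Empty:
                if not phase.is_set():
                    break  # main closed the stage, so the queue is fully drained
                continue
            session = registry.get_instance(user_id)
            session.context.append(turn)
            prompts.append((stage, " ".join(session.context)))  # prompt this turn would send
            requests.task_done()
        registry.clear_instances()  # drop this stage's context...
        barrier.wait()  # ...then report to main that cleanup is done


def run(stages: list[list[tuple[str, str]]], num_workers: int = 1) -> list[tuple[int, str]]:
    requests: "queue.Queue[tuple[str, str]]" = queue.Queue()
    phase = threading.Event()  # plays the role of request_phase
    phase.set()  # stage 0 is open from the start
    # N workers + main. The action runs after all parties arrive and before any
    # is released, so the next stage is re-opened before anyone moves on.
    barrier = threading.Barrier(num_workers + 1, action=phase.set)
    prompts: list[tuple[int, str]] = []
    workers = [
        threading.Thread(target=worker, args=(len(stages), requests, phase, barrier, prompts))
        for _ in range(num_workers)
    ]
    for t in workers:
        t.start()
    for stage_requests in stages:
        for item in stage_requests:
            requests.put(item)
        requests.join()  # every request of this stage has been handled
        phase.clear()  # close the stage
        barrier.wait()  # wait until every worker has cleared its sessions
    for t in workers:
        t.join()
    return prompts
```

With `num_workers=1`, `run([[("u1", "s0-a"), ("u1", "s0-b")], [("u1", "s1-a")]])` returns `[(0, "s0-a"), (0, "s0-a s0-b"), (1, "s1-a")]`. The stage-1 prompt carries no stage-0 context, which mirrors the PR's integration test.

One design choice is specific to this sketch. The barrier's `action` re-sets `phase` while every party is still held at the barrier, so no worker can read a stale "closed" flag at the start of the next stage. The PR does not say exactly where the real code re-sets `request_phase`.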

Test plan

  • Unit tests for `LocalUserSession.clear_instances()` (singleton reset, context reset)
  • Integration test: non-mp `LoadGenerator` with two stages verifies stage 1 prompts contain no stage 0 context
  • Integration test: mp `LoadGenerator` (`num_workers=1`) with two stages verifies stage 1 prompts contain no stage 0 context via `mp.Queue`
  • `mypy --strict` passes
  • All 5 tests pass reliably across 10 consecutive runs
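The two unit-test cases (singleton reset, context reset) can be pictured with a pytest-style sketch. `FakeUserSession` is a hypothetical minimal class. It assumes only what the summary implies: a class-level `_instances` dict, a lazily creating `get_instance()`, and `clear_instances()`. The real `LocalUserSession` constructor and fields may differ.

```python
class FakeUserSession:
    """Hypothetical singleton-per-user class modeled on LocalUserSession."""

    _instances: dict[str, "FakeUserSession"] = {}

    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.context = ""

    @classmethod
    def get_instance(cls, user_id: str) -> "FakeUserSession":
        # One shared session per user id, created lazily.
        if user_id not in cls._instances:
            cls._instances[user_id] = cls(user_id)
        return cls._instances[user_id]

    @classmethod
    def clear_instances(cls) -> None:
        cls._instances.clear()


def test_clear_instances_resets_singleton() -> None:
    FakeUserSession.clear_instances()  # isolate from other tests
    first = FakeUserSession.get_instance("user-0")
    assert FakeUserSession.get_instance("user-0") is first  # singleton within a stage
    FakeUserSession.clear_instances()
    assert FakeUserSession.get_instance("user-0") is not first  # fresh object after a clear


def test_clear_instances_resets_context() -> None:
    FakeUserSession.clear_instances()
    FakeUserSession.get_instance("user-0").context = "stage 0 turn"
    FakeUserSession.clear_instances()
    assert FakeUserSession.get_instance("user-0").context == ""  # no carried-over context
```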

Fixes #447
Fixes #444

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) Apr 29, 2026
@k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Apr 29, 2026
@Bslabe123 commented:

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Apr 29, 2026
@jjk-g added this to the v0.5.0 milestone Apr 29, 2026
@achandrasekar commented:

/approve

Thanks for fixing this!

@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achandrasekar, alonh

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Apr 29, 2026
@achandrasekar merged commit b32f9e5 into kubernetes-sigs:main Apr 29, 2026
5 of 6 checks passed
jjk-g pushed a commit to jjk-g/inference-perf that referenced this pull request Apr 30, 2026
kaushikmitr added a commit to kaushikmitr/inference-perf that referenced this pull request May 3, 2026
When LoadGenerator clears LocalUserSession._instances between load stages
(PR kubernetes-sigs#459), conversation_replay's user_sessions[].user_session_id no longer
resolves in the registry. The next get_instance() lazily creates a fresh
session with empty context, dropping the per-conversation system_prompt
and producing dramatically smaller prompts in stages 1+.

Detect the cleared registry at the top of load_lazy_data and re-register
the session with its original system_prompt before dispatching.
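The re-registration check can be sketched as follows. `ReplayUserSession`, `ConversationSpec`, `register`, `has_instance` and this signature of `load_lazy_data` are hypothetical simplifications; the real conversation_replay code and the `LocalUserSession` API may differ. The sketch shows the failure mode the commit describes: `get_instance()` lazily creates an empty-context session for an unknown id. It also shows the fix: re-seeding the session from the stored system prompt first.

```python
from dataclasses import dataclass


class ReplayUserSession:
    """Hypothetical registry-backed session modeled on LocalUserSession."""

    _instances: dict[str, "ReplayUserSession"] = {}

    def __init__(self, user_session_id: str, context: str = "") -> None:
        self.user_session_id = user_session_id
        self.context = context

    @classmethod
    def register(cls, session: "ReplayUserSession") -> None:
        cls._instances[session.user_session_id] = session

    @classmethod
    def has_instance(cls, user_session_id: str) -> bool:
        return user_session_id in cls._instances

    @classmethod
    def get_instance(cls, user_session_id: str) -> "ReplayUserSession":
        # Unknown ids get a fresh session with EMPTY context; after a
        # clear_instances() this silently drops the system prompt.
        if user_session_id not in cls._instances:
            cls.register(cls(user_session_id))
        return cls._instances[user_session_id]

    @classmethod
    def clear_instances(cls) -> None:
        cls._instances.clear()


@dataclass
class ConversationSpec:
    """Hypothetical entry of conversation_replay's user_sessions[]."""

    user_session_id: str
    system_prompt: str


def load_lazy_data(conv: ConversationSpec, user_turn: str) -> str:
    # If the registry was cleared at a stage boundary, re-register the
    # session with its original system prompt before dispatching.
    if not ReplayUserSession.has_instance(conv.user_session_id):
        ReplayUserSession.register(ReplayUserSession(conv.user_session_id, conv.system_prompt))
    session = ReplayUserSession.get_instance(conv.user_session_id)
    return f"{session.context}\n{user_turn}"
```

Without the `has_instance` check, the first `load_lazy_data` call after a clear would build its prompt from an empty context. That is the "dramatically smaller prompts in stages 1+" symptom.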
achandrasekar pushed a commit that referenced this pull request May 5, 2026
kaushikmitr added a commit to kaushikmitr/inference-perf that referenced this pull request May 6, 2026
The barrier added in kubernetes-sigs#459 (clear user sessions between stages) requires all
N+1 parties (N workers + parent) to arrive before any can proceed. With no
timeout, a single worker stuck in gather() — e.g., waiting on an in-flight
request that never completes — leaves the rest of the cluster hanging
forever. Multi-stage benchmark runs reproduce this: stages all dispatch
cleanly, then everyone hangs in epoll_wait, no error, no progress.

Bound both the worker-side and parent-side barrier waits with a 600s
timeout. On timeout, BrokenBarrierError surfaces; we log a warning and
proceed. Healthy workers + parent move on; the stuck worker eventually
gets terminated by loadgen.stop() at job teardown.
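A minimal sketch of the bounded wait, again using `threading.Barrier` as a single-process stand-in for `mp.Barrier`. Both accept `wait(timeout=...)` and raise `threading.BrokenBarrierError` when the wait times out. `wait_at_stage_boundary` and `demo` are hypothetical helpers. The demo shrinks the 600 s timeout to 0.2 s and simulates the stuck worker with a thread that never reaches the barrier.

```python
import logging
import threading

logger = logging.getLogger(__name__)

BARRIER_TIMEOUT_S = 600.0  # bound taken from the commit message


def wait_at_stage_boundary(barrier: threading.Barrier, who: str,
                           timeout: float = BARRIER_TIMEOUT_S) -> bool:
    """Wait for all parties; on timeout, log and carry on instead of hanging.

    Returns True if every party arrived, False if the barrier broke.
    """
    try:
        barrier.wait(timeout=timeout)
        return True
    except threading.BrokenBarrierError:
        logger.warning("%s: stage barrier broke or timed out (%.1fs); proceeding", who, timeout)
        return False


def demo(timeout: float = 0.2) -> dict[str, bool]:
    barrier = threading.Barrier(3)  # two workers + main
    request_done = threading.Event()  # not set during the wait: the "hung request"
    outcome: dict[str, bool] = {}

    def healthy_worker() -> None:
        outcome["healthy"] = wait_at_stage_boundary(barrier, "healthy worker", timeout)

    def stuck_worker() -> None:
        request_done.wait()  # stuck "in gather()", never reaches the barrier

    healthy = threading.Thread(target=healthy_worker)
    stuck = threading.Thread(target=stuck_worker, daemon=True)
    healthy.start()
    stuck.start()
    outcome["main"] = wait_at_stage_boundary(barrier, "main", timeout)
    healthy.join()
    request_done.set()  # stand-in for loadgen.stop() tearing the stuck worker down
    stuck.join()
    return outcome
```

`demo()` returns `{"healthy": False, "main": False}`. When one party's timeout expires, `threading.Barrier` puts the barrier into the broken state, and every party waiting on it receives `BrokenBarrierError`, not only the one that timed out. The broken state persists for later `wait()` calls until `reset()` is called.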

Labels

approved, cncf-cla: yes, lgtm, size/L

5 participants