Skip to content

[RouteOrch] Fix notification pipeline latency regression in ResponsePublisher#4389

Draft
tirupatihemanth wants to merge 2 commits intosonic-net:masterfrom
tirupatihemanth:fix_ntf_delay
Draft

[RouteOrch] Fix notification pipeline latency regression in ResponsePublisher#4389
tirupatihemanth wants to merge 2 commits intosonic-net:masterfrom
tirupatihemanth:fix_ntf_delay

Conversation

@tirupatihemanth
Copy link
Copy Markdown
Contributor

@tirupatihemanth tirupatihemanth commented Mar 25, 2026

Why I did it
PR #4172 changed ResponsePublisher pipelines from batch size 1 to 128, improving bulk throughput but causing route offload notifications to stay buffered until OrchDaemon::flush() (up to 1s). With suppress-fib-pending enabled, this delays the FIB offload reply to zebra, causing BGP to hold suppressed routes longer and failing test_bgp_update_timer_single_route.

What I did
Added m_publisher.flush() in RouteOrch::doTask() after the post-processing loop. This flushes all buffered notifications once per batch -- immediately after route processing, not on the 1s periodic timer. Bulk batching from PR #4172 is preserved since it's one flush per batch, not per route.

How I verified it

  • test_bgp_update_timer_single_route on r-panther-45 (t1-lag) -- announce intervals back to ~1s, within the 2.5s threshold.
  • No per-route overhead: flush is once per batch regardless of batch size.

Copilot AI review requested due to automatic review settings March 25, 2026 16:56
@tirupatihemanth tirupatihemanth marked this pull request as draft March 25, 2026 16:56
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

dgsudharsan
dgsudharsan previously approved these changes Mar 25, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts ResponsePublisher’s Redis notification pipeline configuration to eliminate added notification latency in the RouteOrch → fpmsyncd response/offload confirmation path.

Changes:

  • Set the Redis pipeline batch size for the notification pipeline (m_ntf_pipe) to 1 to force immediate flush behavior.
  • Keep the DB write pipeline (m_db_pipe) at the default batch size to preserve write throughput optimizations.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Hemanth Kumar Tirupati <[email protected]>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@tirupatihemanth
Copy link
Copy Markdown
Contributor Author

/azpw run

@mssonicbld
Copy link
Copy Markdown
Collaborator

⚠️ Notice: /azpw run only runs failed jobs now. If you want to trigger a whole pipline run, please rebase your branch or close and reopen the PR.
💡 Tip: You can also use /azpw retry to retry failed jobs directly.

Retrying failed(or canceled) jobs...

@mssonicbld
Copy link
Copy Markdown
Collaborator

Retrying failed(or canceled) stages in build 1071252:

✅Stage BuildAsan:

  • Job amd64: retried.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants