@yuseok-kim-edushare

feat: Improve graceful shutdown and add AOF handling

This commit (PR) enhances the asynchronous shutdown process in Worker.cs with the following changes:

  • The StopAsync method now waits up to 30 seconds for existing connections to complete before termination.
  • Added logic to flush the AOF (Append-Only File) buffer and create a checkpoint on shutdown. This commit operation is only performed if AOF is enabled.
  • Implemented the new WaitForActiveConnectionsToComplete method to check the status of active connections with a retry mechanism.
  • Called GC.SuppressFinalize(this) in the Dispose method to prevent unnecessary finalization.

This PR closes issue #1382.
(Tested on my side; please feel free to verify it as well.)

@yuseok-kim-edushare yuseok-kim-edushare marked this pull request as ready for review September 16, 2025 15:13
Copilot AI review requested due to automatic review settings September 16, 2025 15:13
@yuseok-kim-edushare yuseok-kim-edushare changed the title feat: Improve graceful shutdown and add AOF handling #1382 [Garnet.Worker] feat: Improve graceful shutdown and add AOF handling #1382 Sep 16, 2025

Copilot AI left a comment

Pull Request Overview

This PR enhances the graceful shutdown process for the Garnet worker service by implementing a more sophisticated shutdown sequence that waits for active connections to complete before termination and handles AOF (Append-Only File) operations during shutdown.

  • Adds a 30-second timeout mechanism to wait for active connections to complete during shutdown
  • Implements AOF buffer flushing and checkpoint creation during the shutdown process
  • Adds proper resource cleanup with GC.SuppressFinalize call in the Dispose method

yuseok-kim-edushare and others added 2 commits September 17, 2025 00:40
Simplifies the polling interval logic for active connections by using a delay array and removing consecutive error tracking. Extracts active connection count retrieval into a new GetActiveConnectionCount() helper method for clarity and reuse.
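
As a rough illustration of the delay-array polling this commit describes (not the PR's actual code), the wait loop could look something like the sketch below. The delay values, the `ShutdownPolling` class name, and the `Func<long>` parameter standing in for `GetActiveConnectionCount()` are all assumptions made for the example.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ShutdownPolling
{
    // Hypothetical back-off schedule; the PR does not state the actual values.
    private static readonly int[] DelaysMs = { 50, 100, 200, 500, 1000 };

    // Polls getActiveConnectionCount until it reports zero, the timeout elapses,
    // or the token is cancelled. Returns true only if the count reached zero.
    public static async Task<bool> WaitForActiveConnectionsToComplete(
        Func<long> getActiveConnectionCount,
        TimeSpan timeout,
        CancellationToken token = default)
    {
        var stopwatch = Stopwatch.StartNew();
        int attempt = 0;

        while (true)
        {
            if (getActiveConnectionCount() <= 0)
                return true;

            var remaining = timeout - stopwatch.Elapsed;
            if (remaining <= TimeSpan.Zero)
                return false;

            // Walk the delay array, then stay on its last entry.
            var delay = TimeSpan.FromMilliseconds(
                DelaysMs[Math.Min(attempt, DelaysMs.Length - 1)]);
            if (delay > remaining)
                delay = remaining; // never sleep past the deadline
            attempt++;

            try
            {
                await Task.Delay(delay, token);
            }
            catch (OperationCanceledException)
            {
                return false;
            }
        }
    }
}
```

With the 30-second limit from the PR description, a caller would pass `TimeSpan.FromSeconds(30)` as the timeout and treat a `false` result as "proceed with shutdown anyway".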

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 7 comments.


Moved Dispose() call after base.StopAsync in the finally block to ensure proper resource cleanup order. Added null checks for server and server.Metrics in GetActiveConnectionCount to prevent possible null reference exceptions.
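
The ordering and null checks described in this commit can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the PR's code: `FakeServer`, `FakeMetrics`, `HostedServiceBase`, `WorkerSketch`, and the `ActiveConnections` property are hypothetical stand-ins for the real Garnet and hosting types.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins so the sketch compiles on its own.
public sealed class FakeMetrics
{
    public long ActiveConnections { get; set; }
}

public sealed class FakeServer : IDisposable
{
    public FakeMetrics Metrics { get; set; }
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

// Stand-in for the hosting base class whose StopAsync the worker overrides.
public abstract class HostedServiceBase
{
    public virtual Task StopAsync(CancellationToken token) => Task.CompletedTask;
}

public sealed class WorkerSketch : HostedServiceBase, IDisposable
{
    private readonly FakeServer server;

    public WorkerSketch(FakeServer server) => this.server = server;

    // Null-conditional access: a missing server or metrics object counts as zero.
    public long GetActiveConnectionCount() =>
        server?.Metrics?.ActiveConnections ?? 0;

    public override async Task StopAsync(CancellationToken token)
    {
        try
        {
            // Placeholder for draining connections and persisting data.
            await Task.CompletedTask;
        }
        finally
        {
            // Let the base class finish stopping first, then release resources.
            await base.StopAsync(token);
            Dispose();
        }
    }

    public void Dispose()
    {
        server?.Dispose();
        GC.SuppressFinalize(this);
    }
}
```

Putting `Dispose()` after `base.StopAsync` in the `finally` block means the server is released even if the shutdown work throws, and only once the base class has finished stopping.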

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


@TalZaccai TalZaccai requested a review from badrishc September 16, 2025 18:47
@yuseok-kim-edushare

yuseok-kim-edushare commented Oct 10, 2025

Hmm, I hope my code change did not affect the cluster test results.

CI fails only on test.cluster with windows-latest, net8.0, Release.
However, test.cluster succeeds on windows-latest with net9.0 (both Release and Debug) and with net8.0 Debug.

How can I track down and resolve this issue? I'm not in a position to test Garnet clustering locally on Windows.

The previous run was successful (https://github.com/microsoft/garnet/actions/runs/18344516902),
but today's branch update made CI fail.

@yuseok-kim-edushare

As far as I can tell, the failing test passes when I run it locally in Visual Studio, and only one target fails in CI (.NET 8, Windows, Release).
The Actions logs for other commits on the main branch also show some cluster test failures.
So I don't think this is a failure in our main program.

During graceful shutdown, the worker now checks if tiered storage is enabled and takes a checkpoint using StoreWrapper if so. If tiered storage is not enabled, it falls back to flushing the AOF buffer as before. This ensures data consistency for both storage modes.
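
The branching this commit describes might be sketched as below. The `IPersistence` interface and its members are hypothetical placeholders invented for the example and are not Garnet's API; the sketch only shows the decision order (checkpoint when tiered storage is enabled, otherwise an AOF commit when AOF is enabled).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical abstraction over the server's persistence options.
public interface IPersistence
{
    bool TieredStorageEnabled { get; }
    bool AofEnabled { get; }
    Task TakeCheckpointAsync();
    Task CommitAofAsync();
}

public static class ShutdownPersistence
{
    // Returns a label for the action taken so callers can log it.
    public static async Task<string> PersistOnShutdownAsync(IPersistence p)
    {
        if (p.TieredStorageEnabled)
        {
            await p.TakeCheckpointAsync();
            return "checkpoint";
        }

        if (p.AofEnabled)
        {
            await p.CommitAofAsync();
            return "aof-commit";
        }

        return "none";
    }
}

// In-memory implementation used only to exercise the sketch.
public sealed class RecordingPersistence : IPersistence
{
    public bool TieredStorageEnabled { get; set; }
    public bool AofEnabled { get; set; }
    public List<string> Calls { get; } = new List<string>();

    public Task TakeCheckpointAsync()
    {
        Calls.Add("checkpoint");
        return Task.CompletedTask;
    }

    public Task CommitAofAsync()
    {
        Calls.Add("aof-commit");
        return Task.CompletedTask;
    }
}
```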
@yuseok-kim-edushare yuseok-kim-edushare changed the title [Garnet.Worker] feat: Improve graceful shutdown and add AOF handling #1382 [Garnet.Worker] feat: Improve graceful shutdown by add AOF handling or take checkpoint if config enabled #1382 Oct 10, 2025
@yuseok-kim-edushare

yuseok-kim-edushare commented Oct 10, 2025

I found an issue: I had confused AOF with Garnet's tiered storage.
So I modified the code to handle tiered storage checkpoints and AOF commits properly.
Related to #1390: a checkpoint gives a more reliable state than the AOF.

{
try
{
var storeWrapperField = server.GetType().GetField("storeWrapper",
Collaborator

Instead of trying to operate from outside using reflection, why not add a StopAsync method to GarnetServer, so that it can perform the graceful shutdown itself without reflection?

Contributor Author

Ah, your suggestion is reasonable!
Instead of using complex reflection in Worker, putting the graceful shutdown logic into GarnetServer is simpler, more maintainable, and easier to understand.

Collaborator

Thanks, look forward to the update!

Development

Successfully merging this pull request may close these issues.

Feature Request: Implement Graceful Shutdown for Garnet as a Windows Service

2 participants