-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Acknowledge runner request #3996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2e3548a to
3ab9a4d
Compare
| protected async Task RetryRequest(Func<Task> func, | ||
| CancellationToken cancellationToken, | ||
| int maxRetryAttemptsCount = 5, | ||
| int maxAttempts = 5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous description was a misnomer. This value indicates maximum number of attempts (i.e. initial attempt + retries).
| private ITerminal _term; | ||
| private TimeSpan _getNextMessageRetryInterval; | ||
| private TaskAgentStatus runnerStatus = TaskAgentStatus.Online; | ||
| private TaskAgentStatus _runnerStatus = TaskAgentStatus.Online; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
445ee6a to
ddd57ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a "runner request acknowledgment" feature that allows runners to acknowledge job requests from the Broker Listener before attempting to acquire the full job message from the Run Service. This change helps distinguish between user errors and infrastructure errors during acquire timeouts.
- Adds an acknowledgment mechanism through a new API endpoint and client method
- Implements conditional acknowledgment based on a feature flag (
ShouldAcknowledge) - Refactors variable naming from public fields to private fields with underscore prefix
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Sdk/WebApi/WebApi/BrokerHttpClient.cs | Adds new AcknowledgeRunnerRequestAsync method with error handling |
| src/Runner.Listener/RunnerJobRequestRef.cs | Adds ShouldAcknowledge property for feature flag control |
| src/Runner.Listener/Runner.cs | Implements acknowledgment logic before job acquisition with best-effort error handling |
| src/Runner.Listener/MessageListener.cs | Adds acknowledgment method with timeout and refactors variable naming |
| src/Runner.Listener/BrokerMessageListener.cs | Implements acknowledgment method and updates variable naming |
| src/Runner.Common/RunnerService.cs | Refactors retry method parameter names for clarity |
| src/Runner.Common/BrokerServer.cs | Adds server-side acknowledgment method without retries |
ddd57ff to
557bb87
Compare
| public async Task AcknowledgeMessageAsync(string runnerRequestId, CancellationToken cancellationToken) | ||
| { | ||
| using var timeoutCts = new CancellationTokenSource(TimeSpan.FromSeconds(5)); // Short timeout | ||
| using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, timeoutCts.Token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5 sec is meant to be short timeout in long run as well? do we do such things at any place in code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can refine this further if needed. I just wanted to keep it short so it doesn't interfere with actually acquiring the job.
This new message ACK is best-effort anyway. It doesn't have to be 100%, it just has to be good enough to help us distinguish acquire timeout reason for the most part.
This change is required to distinguish user error vs infra error during acquire timeout.
The runner will ACK after receiving a job request from Broker Listener and before attempting to acquire the full job message from Run Service.