Skip to content

Conversation

@ericsciple
Copy link
Collaborator

@ericsciple ericsciple commented Aug 20, 2025

This change is required to distinguish user error vs infra error during acquire timeout.

The runner will ACK after receiving a job request from Broker Listener and before attempting to acquire the full job message from Run Service.

@ericsciple ericsciple force-pushed the users/ericsciple/25-08-ack branch 2 times, most recently from 2e3548a to 3ab9a4d Compare August 22, 2025 16:50
protected async Task RetryRequest(Func<Task> func,
CancellationToken cancellationToken,
int maxRetryAttemptsCount = 5,
int maxAttempts = 5,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous description was a misnomer. This value indicates maximum number of attempts (i.e. initial attempt + retries).

private ITerminal _term;
private TimeSpan _getNextMessageRetryInterval;
private TaskAgentStatus runnerStatus = TaskAgentStatus.Online;
private TaskAgentStatus _runnerStatus = TaskAgentStatus.Online;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@ericsciple ericsciple force-pushed the users/ericsciple/25-08-ack branch 2 times, most recently from 445ee6a to ddd57ff Compare August 22, 2025 18:17
@ericsciple ericsciple marked this pull request as ready for review August 22, 2025 18:21
Copilot AI review requested due to automatic review settings August 22, 2025 18:21
@ericsciple ericsciple requested a review from a team as a code owner August 22, 2025 18:21
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a "runner request acknowledgment" feature that allows runners to acknowledge job requests from the Broker Listener before attempting to acquire the full job message from the Run Service. This change helps distinguish between user errors and infrastructure errors during acquire timeouts.

  • Adds an acknowledgment mechanism through a new API endpoint and client method
  • Implements conditional acknowledgment based on a feature flag (ShouldAcknowledge)
  • Refactors variable naming from public fields to private fields with underscore prefix

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/Sdk/WebApi/WebApi/BrokerHttpClient.cs Adds new AcknowledgeRunnerRequestAsync method with error handling
src/Runner.Listener/RunnerJobRequestRef.cs Adds ShouldAcknowledge property for feature flag control
src/Runner.Listener/Runner.cs Implements acknowledgment logic before job acquisition with best-effort error handling
src/Runner.Listener/MessageListener.cs Adds acknowledgment method with timeout and refactors variable naming
src/Runner.Listener/BrokerMessageListener.cs Implements acknowledgment method and updates variable naming
src/Runner.Common/RunnerService.cs Refactors retry method parameter names for clarity
src/Runner.Common/BrokerServer.cs Adds server-side acknowledgment method without retries

@ericsciple ericsciple force-pushed the users/ericsciple/25-08-ack branch from ddd57ff to 557bb87 Compare August 22, 2025 18:34
public async Task AcknowledgeMessageAsync(string runnerRequestId, CancellationToken cancellationToken)
{
using var timeoutCts = new CancellationTokenSource(TimeSpan.FromSeconds(5)); // Short timeout
using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, timeoutCts.Token);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 sec is meant to be short timeout in long run as well? do we do such things at any place in code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can refine this further if needed. I just wanted to keep it short so it doesn't interfere with actually acquiring the job.

This new message ACK is best-effort anyway. It doesn't have to be 100%, it just has to be good enough to help us distinguish acquire timeout reason for the most part.

@ericsciple ericsciple merged commit 5f1efec into main Aug 22, 2025
10 checks passed
@ericsciple ericsciple deleted the users/ericsciple/25-08-ack branch August 22, 2025 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants