Skip to content

Conversation

@abdullahalrifat
Copy link
Contributor

This PR adds a unit test to verify that Check correctly handles explicit context cancellation before query execution.

  • Introduced a testServer wrapper with a beforeCheck hook to simulate cancellation during request handling.

  • Ensured that the expected error (context canceled) is returned and properly logged.

  • Validated that the error metric CheckRequestCancelledErr is incremented.

This strengthens test coverage around graceful handling of cancelled requests and ensures consistent logging/metrics behavior.

@abdullahalrifat abdullahalrifat force-pushed the fix/check-context-cancel-logging branch from 62b551d to 10f7ff8 Compare September 28, 2025 03:14
Copy link
Collaborator

@tjons tjons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @abdullahalrifat - thanks for the contribution. I left a review with some suggested changes, but I'd be happy to help you get this in. That said, I'm curious how you found this behavior? Did it arise in a production deployment? Some background about how this context might be cancelled would be helpful - are you seeing context cancellations from Envoy?

Once we better understand that and you are able to make the suggested changes, I will need you to squash your commits and rebase on top of the latest main as well.

Thanks for contributing!

@abdullahalrifat
Copy link
Contributor Author

Hi @abdullahalrifat - thanks for the contribution. I left a review with some suggested changes, but I'd be happy to help you get this in. That said, I'm curious how you found this behavior? Did it arise in a production deployment? Some background about how this context might be cancelled would be helpful - are you seeing context cancellations from Envoy?

Once we better understand that and you are able to make the suggested changes, I will need you to squash your commits and rebase on top of the latest main as well.

Thanks for contributing!

When I deployed OPA with Envoy in a Kubernetes cluster, I noticed that some incoming requests occasionally failed. These failures could happen for various reasons, such as networking issues. On the client side, this wasn’t a problem since they had a retry mechanism in place, but OPA was still logging these as failures.

When analyzing OPA failure logs, we generally classify them into categories like:

  • Rule conflicts (caused by bad policies)

  • Query timeouts (when evaluation takes longer than expected)

However, the context_cancelled errors are different — they aren’t actual OPA failures, since OPA itself isn’t at fault. I wanted to separate these out to ensure that OPA failure logging accurately reflects real issues.

@abdullahalrifat abdullahalrifat force-pushed the fix/check-context-cancel-logging branch from 117e916 to f599383 Compare September 29, 2025 03:58
Copy link
Collaborator

@tjons tjons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, @abdullahalrifat - can you please rebase on the latest main and squash your commits?

@tjons tjons force-pushed the fix/check-context-cancel-logging branch from aee76ee to 1558934 Compare October 25, 2025 16:38
fixed context cancelled with proper logging

Signed-off-by: abdullahalrifat <[email protected]>
Signed-off-by: tjons <[email protected]>
@tjons tjons force-pushed the fix/check-context-cancel-logging branch from 1558934 to ff43280 Compare October 25, 2025 16:38
@tjons tjons merged commit 8a375b5 into open-policy-agent:main Oct 25, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants