-
Notifications
You must be signed in to change notification settings - Fork 282
outbound: implement OutboundPolicies backend request timeouts
#2419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
My 2 cents is that we have some leeway in how we choose to implement this and we don't need to follow the spec to the letter. If it is easier for us to implement and maintain to put this timeout in the logical stack after a backend is selected, I think that would be okay. The important part is that the timeout happens after retries and after backend selection. |
69671c6 to
97e4fcc
Compare
|
@adleong I've updated this branch based on your feedback, removing the |
mateiidavid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I managed to work most of the code out and this looks good to me.
PRs #2418 and #2419 add per-route and per-backend request timeouts configured by the `OutboundPolicies` API to the `MatchedRoute` and `MatchedBackend` layers in the outbound `ClientPolicy` stack, respectively. This means that — unlike in the `ServiceProfile` stack — two separate request timeouts can be configured in `ClientPolicy` stacks. However, because both the `MatchedRoute` and `MatchedBackend` layers are in the HTTP logical stack, the errors emitted by both timeouts will have a `LogicalError` as their most specific error metadata, meaning that the log messages and `l5d-proxy-error` headers recorded for these timeouts do not indicate whether the timeout that failed the request was the route request timeout or the backend request timeout. In order to ensure this information is recorded and exposed to the user, this branch adds two new error wrapper types, one of which enriches an error with a `RouteRef`'s metadata, and one of which enriches an error with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps errors with `BackendRef` metadata. This way, when the route timeout fails a request, the error will include the route metadata, while when the backend request timeout fails a request, the error will include both the route and backend metadata. Adding these new error wrappers also has the additional side benefit of adding this metadata to errors returned by filters, allowing users to distinguish between errors emitted by a filter on a route rule and errors emitted by a per-backend filter. Also, any other errors emitted lower in the stack for requests that are handled by a client policy stack will now also include this metadata, which seems generally useful. Example errors, taken from a proxy unit test: backend request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s ``` route request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s ```
PRs #2418 and #2419 add per-route and per-backend request timeouts configured by the `OutboundPolicies` API to the `MatchedRoute` and `MatchedBackend` layers in the outbound `ClientPolicy` stack, respectively. This means that — unlike in the `ServiceProfile` stack — two separate request timeouts can be configured in `ClientPolicy` stacks. However, because both the `MatchedRoute` and `MatchedBackend` layers are in the HTTP logical stack, the errors emitted by both timeouts will have a `LogicalError` as their most specific error metadata, meaning that the log messages and `l5d-proxy-error` headers recorded for these timeouts do not indicate whether the timeout that failed the request was the route request timeout or the backend request timeout. In order to ensure this information is recorded and exposed to the user, this branch adds two new error wrapper types, one of which enriches an error with a `RouteRef`'s metadata, and one of which enriches an error with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps errors with `BackendRef` metadata. This way, when the route timeout fails a request, the error will include the route metadata, while when the backend request timeout fails a request, the error will include both the route and backend metadata. Adding these new error wrappers also has the additional side benefit of adding this metadata to errors returned by filters, allowing users to distinguish between errors emitted by a filter on a route rule and errors emitted by a per-backend filter. Also, any other errors emitted lower in the stack for requests that are handled by a client policy stack will now also include this metadata, which seems generally useful. Example errors, taken from a proxy unit test: backend request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s ``` route request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s ``` --- * recover: remove unused `mut` (linkerd/linkerd2-proxy#2425) * outbound: implement `OutboundPolicies` route request timeouts (linkerd/linkerd2-proxy#2418) * build(deps): bump tj-actions/changed-files from 35.7.7 to 36.2.1 (linkerd/linkerd2-proxy#2427) * outbound: implement `OutboundPolicies` backend request timeouts (linkerd/linkerd2-proxy#2419) * outbound: add backend and route metadata to errors (linkerd/linkerd2-proxy#2428) Signed-off-by: Eric Anderson <[email protected]>
Updating proxy version on main PRs #2418 and #2419 add per-route and per-backend request timeouts configured by the `OutboundPolicies` API to the `MatchedRoute` and `MatchedBackend` layers in the outbound `ClientPolicy` stack, respectively. This means that — unlike in the `ServiceProfile` stack — two separate request timeouts can be configured in `ClientPolicy` stacks. However, because both the `MatchedRoute` and `MatchedBackend` layers are in the HTTP logical stack, the errors emitted by both timeouts will have a `LogicalError` as their most specific error metadata, meaning that the log messages and `l5d-proxy-error` headers recorded for these timeouts do not indicate whether the timeout that failed the request was the route request timeout or the backend request timeout. In order to ensure this information is recorded and exposed to the user, this branch adds two new error wrapper types, one of which enriches an error with a `RouteRef`'s metadata, and one of which enriches an error with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps errors with `BackendRef` metadata. This way, when the route timeout fails a request, the error will include the route metadata, while when the backend request timeout fails a request, the error will include both the route and backend metadata. Adding these new error wrappers also has the additional side benefit of adding this metadata to errors returned by filters, allowing users to distinguish between errors emitted by a filter on a route rule and errors emitted by a per-backend filter. Also, any other errors emitted lower in the stack for requests that are handled by a client policy stack will now also include this metadata, which seems generally useful. Example errors, taken from a proxy unit test: backend request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s ``` route request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s ``` --- * recover: remove unused `mut` (linkerd/linkerd2-proxy#2425) * outbound: implement `OutboundPolicies` route request timeouts (linkerd/linkerd2-proxy#2418) * build(deps): bump tj-actions/changed-files from 35.7.7 to 36.2.1 (linkerd/linkerd2-proxy#2427) * outbound: implement `OutboundPolicies` backend request timeouts (linkerd/linkerd2-proxy#2419) * outbound: add backend and route metadata to errors (linkerd/linkerd2-proxy#2428) Signed-off-by: Eric Anderson <[email protected]>
Depends on #2418
The latest proxy-api release, v0.10.0, adds fields to the
OutboundPoliciesAPI for configuring HTTP request timeouts, based onthe proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.
PR #2418 updates the proxy to depend on the new proxy-api release, and
implements the
Rule.request_timeoutfield added to the API. However,that branch does not add a timeout for the
RouteBackend.request_timeoutfield. This branch changes the proxy toapply the backend request timeout when configured by the policy
controller.
This branch implements
RouteBackend.request_timeoutby adding anadditional timeout layer in the
MatchedBackendstack. This applies theper-backend timeout once a backend is selected for a route. I've also
added stack tests for the interaction between the request and backend
request timeouts.
Note that once retries are added to client policy stacks, it may be
necessary to move the backend request timeout to ensure it occurs
"below" retries, depending on where the retry middleware ends up being
located in the proxy stack.