outbound: implement `OutboundPolicies` backend request timeouts #2419

hawkw · 2023-06-02T20:07:53Z

Depends on #2418

The latest proxy-api release, v0.10.0, adds fields to the
OutboundPolicies API for configuring HTTP request timeouts, based on
the proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.
PR #2418 updates the proxy to depend on the new proxy-api release, and
implements the Rule.request_timeout field added to the API. However,
that branch does not add a timeout for the
RouteBackend.request_timeout field. This branch changes the proxy to
apply the backend request timeout when configured by the policy
controller.

This branch implements RouteBackend.request_timeout by adding an
additional timeout layer in the MatchedBackend stack. This applies the
per-backend timeout once a backend is selected for a route. I've also
added stack tests for the interaction between the request and backend
request timeouts.

Note that once retries are added to client policy stacks, it may be
necessary to move the backend request timeout to ensure it occurs
"below" retries, depending on where the retry middleware ends up being
located in the proxy stack.

adleong · 2023-06-02T23:31:02Z

My 2 cents is that we have some leeway in how we choose to implement this and we don't need to follow the spec to the letter. If it is easier for us to implement and maintain to put this timeout in the logical stack after a backend is selected, I think that would be okay. The important part is that the timeout happens after retries and after backend selection.

hawkw · 2023-06-15T15:37:44Z

@adleong I've updated this branch based on your feedback, removing the TimeoutFromRequest middleware. Now, the backend request timeout is applied in the policy logical stack's matched backend stack.

mateiidavid

I managed to work most of the code out and this looks good to me.

PRs #2418 and #2419 add per-route and per-backend request timeouts configured by the `OutboundPolicies` API to the `MatchedRoute` and `MatchedBackend` layers in the outbound `ClientPolicy` stack, respectively. This means that — unlike in the `ServiceProfile` stack — two separate request timeouts can be configured in `ClientPolicy` stacks. However, because both the `MatchedRoute` and `MatchedBackend` layers are in the HTTP logical stack, the errors emitted by both timeouts will have a `LogicalError` as their most specific error metadata, meaning that the log messages and `l5d-proxy-error` headers recorded for these timeouts do not indicate whether the timeout that failed the request was the route request timeout or the backend request timeout. In order to ensure this information is recorded and exposed to the user, this branch adds two new error wrapper types, one of which enriches an error with a `RouteRef`'s metadata, and one of which enriches an error with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps errors with `BackendRef` metadata. This way, when the route timeout fails a request, the error will include the route metadata, while when the backend request timeout fails a request, the error will include both the route and backend metadata. Adding these new error wrappers also has the additional side benefit of adding this metadata to errors returned by filters, allowing users to distinguish between errors emitted by a filter on a route rule and errors emitted by a per-backend filter. Also, any other errors emitted lower in the stack for requests that are handled by a client policy stack will now also include this metadata, which seems generally useful. Example errors, taken from a proxy unit test: backend request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s ``` route request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s ```

PRs #2418 and #2419 add per-route and per-backend request timeouts configured by the `OutboundPolicies` API to the `MatchedRoute` and `MatchedBackend` layers in the outbound `ClientPolicy` stack, respectively. This means that — unlike in the `ServiceProfile` stack — two separate request timeouts can be configured in `ClientPolicy` stacks. However, because both the `MatchedRoute` and `MatchedBackend` layers are in the HTTP logical stack, the errors emitted by both timeouts will have a `LogicalError` as their most specific error metadata, meaning that the log messages and `l5d-proxy-error` headers recorded for these timeouts do not indicate whether the timeout that failed the request was the route request timeout or the backend request timeout. In order to ensure this information is recorded and exposed to the user, this branch adds two new error wrapper types, one of which enriches an error with a `RouteRef`'s metadata, and one of which enriches an error with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps errors with `BackendRef` metadata. This way, when the route timeout fails a request, the error will include the route metadata, while when the backend request timeout fails a request, the error will include both the route and backend metadata. Adding these new error wrappers also has the additional side benefit of adding this metadata to errors returned by filters, allowing users to distinguish between errors emitted by a filter on a route rule and errors emitted by a per-backend filter. Also, any other errors emitted lower in the stack for requests that are handled by a client policy stack will now also include this metadata, which seems generally useful. Example errors, taken from a proxy unit test: backend request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s ``` route request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s ``` --- * recover: remove unused `mut` (linkerd/linkerd2-proxy#2425) * outbound: implement `OutboundPolicies` route request timeouts (linkerd/linkerd2-proxy#2418) * build(deps): bump tj-actions/changed-files from 35.7.7 to 36.2.1 (linkerd/linkerd2-proxy#2427) * outbound: implement `OutboundPolicies` backend request timeouts (linkerd/linkerd2-proxy#2419) * outbound: add backend and route metadata to errors (linkerd/linkerd2-proxy#2428) Signed-off-by: Eric Anderson <[email protected]>

Updating proxy version on main PRs #2418 and #2419 add per-route and per-backend request timeouts configured by the `OutboundPolicies` API to the `MatchedRoute` and `MatchedBackend` layers in the outbound `ClientPolicy` stack, respectively. This means that — unlike in the `ServiceProfile` stack — two separate request timeouts can be configured in `ClientPolicy` stacks. However, because both the `MatchedRoute` and `MatchedBackend` layers are in the HTTP logical stack, the errors emitted by both timeouts will have a `LogicalError` as their most specific error metadata, meaning that the log messages and `l5d-proxy-error` headers recorded for these timeouts do not indicate whether the timeout that failed the request was the route request timeout or the backend request timeout. In order to ensure this information is recorded and exposed to the user, this branch adds two new error wrapper types, one of which enriches an error with a `RouteRef`'s metadata, and one of which enriches an error with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps errors with `BackendRef` metadata. This way, when the route timeout fails a request, the error will include the route metadata, while when the backend request timeout fails a request, the error will include both the route and backend metadata. Adding these new error wrappers also has the additional side benefit of adding this metadata to errors returned by filters, allowing users to distinguish between errors emitted by a filter on a route rule and errors emitted by a per-backend filter. Also, any other errors emitted lower in the stack for requests that are handled by a client policy stack will now also include this metadata, which seems generally useful. Example errors, taken from a proxy unit test: backend request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s ``` route request: ``` logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s ``` --- * recover: remove unused `mut` (linkerd/linkerd2-proxy#2425) * outbound: implement `OutboundPolicies` route request timeouts (linkerd/linkerd2-proxy#2418) * build(deps): bump tj-actions/changed-files from 35.7.7 to 36.2.1 (linkerd/linkerd2-proxy#2427) * outbound: implement `OutboundPolicies` backend request timeouts (linkerd/linkerd2-proxy#2419) * outbound: add backend and route metadata to errors (linkerd/linkerd2-proxy#2428) Signed-off-by: Eric Anderson <[email protected]>

hawkw requested a review from olix0r June 2, 2023 20:07

hawkw requested a review from a team as a code owner June 2, 2023 20:07

hawkw self-assigned this Jun 2, 2023

hawkw requested a review from adleong June 13, 2023 15:54

hawkw mentioned this pull request Jun 14, 2023

outbound: implement OutboundPolicies route request timeouts #2418

Merged

Base automatically changed from eliza/client-policy-timeouts to main June 14, 2023 18:36

hawkw added 9 commits June 15, 2023 07:22

add route request timeouts

91747f5

add tests for backend request timeouts

b4cb093

add backend request timeouts to backends

d3eec62

insert timeout into req extensions

08020c8

make the test actually test the right thing

a94ae56

moar test

42486f9

implement backend request timeouts

64aca6a

rustfmt

ba45e1e

move timeouts into backend stack

97e4fcc

hawkw force-pushed the eliza/backend-request-timeout branch from 69671c6 to 97e4fcc Compare June 15, 2023 14:38

update test to remove queue behavior

7585439

rm

c04f8c4

mateiidavid approved these changes Jun 15, 2023

View reviewed changes

hawkw mentioned this pull request Jun 15, 2023

outbound: add backend and route metadata to errors #2428

Merged

adleong approved these changes Jun 15, 2023

View reviewed changes

hawkw merged commit 966306b into main Jun 15, 2023

hawkw deleted the eliza/backend-request-timeout branch June 15, 2023 17:48

risingspiral mentioned this pull request Jun 15, 2023

proxy: v2.204.0 linkerd/linkerd2#11024

Merged

hawkw mentioned this pull request Jul 26, 2023

outbound: implement OutboundPolicy retries #2446

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

outbound: implement `OutboundPolicies` backend request timeouts #2419

outbound: implement `OutboundPolicies` backend request timeouts #2419

Uh oh!

hawkw commented Jun 2, 2023 •

edited

Loading

Uh oh!

adleong commented Jun 2, 2023

Uh oh!

hawkw commented Jun 15, 2023

Uh oh!

mateiidavid left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

outbound: implement OutboundPolicies backend request timeouts #2419

outbound: implement OutboundPolicies backend request timeouts #2419

Uh oh!

Conversation

hawkw commented Jun 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adleong commented Jun 2, 2023

Uh oh!

hawkw commented Jun 15, 2023

Uh oh!

mateiidavid left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

outbound: implement `OutboundPolicies` backend request timeouts #2419

outbound: implement `OutboundPolicies` backend request timeouts #2419

hawkw commented Jun 2, 2023 •

edited

Loading