Skip to content
Merged
Changes from 17 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
165 changes: 160 additions & 5 deletions geps/gep-1742.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first read this I wondered why it only focused on HTTPRoute, but later I realized the original GEP specified that focus. It would be a good idea to change the title to "HTTPRoute Timeouts API", or else mention in Non-Goals that other route types are not covered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title changed

Original file line number Diff line number Diff line change
Expand Up @@ -12,12 +12,12 @@ timeouts for different types of connection.

## Goals

- Create some method to configure some timeouts
- Create some method to configure some timeouts.
- Timeout config must be applicable to most if not all Gateway API implementations.

## Non-Goals

- TBD
- A standard API for every possible timeout that implementations may support.

## Introduction

Expand Down Expand Up @@ -301,13 +301,168 @@ Could not find any HTTP specific timeouts. PRs welcomed. 😊

## API

TBD.
The above diagrams show that there are many different kinds of configurable timeouts
supported by Gateway implementations: connect, idle, request, upstream, downstream.
Although there may be opportunity for the specification of a common API for more of
them in the future, this GEP will focus on the L7 timeouts in HTTPRoutes that are
most valuable to clients.
Comment on lines +306 to +307
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one of the most important questions here is where the timeout config belongs. This proposes HTTPRouteRule, but I think that both HTTPBackendRef and/or a new policy to extend Service would be compelling alternatives. It would be helpful if this GEP described why HTTPRouteRule is proposed and why the others alternatives were not chosen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some text in the Alternatives section.


From the above analysis, it appears that, at a minimum, implementations are capable of
supporting the configuration of simple client downstream request timeouts on HTTPRoute
rules. This is a relatively small addition that would benefit many users.

Some implementations support configuring a timeout for individual backend requests,
separate from the overall client request timeout. This is particularly useful if a
client HTTP request to a gateway can result in more than one call from the gateway
to the destination backend service, for example, if automatic retries are supported.
Adding non-Core support for this would also benefit many users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify if the client request covers the entire connection from client to backend, or just the downstream request from client to gateway? There is also at least one tunnel timeout (HAProxy) and perhaps some other timeouts that cover the entire connection, not just the downstream and upstream halves of the connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a diagram in the "Timeout values" section that shows the relationships, but you need to look at the rendered markdown to see it. https://github.com/kubernetes-sigs/gateway-api/blob/7e20c3d25368b2c91d0952dfbd842f46e3324b36/geps/gep-1742.md#timeout-values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did see the diagram, yet still wanted to get the explanation in words and in the godoc of the implementation section for Request and BackendRequest. I asked about something similar in that section.

I don't see any timeout that would cover the entire transaction, unless Request does.

### GO

```go
type HTTPRouteRule struct {
// Timeouts defines the timeouts that can be configured for an HTTP request.
//
// Support: Core
//
// +optional
// <gateway:experimental>
Timeouts *HTTPRouteTimeouts `json:"timeouts,omitempty"`

// ...
}

// HTTPRouteTimeouts defines timeouts that can be configured for an HTTPRoute.
// Timeout values are formatted like 1h/1m/1s/1ms as parsed by Golang time.ParseDuration
// and MUST BE >= 1ms.
Copy link
Contributor

@candita candita May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest having a meaningful zero-value for the timeouts, like Envoy does. "A value of 0 will completely disable the .... timeout", even if other related timeouts are configured.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a good idea to me. I'll add that 0 means disable (no timeout). Thanks.

type HTTPRouteTimeouts struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add a name and/or type to the HTTPRouteTimeouts in case there are multiple needed for any given route. For example, maybe an idle timeout would be a different duration than a connect timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you're suggesting here. If we wanted (in the future) to add another timeout (e.g., idle timeout), wouldn't we just add another field to HTTPRouteTimeouts?

rules:
- backendRef:
    ...
  timeouts:
    request: 2s
    idle: 10s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, so maybe you need to clarify in Goal and Non-Goals which timeout values you're addressing. If you call it "request", that doesn't mean much. What transactions are covered in "request"?

// Request specifies the duration for processing an HTTP client request after which the
// gateway will time out if unable to send a response.
// Whether the gateway starts the timeout before or after the entire client request stream
// has been received, is implementation-dependent.
Copy link
Contributor

@candita candita May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue with this being implementation-dependent. As a user of Gateway API, I would like to know that if I set a 500ms timeout, that it would perform the same no matter which implementation I choose to run that HTTPRoute configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be better to not have this implementation dependent start time. The problem is that (based on the evaluation that @youngnick did) we believe that it can't be supported by all implementations. So, the alternative is to make the feature "Extended", but we instead are trying to use this compromise so that we can at least have something be part of "Core" support. FYI, here is some of the background discussion on this: #1997 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpret that discussion a little differently than saying it is implementation-dependent. It doesn't look like there was a conclusion on it, but I think @youngnick is asking for a better clarification on what exactly we're measuring here. It's still just a little too vague.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@candita, I added more detailed doc in the comments of both timeout fields. PTAL.

//
// For example, setting the `rules.timeouts.request` field to the value `10s` in an
// `HTTPRoute` will cause a timeout if a client request is taking longer than 10 seconds
// to complete.
//
// When this field is unspecified, request timeout behavior is implementation-dependent.
//
// Support: Core
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NGINX doesn't support request timeouts. However, not sure how many other implementations are in the same situation.

Is it possible to have this feature as Extended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought NGINX's proxy_read_timeout aligned with Request timeout, is it a BackendRequest timeout, instead? If so, maybe BackendRequest should be Core, and Request should be Extended? Is there an NGINX timeout that takes retries into account?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought NGINX's proxy_read_timeout aligned with Request timeout, is it a BackendRequest timeout, instead?

unfortunately proxy_read_timeout applies not to the duration of a request, but to successive read operations from the proxied server :(

//
// +optional
// +kubebuilder:validation:Format=duration
Request *metav1.Duration `json:"request,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Request *metav1.Duration `json:"request,omitempty"`
ClientRequest *metav1.Duration `json:"clientRequest,omitempty"`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to prefer keeping it short-and-sweet (Request) but will switch if others also think ClientRequest would be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer greater specificity too, but this timeout as designed is broadly similar to what's called a "request timeout" by a few of the proxy implementations at least, so I think in this case, the more general term is okay.


// BackendRequest specifies a timeout for an individual request from the gateway
// to a backend service. Typically used in conjuction with retry configuration,
// if supported by an implementation.
//
// The value of BackendRequest defaults to and must be <= the value of Request timeout.
Copy link
Contributor

@candita candita May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify - does the Request timeout encompass the BackendRequest timeout, or are they separate halves of the total timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it encompasses backendRequest. It's shown in the diagram but maybe I should explain it better in the API comment too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be helpful!

//
// Support: Extended
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you have 2 routes with same backend, and define 2 different backend timeouts ?

Will it mean that the backend request will have different timeouts depending on path ? Are all implementations capable of doing this ?

Copy link
Contributor Author

@frankbu frankbu May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it mean that the backend request will have different timeouts depending on path ? Are all implementations capable of doing this ?

Yes, the backend request timeout is client request specific. Envoy supports this (perTryTimeout) but all implementations probably not, so that's why it's Extended (instead of Core) support.

//
// +optional
// +kubebuilder:validation:Format=duration
BackendRequest *metav1.Duration `json:"backendRequest,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document it should be LET Request timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hzxuzhonghu I don't quite understand what you are suggesting. Can you clarify please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be LET

Oh, did you mean to say LESS? If so, good idea. Done.

}
```

### YAML

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
name: timeout-example
spec:
...
rules:
- backendRefs:
- name: some-service
port: 8080
timeouts:
request: 10s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not ISO 8601 as specified

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to say format is as time.ParseDuration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats probably my fault, I added a link to the POC PR mentioning ISO 8601 but didnt read much further into the actual format

apimachinery metav1.Duration is pretty light on documentation too, can maybe just rip the field documentation from go if we need to be specific: https://pkg.go.dev/maze.io/x/duration#ParseDuration

backendRequest: 2s
```

### Timeout values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section should go before the API section, to further explain the timeouts that are being added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


There are 2 kinds of timeouts that can be configured in an `HTTPRouteRule`:

1. `timeouts.request` is the timeout for the Gateway API implementation to send a
response to a client HTTP request. Whether the gateway starts the timeout before
or after the entire client request stream has been received, is implementation dependent.
This field is required `Core` support.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually fields marked as Core support have tighter requirements than this and don't have implementation-specific components. I think this might be OK, but we need to be confident that we can write a reliable conformance test for this that would be compatible with all implementations. Are we sure that all implementations can support this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was to make it a requirement that the timeout start after the client request has been fully received, @youngnick had the same thought apparently (#1997 (comment), but unfortunately it seems that some proxies can't do that so this is the compromise.

I think a conformance test compatible with all instances is one that simply checks that a request times out when a backend is very slow, which btw is the main motivating use case for this feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a conformance test compatible with all instances is one that simply checks that a request times out when a backend is very slow, which btw is the main motivating use case for this feature.

rereading this intent, I'm not sure the description and naming of this field really line up? may be nitpicky and discussed before but did we consider naming this something like backendRequest if the intent is to cover the case that a backend is slow seems like the naming should reflect that

the timeout for the Gateway API implementation to send a response to a client HTTP request sounds like something the client itself should be enforcing (as it is written here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we consider naming this something like backendRequest

This proposal actually has an optional non-Core field named backendRequest 😃 It's used to set a timeout for a single request from the gateway to the backend. The request timeout is controlling how long the client should wait for a response given that the gateway may be making multiple backend requests (e.g., retries) before replying. Note that without retries or other fancy things going on, the two timeouts degenerate to more-or-less the same thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, confused that field since it was talking about retries as well so forgot about it, but that field sounds to me to cover more like what the intent of your comment about testing above is

the language The request timeout is controlling how long the client should wait for a response... similarly timeouts.request is the timeout for the Gateway API implementation to send a response to a client HTTP request sounds like a timeout that a client would enforce

If the field is documented like this but really is implemented differently/means something else I think this will be confusing for users, implementers, and those trying to give advice on supporting an implementation (we already have trouble with users of Contour getting confused about timeouts we expose, which are a lot more specific in that they tie in to Envoy timeouts and cover specific hops in the request flow)

(this and other threads may be rehashing why this wasn't done earlier, generalizing timeouts feels like it will lead to confusion)

Copy link
Member

@sunjayBhatia sunjayBhatia May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another general option as opposed to trying to generalize timeouts would be to make timeouts for all hops of a proxied request "extended" support and implementations could advertise what they support via conformance profile reporting as well as documentation, which has advantages but also a lot of disadvantages tbh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to describe timeout.request in a way to be clear that it is for the Gateway API implementation to timeout doing it's job (which mostly means while waiting for 1 or more backend requests to finish). Maybe there's a better way to word it? I'll give it some thought, and suggestions are welcome 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which has advantages but also a lot of disadvantages tbh

Yeah, it would be nice to make at least something Core.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment in the Go struct for more clarity.


1. `timeouts.backendRequest` is a timeout for a single request from the gateway to a backend.
This field is optional `Extended` support. Typically used in conjuction with retry configuration,
if supported by an implementation.
Note that retry configuration will be the subject of a separate GEP (GEP-1731).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to leave this for a GEP focused on retry configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just another timeout, so IMO belongs in this GEP. Even though it's most commonly used in conjunction with retries, it could legitimately be used on its own, for example if a client is only interested in timing out if the backend is slow, but wants to be tolerant of delays on the client side. @kflynn was the one who suggested clearly separating it from retries, which I agree was a great idea.


```mermaid
sequenceDiagram
participant C as Client
participant P as Proxy
participant U as Upstream
C->>P: Connection Started
note left of P: timeouts.request start time (min)
C->>P: Starts sending Request
C->>P: Finishes Headers
C->>P: Finishes request
note left of P: timeouts.request start time (max)
P->>U: Connection Started
note right of P: timeouts.backendRequest start time
P->>U: Starts sending Request
P->>U: Finishes request
P->>U: Finishes Headers
U->>P: Starts Response
U->>P: Finishes Headers
note right of P: timeouts.backendRequest end time
note left of P: timeouts.request end time
U->>P: Finishes Response
note right of P: Repeat if retry
P->>C: Starts Response
P->>C: Finishes Headers
P->>C: Finishes Response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that note left of P: timeouts.request timeout should move from line 420 to here, as

note left of P: timeouts.request end time`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that but put it where it is for 2 reasons:

  1. All the implementations seem to have a timeout that corresponds to the point of starting to send the response, so it shouldn't be a problem to support.
  2. Seems to me that from a client's perspective, once the response starts coming back it's reasonable to expect that the requested timeout no longer applies.

So from a practical point of view the diagram should stay as is IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm, good point. Maybe do a min/max like you did with the timer start?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but I'm still waiting to see how well the min/max proposal for start time goes over with other reviewers 😃

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂 Tough to argue with that. 😂

Note right of P: Repeat if connection sharing
U->>C: Connection ended
```

Both timeout fields are string duration values as specified by
[Golang time.ParseDuration](https://pkg.go.dev/time#ParseDuration) and MUST be >= 1ms.

## Alternatives

(List other design alternatives and why we did not go in that
direction)
Timeouts could be configured using policy attachments or in objects other than `HTTPRouteRule`.

### Policy Attachment

Instead of configuring timeouts directly on an API object, they could be configured using policy
attachments. The advanatage to this approach would be that timeout policies can not only be
configured for an `HTTPRouteRule`, but can also be added/overriden at a more fine
(e.g., `HTTPBackendRef`) or course (e.g. `HTTPRoute`) level of granularity.

The downside, however, is complexity introduced for the most common use case, adding a simple
timeout for an HTTP request. Setting a single field in the route rule, instead of needing to
create a policy resource, for this simple case seems much better.

Note that in the future, we could use policy attachments to configure other kinds of less common
timeouts that may be needed. The default values of the proposed timeout fields could also be overriden
using policy attachments in the future. For example, a policy attachment could be used to set the
default value of `rules.timeouts.request` for all routes under an `HTTPRoute` or `Gateway`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Policy Attachments to override is a great idea, but I would remove the suggestion that they be used to supplement the existing timeouts with less common timeouts. I think adding a name and type to the timeouts and allowing multiples is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it in since it is an alternative, but I'll add something like "but it would probably be better to extend the proposed API in the future for those timeouts as well".


### Other API Objects

The new timeouts field could be added to a different API struct, instead of `HTTPRouteRule`.

Putting it on an `HTTPBackendRef`, for example, would allow users to set different timeouts for different
backends. This is a feature that we believe has not been requested by existing proxy or service mesh
clients and is also not easily implementable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would omit the comment about "not easily implementable" unless you explain why. Maybe also add this to Non-Goals?

Suggested change
clients and is also not easily implementable.
clients and is not being considered for this proposal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified not easily implementable


Another alternative is to move the timeouts configuration up a level in the API to `HTTPRoute`. This
would be convenient when a user wants the same timeout on all rules, but would be overly restrictive.
Using policy attachments to override the default timeout value for all rules, as described in the
previous section, is likely a better way to handle timeout configuration above the route rule level.

## References

Expand Down