Skip to content

Conversation

@arkodg
Copy link
Contributor

@arkodg arkodg commented Sep 19, 2023

Fixes: #877

@arkodg arkodg requested a review from a team as a code owner September 19, 2023 00:07
@arkodg arkodg marked this pull request as draft September 19, 2023 00:07
@codecov
Copy link

codecov bot commented Sep 19, 2023

Codecov Report

Merging #1899 (dc2ec96) into main (a785be8) will decrease coverage by 0.07%.
The diff coverage is 78.26%.

@@            Coverage Diff             @@
##             main    #1899      +/-   ##
==========================================
- Coverage   65.51%   65.44%   -0.07%     
==========================================
  Files          88       88              
  Lines       13097    13120      +23     
==========================================
+ Hits         8580     8586       +6     
- Misses       3989     4003      +14     
- Partials      528      531       +3     
Files Coverage Δ
internal/gatewayapi/route.go 88.12% <100.00%> (+0.21%) ⬆️
internal/ir/xds.go 75.51% <ø> (ø)
internal/xds/translator/route.go 90.90% <100.00%> (+0.10%) ⬆️
internal/ir/zz_generated.deepcopy.go 14.19% <0.00%> (-0.09%) ⬇️

... and 2 files with indirect coverage changes

@arkodg
Copy link
Contributor Author

arkodg commented Sep 19, 2023

the timeout API was accidentally made part of gateway-api v0.8.1 so I took a quick stab at implementing it.
However prefer waiting until this API is officially out in v1.0 with conformance tests in the next few weeks

@arkodg
Copy link
Contributor Author

arkodg commented Sep 19, 2023

proactively raising some questions

  • should we always set a default route timeout ? or default to Envoy's default of 15s which imo is too high
  • should we always explicitly set the default retry policy ? imo we should
    • retry count - 1 (this is Envoy default) or 2
    • retry on - reset,connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes
    • per try timeout - route timeout / retry count
    • retriable_status_codes - 503
    • retry_host_predicate - envoy.retry_host_predicates.previous_hosts
    • host_selection_retry_max_attempts: 3

https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-retrypolicy

ptal @envoyproxy/gateway-maintainers

@arkodg arkodg mentioned this pull request Sep 26, 2023
@arkodg arkodg marked this pull request as ready for review September 26, 2023 18:34
@arkodg arkodg requested review from a team, Xunzhuo, qicz and zhaohuabing and removed request for a team September 26, 2023 21:10
Arko Dasgupta added 3 commits September 26, 2023 16:29
Fixes: envoyproxy#877

Signed-off-by: Arko Dasgupta <[email protected]>
Signed-off-by: Arko Dasgupta <[email protected]>
Signed-off-by: Arko Dasgupta <[email protected]>
@zhaohuabing
Copy link
Member

proactively raising some questions

  • should we always set a default route timeout ? or default to Envoy's default of 15s which imo is too high

Maybe stick to Envoy's default until we get a strong reason to change it - like many feedbacks from end users.

  • should we always explicitly set the default retry policy ? imo we should
    • retry count - 1 (this is Envoy default) or 2
    • retry on - reset,connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes
    • per try timeout - route timeout / retry count
    • retriable_status_codes - 503
    • retry_host_predicate - envoy.retry_host_predicates.previous_hosts
    • host_selection_retry_max_attempts: 3

https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-retrypolicy

ptal @envoyproxy/gateway-maintainers

We cant assume the backkend services are always idempotent. I have encountered issues on retries - the service has been called multiple times on the same requests and caused problems on the server side.

@arkodg
Copy link
Contributor Author

arkodg commented Sep 27, 2023

proactively raising some questions

  • should we always set a default route timeout ? or default to Envoy's default of 15s which imo is too high

Maybe stick to Envoy's default until we get a strong reason to change it - like many feedbacks from end users.

  • should we always explicitly set the default retry policy ? imo we should

    • retry count - 1 (this is Envoy default) or 2
    • retry on - reset,connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes
    • per try timeout - route timeout / retry count
    • retriable_status_codes - 503
    • retry_host_predicate - envoy.retry_host_predicates.previous_hosts
    • host_selection_retry_max_attempts: 3

https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-retrypolicy
ptal @envoyproxy/gateway-maintainers

We cant assume the backkend services are always idempotent. I have encountered issues on retries - the service has been called multiple times on the same requests and caused problems on the server side.

hey @zhaohuabing not adding any retries in this PR
@kflynn and I chatted about this in the community meeting last week and decided to make retries opt in using a seperate API.

Also sticking to envoy timeout default as the default

@zirain
Copy link
Member

zirain commented Sep 27, 2023

any conformance/e2e test for this?

@arkodg
Copy link
Contributor Author

arkodg commented Sep 27, 2023

kubernetes-sigs/gateway-api#2254 should be get merged soon

@zirain
Copy link
Member

zirain commented Sep 27, 2023

let's wait for that?

@arkodg
Copy link
Contributor Author

arkodg commented Sep 27, 2023

let's wait for that?

there are already translation tests in gateway-api & xds layer

Copy link
Member

@zhaohuabing zhaohuabing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'm okay with merging it now and updating the conformance test later when kubernetes-sigs/gateway-api#2254 is merged.

Copy link
Contributor

@kflynn kflynn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK from here! 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Route timeout And Cluster connect timeout configure

5 participants