Skip to content

Conversation

@pichlermarc
Copy link
Member

@pichlermarc pichlermarc commented Jul 31, 2024

Which problem is this PR solving?

This is a fix for an unreleased bug (therefore skipping changelog). While working on a follow up for the web exporters, I realized that the RetryingTransport I introduced in #4743 it does not correctly handle timeouts. Instead of computing remaining time it does not take care of timeouts at all, meaning that consecutive send calls can actually take too long when retrying. This changes the transport interface so that we can set a timeout for each send call, and computes the remaining time to pass to the underlying send call.

Example (correct behavior):

  • call send with a timeout of 1000ms, retrying transport calls send on the underlying transport, with timeout of 1000ms
  • transport returns 503, in 10ms, says to retry-after 500ms
  • now, the actual time left for the retry is 1000ms (original timeout) - 500ms (the time that was spent on waiting) - 10ms (time of the original call) = 490ms
  • transport again returns 503, in 10ms, says to retry-after 500ms
  • we should not retry as we only have 480ms left, but the sever tells us to wait for 500ms, which will be after our timeout.

Example (current behavior):

  • call send with a timeout of 1000ms, retrying transport calls send on the underlying transport, with timeout of 1000ms
  • transport returns 503, in 10ms, says to retry-after 500ms
  • wait 500ms, try again with a timeout of 1000ms
  • transport again returns 503, in 10ms, says to retry-after 500ms
  • wait 500ms, try again with a timeout of 1000ms (now we've actually exceeded our timeout already)
  • this carries on until we reached max timeouts

Enables #4895

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

  • Existing tests
  • Added unit test

@pichlermarc pichlermarc requested a review from a team July 31, 2024 12:24
@pichlermarc pichlermarc added bug Something isn't working Skip Changelog labels Jul 31, 2024
@pichlermarc pichlermarc added the target:next-minor-release This PR is in scope for the next minor release (`main` branch) label Aug 7, 2024
@pichlermarc pichlermarc added this pull request to the merge queue Aug 9, 2024
Merged via the queue into open-telemetry:main with commit 5cc3dc2 Aug 9, 2024
@pichlermarc pichlermarc deleted the fix/no-retry-after-deadline branch August 9, 2024 07:35
Zirak pushed a commit to Zirak/opentelemetry-js that referenced this pull request Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Skip Changelog target:next-minor-release This PR is in scope for the next minor release (`main` branch)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants