Skip to content

Conversation

@ctlong
Copy link
Member

@ctlong ctlong commented Apr 17, 2024

Description

Replaces grpc.Dial usage with grpc.NewClient. gRPC deprecated (and then briefly un-deprecated, but plan to re-deprecate) Dial and DialContext in favor of NewClient. See grpc/grpc-go#7010 for more info about NewClient.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing performed?

  • Unit tests
  • Integration tests
  • Acceptance tests

Checklist:

  • This PR is being made against the main branch, or relevant version branch
  • I have made corresponding changes to the documentation
  • I have added testing for my changes

If you have any questions, or want to get attention for a PR or issue please reach out on the #logging-and-metrics channel in the cloudfoundry slack

acrmp and others added 2 commits April 17, 2024 17:50
Propogates errors received from the rlp via gRPC when sending requests.
Now that we don't block on dialing the server, we need to handle errors
at the sending requests level.

Signed-off-by: Andrew Crump <[email protected]>
@ctlong ctlong requested a review from a team as a code owner April 17, 2024 20:28
Fixes the failing pool tests in `plumbing` and `rlp/internal/ingress`
following the change to use `grpc.NewClient` instead of `grpc.Dial`.
This is because they were testing that connections were initiated when a
new doppler was added. However, with the change to NewClient,
connections won't be made until the resulting grpc client connections
are used for RPC. This also means that dopplers may be added to the pool
map when in the past they would not be added due to the connection on
Dial failing.

One approach to getting the existing tests to pass would have been to
call `Connect` on the grpc client connections to force them to leave
idle mode. However, `Connect` is experimental, and really gRPC is
encouraging us not to care about the connection state when creating
client connections.

To fix the tests we ended up just asserting on the pool size. One
downside of this was that we couldn't see a nice way to assert that
`Close` was called on the gRPC client connection when `Close` was called
for a doppler address. We could have replaced the connection creation
with an interface and mocked that but it didn't seem worth it.

Signed-off-by: Carson Long <[email protected]>
Signed-off-by: Rebecca Roberts <[email protected]>
@ctlong
Copy link
Member Author

ctlong commented Apr 17, 2024

@rroberts2222 and I tested these changes within CF-D and saw that they resulted in RLP and TrafficController not initiating any TCP connections to Dopplers until a client attempted to connect to them. This is in contrast to the old behaviour where the Doppler connection pools were pre-filled to the max. We think this change is fine.

@acrmp and tested the RLP Gateway changes and saw that clients will consistently receive 503 errors now when the RLP is not available, e.g. from the log-stream plugin:

unexpected status code 503: {"message":"streaming is temporarily unavailable","error":"streaming_unavailable"}

The RLP Gateway logs will not emit stack traces in those cases, but will emit an error line for every failed request:

2024/04/16 21:41:25 failed to open stream from logs provider: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:8082: connect: connection refused"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants