1- ## Anti-Patterns
2-
3- ### Dialing in gRPC
4- [ ` grpc.Dial ` ] ( https://pkg.go.dev/google.golang.org/grpc#Dial ) is a function in
5- the gRPC library that creates a virtual connection from the gRPC client to the
6- gRPC server. It takes a target URI (which can represent the name of a logical
7- backend service and could resolve to multiple actual addresses) and a list of
8- options, and returns a
1+ ## Anti-Patterns of Client creation
2+
3+ ### How to properly create a ` ClientConn ` : ` grpc.NewClient `
4+
5+ [ ` grpc.NewClient ` ] ( https://pkg.go.dev/google.golang.org/grpc#NewClient ) is the
6+ function in the gRPC library that creates a virtual connection from a client
7+ application to a gRPC server. It takes a target URI (which represents the name
8+ of a logical backend service and resolves to one or more physical addresses) and
9+ a list of options, and returns a
910[ ` ClientConn ` ] ( https://pkg.go.dev/google.golang.org/grpc#ClientConn ) object that
10- represents the connection to the server. The ` ClientConn ` contains one or more
11- actual connections to real server backends and attempts to keep these
12- connections healthy by automatically reconnecting to them when they break.
13-
14- The ` Dial ` function can also be configured with various options to customize the
15- behavior of the client connection. For example, developers could use options
16- such a
17- [ ` WithTransportCredentials ` ] ( https://pkg.go.dev/google.golang.org/grpc#WithTransportCredentials )
18- to configure the transport credentials to use.
19-
20- While ` Dial ` is commonly referred to as a "dialing" function, it doesn't
21- actually perform the low-level network dialing operation like
22- [ ` net.Dial ` ] ( https://pkg.go.dev/net#Dial ) would. Instead, it creates a virtual
23- connection from the gRPC client to the gRPC server.
24-
25- ` Dial ` does initiate the process of connecting to the server, but it uses the
26- ClientConn object to manage and maintain that connection over time. This is why
27- errors encountered during the initial connection are no different from those
28- that occur later on, and why it's important to handle errors from RPCs rather
29- than relying on options like
30- [ ` FailOnNonTempDialError ` ] ( https://pkg.go.dev/google.golang.org/grpc#FailOnNonTempDialError ) ,
31- [ ` WithBlock ` ] ( https://pkg.go.dev/google.golang.org/grpc#WithBlock ) , and
32- [ ` WithReturnConnectionError ` ] ( https://pkg.go.dev/google.golang.org/grpc#WithReturnConnectionError ) .
33- In fact, ` Dial ` does not always establish a connection to servers by default.
34- The connection behavior is determined by the load balancing policy being used.
35- For instance, an "active" load balancing policy such as Round Robin attempts to
36- maintain a constant connection, while the default "pick first" policy delays
37- connection until an RPC is executed. Instead of using the WithBlock option, which
38- may not be recommended in some cases, you can call the
39- [ ` ClientConn.Connect ` ] ( https://pkg.go.dev/google.golang.org/grpc#ClientConn.Connect )
40- method to explicitly initiate a connection.
41-
42- ### Using ` FailOnNonTempDialError ` , ` WithBlock ` , and ` WithReturnConnectionError `
43-
44- The gRPC API provides several options that can be used to configure the behavior
45- of dialing and connecting to a gRPC server. Some of these options, such as
46- ` FailOnNonTempDialError ` , ` WithBlock ` , and ` WithReturnConnectionError ` , rely on
47- failures at dial time. However, we strongly discourage developers from using
48- these options, as they can introduce race conditions and result in unreliable
49- and difficult-to-debug code.
50-
51- One of the most important reasons for avoiding these options, which is often
52- overlooked, is that connections can fail at any point in time. This means that
53- you need to handle RPC failures caused by connection issues, regardless of
54- whether a connection was never established in the first place, or if it was
55- created and then immediately lost. Implementing proper error handling for RPCs
56- is crucial for maintaining the reliability and stability of your gRPC
57- communication.
58-
59- ### Why we discourage using ` FailOnNonTempDialError ` , ` WithBlock ` , and ` WithReturnConnectionError `
60-
61- When a client attempts to connect to a gRPC server, it can encounter a variety
62- of errors, including network connectivity issues, server-side errors, and
63- incorrect usage of the gRPC API. The options ` FailOnNonTempDialError ` ,
64- ` WithBlock ` , and ` WithReturnConnectionError ` are designed to handle some of
65- these errors, but they do so by relying on failures at dial time. This means
66- that they may not provide reliable or accurate information about the status of
67- the connection.
68-
69- For example, if a client uses ` WithBlock ` to wait for a connection to be
70- established, it may end up waiting indefinitely if the server is not responding.
71- Similarly, if a client uses ` WithReturnConnectionError ` to return a connection
72- error if dialing fails, it may miss opportunities to recover from transient
73- network issues that are resolved shortly after the initial dial attempt.
11+ represents the virtual connection to the server. The ` ClientConn ` contains one
12+ or more actual connections to real servers and attempts to maintain these
13+ connections by automatically reconnecting to them when they break. ` NewClient `
14+ was introduced in gRPC-Go v1.63.
15+
16+ ### The wrong way: ` grpc.Dial `
17+
18+ [ ` grpc.Dial ` ] ( https://pkg.go.dev/google.golang.org/grpc#Dial ) is a deprecated
19+ function that also creates the same virtual connection pool as ` grpc.NewClient ` .
20+ However, unlike ` grpc.NewClient ` , it immediately starts connecting and supports
21+ a few additional ` DialOption ` s that control this initial connection attempt.
22+ These are: ` WithBlock ` , ` WithTimeout ` , ` WithReturnConnectionError ` , and
23+ ` FailOnNonTempDialError ` .
24+
25+ That ` grpc.Dial ` creates connections immediately is not a problem in and of
26+ itself, but this behavior differs from how gRPC works in all other languages,
27+ and it can be convenient to have a constructor that does not perform I/O. It
28+ can also be confusing to users, as most people expect a function called ` Dial `
29+ to create _a_ connection which may need to be recreated if it is lost.
30+
31+ ` grpc.Dial ` uses "passthrough" as the default name resolver for backward
32+ compatibility while ` grpc.NewClient ` uses "dns" as its default name resolver.
33+ This subtle difference is important to legacy systems that also specified a
34+ custom dialer and expected it to receive the target string directly.
35+
36+ For these reasons, using ` grpc.Dial ` is discouraged. Even though it is marked
37+ as deprecated, we will continue to support it until a v2 is released (and no
38+ plans for a v2 exist at the time this was written).
39+
40+ ### Especially bad: using deprecated ` DialOptions `
41+
42+ ` FailOnNonTempDialError ` , ` WithBlock ` , and ` WithReturnConnectionError ` are three
43+ ` DialOption ` s that are only supported by ` Dial ` because they only affect the
44+ behavior of ` Dial ` itself. ` WithBlock ` causes ` Dial ` to wait until the
45+ ` ClientConn ` reports its ` State ` as ` connectivity.Ready ` . The other two deal
46+ with returning connection errors before the timeout (set with ` WithTimeout ` or on
47+ the context when using ` DialContext ` ).
48+
49+ The reason these options can be a problem is that connections with a
50+ ` ClientConn ` are dynamic -- they may come and go over time. If your client
51+ successfully connects, the server could go down 1 second later, and your RPCs
52+ will fail. "Knowing you are connected" does not tell you much in this regard.
53+
54+ Additionally, _all_ RPCs created on an "idle" or a "connecting" ` ClientConn `
55+ will wait until their deadline or until a connection is established before
56+ failing. This means that you don't need to check that a ` ClientConn ` is "ready"
57+ before starting your RPCs. By default, RPCs will fail if the ` ClientConn `
58+ enters the "transient failure" state, but setting ` WaitForReady(true) ` on a
59+ call will cause it to queue even in the "transient failure" state, and it will
60+ only ever fail due to a deadline, a server response, or a connection loss after
61+ the RPC was sent to a server.
62+
63+ Some users of ` Dial ` use it as a way to validate the configuration of their
64+ system. If you wish to maintain this behavior but migrate to ` NewClient ` , you
65+ can call ` GetState ` , call ` Connect ` if the state is idle, and then call ` WaitForStateChange ` until the channel is ready.
66+ However, if this fails, it does not mean that your configuration was bad; it
67+ could also mean the service is not reachable by the client due to connectivity
68+ reasons.
7469
7570## Best practices for error handling in gRPC
7671
7772Instead of relying on failures at dial time, we strongly encourage developers to
78- rely on errors from RPCs. When a client makes an RPC, it can receive an error
79- response from the server. These errors can provide valuable information about
73+ rely on errors from RPCs. When a client makes an RPC, it can receive an error
74+ response from the server. These errors can provide valuable information about
8075what went wrong, including information about network issues, server-side errors,
8176and incorrect usage of the gRPC API.
8277
8378By handling errors from RPCs correctly, developers can write more reliable and
84- robust gRPC applications. Here are some best practices for error handling in
79+ robust gRPC applications. Here are some best practices for error handling in
8580gRPC:
8681
87- - Always check for error responses from RPCs and handle them appropriately.
88- - Use the ` status ` field of the error response to determine the type of error that
89- occurred.
82+ - Always check for error responses from RPCs and handle them appropriately.
83+ - Use the ` status ` package (for example, ` status.FromError ` or ` status.Code ` ) to
84+ determine the type of error that occurred.
9085- When retrying failed RPCs, consider using the built-in retry mechanism
9186 provided by gRPC-Go, if available, instead of manually implementing retries.
9287 Refer to the [ gRPC-Go retry example
9388 documentation] ( https://github.com/grpc/grpc-go/blob/master/examples/features/retry/README.md )
94- for more information.
95- - Avoid using ` FailOnNonTempDialError ` , ` WithBlock ` , and
96- ` WithReturnConnectionError ` , as these options can introduce race conditions and
97- result in unreliable and difficult-to-debug code.
98- - If making the outgoing RPC in order to handle an incoming RPC, be sure to
99- translate the status code before returning the error from your method handler.
100- For example, if the error is an ` INVALID_ARGUMENT ` error, that probably means
89+ for more information. Note that this is not a substitute for client-side
90+ retries as errors that occur after an RPC starts on a server cannot be
91+ retried through gRPC's built-in mechanism.
92+ - If making an outgoing RPC from a server handler, be sure to translate the
93+ status code before returning the error from your method handler. For example,
94+ if the error is an ` INVALID_ARGUMENT ` status code, that probably means
10195 your service has a bug (otherwise it shouldn't have triggered this error), in
10296 which case ` INTERNAL ` is more appropriate to return back to your users.
10397
@@ -106,7 +100,7 @@ gRPC:
106100The following code snippet demonstrates how to handle errors from an RPC in
107101gRPC:
108102
109- ``` go
103+ ``` go
110104ctx , cancel := context.WithTimeout (context.Background (), time.Second )
111105defer cancel ()
112106
@@ -118,89 +112,72 @@ if err != nil {
118112 return nil , err
119113}
120114
121- // Use the response as appropriate
115+ // Use the response as appropriate
122116log.Printf (" MyRPC response: %v " , res)
123117```
124118
125119To determine the type of error that occurred, you can extract the status from
126120the error with ` status.FromError ` :
127121
128-
129122``` go
130- resp , err := client.MakeRPC (context.Background (), request)
123+ resp , err := client.MakeRPC (context.TODO (), request)
131124if err != nil {
132- status , ok := status.FromError (err)
133- if ok {
134- // Handle the error based on its status code
125+ if st , ok := status.FromError (err); ok {
126+ // Handle the error based on its status code
135127 if st.Code () == codes.NotFound {
136128 log.Println (" Requested resource not found" )
137129 } else {
138130 log.Printf (" RPC error: %v " , st.Message ())
139131 }
140132 } else {
141- // Handle non-RPC errors
133+ // Handle non-RPC errors
142134 log.Printf (" Non-RPC error: %v " , err)
143135 }
144136 return
145- }
137+ }
146138
147- // Use the response as needed
148- log.Printf (" Response received: %v " , resp)
139+ // Use the response as needed
140+ log.Printf (" Response received: %v " , resp)
149141```
150142
151143### Example: Using a backoff strategy
152144
153-
154145When retrying failed RPCs, use a backoff strategy to avoid overwhelming the
155146server or exacerbating network issues:
156147
157-
158- ``` go
148+ ``` go
159149var res *MyResponse
160150var err error
161151
162- // If the user doesn't have a context with a deadline, create one
163- ctx , cancel := context. WithTimeout (context. Background (), time. Second )
164- defer cancel ()
152+ retryableStatusCodes := map [codes. Code ] bool {
153+ codes. Unavailable : true , // etc
154+ }
165155
166- // Retry the RPC call a maximum number of times
156+ // Retry the RPC a maximum number of times.
167157for i := 0 ; i < maxRetries; i++ {
168-
169- // Make the RPC call
170- res, err = client. MyRPC (ctx, &MyRequest{})
171-
172- // Check if the RPC call was successful
173- if err == nil {
174- // The RPC was successful, so break out of the loop
158+ // Make the RPC.
159+ res, err = client. MyRPC (context. TODO (), &MyRequest{})
160+
161+ // Check if the RPC was successful.
162+ if !retryableStatusCodes[status. Code (err)] {
163+ // The RPC was successful or errored in a non-retryable way;
164+ // do not retry.
175165 break
176166 }
177-
178- // The RPC failed, so wait for a backoff period before retrying
179- backoff := time.Duration (i) * time.Second
167+
168+ // The RPC is retryable; wait for a backoff period before retrying.
169+ backoff := time.Duration (i+ 1 ) * time.Second
180170 log.Printf (" Error calling MyRPC: %v ; retrying in %v " , err, backoff)
181171 time.Sleep (backoff)
182172}
183173
184- // Check if the RPC call was successful after all retries
174+ // Check if the RPC was successful after all retries.
185175if err != nil {
186176 // All retries failed, so handle the error appropriately
187177 log.Printf (" Error calling MyRPC: %v " , err)
188178 return nil , err
189179}
190180
191- // Use the response as appropriate
181+ // Use the response as appropriate.
192182log.Printf (" MyRPC response: %v " , res)
193183```
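
The example above waits a linearly growing interval with `time.Sleep`. A common variant, shown here only as a sketch and not as the approach this guide prescribes, is exponential backoff with random jitter and a wait that stops early when the context is cancelled. All names (`backoffCeiling`, `sleepCtx`, `retry`, `runDemo`) and the durations are hypothetical, and the RPC is simulated by a plain function so the block runs without a server.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errTransient stands in for a retryable RPC error such as codes.Unavailable.
var errTransient = errors.New("transient failure")

// backoffCeiling returns the upper bound of the wait before retrying after
// the given 0-based attempt: base doubled once per attempt, capped at max.
func backoffCeiling(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// sleepCtx waits for d or until ctx is done, whichever comes first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-t.C:
		return nil
	}
}

// retry calls op until it succeeds, fails in a non-retryable way, or has been
// attempted maxAttempts times. It does not wait after the final attempt.
func retry(ctx context.Context, maxAttempts int, base, max time.Duration,
	retryable func(error) bool, op func(context.Context) error) error {
	for attempt := 0; ; attempt++ {
		err := op(ctx)
		if err == nil || !retryable(err) || attempt >= maxAttempts-1 {
			return err
		}
		// Pick a random wait in [0, ceiling] so that many clients retrying
		// at once do not hit the server in lockstep.
		ceiling := backoffCeiling(attempt, base, max)
		wait := time.Duration(rand.Int63n(int64(ceiling) + 1))
		if werr := sleepCtx(ctx, wait); werr != nil {
			return werr
		}
	}
}

// runDemo simulates an operation that fails twice and then succeeds, and
// returns how many times it was called along with the final error.
func runDemo() (int, error) {
	calls := 0
	op := func(ctx context.Context) error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	retryable := func(err error) bool { return errors.Is(err, errTransient) }
	err := retry(context.Background(), 5, time.Millisecond, 10*time.Millisecond, retryable, op)
	return calls, err
}

func main() {
	calls, err := runDemo()
	fmt.Printf("calls=%d err=%v\n", calls, err)
}
```

In a real client, `retryable` would typically inspect `status.Code(err)` against a set of retryable codes, as the map in the example above does.
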
194-
195-
196- ## Conclusion
197-
198- The
199- [ ` FailOnNonTempDialError ` ] ( https://pkg.go.dev/google.golang.org/grpc#FailOnNonTempDialError ) ,
200- [ ` WithBlock ` ] ( https://pkg.go.dev/google.golang.org/grpc#WithBlock ) , and
201- [ ` WithReturnConnectionError ` ] ( https://pkg.go.dev/google.golang.org/grpc#WithReturnConnectionError )
202- options are designed to handle errors at dial time, but they can introduce race
203- conditions and result in unreliable and difficult-to-debug code. Instead of
204- relying on these options, we strongly encourage developers to rely on errors
205- from RPCs for error handling. By following best practices for error handling in
206- gRPC, developers can write more reliable and robust gRPC applications.