Commit 554d5ff

Add support for retrying on a different cluster
1 parent 7f27481 commit 554d5ff

13 files changed

Lines changed: 979 additions & 21 deletions

presto-client/src/main/java/com/facebook/presto/client/PrestoHeaders.java

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ public final class PrestoHeaders
     public static final String PRESTO_SESSION_FUNCTION = "X-Presto-Session-Function";
     public static final String PRESTO_ADDED_SESSION_FUNCTION = "X-Presto-Added-Session-Functions";
     public static final String PRESTO_REMOVED_SESSION_FUNCTION = "X-Presto-Removed-Session-Function";
+    public static final String PRESTO_RETRY_QUERY = "X-Presto-Retry-Query";
 
     public static final String PRESTO_CURRENT_STATE = "X-Presto-Current-State";
     public static final String PRESTO_MAX_WAIT = "X-Presto-Max-Wait";

presto-docs/src/main/sphinx/admin/properties.rst

Lines changed: 45 additions & 1 deletion
@@ -1156,4 +1156,48 @@ The corresponding session property is :ref:`admin/properties-session:\`\`query_c
 
 Use to configure how long a query can be queued before it is terminated.
 
-The corresponding session property is :ref:`admin/properties-session:\`\`query_max_queued_time\`\``.
+The corresponding session property is :ref:`admin/properties-session:\`\`query_max_queued_time\`\``.
+
+Query Retry Properties
+----------------------
+
+``retry.enabled``
+^^^^^^^^^^^^^^^^^
+
+* **Type:** ``boolean``
+* **Default value:** ``true``
+
+Enable cross-cluster retry functionality. When enabled, queries that fail with
+specific error codes can be automatically retried on a backup cluster if a
+retry URL is provided.
+
+``retry.allowed-domains``
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* **Type:** ``string``
+* **Default value:** (empty, signifying current second-level domain allowed only)
+
+Comma-separated list of allowed domains for retry URLs. Supports wildcards
+like ``*.example.com``. For example: ``cluster1.example.com,*.backup.example.net``.
+When empty (default), only retry URLs from the same domain as the current server
+are allowed.
+
+``retry.require-https``
+^^^^^^^^^^^^^^^^^^^^^^^
+
+* **Type:** ``boolean``
+* **Default value:** ``false``
+
+Require HTTPS for retry URLs. When enabled, only HTTPS URLs will be accepted
+for cross-cluster retry operations.
+
+``retry.cross-cluster-error-codes``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* **Type:** ``string``
+* **Default value:** ``REMOTE_TASK_ERROR``
+
+Comma-separated list of error codes that allow cross-cluster retry. When a query
+fails with one of these error codes, it can be automatically retried on a backup
+cluster if a retry URL is provided. Available error codes include standard Presto
+error codes such as ``REMOTE_TASK_ERROR``, ``CLUSTER_OUT_OF_MEMORY``, etc.
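Reviewer sketch (not part of the commit): the ``retry.allowed-domains`` documentation above describes exact entries and ``*.example.com`` style wildcards, but the code that evaluates them is not in this excerpt. The following is a minimal illustration of how such an allow-list could be checked. The class ``RetryDomainSketch`` and the method ``isAllowed`` are hypothetical names, and the choice that a wildcard matches subdomains only (not the bare domain) is an assumption, not confirmed behavior.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not taken from the commit: one possible way to evaluate
// an allow-list containing exact hosts and "*.domain" wildcard entries.
final class RetryDomainSketch
{
    private RetryDomainSketch() {}

    // Returns true when host equals an entry or falls under a "*." wildcard entry.
    // Entries are expected in lower case, as RetryConfig lower-cases them on load.
    static boolean isAllowed(String host, Set<String> allowedDomains)
    {
        String normalized = host.toLowerCase(Locale.ROOT);
        for (String entry : allowedDomains) {
            if (entry.startsWith("*.")) {
                // "*.example.com" becomes the suffix ".example.com".
                // Assumption: the wildcard covers subdomains only, not "example.com" itself.
                String suffix = entry.substring(1);
                if (normalized.endsWith(suffix) && normalized.length() > suffix.length()) {
                    return true;
                }
            }
            else if (normalized.equals(entry)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        Set<String> allowed = Set.of("cluster1.example.com", "*.backup.example.net");
        System.out.println(isAllowed("cluster1.example.com", allowed));
        System.out.println(isAllowed("east.backup.example.net", allowed));
        System.out.println(isAllowed("evil.example.org", allowed));
    }
}
```

Matching on a leading-dot suffix rather than a plain ``endsWith`` on the domain keeps a host such as ``notbackup.example.net`` from slipping through a ``*.backup.example.net`` entry.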

presto-docs/src/main/sphinx/develop/client-protocol.rst

Lines changed: 69 additions & 0 deletions
@@ -122,6 +122,9 @@ Request Header Name Description
 ``X-Presto-Extra-Credential``          Provides extra credentials to the connector. The header is a name=value string that
                                        is saved in the session ``Identity`` object. The name and value are only
                                        meaningful to the connector.
+``X-Presto-Retry-Query``               Boolean flag indicating that this query is a placeholder for potential retry.
+                                       When set to ``true``, marks the query on the backup cluster as a retry placeholder
+                                       and prevents retry chains in cross-cluster retry scenarios.
 ====================================== =========================================================================================
 
 
@@ -184,3 +187,69 @@ Data Member Type Notes
 =================
 
 Class ``PrestoHeaders`` enumerates all the HTTP request and response headers allowed by the Presto client REST API.
+
+
+Cross-Cluster Query Retry
+=========================
+
+Presto supports automatic query retry on a backup cluster when a query fails on the primary cluster. This feature enables
+high availability by transparently redirecting failed queries to a backup cluster.
+
+The cross-cluster retry mechanism works as follows:
+
+Query Parameters
+----------------
+
+When a router or load balancer handles a query that should support cross-cluster retry, it includes the following
+query parameters when redirecting the client to the primary cluster:
+
+* ``retryUrl`` - The URL-encoded endpoint of the backup cluster where the query can be retried if it fails
+* ``retryExpirationInSeconds`` - The number of seconds until the retry URL expires (must be at least 1). This value
+  should be set based on the ``Cache-Control`` headers returned by Presto query endpoints. Presto uses ``Cache-Control``
+  headers to indicate how long a query will be retained in the server's memory. The retry expiration should not exceed
+  this cache duration to ensure the placeholder query is still available when the retry occurs.
+
+Both parameters must be provided together. If only one is provided, the request will be rejected with a 400 Bad Request error.
+
+Example request to primary cluster::
+
+    POST /v1/statement?retryUrl=https%3A%2F%2Fbackup.example.com%3A8080%2Fv1%2Fstatement&retryExpirationInSeconds=300
+
+Retry Header
+------------
+
+The ``X-Presto-Retry-Query`` header is used to indicate that a query is being created as a placeholder for potential
+retry. When set to ``true``, this header:
+
+* Indicates the query is a retry placeholder on the backup cluster
+* Prevents retry chains - a query marked with this header will not trigger another retry if it fails
+
+Retry Flow
+----------
+
+1. Router/load balancer POSTs the query to the backup cluster with ``X-Presto-Retry-Query: true`` header to create
+   a placeholder query that can be used as a retry destination
+2. Router redirects (HTTP 307) the client to the primary cluster with ``retryUrl`` and ``retryExpirationInSeconds``
+   query parameters
+3. Client follows the redirect and POSTs the query to the primary cluster
+4. Primary cluster executes the query normally
+5. If the query fails with a retriable error code (configured on the server), the Presto server modifies the
+   ``nextUri`` in the response to point to the retry URL of the backup cluster
+6. Client follows the ``nextUri`` to the backup cluster where the placeholder query executes the actual query
+7. If the retry query fails, it will not trigger another retry since it's marked with ``X-Presto-Retry-Query``
+
+Limitations
+-----------
+
+Cross-cluster retry has the following limitations:
+
+* **Query types**: Retry only works when no results have been sent back to the client. In practice, this feature
+  works well for:
+
+  - ``CREATE TABLE AS SELECT`` statements
+  - DDL operations (``CREATE``, ``ALTER``, ``DROP``, etc.)
+  - ``INSERT`` statements
+  - ``SELECT`` queries that fail before any results are produced
+
+  For ``SELECT`` queries that produce results, retry will only occur if the failure happens during planning or
+  before the first batch of results is generated.
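Reviewer sketch (not part of the commit): step 2 of the retry flow has the router redirect the client to the primary cluster with ``retryUrl`` and ``retryExpirationInSeconds`` attached. The snippet below shows one way a router could assemble that redirect target using only the JDK. The class ``RetryRedirectSketch``, the method ``primaryRedirect``, and the host ``primary.example.com`` are hypothetical; only the two parameter names, the "at least 1" rule, and the backup URL come from the documentation above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical router-side helper: builds the primary-cluster URL that carries
// both retry parameters, as the documented protocol requires them together.
final class RetryRedirectSketch
{
    private RetryRedirectSketch() {}

    static URI primaryRedirect(URI primaryStatementUri, URI backupStatementUri, long retryExpirationInSeconds)
    {
        // The documentation states the expiration must be at least 1 second.
        if (retryExpirationInSeconds < 1) {
            throw new IllegalArgumentException("retryExpirationInSeconds must be at least 1");
        }
        // retryUrl has to be URL-encoded because it is itself a full URL.
        String encodedRetryUrl = URLEncoder.encode(backupStatementUri.toString(), StandardCharsets.UTF_8);
        return URI.create(primaryStatementUri
                + "?retryUrl=" + encodedRetryUrl
                + "&retryExpirationInSeconds=" + retryExpirationInSeconds);
    }

    public static void main(String[] args)
    {
        URI redirect = primaryRedirect(
                URI.create("https://primary.example.com:8080/v1/statement"),
                URI.create("https://backup.example.com:8080/v1/statement"),
                300);
        // The router would send this value in the Location header of its HTTP 307 response.
        System.out.println(redirect);
    }
}
```

With the inputs shown in ``main``, the raw query string is the same as in the documented example request. Per the documentation, the expiration passed here should not exceed the retention period advertised by the backup cluster's ``Cache-Control`` header.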
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.facebook.presto.server;
+
+import com.facebook.airlift.configuration.Config;
+import com.facebook.airlift.configuration.ConfigDescription;
+import com.facebook.presto.common.ErrorCode;
+import com.facebook.presto.spi.StandardErrorCode;
+import com.google.common.base.Splitter;
+import com.google.common.collect.ImmutableSet;
+
+import javax.validation.constraints.NotNull;
+
+import java.util.Set;
+
+import static com.facebook.presto.spi.StandardErrorCode.REMOTE_TASK_ERROR;
+import static com.google.common.collect.ImmutableSet.toImmutableSet;
+
+public class RetryConfig
+{
+    private boolean retryEnabled = true;
+    private Set<String> allowedRetryDomains = ImmutableSet.of();
+    private boolean requireHttps;
+    private Set<Integer> crossClusterRetryErrorCodes = ImmutableSet.of(
+            REMOTE_TASK_ERROR.toErrorCode().getCode());
+
+    public boolean isRetryEnabled()
+    {
+        return retryEnabled;
+    }
+
+    @Config("retry.enabled")
+    @ConfigDescription("Enable cross-cluster retry functionality")
+    public RetryConfig setRetryEnabled(boolean retryEnabled)
+    {
+        this.retryEnabled = retryEnabled;
+        return this;
+    }
+
+    @NotNull
+    public Set<String> getAllowedRetryDomains()
+    {
+        return allowedRetryDomains;
+    }
+
+    @Config("retry.allowed-domains")
+    @ConfigDescription("Comma-separated list of allowed domains for retry URLs " +
+            "(supports wildcards like *.example.com)")
+    public RetryConfig setAllowedRetryDomains(String domains)
+    {
+        if (domains == null || domains.trim().isEmpty()) {
+            this.allowedRetryDomains = ImmutableSet.of();
+        }
+        else {
+            this.allowedRetryDomains = Splitter.on(',')
+                    .trimResults()
+                    .omitEmptyStrings()
+                    .splitToList(domains)
+                    .stream()
+                    .map(String::toLowerCase)
+                    .collect(toImmutableSet());
+        }
+        return this;
+    }
+
+    public boolean isRequireHttps()
+    {
+        return requireHttps;
+    }
+
+    @Config("retry.require-https")
+    @ConfigDescription("Require HTTPS for retry URLs")
+    public RetryConfig setRequireHttps(boolean requireHttps)
+    {
+        this.requireHttps = requireHttps;
+        return this;
+    }
+
+    @NotNull
+    public Set<Integer> getCrossClusterRetryErrorCodes()
+    {
+        return crossClusterRetryErrorCodes;
+    }
+
+    @Config("retry.cross-cluster-error-codes")
+    @ConfigDescription("Comma-separated list of error codes that allow cross-cluster retry")
+    public RetryConfig setCrossClusterRetryErrorCodes(String errorCodes)
+    {
+        if (errorCodes == null || errorCodes.trim().isEmpty()) {
+            // Keep the default error codes
+            return this;
+        }
+        else {
+            this.crossClusterRetryErrorCodes = Splitter.on(',')
+                    .trimResults()
+                    .omitEmptyStrings()
+                    .splitToList(errorCodes)
+                    .stream()
+                    .map(StandardErrorCode::valueOf)
+                    .map(StandardErrorCode::toErrorCode)
+                    .map(ErrorCode::getCode)
+                    .collect(toImmutableSet());
+        }
+        return this;
+    }
+}
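Reviewer sketch (not part of the commit): the parsing in ``setCrossClusterRetryErrorCodes`` has a few edge cases that are easy to miss when reading the diff. The standalone snippet below mirrors that logic with JDK-only code so the behavior can be tried without Presto on the classpath. The enum ``SketchErrorCode`` is a stand-in for ``StandardErrorCode`` and its numeric codes are placeholders, not Presto's real values; ``String.split`` plus a filter stands in for Guava's ``Splitter``.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, self-contained imitation of the error-code parsing in RetryConfig.
final class RetryErrorCodeParsingSketch
{
    // Stand-in for StandardErrorCode. The numeric codes are placeholders only.
    enum SketchErrorCode
    {
        REMOTE_TASK_ERROR(1),
        CLUSTER_OUT_OF_MEMORY(2);

        private final int code;

        SketchErrorCode(int code)
        {
            this.code = code;
        }

        int getCode()
        {
            return code;
        }
    }

    static final Set<Integer> DEFAULT_CODES = Set.of(SketchErrorCode.REMOTE_TASK_ERROR.getCode());

    private RetryErrorCodeParsingSketch() {}

    static Set<Integer> parse(String errorCodes)
    {
        // A null or blank value leaves the defaults in place.
        if (errorCodes == null || errorCodes.trim().isEmpty()) {
            return DEFAULT_CODES;
        }
        return Arrays.stream(errorCodes.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                // Enum.valueOf is case-sensitive and throws IllegalArgumentException on unknown names.
                .map(SketchErrorCode::valueOf)
                .map(SketchErrorCode::getCode)
                .collect(Collectors.toUnmodifiableSet());
    }

    public static void main(String[] args)
    {
        System.out.println(parse(""));
        System.out.println(parse("REMOTE_TASK_ERROR, CLUSTER_OUT_OF_MEMORY"));
        System.out.println(parse(","));
    }
}
```

Two things stand out when running this. Unlike ``retry.allowed-domains``, which is lower-cased on load, error code names go straight to ``valueOf``, so a value such as ``remote_task_error`` would be rejected. A value that contains only separators is not blank, so it yields an empty set rather than the default, which would leave no error code eligible for retry.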

presto-main-base/src/main/java/com/facebook/presto/server/protocol/ExecutingQueryResponseProvider.java

Lines changed: 9 additions & 1 deletion
@@ -22,7 +22,9 @@
 import javax.ws.rs.core.Response;
 import javax.ws.rs.core.UriInfo;
 
+import java.net.URI;
 import java.util.Optional;
+import java.util.OptionalLong;
 
 public interface ExecutingQueryResponseProvider
 {
@@ -47,6 +49,9 @@ public interface ExecutingQueryResponseProvider
      * @param compressionEnabled enable compression
      * @param nestedDataSerializationEnabled enable nested data serialization
      * @param binaryResults generate results in binary format, rather than JSON
+     * @param retryUrl optional retry URL for cross-cluster retry
+     * @param retryExpirationEpochTime optional retry expiration time
+     * @param isRetryQuery true if this query is already a retry query
      * @return the ExecutingStatement's Response, if available
      */
     Optional<ListenableFuture<Response>> waitForExecutingResponse(
@@ -61,5 +66,8 @@ Optional<ListenableFuture<Response>> waitForExecutingResponse(
             boolean compressionEnabled,
             boolean nestedDataSerializationEnabled,
             boolean binaryResults,
-            long durationUntilExpirationMs);
+            long durationUntilExpirationMs,
+            Optional<URI> retryUrl,
+            OptionalLong retryExpirationEpochTime,
+            boolean isRetryQuery);
 }
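Reviewer sketch (not part of the commit): the three new parameters on ``waitForExecutingResponse`` carry everything the server needs to decide whether a failed query's ``nextUri`` should point at the backup cluster. The implementation is not in this excerpt, so the snippet below is only a guess at the shape of that decision, based on the retry flow in the documentation. The class ``RetryDecisionSketch``, the method ``retryTarget``, and the extra parameters ``failedErrorCode``, ``retriableErrorCodes`` and ``nowEpochTime`` are hypothetical. The unit of ``retryExpirationEpochTime`` is not stated in the diff; the sketch only assumes it uses the same unit as ``nowEpochTime``.

```java
import java.net.URI;
import java.util.Optional;
import java.util.OptionalLong;
import java.util.Set;

// Hypothetical decision helper: returns the backup URL to use as nextUri,
// or empty when the failure should be reported to the client as usual.
final class RetryDecisionSketch
{
    private RetryDecisionSketch() {}

    static Optional<URI> retryTarget(
            Optional<URI> retryUrl,
            OptionalLong retryExpirationEpochTime,
            boolean isRetryQuery,
            int failedErrorCode,
            Set<Integer> retriableErrorCodes,
            long nowEpochTime)
    {
        // A query that is already a retry never retries again (no retry chains).
        if (isRetryQuery) {
            return Optional.empty();
        }
        // Both the URL and its expiration must have been supplied together.
        if (!retryUrl.isPresent() || !retryExpirationEpochTime.isPresent()) {
            return Optional.empty();
        }
        // Assumption: once the expiration has passed, the placeholder may be gone.
        if (nowEpochTime >= retryExpirationEpochTime.getAsLong()) {
            return Optional.empty();
        }
        // Only configured error codes qualify for cross-cluster retry.
        if (!retriableErrorCodes.contains(failedErrorCode)) {
            return Optional.empty();
        }
        return retryUrl;
    }

    public static void main(String[] args)
    {
        Optional<URI> backup = Optional.of(URI.create("https://backup.example.com:8080/v1/statement"));
        // Error code 1 and the epoch values are arbitrary illustration values.
        System.out.println(retryTarget(backup, OptionalLong.of(1_000), false, 1, Set.of(1), 500));
        System.out.println(retryTarget(backup, OptionalLong.of(1_000), true, 1, Set.of(1), 500));
    }
}
```

Keeping the checks as independent early returns makes each documented rule (no retry chains, paired parameters, expiration, configured error codes) visible as its own condition.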
