@@ -42,6 +42,7 @@ public final class PrestoHeaders
public static final String PRESTO_SESSION_FUNCTION = "X-Presto-Session-Function";
public static final String PRESTO_ADDED_SESSION_FUNCTION = "X-Presto-Added-Session-Functions";
public static final String PRESTO_REMOVED_SESSION_FUNCTION = "X-Presto-Removed-Session-Function";
public static final String PRESTO_RETRY_QUERY = "X-Presto-Retry-Query";

public static final String PRESTO_CURRENT_STATE = "X-Presto-Current-State";
public static final String PRESTO_MAX_WAIT = "X-Presto-Max-Wait";
46 changes: 45 additions & 1 deletion presto-docs/src/main/sphinx/admin/properties.rst
@@ -1156,4 +1156,48 @@ The corresponding session property is :ref:`admin/properties-session:\`\`query_c

Use to configure how long a query can be queued before it is terminated.

The corresponding session property is :ref:`admin/properties-session:\`\`query_max_queued_time\`\``.

Query Retry Properties
----------------------

``retry.enabled``
^^^^^^^^^^^^^^^^^

* **Type:** ``boolean``
* **Default value:** ``true``

Enable cross-cluster retry functionality. When enabled, queries that fail with
specific error codes can be automatically retried on a backup cluster if a
retry URL is provided.

``retry.allowed-domains``
^^^^^^^^^^^^^^^^^^^^^^^^^

* **Type:** ``string``
* **Default value:** (empty; only the current server's second-level domain is allowed)

Comma-separated list of allowed domains for retry URLs. Supports wildcards
like ``*.example.com``. For example: ``cluster1.example.com,*.backup.example.net``.
When empty (the default), only retry URLs in the same second-level domain as the
current server (for example, ``example.com``) are allowed.

``retry.require-https``
^^^^^^^^^^^^^^^^^^^^^^^

* **Type:** ``boolean``
* **Default value:** ``false``

Require HTTPS for retry URLs. When enabled, only HTTPS URLs will be accepted
for cross-cluster retry operations.
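The following is a minimal sketch of how ``retry.allowed-domains`` and ``retry.require-https`` could be applied together to a candidate retry URL. It is not the actual ``RetryUrlValidator``; the class and method names are hypothetical. It assumes that ``*.example.com`` matches any subdomain but not ``example.com`` itself, that "same second-level domain" means the last two host labels match, and that only ``http`` and ``https`` are accepted when HTTPS is not required.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not Presto's RetryUrlValidator.
final class RetryUrlCheckSketch
{
    private RetryUrlCheckSketch() {}

    static boolean isAllowed(URI retryUrl, String serverHost, Set<String> allowedDomains, boolean requireHttps)
    {
        // retry.require-https: only https is accepted when enabled (assumption: http/https otherwise)
        String scheme = retryUrl.getScheme() == null ? "" : retryUrl.getScheme().toLowerCase(Locale.ENGLISH);
        boolean schemeOk = requireHttps ? scheme.equals("https") : scheme.equals("http") || scheme.equals("https");
        if (!schemeOk) {
            return false;
        }
        if (retryUrl.getHost() == null) {
            return false;
        }
        String host = retryUrl.getHost().toLowerCase(Locale.ENGLISH);

        // Empty retry.allowed-domains: fall back to the server's own second-level domain
        if (allowedDomains.isEmpty()) {
            return secondLevelDomain(host).equals(secondLevelDomain(serverHost.toLowerCase(Locale.ENGLISH)));
        }
        for (String pattern : allowedDomains) {
            if (matches(host, pattern)) {
                return true;
            }
        }
        return false;
    }

    // "*.example.com" matches "a.example.com" but not "example.com"; other patterns match exactly
    static boolean matches(String host, String pattern)
    {
        if (pattern.startsWith("*.")) {
            String suffix = pattern.substring(1); // ".example.com"
            return host.endsWith(suffix) && host.length() > suffix.length();
        }
        return host.equals(pattern);
    }

    // Assumption: the second-level domain is the last two dot-separated labels
    static String secondLevelDomain(String host)
    {
        String[] labels = host.split("\\.");
        if (labels.length < 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }
}
```

Lower-casing both sides mirrors ``RetryConfig``, which lower-cases the configured domains when it parses them.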

``retry.cross-cluster-error-codes``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* **Type:** ``string``
* **Default value:** ``REMOTE_TASK_ERROR``

Comma-separated list of error code names that allow cross-cluster retry. When a query
fails with one of these errors, it can be automatically retried on a backup
cluster if a retry URL is provided. Each name must be a ``StandardErrorCode``
constant, such as ``REMOTE_TASK_ERROR`` or ``CLUSTER_OUT_OF_MEMORY``. An empty
value keeps the default.
69 changes: 69 additions & 0 deletions presto-docs/src/main/sphinx/develop/client-protocol.rst
@@ -122,6 +122,9 @@ Request Header Name Description
``X-Presto-Extra-Credential`` Provides extra credentials to the connector. The header is a name=value string that
is saved in the session ``Identity`` object. The name and value are only
meaningful to the connector.
``X-Presto-Retry-Query`` Boolean flag indicating that this query is a placeholder for potential retry.
When set to ``true``, marks the query on the backup cluster as a retry placeholder
and prevents retry chains in cross-cluster retry scenarios.
====================================== =========================================================================================


@@ -184,3 +187,69 @@ Data Member Type Notes
=================

Class ``PrestoHeaders`` enumerates all the HTTP request and response headers allowed by the Presto client REST API.


Cross-Cluster Query Retry
=========================

Presto supports automatic query retry on a backup cluster when a query fails on the primary cluster. This feature enables
high availability by transparently redirecting failed queries to a backup cluster.

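The cross-cluster retry mechanism is built from two query parameters, one request header, and a fixed request flow,
each described in the following sections.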

Query Parameters
----------------

When a router or load balancer handles a query that should support cross-cluster retry, it includes the following
query parameters when redirecting the client to the primary cluster:

* ``retryUrl`` - The URL-encoded endpoint of the backup cluster where the query can be retried if it fails
* ``retryExpirationInSeconds`` - The number of seconds until the retry URL expires (must be at least 1). This value
should be set based on the ``Cache-Control`` headers returned by Presto query endpoints. Presto uses ``Cache-Control``
headers to indicate how long a query will be retained in the server's memory. The retry expiration should not exceed
this cache duration to ensure the placeholder query is still available when the retry occurs.

Both parameters must be provided together. If only one is provided, the request will be rejected with a 400 Bad Request error.
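A minimal sketch of the parameter check described above: the two parameters must appear together, and the expiration must be at least 1. The class and method names are hypothetical; a non-empty result would be mapped to a 400 Bad Request response by the caller.

```java
import java.util.Optional;

// Hypothetical sketch of validating retryUrl / retryExpirationInSeconds.
final class RetryParamsSketch
{
    private RetryParamsSketch() {}

    // Returns an error message for an invalid combination, or empty if the request is acceptable.
    // Null arguments mean the query parameter was absent.
    static Optional<String> validate(String retryUrl, Integer retryExpirationInSeconds)
    {
        boolean hasUrl = retryUrl != null;
        boolean hasExpiration = retryExpirationInSeconds != null;
        if (hasUrl != hasExpiration) {
            return Optional.of("retryUrl and retryExpirationInSeconds must be provided together");
        }
        if (hasExpiration && retryExpirationInSeconds < 1) {
            return Optional.of("retryExpirationInSeconds must be at least 1");
        }
        // Either both parameters are absent (no retry) or both are present and valid
        return Optional.empty();
    }
}
```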

Example request to the primary cluster::

POST /v1/statement?retryUrl=https%3A%2F%2Fbackup.example.com%3A8080%2Fv1%2Fstatement&retryExpirationInSeconds=300
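A router could build the redirect target shown above with the JDK's ``URLEncoder``. The sketch below is illustrative only: the class, method, and the ``primary.example.com`` host are hypothetical, and the backup URL is taken from the example request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical router-side helper that builds the primary cluster's statement URI.
final class RetryRedirectSketch
{
    private RetryRedirectSketch() {}

    static String primaryStatementUri(String primaryBaseUri, String backupRetryUrl, int retryExpirationInSeconds)
    {
        if (retryExpirationInSeconds < 1) {
            throw new IllegalArgumentException("retryExpirationInSeconds must be at least 1");
        }
        // URLEncoder encodes ':' as %3A and '/' as %2F, matching the example request
        return primaryBaseUri + "/v1/statement?retryUrl="
                + URLEncoder.encode(backupRetryUrl, StandardCharsets.UTF_8)
                + "&retryExpirationInSeconds=" + retryExpirationInSeconds;
    }
}
```

The resulting string would be sent as the ``Location`` of the 307 redirect, so the client repeats the ``POST`` against the primary cluster.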

Retry Header
------------

The ``X-Presto-Retry-Query`` header marks a query as a placeholder for a potential retry. When set to ``true``, this
header:

* Indicates the query is a retry placeholder on the backup cluster
* Prevents retry chains: a query marked with this header does not trigger another retry if it fails

Retry Flow
----------

1. Router/load balancer POSTs the query to the backup cluster with ``X-Presto-Retry-Query: true`` header to create
a placeholder query that can be used as a retry destination
2. Router redirects (HTTP 307) the client to the primary cluster with ``retryUrl`` and ``retryExpirationInSeconds``
query parameters
3. Client follows the redirect and POSTs the query to the primary cluster
4. Primary cluster executes the query normally
5. If the query fails with an error code listed in ``retry.cross-cluster-error-codes``, the Presto server rewrites the
``nextUri`` in the response to point to the retry URL of the backup cluster
6. Client follows the ``nextUri`` to the backup cluster where the placeholder query executes the actual query
7. If the retry query fails, it does not trigger another retry, because it was created with ``X-Presto-Retry-Query: true``
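The decision in step 5 combines several conditions from this page: retry is enabled, the query is not itself a placeholder, a retry URL was supplied and has not expired, no results have reached the client (see Limitations), and the error code is configured as retriable. The sketch below is a hypothetical illustration of that check. The class, method, and parameter names are not Presto's, and computing the expiration as an absolute ``Instant`` is an assumption.

```java
import java.time.Instant;
import java.util.Set;

// Hypothetical sketch of deciding whether to point nextUri at the retry URL.
final class RetryDecisionSketch
{
    private RetryDecisionSketch() {}

    static boolean shouldRedirectToRetry(
            boolean retryEnabled,          // retry.enabled
            Set<Integer> retryErrorCodes,  // parsed retry.cross-cluster-error-codes
            int failureErrorCode,          // numeric code of the failure
            boolean isRetryQuery,          // query was created with X-Presto-Retry-Query: true
            String retryUrl,               // retryUrl query parameter, or null
            Instant retryExpiration,       // assumed: request time + retryExpirationInSeconds
            Instant now,
            boolean resultsSentToClient)
    {
        return retryEnabled
                && !isRetryQuery                      // no retry chains
                && retryUrl != null
                && retryExpiration != null
                && now.isBefore(retryExpiration)      // retry URL has not expired
                && !resultsSentToClient               // retry only before any results are returned
                && retryErrorCodes.contains(failureErrorCode);
    }
}
```

The numeric codes used in any test of this sketch are arbitrary stand-ins, not real ``StandardErrorCode`` values.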

Limitations
-----------

Cross-cluster retry has the following limitations:

* **Query types**: Retry only works when no results have been sent back to the client. In practice, this feature
works well for:

- ``CREATE TABLE AS SELECT`` statements
- DDL operations (``CREATE``, ``ALTER``, ``DROP``, etc.)
- ``INSERT`` statements
- ``SELECT`` queries that fail before any results are produced

For ``SELECT`` queries that produce results, retry will only occur if the failure happens during planning or
before the first batch of results is generated.
@@ -0,0 +1,116 @@
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.facebook.presto.server;

import com.facebook.airlift.configuration.Config;
import com.facebook.airlift.configuration.ConfigDescription;
import com.facebook.presto.common.ErrorCode;
import com.facebook.presto.spi.StandardErrorCode;
import com.google.common.base.Splitter;
import com.google.common.collect.ImmutableSet;
import jakarta.validation.constraints.NotNull;

import java.util.Set;

import static com.facebook.presto.spi.StandardErrorCode.REMOTE_TASK_ERROR;
import static com.google.common.collect.ImmutableSet.toImmutableSet;

public class RetryConfig
{
private boolean retryEnabled = true;
private Set<String> allowedRetryDomains = ImmutableSet.of();
private boolean requireHttps;
private Set<Integer> crossClusterRetryErrorCodes = ImmutableSet.of(
REMOTE_TASK_ERROR.toErrorCode().getCode());

public boolean isRetryEnabled()
{
return retryEnabled;
}

@Config("retry.enabled")
@ConfigDescription("Enable cross-cluster retry functionality")
public RetryConfig setRetryEnabled(boolean retryEnabled)
{
this.retryEnabled = retryEnabled;
return this;
}

@NotNull
public Set<String> getAllowedRetryDomains()
{
return allowedRetryDomains;
}

@Config("retry.allowed-domains")
@ConfigDescription("Comma-separated list of allowed domains for retry URLs " +
"(supports wildcards like *.example.com)")
public RetryConfig setAllowedRetryDomains(String domains)
{
if (domains == null || domains.trim().isEmpty()) {
this.allowedRetryDomains = ImmutableSet.of();
}
else {
this.allowedRetryDomains = Splitter.on(',')
.trimResults()
.omitEmptyStrings()
.splitToList(domains)
.stream()
.map(String::toLowerCase)
.collect(toImmutableSet());
}
return this;
}

public boolean isRequireHttps()
{
return requireHttps;
}

@Config("retry.require-https")
@ConfigDescription("Require HTTPS for retry URLs")
public RetryConfig setRequireHttps(boolean requireHttps)
{
this.requireHttps = requireHttps;
return this;
}

@NotNull
public Set<Integer> getCrossClusterRetryErrorCodes()
{
return crossClusterRetryErrorCodes;
}

@Config("retry.cross-cluster-error-codes")
@ConfigDescription("Comma-separated list of error codes that allow cross-cluster retry")
public RetryConfig setCrossClusterRetryErrorCodes(String errorCodes)
{
if (errorCodes == null || errorCodes.trim().isEmpty()) {
// Keep the default error codes
return this;
}
else {
this.crossClusterRetryErrorCodes = Splitter.on(',')
.trimResults()
.omitEmptyStrings()
.splitToList(errorCodes)
.stream()
.map(StandardErrorCode::valueOf)
.map(StandardErrorCode::toErrorCode)
.map(ErrorCode::getCode)
.collect(toImmutableSet());
}
return this;
}
}
@@ -0,0 +1,55 @@
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.facebook.presto.server;

import com.facebook.airlift.configuration.testing.ConfigAssertions;
import com.google.common.collect.ImmutableMap;
import org.testng.annotations.Test;

import java.util.Map;

import static com.facebook.airlift.configuration.testing.ConfigAssertions.assertFullMapping;
import static com.facebook.airlift.configuration.testing.ConfigAssertions.assertRecordedDefaults;

public class TestRetryConfig
{
@Test
public void testDefaults()
{
assertRecordedDefaults(ConfigAssertions.recordDefaults(RetryConfig.class)
.setRetryEnabled(true)
.setRequireHttps(false)
.setAllowedRetryDomains(null)
.setCrossClusterRetryErrorCodes("REMOTE_TASK_ERROR"));
}

@Test
public void testExplicitPropertyMappings()
{
Map<String, String> properties = new ImmutableMap.Builder<String, String>()
.put("retry.enabled", "false")
.put("retry.allowed-domains", "*.foo.bar,*.baz.qux")
.put("retry.require-https", "true")
.put("retry.cross-cluster-error-codes", "QUERY_QUEUE_FULL")
.build();

RetryConfig expected = new RetryConfig()
.setRetryEnabled(false)
.setRequireHttps(true)
.setAllowedRetryDomains("*.foo.bar,*.baz.qux")
.setCrossClusterRetryErrorCodes("QUERY_QUEUE_FULL");

assertFullMapping(properties, expected);
}
}
@@ -210,6 +210,10 @@ protected void setup(Binder binder)
binder.bind(QueryBlockingRateLimiter.class).in(Scopes.SINGLETON);
newExporter(binder).export(QueryBlockingRateLimiter.class).withGeneratedName();

// retry configuration
configBinder(binder).bindConfig(RetryConfig.class);
binder.bind(RetryUrlValidator.class).in(Scopes.SINGLETON);

binder.bind(LocalQueryProvider.class).in(Scopes.SINGLETON);
binder.bind(ExecutingQueryResponseProvider.class).to(LocalExecutingQueryResponseProvider.class).in(Scopes.SINGLETON);
