
[Feature][Connector-V2] Add Airtable source and sink#10469

Open
kuleat wants to merge 1 commit into apache:dev from kuleat:dev-connector-http-airtable

Conversation


kuleat commented Feb 9, 2026

Purpose of this pull request

#10443
Add Airtable source and sink connector.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add UT and e2e tests.

Check list

@DanielCarter-stack

Issue 1: Offset Initial Value Handling Unclear

Location: AirtableSourceParameter.java:110-114

if (pluginConfig.getOptional(AirtableSourceOptions.OFFSET).isPresent()) {
    body.put("offset", pluginConfig.get(AirtableSourceOptions.OFFSET));
} else {
    body.putIfAbsent("offset", null);
}

Related Context:

  • Parent class: HttpParameter.java:28-90
  • Caller: AirtableSource.java:43
  • Pagination logic: HttpSourceReader.java:394-423

Problem Description:
When the user has not configured offset, the code inserts {"offset": null} into the request body. This may not match the Airtable API's expectations: the first request should omit the offset field entirely, not include it with a null value.
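A minimal demonstration of the behavior described above: putIfAbsent with a null value still creates the key, so a typical JSON serializer would emit {"offset": null} rather than omitting the field (buildBody is a hypothetical helper mirroring the connector's fallback branch):

```java
import java.util.HashMap;
import java.util.Map;

public class PutIfAbsentNullDemo {
    // Mirrors the connector's fallback branch when no offset is configured.
    static Map<String, Object> buildBody() {
        Map<String, Object> body = new HashMap<>();
        body.putIfAbsent("offset", null);
        return body;
    }

    public static void main(String[] args) {
        Map<String, Object> body = buildBody();
        // The key is present, mapped to null, so the serialized body
        // contains the field instead of omitting it.
        System.out.println(body.containsKey("offset")); // true
        System.out.println(body.get("offset"));         // null
    }
}
```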

Potential Risks:

  • Risk 1: Airtable API may reject requests containing null offset (requires actual testing to verify)
  • Risk 2: If the user supplies an initial offset through the body configuration (resuming from a breakpoint), putIfAbsent preserves it, which is the correct behavior, but the code comments do not make this intent clear

Impact Scope:

  • Direct impact: AirtableSourceParameter
  • Indirect impact: All tasks using Airtable Source, especially first requests
  • Impact surface: Single Connector

Severity: MINOR

Improvement Suggestions:

// Suggested code
if (pluginConfig.getOptional(AirtableSourceOptions.OFFSET).isPresent()) {
    body.put("offset", pluginConfig.get(AirtableSourceOptions.OFFSET));
}
// else: leave the body untouched. The first request must not contain an
// "offset" field, and a user-provided offset already present in the body
// (breakpoint resume) is preserved as-is.

Rationale:

  • More explicit comments explaining the intent
  • Consider removing null values when serializing to JSON (using JsonSerializationSchema configuration)
  • Add unit tests to verify request body format for first requests

Issue 2: Concurrency Safety Lacks Documentation

Location:

  • AirtableSourceReader.java:40
  • AirtableSinkWriter.java:61
private long lastRequestTimeMillis = 0L;

Related Context:

  • Parent class: AbstractSingleSplitReader
  • Usage scenario: waitForRequestSlot() method

Problem Description:
The lastRequestTimeMillis field is neither volatile nor an AtomicLong. Although this is safe under the current architecture (single-threaded access), no documentation states that assumption. If SeaTunnel introduces concurrent reading in the future, this field will become a source of race conditions.

Potential Risks:

  • Risk 1: Future code refactoring may introduce concurrent access, leading to race conditions
  • Risk 2: Code reviewers may mistakenly think this is a bug

Impact Scope:

  • Direct impact: AirtableSourceReader, AirtableSinkWriter
  • Indirect impact: Future multi-threaded reading optimizations
  • Impact surface: Single Connector

Severity: MINOR

Improvement Suggestions:

// Suggested code
/**
 * Last request timestamp in milliseconds.
 * Thread safety: this field is only ever accessed by a single thread
 * (HttpSourceReader runs single-threaded), so volatile/atomic access is
 * intentionally not used. Revisit this if concurrent reading is introduced.
 */
private long lastRequestTimeMillis = 0L;

Rationale:

  • Add explicit JavaDoc explaining thread safety assumptions
  • Help future maintainers understand design intent
  • Avoid unnecessary "optimizations" (such as adding volatile)

Issue 3: Missing Input Upper Bound Validation

Location:

  • AirtableSourceReader.java:53-55
  • AirtableSinkWriter.java:78-80
this.requestIntervalMs = Math.max(0, requestIntervalMs);
this.rateLimitBackoffMs = Math.max(0, rateLimitBackoffMs);
this.rateLimitMaxRetries = Math.max(0, rateLimitMaxRetries);

Related Context:

  • Configuration item: AirtableConfig.java:66-85

Problem Description:
The code only validates the lower bound (>= 0), but does not set an upper bound. If the user configures an abnormally large value (such as request_interval_ms = 99999999), it may cause the task to block for a long time.

Potential Risks:

  • Risk 1: User misconfiguration causes tasks to nearly stop (request interval too long)
  • Risk 2: Malicious extremely large values cause denial of service (although unlikely, since configuration files are user-controlled)

Impact Scope:

  • Direct impact: All Airtable Source/Sink tasks
  • Indirect impact: SeaTunnel task scheduler (tasks running for long periods)
  • Impact surface: Single Connector

Severity: MINOR

Improvement Suggestions:

// Suggested code
private static final int MAX_REQUEST_INTERVAL_MS = 600000; // 10 minutes
private static final int MAX_BACKOFF_MS = 600000; // 10 minutes
private static final int MAX_RETRIES = 100;

this.requestIntervalMs = 
    Math.min(Math.max(0, requestIntervalMs), MAX_REQUEST_INTERVAL_MS);
this.rateLimitBackoffMs = 
    Math.min(Math.max(0, rateLimitBackoffMs), MAX_BACKOFF_MS);
this.rateLimitMaxRetries = 
    Math.min(Math.max(0, rateLimitMaxRetries), MAX_RETRIES);

Rationale:

  • Prevent abnormal behavior caused by misconfiguration
  • Upper bound value is large enough and will not affect normal usage
  • Can document these limitations
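The clamping pattern suggested above can be checked in isolation. A minimal sketch (the constant matches the suggested 10-minute cap; the class and method names are illustrative):

```java
public class ClampDemo {
    static final int MAX_REQUEST_INTERVAL_MS = 600_000; // 10 minutes

    // Clamps a configured interval into [0, MAX_REQUEST_INTERVAL_MS].
    static int clamp(int value) {
        return Math.min(Math.max(0, value), MAX_REQUEST_INTERVAL_MS);
    }

    public static void main(String[] args) {
        System.out.println(clamp(99_999_999)); // capped at 600000
        System.out.println(clamp(-500));       // raised to 0
        System.out.println(clamp(30_000));     // unchanged
    }
}
```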

Issue 4: E2E Tests Lack Pagination Scenarios

Location:

  • seatunnel-e2e/connector-http-e2e/src/test/resources/airtable_json_to_assert.conf
  • seatunnel-e2e/connector-http-e2e/src/test/resources/mockserver-config.json

Related Context:

  • E2E test class: HttpIT.java
  • MockServer configuration: Simulating Airtable API responses

Problem Description:
The current E2E tests only verify a simple read scenario (3 records, single page) and do not cover pagination. The Airtable API's offset-based pagination is a core feature and should be exercised by the E2E tests.
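The contract a pagination test should exercise is simple: the reader threads the offset token returned by each response into the next request until a response arrives without an offset. A minimal self-contained simulation (the fake endpoint, page shapes, and tokens are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class PaginationDemo {
    // Hypothetical page shape: records plus the offset token for the next
    // page; a null offset marks the last page, matching Airtable's contract.
    record Page(List<String> records, String nextOffset) {}

    // Fake Airtable-style listRecords endpoint: three pages of mock data.
    static Page listRecords(String offset) {
        if (offset == null) {
            return new Page(List.of("rec1", "rec2"), "itr/page2");
        }
        if (offset.equals("itr/page2")) {
            return new Page(List.of("rec3", "rec4"), "itr/page3");
        }
        return new Page(List.of("rec5"), null);
    }

    // Drains all pages by threading the offset through successive requests.
    static List<String> collectAll() {
        List<String> all = new ArrayList<>();
        String offset = null;
        do {
            Page page = listRecords(offset);
            all.addAll(page.records());
            offset = page.nextOffset();
        } while (offset != null);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(collectAll().size()); // 5
    }
}
```

An E2E test asserting the total record count across pages would catch regressions in exactly this loop.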

Potential Risks:

  • Risk 1: Pagination logic bugs may only be discovered in production environments
  • Risk 2: The combination of offset update logic (HttpSourceReader#updateRequestParam() and collect()) may have issues

Impact Scope:

  • Direct impact: Paginated read scenarios
  • Indirect impact: Large data volume tasks (exceeding page_size)
  • Impact surface: Single Connector

Severity: MAJOR

Improvement Suggestions:

// Suggest adding a pagination scenario to mockserver-config.json: two
// expectations, one matched on the offset token (serving the follow-up
// request) and a catch-all for the first request. Sketch only — the exact
// matcher syntax depends on the MockServer version in use.
[
  {
    "httpRequest": {
      "path": "/v0/appTEST123/TestTable/listRecords",
      "body": {
        "type": "JSON",
        "json": {"offset": "itrXXXXXXXXXXXXX/recXXXXXXXXXXXXX"},
        "matchType": "ONLY_MATCHING_FIELDS"
      }
    },
    "httpResponse": {
      "statusCode": 200,
      "body": {"records": [...]} // final page: no offset field
    }
  },
  {
    "httpRequest": {
      "path": "/v0/appTEST123/TestTable/listRecords"
    },
    "httpResponse": {
      "statusCode": 200,
      "body": {
        "records": [...], // first page of records
        "offset": "itrXXXXXXXXXXXXX/recXXXXXXXXXXXXX"
      }
    }
  }
]

Rationale:

  • Pagination is a core feature of Airtable API
  • E2E tests should cover real-world scenarios
  • Can combine MockServer's dynamic response feature to simulate multi-page data

Issue 5: Incomplete Changelog Content

Location: docs/en/connectors/changelog/connector-http-airtable.md

Related Context:

  • Changelog files for other Connectors (such as connector-http-notion.md)

Problem Description:
The changelog file is only 6 lines long and lacks key information such as the version, release date, and feature descriptions.

Potential Risks:

  • Risk 1: Users cannot understand the version of new features
  • Risk 2: Release Notes cannot be automatically generated

Impact Scope:

  • Direct impact: Documentation completeness
  • Indirect impact: User experience
  • Impact surface: Single Connector

Severity: MINOR

Improvement Suggestions:

# Airtable Connector Changelog

## 2.3.5 (Unreleased)

### Added
- Add Airtable source connector to read data from Airtable tables (#10443)
- Add Airtable sink connector to write data to Airtable tables
- Support offset-based pagination for large table reads
- Support configurable rate limiting with exponential backoff (429 handling)
- Support batch writes (up to 10 records per request)
- Support field projection and filtering via Airtable API options

### Changed
- Extract `HttpSourceReader#executeRequest()` to protected method to allow subclass override

### Known Limitations
- Does not support streaming mode (batch only)
- Does not support exactly-once semantics
- Batch size is limited to 10 records by Airtable API

Rationale:

  • Follow Apache SeaTunnel's Changelog format
  • Provide complete feature descriptions
  • Document known limitations

Issue 6: Missing Observability Metrics

Location:

  • AirtableSourceReader.java
  • AirtableSinkWriter.java

Related Context:

  • SeaTunnel Metrics API: org.apache.seatunnel.api.metrics.MetricReporter

Problem Description:
The code does not register custom metrics for key indicators such as the 429 retry count and backoff time, so users cannot observe Airtable API call behavior in monitoring systems such as Grafana.

Potential Risks:

  • Risk 1: Users cannot detect frequent 429 errors in time
  • Risk 2: Difficult to optimize request_interval_ms parameters (unsure if current rate limit is sufficient)

Impact Scope:

  • Direct impact: Production environment observability
  • Indirect impact: Troubleshooting and performance optimization
  • Impact surface: Single Connector

Severity: MINOR (because logging is already in place)

Improvement Suggestions:

// Suggest adding metrics to AirtableSourceReader
// (metric API names below are illustrative; adapt them to SeaTunnel's
// actual metrics interfaces)
private Counter retryCounter;
private Histogram backoffTimeHistogram;

@Override
public void open() {
    super.open();
    MetricContext context = this.context.getMetricContext();
    retryCounter = context.counter("airtableApiRetryCount");
    backoffTimeHistogram = context.histogram("airtableApiBackoffTime");
}

private HttpResponse executeRequest() throws Exception {
    int retryCount = 0;
    while (true) {
        waitForRequestSlot();
        HttpResponse response = doExecuteRequest();
        if (response.getCode() == STATUS_TOO_MANY_REQUESTS
                && retryCount < rateLimitMaxRetries) {
            retryCount += 1;
            retryCounter.increment(); // record retry count
            long backoffMillis = calculateBackoffMillis(retryCount);
            backoffTimeHistogram.update(backoffMillis); // record backoff time
            Thread.sleep(backoffMillis);
            continue; // retry after backing off
        }
        return response;
    }
}
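The snippet above calls calculateBackoffMillis(retryCount), which is not shown in the PR excerpt. A plausible sketch, assuming exponential backoff with a cap (the constants and the doubling policy are assumptions, not the PR's actual implementation):

```java
public class BackoffDemo {
    static final long BASE_BACKOFF_MS = 1_000L;   // assumed starting backoff
    static final long MAX_BACKOFF_MS = 600_000L;  // assumed 10-minute cap

    // Exponential backoff: base * 2^(retryCount - 1), capped at the maximum.
    // The shift amount is bounded to avoid overflow for large retry counts.
    static long calculateBackoffMillis(int retryCount) {
        long backoff = BASE_BACKOFF_MS << Math.min(retryCount - 1, 20);
        return Math.min(backoff, MAX_BACKOFF_MS);
    }

    public static void main(String[] args) {
        System.out.println(calculateBackoffMillis(1));  // 1000
        System.out.println(calculateBackoffMillis(3));  // 4000
        System.out.println(calculateBackoffMillis(12)); // capped at 600000
    }
}
```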

Rationale:

  • Provide better observability
  • Help users optimize configuration parameters
  • Align with SeaTunnel Connectors best practices

Issue 7: Missing Success Write Logs

Location: AirtableSinkWriter.java:123-159

Related Context:

  • AirtableSinkWriter#sendWithRateLimitRetry()

Problem Description:
AirtableSinkWriter only logs on failure (log.error), with no logs when writes succeed. This prevents users from confirming whether data was successfully written to Airtable.

Potential Risks:

  • Risk 1: Users cannot track write progress (can only rely on SeaTunnel framework's Metrics)
  • Risk 2: Key information missing when debugging issues

Impact Scope:

  • Direct impact: Log readability
  • Indirect impact: Troubleshooting
  • Impact surface: Single Connector

Severity: MINOR

Improvement Suggestions:

// Suggested code
private void sendWithRateLimitRetry(String body) throws IOException {
    int retryCount = 0;
    while (true) {
        waitForRequestSlot();
        HttpResponse response = httpClient.doPost(url, headers, body);
        if (HttpResponse.STATUS_OK == response.getCode()) {
            log.info("Successfully wrote {} records to Airtable", batchBuffer.size());
            return;
        }
        ... // existing 429/backoff and error handling
    }
}

Rationale:

  • Provide better user feedback
  • Help track write progress
  • Maintain consistency with AirtableSourceReader logging style
