Skip to content

Commit eb63f76

Browse files
committed
Merge remote-tracking branch 'upstream/dev' into feature_udf_lifecycle
2 parents 689c392 + c5f45fd commit eb63f76

57 files changed

Lines changed: 3864 additions & 125 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

config/plugin_config

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,7 @@ connector-http-myhours
5959
connector-http-notion
6060
connector-http-onesignal
6161
connector-http-wechat
62+
connector-http-airtable
6263
connector-hudi
6364
connector-iceberg
6465
connector-influxdb

docs/en/architecture/design-philosophy.md

Lines changed: 6 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -445,46 +445,30 @@ Many sinks only need Writer + Committer; AggregatedCommitter is for complex case
445445

446446
However, for Flink translation, SeaTunnel checkpoints align with Flink checkpoints to avoid duplication.
447447

448-
## 6. Future Directions
448+
## 6. Lessons Learned
449449

450-
### 6.1 Planned Enhancements
451-
452-
- **Dynamic Scaling**: Add/remove workers during job execution
453-
- **Adaptive Batch Size**: Auto-tune batch sizes based on throughput
454-
- **Query Pushdown**: Push filters/projections to sources
455-
- **Vectorized Execution**: Process batches of rows (columnar)
456-
- **Speculative Execution**: Mitigate stragglers
457-
458-
### 6.2 Research Directions
459-
460-
- **Machine Learning Integration**: ML-based optimization (split sizing, parallelism)
461-
- **Unified Batch and Streaming**: True unified processing model
462-
- **Global Query Optimization**: Cross-pipeline optimization
463-
464-
## 7. Lessons Learned
465-
466-
### 7.1 What Worked Well
450+
### 6.1 What Worked Well
467451

468452
1. **Engine Independence**: Validated by successful Zeta engine addition without API changes
469453
2. **Split-based Parallelism**: Scales well to 1000+ parallel tasks
470454
3. **Explicit Schema**: Caught many bugs early, enabled schema evolution
471455
4. **Two-Phase Commit**: Reliable exactly-once semantics
472456

473-
### 7.2 What Could Be Better
457+
### 6.2 What Could Be Better
474458

475459
1. **API Complexity**: Enumerator/Committer adds learning curve for simple connectors
476460
2. **Class Loader Issues**: Occasional conflicts with shaded dependencies
477461
3. **Checkpoint Latency**: Large state causes checkpoint delays
478462
4. **Documentation Gaps**: Architecture docs lagged behind code
479463

480-
### 7.3 If Starting Over
464+
### 6.3 If Starting Over
481465

482466
1. **Simplify API**: Provide higher-level abstractions for simple sources/sinks
483467
2. **Async I/O Support**: First-class async API for non-blocking connectors
484468
3. **Built-in Metrics**: Standardized metrics collection in API
485469
4. **Schema Registry Integration**: Tighter integration with external schema registries
486470

487-
## 8. Conclusion
471+
## 7. Conclusion
488472

489473
SeaTunnel's architecture reflects careful trade-offs between competing concerns:
490474
- Engine independence vs engine-specific optimization
@@ -494,7 +478,7 @@ SeaTunnel's architecture reflects careful trade-offs between competing concerns:
494478

495479
The V2 redesign addressed major V1 limitations while establishing principles for long-term evolution. Understanding these design philosophies helps contributors make consistent decisions and users understand SeaTunnel's strengths and appropriate use cases.
496480

497-
## 9. References
481+
## 8. References
498482

499483
- [Architecture Overview](overview.md)
500484
- [Source Architecture](api-design/source-architecture.md)
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
<details><summary> Change Log </summary>
2+
3+
| Change | Commit | Version |
4+
| --- | --- | --- |
5+
6+
</details>
Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
1+
import ChangeLog from '../changelog/connector-http-airtable.md';
2+
3+
# Airtable
4+
5+
> Airtable sink connector
6+
7+
## Description
8+
9+
Used to write data to Airtable.
10+
11+
## Key Features
12+
13+
- [ ] [exactly-once](../../introduction/concepts/connector-v2-features.md)
14+
- [ ] [cdc](../../introduction/concepts/connector-v2-features.md)
15+
- [ ] [support multiple table write](../../introduction/concepts/connector-v2-features.md)
16+
17+
## Options
18+
19+
| name | type | required | default value |
20+
|-----------------------------|---------|----------|---------------|
21+
| token | String | Yes | - |
22+
| base_id | String | Yes | - |
23+
| table | String | Yes | - |
24+
| api_base_url | String | No | https://api.airtable.com |
25+
| typecast | boolean | No | false |
26+
| batch_size | int | No | 10 |
27+
| request_interval_ms | int | No | 220 |
28+
| rate_limit_backoff_ms | int | No | 30000 |
29+
| rate_limit_max_retries | int | No | 3 |
30+
| common-options | | No | - |
31+
32+
### token [String]
33+
34+
Airtable personal access token. You can create one at https://airtable.com/create/tokens.
35+
36+
### base_id [String]
37+
38+
The ID of the Airtable base (starts with `app`).
39+
40+
### table [String]
41+
42+
The table name or table ID to write to.
43+
44+
### api_base_url [String]
45+
46+
Airtable API base URL. Default is `https://api.airtable.com`.
47+
48+
### typecast [boolean]
49+
50+
If true, Airtable will automatically convert values to match the field type. Default false.
51+
52+
### batch_size [int]
53+
54+
Number of records per API request. Maximum 10 per Airtable API limit. Default 10.
55+
56+
### request_interval_ms [int]
57+
58+
Minimum interval in milliseconds between API requests. Default 220ms.
59+
60+
### rate_limit_backoff_ms [int]
61+
62+
Base backoff time in milliseconds when receiving a 429 (rate limit) response. Default 30000ms.
63+
64+
### rate_limit_max_retries [int]
65+
66+
Maximum number of retries after receiving a 429 response. Default 3.
67+
68+
### common options
69+
70+
Sink plugin common parameters, please refer to [Sink Common Options](../common-options/sink-common-options.md) for details.
71+
72+
## Example
73+
74+
```hocon
75+
sink {
76+
Airtable {
77+
token = "patXXXXXXXX.XXXXXXXX"
78+
base_id = "appXXXXXXXX"
79+
table = "Shipments"
80+
typecast = true
81+
batch_size = 10
82+
}
83+
}
84+
```
85+
86+
## Changelog
87+
88+
<ChangeLog />
Lines changed: 182 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,182 @@
1+
import ChangeLog from '../changelog/connector-http-airtable.md';
2+
3+
# Airtable
4+
5+
> Airtable source connector
6+
7+
## Description
8+
9+
Used to read data from Airtable.
10+
11+
## Key features
12+
13+
- [x] [batch](../../introduction/concepts/connector-v2-features.md)
14+
- [ ] [stream](../../introduction/concepts/connector-v2-features.md)
15+
- [ ] [exactly-once](../../introduction/concepts/connector-v2-features.md)
16+
- [x] [column projection](../../introduction/concepts/connector-v2-features.md)
17+
- [ ] [parallelism](../../introduction/concepts/connector-v2-features.md)
18+
- [ ] [support user-defined split](../../introduction/concepts/connector-v2-features.md)
19+
20+
## Options
21+
22+
| name | type | required | default value |
23+
|-----------------------------|---------|----------|---------------|
24+
| token | String | Yes | - |
25+
| base_id | String | Yes | - |
26+
| table | String | Yes | - |
27+
| api_base_url | String | No | https://api.airtable.com |
28+
| view | String | No | - |
29+
| fields | List | No | - |
30+
| filter_by_formula | String | No | - |
31+
| max_records | int | No | - |
32+
| page_size | int | No | - |
33+
| sort | String | No | - |
34+
| cell_format | String | No | - |
35+
| return_fields_by_field_id | boolean | No | - |
36+
| record_metadata | List | No | - |
37+
| time_zone | String | No | - |
38+
| user_locale | String | No | - |
39+
| request_interval_ms | int | No | 220 |
40+
| rate_limit_backoff_ms | int | No | 30000 |
41+
| rate_limit_max_retries | int | No | 3 |
42+
| schema | Config | No | - |
43+
| schema.fields | Config | No | - |
44+
| format | String | No | text |
45+
| content_field | String | No | - |
46+
| json_field | Config | No | - |
47+
| common-options | config | No | - |
48+
49+
### token [String]
50+
51+
Airtable personal access token. You can create one at https://airtable.com/create/tokens.
52+
53+
### base_id [String]
54+
55+
The ID of the Airtable base (starts with `app`).
56+
57+
### table [String]
58+
59+
The table name or table ID to read from.
60+
61+
### api_base_url [String]
62+
63+
Airtable API base URL. Default is `https://api.airtable.com`.
64+
65+
### view [String]
66+
67+
The name or ID of a view in the table. Only records visible in this view will be returned.
68+
69+
### fields [List]
70+
71+
A list of field names to include in the response.
72+
73+
### filter_by_formula [String]
74+
75+
An Airtable formula to filter records. See [Airtable formula reference](https://support.airtable.com/docs/formula-field-reference).
76+
77+
### max_records [int]
78+
79+
Maximum total number of records to return.
80+
81+
### page_size [int]
82+
83+
Number of records per page (1-100).
84+
85+
### sort [String]
86+
87+
Sort definition as a JSON array, e.g. `[{"field":"Name","direction":"asc"}]`.
88+
89+
### cell_format [String]
90+
91+
The format for cell values, either `json` or `string`.
92+
93+
### return_fields_by_field_id [boolean]
94+
95+
If true, field keys in the response will be field IDs instead of field names.
96+
97+
### record_metadata [List]
98+
99+
Additional record metadata to return, e.g. `["commentCount"]`.
100+
101+
### time_zone [String]
102+
103+
The time zone for formatting date/time values.
104+
105+
### user_locale [String]
106+
107+
The user locale for formatting values.
108+
109+
### request_interval_ms [int]
110+
111+
Minimum interval in milliseconds between API requests. Default 220ms (to stay within Airtable's 5 requests/second limit).
112+
113+
### rate_limit_backoff_ms [int]
114+
115+
Base backoff time in milliseconds when receiving a 429 (rate limit) response. Default 30000ms.
116+
117+
### rate_limit_max_retries [int]
118+
119+
Maximum number of retries after receiving a 429 response. Default 3.
120+
121+
### schema [Config]
122+
123+
#### fields [Config]
124+
125+
The schema fields of upstream data. For more details, please refer to [Schema Feature](../../introduction/concepts/schema-feature.md).
126+
127+
### format [String]
128+
129+
The format of upstream data, supports `json` and `text`, default `text`.
130+
131+
### content_field [String]
132+
133+
JsonPath expression to extract data from the response. For Airtable, you typically use `$.records[*].fields` to extract the fields from each record.
134+
135+
### json_field [Config]
136+
137+
This parameter helps you configure the schema and must be used with schema.
138+
139+
### common options
140+
141+
Source plugin common parameters, please refer to [Source Common Options](../common-options/source-common-options.md) for details.
142+
143+
## Example
144+
145+
Read from an Airtable table and output raw text:
146+
147+
```hocon
148+
source {
149+
Airtable {
150+
token = "patXXXXXXXX.XXXXXXXX"
151+
base_id = "appXXXXXXXX"
152+
table = "Shipments"
153+
format = "text"
154+
max_records = 10
155+
}
156+
}
157+
```
158+
159+
Read with schema and extract record fields:
160+
161+
```hocon
162+
source {
163+
Airtable {
164+
token = "patXXXXXXXX.XXXXXXXX"
165+
base_id = "appXXXXXXXX"
166+
table = "Shipments"
167+
content_field = "$.records[*].fields"
168+
filter_by_formula = "{Status} = 'Shipped'"
169+
schema = {
170+
fields {
171+
Name = string
172+
Status = string
173+
Weight = float
174+
}
175+
}
176+
}
177+
}
178+
```
179+
180+
## Changelog
181+
182+
<ChangeLog />

0 commit comments

Comments
 (0)