[Docs][Architecture] Add architecture documents#10429
[Docs][Architecture] Add architecture documents#10429davidzollo wants to merge 12 commits intoapache:devfrom
Conversation
This PR introduces a complete architecture documentation system for SeaTunnel, including: **Phase 1 - Core Foundation (6 docs)**: - Architecture Overview - Design Philosophy - Source Architecture - Sink Architecture - Engine Architecture - Checkpoint Mechanism **Phase 2 - Advanced Topics (6 docs)**: - DAG Execution Model - Resource Management - CatalogTable & Metadata Management - Multi-table Synchronization - Exactly-once Semantics - Translation Layer **Key Features**: - 24 professional documents (12 English + 12 Chinese) - ~450KB total size, 15,100 lines - Based on 2000+ lines of source code analysis - Complete with architecture diagrams, sequence diagrams, and code examples - Unified documentation template and terminology - Integrated into docs/sidebars.js This documentation system fills a critical gap in SeaTunnel's technical documentation, providing enterprise-grade, systematic architectural design documents for: - Architects evaluating SeaTunnel - Core contributors understanding the codebase - Enterprise developers customizing the system - Technical decision makers assessing the platform Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This file is for internal tracking and should not be included in the PR. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Issue 1: English documentation missing maintenance noteLocation: Problem Description: Potential Risks:
Impact Scope:
Severity: MINOR Improvement Suggestion: > Note: To minimize maintenance overhead and avoid "documentation-code drift" due to refactoring,
> this documentation focuses on component responsibilities, interaction flows, and design motivations
> rather than embedding source code snippets or direct source links.Rationale:
Issue 2: Missing documentation testingLocation: PR change root directory (recommend adding Problem Description:
Potential Risks:
Impact Scope:
Severity: MAJOR Improvement Suggestion: name: Docs Test
on:
pull_request:
paths:
- 'docs/**'
- 'seatunnel-api/**'
- 'seatunnel-engine/**'
jobs:
markdown-links:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Check Markdown links
uses: gaurav-nelson/github-action-markdown-link-check@v1
with:
config-file: '.markdownlinkcheck.json'
config-keys-sync:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Check config keys
run: |
# Example: Verify that configuration keys in the documentation exist in the code
grep -r "multi_table_sink_replica" seatunnel-api || exit 1Rationale:
Issue 3: Storage implementation naming description in checkpoint-mechanism.md is not precise enoughLocation: Related Context: **Implementations**:
- `LocalFileCheckpointStorage`: Local file system (testing)
- `HdfsCheckpointStorage`: HDFS
- `S3CheckpointStorage`: AWS S3
- `OssCheckpointStorage`: Aliyun OSSProblem Description:
Potential Risks:
Impact Scope:
Severity: MAJOR Improvement Suggestion: **Implementations**:
- `LocalFileStorage`: Local file system (testing)
- `HdfsStorage`: Hadoop FileSystem-based backend; can work with HDFS/S3A/etc depending on Hadoop configuration
Note: S3 and OSS support are provided through Hadoop FileSystem configuration (e.g., `fs.s3a.impl`) rather than separate CheckpointStorage implementations.Rationale:
Issue 4: Resource management strategy class names are inaccurateLocation: Related Context:
Problem Description: Potential Risks:
Impact Scope:
Severity: MAJOR Improvement Suggestion: **Slot Allocation Strategies**:
```java
// 1. RandomStrategy: Random selection among available workers
// 2. SlotRatioStrategy: Prefer workers with more available slots
// 3. SystemLoadStrategy: Prefer workers with lower CPU/memory usageRationale:
|
|
I think there are a bit too many subdirectories. Could we reduce them to optimize navigation efficiency.the directory structure I suggest is as follows,just for your consideration: |
|
|
||
| ``` | ||
| seatunnel/ | ||
| ├── seatunnel-api/ # 核心 API 定义 |
There was a problem hiding this comment.
The structure of this module is incorrect
| │ ├── connector-cdc-mysql/ # MySQL CDC 连接器 | ||
| │ └── ... # 更多连接器 | ||
| │ | ||
| ├── seatunnel-transforms-v2/ # 转换实现 |
There was a problem hiding this comment.
The structure of this module is incorrect
| ### 学术论文 | ||
|
|
||
| - Chandy & Lamport (1985): ["Distributed Snapshots"](https://lamport.azurewebsites.net/pubs/chandy.pdf) | ||
| - Gray & Lamport (2006): ["Consensus on Transaction Commit"](https://lamport.azurewebsites.net/pubs/paxos-commit.pdf) |
There was a problem hiding this comment.
The resource has been removed
| **数据源**: | ||
| - `SourceSplitEnumerator`(协调端):生成分片、分配给读取器、处理注册 | ||
| - `SourceReader`(工作节点):从分配的分片读取数据 | ||
|
|
||
| **数据汇**: | ||
| - `SinkCommitter` / `SinkAggregatedCommitter`(协调端):协调提交 | ||
| - `SinkWriter`(工作节点):写入数据、准备提交信息 |
There was a problem hiding this comment.
I don't know what the content under this title means, as there are no such two parts in other titles
…eedback 1. Fix CheckpointStorage implementation class names in checkpoint-mechanism.md - Correct: LocalFileStorage, HdfsStorage (not *CheckpointStorage) - Clarify S3/OSS support through Hadoop FileSystem configuration 2. Update developer guide with architecture documentation links - Add architecture reference section to how-to-create-your-connector.md - Link to overview, source/sink architecture, translation layer, and checkpoint docs - Improve discoverability for connector developers 3. Improve design-philosophy.md to focus on principles over implementation - Replace specific class names with conceptual explanations - Explain coordination vs execution separation mechanism - Clarify two-phase commit protocol workflow - Use "Master-side" and "Worker-side" instead of "Control Plane" and "Data Plane" - Remove code examples, replace with principle-focused descriptions 4. Optimize directory structure (Option A) - Move translation/translation-layer.md -> api-design/translation-layer.md - Move data-flow/multi-table.md -> features/multi-table.md - Remove empty translation/ and data-flow/ directories - Reduce single-file directories for better navigation Addresses reviewer feedback from: - DanielCarter-stack (Issues 1, 3, 5) - misi1987107 (directory structure) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add maintenance note explaining why source code links are not embedded - Align with Chinese documentation maintenance strategy - Address Issue 1 from code review (DanielCarter-stack) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- User decision: Keep overview.md without the maintenance note - Revert previous addition Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This PR reviews and refines the Chinese architecture documents in docs/zh/architecture for accuracy, removing obsolete implementation details and correcting configuration keys based on the current codebase.