Skip to content

Conversation

@arthurpassos
Copy link
Collaborator

@arthurpassos arthurpassos commented Nov 19, 2025

To make the reviewers life:

This is a classical race condition / TOCTOU problem.

When the user issues the export query, the schema validation takes place. After that, the export is scheduled to run in the background. When it is picked up by the background executor, in order to create the pipeline / read from merge tree, we need the storage metadata. If a mutation changed the metadata in this time window, there will be a schema mismatch between the source and destination tables.

This PR fixes it by capturing the snapshot at the time of interpreting the query / validating the schema and re-uses that to create the pipeline

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix crash due to incompatible headers when mutation changed the schema between scheduling and executing part exports

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@arthurpassos arthurpassos added antalya-25.8 port-antalya PRs to be ported to all new Antalya releases labels Nov 19, 2025
@github-actions
Copy link

github-actions bot commented Nov 19, 2025

Workflow [PR], commit [7312bb5]

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@arthurpassos
Copy link
Collaborator Author

This PR fixes the issue for export part, but the export partition will not run because of schema mismatch. I need to look into that. It is ok to ship this as is since export partition is experimental

@arthurpassos
Copy link
Collaborator Author

To make the reviewers life:

This is a classical race condition / TOCTOU problem.

When the user issues the export query, the schema validation takes place. After that, the export is scheduled to run in the background. When it is picked up by the background executor, in order to create the pipeline / read from merge tree, we need the storage metadata. If a mutation changed the metadata in this time window, there will be a schema mismatch between the source and destination tables.

This PR fixes it by capturing the snapshot at the time of interpreting the query / validating the schema and re-uses that to create the pipeline

Copy link

@ianton-ru ianton-ru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can changing schema cause some errors, if node exports some parts with old schema, schema is changed, replica restarts and reloads schema, and exports other parts with new schema?

@arthurpassos
Copy link
Collaborator Author

Can changing schema cause some errors, if node exports some parts with old schema, schema is changed, replica restarts and reloads schema, and exports other parts with new schema?

This will only happen if the user requests those news parts to be esportes. Remember, this is the export part functionality - not export partition.

@ianton-ru
Copy link

test_export_ttl failed
I think need to wait for expiration_time+1 seconds.
Here
https://github.com/Altinity/ClickHouse/blob/antalya-25.8/src/Storages/StorageReplicatedMergeTree.cpp#L8153
if manifest.create_time is 0 seconds, manifest.ttl_seconds 5 seconds, if second request is after 5 seconds (now=5) condition (static_cast<time_t>(expiration_time) < now is false. Need now=6 at least. Or condition (static_cast<time_t>(expiration_time) <= now)

@arthurpassos
Copy link
Collaborator Author

test_export_ttl failed I think need to wait for expiration_time+1 seconds. Here https://github.com/Altinity/ClickHouse/blob/antalya-25.8/src/Storages/StorageReplicatedMergeTree.cpp#L8153 if manifest.create_time is 0 seconds, manifest.ttl_seconds 5 seconds, if second request is after 5 seconds (now=5) condition (static_cast<time_t>(expiration_time) < now is false. Need now=6 at least. Or condition (static_cast<time_t>(expiration_time) <= now)

The PR that fixes it is here #1158, it just hasn't been merged yet

@Enmk Enmk merged commit 8bc5cdb into antalya-25.8 Nov 21, 2025
135 of 138 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-25.8 port-antalya PRs to be ported to all new Antalya releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants