@majin1102 (Contributor) commented Nov 17, 2025

Close #5249

@github-actions github-actions bot added the enhancement New feature or request label Nov 17, 2025
@majin1102 majin1102 marked this pull request as draft November 17, 2025 04:54
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 462 to 468
    pub fn collect_paths(&self) -> Vec<PathSpec> {
        let mut specs = Vec::new();
        // Data files: ensure dataset-relative path prefixed with "data/"
        for df in &self.files {
            specs.push(PathSpec {
                path_kind: PathKind::Data,
                path: format!("data/{}", df.path),

[P1] Honor base_id when collecting fragment paths

The new deep-clone flow builds its list of files to copy by calling frag.collect_paths() and copying the returned dataset-relative paths (see dataset.rs lines 2021‑2044). However, collect_paths (shown here) prefixes every data file with data/… and synthesizes _deletions/… paths without consulting DataFile.base_id or DeletionFile.base_id.

For any dataset whose fragments are stored via Manifest::base_paths (for example, a shallow clone that references another dataset), the actual files live under the referenced base path, not under the shallow clone's own data/ or _deletions/ directories. As soon as deep_clone tries to copy such a fragment, it gets Error::NotFound and the clone fails. The feature therefore cannot convert a shallow clone into a fully materialized dataset, which is its primary use case. The path collector needs to resolve base_id to the correct root before emitting copy specs.
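One possible shape for the fix is sketched below. It is an illustration only, not the project's actual code: the structs are simplified stand-ins, the `root` field on `PathSpec`, the `base_paths` map argument, and the `Result<_, String>` return type are all hypothetical, and it assumes a referenced base root keeps its data files under `data/` the same way a local dataset does. Deletion files would presumably follow the same lookup using `DeletionFile.base_id`.

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration; not the real definitions.
#[derive(Debug, Clone, PartialEq)]
enum PathKind {
    Data,
}

#[derive(Debug, Clone, PartialEq)]
struct PathSpec {
    path_kind: PathKind,
    // Hypothetical field: None means "this dataset's own root",
    // Some(root) means the file lives under a referenced base path.
    root: Option<String>,
    path: String,
}

struct DataFile {
    path: String,
    base_id: Option<u32>,
}

struct Fragment {
    files: Vec<DataFile>,
}

impl Fragment {
    // `base_paths` maps a base_id to the root it refers to (assumed shape).
    fn collect_paths(
        &self,
        base_paths: &HashMap<u32, String>,
    ) -> Result<Vec<PathSpec>, String> {
        let mut specs = Vec::with_capacity(self.files.len());
        for df in &self.files {
            // Resolve base_id first; fail loudly on an unknown id instead
            // of silently falling back to the local data/ directory.
            let root = match df.base_id {
                None => None,
                Some(id) => Some(
                    base_paths
                        .get(&id)
                        .cloned()
                        .ok_or_else(|| format!("unknown base_id {}", id))?,
                ),
            };
            specs.push(PathSpec {
                path_kind: PathKind::Data,
                root,
                path: format!("data/{}", df.path),
            });
        }
        Ok(specs)
    }
}
```

With this shape, the copy step can read each file from `root` when it is set and from the source dataset's own root otherwise, so a fragment inherited from a shallow clone's parent is fetched from where it actually lives.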

Development

Successfully merging this pull request may close these issues.

Support deep_clone like Delta Lake
