@majin1102
Contributor

@majin1102 majin1102 commented Oct 28, 2025

While planning the work for #5073, I found some unfinished work:

  1. Tag retrieval doesn't include the branch field in either Java or Python.
  2. The reference parameter is not used properly in createTag, updateTag, and createBranch, in both Java and Python.
  3. For Python, Tuple[str, int] is not the best type, and it led to the separate checkout_branch API. Use Tuple[Optional[str], Optional[int]] instead, which can express more kinds of reference, and remove checkout_branch.
  4. Some necessary refactoring like removing all ref_

@github-actions github-actions bot added the enhancement, python, and java labels Oct 28, 2025
@majin1102 majin1102 force-pushed the fill_in_branch_fields branch from 82553ab to c9e6dfd Compare October 28, 2025 15:51
@chatgpt-codex-connector

💡 Codex Review

https://github.com/lancedb/lance/blob/82553ab6e87dd39ae60327920789a50db86caba7/java/lance-jni/src/blocking_dataset.rs#L1654-L1665
P1 Badge JNI tag APIs ignore tag references

The new Java overloads allow Tags.create("tag", Ref) and Tags.update("tag", Ref) where Ref may refer to another tag. However, the native implementations only read getVersionNumber() and getBranchName() from the Ref and call create_tag/update_tag with those optional values, dropping getTagName(). When a caller passes Ref.ofTag("v1"), both extracted fields are None and the JNI layer tags whatever happens to be the current dataset version instead of the referenced tag’s version. Python bindings resolve tag references to concrete branch/version before creating/updating tags, so Java callers now get inconsistent behaviour. These helpers should resolve the tag name to its version (and branch) before invoking the Rust API, similar to the Python path.
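The resolution step the review asks for can be sketched as follows. This is a minimal illustration, not the Lance implementation: `Reference`, `TagTarget`, and `resolve_reference` are hypothetical names, and the tag store is modelled as a plain `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dataset reference (not the real `refs::Ref`).
#[derive(Debug, Clone, PartialEq)]
pub enum Reference {
    /// (branch, version); a `None` branch means main, a `None` version means latest.
    Version(Option<String>, Option<u64>),
    /// A named tag that must be resolved before it can be used.
    Tag(String),
}

/// What a tag points at: a branch (`None` = main) and a concrete version.
#[derive(Debug, Clone, PartialEq)]
pub struct TagTarget {
    pub branch: Option<String>,
    pub version: u64,
}

/// Resolve any reference to a (branch, version) pair before creating or
/// updating a tag, so that a tag reference is never silently dropped.
pub fn resolve_reference(
    reference: &Reference,
    tags: &HashMap<String, TagTarget>,
) -> Result<(Option<String>, Option<u64>), String> {
    match reference {
        Reference::Version(branch, version) => Ok((branch.clone(), *version)),
        Reference::Tag(name) => tags
            .get(name)
            .map(|target| (target.branch.clone(), Some(target.version)))
            .ok_or_else(|| format!("tag not found: {name}")),
    }
}
```

With this shape, a call equivalent to `Ref.ofTag("v1")` yields the branch and version stored for `v1`, and an unknown tag name becomes an error rather than falling back to the current dataset version.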

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@majin1102 majin1102 marked this pull request as draft October 28, 2025 16:13
@majin1102
Contributor Author

majin1102 commented Oct 28, 2025

There's another issue that doesn't feel right to me and might need some discussion.

Currently we use None to represent the main branch. For example, if we are on branch A at version 10 (A:10) and want to check out A:9, we need to call checkout_version(('A', 9)). This is fine if the user knows they are on branch A. However, if they loaded the dataset from a branch URI, they will treat that URI as an independent dataset (at least, that is how it feels). They might just call checkout_version(9) or checkout((None, 9)), and the actual result is that they have checked out main.

I think in most cases when people see None, they expect it to mean no change. So I think we should make None represent the current branch instead of main. We could then use a constant string 'main' to represent the main branch, since that name is already reserved. Although we store None for main in the spec, we could provide a better API for this.

How do you feel about this, @jackye1995? If you agree, I could do this in this PR.

@jackye1995 jackye1995 self-requested a review October 28, 2025 18:34
@jackye1995
Contributor

Looks like there are 2 things here,

(1) feels like from what you are describing that putting "None" is confusing, maybe we should just reserve the name main to mean the main branch, so user can use that to reference the main branch. (they can continue to use None to also mean main branch that is fine)

(2) I agree that checkout_version(version_number) should better be against the current branch, we can improve that experience.
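Point (1) could look roughly like the sketch below, which accepts the reserved name `main` as an alias for the main branch while keeping `None` as the stored form. `MAIN_BRANCH` and `normalize_branch` are hypothetical names used only for illustration.

```rust
/// Reserved name for the main branch (assumed for this sketch).
pub const MAIN_BRANCH: &str = "main";

/// Map a user-supplied branch name to the stored form, in which the main
/// branch is represented as `None`. Both `None` and "main" mean main.
pub fn normalize_branch(branch: Option<&str>) -> Option<String> {
    match branch {
        None => None,
        Some(name) if name == MAIN_BRANCH => None,
        Some(name) => Some(name.to_string()),
    }
}
```

Under this sketch `normalize_branch(Some("main"))` and `normalize_branch(None)` both give `None`, so existing callers that pass `None` keep working.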

@majin1102 majin1102 force-pushed the fill_in_branch_fields branch from 88a0d50 to efb3647 Compare October 29, 2025 11:13
@majin1102
Contributor Author

majin1102 commented Oct 29, 2025

Looks like there are 2 things here,

(1) feels like from what you are describing that putting "None" is confusing, maybe we should just reserve the name main to mean the main branch, so user can use that to reference the main branch. (they can continue to use None to also mean main branch that is fine)

(2) I agree that checkout_version(version_number) should better be against the current branch, we can improve that experience.

Sorry, I don't think I made myself clear enough.

If we call checkout_version(version_number), version_number is an impl Into<Ref>, which converts to (None, Some(version_number)) and so points to the main branch. So I think (1) and (2) conflict with each other:

    pub async fn checkout_version(&self, version: impl Into<refs::Ref>) -> Result<Self> {
        let reference: refs::Ref = version.into();
        // ...
    }

We could transform a u64 into (current_branch, Some(version_number)), but that would complicate the interface: the Into trait has no access to the current branch, so we might need to split the API. If we instead make (None, Some(version_number)) point to the current branch, it fits naturally with impl Into<Ref>.

This is just an implementation issue, but I think we had better make all of these consistent and clear. WDYT, @jackye1995?
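To make the conflict concrete, here is a small sketch under stated assumptions. `VersionRef` and `resolve_against_current` are hypothetical and only mimic the (branch, version) shape discussed above. The `From<u64>` conversion cannot see the dataset, so it can only produce a `None` branch; what `None` means is then decided wherever the reference is resolved.

```rust
/// Hypothetical reference type with the (branch, version) shape discussed
/// above. This is not the real `refs::Ref`.
#[derive(Debug, Clone, PartialEq)]
pub struct VersionRef {
    pub branch: Option<String>,
    pub version: Option<u64>,
}

impl From<u64> for VersionRef {
    /// A bare version number carries no branch information, so the
    /// conversion has no choice but to leave the branch as `None`.
    fn from(version: u64) -> Self {
        Self { branch: None, version: Some(version) }
    }
}

/// The alternative proposed in the comment: read a `None` branch as
/// "the current branch" at resolution time instead of "main".
pub fn resolve_against_current(
    reference: VersionRef,
    current_branch: Option<&str>,
) -> VersionRef {
    VersionRef {
        branch: reference
            .branch
            .or_else(|| current_branch.map(str::to_string)),
        version: reference.version,
    }
}
```

With this reading, `9u64.into()` resolved on branch `A` gives `A:9`, but main can then only be addressed from a branch by an explicit name such as `main`.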

@codecov-commenter

codecov-commenter commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 85.39326% with 26 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/refs.rs | 85.92% | 13 Missing and 6 partials ⚠️ |
| rust/lance/src/dataset.rs | 82.60% | 2 Missing and 2 partials ⚠️ |
| rust/lance/src/dataset/builder.rs | 25.00% | 3 Missing ⚠️ |


@majin1102 majin1102 marked this pull request as ready for review October 29, 2025 15:36

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Contributor

@jackye1995 jackye1995 left a comment


mostly looks good to me, pending rebase

@majin1102 majin1102 force-pushed the fill_in_branch_fields branch from 96b9187 to 9c1597d Compare December 16, 2025 04:22
@majin1102
Contributor Author

Hi, @jackye1995

I think there's a bug affecting datasets opened on a branch. As we know, we now use (str, u64) to identify a global version. If we are working on a branch and use an interface like this:

    pub async fn read_transaction_by_version(&self, version: u64) -> Result<Option<Transaction>> {
        let dataset_version = self.checkout_version(version).await?;
        dataset_version.read_transaction().await
    }

we are redirected to the main branch and get the transaction from that version line. More generally, every time we call checkout_version with a bare version number, we check out the main branch, which is probably unexpected.

I'm solving this issue by introducing a VersionNumber enumeration that is bound to the current branch context. Meanwhile, None still represents the main branch.
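One way to picture this fix is the sketch below. It is an illustration only: `CheckoutRef` and `resolve_checkout` are hypothetical names, and modelling VersionNumber as an enum variant next to an explicit (branch, version) variant is an assumption, not a description of the merged code.

```rust
/// Hypothetical reference type for checkouts (not the real `refs::Ref`).
#[derive(Debug, Clone, PartialEq)]
pub enum CheckoutRef {
    /// Explicit (branch, version); a `None` branch still means main.
    Version(Option<String>, Option<u64>),
    /// Bare version number, resolved against the current branch.
    VersionNumber(u64),
}

impl From<u64> for CheckoutRef {
    /// A plain `u64` no longer implies the main branch.
    fn from(version: u64) -> Self {
        CheckoutRef::VersionNumber(version)
    }
}

/// Turn a reference into a concrete (branch, version) pair, given the
/// branch the dataset is currently on (`None` = main).
pub fn resolve_checkout(
    reference: CheckoutRef,
    current_branch: Option<&str>,
) -> (Option<String>, Option<u64>) {
    match reference {
        CheckoutRef::Version(branch, version) => (branch, version),
        CheckoutRef::VersionNumber(version) => {
            (current_branch.map(str::to_string), Some(version))
        }
    }
}
```

In this sketch a helper like read_transaction_by_version, which passes a bare `u64`, stays on the current branch, while an explicit `(None, Some(9))` reference continues to select main.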

Please take a look at the newest code when you have time.

@majin1102 majin1102 force-pushed the fill_in_branch_fields branch from 6a72024 to 146ec75 Compare December 17, 2025 10:07
Collaborator

@yanghua yanghua left a comment


Left some comments.

@majin1102
Contributor Author

Comments addressed, please take a look @yanghua

Collaborator

@yanghua yanghua left a comment


+1

@majin1102
Contributor Author

@jackye1995 Please let me know if you have any further comments

Contributor

@jackye1995 jackye1995 left a comment


Looks good to me!

@jackye1995 jackye1995 merged commit fb7f8dd into lance-format:main Dec 20, 2025
40 of 41 checks passed
wjones127 pushed a commit to wjones127/lance that referenced this pull request Dec 30, 2025
…5088)

jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…5088)
