Skip to content

Fixes for test_unique narwhals test#920

Merged
scott-routledge2 merged 2 commits intomainfrom
scott/drop_duplicates_fixes
Nov 11, 2025
Merged

Fixes for test_unique narwhals test#920
scott-routledge2 merged 2 commits intomainfrom
scott/drop_duplicates_fixes

Conversation

@scott-routledge2
Copy link
Contributor

@scott-routledge2 scott-routledge2 commented Nov 10, 2025

Changes included in this PR

  • adds extra projection in "subset" case to ensure column ordering matches Pandas
  • adds error checking for duplicated in the JIT fallback path

Testing strategy

narwhals: test_unique -- all params now passing, (keep="none" falls back to pandas)

User facing changes

Checklist

  • Pipelines passed before requesting review. To run CI you must include [run CI] in your commit message.
  • I am familiar with the Contributing Guide
  • I have installed + ran pre-commit hooks.

@codecov
Copy link

codecov bot commented Nov 10, 2025

Codecov Report

❌ Patch coverage is 37.50000% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.79%. Comparing base (c33fbb5) to head (a90b4bb).
⚠️ Report is 115 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #920      +/-   ##
==========================================
+ Coverage   66.68%   68.79%   +2.10%     
==========================================
  Files         186      195       +9     
  Lines       66795    67557     +762     
  Branches     9507     9594      +87     
==========================================
+ Hits        44543    46474    +1931     
+ Misses      19572    18251    -1321     
- Partials     2680     2832     +152     

@scott-routledge2 scott-routledge2 marked this pull request as ready for review November 11, 2025 15:57
@scott-routledge2
Copy link
Contributor Author

scott-routledge2 commented Nov 11, 2025

The results don't seem to be deterministic with 3 workers. The original dataframe is: {'a': [1, 3, 2], 'b': [4, 4, 6], 'z': [7.0, 8.0, 9.0]} Sometimes df.unique(subset=["b"], keep="any") is
{'a': [1, 2], 'b': [4, 6], 'z': [8.0, 9.0]} (z and a values don't match), and sometimes it's correct.

Edit: oh this could be related to how dataframes are evaluated. The correct result is either going to be (1, 7.0) or (3, 8.0) but if the columns are evaluated independently then that would explain the sometimes mismatch.

Copy link
Collaborator

@ehsantn ehsantn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Collaborator

@DrTodd13 DrTodd13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks.

@scott-routledge2 scott-routledge2 merged commit 2257ef4 into main Nov 11, 2025
50 of 54 checks passed
@scott-routledge2 scott-routledge2 deleted the scott/drop_duplicates_fixes branch November 11, 2025 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants