Skip to content

Conversation

@kkollsga
Copy link
Contributor

Summary

Fixed sortby(ascending=False) placing NaN values at the beginning of results instead of the end.

The issue was that descending sort simply reversed the ascending order with order[::-1], which moved NaNs from end to beginning.

The fix detects trailing NaN elements in the ascending order and only reverses the non-NaN portion, keeping NaNs at the end.

Test plan

  • Added test_sortby_descending_nans regression test
  • Verified existing sortby tests still pass
  • Benchmarked performance impact (~0.7ms overhead for 100k elements with NaNs)

🤖 Generated with Claude Code

When using sortby with ascending=False, the implementation simply
reversed the ascending sort order. This caused NaN values to appear
at the beginning of the result instead of the end.

The fix detects trailing NaN elements in the ascending order and
only reverses the non-NaN portion, keeping NaNs at the end as
expected.

Closes pydata#7358

Co-authored-by: Claude <[email protected]>
@welcome
Copy link

welcome bot commented Jan 31, 2026

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

Copy link
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kkollsga this seems to work although it would be good to include a multidimensional test as well. It does seem a little verbose though. I was experimenting with something like this instead:

if ascending:
    indices[key] = np.lexsort(tuple(reversed(arrays)))
else:
    # sort any null values to the beginning before reversing the order
    indices[key] = np.lexsort(
        tuple(
            [*reversed(arrays), *[notnull(arr) for arr in reversed(arrays)]]
        )
    )[::-1]

which relies on notnull from duck_array_ops

@dcherian
Copy link
Contributor

dcherian commented Feb 2, 2026

that does look nicer. Another though I had was counting the number of NaNs; and call np.roll but not sure if that will work

@kkollsga
Copy link
Contributor Author

kkollsga commented Feb 3, 2026

Thanks @jsignell and @dcherian for the suggestions!

I tested @jsignell's approach locally - it's much cleaner and handles all null types via duck_array_ops.notnull (not just float NaN like my version). All existing sortby tests pass.

Happy to update the PR with this implementation and add a multidimensional test case, e.g.:

ds = Dataset({
    "var": (["x", "y"], [[1, 2], [3, 4], [5, 6]]),
    "key": ("x", [3.0, np.nan, 1.0])
})
result = ds.sortby("key", ascending=False)
# x order: [0, 2, 1] with NaN row at end

@jsignell
Copy link
Member

jsignell commented Feb 3, 2026

Happy to update the PR with this implementation and add a multidimensional test case

Maybe multidimensional wasn't quite what I meant actually. I would love to see a test where there are multiple arrays passed to the sortby. Something like this:

>>> ds = xr.Dataset(
...     {
...         "A": (("x", "y"), [[1, 2, 3], [4, 5, 6]]),
...         "B": (("x", "y"), [[7, 8, 9], [10, 11, 12]]),
...     },
...     coords={"x": ["b", "a"], "y": [np.nan, 1, 0]},
... )
>>> ds.sortby(["x", "y"], ascending=False)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sortby descending does not handle nans correctly

3 participants