

@dcherian (Contributor) commented Nov 19, 2024

cc @phofl: here we need to index a NumPy array with a dask array (commonly a much larger array) in a sane manner, i.e. lazily and without destroying the chunk structure.
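A minimal sketch of one way to do this, assuming the NumPy array is small enough to embed in every task: map over the blocks of the dask index array and do ordinary NumPy fancy indexing inside each block. Each output block then has exactly the shape of its input block, so the chunks carry over. The helper name `take_lazily` and the sample data are hypothetical, and this is not necessarily the approach merged in this PR.

```python
import dask.array as dsa
import numpy as np


def take_lazily(values: np.ndarray, positions: dsa.Array) -> dsa.Array:
    """Hypothetical helper: lazily compute values[positions].

    Assumes `values` is small enough to be captured in every block task.
    """
    # Each task fancy-indexes `values` with one block of `positions`,
    # so the output block has the same shape as the input block.
    return positions.map_blocks(lambda block: values[block], dtype=values.dtype)


coord = np.arange(57) * 10  # small NumPy array (stand-in for a coordinate)
rng = np.random.default_rng(0)
positions = dsa.from_array(
    rng.integers(0, coord.size, size=(1000, 1000)), chunks=(256, 256)
)  # large dask array of integer positions

labels = take_lazily(coord, positions)
print(labels.chunks)  # ((256, 256, 256, 232), (256, 256, 256, 232))
```

Because `values` is closed over by every task, this only makes sense when the indexed array is small relative to the index array, which is the common case for a coordinate.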

We now preserve chunk sizes for:

```python
import numpy as np
import xarray as xr

# create some dummy data and chunk
x, y, t = 1000, 1000, 57
rang = np.arange(t*x*y)
da = xr.DataArray(rang.reshape(t, x, y), coords={'time':range(t), 'x': range(x), 'y':range(y)})
da = da.chunk(dict(time=-1, x=256, y=256))
# reduce along time; the result should keep the 256x256 chunks along x and y
da.idxmin('time')
```
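Conceptually, `idxmin` is an `argmin` along a dimension followed by a lookup of the coordinate label at each position. The sketch below spells out those two steps directly with dask on a smaller array, to show why the x/y chunks need not change. The sizes and the names `positions`, `time_values`, and `labels` are illustrative assumptions; this is not the code merged in this PR.

```python
import numpy as np
import xarray as xr

# smaller stand-in for the example above
t, x, y = 57, 100, 100
arr = xr.DataArray(
    np.random.default_rng(0).random((t, x, y)),
    coords={"time": np.arange(t), "x": np.arange(x), "y": np.arange(y)},
    dims=("time", "x", "y"),
).chunk(dict(time=-1, x=32, y=32))

# Step 1: argmin over "time" is a dask reduction along axis 0 only,
# so the chunks along x and y are left as they are.
positions = arr.data.argmin(axis=0)

# Step 2: translate integer positions into time labels, block by block.
time_values = arr["time"].values
labels = positions.map_blocks(lambda block: time_values[block], dtype=time_values.dtype)

print(labels.chunks)  # ((32, 32, 32, 4), (32, 32, 32, 4))
```

The values should match `arr.idxmin("time")`; the point of the sketch is that neither step needs to rechunk or compute the large array eagerly.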

After: [screenshot not preserved]

Before: [screenshot not preserved]

@dcherian dcherian added the topic-chunked-arrays Managing different chunked backends, e.g. dask label Nov 20, 2024
@dcherian dcherian marked this pull request as ready for review November 20, 2024 03:52
@dcherian dcherian marked this pull request as draft November 20, 2024 17:39
@dcherian dcherian marked this pull request as ready for review December 6, 2024 15:50
@dcherian (Contributor Author) commented:

Merging on Thursday if there are no comments.

@dcherian dcherian added plan to merge Final call for comments and removed needs review labels Dec 17, 2024
@headtr1ck (Collaborator) left a comment


Oops, never submitted my review comments.

@dcherian dcherian enabled auto-merge (squash) December 18, 2024 15:47
@dcherian dcherian merged commit 9fe816e into pydata:main Dec 18, 2024
28 checks passed
@dcherian dcherian deleted the vindex-idxminmax branch March 18, 2025 13:24
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 19, 2025
* main: (63 commits)
  Fix zarr upstream tests (pydata#9927)
  Update pre-commit hooks (pydata#9925)
  split out CFDatetimeCoder, deprecate use_cftime as kwarg (pydata#9901)
  dev whats-new (pydata#9923)
  Whats-new 2025.01.0 (pydata#9919)
  Silence upstream Zarr warnings (pydata#9920)
  time coding refactor (pydata#9906)
  fix warning from scipy backend guess_can_open on directory (pydata#9911)
  Enhance and move ISO-8601 parser to coding.times (pydata#9899)
  Edit serialization error message (pydata#9916)
  friendlier error messages for missing chunk managers (pydata#9676)
  Bump codecov/codecov-action from 5.1.1 to 5.1.2 in the actions group (pydata#9915)
  Rewrite interp to use `apply_ufunc` (pydata#9881)
  Skip dask rolling (pydata#9909)
  Explicitly configure ReadTheDocs build to use conf.py (pydata#9908)
  Cache pre-existing Zarr arrays in Zarr backend (pydata#9861)
  Optimize idxmin, idxmax with dask (pydata#9800)
  remove unused "type: ignore" comments in test_plot.py (fixed in matplotlib 3.10.0) (pydata#9904)
  move scalar-handling logic into `possibly_convert_objects` (pydata#9900)
  Add missing DataTree attributes to docs (pydata#9876)
  ...

Labels

plan to merge Final call for comments topic-chunked-arrays Managing different chunked backends, e.g. dask


Development

Successfully merging this pull request may close these issues.

idxmin / idxmax is not parallel friendly

2 participants