DOC: Improve description of ``axis`` parameter for ``np.median`` by szwiep · Pull Request #25228 · numpy/numpy

szwiep · 2023-11-22T21:36:46Z

This PR includes an updated docstring for numpy.median which clarifies the functionality of the extended axis feature, introduced in 1.9.0.

Specifically, the description of the axis parameter in the docstring is changed and an example is added. Together these clarify that when a sequence of axes is passed, the array is flattened along the given axes and the median is then computed over the resulting flattened axis. Without this change, passing multiple axes could be misread as computing medians sequentially over the given axes (without flattening); see #25174.
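To make the flatten-then-reduce reading concrete, here is a minimal sketch. `median_over_axes` is a hypothetical helper written only for this illustration (it is not part of NumPy and not how `np.median` is implemented); it merges the requested axes into one axis by hand and takes a single median over it.

```python
import numpy as np


def median_over_axes(a, axes):
    """Hypothetical helper: median over several axes via an explicit flatten."""
    a = np.asarray(a)
    axes = tuple(ax % a.ndim for ax in axes)
    # Move the axes being reduced to the front, then merge them into one axis.
    moved = np.moveaxis(a, axes, tuple(range(len(axes))))
    merged = moved.reshape(-1, *moved.shape[len(axes):])
    # One median over the merged axis, not one median per original axis.
    return np.median(merged, axis=0)


a = np.random.default_rng(0).permutation(24).reshape(2, 3, 4)

joint = np.median(a, axis=(0, 2))                      # tuple of axes
explicit = median_over_axes(a, (0, 2))                 # flatten by hand
sequential = np.median(np.median(a, axis=2), axis=0)   # the possible misreading

print(np.array_equal(joint, explicit))
print(joint, sequential)
```

The first print should report that the tuple form and the hand-flattened form agree; the sequential result is printed only for comparison and may or may not coincide for a given input.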

Resolves #25174

The functionality of passing a sequence of axes to the numpy.median axis parameter is not explained in sufficient detail, leading to possible confusion about expected results. Specifically, the current description of the axis parameter does not mention that an array will be flattened along the given sequence of axes before a median is computed. This commit changes the description to clarify that the array is first flattened along the given axes and then a median is computed. Additionally, a ``.. versionadded::`` directive replaces the written version acknowledgement. See numpy#25174

The docstring for numpy.median does not contain an example for passing a sequence of ints to the axis parameter. Including an example could help clarify the functionality of the extended axis feature. This change includes a new example demonstrating the median with extended axis.

ngoldbaum · 2023-11-28T22:32:47Z

numpy/lib/_function_base_impl.py

    >>> np.median(a, axis=1)
    array([7.,  2.])
+    >>> np.median(b, axis=(0,1), overwrite_input=True)
+    3.5


These examples in the docstrings should be runnable. Using b here when it hasn't been defined above doesn't make sense. I would also not use overwrite_input, since that confuses things.

In principle doctests should catch issues like b not being defined in the docstring yet, but I'm not sure we run doctests on the full numpy codebase...
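As a rough sketch of how a doctest run would flag this, the snippet below feeds the proposed example to the standard-library `doctest` machinery directly. The names `docstring` and `median_example` are placeholders for this illustration, and this is not how NumPy's own test suite is wired up.

```python
import doctest

import numpy as np

# The example as proposed in the diff: `b` is never defined.
docstring = """
>>> np.median(b, axis=(0, 1))
3.5
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(docstring, {"np": np}, "median_example", "<sketch>", 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test, out=lambda text: None)  # discard the failure report

print(results.failed, results.attempted)  # 1 1
```

The single example is attempted and counted as a failure because evaluating it raises `NameError` for `b`.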

Instead, I think it makes more sense to just rewrite all of these examples to use a small 3D array, so this example where you take the median successively over two dimensions makes sense. Something like:

    >>> import numpy as np
    >>> a = np.arange(27)
    >>> a.shape = (3, 3, 3)
    >>> a
    array([[[ 0,  1,  2],
            [ 3,  4,  5],
            [ 6,  7,  8]],

           [[ 9, 10, 11],
            [12, 13, 14],
            [15, 16, 17]],

           [[18, 19, 20],
            [21, 22, 23],
            [24, 25, 26]]])
    >>> np.median(a)
    np.float64(13.0)
    >>> np.median(a, axis=0)
    array([[ 9., 10., 11.],
           [12., 13., 14.],
           [15., 16., 17.]])
    >>> np.median(a, axis=(0, 1))
    array([12., 13., 14.])

And then if you didn't feel like editing the rest of this, you could redefine a to its original value below this example.

Just a thought, though! There are probably other ways to edit this that would make sense.

Also it would be nice if you could make sure this docstring is actually fully runnable and updated, since it looks like it hasn't been updated for the fallout from NEP 50 (ping @seberg - there's probably a lot of other docstrings that need to be updated right?).

Err, not NEP 50, I guess just the new reprs for scalars?

seberg · 2023-11-29

Yeah, that is on my not-done list. I had always hoped to use scientific-python/pytest-doctestplus#227, although that is a bit tedious until we better support doctestplus.

OTOH, a few failures (and misses) shouldn't matter too much.

At this time, I think you shouldn't worry about whether you write np.float64(1.0) or 1.0; keeping things consistent may be better...
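For context on the two spellings, a small sketch: the value returned by `np.median` is the same either way, and only the printed representation of the NumPy scalar differs between NumPy versions (to my understanding, NumPy 2.0 and later print `np.float64(...)`, earlier releases print the bare number). The input array here is assumed to be the 2D one used in the existing `np.median` examples.

```python
import numpy as np

value = np.median(np.array([[10, 7, 4], [3, 2, 1]]))

# The repr depends on the installed NumPy version.
print(repr(value))

# Converting to a Python float gives a version-independent form.
print(float(value))  # 3.5
```

Comparing values rather than reprs is one way to keep an example stable across versions, at the cost of a slightly less natural-looking docstring.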


mdhaber · 2023-12-29T16:37:30Z

I went ahead and made the example runnable and removed overwrite_input. I think the lightly modified example illustrates that np.median(a, axis=(0, 1)) flattens both axes to a single axis before performing the calculation, just like np.median(a) does (for a 2d array), so the result is the same (3.5). This is distinct from the alternative, which would be to take the median of the medians from the lines above (4.5). Although a 2D example cannot distinguish the behavior of axis=None from axis=a_tuple, I think it strikes a compromise between completeness and complexity that's appropriate for an individual function's docstring. Since this seems to resolve the issue by the same author, I'll go ahead and merge. If needed, we can make further improvements to the documentation of all functions with this behavior in a separate PR, if that sounds good?
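The numbers quoted above can be reproduced with a short sketch. It assumes the docstring's array is `[[10, 7, 4], [3, 2, 1]]`, which matches the `array([7., 2.])` output shown in the diff; the 3D array `c` is the `np.arange(27)` example suggested earlier in the review, included only to show a case where `axis=None` and a tuple of axes differ.

```python
import numpy as np

a = np.array([[10, 7, 4], [3, 2, 1]])

flattened = np.median(a, axis=(0, 1))            # both axes merged first
over_all = np.median(a)                          # axis=None
sequential_0 = np.median(np.median(a, axis=0))   # median of column medians
sequential_1 = np.median(np.median(a, axis=1))   # median of row medians

print(float(flattened), float(over_all))         # 3.5 3.5
print(float(sequential_0), float(sequential_1))  # 4.5 4.5

# In 3D, a tuple of two axes leaves one axis behind, unlike axis=None.
c = np.arange(27).reshape(3, 3, 3)
print(float(np.median(c)))                       # 13.0
print(np.median(c, axis=(0, 1)).tolist())        # [12.0, 13.0, 14.0]
```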

szwiep · 2024-01-04T19:12:44Z

Thank you for fixing up the docstring and for the comments. If there ends up being interest in making further improvements to all functions that use the axis parameter, I'd be happy to contribute.

mdhaber · 2024-01-04T19:29:45Z

> If there ends up being interest in making further improvements to all functions

Sure. If you're interested, I'd suggest surveying what is currently documented about the axis parameter, summarizing the findings in an issue, and (if necessary) proposing a few ideas for improvement in that issue. Because documentation issues usually come up within the context of a particular function, and because more general documentation is a bit harder to write than function-specific documentation, the scope of issues and PRs is often restricted to a particular function. However, in the long run, it might be more efficient to solve the general problem (if there is one).

szwiep added 2 commits November 22, 2023 11:16

github-actions bot added the 04 - Documentation label Nov 22, 2023

ngoldbaum reviewed Nov 28, 2023

DOC: median: fixup docstring

1c8b79e

[skip azp] [skip actions] [skip cirrus]

mdhaber merged commit d85f9ac into numpy:main Dec 29, 2023

charris changed the title ~~DOC: Improve description of axis parameter for np.median~~ DOC: Improve description of axis parameter for np.median Dec 30, 2023

mdhaber merged 3 commits into numpy:main from szwiep:improve-median-axis-docs
