Skip to content

DOC: Improve description of axis parameter for np.median#25228

Merged
mdhaber merged 3 commits intonumpy:mainfrom
szwiep:improve-median-axis-docs
Dec 29, 2023
Merged

DOC: Improve description of axis parameter for np.median#25228
mdhaber merged 3 commits intonumpy:mainfrom
szwiep:improve-median-axis-docs

Conversation

@szwiep
Copy link
Copy Markdown
Contributor

@szwiep szwiep commented Nov 22, 2023

This PR includes an updated docstring for numpy.median which clarifies the functionality of the extended axis feature, introduced in 1.9.0.

Specifically, the description of the docstring is changed and an example is added. These help to clarify if a sequence of axes is passed, that the array is flattened along the given axes and then the median is computed over the resulting flattened axis. Without this change, the functionality of passing multiple axes could be misunderstood as computing medians sequentially over the given axes (without flattening), see #25174.

Resolves #25174

The functionality of passing a sequence of axes to the numpy.median axis
parameter is not explained in sufficient detail, leading to possible
confusion about expected results. Specifically, the current description
for the axis parameter does not mention that an array will be flattened
along the given sequence of axes before a median is computed. This
commit changes the description to clarify that the array is first
flattened along the given axes and then a median is computed.
Additionally, a ..versionadded command replaces a written version
acknowledgement.

See numpy#25174
The docstring for numpy.median does not contain an example for passing a
sequence of ints to the axis parameter. Including an example could help
clarify the functionality of the extended axis feature. This channge
includes a new example demonstrating the median with extended axis.
>>> np.median(a, axis=1)
array([7., 2.])
>>> np.median(b, axis=(0,1), overwrite_input=True)
3.5
Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples in the docstrings should be runnable. Using b here when it hasn't been defined above doesn't make sense. I would also not use overwrite_input, since that confuses things.

In principle doctests should catch issues like b not being defined in the docstring yet, but I'm not sure we run doctests on the full numpy codebase...

Instead, I think it makes more sense to just rewrite all of these examples to use a small 3D array, so this example where you take the median successively over two dimensions makes sense. Something like:

>>> import numpy as np
>>> a = np.arange(27)
>>> a.shape = (3, 3, 3)
>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
>>> np.median(a)
np.float64(13.0)
>>> np.median(a, axis=0)
array([[ 9., 10., 11.],
       [12., 13., 14.],
       [15., 16., 17.]])
>>> np.median(a, axis=(0, 1))
array([12., 13., 14.])

And then if you didn't feel like editing the rest of this, you could redefine a to its original value below this example.

Just a thought, though! There are probably other ways to edit this that would make sense.

Also it would be nice if you could make sure this docstring is actually fully runnable and updated, since it looks like it hasn't been updated for the fallout from NEP 50 (ping @seberg - there's probably a lot of other docstrings that need to be updated right?).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, not NEP 50, I guess just the new reprs for scalars?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is on my not-done list. I had always hoped to use scientific-python/pytest-doctestplus#227 although that is a bit tedious until we better support doctestplus.

OTOH, a few failures (and misses) shouldn't matter too much.

At this time, I think just don't worry about whether or not you write np.float64(1.0) or 1.0, keeping things consistent may be better...

[skip azp] [skip actions] [skip cirrus]
@mdhaber
Copy link
Copy Markdown
Contributor

mdhaber commented Dec 29, 2023

I went ahead and made the example runnable and removed overwrite_input. I think the lightly modified example illustrates that np.median(a, axis=(0, 1)) flattens both axes to a single axis before performing the calculation, just like np.median(a) does (for a 2d array), so the result is the same (3.5). This is distinct from the alternative, which would be to take the median of the medians from the lines above (4.5). Although a 2D example cannot distinguish the behavior of axis=None from axis=a_tuple, I think it strikes a compromise between completeness and complexity that's appropriate for an individual function's docstring. Since this seems to resolve the issue by the same author, I'll go ahead and merge. If needed, we can make further improvements to the documentation of all functions with this behavior in a separate PR, if that sounds good?

@mdhaber mdhaber merged commit d85f9ac into numpy:main Dec 29, 2023
@charris charris changed the title DOC: Improve description of axis parameter for np.median DOC: Improve description of axis parameter for np.median Dec 30, 2023
@szwiep
Copy link
Copy Markdown
Contributor Author

szwiep commented Jan 4, 2024

Thank you for fixing up the docstring and for the comments. If there ends up being interest in making further improvements to all functions that use the axis parameter, I'd be happy to contribute.

@mdhaber
Copy link
Copy Markdown
Contributor

mdhaber commented Jan 4, 2024

If there ends up being interest in making further improvements to all functions

Sure. If you're interested, I'd suggest surveying what is currently documented about the axis parameter, summarizing the findings in an issue, and (if necessary) proposing a few ideas for improvement in that issue. Because documentation issues usually come up within the context of a particular function, and because more general documentation is a bit harder to write than function-specific documentation, the scope of issues and PRs is often restricted to a particular function. However, in the long run, it might be more efficient solve the general problem (if there is one).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DOC: Unexpected values from np.median when passing a sequence of ints for axis=

4 participants