Merged
25 changes: 20 additions & 5 deletions xarray/core/dataset.py
@@ -9147,7 +9147,12 @@ def pad(
int | tuple[int, int] | Mapping[Any, tuple[int, int]] | None
) = None,
constant_values: (
float | tuple[float, float] | Mapping[Any, tuple[float, float]] | None
float
| tuple[float, float]
| Mapping[
Any, float | tuple[float, float] | Mapping[Any, tuple[float, float]]
]
| None
Collaborator:

This is a pretty gnarly type signature. Same with the docstring. But I don't think there's much we can do to simplify it.

We could possibly make float | tuple[float, float] | Mapping[Any, tuple[float, float]] into an alias?

Contributor Author (@tsanona, Aug 14, 2024):

Yeah that's true, I'll do that.

Contributor Author (@tsanona, Aug 14, 2024):

So, I changed it slightly. I created these aliases:

T_PadConstantValues = float | tuple[float, float]
T_VarPadConstantValues = T_PadConstantValues | Mapping[Any, T_PadConstantValues]
T_DatasetPadConstantValues = T_VarPadConstantValues | Mapping[Any, T_VarPadConstantValues]

This way it is a bit more organized. I also changed the implementation of pad for variables so that it accepts Mapping[Any, float], keeping it consistent with the pad function for datasets.

What do you think?
(Also, let me know if the location in types.py and the import in dataset.py are correct. I don't fully understand when to import types under the if TYPE_CHECKING: block versus at module level in dataset.py.)

Collaborator:

Looks good, thanks!

(others may have refinements...)

) = None,
end_values: int | tuple[int, int] | Mapping[Any, tuple[int, int]] | None = None,
reflect_type: PadReflectOptions = None,
@@ -9204,9 +9209,11 @@ def pad(
(stat_length,) or int is a shortcut for before = after = statistic
length for all axes.
Default is ``None``, to use the entire axis.
constant_values : scalar, tuple or mapping of hashable to tuple, default: 0
Used in 'constant'. The values to set the padded values for each
axis.
constant_values : scalar, tuple, mapping of hashable to tuple or
mapping of hashable to mapping of hashable to tuple, default: 0
Collaborator:

Not necessarily for this PR, but do you happen to know why the default is listed as 0 here but is None in the function signature?

Contributor Author (@tsanona, Aug 14, 2024):

Hmm, not sure what happened there, but it is a good point: if left at None, the arrays are padded with np.nan. Should I change the default to nan or None?

Collaborator:

Or to 0?

Contributor Author:

Ah you mean changing the default and not the docstring?

Collaborator:

Ah sorry, so what's the actual value that goes onto the dataset, by default? I had originally thought it was 0, but does None / NaN actually get placed on the dataset? (I realize it's listed as the default in the signature)

Contributor Author:

I also wasn't sure when you first asked, so I created a test for the default state, and it does indeed populate the padding with np.nan when constant_values is left at None. So it is confusing to have default: 0 in the docstring. We can either change the docstring to None or NaN, or change the default in the function signature to 0 rather than None. I'm not sure which would be better; what do you think?

Collaborator:

Please change the docstrings; otherwise that's a breaking change!

Contributor Author (@tsanona, Aug 16, 2024):

Ah, of course. I will change it to default: None in the docstring.
I'm also changing it for end_values, which seems to have the same problem.
I also just realized it must have been copied from the docstring of np.pad, since it looks quite similar and np.pad's actual default is 0.
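The mismatch discussed above can be reproduced with plain NumPy. This is a minimal sketch of the behaviour, not xarray code: np.pad's own default fill is 0 (the likely origin of the stray "default: 0" in the docstring), while an explicit NaN fill mirrors what xarray does when constant_values is left at None.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

# np.pad's default constant fill is 0 -- the likely origin of the
# "default: 0" that was copied into the xarray docstring.
assert np.pad(a, 1, mode="constant")[0] == 0.0

# An explicit NaN fill mirrors what xarray does for constant_values=None
# (integer data is first promoted to a float dtype so NaN can be stored).
padded = np.pad(a, 1, mode="constant", constant_values=np.nan)
assert np.isnan(padded[0]) and np.isnan(padded[-1])
```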

Collaborator:

Great, thank you!

Used in 'constant'. The values to set the padded values for each data variable / axis.
``{var_1: {dim_1: (before_1, after_1), ... dim_N: (before_N, after_N)}, ...
var_M: (before, after)}`` unique pad constants per data variable.
``{dim_1: (before_1, after_1), ... dim_N: (before_N, after_N)}`` unique
pad constants along each dimension.
``((before, after),)`` yields same before and after constants for each
@@ -9292,6 +9299,12 @@ def pad(
if not pad_dims.intersection(xindexes.get_all_dims(k)):
indexes[k] = idx

per_data_var_constant_values = {}
if isinstance(constant_values, Mapping):
for k in self.data_vars:
if v := constant_values.pop(k, None):
per_data_var_constant_values[k] = v

for name, var in self.variables.items():
var_pad_width = {k: v for k, v in pad_width.items() if k in var.dims}
if not var_pad_width:
@@ -9301,7 +9314,9 @@
pad_width=var_pad_width,
mode=mode,
stat_length=stat_length,
constant_values=constant_values,
constant_values=per_data_var_constant_values.get(
name, constant_values
),
end_values=end_values,
reflect_type=reflect_type,
keep_attrs=keep_attrs,
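The per-variable dispatch added in the hunk above can be sketched as standalone Python (helper and variable names here are hypothetical, not xarray's). One detail worth noting: the diff's `if v := constant_values.pop(k, None)` both mutates the caller's mapping and skips falsy constants such as 0, so this sketch uses a non-mutating membership split instead.

```python
from collections.abc import Mapping

def split_constant_values(constant_values, data_vars):
    """Split a constant_values argument into per-variable and per-dimension parts.

    Non-mutating alternative to ``constant_values.pop(k, None)`` behind a
    walrus: it leaves the caller's mapping intact and keeps falsy constants
    such as 0.
    """
    if not isinstance(constant_values, Mapping):
        return {}, constant_values
    per_var = {k: v for k, v in constant_values.items() if k in data_vars}
    per_dim = {k: v for k, v in constant_values.items() if k not in data_vars}
    return per_var, per_dim

# One entry names a data variable, the other names a dimension.
per_var, per_dim = split_constant_values(
    {"var1": 0, "dim2": (42, 43)}, data_vars={"var1", "var2"}
)
assert per_var == {"var1": 0}
assert per_dim == {"dim2": (42, 43)}
```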
30 changes: 27 additions & 3 deletions xarray/tests/test_dataset.py
@@ -6689,17 +6689,41 @@ def test_polyfit_warnings(self) -> None:
ds.var1.polyfit("dim2", 10, full=True)
assert len(ws) == 1

def test_pad(self) -> None:
@pytest.mark.parametrize(
Collaborator:

Does this work if you want to pad along a dimension coordinate (i.e. a variable with the same name as its dimension)?

Contributor Author:

Good question, I'll investigate :D

Contributor Author:

So, as I understand it, most dims in the test dataset are dimension coordinates and they pad correctly, so I think so. In any case, I've extended the tests to pad all dimensions, just to be sure nothing behaves incorrectly. Let me know if I missed any case.

["constant_values", "expected"],
[
pytest.param(42, {"var1": 42}, id="numeric"),
pytest.param((42, 43), {"var1": (42, 43)}, id="tuple"),
pytest.param(
{"dim2": (42, 43)}, {"var1": (42, 43), "var2": (42, 43)}, id="per dim"
),
pytest.param(
{"var1": 42, "var2": (42, 43)},
{"var1": 42, "var2": (42, 43)},
id="per var",
),
pytest.param(
{"var1": 42, "dim2": (42, 43)},
{"var1": 42, "var2": (42, 43)},
id="mixed",
),
],
)
def test_pad(self, constant_values, expected) -> None:
ds = create_test_data(seed=1)
padded = ds.pad(dim2=(1, 1), constant_values=42)
padded = ds.pad(dim2=(1, 1), constant_values=constant_values)

assert padded["dim2"].shape == (11,)
assert padded["var1"].shape == (8, 11)
assert padded["var2"].shape == (8, 11)
assert padded["var3"].shape == (10, 8)
assert dict(padded.sizes) == {"dim1": 8, "dim2": 11, "dim3": 10, "time": 20}

np.testing.assert_equal(padded["var1"].isel(dim2=[0, -1]).data, 42)
for var, expected_value in expected.items():
np.testing.assert_equal(
np.unique(padded[var].isel(dim2=[0, -1]).data), expected_value
)
# np.testing.assert_equal(padded["var1"].isel(dim2=[0, -1]).data, 42)
np.testing.assert_equal(padded["dim2"][[0, -1]].data, np.nan)

@pytest.mark.parametrize(