-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Extend padding functionalities #9353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
7998afb
44e7d26
fd3e304
9f66ee2
2f33b40
b70b25d
d939030
48a972f
098706b
afda62d
8bdb6f1
992e20e
d3f0275
0a51b7c
3d40800
a848351
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6689,17 +6689,45 @@ def test_polyfit_warnings(self) -> None: | |
| ds.var1.polyfit("dim2", 10, full=True) | ||
| assert len(ws) == 1 | ||
|
|
||
| def test_pad(self) -> None: | ||
| @pytest.mark.parametrize( | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this work if you want to pad along a dimension coordinate (aka. a variable that is called the same as it's dimension)?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question, I'll investigate :D
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, as I understand it most dims in the test dataset are dimension coordinates and they pad correctly, so I think so. In any case I've extended the tests to pad all dimensions just to be sure nothing is behaving incorrectly. Let me know if I missed any case. |
||
| ["constant_values", "expected"], | ||
| [ | ||
| pytest.param(None, {"var1": np.nan}, id="default"), | ||
| pytest.param(42, {"var1": 42, "var2": 42}, id="scalar"), | ||
| pytest.param((42, 43), {"var1": (42, 43), "var2": (42, 43)}, id="tuple"), | ||
| pytest.param({"dim2": 42}, {"var1": 42, "var2": 42}, id="per dim scalar"), | ||
| pytest.param( | ||
| {"dim2": (42, 43)}, | ||
| {"var1": (42, 43), "var2": (42, 43)}, | ||
| id="per dim tuple", | ||
| ), | ||
| pytest.param( | ||
| {"var1": 42, "var2": (42, 43)}, | ||
| {"var1": 42, "var2": (42, 43)}, | ||
| id="per var", | ||
| ), | ||
| pytest.param( | ||
| {"var1": 42, "dim2": (42, 43)}, | ||
| {"var1": 42, "var2": (42, 43)}, | ||
| id="mixed", | ||
| ), | ||
| ], | ||
| ) | ||
| def test_pad(self, constant_values, expected) -> None: | ||
| ds = create_test_data(seed=1) | ||
| padded = ds.pad(dim2=(1, 1), constant_values=42) | ||
| padded = ds.pad(dim2=(1, 1), constant_values=constant_values) | ||
|
|
||
| assert padded["dim2"].shape == (11,) | ||
| assert padded["var1"].shape == (8, 11) | ||
| assert padded["var2"].shape == (8, 11) | ||
| assert padded["var3"].shape == (10, 8) | ||
| assert dict(padded.sizes) == {"dim1": 8, "dim2": 11, "dim3": 10, "time": 20} | ||
|
|
||
| np.testing.assert_equal(padded["var1"].isel(dim2=[0, -1]).data, 42) | ||
| for var, expected_value in expected.items(): | ||
| np.testing.assert_equal( | ||
| np.unique(padded[var].isel(dim2=[0, -1]).data), expected_value | ||
| ) | ||
| # np.testing.assert_equal(padded["var1"].isel(dim2=[0, -1]).data, 42) | ||
| np.testing.assert_equal(padded["dim2"][[0, -1]].data, np.nan) | ||
|
|
||
| @pytest.mark.parametrize( | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The typing claims that any mapping works, but here you are only checking dicts.
I would propose to change to
utils.is_dict_likeand find an alternative to pop (Mapping does not define pop because it is read-only). The simplest way would be to transform it into a dict first.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point, to be honest I just followed the way it is done in the implementation of
padfor the variables invariable.py. Maybe I should just change all these type hints todictrather thanMapping. What do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a typing perspective dict is always problematic because it is invariant.
I would try to stick with Mapping instead.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, so just I'll just change this one
isinstance(constant_values, dict)toutils.is_dict_like(constant_values). Thanks for the clarifications!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no.
For the check it should suffice, but the general Mapping class has no
popdefined because it is read-only (non-mutable).Anyway, I guess the user would find it weird that this input is changed in-place, so remove the
popmethod and replace it with a get item call (not sure if the Mapping ABC defines agetmethod).Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the check needs to be more thorough? It is totally possible to create
Datasetobjects with a data variable that has the same name as a dimension:so we could potentially break the original use case if we remove all data variables from the dictionary, because we could also remove a dimension name.
In general, I believe we need to be thorough in defining the new feature:
dictand decide it's the padding for a specific variable?Dataset, so either we need to specify a fallback (by dimension names), or we need to require specifying the padding for all affected variables (and raise otherwise).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very fair point, always forget that about dicts 🙃 , I'll figure out a different way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alrighty, changed, now the input
constant_valuesis not mutated in place.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a fair point, I have rewritten this section so the original
constant_variablesmapping is not mutated.Regarding the bullet points:
data varhas adimwith the same name and the user sets a value with this name in theconstant_valuesthen thedata varwill be padded with those values (regardless of the dim being padded) and all otherdata varsthat also contain thatdimwill be padded with that value, provided that it is thedimbeing padded (a.k.a. the dim provided inpad_width). In the tests I also show that one can mixdata varanddimin the same dict. In that case the value for thedata varhas priority over the value of thedim.dimsthat are being padded then they are be padded with 0, so I kept that. For example, if one sets aconstant_valueforvar1and is padding adimthat is contained in in othervars,var1will be padded with that value along thedimwhile all othervarswill be padded with 0 along the samedim.I hope I didn't make it too confusing. Let me know if it answers your questions and if the logic makes sense :D