Skip to content

Conversation

@seisman
Copy link
Member

@seisman seisman commented Oct 16, 2024

In the Session.virtualfile_in method, we must prepare for _data that will be passed to different virtualfile_from_* methods. Currently, _data must be an iterable (e.g., _data = (data, ) even for the simplest case) and when pass it to _virtualfile_from, we need to unpack it like:

_virtualfile_from(*data)

The complexity is because, while most _virtualfile_from_* methods only take a single argument, the _virtualfile_from_vectors takes multiple arguments. The _virtualfile_from_vectors is defined like _virtualfile_from_vectors(*vectors) and must be called like _virtualfile_from_vectors(x, y, z).

This PR changes its definition to _virtualfile_from_vectors(vectors), and now it should be called like _virtualfile_from_vectors((x, y, z)).

With this breaking change, the virtualfile_in function can be simplified slightly, which I think is more readable. Then the question is, is it worth a breaking change?

Edit: Actually it's possible to introduce the new syntax without breaking backward compatibility. This is done in f5b0e8e.

@seisman seisman added maintenance Boring but important stuff for the core devs needs review This PR has higher priority and needs review. labels Oct 16, 2024
@seisman seisman force-pushed the refactor/virtualfile_from_vectors branch from a73a4bc to f5b0e8e Compare October 16, 2024 13:32
@seisman seisman changed the title POC: **Breaking**: clib.Session.virtualfile_from_vectors now takes only one argument **Deprecation**: clib.Session.virtualfile_from_vectors now takes only one argument Oct 16, 2024
@seisman seisman added this to the 0.14.0 milestone Oct 16, 2024
Comment on lines +1805 to -1786
# "_data" is the data that will be passed to the _virtualfile_from function.
# "_data" defaults to "data" but should be adjusted for some cases.
_data = data
match kind:
case "arg" | "file" | "geojson" | "grid" | "image" | "stringio":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just put _data = data in the case _ below? I'd actually prefer if we make it more explicit like:

case "arg" | "file" | "geojson" | "grid" | "stringio":
    _data = data

So that if there's a new kind in the future, we know to add it explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually prefer if we make it more explicit like:

case "arg" | "file" | "geojson" | "grid" | "stringio":
    _data = data

So that if there's a new kind in the future, we know to add it explicitly.

We can't do it like this, because we're currently writing case statements like case "image" if data.dtype != "uint8":, which only catches the case when kind="image" and data.dtype != "uint8".

We can refactor the codes to:

case "arg" | "file" | "geojson" | "grid" | "stringio":
    _data = data
case "image":
    _data = data
    if data.dtype != "uint8":
        # do something
case "matrix":
    _data = data
    if data.dtype.kind not in "iuf":
    # do something

or

case "arg" | "file" | "geojson" | "grid" | "stringio":
    _data = data
case "image" if data.dtype != "uint8":
    raise ...
case "matrix" if data.dtype.kind not in "iuf":
    ...
case _:
    _data = data # this is necessary to catch cases like kind="matrix" and dtype in "iuf"  

I still prefer the current codes, which set _data to data by default and the match-case statements only need to deal with exceptions.

So that if there's a new kind in the future, we know to add it explicitly.

If there is a new kind in the future, _data=data is still the default. If needs special handling and we forget to add it, then the related tests will fail anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, that makes sense, let's keep _data = data at the top then.

@seisman seisman marked this pull request as ready for review October 17, 2024 06:53
@seisman seisman changed the title **Deprecation**: clib.Session.virtualfile_from_vectors now takes only one argument **Deprecation**: clib.Session.virtualfile_from_vectors now takes a sequence of vectors Oct 17, 2024
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting very close to (finally) implementing #1076!

I've changed the label from 'maintenance' to 'deprecation'. Should we mention this is scheduled for removal in PyGMT v0.16.0?

@weiji14 weiji14 added deprecation Deprecating a feature and removed maintenance Boring but important stuff for the core devs labels Oct 21, 2024
@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Oct 21, 2024
@seisman seisman changed the title **Deprecation**: clib.Session.virtualfile_from_vectors now takes a sequence of vectors clib.Session.virtualfile_from_vectors: Now takes a sequence of vectors as its single argument (Passing multiple arguments will be unsupported in v0.16.0) Oct 21, 2024
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Oct 22, 2024
@seisman seisman merged commit 7eb93f2 into main Oct 22, 2024
2 checks passed
@seisman seisman deleted the refactor/virtualfile_from_vectors branch October 22, 2024 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deprecation Deprecating a feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants