Refactor the virtualfile_in function to accept more 1-D arrays #2744
pygmt/helpers/utils.py
Outdated
    def _validate_data_input(
        data=None, x=None, y=None, z=None, required_z=False, required_data=True, kind=None
    def validate_data_input(
I think it's more useful to pass the list of column names instead, i.e., replace ncols=2 with names=["x", "y"].
So, for most modules, vectors=[x, y] and names=["x", "y"], or vectors=[x, y, z] and names=["x", "y", "z"].
For more complicated modules like plot or plot3d, the names can be
names=["x", "y", "direction_arg1", "direction_arg2", "fill", "size", "symbol", "transparency"].
The column names will be very useful when the GMTInvalidInput exception is raised.
For example, instead of "Column 5 can't be None.", we can say "Column 5 ('size') can't be None.". Instead of "data must have at least 8 columns.", we can say:
data must have at least 8 columns:
x y direction_arg1 direction_arg2 fill size symbol transparency
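A minimal sketch of how column names could feed into the error messages suggested above. The names check_vectors and InvalidInputError are hypothetical stand-ins (the latter for pygmt's GMTInvalidInput); this is not the code from the PR, and the 0-based column index is an assumption chosen to match the "Column 5 ('size')" example.

```python
class InvalidInputError(ValueError):
    """Hypothetical stand-in for pygmt's GMTInvalidInput."""


def check_vectors(vectors, names):
    """Raise an error that names the offending column (illustrative only)."""
    if len(vectors) < len(names):
        raise InvalidInputError(
            f"Requires {len(names)} 1-D arrays but got {len(vectors)}: "
            + " ".join(names)
        )
    # 0-based index, so that 'size' is column 5 as in the example above.
    for index, (vector, name) in enumerate(zip(vectors, names)):
        if vector is None:
            raise InvalidInputError(f"Column {index} ('{name}') can't be None.")


names = ["x", "y", "direction_arg1", "direction_arg2",
         "fill", "size", "symbol", "transparency"]
vectors = [[1.0], [2.0], [0.0], [1.0], "red", None, "c", 50]

try:
    check_vectors(vectors, names)
except InvalidInputError as err:
    message = str(err)
print(message)  # Column 5 ('size') can't be None.
```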
Done in f37413b
    if len(vectors) < len(names):
        raise GMTInvalidInput(
            f"Requires {len(names)} 1-D arrays but got {len(vectors)}."
        )
Missing unit test for this if-condition.
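A test for this branch might look roughly like the sketch below. It uses the standard-library unittest so that it runs on its own; in a pytest-based suite the same check would be written with pytest.raises. The function validate_vectors and the exception InvalidInputError are hypothetical stand-ins that only reproduce the condition shown in the diff above.

```python
import unittest


class InvalidInputError(Exception):
    """Hypothetical stand-in for pygmt's GMTInvalidInput."""


def validate_vectors(vectors, names):
    """Reproduces only the if-condition under review (illustrative)."""
    if len(vectors) < len(names):
        raise InvalidInputError(
            f"Requires {len(names)} 1-D arrays but got {len(vectors)}."
        )


class TestValidateVectors(unittest.TestCase):
    def test_too_few_vectors(self):
        # Two arrays passed where three column names are expected.
        with self.assertRaisesRegex(
            InvalidInputError, r"Requires 3 1-D arrays but got 2\."
        ):
            validate_vectors([[1, 2], [3, 4]], ["x", "y", "z"])

    def test_enough_vectors(self):
        # Matching counts must not raise.
        validate_vectors([[1, 2], [3, 4]], ["x", "y"])
```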
    if len(data.shape) == 1 and data.shape[0] < len(names):
        raise GMTInvalidInput(msg)
Missing unit test for this if-condition.
    vectors, names = [x, y], "xy"
    if z is not None:
        vectors.append(z)
Need to append 'z' to names here? Also, need a unit test for this if-condition.
Actually no. The problem is that project requires only two columns, but it also accepts three or more. Currently, the names variable serves two purposes: (1) the names of the passed columns; (2) the number of required columns. So, if we append z to names here, calling pygmt.project(data=data) will fail if data has only two columns. I think we still need to maintain a separate variable for the number of required columns.
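One way to keep the two roles apart, sketched under assumptions: names lists every column the module can label, while a separate count says how many are mandatory. The names validate_columns, required, and InvalidInputError are hypothetical and do not come from the PR.

```python
class InvalidInputError(ValueError):
    """Hypothetical stand-in for pygmt's GMTInvalidInput."""


def validate_columns(ncols_given, names, required):
    """Check the column count against `required`, not against len(names).

    names    : labels for all columns the module understands
    required : minimum number of columns that must be present
    """
    if ncols_given < required:
        raise InvalidInputError(
            f"data must have at least {required} columns: "
            + " ".join(names[:required])
        )


# A project-like module: x and y are mandatory, z is optional.
# A two-column table passes even though three names are known.
validate_columns(2, names=["x", "y", "z"], required=2)
```

With this split, appending "z" to names no longer raises the minimum column count, so a two-column table stays valid.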
    kind = data_kind(data, required=required_data)
    validate_data_input(
        data=data,
        vectors=vectors,
        names=names,
        required_data=required_data,
        kind=kind,
    )
The validation checks have been moved from within data_kind to virtualfile_from_data here. But in plot.py, we actually use data_kind on its own here:
Line 217 in 3076ddc
    kind = data_kind(data, x, y)
Are we ok with raising GMTInvalidInput much later here in virtualfile_from_data (after all the keyword argument parsing), rather than early on in data_kind?
Closing this PR since it will be superseded by #3369.
Description of proposed changes
Here are the current definitions of the virtualfile_from_data method and the data_kind function:
pygmt/pygmt/clib/session.py
Lines 1473 to 1483 in c9d6147
pygmt/pygmt/helpers/utils.py
Line 110 in c9d6147
When I started issue #2731, I realized the current function definitions have some limitations:
- binstats usually requires 3 columns (x/y/z), but requires only 2 columns (x/y) if -Cn is used, and 4 columns (x/y/z/w) if -W is used. I don't think we want to add w=None and required_w=False to these functions. Also, we don't check whether the input table has the required number of columns.
- The data_kind function does three things: (1) determines the kind of the input data; (2) checks whether the data/x/y/z combination is valid; and (3) checks whether matrix-type data has 3 columns. The data_kind function is called inside virtualfile_from_data, but sometimes we also need to know the data kind when wrapping GMT modules, for example in Figure.plot and Figure.plot3d. This means the data_kind function is called twice, which is unnecessary.
Solutions:
- Refactor the virtualfile_from_data function to take arguments like data=None, vectors=None, names=["x", "y"]. vectors is a list of vectors (e.g., vectors=[x, y]) and names is a list of column names. The wrappers are responsible for preparing the list of 1-D arrays (vectors) and the column names (names).
- Let data_kind focus on determining the data kind, and have a separate function check whether the input data/vectors are valid.
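The proposed split could be sketched as follows. This is a simplified illustration, not the PR's implementation: the kind classification is reduced to three cases, and InvalidInputError and wrap_module are hypothetical names. data_kind only classifies, validation lives in its own function, and the wrapper calls data_kind once and passes the result on.

```python
class InvalidInputError(ValueError):
    """Hypothetical stand-in for pygmt's GMTInvalidInput."""


def data_kind(data=None):
    """Classify the input only; no validation (heavily simplified)."""
    if data is None:
        return "vectors"
    if isinstance(data, str):
        return "file"
    return "matrix"


def validate_data_input(data=None, vectors=None, names=("x", "y"), kind=None):
    """Check that the data/vectors combination is usable (illustrative)."""
    if kind == "vectors":
        if vectors is None or len(vectors) < len(names):
            got = 0 if vectors is None else len(vectors)
            raise InvalidInputError(
                f"Requires {len(names)} 1-D arrays but got {got}."
            )
        for name, vector in zip(names, vectors):
            if vector is None:
                raise InvalidInputError(f"Column '{name}' can't be None.")
    elif vectors is not None and any(v is not None for v in vectors):
        raise InvalidInputError("Pass either data or 1-D arrays, not both.")


def wrap_module(data=None, vectors=None, names=("x", "y")):
    """Hypothetical wrapper: the data kind is determined exactly once."""
    kind = data_kind(data)
    validate_data_input(data=data, vectors=vectors, names=names, kind=kind)
    return kind


# A binstats-like call with weights needs four columns (x/y/z/w).
x, y, z, w = [1, 2], [3, 4], [5, 6], [1, 1]
kind = wrap_module(vectors=[x, y, z, w], names=["x", "y", "z", "w"])
```

Because the wrapper supplies names, a fourth column such as w needs no new w=None or required_w=False parameters.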