clib: Add virtualfile_to_dataset method for converting virtualfile to a dataset #3083
@@ -1738,6 +1738,119 @@ def read_virtualfile(

```python
        dtype = {"dataset": _GMT_DATASET, "grid": _GMT_GRID}[kind]
        return ctp.cast(pointer, ctp.POINTER(dtype))

    def return_table(
        self,
        output_type: Literal["pandas", "numpy", "file"],
```
Review thread on the `output_type` line:

Member: Should we set a default output type here? It looks like we're using

Suggested change

Member (Author): It makes no difference because we always call the function with the

Member: Yes, it doesn't make any difference in the PyGMT modules, but this is a good central location to document that

Member: Yeah, not saying that
```python
        vfile: str,
        column_names: list[str] | None = None,
    ) -> pd.DataFrame | np.ndarray | None:
        """
        Return an output table from a virtual file based on the output type.

        Parameters
        ----------
        output_type
            Desired output type of the result data.

            - ``"pandas"`` will return a :class:`pandas.DataFrame` object.
            - ``"numpy"`` will return a :class:`numpy.ndarray` object.
            - ``"file"`` means the result was saved to a file and will return ``None``.
        vfile
            The virtual file name that stores the result data. Required for ``"pandas"``
            and ``"numpy"`` output type.
        column_names
            The column names for the :class:`pandas.DataFrame` output.

        Returns
        -------
        table
            The output table. If ``output_type="file"`` returns ``None``.

        Examples
        --------
        >>> from pathlib import Path
        >>> import numpy as np
        >>> import pandas as pd
        >>>
        >>> from pygmt.helpers import GMTTempFile
        >>> from pygmt.clib import Session
        >>>
        >>> with GMTTempFile(suffix=".txt") as tmpfile:
        ...     # prepare the sample data file
        ...     with open(tmpfile.name, mode="w") as fp:
        ...         print(">", file=fp)
        ...         print("1.0 2.0 3.0 TEXT1 TEXT23", file=fp)
        ...         print("4.0 5.0 6.0 TEXT4 TEXT567", file=fp)
        ...         print(">", file=fp)
        ...         print("7.0 8.0 9.0 TEXT8 TEXT90", file=fp)
        ...         print("10.0 11.0 12.0 TEXT123 TEXT456789", file=fp)
        ...
        ...     # file output
        ...     with Session() as lib:
        ...         with GMTTempFile(suffix=".txt") as outtmp:
        ...             with lib.virtualfile_out(
        ...                 kind="dataset", fname=outtmp.name
        ...             ) as vouttbl:
        ...                 lib.call_module("read", f"{tmpfile.name} {vouttbl} -Td")
        ...                 result = lib.return_table(output_type="file", vfile=vouttbl)
        ...                 assert result is None
        ...                 assert Path(outtmp.name).stat().st_size > 0
        ...
        ...     # numpy output
        ...     with Session() as lib:
        ...         with lib.virtualfile_out(kind="dataset") as vouttbl:
        ...             lib.call_module("read", f"{tmpfile.name} {vouttbl} -Td")
        ...             outnp = lib.return_table(output_type="numpy", vfile=vouttbl)
        ...             assert isinstance(outnp, np.ndarray)
        ...
        ...     # pandas output
        ...     with Session() as lib:
        ...         with lib.virtualfile_out(kind="dataset") as vouttbl:
        ...             lib.call_module("read", f"{tmpfile.name} {vouttbl} -Td")
        ...             outpd = lib.return_table(output_type="pandas", vfile=vouttbl)
        ...             assert isinstance(outpd, pd.DataFrame)
        ...
        ...     # pandas output with specified column names
        ...     with Session() as lib:
        ...         with lib.virtualfile_out(kind="dataset") as vouttbl:
        ...             lib.call_module("read", f"{tmpfile.name} {vouttbl} -Td")
        ...             outpd2 = lib.return_table(
        ...                 output_type="pandas",
        ...                 vfile=vouttbl,
        ...                 column_names=["col1", "col2", "col3", "coltext"],
        ...             )
        ...             assert isinstance(outpd2, pd.DataFrame)
        >>> outnp
        array([[1.0, 2.0, 3.0, 'TEXT1 TEXT23'],
               [4.0, 5.0, 6.0, 'TEXT4 TEXT567'],
               [7.0, 8.0, 9.0, 'TEXT8 TEXT90'],
               [10.0, 11.0, 12.0, 'TEXT123 TEXT456789']], dtype=object)
        >>> outpd
              0     1     2                   3
        0   1.0   2.0   3.0        TEXT1 TEXT23
        1   4.0   5.0   6.0       TEXT4 TEXT567
        2   7.0   8.0   9.0        TEXT8 TEXT90
        3  10.0  11.0  12.0  TEXT123 TEXT456789
        >>> outpd2
           col1  col2  col3             coltext
        0   1.0   2.0   3.0        TEXT1 TEXT23
        1   4.0   5.0   6.0       TEXT4 TEXT567
        2   7.0   8.0   9.0        TEXT8 TEXT90
        3  10.0  11.0  12.0  TEXT123 TEXT456789
        """
        if output_type == "file":  # Already written to file, so return None
            return None

        # Read the virtual file as a GMT dataset and convert to pandas.DataFrame
        result = self.read_virtualfile(vfile, kind="dataset").contents.to_dataframe()
        if output_type == "numpy":  # numpy.ndarray output
            return result.to_numpy()

        # Assign column names
        if column_names is not None:
            result.columns = column_names
        return result  # pandas.DataFrame output

    def extract_region(self):
        """
        Extract the WESN bounding box of the currently active figure.
```
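The branch logic at the end of `return_table` can be tried without a GMT session. The sketch below is a minimal, hypothetical stand-in: `convert_table` is not part of PyGMT, and its `table` argument plays the role of the DataFrame the method gets from `self.read_virtualfile(vfile, kind="dataset").contents.to_dataframe()`. The default `output_type="pandas"` is only an assumption, illustrating the default asked about in the `output_type` review thread; the diff itself has no default.

```python
from __future__ import annotations

from typing import Literal

import numpy as np
import pandas as pd


def convert_table(
    table: pd.DataFrame,
    output_type: Literal["pandas", "numpy", "file"] = "pandas",  # assumed default
    column_names: list[str] | None = None,
) -> pd.DataFrame | np.ndarray | None:
    """Hypothetical stand-in for the output-type dispatch in ``return_table``."""
    if output_type == "file":  # data were already written to a file
        return None
    if output_type == "numpy":  # float + text columns give an object-dtype array
        return table.to_numpy()
    # Anything else falls through to pandas output, as in the diff.
    if column_names is not None:  # only the pandas output gets renamed
        table = table.copy()  # sketch-only choice: leave the caller's frame intact
        table.columns = column_names
    return table


df = pd.DataFrame([[1.0, 2.0, 3.0, "TEXT1 TEXT23"], [4.0, 5.0, 6.0, "TEXT4 TEXT567"]])
print(convert_table(df, output_type="file"))  # None
print(convert_table(df, output_type="numpy").dtype)  # object
print(convert_table(df, column_names=["col1", "col2", "col3", "coltext"]))
```

Unlike the method in the diff, which assigns `result.columns` in place on a freshly created DataFrame, the sketch copies first because its input belongs to the caller.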
Review thread on the method name:

The method name `return_table` was initially proposed in #1318 (comment), but is it a good name? At line 1845, we use `kind="dataset"`, so maybe rename it to `return_dataset` or `output_dataset`? In the future, I think we will add more methods that return a grid/image/cpt/cube, so consistent method names are preferred.

I think `return_dataset` is better than `return_table`. Renamed in ce029b2.

How about `virtualfile_to_dataset`, or `vfile_to_dataset` (shorter)? It is a bit more explicit that a conversion is happening from a `virtualfile` to a tabular `dataset`.

`virtualfile_to_dataset`/`vfile_to_dataset` sound like good names. I like `vfile_to_dataset`, but do we also want to rename other functions for consistency?

You mean rename the `virtualfile_from_*` methods? Let's maybe not do that (lazy to deprecate more names). We can go with `virtualfile_to_dataset` for consistency.

I meant renaming `virtualfile_in`/`virtualfile_out` to `vfile_in`/`vfile_out`. The `virtualfile_from_*` functions are no longer used in our PyGMT wrappers after #949 ("Universal virtualfile_from_data function to replace virtualfile_from_grid/matrix/vectors"), so it's OK to keep them unchanged. `virtualfile_out` was added in #3057 ("clib: Add the virtualfile_out method for creating output virtualfile"), so we can still rename it without deprecations. `virtualfile_in` was renamed from `virtualfile_from_data` recently (#3068, "clib: Rename the virtualfile_from_data method to virtualfile_in"), so we can rename it to `vfile_in` without more deprecations.

Let's stick with the long name (`virtualfile_to_dataset`). Renaming `virtualfile_out` -> `vfile_out` and `virtualfile_in` -> `vfile_in` will be more work and confusing (we'll need to manually update the changelog to track that `virtualfile_from_data` became `virtualfile_in`, which became `vfile_in`).

Done in 711142c.
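The reluctance to "deprecate more names" comes from the fact that keeping an old public name usually means shipping a thin alias that warns and forwards to the new one. The sketch below shows that generic Python pattern; `ToySession` is a hypothetical stand-in rather than `pygmt.clib.Session`, the method bodies are placeholders, and using `FutureWarning` as the category is an assumption, not a description of how PyGMT handled the `virtualfile_from_data` rename.

```python
import warnings


class ToySession:
    """Hypothetical stand-in for a class whose method was renamed."""

    def virtualfile_in(self, data):
        # New name: the real work would live here (placeholder in this sketch).
        return f"virtual file for {data!r}"

    def virtualfile_from_data(self, data):
        # Old name kept as a thin alias: warn, then forward to the new name.
        warnings.warn(
            "virtualfile_from_data is deprecated; use virtualfile_in instead.",
            category=FutureWarning,
            stacklevel=2,  # point the warning at the caller, not at this alias
        )
        return self.virtualfile_in(data)


lib = ToySession()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    result = lib.virtualfile_from_data("table.txt")
print(result)  # virtual file for 'table.txt'
print(caught[0].category.__name__)  # FutureWarning
```

Each alias like this has to be documented, tested, and eventually removed, which is the maintenance cost weighed in the thread when choosing to keep the existing `virtualfile_in`/`virtualfile_out` names.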