Closed
Changes from 2 commits
47 changes: 47 additions & 0 deletions pygmt/helpers/utils.py
@@ -11,6 +11,8 @@
from contextlib import contextmanager

import xarray as xr
import numpy as np
import pandas as pd
from pygmt.exceptions import GMTInvalidInput


@@ -267,3 +269,48 @@ def args_in_kwargs(args, kwargs):
        If one of the required arguments is in ``kwargs``.
    """
    return any(arg in kwargs for arg in args)


def return_table(result, data_format, format_parameter, df_columns):
Member

Looks promising! A few comments:

  1. I think you will need to use this with one of the existing functions so that we can test it out in this PR. In my opinion, grdtrack would be a good option.
  2. My preference would be to add a `format_options` parameter to `return_table`. This could take a list, with default values including `numpy`, `pandas`, and `str`. This way, if returning string values doesn't make sense for a particular function, for example, that option could be left out of that function's documentation and implementation.
  3. Table-like options for input are now `str or numpy.ndarray or pandas.DataFrame or xarray.Dataset or geopandas.GeoDataFrame`. It would be nice if all of these were output options too.
  4. I would prefer a more descriptive argument for the requested data format than **a**|**d**|**s**. Something like `numpy`|`pandas`|`str` seems more readable.
  5. I think `df_columns` needs to be optional.
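
A minimal sketch of suggestions 2 and 4 together, assuming descriptive format names and a hypothetical `format_options` tuple that each calling module could narrow. The name `return_table_sketch`, the default `format_parameter="output_type"`, and the local `GMTInvalidInput` stand-in are illustrative only, not code from this PR.

```python
import numpy as np
import pandas as pd


class GMTInvalidInput(Exception):
    """Local stand-in for pygmt.exceptions.GMTInvalidInput."""


def return_table_sketch(
    result,
    data_format="numpy",
    format_parameter="output_type",  # hypothetical parameter name
    df_columns=None,
    format_options=("numpy", "pandas", "str"),
):
    """Hypothetical variant of return_table with a format_options parameter."""
    # Reject any format the calling module has chosen not to offer
    if data_format not in format_options:
        raise GMTInvalidInput(
            f"Must specify {format_parameter} as one of: "
            + ", ".join(format_options)
        )
    if data_format == "str":
        return result
    # Parse each line of the GMT text output into a row of floats
    data_array = np.array(
        [
            [float(value) for value in line.split()]
            for line in result.strip().split("\n")
        ]
    )
    if data_format == "numpy":
        return data_array
    if data_format == "pandas":
        return pd.DataFrame(data_array, columns=df_columns)
    # Allowed by format_options but not implemented in this sketch
    raise NotImplementedError(f"Unsupported data_format: {data_format}")


table = "0 1 2\n3 4 5\n"
frame = return_table_sketch(table, "pandas", df_columns=["x", "y", "z"])
```

A module that should never return raw text could call it with `format_options=("numpy", "pandas")`, so that passing `"str"` raises `GMTInvalidInput` with a message listing only the formats that module supports.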

Contributor Author

> 3. table-like options for input are now `str or numpy.ndarray or pandas.DataFrame or xarray.Dataset or geopandas.GeoDataFrame`. It would be nice if all of these were output options too.

Sounds good; I'll have to read up on the last two, but I don't see why they should be a problem.
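
For the `xarray.Dataset` case, one possible route (an assumption, not this PR's implementation) is to build the DataFrame first and call `pandas.DataFrame.to_xarray()`, which turns each column into a data variable along an `index` dimension. A `geopandas.GeoDataFrame` could be built in a similar spirit with `geopandas.points_from_xy`, which is left out here. `table_to_dataset` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
import xarray as xr  # required by DataFrame.to_xarray()


def table_to_dataset(data_array, columns):
    """Hypothetical helper: wrap a 2-D table as an xarray.Dataset."""
    # Build a DataFrame first so each column becomes a named data variable
    frame = pd.DataFrame(data_array, columns=columns)
    # to_xarray() turns the unnamed DataFrame index into an "index" dimension
    return frame.to_xarray()


data = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
ds = table_to_dataset(data, ["x", "y", "z"])
assert isinstance(ds, xr.Dataset)
```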

> 4. I would prefer a more descriptive argument for requested data format than **a**|**d**|**s**. Something like `numpy`|`pandas`|`str` seems more readable.

I like the idea of keeping it short, especially when there is a default option (I anticipate it being a numpy array) and the strings are not also the same word as Python modules or variable types. But I understand how the single letters could be confusing.

> 5. I think `df_columns` needs to be optional.

Since this is a helper function, I envisioned that the `df_columns` argument would be set by the PyGMT function that uses it, such as passing `["x", "y", "z"]` when calling this function inside grd2xyz.
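
The two positions are compatible: if `df_columns` defaulted to `None`, a caller such as grd2xyz could still pass `["x", "y", "z"]` while other callers omit it, because the `pandas.DataFrame` constructor accepts `columns=None` and then labels the columns 0, 1, 2, and so on. A quick check, assuming a three-column table:

```python
import numpy as np
import pandas as pd

data_array = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])

# Column names supplied by the calling function, e.g. grd2xyz
named = pd.DataFrame(data_array, columns=["x", "y", "z"])
# No names supplied: pandas falls back to integer labels
unnamed = pd.DataFrame(data_array, columns=None)

print(list(named.columns))  # ['x', 'y', 'z']
print(list(unnamed.columns))  # [0, 1, 2]
```

Note that a name list whose length does not match the number of columns makes pandas raise `ValueError`, so callers still need to pass the right number of names.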

Contributor Author

@willschlitzer, Jun 25, 2021

> 3. table-like options for input are now `str or numpy.ndarray or pandas.DataFrame or xarray.Dataset or geopandas.GeoDataFrame`. It would be nice if all of these were output options too.

Added in c8cef5e

Member

> 4. I would prefer a more descriptive argument for requested data format than **a**|**d**|**s**. Something like `numpy`|`pandas`|`str` seems more readable.

> I like the idea of keeping it short, especially when there is a default option (I anticipate it being a numpy array) and the strings are not also the same word as Python modules or variable types. But I understand how the single letters could be confusing.

I'll have to agree with Meghan that long descriptive names like numpy|pandas|str are preferable 🙂

r"""
Take the table output from the GMT API and return it as either a string,
array, or DataFrame.

Parameters
----------
result : str
The table returned from the GMT API as a string.
data_format : str
A single-letter string that specifies requested data format of the
table.
**a** : numpy array
**d** : pandas DataFrame
**s** : string
format_parameter : str
The name of the parameter used to specify the data format in the
pygmt function. This name is used when raising the GMTInvalidInput
error to ensure module-specific parameters are consistent with the
error raised.
df_columns : list
The column names of the returned pandas DataFrame.
"""

if data_format == "s":
return result
data_list = []
for string_entry in result.strip().split("\n"):
float_entry = []
string_list = string_entry.strip().split()
for i in string_list:
float_entry.append(float(i))
data_list.append(float_entry)
data_array = np.array(data_list)
if data_format == "a":
result = data_array
elif data_format == "d":
result = pd.DataFrame(data_array, columns=df_columns)
else:
raise GMTInvalidInput(
f"""Must specify {format_parameter} as either a, d, or s."""
)
return result
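
A possible simplification, not part of this PR: the manual parsing loop could be replaced by `numpy.loadtxt` reading from an in-memory `io.StringIO`. Passing `ndmin=2` keeps a single-row table two-dimensional, matching the list-of-lists approach. One behavioral difference to weigh: `loadtxt` skips lines starting with `#` by default (`comments="#"`), whereas `float()` would fail on them. `parse_table` is a hypothetical name.

```python
import io

import numpy as np


def parse_table(result):
    """Hypothetical helper: parse whitespace-delimited GMT text output."""
    # ndmin=2 keeps a one-row table as shape (1, n) instead of (n,)
    return np.loadtxt(io.StringIO(result), ndmin=2)


gmt_output = "0.5 1.0 10.2\n1.5 2.0 11.7\n"
print(parse_table(gmt_output).shape)  # (2, 3)
```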