POC: Wrap GMT_Read_Data and read datasets/grids/images into GMT data container #3318

seisman · 2024-07-08T05:35:53Z

Description of proposed changes

This PR wraps the GMT API function GMT_Read_Data.

Currently, this PR contains 4 commits:

c55dd0c: Wrap the GMT_Read_Data function
7e71225: Add tests for reading datasets/grids into GMT container _GMT_DATASET/_GMT_GRID
8f46ad3: Merge PR #3128 (GMT_IMAGE: Implement the GMT_IMAGE.to_dataarray method for 3-band images) into this branch, with minor fixes, so that we can test the behavior for _GMT_IMAGE
fa37838: Add tests for reading images

All new tests pass. The most important thing I learned is that an input grid or image can always be read into a GMT_IMAGE container: for a grid, header.n_bands=1, and for an image, header.n_bands=3 (or any other value). The GMT_Read_Data API function can read a grid/image in either one step or two steps, where "two steps" means reading the header first and then reading the data. Reading only the header is very efficient (a few milliseconds), even for huge grid/image files. Thus, we can read the grid/image header and check header.n_bands to determine the input data kind, which addresses the concerns in #3115 (comment).
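As a rough illustration of the header-based check described above, here is a minimal sketch in plain Python. The `Header` class and the `kind_from_header` function are hypothetical stand-ins invented for this example; they are not part of PyGMT or the GMT API, and the real header would come from a header-only call to GMT_Read_Data.

```python
from dataclasses import dataclass


@dataclass
class Header:
    """Hypothetical stand-in for the header of a GMT_IMAGE container."""

    n_bands: int


def kind_from_header(header: Header) -> str:
    """
    Guess the input data kind from the number of bands in the header.

    Assumption taken from the description above: a grid reports
    n_bands == 1, while an image reports 3 (or any other value).
    """
    return "grid" if header.n_bands == 1 else "image"


# Usage: pretend the header was read in the cheap "header only" step.
print(kind_from_header(Header(n_bands=1)))
print(kind_from_header(Header(n_bands=3)))
```

The point of the design is that only the inexpensive header read is needed before deciding which container to use for the full read.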

So, the next steps are:

1. Keep the first two commits only, discard the 3rd commit (already in GMT_IMAGE: Implement the GMT_IMAGE.to_dataarray method for 3-band images #3128), and back up the tests in the 4th commit.
2. Polish the GMT_Read_Data wrapper and the tests for datasets/grids, and finalize the wrapper in #3324 (Wrap the GMT API function GMT_Read_Data to read data into GMT data containers).
3. Implement the idea of reading the grid/image header to determine the input data kind. We should then be able to finish and merge #3115 (**BREAKING** pygmt.grdcut: Refactor to store output in virtualfiles for grids).
4. Break #3128 (GMT_IMAGE: Implement the GMT_IMAGE.to_dataarray method for 3-band images) into two PRs. The first PR contains only the definition and fields of the GMT_IMAGE wrapper (lines 1-71 of pygmt/datatypes/image.py in #3128, even without any doctests), and the second PR contains the remaining WIP code. Then review/merge the first PR and work on the second PR in the future.
5. After merging the first PR in point 4, we should be able to add all the tests in fa37838.

I plan to work on the above steps in separate PRs and keep this PR unchanged so that we can trace back to this PR in the future.


seisman · 2024-07-08T14:44:35Z

pygmt/clib/session.py

+    def read_data(
+        self,
+        family: str,
+        geometry: str,
+        mode: str,
+        wesn: Sequence[float] | None,
+        infile: str,
+        data=None,
+    ):


The definition of the method follows the definition of the API function GMT_Read_Data. We need to call it like this:

lib.read_data("GMT_IS_DATASET", "GMT_IS_PLP", "GMT_READ_NORMAL", None, infile, None)
lib.read_data("GMT_IS_GRID", "GMT_IS_SURFACE", "GMT_READ_NORMAL", None, infile, None)
lib.read_data("GMT_IS_IMAGE", "GMT_IS_SURFACE", "GMT_READ_NORMAL", None, infile, None)

but they look really boring and weird.

I prefer to refactor the function definition like this:

lib.read_data(infile, kind="dataset")
lib.read_data(infile, kind="grid")
lib.read_data(infile, kind="image")

This is similar to the gmt read infile outfile -Tc|d|g|i|p|u syntax of the special read/write module. For reference, the read module mainly calls the gmt_copy function, and gmt_copy calls GMT_Read_Data with different arguments depending on the data family.

Anyway, this should be discussed in more detail later.
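To make the proposed refactoring concrete, here is a minimal sketch of how a `kind` argument could be translated into the positional arguments of the three low-level calls above. The helper name `read_data_args` and the `_KIND_TO_ARGS` table are hypothetical and only illustrate the mapping; the actual wrapper may end up looking different.

```python
from collections.abc import Sequence
from typing import Optional

# Family/geometry pairs copied from the three low-level calls above.
_KIND_TO_ARGS = {
    "dataset": ("GMT_IS_DATASET", "GMT_IS_PLP"),
    "grid": ("GMT_IS_GRID", "GMT_IS_SURFACE"),
    "image": ("GMT_IS_IMAGE", "GMT_IS_SURFACE"),
}


def read_data_args(
    infile: str,
    kind: str = "dataset",
    wesn: Optional[Sequence[float]] = None,
) -> tuple:
    """
    Build the (family, geometry, mode, wesn, infile, data) arguments
    for the low-level read_data call from a friendly ``kind`` name.

    Hypothetical helper for illustration only.
    """
    if kind not in _KIND_TO_ARGS:
        raise ValueError(f"Invalid kind {kind!r}; expected one of {sorted(_KIND_TO_ARGS)}.")
    family, geometry = _KIND_TO_ARGS[kind]
    return (family, geometry, "GMT_READ_NORMAL", wesn, infile, None)


# Usage: the tuple could then be unpacked into the low-level call.
print(read_data_args("input.nc", kind="grid"))
```

Keeping the mapping in one table means callers never need to remember which geometry goes with which family.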

seisman · 2024-07-08T14:45:02Z

pygmt/clib/session.py

    @contextlib.contextmanager
    def virtualfile_out(
-        self, kind: Literal["dataset", "grid"] = "dataset", fname: str | None = None
+        self,


The changes below are copied from #3128, so they can be ignored when reviewing this PR.

seisman · 2024-07-08T14:45:26Z

pygmt/datatypes/__init__.py


 from pygmt.datatypes.dataset import _GMT_DATASET
 from pygmt.datatypes.grid import _GMT_GRID
+from pygmt.datatypes.image import _GMT_IMAGE


Copied from #3128

seisman · 2024-07-08T14:45:29Z

pygmt/datatypes/header.py

        """
        attrs: dict[str, Any] = {}
-        attrs["Conventions"] = "CF-1.7"
+        if self.type == 18:  # Grid file format: ns = GMT netCDF format


Copied from #3128

seisman · 2024-07-08T14:45:33Z

pygmt/datatypes/image.py

@@ -0,0 +1,182 @@
+"""


Copied from #3128 and can be ignored when reviewing this PR.

seisman · 2024-07-08T14:47:13Z