Skip to content

Commit ef36eb1

Browse files
authored
Merge pull request #169 from elvinpoole/hdf5
Add support for hdf5 files
2 parents 615e9cc + 54023a4 commit ef36eb1

8 files changed

Lines changed: 772 additions & 13 deletions

File tree

.github/workflows/python-package-pr.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ jobs:
2424
- name: Install dependencies
2525
run: |
2626
python -m pip install --upgrade pip
27-
pip install .[parquet,test,test_with_healpy]
27+
pip install .[parquet,test,test_with_healpy,hdf5]
2828
- name: Lint with flake8
2929
run: |
3030
flake8

docs/basic_interface.rst

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -208,12 +208,17 @@ Writing a :code:`HealSparseMap` is easy. To write a map in the default FITS for
208208
209209
map3.write('output_file.hs', clobber=False)
210210
211-
And to write a map in the Parquet format with ``pyarrow``:
211+
to write a map in the Parquet format with ``pyarrow``:
212212

213213
.. code-block :: python
214214
215215
map3.write('output_file.hsparquet', clobber=False, format='parquet')
216216
217+
and to write a map in the HDF5 format with ``h5py``:
218+
219+
.. code-block :: python
220+
221+
map3.write('output_file.h5', clobber=False, format='hdf5')
217222
218223
Metadata
219224
--------

docs/filespec.rst

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -276,3 +276,123 @@ The other columns in the overflow coverage pixel are filled with the default sen
276276

277277
If the sparse map is a bit-packed mask, the schema is the same as for a regular sparse map image.
278278
In this case, as with the FITS serialization, the sparse map is stored as an array of unsigned 8-bit integers which is the in-memory backing of the bit-packed array.
279+
280+
.. _hdf5_format:
281+
282+
HealSparseMap HDF5 Serialization
283+
================================
284+
285+
A :code:`HealSparseMap` may also be serialized to an HDF5 file.
286+
Multiple :code:`HealSparseMap` objects can be stored in the same HDF5 file.
287+
Each :code:`HealSparseMap` is stored in a different HDF5 group.
288+
They are not required to have the same mask, :code:`nside_coverage`, or :code:`nside_sparse`.
289+
290+
All datasets are written with gzip compression and chunked such that each chunk corresponds to a single coverage pixel.
291+
292+
HDF5 Group Layout
293+
-----------------
294+
295+
A serialized :code:`HealSparseMap` is stored within a single HDF5 group (default name :code:`"map"`), which contains:
296+
297+
* A dataset encoding the coverage map
298+
* One or more datasets encoding the sparse map
299+
* Attributes storing map metadata
300+
301+
Coverage Map
302+
------------
303+
304+
The coverage map is stored as a one-dimensional dataset:
305+
306+
* **Dataset name:** :code:`cov_index_map`
307+
* **Datatype:** :code:`numpy.int64`
308+
* **Shape:** :code:`(12 * nside_coverage * nside_coverage,)`
309+
310+
This dataset is a direct serialization of the in-memory coverage index map,
311+
following the same conventions described in :ref:`coverage_map`.
312+
313+
Sparse Map
314+
----------
315+
316+
The sparse map is stored as a two-dimensional dataset, where each row corresponds to a single coverage pixel.
317+
The first row always represents the overflow (sentinel) coverage pixel.
318+
319+
Regular Sparse Map
320+
^^^^^^^^^^^^^^^^^^
321+
322+
For regular (non-record, non-wide-mask, non-bit-packed) sparse maps:
323+
324+
* **Dataset name:** :code:`sparse_map`
325+
* **Shape:** :code:`(ncov_in_sparse, nfine_per_cov)`
326+
* **Datatype:** Sparse map datatype
327+
328+
where:
329+
330+
* :code:`ncov_in_sparse`: The number of coverage pixels containing non-sentinel data, plus one for the overflow pixel
331+
332+
Each row contains the :code:`nfine_per_cov` sparse pixel values associated with a single coverage pixel.
333+
The dataset is chunked as :code:`(1, nfine_per_cov)` to allow efficient access to individual coverage pixels.
334+
335+
Sparse Map Record Array
336+
^^^^^^^^^^^^^^^^^^^^^^^
337+
338+
If the sparse map is a numpy record array, each field is stored in a separate subgroup:
339+
340+
* **Group name:** Name of the record field
341+
* **Dataset name:** :code:`sparse_map`
342+
* **Shape:** :code:`(ncov_in_sparse, nfine_per_cov)`
343+
* **Datatype:** Field datatype
344+
345+
All fields share the same coverage map indexing.
346+
347+
Sparse Map Wide Mask
348+
^^^^^^^^^^^^^^^^^^^^
349+
350+
For wide mask maps, the sparse map is stored as a three-dimensional dataset:
351+
352+
* **Dataset name:** :code:`sparse_map`
353+
* **Shape:** :code:`(ncov_in_sparse, nfine_per_cov, wide_mask_width)`
354+
* **Datatype:** :code:`numpy.uint8`
355+
356+
Each sparse pixel is represented by :code:`wide_mask_width` bytes.
357+
The dataset is chunked as :code:`(1, nfine_per_cov, wide_mask_width)`.
358+
359+
Sparse Map Bit-Packed Mask
360+
^^^^^^^^^^^^^^^^^^^^^^^^^^
361+
362+
For bit-packed mask maps, the sparse map is stored in its packed representation:
363+
364+
* **Dataset name:** :code:`sparse_map`
365+
* **Shape:** :code:`(ncov_in_sparse, nfine_per_cov // 8)`
366+
* **Datatype:** :code:`numpy.uint8`
367+
368+
Each byte encodes eight sparse pixels.
369+
370+
Metadata and Attributes
371+
-----------------------
372+
373+
All remaining map metadata is stored as HDF5 group attributes.
374+
The following attributes are required:
375+
376+
* :code:`nside_sparse`
377+
* :code:`nside_coverage`
378+
* :code:`sentinel`
379+
* :code:`primary`
380+
* :code:`nest` (always :code:`True`)
381+
382+
The following flags describe the sparse map type:
383+
384+
* :code:`is_rec_array`
385+
* :code:`is_bit_packed`
386+
* :code:`is_wide_mask`
387+
* :code:`wide_mask_width`
388+
389+
Any additional metadata attached to the :code:`HealSparseMap` object is also written as group attributes.
390+
391+
Partial Reads
392+
-------------
393+
394+
The HDF5 format supports reading a subset of coverage pixels.
395+
When a subset is requested, only the corresponding rows of the sparse map datasets are read.
396+
The overflow (sentinel) coverage pixel is always included as the first row.
397+
398+
On read, the sparse map rows are reshaped and reordered to reconstruct the in-memory sparse map layout described in :ref:`sparse_map`.

healsparse/healSparseMap.py

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,7 @@ def __init__(self, cov_map=None, cov_index_map=None, sparse_map=None, nside_spar
119119
@classmethod
120120
def read(cls, filename, nside_coverage=None, pixels=None, header=False,
121121
degrade_nside=None, weightfile=None, reduction='mean',
122-
use_threads=False):
122+
use_threads=False, hdf5_group='map'):
123123
"""
124124
Read in a HealSparseMap.
125125
@@ -149,6 +149,8 @@ def read(cls, filename, nside_coverage=None, pixels=None, header=False,
149149
Not yet implemented for parquet files.
150150
use_threads : `bool`, optional
151151
Use multithreaded reading for parquet files.
152+
hdf5_group : `str`, optional
153+
If file format is ``hdf5``, read from this group, otherwise unused.
152154
153155
Returns
154156
-------
@@ -159,7 +161,8 @@ def read(cls, filename, nside_coverage=None, pixels=None, header=False,
159161
"""
160162
return _read_map(cls, filename, nside_coverage=nside_coverage, pixels=pixels,
161163
header=header, degrade_nside=degrade_nside,
162-
weightfile=weightfile, reduction=reduction, use_threads=use_threads)
164+
weightfile=weightfile, reduction=reduction, use_threads=use_threads,
165+
hdf5_group=hdf5_group)
163166

164167
@classmethod
165168
def make_empty(cls, nside_coverage, nside_sparse, dtype, primary=None, sentinel=None,
@@ -359,7 +362,7 @@ def convert_healpix_map(healpix_map, nside_coverage, nest=True, sentinel=hpg.UNS
359362

360363
return cov_map, sparse_map
361364

362-
def write(self, filename, clobber=False, nocompress=False, format='fits', nside_io=4):
365+
def write(self, filename, clobber=False, nocompress=False, format='fits', nside_io=4, hdf5_group='map'):
363366
"""
364367
Write a HealSparseMap to a file. Use the `metadata` property from
365368
the map to persist additional information in the fits header.
@@ -379,17 +382,19 @@ def write(self, filename, clobber=False, nocompress=False, format='fits', nside_
379382
Must be less than or equal to nside_coverage, and not greater than 16.
380383
This option only applies if format=``parquet``.
381384
format : `str`, optional
382-
File format. May be ``fits``, ``parquet``, or ``healpix``. Note that
385+
File format. May be ``fits``, ``parquet``, ``hdf5`` or ``healpix``. Note that
383386
the ``healpix`` EXPLICIT format does not maintain all metadata and
384387
coverage information.
388+
hdf5_group: `str`, optional
389+
If file format is ``hdf5``, save to this group, otherwise unused.
385390
386391
Raises
387392
------
388393
NotImplementedError if file format is not supported.
389394
ValueError if nside_io is out of range.
390395
"""
391396
_write_map(self, filename, clobber=clobber, nocompress=nocompress, format=format,
392-
nside_io=nside_io)
397+
nside_io=nside_io, hdf5_group=hdf5_group)
393398

394399
def write_moc(self, filename, clobber=False):
395400
"""

healsparse/io_map.py

Lines changed: 22 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -5,10 +5,11 @@
55
from .parquet_shim import check_parquet_dataset
66
from .io_map_parquet import _read_map_parquet, _write_map_parquet
77
from .io_map_healpix import _write_map_healpix
8+
from .io_map_hdf5 import _read_map_hdf5, _write_map_hdf5, check_hdf5_file
89

910

1011
def _read_map(healsparse_class, filename, nside_coverage=None, pixels=None, header=False,
11-
degrade_nside=None, weightfile=None, reduction='mean', use_threads=False):
12+
degrade_nside=None, weightfile=None, reduction='mean', use_threads=False, hdf5_group='map'):
1213
"""
1314
Internal function to check the map filetype and read in a HealSparseMap.
1415
@@ -37,6 +38,8 @@ def _read_map(healsparse_class, filename, nside_coverage=None, pixels=None, head
3738
(mean, median, std, max, min, and, or, sum, prod, wmean).
3839
use_threads : `bool`, optional
3940
Use multithreaded reading for parquet files.
41+
hdf5_group: `str`, optional
42+
Read from this hdf5 group (only used reading from an hdf5 file).
4043
4144
Returns
4245
-------
@@ -47,17 +50,22 @@ def _read_map(healsparse_class, filename, nside_coverage=None, pixels=None, head
4750
"""
4851
is_fits_file = False
4952
is_parquet_file = False
53+
is_hdf5_file = False
5054

5155
try:
5256
fits = HealSparseFits(filename)
5357
is_fits_file = True
5458
fits.close()
55-
except OSError:
59+
except (OSError, UnicodeDecodeError):
5660
pass
61+
# UnicodeDecodeError occurs when trying to read an hdf5 file as if it's FITS
5762

5863
if not is_fits_file:
5964
is_parquet_file = check_parquet_dataset(filename)
6065

66+
if not is_fits_file and not is_parquet_file:
67+
is_hdf5_file = check_hdf5_file(filename)
68+
6169
if is_fits_file:
6270
return _read_map_fits(healsparse_class, filename, nside_coverage=nside_coverage,
6371
pixels=pixels, header=header, degrade_nside=degrade_nside,
@@ -67,13 +75,19 @@ def _read_map(healsparse_class, filename, nside_coverage=None, pixels=None, head
6775
pixels=pixels, header=header, degrade_nside=degrade_nside,
6876
weightfile=weightfile, reduction=reduction,
6977
use_threads=use_threads)
78+
elif is_hdf5_file:
79+
return _read_map_hdf5(healsparse_class, filename, hdf5_group=hdf5_group,
80+
pixels=pixels, header=header, degrade_nside=degrade_nside,
81+
weightfile=weightfile, reduction=reduction)
7082
elif not os.path.isfile(filename):
7183
raise IOError("Filename %s could not be found." % (filename))
7284
else:
73-
raise NotImplementedError("HealSparse only supports fits and parquet files (with pyarrow).")
85+
raise NotImplementedError(
86+
"HealSparse only supports fits, hdf5 (with h5py), and parquet files (with pyarrow).")
7487

7588

76-
def _write_map(hsp_map, filename, clobber=False, nocompress=False, format='fits', nside_io=4):
89+
def _write_map(hsp_map, filename, clobber=False, nocompress=False, format='fits', nside_io=4,
90+
hdf5_group='map'):
7791
"""
7892
Internal method to write a HealSparseMap to a file, and check formats.
7993
Use the `metadata` property from
@@ -96,7 +110,7 @@ def _write_map(hsp_map, filename, clobber=False, nocompress=False, format='fits'
96110
This option only applies if format=``parquet``.
97111
Must be less than or equal to nside_coverage, and not greater than 16.
98112
format : `str`, optional
99-
File format. May be ``fits``, ``parquet``, or ``healpix``.
113+
File format. May be ``fits``, ``parquet``, ``hdf5`` or ``healpix``.
100114
101115
Raises
102116
------
@@ -109,8 +123,10 @@ def _write_map(hsp_map, filename, clobber=False, nocompress=False, format='fits'
109123
_write_map_parquet(hsp_map, filename, clobber=clobber, nside_io=nside_io)
110124
elif format == 'healpix':
111125
_write_map_healpix(hsp_map, filename, clobber=clobber)
126+
elif format == 'hdf5':
127+
_write_map_hdf5(hsp_map, filename, clobber=clobber, hdf5_group=hdf5_group)
112128
else:
113-
raise NotImplementedError("Only 'fits', 'parquet' and 'healpix' file formats are supported.")
129+
raise NotImplementedError("Only 'fits', 'parquet', 'hdf5' and 'healpix' file formats are supported.")
114130

115131

116132
def _write_moc(hsp_map, filename, clobber=False):

0 commit comments

Comments
 (0)