
Commit 2e246cc

JSON/TOML backend: introduce abbreviated IO modes (#1493)
* Introduce dataset template mode to JSON backend
* Write used mode to JSON file
* Use Attribute::getOptional for snapshot attribute
* Introduce attribute mode
* Add example 14_toml_template.cpp
* Use Datatype::UNDEFINED to indicate no dataset definition in template
* Extend example
* Test short attribute mode
* Copy datatypeToString to JSON implementation
* Fix after rebase: Init JSON config in parallel mode
* Fix after rebase: Don't erase JSON datasets when writing
* openpmd-pipe: use short modes for test
* Less intrusive warnings, allow disabling them
* TOML: Use short modes by default
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Documentation
* Short mode in default in openPMD >= 2.
* Short value by default in TOML
* Store the openPMD version information in the IOHandler
* Fixes
* Adapt test to recent rebase
  Reading the chunk table requires NOT using template mode, otherwise the string just consists of '\0' bytes.
* toml11 4.0 compatibility
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* wip: cleanup
* wip: cleanup
* Cleanup
* Extensive testing

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent c639257 commit 2e246cc

18 files changed (+1560, -160 lines)

CMakeLists.txt

Lines changed: 4 additions & 0 deletions
@@ -703,6 +703,7 @@ set(openPMD_EXAMPLE_NAMES
 10_streaming_read
 12_span_write
 13_write_dynamic_configuration
+14_toml_template
 )
 set(openPMD_PYTHON_EXAMPLE_NAMES
 2_read_serial
@@ -1327,6 +1328,9 @@ if(openPMD_BUILD_TESTING)
 ${openPMD_RUNTIME_OUTPUT_DIRECTORY}/openpmd-pipe \
 --infile ../samples/git-sample/thetaMode/data_%T.bp \
 --outfile ../samples/git-sample/thetaMode/data%T.json \
+--outconfig ' \
+json.attribute.mode = \"short\" \n\
+json.dataset.mode = \"template_no_warn\"' \
 "
 WORKING_DIRECTORY ${openPMD_RUNTIME_OUTPUT_DIRECTORY}
 )

docs/source/backends/json.rst

Lines changed: 32 additions & 5 deletions
@@ -38,20 +38,47 @@ when working with the JSON backend.
 Datasets and groups have the same namespace, meaning that there may not be a subgroup
 and a dataset with the same name contained in one group.

-Any **openPMD dataset** is a JSON object with three keys:
+Datasets
+........

-* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.
-* ``datatype``: A string describing the type of the stored data.
-* ``data`` A nested array storing the actual data in row-major manner.
+Datasets can be stored in two modes, either as actual datasets or as dataset templates.
+The mode is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.dataset.mode`` (resp. ``toml.dataset.mode``) with possible values ``["dataset", "template"]`` (default: ``"dataset"``).
+
+Stored as an actual dataset, an **openPMD dataset** is a JSON object with three JSON keys:
+
+* ``datatype`` (required): A string describing the type of the stored data.
+* ``data`` (required): A nested array storing the actual data in row-major manner.
   The data needs to be consistent with the fields ``datatype`` and ``extent``.
   Checking whether this key points to an array can be (and is internally) used to distinguish groups from datasets.
+* ``attributes``: Attributes associated with the dataset. May be ``null`` or not present if no attributes are associated with the dataset.
+
+Stored as a **dataset template**, an openPMD dataset is represented by three JSON keys:
+
+* ``datatype`` (required): As above.
+* ``extent`` (required): A list of integers, describing the extent of the dataset.
+  This replaces the ``data`` key from the non-template representation.
+* ``attributes``: As above.

-**Attributes** are stored as a JSON object with a key for each attribute.
+This mode stores only the dataset metadata.
+Chunk load/store operations are ignored.
+
+Attributes
+..........
+
+In order to avoid name clashes, attributes are generally stored within a separate subgroup ``attributes``.
+
+Attributes can be stored in two formats.
+The format is selected by the :ref:`JSON/TOML parameter<backendconfig>` ``json.attribute.mode`` (resp. ``toml.attribute.mode``) with possible values ``["long", "short"]`` (default: ``"long"`` for JSON in openPMD 1.*, ``"short"`` otherwise, i.e. generally in openPMD 2.*, but always in TOML).
+
+Attributes in **long format** store the datatype explicitly, by representing attributes as JSON objects.
 Every such attribute is itself a JSON object with two keys:

 * ``datatype``: A string describing the type of the value.
 * ``value``: The actual value of type ``datatype``.

+Attributes in **short format** are stored as just the simple value corresponding with the attribute.
+Since JSON/TOML values are pretty-printed into a human-readable format, byte-level type details can be lost when reading those values again later on (e.g. the distinction between different integer types).
+
 TOML File Format
 ----------------
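The two dataset representations documented in `json.rst` can be illustrated without openPMD-api. The following is a minimal sketch, not the backend's implementation: it renders a 2D `float` dataset as JSON text either with a nested `data` array (dataset mode, with `null` for elements that no chunk has written) or with only an `extent` list (template mode). The helper name `renderDataset2D`, the `Grid2D` alias, the `"FLOAT"` type string and the output whitespace are assumptions made for illustration; attributes are left out.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container (not openPMD-api): row-major 2D data where
// std::nullopt marks elements that were never written by any chunk.
using Grid2D = std::vector<std::vector<std::optional<float>>>;

// Render one dataset as JSON text in "dataset" or "template" mode.
// The "FLOAT" type string is assumed; attributes are omitted.
std::string renderDataset2D(Grid2D const &grid, bool templateMode)
{
    std::ostringstream out;
    out << R"({"datatype": "FLOAT", )";
    if (templateMode)
    {
        // Template mode: store only the extent, no element values.
        std::size_t rows = grid.size();
        std::size_t cols = rows == 0 ? 0 : grid.front().size();
        out << R"("extent": [)" << rows << ", " << cols << "]}";
        return out.str();
    }
    // Dataset mode: nested row-major array, null for missing values.
    out << R"("data": [)";
    for (std::size_t i = 0; i < grid.size(); ++i)
    {
        out << (i == 0 ? "[" : ", [");
        for (std::size_t j = 0; j < grid[i].size(); ++j)
        {
            if (j != 0)
                out << ", ";
            if (grid[i][j])
                out << *grid[i][j];
            else
                out << "null";
        }
        out << "]";
    }
    out << "]}";
    return out.str();
}
```

For a 2x2 grid `{{1.5f, std::nullopt}, {std::nullopt, 2.0f}}`, template mode yields `{"datatype": "FLOAT", "extent": [2, 2]}` and dataset mode yields `{"datatype": "FLOAT", "data": [[1.5, null], [null, 2]]}`.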

docs/source/details/backendconfig.rst

Lines changed: 19 additions & 4 deletions
@@ -104,6 +104,8 @@ The key ``rank_table`` allows specifying the creation of a **rank table**, used
 Configuration Structure per Backend
 -----------------------------------

+Please refer to the respective backends' documentations for further information on their configuration.
+
 .. _backendconfig-adios2:

 ADIOS2
@@ -231,8 +233,21 @@ The parameters eligible for being passed to flush calls may be configured global

 .. _backendconfig-other:

-Other backends
-^^^^^^^^^^^^^^
+JSON/TOML
+^^^^^^^^^

-Do currently not read the configuration string.
-Please refer to the respective backends' documentations for further information on their configuration.
+A full configuration of the JSON backend:
+
+.. literalinclude:: json.json
+   :language: json
+
+The TOML backend is configured analogously, replacing the ``"json"`` key with ``"toml"``.
+
+All keys found under ``json.dataset`` are applicable globally as well as per dataset.
+Explanation of the single keys:
+
+* ``json.dataset.mode`` / ``toml.dataset.mode``: One of ``"dataset"`` (default) or ``"template"``.
+  In "dataset" mode, the dataset will be written as an n-dimensional (recursive) array, padded with nulls (JSON) or zeroes (TOML) for missing values.
+  In "template" mode, only the dataset metadata (type, extent and attributes) are stored and no chunks can be written or read (i.e. write/read operations will be skipped).
+* ``json.attribute.mode`` / ``toml.attribute.mode``: One of ``"long"`` (default in openPMD 1.*) or ``"short"`` (default in openPMD 2.* and generally in TOML).
+  The long format explicitly encodes the attribute type in the dataset on disk, the short format only writes the actual attribute as a JSON/TOML value, requiring readers to recover the type.
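The attribute-mode defaults documented above reduce to a small rule. The following is a minimal sketch, assuming that "openPMD 1.*" and "openPMD 2.*" refer to the major version of the openPMD standard being written. The enums, the function names `defaultAttributeMode` and `renderAttribute`, and the `"DOUBLE"` type string are hypothetical stand-ins chosen for illustration, not openPMD-api API.

```cpp
#include <sstream>
#include <string>

// Hypothetical types mirroring the documented options (not openPMD-api API).
enum class Backend { JSON, TOML };
enum class AttributeMode { Long, Short };

// Documented defaults: "short" always in TOML and for openPMD standard 2.*,
// "long" for JSON when writing openPMD standard 1.*.
AttributeMode defaultAttributeMode(Backend backend, int openPMDMajorVersion)
{
    if (backend == Backend::TOML || openPMDMajorVersion >= 2)
        return AttributeMode::Short;
    return AttributeMode::Long;
}

// Render a double-valued attribute. The long format keeps the type name
// (assumed to be "DOUBLE"); the short format keeps only the value.
std::string renderAttribute(double value, AttributeMode mode)
{
    std::ostringstream v;
    v << value;
    if (mode == AttributeMode::Short)
        return v.str();
    return R"({"datatype": "DOUBLE", "value": )" + v.str() + "}";
}
```

In the short format, the written text for an integer-valued attribute is the same whether it originally was a 16-bit or a 64-bit integer, which is why readers have to recover the type.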

docs/source/details/json.json

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+{
+  "json": {
+    "dataset": {
+      "mode": "template"
+    },
+    "attribute": {
+      "mode": "short"
+    }
+  }
+}

examples/14_toml_template.cpp

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+#include <openPMD/openPMD.hpp>
+
+std::string backendEnding()
+{
+    auto extensions = openPMD::getFileExtensions();
+    if (auto it = std::find(extensions.begin(), extensions.end(), "toml");
+        it != extensions.end())
+    {
+        return *it;
+    }
+    else
+    {
+        // Fallback for buggy old NVidia compiler
+        return "json";
+    }
+}
+
+void write()
+{
+    std::string config = R"(
+{
+  "iteration_encoding": "variable_based",
+  "json": {
+    "dataset": {"mode": "template"},
+    "attribute": {"mode": "short"}
+  },
+  "toml": {
+    "dataset": {"mode": "template"},
+    "attribute": {"mode": "short"}
+  }
+}
+)";
+
+    openPMD::Series writeTemplate(
+        "../samples/tomlTemplate." + backendEnding(),
+        openPMD::Access::CREATE,
+        config);
+    auto iteration = writeTemplate.writeIterations()[0];
+
+    openPMD::Dataset ds{openPMD::Datatype::FLOAT, {5, 5}};
+
+    auto temperature =
+        iteration.meshes["temperature"][openPMD::RecordComponent::SCALAR];
+    temperature.resetDataset(ds);
+
+    auto E = iteration.meshes["E"];
+    E["x"].resetDataset(ds);
+    E["y"].resetDataset(ds);
+    /*
+     * Don't specify datatype and extent for this one to indicate that this
+     * information is not yet known.
+     */
+    E["z"].resetDataset({});
+
+    ds.extent = {10};
+
+    auto electrons = iteration.particles["e"];
+    electrons["position"]["x"].resetDataset(ds);
+    electrons["position"]["y"].resetDataset(ds);
+    electrons["position"]["z"].resetDataset(ds);
+
+    electrons["positionOffset"]["x"].resetDataset(ds);
+    electrons["positionOffset"]["y"].resetDataset(ds);
+    electrons["positionOffset"]["z"].resetDataset(ds);
+    electrons["positionOffset"]["x"].makeConstant(3.14);
+    electrons["positionOffset"]["y"].makeConstant(3.14);
+    electrons["positionOffset"]["z"].makeConstant(3.14);
+
+    ds.dtype = openPMD::determineDatatype<uint64_t>();
+    electrons.particlePatches["numParticles"][openPMD::RecordComponent::SCALAR]
+        .resetDataset(ds);
+    electrons
+        .particlePatches["numParticlesOffset"][openPMD::RecordComponent::SCALAR]
+        .resetDataset(ds);
+    electrons.particlePatches["offset"]["x"].resetDataset(ds);
+    electrons.particlePatches["offset"]["y"].resetDataset(ds);
+    electrons.particlePatches["offset"]["z"].resetDataset(ds);
+    electrons.particlePatches["extent"]["x"].resetDataset(ds);
+    electrons.particlePatches["extent"]["y"].resetDataset(ds);
+    electrons.particlePatches["extent"]["z"].resetDataset(ds);
+}
+
+void read()
+{
+    /*
+     * The config is entirely optional, these things are also detected
+     * automatically when reading
+     */
+
+    // std::string config = R"(
+    // {
+    //   "iteration_encoding": "variable_based",
+    //   "toml": {
+    //     "dataset": {"mode": "template"},
+    //     "attribute": {"mode": "short"}
+    //   }
+    // }
+    // )";
+
+    openPMD::Series read(
+        "../samples/tomlTemplate." + backendEnding(),
+        openPMD::Access::READ_LINEAR);
+    read.parseBase();
+    openPMD::helper::listSeries(read);
+}
+
+int main()
+{
+    write();
+    read();
+}
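The `backendEnding()` function in the example selects TOML when the openPMD-api build lists `toml` among its file extensions and falls back to JSON otherwise. The sketch below isolates that choice so it can be checked without openPMD-api. The name `chooseEnding` is hypothetical, and the extension list is passed in as a parameter instead of coming from `openPMD::getFileExtensions()`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-alone variant of backendEnding(): the available file
// extensions are a parameter instead of openPMD::getFileExtensions().
std::string chooseEnding(std::vector<std::string> const &extensions)
{
    // Prefer TOML if this build supports it ...
    if (std::find(extensions.begin(), extensions.end(), "toml") !=
        extensions.end())
    {
        return "toml";
    }
    // ... otherwise fall back to JSON, as the example does.
    return "json";
}
```

With TOML available, the example therefore writes `../samples/tomlTemplate.toml`, otherwise `../samples/tomlTemplate.json`.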

include/openPMD/Dataset.hpp

Lines changed: 25 additions & 3 deletions
@@ -41,18 +41,40 @@ class Dataset
 public:
     enum : std::uint64_t
     {
-        JOINED_DIMENSION = std::numeric_limits<std::uint64_t>::max()
+        /**
+         * Setting one dimension of the extent as JOINED_DIMENSION means that
+         * the extent along that dimension will be defined by the sum of all
+         * parallel processes' contributions.
+         * Only one dimension can be joined. For store operations, the offset
+         * should be an empty array and the extent should give the actual
+         * extent of the chunk (i.e. the number of joined elements along the
+         * joined dimension, equal to the global extent in all other
+         * dimensions). For more details, refer to
+         * docs/source/usage/workflow.rst.
+         */
+        JOINED_DIMENSION = std::numeric_limits<std::uint64_t>::max(),
+        /**
+         * Some backends (i.e. JSON and TOML in template mode) support the
+         * creation of dataset with undefined datatype and extent.
+         * The extent should be given as {UNDEFINED_EXTENT} for that.
+         */
+        UNDEFINED_EXTENT = std::numeric_limits<std::uint64_t>::max() - 1
     };

     Dataset(Datatype, Extent, std::string options = "{}");

     /**
      * @brief Constructor that sets the datatype to undefined.
      *
-     * Helpful for resizing datasets, since datatypes need not be given twice.
+     * Helpful for:
+     *
+     * 1. Resizing datasets, since datatypes need not be given twice.
+     * 2. Initializing datasets as undefined, as used by template mode in the
+     *    JSON/TOML backend. In this case, the default (undefined) specification
+     *    for the Extent may be used.
      *
      */
-    Dataset(Extent);
+    Dataset(Extent = {UNDEFINED_EXTENT});

     Dataset &extend(Extent newExtent);
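The two sentinel enumerators let code distinguish an undefined extent from one with a joined dimension. The following is a minimal sketch that copies the values shown in the header into local constants so that it compiles without openPMD-api. `Extent`, `ExtentKind` and `classifyExtent` are hypothetical names here, and treating `UNDEFINED_EXTENT` as valid only in a one-element extent is an assumption drawn from the header comment, not documented library behavior.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Local copies of the sentinel values declared in openPMD::Dataset.
constexpr std::uint64_t JOINED_DIMENSION =
    std::numeric_limits<std::uint64_t>::max();
constexpr std::uint64_t UNDEFINED_EXTENT =
    std::numeric_limits<std::uint64_t>::max() - 1;

using Extent = std::vector<std::uint64_t>; // hypothetical local alias

enum class ExtentKind { Undefined, Joined, Regular, Invalid };

ExtentKind classifyExtent(Extent const &extent)
{
    // Undefined dataset (template mode): extent is exactly {UNDEFINED_EXTENT}.
    if (extent.size() == 1 && extent[0] == UNDEFINED_EXTENT)
        return ExtentKind::Undefined;
    int joined = 0;
    for (auto e : extent)
    {
        if (e == UNDEFINED_EXTENT)
            return ExtentKind::Invalid; // assumption: sentinel only used alone
        if (e == JOINED_DIMENSION)
            ++joined;
    }
    if (joined > 1)
        return ExtentKind::Invalid; // only one dimension can be joined
    return joined == 1 ? ExtentKind::Joined : ExtentKind::Regular;
}
```

Because of the default argument `Extent = {UNDEFINED_EXTENT}`, a value-initialized `Dataset`, such as the `{}` passed in `E["z"].resetDataset({})` in the example, presumably falls into the `Undefined` case.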

include/openPMD/IO/AbstractIOHandler.hpp

Lines changed: 1 addition & 0 deletions
@@ -201,6 +201,7 @@ class AbstractIOHandler
 {
     friend class Series;
     friend class ADIOS2IOHandlerImpl;
+    friend class JSONIOHandlerImpl;
     friend class detail::ADIOS2File;

 private:

include/openPMD/IO/JSON/JSONIOHandler.hpp

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@

 #include "openPMD/IO/AbstractIOHandler.hpp"
 #include "openPMD/IO/JSON/JSONIOHandlerImpl.hpp"
+#include "openPMD/auxiliary/JSON_internal.hpp"

 #if openPMD_HAVE_MPI
 #include <mpi.h>
