open_datasets fails to open GRIB messages of same parameter with different forecastTime values, silently skipping them

### What happened?

If a GRIB file contains several messages for one parameter (such as 'Total precipitation', or 'tp') that is expressed as an average ('stepType' = 'avg'), where one message describes the average over the preceding hour and another describes the average since the reference time (t=0, model start, start of prediction, etc.), then cfgrib's `open_datasets` function cannot tell the two messages apart and only includes the first one. The second message is missing from the result, with no warning, error, or other indication to the user that the data returned is not, in fact, all the data from the GRIB file.

NOTE: This happens for any stepType that describes some form of time interval (average, accumulation, maximum, etc.). If more than two such messages are present in the GRIB file, every message after the first is dropped.

I have also identified the cause of this behaviour, and potentially a (start for a) fix.

Within cfgrib, when opening a GRIB file, the `enforce_unique_attributes` function (in `dataset.py`) is called as the first step of `build_variable_components` to ensure that the resulting dataset is a valid hypercube. When it is not, the error that is raised is used by `raw_open_datasets` (in `xarray_store.py`) to keep refining a set of `filter_by_keys` values until the entire GRIB file can be read into hypercubes without conflicts.

Inside the GRIB message, the time interval of the data is encoded via 'forecast time' (octets 19-22 in Section 4 of the GRIB message, called 'forecastTime' by eccodes). For a message that is, say, 16 hours ahead of the reference time, forecastTime would be 16 if the stepType is 'instant'. If the stepType is 'avg' and the data describes the average over the preceding hour, forecastTime would be 15. And if the data describes the average since the reference time, forecastTime would be 0.
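The three cases above can be written out as a small calculation. This is only a sketch of the arithmetic described in this issue, not eccodes or cfgrib code: the function name `forecast_time` and its `interval_hours` parameter are made up for illustration, and hours are assumed as the time unit.

```python
def forecast_time(valid_hour, step_type, interval_hours=None):
    """Return the forecastTime (in hours) for a message valid at `valid_hour`.

    Hypothetical helper: for 'instant' data forecastTime is the valid hour
    itself; for interval data it is the start of the interval.
    """
    if step_type == "instant":
        return valid_hour
    if interval_hours is None:
        # The interval starts at the reference time (t=0).
        return 0
    # The interval covers the last `interval_hours` before the valid hour.
    return valid_hour - interval_hours


print(forecast_time(16, "instant"))                # 16
print(forecast_time(16, "avg", interval_hours=1))  # 15
print(forecast_time(16, "avg"))                    # 0
```

The last two calls describe the same parameter, stepType and valid hour, so forecastTime is the only value here that separates them.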

The problem is that the set of attribute keys provided to `enforce_unique_attributes` (`DATA_ATTRIBUTES_KEYS`) does not include this attribute, or any derived attribute (`stepRange` for example). If you add "forecastTime" to the list `DATA_ATTRIBUTES_KEYS`, the messages are correctly distinguished and all present in the resulting datasets.
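To illustrate why the key list matters, here is a simplified toy model of a uniqueness check. It is not cfgrib's implementation: `check_unique`, `ToyConflict`, `BASE_KEYS` and the two message dictionaries are all invented for this sketch, and `BASE_KEYS` is not the real content of `DATA_ATTRIBUTES_KEYS`.

```python
class ToyConflict(Exception):
    """Raised when a key takes more than one value across the messages."""


def check_unique(messages, keys):
    """Return {key: value} if every key has a single value, else raise."""
    attributes = {}
    for key in keys:
        values = {message[key] for message in messages if key in message}
        if len(values) > 1:
            raise ToyConflict(f"multiple values for key {key!r}: {sorted(values)}")
        if values:
            attributes[key] = values.pop()
    return attributes


# Two made-up messages that differ only in forecastTime.
messages = [
    {"shortName": "tp", "stepType": "avg", "forecastTime": 0},
    {"shortName": "tp", "stepType": "avg", "forecastTime": 15},
]
BASE_KEYS = ["shortName", "stepType"]  # hypothetical stand-in key list

# Without forecastTime the check passes, so nothing signals a conflict.
print(check_unique(messages, BASE_KEYS))

# With forecastTime the conflict is detected and could drive a further split.
try:
    check_unique(messages, BASE_KEYS + ["forecastTime"])
except ToyConflict as exc:
    print("conflict:", exc)
```

In this toy model, only the second call gives the caller an error to react to, which mirrors the difference I see in cfgrib with and without "forecastTime" in `DATA_ATTRIBUTES_KEYS`.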

While it is possible to supply `read_keys` as a keyword argument to `open_datasets`, these keys only end up in the `extra_keys` in `build_variable_components`, and are not used to enforce unique attributes. I have tried this, but it does not bring the 'lost' messages back into the output datasets.

You can use `backend_kwargs={"filter_by_keys": {"forecastTime": <some_value>}}` to get the separate messages, but that requires that you know all the possible values ahead of time, and that you even know that this problem occurs. It is my understanding that the point of the `open_datasets()` function is to be able to fully read in a GRIB file _without_ knowing this. As it stands, you simply don't get the data, and you wouldn't know you are missing some of the GRIB messages until you fully compare the output datasets to the input GRIB file.
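A minimal sketch of that workaround, assuming the forecastTime values are already known. The helper names `build_filters` and `open_per_forecast_time` are made up for this example; the only cfgrib call used is `open_datasets` with the `backend_kwargs` shown above, and cfgrib is imported lazily so the filter-building part can be read and run on its own.

```python
def build_filters(forecast_times):
    """Build one backend_kwargs dict per known forecastTime value."""
    return [
        {"filter_by_keys": {"forecastTime": value}}
        for value in forecast_times
    ]


def open_per_forecast_time(path, forecast_times):
    """Open the file once per forecastTime value and collect the datasets."""
    import cfgrib  # imported here so build_filters works without cfgrib

    datasets = []
    for backend_kwargs in build_filters(forecast_times):
        datasets.extend(cfgrib.open_datasets(path, backend_kwargs=backend_kwargs))
    return datasets


# Example: the two intervals of a 16-hour-ahead message discussed earlier.
print(build_filters([0, 15]))
```

The list passed as `forecast_times` is exactly the piece of knowledge the user has to bring along, which is the weakness of this workaround.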

The reason I am unsure if adding 'forecastTime' to `DATA_ATTRIBUTES_KEYS` is a desirable fix, is that it results in potentially undesirable behaviour when opening GRIB files containing messages spanning multiple timesteps. I believe that the varying values of the forecastTime attribute would force what is effectively the same parameter into different datasets. That might mean a different solution is required, or that some more work is required to prevent this from happening when it is not desired. Perhaps different attributes like lengthOfTimeRange can be of help.

### What are the steps to reproduce the bug?

- Get a GRIB file with multiple messages for the same parameter and time, but with differing time intervals.
  - I recommend a GRIB file from NCEP. One can be downloaded with ease from https://nomads.ncep.noaa.gov/gribfilter.php?ds=gfs_0p25_1hr . Make sure to select a file some time past t=0, say 10 hours ahead (the file ending with `f010`). Select 'ACPCP' or 'APCP' as Parameter, leave Levels at 'All' ('surface' is the only provided level for these parameters), and enter some small subregion to keep the download small. NCEP provides these two parameters as averages both since t=0 and since the most recent 6-hour interval. This means that timestep 10 will have an average over the past 10 hours and an average over the past 4 hours, i.e. since t=6.
 - Verify using a tool like `grib_ls` that the GRIB file includes 2 messages for these parameters
 - Attempt to open the file with `cfgrib.open_datasets()`
 - Observe that there is only one entry per parameter
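The verification step can also be scripted instead of using `grib_ls`. The sketch below lists a few keys per message through the eccodes Python bindings and counts messages per parameter; the helper names `message_keys` and `messages_per_parameter` are made up for this example, and the choice of keys is an assumption about what is useful to compare.

```python
from collections import Counter


def message_keys(path, keys=("shortName", "stepType", "forecastTime")):
    """Return one tuple of key values per GRIB message in the file."""
    import eccodes  # imported lazily; requires the eccodes Python bindings

    rows = []
    with open(path, "rb") as handle:
        while True:
            gid = eccodes.codes_grib_new_from_file(handle)
            if gid is None:  # end of file
                break
            try:
                rows.append(tuple(eccodes.codes_get(gid, key) for key in keys))
            finally:
                eccodes.codes_release(gid)
    return rows


def messages_per_parameter(rows):
    """Count messages per parameter, taking the first key as the parameter."""
    return Counter(row[0] for row in rows)
```

Calling `messages_per_parameter(message_keys(path))` on the downloaded file should report 2 messages for the affected parameter, while `cfgrib.open_datasets(path)` returns only one entry for it.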


### Version

0.9.10.4

### Platform (OS and architecture)

WSL2 Ubuntu 22.04.2 LTS

### Relevant log output

_No response_

### Accompanying data

_No response_

### Organisation

_No response_


open_datasets fails to open GRIB messages of same parameter with different forecastTime values, silently skipping them #344
