Abstractions to extend catalog to other data models #57

@huard

I'm working on a new CMIP6-CORDEX extension and implementation for the catalog, so we can deploy it at Ouranos.

I'm hitting a few issues that I wanted to discuss.

Abstraction

Adding a new extension and implementation requires copy-pasting a fair amount of boilerplate code, where we only change class names and one attribute name (CMIP6 -> CORDEX6). I don't mind doing this once, but we'll want to add other data models in the future, and I can see the mess that multiple copies of the same logic would create. I think this calls for a higher-level abstraction than what we have now.
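To make the idea concrete, here is a rough sketch of what such an abstraction could look like. It uses only the standard library and is not the current catalog code: `DataModelHelper`, `CMIP6Helper`, `CORDEX6Helper` and the lists of required attributes are hypothetical names picked for illustration. The point is only that the shared logic lives in one base class and each data model reduces to a few class attributes.

```python
from __future__ import annotations

from typing import Any, ClassVar


class DataModelHelper:
    """Hypothetical base class holding the logic shared by all data models."""

    prefix: ClassVar[str]  # namespace of the data model, e.g. "cmip6"
    required: ClassVar[tuple[str, ...]] = ()  # attributes every record must carry

    def __init__(self, attrs: dict[str, Any]) -> None:
        missing = [name for name in self.required if name not in attrs]
        if missing:
            raise ValueError(f"{self.prefix}: missing attributes {missing}")
        self.attrs = dict(attrs)

    def stac_properties(self) -> dict[str, Any]:
        """Return the attributes namespaced with the data model prefix."""
        return {f"{self.prefix}:{key}": value for key, value in self.attrs.items()}


# Each data model only declares what differs (illustrative attribute lists).
class CMIP6Helper(DataModelHelper):
    prefix = "cmip6"
    required = ("activity_id", "source_id")


class CORDEX6Helper(DataModelHelper):
    prefix = "cordex6"
    required = ("domain_id", "driving_source_id")
```

With something like this, adding a data model would mean writing one small subclass instead of copying a module, e.g. `CORDEX6Helper({"domain_id": "NAM-12", "driving_source_id": "X"}).stac_properties()`.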

Validation

The current CMIP6 extension does validation in two places: in the Pydantic base model, and in the pystac.Item through jsonschema. Deepak created a custom jsonschema for the CMIP data that we currently use. There's now an official cmip6 jsonschema, but I don't think the UoT catalog would validate against it. It looks to be made for ESGF-held datasets, which apparently hold additional attributes.

Neither of these two schemas actually checks that fields comply with the Controlled Vocabulary (CV); they just make sure a value is present. So we use a pydantic.BaseModel to check that metadata attributes conform to the CV.
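The gap between the two kinds of check can be illustrated with a small stdlib-only sketch. This is a stand-in, not our actual Pydantic model or either jsonschema: the helper functions are hypothetical, and the `ActivityID` enum lists only two CV values for brevity. In Pydantic, the same effect comes from annotating the field with the Enum type.

```python
from enum import Enum


class ActivityID(str, Enum):
    """Tiny excerpt of a controlled vocabulary (illustrative, not complete)."""

    CMIP = "CMIP"
    SCENARIOMIP = "ScenarioMIP"


def passes_presence_check(attrs: dict) -> bool:
    """Mimics a schema that only requires a string value to be present."""
    return isinstance(attrs.get("activity_id"), str)


def passes_cv_check(attrs: dict) -> bool:
    """Accepts the value only if it belongs to the controlled vocabulary."""
    try:
        ActivityID(attrs["activity_id"])
    except (KeyError, ValueError):
        return False
    return True


# A typo in the attribute value slips through the presence check
# but is caught by the CV check.
record = {"activity_id": "ScenarioMPI"}
print(passes_presence_check(record), passes_cv_check(record))
```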

Then we call item.validate(), which compares STAC Item attributes against Deepak's jsonschema.

My experience so far is that it's much easier to debug errors with the Pydantic traceback than with the pystac.validation traceback. However, if a jsonschema already exists, I don't want to duplicate that logic in pydantic. Also, our pydantic data model only checks metadata attributes, not the STAC attributes.

Customization

My experience is that catalog admins want to have some flexibility regarding the metadata to include in catalog entries. If a jsonschema exists, you might not want to include all of its properties in your catalog, and you may want to include additional properties not found in the schema. My understanding is that you cannot "unrequire" a property by subclassing or extending a jsonschema, meaning we'd have to maintain modified versions of "official" schemas to achieve the flexibility required. So in the Abstractions discussed above, I think we need to account for the need to decouple official external jsonschemas for metadata attributes from our STAC records.
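One way to avoid hand-maintaining modified copies would be to derive the local schema from the official one at load time. The sketch below is only an illustration of that idea: `relax_schema` is a hypothetical helper, and the `official` schema and its property names (`pid`, `local_path`) are toy values, not the real cmip6 schema.

```python
import copy


def relax_schema(schema: dict, optional=(), extra_properties=None) -> dict:
    """Return a copy of `schema` with some properties made optional
    and additional local properties declared. The input is left untouched."""
    derived = copy.deepcopy(schema)
    drop = set(optional)
    derived["required"] = [
        name for name in derived.get("required", []) if name not in drop
    ]
    derived.setdefault("properties", {}).update(extra_properties or {})
    return derived


# Toy stand-in for an "official" schema.
official = {
    "type": "object",
    "required": ["activity_id", "pid"],
    "properties": {
        "activity_id": {"type": "string"},
        "pid": {"type": "string"},
    },
}

# Local catalog: `pid` becomes optional, and a site-specific property is added.
local = relax_schema(
    official,
    optional=["pid"],
    extra_properties={"local_path": {"type": "string"}},
)
```

This keeps the official schema as the single upstream source, with the local deviations written down explicitly in one place.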
