Allow to specify type, format, constraints #132

peterdesmet · 2025-09-11T06:39:49Z

Since we include the schemas verbosely, I think we should allow publishers to specify a more rigorous type, format and constraints than those provided at rs.tdwg.org.

For type we have to be a bit careful, which is why I suggest using "type": "any" in our table schemas for terms that can have multiple types. That would differentiate:

eventDate: can be string or datetime

from

eventType: must always be a string

The implementation rule for "any" is that there must be no processing. For CSVs that means those values are interpreted as strings.
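A minimal sketch of that rule, assuming a hypothetical helper `cast_value` and simplified field descriptors (neither is taken from the PR): a field typed "any" is passed through untouched, so a CSV value stays the string it was read as.

```python
from datetime import datetime

# Illustrative field descriptors. The term names come from the discussion;
# the descriptors themselves are simplified assumptions.
FIELDS = {
    "eventDate": {"name": "eventDate", "type": "any"},
    "eventType": {"name": "eventType", "type": "string"},
}


def cast_value(field: dict, raw: str):
    """Cast a raw CSV cell according to the field type (hypothetical helper)."""
    field_type = field.get("type", "any")
    if field_type in ("any", "string"):
        # "any": no processing at all, so the CSV value remains a string.
        return raw
    if field_type == "datetime":
        # Only reached when a publisher declares the stricter type.
        # Simplification: the format is always treated as a strptime pattern.
        return datetime.strptime(raw, field.get("format", "%Y-%m-%dT%H:%M:%S"))
    raise ValueError(f"unsupported type in this sketch: {field_type}")


row = {"eventDate": "2025-09-11/2025-09-12", "eventType": "Survey"}
cast_row = {name: cast_value(FIELDS[name], value) for name, value in row.items()}
```

Here the eventDate interval survives unchanged because nothing attempts to parse it.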

@tucotuco you probably have a better overview of terms that can deviate from strings?

tucotuco · 2025-09-11T07:00:26Z

I don't think that terms should allow multiple types. I imagine myself trying to load data into a strongly-typed database schema and finding that the table schemas on which I am basing an aggregation are changing from dataset to dataset.

tucotuco · 2025-09-11T07:03:11Z

Conversely, I think formats and constraints can only be useful, but constraints shouldn't be broader than those provided in the schema definitions - that ultimately could change the semantics of some terms.

peterdesmet · 2025-09-11T07:17:13Z

"but constraints shouldn't be broader than those provided in the schema definitions"

That is what is currently suggested for constraints in this PR:

"The constraints provided in the table schema at rs.tdwg.org MAY be updated, but it MUST NOT relax the original constraints."
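To make the "MUST NOT relax" rule concrete, here is a rough sketch of how a validator might compare a publisher's constraints with the originals. The function `relaxed_constraints` and the example values are hypothetical; only the constraint names (required, unique, minimum, maximum, minLength, maxLength, enum, pattern) follow Table Schema.

```python
def relaxed_constraints(original: dict, updated: dict) -> list[str]:
    """Return the names of constraints that `updated` relaxes (hypothetical check)."""
    relaxed = []
    for name, orig in original.items():
        if name not in updated:
            # Dropping a constraint relaxes it, unless it was a no-op (False).
            if not (name in ("required", "unique") and orig is False):
                relaxed.append(name)
            continue
        new = updated[name]
        if name in ("required", "unique"):
            if orig and not new:
                relaxed.append(name)
        elif name in ("minimum", "minLength"):
            if new < orig:  # a lower bound may only move up
                relaxed.append(name)
        elif name in ("maximum", "maxLength"):
            if new > orig:  # an upper bound may only move down
                relaxed.append(name)
        elif name == "enum":
            if not set(new) <= set(orig):  # allowed values may only shrink
                relaxed.append(name)
        elif new != orig:
            # pattern and anything else: conservatively flag any change.
            relaxed.append(name)
    return relaxed


# Made-up constraint values, for illustration only.
original = {"minimum": -90, "maximum": 90}
stricter = {"minimum": -90, "maximum": 90, "required": True}
looser = {"minimum": -180, "maximum": 90}

stricter_result = relaxed_constraints(original, stricter)
looser_result = relaxed_constraints(original, looser)
```

Constraints that a publisher adds on top of the originals (such as required above) are never flagged, since they can only narrow what is accepted.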

peterdesmet · 2025-09-11T07:39:08Z

My reasoning for allowing more specific types was especially with datetime in mind:

As a consumer, it's really useful to know that all values in eventDate comply with datetime and a specific format (or that I can at least validate them against it). The datetime + format combination is very powerful.
As a publisher, I can communicate that I made an effort to standardize all my values.
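As an illustration of what a consumer gains from a declared datetime format, the sketch below validates a column against a strptime-style pattern. The helper `invalid_datetimes`, the sample values and the chosen pattern are assumptions for the example, not something the PR prescribes.

```python
from datetime import datetime


def invalid_datetimes(values, fmt):
    """Return the values that do not match the declared datetime format."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            bad.append(value)
    return bad


# Hypothetical eventDate column and publisher-declared format.
event_dates = ["2025-09-11T06:39:49Z", "2025-09-11", "2025-09-11T07:00:26Z"]
declared_format = "%Y-%m-%dT%H:%M:%SZ"

rejected = invalid_datetimes(event_dates, declared_format)
```

With the type left as string or "any", the date-only value would pass silently; with datetime plus format, it is reported.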

I'm curious what others think. @timrobertson100 @mdoering @MattBlissett

timrobertson100 · 2025-09-11T07:49:19Z

I think I agree with @peterdesmet

"I imagine myself trying to load data into a strongly-typed database schema and finding that the table schemas on which I am basing an aggregation are changing from dataset to dataset."

If you are imagining doing e.g. a PostgreSQL COPY ... FROM ... some.csv, then I'm not sure FD will be strict enough to accommodate all scenarios. I anticipate you'd have to assume strings and then apply some functions/parsers to convert them into typed fields.

I'm no FD expert, but I believe even something like a number field in FD can use , or . as delimiters, or be declared with bareNumber set to false, which allows additions such as %.
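For reference, a sketch of what such number parsing might involve. As far as I understand the Table Schema spec, number fields accept `decimalChar`, `groupChar` and `bareNumber`, where `bareNumber` defaults to true and setting it to false permits leading or trailing non-numeric characters. The function `parse_number` and its parameter names are hypothetical and only mirror those properties.

```python
import re
from decimal import Decimal


def parse_number(raw, decimal_char=".", group_char=None, bare_number=True):
    """Parse a CSV cell as a number, loosely following Table Schema options."""
    text = raw.strip()
    if not bare_number:
        # Strip leading/trailing non-numeric characters such as "%" or a
        # currency symbol, keeping a sign or decimal character at the start.
        text = re.sub(rf"^[^\d+\-{re.escape(decimal_char)}]+", "", text)
        text = re.sub(r"[^\d]+$", "", text)
    if group_char:
        text = text.replace(group_char, "")
    if decimal_char != ".":
        text = text.replace(decimal_char, ".")
    # Raises decimal.InvalidOperation if the remaining text is not numeric.
    return Decimal(text)


european = parse_number("1.234,5", decimal_char=",", group_char=".")
percentage = parse_number("95%", bare_number=False)
price = parse_number("€ 1,000.50", group_char=",", bare_number=False)
```

The same cell can therefore mean different things depending on the field descriptor, which is why a consumer cannot rely on a plain bulk load.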

I'd expect any consumer of a wide variety of DPs would need to deal with variation across them. Having the ability for a publisher to use string seems convenient and likely necessary for many, and having the ability for them to declare stronger typing where possible seems helpful too.

(As a more general comment, if strong typing is really what is wanted, then CSV is not a format I'd promote, for all the reasons we're discussing. Avro, Parquet, etc. are better-suited mediaTypes.)

Allow to specify type, format, constraints

3845926

peterdesmet mentioned this pull request Sep 11, 2025

Table schemas: include verbosely or reference by URL? #133

Open
