@colleenXu
Contributor

colleenXu commented Feb 6, 2026

Urgent request for the Feb ingest work.

ChEMBL and DrugCentral both have chemical-protein substrate data, which is modeled as "Protein has_substrate Chemical". I couldn't find an existing Association that could be used to validate these edges.

So I'm proposing this new Association as a solution. I've made the subject/object categories broader to cover the various kinds of enzymes and substrates (in the hopes that it'll be a useful Association in future cases).

I've tagged @vdancik for his expertise and opinion on whether this works for the ChEMBL ingest. Currently, the ingest is using GeneAffectsChemicalAssociation, but this is problematic because has_substrate is NOT a descendant of affects...
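To illustrate the mismatch, here is a minimal sketch of the kind of check that would flag it. It assumes a toy child-to-parent predicate map; the `PARENT` table and the `is_descendant` and `check_edge` helpers are hypothetical names made up for this example, not the actual Biolink Model hierarchy or the pipeline's validation code.

```python
# Toy predicate hierarchy (child -> parent). Hypothetical and heavily
# simplified for illustration; not copied from the Biolink Model.
PARENT = {
    "affects": "related_to",
    "has_substrate": "related_to",
    "regulates": "affects",
}


def is_descendant(predicate, ancestor):
    """Return True if `predicate` equals `ancestor` or sits below it in PARENT."""
    current = predicate
    while current is not None:
        if current == ancestor:
            return True
        # Walk one level up; .get returns None once we pass the root.
        current = PARENT.get(current)
    return False


def check_edge(association_predicate, edge_predicate):
    """Raise ValueError if the edge predicate is outside the association's predicate branch."""
    if not is_descendant(edge_predicate, association_predicate):
        raise ValueError(
            f"{edge_predicate!r} is not a descendant of {association_predicate!r}"
        )


# An "affects"-based association accepts a predicate under "affects" ...
check_edge("affects", "regulates")

# ... but rejects has_substrate, which lives on a different branch.
try:
    check_edge("affects", "has_substrate")
except ValueError as err:
    print(err)
```

Under these assumptions, an association whose predicate slot is constrained to `affects` would reject a `has_substrate` edge, which is why a dedicated Association is proposed here.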


Note:

I haven't tested this yet, because I'm still working on my DrugCentral ingest of this data (bioactivity table act_table_full)...

@vdancik
Collaborator

vdancik commented Feb 6, 2026

@colleenXu, even though I am fine with using GeneAffectsChemicalAssociation, I see your point. If we go ahead with a macromolecular machine has substrate association, we should also add a macromolecular machine has product association - it will be needed when we start ingesting metabolic reactions.

@colleenXu
Contributor Author

@vdancik I'm surprised that there isn't a pydantic validation or pipeline validation error with GeneAffectsChemicalAssociation and the mismatched predicate...

Also, if this Association looks okay to you, can you approve this PR?

@sierra-moxon sierra-moxon merged commit e8f7b4c into master Feb 6, 2026
1 check passed