-
Notifications
You must be signed in to change notification settings - Fork 3
SIGNOR QA ingest, modified ingest code, RIG, and added unit test #282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
update Chemical to replace ChemicalEntity in rig.yaml to be consistent with the ingest.py file
Changes: 1. Added new ingestion codes for missing edge relationships in the ingestion.py 2. Added comments and scaled down the RIG.yaml file 3. Added a new test_signor.py unit test code (based on template, haven't run test on it, so not sure if those two unit test cases would work) Qi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR updates the SIGNOR ingest implementation and its Reference Ingest Guide (RIG) metadata to expand/adjust the supported entity category combinations and edge typings for initial QA ingest work.
Changes:
- Updated
signor_rig.yamlto revise described included content and adjust declared node/edge categories (including adding/removing/commenting categories). - Extended
signor.pytransform logic with new branches for additional subject/object category combinations (protein/chemical/smallmolecule) usingbiolink:affectsand qualifiers.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
src/translator_ingest/ingests/signor/signor_rig.yaml |
Updates SIGNOR RIG metadata for included content and declared node/edge category patterns. |
src/translator_ingest/ingests/signor/signor.py |
Adds new transform branches for protein/chemical/smallmolecule combinations with qualifier mapping logic. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| association = GeneRegulatesGeneAssociation( | ||
| id=entity_id(), | ||
| subject=subject.id, | ||
| object=object.id, | ||
| predicate = "biolink:affects", | ||
| sources=build_association_knowledge_sources(primary=INFORES_SIGNOR), | ||
| knowledge_level=KnowledgeLevelEnum.knowledge_assertion, | ||
| agent_type=AgentTypeEnum.manual_agent, | ||
| qualified_predicate = "biolink:causes", | ||
| object_aspect_qualifier = object_aspect_qualifier, | ||
| object_direction_qualifier = object_direction_qualifier | ||
| ) |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The association for this protein→chemical case is created as GeneRegulatesGeneAssociation even though the predicate is biolink:affects and the object is a chemical. This association type encodes gene/gene semantics and is inconsistent with the edge being produced; it may also break downstream validation/expectations. Use a Biolink association class that matches the subject/object types (e.g., GeneAffectsChemicalAssociation for protein/gene→chemical, or ChemicalAffectsGeneAssociation for chemical→protein).
| elif record["subject_category"] == "smallmolecule" and record["object_category"] == "chemical" and record["EFFECT"] in list_ppi_accept_effects: | ||
| subject = ChemicalEntity(id=record["IDA"], name=record["subject_name"]) | ||
| object = Protein(id="UniProtKB:" + record["IDB"], name=record["object_name"]) | ||
|
|
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For subject_category == "smallmolecule" and object_category == "chemical", object is currently built as a Protein with a UniProtKB: CURIE. This will produce invalid node IDs/types for chemical objects and will misrepresent the edge. Construct object as a chemical node (e.g., ChemicalEntity) and ensure the association class matches small-molecule/chemical semantics (likely ChemicalEntityToChemicalEntityAssociation for chemical→chemical, or ChemicalAffectsGeneAssociation/GeneAffectsChemicalAssociation for chemical↔protein).
| if record["EFFECT"] == 'up-regulates activity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'up-regulates': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity_or_abundance | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'up-regulates quantity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.expression | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'up-regulates quantity by expression': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.expression | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'down-regulates activity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity | ||
| object_direction_qualifier = DirectionQualifierEnum.downregulated | ||
| elif record["EFFECT"] == 'down-regulates': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity_or_abundance | ||
| object_direction_qualifier = DirectionQualifierEnum.downregulated | ||
| elif record["EFFECT"] == 'down-regulates quantity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.expression | ||
| object_direction_qualifier = DirectionQualifierEnum.downregulated | ||
| elif record["EFFECT"] == 'down-regulates quantity by expression': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.expression | ||
| object_direction_qualifier = DirectionQualifierEnum.downregulated | ||
|
|
||
| association = GeneRegulatesGeneAssociation( | ||
| id=entity_id(), | ||
| subject=subject.id, | ||
| object=object.id, | ||
| predicate = "biolink:affects", | ||
| sources=build_association_knowledge_sources(primary=INFORES_SIGNOR), | ||
| knowledge_level=KnowledgeLevelEnum.knowledge_assertion, | ||
| agent_type=AgentTypeEnum.manual_agent, | ||
| qualified_predicate = "biolink:causes", | ||
| object_aspect_qualifier = object_aspect_qualifier, | ||
| object_direction_qualifier = object_direction_qualifier | ||
| ) |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the smallmolecule→smallmolecule branch, object_aspect_qualifier / object_direction_qualifier are not assigned for all EFFECT values allowed by list_ppi_accept_effects (e.g., down-regulates quantity by destabilization), and there is no final else to raise. This can lead to an UnboundLocalError when constructing the association. Add explicit handling (or a final else raising) for all EFFECT values that can reach this block.
| elif record["subject_category"] == "protein" and record["object_category"] == "chemical" and record["EFFECT"] in list_ppi_accept_effects: | ||
| subject = Protein(id="UniProtKB:" + record["IDA"], name=record["subject_name"]) | ||
| object = Protein(id="UniProtKB:" + record["IDB"], name=record["object_name"]) | ||
|
|
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR title mentions an added unit test, but there doesn't appear to be any SIGNOR-specific test added under tests/unit/ingests/ (no signor/ test package found). Either include the new unit test in this PR or adjust the PR title/description so it matches the actual changes.
| elif record["subject_category"] == "protein" and record["object_category"] == "chemical" and record["EFFECT"] in list_ppi_accept_effects: | ||
| subject = Protein(id="UniProtKB:" + record["IDA"], name=record["subject_name"]) | ||
| object = Protein(id="UniProtKB:" + record["IDB"], name=record["object_name"]) | ||
|
|
||
| if record["EFFECT"] == 'up-regulates activity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'up-regulates': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity_or_abundance | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'up-regulates quantity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.expression | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'down-regulates': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity_or_abundance | ||
| object_direction_qualifier = DirectionQualifierEnum.downregulated | ||
| elif record["EFFECT"] == 'down-regulates quantity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.expression | ||
| object_direction_qualifier = DirectionQualifierEnum.downregulated | ||
| elif record["EFFECT"] == 'down-regulates quantity by destabilization': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.stability | ||
| object_direction_qualifier = DirectionQualifierEnum.downregulated | ||
| else: | ||
| raise NotImplementedError(f'Effect {record["EFFECT"]} could not be mapped to required qualifiers.') | ||
|
|
||
| association = GeneRegulatesGeneAssociation( | ||
| id=entity_id(), | ||
| subject=subject.id, | ||
| object=object.id, | ||
| predicate = "biolink:affects", | ||
| sources=build_association_knowledge_sources(primary=INFORES_SIGNOR), | ||
| knowledge_level=KnowledgeLevelEnum.knowledge_assertion, | ||
| agent_type=AgentTypeEnum.manual_agent, | ||
| qualified_predicate = "biolink:causes", | ||
| object_aspect_qualifier = object_aspect_qualifier, | ||
| object_direction_qualifier = object_direction_qualifier | ||
| ) | ||
| if subject is not None and object is not None and association is not None: | ||
| nodes.append(subject) | ||
| nodes.append(object) | ||
| edges.append(association) | ||
|
|
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds multiple new SIGNOR transformation branches (protein/chemical/smallmolecule combinations) but there is no corresponding unit test coverage for SIGNOR ingests (no tests/unit/ingests/signor/ tests found). Please add unit tests that exercise at least one example record for each new category pair and asserts the correct node types/CURIE prefixes and association class/predicate/qualifiers.
| - subject_categories: | ||
| - biolink:Gene | ||
| - biolink:Protein | ||
| - biolink:Proteinfamily | ||
| - biolink:MacromolecularComplex | ||
| # - biolink:Proteinfamily | ||
| ## As discussed with Matt, for initial ingest we skipped complex nodes | ||
| ## Note to Qi: There is a separate SIGNOR_complexes.csv which has the detail of each complex ID, e.g., SIGNOR-C14;IKK-complex;"O14920 O15111 Q9Y6K9" | ||
| # - biolink:MacromolecularComplex | ||
| ## Qi - review comment: there are small mollecule as subject node in the graph-metadata.json output | ||
| - biolink:SmallMolecule | ||
| # MHB: this is now plural ('predicates'), and takes a list/array; QW: fixed, please remove this comment once checked | ||
| predicates: | ||
| - biolink:regulates | ||
| object_categories: | ||
| - biolink:Gene | ||
| - biolink:Protein |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This edge_type_info declares biolink:Gene in both subject and object categories, but the SIGNOR transform implementation does not construct any Gene nodes (it currently creates Protein and ChemicalEntity nodes). This makes the RIG metadata inaccurate and may cause validation/expectations mismatches. Either add gene handling in the ingest code or remove biolink:Gene from the declared categories for the initial ingest.
| - subject_categories: | ||
| - biolink:Protein | ||
| predicate: biolink:part_of | ||
| object_categories: | ||
| - biolink:MacromolecularComplex | ||
| qualifiers: | ||
| - property: biolink:causal_mechanism_qualifier | ||
| value_range: CausalMechanismQualifierEnum | ||
| - property: biolink:anatomical_context_qualifier | ||
| value_description: UBERON or Cell Ontology terms, to represent data in the SIGNOR CELL_DATA and TISSUE_DATA columns. | ||
| - property: biolink:species_context_qualifier | ||
| value_description: NCBITaxon term to represent data in the SIGNOR TAX_ID column. | ||
| knowledge_level: | ||
| - knowledge_assertion | ||
| agent_type: | ||
| - manual_agent | ||
| edge_properties: # MHB: edge properties need to include "biolink" prefix.; QW: fixed, thanks for pointing this out! | ||
| - biolink:publications | ||
| - biolink:has_confidence_score | ||
| - biolink:supporting_text | ||
| ui_explanation: "Signor records indicate the protein to be 'form complex' another complex - which maps best to the Biolink predicate 'part of' with additional directional qualifier." | ||
| - subject_categories: | ||
| - biolink:Chemical | ||
| - biolink:ChemicalEntity | ||
| - biolink:Drug | ||
| - biolink:SmallMolecule |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
biolink:Chemical is introduced here as a subject category, but the rest of the codebase RIGs consistently use biolink:ChemicalEntity for chemicals (e.g., src/translator_ingest/ingests/ctd/ctd_rig.yaml:117 and src/translator_ingest/ingests/drugcentral/drugcentral_rig.yaml:165). If biolink:Chemical is not a valid Biolink class in your configured model version, this will fail validation or produce inconsistent metadata. Consider removing biolink:Chemical and standardizing on biolink:ChemicalEntity (and/or biolink:SmallMolecule/biolink:Drug as needed).
| source_identifier_types: | ||
| - "ChEMBL" | ||
| - node_category: biolink:ChemicalEntity | ||
| - node_category: biolink:Chemical |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node_type_info switches the chemical node category to biolink:Chemical, but the SIGNOR ingest code constructs ChemicalEntity nodes (and other ingests/RIGs in this repo use biolink:ChemicalEntity as the standard chemical category). This mismatch can make the RIG inaccurate and may break any validation that compares produced node categories to the RIG. Consider reverting this to biolink:ChemicalEntity, or update the transform code to emit the corresponding Biolink class if biolink:Chemical is truly intended.
| - node_category: biolink:Chemical | |
| - node_category: biolink:ChemicalEntity |
| elif record["subject_category"] == "smallmolecule" and record["object_category"] == "smallmolecule" and record["EFFECT"] in list_ppi_accept_effects: | ||
| subject = ChemicalEntity(id=record["IDA"], name=record["subject_name"]) | ||
| object = Protein(id="UniProtKB:" + record["IDB"], name=record["object_name"]) | ||
|
|
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For subject_category == "smallmolecule" and object_category == "smallmolecule", object is currently created as a Protein with a UniProtKB: CURIE. This will generate invalid IDs/types for small-molecule objects and misrepresent the association. Construct object as a chemical node (e.g., ChemicalEntity/SmallMolecule) and use an association class appropriate for chemical↔chemical or chemical↔protein, depending on the intended semantics.
| if record["EFFECT"] == 'up-regulates activity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'up-regulates': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity_or_abundance | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'up-regulates quantity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.expression | ||
| object_direction_qualifier = DirectionQualifierEnum.upregulated | ||
| elif record["EFFECT"] == 'down-regulates': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.activity_or_abundance | ||
| object_direction_qualifier = DirectionQualifierEnum.downregulated | ||
| elif record["EFFECT"] == 'down-regulates quantity': | ||
| object_aspect_qualifier = GeneOrGeneProductOrChemicalEntityAspectEnum.expression | ||
| object_direction_qualifier = DirectionQualifierEnum.downregulated |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several EFFECT values handled inside this branch (e.g., up-regulates quantity, down-regulates quantity) will never be reached because the outer condition requires record["EFFECT"] in list_ppi_accept_effects, and that list (defined earlier in the loop) does not include those strings. Either expand list_ppi_accept_effects to include the EFFECT values you intend to support here, or remove the unreachable cases so the mapping matches the gate condition.
detail will be added in the separate QA ticket for SIGNOR ingest