Fix garbage DDx hypotheses when ARLL model outputs prose instead of JSON by Copilot · Pull Request #13 · dill-lk/R-MoE-for-Clinical-Diagnostics

Copilot · 2026-03-02T15:08:57Z

When processing non-chest images (e.g. a hand radiograph), the ARLL model outputs unstructured prose instead of JSON. The regex fallback in _parse_arll_output then captures sentence fragments as diagnosis names — e.g. "with probability", "t the spatial attention map has a attn of" — which pass _is_clinical_hypothesis and pollute the DDx ensemble with low-variance garbage, causing a false high-Sc gate pass and an empty clinical report.
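The failure mode can be reproduced with a small sketch. The pattern below is a hypothetical stand-in, since the regex actually used by _parse_arll_output is not shown in this PR; it only illustrates how a "name: probability" fallback can latch onto the tail of a sentence once the model writes prose.

```python
import re
from typing import List, Tuple

# Hypothetical fallback pattern: "<name>: <probability>" or "<name> = <probability>".
# This is NOT the repository's regex, only an illustration of the failure mode.
_FALLBACK_PATTERN = re.compile(r"([A-Za-z][A-Za-z ]*?)\s*[:=]\s*(\d*\.\d+)")


def extract_candidates(text: str) -> List[Tuple[str, float]]:
    """Return (name, probability) pairs matched anywhere in free text."""
    return [
        (name.strip(), float(prob))
        for name, prob in _FALLBACK_PATTERN.findall(text)
    ]


structured = "Pneumothorax: 0.71"
prose = "Fracture is likely, with probability: 0.62"

print(extract_candidates(structured))  # [('Pneumothorax', 0.71)]
print(extract_candidates(prose))       # [('with probability', 0.62)]
```

In the prose case the comma stops the match from starting at "Fracture", so the captured "diagnosis" is the sentence fragment that follows it.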

Root cause

_is_clinical_hypothesis had two gaps:

- No check that diagnosis names start with an uppercase letter. Diagnosis names in well-formed output are capitalised, whereas every lowercase-initial string observed was a regex-captured sentence fragment.
- _NON_CLINICAL_SUBSTRINGS was missing patterns produced by reasoning-prose leakage: "attention map", "arll", " probability", etc.

Changes

`rmoe/agents.py`

- _is_clinical_hypothesis: Reject any candidate whose first character is not uppercase. This single rule eliminates all observed garbage fragments.
- _NON_CLINICAL_SUBSTRINGS: Add "attention map", " attn ", "arll", " probability", "how to", "the model", "approach this" as defence-in-depth for uppercase-starting false positives.

```python
# Before: "with probability" and "t the spatial attention map…" passed the filter
def _is_clinical_hypothesis(name: str) -> bool:
    if len(name.strip()) < _MIN_DIAGNOSIS_LENGTH:
        return False
    low = name.lower()
    return not any(s in low for s in _NON_CLINICAL_SUBSTRINGS)

# After: lowercase-initial strings are immediately rejected
def _is_clinical_hypothesis(name: str) -> bool:
    stripped = name.strip()
    if len(stripped) < _MIN_DIAGNOSIS_LENGTH:
        return False
    if not stripped[0].isupper():   # sentence fragments start lowercase
        return False
    low = stripped.lower()
    return not any(s in low for s in _NON_CLINICAL_SUBSTRINGS)
```
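To see the new rule in action, here is a self-contained sketch of the filter applied to a few inputs. The value of _MIN_DIAGNOSIS_LENGTH is an assumption, and the substring tuple contains only the seven patterns this PR adds, not the full list in rmoe/agents.py.

```python
_MIN_DIAGNOSIS_LENGTH = 4  # assumed value; the real constant is not shown in the PR
_NON_CLINICAL_SUBSTRINGS = (  # only the patterns added by this PR
    "attention map", " attn ", "arll", " probability",
    "how to", "the model", "approach this",
)


def is_clinical_hypothesis(name: str) -> bool:
    """Stand-in for the patched _is_clinical_hypothesis."""
    stripped = name.strip()
    if len(stripped) < _MIN_DIAGNOSIS_LENGTH:
        return False
    if not stripped[0].isupper():  # sentence fragments start lowercase
        return False
    low = stripped.lower()
    return not any(s in low for s in _NON_CLINICAL_SUBSTRINGS)


samples = [
    "Distal radius fracture",                     # kept
    "with probability",                           # rejected: lowercase start
    "t the spatial attention map has a attn of",  # rejected: lowercase start
    "The model suggests a fracture",              # rejected: contains "the model"
    "de Quervain tenosynovitis",                  # rejected: lowercase start
]
for s in samples:
    print(f"{s!r}: {is_clinical_hypothesis(s)}")
```

The last sample shows the trade-off of the uppercase rule: a diagnosis conventionally written with a lowercase first letter is rejected as well, so the rule relies on the model capitalising the names it emits.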

When all candidates are filtered, _parse_arll_output returns an empty ensemble, causing ReasoningExpert.execute() to fall through to _fallback_ensemble — a structured output rather than garbage.
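The fall-through can be pictured roughly as follows. This is a sketch under assumptions: select_ensemble, starts_upper and placeholder_fallback are hypothetical names, and the real contents of _fallback_ensemble are not shown in this PR.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]


def select_ensemble(
    candidates: List[Hypothesis],
    is_valid: Callable[[str], bool],
    fallback: Callable[[], List[Hypothesis]],
) -> List[Hypothesis]:
    """Keep valid candidates; if none survive, use the structured fallback."""
    kept = [(name.strip(), prob) for name, prob in candidates if is_valid(name)]
    return kept if kept else fallback()


def starts_upper(name: str) -> bool:
    # Simplified stand-in for _is_clinical_hypothesis.
    return name.strip()[:1].isupper()


def placeholder_fallback() -> List[Hypothesis]:
    # Placeholder payload; the real fallback ensemble is not part of this PR text.
    return [("Indeterminate finding", 1.0)]


print(select_ensemble([("with probability", 0.62)], starts_upper, placeholder_fallback))
print(select_ensemble([("Pneumothorax", 0.71), ("with probability", 0.2)],
                      starts_upper, placeholder_fallback))
```

An empty result after filtering is treated as a signal to use the fallback rather than as a valid, confident ensemble.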

`engine.py`

Remove unused imports (CYAN, MAGENTA, WHITE, _kv) and fix a spurious f-string that were causing CI pyflakes failures on main.
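For readers unfamiliar with the f-string warning: pyflakes flags an f-string that contains no placeholders, and the fix is to drop the prefix. The snippet below is an illustrative sketch with made-up variable names, not the actual line changed in engine.py.

```python
# Flagged by pyflakes: the f-prefix has nothing to interpolate.
# banner = f"Starting diagnostic run"

# Fixed: a plain string literal with the same value.
banner = "Starting diagnostic run"

# An f-string is still appropriate when a placeholder is present.
step = 3
status = f"step {step} complete"

print(banner)
print(status)
```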

`tests/test_agents.py` (new)

21 tests covering _is_clinical_hypothesis (valid diagnoses, all five garbage fragments from the reported run, edge cases) and _parse_arll_output (clean JSON, pure-prose input, mixed input, JSON with garbage diagnosis fields).

- rmoe/agents.py: strengthen _is_clinical_hypothesis to require uppercase start (all real medical diagnoses begin with uppercase; lowercase-initial strings are regex-captured sentence fragments) and expand _NON_CLINICAL_SUBSTRINGS with patterns seen in the broken-hand run ("attention map", " attn ", "arll", " probability", "how to", "the model", "approach this")
- engine.py: fix pre-existing pyflakes CI failures (remove unused imports CYAN/MAGENTA/WHITE/_kv; change spurious f-string to plain string)
- tests/test_agents.py: 21 new tests covering _is_clinical_hypothesis and _parse_arll_output

Co-authored-by: dill-lk

chatgpt-codex-connector · 2026-03-02T15:21:18Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Initial plan

4c8c04b

Copilot AI assigned Copilot and dill-lk Mar 2, 2026

Copilot started work on behalf of dill-lk March 2, 2026 15:09 View session

Copilot AI changed the title ~~[WIP] Fix Moondream2 functionality in broken hand image processing~~ Fix garbage DDx hypotheses when ARLL model outputs prose instead of JSON Mar 2, 2026

Copilot finished work on behalf of dill-lk March 2, 2026 15:19

dill-lk marked this pull request as ready for review March 2, 2026 15:21

dill-lk merged commit 65b8053 into main Mar 2, 2026

dill-lk deleted the copilot/fix-moondream2-issue branch March 2, 2026 15:51

dill-lk merged 2 commits into main from copilot/fix-moondream2-issue
