Fix garbage DDx hypotheses when ARLL model outputs prose instead of JSON #13

Merged
dill-lk merged 2 commits into main from copilot/fix-moondream2-issue
Mar 2, 2026

Copilot AI commented Mar 2, 2026

When processing non-chest images (e.g. a hand radiograph), the ARLL model outputs unstructured prose instead of JSON. The regex fallback in _parse_arll_output then captures sentence fragments as diagnosis names — e.g. "with probability", "t the spatial attention map has a attn of" — which pass _is_clinical_hypothesis and pollute the DDx ensemble with low-variance garbage, causing a false high-Sc gate pass and an empty clinical report.

Root cause

_is_clinical_hypothesis had two gaps:

  1. No check that diagnosis names start with an uppercase letter (all real medical conditions do; lowercase-initial strings are always regex-captured sentence fragments)
  2. _NON_CLINICAL_SUBSTRINGS missing patterns produced by reasoning-prose leakage: "attention map", "arll", " probability", etc.

Changes

rmoe/agents.py

  • _is_clinical_hypothesis: Reject any candidate whose first character is not uppercase. This single rule eliminates all observed garbage fragments.
  • _NON_CLINICAL_SUBSTRINGS: Add "attention map", " attn ", "arll", " probability", "how to", "the model", "approach this" as defence-in-depth for uppercase-starting false positives.

```python
# Before: "with probability" and "t the spatial attention map…" passed the filter
def _is_clinical_hypothesis(name: str) -> bool:
    if len(name.strip()) < _MIN_DIAGNOSIS_LENGTH:
        return False
    low = name.lower()
    return not any(s in low for s in _NON_CLINICAL_SUBSTRINGS)

# After: lowercase-initial strings are immediately rejected
def _is_clinical_hypothesis(name: str) -> bool:
    stripped = name.strip()
    if len(stripped) < _MIN_DIAGNOSIS_LENGTH:
        return False
    if not stripped[0].isupper():  # sentence fragments start lowercase
        return False
    low = stripped.lower()
    return not any(s in low for s in _NON_CLINICAL_SUBSTRINGS)
```

When all candidates are filtered, _parse_arll_output returns an empty ensemble, causing ReasoningExpert.execute() to fall through to _fallback_ensemble — a structured output rather than garbage.

engine.py

  • Remove unused imports (CYAN, MAGENTA, WHITE, _kv) and replace a spurious f-string with a plain string; both were causing CI pyflakes failures on main.
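
For context, pyflakes reports "f-string is missing placeholders" when a string carries the f prefix but contains no replacement fields. The snippet below illustrates that case only; the message text is made up and is not the string from engine.py.

```python
# An f-string with no {placeholders}: the f prefix does nothing here,
# and pyflakes flags it.
before = f"Starting pipeline"  # hypothetical message

# Dropping the prefix yields the identical string and silences the warning.
after = "Starting pipeline"

print(before == after)
```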

tests/test_agents.py (new)

  • 21 tests covering _is_clinical_hypothesis (valid diagnoses, all five garbage fragments from the reported run, edge cases) and _parse_arll_output (clean JSON, pure-prose input, mixed input, JSON with garbage diagnosis fields).
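
A few of these cases might look like the sketch below, written with unittest and subTest. It does not reproduce tests/test_agents.py. The filter is inlined so the block runs standalone, with an assumed _MIN_DIAGNOSIS_LENGTH and only the seven substrings named in this PR. The valid diagnosis names are illustrative, and only the two garbage fragments quoted in the description are used.

```python
import unittest

_MIN_DIAGNOSIS_LENGTH = 4  # assumed value, not taken from rmoe/agents.py
_NON_CLINICAL_SUBSTRINGS = (  # only the patterns named in this PR
    "attention map", " attn ", "arll", " probability",
    "how to", "the model", "approach this",
)


def _is_clinical_hypothesis(name: str) -> bool:
    stripped = name.strip()
    if len(stripped) < _MIN_DIAGNOSIS_LENGTH:
        return False
    if not stripped[0].isupper():
        return False
    low = stripped.lower()
    return not any(s in low for s in _NON_CLINICAL_SUBSTRINGS)


class IsClinicalHypothesisTests(unittest.TestCase):
    def test_accepts_valid_diagnoses(self):
        for name in ("Pneumonia", "Distal radius fracture", "  Osteoarthritis  "):
            with self.subTest(name=name):
                self.assertTrue(_is_clinical_hypothesis(name))

    def test_rejects_reported_fragments(self):
        fragments = (
            "with probability",
            "t the spatial attention map has a attn of",
        )
        for name in fragments:
            with self.subTest(name=name):
                self.assertFalse(_is_clinical_hypothesis(name))

    def test_rejects_uppercase_prose(self):
        # Uppercase-initial prose must be caught by the substring list.
        for name in ("The model is unsure", "How to approach this image"):
            with self.subTest(name=name):
                self.assertFalse(_is_clinical_hypothesis(name))

    def test_rejects_too_short(self):
        for name in ("", "  ", "Flu"):
            with self.subTest(name=name):
                self.assertFalse(_is_clinical_hypothesis(name))


def run_suite() -> unittest.TestResult:
    """Load and run the test case above without exiting the interpreter."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(IsClinicalHypothesisTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

The uppercase-prose case is what the expanded substring list is for: those strings pass the first-character check and are rejected only by the defence-in-depth patterns.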

- rmoe/agents.py: strengthen _is_clinical_hypothesis to require uppercase
  start (all real medical diagnoses begin with uppercase; lowercase-initial
  strings are regex-captured sentence fragments) and expand
  _NON_CLINICAL_SUBSTRINGS with patterns seen in the broken-hand run
  ("attention map", " attn ", "arll", " probability", "how to",
  "the model", "approach this")
- engine.py: fix pre-existing pyflakes CI failures (remove unused imports
  CYAN/MAGENTA/WHITE/_kv; change spurious f-string to plain string)
- tests/test_agents.py: 21 new tests covering _is_clinical_hypothesis
  and _parse_arll_output

Co-authored-by: dill-lk <[email protected]>
Copilot AI changed the title from "[WIP] Fix Moondream2 functionality in broken hand image processing" to "Fix garbage DDx hypotheses when ARLL model outputs prose instead of JSON" Mar 2, 2026
@dill-lk dill-lk marked this pull request as ready for review March 2, 2026 15:21
@dill-lk dill-lk merged commit 65b8053 into main Mar 2, 2026
@dill-lk dill-lk deleted the copilot/fix-moondream2-issue branch March 2, 2026 15:51