Fix garbage DDx hypotheses when ARLL model outputs prose instead of JSON #13
Merged
- rmoe/agents.py: strengthen _is_clinical_hypothesis to require uppercase
start (all real medical diagnoses begin with uppercase; lowercase-initial
strings are regex-captured sentence fragments) and expand
_NON_CLINICAL_SUBSTRINGS with patterns seen in the broken-hand run
("attention map", " attn ", "arll", " probability", "how to",
"the model", "approach this")
- engine.py: fix pre-existing pyflakes CI failures (remove unused imports
CYAN/MAGENTA/WHITE/_kv; change spurious f-string to plain string)
- tests/test_agents.py: 21 new tests covering _is_clinical_hypothesis
and _parse_arll_output
Co-authored-by: dill-lk <[email protected]>
Copilot AI changed the title from "[WIP] Fix Moondream2 functionality in broken hand image processing" to "Fix garbage DDx hypotheses when ARLL model outputs prose instead of JSON" on Mar 2, 2026
When processing non-chest images (e.g. a hand radiograph), the ARLL model outputs unstructured prose instead of JSON. The regex fallback in _parse_arll_output then captures sentence fragments as diagnosis names (e.g. "with probability", "t the spatial attention map has a attn of"). These fragments pass _is_clinical_hypothesis and pollute the DDx ensemble with low-variance garbage, causing a false high-Sc gate pass and an empty clinical report.

Root cause

_is_clinical_hypothesis had two gaps:
- No check on the first character, so lowercase-initial strings (regex-captured sentence fragments) were accepted as diagnoses.
- _NON_CLINICAL_SUBSTRINGS was missing patterns produced by reasoning-prose leakage: "attention map", "arll", " probability", etc.

Changes
rmoe/agents.py
- _is_clinical_hypothesis: Reject any candidate whose first character is not uppercase. This single rule eliminates all observed garbage fragments.
- _NON_CLINICAL_SUBSTRINGS: Add "attention map", " attn ", "arll", " probability", "how to", "the model", "approach this" as defence-in-depth for uppercase-starting false positives.

When all candidates are filtered, _parse_arll_output returns an empty ensemble, causing ReasoningExpert.execute() to fall through to _fallback_ensemble, which yields a structured output rather than garbage.

engine.py
- Remove unused imports (CYAN, MAGENTA, WHITE, _kv) and fix a spurious f-string that were causing CI pyflakes failures on main.

tests/test_agents.py (new)
- 21 tests covering _is_clinical_hypothesis (valid diagnoses, all five garbage fragments from the reported run, edge cases) and _parse_arll_output (clean JSON, pure-prose input, mixed input, JSON with garbage diagnosis fields).
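
For readers who want to see the filtering behaviour in isolation, here is a minimal, self-contained sketch. It is not the code in rmoe/agents.py. The function names without leading underscores, the JSON shape (a list of objects with diagnosis and probability keys), the fallback regex, the space padding before the substring check, and the placeholder returned by fallback_ensemble are all assumptions made for illustration. Only the uppercase-start rule and the seven substrings come from this PR.

```python
import json
import re

# The seven substrings added in this PR; the real _NON_CLINICAL_SUBSTRINGS
# list in rmoe/agents.py contains other entries not shown here.
NON_CLINICAL_SUBSTRINGS = (
    "attention map", " attn ", "arll", " probability",
    "how to", "the model", "approach this",
)


def is_clinical_hypothesis(candidate: str) -> bool:
    """Accept a candidate only if it starts uppercase and has no prose markers."""
    text = candidate.strip()
    # Rule 1 (from the PR): lowercase-initial strings are sentence fragments.
    if not text or not text[0].isupper():
        return False
    # Rule 2: substring blocklist. Padding with spaces is an assumption made
    # here so that patterns such as " attn " also match at the string edges.
    padded = f" {text.lower()} "
    return not any(marker in padded for marker in NON_CLINICAL_SUBSTRINGS)


# Hypothetical fallback pattern: "<name>: <prob>" or "<name> = <prob>".
_NAME_PROB = re.compile(r"([A-Za-z][A-Za-z \-]*?)\s*[:=]\s*([01](?:\.\d+)?)")


def parse_arll_output(raw: str) -> list:
    """Parse JSON when possible, else regex-scan prose; filter both paths."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Prose path: the regex may capture fragments like "with probability".
        pairs = [(m.group(1), float(m.group(2))) for m in _NAME_PROB.finditer(raw)]
    else:
        items = data if isinstance(data, list) else []
        pairs = [
            (str(d.get("diagnosis", "")), float(d.get("probability", 0.0)))
            for d in items
            if isinstance(d, dict)
        ]
    return [
        {"diagnosis": name.strip(), "probability": prob}
        for name, prob in pairs
        if is_clinical_hypothesis(name)
    ]


def fallback_ensemble() -> list:
    """Hypothetical structured placeholder used when nothing survives filtering."""
    return [{"diagnosis": "Indeterminate finding", "probability": 1.0}]


def build_ensemble(raw: str) -> list:
    # An empty list is falsy, so a fully filtered parse falls through.
    return parse_arll_output(raw) or fallback_ensemble()
```

With this sketch, build_ensemble("I am not sure how to approach this, with probability = 0.5") returns the placeholder from fallback_ensemble(), because the only regex capture ("with probability") starts with a lowercase letter and is rejected.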