
fix(constraints): _check_skill_structure checks body not frontmatter + feat(fitness): LLMJudge rubric scoring #23

Open
laolitou2022 wants to merge 1 commit into NousResearch:main from
laolitou2022:fix/constraint-validator-and-fitness-score

@laolitou2022

fix(constraints): _check_skill_structure checks body not frontmatter

Root cause: load_skill() strips frontmatter before returning skill["body"].
evolve_skill.py passes body to validate_all(). _check_skill_structure
checked for --- markers (always absent from body) so every evolved skill
failed constraint validation and was saved as *FAILED.md.

Fix: check body structure instead: a heading within the first 3 lines
(the space after # is optional) and substantive content (>=50 chars of
non-heading text). Also fix the re.MULTILINE typo.
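
For reviewers, here is a minimal sketch of the body check described above. It is not the code in this PR: the function name `check_body_structure`, the `(ok, message)` return shape, and the exact heading regex are assumptions made for illustration. Only the two rules (heading within the first 3 lines, at least 50 characters of non-heading text) come from the description.

```python
import re

# Hypothetical pattern: 1-6 '#' characters, optional whitespace, then a
# character that is neither '#' nor whitespace. Accepts "# Title" and "#Title".
HEADING_RE = re.compile(r"^#{1,6}\s*[^#\s]")


def check_body_structure(body: str, min_chars: int = 50) -> tuple[bool, str]:
    """Validate a skill body that has already had its frontmatter stripped."""
    lines = body.strip().splitlines()

    # Rule 1: a heading must appear within the first 3 lines.
    if not any(HEADING_RE.match(line.strip()) for line in lines[:3]):
        return False, "no heading in the first 3 lines"

    # Rule 2: count characters on lines that do not start with '#'.
    non_heading = "".join(
        line.strip() for line in lines if not line.lstrip().startswith("#")
    )
    if len(non_heading) < min_chars:
        return False, f"only {len(non_heading)} chars of non-heading text"

    return True, "ok"
```

Because the check never looks for `---` markers, a body returned by load_skill() can pass, while input that still opens with a frontmatter block fails the heading rule unless a heading falls within its first 3 lines.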

feat(fitness): FitnessScore dataclass + LLMJudge with rubric scoring

Replace keyword-overlap skill_fitness_metric() (37-49% accuracy) with
multi-dimensional FitnessScore: correctness, procedure_following,
conciseness, length_penalty, feedback. LLMJudge uses dspy.ParaJudge with
rubric. Backwards-compatible wrapper preserves existing API.
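
A rough sketch of how the score container and a scalar wrapper could fit together. The five field names are taken from the description above; everything else is an assumption for illustration, including the `composite` property, its weights, the `scalar_metric` helper, and the `(expected, actual)` judge signature. The LLM call is replaced by a plain callable so the sketch runs without any model or dspy dependency.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FitnessScore:
    correctness: float = 0.0
    procedure_following: float = 0.0
    conciseness: float = 0.0
    length_penalty: float = 0.0
    feedback: str = ""

    @property
    def composite(self) -> float:
        # Hypothetical weights; the PR does not state how dimensions combine.
        raw = (
            0.5 * self.correctness
            + 0.3 * self.procedure_following
            + 0.2 * self.conciseness
        )
        # Subtract the penalty and clamp to [0, 1].
        return max(0.0, min(1.0, raw - self.length_penalty))


# Hypothetical judge signature: (expected, actual) -> FitnessScore.
Judge = Callable[[str, str], FitnessScore]


def scalar_metric(judge: Judge) -> Callable[[str, str], float]:
    """Wrap a judge so callers that expect a single float keep working."""

    def metric(expected: str, actual: str) -> float:
        return judge(expected, actual).composite

    return metric
```

Keeping `feedback` on the dataclass but out of the scalar lets older callers continue to receive one number while newer callers can read the per-dimension scores and the judge's textual feedback.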
