fix(constraints): _check_skill_structure checks body not frontmatter + feat(fitness): LLMJudge rubric scoring #23
Open
laolitou2022 wants to merge 1 commit into NousResearch:main from
fix(constraints): _check_skill_structure checks body not frontmatter
Root cause: load_skill() strips frontmatter before returning skill["body"].
evolve_skill.py passes body to validate_all(). _check_skill_structure
checked for --- markers (always absent from body) so every evolved skill
failed constraint validation and was saved as *FAILED.md.
Fix: check body structure instead — heading in first 3 lines (with optional
space after #) and substantive content (>=50 chars non-heading text).
Also fix the re.MULTILINE typo.
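
A minimal sketch of the body-level check described above, for illustration only. The names `check_skill_body`, `HEADING_RE`, and `MIN_CONTENT_CHARS` are hypothetical and are not taken from the diff; the actual `_check_skill_structure` may differ in its return type and messages.

```python
import re

# Hypothetical pattern: a markdown heading, with or without a space after '#'.
HEADING_RE = re.compile(r"^#{1,6}\s*\S", re.MULTILINE)
# Threshold from the description: at least 50 chars of non-heading text.
MIN_CONTENT_CHARS = 50


def check_skill_body(body: str) -> tuple[bool, str]:
    """Validate a skill body that has already had its frontmatter stripped."""
    lines = body.strip().splitlines()

    # A heading must appear somewhere in the first 3 lines.
    head = "\n".join(lines[:3])
    if not HEADING_RE.search(head):
        return False, "no markdown heading in the first 3 lines"

    # Count only text on lines that are not headings.
    content = "".join(
        line.strip() for line in lines if not line.lstrip().startswith("#")
    )
    if len(content) < MIN_CONTENT_CHARS:
        return False, f"only {len(content)} chars of non-heading content"

    return True, "ok"
```

Because the check never looks for `---` markers, it gives the same answer whether or not the caller stripped the frontmatter cleanly, as long as the heading lands in the first three lines.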
feat(fitness): FitnessScore dataclass + LLMJudge with rubric scoring
Replace keyword-overlap skill_fitness_metric() (37-49% accuracy) with
multi-dimensional FitnessScore: correctness, procedure_following,
conciseness, length_penalty, feedback. LLMJudge uses dspy.ParaJudge with
rubric. Backwards-compatible wrapper preserves existing API.
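
A rough sketch of how the multi-dimensional score and the backwards-compatible wrapper could fit together. The field names come from the description above; the weights, the `composite` property, the `Judge` callable type, and the wrapper's signature are assumptions made for illustration, and the LLM call is replaced by a stub so the example runs without DSPy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FitnessScore:
    correctness: float = 0.0
    procedure_following: float = 0.0
    conciseness: float = 0.0
    length_penalty: float = 0.0
    feedback: str = ""

    @property
    def composite(self) -> float:
        # Assumed weights; the PR may combine the dimensions differently.
        raw = (
            0.5 * self.correctness
            + 0.3 * self.procedure_following
            + 0.2 * self.conciseness
        )
        return max(0.0, raw - self.length_penalty)


# Hypothetical judge interface: (expected, output) -> FitnessScore.
Judge = Callable[[str, str], FitnessScore]


def skill_fitness_metric(expected: str, output: str, judge: Judge) -> float:
    """Wrapper that still returns a single float, as the old metric did."""
    return judge(expected, output).composite


def stub_judge(expected: str, output: str) -> FitnessScore:
    # Stand-in for the rubric-based LLM judge; returns fixed scores.
    return FitnessScore(
        correctness=1.0,
        procedure_following=0.5,
        conciseness=1.0,
        length_penalty=0.1,
        feedback="stub",
    )
```

Keeping the scalar in a property lets callers that only need a float use the wrapper, while the optimizer can still read the per-dimension scores and `feedback` from the dataclass.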