fix: runtime bugs + make skill text optimizable by DSPy #24
Open
errusch wants to merge 1 commit into NousResearch:main from
Conversation
Three bug fixes that prevent the pipeline from running:

1. dataset_builder: LLM returns Python-style dicts (single quotes), not valid JSON. Added ast.literal_eval fallback + trailing comma fix so synthetic dataset generation doesn't crash on parse.
2. evolve_skill: GEPA API changed in DSPy 3.1.3: max_steps is now max_metric_calls. Fixed the call and added auto='light'.
3. constraints: _check_skill_structure was checking the skill BODY for YAML frontmatter, which it never has after splitting. Rewrote to validate body structure (headings, procedural content, substance).

One architectural improvement:

4. skill_module: Skill text was passed as an input field, so the optimizer could never mutate it. Restructured to embed skill text in the instruction template via with_instructions(), allowing MIPROv2/GEPA to propose improved skill bodies. Updated extraction logic in evolve_skill.py to pull evolved text from the compiled predictor's instruction.
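The fallback chain in fix 1 can be sketched as follows. This is a minimal illustration, not the PR's actual code; the helper name `parse_llm_dict` is hypothetical, and the exact set of fallback strategies in dataset_builder.py may differ.

```python
import ast
import json
import re


def parse_llm_dict(text: str):
    """Parse an LLM-emitted dict that may be strict JSON or a
    Python-style literal (single quotes, True/None, trailing commas).

    Order matters: try json.loads first, then ast.literal_eval,
    mirroring the fallback described in the PR.
    """
    # Strip trailing commas before a closing brace/bracket,
    # which break json.loads
    cleaned = re.sub(r",\s*([}\]])", r"\1", text.strip())
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    try:
        # Handles single-quoted Python-style dicts safely
        # (no arbitrary code execution, unlike eval)
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError):
        return None
```

`ast.literal_eval` is the natural middle step because it accepts exactly the Python-literal output LLMs tend to produce while still refusing arbitrary expressions.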
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request on Apr 25, 2026
…sResearch#24, NousResearch#26, NousResearch#35)

- PR NousResearch#24: skill_module.py stores skill body as InputField → signature.instructions
  - _load_skill_body() splits frontmatter from body, body becomes instruction
  - _extract_evolved_instructions() extracts from signature.instructions (not wrapper)
  - constraint_validator.py: body/frontmatter separation, validate body has substance
  - dataset_builder.py: robust JSON parsing with 6 fallback strategies
- PR NousResearch#26: GEPA wiring fix, reflection_lm passed to GEPA
- PR NousResearch#35: constraint validator for GEPA args, max_metric_calls not mixed with auto

Note: GEPA still falls back to MIPROv2 due to the DSPy 3.2.0 API: max_metric_calls conflicts with auto='light'. Use max_metric_calls alone (fixed).
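The GEPA-argument constraint mentioned above can be enforced with a small pre-flight check. This is an illustrative sketch, not code from the patch: the function name `validate_gepa_budget` is hypothetical, and it assumes (per the note above) that GEPA accepts exactly one budget argument among `auto`, `max_metric_calls`, and `max_full_evals`.

```python
def validate_gepa_budget(*, auto=None, max_metric_calls=None, max_full_evals=None):
    """Fail fast if zero or multiple GEPA budget arguments are set,
    instead of letting the error surface deep inside the optimizer
    (which is what triggers the silent MIPROv2 fallback)."""
    provided = [name for name, value in (
        ("auto", auto),
        ("max_metric_calls", max_metric_calls),
        ("max_full_evals", max_full_evals),
    ) if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"GEPA needs exactly one budget argument, got: {provided or 'none'}"
        )
    return provided[0]
```

Calling this before constructing the optimizer turns a confusing mid-run failure into an immediate, explicit error.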
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request on Apr 25, 2026
…traint validator, JSON parsing robustness

Combined patch applying upstream PRs NousResearch#24/NousResearch#26/NousResearch#35:

- skill_module.py: embed skill body in signature instructions via HTML sentinel
- evolve_skill.py: HTML sentinel extraction with fallback, GEPA max_metric_calls fix, improved messaging
- constraints.py: validate YAML frontmatter + substantive body content separately
- dataset_builder.py: 6-strategy JSON parser for LLM output resilience
- sentinel collision: replaced \n\n---\n\n (appears in skill bodies) with <!-- ___SKILL_EVOLUTION_SENTINEL___ -->
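The sentinel approach in the last bullet can be sketched like this. It is a minimal illustration under the assumptions stated in the commit message; the helper names `embed_skill_body` and `extract_skill_body` are hypothetical, but the sentinel string is the one named above.

```python
SENTINEL = "<!-- ___SKILL_EVOLUTION_SENTINEL___ -->"


def embed_skill_body(wrapper_instruction: str, skill_body: str) -> str:
    """Join wrapper instruction and skill body with an HTML-comment
    sentinel. Unlike a '\\n\\n---\\n\\n' separator, an HTML comment
    does not occur in ordinary markdown skill bodies (which often
    contain '---' horizontal rules), so the later split is unambiguous.
    """
    return f"{wrapper_instruction}\n{SENTINEL}\n{skill_body}"


def extract_skill_body(evolved_instruction: str, fallback: str) -> str:
    """Pull the (possibly mutated) skill body back out of the compiled
    predictor's instruction; fall back to the original body if the
    optimizer rewrote the sentinel away."""
    if SENTINEL in evolved_instruction:
        return evolved_instruction.split(SENTINEL, 1)[1].strip()
    return fallback
```

The fallback branch matters because an optimizer that proposes a whole new instruction string has no obligation to preserve the sentinel.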
What this fixes
The pipeline crashes on a fresh install with DSPy 3.1.3. Three bugs + one architectural improvement:
Bug fixes
1. JSON parsing crash (dataset_builder.py): LLMs return Python-style dicts with single quotes, not valid JSON. The parser only tried json.loads() and a regex fallback; both failed on single-quoted output. Added ast.literal_eval as a fallback strategy before the regex approach, plus trailing-comma cleanup.

2. GEPA API mismatch (evolve_skill.py): GEPA.__init__() uses max_metric_calls in DSPy 3.1.3, not max_steps. This caused an immediate TypeError, falling back to MIPROv2 every time. Fixed the parameter name and added auto="light".

3. False constraint failures (constraints.py): _check_skill_structure was checking the skill BODY for YAML frontmatter, but the body (after splitting from frontmatter) never has it, so every skill failed this constraint. Rewrote to validate body structure (headings, procedural content, substance) instead.

Architectural improvement

4. skill_module.py + evolve_skill.py: The original code passed skill text as an input field (skill_instructions), so the optimizer could never mutate it; it only optimized the wrapper instruction. Restructured to embed the skill text in the instruction template via with_instructions(), allowing MIPROv2/GEPA to propose improved skill bodies. Updated extraction logic to pull the evolved text from the compiled predictor's instruction.

Testing
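The rewritten body check in fix 3 might look like the following. This is a sketch, not the PR's implementation: the function name, the specific regexes, and the 200-character substance threshold are all assumptions chosen for illustration.

```python
import re


def check_skill_structure(body: str) -> list[str]:
    """Validate the skill BODY itself (post-frontmatter-split):
    it should have markdown headings, some procedural content,
    and enough text to be substantive. Returns a list of problems;
    an empty list means the constraint passes."""
    problems = []
    # At least one markdown heading ("# ..." through "###### ...")
    if not re.search(r"^#{1,6}\s+\S", body, re.MULTILINE):
        problems.append("no markdown headings")
    # Procedural content: numbered steps or bullet points
    if not re.search(r"^\s*(\d+\.|[-*])\s+\S", body, re.MULTILINE):
        problems.append("no numbered steps or bullets")
    # Substance: minimum length (threshold is an assumption)
    if len(body.strip()) < 200:
        problems.append("body too short to be substantive")
    return problems
```

The key difference from the buggy version is that nothing here looks for `---`-delimited YAML frontmatter, which by construction never survives the split.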
Ran end-to-end evolution on the arxiv skill with --eval-source synthetic using gemini-2.5-flash. Pipeline completes successfully: generates 20 synthetic eval cases, runs MIPROv2 optimization (10 trials), passes all constraints, and produces a mutated skill body (+2.4% growth, 232 chars added).

The holdout dip is expected: the keyword-overlap metric is a weak proxy. The real improvement needs an LLM-as-judge metric for holdout eval, which the existing LLMJudge class supports but isn't wired into the holdout scoring yet.
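To see why keyword overlap is a weak proxy, consider a minimal version of such a metric. This is an illustrative sketch, not the repo's actual scoring code; the tokenization and length cutoff are assumptions.

```python
def keyword_overlap(gold: str, pred: str) -> float:
    """Fraction of gold keywords (tokens longer than 3 chars)
    that appear verbatim in the prediction. A correct paraphrase
    with different wording scores near zero, which is exactly the
    failure mode an LLM-as-judge metric would avoid."""
    gold_tokens = {t for t in gold.lower().split() if len(t) > 3}
    if not gold_tokens:
        return 0.0
    pred_tokens = set(pred.lower().split())
    return len(gold_tokens & pred_tokens) / len(gold_tokens)
```

A prediction that restates the gold answer in different words gets no credit, so a genuinely improved skill can show a holdout "dip" under this metric.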