fix: runtime bugs + make skill text optimizable by DSPy #24
Open
errusch wants to merge 1 commit into NousResearch:main from
Conversation
Three bug fixes that prevent the pipeline from running:

1. dataset_builder: LLM returns Python-style dicts (single quotes), not valid JSON. Added ast.literal_eval fallback + trailing comma fix so synthetic dataset generation doesn't crash on parse.
2. evolve_skill: GEPA API changed in DSPy 3.1.3: max_steps is now max_metric_calls. Fixed the call and added auto='light'.
3. constraints: _check_skill_structure was checking the skill BODY for YAML frontmatter, which it never has after splitting. Rewrote to validate body structure (headings, procedural content, substance).

One architectural improvement:

4. skill_module: Skill text was passed as an input field, so the optimizer could never mutate it. Restructured to embed skill text in the instruction template via with_instructions(), allowing MIPROv2/GEPA to propose improved skill bodies. Updated extraction logic in evolve_skill.py to pull evolved text from the compiled predictor's instruction.
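The fallback chain in fix 1 can be sketched as follows. This is a minimal illustration, not the PR's actual code; the helper name `parse_llm_dict` is hypothetical, and the exact set of fallback strategies in dataset_builder.py may differ.

```python
import ast
import json
import re


def parse_llm_dict(text: str):
    """Parse an LLM-emitted dict that may be strict JSON or a
    Python-style literal (single quotes, True/None, trailing commas).

    Order matters: try json.loads first, then ast.literal_eval,
    mirroring the fallback described in the PR.
    """
    # Strip trailing commas before a closing brace/bracket,
    # which break json.loads
    cleaned = re.sub(r",\s*([}\]])", r"\1", text.strip())
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    try:
        # Handles single-quoted Python-style dicts safely
        # (no arbitrary code execution, unlike eval)
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError):
        return None
```

`ast.literal_eval` is the natural middle step because it accepts exactly the Python-literal output LLMs tend to produce while still refusing arbitrary expressions.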
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request on Apr 25, 2026
…sResearch#24, NousResearch#26, NousResearch#35)

- PR NousResearch#24: skill_module.py stores skill body as InputField → signature.instructions
  - _load_skill_body() splits frontmatter from body, body becomes instruction
  - _extract_evolved_instructions() extracts from signature.instructions (not wrapper)
  - constraint_validator.py: body/frontmatter separation, validate body has substance
  - dataset_builder.py: robust JSON parsing with 6 fallback strategies
- PR NousResearch#26: GEPA wiring fix, reflection_lm passed to GEPA
- PR NousResearch#35: constraint validator for GEPA args, max_metric_calls not mixed with auto

Note: GEPA still falls back to MIPROv2 due to the DSPy 3.2.0 API: max_metric_calls conflicts with auto='light'. Use max_metric_calls alone (fixed).
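The GEPA-argument constraint mentioned above can be enforced with a small pre-flight check. This is an illustrative sketch, not code from the patch: the function name `validate_gepa_budget` is hypothetical, and it assumes (per the note above) that GEPA accepts exactly one budget argument among `auto`, `max_metric_calls`, and `max_full_evals`.

```python
def validate_gepa_budget(*, auto=None, max_metric_calls=None, max_full_evals=None):
    """Fail fast if zero or multiple GEPA budget arguments are set,
    instead of letting the error surface deep inside the optimizer
    (which is what triggers the silent MIPROv2 fallback)."""
    provided = [name for name, value in (
        ("auto", auto),
        ("max_metric_calls", max_metric_calls),
        ("max_full_evals", max_full_evals),
    ) if value is not None]
    if len(provided) != 1:
        raise ValueError(
            f"GEPA needs exactly one budget argument, got: {provided or 'none'}"
        )
    return provided[0]
```

Calling this before constructing the optimizer turns a confusing mid-run failure into an immediate, explicit error.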
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request on Apr 25, 2026
…traint validator, JSON parsing robustness

Combined patch applying upstream PRs NousResearch#24/NousResearch#26/NousResearch#35:

- skill_module.py: embed skill body in signature instructions via HTML sentinel
- evolve_skill.py: HTML sentinel extraction with fallback, GEPA max_metric_calls fix, improved messaging
- constraints.py: validate YAML frontmatter + substantive body content separately
- dataset_builder.py: 6-strategy JSON parser for LLM output resilience
- sentinel collision: replaced \n\n---\n\n (appears in skill bodies) with <!-- ___SKILL_EVOLUTION_SENTINEL___ -->
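The sentinel approach in the last bullet can be sketched like this. It is a minimal illustration under the assumptions stated in the commit message; the helper names `embed_skill_body` and `extract_skill_body` are hypothetical, but the sentinel string is the one named above.

```python
SENTINEL = "<!-- ___SKILL_EVOLUTION_SENTINEL___ -->"


def embed_skill_body(wrapper_instruction: str, skill_body: str) -> str:
    """Join wrapper instruction and skill body with an HTML-comment
    sentinel. Unlike a '\\n\\n---\\n\\n' separator, an HTML comment
    does not occur in ordinary markdown skill bodies (which often
    contain '---' horizontal rules), so the later split is unambiguous.
    """
    return f"{wrapper_instruction}\n{SENTINEL}\n{skill_body}"


def extract_skill_body(evolved_instruction: str, fallback: str) -> str:
    """Pull the (possibly mutated) skill body back out of the compiled
    predictor's instruction; fall back to the original body if the
    optimizer rewrote the sentinel away."""
    if SENTINEL in evolved_instruction:
        return evolved_instruction.split(SENTINEL, 1)[1].strip()
    return fallback
```

The fallback branch matters because an optimizer that proposes a whole new instruction string has no obligation to preserve the sentinel.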
What this fixes
The pipeline crashes on a fresh install with DSPy 3.1.3. Three bugs + one architectural improvement:
Bug fixes
1. JSON parsing crash (dataset_builder.py): LLMs return Python-style dicts with single quotes, not valid JSON. The parser only tried json.loads() and a regex fallback; both failed on single-quoted output. Added ast.literal_eval as a fallback strategy before the regex approach, plus trailing-comma cleanup.

2. GEPA API mismatch (evolve_skill.py): GEPA.__init__() uses max_metric_calls in DSPy 3.1.3, not max_steps. This caused an immediate TypeError, falling back to MIPROv2 every time. Fixed the parameter name and added auto="light".

3. False constraint failures (constraints.py): _check_skill_structure was checking the skill BODY for YAML frontmatter, but the body (after splitting from frontmatter) never has it, so every skill failed this constraint. Rewrote to validate body structure (headings, procedural content, substance) instead.

Architectural improvement

4. skill_module.py + evolve_skill.py: The original code passed skill text as an input field (skill_instructions), so the optimizer could never mutate it; it only optimized the wrapper instruction. Restructured to embed the skill text in the instruction template via with_instructions(), allowing MIPROv2/GEPA to propose improved skill bodies. Updated extraction logic to pull the evolved text from the compiled predictor's instruction.

Testing
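The rewritten body check in fix 3 might look like the following. This is a sketch, not the PR's implementation: the function name, the specific regexes, and the 200-character substance threshold are all assumptions chosen for illustration.

```python
import re


def check_skill_structure(body: str) -> list[str]:
    """Validate the skill BODY itself (post-frontmatter-split):
    it should have markdown headings, some procedural content,
    and enough text to be substantive. Returns a list of problems;
    an empty list means the constraint passes."""
    problems = []
    # At least one markdown heading ("# ..." through "###### ...")
    if not re.search(r"^#{1,6}\s+\S", body, re.MULTILINE):
        problems.append("no markdown headings")
    # Procedural content: numbered steps or bullet points
    if not re.search(r"^\s*(\d+\.|[-*])\s+\S", body, re.MULTILINE):
        problems.append("no numbered steps or bullets")
    # Substance: minimum length (threshold is an assumption)
    if len(body.strip()) < 200:
        problems.append("body too short to be substantive")
    return problems
```

The key difference from the buggy version is that nothing here looks for `---`-delimited YAML frontmatter, which by construction never survives the split.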
Ran end-to-end evolution on the arxiv skill with --eval-source synthetic using gemini-2.5-flash. Pipeline completes successfully: generates 20 synthetic eval cases, runs MIPROv2 optimization (10 trials), passes all constraints, and produces a mutated skill body (+2.4% growth, 232 chars added).

The holdout dip is expected: the keyword-overlap metric is a weak proxy. The real improvement needs an LLM-as-judge metric for holdout eval, which the existing LLMJudge class supports but isn't wired into the holdout scoring yet.
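To see why keyword overlap is a weak proxy, consider a minimal version of such a metric. This is an illustrative sketch, not the repo's actual scoring code; the tokenization and length cutoff are assumptions.

```python
def keyword_overlap(gold: str, pred: str) -> float:
    """Fraction of gold keywords (tokens longer than 3 chars)
    that appear verbatim in the prediction. A correct paraphrase
    with different wording scores near zero, which is exactly the
    failure mode an LLM-as-judge metric would avoid."""
    gold_tokens = {t for t in gold.lower().split() if len(t) > 3}
    if not gold_tokens:
        return 0.0
    pred_tokens = set(pred.lower().split())
    return len(gold_tokens & pred_tokens) / len(gold_tokens)
```

A prediction that restates the gold answer in different words gets no credit, so a genuinely improved skill can show a holdout "dip" under this metric.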