
fix: runtime bugs + make skill text optimizable by DSPy#24

Open
errusch wants to merge 1 commit into NousResearch:main from errusch:fix/runtime-bugs-and-skill-optimization

Conversation


@errusch errusch commented Apr 14, 2026

What this fixes

The pipeline crashes on a fresh install with DSPy 3.1.3. Three bugs + one architectural improvement:

Bug fixes

  1. JSON parsing crash (dataset_builder.py): LLMs return Python-style dicts with single quotes, not valid JSON. The parser only tried json.loads() and a regex fallback — both failed on single-quoted output. Added ast.literal_eval as a fallback strategy before the regex approach, plus trailing-comma cleanup.

  2. GEPA API mismatch (evolve_skill.py): GEPA.__init__() uses max_metric_calls in DSPy 3.1.3, not max_steps. This raised an immediate TypeError, so the pipeline silently fell back to MIPROv2 every time. Fixed the parameter name and added auto="light".

  3. False constraint failures (constraints.py): _check_skill_structure was checking the skill BODY for YAML frontmatter, but the body (after splitting from frontmatter) never has it — every skill failed this constraint. Rewrote to validate body structure (headings, procedural content, substance) instead.
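The fallback chain for bug 1 can be sketched roughly like this. This is an illustrative reconstruction, not the actual code in dataset_builder.py; `parse_llm_dict` is a hypothetical name, and the real parser may order its strategies differently:

```python
import ast
import json
import re

def parse_llm_dict(raw: str) -> dict:
    """Parse an LLM response that should be a JSON object.

    Falls back to ast.literal_eval for Python-style single-quoted
    dicts (which json.loads rejects), after cleaning trailing commas.
    """
    # Strategy 1: strict JSON.
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, ValueError):
        pass

    # Drop trailing commas before a closing brace/bracket.
    cleaned = re.sub(r",\s*([}\]])", r"\1", raw)

    # Strategy 2: Python literal — handles single quotes, True/None.
    try:
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError):
        pass

    # Strategy 3: regex fallback — grab the outermost braces and retry.
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if match:
        return ast.literal_eval(match.group(0))
    raise ValueError(f"Could not parse LLM output: {raw[:80]!r}")
```

The key point is ordering: `ast.literal_eval` runs before the regex fallback, so single-quoted output is recovered losslessly instead of being mangled by a brace-matching heuristic.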

Architectural improvement

  1. Skill text as optimizable instruction (skill_module.py + evolve_skill.py): The original code passed skill text as an input field (skill_instructions), so the optimizer could never mutate it — it only optimized the wrapper instruction. Restructured to embed the skill text in the instruction template via with_instructions(), allowing MIPROv2/GEPA to propose improved skill bodies. Updated extraction logic to pull the evolved text from the compiled predictor's instruction.

Testing

Ran end-to-end evolution on the arxiv skill with --eval-source synthetic using gemini-2.5-flash. Pipeline completes successfully: generates 20 synthetic eval cases, runs MIPROv2 optimization (10 trials), passes all constraints, and produces a mutated skill body (+2.4% growth, 232 chars added).

Holdout Score:  0.356 → 0.346 (-0.011 with keyword-overlap metric)
Skill Size:     9,773 → 10,005 chars (+2.4%)

The holdout dip is expected — the keyword-overlap metric is a weak proxy. The real improvement needs an LLM-as-judge metric for holdout eval, which the existing LLMJudge class supports but isn't wired into the holdout scoring yet.
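One way the judge wiring could look: make the holdout scorer take the metric as a callable, so a judge-backed function can replace keyword overlap without touching the scoring loop. This is a hypothetical sketch — `score_holdout` and `keyword_overlap` are illustrative names, and the actual LLMJudge interface is not shown:

```python
from typing import Callable, Iterable, Tuple

def keyword_overlap(gold: str, pred: str) -> float:
    """The weak proxy metric: fraction of gold keywords found in pred."""
    gold_words = set(gold.lower().split())
    pred_words = set(pred.lower().split())
    if not gold_words:
        return 0.0
    return len(gold_words & pred_words) / len(gold_words)

def score_holdout(
    examples: Iterable[Tuple[str, str]],
    predict: Callable[[str], str],
    metric: Callable[[str, str], float],
) -> float:
    """Average a metric over (question, gold_answer) holdout pairs.

    Passing a judge-backed callable here (e.g. one wrapping LLMJudge)
    would swap out the keyword proxy without changing the loop.
    """
    scores = [metric(gold, predict(q)) for q, gold in examples]
    return sum(scores) / len(scores) if scores else 0.0
```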

Three bug fixes that prevent the pipeline from running:

1. dataset_builder: LLM returns Python-style dicts (single quotes), not
   valid JSON. Added ast.literal_eval fallback + trailing comma fix so
   synthetic dataset generation doesn't crash on parse.

2. evolve_skill: GEPA API changed in DSPy 3.1.3 — max_steps is now
   max_metric_calls. Fixed the call and added auto='light'.

3. constraints: _check_skill_structure was checking the skill BODY for
   YAML frontmatter, which it never has after splitting. Rewrote to
   validate body structure (headings, procedural content, substance).

One architectural improvement:

4. skill_module: Skill text was passed as an input field, so the
   optimizer could never mutate it. Restructured to embed skill text
   in the instruction template via with_instructions(), allowing
   MIPROv2/GEPA to propose improved skill bodies. Updated extraction
   logic in evolve_skill.py to pull evolved text from the compiled
   predictor's instruction.
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request Apr 25, 2026
…sResearch#24, NousResearch#26, NousResearch#35)

- PR NousResearch#24: skill_module.py stores skill body as InputField → signature.instructions
  - _load_skill_body() splits frontmatter from body, body becomes instruction
  - _extract_evolved_instructions() extracts from signature.instructions (not wrapper)
  - constraint_validator.py: body/frontmatter separation — validate body has substance
  - dataset_builder.py: robust JSON parsing with 6 fallback strategies

- PR NousResearch#26: GEPA wiring fix — reflection_lm passed to GEPA

- PR NousResearch#35: constraint validator for GEPA args, max_metric_calls not mixed with auto

Note: GEPA still falls back to MIPROv2 due to DSPy 3.2.0 API — max_metric_calls
conflicts with auto='light'. Use max_metric_calls alone (fixed).
steezkelly added a commit to steezkelly/hermes-agent-self-evolution that referenced this pull request Apr 25, 2026
…traint validator, JSON parsing robustness

Combined patch applying upstream PRs NousResearch#24/NousResearch#26/NousResearch#35:
- skill_module.py: embed skill body in signature instructions via HTML sentinel
- evolve_skill.py: HTML sentinel extraction with fallback, GEPA max_metric_calls fix, improved messaging
- constraints.py: validate YAML frontmatter + substantive body content separately
- dataset_builder.py: 6-strategy JSON parser for LLM output resilience
- sentinel collision: replaced \n\n---\n\n (appears in skill bodies) with <!-- ___SKILL_EVOLUTION_SENTINEL___ -->
