Make feedback and finetune-data paths respect subject/model args #2

Description

@sunghunkwag

Hi, I found a path/lineage issue in the problem-solving self-improvement pipeline.

The README describes the loop as parameterized by subject/model, with data at:

dataset/{subject}_train.jsonl
dataset/{subject}_test.jsonl

and examples such as:

python Problem_solving/PhyChem/get_a_sol.py --model='gpt-3.5-turbo' --task='MMLU_physics' --prompt_type='multi_agent' --mode='generate' --subject='phy'

get_c_regenerate.py follows this pattern and derives its files from args.subject and args.model:

inputfile = f"Problem_solving/PhyChem/logs/solve_{args.subject}_{args.model}/feedback.jsonl"
regenerate_sol_file = f"Problem_solving/PhyChem/logs/solve_{args.subject}_{args.model}/regenerate_sol.jsonl"

But get_b_feedback.py hardcodes the feedback source and destination:

inputfile = "Problem_solving/PhyChem/logs/solve_phy_gpt-3.5-turbo/wrong/wrong.jsonl"
feedback_file = f"Problem_solving/PhyChem/logs/solve_phy_gpt-3.5-turbo/feedback.jsonl"

and get_finetune_data.py also hardcodes:

ditc = "Problem_solving/PhyChem/logs/solve_phy_gpt-3.5-turbo"
sft_dic = "Problem_solving/PhyChem/logs/solve_phy_gpt-3.5-turbo"

This can break the self-improvement lineage for any run that is not exactly subject=phy and model=gpt-3.5-turbo. For example, a user running with --subject='chem' or a different model generates trajectories into that run's own directory, but feedback and finetune data are still read from and written to the physics GPT-3.5 directory. Because the regenerate stage is already parameterized, the feedback, regenerate, and finetune-data stages can silently diverge.
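
To make the divergence concrete, here is a minimal sketch that compares where the feedback stage writes with where the regenerate stage reads. The path strings are the ones quoted above; the function names and the LOG_ROOT constant are hypothetical and exist only for this illustration, they are not taken from the repository.

```python
# Illustration only: function names and LOG_ROOT are hypothetical.
LOG_ROOT = "Problem_solving/PhyChem/logs"


def feedback_written_by_b() -> str:
    """Path get_b_feedback.py writes to: hardcoded, ignores subject/model."""
    return f"{LOG_ROOT}/solve_phy_gpt-3.5-turbo/feedback.jsonl"


def feedback_read_by_c(subject: str, model: str) -> str:
    """Path get_c_regenerate.py reads from: derived from the CLI arguments."""
    return f"{LOG_ROOT}/solve_{subject}_{model}/feedback.jsonl"


def stages_agree(subject: str, model: str) -> bool:
    """True only when both stages point at the same feedback file."""
    return feedback_written_by_b() == feedback_read_by_c(subject, model)
```

Under these assumptions the two stages agree only for the default physics / gpt-3.5-turbo run; any other subject or model makes the regenerate stage look for a feedback file that the feedback stage never wrote.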

For a self-improving system, the experience library is effectively the promotion surface: successful trajectories and regenerated failures become training data for the next agent version. In bounded verifier-style RSI harnesses, that lineage needs to stay tied to the same run identity and evaluation split; otherwise the improvement claim becomes difficult to audit.

Suggested fix:

base_dir = f"Problem_solving/PhyChem/logs/solve_{args.subject}_{args.model}"
inputfile = f"{base_dir}/wrong/wrong.jsonl"
feedback_file = f"{base_dir}/feedback.jsonl"

In addition, get_finetune_data.py should derive ditc and its output paths from args.subject and args.model in the same way. It may also be useful to write a small manifest recording subject, model, mode, source files, and output files, so that the experience-library lineage is auditable across improvement rounds.
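
A rough sketch of how the shared path derivation and the manifest could look, in case it helps. Everything here is an assumption for discussion: the helper names (run_dir, run_paths, write_manifest), the manifest.json filename, and the manifest field names are hypothetical and do not exist in the repository today. Only the directory layout follows the paths quoted above.

```python
import json
from pathlib import Path

# Log root as quoted in this issue; overridable so the helpers can be tested.
LOG_ROOT = Path("Problem_solving/PhyChem/logs")


def run_dir(subject, model, log_root=LOG_ROOT) -> Path:
    """Single source of truth for a run's directory (hypothetical helper)."""
    return Path(log_root) / f"solve_{subject}_{model}"


def run_paths(subject, model, log_root=LOG_ROOT) -> dict:
    """All per-run files, derived from the same subject/model pair."""
    base = run_dir(subject, model, log_root)
    return {
        "wrong": base / "wrong" / "wrong.jsonl",
        "feedback": base / "feedback.jsonl",
        "regenerate_sol": base / "regenerate_sol.jsonl",
    }


def write_manifest(subject, model, mode, sources, outputs, log_root=LOG_ROOT) -> Path:
    """Record which files a stage read and wrote (assumed manifest format)."""
    base = run_dir(subject, model, log_root)
    base.mkdir(parents=True, exist_ok=True)
    manifest = {
        "subject": subject,
        "model": model,
        "mode": mode,
        "source_files": [str(p) for p in sources],
        "output_files": [str(p) for p in outputs],
    }
    path = base / "manifest.json"  # filename is an assumption
    path.write_text(json.dumps(manifest, indent=2))
    return path
```

Each stage could then call run_paths(args.subject, args.model) instead of building its own strings, and call write_manifest once it finishes. Keeping the derivation in one function means a later change to the directory layout cannot reintroduce a mismatch between stages.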
