Hi, I found a path/lineage issue in the problem-solving self-improvement pipeline.
The README describes the loop as parameterized by subject/model, with data at:
dataset/{subject}_train.jsonl
dataset/{subject}_test.jsonl
and examples such as:
python Problem_solving/PhyChem/get_a_sol.py --model='gpt-3.5-turbo' --task='MMLU_physics' --prompt_type='multi_agent' --mode='generate' --subject='phy'
get_c_regenerate.py follows this pattern and derives its files from args.subject and args.model:
inputfile = f"Problem_solving/PhyChem/logs/solve_{args.subject}_{args.model}/feedback.jsonl"
regenerate_sol_file = f"Problem_solving/PhyChem/logs/solve_{args.subject}_{args.model}/regenerate_sol.jsonl"
But get_b_feedback.py hardcodes the feedback source and destination:
inputfile = "Problem_solving/PhyChem/logs/solve_phy_gpt-3.5-turbo/wrong/wrong.jsonl"
feedback_file = f"Problem_solving/PhyChem/logs/solve_phy_gpt-3.5-turbo/feedback.jsonl"
and get_finetune_data.py also hardcodes:
ditc = "Problem_solving/PhyChem/logs/solve_phy_gpt-3.5-turbo"
sft_dic = "Problem_solving/PhyChem/logs/solve_phy_gpt-3.5-turbo"
This breaks the self-improvement lineage for any run that is not exactly subject=phy and model=gpt-3.5-turbo. For example, a user running with --subject='chem' or a different --model can generate trajectories into that run's own directory, while get_b_feedback.py and get_finetune_data.py still read from and write to solve_phy_gpt-3.5-turbo. Because the regenerate stage is already parameterized, it then looks for feedback.jsonl in a directory the feedback stage never wrote to, so the B/C/D phases can silently diverge.
For a self-improving system, the experience library is effectively the promotion surface: successful trajectories and regenerated failures become training data for the next agent version. In bounded, verifier-style recursive self-improvement (RSI) harnesses, that lineage needs to stay tied to the same run identity and evaluation split; otherwise the improvement claim becomes difficult to audit.
Suggested fix:
base_dir = f"Problem_solving/PhyChem/logs/solve_{args.subject}_{args.model}"
inputfile = f"{base_dir}/wrong/wrong.jsonl"
feedback_file = f"{base_dir}/feedback.jsonl"
and make get_finetune_data.py derive ditc/output paths from args.subject and args.model as well. It may also be useful to write a small manifest with subject, model, mode, source files, and output files so the experience-library lineage is auditable across improvement rounds.
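To make the suggestion concrete, here is a minimal sketch of how the run directory and the manifest could be shared across stages. The helper names (run_dir, stage_paths, write_manifest) and the manifest.json filename are hypothetical and not part of the repository; only the directory layout and the wrong.jsonl, feedback.jsonl, and regenerate_sol.jsonl filenames come from the existing scripts.

```python
import json
from pathlib import Path

# Log root as used by the existing scripts.
LOG_ROOT = Path("Problem_solving/PhyChem/logs")


def run_dir(subject: str, model: str, root: Path = LOG_ROOT) -> Path:
    """Directory shared by every stage of one (subject, model) run."""
    return root / f"solve_{subject}_{model}"


def stage_paths(subject: str, model: str, root: Path = LOG_ROOT) -> dict:
    """Derive all stage files from the same run identity (hypothetical helper)."""
    base = run_dir(subject, model, root)
    return {
        "wrong": base / "wrong" / "wrong.jsonl",
        "feedback": base / "feedback.jsonl",
        "regenerate": base / "regenerate_sol.jsonl",
    }


def write_manifest(subject, model, mode, sources, outputs, root=LOG_ROOT) -> Path:
    """Record which files a stage read and wrote (manifest.json is an assumed name)."""
    base = run_dir(subject, model, root)
    base.mkdir(parents=True, exist_ok=True)
    manifest = {
        "subject": subject,
        "model": model,
        "mode": mode,
        "source_files": [str(p) for p in sources],
        "output_files": [str(p) for p in outputs],
    }
    path = base / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path
```

Each script would then call stage_paths(args.subject, args.model) instead of building its own strings, so a changed subject or model moves every stage together. Note that this sketch overwrites the manifest on each call; appending one entry per stage or per round would be a reasonable variation.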