Is the FT script correct? #12

@emrgnt-cmplxty

Hi,

I ran the fine-tuning script on the Mistral base model but got rather poor results on ARC-Challenge (below 50% accuracy with retrieval). Any ideas why? I will repeat the run with Mistral Instruct to see whether it helps, but I am not optimistic, as I have seen similarly poor results when fine-tuning this model with the Self-RAG dataset and script.

MODEL_SIZE=7B
NUM_GPUS=8
BATCH_SIZE_PER_GPU=1
TOTAL_BATCH_SIZE=128
GRADIENT_ACC_STEPS=$(($TOTAL_BATCH_SIZE/$NUM_GPUS/$BATCH_SIZE_PER_GPU))
echo "Training Mistral model ${MODEL_SIZE} using $NUM_GPUS GPUs, $BATCH_SIZE_PER_GPU batch size per GPU, $GRADIENT_ACC_STEPS gradient accumulation steps"

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --mixed_precision bf16 \
    --num_machines 1 \
    --num_processes $NUM_GPUS \
    --use_deepspeed \
    --deepspeed_config_file stage3_no_offloading_accelerate.conf \
    finetune.py \
    --model_name_or_path mistralai/Mistral-7B-v0.1 \
    --use_flash_attn \
    --tokenizer_name mistralai/Mistral-7B-v0.1 \
    --use_slow_tokenizer \
    --train_file full_output_1005.jsonl \
    --max_seq_length 2048 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
    --gradient_accumulation_steps $GRADIENT_ACC_STEPS \
    --learning_rate 2e-5 \
    --lr_scheduler_type linear \
    --warmup_ratio 0.03 \
    --weight_decay 0. \
    --num_train_epochs 5 \
    --output_dir output/mistral_root_${MODEL_SIZE}/ \
    --with_tracking \
    --report_to tensorboard \
    --logging_steps 1 \
    --use_special_tokens
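The `GRADIENT_ACC_STEPS` line relies on shell integer division, which truncates silently. With the values above it works out to 128 / 8 / 1 = 16, but with, say, 6 GPUs it would give 21 steps and an effective batch of 126 rather than 128. A minimal guard, as a sketch (the function name `compute_grad_acc_steps` is hypothetical and not part of the repo's scripts):

```shell
# Sketch: compute gradient accumulation steps and fail loudly when the
# total batch size is not evenly divisible, since $(( )) truncates.
compute_grad_acc_steps() {
  local total=$1 gpus=$2 per_gpu=$3
  local per_step=$(( gpus * per_gpu ))
  if [ "$per_step" -le 0 ]; then
    echo "error: NUM_GPUS and BATCH_SIZE_PER_GPU must be positive" >&2
    return 1
  fi
  if [ $(( total % per_step )) -ne 0 ]; then
    echo "error: TOTAL_BATCH_SIZE=$total is not divisible by $gpus GPUs x $per_gpu per GPU" >&2
    return 1
  fi
  echo $(( total / per_step ))
}

# Values from the script above: 128 / (8 * 1) = 16 accumulation steps.
GRADIENT_ACC_STEPS=$(compute_grad_acc_steps 128 8 1) || exit 1
echo "gradient accumulation steps: $GRADIENT_ACC_STEPS"
```

Dropping this in place of the one-line `$(( ))` computation keeps the effective batch size at exactly `TOTAL_BATCH_SIZE` or stops the launch before any GPU time is spent.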

EDIT: I had a chance to look into this today. I am fairly confident the issue is that this script will NOT work for a model whose tokenizer has not been prepared independently beforehand. I will confirm and then close the issue. It might be nice to add some information on how to replicate the result independently when fine-tuning from scratch.
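One quick way to test the tokenizer hypothesis is to check whether the saved tokenizer actually contains the special tokens the training data uses. The sketch below greps the `*.json` files in a tokenizer directory (for example `special_tokens_map.json`, `added_tokens.json`, `tokenizer_config.json`) for each expected token. The function name `missing_special_tokens` is hypothetical, and the token names in the usage line are assumed Self-RAG-style reflection tokens; substitute whatever list `--use_special_tokens` actually adds in your copy of `finetune.py`.

```shell
# Sketch: list expected special tokens that do not appear in any *.json
# file of a saved tokenizer directory. This is plain string matching on
# the JSON text, not a real JSON parser, so treat it as a quick heuristic.
missing_special_tokens() {
  local tokenizer_dir=$1
  shift
  if [ ! -d "$tokenizer_dir" ]; then
    echo "error: directory $tokenizer_dir not found" >&2
    return 2
  fi
  local missing=0 tok
  for tok in "$@"; do
    # -F matches literally ([ and ] are regex metacharacters); the
    # surrounding quotes stop "[Retrieval]" from matching "[No Retrieval]".
    if ! grep -qsF -- "\"$tok\"" "$tokenizer_dir"/*.json; then
      echo "missing: $tok"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical usage against the script's output directory; the token
# names are assumptions, not a verified list from the repo.
# missing_special_tokens output/mistral_root_7B "[Retrieval]" "[No Retrieval]" "<paragraph>" "</paragraph>"
```

The function prints one `missing:` line per absent token and returns 1 if any are missing, so it can gate an evaluation run. Running it on the unmodified `mistralai/Mistral-7B-v0.1` tokenizer files should report every reflection token as missing.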