Description
I want to fine-tune an 80B LLM. I'm doing SFT on an instruction dataset with LoRA + FSDP. To monitor response quality, I want to generate model responses every 200 training steps. The model has a generate method for producing responses, but it doesn't work for FSDP-wrapped models and fails with the well-known error: ❌ Error generating example: 'weight' must be 2-D.
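For reference, this is roughly how my periodic check is wired up (the callback class, prompt, and generation arguments are simplified placeholders, not my exact code):

```python
from transformers import TrainerCallback

class GenerationCheckCallback(TrainerCallback):
    """Hypothetical callback: generate one sample completion every N steps."""

    def __init__(self, tokenizer, prompt, every_n_steps=200):
        self.tokenizer = tokenizer
        self.prompt = prompt
        self.every_n_steps = every_n_steps

    def on_step_end(self, args, state, control, model=None, **kwargs):
        if state.global_step % self.every_n_steps != 0:
            return
        inputs = self.tokenizer(self.prompt, return_tensors="pt").to(args.device)
        # This call fails under FSDP with: 'weight' must be 2-D
        output_ids = model.generate(**inputs, max_new_tokens=64)
        print(self.tokenizer.decode(output_ids[0], skip_special_tokens=True))
```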
The commonly recommended workaround is the context manager FSDP.summon_full_params(model, writeback=False). But what if the model doesn't fit on a single GPU?
Furthermore, I'd like the output of these checks to match the output of standard inference with this adapter. To achieve this, I use do_sample=False. But even with a smaller model, FSDP.summon_full_params(model, writeback=False), and set_seed, the results are not reproducible.
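And this is roughly what the workaround attempt looks like (function name, seed value, and generation kwargs are placeholders):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import set_seed

def generation_check(model, tokenizer, prompt, device):
    set_seed(42)  # same seed before every check, yet outputs still differ
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # Gathers the full parameters on each rank; with the 80B model this
    # does not fit on a single GPU
    with FSDP.summon_full_params(model, writeback=False):
        with torch.no_grad():
            output_ids = model.generate(
                **inputs,
                max_new_tokens=64,
                do_sample=False,  # greedy decoding, expected to be deterministic
            )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```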
Here's my FSDP config:
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
enable_cpu_affinity: false
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: SHARD_GRAD_OP
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
And my environment:
accelerate==1.6.0
transformers==4.51.0
torch==2.5.1
trl==0.16.0
Please suggest a solution.