Error generating example: 'weight' must be 2-D in model.generate() #3825

@yurkoff-mv

Description

I want to fine-tune an 80B LLM. I'm running SFT on an instruction dataset using LoRA + FSDP. To monitor response quality, I want to generate model responses every 200 training steps. The model has a generate method for producing responses, but it doesn't work on FSDP-wrapped models, failing with the well-known error: ❌ Error generating example: 'weight' must be 2-D.
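For context, the periodic check is wired up roughly like this (a sketch; the prompt, tokenizer, and 200-step interval are placeholders). Calling model.generate() inside the callback on the FSDP-wrapped model is what raises the error:

```python
from transformers import TrainerCallback

class SampleGenerationCallback(TrainerCallback):
    """Hypothetical callback: generate a sample response every `every` steps."""

    def __init__(self, tokenizer, prompt, every=200):
        self.tokenizer = tokenizer
        self.prompt = prompt
        self.every = every

    def on_step_end(self, args, state, control, model=None, **kwargs):
        if state.global_step % self.every == 0 and state.is_world_process_zero:
            inputs = self.tokenizer(self.prompt, return_tensors="pt").to(model.device)
            # Fails on an FSDP-sharded model: "'weight' must be 2-D"
            out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
            print(self.tokenizer.decode(out[0], skip_special_tokens=True))
```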

To work around this issue, the usual recommendation is to wrap generation in the FSDP.summon_full_params(model, writeback=False) context manager. But what if the model doesn't fit on a single GPU?
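For reference, the recommended workaround looks roughly like this (a minimal sketch; the helper name and arguments are illustrative). It only works when the fully materialized parameters fit in each GPU's memory, which is exactly the problem for an 80B model:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def generate_sample(model, tokenizer, prompt, max_new_tokens=64):
    # Temporarily gather the full (unsharded) parameters on each rank;
    # writeback=False skips copying them back into the shards afterwards.
    # This requires the whole model to fit on a single GPU.
    with FSDP.summon_full_params(model, writeback=False):
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                 do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```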

Furthermore, I'd like the output of these checks to match the output of standard inference with this adapter. To achieve this, I use do_sample=False. But even with a smaller model, FSDP.summon_full_params(model, writeback=False), and set_seed, the results are not reproducible.
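The deterministic-generation attempt can be sketched as below (illustrative helper; the seed value is arbitrary). Note that with do_sample=False and num_beams=1 decoding is greedy and should not consume RNG at all, so remaining divergence more likely comes from non-deterministic CUDA kernels or from numerical differences between sharded and gathered weights than from seeding:

```python
import torch
from transformers import set_seed

def deterministic_generate(model, tokenizer, prompt, max_new_tokens=32):
    set_seed(42)  # seeds Python, NumPy, and torch RNGs
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Greedy decoding: sampling parameters are irrelevant here,
        # but pinning them avoids warnings from a stale generation config.
        out = model.generate(**inputs, do_sample=False, num_beams=1,
                             max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```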

Here's my FSDP config:

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
enable_cpu_affinity: false
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: SHARD_GRAD_OP
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Environment:

accelerate==1.6.0
transformers==4.51.0
torch==2.5.1
trl==0.16.0

Please suggest a solution.
