Accelerate save_state() error using FSDP2/TP #3826

@gmintoco

Description

System Info

- `Accelerate` version: 1.11.0
- Platform: Linux-5.15.0-1071-nvidia-x86_64-with-glibc2.39
- `accelerate` bash location: /app/.venv/bin/accelerate
- Python version: 3.12.12
- Numpy version: 2.3.4
- PyTorch version: 2.8.0+cu128
- PyTorch accelerator: CUDA
- System RAM: 2015.56 GB
- GPU type: NVIDIA H200
- `Accelerate` default config:
	Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Traceback (most recent call last):
  File "/app/src/train.py", line 558, in main
    _train_with_accelerate(train_cfg)
  File "/app/src/train.py", line 359, in _train_with_accelerate
    _save_checkpoint(
  File "/app/src/train.py", line 523, in _save_checkpoint
    checkpoint_dir = accelerator.save_state()
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/accelerate/accelerator.py", line 3562, in save_state
    save_fsdp_optimizer(self.state.fsdp_plugin, self, opt, self._models[i], output_dir, i)

I am training a Llama-style model using FSDP2 and TP, with tp_size=2, dp_shard_size=8, and dp_replicate_size=1, following the ND parallelism guide.

I am using a learning rate scheduler and a single optimizer, and everything is prepared in a single accelerator.prepare() call.
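
For context, my setup looks roughly like this (paraphrased from memory; treat the exact arguments as an approximation of what the ND parallelism guide suggests, not a verbatim copy of my script):

from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig
from accelerate.utils import FullyShardedDataParallelPlugin

# Parallelism layout described above: TP=2, FSDP shard=8, no replication
parallelism_config = ParallelismConfig(
    dp_replicate_size=1,
    dp_shard_size=8,
    tp_size=2,
)
fsdp_plugin = FullyShardedDataParallelPlugin(fsdp_version=2)
accelerator = Accelerator(
    parallelism_config=parallelism_config,
    fsdp_plugin=fsdp_plugin,
)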

I have noticed that, regardless of the order in which I pass things to prepare, two optimizers end up registered under accelerator._optimizers, even though only a single model is registered.
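
The output below comes from logging accelerator._optimizers right after the prepare call, roughly like this (and similarly for accelerator._models):

logger.info(f"accelerator._optimizers: {accelerator._optimizers}")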

[2025-11-01 15:33:08,112][__main__][INFO] - accelerator._optimizers: [AcceleratedOptimizer (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: True
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: True
    initial_lr: 0.0002
    lr: 0.0
    maximize: False
    weight_decay: 0.0
), AcceleratedOptimizer (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: True
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: True
    initial_lr: 0.0002
    lr: 0.0
    maximize: False
    weight_decay: 0.0
)]
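
For reference, this is my reading of the relevant part of save_state, paraphrased from the frame in the traceback above (accelerator.py line 3562 in 1.11.0) rather than copied verbatim:

# Optimizers are paired with models by index when saving FSDP optimizer state
for i, opt in enumerate(self._optimizers):
    save_fsdp_optimizer(self.state.fsdp_plugin, self, opt, self._models[i], output_dir, i)
# With len(self._optimizers) == 2 but len(self._models) == 1, i == 1 indexes
# past self._models and the save fails.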

Based on that logic in save_state, the extra registered optimizer is what causes the error, but I can't see why two optimizers are being registered. I am certain I am only calling prepare a single time, like so:

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=cfg.optimizer.learning_rate,
    weight_decay=cfg.optimizer.weight_decay,
    fused=cfg.optimizer.fused,
)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_training_steps,
)
model, optimizer, train_loader, eval_loader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_loader, eval_loader, lr_scheduler
)

I'm at a bit of a loss here unfortunately. Thanks in advance!

Expected behavior

I expect accelerator.save_state() to succeed and write the checkpoint.
