[BUG] Llama3.2 vision eleuther eval recipe RuntimeError: stack expects each tensor to be equal size, but got [259, 6404] #1874

@SalmanMohammadi

Description

Ran on 1 x RTX A6000 48GB.

root@f5de76cb602e:~/torchtune# tune run eleuther_eval --config llama3_2_vision/evaluation
Running EleutherEvalRecipe with resolved config:

batch_size: 1
checkpointer:
  _component_: torchtune.training.FullModelMetaCheckpointer
  checkpoint_dir: /tmp/Llama-3.2-11B-Vision-Instruct/original
  checkpoint_files:
  - consolidated.pth
  model_type: LLAMA3_VISION
  output_dir: ./
device: cuda
dtype: bf16
enable_kv_cache: true
limit: null
log_level: INFO
max_seq_length: 8192
model:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_11b
quantizer: null
seed: 1234
tasks:
- mmmu_val_science
tokenizer:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_transform
  max_seq_len: 8192
  path: /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model

Model is initialized with precision torch.bfloat16.
2024-10-20:22:33:14,882 INFO     [eleuther_eval.py:505] Model is initialized with precision torch.bfloat16.
Running evaluation on the following tasks: ['mmmu_val_science']
2024-10-20:22:33:30,748 INFO     [eleuther_eval.py:549] Running evaluation on the following tasks: ['mmmu_val_science']
2024-10-20:22:33:30,753 INFO     [task.py:415] Building contexts for mmmu_val_biology on rank 0...
100%|█████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 14130.17it/s]
2024-10-20:22:33:31,202 INFO     [task.py:415] Building contexts for mmmu_val_chemistry on rank 0...
100%|█████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 20239.52it/s]
2024-10-20:22:33:31,295 INFO     [task.py:415] Building contexts for mmmu_val_geography on rank 0...
100%|█████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 19642.39it/s]
2024-10-20:22:33:31,494 INFO     [task.py:415] Building contexts for mmmu_val_math on rank 0...
100%|█████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 19266.44it/s]
2024-10-20:22:33:31,634 INFO     [task.py:415] Building contexts for mmmu_val_physics on rank 0...
100%|█████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 18961.59it/s]
2024-10-20:22:33:31,736 INFO     [evaluator.py:489] Running generate_until requests
Running generate_until requests with text+image input:   1%|           | 1/150 [00:30<1:15:09, 30.26s/it]
Traceback (most recent call last):
  File "/usr/local/bin/tune", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/usr/local/lib/python3.11/dist-packages/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/usr/local/lib/python3.11/dist-packages/torchtune/_cli/run.py", line 208, in _run_cmd
    self._run_single_device(args, is_builtin=is_builtin)
  File "/usr/local/lib/python3.11/dist-packages/torchtune/_cli/run.py", line 102, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/dist-packages/recipes/eleuther_eval.py", line 576, in <module>
    sys.exit(recipe_main())
             ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torchtune/config/_parse.py", line 99, in wrapper
    sys.exit(recipe_main(conf))
             ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/recipes/eleuther_eval.py", line 572, in recipe_main
    recipe.evaluate()
  File "/usr/local/lib/python3.11/dist-packages/recipes/eleuther_eval.py", line 550, in evaluate
    output = evaluate(
             ^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/lm_eval/evaluator.py", line 500, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/lm_eval/models/hf_vlms.py", line 679, in generate_until
    inputs = self.tok_batch_multimodal_encode(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/recipes/eleuther_eval.py", line 185, in tok_batch_multimodal_encode
    tok_batch = padded_collate_tiled_images_and_mask(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torchtune/data/_collate.py", line 412, in padded_collate_tiled_images_and_mask
    batch_masks.append(torch.stack(sample_masks))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: stack expects each tensor to be equal size, but got [259, 6404] at entry 0 and [259, 12808] at entry 1
Running generate_until requests with text+image input:   1%|           | 1/150 [00:30<1:15:43, 30.50s/it]
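
The error comes from `padded_collate_tiled_images_and_mask` calling `torch.stack` on cross-attention masks whose image-sequence dimension differs (6404 vs 12808), which happens when the images for a batch entry are encoded into a different number of tiles/tokens. Below is a minimal sketch that reproduces the stack failure and shows one possible workaround, right-padding every mask to the widest one before stacking; the shapes and the `stack_padded` helper are illustrative assumptions, not the actual torchtune fix.

```python
import torch
import torch.nn.functional as F

# Stand-ins for the cross-attention masks from the log: same text length (259),
# different image-sequence lengths (6404 vs 12808).
mask_a = torch.ones(259, 6404)
mask_b = torch.ones(259, 12808)

try:
    torch.stack([mask_a, mask_b])  # reproduces the RuntimeError above
except RuntimeError as e:
    print(e)  # stack expects each tensor to be equal size, ...

# Hypothetical workaround: right-pad each mask along the image-sequence
# dimension to the widest mask in the batch, then stack.
def stack_padded(masks, pad_value=0.0):
    max_len = max(m.shape[-1] for m in masks)
    padded = [F.pad(m, (0, max_len - m.shape[-1]), value=pad_value) for m in masks]
    return torch.stack(padded)

print(stack_padded([mask_a, mask_b]).shape)  # torch.Size([2, 259, 12808])
```

Whether zero-padding is the right semantics for the Llama 3.2 vision cross-attention mask is an assumption here; the real fix may need to live in `padded_collate_tiled_images_and_mask` (or the recipe's `tok_batch_multimodal_encode`) so the padded positions are masked out consistently.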
