Merged
104 changes: 104 additions & 0 deletions recipes/configs/llama3_2_vision/11B_full_single_device_pretrained.yaml
@@ -0,0 +1,104 @@
# Config for single device full finetuning in full_finetune_single_device.py
Contributor

What's the purpose of adding this? Just so that we support the non-instruct version of the model? I'm a bit confused cause I thought one big diff with instruct-tuned vs not is the extra trainable special tokens on the text side, which this PR doesn't address.

Contributor Author

@felipemello1 Sep 30, 2024

This is just for testing. I will remove it before the PR is ready.

Contributor Author

Regarding the special token, that's a question for @pbontrager. I am not sure.
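For context, a minimal sketch of what the `model:` section below resolves to, assuming the `llama3_2_vision_11b` builder is called with exactly the keyword arguments the config lists via `_component_` (the recipe's actual instantiation may differ):

```python
# Hypothetical illustration only: mirrors the kwargs in the YAML's `model:` block.
from torchtune.models.llama3_2_vision import llama3_2_vision_11b

model = llama3_2_vision_11b(
    decoder_trainable=False,  # text decoder frozen
    encoder_trainable=True,   # vision encoder updated
    fusion_trainable=True,    # cross-attention fusion layers updated
    image_size=560,           # must match the transform's image_size
)
```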

# using a Llama3.2 11B Vision (pretrained, non-instruct) model
#
# This config assumes that you've run the following command before launching:
# tune download meta-llama/Llama-3.2-11B-Vision --output-dir /tmp/Llama-3.2-11B-Vision
#
# The default config uses an optimizer from bitsandbytes. If you do not have it installed,
# you can install it with:
# pip install bitsandbytes
#
# To launch on a single device, run the following command from root:
# tune run full_finetune_single_device --config llama3_2_vision/11B_full_single_device_pretrained
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training:
# tune run full_finetune_single_device --config llama3_2_vision/11B_full_single_device_pretrained checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on a single device.

# Model arguments
model:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_11b
  decoder_trainable: False
  encoder_trainable: True
  fusion_trainable: True
  image_size: 560 # Make sure this matches the image_size in tokenizer

# Transform
tokenizer:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_transform
  path: /tmp/Llama-3.2-11B-Vision/original/tokenizer.model
  image_size: 560

# Checkpointer
checkpointer:
  _component_: torchtune.training.FullModelMetaCheckpointer
  checkpoint_dir: /tmp/Llama-3.2-11B-Vision/original/
  checkpoint_files: [consolidated.pth]
  recipe_checkpoint: null
  output_dir: /tmp/Llama-3.2-11B-Vision/
  model_type: LLAMA3_VISION
resume_from_checkpoint: False

# Dataset
dataset:
  _component_: torchtune.datasets.multimodal.the_cauldron_dataset
  subset: ocrvqa
seed: null
shuffle: True
collate_fn: torchtune.data.padded_collate_tiled_images_and_mask

# Fine-tuning arguments
epochs: 1
max_steps_per_epoch: null
batch_size: 2
gradient_accumulation_steps: 16
optimizer:
  _component_: bitsandbytes.optim.PagedAdamW8bit
  lr: 2e-5
optimizer_in_bwd: False
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
clip_grad_norm: 1.0
compile: False

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True
dtype: bf16

# Logging
output_dir: /tmp/full-llama3.2-vision-finetune
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /tmp/Llama-3.2-11B-Vision/logs
log_every_n_steps: 1
log_peak_memory_stats: False

# Profiler (default is disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  # Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  # `torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  # trace options passed to `torch.profiler.profile`
  profile_memory: True
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 1
  warmup_steps: 2
  active_steps: 1
  num_cycles: 1
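For reference, a rough sketch of the `torch.profiler` call these schedule options map to, assuming `setup_torch_profiler` forwards them more or less directly (the trace handler and training-loop hookup below are simplified placeholders):

```python
# Approximation of the profiler section above; the real wiring lives in
# torchtune.training.setup_torch_profiler, and this config ships with enabled: False.
import torch
from torch.profiler import ProfilerActivity, profile, schedule

trace_dir = "profiling_outputs"  # stands in for ${output_dir}/profiling_outputs

prof = profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],  # cpu: True, cuda: True
    schedule=schedule(wait=1, warmup=2, active=1, repeat=1),    # wait/warmup/active/num_cycles
    on_trace_ready=torch.profiler.tensorboard_trace_handler(trace_dir),
    profile_memory=True,
    record_shapes=True,
    with_stack=False,
    with_flops=False,
)

with prof:
    for _ in range(4):  # wait + warmup + active = 4 steps for one cycle
        # train_step(batch)  # placeholder for the recipe's actual training step
        prof.step()
```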
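Likewise, the `optimizer:` entry earlier in the config points at bitsandbytes' paged 8-bit AdamW; a minimal sketch of the instantiation, assuming the recipe simply passes the model's parameters and the configured learning rate:

```python
# Sketch only: hyperparameters mirror the YAML; how the recipe actually filters
# or groups parameters is not shown here and is an assumption.
import bitsandbytes as bnb
from torchtune.models.llama3_2_vision import llama3_2_vision_11b

model = llama3_2_vision_11b(
    decoder_trainable=False, encoder_trainable=True, fusion_trainable=True
)
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=2e-5)
```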