[Bug] os.environ["UNSLOTH_RETURN_LOGITS"] = "1" becomes unset to "0" once I start to train #3071

@charvishukla-bc

Hello!

I have been fine-tuning Gemma 3 and want to validate during training with a custom metric. To work around the following error, I set os.environ["UNSLOTH_RETURN_LOGITS"] = "1":

TypeError: Unsupported types (<class 'unsloth_compiled_module_gemma3.EmptyLogits'>) passed to `_pad_across_processes`. Only nested list/tuple/dicts of objects that are valid for `is_torch_tensor` should be passed.

I am using the following configuration:

    config = SFTConfig(
        per_device_train_batch_size=self.train_args.get("batch_size", 4),
        gradient_accumulation_steps=self.train_args.get("grad_accum", 8),
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},
        max_grad_norm=0.3,
        warmup_ratio=0.03,
        learning_rate=self.train_args.get("lr", 2e-4),
        logging_steps=10,

        save_strategy="steps",
        save_steps=10,

        eval_strategy="steps",
        eval_steps=self.train_args.get("eval_steps", 10),
        load_best_model_at_end=self.train_args.get("load_best_model_at_end", True),
        metric_for_best_model=self.train_args.get("metric_for_best_model", "top1_accuracy"),
        greater_is_better=self.train_args.get("greater_is_better", True),

        optim=self.train_args.get("optim", "adamw_torch_fused"),
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=self.train_args.get("seed", 3407),
        output_dir=self.output_dir,
        report_to="tensorboard",
        run_name="gemma_4b_lora_run_2",
        logging_dir="gemma_4b_lora_run_2",
        # max_seq_length=20000,
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
    )

    trainer = SFTTrainer(
        model=self.model,
        predict_with_generate=True,
        train_dataset=self.train_dataset,
        eval_dataset=self.val_dataset,
        compute_metrics=self.compute_metrics,
        processing_class=self.processor.tokenizer,
        data_collator=self.collator,
        args=config,
    )
    train_output = trainer.train()

Before starting training, I check that the environment variable is set correctly, and it is (screenshot omitted).

However, the value seems to change during training and is back to "0" (screenshot omitted).
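
One way to narrow down where the value changes is to read the flag at several points in the run, for example right before `trainer.train()` and again inside `compute_metrics`. The sketch below uses a hypothetical helper, `check_env_flag`, which is not part of Unsloth, TRL, or Transformers. The `restore` option is only an assumption about what might help: whether re-setting the variable mid-run changes what Unsloth returns has not been verified here.

```python
import os
from typing import Optional, Tuple

FLAG = "UNSLOTH_RETURN_LOGITS"


def check_env_flag(
    name: str = FLAG, expected: str = "1", restore: bool = False
) -> Tuple[Optional[str], bool]:
    """Report the current value of an environment variable.

    Returns (value_seen, matched). value_seen is None if the variable
    is not set at all. With restore=True, a mismatching value is
    overwritten with `expected` after it has been recorded.
    """
    seen = os.environ.get(name)
    matched = seen == expected
    if not matched and restore:
        # Hypothetical workaround: put the expected value back.
        os.environ[name] = expected
    return seen, matched


# Set the flag, then confirm it before training starts.
os.environ[FLAG] = "1"
seen, matched = check_env_flag()
print(f"{FLAG}={seen!r} matched={matched}")
```

Calling `check_env_flag()` at the top of `compute_metrics` and logging the result would show whether the value is already "0" by the time evaluation runs, which helps separate "something resets the flag" from "the flag is read before it is set".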

What can I do here? I saw another issue about this, but it seemed like no one found a solution.
