[RFC] Verl recipe for image generation model #2136

@Franklin-Zhang0

Motivation

Reinforcement learning for autoregressive image generation with textual chain-of-thought is attracting growing interest. A recent project, ReasonGen-R1, employs GRPO via Verl to improve an image generation model’s ability to follow instructions. We plan to contribute this training pipeline as a Verl recipe to benefit the wider community.

Overall Structure

/recipe/image_generation
    config/
        image_generation_rl.yaml
        image_generation_sft.yaml
    main.py
    ray_trainer.py
    sft_trainer.py
    fsdp_worker.py
    dp_actor.py
    hf_rollout.py
    datasets/
        rl_datasets.py
        sft_datasets.py
    Janus/

Proposed Major Changes

New Functions or Classes

ImageRewardModelWorker in fsdp_worker.py: We need to implement a reward model worker that takes the generated images and their prompts and assesses image quality and instruction following. The overall structure is similar to the current RewardModelWorker.
Janus-model: An example model class for RL training. It needs to be implemented inside the recipe because interleaved image-text generation is required and the official generate and forward functions have not been released by DeepSeek.
AdaptiveEntropyCoefficient in dp_actor.py: An adaptive entropy loss coefficient for stable training in text-image interleaved RL. It is updated from the target entropy and the entropy of the output logits.
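
To make the AdaptiveEntropyCoefficient idea concrete, here is a minimal pure-Python sketch. The update rule (a proportional step toward the target entropy, clamped to a range), the field names `lr`, `min_coeff`, `max_coeff`, and the helper `token_entropy` are all assumptions for illustration; the RFC only states that the coefficient is updated from the target entropy and the entropy of the output logits.

```python
import math
from dataclasses import dataclass


@dataclass
class AdaptiveEntropyCoefficient:
    """Hypothetical controller for the entropy-bonus coefficient.

    Assumes the actor loss has the form `pg_loss - coeff * entropy`,
    so a larger coefficient pushes the policy toward higher entropy.
    """

    target_entropy: float
    coeff: float = 0.0
    lr: float = 0.01          # step size of the controller (assumed)
    min_coeff: float = 0.0    # clamp range (assumed)
    max_coeff: float = 1.0

    def update(self, entropy: float) -> float:
        # Raise the coefficient when entropy is below target, lower it otherwise.
        self.coeff += self.lr * (self.target_entropy - entropy)
        self.coeff = min(self.max_coeff, max(self.min_coeff, self.coeff))
        return self.coeff


def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) for a single token."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

In a real training step the measured entropy would be the mean over response tokens in the batch, and one such controller could be kept per modality (text and image) if their entropy scales differ.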

Functions or Classes that need modification:

FSDPSFTTrainer in sft_trainer.py: support SFT training for image generation models
HFSFTDataset in sft_dataset.py and RLDataset in rl_dataset.py: support data loading and formatting for image generation
_build_model_optimizer in ActorRolloutRefWorker: modify to support loading Janus
update_actor in DataParallelPPOActor: handle the adaptive entropy coefficient and compute entropy separately for text and image tokens
hf_rollout: return generated images and generated texts as separate outputs
fit in ray_trainer.py: support group_filtering as in DAPO
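
As a rough illustration of the group filtering item, the sketch below drops every prompt group whose sampled responses all received the same reward, since such groups yield zero group-relative advantage under GRPO. The function name `filter_groups`, its arguments, and the `eps` threshold are hypothetical and are not taken from the Verl or DAPO code.

```python
from collections import defaultdict


def filter_groups(prompt_ids, rewards, eps=1e-8):
    """Return sorted indices of samples whose prompt group has varied rewards.

    prompt_ids[i] identifies which prompt sample i was generated from;
    rewards[i] is the scalar reward of sample i.
    """
    groups = defaultdict(list)
    for idx, pid in enumerate(prompt_ids):
        groups[pid].append(idx)

    kept = []
    for idxs in groups.values():
        vals = [rewards[i] for i in idxs]
        # A group with identical rewards carries no learning signal; skip it.
        if max(vals) - min(vals) > eps:
            kept.extend(idxs)
    return sorted(kept)
```

For example, with `prompt_ids = ["a", "a", "b", "b"]` and `rewards = [1.0, 0.0, 1.0, 1.0]`, only the two samples of prompt `"a"` are kept. In a trainer loop, filtered-out groups would typically be replaced by sampling additional prompts until the batch is full.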

CC

@eric-haibin-lin
