This is the official repository for the NeurIPS 2025 paper "VTON-VLLM: Aligning Virtual Try-On Models with Human Preferences"
We propose VTON-VLLM, a vision large language model that functions as a unified "fashion expert," capable of both evaluating and steering VTON synthesis toward human preferences. VTON-VLLM upgrades VTON models in two pivotal ways: (1) providing fine-grained supervisory signals during the training of a plug-and-play VTON refinement model, and (2) enabling adaptive, preference-aware test-time scaling at inference. To benchmark VTON models more holistically, we introduce VITON-Bench, a challenging test suite of complex try-on scenarios with human-preference-aware metrics.
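To make the test-time scaling idea concrete, here is a minimal, hypothetical sketch of preference-aware best-of-N selection: generate several candidate try-on results and keep the one a preference scorer rates highest (in the paper, VTON-VLLM would play the scorer's role). All names below are illustrative and not the repository's actual API.

```python
def best_of_n(generate, score, n=4):
    """Generate n candidates (one per seed) and return the highest-scoring one.

    generate: callable(seed) -> candidate (e.g., a try-on image)
    score:    callable(candidate) -> float preference score (stand-in for VTON-VLLM)
    """
    candidates = [generate(seed) for seed in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: "images" are ints, the preference score favors larger values.
result = best_of_n(generate=lambda seed: (seed * 7) % 5,
                   score=lambda x: x,
                   n=4)
print(result)  # picks the candidate with the highest score
```

An adaptive variant could stop generating early once a candidate exceeds a preference threshold, trading compute for quality only when needed.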
Create a conda environment and install the requirements:

```bash
conda create -n VTON-VLLM python==3.9.0
conda activate VTON-VLLM
cd VTON-VLLM-main
pip install -r requirements.txt
```
You can download VTON-VLLM directly, or follow the instructions in preprocessing.md to extract the Semantic Point Feature yourself.
Please download the pre-trained model from Link.
Run inference:

```bash
sh src/inference.sh
```

Train the VTON refinement model:

```bash
sh src/train_VTON_refinement_model.sh
```

Compute the VLLM-based metrics:

```bash
python metrics/vllm_metrics.py
```
Thanks to LLaMA-Factory and CAT-VTON for their contributions.