TimeRefine [[Paper](https://arxiv.org/abs/2412.09601)]
Official PyTorch implementation of the paper "TimeRefine: Temporal Grounding with Time Refining Video LLM".
We follow the same data preparation pipeline as VTimeLLM; please refer to the VTimeLLM training instructions for downloading the pretrained models and datasets. The stage 2 and stage 3 training files and the best checkpoint can be downloaded here.
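After downloading, it can help to verify that the files landed where the training scripts expect them. The sketch below is only an assumption for illustration: the directory `./data` and the file names `stage2.json` and `stage3.json` are hypothetical placeholders, not the actual names used by this repository; adjust them to match the VTimeLLM data layout.

```python
# Hypothetical sanity check for downloaded training data.
# DATA_ROOT and the file names below are assumptions, not the repo's real layout.
from pathlib import Path

DATA_ROOT = Path("./data")           # assumed download location
EXPECTED_FILES = [
    "stage2.json",                   # assumed name for the stage 2 training file
    "stage3.json",                   # assumed name for the stage 3 training file
]

def check_downloads(root: Path = DATA_ROOT) -> None:
    """Print which expected training files are present under `root`."""
    for name in EXPECTED_FILES:
        path = root / name
        status = "found" if path.is_file() else "MISSING"
        print(f"{status:>7}: {path}")

if __name__ == "__main__":
    check_downloads()
```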
For installation, please check out install_env.md.
For training, please check out train_scripts.md.
For evaluation, please check out eval_scripts.md.
We sincerely appreciate the incredible projects that contributed to the development of TimeRefine:
- LLaVA: Large Language and Vision Assistant
- FastChat: An Open Platform for Training, Serving, and Evaluating Large Language Model based Chatbots
- Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
- LLaMA: Open and Efficient Foundation Language Models
- Vid2seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
- InternVid: A Large-scale Video-Text dataset
- VTimeLLM: Empower LLM to Grasp Video Moments
- VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
If you use TimeRefine in your research or applications, please cite it with the following BibTeX:
```bibtex
@misc{wang2024timerefinetemporalgroundingtime,
      title={TimeRefine: Temporal Grounding with Time Refining Video LLM},
      author={Xizi Wang and Feng Cheng and Ziyang Wang and Huiyu Wang and Md Mohaiminul Islam and Lorenzo Torresani and Mohit Bansal and Gedas Bertasius and David Crandall},
      year={2024},
      eprint={2412.09601},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.09601},
}
```

Looking forward to your feedback, contributions, and stars! 🌟
