
PLM-VideoBench

Hugging Face Collection

As part of our PLM release, we are releasing a comprehensive set of video benchmarks (grouped as PLM-VideoBench) for detailed video understanding. PLM-VideoBench includes the following sub-benchmarks (a loading sketch follows the list):

  1. Fine-Grained Question Answering (FGQA): In this task, a model must answer a multiple-choice question (MCQ) that probes fine-grained activity understanding.
  2. Smart Glasses Question Answering (SGQA): In this task, a model must answer open-ended questions about activities and objects visible in an egocentric video stream recorded by Meta VR glasses.
  3. Video Region Captioning (RCap): In this task, the model must generate a detailed description of an event involving a subject of interest in the video.
  4. Region Temporal Localization (RTLoc): In this task, the model must identify the precise time interval within the video when the specified event takes place for the given subject.
  5. Region Dense Video Captioning (RDCap): In this task, a model must generate a detailed description of all events involving a specific subject of interest in a video.
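
To get a feel for the data before running a full evaluation, the sub-benchmarks in the Hugging Face collection can be inspected with the datasets library. The snippet below is a minimal sketch; the repo id facebook/PLM-VideoBench and the config name fgqa are assumptions, so check the collection linked above for the exact identifiers.

# Minimal sketch: inspect one PLM-VideoBench sub-benchmark with Hugging Face `datasets`.
# NOTE: the repo id and config name below are assumptions; see the collection linked above.
from datasets import load_dataset

fgqa = load_dataset("facebook/PLM-VideoBench", "fgqa", split="test")
print(fgqa[0])  # one fine-grained QA record (question, answer options, source video, ...)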

Tip

We have added all PLM-VideoBench tasks to lmms-eval. This makes it easy to reproduce PLM results and also allows other models to be tested on the benchmarks.

You can use the following command to evaluate PLM on PLM-VideoBench.

# Use facebook/Perception-LM-1B for the 1B-parameter model, facebook/Perception-LM-3B for the 3B-parameter model, or facebook/Perception-LM-8B for the 8B-parameter model.
CHECKPOINTS_PATH=facebook/Perception-LM-3B

# PLM-VideoBench Tasks
TASKS=fgqa_test,sgqa_test,rtloc_test,rcap_test,rdcap_test
OUTPUT_PATH="plm_videobench_evaluation"

accelerate launch --num_processes=8 \
-m lmms_eval \
--model plm \
--model_args pretrained=$CHECKPOINTS_PATH \
--tasks $TASKS \
--batch_size 1 \
--log_samples \
--log_samples_suffix plm \
--output_path $OUTPUT_PATH
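
When the run completes, lmms-eval writes the aggregate per-task scores under $OUTPUT_PATH, and --log_samples additionally saves per-example predictions (tagged with the plm suffix here), which is useful for inspecting failure cases on individual sub-benchmarks.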

Results

We evaluate PLM against baselines on PLM-VideoBench and report the per-task breakdown below. Human performance is reported in the first row.

| Model | FGQA (MBAcc) | SGQA (Acc) | RDCap (SODA) | RCap (Score) | RTLoc (meanR) | Avg. |
| --- | --- | --- | --- | --- | --- | --- |
| Human perf. | 90.9 | 67.9 | 66.6 | 53.9 | 67.8 | 73.9 |
| GPT-4o | 61.2 | 63.7 | 20.9 | 35.7 | 33.1 | 51.6 |
| Gemini 1.5 Pro | 57.1 | 49.9 | 14.4 | 33.1 | 27.6 | 44.0 |
| Gemini 2.0 Flash | 58.7 | 44.8 | 13.2 | 30.9 | 27.6 | 42.5 |
| LLaVA-OV-7B | 40.2 | 41.5 | 4.7 | 24.4 | 13.9 | 32.0 |
| Qwen2VL-7B | 49.2 | 44.5 | 4.1 | 17.6 | 15.1 | 35.3 |
| Qwen2.5VL-7B | 49.8 | 43.0 | 2.5 | 21.5 | 10.7 | 34.8 |
| InternVL2-8B | 47.7 | 45.9 | 1.2 | 21.5 | 11.6 | 35.0 |
| InternVL2.5-8B | 53.7 | 48.3 | 5.7 | 26.1 | 8.8 | 38.5 |
| PLM-8B | 67.7 | 46.2 | 52.8 | 46.6 | 59.1 | 55.6 |