TRACE Model Fine-Tuning Evaluation Issue: Reported vs. Observed Metrics Mismatch #8

Hello, I ran into an issue: the results I get with the fine-tuned model provided by the authors (MODEL_DIR="model/trace-ft-youcook2") are much closer to the TRACE-UNI baseline than to the values reported in the paper. Specifically, my metrics are SODA_c_2: 2.3, F1_Score: 18.5, and CIDER: 7.5, whereas the paper reports SODA_c_2: 6.7, F1_Score: 31.8, and CIDER: 35.5.
I followed the evaluation script (trace/eval/eval.sh) as instructed. Are there any specific parameters or settings required for evaluating the fine-tuned model that I might have overlooked?
Any guidance would be greatly appreciated.
