Repositories list
10 repositories
GAGE (Public): General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.
FinMTM (Public): FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation
BizFinBench.v2 (Public): BizFinBench.v2: A Unified Offline–Online Bilingual Benchmark for Expert-Level Financial Capability Evaluation of LLMs
CCPO (Public): Compress2Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents
BizFinBench (Public): A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
PuzzleClone (Public): PuzzleClone: An SMT-Powered Framework for Synthesizing Verified Mathematical Reasoning Data
MME-Finance (Public): [MM 2025] A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
NEXUS-O (Public): [MM 2025] NEXUS-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
PolyhedronEvaluator (Public)
Published_Papers (Public)