Arnej/xgboost ubj import by arnej27959 · Pull Request #35508 · vespa-engine/vespa

arnej27959 · 2025-12-12T14:46:50Z

This commit adds the ability to import XGBoost models saved in Universal Binary JSON (.ubj) format, in addition to the existing JSON format support.

Key changes:
- Add ubjson library dependency for parsing UBJ binary format
- Create XGBoostUbjParser to handle UBJ model files
- Extract common tree-to-expression logic into AbstractXGBoostParser base class
- Convert flat UBJ array representation to hierarchical tree structure
- Extract and apply base_score logit transformation from model metadata
- Add test case comparing JSON and UBJ model imports
- Add utility tools for UBJ-to-JSON conversion and debugging

Enables base score extraction with logistic transformation
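To illustrate the flat-array-to-tree conversion mentioned above, here is a minimal Python sketch (not the Java code from this PR). It assumes each tree arrives as parallel arrays in the style of XGBoost's model schema (`left_children`, `right_children`, `split_indices`, `split_conditions`), that a left child of `-1` marks a leaf, and that a leaf's value is stored in its `split_conditions` slot. The function and dictionary key names are hypothetical.

```python
def build_tree(left, right, split_index, split_cond, node=0):
    """Turn parallel per-node arrays into a nested dict, starting at `node`."""
    if left[node] == -1:
        # Assumption: a leaf has no children and keeps its value in split_cond.
        return {"leaf": split_cond[node]}
    return {
        "feature": split_index[node],   # index of the feature tested here
        "threshold": split_cond[node],  # split threshold for that feature
        "yes": build_tree(left, right, split_index, split_cond, left[node]),
        "no": build_tree(left, right, split_index, split_cond, right[node]),
    }


# A single split on feature 0 with two leaves.
example = build_tree(
    left=[1, -1, -1],
    right=[2, -1, -1],
    split_index=[0, 0, 0],
    split_cond=[0.5, -1.0, 1.0],
)
```

The recursion follows child indices instead of array order, so it would not depend on how the nodes happen to be laid out in the arrays.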

Add the ubjson library (com.dev-smart:ubjson) to the allowed dependencies lists across all Maven enforcer configurations. This is required for the XGBoost UBJ format import feature added in the previous commit.

Add a probe method to validate UBJ file structure before parsing, and precompute the base_score logit transformation instead of generating it as a runtime expression string.
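The probe added in this commit is not shown on this page, so the following Python sketch is only one plausible heuristic, not the actual implementation. It relies on two properties of the UBJSON encoding: an object starts with the byte `{`, and object keys are written as an integer-typed length followed by raw bytes, with no quote character. JSON text, by contrast, normally has `"` or whitespace after the opening brace. The function name is hypothetical.

```python
# Integer type markers that can introduce a key length in UBJSON, plus the
# '$' and '#' markers used by optimized (typed or counted) containers.
_UBJ_SECOND_BYTES = (b"i", b"U", b"I", b"l", b"L", b"$", b"#")


def looks_like_ubj_object(data: bytes) -> bool:
    """Cheap structural check on the first two bytes of a candidate file."""
    if len(data) < 2 or data[0:1] != b"{":
        return False
    return data[1:2] in _UBJ_SECOND_BYTES
```

A check like this only rules out obviously wrong inputs; a full parse would still be needed to confirm that the file is a valid model.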

Separates feature indices from feature name formatting to enable flexible feature naming in ranking expressions. This allows models to use meaningful feature names (e.g., "mean_radius") instead of generic indexed names, improving readability of generated ranking expressions.
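The separation described above can be sketched in Python as a tree that stores only feature indices plus a pluggable naming function applied when the expression is rendered. This is a simplified illustration, not the parser code from this PR: the nested-dict tree shape, the function names, and the `if (name < threshold, yes, no)` output form are assumptions, and missing-value handling is left out.

```python
def indexed_name(index):
    """Generic fallback name, following the xgboost_input_X pattern."""
    return f"xgboost_input_{index}"


def names_from(names):
    """Build a naming function backed by a list of meaningful names."""
    return lambda index: names[index]


def to_expression(node, name_of):
    """Render a nested tree as an expression string using `name_of`."""
    if "leaf" in node:
        return repr(node["leaf"])
    yes = to_expression(node["yes"], name_of)
    no = to_expression(node["no"], name_of)
    return f"if ({name_of(node['feature'])} < {node['threshold']!r}, {yes}, {no})"


tree = {"feature": 0, "threshold": 0.5, "yes": {"leaf": -1.0}, "no": {"leaf": 1.0}}
generic = to_expression(tree, indexed_name)
readable = to_expression(tree, names_from(["mean_radius"]))
```

Because the tree never stores a name, the same parsed model could be rendered with either naming scheme without re-parsing.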

When loading an XGBoost UBJ model, the importer automatically checks for and loads feature names from an optional companion text file. For example, when reading "model.ubj", it looks for "model-features.txt" and uses those names if present.

Key features:
- Automatically loads model-features.txt alongside model.ubj
- One feature name per line, supports # comments and blank lines
- Feature names from file override any names in the UBJ file
- Graceful fallback to xgboost_input_X format if file missing or invalid
- No-arg toRankingExpression() automatically uses loaded names when valid

This enables easy customization of feature names without modifying model files, improving readability of generated ranking expressions.
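The companion-file convention described above could look roughly like the Python sketch below. It is an illustration under assumptions, not the importer's code: the function names are hypothetical, and `#` is treated as a comment marker only when it starts a line, since the description does not say whether trailing comments are supported.

```python
from pathlib import Path


def companion_feature_file(model_path):
    """Map e.g. 'model.ubj' to 'model-features.txt' in the same directory."""
    path = Path(model_path)
    return path.with_name(path.stem + "-features.txt")


def parse_feature_names(text):
    """Return one name per line, skipping blank lines and '#' comment lines."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.append(line)
    return names


def load_feature_names(model_path):
    """Return the names from the companion file, or None if it is absent."""
    path = companion_feature_file(model_path)
    if not path.is_file():
        return None  # caller falls back to generic indexed names
    return parse_feature_names(path.read_text(encoding="utf-8"))
```

Returning `None` for a missing file keeps the fallback decision with the caller, which can then use the generic indexed names.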

The importer now extracts and tracks the model's objective function type (e.g., reg:squarederror, binary:logistic) to correctly handle base_score:
- Apply logit transformation only for logistic objectives
- Use base_score directly for regression objectives
- Use objective-specific defaults (0.5 for logistic, 0.0 for regression)
- Relax feature name validation to require "at least N" instead of "exactly N"
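The objective-dependent handling of base_score listed above can be sketched in Python as follows. The logit transform is logit(p) = ln(p / (1 - p)), and the defaults are the ones given in the list. How the importer actually recognizes a logistic objective is not shown here, so the `:logistic` suffix check and the function names are assumptions.

```python
import math


def logit(p):
    """Inverse of the sigmoid: maps a probability to a raw margin."""
    return math.log(p / (1.0 - p))


def base_score_offset(objective, base_score=None):
    """Return the constant to add to the summed tree outputs."""
    # Assumption: objectives such as "binary:logistic" end in ":logistic".
    is_logistic = objective.endswith(":logistic")
    if base_score is None:
        # Objective-specific defaults from the description above.
        base_score = 0.5 if is_logistic else 0.0
    return logit(base_score) if is_logistic else base_score
```

With the default of 0.5, a logistic model gets an offset of logit(0.5) = 0, while a regression model with base_score 2.5 uses 2.5 unchanged.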

thomasht86 · 2025-12-16T06:47:23Z

👍

arnej27959 added 9 commits December 12, 2025 14:05

Allow ubjson dependency in Maven enforcer configurations

79d595b


Improve XGBoost UBJ import

964646a


Clean up XGBoost feature filename handling

2425760

minimize visibility of ubjson library

92013a9

use standard mechanism for ubjson version management

b55f4f5

bjorncs approved these changes Dec 12, 2025


arnej27959 merged commit 728f288 into master Dec 15, 2025
4 checks passed

arnej27959 deleted the arnej/xgboost-ubj-import branch December 15, 2025 22:08

thomasht86 mentioned this pull request Mar 10, 2026

update xgboost docs vespa-engine/documentation#4573

Open

arnej27959 merged 9 commits into master from arnej/xgboost-ubj-import
