Merged
This commit adds the ability to import XGBoost models saved in Universal Binary JSON (.ubj) format, in addition to the existing JSON format support.

Key changes:
- Add ubjson library dependency for parsing UBJ binary format
- Create XGBoostUbjParser to handle UBJ model files
- Extract common tree-to-expression logic into AbstractXGBoostParser base class
- Convert flat UBJ array representation to hierarchical tree structure
- Extract and apply base_score logit transformation from model metadata
- Add test case comparing JSON and UBJ model imports
- Add utility tools for UBJ-to-JSON conversion and debugging

Enables base score extraction with logistic transformation
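The flat-array-to-tree conversion mentioned above can be illustrated with a minimal sketch. This is not the code in XGBoostUbjParser. The class `FlatTreeSketch`, its `Node` type, and the parameter names are hypothetical. It assumes the usual XGBoost layout where each tree is stored as parallel arrays (left children, right children, split indices, split conditions), that a left child of -1 marks a leaf, and that a leaf's value is stored in its split-condition slot. Handling of missing values (the default direction) is left out.

```java
/** Hypothetical illustration; not the parser added in this PR. */
final class FlatTreeSketch {

    /** Either a split node (left/right set) or a leaf (both null). */
    static final class Node {
        final int feature;       // feature index, -1 for leaves
        final double value;      // split threshold, or leaf value for leaves
        final Node left;
        final Node right;

        Node(int feature, double value, Node left, Node right) {
            this.feature = feature;
            this.value = value;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    /** Recursively builds the subtree rooted at node id from the parallel arrays. */
    static Node build(int id, int[] leftChildren, int[] rightChildren,
                      int[] splitIndices, double[] splitConditions) {
        if (leftChildren[id] == -1) {
            // Assumption: leaf values are stored in the split-condition slot.
            return new Node(-1, splitConditions[id], null, null);
        }
        Node left = build(leftChildren[id], leftChildren, rightChildren, splitIndices, splitConditions);
        Node right = build(rightChildren[id], leftChildren, rightChildren, splitIndices, splitConditions);
        return new Node(splitIndices[id], splitConditions[id], left, right);
    }

    /** Renders the tree as a nested if-expression using indexed feature names. */
    static String toExpression(Node node) {
        if (node.isLeaf()) {
            return Double.toString(node.value);
        }
        return "if (xgboost_input_" + node.feature + " < " + node.value + ", "
                + toExpression(node.left) + ", " + toExpression(node.right) + ")";
    }
}
```

With a three-node tree (one split on feature 3 at 0.5, leaves -1.0 and 2.0), `toExpression` yields `if (xgboost_input_3 < 0.5, -1.0, 2.0)`. Recursion keeps the sketch short; very deep trees might call for an explicit stack instead.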
Add the ubjson library (com.dev-smart:ubjson) to the allowed dependencies lists across all Maven enforcer configurations. This is required for the XGBoost UBJ format import feature added in the previous commit.
Add a probe method to validate UBJ file structure before parsing, and precompute the base_score logit transformation instead of generating it as a runtime expression string.
Separates feature indices from feature name formatting to enable flexible feature naming in ranking expressions. This allows models to use meaningful feature names (e.g., "mean_radius") instead of generic indexed names, improving readability of generated ranking expressions.
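One way to picture the separation of indices from name formatting is to pass the naming strategy in as a function. The sketch below is illustrative only: `FeatureNamingSketch` and its methods are hypothetical names, not the PR's API, and the fallback to an indexed name for out-of-range indices is an assumption.

```java
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical illustration of pluggable feature naming; not the PR's API. */
final class FeatureNamingSketch {

    /** Generic indexed naming, e.g. xgboost_input_7. */
    static IntFunction<String> indexed() {
        return index -> "xgboost_input_" + index;
    }

    /** Naming backed by a list; falls back to the indexed name when out of range (assumption). */
    static IntFunction<String> named(List<String> names) {
        IntFunction<String> fallback = indexed();
        return index -> (index >= 0 && index < names.size())
                ? names.get(index)
                : fallback.apply(index);
    }

    /** Formats one split; the tree only carries the index, the naming decides the text. */
    static String split(int featureIndex, double threshold, String yes, String no,
                        IntFunction<String> naming) {
        return "if (" + naming.apply(featureIndex) + " < " + threshold + ", " + yes + ", " + no + ")";
    }
}
```

Because the tree stores only integer indices, the same parsed model can be rendered with either naming without re-parsing.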
When loading an XGBoost UBJ model, the importer automatically checks for and loads feature names from an optional companion text file. For example, when reading "model.ubj", it looks for "model-features.txt" and uses those names if present.

Key features:
- Automatically loads model-features.txt alongside model.ubj
- One feature name per line, supports # comments and blank lines
- Feature names from file override any names in the UBJ file
- Graceful fallback to xgboost_input_X format if file missing or invalid
- No-arg toRankingExpression() automatically uses loaded names when valid

This enables easy customization of feature names without modifying model files, improving readability of generated ranking expressions.
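The companion-file lookup described above could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: the class `FeatureFileSketch` and its methods are hypothetical, only whole-line `#` comments are handled, and "invalid" is taken to mean "unreadable or fewer names than required".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of loading model-features.txt next to model.ubj. */
final class FeatureFileSketch {

    /** Maps e.g. dir/model.ubj to dir/model-features.txt. */
    static Path companionFor(Path modelPath) {
        String fileName = modelPath.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        return modelPath.resolveSibling(base + "-features.txt");
    }

    /** Keeps one name per line, skipping blank lines and lines starting with '#'. */
    static List<String> parse(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            names.add(trimmed);
        }
        return names;
    }

    /** Returns the names, or empty if the file is missing, unreadable, or too short. */
    static Optional<List<String>> load(Path modelPath, int requiredCount) {
        Path companion = companionFor(modelPath);
        if (!Files.isRegularFile(companion)) {
            return Optional.empty();
        }
        try {
            List<String> names = parse(Files.readAllLines(companion));
            return names.size() >= requiredCount ? Optional.of(names) : Optional.empty();
        } catch (IOException e) {
            // Graceful fallback: the caller keeps using xgboost_input_X names.
            return Optional.empty();
        }
    }
}
```

Returning an empty `Optional` rather than throwing keeps the fallback path in one place: the caller simply uses indexed names whenever no valid list comes back.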
The importer now extracts and tracks the model's objective function type (e.g., reg:squarederror, binary:logistic) to correctly handle base_score:
- Apply logit transformation only for logistic objectives
- Use base_score directly for regression objectives
- Use objective-specific defaults (0.5 for logistic, 0.0 for regression)
- Relax feature name validation to require "at least N" instead of "exactly N"
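The base_score rules above can be summarized in a few lines. The sketch below is illustrative and not the importer's code: `BaseScoreSketch` and its method names are hypothetical, and treating every objective whose name ends in `:logistic` as logistic is an assumption. The logit used is the standard ln(p / (1 - p)).

```java
/** Hypothetical illustration of objective-aware base_score handling. */
final class BaseScoreSketch {

    /** Assumption: objectives ending in ":logistic" (e.g. binary:logistic) count as logistic. */
    static boolean isLogistic(String objective) {
        return objective != null && objective.endsWith(":logistic");
    }

    /** Objective-specific default when the model carries no base_score. */
    static double defaultBaseScore(String objective) {
        return isLogistic(objective) ? 0.5 : 0.0;
    }

    /**
     * Returns the constant to add to the summed tree outputs.
     * For logistic objectives base_score is a probability, so it is
     * converted to a margin with logit(p) = ln(p / (1 - p)).
     * For other objectives it is used as-is.
     */
    static double margin(String objective, Double baseScore) {
        double score = baseScore != null ? baseScore : defaultBaseScore(objective);
        return isLogistic(objective) ? Math.log(score / (1.0 - score)) : score;
    }
}
```

Note that the logistic default of 0.5 maps to a margin of exactly 0, so a model without an explicit base_score contributes no offset. Computing the value once at import time, rather than emitting the logit as an expression string, matches the precomputation described in the earlier commit.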
bjorncs
approved these changes
Dec 12, 2025
Contributor
👍
@thomasht86 please review
@bjorncs FYI