Skip to content

[SECURITY] Pickle allowlist in utils/serialization.py is bypassable (builtins.getattr allowed, duplicate keys drop PredictionTask) #814

@shaun0927

Description

@shaun0927

Bug description

The BASE_SERIALIZATION_CLASSES allowlist introduced by PR #802 (commit ab7207cf, "Sec pic fix") in transformers4rec/utils/serialization.py has two structural issues that defeat the intended restriction on Unpickler.find_class.

File (at ab7207cf):

BASE_SERIALIZATION_CLASSES = {
"builtins": [
"Exception", "ValueError", "NotImplementedError", "AttributeError",
"AssertionError"
], # each Exception Error class needs to be added explicitly
"collections": ["OrderedDict", "defaultdict"],
"datetime": ["timedelta"],
"pathlib": ["PosixPath"],
"functools": ["partial"],
"transformers4rec.torch.model.base": ["Model", "Head", "PredictionTask"],
"transformers4rec.torch.block.base": ["SequentialBlock"],
"transformers4rec.torch.features.sequence": ["TabularSequenceFeatures", "SequenceEmbeddingFeatures"],
"transformers4rec.torch.features.continuous": ["ContinuousFeatures"],
"transformers4rec.torch.features.embedding": ["FeatureConfig", "EmbeddingConfig", "TableConfig", "PretrainedEmbeddingFeatures"],
"transformers4rec.torch.features.sparse": ["SparseFeatures"],
"transformers4rec.torch.features.tabular": ["TabularFeatures"],
"transformers4rec.torch.tabular.base": ["FilterFeatures", "AsTabular"],
"transformers4rec.torch.tabular.aggregation": ["ConcatFeatures"],
"transformers4rec.torch.block.base": ["Block", "SequentialBlock"],
"transformers4rec.torch.block.mlp": ["DenseBlock"],
"transformers4rec.torch.block.transformer": ["TransformerBlock"],
"transformers4rec.torch.masking": ["CausalLanguageModeling"],
"transformers4rec.torch.model.base": ["forward_to_prediction_fn", "Model", "Head"],
"transformers4rec.torch.model.prediction_task": ["NextItemPredictionTask", "_NextItemPredictionTask"],
"transformers4rec.torch.ranking_metric": ["NDCGAt", "DCGAt", "AvgPrecisionAt", "PrecisionAt", "RecallAt"],
"transformers4rec.config.transformer": ["XLNetConfig"],
"torch.nn.modules.container": ["ModuleList","ModuleDict"],
"torch.nn.modules.loss": ["CrossEntropyLoss"],
"merlin_standard_lib.schema.schema": ["Schema", "ColumnSchema"],
"merlin_standard_lib.proto.schema_bp": [
"FeaturePresence", "FeaturePresenceWithinGroup", "FeatureType", "FixedShape", "ValueCount", "ValueCountList",
"IntDomain", "FloatDomain", "StringDomain", "BoolDomain", "StructDomain", "NaturalLanguageDomain",
"FeatureCoverageConstraints", "SequenceLengthConstraints", "ImageDomain", "MIDDomain", "URLDomain",
"TimeDomain", "TimeOfDayDomain", "DistributionConstraints", "Annotation", "FeatureComparator", "InfinityNorm",
"JensenShannonDivergence", "UniqueConstraints", "DatasetConstraints", "NumericValueComparator"],
"torch.nn.init": ["kaiming_normal_", "kaiming_uniform_", "xavier_normal_", "xavier_uniform_", "uniform_", "normal_", "zeros_", "ones_"],
"torch._utils": ["_rebuild_tensor_v2", "_rebuild_parameter"],
"torch": ["Size", "device"],
"torch.storage": ["_load_from_bytes"],
"torch._C._nn": ["gelu"],
"torch.nn.module": ["Module"],
"torch.nn.modules.activation": ["ReLU", "Sigmoid", "Tanh"],
"torch.nn.modules.linear": ["Linear", "Identity"],
"torch.nn.modules.conv": ["Conv1d", "Conv2d", "Conv3d"],
"torch.nn.modules.pooling": ["MaxPool1d", "MaxPool2d", "MaxPool3d"],
"torch.nn.modules.normalization": ["BatchNorm1d", "BatchNorm2d", "BatchNorm3d", "LayerNorm"],
"torch.nn.modules.dropout": ["Dropout", "Dropout2d", "Dropout3d"],
"torch.nn.modules.rnn": ["RNN", "LSTM", "GRU"],
"torch.nn.modules.sparse": ["Embedding"],
"torch.optim.adam": ["Adam"],
"torchmetrics.metric": ["jit_distributed_available"],
"torchmetrics.utilities.data": ["dim_zero_cat"],
"transformers.models.xlnet.modeling_xlnet": ["XLNetModel", "XLNetLayer", "XLNetRelativeAttention", "XLNetFeedForward"],
"transformers.activations": ["GELUActivation"],
"transformers.modeling_utils": ["SequenceSummary"],
"builtins": ["getattr"],
}

The file was subsequently removed in PR #808 (41b14d7b), but checkpoints produced during the #802 → #807 window still exist in the wild and the design questions apply to any follow-up allowlist.

(1) builtins key appears twice. Python dict literals are last-write-wins, so only the second declaration survives:

BASE_SERIALIZATION_CLASSES = {
    "builtins": [
        "Exception", "ValueError", "NotImplementedError", "AttributeError",
        "AssertionError"
    ],
    ...
    "builtins": ["getattr"],   # <-- this overrides the first "builtins" key
}

Net result at runtime: builtins maps to ["getattr"]. The Exception subclasses (intentionally allowlisted) are silently dropped, and builtins.getattr — a well-known primitive in pickle gadget chains (attribute traversal → code execution) — is approved.

(2) torch.storage._load_from_bytes is in the allowlist. _load_from_bytes wraps torch.load, which itself performs unrestricted pickle deserialization when the installed PyTorch version predates the weights_only=True default. Combined with getattr, this provides a reachable path from the restricted unpickler to arbitrary code execution.

(3) transformers4rec.torch.model.base is also duplicated. The first declaration includes PredictionTask; the second (which wins) does not:

"transformers4rec.torch.model.base": ["Model", "Head", "PredictionTask"],
...
"transformers4rec.torch.model.base": ["forward_to_prediction_fn", "Model", "Head"],   # wins; PredictionTask dropped

So any checkpoint containing PredictionTask fails to deserialize with ValueError from the restricted unpickler. This is an availability regression, not a security one, but it indicates the dict literal was not reviewed carefully before merge.

Steps/Code to reproduce bug

Structural evidence (no exploit payload):

git show ab7207cf:transformers4rec/utils/serialization.py | grep -n '"builtins"'
# 12:    "builtins": [
# 66:    "builtins": ["getattr"],

git show ab7207cf:transformers4rec/utils/serialization.py | grep -n 'torch.storage'
# 49:    "torch.storage": ["_load_from_bytes"],

Evaluating the module confirms the surviving values:

import ast
src = open(".../serialization.py").read()
ns = {}
exec(src, ns)
print(ns["BASE_SERIALIZATION_CLASSES"]["builtins"])            # ['getattr']
print(ns["BASE_SERIALIZATION_CLASSES"]["torch.storage"])       # ['_load_from_bytes']
print("PredictionTask" in ns["BASE_SERIALIZATION_CLASSES"]
      ["transformers4rec.torch.model.base"])                    # False

I am intentionally not attaching an end-to-end exploit payload to a public issue; the combination of builtins.getattr + torch.storage._load_from_bytes is sufficient for any reader familiar with pickle gadget chains to reproduce.

Expected behavior

  • The allowlist does not contain builtins.getattr or torch.storage._load_from_bytes.
  • The dict has no duplicate keys (a flake8-bugbear B033 or similar lint would have caught this).
  • The design shifts away from pickle-based serialization for model weights: either safetensors, or torch.load(..., weights_only=True) after bumping the minimum PyTorch version (see the companion torch.load issue I am filing).

Environment details

  • Transformers4Rec: commits ab7207cf (PR Sec pic fix #802) through 0e31f575 (PR Fix scaler #807); file removed at 41b14d7b (PR Fix scaler #808) but affected checkpoints persist.
  • Python: any
  • PyTorch: any (the _load_from_bytes gadget exists across all modern versions)

Additional context

Suggested remediation order:

  1. Restore allowlist-based protection only as a temporary measure — remove builtins.getattr and torch.storage._load_from_bytes, de-duplicate all keys.
  2. Transition model-weight serialization to a format that does not execute code on load (safetensors is the common choice for HF-adjacent projects).
  3. For any remaining pickle-based artifacts, wrap torch.load with an explicit weights_only=True (see the companion issue).

I can prepare separate PRs for each step if the maintainers prefer.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions