@khaivandangusf2210

feat(image): add MetaFormer encoder integration

Summary
Adds a new image encoder based on the MetaFormer family of architectures to Ludwig, enabling users to leverage a flexible token-mixing backbone for image feature extraction. Provides:

  • Encoder implementation: ludwig/encoders/image/metaformer.py
  • Schema / config validation: ludwig/schema/encoders/image/metaformer.py
  • Model variants + stacking helpers (metaformer_integration/)
  • Tests: tests/ludwig/encoders/test_metaformer_encoder.py
  • Minimal reproducible image dataset (test_data/) for fast CI validation

Motivation
MetaFormer-style backbones (e.g. ConvNeXt-like and token-mixing generalizations) offer competitive performance and architectural simplicity. This integration broadens Ludwig’s catalog of vision encoders and facilitates experimentation and benchmarking against existing CNN and ViT-style encoders.

Implementation Details

  • Introduces a MetaFormerEncoder class that implements the Ludwig image encoder interface.
  • Supports selectable variants (defined in metaformer_integration/metaformer_models.py) and a stacked CNN + MetaFormer hybrid (metaformer_stacked_cnn.py).
  • Registered in ludwig/encoders/image/__init__.py and the corresponding schema __init__.py to enable config-based discovery.
  • Configuration is validated by a dedicated schema class with fields for variant, pretrained (reserved for future use), dropout, embedding size, and internal layer settings.
  • Keeps external dependencies minimal; no new pip requirements added (relies on existing torch stack).
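
For reviewers new to the architecture, the core MetaFormer abstraction is a residual block that applies a token mixer and then a channel MLP, each after normalization. The NumPy sketch below uses the PoolFormer token mixer (stride-1 average pooling that ignores padding, minus the input) as one concrete instance. It illustrates the general block only and is not the code in ludwig/encoders/image/metaformer.py; the helper names, the simplified per-position normalization, and the omission of biases, layer scale, and drop-path are assumptions made for brevity.

```python
import numpy as np


def channel_norm(x, eps=1e-5):
    # Simplified normalization: standardize each spatial position over the channel axis
    # (no learnable scale/shift; PoolFormer itself uses GroupNorm with a single group).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def pool_mixer(x, k=3):
    # PoolFormer-style token mixer: k x k average pooling (stride 1, same output size)
    # that excludes padded cells from the average, minus the input itself.
    c, h, w = x.shape
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))                 # zero padding
    valid = np.pad(np.ones((1, h, w)), ((0, 0), (p, p), (p, p)))  # 1 where a real pixel exists
    total = np.zeros((c, h, w))
    count = np.zeros((1, h, w))
    for dy in range(k):
        for dx in range(k):
            total += padded[:, dy:dy + h, dx:dx + w]
            count += valid[:, dy:dy + h, dx:dx + w]
    return total / count - x


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def metaformer_block(x, w1, w2):
    # x: (channels, height, width); w1: (hidden, channels); w2: (channels, hidden).
    # Sub-block 1: residual token mixing across spatial positions.
    x = x + pool_mixer(channel_norm(x))
    # Sub-block 2: residual channel MLP applied independently at each position (1x1 convs).
    hidden = gelu(np.einsum("hc,cyx->hyx", w1, channel_norm(x)))
    return x + np.einsum("ch,hyx->cyx", w2, hidden)


rng = np.random.default_rng(0)
channels, hidden_dim = 8, 32  # MLP expansion ratio of 4
x = rng.standard_normal((channels, 14, 14))
w1 = rng.standard_normal((hidden_dim, channels)) * 0.1
w2 = rng.standard_normal((channels, hidden_dim)) * 0.1
y = metaformer_block(x, w1, w2)
print(y.shape)  # (8, 14, 14)
```

Both sub-blocks preserve the (channels, height, width) shape, so blocks can be stacked freely within a stage; swapping pool_mixer for attention or a depthwise convolution yields other members of the MetaFormer family without touching the rest of the block.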

New Files

  • ludwig/encoders/image/metaformer.py
  • ludwig/schema/encoders/image/metaformer.py
  • metaformer_integration/__init__.py
  • metaformer_integration/metaformer_models.py
  • metaformer_integration/metaformer_stacked_cnn.py
  • tests/ludwig/encoders/test_metaformer_encoder.py
  • test_data/ (a tiny MNIST subset plus a few class images) for deterministic encoder tests

Modified Files

  • ludwig/encoders/image/__init__.py (registration)
  • ludwig/schema/encoders/image/__init__.py (schema registration)
  • ludwig/encoders/image/base.py + ludwig/schema/encoders/image/base.py (minor integration hooks if needed)
  • ludwig/utils/tokenizers.py (unrelated import-ordering change from the auto-formatter)
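
The registration changes in the two __init__.py files are what enable config-based discovery. The decorator-registry pattern below is a generic, hypothetical sketch of that mechanism: ENCODER_REGISTRY, register_encoder, build_encoder, and the stub MetaFormerEncoder are illustrative names only, and Ludwig's actual registry API is not reproduced here.

```python
ENCODER_REGISTRY = {}


def register_encoder(name):
    # Class decorator: record the class under `name` so a config can refer to it by string.
    def wrap(cls):
        if name in ENCODER_REGISTRY:
            raise KeyError(f"encoder {name!r} registered twice")
        ENCODER_REGISTRY[name] = cls
        return cls

    return wrap


@register_encoder("metaformer")
class MetaFormerEncoder:
    # Hypothetical stub standing in for the real encoder class.
    def __init__(self, variant="base", dropout=0.0):
        self.variant = variant
        self.dropout = dropout


def build_encoder(encoder_config):
    # Look up the class by the config's `type` key and pass the remaining keys as kwargs.
    params = dict(encoder_config)
    cls = ENCODER_REGISTRY[params.pop("type")]
    return cls(**params)


encoder = build_encoder({"type": "metaformer", "variant": "base", "dropout": 0.1})
print(type(encoder).__name__, encoder.variant, encoder.dropout)
```

Because the decorator only runs when its module is imported, the import in __init__.py is what actually makes `type: metaformer` resolvable.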

Testing
The test file tests/ludwig/encoders/test_metaformer_encoder.py covers:

  • Construction from declarative config
  • Forward pass shape consistency
  • Basic training step (gradient flows)
  • Variant parameterization logic (at least one additional variant)

The included miniature dataset keeps the tests reproducible without external downloads.
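
A check for the variant-parameterization item might look roughly like the sketch below. Everything in it is hypothetical: the VariantSpec dataclass, the VARIANTS table and its placeholder depth/width values, resolve_variant, and the total-stride-32 assumption in expected_output_shape stand in for whatever metaformer_integration/metaformer_models.py and the PR's test file actually define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantSpec:
    depths: tuple  # number of blocks in each of the four stages
    dims: tuple    # channel width of each stage


# Hypothetical variant table with placeholder values; the real one lives in metaformer_models.py.
VARIANTS = {
    "small": VariantSpec(depths=(2, 2, 6, 2), dims=(64, 128, 320, 512)),
    "base": VariantSpec(depths=(4, 4, 12, 4), dims=(64, 128, 320, 512)),
}


def resolve_variant(name):
    try:
        return VARIANTS[name]
    except KeyError:
        raise ValueError(f"unknown MetaFormer variant {name!r}; expected one of {sorted(VARIANTS)}") from None


def expected_output_shape(spec, image_hw):
    # Assumes a stride-4 stem plus three stride-2 downsamplings (total stride 32)
    # and input sizes divisible by 32.
    h, w = image_hw
    return (spec.dims[-1], h // 32, w // 32)


def test_variants_are_well_formed():
    for name, spec in VARIANTS.items():
        assert len(spec.depths) == len(spec.dims) == 4, name
        assert list(spec.dims) == sorted(spec.dims), name  # stage widths never shrink


def test_unknown_variant_is_rejected():
    try:
        resolve_variant("does-not-exist")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unknown variant")
```

Under pytest the two test_* functions are collected automatically; expected_output_shape gives the forward-pass shape test a reference value to compare the encoder's output against.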

Reproducibility
The small curated sample in test_data/ lets the tests run offline and quickly:
pytest tests/ludwig/encoders/test_metaformer_encoder.py -k metaformer

Usage Example
YAML snippet:
input_features:
  - name: image_path
    type: image
    encoder:
      type: metaformer
      variant: base # see metaformer_models.py for available variants
      dropout: 0.1
output_features:
  - name: label
    type: category

trainer:
  epochs: 2
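
The schema class described under Implementation Details validates the encoder block above before training. The stdlib sketch below shows the kind of checks involved; MetaFormerEncoderConfig, parse_encoder, KNOWN_VARIANTS, and the chosen bounds are assumptions for illustration, not the PR's actual schema.

```python
from dataclasses import dataclass, fields

KNOWN_VARIANTS = {"small", "base"}  # hypothetical; see metaformer_models.py for the real list


@dataclass
class MetaFormerEncoderConfig:
    type: str = "metaformer"
    variant: str = "base"
    dropout: float = 0.0

    def __post_init__(self):
        # Field-level checks run as soon as the config object is built.
        if self.type != "metaformer":
            raise ValueError(f"expected type 'metaformer', got {self.type!r}")
        if self.variant not in KNOWN_VARIANTS:
            raise ValueError(f"unknown variant {self.variant!r}")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError(f"dropout must be in [0, 1), got {self.dropout}")


def parse_encoder(section):
    # Reject keys the schema does not define instead of silently ignoring them.
    allowed = {f.name for f in fields(MetaFormerEncoderConfig)}
    unknown = set(section) - allowed
    if unknown:
        raise ValueError(f"unknown encoder keys: {sorted(unknown)}")
    return MetaFormerEncoderConfig(**section)


# Mirrors the encoder block from the YAML example above.
cfg = parse_encoder({"type": "metaformer", "variant": "base", "dropout": 0.1})
print(cfg)
```

In practice the full config (as a dict or a YAML file) is passed to ludwig.api.LudwigModel, which performs Ludwig's own validation; the sketch only illustrates which mistakes a dedicated schema can catch early, such as a misspelled key or an out-of-range dropout.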

Backward Compatibility
No breaking changes. New encoder is opt-in. Existing image encoders unaffected.

Performance / Future Extensions

  • Potential future support: loading external pretrained weights, mixed-precision tuning, more variants.
  • If repository size concerns arise, the embedded test_data/ could be replaced with synthetic generation or hosted externally.
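
If the embedded test_data/ is replaced with synthetic generation, tiny images can be written at test time using only the standard library. This sketch is not part of this PR: it emits 8-bit grayscale PNGs (horizontal versus vertical stripes as two trivially separable classes) plus a CSV whose image_path and label columns match the YAML example; the file layout, class pattern, and the helper names write_gray_png and make_synthetic_dataset are assumptions.

```python
import csv
import struct
import tempfile
import zlib
from pathlib import Path


def write_gray_png(path, pixels):
    """Write a list of rows of 0-255 ints as an 8-bit grayscale PNG."""
    height, width = len(pixels), len(pixels[0])

    def chunk(kind, data):
        # Each chunk: big-endian length, type, payload, CRC-32 over type + payload.
        return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", zlib.crc32(kind + data))

    # IHDR: width, height, bit depth 8, colour type 0 (grayscale), default compression/filter/interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline starts with filter byte 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)
    png = b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr) + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b"")
    Path(path).write_bytes(png)


def make_synthetic_dataset(out_dir, n_per_class=4, size=28):
    # Class 0: horizontal stripes; class 1: vertical stripes, shifted by i per sample.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = []
    for label in (0, 1):
        for i in range(n_per_class):
            pixels = [[255 if ((y if label == 0 else x) + i) % 4 < 2 else 0 for x in range(size)]
                      for y in range(size)]
            path = out / f"class{label}_{i}.png"
            write_gray_png(path, pixels)
            rows.append({"image_path": str(path), "label": str(label)})
    with open(out / "dataset.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["image_path", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return out / "dataset.csv"


csv_path = make_synthetic_dataset(tempfile.mkdtemp())
print(csv_path)
```

Generating data this way keeps the repository small and still deterministic, since the pixel patterns depend only on the label and sample index.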

Issue Reference
No issue is linked yet. If one exists, add a line such as:
Closes: #ISSUE_NUMBER

Checklist

  • Encoder implementation
  • Schema + validation
  • Unit / integration style test
  • Reproducible minimal data
  • Documentation page update (can be a follow-up PR)
  • Optional pretrained weight integration (future)

Let me know if you would like the docs updated in this PR; otherwise, this PR focuses on the core encoder integration and tests.
