feat(image): add MetaFormer encoder integration #4053

khaivandangusf2210 · 2025-08-08T12:52:43Z

feat(image): add MetaFormer encoder integration

Summary
Adds a MetaFormer-based image encoder to Ludwig, giving users a flexible token-mixing backbone for image feature extraction. This PR provides:

Encoder implementation: ludwig/encoders/image/metaformer.py
Schema / config validation: ludwig/schema/encoders/image/metaformer.py
Model variants + stacking helpers (metaformer_integration/)
Tests: tests/ludwig/encoders/test_metaformer_encoder.py
Minimal reproducible image dataset (test_data/) for fast CI validation

Motivation
MetaFormer-style backbones (e.g. ConvNeXt-like and token-mixing generalizations) offer competitive performance and architectural simplicity. This integration broadens Ludwig’s catalog of vision encoders and facilitates experimentation and benchmarking against existing CNN and ViT-style encoders.

Implementation Details

Introduces MetaFormerEncoder class implementing the Ludwig image encoder interface.
Supports selectable variants (defined in metaformer_integration/metaformer_models.py) and a stacked CNN + MetaFormer hybrid (metaformer_stacked_cnn.py).
Registered in ludwig/encoders/image/__init__.py and the corresponding schema __init__.py to enable config-based discovery.
Configuration validation via dedicated schema class including fields: variant, pretrained (if applicable later), dropout, embedding size, and internal layer settings.
Keeps external dependencies minimal; no new pip requirements added (relies on existing torch stack).
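The registration and validation steps listed above can be pictured with a small, standard-library-only sketch. It is not the code in this PR: `ENCODER_REGISTRY`, `register_encoder`, `MetaFormerConfigSketch`, `MetaFormerEncoderSketch`, `build_encoder`, and the variant names in `KNOWN_VARIANTS` are all hypothetical stand-ins, and Ludwig's real registry and schema classes may look quite different.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for illustration; not Ludwig's actual registry or schema.
ENCODER_REGISTRY = {}
KNOWN_VARIANTS = {"small", "base"}  # assumed names; see metaformer_models.py for the real list


def register_encoder(name):
    """Class decorator that makes an encoder discoverable by its config `type`."""
    def wrap(cls):
        ENCODER_REGISTRY[name] = cls
        return cls
    return wrap


@dataclass
class MetaFormerConfigSketch:
    """Validates a subset of the fields named in the PR description."""
    variant: str = "base"
    dropout: float = 0.1

    def __post_init__(self):
        if self.variant not in KNOWN_VARIANTS:
            raise ValueError(f"unknown variant: {self.variant!r}")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError(f"dropout must be in [0, 1), got {self.dropout}")


@register_encoder("metaformer")
class MetaFormerEncoderSketch:
    def __init__(self, config):
        self.config = config


def build_encoder(encoder_section):
    """Look up the encoder class by `type`, then validate the remaining fields."""
    params = dict(encoder_section)  # copy so the caller's dict is not mutated
    encoder_cls = ENCODER_REGISTRY[params.pop("type")]
    return encoder_cls(MetaFormerConfigSketch(**params))
```

Under these assumptions, `build_encoder({"type": "metaformer", "variant": "base", "dropout": 0.1})` returns an encoder object, while an out-of-range dropout or an unknown variant raises `ValueError` before any model is built.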

New Files

ludwig/encoders/image/metaformer.py
ludwig/schema/encoders/image/metaformer.py
metaformer_integration/__init__.py
metaformer_integration/metaformer_models.py
metaformer_integration/metaformer_stacked_cnn.py
tests/ludwig/encoders/test_metaformer_encoder.py
test_data/ (tiny MNIST subset + few class images) for deterministic encoder test

Modified Files

ludwig/encoders/image/__init__.py (registration)
ludwig/schema/encoders/image/__init__.py (schema registration)
ludwig/encoders/image/base.py + ludwig/schema/encoders/image/base.py (minor integration hooks, if needed)
ludwig/utils/tokenizers.py (unrelated minor change: import ordering from auto-formatting)

Testing
Test file tests/ludwig/encoders/test_metaformer_encoder.py covers:

Construction from declarative config
Forward pass shape consistency
Basic training step (gradient flows)
Variant parameterization logic (at least one additional variant)
The included miniature dataset ensures test reproducibility without external downloads.
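As a rough illustration of the shape and gradient-flow checks listed above, here is a minimal PyTorch sketch. It does not use the PR's `MetaFormerEncoder`; `TinyMetaFormerBlock`, `TinyEncoder`, and `run_checks` are hypothetical stand-ins, and the block uses average pooling as a simple token mixer purely as an assumption for the example.

```python
import torch
from torch import nn


class TinyMetaFormerBlock(nn.Module):
    """Hypothetical stand-in: norm -> token mixer -> residual, norm -> MLP -> residual."""

    def __init__(self, channels, mlp_ratio=2, dropout=0.1):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, channels)
        # Average pooling as the token mixer (an assumption for this sketch).
        self.mixer = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)
        self.norm2 = nn.GroupNorm(1, channels)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels * mlp_ratio, kernel_size=1),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Conv2d(channels * mlp_ratio, channels, kernel_size=1),
        )

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        return x + self.mlp(self.norm2(x))


class TinyEncoder(nn.Module):
    """Hypothetical encoder: patch stem, one block, global pooling, linear head."""

    def __init__(self, in_channels=1, channels=8, output_size=16):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, channels, kernel_size=4, stride=4)
        self.block = TinyMetaFormerBlock(channels)
        self.head = nn.Linear(channels, output_size)

    def forward(self, x):
        x = self.block(self.stem(x))
        return self.head(x.mean(dim=(2, 3)))  # global average pool, then project


def run_checks(batch_size=2, output_size=16):
    """Return the output shape and the names of parameters that received no gradient."""
    torch.manual_seed(0)
    encoder = TinyEncoder(output_size=output_size)
    images = torch.randn(batch_size, 1, 28, 28)  # MNIST-sized grayscale inputs
    out = encoder(images)
    out.sum().backward()
    missing = [name for name, p in encoder.named_parameters() if p.grad is None]
    return tuple(out.shape), missing
```

A test in this style would assert that the returned shape equals `(batch_size, output_size)` and that the list of parameters without gradients is empty.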

Reproducibility
A small curated sample (test_data/) enables running:
pytest tests/ludwig/encoders/test_metaformer_encoder.py -k metaformer
(Runs quickly and works offline.)

Usage Example
YAML snippet:
input_features:
  - name: image_path
    type: image
    encoder:
      type: metaformer
      variant: base  # see metaformer_models.py for available variants
      dropout: 0.1
output_features:
  - name: label
    type: category
trainer:
  epochs: 2
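For programmatic use, the same configuration can be written as a Python dict. This is a sketch of the equivalent structure only; the helper `encoder_settings` is hypothetical, and whether the dict trains as-is depends on this PR's branch being installed and on the dataset having `image_path` and `label` columns.

```python
import json

# Dict equivalent of the YAML snippet above.
config = {
    "input_features": [
        {
            "name": "image_path",
            "type": "image",
            "encoder": {"type": "metaformer", "variant": "base", "dropout": 0.1},
        }
    ],
    "output_features": [{"name": "label", "type": "category"}],
    "trainer": {"epochs": 2},
}


def encoder_settings(cfg, feature_name):
    """Hypothetical helper: return the encoder section for one input feature."""
    for feature in cfg["input_features"]:
        if feature["name"] == feature_name:
            return feature.get("encoder", {})
    raise KeyError(feature_name)


# The dict is plain data, so it round-trips through JSON unchanged.
round_tripped = json.loads(json.dumps(config))
```

With the branch installed, a dict like this would typically be passed to `ludwig.api.LudwigModel(config=config)` and trained with `model.train(dataset=...)`; that step is not run here.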

Backward Compatibility
No breaking changes. New encoder is opt-in. Existing image encoders unaffected.

Performance / Future Extensions

Potential future support: loading external pretrained weights, mixed-precision tuning, more variants.
If repository size concerns arise, the embedded test_data/ could be replaced with synthetic generation or hosted externally.

Issue Reference
No linked issue is provided yet. If one exists, add a line such as:
Closes: #ISSUE_NUMBER

Checklist

Encoder implementation
Schema + validation
Unit / integration style test
Reproducible minimal data
Documentation page update (can be a follow-up PR)
Optional pretrained weight integration (future)

Let me know if you would like the docs updated in this PR; otherwise this focuses on the core encoder integration and tests.

feat(image): add MetaFormer encoder integration with tests and sample data

a4afdbd

khaivandangusf2210 requested review from Infernaught, alexsherstinsky, arnavgarg1, geoffreyangus, jeffkinnison, justinxzhao, tgaddair and w4nderlust as code owners August 8, 2025 12:52

chore(pre-commit): downgrade docformatter to v1.5.0 to avoid unsupported python_venv hook language in CI

b4ab305
