Commit 3192ed9
Add MLX-VLM backend support for Apple Silicon (#1)
* feat(backends): add inference backend abstraction layer
Introduce InferenceBackend abstract base class with implementations
for vLLM (NVIDIA GPUs) and MLX-VLM (Apple Silicon). This abstraction
enables olmOCR to support multiple inference backends through a
unified interface.
Key components:
- BackendConfig dataclass for unified backend configuration
- VLLMBackend: OpenAI Chat Completions API, guided decoding support
- MLXVLMBackend: OpenAI Responses API, lazy model loading
- get_backend() factory for backend instantiation
Each backend handles:
- Server process lifecycle management
- Health check endpoints
- Request/response format translation
- Model-specific validation
Breaking changes: None (additive change)
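A minimal sketch of how these pieces could fit together (the class and function names come from this commit; the signatures, fields, and bodies are assumptions, not the actual implementation):
    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass
    class BackendConfig:
        # Unified configuration; field names beyond the CLI flags are assumed.
        model: str
        port: int
        mlx_quantization: Optional[str] = None  # MLX-only, ignored by vLLM
        mlx_kv_bits: Optional[int] = None       # MLX-only, ignored by vLLM

    class InferenceBackend(ABC):
        def __init__(self, config: BackendConfig) -> None:
            self.config = config

        @abstractmethod
        def get_endpoint_path(self) -> str:
            """URL path the server exposes, e.g. /v1/chat/completions."""

        @abstractmethod
        def build_request(self, prompt: str, image_b64: str) -> dict[str, Any]:
            """Translate a prompt and page image into the backend's wire format."""

        @abstractmethod
        def parse_response(self, response: dict[str, Any]) -> str:
            """Extract generated text from the backend's response payload."""

    def get_backend(name: str, config: BackendConfig) -> InferenceBackend:
        # Factory: map the CLI-selected name to a concrete backend class
        # (VLLMBackend and MLXVLMBackend are sketched under the next entry).
        registry = {"vllm": VLLMBackend, "mlx-vlm": MLXVLMBackend}
        if name not in registry:
            raise ValueError(f"Unknown backend: {name!r}")
        return registry[name](config)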
* feat(pipeline): integrate backend abstraction for multi-backend support
Replace hardcoded vLLM logic with backend-agnostic implementation
using the new InferenceBackend abstraction. This enables seamless
switching between vLLM and MLX-VLM backends.
Key changes:
- Thread backend instance through processing pipeline
- Use backend.build_request() and backend.parse_response()
- Dynamic endpoint paths via backend.get_endpoint_path()
- Backend-aware platform checks (skip CUDA for MLX)
- Fix critical model path handling bug:
* vLLM: Use served name "olmocr" (model pre-loaded at startup)
* MLX-VLM: Use actual model path (lazy loading on first request)
- Add CLI support: --backend, --mlx_quantization, --mlx_kv_bits
- Backend-specific port defaults (vLLM: 30024, MLX: 8000)
CLI additions:
- --backend {vllm,mlx-vlm}: Select inference backend
- --custom_prompt: Override default OCR prompt
- --mlx_quantization: MLX model quantization (4bit, 8bit, etc.)
- --mlx_kv_bits: MLX KV-cache quantization bits
Breaking changes: None (default behavior unchanged)
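A hedged sketch of the two concrete backends, showing the endpoint and model-path differences called out above (the vLLM payload follows the standard Chat Completions format; the MLX request/response shapes are assumptions):
    class VLLMBackend(InferenceBackend):
        DEFAULT_PORT = 30024

        def get_endpoint_path(self) -> str:
            return "/v1/chat/completions"

        def build_request(self, prompt: str, image_b64: str) -> dict:
            # vLLM pre-loads the model at startup, so requests reference the
            # served name "olmocr", not the on-disk path (the bug fix above).
            return {
                "model": "olmocr",
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url":
                            {"url": f"data:image/png;base64,{image_b64}"}},
                    ],
                }],
            }

        def parse_response(self, response: dict) -> str:
            return response["choices"][0]["message"]["content"]

    class MLXVLMBackend(InferenceBackend):
        DEFAULT_PORT = 8000

        def get_endpoint_path(self) -> str:
            return "/responses"

        def build_request(self, prompt: str, image_b64: str) -> dict:
            # mlx-vlm loads the model lazily on first request, so the request
            # must carry the actual model path; exact payload shape assumed.
            return {"model": self.config.model,
                    "input": [{"role": "user", "content": prompt,
                               "image": image_b64}]}

        def parse_response(self, response: dict) -> str:
            return response["output"][0]["content"]  # field names assumed
With this split, the pipeline only needs get_endpoint_path() to build the URL and build_request()/parse_response() around the HTTP call, leaving no backend-specific logic in the hot path.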
* feat(config): add backend selection and platform validation
Extend PipelineConfig with backend configuration options and
platform-specific validation for MLX-VLM backend.
New configuration fields:
- backend: str = "vllm" - Select inference backend
- mlx_quantization: Optional[str] - MLX quantization (4bit, 8bit, etc.)
- mlx_kv_bits: Optional[int] - KV-cache quantization bits (1, 2, 4, 8)
Validation:
- Ensure backend is "vllm" or "mlx-vlm"
- MLX-specific checks in __post_init__:
* Verify platform is macOS (Darwin)
* Verify architecture is ARM64/Apple Silicon
* Check mlx-vlm package installation
Provides early, clear error messages when attempting to use the
MLX backend on unsupported platforms.
Breaking changes: None (additive with safe defaults)
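The validation described above might look roughly like this in __post_init__ (a sketch under the stated rules; error wording is assumed):
    import platform
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PipelineConfig:
        backend: str = "vllm"
        mlx_quantization: Optional[str] = None
        mlx_kv_bits: Optional[int] = None

        def __post_init__(self) -> None:
            if self.backend not in ("vllm", "mlx-vlm"):
                raise ValueError(f"backend must be 'vllm' or 'mlx-vlm', got {self.backend!r}")
            if self.mlx_kv_bits is not None and self.mlx_kv_bits not in (1, 2, 4, 8):
                raise ValueError("mlx_kv_bits must be 1, 2, 4, or 8")
            if self.backend == "mlx-vlm":
                if platform.system() != "Darwin":
                    raise RuntimeError("The mlx-vlm backend requires macOS.")
                if platform.machine() != "arm64":
                    raise RuntimeError("The mlx-vlm backend requires Apple Silicon (arm64).")
                try:
                    import mlx_vlm  # noqa: F401
                except ImportError:
                    raise RuntimeError("mlx-vlm is not installed; run: pip install olmocr[mlx]") from None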
* feat(mlx): add model conversion utility for MLX format
Add convert_to_mlx.py utility that wraps mlx_vlm.convert to simplify
converting olmOCR models from HuggingFace to MLX format.
Features:
- Convert models from HuggingFace Hub or local paths
- Support for quantization (4-bit, 8-bit with configurable group size)
- Platform validation (macOS + Apple Silicon only)
- Optional upload to HuggingFace Hub
- Clear usage instructions and progress logging
Command-line interface:
python -m olmocr.convert_to_mlx MODEL --output PATH [--quantize 4]
Usage example:
python -m olmocr.convert_to_mlx allenai/olmOCR-2-7B-1025 \
--output ~/models/olmocr-mlx --quantize 4 --group-size 64
Implementation details:
- Calls mlx_vlm.convert() directly with q_bits and q_group_size
- Default group size: 64 (same as mlx-community models)
- Validates Apple Silicon before attempting conversion
Dependencies: Requires mlx-vlm>=0.3.5 (installed via olmocr[mlx])
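The core of the wrapper reduces to a guarded call into mlx_vlm.convert(); a sketch (the q_bits and q_group_size keywords follow the commit text, the rest of the signature is assumed):
    import platform
    from typing import Optional

    import mlx_vlm

    def convert_to_mlx(model: str, output: str,
                       quantize: Optional[int] = None, group_size: int = 64) -> None:
        # Fail fast off Apple Silicon, before any weights are downloaded.
        if platform.system() != "Darwin" or platform.machine() != "arm64":
            raise RuntimeError("MLX conversion requires macOS on Apple Silicon.")
        kwargs = {}
        if quantize is not None:
            kwargs.update(quantize=True, q_bits=quantize, q_group_size=group_size)
        mlx_vlm.convert(model, mlx_path=output, **kwargs)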
* build: upgrade transformers and add MLX optional dependency
Update dependencies to support both vLLM and MLX-VLM backends.
Changes:
- Upgrade transformers: 4.55.2 → 4.57.0+
* Ensures compatibility with latest HuggingFace models
* Required for both training and inference backends
- Add MLX optional dependency group:
* mlx-vlm>=0.3.5 for Apple Silicon inference
* Install with: pip install olmocr[mlx]
- Add CLI entry point:
* olmocr = "olmocr.pipeline:cli"
* Enables `olmocr` command after installation
Breaking changes: None (transformers upgrade is compatible)
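In pyproject.toml terms, the packaging changes amount to roughly the following (version pins and the entry point are from this commit; the exact table layout is assumed):
    [project]
    dependencies = [
        "transformers>=4.57.0",
        # ...
    ]

    [project.optional-dependencies]
    mlx = ["mlx-vlm>=0.3.5"]

    [project.scripts]
    olmocr = "olmocr.pipeline:cli"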
* docs: add comprehensive MLX backend guide
Add detailed documentation for using olmOCR with MLX-VLM backend
on Apple Silicon Macs. Integrated into Sphinx documentation site.
Location: docs/source/mlx-backend.md (added to Getting Started section)
Contents:
- Overview of MLX-VLM vs vLLM backends
- System requirements (M1/M2/M3/M4, macOS 12.0+, 16GB+ RAM)
- Installation instructions
- Quick start guide with pre-quantized models
- Configuration options and CLI flags
- Model selection guide (4-bit vs 8-bit quantization)
- Performance optimization tips
- Troubleshooting section
- API differences between vLLM and MLX-VLM
- Current limitations and workarounds
- Performance benchmarks on different Mac models
Key information:
- Default port: 8000 (vs 30024 for vLLM)
- API endpoint: /responses (vs /v1/chat/completions for vLLM)
- No guided decoding support (uses post-validation instead)
- Pre-quantized models available:
* mlx-community/olmOCR-2-7B-1025-mlx-4bit (~2GB)
* mlx-community/olmOCR-2-7B-1025-mlx-8bit (~4GB)
Target audience: Users with Apple Silicon Macs wanting on-device
inference without cloud costs or NVIDIA GPU requirements.
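Since MLX-VLM offers no guided decoding, well-formedness has to be checked after generation rather than enforced during it; a minimal post-validation sketch (the expected key is an assumption here, not confirmed by this commit):
    import json

    def post_validate(raw: str) -> dict:
        # Without guided decoding, malformed output surfaces here instead of
        # being prevented at generation time; callers retry on ValueError.
        data = json.loads(raw)
        if "natural_text" not in data:  # expected key, assumed
            raise ValueError("response is missing 'natural_text'")
        return data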
* chore: add workspace/ to .gitignore
Ignore workspace/ directory used for test runs and pipeline output.
Similar to existing localworkspace/* entry.
* docs: update minimum macOS version to 15.0+ (Sequoia)
Update system requirements to require macOS 15.0+ instead of 12.0+.
This reflects the tested and recommended minimum version for MLX-VLM
backend support.
9 files changed: +1414, −81 lines