
Commit 3192ed9

Add MLX-VLM backend support for Apple Silicon (#1)
* feat(backends): add inference backend abstraction layer

  Introduce InferenceBackend abstract base class with implementations for vLLM (NVIDIA GPUs) and MLX-VLM (Apple Silicon). This abstraction enables olmOCR to support multiple inference backends through a unified interface.

  Key components:
  - BackendConfig dataclass for unified backend configuration
  - VLLMBackend: OpenAI Chat Completions API, guided decoding support
  - MLXVLMBackend: OpenAI Responses API, lazy model loading
  - get_backend() factory for backend instantiation

  Each backend handles:
  - Server process lifecycle management
  - Health check endpoints
  - Request/response format translation
  - Model-specific validation

  Breaking changes: None (additive change)

* feat(pipeline): integrate backend abstraction for multi-backend support

  Replace hardcoded vLLM logic with backend-agnostic implementation using the new InferenceBackend abstraction. This enables seamless switching between vLLM and MLX-VLM backends.

  Key changes:
  - Thread backend instance through processing pipeline
  - Use backend.build_request() and backend.parse_response()
  - Dynamic endpoint paths via backend.get_endpoint_path()
  - Backend-aware platform checks (skip CUDA for MLX)
  - Fix critical model path handling bug:
    * vLLM: Use served name "olmocr" (model pre-loaded at startup)
    * MLX-VLM: Use actual model path (lazy loading on first request)
  - Add CLI support: --backend, --mlx_quantization, --mlx_kv_bits
  - Backend-specific port defaults (vLLM: 30024, MLX: 8000)

  CLI additions:
  - --backend {vllm,mlx-vlm}: Select inference backend
  - --custom_prompt: Override default OCR prompt
  - --mlx_quantization: MLX model quantization (4bit, 8bit, etc.)
  - --mlx_kv_bits: MLX KV-cache quantization bits

  Breaking changes: None (default behavior unchanged)

* feat(config): add backend selection and platform validation

  Extend PipelineConfig with backend configuration options and platform-specific validation for the MLX-VLM backend.

  New configuration fields:
  - backend: str = "vllm" - Select inference backend
  - mlx_quantization: Optional[str] - MLX quantization (4bit, 8bit, etc.)
  - mlx_kv_bits: Optional[int] - KV-cache quantization bits (1, 2, 4, 8)

  Validation:
  - Ensure backend is "vllm" or "mlx-vlm"
  - MLX-specific checks in __post_init__:
    * Verify platform is macOS (Darwin)
    * Verify architecture is ARM64/Apple Silicon
    * Check mlx-vlm package installation

  Provides early, clear error messages when attempting to use the MLX backend on unsupported platforms.

  Breaking changes: None (additive with safe defaults)
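The backend abstraction described in the commits above boils down to a small interface plus a factory. The following is a minimal sketch of what that interface might look like: the names InferenceBackend, BackendConfig, VLLMBackend, MLXVLMBackend, get_backend(), build_request(), parse_response(), and get_endpoint_path() come from the commit message, but the exact fields, signatures, and request/response payload shapes are assumptions, not the actual olmocr implementation.

```python
# Illustrative sketch only; fields, signatures, and payload shapes are assumptions.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BackendConfig:
    """Unified backend configuration (assumed fields)."""
    model: str
    port: int = 30024                       # vLLM default; MLX-VLM defaults to 8000
    mlx_quantization: Optional[str] = None  # e.g. "4bit" / "8bit" (MLX only)
    mlx_kv_bits: Optional[int] = None       # KV-cache quantization bits (MLX only)


class InferenceBackend(ABC):
    """Unified interface the pipeline codes against."""

    def __init__(self, config: BackendConfig) -> None:
        self.config = config

    @abstractmethod
    def get_endpoint_path(self) -> str:
        """Completion endpoint path on the local inference server."""

    @abstractmethod
    def build_request(self, prompt: str, image_base64: str) -> dict[str, Any]:
        """Translate a page prompt + rendered image into the backend's request body."""

    @abstractmethod
    def parse_response(self, response: dict[str, Any]) -> str:
        """Extract the generated text from the backend's response body."""


class VLLMBackend(InferenceBackend):
    """OpenAI Chat Completions API; the model is pre-loaded and served as 'olmocr'."""

    def get_endpoint_path(self) -> str:
        return "/v1/chat/completions"

    def build_request(self, prompt: str, image_base64: str) -> dict[str, Any]:
        return {
            "model": "olmocr",  # served name, per the model-path fix described above
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
                ],
            }],
        }

    def parse_response(self, response: dict[str, Any]) -> str:
        return response["choices"][0]["message"]["content"]


class MLXVLMBackend(InferenceBackend):
    """OpenAI Responses API; the model path is passed through and loaded lazily."""

    def get_endpoint_path(self) -> str:
        return "/responses"

    def build_request(self, prompt: str, image_base64: str) -> dict[str, Any]:
        # The Responses payload shape is not spelled out in the commit message;
        # this body is a placeholder assumption.
        return {
            "model": self.config.model,  # actual model path (lazy load on first request)
            "input": prompt,
            "image": f"data:image/png;base64,{image_base64}",
        }

    def parse_response(self, response: dict[str, Any]) -> str:
        # Placeholder assumption for where the generated text lives.
        return response["output_text"]


def get_backend(name: str, config: BackendConfig) -> InferenceBackend:
    """Factory used by the pipeline to honor the --backend flag."""
    backends = {"vllm": VLLMBackend, "mlx-vlm": MLXVLMBackend}
    return backends[name](config)
```

Keeping request building and response parsing behind this interface is what lets the pipeline swap the Chat Completions endpoint (vLLM) for the Responses endpoint (MLX-VLM) without touching the per-page processing logic.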
* feat(mlx): add model conversion utility for MLX format

  Add convert_to_mlx.py utility that wraps mlx_vlm.convert to simplify converting olmOCR models from HuggingFace to MLX format.

  Features:
  - Convert models from HuggingFace Hub or local paths
  - Support for quantization (4-bit, 8-bit with configurable group size)
  - Platform validation (macOS + Apple Silicon only)
  - Optional upload to HuggingFace Hub
  - Clear usage instructions and progress logging

  Command-line interface:
    python -m olmocr.convert_to_mlx MODEL --output PATH [--quantize 4]

  Usage example:
    python -m olmocr.convert_to_mlx allenai/olmOCR-2-7B-1025 \
      --output ~/models/olmocr-mlx --quantize 4 --group-size 64

  Implementation details:
  - Calls mlx_vlm.convert() directly with q_bits and q_group_size
  - Default group size: 64 (same as mlx-community models)
  - Validates Apple Silicon before attempting conversion

  Dependencies: Requires mlx-vlm>=0.3.5 (installed via olmocr[mlx])

* build: upgrade transformers and add MLX optional dependency

  Update dependencies to support both vLLM and MLX-VLM backends.

  Changes:
  - Upgrade transformers: 4.55.2 → 4.57.0+
    * Ensures compatibility with latest HuggingFace models
    * Required for both training and inference backends
  - Add MLX optional dependency group:
    * mlx-vlm>=0.3.5 for Apple Silicon inference
    * Install with: pip install olmocr[mlx]
  - Add CLI entry point:
    * olmocr = "olmocr.pipeline:cli"
    * Enables the `olmocr` command after installation

  Breaking changes: None (transformers upgrade is compatible)

* docs: add comprehensive MLX backend guide

  Add detailed documentation for using olmOCR with the MLX-VLM backend on Apple Silicon Macs, integrated into the Sphinx documentation site.

  Location: docs/source/mlx-backend.md (added to Getting Started section)

  Contents:
  - Overview of MLX-VLM vs vLLM backends
  - System requirements (M1/M2/M3/M4, macOS 12.0+, 16GB+ RAM)
  - Installation instructions
  - Quick start guide with pre-quantized models
  - Configuration options and CLI flags
  - Model selection guide (4-bit vs 8-bit quantization)
  - Performance optimization tips
  - Troubleshooting section
  - API differences between vLLM and MLX-VLM
  - Current limitations and workarounds
  - Performance benchmarks on different Mac models

  Key information:
  - Default port: 8000 (vs 30024 for vLLM)
  - API endpoint: /responses (vs /v1/chat/completions for vLLM)
  - No guided decoding support (uses post-validation instead)
  - Pre-quantized models available:
    * mlx-community/olmOCR-2-7B-1025-mlx-4bit (~2GB)
    * mlx-community/olmOCR-2-7B-1025-mlx-8bit (~4GB)

  Target audience: Users with Apple Silicon Macs wanting on-device inference without cloud costs or NVIDIA GPU requirements.

* chore: add workspace/ to .gitignore

  Ignore the workspace/ directory used for test runs and pipeline output, similar to the existing localworkspace/* entry.

* docs: update minimum macOS version to 15.0+ (Sequoia)

  Update system requirements to require macOS 15.0+ instead of 12.0+. This reflects the tested and recommended minimum version for MLX-VLM backend support.
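For the platform validation mentioned in the feat(config) commit above, a minimal sketch of what PipelineConfig.__post_init__ might check is shown below. The field names and the vllm/mlx-vlm choices come from the commit message; the concrete checks and error messages are assumptions.

```python
# Illustrative sketch of the assumed platform validation; not the actual olmocr code.
import importlib.util
import platform
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineConfig:
    backend: str = "vllm"                   # "vllm" or "mlx-vlm"
    mlx_quantization: Optional[str] = None  # e.g. "4bit", "8bit"
    mlx_kv_bits: Optional[int] = None       # 1, 2, 4, or 8

    def __post_init__(self) -> None:
        if self.backend not in ("vllm", "mlx-vlm"):
            raise ValueError(f"Unknown backend {self.backend!r}; expected 'vllm' or 'mlx-vlm'")

        if self.backend == "mlx-vlm":
            # MLX-VLM only runs on Apple Silicon Macs, so fail fast with a clear message.
            if platform.system() != "Darwin":
                raise RuntimeError("The mlx-vlm backend requires macOS (Darwin).")
            if platform.machine() != "arm64":
                raise RuntimeError("The mlx-vlm backend requires Apple Silicon (ARM64).")
            if importlib.util.find_spec("mlx_vlm") is None:
                raise RuntimeError("mlx-vlm is not installed; run: pip install olmocr[mlx]")

        if self.mlx_kv_bits is not None and self.mlx_kv_bits not in (1, 2, 4, 8):
            raise ValueError("mlx_kv_bits must be one of 1, 2, 4, or 8")
```

Failing in __post_init__ means a user who passes --backend mlx-vlm on a Linux or Intel machine gets a clear error before any server process is started, matching the "early, clear error messages" goal stated in the commit.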
1 parent 99825da commit 3192ed9

File tree

9 files changed: +1414 -81 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions

@@ -11,6 +11,7 @@ sample200_vllm/*
 sample200_sglang/*
 pdelfin_testset/*
 localworkspace/*
+workspace/*
 math_data/*
 math_data_big/*
 gpt4otestset/*

docs/source/index.md

Lines changed: 1 addition & 0 deletions

@@ -7,6 +7,7 @@

 installation
 overview
+mlx-backend
 ```

 ```{toctree}
