Releases: roboflow/inference
v1.0.5
What's Changed
- feat: add model cold start, model/workflow/workspace ID response headers by @hansent in #2052
- Support yololite object detection in inference_models with ONNX backend by @leeclemnet in #2078
- Fix the input parameter types accepted by the Ethernet IP PLC block for PLC reads/writes by @shntu in #2061
- Try to address TRT issue by @PawelPeczek-Roboflow in #2079
- Release new inference-models by @PawelPeczek-Roboflow in #2084
- Email message serialization fix by @dkosowski87 in #2083
- Support New Roboflow API Usage Paused Error 423 by @maxschridde1494 in #2082
- Expose /healthz and /readiness endpoints even if API_KEY is not set by @ecarrara in #2077
- feat: inference_models adapters respect countinference for credit verification bypass by @hansent in #2081
- Update docs by @Erol444 in #2076
- Reduce flash-attn MAX_JOBS to 1 for JP7.1 build by @alexnorell in #2068
- Bump the npm_and_yarn group across 2 directories with 10 updates by @dependabot[bot] in #2085
- Fix shared model cache race conditions causing pod crashes by @hansent in #2080
- fix inference-models pypi publishing by @grzegorz-roboflow in #2086
- Qwen3 5 and move to transformers 5 by @Matvezy in #2070
- Correct resize procedure for RF-DETR models trained on versions with non-stretch, non-square resize by @mkaic in #2067
- ENT-969: Add TestPatternStreamProducer as a built-in video source type by @NVergunst-ROBO in #2056
- fix: handle expired Redis lock release gracefully by @rafel-roboflow in #2060
- Revert/qwen 3.5 by @PawelPeczek-Roboflow in #2087
- feat: gate structured access logging behind STRUCTURED_API_LOGGING env var by @hansent in #2088
- Deploy inference-models-0.19.5 by @PawelPeczek-Roboflow in #2089
New Contributors
- @maxschridde1494 made their first contribution in #2082
- @ecarrara made their first contribution in #2077
Full Changelog: v1.0.4...v1.0.5
v1.0.4
What's Changed
- Update sam3_3d tdfy commit to latest main by @leeclemnet in #2050
- skip /usage/plan request when api key is not provided by @rafel-roboflow in #2059
- Fix issue with rfdetr-segmentation class remapping by @PawelPeczek-Roboflow in #2075
Full Changelog: v1.0.3...v1.0.4
v1.0.3
What's Changed
- Fix JP7.1 container build OOM during ORT compilation by @alexnorell in #2065
- Add AV codec dependencies to base image by @shntu in #2039
- Fix: Updated aiohttp to >=3.13.3 to address CVEs (#1949) by @thchann in #2069
- Add upper-bound constraints for aiohttp by @PawelPeczek-Roboflow in #2071
- Change the ranking priority for AutoLoader - ONNX packages over Torch by @PawelPeczek-Roboflow in #2047
- Loosening typing-extensions dependency by @PawelPeczek-Roboflow in #2072
- Prepare inference 1.0.3 release by @PawelPeczek-Roboflow in #2073
Full Changelog: v1.0.2...v1.0.3
v1.0.2
What's Changed
- Add single-tenant workflow cache mode and thread workflow_version_id across the stack by @alexnorell in #2031
- Add JetPack 7.1 container build workflow and CLI support by @alexnorell in #2032
- Fix: Set task_type for SegmentAnything3_3D_Objects by @leeclemnet in #2030
- feat(workflows): support custom image names in dataset upload block by @rafel-roboflow in #2034
- Expose inference configuration flags for sam3-3d by @leeclemnet in #2040
- feat(sam3): enable SDK-based remote execution for SAM3 workflow blocks by @hansent in #2042
- Add examples/sam-3d notebooks by @leeclemnet in #2043
- Add per-request 100ms duration floor via internal execution header by @hansent in #2037
- bugfix: fix version field in polygon and halo v2 visualization block manifests by @lrosemberg in #2044
- Fix large weights cdn download issue by @Matvezy in #2046
- Fix torch.compile for sam3-3d by @leeclemnet in #2041
- Fix overlapping parameter in inference-cli by @PawelPeczek-Roboflow in #2038
- Bug/dg 306 wrong workflow that doesnt raise error and provokes 500 by @rafel-roboflow in #2036
- Add output from mask measurement block to label visualization by @jeku46 in #2035
- feat: add PINNED_MODELS and PRELOAD_API_KEY for preload on serverless by @hansent in #2048
- Bump version to 1.0.2 by @PawelPeczek-Roboflow in #2051
Full Changelog: v1.0.1...v1.0.2
v1.0.1
What's Changed
- Fix issue with RF-Detr model post-processing in TRT by @PawelPeczek-Roboflow in #2029
Full Changelog: v1.0.0...v1.0.1
v1.0.0
🚀 Added
💪 inference 1.0.0 just landed 🔥
We are excited to announce the official 1.0.0 release of Inference, previewed two weeks ago with the 1.0.0rc1 release.
Over the past years, Inference has evolved from a lightweight prediction server into a widely adopted runtime powering local deployments, Docker workloads, edge devices, and production systems. After hundreds of releases, the project has matured — and so has the need for something faster, more modular, and more future-proof.
inference 1.0.0 closes one chapter and opens another. This release introduces a new prediction engine that will serve as the foundation for future development.
⚡ New prediction engine: inference-models
We are introducing inference-models, a redesigned model-execution engine focused on:
- faster model loading and inference
- improved resource utilization
- better modularity and extensibility
- cleaner separation between serving and model runtime
- support for different backends, including TensorRT
Important
With inference 1.0.0 we also released the first stable build of inference-models, 0.19.0. You can use the new engine in inference - just set the env variable USE_INFERENCE_MODELS=True
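For example, the flag can be set when launching the server. This is a minimal sketch: the CPU image name and the `inference server start` CLI command are assumed to match your setup - substitute the GPU image variant or your own entrypoint as needed.

```shell
# Opt in to the new inference-models engine via the environment flag.
# Assumes the standard CPU server image; use the GPU variant if applicable.
docker run -it --rm \
  -e USE_INFERENCE_MODELS=True \
  -p 9001:9001 \
  roboflow/roboflow-inference-server-cpu:latest

# Or, with a local install of inference-cli:
USE_INFERENCE_MODELS=True inference server start
```

When the flag is unset, the server keeps using the old engine, so the opt-in carries no risk for existing integrations.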
Caution
The new inference-models engine is wrapped with adapters so it can serve as a drop-in replacement for the old engine. We are making it the default engine on the Roboflow platform, but clients running inference locally have USE_INFERENCE_MODELS set to False by default. We would like all clients to test the new engine - when the flag is not set, inference works as usual.
In approximately 2 weeks, with the inference 1.1.0 release, we will make inference-models the default engine for everyone.
Caution
inference-models is a completely new backend, and we have fixed a lot of problems and bugs along the way. As a result, predictions from your model may differ - but according to our tests, they are better quality-wise. That said, we may still have introduced some minor bugs - please report any problems and we will do our best to fix them 🙏
🛣️ Roadmap
Today's release is just the start of broader changes in inference - the plan for the future is the following:
- Shortly after release, we will complete our work on the Roboflow platform - including migration of the small fraction of models not yet onboarded into the new registry used by inference-models, and adjusting automations on the platform. Until this is finished, clients who very recently uploaded or renamed models may be impacted by HTTP 404 - contact us to receive support in such cases.
- There will be consecutive hot-fixes (if needed), released as 1.0.x versions.
- Clients running inference locally should test the inference-models backend now, as in approximately 2 weeks inference-models will become the default engine.
- We still have some work to do in 1.x.x - mainly to provide patches - but we are starting a march towards 2.0, which should bring new quality to other components of inference. Stay tuned for updates.
- You should expect that new contributions to inference will be based on the inference-models engine and may not work if you don't migrate.
Caution
One of the problems we have not addressed in 1.0.0 is model cache purging - the new inference-models engine uses a different local cache structure than the old engine. As a result, an inference server with USE_INFERENCE_MODELS=True does not perform clean-up on the volume holding models pulled from the platform. If you run locally, this should generally not be an issue, since we expect clients to use only a limited number of different models in their deployments.
If you use a large number of models, or your disk space is tight, you should perform periodic clean-ups of /tmp/cache when running the new inference. This issue will be addressed before the 1.1.0 release.
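Until the automatic purge lands, a periodic clean-up along these lines keeps the cache volume bounded. This is only a sketch: /tmp/cache is the path mentioned above, and the 14-day age threshold is an arbitrary example you should tune to your model churn.

```shell
# Delete cached model files not accessed for more than 14 days,
# then prune any directories left empty. Run inside the container
# or against the host path mounted at /tmp/cache.
find /tmp/cache -mindepth 1 -type f -atime +14 -delete
find /tmp/cache -mindepth 1 -type d -empty -delete
```

Scheduling this via cron (or a Kubernetes CronJob) outside peak hours avoids evicting models that are about to be reloaded.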
🎨 Semantic Segmentation in inference
Thanks to @leeclemnet, the DeepLabV3Plus segmentation model was onboarded to inference and can now be used by clients.
📐 Area Measurement block 🤝 Workflows
Thanks to @jeku46 we can now measure area size with Workflows.
🚧 Maintenance
- add missing ffmpeg package for dev by @rafel-roboflow in #2009
- fix expose sam3 with proper envs by @rafel-roboflow in #2011
- Detections Class Replacement support for strings by @Erol444 in #2000
- fix: Send termination_reason via data channel on WebRTC stream timeout by @balthazur in #2008
- Remove content length validation to allow for chunked responses by @dkosowski87 in #2015
- added processing_timeout support to webrtc's StreamConfig dataclass by @Erol444 in #2017
- fix: Return 400 instead of 500 for raw bytes sent as base64 image by @bigbitbus in #2016
- Added claude sonnet 4.6 by @Erol444 in #2014
- Fix mkdocs-macros Jinja2 syntax errors in generated block docs by @yeldarby in #2012
- Add remote GPU processing time collection and forwarding by @hansent in #2007
- Add semantic-segmentation endpoints + deep_lab_v3_plus by @leeclemnet in #2018
- Update CODEOWNERS: Add dkosowski87 and reorganize team assignments by @hansent in #2021
- Add support for gemini 3.1 pro in gemini block by @Erol444 in #2024
- Add area_measurement workflow block by @jeku46 in #2013
- Auto-detect Jetson JetPack version in CLI server start by @alexnorell in #1958
- Get rid of unstable assertions on predictions in e2e tests by @PawelPeczek-Roboflow in #2026
- ENT-884: Add workflow_version_id support to inference pipeline by @NVergunst-ROBO in #2022
- Add JetPack 7.1 support for NVIDIA Thor by @alexnorell in #1935
🏅 New Contributors
- @dkosowski87 made their first contribution in #2015
- @leeclemnet made their first contribution in #2018
Full Changelog: v0.64.8...v1.0.0
v0.64.8
💪 Added
- Fisheye cameras in camera calibration block by @Erol444 in #1996
The calibration block supported polynomial calibration, which does not handle fisheye distortions well. This change adds support for fisheye calibration.
- Heatmap block by @Erol444 in #1986
This change adds a heatmap block (using supervision's heatmap annotator), which supports both:
- detections, so the heatmap is based on where detections were
- tracklets, which ignore stationary objects (default: on), so we heatmap the movements rather than the objects
🚧 Maintenance
- temporarily pin z3-solver version by @grzegorz-roboflow in #1990
- Code workflow block icon issue by @Erol444 in #1988
- Optimize cosine_similarity by @KRRT7 in #1989
- add inference version to the request headers by @japrescott in #1985
- Fix video frame count estimation by detecting actual FPS from uploaded video by @rafel-roboflow in #1992
- Mark file processing in webrtc worker for downstream blocks to pick frame timestamp correctly by @grzegorz-roboflow in #1995
- add frame size to webrtc video metadata by @rafel-roboflow in #1997
- enable gzip compression by default by @rafel-roboflow in #1998
- WIP: enabled sam3 visual segment by @rafel-roboflow in #1975
- added ffmpeg to docker dependencies by @rafel-roboflow in #2002
- rename seg-preview to sam3 by @rafel-roboflow in #2005
- Fix RF-DETR-Seg mask postprocessing for letterboxed input case by @mkaic in #2001
- Enable inference pipeline api on jetpack 6.2.0 by @grzegorz-roboflow in #2006
Full Changelog: v0.64.7...v0.64.8
v1.0.0rc1
inference 1.0.0rc1 — Release Candidate
Today marks an important milestone for Inference.
Over the past years, Inference has grown from a lightweight prediction server into a widely adopted runtime used across local deployments, Docker, edge devices, and production systems. Hundreds of releases later, the project has matured significantly - and so has the need for a faster, more modular, and more future-proof runtime.
inference 1.0.0rc1 is a preview of 1.0.0 release which will close one chapter and open another - this release introduces a new prediction engine that will become the foundation for all future development.
🚀 New prediction engine - inference-models
We are introducing inference-models, a redesigned execution engine focused on:
- faster model loading and inference
- improved resource utilization
- better modularity and extensibility
- cleaner separation between serving and model runtime
- stronger foundations for future major versions
The engine is already available today in:
- inference-models package → 0.18.6rc8 (RC)
- inference package and Docker → enabled with env variable USE_INFERENCE_MODELS=True
inference-models wrapped within the old inference is a drop-in replacement. This allows testing the new runtime without changing existing integrations.
Important
Predictions from your models may change - but generally for the better! inference-models is a completely new engine for running models; we have fixed a lot of bugs and made it multi-backend - capable of running ONNX, Torch, and even TRT models! It automatically negotiates with the Roboflow model registry to choose the best package for your environment. We have already migrated almost all Roboflow models to the new registry and are working hard to achieve full coverage soon!
📅 What happens next
- Next week
  - Stable inference 1.0.0
  - Stable inference-models release
  - Roboflow platform updated to use inference-models as the default engine
- In the coming weeks
  - inference-models becomes the default engine for public builds (USE_INFERENCE_MODELS becomes opt-out, not opt-in)
  - continued performance improvements and runtime optimizations
🔭 Looking forward - the road to 2.0
- This engine refresh is only the first step.
- We are starting work toward Inference 2.0, a larger modernization effort similar in spirit to the changes introduced with inference-models.
Stay tuned for future updates!
v0.64.7
What's Changed
- dg 15 fix timeout file by @rafel-roboflow in #1934
- Fix VLM as Detector/Classifier name, so it gets correct URL by @Erol444 in #1965
- improve error logging by @japrescott in #1966
- Add rfdetr nas by @probicheaux in #1970
- added missing envvar export for webrtc preview gzip flag by @rafel-roboflow in #1978
- Bug/dg 204 (2) reduce ack window webrtc by @rafel-roboflow in #1979
- Claude opus 4.6 in Claude block by @Erol444 in #1980
- Add remote exec capability for foundation models missing it by @hansent in #1968
- Gemini block: Add support for tool code execution (tool use) by @Erol444 in #1961
- Pass delete from disk to clear cache by @bigbitbus in #1982
- Add change to avoid pushing latest tag for rc release by @PawelPeczek-Roboflow in #1983
Full Changelog: v0.64.6...v0.64.7
v0.64.6
What's Changed
- Add large rf-detrs and seg coco models to inference_models by @Matvezy in #1944
- Add configurable RF API timeout for inference-cli command interacting with RF-cloud by @PawelPeczek-Roboflow in #1950
- Allow sv.Detections.data properties in extract-property block by @grzegorz-roboflow in #1948
- CI for deploying custom python block modal app by @grzegorz-roboflow in #1945
- in modal.custom_python_block.deploy.yml use deployment modal tokens by @grzegorz-roboflow in #1953
- Address internals imported from Supervision by @grzegorz-roboflow in #1951
- Add YOLO26 to inference_models by @mkaic in #1943
- Fix failing yolo26 gpu integration tests by @mkaic in #1956
- Disable automatic deployment of modal webexec by @grzegorz-roboflow in #1954
- 0.64.6 by @grzegorz-roboflow in #1957
Full Changelog: v0.64.5...v0.64.6