Lightweight browser SDK to run ONNX models with WebGPU/WASM. Includes a minimal sentiment example under examples/sentiment.
- Node.js 18+ and npm
- A modern browser. WebGPU gives the best performance (Chrome/Edge 121+; flags may be needed on some platforms); the SDK falls back to WASM automatically.
- Local assets under `public/models/…` (an example sentiment model + tokenizer is provided).
`npm install` will try to fetch `public/models/sentiment/v1/model.onnx` if you set `MODEL_ONNX_URL` (or `ONNX_MODEL_URL`) to a direct `.onnx` download URL. Example: `MODEL_ONNX_URL=https://your-hosted-model/model.onnx npm install`
- If the file already exists locally, the download is skipped. To bypass the download on CI, set `SKIP_MODEL_DOWNLOAD=1`.
- The ONNX file is ignored by git (`public/models/**/*.onnx`), so each developer can fetch it without committing large binaries.
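For reference, a fetch step of this kind can be written as a short Node postinstall script. The sketch below is illustrative only, assuming Node 18+ (global `fetch`) and a hypothetical `scripts/fetch-model.mjs` filename; only the environment variable names and the destination path come from the behavior described above.

```js
// scripts/fetch-model.mjs — illustrative sketch; the actual script in the repo may differ.
// Assumes Node 18+ (global fetch available).
import { mkdir, writeFile, access } from "node:fs/promises";
import path from "node:path";

const dest = "public/models/sentiment/v1/model.onnx";
const url = process.env.MODEL_ONNX_URL || process.env.ONNX_MODEL_URL;

// Skip entirely on CI (SKIP_MODEL_DOWNLOAD=1) or when no URL is configured.
if (process.env.SKIP_MODEL_DOWNLOAD || !url) {
  console.log("Skipping model download.");
  process.exit(0);
}

try {
  await access(dest); // file already present locally → nothing to do
  console.log(`${dest} already exists, skipping download.`);
} catch {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Model download failed: HTTP ${res.status}`);
  await mkdir(path.dirname(dest), { recursive: true });
  await writeFile(dest, Buffer.from(await res.arrayBuffer()));
  console.log(`Saved ${dest}`);
}
```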
```bash
npm install onnx-web-kit
```

To work on the repo and its examples:

```bash
npm install
npm run dev    # runs the sentiment example at http://localhost:3000
npm run build  # builds the sentiment example into dist/examples/sentiment
```

Examples live under `examples/`; add more there following the same pattern.
- `examples/sentiment/demo`: Vanilla JS page wiring the SDK directly with the bundled sentiment model and tokenizer in `public/models/sentiment/v1/`.
- `examples/react-generic-model`: Vite + React demo with a generic `useOnnxModel` hook and a sentiment wrapper component showing text classification end-to-end.
```js
import {
createRuntime,
registerModel,
} from "onnx-web-kit";
import { runTextModel } from "onnx-web-kit/core/text-utils.js";
// 1) Create a runtime
const runtime = createRuntime({
preferredBackend: "webgpu", // or "wasm"
modelBasePath: "/models", // where your ONNX + tokenizer files live
debug: true,
onLog: console.log,
});
// 2) Register your model
registerModel("sentiment", {
version: "v1",
path: "sentiment/v1/model.onnx",
tokenizer: "sentiment/v1/tokenizer.json",
// Optional: add labels to get decoded outputs instead of raw logits
labels: ["very negative", "negative", "neutral", "positive", "very positive"],
});
// 3) Run inference
const result = await runTextModel(runtime, "sentiment", "I love this!");
// If labels provided: { logits, probs, label, labelIndex, labelProb }
// Otherwise: raw logits array
console.log(result);
```

In React, the same flow can be wired through the `useOnnxModel` hook from the `examples/react-generic-model` demo:

```jsx
import React from "react";
import useOnnxModel from "./examples/react-generic-model/useOnnxModel.js";
export function SentimentWidget() {
const { ready, loading, error, analyze, output } = useOnnxModel({
modelName: "sentiment",
modelPath: "sentiment/v1/model.onnx",
tokenizerPath: "sentiment/v1/tokenizer.json",
modelBasePath: "/models",
preferredBackend: "webgpu",
});
const run = () => analyze("I love how simple this SDK makes browser AI!");
return (
<div>
<button onClick={run} disabled={!ready || loading}>
{loading ? "Running…" : "Analyze sentiment"}
</button>
{error && <p>Error: {error.message}</p>}
{output && <pre>{JSON.stringify(output, null, 2)}</pre>}
</div>
);
}
```

To add your own model:

- Drop your ONNX and tokenizer files under `public/models/<name>/<version>/`.
- Example layout:

```
public/models/your-model/v1/model.onnx
public/models/your-model/v1/tokenizer.json
public/models/your-model/v1/tokenizer_config.json
public/models/your-model/v1/vocab.txt   (if your tokenizer needs it)
```
- Register it in your app:
registerModel("your-model", { version: "v1", path: "your-model/v1/model.onnx", tokenizer: "your-model/v1/tokenizer.json", labels: ["labelA", "labelB", "labelC"], // optional });
- Call the appropriate helper (e.g., `runTextModel`); a minimal end-to-end sketch follows below.
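Putting those steps together, usage mirrors the sentiment example above. A minimal sketch, assuming the same `createRuntime` options shown earlier and that `"your-model"` has been registered as above; the model name and input text are placeholders:

```js
import { createRuntime } from "onnx-web-kit";
import { runTextModel } from "onnx-web-kit/core/text-utils.js";

// Same runtime options as the sentiment example above.
const runtime = createRuntime({ preferredBackend: "webgpu", modelBasePath: "/models" });

// "your-model" must match the name passed to registerModel above.
// With labels registered: { logits, probs, label, labelIndex, labelProb }; otherwise raw logits.
const result = await runTextModel(runtime, "your-model", "some input text");
console.log(result);
```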
Why run models in the browser:

- Zero server round-trips: Models run in the browser; great for latency and privacy.
- WebGPU-first, WASM fallback: Takes advantage of modern GPU acceleration without breaking older browsers.
- Simple DX: One runtime, a small registry, and a high-level helper (`runTextModel`) hide the ONNX + tokenizer wiring.
- Pluggable labels: App developers can attach label sets per model to get decoded outputs automatically.
- Self-hosted assets: Works fully offline once the ONNX + tokenizer files are served locally.
- Minimal footprint: Plain JS, Vite dev server, no heavy framework lock-in.
Use it when:

- You need client-side inference for text models (classification, simple NLP tasks) without standing up an API.
- You want to prototype or demo ONNX models quickly in a browser.
- You care about user data staying on-device (no server calls).
- You need a portable setup that can drop into any static hosting environment.
Notes:

- Labels are optional and per-model. If provided in `registerModel`, `runTextModel` returns decoded labels; otherwise it returns raw logits.
- The SDK currently exposes a text helper (`runTextModel`). Extend similarly for other modalities (image/audio) by following the loader + feeds pattern (see the sketch below).
- WebGPU availability varies by browser/OS; the runtime falls back to WASM automatically.
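For orientation, the "feeds" idea referenced above is the same one exposed by ONNX Runtime Web: build tensors keyed by the model's input names and pass them to `session.run`. The sketch below illustrates that pattern directly with the `onnxruntime-web` package; it is not onnx-web-kit's internal loader API, and the input/output names, shapes, and preprocessing are placeholders that depend entirely on your model.

```js
import * as ort from "onnxruntime-web";

// Illustration of the loader + feeds pattern using onnxruntime-web directly.
// Everything model-specific here (names, shape, preprocessing) is a placeholder.
const session = await ort.InferenceSession.create("/models/your-model/v1/model.onnx", {
  executionProviders: ["webgpu", "wasm"], // try WebGPU first, fall back to WASM
});

// Preprocess an image into a Float32Array with the layout your model expects,
// e.g. NCHW [1, 3, 224, 224] for many vision models.
const pixels = new Float32Array(1 * 3 * 224 * 224);
const feeds = { pixel_values: new ort.Tensor("float32", pixels, [1, 3, 224, 224]) };

const outputs = await session.run(feeds);
console.log(outputs.logits?.data); // raw output tensor data; decode with your own labels
```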
