A keyword search engine written in Zig and compiled to WASM.
Keywords are implicitly _AND_ed together to form boolean search queries.
An AND query means that a document matches if and only if it contains all the search keywords.
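To make the AND semantics concrete, here is a minimal sketch of how such a query can be answered from an inverted index by intersecting posting lists. The toy `index`, the `intersect` and `andQuery` helpers, and the ID values are all hypothetical illustrations; they do not reflect the engine's actual Zig code or the layout of its index.

```javascript
// Hypothetical toy inverted index: term -> sorted array of IDs containing it.
// This is NOT the real index format, just an illustration.
const index = new Map([
  ["hello", [1, 4, 7]],
  ["file", [1, 2, 7, 9]],
  ["app", [1, 9]],
]);

// Intersect two sorted ID arrays with a two-pointer merge.
function intersect(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

// AND query: an ID matches only if every keyword's posting list contains it.
function andQuery(index, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  // A keyword missing from the index makes the whole AND query empty.
  const lists = terms.map((t) => index.get(t) ?? []);
  // Start from the shortest list to keep intermediate results small.
  lists.sort((x, y) => x.length - y.length);
  return lists.reduce((acc, list) => intersect(acc, list));
}

console.log(andQuery(index, "hello file")); // [ 1, 7 ]
```

Because the lists are sorted, each intersection runs in time linear in the combined length of the two lists.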
Currently, search results are not ranked. However, the index already stores term frequencies and position data, which I included to support TF-IDF or BM25 ranking in the future.
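Since ranking is not implemented yet, the following is only a sketch of what BM25 scoring over stored frequencies could look like. The function name, its parameters, and the data shapes (`tf`, `df`) are assumptions made for illustration; `k1 = 1.2` and `b = 0.75` are commonly used defaults, not values taken from this project.

```javascript
// BM25 score of a single indexed unit for a list of query terms.
// tf:         Map term -> frequency of the term in this unit (hypothetical shape)
// df:         Map term -> number of units containing the term (hypothetical shape)
// unitLen:    length of this unit in tokens
// avgLen:     average unit length across the index
// totalUnits: number of indexed units
function bm25(queryTerms, tf, df, unitLen, avgLen, totalUnits, k1 = 1.2, b = 0.75) {
  let score = 0;
  for (const term of queryTerms) {
    const f = tf.get(term) ?? 0;
    const n = df.get(term) ?? 0;
    if (f === 0 || n === 0) continue; // term absent: contributes nothing
    // Inverse document frequency, in the always-positive "+1" variant.
    const idf = Math.log((totalUnits - n + 0.5) / (n + 0.5) + 1);
    // Term-frequency saturation with length normalization.
    const norm = f + k1 * (1 - b + b * (unitLen / avgLen));
    score += (idf * f * (k1 + 1)) / norm;
  }
  return score;
}
```

With AND semantics, every matching unit contains all query terms, so a score like this would only affect the order of results, not which results are returned.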
Documents are assumed to be Markdown files.
Instead of indexing each file as a whole, the indexer parses it into an AST at the heading level. Each node in the tree essentially consists of a title (the heading text) and some content (the text until the next heading).
For instance, given this document:
# Hello world
File over app.
## Foo
## Bar
Chocolate bar

- Searching for `hello file` matches the level-1 node (with heading "Hello world" and content "File over app").
- Searching for `chocolate` matches one level-2 node (with heading "Bar" and content "Chocolate bar").
I find this way more useful than just matching entire documents, which could be very long.
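As a rough illustration of the heading-level split, the sketch below cuts a Markdown string into flat nodes with a level, a title, and the text up to the next heading. The `splitByHeadings` function is hypothetical and simplified: it produces a flat list rather than a tree, only recognizes `#`-style headings, and does not skip `#` lines inside code blocks. The actual indexer is written in Zig and may work differently.

```javascript
// Split Markdown text into flat heading-level nodes (simplified sketch).
function splitByHeadings(markdown) {
  const nodes = [];
  let current = null;
  for (const line of markdown.split("\n")) {
    // A heading line: 1 to 6 '#' characters, whitespace, then the heading text.
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    if (m) {
      current = { level: m[1].length, title: m[2].trim(), content: [] };
      nodes.push(current);
    } else if (current && line.trim() !== "") {
      // Non-empty text belongs to the most recent heading.
      current.content.push(line.trim());
    }
  }
  // Join each node's content lines into a single string.
  return nodes.map((n) => ({ ...n, content: n.content.join(" ") }));
}

const nodes = splitByHeadings(
  "# Hello world\nFile over app.\n## Foo\n## Bar\nChocolate bar"
);
console.log(nodes.map((n) => `${n.level}:${n.title}`)); // [ '1:Hello world', '2:Foo', '2:Bar' ]
```

Each resulting node can then be indexed as its own searchable unit, which is what lets a query for `chocolate` point at the "Bar" section rather than at the whole file.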
The system consists of a CLI indexer and a WASM search API.
Run the build script with the directory containing the Markdown files you want to index:
./build.sh <data-dir>
It will:
- Build a search index for the documents in <data-dir> (search-index.bin).
- Generate a metadata file (docs-mapping.json).
- Compile the search logic to WASM.
Then, you can search the indexed documents from JavaScript running in the browser, through the search API exposed by the WASM module. Alongside the WASM binary, you will need the index and the document mappings JSON.
To use the engine in a browser, load the WASM binary and the generated index. You can then query the index via the exposed API.
The provided SearchEngine wrapper handles the WASM memory management and
querying.
const searchEngine = new SearchEngine();
// Initialization from the build assets
await searchEngine.initialize(
'./search.wasm', // search logic and exposed API
'./search-index.bin', // serialized index data structure
'./docs-mapping.json' // serialized metadata
);
const results = searchEngine.search("zig wasm performance");
// Results (IDs) are enriched with metadata (titles and links):
// [{ docId: 10, title: "Optimizing Zig", link: "/posts/opt.html" }, ...]

I wanted to learn more about "old-school" search engines and their internals. And I also needed one for my personal website!
The choice of Zig was a happy coincidence. I attended the TigerBeetle world tour in Paris (Dec. 2025), where we were encouraged to present a project. I felt like I had no choice but to do this one in Zig for the occasion ;)!
You can find the slides from my presentation in this repo. They provide a high-level overview of the architecture and how an inverted index is built and used to power the search engine.
- Francesco Tomaselli's super blog post: Search Engine in Rust
- Christopher Manning's generous and great book on IR, which Francesco also references: Introduction to Information Retrieval