Keyword Search Engine

A keyword search engine written in Zig and compiled to WASM.

Table of Contents

Boolean Search
- Markdown files
Pipeline
- 1. Build and Index
- 2. Use in the browser
Why?
- Good resources

Boolean Search

Keywords are implicitly ANDed together to form boolean search queries.¹

An AND query means that a document matches if and only if it contains all the search keywords.
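To make the AND semantics concrete, here is a minimal sketch of how such a query can be answered with an inverted index. It is not the engine's Zig implementation: `buildIndex`, `intersect`, `searchAnd`, the tokenizer regex, and the sample documents are all assumptions made for illustration.

```javascript
// Hypothetical toy index: term -> sorted array of document IDs (a posting list).
function buildIndex(docs) {
  const index = new Map();
  docs.forEach((text, docId) => {
    // Deduplicate terms so each document appears once per posting list.
    const terms = new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
    for (const term of terms) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(docId); // docIds arrive in increasing order
    }
  });
  return index;
}

// Intersect two sorted posting lists with a two-pointer merge.
function intersect(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

// AND query: a document matches only if it appears in every keyword's list.
function searchAnd(index, query) {
  const terms = query.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  if (terms.length === 0) return [];
  const lists = terms.map((term) => index.get(term) ?? []);
  // Start from the shortest list to keep intermediate results small.
  lists.sort((x, y) => x.length - y.length);
  return lists.reduce((acc, list) => intersect(acc, list));
}

const index = buildIndex([
  "Zig compiles to WASM", // doc 0
  "WASM runs in the browser", // doc 1
  "Zig and WASM in the browser", // doc 2
]);

console.log(searchAnd(index, "zig wasm")); // [ 0, 2 ]
console.log(searchAnd(index, "zig browser")); // [ 2 ]
console.log(searchAnd(index, "zig rust")); // []
```

A keyword that appears in no document yields an empty posting list, so the whole query returns nothing, which is exactly the "if and only if it contains all the search keywords" rule.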

Search results are currently not ranked. However, the index already stores term frequencies and positions, which I included to support TF-IDF or BM25 ranking in the future.
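As a rough illustration of how stored frequencies could feed a ranking function later, here is a sketch of a plain TF-IDF score. Ranking is not implemented in the engine; the `postings` layout, `totalDocs`, and `tfIdf` are hypothetical, and the scoring formula (term frequency times the natural log of total documents over document frequency) is just one common TF-IDF variant.

```javascript
// Hypothetical postings: term -> Map(docId -> term frequency in that document).
const postings = new Map([
  ["zig", new Map([[0, 3], [2, 1]])],
  ["wasm", new Map([[0, 1], [1, 2], [2, 1]])],
]);
const totalDocs = 3; // assumed corpus size

// TF-IDF score of one document for a list of query terms:
// sum over terms of tf(term, doc) * ln(totalDocs / df(term)).
function tfIdf(docId, terms) {
  let score = 0;
  for (const term of terms) {
    const docs = postings.get(term);
    const tf = docs?.get(docId) ?? 0;
    if (tf === 0) continue; // term absent from this document
    score += tf * Math.log(totalDocs / docs.size);
  }
  return score;
}

// Rank the documents that contain both "zig" and "wasm" (docs 0 and 2).
const ranked = [0, 2]
  .map((docId) => ({ docId, score: tfIdf(docId, ["zig", "wasm"]) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked.map((r) => r.docId)); // [ 0, 2 ]
```

Here "wasm" occurs in every document, so it contributes nothing to the score, and doc 0 ranks first because it mentions "zig" three times.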

Markdown files

Documents are assumed to be Markdown files.

Instead of being indexed as a whole, files are parsed into an AST at the heading level. Each node in the tree consists of a title (the heading text) and some content (the text until the next heading).

For instance, given this document:

# Hello world

File over app.

## Foo

## Bar

Chocolate bar

- Searching for "hello file" matches the level-1 node (with heading "Hello world" and content "File over app.").
- Searching for "chocolate" matches one level-2 node (with heading "Bar" and content "Chocolate bar").

I find this way more useful than just matching entire documents, which could be very long.
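The heading-level split can be sketched as follows. This is a simplified stand-in for the engine's parser, not a port of it: `splitByHeadings` is a hypothetical helper, it returns a flat list of nodes (with their heading level) rather than a tree, it only recognizes ATX headings (`# Title`), and it does not special-case fenced code blocks.

```javascript
// Split Markdown into heading-level nodes: { level, title, content }.
function splitByHeadings(markdown) {
  const nodes = [];
  let current = null;
  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      // A new heading starts a new node.
      current = { level: match[1].length, title: match[2].trim(), lines: [] };
      nodes.push(current);
    } else if (current) {
      // Text before the first heading is ignored in this sketch.
      current.lines.push(line);
    }
  }
  return nodes.map(({ level, title, lines }) => ({
    level,
    title,
    content: lines.join("\n").trim(),
  }));
}

const doc = [
  "# Hello world",
  "",
  "File over app.",
  "",
  "## Foo",
  "",
  "## Bar",
  "",
  "Chocolate bar",
].join("\n");

const nodes = splitByHeadings(doc);
console.log(nodes);
// Resulting nodes:
//   { level: 1, title: 'Hello world', content: 'File over app.' }
//   { level: 2, title: 'Foo', content: '' }
//   { level: 2, title: 'Bar', content: 'Chocolate bar' }
```

Indexing each node's title and content together is what lets a query like "hello file" match the first node even though "hello" is in the heading and "file" is in the body.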

Pipeline

The system consists of a CLI indexer and a WASM search API.

1. Build and Index

Run the build script with the directory containing the Markdown files you want to index:

./build.sh <data-dir>

It will:

- Build a search index for the documents in <data-dir> (search-index.bin).
- Generate a metadata file² (docs-mapping.json).
- Compile the search logic to WASM³.

You can then search the indexed documents from JavaScript running in the browser, using the search API exposed by the WASM module. Alongside the WASM binary, you will need the index and the document mapping JSON.

2. Use in the browser

To use the engine in a browser, load the WASM binary and the generated index. You can then query the index via the exposed API.

The provided SearchEngine wrapper handles the WASM memory management and querying.

const searchEngine = new SearchEngine();

// Initialization from the build assets
await searchEngine.initialize(
  './search.wasm', // search logic and exposed API
  './search-index.bin', // serialized index data structure
  './docs-mapping.json' // serialized metadata
);

const results = searchEngine.search("zig wasm performance");

// Results (IDs) are enriched with metadata (titles and links):
// [{ docId: 10, title: "Optimizing Zig", link: "/posts/opt.html" }, ...]
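The enrichment step mentioned in the comment above could look roughly like this. The layout of docs-mapping.json is not documented here, so the `docsMapping` shape (doc ID to `{ title, link }`), the second sample entry, and the `enrichResults` helper are assumptions for illustration only.

```javascript
// Hypothetical shape for docs-mapping.json: doc ID -> { title, link }.
const docsMapping = {
  "10": { title: "Optimizing Zig", link: "/posts/opt.html" },
  "11": { title: "WASM basics", link: "/posts/wasm.html" }, // made-up entry
};

// Turn raw doc IDs (as a search call might return) into displayable results.
// IDs missing from the mapping are skipped rather than throwing.
function enrichResults(docIds, mapping) {
  return docIds
    .filter((docId) => mapping[docId] !== undefined)
    .map((docId) => ({ docId, ...mapping[docId] }));
}

const enriched = enrichResults([10, 42], docsMapping);
console.log(enriched);
// One result: docId 10 with title "Optimizing Zig" and link "/posts/opt.html".
```

Keeping titles and links out of the binary index means the index only has to store numeric IDs, and the JSON stays human-readable for debugging.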

Why?

I wanted to learn more about "old-school" search engines and their internals. And I also needed one for my personal website!

The choice of Zig was a happy coincidence. I attended the TigerBeetle world tour in Paris (Dec. 2025), where we were encouraged to present a project. I felt like I had no choice but to do this one in Zig for the occasion ;)

You can find the slides from my presentation in this repo. They provide a high-level overview of the architecture and how an inverted index is built and used to power the search engine.

Good resources

- Francesco Tomaselli's super blog post: Search Engine in Rust
- Christopher Manning's generous and great book on IR, which Francesco also references: Introduction to Information Retrieval

Footnotes

¹ Support for OR and other operators could come in the future.
² Also a document mapping JSON. This could be removed but helps in debugging the index.
³ See wasm-api.zig for the API.
