CX_DB8

Unsupervised, contextual, extractive summarizer built for competitive debate evidence — and useful for any document.

CX_DB8 uses modern sentence embeddings to find the most relevant words, sentences, or paragraphs in a document relative to a query. It highlights and underlines text by semantic similarity, producing beautiful terminal output, Word documents, HTML, and SVG exports.

Features

Four granularity levels — phrase, word, sentence, or paragraph extraction
Any sentence-transformer model — swap models with a single flag
Beautiful Rich TUI — styled terminal output with panels, tables, and color-coded highlights
Multiple exports — Word (.docx), HTML, and SVG output formats
Interactive mode — process multiple cards in sequence, save all to one document
3D visualization — explore the embedding space with interactive matplotlib + UMAP plots
Fast — default model runs on CPU in seconds, no GPU required

Quick Start

Install with UV (recommended)

uv tool install git+https://github.com/Hellisotherpeople/CX_DB8.git

Or clone and install locally:

git clone https://github.com/Hellisotherpeople/CX_DB8.git
cd CX_DB8
uv sync

Install with pip

pip install git+https://github.com/Hellisotherpeople/CX_DB8.git

Run the demo

cx-db8 demo

Usage

Basic summarization

# From a file
cx-db8 run --file evidence.txt --query "nuclear war causes extinction"

# Pipe text in
cat evidence.txt | cx-db8 run --query "economic collapse"

# Interactive prompt (paste text, Ctrl-D to finish)
cx-db8 run

Granularity levels

# Sentence level (default) — best for most use cases
cx-db8 run -f card.txt -q "hegemony decline" -g sentence

# Phrase level — word-level scoring with grammatical bridging
cx-db8 run -f card.txt -q "hegemony decline" -g phrase

# Word level — raw token-level extraction with context windows
cx-db8 run -f card.txt -q "hegemony decline" -g word

# Paragraph level — coarse-grained extraction
cx-db8 run -f card.txt -q "hegemony decline" -g paragraph

Phrase mode is the sweet spot between word and sentence: it scores each word individually (with contextual n-gram windows), then bridges small gaps between important words so that the underlined/highlighted portions read as grammatical phrases instead of isolated tokens. Use --bridge-gap N to control how many filler words get absorbed (default 3).
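The bridging step can be pictured with a small sketch. This is an illustration only, not CX_DB8's actual code: the function name `bridge_gaps` and the boolean keep-mask representation are assumptions made for the example. It flips runs of dropped words to kept when the run sits between two kept words and is no longer than the gap limit.

```python
def bridge_gaps(keep, max_gap=3):
    """Return a copy of `keep` in which runs of False that lie between two
    True entries, and are at most `max_gap` long, are flipped to True.

    `keep` is a hypothetical per-word mask: True means the word scored
    above the underline threshold.
    """
    bridged = list(keep)
    last_kept = None  # index of the most recent kept word seen so far
    for i, flag in enumerate(keep):
        if not flag:
            continue
        if last_kept is not None:
            gap = i - last_kept - 1
            if 0 < gap <= max_gap:
                # Absorb the filler words between the two kept words.
                for j in range(last_kept + 1, i):
                    bridged[j] = True
        last_kept = i
    return bridged


words = "the collapse of the global economy would trigger a war".split()
keep = [False, True, False, False, False, True, False, True, False, True]

bridged = bridge_gaps(keep, max_gap=3)
phrase = " ".join(w for w, k in zip(words, bridged) if k)
print(phrase)
```

With a gap limit of 3, "of the global" is absorbed and the kept text reads as one continuous phrase; with a limit of 1, "collapse" would stay isolated because three filler words separate it from "economy". Leading and trailing dropped words are never bridged, since they do not sit between two kept words.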

Control thresholds

# Underline top 30%, highlight top 15%
cx-db8 run -f card.txt -q "warming" -u 70 -H 85

# Aggressive: underline only the top 10%, highlight only the top 5%
cx-db8 run -f card.txt -q "warming" -u 90 -H 95
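To make the percentile semantics concrete, here is a minimal sketch of how `-u 70 -H 85` could map scores to labels. It assumes a linear-interpolation percentile and assumes that highlighted spans are a subset of underlined ones; the README does not state which percentile method CX_DB8 uses, and the names `percentile` and `label_spans` are hypothetical.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def label_spans(scores, underline=70, highlight=85):
    """Label each score as 'highlight', 'underline', or 'plain'."""
    u_cut = percentile(scores, underline)
    h_cut = percentile(scores, highlight)
    labels = []
    for s in scores:
        if s >= h_cut:
            labels.append("highlight")  # assumed to imply underline as well
        elif s >= u_cut:
            labels.append("underline")
        else:
            labels.append("plain")
    return labels


# Ten hypothetical similarity scores, one per span.
scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
labels = label_spans(scores, underline=70, highlight=85)
for s, lab in zip(scores, labels):
    print(f"{s:.2f}  {lab}")
```

A higher percentile means a stricter cutoff, which is why raising `-u` from 70 to 90 keeps less text rather than more.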

Export formats

# Word document
cx-db8 run -f card.txt -q "deterrence" --docx summary.docx

# HTML
cx-db8 run -f card.txt -q "deterrence" --html summary.html

# SVG screenshot
cx-db8 run -f card.txt -q "deterrence" --svg summary.svg

# All at once
cx-db8 run -f card.txt -q "deterrence" --docx out.docx --html out.html --svg out.svg

Choose a model

# List recommended models
cx-db8 models

# Use a specific model
cx-db8 run -f card.txt -q "query" --model all-mpnet-base-v2

Interactive mode

Process multiple cards in a session and save all summaries to a Word document:

cx-db8 run --interactive

3D Visualization

# Install visualization dependencies
uv pip install "cx-db8[viz]"

# Run with visualization
cx-db8 run -f card.txt -q "query" --viz

How It Works

CX_DB8 is an unsupervised extractive summarizer that works by computing semantic similarity between a query and each unit of text:

1. Encode the query into a dense vector using a sentence-transformer model
2. Segment the text into spans (words with context windows, sentences, or paragraphs)
3. Encode each span into the same embedding space
4. Score each span by cosine similarity to the query vector
5. Threshold using percentile-based cutoffs to determine what gets highlighted, underlined, or removed
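The encode, segment, and score steps above can be sketched in a few lines. To keep the example runnable without downloading a model, a toy bag-of-words vector stands in for the sentence-transformer embedding; that substitution, the regex sentence splitter, and the names `toy_embed`, `cosine`, and `score_spans` are all assumptions for illustration, not CX_DB8's API. In the real tool the vectors come from the configured model and capture meaning rather than word overlap.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a sentence-transformer: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_spans(query, spans, embed=toy_embed):
    """Embed the query once, then score every span against it."""
    q = embed(query)
    return [cosine(q, embed(s)) for s in spans]


text = (
    "Nuclear war causes extinction. "
    "The weather was mild. "
    "War between nuclear powers escalates quickly."
)
# Naive sentence segmentation: split after ., !, or ? followed by whitespace.
spans = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
scores = score_spans("nuclear war causes extinction", spans)

for score, span in sorted(zip(scores, spans), reverse=True):
    print(f"{score:.3f}  {span}")
```

Passing a different `embed` callable is the only change needed to move from the toy vectors to real model embeddings, which is the design point the sketch is meant to show: segmentation, scoring, and thresholding are independent of the embedding model.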

For word and phrase-level summarization, each word is embedded along with its surrounding context window (default ±10 words), preserving contextual meaning rather than treating each word in isolation. Phrase mode additionally bridges small gaps (default ≤3 words) between kept words, promoting function words like articles and prepositions so the underlined text reads grammatically.
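A minimal sketch of the context-window idea, assuming the window is measured in whole words on each side and is clipped at the document edges (the helper name `context_windows` is hypothetical): each word is represented by the text around it, and that text is what gets embedded and scored.

```python
def context_windows(words, window=10):
    """For each word, return the string made of that word plus up to
    `window` words on either side, clipped at the start and end."""
    spans = []
    for i in range(len(words)):
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        spans.append(" ".join(words[start:end]))
    return spans


words = "deterrence fails when leaders miscalculate".split()
# A window of 1 keeps the example short; the documented default is 10.
windows = context_windows(words, window=1)
for word, span in zip(words, windows):
    print(f"{word!r:16} -> {span!r}")
```

Each span would then be scored against the query and the score assigned back to the centre word, so a generic word such as "when" inherits relevance from its neighbours.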

Sentence-Level Summary

Phrase-Level Summary

Configuration

All settings are available as CLI flags. Run cx-db8 run --help for full documentation:

| Flag | Default | Description |
|---|---|---|
| `-f, --file` | stdin | Input text file |
| `-q, --query` | interactive | Card tag / query |
| `-g, --granularity` | sentence | phrase, word, sentence, or paragraph |
| `-u, --underline` | 70 | Underline percentile (1-99) |
| `-H, --highlight` | 85 | Highlight percentile (1-99) |
| `-m, --model` | all-MiniLM-L6-v2 | Sentence-transformer model |
| `-w, --word-window` | 10 | Context window for word/phrase level |
| `-b, --bridge-gap` | 3 | Max gap to bridge in phrase mode |
| `--docx` | (none) | Export as Word document |
| `--html` | (none) | Export as HTML |
| `--svg` | (none) | Export as SVG screenshot |
| `--viz` | false | Show 3D embedding plot |
| `-i, --interactive` | false | Interactive loop mode |

Development

git clone https://github.com/Hellisotherpeople/CX_DB8.git
cd CX_DB8
uv sync --extra dev
uv run pytest

Record demo GIFs

Requires VHS:

vhs demo.tape
vhs demo_help.tape

Background

In American competitive cross-examination debate (Policy Debate), debaters summarize evidence by underlining and highlighting the most important parts of source documents. This manual process is what CX_DB8 automates.

The original version (2018-2019) used TensorFlow Hub's Universal Sentence Encoder and Flair embeddings. This v2.0 rewrite modernizes the stack with sentence-transformers, Rich TUI, and UV packaging while preserving the core algorithm.

A webapp version implementing similar functionality is available at Hugging Face Spaces.

License

MIT
