06 Apr 02:22

lyonzin

e60e18d

v3.3.2 — Type Validation & Bounds Checking

Fixes

Full hardening of the YAML config loader after rigorous audit:

Type validation on all config values — wrong types (string where int, string where list, int where bool) now warn and fall back to defaults
Bounds validation — chunk_size (min 100), chunk_overlap (non-negative, < chunk_size), default_results, max_results, embedding_dim, reranker_top_k_multiplier
keyword_routes string values detected and removed — previously redteam: "pentest" caused character-level matching ("p", "e", "n", etc.)
reranker_enabled string coercion ("yes" → True with warning)
supported_formats: [] falls back to defaults with warning
Version synced across __init__.py, config.py, server.py, pyproject.toml
Error handling in knowledge-rag init (PermissionError, OSError)
Broken README anchor fixed
Duplicate keyword removed
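
The validation rules listed above could be sketched roughly as follows. This is an illustration only, not the project's loader: `DEFAULTS`, `validate_int`, `clean_keyword_routes`, and `validate` are hypothetical names, and the default values are made up for the example.

```python
import logging

log = logging.getLogger("config_sketch")

# Hypothetical defaults; the real values live in the project's config module.
DEFAULTS = {"chunk_size": 1000, "chunk_overlap": 200}


def validate_int(cfg, key, minimum):
    """Return cfg[key] if it is an int >= minimum, otherwise the default."""
    value = cfg.get(key, DEFAULTS[key])
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        log.warning("%s: expected int, got %r; using default", key, value)
        return DEFAULTS[key]
    if value < minimum:
        log.warning("%s: %d is below minimum %d; using default", key, value, minimum)
        return DEFAULTS[key]
    return value


def clean_keyword_routes(routes):
    """Keep only entries whose value is a list of keywords."""
    cleaned = {}
    for category, keywords in routes.items():
        if isinstance(keywords, list):
            cleaned[category] = keywords
        else:
            # Iterating a bare string yields single characters, which is
            # how "pentest" became "p", "e", "n", ... during matching.
            log.warning("keyword_routes[%s]: expected a list; entry removed", category)
    return cleaned


def validate(cfg):
    chunk_size = validate_int(cfg, "chunk_size", minimum=100)
    chunk_overlap = validate_int(cfg, "chunk_overlap", minimum=0)
    if chunk_overlap >= chunk_size:
        log.warning("chunk_overlap must be smaller than chunk_size; using default")
        chunk_overlap = min(DEFAULTS["chunk_overlap"], chunk_size - 1)
    return {
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "keyword_routes": clean_keyword_routes(cfg.get("keyword_routes", {})),
    }
```

Checking `bool` before `int` matters because `isinstance(True, int)` is true in Python, so a plain integer check would accept `chunk_size: true`.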

Upgrade

pip install --upgrade knowledge-rag

06 Apr 02:03

lyonzin

v3.3.1

cc8624b

v3.3.1 — Hotfix: YAML null safety + presets in pip

Fixes

YAML null values no longer crash the server — Writing category_mappings: (without a value) in config.yaml now safely falls back to defaults instead of crashing with TypeError: argument of type 'NoneType' is not iterable
Presets now included in pip install — knowledge-rag init exports config template, all 4 presets, and creates a documents/ directory in the current folder
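
For context on the null-value crash: a YAML key written with no value is loaded as `None`, and a membership test against `None` raises `TypeError`. The sketch below reproduces that with a plain dict standing in for the parsed file; `get_mapping` and `DEFAULT_CATEGORY_MAPPINGS` are hypothetical names, not the project's API.

```python
# Hypothetical default used only for this illustration.
DEFAULT_CATEGORY_MAPPINGS = {"aar": "aar"}


def get_mapping(parsed, key, default):
    """Return parsed[key], or a copy of the default when it is missing or null."""
    value = parsed.get(key)
    if value is None:
        return dict(default)
    return value


# What a YAML loader produces for a bare `category_mappings:` line.
parsed = {"category_mappings": None}

error = None
try:
    "aar" in parsed["category_mappings"]  # the pre-3.3.1 failure mode
except TypeError as exc:
    error = str(exc)

mappings = get_mapping(parsed, "category_mappings", DEFAULT_CATEGORY_MAPPINGS)
```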

New

knowledge-rag init CLI command — One command to set up a fresh knowledge base:

pip install knowledge-rag
knowledge-rag init
cp presets/developer.yaml config.yaml
# Add your docs to documents/

Upgrade

pip install --upgrade knowledge-rag

06 Apr 01:33

lyonzin

v3.3.0

6d7dd19

v3.3.0 — YAML Configuration System

What's New

YAML Configuration System

All settings are now customizable via config.yaml — no more editing Python code. Categories, keyword routing, query expansions, models, chunking, and paths are all configurable through a single YAML file.

Domain Presets

Four ready-to-use presets ship with the project:

Preset	Categories	Keywords	Expansions	Best For
cybersecurity	8	200+	69	Red/Blue Team, CTFs, threat hunting
developer	9	150+	50+	Full-stack, APIs, DevOps, cloud
research	9	100+	40+	Academic papers, thesis, datasets
general	0	0	0	Blank slate, pure semantic search

cp presets/developer.yaml config.yaml   # Ready to go

Generic Use Support

With empty mappings ({}), the system operates as a domain-agnostic semantic search engine. No security-specific logic unless you want it.

Backwards Compatible

No config.yaml? The system uses built-in defaults — identical behavior to v3.2.x. Zero migration required.

Changes

NEW: YAML configuration system — fully customizable via config.yaml
NEW: Domain presets — cybersecurity, developer, research, general
NEW: config.example.yaml — documented template with explanations for every field
NEW: Categories, keyword routing, and query expansions now user-configurable
NEW: Empty config = pure semantic search with zero domain logic
NEW: Warning log for empty files during indexing (previously silent skip)
IMPROVED: README rewritten — full configuration reference, preset docs, updated structure
IMPROVED: pyyaml added as dependency

Upgrade

git pull origin master
pip install pyyaml    # New dependency
# Optionally: cp presets/cybersecurity.yaml config.yaml

No breaking changes. Existing installations work without any config file.

03 Apr 22:30

lyonzin

v3.2.4

0ecbb43

v3.2.4 — Symlink Support

What's New

Symlink support — documents/ directory now follows symbolic links recursively (#13)
Circular symlink protection — realpath deduplication prevents infinite recursion loops
Stricter _has_documents() detection — validates against supported formats only (ignores .gitkeep, temp files, etc.)

Changes

File	Change
`mcp_server/config.py`	`_has_documents()` → `os.walk(followlinks=True)` + format filter
`mcp_server/ingestion.py`	`parse_directory()` → `os.walk` + `seen_dirs` loop protection
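
A minimal sketch of the pattern named in the table, assuming a small hypothetical `SUPPORTED` set and a hypothetical `walk_documents` helper. The release notes confirm only `os.walk(followlinks=True)`, a `seen_dirs` guard, realpath deduplication, and a format filter; everything else here is illustrative.

```python
import os

# Hypothetical subset of supported extensions, for illustration only.
SUPPORTED = {".md", ".txt", ".pdf"}


def walk_documents(root):
    """Collect supported files under root, following symlinks without looping."""
    seen_dirs = set()
    found = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        real = os.path.realpath(dirpath)
        if real in seen_dirs:
            # Already visited through another path: prune this subtree.
            dirnames[:] = []
            continue
        seen_dirs.add(real)
        for name in filenames:
            if os.path.splitext(name)[1].lower() in SUPPORTED:
                found.append(os.path.join(dirpath, name))
    return found
```

Clearing `dirnames` in place is what stops `os.walk` from descending further, so a symlink that points back at an ancestor is visited at most once.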

Full Changelog: v3.2.3...v3.2.4

22 Mar 23:33

lyonzin

v3.2.3

ac9456c

v3.2.3 — BASE_DIR smart detection for pip install

Fix

BASE_DIR now checks for actual files inside documents/ (not just directory existence)
Prevents false positive when site-packages/documents/ exists as empty dir
Supports KNOWLEDGE_RAG_DIR env var for explicit override
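
The detection order described above might look like the sketch below. Only the `KNOWLEDGE_RAG_DIR` variable and the "real files inside documents/" check come from the release notes; `has_documents`, `resolve_base_dir`, and the order of the fallbacks are assumptions.

```python
import os
from pathlib import Path


def has_documents(base):
    """True only if base/documents exists and contains at least one file."""
    docs = base / "documents"
    return docs.is_dir() and any(p.is_file() for p in docs.rglob("*"))


def resolve_base_dir(package_dir, cwd, env=os.environ):
    """Pick the knowledge-base root: env override, then package dir, then cwd."""
    override = env.get("KNOWLEDGE_RAG_DIR")
    if override:
        return Path(override)
    if has_documents(package_dir):
        return package_dir
    # An empty site-packages/documents/ no longer counts as a match.
    return cwd
```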

Upgrade

pip install --upgrade knowledge-rag

22 Mar 23:10

lyonzin

v3.2.2

0774d70

v3.2.2 — pip install plug-and-play fix

Fixes

pip install knowledge-rag now truly plug-and-play

BASE_DIR was resolving to site-packages/ when installed from PyPI, causing documents/ to not be found. Now falls back to current working directory.

Supports KNOWLEDGE_RAG_DIR env var for explicit override.

category="aar" accepted by search_knowledge

The validator was rejecting aar as a category because it only checked keyword_routes keys. Now uses category_mappings values too.
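
As a rough illustration of the validator change, the accepted set can be built from both sources. The function name and the sample config fragments below are hypothetical.

```python
def valid_categories(keyword_routes, category_mappings):
    """Categories a search may filter on: route keys plus mapping targets."""
    return set(keyword_routes) | set(category_mappings.values())


# Hypothetical config fragments for illustration.
keyword_routes = {"redteam": ["pentest", "exploit"]}
category_mappings = {"aar": "aar", "offensive": "redteam"}

allowed = valid_categories(keyword_routes, category_mappings)
```

With only the `keyword_routes` keys, `aar` would be missing from the set, which matches the rejection described above.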

Upgrade

pip install --upgrade knowledge-rag

22 Mar 12:00

lyonzin

v3.2.1

c81938d

v3.2.1 — Auto-Recovery from Corrupted ChromaDB

Fix: Auto-Recovery on Startup

If ChromaDB gets corrupted (crash during indexing, power loss, etc.), the server now automatically detects and recovers instead of crashing with a segfault loop.

What was happening

A crash during indexing left the SQLite DB in a corrupted state
Next startup: segfault → crash → restart → segfault (infinite loop)
Required manual deletion of data/chroma_db/ to fix

What happens now

Server detects corruption on startup
Automatically deletes corrupted data
Recreates fresh collection
Logs [RECOVERY] messages so you know it happened
Zero manual intervention needed

Also handles

Embedding function conflicts (e.g., switching models)
Orphaned UUID directories from partial rebuilds
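
The recovery flow can be pictured as "try to open, wipe on failure, open again". The sketch below is generic: `open_collection` is a hypothetical callable standing in for whatever opens the ChromaDB collection, and the release notes do not say how corruption is actually detected (a real segfault cannot be caught with `try`/`except`).

```python
import logging
import shutil
from pathlib import Path

log = logging.getLogger("recovery_sketch")


def open_with_recovery(db_path, open_collection):
    """Open the store; on failure, delete it and recreate a fresh one."""
    try:
        return open_collection(db_path)
    except Exception as exc:
        log.warning("[RECOVERY] store at %s failed to open (%s); rebuilding", db_path, exc)
        shutil.rmtree(db_path, ignore_errors=True)
        Path(db_path).mkdir(parents=True, exist_ok=True)
        # A second failure propagates instead of looping forever.
        return open_collection(db_path)
```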

Upgrade

pip install --upgrade knowledge-rag

20 Mar 13:20

lyonzin

v3.2.0

47ceec8

v3.2.0 — Parallel Search + Adjacent Chunk Retrieval

New Features

Parallel BM25 + Semantic Search

Both search engines now run simultaneously in threads. ~50% latency reduction in hybrid mode.
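
A minimal sketch of running two searches concurrently with the standard library. `bm25_search` and `semantic_search` are hypothetical stand-ins that only sleep, and the result shapes are invented; the release notes do not show the real implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def bm25_search(query):
    """Hypothetical stand-in for the keyword engine."""
    time.sleep(0.1)
    return [("doc-a", 2.1)]


def semantic_search(query):
    """Hypothetical stand-in for the embedding engine."""
    time.sleep(0.1)
    return [("doc-b", 0.83)]


def hybrid_search(query):
    """Run both engines in parallel threads and return both result lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        bm25_future = pool.submit(bm25_search, query)
        semantic_future = pool.submit(semantic_search, query)
        return bm25_future.result(), semantic_future.result()
```

Total latency approaches that of the slower engine rather than the sum of both, so a reduction near 50% is plausible when the two take similar time and spend it in I/O or native code that releases the GIL.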

Adjacent Chunk Retrieval

Matched chunks are automatically expanded with surrounding context. When a chunk matches your query, the system fetches the chunks immediately before and after it (from the same document) and merges them into a single expanded result.

Results include context_expanded: true when adjacent chunks were merged
Content grows from ~650 chars to ~1500 chars per result (more context for the LLM)
Zero impact on retrieval precision — the matching still happens on the original chunk
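
The expansion step could be sketched like this, assuming chunks are stored as an ordered list of dicts with hypothetical `doc_id` and `text` keys. Only the `context_expanded` field name comes from the release notes.

```python
def expand_with_neighbors(chunks, hit_index):
    """Merge the matched chunk with its neighbors from the same document."""
    hit = chunks[hit_index]
    parts = []
    for i in (hit_index - 1, hit_index, hit_index + 1):
        # Skip out-of-range positions and chunks from other documents.
        if 0 <= i < len(chunks) and chunks[i]["doc_id"] == hit["doc_id"]:
            parts.append(chunks[i]["text"])
    return {
        "doc_id": hit["doc_id"],
        "content": "\n".join(parts),
        "context_expanded": len(parts) > 1,
    }
```

Matching still runs against the original chunk; the merge happens only after a hit is selected, which is why precision is unaffected.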

Inspired by PrivateGPT's SentenceWindow pattern and Kotaemon's parallel retrieval.

Upgrade

pip install --upgrade knowledge-rag

Full Changelog

v3.1.1...v3.2.0

20 Mar 12:43

lyonzin

v3.1.1

49ea25f

v3.1.1 — Chunker Bugfix, AAR Category, CVE Aliases

Fixes

Markdown Chunker (critical quality fix)

Code-block protection: # comments inside code fences no longer split as markdown headers
Split by ##/### only: # (H1) was catching shell comments and code — now ignored
Min chunk size 100 chars: Header-only chunks (32-53 chars of junk) now merge with next section
Result: c2-operations doc goes from 32 chunks (12 junk) → 17 chunks (0 junk)
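
The three chunker rules above can be sketched together as follows. This is a simplified illustration, not the project's chunker: `chunk_markdown`, `HEADER`, and `FENCE` are hypothetical names, and the merge strategy for a trailing short section is an assumption.

```python
import re

FENCE = "`" * 3                   # a markdown code-fence marker
HEADER = re.compile(r"^#{2,3} ")  # split on ## and ### only, never on # (H1)
MIN_CHUNK_CHARS = 100


def chunk_markdown(text):
    """Split on H2/H3 headers outside code fences, merging tiny sections forward."""
    sections, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a fence are never treated as headers.
        if not in_fence and HEADER.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    merged, pending = [], ""
    for section in sections:
        pending = f"{pending}\n{section}" if pending else section
        if len(pending) >= MIN_CHUNK_CHARS:
            merged.append(pending)
            pending = ""
    if pending:
        merged.append(pending)  # trailing short section kept as-is
    return merged
```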

New

AAR category: documents/aar/ maps to category "aar" (was "general")
14 CVE aliases, including: PrintNightmare↔CVE-2021-34527, EternalBlue↔MS17-010, PwnKit↔CVE-2021-4034, Log4Shell↔CVE-2021-44228, ZeroLogon↔CVE-2020-1472, PetitPotam, CertiFried, noPac, ProxyLogon, ProxyShell
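
One way to picture a two-way alias is a lookup table that maps each name to its identifier and back, applied to the query before search. The sketch uses only the five pairs spelled out above; `ALIAS_PAIRS`, `ALIASES`, and `expand_query` are hypothetical names, and whitespace tokenization is an assumption.

```python
ALIAS_PAIRS = [
    ("printnightmare", "cve-2021-34527"),
    ("eternalblue", "ms17-010"),
    ("pwnkit", "cve-2021-4034"),
    ("log4shell", "cve-2021-44228"),
    ("zerologon", "cve-2020-1472"),
]

# Build the lookup in both directions so either form finds the other.
ALIASES = {}
for name, identifier in ALIAS_PAIRS:
    ALIASES[name] = identifier
    ALIASES[identifier] = name


def expand_query(query):
    """Append the alias of every known term to the lowercased query."""
    terms = query.lower().split()
    extra = [ALIASES[t] for t in terms if t in ALIASES]
    return " ".join(terms + extra)
```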

Upgrade

pip install --upgrade knowledge-rag

After upgrade, run reindex_documents(full_rebuild=true) to reprocess all documents with the fixed chunker.

Full Changelog

v3.1.0...v3.1.1

19 Mar 20:33

lyonzin

v3.1.0

2d14b64

v3.1.0 — DOCX/XLSX/PPTX/CSV, File Watcher, MMR

Knowledge RAG v3.1.0

New Features

Office Document Support (4 new formats)

DOCX: Paragraphs, tables, heading structure preserved as markdown
XLSX: All sheets extracted as searchable text tables
PPTX: Slide-by-slide text extraction
CSV: Native parsing, zero extra deps
Total: 9 formats (was 5)

File Watcher

Documents directory monitored in real-time via watchdog. Auto-reindexes with 5-second debounce when you add, modify, or delete files.
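
Debouncing means a burst of file events triggers a single reindex once things go quiet. The sketch below shows the idea with `threading.Timer` from the standard library; it does not use watchdog, and the `Debouncer` class is a hypothetical illustration rather than the project's code.

```python
import threading


class Debouncer:
    """Run `action` once, `delay` seconds after the most recent trigger."""

    def __init__(self, delay, action):
        self.delay = delay
        self.action = action
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            # A new event cancels the pending run and restarts the countdown.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.action)
            self._timer.daemon = True
            self._timer.start()
```

In a watcher, every add, modify, or delete event would call `trigger()`, with `delay=5` matching the 5-second debounce mentioned above.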

MMR Result Diversification

Maximal Marginal Relevance (MMR) is applied after reranking to reduce redundant results: if the top 5 would all come from the same document, MMR promotes results from other sources. Lambda=0.7 (relevance-heavy).
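
A compact sketch of greedy MMR selection with lambda 0.7. The function signature, the `(item, relevance)` candidate shape, and the caller-supplied `similarity` function are assumptions for illustration; the project's implementation is not shown in these notes.

```python
def mmr(candidates, similarity, k, lam=0.7):
    """Greedy MMR: pick k items, trading relevance against redundancy.

    candidates: list of (item, relevance) pairs.
    similarity: function (item, item) -> float in [0, 1].
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:

        def score(candidate):
            item, relevance = candidate
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (similarity(item, chosen) for chosen, _ in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [item for item, _ in selected]
```

With lambda at 0.7 the relevance term dominates, so a near-duplicate is displaced only when a reasonably relevant alternative exists.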

pip install

pip install knowledge-rag

No clone needed. Models download automatically.

Full Changelog

v3.0.0...v3.1.0

Releases: lyonzin/knowledge-rag