v0.4.1 — Image Alt Text, Python API, Benchmarks

@AIMLPM released this 13 Apr 09:22

What's new

Image alt text preservation

Images are no longer silently stripped. Alt text and figcaptions are extracted as [Image: description] inline references, preserving context from diagrams, architecture charts, and annotated screenshots. Figcaptions take priority over alt text when both are present.
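The priority rule can be sketched as a small helper. This is an illustration only, not markcrawl's actual implementation: the function name `image_reference` is hypothetical, and returning `None` when neither a figcaption nor alt text exists is an assumption, since the release notes do not say how such images are handled.

```python
from typing import Optional


def image_reference(alt: Optional[str], figcaption: Optional[str]) -> Optional[str]:
    """Hypothetical helper: build an inline [Image: description] reference."""
    # Figcaption takes priority over alt text when both are present.
    description = (figcaption or "").strip() or (alt or "").strip()
    if not description:
        # Assumption: an image with no usable text yields no reference.
        return None
    return f"[Image: {description}]"


print(image_reference(alt="diagram", figcaption="Request flow through the cache"))
print(image_reference(alt="Login screen", figcaption=None))
```

Whitespace-only values are treated as missing here, so an empty figcaption falls back to the alt text.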

Python API: result.pages

CrawlResult now includes a pages list of PageData objects for direct programmatic access:

import markcrawl

# Crawl the site, writing output to ./output
result = markcrawl.crawl("https://example.com", out_dir="./output")
for page in result.pages:
    print(page.url, page.title)
    # Split each page's content into chunks
    chunks = markcrawl.chunk_markdown(page.content)

No more parsing JSONL files to use crawl results in code.
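As one possible use of `result.pages`, the sketch below collects the `[Image: description]` references from each page's content. It runs against stand-in objects that carry only the `url`, `title`, and `content` attributes shown above, so it needs no crawl; the helper name `image_descriptions` and the demo data are hypothetical, and the regex assumes descriptions contain no `]` character.

```python
import re
from types import SimpleNamespace

# Matches the inline references described in these notes: [Image: description]
IMAGE_REF = re.compile(r"\[Image: ([^\]]+)\]")


def image_descriptions(pages):
    """Hypothetical helper: map each page URL to its image descriptions."""
    found = {}
    for page in pages:
        matches = IMAGE_REF.findall(page.content)
        if matches:
            found[page.url] = matches
    return found


# Stand-ins for PageData objects (assumption: only these attributes are used).
demo_pages = [
    SimpleNamespace(
        url="https://example.com/a",
        title="A",
        content="Intro [Image: System architecture] text [Image: Login screen]",
    ),
    SimpleNamespace(url="https://example.com/b", title="B", content="No images here."),
]

print(image_descriptions(demo_pages))
```

With a real crawl, `result.pages` would be passed in place of `demo_pages`.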

Benchmark documentation

New docs/BENCHMARKS.md with self-contained speed, quality, and cost comparisons across 7 tools. Full methodology at llm-crawler-benchmarks.