What's new
Image alt text preservation
Images are no longer silently stripped. Alt text and figcaptions are extracted as [Image: description] inline references, preserving context from diagrams, architecture charts, and annotated screenshots. Figcaptions take priority over alt text when both are present.
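The priority rule above can be illustrated with a small standalone sketch. This is not markcrawl's actual implementation: the names image_reference, FigureExtractor, and extract_image_references are hypothetical, and the sketch uses only Python's standard html.parser module, assuming each image is either a bare img element or wrapped in a figure.

```python
from html.parser import HTMLParser


def image_reference(alt, figcaption):
    """Return an inline '[Image: ...]' reference, or '' if no description exists."""
    # Figcaption wins over alt text when both are present.
    description = (figcaption or "").strip() or (alt or "").strip()
    return f"[Image: {description}]" if description else ""


class FigureExtractor(HTMLParser):
    """Hypothetical helper: collects one reference per <figure> or bare <img>."""

    def __init__(self):
        super().__init__()
        self.references = []
        self._in_figure = False
        self._in_caption = False
        self._alt = None
        self._caption_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            # Start a fresh figure: forget any previous alt text and caption.
            self._in_figure = True
            self._alt = None
            self._caption_parts = []
        elif tag == "figcaption" and self._in_figure:
            self._in_caption = True
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if self._in_figure:
                # Defer the decision until </figure>, when the caption is known.
                self._alt = alt
            else:
                ref = image_reference(alt, None)
                if ref:
                    self.references.append(ref)

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._in_figure:
            ref = image_reference(self._alt, "".join(self._caption_parts))
            if ref:
                self.references.append(ref)
            self._in_figure = False

    def handle_data(self, data):
        # Only text inside a <figcaption> contributes to the caption.
        if self._in_caption:
            self._caption_parts.append(data)


def extract_image_references(html):
    """Return the '[Image: ...]' references found in an HTML string, in order."""
    parser = FigureExtractor()
    parser.feed(html)
    parser.close()
    return parser.references
```

In this sketch, a figure with both alt="diagram" and the figcaption "Request flow" yields [Image: Request flow], a bare image falls back to its alt text, and an image with neither is skipped.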
Python API: result.pages
CrawlResult now includes a pages list of PageData objects for direct programmatic access:
    import markcrawl

    result = markcrawl.crawl("https://example.com", out_dir="./output")
    for page in result.pages:
        print(page.url, page.title)
        chunks = markcrawl.chunk_markdown(page.content)

No more parsing JSONL files to use crawl results in code.
Benchmark documentation
New docs/BENCHMARKS.md with self-contained speed, quality, and cost comparisons across 7 tools. Full methodology at llm-crawler-benchmarks.