@datalab-to

Datalab

Developing state of the art document intelligence models.

Pinned

  1. marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python, 29.5k stars, 2k forks

  2. surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python, 18.8k stars, 1.3k forks

  3. pdftext Public

    Extract structured text from pdfs quickly

    Python, 616 stars, 59 forks

  4. chandra Public

    OCR model that handles complex tables, forms, handwriting with full layout.

    Python, 295 stars, 20 forks

Repositories

Showing 9 of 9 repositories
  • chandra Public

    OCR model that handles complex tables, forms, handwriting with full layout.

    Python, 295 stars, Apache-2.0, 20 forks, 3 issues, 0 pull requests, updated Oct 30, 2025
  • sdk Public
    Python, 5 stars, MIT, 3 forks, 4 issues, 1 pull request, updated Oct 30, 2025
  • marker Public

    Convert PDF to markdown + JSON quickly with high accuracy

    Python, 29,504 stars, GPL-3.0, 1,973 forks, 292 issues, 42 pull requests, updated Oct 21, 2025
  • surya Public

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python, 18,785 stars, GPL-3.0, 1,279 forks, 126 issues, 12 pull requests, updated Oct 21, 2025
  • oss_container Public
    Python, 0 stars, 0 forks, 0 issues, 0 pull requests, updated Oct 2, 2025
  • datalab-on-prem Public

    Scripts to run Datalab's self-service on-prem container

    Shell, 1 star, 0 forks, 0 issues, 0 pull requests, updated Aug 29, 2025
  • inference-mirror
    Python, 3 stars, 1 fork, 0 issues, 1 pull request, updated Aug 13, 2025
  • docext Public

    An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

    Python, 6 stars, Apache-2.0, 2 forks, 0 issues, 0 pull requests, updated Jun 18, 2025
  • pdftext Public

    Extract structured text from pdfs quickly

    Python, 616 stars, Apache-2.0, 59 forks, 11 issues, 5 pull requests, updated Jun 11, 2025
