feat: add progress callback system for conversion tracking #3042

Open
mrowdy wants to merge 4 commits into docling-project:main from mrowdy:feat/progress-callback

@mrowdy mrowdy commented Feb 27, 2026

Adds an optional progress_callback parameter to DocumentConverter that emits structured events during conversion (document start/complete, phase transitions, page completions).

  • Event types are immutable Pydantic models: DocumentProgressEvent, PhaseProgressEvent, PageProgressEvent.
  • Callbacks are exception-safe and thread-safe for the standard PDF pipeline.
  • Adds a --progress CLI flag that prints per-page progress to stdout.
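
The exception-safety and thread-safety properties described above could be achieved roughly as follows. This is a minimal sketch, not the PR's implementation: `PageEvent` is a hypothetical stand-in for `PageProgressEvent` (a frozen dataclass is used instead of a Pydantic model so the example runs on the standard library alone), and `SafeEmitter` is an invented helper name.

```python
import logging
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class PageEvent:
    """Hypothetical stand-in for PageProgressEvent; frozen, so instances are immutable."""
    document_name: str
    page_no: int
    total_pages: int


class SafeEmitter:
    """Hypothetical wrapper that keeps a user callback from breaking or racing the pipeline."""

    def __init__(self, callback: Optional[Callable[[PageEvent], None]]) -> None:
        self._callback = callback
        self._lock = threading.Lock()

    def emit(self, event: PageEvent) -> None:
        # With no callback registered, return immediately.
        if self._callback is None:
            return
        # Serialize calls so worker threads never run the callback concurrently.
        with self._lock:
            try:
                self._callback(event)
            except Exception:
                # A failing callback is logged and swallowed; conversion continues.
                logger.exception("progress callback raised; continuing")


seen: List[int] = []


def flaky_callback(event: PageEvent) -> None:
    seen.append(event.page_no)
    if event.page_no == 2:
        raise RuntimeError("simulated callback failure")


emitter = SafeEmitter(flaky_callback)
for page_no in (1, 2, 3):
    emitter.emit(PageEvent("doc.pdf", page_no, 3))
```

Even though the callback raises on page 2, all three events are delivered and the loop completes, which is the behavior "exception-safe" suggests here.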

Issue resolved by this Pull Request:
Resolves #2750
Related: docling-project/docling-serve#364

Checklist:

  • Documentation has been updated.
  • Examples have been added.
  • Tests have been added.

github-actions bot commented Feb 27, 2026

DCO Check Passed

Thanks @mrowdy, all your commits are properly signed off. 🎉

mergify bot commented Feb 27, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
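
To see what this rule accepts, the pattern can be tried locally. The sketch below assumes Python `re` semantics are close enough to how the `~=` operator matches; the helper name `is_conventional` is made up for illustration.

```python
import re

# The pattern from the merge protection rule, copied verbatim.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True when the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


# The title of this pull request passes.
ok_plain = is_conventional("feat: add progress callback system for conversion tracking")
# An optional scope and a breaking-change marker are also accepted.
ok_scoped = is_conventional("fix(cli)!: rename flag")
# A title without a recognized type prefix is rejected.
bad_title = is_conventional("Add progress callback system")
```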

dosubot bot commented Feb 27, 2026

Related Documentation

1 document(s) may need updating based on files changed in this PR:

Docling

What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?
View Suggested Changes
@@ -170,6 +170,40 @@
 
 ---
 
+### DocumentConverter Initialization Parameters
+
+The `DocumentConverter` class supports several initialization parameters that control global conversion behavior:
+
+- **`allowed_formats`**: List of allowed input formats. By default, any format supported by Docling is allowed.
+- **`format_options`**: Dictionary of format-specific options (e.g., `PdfPipelineOptions`, `AsrPipelineOptions`). See format-specific sections above for details.
+- **`progress_callback`**: Optional callback function that receives structured progress events during conversion, including:
+    - **Document start/complete events** (`DocumentProgressEvent`): Emitted when a document begins or finishes processing. Includes document name and page count (if available).
+    - **Pipeline phase transitions** (`PhaseProgressEvent`): Emitted when entering or completing a phase (BUILD, ASSEMBLE, ENRICH).
+    - **Individual page completions** (`PageProgressEvent`): Emitted when each page finishes processing. Includes current page number and total page count.
+
+When no callback is provided (the default), no progress events are emitted and there is zero overhead.
+
+**Usage Example**:
+
+```python
+from docling.datamodel.progress_event import ProgressEvent
+from docling.document_converter import DocumentConverter
+
+def on_progress(event: ProgressEvent):
+    print(event.event_type, event.document_name)
+
+converter = DocumentConverter(progress_callback=on_progress)
+result = converter.convert(source="https://arxiv.org/pdf/2408.09869")
+```
+
+**CLI Support**: The CLI also supports progress tracking via the `--progress` flag:
+
+```sh
+docling --progress FILE
+```
+
+---
+
 #### Additional Notes
 - Only PDF supports image resolution adjustment (`images_scale`). The default PDF backend is now `docling_parse`.
 - DOCX header/footer export is only available via Python API.
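
A callback that handles the three event kinds listed in the suggested documentation might dispatch on the event class. The sketch below is an illustration only: the three dataclasses are hypothetical stand-ins for `DocumentProgressEvent`, `PhaseProgressEvent`, and `PageProgressEvent`, and their field names are assumptions based on the descriptions above, not the real models.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class DocumentEvent:  # stand-in for DocumentProgressEvent
    document_name: str
    page_count: Optional[int]  # page count is only present "if available"


@dataclass(frozen=True)
class PhaseEvent:  # stand-in for PhaseProgressEvent
    document_name: str
    phase: str  # e.g. "BUILD", "ASSEMBLE", "ENRICH"


@dataclass(frozen=True)
class PageEvent:  # stand-in for PageProgressEvent
    document_name: str
    page_no: int
    total_pages: int


Event = Union[DocumentEvent, PhaseEvent, PageEvent]


def describe(event: Event) -> str:
    """Turn one progress event into a single human-readable line."""
    if isinstance(event, PageEvent):
        pct = 100 * event.page_no // event.total_pages
        return f"{event.document_name}: page {event.page_no}/{event.total_pages} ({pct}%)"
    if isinstance(event, PhaseEvent):
        return f"{event.document_name}: phase {event.phase}"
    pages = "unknown" if event.page_count is None else str(event.page_count)
    return f"{event.document_name}: document event, pages={pages}"


line = describe(PageEvent("paper.pdf", 2, 8))
```

Keeping the formatting in a pure function such as `describe` makes the callback itself a one-liner (`print(describe(event))`) and easy to test without running a conversion.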

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 16 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| docling/cli/main.py | 26.66% | 11 Missing ⚠️ |
| ...erimental/pipeline/threaded_layout_vlm_pipeline.py | 0.00% | 3 Missing ⚠️ |
| docling/pipeline/base_pipeline.py | 91.66% | 2 Missing ⚠️ |
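
The reported figures can be sanity-checked with a little arithmetic. The sketch below assumes patch coverage is simply covered changed lines divided by total changed lines, and that the displayed percentages are rounded or truncated; `changed_lines` is a helper written for this check, not part of any tool.

```python
def changed_lines(missing: int, coverage_pct: float) -> int:
    """Estimate total changed lines from the missing count and the coverage percentage."""
    return round(missing / (1 - coverage_pct / 100))


# Whole patch: 16 missing lines at 83.33333% coverage.
patch_total = changed_lines(16, 83.33333)

# Per-file totals for the three files that have missing lines.
per_file = {
    "docling/cli/main.py": changed_lines(11, 26.66),
    "threaded_layout_vlm_pipeline.py": changed_lines(3, 0.00),
    "docling/pipeline/base_pipeline.py": changed_lines(2, 91.66),
}

# Changed lines in files not listed in the report, which would be fully covered.
fully_covered_elsewhere = patch_total - sum(per_file.values())
```

Under these assumptions the patch touches about 96 measurable lines, of which 80 are covered, and most of the uncovered lines sit in the CLI entry point.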

Development

Successfully merging this pull request may close these issues.

Progress report
