feat: add progress callback system for conversion tracking #3042
Open
mrowdy wants to merge 4 commits into docling-project:main
Signed-off-by: mrowdy <[email protected]>
Signed-off-by: mrowdy <[email protected]>
✅ DCO Check Passed: Thanks @mrowdy, all your commits are properly signed off. 🎉
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Related Documentation
1 document(s) may need updating based on files changed in this PR:
Docling
What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?
Suggested Changes
@@ -170,6 +170,40 @@
---
+### DocumentConverter Initialization Parameters
+
+The `DocumentConverter` class supports several initialization parameters that control global conversion behavior:
+
+- **`allowed_formats`**: List of allowed input formats. By default, any format supported by Docling is allowed.
+- **`format_options`**: Dictionary of format-specific options (e.g., `PdfPipelineOptions`, `AsrPipelineOptions`). See format-specific sections above for details.
+- **`progress_callback`**: Optional callback function that receives structured progress events during conversion, including:
+  - **Document start/complete events** (`DocumentProgressEvent`): Emitted when a document begins or finishes processing. Includes document name and page count (if available).
+  - **Pipeline phase transitions** (`PhaseProgressEvent`): Emitted when entering or completing a phase (BUILD, ASSEMBLE, ENRICH).
+  - **Individual page completions** (`PageProgressEvent`): Emitted when each page finishes processing. Includes current page number and total page count.
+
+When no callback is provided (the default), no progress events are emitted and there is zero overhead.
+
+**Usage Example**:
+
+```python
+from docling.datamodel.progress_event import ProgressEvent
+from docling.document_converter import DocumentConverter
+
+def on_progress(event: ProgressEvent):
+    print(event.event_type, event.document_name)
+
+converter = DocumentConverter(progress_callback=on_progress)
+result = converter.convert(source="https://arxiv.org/pdf/2408.09869")
+```
+
+**CLI Support**: The CLI also supports progress tracking via the `--progress` flag:
+
+```sh
+docling --progress FILE
+```
+
+---
+
#### Additional Notes
- Only PDF supports image resolution adjustment (`images_scale`). The default PDF backend is now `docling_parse`.
- DOCX header/footer export is only available via Python API.
…b.com> I, mrowdy <[email protected]>, hereby add my Signed-off-by to this commit: 16072b8 Signed-off-by: mrowdy <[email protected]>
Adds an optional `progress_callback` parameter to `DocumentConverter` that emits structured events during conversion (document start/complete, phase transitions, page completions).
- Event types: `DocumentProgressEvent`, `PhaseProgressEvent`, `PageProgressEvent`.
- `--progress` CLI flag that prints per-page progress to stdout.

Issue resolved by this Pull Request:
Resolves #2750
Related: docling-project/docling-serve#364
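
For reviewers who want a feel for how a consumer might use the callback, here is a minimal sketch of a percent reporter. It does not import docling: the two dataclasses are hypothetical stand-ins for the PR's event classes, and the field names (`page_no`, `total_pages`, `page_count`) and `event_type` values are assumptions for illustration only; the actual definitions are whatever the PR ships in `docling.datamodel.progress_event`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


# Hypothetical stand-ins for the PR's event classes.
# Field names and event_type values are assumptions, not the real API.
@dataclass
class DocumentProgressEvent:
    document_name: str
    event_type: str  # assumed values: "document_start" / "document_complete"
    page_count: Optional[int] = None


@dataclass
class PageProgressEvent:
    document_name: str
    page_no: int
    total_pages: int
    event_type: str = "page_complete"


ProgressEvent = Union[DocumentProgressEvent, PageProgressEvent]


def make_percent_reporter(sink: List[str]) -> Callable[[ProgressEvent], None]:
    """Return a callback that appends one human-readable line per event."""

    def on_progress(event: ProgressEvent) -> None:
        if isinstance(event, PageProgressEvent):
            # Guard against an unknown (zero) page total before dividing.
            if event.total_pages > 0:
                percent = 100 * event.page_no // event.total_pages
            else:
                percent = 0
            sink.append(
                f"{event.document_name}: page {event.page_no}/"
                f"{event.total_pages} ({percent}%)"
            )
        else:
            # Document-level events are reported by their type only.
            sink.append(f"{event.document_name}: {event.event_type}")

    return on_progress


# Simulate the event sequence for a two-page document.
lines: List[str] = []
callback = make_percent_reporter(lines)
callback(DocumentProgressEvent("paper.pdf", "document_start", page_count=2))
callback(PageProgressEvent("paper.pdf", page_no=1, total_pages=2))
callback(PageProgressEvent("paper.pdf", page_no=2, total_pages=2))
callback(DocumentProgressEvent("paper.pdf", "document_complete", page_count=2))
print("\n".join(lines))
```

Dispatching on the event class keeps the callback independent of any particular string values, and collecting lines into a list (rather than printing directly) makes the reporter easy to test. With the real classes, a callback of the same shape would presumably be passed as `DocumentConverter(progress_callback=...)`.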
Checklist: