fix(asciidoc): handle commas in image alt text #2983

Merged
cau-git merged 4 commits into docling-project:main from n0rdp0l:fix/asciidoc-image-alt-text-commas
Feb 13, 2026

@n0rdp0l n0rdp0l commented Feb 12, 2026

Issue resolved by this Pull Request:
Resolves #2982

Description

Modified AsciiDocBackend._parse_picture() to handle commas in image alt text gracefully. Documentation tools such as Word/Doc2Help
commonly generate alt text of this kind.

Previously, the parser would crash with ValueError: not enough values to unpack (expected 2, got 1) when alt text contained commas,
because it assumed all comma-separated values after the first were key=value attribute pairs.
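
For illustration, the failure mode can be reproduced outside docling with a few lines of Python. This is a minimal sketch, not the actual backend code: `naive_parse` is a hypothetical stand-in that assumes the old logic split the bracket contents on commas and unpacked every later piece as `key=value`.

```python
# Hypothetical stand-in for the old attribute loop (not docling's real code).
def naive_parse(attr_text: str) -> dict[str, str]:
    parts = attr_text.split(",")
    result = {"alt": parts[0].strip()}
    for part in parts[1:]:
        # Assumes every later part is key=value; a part without "=" yields a
        # one-element list, so the two-name unpacking raises ValueError.
        key, value = part.split("=")
        result[key.strip()] = value.strip()
    return result


error_message = ""
try:
    naive_parse(
        "A screenshot of a computer, Description automatically generated, width=200"
    )
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The second comma-separated piece contains no `=`, which is exactly the case described above.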

Changes

  • Modified _parse_picture() to check for = before attempting to unpack attributes
  • Commas in alt text are now preserved and correctly reconstructed
  • Used split('=', 1) to handle edge cases where attribute values contain = characters
  • Added test case with realistic auto-generated alt text containing commas
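
The approach in the list above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the merged implementation: `parse_image_attributes` is a hypothetical helper, and it assumes that any comma-separated piece without `=` belongs to the alt text and that the pieces are rejoined with `", "`.

```python
# Hypothetical helper illustrating the described fix (not docling's real code).
def parse_image_attributes(attr_text: str) -> tuple[str, dict[str, str]]:
    alt_parts: list[str] = []
    attributes: dict[str, str] = {}
    for part in attr_text.split(","):
        if "=" in part:
            # Split on the first "=" only, so values may themselves contain "=".
            key, value = part.split("=", 1)
            attributes[key.strip()] = value.strip()
        else:
            # No "=": treat the piece as part of the alt text.
            alt_parts.append(part.strip())
    # Reconstruct the alt text, restoring the commas that split() removed.
    return ", ".join(alt_parts), attributes


alt, attrs = parse_image_attributes(
    "A screenshot of a computer, Description automatically generated, width=200"
)
print(alt)
print(attrs)
```

One limitation of this sketch: an alt text fragment that itself contains `=` would be read as an attribute, and whitespace around commas is normalized rather than preserved exactly.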

Testing

  • Added test case in test_parse_picture() covering commas in alt text
  • All existing AsciiDoc tests pass
  • Pre-commit checks (ruff, mypy) pass

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

  - Modified _parse_picture() to gracefully handle alt text containing commas
  - Commas in alt text are now preserved instead of causing ValueError
  - Added test case with realistic auto-generated alt text
  - split('=', 1) prevents issues when values contain '=' characters

github-actions bot commented Feb 12, 2026

DCO Check Passed

Thanks @n0rdp0l, all your commits are properly signed off. 🎉

dosubot bot commented Feb 12, 2026

Related Documentation

Checked 14 published document(s) in 1 knowledge base(s). No updates required.

mergify bot commented Feb 12, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
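
As a rough check of what this rule accepts, the same pattern can be tried with Python's `re` module. This sketch assumes the rule is matched case-sensitively against the start of the title; `is_conventional` is a hypothetical helper name, and the two titles are the ones this pull request carried before and after it was renamed.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None


old_ok = is_conventional("Fix: Handle commas in AsciiDoc image alt text")
new_ok = is_conventional("fix(asciidoc): handle commas in image alt text")
print(old_ok, new_ok)
```

Under that assumption, the capitalized `Fix:` prefix does not match, while the lowercase, scoped `fix(asciidoc):` form does.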

@n0rdp0l n0rdp0l changed the title from "Fix: Handle commas in AsciiDoc image alt text" to "fix(asciidoc): handle commas in image alt text" Feb 12, 2026
I, n0rdp0l <[email protected]>, hereby add my Signed-off-by to this commit: ee75249

Signed-off-by: n0rdp0l <[email protected]>

codecov bot commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

cau-git commented Feb 13, 2026

@n0rdp0l thanks for this contribution. Could you please rebase onto main? It brings in a fix for the failing test-pip-install-no-lock CI check.

@dolfim-ibm dolfim-ibm left a comment

lgtm

@cau-git cau-git merged commit 86b6912 into docling-project:main Feb 13, 2026
25 checks passed
cau-git added a commit that referenced this pull request Feb 17, 2026
…n models (#2999)

* fix: add failed pages to DoclingDocument for page break consistency (#2939)

* fix: add failed pages to DoclingDocument for page break consistency

When some PDF pages fail to parse, they were not added to
DoclingDocument.pages, causing page break markers to be incorrect
during export. This adds failed/skipped pages with their size info
(if available) to maintain correct page numbering and structure.

- Add _add_failed_pages_to_document() method in StandardPdfPipeline
- Add test cases for failed page handling
- Add test cases for normal page handling (regression test)
- Add test PDF files

Signed-off-by: jhchoi1182 <[email protected]>

* fix: ensure resource cleanup and simplify type hints

- Wrap page_backend usage in try-finally to guarantee unload (prevents resource leaks).
- Simplify redundant 'float | None | None' type hint.

Signed-off-by: jhchoi1182 <[email protected]>

* fix: add groundtruth for normal_4pages.pdf and exclude failing PDFs from e2e test

Signed-off-by: jhchoi1182 <[email protected]>

* fix: ensure correct status assertion for failed pages in tests

Signed-off-by: jhchoi1182 <[email protected]>

---------

Signed-off-by: jhchoi1182 <[email protected]>

* fix: Use timezone-aware datetime (#2947)

* Use timezone-aware datetime for profiling timestamps

Updated timestamp recording to use timezone-aware datetime.

Signed-off-by: Nikhil Singh <[email protected]>

* run formatter

Signed-off-by: Michele Dolfi <[email protected]>

---------

Signed-off-by: Nikhil Singh <[email protected]>
Signed-off-by: Michele Dolfi <[email protected]>
Co-authored-by: Michele Dolfi <[email protected]>

* fix(asciidoc): handle commas in image alt text (#2983)

* Fix: Handle commas in AsciiDoc image alt text

  - Modified _parse_picture() to gracefully handle alt text containing commas
  - Commas in alt text are now preserved instead of causing ValueError
  - Added test case with realistic auto-generated alt text
  - split('=', 1) prevents issues when values contain '=' characters

* DCO Remediation Commit for n0rdp0l <[email protected]>

I, n0rdp0l <[email protected]>, hereby add my Signed-off-by to this commit: ee75249

Signed-off-by: n0rdp0l <[email protected]>

* style: fix ruff formatting in test_backend_asciidoc.py

Signed-off-by: n0rdp0l <[email protected]>

---------

Signed-off-by: n0rdp0l <[email protected]>
Co-authored-by: Michele Dolfi <[email protected]>

* chore: bump version to 2.73.1 [skip ci]

* First attempt at establishing API Kserve2 facet

Signed-off-by: Christoph Auer <[email protected]>

* refactor: improve KServe v2 engine implementation after code review

- Add comprehensive error handling to KserveV2HttpClient
  - Catch and wrap Timeout, ConnectionError, HTTPError with context
  - Validate response formats with clear error messages

- Refactor URL building to eliminate duplication
  - Extract _build_model_url() helper method
  - Single source of truth for infer_url and model_metadata_url

- Make URL required parameter (remove default localhost:8000)
  - Update ApiKserveV2*EngineOptions to require explicit URL
  - Add preset validation with helpful error messages

- Rename constants for clarity: TRITON_* → KSERVE_V2_*
  - Add comment explaining KServe v2 uses Triton type system

- Improve error messages with actual values
  - Show counts, shapes, and supported types in validation errors

- Document official KServe Python SDK alternative
  - Note async-only requirement and alpha status

- Update tests for required URL parameter

Signed-off-by: Christoph Auer <[email protected]>

* Cleanup in kserve http helper and options

Signed-off-by: Christoph Auer <[email protected]>

* Further cleanup

Signed-off-by: Christoph Auer <[email protected]>

* Fix for remote-services on tablemodel

Signed-off-by: Christoph Auer <[email protected]>

* fix: improved deserialization of engine_options (#3008)

* add registry of discriminated subclasses

Signed-off-by: Michele Dolfi <[email protected]>

* fix detection of engine_type value

Signed-off-by: Michele Dolfi <[email protected]>

---------

Signed-off-by: Michele Dolfi <[email protected]>

* Add options serialization improvements

Signed-off-by: Christoph Auer <[email protected]>

---------

Signed-off-by: jhchoi1182 <[email protected]>
Signed-off-by: Nikhil Singh <[email protected]>
Signed-off-by: Michele Dolfi <[email protected]>
Signed-off-by: n0rdp0l <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Co-authored-by: jhchoi1182 <[email protected]>
Co-authored-by: Nikhil Singh <[email protected]>
Co-authored-by: Michele Dolfi <[email protected]>
Co-authored-by: Felix Wente <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michele Dolfi <[email protected]>
cau-git added a commit that referenced this pull request Feb 18, 2026
…ication) and KServe v2 API support (#2979)

* feat: Inference engines abstraction for image classification model family with HF Transformers and ONNX runtime

Implements runtime abstraction for image classification models with support for both ONNX Runtime and HuggingFace Transformers engines. Users can switch between engines without model retraining, similar to the object detection abstraction (#2959).

Key components:
- BaseImageClassificationEngine with factory pattern
- OnnxRuntimeImageClassificationEngine and TransformersImageClassificationEngine implementations
- Shared HfVisionModelMixin for common HF model utilities
- Engine-specific configuration options
- Test suite and example demonstrating runtime engine switching

Signed-off-by: Christoph Auer <[email protected]>

* Add missing files and re-export for backward compat

Signed-off-by: Christoph Auer <[email protected]>

* Don't run with OCR in the example.

Signed-off-by: Christoph Auer <[email protected]>

* Remove excess onnxruntime related options for inputs and outputs

Signed-off-by: Christoph Auer <[email protected]>

* feat: centralize torch compile defaults with DOCLING_INFERENCE_COMPILE_TORCH_MODELS

Signed-off-by: Christoph Auer <[email protected]>

* feat: Add Kserve2 API engine for image classifier and object detection models (#2999)

* Fixes from review

Signed-off-by: Christoph Auer <[email protected]>

* DCO Remediation Commit for Christoph Auer <[email protected]>

I, Christoph Auer <[email protected]>, hereby add my Signed-off-by to this commit: 4cdb01e

Signed-off-by: Christoph Auer <[email protected]>

* DCO Remediation Commit for Christoph Auer <[email protected]>

I, Christoph Auer <[email protected]>, hereby add my Signed-off-by to this commit: e293ba3

Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>

* Add fallback for API variants

Signed-off-by: Christoph Auer <[email protected]>

* Recreate uv.lock

Signed-off-by: Christoph Auer <[email protected]>

---------

Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: jhchoi1182 <[email protected]>
Signed-off-by: Nikhil Singh <[email protected]>
Signed-off-by: Michele Dolfi <[email protected]>
Signed-off-by: n0rdp0l <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Co-authored-by: jhchoi1182 <[email protected]>
Co-authored-by: Nikhil Singh <[email protected]>
Co-authored-by: Michele Dolfi <[email protected]>
Co-authored-by: Felix Wente <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michele Dolfi <[email protected]>

Development

Successfully merging this pull request may close these issues.

AsciiDoc image macro crashes when alt text contains commas
