Substitute Non-Breaking-Space with Normal-Space for PDF font character lookup #21

@rototor

I've investigated the "#" problem I described in #19 a bit further. The problem is that &nbsp; (the non-breaking space, U+00A0) is rendered as '#'. The '#' comes from the default xhtmlrenderer.conf:

# When rendering text, not all fonts support all character glyphs. When set to true, this
# will replace any missing characters with the specified character to aid in the debugging
# of your PDF.  Currently only supported for PDF rendering.
xr.renderer.replace-missing-characters=false
xr.renderer.missing-character-replacement=#

The character is used as a replacement even if xr.renderer.replace-missing-characters=false. It seems that no font has a glyph for &nbsp;. That makes some sense, as it is visually identical to a normal space.

Just replacing &nbsp; (character 160) with ' ' would fix the problem, but it does not feel like a correct fix to me:

--- a/openhtmltopdf-pdfbox/src/main/java/com/openhtmltopdf/pdfboxout/PdfBoxOutputDevice.java
+++ b/openhtmltopdf-pdfbox/src/main/java/com/openhtmltopdf/pdfboxout/PdfBoxOutputDevice.java
@@ -381,6 +332,8 @@ public class PdfBoxOutputDevice extends AbstractOutputDevice implements OutputDe
         for (int i = 0; i < str.length(); ) {
             int unicode = str.codePointAt(i);
             i += Character.charCount(unicode);
+            if( unicode == 160 )
+                unicode = ' ';
             String ch = String.valueOf(Character.toChars(unicode));
             boolean gotChar = false;

Especially because there are more space characters than just the space and the non-breaking space. For examples, see https://www.cs.tut.fi/~jkorpela/chars/spaces.html
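
A more general variant could treat every Unicode space separator (category Zs, which includes U+00A0) the same way, and only fall back to ' ' when the font really lacks the glyph. The following is a minimal sketch of that idea, not the actual PdfBoxOutputDevice code: the class SpaceFallback, the method forGlyphLookup and the fontHasGlyph predicate are hypothetical names that stand in for whatever font lookup the renderer performs.

```java
import java.util.function.IntPredicate;

// Hypothetical helper, not part of openhtmltopdf.
final class SpaceFallback {
    private SpaceFallback() {}

    /** True for any code point in Unicode category Zs (space separators). */
    static boolean isSpaceSeparator(int codePoint) {
        return Character.getType(codePoint) == Character.SPACE_SEPARATOR;
    }

    /**
     * Returns the code point to use for the glyph lookup.
     * fontHasGlyph is an assumed stand-in for the real font check.
     */
    static int forGlyphLookup(int codePoint, IntPredicate fontHasGlyph) {
        // Keep the original character whenever the font can draw it.
        if (fontHasGlyph.test(codePoint)) {
            return codePoint;
        }
        // Otherwise substitute a normal space for any space separator,
        // provided the font has a glyph for the normal space.
        if (isSpaceSeparator(codePoint) && fontHasGlyph.test(' ')) {
            return ' ';
        }
        // Not a space separator: leave it to the existing missing-glyph handling.
        return codePoint;
    }
}
```

Note that this substitution discards the distinct widths of characters such as the em space (U+2003), so it is only meant as a last resort before the missing-character replacement kicks in.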
