Description
Thank you for keeping this open source project going!
I can't get Hebrew combining diacritics ("vowels") to appear correctly, even after looking into some of the solutions proposed for similar issues.
For example, here is a Hebrew letter BET with a DAGESH (dot in the middle): בּ
And here is a screen shot from Word:
I've seen some proposed workarounds to similar issues in #490 and experimented with them, as seen in the following code. Here are the results and here's why I think they don't work and this should be tracked as a separate bug:
- One part of the solution in #490 is to use arabic_reshaper. I don't think this hurts, but I also don't think it affects Hebrew.
- Another part of the solution in #490 is to use bidi.algorithm.get_display. This reverses the order of the characters. I don't think it's actually correct to reverse the order of combining diacritics; they should still come after their base character in the string, even in RTL languages. (This might be something to fix in get_display.) This appears to be what causes the DAGESH to move from being misplaced on one side of the BET to being misplaced on the other side.
- There's also a proposed solution in #490 of using Unicode normalization. However, this doesn't work for Hebrew. The precomposed Hebrew characters are excluded from composition under Unicode normalization (see here), so NFC never produces them. Moreover, while the example of BET WITH DAGESH happens to have a composed character, there are very few composed Hebrew characters (my guess is only what's needed for Yiddish). Most of the combinations of Hebrew letters with diacritics needed for Biblical and other historic/literary/educational Hebrew purposes do not have composed characters. So there's still a need to render combining diacritics correctly, and not rely on normalization to solve this.
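To make the last two points concrete, here is a minimal sketch using only the standard library. It shows that NFC leaves BET + DAGESH decomposed (and even decomposes the precomposed U+FB31), and it contrasts a naive character reversal with one that keeps each combining mark after its base letter. The helper `reverse_keeping_marks` is a hypothetical name of my own, not part of python-bidi or fpdf2, and it only illustrates the ordering I would expect, not a full bidi implementation.

```python
import unicodedata

ALEF = "\u05d0"             # HEBREW LETTER ALEF
BET = "\u05d1"              # HEBREW LETTER BET
DAGESH = "\u05bc"           # HEBREW POINT DAGESH OR MAPIQ (combining mark)
BET_WITH_DAGESH = "\ufb31"  # precomposed presentation form


def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def reverse_keeping_marks(text):
    """Hypothetical helper: reverse base letters, keeping marks after their base."""
    clusters = []
    for ch in text:
        if is_mark(ch) and clusters:
            clusters[-1] += ch   # attach the mark to the preceding base letter
        else:
            clusters.append(ch)  # start a new cluster
    return "".join(reversed(clusters))


def names(text):
    """Unicode character names, for readable debugging output."""
    return [unicodedata.name(ch) for ch in text]


# NFC does not compose BET + DAGESH, and it decomposes U+FB31.
print(names(unicodedata.normalize("NFC", BET + DAGESH)))
print(names(unicodedata.normalize("NFC", BET_WITH_DAGESH)))

# A naive reversal detaches the DAGESH from the BET; the cluster-aware one does not.
word = BET + DAGESH + ALEF
print(names(word[::-1]))
print(names(reverse_keeping_marks(word)))
```

Whether the PDF then shows the mark in the right place is a separate question; this only covers the order of code points in the string passed to `cell()`.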
In theory I'd love to contribute a fix for this, but I'm not sure I have the time or knowledge; maybe someone can point me in the right direction? In particular, I wonder if this is an issue in fpdf2 itself, or with the font subsetting from fonttools. From what I can tell, the PDF doesn't contain the X and Y position of each diacritic explicitly; rather, it contains the string and the font, and logic in the embedded font provides the exact position within the string. Is that correct?
Here's my sample code. Thanks in advance for your help!
import os
import unicodedata

from fpdf import FPDF
from arabic_reshaper import reshape
from bidi.algorithm import get_display


def debug_string(s, desc):
    # Print each code point with its number and Unicode name
    print(f"*** {desc} ***")
    for c in s:
        print(c, ord(c), unicodedata.name(c))


def fix_text(some_text):
    debug_string(some_text, "original")
    # Try fixes from discussion on https://github.com/PyFPDF/fpdf2/pull/490
    some_text = unicodedata.normalize('NFC', some_text)
    debug_string(some_text, "normalized (NFC)")
    some_text = get_display(reshape(some_text))
    debug_string(some_text, "reshaper and bidi algorithm fixed")
    return some_text


pdf = FPDF(unit="in", format="Letter")
pdf.add_font("SBL_Hbrw", fname="SBL_Hbrw.ttf")
pdf.set_font("SBL_Hbrw", "", 30)
pdf.add_page()

# First cell: the original string, unmodified
some_text = "בּ"
pdf.set_xy(1, 1)
pdf.cell(1, 4, some_text)

# Second cell: the same string after the proposed fixes
some_text = fix_text(some_text)
pdf.set_xy(1, 2)
pdf.cell(1, 4, some_text)

filename = "hebrew.pdf"
pdf.output(filename)
os.startfile(filename)  # Windows only

Environment
- Windows
- Python version 3.10.5
- fpdf2 version 2.5.7

