Hebrew combining diacritics aren't positioned correctly

Thank you for keeping this open source project going!

I can't get Hebrew combining diacritics ("vowels") to appear correctly, even after looking into some of the solutions proposed for similar issues.

For example, here is a Hebrew letter BET with a DAGESH (dot in the middle): **בּ**

And here is a screenshot of the same text in Word:

![image](https://user-images.githubusercontent.com/799448/191036806-20f930d5-9ba8-441c-84ee-2eae7fdf14e4.png)

I've seen some proposed workarounds for similar issues in #490 and experimented with them, as seen in the sample code at the end of this issue. Here are the results, followed by why I think the workarounds don't work and why this should be tracked as a separate bug:

![image](https://user-images.githubusercontent.com/799448/191037332-db9a5464-92c5-4730-b567-892716804cba.png)

1. One part of the solution in #490 is to use `arabic_reshaper`. I don't think this hurts, but I also don't think it affects Hebrew.
2. Another part of the solution in #490 is to use `bidi.algorithm.get_display`. This reverses the order of the characters. I don't think it's actually correct to reverse the order of combining diacritics; they should still come after their base character in the string, even in RTL languages. (This might be something to fix in `get_display`.) This appears to be what causes the DAGESH to move from being misplaced on one side to being misplaced on the other side of the BET.
3. There's also a proposed solution in #490 of using Unicode normalization. However, this doesn't work for Hebrew: the precomposed Hebrew characters are excluded from composition in the Unicode normalization algorithm ([see here](https://www.unicode.org/reports/tr15/)), so NFC never produces them. Moreover, while the example of BET WITH DAGESH happens to have a [composed character](https://www.unicode.org/charts/PDF/UFB00.pdf), only a very limited set of composed characters exists (my guess is only what's needed for Yiddish). **Most** of the combinations of Hebrew letters with diacritics needed for Biblical and other historic/literary/educational Hebrew purposes do not have composed characters. So there's still a need to render combining diacritics correctly, rather than relying on normalization to solve this.
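
To make point 3 concrete, here is a small standard-library check of what NFC does to BET with DAGESH. It only uses `unicodedata`; the helper name `codepoints` is mine, added for illustration.

```python
import unicodedata

decomposed = "\u05D1\u05BC"   # HEBREW LETTER BET + HEBREW POINT DAGESH OR MAPIQ
precomposed = "\uFB31"        # HEBREW LETTER BET WITH DAGESH (presentation form)


def codepoints(s):
    """Hypothetical helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]


# NFC leaves the decomposed sequence alone, because U+FB31 is excluded
# from composition.
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)

# NFC even breaks the precomposed character back into base + mark.
nfc_of_precomposed = unicodedata.normalize("NFC", precomposed)

print(codepoints(nfc_of_decomposed))   # ['U+05D1', 'U+05BC']
print(codepoints(nfc_of_precomposed))  # ['U+05D1', 'U+05BC']
```

So, at least for this pair, normalizing cannot give the renderer a single precomposed glyph to draw; the combining mark is always still there.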

In theory I'd love to contribute a fix for this, but I'm not sure I have the time or knowledge; maybe someone can point me in the right direction? In particular, I wonder whether this is an issue in FPDF2 itself or in the font subsetting from `fonttools`. From what I can tell, the PDF doesn't contain the X and Y position of each diacritic explicitly; rather, it contains the string and the font, and logic in the embedded font provides the exact position within the string. Is that correct?

Here's my sample code. Thanks in advance for your help!

```python
import os
import unicodedata

from fpdf import FPDF

from arabic_reshaper import reshape
from bidi.algorithm import get_display

def debug_string(s, desc):
    print(f"*** {desc} ***")
    for c in s:
        print(c, ord(c), unicodedata.name(c))

def fix_text(some_text):
    debug_string(some_text, "original")

    # Try fixes from discussion on https://github.com/PyFPDF/fpdf2/pull/490
    some_text = unicodedata.normalize('NFC', some_text)

    debug_string(some_text, "normalized (NFC)")

    some_text = get_display(reshape(some_text))

    debug_string(some_text, "reshaper and bidi algorithm fixed")

    return some_text

pdf = FPDF(unit="in", format="Letter")
pdf.add_font("SBL_Hbrw", fname="SBL_Hbrw.ttf")
pdf.set_font("SBL_Hbrw", "", 30)

pdf.add_page()

some_text = "בּ"

pdf.set_xy(1, 1)
pdf.cell(1, 4, some_text)

some_text = fix_text(some_text)

pdf.set_xy(1, 2)
pdf.cell(1, 4, some_text)

filename = "hebrew.pdf"
pdf.output(filename)
os.startfile(filename)  # windows only

```
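
To illustrate the ordering I argue for in point 2, here is a rough standard-library sketch. It is not the bidi algorithm and not what `get_display` does internally; it just reverses a purely RTL string while keeping each combining mark after its base character. The function names `split_clusters` and `reverse_keeping_marks` are made up for this example, and I haven't verified that this ordering alone makes the PDF render correctly, since that also depends on the font.

```python
import unicodedata


def split_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def reverse_keeping_marks(text):
    """Reverse the clusters, not the individual code points."""
    return "".join(reversed(split_clusters(text)))


sample = "\u05D1\u05BC\u05D0"  # BET, DAGESH, ALEF in logical order

naive = sample[::-1]                           # ALEF, DAGESH, BET
cluster_aware = reverse_keeping_marks(sample)  # ALEF, BET, DAGESH

for label, s in (("naive", naive), ("cluster-aware", cluster_aware)):
    print(label, [unicodedata.name(c) for c in s])
```

In the naive reversal the DAGESH ends up in front of the BET (and directly after the ALEF), which matches the misplacement described in point 2; in the cluster-aware version it stays attached to the BET.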


**Environment**
* Windows
* Python version 3.10.5
* `fpdf2` version 2.5.7
