Internal PDF importer incorrectly maps Unicode characters

Summary:

Certain Unicode characters are imported incorrectly from PDFs. I have previously noted issues with importing Cambria Math, but this case is worse because it affects characters generated by normal typing in an extremely common font (Calibri, the default Office font).

Steps to reproduce:

1. Use the text tool to type "Operation" in Calibri (Operation.svg). The 'ti' is automatically combined into a single ligature glyph.
2. Save it as a PDF (Operation.pdf).
3. Reimport the PDF.

The result is this: Operation_imported.svg

I suspect that #6672 (closed), #2016, and this are all the same issue: an inability to import certain non-ASCII Unicode characters from PDFs.
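
To illustrate the kind of mapping that may be involved (a sketch only; I have not checked how the importer actually handles this): a PDF can carry a ToUnicode CMap whose bfchar entries map one glyph code to a UTF-16BE string, and that string may contain more than one character, as would be expected for a 'ti' ligature glyph. The CMap fragment and glyph codes below are invented for illustration and are not taken from Operation.pdf, and `parse_bfchar` is a hypothetical helper, not part of Inkscape or any PDF library.

```python
import re

# Hypothetical ToUnicode CMap fragment; glyph codes are made up for illustration.
SAMPLE_CMAP = """\
2 beginbfchar
<0003> <0020>
<01A5> <00740069>
endbfchar
"""

# One bfchar entry: <source glyph code> <destination UTF-16BE string>, both in hex.
BFCHAR_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text):
    """Return {glyph_code: unicode_string} for every bfchar entry found."""
    mapping = {}
    in_block = False
    for line in cmap_text.splitlines():
        stripped = line.strip()
        if stripped.endswith("beginbfchar"):
            in_block = True
        elif stripped == "endbfchar":
            in_block = False
        elif in_block:
            match = BFCHAR_ENTRY.fullmatch(stripped)
            if match:
                code = int(match.group(1), 16)
                # The destination may decode to several characters (a ligature).
                mapping[code] = bytes.fromhex(match.group(2)).decode("utf-16-be")
    return mapping


mapping = parse_bfchar(SAMPLE_CMAP)
for code, text in sorted(mapping.items()):
    print(f"glyph 0x{code:04X} -> {text!r} ({len(text)} character(s))")
```

An importer that assumes every glyph corresponds to exactly one Unicode character would mishandle the second entry, which is consistent with the symptom above, though that is only my guess at the cause.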

Version info:

Inkscape 1.2.1 (9c6d41e410, 2022-07-14)

    GLib version:     2.72.2
    GTK version:      3.24.34
    glibmm version:   2.66.4
    gtkmm version:    3.24.6
    libxml2 version:  2.9.14
    libxslt version:  1.1.35
    Cairo version:    1.17.6
    Pango version:    1.50.7
    HarfBuzz version: 4.4.1

    OS version:       Windows 10 1909
Edited by David Burghoff