
Add text submodule

David Burghoff requested to merge burghoff/extensions:master into master

What does the merge request do?

This adds a text API to inkex, which can be used to query font metrics and accurately determine text bounding boxes without command calls. After importing inkex.text, all TextElements and FlowRoots have the parsed_text attribute, which parses the objects into a collection of lines and characters:

  1. parsed_text.lns is a list of tlines, each of which corresponds to a contiguous block of text (including sub/superscripts).
  2. tline.cs is a list of tchars, which correspond to individual characters.
  3. tline.ws is a list of tchunks, each of which corresponds to a group of characters sharing an anchor. For manually-kerned text, each character will usually have its own chunk.
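To illustrate why manually-kerned text ends up with one chunk per character, here is a minimal sketch of anchor-based chunking. It does not use inkex.text; the function name split_into_chunks and the input format (one optional x position per character) are assumptions made for illustration only. It relies on the SVG rule that a character with its own absolute position starts a new text chunk.

```python
from typing import List, Optional


def split_into_chunks(text: str, xs: List[Optional[float]]) -> List[str]:
    """Group characters into chunks; a character with an explicit x starts a new chunk.

    Hypothetical helper for illustration, not part of inkex.text.
    """
    chunks: List[str] = []
    for ch, x in zip(text, xs):
        if x is not None or not chunks:
            # An explicit position (or the very first character) opens a new chunk.
            chunks.append(ch)
        else:
            # No explicit position: the character continues the current chunk.
            chunks[-1] += ch
    return chunks


# Ordinary text: only the first character is positioned, so there is one chunk.
plain = split_into_chunks("Hello", [0.0, None, None, None, None])

# Manually-kerned text: every character has its own x, so every character is a chunk.
kerned = split_into_chunks("AV", [0.0, 7.5])
```

In this toy model, plain holds a single chunk while kerned holds one chunk per character, mirroring the behavior described for tline.ws.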

This is primarily contained within inkex.text.TextParser, which defines the parsed_text object. Each character has the pts_t and pts_ut attributes, which contain lists of the transformed and untransformed points, respectively, that represent its bounding box. Each ParsedText has a Make_Highlights function that can be used to demonstrate the bounding box features and to debug; the merge request also creates the Highlight Text extension, which is essentially a demo of this.
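As a rough sketch of how per-character corner points can be combined into a bounding box for a whole line, the following stand-alone example applies an SVG-style affine matrix to untransformed corners and takes the union. The helper names (apply_affine, bbox, line_bbox) and the corner-list layout are assumptions, not the actual inkex.text implementation.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = Tuple[float, float, float, float, float, float]  # SVG matrix(a, b, c, d, e, f)


def apply_affine(matrix: Matrix, pts: Iterable[Point]) -> List[Point]:
    """Transform points with an SVG matrix: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = matrix
    return [(a * x + c * y + e, b * x + d * y + f) for x, y in pts]


def bbox(pts: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a set of points."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def line_bbox(matrix: Matrix, chars_untransformed: Sequence[Sequence[Point]]):
    """Union of the transformed corner points of every character in a line."""
    all_pts: List[Point] = []
    for corners in chars_untransformed:
        all_pts.extend(apply_affine(matrix, corners))
    return bbox(all_pts)


# Two characters, each described by four untransformed corner points.
char_a = [(0.0, 0.0), (0.0, 10.0), (5.0, 10.0), (5.0, 0.0)]
char_b = [(6.0, 0.0), (6.0, 10.0), (12.0, 10.0), (12.0, 0.0)]

# Scale by 2 and translate by (1, 1).
box = line_bbox((2.0, 0.0, 0.0, 2.0, 1.0, 1.0), [char_a, char_b])
```

Taking the box of the transformed corners, rather than transforming an untransformed box, keeps the result correct for rotated or skewed text.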

The way this is done is by querying the font metrics in the Character_Table class, which gets font metrics from inkex.text.font_properties. font_properties interfaces with three libraries:

  1. fontconfig: Used for discovering fonts based on the CSS style. This uses Inkscape's libfontconfig, so it always matches what Inkscape does.
  2. Pango: Uses the GTK Python bindings to render test characters by setting up a blank Pango context and rendering text to it. It reuses the same layout for all rendering.
  3. fonttools: Gets font properties directly from the font file (located by the font's filename). Currently, this is primarily included as a backup for when Pango is not available, as it is pure Python and a little slower than the GTK bindings.

As an absolute last resort, Character_Table can do a command call to query the font metrics, but I think this should eventually be removed as it is very slow; fonttools is a suitable backup. font_properties is also useful for determining font configurations: the true_style function determines the font that is actually selected for a given style (i.e., the font-family, font-weight, font-style, and font-stretch that are selected when a font is specified).
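The fallback order described above (Pango first, then fonttools, then a command call) can be sketched as a simple chain of backends. This is only an illustration under stated assumptions: query_with_fallback and the three stub functions are hypothetical, and the returned number is a made-up placeholder rather than a real font metric.

```python
from typing import Callable, List, Sequence, Tuple

# A backend is a (name, query) pair; query returns a metric or raises if unusable.
Backend = Tuple[str, Callable[[str], float]]


def query_with_fallback(backends: Sequence[Backend], family: str) -> Tuple[str, float]:
    """Try each backend in order and return the first successful (name, result)."""
    errors: List[str] = []
    for name, query in backends:
        try:
            return name, query(family)
        except (ImportError, OSError, LookupError) as exc:
            # Record the failure and move on to the next, slower backend.
            errors.append(f"{name}: {exc}")
    raise LookupError("no font backend succeeded: " + "; ".join(errors))


def pango_stub(family: str) -> float:
    """Hypothetical stand-in that simulates missing GTK bindings."""
    raise ImportError("gi is not installed")


def fonttools_stub(family: str) -> float:
    """Hypothetical stand-in for reading a metric from the font file."""
    return 0.55  # placeholder value, not a real measurement


def command_stub(family: str) -> float:
    """Hypothetical stand-in for the slow command-call path."""
    return 0.55  # placeholder value, not a real measurement


BACKENDS = [("pango", pango_stub), ("fonttools", fonttools_stub), ("command", command_stub)]
chosen, metric = query_with_fallback(BACKENDS, "Arial")
```

Because the Pango stub fails here, the fonttools stub answers and the command-call stub is never reached, which is the ordering the description aims for.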

Because text parsing requires repeated style and transform composing, this also adds inkex.text.cache, which provides BaseElements with cached versions of many properties, including style (cstyle, cspecified_style, ccascaded_style), transform (ctransform, ccomposed_transform), and some others as well (cpath, croot, cdefs, ctag). For the most part, these do not interfere with existing functions except to replace them with wrapped versions. It also includes inkex.text.speedups, which patches some inkex functions for improved speed. While all of the Inkex tests still pass when inkex.text is imported, it is possible that the patches cause problems that have not yet been found. Going forward, a lot of this functionality could be moved into base Inkex, but for now it is kept separate.
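To show why caching helps with repeated transform composition, here is a minimal sketch using functools.cached_property (Python 3.8+) on a toy element tree. ToyElement and composed_offset are hypothetical names; the real inkex.text.cache works on BaseElement and full transforms, and its invalidation strategy is not shown here.

```python
from functools import cached_property
from typing import Optional, Tuple


class ToyElement:
    """Toy stand-in for an SVG element with a translation and an optional parent."""

    def __init__(self, translate: Tuple[float, float] = (0.0, 0.0),
                 parent: Optional["ToyElement"] = None) -> None:
        self.translate = translate
        self.parent = parent
        self.compose_calls = 0  # counts how often the composition actually runs

    @cached_property
    def composed_offset(self) -> Tuple[float, float]:
        """Offset accumulated from the root down to this element, computed once."""
        self.compose_calls += 1
        dx, dy = self.translate
        if self.parent is not None:
            pdx, pdy = self.parent.composed_offset  # reuses the parent's cached value
            dx, dy = dx + pdx, dy + pdy
        return (dx, dy)


root = ToyElement((10.0, 0.0))
group = ToyElement((0.0, 5.0), parent=root)
text = ToyElement((1.0, 1.0), parent=group)

first = text.composed_offset
second = text.composed_offset  # served from the cache; no recomputation
```

The trade-off in a sketch like this is staleness: if translate changes after the first access, the cached value is not updated unless it is explicitly cleared.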

I made one modification to the tester module that may require some discussion. Prior to looking for diffs, inkex.tester.xmldiff now assigns every element a unique ID corresponding to its location in the document, replacing all instances of the original (including references). This obviates the need for the CompareWithoutIDs filter. I did this because inkex.text.cache assigns IDs in numerical order rather than randomly, which was causing many Inkex tests to fail. In general I am finding it much easier to use than the previous functionality. It also includes the original ID in any differences, which makes it easier to find which elements are causing test failures in large documents.
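The ID normalization idea can be sketched with the standard library's xml.etree.ElementTree. This is not the inkex.tester.xmldiff code: normalize_ids, the e<index> naming scheme, and the reference regex are assumptions, and the regex is deliberately simple (it rewrites any #name in an attribute value that matches a known ID).

```python
import re
import xml.etree.ElementTree as ET

# Matches "#name" inside attribute values such as "#grad" or "url(#grad)".
REF = re.compile(r"#([^\s)'\"]+)")


def normalize_ids(root: ET.Element) -> dict:
    """Give every element an ID based on document order and rewrite references."""
    mapping = {}
    elements = list(root.iter())  # document order
    for index, elem in enumerate(elements):
        new_id = f"e{index}"
        old_id = elem.get("id")
        if old_id is not None:
            mapping[old_id] = new_id  # remember the original for reference fixing
        elem.set("id", new_id)

    def swap(match: "re.Match") -> str:
        # Leave unknown names (for example hex colors) untouched.
        return "#" + mapping.get(match.group(1), match.group(1))

    for elem in elements:
        for key, value in list(elem.attrib.items()):
            if key != "id" and "#" in value:
                elem.set(key, REF.sub(swap, value))
    return mapping


def normalized(xml_text: str) -> str:
    """Parse, normalize IDs, and serialize again for comparison."""
    root = ET.fromstring(xml_text)
    normalize_ids(root)
    return ET.tostring(root, encoding="unicode")


doc_a = '<svg><defs><linearGradient id="gradA"/></defs><rect id="r7" fill="url(#gradA)"/></svg>'
doc_b = '<svg><defs><linearGradient id="x1"/></defs><rect id="zz" fill="url(#x1)"/></svg>'
same = normalized(doc_a) == normalized(doc_b)
```

Two documents that differ only in how their IDs were generated serialize identically after this step, and the returned mapping keeps the original IDs available for reporting differences.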

Closes issue 461.

Implementation notes

It includes four external packages:

  1. fontconfig: Bindings for the fontconfig library, modified to look for Inkscape's copy.
  2. fontTools: An alternative scheme for querying font metrics, currently used as a backup when Pango is not present. In the future, it would also be useful for text-to-path functionality.
  3. svgpathtools: Currently used to calculate intersection points when determining the position of flowed text.
  4. svgwrite: Required by svgpathtools.

It also includes the introspection typelibs needed to load PangoFT2 on Windows.

Summary for release notes

Gives an API that allows extensions to measure and manipulate text.

Checklist

  • Add unit tests (if applicable)
  • Changes to inkex/ are well documented
  • Clean merge request history
Edited by David Burghoff
