MT Enablement 4/4: Multithreaded Canvas (!4986) · Merge requests · Inkscape / inkscape

PBS requested to merge pbs3141/inkscape:multithread into master Jan 05, 2023

Short summary

Make the Canvas render on multiple background threads, rather than just one. This makes rendering many times faster. See the video below for a comparison!

[Video: mt-cmp]

Description

This finishes off the multi-thread enablement work by turning on tile parallelisation.

Main changes

Since !4876 (merged) the canvas has had the ability to render multiple tiles in parallel, but this was turned off because the key function, CanvasItem::render(), was not thread-safe: multiple threads could not call it concurrently. The problem was that it mutated the data structures it was rendering, mainly by writing to various caches. The complete list of these caches is:

1. The cache for node handle bitmaps, CanvasItemCtrl::_cache.
2. The toroidal pattern cache, DrawingPattern::surfaces.
3. Four cached Cairo patterns, NRStyle::[text_decoration_]{fill|stroke}_pattern.
4. The main render cache, DrawingItem::_cache.

So the main job of this MR is simply to make CanvasItem::render() const, so that it can safely be called from several threads at once. Most of the work therefore consisted of adding const to every function that can be called from CanvasItem::render(). This makes up the vast bulk of the diff, because there are a lot of them.
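As a rough illustration of what a const, concurrently callable render function can look like when it still has a cache to fill, here is a minimal sketch. Item, expensive() and render_from_threads() are hypothetical names invented for this example and are not Inkscape's classes; the cache is an int rather than a surface.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a drawing item; not Inkscape's actual class.
class Item {
public:
    explicit Item(int value) : _value(value) {}

    // const: intended to be callable from several threads on the same object.
    int render() const {
        {
            // Hold the lock only while reading the cache.
            std::lock_guard<std::mutex> lock(_mutex);
            if (_cached) return _cache;
        }
        int result = expensive(); // computed without the lock held
        {
            // Hold the lock again only while writing the cache.
            std::lock_guard<std::mutex> lock(_mutex);
            _cache = result;
            _cached = true;
        }
        return result;
    }

private:
    int expensive() const { return _value * 2; }

    int _value;
    mutable std::mutex _mutex;    // mutable so a const function can lock it
    mutable bool _cached = false; // mutable so a const function can fill the cache
    mutable int _cache = 0;
};

// Renders one shared item from several threads and collects the results.
std::vector<int> render_from_threads(const Item &item, int nthreads) {
    assert(nthreads > 0);
    std::vector<int> results(nthreads);
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; i++) {
        threads.emplace_back([&item, &results, i] { results[i] = item.render(); });
    }
    for (auto &t : threads) t.join();
    return results;
}
```

In this sketch several threads may compute the same value redundantly before the first one stores it; that trades some wasted work for never holding the lock during the expensive step.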

But of course, eventually such refactoring will run into the problem that you need to write to the above caches in a const function, which is not allowed. So the variables needing to be written to must be marked as mutable, and additional synchronisation introduced, e.g. mutexes, although we try to avoid these if possible. For each item in the above list, I chose the following approach:

1. Wrap the initialisation of the cache in an InitLock. This is a new class that does essentially the same thing as std::call_once and std::once_flag: it allows an object to be initialised once without needing a mutex, and without any cost to access after the initialisation. The only reason the extra class was needed is that a std::once_flag deliberately cannot be reset (even though there are use cases where resetting is valid, like here).
2. Simply wrap in a std::mutex. This is because the cache doesn't just need initialising once, but is continually mutated, and is a big, complex data structure. This reduces pattern rendering to single-threaded performance. The main reason you can get away with this is that patterns are typically very simple, so don't take long to render anyway. In principle this can be improved (see later), but it's a monumental pain.
3. Wrap in an InitLock.
4. Wrap in a std::mutex. For filtered elements we hold the mutex for the full duration of rendering, while for ordinary cached elements we only hold it while reading from and writing to the cache, not while rendering what is to go into it. This is more or less a stopgap measure until a better solution for dealing with the cache comes along, if it's found to be necessary (see later).
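To make the "initialise once, but resettable" idea concrete, here is a minimal sketch of such a primitive. It is not Inkscape's InitLock: ResettableOnce, CachedValue and get_from_threads() are hypothetical names, and this version assumes a double-checked flag with an internal mutex that is only taken on the slow path.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical resettable one-time initialiser; a sketch, not Inkscape's InitLock.
class ResettableOnce {
public:
    // Runs f() once between resets, even if many threads call init() concurrently.
    template <typename F>
    void init(F &&f) const {
        // Fast path: once initialised, this is a single atomic load.
        if (_done.load(std::memory_order_acquire)) return;
        std::lock_guard<std::mutex> lock(_mutex);
        if (_done.load(std::memory_order_relaxed)) return; // another thread won
        f();
        _done.store(true, std::memory_order_release);
    }

    // Assumption: only called while no other thread is using the object.
    void reset() { _done.store(false, std::memory_order_relaxed); }

private:
    mutable std::atomic<bool> _done{false};
    mutable std::mutex _mutex;
};

// Hypothetical user: a lazily built value that can be invalidated and rebuilt.
class CachedValue {
public:
    int get() const {
        _once.init([this] { _value = 42; ++_builds; });
        return _value;
    }
    void invalidate() { _once.reset(); }
    int builds() const { return _builds; }

private:
    ResettableOnce _once;
    mutable int _value = 0;
    mutable int _builds = 0; // counts how many times the initialiser ran
};

// Calls get() on one shared object from several threads.
void get_from_threads(const CachedValue &c, int nthreads) {
    assert(nthreads > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; i++) threads.emplace_back([&c] { c.get(); });
    for (auto &t : threads) t.join();
}
```

Because the sketch holds a std::atomic and a std::mutex, it is neither copyable nor movable, and any class that embeds it by value inherits that restriction.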

Related to point 3, NRStyle had to be changed significantly to accommodate InitLock, which makes it no longer movable; this required a split into NRStyle + NRStyleData. Additionally, the workflow of calling prepareFill() to create a pattern followed by applyFill() to apply the pattern to the context is no longer possible, because it relies on the first call setting up a cached pattern that is assumed not to have been modified by the time of the next call. Instead, I changed it to auto p = prepareFill() followed by applyFill(p); that is, the caller, not the NRStyle, is responsible for holding the pattern between preparing and applying.
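The change in calling convention can be sketched as follows. Only the names prepareFill() and applyFill() come from the description above; the signatures, and the Pattern, Context and Style types, are simplified assumptions for illustration and do not match the real NRStyle or Cairo types.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Pattern { std::string paint; };  // hypothetical stand-in for a Cairo pattern
struct Context { std::string source; }; // hypothetical stand-in for a drawing context

// Hypothetical style class; the real NRStyle has a different interface.
class Style {
public:
    explicit Style(std::string paint) : _paint(std::move(paint)) {
        assert(!_paint.empty()); // this sketch does not model "no fill"
    }

    // Builds the pattern and returns it to the caller; the style is not modified.
    Pattern prepareFill() const { return Pattern{_paint}; }

    // Applies a pattern that the caller has been holding since prepareFill().
    void applyFill(Context &ctx, const Pattern &p) const { ctx.source = p.paint; }

private:
    std::string _paint;
};

// The caller, not the style, keeps the pattern alive between the two steps.
void draw(const Style &style, Context &ctx) {
    auto p = style.prepareFill();
    style.applyFill(ctx, p);
}
```

Since nothing is stored on the style between the two calls, two threads calling draw() on the same Style each work with their own pattern.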

Smaller changes

Preferences changes

Update the description of the number-of-threads setting, and adjust the default by subtracting one from the number of cores. (It was found that using all the cores can postpone the GUI thread for too long, causing intermittent stuttering under heavy load.)
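The "cores minus one" default can be sketched like this. The function name default_render_threads() is hypothetical and the actual preference code may differ; the clamp to at least one thread is an assumption added here because std::thread::hardware_concurrency() may return 0 when the core count is unknown.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: choose a default number of render threads,
// leaving one core free for the GUI thread.
int default_render_threads(unsigned cores = std::thread::hardware_concurrency()) {
    int n = static_cast<int>(cores) - 1;
    // Assumption: never go below one render thread, even on a single-core
    // machine or when the core count is reported as 0 (unknown).
    return std::max(n, 1);
}
```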

Use smaller tiles for small objects

When dragging a smaller object, adapt the tile size accordingly so we don't end up just rendering it in one tile, losing parallelism.
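One possible way to adapt the tile size is sketched below. This is not the heuristic used in the MR: adaptive_tile_size(), its parameters and the limits of 16 and 256 pixels are all assumptions chosen for illustration. The idea is to pick a tile side small enough that the redraw area splits into at least as many tiles as there are threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical heuristic: return a square tile side for a w x h redraw area
// so that the area is covered by at least `threads` tiles, then clamp the
// side to [min_tile, max_tile]. All limits here are illustrative.
int adaptive_tile_size(int w, int h, int threads,
                       int min_tile = 16, int max_tile = 256) {
    assert(w > 0 && h > 0 && threads > 0);
    double area_per_tile = static_cast<double>(w) * h / threads;
    int side = static_cast<int>(std::floor(std::sqrt(area_per_tile)));
    return std::max(min_tile, std::min(side, max_tile));
}
```

When the lower limit applies, a very small area can still end up with fewer tiles than threads; the limit exists in this sketch so that per-tile overhead does not dominate.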

Revert from OpenMP to boost::asio

One of the late changes in !4876 (merged) was to switch the canvas thread pool from boost::asio to OpenMP. This MR reverts that change, after testing (see !4972, comment 1224715146) essentially proved OpenMP unsuitable for this use. A longer-term goal is to remove OpenMP completely; currently its only remaining use is in filter rendering.

Other misc changes

DrawingPattern and DrawingCache were refactored somewhat, but not in a way that is significant to this MR.
A redundant _markForRendering() was removed from DrawingImage.
A system for tracking the complexity of items in the DrawingItem is introduced, and used to make sure that when transforming a group containing a huge number of objects, instead of invalidating every object's bounding box, only the total bounding box is invalidated. (The two typically cover the same area for a huge group, but the latter takes a lot less time to invalidate.)

Caveats: The Cache

As mentioned, implementing a concurrency-friendly rendering cache is a major pain. As a result, it's advisable to turn the rendering cache size down to zero to get the best performance (and that's what I use in the video). In the future, one of these two things needs to happen:

Disable caching by default
Fix the caching system

There is a solution for the latter in which rendering of cached items is dynamically parallelised based on how many threads need the result, but I've currently held off implementing such a system due to its complexity until further feedback.

Edited Jan 31, 2023 by PBS
