Audit usage of Glib::ustring
Using a very unscientific method (repeatedly interrupting the process in GDB and looking at the backtrace) while opening a ~10 MB file, a few main things popped up:
- ~50%: UTF-8 collate/comparison functions
- ~35%: Pango font functions (mostly GSUB reading: inbox#440)
- ~15%: other stuff
Let's assume this dodgy method does indeed reflect how much time is being spent in them, though obviously this could be very wrong.
Why are these UTF-8 functions being called then? Let's take a simple example:
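```cpp
#include <glibmm/ustring.h>
#include <iostream>
int main() {
  const Glib::ustring foo = "Lorem ipsum dolor sit amet";
  if (foo != "bar") {  // a Glib::ustring comparison, not a plain byte compare
    std::cout << "Is this fast?\n";
  }
}
```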
Now you're probably thinking: this is pretty fast, right? All we need to do is compare the string lengths, see that they're obviously not equal, and move on.
This would be true if these were mere `std::string`s, but alas! Welcome to UTF-8. All of a sudden things get much more expensive with `Glib::ustring`s: their lengths are measured in characters, and you can't just count the number of bytes to find the number of characters!
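To see why a character count isn't free, here is a minimal sketch of counting code points in a UTF-8 string. `utf8_code_points` is a hypothetical helper (not part of glibmm), and it assumes the input is valid UTF-8. Every code point starts with exactly one byte that is not a continuation byte (continuation bytes have the bit pattern `10xxxxxx`), so the count needs a pass over every byte instead of reading a stored size:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; assumes s holds valid UTF-8.
// Counts code points by counting bytes that are NOT continuation bytes
// (continuation bytes match 10xxxxxx). This is O(n) in the byte length,
// whereas std::string::size() just returns a stored byte count.
std::size_t utf8_code_points(const std::string& s) {
  std::size_t count = 0;
  for (unsigned char byte : s) {
    if ((byte & 0xC0) != 0x80) {  // lead byte or ASCII: starts a code point
      ++count;
    }
  }
  return count;
}
```

For example, `"h\xC3\xA9llo"` ("héllo") is 6 bytes but 5 code points.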
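On top of that, `Glib::ustring`'s comparison operators perform a linguistically aware comparison (they go through `ustring::compare()`, which uses `g_utf8_collate()`) instead of a byte-by-byte one, but most of the time we don't need this.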
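For contrast, byte-wise equality is about as cheap as comparison gets: check the byte lengths, then compare the bytes, with no decoding, normalization or locale lookup. A minimal sketch of that idea, which is essentially what `std::string`'s `operator==` boils down to (`bytes_equal` is a hypothetical name, not a glibmm or standard library function):

```cpp
#include <cstring>
#include <string>

// Hypothetical helper for illustration: exact byte-for-byte equality.
bool bytes_equal(const std::string& a, const std::string& b) {
  // Fast path: strings with different byte lengths can never be equal.
  if (a.size() != b.size()) {
    return false;
  }
  // Same length: compare the raw bytes directly.
  return std::memcmp(a.data(), b.data(), a.size()) == 0;
}
```

The trade-off is semantic: a precomposed "é" (`"\xC3\xA9"`) and "e" followed by a combining acute accent (`"e\xCC\x81"`) are different byte sequences, so `bytes_equal` reports them as unequal, while a normalizing, locale-aware comparison may treat them as equal. Byte comparison is the right tool when an exact match is what you actually want, such as comparing against ASCII literals like `"bar"` above.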
So is this a valid concern? Here's what I'd suggest, but I'm open to other ideas:
- If you want to use `==` or `compare` but don't need Unicode-aware comparison, convert to `std::string` first (`ustring::raw()` gives you the underlying `std::string` without a copy).
- If you're not going to be iterating through Unicode strings, consider whether you really need `Glib::ustring`.