Draw all solid-color UI geometry at once.
Important: This optimization cannot be applied to lines and fills that are partly transparent, or to textures that have an alpha channel. Blended geometry still has to be drawn in back-to-front order.
Currently, for each UI box that has a visible background or border, color uniforms are set and the box is drawn from its corners as two triangles (for the fill) or four lines (for the border). If these colors, along with the Z value, were placed into buffer textures (one for borders and another for fills), all lines could be drawn with a single draw call and all fills with another, allowing better utilization of the GPU.
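To make the batching idea concrete, here is a minimal sketch of packing every opaque fill into one flat float array that could be uploaded once and drawn with a single call. The `Box` class, its field names, and the per-vertex layout (x, y, z, r, g, b) are assumptions made for illustration, not Slice's actual types. The sketch also stays neutral on whether the data ends up in a buffer texture or a plain vertex buffer.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for a UI box with an opaque fill."""
    x: float
    y: float
    w: float
    h: float
    z: float
    fill: tuple  # (r, g, b), assumed fully opaque


FLOATS_PER_VERTEX = 6    # x, y, z, r, g, b (assumed layout)
VERTICES_PER_BOX = 6     # two triangles


def pack_fills(boxes):
    """Return one float array holding two triangles for every box."""
    data = array("f")
    for b in boxes:
        x0, y0, x1, y1 = b.x, b.y, b.x + b.w, b.y + b.h
        corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
        # Split the quad into triangles (0, 1, 2) and (0, 2, 3).
        for i in (0, 1, 2, 0, 2, 3):
            cx, cy = corners[i]
            data.extend((cx, cy, b.z, *b.fill))
    return data
```

The resulting array holds `len(boxes) * 6` vertices, so the draw count no longer depends on the number of boxes. Borders could be packed the same way with four line segments per box.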
Enable the depth test when drawing UI, so that the Z value is used on the GPU. Some instances will then no longer need to be sorted to draw from back to front (a separate list can track the instances that still need sorting). This technically increases the work the GPU has to do, but the GPU is not fully utilized by the UI-drawing code because of the overhead of issuing a separate draw call for each box. The GPU may also be able to perform early-Z testing, which avoids texture look-ups in the fragment shader for fragments that already have something drawn in front of them.
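The split between instances that can rely on the depth test and instances that still need sorting might look like the sketch below. The `Instance` type and its `alpha` field are hypothetical, and the sketch assumes that a larger Z value means farther from the viewer; flip the sort direction if Slice uses the opposite convention.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    """Hypothetical drawable: a name, a depth, and an overall opacity."""
    name: str
    z: float
    alpha: float


def split_draw_lists(instances):
    """Separate opaque instances from those that still need sorting.

    Opaque instances keep their incoming order, since the depth test
    resolves their overlap. Translucent instances are sorted back to
    front (assumption: larger z is farther away).
    """
    opaque = [i for i in instances if i.alpha >= 1.0]
    translucent = sorted(
        (i for i in instances if i.alpha < 1.0),
        key=lambda i: i.z,
        reverse=True,
    )
    return opaque, translucent
```

Drawing the opaque list first, with depth writes on, and the sorted translucent list afterwards keeps blending correct while limiting the sort to the instances that actually need it.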
This will also require preventing app code from setting box dimensions directly (since doing so would not update the contents of the related buffer textures), so it may break a lot of UI code in existing Slice apps.
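One possible way to soften that restriction is to route dimension changes through a setter that records which boxes need their buffer contents rewritten. The sketch below is only an illustration of that idea: `BoxStore`, `TrackedBox`, and the `dirty` set are hypothetical names, not part of Slice.

```python
class BoxStore:
    """Hypothetical owner of the packed buffers; tracks changed boxes."""

    def __init__(self):
        self.dirty = set()  # indices of boxes whose buffer data is stale


class TrackedBox:
    """Hypothetical box whose width can only change through a setter."""

    def __init__(self, store, index, width, height):
        self._store = store
        self._index = index
        self._width = width
        self._height = height

    @property
    def width(self):
        return self._width

    @width.setter
    def width(self, value):
        # Only mark the box dirty when the value actually changes.
        if value != self._width:
            self._width = value
            self._store.dirty.add(self._index)
```

Before each frame, the renderer would rewrite only the buffer ranges listed in `dirty` and then clear the set. A matching setter for height would follow the same pattern.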