
[SPIKE] investigate options for custom work item types with custom defined attributes per type


The Plan team has an ambitious initiative to expand the current issues functionality with extensible issues. The goal is to allow for the level of customization many customers look for in order to model their business flows and processes as closely as possible. Here is @gweaver's summary on the goal of the initiative: https://youtu.be/PjAV2E19uIk

Topic to Evaluate

In this technical spike, I'm looking to present a DB-level data architecture and start a healthy discussion and evaluation of the technical approach to achieve the required customization while also keeping performance top of mind. This only looks into work item types and work item custom attributes (sometimes called widgets). There are other aspects to be investigated that will not be part of this spike; just to give some idea, take for instance the ability to assemble work item types into hierarchies, then use those work item type hierarchies to dictate the relationships between work items themselves, or to define workflows and automations on work items, or composable templates, and more.

After this spike we should be able to answer whether the presented technical approach is worth pursuing further as an implementation path, whether it is worth doing a narrower PoC answering particular questions or concerns, or whether we can dismiss it as not feasible.

This spike comes out of the technical discussion on the issue initiative.

The two questions

  • How to store work items for custom types?
  • How to store attributes for each of the work items given the customization?

Question One: Options for storing work items

Option 1: A separate table per work item type

I did not spend too much time investigating this, as it does not seem feasible to me to define a table whenever a customer wants to create a work item type. As a simple sanity check, I've considered that on average a fairly large organization may need 50-100 work item types to cover all of its processes and departments: e.g. OKR, Initiative, Epic for product planning; Feature, Issue, Bug, Tech Debt for project management; there are also Requirements, Test Cases, Incidents, maybe something that Legal or Operations would use, etc. I've also considered that we currently have ~1.5M top-level groups; even if we take only 10% of those to be using custom work items, that is 150K namespaces, and at 50-100 types each that would mean 7.5M-15M tables, so we would not be able to create a table for each customer-defined type.

Option 2: A separate table per predefined work item types and a separate table for user defined work items

I did not spend too much time on this option either, as I considered it a subset of the next one, Single table to track all work items. We may still end up in the Single table to track all work items scenario anyway, if a customer does not use the default types and instead defines and uses only their own.

Option 3: Single table to track all work items

I've looked towards this approach as it seemed the most straightforward, and because some of the other ideas I had were a subset of it anyway. In this case I was considering from the get-go that the table storing work items would be very thin: it would not contain any of the work item attributes other than the most basic ones. E.g.:

CREATE TABLE work_items (
    id bigint NOT NULL,
    work_item_type_id bigint NOT NULL,
    -- using project_id as this relates to the `issues` table the easiest, but it can be
    -- group_id or namespace_id, wherever we end up with the groups and projects consolidation
    project_id bigint NOT NULL,
    created_at timestamp,
    updated_at timestamp
);

This would also align with our current sharding initiative, where we could then shard work_items per root namespace.
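
As a minimal usage sketch (the work_item_types table is shown in the next section; the ids and values here are hypothetical), creating a customer-defined type then becomes a data change rather than a DDL change:

-- A customer-defined type is just a row, not a new table:
INSERT INTO work_item_types (id, title) VALUES (4, 'Tech Debt');

-- Work items of any type, predefined or custom, land in the same thin table:
INSERT INTO work_items (id, work_item_type_id, project_id, created_at, updated_at)
VALUES (1, 4, 1000, now(), now());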

Question two: Options for defining custom attributes and storing the values for each work item

The next question is: how do we store custom-defined attributes for every record in work_items?

Store custom attributes as structured data in a JSONB column

The first option, which I eventually dismissed, is having a data column on work_items to store the values for custom-defined attributes. We could make it obey a certain JSON schema so that we would know what to query and how. The main issue I have with this approach is that on the current issues table, and consequently on work_items, we do (and plan to do even more) data aggregation, grouping, and ordering by values, which does not perform well with JSON/JSONB attributes. There is also no strong data type validation, and work items are very active objects that change a lot, so having to frequently update a quite big and heavily indexed column is not performant. This would perhaps be a very good option for storing highly customized data that is written once and then presented as-is in many places.
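
For illustration, this is roughly what the dismissed option would have looked like; the column name and JSON keys are hypothetical:

-- All custom attribute values in one schemaless column:
ALTER TABLE work_items ADD COLUMN data jsonb NOT NULL DEFAULT '{}';

-- Simple fetches read almost like regular columns...
SELECT id, data->>'title' AS title FROM work_items WHERE work_item_type_id = 1;

-- ...but every aggregation or ordering forces a per-row cast, because
-- ->> always yields text, and there is no type validation on write:
SELECT (data->>'weight')::bigint AS weight, count(*)
FROM work_items
GROUP BY 1
ORDER BY 1 DESC;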

So to summarise:

  • Pros:
    • allows for high level of customization required for the initiative
    • easy to set up
    • writing simple queries for fetching data is very similar to having a column in a table, i.e. easy
  • Cons:
    • no strongly typed data, all data is text
    • does not perform well when having to aggregate data on attributes within JSON
    • does not perform well when it needs to be updated often
    • no referential integrity checks(FKs)
    • no possibility to partition JSON data later on

Entity Attribute Values (EAV) With Values As Text

I only briefly considered this before being pointed to this article, https://tapoueh.org/blog/2018/03/database-modelization-anti-patterns/#entity-attribute-values, so I think this is a no-go either.
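
For reference, the text-valued EAV shape that the article warns about would be something like this (a hedged sketch, not something I built):

CREATE TABLE work_item_attributes (
    work_item_id bigint NOT NULL,
    attribute text NOT NULL,  -- e.g. 'weight'
    value text NOT NULL       -- every value squeezed into text
);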

Entity Attribute Values Inspired Variation

Even though I dismissed classic EAV quite quickly, the overall data structure gives a lot of room for customization; the open questions are the ability to have strong data types and, even more so, performance. I've attempted a variation on EAV with strong data type capability, which still needs to prove that it can perform well enough.

And here is where the technical part of my investigation begins!!! 😸

Important:

  • All query plans are done on warm cache.
  • Performance of inserts and updates was not tested.

The table structure

(diagram: work_items_diagram)
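
In case the diagram does not render here, below is a minimal sketch of the schema as I can reconstruct it from the column and index names that appear in the query plans. The exact types, constraints, and the field_type mapping are assumptions, not the spike's actual DDL:

CREATE TABLE work_item_types (
    id bigint PRIMARY KEY,
    title text NOT NULL
);

CREATE TABLE work_item_types_fields_metadata (
    id bigint PRIMARY KEY,
    work_item_type_id bigint NOT NULL REFERENCES work_item_types (id),
    title text NOT NULL,
    -- assumed mapping, inferred from the sample data below:
    -- 1: string, 2: text, 3: bigint/reference, 4: date, 5: boolean
    field_type smallint NOT NULL
);

CREATE TABLE work_item_fields_values (
    id bigint PRIMARY KEY,
    work_item_id bigint NOT NULL,
    work_item_type_field_metadata_id bigint NOT NULL
        REFERENCES work_item_types_fields_metadata (id),
    -- presumably one nullable column per base DB type:
    bigint_value bigint,
    string_value text
);

-- Indexes implied by the plans below; the updated plans additionally use a covering
-- variant, presumably (work_item_type_field_metadata_id, bigint_value, work_item_id).
CREATE INDEX work_item_fields_values_wi_field_metadata_id_bigint_value
    ON work_item_fields_values (work_item_type_field_metadata_id, bigint_value);
CREATE INDEX wi_fields_values_wi_id_and_wi_type_field_metadata_id
    ON work_item_fields_values (work_item_id, work_item_type_field_metadata_id);

With this shape, fetching all attribute values for a single (hypothetical) work item is a straight join:

SELECT m.title AS field, v.bigint_value, v.string_value
FROM work_item_fields_values v
JOIN work_item_types_fields_metadata m ON m.id = v.work_item_type_field_metadata_id
WHERE v.work_item_id = 42;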

I thought I would treat the data as if it were sharded to a single namespace, and try to validate that we can query issues within a single namespace with reasonable performance.

So after populating the tables with some data, here is what I have:

  • Defined some work item types:
gitlabhq_dblab=# select * from work_item_types;
 id |    title
----+-------------
  0 | Issue
  1 | Incident
  2 | Test Case
  3 | Requirement
(4 rows)
  • Defined some of the common issue attributes as custom attributes
gitlabhq_dblab=# select * from work_item_types_fields_metadata;
 id | work_item_type_id |      title      | field_type
----+-------------------+-----------------+------------
  1 |                 1 | Title           |          1
  2 |                 1 | Description     |          2
  3 |                 1 | Weight          |          3
  4 |                 1 | Health Status   |          3
  5 |                 1 | Epic            |          3
  6 |                 1 | Milestone       |          3
  7 |                 1 | Iteration       |          3
  8 |                 1 | Assignees       |          3
  9 |                 1 | Labels          |          3
 10 |                 1 | Due Date        |          4
 11 |                 1 | Confidentiality |          5
 12 |                 1 | Lock            |          5
 13 |                 1 | Status          |          3
(13 rows)
  • imported ~700K issues and treated them as issues of a single project/namespace
gitlabhq_dblab=# select count(*) from work_items;
 count
--------
 720555
(1 row)
  • obviously the distribution is skewed towards the Issue type right now
gitlabhq_dblab=# select work_item_type_id, count(*) from work_items group by work_item_type_id;
 work_item_type_id | count
-------------------+--------
                 0 | 720326
                 1 |    185
                 2 |     44
(3 rows)
  • imported the values for the above-defined custom attributes (work_item_types_fields_metadata) into the work_item_fields_values table, and got ~8M records:
gitlabhq_dblab=# select count(*) from work_item_fields_values;
  count
---------
 8101713
(1 row)
Let's query some data:

  • fetch the first 10K work item ids and weights, ordered by weight in descending order, ~14ms:
  explain analyze
  SELECT
      work_item_id,
      work_item_fields_values.bigint_value
  FROM
      work_item_fields_values
  WHERE
      work_item_type_field_metadata_id = 3
  ORDER BY
      work_item_fields_values.bigint_value DESC
  LIMIT 10000


 Limit  (cost=0.56..1639.92 rows=10000 width=16) (actual time=0.065..13.491 rows=10000 loops=1)
   ->  Index Scan Backward using work_item_fields_values_wi_field_metadata_id_bigint_value on work_item_fields_values  (cost=0.56..99462.51 rows=606710 width=16) (actual time=0.063..12.177 rows=10000 loops=1)
         Index Cond: (work_item_type_field_metadata_id = 3)
 Planning Time: 0.541 ms
 Execution Time: 14.252 ms
(5 rows)
  • keep the same ordering and count as above, but also select milestone_id, ~60ms: https://explain.depesz.com/s/RjmE This is where the queries start getting complicated; thanks to @ahegyi for his help figuring out how to approach it.
  explain analyze
  select work_item_id, wifv.bigint_value as weight, milestone_query.milestone
  from work_item_fields_values wifv 
  LEFT JOIN LATERAL (
	  SELECT bigint_value as milestone 
	  from work_item_fields_values as milestones_tbl 
	  WHERE work_item_type_field_metadata_id = 6 
	  AND milestones_tbl.work_item_id=wifv.work_item_id 
	  limit 1
	) as milestone_query ON TRUE
  where wifv.work_item_type_field_metadata_id = 3
  order by wifv.bigint_value desc
  limit 10000

Limit  (cost=0.99..36365.35 rows=10000 width=24) (actual time=0.050..54.411 rows=10000 loops=1)
   ->  Nested Loop Left Join  (cost=0.99..2206262.99 rows=606710 width=24) (actual time=0.049..52.809 rows=10000 loops=1)
         ->  Index Scan Backward using work_item_fields_values_wi_field_metadata_id_bigint_value on work_item_fields_values wifv  (cost=0.56..99462.51 rows=606710 width=16) (actual time=0.031..7.351 rows=10000 loops=1)
               Index Cond: (work_item_type_field_metadata_id = 3)
         ->  Limit  (cost=0.43..3.45 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=10000)
               ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values milestones_tbl  (cost=0.43..3.45 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=10000)
                     Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 6))
 Planning Time: 0.367 ms
 Execution Time: 55.186 ms
(9 rows)

explain analyze
  select work_item_id, title, milestone_query.milestone, health_status_query.health_status, iteration
  from work_item_fields_values wifv 
  LEFT JOIN LATERAL (
	  SELECT bigint_value as milestone 
	  from work_item_fields_values as milestones_tbl 
	  WHERE work_item_type_field_metadata_id = 6 
	  AND milestones_tbl.work_item_id=wifv.work_item_id 
	  limit 1
	) as milestone_query ON TRUE
  LEFT JOIN LATERAL (
	  SELECT bigint_value as health_status 
	  from work_item_fields_values as health_statuses 
	  WHERE work_item_type_field_metadata_id = 4 
	  AND health_statuses.work_item_id=wifv.work_item_id 
	  limit 1
	) as health_status_query ON TRUE
  LEFT JOIN LATERAL (
	  SELECT bigint_value as iteration 
	  from work_item_fields_values as iterations 
	  WHERE work_item_type_field_metadata_id = 7 
	  AND iterations.work_item_id=wifv.work_item_id 
	  limit 1
	) as iterations_query ON TRUE 
  LEFT JOIN LATERAL (
	  SELECT string_value as title 
	  from work_item_fields_values as titles 
	  WHERE work_item_type_field_metadata_id = 1 
	  AND titles.work_item_id=wifv.work_item_id 
	  limit 1
	) as title_query ON TRUE
  where wifv.work_item_type_field_metadata_id = 3 and wifv.bigint_value > 7 and iteration = 1885566
  order by wifv.bigint_value asc
  limit 10000;


Limit  (cost=2.29..141166.33 rows=10000 width=100) (actual time=45.629..131.211 rows=33 loops=1)
   ->  Nested Loop Left Join  (cost=2.29..5798978.74 rows=410797 width=100) (actual time=45.627..131.203 rows=33 loops=1)
         ->  Nested Loop  (cost=1.85..4372486.16 rows=410797 width=40) (actual time=45.603..131.065 rows=33 loops=1)
               ->  Nested Loop Left Join  (cost=1.42..2944966.58 rows=410797 width=32) (actual time=0.083..98.776 rows=12073 loops=1)
                     ->  Nested Loop Left Join  (cost=0.99..1518474.00 rows=410797 width=24) (actual time=0.062..67.128 rows=12073 loops=1)
                           ->  Index Scan using work_item_fields_values_wi_field_metadata_id_bigint_value on work_item_fields_values wifv  (cost=0.56..91981.42 rows=410797 width=16) (actual time=0.036..9.353 rows=12073 loops=1)
                                 Index Cond: ((work_item_type_field_metadata_id = 3) AND (bigint_value > 7))
                           ->  Limit  (cost=0.43..3.45 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=12073)
                                 ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values milestones_tbl  (cost=0.43..3.45 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=12073)
                                       Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 6))
                     ->  Limit  (cost=0.43..3.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
                           ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values health_statuses  (cost=0.43..3.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
                                 Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 4))
               ->  Subquery Scan on iterations_query  (cost=0.43..3.47 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
                     Filter: (iterations_query.iteration = 1885566)
                     Rows Removed by Filter: 0
                     ->  Limit  (cost=0.43..3.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
                           ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values iterations  (cost=0.43..3.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
                                 Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 7))
         ->  Limit  (cost=0.43..3.45 rows=1 width=60) (actual time=0.003..0.003 rows=1 loops=33)
               ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values titles  (cost=0.43..3.45 rows=1 width=60) (actual time=0.003..0.003 rows=1 loops=33)
                     Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 1))
 Planning Time: 1.037 ms
 Execution Time: 131.370 ms
(24 rows)

Some conclusions:

  • implement sane limits: you can see that even with 13 fields and 700K work items we got ~8M records in work_item_fields_values. I estimate that an organization could have 2000 or more fields across all work item types, if we consider an average of 100 work item types and 20 fields per type. But I think this is where we can impose some sane limits: e.g. max 2000 fields per organization (i.e. root namespace), and similarly, for instance, max 50 fields per work item type.
  • data partitioning: we can also consider partitioning data by work_item_type_field_metadata_id, as that data would be solely related to a single field (see the sketches after this list).
  • single data type per table: instead of having a single table with separate columns per base DB type, we could move each DB type to a separate table.
  • separate table per relationship: in this example I've put milestone_id, iteration_id, and other id references into a single bigint_value column for the sake of data volume; it would make more sense to have a separate table for each of the first-class object relationships, so that we can also keep referential integrity checks (FKs) — again, see the sketches after this list.
  • we should analyze the tradeoffs: the queries become more complex in order to achieve the performance, and some of the complexity will also move to the application layer, where we will need to build these queries.
  • this is a spike, not a full solution: this is definitely not exhaustive testing of this approach, and it would require a very specific approach for each of the queries we are going to build, given the level of customization we want to support.
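
Minimal sketches of the partitioning and per-relationship ideas from the list above; these are assumptions to illustrate the direction, not something tested in this spike:

-- Hash-partitioning by field metadata id: all values of a given field
-- land in exactly one partition (the partition key must be part of the PK):
CREATE TABLE work_item_fields_values_partitioned (
    id bigint NOT NULL,
    work_item_id bigint NOT NULL,
    work_item_type_field_metadata_id bigint NOT NULL,
    bigint_value bigint,
    string_value text,
    PRIMARY KEY (id, work_item_type_field_metadata_id)
) PARTITION BY HASH (work_item_type_field_metadata_id);

CREATE TABLE work_item_fields_values_part_0
    PARTITION OF work_item_fields_values_partitioned
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
-- ...and so on for remainders 1 through 7.

-- A separate table per first-class relationship keeps real FKs:
CREATE TABLE work_item_milestone_values (
    work_item_id bigint NOT NULL,
    milestone_id bigint NOT NULL REFERENCES milestones (id)
);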

This is a pros and cons list that is by no means definitive:

  • Pros:
    • allows for high level of customization required for the initiative
    • potential to perform well
    • strongly typed data (values are stored in columns of their native DB type)
    • referential integrity checks
    • potential to partition data
  • Cons:
    • complexity of querying
    • complexity on application level
    • more complicated aggregations/computations would need specific approaches to solve for

Risks and Implementation Considerations

This is a big change, and considering that it links into a lot of product areas it becomes complex very fast. So I would like us to weigh the pros and cons before we decide to engage in implementing this one way or the other. The change has to happen or we would be left behind, but we need to have a good understanding of the path we are taking and iterate in small steps towards it.

Updated Queries:

Decided to add a few more values to check how performance might degrade, so here are some updated numbers to look at:

  • now we got ~3x more work_items (i.e. issues):
gitlabhq_dblab=# select count(*) from work_items;
  count
---------
 2439299
(1 row)
  • ~3x more work items resulted in almost ~2.5x more values. Obviously this table is very dependent on how many attributes each work item has and how much data each attribute contains: for 2.4M work_items the max number of value records would be 2.4M * 13 (given the 13 existing attributes) => 31.2M rows; we got ~70% of that max, at 21.6M:
gitlabhq_dblab=# select count(*) from work_item_fields_values;
  count
----------
 21631904
(1 row)
  • values distribution for the above-defined fields metadata (work_item_types_fields_metadata) ids 1 - 13:
gitlabhq_dblab=# select work_item_type_field_metadata_id, count(*) from work_item_fields_values group by work_item_type_field_metadata_id order by count(*) desc limit 20;

 work_item_type_field_metadata_id |  count
----------------------------------+---------
                                9 | 8502563
                               11 | 4558945
                                1 | 4127688
                               13 | 2639750
                                3 |  816668
                                8 |  484349
                                6 |  267844
                                2 |   91257
                                5 |   60533
                               10 |   54710
                               12 |   18298
                                7 |    7224
                                4 |    2075
(13 rows)
  • checking performance for the above queries with 2.5x-3x more data; given the values distribution, it would also be interesting to check against the field with the most values, work_item_type_field_metadata_id = 9 (Labels)

  • fetch the first 10K work item ids and weights, ordered by weight in descending order

  explain analyze
  SELECT
      work_item_id,
      work_item_fields_values.bigint_value
  FROM
      work_item_fields_values
  WHERE
      work_item_type_field_metadata_id = 3
  ORDER BY
      work_item_fields_values.bigint_value DESC
  LIMIT 10000


Limit  (cost=0.56..275.13 rows=10000 width=16) (actual time=0.038..3.997 rows=10000 loops=1)
   ->  Index Only Scan Backward using work_item_fields_values_wi_field_metadata_id_bigint_work_item on work_item_fields_values  (cost=0.56..22465.40 rows=818191 width=16) (actual time=0.036..2.701 rows=10000 loops=1)
         Index Cond: (work_item_type_field_metadata_id = 3)
         Heap Fetches: 0
 Planning Time: 0.254 ms
 Execution Time: 4.638 ms
(6 rows)
  • fetch the first 10K work item ids and label_ids, ordered by label_id in descending order
  explain analyze
  SELECT
      work_item_id,
      work_item_fields_values.bigint_value
  FROM
      work_item_fields_values
  WHERE
      work_item_type_field_metadata_id = 9
  ORDER BY
      work_item_fields_values.bigint_value DESC
  LIMIT 10000


 Limit  (cost=0.56..275.11 rows=10000 width=16) (actual time=0.050..3.624 rows=10000 loops=1)
   ->  Index Only Scan Backward using work_item_fields_values_wi_field_metadata_id_bigint_work_item on work_item_fields_values  (cost=0.56..233508.60 rows=8505088 width=16) (actual time=0.048..2.360 rows=10000 loops=1)
         Index Cond: (work_item_type_field_metadata_id = 9)
         Heap Fetches: 0
 Planning Time: 0.195 ms
 Execution Time: 4.257 ms
(6 rows)
  • fetch the first 10K work item id, title, milestone_id, health_status, and iteration_id, ordered by weight in descending order, where weight is higher than 7
explain analyze
  select work_item_id, title, milestone_query.milestone, health_status_query.health_status, iteration
  from work_item_fields_values wifv 
  LEFT JOIN LATERAL (
	  SELECT bigint_value as milestone 
	  from work_item_fields_values as milestones_tbl 
	  WHERE work_item_type_field_metadata_id = 6 
	  AND milestones_tbl.work_item_id=wifv.work_item_id 
	  limit 1
	) as milestone_query ON TRUE
  LEFT JOIN LATERAL (
	  SELECT bigint_value as health_status 
	  from work_item_fields_values as health_statuses 
	  WHERE work_item_type_field_metadata_id = 4 
	  AND health_statuses.work_item_id=wifv.work_item_id 
	  limit 1
	) as health_status_query ON TRUE
  LEFT JOIN LATERAL (
	  SELECT bigint_value as iteration 
	  from work_item_fields_values as iterations 
	  WHERE work_item_type_field_metadata_id = 7 
	  AND iterations.work_item_id=wifv.work_item_id 
	  limit 1
	) as iterations_query ON TRUE 
  LEFT JOIN LATERAL (
	  SELECT string_value as title 
	  from work_item_fields_values as titles 
	  WHERE work_item_type_field_metadata_id = 1 
	  AND titles.work_item_id=wifv.work_item_id 
	  limit 1
	) as title_query ON TRUE
  where wifv.work_item_type_field_metadata_id = 3 and wifv.bigint_value > 7 and iteration = 1885566
  order by wifv.bigint_value asc
  limit 10000;


 Limit  (cost=2.81..136420.09 rows=10000 width=82) (actual time=43.485..219.670 rows=33 loops=1)
   ->  Nested Loop Left Join  (cost=2.81..4821794.29 rows=353459 width=82) (actual time=43.483..219.662 rows=33 loops=1)
         ->  Nested Loop  (cost=2.25..3831480.53 rows=353459 width=40) (actual time=43.472..219.520 rows=33 loops=1)
               ->  Nested Loop Left Join  (cost=1.69..2557260.84 rows=353459 width=32) (actual time=0.049..158.066 rows=22442 loops=1)
                     ->  Nested Loop Left Join  (cost=1.12..1283924.79 rows=353459 width=24) (actual time=0.044..98.470 rows=22442 loops=1)
                           ->  Index Only Scan using work_item_fields_values_wi_field_metadata_id_bigint_work_item on work_item_fields_values wifv  (cost=0.56..10588.74 rows=353459 width=16) (actual time=0.027..6.206 rows=22442 loops=1)
                                 Index Cond: ((work_item_type_field_metadata_id = 3) AND (bigint_value > 7))
                                 Heap Fetches: 0
                           ->  Limit  (cost=0.56..3.58 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=22442)
                                 ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values milestones_tbl  (cost=0.56..3.58 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=22442)
                                       Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 6))
                     ->  Limit  (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
                           ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values health_statuses  (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
                                 Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 4))
               ->  Subquery Scan on iterations_query  (cost=0.56..3.59 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
                     Filter: (iterations_query.iteration = 1885566)
                     Rows Removed by Filter: 0
                     ->  Limit  (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
                           ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values iterations  (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
                                 Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 7))
         ->  Limit  (cost=0.56..2.78 rows=1 width=42) (actual time=0.003..0.003 rows=1 loops=33)
               ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values titles  (cost=0.56..5.00 rows=2 width=42) (actual time=0.003..0.003 rows=1 loops=33)
                     Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 1))
 Planning Time: 0.809 ms
 Execution Time: 219.752 ms
  • fetch the first 10K work item id, title, milestone_id, health_status, and iteration_id, ordered by label priority in descending order, with label priority lower than or equal to 7
explain (analyze, buffers)
select work_item_id, title_query.title, milestone_query.milestone, health_status_query.health_status, iteration
  from work_item_fields_values wifv 
  LEFT JOIN LATERAL (
	  SELECT bigint_value as milestone 
	  from work_item_fields_values as milestones_tbl 
	  WHERE work_item_type_field_metadata_id = 6 
	  AND milestones_tbl.work_item_id=wifv.work_item_id 
	  limit 1
	) as milestone_query ON TRUE
  LEFT JOIN LATERAL (
	  SELECT bigint_value as health_status 
	  from work_item_fields_values as health_statuses 
	  WHERE work_item_type_field_metadata_id = 4 
	  AND health_statuses.work_item_id=wifv.work_item_id 
	  limit 1
	) as health_status_query ON TRUE
  LEFT JOIN LATERAL (
	  SELECT bigint_value as iteration 
	  from work_item_fields_values as iterations 
	  WHERE work_item_type_field_metadata_id = 7 
	  AND iterations.work_item_id=wifv.work_item_id 
	  limit 1
	) as iterations_query ON TRUE 
  LEFT JOIN LATERAL (
	  SELECT string_value as title 
	  from work_item_fields_values as titles 
	  WHERE work_item_type_field_metadata_id = 1 
	  AND titles.work_item_id=wifv.work_item_id 
	  limit 1
	) as title_query ON TRUE
  inner join labels on wifv.bigint_value = labels.id
  inner join label_priorities on label_priorities.label_id = labels.id
  where wifv.work_item_type_field_metadata_id = 9 and label_priorities.priority <= 7 
  order by label_priorities.priority desc
  limit 10000


 Limit  (cost=2.81..136395.09 rows=10000 width=82) (actual time=0.061..132.064 rows=10000 loops=1)
   ->  Nested Loop Left Join  (cost=2.81..4820910.64 rows=353459 width=82) (actual time=0.060..130.548 rows=10000 loops=1)
         ->  Nested Loop Left Join  (cost=2.25..3830596.88 rows=353459 width=40) (actual time=0.052..95.136 rows=10000 loops=1)
               ->  Nested Loop Left Join  (cost=1.69..2557260.84 rows=353459 width=32) (actual time=0.049..69.068 rows=10000 loops=1)
                     ->  Nested Loop Left Join  (cost=1.12..1283924.79 rows=353459 width=24) (actual time=0.044..42.822 rows=10000 loops=1)
                           ->  Index Only Scan using work_item_fields_values_wi_field_metadata_id_bigint_work_item on work_item_fields_values wifv  (cost=0.56..10588.74 rows=353459 width=16) (actual time=0.028..3.294 rows=10000 loops=1)
                                 Index Cond: ((work_item_type_field_metadata_id = 3) AND (bigint_value > 7))
                                 Heap Fetches: 0
                           ->  Limit  (cost=0.56..3.58 rows=1 width=8) (actual time=0.003..0.004 rows=0 loops=10000)
                                 ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values milestones_tbl  (cost=0.56..3.58 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=10000)
                                       Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 6))
                     ->  Limit  (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=10000)
                           ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values health_statuses  (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=10000)
                                 Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 4))
               ->  Limit  (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=10000)
                     ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values iterations  (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=10000)
                           Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 7))
         ->  Limit  (cost=0.56..2.78 rows=1 width=42) (actual time=0.003..0.003 rows=1 loops=10000)
               ->  Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values titles  (cost=0.56..5.00 rows=2 width=42) (actual time=0.003..0.003 rows=1 loops=10000)
                     Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 1))
 Planning Time: 0.648 ms
 Execution Time: 133.083 ms

Comparison Table

queries                  | by weight plan/exec times | by weight 2.5x-3x plan/exec times | by label 2.5x-3x plan/exec times
10K items single attr    | 0.541 ms/14.252 ms        | 0.254 ms/4.638 ms                 | 0.195 ms/4.257 ms
10K items multiple attrs | 1.037 ms/131.370 ms       | 0.809 ms/219.752 ms               | 10.017 ms/196.594 ms

Problematic queries

  • Filtering by more than one attribute requires joining the work_item_fields_values table as many times as the number of attributes we filter by. In the example below, the CTE is marked MATERIALIZED so that the self-join, ordering, and LIMIT are resolved first, and the LATERAL lookups then only run against the at most 10K rows the CTE produces.
WITH cte as MATERIALIZED (
  select value1.work_item_id, value1.bigint_value as weight, value2.bigint_value as iteration_id
	from work_item_fields_values as value1
	inner join work_item_fields_values as value2 on value2.work_item_id = value1.work_item_id
	where
		value1.work_item_type_field_metadata_id = 3 AND value2.work_item_type_field_metadata_id = 7
		 and value1.bigint_value is not null and value2.bigint_value = 1905485
  order by value1.bigint_value asc
  limit 10000
)
select *
from cte, LATERAL (
  SELECT bigint_value as health_status 
  from work_item_fields_values as health_statuses 
  WHERE work_item_type_field_metadata_id = 4 
  AND health_statuses.work_item_id=cte.work_item_id 
  limit 1
) as health_status_query, LATERAL (
  SELECT bigint_value as iteration 
  from work_item_fields_values as iterations 
  WHERE work_item_type_field_metadata_id = 7 
  AND iterations.work_item_id=cte.work_item_id 
  limit 1
) as iterations_query, LATERAL (
  SELECT string_value as title 
  from work_item_fields_values as titles 
  WHERE work_item_type_field_metadata_id = 1 
  AND titles.work_item_id=cte.work_item_id 
  limit 1
) as title_query
