[SPIKE] investigate options for custom work item types with custom defined attributes per type
The Plan team has an ambitious initiative: expanding the current issues functionality into extensible issues. The goal is to allow for the level of customization many customers look for in order to model their business flows and processes as closely as possible. Here is @gweaver's summary of the goal of the initiative: https://youtu.be/PjAV2E19uIk
Topic to Evaluate
In this technical spike, I'm looking to present a DB-level data architecture and to start a healthy discussion and evaluation of the technical approach to achieve the required customization, while also keeping performance top of mind. This only looks into work item types and work item custom attributes (sometimes called widgets). There are other aspects to be investigated that will not be part of this spike; to give some idea, take for instance the ability to assemble work item types into hierarchies, then use those hierarchies to dictate the relationships between work items themselves, or to define workflows and automations on work items, or composable templates, and more.
After this spike we should be able to answer whether the presented technical approach is worth pursuing further as an implementation path, whether it is worth doing a narrower PoC answering particular questions or concerns, or whether we can dismiss it as not feasible.
This spike comes out of the technical discussion on the issues initiative.
The two questions
Question One: Options for storing work items
Option 1: A separate table per work item type
I did not spend too much time investigating this, as it does not seem feasible to me to define a table whenever a customer wants to create a work item type. For a simple sanity check, I've considered that on average a fairly large organization may need 50 - 100 work item types to cover all of its processes and departments: e.g. OKR, Initiative, Epic for product planning; Feature, Issue, Bug, Tech Debt for project management; there are also Requirements, Test Cases, Incidents, maybe something that Legal or Operations would use, etc. I've also considered that we currently have ~1.5M top-level groups; even if we take only 10% of those to be using custom work items, we would not be able to create a table for each customer-defined type.
Option 2: A separate table per predefined work item type and a separate table for user-defined work items
I did not spend too much time on this option either, as I considered it a subset of the next one, Single table to track all work items. We may still end up in that scenario anyway if a customer skips the default types, defines their own, and uses only those.
Option 3: Single table to track all work items
I've looked towards this approach as it seemed the most straightforward, and because some of the other ideas I had were a subset of it anyway. In this case I was considering from the get-go that the table storing work items would be very thin, and would not contain any of the work item attributes other than the most basic ones. E.g.
CREATE TABLE work_items (
    id bigint NOT NULL,
    work_item_type_id bigint NOT NULL,
    -- using project_id as this relates to the `issues` table the easiest, but it can be
    -- group_id/namespace_id, wherever we end up with the groups and projects consolidation
    project_id bigint NOT NULL,
    created_at timestamp,
    updated_at timestamp
);
This would also align with our current sharding initiative, where we can then shard `work_items` per root namespace.
Question Two: Options for defining custom attributes and storing the values for each work item
The next question is: how do we store custom-defined attributes for every record in `work_items`?
Store custom attributes as structured data in a JSONB column
The first option, which I eventually dismissed, is having a `data` column on `work_items` to store the values for custom-defined attributes. We could make it obey a certain JSON schema so that we would know what and how to query. The main issue I have with this approach is that on the current `issues` table, and consequently on `work_items`, we do, and plan to do even more, data aggregation, grouping, and ordering by values, which does not perform well with JSON/JSONB attributes. There is also no strong data type validation, and work items are very active objects that change a lot; having to update a quite big and heavily indexed column is not performant. This would perhaps be a very good option for storing highly customized data that is written once and then presented as-is in many places.
So to summarise:
- Pros:
  - allows for the high level of customization required by the initiative
  - easy to set up
  - writing simple queries for fetching data is very similar to having a column in a table, i.e. easy
- Cons:
  - no strongly typed data, all data is text
  - does not perform well when aggregating data on attributes within JSON
  - does not perform well when it needs to be updated often
  - no referential integrity checks (FKs)
  - no possibility to partition the JSON data later on
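To make the trade-offs concrete, here is a minimal sketch of what this option could look like; the `data` column, the `weight` attribute key, and the index name are all hypothetical:

```sql
-- Hypothetical JSONB variant: all custom attribute values live in a single column.
ALTER TABLE work_items ADD COLUMN data jsonb;

-- Ordering/filtering by a custom attribute means extracting and casting on the fly;
-- an expression index helps reads, but every indexed attribute is one more index
-- to maintain on a very frequently updated table.
CREATE INDEX index_work_items_on_data_weight
    ON work_items (((data ->> 'weight')::bigint));

SELECT id, (data ->> 'weight')::bigint AS weight
FROM work_items
ORDER BY (data ->> 'weight')::bigint DESC NULLS LAST
LIMIT 10000;
```

Every attribute we want to sort or aggregate on would need its own expression index like the one above, which is where the write amplification concern comes from.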
Entity Attribute Values (EAV) With Values As Text
I briefly considered this, but I was pointed to this article https://tapoueh.org/blog/2018/03/database-modelization-anti-patterns/#entity-attribute-values, so I think this is a no-go either.
Entity Attribute Values Inspired Variation
Even though I've dismissed plain EAV quite quickly, the overall data structure gives a lot of room for customization; there are questions about the ability to have strong data types, as well as a quite big question on performance. I've attempted a variation on EAV with strong data type capability, which still needs to prove that it can perform well enough.
And here is where the technical part of my investigation begins!!!
Important:
- All query plans are done on warm cache.
- Performance of inserts and updates was not tested.
The table structure
I thought I would treat the data as if it were sharded to a single namespace, and try to validate that we can query issues within a single namespace with reasonable performance.
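For reference, here is a sketch of the tables used in this experiment. The column list is inferred from the queries below; the exact types, constraints, and the set of per-type value columns are assumptions:

```sql
CREATE TABLE work_item_types (
    id bigint NOT NULL PRIMARY KEY,
    title text NOT NULL
);

-- one row per custom attribute defined on a work item type
CREATE TABLE work_item_types_fields_metadata (
    id bigint NOT NULL PRIMARY KEY,
    work_item_type_id bigint NOT NULL REFERENCES work_item_types (id),
    title text NOT NULL,
    field_type smallint NOT NULL -- enum of base DB types, e.g. string/bigint/date/boolean
);

-- one row per (work item, attribute) pair, with one value column per base DB type;
-- only the column matching the attribute's field_type is populated
CREATE TABLE work_item_fields_values (
    id bigint NOT NULL PRIMARY KEY,
    work_item_id bigint NOT NULL,
    work_item_type_field_metadata_id bigint NOT NULL
        REFERENCES work_item_types_fields_metadata (id),
    string_value text,
    bigint_value bigint,
    date_value date,
    boolean_value boolean
);
```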
So after populating the tables with some data, here is what I have:
- Defined some work item types:
gitlabhq_dblab=# select * from work_item_types;
id | title
----+-------------
0 | Issue
1 | Incident
2 | Test Case
3 | Requirement
(4 rows)
- Defined some of the common issue attributes as custom attributes
gitlabhq_dblab=# select * from work_item_types_fields_metadata;
id | work_item_type_id | title | field_type
----+-------------------+-----------------+------------
1 | 1 | Title | 1
2 | 1 | Description | 2
3 | 1 | Weight | 3
4 | 1 | Health Status | 3
5 | 1 | Epic | 3
6 | 1 | Milestone | 3
7 | 1 | Iteration | 3
8 | 1 | Assignees | 3
9 | 1 | Labels | 3
10 | 1 | Due Date | 4
11 | 1 | Confidentiality | 5
12 | 1 | Lock | 5
13 | 1 | Status | 3
(13 rows)
- imported ~700K issues and treated them as the issues of a single project/namespace
gitlabhq_dblab=# select count(*) from work_items;
count
--------
720555
(1 row)
- obviously the distribution leans towards the `Issue` type right now
gitlabhq_dblab=# select work_item_type_id, count(*) from work_items group by work_item_type_id;
work_item_type_id | count
-------------------+--------
0 | 720326
1 | 185
2 | 44
(3 rows)
- imported the values for the above-defined custom attributes (`work_item_types_fields_metadata`) into the `work_item_fields_values` table, and got ~8M records:
gitlabhq_dblab=# select count(*) from work_item_fields_values;
count
---------
8101713
(1 row)
Let's query some data:
- fetch the first 10K work items (id, weight) ordered by weight in descending order, ~15ms: https://explain.depesz.com/s/V813
explain analyze
SELECT
work_item_id,
work_item_fields_values.bigint_value
FROM
work_item_fields_values
WHERE
work_item_type_field_metadata_id = 3
ORDER BY
work_item_fields_values.bigint_value DESC
LIMIT 10000
Limit (cost=0.56..1639.92 rows=10000 width=16) (actual time=0.065..13.491 rows=10000 loops=1)
-> Index Scan Backward using work_item_fields_values_wi_field_metadata_id_bigint_value on work_item_fields_values (cost=0.56..99462.51 rows=606710 width=16) (actual time=0.063..12.177 rows=10000 loops=1)
Index Cond: (work_item_type_field_metadata_id = 3)
Planning Time: 0.541 ms
Execution Time: 14.252 ms
(5 rows)
- keep the same order and count as above, but also select milestone_id, ~60ms: https://explain.depesz.com/s/RjmE This is where the queries start getting complicated; thanks to @ahegyi for helping figure out how to approach it
explain analyze
select work_item_id, wifv.bigint_value as weight, milestone_query.milestone
from work_item_fields_values wifv
LEFT JOIN LATERAL (
SELECT bigint_value as milestone
from work_item_fields_values as milestones_tbl
WHERE work_item_type_field_metadata_id = 6
AND milestones_tbl.work_item_id=wifv.work_item_id
limit 1
) as milestone_query ON TRUE
where wifv.work_item_type_field_metadata_id = 3
order by wifv.bigint_value desc
limit 10000
Limit (cost=0.99..36365.35 rows=10000 width=24) (actual time=0.050..54.411 rows=10000 loops=1)
-> Nested Loop Left Join (cost=0.99..2206262.99 rows=606710 width=24) (actual time=0.049..52.809 rows=10000 loops=1)
-> Index Scan Backward using work_item_fields_values_wi_field_metadata_id_bigint_value on work_item_fields_values wifv (cost=0.56..99462.51 rows=606710 width=16) (actual time=0.031..7.351 rows=10000 loops=1)
Index Cond: (work_item_type_field_metadata_id = 3)
-> Limit (cost=0.43..3.45 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=10000)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values milestones_tbl (cost=0.43..3.45 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=10000)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 6))
Planning Time: 0.367 ms
Execution Time: 55.186 ms
(9 rows)
- selecting work items with multiple fields for a given iteration, with weight higher than 7, ~120ms: https://explain.depesz.com/s/N7aU#html
explain analyze
select work_item_id, title, milestone_query.milestone, health_status_query.health_status, iteration
from work_item_fields_values wifv
LEFT JOIN LATERAL (
SELECT bigint_value as milestone
from work_item_fields_values as milestones_tbl
WHERE work_item_type_field_metadata_id = 6
AND milestones_tbl.work_item_id=wifv.work_item_id
limit 1
) as milestone_query ON TRUE
LEFT JOIN LATERAL (
SELECT bigint_value as health_status
from work_item_fields_values as health_statuses
WHERE work_item_type_field_metadata_id = 4
AND health_statuses.work_item_id=wifv.work_item_id
limit 1
) as health_status_query ON TRUE
LEFT JOIN LATERAL (
SELECT bigint_value as iteration
from work_item_fields_values as iterations
WHERE work_item_type_field_metadata_id = 7
AND iterations.work_item_id=wifv.work_item_id
limit 1
) as iterations_query ON TRUE
LEFT JOIN LATERAL (
SELECT string_value as title
from work_item_fields_values as titles
WHERE work_item_type_field_metadata_id = 1
AND titles.work_item_id=wifv.work_item_id
limit 1
) as title_query ON TRUE
where wifv.work_item_type_field_metadata_id = 3 and wifv.bigint_value > 7 and iteration = 1885566
order by wifv.bigint_value asc
limit 10000;
Limit (cost=2.29..141166.33 rows=10000 width=100) (actual time=45.629..131.211 rows=33 loops=1)
-> Nested Loop Left Join (cost=2.29..5798978.74 rows=410797 width=100) (actual time=45.627..131.203 rows=33 loops=1)
-> Nested Loop (cost=1.85..4372486.16 rows=410797 width=40) (actual time=45.603..131.065 rows=33 loops=1)
-> Nested Loop Left Join (cost=1.42..2944966.58 rows=410797 width=32) (actual time=0.083..98.776 rows=12073 loops=1)
-> Nested Loop Left Join (cost=0.99..1518474.00 rows=410797 width=24) (actual time=0.062..67.128 rows=12073 loops=1)
-> Index Scan using work_item_fields_values_wi_field_metadata_id_bigint_value on work_item_fields_values wifv (cost=0.56..91981.42 rows=410797 width=16) (actual time=0.036..9.353 rows=12073 loops=1)
Index Cond: ((work_item_type_field_metadata_id = 3) AND (bigint_value > 7))
-> Limit (cost=0.43..3.45 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=12073)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values milestones_tbl (cost=0.43..3.45 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=12073)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 6))
-> Limit (cost=0.43..3.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values health_statuses (cost=0.43..3.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 4))
-> Subquery Scan on iterations_query (cost=0.43..3.47 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
Filter: (iterations_query.iteration = 1885566)
Rows Removed by Filter: 0
-> Limit (cost=0.43..3.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values iterations (cost=0.43..3.45 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12073)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 7))
-> Limit (cost=0.43..3.45 rows=1 width=60) (actual time=0.003..0.003 rows=1 loops=33)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values titles (cost=0.43..3.45 rows=1 width=60) (actual time=0.003..0.003 rows=1 loops=33)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 1))
Planning Time: 1.037 ms
Execution Time: 131.370 ms
(24 rows)
Some conclusions:
- implement sane limits: you could see that even for 13 fields and 700K work items we got ~8M records in `work_item_fields_values`. I estimate that an organization could have 2000 or more fields across all work item types, if we consider an average of 100 work item types and 20 fields per type. This is where we can impose some sane limits: e.g. max 2000 fields per organization (i.e. root namespace), and similarly, max 50 fields per work item type.
- data partitioning: we can also consider partitioning the data by `work_item_type_field_metadata_id`, as that data would be solely related to a single field.
- single data type per table: instead of having a single table with a separate column per base DB type, we could move each DB type to a separate table.
- separate table per relationship: in this example I've added milestone_id, iteration_id, and other id references within the single `bigint_value` column for the sake of data volume; it would make more sense to have a separate table for each of the first-class object relationships, so that we can also keep referential integrity checks.
- we should analyze the tradeoffs: the queries become more complex in order to achieve the performance, and some of the complexity will also move to the application layer, where we will need to build these queries.
- this is a spike, not a full solution: this is definitely not exhaustive testing of this approach, and it would require a very specific approach for each of the queries we are going to build, given the level of customization we want to support.
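A minimal sketch of the partitioning and per-relationship ideas, assuming PostgreSQL declarative partitioning; the partition count, hash strategy, and the `work_item_milestone_values` table are illustrative only:

```sql
-- Hash-partition the values table by field metadata id, so each field's
-- values are confined to a bounded set of partitions.
CREATE TABLE work_item_fields_values (
    work_item_id bigint NOT NULL,
    work_item_type_field_metadata_id bigint NOT NULL,
    string_value text,
    bigint_value bigint
) PARTITION BY HASH (work_item_type_field_metadata_id);

CREATE TABLE work_item_fields_values_p0
    PARTITION OF work_item_fields_values
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
-- ...and p1..p7 analogously

-- A dedicated table per first-class relationship keeps a real FK
-- instead of a bare bigint_value.
CREATE TABLE work_item_milestone_values (
    work_item_id bigint NOT NULL,
    milestone_id bigint NOT NULL REFERENCES milestones (id)
);
```

List partitioning per `work_item_type_field_metadata_id` would be the closest match to "one field per partition", but the number of partitions would then grow with the number of defined fields, which is why a bounded hash scheme may be the more practical variant.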
This is a pros and cons list that is by no means definitive:
- Pros:
  - allows for the high level of customization required by the initiative
  - potential to perform well
  - strongly typed data
  - referential integrity checks
  - potential to partition data
- Cons:
  - complexity of querying
  - complexity at the application level
  - more complicated aggregations/computations would need specific approaches to solve for
Risks and Implementation Considerations
This is a big change, and considering that it links into a lot of product areas, it becomes complex very fast. So I would like us to weigh the pros and cons before we decide to engage in implementing this one way or the other. The change has to happen or we will be left behind, but we need a good understanding of the path we are taking, and to iterate in small steps towards it.
Updated Queries:
I decided to add a few more values to check how performance might degrade, so here are some updated numbers to look at:
- now we have ~3x more `work_items` (i.e. `issues`):
gitlabhq_dblab=# select count(*) from work_items;
count
---------
2439299
(1 row)
- ~3x more work items resulted in almost ~2.5x more values. Obviously this table is very dependent on how many attributes each work item has and how much data each attribute contains: for 2.4M work_items the maximum number of value records would be 2.4M * 13 (given 13 existing attributes) => 31.2M rows, and we got ~70% of that maximum, 21.6M:
gitlabhq_dblab=# select count(*) from work_item_fields_values;
count
----------
21631904
(1 row)
- values distribution for the above-defined fields metadata (`work_item_types_fields_metadata`) ids 1 - 13:
gitlabhq_dblab=# select work_item_type_field_metadata_id, count(*) from work_item_fields_values group by work_item_type_field_metadata_id order by count(*) desc limit 20;
work_item_type_field_metadata_id | count
----------------------------------+---------
9 | 8502563
11 | 4558945
1 | 4127688
13 | 2639750
3 | 816668
8 | 484349
6 | 267844
2 | 91257
5 | 60533
10 | 54710
12 | 18298
7 | 7224
4 | 2075
(13 rows)
- given 2.5x-3x more data and the values distribution above, it is interesting to check the performance of the previous queries against the attribute with the most values, `work_item_type_field_metadata_id = 9`
- fetch first 10K work items (id, weight) ordered by weight in descending order
explain analyze
SELECT
work_item_id,
work_item_fields_values.bigint_value
FROM
work_item_fields_values
WHERE
work_item_type_field_metadata_id = 3
ORDER BY
work_item_fields_values.bigint_value DESC
LIMIT 10000
Limit (cost=0.56..275.13 rows=10000 width=16) (actual time=0.038..3.997 rows=10000 loops=1)
-> Index Only Scan Backward using work_item_fields_values_wi_field_metadata_id_bigint_work_item on work_item_fields_values (cost=0.56..22465.40 rows=818191 width=16) (actual time=0.036..2.701 rows=10000 loops=1)
Index Cond: (work_item_type_field_metadata_id = 3)
Heap Fetches: 0
Planning Time: 0.254 ms
Execution Time: 4.638 ms
(6 rows)
- fetch first 10K work items id, label_id ordered by label_id in descending order
explain analyze
SELECT
work_item_id,
work_item_fields_values.bigint_value
FROM
work_item_fields_values
WHERE
work_item_type_field_metadata_id = 9
ORDER BY
work_item_fields_values.bigint_value DESC
LIMIT 10000
Limit (cost=0.56..275.11 rows=10000 width=16) (actual time=0.050..3.624 rows=10000 loops=1)
-> Index Only Scan Backward using work_item_fields_values_wi_field_metadata_id_bigint_work_item on work_item_fields_values (cost=0.56..233508.60 rows=8505088 width=16) (actual time=0.048..2.360 rows=10000 loops=1)
Index Cond: (work_item_type_field_metadata_id = 9)
Heap Fetches: 0
Planning Time: 0.195 ms
Execution Time: 4.257 ms
(6 rows)
- fetch first 10K `work item id, title, milestone_id, health_status, iteration_id` ordered by weight in descending order, where weight is higher than 7
explain analyze
select work_item_id, title, milestone_query.milestone, health_status_query.health_status, iteration
from work_item_fields_values wifv
LEFT JOIN LATERAL (
SELECT bigint_value as milestone
from work_item_fields_values as milestones_tbl
WHERE work_item_type_field_metadata_id = 6
AND milestones_tbl.work_item_id=wifv.work_item_id
limit 1
) as milestone_query ON TRUE
LEFT JOIN LATERAL (
SELECT bigint_value as health_status
from work_item_fields_values as health_statuses
WHERE work_item_type_field_metadata_id = 4
AND health_statuses.work_item_id=wifv.work_item_id
limit 1
) as health_status_query ON TRUE
LEFT JOIN LATERAL (
SELECT bigint_value as iteration
from work_item_fields_values as iterations
WHERE work_item_type_field_metadata_id = 7
AND iterations.work_item_id=wifv.work_item_id
limit 1
) as iterations_query ON TRUE
LEFT JOIN LATERAL (
SELECT string_value as title
from work_item_fields_values as titles
WHERE work_item_type_field_metadata_id = 1
AND titles.work_item_id=wifv.work_item_id
limit 1
) as title_query ON TRUE
where wifv.work_item_type_field_metadata_id = 3 and wifv.bigint_value > 7 and iteration = 1885566
order by wifv.bigint_value asc
limit 10000;
Limit (cost=2.81..136420.09 rows=10000 width=82) (actual time=43.485..219.670 rows=33 loops=1)
-> Nested Loop Left Join (cost=2.81..4821794.29 rows=353459 width=82) (actual time=43.483..219.662 rows=33 loops=1)
-> Nested Loop (cost=2.25..3831480.53 rows=353459 width=40) (actual time=43.472..219.520 rows=33 loops=1)
-> Nested Loop Left Join (cost=1.69..2557260.84 rows=353459 width=32) (actual time=0.049..158.066 rows=22442 loops=1)
-> Nested Loop Left Join (cost=1.12..1283924.79 rows=353459 width=24) (actual time=0.044..98.470 rows=22442 loops=1)
-> Index Only Scan using work_item_fields_values_wi_field_metadata_id_bigint_work_item on work_item_fields_values wifv (cost=0.56..10588.74 rows=353459 width=16) (actual time=0.027..6.206 rows=22442 loops=1)
Index Cond: ((work_item_type_field_metadata_id = 3) AND (bigint_value > 7))
Heap Fetches: 0
-> Limit (cost=0.56..3.58 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=22442)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values milestones_tbl (cost=0.56..3.58 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=22442)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 6))
-> Limit (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values health_statuses (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 4))
-> Subquery Scan on iterations_query (cost=0.56..3.59 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
Filter: (iterations_query.iteration = 1885566)
Rows Removed by Filter: 0
-> Limit (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values iterations (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=22442)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 7))
-> Limit (cost=0.56..2.78 rows=1 width=42) (actual time=0.003..0.003 rows=1 loops=33)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values titles (cost=0.56..5.00 rows=2 width=42) (actual time=0.003..0.003 rows=1 loops=33)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 1))
Planning Time: 0.809 ms
Execution Time: 219.752 ms
- fetch first 10K `work item id, title, milestone_id, health_status, iteration_id` ordered by label priority in descending order, with label priority equal to or lower than 7
explain (analyze, buffers)
select work_item_id, title_query.title, milestone_query.milestone, health_status_query.health_status, iteration
from work_item_fields_values wifv
LEFT JOIN LATERAL (
SELECT bigint_value as milestone
from work_item_fields_values as milestones_tbl
WHERE work_item_type_field_metadata_id = 6
AND milestones_tbl.work_item_id=wifv.work_item_id
limit 1
) as milestone_query ON TRUE
LEFT JOIN LATERAL (
SELECT bigint_value as health_status
from work_item_fields_values as health_statuses
WHERE work_item_type_field_metadata_id = 4
AND health_statuses.work_item_id=wifv.work_item_id
limit 1
) as health_status_query ON TRUE
LEFT JOIN LATERAL (
SELECT bigint_value as iteration
from work_item_fields_values as iterations
WHERE work_item_type_field_metadata_id = 7
AND iterations.work_item_id=wifv.work_item_id
limit 1
) as iterations_query ON TRUE
LEFT JOIN LATERAL (
SELECT string_value as title
from work_item_fields_values as titles
WHERE work_item_type_field_metadata_id = 1
AND titles.work_item_id=wifv.work_item_id
limit 1
) as title_query ON TRUE
inner join labels on wifv.bigint_value = labels.id
inner join label_priorities on label_priorities.label_id = labels.id
where wifv.work_item_type_field_metadata_id = 9 and label_priorities.priority <= 7
order by label_priorities.priority desc
limit 10000
Limit (cost=2.81..136395.09 rows=10000 width=82) (actual time=0.061..132.064 rows=10000 loops=1)
-> Nested Loop Left Join (cost=2.81..4820910.64 rows=353459 width=82) (actual time=0.060..130.548 rows=10000 loops=1)
-> Nested Loop Left Join (cost=2.25..3830596.88 rows=353459 width=40) (actual time=0.052..95.136 rows=10000 loops=1)
-> Nested Loop Left Join (cost=1.69..2557260.84 rows=353459 width=32) (actual time=0.049..69.068 rows=10000 loops=1)
-> Nested Loop Left Join (cost=1.12..1283924.79 rows=353459 width=24) (actual time=0.044..42.822 rows=10000 loops=1)
-> Index Only Scan using work_item_fields_values_wi_field_metadata_id_bigint_work_item on work_item_fields_values wifv (cost=0.56..10588.74 rows=353459 width=16) (actual time=0.028..3.294 rows=10000 loops=1)
Index Cond: ((work_item_type_field_metadata_id = 3) AND (bigint_value > 7))
Heap Fetches: 0
-> Limit (cost=0.56..3.58 rows=1 width=8) (actual time=0.003..0.004 rows=0 loops=10000)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values milestones_tbl (cost=0.56..3.58 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=10000)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 6))
-> Limit (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=10000)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values health_statuses (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=10000)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 4))
-> Limit (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=10000)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values iterations (cost=0.56..3.58 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=10000)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 7))
-> Limit (cost=0.56..2.78 rows=1 width=42) (actual time=0.003..0.003 rows=1 loops=10000)
-> Index Scan using wi_fields_values_wi_id_and_wi_type_field_metadata_id on work_item_fields_values titles (cost=0.56..5.00 rows=2 width=42) (actual time=0.003..0.003 rows=1 loops=10000)
Index Cond: ((work_item_id = wifv.work_item_id) AND (work_item_type_field_metadata_id = 1))
Planning Time: 0.648 ms
Execution Time: 133.083 ms
Comparison Table
| queries | by weight plan/exec times | by weight 2.5x-3x plan/exec times | by label 2.5x-3x plan/exec times |
|---|---|---|---|
| 10K items single attr | 0.541 ms / 14.252 ms | 0.254 ms / 4.638 ms | 0.195 ms / 4.257 ms |
| 10K items multiple attrs | 1.037 ms / 131.370 ms | 0.809 ms / 219.752 ms | 10.017 ms / 196.594 ms |
Problematic queries
- Filtering by more than one attribute requires joining the `work_item_fields_values` table as many times as there are attributes to filter by.
  - example of a query filtering work items by weight and iteration: https://explain.depesz.com/s/yNeN#html
WITH cte as MATERIALIZED (
select value1.work_item_id, value1.bigint_value as weight, value2.bigint_value as iteration_id
from work_item_fields_values as value1
inner join work_item_fields_values as value2 on value2.work_item_id = value1.work_item_id
where
value1.work_item_type_field_metadata_id = 3 AND value2.work_item_type_field_metadata_id = 7
and value1.bigint_value is not null and value2.bigint_value = 1905485
order by value1.bigint_value asc
limit 10000
)
select *
from cte, LATERAL (
SELECT bigint_value as health_status
from work_item_fields_values as health_statuses
WHERE work_item_type_field_metadata_id = 4
AND health_statuses.work_item_id=cte.work_item_id
limit 1
) as health_status_query, LATERAL (
SELECT bigint_value as iteration
from work_item_fields_values as iterations
WHERE work_item_type_field_metadata_id = 7
AND iterations.work_item_id=cte.work_item_id
limit 1
) as iterations_query, LATERAL (
SELECT string_value as title
from work_item_fields_values as titles
WHERE work_item_type_field_metadata_id = 1
AND titles.work_item_id=cte.work_item_id
limit 1
) as title_query