Reduce the number of tuples updated for tables owned by threat insights
### Problem In PostgreSQL, `tuples` are immutable. This is part of the PostgreSQL's MVCC(Multi-Version Concurrency Control) design. So, if we update a record in PostgreSQL, a new tuple gets created, and the old tuple gets removed by the VACUUM process eventually if it's not visible to any transaction anymore. Also, the indices in PostgreSQL point to tuples in the heap using their physical locations called `ctid`(block number, tuple index). This means that once there is a new tuple created for a row by an update query, all the index relations of that table need to be adjusted. Since the index tuples are also immutable, this is done by creating new index tuples. The ones pointing to the old tuples are also removed by the VACUUM process. As you can see, an update of a single row in a table can create lots of tuples and lots of write-ahead log entries. This is called write amplification(An [old blog post](https://www.uber.com/en-TR/blog/postgres-to-mysql-migration/) from the Uber engineering team explains this problem in more detail) and can affect the stability of the application. ### Some insightful metrics - [Here](https://thanos.gitlab.net/graph?g0.expr=1%20-%20quantile_over_time(0.50%2C%20postgres_replication_process_state_ratio%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20process_type%3D%22walreceiver%22%2C%20process_state%3D%22S%22%7D%5B5m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=2d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D&g0.step_input=300&g1.expr=&g1.tab=1&g1.stacked=0&g1.range_input=1h&g1.max_source_resolution=0s&g1.deduplicate=1&g1.partial_response=0&g1.store_matches=%5B%5D) is the WAL capacity metric for the entire database. - Tuple update breakdown [metrics](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(pg_stat_user_tables_n_tup_upd%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D~%22(vulnerabilit%7Csecurity).*%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=2h&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) for the threat insights tables. - Total writes [metrics](https://thanos-query.ops.gitlab.net/graph?g0.expr=sum(rate(pg_stat_user_tables_n_tup_ins%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D~%22(vulnerabilit%7Csecurity).*%22%7D%5B1m%5D)%20%2B%20rate(pg_stat_user_tables_n_tup_upd%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D~%22(vulnerabilit%7Csecurity).*%22%7D%5B1m%5D)%20%2B%20rate(pg_stat_user_tables_n_tup_del%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D~%22(vulnerabilit%7Csecurity).*%22%7D%5B1m%5D))&g0.tab=0&g0.stacked=0&g0.range_input=2h&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) to threat insights related tables. ### Scope of work In this epic, we want to go through all the tables owned by ~"group::threat insights" and try to find a way to reduce the number of updates we do for each of them. By the end of this epic, we want to reduce the amount of pressure on the GitLab main database as well as the potential replication lag caused by our update queries due to the amount of WAL records being processed. ### Threat insights owned tables - `vulnerability_reads` - [Number of table tuples updated](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(pg_stat_user_tables_n_tup_upd%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D%22vulnerability_reads%22%7D%5B3h%5D)&g0.tab=0&g0.stacked=0&g0.range_input=1w&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) - [Number of table tuples + index tuples updated](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(pg_stat_user_tables_n_tup_upd%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D%22vulnerability_reads%22%7D%5B3h%5D)%20*%20scalar(count(count%20by%20(indexrelname)%20(pg_stat_user_indexes_idx_tup_read%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D%22vulnerability_reads%22%7D)))&g0.tab=0&g0.stacked=0&g0.range_input=1w&g0.max_source_resolution=0s&g0.deduplicate=0&g0.partial_response=0&g0.store_matches=%5B%5D) - `vulnerabilities` - [Number of table tuples updated](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(pg_stat_user_tables_n_tup_upd%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D%22vulnerabilities%22%7D%5B3h%5D)&g0.tab=0&g0.stacked=0&g0.range_input=1w&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D) - [Number of table tuples + index tuples updated](https://thanos-query.ops.gitlab.net/graph?g0.expr=rate(pg_stat_user_tables_n_tup_upd%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D%22vulnerabilities%22%7D%5B3h%5D)%20*%20scalar(count(count%20by%20(indexrelname)%20(pg_stat_user_indexes_idx_tup_read%7Benv%3D%22gprd%22%2C%20type%3D%22patroni%22%2C%20relname%3D%22vulnerabilities%22%7D)))&g0.tab=0&g0.stacked=0&g0.range_input=1w&g0.max_source_resolution=0s&g0.deduplicate=0&g0.partial_response=0&g0.store_matches=%5B%5D) - `vulnerability_occurrences` - Number of table tuples updated - Number of table tuples + index tuples updated
epic