
Strip HTML and/or truncate excessively long comments on the vulnerability_feedback table

What does this MR do and why?


This MR introduces a batched background migration job that strips HTML tags from, and/or truncates, excessively long comments (longer than 50,000 characters) in the vulnerability_feedback table.
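A minimal sketch of the per-comment transformation, assuming the 50,000-character limit used by the `char_length(comment) > 50000` filter in the batch-selection queries and a strip-then-truncate order (strip HTML first, truncate only if the result is still too long). The names `MAX_COMMENT_LENGTH`, `strip_html` and `sanitize_comment` are hypothetical; the real migration is a Ruby background job in GitLab, not this Python code.

```python
from html.parser import HTMLParser

# Assumed limit, matching the char_length(comment) > 50000 filter in the MR.
MAX_COMMENT_LENGTH = 50_000


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Hypothetical helper: return the text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


def sanitize_comment(comment: str, limit: int = MAX_COMMENT_LENGTH) -> str:
    """Hypothetical helper: strip HTML, then truncate if still over the limit."""
    if len(comment) <= limit:
        return comment  # short comments are never selected, so leave them alone
    stripped = strip_html(comment)
    return stripped[:limit]  # no-op when stripping alone was enough
```

Python's `len` counts code points, as PostgreSQL's `char_length` counts characters in a UTF-8 database, so the limit is measured in the same unit. Using `html.parser` rather than a regex also decodes entities such as `&amp;`.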

Related to #383703 (closed), step 4.

Database review

Truth be told, this is going to be a no-op on GitLab.com because the longest comment in that table is 2,817 characters long, but @ahegyi was right to point out that we don't know whether the same holds for our self-hosted customers. I think a comment longer than 50,000 characters is unlikely, but it's better to be safe than sorry.

Batch selection

SELECT "vulnerability_feedback"."id" FROM "vulnerability_feedback" WHERE "vulnerability_feedback"."id" BETWEEN 1 AND 596672 AND (char_length(comment) > 50000) ORDER BY "vulnerability_feedback"."id" ASC LIMIT 1

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13624/commands/47800

SELECT "vulnerability_feedback"."id" FROM "vulnerability_feedback" WHERE "vulnerability_feedback"."id" BETWEEN 1 AND 596672 AND (char_length(comment) > 50000) AND "vulnerability_feedback"."id" >= 11 ORDER BY "vulnerability_feedback"."id" ASC LIMIT 1 OFFSET 250

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13624/commands/47803

SELECT "vulnerability_feedback".* FROM "vulnerability_feedback" WHERE "vulnerability_feedback"."id" BETWEEN 472478 AND 472728 AND (char_length(comment) > 50000) AND "vulnerability_feedback"."id" >= 472478

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13624/commands/47804
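The three batch-selection queries above chain together as an each-batch loop: find the first matching `id`, use `LIMIT 1 OFFSET 250` to find where the next sub-batch starts, then load the rows in between. A minimal in-memory sketch of that iteration, assuming the ids passed in already satisfy the `char_length(comment) > 50000` filter; `id_at_offset` and `each_batch` are hypothetical names, and this only approximates GitLab's batching code.

```python
from typing import Iterator, List, Optional


def id_at_offset(sorted_ids: List[int], start: int, offset: int) -> Optional[int]:
    """Mimic `WHERE id >= start ORDER BY id ASC LIMIT 1 OFFSET offset`."""
    candidates = [i for i in sorted_ids if i >= start]
    return candidates[offset] if offset < len(candidates) else None


def each_batch(matching_ids: List[int], batch_size: int) -> Iterator[List[int]]:
    """Yield matching ids in sub-batches of at most batch_size, keyed on id."""
    ids = sorted(matching_ids)
    if not ids:
        return  # no row matches the filter, so there is nothing to update
    lower: Optional[int] = ids[0]  # first query: ORDER BY id ASC LIMIT 1
    while lower is not None:
        # Second query: the id at OFFSET batch_size is where the next batch starts.
        upper = id_at_offset(ids, lower, batch_size)
        if upper is None:
            yield [i for i in ids if i >= lower]  # last, possibly short, batch
        else:
            yield [i for i in ids if lower <= i < upper]  # third query: load rows
        lower = upper
```

For example, `list(each_batch([1, 3, 5, 7, 9], batch_size=2))` gives `[[1, 3], [5, 7], [9]]`; the migration itself would use the sub-batch size of 250 implied by `OFFSET 250`.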

Record update

explain UPDATE "vulnerability_feedback" SET "updated_at" = '2022-12-01 18:25:43.551210', "comment" = 'definitely shorter' WHERE "vulnerability_feedback"."id" = 12

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13624/commands/47805
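Each selected row is written back with a single-row `UPDATE` keyed on `id` that also bumps `updated_at`, as in the statement above. A minimal sketch of preparing those parameters for one batch, assuming rows arrive as dicts with `id` and `comment` keys and that the comment transformation is passed in as a function; `UPDATE_SQL` and `build_updates` are hypothetical, and skipping rows whose comment does not change is a choice of this sketch rather than something the MR states.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical parameterized form of the UPDATE above (DB-API "%s" paramstyle).
UPDATE_SQL = (
    'UPDATE "vulnerability_feedback" '
    'SET "updated_at" = %s, "comment" = %s '
    'WHERE "vulnerability_feedback"."id" = %s'
)


def build_updates(
    rows: List[Dict[str, Any]],
    transform: Callable[[str], str],
    now: Optional[datetime] = None,
) -> List[Tuple[datetime, str, int]]:
    """Return (updated_at, comment, id) tuples for rows whose comment changes."""
    if now is None:
        now = datetime.now(timezone.utc)  # one timestamp shared by the batch
    updates = []
    for row in rows:
        old = row["comment"]
        new = transform(old)
        if new != old:  # skip a write when the transformation is a no-op
            updates.append((now, new, row["id"]))
    return updates
```

With a driver that uses the `%s` paramstyle, such as psycopg2, the result can be passed to `cursor.executemany(UPDATE_SQL, ...)`.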

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michał Zając
