Add sharding key to sent_notifications
table
sent_notifications
is currently a very large table. But we plan to partition the table using created date as the partition key and also, having a retention of the data only for a year as described in #417233 (comment 2381835274).
So, we might not need to backfill this table if groups are not going to be moved across cells in production before a year. Also, this table works with a feature that uses incoming email to fetch records, so we need to look into how an incoming email might route to the right cell. We might need to encode the sharding key in the reply_key used in these records
Regardless of the plan to partition the table and only preserve records for a year, we are going to add and backfill a sent_notifications.namespace_id
column. We are looking for alternatives to backfill in reverse with the hope to identify records older than a year (using updated_at
from the associated noteables due to the lack of a sent_notifications.created_at
column). And, because records are only created sequentially, identifying a record older than 2 years for example, would allows to delete ALL other records with id < identified_id_from_old_record.
But, since even deleting the records will require a long BBM, might not really be worth all the trouble of 2 BBMs, finalization etc, vs getting started with a regular backfill that only deletes those records that are invalid or cannot fetch the sharding_key value because the associated noteable has been deleted (no FK because it's a polymorphic association)
Noteable types
select DISTINCT(noteable_type) from sent_notifications;
noteable_type
--------------------------
Commit
DesignManagement::Design
Epic
Issue
MergeRequest
Note
PersonalSnippet
ProjectSnippet
(8 rows)
Note
Note
is no longer a valid noteable_type
. Code was changed to associate each record with the actual noteable a long time ago. I just confirmed we no longer create sent_notifications
records for Note
as noteable_type in a production clone. The last record created like that had a very small id value (so it was created a long time ago 2016-01-11 21:50:10.769361+00
)
select id, noteable_id, noteable_type from sent_notifications where noteable_type = 'Note' order by id desc limit 1;
id | noteable_id | noteable_type
------+-------------+---------------
4829 | 3174112 | Note
(1 row)
PersonalSnippet
We haven't created records with this type in at least 7 years as this line guards the creation for PersonalSnippets
that don't belong to a project or Group (they never do). Also gathering evidence from .com. For this reason, we don't need a composite sharding key (namespace_id, organization_id) and we can use only namespace_id
select id, noteable_id, noteable_type into sent_n_personal_snippet from sent_notifications where noteable_type = 'PersonalSnippet' order by id desc limit 1;
id | noteable_id | noteable_type
------------+-------------+-----------------
1376565326 | 2439784 | PersonalSnippet
Evidence from .com seems to suggest the last sent_notification record with PersonalSnippet
was created around 2022-10-31
. And this is just an outlier, more details in #514591 (comment 2493137403)
Looking for the next record in reverse
select id, noteable_id, noteable_type from sent_notifications where noteable_type = 'PersonalSnippet' and id < 1376565326 order by id desc limit 1;
id | noteable_id | noteable_type
----------+-------------+-----------------
72594974 | 1751855 | PersonalSnippet
The associated snippet has comments created around 2018-09-08
Deleting invalid records
It should be safe to delete records that have these invalid types (Note, PersonalSnippet) as we no longer create those and they were all created many years ago. During the backfill we'll also need to delete records from which we cannot get the sharding key value because the associated noteable has been deleted (no cascading FK because this is a polymorphic association)