Avoid duplicating events
The following discussion from !26624 (merged) should be addressed:

@stanhu started a discussion: (+2 comments)

This method seems like it would get slower the more events that are generated for the Wiki. This query sorts all events for this project:

```sql
SELECT "events".* FROM "events"
WHERE "events"."target_type" = $1 AND "events"."target_id" = $2
ORDER BY "events"."id" DESC LIMIT $3
/*application:test,correlation_id:372e1c7e9a84176097371f360bb96aae*/
[["target_type", "WikiPage::Meta"], ["target_id", 16], ["LIMIT", 1]]
```
The second query would also have this issue, because I think a sequential scan would be needed to find `created_at`:

```sql
SELECT "events".* FROM "events"
WHERE "events"."target_type" = $1 AND "events"."target_id" = $2 AND "events"."created_at" = $3
LIMIT $4
/*application:test,correlation_id:372e1c7e9a84176097371f360bb96aae*/
[["target_type", "WikiPage::Meta"], ["target_id", 16], ["created_at", "2020-05-03 06:27:43"], ["LIMIT", 1]]
```
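One way to address both slow paths (a sketch, not part of the original discussion; the index name and column order are assumptions) is a composite index on the columns both queries filter on:

```sql
-- Hypothetical index; name and exact definition are assumptions.
-- The ORDER BY id DESC query can then read matching
-- (target_type, target_id) rows from the index instead of sorting
-- every event for the project, and the created_at lookup no longer
-- needs a sequential scan over all of the project's events.
CREATE INDEX index_events_on_target_type_and_target_id
  ON events (target_type, target_id);
```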
We don't have idempotency for our `events` table yet, but if we did want it, would we consider adding a column that has a fingerprint of the event message and using a uniqueness index against that?
Proposal

Add a new column to the `events` table, `fingerprint text(128), nullable`, that allows us to avoid adding the same event twice when there is a reasonable way to detect a duplicate (git commit SHAs would make good fingerprints).
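A minimal SQL sketch of the proposal (the column name comes from the proposal; the index name, the scoping columns, the partial-index condition, and the example values are assumptions):

```sql
-- Hypothetical migration sketch, not the actual GitLab migration.
ALTER TABLE events ADD COLUMN fingerprint text;

-- Enforce idempotency only where a fingerprint exists: a partial
-- unique index leaves the existing NULL rows unconstrained.
CREATE UNIQUE INDEX index_events_on_target_and_fingerprint
  ON events (target_type, target_id, fingerprint)
  WHERE fingerprint IS NOT NULL;

-- A replayed insert with the same fingerprint (e.g. the same git
-- commit SHA) is then silently skipped:
INSERT INTO events (target_type, target_id, fingerprint, created_at)
VALUES ('WikiPage::Meta', 16, 'example-sha', now())
ON CONFLICT DO NOTHING;
```

Scoping the uniqueness to `(target_type, target_id, fingerprint)` rather than `fingerprint` alone would let the same commit SHA appear as a fingerprint on different targets.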