Aggregate statistics for the contribution calendar
The contribution calendar currently aggregates data directly from the
events table. This prevents us from further cutting down the table size as removing data would lead to incorrect contribution statistics.
To deal with this we should change the calendar to instead use a separate table, e.g.
user_contributions. This table would store a single row per user per day containing the number of contributions for that date. This table would thus have the following columns:
- id (primary key)
- date (YYYY-MM-DD)
This table would have the following indexes:
(id)primary key index
The application in turn would take the following steps to insert or update a row:
- Start a transaction
- Check if today's row exists
- If it does not, insert it and rescue duplicate key errors. Upon a duplicate key error jump to the next step
- If it does, increment
contributionsby 1 using
UPDATE user_contributions SET contributions = contributions + 1 WHERE user_id = X AND date = Y
This would lead to something like:
today = Time.now.strftime('%Y-%m-%d') transaction do # THIS MUST BE IN SQL, do **not** load data into Ruby to do the update as this # will lead to incorrect contribution counts being set updated = user.contributions.where(date: today). update_all('contributions = contributions + 1') # Row didn't exist, let's try inserting it if updated == 0 begin user.contributions.insert(date: today, contributions: 1) # Somebody beat us to it, let's update the row instead. rescue ActiveRecord::DuplicateKeyError user.contributions.where(date: today).update_all('contributions = contributions + 1') end end end
Do not use
find_or_create and similar methods as these are not atomic. The reason for performing the update first is that this operation will happen in 99% of all cases this code is called. The lack of a SELECT means that in the best case we only need to run a single UPDATE, instead of having to run a SELECT and an UPDATE. This however requires that we use some Rails method that returns the number of updated rows. The
begin/rescue is to take care of concurrent inserts. This will work as expected since there's a UNIQUE index on
I can't stress how important it is here to rely on database indexes and the above setup, don't use any of Rails'
validates_unique helpers since these are really bad at enforcing uniqueness without any UNIQUE indexes being present.