Ensure new Releases always have `author_id`

In the past, when the `releases` table only stored release notes for Git tags, it didn't need `author_id`. When we officially introduced the Release domain, this column was added and became mandatory.

So in theory, only the legacy rows lack `author_id`. We probably can't easily backfill them, because that would require scanning Git repositories for every row with an empty `author_id`.

However, if we are still creating rows with an empty `author_id`, that should be fixed: `author_id` must always be set on new releases. This means we need the following validation:

diff --git a/app/models/release.rb b/app/models/release.rb
index c6c0920c4d0..858facd81f8 100644
--- a/app/models/release.rb
+++ b/app/models/release.rb
@@ -27,6 +27,7 @@ class Release < ApplicationRecord
 
   validates :project, :tag, presence: true
   validates :tag, uniqueness: { scope: :project_id }
+  validates :author_id, presence: true, on: :create
 
   validates :description, length: { maximum: Gitlab::Database::MAX_TEXT_SIZE_LIMIT }, if: :description_changed?
   validates_associated :milestone_releases, message: -> (_, obj) { obj[:value].map(&:errors).map(&:full_messages).join(",") }

and fix whatever code path still creates releases without an author (the culprit, if any).
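
To illustrate why the validation is scoped with `on: :create`, here is a minimal plain-Ruby sketch of the intended behavior. It does not use ActiveRecord: `ReleaseSketch` and its `persisted:` flag are hypothetical stand-ins, not the real `Release` model. The idea is that new rows without an author are rejected, while legacy rows with an empty `author_id` can still be updated.

```ruby
# Hypothetical stand-in for an ActiveRecord model; this is NOT GitLab's Release class.
class ReleaseSketch
  attr_accessor :tag, :author_id, :description
  attr_reader :errors

  def initialize(tag:, author_id: nil, persisted: false)
    @tag = tag
    @author_id = author_id
    @persisted = persisted # true simulates a row that already exists in the table
    @errors = []
  end

  def persisted?
    @persisted
  end

  # Mimics `validates :author_id, presence: true, on: :create`:
  # the author check only runs for records that have not been saved yet.
  def valid?
    @errors = []
    @errors << "Tag can't be blank" if tag.nil? || tag.empty?
    @errors << "Author can't be blank" if !persisted? && author_id.nil?
    @errors.empty?
  end

  def save
    return false unless valid?

    @persisted = true
    true
  end
end

# A new release without an author is rejected.
new_without_author = ReleaseSketch.new(tag: "v1.0")
new_without_author_saved = new_without_author.save

# A new release with an author is accepted.
new_with_author = ReleaseSketch.new(tag: "v1.1", author_id: 42)
new_with_author_saved = new_with_author.save

# A legacy row (already persisted, empty author_id) can still be updated.
legacy = ReleaseSketch.new(tag: "v0.1", persisted: true)
legacy.description = "Updated notes"
legacy_saved = legacy.save
```

Without the `on: :create` scope, any update to a legacy row would start failing until its `author_id` is backfilled, which is the part we can't easily do.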


The following discussion from !71650 (merged) should be addressed:

!71650 (comment 708077369)

This is probably not directly related to this MR (i.e., tracking the number of release authors), but rather to data accuracy for release authors. I noticed that a good number of releases do not have an author, and I wonder whether these are cases where we fail to assign the author, thus skewing the overall usage data.

select author_id, count(*) from releases group by author_id order by count(*) desc;

 author_id | count
-----------+--------
           | ~800K

Feel free to move this to a follow-up, though.

Edited by Shinya Maeda