ActiveRecord::RecordInvalid (Validation failed: Name has already been taken)

Summary

With the introduction of !17004 (merged), GitLab instances throw ActiveRecord::RecordInvalid (Validation failed: Name has already been taken) when the cronjob:import_software_licenses job runs.

Steps to reproduce

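  1. Install GitLab 12.8.7.
  2. Run the import_software_licenses job from a Rails console (gitlab-rails console):

     # Check the state of the table before the import
     SoftwareLicense.count
     SoftwareLicense.last

     w = ImportSoftwareLicensesWorker.new

     # Run the import and print the record that fails validation
     begin
       w.perform
     rescue ActiveRecord::RecordInvalid => invalid
       pp invalid.record
     end

  3. Observe the error.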

What is the current bug behavior?

When the cronjob:import_software_licenses job runs, the import initially fails on:

#<SoftwareLicense:0x00007f733186cd78
 id: nil,
 name: "GNU General Public License v1.0 only",
 spdx_identifier: "GPL-1.0-only">
=> #<SoftwareLicense id: nil, name: "GNU General Public License v1.0 only", spdx_identifier: "GPL-1.0-only">

However, if you run the job manually, the import fails on different license names. We think this is caused by the unique index on the name column, "index_software_licenses_on_unique_name" UNIQUE, btree (name): https://gitlab.com/gitlab-org/gitlab/-/blob/2cb9a85d2beadd51b926eaddb05005403bee0013/db/schema.rb#L4081
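To illustrate the suspected mechanism, here is a minimal in-memory sketch in plain Ruby (no Rails). It assumes, based on the backtrace (safe_find_or_create_by! called from import_software_licenses_worker.rb:15), that each catalogue entry is looked up by both name and spdx_identifier, while uniqueness is enforced on name alone. FakeLicenseTable, RecordInvalid, and the sample entries are hypothetical stand-ins, not GitLab code; in particular, the identifier of the row that already holds the name is an assumption.

```ruby
# Hypothetical in-memory model of the suspected failure (not GitLab code).
# Assumption: rows are looked up by BOTH name and spdx_identifier,
# but uniqueness is validated on name ONLY.

class RecordInvalid < StandardError; end

class FakeLicenseTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Roughly mirrors find_or_create_by!(name:, spdx_identifier:)
  def find_or_create!(name:, spdx_identifier:)
    existing = @rows.find { |r| r[:name] == name && r[:spdx_identifier] == spdx_identifier }
    return existing if existing

    # The uniqueness rule only looks at name, so a different identifier
    # that shares an existing name is rejected.
    if @rows.any? { |r| r[:name] == name }
      raise RecordInvalid, "Validation failed: Name has already been taken"
    end

    row = { name: name, spdx_identifier: spdx_identifier }
    @rows << row
    row
  end
end

# Illustrative entries; "GPL-1.0" as the first identifier is an assumption.
catalogue = [
  { name: "GNU General Public License v1.0 only", spdx_identifier: "GPL-1.0" },
  { name: "GNU General Public License v1.0 only", spdx_identifier: "GPL-1.0-only" }
]

table = FakeLicenseTable.new
begin
  catalogue.each { |entry| table.find_or_create!(**entry) }
rescue RecordInvalid => e
  puts "#{e.message} (#{table.rows.size} row(s) imported before the failure)"
end
```

Under these assumptions the lookup misses (the identifier differs), the insert is then rejected on name, and the run stops partway through, which matches the partially filled table seen in the console session.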

Looking at the SPDX license list, many licenses do not have unique names, e.g. "GNU General Public License v1.0 only".

Check how many times this name appears with:

curl -s https://spdx.org/licenses/licenses.json | grep -c "GNU General Public License v1.0 only"
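The grep above counts a single name. To list every name shared by more than one identifier, a small Ruby sketch like the one below could help. It assumes each entry in the published licenses.json has "name" and "licenseId" keys under a top-level "licenses" array; duplicate_license_names and fetch_spdx_licenses are hypothetical helper names, not part of GitLab.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed location of the published SPDX license list.
SPDX_LICENSES_URL = "https://spdx.org/licenses/licenses.json"

# Groups entries by "name" and keeps only names used by more than one
# "licenseId". Returns { name => [licenseId, ...] } in list order.
def duplicate_license_names(licenses)
  licenses
    .group_by { |license| license["name"] }
    .transform_values { |entries| entries.map { |e| e["licenseId"] } }
    .select { |_name, ids| ids.size > 1 }
end

# Downloads and parses the license list (requires network access).
# Assumption: the entries live under the top-level "licenses" key.
def fetch_spdx_licenses(url = SPDX_LICENSES_URL)
  JSON.parse(Net::HTTP.get(URI(url))).fetch("licenses")
end
```

For example, `duplicate_license_names(fetch_spdx_licenses).each { |name, ids| puts "#{name}: #{ids.join(', ')}" }` prints one line per shared name. The grouping logic is kept separate from the download so it can be checked against saved or sample data without network access.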

What is the expected correct behavior?

The licenses import without errors.

Relevant logs and/or screenshots

{"severity":"WARN","time":"2020-03-18T12:57:01.704Z","error_class":"ActiveRecord::RecordInvalid","error_message":"Validation failed: Name has already been taken","context":"Job raised exception","jobstr":"{\"queue\":\"cronjob:import_software_licenses\",\"args\":[],\"class\":\"ImportSoftwareLicensesWorker\",\"retry\":3,\"queue_namespace\":\"cronjob\",\"jid\":\"5565938431d4f00b6f9eaeae\",\"created_at\":1584241208.8685296,\"correlation_id\":\"1FFJQJ0Usa\",\"enqueued_at\":1584536220.9578576,\"error_message\":\"Validation failed: Name has already been taken\",\"error_class\":\"ActiveRecord::RecordInvalid\",\"failed_at\":1584241210.449874,\"retry_count\":2,\"retried_at\":1584241370.927474}","queue":"cronjob:import_software_licenses","args":[],"class":"ImportSoftwareLicensesWorker","retry":3,"queue_namespace":"cronjob","jid":"5565938431d4f00b6f9eaeae","created_at":"2020-03-15T03:00:08.868Z","correlation_id":"1FFJQJ0Usa","enqueued_at":"2020-03-18T12:57:00.957Z","failed_at":"2020-03-15T03:00:10.449Z","retry_count":2,"retried_at":"2020-03-15T03:02:50.927Z","error_backtrace":["app/models/application_record.rb:35:in `block in safe_find_or_create_by!'","app/models/application_record.rb:34:in `tap'","app/models/application_record.rb:34:in `safe_find_or_create_by!'","ee/app/workers/import_software_licenses_worker.rb:15:in `block in perform'","ee/lib/gitlab/spdx/catalogue.rb:18:in `block in each'","ee/lib/gitlab/spdx/catalogue.rb:17:in `each'","ee/lib/gitlab/spdx/catalogue.rb:17:in `each'","ee/app/workers/import_software_licenses_worker.rb:10:in `perform'","lib/gitlab/sidekiq_daemon/monitor.rb:49:in `within_job'"]}
root@4cad842b7ed5:/# gitlab-psql -c "select count(*) from software_licenses;"
 count 
-------
     0
(1 row)
root@4cad842b7ed5:/# gitlab-rails c
--------------------------------------------------------------------------------
 GitLab:       12.8.7-ee (2643fd87200) EE
 GitLab Shell: 11.0.0
 PostgreSQL:   10.12
--------------------------------------------------------------------------------
Loading production environment (Rails 6.0.2)
irb(main):001:0> w = ImportSoftwareLicensesWorker.new()
=> #<ImportSoftwareLicensesWorker:0x00007f47160d7ac8>
irb(main):002:0> begin
irb(main):003:1> w.perform
irb(main):004:1> rescue ActiveRecord::RecordInvalid => invalid
irb(main):005:1> pp invalid.record
irb(main):006:1> end
#<SoftwareLicense:0x00007f470d7cc378
 id: nil,
 name: "GNU General Public License v1.0 only",
 spdx_identifier: "GPL-1.0-only">
=> #<SoftwareLicense id: nil, name: "GNU General Public License v1.0 only", spdx_identifier: "GPL-1.0-only">
irb(main):007:0> SoftwareLicense.count
=> 155

Results of GitLab environment info

This happens on Omnibus GitLab from at least 12.6.4 through 12.8.7, regardless of GitLab license plan.

System information
System:		Ubuntu 18.04
Proxy:		no
Current User:	git
Using RVM:	no
Ruby Version:	2.6.3p62
Gem Version:	2.7.9
Bundler Version:1.17.3
Rake Version:	12.3.3
Redis Version:	3.2.12
Git Version:	2.24.1
Sidekiq Version:5.2.7
Go Version:	unknown

GitLab information
Version:	12.6.4-ee
Revision:	cc6b787e7b0
Directory:	/opt/gitlab/embedded/service/gitlab-rails
DB Adapter:	PostgreSQL
DB Version:	10.9
URL:		http://172.16.77.146
HTTP Clone URL:	http://172.16.77.146/some-group/some-project.git
SSH Clone URL:	git@172.16.77.146:some-group/some-project.git
Elasticsearch:	no
Geo:		no
Using LDAP:	no
Using Omniauth:	yes
Omniauth Providers: 

GitLab Shell
Version:	10.3.0
Repository storage paths:
- default: 	/var/opt/gitlab/git-data/repositories
GitLab Shell path:		/opt/gitlab/embedded/service/gitlab-shell
Git:		/opt/gitlab/embedded/bin/git

Results of GitLab application Check

Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 10.3.0 ? ... OK (10.3.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ... can't check, you have no projects
Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.3)
Git version >= 2.22.0 ? ... yes (2.24.1)
Git user has default SSH configuration? ... yes
Active users: ... 1
Is authorized keys file accessible? ... yes
Elasticsearch version 5.6 - 6.x? ... skipped (elasticsearch is disabled)

Checking GitLab App ... Finished


Checking GitLab subtasks ... Finished

Possible fixes

This might be self-correcting over time, since each run adds more rows to the table until it reaches a non-unique name. A possible fix is to remove the unique constraint on name defined at https://gitlab.com/gitlab-org/gitlab/-/blob/2cb9a85d2beadd51b926eaddb05005403bee0013/db/schema.rb#L4081.
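As a rough check of that proposed fix, the sketch below models the import with no uniqueness check on name. It assumes, based on the backtrace (safe_find_or_create_by! in the worker), that rows are looked up by both name and spdx_identifier. Under that assumption every entry imports, and a second run adds nothing because existing rows match on both columns. import_catalogue and the sample entries are hypothetical.

```ruby
# Hypothetical model of the import after dropping the unique index on name.
# Assumption: rows are looked up by both name and spdx_identifier.

# Adds each entry to rows (an Array of Hashes) unless an identical
# name/spdx_identifier pair already exists. Returns the number of rows added.
def import_catalogue(rows, catalogue)
  added = 0
  catalogue.each do |entry|
    next if rows.any? { |r| r[:name] == entry[:name] && r[:spdx_identifier] == entry[:spdx_identifier] }

    # No uniqueness check on name: two identifiers may share a name.
    rows << entry.dup
    added += 1
  end
  added
end

# Illustrative entries; the "GPL-1.0" identifier is an assumption.
catalogue = [
  { name: "GNU General Public License v1.0 only", spdx_identifier: "GPL-1.0" },
  { name: "GNU General Public License v1.0 only", spdx_identifier: "GPL-1.0-only" },
  { name: "MIT License", spdx_identifier: "MIT" }
]

rows = []
first_run = import_catalogue(rows, catalogue)  # adds all three entries
second_run = import_catalogue(rows, catalogue) # adds nothing on a re-run
puts "first run: #{first_run}, second run: #{second_run}, total rows: #{rows.size}"
```

This only models sequential runs; it says nothing about what the database would do if two imports ran at the same time.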
