ActiveRecord::RecordInvalid (Validation failed: Name has already been taken)
Summary
Since the introduction of !17004 (merged), GitLab instances raise
ActiveRecord::RecordInvalid (Validation failed: Name has already been taken) when the cronjob:import_software_licenses job is run.
Steps to reproduce
- Install GitLab 12.8.7
- Run the import_software_licenses job
SoftwareLicense.count
SoftwareLicense.last
w = ImportSoftwareLicensesWorker.new()
begin
w.perform
rescue ActiveRecord::RecordInvalid => invalid
pp invalid.record
end
- Get error
What is the current bug behavior?
When the cronjob:import_software_licenses job runs, the import initially fails on:
#<SoftwareLicense:0x00007f733186cd78
id: nil,
name: "GNU General Public License v1.0 only",
spdx_identifier: "GPL-1.0-only">
=> #<SoftwareLicense id: nil, name: "GNU General Public License v1.0 only", spdx_identifier: "GPL-1.0-only">
However, if you run the job manually again, the import fails on different license names. We think this is caused by the unique index on the name column: "index_software_licenses_on_unique_name" UNIQUE, btree (name)
https://gitlab.com/gitlab-org/gitlab/-/blob/2cb9a85d2beadd51b926eaddb05005403bee0013/db/schema.rb#L4081
Looking at the SPDX database, many of the licenses do not have unique names, e.g. GNU General Public License v1.0 only.
Check a count of these names with:
curl -s https://spdx.org/licenses/licenses.json | grep -c "GNU General Public License v1.0 only"
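The grep above checks one name at a time. As a rough sketch, the same check can be generalized in Ruby to list every name that is shared by more than one SPDX identifier. This assumes the JSON has a top-level `licenses` array whose entries carry `name` and `licenseId` keys. `duplicate_license_names` is a hypothetical helper, and the sample entries are illustrative stand-ins rather than real catalogue data (the `EXAMPLE-*` identifiers are made up).

```ruby
require 'json'

# Hypothetical helper: given the parsed licenses.json (a Hash with a
# "licenses" array), return { name => [licenseId, ...] } for every name
# that is used by more than one entry.
def duplicate_license_names(catalogue)
  catalogue.fetch('licenses')
           .group_by { |license| license.fetch('name') }
           .transform_values { |entries| entries.map { |entry| entry.fetch('licenseId') } }
           .select { |_name, ids| ids.size > 1 }
end

# Illustrative sample, NOT real catalogue data: the EXAMPLE-* identifiers
# are invented to show one duplicated name and one unique name.
SAMPLE_JSON = <<~JSON
  {"licenses": [
    {"licenseId": "GPL-1.0-only", "name": "GNU General Public License v1.0 only"},
    {"licenseId": "EXAMPLE-DUPLICATE", "name": "GNU General Public License v1.0 only"},
    {"licenseId": "EXAMPLE-UNIQUE", "name": "Example Unique License"}
  ]}
JSON

duplicates = duplicate_license_names(JSON.parse(SAMPLE_JSON))
duplicates.each { |name, ids| puts "#{name}: #{ids.join(', ')}" }
```

To run it against the live list instead of the sample, the parsed body of https://spdx.org/licenses/licenses.json could be passed in, for example via `Net::HTTP.get(URI(...))` after `require 'net/http'`.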
What is the expected correct behavior?
Licenses are able to import without issue.
Relevant logs and/or screenshots
{"severity":"WARN","time":"2020-03-18T12:57:01.704Z","error_class":"ActiveRecord::RecordInvalid","error_message":"Validation failed: Name has already been taken","context":"Job raised exception","jobstr":"{\"queue\":\"cronjob:import_software_licenses\",\"args\":[],\"class\":\"ImportSoftwareLicensesWorker\",\"retry\":3,\"queue_namespace\":\"cronjob\",\"jid\":\"5565938431d4f00b6f9eaeae\",\"created_at\":1584241208.8685296,\"correlation_id\":\"1FFJQJ0Usa\",\"enqueued_at\":1584536220.9578576,\"error_message\":\"Validation failed: Name has already been taken\",\"error_class\":\"ActiveRecord::RecordInvalid\",\"failed_at\":1584241210.449874,\"retry_count\":2,\"retried_at\":1584241370.927474}","queue":"cronjob:import_software_licenses","args":[],"class":"ImportSoftwareLicensesWorker","retry":3,"queue_namespace":"cronjob","jid":"5565938431d4f00b6f9eaeae","created_at":"2020-03-15T03:00:08.868Z","correlation_id":"1FFJQJ0Usa","enqueued_at":"2020-03-18T12:57:00.957Z","failed_at":"2020-03-15T03:00:10.449Z","retry_count":2,"retried_at":"2020-03-15T03:02:50.927Z","error_backtrace":["app/models/application_record.rb:35:in `block in safe_find_or_create_by!'","app/models/application_record.rb:34:in `tap'","app/models/application_record.rb:34:in `safe_find_or_create_by!'","ee/app/workers/import_software_licenses_worker.rb:15:in `block in perform'","ee/lib/gitlab/spdx/catalogue.rb:18:in `block in each'","ee/lib/gitlab/spdx/catalogue.rb:17:in `each'","ee/lib/gitlab/spdx/catalogue.rb:17:in `each'","ee/app/workers/import_software_licenses_worker.rb:10:in `perform'","lib/gitlab/sidekiq_daemon/monitor.rb:49:in `within_job'"]}
root@4cad842b7ed5:/# gitlab-psql -c "select count(*) from software_licenses;"
count
-------
0
(1 row)
root@4cad842b7ed5:/# gitlab-rails c
--------------------------------------------------------------------------------
GitLab: 12.8.7-ee (2643fd87200) EE
GitLab Shell: 11.0.0
PostgreSQL: 10.12
--------------------------------------------------------------------------------
Loading production environment (Rails 6.0.2)
irb(main):001:0> w = ImportSoftwareLicensesWorker.new()
=> #<ImportSoftwareLicensesWorker:0x00007f47160d7ac8>
irb(main):002:0> begin
irb(main):003:1> w.perform
irb(main):004:1> rescue ActiveRecord::RecordInvalid => invalid
irb(main):005:1> pp invalid.record
irb(main):006:1> end
#<SoftwareLicense:0x00007f470d7cc378
id: nil,
name: "GNU General Public License v1.0 only",
spdx_identifier: "GPL-1.0-only">
=> #<SoftwareLicense id: nil, name: "GNU General Public License v1.0 only", spdx_identifier: "GPL-1.0-only">
irb(main):007:0> SoftwareLicense.count
=> 155
Results of GitLab environment info
This is happening on Omnibus versions from at least 12.6.4 through 12.8.7, regardless of GitLab license plan.
System information
System: Ubuntu 18.04
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.6.3p62
Gem Version: 2.7.9
Bundler Version: 1.17.3
Rake Version: 12.3.3
Redis Version: 3.2.12
Git Version: 2.24.1
Sidekiq Version: 5.2.7
Go Version: unknown
GitLab information
Version: 12.6.4-ee
Revision: cc6b787e7b0
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 10.9
URL: http://172.16.77.146
HTTP Clone URL: http://172.16.77.146/some-group/some-project.git
SSH Clone URL: git@172.16.77.146:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:
GitLab Shell
Version: 10.3.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 10.3.0 ? ... OK (10.3.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ... can't check, you have no projects
Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.3)
Git version >= 2.22.0 ? ... yes (2.24.1)
Git user has default SSH configuration? ... yes
Active users: ... 1
Is authorized keys file accessible? ... yes
Elasticsearch version 5.6 - 6.x? ... skipped (elasticsearch is disabled)
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
Possible fixes
This may be self-correcting over time: each run appears to add more rows to the table before it hits a name that is already taken.
One possible fix is to remove the unique constraint on name defined at https://gitlab.com/gitlab-org/gitlab/-/blob/2cb9a85d2beadd51b926eaddb05005403bee0013/db/schema.rb#L4081.
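To illustrate the reasoning behind that suggestion, here is a minimal toy model in plain Ruby. It is not GitLab code and does not touch ActiveRecord: `ToyLicenseTable`, `import_all`, and the `EXAMPLE-DUPLICATE` identifier are all hypothetical, and keying rows on spdx_identifier is an assumption made for the sketch. It only shows that two entries sharing a name cannot both be stored while name is unique, and that both are stored once that constraint is dropped.

```ruby
# Toy in-memory stand-in for the software_licenses table; NOT GitLab code.
class ToyLicenseTable
  class NameTaken < StandardError; end

  attr_reader :rows

  def initialize(unique_name:)
    @unique_name = unique_name
    @rows = {} # spdx_identifier => name (keying on the identifier is an assumption)
  end

  # Insert or update a row by spdx_identifier. When unique_name is true,
  # reject a name that already belongs to a different identifier.
  def upsert(spdx_identifier, name)
    taken = @rows.any? { |id, existing| existing == name && id != spdx_identifier }
    raise NameTaken, "Name has already been taken: #{name}" if @unique_name && taken

    @rows[spdx_identifier] = name
  end
end

# Illustrative entries; the second identifier is made up.
ENTRIES = [
  ['GPL-1.0-only', 'GNU General Public License v1.0 only'],
  ['EXAMPLE-DUPLICATE', 'GNU General Public License v1.0 only']
].freeze

# Import every entry and return the number of rows stored.
def import_all(table, entries)
  entries.each { |id, name| table.upsert(id, name) }
  table.rows.size
end

with_constraint = ToyLicenseTable.new(unique_name: true)
begin
  import_all(with_constraint, ENTRIES)
rescue ToyLicenseTable::NameTaken => e
  puts "with unique name: #{e.message}"
end

without_constraint = ToyLicenseTable.new(unique_name: false)
puts "without unique name: imported #{import_all(without_constraint, ENTRIES)} rows"
```

In a real instance the change would be a database migration against the index named above rather than anything resembling this class.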