During the initial stages of the work item types implementation, we had a discussion in !84836 (comment 910369861) around how to solve the problem of some data that the app requires being missing from the DB.
If you are getting this error on your new instance
In this issue we have discussed whether we should be able to rely on data that is seeded into the database as part of the DB setup steps. The conclusion is that we should be able to do so, as expecting that some instances might not load seeds would force us to write defensive code in many places of the application.
If you are getting this error on an existing instance
This should be rare if you have followed the steps described in https://gitlab-com.gitlab.io/support/toolbox/upgrade-path. But if you run into this, please post a comment in this issue and we will help you with the resolution. The steps required for the fix might depend on the state of your data.
Problem
For some work item features to work, we need some records to exist in the work_item_types table. At the moment we have two mechanisms for this: (1) production fixtures and (2) the Issue model code.
Production fixtures (1) would be ideal as a single place to create this data only once for new instances, but there are a few reasons why this is not 100% reliable as described in #353552 (comment 901589414).
Issue model code (2) solved this issue for the most part in legacy code, since upserting the type information would happen the first time an issue of a given type was created. From that moment forward, on existing instances we use migrations to add new types or widget definitions.
Another caveat comes up in migrations, as described in !128307 (comment 1527736002). We currently write very defensive migrations designed not to fail if, for whatever reason, the type data in the DB is not as expected. But this comes with its own problems: work item type data might end up inconsistent for some instances, leading to errors that are hard to debug. So we should probably fail in migrations if we cannot migrate as expected. Since we upsert this data, the fix could be as simple as running a single line of code in the Rails console.
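For illustration, that single line would be something like the call below, which is essentially what the production fixture itself runs (an assumption about the exact class name; verify it against your GitLab version before running):

```ruby
# Re-runs the work item type seeding. Safe to repeat because it upserts.
# Class/method name assumed from db/fixtures/production/003_create_base_work_item_types.rb.
Gitlab::DatabaseImporters::WorkItems::BaseTypeImporter.upsert_types
```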
From the base_importer we can see that we upsert 9 rows in the work_item_types table and 99 rows in the work_item_widget_definitions table. We don't expect this number to grow too fast, but it definitely can if we introduce new features based on the work_items architecture.
work_item_types table
Each row in this table is upserted with values in 5 columns
name: text 255 char max
icon_name: text 255 char max
created_at: timestamp with time zone
updated_at: timestamp with time zone
base_type: smallint
work_item_widget_definitions table
Each row in this table is upserted with values in 3 columns
work_item_type_id: bigint
name: text
widget_type: smallint
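For reference, here is a rough sketch of the columns listed above in Rails migration DSL. This is illustrative only and not the actual GitLab migration; in the real schema the 255-character limits are enforced with separate text limit constraints.

```ruby
# Illustrative sketch only, based on the column lists above.
class SketchWorkItemTypeTables < ActiveRecord::Migration[7.0]
  def change
    create_table :work_item_types do |t|
      t.text :name                     # limited to 255 characters in the real schema
      t.text :icon_name                # limited to 255 characters in the real schema
      t.timestamps                     # created_at / updated_at (timestamp with time zone in the real schema)
      t.integer :base_type, limit: 2   # smallint
    end

    create_table :work_item_widget_definitions do |t|
      t.references :work_item_type, null: false  # bigint foreign key
      t.text :name
      t.integer :widget_type, limit: 2            # smallint
    end
  end
end
```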
Possible solutions
Can there be an initialization step that checks for the presence of the required data? Rails initializers might not be possible as described in !84836 (comment 912330965)
Check for the types' presence where needed, as described in !84836 (comment 920396233). This check might have to run very often on instances like .com, for example every time a new issue is created (as we need to list the work item types before creation). It might not be a big deal if we use some sort of Redis caching mechanism?
Thought: would it be better if we could assume everywhere in the application that the types exist in the DB? Doing existence checks before building a feature might not be ideal, as it requires developers to know about this particularity.
@gitlab-org/plan-stage/backend-engineers I'd really appreciate your input here. I think this discussion is long overdue, and I hope we can come up with a final solution that is more appropriate than what we currently have. As new teams start creating new work item types, we should be more certain about the right way to do this.
cc @mayra-cabrera @mkaeppler @tkuah since you have been involved in some of this discussion, I'd really appreciate your input
In my opinion, we should set reasonable expectations for when any application, including GitLab, operates correctly. Database migrations are absolutely critical to lay out the structure (and perhaps, as in this case, data) for GitLab to work.
I think it is fair to say that other parts of GitLab would fall apart if a customer upgrades GitLab, but does not run migrations, so I am not sure the arguments in this comment mean we shouldn't use migrations for this.
A lazy-insertion approach leads to defensive programming where we constantly need to second-guess whether some data the application needs exists or not. It should be handled during bootstrapping instead.
That said: is there a way to tackle this from the opposite end, i.e. why are customers even able to disable something like migrations? Specifically:
Omnibus didn't seed the database since auto_migrate was disabled.
Why would we even allow this or what motivation would a customer have to not run migrations? Aren't we shooting ourselves in the foot with this switch? My expectation would be that GitLab won't function correctly unless migrations run so it's a little like giving someone the option to take the steering wheel off and wishing them a safe journey.
Someone forgot to run gitlab-rake db:migrate.
Related to 1: this should not even be necessary outside of some sort of recovery/incident scenario.
Omnibus failed during install, preventing the seeds from running.
Fair point -- sounds like an edge case though, and why would I assume anything should work if installation failed?
I see we have multiple upvotes on the comment above by @mkaeppler. I think making the assumption that this data exists in the DB will definitely make our code a lot cleaner, and we can definitely be less defensive in migrations that add new types or widget definitions. I think it would even make more sense for migrations to fail if we find data in an unexpected state, as this might otherwise cause errors that are hard to debug.
So, if you all agree on this, what should we do to move down this path? I guess we should improve documentation in places where we might see edge cases like the ones discussed above. Probably make it clear everywhere that seeding the DB is essential for the app to work properly. Maybe even add some kind of initialization step that checks all fixtures have been seeded? We can probably have some value in the DB to indicate that it has been seeded successfully?
Wasn't one of the problems with using migrations that we eventually compress migrations down, or set a new starting point? I think data migrations (such as adding a new type) don't get preserved in those cases. I think that's one reason we didn't do that originally. Or am I mis-remembering this?
Yes, @digitalmoksha, we definitely have to do both. Migrations are necessary for instances that have already upserted the required types and definitions, and we always need to write a migration if the structure of the required data changes.
But for new instances, migrations are not run. In that case we probably do (and someone can confirm, as I'm not sure how Omnibus or other types of installs work) rake db:create db:schema:load and then seed the DB. And of course, as you mentioned, migrations get squashed (not sure how frequently), so in this scenario we also rely on the data we seed (currently through the seed-fu gem).
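To make that fresh-install order concrete, here is a minimal sketch of the flow as I understand it (an assumption, not the exact Omnibus code path):

```ruby
# CLI equivalent: bundle exec rake db:create db:schema:load db:seed_fu
Rails.application.load_tasks

Rake::Task['db:create'].invoke       # create the database
Rake::Task['db:schema:load'].invoke  # load db/structure.sql (SQL schema format) and mark all migrations as "up"
Rake::Task['db:seed_fu'].invoke      # run db/fixtures/<env>/*, including the work item type fixture
```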
It is not clear whether running that seeding step is optional, or what the impact is of not running the step completely.
So I agree we need to
Rely on seeding, and migrations.
Clearly document what seeding does, what gets created, and what errors will arise if seeding is not run
Fail early when any expected base item is not present. I am not sure if we can do this immediately when new items are added to production seeds; we may need to gradually introduce this at major milestones instead.
About migration squashing/cleanup: don't we have checkpoint releases to deal with just that? So that if I want to upgrade from 14 to 16 say, then I can't just do that. I have to go through N intermediate versions first so I actually go through all of the migrations.
About migration squashing/cleanup: don't we have checkpoint releases to deal with just that? So that if I want to upgrade from 14 to 16 say, then I can't just do that. I have to go through N intermediate versions first so I actually go through all of the migrations.
That's a great point.
So we could still rely on migrations with DML for seeding the database, as long as we ensure that we force a checkpoint release before we squash, and ensure we don't add any more DML migrations between the checkpoint release and the squash.
The safest way to do this would probably be to immediately squash after each checkpoint release as a matter of process.
Which is probably good anyway, to keep down the number of migrations, since the more migrations we have, the slower each fresh DB setup is (including all CI jobs).
I'm not sure if this setting will actually prevent seeding the DB for new installs. From what I have read so far, it seems that it only prevents migrations from running when upgrading GitLab versions?
So I don't think there is a way for a new GitLab install to avoid seeding, if you set up the DB structure properly?
About migration squashing/cleanup: don't we have checkpoint releases to deal with just that? So that if I want to upgrade from 14 to 16 say, then I can't just do that. I have to go through N intermediate versions first so I actually go through all of the migrations.
Right, @mkaeppler, that is an important point, but as you said I guess we are safe if we don't allow skipping major version upgrades.
So we could still rely on migrations with DML for seeding the database, as long as we ensure that we force a checkpoint release before we squash, and ensure we don't add any more DML migrations between the checkpoint release and the squash.
@cwoolley-gitlab I think we can, for any instance that is not brand new. For these, we always have to write migrations since the seeds will not be run in every upgrade, will they? But for new installs, migrations will not run as I said above: we will simply load the structure of the DB and then seed it, marking all migrations as run without actually running them.
So, unless I'm missing something, I'm not sure how #353552 (comment 901589414) happened. Probably just a failure during the setup steps that didn't seed the DB successfully?
I think we can, for any instance that is not brand new. For these, we always have to write migrations since the seeds will not be run in every upgrade
Ah, that's right. So for new instances, we can't rely on migrations; we still have to get the DML run to add all the necessary seed data.
So apologies for my unfamiliarity and jumping in with questions, but I'm trying to understand how this works in GitLab.
My question is: is seed_fu the right answer here for all DML seeding in all possible combinations of scenarios:
new instances
migrations of existing instances
in all environments, dev/test/prod
I think that may just be restating the intent of this issue, but I don't know what the answer is based on the comments above, and can't tell by looking at the various db:seed_fu hits when searching the codebase.
My question is: is seed_fu the right answer here for all DML seeding in all possible combinations of scenarios:
new instances
migrations of existing instances
in all environments, dev/test/prod
@cwoolley-gitlab That's a good way to summarize it, and I think the answer is a bit different for each of the items you listed:
new instances: These will create the DB by loading the structure in structure.sql files and then use seed_fu to seed the data for the first time. Migrations won't be run for new instances.
migrations of existing instances: These won't seed the DB, so we rely on the current state of data in the DB and migrations that we have written since the time the instance was set up.
in all environments, dev/test/prod: dev/prod environments will use a combination of the items described above (seeds and migrations for first-time installs and subsequent upgrades). For specs, we seed the DB before the suite in https://gitlab.com/gitlab-org/gitlab/-/blob/3652f8cfc2629a76380d1231f8255054971e5e9f/spec/spec_helper.rb#L253 and also prevent migration/deletion specs from dropping those tables. But there are some pipeline jobs that might need updating so the initial state of the DB is similar to that of a prod or dev app, as discussed in !128307 (comment 1525679811)
new instances: These will create the DB by loading the structure in structure.sql files and then use seed_fu to seed the data for the first time. Migrations won't be run for new instances.
migrations of existing instances: These won't seed the DB, so we rely on the current state of data in the DB and migrations that we have written since the time the instance was set up.
So, based on the two above points, it seems like we need to have a clearly-defined process for handling DML/seed data, and specifically how to handle it when we squash migrations:
All DML seed data must start out in a migration (so it gets applied to existing instances)
As part of the process to squash migrations, all DML representing seed data in the squashed migrations must be moved to seed_fu (so it will be applied to future new instances)
Does that sound correct?
If so, do we currently have the details of this process documented?
Or do we have any docs around the migration squashing process at all?
I did a quick search and this was the only reference I found, but it's kind of a hard search to perform because of false hits on squash as related to git.
@cwoolley-gitlab I think we should always create migrations for existing installs, but I don't think we should wait for the migrations to be squashed before we add the required data to the fresh install seeds. I think the seeds should always be in sync with the state of the data we expect to have from the migrations we write, given that new installs won't run the migrations but simply load the structure and then run the seeds.
If so, do we currently have the details of this process documented?
From what I have found, I don't think this is properly documented; it's only in this issue that many have come to an agreement that we can/should rely on seed data, as expressed in #423483 (comment 1533122991). So I think updating the docs to reflect what has been discussed here is precisely what the outcome of this issue should be.
Or do we have any docs around the migration squashing process at all?
I haven't found anything else myself, so this is something that should also be mentioned when we update the docs.
One more thing we need to keep in mind/document is how we should handle this seed data, or how to provide users a quick way to fix data in the DB by following some steps. Specifically talking about work_item_types in the DB, we always upsert the types, so it's safe to seed many times and the end result will always be the same. But, I'm not sure if this is true for all of the seed data. Anyway, perhaps this is not a problem since we expect seeds to be run only once for fresh install. For example, a way to fix inconsistent work item types would be as simple as running this same line in a console: https://gitlab.com/gitlab-org/gitlab/-/blob/999757b61c3861420b7c7174112e94a75f749027/db/fixtures/production/003_create_base_work_item_types.rb#L4
So it looks like lib/tasks/gitlab/db/migration_squash.rake is the only canonical source of truth on how we squash migrations. I could not find any references in code/docs/handbook either.
So yes, the squash process could definitely use some docs.
@cwoolley-gitlab I think we should always create migrations for existing installs, but I don't think we should wait for the migrations to be squashed before we add the required data to the fresh install seeds. I think the seeds should always be in sync with the state of the data we expect to have from the migrations we write, given that new installs won't run the migrations but simply load the structure and then run the seeds.
Yes, this should be true. One thing I didn't notice at first glance (docs would help here) is that there are two files:
db/init_structure.sql - used by the first squashed migration, and presumably does NOT contain any schema DDL for unsquashed migrations
db/structure.sql - DOES contain full DDL of all migrations, squashed and unsquashed, and is used by clean installs, db:test:prepare, etc.
It might be more intuitive to rename db/init_structure.sql to db/squashed_migrations_structure.sql or something.
Specifically talking about work_item_types in the DB, we always upsert the types, so it's safe to seed many times and the end result will always be the same. But, I'm not sure if this is true for all of the seed data. Anyway, perhaps this is not a problem since we expect seeds to be run only once for fresh install.
Are we sure we can count on this?
It seems safer to enforce a rule that DML in both migrations and seed data will always be upserts, so either one may be run idempotently multiple times, in any order, and still always result in the identical seed data records in the DB.
@cwoolley-gitlab At the moment, running the same version of the seed will simply upsert, so yes. Of course, if a different version of the seeder, say, removes an entry or renames it, the old record with the different name will remain in the DB. But of course, this is something we handle with migrations for existing installs.
It seems safer to enforce a rule that DML in both migrations and seed data will always be upserts, so either one may be run idempotently multiple times, in any order, and still always result in the identical seed data records in the DB.
Yes, I agree we should enforce this somehow. Perhaps a RuboCop rule that prevents methods like create, new, or save on models, in favor of methods like upsert_all as the only way to create these records. Of course, we should properly document this too.
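For illustration, the kind of idempotent write such a rule would mandate looks something like the sketch below. The rows and the unique key are hypothetical, not the actual importer code:

```ruby
# Idempotent seed write: repeated runs converge on the same rows because the
# write keys on a unique column instead of inserting blindly.
now = Time.current
rows = [
  { name: 'Issue', base_type: 0, icon_name: 'issue-type-issue', created_at: now, updated_at: now },
  { name: 'Task',  base_type: 4, icon_name: 'issue-type-task',  created_at: now, updated_at: now }
]

# unique_by assumes a unique index on base_type; adjust to the real constraint.
WorkItems::Type.upsert_all(rows, unique_by: :base_type)
```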
Hello everyone! So I haven't been paying attention to this one in a while, but it turns out we found another reason to get rid of the code that "makes sure" certain data exists in the DB before performing an action (creating an issue, for example).
We had a discussion recently on https://gitlab.slack.com/archives/C3NBYFJ6N/p1714539272456519 (internal only) around this and found another reason to get rid of it. I have created Don't upsert work item types if not found in th... (!151817 - merged) to ship the change behind a feature flag (enabled by default), just so we can ask users to disable the FF in the event they run into problems like the ones we discussed in the past (though we'd probably point them in the right direction, which would be fixing the install errors that prevented the seeds from running). With the flag shipped in 17.0, we might be able to further analyze what is going wrong with some installs.
This issue will remain open until we make the doc updates discussed in the thread.
FYI Don't upsert work item types if not found in th... (!151817 - merged) was merged and will be released in 17.0 behind a default-enabled FF. I have added a much clearer error message for when the types are not seeded. We can probably remove the flag in 17.1 once we know how this has affected some new installs (hopefully the error message will be enough for them to solve the problem).
The error message has a link to this issue. But in the past we have gotten reports in other issues that people open. Hopefully they will go straight here this time if necessary
Are there any details on what to do to fix this when getting this error on an existing install? The guidance on this ticket only seems to reference what to do for new installs - unless I have missed something. I noticed this issue today when trying to create the first issue on a project.
We are running GitLab: 17.4.2-ee (e85e7bae) EE (Installed via the gitlab-ee package on Ubuntu 20.04)
When clicking "New issue" on a project (Which doesn't have any existing issues open) I get an error 500 and this is logged:
WorkItems::Type::DEFAULT_TYPES_NOT_SEEDED (Default work item types have not been created yet. Make sure the DB has been seeded successfully. See related documentation in https://docs.gitlab.com/omnibus/settings/database.html#seed-the-database-fresh-installs-only. If you have additional questions, you can ask in https://gitlab.com/gitlab-org/gitlab/-/issues/423483):
Hey, @joolswills! Sorry I missed your original message. I wonder how you ended up with those types. It's very weird that you only have the newer ones and that Objective has id 1
Does that mean you don't have any records in the issues table with the issue type? Because there's a FK constraint on issues.work_item_type_id, I don't see any other way. Can you confirm you have the FK constraint on the column?
Because you are running e85e7bae, it's safe to upsert the types and related records, and everything should be back to normal (we cannot do this in 17.5 anymore, as we are now explicit about the IDs, so fixing the records might be a bit different). In a Rails console, you can run the commands below to make sure the work item types are in the state they should be in.
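The commands in question would be something like the following. The class names are assumed from the production fixtures, so verify them against your GitLab version before running:

```ruby
# Re-seed the default work item types plus their related restriction records.
# All of these upsert, so running them against existing data should be safe.
Gitlab::DatabaseImporters::WorkItems::BaseTypeImporter.upsert_types
Gitlab::DatabaseImporters::WorkItems::HierarchyRestrictionsImporter.upsert_restrictions
Gitlab::DatabaseImporters::WorkItems::RelatedLinksRestrictionsImporter.upsert_restrictions
```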
Let me know if that works, as fixing the install should be the priority, but then I'd love it if we could discuss a bit more how this might have happened.
UPDATE:
Are there any details on what to do to fix this when getting this error on an existing install? The guidance on this ticket only seems to reference what to do for new installs
I will update the description to mention this, but for existing installs there might not be a single or unique way to fix it as I said above.
Just before you replied, I managed to fix a copy of our install that I was debugging on. I think my fix does a similar thing to your instructions above, but I will test. I temporarily reverted the changes to app/models/work_items/type.rb from 9c64ea2c so the items were added on issue creation.
I'm glad you resolved it, @joolswills. May I ask what version you upgraded from? Also, do you know what the first version of your install was? The first types should have been created in new instances in 14.3.0 and in existing instances in 14.2.0. In any version after those, you should get the types populated during the seeding process, and up until 9c64ea2c you would also get the types upserted if for any reason you didn't have them already.
I thought I updated from 17.3.5 to 17.4.2, but looking at the logs I went from 17.3.3 to 17.4.2 (which was a mistake). I hope I haven't caused any other issues by skipping 17.3.5.
I am not sure of the first version, but it would have been the current version from around Feb 2022 (when the admin user was created). The install has been running for some time, but has only recently had more use (hence no issues being created until now).
@nanjingfm1 do you know if the DB seeds were executed as part of the installation process?
Those records look good, but there are others that you might have missed if the seeds were not run. At least for work item types, I would advise that you run the following commands in a Rails console (they should be safe to run with the records you already created)
We've just experienced the same bug. We aren't entirely sure when this happened, but we noticed it while running the GitLab Performance Tool; it failed after importing the first large project because it couldn't import the roughly 6k issues.
We've deployed GitLab Ultimate in a 3k reference architecture on-prem with the GitLab Environment Toolkit, upgraded from 17.6.0 to 17.6.1 a few weeks ago, and yesterday to 17.6.2.
The upgrades were deployed using the GET Ansible suite as well.
We've found this issue and ran the following command in the gitlab-rails console
@cpelzer thank you for reporting this. Running those 3 commands on the console should be enough to fix this problem. What installation method did you use with the GitLab Environment Toolkit? I'm not sure if there's an alternative to using Omnibus.
Just to be clear, this was a brand new installation, correct?
There are multiple methods, but we have set it up with the GitLab Debian packages on VMs running on Proxmox (that's the way GET sets it up on-prem).
The installation itself is still a PoC environment and was set up earlier this year, and since then it has gone through two upgrades to newer patch versions.
And yes, the three commands easily fixed the problems!
There is a Rake task (gitlab:db:configure) which will perform all database migrations (except batched background migrations), including seeding. It will only seed the database if the schema is empty. That rake task is called as part of gitlab-ctl reconfigure, assuming that automatic database migrations have not been disabled.
There should never be a reason to manually seed the database.
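Sketching the behavior described above in simplified form (an assumption based on this description, not the actual task source, which lives under lib/tasks/gitlab/):

```ruby
# gitlab:db:configure, roughly: migrate when tables already exist,
# otherwise load the schema and run the seeds.
if ActiveRecord::Base.connection.tables.any?
  Rake::Task['db:migrate'].invoke
else
  Rake::Task['db:schema:load'].invoke
  Rake::Task['db:seed_fu'].invoke
end
```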
So, since we're still in a PoC state, I just removed the entire GitLab deployment, re-deployed it using GET with version 17.8.1 as a fresh installation, and ran into the same problem.