Revamp the dev seed

This came up in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/20973#note_92003190, where we were trying to adapt the seeder to test a feature:

@dosuken123: We were trying to fix & customize this seeder to test the feature, but since seeders are using random approach basically (e.g. sample(5)), so it's quite hard to manipulate it for a specific test purpose
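
To illustrate the point about randomness, here is a minimal sketch in plain Ruby. It is not taken from the actual seeders: `USERS`, `pick_users_randomly` and `pick_users` are hypothetical names. It only shows why an unseeded `sample(5)` is hard to build a test on, and how passing an explicit `Random` to `Array#sample` would make the selection reproducible.

```ruby
# Hypothetical illustration, not code from the GitLab seeders.
USERS = %w[alice bob carol dave erin frank grace heidi].freeze

# Unseeded: may return a different subset on every run,
# so a test cannot rely on which records exist.
def pick_users_randomly(count)
  USERS.sample(count)
end

# Seeded: an explicit Random instance makes the selection reproducible.
# The same seed always yields the same subset in the same order.
def pick_users(count, seed:)
  USERS.sample(count, random: Random.new(seed))
end
```

With something like this, a test-oriented seed could fix the seed value, while the developer-oriented seed could stay random.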

We're overloading the single seed (db:seed_fu) with multiple purposes:

To produce some data for developers to play around with locally (main purpose)
To test performance with a large amount of data: https://gitlab.com/gitlab-org/gitlab-ce/issues/28149
To better test migrations: https://gitlab.com/gitlab-org/gitlab-ce/issues/40789
To write some tests based on the seed (as mentioned in the quote above)

I think all of the above make sense, but a single seed can't serve so many purposes. We've had a lot of issues with seeds in the past, and I have often heard that seeds were broken.

I don't use them myself: they're slow, and I don't keep reseeding because I want to keep my current data rather than start over. We don't really test the seeds besides generating them in CI. They're in a bad state.

I think we need to break this down into multiple kinds of seeds, and perhaps provide a migration-like mechanism with which I could update my seed data without wiping my existing data. That would make me more interested in using seeds, and in fixing them when there's an issue. (One option would be to put everything under a seeds group, so that it's fine to wipe just that group and start over.)
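
As a rough sketch of what "like migration" could mean here, assuming each seed gets a version number and we record which versions have been applied: `SeedRunner`, `register` and `run_pending` are made-up names, not part of seed_fu or GitLab, and the in-memory `Set` and `Hash` stand in for a database table and the real data.

```ruby
require 'set'

# Hypothetical sketch: none of these names exist in GitLab or seed_fu.
class SeedRunner
  Seed = Struct.new(:version, :name, :block)

  attr_reader :applied

  def initialize(applied = Set.new)
    @seeds = []
    # Stand-in for a table that records which seed versions already ran.
    @applied = applied
  end

  # Registers a seed step under a unique, ordered version number.
  def register(version, name, &block)
    @seeds << Seed.new(version, name, block)
  end

  # Runs only the seeds that have not been applied yet, in version order,
  # and returns the names of the seeds that ran.
  def run_pending(store)
    pending = @seeds.reject { |seed| @applied.include?(seed.version) }
                    .sort_by(&:version)
    pending.each do |seed|
      seed.block.call(store)
      @applied << seed.version
    end
    pending.map(&:name)
  end
end

runner = SeedRunner.new
runner.register(1, 'users') { |store| store[:users] = %w[alice bob] }
runner.register(2, 'projects') { |store| store[:projects] = %w[demo] }

data = {}
runner.run_pending(data) # runs both seeds
runner.run_pending(data) # runs nothing, existing data is left alone
```

The idea is that adding a third seed later would only run that one, so nobody has to wipe their local data to pick it up.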

Revamping the seed could be a huge effort, because to implement this well we'd need another kind of factory API (is it possible to reuse our testing factories?), but I think we need to get there at some point. We could also use the seed data for QA.

What do you think?

/cc @dosuken123 @ayufan @rymai @smcgivern @jamedjo @stanhu @grzesiek @meks

Edited Sep 13, 2025 by 🤖 GitLab Bot 🤖