Seed dev with production like data by default

Sample code from https://gitlab.com/gitlab-org/gitlab-ce/snippets/33946

This took approximately 2 min 7 s to seed 1.5 million projects, including setting default values like created_at.

# Disable database insertion logs so speed isn't limited by ability to print to console
old_logger = ActiveRecord::Base.logger
ActiveRecord::Base.logger = nil

author = FactoryGirl.create(:user)

Project.insert_using_generate_series(1, 1500000) do |sql|
  # Alternative for random names: raw("md5(random()::text)")
  project_name = raw("'seed_project_' || seq")
  sql.name = project_name
  sql.path = project_name
  sql.creator_id = author.id
  sql.namespace_id = author.namespace_id
end

# Force a different/slower query plan by updating project visibility
Project.where(visibility_level: Gitlab::VisibilityLevel::PRIVATE).limit(200000).update_all(visibility_level: Gitlab::VisibilityLevel::PUBLIC)
Project.where(visibility_level: Gitlab::VisibilityLevel::PRIVATE).limit(20000).update_all(visibility_level: Gitlab::VisibilityLevel::INTERNAL)

# Reset logging
ActiveRecord::Base.logger = old_logger
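The implementation of insert_using_generate_series lives in the linked snippet and is not reproduced here. As a rough sketch only, a helper of this kind presumably boils down to a single INSERT ... SELECT over PostgreSQL's generate_series. The method name generate_series_insert_sql, the column hash, and the hard-coded IDs below are hypothetical and chosen for illustration:

```ruby
# Hypothetical helper, NOT the code from the linked snippet.
# Builds one INSERT ... SELECT statement that lets PostgreSQL generate
# the rows itself instead of instantiating one model per record.
def generate_series_insert_sql(table, first, last, columns)
  column_list = columns.keys.join(", ")
  select_list = columns.values.join(", ")

  <<~SQL
    INSERT INTO #{table} (#{column_list})
    SELECT #{select_list}
    FROM generate_series(#{Integer(first)}, #{Integer(last)}) AS seq
  SQL
end

# Values are raw SQL expressions; the IDs are assumed placeholders.
columns = {
  name: "'seed_project_' || seq",
  path: "'seed_project_' || seq",
  creator_id: 1,
  namespace_id: 1,
  created_at: "now()",
  updated_at: "now()"
}

sql = generate_series_insert_sql("projects", 1, 1_500_000, columns)
puts sql
```

The resulting string could be run with ActiveRecord::Base.connection.execute(sql) in a Rails console. Because the whole batch is a single statement, no per-row callbacks or validations run, which is likely where most of the speed comes from.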

@jamedjo Very cool, thanks! Also we may need to consider adding more labels, issues, users, merge requests, etc.

@rymai this is currently with the ~Edge team, but I think any team should be able to pick it up. Wdyt?

@jamedjo that's awesome

changed milestone to %9.0

Of course, it is also difficult to create a real-life data distribution in a seeded database. There will always be lots of outliers that may be difficult to reproduce.

Agreed, but we should do the best we can now, and then improve when we find these outliers.
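One possible way to approximate outliers is to draw per-project record counts from a heavy-tailed distribution rather than giving every project the same amount of data. The sketch below uses Pareto-style inverse-CDF sampling; the method name skewed_counts, the shape, and the cap are assumptions picked for illustration, not values measured from production:

```ruby
# Hypothetical sketch: per-project issue counts with a heavy tail, so a
# handful of seeded projects end up far larger than the typical one.
def skewed_counts(n, min: 1, shape: 1.2, cap: 50_000, rng: Random.new(42))
  Array.new(n) do
    u = 1.0 - rng.rand                        # uniform in (0, 1]
    count = (min / (u**(1.0 / shape))).floor  # Pareto inverse-CDF sample
    [count, cap].min                          # cap keeps seeding time bounded
  end
end

counts = skewed_counts(10_000)
sorted = counts.sort
puts "median: #{sorted[sorted.size / 2]}, max: #{sorted.last}"
```

Fixing the random seed keeps the seeded dataset reproducible between runs, which makes query-plan comparisons easier.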

@jamedjo Thanks! Nice trick! Let's create a rake task to seed a lot of projects / records using this technique! I feel the pain locally now!

mentioned in issue #27609 (closed)

mentioned in issue #28780 (moved)

changed milestone to %9.1

mentioned in issue #27099 (closed)

changed milestone to %9.3

mentioned in issue #32341 (moved)

@omame can we use this w.r.t. spinning up production-like topologies / environments in a separate Azure account? If I understand correctly, @rymai's team is working on this for dev, but once we have it, there's no reason not to use it more widely, right?

mentioned in issue #27842 (closed)

@ernstvn Exactly.

mentioned in issue #32804 (moved)

@ernstvn My plan, if possible, is to use DB snapshots for environment creation. This way we'd have all production data available for testing. This is already in our pipeline, as I explained to you in a call last week.

mentioned in project snippet $33946

I've started working on that in https://gitlab.com/gitlab-org/gitlab-ce/compare/master...28149-improve-seed but realistically, this won't be ready for %9.3.

assigned to @rymai

mentioned in issue #31144 (moved)

This is great! I love what I see here and what @oswaldo did in https://gitlab.com/gitlab-org/gitlab-ce/snippets/33946.

mentioned in merge request !12355 (merged)

Useful snippet from @yorickpeterse:

-- Adjust the author/project IDs accordingly.
WITH min_iid AS (
    -- COALESCE covers a project that has no issues yet (MAX would be NULL).
    SELECT COALESCE(MAX(iid), 0) + 1 AS value
    FROM issues
    WHERE project_id = 2
),
series AS (
    SELECT generate_series(min_iid.value, min_iid.value + 5000) AS iid
    FROM min_iid
)
INSERT INTO issues (iid, title, author_id, project_id, description, created_at, updated_at, state)
SELECT series.iid AS iid,
    CONCAT('Test issue ', series.iid) AS title,
    1 AS author_id,
    2 AS project_id,
    CONCAT('Test issue description ', series.iid) AS description,
    now() AS created_at,
    now() AS updated_at,
    (
        CASE series.iid % 3
        WHEN 1 THEN 'opened'
        WHEN 2 THEN 'reopened'
        ELSE 'closed' END
    ) AS state
FROM series;
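To sanity-check what the query above inserts, the same row arithmetic can be mirrored in plain Ruby. This is only an illustration: first_iid is an assumed starting value, and issue_state is a hypothetical name that copies the CASE expression:

```ruby
# Mirrors the CASE expression from the SQL snippet.
def issue_state(iid)
  case iid % 3
  when 1 then "opened"
  when 2 then "reopened"
  else "closed"
  end
end

first_iid = 1                      # assumed starting iid for illustration
iids = (first_iid..first_iid + 5000)

# generate_series is inclusive on both ends, and so is this range.
tally = iids.each_with_object(Hash.new(0)) { |iid, acc| acc[issue_state(iid)] += 1 }

puts "rows: #{iids.size}"
puts tally
```

Since both bounds are inclusive, the statement inserts 5,001 issues rather than 5,000, and the states are spread almost evenly across opened, reopened, and closed.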

mentioned in merge request !12999 (merged)

changed milestone to %9.5

I'm unassigning myself for now since I have other things to finish before that. If anyone wants to take it in the meantime, there's a WIP branch at https://gitlab.com/gitlab-org/gitlab-ce/compare/master...28149-improve-seed.
