Skip to content

Use the DatabaseCleaner 'deletion' strategy instead of 'truncation'

Nick Thomas requested to merge (removed):40744-idempotent-ids into master

What does this MR do?

Uses the DatabaseCleaner deletion strategy instead of the truncation strategy

Are there points in the code the reviewer needs to double check?

Performance is a question. Truncation is a fixed cost, whereas deletion costs scale with the number of rows. Mostly, we have only a few rows per spec, and others have seen performance improvements by switching to deletion in that case, e.g. http://sevenseacat.net/posts/2015/use-database-cleaners-deletion-strategy/

Some gitlab-specific examples:

spec/models/merge_request_spec.rb

Mostly transaction specs, a couple of truncation specs.

# Before (truncation)

Finished in 1 minute 10.82 seconds (files took 10.23 seconds to load)
236 examples, 0 failures


real	1m21.612s
user	0m52.324s
sys	0m3.740s

# After (deletion)

Finished in 1 minute 8.01 seconds (files took 9.84 seconds to load)
236 examples, 0 failures


real	1m18.482s
user	0m52.084s
sys	0m3.332s

Performance of transaction-using specs is unaffected.

spec/features/users_spec.rb

All :js specs

# Before

Finished in 47.15 seconds (files took 10.01 seconds to load)
10 examples, 0 failures


real	0m57.846s
user	0m20.140s
sys	0m3.772s



# After

Finished in 23.16 seconds (files took 10.21 seconds to load)
10 examples, 0 failures


real	0m34.076s
user	0m20.616s
sys	0m3.848s

Performance is improved significantly \o/

The overall effect on the test suite remains to be seen, but I'm hopeful that it is a significant improvement. If it's neutral, or even a slight regression, I still think it's worth it for solving the sequence issue. I've tried several approaches now, and they've all had drawbacks.

Why was this MR needed?

Most of our specs run using the DatabaseCleaner 'transaction' strategy, which is fast and good. It also has the property of not resetting sequences, so project IDs (and so, with hashed storage, on-disk repositories) never get re-used.

However, some - notably capybara, javacript, database migration, multi-threaded and (for Geo) FDW - specs require the data to be visible from multiple database connections, so we can't use transactions.

Currently, we use the truncation strategy instead. However, this has the undesirable property of resetting sequences, leading to order-dependent spec failures (some specs will fail if they happen to come after a spec that uses the truncation strategy).

The 'deletion' strategy allows data to be seen from multiple connections, is faster than truncation for our use case (but slower than transaction), and leaves sequences untouched, so is a better choice for those specs that are currently using truncation.

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Related to #40744 (closed)

Closes #30783 (closed)

Edited by Nick Thomas

Merge request reports