This issue tracks the effort to create a cronjob with omnibus to trigger database index rebuilds automatically.
This should trigger a rake task (gitlab:db:reindex_auto or similar) at a set schedule. We might want to make the schedule configurable from the user perspective.
Note the rake task does not exist yet; we'd create it as a stub (no-op) if that helps to set up the cronjob (so we can add the code to it later, but the hook is already in place).
On GitLab.com, this cronjob should be deployed to an instance that has direct database access (no pgbouncers involved) - e.g. the same host we run database migrations on.
@mayra-cabrera Do we currently have windows of time throughout the week where we don't deploy anything?
This is about recreating indexes in the background, which has conflict potential with concurrent deployments (because of their database migrations). Now there is mitigation for this in #246497 (closed), but I wonder about its priority.
If we have fixed times throughout the week where we don't deploy, we might just start with scheduling the reindexing for that.
I wonder if this index recreation is something we can schedule in a post-migration so it's executed at the end of the deployment. How do we plan to start this? With some specific bloated indexes?
Thanks @mayra-cabrera ! We might just start with scheduling for the weekends and see if that's already enough to maintain a good state.
> I wonder if this index recreation is something we can schedule in a post-migration so it's executed at the end of the deployment
That crossed my mind, too - I had discarded it because it would delay deployments, perhaps by a long time (some indexes are huge and take a long time to rebuild). Ultimately this is unrelated to deploys, so I would like to keep it separate. Also, for self-hosted installations, this should run more regularly than the upgrade process.
> How do we plan to start this? With some specific bloated indexes?
Yeah, there is #246498 (closed) which would implement a heuristic to select a good candidate (based on a bloat estimate).
We might have a couple of steps here:
1. Rake task requires an index name as a parameter (this is where we are right now)
2. Rake task automatically selects an index at random (perhaps preferring larger tables)
3. Rake task becomes smarter and selects based on a bloat estimate or the history of reindexing
Regarding (3): it might be easiest to just cycle through all indexes one at a time, keep track of the history (and perhaps even the impact each reindexing had), and make smarter choices over time... Perhaps this is also slightly overengineered.
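A minimal sketch of what the selection in (2)/(3) could look like, assuming the size, bloat estimate, and reindexing history are already available (all names here are illustrative, not existing GitLab code):

```ruby
# Illustrative only: in reality size and bloat figures would come from
# Postgres catalog queries, and the history from a tracking table.
IndexInfo = Struct.new(:name, :size_bytes, :bloat_ratio, :last_reindexed_at)

# Pick the most bloated (then largest) index that was not rebuilt recently.
def pick_reindex_candidate(indexes, now: Time.now, cooldown: 7 * 24 * 3600)
  indexes
    .reject { |i| i.last_reindexed_at && now - i.last_reindexed_at < cooldown }
    .max_by { |i| [i.bloat_ratio, i.size_bytes] }
end
```

The cooldown keeps the task from cycling on the same index; tracking the observed size reduction per rebuild would be the "smarter over time" part.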
@ibaum I would like to add a cronjob that executes a Rake task on a set schedule. Would you mind pointing me in a direction how to do this best with omnibus, please?
A cron sidekiq job won't work: We can't go through pgbouncer here, so the job would have to execute on a host that has a direct database connection. For GitLab.com, that's deploy-01, for example.
Is crond_job what I'm looking for in omnibus? Would it work to create a gitlab::database_tasks cookbook or similar and have that included for the runlist for the deploy host?
Ideally, there would only be a single host where this is being executed from - but we can add a guard against concurrent execution to the Rake task, too.
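The guard against concurrent execution could use a Postgres session-level advisory lock; a rough sketch, assuming an ActiveRecord-style connection object (the lock key and helper name are made up for illustration):

```ruby
# Guard against concurrent reindexing runs via a session-level advisory
# lock. REINDEX_LOCK_KEY is an arbitrary constant; it only needs to be
# the same on every host that might run the task.
REINDEX_LOCK_KEY = 2_000_001

def with_reindexing_lock(connection)
  acquired = connection.select_value("SELECT pg_try_advisory_lock(#{REINDEX_LOCK_KEY})")
  unless acquired
    puts 'Another reindexing run is in progress - skipping.'
    return false
  end

  begin
    yield
    true
  ensure
    # Release the lock even if the rebuild raised.
    connection.select_value("SELECT pg_advisory_unlock(#{REINDEX_LOCK_KEY})")
  end
end
```

`pg_try_advisory_lock` returns immediately instead of blocking, so an overlapping cron run just exits cleanly.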
The alternative is to just configure the cronjob on the deploy host on GitLab.com for now and integrate it into omnibus later. I'd prefer to use omnibus from the start, if we can.
@abrandl `crond_job` is definitely the way to go. It uses the go-crond daemon we bundle with omnibus. You can look at the `crond_job` definition to see how it works.
I think a separate recipe would be great. omnibus does have a postgresql cookbook, so perhaps that is the appropriate place? Something like `postgresql::tasks` or `postgresql::maintenance_tasks`?
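Such a recipe could look roughly like this - a sketch only, modeled on how other `crond_job` definitions in omnibus look; the recipe path, user, and schedule are all assumptions to be confirmed against the actual resource definition:

```ruby
# Hypothetical recipe, e.g. postgresql/recipes/maintenance_tasks.rb.
crond_job 'database-reindexing' do
  user 'git'
  # Example schedule only: Saturdays at 02:00.
  minute '0'
  hour '2'
  day_of_week '6'
  command '/opt/gitlab/bin/gitlab-rake gitlab:db:reindex_auto'
end
```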
Where to include it does get tricky. Does the rake task connect directly to the db? Or through rails?
If direct, we could include it in the postgresql::enable recipe. That would require the task to behave appropriately when run on a read only db instance.
If rails, it gets trickier. I would need to double check the reference architecture and see where it would work.
So the rake task would use the standard Rails connection configuration - whatever is configured on the instance. However, it cannot go through pgbouncer (this is where it gets tricky).
We're only starting to implement the reindexing, and right now I'd just like to set this up for .com for starters - which, to my knowledge, also doesn't use the postgresql cookbooks.
> If rails, it gets trickier. I would need to double check the reference architecture and see where it would work.
I could see this working for .com if we had another cookbook, `gitlab::database_tasks`, and included it in the runlist of the desired hosts (through our own chef setup), but nowhere else for starters.
As a second step, we can see how this fits the reference architecture. I don't have enough insight here, so I would probably create an issue for the Distribution group at some point, if that's ok.
@abrandl One thought on the order in which indexes are rebuilt: if we have a list of good candidates for reindexing, we should perhaps go from the smallest to the biggest index to avoid running out of disk space - creating the new index requires additional disk space first, before we free up space by dropping the old, bloated one.
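That ordering idea could be as simple as the following sketch (made-up names; the point is that a concurrent rebuild temporarily needs room for both the old and the new index):

```ruby
# candidates: array of [index_name, size_bytes] pairs.
# Rebuild smallest-first and stop at the first index that would not fit
# into the currently free disk space - since the list is sorted
# ascending, nothing after it would fit either.
def reindex_order(candidates, free_bytes:)
  candidates
    .sort_by { |_name, size| size }
    .take_while { |_name, size| size < free_bytes }
    .map { |name, _size| name }
end
```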
@abrandl I think for gprd and gstg we are fine disk-space-wise for now, and this can wait for a later iteration. We alert at 90% disk space usage, so we might get paged one day because of index rebuilds, but we are not really at risk of running out of space. That may be different for other environments or customers, though - so thanks for looking up the stats and opening that issue!