Global Search data migrations: add storage requirements
The following discussion from !46672 (merged) should be addressed:
- @DylanGriffith started a discussion: I don't think we can start by assuming migrations will run automatically. We'll first need to support a manual action from the user to trigger the migrations. The reason is that some migrations will likely cause the Elasticsearch index to double in size, and it could be risky for an operator to have this happen to them by surprise.
This may require a database model similar to the one we use when triggering a re-index via the UI. The user will trigger the migrations (we can start with a rake task to save effort here), which persists a record in the DB. If that record is in the DB and not marked as completed, we'll run migrations; otherwise, we exit early.
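The gating logic described above can be sketched as follows. This is a minimal illustration, not GitLab's implementation: `MigrationTrigger` and `run_migrations_if_triggered` are hypothetical names, and the DB record is modeled as a plain Python object (or `None` when no record exists).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MigrationTrigger:
    """Hypothetical record persisted when an operator triggers migrations (e.g. via a rake task)."""
    completed: bool = False


def run_migrations_if_triggered(trigger: Optional[MigrationTrigger],
                                migrations: List[Callable[[], None]]) -> bool:
    """Run pending migrations only if a trigger record exists and is not completed.

    Returns True if migrations ran, False if we exited early.
    """
    if trigger is None or trigger.completed:
        # No outstanding request from the user: exit early.
        return False
    for migrate in migrations:
        migrate()
    # Mark the request as handled so later runs exit early again.
    trigger.completed = True
    return True
```

Keeping the "requested" state in a single record means a scheduled worker can call this on every tick and it stays a no-op until the operator opts in.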
Alternatively, we can persist this record in Elasticsearch instead of the DB if that's easier. It could possibly be an additional layer on top of the migrations already persisted in Elasticsearch, perhaps a boolean such as `requested=true`. When the user triggers the migrations to run, we find all of them in the filesystem and create a record for each one marking it as `requested=true`. As such, I think we'll first need to add a check here that the user has triggered migrations.
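A rough sketch of the per-migration `requested=true` idea, assuming an in-memory dict stands in for the migration records stored in Elasticsearch. The function names and the record shape (`requested`, `completed`) are hypothetical, and the version strings in the usage example are made up.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for the migration records kept in Elasticsearch:
# version -> {"requested": bool, "completed": bool}
MigrationState = Dict[str, Dict[str, bool]]


def request_all(versions: Iterable[str], state: MigrationState) -> None:
    """Mark every migration found on disk as requested=true (called when the user triggers migrations)."""
    for version in versions:
        record = state.setdefault(version, {"requested": False, "completed": False})
        record["requested"] = True


def runnable(versions: Iterable[str], state: MigrationState) -> List[str]:
    """Return, in version order, migrations the user requested that have not completed yet."""
    result = []
    for version in sorted(versions):
        record = state.get(version)
        if record and record["requested"] and not record["completed"]:
            result.append(version)
    return result
```

For example, with `versions = ["20201105", "20201110"]`, `runnable` returns an empty list until `request_all(versions, state)` is called; afterwards it returns both versions, and a version drops out once its record is marked completed.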
WDYT? Am I overthinking this?
That’s a good suggestion, but I don’t think we need user confirmation for every migration. I believe most of our migrations won’t require much space in the cluster. As a potential solution, we could put space requirements in each migration and stop the process if there isn’t enough space. WDYT?
I like that. If we detect there is not enough space, we can display a warning on the admin page saying migrations are paused until more space is added. This would let us keep the entire process automated. If we do this, we'll need to be very confident the migrations work before we put them on GitLab.com, because we may not be around to troubleshoot if something goes wrong when they run. This could be a follow-up issue to keep the first iteration smaller.
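The space-requirement proposal could look roughly like this: each migration declares how much extra storage it needs, and the runner pauses with an admin-facing warning when the cluster doesn't have it. This is a sketch under assumptions: `space_required_bytes`, `RunResult`, and `run_with_space_check` are hypothetical, and `available_bytes` is passed in rather than fetched (in practice it might be derived from something like the Elasticsearch nodes stats `fs.total.available_in_bytes` values).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Migration:
    version: str
    # Hypothetical: extra cluster storage (bytes) this migration needs, e.g. roughly
    # the current index size for a migration that reindexes into a new index.
    space_required_bytes: int = 0


@dataclass
class RunResult:
    completed: List[str] = field(default_factory=list)
    paused_at: Optional[str] = None
    warning: Optional[str] = None  # message to show on the admin page


def run_with_space_check(migrations: List[Migration], available_bytes: int) -> RunResult:
    """Run migrations in order, pausing before the first one that needs more space than is free."""
    result = RunResult()
    for m in migrations:
        if m.space_required_bytes > available_bytes:
            result.paused_at = m.version
            result.warning = (
                f"Elasticsearch migrations are paused: migration {m.version} needs "
                f"{m.space_required_bytes} bytes but only {available_bytes} bytes are "
                "available. Add more storage to continue."
            )
            break
        # The actual migration work would run here (omitted in this sketch).
        result.completed.append(m.version)
        # Conservatively assume the space used by this migration is not freed afterwards.
        available_bytes -= m.space_required_bytes
    return result
```

Pausing rather than failing keeps the process automated: once an operator adds storage, the next run resumes from the migration that was blocked.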