
Fix unintentional cleanup of Import/Export tmp files

What does this MR do?

This MR fixes a bug in ImportExportCleanUpService that removed in-flight project import files from the tmp storage location.

The primary purpose of ImportExportCleanUpService is to delete export archive tar.gz files from object storage after 24 hours, along with any export files in the tmp storage location that were not cleaned up. It is a housekeeping cron job that runs every hour. The regular post-execution cleanup for import/export is already in place in the corresponding Sidekiq workers.

There is, however, a bug that can unintentionally remove other projects' import files from the tmp storage location. An import request can carry a tar.gz file that is more than 24 hours old. Because the find command matches every file older than 24 hours and is not scoped to a specific project, it can pick up and remove other projects' import files that happen to be old, regardless of whether the actual import attempt is recent or still in progress. Such behaviour can disrupt project imports.
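The failure mode can be reproduced with a short sketch. This is not the service's actual code (the service is not written in Python and shells out to find); `unscoped_old_paths` is a hypothetical helper that mimics an age-only match over the whole tree. The demo assumes that an import file can have a modification time older than 24 hours even though the import attempt itself has just started.

```python
import os
import tempfile
import time

DAY_SECONDS = 24 * 60 * 60


def unscoped_old_paths(root, now=None, max_age=DAY_SECONDS):
    """Return every file or directory under root older than max_age.

    Hypothetical stand-in for an age-only match that is not scoped
    to a project or to an import/export attempt.
    """
    now = time.time() if now is None else now
    old = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if now - os.lstat(path).st_mtime > max_age:
                old.append(path)
    return sorted(old)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    # A brand-new import attempt directory (names are made up).
    attempt = os.path.join(root, "@hashed", "39", "fa", "projecthash", "attempt")
    os.makedirs(attempt)

    # An import file whose mtime is three days in the past.
    import_file = os.path.join(attempt, "project.json")
    open(import_file, "w").close()
    three_days_ago = time.time() - 3 * DAY_SECONDS
    os.utime(import_file, (three_days_ago, three_days_ago))

    # The file is flagged even though the attempt directory is fresh.
    print(unscoped_old_paths(root))
```

Only the old file is flagged here; the freshly created attempt directory around it is not, which is exactly the mismatch between file age and attempt age described above.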

Instead of deleting all files that are older than 24 hours, scope the find command to locate old files more accurately, without taking the files from the tar.gz into consideration. Use a fixed depth of 5 (both mindepth and maxdepth) so that the 'age' check applies to a particular import/export attempt.

The directory tree typically looks like this:

├── @groups
│   └── ef
│       └── bd
│           └── efbd1f26a54875e39972ccf7fa21a34f2491c850b2eba9636cb5478e595897b5 <-- hashed storage path of a group
│               └── 4ed3f070885d54dcc5b640775bc90a90 <-- directory for import/export. different hash for each attempt
├── @hashed
│   ├── 39
│   │   └── fa
│   │       └── efbd1f26a54875e39972ccf7fa21a34f2491c850b2eba9636cb5478e595897b5 <-- hashed storage path of a project
│   │           └── 4ed3f070885d54dcc5b640775bc90a90 <-- directory for import/export. different hash for each attempt

The directory at min/max depth 5 is the folder that gets created when an import/export attempt starts. It is a good candidate for deciding which old files to remove without accidentally removing something that is still in use.
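The scoped check can be sketched as follows. This is an illustration under stated assumptions, not the MR's implementation: `stale_attempt_dirs` is a hypothetical helper, depth is counted the way find counts it (the starting directory is depth 0), and only directories at exactly depth 5 are compared against the 24-hour threshold. Files inside an attempt directory are never inspected.

```python
import os
import time

DAY_SECONDS = 24 * 60 * 60
ATTEMPT_DEPTH = 5  # depth of the per-attempt directory, as in the tree above


def stale_attempt_dirs(root, now=None, max_age=DAY_SECONDS, depth=ATTEMPT_DEPTH):
    """Return directories at exactly `depth` below root older than max_age.

    Hypothetical helper: it looks only at the mtime of the attempt
    directory itself and ignores the age of anything inside it.
    """
    now = time.time() if now is None else now
    root = os.path.abspath(root)
    stale = []
    for dirpath, dirnames, _filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        if level == depth - 1:
            # The children of this directory sit at the target depth.
            for name in dirnames:
                path = os.path.join(dirpath, name)
                if now - os.lstat(path).st_mtime > max_age:
                    stale.append(path)
            dirnames[:] = []  # do not descend into attempt directories
    return sorted(stale)
```

With this scoping, a recent attempt directory that contains old files is left alone, while an attempt directory that is itself older than the threshold is reported for removal.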

This MR also adds logging for additional observability.

More details can be found in #332313 (closed)

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by George Koltsov
