file-storage1 grows a lot all the time. It seems the largest offenders are artifacts and lfs-objects. We need to set up alerts to be sure we keep on top of this and don't let everything catch on fire.
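As a rough illustration of the kind of alert meant here, the sketch below flags any mount whose used space crosses a threshold. The 85% threshold, the function names, and the idea of running it from a periodic job are all assumptions for illustration; the thread does not say which monitoring system would carry the real alert.

```python
import shutil


def usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_mounts(mounts, threshold=0.85, usage_fn=usage_fraction):
    """Return (path, fraction) pairs for mounts at or above the threshold.

    usage_fn is injectable so the check can be exercised without real disks.
    """
    alerts = []
    for path in mounts:
        fraction = usage_fn(path)
        if fraction >= threshold:
            alerts.append((path, fraction))
    return alerts


if __name__ == "__main__":
    # "/" is a placeholder; the real list would be the artifacts and LFS mounts.
    for path, fraction in check_mounts(["/"]):
        print(f"WARNING: {path} is {fraction:.0%} full")
```

A check like this only reports current fullness. Catching steady growth early would also need some history, which is left out of the sketch.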
@pcarranza I'm interested in the pages thing too. I'm going to run a du on that and find huge sites and see if any are abusive or all are normal.
I will also create a new LFS NFS server. How big do we want the LFS NFS to be? What do we even want it named? Later on I will also need to know if there is a way to block LFS uploads, or make LFS read-only for a few minutes, so we can finalize the switchover.
I have created nfs-lfs01.stor.gitlab.com. It is the same setup as our other NFS servers. I am going to get with @northrup on Monday to sync up the LFS folder and then we can make plans to switch over.
@pcarranza so I checked on the pages as I said I would, and there is a giant directory at /var/opt/gitlab/gitlab-rails/shared/pages/tmp. It is 919GB and I have no idea what it is for. There are also a few large pages sites, but none are over 1GB, which is our max, and it doesn't appear to be abuse.
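For reference, a check like the one described above can be sketched as a per-directory size listing, similar in spirit to running du on each child of the pages root. The function names are hypothetical, and the sketch sums apparent file sizes rather than allocated blocks, so its numbers can differ from what du reports.

```python
import os


def dir_size(path):
    """Sum the apparent sizes of all files under path, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files can vanish while the walk is in progress; skip them.
                pass
    return total


def largest_children(parent, limit=10):
    """Return up to `limit` (name, bytes) pairs for the biggest subdirectories."""
    sizes = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, dir_size(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:limit]
```

Pointing `largest_children` at the pages directory mentioned above would list a directory like `tmp` at the top if it really dominates the volume.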
I've no idea how LFS is implemented. If we can have a positive lookup, then we could run a process to clean up. But we will need an actual solution for the future to keep it under control.
I'm calling in the people who, as far as I know, have touched LFS at some point.
The sync finished! What would the next steps be to actually make the switch? I assume we need to make LFS read-only, sync again to get any changed files, and then mount the new location on all the workers?
@northrup Is there a way to mark LFS read only or do we have to turn it off? Should we use the FDT tool or rsync for the second pass? Should we plan a time to do this so we can alert our users to the downtime? Perhaps next week after the holiday? I don't like the idea of doing it before a holiday really, even if it is just a US holiday.
I mean, we definitely have to take the downtime. I'm curious what the user experience will be while LFS is turned off, though. I will be sure to do this on the Monday after US Thanksgiving.
The rsync finished very quickly. I am going to prepare merge requests with @northrup to switch the LFS mount over. Once these are completed, we can take the downtime and make the migration pretty painless due to the speed of the rsync.
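The final catch-up pass during the downtime could look roughly like the sketch below, which builds and runs a single rsync invocation. The helper names and the dry-run default are assumptions; the thread does not record the actual command line that was used. The `-a`, `--itemize-changes`, and `--dry-run` options are standard rsync flags. `--delete` is deliberately left out on the assumption that LFS objects are only ever added.

```python
import subprocess


def build_rsync_command(source, destination, dry_run=True):
    """Build an rsync command that copies the contents of source into destination."""
    command = ["rsync", "-a", "--itemize-changes"]
    if dry_run:
        # Report what would be transferred without changing the destination.
        command.append("--dry-run")
    # A trailing slash on the source makes rsync copy the directory contents
    # rather than the directory itself.
    command += [source.rstrip("/") + "/", destination.rstrip("/") + "/"]
    return command


def final_pass(source, destination, dry_run=True):
    """Run the catch-up sync and return rsync's itemized output."""
    command = build_rsync_command(source, destination, dry_run)
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

Running it once with `dry_run=True` shows how many files changed since the first sync, which gives a feel for how long the real pass will take.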
@pcarranza I don't think we have a deletion mechanism at the moment. It's a bit finicky because you have to be careful about concurrent uploads of the file you're about to delete. If you're not careful, you end up with missing LFS objects etc.
It's similar to other kinds of file uploads which also don't really have a good deletion regime. We thought of it as an append-only thing.
@jacobvosmaer-gitlab this is about deleting a folder that is not in use anymore. We moved the LFS objects to a different host completely. It's about cleaning up behind a move. rm -rf is the deletion mechanism to use in this case; there should be no uploads at all to the old storage.
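One cautious way to do that cleanup is to confirm that nothing under the old folder has been modified recently before removing it. The sketch below is only an illustration: the function names, the seven-day quiet period, and the dry-run default are assumptions, not something decided in this thread.

```python
import os
import shutil
import time


def newest_mtime(path):
    """Return the most recent modification time of path or anything beneath it."""
    newest = os.lstat(path).st_mtime
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            try:
                newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
            except OSError:
                # Entry disappeared during the walk; ignore it.
                pass
    return newest


def remove_if_idle(path, quiet_seconds=7 * 24 * 3600, now=None, dry_run=True):
    """Delete path only if nothing under it changed within quiet_seconds.

    Returns True when the tree is idle. With dry_run=True nothing is deleted.
    """
    now = time.time() if now is None else now
    idle = now - newest_mtime(path) >= quiet_seconds
    if idle and not dry_run:
        shutil.rmtree(path)  # equivalent in effect to rm -rf on the old folder
    return idle
```

If the check returns False, something is still writing to the old location, which would mean a worker is still using the old mount.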