Support Request for Help: Repo packs are not cleaning up and repo storage grew from 700MB to 160GB
Support Request for the Gitaly Team
The goal is to keep these requests public. However, if customer information is required for the support request, please be sure to mark this issue as confidential.
This request template is part of Gitaly Team's intake process.
Customer Information
Salesforce Link: I don't have Salesforce access
Zendesk Ticket: https://gitlab.zendesk.com/agent/tickets/413901
Installation Size: They're using Kubernetes, but did not state whether they're using a reference architecture. I have requested this information.
Architecture Information:
Slack Channel:
Additional Information:
Support Request
Severity
This is a severity2 per the Support Engineer User Guidance, but I'm going to make an executive decision that it's actually a severity3 based on the fact that it does not appear to be blocking the customer. However, since the repo's size is growing unchecked, I'm interested in dealing with this swiftly.
Problem Description
Housekeeping does not appear to be running on a single repo, and the reason is unclear. This is leading to a size on disk of 160GB+, which far exceeds the 700MB project size as reported in the web UI and when cloning the repo.
We've been discussing this in Slack.
We can see the failure in the Sidekiq log, but it appears that it doesn't make it to Gitaly, as that correlation ID does not exist in the Gitaly log.
sidekiq.log
5768:2023-06-01T19:43:31.141185877Z {"component": "gitlab","subcomponent":"application_json","level":"info","severity":"INFO","time":"2023-06-01T19:43:31.140Z","correlation_id":"01H1W7S5KGDCCWYC5SZHC385XS","message":"Updating statistics for project 105"}
5778:2023-06-01T19:43:33.549322907Z {"component": "gitlab","subcomponent":"application_json","level":"debug","severity":"DEBUG","time":"2023-06-01T19:43:33.548Z","correlation_id":"01H1W7S5KGDCCWYC5SZHC385XS","message":"SilentModeInterceptor did nothing","mail_subject":"<REDACTED>","silent_mode_enabled":false}
16757:2023-06-01T19:55:04.715299115Z {"component": "gitlab","subcomponent":"git_json","level":"error","severity":"ERROR","time":"2023-06-01T19:55:04.714Z","correlation_id":"01H1W7S5KGDCCWYC5SZHC385XS","message":"gitaly_call failed:\n13:could not repack: repack failed with error code 128. debug_error_string:{\"created\":\"@1685649304.713485735\",\"description\":\"Error received from peer ipv4:<REDACTED>\",\"file\":\"src/core/lib/surface/call.cc\",\"file_line\":1063,\"grpc_message\":\"could not repack: repack failed with error code 128\",\"grpc_status\":13}"}
I redacted some potentially sensitive info, but you can see the full messages in the log files on the ticket/in Slack.
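For anyone who wants to repeat the correlation ID check against a log export, here is a minimal Python sketch. It assumes each log line is a JSON object carrying a `correlation_id` field, optionally preceded by a line-number/timestamp prefix like the one in the excerpt above. The function name is hypothetical and not part of any GitLab tooling, and I have not confirmed that every Gitaly log line uses this exact shape.

```python
import json
from typing import Iterable, Iterator


def lines_with_correlation_id(lines: Iterable[str], correlation_id: str) -> Iterator[dict]:
    """Yield parsed JSON log entries whose correlation_id matches.

    Assumption: the JSON object starts at the first "{" on the line, so any
    grep line number or container timestamp in front of it is skipped.
    """
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # no JSON payload on this line
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line; ignore it
        if isinstance(entry, dict) and entry.get("correlation_id") == correlation_id:
            yield entry


# Hypothetical usage against an exported log file:
#
#   with open("gitaly.log") as fh:
#       for entry in lines_with_correlation_id(fh, "01H1W7S5KGDCCWYC5SZHC385XS"):
#           print(entry.get("time"), entry.get("level"))
```

An empty result would match what I saw when searching by hand; it would not prove on its own that the request never reached Gitaly, only that no parsed line carries that ID.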
Troubleshooting Performed
- Confirmed that the UI shows the repo is 700MB, to rule out the possibility that the repository was inadvertently growing due to user action
- Attempted to manually run housekeeping, which fails during repack
- Attempted to run git fsck --no-dangling against the repo, which reported that it's checking 692576980. They can't let this run right now, since they need to use the repo, so I've asked them to schedule it during off hours. They will update with the results.
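A cheaper check than a full fsck might be `git count-objects -v`, which reports the number of packs, the pack size, and any garbage files in the object directory. This is a sketch of a possible next step, not something we have run for this customer. The helper names are hypothetical, and `repo_path` would need to point at the repository's on-disk path on the Gitaly node. Without `-H`, git reports the sizes in KiB.

```python
import subprocess
from typing import Dict


def parse_count_objects(output: str) -> Dict[str, int]:
    """Parse `git count-objects -v` output into a dict of integer fields.

    Non-numeric lines (for example `alternate: <path>`) are skipped.
    """
    stats: Dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def repo_pack_stats(repo_path: str) -> Dict[str, int]:
    """Run `git count-objects -v` in repo_path and return the parsed fields."""
    result = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_count_objects(result.stdout)


# Hypothetical usage:
#
#   stats = repo_pack_stats("/path/to/repo.git")
#   print(stats.get("packs"), stats.get("size-pack"), stats.get("size-garbage"))
```

A large `packs` count or a large `size-garbage` value would suggest that failed repacks are leaving files behind, which would be consistent with the disk usage we are seeing.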
What specifically do you need from the Gitaly team
Assistance with troubleshooting next-steps. We've run out of ideas on the Support side, so additional guidance from the experts would be great!
I'm happy to coordinate a call if you would like to do any live troubleshooting.
Author Checklist
- Customer information provided
- Severity realistically set
- Clearly articulated what is needed from the Gitaly team to support your request by filling out the What specifically do you need from the Gitaly team section