# Gitaly issues (https://gitlab.com/gitlab-org/gitaly/-/issues)

# Issue #5919: Removal of feature flag gitaly_use_resizable_semaphore_lifo_strategy

https://gitlab.com/gitlab-org/gitaly/-/issues/5919 (2024-03-27, Emily Chui)

## What
Remove the `:gitaly_use_resizable_semaphore_lifo_strategy` feature flag.
## Owners
- Team: Gitaly
- Most appropriate slack channel to reach out to: `#g_gitaly`
- Best individual to reach out to: @echui-gitlab
## Steps
- [ ] Remove the feature flag and the pre-feature-flag code ([howto](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#feature-lifecycle-after-it-is-live))
- [ ] Wait for the MR to be deployed to production
- [ ] Remove the feature flag via chatops ([howto](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#remove-the-feature-flag-via-chatops))
Please refer to the [documentation of feature flags](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#feature-flags) for further information.

Milestone: 17.1. Assignee: Emily Chui.

# Issue #5918: Default enable feature flag gitaly_use_resizable_semaphore_lifo_strategy

https://gitlab.com/gitlab-org/gitaly/-/issues/5918 (2024-03-27, Emily Chui)

## What
Default enable the `:gitaly_use_resizable_semaphore_lifo_strategy` feature flag.
## Owners
- Team: Gitaly
- Most appropriate slack channel to reach out to: `#g_gitaly`
- Best individual to reach out to: @echui-gitlab
## Steps
- [ ] Change the feature flag to default-enabled ([howto](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#feature-lifecycle-after-it-is-live))
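The change itself is typically a one-line edit to the flag definition. A minimal stand-alone sketch of what that flip looks like; the `FeatureFlag` type and its fields here are illustrative, not Gitaly's actual `featureflag` API:

```go
package main

import "fmt"

// FeatureFlag is a minimal stand-in for a flag definition; Gitaly's real
// type in internal/featureflag differs in detail.
type FeatureFlag struct {
	Name        string
	OnByDefault bool
}

func main() {
	// Default-enabling the flag amounts to flipping OnByDefault to true
	// in its definition.
	flag := FeatureFlag{
		Name:        "gitaly_use_resizable_semaphore_lifo_strategy",
		OnByDefault: true, // previously false
	}
	fmt.Printf("%s default-enabled: %t\n", flag.Name, flag.OnByDefault)
}
```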
Please refer to the [documentation of feature flags](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#feature-flags) for further information.

Milestone: 17.0. Assignee: Emily Chui.

# Issue #5917: Make `ResolveConflicts` RPC support CRLF line endings

https://gitlab.com/gitlab-org/gitaly/-/issues/5917 (2024-03-27, Justin Tobler)

The `ResolveConflicts` RPC is used to write commits that resolve conflicts. If a file uses CRLF line endings, the conflict markers are committed to the file without being resolved: Gitaly appears to be unable to detect the conflict markers and thus ignores them.
Update `conflict.Resolve()` to support conflict resolution of CRLF files.
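For illustration, a self-contained sketch (not Gitaly's actual resolver code) of why CRLF content can defeat marker detection: splitting the file on `\n` leaves a trailing `\r` on every line, so exact comparisons against the marker strings fail unless the carriage return is stripped first.

```go
package main

import (
	"fmt"
	"strings"
)

// countMarkers counts conflict-marker lines using exact comparisons, the
// way a naive resolver might. Splitting CRLF content on "\n" leaves a
// trailing "\r" on every line, so the comparisons fail unless stripCR
// trims it first.
func countMarkers(content string, stripCR bool) int {
	n := 0
	for _, line := range strings.Split(content, "\n") {
		if stripCR {
			line = strings.TrimSuffix(line, "\r")
		}
		if line == "<<<<<<< HEAD" || line == "=======" || line == ">>>>>>> branch" {
			n++
		}
	}
	return n
}

func main() {
	crlf := "a\r\n<<<<<<< HEAD\r\nours\r\n=======\r\ntheirs\r\n>>>>>>> branch\r\nb\r\n"
	fmt.Println(countMarkers(crlf, false)) // 0: every marker line still ends in \r
	fmt.Println(countMarkers(crlf, true))  // 3: markers found once \r is stripped
}
```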
Related: https://gitlab.com/gitlab-org/gitlab/-/issues/301043#note_1830066004

Milestone: 17.0.

# Issue #5914: Git bundle failing due to bad hidden ref

https://gitlab.com/gitlab-org/gitaly/-/issues/5914 (2024-03-28, Kaitlyn Chappell, kmchappell@gitlab.com)

# Support Request for the Gitaly Team
<!--
The goal of this template is to create a consistent experience for customer support requests from the Gitaly Team. Due to the size of the team and ambitious amount of work we try to complete, it helps us tremendously to have a common issue format for requests that can be prioritized appropriately. It also helps keep a record of issues experienced that can benefit other teams in the future.
As we collaborate on resolution of this issue, the Gitaly team will attempt to utilize this as a single source of truth.
-->
_The goal is to keep these requests public. However, if customer information is required for the support request, please be sure to mark this issue as confidential._
This request template is part of [Gitaly Team's intake process](https://about.gitlab.com/handbook/engineering/development/enablement/systems/gitaly/#how-to-contact-the-team).
## Author Checklist
- [x] Reached out to [#spt_pod_git](https://gitlab.enterprise.slack.com/archives/C04D5FUADAM) prior to creating this issue. Link: https://gitlab.slack.com/archives/C04D5FUADAM/p1711388550050389
- [x] Fill out customer information section
- [x] Provided a detailed summary under **Additional Information:**
- [x] Severity realistically set
- [x] Provided detailed problem description
- [x] Provided detailed troubleshooting performed
- [x] Clearly articulated what is needed from the Gitaly team to support your request by filling out the _What specifically do you need from the Gitaly team_
## Customer Information
**Salesforce Link:** https://gitlab.my.salesforce.com/0016100001WJU1UAAX
**Zendesk Ticket:** https://gitlab.zendesk.com/agent/tickets/513890
**Installation Size:** Bundled gitaly, single node omnibus, 775 users.
**Architecture Information:**
<!-- Please include cloud hosting provider if available, links to architecture documents, etc... -->
**Slack Channel:**
<!-- Please include the general slack channel, the slack channel for the incident, etc... -->
**Additional Information:**
<!-- Links to executive summary, customer calls, etc... Anything that helps provide context for the team -->
## Support Request
### Severity S3
<!-- Please be as realistic as possible here. We are sensitive to the fact that customers are frustrated when things aren't working, but realistically we cannot treat everything as a Severity 1 emergency.
For a good rule of thumb, please refer to the bug prioritization framework located in the handbook here: https://about.gitlab.com/handbook/engineering/quality/issue-triage/#severity
For S1 or S2 issues, please follow https://about.gitlab.com/handbook/engineering/development/enablement/systems/gitaly/#urgent-issues-and-outages .
-->
### Problem Description
<!-- Please describe the problem in as much detail as possible. Feel free to include log outputs, screenshots, or anything else that could help the team understand what is happening. -->
When they run the backup Rake task or the project export, `git bundle` fails on one specific repository with this error:
```
command":"create","error":"manager: write bundle: remote repository: create bundle: rpc error: code = Internal desc = create bundle: exit status 128: stderr: \"fatal: bad object refs/merge-requests/722/merge\\n\
manager: write bundle: remote repository: create bundle: rpc error: code = Internal desc = create bundle: exit status 128: stderr: \"fatal: bad object refs/merge-requests/722/merge\\n\"\n"
```
Since bundle is failing, the only way they can get a backup is by using the skip-repositories flag to exclude this one project. Everything else works: they can clone and push without issue.
It's not a fork or mirror. Doing the git fsck repo check per the docs returns no problems.
I asked if the backups ever worked for this repo, and if they knew what happened, and was told:
> Probably not in a long time. I first took over the system ~2 years ago. I feel like I have gotten good backups out of it, but that might only ever have been with `SKIP=repositories` in the backup command. The MR that it’s complaining about would have been dated Oct 2 or 3, 2019. I spoke with an owner of the project and he specifically recalled what he called a “corrupt commit” happening and it screwing things up, but they eventually fixed it well enough they could move forward.
So mystery Git trouble leads to more mystery Git trouble.
- What version is the customer running? 16.7.7
- What is the customers architecture?
- What is the GitLab architecture? Single node omnibus
- Are networking filesystems (like NFS) used?
- What are the filesystems?
- What are the OS and kernel versions?
- How are backup, replication, HA, etc performed? The GitLab Rake backup; they also have disk snapshots as a secondary backup
- Are they using Gitaly Cluster? nope
- How many Gitaly Clusters the customer has? n/a
- How many Gitaly nodes per cluster the customer has configured? n/a
- Has the customer, or some tools/script (backup, synchro, replication, HA, etc) they set up, directly interacted with the Git repository? no, that's what we're getting advice on
- using `rsync` or similar tools? no
- `git` commands? no
- history changing tools (like [git filter-repo](https://github.com/newren/git-filter-repo))? no
- Does the customer have any hooks configured? unknown
- [Git server hooks](https://docs.gitlab.com/ee/administration/server_hooks.html)
- [Git hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks)
- If this is a performance issue, what does the Git workflow look like? not performance
- What are the customer RPS for push and pulls? (use [fast-stats](https://gitlab.com/gitlab-com/support/toolbox/fast-stats))
- How many pipelines does the customer run?
- How many users are working on the instance?
- How big are the repositories? Do they have [monorepos](https://docs.gitlab.com/ee/user/project/repository/monorepos/)?
- Provide the output of [git-sizer](https://github.com/github/git-sizer).
### Troubleshooting Performed
Prior to reaching out to us, the customer attempted to delete this ref with `git push origin --delete --force refs/merge-requests/722/merge`, but it failed because it's a hidden ref. The next step was to try removing hidden refs as shown in the [reduce repo size docs](https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html#purge-files-from-repository-history). But that process relies on having an export, which fails because it also uses `git bundle`.
### What specifically do you need from the Gitaly team
Is it safe to follow the filter-repo hidden-ref removal steps on a "live" clone of the repository instead of the bundle? We don't want to touch anything on Gitaly without checking and risk making the problem worse.
Is there a better way to delete these refs or otherwise get bundle to function again?
/cc @mjwood @andrashorvath @jcaigitlab @john.mcdonnell @gerardo

Milestone: 17.0.

# Issue #5913: Remove non-transactional handling of maintenance scoped RPCs

https://gitlab.com/gitlab-org/gitaly/-/issues/5913 (2024-03-28, Sami Hiltunen, shiltunen@gitlab.com)

`maintenance` scoped RPCs are currently handled non-transactionally: https://gitlab.com/gitlab-org/gitaly/-/blob/1db819aed093677ba27a719afa7516f7fdea7a92/internal/gitaly/storage/storagemgr/middleware.go#L214. We should change this and start erroring out on RPCs that are not supported by transactions, to ensure we don't miss any.
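A hedged sketch of the proposed behavior, with illustrative names rather than Gitaly's actual middleware API: maintenance RPCs are rejected unless explicitly allow-listed as transaction-aware, instead of silently bypassing transactions.

```go
package main

import "fmt"

// transactionAware lists maintenance RPCs known to work inside a
// transaction. Method names are illustrative; the real middleware lives
// in internal/gitaly/storage/storagemgr/middleware.go.
var transactionAware = map[string]bool{
	"/gitaly.RepositoryService/OptimizeRepository": true,
}

// handleMaintenanceRPC errors out on maintenance RPCs that are not
// explicitly supported, instead of silently skipping transactions.
func handleMaintenanceRPC(fullMethod string) error {
	if !transactionAware[fullMethod] {
		return fmt.Errorf("maintenance RPC not supported by transactions: %s", fullMethod)
	}
	return nil // proceed inside a transaction
}

func main() {
	fmt.Println(handleMaintenanceRPC("/gitaly.RepositoryService/OptimizeRepository"))
	fmt.Println(handleMaintenanceRPC("/gitaly.RepositoryService/PruneUnreachableObjects"))
}
```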
- `OptimizeRepository` will soon support transactions through !6705+. We should add it to the list of RPCs that skip implicit transaction handling.
- `PruneUnreachableObjects` is the remaining maintenance RPC. We should make it a no-op if transactions are used.

Assignee: Quang-Minh Nguyen (qmnguyen@gitlab.com).

# Issue #5912: Bundle URI may silently fail as errors get swallowed

https://gitlab.com/gitlab-org/gitaly/-/issues/5912 (2024-03-25, Sami Hiltunen, shiltunen@gitlab.com)

Gitaly's `bundle-uri` functionality relies on the backups. It reuses the bundles from the backups and uses the signed-URL functionality to give direct access to the bundle.
If there is a problem accessing the backups in the sink/object storage or generating a signed URL, the errors are swallowed and the `bundle-uri` functionality silently disabled. Examples of swallowed errors can be found [here](https://gitlab.com/gitlab-org/gitaly/-/blob/7041d5158c4a2dfe2c5620e4d016cb6b65070dae/internal/bundleuri/git_config.go#L47) and [here](https://gitlab.com/gitlab-org/gitaly/-/blob/7041d5158c4a2dfe2c5620e4d016cb6b65070dae/internal/bundleuri/git_config.go#L58).
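A minimal, generic sketch of surfacing the error rather than swallowing it; the function names and failure mode here are hypothetical stand-ins, not Gitaly's actual `bundleuri` code. The point: the error is logged before the fallback to a normal clone, instead of being discarded.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// signedBundleURL stands in for the sink's signed-URL generation; the
// name and failure are hypothetical, for illustration only.
func signedBundleURL() (string, error) {
	return "", errors.New("object storage: access denied")
}

// bundleURIConfig returns extra Git config enabling bundle-uri, or nil
// to fall back to a normal clone. The error is logged before being
// discarded, so the silent disabling becomes observable.
func bundleURIConfig() []string {
	uri, err := signedBundleURL()
	if err != nil {
		log.Printf("bundle-uri disabled: %v", err) // surface the reason
		return nil
	}
	return []string{fmt.Sprintf("bundle.default.uri=%s", uri)}
}

func main() {
	fmt.Println(bundleURIConfig() == nil) // prints "true": fell back after logging
}
```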
We should log the errors that lead to disabling the functionality.

# Issue #5911: ResolveConflicts returning `13:retrieving object: object not found`

https://gitlab.com/gitlab-org/gitaly/-/issues/5911 (2024-03-27, Stan Hu)

I'm not sure if this is a Gitaly error or a Rails caller issue, but @marina.mosti reported a 500 error resolving conflicts in https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/switchboard/-/merge_requests/1252:
![image](/uploads/3370c903b39cbd726399991093730ba1/image.png)
In https://log.gprd.gitlab.net/app/r/s/02Jwm, I see these 500 errors with Gitaly on `gitaly-08-stor-gprd` reporting `13:retrieving object: object not found`:
![image](/uploads/8384ab60972d4cb56c2d92ca25a47211/image.png)
```json
"exception.backtrace": [
"lib/gitlab/git/wraps_gitaly_errors.rb:24:in `rescue in wrapped_gitaly_errors'",
"lib/gitlab/git/wraps_gitaly_errors.rb:6:in `wrapped_gitaly_errors'",
"lib/gitlab/git/conflict/resolver.rb:30:in `resolve_conflicts'",
"lib/gitlab/conflict/file_collection.rb:25:in `resolve'",
"app/services/merge_requests/conflicts/resolve_service.rb:9:in `execute'",
"app/controllers/projects/merge_requests/conflicts_controller.rb:66:in `resolve_conflicts'",
"ee/lib/gitlab/ip_address_state.rb:10:in `with'",
"ee/app/controllers/ee/application_controller.rb:45:in `set_current_ip_address'",
"app/controllers/application_controller.rb:468:in `set_current_admin'",
"lib/gitlab/session.rb:11:in `with_session'",
"app/controllers/application_controller.rb:459:in `set_session_storage'",
"lib/gitlab/i18n.rb:114:in `with_locale'",
"lib/gitlab/i18n.rb:120:in `with_user_locale'",
...
```
The parameters from the log message:
```
[commit_message, files, namespace_id, project_id, id, conflict]
```
```
[
Merge branch 'main' into 'marina.mosti-4347-leverage-invalidation'
# Conflicts:
# app/javascript/pages/tenants/show/customer_page.vue
# app/javascript/pages/tenants/show/operator_page.vue
# app/javascript/queries/use_tenant_config_changes_query.js,
[{"old_path"=>"app/javascript/pages/tenants/show/customer_page.vue", "new_path"=>"app/javascript/pages/tenants/show/customer_page.vue", "content"=>"[FILTERED]"}, {"old_path"=>"app/javascript/pages/tenants/show/operator_page.vue", "new_path"=>"app/javascript/pages/tenants/show/operator_page.vue", "content"=>"[FILTERED]"}, {"old_path"=>"app/javascript/queries/use_tenant_config_changes_query.js", "new_path"=>"app/javascript/queries/use_tenant_config_changes_query.js", "content"=>"[FILTERED]"}],
gitlab-com/gl-infra/gitlab-dedicated,
switchboard,
1252,
{"commit_message"=>"Merge branch 'main' into 'marina.mosti-4347-leverage-invalidation'\n\n# Conflicts:\n# app/javascript/pages/tenants/show/customer_page.vue\n# app/javascript/pages/tenants/show/operator_page.vue\n# app/javascript/queries/use_tenant_config_changes_query.js", "files"=>[{"old_path"=>"app/javascript/pages/tenants/show/customer_page.vue", "new_path"=>"app/javascript/pages/tenants/show/customer_page.vue", "content"=>"[FILTERED]"}, {"old_path"=>"app/javascript/pages/tenants/show/operator_page.vue", "new_path"=>"app/javascript/pages/tenants/show/operator_page.vue", "content"=>"[FILTERED]"}, {"old_path"=>"app/javascript/queries/use_tenant_config_changes_query.js", "new_path"=>"app/javascript/queries/use_tenant_config_changes_query.js", "content"=>"[FILTERED]"}]}
]
```
It looks like https://gitlab.com/gitlab-org/gitaly/-/blob/de8220dfa186ef1d864050db8473f5bc629113ef/internal/gitaly/service/conflicts/resolve_conflicts.go#L254-258 is where this error originated. It sounds like it wasn't able to retrieve `conflictedBlob.OID.Revision()` for some reason.
@knayakgl Do you have any idea what might be going on here?

# Issue #5910: Remove version definition from feature flags

https://gitlab.com/gitlab-org/gitaly/-/issues/5910 (2024-03-22, Sami Hiltunen, shiltunen@gitlab.com)

Feature flag definitions contain the version they were introduced in, for example [here](https://gitlab.com/gitlab-org/gitaly/-/blob/cb8bec7d843749b73a0fc06f44e6aec25b378d92/internal/featureflag/ff_bundle_uri.go#L6). The version isn't used for anything, and it creates meaningless work to bump it when an MR introducing a flag misses a release. The version a flag was introduced in is available via the Git history, so duplicating it in the feature flag file is unnecessary. Let's remove it.

# Issue #5907: Gitaly pod shows a lot of git zombie processes

https://gitlab.com/gitlab-org/gitaly/-/issues/5907 (2024-03-27, Victor Enriquez)

Hi,
We are a customer with a Premium GitLab license. After a non-voluntary restart of the Gitaly pod today, I realized that Gitaly is spawning a lot of git zombie processes, at least one or two per minute. This is happening on a small Kubernetes deployment with two Geo sites. Both sites show the same behavior; Gitaly is running with pretty much the default values provided by the Helm chart.
Eventually the cgroup is exhausted and the pod is restarted, causing some downtime. Is there any known cause for this behavior?
```
git@production-gitlab-gitaly-0:/$ ps auxwww | grep git | grep defunct | wc -l
552
```
```
git@production-gitlab-gitaly-0:/var/log/gitaly$ ps auxwww
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
git 1 12.7 3.0 3405872 235464 ? Ssl 14:39 13:14 /usr/local/bin/gitaly /etc/gitaly/config.toml
git 16 0.7 0.0 1227888 7436 ? Sl 14:39 0:48 /usr/local/bin/gitlab-logger --json /var/log/gitaly
git 419 0.1 0.0 0 0 ? Z 14:40 0:09 [git] <defunct>
git 651 0.0 0.0 0 0 ? Z 14:40 0:01 [git] <defunct>
git 1649 0.1 0.0 0 0 ? Z 14:42 0:08 [git] <defunct>
git 1851 0.0 0.0 0 0 ? Z 14:42 0:01 [git] <defunct>
git 3606 0.1 0.0 0 0 ? Z 14:44 0:07 [git] <defunct>
git 3699 0.0 0.0 0 0 ? Z 14:44 0:01 [git] <defunct>
git 4715 0.0 0.0 0 0 ? Z 14:45 0:01 [git] <defunct>
git 5366 0.1 0.0 0 0 ? Z 14:46 0:06 [git] <defunct>
git 6473 0.0 0.0 0 0 ? Z 14:47 0:01 [git] <defunct>
git 6498 0.1 0.0 0 0 ? Z 14:47 0:06 [git] <defunct>
git 7284 0.1 0.0 0 0 ? Z 14:48 0:06 [git] <defunct>
git 7353 0.0 0.0 0 0 ? Z 14:48 0:01 [git] <defunct>
git 8508 0.1 0.0 0 0 ? Z 14:50 0:06 [git] <defunct>
git 8568 0.0 0.0 0 0 ? Z 14:50 0:01 [git] <defunct>
git 8895 0.1 0.0 0 0 ? Z 14:51 0:06 [git] <defunct>
git 8967 0.0 0.0 0 0 ? Z 14:51 0:01 [git] <defunct>
git 10858 0.0 0.0 0 0 ? Z 14:53 0:01 [git] <defunct>
git 10901 0.1 0.0 0 0 ? Z 14:53 0:06 [git] <defunct>
git 11428 0.1 0.0 0 0 ? Z 14:54 0:06 [git] <defunct>
git 12786 0.0 0.0 0 0 ? Z 14:55 0:01 [git] <defunct>
git 12883 0.1 0.0 0 0 ? Z 14:55 0:06 [git] <defunct>
git 15756 0.1 0.0 0 0 ? Z 14:57 0:06 [git] <defunct>
git 15911 0.0 0.0 0 0 ? Z 14:57 0:01 [git] <defunct>
git 16911 0.0 0.0 0 0 ? Z 14:59 0:01 [git] <defunct>
git 16939 0.1 0.0 0 0 ? Z 14:59 0:07 [git] <defunct>
git 17529 0.1 0.0 0 0 ? Z 15:00 0:06 [git] <defunct>
git 17901 0.0 0.0 0 0 ? Z 15:00 0:01 [git] <defunct>
git 19218 0.1 0.0 0 0 ? Z 15:02 0:06 [git] <defunct>
git 19350 0.0 0.0 0 0 ? Z 15:02 0:01 [git] <defunct>
git 19812 0.0 0.0 0 0 ? Z 15:03 0:01 [git] <defunct>
git 19973 0.1 0.0 0 0 ? Z 15:03 0:06 [git] <defunct>
git 21191 0.1 0.0 0 0 ? Z 15:04 0:06 [git] <defunct>
git 21279 0.0 0.0 0 0 ? Z 15:04 0:01 [git] <defunct>
git 22928 0.0 0.0 0 0 ? Z 15:06 0:01 [git] <defunct>
git 22946 0.1 0.0 0 0 ? Z 15:06 0:06 [git] <defunct>
git 23417 0.1 0.0 0 0 ? Z 15:07 0:06 [git] <defunct>
git 25351 0.0 0.0 0 0 ? Z 15:08 0:01 [git] <defunct>
git 26636 0.1 0.0 0 0 ? Z 15:09 0:06 [git] <defunct>
git 26744 0.0 0.0 0 0 ? Z 15:09 0:01 [git] <defunct>
git 27411 0.0 0.0 0 0 ? Z 15:10 0:01 [git] <defunct>
git 27439 0.1 0.0 0 0 ? Z 15:10 0:06 [git] <defunct>
git 28357 0.0 0.0 0 0 ? Z 15:11 0:01 [git] <defunct>
git 28382 0.1 0.0 0 0 ? Z 15:11 0:07 [git] <defunct>
git 29082 0.0 0.0 0 0 ? Z 15:12 0:01 [git] <defunct>
git 29109 0.1 0.0 0 0 ? Z 15:12 0:07 [git] <defunct>
git 29425 0.0 0.0 0 0 ? Z 15:13 0:01 [git] <defunct>
git 30205 0.1 0.0 0 0 ? Z 15:14 0:06 [git] <defunct>
git 30797 0.0 0.0 0 0 ? Z 15:15 0:01 [git] <defunct>
git 30823 0.1 0.0 0 0 ? Z 15:15 0:06 [git] <defunct>
git 31596 0.0 0.0 0 0 ? Z 15:16 0:01 [git] <defunct>
git 32498 0.0 0.0 0 0 ? Z 15:17 0:01 [git] <defunct>
git 32545 0.1 0.0 0 0 ? Z 15:17 0:07 [git] <defunct>
git 33882 0.0 0.0 0 0 ? Z 15:18 0:01 [git] <defunct>
git 33946 0.1 0.0 0 0 ? Z 15:18 0:07 [git] <defunct>
git 34974 0.0 0.0 0 0 ? Z 15:19 0:01 [git] <defunct>
git 34992 0.1 0.0 0 0 ? Z 15:19 0:06 [git] <defunct>
git 35557 0.1 0.0 0 0 ? Z 15:20 0:06 [git] <defunct>
git 36267 0.0 0.0 0 0 ? Z 15:21 0:01 [git] <defunct>
git 36327 0.2 0.0 0 0 ? Z 15:21 0:09 [git] <defunct>
git 37029 0.0 0.0 0 0 ? Z 15:22 0:02 [git] <defunct>
git 37524 0.2 0.0 0 0 ? Z 15:23 0:10 [git] <defunct>
git 37611 0.0 0.0 0 0 ? Z 15:23 0:01 [git] <defunct>
git 37861 0.1 0.0 0 0 ? Z 15:24 0:06 [git] <defunct>
git 38663 0.0 0.0 0 0 ? Z 15:25 0:01 [git] <defunct>
git 38687 0.2 0.0 0 0 ? Z 15:25 0:07 [git] <defunct>
git 39084 0.0 0.0 0 0 ? Z 15:26 0:01 [git] <defunct>
git 39623 0.2 0.0 0 0 ? Z 15:27 0:08 [git] <defunct>
git 40052 0.0 0.0 0 0 ? Z 15:27 0:01 [git] <defunct>
git 40528 0.0 0.0 0 0 ? Z 15:28 0:01 [git] <defunct>
git 40540 0.1 0.0 0 0 ? Z 15:28 0:06 [git] <defunct>
git 40829 0.2 0.0 0 0 ? Z 15:29 0:07 [git] <defunct>
git 40882 0.0 0.0 0 0 ? Z 15:29 0:01 [git] <defunct>
git 41307 0.0 0.0 0 0 ? Z 15:30 0:01 [git] <defunct>
git 41321 0.2 0.0 0 0 ? Z 15:30 0:06 [git] <defunct>
git 42046 0.0 0.0 0 0 ? Z 15:31 0:01 [git] <defunct>
git 42066 0.2 0.0 0 0 ? Z 15:31 0:06 [git] <defunct>
git 43010 0.0 0.0 0 0 ? Z 15:32 0:01 [git] <defunct>
git 43052 0.2 0.0 0 0 ? Z 15:32 0:08 [git] <defunct>
git 43831 0.2 0.0 0 0 ? Z 15:33 0:06 [git] <defunct>
git 44956 0.0 0.0 0 0 ? Z 15:34 0:02 [git] <defunct>
git 45260 0.3 0.0 0 0 ? Z 15:34 0:09 [git] <defunct>
git 45826 0.0 0.0 0 0 ? Z 15:35 0:01 [git] <defunct>
git 45935 0.2 0.0 0 0 ? Z 15:35 0:08 [git] <defunct>
git 46896 0.0 0.0 0 0 ? Z 15:36 0:01 [git] <defunct>
git 47315 0.2 0.0 0 0 ? Z 15:37 0:07 [git] <defunct>
git 48321 0.0 0.0 0 0 ? Z 15:38 0:01 [git] <defunct>
git 48900 0.2 0.0 0 0 ? Z 15:39 0:06 [git] <defunct>
git 49051 0.0 0.0 0 0 ? Z 15:39 0:01 [git] <defunct>
git 50922 0.3 0.0 0 0 ? Z 15:41 0:08 [git] <defunct>
git 51176 0.0 0.0 0 0 ? Z 15:41 0:01 [git] <defunct>
git 52554 0.0 0.0 0 0 ? Z 15:42 0:01 [git] <defunct>
git 52601 0.3 0.0 0 0 ? Z 15:42 0:07 [git] <defunct>
git 54571 0.4 0.0 0 0 ? Z 15:44 0:09 [git] <defunct>
git 54911 0.0 0.0 0 0 ? Z 15:44 0:01 [git] <defunct>
git 56463 0.4 0.0 0 0 ? Z 15:46 0:09 [git] <defunct>
git 56577 0.0 0.0 0 0 ? Z 15:46 0:01 [git] <defunct>
git 57172 0.0 0.0 0 0 ? Z 15:47 0:01 [git] <defunct>
git 57208 0.3 0.0 0 0 ? Z 15:47 0:08 [git] <defunct>
git 57612 0.0 0.0 0 0 ? Z 15:48 0:01 [git] <defunct>
git 58192 0.3 0.0 0 0 ? Z 15:49 0:07 [git] <defunct>
git 58244 0.0 0.0 0 0 ? Z 15:49 0:01 [git] <defunct>
git 58392 0.0 0.0 4720 3820 pts/0 Ss 15:49 0:00 /bin/bash
git 59156 0.4 0.0 0 0 ? Z 15:51 0:07 [git] <defunct>
git 59453 0.0 0.0 0 0 ? Z 15:51 0:01 [git] <defunct>
git 59835 0.0 0.0 0 0 ? Z 15:52 0:01 [git] <defunct>
git 59854 0.3 0.0 0 0 ? Z 15:52 0:06 [git] <defunct>
git 61064 0.0 0.0 0 0 ? Z 15:53 0:01 [git] <defunct>
git 61076 0.3 0.0 0 0 ? Z 15:53 0:06 [git] <defunct>
git 61613 0.0 0.0 0 0 ? Z 15:54 0:01 [git] <defunct>
git 62329 0.5 0.0 0 0 ? Z 15:55 0:08 [git] <defunct>
git 63515 0.1 0.0 0 0 ? Z 15:56 0:01 [git] <defunct>
git 64503 0.5 0.0 0 0 ? Z 15:57 0:08 [git] <defunct>
git 64590 0.0 0.0 0 0 ? Z 15:57 0:01 [git] <defunct>
git 64891 0.5 0.0 0 0 ? Z 15:58 0:08 [git] <defunct>
git 65546 0.1 0.0 0 0 ? Z 15:59 0:02 [git] <defunct>
git 66375 0.1 0.0 0 0 ? Z 16:00 0:01 [git] <defunct>
git 66393 0.7 0.0 0 0 ? Z 16:00 0:09 [git] <defunct>
git 66644 0.1 0.0 0 0 ? Z 16:01 0:01 [git] <defunct>
git 66654 0.6 0.0 0 0 ? Z 16:01 0:08 [git] <defunct>
git 67348 0.1 0.0 0 0 ? Z 16:02 0:02 [git] <defunct>
git 67861 0.7 0.0 0 0 ? Z 16:03 0:09 [git] <defunct>
git 68843 0.1 0.0 0 0 ? Z 16:03 0:01 [git] <defunct>
git 69823 0.7 0.0 0 0 ? Z 16:04 0:08 [git] <defunct>
git 71029 0.1 0.0 0 0 ? Z 16:05 0:01 [git] <defunct>
git 71615 0.7 0.0 0 0 ? Z 16:06 0:08 [git] <defunct>
git 72230 0.1 0.0 0 0 ? Z 16:07 0:01 [git] <defunct>
git 72253 0.7 0.0 0 0 ? Z 16:07 0:07 [git] <defunct>
git 72824 0.7 0.0 0 0 ? Z 16:08 0:07 [git] <defunct>
git 72910 0.1 0.0 0 0 ? Z 16:08 0:01 [git] <defunct>
git 73174 0.1 0.0 0 0 ? Z 16:09 0:01 [git] <defunct>
git 73995 0.9 0.0 0 0 ? Z 16:10 0:07 [git] <defunct>
git 74220 0.1 0.0 0 0 ? Z 16:10 0:01 [git] <defunct>
git 74475 0.1 0.0 0 0 ? Z 16:11 0:01 [git] <defunct>
git 74514 1.4 0.0 0 0 ? Z 16:11 0:10 [git] <defunct>
git 74842 0.3 0.0 0 0 ? Z 16:12 0:02 [git] <defunct>
git 75411 1.3 0.0 0 0 ? Z 16:13 0:08 [git] <defunct>
git 76272 0.3 0.0 0 0 ? Z 16:13 0:02 [git] <defunct>
git 77089 1.5 0.0 0 0 ? Z 16:14 0:08 [git] <defunct>
git 77471 0.2 0.0 0 0 ? Z 16:15 0:01 [git] <defunct>
git 77511 1.4 0.0 0 0 ? Z 16:15 0:07 [git] <defunct>
git 77954 0.2 0.0 0 0 ? Z 16:16 0:01 [git] <defunct>
git 78633 2.0 0.0 0 0 ? Z 16:17 0:07 [git] <defunct>
git 79265 0.4 0.0 0 0 ? Z 16:18 0:01 [git] <defunct>
git 79416 2.2 0.0 0 0 ? Z 16:18 0:06 [git] <defunct>
git 80034 2.6 0.0 0 0 ? Z 16:19 0:06 [git] <defunct>
git 80300 0.4 0.0 0 0 ? Z 16:19 0:01 [git] <defunct>
git 80668 0.6 0.0 0 0 ? Z 16:20 0:01 [git] <defunct>
git 80693 3.6 0.0 0 0 ? Z 16:20 0:06 [git] <defunct>
git 81070 0.9 0.0 0 0 ? Z 16:21 0:01 [git] <defunct>
git 81431 1.6 0.0 0 0 ? Z 16:22 0:01 [git] <defunct>
git 81462 9.8 0.0 0 0 ? Z 16:22 0:07 [git] <defunct>
git 82123 74.6 0.0 0 0 ? Z 16:23 0:06 [git] <defunct>
git 82124 0.0 0.0 8492 4876 ? S 16:23 0:00 /tmp/gitaly-582375387/git-exec-1796046532.d/git --git-dir /home/git/repositories/@hashed/cb/be/cbbe2e41fff1a2f04968bdaeedff3b78085afca2dc4623870bdc2dff3aac6747.git -c gc.auto=0 -c maintenance.auto=0 -c core.autocrlf=input -c core.useReplaceRefs=false -c core.fsync=objects,derived-metadata,reference -c core.fsyncMethod=fsync -c core.packedRefsTimeout=10000 -c core.filesRefLockTimeout=1000 -c core.bigFileThreshold=50m cat-file --use-mailmap -Z --batch-command --buffer --end-of-options
git 82148 0.0 0.0 9560 4956 ? S 16:23 0:00 /tmp/gitaly-582375387/git-exec-1796046532.d/git --git-dir /home/git/repositories/@hashed/50/18/501883ce12ba20f1a0de68f06b8c84d3d148124f66ab52a38359317c78ab82b1.git -c gc.auto=0 -c maintenance.auto=0 -c core.autocrlf=input -c core.useReplaceRefs=false -c core.fsync=objects,derived-metadata,reference -c core.fsyncMethod=fsync -c core.packedRefsTimeout=10000 -c core.filesRefLockTimeout=1000 -c core.bigFileThreshold=50m cat-file --use-mailmap -Z --batch-command --buffer --end-of-options
git 82151 0.0 0.0 29776 5208 ? S 16:23 0:00 /tmp/gitaly-582375387/git-exec-1796046532.d/git --git-dir /home/git/repositories/@hashed/95/53/9553627933b214db60798fe40d2b4f8497781d024f53d62dc1b12469b7d53784.git -c gc.auto=0 -c maintenance.auto=0 -c core.autocrlf=input -c core.useReplaceRefs=false -c core.fsync=objects,derived-metadata,reference -c core.fsyncMethod=fsync -c core.packedRefsTimeout=10000 -c core.filesRefLockTimeout=1000 -c core.bigFileThreshold=50m cat-file --use-mailmap -Z --batch-command --buffer --end-of-options
git 82153 0.0 0.0 893684 6936 ? S 16:23 0:00 /tmp/gitaly-582375387/git-exec-1796046532.d/git --git-dir /home/git/repositories/@hashed/4e/07/4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce.git -c gc.auto=0 -c maintenance.auto=0 -c core.autocrlf=input -c core.useReplaceRefs=false -c core.fsync=objects,derived-metadata,reference -c core.fsyncMethod=fsync -c core.packedRefsTimeout=10000 -c core.filesRefLockTimeout=1000 -c core.bigFileThreshold=50m cat-file --use-mailmap -Z --batch-command --buffer --end-of-options
git 82157 100 0.0 8492 4200 pts/0 R+ 16:23 0:00 ps auxwww
```
I also opened a support request at https://support.gitlab.com/hc/en-us/requests/513408.
Thanks in advance.

# Issue #5906: [Feature flag] Enable use of Git v2.44

https://gitlab.com/gitlab-org/gitaly/-/issues/5906 (2024-03-28, Christian Couder)

## What
Enable the `:gitaly_git_v244` feature flag, which upgrades from Git v2.43 to Git v2.44. No major risk factors are present in this Git release.
Main issue: https://gitlab.com/gitlab-org/gitaly/-/issues/5882
## Owners
- Team: Gitaly
- Most appropriate slack channel to reach out to: `#g_gitaly`
- Best individual to reach out to: NAME
## Expectations
### What release does this feature occur in first?
### What are we expecting to happen?
### What might happen if this goes wrong?
### What can we monitor to detect problems with this?
<!--
Which dashboards from https://dashboards.gitlab.net are most relevant?
Usually you'd just like a link to the method you're changing in the
dashboard at:
https://dashboards.gitlab.net/d/000000199/gitaly-feature-status
I.e.
1. Open that URL
2. Change "method" to your feature, e.g. UserDeleteTag
3. Copy/paste the URL & change gprd to gstd to monitor staging as well as prod
-->
## Roll Out Steps
- [x] Enable on staging
- [x] Is the required code deployed on staging? ([howto](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#is-the-required-code-deployed))
- [x] Enable on staging ([howto](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#enable-on-staging))
- [x] Add ~"featureflag::staging" to this issue ([howto](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#feature-flag-labels))
- [x] Test on staging ([howto](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#test-on-staging))
- [x] Verify the feature flag was used by checking Prometheus metric [`gitaly_feature_flag_checks_total`](https://prometheus.gstg.gitlab.net/graph?g0.expr=sum%20by%20(flag)%20(rate(gitaly_feature_flag_checks_total%5B5m%5D))&g0.tab=1&g0.stacked=0&g0.range_input=1h)
- [ ] Enable on production
- [ ] Is the required code deployed on production? ([howto](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#is-the-required-code-deployed))
- [ ] Progressively enable in production ([howto](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#enable-in-production))
- [ ] Add ~"featureflag::production" to this issue
- [ ] Verify the feature flag was used by checking Prometheus metric [`gitaly_feature_flag_checks_total`](https://prometheus.gprd.gitlab.net/graph?g0.expr=sum%20by%20(flag)%20(rate(gitaly_feature_flag_checks_total%5B5m%5D))&g0.tab=1&g0.stacked=0&g0.range_input=1h)
- [ ] Create subsequent issues
- [ ] To default enable the feature flag (optional, only required if backwards-compatibility concerns exist)
- [ ] [Create issue](https://gitlab.com/gitlab-org/gitaly/-/issues/new?issuable_template=Feature%20Flag%20Default%20Enable) using the `Feature Flag Default Enable` template.
- [ ] Set milestone to current+1 release
- [ ] To Remove feature flag
- [ ] [Create issue](https://gitlab.com/gitlab-org/gitaly/-/issues/new?issuable_template=Feature%20Flag%20Removal) using the `Feature Flag Removal` template.
- [ ] Set milestone to current+1 (+2 if we created an issue to default enable the flag).
Please refer to the [documentation of feature flags](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/PROCESS.md#feature-flags) for further information.

**Milestone:** 16.11 · Eric Ju

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5905 — Unable to create profile in gitaly-cny (2024-03-21, Steve Xuereb)

## Overview
Broken:
```sh
$ ssh gitaly-cny-01-stor-gprd.c.gitlab-production.internal
steve@gitaly-cny-01-stor-gprd.c.gitlab-production.internal:~$ perf_flamegraph_for_all_running_processes.sh
Starting capture for 60 seconds.
[ perf record: Woken up 22 times to write data ]
[ perf record: Captured and wrote 11.201 MB perf.data ]
ERROR: No stack counts found
```
Good:
```sh
steve@gitaly-01-stor-gprd.c.gitlab-gitaly-gprd-e493.internal:~$ perf_flamegraph_for_all_running_processes.sh
Starting capture for 60 seconds.
[ perf record: Woken up 212 times to write data ]
[ perf record: Captured and wrote 56.843 MB perf.data (190080 samples) ]
Results:
Flamegraph: /tmp/perf-record-results.XjmGisKi/gitaly-01-stor-gprd.20240320_123800_UTC.all_cpus.flamegraph.svg
Raw stack traces: /tmp/perf-record-results.XjmGisKi/gitaly-01-stor-gprd.20240320_123800_UTC.all_cpus.perf-script.txt.gz
```

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5903 — Data loss - Specific files and commit history sporadically disappear from repository (2024-03-19, Niklas Janz)

# Support Request for the Gitaly Team
<!--
The goal of this template is to create a consistent experience for customer support requests from the Gitaly Team. Due to the size of the team and ambitious amount of work we try to complete, it helps us tremendously to have a common issue format for requests that can be prioritized appropriately. It also helps keep a record of issues experienced that can benefit other teams in the future.
As we collaborate on resolution of this issue, the Gitaly team will attempt to utilize this as a single source of truth.
-->
_The goal is to keep these requests public. However, if customer information is required as part of the support request, please be sure to mark this issue as confidential._
This request template is part of [Gitaly Team's intake process](https://about.gitlab.com/handbook/engineering/development/enablement/systems/gitaly/#how-to-contact-the-team).
## Customer Information
**Salesforce Link:** https://gitlab.my.salesforce.com/0016100001dIq8l
**Zendesk Ticket:**
1. https://gitlab.zendesk.com/agent/tickets/506425
2. https://gitlab.zendesk.com/agent/tickets/428850 (cc @rmongare)
**Installation Size:** 4650 seats
**Architecture Information:** 3k reference architecture
## Support Request
### Severity
<!-- Please be as realistic as possible here. We are sensitive to the fact that customers are frustrated when things aren't working, but realistically we cannot treat everything as a Severity 1 emergency.
For a good rule of thumb, please refer to the bug prioritization framework located in the handbook here: https://about.gitlab.com/handbook/engineering/quality/issue-triage/#severity
For S1 or S2 issues, please follow https://about.gitlab.com/handbook/engineering/development/enablement/systems/gitaly/#urgent-issues-and-outages .
-->
Per the prioritization framework, I'd prioritize this as ~"severity::1" due to the data-loss aspect.\
However, as the impact is limited to two files in a single repository, and a workaround exists (pushing the files from an unaffected local clone), I'll bump this down to ~"severity::3".
From the customer perspective, this is a continued annoyance that lowers the trust in GitLab as a product. They do however agree that the problem might arise due to their specific workflow on this repo.
### Problem Description
The customer reports they "lose" two specific files (`commitHash.yaml`, `master_release.xml`) in a repository on a regular basis. We have [one closed](https://gitlab.zendesk.com/agent/tickets/428850) and [one ongoing ticket](https://gitlab.zendesk.com/agent/tickets/506425) about this issue.
The repository is force-pushed to on a regular basis during CI/CD jobs.
### Troubleshooting Performed
- Checking the Git logs for commits that'd have deleted or renamed the files `git log --all --diff-filter=DR --full-history -- {commitHash.yaml,master_release.xml}`
- No such commits were found for either file.
- `git log` does not show any history for the files beside the "restore" commits.
- Checking Gitaly logs for issues
- Checking the hosts syslog / dmesg for signs of data-loss
### What specifically do you need from the Gitaly team
Help us identify the root cause of the data loss.
## Author Checklist
- [x] Customer information provided
- [x] Severity realistically set
- [x] Clearly articulated what is needed from the Gitaly team to support your request by filling out the _What specifically do you need from the Gitaly team_
/cc @mjwood @andrashorvath @jcaigitlab @john.mcdonnell @gerardo

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5901 — Remove spawn token limiter (2024-03-07, Quang-Minh Nguyen)

We have a [SpawnToken](https://gitlab.com/gitlab-org/gitaly/-/blob/933db61fb6cd67139730d83ff2170624986d30e5/internal/command/spawntoken.go#L53-58) set up to limit the maximum number of child processes that can be forked in parallel.
This was originally added in 2017 when contention on Go's `syscall.ForkLock` caused performance issues in production. However, highly loaded self-managed environments may exceed the 10-process default limit and see extended delays in forking new processes, causing significant performance issues. Generally, the node is already under heavy load when this happens, but not always.
With Go 1.21, scheduled for August 2023, `syscall.ForkLock` should become friendlier to parallel forking with https://github.com/golang/go/issues/23558 / https://go-review.googlesource.com/c/go/+/421441.
We performed [a series of benchmarks](https://gitlab.com/gitlab-org/gitaly/-/issues/5327#note_1415595991) to prove the effectiveness of the new mechanism.
Go 1.21 is now the minimum Go version for Gitaly. It has been used broadly in production since January/February 2024, right before Go 1.20's end of life ([MR](https://gitlab.com/gitlab-org/gitlab-omnibus-builder/-/merge_requests/330)). Let's look at spawn token metrics across the cluster over the last 6 months:
* Spawn token queue length:
![Screenshot_2024-03-06_at_18.19.25](/uploads/fe175f7268326a5bd96778e42f638ed0/Screenshot_2024-03-06_at_18.19.25.png)
* Spawn token timeouts:
![Screenshot_2024-03-06_at_18.21.55](/uploads/3dd3d8acd7020eb7bb98b2fed6f31392/Screenshot_2024-03-06_at_18.21.55.png)
[Source](https://thanos.gitlab.net/graph?g0.expr=sum(increase(gitaly_spawn_timeouts_total%7Benv%3D%22gprd%22%2Cenvironment%3D%22gprd%22%2C%20type%3D%22gitaly%22%7D%5B1h%5D))%20by%20(fqdn)&g0.tab=0&g0.stacked=0&g0.range_input=180d&g0.max_source_resolution=auto&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D&g0.step_input=3600&g1.expr=max(gitaly_spawn_token_waiting_length)%20by%20(fqdn)&g1.tab=0&g1.stacked=0&g1.range_input=180d&g1.max_source_resolution=0s&g1.deduplicate=1&g1.partial_response=0&g1.store_matches=%5B%5D&g1.step_input=3600)
Overall, the spawn token queue has been almost flat for months, and we observed no significant spawn token congestion. The time window roughly matches when Go 1.21 was rolled out, so I believe the spawn token can now be deemed redundant. Let's remove the limiter to simplify the code base.

**Milestone:** Next 1-3 releases

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5891 — Customer's production instance is running Gitaly in kubernetes and they are looking to move to Gitaly Cluster with no downtime (2024-03-05, Gabriel Yoachum)

## Customer Information
**Salesforce Link:** https://gitlab.my.salesforce.com/500PL000006N8spYAC
**Zendesk Ticket:** https://gitlab.zendesk.com/agent/tickets/504983
**Installation Size:** 3k Reference Architecture, Cloud Native only
**Architecture Information:**
<!-- Please include cloud hosting provider if available, links to architecture documents, etc... -->
**Slack Channel:**
<!-- Please include the general slack channel, the slack channel for the incident, etc... -->
**Additional Information:**
<!-- Links to executive summary, customer calls, etc... Anything that helps provide context for the team -->
## Support Request
Is there a recommended way to move from Gitaly on Kubernetes to a 3-node Gitaly Cluster without downtime? My initial thought is that they will need to have their cluster in the same network as the Kubernetes cluster, so they can add the 3 nodes to the existing Gitaly configuration and then set the primary to one of the replicas. We don't seem to have good documentation on how to do this.
### Severity
Severity 3
### Problem Description
Customer's production instance is running Gitaly in kubernetes and they are looking to move to Gitaly Cluster with no downtime.
### Troubleshooting Performed
### What specifically do you need from the Gitaly team
A recommended safe path forward to move from Gitaly on Kubernetes to a 3 node Gitaly cluster.
## Author Checklist
- [x] Customer information provided
- [x] Severity realistically set
- [x] Clearly articulated what is needed from the Gitaly team to support your request by filling out the _What specifically do you need from the Gitaly team_
/cc @mjwood @andrashorvath @jcaigitlab @john.mcdonnell @gerardo

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5888 — Improve pack objects cache (2024-03-15, John Cai)

The pack objects cache allows subsequent clones/fetches to use the same packfile as long as the refs of the requests are the same. Currently, however, the cache is configured globally in Gitaly by setting a timeout. This has worked out really well in many cases, helping to reduce CPU load on the server.
However, there are opportunities for improvement, mainly around the control and granularity of how long a packfile remains in the cache. In the case of CI, for instance, you might want to cache the pack objects for a given ref for longer.
Let's consider making the streamcache more flexible in general, to allow clients to control how long certain entries are cached.

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5883 — Return Unavailable when sidechannel receives connection error (2024-03-25, Will Chandler)

We have configured all of Gitaly's clients to retry read-only requests on `Unavailable` return codes. However, the [sidechannel](https://gitlab.com/gitlab-org/gitaly/-/issues/5744#note_1755613666) still returns an `Internal` code. This is now the largest source of restart-related errors.

**Milestone:** 16.11 · Ahmad Sherif

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5882 — Rollout Git version v2.44.0 (2024-03-25, Patrick Steinhardt)

## Changelog
To be filled in.
<!--
Add the changelog related to the new version and how this impacts us. It is especially important to highlight changes that increase the risk for this particular upgrade.
Would be really nice to point out contributions made by the Gitaly team, if any.
-->
## Steps
- [x] Introduce the new Git version behind a feature flag ([Reference](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5587)).
- [x] Introduce the new bundled Git version in the [Makefile](/Makefile).
- [x] Introduce the new bundled Git execution environment in the [Git package](/internal/git/version.go) behind a feature flag.
- [ ] Roll out the feature flag.
- [x] Create an issue for the rollout of the feature flag ([Reference](https://gitlab.com/gitlab-org/gitaly/-/issues/5030)).
- [ ] Optional: Create a [change request](https://about.gitlab.com/handbook/engineering/infrastructure/change-management/#change-request-workflows) in case the new Git version contains changes that may cause issues.
- [ ] Roll out the feature flag.
- [ ] After a release containing the feature flag, remove the feature flag.
- [ ] Update the default Git version. This must happen in a release after the feature flag has been removed to avoid issues with zero-downtime upgrades.
- [ ] Remove the old bundled Git execution environment.
- [ ] Remove the old bundled Git version in the [Makefile](/Makefile).
- [ ] Update the default Git distribution by updating `GIT_VERSION` to the new Git version in the [Makefile](/Makefile).
- [ ] Optional: Upgrade the minimum required Git version. This is only needed when we want to start using features that have been introduced with the new Git version.
- [ ] Update the minimum required Git version in the [Git package](/internal/git/version.go). ([Reference](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5705))
- [ ] Update the minimum required Git version in the [README.md](/README.md).
- [ ] Update the GitLab release notes to reflect the new minimum required Git version. ([Reference](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/107565))

**Milestone:** 16.11 · Christian Couder

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5876 — Add RPC to generate bundle-URI bundle (2024-03-25, Toon Claes)

We want to give clients (i.e. GitLab Rails) control to generate and store bundles that can be used for bundle-URI. Because we want to make it possible to configure parameters in the GitLab admin panel, we'll need to add an RPC for this.

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5872 — Flaked test TestTransactionManager/run_repacking_concurrently_with_another_transaction_that_produce_new_packfiles (2024-03-25, Sami Hiltunen)

Job [#6227870397](https://gitlab.com/gitlab-org/gitaly/-/jobs/6227870397) failed for bb8df0f124270fb9f3659a3ee1c78ad57bc837b0:
```
=== Failed
=== FAIL: internal/gitaly/storage/storagemgr TestTransactionManager/run_repacking_concurrently_with_another_transaction_that_produce_new_packfiles (0.30s)
testhelper_test.go:956:
Error Trace: /builds/gitlab-org/gitaly/internal/gitaly/storage/storagemgr/testhelper_test.go:956
/builds/gitlab-org/gitaly/internal/gitaly/storage/storagemgr/transaction_manager_test.go:209
Error: Received unexpected error:
verifying pack refs: verifying repacking: spawning cat-file command: io: read/write on closed pipe
Test: TestTransactionManager/run_repacking_concurrently_with_another_transaction_that_produce_new_packfiles
logger.go:90: Recorded logs of "shared-logger":
time="2024-02-21T20:24:55Z" level=info msg="All 0 tables opened in 0s\n"
time="2024-02-21T20:24:55Z" level=info msg="Discard stats nextEmptySlot: 0\n"
time="2024-02-21T20:24:55Z" level=info msg="Set nextTxnTs to 0"
time="2024-02-21T20:24:55Z" level=info msg="Lifetime L0 stalled for: 0s\n"
time="2024-02-21T20:24:55Z" level=info msg="\nLevel 0 [ ]: NumTables: 01. Size: 231 B of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB\nLevel 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB\nLevel 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB\nLevel 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB\nLevel 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB\nLevel 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB\nLevel 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB\nLevel Done\n"
=== FAIL: internal/gitaly/storage/storagemgr TestTransactionManager (0.27s)
```

**Milestone:** 16.11

---

https://gitlab.com/gitlab-org/gitaly/-/issues/5870 — Capture reftable information in repository stats (2024-03-20, Karthik Nayak)

From @pks-gitlab's comment [here](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/6684#note_1781172901):
> I think it would be interesting regardless to surface information specific to the reftable backend:
>
> * The length of `tables.list`.
> * A list of tables, each with their minimum/maximum update index and file size. This shouldn't be too large given that the number of reftables is essentially bounded.
> * The number of unrecognized garbage files.
>
> The reason is that the repository info isn't only used as a mechanism for housekeeping, but is also being logged. As we have a rather high likelihood of reftable-specific bugs due to the backend being so new, I think that this information would be really helpful to figure out whether it's working as intended.
>
> We can also defer this to a follow-up merge request, but in that case we should adapt the comment and create and schedule an issue.

**Milestone:** 16.11 · James Liu