Skip to content
Snippets Groups Projects
Verified Commit 1da45e94 authored by Evan Read's avatar Evan Read Committed by GitLab
Browse files

Merge branch 'gerardo-master-patch-90200' into 'master'

Document common replication errors and possible solutions

See merge request !150924



Merged-by: default avatarEvan Read <eread@gitlab.com>
Approved-by: default avatarWill Chandler <wchandler@gitlab.com>
Approved-by: default avatarEvan Read <eread@gitlab.com>
Reviewed-by: default avatarWill Chandler <wchandler@gitlab.com>
Reviewed-by: default avatarEvan Read <eread@gitlab.com>
Co-authored-by: default avatarGerardo Gutierrez <ggutierrez@gitlab.com>
parents 43e288e9 25616447
No related branches found
No related tags found
2 merge requests!162233Draft: Script to update Topology Service Gem,!150924Document common replication errors and possible solutions
Pipeline #1303102034 passed
......@@ -237,6 +237,13 @@ This Rake task checksums the repository on all Gitaly nodes.
The [Praefect `dataloss`](recovery.md#check-for-data-loss) command only checks the state of the repository in the Praefect database, and cannot
be relied to detect sync problems in this scenario.
### `dataloss` command shows `@failed-geo-sync` repositories as out of sync
`@failed-geo-sync` is a legacy path that was used on GitLab 16.1 and earlier by Geo when project synchronization failed and has been
[deprecated](https://gitlab.com/gitlab-org/gitlab/-/issues/375640).
On GitLab 16.2 and later, you can safely delete this path. The `@failed-geo-sync` directories are located under [the repository path](../repository_storage_paths.md) on the Gitaly nodes.
## Relation does not exist errors
By default Praefect database tables are created automatically by `gitlab-ctl reconfigure` task.
......@@ -294,3 +301,98 @@ Possible solutions:
## `gitlab-ctl reconfigure` fails with error: `STDOUT: praefect: configuration error: error reading config file: toml: cannot store TOML string into a Go int`
This error occurs when `praefect['database_port']` or `praefect['database_direct_port']` are configured as a string instead of an integer.
## Common replication errors
The following are some common replication errors with possible solutions.
### Lock file exists
Lock files are used to prevent multiple updates to the same ref. Sometimes lock files become stale, and replication fails with the error `error: cannot lock ref`.
To clear stale `*.lock` files, you can trigger `OptimizeRepositoryRequest` on the [Rails console](../operations/rails_console.md):
```ruby
p = Project.find <Project ID>
client = Gitlab::GitalyClient::RepositoryService.new(p.repository)
client.optimize_repository
```
If triggering `OptimizeRepositoryRequest` does not work, inspect the files manually to confirm the creation date and decide if the `*.lock` file can be manually removed.
Any lock files created over 24 hours ago are safe to remove.
### Git `fsck` errors
Gitaly repositories with invalid objects can lead to replication failures with errors in Gitaly logs such as:
- `exit status 128, stderr: "fatal: git upload-pack: not our ref"`.
- `"fatal: bad object 58....e0f... ssh://gitaly/internal.git did not send all necessary objects`.
As long one of the Gitaly nodes still has a healthy copy of the repository, these issues can be fixed by:
1. [Removing the repository from the Praefect database](recovery.md#manually-remove-repositories).
1. Using the [Praefect `track-repository` subcommand](recovery.md#manually-add-a-single-repository-to-the-tracking-database) to re-track it.
This will use the copy of the repository from the authoritative Gitaly node to overwrite the copies on all other Gitaly nodes.
Be sure a recent backup of the repository has been made before running these commands.
1. Move the bad repository out of place:
```shell
run `mv <REPOSITORY_PATH> <REPOSITORY_PATH>.backup`
```
For example:
```shell
mv /var/opt/gitlab/git-data/repositories/@cluster/repositories/de/74/2335 /var/opt/gitlab/git-data/repositories/@cluster/repositories/de/74/2335.backup
```
1. Run the Praefect commands to trigger replication:
```shell
# Validate you have the correct repository.
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml remove-repository -virtual-storage gitaly -relative-path '<relative_path>' -db-only
# Run again with '--apply' flag to remove repository from the Praefect tracking database
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml remove-repository -virtual-storage gitaly -relative-path '<relative_path>' -db-only --apply
# Re-track the repository, overwriting the secondary nodes
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml track-repository -virtual-storage gitaly -authoritative-storage '<healthy_gitaly>' -relative-path '<relative_path>' -replica-path '<replica_path>'-replicate-immediately
```
### Replication fails silently
If the [Praefect `dataloss`](recovery.md#check-for-data-loss) shows [repositories partially unavailable](recovery.md#unavailable-replicas-of-available-repositories), and [`accept-dataloss` command](recovery.md#accept-data-loss) fails to synchronize the repository with no error present on the logs, this could be due to a mismatch in Praefect database in the `repository_id` field of the `storage_repositories` table. To check for a mismatch:
1. Connect to the Praefect database.
1. Run the following query:
```sql
select * from storage_repositories where relative_path = '<relative-path>';
```
Replace `<relative-path>` with the repository path [beginning with `@hashed`](../repository_storage_paths.md#hashed-storage).
### Alternate directory does not exists
GitLab uses the [Git alternates mechanism for deduplication](../../development/git_object_deduplication.md). `alternates` is a text file that points to the `objects` directory on
a `@pool` repository to fetch objects. If this file points to an invalid path, replication can fail with one of the following the errors:
- `"error":"no alternates directory exists", "warning","msg":"alternates file does not point to valid git repository"`
- `"error":"unexpected alternates content:`
- `remote: error: unable to normalize alternate object path`
To investigate the cause of this error:
1. Check if the project is part of a pool by using the [Rails console](../operations/rails_console.md):
```ruby
project = Project.find_by_id(<project id>)
project.pool_repository
```
1. Check if the pool repository path exists on disk and that it matches [the `alternates` file](../../development/git_object_deduplication.md) content.
1. Check if the path in the [`alternates` file](../../development/git_object_deduplication.md) is reachable from the `objects` directory in the project.
After performing these checks, reach out to GitLab Support with the information collected.
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment