Skip to content
Snippets Groups Projects
Verified Commit 25616447 authored by Gerardo Gutierrez's avatar Gerardo Gutierrez Committed by GitLab
Browse files

Document common replication errors and possible solutions

parent 0fc73b20
No related branches found
No related tags found
2 merge requests!162233Draft: Script to update Topology Service Gem,!150924Document common replication errors and possible solutions
......@@ -237,6 +237,13 @@ This Rake task checksums the repository on all Gitaly nodes.
The [Praefect `dataloss`](recovery.md#check-for-data-loss) command only checks the state of the repository in the Praefect database, and cannot
be relied to detect sync problems in this scenario.
### `dataloss` command shows `@failed-geo-sync` repositories as out of sync
`@failed-geo-sync` is a legacy path that was used on GitLab 16.1 and earlier by Geo when project synchronization failed and has been
[deprecated](https://gitlab.com/gitlab-org/gitlab/-/issues/375640).
On GitLab 16.2 and later, you can safely delete this path. The `@failed-geo-sync` directories are located under [the repository path](../repository_storage_paths.md) on the Gitaly nodes.
## Relation does not exist errors
By default Praefect database tables are created automatically by `gitlab-ctl reconfigure` task.
......@@ -294,3 +301,98 @@ Possible solutions:
## `gitlab-ctl reconfigure` fails with error: `STDOUT: praefect: configuration error: error reading config file: toml: cannot store TOML string into a Go int`
This error occurs when `praefect['database_port']` or `praefect['database_direct_port']` are configured as a string instead of an integer.
## Common replication errors
The following are some common replication errors with possible solutions.
### Lock file exists
Lock files are used to prevent multiple updates to the same ref. Sometimes lock files become stale, and replication fails with the error `error: cannot lock ref`.
To clear stale `*.lock` files, you can trigger `OptimizeRepositoryRequest` on the [Rails console](../operations/rails_console.md):
```ruby
p = Project.find <Project ID>
client = Gitlab::GitalyClient::RepositoryService.new(p.repository)
client.optimize_repository
```
If triggering `OptimizeRepositoryRequest` does not work, inspect the files manually to confirm the creation date and decide if the `*.lock` file can be manually removed.
Any lock files created over 24 hours ago are safe to remove.
### Git `fsck` errors
Gitaly repositories with invalid objects can lead to replication failures with errors in Gitaly logs such as:
- `exit status 128, stderr: "fatal: git upload-pack: not our ref"`.
- `"fatal: bad object 58....e0f... ssh://gitaly/internal.git did not send all necessary objects`.
As long one of the Gitaly nodes still has a healthy copy of the repository, these issues can be fixed by:
1. [Removing the repository from the Praefect database](recovery.md#manually-remove-repositories).
1. Using the [Praefect `track-repository` subcommand](recovery.md#manually-add-a-single-repository-to-the-tracking-database) to re-track it.
This will use the copy of the repository from the authoritative Gitaly node to overwrite the copies on all other Gitaly nodes.
Be sure a recent backup of the repository has been made before running these commands.
1. Move the bad repository out of place:
```shell
run `mv <REPOSITORY_PATH> <REPOSITORY_PATH>.backup`
```
For example:
```shell
mv /var/opt/gitlab/git-data/repositories/@cluster/repositories/de/74/2335 /var/opt/gitlab/git-data/repositories/@cluster/repositories/de/74/2335.backup
```
1. Run the Praefect commands to trigger replication:
```shell
# Validate you have the correct repository.
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml remove-repository -virtual-storage gitaly -relative-path '<relative_path>' -db-only
# Run again with '--apply' flag to remove repository from the Praefect tracking database
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml remove-repository -virtual-storage gitaly -relative-path '<relative_path>' -db-only --apply
# Re-track the repository, overwriting the secondary nodes
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml track-repository -virtual-storage gitaly -authoritative-storage '<healthy_gitaly>' -relative-path '<relative_path>' -replica-path '<replica_path>'-replicate-immediately
```
### Replication fails silently
If the [Praefect `dataloss`](recovery.md#check-for-data-loss) shows [repositories partially unavailable](recovery.md#unavailable-replicas-of-available-repositories), and [`accept-dataloss` command](recovery.md#accept-data-loss) fails to synchronize the repository with no error present on the logs, this could be due to a mismatch in Praefect database in the `repository_id` field of the `storage_repositories` table. To check for a mismatch:
1. Connect to the Praefect database.
1. Run the following query:
```sql
select * from storage_repositories where relative_path = '<relative-path>';
```
Replace `<relative-path>` with the repository path [beginning with `@hashed`](../repository_storage_paths.md#hashed-storage).
### Alternate directory does not exists
GitLab uses the [Git alternates mechanism for deduplication](../../development/git_object_deduplication.md). `alternates` is a text file that points to the `objects` directory on
a `@pool` repository to fetch objects. If this file points to an invalid path, replication can fail with one of the following the errors:
- `"error":"no alternates directory exists", "warning","msg":"alternates file does not point to valid git repository"`
- `"error":"unexpected alternates content:`
- `remote: error: unable to normalize alternate object path`
To investigate the cause of this error:
1. Check if the project is part of a pool by using the [Rails console](../operations/rails_console.md):
```ruby
project = Project.find_by_id(<project id>)
project.pool_repository
```
1. Check if the pool repository path exists on disk and that it matches [the `alternates` file](../../development/git_object_deduplication.md) content.
1. Check if the path in the [`alternates` file](../../development/git_object_deduplication.md) is reachable from the `objects` directory in the project.
After performing these checks, reach out to GitLab Support with the information collected.
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment