- Oct 07, 2021
-
-
Patrick Steinhardt authored
The supervisor's health monitor checks health every 15 seconds, or alternatively exits if its shutdown channel has been closed. While we do check this channel correctly when trying to send the event, we do not check for shutdown while waiting 15 seconds for the next event. Fix this by returning early if we got the signal to shut down.
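A minimal sketch of the fixed loop; the type and identifier names (`monitor`, `shutdown`, `check`) are illustrative, not Gitaly's actual ones:

```go
package supervisor

import (
	"context"
	"time"
)

// monitor is a stand-in for the supervisor's health monitor.
type monitor struct {
	shutdown chan struct{} // closed when the supervisor shuts down
}

// run checks health every 15 seconds. The select now also watches the
// shutdown channel while waiting, so the goroutine returns early
// instead of sleeping through one more interval.
func (m *monitor) run(ctx context.Context, check func() error) {
	ticker := time.NewTicker(15 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			_ = check() // the real code forwards the result as an event
		case <-m.shutdown:
			return
		case <-ctx.Done():
			return
		}
	}
}
```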
-
Patrick Steinhardt authored
The bootstrapper is currently leaking multiple Goroutines in the upgrader because we never make sure to clean it up after tests have finished. Add its existing `Stop()` function to the upgrader interface and call it as required to plug these leaks.
-
Patrick Steinhardt authored
In many tests where the catfile cache is in use, we create it but don't ever clean it up. As a result, the monitoring Goroutine and any potentially cached processes may not get cleaned up as expected. Fix this by calling `Stop()` as required.
-
Patrick Steinhardt authored
The gitaly-backup manager currently creates an ad-hoc instance of the connection pool. This pool never gets cleaned up and thus results in leakage of its connections. Inject the connection pool into the manager and clean it up as required.
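A hedged sketch of the injection, assuming Gitaly's `client.Pool` (the `Manager` shape is illustrative):

```go
package backup

import (
	"gitlab.com/gitlab-org/gitaly/v14/client"
)

// Manager is a stand-in for the gitaly-backup manager; the injection
// pattern is the point here.
type Manager struct {
	pool *client.Pool
}

// NewManager takes the pool from the caller, who stays responsible for
// closing it once all backups are done, e.g. via defer pool.Close().
func NewManager(pool *client.Pool) *Manager {
	return &Manager{pool: pool}
}
```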
-
Patrick Steinhardt authored
Now that the "testhelper" package doesn't depend on "command" anymore, convert the "command" package to use it.
-
Patrick Steinhardt authored
The "testhelper" package currently depends on the "command" package, which is why we cannot use the "testhelper"'s setup functions in the latter package. The only remaining dependency is the wait group which is used to count in-flight commands though such that we can assert that there are no leaking processes after tests have concluded. Move this wait group into a separate "commandcounter" package to remove this dependency. While this also allows other packages to modify the counter due to its interface now being public, this really shouldn't be much of a problem at all.
-
Patrick Steinhardt authored
While the "command" package provides a nice helper to check for process's error codes, use of the "command" package in the testhelper keeps us from using leak checkers in the "command" package itself. Open-code this function in the testhelper package to reduce dependencies to the "command" package.
-
Patrick Steinhardt authored
Refactor checks for leaked processes such that there are fewer calls to `panic()`. This prepares for the inlining of the "command" package in order to break a cyclic dependency.
-
Patrick Steinhardt authored
The function `MustHaveNoChildProcess()` is not a general test helper that should be used by tests as they see fit, but is instead intended to be run at the end of tests only. Move it into a separate package and make it internal to the package to make it less discoverable by callers.
-
Patrick Steinhardt authored
Setup of a test package's main function is quite repetitive: we call a separate `testMain()` function such that we can use `defer` calls; this function calls `MustHaveNoChildProcess()` and `Configure()`, and then we return the result of `m.Run()`. This is almost always exactly the same, with some exceptions where we need additional setup code. Unify this code via a single `testhelper.Run()` function which does all of this. It allows us greater flexibility in the setup code such that we can easily perform additional validation after the tests without having to modify all call sites. Note that this change removes calls to the goleak package in some places. These will resurface in a later commit in this patch series, where we instead move this call into `testhelper.Run()` such that it is executed by default.
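A sketch of the unified entry point; the hooks are stand-ins based on the description above:

```go
package testhelper

import "testing"

// Stand-ins for the real helpers described above.
func Configure()              {}
func mustHaveNoChildProcess() {}

// Run unifies the repetitive testMain pattern: shared setup, the test
// run itself, and shared teardown checks in one place.
func Run(m *testing.M) int {
	defer mustHaveNoChildProcess() // runs after the tests, before os.Exit in the caller
	Configure()
	return m.Run()
}
```

Each package's entry point then collapses to `func TestMain(m *testing.M) { os.Exit(testhelper.Run(m)) }`.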
-
Patrick Steinhardt authored
When configuration of the Ruby sidecar fails, we call `log.Fatalf()` and thus directly abort the complete test suite. Given that we call this function on-demand in tests nowadays, we should instead return the error and validate at the test site that the server was set up correctly, such that other tests continue to run as expected. Fix this issue by returning the error instead of aborting.
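A minimal sketch of the shape of the change; all types and names here are illustrative stand-ins, not Gitaly's real ones:

```go
package rubyserver

import "errors"

// Illustrative stand-ins for the real configuration and server types.
type Config struct{ RubyDir string }
type Server struct{}

func configure(cfg Config) (*Server, error) {
	if cfg.RubyDir == "" {
		return nil, errors.New("ruby directory not configured")
	}
	return &Server{}, nil
}

// setupRubyServer used to call log.Fatalf() on error, aborting the
// entire test binary. Returning the error instead lets each test
// assert on it while unrelated tests keep running.
func setupRubyServer(cfg Config) (*Server, error) {
	return configure(cfg)
}
```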
-
James Fargher authored
Praefect: add sidechannel support See merge request !3862
-
- Oct 06, 2021
-
-
Jacob Vosmaer authored
This commit adds backchannel support to the main gRPC listener of Praefect. If clients make gRPC calls with sidechannels, Praefect will now proxy these to the Gitaly backend. Changelog: added
-
Pavlo Strokov authored
remote: Drop FetchInternalRemote RPC Closes #3667 See merge request !3898
-
Pavlo Strokov authored
datastore: Fix storage cleanup's handling of timezones See merge request !3930
-
Patrick Steinhardt authored
The FetchInternalRemote RPC had been used internally to replicate repositories across different Gitaly nodes. At some point in time, this was converted to do a direct fetch without an additional RPC call because, as it turned out, doing inter-Gitaly mutating RPC calls is problematic in combination with transactions, given that the remote side would then try to cast votes against another Gitaly. Nowadays, there are no callers left which use this RPC. Remove the deprecated `FetchInternalRemote()` call. The backing logic, which is still called internally, remains for use in other parts of Gitaly. We may eventually want to find a better place for it to live. Changelog: removed
-
Patrick Steinhardt authored
Refactor `FetchInternalRemote()` to use the new function to get the default branch provided by the remoterepo package.
-
Patrick Steinhardt authored
We have an ad-hoc implementation of `GetDefaultBranch()` for remote repositories in the `FetchInternalRemote()` RPC implementation. Given that this RPC call simply maps to `localrepo.GetDefaultBranch()`, it makes sense to pull up the implementation into the remoterepo interface to make it generally available. Implement `remoterepo.GetDefaultBranch()` and move tests of this function from localrepo-specific tests into the shared test suite.
-
Patrick Steinhardt authored
The tests for `FetchInternalRemote` are a bit dated and hard to follow. Refactor them to reflect a more modern style. This also prepares for the upcoming change to drop the RPC call, where we only retain its internal non-RPC backing logic.
-
Patrick Steinhardt authored
While the testcase for ReplicateRepository which exercises failing fetches does fail as expected, the reason for the failure is not the failing fetch, but that we do not have the required set of binaries installed. Fixing this surfaces another issue: because fetches nowadays happen internally instead of calling the `FetchInternalRemote()` RPC, the mock remote server which stubs out the RPC with an error isn't invoked at all. Fix the test setup by setting up a "real" gRPC server, where the fetch failure is provoked by writing garbage into the source repo's HEAD.
-
Patrick Steinhardt authored
Timestamps in the `storage_cleanups` table are stored without timezones, but our timezone handling in `AcquireNextStorage()` is inconsistent: we typically just use `NOW()` to get the current timestamp, but in one case we use `NOW() AT TIME ZONE 'UTC'`. In some setups, this leads to timestamps which cannot be compared correctly with each other. Fix the bug by consistently storing timestamps without time zones. This bug is similar to 4a2ac0ed (datastore: Fix acknowledgement of stale jobs considering timezones, 2021-08-20). Changelog: fixed
-
Pavlo Strokov authored
replication: Process replication events for storages in parallel See merge request !3894
-
Patrick Steinhardt authored
catfile: Reintroduce refactoring and plug Goroutine leak in the cache Closes gitlab-com/gl-infra/production#5566 See merge request !3905
-
Patrick Steinhardt authored
A recent refactoring of the catfile cache has caused an incident via a new Goroutine leak. Now that the leak has been plugged, assert that we do not leak any Goroutines via `MustHaveNoGoroutines()` such that we cannot have new regressions there.
-
Patrick Steinhardt authored
When constructing processes as part of the catfile cache, we must create a separate context in case the process is cacheable, such that it does not get killed when the original request context gets cancelled. Starting with commit 36c166a1 (catfile: Explicitly close processes, 2021-09-09), we have converted the code to always explicitly close processes instead of doing so via the context. As a result though, this separate detached context will never get cancelled at all, causing us to leak Goroutines which listen on context cancellation to perform various cleanup operations. While we could simply revert this commit, the original implementation was quite indirect given that all process cancellation happened via this context. The context was thus passed to the `BatchProcess`, which knew to cancel it as required. This put the burden on children of the cache, which is an inversion of responsibilities: it should be the cache which is managing lifetimes, not the processes. Fix the Goroutine leak by instead stowing away the separate context's cancellation function as part of the cache entries. Like this, the cache can retrieve and cancel them whenever it evicts entries from the cache, and it can cancel them when a process cannot be returned to the cache because it is full. In the other case of uncacheable processes, we do not replace the surrounding context's cancellation function at all and can thus continue to rely on the caller to eventually cancel the context. Fixes: gitlab-com/gl-infra/production#5566 Changelog: fixed
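A condensed sketch of the ownership model described above; the types are stand-ins, not the real catfile ones:

```go
package catfile

import "context"

type process struct{}

func (p *process) close() {}

// entry owns the detached context's cancellation function, so the
// cache, not the process, decides when the goroutines tied to that
// context stop.
type entry struct {
	process *process
	cancel  context.CancelFunc
}

type cache struct {
	entries []*entry
}

// evict removes an entry, closes its process, and cancels the detached
// context so cleanup goroutines listening on it terminate.
func (c *cache) evict(i int) {
	e := c.entries[i]
	c.entries = append(c.entries[:i], c.entries[i+1:]...)
	e.process.close()
	e.cancel()
}
```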
-
Patrick Steinhardt authored
Explicitly stop the catfile cache whenever we construct it in our test suite such that we shut down the monitoring Goroutine and evict any processes hosted by it.
-
Patrick Steinhardt authored
In some packages, we have started to assert that we do not leak Goroutines via the goleak package. One exception we need to always have present is for the opencensus package given that it starts a Goroutine in its `init()` function which we have no way to stop. Provide a central function `MustHaveNoGoroutines()` in the testhelper package such that we do not have to duplicate this exception whenever we use the goleak package.
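Gitaly uses the goleak package for these assertions; a sketch of what the central helper plausibly looks like, with the well-known opencensus exemption (the signature is an assumption):

```go
package testhelper

import (
	"testing"

	"go.uber.org/goleak"
)

// MustHaveNoGoroutines asserts that no Goroutines leaked. The
// opencensus worker is started from an init() function and can never
// be stopped, so it is the one standing exception.
func MustHaveNoGoroutines(t *testing.T) {
	goleak.VerifyNone(t,
		goleak.IgnoreTopFunction("go.opencensus.io/stats/view.(*worker).start"),
	)
}
```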
-
Patrick Steinhardt authored
This reverts commit f99dae3b, reversing changes made to 15c4a53f.
-
James Fargher authored
Since gitlab-labkit also depends on activesupport, this needed updating too. These gems have been updated to match gitlab-rails. Changelog: changed
-
- Oct 05, 2021
-
-
Toon Claes authored
Add add-repository praefect subcmd Closes #3773 See merge request !3918
-
Jacob Vosmaer authored
This change adds publicly exported code to the gitaly/client package that allows Gitaly clients to accept sidechannel connections coming back from a Gitaly server they have connected to. This is done via a new dial function DialSidechannel. Changelog: added
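A rough usage sketch; the registry constructor and the exact `DialSidechannel` signature are assumptions beyond what the message states, so consult the client package for the real API:

```go
package main

import (
	"context"
	"log"

	"github.com/sirupsen/logrus"
	"gitlab.com/gitlab-org/gitaly/v14/client"
	"google.golang.org/grpc"
)

func main() {
	ctx := context.Background()

	// Assumed shape: a registry tracks sidechannel connections coming
	// back from the server the client has dialled.
	registry := client.NewSidechannelRegistry(logrus.NewEntry(logrus.New()))

	conn, err := client.DialSidechannel(ctx, "tcp://gitaly.internal:8075", registry,
		[]grpc.DialOption{grpc.WithInsecure()})
	if err != nil {
		log.Fatalf("dial with sidechannel support: %v", err)
	}
	defer conn.Close()
}
```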
-
John Cai authored
Adds a track-repository subcommand that allows an admin to add a repository that exists on one of the Gitaly nodes to the Praefect database. Does some safety checks before inserting the records. Changelog: added
-
Pavlo Strokov authored
Given that we are able to configure the number of workers used to process replication events, we add that capability to the ReplMgr. It now starts the configured number of workers per virtual storage to process replication events. If the number of workers is higher than the minimum number of Gitaly storages, it will be aligned with the latter and a log message about it will be issued. If the number is not set, the default value of 1 is used (the old behaviour). Part of: #2915
-
Pavlo Strokov authored
A new configuration parameter sets the number of workers used to process replication jobs. It is a safe, incremental move towards rolling out parallel storage processing in the replication manager. The default value of 1 behaves the same as if no changes had been done to the replication manager (the old behaviour). Part of: #2915
-
Pavlo Strokov authored
The current implementation iterates over Gitaly storages and executes replication events on each sequentially. As some replication events can take a significant amount of time to complete, this results in a long replication lag for the other storages, as their replication is blocked. With this change we start to process replication events for each Gitaly storage in parallel, so there is no replication lag between them. If one storage is blocked on the processing of a replication event, all others are still able to run their replication events in the meantime. Part of: #2915 Changelog: changed
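The shape of the change, as a hedged sketch (the manager's real fields and methods differ):

```go
package replication

import (
	"context"
	"sync"
)

// processStorages runs one event-processing loop per Gitaly storage,
// so a slow event on one storage no longer blocks replication for the
// others.
func processStorages(ctx context.Context, storages []string, process func(context.Context, string)) {
	var wg sync.WaitGroup
	for _, storage := range storages {
		wg.Add(1)
		go func(storage string) {
			defer wg.Done()
			process(ctx, storage) // loops over replication events until ctx is done
		}(storage)
	}
	wg.Wait()
}
```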
-
Pavlo Strokov authored
To support parallel processing of replication events, we need to refactor usage of the backoff function. It is now a factory that creates backoff functions on demand. With that we can create a backoff function for each goroutine, so they are independent of one another.
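A sketch of the factory idea; the type names and exponential parameters are illustrative:

```go
package replication

import "time"

// backoff returns the next delay to wait before retrying.
type backoff func() time.Duration

// backoffFactory creates an independent backoff function per caller,
// so parallel worker goroutines don't share exponentially growing
// state.
type backoffFactory func() backoff

func expBackoffFactory(start, max time.Duration) backoffFactory {
	return func() backoff {
		next := start // each factory call gets its own closure state
		return func() time.Duration {
			d := next
			next *= 2
			if next > max {
				next = max
			}
			return d
		}
	}
}
```

Each worker calls the factory once at startup and then owns the resulting closure outright.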
-
- Oct 04, 2021
-
-
James Fargher authored
In order to gracefully handle incremental backups where nothing has changed, we need to determine when `git bundle create` failed because the bundle was empty. This isn't an error for incremental backups; it simply means there were no updates. The FailedPrecondition gRPC code was chosen here since it best matches the litmus test: > (c) Use FailedPrecondition if the client should not retry until > the system state has been explicitly fixed. E.g., if an "rmdir" > fails because the directory is non-empty, FailedPrecondition > should be returned since the client should not retry unless > they have first fixed up the directory by deleting files from it. Changelog: changed
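In Go terms, the server side could classify the failure like this; a sketch in which the detection via git's stderr message is a simplification of whatever the real code does:

```go
package backup

import (
	"strings"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// wrapBundleError maps git's "empty bundle" failure to
// FailedPrecondition so that incremental backups can treat "nothing
// changed" as a non-error instead of retrying.
func wrapBundleError(err error, stderr string) error {
	if err == nil {
		return nil
	}
	if strings.Contains(stderr, "Refusing to create empty bundle") {
		return status.Error(codes.FailedPrecondition, "cannot create bundle: bundle is empty")
	}
	return err
}
```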
-