Skip to content
Snippets Groups Projects
  1. Oct 07, 2021
    • Patrick Steinhardt's avatar
      supervisor: Stop monitor if shutdown was signalled · 1641ec51
      Patrick Steinhardt authored
      The supervisor's health monitor will check health every 15 seconds or
      alternatively exit if its shutdown channel was closed. While we do
      verify this channel correctly when trying to send the event, we do not
      check for shutdown when waiting 15 seconds for the next event.
      
      Fix this by returning early if we got the signal to shut down.
      1641ec51
    • Patrick Steinhardt's avatar
      bootstrap: Make bootstrapper stoppable to plug Goroutine leaks · 22351f97
      Patrick Steinhardt authored
      The bootstrapper is currently leaking multiple Goroutines in the
      upgrader because we never make sure to clean it up after tests have
      finished. Add its existing `Stop()` function to the upgrader interface
      and call is as required to plug these leaks.
      22351f97
    • Patrick Steinhardt's avatar
      catfile: Stop cache to plug Goroutine leaks · 6cb7e554
      Patrick Steinhardt authored
      In many tests where the catfile cache is in use, we create it but don't
      ever clean up the cache. As a result, the monitoring Goroutine and any
      potenitally cached processes may not get cleaned up as expected.
      
      Fix this by calling `Stop()` as required.
      6cb7e554
    • Patrick Steinhardt's avatar
      gitaly-backup: Inject connection pool to unleak its Goroutines · ff726e13
      Patrick Steinhardt authored
      The gitaly-backup manager currently creates an ad-hoc instance of the
      connection pool. This pool never gets cleaned up and thus results in
      leakage of its connections.
      
      Inject the connection pool into the manager and clean it up as required.
      ff726e13
    • Patrick Steinhardt's avatar
      command: Start using the testhelper package · a980fb11
      Patrick Steinhardt authored
      Now that the "testhelper" package doesn't depend on "command" anymore,
      convert the "command" package to use it.
      a980fb11
    • Patrick Steinhardt's avatar
      command: Solve cyclic dependency with testhelper package · 36c43abf
      Patrick Steinhardt authored
      The "testhelper" package currently depends on the "command" package,
      which is why we cannot use the "testhelper"'s setup functions in the
      latter package. The only remaining dependency is the wait group which is
      used to count in-flight commands though such that we can assert that
      there are no leaking processes after tests have concluded.
      
      Move this wait group into a separate "commandcounter" package to remove
      this dependency. While this also allows other packages to modify the
      counter due to its interface now being public, this really shouldn't be
      much of a problem at all.
      36c43abf
    • Patrick Steinhardt's avatar
      testhelper: Open-code check for process's error codes · 437f1aaf
      Patrick Steinhardt authored
      While the "command" package provides a nice helper to check for
      process's error codes, use of the "command" package in the testhelper
      keeps us from using leak checkers in the "command" package itself.
      
      Open-code this function in the testhelper package to reduce dependencies
      to the "command" package.
      437f1aaf
    • Patrick Steinhardt's avatar
      testhelper: Refactor checks for leaked processes · ed43ae61
      Patrick Steinhardt authored
      Refactor checks for leaked processes such that there are less calls to
      `panic()`. This prepares for the inlining of the "command" package in
      order to break a cyclic dependency.
      ed43ae61
    • Patrick Steinhardt's avatar
      testhelper: Move process leak checks into separate package · 58f78e3a
      Patrick Steinhardt authored
      The function `MustHaveNoChildProcess()` is not a general test helper
      that should be used by tests as they see fit, but is instead intended to
      be run at the end of tests only. Move it into a separate package and
      make it internal to the package to make it less discoverable by callers.
      58f78e3a
    • Patrick Steinhardt's avatar
      testhelper: Unify setup of test suites · d8b43e95
      Patrick Steinhardt authored
      Setup of a test package's main function is quite repetitive: we call a
      separate `testMain()` function such that we can use `defer` calls, this
      function calls `MustHaveNoChildProcess()` and `Configure()`, and then we
      return the result of `m.Run()`. This is almost always exactly the same,
      with some exceptions where we need to have additional setup code.
      
      Unify this code via a single `testhelper.Run()` function which does all
      of this. It allows us greater flexibility in the setup code such that we
      can easily perform additional validation after the tests without having
      to modify all callsites.
      
      Note that this change removes calls to the goleak package in some
      places. These will resurface in a later commit in this patch series,
      where we instead move this call into `testhelper.Run()` such that it is
      executed by default.
      d8b43e95
    • Patrick Steinhardt's avatar
      testhelper: Return errors when configuring Ruby sidecar fails · 27de9ce6
      Patrick Steinhardt authored
      When configuration of the Ruby sidecar fails, then we call
      `log.Fatalf()` and thus directly execute the complete test suite. Given
      that we call this function on-demand in tests nowadays though, we should
      instead return the error and validate at the test site that it did set
      up the server correctly such that other tests continue to run as
      expected.
      
      Fix this issue by returning the error instead of aborting.
      27de9ce6
    • James Fargher's avatar
      Merge branch 'jv-sidechannel-praefect' into 'master' · a3c694b5
      James Fargher authored
      Praefect: add sidechannel support
      
      See merge request !3862
      a3c694b5
  2. Oct 06, 2021
    • Jacob Vosmaer's avatar
      Praefect: proxy sidechannels · 9afed4db
      Jacob Vosmaer authored
      This commit adds backchannel support to the main gRPC listener of
      Praefect. And if clients make gRPC calls with sidechannels, Praefect
      will now proxy these to the Gitaly backend.
      
      Changelog: added
      9afed4db
    • Pavlo Strokov's avatar
      Merge branch 'pks-drop-fetch-internal-remote' into 'master' · 5027043f
      Pavlo Strokov authored
      remote: Drop FetchInternalRemote RPC
      
      Closes #3667
      
      See merge request !3898
      5027043f
    • Pavlo Strokov's avatar
      Merge branch 'pks-datastore-cleanup-timezone' into 'master' · 2b424740
      Pavlo Strokov authored
      datastore: Fix storage cleanup's handling of timezones
      
      See merge request !3930
      2b424740
    • Patrick Steinhardt's avatar
      remote: Drop FetchInternalRemote RPC · 047b1e6a
      Patrick Steinhardt authored
      The FetchInternalRemote RPC had been used internally to replicate
      repositories across different Gitaly nodes. At some point in time, this
      was converted to do a direct fetch though without an additional RPC call
      because as it turned out, doing inter-Gitaly mutating RPC calls is
      problematic in combination with transactions given that the remote side
      would now try to cast votes against another Gitaly. Nowadays, there are
      no callers left which use this RPC.
      
      Remove the deprecated `FetchInternalRemote()` call. The backing logic
      which is still internally called remains though for use in other parts
      of Gitaly. We may eventually want to find a better place for it to live.
      
      Changelog: removed
      047b1e6a
    • Patrick Steinhardt's avatar
      remote: Refactor `FetchInternalRemote()` to use remoterepo interface · 9ed7528e
      Patrick Steinhardt authored
      Refactor `FetchInternalRemote()` to use the new function to get the
      default branch provided by the remoterepo package.
      9ed7528e
    • Patrick Steinhardt's avatar
      remoterepo: Implement `GetDefaultBranch()` · 033647da
      Patrick Steinhardt authored
      We have an ad-hoc implementation of `GetDefaultBranch()` for remote
      repositories in the `FetchInternalRemote()` RPC implementation. Given
      that this RPC call simply maps to `localrepo.GetDefaultBranch()`, it
      makes sense to pull up the implementation into the remoterepo interface
      to make it generally available.
      
      Implement `remoterepo.GetDefaultBranch()` and move tests of this
      function from localrepo-specific tests into the shared test suite.
      033647da
    • Patrick Steinhardt's avatar
      remote: Modernize tests for `FetchInternalRemote` · e2657ce4
      Patrick Steinhardt authored
      The tests for `FetchInternalRemote` are a bit antique and hard to
      follow. Refactor them to reflect a more modern style. This is also a
      preparation of the upcoming change to drop the RPC call, where we only
      retain its internal non-RPC backing logic.
      e2657ce4
    • Patrick Steinhardt's avatar
      repository: Fix test for failing fetch in ReplicateRepository · 5cb50ac0
      Patrick Steinhardt authored
      While the testcase for ReplicateRepository which exercises failing
      fetches fails as expected, the reason for failure is not the failing
      fetch but instead it is that we do not have the required set of binaries
      installed. Fixing this surfaces another issue: because fetches nowadays
      happen internally instead of calling the `FetchInternalRemote()` RPC,
      the mock remote server which stubs out the RPC with an error isn't
      invoked at all.
      
      Fix the test setup by setting up a "real" gRPC server, where the fetch
      failure is provoked by writing garbage into the source repo's HEAD.
      5cb50ac0
    • Patrick Steinhardt's avatar
      datastore: Fix storage cleanup's handling of timezones · 4f0368b6
      Patrick Steinhardt authored
      Timestamps in the `storage_cleanups` table are stored without timezones,
      but our timezone handling in `AcquireNextStorage()` is inconsistent: we
      typically just use `NOW()` to get the current timestamp, but in one case
      we use `NOW() AT TIME ZONE 'UTC'`. In some setups, this leads to time
      stamps which cannot be correctly compared with each other.
      
      Fix the bug by consistently storing time stamps without time zones. This
      bug is similar to 4a2ac0ed (datastore: Fix acknowledgement of stale
      jobs considering timezones, 2021-08-20).
      
      Changelog: fixed
      4f0368b6
    • Pavlo Strokov's avatar
      Merge branch 'ps-replication-parallel-processing' into 'master' · cd94be7f
      Pavlo Strokov authored
      replication: Process replication events for storages in parallel
      
      See merge request !3894
      cd94be7f
    • Patrick Steinhardt's avatar
      Merge branch 'pks-catfile-fixed-goroutine-leakage' into 'master' · 01501de9
      Patrick Steinhardt authored
      catfile: Reintroduce refactoring and plug Goroutine leak in the cache
      
      Closes gitlab-com/gl-infra/production#5566
      
      See merge request !3905
      01501de9
    • Patrick Steinhardt's avatar
      catfile: Test for Goroutine leaks · 21bbf22b
      Patrick Steinhardt authored
      A recent refactoring of the catfile cache has caused an incident via a
      new Goroutine leak. Assert that we do not leak any Goroutines via
      `MustHaveNoGoroutines()` now that the leak has been plugged such that we
      cannot have new regressions there.
      21bbf22b
    • Patrick Steinhardt's avatar
      catfile: Plug Goroutine leak in the cache · c7458405
      Patrick Steinhardt authored
      When constructing processes as part of the catfile cache, then we must
      create a separate context in case the process is cacheable such that it
      does not get killed when the original request context gets cancelled.
      Starting with commit 36c166a1 (catfile: Explicitly close processes,
      2021-09-09), we have converted the code to always explicitly close
      processes instead of doing so via the context. As a result though, this
      separate detached context will never get cancelled at all, causing us to
      leak Goroutines which listen on context cancellation to perform various
      cleanup operations.
      
      While we could simply revert this commit, the original implementation
      has been quite indirect given that all process cancellation happened via
      this context. The context has thus been passed to the `BatchProcess`,
      which knew to cancel it as required. This is the burden to children of
      the cache, which is an inversion of responsibilites: it should be the
      cache which is managing lifetimes, not the processes.
      
      Fix the Goroutine leak by instead stowing away the separate context's
      cancellation function as part of the cache entries. Like this, the cache
      can retrieve and cancel them whenever it evicts entries from the cache,
      and it can cancel them when a process cannot be returned to the cache in
      case it is empty. In the other case of uncacheable processes, we do not
      replace the surrounding context's done function at all and can thus
      continue to rely on the caller to eventually cancel the context.
      
      Fixes: gitlab-com/gl-infra/production#5566
      Changelog: fixed
      c7458405
    • Patrick Steinhardt's avatar
      catfile: Explicitly stop the catfile cache in tests · e8a41257
      Patrick Steinhardt authored
      Explicitly stop the catfile cache whenever we construct it in our test
      suite such that we shut down the monitoring Goroutine and evict any
      processes hosted by it.
      e8a41257
    • Patrick Steinhardt's avatar
      testhelper: Introduce new function to check for leaking Goroutines · f8c17e34
      Patrick Steinhardt authored
      In some packages, we have started to assert that we do not leak
      Goroutines via the goleak package. One exception we need to always have
      present is for the opencensus package given that it starts a Goroutine
      in its `init()` function which we have no way to stop.
      
      Provide a central function `MustHaveNoGoroutines()` in the testhelper
      package such that we do not have to duplicate this exception whenever we
      use the goleak package.
      f8c17e34
    • Patrick Steinhardt's avatar
      Revert "Merge branch 'sh-revert-3853' into 'master'" · 79268eaf
      Patrick Steinhardt authored
      This reverts commit f99dae3b, reversing
      changes made to 15c4a53f.
      79268eaf
    • John Cai's avatar
      Merge branch 'update_activesupport' into 'master' · 016642ec
      John Cai authored
      Update ruby gem activesupport to v6.1.4.1
      
      Closes #3750
      
      See merge request !3929
      016642ec
    • James Fargher's avatar
      Update ruby gem activesupport to v6.1.4.1 · e3207df5
      James Fargher authored
      Since gitlab-labkit also depends on activesupport, this needed updating
      too.
      
      These gems have been updated to match gitlab-rails.
      
      Changelog: changed
      e3207df5
  3. Oct 05, 2021
    • Toon Claes's avatar
      Merge branch 'jc-praefect-command-add-repo' into 'master' · deb7bdb3
      Toon Claes authored
      Add add-repository praefect subcmd
      
      Closes #3773
      
      See merge request !3918
      deb7bdb3
    • John Cai's avatar
      Merge branch 'jv-sidechannel-client' into 'master' · 28f93e63
      John Cai authored
      client: add sidechannel support
      
      Closes gitlab-com/gl-infra/scalability#1303
      
      See merge request !3900
      28f93e63
    • Jacob Vosmaer's avatar
      client: add sidechannel support · e64a6c21
      Jacob Vosmaer authored
      This change adds publicly exported code to the gitaly/client package
      that allows Gitaly clients to accept sidechannel connections coming
      back from a Gitaly server they have connected to. This is done via a
      new dial function DialSidechannel.
      
      Changelog: added
      e64a6c21
    • John Cai's avatar
      Merge branch 'check_empty_bundle' into 'master' · c2cf30c9
      John Cai authored
      Determine when CreateBundleFromRefList would generate an empty bundle
      
      See merge request !3923
      c2cf30c9
    • John Cai's avatar
      Add track-repository praefect subcmd · 26006f80
      John Cai authored
      Adds track-repository subcommand that allows an admin to add a repository that
      exists on one of the gitaly nodes to the praefect database. Does some
      safety checks before inserting the reords.
      
      Changelog: added
      26006f80
    • Pavlo Strokov's avatar
      replication: Process replication events with configurable number of workers · 064d5be9
      Pavlo Strokov authored
      Given that we are able to configure amount of workers used to process
      replication events we add that capability into the ReplMgr.
      Now it starts configured amount of workers per virtual storage to
      process replication events. If amount of workers is higher than
      the minimum amount of gitaly storages it will be aligned with the
      latest and the log message about will be issued. If amount is not
      set the default value of 1 will be used (the old behaviour).
      
      Part of: #2915
      064d5be9
    • Pavlo Strokov's avatar
      replication: Configure number of replication workers per virtual storage · 60a4ae5c
      Pavlo Strokov authored
      The new configuration parameter to set number of the workers used
      to process replication jobs. It is a safe incremental move on to the
      rolling out parallel storage processing by the replication manager.
      The default value 1 behaves the same as no changes done to replication
      manager (the old behaviour).
      
      Part of: #2915
      60a4ae5c
    • Pavlo Strokov's avatar
      replication: Process replication events for storages in parallel · 1e5f3b25
      Pavlo Strokov authored
      Current implementation iterates over gitaly storages and executes
      replication events on each sequentially. As some replication events
      could take significant amount of time to complete this results into
      a long replication lag for other storages as their replication is
      blocked. With this change we start to process replication events for
      each gitaly storage in parallel, so there is no replication lag between.
      If one storage is blocked on the processing of the replication event
      all others are allowed to run their replication events in a mean time.
      
      Part of: #2915
      
      Changelog: changed
      1e5f3b25
    • Pavlo Strokov's avatar
      replication: Backoff function provider · 7589ebf9
      Pavlo Strokov authored
      To support parallel processing of the replication events
      we need to refactor usage of the backoff function. Now it
      is a factory that creates backoff functions on demand.
      With that we could create a backoff function for each
      goroutine, so they are independent one from another.
      7589ebf9
  4. Oct 04, 2021
    • James Fargher's avatar
      Determine when CreateBundleFromRefList would generate an empty bundle · 60fc2c95
      James Fargher authored
      In order to gracefully handle incremental backups where nothing has
      changed, we need to determine when `git bundle create` failed because
      the bundle was empty. This isn't an error for incremental backups, it
      simply means there were no updates.
      
      The FailedPrecondition GRPC code was chosen here since it best matches
      the litmus test:
      
      >  (c) Use FailedPrecondition if the client should not retry until
      >      the system state has been explicitly fixed. E.g., if an "rmdir"
      >      fails because the directory is non-empty, FailedPrecondition
      >      should be returned since the client should not retry unless
      >      they have first fixed up the directory by deleting files from it.
      
      Changelog: changed
      60fc2c95
Loading