Draft: Patch rugged to hide libgit2 symbols

What does this MR do?

Re-install the rugged gem with a patched ext/rugged/extconf.rb that adds -Wl,--exclude-libs,ALL to $LDFLAGS, so that symbols from libgit2's statically-linked archives (most notably the bundled llhttp_*) are marked local in rugged.so instead of being globally exported.
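
As a rough illustration of the kind of change the patch makes, the sketch below appends the flag to a flags string only when it is not already present. The helper name `with_exclude_libs` and the sample input flags are hypothetical and not taken from the actual patch; in a real extconf.rb the result would be assigned back to `$LDFLAGS` before the Makefile is generated.

```ruby
# Hypothetical helper illustrating the extconf.rb change; not the actual patch.
EXCLUDE_LIBS_FLAG = "-Wl,--exclude-libs,ALL"

# Returns the linker flags with the exclude-libs flag appended exactly once.
def with_exclude_libs(ldflags)
  flags = ldflags.to_s.split
  flags << EXCLUDE_LIBS_FLAG unless flags.include?(EXCLUDE_LIBS_FLAG)
  flags.join(" ")
end

# In extconf.rb this would be roughly: $LDFLAGS = with_exclude_libs($LDFLAGS)
puts with_exclude_libs("-L. -fstack-protector-strong")
```

Appending only when the flag is absent keeps the change safe to apply on top of an environment that already passes the flag.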

Why

Without this patch, rugged.so globally exports 119 llhttp_* symbols from libgit2's bundled llhttp:

$ nm -D --defined-only /opt/gitlab/.../rugged/rugged.so | grep -c llhttp
119

When llhttp-ffi (loaded transitively by the http gem) is mapped into the same Ruby process, the dynamic linker resolves its llhttp_* calls to rugged's symbols (different llhttp version, different ABI), corrupting the parser callbacks. Every Geo blob HTTP response then fails with status 34 (HPE_INVALID_STATUS).
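
The collision can be pictured with a toy model of global symbol lookup. This is only a sketch: it assumes rugged.so is loaded first and that both libraries share one lookup scope, and it ignores details such as RTLD_LOCAL, symbol versioning, and -Bsymbolic. The `Library` struct, the `resolve` helper, and the library label used for llhttp-ffi are illustrative stand-ins.

```ruby
# Toy model of global symbol lookup: libraries are searched in load order and
# the first one that exports the requested symbol wins.
Library = Struct.new(:name, :exported_symbols)

def resolve(symbol, load_order)
  owner = load_order.find { |lib| lib.exported_symbols.include?(symbol) }
  owner&.name
end

# Library names and the single symbol are illustrative stand-ins.
rugged_unpatched = Library.new("rugged.so", ["Init_rugged", "llhttp_execute"])
rugged_patched   = Library.new("rugged.so", ["Init_rugged"])
llhttp_ffi       = Library.new("llhttp-ffi library", ["llhttp_execute"])

puts resolve("llhttp_execute", [rugged_unpatched, llhttp_ffi]) # rugged.so
puts resolve("llhttp_execute", [rugged_patched, llhttp_ffi])   # llhttp-ffi library
```

Once rugged.so no longer exports the llhttp_* names, the lookup falls through to the library that llhttp-ffi actually ships.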

This is the root cause behind gitlab#595139 (closed) and is analyzed in detail in gitlab#597390 (closed). The workaround that currently ships switches the blob download path to Gitlab::HTTP behind the geo_blob_download_with_gitlab_http ops feature flag, but it has surfaced new bugs (gitlab#598020 (closed), gitlab#598514 (closed)) and leaves us maintaining two code paths (gitlab-org/gitlab#596934). Patching rugged's symbol visibility removes the root cause for all deployment shapes (Omnibus, CNG, Dedicated, dev).

What changed

File | Change
config/patches/rugged-hide-libgit2-symbols-1.9.0.patch | New: 5-line extconf.rb patch adding the linker flag to $LDFLAGS.
config/software/ruby-rugged.rb | New: software definition that uses gem-patch to re-install the rugged gem with the patch applied. Modelled on config/software/ruby-grpc.rb.
config/projects/gitlab.rb | Added dependency 'ruby-rugged' after ruby-grpc, so the patched re-install runs after gitlab-rails' bundle install.

The fix follows the pattern the ffi gem already uses for libffi (ffi/ext/ffi_c/extconf.rb#L54-L56):

# Ensure libffi symbols aren't exported when using static libffi.
# This is to avoid interference with other gems like fiddle.
append_ldflags "-Wl,--exclude-libs,ALL"

--exclude-libs only affects symbols that come from archive libraries. Init_rugged and the Ruby-side API symbols are compiled from rugged's own .o files (not the static libgit2 archive), so they remain globally visible and Ruby's dlopen still works.

Verification (test plan)

1. Symbol check (primary regression test):

nm -D --defined-only /opt/gitlab/embedded/lib/ruby/gems/3.3.0/gems/rugged-1.9.0/lib/rugged/rugged.so | grep -c llhttp
# Expected: 0     (was 119 before patch)

2. Reproducer from gitlab#597390 (closed):

Run snippet 5982447 inside the patched build — client.rb should print 200 (was 34).

3. Rugged smoke test:

require 'rugged'
puts Rugged::Reference.valid_name?("refs/heads/main")  # => true
puts Rugged::Reference.valid_name?("refs/heads/..")    # => false

4. End-to-end: disable geo_blob_download_with_gitlab_http on staging-ref, trigger a Geo blob sync. The original http-gem path should now succeed without FFI corruption.
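
The symbol check in step 1 could also be scripted so that it verifies both halves of the claim: no llhttp_* symbols are exported, and Init_rugged still is. The sketch below parses text in the format produced by nm -D --defined-only; the helper names `exported_symbols` and `check_symbols` are hypothetical, and the sample lines are fabricated for illustration, not real output from rugged.so.

```ruby
# Parses `nm -D --defined-only` output and returns the defined symbol names.
# Each non-empty line is expected to look like "<address> <type> <name>".
def exported_symbols(nm_output)
  nm_output.each_line.map { |line| line.split.last }.compact
end

# Returns a list of problems; an empty list means the check passed.
def check_symbols(nm_output)
  symbols = exported_symbols(nm_output)
  leaked = symbols.select { |name| name.start_with?("llhttp_") }
  problems = []
  problems << "#{leaked.size} llhttp_* symbols still exported" unless leaked.empty?
  problems << "Init_rugged is not exported" unless symbols.include?("Init_rugged")
  problems
end

# Fabricated sample lines for illustration, not real output from rugged.so.
sample = <<~NM
  0000000000012340 T Init_rugged
  0000000000045670 T llhttp_init
NM
puts check_symbols(sample).inspect
```

In a real check, the input would come from running the nm command from step 1 against the built rugged.so, for example via Open3.capture2.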

Why not Scott's two alternatives in #596934

Considered both upstream options before choosing the downstream patch:

  1. Bump http gem to 6.x (uses llhttp direct C binding instead of llhttp-ffi on CRuby): blocked by kubeclient 4.13.0 still pinning http >= 3.0, < 6.0 (verified at the v4.13.0 tag's gemspec — the cap was not removed). ManageIQ/kubeclient#687 is open with no upstream movement since 2026-03-18. Even if unblocked, the llhttp gem also statically embeds llhttp.c sources without symbol-hiding flags, so the rugged collision could still corrupt the new path.
  2. Bump Omnibus libffi 3.2.1 → 3.4.x+: doesn't address the symbol collision and is irrelevant for CNG/Dedicated (those use the precompiled ffi gem which statically embeds vendored libffi >3.5.2 already).

The downstream rugged patch is the smallest fix that addresses the actual root cause across all deployment shapes.
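
The version constraint in option 1 can be checked with RubyGems' own requirement API. This is a minimal sketch: the constraint is the one quoted above for kubeclient 4.13.0, while the helper name `allowed_by_kubeclient?` and the two http version strings are illustrative.

```ruby
require "rubygems"

# The http constraint quoted above for kubeclient 4.13.0.
KUBECLIENT_HTTP_REQUIREMENT = Gem::Requirement.new(">= 3.0", "< 6.0")

# Hypothetical helper: true when the given http version satisfies the constraint.
def allowed_by_kubeclient?(http_version)
  KUBECLIENT_HTTP_REQUIREMENT.satisfied_by?(Gem::Version.new(http_version))
end

# Version strings are illustrative examples of a 5.x and a 6.x release.
puts allowed_by_kubeclient?("5.2.0") # true
puts allowed_by_kubeclient?("6.0.0") # false
```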

Checklist

See Definition of done.

Required

  • MR title and description are up to date, accurate, and descriptive.
  • MR targeting the appropriate branch.
  • Latest Merge Result pipeline is green.
  • When ready for review, MR is labeled workflow::ready for review per the Distribution MR workflow.

For GitLab team members

  • The manual Trigger:ee-package jobs have a green pipeline running against latest commit.
  • Since config/software and config/patches directories are changed, the build-package-on-all-os job within the Trigger:ee-package downstream pipeline must succeed.
  • If CI configuration is changed, the branch must be pushed to dev.gitlab.org to confirm regular branch builds aren't broken.

Expected

  • Test plan indicating conditions for success has been posted (see Verification above).
  • Documentation created/updated. Not applicable — internal build-system change.
  • Tests added. Not applicable — verified via nm symbol count post-build; no unit-testable Ruby surface.
  • Integration tests added to GitLab QA. Not planned — Geo replication tests already cover the affected flow.
Edited by Douglas Barbosa Alexandre
