Draft: Patch rugged to hide libgit2 symbols
What does this MR do?
Re-install the rugged gem with a patched ext/rugged/extconf.rb that adds -Wl,--exclude-libs,ALL to $LDFLAGS, so that symbols from libgit2's statically-linked archives (most notably the bundled llhttp_*) are marked local in rugged.so instead of being globally exported.
Why
Without this patch, rugged.so globally exports 119 llhttp_* symbols from libgit2's bundled llhttp:
$ nm -D --defined-only /opt/gitlab/.../rugged/rugged.so | grep -c llhttp
119
When llhttp-ffi (loaded transitively by the http gem) is mapped into the same Ruby process, the dynamic linker resolves its llhttp_* calls to rugged's symbols (different llhttp version, different ABI), corrupting the parser callbacks. Every Geo blob HTTP response then fails with status 34 (HPE_INVALID_STATUS).
This is the root cause behind gitlab-org/gitlab#595139 (closed); it is analyzed in detail in gitlab-org/gitlab#597390 (closed). The currently shipping workaround swaps the blob download path to Gitlab::HTTP behind the geo_blob_download_with_gitlab_http ops feature flag, but it has surfaced new bugs (gitlab-org/gitlab#598020 (closed), gitlab-org/gitlab#598514 (closed)) and leaves us maintaining two code paths (gitlab-org/gitlab#596934). Patching rugged's symbol visibility removes the root cause for all deployment shapes (Omnibus, CNG, Dedicated, dev).
CNG/Dedicated still reproduces this bug even though CNG runs against a recent libffi (the precompiled ffi 1.17.4-x86_64-linux-gnu gem statically embeds a vendored libffi >3.5.2). The rugged symbol collision is independent of the libffi version: it is a link-time visibility issue in the rugged extension itself.
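The collision described in this section can be pictured with a toy model of symbol lookup. This is only a sketch: it assumes rugged.so is loaded before the llhttp-ffi library and that lookup returns the first loaded library exporting a name. Real ELF resolution has more rules (lookup scopes, symbol versioning, visibility attributes). `Library`, `ProcessScope`, `scope_with`, and the "llhttp-ffi library" name are hypothetical and exist only for illustration.

```ruby
# Toy model only; not how the dynamic linker is implemented.
Library = Struct.new(:name, :exports)

class ProcessScope
  def initialize
    @loaded = []
  end

  # Libraries are searched in the order they were loaded.
  def load(library)
    @loaded << library
    self
  end

  # Name of the first loaded library that exports `symbol`, or nil.
  def resolve(symbol)
    @loaded.find { |lib| lib.exports.include?(symbol) }&.name
  end
end

# Build a scope where rugged.so is loaded first (an assumption of this sketch).
def scope_with(rugged_exports)
  ProcessScope.new
    .load(Library.new("rugged.so", rugged_exports))
    .load(Library.new("llhttp-ffi library", ["llhttp_init", "llhttp_execute"]))
end

unpatched = scope_with(["Init_rugged", "llhttp_init", "llhttp_execute"])
patched   = scope_with(["Init_rugged"])

puts unpatched.resolve("llhttp_execute") # rugged.so
puts patched.resolve("llhttp_execute")   # llhttp-ffi library
```

In the unpatched case the lookup lands in rugged's copy of llhttp. Once rugged stops exporting those names, the same lookup reaches the library that llhttp-ffi actually ships.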
What changed
| File | Change |
|---|---|
| shared/build-scripts/patches/rugged-hide-libgit2-symbols-1.9.0.patch | New: 5-line extconf.rb patch adding the LDFLAG. |
| shared/build-scripts/patch-rugged-symbols | New: executable script that uses gem-patch to re-install the rugged gem with the patch applied. Modelled on shared/build-scripts/reinstall-grpc-if-fips, without the FIPS gate (the rugged fix applies to all builds). |
| gitlab-rails/Dockerfile.erb | Added invocation of the new script after bundle install in the builder stage. |
| gitlab-rails/Dockerfile.build.ubi.erb | Added invocation after reinstall-grpc-if-fips. |
The fix mirrors the same pattern the ffi gem already uses for libffi (ffi/ext/ffi_c/extconf.rb#L54-L56):
# Ensure libffi symbols aren't exported when using static libffi.
# This is to avoid interference with other gems like fiddle.
append_ldflags "-Wl,--exclude-libs,ALL"
Init_rugged and Ruby-side API symbols are compiled from .o files (not the static libgit2 archive) and remain globally visible, so Ruby's dlopen still works.
Verification (test plan)
1. Symbol check (primary regression test) — run inside the built image:
nm -D --defined-only $(find /srv/gitlab/vendor/bundle -name 'rugged.so') | grep -c llhttp
# Expected: 0 (was 119 before patch)
2. Reproducer from gitlab-org/gitlab#597390 (closed):
Run snippet 5982447 inside the patched build — client.rb should print 200 (was 34).
3. Rugged smoke test:
require 'rugged'
puts Rugged::Reference.valid_name?("refs/heads/main") # => true
puts Rugged::Reference.valid_name?("refs/heads/..") # => false
4. End-to-end: deploy the patched image to a Geo secondary on staging-ref, disable geo_blob_download_with_gitlab_http, trigger blob replication. The original http-gem path should now succeed without FFI corruption.
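The symbol check in step 1 lends itself to automation, for example as a build-time gate. The following is a minimal sketch, not part of this MR: `nm_dynamic_symbols` and `leaked_symbols` are hypothetical helper names, and the sample nm output is made up for illustration. Unlike `grep -c llhttp`, which matches the substring anywhere on the line, this sketch matches only symbol names that start with `llhttp_`.

```ruby
require "open3"

# Hypothetical helper: run nm on a shared object and return its stdout.
def nm_dynamic_symbols(path)
  stdout, status = Open3.capture2("nm", "-D", "--defined-only", path)
  raise "nm failed for #{path}" unless status.success?
  stdout
end

# Hypothetical helper: symbol names in nm output that start with `prefix`.
# nm prints one "<address> <type> <name>" line per symbol; the name is the last field.
def leaked_symbols(nm_output, prefix: "llhttp_")
  nm_output.each_line.filter_map do |line|
    name = line.split.last
    name if name&.start_with?(prefix)
  end
end

# Made-up sample output, for illustration only (not taken from a real build).
SAMPLE_UNPATCHED = <<~NM
  0000000000045a10 T Init_rugged
  00000000001a2b30 T llhttp_init
  00000000001a2c80 T llhttp_execute
NM

SAMPLE_PATCHED = <<~NM
  0000000000045a10 T Init_rugged
NM

puts leaked_symbols(SAMPLE_UNPATCHED).size # 2
puts leaked_symbols(SAMPLE_PATCHED).size   # 0
```

Against a real image, `leaked_symbols(nm_dynamic_symbols(path))` would be expected to return an empty array for the patched rugged.so, where `path` is whatever the `find` command in step 1 prints.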
Why not Scott's two alternatives in #596934
Considered both upstream options before choosing the downstream patch:
- Bump http gem to 6.x (uses llhttp direct C binding instead of llhttp-ffi on CRuby): blocked by kubeclient 4.13.0 still pinning http >= 3.0, < 6.0 (verified at the v4.13.0 tag's gemspec; the cap was not removed). ManageIQ/kubeclient#687 is open with no upstream movement since 2026-03-18. Even if unblocked, the llhttp gem also statically embeds llhttp.c sources without symbol-hiding flags, so the rugged collision could still corrupt the new path.
- Bump Omnibus libffi 3.2.1 → 3.4.x+: doesn't address the symbol collision and is irrelevant for CNG/Dedicated (already on a much newer libffi via the precompiled ffi gem).
The downstream rugged patch is the smallest fix that addresses the actual root cause across all deployment shapes.
Related issues
- Closes part of gitlab-org/gitlab#598564 (closed)
- Root cause analysis: gitlab-org/gitlab#597390 (closed)
- Original symptom: gitlab-org/gitlab#595139 (closed)
- Consolidation tracking: gitlab-org/gitlab#596934
- Companion omnibus MR: gitlab-org/omnibus-gitlab!9367 (closed)
Checklist
See Definition of done.
Required
- Merge Request Title, and Description are up to date, accurate, and descriptive.
- MR targeting the appropriate branch.
- MR has a green pipeline on GitLab.com.
- When ready for review, MR is labeled workflow::ready for review per the Distribution MR workflow.
Expected
- Test plan indicating conditions for success has been posted (see Verification above).
- Documentation created/updated. Not applicable — internal build-system change.
- Integration tests added to GitLab QA. Not planned — Geo replication tests already cover the affected flow.
- The impact any change in container size has should be evaluated. Expected: negligible (one extra rugged rebuild from source; no new runtime deps).
- New dependencies are managed with GitLab forked renovatebot. Not applicable — no new gem dependency.