Geo: Patch rugged.so to hide libgit2's llhttp symbols

Summary

rugged.so globally exports 119 llhttp_* symbols from libgit2's statically-linked bundled llhttp. When llhttp-ffi (loaded transitively by the http gem) is mapped into the same process, the dynamic linker resolves its llhttp_* calls to rugged's symbols (different version, different ABI), corrupting parser callbacks. This is the root cause behind the Geo blob replication failures tracked in #595139 (closed) and identified in #597390 (closed).

The currently-shipping workaround swaps the blob download path to Gitlab::HTTP (HTTParty/Net::HTTP) behind the geo_blob_download_with_gitlab_http ops feature flag. It works, but it has introduced new bugs (#598020 (closed): timeout ignored for >60s downloads; #598514 (closed): allow_object_storage is a no-op for default-DNS S3) and leaves us maintaining two blob-download code paths (per #596934).

This issue tracks a permanent root-cause fix: patch rugged at build time to hide its statically-linked libgit2 symbols, so they no longer collide with llhttp-ffi. Once landed, the original http-gem path becomes safe to use and the Gitlab::HTTP feature flag can be evaluated for removal.

Root cause evidence

$ nm -D --defined-only $(gem which rugged | sed 's|/lib/rugged\.rb$|/lib/rugged/rugged.so|') | grep -c llhttp
119

All 119 are exported as T (global text), making them visible to subsequent dlopen. Verified at three levels — no symbol-hiding flags anywhere in the chain:

  1. rugged 1.9.0's ext/rugged/extconf.rb sets only $CFLAGS << " -g -O3 -Wall -Wno-comment". No -fvisibility=hidden, no --exclude-libs. cmake builds vendored libgit2 static via -DBUILD_SHARED_LIBS=OFF.
  2. GitHub code search across libgit2/rugged for "visibility" returns 0 hits.
  3. libgit2 v1.9.x CMakeLists.txt and the bundled deps/llhttp/CMakeLists.txt (built as add_library(llhttp OBJECT ...)) have no visibility flags either. Default visibility leaks all symbols from the static archive into rugged.so.

Zero upstream issues filed at libgit2/rugged about this — search of the repo for "llhttp" or "symbol/visibility/collision" returns nothing.
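To double-check that every leaked symbol really is a global text (T) export, the nm output can be tallied by symbol type. The following is a minimal sketch, not part of the original analysis; the helper name tally_exports and the sample path in the usage comment are hypothetical.

```ruby
# Tally exported symbols from `nm -D --defined-only` output by symbol type.
# Each line is expected to look like: "<hex address> <type letter> <name>".
# Returns a Hash such as {"T" => 119}.
def tally_exports(nm_output, prefix: "llhttp_")
  counts = Hash.new(0)
  nm_output.each_line do |line|
    _address, type, name = line.split
    # Skip blank or short lines and symbols outside the prefix of interest.
    next unless name&.start_with?(prefix)
    counts[type] += 1
  end
  counts
end

# Usage (the path is an assumption, adjust it to your install):
#   p tally_exports(`nm -D --defined-only /path/to/lib/rugged/rugged.so`)
```

If the result contains only a "T" key, all matching symbols are globally exported functions, which is the situation described above.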

Reproducer

Pure Ruby — no GitLab, no TLS, no Rails. Reproduces on Omnibus, CNG, and Dedicated. Loading rugged before http is the only trigger.

# client.rb
require 'rugged'   # comment this out and the bug disappears
require 'http'
puts HTTP.get('http://127.0.0.1:8080/').status.code
# => 34   (expected: 200)

Full reproducer (client.rb, server.py, Gemfile, README.md): https://gitlab.com/-/snippets/5982447
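For readers who want to try client.rb without fetching the snippet, any plain HTTP server that answers 200 should be enough. The sketch below is a hypothetical Ruby stand-in for the snippet's server.py (it is not the file from the snippet); the method name start_plain_http_server is an assumption, and the default port matches the URL used in client.rb.

```ruby
require "socket"

# Start a tiny HTTP/1.1 server on 127.0.0.1 that answers every request
# with "200 OK". Returns the TCPServer so the caller can close it.
def start_plain_http_server(port = 8080)
  server = TCPServer.new("127.0.0.1", port)
  Thread.new do
    loop do
      client = server.accept
      begin
        # Read and discard the request head, up to the blank line.
        while (line = client.gets) && line != "\r\n"; end
        body = "ok\n"
        client.write("HTTP/1.1 200 OK\r\n" \
                     "Content-Type: text/plain\r\n" \
                     "Content-Length: #{body.bytesize}\r\n" \
                     "Connection: close\r\n\r\n#{body}")
      rescue IOError, SystemCallError
        # The client went away mid-request; keep serving others.
      ensure
        client.close
      end
    end
  rescue IOError, SystemCallError
    # The listening socket was closed; stop serving.
  end
  server
end

# Usage: server = start_plain_http_server; run client.rb; server.close
```

With this server running, client.rb should print 200 when rugged is not loaded first.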

Why this fix vs. the alternatives

We considered three alternatives raised in #596934 (Scott Murray):

  1. Bump http gem to 6.x (uses llhttp direct C binding instead of llhttp-ffi on CRuby). Blocked by kubeclient 4.13.0 still pinning http >= 3.0, < 6.0 (despite earlier expectation that 4.13.0 removed the cap — verified via gemspec at v4.13.0 tag). Upstream PR ManageIQ/kubeclient#687 is open with no movement since 2026-03-18. Even if unblocked, the llhttp gem also statically embeds llhttp.c sources without symbol-hiding flags — so the rugged collision could still corrupt the new path.

  2. Bump Omnibus libffi 3.2.1 → 3.4.x+ (static trampolines on Linux). We confirmed the Omnibus pin and the changelog claim, but the bump is irrelevant for CNG / Dedicated: CNG's Gemfile.lock resolves ffi (1.17.4-x86_64-linux-gnu), a precompiled binary that statically embeds vendored libffi at SHA 2263d6037f8e (Dec 2025), 9 commits past v3.5.2. Static trampolines have been there for years. Bumping Omnibus libffi changes nothing for K8s deployments.

  3. Drop rugged entirely. !218195 (closed) "Remove Rugged from Gemfile" has been open since 2026-01-08, but it only removes the direct gem 'rugged', '~> 1.6' line. Rugged still ships transitively via licensee 9.18.0, which requires rugged (>= 0.24, < 2.0) (production dep, Gemfile:355), and via undercover 0.8.5 (tooling). Replacing licensee is a bigger project per discussions in !196343 (closed).

The patch proposed here is the smallest fix that addresses the actual root cause for all deployment shapes.
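The two version constraints quoted in the alternatives can be checked mechanically with RubyGems' own requirement matching. This is an illustrative sketch only: the helper name allowed? is hypothetical, and 5.2.0 is used as an arbitrary 5.x version rather than one taken from any lockfile.

```ruby
require "rubygems"

# Return true when `version` satisfies all of the given constraint strings.
def allowed?(constraints, version)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

# kubeclient 4.13.0's constraint on the http gem, as quoted above.
kubeclient_http = [">= 3.0", "< 6.0"]
# licensee 9.18.0's constraint on rugged, as quoted above.
licensee_rugged = [">= 0.24", "< 2.0"]

allowed?(kubeclient_http, "6.0.0")  # => false: http 6.x cannot be resolved
allowed?(kubeclient_http, "5.2.0")  # => true
allowed?(licensee_rugged, "1.9.0")  # => true: rugged 1.9.0 still ships
```

The first result is why alternative 1 is blocked, and the last is why removing the direct Gemfile line in alternative 3 does not remove rugged from the bundle.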

Proposed fix

Add -Wl,--exclude-libs,ALL to $LDFLAGS in rugged's ext/rugged/extconf.rb. This tells the linker to mark all symbols from static archives (libgit2 → bundled llhttp/zlib/pcre/ntlmclient/xdiff) as local in rugged.so, while keeping Init_rugged and Ruby-API symbols (compiled from .o files, not archives) global.

The ffi gem already does exactly this for the same class of problem (ffi/ext/ffi_c/extconf.rb:54-56):

# Ensure libffi symbols aren't exported when using static libffi.
# This is to avoid interference with other gems like fiddle.
append_ldflags "-Wl,--exclude-libs,ALL"

Patch:

diff --git a/ext/rugged/extconf.rb b/ext/rugged/extconf.rb
--- a/ext/rugged/extconf.rb
+++ b/ext/rugged/extconf.rb
@@ -13,6 +13,15 @@ $CFLAGS << " -g"
 $CFLAGS << " -O3" unless $CFLAGS[/-O\d/]
 $CFLAGS << " -Wall -Wno-comment"

+# Hide symbols from statically-linked libgit2 (and its bundled llhttp,
+# pcre, zlib, ntlmclient, xdiff dependencies) so they don't collide
+# with other gems' shared libraries at runtime. Without this, rugged.so
+# globally exports ~119 llhttp_* symbols which corrupt callbacks in
+# llhttp-ffi (loaded by the http.rb gem). See:
+#   https://gitlab.com/gitlab-org/gitlab/-/issues/597390
+# Equivalent pattern used by the ffi gem for libffi.
+$LDFLAGS << " -Wl,--exclude-libs,ALL" unless Gem.win_platform?
+
 cmake_flags = [ ENV["CMAKE_FLAGS"] ]
 cmake_flags << "-DBUILD_CLI=OFF"
 cmake_flags << "-DBUILD_TESTS=OFF"

Init_rugged and Ruby-API symbols come from .o files (not the static libgit2 archive) and remain globally visible, so Ruby's dlopen still works.

Implementation plan

Apply the patch downstream via the existing gem-patch precedent already in use for grpc (FIPS):

omnibus-gitlab MR (omnibus-gitlab!9367 (closed))

  • Add config/patches/rugged-hide-libgit2-symbols-1.9.0.patch
  • Add config/software/ruby-rugged.rb modelled on the existing config/software/ruby-grpc.rb
  • Add dependency 'ruby-rugged' to config/software/gitlab-rails.rb

CNG MR (gitlab-org/build/CNG!2916 (closed))

  • Add shared/build-scripts/patches/rugged-hide-libgit2-symbols-1.9.0.patch
  • Add shared/build-scripts/patch-rugged-symbols modelled on shared/build-scripts/reinstall-grpc-if-fips, without the FIPS gate
  • Add invocation lines to gitlab-rails/Dockerfile.erb and gitlab-rails/Dockerfile.build.ubi.erb after bundle install

Upstream

File a corresponding issue/PR at libgit2/rugged so we can drop the downstream patch once it lands.

Verification

# Symbol check (primary regression test):
nm -D --defined-only /opt/gitlab/embedded/lib/ruby/gems/3.3.0/gems/rugged-1.9.0/lib/rugged/rugged.so | grep -c llhttp
# Expected: 0     (was 119 before patch)

Then re-run the reproducer above (expect 200), and disable geo_blob_download_with_gitlab_http on staging-ref to confirm the original http-gem path is now stable.
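The symbol check could also be scripted so that it fails on both kinds of regression: llhttp symbols leaking again, or Init_rugged accidentally being hidden. The sketch below assumes it is fed the text output of nm -D --defined-only; the helper name check_exports and its keyword arguments are hypothetical and would need wiring into whatever CI job runs the check.

```ruby
# Inspect `nm -D --defined-only` output for a built rugged.so.
# Returns { leaked: [...], missing: [...] }; both lists are empty on success.
def check_exports(nm_output, hidden_prefixes: ["llhttp_"], required: ["Init_rugged"])
  # The symbol name is the third whitespace-separated field on each line.
  names = nm_output.each_line.map { |line| line.split[2] }.compact
  leaked = names.select do |name|
    hidden_prefixes.any? { |prefix| name.start_with?(prefix) }
  end
  { leaked: leaked, missing: required - names }
end

# Usage (the path is an assumption, adjust it to your install):
#   result = check_exports(`nm -D --defined-only /path/to/lib/rugged/rugged.so`)
#   abort result.inspect unless result.values.all?(&:empty?)
```

Checking for Init_rugged alongside the llhttp count guards against a linker flag that hides too much, which would stop Ruby from loading the extension at all.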

Related

  • #595139 (closed): original FFI corruption bug
  • #597390 (closed): root-cause analysis of the rugged/llhttp-ffi symbol collision
  • #596934: Geo: Evaluate and consolidate blob download HTTP backend (this issue's fix unblocks consolidation)
  • #598020 (closed): Gitlab::HTTP path 60s timeout bug
  • #598514 (closed): Gitlab::HTTP path object-storage URL blocker bug
  • !218195 (closed): Remove Rugged from Gemfile (open, transitively blocked by licensee)
  • !230361 (merged): Geo: Switch blob download to use GitLab::HTTP (the workaround MR)
Edited by Douglas Barbosa Alexandre