Geo: Patch rugged.so to hide libgit2's llhttp symbols
## Summary

`rugged.so` globally exports 119 `llhttp_*` symbols from the copy of llhttp that libgit2 bundles and rugged links statically. When `llhttp-ffi` (loaded transitively by the `http` gem) is mapped into the same process, the dynamic linker resolves its `llhttp_*` calls to rugged's symbols (different version, different ABI). This corrupts the parser callbacks. This is the root cause behind the Geo blob replication failures tracked in #595139 and identified in #597390.

The currently shipping workaround switches the blob download path to `Gitlab::HTTP` (HTTParty/Net::HTTP) behind the `geo_blob_download_with_gitlab_http` ops feature flag. It works, but it has introduced new bugs:

- #598020: the timeout is ignored for downloads longer than 60s.
- #598514: `allow_object_storage` is a no-op for default-DNS S3.

It also leaves us maintaining two blob download code paths (per #596934).

This issue tracks a permanent root-cause fix. We patch rugged at build time to hide its statically linked libgit2 symbols, so they no longer collide with `llhttp-ffi`. Once this lands, the original `http` gem path becomes safe to use. The `Gitlab::HTTP` feature flag can then be evaluated for removal.

## Root cause evidence

```
$ nm -D --defined-only $(gem which rugged | sed 's|/lib/rugged.rb$|/lib/rugged/rugged.so|') | grep -c llhttp
119
```

All 119 are exported as `T` (global text). That makes them visible to shared libraries loaded later via `dlopen`.

We verified at three levels that no symbol-hiding flags are set anywhere in the chain:

1. `rugged 1.9.0`'s `ext/rugged/extconf.rb` only appends `-g`, `-O3` (unless an optimization level is already set) and `-Wall -Wno-comment` to `$CFLAGS`. There is no `-fvisibility=hidden` and no `--exclude-libs`. cmake builds the vendored libgit2 as a static library via `-DBUILD_SHARED_LIBS=OFF`.
2. GitHub code search across `libgit2/rugged` for "visibility" returns 0 hits.
3. `libgit2 v1.9.x`'s `CMakeLists.txt` and the bundled `deps/llhttp/CMakeLists.txt` (built as `add_library(llhttp OBJECT ...)`) have no visibility flags either.

With default visibility, every non-`static` symbol from the static archive is exported from `rugged.so`.
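Counting matches shows the leak. Intersecting the two export tables shows the collision itself. Below is a minimal Ruby sketch that assumes GNU `nm` is on `PATH`. It parses two `nm -D --defined-only` outputs and lists the names exported by both. For example, you could compare `rugged.so` with the shared library that `llhttp-ffi` loads; pass that library's path as an argument, since it is not specified here. The script name `collisions.rb` and all helper names are illustrative. They are not part of rugged, ffi or GitLab.

```ruby
require "open3"
require "set"

# Parse `nm -D --defined-only` output into the set of globally visible symbol
# names. Each line looks like "<address> <type> <name>". An uppercase type
# letter marks a global symbol (nm also uses a few lowercase letters such as
# `u` for special global cases, which this sketch ignores). Version suffixes
# such as "@@VER" are stripped.
def global_symbols(nm_output)
  nm_output.each_line.each_with_object(Set.new) do |line, names|
    _address, type, name = line.split
    next if name.nil?

    names << name.sub(/@.*\z/, "") if type.match?(/\A[A-Z]\z/)
  end
end

# Names exported by both libraries. When two objects in one process export the
# same name, the dynamic linker binds references to the first definition in its
# lookup order, so these names are interposition candidates.
def colliding_symbols(nm_a, nm_b)
  (global_symbols(nm_a) & global_symbols(nm_b)).to_a.sort
end

# Run nm on a shared object and return its output; raise if nm fails.
def nm_dynamic_defined(path)
  out, status = Open3.capture2("nm", "-D", "--defined-only", path)
  raise "nm failed for #{path}" unless status.success?

  out
end

# Usage: ruby collisions.rb path/to/rugged.so path/to/other.so
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  rugged_so, other_so = ARGV
  clashes = colliding_symbols(nm_dynamic_defined(rugged_so), nm_dynamic_defined(other_so))
  puts "#{clashes.size} symbols exported by both libraries"
  puts clashes.first(20)
end
```

Before the patch, you would expect `llhttp_*` names in the intersection. After it, `rugged.so` should no longer export them, so none should remain.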
No upstream issues have been filed at libgit2/rugged about this symbol leak. Searching the repo for "llhttp" or for "symbol"/"visibility"/"collision" returns nothing.

## Reproducer

Pure Ruby: no GitLab, no TLS, no Rails. It reproduces on Omnibus, CNG and Dedicated. Loading rugged before http is the only trigger.

```ruby
# client.rb
require 'rugged' # comment this out and the bug disappears
require 'http'

puts HTTP.get('http://127.0.0.1:8080/').status.code # => 34 (expected: 200)
```

Full reproducer (client.rb, server.py, Gemfile, README.md): https://gitlab.com/-/snippets/5982447

## Why this fix vs. the alternatives

We considered three alternatives raised in #596934 (Scott Murray):

1. **Bump the `http` gem to 6.x.** On CRuby it uses the `llhttp` gem's direct C binding instead of `llhttp-ffi`. This is blocked by `kubeclient` 4.13.0, which still pins `http >= 3.0, < 6.0`. We had expected 4.13.0 to remove the cap, but the gemspec at the `v4.13.0` tag still has it. Upstream PR [ManageIQ/kubeclient#687](https://github.com/ManageIQ/kubeclient/pull/687) is open, with no movement since 2026-03-18. Even if it were unblocked, the `llhttp` gem also statically embeds the llhttp.c sources without symbol-hiding flags. The rugged collision could therefore still corrupt the new path.
2. **Bump Omnibus libffi from 3.2.1 to 3.4.x+**, which brings static trampolines on Linux. We confirmed the Omnibus pin and the libffi changelog's claim. This is **irrelevant for CNG / Dedicated**. CNG's `Gemfile.lock` resolves `ffi (1.17.4-x86_64-linux-gnu)`, a precompiled binary that statically embeds vendored libffi at SHA `2263d6037f8e` (Dec 2025), 9 commits past v3.5.2. Static trampolines have been there for years, so bumping Omnibus libffi changes nothing for K8s deployments.
3. **Drop rugged entirely.** [!218195](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/218195) "Remove Rugged from Gemfile" has been open since 2026-01-08, but it only removes the *direct* `gem 'rugged', '~> 1.6'` line.
   Rugged still ships transitively via `licensee 9.18.0` → `rugged (>= 0.24, < 2.0)` (a production dependency, `Gemfile:355`) and via `undercover 0.8.5` (tooling). Replacing licensee is a bigger project, per the discussions in !196343.

The patch proposed here is the smallest fix that addresses the actual root cause for every deployment shape.

## Proposed fix

Add `-Wl,--exclude-libs,ALL` to `$LDFLAGS` in rugged's `ext/rugged/extconf.rb`. This flag tells the linker to mark every symbol pulled in from a static archive as **local** in `rugged.so`. That covers libgit2 and its bundled llhttp, zlib, pcre, ntlmclient and xdiff. `Init_rugged` and the Ruby-API symbols are compiled from rugged's own `.o` files, not taken from an archive, so they stay global.

The `ffi` gem already does exactly this for the same class of problem ([`ffi/ext/ffi_c/extconf.rb:54-56`](https://github.com/ffi/ffi/blob/v1.17.4/ext/ffi_c/extconf.rb#L54-L56)):

```ruby
# Ensure libffi symbols aren't exported when using static libffi.
# This is to avoid interference with other gems like fiddle.
append_ldflags "-Wl,--exclude-libs,ALL"
```

Patch:

```diff
diff --git a/ext/rugged/extconf.rb b/ext/rugged/extconf.rb
--- a/ext/rugged/extconf.rb
+++ b/ext/rugged/extconf.rb
@@ -13,6 +13,15 @@
 $CFLAGS << " -g"
 $CFLAGS << " -O3" unless $CFLAGS[/-O\d/]
 $CFLAGS << " -Wall -Wno-comment"
+# Hide symbols from statically-linked libgit2 (and its bundled llhttp,
+# pcre, zlib, ntlmclient, xdiff dependencies) so they don't collide
+# with other gems' shared libraries at runtime. Without this, rugged.so
+# globally exports ~119 llhttp_* symbols which corrupt callbacks in
+# llhttp-ffi (loaded by the http.rb gem). See:
+# https://gitlab.com/gitlab-org/gitlab/-/issues/597390
+# Equivalent pattern used by the ffi gem for libffi.
+$LDFLAGS << " -Wl,--exclude-libs,ALL" unless Gem.win_platform?
+
 cmake_flags = [ ENV["CMAKE_FLAGS"] ]
 cmake_flags << "-DBUILD_CLI=OFF"
 cmake_flags << "-DBUILD_TESTS=OFF"
```

`Init_rugged` and the Ruby-API symbols remain globally visible. Ruby can therefore still `dlopen` the extension and find its entry point.

## Implementation plan

Apply the patch downstream, following the `gem-patch` precedent already in use for `grpc` (FIPS).

### omnibus-gitlab MR (https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/9367)

- Add `config/patches/rugged-hide-libgit2-symbols-1.9.0.patch`
- Add `config/software/ruby-rugged.rb`, modelled on the existing `config/software/ruby-grpc.rb`
- Add `dependency 'ruby-rugged'` to `config/software/gitlab-rails.rb`

### CNG MR (https://gitlab.com/gitlab-org/build/CNG/-/merge_requests/2916)

- Add `shared/build-scripts/patches/rugged-hide-libgit2-symbols-1.9.0.patch`
- Add `shared/build-scripts/patch-rugged-symbols`, modelled on `shared/build-scripts/reinstall-grpc-if-fips` but without the FIPS gate
- Add invocation lines to `gitlab-rails/Dockerfile.erb` and `gitlab-rails/Dockerfile.build.ubi.erb` after `bundle install`

### Upstream

File a corresponding issue/PR at [`libgit2/rugged`](https://github.com/libgit2/rugged) so we can drop the downstream patch once it lands.

For the upstream version, mkmf's `append_ldflags` (the call `ffi` uses) is likely the better fit. It appends a flag only after checking that the linker accepts it. An unconditional `$LDFLAGS <<` would break links on non-GNU linkers without `--exclude-libs`, such as Apple's `ld`.

## Verification

```bash
# Symbol check (primary regression test). The path shown is the Omnibus one.
nm -D --defined-only /opt/gitlab/embedded/lib/ruby/gems/3.3.0/gems/rugged-1.9.0/lib/rugged/rugged.so | grep -c llhttp
# Expected: 0 (was 119 before the patch).
# Note: grep exits with status 1 when the count is 0, so handle that
# explicitly if this runs under `set -e` in CI.
```

Then:

1. Re-run the reproducer above and expect `200`.
2. Disable `geo_blob_download_with_gitlab_http` on staging-ref to confirm the original `http` gem path is now stable.
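The `nm | grep -c` check above hard-codes one install path. Where that path differs, the Ruby sketch below can help. It asks RubyGems where rugged is installed and fails if any `rugged.so` it finds still exports `llhttp_*` symbols. It makes several assumptions:

- GNU `nm` is on `PATH`.
- The script runs with the same Ruby and gem environment GitLab uses (for example, under `bundle exec`).
- The script name and the `--check` switch are hypothetical.

```ruby
require "open3"
require "rubygems"

# Count symbols that start with +prefix+ and are global (uppercase nm type
# letter) in the output of `nm -D --defined-only`.
def exported_count(nm_output, prefix)
  nm_output.each_line.count do |line|
    _address, type, name = line.split
    !name.nil? && type.match?(/\A[A-Z]\z/) && name.start_with?(prefix)
  end
end

# All rugged.so files for the rugged gem visible to this Ruby's RubyGems.
# The built extension may exist both under the gem's own directory and under
# its extension_dir, so both locations are searched.
def installed_rugged_sos
  spec = Gem::Specification.find_by_name("rugged")
  [spec.full_gem_path, spec.extension_dir].uniq.flat_map do |dir|
    Dir.glob(File.join(dir, "**", "rugged.so"))
  end.uniq
end

# Map each path to the number of llhttp_* symbols it still exports.
def leaked_llhttp_counts(paths)
  paths.to_h do |path|
    out, status = Open3.capture2("nm", "-D", "--defined-only", path)
    raise "nm failed for #{path}" unless status.success?

    [path, exported_count(out, "llhttp_")]
  end
end

# Usage: ruby check_rugged_symbols.rb --check   (exits non-zero on a leak)
if __FILE__ == $PROGRAM_NAME && ARGV.include?("--check")
  counts = leaked_llhttp_counts(installed_rugged_sos)
  abort "no rugged.so found" if counts.empty?
  counts.each { |path, n| puts "#{n}\t#{path}" }
  abort "llhttp_* symbols are still exported" if counts.values.any?(&:positive?)
end
```

This check counts only global `llhttp_*` names, so it is slightly stricter than `grep -c llhttp`. It signals failure through `abort`, so a zero count exits 0 rather than 1 as `grep -c` does.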
## Related

- #595139: original FFI corruption bug (closed)
- #597390: root-cause analysis of the rugged/llhttp-ffi symbol collision (closed)
- #596934: "Geo: Evaluate and consolidate blob download HTTP backend" (this issue's fix unblocks consolidation)
- #598020: `Gitlab::HTTP` path 60s timeout bug
- #598514: `Gitlab::HTTP` path object-storage URL blocker bug
- !218195: "Remove Rugged from Gemfile" (open, transitively blocked by licensee)
- !230361: "Geo: Switch blob download to use GitLab::HTTP" (the workaround MR, merged)