Geo BlobDownloader fails with status 34 on 18.10 — rugged 1.9.0 / llhttp-ffi symbol collision
## Summary
After upgrading a Geo deployment from 18.9.x to 18.10.3, **every** Geo blob replication attempt on secondary sites fails with:
> Replication failure: Non-success HTTP response status code 34
This affects **every** replicable type (`job_artifact`, `dependency_proxy_manifest`, `dependency_proxy_blob`, `upload`, `lfs_object`, `package_file`, ...). The number 34 is not a valid HTTP status; it matches the llhttp error code `HPE_INVALID_STATUS`.
After a multi-layer bisection we isolated the root cause: **`rugged.so` globally exports `libgit2`'s statically-embedded `llhttp_*` symbols (119 of them), which collide with `llhttp-ffi`'s `libllhttp-ext.so` at runtime.** Details and evidence below.
Multiple independent Geo deployments are affected identically.
## Minimal reproducer (no GitLab, no Rails, no Sidekiq, no TLS)
```ruby
# reproducer.rb
require 'rugged' # <-- loading rugged first is the only trigger
require 'http'
puts HTTP.get('http://127.0.0.1:8080/').status.code
# => 34 (against any well-formed HTTP response bytes; expected 200)
```
With a trivial Python raw-socket server (`server.py`) replying with a spec-compliant `HTTP/1.1 200 OK\r\n...\r\n\r\n<body>` response, the above script prints `34` (bug). Without the `require 'rugged'`, the same script against the same server prints `200` correctly.
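The snippet's `server.py` is not reproduced in this issue. As a stand-in, the following Ruby sketch shows the same idea under stated assumptions: a raw-socket server that writes a fixed, well-formed response, checked here with `Net::HTTP` (the stdlib parser, which does not go through llhttp). The helper names `canned_response` and `serve_once` are hypothetical and exist only for this illustration.

```ruby
require 'socket'
require 'net/http'

# Build a minimal, well-formed HTTP/1.1 response for the given body.
# (Hypothetical helper, not part of the linked snippet.)
def canned_response(body)
  "HTTP/1.1 200 OK\r\n" \
  "Content-Type: text/plain\r\n" \
  "Content-Length: #{body.bytesize}\r\n" \
  "Connection: close\r\n" \
  "\r\n" \
  "#{body}"
end

# Accept exactly one connection, discard the request head, reply, close.
def serve_once(server, body)
  client = server.accept
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n" # blank line ends the request head
  end
  client.write(canned_response(body))
ensure
  client&.close
end

server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
port = server.addr[1]
thread = Thread.new { serve_once(server, "hello\n") }

# Third argument nil disables any proxy taken from the environment.
response = Net::HTTP.new('127.0.0.1', port, nil).get('/')
thread.join
server.close

puts response.code # => 200
```

Pointing the reproducer's `HTTP.get` at a server like this one is what isolates the client-side parser as the only moving part.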
**Full reproducer (client.rb, server.py, Gemfile, README.md) available as a GitLab snippet:** [gitlab.com/-/snippets/5982447](https://gitlab.com/-/snippets/5982447)
```bash
git clone https://gitlab.com/snippets/5982447.git
cd 5982447
bundle install
python3 server.py &
bundle exec ruby client.rb
```
Gem versions (as shipped by GitLab 18.10.3 Omnibus):
- `rugged` 1.9.0 (vendors libgit2 statically)
- `http` 5.3.1
- `llhttp-ffi` 0.5.1 (bundles llhttp C source 8.1.0)
- Ruby 3.3.10
## Evidence: symbol table collision
`nm -D --defined-only` on the two shared libraries on a failing Omnibus secondary:
```
$ nm -D --defined-only /opt/gitlab/.../rugged-1.9.0/rugged/rugged.so | grep -c llhttp
119 # all globally exported 'T'
$ nm -D --defined-only /opt/gitlab/.../llhttp-ffi-0.5.1/.../libllhttp-ext.so | grep -c llhttp
~40 # same symbol names, different offsets
```
`ldd rugged.so` shows libssl/libcrypto dynamically linked from `/opt/gitlab/embedded/lib` but lists no libgit2, so libgit2 must be statically embedded. The 1,799 `git_*` symbols that `rugged.so` also exports point the same way.
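The counts above show that both libraries define `llhttp_*` symbols, but not which names overlap. A minimal sketch of that comparison follows; it assumes `nm` from binutils is on `PATH`, and the script and its helper names (`parse_nm`, `exported_symbols`) are hypothetical. The two library paths are passed as arguments because the full paths are elided above.

```ruby
require 'open3'
require 'set'

# Extract globally exported text ('T') symbols matching the pattern from
# the output of `nm -D --defined-only`. Each line looks like:
#   0000000000012340 T llhttp_init
def parse_nm(output, pattern = /\Allhttp_/)
  output.each_line.filter_map do |line|
    _addr, type, name = line.split
    name if type == 'T' && name&.match?(pattern)
  end.to_set
end

# Run nm on a shared library and return its matching exported symbols.
def exported_symbols(path)
  out, status = Open3.capture2('nm', '-D', '--defined-only', path)
  raise "nm failed for #{path}" unless status.success?

  parse_nm(out)
end

# Usage: ruby symbol_overlap.rb /path/to/rugged.so /path/to/libllhttp-ext.so
if ARGV.size == 2
  first, second = ARGV.map { |path| exported_symbols(path) }
  shared = first & second
  puts "#{shared.size} llhttp_* symbols are defined in both libraries"
  shared.sort.first(10).each { |name| puts "  #{name}" }
end
```

A non-empty intersection means the dynamic linker has two candidate definitions for the same name, which is the precondition for the interposition described in this issue.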
## In-stack reproduction (on a Geo 18.10.3 secondary)
```bash
sudo gitlab-rails runner '
rep = Geo::DependencyProxyManifestReplicator.new(model_record_id: <ID>)
dl = Gitlab::Geo::Replication::BlobDownloader.new(replicator: rep)
puts dl.execute.inspect
'
```
Produces:
```
#<Gitlab::Geo::Replication::BlobDownloader::Result
@success=false, @bytes_downloaded=0, @primary_missing_file=false,
@reason="Non-success HTTP response status code 34",
@extra_details={:status_code=>34, :reason=>nil, :url=>"..."}>
```
The `34` originates in `ee/lib/gitlab/geo/replication/blob_downloader.rb:281`, which stores `response.status.code` verbatim from http.rb.
## Raw wire-level status line (captured via raw `OpenSSL::SSLSocket` on the failing host)
```
Hex:   48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d 0a
ASCII: H  T  T  P  /  1  .  1  SP 2  0  0  SP O  K  \r \n
```
Textbook `HTTP/1.1 200 OK\r\n`. The server is innocent.
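As a quick cross-check of the capture, the hex bytes can be decoded and matched against the status-line shape (`HTTP-version SP status-code SP reason-phrase CRLF`). This is an illustrative sketch only; the regular expression is a simplification rather than a full grammar.

```ruby
# Bytes copied from the capture above.
hex = '48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d 0a'

# Turn each hex pair into its character and join them into one string.
status_line = hex.split.map { |byte| byte.hex.chr }.join
puts status_line.inspect # => "HTTP/1.1 200 OK\r\n"

# Simplified status-line pattern: version, three-digit code, reason phrase.
match = status_line.match(%r{\AHTTP/(\d\.\d) (\d{3}) ([^\r\n]*)\r\n\z})
version, code, reason = match.captures
puts "version=#{version} code=#{code} reason=#{reason}"
# => version=1.1 code=200 reason=OK
```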
## Verified fix
**Rebuilding rugged with `-Wl,--exclude-libs,ALL` resolves the bug completely.**
Verified locally against the reproducer linked above, using the exact gem pins from Omnibus 18.10.3 (`rugged 1.9.0`, `http 5.3.1`, `llhttp-ffi 0.5.1`) on Ruby 3.4.8 — llhttp-ffi is a C binary whose symbol-interposition behaviour does not depend on the Ruby version:
```bash
bundle config build.rugged "--with-ldflags=-Wl,--exclude-libs,ALL"
gem uninstall rugged
bundle install # rebuilds rugged with the linker flag
bundle exec ruby client.rb # now prints status.code: 200 instead of 34
```
Effect on `rugged.so`:
| metric | default build | with `--exclude-libs,ALL` |
| --------------------------------- | -------------- | ------------------------- |
| size | 4,814,880 B | 4,686,848 B |
| globally exported `llhttp_*` (`T`)| 119 | **0** |
| globally exported `git_*` (`T`) | 1,799 | **0** |
| reproducer `HTTP.get(...).status.code` | **34** | **200** |
This simultaneously confirms the root-cause theory and the fix direction: once libgit2's statically-embedded symbols are no longer exposed to the process-global symbol namespace, the dynamic-linker collision with `libllhttp-ext.so` cannot occur, and http.rb parses correctly on every response.
## Possible fix directions
A few places could address this, depending on where maintainers prefer to draw the boundary:
- **rugged `extconf.rb`**: pass `-Wl,--exclude-libs,ALL` to the linker when building `rugged.so` so libgit2's statically-embedded third-party symbols stay out of the process-global namespace. We verified this resolves the collision locally (table above).
- **libgit2 upstream**: build bundled third-party dependencies (llhttp, historically also http-parser, zlib, ...) with `-fvisibility=hidden` by default. Would protect every downstream consumer, not only rugged.
- **llhttp-ffi**: open `libllhttp-ext.so` with `RTLD_LOCAL` / `RTLD_DEEPBIND` and resolve symbols only within that handle. Would make llhttp-ffi robust against any gem statically embedding llhttp.
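For the first direction, a minimal sketch of what an `extconf.rb` change could look like is shown below. It is not taken from rugged's build scripts: the constant and helper names are hypothetical, and the helper only manipulates a flag string so that the flag is added once. The commented lines show how it might be wired into `mkmf`, guarded by `try_ldflags` so that linkers without `--exclude-libs` support are left alone.

```ruby
# Hypothetical helper for an extconf.rb; not part of rugged's build scripts.
EXCLUDE_LIBS = '-Wl,--exclude-libs,ALL'

# Return the linker flags with EXCLUDE_LIBS appended exactly once.
# Note: this splits on whitespace, so it assumes no quoted flags with spaces.
def add_exclude_libs(ldflags)
  flags = ldflags.to_s.split
  flags << EXCLUDE_LIBS unless flags.include?(EXCLUDE_LIBS)
  flags.join(' ')
end

# In an extconf.rb this would run before create_makefile, for example:
#   require 'mkmf'
#   $LDFLAGS = add_exclude_libs($LDFLAGS) if try_ldflags(EXCLUDE_LIBS)

puts add_exclude_libs('-L/opt/gitlab/embedded/lib')
# => -L/opt/gitlab/embedded/lib -Wl,--exclude-libs,ALL
```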
For operators stuck on 18.10.3 before a fix ships, interim options are (a) swapping `BlobDownloader#download_file` to `Net::HTTP` via a Rails initializer (Net::HTTP uses the Ruby stdlib parser and is unaffected), or (b) rebuilding the rugged gem with the linker flag above (requires package/Omnibus-level build customization).
## Workaround currently running on our deployments
We are running option (a) above: an internal Rails initializer that prepends `Gitlab::Geo::Replication::BlobDownloader#download_file` with a `Net::HTTP`-based implementation preserving the upstream one-hop manual-redirect-follow semantics. Replication resumed immediately on both affected sites. We consider this operations-grade only — it sidesteps rather than fixes the collision and is wiped on every package upgrade.
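The initializer itself is deployment-specific and is not reproduced here. The following is only a rough sketch of the core idea, a `Net::HTTP` download that follows at most one redirect by hand. The method names, the signature, and the choice to drop request headers on the second hop are assumptions made for illustration, not GitLab's or our actual code, and the body is buffered in memory rather than streamed.

```ruby
require 'net/http'
require 'uri'

# Issue a single GET with Net::HTTP (stdlib parser, no llhttp involved).
# Hypothetical helper for illustration only.
def http_get(uri, headers)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri, headers))
  end
end

# Download url into io, following at most one redirect manually.
# Returns the final HTTP status code as an Integer.
def download_following_one_redirect(url, io, headers = {})
  uri = URI(url)
  response = http_get(uri, headers)

  if response.is_a?(Net::HTTPRedirection) && response['location']
    # Resolve a relative Location against the original URL.
    uri = uri.merge(response['location'])
    # Assumption: request headers are not forwarded to the redirect target.
    response = http_get(uri, {})
  end

  io.write(response.body) if response.is_a?(Net::HTTPSuccess)
  response.code.to_i
end
```

A caller would pass an open file or tempfile as `io` and treat any return value other than 200 as a failed download. A second redirect is deliberately not followed, so a chain of redirects surfaces as a 3xx result instead of looping.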
## Why this surfaced in 18.10 (with evidence)
`Gemfile.lock` diff between the two relevant GitLab tags:
| GitLab tag | rugged | bundled libgit2 |
| -------------- | ------- | ------------------------------------------------------- |
| `v18.9.0-ee` | 1.6.3 | libgit2 1.6.x — **no bundled llhttp** |
| `v18.9.5-ee` | 1.6.3 | libgit2 1.6.x — **no bundled llhttp** |
| `v18.10.3-ee` | 1.9.0 | libgit2 ~1.9.0 (submodule pin `338e6fb6`) — **bundles llhttp statically** |
Sources:
- [`Gemfile.lock` @ v18.9.0-ee](https://gitlab.com/gitlab-org/gitlab/-/raw/v18.9.0-ee/Gemfile.lock)
- [`Gemfile.lock` @ v18.10.3-ee](https://gitlab.com/gitlab-org/gitlab/-/raw/v18.10.3-ee/Gemfile.lock)
libgit2 began bundling llhttp as a vendored builtin in [libgit2 PR #6713](https://github.com/libgit2/libgit2/pull/6713), merged 2024-04-23, landing in libgit2 1.8.0/1.8.1:
> *"Include llhttp as a bundled dependency with the aim to use it as our default http parser, removing the now-unmaintained Node.js http-parser."*
Every libgit2 ≥ 1.8.1 ships llhttp statically embedded. rugged 1.9.0 vendors libgit2 ~1.9.0 and therefore exports llhttp's symbols from `rugged.so`. rugged 1.6.3 (libgit2 1.6.x) did not. That is the exact version boundary that matches "started failing with 18.10.3".