Geo blob replication fails with HPE_USER llhttp callback error on Ubuntu 24.04 with kernel 6.17
## Summary
All Geo blob replication fails on GET-provisioned Ubuntu 24.04 instances with both stock GitLab 18.10.1 and MR branch builds. Every blob replicator (PackageFile, Upload, ProjectComponentFile, etc.) fails with the same error:
```
Error downloading file: error reading from socket:
Error Parsing data: HPE_USER Span callback error in on_header_field
```
The secondary's `BlobDownloadService` uses the `http` gem (which uses `llhttp-ffi`) to download files from the primary's internal Geo retrieve API. The HTTP response parser's FFI callbacks are corrupted, causing every download to fail.
## Root Cause Analysis
The `llhttp-ffi` gem's `LLHttp::Parser` class defines callback wrapper methods via `class_eval` during gem loading (Rails boot):
```ruby
CALLBACKS_WITH_DATA.each do |callback|
  class_eval(<<~RB)
    private def #{callback}(buffer, length)
      @delegate.#{callback}(buffer.get_bytes(0, length))
    end
  RB
end
```
In `initialize`, these methods are then converted to FFI function pointers via `method(:on_header_field).to_proc`. In this environment, the `Method#to_proc` conversion produces **corrupted FFI function pointers** for methods defined during the Rails boot process.
**Key evidence:**
- `LLHttp::Delegate` subclasses defined in a `gitlab-rails runner` script work correctly
- The exact same class definition, when loaded during Rails boot (via initializers, `Bundler.require`, or `load`), produces broken callbacks
- Using `proc { @delegate.send(callback, ...) }` instead of `method(:x).to_proc` works when called from post-boot code, but not when the proc is created during boot
- ALL blob replicators are affected (PackageFile, Upload, etc.) — not specific to any one replicator
- Stock GitLab 18.10.1 has the same issue as MR branch builds on this environment
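
The two callback styles compared in the evidence above can be sketched in plain Ruby, without FFI. This is a minimal illustration, not code from `llhttp-ffi`: `RecordingDelegate` and `CallbackHolder` are hypothetical names. It only shows that both styles behave the same at the Ruby level, which suggests the failure arises where the callable is turned into a native function pointer.

```ruby
# Hypothetical stand-in for a parser delegate; not part of llhttp-ffi.
class RecordingDelegate
  attr_reader :fields

  def initialize
    @fields = []
  end

  def on_header_field(value)
    @fields << value
  end
end

# Hypothetical wrapper that builds a callback in the two styles discussed above.
class CallbackHolder
  def initialize(delegate)
    @delegate = delegate
  end

  # Style 1: a bound private method converted with Method#to_proc (a lambda).
  def via_method
    method(:on_header_field).to_proc
  end

  # Style 2: a block-based proc that forwards to the delegate.
  def via_proc
    proc { |value| @delegate.send(:on_header_field, value) }
  end

  private

  def on_header_field(value)
    @delegate.on_header_field(value)
  end
end

delegate = RecordingDelegate.new
holder = CallbackHolder.new(delegate)
method_cb = holder.via_method
proc_cb = holder.via_proc

method_cb.call("server")
proc_cb.call("content-type")

p delegate.fields     # => ["server", "content-type"]
p method_cb.lambda?   # => true
p proc_cb.lambda?     # => false
```

In pure Ruby the only visible difference is that `Method#to_proc` returns a lambda, so the sketch cannot reproduce the corruption by itself. It is useful mainly as a baseline when comparing the two styles inside a booted Rails process.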
## Environment
| Component | Version |
|-----------|---------|
| OS | Ubuntu 24.04.4 LTS (Noble Numbat) |
| Kernel | **6.17.0-1009-aws** |
| AMI | ami-0ec10929233384c7f |
| Instance | c5.2xlarge (Intel Xeon Platinum 8275CL) |
| Omnibus libffi | **3.2.1** (bundled as `libffi.so.6`) |
| System libffi | 3.4.6 (`libffi.so.8`) |
| ffi gem | 1.17.3 |
| llhttp-ffi | 0.4.0 |
| http gem | 5.1.1 |
| Ruby | 3.3.10 |
| GitLab | 18.10.1-ee (also reproduced on 18.10.0+rfbranch MR builds) |
| Provisioning | GitLab Environment Toolkit (GET) |
## Likely Cause
Incompatibility between the omnibus-bundled **libffi 3.2.1** and **kernel 6.17**'s memory layout or security features. libffi 3.2.1 is from 2014 and its closure/trampoline mechanism (which allocates executable memory for FFI callbacks) may not function correctly with modern kernel memory protections.
The system libffi 3.4.6 cannot be used as a drop-in replacement because the ABI changed between the two library versions (`libffi.so.6` versus `libffi.so.8`).
Note: Standard FFI callbacks (e.g., `qsort` with a Ruby comparator proc) work correctly in ALL contexts. The issue is specific to `Method#to_proc` conversions being passed as FFI callbacks, and only manifests after the full Rails boot process completes.
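
To confirm which libffi a process has actually mapped, one option is to inspect `/proc/<pid>/maps`. The helper below is a rough sketch: `libffi_paths` is a hypothetical name, and it only detects a dynamically loaded libffi. A copy compiled statically into the `ffi` gem's extension would not appear.

```ruby
# Return the distinct libffi shared-object paths found in
# /proc/<pid>/maps-style text. Hypothetical helper for diagnosis only.
def libffi_paths(maps_text)
  maps_text.each_line
           .filter_map { |line| line.split(" ", 6)[5]&.strip } # 6th column is the pathname
           .select { |path| File.basename(path).start_with?("libffi") }
           .uniq
end

# Inspect the current process when running on Linux.
if File.readable?("/proc/self/maps")
  puts libffi_paths(File.read("/proc/self/maps"))
end
```

Running this from `gitlab-rails runner` after the `ffi` gem has loaded should, under the assumption above, list the Omnibus `libffi.so.6` rather than the system `libffi.so.8`.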
## Reproduction
On a GET-provisioned Ubuntu 24.04 instance with kernel 6.17:
```ruby
# This FAILS (from gitlab-rails runner or console):
require "http"
HTTP::Response::Parser.new << "HTTP/1.1 200 OK\r\nserver: nginx\r\n\r\n"
# => IOError: Error Parsing data: HPE_USER Span callback error in on_header_field

# This WORKS (same gitlab-rails runner session):
class FreshHandler < LLHttp::Delegate
  def on_header_field(f); end
  def on_header_value(v); end
  def on_headers_complete; end
  def on_body(b); end
  def on_message_complete; end
end

LLHttp::Parser.new(FreshHandler.new, type: :response) << "HTTP/1.1 200 OK\r\nserver: nginx\r\n\r\n"
# => OK
```
## Impact
- **All Geo blob replication is broken** on affected environments
- Geo status shows 0 synced, all failed for every blob replicator type
- Secondary sites fall back to proxying requests to primary (functional but defeats the purpose of Geo replication)
- Affects anyone using GET with Ubuntu 24.04 AMIs that ship kernel 6.17
## Possible Fixes
As described above, this issue occurs when `llhttp-ffi` `0.4.0` runs on the newest Ubuntu 24.04 kernel (6.17).
The current GitLab Gemfile uses `http` `5.1.1`, which [locks](https://gitlab.com/gitlab-org/gitlab/-/blob/master/Gemfile.lock?ref_type=heads#L1051-1055) `llhttp-ffi` to `0.4.x`.
A [newer version](https://github.com/httprb/http/blob/5-x-stable/http.gemspec) of the gem is available, which requires `llhttp-ffi` `~> 0.5.0`.
`kubeclient` and `gitlab_quality-test_tooling` constrain which version of `http` can be used. Both allow up to < 6.0, so they permit 5.3.1 (the latest stable 5.x).
**The proper fix for this bug is therefore to upgrade `http` to 5.3.1.**
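
The version reasoning above can be checked with RubyGems' own requirement classes. This is a small sketch that models only the constraints quoted in this issue (`< 6.0` for `http`, `~> 0.5.0` for `llhttp-ffi`). Any lower bounds that `kubeclient` or `gitlab_quality-test_tooling` declare are not modelled here and should be checked against their gemspecs.

```ruby
require "rubygems"

# Upper bound on `http` stated for kubeclient and gitlab_quality-test_tooling.
http_ceiling = Gem::Requirement.new("< 6.0")

# llhttp-ffi requirement stated for the newer http 5.x release.
llhttp_requirement = Gem::Requirement.new("~> 0.5.0")

http_ok = http_ceiling.satisfied_by?(Gem::Version.new("5.3.1"))
old_llhttp_ok = llhttp_requirement.satisfied_by?(Gem::Version.new("0.4.0"))
new_llhttp_ok = llhttp_requirement.satisfied_by?(Gem::Version.new("0.5.0"))

puts "http 5.3.1 within < 6.0: #{http_ok}"                    # true
puts "llhttp-ffi 0.4.0 satisfies ~> 0.5.0: #{old_llhttp_ok}"  # false
puts "llhttp-ffi 0.5.0 satisfies ~> 0.5.0: #{new_llhttp_ok}"  # true
```

The second check shows why the upgrade forces `llhttp-ffi` off the affected `0.4.x` series: `~> 0.5.0` admits only `>= 0.5.0` and `< 0.6`.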
## Context
Discovered while testing Geo replication for `Packages::Debian::ProjectComponentFile` (https://gitlab.com/gitlab-org/gitlab/-/work_items/593813, MR !228959). The ProjectComponentFile replicator code is correct — this environment issue blocks verification of ALL blob replicators.