Geo blob replication with :geo_blob_download_with_gitlab_http FF enabled fails for large objects that take >60 seconds to transfer
Summary
The fix for "Geo blob replication fails with HPE_USER llhttp..." (#595139 - closed) does not work correctly when syncing large blob files whose transfer time exceeds 60 seconds.
Several problems came to light while working with a customer who enabled the geo_blob_download_with_gitlab_http feature flag to overcome the original issue.
The customer has a 1.2GB artifact that needs to be synced to the secondary.
At first the sync fails with a `Gitlab::HTTP_V2::ReadTotalTimeout` error: "Request timed out after 30.00105542410165 seconds". This is despite the code setting `GITLAB_HTTP_TIMEOUT = 60`.
Duo advised that the 60-second timeout was not being respected due to a missing timeout option, and suggested the following modifications to `ee/lib/gitlab/geo/replication/blob_downloader.rb`:
You need to explicitly pass the `timeout` parameter when calling `Gitlab::HTTP.get()`:
```ruby
options = {
  headers: req_headers,
  follow_redirects: false,
  open_timeout: GITLAB_HTTP_TIMEOUT,
  read_timeout: GITLAB_HTTP_TIMEOUT,
  write_timeout: GITLAB_HTTP_TIMEOUT,
  timeout: GITLAB_HTTP_TIMEOUT, # Add this line
  allow_local_requests: true
}
```
Also update the `stream_from_url` method similarly:
```ruby
options = {
  headers: headers,
  stream_body: true,
  open_timeout: GITLAB_HTTP_TIMEOUT,
  read_timeout: GITLAB_HTTP_TIMEOUT,
  write_timeout: GITLAB_HTTP_TIMEOUT,
  timeout: GITLAB_HTTP_TIMEOUT, # Add this line
  allow_local_requests: true,
  allow_object_storage: true
}
```

After applying the suggested changes, the sync still failed, but the error indicated the 60s timeout was now being applied: `Gitlab::HTTP_V2::ReadTotalTimeout` with "Request timed out after 60.00203742410165 seconds".
We then increased the timeout value from 60 seconds to 300 seconds, but the timeout continued to occur: `Gitlab::HTTP_V2::ReadTotalTimeout` with "Request timed out after 300.00279543106444 seconds".

When we increased it to 86400 seconds (1 day), a different sync error was seen: `Non-success HTTP response status code 500`.
On the secondary we see the following:
```
{"severity":"DEBUG","time":"2026-04-23T21:33:35.350Z","correlation_id":"01KPY46DV1XWPYPFE7DBM0SNET","class":"Geo::JobArtifactRegistry","gitlab_host":"gitlab.example.com","message":"Sync state transition","registry_id":149555,"model_record_id":326079,"from":"failed","to":"started","result":true}
{"severity":"WARN","time":"2026-04-23T21:36:47.842Z","correlation_id":"01KPY46DV1XWPYPFE7DBM0SNET","class":"Geo::JobArtifactRegistry","gitlab_host":"gitlab.example.com","message":"Sync state transition","registry_id":149555,"model_record_id":326079,"from":"started","to":"failed","result":true}
{"severity":"WARN","time":"2026-04-23T21:36:47.846Z","correlation_id":"01KPY46DV1XWPYPFE7DBM0SNET","class":"Geo::BlobDownloadService","gitlab_host":"gitlab.example.com","message":"Blob download","replicable_name":"job_artifact","model_record_id":326079,"mark_as_synced":false,"download_success":false,"bytes_downloaded":0,"primary_missing_file":false,"download_time_s":192.498,"reason":"Internal Server Error","status_code":500,"url":"https://gitlab.example.com/api/v4/geo/retrieve/job_artifact/326079"}
```

Looking on the primary, we see two requests in the NGINX log:
```
1.2.3.4 - - [23/Apr/2026:14:36:44 -0700] "GET /api/v4/geo/retrieve/job_artifact/326079 HTTP/1.1" 200 1860298532 "" "Ruby" -
1.2.3.4 - - [23/Apr/2026:14:36:47 -0700] "GET /api/v4/geo/retrieve/job_artifact/326079 HTTP/1.1" 500 39 "" "Ruby" -
```

with the first request succeeding after ~3 minutes and the second request failing due to:
```
{
  "time": "2026-04-23T21:36:47.755Z",
  "severity": "INFO",
  "duration_s": 0.01401,
  "db_duration_s": 0.00005,
  "view_duration_s": 0.01396,
  "status": 500,
  "method": "GET",
  "path": "/api/v4/geo/retrieve/job_artifact/326079",
  "params": [],
  "host": "gitlab.example.com",
  "remote_ip": "1.2.3.4, 127.0.0.1",
  "ua": "Ruby",
  "route": "/api/:version/geo/retrieve/:replicable_name/:replicable_id",
  "exception.class": "Gitlab::Geo::InvalidSignatureTimeError",
  "exception.message": "Signature not within leeway of 60 seconds. Check your system clocks!",
  "exception.backtrace": [
    "ee/lib/gitlab/geo/signed_data.rb:50:in `rescue in decode_data'",
    "ee/lib/gitlab/geo/signed_data.rb:29:in `decode_data'",
    "ee/lib/gitlab/geo/jwt_request_decoder.rb:57:in `decode_geo_request'",
```

Duo says that this is because the new download code uses the same token for both requests, instead of creating a new token for the second request.
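The failure mode can be sketched as follows (hypothetical code for illustration; the real validation lives in `ee/lib/gitlab/geo/signed_data.rb` and may differ in detail): a token stamped with its issue time is rejected once the receiving side sees it more than 60 seconds after issue, so any token minted before a multi-minute first request is dead by the time the second request reuses it.

```ruby
# Hypothetical sketch: token_within_leeway? is invented for illustration
# and is not part of the GitLab codebase.
LEEWAY_SECONDS = 60

def token_within_leeway?(issued_at, now)
  (now - issued_at).abs <= LEEWAY_SECONDS
end

issued_at = 0
token_within_leeway?(issued_at, 45)  # first request, shortly after issue
# => true
token_within_leeway?(issued_at, 192) # second request, after a ~3 minute download
# => false
```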
To summarise:
- the `GITLAB_HTTP_TIMEOUT` is not being respected
- the current `GITLAB_HTTP_TIMEOUT` is not large enough to allow for big files that take multiple minutes to download
- the download process fails for large files that take multiple minutes to download, due to the token being reused for the second request and being too old
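The distinction between the two timeout kinds can be illustrated with a hypothetical sketch (the function and its inputs are invented, not GitLab code): a per-read timeout such as `read_timeout` only bounds each individual socket read, while a total timeout bounds the whole transfer, so a steady stream of fast chunks can still trip the total timeout.

```ruby
# Hypothetical sketch: transfer_outcome is invented for illustration.
# chunk_seconds is how long each chunk of the download takes to arrive.
def transfer_outcome(chunk_seconds, per_read_timeout:, total_timeout:)
  elapsed = 0

  chunk_seconds.each do |t|
    # A per-read timeout only fires if a single chunk is slow.
    return :per_read_timeout if t > per_read_timeout

    elapsed += t
    # A total timeout fires once cumulative transfer time is exceeded.
    return :total_timeout if elapsed > total_timeout
  end

  :success
end

# 20 chunks at 12s each: every read is well under 60s, but the whole
# transfer takes 240s, so only the total timeout trips.
transfer_outcome(Array.new(20, 12), per_read_timeout: 60, total_timeout: 60)
# => :total_timeout
```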
Steps to reproduce
I think the above details provide sufficient guidance but I'm happy to provide more reproduction details if required.
Example Project
What is the current bug behavior?
Large blobs fail to sync.
What is the expected correct behavior?
Blobs should sync successfully regardless of size/transfer time.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Implementation Plan / Possible fixes
Three issues arise when the `geo_blob_download_with_gitlab_http` FF is enabled for large file transfers:

- `ReadTotalTimeout` at 30 seconds — `Gitlab::HTTP` has a `DEFAULT_READ_TOTAL_TIMEOUT` of 30 seconds. We don't pass `timeout:` to override it, so large file streams are killed after 30 seconds regardless of our `read_timeout: 60` setting.
- Timeout still too low even when fixed — `GITLAB_HTTP_TIMEOUT` (60 seconds) is a per-operation timeout, not suitable as a total stream timeout. A 1.2GB file over a slow link can take minutes or hours.
- JWT token expiry on second request — For non-redirect (local storage) downloads, `download_file_with_gitlab_http` makes a first request (non-streaming, gets a 200), then calls `stream_from_url` with the same URL and auth headers. The second request arrives at the primary with the original JWT token, which has a 60-second leeway. If the first request took >60 seconds, the token is expired and the primary returns a 500 (`Gitlab::Geo::InvalidSignatureTimeError`).
Proposed fixes
Fix 1: Add `timeout:` using `blob_download_timeout`
In `stream_from_url`, set `timeout:` to the admin-configurable `GeoNode#blob_download_timeout` (default 28800 seconds / 8 hours):
```ruby
def stream_from_url(url, temp_file, headers: {})
  options = {
    headers: headers,
    stream_body: true,
    timeout: ::GeoNode.current_node&.blob_download_timeout || 3600,
    open_timeout: GITLAB_HTTP_TIMEOUT,
    read_timeout: GITLAB_HTTP_TIMEOUT,
    write_timeout: GITLAB_HTTP_TIMEOUT,
    allow_local_requests: true,
    allow_object_storage: true
  }

  # Stream to temporary file on disk
  Gitlab::HTTP.get(url, options) do |fragment|
    temp_file.write(fragment) if fragment.code == SUCCESS_STATUS_CODE
  end
end
```

Also add `timeout:` to the initial non-streaming request in `download_file_with_gitlab_http`:
```ruby
options = {
  headers: req_headers,
  follow_redirects: false,
  timeout: GITLAB_HTTP_TIMEOUT,
  open_timeout: GITLAB_HTTP_TIMEOUT,
  read_timeout: GITLAB_HTTP_TIMEOUT,
  write_timeout: GITLAB_HTTP_TIMEOUT,
  allow_local_requests: true
}
```

Fix 2: Eliminate the double request for the non-redirect path
The current flow for local storage is: non-streaming GET (receives the full 200 response) → `stream_from_url` (second GET to the same URL, streams the file). This is wasteful and causes the JWT expiry issue.
Change to: single streaming GET for the non-redirect path. Detect success/redirect/error from the first fragment's status code:
```ruby
def download_file_with_gitlab_http(url, req_headers, temp_file)
  file_size = 0

  options = {
    headers: req_headers,
    follow_redirects: false,
    stream_body: true,
    timeout: ::GeoNode.current_node&.blob_download_timeout || 3600,
    open_timeout: GITLAB_HTTP_TIMEOUT,
    read_timeout: GITLAB_HTTP_TIMEOUT,
    write_timeout: GITLAB_HTTP_TIMEOUT,
    allow_local_requests: true
  }

  redirect_location = nil
  error_body = +""

  response = Gitlab::HTTP.get(url, options) do |fragment|
    if fragment.code.between?(300, 399)
      redirect_location = fragment.http_response['location']
    elsif fragment.code == SUCCESS_STATUS_CODE
      temp_file.write(fragment)
    else
      error_body << fragment
    end
  end

  if redirect_location
    response = stream_from_url(redirect_location, temp_file)
  end

  unless response.success?
    return non_success_response_result_gitlab_http(response, error_body, url)
  end

  # ... checksum, carrierwave, etc.
end
```

This eliminates the second request entirely for local storage, which fixes the JWT token expiry and avoids downloading the file twice.
Note: `non_success_response_result_gitlab_http` and `primary_missing_file_gitlab_http?` would need to accept `error_body` as a parameter, since `response.body` is empty after streaming.
Files to change
- `ee/lib/gitlab/geo/replication/blob_downloader.rb`
- `ee/spec/lib/gitlab/geo/replication/blob_downloader_spec.rb`
Testing
- Add a test for the `timeout:` option being passed
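A sketch of the kind of assertion such a spec could make (hand-rolled here for illustration; the real spec would stub `Gitlab::HTTP` with RSpec doubles): capture the options hash handed to the HTTP client and check that `timeout:` is present.

```ruby
# Hypothetical sketch: the lambda stands in for a stubbed Gitlab::HTTP.get;
# the options hash mirrors what the patched downloader would build.
captured_options = nil
fake_http_get = ->(url, options) { captured_options = options }

options = { headers: {}, stream_body: true, timeout: 28_800 }
fake_http_get.call('https://primary.example.com/api/v4/geo/retrieve/job_artifact/1', options)

captured_options.key?(:timeout) # => true
captured_options[:timeout]      # => 28800
```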
Patch release information for backports
If the bug fix needs to be backported in a patch release to a version under the maintenance policy, please follow the steps on the patch release runbook for GitLab engineers.
Refer to the internal "Release Information" dashboard for information about the next patch release, including the targeted versions, expected release date, and current status.
High-severity bug remediation
To remediate high-severity issues requiring an internal release for single-tenant SaaS instances, refer to the internal release process for engineers.