Pages deployments in S3 may not be found by Geo due to trimming of filename
## Summary
In !192088 (comment 2668412966), @dbalexandre mentioned:
> FYI, this change has been causing some verification failures in Geo. The Pages deployment file was synced to the Geo secondary site, but the `resource_exists?` check was returning `false`, despite the file existing on the secondary S3. We did further investigation and discovered that the stored file path on S3 from the `/pages_deployments/` dir onwards is
>
> `/pages_deployments/22109/6dd76dd7cb5359f54edb9eb0becd33d19118d3e93e266ef7644cf7b9ad2b9ed1262dfinalb90bfa2df272bd6dee9b05ba3d0b74d1b33f0ffc7a0fc43f2bf4338bb5bf6d0d20250612-47656-ezv00n`
>
> but the method above was checking for
>
> `/pages_deployments/22109/d0b74d1b33f0ffc7a0fc43f2bf4338bb5bf6d0d20250612-47656-ezv00n`
>
> That's why it returns `false` and the verification fails.
@ktchernov @dbalexandre @ngala I was under the impression, based on https://gitlab.com/gitlab-com/request-for-help/-/issues/2862#note_2516107296, that the filename would be trimmed upon creation of a deployment.
I'm wondering:
- For new deployments, does the Geo check work fine because the stored filename is the same as the trimmed one?
- For old deployments, does the Geo check fail because of this disparity?
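The mismatch in the quoted paths is consistent with a trim to the last 60 characters: the short filename being checked is exactly the 60-character suffix of the long filename stored in S3. A standalone illustration using the two filenames from the quote above:

```ruby
# The two filenames from the quoted comment (directory prefix omitted)
stored  = "6dd76dd7cb5359f54edb9eb0becd33d19118d3e93e266ef7644cf7b9ad2b9ed1262dfinalb90bfa2df272bd6dee9b05ba3d0b74d1b33f0ffc7a0fc43f2bf4338bb5bf6d0d20250612-47656-ezv00n"
checked = "d0b74d1b33f0ffc7a0fc43f2bf4338bb5bf6d0d20250612-47656-ezv00n"

puts checked.length              # => 60
puts stored[-60..-1] == checked  # => true: the checked name is the stored name's last 60 characters
```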
## Workaround
The current workaround is to run the following script in a Rails console session on the Geo secondary site to output the AWS CLI move commands:
```ruby
def generate_filename_mapping
  mapping = []

  # Iterate over all failed Pages Deployment registries
  Geo::PagesDeploymentRegistry.failed.find_each do |registry|
    begin
      replicator = registry.replicator
      uploader = replicator.carrierwave_uploader
      next unless uploader&.file

      # The full expected path (what GitLab thinks the filename should be)
      expected_full_path = uploader.file.path

      # Just the filename part (after the last '/')
      expected_filename = File.basename(expected_full_path)

      # The trimmed filename (60 characters max).
      # This matches what was actually stored in S3.
      trimmed_filename =
        if expected_filename.length > 60
          expected_filename[-60..-1] # keep the last 60 characters
        else
          expected_filename
        end

      # Only add to the mapping if the filenames differ
      if expected_filename != trimmed_filename
        mapping << {
          registry_id: registry.id,
          pages_deployment_id: registry.pages_deployment_id,
          expected_filename: expected_filename,
          actual_filename: trimmed_filename,
          full_expected_path: expected_full_path,
          directory_path: File.dirname(expected_full_path)
        }
      end
    rescue => e
      puts "Error processing registry #{registry.id}: #{e.message}"
    end
  end

  mapping
end

# Display the mapping in a human-readable form
def display_filename_mapping
  mappings = generate_filename_mapping

  puts "Found #{mappings.length} files that need filename correction:"
  puts "=" * 80

  mappings.each_with_index do |mapping, index|
    puts "\n#{index + 1}. Registry ID: #{mapping[:registry_id]}, Pages Deployment ID: #{mapping[:pages_deployment_id]}"
    puts "   Directory:       #{mapping[:directory_path]}"
    puts "   Expected (long): #{mapping[:expected_filename]} (#{mapping[:expected_filename].length} chars)"
    puts "   Actual (short):  #{mapping[:actual_filename]} (#{mapping[:actual_filename].length} chars)"
  end

  mappings
end

# Generate AWS CLI move commands for renaming the files
def generate_s3_move_commands(bucket_name)
  generate_filename_mapping.map do |mapping|
    source_key = "#{mapping[:directory_path]}/#{mapping[:actual_filename]}"
    dest_key = "#{mapping[:directory_path]}/#{mapping[:expected_filename]}"

    # Remove a leading slash, if present
    source_key = source_key.sub(%r{\A/}, '')
    dest_key = dest_key.sub(%r{\A/}, '')

    "aws s3 mv s3://#{bucket_name}/#{source_key} s3://#{bucket_name}/#{dest_key}"
  end
end

# Print the S3 move commands line by line for easy pasting into a shell
def print_s3_move_commands(bucket_name)
  commands = generate_s3_move_commands(bucket_name)

  puts "\n# AWS S3 move commands (#{commands.length} files to rename)"
  puts "# Copy and paste these commands into your shell:"
  puts "#" + "=" * 79

  commands.each { |cmd| puts cmd }

  puts "\n# Total commands: #{commands.length}"
end

# Usage:
# print_s3_move_commands("<PAGES-DEPLOYMENT-BUCKET-NAME-HERE>")
```
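Before executing the generated commands against the bucket, it can be worth previewing them first: the AWS CLI's `aws s3 mv` accepts a `--dryrun` flag that prints what would happen without moving anything. A minimal, standalone sketch of rewriting a generated command into its dry-run variant (the bucket name and filenames below are illustrative, not taken from a real registry):

```ruby
# Insert --dryrun into a move command string of the shape produced by
# generate_s3_move_commands, so it can be previewed safely.
def dry_run(command)
  command.sub("aws s3 mv ", "aws s3 mv --dryrun ")
end

cmd = "aws s3 mv s3://pages-bucket/pages_deployments/22109/short s3://pages-bucket/pages_deployments/22109/long"
puts dry_run(cmd)
# => aws s3 mv --dryrun s3://pages-bucket/pages_deployments/22109/short s3://pages-bucket/pages_deployments/22109/long
```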