
Pages deployments in S3 may not be found by Geo due to trimming of filename

Summary

In !192088 (comment 2668412966), @dbalexandre mentioned:

FYI, this change has been causing some verification failures in Geo. The pages deployment file was synced to the Geo secondary site, but the resource_exists? check was returning false, despite the file existing on the secondary S3.

We investigated further and discovered that the file path stored on S3 (from the /pages_deployments/ directory onwards) is /pages_deployments/22109/6dd76dd7cb5359f54edb9eb0becd33d19118d3e93e266ef7644cf7b9ad2b9ed1262dfinalb90bfa2df272bd6dee9b05ba3d0b74d1b33f0ffc7a0fc43f2bf4338bb5bf6d0d20250612-47656-ezv00n, but the method above was checking for /pages_deployments/22109/d0b74d1b33f0ffc7a0fc43f2bf4338bb5bf6d0d20250612-47656-ezv00n. That is why it returns false and the verification fails.
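The two paths differ only in that the checked filename is the last 60 characters of the stored one, which is exactly what a 60-character trim would produce. A minimal sketch using the filenames quoted above:

```ruby
# The filename Geo found stored on the secondary S3 (from the report above).
stored = "6dd76dd7cb5359f54edb9eb0becd33d19118d3e93e266ef7644cf7b9ad2b9ed1262dfinalb90bfa2df272bd6dee9b05ba3d0b74d1b33f0ffc7a0fc43f2bf4338bb5bf6d0d20250612-47656-ezv00n"

# The filename resource_exists? was checking for.
checked = "d0b74d1b33f0ffc7a0fc43f2bf4338bb5bf6d0d20250612-47656-ezv00n"

puts checked.length            # => 60
puts stored[-60..] == checked  # => true: the checked name is the trimmed tail
```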

@ktchernov @dbalexandre @ngala I was under the impression in https://gitlab.com/gitlab-com/request-for-help/-/issues/2862#note_2516107296 that the filename would be trimmed upon creation of a deployment.

I'm wondering:

  1. For new deployments, does the Geo check work fine because the stored filename is the same as the trimmed one?
  2. For old deployments, does the Geo check fail because of this disparity?
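Regarding question 1: if the trimming rule is "keep at most the last 60 characters" (an assumption based on the paths above; the exact rule lives in the linked change), it is idempotent, so a new deployment whose stored filename is already trimmed would match what the check computes. A rough sketch of that presumed rule:

```ruby
# Presumed trimming rule (assumption): keep at most the last 60 characters.
trim = ->(name) { name.length > 60 ? name[-60..] : name }

long_name = "x" * 100
puts trim.call(long_name).length  # => 60

# Trimming is idempotent, so a filename stored already-trimmed
# equals what the check recomputes:
puts trim.call(trim.call(long_name)) == trim.call(long_name)  # => true
```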

Workaround

The current workaround is to run the following script in a Rails console session on the Geo secondary site to generate the AWS CLI move commands:

def generate_filename_mapping
  mapping = []
  
  # Get all failed Pages Deployment registries
  Geo::PagesDeploymentRegistry.failed.find_each do |registry|
    begin
      replicator = registry.replicator
      uploader = replicator.carrierwave_uploader
      next unless uploader&.file
      
      # Get the full expected path (what GitLab thinks the filename should be)
      expected_full_path = uploader.file.path
      
      # Extract just the filename part (after the last '/')
      expected_filename = expected_full_path.split('/').last
      
      # Generate the trimmed filename (60 characters max)
      # This matches what was actually stored in S3
      trimmed_filename = if expected_filename.length > 60
        expected_filename[-60..-1] # Take last 60 characters
      else
        expected_filename
      end
      
      # Only add to mapping if the filenames are different
      if expected_filename != trimmed_filename
        mapping << {
          registry_id: registry.id,
          pages_deployment_id: registry.pages_deployment_id,
          expected_filename: expected_filename,
          actual_filename: trimmed_filename,
          full_expected_path: expected_full_path,
          directory_path: File.dirname(expected_full_path) # safer than gsub, which would also strip earlier occurrences of the filename in the path
        }
      end
    rescue => e
      puts "Error processing registry #{registry.id}: #{e.message}"
    end
  end
  
  mapping
end

# Usage example:
def display_filename_mapping
  mappings = generate_filename_mapping
  
  puts "Found #{mappings.length} files that need filename correction:"
  puts "=" * 80
  
  mappings.each_with_index do |mapping, index|
    puts "\n#{index + 1}. Registry ID: #{mapping[:registry_id]}, Pages Deployment ID: #{mapping[:pages_deployment_id]}"
    puts "   Directory: #{mapping[:directory_path]}"
    puts "   Expected (long): #{mapping[:expected_filename]} (#{mapping[:expected_filename].length} chars)"
    puts "   Actual (short):  #{mapping[:actual_filename]} (#{mapping[:actual_filename].length} chars)"
  end
  
  mappings
end

# Generate AWS CLI MOVE commands for renaming files
def generate_s3_move_commands(bucket_name)
  mappings = generate_filename_mapping
  commands = []
  
  mappings.each do |mapping|
    source_key = "#{mapping[:directory_path]}/#{mapping[:actual_filename]}"
    dest_key = "#{mapping[:directory_path]}/#{mapping[:expected_filename]}"
    
    # Remove leading slash if present
    source_key = source_key.sub(/^\//, '')
    dest_key = dest_key.sub(/^\//, '')
    
    commands << "aws s3 mv s3://#{bucket_name}/#{source_key} s3://#{bucket_name}/#{dest_key}"
  end
  
  commands
end

# Print S3 move commands line by line for easy bash pasting
def print_s3_move_commands(bucket_name)
  commands = generate_s3_move_commands(bucket_name)
  
  puts "\n# AWS S3 Move Commands (#{commands.length} files to rename)"
  puts "# Copy and paste these commands into your bash shell:"
  puts "#" + "=" * 79
  
  commands.each do |cmd|
    puts cmd
  end
  
  puts "\n# Total commands: #{commands.length}"
end

# usage:
# print_s3_move_commands("<PAGES-DEPLOYMENT-BUCKET-NAME-HERE>")
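Rather than pasting commands straight into a shell, the generated commands can be written to a script file and reviewed first. A small hypothetical helper for this (write_commands_to_script is not part of the original workaround):

```ruby
# Hypothetical helper: write the generated move commands to a shell script
# so they can be reviewed (e.g. spot-checked against `aws s3 ls` output)
# before being executed.
def write_commands_to_script(commands, path = "s3_move_commands.sh")
  File.open(path, "w") do |f|
    f.puts "#!/bin/bash"
    f.puts "set -euo pipefail"          # stop at the first failed move
    commands.each { |cmd| f.puts cmd }
  end
  File.chmod(0o755, path)              # make the script executable
  path
end

# usage:
# write_commands_to_script(generate_s3_move_commands("<PAGES-DEPLOYMENT-BUCKET-NAME-HERE>"))
```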
Edited by Douglas Barbosa Alexandre