Switch to UrlValidator for Identifier URL validation

Brian Williams requested to merge bwill/fix-identifier-url-validation into master

What does this MR do and why?

In !127352 (merged), we started using AddressableUrlValidator to validate Identifier URLs.

AddressableUrlValidator calls getaddrinfo when validating the URL. This is a problem for the Vulnerabilities::Identifier model because we create identifiers in bulk, and each call makes a DNS query over the network for non-local URLs. This means that a bulk insert might have to wait for hundreds of DNS lookups to finish!

To fix this, switch to using UrlValidator instead, which validates URLs using regex only. It is safe to validate these URLs with regex because we do not make server-side requests to them. They are displayed in the vulnerability details as clickable links.
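
The difference between the two approaches can be sketched in plain Ruby. This is only an illustration, not the actual UrlValidator or AddressableUrlValidator code: the method names `regex_valid_url?` and `resolves?` are hypothetical, and the pattern is a deliberately simplified assumption.

```ruby
require 'uri'
require 'socket'

# Simplified, hypothetical pattern: http(s) scheme, a non-empty host part,
# and no whitespace anywhere. The real validator's rules may differ.
HTTP_URL_PATTERN = %r{\Ahttps?://[^\s/?#]+[^\s]*\z}i

# Regex-only check: pure string matching, no network access.
def regex_valid_url?(url)
  HTTP_URL_PATTERN.match?(url.to_s)
end

# Resolving check: Addrinfo.getaddrinfo may issue a DNS query for
# non-local hosts, so each call can block on the network.
def resolves?(url)
  host = URI.parse(url).host
  return false if host.nil?

  Addrinfo.getaddrinfo(host, nil).any?
rescue SocketError, URI::InvalidURIError
  false
end

urls = Array.new(3) { |i| "https://security#{i}.example.com" }
puts urls.all? { |url| regex_valid_url?(url) }
```

Only `regex_valid_url?` is exercised here; calling `resolves?` on each of several hundred URLs is the kind of per-record lookup this MR avoids.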

Relates to:

How to set up and validate locally

  1. Start a Rails console: bundle exec rails c

  2. Run this code:

    puts "start: #{Time.zone.now}"
    300.times { |i| Vulnerabilities::Identifier.new(url: "https://security#{i}.example.com").valid? }
    puts "end: #{Time.zone.now}"

Before: ~40 seconds (timing varies based on network latency)

start: 2023-11-02 19:07:41 UTC
end: 2023-11-02 19:08:20 UTC

After: < 1 second

start: 2023-11-02 19:10:10 UTC
end: 2023-11-02 19:10:10 UTC
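
The timestamps above have one-second resolution, so the "after" run shows identical start and end times. A sub-second measurement could be taken with Ruby's standard `Benchmark` module, as in the sketch below. The helper `time_validations` is a hypothetical name, and the stand-in check exists only so the sketch runs outside a Rails console.

```ruby
require 'benchmark'

# Hypothetical helper: yields `count` generated URLs to the given block
# and returns the elapsed wall-clock time in seconds as a Float.
def time_validations(count)
  Benchmark.realtime do
    count.times { |i| yield "https://security#{i}.example.com" }
  end
end

# In a Rails console, the block could call the model from the steps above:
#   time_validations(300) { |url| Vulnerabilities::Identifier.new(url: url).valid? }
# Here a trivial stand-in check keeps the sketch runnable outside Rails.
elapsed = time_validations(300) { |url| url.start_with?('https://') }
puts format('elapsed: %.3f s', elapsed)
```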

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Brian Williams
