Skip to content

Draft: Refactor UserDetail to use Sanitizable concern

What does this MR do and why?

Migrates the UserDetail model from the deprecated Sanitize.clean() method to the Sanitizable concern using the modern Sanitize.fragment() API. This refactoring improves code consistency by aligning with patterns already used in other models like NamespaceSetting and ApplicationSetting.

Key changes:

  • Migrated 8 user profile attributes (bluesky, discord, linkedin, mastodon, orcid, twitter, website_url, github) to use the sanitizes! declaration from the Sanitizable concern
  • Retained custom sanitization for location and organization fields to preserve special ampersand handling behavior (&&)
  • Added security validations for pre-escaped HTML entities, double-encoded data, and path traversal attempts
  • Updated from deprecated Sanitize.clean() to modern Sanitize.fragment() API

Behavioral changes:

  1. Double-encoding prevention: The Sanitizable concern uses CGI.unescapeHTML() to prevent double-encoding, so ampersands now remain unencoded rather than being converted to & entities. This is the more correct behavior.

  2. New security validations: The following malicious patterns are now rejected with validation errors:

    Attack Type Example Input Validation Error Why Safe to Reject
    Pre-escaped HTML entities
    (XSS bypass attempt)
    <script>alert(1)</script> "cannot contain escaped HTML entities" Legitimate users don't manually encode HTML entities. This pattern indicates an attempt to bypass frontend sanitization by pre-encoding malicious scripts that might be decoded and executed later.
    Double-encoded data
    (Encoding evasion)
    %2526lt%253Bscript%2526gt%253B "cannot contain escaped components" Legitimate URLs and social media usernames don't contain double-encoded characters. This pattern indicates an attempt to evade sanitization through multiple encoding layers, hoping one layer gets decoded while bypassing filters.
    Path traversal
    (Directory traversal attack)
    main../../../../../../api/v4/projects/1 "cannot contain a path traversal component" Legitimate social media handles, websites, and profile URLs never contain ../ sequences. This pattern indicates an attempt to access unauthorized file paths or resources through directory traversal.

    These values were previously sanitized silently but are now explicitly rejected, providing better security feedback and preventing malicious data from being stored in the database.

🛠️ with ❤️ at Siemens

References

  • Related to issue #577060
  • Sanitizable concern: app/models/concerns/sanitizable.rb
  • Similar implementations: NamespaceSetting and ApplicationSetting models

Screenshots or screen recordings

N/A - Backend refactoring with no UI changes

How to set up and validate locally

  1. Open Rails console: bundle exec rails console

  2. Test basic sanitization works:

    user = User.first
    ud = user.user_detail
    ud.linkedin = '<script>alert("xss")</script>test'
    ud.save
    ud.linkedin # Should output: 'test' (script removed)
  3. Test new security validations reject malicious input:

    ud.twitter = '&lt;script&gt;alert(1)&lt;/script&gt;'
    ud.valid? # Should be false
    ud.errors.full_messages # Should include: "Twitter cannot contain escaped HTML entities"
    
    ud.github = '../../../etc/passwd'
    ud.valid? # Should be false
    ud.errors.full_messages # Should include: "Github cannot contain a path traversal component"
  4. Test special ampersand handling for location/organization:

    ud.location = 'San Francisco & Bay Area'
    ud.save
    ud.location # Should output: 'San Francisco & Bay Area' (ampersand preserved, not encoded)
  5. Run the test suite:

    bundle exec rspec spec/models/user_detail_spec.rb

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

MR Checklist (@gerardo-navarro)
Edited by 🤖 GitLab Bot 🤖

Merge request reports

Loading