Refactor UserDetail to use Sanitizable concern

What does this MR do and why?

Migrates the UserDetail model from the deprecated Sanitize.clean() method to the Sanitizable concern using the modern Sanitize.fragment() API. This refactoring improves code consistency by aligning with patterns already used in other models like NamespaceSetting and ApplicationSetting.

Key changes:

  • Migrated 8 user profile attributes (bluesky, discord, linkedin, mastodon, orcid, twitter, website_url, github) to use the sanitizes! declaration from the Sanitizable concern
  • Retained custom sanitization for location and organization fields to preserve special ampersand handling behavior (&&)
  • Added security validations for pre-escaped HTML entities, double-encoded data, and path traversal attempts
  • Updated from deprecated Sanitize.clean() to modern Sanitize.fragment() API

Behavioral changes:

  1. Double-encoding prevention: The Sanitizable concern uses CGI.unescapeHTML() to prevent double-encoding, so ampersands now remain unencoded rather than being converted to & entities. This is the more correct behavior.

  2. New security validations: The following malicious patterns are now rejected with validation errors:

    Attack Type Example Input Validation Error Why Safe to Reject
    Pre-escaped HTML entities
    (XSS bypass attempt)
    <script>alert(1)</script> "cannot contain escaped HTML entities" Legitimate users don't manually encode HTML entities. This pattern indicates an attempt to bypass frontend sanitization by pre-encoding malicious scripts that might be decoded and executed later.
    Double-encoded data
    (Encoding evasion)
    %2526lt%253Bscript%2526gt%253B "cannot contain escaped components" Legitimate URLs and social media usernames don't contain double-encoded characters. This pattern indicates an attempt to evade sanitization through multiple encoding layers, hoping one layer gets decoded while bypassing filters.
    Path traversal
    (Directory traversal attack)
    main../../../../../../api/v4/projects/1 "cannot contain a path traversal component" Legitimate social media handles, websites, and profile URLs never contain ../ sequences. This pattern indicates an attempt to access unauthorized file paths or resources through directory traversal.

    These values were previously sanitized silently but are now explicitly rejected, providing better security feedback and preventing malicious data from being stored in the database.

Breaking API Change:

This MR introduces validation errors for the PUT /users/:id endpoint when user_details fields contain malicious patterns. API consumers that previously sent pre-escaped HTML, double-encoded data, or path traversal patterns will now receive 400 Bad Request responses with validation error messages.

Migration Impact Analysis: !208943 (comment 2928642361)

Feature Flag:

This change is behind the :validate_sanitizable_user_details feature flag.

Environment Default State
Development Enabled (feature flags default to enabled in dev/test)
Production Disabled (requires explicit rollout)

References

  • Related to issue #577060 (closed)
  • Rollout issue: #581152
  • Sanitizable concern: app/models/concerns/sanitizable.rb
  • Similar implementations: NamespaceSetting and ApplicationSetting models

Screenshots or screen recordings

N/A - Backend refactoring with no UI changes

How to set up and validate locally

  1. Open Rails console: bundle exec rails console

  2. Enable the feature flag:**

    # In Rails console
    Feature.enable(:validate_sanitizable_user_details)
  3. Test basic sanitization works:

    user = User.first
    ud = user.user_detail
    ud.linkedin = '<script>alert("xss")</script>test'
    ud.save
    ud.linkedin # Should output: 'test' (script removed)
  4. Test new security validations reject malicious input:

    # Pre-escaped HTML entities
    ud.twitter = '&lt;script&gt;alert(1)&lt;/script&gt;'
    ud.valid? # Should be false
    ud.errors.full_messages # Should include: "Twitter cannot contain escaped HTML entities"
    
    # Path traversal
    ud.github = '../../../etc/passwd'
    ud.valid? # Should be false
    ud.errors.full_messages # Should include: "Github cannot contain a path traversal component"
  5. Test double-encoded URL validation:

    # Double-encoded URL (malicious pattern)
    ud.website_url = '%2526lt%253Bscript%2526gt%253B'
    ud.valid? # Should be false
    ud.errors.full_messages # Should include: "Website url cannot contain escaped components"
    
    # Another double-encoded example
    ud.linkedin = '%252F%252Fevil.com'
    ud.valid? # Should be false
    ud.errors.full_messages # Should include: "Linkedin cannot contain escaped components"
    
    # Valid URL with normal encoding (should pass)
    ud.website_url = 'https://example.com/search?q=hello%20world'
    ud.valid? # Should be true
  6. Test special ampersand handling for location/organization:

    ud.location = 'San Francisco & Bay Area'
    ud.save
    ud.location # Should output: 'San Francisco & Bay Area' (ampersand preserved, not encoded)
  7. Test API endpoint with user_details:

    # Valid update (should succeed)
    curl -X PUT "http://localhost:3000/api/v4/users/:id" \
      -H "PRIVATE-TOKEN: <admin_token>" \
      -d "linkedin=https://linkedin.com/in/username"
    
    # Invalid update - pre-escaped HTML (should fail with 400)
    curl -X PUT "http://localhost:3000/api/v4/users/:id" \
      -H "PRIVATE-TOKEN: <admin_token>" \
      -d "twitter=&lt;script&gt;alert(1)&lt;/script&gt;"
    
    # Invalid update - double-encoded (should fail with 400)
    curl -X PUT "http://localhost:3000/api/v4/users/:id" \
      -H "PRIVATE-TOKEN: <admin_token>" \
      -d "website_url=%2526lt%253Bscript%2526gt%253B"
  8. Run the test suite:

    bundle exec rspec spec/models/user_detail_spec.rb
    bundle exec rspec spec/requests/api/users_spec.rb -e "PUT /users/:id"

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

MR Checklist (@gerardo-navarro)
Edited by Gerardo Navarro

Merge request reports

Loading