Refactor UserDetail to use Sanitizable concern (!208943) · Merge requests · GitLab.org / GitLab

Please check this box if this contribution uses AI-generated content (including content generated by GitLab Duo features) as outlined in the GitLab DCO & CLA. As a benefit of being a GitLab Community Contributor, you receive complimentary access to GitLab Duo.

What does this MR do and why?

Migrates the UserDetail model from the deprecated Sanitize.clean() method to the Sanitizable concern using the modern Sanitize.fragment() API. This refactoring improves code consistency by aligning with patterns already used in other models like NamespaceSetting and ApplicationSetting.

Key changes:

Migrated 8 user profile attributes (bluesky, discord, linkedin, mastodon, orcid, twitter, website_url, github) to use the sanitizes! declaration from the Sanitizable concern
Retained custom sanitization for location and organization fields to preserve special ampersand handling behavior (& → &)
Added security validations for pre-escaped HTML entities, double-encoded data, and path traversal attempts
Updated from deprecated Sanitize.clean() to modern Sanitize.fragment() API

Behavioral changes:

Double-encoding prevention: The Sanitizable concern uses CGI.unescapeHTML() to prevent double-encoding, so ampersands now remain unencoded rather than being converted to & entities. This is the more correct behavior.

New security validations: The following malicious patterns are now rejected with validation errors:

Attack Type	Example Input	Validation Error	Why Safe to Reject
Pre-escaped HTML entities (XSS bypass attempt)	`<script>alert(1)</script>`	"cannot contain escaped HTML entities"	Legitimate users don't manually encode HTML entities. This pattern indicates an attempt to bypass frontend sanitization by pre-encoding malicious scripts that might be decoded and executed later.
Double-encoded data (Encoding evasion)	`%2526lt%253Bscript%2526gt%253B`	"cannot contain escaped components"	Legitimate URLs and social media usernames don't contain double-encoded characters. This pattern indicates an attempt to evade sanitization through multiple encoding layers, hoping one layer gets decoded while bypassing filters.
Path traversal (Directory traversal attack)	`main../../../../../../api/v4/projects/1`	"cannot contain a path traversal component"	Legitimate social media handles, websites, and profile URLs never contain `../` sequences. This pattern indicates an attempt to access unauthorized file paths or resources through directory traversal.

These values were previously sanitized silently but are now explicitly rejected, providing better security feedback and preventing malicious data from being stored in the database.

Breaking API Change:

This MR introduces validation errors for the PUT /users/:id endpoint when user_details fields contain malicious patterns. API consumers that previously sent pre-escaped HTML, double-encoded data, or path traversal patterns will now receive 400 Bad Request responses with validation error messages.

Migration Impact Analysis: !208943 (comment 2928642361)

Feature Flag:

This change is behind the :validate_sanitizable_user_details feature flag.

Environment	Default State
Development	Enabled (feature flags default to enabled in dev/test)
Production	Disabled (requires explicit rollout)

References

Related to issue #577060 (closed)
Rollout issue: #581152
Sanitizable concern: app/models/concerns/sanitizable.rb
Similar implementations: NamespaceSetting and ApplicationSetting models

Screenshots or screen recordings

N/A - Backend refactoring with no UI changes

How to set up and validate locally

Open Rails console: bundle exec rails console

Enable the feature flag:**

# In Rails console
Feature.enable(:validate_sanitizable_user_details)

Test basic sanitization works:

user = User.first
ud = user.user_detail
ud.linkedin = '<script>alert("xss")</script>test'
ud.save
ud.linkedin # Should output: 'test' (script removed)

Test new security validations reject malicious input:

# Pre-escaped HTML entities
ud.twitter = '&lt;script&gt;alert(1)&lt;/script&gt;'
ud.valid? # Should be false
ud.errors.full_messages # Should include: "Twitter cannot contain escaped HTML entities"

# Path traversal
ud.github = '../../../etc/passwd'
ud.valid? # Should be false
ud.errors.full_messages # Should include: "Github cannot contain a path traversal component"

Test double-encoded URL validation:

# Double-encoded URL (malicious pattern)
ud.website_url = '%2526lt%253Bscript%2526gt%253B'
ud.valid? # Should be false
ud.errors.full_messages # Should include: "Website url cannot contain escaped components"

# Another double-encoded example
ud.linkedin = '%252F%252Fevil.com'
ud.valid? # Should be false
ud.errors.full_messages # Should include: "Linkedin cannot contain escaped components"

# Valid URL with normal encoding (should pass)
ud.website_url = 'https://example.com/search?q=hello%20world'
ud.valid? # Should be true

Test special ampersand handling for location/organization:

ud.location = 'San Francisco & Bay Area'
ud.save
ud.location # Should output: 'San Francisco & Bay Area' (ampersand preserved, not encoded)

Test API endpoint with user_details:

# Valid update (should succeed)
curl -X PUT "http://localhost:3000/api/v4/users/:id" \
  -H "PRIVATE-TOKEN: <admin_token>" \
  -d "linkedin=https://linkedin.com/in/username"

# Invalid update - pre-escaped HTML (should fail with 400)
curl -X PUT "http://localhost:3000/api/v4/users/:id" \
  -H "PRIVATE-TOKEN: <admin_token>" \
  -d "twitter=&lt;script&gt;alert(1)&lt;/script&gt;"

# Invalid update - double-encoded (should fail with 400)
curl -X PUT "http://localhost:3000/api/v4/users/:id" \
  -H "PRIVATE-TOKEN: <admin_token>" \
  -d "website_url=%2526lt%253Bscript%2526gt%253B"

Run the test suite:

bundle exec rspec spec/models/user_detail_spec.rb
bundle exec rspec spec/requests/api/users_spec.rb -e "PUT /users/:id"

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

MR Checklist (@gerardo-navarro)

Edited Dec 03, 2025 by Gerardo Navarro

Refactor UserDetail to use Sanitizable concern

What does this MR do and why?

References

Screenshots or screen recordings

How to set up and validate locally

MR acceptance checklist

Merge request reports