Draft: Refactor UserDetail to use Sanitizable concern
-
Please check this box if this contribution uses AI-generated content (including content generated by GitLab Duo features) as outlined in the GitLab DCO & CLA. As a benefit of being a GitLab Community Contributor, you receive complimentary access to GitLab Duo.
What does this MR do and why?
Migrates the UserDetail
model from the deprecated Sanitize.clean()
method to the Sanitizable
concern using the modern Sanitize.fragment()
API. This refactoring improves code consistency by aligning with patterns already used in other models like NamespaceSetting
and ApplicationSetting
.
Key changes:
- Migrated 8 user profile attributes (bluesky, discord, linkedin, mastodon, orcid, twitter, website_url, github) to use the
sanitizes!
declaration from the Sanitizable concern - Retained custom sanitization for
location
andorganization
fields to preserve special ampersand handling behavior (&
→&
) - Added security validations for pre-escaped HTML entities, double-encoded data, and path traversal attempts
- Updated from deprecated
Sanitize.clean()
to modernSanitize.fragment()
API
Behavioral changes:
-
Double-encoding prevention: The Sanitizable concern uses
CGI.unescapeHTML()
to prevent double-encoding, so ampersands now remain unencoded rather than being converted to&
entities. This is the more correct behavior. -
New security validations: The following malicious patterns are now rejected with validation errors:
Attack Type Example Input Validation Error Why Safe to Reject Pre-escaped HTML entities
(XSS bypass attempt)<script>alert(1)</script>
"cannot contain escaped HTML entities" Legitimate users don't manually encode HTML entities. This pattern indicates an attempt to bypass frontend sanitization by pre-encoding malicious scripts that might be decoded and executed later. Double-encoded data
(Encoding evasion)%2526lt%253Bscript%2526gt%253B
"cannot contain escaped components" Legitimate URLs and social media usernames don't contain double-encoded characters. This pattern indicates an attempt to evade sanitization through multiple encoding layers, hoping one layer gets decoded while bypassing filters. Path traversal
(Directory traversal attack)main../../../../../../api/v4/projects/1
"cannot contain a path traversal component" Legitimate social media handles, websites, and profile URLs never contain ../
sequences. This pattern indicates an attempt to access unauthorized file paths or resources through directory traversal.These values were previously sanitized silently but are now explicitly rejected, providing better security feedback and preventing malicious data from being stored in the database.
References
- Related to issue #577060
- Sanitizable concern:
app/models/concerns/sanitizable.rb
- Similar implementations:
NamespaceSetting
andApplicationSetting
models
Screenshots or screen recordings
N/A - Backend refactoring with no UI changes
How to set up and validate locally
-
Open Rails console:
bundle exec rails console
-
Test basic sanitization works:
user = User.first ud = user.user_detail ud.linkedin = '<script>alert("xss")</script>test' ud.save ud.linkedin # Should output: 'test' (script removed)
-
Test new security validations reject malicious input:
ud.twitter = '<script>alert(1)</script>' ud.valid? # Should be false ud.errors.full_messages # Should include: "Twitter cannot contain escaped HTML entities" ud.github = '../../../etc/passwd' ud.valid? # Should be false ud.errors.full_messages # Should include: "Github cannot contain a path traversal component"
-
Test special ampersand handling for location/organization:
ud.location = 'San Francisco & Bay Area' ud.save ud.location # Should output: 'San Francisco & Bay Area' (ampersand preserved, not encoded)
-
Run the test suite:
bundle exec rspec spec/models/user_detail_spec.rb
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
MR Checklist (@gerardo-navarro)
-
Changelog entry added, if necessary -
Documentation created/updated via this MR -
Documentation reviewed by technical writer or follow-up review issue created -
Tests added for this feature/bug -
Tested in all supported browsers -
Conforms to the code review guidelines -
Conforms to the style guides -
Conforms to the javascript style guides -
Conforms to the database guides -
Conforms to the merge request performance guidelines -
All RSpec tests pass (146 examples, 0 failures) -
RuboCop linting passes with no offenses -
Security: Added validations for path traversal, double-encoding, and pre-escaped HTML -
Code consistency: Follows established Sanitizable concern pattern used in other models -
No database migrations required (model-only changes)