Clean up HTML-encoded entities in UserDetail fields before validate_sanitizable_user_details feature flag rollout
Problem Statement / Context
As part of !208943 (Refactor UserDetail to use the Sanitizable concern), we are introducing stricter validation for user profile fields. See discussion thread for context.
The legacy sanitize_attrs method used Sanitize.clean(), which encoded ampersands (&) as HTML entities (&amp;), so existing rows still store the encoded form. Once the validate_sanitizable_user_details feature flag is enabled, these rows will fail validation because the Sanitizable concern rejects pre-escaped HTML entities.
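To make the failure mode concrete, here is a minimal sketch using only Ruby's standard library. CGI.escapeHTML stands in for Sanitize.clean() (an assumption, so the example runs without the sanitize gem), and pre_escaped? is a hypothetical approximation of the check, not the actual Sanitizable implementation.

```ruby
require 'cgi'

# Hypothetical pattern for named, decimal, and hexadecimal HTML entities.
ENTITY_PATTERN = /&(?:[a-z][a-z0-9]*|#\d+|#x\h+);/i

# Stand-in for the legacy behaviour: CGI.escapeHTML is used in place of
# Sanitize.clean so the example needs no gems.
def legacy_encode(value)
  CGI.escapeHTML(value)
end

# Hypothetical approximation of "rejects pre-escaped HTML entities".
def pre_escaped?(value)
  value.match?(ENTITY_PATTERN)
end

original = 'https://example.com/?a=1&b=2'
stored = legacy_encode(original)

puts stored                  # => https://example.com/?a=1&amp;b=2
puts pre_escaped?(original)  # => false (a bare "&" is not an entity)
puts pre_escaped?(stored)    # => true  (the stored row would be rejected)
```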
Proposal
Create a batched background migration (BBM) to clean up HTML-encoded entities in UserDetail fields before the feature flag rollout.
Affected Data
| Category | Affected Rows | Fields |
|---|---|---|
| Common HTML entities (&amp;, &lt;, &gt;, &quot;, &#39;) | ~1,250 | linkedin (734), twitter (225), website_url (286), github (5) |
| Numeric HTML entities (&#38;, &#x26;, etc.) | 1 | website_url |
| Path traversal patterns (../) | 12 | Various |
| Double-encoded URLs (%25) | 15 | Various |
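As a rough illustration of how stored values map onto the categories in the table, the sketch below classifies a string with one regular expression per category. The patterns and the categories_for helper are hypothetical and were not used to produce the counts above.

```ruby
# Hypothetical detection patterns, one per category in the table.
# The entity spellings are assumptions for illustration.
CATEGORY_PATTERNS = {
  common_entity:  /&(?:amp|lt|gt|quot|#39);/i,
  numeric_entity: /&#(?:\d+|x\h+);/i,
  path_traversal: %r{\.\./},
  double_encoded: /%25/
}.freeze

# Returns the name of every category that a stored value falls into.
def categories_for(value)
  CATEGORY_PATTERNS.select { |_name, pattern| value.match?(pattern) }.keys
end

samples = [
  'https://example.com/?a=1&amp;b=2',
  'https://example.com/&#x26;',
  'https://example.com/../etc',
  'https://example.com/a%2520b',
  'https://example.com/clean'
]

samples.each { |value| puts "#{value} => #{categories_for(value).inspect}" }
```

A value can match more than one category, which is why the helper returns a list rather than a single label.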
Proposed Solution
The BBM should process: linkedin, twitter, website_url, github
For HTML entity cleanup (safe to fix):
- &amp; → &, &quot; → ", &#39; → '
- &#38;, &#x26; → & (numeric entities)
For potentially malicious content (strip entirely):
- &lt;, &gt; and their numeric variants → remove
Not addressed (let validation reject):
- Path traversal patterns (../) - 12 users
- Double-encoded URLs (%25) - 15 users
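The rules above can be sketched as a single-pass substitution in plain Ruby. This is a minimal sketch under stated assumptions, not the migration itself: clean_user_detail, REPLACEMENTS, and the exact entity spellings are hypothetical, and the real BBM would apply equivalent logic to each of the four columns.

```ruby
# Hypothetical replacement table (entity spellings are assumptions).
# Safe entities are decoded; encoded angle brackets map to an empty
# string, so they are stripped rather than decoded.
REPLACEMENTS = {
  '&amp;'  => '&', '&#38;' => '&', '&#x26;' => '&',
  '&quot;' => '"',
  '&#39;'  => "'",
  '&lt;'   => '', '&#60;' => '', '&#x3c;' => '',
  '&gt;'   => '', '&#62;' => '', '&#x3e;' => ''
}.freeze

# Case-insensitive so that e.g. "&AMP;" and "&#X26;" are also matched.
REPLACEMENT_PATTERN =
  Regexp.new(Regexp.union(REPLACEMENTS.keys).source, Regexp::IGNORECASE)

# Single pass: text produced by a replacement is not scanned again, so a
# doubly encoded value such as "&amp;amp;" is only decoded one level.
def clean_user_detail(value)
  return value if value.nil?

  value.gsub(REPLACEMENT_PATTERN) { |entity| REPLACEMENTS.fetch(entity.downcase) }
end

puts clean_user_detail('https://example.com/?a=1&amp;b=2')
puts clean_user_detail('&lt;script&gt;alert(1)&lt;/script&gt;')
puts clean_user_detail('https://example.com/../a%2520b')
```

Path traversal patterns and %25 sequences pass through untouched, matching the "not addressed" list: those rows are left for validation to reject.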
Related Resources
- Parent MR: !208943
- Discussion: !208943 (comment 2928642361)
- Feature flag rollout issue: #581152
Implementation Plan
- Merge !208943 (feature flag disabled by default)
- Create and merge this BBM
- Wait for BBM to complete on production
- Roll out feature flag incrementally
Checklist
- Create batched background migration to unescape HTML entities
- Add migration specs
- Add post-deployment migration to enqueue the BBM
- Coordinate with !208943 for feature flag rollout timing