MRs against UTF-16 encoded files fail to display diff correctly
Summary
When importing a project export file, MRs containing null byte characters, such as \u0000p\u0000a\u0000n\u0000y\u0000N\u0000a\u0000m\u0000e, are not imported at all.
When creating a merge request on GitLab.com against a file that has UTF-16 encoding, the resulting diff does not display correctly.
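To illustrate where the null bytes in the summary come from, here is a minimal Ruby sketch. It assumes the file is little-endian UTF-16 (the byte order is not stated in this report, but the example diff is consistent with it) and uses the string CompanyName purely as sample input.

```ruby
# UTF-16LE stores each ASCII character as its byte followed by 0x00.
text = "CompanyName"
utf16_bytes = text.encode("UTF-16LE").b # raw bytes, tagged as binary

# Reinterpreting those bytes as UTF-8 (as a consumer unaware of the
# encoding would) exposes every 0x00 byte as a NUL character.
as_utf8 = utf16_bytes.dup.force_encoding("UTF-8")

puts utf16_bytes.bytesize          # two bytes per character
puts as_utf8.include?("\u0000")    # NUL characters are present
puts as_utf8.inspect
```

Any component that later rejects NUL characters in text would reject this string, even though the original file is valid UTF-16.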
Steps to reproduce
- Create a new project using Visual Studio, which includes a default .rc file encoded as UTF-16.
- Add the project to a new Git repo, including a .gitattributes file containing the line *.rc diff
- Create an MR by changing the contents of the .rc file
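For anyone without Visual Studio at hand, the first two steps can be approximated with a short Ruby sketch. The helper name write_utf16_fixture, the file name ResourceFile.rc, and the sample content are assumptions made for illustration, and the sketch assumes a little-endian file with a byte order mark; committing the files and opening the MR still has to be done by hand.

```ruby
require "tmpdir"

# Hypothetical helper: writes a UTF-16LE .rc file (with BOM) and a
# .gitattributes file into the given directory, returning the .rc path.
def write_utf16_fixture(dir)
  rc_path = File.join(dir, "ResourceFile.rc")
  content = "FILEVERSION 2,2,0,48\r\n"

  # U+FEFF encoded as UTF-16LE yields the bytes FF FE (the BOM).
  bom = "\uFEFF".encode("UTF-16LE")
  File.binwrite(rc_path, bom + content.encode("UTF-16LE"))

  # Same attribute line as in the reproduction steps.
  File.write(File.join(dir, ".gitattributes"), "*.rc diff\n")
  rc_path
end

Dir.mktmpdir do |dir|
  path = write_utf16_fixture(dir)
  first_bytes = File.binread(path).bytes.first(6)
  puts first_bytes.map { |b| format("%02x", b) }.join(" ")
end
```

Changing the version numbers in the generated file and committing again should give a change similar to the example diff in this report.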
Null byte characters are present in UTF-16 encoded files, such as Microsoft Visual Studio resource files, which are quite common in VS projects for Windows.
Example diff:
{
"relative_order": 10,
"new_file": false,
"renamed_file": false,
"deleted_file": false,
"too_large": false,
"a_mode": "100644",
"b_mode": "100644",
"new_path": "Example/Project/branch/ResourceFile.rc",
"old_path": "Example/Project/branch/ResourceFile.rc",
"binary": true,
"utf8_diff": "@@ -58,8 +58,8 @@\n \u0000/\u0000/\u0000\r\u0000\n \u0000\r\u0000\n \u0000V\u0000S\u0000_\u0000V\u0000E\u0000R\u0000S\u0000I\u0000O\u0000N\u0000_\u0000I\u0000N\u0000F\u0000O\u0000 \u0000V\u0000E\u0000R\u0000S\u0000I\u0000O\u0000N\u0000I\u0000N\u0000F\u0000O\u0000\r\u0000\n-\u0000 \u0000F\u0000I\u0000L\u0000E\u0000V\u0000E\u0000R\u0000S\u0000I\u0000O\u0000N\u0000 \u00002\u0000,\u00002\u0000,\u00000\u0000,\u00004\u00008\u0000\r\u0000\n-\u0000 \u0000P\u0000R\u0000O\u0000D\u0000U\u0000C\u0000T\u0000V\u0000E\u0000R\u0000S\u0000I\u0000O\u0000N\u0000 \u00002\u0000,\u00002\u0000,\u00000\u0000,\u00004\u00008\u0000\r\u0000\n+\u0000 \u0000F\u0000I\u0000L\u0000E\u0000V\u0000E\u0000R\u0000S\u0000I\u0000O\u0000N\u0000 \u00003\u0000,\u00001\u0000,\u00000\u0000,\u00004\u00009\u0000\r\u0000\n+\u0000 \u0000P\u0000R\u0000O\u0000D\u0000U\u0000C\u0000T\u0000V\u0000E\u0000R\u0000S\u0000I\u0000O\u0000N\u0000 \u00003\u0000,\u00001\u0000,\u00000\u0000,\u00004\u00009\u0000\r\u0000\n \u0000 \u0000F\u0000I\u0000L\u0000E\u0000F\u0000L\u0000A\u0000G\u0000S\u0000M\u0000A\u0000S\u0000K\u0000 \u00000\u0000x\u00003\u0000f\u0000L\u0000\r\u0000\n \u0000#\u0000i\u0000f\u0000d\u0000e\u0000f\u0000 \u0000_\u0000D\u0000E\u0000B\u0000U\u0000G\u0000\r\u0000\n \u0000 \u0000F\u0000I\u0000L\u0000E\u0000F\u0000L\u0000A\u0000G\u0000S\u0000 \u00000\u0000x\u00001\u0000L\u0000\r\u0000\n@@ -76,12 +76,12 @@\n \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000B\u0000E\u0000G\u0000I\u0000N\u0000\r\u0000\n \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000C\u0000o\u0000m\u0000p\u0000a\u0000n\u0000y\u0000N\u0000a\u0000m\u0000e\u0000\"\u0000,\u0000 \u0000\"\u0000R\u0000o\u0000h\u0000d\u0000e\u0000 \u0000&\u0000 \u0000S\u0000c\u0000h\u0000w\u0000a\u0000r\u0000z\u0000 \u0000C\u0000y\u0000b\u0000e\u0000r\u0000s\u0000e\u0000c\u0000u\u0000r\u0000i\u0000t\u0000y\u0000 
\u0000G\u0000m\u0000b\u0000H\u0000\"\u0000\r\u0000\n \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000F\u0000i\u0000l\u0000e\u0000D\u0000e\u0000s\u0000c\u0000r\u0000i\u0000p\u0000t\u0000i\u0000o\u0000n\u0000\"\u0000,\u0000 \u0000\"\u0000T\u0000r\u0000u\u0000s\u0000t\u0000e\u0000d\u0000W\u0000o\u0000r\u0000k\u0000s\u0000t\u0000a\u0000t\u0000i\u0000o\u0000n\u0000 \u0000A\u0000g\u0000e\u0000n\u0000t\u0000\"\u0000\r\u0000\n-\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000F\u0000i\u0000l\u0000e\u0000V\u0000e\u0000r\u0000s\u0000i\u0000o\u0000n\u0000\"\u0000,\u0000 \u0000\"\u00002\u0000.\u00002\u0000.\u00000\u0000.\u00004\u00008\u0000\"\u0000\r\u0000\n+\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000F\u0000i\u0000l\u0000e\u0000V\u0000e\u0000r\u0000s\u0000i\u0000o\u0000n\u0000\"\u0000,\u0000 \u0000\"\u00003\u0000.\u00001\u0000.\u00000\u0000.\u00004\u00009\u0000\"\u0000\r\u0000\n \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000I\u0000n\u0000t\u0000e\u0000r\u0000n\u0000a\u0000l\u0000N\u0000a\u0000m\u0000e\u0000\"\u0000,\u0000 \u0000\"\u0000T\u0000r\u0000u\u0000s\u0000t\u0000e\u0000d\u0000W\u0000o\u0000r\u0000k\u0000s\u0000t\u0000a\u0000t\u0000i\u0000o\u0000n\u0000 \u0000A\u0000g\u0000e\u0000n\u0000t\u0000\"\u0000\r\u0000\n \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000L\u0000e\u0000g\u0000a\u0000l\u0000C\u0000o\u0000p\u0000y\u0000r\u0000i\u0000g\u0000h\u0000t\u0000\"\u0000,\u0000 \u0000\"\u0000(\u0000C\u0000)\u0000 \u00002\u00000\u00002\u00000\u0000 \u0000R\u0000o\u0000h\u0000d\u0000e\u0000 \u0000&\u0000 
\u0000S\u0000c\u0000h\u0000w\u0000a\u0000r\u0000z\u0000 \u0000C\u0000y\u0000b\u0000e\u0000r\u0000s\u0000e\u0000c\u0000u\u0000r\u0000i\u0000t\u0000y\u0000 \u0000G\u0000m\u0000b\u0000H\u0000\"\u0000\r\u0000\n \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000O\u0000r\u0000i\u0000g\u0000i\u0000n\u0000a\u0000l\u0000F\u0000i\u0000l\u0000e\u0000n\u0000a\u0000m\u0000e\u0000\"\u0000,\u0000 \u0000\"\u0000T\u0000r\u0000u\u0000s\u0000t\u0000e\u0000d\u0000W\u0000o\u0000r\u0000k\u0000s\u0000t\u0000a\u0000t\u0000i\u0000o\u0000n\u0000A\u0000g\u0000e\u0000n\u0000t\u0000\"\u0000\r\u0000\n \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000P\u0000r\u0000o\u0000d\u0000u\u0000c\u0000t\u0000N\u0000a\u0000m\u0000e\u0000\"\u0000,\u0000 \u0000\"\u0000T\u0000r\u0000u\u0000s\u0000t\u0000e\u0000d\u0000W\u0000o\u0000r\u0000k\u0000s\u0000t\u0000a\u0000t\u0000i\u0000o\u0000n\u0000 \u0000A\u0000g\u0000e\u0000n\u0000t\u0000\"\u0000\r\u0000\n-\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000P\u0000r\u0000o\u0000d\u0000u\u0000c\u0000t\u0000V\u0000e\u0000r\u0000s\u0000i\u0000o\u0000n\u0000\"\u0000,\u0000 \u0000\"\u00002\u0000.\u00002\u0000.\u00000\u0000.\u00004\u00008\u0000\"\u0000\r\u0000\n+\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000V\u0000A\u0000L\u0000U\u0000E\u0000 \u0000\"\u0000P\u0000r\u0000o\u0000d\u0000u\u0000c\u0000t\u0000V\u0000e\u0000r\u0000s\u0000i\u0000o\u0000n\u0000\"\u0000,\u0000 \u0000\"\u00003\u0000.\u00001\u0000.\u00000\u0000.\u00004\u00009\u0000\"\u0000\r\u0000\n \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000E\u0000N\u0000D\u0000\r\u0000\n \u0000 \u0000 \u0000 \u0000 \u0000E\u0000N\u0000D\u0000\r\u0000\n \u0000 \u0000 \u0000 \u0000 
\u0000B\u0000L\u0000O\u0000C\u0000K\u0000 \u0000\"\u0000V\u0000a\u0000r\u0000F\u0000i\u0000l\u0000e\u0000I\u0000n\u0000f\u0000o\u0000\"\u0000\r\u0000\n"
}
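To make the example diff easier to read, the following Ruby sketch takes one removed line copied from the utf8_diff value above and drops the NUL characters. This is only a reading aid that happens to work for ASCII-range content; it is not a UTF-16 decoder and is not proposed as a fix.

```ruby
# One removed line from the utf8_diff value above.
# In a double-quoted Ruby string, \u0000 is a single NUL character.
raw_line = "-\u0000 \u0000F\u0000I\u0000L\u0000E\u0000V\u0000E\u0000R\u0000S\u0000I\u0000O\u0000N\u0000 \u00002\u0000,\u00002\u0000,\u00000\u0000,\u00004\u00008\u0000\r\u0000\n"

# Remove every NUL so the remaining ASCII characters read as plain text.
readable = raw_line.delete("\u0000")

puts readable.inspect
```

The leading NUL after the diff marker is the second byte of the previous line's newline, which suggests the diff was split on the 0x0A byte without regard to the two-byte encoding.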
Example Diff
vs-example-diff/vs-example-diff!1 (diffs)
What is the current bug behavior?
MRs raised against UTF-16 encoded files fail to display the diff correctly.
What is the expected correct behavior?
The diff should be displayed correctly.
Relevant logs and/or screenshots
Import error logs
{
"severity": "ERROR",
"time": "2023-07-31T09:35:50.995Z",
"correlation_id": "01H6NM02WCG056B1CQCW1NW67J",
"exception.class": "ArgumentError",
"exception.message": "string contains null byte",
"exception.backtrace": [
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `public_send'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `block in write_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'",
"lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'",
"lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:120:in `write_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:90:in `method_missing'",
"app/models/concerns/bulk_insert_safe.rb:163:in `block (2 levels) in _bulk_insert_all!'",
"app/models/concerns/bulk_insert_safe.rb:157:in `each'",
"app/models/concerns/bulk_insert_safe.rb:157:in `each_slice'",
"app/models/concerns/bulk_insert_safe.rb:157:in `each'",
"app/models/concerns/bulk_insert_safe.rb:157:in `flat_map'",
"app/models/concerns/bulk_insert_safe.rb:157:in `block in _bulk_insert_all!'",
"app/models/concerns/cross_database_modification.rb:92:in `block in transaction'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `public_send'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `block in write_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'",
"lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'",
"lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:120:in `write_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:72:in `transaction'",
"lib/gitlab/database.rb:369:in `block in transaction'",
"lib/gitlab/database.rb:368:in `transaction'",
"app/models/concerns/cross_database_modification.rb:83:in `transaction'",
"app/models/concerns/bulk_insert_safe.rb:156:in `_bulk_insert_all!'",
"app/models/concerns/bulk_insert_safe.rb:91:in `bulk_insert!'",
"app/models/concerns/bulk_insertable_associations.rb:80:in `_bulk_insert_association!'",
"app/models/concerns/bulk_insertable_associations.rb:62:in `block in bulk_insert_associations!'",
"app/models/concerns/bulk_insertable_associations.rb:61:in `each'",
"app/models/concerns/bulk_insertable_associations.rb:61:in `bulk_insert_associations!'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `public_send'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `block in write_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'",
"lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'",
"lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:120:in `write_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:72:in `transaction'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `public_send'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `block in write_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'",
"lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'",
"lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:120:in `write_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:72:in `transaction'",
"lib/gitlab/import_export/base/relation_object_saver.rb:45:in `execute'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:99:in `save_relation_object'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:81:in `process_relation_item!'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:70:in `block in process_relation!'",
"lib/gitlab/import_export/json/ndjson_reader.rb:40:in `<<'",
"lib/gitlab/import_export/json/ndjson_reader.rb:40:in `block (2 levels) in consume_relation'",
"lib/gitlab/import_export/json/ndjson_reader.rb:39:in `foreach'",
"lib/gitlab/import_export/json/ndjson_reader.rb:39:in `with_index'",
"lib/gitlab/import_export/json/ndjson_reader.rb:39:in `block in consume_relation'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:69:in `each'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:69:in `each'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:69:in `process_relation!'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:64:in `block in create_relations!'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:63:in `each'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:63:in `create_relations!'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:37:in `block (3 levels) in restore'",
"app/models/concerns/bulk_insertable_associations.rb:54:in `with_bulk_insert'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:36:in `block (2 levels) in restore'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:33:in `block in restore'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `public_send'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:121:in `block in write_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:127:in `block in read_write'",
"lib/gitlab/database/load_balancing/load_balancer.rb:198:in `retry_with_backoff'",
"lib/gitlab/database/load_balancing/load_balancer.rb:116:in `read_write'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:120:in `write_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:90:in `method_missing'",
"lib/gitlab/database.rb:233:in `block (3 levels) in all_uncached'",
"lib/gitlab/database.rb:233:in `block in all_uncached'",
"lib/gitlab/database/load_balancing/session.rb:47:in `use_primary'",
"lib/gitlab/database.rb:231:in `all_uncached'",
"lib/gitlab/import_export/group/relation_tree_restorer.rb:32:in `restore'",
"lib/gitlab/import_export/project/tree_restorer.rb:33:in `restore'",
"lib/gitlab/import_export/importer.rb:21:in `all?'",
"lib/gitlab/import_export/importer.rb:21:in `execute'",
"app/services/projects/import_service.rb:137:in `import_data'",
"app/services/projects/import_service.rb:25:in `execute'",
"app/workers/repository_import_worker.rb:28:in `perform'",
"ee/app/workers/ee/repository_import_worker.rb:9:in `perform'",
"lib/gitlab/database/load_balancing/sidekiq_server_middleware.rb:26:in `call'",
"lib/gitlab/sidekiq_middleware/duplicate_jobs/strategies/until_executing.rb:16:in `perform'",
"lib/gitlab/sidekiq_middleware/duplicate_jobs/duplicate_job.rb:44:in `perform'",
"lib/gitlab/sidekiq_middleware/duplicate_jobs/server.rb:8:in `call'",
"lib/gitlab/sidekiq_middleware/worker_context.rb:9:in `wrap_in_optional_context'",
"lib/gitlab/sidekiq_middleware/worker_context/server.rb:19:in `block in call'",
"lib/gitlab/application_context.rb:118:in `block in use'",
"lib/gitlab/application_context.rb:118:in `use'",
"lib/gitlab/application_context.rb:57:in `with_context'",
"lib/gitlab/sidekiq_middleware/worker_context/server.rb:17:in `call'",
"lib/gitlab/sidekiq_status/server_middleware.rb:7:in `call'",
"lib/gitlab/sidekiq_versioning/middleware.rb:9:in `call'",
"lib/gitlab/sidekiq_middleware/query_analyzer.rb:7:in `block in call'",
"lib/gitlab/database/query_analyzer.rb:37:in `within'",
"lib/gitlab/sidekiq_middleware/query_analyzer.rb:7:in `call'",
"lib/gitlab/sidekiq_middleware/admin_mode/server.rb:14:in `call'",
"lib/gitlab/sidekiq_middleware/instrumentation_logger.rb:9:in `call'",
"lib/gitlab/sidekiq_middleware/batch_loader.rb:7:in `call'",
"lib/gitlab/sidekiq_middleware/extra_done_log_metadata.rb:7:in `call'",
"lib/gitlab/sidekiq_middleware/request_store_middleware.rb:10:in `block in call'",
"lib/gitlab/with_request_store.rb:17:in `enabling_request_store'",
"lib/gitlab/with_request_store.rb:10:in `with_request_store'",
"lib/gitlab/sidekiq_middleware/request_store_middleware.rb:9:in `call'",
"lib/gitlab/sidekiq_middleware/server_metrics.rb:76:in `block in call'",
"lib/gitlab/sidekiq_middleware/server_metrics.rb:103:in `block in instrument'",
"lib/gitlab/metrics/background_transaction.rb:33:in `run'",
"lib/gitlab/sidekiq_middleware/server_metrics.rb:103:in `instrument'",
"lib/gitlab/sidekiq_middleware/server_metrics.rb:75:in `call'",
"lib/gitlab/sidekiq_middleware/monitor.rb:10:in `block in call'",
"lib/gitlab/sidekiq_daemon/monitor.rb:46:in `within_job'",
"lib/gitlab/sidekiq_middleware/monitor.rb:9:in `call'",
"lib/gitlab/sidekiq_middleware/size_limiter/server.rb:13:in `call'",
"lib/gitlab/sidekiq_logging/structured_logger.rb:21:in `call'"
],
"user.username": "redacted",
"tags.program": "sidekiq",
"tags.locale": "en",
"tags.feature_category": "importers",
"tags.correlation_id": "01H6NM02WCG056B1CQCW1NW67J",
"extra.sidekiq": {
"retry": false,
"queue": "repository_import",
"version": 0,
"backtrace": 5,
"dead": false,
"status_expiration": 86400,
"memory_killer_memory_growth_kb": 50,
"memory_killer_max_memory_growth_kb": 300000,
"args": [
"26593"
],
"class": "RepositoryImportWorker",
"jid": "134b0178cc2fbbd579f122db",
"created_at": 1690795250.47705,
"correlation_id": "01H6NM02WCG056B1CQCW1NW67J",
"meta.caller_id": "Import::GitlabProjectsController#create",
"meta.remote_ip": "10.81.0.172",
"meta.feature_category": "importers",
"meta.user": "redacted",
"meta.user_id": 3860,
"meta.project": "test1231/redacted/redacted",
"meta.root_namespace": "test1231",
"meta.client_id": "user/3860",
"meta.root_caller_id": "Import::GitlabProjectsController#create",
"worker_data_consistency": "always",
"idempotency_key": "resque:gitlab:duplicate:repository_import:33cdf8bc7efbb9d84e023966ba65f2797ba5afc0a06405cd02630fb9fe07bd61",
"size_limiter": "validated",
"enqueued_at": 1690795250.4847548
},
"extra.relation_index": 115,
"extra.source": "process_relation_item!",
"extra.retry_count": 0,
"extra.project_id": 26593,
"extra.relation_name": "merge_requests"
}
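The log above reports the failing relation (merge_requests, relation index 115) but not which field contains the null byte. A diagnostic sketch along these lines could narrow it down, assuming the relation has already been parsed into a Ruby Hash. The helper name null_byte_paths and the nesting of the sample hash are hypothetical; only the key names new_path and utf8_diff are taken from the example diff in this report.

```ruby
# Hypothetical helper: walks nested hashes and arrays and returns the
# dotted key path of every string value that contains a NUL character.
def null_byte_paths(value, path = [])
  case value
  when Hash
    value.flat_map { |key, child| null_byte_paths(child, path + [key]) }
  when Array
    value.each_with_index.flat_map do |child, index|
      null_byte_paths(child, path + [index])
    end
  when String
    value.include?("\u0000") ? [path.join(".")] : []
  else
    []
  end
end

# Illustrative relation; the nesting here is an assumption, not the real export schema.
relation = {
  "title" => "Bump version",
  "diff_files" => [
    { "new_path" => "ResourceFile.rc", "utf8_diff" => "-\u0000 \u0000F\u0000I" }
  ]
}

puts null_byte_paths(relation)
```

Running such a check over each line of the exported merge request data before import would show whether utf8_diff is the only affected attribute.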
Possible fixes
TBD