Add unicode_escaped_blob field

Gavin Hinfey requested to merge 388196-escape-non-coerced-uft8-blob into master

What does this MR do and why?

Adds a new GraphQL field, unicodeEscapedBlob, which returns the blob content with any bytes that are not valid UTF-8 replaced by Unicode escape sequences (for example, \u00e9). Closes GraphQL rawTextBlob mangles any blob in an enco... (#388196 - closed).
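
For readers who want a feel for the escaping described above, here is a minimal Python sketch. It is not the implementation in this MR: the handler name `escape_invalid_utf8` and the function `unicode_escaped` are hypothetical, and the `\u00XX` form is inferred only from the sample output in this description.

```python
import codecs


def _escape_invalid_utf8(err):
    """Error handler: replace each undecodable byte with a \\u00XX escape."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad_bytes = err.object[err.start:err.end]
    replacement = "".join(f"\\u{byte:04x}" for byte in bad_bytes)
    # Resume decoding right after the offending bytes.
    return replacement, err.end


# Hypothetical handler name, registered for use with bytes.decode().
codecs.register_error("escape_invalid_utf8", _escape_invalid_utf8)


def unicode_escaped(data: bytes) -> str:
    """Decode as UTF-8, escaping invalid bytes instead of dropping them."""
    return data.decode("utf-8", errors="escape_invalid_utf8")


if __name__ == "__main__":
    # 0xE9 is "é" in Windows-1252 but is not valid UTF-8 on its own.
    print(unicode_escaped(b"s\xe9nt\xe9nc\xe9"))
    # Valid UTF-8 passes through unchanged.
    print(unicode_escaped("senténcé".encode("utf-8")))
```

Valid UTF-8 input is returned untouched, while stray bytes such as a UTF-16 byte-order mark or a Windows-1252 accented character remain visible as escapes rather than being mangled or silently dropped.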

How to set up and validate locally

I have added a number of text files with various encodings to a project available here: https://gitlab.com/ghinfey/variously-encoded-text-files. Clone that repository locally and add the files to a project in your GDK. You can then run the query below against that project.

Query Variables

{
  "fullPath": "root/variously-encoded-text-files"
}

Query

query getBlameData($fullPath: ID!) {
  project(fullPath: $fullPath) {
    repository {
      blobs(paths: [
        "UTF-8.md",
        "UTF-8-BOM.md",
        "UTF-16-BE-BOM.md",
        "UTF-16-BE.md",
        "UTF-16-LE-BOM.md",
        "UTF-16-LE.md",
        "win1252.md"
      ]) {
        nodes {
          unicodeEscapedBlob
        }
      }
    }
  }
}
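
If you prefer to run the query from a script rather than a GraphQL explorer, the request could be built roughly as follows. This is a sketch under assumptions: the GDK host `http://gdk.test:3000`, the optional bearer token, and the helper name `build_request` are illustrative and not part of this MR, and the path list is shortened to two files for brevity.

```python
import json
import urllib.request

# Assumption: GDK is reachable at this host; adjust to your local setup.
GRAPHQL_URL = "http://gdk.test:3000/api/graphql"

# Shortened version of the query above (two of the sample files).
QUERY = """
query getBlameData($fullPath: ID!) {
  project(fullPath: $fullPath) {
    repository {
      blobs(paths: ["UTF-8.md", "win1252.md"]) {
        nodes {
          unicodeEscapedBlob
        }
      }
    }
  }
}
"""


def build_request(full_path, token=None):
    """Build (but do not send) a POST request carrying the query and variables."""
    body = json.dumps({"query": QUERY, "variables": {"fullPath": full_path}})
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: a personal access token is used for authentication.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        GRAPHQL_URL, data=body.encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    request = build_request("root/variously-encoded-text-files")
    # Sending requires a running GDK instance.
    with urllib.request.urlopen(request) as response:
        print(json.dumps(json.load(response), indent=2))
```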

Output

{
  "data": {
    "project": {
      "repository": {
        "blobs": {
          "nodes": [
            {"unicodeEscapedBlob": "UTF-8\n\nThis is a UTF-8 without byté-order mark senténcé with accéntéd e charactérs!"},
            {"unicodeEscapedBlob": "UTF-8-BOM\n\nThis is a UTF-8 with byté-ordér mark senténcé with accéntéd é charactérs!"},
            {"unicodeEscapedBlob": "\\u00fe\\u00ffUFT-16-BE-BOM\n\nThis is a UTF-16 big-\\u00e9ndian byt\\u00e9-order mark sent\\u00e9nc\\u00e9 with acc\\u00e9nt\\u00e9d \\u00e9 charact\\u00e9rs!"},
            {"unicodeEscapedBlob": "UTF-16-BE\n\nThis is a UTF-16 big-\\u00e9ndian s\\u00e9nt\\u00e9nc\\u00e9 with acc\\u00e9nt\\u00e9d \\u00e9 charact\\u00e9rs!"},
            {"unicodeEscapedBlob": "\\u00ff\\u00feUTF-16-LE-BOM\n\nThis is a UTF-16 littl\\u00e9-\\u00e9ndian byt\\u00e9-order mark sent\\u00e9nc\\u00e9 with acc\\u00e9nt\\u00e9d e charact\\u00e9rs!"},
            {"unicodeEscapedBlob": "win1251\n\nThis is a win1251 s\\u00e9nt\\u00e9nc\\u00e9 with acc\\u00e9nt\\u00e9d e charact\\u00e9rs!"}
          ]
        }
      }
    }
  }
}

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #388196 (closed)

Edited by Gavin Hinfey