GitHub importer: "Missing content-length header" during attachments fetching
Summary
During local testing I found that some import jobs fail with the error: Missing content-length header
Steps to reproduce
POST {{HOST}}/api/v4/import/github
{
"personal_access_token":<PAT>,
"repo_id":3468618,
"target_namespace":"root",
"new_name":"oj",
"optional_stages": {"single_endpoint_issue_events_import":false,"single_endpoint_notes_import":false,"attachments_import":true}
}
You can also reproduce it by running Gitlab::GithubImport::Attachments::ImportNoteWorker with the following arguments:
25, {"record_db_id"=>13298, "record_type"=>"Note", "text"=>"Created by: GUI\n\n> The :use_to_json is an option for :compat mode. Try setting the :nan option to :raise.\r\n\r\nHm, the docs seem to indicate not: "The option is ignored in the :compat and :rails mode." The docs also seem to indicate that the :nan option is ignored under compat mode: "Default is :auto but is ignored in the :compat and :rails mode."\r\n\r\nThese options don't seem to have any effect for me in compat mode on some quick tests:\r\n\r\nruby\r\n> require \"oj\"\r\n=> true\r\n> Oj.dump({ \"nan\" => Float::NAN })\r\n=> \"{\\\"nan\\\":3.3e14159265358979323846}\"\r\n> Oj.dump({ \"nan\" => Float::NAN }, :mode => :compat)\r\n=> \"{\\\"nan\\\":NaN}\"\r\n> Oj.dump({ \"nan\" => Float::NAN }, :mode => :compat, :use_to_json => true)\r\n=> \"{\\\"nan\\\":NaN}\"\r\n> Oj.dump({ \"nan\" => Float::NAN }, :mode => :compat, :use_to_json => true, :nan => :raise)\r\n=> \"{\\\"nan\\\":NaN}\"\r\n> Oj.mimic_JSON\r\n=> JSON\r\n> Oj.dump({ \"nan\" => Float::NAN }, :mode => :compat, :use_to_json => true)\r\n=> \"{\\\"nan\\\":NaN}\"\r\n> Oj.dump({ \"nan\" => Float::NAN }, :mode => :compat, :use_to_json => true, :nan => :raise)\r\n=> \"{\\\"nan\\\":NaN}\"\r\n"}, "gitlab:job_waiter:0fbb4fb4-2b2f-4ef9-8a36-e73f64c16c33"
25, {"record_db_id"=>1574, "record_type"=>"Issue", "text"=>"Created by: dgollahon\n\nHi,\r\n\r\nI operate several JSON APIs that use oj in various capacities. In general, we want to force very strict parsing (and use oj's strict mode) so that we don't accidentally support parsing features that we might later not if we switch to a different library. Recently oj has relaxed strictness for number parsing in a couple of ways. After those changes, they were restricted in the json compatibility mode here and here.\r\n\r\nI would like to be able to enforce that behavior (failing on number formats not part of the JSON standard) but I don't see another way to enable this behavior in the options options and would prefer the rest of the strict configuration. According to the goals of the strict mode documentation it seems like this should be the default behavior for strict as well, (but if you agree I believe that would require releasing a new major version since it is currently allowed). In any case, I would like to be able to opt-in to the stricter parsing mode, perhaps through an option.\r\n\r\ntl;dr: I want to raise an error when parsing Oj.load('+1.') or similar while using strict mode, as was the case < 3.7.1.\r\n\r\nThanks for a great gem!"}, "gitlab:job_waiter:fecccf02-94e0-4898-b92b-8dbd6fa589b7"
Example Project
Example of problematic "attachments"
[test results](https://github.com/stereobooster/ruby-json-benchmark/blob/master/test_report.txt)
[pages/Modes](https://github.com/ohler55/oj/blob/develop/pages/Modes.md)
What is the current bug behavior?
Fetching some attachments fails with:
{"feature_category":"importers","import_type":"github","severity":"ERROR","time":"2022-11-28T12:32:53.114Z","correlation_id":"bd876768ff11370ec056859552d7b639","project_id":25,"source":"Gitlab::GithubImport::Importer::NoteAttachmentsImporter","message":"importer failed","error.message":"Missing content-length header"}
What is the expected correct behavior?
Attachments are imported without any errors in the logs.
Possible fixes
As the text of the processed notes shows, they contain links to *.md files that are part of a repository's committed content rather than uploaded attachments. That means we shouldn't download them at all (skip them).
However, if a link points to a file/doc in the repository being imported, we could convert it to point to the corresponding place in the imported repository on the GitLab side.
So, looking at the examples that failed:
[test results](https://github.com/stereobooster/ruby-json-benchmark/blob/master/test_report.txt)
[pages/Modes](https://github.com/ohler55/oj/blob/develop/pages/Modes.md)
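The first one can't be converted because it points to a file in a different repository (we should skip it), but the second one could be rewritten to link to the corresponding file in the just-imported repository.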
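A rough sketch of how the skip/convert decision could look, to make the proposal concrete. Everything here is hypothetical and not taken from the GitLab codebase: the `classify_link` helper, its keyword arguments, and the `:download` / `:skip` / `:rewrite` labels are invented for illustration. It also assumes the imported GitHub repository is `ohler55/oj` and that the target project path is `root/oj` (from `target_namespace` and `new_name` in the request above).

```ruby
require "uri"

# Hypothetical pattern for GitHub "blob" links:
#   /<owner>/<repo>/blob/<ref>/<path>
# Assumption: the ref contains no slashes (branch names like "feature/x"
# would need the ref resolved against the repository instead).
BLOB_PATH = %r{\A/(?<owner>[^/]+)/(?<repo>[^/]+)/blob/(?<ref>[^/]+)/(?<path>.+)\z}

# Hypothetical helper. Returns a pair [action, url]:
#   :download - not a repository file link, handle as an attachment as before
#   :skip     - file committed to some other repository, do not download
#   :rewrite  - file committed to the imported repository, point at GitLab
def classify_link(url, imported_repo:, target_project_path:)
  uri = URI.parse(url)
  return [:download, url] unless uri.host == "github.com"

  match = BLOB_PATH.match(uri.path)
  return [:download, url] unless match

  if "#{match[:owner]}/#{match[:repo]}" == imported_repo
    # Assumed GitLab blob URL layout: /<project path>/-/blob/<ref>/<path>
    [:rewrite, "/#{target_project_path}/-/blob/#{match[:ref]}/#{match[:path]}"]
  else
    [:skip, url]
  end
rescue URI::InvalidURIError
  # Unparseable links are left alone rather than fetched.
  [:skip, url]
end

failed_links = [
  "https://github.com/stereobooster/ruby-json-benchmark/blob/master/test_report.txt",
  "https://github.com/ohler55/oj/blob/develop/pages/Modes.md"
]

failed_links.each do |link|
  action, result = classify_link(link, imported_repo: "ohler55/oj", target_project_path: "root/oj")
  puts "#{action}: #{result}"
end
```

With these assumptions the first failed link is classified as `:skip` and the second as `:rewrite`, which matches the behavior proposed above. Links that are not GitHub blob URLs fall through to `:download`, so real attachments would still be fetched by the existing logic.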