The HTML spec requires that form values are sent with line breaks as CRLF pairs. We don't do any processing for snippets, but git implicitly does some for other files as they are backed by a repository.
This means that when getting a raw snippet, the line endings will be CRLF pairs unless it was created through the API without those pairs.
For people wanting to use snippets to store scripts for executing on Posix-ish systems, this is a real pain.
The expectation here is that Windows users aren't using snippets to execute code immediately.
Original description
I can't figure out a way to create a snippet without introducing windows-style line endings...
# Original
curl -s https://gitlab.com/snippets/24662/raw | cat -A
# Text copied into gitlab from FF50
curl -s https://gitlab.com/snippets/24665/raw | cat -A
# Text copied into gitlab from Chromium 52
curl -s https://gitlab.com/snippets/24664/raw | cat -A
snippet of snippet:
#!/bin/bash^M$
declare -A ENVS^M$
Well that doesn't seem right....
Let's hash them real quick just to make sure they're all the same output:
for i in 2466{2,5,4}; do curl -s https://gitlab.com/snippets/24664/raw | md5sum; done
3e38541974206cf74d573bde5b0ae11e -
3e38541974206cf74d573bde5b0ae11e -
3e38541974206cf74d573bde5b0ae11e -
Alright, so, I can't conceive that this isn't a bug. Those line endings make this pretty much useless for Linux users, and I'm sure for many languages alike. The source is directly from my bash konsole terminal, and I also checked it with cat -A: the line endings are normal Linux-style line endings.
Interim solution is to pipe things through dos2unix.
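For anyone without dos2unix installed, a rough equivalent can be sketched with tr. The helper name strip_cr is made up for this example, and unlike dos2unix it removes every carriage return, not only those that are part of a CRLF pair:

```sh
# strip_cr (hypothetical helper name): remove every carriage return from stdin.
# Note: this also drops any CR that is not part of a CRLF pair.
strip_cr() {
  tr -d '\r'
}
```

It would be used the same way as dos2unix in a pipeline, for example `curl -s https://gitlab.com/snippets/24664/raw | strip_cr > script.sh`.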
To clarify, the very reason I noticed this in the first place is that I have been transitioning from GitHub to GitLab, and as such the above is my workflow to temporarily use gists/snippets I've saved. Bash does not parse this correctly due to the CR/LF line endings and thus it breaks the script.
Has it always been this way or is this a regression from one of the many recent updates?
I've noticed this as well, it's frustrating. I also resorted to using dos2unix to convert the downloaded snippets and execute. GitHub does not convert "gist" line endings.
This has proven to be a pain point which caused me to avoid snippets altogether. I was attempting to host a shell script whose url I was passing to a downstream utility. Since I have no control over the actual download and execution commands, I cannot convert the line endings on the fly. Therefore, I am forced to go with a different solution entirely. Please address this issue!
Also, the zendesk issue linked above comes up 404 for me.
I am evaluating GitLab for an enterprise replacement of GitHub, and am happy to pay for the enterprise edition, but this is sort of a big issue for us. We have the same use case as @dgoo2308. I would also love to see versioning for snippets, but I understand that may be a way off.
Meanwhile, I'll take a look at the source to see if there is an easy way to at least add an argument to the /raw action like /raw?line-endings=LF or similar.
Then you will be able to add ?lineendings=lf to your /raw URL and it will return LF line endings instead of CRLF. I think the root problem is that the HTML spec for textareas causes the line endings to be stored as CRLF to begin with, so I'm not sure what a better alternative is, besides, perhaps, making the default line endings LF and allowing the user to switch between LF and CRLF in the GUI when editing a snippet.
I can send a pull request if you guys want, but it seems like this should be handled by making LF the default.
If the root of the problem here is that form-data is sent with CRLF line endings, as suggested by @Necrathex in issue #26515 (closed), then I would suggest encoding the content before sending it to the server (in base64, for example), so the original data is preserved.
Still, it might be nice to have a toggle switch on the editor to let you choose the line endings for your snippet. This would accommodate the case where a Windows user wants to copy and paste a UNIX script into a new snippet with LF line endings.
I agree with @kamermans that having the original data preserved would be optimal, but I'm not sure that is possible since the html5 spec states that the textarea contents will contain only LF characters (if I understand correctly) so that would strip CR chars for Windows file snippets instead?
From the spec:
the user agent should allow the user to edit, insert, and remove text, and to insert and remove line breaks in the form of "LF" (U+000A) characters
Good catch @Necrathex, I didn't realize that was in the spec.
Personally I would just force everything to LF anyway, as I've never seen a file break anything with just LF line endings in 20 years of being a sysadmin and developer, except on really old Macs. Windows machines are totally fine with LF unless you're using Notepad. Even converting an entire Microsoft Visual C# project to LF line endings does not break it.
The textarea wrapping transformation is the following algorithm, as applied to a string:
Replace every occurrence of a U+000D CARRIAGE RETURN (CR) character not followed by a U+000A LINE FEED (LF) character, and every occurrence of a U+000A LINE FEED (LF) character not preceded by a U+000D CARRIAGE RETURN (CR) character, by a two-character string consisting of a U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair.
[...]
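To see what that quoted transformation does to a script, here is a rough shell approximation. The function name to_crlf is made up for this example, it assumes newline-terminated input, and it is only a sketch of the net effect, not what any browser actually runs:

```sh
# to_crlf (hypothetical helper name): rewrite every line break on stdin
# as a CRLF pair. Assumes the input ends with a newline.
to_crlf() {
  cr=$(printf '\r')
  sed "s/${cr}\$//" |   # CRLF -> LF, so existing pairs are not doubled
    tr '\r' '\n' |      # any remaining lone CR -> LF
    sed "s/\$/${cr}/"   # every LF -> CRLF
}
```

Feeding it LF-only text, e.g. `printf '#!/bin/bash\n' | to_crlf | cat -A`, should show the same kind of ^M$ endings as the raw snippet output earlier in the thread.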
@xenithorb: emulating GitHub in this case would involve making our snippets backed by a repository, which is a great idea but is a lot of work for this change. Line endings inserted through the web editor in a repository on GitLab work fine, which I think - I haven't double-checked - is because of git's normalisation.
I'm not sure how we'd expose that in the UI, or otherwise document it, but that seems reasonable to me.
Note that another workaround may be to use the API to create or update snippets, as that won't be constrained by the browser's form-encoding requirements. I realise that's not a great workaround, just mentioning it
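A minimal sketch of that API workaround with curl follows. The function name create_snippet and the GITLAB_URL and GITLAB_TOKEN environment variables are made up for this example, and the endpoint and field names (/api/v4/snippets, title, file_name, content, visibility) are assumptions that should be checked against the API documentation for your GitLab version:

```sh
# create_snippet (hypothetical helper): upload FILE as a snippet via the API.
# GITLAB_URL and GITLAB_TOKEN are assumed environment variables; the endpoint
# and field names below are assumptions to verify against your GitLab version.
create_snippet() {
  file=$1
  # Refuse to upload if the file already contains carriage returns.
  if grep -q "$(printf '\r')" "$file"; then
    echo "refusing: $file contains CR characters" >&2
    return 1
  fi
  # name@file makes curl read the file and URL-encode its bytes as-is,
  # so no browser form normalisation is involved.
  curl --silent --fail \
    --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
    --data-urlencode "title=$(basename "$file")" \
    --data-urlencode "file_name=$(basename "$file")" \
    --data-urlencode "content@${file}" \
    --data-urlencode "visibility=private" \
    "${GITLAB_URL}/api/v4/snippets"
}
```

The CR check is there only so that a file saved with Windows line endings is not uploaded by accident; it is not part of the API.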
@smcgivern I actually went around the UI entirely and jammed my snippet into PostgreSQL without CRLF, but somehow it was still served with CRLF. I suspect there is some sort of filter upstream in the output code.
Here are my preferred solutions at this point, in order from most-preferred to least-preferred (but still usable):
1. Save and serve all snippets with LF line endings by default, but have a switch in the GUI to select the preferred line endings for the snippet.
2. Save and serve all snippets with LF line endings. Add a parameter ?lineendings=<lf|crlf> so it can be forced.
3. Save and serve all snippets with LF line endings and do not support CRLF.
In any case, I think the default should be changed from CRLF to LF, especially considering there is no way to know what the original line endings were.
The absolute minimum that needs to happen here is that it should default to LF line endings only.
If you could get it to honor the original input, that would be the most optimal I would think, but something was mentioned about a specification hindering that as I understand it.
My personal take is that it would be such an infrequently used feature that @kamermans' point number 2 is probably the best option, and the quickest and easiest to implement, while still allowing an override if needed.
That being said if the original data can be maintained then the options are moot and that is what should happen.
Could it be done in such a way that the browser encapsulates the data before it's transmitted, so as to evade the specification? My apologies if I'm not articulating that properly; web development is not my area. Thanks
I've created a little test page that demonstrates that my suggestion of base64-encoding the data before submitting it, then decoding it in the response does in fact preserve the original line-endings:
In the demo, the textarea is called data, and there is a hidden field under it called encoded_data. When you submit the form, I run btoa() on data and save the result to encoded_data, then submit the form. On the server side, I receive the form (in PHP), run base64_decode() on encoded_data, save the result in a new field called decoded_data, and output the contents as JSON so we can see the line endings in the output.
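The same round trip can be sketched on the command line. This is not the demo's code (that is JavaScript plus PHP); encode_field and decode_field are made-up names standing in for btoa() and base64_decode(), and `base64 -d` assumes the GNU coreutils flag:

```sh
# encode_field / decode_field are hypothetical stand-ins for the demo's
# btoa() and base64_decode() calls.
encode_field() {
  # One base64 token with no line breaks, so there is nothing for a
  # CRLF rewrite to touch while the form is in transit.
  base64 | tr -d '\n'
}

decode_field() {
  # Assumes GNU coreutils base64, where -d means decode.
  base64 -d
}
```

For example, `printf 'a\nb\n' | encode_field | decode_field | od -c` can be used to check that the LF-only bytes come back unchanged.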
(sorry for using GitHub, but I wanted to have versioning and LF line endings :P)
I think @xenithorb's suggestion of just converting to LF all the time, and adding a param to keep CRLF, is the simplest. Otherwise the behaviour will depend on which input form you used (the old version or the new version), which is quite unpredictable. We can add a note to https://docs.gitlab.com/ee/workflow/project_features.html#snippets explaining this.
@smcgivern I'm not 100% sure about my test, I was using socat to expose the pgsql UNIX domain socket to a TCP port so I could sneak into the DB and poke around. I could easily have made a mistake while running that test.