I'm actively using this at the moment on my own projects. It does give us a dependency on another system component that we'd have to ensure is installed, probably adding it to our requirements documentation, but it should be installed on all Linux distros by default.
I've used this in the past and it worked well, but appears to be less in-depth than mimemagic. Our direct use-cases are limited so this possibly doesn't matter
It is however still pulled in as a dependency in our Gemfile.lock, probably due to Rails.
Update 1 (2021-03-24)
We have a "working" MR in !57387 (merged), butfile is an unreliable dependency across systems, as seen in !57387 (comment 536948103). Ubuntu 18.04, for example, fails to detect a lot of mime types correctly.
@WarheadsSE, @digitalmoksha, and I had a chat about our options. Currently it looks like forking ruby-filemagic orruby-magic and vendoring a fixed version of file/libmagic into it as the most reliable option, where we can guarantee that the mime types match for everyone using the gem.
If anyone has experience with vendoring library dependencies for C-extensions in Ruby, please get involved 😄 Thanks Stan & Yorick
Update 2 (2021-03-25)
We're now using https://github.com/kwilczynski/ruby-magic in master 🎉 Thanks very much to its author Krzysztof Wilczyński for being so responsive and for accepting our changes to have it vendor the file dependencies 😃
We're about to merge a shim to override any dependency that still references mimemagic (i.e. Rails): !57443 (merged)
These changes are marked for picking so they'll be picked up in our next backport releases
We're running into CI issues due to the ruby-magic gem now downloading from public file mirrors. We're swapping to an internal fork at https://gitlab.com/gitlab-org/ruby-magic that uses our own mirror of the archive, and working on a more permanent solution. Fork swap: !57487 (merged)
We're waiting for the latest builds to finish to confirm the issue is resolved
Builds currently blocked due to git being used in the gemspec, MR to fix this: !57516 (merged). Now fixed.
Removing HipChat is probably not an option - if we want to release updates to any older GitLab releases. We need some patch for HipChat for these older versions, I think.
Or given hipchat is dead, we can also remove hipchat from older GitLab releases
@tkuah we had already scheduled HipChat for removal in %14.0 in any case. The product is completely dead, and is close to a year past its official end of life date.
We had already scheduled HipChat for removal in %14.0 in any case. The product is completely dead, and is close to a year past its official end of life date.
@deuley Then it doesn't feel like we even need to wait for %14.0 to remove ? It's un-usable at all ?
@tkuah frankly I was just waiting because our policy to only introduce breaking changes in major releases. If it saves us effort, I'm totally fine with moving that up a release. Removing it is technically a breaking change, but yeah, it's effectively un-usable, so yes, I'm fine with dropping it.
Understood about the need to backport a fix regardless, but if removing in %13.11 saves us some effort, let's do it.
Stop installing and requiring activestorage. Obviously, not possible if you're using that dependency, but if you're not, stop requiring "rails/all" and require each Rails component manually. Here's what you need to copy: https://github.com/rails/rails/blob/main/railties/lib/rails/all.rb
Actually, looks like we already do this. The only place we need to remove it is from the Gemfile, though it's pulled in as a dependency by Rails so I'm not sure if we can do that or not.
An alternative to the mimemagic gem for those uses where we need it could be ruby-filemagic, which wraps the venerable (and MIT) libmagic: https://github.com/blackwinter/ruby-filemagic/ . There are no licensing issues there.
It is currently unmaintained, however, and that's been the case since 2018.
We could also simply wrap the file binary - that's very likely to be available in every system we deploy to.
Using file makes sense, logistically, though not having to popen would be nice. We should stay aware of the need to ensure this is installed on all systems/containers using said call.
Interesting, my macOS version returned nil for an empty file, and ruby-magic-static returned application/x-empty. So I guess it's picking up the right version.
We could have a Go binary that takes the filename and initial bytes as input, and returns the content-type-as-detected-by-Go as the output. Then every time we want to look up a content type in Ruby, shell out to this Go binary.
That'd be pretty interesting, would essentially work like the file implementation but with something we can internalise rather than being a system dependency. Could be worth investigating. I'm going to add our own wrapper for finding the mime type so it's easy to swap out the implementation later.
Dicussion was had within that issue, and our own issue. The problem is that the "skirt" by downloading at gem install. We package and ship the results, qualifying as "distribution" beyond consumption.
The obligations of the GPL are primarily invoked upon redistribution rather than use; executing code as you're proposing here is very different from distributing code as you were previously doing
So hipchat (1.5.2) and marcel (0.3.3) are still pulling in mimemagic as dependencies. Are we going to have to fork these and replace the usage with our own from !57387 (merged)?
You mean our own shim? That would work, but don't we still have the problem of the mimemagic gem itself getting pulled in as a dependency? It sounds like we might not even be able to ship that code in omnibus, though I could very well be wrong
Theoretically no. We create our own mimemagic gem that is already vendored, and override the method. So that gem gets pulled in instead of the real gem. At least I think that should work.
So just for my own clarity, are these the steps we need and are planning on taking?
replace our own usage of mimemagic with Gitlab::Utils::MimeType from !57387 (merged), based on Shrine
replace usage in hipchat gem, since we can't wait to remove it in 14.0. So we need our own fork?
no longer load activestorage, actiontext, and actionmailbox so as not to pull in marcel (I don't think we're using actiontext and actionmailbox, but I'm not sure)
I think this makes sense to me. We could leave the Rails side of it and wait for them to ship a new release which solves it at their end. I think we will need to sort out the hipchat integration though, and as we're about to remove it any way, I think we could argue for doing the easiest solution:
Fork the gem
Replace its use of mimemagic with Gitlab::Utils::MimeType or another gem
We have a "working" MR in !57387 (merged), butfile is an unreliable dependency across systems, as seen in !57387 (comment 536948103). Ubuntu 18.04, for example, fails to detect a lot of mime types correctly.
@WarheadsSE, @digitalmoksha, and I had a chat about our options. Currently it looks like forking ruby-filemagic and vendoring a fixed version of the file/libmagic library into it is likely to be the most reliable option, where we can guarantee that the mime types match for everyone using the gem.
We could explore using https://github.com/mime-types/ruby-mime-types as a stop-gap until we update ruby-filemagic, potentially. If we use the generic Gitlab::Utils::MimeType class in !57387 (merged) it will be simple to swap out the implementation later.
The other alternative to vendoring a fixed version of the file/libmagic library might just be to accept that file is going to give slightly different results per platform. At least initially.
The file method could give very odd results indeed. From my linux install:
$ file spec/fixtures/spdx.jsonspec/fixtures/spdx.json: JSON data$ file spec/fixtures/metrics.jsonspec/fixtures/metrics.json: ASCII text, with very long lines
I feel confident that, with the exception of image/vnd.microsoft.icon (and it's old enough I bet it would), any file should be able to properly identify the above types.
Which means those would be sufficient at the moment, and not require a C compilation.
@digitalmoksha It probably makes sense to swap the file-type tests for the wrapper to include all those image types, and add some fixtures for them, as those are what we really need to identify correctly 🤔
We still want to be able to detect text/plain if possible.
JobsController only determines whether something could be inline (it's either text/plain or nil) and everything else is attachment.
Hmm, this would mean that on a system such as @robotmay_gitlab, a json file would get detected as text/plain and sent as an inline, while on my system it gets detected as application/json and gets streamed as an attachment. But I don't think that is a problem
It probably makes sense to swap the file-type tests for the wrapper to include all those image types, and add some fixtures for them, as those are what we really need to identify correctly
@stanhu I examined ruby-magic (https://github.com/kwilczynski/ruby-magic) as well. The results matched. The only issue seen at that point is the variance based on system-available libmagic version and the database there-in.
Based on normal support, at least 3 minor versions. If we need to issue a Security update to older versions, we'll have to have this in place for them as well.