Skip to content

Improved artifact/cache path matching

Arran Walker requested to merge ajwalker/cache-perf/matching into master

What does this MR do?

  • Improves the performance of matching files (concurrent walk/glob, only walking once)

  • Exclude rules are now recursive, just like includes (as proposed by #25770 (closed))

  • ./, . and * are converted to **/* to add back in a behaviour accidentally removed. (as proposed by #25770 (closed))

  • Starts deprecating absolute paths. Eventually, paths should be relative only (as proposed by #25770 (closed)). This MR begins to warn on absolute paths.

  • Starts to deprecate backslashes for filepath separators on Windows. Patterns will eventually only support forward-slashes for directory separators. This MR begins to warn on patterns including backslashes. Eventually, just as on non-Windows OSs, backslashes will be used for escaping. On Windows, for now, we match the behaviour of effectively disabling escaping support by using filepath.ToSlash.

Why was this MR needed?

  • filepath.Walk is slow.
  • The globbing library we were using also performs a slow travesal of the filesystem.
  • Path excludes were using a different technique than includes.

Notes for the reviewer

  • github.com/saracen/matcher is a library written by me. It's young, but the method of matching is rather simple and relies mostly on path.Match (with added support for **). The benefit is that it allows us to efficiently perform path traversal, where we can ignore directories if we determine they will not match a pattern provided.

    Update: There was a concern that the doublestar library were using not only provides ** support, but additional patterns. I would say it's fairly unlikely these additional patterns are being used, but if there's any worry that we'd break compartibility with anything that does use the additional pattern support, matcher now allows us to continue to use them by using SetMatchFunc that allows us to replace path.Match with doublestar.Match.

  • We lose some information we print: how many files were matched for exclude patterns. We could add this back in by writing a Matcher that tried to capture when something was excluded. I haven't done this for the following reasons:

    • I'm not entirely sure how useful this is. An inclusion only rule doesn't tell you about the files being excluded, but they're still excluded.
    • This MR addresses the performance problems associated with walking the filesystem. To do this efficiently, we stop walking when we know a pattern will never match. This means counting files that were excluded will always be misleading.

I haven't done this as I'm not entirely sure it's necessary and isn't as simple to do as it was previously. Because pattern matching is performed concurrently, we would need to add atomic counting (this isn't a problem, just writing it here so that I remember if we do need those counts).

  • Matching is case-sensitive, even on case-insensitive filesystems. But the existing behaviour should be too. We can more easily address this problem now though, if required.

  • The matching functionality purposely distances itself from fileArchiver, so that functionality surrounding that can be refactored more easily as part of #25991 (closed).

What are the relevant issue numbers?

#25991 (closed) #1797 (closed)

Closes #25770 (closed)

Closes #1320 (closed)

Closes gitlab#219273 (closed)

Edited by Arran Walker

Merge request reports