Docs: clarify that patterns in `cache:key:files_commits` are interpreted as git pathspecs
- [x] Start this issue's title with `Docs:` or `Docs feedback:`.

## Problem to solve

In `cache:key:files`, wildcard patterns are interpreted as globs. In `cache:key:files_commits`, they are interpreted as git pathspecs. The two behave differently. For example, `**/foo` matches a top-level `foo` as a glob but not as a pathspec. It would be helpful to document this.

I raised a similar issue before: https://gitlab.com/gitlab-org/gitlab/-/work_items/547149. It was closed because related work was in progress: making `cache:key:files` use content-based hashing, and moving commit-based hashing to `cache:key:files_commits`. The glob/pathspec mismatch was meant to be handled as part of that work. That work has been merged, but the glob/pathspec mismatch remains.

When I raised #547149:

* `cache:key:files` called `last_commit_id_for_path(path)`, which delegated to Gitaly, which interpreted the path as a pathspec.

Now:

* `cache:key:files` uses content hashing instead of commit hashing.
* `cache:key:files` supports wildcards, using glob-like matching.
* `cache:key:files_commits` uses the old commit hashing.
  * It still calls `last_commit_id_for_path(path)`, as before.

Reproducible example: suppose your project has only `foo` and `bar` at the top level.
```yaml
job-a:
  cache:
    key:
      files_commits:
        - "**/foo" # no match, because it is interpreted like a pathspec, so the cache key is 'default'
    paths:
      - "**/bar" # match, because it is interpreted like a glob
  script:
    - echo "check the logs to see caching behaviour"
```

## Proposal

Clarify in the docs that `cache:key:files_commits` patterns are interpreted as git pathspecs. (Alternatively, change the implementation to interpret them as globs, consistent with `cache:key:files`, `cache:paths`, and others.)

## Who can address the issue

Anyone

## Other links/references

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/203233:

* `cache:key:files` switched from commit hashing to content hashing.
* `cache:key:files` paths are matched literally (no wildcard support).
* `cache:key:files_commits` was added, using commit hashing.

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/209633:

* Added glob-like wildcard support to `cache:key:files`.
* Fixed a bug in wildcard matching, so `**/foo` matches at the top level.

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/211084:

* Reverted !209633 because it allowed a file to include itself.

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/211424:

* A better version of !209633.
* Added glob-like wildcard support to `cache:key:files`, as in !209633.
* Also prevented a file from including itself.