Skip to content

catfile: Don't return readers to cache when their queues are in use

Both the catfile object content and object info readers need to track their dirtiness so that the catfile cache can decide whether a process can be returned to the cache or not. And while both of these readers are considered dirty when their underlying request queue is, they are not considered dirty when the queue is still in use. This is a bug as it is now possible that the caller still queues requests, but we already have returned the process to the cache as we considered it to be clean.

This bug manifests in one of our flaky tests:

=== FAIL: internal/gitaly/service/blob TestListLFSPointers/partial_graph_walk_with_limiting_limit (0.00s)
    lfs_pointers_test.go:172:
                Error Trace:	/builds/gitlab-org/gitaly/internal/gitaly/service/blob/grpc.go:40
                                                        /builds/gitlab-org/gitaly/internal/gitaly/service/blob/grpc.go:58
                                                        /builds/gitlab-org/gitaly/internal/gitaly/service/blob/lfs_pointers_test.go:172
                Error:      	Should be empty, but was   (*status.Status)(Inverse(protocmp.Transform, protocmp.Message{
                                - 	"@invalid": bool(true),
                                        "@type":    s"google.rpc.Status",
                                + 	"code":     int32(13),
                                + 	"message":  string("creating object iterator: object queue already in use"),
                                  }))
                Test:       	TestListLFSPointers/partial_graph_walk_with_limiting_limit

What happens here is that we spawn a gitpipe in the ListLFSPointers RPC that processes requests asynchronously in two Goroutines. The request queue that we have acquired and that is used by those two Goroutines will only be closed once both Goroutines have finished. But as the RPC allows to set a limit of the maximum amount of LFS pointers we read, it may exit early before the Goroutines have finished. And that can indeed lead to a race condition where the Goroutines still use the queue and have not yet exited, but the process is returned to the cache regardless of that as it was not considered to be dirty.

Fix this bug by considering both object info and object content readers to be dirty if their request queue is still in use.

Closes #4321 (closed).

Edited by Patrick Steinhardt

Merge request reports