fetch() is called on the source even though it is cached
Summary
I have an element and I have some custom source plugin I've developed, and their fetch()
operation is costly.
I noticed that, if one source in the element changes, FetchQueue asks all sources to be fetched again including the ones that reported themselves Consistency.CACHED
.
- From user perspective, I don't think this should happen, it is a waste of time.
- From plugin developer perspective, I don't want to include
handle repeated calls to fetch()
logic in my plugin'sfetch()
, instead I want to keep it only inget_consistency()
.
Additionally, for example, we are hitting a line we should never hit in DownloadableFileSource-based plugins: https://gitlab.com/BuildStream/buildstream/blob/a53d6d1f81913d639175cbd503d9d71ec7133a4e/buildstream/plugins/sources/_downloadablefilesource.py#L132
I was also able to observe this in git.py
plugin. git
and tar
for example can handle this at the moment but this is still wrong.
Steps to reproduce
I used a patch like this to add some print statements - you can add your own..
diff --git a/buildstream/plugins/sources/_downloadablefilesource.py b/buildstream/plugins/sources/_downloadablefilesource.py
index f5c5b3d0..80099787 100644
--- a/buildstream/plugins/sources/_downloadablefilesource.py
+++ b/buildstream/plugins/sources/_downloadablefilesource.py
@@ -132,6 +132,7 @@ class DownloadableFileSource(Source):
# not be called if the source is already Consistency.CACHED.
#
if os.path.isfile(self._get_mirror_file()):
+ self.info("defensive check hit?? in {}".format(self.name))
return # pragma: nocover
# Download the file, raise hell if the sha256sums don't match,
@@ -162,6 +163,7 @@ class DownloadableFileSource(Source):
etagfile.write(etag)
def _ensure_mirror(self):
+ self.info("_ensure_mirror in {}".format(self.name))
# Downloads from the url and caches it according to its sha256sum.
try:
with self.tempdir() as td:
diff --git a/buildstream/source.py b/buildstream/source.py
index d6681b96..f6e3c07a 100644
--- a/buildstream/source.py
+++ b/buildstream/source.py
@@ -973,6 +973,7 @@ class Source(Plugin):
# Tries to call fetch for every mirror, stopping once it succeeds
def __do_fetch(self, **kwargs):
+ self.info("__do_fetch called for {}".format(self.name))
project = self._get_project()
context = self._get_context()
@@ -1028,6 +1029,7 @@ class Source(Plugin):
else:
mirrors = project.config.mirrors
if not mirrors or not alias:
+ self.info("{} calling fetch()!".format(self.name))
self.fetch(**kwargs)
return
Prepare a simple bst file with one tar
source. bst fetch
it. Later add one more tar
source. Tar is smart and its fetch()
terminates early, but still a problem.
Do the same for git
sources.
What is the current bug behavior?
One non-cached source is enough to trigger fetch()
on all sources.
What is the expected correct behavior?
Buildstream should just fetch the sources that are not already cached.
Relevant logs and/or screenshots
Possible fixes
Here is the line we need to add a check, I believe. We might need to discuss if it'd be better to keep that logic in Element
instead because there would be a source.get_consistency() == Consistency.CACHED
check - too many details for a Queue
perhaps.
Other relevant information
- BuildStream version affected: 1.3.0+1017.g761e7570.