Cold cache (the related cached response doesn't exist):

- Use the Workhorse `senddependency` logic to return and upload the response from upstream. At this time, record the ETag sent by upstream.
  - Set `upstream_etag_checked_at` to now.
  - Set `downloaded_at` to now.
  - Set `download_count` to 1.

Warm cache (the related cached response exists):

- Check if now is within `[upstream_etag_checked_at, upstream_etag_checked_at + cache_validity_period_hours]`.
- If that's the case, use the Workhorse `sendurl` logic to send the cached response.
  - Set `downloaded_at` to now.
  - Increment `download_count` by 1.
- If that's not the case, ping the related upstream with a `HEAD` request to get its ETag header.
  - If the ETag is the same, use `sendurl` to send the cached response.
    - Set `upstream_etag_checked_at` to now.
    - Set `downloaded_at` to now.
    - Increment `download_count` by 1.
  - If the ETag is not the same (stale cached response), destroy the cached response and use the `senddependency` logic to return and upload the response from upstream. At this time, record the ETag sent by upstream.
    - Set `upstream_etag_checked_at` to now.
    - Set `downloaded_at` to now.
    - Set `download_count` to the destroyed cached response's `download_count` + 1 (the download count carries over the cached response destruction).
- Special case: if the ETag header is missing from upstream, execute the "ETag is not the same" branch.
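The decision flow above can be sketched as follows. This is a minimal model, not the actual implementation: the `CachedResponse` record, the `fetch` helper, and the 24-hour validity period are assumptions for illustration, and `"senddependency"` / `"sendurl"` stand in for the two Workhorse delivery paths.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

CACHE_VALIDITY_PERIOD = timedelta(hours=24)  # assumed value of cache_validity_period_hours

@dataclass
class CachedResponse:
    upstream_etag: Optional[str]
    upstream_etag_checked_at: datetime
    downloaded_at: datetime
    download_count: int

def fetch(cache: Optional[CachedResponse],
          upstream_etag_now: Optional[str],
          now: datetime) -> Tuple[str, CachedResponse]:
    """Return (delivery_path, updated_cache_entry) for one request.

    `upstream_etag_now` stands in for the ETag a HEAD request to upstream
    would return (None when the header is missing).
    """
    if cache is None:
        # Cold cache: fetch from upstream and record its ETag.
        return "senddependency", CachedResponse(upstream_etag_now, now, now, 1)

    if now <= cache.upstream_etag_checked_at + CACHE_VALIDITY_PERIOD:
        # Within the validity window: serve the cache, no upstream interaction.
        cache.downloaded_at = now
        cache.download_count += 1
        return "sendurl", cache

    # Window elapsed: revalidate against the upstream ETag.
    if upstream_etag_now is not None and upstream_etag_now == cache.upstream_etag:
        # ETag unchanged: start a new validity period and serve the cache.
        cache.upstream_etag_checked_at = now
        cache.downloaded_at = now
        cache.download_count += 1
        return "sendurl", cache

    # Stale (or missing) ETag: destroy the entry and re-download from upstream.
    # The download_count carries over the destruction.
    return "senddependency", CachedResponse(upstream_etag_now, now, now,
                                            cache.download_count + 1)
```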
We have a few aspects to work on here. Let's track progress with a list:

- Cold cache scenario: in dev. I was able to implement the upload endpoint. I had to use a small trick to pass identifiers around. It worked as expected.
- Warm cache scenario: not started.
- Stale cache scenario:
  - correct ETag scenario: not started.
  - incorrect ETag scenario: not started.

Looking at the above, we will probably need to split the changes into multiple MRs. I'll evaluate the MR size as each scenario completes. At worst, I'll create 3 different MRs.
We have a few aspects to work on here. Let's track progress with a list:

- Cold cache scenario: in dev. I was able to implement the upload endpoint. I had to use a small trick to pass identifiers around. Finished the implementation; the cache entry is created as expected. Started implementing the specs.
Received feedback that I need to address for the cold cache scenario.
I started working on the warm cache scenario. It might be possible to pack the warm cache + stale cache support into a single MR. I wrote a first implementation and was able to get the correct behavior from the cache system:

- A cache entry doesn't exist: it is created.
- A cache entry exists:
  - (A) It is still valid: use it to download the file.
  - It is not valid:
    - There is no ETag from upstream. Nothing we can do here: re-download from upstream. The cache entry will be updated.
    - There is an ETag. We check it with upstream:
      - Same value: we update the cache entry to start a new validity period and use it to download the file.
      - Different value: we re-download the file from upstream. The cache entry will be updated.

All branches are working as expected. In particular (A), which should be the fastest path, doesn't interact with upstream at all, giving it the fastest execution time.
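The revalidation ping can be as small as a `HEAD` request that reads back the ETag header. A sketch with Python's standard library (the helper name and timeout are illustrative, and a `None` result should be treated like a mismatch, per the special case above):

```python
import urllib.request
from typing import Optional

def upstream_etag(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request to upstream and return its ETag header, if any.

    Returns None when the header is missing, in which case the cached entry
    cannot be revalidated and the file is re-downloaded from upstream.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("ETag")
```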
- Cold cache scenario: addressed the feedback; the MR is back in review.
- Warm cache scenario: finished the implementation and started adding the specs. I have one last spec to update, and then we should be able to prepare the MR for review. This one will be blocked until the MR for the cold cache scenario is merged.
Prepared MR 2 for review and noticed that I missed something: content types.

Our idea is to properly set the content type response header using the Content-Type value sent by upstream. This way, nitpicky clients (for example, those that require a content type of text/xml for .pom files) will be supported.
This content type support triggered more changes than expected, but in the end I was able to implement it correctly. I tested with the 3 main object storage modes: disabled, enabled with direct upload, and enabled without direct upload. In all 3 cases, the content type in the response headers was set properly.
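The content type logic boils down to: trust the upstream Content-Type when present, otherwise fall back to a guess from the file name. A sketch under those assumptions (the helper name and the override table are hypothetical, not the actual implementation):

```python
import mimetypes
from typing import Optional

# Hypothetical overrides for nitpicky clients; .pom files served as text/xml.
CONTENT_TYPE_OVERRIDES = {".pom": "text/xml"}

def response_content_type(upstream_content_type: Optional[str],
                          filename: str) -> str:
    """Pick the Content-Type for the response headers.

    Prefer the value sent by upstream; otherwise apply overrides, then a
    generic guess, then a binary fallback.
    """
    if upstream_content_type:
        return upstream_content_type
    for extension, content_type in CONTENT_TYPE_OVERRIDES.items():
        if filename.endswith(extension):
            return content_type
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"
```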