Preprocess zip archives on load and cache file structure
Preprocess zip archive and cache relevant files:
- cache only the list of relevant files, with references to where each one can be fetched efficiently
- cache into a `map` to have `O(log n)` search
Reference code !326 (diffs)
Depends on #443 (closed)
Considerations
A noticeable change came up during the profiler demo with ~"team::Scalability", where we saw that `readArchive` is allocating a considerable amount of memory (about 28MB for the top 1%) for docs.gitlab.com alone in production.
Profiler on GCP (internal)
Allocated memory spike after enabling Zip for 5% of Pages projects on 2020-10-08 ~12:30 UTC
The following discussion from !299 (closed) should be addressed:
- [ ] @ayufan started a discussion:
> Oh, yes, yes, yes. It would really help to convert a flat list into:
>
> - `map[string]zip.File`, ideally with `zip.File` having an `offset` as well
>
> And, drop all the ones that are not within `public/`
Cache flat list idea !299 (comment 377210617)
Edited by Jaime Martinez