Clear timezones from cached zip entries

Jacob Vosmaer requested to merge jv-clear-timezone into master

What does this MR do?

In #702 we discovered that caching many *zip.File instances also keeps many Go timezone objects (*time.Location) on the heap: one per cached *zip.File. Together these account for about 25% of the heap size.

In this commit we convert the timestamp of each cached *zip.File to UTC. Because Go reuses a single timezone object for UTC, the cached entries no longer reference their original timezone objects, which can then be garbage collected. This reduces the heap size.

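Here is an example program that demonstrates the effect. It opens the zip archive named on the command line 1000 times, keeps every reader alive, and writes a heap profile to standard output. Setting FORCE_UTC=1 in the environment applies the UTC conversion to every entry: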

package main

import (
        "archive/zip"
        "log"
        "os"
        "runtime/pprof"
)

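func main() {
        if len(os.Args) != 2 {
                log.Fatalf("usage: %s ARCHIVE.zip", os.Args[0])
        }
        if err := load(os.Args[1]); err != nil {
                log.Fatal(err)
        }
        log.Printf("load finished: %d archives", len(readers))
        // The heap profile goes to stdout; log output goes to stderr.
        if err := pprof.WriteHeapProfile(os.Stdout); err != nil {
                log.Fatal(err)
        }
}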

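// readers keeps every opened archive reachable, so that its *zip.File
// entries stay on the heap and show up in the profile.
var readers []*zip.ReadCloser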

// load opens the same archive 1000 times to simulate a cache holding
// many *zip.File instances.
func load(filename string) error {
        for i := 0; i < 1000; i++ {
                zr, err := zip.OpenReader(filename)
                if err != nil {
                        return err
                }
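                // With FORCE_UTC=1, replace each entry's timestamp with its
                // UTC equivalent so its original timezone object can be collected.
                if os.Getenv("FORCE_UTC") == "1" {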
                        for _, zf := range zr.File {
                                zf.Modified = zf.Modified.UTC()
                        }
                }

                readers = append(readers, zr)
        }

        return nil
}

TODO
