Design: Cache FileInfos for faster lookups
Currently our `renter/files` endpoint lookups could be a lot faster. We are going to deprecate this endpoint in the future because it doesn't make sense to always retrieve all the fileinfos, but I've seen it take over a second to return for 1,500 files, which is too slow.
Our first proposal to fix this issue was adding cached fileinfos to the `.siadir` files, and to be honest I'm not convinced that this is the best solution. The problem with this approach is that the `.siadir` file is a JSON object and therefore needs to be completely rewritten every time a single fileinfo field of a file within the folder is updated. Since a user could potentially have 100,000 files in a single folder, it's not unrealistic to assume that this would mean rewriting 100 MB `.siadir` files multiple times a second during repairs.
That's why I propose a combination of the following options:
1. Caching siafiles
This is something we wanted to do anyway at some point. Basically, instead of closing a `siaFileSetEntry` right away when no thread is using it, we keep it open until we reach a certain limit, after which we close the least recently used one. This should make recomputing the fileinfo for cached siafiles a lot faster.
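A minimal sketch of that eviction policy, assuming a simplified stand-in for the real entry type. The names `cachedEntry`, `entryCache`, `Open` and `Close` are hypothetical and not the existing siafile set API, and locking is left out for brevity:

```go
package main

import "container/list"

// cachedEntry is a hypothetical stand-in for a siaFileSetEntry.
type cachedEntry struct {
	siaPath string
	threads int  // number of threads currently using the entry
	closed  bool // set once the entry has been evicted
}

// entryCache keeps entries open after their last user is done and only
// closes the least recently used one once more than limit entries are unused.
// It is not thread-safe; a real version would need a mutex.
type entryCache struct {
	limit   int
	entries map[string]*cachedEntry  // every open entry, keyed by path
	unused  *list.List               // unused entries, most recently used first
	elems   map[string]*list.Element // position of each unused entry in the list
}

func newEntryCache(limit int) *entryCache {
	return &entryCache{
		limit:   limit,
		entries: make(map[string]*cachedEntry),
		unused:  list.New(),
		elems:   make(map[string]*list.Element),
	}
}

// Open returns the open entry for path, creating it if necessary.
func (c *entryCache) Open(path string) *cachedEntry {
	e, ok := c.entries[path]
	if !ok {
		// Cache miss: this is where the siafile would be loaded from disk.
		e = &cachedEntry{siaPath: path}
		c.entries[path] = e
	} else if el, ok := c.elems[path]; ok {
		// Cache hit on an unused entry: it is in use again.
		c.unused.Remove(el)
		delete(c.elems, path)
	}
	e.threads++
	return e
}

// Close releases one reference. The entry stays cached until it becomes
// the least recently used entry and the limit is exceeded.
func (c *entryCache) Close(e *cachedEntry) {
	e.threads--
	if e.threads > 0 {
		return
	}
	c.elems[e.siaPath] = c.unused.PushFront(e)
	for c.unused.Len() > c.limit {
		victim := c.unused.Remove(c.unused.Back()).(*cachedEntry)
		victim.closed = true
		delete(c.elems, victim.siaPath)
		delete(c.entries, victim.siaPath)
	}
}
```

Only entries with no active threads are eligible for eviction, so the limit bounds the number of idle cached siafiles, not the number of files in use.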
2. Move siafile's chunk metadata / hostkeytable out of memory
This step doesn't impact the speed of the endpoint directly but it is a setup for step 3. Basically, by only loading the chunk metadata on demand we can greatly increase the number of siafiles we can cache in step 1. Assuming that most siafiles' metadata fits within a single 4 KiB page on disk (probably within 2 KiB actually), we should be able to cache at least 100,000 siafiles for the price of roughly 400 MiB of RAM, or maybe just 10,000 for roughly 40 MiB. I'd assume that for enterprise customers it's also no issue to cache 1,000,000 siafiles for a good performance boost.
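The memory estimate can be checked with a few lines of arithmetic. This sketch assumes exactly one 4 KiB page per cached siafile and ignores any per-entry bookkeeping overhead; `cacheMiB` is a hypothetical helper:

```go
package main

// pageSize is the assumed in-memory footprint of one cached siafile's
// metadata: a single 4 KiB page.
const pageSize = 4096

// cacheMiB returns the approximate RAM in MiB needed to keep the metadata
// of n siafiles cached.
func cacheMiB(n int) float64 {
	return float64(n) * pageSize / (1 << 20)
}
```

Under that assumption, 100,000 siafiles come to about 391 MiB, 10,000 to about 39 MiB, and 1,000,000 to about 3.8 GiB.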
3. Caching health, redundancy, availability within Siafile
Most of the fields of the fileinfo are already present within the siafile's metadata. We could add fields for the ones that are missing, which get updated every time something calls the corresponding method. For example, every time something calls `siafile.Redundancy`, the `siafile.staticMetadata.cachedRedundancy` field gets updated as well.
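A rough sketch of that update-on-call pattern. The `sketchFile` and `chunk` types and the `CachedRedundancy` getter are hypothetical, and the redundancy calculation here (lowest ratio of good pieces to required pieces across all chunks) is a simplified assumption, not the real `Redundancy` logic:

```go
package main

import "sync"

// chunk is a simplified, hypothetical view of a chunk's piece counts.
type chunk struct {
	goodPieces int
	minPieces  int
}

// metadata holds the cached field proposed above.
type metadata struct {
	cachedRedundancy float64
}

// sketchFile is a hypothetical, stripped-down stand-in for a siafile.
type sketchFile struct {
	mu             sync.Mutex
	staticMetadata metadata
	chunks         []chunk
}

// Redundancy computes the redundancy from the chunks and, as a side
// effect, refreshes the cached value in the metadata.
func (sf *sketchFile) Redundancy() float64 {
	sf.mu.Lock()
	defer sf.mu.Unlock()
	r := -1.0
	for _, c := range sf.chunks {
		cr := float64(c.goodPieces) / float64(c.minPieces)
		if r < 0 || cr < r {
			r = cr
		}
	}
	if r < 0 {
		r = 0 // no chunks
	}
	sf.staticMetadata.cachedRedundancy = r
	return r
}

// CachedRedundancy returns the last computed value without touching the
// chunk data, so it also works when only the metadata is loaded.
func (sf *sketchFile) CachedRedundancy() float64 {
	sf.mu.Lock()
	defer sf.mu.Unlock()
	return sf.staticMetadata.cachedRedundancy
}
```

The cached value is only as fresh as the last call to the full method. That should be acceptable here as long as the repair loop keeps calling it regularly, so no separate refresh job is needed.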
The nice thing about the SiaFile is that, with little additional work, we should be able to load only the metadata at the beginning of the file whenever the siafile is not already cached due to steps 1 and 2. That means we can use the existing siafiles' metadata as a distributed cache.
I think this is a smart solution since it adds siafile caching, makes loaded siafiles consume a lot less memory, reuses existing and well-tested persistence structures, and doesn't require touching the health/repair loop or adding any bloat.