
Pass configurable file size limit to gitlab-elasticsearch-indexer

Dylan Griffith requested to merge 219329-pass-file-size-limit-into-indexer into master

This introduces a new, optional parameter in the Gitaly configuration that sets the file size limit. The limit was previously hardcoded as 1024 * 1024. For backward compatibility, the parameter defaults to 1024 * 1024 when omitted. Passing the value through lets us configure it from GitLab and keeps it consistent with the GitLab code highlighting limit, per gitlab#219329 (closed).
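As a rough illustration of the optional-with-default behaviour, here is a minimal sketch. It assumes the Gitaly configuration arrives as JSON; the `gitalyConfig` type, the `limit_file_size` key, and the `parseGitalyConfig` helper are hypothetical names chosen for this example, not necessarily what the indexer uses.

```go
package main

import (
	"encoding/json"
)

// defaultLimitFileSize mirrors the previously hardcoded value (1024 * 1024).
const defaultLimitFileSize int64 = 1024 * 1024

// gitalyConfig is a hypothetical stand-in for the Gitaly configuration.
// The field names and JSON keys are assumptions made for this sketch.
type gitalyConfig struct {
	Address       string `json:"address"`
	LimitFileSize int64  `json:"limit_file_size,omitempty"`
}

// parseGitalyConfig decodes the configuration and falls back to the old
// hardcoded limit when the optional parameter is absent or not positive.
func parseGitalyConfig(raw string) (gitalyConfig, error) {
	var cfg gitalyConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return gitalyConfig{}, err
	}
	if cfg.LimitFileSize <= 0 {
		cfg.LimitFileSize = defaultLimitFileSize
	}
	return cfg, nil
}
```

With this shape, a caller that omits the limit gets the old behaviour, while a caller that sets it gets the configured value.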

The GitLab side of this change is at gitlab!36925 (merged).

Making this work required a couple of refactorings, because the code relied on a global const for this value, and that const was even used at init time.

The main refactorings are:

  1. The Indexer struct now has an Encoder field. The Encoder is constructed in main.go once the file size limit is known, because it uses the limit to pre-allocate memory for encoding.
  2. indexer/encoding.go no longer initializes the encoder in init, since we need to wait until the configuration value is available.
  3. The Repository type now has GetLimitFileSize (backed by a new field on the gitalyRepository struct) to expose the file size limit. The Gitaly config/repository type was chosen because it is the first place that needs the limit, so that large files are not fetched from Gitaly.
  4. The repository.File struct now has an extra boolean field, SkipTooLarge, which the elastic client uses to avoid indexing files that are too large. Previously it checked whether .Size was greater than the limit. Keeping that approach would mean passing the limit through to the elastic client as well; it is simpler for the struct to record that the file is too large than to check the limit in multiple places.
  5. repository.File no longer has a Size field, since it was seemingly only used to check against the size limit, which the SkipTooLarge boolean now covers.
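The shape of these refactorings can be sketched roughly as follows. This is a simplified illustration, not the code from this merge request: `NewEncoder`, `fakeRepository`, `buildFile`, and `shouldIndex` are hypothetical helpers, and only the names Encoder, Repository, GetLimitFileSize, File, and SkipTooLarge come from the description above.

```go
package main

// Encoder is a stand-in for the indexer's encoder. It is built after the
// limit is known so it can pre-allocate its buffer, instead of in init.
type Encoder struct {
	buf []byte
}

// NewEncoder (hypothetical constructor) pre-allocates capacity for the
// largest file that will be encoded.
func NewEncoder(limitFileSize int64) *Encoder {
	return &Encoder{buf: make([]byte, 0, limitFileSize)}
}

// Repository exposes the configured limit, as described in item 3.
type Repository interface {
	GetLimitFileSize() int64
}

// fakeRepository is an in-memory stand-in for gitalyRepository.
type fakeRepository struct {
	limitFileSize int64
}

func (r fakeRepository) GetLimitFileSize() int64 { return r.limitFileSize }

// File records whether it was skipped rather than carrying a Size field.
type File struct {
	Path         string
	Content      []byte
	SkipTooLarge bool
}

// buildFile (hypothetical) decides once, at the repository layer, whether a
// blob is too large. Oversized blobs are never fetched.
func buildFile(repo Repository, path string, size int64, fetch func() []byte) File {
	if size > repo.GetLimitFileSize() {
		return File{Path: path, SkipTooLarge: true}
	}
	return File{Path: path, Content: fetch()}
}

// shouldIndex (hypothetical) is all the elastic client needs to check; it
// does not need to know the limit itself.
func shouldIndex(f File) bool {
	return !f.SkipTooLarge
}
```

The point of the design is that the limit is consulted in exactly one place, and downstream consumers only read the resulting flag.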

This solves:

