Pass configurable file size limit to gitlab-elasticsearch-indexer
This introduces a new configuration parameter to the Gitaly configuration to set the file size limit. This was previously hardcoded in here as `1024 * 1024`. To be backward compatible, the configuration is optional and defaults to `1024 * 1024` as before. Passing the configuration through allows us to make this value configurable from GitLab and also ensures it's consistent with the GitLab code highlighting limit per gitlab#219329 (closed).
The GitLab side of this change is at gitlab!36925 (merged).
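As a rough illustration of the optional, backward-compatible setting described above, here is a minimal sketch. It assumes the Gitaly configuration arrives as JSON; the `gitalyConfig` struct, the `limit_file_size` key, and the `parseConfig` helper are hypothetical names chosen for this example and are not taken from the actual change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultLimitFileSize mirrors the previously hardcoded value of 1024 * 1024 bytes.
const defaultLimitFileSize int64 = 1024 * 1024

// gitalyConfig is a hypothetical stand-in for the Gitaly configuration.
// The JSON key names are assumptions made for this sketch.
type gitalyConfig struct {
	Address       string `json:"address"`
	LimitFileSize int64  `json:"limit_file_size"`
}

// parseConfig decodes the configuration and falls back to the old default
// when the limit is missing, so existing callers keep working unchanged.
func parseConfig(raw string) (*gitalyConfig, error) {
	cfg := &gitalyConfig{}
	if err := json.Unmarshal([]byte(raw), cfg); err != nil {
		return nil, err
	}
	// A missing key decodes to zero; this sketch treats zero or negative as "unset".
	if cfg.LimitFileSize <= 0 {
		cfg.LimitFileSize = defaultLimitFileSize
	}
	return cfg, nil
}

func main() {
	withoutLimit, _ := parseConfig(`{"address": "unix:/tmp/gitaly.socket"}`)
	fmt.Println(withoutLimit.LimitFileSize) // 1048576

	withLimit, _ := parseConfig(`{"address": "unix:/tmp/gitaly.socket", "limit_file_size": 2097152}`)
	fmt.Println(withLimit.LimitFileSize) // 2097152
}
```

Treating a zero value as "not set" keeps older callers that never send the key on the previous behaviour, at the cost of not being able to express a limit of zero.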
In order to make this work, a couple of refactorings were needed because this code relied on a global `const` for this configuration value that was even used at `init` time.
The main refactorings are:
- The `Indexer` struct now has an `Encoder` field. This is used to construct an `Encoder` in `main.go` after we know the file size limit, because the `Encoder` needs to use this to pre-allocate memory for encoding.
- `indexer/encoding.go` no longer initializes the encoder in `init`, since we need to wait until after we have the configuration value.
- The `Repository` type now has `GetLimitFileSize` (along with `gitalyRepository` having a field in its struct) to expose the file size limit. The Gitaly config/repository type was chosen because this is the first place we need to know the limit, so that we avoid fetching the large files from Gitaly.
- The `repository.File` struct now has an extra boolean field `SkipTooLarge`, and this value is used by the `elastic` client to avoid indexing files that are too large, rather than the previous approach of checking whether `.Size` is greater than the limit. Otherwise we'd have to pass the limit through to this class as well, and the struct may as well just store the fact that the file is too large rather than checking limits in multiple places.
- `repository.File` no longer has a `Size` field, since this was seemingly only used to check whether it exceeded the size limit, which is now replaced by the `SkipTooLarge` boolean.
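The shape of these refactorings can be sketched roughly as follows. This is a simplified illustration, not the code from this change: only `Indexer`, `Encoder`, `Repository`, `GetLimitFileSize`, `File`, and `SkipTooLarge` come from the description above, while `fakeRepository`, `NewEncoder`, `buildFile`, `shouldIndex`, and all field layouts are assumptions made for the example.

```go
package main

import "fmt"

// File no longer carries a Size; it only records whether it was too large.
type File struct {
	Path         string
	Content      []byte
	SkipTooLarge bool
}

// Repository exposes the configured limit to its callers.
type Repository interface {
	GetLimitFileSize() int64
}

// fakeRepository is a hypothetical stand-in for gitalyRepository.
type fakeRepository struct{ limitFileSize int64 }

func (r *fakeRepository) GetLimitFileSize() int64 { return r.limitFileSize }

// Encoder pre-allocates a buffer sized to the limit, so it can only be
// built once the limit is known (assumed constructor, not the real one).
type Encoder struct{ buf []byte }

func NewEncoder(limitFileSize int64) *Encoder {
	return &Encoder{buf: make([]byte, 0, limitFileSize)}
}

// Indexer holds the Encoder as a field instead of relying on a global
// that was set up in init.
type Indexer struct {
	Repository Repository
	Encoder    *Encoder
}

// buildFile checks the limit in one place and skips fetching the content
// of blobs that exceed it.
func buildFile(repo Repository, path string, size int64, fetch func() []byte) *File {
	if size > repo.GetLimitFileSize() {
		return &File{Path: path, SkipTooLarge: true}
	}
	return &File{Path: path, Content: fetch()}
}

// shouldIndex is what a client would check; it needs no knowledge of the limit.
func shouldIndex(f *File) bool { return !f.SkipTooLarge }

func main() {
	repo := &fakeRepository{limitFileSize: 10}
	idx := &Indexer{Repository: repo, Encoder: NewEncoder(repo.GetLimitFileSize())}
	fmt.Println(cap(idx.Encoder.buf)) // 10

	small := buildFile(repo, "a.txt", 5, func() []byte { return []byte("hello") })
	large := buildFile(repo, "b.bin", 11, func() []byte { return nil })
	fmt.Println(shouldIndex(small), shouldIndex(large)) // true false
}
```

Because the size comparison happens once, where the blob metadata is first available, the client that writes to Elasticsearch only needs to read the boolean.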
This solves:
Edited by Dylan Griffith