@nick.thomas When we ultimately move Snippets to Git backed repos vs. the database would that impact the concerns/changes required around this? As that's the best path forward anyway, maybe this helps provide a little fuel to get us there.
The new retrying logic might solve this, or at least lessen the impact. Should we still limit indexing to the first 1MB or so anyway?
As for limiting snippets in general: it doesn't look like we have a limit on `snippets.content` on the Rails side, and in PostgreSQL the limit appears to be 1 GB, which sounds ripe for abuse. FWIW, I tried uploading a ~100 MB snippet through the web UI but just got a 502 error. Not sure if that was a timeout or a deliberate blocking of the request?
GitHub Gist's maximum is not documented, but it is probably GitHub's maximum file size, which is 100 MB. (As a side note, its file upload is restricted to 25 MB.)
I think the primary function of snippets is to share text, not to share large files. So I would say we can have a configurable setting, defaulting to an arbitrary value like 10 MB.
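A minimal sketch of what such a guard could look like, assuming a hypothetical default constant and helper (these names are illustrative, not GitLab's actual API or settings):

```ruby
# Hypothetical snippet size guard. The constant and method names are
# assumptions for illustration; a real implementation would read the
# limit from application settings.
DEFAULT_MAX_SNIPPET_SIZE = 10 * 1024 * 1024 # 10 MB default, configurable

def snippet_too_large?(content, max_size: DEFAULT_MAX_SNIPPET_SIZE)
  # Compare byte size, not character count, since content may be multibyte.
  content.bytesize > max_size
end

snippet_too_large?("a" * 1024)                    # => false, well under the limit
snippet_too_large?("a" * (11 * 1024 * 1024))      # => true, over 10 MB
```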
Back to indexing, I want to propose that we:

1. introduce a maximum index size similar to the max highlight size (1 MB), and document it.
2. move large snippet indexing out of bulk indexing, and instead use a separate job to index on a per-snippet basis.
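The first point above could be sketched as a simple truncation before indexing; the constant and helper here are assumptions, not existing GitLab code:

```ruby
# Illustrative truncation of snippet content before indexing, mirroring
# the 1 MB highlight limit. Names are hypothetical.
MAX_INDEXED_SIZE = 1024 * 1024 # 1 MiB

def indexable_content(content)
  # byteslice keeps only the first MAX_INDEXED_SIZE bytes; anything
  # beyond that simply isn't searchable.
  content.byteslice(0, MAX_INDEXED_SIZE)
end

indexable_content("a" * (2 * 1024 * 1024)).bytesize # => 1048576
```

One caveat with a byte-based cut is that it can split a multibyte character at the boundary, so a real implementation would want to scrub or re-encode the truncated string before sending it to the indexer.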
I think indexing only up to the first 1MiB is a fine iteration. We could make the max size configurable in future, but we'd want it to apply to repo blobs as well as to snippets, I think.
I don't think we want to support arbitrarily large snippets. That could be painful.
Agreed with @nick.thomas here. I think it makes sense to hard-code a specific value as part of addressing this, and then we can look into introducing additional configuration in a later iteration.
I'm supportive of the similar 1MiB value as well to begin with.