Write logging to own log file
Background
Logging is currently sent to standard output in text format and returned back to the calling method in Rails. The logs are currently (after gitlab!97519 (merged)) written to elasticsearch.log and sidekiq.log using text format.
Issues
The labkit format does not match what is currently in the elasticsearch.log file. Here's an example of the output (with the formatting set to json). Issues I ran into:
- output from the indexer can be multiple lines from stdout separated by
\n - field names used aren't the same (example:
msgvsmessage,levelvsseverity) - time formatting is different
example json from indexer
{
"severity": "WARN",
"time": "2022-09-09T14:24:56.566Z",
"correlation_id": "01GCHA78SSGW5AXE77HCBAHMC2",
"message": "time=\"2022-09-09T10:24:55-04:00\" level=info msg=\"Setting timeout\" timeout=24h0m0s\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing from fc5fc4123ff9045caae217c0ee676bdcae205c2a to aafdd38ea1d3b2639273ee3223ed4b9c8b220dfb\" IndexNameCommits=gitlab-development-commits IndexNameDefault=gitlab-development Permissions=\"&{20 20}\" blobType=blob projectID=6 skipCommits=false\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing blob change\" operation=PUT path=README.md\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing commit\" commitID=035f6b6a5da8d0b0fe1df4fce65d7eb4b15d5964\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing commit\" commitID=139a74f7cc31ac0d2ab7ed81b4b67e5c7d021d42\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing commit\" commitID=c4351d6addb01d0938419af583f2b5441c31f721\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing commit\" commitID=c13c6584a80a35071358af58c04a400b58593818\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing commit\" commitID=8e2812d5076b8921817132e476ef2b66feb8966c\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing commit\" commitID=684354c9690612fd1c4d58191085bc4afe721c49\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing commit\" commitID=712583978e953f26c4c30e7b3f79d00a4faf3a99\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing commit\" commitID=3fcb35d9352b88aacfe804bc6ba0d7d6a3c57d1d\ntime=\"2022-09-09T10:24:55-04:00\" level=debug msg=\"Indexing commit\" commitID=aafdd38ea1d3b2639273ee3223ed4b9c8b220dfb\n",
"status": 0,
"project_id": 6,
"from_sha": "fc5fc4123ff9045caae217c0ee676bdcae205c2a",
"to_sha": "aafdd38ea1d3b2639273ee3223ed4b9c8b220dfb",
"index_wiki": false
}
I looked into how other GitLab projects use labkit for ideas, but the logging implementations were a lot more complex than the indexer (gitaly) or use their own log file (gitlab-shell).
Proposal
We should expand logging to write to it's own file (gitlab-indexer.log) and to use JSON format using https://gitlab.com/gitlab-org/labkit (specifically the WithContextFields method to include the correlation information.
