Make GeoLogCursor Highly Available
Closes https://gitlab.com/gitlab-org/gitlab-ee/issues/2917
- Refactored GeoLogCursor (GLC) logging so it's easy to see the PID of the GLC using a central logging mechanism.
- GLCs now wait a random delay before fetching events. Only one GLC can process events at a time, but any of them can pick the events up.
- One GLC stays active as long as it is running fine. If there's a network error or an Exception, another GLC will take over and become active.
- Updated `LEASE_TIMEOUT` from 5 minutes to 30 seconds.
- The random wait between fetching events is set to 0.1 to 2 seconds.
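The behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this merge request: `InMemoryLease`, `random_wait`, and `run_once` are hypothetical names, and the lease is kept in memory, whereas real GLC processes would need a shared store to contend for it.

```ruby
# Hypothetical in-memory stand-in for an exclusive lease (assumption:
# the real lease lives in a store shared by all GLC processes).
class InMemoryLease
  LEASE_TIMEOUT = 30 # seconds, as stated in the description

  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @holder = nil
    @expires_at = 0.0
    @mutex = Mutex.new
  end

  # Returns true if +owner+ obtained the lease, false if another owner
  # still holds an unexpired lease.
  def try_obtain(owner)
    @mutex.synchronize do
      now = @clock.call
      return false if @holder && now < @expires_at

      @holder = owner
      @expires_at = now + LEASE_TIMEOUT
      true
    end
  end

  # Releases the lease only if +owner+ is the current holder.
  def release(owner)
    @mutex.synchronize { @holder = nil if @holder == owner }
  end
end

MIN_WAIT = 0.1 # seconds
MAX_WAIT = 2.0 # seconds

# Random pause between fetch attempts, so cursors do not retry in lockstep.
def random_wait
  rand(MIN_WAIT..MAX_WAIT)
end

# One iteration of a cursor: process events only while holding the lease,
# and always release it, even if processing raises.
def run_once(lease, owner)
  return :skipped unless lease.try_obtain(owner)

  begin
    yield
    :processed
  ensure
    lease.release(owner)
  end
end
```

A cursor would then loop over `run_once(lease, Process.pid) { ... }` followed by `sleep(random_wait)`. Releasing the lease in `ensure` is what lets another cursor take over after an exception.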
Caveats:
- If the events take a long time to process, the GLC may not renew the lease before it expires (especially with short timeouts), in which case another GLC may process the same batch. This should not impact the events processed, though.
Example - 3 Geo LogCursors running, with a 20-second lease timeout:
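The caveat can be reproduced with a small simulation. The sketch below assumes a hypothetical `ExpiringLease` class with an injectable clock and a `renew` method; neither name comes from this merge request, and the timings are made up for illustration.

```ruby
# Hypothetical lease with an injectable clock, used only to show expiry.
class ExpiringLease
  def initialize(timeout:, clock:)
    @timeout = timeout
    @clock = clock
    @holder = nil
    @expires_at = 0
  end

  # Succeeds when the lease is free or the previous lease has expired.
  def try_obtain(owner)
    now = @clock.call
    return false if @holder && now < @expires_at

    @holder = owner
    @expires_at = now + @timeout
    true
  end

  # Extends the lease if +owner+ still holds it and it has not expired.
  def renew(owner)
    now = @clock.call
    return false unless @holder == owner && now < @expires_at

    @expires_at = now + @timeout
    true
  end
end

now = 0 # fake clock, in seconds

# Scenario 1: the batch outlives the lease and nobody renews it.
lease = ExpiringLease.new(timeout: 20, clock: -> { now })
lease.try_obtain(:glc_a)            # A starts a batch at t=0
now = 25                            # A is still busy past the timeout
overlap = lease.try_obtain(:glc_b)  # B gets the lease while A is working

# Scenario 2: A renews in time, so B is refused.
now = 100
renewed_lease = ExpiringLease.new(timeout: 20, clock: -> { now })
renewed_lease.try_obtain(:glc_a)    # lease valid until t=120
now = 115
renewed_lease.renew(:glc_a)         # lease now valid until t=135
now = 125
refused = !renewed_lease.try_obtain(:glc_b)
```

In the first scenario both cursors end up working on the same batch, which is the situation the caveat describes.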
{"severity":"INFO","time":"2017-11-10T11:19:22.644Z","pid":78473,"class":"Daemon","message":"Lease obtained. Fetching events."}
{"severity":"INFO","time":"2017-11-10T11:19:32.482Z","pid":78733,"class":"Lease","message":"Cannot obtain an exclusive lease. There must be another process already in execution."}
{"severity":"INFO","time":"2017-11-10T11:19:32.943Z","pid":78752,"class":"Lease","message":"Cannot obtain an exclusive lease. There must be another process already in execution."}
{"severity":"INFO","time":"2017-11-10T11:19:38.650Z","pid":78473,"class":"Daemon","message":"Lease released. Finished fetching events."}
{"severity":"INFO","time":"2017-11-10T11:19:42.949Z","pid":78752,"class":"Daemon","message":"Lease obtained. Fetching events."}
{"severity":"INFO","time":"2017-11-10T11:19:43.658Z","pid":78473,"class":"Lease","message":"Cannot obtain an exclusive lease. There must be another process already in execution."}
{"severity":"INFO","time":"2017-11-10T11:19:52.486Z","pid":78733,"class":"Lease","message":"Cannot obtain an exclusive lease. There must be another process already in execution."}
{"severity":"INFO","time":"2017-11-10T11:19:56.953Z","pid":78752,"class":"Daemon","message":"Lease released. Finished fetching events."}
{"severity":"INFO","time":"2017-11-10T11:20:00.956Z","pid":78752,"class":"Daemon","message":"Lease obtained. Fetching events."}
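Log lines in this shape could be produced by a small helper such as the one below. It is a sketch under assumptions: `log_line` is a hypothetical function, not the central logging mechanism added in this merge request, and only the field names and order are taken from the sample output.

```ruby
require 'json'
require 'time'

# Hypothetical helper: formats one log event as a single JSON line,
# including the PID so each cursor can be told apart.
def log_line(severity, klass, message, pid: Process.pid, time: Time.now)
  {
    severity: severity,
    time: time.utc.iso8601(3), # millisecond precision, UTC
    pid: pid,
    class: klass,
    message: message
  }.to_json
end

puts log_line('INFO', 'Daemon', 'Lease obtained. Fetching events.')
```

Passing `pid:` and `time:` explicitly makes the output deterministic, which is convenient in tests.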
Timeline of events:
- GLC with PID 78473 obtains the lease and fetches events
- Other GLCs attempt to get the lease but it's not available (78473 is still processing events)
- GLC 78473 releases the lease
- GLC with PID 78752 obtains the lease and fetches events
- Other GLCs attempt to get the lease but it's not available (78752 is still processing events)
- GLC 78752 releases the lease
- GLC 78752 obtains the lease and fetches events
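The timeline can also be derived mechanically from the log. The following sketch uses a hypothetical `lease_intervals` helper that pairs each "Lease obtained" line with the matching "Lease released" line for the same PID; the four log lines are copied from the example above.

```ruby
require 'json'
require 'time'

LOG = <<~LINES
  {"severity":"INFO","time":"2017-11-10T11:19:22.644Z","pid":78473,"class":"Daemon","message":"Lease obtained. Fetching events."}
  {"severity":"INFO","time":"2017-11-10T11:19:38.650Z","pid":78473,"class":"Daemon","message":"Lease released. Finished fetching events."}
  {"severity":"INFO","time":"2017-11-10T11:19:42.949Z","pid":78752,"class":"Daemon","message":"Lease obtained. Fetching events."}
  {"severity":"INFO","time":"2017-11-10T11:19:56.953Z","pid":78752,"class":"Daemon","message":"Lease released. Finished fetching events."}
LINES

# Returns [[pid, seconds_held], ...], one entry per obtain/release pair.
def lease_intervals(log)
  started = {}
  intervals = []
  log.each_line do |line|
    event = JSON.parse(line)
    at = Time.iso8601(event['time'])
    case event['message']
    when /\ALease obtained/
      started[event['pid']] = at
    when /\ALease released/
      from = started.delete(event['pid'])
      intervals << [event['pid'], (at - from).round(3)] if from
    end
  end
  intervals
end

lease_intervals(LOG).each do |pid, seconds|
  puts "GLC #{pid} held the lease for #{seconds}s"
end
```

For the sample, each cursor holds the lease for less than the 20-second timeout, so no overlap occurs.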
- Changelog entry added, if necessary
- Documentation created/updated
- API support added
- Tests added for this feature/bug
- Review
  - Has been reviewed by UX
  - Has been reviewed by Frontend
  - Has been reviewed by Backend
  - Has been reviewed by Database
- Conform by the merge request performance guides
- Conform by the style guides
- Squashed related commits together
- Internationalization required/considered
- If paid feature, have we considered GitLab.com plan and how it works for groups and is there a design for promoting it to users who aren't on the correct plan
Edited by James Lopez