
[SE-5278] [SE-5279] feat: initial implementation

Boros Gábor requested to merge gabor/add-initial-implementation into main

Description

This PR implements the initial version of the monitoring_extensions package, which will be used as a pluggable extension in OpenCraft's monitoring.

This Django Plugin App enriches the NewRelic-based monitoring with custom events. Event collection is triggered by a management command, send_monitoring_events, which is called either by a cron job or by a celerybeat scheduled task, monitoring_extensions.tasks.send_monitoring_events.

Currently the following (configurable) events are sent to NewRelic:

  • Celery queue length metrics (CeleryTaskCount)
  • Celery (scheduled) task execution (CeleryTaskExecution)
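
The flow described above (a command asks each configured reporter for data and forwards the result to NewRelic) might be sketched as follows. This is a minimal illustration, not the package's actual code: `QueueLengthReporter`, the plain `send_monitoring_events` function, `InMemoryBroker`, the `queue`/`length` field names and the queue names are all assumptions; only the `CeleryTaskCount` event name and the `collect_data()` method name come from this MR. It relies on the fact that, with a Redis broker, Celery keeps each queue as a Redis list keyed by the queue name.

```python
class QueueLengthReporter:
    """Hypothetical reporter: one event per Celery queue with its current length."""

    event_type = "CeleryTaskCount"  # event name taken from the MR description

    def __init__(self, client, queue_names):
        self.client = client            # anything exposing llen(name), e.g. redis.Redis
        self.queue_names = queue_names  # assumed to come from configuration

    def collect_data(self):
        # LLEN on the queue's list gives the number of pending messages.
        return [
            {"queue": name, "length": self.client.llen(name)}
            for name in self.queue_names
        ]


def send_monitoring_events(reporters, record_event):
    """Collect events from every reporter and pass each one to record_event."""
    sent = 0
    for reporter in reporters:
        for params in reporter.collect_data():
            record_event(reporter.event_type, params)
            sent += 1
    return sent


class InMemoryBroker:
    """Stand-in for a Redis client so the sketch runs without a broker."""

    def __init__(self, queues):
        self.queues = queues

    def llen(self, name):
        # Like Redis, report 0 for a key that does not exist.
        return len(self.queues.get(name, []))


# Illustrative queue names; real ones depend on the instance configuration.
broker = InMemoryBroker({"edx.lms.core.default": ["t1", "t2"], "edx.cms.core.default": []})
reporter = QueueLengthReporter(broker, ["edx.lms.core.default", "edx.cms.core.default"])

events = []
sent = send_monitoring_events(
    [reporter],
    lambda event_type, params: events.append((event_type, params)),
)
```

In a real deployment the `record_event` callable could be `newrelic.agent.record_custom_event`, which accepts an event type and a dictionary of parameters.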

Supporting information

A recent outage revealed that Ocim-controlled instances have no queue monitoring, so we need a way to monitor Celery queue length and task execution.

Dependencies

N/A

Sandbox

Please use the ESME sandbox for testing.

Note: the sandbox runs only a few tasks. If the management command succeeds, consider the command working. For more testing information, see the next section.

Testing instructions

Note: the app was installed in the sandbox manually, so the cron job is not set up.

Test the management command:

  1. SSH into the sandbox app
  2. source /edx/app/edxapp/edxapp_env
  3. cd /edx/app/edxapp/edx-platform/
  4. sudo -E -u edxapp env "PATH=$PATH" /edx/app/edxapp/venvs/edxapp/bin/python manage.py lms send_monitoring_events
  5. Go to NewRelic's alert conditions
  6. Check the Celerybeat task executions and Celery queue overload conditions

Test the celery beat scheduled task:

Note: the task is already prepared; you don't need to set it up.

  1. Log in to <LMS_URL>/admin/
  2. Navigate to <LMS_URL>/admin/django_celery_beat/periodictask/
  3. Check for the manually pre-configured Celerybeat execution monitoring: every 5 minutes schedule
  4. Check for the manually pre-configured Celery queue length monitoring: every 5 minutes schedule

Validating that the monitoring returns the correct results:

  1. Log in to the staging instance
  2. Start an LMS django shell
  3. Run the script below
  4. Exit from the shell and run redis-cli
  5. Execute KEYS *. If the lms/cms keys (the ones without the _kombu prefix) are not listed, the queues have been emptied. To check the length of a queue, run LLEN <QUEUE NAME>

Script:

from monitoring_extensions.events import CeleryTaskCountReporter
CeleryTaskCountReporter().collect_data()
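
The manual redis-cli check (KEYS * followed by LLEN per queue) can also be done from Python to compare against the reporter's output. The helper below is a hedged sketch: `list_queue_lengths` is a hypothetical name that is not part of monitoring_extensions, and it assumes a redis-py style client exposing `scan_iter()`, `type()` and `llen()`.

```python
def list_queue_lengths(client, ignored_prefix="_kombu"):
    """Return {key: length} for every Redis list whose name lacks ignored_prefix."""
    lengths = {}
    for raw_key in client.scan_iter():
        # redis-py returns bytes unless decode_responses=True was set.
        key = raw_key.decode() if isinstance(raw_key, bytes) else raw_key
        if key.startswith(ignored_prefix):
            continue  # skip the _kombu-prefixed bookkeeping keys
        key_type = client.type(raw_key)
        if isinstance(key_type, bytes):
            key_type = key_type.decode()
        if key_type == "list":  # Celery queues are stored as Redis lists
            lengths[key] = client.llen(raw_key)
    return lengths
```

With redis-py installed, `list_queue_lengths(redis.Redis())` would run against the local Redis; as in the redis-cli steps, an emptied queue does not appear because Redis deletes empty lists.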

Deadline

ASAP

Edited by Boros Gábor
