[SE-5278] [SE-5279] feat: initial implementation
Description
This PR implements the initial version of the monitoring_extensions package, which will be used as a pluggable extension in OpenCraft's monitoring.
This Django Plugin App enriches the NewRelic-based monitoring with custom events. Event collection is triggered by a management command, send_monitoring_events, that is called by a cron job or by a celerybeat scheduled task, monitoring_extensions.tasks.send_monitoring_events.
Currently the following (configurable) events are sent to NewRelic:
- Celery queue length metrics (CeleryTaskCount)
- Celery (scheduled) task execution (CeleryTaskExecution)
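As a rough illustration of the queue length event, the sketch below counts pending messages per queue with Redis LLEN and shapes them into CeleryTaskCount payloads. It is not the package's actual code: the helper names, the payload fields (queue, length), and the queue names are hypothetical, and FakeRedis is only an in-memory stand-in so the example runs without a Redis server.

```python
from typing import Dict, Iterable, List, Protocol, Tuple


class SupportsLlen(Protocol):
    """Anything exposing llen(name), e.g. a redis-py client."""

    def llen(self, name: str) -> int: ...


def collect_queue_lengths(client: SupportsLlen, queue_names: Iterable[str]) -> Dict[str, int]:
    """Return the number of pending messages for each named queue."""
    return {name: int(client.llen(name)) for name in queue_names}


def build_events(queue_lengths: Dict[str, int]) -> List[Tuple[str, dict]]:
    """Shape the lengths into (event_type, params) pairs, sorted by queue name.

    The "queue" and "length" field names are assumptions for illustration.
    """
    return [
        ("CeleryTaskCount", {"queue": name, "length": length})
        for name, length in sorted(queue_lengths.items())
    ]


class FakeRedis:
    """In-memory stand-in for a Redis client (hypothetical, for the example only)."""

    def __init__(self, queues: Dict[str, list]):
        self._queues = queues

    def llen(self, name: str) -> int:
        # Like Redis, report 0 for a key that does not exist.
        return len(self._queues.get(name, []))


if __name__ == "__main__":
    # Example queue names; the real names depend on the deployment.
    client = FakeRedis({"edx.lms.core.default": ["t1", "t2"], "edx.cms.core.default": []})
    lengths = collect_queue_lengths(client, ["edx.lms.core.default", "edx.cms.core.default"])
    for event_type, params in build_events(lengths):
        print(event_type, params)
```

With a real deployment, a redis-py client could be passed in place of FakeRedis, and each pair could be handed to newrelic.agent.record_custom_event(event_type, params); whether the package does it this way is an assumption.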
Supporting information
During a recent outage, it turned out that Ocim-controlled instances have no queue monitoring. We therefore need a solution for monitoring Celery queue length and task execution.
Dependencies
N/A
Sandbox
Please use the ESME sandbox for testing.
Note: that sandbox does not have many tasks. If the management command succeeds, consider the command working. For more testing information, see the next section.
Testing instructions
Note: the app was installed in the sandbox manually, which means the cron job is not set up.
Test the management command:
- SSH into the sandbox app and run the following commands:
source /edx/app/edxapp/edxapp_env
cd /edx/app/edxapp/edx-platform/
sudo -E -u edxapp env "PATH=$PATH" /edx/app/edxapp/venvs/edxapp/bin/python manage.py lms send_monitoring_events
- Go to NewRelic's alert conditions
- Check for the Celerybeat task executions and Celery queue overload conditions
Test the celery beat scheduled task:
Note: the task is already prepared, you don't need to set it up.
- Log in to <LMS_URL>/admin/
- Navigate to <LMS_URL>/admin/django_celery_beat/periodictask/
- Check for the manually pre-configured Celerybeat execution monitoring: every 5 minutes schedule
- Check for the manually pre-configured Celery queue length monitoring: every 5 minutes schedule
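For reference, a five-minute schedule like the ones above could also be declared statically in settings instead of through the django_celery_beat admin. This is only a sketch under assumptions: the entry name is hypothetical, the task is assumed to take no arguments, and the CELERY_BEAT_SCHEDULE setting name assumes the Celery app loads its Django settings with the CELERY namespace. The sandbox itself uses the pre-configured periodic tasks.

```python
from datetime import timedelta

# Interval taken from the "every 5 minutes" schedules described above.
MONITORING_INTERVAL = timedelta(minutes=5)

# Hypothetical static beat schedule; the entry key is an arbitrary label.
CELERY_BEAT_SCHEDULE = {
    "send-monitoring-events": {
        "task": "monitoring_extensions.tasks.send_monitoring_events",
        "schedule": MONITORING_INTERVAL,
    },
}
```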
Validating that the monitoring returns the correct results:
- Log in to the staging instance
- Start an LMS Django shell
- Run the script below
- Exit from the shell and run redis-cli
- Execute KEYS *
-- If you cannot see the lms/cms keys (the ones not prefixed with _kombu), that's because the queues got emptied; to check the length of a queue, run LLEN <QUEUE NAME>
Script:
from monitoring_extensions.events import CeleryTaskCountReporter
CeleryTaskCountReporter().collect_data()
Deadline
ASAP