
Make GoogleCloudProfiler configurable across env

Tan Le requested to merge tle-configurable-google-profiler into main

What does this MR do and why?

The model-gateway in the ai-assist-test cluster logs a lot of GoogleCloudProfiler errors, mostly retries and back-offs after `[Errno 32] Broken pipe`. We want the profiler turned on only in production, not in other environments.

{"logger": "googleapiclient.http", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-05-31T14:52:04.159891Z", "message": "Sleeping 1.12 seconds before retry 1 of 3 for request: PATCH https://cloudprofiler.googleapis.com/v2/projects/unreview-poc-390200e5/profiles/3afabea7f1aeb820?alt=json, after [Errno 32] Broken pipe"}
{"logger": "googleapiclient.http", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-05-31T15:51:34.543334Z", "message": "Sleeping 0.30 seconds before retry 1 of 3 for request: PATCH https://cloudprofiler.googleapis.com/v2/projects/unreview-poc-390200e5/profiles/0d922bc444004426?alt=json, after [Errno 32] Broken pipe"}
{"logger": "googlecloudprofiler.backoff", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-05-31T16:52:10.671621Z", "message": "Agent will back off for 3.223 seconds due to [Errno 32] Broken pipe"}
{"logger": "googlecloudprofiler.backoff", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-05-31T18:35:10.618760Z", "message": "Agent will back off for 9.881 seconds due to [Errno 32] Broken pipe"}
{"logger": "googlecloudprofiler.backoff", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-05-31T20:03:15.075865Z", "message": "Agent will back off for 5.243 seconds due to [Errno 32] Broken pipe"}
{"logger": "googlecloudprofiler.backoff", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-05-31T21:29:57.720975Z", "message": "Agent will back off for 9.167 seconds due to [Errno 32] Broken pipe"}
{"logger": "googlecloudprofiler.backoff", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-05-31T22:40:40.511706Z", "message": "Agent will back off for 6.790 seconds due to [Errno 32] Broken pipe"}

This MR makes the profiler configurable through the GOOGLE_CLOUD_PROFILER environment variable, so it can be turned off in the test environment.
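For illustration only, the toggle could look roughly like the sketch below. It is not the code in this MR: the real `setup_profiling` in `codesuggestions/profiling.py` takes `config.profiling` and a logger, whereas this version reads the environment directly. The set of truthy strings, the default of "off" when the variable is unset, the service name, and the injected `start` callable are all assumptions made to keep the example self-contained.

```python
import os
from typing import Callable, Mapping

ENV_VAR = "GOOGLE_CLOUD_PROFILER"
# Assumption: which strings count as "on". The MR only shows "true" and "false".
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def profiler_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the variable is set to a truthy string.

    Assumption: an unset or empty variable means the profiler stays off.
    """
    return env.get(ENV_VAR, "").strip().lower() in _TRUTHY


def setup_profiling(
    start: Callable[..., object],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Call the profiler's start function only if the toggle is on.

    `start` is injected so the sketch runs without Google Cloud credentials;
    in the service it would be `googlecloudprofiler.start`.
    """
    if not profiler_enabled(env):
        return False
    # "model-gateway" is a hypothetical service name used for illustration.
    start(service="model-gateway", verbose=2)
    return True
```

In the service this would be called as `setup_profiling(googlecloudprofiler.start)`. Injecting the start function also makes the toggle easy to unit test without reaching the Cloud Profiler API.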

How to test locally

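Build a local Docker image and run it with different GOOGLE_CLOUD_PROFILER values. With `false`, the server starts without touching the profiler. With `true`, the profiler is started and, on a machine without Google Cloud credentials, fails with `DefaultCredentialsError`, which confirms that the toggle is respected.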

$ docker run --platform linux/amd64 --rm -p 5999:5052 -e GOOGLE_CLOUD_PROFILER=false -v $PWD:/app -it code-suggestions-api:dev
{"logger": "uvicorn.error", "level": "info", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:18.267329Z", "message": "Started server process [1]"}
{"logger": "uvicorn.error", "level": "info", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:18.268172Z", "message": "Waiting for application startup."}
{"logger": "uvicorn.error", "level": "info", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:18.290704Z", "message": "Metrics HTTP server running on http://0.0.0.0:8082"}
{"logger": "uvicorn.error", "level": "info", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:18.297978Z", "message": "Application startup complete."}
{"logger": "uvicorn.error", "level": "info", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:18.304282Z", "message": "Uvicorn running on http://0.0.0.0:5052 (Press CTRL+C to quit)"}
^C{"logger": "uvicorn.error", "level": "info", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:19.723480Z", "message": "Shutting down"}
{"logger": "uvicorn.error", "level": "info", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:19.828105Z", "message": "Waiting for application shutdown."}
{"logger": "uvicorn.error", "level": "info", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:19.834196Z", "message": "Application shutdown complete."}
{"logger": "uvicorn.error", "level": "info", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:19.834384Z", "message": "Finished server process [1]"}

$ docker run --platform linux/amd64 --rm -p 5999:5052 -e GOOGLE_CLOUD_PROFILER=true -v $PWD:/app -it code-suggestions-api:dev
{"logger": "google.auth.compute_engine._metadata", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:26.705105Z", "message": "Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: [Errno 111] Connection refused"}
{"logger": "google.auth.compute_engine._metadata", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:26.706447Z", "message": "Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: [Errno 111] Connection refused"}
{"logger": "google.auth.compute_engine._metadata", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:26.707050Z", "message": "Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: [Errno 111] Connection refused"}
{"logger": "google.auth._default", "level": "warning", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:26.707340Z", "message": "Authentication failed using Compute Engine authentication due to unavailable metadata server."}
{"logger": "root", "level": "error", "type": "mlops", "stage": "main", "timestamp": "2023-06-05T01:38:26.707938Z", "message": "Uncaught exception", "exception": "Traceback (most recent call last):\n  File \"<string>\", line 1, in <module>\n  File \"/app/codesuggestions/app.py\", line 43, in main\n    setup_profiling(config.profiling, log)\n  File \"/app/codesuggestions/profiling.py\", line 12, in setup_profiling\n    googlecloudprofiler.start(\n  File \"/opt/venv/codesuggestions-9TtSrW0h-py3.9/lib/python3.9/site-packages/googlecloudprofiler/__init__.py\", line 125, in start\n    project_id = profiler_client.setup_auth(project_id, service_account_json_file)\n  File \"/opt/venv/codesuggestions-9TtSrW0h-py3.9/lib/python3.9/site-packages/googlecloudprofiler/client.py\", line 129, in setup_auth\n    self._credentials, credentials_project_id = google.auth.default(\n  File \"/opt/venv/codesuggestions-9TtSrW0h-py3.9/lib/python3.9/site-packages/google/auth/_default.py\", line 648, in default\n    raise exceptions.DefaultCredentialsError(_CLOUD_SDK_MISSING_CREDENTIALS)\ngoogle.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information."}
Edited by Tan Le
