Skip to content
GitLab
Next
    • Why GitLab
    • Pricing
    • Contact Sales
    • Explore
  • Why GitLab
  • Pricing
  • Contact Sales
  • Explore
  • Sign in
  • Get free trial
  • GitLab.orgGitLab.org
  • GitLabGitLab
  • Issues
  • #323918

The Namespaces::InProductMarketingEmailsWorker is consistently crashing

Update Issue closed in favour of the epic: &5594

From analysis in Sentry it appears the Namespaces::InProductMarketingEmailsWorker has been crashing almost every day since February 9th. The worker has been enabled since February 2nd. The only days the worker has not emitted an error is from 2-8 February, February 13th and March 3rd. A crashing worker doesn't necessarily mean no emails have been sent that day.

It seems the worker is crashing on 4 different queries (see also the Kibana logs for the last 7 days here):

https://sentry.gitlab.net/gitlab/gitlabcom/issues/2482770/events/?query=is%3Aunresolved%20InProductMarketingEmailsWorker (13 events)

SQL
WITH RECURSIVE "base_and_ancestors" AS (
  (
    SELECT
      "namespaces".*
    FROM
      "namespaces"
      INNER JOIN "onboarding_progresses" ON "onboarding_progresses"."namespace_id" = "namespaces"."id"
    WHERE
      "namespaces"."type" = $ 1
      AND "onboarding_progresses"."git_write_at" BETWEEN $ 2
      AND $ 3
      AND "onboarding_progresses"."pipeline_created_at" IS NULL
      AND "namespaces"."id" = $ 4
  )
  UNION
  (
    SELECT
      "namespaces".*
    FROM
      "namespaces",
      "base_and_ancestors"
      INNER JOIN "onboarding_progresses" ON "onboarding_progresses"."namespace_id" = "namespaces"."id"
    WHERE
      "namespaces"."type" = $ 5
      AND "onboarding_progresses"."git_write_at" BETWEEN $ 6
      AND $ 7
      AND "onboarding_progresses"."pipeline_created_at" IS NULL
      AND "namespaces"."id" = "base_and_ancestors"."parent_id"
  )
)
SELECT
  "namespaces".*
FROM
  "base_and_ancestors" AS "namespaces"
  INNER JOIN "onboarding_progresses" ON "onboarding_progresses"."namespace_id" = "namespaces"."id"
WHERE
  "namespaces"."parent_id" IS NULL
LIMIT
  $ 8

https://sentry.gitlab.net/gitlab/gitlabcom/issues/2488025/events/?query=is%3Aunresolved%20InProductMarketingEmailsWorker (12 events)

SQL
WITH RECURSIVE "base_and_ancestors" AS (
  (
    SELECT
      "namespaces".*
    FROM
      "namespaces"
      INNER JOIN "onboarding_progresses" ON "onboarding_progresses"."namespace_id" = "namespaces"."id"
    WHERE
      "namespaces"."type" = $ 1
      AND "onboarding_progresses"."created_at" BETWEEN $ 2
      AND $ 3
      AND "onboarding_progresses"."git_write_at" IS NULL
      AND "namespaces"."id" = $ 4
  )
  UNION
  (
    SELECT
      "namespaces".*
    FROM
      "namespaces",
      "base_and_ancestors"
      INNER JOIN "onboarding_progresses" ON "onboarding_progresses"."namespace_id" = "namespaces"."id"
    WHERE
      "namespaces"."type" = $ 5
      AND "onboarding_progresses"."created_at" BETWEEN $ 6
      AND $ 7
      AND "onboarding_progresses"."git_write_at" IS NULL
      AND "namespaces"."id" = "base_and_ancestors"."parent_id"
  )
)
SELECT
  "namespaces".*
FROM
  "base_and_ancestors" AS "namespaces"
  INNER JOIN "onboarding_progresses" ON "onboarding_progresses"."namespace_id" = "namespaces"."id"
WHERE
  "namespaces"."parent_id" IS NULL
LIMIT
  $ 8

https://sentry.gitlab.net/gitlab/gitlabcom/issues/2489971/events/?query=is%3Aunresolved%20InProductMarketingEmailsWorker (1 event)

SQL
WITH RECURSIVE "base_and_ancestors" AS (
  (
    SELECT
      "namespaces".*
    FROM
      "namespaces"
      INNER JOIN "onboarding_progresses" ON "onboarding_progresses"."namespace_id" = "namespaces"."id"
    WHERE
      "namespaces"."type" = $ 1
      AND "onboarding_progresses"."git_write_at" IS NOT NULL
      AND "onboarding_progresses"."pipeline_created_at" IS NOT NULL
      AND GREATEST(
        "onboarding_progresses"."git_write_at",
        "onboarding_progresses"."pipeline_created_at"
      ) BETWEEN $ 2
      AND $ 3
      AND "onboarding_progresses"."trial_started_at" IS NULL
      AND "namespaces"."id" = $ 4
  )
  UNION
  (
    SELECT
      "namespaces".*
    FROM
      "namespaces",
      "base_and_ancestors"
      INNER JOIN "onboarding_progresses" ON "onboarding_progresses"."namespace_id" = "namespaces"."id"
    WHERE
      "namespaces"."type" = $ 5
      AND "onboarding_progresses"."git_write_at" IS NOT NULL
      AND "onboarding_progresses"."pipeline_created_at" IS NOT NULL
      AND GREATEST(
        "onboarding_progresses"."git_write_at",
        "onboarding_progresses"."pipeline_created_at"
      ) BETWEEN $ 6
      AND $ 7
      AND "onboarding_progresses"."trial_started_at" IS NULL
      AND "namespaces"."id" = "base_and_ancestors"."parent_id"
  )
)
SELECT
  "namespaces".*
FROM
  "base_and_ancestors" AS "namespaces"
  INNER JOIN "onboarding_progresses" ON "onboarding_progresses"."namespace_id" = "namespaces"."id"
WHERE
  "namespaces"."parent_id" IS NULL
LIMIT
  $ 8

https://sentry.gitlab.net/gitlab/gitlabcom/issues/2497046/events/?query=is%3Aunresolved%20InProductMarketingEmailsWorker (1 event)

SQL
INSERT INTO
  "experiment_subjects" (
    "experiment_id",
    "group_id",
    "created_at",
    "updated_at"
  )
VALUES
  ($ 1, $ 2, $ 3, $ 4) RETURNING "id"

The last event could have been a race condition where a namespace was deleted before it could be recorded in experiment_subjects. We need to guard against such a failure. Currently it crashes the whole worker.

In the other events, somehow an incorrect query is created. I could not find out how this is possible, since when running the worker locally, those queries are not executed. It seems somewhere the root_namespace is being queried in the can_perform_action? method before any of the groups have been materialized (so the ActiveRecord relation is expanded to include root_namespaces?)

Besides fixing these wrong queries, in order to better guard against these kind of failures, it might be necessary to fire a separate worker for each namespace, so when a failure occurs in one namespace, at least emails will still be send for other namespaces. the only problem is currently we keep an in memory list of user_ids for which we have sent an email already, so as not to send the same user 2 emails on the same day. We need to find another solution for that problem.

Edited Mar 11, 2021 by Nicolas Dular
Assignee
Assign to
Time tracking