update tsvector columns with Celery - optimization
Celery worker updates tsv columns on periodic and ad-hoc basis, which may lead to deadlocks in db:
DeadlockDetected: deadlock detected
DETAIL: Process 4023122 waits for ShareLock on transaction 912438028; blocked by process 4023044.
Process 4023044 waits for ShareLock on transaction 912438135; blocked by process 4023122.
HINT: See server log for query details.
CONTEXT: while updating tuple (200,1) in relation "database_table_165406"
File "django/db/backends/utils.py", line 89, in _execute
return self.cursor.execute(sql, params)
File "/baserow/venv/lib/python3.9/site-packages/opentelemetry/instrumentation/psycopg2/__init__.py", line 249, in execute
return _cursor_tracer.traced_execution(
File "/baserow/venv/lib/python3.9/site-packages/opentelemetry/instrumentation/dbapi/__init__.py", line 460, in traced_execution
return query_method(*args, **kwargs)
OperationalError: deadlock detected
DETAIL: Process 4023122 waits for ShareLock on transaction 912438028; blocked by process 4023044.
Process 4023044 waits for ShareLock on transaction 912438135; blocked by process 4023122.
HINT: See server log for query details.
CONTEXT: while updating tuple (200,1) in relation "database_table_165406"
File "baserow/contrib/database/search/handler.py", line 606, in run_tsvector_update_statement
return cls.split_update_into_chunks_by_ranges(
File "baserow/contrib/database/search/handler.py", line 536, in split_update_into_chunks_by_ranges
total_updated += next_chunk.update(**update_query)
File "django/db/models/query.py", line 1191, in update
rows = query.get_compiler(self.db).execute_sql(CURSOR)
File "django/db/models/sql/compiler.py", line 1822, in execute_sql
cursor = super().execute_sql(result_type)
File "django/db/models/sql/compiler.py", line 1398, in execute_sql
cursor.execute(sql, params)
File "django/db/backends/utils.py", line 67, in execute
return self._execute_with_wrappers(
File "django/db/backends/utils.py", line 80, in _execute_with_wrappers
return executor(sql, params, many, context)
File "django/db/backends/utils.py", line 89, in _execute
return self.cursor.execute(sql, params)
File "django/db/utils.py", line 91, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "django/db/backends/utils.py", line 89, in _execute
return self.cursor.execute(sql, params)
File "/baserow/venv/lib/python3.9/site-packages/opentelemetry/instrumentation/psycopg2/__init__.py", line 249, in execute
return _cursor_tracer.traced_execution(
File "/baserow/venv/lib/python3.9/site-packages/opentelemetry/instrumentation/dbapi/__init__.py", line 460, in traced_execution
return query_method(*args, **kwargs)
Issued queries are updating specific columns, like
UPDATE database_table_xxx
SET tsv_field_xxxx = try_set_tsv(%s::regconfig,
COALESCE(regexp_replace((database_table_xxxx.field_xxx)::varchar,
%s, %s, %s), %s))
WHERE (NOT database_table_xxx.trashed
AND database_table_xxxx.id IN
(SELECT U0.id
FROM database_table_xxx U0
WHERE NOT U0.trashed
ORDER BY U0.id ASC
LIMIT 2000))
based on column type search configuration in python.
One way to handle this is to use advisory locks in workers.