Cleaner split between pictures and process
I think 2 different tasks are badly interacting with each other leading to deadlock in the database.
I'll try to explain the problem, so we can find a solution.
The problem
We have several workers whose tasks are to prepare the incoming pictures (calling the blurring API and generating derivates).
In order to not have any concurrency problems, we use postgres FOR UPDATE SKIP LOCKED
so each workers work on different pictures.
There is a pictures_to_process
table and the worker's query is:
SELECT p.id, pictures_to_process.task, p.metadata
FROM pictures_to_process
JOIN pictures p ON p.id = pictures_to_process.picture_id
ORDER by
p.nb_errors,
CASE
WHEN p.status = 'waiting-for-process' THEN 0
WHEN p.status = 'waiting-for-delete' THEN 0
WHEN p.status::text LIKE 'preparing%' THEN 1
END,
pictures_to_process.ts
FOR UPDATE of pictures_to_process SKIP LOCKED
LIMIT 1
Those worker have also been tasked to delete picture asynchronously (especially when whole collection deletion are asked, the process can take some time, and need to be done asynchronously).
the deletion lead to the query
WITH pic2rm AS (
SELECT pic_id FROM sequences_pictures WHERE seq_id = %(seq)s
),
picWithoutOtherSeq AS (
SELECT pic_id FROM pic2rm
EXCEPT
SELECT pic_id FROM sequences_pictures WHERE pic_id IN (SELECT pic_id FROM pic2rm) AND seq_id != %(seq)s
),
-- Add async task to delete all picture
pic_insertion AS (
INSERT INTO pictures_to_process(picture_id, task)
SELECT pic_id, 'delete' FROM picWithoutOtherSeq
ON CONFLICT (picture_id) DO UPDATE SET task = 'delete'
)
UPDATE pictures SET status = 'waiting-for-delete' WHERE id IN (SELECT pic_id FROM picWithoutOtherSeq)
So this need a write access on the picture_to_process row, but If this same picture is being processed, this needs to wait the end of the process and since the CTE are resolved concurently (cf doc](https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-MODIFYING), since put a lock on the picture's row. So since the picture workers also need to frequently update the picture's status
Proposed plan
After discusssing with @PanierAvide , we think that we mixed too many things and the split between the pictures
and pictures_to_process
tables is not clean enough.
This comes from the pictures.status
field mixing the fact that the picture is hidden
/ready
and being processed (preparing,broken,waiting-for-process,preparing-derivates,preparing-blur,waiting-for-delete
)..
The pictures
table also have processed_at
, nb_errors
and process_error
that could maybe be moved.
change to pictures
'split' the picture.status
in 2 fields:
-
hidden
: bool - a
status
(but I would prefer to find a different name) with values: (available
,deleted
,not-processed
).
When the picture would be added in the table, it would start with the not-processed
status, and the workers will change the status to available
at the end. If the picture is reprocessed later on for one reason or another, it would still be available
during the process.
The deleted
status would be handy in case of picture deletion, we would just remove the some fields (like exif
/heading
/...) if needed, but it would be possible to know that this picture has been known at one point, and be possible to advertise this deletion for a crawling system like the metacatalog.
Note: we would need to see how we can expose this in the API without breaking it.
Removal of the pictures fields processed_at
/process_error
/nb_errors
.
Change to pictures_to_process
Add a column nb_errors
to be able to de-prioritize a task that has already been tried but failed.
Journalisation
With this we loose, the status
in detailing the steps of the preprocess, but it's mainly used for debug, so I don't think it's too bad to loose it.
Loosing the processed_at
/process_error
is more annoying, it's exposed via the /geovisio_status
route, and it's nice to expose this to users.
To limit the read/write of the same row, maybe we can add a tasks
(or whatever better name) table like:
CREATE TABLE picture_task(
picture_id UUID REFERENCES pictures (id),
task picture_process_task, -- (existing enum, either 'prepare' or 'deleted')
error VARCHAR,
started_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
finished_at TIMESTAMPTZ
);
CREATE INDEX pictures_id_idx ON picture_task(picture_id);
if needed we also later add a picture_task_step
table with an entry at each step, replacing the preparing-derivates
,preparing-blur
status, but I'm not sure it's really worth it (but it's nice to know that we'll be able to do it later).
Deadlock
With this, we still have the same problem (way more limited though since we do way less writes on the same row), in the fact that eack task would put a write lock on its associated row in the pictures_to_process
and at the end will try to write pictures.status
=ready
while the deletion endpoint will try, in the same transaction to set pictures.status
=deleted
and add or update pictures_to_process
with a row with (pic_id, 'delete')
.
I think we want to keep a PG lock like INSERT INTO pictures_to_process... ON CONFLICT (picture_id) DO UPDATE SET task = 'delete'
, it means that if the preparation has not started yet, we wont be doing it, and if the preparation has started, we wait for it to end before asking for deletion.
I feel that removing this would lead to many concurrency headache.
It's a pity since the DELETE /api/collections/:id
would thus be blocked until the lock is released but even if it can ask for the deletion of thousand of pictures, logically we would only need to wait for <nb_workers>
pictures to be processed, so it should be quick.
Splitting this in 2 queries we won't have a dead lock:
- Add all pics into
pictures_to_process
usingON CONFLICT (picture_id) DO UPDATE SET task = 'delete'
=> that means that when this query is finished, no worker would be preparing those pictures, because, if they had started, they would have write locks on the row, and if there wasprepare
task not already started, they are now indelete
. - Update all pictures status to
delete
And since with this simple change we can resolve the deadlock quickly, I think we can do it now, and refactor the rest later (but I think it's important to refactor this, it feels way cleaner with the proposed plan).
TL;DR
- We split the deletion query in 2 to 'quickly' solve the dead lock problem
- we move 'task' related stuff from
pictures
to another table, and split the picture's status