Optimize high frequency query: "count(*) from projects join project_mirror_data", >50% of total_time on the master (gitlab.com)

On GitLab.com, on Postgres master we observe that the following query takes ~50% of total_time according to pg_stat_statements (so ~50% of CPU time spent on the master server is related to this query):

SELECT
    COUNT(*)
FROM
    "projects"
    INNER JOIN project_mirror_data import_state ON import_state.project_id = projects.id
WHERE
    "projects"."mirror" = ?
    AND ("import_state"."status" NOT IN (?, ?))
    AND (import_state.next_execution_timestamp <= ?)
    AND (import_state.retry_count <= ?)

This query's profile according to pg_stat_statements:

#	Calls	▼ Total time	Rows	shared_blks_hit	shared_blks_read	shared_blks_dirtied	shared_blks_written	blk_read_time	blk_write_time	kcache_reads	kcache_writes	kcache_user_time_ms	kcache_system_time	Query
1	192,469 30.36/sec 1.00/call 0.26%	15,234,607.23 ms 2.403s/sec 79ms/call 49.21%	192,469 30.36/sec 1.00/call 0.05%	7,534,894,752 blks 1.19M blks/sec 39.15K blks/call 64.63%	0 blks 0.00 blks/sec 0.00 blks/call 0.00%	2,033 blks 0.32 blks/sec 0.01 blks/call 0.04%	0 blks 0.00 blks/sec 0.00 blks/call 0.00%	0.00 ms 0s/sec 0s/call 0.00%	0.00 ms 0s/sec 0s/call 0.00%	0.00 bytes 0.00 bytes/sec 0.00 bytes/call 0.00%	0.00 bytes 0.00 bytes/sec 0.00 bytes/call 0.00%	0.00 ms 0s/sec 0s/call 0.00%	0.00 ms 0s/sec 0s/call 0.00%	SELECT COUNT(*) FROM "projects" INNER JOIN project_mirror_data import_state ON import_state.project_id = projects.id WHERE "projects"."mirror" = ? AND ("import_state"."status" NOT IN (?, ?)) AND (import_state.next_execution_timestamp <= ?) AND (import_state.retry_count <= ?)

(see the full Postgres checkup report for GitLab.com)

Ideas:

Basic idea (not really optimization, but reducing load on the master, allowing future scaling): move this SELECT to replicas to reduce impact on the master (might be not applicable, but still worth to consider);
Reduce number of calls. The query looks like related to background scheduled jobs -- do we really need to execute this 30 times per second?
Consider app-level caching (I suspect this idea will not work if it's really related to background jobs)
Improve performance switching from JOIN with Nest Loop based on 2 Index scans to 1-table SELECT with a single Index Only Scan. To achieve this, we need to perform light denormalization: duplicate "projects"."mirror" flag in the project_mirror_data table (ideally, using a trigger), and get rid of projects in they query completely.
Do we really need count(*) every time? I suppose that if count > 0, next step is to retrieve data itself. Isn't it better to read "next N" records avoiding count and proceeding to processing unless nothing is found?

(consider the latest item first).

Edited Apr 23, 2019 by Andreas Brandl