
0094: How to run ANALYZE (to collect statistics)

Open Nikolay Samokhvalov requested to merge nik-howto-analyze into main
3 unresolved threads

@fomin.list @vitabaks @msdousti :wave: mind checking this small howto?

Merge request reports


Activity

# How to run ANALYZE (to collect statistics)

The command `ANALYZE` collects statistics about a database ([docs](https://www.postgresql.org/docs/current/sql-analyze.html)). Maintaining fresh statistics is crucial for achieving good database performance.
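To see how fresh the statistics currently are, you can check the `last_analyze` / `last_autoanalyze` timestamps in `pg_stat_user_tables`; a quick sketch:
```sql
-- When was each table last analyzed (manually or by autovacuum)?
-- Never-analyzed tables come first.
select relname, last_analyze, last_autoanalyze
from pg_stat_user_tables
order by greatest(last_analyze, last_autoanalyze) asc nulls first
limit 20;
```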
Running it is trivial:
```sql
analyze;
```

However, `ANALYZE` is single-threaded, so on a large database it can take a lot of time.
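`ANALYZE` can also target a single table, or even specific columns, which is handy when only a few tables need fresh statistics (the table and column names below are just examples):
```sql
-- Analyze one table, printing progress messages
analyze verbose my_table;

-- Analyze only selected columns of that table
analyze my_table (col1, col2);
```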
## How to run ANALYZE at full speed

To utilize multiple CPU cores, we can use the client program `vacuumdb` with the `--analyze-only` option and multiple workers ([docs](https://www.postgresql.org/docs/current/app-vacuumdb.html)).

The following runs `ANALYZE` on *all* databases (`--all`), using as many workers as there are vCPUs, and limiting each statement's duration to 2 hours (connection options such as `-h` and `-U` are not shown here):
```bash
{
  while IFS= read -r line; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $line"
  done
} < <(
  PGOPTIONS='-c statement_timeout=2h' \
  vacuumdb \
    --analyze-only \
    --all \
    --jobs $(nproc) \
    --echo
) | tee -a analyze_all_$(date +%Y%m%d).log
```

With this snippet, all the commands are also printed and logged, each with a timestamp (alternatively, instead of the `while` loop, one could use `ts` from `moreutils`, as sketched below).
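A minimal sketch of the `ts` variant (assuming `moreutils` is installed; connection options omitted as above):
```bash
PGOPTIONS='-c statement_timeout=2h' \
  vacuumdb --analyze-only --all --jobs $(nproc) --echo \
  | ts '%Y-%m-%d %H:%M:%S' \
  | tee -a analyze_all_$(date +%Y%m%d).log
```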

`--jobs $(nproc)` works on Linux and sets the number of workers to the number of vCPUs. Note that if there are large unpartitioned tables, at some point only a few workers may remain active. Partitioning can solve this: with many smaller partitions, all workers can stay busy, which can speed up the whole operation drastically on machines with a high number of CPU cores.
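To see which tables are likely to keep a single worker busy the longest, list the largest relations up front; a quick sketch (run it per database):
```sql
-- Largest tables first; these dominate the "long tail" of a parallel ANALYZE run
select relname,
       pg_size_pretty(pg_total_relation_size(oid)) as total_size
from pg_class
where relkind = 'r'
order by pg_total_relation_size(oid) desc
limit 20;
```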
    • If I'm not mistaken, $(nproc) returns the number of CPUs on the client machine as opposed to the database server. Do we want to parallelize the analyze process on the client, or the server?

    • This was a very nice HowTo, thanks!

      I put a few minor comments. Two things that IMHO could improve the howto:

      • A paragraph about the required permission to run analyze on a table/database.
      • Pointing out cost-based vacuum delay. We have a pg_cron job that vacuums (and analyzes) our tables at midnight, when the load is low. But we found it rather intrusive, so we adjusted the parameters to diminish its effect. This is the flip side of this HowTo, which shows how to run it as fast as possible. (I saw your Tweet, and I'm wondering whether these parameters affect only vacuum or analyze as well, since the documentation linked above mentions both: "During the execution of VACUUM and ANALYZE commands...")
    • Docs say it affects both, despite the name: https://postgres.ai/chats/01928625-70a6-78f6-a20c-17d74bcad95a (checked source code there too)

      Thanks!!

    • Ah right, you pointed to the same doc.

      Source code has this confusion too. But it definitely affects both, good point.

      I wonder how "intrusive" it would be in your case if ANALYZE alone, without VACUUM, ran at full speed. Likely not so much.

    • In our case, we have a super time-critical service with p99 < 5ms.

      A normal vacuum of the whole DB takes only 20 mins, but p99 exceeds 1 second, which is not acceptable for us. After some fine tuning, I could run it with p99 < 200 ms over the course of 2-3 hours. This is acceptable during midnight.

      PS: We run a scheduled `vacuum (analyze, skip_locked, verbose);`. Without vacuum (only analyze), it would not be very intrusive and would run quite fast, as the DB is not super big (100-200 GB).
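      (For reference, a nightly job like this can be scheduled with pg_cron roughly as follows; the job name and schedule are illustrative, not the exact setup described above.)

      ```sql
      -- Illustrative pg_cron job: nightly maintenance at midnight, skipping locked tables
      select cron.schedule(
        'nightly-vacuum-analyze',                 -- hypothetical job name
        '0 0 * * *',                              -- every day at 00:00
        'vacuum (analyze, skip_locked, verbose);'
      );
      ```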
    • Right, but this is VACUUM. The original discussion is about ANALYZE only.

    • I agree. All I'm saying is that it may be useful to point out that "analyze" can also be controlled by the cost-based "vacuum" model.
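      (For reference, these are the knobs in question; the cost-based delay model throttles both `VACUUM` and a manual `ANALYZE`. The values below are purely illustrative:)

      ```sql
      -- Illustrative only: throttle ANALYZE in the current session via the cost-based delay model
      set vacuum_cost_delay = '2ms';  -- pause length; 0 (the default for manual runs) disables throttling
      set vacuum_cost_limit = 200;    -- accumulated page-access cost that triggers the pause
      analyze;                        -- this ANALYZE now yields to foreground traffic
      ```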

    • I saw an interesting case today: We have a ~ 4TB database, with less than 20 tables. Largest tables are ~ 1.5 TB, 1 TB, and 0.5 TB.

      Nightly analyze (not vacuum, just analyze) is being executed on the whole DB.

      • This was previously run on RDS (32 GB RAM, 8 vCPUs) and it took ~ 15 mins.
      • We recently migrated to Aurora (64 GB RAM, 8 vCPUs) and it takes ~ 30-60 mins.

      We are investigating ...

    • I'd check the disk capabilities in both cases -- throughput limits (IOPS, MiB/s) -- and compare a single-threaded `ANALYZE VERBOSE`, with timestamps for each line of the output.
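      A minimal sketch of that comparison (table name is hypothetical; `ts` is from `moreutils`; psql prints the `INFO` lines to stderr, hence the redirect):

      ```bash
      # Time a single-threaded ANALYZE VERBOSE, timestamping every progress line
      psql -X -c 'analyze verbose my_table;' 2>&1 | ts '%Y-%m-%d %H:%M:%S'
      ```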

    • Thanks, will do!

    • I'd be curious to learn what you find. I can also share a post I read recently, which shows classic RDS (with a large disk) being faster than Aurora for disk operations; the difference seems similar to what you observed.

    • We started analyzing table by table. Huge tables (terabyte-scale) were quite fast and used very little memory (400 MB). But on one small table (3 GB), the analyze took a long time, memory grew quickly, and the DB server was killed due to OOM.

      We used `vacuum` and `vacuum freeze` on the table; both ran without issue.

      When running `vacuum full`, we saw the issue:

      ERROR: found xmin 183032816 from before relfrozenxid 514166874

      This means the table visibility map is corrupted.

      This is a known issue: https://postgrespro.com/list/thread-id/2501007

      A tool like pg_surgery offers heap_force_freeze(), which we can't use on Aurora.

      I found another way to fix visibility maps, which seems to work. But I'm surprised that running analyze on such a table can OOM-kill the whole server, rather than the analyze process failing gracefully.

  • added 1 commit

    • 65119a27 - Apply 1 suggestion(s) to 1 file(s)


  • added 1 commit

    • 9217e1f6 - Apply 1 suggestion(s) to 1 file(s)

