Commit 2d635424 authored by Utkarsh Singhal, committed by Duo Developer

Add Monitoring Clustering Costs section

parent 6c43494d
+30 −0
@@ -78,6 +78,36 @@ This function returns valuable information about the clustering state of your ta

Ideally, `average_overlaps` is below 1 and `average_depth` is close to 1. Higher values indicate the table is not well clustered.

### Monitoring Clustering Costs

To monitor the cost and activity of automatic clustering, query the `snowflake.account_usage.automatic_clustering_history` view. For example:

```sql
SELECT 
    start_time,
    end_time,
    table_name,
    schema_name,
    database_name,
    credits_used,
    num_bytes_reclustered,
    num_rows_reclustered,
    DATEDIFF('minute', start_time, end_time) AS duration_minutes
FROM snowflake.account_usage.automatic_clustering_history
WHERE table_name = 'FCT_BEHAVIOR_STRUCTURED_EVENT'  -- Replace with your table name
    AND schema_name = 'COMMON'                       -- Replace with your schema
    AND database_name = 'PROD'                       -- Replace with your database
    AND start_time >= '2025-12-17'                   -- When clustering was enabled
ORDER BY start_time DESC;
```

Key metrics:

- `credits_used`: Snowflake credits consumed by automatic clustering operations
- `num_bytes_reclustered`: Amount of data reorganized, in bytes
- `num_rows_reclustered`: Number of rows reorganized
- `duration_minutes`: Length of the reporting window in minutes, computed in the query from `start_time` and `end_time`
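
To see how cost trends over time rather than per history row, the same view can be aggregated by day. The following is a minimal sketch, not a definitive report: it reuses the placeholder table, schema, database, and start date from the query above, and the output column names (`usage_day`, `total_credits_used`, and so on) are illustrative choices.

```sql
-- Sketch: daily automatic clustering cost for a single table.
-- Replace the table, schema, database, and start date with your own values.
SELECT
    DATE_TRUNC('day', start_time) AS usage_day,           -- bucket history rows by calendar day
    SUM(credits_used) AS total_credits_used,              -- credits consumed that day
    SUM(num_bytes_reclustered) AS total_bytes_reclustered,
    SUM(num_rows_reclustered) AS total_rows_reclustered
FROM snowflake.account_usage.automatic_clustering_history
WHERE table_name = 'FCT_BEHAVIOR_STRUCTURED_EVENT'
    AND schema_name = 'COMMON'
    AND database_name = 'PROD'
    AND start_time >= '2025-12-17'
GROUP BY usage_day
ORDER BY usage_day DESC;
```

A sustained rise in `total_credits_used` without a matching rise in data volume may be a sign that the clustering key is worth revisiting.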

## Best Practices

1. Choose clustering keys wisely based on your query patterns