Skip to content

POC : Automatically detecting duplicate database indexes via tests

When I have been reviewing database MRs, I often find duplicate indexes being created. Usually this happens when there are references, and developers forget to set index: false, or when there are join tables. Developers create two indexes, and 1 extra unnecessary index for the uniqueness of the columns on the join table.

To clarify. If a table test has two btree indexes of the form

  1. (a desc, b desc)
  2. (a desc, b desc, c desc).

The first index is a duplicated index, and is not necessary anymore, because any query on the columns a, b, the 2nd index can be used.

To test this, I have created the MR. See the errors in the last pipeline run. Aren't these indexes duplicate

An example:

       Duplicate index: index_vulnerabilities_project_id_state_severity_default_branch with ["index_vulnerabilities_on_project_id_and_state_and_severity"]
       index_vulnerabilities_project_id_state_severity_default_branch : [{:name=>"project_id", :order=>:asc}, {:name=>"state", :order=>:asc}, {:name=>"severity", :order=>:asc}, {:name=>"present_on_default_branch", :order=>:asc}]
       index_vulnerabilities_on_project_id_and_state_and_severity : [{:name=>"project_id", :order=>:asc}, {:name=>"state", :order=>:asc}, {:name=>"severity", :order=>:asc}]

Limitations:

  1. This only works on the btree indexes for now. Because I am not sure this would work on other indexes.
  2. Only on indexes with no where and no opclasses

TODO:

  1. Maybe extract the duplicate indexes logic into its own class that can be tested on its own.
  2. Add some ignored indexes. In some cases we are aware of the duplicate indexes, because maybe we are planning to remove them later. Maybe that can be added to db/docs/indexes or maybe in the class (point 1)
  3. Clean up the code. It's not clean for now.
  4. Add more tests and documentation

Looking forward your opinions. CC: @alexives @stomlinson

Edited by Omar Qunsul