POC : Automatically detecting duplicate database indexes via tests
When I have been reviewing database MRs, I often find duplicate indexes being created. Usually this happens when there are references
, and developers forget to set index: false
, or when there are join tables. Developers create two indexes, and 1 extra unnecessary index for the uniqueness of the columns on the join table.
To clarify. If a table test
has two btree indexes of the form
- (a desc, b desc)
- (a desc, b desc, c desc).
The first index is a duplicated index, and is not necessary anymore, because any query on the columns a, b
, the 2nd index can be used.
To test this, I have created the MR. See the errors in the last pipeline run. Aren't these indexes duplicate
An example:
Duplicate index: index_vulnerabilities_project_id_state_severity_default_branch with ["index_vulnerabilities_on_project_id_and_state_and_severity"]
index_vulnerabilities_project_id_state_severity_default_branch : [{:name=>"project_id", :order=>:asc}, {:name=>"state", :order=>:asc}, {:name=>"severity", :order=>:asc}, {:name=>"present_on_default_branch", :order=>:asc}]
index_vulnerabilities_on_project_id_and_state_and_severity : [{:name=>"project_id", :order=>:asc}, {:name=>"state", :order=>:asc}, {:name=>"severity", :order=>:asc}]
Limitations:
- This only works on the
btree
indexes for now. Because I am not sure this would work on other indexes. - Only on indexes with no
where
and noopclasses
TODO:
- Maybe extract the duplicate indexes logic into its own class that can be tested on its own.
- Add some ignored indexes. In some cases we are aware of the duplicate indexes, because maybe we are planning to remove them later. Maybe that can be added to
db/docs/indexes
or maybe in the class (point 1) - Clean up the code. It's not clean for now.
- Add more tests and documentation
Looking forward your opinions. CC: @alexives @stomlinson
Edited by Omar Qunsul