Movement: harden "identifier": make not NULL: add unique index <bank_id, identifier>

This is a minor feature and a bit tricky. Assign to myself.

Current Situation

The current Movement's database table structure is:

CREATE TABLE `movements` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  `date` date NOT NULL,
  `bank_id` int(10) unsigned NOT NULL,
  `amount` decimal(10,2) NOT NULL DEFAULT 0.00,
  `identifier` varchar(191) NOT NULL DEFAULT '',
  `notes` text NOT NULL,
  `reviewed` tinyint(4) NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  KEY `movements_bank_id_foreign` (`bank_id`),
  KEY `movements_reviewed_index` (`reviewed`),
  CONSTRAINT `movements_bank_id_foreign` FOREIGN KEY (`bank_id`) REFERENCES `banks` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
)

This table structure is too much flexible about identifier. It could be hardened a bit more. Current challenges:

  • doing a query by identifier causes a full table scan (is not a performance problem at the moment but is avoidable with a generic index)
  • it should not be possible to have two rows with the same bank_id and same identifier
    • (it is not causing any problem at the moment but it can be hardened introducing an unique index)
    • SELECT COUNT(*) FROM movements WHERE identifier != '' GROUP BY identifier HAVING COUNT(*) > 1 gives in fact an Empty set (0.004 sec) and this is already good. It means we have no duplicates.

Proposed Changes

  1. make identifier DEFAULT NULL to make an unique index possible
  2. add an UNIQUE index covering <bank_id, identifier> (since the identifier lives in the scope of the bank that releases that identifier)

Quality Assurance

  • import Unicredit movements from its file: it still works (they have NULL identifier)
  • import from PayPal API: it still works (they have populated identifier)
Edited by Valerio Bozzolan