Improve performance of new Overlapping validation
The new advanced validation "OverlappingCoordinates" checks whether the import file contains multiple rows/entries with the same action that would modify the same range of coordinates in the database.
It returns the error "The input data contains two or more observations which modify the same region {1}. The first overlapping observation is at observation number {0} (for CSV file imports located at row {3})."
The same validation is done in the basic validation mode returning the error: "The input data contains two or more observations which modify the same region. This can be caused by two Observations with delete Actions which overlap on the region that they delete. Remove overlapping observations (Modifying part or the same coordinates) and re-submit the request."
This last validation is currently performed by SQL Server, which returns SQL error 8672 (a MERGE statement attempted to update or delete the same target row more than once) during the merge of the staging table into the tables that permanently store the values.
Currently we skip the advanced validation "OverlappingCoordinates" because the current implementation (using HashSet.IsProperSubsetOf and HashSet.IsProperSupersetOf) is very slow.
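For context, the slow approach presumably compares every pair of observations, so the cost grows quadratically with file size. Below is a minimal Python sketch of that pattern (the real code is C#/.NET; Python's proper-subset operators `<`/`>` play the role of HashSet.IsProperSubsetOf / IsProperSupersetOf, and the exact pairing logic here is an assumption for illustration):

```python
def has_overlap_pairwise(coordinate_sets):
    """O(n^2) sketch of the current style of check: two observations
    clash when one coordinate set properly contains the other.
    Equality is deliberately excluded, since exact duplicates are
    validated by a separate check."""
    n = len(coordinate_sets)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = coordinate_sets[i], coordinate_sets[j]
            # a < b: proper subset, a > b: proper superset
            if a < b or a > b:
                return True
    return False
```

With n observations this performs n*(n-1)/2 set comparisons, which explains why the validation becomes impractical for large import files.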
Improve the performance of this validation check.
Ideas:
- Store the coordinates in a dataset and perform the validation there, but only once the file has been streamed into the staging table.
  - Might be faster, but all coordinates would still be kept in memory (bad for large files).
- Use the staging table values to perform the validation, again only once the file has been streamed into the staging table.
  - This option would have the smallest memory usage.
- Split the list of coordinates in two, one for full coordinates and another one for range coordinates; this way the check might be faster.
- Note: duplicates are already validated.
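One direction worth exploring alongside the ideas above is replacing the pairwise comparisons with a sort-and-sweep pass: sort the observations by the start of the coordinate range they touch, then a single scan detects any overlap in O(n log n) instead of O(n^2). The sketch below is a one-dimensional simplification in Python (the real coordinates are multidimensional and the real code is C#/.NET; the tuple layout `(observation_number, start, end)` is an assumption for illustration):

```python
def find_first_overlap(observations):
    """Return the first overlapping pair of observations, or None.

    Each observation is (obs_number, start, end) with end inclusive;
    a full (non-range) coordinate is a degenerate range with start == end.
    """
    # Sort by range start; any overlap must then involve the interval
    # with the largest end seen so far.
    ordered = sorted(observations, key=lambda o: o[1])
    prev = None  # interval with the largest end encountered so far
    for obs in ordered:
        if prev is not None and obs[1] <= prev[2]:
            return (prev, obs)  # obs starts before prev ends -> overlap
        if prev is None or obs[2] > prev[2]:
            prev = obs
    return None
```

Because the scan returns the clashing pair, it could also feed the "{0}"/"{1}" placeholders of the advanced-validation error message. A sweep over sorted data would also map naturally onto the staging-table idea, since the sort could be pushed to SQL Server via an ORDER BY.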