
TableImporter doesn't show in which row and column TypeErrors occur (csv/tsv)

Summary

With the fix for #62 (closed), datatype conversions are delegated to pandas.read_csv. This solves many conversion problems, but the resulting error messages are insufficient: we only learn that a float could not be converted to an int, not which row and column contain the problematic value. See, e.g., the open enhancement issue in pandas.

Expected Behavior

An error message like "In column 'integer_column', row 1, I encountered 'float' but expected 'int'."

Actual Behavior

Generic TypeError "cannot safely cast non-equivalent float64 to int64".

Steps to Reproduce the Problem

  1. Save the example from #62 (closed) as test.csv:
intcolumn,int_with_gaps,float
1,1,1.3
2,,2.4
3,3,3.5
  2. Import it with (see #61 for obligatory_columns):
from caosadvancedtools.table_importer import CSVImporter

csv_importer = CSVImporter(converters={}, datatypes={"intcolumn": int, "int_with_gaps": int, "float": int}, obligatory_columns=["float"])
csv_importer.read_file("test.csv")
  3. Read the insufficient error message.

Specifications

  • Version: caosadvancedtools 0.12.0
  • Platform: any

Possible fixes

  1. Catch TypeErrors and ValueErrors when reading the csv/tsv file.
  2. Then load the file without dtype specification, iterate through the dtype dictionary and try to convert columns individually with df[key].astype(dtype[key]).
  3. Whenever the above results in an error, iterate through the column and check each row individually.
  4. Collect all errors and return a list of problematic rows/columns.
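
The steps above could be sketched roughly as follows, using pandas only. This is not the caosadvancedtools implementation: find_conversion_problems and _converts_cleanly are hypothetical names, and the sketch assumes that the datatypes are plain Python types such as int or float. One caveat: Series.astype(int) truncates floats such as 1.3 without raising, so the sketch checks each value with an explicit lossless-conversion test instead of relying on astype alone. In a real fix, a function like this would presumably be called from the except branch that catches the TypeError or ValueError raised while reading.

```python
import io

import pandas as pd


def _converts_cleanly(value, dtype):
    """Hypothetical helper: True if `value` converts to `dtype` without loss."""
    if pd.isna(value):
        # A missing value cannot be represented by a plain int.
        return dtype is not int
    try:
        converted = dtype(value)
    except (TypeError, ValueError):
        return False
    if dtype is int:
        # int() silently truncates 1.3 to 1, so require a lossless conversion.
        return float(converted) == float(value)
    return True


def find_conversion_problems(source, dtypes, **read_csv_kwargs):
    """Return (column, row, value, expected type name) for each problematic cell."""
    # Load without a dtype specification so that reading itself does not fail.
    df = pd.read_csv(source, **read_csv_kwargs)
    problems = []
    for column, dtype in dtypes.items():
        if column not in df.columns:
            continue
        for index, value in df[column].items():
            if not _converts_cleanly(value, dtype):
                # Rows are 1-based data rows (header not counted); this
                # assumes the default integer index created by read_csv.
                name = getattr(dtype, "__name__", str(dtype))
                problems.append((column, index + 1, value, name))
    return problems


# The example table from the steps to reproduce.
EXAMPLE = "intcolumn,int_with_gaps,float\n1,1,1.3\n2,,2.4\n3,3,3.5\n"
problems = find_conversion_problems(
    io.StringIO(EXAMPLE),
    {"intcolumn": int, "int_with_gaps": int, "float": int},
)
for column, row, value, expected in problems:
    print(f"In column '{column}', row {row}: cannot convert {value!r} to '{expected}'.")
```

For tsv files, sep="\t" can be passed through read_csv_kwargs. Collecting all problems before reporting, rather than stopping at the first one, matches step 4 and lets the user fix the whole file in one pass.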