TableImporter doesn't show in which row and column TypeErrors occur (csv/tsv)
Summary
With the fix of #62 (closed), we delegate the datatype conversions to pandas.read_csv, which solves a lot of conversion problems but gives insufficient information when a conversion fails: we only know that we couldn't convert from float to int, but we don't know which row and column contain the problematic value. See, e.g., the open enhancement issue in Pandas.
Expected Behavior
An error message like "In column 'integer_column', row 1, I encountered 'float' but expected 'int'."
Actual Behavior
Generic TypeError "cannot safely cast non-equivalent float64 to int64".
Steps to Reproduce the Problem
- Use the example from #62 (closed).
intcolumn,int_with_gaps,float
1,1,1.3
2,,2.4
3,3,3.5
- Import with (see #61 for obligatory_columns):
from caosadvancedtools.table_importer import CSVImporter
csv_importer = CSVImporter(converters={}, datatypes={"intcolumn": int, "int_with_gaps": int, "float": int}, obligatory_columns=["float"])
csv_importer.read_file("test.csv")
- Read the insufficient error message.
Specifications
- Version: caosadvancedtools 0.12.0
- Platform: any
Possible fixes
- Catch TypeErrors and ValueErrors when reading the csv/tsv file.
- Then load the file without dtype specification, iterate through the dtype dictionary and try to convert columns individually with df[key].astype(dtype[key]).
- Whenever the above results in an error, iterate through the column and check each row individually.
- Collect all errors and return a list of problematic rows/columns.
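
The steps above could be sketched roughly as follows. This is only an illustration, not the caosadvancedtools implementation: the function name find_conversion_errors and its return format are hypothetical, it assumes the datatypes are plain Python types such as int or float, and it assumes that empty cells are treated as missing values rather than errors. It also reads all cells as strings instead of letting pandas infer the types, because astype on an already inferred float column would truncate the values instead of raising. Rows are counted from 1 and exclude the header.

```python
import io

import pandas as pd


def find_conversion_errors(source, dtypes, **read_csv_kwargs):
    """Return (column, row, value, expected type name) for each bad cell.

    Hypothetical helper; rows are 1-based and count data rows only.
    """
    # Read every cell as a string so that nothing is converted (or silently
    # truncated) before we look at it; empty cells stay empty strings.
    df = pd.read_csv(source, dtype=str, keep_default_na=False,
                     **read_csv_kwargs)
    errors = []
    for column, dtype in dtypes.items():
        if column not in df.columns:
            continue  # missing columns are a separate problem
        # Assumption: empty cells count as missing values, not as errors.
        cells = df[column][df[column] != ""]
        try:
            cells.astype(dtype)  # fast path: the whole column converts
            continue
        except (TypeError, ValueError):
            pass
        # Slow path: check each cell to find the ones that do not convert.
        for position, value in zip(cells.index, cells):
            try:
                dtype(value)
            except (TypeError, ValueError):
                errors.append(
                    (column, int(position) + 1, value, dtype.__name__))
    return errors


csv_text = "intcolumn,int_with_gaps,float\n1,1,1.3\n2,,2.4\n3,3,3.5\n"
datatypes = {"intcolumn": int, "int_with_gaps": int, "float": int}
for column, row, value, expected in find_conversion_errors(
        io.StringIO(csv_text), datatypes):
    print(f"In column '{column}', row {row}, I encountered "
          f"'{value}' but expected '{expected}'.")
```

In the importer, such a helper would presumably only be called after pandas.read_csv has raised a TypeError or ValueError, so the normal case pays no extra cost.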