TableImporter can't handle empty fields in integer columns.
Summary
When the TableImporter reads a CSV file that has empty fields in an integer column (TODO: Is this also valid for importer classes other than CSVImporter?), it raises a DataInconsistencyError, because Pandas converts integer columns to float if they contain NaNs.
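The dtype conversion can be seen with plain Pandas, independent of caosadvancedtools. This is only a minimal sketch; it reads the example CSV from an in-memory string (the variable names `csv_text` and `df` are mine):

```python
import io

import pandas as pd

# Same content as the test.csv from the steps below.
csv_text = "intcolumn,int_with_gaps,float\n1,1,1.3\n2,,2.4\n3,3,3.5\n"

df = pd.read_csv(io.StringIO(csv_text))

# The gap-free column stays integer, the one with an empty field
# is silently promoted to float so that it can hold NaN.
print(df.dtypes)
```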
Expected Behavior
I should be able to import CSVs with empty fields in integer columns.
Actual Behavior
DataInconsistencyError: In row no. 0 and column 'int_with_gaps' of file 'test.csv' the datatype was class 'float' but it should be class 'int'
Steps to Reproduce the Problem
Take any CSV with an integer column with gaps, e.g.
intcolumn,int_with_gaps,float
1,1,1.3
2,,2.4
3,3,3.5
and try to import it via (see #61 for the obligatory_columns=["float"] argument)
from caosadvancedtools.table_importer import CSVImporter
csv_importer = CSVImporter(converters={}, datatypes={"intcolumn": int, "int_with_gaps": int, "float": float}, obligatory_columns=["float"])
csv_importer.read_file("test.csv") # returns an empty dataframe
Compare the output to
csv_importer = CSVImporter(converters={}, datatypes={"int_column": int, "float_column": float}, obligatory_columns=["int_column"])
csv_importer.read_file("test.csv") # raises the above error
Note that the solution suggested for pandas.read_csv, i.e., using the nullable pandas.Int64Dtype() or "Int64", does not fix this either, because we later use numpy.issubdtype(), which does not know the Pandas-specific dtypes.
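The incompatibility can be reproduced in isolation. The following is a sketch of what I observe with the NumPy and Pandas versions I have at hand, not the actual check inside the importer; `numpy_understands_it` is just a helper variable for illustration:

```python
import numpy as np
import pandas as pd

nullable_int = pd.Int64Dtype()

try:
    # numpy tries to turn its first argument into a numpy dtype,
    # which is not possible for a Pandas extension dtype.
    np.issubdtype(nullable_int, np.integer)
    numpy_understands_it = True
except TypeError as err:
    numpy_understands_it = False
    print(f"numpy.issubdtype failed: {err}")

# For comparison: a plain numpy integer dtype passes the same check.
print(np.issubdtype(np.dtype("int64"), np.integer))
```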
Specifications
- Version: caosadvancedtools 0.10.0 (Linkahead 0.13)
- Platform: Any
Possible fixes
I suggest using the Pandas types for ints and improving the type check down the line instead.
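A rough sketch of what such a check could look like, assuming the columns are read with the nullable "Int64" dtype and checked with the helpers from pandas.api.types, which know both the NumPy and the Pandas dtypes. The function `dtype_matches` and the `_CHECKS` mapping are hypothetical and not part of caosadvancedtools:

```python
import io

import pandas as pd
from pandas.api import types as pdt

# Hypothetical mapping from the Python types given in `datatypes`
# to dtype checks that also accept Pandas extension dtypes.
_CHECKS = {
    int: pdt.is_integer_dtype,
    float: pdt.is_float_dtype,
}


def dtype_matches(series: pd.Series, expected: type) -> bool:
    """Return True if the dtype of the series is compatible with `expected`."""
    try:
        check = _CHECKS[expected]
    except KeyError:
        raise NotImplementedError(f"no check for {expected!r} in this sketch")
    return bool(check(series.dtype))


csv_text = "intcolumn,int_with_gaps,float\n1,1,1.3\n2,,2.4\n3,3,3.5\n"

# Read the integer columns as nullable integers; empty fields become pd.NA.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"intcolumn": "Int64", "int_with_gaps": "Int64", "float": float},
)

for name, expected in {"intcolumn": int, "int_with_gaps": int, "float": float}.items():
    print(name, df[name].dtype, dtype_matches(df[name], expected))
```

With this approach the empty field stays a missing value instead of forcing the whole column to float, so the importer would still need to decide whether missing values are acceptable in a given column.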