TableImporter can't handle empty fields in integer columns.

Summary

When the TableImporter reads a CSV file in which an integer column contains empty fields, it raises a DataInconsistencyError, because Pandas converts integer columns to float if they contain NaNs. (TODO: Is this valid for other importer classes than CSVImporter?)
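
The upcast can be seen with plain Pandas, independent of caosadvancedtools. This is only a minimal sketch; the inline CSV text mirrors the reproduction file further down, and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Inline copy of a CSV with a gap in one integer column (illustrative data).
csv_text = "intcolumn,int_with_gaps,float\n1,1,1.3\n2,,2.4\n3,3,3.5\n"

# Without any dtype hints, pandas infers the dtypes itself.
df = pd.read_csv(io.StringIO(csv_text))

# Map each column name to the name of its inferred dtype.
inferred = {name: str(dtype) for name, dtype in df.dtypes.items()}
print(inferred)
```

The column without gaps stays an integer column, while the column with the empty field is inferred as float64, since NaN has no representation in a NumPy integer array.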

Expected Behavior

I should be able to import CSVs with empty fields in integer columns.

Actual Behavior

DataInconsistencyError: In row no. 0 and column 'int_with_gaps' of file 'test.csv' the datatype was class 'float' but it should be class 'int'

Steps to Reproduce the Problem

Take any CSV with an integer column with gaps, e.g.

intcolumn,int_with_gaps,float
1,1,1.3
2,,2.4
3,3,3.5

and try to import it as follows (see #61 for the obligatory_columns=["float"] argument):

from caosadvancedtools.table_importer import CSVImporter

csv_importer = CSVImporter(converters={}, datatypes={"intcolumn": int, "int_with_gaps": int, "float": float}, obligatory_columns=["float"])
csv_importer.read_file("test.csv")  # returns an empty dataframe

Compare the output to

csv_importer = CSVImporter(converters={}, datatypes={"int_column": int, "float_column": float}, obligatory_columns=["int_column"])
csv_importer.read_file("test.csv")  # raises the above error

Note that the solution suggested for pandas.read_csv, i.e., using the nullable pandas.Int64Dtype() (or "Int64"), does not fix this either, because we later call numpy.issubdtype(), which does not know the Pandas-specific dtypes.
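
The mismatch between the two libraries can be illustrated in isolation. The snippet below is a sketch, not the code path inside caosadvancedtools; it assumes that numpy.issubdtype() either rejects the Pandas extension dtype with a TypeError or reports it as not being an integer, and treats both cases the same way.

```python
import numpy as np
import pandas as pd

# A nullable integer column as produced by dtype="Int64".
nullable = pd.Series([1, None, 3], dtype="Int64")

# NumPy does not understand the pandas extension dtype; depending on the
# NumPy version this is assumed to raise a TypeError rather than return True.
try:
    numpy_says_int = bool(np.issubdtype(nullable.dtype, np.integer))
except TypeError:
    numpy_says_int = False

# The pandas helper is aware of both NumPy and nullable integer dtypes.
pandas_says_int = pd.api.types.is_integer_dtype(nullable.dtype)

print(numpy_says_int, pandas_says_int)
```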

Specifications

  • Version: caosadvancedtools 0.10.0 (Linkahead 0.13)
  • Platform: Any

Possible fixes

I suggest using the nullable Pandas types for ints and improving the type check further down the line instead.
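
A rough sketch of what such a check could look like, using the helpers from pandas.api.types. The function dtype_matches and the surrounding variable names are hypothetical and do not exist in caosadvancedtools; the actual check in the importer may need to cover more types.

```python
import io

import pandas as pd
from pandas.api import types as pdt


def dtype_matches(series, expected):
    """Hypothetical helper: does the column dtype satisfy the declared type?"""
    if expected is int:
        # True for NumPy ints and for nullable pandas ints such as Int64.
        return pdt.is_integer_dtype(series.dtype)
    if expected is float:
        return pdt.is_float_dtype(series.dtype)
    # Fallback for other declared types: check the non-missing values.
    return all(isinstance(value, expected) for value in series.dropna())


csv_text = "intcolumn,int_with_gaps,float\n1,1,1.3\n2,,2.4\n3,3,3.5\n"
declared = {"intcolumn": int, "int_with_gaps": int, "float": float}

# Read declared int columns with the nullable Int64 dtype so gaps become <NA>.
read_dtypes = {
    name: ("Int64" if py_type is int else py_type)
    for name, py_type in declared.items()
}
df = pd.read_csv(io.StringIO(csv_text), dtype=read_dtypes)

results = {name: dtype_matches(df[name], py_type) for name, py_type in declared.items()}
print(results)
```

With this approach the empty field is kept as a missing value instead of forcing the whole column to float, and the check no longer depends on numpy.issubdtype().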
