Unfortunately, using converters or a newer version of pandas does not solve the more general problem of ensuring that read_csv never infers a float64 dtype. With pandas 0.15.2, the following example, which uses a CSV containing hexadecimal integers with NULL entries, shows that using converters for what their name implies they should be used for interferes with the dtype specification.
    In [1]: df = pd.DataFrame(dict(a = ["0xff", "0xfe"], b = ["0xfd", None], c = [None, "0xfc"], d = [None, None]))

    In [2]: df.to_csv("H:/tmp.csv", index = False)

    In [3]: ef = pd.read_csv("H:/tmp.csv", dtype = {c: object for c in "abcd"}, converters = {c: lambda x: None if x == "" else int(x, 16) for c in "abcd"})

    In [4]: ef.dtypes.map(lambda x: x)
    Out[4]:
    a      int64
    b    float64
    c    float64
    d     object
    dtype: object
The specified object dtype is honored only for the all-NULL column. In this case the float64 values can simply be converted to integers, but by the pigeonhole principle, not all 64-bit integers can be represented as float64.
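To make the precision point concrete, here is a minimal sketch using plain Python, whose built-in float is an IEEE 754 double on mainstream platforms (the same format as float64). The value 2**53 + 1 is just an illustrative choice, not something from the CSV above.

```python
# float64 has a 53-bit significand, so integers above 2**53 are not all
# exactly representable. 2**53 + 1 is the smallest positive example.
big = 2**53 + 1

# Converting to float rounds to the nearest representable value.
as_float = float(big)

# Converting back does not recover the original integer.
round_trip = int(as_float)

print(big, round_trip, big == round_trip)
```

So an integer column that passes through float64, even briefly, can come back with different values once the numbers are large enough.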
The best solution I have found for this more general case is to get pandas to read the potentially problematic columns as strings, as already discussed, and then convert only the slice of values that need conversion (rather than mapping the conversion over the whole column, since that again triggers automatic float64 dtype inference).
    In [5]: ff = pd.read_csv("H:/tmp.csv", dtype = {c: object for c in "bc"}, converters = {c: lambda x: None if x == "" else int(x, 16) for c in "ad"})

    In [6]: ff.dtypes
    Out[6]:
    a     int64
    b    object
    c    object
    d    object
    dtype: object

    In [7]: for c in "bc":
       .....:     ff.loc[~pd.isnull(ff[c]), c] = ff[c][~pd.isnull(ff[c])].map(lambda x: int(x, 16))
       .....:

    In [8]: ff.dtypes
    Out[8]:
    a     int64
    b    object
    c    object
    d    object
    dtype: object

    In [9]: [(ff[c][i], type(ff[c][i])) for c in ff.columns for i in ff.index]
    Out[9]:
    [(255, numpy.int64),
     (254, numpy.int64),
     (253L, long),
     (nan, float),
     (nan, float),
     (252L, long),
     (None, NoneType),
     (None, NoneType)]
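The same slice-wise conversion can be sketched as a small self-contained helper. This is only an illustration under stated assumptions: the function name `read_hex_columns` and the in-memory CSV text are hypothetical, the file is replaced by `io.StringIO` so the snippet runs anywhere, and one value was chosen above 2**53 so that any detour through float64 would be visible.

```python
import io

import pandas as pd

# Hypothetical sample data: hexadecimal integers with empty (NULL) fields.
# 0x20000000000001 is 2**53 + 1, which float64 cannot represent exactly.
CSV_TEXT = "a,b,c,d\n0xff,0x20000000000001,,\n0xfe,,0xfc,\n"


def read_hex_columns(text, columns):
    """Read `columns` as strings, then convert only the non-null cells."""
    # dtype=object keeps the raw strings; empty fields become NaN.
    frame = pd.read_csv(io.StringIO(text), dtype={c: object for c in columns})
    for c in columns:
        present = frame[c].notna()
        # Assign into the non-null slice only, so the column stays object
        # and the missing cells are never mixed into a numeric conversion.
        frame.loc[present, c] = frame.loc[present, c].map(lambda s: int(s, 16))
    return frame


frame = read_hex_columns(CSV_TEXT, list("abcd"))
print(frame.dtypes)
print(frame["b"].tolist())
```

The trade-off is the one described above: the affected columns stay object rather than becoming a numeric dtype, which is the price of keeping both exact integers and missing values in the same column.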
As far as I have been able to determine, at least up to version 0.15.2, there is no way to avoid post-processing string values in situations like this.