I need to convert the data stored in a pandas.DataFrame to a string of bytes, where each column can have a separate data type (integer or floating point). Here is a simple data set:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame([10, 15, 20], dtype='u1', columns=['a'])
    df['b'] = np.array([np.iinfo('u8').max, 230498234019, 32094812309], dtype='u8')
    df['c'] = np.array([1.324e10, 3.14159, 234.1341], dtype='f8')
and df looks something like this:
        a                     b             c
    0  10  18446744073709551615  1.324000e+10
    1  15          230498234019  3.141590e+00
    2  20           32094812309  2.341341e+02
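and df.dtypes confirms that each column kept its own type (output roughly as below; the exact formatting may vary by pandas version):

    a      uint8
    b     uint64
    c    float64
    dtype: object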
Since the DataFrame already knows the type of each column via df.dtypes, I would like to do something like this:
    data_to_pack = [tuple(record) for _, record in df.iterrows()]
    data_array = np.array(data_to_pack, dtype=list(zip(df.columns, df.dtypes)))
    data_bytes = data_array.tostring()
This usually works fine, but in this case it fails because of the maximum uint64 value stored in df['b'][0]. The second line above, which converts the list of tuples to an np.array with the given set of dtypes, raises the following error:
    OverflowError: Python int too large to convert to C long
I believe the error actually originates in the first line: iterrows() returns each record as a Series with a single dtype (float64 for this mix of columns), and the float64 representation chosen for the maximum uint64 value cannot be converted back to uint64.
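A quick check of that hypothesis (the printed values below are what I would expect; they may differ by pandas/numpy version):

    # Inspect the first row as returned by iterrows(): everything is upcast to
    # float64, so the uint64 maximum is rounded to the nearest representable float.
    _, first_row = next(df.iterrows())
    print(first_row.dtype)       # float64
    print(first_row['b'])        # 1.8446744073709552e+19, no longer exactly iinfo('u8').max
    print(int(first_row['b']))   # 18446744073709551616 == 2**64, which overflows uint64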
1) Since the DataFrame already knows the type of each column, is there a way to avoid building the list of tuples as input to the typed numpy.array constructor? Or is there a better way than the one above to preserve the per-column type information in such a conversion?
2) Is there a way to go directly from the DataFrame to a byte string representing the data, using the type information for each column?
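For question 2, this is roughly the kind of one-liner I am hoping exists (I have not verified that it preserves the uint64 values exactly):

    # Hypothetical: build a record array that keeps each column's dtype,
    # then dump its raw bytes.
    data_bytes = df.to_records(index=False).tostring()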