Pandas concat ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

Question

I am trying to analyze the GISETTE dataset from a feature selection task.

When I try to concatenate the training DataFrame with a Series of labels, following the pandas example, it throws

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

code:

  import string
  import pandas as pd

  # The 500 two-letter column names 'AA', 'AB', ..., 'TE', 'TF'
  letters = string.ascii_uppercase
  names = [a + b for a in letters for b in letters][:500]

  trainData = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.data',
                            delim_whitespace=True,
                            header=None,
                            names=names)
  # print 'finished with train data'
  trainLabel = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.labels',
                             squeeze=True,
                             names=['label'],
                             delim_whitespace=True,
                             header=None)
  trainData.info()

outputs

  <class 'pandas.core.frame.DataFrame'>
  MultiIndex: 6000 entries
  Columns: 500 entries, AA to TF
  dtypes: int64(500)None

and

  trainLabel.describe()

outputs

  count    6000.000000
  mean        0.000000
  std         1.000083
  min        -1.000000
  25%        -1.000000
  50%         0.000000
  75%         1.000000
  max         1.000000
  dtype: float64

The failing call is

  readyToTrain = pd.concat([trainData, trainLabel], axis=1)

Full stack trace:

  File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 717, in concat
    verify_integrity=verify_integrity)
  File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 848, in __init__
    self.new_axes = self._get_new_axes()
  File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 898, in _get_new_axes
    new_axes[i] = self._get_comb_axis(i)
  File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 924, in _get_comb_axis
    return _get_combined_index(all_indexes, intersect=self.intersect)
  File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3991, in _get_combined_index
    union = _union_indexes(indexes)
  File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 4017, in _union_indexes
    result = result.union(other)
  File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3753, in union
    uniq_tuples = lib.fast_unique_multiple([self.values, other.values])
  File "lib.pyx", line 366, in pandas.lib.fast_unique_multiple (pandas\lib.c:8378)
  ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

Edit: I installed pandas from a binary file from lfd.uci.edu/~gohlke/pythonlibs (pandas-0.14.1.win-amd64-py2.7).

I tried the suggestion to convert the Series to a DataFrame; it failed with the same stack trace as above.

DataFrame info (trainData):

  <class 'pandas.core.frame.DataFrame'>
  MultiIndex: 6000 entries, (550, 0, 495, 0, 0, 0, 0, 976, 0, 0, 0, 0, 983, 0, 995, 0, 983, 0, 0, 983, 0, 0, 0, 0, 0, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 808, 0, 778, 0, 983, 0, 0, 0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 991, 983, 983, 0, 0, 0, 0, 0, 0, 0, 983, 735, 0, 0, 983, 983, 0, 0, 0, 0, 569, 0, 0, 0, 0, 713, 0, 0, 0, 0, 0, 983, 983, 0, ...) to (0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 948, 995, 348, 0, 0, 0, 0, 0, 0, 0, 0, 0, 751, 0, 0, 0, 0, 0, 0, 0, 0, 804, 0, 0, 0, 862, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 0, 0, 0, 0, 995, 0, 0, 0, 0, 0, 0, 840, 0, 0, 0, 976, 0, 0, 0, 0, 0, 0, 777, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)
  Columns: 500 entries, AA to TF
  dtypes: int64(500)None

DataFrame info (trainLabel):

  <class 'pandas.core.frame.DataFrame'>
  Int64Index: 6000 entries, 0 to 5999
  Data columns (total 1 columns):
  label    6000 non-null int64
  dtypes: int64(1)None

python-2.7 pandas

lapolonio Sep 13 '14 at 11:09

1 answer

The unfun cat · Answer 1 · 2019-05-10T11:43:09+0000

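As Joris pointed out (and as I had to find out for myself, because I didn't read the comments at first), the problem lies in your indexes. trainData has a MultiIndex while trainLabel has a plain Int64Index, so concat with axis=1 fails when it tries to take the union of the two row indexes (the _union_indexes call in the stack trace).

Change your code from

 pd.concat(to_concat, axis=1)

to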

 pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)
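
A minimal sketch of that fix on toy data, in case it helps to see it run. The names features and labels and the values in them are made up for illustration; they only stand in for trainData and trainLabel. The frame gets a MultiIndex on purpose so that its row index differs from the default index of the labels, as in the question.

```python
import pandas as pd

# Hypothetical stand-in for trainData: a frame whose row index is a MultiIndex.
features = pd.DataFrame(
    {"AA": [1, 2, 3], "AB": [4, 5, 6]},
    index=pd.MultiIndex.from_tuples([(550, 0), (0, 991), (7, 7)]),
)

# Hypothetical stand-in for trainLabel: a Series with the default 0..n-1 index.
labels = pd.Series([1, -1, 1], name="label")

to_concat = [features, labels]

# Drop every existing row index so both objects share a fresh 0..n-1 index,
# then concatenate column-wise.
ready_to_train = pd.concat(
    [s.reset_index(drop=True) for s in to_concat], axis=1
)

print(ready_to_train)
```

Note that after reset_index(drop=True) the rows are matched purely by position, so this assumes the labels are in the same order as the data rows.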
