I am trying to analyze the GISETTE dataset from a feature selection challenge.
When I try to concatenate the training DataFrame with a Series of labels, following the pandas examples, it
throws:
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
code:
import string

import pandas as pd

# 500 two-letter column names in order, 'AA' through 'TF'
# (the same names as the original hand-written list).
columnNames = [a + b for a in string.ascii_uppercase
               for b in string.ascii_uppercase][:500]

trainData = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.data',
                          delim_whitespace=True,
                          header=None,
                          names=columnNames)
outputs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries
Columns: 500 entries, AA to TF
dtypes: int64(500)None

trainLabel.describe()
outputs
count    6000.000000
mean        0.000000
std         1.000083
min        -1.000000
25%        -1.000000
50%         0.000000
75%         1.000000
max         1.000000
dtype: float64

readyToTrain = pd.concat([trainData, trainLabel], axis=1)
full stack trace
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 717, in concat
  verify_integrity=verify_integrity)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 848, in __init__
  self.new_axes = self._get_new_axes()
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 898, in _get_new_axes
  new_axes[i] = self._get_comb_axis(i)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 924, in _get_comb_axis
  return _get_combined_index(all_indexes, intersect=self.intersect)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3991, in _get_combined_index
  union = _union_indexes(indexes)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 4017, in _union_indexes
  result = result.union(other)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3753, in union
  uniq_tuples = lib.fast_unique_multiple([self.values, other.values])
File "lib.pyx", line 366, in pandas.lib.fast_unique_multiple (pandas\lib.c:8378)
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
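The trace ends inside the index union step, which suggests that concat is trying to align two row indexes of different kinds. A minimal sketch of how one might compare the indexes of the two objects before concatenating is shown here. The frames toy_data and toy_label and the helper describe_index are hypothetical stand-ins, not the real GISETTE objects.

```python
import pandas as pd

# Hypothetical stand-ins for trainData and trainLabel (not the real GISETTE data).
toy_data = pd.DataFrame(
    {"AA": [0, 983, 0]},
    index=pd.MultiIndex.from_tuples([(550, 0), (0, 991), (0, 0)]),
)
toy_label = pd.DataFrame({"label": [1, -1, 1]})


def describe_index(frame):
    """Return the index class name and its number of levels."""
    return type(frame.index).__name__, frame.index.nlevels


data_index = describe_index(toy_data)
label_index = describe_index(toy_label)
print(data_index)
print(label_index)

# With axis=1, concat aligns rows on the index, so both indexes
# should have the same number of levels before concatenating.
same_levels = data_index[1] == label_index[1]
print(same_levels)
```

If the two level counts differ, the row indexes cannot line up one-to-one, which is worth resolving before calling pd.concat with axis=1.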
Edit: I installed pandas from the binary at lfd.uci.edu/~gohlke/pythonlibs (pandas-0.14.1.win-amd64-py2.7).
I tried the suggestion of converting the Series to a DataFrame; it did not work and produced the same stack trace as above.
DataFrame info (trainData):
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries, (550, 0, 495, 0, 0, 0, 0, 976, 0, 0, 0, 0, 983, 0, 995, 0, 983, 0, 0, 983, 0, 0, 0, 0, 0, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 808, 0, 778, 0, 983, 0, 0, 0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 991, 983, 983, 0, 0, 0, 0, 0, 0, 0, 983, 735, 0, 0, 983, 983, 0, 0, 0, 0, 569, 0, 0, 0, 0, 713, 0, 0, 0, 0, 0, 983, 983, 0, ...) to (0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 948, 995, 348, 0, 0, 0, 0, 0, 0, 0, 0, 0, 751, 0, 0, 0, 0, 0, 0, 0, 0, 804, 0, 0, 0, 862, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 0, 0, 0, 0, 995, 0, 0, 0, 0, 0, 0, 840, 0, 0, 0, 976, 0, 0, 0, 0, 0, 0, 777, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)
Columns: 500 entries, AA to TF
dtypes: int64(500)None
DataFrame info (trainLabel):
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6000 entries, 0 to 5999
Data columns (total 1 columns):
label    6000 non-null int64
dtypes: int64(1)None
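The MultiIndex reported for trainData, next to the plain Int64Index of trainLabel, is worth a closer look. One possible cause, offered purely as an assumption and not confirmed by anything in this post, is that each line of the data file holds more whitespace-separated fields than the 500 names passed to read_table. In that situation, pandas may treat the leading unnamed columns as the row index. A minimal sketch for checking this with the standard library follows. The sample text, the value of n_names, and the helper count_fields are all hypothetical.

```python
import io


def count_fields(handle, max_lines=5):
    """Return the number of whitespace-separated fields on the first few lines."""
    counts = []
    for line_number, line in enumerate(handle):
        if line_number >= max_lines:
            break
        counts.append(len(line.split()))
    return counts


# Hypothetical three-line sample standing in for gisette_train.data.
sample = io.StringIO(u"550 0 495 0 0 0\n0 0 991 0 0 0\n0 0 0 0 983 0\n")
n_names = 4  # hypothetical; the question passes 500 names

field_counts = count_fields(sample)
# Positive values mean the file has more fields per line than names supplied.
extra_fields = [n - n_names for n in field_counts]
print(field_counts, extra_fields)
```

To run this against the real data, one would pass an open handle on 'GISETTE/gisette_train.data' instead of sample and set n_names to 500. Any positive entry in extra_fields would support the assumption above.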