Pandas data frame headers shift when reading csv

Question

I am trying to read data from a csv file into a pandas DataFrame, but the headers end up shifted over by two columns.

I think this is because there are two blank lines after the header, but I'm not sure. It looks like the first two columns are being read as row labels (the index).

CSV format:

VendorID,lpep_pickup_datetime,Lpep_dropoff_datetime,Store_and_fwd_flag,RateCodeID,Pickup_longitude,Pickup_latitude,Dropoff_longitude,Dropoff_latitude,Passenger_count,Trip_distance,Fare_amount,Extra,MTA_tax,Tip_amount,Tolls_amount,Ehail_fee,Total_amount,Payment_type,Trip_type 


2,2014-04-01 00:00:00,2014-04-01 14:24:20,N,1,0,0,0,0,1,7.45,23,0,0.5,0,0,,23.5,2,1,,
2,2014-04-01 00:00:00,2014-04-01 17:21:33,N,1,0,0,-73.987663269042969,40.780872344970703,1,8.95,31,1,0.5,0,0,,32.5,2,1,,

Data Frame Format:

                                   VendorID lpep_pickup_datetime  \
2 2014-04-01 00:00:00  2014-04-01 14:24:20                    N   
  2014-04-01 00:00:00  2014-04-01 17:21:33                    N   
  2014-04-01 00:00:00  2014-04-01 15:06:18                    N   
  2014-04-01 00:00:00  2014-04-01 08:09:27                    N   
  2014-04-01 00:00:00  2014-04-01 16:15:13                    N   

                       Lpep_dropoff_datetime  Store_and_fwd_flag  RateCodeID  \
2 2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0

Code below:

import pandas as pd

file = 'green_tripdata_2014-04.csv'
df4 = pd.read_csv(file)
print(df4.head(5))

I just need to read it into a data frame with the headers in the right place.

python pandas csv

Ben Price Nov 17 '15 at 18:04

1 answer

chris-sc · Accepted Answer · 2015-11-17T19:46:35+0000

Your csv data looks weird - you have 20 column headers, but 22 fields in the first row with data.
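A quick way to confirm a mismatch like this is to count the fields on each non-blank line with the standard csv module. This is only a sketch: the `field_counts` helper and the three-column sample text are made up for illustration and stand in for the real file.

```python
import csv
import io


def field_counts(text):
    """Return the number of fields on each non-blank line of csv text."""
    # csv.reader yields an empty list for a blank line, so those are skipped.
    return [len(row) for row in csv.reader(io.StringIO(text)) if row]


# Hypothetical miniature of the file: 3 header names, two blank lines,
# then data rows that end with two extra empty fields (",,").
sample = "\n".join([
    "a,b,c",
    "",
    "",
    "1,2,3,,",
    "4,5,6,,",
]) + "\n"

counts = field_counts(sample)
print("header fields:", counts[0])
print("data row fields:", counts[1:])
```

If the header count is smaller than the data row counts, pandas has more values than names on each row, which is the situation described in the question.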

Assuming that is just a typo, the following should work*:

df = pd.read_csv(file, skiprows=[1,2], index_col=False)

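skiprows skips the two blank lines after the header, and index_col=False stops pandas from using the first column as the index.

See http://pandas.pydata.org/pandas-docs/version/0.16.2/generated/pandas.read_csv.html for more details on reading csv files.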

Edit:

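*: On closer inspection, the csv file has two empty columns at the end of each data row (see the trailing ,, on each line).

That is what throws off the parsing.

A cleaner solution would be: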

pd.read_csv("file.csv", skiprows=[1,2], usecols=np.arange(20))

np.arange(20) tells pandas to use only columns 1-20, that is, positions 0 through 19 (the ones that actually contain data).
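To see the effect without the original file, here is a small self-contained sketch. It assumes a made-up three-column csv with the same shape as the real one (blank lines after the header, two trailing empty fields per row), so the column names and values are illustrative only.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: 3 header names, two blank lines,
# then data rows that end with two extra empty fields (",,").
csv_text = "\n".join([
    "a,b,c",
    "",
    "",
    "1,2,3,,",
    "4,5,6,,",
]) + "\n"

# Default read: each row has two more fields than there are header names,
# so pandas treats the first two columns as the index and the headers shift.
shifted = pd.read_csv(io.StringIO(csv_text))
print(shifted)

# Skip the blank lines and keep only the first three columns by position.
fixed = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2], usecols=[0, 1, 2])
print(fixed)
```

In the second read the headers line up with the values, and the trailing empty columns are simply never parsed.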
