Pandas data frame headers shift when reading csv

I am trying to read data from a csv file into a pandas data frame, but the headers are shifted over by two columns when the file is read.

I think this is because there are two blank lines after the header, but I'm not sure. It looks like the first two columns are being read as row headers / indexes.

CSV format:

VendorID,lpep_pickup_datetime,Lpep_dropoff_datetime,Store_and_fwd_flag,RateCodeID,Pickup_longitude,Pickup_latitude,Dropoff_longitude,Dropoff_latitude,Passenger_count,Trip_distance,Fare_amount,Extra,MTA_tax,Tip_amount,Tolls_amount,Ehail_fee,Total_amount,Payment_type,Trip_type 


2,2014-04-01 00:00:00,2014-04-01 14:24:20,N,1,0,0,0,0,1,7.45,23,0,0.5,0,0,,23.5,2,1,,
2,2014-04-01 00:00:00,2014-04-01 17:21:33,N,1,0,0,-73.987663269042969,40.780872344970703,1,8.95,31,1,0.5,0,0,,32.5,2,1,,

Data Frame Format:

                                   VendorID lpep_pickup_datetime  \
2 2014-04-01 00:00:00  2014-04-01 14:24:20                    N   
  2014-04-01 00:00:00  2014-04-01 17:21:33                    N   
  2014-04-01 00:00:00  2014-04-01 15:06:18                    N   
  2014-04-01 00:00:00  2014-04-01 08:09:27                    N   
  2014-04-01 00:00:00  2014-04-01 16:15:13                    N   

                       Lpep_dropoff_datetime  Store_and_fwd_flag  RateCodeID  \
2 2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0   
  2014-04-01 00:00:00                      1                   0           0  

Code below:

import pandas as pd

file = 'green_tripdata_2014-04.csv'
df4 = pd.read_csv(file)
print(df4.head(5))

I just need to read it into a data frame with the headers in the right place.

1 answer

Your csv data looks weird - you have 20 column headers, but 22 fields in the first data row.
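If you want to confirm the mismatch yourself, you can count the fields with the standard csv module. This is only a sketch: `sample` is a shortened, made-up stand-in for your file (3 headers instead of 20), not the real data.

```python
import csv
import io

# Hypothetical, shortened stand-in for the file in the question:
# a header line, two blank lines, then data rows ending in two extra commas.
sample = (
    "VendorID,Passenger_count,Trip_distance\n"
    "\n"
    "\n"
    "2,1,7.45,,\n"
    "2,1,8.95,,\n"
)

# csv.reader yields an empty list for a blank line, so filter those out.
rows = [row for row in csv.reader(io.StringIO(sample)) if row]
header, first = rows[0], rows[1]

# The data row has two more fields than the header.
print(len(header), len(first))
```

On the real file you would pass an open file handle to `csv.reader` instead of the `io.StringIO` object.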

Assuming the extra fields are not meant to be there*, you could try the following:

df = pd.read_csv(file, skiprows=[1,2], index_col=False)

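skiprows skips the two blank lines after the header, and index_col=False stops pandas from using the first columns as the index.

See http://pandas.pydata.org/pandas-docs/version/0.16.2/generated/pandas.read_csv.html for more on reading csv files.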
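Here is a small sketch of the difference, again on a made-up, shortened stand-in for your file (the `sample` string and its 3 columns are assumptions for illustration only):

```python
import io

import pandas as pd

# Hypothetical, shortened stand-in for the file in the question.
sample = (
    "VendorID,Passenger_count,Trip_distance\n"
    "\n"
    "\n"
    "2,1,7.45,,\n"
    "2,1,8.95,,\n"
)

# Default call: the two surplus leading fields are taken as a two-level
# index, so the header names end up over the wrong data.
shifted = pd.read_csv(io.StringIO(sample))

# Skip the two blank lines (file lines 1 and 2, counting from 0) and tell
# pandas not to build an index from the leading columns.
fixed = pd.read_csv(io.StringIO(sample), skiprows=[1, 2], index_col=False)

print(shifted.index.nlevels)
print(fixed)
```

Some pandas versions may emit a ParserWarning about the header and data lengths not matching when index_col=False drops the surplus fields.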

Edit:

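*: Looking more closely, the data rows in your csv end with trailing commas (i.e. ,,).

That is where the two extra fields in each row come from.

Another option is to read only the columns that have headers: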

pd.read_csv("file.csv", skiprows=[1,2], usecols=np.arange(20))

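np.arange(20) makes sure that only columns 1-20 (the ones with headers) are imported; as zero-based positions these are 0-19.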
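The same idea on a shortened, made-up stand-in for your file, which has 3 headers, so `np.arange(3)` plays the role of `np.arange(20)` (the `sample` string is an assumption for illustration only):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical, shortened stand-in for the file in the question.
sample = (
    "VendorID,Passenger_count,Trip_distance\n"
    "\n"
    "\n"
    "2,1,7.45,,\n"
    "2,1,8.95,,\n"
)

# np.arange(3) is array([0, 1, 2]): keep only the first three column
# positions, i.e. the ones that have a header, and ignore the trailing
# empty fields.
df = pd.read_csv(io.StringIO(sample), skiprows=[1, 2], usecols=np.arange(3))

print(df)
```

For the real file you would pass the file name and `np.arange(20)`, as in the line above.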


Source: https://habr.com/ru/post/1616259/

