How to combine two columns of a DataFrame and apply pandas.to_datetime to it?

I will learn to use pandas to use it for some data analysis. The data is provided as a csv file with several columns, of which I need to use only 4 (date, time, o, c). I would like to create a new DataFrame that uses the DateTime64 number as an index, this number is created by merging the first two columns, applying pd.to_datetime to the joined row.

My bootloader code is working fine:

st = pd.read_csv("C:/Data/stockname.txt", names=["date","time","o","h","l","c","vol"]) 

The challenge is to convert the loaded DataFrame to a new one with the correct format. Below it works, but very slowly. Moreover, it simply creates a single column with the new datetime64 format and does not make it an index.

My code

 st_new = pd.concat([pd.to_datetime(st.date + " " + st.time), (st.o + st.c) / 2, st.vol], axis = 1, ignore_index=True) 

What would be a more pythonic way to merge two columns and apply the function to the result? How to make a new column a DataFrame index?

+4
source share
2 answers

You can do everything in the read_csv function:

 pd.read_csv('test.csv', parse_dates={'timestamp': ['date','time']}, index_col='timestamp', usecols=['date', 'time', 'o', 'c']) 

parse_dates tells read_csv combine the date and time column into one timestamp column and parse it as a timestamp. (pandas is smart enough to know how to parse a date in various formats)

index_col sets the timestamp column as an index.

usecols tells read_csv to select only a subset of columns.

+8
source

As for data loading, I think you have it. To set the index, do the following:

 st_new = pd.concat([(st.o + st.c) / 2, st.vol], axis=1, ignore_index=True) st_new.set_index(pd.to_datetime(st.date + " " + st.time), drop=True, inplace=True) 

Here's the API documentation for set_index .

0
source

Source: https://habr.com/ru/post/1495855/


All Articles