What is the fastest way to assemble a DataFrame in parts?

Question


I download price data from Bloomberg and want to build a DataFrame in the fastest and least memory-intensive way. Let's say I send a data request to Bloomberg via Python for price data for all current S&P 500 stocks from 1-1-2000 to 1-1-2013. The data is returned ticker by ticker, and then one date and value at a time. My current method is to create one list to store the dates and another to store the prices, and to append the date and price to each list as they are read from the Bloomberg data request. Then, when all dates and prices have been read for a specific ticker, I create a DataFrame for the ticker using

    ticker_df = pd.DataFrame(price_list, index=dates_list, columns=[ticker], dtype=float)

I do this for each ticker, appending each ticker DataFrame to a list with df_list.append(ticker_df) after that ticker's data has been read. When all the ticker DataFrames are done, I merge the individual DataFrames into one DataFrame:

    lg_index = []
    for num in range(len(df_list)):
        if len(lg_index) < len(df_list[num].index):
            lg_index = df_list[num].index

    # Use the largest index for creating the result_df
    result_df = pd.DataFrame(index=lg_index)
    for num in range(len(df_list)):
        result_df[df_list[num].columns[0]] = df_list[num]

The reason I do it this way is that the indices for each ticker are not identical (for example, if a stock only had its IPO last year).

I suppose there must be a better way to accomplish what I'm doing here, using less memory and running faster, but I just can't think of it. Thanks!


performance python pandas memory dataframe

geronimo Jun 17 '13 at 16:22


1 answer

Andy Hayden · Answer 1 · 2013-06-17T16:36:13+0000

I am not 100% sure what you are after, but you can concat a list of DataFrames:

 pd.concat(df_list)

For instance:

    In [11]: df = pd.DataFrame([[1, 2], [3, 4]])

    In [12]: pd.concat([df, df, df])
    Out[12]:
       0  1
    0  1  2
    1  3  4
    0  1  2
    1  3  4
    0  1  2
    1  3  4

    In [13]: pd.concat([df, df, df], axis=1)
    Out[13]:
       0  1  0  1  0  1
    0  1  2  1  2  1  2
    1  3  4  3  4  3  4
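For the case described in the question, where each ticker has its own date index, a minimal sketch of the column-wise concat might look like the following. The tickers, dates, and prices here are made up purely for illustration. By default pd.concat uses join='outer', so with axis=1 the result is indexed by the union of the input indices and any missing dates are filled with NaN.

```python
import pandas as pd

# Hypothetical per-ticker frames with different date ranges (made-up data).
aaa = pd.DataFrame(
    [10.0, 10.5, 11.0],
    index=pd.to_datetime(["2012-12-27", "2012-12-28", "2012-12-31"]),
    columns=["AAA"],
)
bbb = pd.DataFrame(
    [20.0, 21.0],
    index=pd.to_datetime(["2012-12-28", "2012-12-31"]),
    columns=["BBB"],
)
df_list = [aaa, bbb]

# axis=1 places the frames side by side, aligned on the union of their indices.
result_df = pd.concat(df_list, axis=1)
print(result_df)
```

This would replace both the search for the largest index and the column-by-column assignment loop, and it also keeps dates that appear in a shorter frame but not in the longest one.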

or do an outer join:

    In [14]: df1 = pd.DataFrame([[1, 2]], columns=[0, 2])

    In [15]: df.merge(df1, how='outer')  # do several of these
    Out[15]:
       0  1   2
    0  1  2   2
    1  3  4 NaN
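One way to "do several of these" over a whole list is to fold the list with functools.reduce. The sketch below is an illustration rather than part of the original answer: it uses DataFrame.join with how='outer', which joins on the index (the natural key for date-indexed price data), and the tickers and prices are invented.

```python
from functools import reduce

import pandas as pd

# Hypothetical single-column frames with partly overlapping dates (made-up data).
a = pd.DataFrame([1.0, 2.0], index=pd.to_datetime(["2013-01-02", "2013-01-03"]), columns=["AAA"])
b = pd.DataFrame([3.0, 4.0], index=pd.to_datetime(["2013-01-03", "2013-01-04"]), columns=["BBB"])
c = pd.DataFrame([5.0, 6.0], index=pd.to_datetime(["2013-01-02", "2013-01-04"]), columns=["CCC"])
frames = [a, b, c]

# Repeatedly outer-join the accumulated result with the next frame, on the index.
merged = reduce(lambda left, right: left.join(right, how="outer"), frames)
print(merged)
```

Each step builds a new intermediate DataFrame, so for several hundred tickers a single pd.concat call is likely the simpler choice; this sketch does not measure which is faster.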

See the merge, join, and concatenate section of the docs.
