Concatenate a list of pandas data frames, ignoring column names

Subheading: Dumb it down, pandas; stop trying to be smart.

I have a list ( res ) of single-column pandas data frames, each containing the same type of numeric data, but each with a different column name. Row indices have no meaning. I want to stack them into one very long, single-column data frame.

When I do pd.concat(res) , I get one column per input frame (and lots and lots of NaN cells). I tried various values for the parameters (*), but none of them does what I need.

Edit: Example data:

    res = [
        pd.DataFrame({'A': [1, 2, 3]}),
        pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
        pd.DataFrame({'C': [100, 200, 300, 400]}),
    ]

I have an ugly solution: go through each data frame and give it a new column name:

    newList = []
    for r in res:
        r.columns = ["same"]
        newList.append(r)
    pd.concat(newList, ignore_index=True)
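A non-mutating variant of the same rename-and-concat idea (my sketch, not from the original post): in recent pandas versions, set_axis returns a renamed copy, so the frames in res keep their original column names.

```python
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]

# Give every single-column frame the same column name (on a copy),
# then stack them vertically with a fresh index.
out = pd.concat(
    [df.set_axis(['same'], axis=1) for df in res],
    ignore_index=True,
)
```

Unlike the loop above, this leaves the originals untouched, so it is safe to run twice.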

Surely that isn't the best way to do this?

BTW, pandas: concat data frames with different column names is similar, but my question is even simpler, since I do not want the index preserved. (I also start with a list of N single-column frames, not with one N-column data frame.)

*: e.g. axis=0 is the default behavior; axis=1 gives an error; join="inner" is just silly (I get only the index); ignore_index=True renumbers the index, but I still get many columns and many NaNs.


UPDATE for empty lists

I ran into problems (with all of these solutions) when the data contained an empty list, for example:

    res = [
        pd.DataFrame({'A': [1, 2, 3]}),
        pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
        pd.DataFrame({'C': []}),
        pd.DataFrame({'D': [100, 200, 300, 400]}),
    ]

The trick was to force the dtype by adding .astype('float64') . For instance:

    pd.Series(np.concatenate([df.values.ravel().astype('float64') for df in res]))

or

    pd.concat(res, axis=0).astype('float64').stack().reset_index(drop=True)
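Putting the update together, a self-contained sketch of the dtype workaround with the empty frame included (the comment about the empty frame's dtype is my assumption about why the cast is needed):

```python
import numpy as np
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': []}),                    # empty frame
    pd.DataFrame({'D': [100, 200, 300, 400]}),
]

# Casting each flattened array to float64 before concatenating keeps
# the empty frame from dragging the result to a non-numeric dtype.
out = pd.Series(np.concatenate([df.values.ravel().astype('float64') for df in res]))
```

The empty frame contributes zero elements, so the result is the same 13 float values as before.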
2 answers

I would collect the values into a plain list, for example:

    import pandas as pd

    res = [
        pd.DataFrame({'A': [1, 2, 3]}),
        pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
        pd.DataFrame({'C': [100, 200, 300, 400]}),
    ]

    x = []
    [x.extend(df.values.tolist()) for df in res]
    pd.DataFrame(x)

    Out[49]:
          0
    0     1
    1     2
    2     3
    3     9
    4     8
    5     7
    6     6
    7     5
    8     4
    9   100
    10  200
    11  300
    12  400

I timed the options for you:

    %timeit x = []; [x.extend(df.values.tolist()) for df in res]; pd.DataFrame(x)
    10000 loops, best of 3: 196 µs per loop

    %timeit pd.Series(pd.concat(res, axis=1).values.ravel()).dropna()
    1000 loops, best of 3: 920 µs per loop

    %timeit pd.concat(res, axis=1).stack().reset_index(drop=True)
    1000 loops, best of 3: 902 µs per loop

    %timeit pd.DataFrame(pd.concat(res, axis=1).values.ravel(), columns=['col']).dropna()
    1000 loops, best of 3: 1.07 ms per loop

    %timeit pd.Series(np.concatenate([df.values.ravel() for df in res]))
    10000 loops, best of 3: 70.2 µs per loop

It looks like

    pd.Series(np.concatenate([df.values.ravel() for df in res]))

is the fastest.


I think you need concat with stack :

    print(pd.concat(res, axis=1))
         A  B      C
    0  1.0  9  100.0
    1  2.0  8  200.0
    2  3.0  7  300.0
    3  NaN  6  400.0
    4  NaN  5    NaN
    5  NaN  4    NaN

    print(pd.concat(res, axis=1).stack().reset_index(drop=True))
    0       1.0
    1       9.0
    2     100.0
    3       2.0
    4       8.0
    5     200.0
    6       3.0
    7       7.0
    8     300.0
    9       6.0
    10    400.0
    11    5.0
    12    4.0
    dtype: float64
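One thing worth noting (my observation, not part of the answer): stack walks the wide frame row by row, so values from the input frames come out interleaved rather than one frame after another. A small sketch contrasting the two orderings; the explicit dropna() is my addition to make the stack variant robust across pandas versions:

```python
import pandas as pd

res = [
    pd.DataFrame({'A': [1, 2, 3]}),
    pd.DataFrame({'B': [9, 8, 7, 6, 5, 4]}),
    pd.DataFrame({'C': [100, 200, 300, 400]}),
]

# stack() reads across each row of the wide frame: 1, 9, 100, 2, 8, ...
interleaved = pd.concat(res, axis=1).stack().dropna().reset_index(drop=True)

# Renaming each column to a common name and concatenating along axis 0
# keeps each frame's values contiguous: 1, 2, 3, 9, 8, ...
contiguous = pd.concat(
    [df.rename(columns={df.columns[0]: 'val'}) for df in res],
    ignore_index=True,
)['val']
```

If the order of values matters, the rename-and-concat route matches the input order; the stack route does not.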

Another solution with numpy.ravel for flattening:

    print(pd.Series(pd.concat(res, axis=1).values.ravel()).dropna())
    0       1.0
    1       9.0
    2     100.0
    3       2.0
    4       8.0
    5     200.0
    6       3.0
    7       7.0
    8     300.0
    10      6.0
    11    400.0
    13      5.0
    16      4.0
    dtype: float64

    print(pd.DataFrame(pd.concat(res, axis=1).values.ravel(), columns=['col']).dropna())
          col
    0     1.0
    1     9.0
    2   100.0
    3     2.0
    4     8.0
    5   200.0
    6     3.0
    7     7.0
    8   300.0
    10    6.0
    11  400.0
    13    5.0
    16    4.0

Solution with list comprehension :

    import numpy as np

    print(pd.Series(np.concatenate([df.values.ravel() for df in res])))
    0       1
    1       2
    2       3
    3       9
    4       8
    5       7
    6       6
    7       5
    8       4
    9     100
    10    200
    11    300
    12    400
    dtype: int64

Source: https://habr.com/ru/post/1261507/
