Splitting a list in a Pandas cell into multiple columns

I have a really simple Pandas dataframe where each cell contains a list. I would like to split each list item into its own column. I can do this by exporting the values and then creating a new dataframe. This does not seem like a good way to do it, especially if my dataframe had another column besides the list column.

    import pandas as pd

    df = pd.DataFrame(data=[[[8,10,12]], [[7,9,11]]])
    df = pd.DataFrame(data=[x[0] for x in df.values])

Desired output:

       0   1   2
    0  8  10  12
    1  7   9  11

Follow-up based on @Psidom's answer:

If I had a second column:

 df = pd.DataFrame(data=[[[8,10,12], 'A'], [[7,9,11], 'B']]) 

How can I avoid losing the other column?

Desired output:

       0   1   2  3
    0  8  10  12  A
    1  7   9  11  B
2 answers

You can iterate over the Series with apply() and convert each list to a Series; this automatically expands each list across the columns:

    df[0].apply(pd.Series)

    #    0   1   2
    # 0  8  10  12
    # 1  7   9  11

Update: to keep the other columns of the data frame, concatenate the result with the columns you want to keep:

    pd.concat([df[0].apply(pd.Series), df[1]], axis=1)

    #    0   1   2  1
    # 0  8  10  12  A
    # 1  7   9  11  B
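Note that the concatenated frame carries two columns labelled 1, whereas the question asks for columns 0 to 3. A minimal sketch of one way to relabel them, assuming the positional order of the columns is what matters (the variable name `expanded` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame(data=[[[8, 10, 12], 'A'], [[7, 9, 11], 'B']])

# Expand the list column, then append the remaining column.
expanded = pd.concat([df[0].apply(pd.Series), df[1]], axis=1)

# The result has duplicate column labels (0, 1, 2, 1);
# relabel the columns by position so they run 0..n-1.
expanded.columns = range(expanded.shape[1])

print(expanded)
```

Relabelling by position is only one option; assigning descriptive names to `expanded.columns` works the same way.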

You can use pd.DataFrame(df[col].values.tolist()) instead of apply(pd.Series). It is much faster, by a factor of roughly 250 in the timings below.

    In [820]: pd.DataFrame(df[0].values.tolist())
    Out[820]:
       0   1   2
    0  8  10  12
    1  7   9  11

    In [821]: pd.concat([pd.DataFrame(df[0].values.tolist()), df[1]], axis=1)
    Out[821]:
       0   1   2  1
    0  8  10  12  A
    1  7   9  11  B
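One caveat with this approach: a frame built from a plain list of lists gets a fresh 0..n-1 index, so the concat above lines up only because df already has the default index. A hedged sketch of how the original index could be carried over, using a made-up index of 'x' and 'y' purely for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    data=[[[8, 10, 12], 'A'], [[7, 9, 11], 'B']],
    index=['x', 'y'],  # hypothetical non-default index, for illustration only
)

# Pass the original index so the expanded rows stay aligned with df.
lists = pd.DataFrame(df[0].tolist(), index=df.index)

# concat aligns on the index, so each list's values sit next to their own row.
result = pd.concat([lists, df[1]], axis=1)

print(result)
```

Without `index=df.index`, the concat would align labels 0 and 1 against 'x' and 'y' and produce extra rows filled with NaN.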

Timings

Medium

    In [828]: df.shape
    Out[828]: (20000, 2)

    In [829]: %timeit pd.DataFrame(df[0].values.tolist())
    100 loops, best of 3: 15 ms per loop

    In [830]: %timeit df[0].apply(pd.Series)
    1 loop, best of 3: 4.06 s per loop

Large

    In [832]: df.shape
    Out[832]: (200000, 2)

    In [833]: %timeit pd.DataFrame(df[0].values.tolist())
    10 loops, best of 3: 161 ms per loop

    In [834]: %timeit df[0].apply(pd.Series)
    1 loop, best of 3: 40.9 s per loop
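For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The row count `n` and the repeated sample data are assumptions chosen to keep the run short; absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

n = 2000  # hypothetical size, much smaller than the frames timed above
df = pd.DataFrame({0: [[8, 10, 12]] * n, 1: ['A'] * n})

# Both approaches should yield the same values.
fast = pd.DataFrame(df[0].values.tolist())
slow = df[0].apply(pd.Series)
same = bool((fast.to_numpy() == slow.to_numpy()).all())

# Time each approach over a few runs.
t_fast = timeit.timeit(lambda: pd.DataFrame(df[0].values.tolist()), number=3)
t_slow = timeit.timeit(lambda: df[0].apply(pd.Series), number=3)

print(f"identical values: {same}")
print(f"tolist: {t_fast:.4f} s, apply(pd.Series): {t_slow:.4f} s")
```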

Source: https://habr.com/ru/post/1260680/

