Creating a DataFrame from Lists and Strings

Suppose I have a DataFrame whose columns hold strings, lists, and integers. I would like to expand it into a new DataFrame in which every element of each list gets its own row, paired with the string and integer from the same record. How can I do this?

In this example:

    import pandas as pd

    data = {'fruits': ['banana', 'apple', 'pear'],
            'source': (['brazil', 'algeria', 'nigera'],
                       ['brazil', 'morocco', 'iran', 'france'],
                       ['china', 'india', 'mexico']),
            'prices': [2, 3, 7]}
    df = pd.DataFrame(data, columns=['fruits', 'source', 'prices'])

I would like to get a frame with 3 columns and 10 rows:

    ['banana', 'banana', 'banana', 'apple', 'apple', 'apple', 'apple', 'pear', 'pear', 'pear'],
    ['brazil', 'algeria', 'nigera', 'brazil', 'morocco', 'iran', 'france', 'china', 'india', 'mexico'],
    ['2', '2', '2', '3', '3', '3', '3', '7', '7', '7'],

I think it should not be too difficult, but I cannot find a neat solution.

3 answers

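Use an explode() helper function (called here as a free function, so it is a user-defined helper rather than the built-in DataFrame.explode method; its definition is not included in this post):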

    In [30]: explode(df, lst_cols='source')
    Out[30]:
       fruits   source  prices
    0  banana   brazil       2
    1  banana  algeria       2
    2  banana   nigera       2
    3   apple   brazil       3
    4   apple  morocco       3
    5   apple     iran       3
    6   apple   france       3
    7    pear    china       7
    8    pear    india       7
    9    pear   mexico       7
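
For readers on pandas 0.25 or later, the built-in DataFrame.explode method produces the same result without a custom helper. The following is a minimal sketch, assuming the example data from the question (with the 'source' column written as a list of lists); the variable name result is chosen here for illustration only.

```python
import pandas as pd

data = {
    'fruits': ['banana', 'apple', 'pear'],
    'source': [['brazil', 'algeria', 'nigera'],
               ['brazil', 'morocco', 'iran', 'france'],
               ['china', 'india', 'mexico']],
    'prices': [2, 3, 7],
}
df = pd.DataFrame(data, columns=['fruits', 'source', 'prices'])

# DataFrame.explode gives every list element its own row and repeats
# the values of the other columns. The original row labels are repeated
# as well, so reset the index to get a fresh 0..9 range.
result = df.explode('source').reset_index(drop=True)
print(result)
```

Unlike the helper above, the built-in method takes the column name as its first argument rather than an lst_cols keyword.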

Using stack and apply(pd.Series)

    df.set_index(['fruits','prices']).source.apply(pd.Series).\
        stack().reset_index(level=['fruits','prices']).\
        rename(columns={0:'source'})
    Out[64]:
       fruits  prices   source
    0  banana       2   brazil
    1  banana       2  algeria
    2  banana       2   nigera
    0   apple       3   brazil
    1   apple       3  morocco
    2   apple       3     iran
    3   apple       3   france
    0    pear       7    china
    1    pear       7    india
    2    pear       7   mexico

Option 2: rebuild the DataFrame by repeating the rows and concatenating the lists

    import numpy as np

    # repeat each row once per element of its 'source' list
    df1 = df[['fruits','prices']].reindex(df.index.repeat(df.source.apply(len)))
    # flatten the lists into one array that lines up with the repeated rows
    df1['source'] = np.concatenate(df.source.values)
    df1
    Out[69]:
       fruits  prices   source
    0  banana       2   brazil
    0  banana       2  algeria
    0  banana       2   nigera
    1   apple       3   brazil
    1   apple       3  morocco
    1   apple       3     iran
    1   apple       3   france
    2    pear       7    china
    2    pear       7    india
    2    pear       7   mexico
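
The repeat-and-concatenate idea can be wrapped in a small reusable function. This is only a sketch: explode_list_column is a hypothetical name introduced here (it is not the explode() helper from the first answer), and it assumes a non-empty frame in which every cell of the chosen column is a list.

```python
import numpy as np
import pandas as pd


def explode_list_column(frame, col):
    """Hypothetical helper: one output row per element of the lists in `col`."""
    # Number of elements in each list, i.e. how often each row is repeated.
    lengths = frame[col].map(len).to_numpy()
    # Positional row numbers, each repeated according to its list length.
    positions = np.repeat(np.arange(len(frame)), lengths)
    # Repeat the remaining columns; iloc works even with duplicate labels.
    out = frame.drop(columns=[col]).iloc[positions].copy()
    # Flatten the lists in row order so they line up with the repeated rows.
    # np.concatenate needs at least one list, hence the non-empty assumption.
    out[col] = np.concatenate(frame[col].to_numpy())
    return out.reset_index(drop=True)


df = pd.DataFrame({
    'fruits': ['banana', 'apple', 'pear'],
    'source': [['brazil', 'algeria', 'nigera'],
               ['brazil', 'morocco', 'iran', 'france'],
               ['china', 'india', 'mexico']],
    'prices': [2, 3, 7],
})
long_df = explode_list_column(df, 'source')
print(long_df)
```

Selecting rows by position rather than by label is a deliberate choice here, so the function does not depend on the input having a unique index.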

My take, using concat + melt. Note that the rows come out grouped by list position rather than by fruit:

    c = ['fruits', 'prices']
    # axis is passed by keyword; recent pandas versions no longer accept it positionally
    df = (pd.concat([pd.DataFrame(df.source.tolist()), df[c]], axis=1)
            .melt(c, value_name='source')
            .drop('variable', axis=1)
            .dropna())
    df
        fruits  prices   source
    0   banana       2   brazil
    1    apple       3   brazil
    2     pear       7    china
    3   banana       2  algeria
    4    apple       3  morocco
    5     pear       7    india
    6   banana       2   nigera
    7    apple       3     iran
    8     pear       7   mexico
    10   apple       3   france

Source: https://habr.com/ru/post/1274047/

