How to generate all pairs of values from a groupby result in a pandas DataFrame

I have a pandas DataFrame df:

    ID  words
    1   word1
    1   word2
    1   word3
    2   word4
    2   word5
    3   word6
    3   word7
    3   word8
    3   word9
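For anyone who wants to run the answers, the frame above can be rebuilt roughly as follows. This is a minimal sketch; it assumes ID is an integer column and words a string column, which the question does not state explicitly.

```python
import pandas as pd

# Rebuild the example frame from the question
# (dtypes are an assumption: integer IDs, string words).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "words": ["word1", "word2", "word3", "word4", "word5",
              "word6", "word7", "word8", "word9"],
})
print(df)
```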

I want to create another data frame that contains all pairs of words within each group. For the frame above, the result should be:

    ID  wordA  wordB
    1   word1  word2
    1   word1  word3
    1   word2  word3
    2   word4  word5
    3   word6  word7
    3   word6  word8
    3   word6  word9
    3   word7  word8
    3   word7  word9
    3   word8  word9

I know that I can use df.groupby('ID')['words'] to get the words in each ID.
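As an illustration of that grouping step, here is a small sketch that collects the words of each ID into a list. The name `words_per_id` is only chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "words": ["word1", "word2", "word3", "word4", "word5",
              "word6", "word7", "word8", "word9"],
})

# One list of words per ID; the result is a Series indexed by ID.
words_per_id = df.groupby("ID")["words"].apply(list)
print(words_per_id)
```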

I also know that I can use

    import itertools

    iterable = ['word1', 'word2', 'word3']
    list(itertools.combinations(iterable, 2))

to get all possible pairwise combinations. However, I am a bit lost as to the best way to generate the resulting data frame shown above.

+7
4 answers

Simply use itertools.combinations inside apply and then stack, i.e.:

    from itertools import combinations
    import pandas as pd

    # Build the list of pairs per ID, spread it into columns, then stack into rows.
    ndf = (df.groupby('ID')['words']
             .apply(lambda x: list(combinations(x.values, 2)))
             .apply(pd.Series).stack()
             .reset_index(level=0, name='words'))

Output:

       ID           words
    0   1  (word1, word2)
    1   1  (word1, word3)
    2   1  (word2, word3)
    0   2  (word4, word5)
    0   3  (word6, word7)
    1   3  (word6, word8)
    2   3  (word6, word9)
    3   3  (word7, word8)
    4   3  (word7, word9)
    5   3  (word8, word9)

To match the expected output exactly, we then need to split the tuples into two columns:

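    # axis is passed by keyword: recent pandas versions no longer accept
    # a positional axis in concat or the inplace argument of set_axis.
    sdf = (pd.concat([ndf['ID'], ndf['words'].apply(pd.Series)], axis=1)
             .set_axis(['ID', 'WordsA', 'WordsB'], axis=1))

Output:

       ID WordsA WordsB
    0   1  word1  word2
    1   1  word1  word3
    2   1  word2  word3
    0   2  word4  word5
    0   3  word6  word7
    1   3  word6  word8
    2   3  word6  word9
    3   3  word7  word8
    4   3  word7  word9
    5   3  word8  word9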

To do all of this in a single expression:

    combo = (df.groupby('ID')['words'].apply(combinations, 2)
               .apply(list).apply(pd.Series)
               .stack().apply(pd.Series)
               .set_axis(['WordsA', 'WordsB'], axis=1)
               .reset_index(level=0))
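A possible variation on the same idea, not part of the original answer, is to replace the apply(pd.Series).stack() step with Series.explode (available in newer pandas versions). The sketch below assumes the example frame from the question; the names `pairs` and `result` are chosen only for illustration.

```python
from itertools import combinations
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "words": ["word1", "word2", "word3", "word4", "word5",
              "word6", "word7", "word8", "word9"],
})

pairs = (
    df.groupby("ID")["words"]
      .apply(lambda s: list(combinations(s, 2)))  # one list of tuples per ID
      .explode()                                  # one tuple per row
      .dropna()                                   # single-word groups give NaN
)

# Split each (wordA, wordB) tuple into two columns and restore ID as a column.
result = pd.DataFrame(pairs.tolist(), index=pairs.index,
                      columns=["wordA", "wordB"]).reset_index()
print(result)
```

Whether this is preferable is a matter of taste; it mainly avoids building a wide intermediate frame when some groups are much larger than others.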
+5

You can use groupby with apply and return a DataFrame, then call reset_index once to drop the second index level and once more to turn the index into a column:

    from itertools import combinations

    f = lambda x: pd.DataFrame(list(combinations(x.values, 2)),
                               columns=['wordA', 'wordB'])
    df = (df.groupby('ID')['words'].apply(f)
            .reset_index(level=1, drop=True)
            .reset_index())
    print(df)

       ID  wordA  wordB
    0   1  word1  word2
    1   1  word1  word3
    2   1  word2  word3
    3   2  word4  word5
    4   3  word6  word7
    5   3  word6  word8
    6   3  word6  word9
    7   3  word7  word8
    8   3  word7  word9
    9   3  word8  word9
+6

You can define a custom function that is applied to each group. Both its input and its output are DataFrames:

    import itertools

    def combine(group):
        # The column is named 'words' in the question's frame.
        return pd.DataFrame.from_records(itertools.combinations(group.words, 2))

    df.groupby('ID').apply(combine)

Result:

              0      1
    ID
    1  0  word1  word2
       1  word1  word3
       2  word2  word3
    2  0  word4  word5
    3  0  word6  word7
       1  word6  word8
       2  word6  word9
       3  word7  word8
       4  word7  word9
       5  word8  word9
+2

The easiest way to do this:

    from itertools import combinations
    import pandas as pd

    # Note: this pairs words across the whole column, ignoring the ID groups.
    df_new = pd.DataFrame(list(combinations(df.words, 2)), columns=['word1', 'word2'])
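Since the snippet above combines words across all IDs, a per-group version in the same plain style might look like the sketch below. It assumes the example frame from the question; `rows` and `group_id` are names introduced only for illustration.

```python
from itertools import combinations
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "words": ["word1", "word2", "word3", "word4", "word5",
              "word6", "word7", "word8", "word9"],
})

# Iterate over the groups and emit one (ID, wordA, wordB) row per pair.
rows = [
    (group_id, a, b)
    for group_id, group in df.groupby("ID")
    for a, b in combinations(group["words"], 2)
]
df_new = pd.DataFrame(rows, columns=["ID", "wordA", "wordB"])
print(df_new)
```

This keeps the pair generation in ordinary Python, which can be easier to read than chained apply calls, at the cost of an explicit loop over the groups.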
0

Source: https://habr.com/ru/post/1273824/

