How to create strings from DataFrame column elements in Python?

Given a DataFrame df (in real life it has 1000+ rows), the elements of ColB are lists of lists.

      ColA                           ColB
    0  'A'  [['a','b','c'],['d','e','f']]
    1  'B'  [['f','g','h'],['i','j','k']]
    2  'A'  [['l','m','n'],['o','p','q']]

How can I efficiently create a column ColC that holds a string built from the elements of the different columns, for example:

    ColC
    'A>+ab:c,+de:f'
    'B>+fg:h,+ij:k'
    'A>+lm:n,+op:q'

I tried df.apply along these lines, inspired by this:

    df['ColC'] = df.apply(lambda x: '%s>' % x['ColA'], axis=1)

This works for the first two elements of the string, but I'm having difficulty with the rest.
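To reproduce the example, here is a minimal sketch that builds the sample frame. It assumes ColA holds plain Python strings and ColB holds real lists of lists (not their text representation):

```python
import pandas as pd

# Sample data from the question: ColA holds strings,
# each ColB cell holds a list of two 3-element lists.
df = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})
print(df)
```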

5 answers

Something like this?

    df['ColC'] = df.ColA + '>+' + df.ColB.str[0].str[0] + \
                 df.ColB.str[0].str[1] + ':' + \
                 df.ColB.str[0].str[2] + ',+' + \
                 df.ColB.str[1].str[0] + \
                 df.ColB.str[1].str[1] + ':' + \
                 df.ColB.str[1].str[2]

Output:

       ColA                    ColB           ColC
    0     A  [[a, b, c], [d, e, f]]  A>+ab:c,+de:f
    1     B  [[f, g, h], [i, j, k]]  B>+fg:h,+ij:k
    2     A  [[l, m, n], [o, p, q]]  A>+lm:n,+op:q
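The repeated .str[i].str[j] chains in this answer can also be generated in a loop. A sketch, assuming every ColB cell holds exactly two 3-element lists as in the question; build_colc is a hypothetical helper name:

```python
import pandas as pd


def build_colc(df):
    # Vectorized .str indexing: .str[i] picks the i-th inner list,
    # .str[j] then picks the j-th element of that list.
    parts = []
    for i in range(2):  # assumption: two inner lists per cell
        inner = df['ColB'].str[i]
        parts.append('+' + inner.str[0] + inner.str[1] + ':' + inner.str[2])
    result = df['ColA'] + '>' + parts[0]
    for part in parts[1:]:
        result = result + ',' + part
    return result


df = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})
print(build_colc(df).tolist())
# ['A>+ab:c,+de:f', 'B>+fg:h,+ij:k', 'A>+lm:n,+op:q']
```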

Timings

    df = pd.concat([df] * 333)  # 999 rows

Wen's method

    %%timeit
    df[['t1', 't2']] = df['ColB'].apply(pd.Series).applymap(lambda x: '{}{}:{}'.format(x[0], x[1], x[2]))
    df.ColA + '>+' + df.t1 + ',+' + df.t2

1 loop, best of 3: 363 ms per loop

miradulo's method

    %%timeit
    df.apply(lambda r: '{}>+{}{}:{},+{}{}:{}'.format(*flatten(r)), axis=1)

10 loops, best of 3: 74.9 ms per loop

Scott Boston's method

    %%timeit
    df.ColA + '>+' + df.ColB.str[0].str[0] + \
        df.ColB.str[0].str[1] + ':' + \
        df.ColB.str[0].str[2] + ',+' + \
        df.ColB.str[1].str[0] + \
        df.ColB.str[1].str[1] + ':' + \
        df.ColB.str[1].str[2]

100 loops, best of 3: 12.4 ms per loop
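Not part of the timings above: a plain list comprehension over zip may also be worth timing, since it skips the per-row overhead of apply. A sketch, assuming ColB holds real lists; %%timeit only exists in IPython, so the standard timeit module is used here as a rough stand-in, and list_comp is a hypothetical name:

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})
# Same 999-row test frame as above; ignore_index avoids duplicate labels.
df = pd.concat([df] * 333, ignore_index=True)


def list_comp(frame):
    # One str.format call per row; *b[0] and *b[1] unpack the two inner lists.
    return ['{}>+{}{}:{},+{}{}:{}'.format(a, *b[0], *b[1])
            for a, b in zip(frame['ColA'], frame['ColB'])]


df['ColC'] = list_comp(df)

# Best of 3 runs, each calling list_comp 10 times; report time per call.
best = min(timeit.repeat(lambda: list_comp(df), number=10, repeat=3)) / 10
print(f'{best * 1000:.2f} ms per loop')
```

The numbers depend on the machine and pandas version, so they are only comparable with timings taken in the same session.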


You're right to use apply:

    df[['t1', 't2']] = df['ColB'].apply(pd.Series).applymap(lambda x: '{}{}:{}'.format(x[0], x[1], x[2]))
    df.ColA + '>+' + df.t1 + ',+' + df.t2

    Out[648]:
    0    A>+ab:c,+de:f
    1    B>+fg:h,+ij:k
    2    A>+lm:n,+op:q
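To see what this approach works with, a sketch that prints the intermediate frame: apply(pd.Series) spreads each outer list into numbered columns 0 and 1, one per inner list. Each column is formatted with Series.map here instead of applymap, because newer pandas versions (2.1+) deprecate DataFrame.applymap in favour of DataFrame.map:

```python
import pandas as pd

df = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})

# Each cell of `expanded` is one inner 3-element list.
expanded = df['ColB'].apply(pd.Series)
print(expanded)


def fmt(x):
    # ['a', 'b', 'c'] -> 'ab:c'
    return '{}{}:{}'.format(x[0], x[1], x[2])


result = (df['ColA'] + '>+' + expanded[0].map(fmt)
          + ',+' + expanded[1].map(fmt)).tolist()
print(result)
# ['A>+ab:c,+de:f', 'B>+fg:h,+ij:k', 'A>+lm:n,+op:q']
```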

If we use the flatten function as follows

    from collections.abc import Iterable

    def flatten(l):
        for el in l:
            if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
                yield from flatten(el)
            else:
                yield el

(as shown in this answer), we can easily format a string from the flattened elements.

    >>> df.apply(lambda r: '{}>+{}{}:{},+{}{}:{}'.format(*flatten(r.values)), axis=1)
    0    A>+ab:c,+de:f
    1    B>+fg:h,+ij:k
    2    A>+lm:n,+op:q
    dtype: object
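A quick check of what flatten yields for a single row, written against collections.abc.Iterable (the old collections.Iterable alias was removed in Python 3.10). The string check matters: a one-character string like 'A' iterates to itself, so without it the recursion would never end. The row list below is an assumed stand-in for r.values:

```python
from collections.abc import Iterable


def flatten(l):
    # Recursively yield leaves; str/bytes are treated as leaves, not descended into.
    for el in l:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el


row = ['A', [['a', 'b', 'c'], ['d', 'e', 'f']]]  # like r.values for the first row
print(list(flatten(row)))
# ['A', 'a', 'b', 'c', 'd', 'e', 'f']
print('{}>+{}{}:{},+{}{}:{}'.format(*flatten(row)))
# A>+ab:c,+de:f
```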

Hopefully, this generalizes quite well:

    >>> row_formatter = lambda r: '{}>+{}{}:{},+{}{}:{}'.format(*flatten(r.values))
    >>> df.apply(row_formatter, axis=1)
    0    A>+ab:c,+de:f
    1    B>+fg:h,+ij:k
    2    A>+lm:n,+op:q
    dtype: object
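The format string above still hard-codes two inner lists of three items each. If the number of inner lists varies from row to row, one option (not from the original answers) is to format each inner list separately and join the pieces with commas. A sketch; format_row is a hypothetical helper, and the third inner list in the sample is made up for illustration:

```python
import pandas as pd


def format_row(a, b):
    # 'A>' prefix, then one '+xy:z' piece per inner list, joined by commas.
    return '{}>'.format(a) + ','.join('+{}{}:{}'.format(*inner) for inner in b)


df = pd.DataFrame({
    'ColA': ['A', 'B'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k'], ['r', 's', 't']],  # three inner lists
    ],
})

df['ColC'] = [format_row(a, b) for a, b in zip(df['ColA'], df['ColB'])]
print(df['ColC'].tolist())
# ['A>+ab:c,+de:f', 'B>+fg:h,+ij:k,+rs:t']
```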

Another answer:

    df['ColC'] = df.apply(lambda x: '%s>+%s%s:%s,+%s%s:%s' % tuple([x['ColA']] + x['ColB'][0] + x['ColB'][1]), axis=1)

Here are my 2 cents, also using apply.

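Define a function that you can apply to the DataFrame; it uses string formatting on each row to parse your columns. It assumes ColA and ColB are stored as strings (for example, after reading the data from a file), which is why the quotes are stripped from ColA and only the alphanumeric characters of ColB are kept.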

    def get_string(x):
        col_a = x.ColA
        # keep only the letters/digits of the ColB string: a, b, c, d, e, f
        col_b = (ch for ch in x.ColB if ch.isalnum())
        string = '{0}>+{1}{2}:{3},+{4}{5}:{6}'.format(col_a.strip("'"), *col_b)
        return string

    df['ColC'] = df.apply(get_string, axis=1)
    df.ColC

    0    A>+ab:c,+de:f
    1    B>+fg:h,+ij:k
    2    A>+lm:n,+op:q

I like it because the format is easy to change, although this method can be slow.
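If ColB really holds the text of a list (as get_string assumes), another option, a swapped-in technique rather than this answerer's, is to parse the text back into real lists with ast.literal_eval. After that, any of the list-based answers applies. A sketch; the text-valued sample frame is an assumption standing in for data read from a file:

```python
import ast

import pandas as pd

# Assumed input: ColA and ColB stored as text, e.g. after loading from a CSV file.
df = pd.DataFrame({
    'ColA': ["'A'", "'B'"],
    'ColB': ["[['a','b','c'],['d','e','f']]", "[['f','g','h'],['i','j','k']]"],
})

# literal_eval evaluates only Python literals, unlike eval.
df['ColA'] = df['ColA'].map(ast.literal_eval)  # "'A'" -> 'A'
df['ColB'] = df['ColB'].map(ast.literal_eval)  # text -> list of lists

df['ColC'] = ['{}>+{}{}:{},+{}{}:{}'.format(a, *b[0], *b[1])
              for a, b in zip(df['ColA'], df['ColB'])]
print(df['ColC'].tolist())
# ['A>+ab:c,+de:f', 'B>+fg:h,+ij:k']
```

Unlike the isalnum filter, this also keeps multi-character elements intact.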


Source: https://habr.com/ru/post/1273063/

