Concatenating two pandas columns as strings

I have the following DataFrame:

    from pandas import *
    df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3]})

It looks like this:

       bar foo
    0    1   a
    1    2   b
    2    3   c

Now I want to have something like:

          bar
    0  1 is a
    1  2 is b
    2  3 is c

How can I achieve this? I tried the following:

 df['foo'] = '%s is %s' % (df['bar'], df['foo']) 

but this gives me the wrong result:

    >>> print df.ix[0]

    bar                                                    a
    foo    0    a
    1    b
    2    c
    Name: bar is 0    1
    1    2
    2
    Name: 0

Sorry for the dumb question, but the existing question "pandas: merging two columns in a DataFrame" did not help me.

+59
python string numpy pandas dataframe
Aug 08 2018-12-12T00:
7 answers

    df['bar'] = df.bar.map(str) + " is " + df.foo
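To make the one-liner above concrete, here is a minimal sketch on the question's data. It assumes a current pandas imported as `pd`, uses `astype(str)` in place of `map(str)` (both convert each value to a string), and stores the result in a hypothetical variable `combined` instead of overwriting a column.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3]})

# Convert the integer column to strings first; "+" then concatenates
# the two Series element by element, matching rows by index label.
combined = df['bar'].astype(str) + ' is ' + df['foo']
print(combined.tolist())
```

The conversion is needed because adding an integer column to a string column directly raises an error.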

+99
Aug 08 2018-12-12T00:

The problem in your code is that you want to apply the operation to every row. The way you have written it, though, takes the whole "bar" and "foo" columns, converts each of them to a string, and gives you back one big string. You can write it like this:

 df.apply(lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1) 

This is longer than the other answer, but more general (it can be used with values that are not strings).
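The difference between formatting whole columns and formatting row by row can be seen in a short sketch. It assumes the question's data and a current pandas imported as `pd`; the variable names `whole` and `per_row` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3]})

# Formatting whole Series: each Series is passed through str(), so the
# result is a single Python string, not one string per row.
whole = '%s is %s' % (df['bar'], df['foo'])
print(type(whole).__name__)  # str

# Row-wise formatting: with axis=1 the lambda is called once per row.
per_row = df.apply(lambda row: '%s is %s' % (row['bar'], row['foo']), axis=1)
print(per_row.tolist())
```

Assigning `whole` to a column would broadcast that one long string to every row, which is what the garbled output in the question shows.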

+38
Aug 08 '12 at 23:15

This question has already been answered, but I believe that it would be nice to add some useful methods that were not previously discussed, and compare all the methods proposed so far in terms of performance.

Here are some useful solutions to this problem, in increasing order of performance.




DataFrame.agg

This is a simple str.format-based approach.

    df['baz'] = df.agg('{0[bar]} is {0[foo]}'.format, axis=1)
    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c

You can also use f-string formatting here:

    df['baz'] = df.agg(lambda x: f"{x['bar']} is {x['foo']}", axis=1)
    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c
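The `{0[bar]}` syntax in the str.format template may look unusual, so here is a small sketch of what it does. It assumes a current pandas imported as `pd` and uses `iterrows` only to make the per-row call explicit; the names `template` and `formatted` are illustrative.

```python
import pandas as pd

template = '{0[bar]} is {0[foo]}'

# {0[bar]} means: take positional argument 0 and look up the key 'bar'.
print(template.format({'bar': 1, 'foo': 'a'}))  # 1 is a

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3]})

# A DataFrame row supports the same lookup, so the bound method
# template.format can be called with one row at a time.
formatted = [template.format(row) for _, row in df.iterrows()]
print(formatted)
```

Passing `template.format` with `axis=1` hands each row to the same call, without the explicit loop.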



char.array-based concatenation

Convert the columns to be concatenated into chararrays, then add them together.

    a = np.char.array(df['bar'].values)
    b = np.char.array(df['foo'].values)

    df['baz'] = (a + b' is ' + b).astype(str)
    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c



List comprehension with zip

I cannot overstate how underrated list comprehensions are in pandas.

 df['baz'] = [str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])] 

Alternatively, use str.join to concatenate (it will also scale better):

    df['baz'] = [
        ' '.join([str(x), 'is', y]) for x, y in zip(df['bar'], df['foo'])]

    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c

List comprehensions excel at string manipulation, because string operations are inherently hard to vectorize, and most of the "vectorized" pandas string functions are basically wrappers around loops. I have written extensively about this topic in "For loops with pandas - When should I care?". In general, if you don't have to worry about index alignment, use a list comprehension when dealing with string and regular expression operations.
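The index-alignment caveat can be illustrated with a small sketch. It assumes a current pandas imported as `pd` and two hypothetical Series, `left` and `right`, whose index labels are in different orders.

```python
import pandas as pd

left = pd.Series(['1', '2', '3'], index=[0, 1, 2])
right = pd.Series(['a', 'b', 'c'], index=[2, 1, 0])

# Series arithmetic matches rows by index label: label 0 pairs '1' with 'c'.
aligned = left + ' is ' + right
print(aligned.loc[0])

# zip pairs values by position and ignores the index: '1' pairs with 'a'.
positional = [x + ' is ' + y for x, y in zip(left, right)]
print(positional[0])
```

When both columns come from the same DataFrame they share one index, so the two approaches give the same pairs.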

The list comprehension above does not handle NaNs by default. However, you can always write a function wrapping a try-except if you need to handle them.

    def try_concat(x, y):
        try:
            return str(x) + ' is ' + y
        except (ValueError, TypeError):
            return np.nan

    df['baz'] = [try_concat(x, y) for x, y in zip(df['bar'], df['foo'])]
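As a quick check of how the try-except wrapper behaves, the sketch below runs it on a hypothetical frame `df_nan` with one missing value in `foo`. It assumes numpy and pandas are imported as `np` and `pd`.

```python
import numpy as np
import pandas as pd

def try_concat(x, y):
    # str + float (NaN) raises TypeError, which is turned into NaN here.
    try:
        return str(x) + ' is ' + y
    except (ValueError, TypeError):
        return np.nan

df_nan = pd.DataFrame({'foo': ['a', np.nan, 'c'], 'bar': [1, 2, 3]})
result = [try_concat(x, y) for x, y in zip(df_nan['bar'], df_nan['foo'])]
print(result)
```

Note that this only catches a missing value on the right-hand side: a NaN in `bar` would not raise, because `str()` turns it into the text `'nan'`.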



perfplot performance measurements

[Image: perfplot benchmark graph]

Graph generated using perfplot. Here is the complete code listing.

Functions

    def brenbarn(df):
        return df.assign(baz=df.bar.map(str) + " is " + df.foo)

    def danielvelkov(df):
        return df.assign(baz=df.apply(
            lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1))

    def chrimuelle(df):
        return df.assign(
            baz=df['bar'].astype(str).str.cat(df['foo'].values, sep=' is '))

    def vladimiryashin(df):
        return df.assign(baz=df.astype(str).apply(lambda x: ' is '.join(x), axis=1))

    def erickfis(df):
        return df.assign(
            baz=df.apply(lambda x: f"{x['bar']} is {x['foo']}", axis=1))

    def cs1_format(df):
        return df.assign(baz=df.agg('{0[bar]} is {0[foo]}'.format, axis=1))

    def cs1_fstrings(df):
        return df.assign(baz=df.agg(lambda x: f"{x['bar']} is {x['foo']}", axis=1))

    def cs2(df):
        a = np.char.array(df['bar'].values)
        b = np.char.array(df['foo'].values)

        return df.assign(baz=(a + b' is ' + b).astype(str))

    def cs3(df):
        return df.assign(
            baz=[str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])])
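If perfplot is not installed, a rough comparison can be made with the standard library's `timeit` instead. The sketch below is not the benchmark behind the graph: it times only two of the approaches, on a hypothetical 3000-row frame `big`, with function names (`vectorized`, `list_comp`) chosen for illustration. Absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

def vectorized(df):
    return df.assign(baz=df['bar'].astype(str) + ' is ' + df['foo'])

def list_comp(df):
    return df.assign(
        baz=[str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])])

small = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3]})
big = pd.concat([small] * 1000, ignore_index=True)

# Make sure both functions agree before comparing their speed.
assert vectorized(big)['baz'].tolist() == list_comp(big)['baz'].tolist()

runs = 20
for func in (vectorized, list_comp):
    seconds = timeit.timeit(lambda: func(big), number=runs)
    print(f'{func.__name__}: {seconds / runs * 1e3:.3f} ms per call')
```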
+17
Jan 21 '19 at 22:23

You can also use

    # 'bar' holds integers, so convert it to strings before using the .str accessor
    df['bar'] = df['bar'].astype(str).str.cat(df['foo'].values.astype(str), sep=' is ')
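One reason to reach for `str.cat` is its `na_rep` parameter, which controls what happens to missing values. The sketch below assumes a current pandas and a hypothetical frame `df_nan` with a missing entry in `foo`; the placeholder text `'missing'` is arbitrary.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'foo': ['a', np.nan, 'c'], 'bar': [1, 2, 3]})
bar_as_str = df_nan['bar'].astype(str)

# Default behaviour: a missing value on either side gives a missing result.
default = bar_as_str.str.cat(df_nan['foo'], sep=' is ')
print(default.isna().tolist())

# With na_rep, missing values are replaced by the given text instead.
filled = bar_as_str.str.cat(df_nan['foo'], sep=' is ', na_rep='missing')
print(filled.tolist())
```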
+11
Mar 28 '14 at 17:56
    # select the columns explicitly so the join order is bar, then foo
    df[['bar', 'foo']].astype(str).apply(lambda x: ' is '.join(x), axis=1)

    0    1 is a
    1    2 is b
    2    3 is c
    dtype: object
+4
Apr 29 '17 at 10:56

@DanielVelkov's answer is correct, but using formatted string literals (f-strings) was somewhat faster in my timings:

    # Daniel's
    %timeit df.apply(lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1)
    ## 963 µs ± 157 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    # String literals - python 3
    %timeit df.apply(lambda x: f"{x['bar']} is {x['foo']}", axis=1)
    ## 849 µs ± 4.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
+2
Oct 16 '18 at 18:29

You can also use str.join on a new pd.Series built from the rows:

    >>> # select the columns explicitly so the join order is bar, then foo
    >>> pd.Series(df[['bar', 'foo']].astype(str).values.tolist()).str.join(' is ')
    0    1 is a
    1    2 is b
    2    3 is c
    dtype: object
    >>>
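One thing to watch with this approach is that a Series built from a plain list gets a fresh 0..n-1 index. The sketch below assumes a current pandas and a hypothetical frame with index labels 10, 20, 30; it passes `index=df.index` so the result can be assigned back to the frame without misalignment.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3]},
                  index=[10, 20, 30])

# Pick the columns in the order they should appear in the joined text,
# and reuse the frame's index so the labels survive the list round trip.
rows = df[['bar', 'foo']].astype(str).values.tolist()
joined = pd.Series(rows, index=df.index).str.join(' is ')
print(joined)
```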
0
Jun 09 '19 at 7:09