String concatenation of two pandas columns

Question

I have the following DataFrame:

    from pandas import *
    df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3]})

It looks like this:

       bar foo
    0    1   a
    1    2   b
    2    3   c

Now I want to have something like:

         bar
    0 1 is a
    1 2 is b
    2 3 is c

How can I achieve this? I tried the following:

 df['foo'] = '%s is %s' % (df['bar'], df['foo'])

but this gives me the wrong result:

    >>> print df.ix[0]
    bar                                                    a
    foo    0    a
    1    b
    2    c
    Name: bar is 0    1
    1    2
    2
    Name: 0

Sorry for the dumb question, but the question "pandas: merging two columns in a DataFrame" did not help me.

python string numpy pandas dataframe

nat Aug 08 2018-12-12T00:

7 answers

BrenBarn · Answer 1 · 2012-08-08 06:03

    df['bar'] = df.bar.map(str) + " is " + df.foo

Daniel Velkov · Answer 2 · 2012-08-08 23:15

The problem in your code is that you want to apply the operation to every row. The way you wrote it, though, takes the whole "bar" and "foo" columns, converts them to strings, and gives you back one big string. You can write it like this:

    df.apply(lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1)

This is longer than the other answer, but more general (it can be used with values that are not strings).
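To make the difference concrete, here is a minimal sketch using the question's DataFrame. The variable names `whole_columns` and `per_row` are only illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3]})

# Formatting whole columns calls str() on each Series, so the result is a
# single Python string containing both printed Series, not one value per row.
whole_columns = '%s is %s' % (df['bar'], df['foo'])

# Applying the format row by row (axis=1) yields one string per row instead.
per_row = df.apply(lambda x: '%s is %s' % (x['bar'], x['foo']), axis=1)

print(type(whole_columns).__name__)  # str
print(per_row.tolist())              # ['1 is a', '2 is b', '3 is c']
```

Assigning `whole_columns` to a column would broadcast that one big string to every row, which is roughly what the output in the question shows.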

cs95 · Answer 3 · 2019-01-21 22:23

This question has already been answered, but I believe that it would be nice to add some useful methods that were not previously discussed, and compare all the methods proposed so far in terms of performance.

Here are some useful solutions to this problem, in increasing order of performance.

`DataFrame.agg`

This is a simple str.format-based approach.

    df['baz'] = df.agg('{0[bar]} is {0[foo]}'.format, axis=1)
    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c

You can also use f-string formatting here:

    df['baz'] = df.agg(lambda x: f"{x['bar']} is {x['foo']}", axis=1)
    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c
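The `{0[bar]}` syntax may look unfamiliar, so here is a small sketch of what the format string does on its own. The name `template` and the plain dict standing in for a row are assumptions made for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3]})

template = '{0[bar]} is {0[foo]}'

# {0[bar]} means: take positional argument 0 and look up the key 'bar' on it.
# A plain dict stands in for a row here.
print(template.format({'bar': 1, 'foo': 'a'}))  # 1 is a

# With axis=1 each row is passed to the bound method as a Series,
# which supports the same label lookup.
result = df.agg(template.format, axis=1)
print(result.tolist())  # ['1 is a', '2 is b', '3 is c']
```

Passing the bound method `template.format` avoids writing a lambda, but it is still called once per row.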

`char.array`-based Concatenation

Convert the columns to chararrays, then add them together.

    a = np.char.array(df['bar'].values)
    b = np.char.array(df['foo'].values)

    df['baz'] = (a + b' is ' + b).astype(str)
    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c

List Comprehension with `zip`

I cannot overstate how underrated list comprehensions are in pandas.

 df['baz'] = [str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])]

Alternatively, use str.join to concatenate (it will also scale better):

    df['baz'] = [
        ' '.join([str(x), 'is', y]) for x, y in zip(df['bar'], df['foo'])]

    df
      foo  bar     baz
    0   a    1  1 is a
    1   b    2  2 is b
    2   c    3  3 is c

List comprehensions excel at string manipulation, because string operations are inherently hard to vectorize, and most pandas "vectorized" functions are basically wrappers around loops. I have written extensively about this topic in "For loops with pandas: when should I care?". In general, if you don't have to worry about index alignment, use a list comprehension when dealing with string and regex operations.
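For anyone who wants to check this claim on their own machine, here is a rough sketch using only `timeit`. The helper names `vectorized` and `list_comp` and the 30,000-row test frame are assumptions for illustration; actual timings will vary with the pandas version and hardware, so no numbers are quoted here.

```python
import timeit
import pandas as pd

# Hypothetical test frame: the question's data repeated to 30,000 rows.
df = pd.DataFrame({'foo': ['a', 'b', 'c'] * 10_000,
                   'bar': [1, 2, 3] * 10_000})

def vectorized(df):
    # Series-level string concatenation.
    return (df['bar'].astype(str) + ' is ' + df['foo']).tolist()

def list_comp(df):
    # Plain Python loop over the two columns.
    return [str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])]

# Both approaches must agree before their timings are worth comparing.
assert vectorized(df) == list_comp(df)

for fn in (vectorized, list_comp):
    seconds = timeit.timeit(lambda: fn(df), number=5)
    print(f'{fn.__name__}: {seconds / 5:.4f} s per call')
```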

The list comprehensions above do not handle NaN by default. However, you can always write a function that wraps a try-except if you need to handle it.

    def try_concat(x, y):
        try:
            return str(x) + ' is ' + y
        except (ValueError, TypeError):
            return np.nan

    df['baz'] = [try_concat(x, y) for x, y in zip(df['bar'], df['foo'])]
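As a quick illustration of that try-except wrapper, the sketch below runs it on a hypothetical frame with one missing value in `foo` (the data is made up for the example and is not part of the answer).

```python
import numpy as np
import pandas as pd

def try_concat(x, y):
    try:
        return str(x) + ' is ' + y
    except (ValueError, TypeError):
        # Concatenating a str with a float NaN raises TypeError.
        return np.nan

# Hypothetical frame with one missing value in 'foo'.
df = pd.DataFrame({'foo': ['a', np.nan, 'c'], 'bar': [1, 2, 3]})

df['baz'] = [try_concat(x, y) for x, y in zip(df['bar'], df['foo'])]
print(df['baz'].tolist())
```

The second row comes back as missing rather than raising. Note that this only guards the second argument: a NaN in `bar` would not raise, because `str(float('nan'))` is simply `'nan'`, so that case would need an explicit check.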

`perfplot` Performance Measurements

Graph generated using perfplot. Here is the complete code listing.

Functions

    def brenbarn(df):
        return df.assign(baz=df.bar.map(str) + " is " + df.foo)

    def danielvelkov(df):
        return df.assign(baz=df.apply(
            lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1))

    def chrimuelle(df):
        return df.assign(
            baz=df['bar'].astype(str).str.cat(df['foo'].values, sep=' is '))

    def vladimiryashin(df):
        return df.assign(baz=df.astype(str).apply(lambda x: ' is '.join(x), axis=1))

    def erickfis(df):
        return df.assign(
            baz=df.apply(lambda x: f"{x['bar']} is {x['foo']}", axis=1))

    def cs1_format(df):
        return df.assign(baz=df.agg('{0[bar]} is {0[foo]}'.format, axis=1))

    def cs1_fstrings(df):
        return df.assign(baz=df.agg(lambda x: f"{x['bar']} is {x['foo']}", axis=1))

    def cs2(df):
        a = np.char.array(df['bar'].values)
        b = np.char.array(df['foo'].values)

        return df.assign(baz=(a + b' is ' + b).astype(str))

    def cs3(df):
        return df.assign(
            baz=[str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])])

chrimuelle · Answer 4 · 2014-03-28 17:56

You can also use

    # 'bar' holds integers, so cast it to str before using the .str accessor
    df['bar'] = df['bar'].astype(str).str.cat(df['foo'].values.astype(str), sep=' is ')

Vladimir Yashin · Answer 5 · 2017-04-29 10:56

    df.astype(str).apply(lambda x: ' is '.join(x), axis=1)

    0    1 is a
    1    2 is b
    2    3 is c
    dtype: object

erickfis · Answer 6 · 2018-10-16 18:29

@DanielVelkov's answer is correct, BUT using f-strings (formatted string literals) is faster:

    # Daniel's
    %timeit df.apply(lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1)
    ## 963 µs ± 157 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    # String literals - python 3
    %timeit df.apply(lambda x: f"{x['bar']} is {x['foo']}", axis=1)
    ## 849 µs ± 4.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

U9-Forward · Answer 7 · 2019-06-09 07:09

You can also use str.join on a new pd.Series:

    >>> pd.Series(df.astype(str).values.tolist()).str.join(' is ')
    0    1 is a
    1    2 is b
    2    3 is c
    dtype: object
