Iteratively concatenate columns in pandas with NaN values

I have a pandas.DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({"x": ["hello there you can go home now", "why should she care", "please sort me appropriately"], 
    "y": [np.nan, "finally we were able to go home", "but what about meeeeeeeeeee"],
    "z": ["", "alright we are going home now", "ok fine shut up already"]})

cols = ["x", "y", "z"]

I want to iteratively concatenate these columns, rather than writing something like:

df["concat"] = df["x"].str.cat(df["y"], sep = " ").str.cat(df["z"], sep = " ")

I know that concatenating three columns this way looks trivial, but I actually have 30. So I would like to do something like:

df["concat"] = df[cols[0]]
for i in range(1, len(cols)):
    df["concat"] = df["concat"].str.cat(df[cols[i]], sep = " ")

Right now, the first line, df["concat"] = df[cols[0]], works fine, but the NaN value at df.loc[0, "y"] ruins the concatenation. Because of this one null value, the entire string for the first row ends up as NaN in df["concat"]. How can I get around this? Is there an option of pd.Series.str.cat that I need to specify?

2 answers

Option 1

pd.Series(df.fillna('').values.tolist()).str.join(' ')

0                    hello there you can go home now  
1    why should she care finally we were able to go...
2    please sort me appropriately but what about me...
dtype: object
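
If the result is to be stored back in df, note that pd.Series(list) builds a fresh 0..n-1 index, so the assignment only lines up when df uses the default index. A minimal sketch, assuming the df and cols from the question; the index labels "a", "b", "c" are hypothetical and only there to show the alignment:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": ["hello there you can go home now", "why should she care", "please sort me appropriately"],
     "y": [np.nan, "finally we were able to go home", "but what about meeeeeeeeeee"],
     "z": ["", "alright we are going home now", "ok fine shut up already"]},
    index=["a", "b", "c"],  # hypothetical non-default index
)
cols = ["x", "y", "z"]

# One list of strings per row, with NaN replaced by an empty string
rows = df[cols].fillna("").values.tolist()

# Reuse df's index so the new column aligns row by row, then trim end spaces
df["concat"] = pd.Series(rows, index=df.index).str.join(" ").str.strip()
```

Selecting df[cols] first keeps the join limited to the listed columns, which matters once the frame holds other columns too.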

Option 2

df.fillna('').add(' ').sum(1).str.strip()

0                      hello there you can go home now
1    why should she care finally we were able to go...
2    please sort me appropriately but what about me...
dtype: object
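
Both one-liners above turn NaN into an empty string, so a gap in a middle column leaves a double space that str.strip() does not remove. If that matters, one possible workaround is to drop missing and empty pieces before joining. A rough sketch; the demo frame and the join_nonempty helper are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a gap in the middle column
demo = pd.DataFrame({"x": ["a", "b"], "y": [np.nan, "c"], "z": ["d", ""]})
cols = ["x", "y", "z"]

def join_nonempty(row):
    # Keep only real, non-empty strings, then join them with a single space
    return " ".join(s for s in row if isinstance(s, str) and s)

# fillna-based join: the missing middle value leaves two spaces behind
padded = demo[cols].fillna("").apply(" ".join, axis=1)

# Row-wise join that skips NaN and empty strings
clean = demo[cols].apply(join_nonempty, axis=1)
```

The row-wise apply is slower than the vectorised versions, so it is probably only worth it when the extra spaces are a real problem.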

Option 3

In [3061]: df.apply(lambda x: x.str.cat(sep=' '), axis=1)
Out[3061]:
0                     hello there you can go home now 
1    why should she care finally we were able to go...
2    please sort me appropriately but what about me...
dtype: object
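
As for the option asked about in the question: pd.Series.str.cat takes an na_rep argument, and when it is set, missing values are replaced by that string instead of turning the whole result into NaN. A minimal sketch, assuming the df and cols from the question (the concat2 column name is only for comparison):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": ["hello there you can go home now", "why should she care", "please sort me appropriately"],
    "y": [np.nan, "finally we were able to go home", "but what about meeeeeeeeeee"],
    "z": ["", "alright we are going home now", "ok fine shut up already"],
})
cols = ["x", "y", "z"]

# The loop from the question, with na_rep="" so NaN counts as an empty string
df["concat"] = df[cols[0]]
for col in cols[1:]:
    df["concat"] = df["concat"].str.cat(df[col], sep=" ", na_rep="")
df["concat"] = df["concat"].str.strip()

# Same idea without the loop: str.cat also accepts a DataFrame as `others`
df["concat2"] = df[cols[0]].str.cat(df[cols[1:]], sep=" ", na_rep="").str.strip()
```

The final str.strip() only removes the spaces left at the ends by empty or missing values.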

Source: https://habr.com/ru/post/1653349/