Adding a function to a line splitting command in Pandas

Question

I have a data frame with about 20 columns. One of the columns is called "director_name" and has values such as "John Doe" or "Jane Doe". I want to split it into 2 columns: "First_Name" and "Last_Name". When I run the following, it works as expected and splits the column into 2 columns:

data[['First_Name', 'Last_Name']] = data.director_name.str.split(' ', expand=True)
data

First_Name    Last_Name
John          Doe

However, it does NOT work when I have NULL (NaN) values in the "director_name" column. It produces the following error:

'Columns must be same length as key'

I would like to add a function that checks whether the value != Null and, if so, executes the above command; otherwise it should enter "NA" for "First_Name" and "Last_Name".

Any ideas how I could do this?

EDIT:

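It turns out the problem is not the NULL values. Some names are 3-4 words long. For example:

John Allen Doe
John Allen Doe Jr

In those cases, I still need just First_Name and Last_Name.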
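
The error can be reproduced on a small frame, which may help when checking any of the answers. The sketch below uses made-up sample rows (an assumption, not the asker's real data) and shows that NaN on its own still splits into two columns, while a three-word name widens the result to three columns, which is when the two-column assignment fails.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two-word names, one three-word name, one missing value.
data = pd.DataFrame(
    {"director_name": ["John Doe", "Jane Doe", "John Allen Doe", np.nan]}
)

# NaN alone is harmless: two-word rows plus NaN split into exactly 2 columns.
two_word_only = data.loc[[0, 1, 3], "director_name"].str.split(" ", expand=True)
print(two_word_only.shape[1])

# A three-word name widens the expanded result to 3 columns ...
parts = data["director_name"].str.split(" ", expand=True)
print(parts.shape[1])

# ... so assigning it to two column names raises a ValueError.
try:
    data[["First_Name", "Last_Name"]] = parts
    assignment_failed = False
except ValueError as exc:
    assignment_failed = True
    print(exc)
```

The number of columns produced by expand=True follows the longest name in the column, so the key list on the left has to match that width or the split has to be limited.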

+4

python pandas

JD2775 27 '17 3:17

4 Answers

You can use str.split (with no arguments, it splits on whitespace) and then select elements by indexing with str:

print (df.name.str.split())
0      [James, Cameron]
1       [Martin, Sheen]
2    [John, Allen, Doe]
3                   NaN
Name: name, dtype: object

df['First_Name'] = df.name.str.split().str[0]
df['Last_Name'] = df.name.str.split().str[1]

# data borrowed from the A-Za-z answer
print (df)
   Id            name First_Name Last_Name
0   1   James Cameron      James   Cameron
1   2    Martin Sheen     Martin     Sheen
2   3  John Allen Doe       John     Allen
3   4             NaN        NaN       NaN

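You can also use the parameter n to limit the number of splits, so that Last_Name keeps everything after the first word: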

df['First_Name'] = df.name.str.split().str[0]
df['Last_Name'] = df.name.str.split(n=1).str[1]
print (df)
   Id            name First_Name  Last_Name
0   1   James Cameron      James    Cameron
1   2    Martin Sheen     Martin      Sheen
2   3  John Allen Doe       John  Allen Doe
3   4             NaN        NaN        NaN
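
As a possible variation on the n parameter, n=1 can be combined with expand=True so that both columns are filled in a single assignment, as the question originally tried. This is a minimal sketch on the same sample data, not part of the original answer; it assumes at least one name contains a space, otherwise the expanded result has only one column and the assignment fails again.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Id": [1, 2, 3, 4],
        "name": ["James Cameron", "Martin Sheen", "John Allen Doe", np.nan],
    }
)

# n=1 allows at most one split per value, so expand=True yields
# at most two columns, no matter how many words a name has.
df[["First_Name", "Last_Name"]] = df["name"].str.split(n=1, expand=True)
print(df)
```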

Or use str.rsplit, which splits from the right, so that First_Name keeps everything before the last word:

df['First_Name'] = df.name.str.rsplit(n=1).str[0]
df['Last_Name'] = df.name.str.rsplit().str[-1]
print (df)
   Id            name  First_Name Last_Name
0   1   James Cameron       James   Cameron
1   2    Martin Sheen      Martin     Sheen
2   3  John Allen Doe  John Allen       Doe
3   4             NaN         NaN       NaN
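
To compare the three strategies above, here is a small sketch that applies each of them to the four-word example from the question's edit. The variable names are only for illustration.

```python
import pandas as pd

names = pd.Series(["John Allen Doe Jr"])

# Strategy 1: first and second word only.
first_word = names.str.split().str[0]
second_word = names.str.split().str[1]

# Strategy 2: first word, then everything after it.
rest_after_first = names.str.split(n=1).str[1]

# Strategy 3: everything before the last word, then the last word.
before_last = names.str.rsplit(n=1).str[0]
last_word = names.str.rsplit(n=1).str[-1]

print(first_word[0], "|", second_word[0])
print(first_word[0], "|", rest_after_first[0])
print(before_last[0], "|", last_word[0])
```

Note that with rsplit the suffix "Jr" ends up as the last name. None of these strategies understands suffixes or middle names, so which one fits depends on the data.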

+2

jezrael 27 '17 3:38

You can check for NaN before splitting.

import numpy as np
import pandas as pd

data = pd.DataFrame({'director_name': {0: 'John Doe', 1: np.nan, 2: 'Alan Smith'}})

data
Out[457]: 
  director_name
0      John Doe
1           NaN
2    Alan Smith

# Use a lambda function to check for NaN before splitting the column.
data[['First_Name', 'Last_Name']] = data.apply(lambda x: pd.Series([np.nan,np.nan] if pd.isnull(x.director_name) else x.director_name.split()), axis=1)

data
Out[446]: 
  director_name First_Name Last_Name
0      John Doe       John       Doe
1           NaN        NaN       NaN
2    Alan Smith       Alan     Smith

If a name has more than 2 words, keep only the first two:

data[['First_Name', 'Last_Name']] = data.apply(lambda x: pd.Series([np.nan,np.nan] if pd.isnull(x.director_name) else x.director_name.split()).iloc[:2], axis=1)

+1

Allen 27 '17 3:31

df['First_Name'] = df.name.str.split(' ', expand = True)[0]
df['Last_Name'] = df.name.str.split(' ', expand = True)[1]

+1

Prateek Chanda 27 '17 3:41

Vaishali · Accepted Answer · 2017-05-27T03:35:03+0000

    Id  name
0   1   James Cameron
1   2   Martin Sheen
2   3   John Allen Doe
3   4   NaN


df['First_Name'] = df.name.str.split(' ', expand = True)[0]
df['Last_Name'] = df.name.str.split(' ', expand = True)[1]

    Id  name            First_Name  Last_Name
0   1   James Cameron   James       Cameron
1   2   Martin Sheen    Martin      Sheen
2   3   John Allen Doe  John        Allen
3   4   NaN             NaN         None
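
The question also asks for the literal string "NA" instead of a missing value. One way to get that, sketched here under the assumption that a plain string placeholder is really wanted, is to call fillna on each selected column. fillna replaces both NaN and None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["James Cameron", "John Allen Doe", np.nan]})

# Split once, then pick the first two columns of the expanded result.
parts = df["name"].str.split(" ", expand=True)

# Replace missing pieces (NaN or None) with the string "NA".
df["First_Name"] = parts[0].fillna("NA")
df["Last_Name"] = parts[1].fillna("NA")
print(df)
```

Keep in mind that "NA" is then an ordinary string: isnull will no longer flag those rows, so it may be better to keep NaN unless the placeholder is needed for export or display.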
