Trying to create a pandas Series within a DataFrame with values based on whether the keys are in another DataFrame

To boil it down:

Dataframe 1 = yellow_fruits. Columns: fruit_name, location

Dataframe 2 = red_fruits. Columns: fruit_name, location

Dataframe 3 = fruit_montage. Columns: fruit_name, pounds_of_fruit_needed, freshness

Let's say I want to add a column to Dataframe 3 called "color". The value should be "yellow" if the fruit is yellow, "red" if the fruit is red, and "unknown" if it is neither.

Basically, in pseudocode:

If the fruit is in the yellow_fruits frame, "yellow" goes in the column. If the fruit is in the red_fruits frame, "red" goes in the column. If the fruit is in neither frame, the column is "unknown".

My code raised an error:

if df3['fruit_name'].isin(df1['fruit_name']):
    data = "yellow"
elif df3['fruit_name'].isin(df2['fruit_name']):
    data = "red"
else:
    data = "unknown"

df3['color'] = pd.Series(data, index=df3.index)

Error:

C:\Anaconda2\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
    890         raise ValueError("The truth value of a {0} is ambiguous. "
    891                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892                          .format(self.__class__.__name__))
    893
    894     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
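
The error comes from using a whole Series as an `if` condition: `isin` returns one boolean per row, while `if` needs a single True/False. A minimal sketch of that behavior, using small made-up frames (the data here is illustrative, not taken from the question):

```python
import pandas as pd

# Illustrative sample data (an assumption, not the asker's real frames)
df1 = pd.DataFrame({"fruit_name": ["banana", "lemon"]})
df3 = pd.DataFrame({"fruit_name": ["lemon", "rockmelon", "apple"]})

# isin gives one boolean per row of df3, not a single answer
mask = df3["fruit_name"].isin(df1["fruit_name"])
print(mask.tolist())  # [True, False, False]

# Asking for the truth value of the whole Series is what raises
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

So the per-row decision has to be expressed as a mask, a mapping, or a join rather than a plain `if`/`elif` over the column.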

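One option is to fill the column with a default value and then overwrite the matching rows, using isin as a boolean mask: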

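import pandas as pd

df1 = pd.DataFrame({'fruit_name':['banana', 'lemon']})
df2 = pd.DataFrame({'fruit_name':['strawberry', 'apple']})
df3 = pd.DataFrame({'fruit_name':['lemon', 'rockmelon', 'apple']})

# Start from the default, then overwrite the rows that match each frame.
# .loc avoids chained indexing, which may fail to write back to df3.
df3["color"] = "unknown"
df3.loc[df3['fruit_name'].isin(df1['fruit_name']), "color"] = "yellow"
df3.loc[df3['fruit_name'].isin(df2['fruit_name']), "color"] = "red"
df3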

#   fruit_name    color
# 0      lemon   yellow
# 1  rockmelon  unknown
# 2      apple      red
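
The same masks can also be combined in one expression. The sketch below uses `numpy.select`, which is a swapped-in alternative and not part of the original answer; it takes the first matching condition, so it follows the `if`/`elif` order from the question (yellow is checked before red).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"fruit_name": ["banana", "lemon"]})
df2 = pd.DataFrame({"fruit_name": ["strawberry", "apple"]})
df3 = pd.DataFrame({"fruit_name": ["lemon", "rockmelon", "apple"]})

# One boolean mask per color, in priority order
conditions = [
    df3["fruit_name"].isin(df1["fruit_name"]),
    df3["fruit_name"].isin(df2["fruit_name"]),
]
choices = ["yellow", "red"]

# Rows matching no condition get the default
df3["color"] = np.select(conditions, choices, default="unknown")
print(df3["color"].tolist())  # ['yellow', 'unknown', 'red']
```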

Alternatively, if you prefer plain Python logic over pandas/numpy-style indexing, map a function over the column:

def get_fruit_color(x):
    if x in df1['fruit_name'].unique():
        data = "yellow"
    elif x in df2['fruit_name'].unique():
        data = "red"
    else:
        data = "unknown"

    return data

df3["color"] = df3["fruit_name"].map(get_fruit_color)

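Or, in SQL style, use a join (merge in pandas). how='left' is needed so that fruits with no match are kept; they get NaN, which fillna then replaces: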

colors = ([(x, 'yellow') for x in df1['fruit_name'].unique()]
          + [(x, 'red') for x in df2['fruit_name'].unique()])
colors_df = pd.DataFrame(colors, columns=['fruit_name', 'color'])
# merge returns a new frame, so assign the result back
df3 = df3.merge(colors_df, how='left').fillna("unknown")
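
To see what how='left' changes, here is a small sketch comparing it with the default inner join. The `colors_df` contents are written out by hand to keep the example self-contained; that is an assumption about what the comprehension above produces for the toy data.

```python
import pandas as pd

df3 = pd.DataFrame({"fruit_name": ["lemon", "rockmelon", "apple"]})
colors_df = pd.DataFrame({
    "fruit_name": ["banana", "lemon", "strawberry", "apple"],
    "color": ["yellow", "yellow", "red", "red"],
})

# Default how='inner' drops fruits that have no color entry
inner = df3.merge(colors_df, on="fruit_name")
print(inner["fruit_name"].tolist())  # ['lemon', 'apple']

# how='left' keeps every row of df3 and leaves NaN where nothing matched
left = df3.merge(colors_df, on="fruit_name", how="left")
print(left["color"].isna().tolist())  # [False, True, False]
```

With the inner join "rockmelon" disappears, so there would be nothing left for fillna to mark as "unknown".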

Finally, the same lookup can be done with a dict (map in pandas); keys missing from the dict become NaN, which fillna then replaces:

df3["color"] = df3["fruit_name"].map(dict(colors)).fillna("unknown")
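
A short sketch of the intermediate step, with the `colors` pairs written out by hand (an assumption that mirrors the toy data). It also shows one thing to keep in mind: if a fruit appeared in both frames, `dict` would keep the last pair, so "red" would win over "yellow", unlike the `if`/`elif` version.

```python
import pandas as pd

# Hand-written stand-in for the (fruit, color) pairs built from df1 and df2
colors = [("banana", "yellow"), ("lemon", "yellow"),
          ("strawberry", "red"), ("apple", "red")]
lookup = dict(colors)

fruit = pd.Series(["lemon", "rockmelon", "apple"])

# map leaves NaN for keys that are not in the dict
mapped = fruit.map(lookup)
print(mapped.isna().tolist())  # [False, True, False]

# fillna turns those gaps into the default label
print(mapped.fillna("unknown").tolist())  # ['yellow', 'unknown', 'red']

# Duplicate keys: the later pair overwrites the earlier one
print(dict([("apple", "yellow"), ("apple", "red")])["apple"])  # red
```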

Source: https://habr.com/ru/post/1655911/

