Boolean indexing to store false values as NaN

Given a data frame:

                         Data
1                      246804
2                      135272
3                      898.01
4                     3453.33
5                       shine  
6                        add
7                         522
8                         Nan
9                      string
10                      29.11
11                        20  
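
For anyone who wants to reproduce the frame, here is a minimal sketch. It assumes every value is stored as a string (object dtype) and that the index runs from 1 to 11 as in the printout; the original post does not show how `df` was built.

```python
import pandas as pd

# Assumption: all values are strings, including the numeric-looking ones
values = ["246804", "135272", "898.01", "3453.33", "shine", "add",
          "522", "Nan", "string", "29.11", "20"]

# Index 1..11 to match the printed frame
df = pd.DataFrame({"Data": values}, index=range(1, 12))
print(df)
```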

I would like two new columns, Floats and Strings, each with the same length as the original frame. Getting the Floats column is easy:

In [176]: pd.to_numeric(df.Data, errors='coerce')
Out[176]: 
1     246804.00
2     135272.00
3        898.01
4       3453.33
5           NaN
6           NaN
7        522.00
8           NaN
9           NaN
10        29.11
11        20.00
Name: Data, dtype: float64

As you can see, non-floats are coerced to NaN, which is exactly what I want.

To get the strings, this is what I do:

In [177]: df[df.Data.str.isalpha()]
Out[177]: 
     Data
5   shine
6     add
8     Nan
9  string

But, as you can see, this drops the non-string rows instead of keeping them as NaN. I want something like this:

1                       NaN
2                       NaN
3                       NaN
4                       NaN
5                       shine  
6                       add
7                       NaN
8                       Nan (not NaN)
9                       string
10                      NaN
11                      NaN  

How can I do this?

+4
3 answers

To get Strings, you can use boolean indexing on the Data column, selecting the rows where Floats is null.

df['Floats'] = pd.to_numeric(df.Data, errors='coerce')
df['Strings'] = df.Data.loc[df.Floats.isnull()]  # Optional: .astype(str)

>>> df
# Output:
#        Data     Floats Strings
# 1    246804  246804.00     NaN
# 2    135272  135272.00     NaN
# 3    898.01     898.01     NaN
# 4   3453.33    3453.33     NaN
# 5     shine        NaN   shine
# 6       add        NaN     add
# 7       522     522.00     NaN
# 8       Nan        NaN     Nan
# 9    string        NaN  string
# 10    29.11      29.11     NaN
# 11       20      20.00     NaN
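
This works because of index alignment: assigning a shorter Series to a column matches rows by index label and fills the unmatched labels with NaN. Below is a self-contained sketch that checks this, assuming the frame holds strings only (the construction of `df` is an assumption, not part of the original post).

```python
import pandas as pd

# Assumed reconstruction of the frame from the question
df = pd.DataFrame(
    {"Data": ["246804", "135272", "898.01", "3453.33", "shine", "add",
              "522", "Nan", "string", "29.11", "20"]},
    index=range(1, 12),
)

df["Floats"] = pd.to_numeric(df["Data"], errors="coerce")

# The selection has only 4 rows; assignment aligns on the index,
# so the other 7 rows of the new column become NaN
selected = df["Data"].loc[df["Floats"].isnull()]
df["Strings"] = selected

# Count how many of the two new columns are populated in each row
populated = df[["Floats", "Strings"]].notnull().sum(axis=1)
print(df)
```

With this data, `populated` should be 1 for every row. If Data contained genuinely missing values, both new columns would be NaN for those rows.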
+5
floats = pd.to_numeric(df.Data, errors='coerce')
pd.DataFrame(dict(
    floats=floats,
    strings=df.Data.mask(floats.notnull())
))

       floats strings
1   246804.00     NaN
2   135272.00     NaN
3      898.01     NaN
4     3453.33     NaN
5         NaN   shine
6         NaN     add
7      522.00     NaN
8         NaN     Nan
9         NaN  string
10      29.11     NaN
11      20.00     NaN

mask also accepts a replacement value, for example an empty string instead of NaN:

floats = pd.to_numeric(df.Data, errors='coerce')
pd.DataFrame(dict(
    floats=floats,
    strings=df.Data.mask(floats.notnull(), '')
))

       floats strings
1   246804.00        
2   135272.00        
3      898.01        
4     3453.33        
5         NaN   shine
6         NaN     add
7      522.00        
8         NaN     Nan
9         NaN  string
10      29.11        
11      20.00        
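
As a side note, `Series.mask(cond)` replaces values where the condition is True, while `Series.where(cond)` keeps them, so the two are complements of each other. A small sketch on a shortened, assumed sample of the data:

```python
import pandas as pd

# Shortened sample; the values are taken from the question
data = pd.Series(["246804", "shine", "898.01", "Nan", "20"])
floats = pd.to_numeric(data, errors="coerce")

# mask hides the rows that parsed as numbers ...
via_mask = data.mask(floats.notnull())

# ... which is the same as keeping the rows that did not
via_where = data.where(floats.isnull())

print(via_mask.equals(via_where))
```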
+3

df.Data.where(pd.to_numeric(df.Data, errors='coerce').isnull())
Out[186]: 
      Data
1      NaN
2      NaN
3      NaN
4      NaN
5    shine
6      add
7      NaN
8      Nan #not NaN
9   string
10     NaN
11     NaN

Or, based on df.Data.str.isalpha():

df['Data'].where(df['Data'].str.isalpha())
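
Note that the two conditions are not equivalent for every input. `str.isalpha()` is True only for strings made up entirely of letters, so values containing digits or spaces are dropped, whereas the `to_numeric` test keeps anything that fails to parse as a number. The sample below is hypothetical and is not taken from the question.

```python
import pandas as pd

# Hypothetical values chosen to show where the two tests disagree
s = pd.Series(["shine", "a1", "two words", "42"])

# Keeps purely alphabetic strings only
by_isalpha = s.where(s.str.isalpha())

# Keeps everything that cannot be parsed as a number
by_numeric = s.where(pd.to_numeric(s, errors="coerce").isnull())

print(pd.DataFrame({"by_isalpha": by_isalpha, "by_numeric": by_numeric}))
```

For the data in the question both approaches give the same result, since every non-numeric value there is purely alphabetic.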
+3

Source: https://habr.com/ru/post/1682700/

