Trying to drop a NaN-indexed row in a DataFrame

I am using Python 2.7.3 and pandas 0.12.0.

I want to delete the row with the NaN index so that I only have valid site_id values.

    print df.head()

            special_name
    site_id
    NaN           Banana
    OMG            Apple

    df.drop(df.index[0])
    TypeError: 'NoneType' object is not iterable

If I try to delete a range, for example:

 df.drop(df.index[0:1]) 

I get this error:

 AttributeError: 'DataFrame' object has no attribute 'special_name' 
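To try the answers below, a minimal sketch that rebuilds the frame from the question may help. It assumes, from the df.head() output above, an index named site_id whose first label is NaN and a single special_name column; it then shows which index labels pandas treats as missing.

```python
import numpy as np
import pandas as pd

# Rebuild the frame shown by df.head() in the question (assumed layout):
# the index is named "site_id" and its first label is NaN.
df = pd.DataFrame(
    {"special_name": ["Banana", "Apple"]},
    index=pd.Index([np.nan, "OMG"], name="site_id"),
)

# Boolean numpy array: True where the index label is missing.
missing = df.index.isnull()
print(df)
print(missing)
```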
6 answers

I found that the easiest way is to reset the index, drop the NaN rows, and then set the index again.

    In [26]: dfA.reset_index()
    Out[26]:
      index special_name
    0   NaN        Apple
    1   OMG       Banana

    In [30]: df = dfA.reset_index().dropna().set_index('index')

    In [31]: df
    Out[31]:
          special_name
    index
    OMG         Banana
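A runnable sketch of the reset/drop/set approach, using the same two-row data as the session above. One deliberate variation, which is my assumption and not part of the answer: dropna is given subset=["index"], so that only rows whose former index label is NaN are removed. Plain dropna() would also drop rows with a NaN in any other column.

```python
import numpy as np
import pandas as pd

# Same data as in the session above: the NaN label maps to "Apple".
dfA = pd.DataFrame({"special_name": ["Apple", "Banana"]}, index=[np.nan, "OMG"])

# reset_index() moves the unnamed index into a column called "index";
# dropna(subset=["index"]) removes rows whose former label is NaN;
# set_index("index") turns the surviving labels back into the index.
df = dfA.reset_index().dropna(subset=["index"]).set_index("index")
print(df)
```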

With pandas >= 0.20.0 you can:

    df[df.index.notnull()]

With older versions:

    df[pandas.notnull(df.index)]

To break it down:

notnull generates a boolean mask, e.g. [False, False, True], where True means the value at the corresponding position is not null (i.e. neither numpy.nan nor None). We then select the rows whose index corresponds to True in the mask using df[boolean_mask].
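A small sketch of the mask step, assuming the two-row frame from the question. It builds the mask with both the Index method and the older top-level function, and then uses the mask for boolean row selection.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"special_name": ["Banana", "Apple"]}, index=[np.nan, "OMG"])

# Both calls return a boolean numpy array: True where the label is NOT null.
mask_new = df.index.notnull()      # Index method (newer pandas)
mask_old = pd.notnull(df.index)    # top-level function (older pandas)

# Boolean indexing keeps only the rows where the mask is True.
clean = df[mask_new]
print(clean)
```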


Tested to work:

    df.reset_index(inplace=True)
    df.drop(df[df['index'].isnull()].index, inplace=True)


How I checked the above:

Recreate the table from the original question with df = pd.DataFrame(data=['Banana', 'Apple'], index=[np.nan, 'OMG'], columns=['Special_name'])

then run the two lines of code above, which I explain in plain language below:

  • The first line resets the index to integers; the old index values, including NaN, move into a column named after the original index ("index" in the example above, because the index has no name). pandas does this automatically with reset_index().
  • The second line, read from the innermost brackets outward: df[df['index'].isnull()] selects the rows whose "index" column is NaN, using isnull(). Its .index then passes the labels of all rows where "index" is NaN to df.drop() in the outermost part of the expression.

NB: I checked that the above command also works with multiple NaN values in the column.

Tested with Python 3.5.1 and pandas 0.17.1 (Anaconda 32-bit package).
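The two-line approach above can be checked with this sketch. It uses the DataFrame constructor from the answer, plus one extra row ("Cherry", a hypothetical value added only for illustration) so that the frame contains two NaN labels, which is the multiple-NaN case mentioned in the NB.

```python
import numpy as np
import pandas as pd

# Frame from the answer, plus a hypothetical third row with another NaN label.
df = pd.DataFrame(data=["Banana", "Apple", "Cherry"],
                  index=[np.nan, "OMG", np.nan],
                  columns=["Special_name"])

# Step 1: the old labels move into a column named "index";
# the new index is 0, 1, 2.
df.reset_index(inplace=True)

# Step 2: find the integer positions whose old label was NaN and drop them.
df.drop(df[df["index"].isnull()].index, inplace=True)
print(df)
```

Note that the surviving row keeps its integer label 1. Calling reset_index(drop=True) afterwards would renumber the rows if that matters.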


Edit: the following probably applies only to MultiIndexes, and in any case it is superseded by the newer df.index.isnull() method (see other answers). I am leaving this answer only for historical interest.

For people coming to this now: you can do it directly, without reindexing, by relying on the fact that NaNs in the index are represented by the label -1. So:

 df = dfA[dfA.index.labels!=-1] 

Even better, in pandas > 0.16.1 you can use drop() to do this in place, without copying:

 dfA.drop(labels=[-1], level='index', inplace=True) 

NB: It is a little misleading that the index level is called “index” here; in practice it will usually have a more specific name, such as “date” or “experimental_run”.
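A sketch of the -1 idea on a current pandas version, assuming a hypothetical two-level MultiIndex with levels named "index" and "run". In pandas 0.24 the per-level integer arrays were renamed from .labels to .codes, so the sketch filters on codes[0]: the codes array of the first level, where a missing label is stored as -1. A version-agnostic alternative that does not rely on these internal codes is included for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical MultiIndex whose first level ("index") contains a NaN.
idx = pd.MultiIndex.from_arrays([[np.nan, "OMG", "RLY"], [1, 2, 3]],
                                names=["index", "run"])
dfA = pd.DataFrame({"special_name": ["Apple", "Banana", "Orange"]}, index=idx)

# A missing label is encoded as -1 in that level's codes array
# (this attribute was called .labels before pandas 0.24).
df = dfA[dfA.index.codes[0] != -1]

# Alternative that works on the level values instead of the internal codes.
df_alt = dfA[dfA.index.get_level_values("index").notna()]
print(df)
```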


None of the answers worked for me 100%. Here's what worked:

    In [26]: print df
    Out[26]:
      site_id special_name
    0     OMG        Apple
    1     NaN       Banana
    2     RLY       Orange

    In [27]: df.dropna(inplace=True)
    Out[27]:
      site_id special_name
    0     OMG        Apple
    2     RLY       Orange

    In [28]: df.reset_index(inplace=True)
    Out[28]:
       index site_id special_name
    0      0     OMG        Apple
    1      2     RLY       Orange

    In [29]: df.drop('index', axis='columns', inplace=True)
    Out[29]:
      site_id special_name
    0     OMG        Apple
    1     RLY       Orange
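The three steps above can be collapsed into one chain. This is a minimal sketch assuming the same three-row frame, where site_id is an ordinary column and not the index. It restricts dropna to the site_id column (an assumption; the session above drops NaNs in any column) and uses reset_index(drop=True), which discards the old integer index instead of inserting it as an "index" column, so no separate drop is needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"site_id": ["OMG", np.nan, "RLY"],
                   "special_name": ["Apple", "Banana", "Orange"]})

# Drop rows with a missing site_id, then renumber the rows 0..n-1
# without keeping the old index as a column.
out = df.dropna(subset=["site_id"]).reset_index(drop=True)
print(out)
```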

As of pandas 0.19, Indexes have a .notnull() method, so timdiels' answer can be simplified to:

 df[df.index.notnull()] 

which, in my opinion, is (currently) the simplest it gets.

Source: https://habr.com/ru/post/956990/

