What is the difference between NaN and None?

Question

I read two columns of a CSV file using pandas read_csv() and then assign the values to a dictionary. The columns contain strings of numbers and letters. Occasionally a cell is empty. In my opinion, the value read into that dictionary entry should be None, but nan is assigned instead. Surely None better describes an empty cell, since it represents a null value, whereas nan just says that the value read is not a number.

Is my understanding correct? What is the difference between None and nan? Why is nan assigned instead of None?

In addition, my dictionary check for empty cells uses numpy.isnan():

 for k, v in my_dict.iteritems():
     if np.isnan(v):

But this gives me an error saying that I cannot use this check for v . I assume this is because it is supposed to use an integer or float variable, not a string. If so, how can I check v for the "empty cell" / nan case?

+47

python numpy pandas nan

user1083734 Jul 08 '13 at 19:06


4 answers

NaN can be used as a numeric value for mathematical operations, and None cannot (or at least should not).

NaN is a numeric value defined in the IEEE 754 floating-point standard. None is a built-in Python type (NoneType), and in this context it means something closer to "nonexistent" or "empty" than "numerically invalid".

The main "symptom" of this is that if you compute, say, the mean or the sum of an array containing even a single NaN, you get NaN as the result.
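A minimal sketch of that distinction, using only the standard library. The variable names are made up for illustration:

```python
import math

nan = float("nan")

# NaN is an ordinary float value, so it can sit in numeric containers.
nan_is_float = isinstance(nan, float)

# IEEE 754 quirk: NaN compares unequal to everything, including itself,
# so use math.isnan() rather than == to detect it.
nan_equals_itself = (nan == nan)
nan_detected = math.isnan(nan)

# None is the single instance of NoneType and is tested with "is".
none_type_name = type(None).__name__
```

Here `nan_equals_itself` ends up False, which is why an equality test never finds a NaN.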

On the other hand, you cannot perform math operations using None as an operand.

Thus, depending on the case, you can use None as a way of telling your algorithm not to consider invalid or nonexistent values in the calculations. This would mean that the algorithm must check each value to see if it is None .

NumPy has some functions that keep NaN values from fouling your results, such as nansum and nan_to_num.
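The behaviour described in this answer can be sketched as follows. The array contents are arbitrary illustration values:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# A single NaN propagates through ordinary reductions.
plain_sum = values.sum()

# nansum skips NaN entries; nan_to_num replaces them (with 0.0 by default).
nan_aware_sum = np.nansum(values)
zero_filled = np.nan_to_num(values)

# None cannot take part in arithmetic at all.
try:
    None + 1
    none_supports_math = True
except TypeError:
    none_supports_math = False
```

After this runs, `plain_sum` is NaN, `nan_aware_sum` is 4.0, and `none_supports_math` is False.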

+7

heltonbiker Jul 08 '13 at 19:16


The isnan() function checks whether something is "Not A Number" and returns True only if it is; for example, isnan(2) returns False.

The conditional myVar is not None tells you whether a variable holds a value other than None.

Your numpy array uses isnan() because it is intended to be an array of numbers, and all of its elements are initialized to NaN; these elements are considered "empty".

+2

Stephan Jul 08 '13 at 19:11


NaN stands for "Not a Number".
None can stand for anything.

0

diegoaguilar Jul 08 '13 at 19:09


Andy Hayden · Accepted Answer · 2013-07-08 19:43

NaN is used as a placeholder for missing data consistently in pandas; consistency is good. I usually read/translate NaN as "missing". Also see the "Working with missing data" section of the docs.

Wes writes in the docs, under "Choice of NA representation":
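As an illustration of the situation in the question, here is a small sketch that reads a CSV with an empty cell. The column names and the in-memory CSV text are hypothetical stand-ins for the asker's file:

```python
import io

import pandas as pd

# Hypothetical two-column file; the second data row has an empty "code" cell.
csv_text = "key,code\na1,x9\nb2,\n"
df = pd.read_csv(io.StringIO(csv_text))

# The empty cell is read as a missing value (NaN), not as None.
missing_mask = df["code"].isnull().tolist()

# Building the dictionary keeps that missing value as the entry for "b2".
my_dict = dict(zip(df["key"], df["code"]))
b2_is_missing = bool(pd.isnull(my_dict["b2"]))
```

With this input, `missing_mask` comes out as `[False, True]`.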

After years of production use, [NaN] has proven, at least in my opinion, to be the best decision given the state of affairs in NumPy and Python in general. The special value NaN (Not-A-Number) is used everywhere as the NA value, and there are API functions isnull and notnull which can be used across the dtypes to detect NA values.
...
Thus, I have chosen the Pythonic “practicality beats purity” approach and traded integer NA capability for a much simpler approach of using a special value in float and object arrays to denote NA, and promoting integer arrays to floating when NAs must be introduced.

Note: the "gotcha" that an integer Series containing missing data is upcast to float.

In my opinion, the main reason to use NaN (over None) is that it can be stored with numpy's float64 dtype rather than the less efficient object dtype; see "NA type promotions" in the docs.
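A short sketch of that upcast, assuming pandas' default NumPy-backed dtypes (not the nullable `Int64` extension dtype). The Series contents are arbitrary:

```python
import pandas as pd

# A Series of plain integers gets an integer dtype.
ints = pd.Series([1, 2, 3])

# Introducing a missing value forces promotion to float64,
# because NaN is a float and has no integer representation.
with_gap = pd.Series([1, None, 3])

# The same happens when an operation creates a gap, e.g. reindexing
# to a label (5) that does not exist in the original index.
reindexed = ints.reindex([0, 1, 5])
```

Both `with_gap` and `reindexed` end up with dtype float64, while `ints` keeps an integer dtype.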

 # without forcing dtype it changes None to NaN!
 s_bad = pd.Series([1, None], dtype=object)
 s_good = pd.Series([1, np.nan])

 In [13]: s_bad.dtype
 Out[13]: dtype('O')

 In [14]: s_good.dtype
 Out[14]: dtype('float64')

Jeff comments (below):

np.nan allows for vectorized operations; it is a float value, while None, by definition, forces object dtype, which basically disables all efficiency in numpy.
So repeat 3 times fast: object == bad, float == good

That said, many operations may still work just as well with None as with NaN (but they are perhaps not supported, i.e. they may sometimes give surprising results):

 In [15]: s_bad.sum()
 Out[15]: 1

 In [16]: s_good.sum()
 Out[16]: 1.0

To answer the second question:
You must use pd.isnull and pd.notnull to check for missing data (NaN).
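A minimal sketch of that check applied to a dictionary like the one in the question. The dictionary contents are made up, and `.items()` is the Python 3 spelling of `iteritems()`:

```python
import numpy as np
import pandas as pd

# Hypothetical dictionary mixing strings with missing entries.
my_dict = {"a": "x9", "b": np.nan, "c": None}

# np.isnan only accepts numeric input, so a string value raises TypeError.
try:
    np.isnan("x9")
    isnan_accepts_strings = True
except TypeError:
    isnan_accepts_strings = False

# pd.isnull handles strings, NaN and None alike.
empty_keys = [k for k, v in my_dict.items() if pd.isnull(v)]
```

Here `empty_keys` is `["b", "c"]`: pd.isnull treats both NaN and None as missing, and it simply returns False for the string.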
