(I have opened an issue on GitHub.)
The following behavior does not seem right to me. It seems that with na_values=False, no values, including "NA", should be interpreted as NaN, but that is not the case.
I came across this behavior in this post (see the comments on @JianxunLi's answer), where "NA" actually means "North America." In fact, I cannot find any way to read this data in without it being converted to NaN, and surely there must be some way to do this.
Here is an example CSV file:
%more foo.txt
x,y
"NA",NA
"foo",foo
I included "NA" both with and without quotation marks to find out whether that matters, but as you can see below, it doesn't seem to.
pd.read_csv('foo.txt')
Out[56]:
x y
0 NaN NaN
1 foo foo
pd.read_csv('foo.txt',na_values=False)
Out[57]:
x y
0 NaN NaN
1 foo foo
pd.read_csv('foo.txt',na_values='foo')
Out[58]:
x y
0 NaN NaN
1 NaN NaN
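To make the session above reproducible without creating foo.txt, here is a minimal sketch that feeds the same contents to read_csv through io.StringIO (the in-memory string stands in for the file; that substitution is my assumption, not part of the original setup). It checks that the quoted and unquoted NA are parsed identically and that na_values="foo" adds to the default NA strings rather than replacing them.

```python
import io

import pandas as pd

# Same contents as foo.txt above; io.StringIO stands in for the file on disk.
CSV_TEXT = 'x,y\n"NA",NA\n"foo",foo\n'

# Default parsing: both the quoted "NA" and the bare NA become NaN.
default = pd.read_csv(io.StringIO(CSV_TEXT))

# na_values="foo" is added to the built-in NA strings instead of replacing
# them, so every cell in this tiny file ends up as NaN.
extra = pd.read_csv(io.StringIO(CSV_TEXT), na_values="foo")

print(default)
print(extra)
```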
It appears that a data value of "NaN" is handled in the same way as "NA".
Edit to add: I think I understand this better now, based on @Marius's answer, although it still doesn't seem right to me (the default behavior, that is, not Marius's answer, which seems to correctly explain what is happening).
na_values=False => NA and NaN are treated as NaN
na_values='foo' => NA, NaN, and foo are treated as NaN
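The two cases above suggest that na_values is merged with a built-in list of NA strings. A minimal sketch of switching that built-in list off, assuming the same in-memory copy of foo.txt: the keep_default_na parameter of read_csv, when set to False, stops "NA" from being treated as missing, and na_values then controls exactly which strings become NaN.

```python
import io

import pandas as pd

CSV_TEXT = 'x,y\n"NA",NA\n"foo",foo\n'

# keep_default_na=False drops pandas' built-in NA strings ("NA", "NaN", "", ...),
# so "NA" survives as an ordinary string (e.g. "North America").
kept = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

# Combined with na_values, only the strings listed here become NaN.
only_foo = pd.read_csv(
    io.StringIO(CSV_TEXT), keep_default_na=False, na_values=["foo"]
)

print(kept)
print(only_foo)
```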
I can understand this being the default behavior for a numeric column, but it doesn't seem like it should be the default for a string column. I would also really have liked to be able to figure this out from the documentation alone, without Marius's answer.
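If the goal is NA-as-missing only in numeric columns, one possible workaround is a per-column dict for na_values combined with keep_default_na=False. This is a sketch on hypothetical data: the region and sales columns are made up for illustration, with "NA" meaning North America in region and a genuine missing value in sales.

```python
import io

import pandas as pd

# Hypothetical data: "NA" is a real value in "region" but missing in "sales".
CSV_TEXT = "region,sales\nNA,1.5\nEU,NA\n"

# With keep_default_na=False, a dict passed as na_values applies NA strings
# only to the listed columns; "region" gets no NA strings at all.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    keep_default_na=False,
    na_values={"sales": ["NA"]},
)

print(df)
print(df.dtypes)
```

Here "sales" parses as float64 with a NaN in the second row, while "region" keeps "NA" as text.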
Edit to add (2):
For comparison, as far as I can tell, neither Stata nor Excel treats "NA" as NaN/missing. So why does pandas?