Pandas assert_frame_equal behavior

I am trying to compare two DataFrames with pandas' testing function assert_frame_equal. These frames contain floats that I want to compare to a user-defined precision.

The check_less_precise argument of assert_frame_equal seems to suggest that I can specify the number of digits after the decimal point to compare. Quoting the API reference:

check_less_precise : Specify comparison precision. Only used when check_exact is False. 5 digits (False) or 3 digits (True) after decimal points are compared. If int, then specify the digits to compare.

API reference

However, it does not seem to work when the floats are less than 1.

This raises an AssertionError:

    import pandas as pd
    expected = pd.DataFrame([{"col": 0.1}])
    output = pd.DataFrame([{"col": 0.12}])
    pd.testing.assert_frame_equal(expected, output, check_less_precise=1)

while this does not:

    expected = pd.DataFrame([{"col": 1.1}])
    output = pd.DataFrame([{"col": 1.12}])
    pd.testing.assert_frame_equal(expected, output, check_less_precise=1)

Can someone help explain this behavior? Is this a bug?

1 answer

I dug through the source code and found out what is going on. Ultimately, the function decimal_almost_equal gets called, which looks like this in plain Python (the real one is written in Cython).

    def decimal_almost_equal(desired, actual, decimal):
        return abs(desired - actual) < (0.5 * 10.0 ** -decimal)

See the source code here. Here is the actual function call:

 decimal_almost_equal(1, fb / fa, decimal) 

Where in this example

    fa = .1
    fb = .12
    decimal = 1

So the function call becomes

 decimal_almost_equal(1, 1.2, 1) 

which decimal_almost_equal evaluates as

 abs(1 - 1.2) < .5 * 10 ** -1 

or

 .2 < .05 

which is False.

Thus, the comparison is based on a relative (percentage) difference, not an absolute difference.
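To see both cases from the question side by side, here is a minimal plain-Python sketch of the check described above. The helper name ratio_check is made up for illustration, and the sketch ignores any special cases the real Cython code may handle (zeros, NaN and so on).

```python
def decimal_almost_equal(desired, actual, decimal):
    # Same check as quoted above: the distance must be below 0.5 * 10**-decimal.
    return abs(desired - actual) < (0.5 * 10.0 ** -decimal)


def ratio_check(fa, fb, decimal):
    # Hypothetical helper: compares the ratio fb / fa against 1, mirroring
    # the call decimal_almost_equal(1, fb / fa, decimal). Assumes fa != 0.
    return decimal_almost_equal(1, fb / fa, decimal)


print(ratio_check(0.1, 0.12, 1))  # False: the ratio is about 1.2, off by 0.2
print(ratio_check(1.1, 1.12, 1))  # True: the ratio is about 1.018, off by 0.018
```

The absolute gap is 0.02 in both cases, but relative to 0.1 that is 20%, while relative to 1.1 it is under 2%, which is why only the first comparison fails.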

If you want an absolute comparison, check out np.allclose.

    import numpy as np
    np.allclose(expected, output, atol=.1)  # True
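For reference, NumPy documents the np.allclose test as |a - b| <= atol + rtol * |b|, with defaults rtol=1e-5 and atol=1e-8. Below is a rough scalar sketch of that formula in plain Python; is_close is a hypothetical helper written for this answer, not a NumPy function.

```python
def is_close(a, b, rtol=1e-5, atol=1e-8):
    # Hypothetical scalar version of the documented np.allclose formula:
    # |a - b| <= atol + rtol * |b|
    return abs(a - b) <= atol + rtol * abs(b)


print(is_close(0.1, 0.12, atol=0.1))  # True: 0.02 is within the absolute tolerance
print(is_close(1.1, 1.12, atol=0.1))  # True: same absolute gap, same result
print(is_close(0.1, 0.12))            # False: the default tolerances are far tighter
```

With atol set, the result no longer depends on the magnitude of the values, which is the behavior the question was after. As far as I know, pandas 1.1 and later also deprecate check_less_precise in favor of rtol and atol arguments on assert_frame_equal, so check the documentation for your version.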

Source: https://habr.com/ru/post/1271944/

