Inequality caused by float inaccuracy

Problem Detail: 

At least in Java, if I write this code:

    float a = 1000.0F;
    float b = 0.00004F;
    float c = a + b + b;
    float d = b + b + a;
    boolean e = c == d;

the value of $e$ would be $false$. I believe this is caused by the fact that floats are limited in how accurately they can represent numbers, but I don't understand why just changing the position of $a$ causes this inequality.

However, when I reduce the two $b$s to a single $b$ in both lines 3 and 4, as below, the value of $e$ becomes $true$:

    float a = 1000.0F;
    float b = 0.00004F;
    float c = a + b;
    float d = b + a;
    boolean e = c == d;

What exactly happens in lines 3 and 4? Why is addition of floats not associative?

Thanks in advance.

Asked By : Known Zeta

Answered By : gnasher729

In typical floating point implementations, the result of a single operation is produced as if the operation was performed with infinite precision, and then rounded to the nearest floating-point number.
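
To make that rounding step visible, here is a minimal sketch using the values from the question. For these particular values the sum computed in double happens to be exact (both operands convert to double exactly and their sum fits in 53 bits), so it can serve as the "infinite precision" reference:

    float a = 1000.0F;
    float b = 0.00004F;

    float sumF = a + b;                       // one float addition, rounded once
    double sumD = (double) a + (double) b;    // exact for these values: serves as the
                                              // infinitely precise reference result

    System.out.println(sumF);                 // the rounded float result
    System.out.println(sumD);                 // the exact value of a + b
    System.out.println((double) sumF - sumD); // the rounding error of that one operation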

Compare $a+b$ and $b+a$: The result of each operation performed with infinite precision is the same, therefore these identical infinite precision results are rounded in an identical way. In other words, floating-point addition is commutative.

Take $b + b + a$: $b$ is a floating-point number. With binary floating point numbers, $2b$ is also a floating-point number (the exponent is larger by one), so $b+b$ is added without any rounding error. Then $a$ is added to the exact value $b+b$. The result is the exact value $2b + a$, rounded to the nearest floating-point number.
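
A quick sketch to convince yourself of this: doubling a binary float leaves the significand unchanged and only increments the exponent, which Math.getExponent shows directly.

    float b = 0.00004F;
    float twoB = b + b;

    System.out.println(Math.getExponent(b));      // exponent of b
    System.out.println(Math.getExponent(twoB));   // exactly one larger, same significand
    System.out.println(twoB == 2.0 * (double) b); // true: b + b carries no rounding error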

Take $a + b + b$: $a + b$ is computed first, and there will be a rounding error $r$, so we get the result $a + b + r$. Add $b$, and the result is the exact value $2b + a + r$, rounded to the nearest floating-point number.

So in one case, $2b + a$, rounded. In the other case, $2b + a + r$, rounded.
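
Putting the two evaluation orders side by side with the question's values makes the extra rounding visible; this is just a sketch of the reasoning above:

    float a = 1000.0F;
    float b = 0.00004F;

    float ab  = a + b;   // rounded: the exact sum of a and b is not representable as a float
    float abb = ab + b;  // rounded again, starting from an already-rounded value
    float bb  = b + b;   // exact: doubling only bumps the exponent
    float bba = bb + a;  // a single rounding of the exact value 2b + a

    System.out.println(abb);         // c from the question
    System.out.println(bba);         // d from the question
    System.out.println(abb == bba);  // false for these values, as the question reports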

PS. Whether both calculations give the same result for two particular numbers $a$ and $b$ depends on the numbers and on the rounding error in the calculation $a + b$, and is usually hard to predict. Using single or double precision makes no difference to the problem in principle; since the rounding errors are different, there will be values of $a$ and $b$ where the results are equal in single precision but not in double precision, or vice versa. The precision will be a lot higher, but the problem that two expressions are mathematically the same yet not the same in floating-point arithmetic stays the same.
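
To see how this plays out at another precision, you can run the same comparison in double; as a sketch, whether the two sums agree for any particular pair of values has to be checked rather than assumed:

    double a = 1000.0;
    double b = 0.00004;

    double c = a + b + b;
    double d = b + b + a;

    // The outcome depends on the rounding error of a + b at double precision
    // and cannot be predicted from the float result above.
    System.out.println(c == d);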

PPS. In some languages, floating-point arithmetic may be performed with higher precision or a wider range of numbers than the types in the source code suggest. In that case, it would be much more likely (but still not guaranteed) that both sums give the same result.

PPPS. A comment asked whether we should compare floating-point numbers for equality at all. Absolutely, if you know what you are doing. For example, if you sort an array or implement a set, you get yourself into awful trouble if you try to use some notion of "approximately equal". In a graphical user interface, you may need to recalculate object sizes when the size of an object has changed; you compare oldSize == newSize to avoid that recalculation, knowing that in practice you will almost never see two sizes that are nearly but not exactly equal, and that your program is correct even if there is an unnecessary recalculation.
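
As an illustration of that last point, here is a sketch of such a change-detection check; the class and field names are hypothetical, not taken from any real GUI toolkit:

    // Hypothetical sketch: recompute a layout only when the size actually changes.
    class Widget {
        private float lastWidth = Float.NaN;  // NaN compares unequal to everything,
                                              // so the first call always recomputes
        private float cachedCost;

        float layoutCost(float width) {
            if (width != lastWidth) {                 // exact comparison is fine here: a
                cachedCost = expensiveLayout(width);  // spurious recompute only wastes time,
                lastWidth = width;                    // it never produces a wrong result
            }
            return cachedCost;
        }

        private float expensiveLayout(float width) {
            return width * 0.5F;                      // stand-in for real layout work
        }
    }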

Question Source : http://cs.stackexchange.com/questions/65703
