Comparing Floating-Point Numbers

Publish Date: Nov 28, 2018

Table of Contents

  1. Floating-point numbers
  2. Double Precision Examples
  3. Tips for Comparing Floating-point Numbers

1. Floating-point numbers

Floating-point numbers are discrete, finite-precision values along the continuous number line, yet they are often misinterpreted as infinitely precise.  Even a value as simple as 1/10 = 0.1 cannot be stored exactly in IEEE floating-point representation, because an IEEE floating-point number is the product of an integer mantissa and a binary (power-of-2) exponent factor.  For example, 0.5 and 0.25 can be stored exactly (1 x 2^-1 and 1 x 2^-2), but 0.1 cannot; when stored in memory, its value becomes approximately 0.100000000000000006.
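A minimal C sketch (C is used here as a textual stand-in for the G code referenced later in this article) that prints the value 0.1 actually takes once stored as a double:

#include <stdio.h>

int main(void)
{
    double tenth = 0.1;

    /* Printing with 18 significant digits reveals the rounding:
       the nearest double to 0.1 is approximately 0.100000000000000006. */
    printf("%.18g\n", tenth);
    return 0;
}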

The limited precision inherent in the floating-point representation means that even minor variations in the order of computations can change the result. Compilers and machines may hold intermediate results at different precisions, each subject to its own rounding, so changing the compiler, the hardware, or the structure of the program usually changes the result slightly.

Expressions that are equal from a purely mathematical perspective will very often evaluate to only nearly-equal results on a computer.  Comparing two computed floating-point numbers for exact equality therefore rarely leads to a satisfactory conclusion. The following section contains two simple examples that demonstrate this point.

 

Double Precision Examples

Example 1
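A minimal C sketch standing in for the original example (the values 0.1, 0.2, and 0.3 are an illustrative choice): two expressions that are equal on paper compare as unequal.

#include <stdio.h>

int main(void)
{
    double a = 0.1 + 0.2;   /* mathematically equal to 0.3 */
    double b = 0.3;

    printf("a = %.17g\n", a);   /* 0.30000000000000004 */
    printf("b = %.17g\n", b);   /* 0.29999999999999999 */
    printf(a == b ? "equal\n" : "not equal\n");   /* prints "not equal" */
    return 0;
}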



 


Example 2
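Another minimal C sketch standing in for the original example (the choice of summing 0.1 ten times versus multiplying 0.1 by 10 is illustrative): both quantities equal 1.0 on paper, but the two computation orders round differently.

#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += 0.1;                     /* ten additions, ten roundings */

    double product = 10.0 * 0.1;        /* one multiplication, one rounding */

    printf("sum     = %.17g\n", sum);       /* 0.99999999999999989 */
    printf("product = %.17g\n", product);   /* 1 */
    printf(sum == product ? "equal\n" : "not equal\n");   /* not equal */
    return 0;
}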

 

Tips for Comparing Floating-point Numbers

Since numerous anomalies can emerge in the process of computation, here are some basic rules of thumb for comparing floating-point numbers.

 

1) Beware of checking for equality – it will most likely fail

The straightforward check a – b = 0 fails in most cases because of the unavoidable tiny disturbances that arise within a computation, so it is rarely useful in practice. The G code below shows the simple case of direct comparison:
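A textual equivalent of that direct check, sketched in C (the function name exactly_equal is an illustrative choice):

#include <stdbool.h>

/* Direct equality check, shown only to illustrate the pitfall:
   a - b == 0 behaves the same as a == b and fails whenever a and b
   carry different rounding errors. */
bool exactly_equal(double a, double b)
{
    return (a - b) == 0.0;
}

/* For example, exactly_equal(0.1 + 0.2, 0.3) returns false. */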

 

2) Compare to within absolute epsilon

The check |a – b| <= epsilon bounds the absolute distance between two floating-point numbers. Absolute-error comparisons are valuable when you know the range of the expected error beforehand. Always make sure the chosen epsilon is larger than the minimum representable difference between the values being compared; otherwise the test degenerates into an exact-equality check. The G code below shows the details of comparison within an absolute error:
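A C sketch of the absolute-error test (the function name and the example tolerance are illustrative):

#include <math.h>
#include <stdbool.h>

/* Treat a and b as equal when they lie within abs_eps of each other.
   abs_eps must suit the magnitude of the data being compared. */
bool nearly_equal_abs(double a, double b, double abs_eps)
{
    return fabs(a - b) <= abs_eps;
}

/* For example, nearly_equal_abs(0.1 + 0.2, 0.3, 1e-9) returns true. */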

 

3) Compare to within relative epsilon

The check |a – b|/|b| <= epsilon bounds the relative distance between two floating-point numbers. Relative-error comparison is the most commonly used approach because the range of the data is often not known in advance. The G code below shows the details of comparison within a relative error (assuming b ≠ 0):
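A C sketch of the relative-error test (the function name and example epsilon are illustrative; a typical choice is a small multiple of DBL_EPSILON):

#include <math.h>
#include <stdbool.h>

/* Treat a and b as equal when their relative difference, measured
   against b, is within rel_eps.  Assumes b != 0, as in the text. */
bool nearly_equal_rel(double a, double b, double rel_eps)
{
    return fabs(a - b) / fabs(b) <= rel_eps;
}

/* For example, nearly_equal_rel(0.1 + 0.2, 0.3, 1e-12) returns true. */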

 

4) Compare to within a given number of digits of precision

Check whether the leading n digits of the two floating-point numbers are identical. The number of identical leading digits can be approximated by log10(b/(a – b)) (assuming a > b > 0). But the computation used to obtain this value (log10 here, for example) introduces additional roundoff error unless you are careful! The G code below shows the details of comparison to within a given number of digits of precision (assuming a > b > 0):
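A C sketch of the digit-count test (the function name is illustrative, and as noted the log10 call carries roundoff of its own):

#include <math.h>
#include <stdbool.h>

/* Approximate the number of identical leading significant digits of
   a and b as log10(b / (a - b)) and compare it against the requested
   count n.  Assumes a > b > 0, as in the text. */
bool equal_to_n_digits(double a, double b, int n)
{
    double matching_digits = log10(b / (a - b));
    return matching_digits >= (double)n;
}

/* For example, equal_to_n_digits(1.2346, 1.2345, 4) returns true. */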

 

5) Compare using integers

Check how many representable floating-point numbers lie between the two compared quantities on the number line. Adjacent representable doubles differ by one unit in the last place (ULP), and reinterpreting their bit patterns as integers turns this distance into a simple integer subtraction.
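A C sketch of this integer (ULP-distance) comparison; the helper names and bit-mapping details are one possible implementation, and NaNs are not handled:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Map a double's bit pattern onto a monotonically ordered signed
   integer scale, so that consecutive representable doubles map to
   consecutive integers. */
static int64_t ordered_bits(double x)
{
    int64_t bits;
    memcpy(&bits, &x, sizeof bits);              /* well-defined type pun */
    return bits < 0 ? INT64_MIN - bits : bits;   /* reorder negative values */
}

/* Treat a and b as equal when no more than max_ulps representable
   doubles lie between them on the number line. */
bool nearly_equal_ulps(double a, double b, uint64_t max_ulps)
{
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    uint64_t distance = ia > ib ? (uint64_t)ia - (uint64_t)ib
                                : (uint64_t)ib - (uint64_t)ia;
    return distance <= max_ulps;
}

/* For example, nearly_equal_ulps(0.1 + 0.2, 0.3, 4) returns true:
   the two values are exactly 1 ULP apart. */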
