**Which Place To Use When Comparing Numbers**: A Comprehensive Guide

Which Place To Use When Comparing Numbers is a crucial consideration for anyone working with numerical data, from students to seasoned professionals. COMPARE.EDU.VN offers an in-depth analysis of various comparison techniques, ensuring you choose the most appropriate method for your specific needs. This guide explores the nuances of floating-point comparisons and beyond, providing the knowledge to make informed decisions.

1. The Perils of Floating-Point Arithmetic

Floating-point math is notoriously subtle; each attempt to fully grasp its intricacies tends to reveal further layers of complexity. The primary challenge lies in the fact that floating-point numbers are usually only approximations of the values we intend.

For instance, a seemingly simple decimal value like 0.1 cannot be perfectly represented in binary floating-point format. This inherent limitation, combined with the finite precision of floating-point numbers, leads to discrepancies arising from variations in the order of operations or the precision of intermediate calculations. Consequently, directly comparing two floats for equality is generally unreliable.

GCC, with good intentions, issues a warning: “comparing floating point with == or != is unsafe.” Consider this illustrative example:

#include <stdio.h>

int main() {
    float f = 0.1f;
    float sum = 0;
    for (int i = 0; i < 10; ++i)
        sum += f;
    float product = f * 10;
    printf("sum = %1.15f, mul = %1.15f, mul2 = %1.15f\n", sum, product, f * 10);
}

This code aims to calculate ‘one’ through three distinct methods: repeated addition and two variations of multiplication. The outcome reveals three differing results, only one of which exactly equals 1.0. Note that the specific results may vary based on compiler, CPU, and compiler settings, underscoring the underlying issue.

2. The Meaning of “Correct” in Floating-Point Comparisons

Before we proceed, it’s crucial to distinguish between 0.1, float(0.1), and double(0.1). In C/C++, 0.1 and double(0.1) are equivalent in code, but conceptually, “0.1” represents the exact base-10 number, whereas float(0.1) and double(0.1) are rounded approximations. Moreover, float(0.1) and double(0.1) hold different values due to their varying levels of precision. Here’s a breakdown:

Number        Value
0.1           0.1 (exact)
float(0.1)    0.100000001490116119384765625
double(0.1)   0.1000000000000000055511151231257827021181583404541015625
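
These rounded values can be reproduced directly. Here is a minimal sketch that prints the stored values of float(0.1) and double(0.1); the float argument is promoted to double by printf, so its exact stored value is shown, though the number of trailing digits printed depends on the C runtime.

#include <stdio.h>

int main() {
    printf("float(0.1)  = %.30f\n", 0.1f);
    printf("double(0.1) = %.55f\n", 0.1);
}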

Analyzing the outcomes of the previous code snippet:

  1. sum = 1.000000119209290: This calculation is prone to accumulated rounding errors due to the repeated addition of a rounded value. The result is not 1.0, nor is it exactly 10 * float(0.1). However, it is the closest representable float above 1.0.
  2. mul = 1.000000000000000: This method, involving multiplication, offers fewer opportunities for error accumulation. The rounding during the conversion of 0.1 to float(0.1) is counteracted by a subsequent rounding down during multiplication by ten.
  3. mul2 = 1.000000014901161: This calculation leverages double-precision multiplication, thereby avoiding further rounding errors. The result represents the exact value of 10 * float(0.1), storable in a double but not in a float.

Thus, the first result is close to 1.0 but not equal to it, the second happens to land exactly on 1.0 despite inexact intermediate rounding, and the third is the exact value of 10 * float(0.1), which is correct in its own way but looks wrong.

3. Addressing Near Equality in Floating-Point Numbers

Given these nuances, how do we determine if two floating-point results are close enough to be considered equal? What if we aim to identify results that are equal to one, or at least plausibly so?

4. Epsilon-Based Comparisons

A common approach involves checking if the absolute difference between two floats falls within a defined error bound, known as epsilon.

bool isEqual = fabs(f1 - f2) <= epsilon;

This allows for a more flexible assessment of equality. However, the critical question arises: What value should be assigned to epsilon?

Based on the previous example, one might consider using the observed error, approximately 1.19e-7f. Indeed, the float.h header file defines FLT_EPSILON with a similar value.

However, relying solely on FLT_EPSILON is problematic. This value represents the difference between adjacent floats only within the range of 1.0 to 2.0. For numbers significantly smaller than 1.0, FLT_EPSILON becomes excessively large. For sufficiently small numbers, FLT_EPSILON might even exceed the numbers being compared, as evidenced by a flaky Chromium test.

Conversely, for numbers exceeding 2.0, the gap between representable floats increases. Using FLT_EPSILON in such cases effectively reduces the comparison to a more computationally expensive equality check. Above 16777216, the appropriate epsilon for floats surpasses one, rendering FLT_EPSILON-based comparisons ineffective.
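
A small sketch makes these failure modes concrete (the specific values are chosen purely for illustration): a fixed FLT_EPSILON tolerance behaves reasonably near 1.0, accepts wildly different tiny numbers, and degenerates into an exact-equality test for large numbers.

#include <float.h> // For FLT_EPSILON
#include <math.h>  // For fabs
#include <stdio.h>

// Fixed absolute tolerance, as described above.
bool NaiveEqual(float a, float b) {
    return fabs(a - b) <= FLT_EPSILON;
}

int main() {
    // Near 1.0 the tolerance is about one ULP, which is reasonable.
    printf("%d\n", NaiveEqual(1.0f, 1.0000001f));         // 1: adjacent floats accepted

    // For tiny inputs FLT_EPSILON dwarfs the values themselves.
    printf("%d\n", NaiveEqual(1.0e-8f, 2.0e-8f));         // 1: "equal" although one is twice the other

    // For large inputs FLT_EPSILON is smaller than one ULP, so this is
    // effectively an exact-equality test.
    printf("%d\n", NaiveEqual(67108864.0f, 67108872.0f)); // 0: adjacent floats rejected
}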

5. Relative Epsilon Comparisons

A more robust technique involves relative epsilon comparisons, which consider the difference between two numbers in relation to their magnitudes. To ensure consistency, the difference should be compared to the larger of the two numbers.

The principle is:

Two floats, f1 and f2, are considered equal if the absolute difference between them (diff = fabs(f1-f2)) is less than n% of the maximum of their absolute values (max(abs(f1), abs(f2))).

Here’s the corresponding code:

#include <float.h> // For FLT_EPSILON
#include <math.h>  // For fabs

bool AlmostEqualRelative(float A, float B, float maxRelDiff = FLT_EPSILON) {
    // Calculate the difference.
    float diff = fabs(A - B);
    A = fabs(A);
    B = fabs(B);

    // Find the larger of the two magnitudes.
    float largest = (B > A) ? B : A;

    if (diff <= largest * maxRelDiff)
        return true;
    return false;
}

This function provides a more adaptable comparison method. However, it also has limitations, which we will discuss later. An alternative technique, introduced many years ago and covered next, compares numbers by their representation instead.

When using relative comparisons, setting maxRelDiff to FLT_EPSILON or a small multiple thereof generally works well. Values significantly smaller than FLT_EPSILON risk being equivalent to no epsilon, while excessively large values introduce unacceptable error margins. The lack of a direct relationship to the floating-point format is a drawback.
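
As a quick illustration, the relative comparison accepts adjacent floats regardless of magnitude, which a fixed FLT_EPSILON tolerance cannot do (the large pair here is the same arbitrarily chosen pair used again in a later section):

float a = 67329.234f;                  // arbitrarily chosen float
float b = 67329.242f;                  // exactly one ULP above 'a'
bool nearOne   = AlmostEqualRelative(1.0f, 1.0000001f); // true: one ULP apart
bool nearLarge = AlmostEqualRelative(a, b);             // true: diff is about 1.16e-7 of the magnitude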

6. Units in the Last Place (ULP)

We know that adjacent floats have integer representations that are adjacent. This implies that subtracting the integer representations of two numbers reveals their separation in float space.

Dawson’s theorem states:

If the integer representations of two same-sign floats are subtracted, the absolute value of the result equals one plus the number of representable floats between them.

This difference represents the number of Units in the Last Place (ULP) separating the numbers.

#include <stdint.h> // For int32_t, etc.

// Lets a float's bits be reinterpreted as a sign-magnitude integer and, in
// debug builds, inspected as separate sign/exponent/mantissa fields.
union Float_t {
    Float_t(float num = 0.0f) : f(num) {}
    bool Negative() const { return i < 0; }
    int32_t RawMantissa() const { return i & ((1 << 23) - 1); }
    int32_t RawExponent() const { return (i >> 23) & 0xFF; }

    int32_t i;
    float f;
#ifdef _DEBUG
    struct {
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign : 1;
    } parts;
#endif
};

#include <stdlib.h> // For abs()

bool AlmostEqualUlps(float A, float B, int maxUlpsDiff) {
    Float_t uA(A);
    Float_t uB(B);

    // Different signs means they do not match, unless both are zero.
    if (uA.Negative() != uB.Negative()) {
        // Check for equality to make sure +0 == -0 compares as equal.
        if (A == B)
            return true;
        return false;
    }

    int ulpsDiff = abs(uA.i - uB.i);
    if (ulpsDiff <= maxUlpsDiff)
        return true;

    return false;
}

The sign check addresses complexities arising from the signed-magnitude representation of floats and the fact that an ULPs-based comparison of floats with different signs is generally meaningless.

After handling these special cases, subtracting the integer representations and taking the absolute value reveals the distance between the numbers. The ulpsDiff indicates the number of floats between the two numbers (plus one), providing an intuitive understanding of floating-point error.

Comparing numbers using ULPs offers a relative comparison method. The concept is mainstream enough that Boost provides a function for calculating the difference in ULPs between two numbers.

One ULP difference indicates the smallest possible difference between two numbers. A one ULP difference between two floats is significantly larger than one ULP between two doubles.
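
If you would rather not roll your own bit manipulation, Boost.Math exposes this idea directly. The sketch below assumes boost::math::float_distance from <boost/math/special_functions/next.hpp>, which returns the signed number of representable values between its arguments.

#include <boost/math/special_functions/next.hpp>
#include <stdio.h>

int main() {
    float a = 1.0f;
    float b = 1.0000001f; // one ULP above 1.0f
    // The distance is returned in the same floating-point type as the inputs.
    float dist = boost::math::float_distance(a, b);
    printf("ULPs between a and b: %f\n", dist); // expected: 1.000000
}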

7. ULP vs. FLT_EPSILON

Using the ULP-based comparison to check for adjacent floats yields results similar to using AlmostEqualRelative with epsilon set to FLT_EPSILON. The results are generally consistent for numbers slightly above a power of two. However, for numbers slightly below a power of two, the FLT_EPSILON technique is more lenient.

For instance, comparing 4.0 to 4.0 plus two ULPs would result in both methods indicating inequality. Conversely, comparing 4.0 to 4.0 minus two ULPs would lead the ULP comparison to indicate inequality, while the FLT_EPSILON relative comparison would suggest equality.

This behavior arises because the gap between representable floats doubles at 4.0: one ULP above 4.0 is twice as large as one ULP below it, so the same ULP count corresponds to a different absolute and relative difference on each side of a power of two. Neither behavior is wrong; the two techniques simply draw the boundary differently near powers of two.
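
This asymmetry is easy to reproduce with the two functions defined earlier; a minimal sketch using nextafterf to step away from 4.0 by whole ULPs:

#include <math.h>  // For nextafterf and INFINITY
#include <stdio.h>

int main() {
    float twoUlpsAbove = nextafterf(nextafterf(4.0f,  INFINITY),  INFINITY); // 4.0 plus two ULPs
    float twoUlpsBelow = nextafterf(nextafterf(4.0f, -INFINITY), -INFINITY); // 4.0 minus two ULPs

    printf("%d %d\n", AlmostEqualUlps(4.0f, twoUlpsAbove, 1),
                      AlmostEqualRelative(4.0f, twoUlpsAbove)); // 0 0: both reject
    printf("%d %d\n", AlmostEqualUlps(4.0f, twoUlpsBelow, 1),
                      AlmostEqualRelative(4.0f, twoUlpsBelow)); // 0 1: only the relative check accepts
}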

ULP-based comparisons also have different performance characteristics. They are more likely to be efficient on architectures like SSE that facilitate reinterpreting floats as integers. However, they can cause performance stalls on other architectures due to the cost of moving float values to integer registers, particularly when ULP-based comparisons immediately follow float calculations.

Normally, a difference of one ULP signifies similar magnitudes, where the larger number is no more than 1.000000119 times larger than the smaller. However, exceptions exist:

  • FLT_MAX to infinity – one ULP, infinite ratio
  • zero to the smallest denormal – one ULP, infinite ratio
  • smallest denormal to the next smallest denormal – one ULP, two-to-one ratio
  • NaNs – may have similar representations but are not equal
  • Positive and negative zero – two billion ULPs difference, but they should compare as equal
  • One ULP above a power of two is twice as big a delta as one ULP below that same power of two

These exceptions primarily involve denormals and zeros, particularly numbers at or near zero.

8. The Problem of Zero

Relative epsilons tend to break down near zero, especially when expecting a result of zero due to subtraction. Achieving exact zero necessitates identical numbers being subtracted. Even a one ULP difference results in an answer that is small compared to the inputs but significant compared to zero.

If we add float(0.1) ten times and subtract 1.0, the result is FLT_EPSILON instead of the anticipated zero. Comparing zero to FLT_EPSILON using relative comparison will fail, as FLT_EPSILON is vastly distant from zero in ULPs.
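
A minimal sketch of that scenario, reusing AlmostEqualRelative from earlier:

float sum = 0;
for (int i = 0; i < 10; ++i)
    sum += 0.1f;
float residual = sum - 1.0f;                        // FLT_EPSILON, not 0.0
bool zeroish = AlmostEqualRelative(residual, 0.0f); // false: relative to a tiny number, zero is far away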

Consider another example:

float someFloat = 67329.234; // arbitrarily chosen float
float nextFloat = 67329.242; // exactly one ULP away from 'someFloat'
bool equal1 = AlmostEqualUlps(someFloat, nextFloat, 1); // returns true, numbers 1 ULP apart

// Compare the difference of the two numbers to zero.
float diff = (nextFloat - someFloat); // .0078125000
bool equal2 = AlmostEqualUlps(diff, 0.0f, 1); // returns false, diff is 1,006,632,960 ULPs away from zero

While someFloat and nextFloat are very close, their difference is vastly distant from zero, failing ULPs or relative-based tests.

There is no universally easy solution to this challenge. A common approach is to combine absolute and relative epsilons. If two numbers are sufficiently close (based on an absolute threshold), they are treated as equal, regardless of their relative values. This is crucial when expecting zero as a result of subtraction. The value of the absolute epsilon should be based on the magnitude of the inputs.

The ULPs-based technique also faces issues near zero, as previously noted.

A safer approach is to perform an absolute epsilon check first, and then treat any remaining pair with different signs as unequal:

bool AlmostEqualUlpsAndAbs(float A, float B, float maxDiff, int maxUlpsDiff) {
    // Check if the numbers are really close -- needed
    // when comparing numbers near zero.
    float absDiff = fabs(A - B);
    if (absDiff <= maxDiff)
        return true;

    Float_t uA(A);
    Float_t uB(B);

    // Different signs means they do not match.
    if (uA.Negative() != uB.Negative())
        return false;

    // Find the difference in ULPs.
    int ulpsDiff = abs(uA.i - uB.i);
    if (ulpsDiff <= maxUlpsDiff)
        return true;

    return false;
}

bool AlmostEqualRelativeAndAbs(float A, float B, float maxDiff, float maxRelDiff = FLT_EPSILON) {
    // Check if the numbers are really close -- needed
    // when comparing numbers near zero.
    float diff = fabs(A - B);
    if (diff <= maxDiff)
        return true;

    A = fabs(A);
    B = fabs(B);
    float largest = (B > A) ? B : A;

    if (diff <= largest * maxRelDiff)
        return true;

    return false;
}
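
With an absolute tolerance in place, the difference-versus-zero comparison from the earlier example can succeed. In this short sketch the maxDiff value is an illustrative choice scaled to the inputs, not a universal constant:

float someFloat = 67329.234f;
float nextFloat = 67329.242f;                 // one ULP away
float diff = nextFloat - someFloat;           // .0078125000

// Absolute tolerance scaled to the magnitude of the inputs (illustrative factor of 4).
float maxDiff = 67329.242f * FLT_EPSILON * 4;
bool equal = AlmostEqualUlpsAndAbs(diff, 0.0f, maxDiff, 1); // now returns true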

9. Catastrophic Cancellation in Trigonometric Functions

Catastrophic cancellation occurs when subtracting two nearly equal numbers, leading to significant loss of precision. This can be subtle and not always obvious.

Consider the example of calculating sin(pi). Trigonometry dictates that the result should be zero. However, direct computation yields:

sin(double(pi)) = +0.00000000000000012246467991473532
sin(float(pi)) = -0.000000087422776

Relative epsilon or ULPs comparisons to zero reveal a substantial discrepancy. Is the sin() function truly inaccurate?

The calculation of sin() itself is typically very precise. The issue stems from the fact that we’re not calculating sin(pi) exactly but rather sin(double(pi)) or sin(float(pi)). Since pi is transcendental, it cannot be perfectly represented in floating-point format.

Therefore, we’re actually calculating sin(pi - theta), where theta represents the small difference between ‘pi’ and its floating-point approximation.

Calculus teaches us that for small values of theta, sin(pi - theta) ≈ theta. Thus, a sufficiently accurate sin() function effectively calculates the error in double(pi) or float(pi)!

Value                         Result
float(pi)                     +3.1415927410125732
sin(float(pi))                -0.0000000874227800
float(pi) + sin(float(pi))    +3.1415926535897966

Adding sin(float(pi)) to float(pi) yields a more accurate value of pi than float(pi) alone. Thus, sin(float(pi)) essentially reveals the error in float(pi).
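
The values above can be reproduced with a few lines of code; a minimal sketch (digits beyond those shown may differ slightly between math libraries):

#include <math.h>
#include <stdio.h>

int main() {
    const double pi_d = 3.14159265358979323846; // double(pi)
    const float  pi_f = (float)pi_d;            // float(pi)

    printf("sin(double(pi))            = %+.32f\n", sin(pi_d));
    printf("sin(float(pi))             = %+.15f\n", (double)sinf(pi_f));
    printf("float(pi) + sin(float(pi)) = %+.16f\n", (double)pi_f + (double)sinf(pi_f));
}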

10. Applying the Analysis: The Error in Pi

sin(float(pi)) accurately represents the error in float(pi). Let’s compare the results of sin(‘pi’) to the actual error in the value of ‘pi’ being used:

sin(double(pi)) = +0.0000000000000001224646799147353207
pi-double(pi) = +0.0000000000000001224646799147353177
sin(float(pi)) = -0.000000087422776
pi-float(pi) = -0.000000087422780

The results confirm that sin(double(pi)) provides a highly accurate (16-17 significant figures) measure of the error in double(pi), while sin(float(pi)) is accurate to 6-7 significant figures. The lower accuracy of the sin(float(pi)) result stems mainly from the result being rounded to float precision.

We can use double-precision math to measure the error in a double-precision constant with remarkable accuracy.

The calculation of sin(float(pi)) constitutes a classic example of catastrophic cancellation. We should expect an absolute error (from zero) of up to approximately 3.14 * FLT_EPSILON / 2, which aligns with the observed results.

11. Guidelines for Robust Floating-Point Comparisons

There is no universal solution. Informed decision-making is essential.

  • Comparison Against Zero: Relative epsilons and ULPs-based comparisons are often ineffective. An absolute epsilon is generally required, scaled based on FLT_EPSILON and the inputs (see the sketch after this list).
  • Comparison Against Non-Zero: Relative epsilons or ULPs-based comparisons are generally suitable. Use a small multiple of FLT_EPSILON for the relative epsilon or a small number of ULPs. An absolute epsilon can be used if the target number is precisely known.
  • Comparison of Arbitrary Numbers: A combination of techniques may be necessary.
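
A minimal sketch of the zero-comparison guideline: scale the absolute tolerance to the magnitude of the values that were subtracted. The function name, parameter, and factor of 4 below are illustrative choices, not fixed rules.

#include <float.h> // For FLT_EPSILON
#include <math.h>  // For fabs

// 'result' is expected to be zero and was produced by subtracting values
// whose magnitude is roughly 'inputMagnitude'.
bool ExpectedZero(float result, float inputMagnitude) {
    float maxDiff = fabs(inputMagnitude) * FLT_EPSILON * 4; // illustrative scale factor
    return fabs(result) <= maxDiff;
}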

A deep understanding of the calculations, algorithm stability, and the expected behavior in cases of significant error is paramount. Algorithm stability can be assessed using condition numbers. Rather than arbitrarily increasing epsilon values, focus on restructuring the code to enhance stability.

12. Conclusion

Mastering floating-point comparisons is a journey, not a destination. By understanding the nuances of different techniques and the potential pitfalls, you can write more reliable and accurate code.

FAQ: Floating-Point Number Comparisons

  1. Why can’t I directly compare floating-point numbers for equality?

    • Floating-point numbers are often approximations due to the limitations of representing decimal numbers in binary format. Small rounding errors can accumulate and cause direct comparisons to fail.
  2. What is epsilon in the context of floating-point comparisons?

    • Epsilon is a small value used to define a tolerance range when comparing floating-point numbers. If the difference between two numbers is less than epsilon, they are considered “close enough” to be equal.
  3. What are the limitations of using a fixed epsilon value for comparisons?

    • A fixed epsilon value may be too large for small numbers and too small for large numbers, leading to inaccurate comparisons.
  4. What is a relative epsilon comparison?

    • A relative epsilon comparison calculates the difference between two numbers as a fraction of their magnitudes. This approach is more adaptive than using a fixed epsilon.
  5. What are ULPs (Units in the Last Place)?

    • ULPs represent the number of representable floating-point numbers between two given numbers. This provides an intuitive way to measure the distance between them.
  6. When should I use ULPs-based comparisons instead of epsilon-based comparisons?

    • ULPs-based comparisons are useful when you want to consider the floating-point representation directly and understand how many representable numbers lie between the values. They are also closely related to relative epsilon comparisons.
  7. What is catastrophic cancellation?

    • Catastrophic cancellation occurs when subtracting two nearly equal numbers, leading to a significant loss of precision. This is especially problematic when the expected result is zero.
  8. How can I handle comparisons near zero?

    • Comparisons near zero often require a combination of absolute and relative epsilon checks. If the numbers are very close in absolute terms, they can be considered equal regardless of their relative values.
  9. What should I do if my code is giving large floating-point errors?

    • Rather than simply increasing the epsilon value, try restructuring the code to improve the stability of the algorithms. This can lead to significant improvements in accuracy.
  10. Where can I learn more about floating-point arithmetic and algorithm stability?

    • Consult resources on condition numbers and numerical analysis. Michael L Overton’s “Numerical Computing with IEEE Floating Point Arithmetic” is an excellent reference.

For further exploration and detailed comparisons, visit COMPARE.EDU.VN, your trusted source for comprehensive evaluations.

This guide offers essential knowledge on how to choose which place to use when comparing numbers, providing the tools for informed decision-making. Explore compare.edu.vn for more in-depth analyses and comparison resources.
