Loss of Precision

A floating-point number $x$, labelled $fl(x)$, will therefore always be represented as
\begin{equation} fl(x) = x(1\pm \epsilon_x), \tag{6} \end{equation}
with $x$ the exact number and the error $|\epsilon_x| \le |\epsilon_M|$, where $\epsilon_M$ is the chosen machine precision. A number like $1/10$ has no exact binary representation in either single or double precision. Since the mantissa $\left(1.a_{-1}a_{-2}\dots a_{-n}\right)_2$ is always truncated at some stage $n$ due to the limited number of bits, only a finite set of real numbers can be represented in binary form. The spacing between neighbouring representable numbers is set by the machine precision: for a 32-bit word this number is approximately $\epsilon_M \sim 10^{-7}$, while for double precision (64 bits) we have $\epsilon_M \sim 10^{-16}$, or in terms of a binary base, $2^{-23}$ and $2^{-52}$ for single and double precision, respectively.
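
The values quoted above are easy to verify. The following is a minimal sketch, assuming Python with NumPy is available (neither is prescribed by the text), that prints $\epsilon_M$ for single and double precision and shows that $fl(0.1)$ differs from the exact $1/10$ by a relative error bounded by $\epsilon_M$.

```python
# Sketch: machine precision and the representation error of 1/10.
import sys
from decimal import Decimal

import numpy as np

# Machine precision eps_M: spacing between 1.0 and the next representable number.
eps_double = sys.float_info.epsilon        # 2**-52, roughly 2.2e-16
eps_single = np.finfo(np.float32).eps      # 2**-23, roughly 1.2e-7
print(f"double precision: eps_M = {eps_double:.3e}  (2**-52 = {2.0**-52:.3e})")
print(f"single precision: eps_M = {eps_single:.3e}  (2**-23 = {2.0**-23:.3e})")

# 1/10 is stored as the nearest representable binary number fl(0.1).
x = 0.1
print("fl(0.1) =", Decimal(x))             # the exact value actually stored

# Relative error |eps_x| of fl(0.1) with respect to the exact 1/10.
exact = Decimal(1) / Decimal(10)
rel_err = abs(Decimal(x) - exact) / exact
print(f"relative error of fl(0.1): {float(rel_err):.3e}  (bounded by eps_M)")
```

Running the sketch shows $fl(0.1) = 0.1000000000000000055511\dots$, with a relative error of order $10^{-17}$, consistent with the double-precision bound $|\epsilon_x| \le \epsilon_M \sim 10^{-16}$.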