Sometimes, to achieve the needed accuracy, one has to operate on numbers more precise than the 80-bit FPU registers allow. For this purpose, data structures of any desired size (limited only by available RAM) can serve as floating-point numbers. However, the larger the mantissa, the slower the calculations. I have thus chosen a Floating Point Structure - or Virtual Register (VR) - with a 256-bit mantissa and a 32-bit exponent field.
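Why a larger mantissa costs more time can be seen in a minimal sketch of multi-element addition. It assumes the mantissa is held as a list of 32-bit elements, least significant first; the name `add_mantissas` and the element order are illustrative assumptions, not the author's routine.

```python
MASK32 = 0xFFFFFFFF  # largest value of one 32-bit mantissa element


def add_mantissas(a, b):
    """Add two equal-length mantissas given as lists of 32-bit elements.

    Elements are ordered least significant first (an assumed order).
    Returns (sum_elements, carry_out).
    """
    if len(a) != len(b):
        raise ValueError("mantissas must have the same number of elements")
    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry          # at most 33 bits wide
        result.append(total & MASK32)  # keep the low 32 bits
        carry = total >> 32            # propagate the 33rd bit
    return result, carry


# Eight elements make up a 256-bit mantissa; the loop runs once per element.
a = [MASK32] * 8          # 2^256 - 1
b = [1] + [0] * 7         # 1
total, carry = add_mantissas(a, b)
```

The loop body runs once per element, so the work grows with the mantissa width. A full floating-point addition would also have to align exponents and renormalize the result, which this sketch leaves out.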
Structure
Mantissa DD 8 Dup(?)
Exponent DD ?
Sign DB ?
Status DB ?
Elements DB ?
Empty El DB ?
High El DW ?
Low El DW ?
Address DD ?
The notations "DD", "DW" and "DB" stand for definitions of Double-Words (32 bits), Words (16 bits) and Bytes (8 bits) respectively. The mantissa thus consists of eight 32-bit elements. To facilitate and speed up operations, the sign is given a whole byte and other structure elements are introduced. The entire size of this VR is 48 bytes; an equivalent FPU-style register would take only 36 bytes, with a 31-bit exponent and a sign bit sharing the single double-word that follows the mantissa.
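The 48-byte total can be checked by mirroring the layout with Python's `ctypes`. This is only a sketch: the class name, the lowercase field names with underscores, and the choice of unsigned types are assumptions made here, since the listing does not say whether the exponent is signed or what the auxiliary fields hold.

```python
import ctypes


class VirtualRegister(ctypes.Structure):
    """Mirror of the VR layout; field order follows the listing above."""
    _fields_ = [
        ("mantissa", ctypes.c_uint32 * 8),  # Mantissa DD 8 Dup(?)
        ("exponent", ctypes.c_uint32),      # Exponent DD ? (signedness assumed)
        ("sign", ctypes.c_uint8),           # Sign     DB ?
        ("status", ctypes.c_uint8),         # Status   DB ?
        ("elements", ctypes.c_uint8),       # Elements DB ?
        ("empty_el", ctypes.c_uint8),       # Empty El DB ?
        ("high_el", ctypes.c_uint16),       # High El  DW ?
        ("low_el", ctypes.c_uint16),        # Low El   DW ?
        ("address", ctypes.c_uint32),       # Address  DD ?
    ]


vr_size = ctypes.sizeof(VirtualRegister)            # total size in bytes
mantissa_bits = 8 * VirtualRegister.mantissa.size   # mantissa width in bits
exponent_offset = VirtualRegister.exponent.offset   # first byte after mantissa

print(vr_size, mantissa_bits, exponent_offset)
```

Every field falls on its natural alignment boundary, so no padding is inserted: 32 bytes of mantissa, 4 of exponent, 4 single-byte fields, two words and one double-word add up to 48.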
Performance
The time required for basic operations depends on the particular arguments. To set benchmarks, operations were therefore timed (on a 1.47 GHz Athlon CPU) using two 256-bit arguments: e = 2.71... and π = 3.14... Execution time is measured in clock cycles, where in our case:
1 Clock = 0.68 × 10^{-9} sec = 0.68 nanoseconds (ns).
The table below illustrates the price in clocks we have to pay for accuracy greater than the FPU's 64 bits.
Operation | ε ≤ 2^{-64} | ε ≤ 2^{-128} | ε ≤ 2^{-256} |
ADD | 1.5 | 383 | 500 |
MUL | 1.5 | 378 | 2,550 |
DIV | 21 | 5,657 | 14,048 |
SQRT | 32 | 7,070 | 28,230 |
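To put the clock counts into wall-clock terms, the short sketch below converts them using the 1.47 GHz clock rate quoted above. It takes the ε ≤ 2^{-64} column as the FPU baseline, which is how the text reads but is an assumption here; the names `CLOCKS` and `clocks_to_us` are illustrative.

```python
CPU_HZ = 1.47e9  # clock rate of the Athlon used for the benchmarks

# Clock counts copied from the table: (2^-64, 2^-128, 2^-256)
CLOCKS = {
    "ADD": (1.5, 383, 500),
    "MUL": (1.5, 378, 2550),
    "DIV": (21, 5657, 14048),
    "SQRT": (32, 7070, 28230),
}


def clocks_to_us(clocks, cpu_hz=CPU_HZ):
    """Convert a clock count to microseconds at the given clock rate."""
    return clocks / cpu_hz * 1e6


clock_ns = 1e9 / CPU_HZ  # duration of one clock in nanoseconds

for op, (baseline, _, vr256) in CLOCKS.items():
    ratio = vr256 / baseline  # slowdown relative to the 2^-64 column
    print(f"{op:4s} {clocks_to_us(vr256):7.2f} us  {ratio:6.0f}x")
```

Even the slowest entry, the 256-bit square root, finishes in roughly 19 microseconds, although it costs several hundred times as many clocks as its 64-bit counterpart.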
Virtual registers are useful not only when great accuracy is required; they can also help verify results and determine the error margins of FPU-based calculations. For example, I could verify the FPU-based graphical error analysis as well as calculate the Bessel Function of the First Kind J_{ν}(x) at the point x = ν = 5,000,000 with great accuracy.^{[1]} The result, to the first 18 decimals,
J_{5000000}(5000000) = 2.61586906680728472E-3
can serve as a benchmark for calculations involving Bessel functions with large indices and arguments.^{[2]}
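As a rough guide to what these bit counts mean in decimal terms, a mantissa of n bits covers about n × log10(2) decimal digits. The sketch below applies that rule to the FPU's 64 bits, the nominal 256 bits of the VR, and the roughly 240 bits that remain after accumulated errors; the function name `decimal_digits` is an illustrative assumption.

```python
import math


def decimal_digits(bits):
    """Whole decimal digits covered by a mantissa of `bits` bits."""
    return math.floor(bits * math.log10(2))


digits = {bits: decimal_digits(bits) for bits in (64, 240, 256)}

for bits, count in digits.items():
    print(f"{bits:3d} bits ~ {count} decimal digits")
```

By this estimate the 18 decimals quoted above are only a small part of what a 240-bit result carries.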
[1] The nominal accuracy of 256 bits is reduced to about 240 bits by accumulated calculation errors.
[2] See, for example: F. A. Chishtie, K. M. Rao, I. S. Kotsireas, S. R. Valluri, "An investigation of uniform expansions of large order Bessel functions in Gravitational Wave Signals from Pulsars", astro-ph/0611035 (November 2006).
Copyright Dan Baruth © 2007. All rights reserved.