Virtual Floating Point Registers
(with 256-Bit Mantissae)



D. Baruth, x87@iging.com





Sometimes, to achieve the needed accuracy, one has to operate on numbers with more precision than the 80-bit FPU registers provide.  For this purpose, data structures of any desired size (limited only by available RAM) can be used as floating-point numbers.  However, the larger the mantissa, the slower the calculations.  I have thus chosen a floating-point structure, or Virtual Register (VR), with a 256-bit mantissa and a 32-bit exponent field.
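To illustrate why cost grows with mantissa size, here is a minimal C sketch of adding two multi-element magnitudes with carry propagation.  It is not the author's routine: the function name mantissa_add is hypothetical, and storing the least significant element first is an assumption.  A complete floating-point addition would also have to align exponents and normalize the result, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two magnitudes of n 32-bit elements each.
   Assumption: element 0 is the least significant one.
   Returns the carry out of the most significant element (0 or 1). */
uint32_t mantissa_add(uint32_t *sum, const uint32_t *a,
                      const uint32_t *b, size_t n)
{
    assert(sum != NULL && a != NULL && b != NULL);

    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* A 64-bit temporary holds the 33-bit intermediate result. */
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        sum[i] = (uint32_t)t;   /* low 32 bits are the result element */
        carry = t >> 32;        /* bit 32 is the carry into the next element */
    }
    return (uint32_t)carry;
}
```

In this sketch the loop runs once per element, so a 256-bit mantissa (n = 8) takes twice as many steps as a 128-bit one (n = 4).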


Structure

Mantissa   DD 8 Dup(?)
Exponent   DD ?
Sign       DB ?
Status     DB ?
Elements   DB ?
Empty El   DB ?
High El    DW ?
Low El     DW ?
Address    DD ?

The notations "DD", "DW" and "DB" stand for definitions of Double-Words (32 bits), Words (16 bits) and Bytes (8 bits) respectively.  The mantissa thus consists of eight 32-bit elements.  To facilitate and speed up operations, the sign is given a whole byte and other structure elements are introduced.  The entire VR occupies 48 bytes; an FPU-style equivalent, with a 31-bit exponent and a sign bit sharing one double-word, would occupy 36 bytes instead.
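As a cross-check of the 48-byte figure, the layout can be mirrored in a C struct.  This is only a sketch: the type name VirtualRegister and the lower-case field names are hypothetical renderings of the listing above, "Address" is assumed to be a plain 32-bit value, and the purpose of the auxiliary fields is not described here, so the comments only repeat the original definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t mantissa[8];  /* Mantissa  DD 8 Dup(?)  (8 x 32 = 256 bits) */
    uint32_t exponent;     /* Exponent  DD ? */
    uint8_t  sign;         /* Sign      DB ? */
    uint8_t  status;       /* Status    DB ? */
    uint8_t  elements;     /* Elements  DB ? */
    uint8_t  empty_el;     /* Empty El  DB ? */
    uint16_t high_el;      /* High El   DW ? */
    uint16_t low_el;       /* Low El    DW ? */
    uint32_t address;      /* Address   DD ?  (assumed 32-bit value) */
} VirtualRegister;

/* Every field falls on its natural alignment, so no padding is expected:
   32 + 4 + 4*1 + 2*2 + 4 = 48 bytes.  Checked at compile time (C11). */
_Static_assert(sizeof(VirtualRegister) == 48, "VR should occupy 48 bytes");
_Static_assert(offsetof(VirtualRegister, exponent) == 32,
               "exponent should follow the 32-byte mantissa");
```

The mantissa and exponent alone account for the first 36 bytes; the remaining 12 bytes hold the sign byte and the auxiliary fields.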


Performance

The time required for basic operations depends on the particular arguments.  To set benchmarks, operations were therefore timed (on a 1.47 GHz Athlon CPU) using two 256-bit arguments: e = 2.71... and π = 3.14...  Execution time is measured in clock units, which in our case translates to:

 1 Clock = 0.68 × 10^-9 sec = 0.68 nanoseconds (ns).

The table below illustrates the price in clocks we have to pay for accuracy greater than the FPU's 64 bits.

Operation    ε ≤ 2^-64    ε ≤ 2^-128    ε ≤ 2^-256

ADD          1.5          383           500
MUL          1.5          378           2,550
DIV          21           5,657         14,048
SQRT         32           7,070         28,230

Execution Times (in clocks) of the FPU and VRs
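The clock counts above can be converted to elapsed time with the relation 1 Clock = 1 / (clock frequency).  The following C sketch does this; the function names clock_period_ns and clocks_to_ns are hypothetical, and the conversion assumes a constant clock frequency.

```c
#include <assert.h>

/* Duration of one clock cycle in nanoseconds.
   A frequency of f GHz means f cycles per nanosecond. */
double clock_period_ns(double clock_ghz)
{
    assert(clock_ghz > 0.0);
    return 1.0 / clock_ghz;
}

/* Elapsed time in nanoseconds for a given number of clocks. */
double clocks_to_ns(double clocks, double clock_ghz)
{
    return clocks * clock_period_ns(clock_ghz);
}
```

For the 1.47 GHz CPU used here, clock_period_ns(1.47) gives about 0.68 ns, and clocks_to_ns(28230.0, 1.47) puts the 256-bit SQRT at roughly 19.2 microseconds.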


Examples
Virtual registers are useful not only when great accuracy is required; they can also help verify results and determine error margins of FPU-based calculations.  For example, I could verify the FPU-based graphical error analysis as well as calculate the Bessel function of the first kind J_ν(x) at the point x = ν = 5,000,000 with great accuracy.[1]  The result, given to 18 significant digits,

J_5000000(5000000) = 2.61586906680728472E-3

can serve as a benchmark for calculations involving Bessel functions with large indices and arguments.[2]



  1. The nominal accuracy of 256 bits is reduced to about 240 by accumulating calculation errors.
  2. For example: F. A. Chishtie, K. M. Rao, I. S. Kotsireas, S. R. Valluri, "An investigation of uniform expansions of large order Bessel functions in Gravitational Wave Signals from Pulsars", arXiv:astro-ph/0611035 (November 2006).



Copyright Dan Baruth © 2007.  All rights reserved.