Supported arithmetic
Rounding directions
Some of the classes of operators presented in the following sections are parameterized by a rounding direction. It determines which neighboring representable value is chosen when a real number cannot be represented exactly in the destination format.
There are eleven directions:
zr: toward zero
aw: away from zero
dn: toward minus infinity (down)
up: toward plus infinity
od: to odd mantissas
ne: to nearest, tie breaking to even mantissas
no: to nearest, tie breaking to odd mantissas
nz: to nearest, tie breaking toward zero
na: to nearest, tie breaking away from zero
nd: to nearest, tie breaking toward minus infinity
nu: to nearest, tie breaking toward plus infinity
The rounding directions mandated by the IEEE-754 standard are ne
(default mode, rounding to nearest), zr, dn, up, and na
(introduced for decimal arithmetic).
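The eleven directions can be compared on the simplest possible grid, the integers. The following Python sketch is only an illustration of the definitions above, not part of the tool: the function name `round_to_integer` is hypothetical, and the sketch assumes that `od` leaves representable values unchanged and otherwise picks the odd neighbor.

```python
from fractions import Fraction
import math

DIRECTIONS = ("zr", "aw", "dn", "up", "od", "ne", "no", "nz", "na", "nd", "nu")

def round_to_integer(x, direction):
    """Round the exact rational x to an integer (hypothetical helper)."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown rounding direction: {direction}")
    x = Fraction(x)
    lo = math.floor(x)                 # largest integer <= x
    if x == lo:
        return lo                      # exactly representable: nothing to round
    hi = lo + 1
    toward_zero, away = (lo, hi) if x > 0 else (hi, lo)
    even, odd = (lo, hi) if lo % 2 == 0 else (hi, lo)
    directed = {"zr": toward_zero, "aw": away, "dn": lo, "up": hi, "od": odd}
    if direction in directed:
        return directed[direction]
    # To-nearest modes: the tie-breaking rule is consulted only on an exact tie.
    if x - lo != Fraction(1, 2):
        return lo if x - lo < Fraction(1, 2) else hi
    ties = {"ne": even, "no": odd, "nz": toward_zero,
            "na": away, "nd": lo, "nu": hi}
    return ties[direction]

if __name__ == "__main__":
    for name in DIRECTIONS:
        print(name, round_to_integer(Fraction(5, 2), name),
              round_to_integer(Fraction(-5, 2), name))
```

For the tie 5/2, this sketch returns 2 for zr, dn, ne, nz, and nd, and 3 for the other six directions.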
Floating-point operators
This class of operators covers all the formats whose number sets are \(F(p,d) = \{m \cdot 2^e; |m| < 2^p, e \ge d\}\), where m and e are integers. In particular, IEEE-754 floating-point formats (with subnormal numbers) are part of this class, provided overflow is set aside. The parameters p (precision) and d (minimum exponent) select a particular format. The last parameter selects the rounding direction.
float< precision, minimum_exponent, rounding_direction >(...)
Formats with no minimum exponent (and thus no underflow) are also available:
float< precision, rounding_direction >(...)
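To make the two forms concrete, the following Python sketch rounds an exact rational onto \(F(p,d)\), or onto the same set without the bound on e when no minimum exponent is given. It is a model of the number sets described above, not the tool's implementation. The names `round_to_integer` and `round_float` are hypothetical, and overflow is ignored, as in the text.

```python
from fractions import Fraction
import math

def round_to_integer(x, direction):
    """Round the exact rational x to an integer (hypothetical helper)."""
    x = Fraction(x)
    lo = math.floor(x)
    if x == lo:
        return lo
    hi = lo + 1
    toward_zero, away = (lo, hi) if x > 0 else (hi, lo)
    even, odd = (lo, hi) if lo % 2 == 0 else (hi, lo)
    directed = {"zr": toward_zero, "aw": away, "dn": lo, "up": hi, "od": odd}
    if direction in directed:
        return directed[direction]
    if x - lo != Fraction(1, 2):
        return lo if x - lo < Fraction(1, 2) else hi
    return {"ne": even, "no": odd, "nz": toward_zero,
            "na": away, "nd": lo, "nu": hi}[direction]

def round_float(x, precision, min_exponent, direction):
    """Round x onto F(precision, min_exponent); None means no minimum exponent."""
    x = Fraction(x)
    if x == 0:
        return x
    # mag = floor(log2(|x|)), computed exactly from the bit lengths.
    mag = abs(x.numerator).bit_length() - x.denominator.bit_length()
    if Fraction(2) ** mag > abs(x):
        mag -= 1
    # Exponent that gives the mantissa exactly `precision` bits,
    # clamped from below when a minimum exponent is imposed.
    e = mag - precision + 1
    if min_exponent is not None:
        e = max(e, min_exponent)
    ulp = Fraction(2) ** e
    return round_to_integer(x / ulp, direction) * ulp

if __name__ == "__main__":
    tenth = Fraction(1, 10)
    print(round_float(tenth, 53, -1074, "ne") == Fraction(0.1))
    print(round_float(tenth, 53, -1074, "dn") < tenth)
```

With precision 53 and minimum exponent -1074, rounding 1/10 to nearest even yields exactly the value Python stores for the literal `0.1`, which is a convenient sanity check of the model.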
Having to remember the precision and minimum exponent parameters may be
a bit tedious, so an alternate syntax is provided: instead of these two
parameters, a name can be given to the float class.
float< name, rounding_direction >(...)
There are four predefined formats:
ieee_32: IEEE-754 single precision
ieee_64: IEEE-754 double precision
ieee_128: IEEE-754 quadruple precision
x86_80: extended precision on x86-like processors
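As an illustration of how a named format relates to the two numeric parameters, the Python sketch below associates each name with a (precision, minimum exponent) pair and tests membership in \(F(p,d)\). The pairs are an assumption: they are derived here from the usual IEEE-754 and x87 parameters (significand width, and the exponent of the smallest subnormal number), not quoted from this section. The names `FORMATS` and `in_format` are hypothetical.

```python
from fractions import Fraction
import struct

# Assumed (precision p, minimum exponent d) for each predefined name;
# d is the exponent of the smallest positive subnormal number.
FORMATS = {
    "ieee_32": (24, -149),
    "ieee_64": (53, -1074),
    "ieee_128": (113, -16494),
    "x86_80": (64, -16445),
}

def in_format(x, name):
    """Return True if x = m * 2**e with |m| < 2**p and e >= d (overflow ignored)."""
    p, d = FORMATS[name]
    x = Fraction(x)
    if x == 0:
        return True
    den = x.denominator
    if den & (den - 1):
        return False                      # denominator is not a power of two
    num = abs(x.numerator)
    trailing = (num & -num).bit_length() - 1
    odd = num >> trailing                 # smallest possible |m|
    e = trailing - (den.bit_length() - 1) # largest possible exponent
    return e >= d and odd < 2 ** p

if __name__ == "__main__":
    # Bit pattern 0x00000001 is the smallest positive binary32 subnormal.
    smallest32 = struct.unpack("<f", struct.pack("<I", 1))[0]
    print(smallest32 == 2.0 ** FORMATS["ieee_32"][1])
    print(in_format(Fraction(1, 10), "ieee_64"), in_format(0.1, "ieee_64"))
```

The check against the binary32 bit pattern confirms the assumed minimum exponent for ieee_32. The same reasoning gives the other three pairs.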
Fixed-point operators
This class of operators covers all the formats whose number sets are \(F(e) = \{m \cdot 2^e\}\). The first parameter selects the weight of the least significant bit. The second parameter selects the rounding direction.
fixed< lsb_weight, rounding_direction >(...)
Rounding to integer is the special case of fixed-point rounding in which the weight of the least significant bit is 0. A syntactic shortcut is provided.
int< rounding_direction >(...)
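The relationship between the two operators can be sketched in Python: fixed-point rounding scales by \(2^e\), rounds to an integer, and scales back, and integer rounding is the same operation with weight 0. This is a model of the number sets above under the stated assumptions, not the tool's implementation. The names `round_to_integer`, `round_fixed`, and `round_int` are hypothetical.

```python
from fractions import Fraction
import math

def round_to_integer(x, direction):
    """Round the exact rational x to an integer (hypothetical helper)."""
    x = Fraction(x)
    lo = math.floor(x)
    if x == lo:
        return lo
    hi = lo + 1
    toward_zero, away = (lo, hi) if x > 0 else (hi, lo)
    even, odd = (lo, hi) if lo % 2 == 0 else (hi, lo)
    directed = {"zr": toward_zero, "aw": away, "dn": lo, "up": hi, "od": odd}
    if direction in directed:
        return directed[direction]
    if x - lo != Fraction(1, 2):
        return lo if x - lo < Fraction(1, 2) else hi
    return {"ne": even, "no": odd, "nz": toward_zero,
            "na": away, "nd": lo, "nu": hi}[direction]

def round_fixed(x, lsb_weight, direction):
    """Round x to a multiple of 2**lsb_weight."""
    scale = Fraction(2) ** lsb_weight
    return round_to_integer(Fraction(x) / scale, direction) * scale

def round_int(x, direction):
    """Integer rounding is fixed-point rounding with weight 0."""
    return round_fixed(x, 0, direction)

if __name__ == "__main__":
    third = Fraction(1, 3)
    print(round_fixed(third, -4, "ne"), round_fixed(third, -4, "up"))
    print(round_int(Fraction(5, 2), "ne"), round_int(Fraction(5, 2), "na"))
```

With a weight of -4, the representable values are the multiples of 1/16, so 1/3 rounds to 5/16 to nearest and to 3/8 upward.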