Constructing fixed-point variables

Variables of type Fix are defined by specifying the word length and the position of the binary point.

At the user interface level, precision is specified either by setting a fixed-point parameter to a (value, precision) pair, or by setting a precision parameter. The former gives the value and precision of some fixed-point value, while the latter is typically used to specify the internal precision of computations in a primitive.

In both cases, the precision is written as either x.y or m/n, where x is the number of integer bits (including the sign bit), y and m are each the number of fractional bits, and n is the total number of bits. Thus, the total number of bits in the fixed-point number (also called its length) is x+y or n. For example, a fixed-point number with precision 3.5 has a total length of 8 bits, with 3 bits to the left and 5 bits to the right of the binary point. The same precision written in m/n form is 5/8.
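
The two notations can be reduced to the same pair of numbers (total length and integer bits). The following is a minimal sketch of that conversion, not the parser used by the Fix class: the Precision struct, the parsePrecision function, and the kMaxLength constant are hypothetical names introduced only for illustration, and the 64-bit limit is taken from the value quoted for FIX_MAX_LENGTH.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper type for illustration; not part of the Fix class.
struct Precision {
    int length;   // total number of bits (n)
    int intBits;  // bits left of the binary point, sign bit included (x)
};

// Assumed limit, mirroring the 64 bits quoted for FIX_MAX_LENGTH.
const int kMaxLength = 64;

// Parse "x.y" or "m/n" into a Precision. Returns false for malformed text,
// a missing sign bit, or a total length above kMaxLength.
bool parsePrecision(const char* text, Precision& out) {
    assert(text != nullptr);
    const int size = static_cast<int>(std::strlen(text));
    int a = 0, b = 0;
    int usedDot = 0, usedSlash = 0;
    int intBits = 0, fracBits = 0;

    if (std::sscanf(text, "%d.%d%n", &a, &b, &usedDot) == 2 && usedDot == size) {
        // x.y form: a is the integer bit count, b the fractional bit count.
        intBits = a;
        fracBits = b;
    } else if (std::sscanf(text, "%d/%d%n", &a, &b, &usedSlash) == 2 &&
               usedSlash == size) {
        // m/n form: a is the fractional bit count, b the total length.
        if (a < 0 || b < 0) return false;
        fracBits = a;
        intBits = b - a;
    } else {
        return false;
    }

    // At least one integer bit is needed to hold the sign.
    if (intBits < 1 || fracBits < 0 || intBits > kMaxLength - fracBits)
        return false;

    out.length = intBits + fracBits;
    out.intBits = intBits;
    return true;
}
```

With this sketch, "3.5" and "5/8" both yield a length of 8 with 3 integer bits, while "0.8" and "8/8" are rejected because they leave no bit for the sign.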

At the source code level, methods that operate on Fix objects take the precision either as an x.y or m/n string, or as two C++ integers that specify the total number of bits and the number of integer bits including the sign bit (that is, n and x). For example, suppose you have a primitive with a precision parameter named precision. Consider the following code:

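// Build a Fix from the precision parameter, cast to a string.
Fix x = Fix((const char *) precision);
// Abort the run if the precision was not valid.
if (x.invalid())
  Error::abortRun(*this, "Invalid precision");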

The precision parameter is cast to a string and passed as the argument to the Fix class constructor. The error check then verifies that the precision was valid.

The total length of a Fix object cannot exceed the constant FIX_MAX_LENGTH, which is defined in the file $MLD/include/kernel/Fix.h. The current value is 64 bits.

Numbers in the Fix class are represented in two’s complement notation, with the sign bit stored in the bits to the left of the binary point. There must always be at least one bit to the left of the binary point to store the sign. In addition to its value, each Fix object holds its precision and error codes that indicate overflow, divide-by-zero, or bad format parameters. The error codes are set when errors occur in constructors or arithmetic operators. There are also fields to specify:

  1. whether rounding or truncation takes place when other Fix values are assigned to it (truncation is the default),
  2. the response to an overflow or underflow on assignment (saturation is the default).
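
The effect of these two settings can be illustrated on a plain two's complement integer. The following is a minimal sketch under stated assumptions, not the Fix class implementation: Rounding, quantize, and toDouble are hypothetical names, truncation is assumed to discard low-order bits (rounding toward negative infinity), rounding is assumed to go to the nearest step with ties upward, and the length is limited to 32 bits so that the double arithmetic stays exact.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical names for illustration; not the Fix class API.
enum class Rounding { Truncate, Round };

// Quantize `value` to a two's complement number with `length` total bits,
// `intBits` of them (sign included) left of the binary point.
// Returns the raw integer and sets `overflowed` when the result saturated.
std::int64_t quantize(double value, int length, int intBits,
                      Rounding mode, bool& overflowed) {
    assert(intBits >= 1 && length >= intBits && length <= 32);
    const int fracBits = length - intBits;

    // Scale so that one unit equals the least significant bit.
    const double scaled = std::ldexp(value, fracBits);
    double raw = (mode == Rounding::Round) ? std::floor(scaled + 0.5)
                                           : std::floor(scaled);

    // Representable raw range: -2^(length-1) .. 2^(length-1) - 1.
    const double maxRaw = std::ldexp(1.0, length - 1) - 1.0;
    const double minRaw = -std::ldexp(1.0, length - 1);

    // Saturate instead of wrapping around.
    overflowed = false;
    if (raw > maxRaw) { raw = maxRaw; overflowed = true; }
    else if (raw < minRaw) { raw = minRaw; overflowed = true; }

    return static_cast<std::int64_t>(raw);
}

// Convert a raw integer back to the real value it represents.
double toDouble(std::int64_t raw, int length, int intBits) {
    return std::ldexp(static_cast<double>(raw), -(length - intBits));
}
```

With precision 3.5 (length 8, 3 integer bits), the value 1.3 becomes raw 41 (1.28125) under truncation and raw 42 (1.3125) under rounding, while 10.0 saturates at raw 127 (3.96875) and -10.0 saturates at raw -128 (-4.0), each with the overflow flag set.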