Notes

2019-01-15

It has been forever since my last note! I hereby take it as new year's resolution to share more about the interesting thought nuggets I come across.

In computer hardware, arithmetic primitives are in fact modular primitives; this means that the result of a+b is defined as the mathematical sum of a and b modulo a power of 2 (usually, 2³² or 2⁶⁴). The mathematical structure obtained is a modular ring in which all operations "wrap around". It is only a ring and not a field because the multiplication operation does not have an inverse; to see why an inverse function for multiplication cannot be defined, observe that 42 is the result of both 21×2 and (2³¹+21)×2!

An elementary result equivalent to Bézout's identity is that any number co-prime with the modulus has a multiplicative inverse. In a computer, this means that for any odd number a, a number b exists shuch that a✕b = 1. Now there is a particularly cute algorithm to find multiplicative inverses, and it is what motivated me to write this note.

The core idea of the algorithm lies in an observation I found quite surprising: a multiplicative inverse b of a modulo 2ⁿ can be used to find an inverse c of a modulo 2²ⁿ. Let's see how.

ab = 1 (mod 2ⁿ)
  ⇒ ab - 1 = k2ⁿ
  ⇒ (ab - 1)² = 0      (mod 2²ⁿ)
  ⇒ a²b² - 2ab + 1 = 0 (mod 2²ⁿ)
  ⇒ 2ab - a²b² = 1     (mod 2²ⁿ)
  ⇒ a(2b - ab²) = 1    (mod 2²ⁿ)
  ⇒ c = 2b - ab²       (mod 2²ⁿ)

This gives us the inductive step of the algorithm; to bootstrap it we simply need to find an inverse b of a modulo some small power of 2. To that end, one can check that all odd numbers are their own inverse modulo 4 (and 8). Putting it together in C.

uint32_t mulinv(uint32_t a) {
    uint32_t b = a;   /* 1/a mod 2² */
    b *= 2 - a*b;     /* 1/a mod 2⁴ */
    b *= 2 - a*b;     /* 1/a mod 2⁸ */
    b *= 2 - a*b;     /* 1/a mod 2¹⁶ */
    b *= 2 - a*b;     /* 1/a mod 2³² */
    return b;
}

For those really eager for performance, it turns out that b = 3*a ^ 2 is a cute initialization by Peter Montgomery which yields an inverse of a modulo 2⁵ and allows to cut one refinement step!