Hari Kailad

The Number Theoretic Transform


The Number Theoretic Transform is a very important subroutine in many lattice based schemes, such as Kyber and Dilithium. Though a most of these schemes are relatively straightforward, the NTT steps within can be quite opaque.

So what does the NTT do? The NTT is a transformation that allows one to efficiently multiply polynomials in some finite field.

In essence, it is a number theoretic version of the Discrete Fourier Transform, which allows one to compute multiplication of polynomials under a finite field in $\mathcal{O}(n\log n)$.

The Number Theoretic Transform

Consider the polynomial ring $\mathbb{Z}_q[X] /(\phi(x))$. Therefore, an element of this ring $\bm{a} \in \mathbb{Z}_q[X] / (\phi(x))$ can be expressed as $$ \bm{a} = a_0 + a_1X + a_2X^2 + \cdots + a_{n - 1}X^{n - 1} $$

where $n$ is the degree of $\phi(x)$.

Say we have two polynomials $\bm{a}$ and $\bm{b}$. Naively, to multiply these two polynomials, we can compute each coefficient of $\bm{c} = \bm{a} \cdot \bm{b}$ by expressing it as as convolution: $$c_k = \sum_{i=0}^{k} a_i\cdot b_{k - i}$$

For each coefficient at position $i$, we need to do $i$ multiplications. In addition, after constructing the product, we must reduce by the modulus $\phi(x)$, which takes $n$ steps. Therefore, we have a complexity of $\frac{n(n + 1)}{2} + n$, or $\mathcal{O}(n^2)$.

CRT Representation

Before discussing the NTT, we will take cover some background first. The Chinese Remainder Theorem is an important way to represent a polynomial or ring element in a different domain. This representation naturally brings rise to fast multiplication.

Definition (GCD). The GCD of two polynomials $\bm{a}$ and $\bm{b}$ is the polynomial of highest degree that is a factor of both $\bm{a}$ and $\bm{b}$.

Definition (Coprime). Two polynomials are coprime if their GCD is $1$.

Theorem (Chinese Remainder Theorem). Let $P_i(X)$ be a set of polynomials in $\mathbb{F}[X]$ such that each pair is coprime, and $A_i(X)$ be a set of polynomials. Then there exists a unique polynomial $$ P(X) \equiv A_i(X) \pmod{P_i(X)} $$

Now, consider a singular polynomial $\bm{a} \in \mathbb{Z}_q[X]/(\phi(x))$. If $\phi(x)$ splits over $\mathbb{Z}_q[X]$, we can represent it as the following: $$\phi(x) = (x - r_1)(x - r_2)(x - r_3) ... (x - r_n)$$

Now, we can represent $\bm{a}$ as the set of

$$\hat{a}_i \equiv \bm{a} \pmod{(x - r_i)}$$

If the GCD of the $1$-degree polynomials is $1$, then the polynomial $\bm{a}$ is uniquely represented by the set $\hat{a}_1, \cdots, \hat{a}_n$. In this representation, multiplication becomes much easier.

First, we see that by modular arithmetic rules, we have $$\hat{a}_i\hat{b}_i \equiv \bm{a}\bm{b} = \bm{c} \pmod{(x - r_i)}$$

Therefore, if we have $\hat{a}_1, \cdots, \hat{a}_n$ and $\hat{b}_1, \cdots, \hat{b}_n$, then we can evaluate the product in $\mathcal{O}(n)$ time by computing each pointwise multiplication $\hat{c}_i = \hat{a}_i\hat{b}_i$.

Evaluation at Points

An equivalent way to consider this representation is through polynomial evaluation. This is typically the method shown for Fast Fourier Transforms.

If we have two polynomials $\bm{a}$ and $\bm{b}$, instead of multiplying them outright, let us evaluate them at $n$ points. If we have $n$ points, we can uniquely represent the $n - 1$ degree polynomial passing through those points.

Say we have the points $$ (i, \bm{a}(i)) $$ $$ (i, \bm{b}(i)) $$

Then we see that multiplication of the two polynomials can be done in $\mathcal{O}(n)$ time, as for each point we can just multiply pointwise to get $ (i, \bm{a}(i) \cdot \bm{b}(i)) $.

Equivalence between CRT and Point Representation

In the CRT representation, we took a polynomial $\bm{a}$ and represented it as the set of values $$\hat{a}_i \equiv \bm{a} \pmod{(x - r_i)}$$

Now in this case, when modding by the polynomial $x - r_i$, note that we can replace any instance of $x$ with $r_i$, as $x - r_i \equiv 0$. Therefore, this is exactly the same as a point representation, specifically at $$\bm{a}(r_i)$$

Fast Conversion

So in our new domain, we can multiply polynomials by just taking a pointwise product in $\mathcal{O}(n)$ time. However, how do we convert our polynomials to this representation? If we want to evaluate each polynomial at the $n$ points, it will still take $\mathcal{O}(n^2)$ time.

Let us take a look at a small example, with the ring $\mathbb{Z}_{17}/(X^4 + 1)$.

Roots of Unity

Let us look at how $X^4 + 1$ splits. Note that $$X^4 + 1 = (X^2 - k)(X^2 + k)$$ $$X^4 + 1 = X^4 - k^2$$ $$1 = -k^2$$

Therefore, $k$ is the fourth primitive root of unity.

Let us look at the next split.

$$X^2 + k = (X - \ell)(X + \ell)$$ $$X^2 + k = X^2 - \ell^2$$ $$k = -\ell^2$$

Therefore, we see that $\ell = \sqrt{-k} = \sqrt{-1 \cdot k} = \sqrt{k^3}$. We see that $\sqrt{k}^8 = 1$, and so $\sqrt{k}$ is the $8$-th primitive root of unity.

We now notate $\omega_d$ as the value where $\omega_d^d = 1$, and $\omega_d^{x} \neq 1$ where $x < d$.

We can see here that $\sqrt{k^3} = \omega_8^3$.

Therefore, we will get $$X^4 + 1 = (X - \omega_8)(X + \omega_8)(X - \omega^3_8)(X + \omega^3_8)$$

This extends for any polynomial $X^n + 1$. We get $$X^n + 1 = $$ $$(X - \omega_{2n})(X + \omega_{2n})\cdots(X - \omega^{2i + 1}_{2n})(X - \omega^{2i + 1}_{2n})\cdots(X - \omega^{n - 1}_{2n})(X + \omega^{n - 1}_{2n})$$

Note.

Polynomials of the form $X^n + 1$ are known as negacyclic polynomial, and have many uses in cryptography. The NTT can be performed in most rings, however here I will just be focusing on negacyclic multiplication (as it is used in most lattice based protocols).

Conversion

Let $\bm{a}$ be an element of $\mathbb{Z}_q[X]/(X^{n} + 1)$, and split it into two values $$\bar{\bm{a}}_1 \in \mathbb{Z}_q[X]/(X^{n/2} - \omega_{2n}^{n/2})$$ $$\bar{\bm{a}}_2 \in \mathbb{Z}_q[X]/(X^{n/2} + \omega_{2n}^{n/2})$$

In the first ring, $X^{n/2} = \omega_{2n}^{n/2}$. Therefore, each coefficient in $\bm{a}$ with $X^{n/2}$ will be reduced down, and so we get $$\bar{\bm{a}}_1 = (a_0 + \omega^{n/2}_{2n}a_{n/2}) + (a_1 + \omega^{n/2}_{2n}a_{n/2 + 1})X + (a_2 + \omega^{n/2}_{2n}a_{n/2 + 2})X^2 + \cdots$$

In the second ring, $X^{n/2} = -\omega_{2n}^{n/2}$. Therefore, we get $$\bar{\bm{a}}_2 = (a_0 - \omega^{n/2}_{2n}a_{n/2}) + (a_1 - \omega^{n/2}_{2n}a_{n/2 + 1})X + (a_2 - \omega^{n/2}_{2n}a_{n/2 + 2})X^2 + \cdots$$

Now note the following: both of these can be computed in the same step. For each coefficient of $\bar{a}$, we can compute $a_i$ and $\omega_{2n}^{n/2}a_{n/2 + i}$, and then add and subtract respectively.

Therefore, what we can do is:

  • for $i = 0$ to $n/2$, do
    • $\bar{\bm{a}}_1[i] = a_i + \omega_{2n}^{n/2}a_{n/2 + i}$
    • $\bar{\bm{a}}_2[i] = a_i - \omega_{2n}^{n/2}a_{n/2 + i}$
  • $NTT(\bar{\bm{a}}_1)$
  • $NTT(\bar{\bm{a}}_2)$

Note that now, converting $\bm{a}$ into two subpolynomials $\bar{\bm{a}}_1$ and $\bar{\bm{a}}_2$ only takes $\mathcal{O}(n)$ operations. Now that we have these subpolynomials, we can recursively apply the same logic to traverse down the tree of factors, which has depth $\log n$. Therefore, this conversion takes $\mathcal{O}(n\log n)$ time!

Converting Back

To convert back, we can perform a similar operation recursively. So we have $\bar{\bm{a}}_1$ and $\bar{\bm{a}}_2$. Let us consider the first coefficient of these two polynomials.

So we have $$\bar{a}_{1, 0} = a_0 + \omega_{2n}^{n/2}a_{n/2}$$ $$\bar{a}_{2, 0} = a_0 - \omega_{2n}^{n/2}a_{n/2}$$

Note that if we sum these two, we get $\bar{a}_{1, 0} + \bar{a}_{2, 0} = 2a_0$. Dividing by $2$ gives us $a_0$.

If we subtract these two, we get $\bar{a}_{1, 0} - \bar{a}_{2, 0} = 2\omega_{2n}^{n/2}a_{n/2}$. Multiplying by $2^{-1}\omega_{-2n}^{n/2}$ gets $a_{n/2}$.

And so we can convert back recursively in $\mathcal{O}(n\log n)$ time.

Recap

Now that we have this recursive algorithm to convert from the standard coefficient polynomial to an NTT representation of the polynomial, in the NTT domain.

Taking the previous recursive formula, we can now do the following: $$\hat{\bm{a}} = NTT(\bm{a})$$ $$\hat{\bm{b}} = NTT(\bm{b})$$ $$\hat{\bm{c}} = \hat{\bm{a}} \star \hat{\bm{b}}$$ $$\bm{c} = iNTT(\hat{\bm{c}})$$

Where $\star$ is pointwise multiplication. The first two steps take $n\log n$ time, the third step takes linear time, and the last step also takes $n\log n$ time. Therefore, the whole polynomial multiplication only takes $n\log n$ time!

In general, it is typically more efficient for cryptographic schemes to keep polynomial equations in the NTT domain as much as possible, as conversion is now the main bottleneck to do algebra with ($n\log n$ time).

In some cases, the $2n$-th root of unity $\omega_{2n}$ does not exist in a ring (such as in Kyber), so $X^2 - \omega_{n}$ is actually not factorable. In this case, the NTT can be used to represent a polynomial as a set of degree 1 polynomials instead of degree 0, and so we must multiply two degree 1 polynomials together.

Conclusion

Conceptually, the NTT is maybe the hardest part of Kyber and Dilithium to understand. I like this post on describing the NTT. However, I also really like the description through the Chinese Remainder Theorem, which was not touched upon much in that post.