The Fundamental Theorem of Finitely Generated Modules over Principal Ideal Domains

Joseph Sullivan

October 2021


These are some notes about the Fundamental Theorem of Finitely Generated Modules over Principal Ideal Domains (FTFGMPID). I’m assuming that you are familiar with rings, PIDs, and modules.

Notation. For an \(R\)-module \(M\) and \(S\subseteq M\), I use the notation \(\langle S\rangle\) or \(\langle x \mid x\in S\rangle\) for the submodule generated by \(S\). For \(R\) a ring and \(S\subseteq R\), I use the notation \((S)\) or \((x\mid x\in S)\) for the ideal generated by \(S\).

Structure. The structure of these notes is as follows:

Comments. I just wanted to make a comment about how this is a really cool theorem, and I really believe that you personally can get a good feeling for how the proof works. I was confused the first time I saw the proof, but when it comes down to it, the FTFGMPID is just about cleverly replacing generators/relations.

For example, when a vector space has a basis \(\{v_1,v_2\}\), it also has a basis \(\{v_1+v_2,v_2\}\). This is really what’s going on, but we also have to talk about replacing generators for relations as well. We’ll start by proving some theorems that will eventually guarantee us finitely many generators/relations. Then, we’ll prove some theorems about how to replace generators. Finally, we’ll prove the existence of the Smith normal form, which is just a nice format for generators/relations. From this Smith normal form, we’ll pretty immediately get the FTFGMPID.


Definition 1. Let \(M\) be an \(R\)-module. We say \(M\) is Noetherian if every submodule is finitely generated. We call a ring \(R\) Noetherian if it is Noetherian as an \(R\)-module over itself.

Lemma 1. Let \(R\) be a PID, then \(R\) is Noetherian.

Proof. Submodules of rings are exactly ideals, and every ideal of \(R\) is principal, so therefore finitely generated. ◻

Lemma 2. Let \(M\) be an \(R\)-module, and \(N\leq M\) a submodule. \(M\) is Noetherian if and only if \(M/N\) and \(N\) are Noetherian.

Proof. Assume \(M\) is Noetherian. Then, \(N\) is Noetherian because every submodule of \(N\) is a submodule of \(M\), therefore finitely generated. Next, \(M/N\) is Noetherian because every submodule \(M'/N \leq M/N\) is the image of a submodule \(M'\leq M\), which is finitely generated because \(M\) is Noetherian.

Assume \(M/N, N\) are Noetherian. Let \(M'\leq M\) be a submodule. Then, \[(M'+N)/N \leq M/N\] has generators \(\overline{m_1},\dots,\overline{m_k}\) (with \(m_1,\dots,m_k \in M'\)) because \(M/N\) is Noetherian. Likewise, \(M'\cap N \leq N\) has generators \(n_1,\dots,n_\ell\) because \(N\) is Noetherian.

We then claim \(M' = \langle m_1,\dots,m_k, n_1, \dots, n_\ell\rangle\). The “\(\supseteq\)” inclusion is clear because \[m_1,\dots,m_k,n_1,\dots,n_\ell \in M'.\] We now prove the “\(\subseteq\)” inclusion.

Let \(m\in M'\). Passing to the quotient \((M'+N)/N\) we get \[\overline{m} = a_1\overline{m_1} + \dots + a_k\overline{m_k},\] so \[m-(a_1 m_1 + \dots + a_k m_k) \in N,\] which implies \[\begin{aligned} m-(a_1 m_1 + \dots + a_k m_k) &= b_1 n_1 + \dots b_\ell n_\ell\\ m &= a_1 m_1 + \dots + a_k m_k + b_1 n_1 + \dots b_\ell n_\ell, \end{aligned}\] so \(m \in \langle m_1,\dots,m_k, n_1, \dots, n_\ell\rangle\). ◻

Definition 2. Let \(M\) be an \(R\)-module. \(M\) is called cyclic if it is generated by a single element, i.e. \(M=\langle x\rangle\) for some \(x\in M\).

Lemma 3. Let \(R\) be Noetherian, and let \(M\) be a cyclic \(R\)-module. Then, \(M\) is Noetherian.

Proof. Since \(M\) is cyclic, \(M=\langle x\rangle\). This gives us a surjective \(R\)-module homomorphism \[\varphi: R\to M\] by \(\varphi(r) = rx\). Therefore, \(R/\ker{\varphi} \cong M\). Since \(R\) is Noetherian, \(R/\ker{\varphi}\) is Noetherian by Lemma 2, so \(M\) is Noetherian. ◻

Lemma 4. Let \(R\) be Noetherian and \(M\) a finitely generated \(R\)-module. Then, \(M\) is Noetherian.

Proof. Let \(n\) be the minimal number of generators for \(M\). We prove this by induction on \(n\).

For the base case, \(n=1\), \(M\) is Noetherian by Lemma 3.

Assume \(M\) is generated by \(x_1,\dots,x_n\). Then, \(\langle x_n\rangle\) is Noetherian by Lemma 3. Also, \[M/\langle x_n\rangle = \langle\overline{x_1},\dots,\overline{x_{n-1}}\rangle,\] which is Noetherian by our induction assumption. Therefore, \(M\) is Noetherian by Lemma 2. ◻


Definition 3. Let \(M\) be an \(R\)-module. A relation on \(\{x_j\}_{j\in J} \subseteq M\) is an element \((a_j)_{j\in J}\in \bigoplus_{j\in J} R\) such that \[\sum_{j\in J} a_j x_j = 0.\]

Remark 1. A relation just says how certain elements are related. Every set of elements has the trivial relation \(0\in \bigoplus_{j\in J} R\). A nontrivial relation asserts that the set of elements is linearly dependent.

We also mention that in a ring \(R\), a singleton \(\{*\}\) need not be linearly independent, i.e. it can have a relation. For example, consider \(\mathbb{Z}/n\mathbb{Z}\) as a \(\mathbb{Z}\)-module. Then, \[\{1\}\subseteq \mathbb{Z}/n\mathbb{Z}\] has the relation \(n\in \mathbb{Z}\), i.e. \[n\cdot 1 = 0.\]

Proposition 1. Let \(M\) be an \(R\)-module. Let \(\{x_j\}_{j\in J} \subseteq M\). Then, the set of relations on \(\{x_j\}\) is a submodule of \(\bigoplus_{j\in J} R\), which is given by \(\ker{\varphi}\), where \(\varphi\) is the map \[\begin{aligned} \varphi: \bigoplus_{j\in J} R_i &\longrightarrow M\\ e_j &\longmapsto x_j \end{aligned}\] where \(e_j\) is the \(j\)th standard basis vector. We call \(\ker{\varphi}\) the relations module (of \(\{x_j\}\)).

Definition 4. Let \(M\) be an \(R\)-module. A presentation is a pair of generators \(\{x_j\}_{j\in J}\) for \(M\) and generators \(\{y_i\}_{i\in I}\) for the relations module.

A presentation is called finite if there are finitely many generators for both \(M\) and the relations module.

Remark 2. When \(R=\mathbb{Z}\), this is the same thing as a presentation of an abelian group.

Remark 3. Let \(M\) be an \(R\)-module with a presentation consisting of generators \(\{x_j\}_{j\in J}\) and relations \(\{y_i\}_{i\in I}\). Building off of Proposition 1, we have a map \[\varphi: \bigoplus_{j\in J} R \longrightarrow M\] which is actually surjective, since the \(\{x_j\}\) generate \(M\). Then, by the first isomorphism theorem and the fact that \(\ker{\varphi} = \langle y_i \mid i\in I\rangle\) we get \[\bigoplus_{j\in J} R \Big/ \langle y_i \mid i\in I\rangle \cong M.\] In this way, a presentation is an assertion that \(M\) is isomorphic to a free \(R\)-module modulo the submodule generated by the given relations.

Lemma 5. Let \(R\) be a Noetherian ring and \(M\) be an \(R\)-module. If \(M\) is finitely generated, then \(M\) has a finite presentation.

Proof. By assumption we have finitely many generators, denote them \(m_1,\dots,m_n\). By Proposition 1, the module of relations is given by \(\ker{\varphi}\subseteq \bigoplus_{i=1}^n R\). Since \(R\) is Noetherian and \(\bigoplus_{i=1}^n R\) is finitely generated over \(R\), by Lemma 4, \(\ker{\varphi}\) is finitely generated. ◻

Definition 5. Let \(M\) be an \(R\)-module with a finite presentation. Let \(x_1,\dots,x_n\) be the generators of \(M\), and \(y_1,\dots,y_m \in \oplus_{i=1}^n R\) be the generators for the relations module. Write the relations in coordinates, i.e., \[y_i = a_{i1}e_1 + \dots + a_{in}e_n,\] where \(e_1,\dots,e_n\) is the standard basis for \(\oplus_{i=1}^n R\). Then, the relations matrix of the presentation is \[A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}.\] It is just the matrix with the relations (in coordinates) as rows.

Remark 4. To get the FTFGMPID, we need to show that all finitely generated modules over PIDs have a particularly good presentation, one of the form \[M \cong \bigoplus_{i=1}^n R \Big/ \langle a_1e_1, \dots, a_k e_k\rangle\] where \(a_1 \mid \dots \mid a_k\). We now provide some lemmas on switching out the generators of \(M\) and the relations module.

Lemma 6. Let \(M\) be an \(R\)-module with generators \(x_1,\dots,x_n\). Let \(P\in R^{n\times n}\) be an invertible matrix. Then, for \[\begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix} = P \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}\] the \(w_1,\dots,w_n\) are generators of \(M\).

Proof. It suffices to prove that each \(x_1,\dots,x_n\) can be generated from \(w_1,\dots,w_n\). Since \(P\in R^{n\times n}\) is invertible, we have \[\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = P^{-1} \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix},\] so each \(x_i\) is an \(R\)-linear combination of the \(w_i\), so the \(w_i\) generate \(M\). ◻

Lemma 7. Let \(M\) be an \(R\)-module with a finite presentation, consisting of generators \(x_1,\dots,x_n \in M\) and relations \(y_1,\dots,y_m \in \bigoplus_{i=1}^n R\). Let \(A\) be the relations matrix, and suppose \[B = PAQ^T\] for \(P\in R^{m\times m}\) and \(Q\in R^{n\times n}\) are invertible matrices. Then, \(M\) has a presentation with generators given by the entries of \[\begin{bmatrix} w_1\\ \vdots\\ w_n \end{bmatrix}=Q^{-1}\begin{bmatrix} x_1\\ \vdots\\ x_n \end{bmatrix},\] and relations given by the rows of \(B\), realized as elements of \(\bigoplus_{i=1}^n R\). The relations matrix for this presentation is then exactly \(B\).

Proof. By Lemma 6, the \(w_1,\dots,w_n\) are generators for \(M\). The situation is essentially a change of basis. We now want to find generators for the relations module of \(\{w_i\}\) instead of \(\{x_i\}\). We also want to express our new relations in terms of the old ones.

As per Proposition 1, the relations module is the kernel of a certain surjective homomorphism \(\bigoplus_{i=1}^n R\) to \(M\). If we let \(f: \bigoplus_{i=1}^n R\to M\) be the map associated to our original presentation of \(M\) (i.e., \(f(e_i)=x_i\)), then we get that the composition of maps \[\bigoplus_{i=1}^n R \xrightarrow{Q^{-1}} \bigoplus_{i=1}^n R \xrightarrow{f} M,\] sends \(e_i \mapsto w_i\). This is exactly the situation of Proposition 1, so we really are looking for generators of the kernel of this composition. Then, the kernel is just \[\begin{aligned} (f\circ Q^{-1})^{-1}(0) &= Q\circ f^{-1}(0)\\ &= Q\ker{f}. \end{aligned}\] Since \(\ker{f}\) is generated \(y_1,\dots,y_m\), \(Q\ker{f}\) is generated by \[Qy_1,\dots,Qy_m.\] By Lemma 6, the entries of \[\begin{bmatrix} z_1 \\ \vdots \\ z_m \end{bmatrix}=P \begin{bmatrix} Qy_1\\ \vdots \\ Qy_m \end{bmatrix}\] form a generating set for \(\ker{f}\). Writing the \(z_i\) in coordinates, one can check that the relations matrix is exactly \(B=PAQ^T\). ◻


Remark 5. Building off of Lemma 7, to prove our main theorem (the FTFGMPID), we need to put a particular matrix in a nice format. This where the Smith normal form comes into play.

Definition 6. Let \(R\) be a ring. A matrix \(A\in R^{m\times n}\) is said to be in Smith normal form if as block matrices it can be written \[A=\begin{bmatrix} D & 0\\ 0 & 0 \end{bmatrix},\] where \[D=\begin{bmatrix} a_1 & 0 & \cdots & 0\\ 0 & a_2 & \cdots & 0\\ \vdots & \vdots & \ddots & 0\\ 0 & 0 & \cdots & a_k \end{bmatrix}\] with \(k\leq \min\{m,n\}\) and \(a_1\mid a_2 \mid \dots \mid a_k\).

Lemma 8. Let \(R\) be a PID and let \(A\in R^{m\times n}\). There exist invertible \(P\in R^{m\times m}\) and \(Q\in R^{n\times n}\) such that \[B=PAQ\] is in Smith normal form.

Proof Idea.. We describe an algorithm for putting a matrix into Smith normal form. It is very tedious to describe this totally precisely, so we cut some corners. Formally, one should prove that this algorithm works by induction on \(\ell=\min\{m,n\}\).

The first stage of the algorithm is to put \(A\) into a diagonal form. We iterate over \(t=1,\dots,\ell\). Assume, by multiplying invertible matrices on the left and right (corresponding to column and row operations) that we already have \[P_t A Q_t = \left[\begin{array}{c c c c | c c c} a_1 & 0 & \cdots & 0 & 0 & \cdots & 0\\ 0 & a_2 & \cdots & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & 0 & 0 & \cdots & 0\\ 0 & 0 & 0 & a_{t-1} & 0 & \cdots & 0\\\hline 0 & 0 & 0 & 0 & a_{tt} & \cdots & a_{tn}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & 0 & a_{mt} & \cdots & a_{mn} \end{array}\right] = \begin{bmatrix} D_t & 0\\ 0 & A_t \end{bmatrix}.\] Our goal is to next clear out the column below and the row to the right of the entry \(a_{tt}\). To make notation less cumbersome, we focus on just \(A_t\) and we relabel its entries as follows \[A_t = \begin{bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{m1} & \cdots & a_{mn} \end{bmatrix}.\] Here is our strategy to clear out the first row/column (except for an entry in the \((1,1)\) position):

  • Find a nonzero entry in the first row or column (in case \(a_{11}=0\)), and do a row/column permutation to put the entry in the \((1,1)\) position. If all entries in the first row or column are 0, then we are already done.

  • Keep replacing \(a_{11}\) with \(\gcd(a_{11}, a_{1t})\) or \(\gcd(a_{11}, a_{t1})\) by doing the following:

    • We show only the \(2\times n\) case and row version, but it can be transposed for a column version, and you can pad the matrix with diagonal of \(1\)s to get a version for \(m\times n\) matrices. Suppose \(A\) has the following form: \[A = \begin{bmatrix} a & b\\ c & d \end{bmatrix},\] and our goal is to replace \(a\) with \(e= \gcd(a,c)\). We have \((e)=(a,c)\). Then, \[e=ax+cy, \qquad a=\alpha e, \quad c=\gamma e,\] for some \(x,y,\alpha,\gamma\in R\), and also \[1 = \alpha x + \gamma y.\] Then, let \(P\) be the following matrix: \[P = \begin{bmatrix} x & y\\ -\gamma & \alpha \end{bmatrix}.\] The matrix \(P\) is invertible because it has determinant \(\alpha x + \gamma y = 1\), which is a unit in \(R\). This gives us \[\begin{aligned} PA &= \begin{bmatrix} ax+cy & bx+dy\\ -a\gamma+c\alpha & -b\gamma+d\alpha \end{bmatrix}\\ &= \begin{bmatrix} e & bx+dy\\ 0 & -b\gamma+d\alpha \end{bmatrix}. \end{aligned}\]

  • Each time we replace \(a_{11}\) with \(\gcd(a_{11}, a_{1t})\) or \(\gcd(a_{11}, a_{1t})\), we are peeling away factors of \(a_{11}\). In the process, we might “mess up” the matrix and reintroduce values where the entry was originally 0. This does not matter.

    The thing that matters is that in each step we remove some irreducible factors from \(a_{11}\) until it happens to divide everything in the column/row (for example if any of the \(\gcd\)s are eventually 1). This will eventually happen because PIDs are UFDs, so \(a_{11}\) has finitely many irreducible factors.

  • Once \(a_{11}\) divides all entries in the column and row, we can do column/row replacements to eliminate these entries.

We keep repeating this algorithm until \(A\) is put in diagonal form. Now, we need to modify \(A\) so that \(a_1 \mid a_2 \mid \dots \mid a_k\). We again show case for \(2\times 2\) matrices: Suppose \(A\) has the following form: \[A=\begin{bmatrix} a & 0\\ 0 & b \end{bmatrix}.\] Then, for \(e=\gcd(a,b)\), we have \((e)=(a,b)\) so there exists \(x,y,\alpha,\beta\in R\) such that \[e=ax+by, \qquad a= \alpha e, \quad b=\beta e.\] Then, we can do the following replacements/permutations: \[\begin{aligned} \begin{bmatrix} a & 0\\ 0 & b \end{bmatrix}&\mapsto \begin{bmatrix} a & 0\\ ax & b \end{bmatrix} \mapsto \begin{bmatrix} a & 0\\ ax+by & b \end{bmatrix} = \begin{bmatrix} a & 0\\ e & b \end{bmatrix}\\ &\mapsto \begin{bmatrix} 0 & -\alpha b\\ e & b \end{bmatrix}\mapsto \begin{bmatrix} 0 & -\alpha b\\ e & 0 \end{bmatrix} \mapsto \begin{bmatrix} e & 0\\ 0 & -\alpha b \end{bmatrix}, \end{aligned}\] and \(e \mid -\alpha b\). ◻

Remark 6. With the machinery of presentations and the Smith normal form, the proof of the FTFGMPID is actually not very long. In my experience, it’s very important to have a good feel for these things to have a more intuitive appreciation for the FTFGMPID.

Theorem 1. (Fundamental Theorem of Finitely Generated Modules over PIDs) Let \(R\) be a PID, and let \(M\) be a finitely generated \(R\)-module. Then, \[M \cong R/(a_1) \oplus R/(a_2) \oplus \dots \oplus R/(a_k) \oplus R^r\] for some \(r\in \mathbb{Z}_{\geq 0}\) and for some nonzero non-units \(a_1, a_2, \dots, a_k \in R\) which satisfy the divisibility relations \[a_1 \mid a_2 \mid \dots \mid a_k.\]

Proof. Since \(R\) is a PID, it is Noetherian by Lemma 1. Therefore, by Lemma 5, \(M\) has a finite presentation. The relations matrix can be put in Smith normal form by Lemma 8, so by Lemma 7 there exists generators \(x_1,\dots,x_n\) for \(M\) and relations \(y_1,\dots,y_m\in \bigoplus_{i=1}^n R\) with relations matrix \(A\in R^{m\times n}\) such that as blocks \[A=\begin{bmatrix} D & 0\\ 0 & 0 \end{bmatrix},\qquad D=\begin{bmatrix} a_1 & 0 & \cdots & 0\\ 0 & a_2 & \cdots & 0\\ \vdots & \vdots & \ddots & 0\\ 0 & 0 & \cdots & a_k \end{bmatrix}\] with \(a_1\mid \dots \mid a_k\). Thus, our nontrivial relations are exactly \(y_1=a_1e_1, \dots, y_k=a_ke_k\). This implies \[\begin{aligned} M &\cong \bigoplus_{i=1}^n R \Big/ \langle a_1 e_1, \dots, a_k e_k\rangle\\ &\cong R/(a_1) \oplus \dots \oplus R/(a_k) \oplus R^{n-k}. \end{aligned}\] We can assume all \(a_i\neq 0\), since otherwise such a term \(R/(a_i) \cong R\) can be included with the other copies of \(R\). ◻