Section SD Similarity and Diagonalization

This section’s topic will perhaps seem out of place at first, but we will make the connection soon with eigenvalues and eigenvectors. This is also our first look at one of the central ideas of Chapter R.

Subsection SM Similar Matrices

The notion of matrices being “similar” is a lot like saying two matrices are row-equivalent. Two similar matrices need not be equal, but they share many important properties. This section, and later sections in Chapter R, will be devoted in part to discovering just what these common properties are.

First, the main definition for this section.

Definition SIM. Similar Matrices.

Suppose \(A\) and \(B\) are two square matrices of size \(n\text{.}\) Then \(A\) and \(B\) are similar if there exists a nonsingular matrix of size \(n\text{,}\) \(S\text{,}\) such that \(\similar{A}{S}=B\text{.}\)

Equivalently, we can require that \(AS=SB\text{.}\)
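The equivalence of the two conditions is a one-line computation in each direction. The sketch below uses only the fact that \(S\) is nonsingular, so \(\inverse{S}\) exists and \(S\inverse{S}=\inverse{S}S=I_n\text{.}\)

\begin{align*} \similar{A}{S}=B &\implies S\left(\similar{A}{S}\right)=SB \implies AS=SB\\ AS=SB &\implies \inverse{S}\left(AS\right)=\inverse{S}\left(SB\right) \implies \similar{A}{S}=B\text{.} \end{align*}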

We will say “\(A\) is similar to \(B\) via \(S\)” when we want to emphasize the role of \(S\) in the relationship between \(A\) and \(B\text{.}\) Also, it does not matter if we say \(A\) is similar to \(B\text{,}\) or \(B\) is similar to \(A\text{.}\) If one statement is true then so is the other, as can be seen by using \(\inverse{S}\) in place of \(S\) (see Theorem SER for the careful proof). Finally, we will refer to \(\similar{A}{S}\) as a similarity transformation when we want to emphasize the way \(S\) changes \(A\text{.}\) OK, enough about language, let us build a few examples.

Example SMS5. Similar matrices of size 5.

If you wondered if there are examples of similar matrices, then it will not be hard to convince you they exist. Define

\begin{align*} A=\begin{bmatrix} -4 & 1 & -3 & -2 & 2 \\ 1 & 2 & -1 & 3 & -2 \\ -4 & 1 & 3 & 2 & 2 \\ -3 & 4 & -2 & -1 & -3 \\ 3 & 1 & -1 & 1 & -4 \end{bmatrix}&& S=\begin{bmatrix} 1 & 2 & -1 & 1 & 1 \\ 0 & 1 & -1 & -2 & -1 \\ 1 & 3 & -1 & 1 & 1 \\ -2 & -3 & 3 & 1 & -2 \\ 1 & 3 & -1 & 2 & 1\\ \end{bmatrix}\text{.} \end{align*}

Check that \(S\) is nonsingular and then compute

\begin{align*} &B=\similar{A}{S}\\ &= \begin{bmatrix} 10 & 1 & 0 & 2 & -5 \\ -1 & 0 & 1 & 0 & 0 \\ 3 & 0 & 2 & 1 & -3 \\ 0 & 0 & -1 & 0 & 1 \\ -4 & -1 & 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} -4 & 1 & -3 & -2 & 2 \\ 1 & 2 & -1 & 3 & -2 \\ -4 & 1 & 3 & 2 & 2 \\ -3 & 4 & -2 & -1 & -3 \\ 3 & 1 & -1 & 1 & -4 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 & 1 & 1 \\ 0 & 1 & -1 & -2 & -1 \\ 1 & 3 & -1 & 1 & 1 \\ -2 & -3 & 3 & 1 & -2 \\ 1 & 3 & -1 & 2 & 1 \end{bmatrix}\\ &= \begin{bmatrix} -10 & -27 & -29 & -80 & -25 \\ -2 & 6 & 6 & 10 & -2 \\ -3 & 11 & -9 & -14 & -9 \\ -1 & -13 & 0 & -10 & -1 \\ 11 & 35 & 6 & 49 & 19 \end{bmatrix}\text{.} \end{align*}

So by this construction, we know that \(A\) and \(B\) are similar.

Let us do that again.

Example SMS3. Similar matrices of size 3.

Define

\begin{align*} A=\begin{bmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{bmatrix}&& S=\begin{bmatrix} 1 & 1 & 2 \\ -2 & -1 & -3 \\ 1 & -2 & 0 \end{bmatrix}\text{.} \end{align*}

Check that \(S\) is nonsingular and then compute

\begin{align*} B&=\similar{A}{S}\\ &= \begin{bmatrix} -6 & -4 & -1 \\ -3 & -2 & -1 \\ 5 & 3 & 1 \end{bmatrix} \begin{bmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{bmatrix} \begin{bmatrix} 1 & 1 & 2 \\ -2 & -1 & -3 \\ 1 & -2 & 0 \end{bmatrix}\\ &= \begin{bmatrix} -1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{bmatrix}\text{.} \end{align*}

So by this construction, we know that \(A\) and \(B\) are similar. But before we move on, look at how pleasing the form of \(B\) is. Not convinced? Then consider that several computations related to \(B\) are especially easy. For example, in the spirit of Example DUTM, \(\detname{B}=(-1)(3)(-1)=3\text{.}\) Similarly, the characteristic polynomial is straightforward to compute by hand, \(\charpoly{B}{x}=(-1-x)(3-x)(-1-x)=-(x-3)(x+1)^2\) and since the result is already factored, the eigenvalues are transparently \(\lambda=3,\,-1\text{.}\) Finally, the eigenvectors of \(B\) are just the standard unit vectors (Definition SUV).
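The check requested in this example can also be done by machine. The following is a minimal sketch in plain Python (not Sage), where the helper names `matmul` and `det3` are our own and not part of any library. It avoids computing \(\inverse{S}\) by testing the equivalent condition \(AS=SB\) from Definition SIM, after confirming that \(S\) has a nonzero determinant.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows (hypothetical helper)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Matrices copied from Example SMS3.
A = [[-13, -8, -4], [12, 7, 4], [24, 16, 7]]
S = [[1, 1, 2], [-2, -1, -3], [1, -2, 0]]
B = [[-1, 0, 0], [0, 3, 0], [0, 0, -1]]

S_is_nonsingular = det3(S) != 0                 # nonzero determinant
similar_via_S = matmul(A, S) == matmul(S, B)    # AS = SB
print(S_is_nonsingular, similar_via_S)
```

Working with \(AS=SB\) keeps every intermediate value an integer, so no rounding can blur the comparison.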

Subsection PSM Properties of Similar Matrices

Similar matrices share many properties and it is these theorems that justify the choice of the word “similar.” First we will show that similarity is an equivalence relation. Equivalence relations are important in the study of various algebras and can always be regarded as a kind of weak version of equality. Sort of alike, but not quite equal. The notion of two matrices being row-equivalent is an example of an equivalence relation we have been working with since the beginning of the course (see Exercise RREF.T11). Row-equivalent matrices are not equal, but they are a lot alike. For example, row-equivalent matrices have the same rank. Formally, an equivalence relation requires three conditions hold: reflexive, symmetric and transitive. We will illustrate these as we prove that similarity is an equivalence relation.

Theorem SER. Similarity is an Equivalence Relation.

Suppose \(A\text{,}\) \(B\) and \(C\) are square matrices of size \(n\text{.}\) Then we have the following three properties, by name.

Reflexive: \(A\) is similar to \(A\text{.}\)
Symmetric: If \(A\) is similar to \(B\text{,}\) then \(B\) is similar to \(A\text{.}\)
Transitive: If \(A\) is similar to \(B\) and \(B\) is similar to \(C\text{,}\) then \(A\) is similar to \(C\text{.}\)

Proof.

To see that \(A\) is similar to \(A\text{,}\) we need only demonstrate a nonsingular matrix that effects a similarity transformation of \(A\) to \(A\text{.}\) \(I_n\) is nonsingular (since it row-reduces to the identity matrix, Theorem NMRRI). Then

\begin{equation*} AI_n = A = I_nA \end{equation*}

and we see that \(A\) is similar to \(A\) via the nonsingular matrix \(I_n\text{.}\)

If we assume that \(A\) is similar to \(B\text{,}\) then we know there is a nonsingular matrix \(S\) so that \(AS=SB\) by Definition SIM. By Theorem MIMI, \(\inverse{S}\) is invertible, and by Theorem NI is therefore nonsingular. Then

\begin{equation*} B\inverse{S} = \left(\inverse{S}S\right)B\inverse{S} = \inverse{S}\left(SB\right)\inverse{S} = \inverse{S}\left(AS\right)\inverse{S} = \inverse{S}A\left(S\inverse{S}\right) = \inverse{S}A \end{equation*}

and we see that \(B\) is similar to \(A\) via the nonsingular matrix \(\inverse{S}\text{.}\)

If we assume that \(A\) is similar to \(B\text{,}\) and \(B\) is similar to \(C\text{,}\) then we know there are two nonsingular matrices, \(S\) and \(R\text{,}\) such that \(AS=SB\) and \(BR=RC\text{,}\) by Definition SIM. (Notice how we cannot presume \(S\) and \(R\) are the same matrix!) Since \(S\) and \(R\) are invertible, so too \(SR\) is invertible by Theorem SS and then nonsingular by Theorem NI. Then

\begin{equation*} A\left(SR\right) = \left(AS\right)R = \left(SB\right)R = S\left(BR\right) = S\left(RC\right) = \left(SR\right)C \end{equation*}

so \(A\) is similar to \(C\) via the nonsingular matrix \(SR\text{.}\)

Here is another theorem that tells us exactly what sorts of properties similar matrices share.

Theorem SMEE. Similar Matrices have Equal Eigenvalues.

Suppose \(A\) and \(B\) are similar matrices. Then \(A\) and \(B\) have the same eigenvalues, with identical algebraic and geometric multiplicities.

Proof.

Suppose that \(A\) and \(B\) are similar via \(S\text{,}\) so \(AS=SB\text{.}\) Let \(\vect{x}\neq\zerovector\) be an eigenvector of \(B\) for the eigenvalue \(\lambda\text{.}\) Since \(S\) is nonsingular, Definition NM implies that \(S\vect{x}\neq\zerovector\text{.}\) Furthermore

\begin{equation*} A\left(S\vect{x}\right) = \left(AS\right)\vect{x} = \left(SB\right)\vect{x} = S\left(B\vect{x}\right) = S\left(\lambda\vect{x}\right) = \lambda\left(S\vect{x}\right) \end{equation*}

demonstrating that \(S\vect{x}\) is an eigenvector of \(A\) for \(\lambda\text{.}\) So every eigenvalue of \(B\) is an eigenvalue of \(A\text{.}\) Exchanging the roles of \(A\) and \(B\) (and employing the nonsingular matrix \(\inverse{S}\)) will show that every eigenvalue of \(A\) is an eigenvalue of \(B\text{.}\) So the eigenvalues of \(A\) and the eigenvalues of \(B\) are equal as sets (Definition SE).

Suppose that \(C=\set{\vectorlist{x}{k}}\) is a basis for the eigenspace of \(\lambda\) for the matrix \(B\text{.}\) Then each vector of the set \(D=\set{S\vect{x}_1,\,S\vect{x}_2,\,S\vect{x}_3,\ldots,\,S\vect{x}_k}\) is an eigenvector of \(A\) for \(\lambda\text{,}\) by the same argument as in the previous paragraph. It is an exercise to show that because \(S\) is nonsingular, the linear independence of \(C\) implies the linear independence of \(D\text{.}\) Thus, by Theorem G we have

\begin{equation*} \geomult{B}{\lambda}\leq\geomult{A}{\lambda}\text{.} \end{equation*}

Exchanging the roles of \(A\) and \(B\) gives the inequality in the other direction, implying equality of the geometric multiplicities.

To understand the algebraic multiplicities, it is necessary to consult the characteristic polynomials of the two matrices. The initial expression here looks contrived, but its purpose will be clear shortly.

\begin{align*} &\detname{S}\left(\charpoly{A}{x}-\charpoly{B}{x}\right)\\ &=\detname{S}\left(\detname{A-xI_n}-\detname{B-xI_n}\right)&& \text{Definition CP}\\ &=\detname{S}\detname{A-xI_n}-\detname{S}\detname{B-xI_n}&& \text{Property DCN}\\ &=\detname{A-xI_n}\detname{S}-\detname{S}\detname{B-xI_n}&& \text{Property CMCN}\\ &=\detname{\left(A-xI_n\right)S}-\detname{S\left(B-xI_n\right)}&& \text{Theorem DRMM}\\ &=\detname{AS-xS}-\detname{SB-xS}&& \text{Theorem MMDAA}\\ &=\detname{SB-xS}-\detname{SB-xS}&& \text{Definition SIM}\\ &=0&& \text{Property AICN} \end{align*}

Notice that the expressions in these equalities are scalar quantities, and nearly half of the theorems given as explanations are simply properties from Theorem PCNA. But the key step is the application of the multiplicative property of the determinant, Theorem DRMM, along with a setup using scalar commutativity, Property CMCN.

Because \(S\) is nonsingular, its determinant is nonzero (Theorem SMZD), and by Theorem ZPZF, we conclude that \(\charpoly{A}{x}-\charpoly{B}{x}=0\text{.}\) Since the characteristic polynomials of \(A\) and \(B\) are equal, they will factor identically, and the algebraic multiplicity of each eigenvalue will be the same for each matrix.
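For readers who prefer a more direct route, here is a compact sketch of the same conclusion. It assumes we write the similarity as \(B=\similar{A}{S}\) and that \(\detname{\inverse{S}}\detname{S}=\detname{\inverse{S}S}=\detname{I_n}=1\text{,}\) which follows from Theorem DRMM. It is an alternative presentation, not the argument used above.

\begin{align*} \charpoly{B}{x} &=\detname{B-xI_n}\\ &=\detname{\similar{A}{S}-x\inverse{S}I_nS}\\ &=\detname{\inverse{S}\left(A-xI_n\right)S}\\ &=\detname{\inverse{S}}\detname{A-xI_n}\detname{S}\\ &=\detname{A-xI_n}\\ &=\charpoly{A}{x}\text{.} \end{align*}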

Be very careful with this theorem. It is tempting to think the converse is true, and argue that if two matrices have the same eigenvalues, then they are similar. Not so, as the following example illustrates. So do not think this theorem is a route to establishing that two matrices are similar.

Example EENS. Equal eigenvalues, not similar.

Define

\begin{align*} A&=\begin{bmatrix}1&1\\0&1\end{bmatrix} & B&=\begin{bmatrix}1&0\\0&1\end{bmatrix} \end{align*}

and check that

\begin{equation*} \charpoly{A}{x}=\charpoly{B}{x}=1-2x+x^2=(x-1)^2 \end{equation*}

and so \(A\) and \(B\) have equal characteristic polynomials. If the converse of Theorem SMEE were true, then \(A\) and \(B\) would be similar. Suppose this is the case. More precisely, suppose there is a nonsingular matrix \(S\) so that \(A=\similar{B}{S}\text{.}\)

Then

\begin{equation*} A=\similar{B}{S}=\similar{I_2}{S}=\inverse{S}S=I_2\text{.} \end{equation*}

Clearly \(A\neq I_2\) and this contradiction tells us that two matrices can have the same eigenvalues but not be similar. (Note that the geometric multiplicity of the eigenvalue \(\lambda=1\) is different for the two matrices. So strictly speaking this example does not prove that the converse of Theorem SMEE is false. We are simply cautioning against a frequent temptation.)
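The difference in geometric multiplicities can be read off directly. As a short worked computation, the geometric multiplicity of \(\lambda=1\) is the dimension of the null space of the matrix minus \(I_2\text{,}\) which is the size \(2\) less the rank of that difference.

\begin{align*} A-I_2&=\begin{bmatrix}0&1\\0&0\end{bmatrix} & \geomult{A}{1}&=2-1=1\\ B-I_2&=\begin{bmatrix}0&0\\0&0\end{bmatrix} & \geomult{B}{1}&=2-0=2\text{.} \end{align*}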

Sage SM. Similar Matrices.

It is quite easy to determine if two matrices are similar, using the matrix method .is_similar(). However, computationally this can be a very difficult proposition, so support in Sage is incomplete now, though it will always return a result for matrices with rational entries. Here are examples where the two matrices are, and are not, similar. Notice that the keyword option transformation=True will cause a pair to be returned, such that if the matrices are indeed similar, the matrix effecting the similarity transformation will be in the second slot of the pair.

Since we knew in advance these two matrices are similar, we requested the transformation matrix, so the output is a pair. The similarity matrix is a bit of a mess, so we will use three Sage routines to clean up trans. We convert the entries to numerical approximations, clip very small values (less than \(10^{-5}\)) to zero and then round to three decimal places. You can experiment printing just trans all by itself.

The matrix C is not similar to A (and hence not similar to B by Theorem SER), so we illustrate the return value when we do not request the similarity matrix (since it does not even exist).

Subsection D Diagonalization

Good things happen when a matrix is similar to a diagonal matrix. For example, the eigenvalues of the matrix are the entries on the diagonal of the diagonal matrix. And it can be a much simpler matter to compute high powers of the matrix. Diagonalizable matrices are also of interest in more abstract settings. Here are the relevant definitions, then our main theorem for this section.

Definition DIM. Diagonal Matrix.

Suppose that \(A\) is a square matrix. Then \(A\) is a diagonal matrix if \(\matrixentry{A}{ij}=0\) whenever \(i\neq j\text{.}\)

Definition DZM. Diagonalizable Matrix.

Suppose \(A\) is a square matrix. Then \(A\) is diagonalizable if \(A\) is similar to a diagonal matrix.

Example DAB. Diagonalization of Archetype B.

Archetype B has a \(3\times 3\) coefficient matrix

\begin{equation*} B=\begin{bmatrix} -7&-6&-12\\ 5&5&7\\ 1&0&4 \end{bmatrix} \end{equation*}

and is similar to a diagonal matrix, as can be seen by the following computation with the nonsingular matrix \(S\text{,}\)

\begin{align*} \similar{B}{S}&= \inverse{\begin{bmatrix}-5&-3&-2\\3&2&1\\1&1&1\end{bmatrix}}\begin{bmatrix} -7&-6&-12\\ 5&5&7\\ 1&0&4 \end{bmatrix} \begin{bmatrix}-5&-3&-2\\3&2&1\\1&1&1\end{bmatrix}\\ &=\begin{bmatrix}-1&-1&-1\\2&3&1\\-1&-2&1\end{bmatrix} \begin{bmatrix} -7&-6&-12\\ 5&5&7\\ 1&0&4 \end{bmatrix} \begin{bmatrix}-5&-3&-2\\3&2&1\\1&1&1\end{bmatrix}\\ &= \begin{bmatrix}-1&0&0\\0&1&0\\0&0&2\end{bmatrix}\text{.} \end{align*}
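One payoff mentioned at the start of this subsection is the easier computation of high powers. Since \(\similar{B}{S}=D\) gives \(B=SD\inverse{S}\text{,}\) we get \(B^k=SD^k\inverse{S}\text{,}\) and \(D^k\) only requires raising each diagonal entry to the power \(k\text{.}\) The sketch below, in plain Python with helper names of our own choosing, compares this with repeated multiplication for the matrices of this example.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows (hypothetical helper)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Matrices copied from Example DAB.
B = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]
S = [[-5, -3, -2], [3, 2, 1], [1, 1, 1]]
S_inv = [[-1, -1, -1], [2, 3, 1], [-1, -2, 1]]
eigenvalues = [-1, 1, 2]          # diagonal of D, in the column order of S

def power_direct(M, k):
    """M^k by repeated multiplication, for k >= 1."""
    result = M
    for _ in range(k - 1):
        result = matmul(result, M)
    return result

def power_via_diagonal(k):
    """B^k computed as S D^k S^{-1}; D^k is formed entry by entry."""
    n = len(eigenvalues)
    D_k = [[eigenvalues[i] ** k if i == j else 0 for j in range(n)]
           for i in range(n)]
    return matmul(matmul(S, D_k), S_inv)

print(power_via_diagonal(10) == power_direct(B, 10))
```

The diagonal route always costs two matrix products regardless of \(k\text{,}\) while the direct route here costs \(k-1\) of them.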

Example SMS3 provides yet another example of a matrix that is subjected to a similarity transformation and the result is a diagonal matrix. Alright, just how would we find the magic matrix \(S\) that can be used in a similarity transformation to produce a diagonal matrix? Before you read the statement of the next theorem, you might study the eigenvalues and eigenvectors of Archetype B and compute the eigenvalues and eigenvectors of the matrix in Example SMS3.

Theorem DC. Diagonalization Characterization.

Suppose \(A\) is a square matrix of size \(n\text{.}\) Then \(A\) is diagonalizable if and only if there exists a linearly independent set \(S\) that contains \(n\) eigenvectors of \(A\text{.}\)

Proof.

(⇐)

Let \(S=\set{\vectorlist{x}{n}}\) be a linearly independent set of eigenvectors of \(A\) for the eigenvalues \(\scalarlist{\lambda}{n}\text{.}\) Recall Definition SUV and define

\begin{align*} R&=\matrixcolumns{x}{n} & D&= \begin{bmatrix} \lambda_1 & 0 & 0 &\cdots & 0\\ 0 &\lambda_2 & 0 &\cdots & 0\\ 0 & 0 &\lambda_3 &\cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & 0 &\cdots & \lambda_n \end{bmatrix} =[\lambda_1\vect{e}_1|\lambda_2\vect{e}_2|\lambda_3\vect{e}_3|\ldots|\lambda_n\vect{e}_n]\text{.} \end{align*}

The columns of \(R\) are the vectors of the linearly independent set \(S\) and so by Theorem NMLIC the matrix \(R\) is nonsingular. We have

\begin{align*} AR &=A\matrixcolumns{x}{n}\\ &=[A\vect{x}_1|A\vect{x}_2|A\vect{x}_3|\ldots|A\vect{x}_n]&& \text{Definition MM}\\ &=[\lambda_1\vect{x}_1|\lambda_2\vect{x}_2|\lambda_3\vect{x}_3|\ldots|\lambda_n\vect{x}_n]&& \text{Definition EEM}\\ &=[\lambda_1R\vect{e}_1|\lambda_2R\vect{e}_2|\lambda_3R\vect{e}_3|\ldots|\lambda_nR\vect{e}_n]&& \text{Definition MVP}\\ &=[R(\lambda_1\vect{e}_1)|R(\lambda_2\vect{e}_2)|R(\lambda_3\vect{e}_3)|\ldots|R(\lambda_n\vect{e}_n)]&& \text{Theorem MMSMM}\\ &=R[\lambda_1\vect{e}_1|\lambda_2\vect{e}_2|\lambda_3\vect{e}_3|\ldots|\lambda_n\vect{e}_n]&& \text{Definition MM}\\ &=RD\text{.} \end{align*}

This says that \(A\) is similar to the diagonal matrix \(D\) via the nonsingular matrix \(R\text{.}\) Thus \(A\) is diagonalizable (Definition DZM).

(⇒)

Suppose that \(A\) is diagonalizable, so there is a nonsingular matrix \(T\) of size \(n\) and a diagonal matrix \(E\) (recall Definition SUV)

\begin{align*} T&=\matrixcolumns{y}{n} & E&=\begin{bmatrix} d_1 & 0 & 0 &\cdots & 0\\ 0 &d_2 & 0 &\cdots & 0\\ 0 & 0 &d_3 &\cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & 0 &\cdots & d_n \end{bmatrix} =[d_1\vect{e}_1|d_2\vect{e}_2|d_3\vect{e}_3|\ldots|d_n\vect{e}_n]&&\text{} \end{align*}

such that \(AT=TE\text{.}\)

Then consider,

\begin{align*} [A\vect{y}_1|A\vect{y}_2|A\vect{y}_3&|\ldots|A\vect{y}_n]\\ &=A\matrixcolumns{y}{n}&& \text{Definition MM}\\ &=AT\\ &=TE&&\text{Definition SIM}\\ &=T[d_1\vect{e}_1|d_2\vect{e}_2|d_3\vect{e}_3|\ldots|d_n\vect{e}_n]\\ &=[T(d_1\vect{e}_1)|T(d_2\vect{e}_2)|T(d_3\vect{e}_3)|\ldots|T(d_n\vect{e}_n)]&& \text{Definition MM}\\ &=[d_1T\vect{e}_1|d_2T\vect{e}_2|d_3T\vect{e}_3|\ldots|d_nT\vect{e}_n]&& \text{Theorem MMSMM}\\ &=[d_1\vect{y}_1|d_2\vect{y}_2|d_3\vect{y}_3|\ldots|d_n\vect{y}_n]&& \text{Definition MVP}\text{.} \end{align*}

This equality of matrices (Definition ME) allows us to conclude that the individual columns are equal vectors (Definition CVE). That is, \(A\vect{y}_i=d_i\vect{y}_i\) for \(1\leq i\leq n\text{.}\) In other words, \(\vect{y}_i\) is an eigenvector of \(A\) for the eigenvalue \(d_i\text{,}\) \(1\leq i\leq n\text{.}\) (Why does \(\vect{y}_i\neq\zerovector\text{?}\)). Because \(T\) is nonsingular, the set containing \(T\)’s columns, \(S=\set{\vectorlist{y}{n}}\text{,}\) is a linearly independent set (Theorem NMLIC). So the set \(S\) has all the required properties.

Notice that the proof of Theorem DC is constructive. To diagonalize a matrix, we need only locate \(n\) linearly independent eigenvectors. Then we can construct a nonsingular matrix using the eigenvectors as columns (\(R\)) so that \(\inverse{R}AR\) is a diagonal matrix (\(D\)). The entries on the diagonal of \(D\) will be the eigenvalues of the eigenvectors used to create \(R\text{,}\) in the same order as the eigenvectors appear in \(R\text{.}\) We illustrate this by diagonalizing some matrices.
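As a small warm-up before the larger examples, consider an illustrative \(2\times 2\) matrix \(M\) (our own choice, not one of the text's examples). It has eigenvectors \(\colvector{1\\1}\) for \(\lambda=3\) and \(\colvector{1\\-1}\) for \(\lambda=1\text{.}\) Placing these in the columns of \(R\text{,}\) and the eigenvalues in the same order on the diagonal of \(D\text{,}\) the construction predicts \(MR=RD\text{.}\)

\begin{align*} M&=\begin{bmatrix}2&1\\1&2\end{bmatrix} & R&=\begin{bmatrix}1&1\\1&-1\end{bmatrix} & D&=\begin{bmatrix}3&0\\0&1\end{bmatrix}\\ MR&=\begin{bmatrix}3&1\\3&-1\end{bmatrix} & RD&=\begin{bmatrix}3&1\\3&-1\end{bmatrix}\text{.} \end{align*}

Since \(\detname{R}=-2\neq 0\text{,}\) the matrix \(R\) is nonsingular and \(\inverse{R}MR=D\text{.}\)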

Example DMS3. Diagonalizing a matrix of size 3.

Consider the matrix

\begin{equation*} F= \begin{bmatrix} -13 & -8 & -4\\ 12 & 7 & 4\\ 24 & 16 & 7 \end{bmatrix} \end{equation*}

of Example CPMS3, Example EMS3 and Example ESMS3. \(F\)’s eigenvalues and eigenspaces are

\begin{align*} \lambda&=3&\eigenspace{F}{3}&=\spn{\set{\colvector{-\frac{1}{2}\\\frac{1}{2}\\1}}}\\ \lambda&=-1&\eigenspace{F}{-1}&=\spn{\set{\colvector{-\frac{2}{3}\\1\\0},\,\colvector{-\frac{1}{3}\\0\\1}}}\text{.} \end{align*}

Define the matrix \(S\) to be the \(3\times 3\) matrix whose columns are the three basis vectors in the eigenspaces for \(F\text{,}\)

\begin{equation*} S= \begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3}\\ \frac{1}{2} & 1 & 0\\ 1 & 0 & 1 \end{bmatrix}\text{.} \end{equation*}

Check that \(S\) is nonsingular (row-reduces to the identity matrix, Theorem NMRRI, or has a nonzero determinant, Theorem SMZD). Then the three columns of \(S\) are a linearly independent set (Theorem NMLIC). By Theorem DC we now know that \(F\) is diagonalizable. Furthermore, the construction in the proof of Theorem DC tells us that if we apply the matrix \(S\) to \(F\) in a similarity transformation, the result will be a diagonal matrix with the eigenvalues of \(F\) on the diagonal. The eigenvalues appear on the diagonal of the matrix in the same order as the eigenvectors appear in \(S\text{.}\) So,

\begin{align*} \similar{F}{S}&= \inverse{ \begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3}\\ \frac{1}{2} & 1 & 0\\ 1 & 0 & 1 \end{bmatrix} } \begin{bmatrix} -13 & -8 & -4\\ 12 & 7 & 4\\ 24 & 16 & 7 \end{bmatrix} \begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3}\\ \frac{1}{2} & 1 & 0\\ 1 & 0 & 1 \end{bmatrix}\\ &= \begin{bmatrix} 6 & 4 & 2\\ -3 & -1 & -1\\ -6 & -4 & -1 \end{bmatrix} \begin{bmatrix} -13 & -8 & -4\\ 12 & 7 & 4\\ 24 & 16 & 7 \end{bmatrix} \begin{bmatrix} -\frac{1}{2} & -\frac{2}{3} & -\frac{1}{3}\\ \frac{1}{2} & 1 & 0\\ 1 & 0 & 1 \end{bmatrix}\\ &= \begin{bmatrix} 3 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & -1 \end{bmatrix}\text{.} \end{align*}

Note that the above computations can be viewed two ways. The proof of Theorem DC tells us that the four matrices (\(F\text{,}\) \(S\text{,}\) \(\inverse{S}\) and the diagonal matrix) will interact the way we have written the equation. Or as an example, we can actually perform the computations to verify what the theorem predicts.
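To perform the computations by machine, the sketch below follows the construction in the proof of Theorem DC for the matrix \(F\text{.}\) It is plain Python using exact fractions; the names `eigenpairs`, `matmul` and `determinant` are our own. It builds \(S\) from the eigenvectors, builds the diagonal matrix from the matching eigenvalues, and then tests \(FS=SD\) together with the nonsingularity of \(S\text{.}\)

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Product of two matrices stored as lists of rows (hypothetical helper)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def determinant(M):
    """Determinant by Gaussian elimination over exact fractions."""
    M = [[Fr(x) for x in row] for row in M]
    n, det = len(M), Fr(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fr(0)              # no pivot in this column: singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det                # a row swap flips the sign
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return det

# Matrix and eigenspace bases copied from Example DMS3.
F = [[-13, -8, -4], [12, 7, 4], [24, 16, 7]]
eigenpairs = [
    (3,  [Fr(-1, 2), Fr(1, 2), Fr(1)]),
    (-1, [Fr(-2, 3), Fr(1), Fr(0)]),
    (-1, [Fr(-1, 3), Fr(0), Fr(1)]),
]
n = len(F)
S = [[eigenpairs[j][1][i] for j in range(n)] for i in range(n)]  # eigenvectors as columns
D = [[eigenpairs[i][0] if i == j else 0 for j in range(n)] for i in range(n)]

columns_independent = determinant(S) != 0
diagonalizes = matmul(F, S) == matmul(S, D)
print(columns_independent, diagonalizes)
```

Reordering the entries of `eigenpairs` reorders the columns of \(S\) and the diagonal of \(D\) together, so the check still succeeds.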

The dimension of an eigenspace can be no larger than the algebraic multiplicity of the eigenvalue by Theorem ME. When every eigenvalue’s eigenspace is this large, then we can diagonalize the matrix, and only then. Three examples we have seen so far in this section, Example SMS3, Example DAB and Example DMS3, illustrate the diagonalization of a matrix, with varying degrees of detail about just how the diagonalization is achieved. However, in each case, you can verify that the geometric and algebraic multiplicities are equal for every eigenvalue. This is the substance of the next theorem.

Theorem DMFE. Diagonalizable Matrices have Full Eigenspaces.

Suppose \(A\) is a square matrix. Then \(A\) is diagonalizable if and only if \(\geomult{A}{\lambda}=\algmult{A}{\lambda}\) for every eigenvalue \(\lambda\) of \(A\text{.}\)

Proof.

Suppose \(A\) has size \(n\) and \(k\) distinct eigenvalues, \(\scalarlist{\lambda}{k}\text{.}\) Let \(S_i=\set{\vect{x}_{i1},\,\vect{x}_{i2},\,\vect{x}_{i3},\,\ldots,\,\vect{x}_{i\geomult{A}{\lambda_i}}}\text{,}\) denote a basis for the eigenspace of \(\lambda_i\text{,}\) \(\eigenspace{A}{\lambda_i}\text{,}\) for \(1\leq i\leq k\text{.}\) Then

\begin{equation*} S=S_1\cup S_2\cup S_3\cup\cdots\cup S_k \end{equation*}

is a set of eigenvectors for \(A\text{.}\) A vector cannot be an eigenvector for two different eigenvalues (see Exercise EE.T20) so \(S_i\cap S_j=\emptyset\) whenever \(i\neq j\text{.}\) In other words, \(S\) is a disjoint union of \(S_i\text{,}\) \(1\leq i\leq k\text{.}\)

(⇐)

The size of \(S\) is

\begin{align*} \card{S} &=\sum_{i=1}^k\geomult{A}{\lambda_i}&& S\text{ disjoint union of }S_i\\ &=\sum_{i=1}^k\algmult{A}{\lambda_i}&& \text{Hypothesis}\\ &=n&& \text{Theorem NEM}\text{.} \end{align*}

We next show that \(S\) is a linearly independent set. So we will begin with a relation of linear dependence on \(S\text{,}\) using doubly-subscripted scalars and eigenvectors,

\begin{align*} \zerovector= &\left(a_{11}\vect{x}_{11}+a_{12}\vect{x}_{12}+\cdots+a_{1\geomult{A}{\lambda_1}}\vect{x}_{1\geomult{A}{\lambda_1}}\right)+\\ &\left(a_{21}\vect{x}_{21}+a_{22}\vect{x}_{22}+\cdots+a_{2\geomult{A}{\lambda_2}}\vect{x}_{2\geomult{A}{\lambda_2}}\right)+\\ &\left(a_{31}\vect{x}_{31}+a_{32}\vect{x}_{32}+\cdots+a_{3\geomult{A}{\lambda_3}}\vect{x}_{3\geomult{A}{\lambda_3}}\right)+\\ &\quad\quad\vdots\\ &\left(a_{k1}\vect{x}_{k1}+a_{k2}\vect{x}_{k2}+\cdots+a_{k\geomult{A}{\lambda_k}}\vect{x}_{k\geomult{A}{\lambda_k}}\right)\text{.} \end{align*}

Define the vectors \(\vect{y}_i\text{,}\) \(1\leq i\leq k\) by

\begin{align*} \vect{y}_1&=\left(a_{11}\vect{x}_{11}+a_{12}\vect{x}_{12}+a_{13}\vect{x}_{13}+\cdots+a_{1\geomult{A}{\lambda_1}}\vect{x}_{1\geomult{A}{\lambda_1}}\right)\\ \vect{y}_2&=\left(a_{21}\vect{x}_{21}+a_{22}\vect{x}_{22}+a_{23}\vect{x}_{23}+\cdots+a_{2\geomult{A}{\lambda_2}}\vect{x}_{2\geomult{A}{\lambda_2}}\right)\\ \vect{y}_3&=\left(a_{31}\vect{x}_{31}+a_{32}\vect{x}_{32}+a_{33}\vect{x}_{33}+\cdots+a_{3\geomult{A}{\lambda_3}}\vect{x}_{3\geomult{A}{\lambda_3}}\right)\\ &\quad\quad\vdots\\ \vect{y}_k&=\left(a_{k1}\vect{x}_{k1}+a_{k2}\vect{x}_{k2}+a_{k3}\vect{x}_{k3}+\cdots+a_{k\geomult{A}{\lambda_k}}\vect{x}_{k\geomult{A}{\lambda_k}}\right)\text{.} \end{align*}

Then the relation of linear dependence becomes

\begin{align*} \zerovector&=\vect{y}_1+\vect{y}_2+\vect{y}_3+\cdots+\vect{y}_k\text{.} \end{align*}

Since the eigenspace \(\eigenspace{A}{\lambda_i}\) is closed under vector addition and scalar multiplication, \(\vect{y}_i\in\eigenspace{A}{\lambda_i}\text{,}\) \(1\leq i\leq k\text{.}\) Thus, for each \(i\text{,}\) the vector \(\vect{y}_i\) is an eigenvector of \(A\) for \(\lambda_i\text{,}\) or is the zero vector. Recall that sets of eigenvectors whose eigenvalues are distinct form a linearly independent set by Theorem EDELI. Should any (or some) \(\vect{y}_i\) be nonzero, the previous equation would provide a nontrivial relation of linear dependence on a set of eigenvectors with distinct eigenvalues, contradicting Theorem EDELI. Thus \(\vect{y}_i=\zerovector\text{,}\) \(1\leq i\leq k\text{.}\)

Each of the \(k\) equations, \(\vect{y}_i=\zerovector\text{,}\) is a relation of linear dependence on the corresponding set \(S_i\text{,}\) a set of basis vectors for the eigenspace \(\eigenspace{A}{\lambda_i}\text{,}\) which is therefore linearly independent. From these relations of linear dependence on linearly independent sets we conclude that the scalars are all zero, more precisely, \(a_{ij}=0\text{,}\) \(1\leq j\leq\geomult{A}{\lambda_i}\) for \(1\leq i\leq k\text{.}\) This establishes that the only relation of linear dependence on \(S\) is the trivial one, and hence \(S\) is a linearly independent set.

We have determined that \(S\) is a set of \(n\) linearly independent eigenvectors for \(A\text{,}\) and so by Theorem DC the matrix \(A\) is diagonalizable.

(⇒)

Now we assume that \(A\) is diagonalizable. Aiming for a contradiction (Proof Technique CD), suppose that there is at least one eigenvalue, say \(\lambda_t\text{,}\) such that \(\geomult{A}{\lambda_t}\neq\algmult{A}{\lambda_t}\text{.}\) By Theorem ME we must have \(\geomult{A}{\lambda_t}\lt\algmult{A}{\lambda_t}\text{,}\) and \(\geomult{A}{\lambda_i}\leq\algmult{A}{\lambda_i}\) for \(1\leq i\leq k\text{,}\) \(i\neq t\text{.}\)

Since \(A\) is diagonalizable, Theorem DC guarantees a set \(S\) of \(n\) linearly independent vectors, all of which are eigenvectors of \(A\text{.}\) Let \(n_i\) denote the number of eigenvectors in \(S\) that are eigenvectors for \(\lambda_i\text{,}\) and recall that a vector cannot be an eigenvector for two different eigenvalues (Exercise EE.T20). \(S\) is a linearly independent set, so the subset \(S_i\) containing the \(n_i\) eigenvectors for \(\lambda_i\) must also be linearly independent. Because the eigenspace \(\eigenspace{A}{\lambda_i}\) has dimension \(\geomult{A}{\lambda_i}\) and \(S_i\) is a linearly independent subset in \(\eigenspace{A}{\lambda_i}\text{,}\) Theorem G tells us that \(n_i\leq\geomult{A}{\lambda_i}\text{,}\) for \(1\leq i\leq k\text{.}\)

Putting all these facts together gives,

\begin{align*} n &=n_1+n_2+n_3+\cdots+n_t+\cdots+n_k&& \text{Definition SU}\\ &\leq\geomult{A}{\lambda_1}+\geomult{A}{\lambda_2}+\geomult{A}{\lambda_3}+\cdots+\geomult{A}{\lambda_t}+\cdots+\geomult{A}{\lambda_k}&& \text{Theorem G}\\ &\lt \algmult{A}{\lambda_1}+\algmult{A}{\lambda_2}+\algmult{A}{\lambda_3}+\cdots+\algmult{A}{\lambda_t}+\cdots+\algmult{A}{\lambda_k}&& \text{Theorem ME}\\ &=n&& \text{Theorem NEM}\text{.} \end{align*}

This is a contradiction (we cannot have \(n\lt n\text{!}\)) and so our assumption that some eigenspace had less than full dimension was false.

Example SEE, Example CASE, Example ESMS3, Example ESMS4, Example DEMS5, Archetype B, Archetype F, Archetype K and Archetype L are all examples of matrices that are diagonalizable and that illustrate Theorem DMFE. While we have provided many examples of matrices that are diagonalizable, especially among the archetypes, there are many matrices that are not diagonalizable. Here is one now.

Example NDMS4. A non-diagonalizable matrix of size 4.

In Example EMMS4 the matrix

\begin{equation*} B= \begin{bmatrix} -2 & 1 & -2 & -4\\ 12 & 1 & 4 & 9\\ 6 & 5 & -2 & -4\\ 3 & -4 & 5 & 10 \end{bmatrix} \end{equation*}

was determined to have characteristic polynomial

\begin{equation*} \charpoly{B}{x}=(x-1)(x-2)^3 \end{equation*}

and an eigenspace for \(\lambda=2\) of

\begin{equation*} \eigenspace{B}{2}=\spn{\set{\colvector{-\frac{1}{2}\\1\\-\frac{1}{2}\\1}}}\text{.} \end{equation*}

So the geometric multiplicity of \(\lambda=2\) is \(\geomult{B}{2}=1\text{,}\) while the algebraic multiplicity is \(\algmult{B}{2}=3\text{.}\) By Theorem DMFE, the matrix \(B\) is not diagonalizable.
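The multiplicity comparison in this example can be reproduced numerically. The sketch below, in plain Python with exact fractions, computes each geometric multiplicity as the size of the matrix minus the rank of \(B-\lambda I_4\text{.}\) The helper names `rank` and `geometric_multiplicity` are our own, and the algebraic multiplicities are simply read off from the factored characteristic polynomial quoted above rather than computed.

```python
from fractions import Fraction as Fr

def rank(M):
    """Rank via Gaussian elimination over exact fractions."""
    M = [[Fr(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                                   # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def geometric_multiplicity(M, lam):
    """Dimension of the null space of M - lam*I, computed as n - rank."""
    n = len(M)
    shifted = [[M[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

# Matrix copied from Example NDMS4.
B = [[-2, 1, -2, -4], [12, 1, 4, 9], [6, 5, -2, -4], [3, -4, 5, 10]]
algebraic = {1: 1, 2: 3}    # from the characteristic polynomial (x-1)(x-2)^3
geometric = {lam: geometric_multiplicity(B, lam) for lam in algebraic}
diagonalizable = all(geometric[lam] == algebraic[lam] for lam in algebraic)
print(geometric, diagonalizable)
```

By Theorem DMFE, a single eigenvalue whose two multiplicities disagree is enough to rule out diagonalizability, which is what the final `all(...)` test encodes.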

Archetype A is the lone archetype with a square matrix that is not diagonalizable, as the algebraic and geometric multiplicities of the eigenvalue \(\lambda=0\) differ. Example HMEM5 is another example of a matrix that cannot be diagonalized due to the difference between the geometric and algebraic multiplicities of \(\lambda=2\text{,}\) as is Example CEMS6 which has two complex eigenvalues, each with differing multiplicities. Likewise, Example EMMS4 has an eigenvalue with different algebraic and geometric multiplicities and so cannot be diagonalized.

Sage MD. Matrix Diagonalization.

The third way to get eigenvectors is the matrix method .eigenmatrix_right() (and the analogous .eigenmatrix_left()). It always returns two square matrices of the same size as the original matrix. The first matrix of the output is a diagonal matrix with the eigenvalues of the matrix filling the diagonal entries of the matrix. The second matrix has eigenvectors in the columns, in the same order as the corresponding eigenvalues. For a single eigenvalue, these columns/eigenvectors form a linearly independent set.

A careful reading of the previous paragraph suggests the question: what if we do not have enough eigenvectors to fill the columns of the second square matrix? When the geometric multiplicity does not equal the algebraic multiplicity, the deficit is met by inserting zero columns in the matrix of eigenvectors. Conversely, when the matrix is diagonalizable, by Theorem DMFE the geometric and algebraic multiplicities of each eigenvalue are equal, and the union of the bases of the eigenspaces provides a complete set of linearly independent vectors. So for a matrix \(A\text{,}\) Sage will output two matrices, \(D\) and \(S\) such that \(\inverse{S}AS=D\text{.}\)

We can rewrite the relation above as \(AS=SD\text{.}\) In the case of a non-diagonalizable matrix, the matrix of eigenvectors is singular (it has zero columns), but the relationship \(AS=SD\) still holds. Here are examples of the two scenarios, along with demonstrations of the matrix method .is_diagonalizable().

Now for a matrix that is far from diagonalizable.

Theorem DED. Distinct Eigenvalues implies Diagonalizable.

Suppose \(A\) is a square matrix of size \(n\) with \(n\) distinct eigenvalues. Then \(A\) is diagonalizable.

Proof.

If we collect a single eigenvector of \(A\) for each eigenvalue, then we will have a set of \(n\) vectors that Theorem EDELI guarantees is a linearly independent set. Then Theorem DC implies that \(A\) is diagonalizable.

Example DEHD. Distinct eigenvalues, hence diagonalizable.

In Example DEMS5 the matrix

\begin{equation*} H= \begin{bmatrix} 15 & 18 & -8 & 6 & -5\\ 5 & 3 & 1 & -1 & -3\\ 0 & -4 & 5 & -4 & -2\\ -43 & -46 & 17 & -14 & 15\\ 26 & 30 & -12 & 8 & -10 \end{bmatrix} \end{equation*}

has characteristic polynomial

\begin{equation*} \charpoly{H}{x}=x(x-2)(x-1)(x+1)(x+3) \end{equation*}

and so is a \(5\times 5\) matrix with 5 distinct eigenvalues.

By Theorem DED we know \(H\) must be diagonalizable. But just for practice, we exhibit a diagonalization. The matrix \(S\) contains eigenvectors of \(H\) as columns, one from each eigenspace, guaranteeing linearly independent columns and thus the nonsingularity of \(S\text{.}\) Notice that we are using the versions of the eigenvectors from Example DEMS5 that have integer entries. The diagonal matrix has the eigenvalues of \(H\) in the same order that their respective eigenvectors appear as the columns of \(S\text{.}\) With these matrices, verify computationally that \(\similar{H}{S}=D\text{.}\)

\begin{align*} S&= \begin{bmatrix} 2 & 1 & -1 & 1 & 1\\ -1 & 0 & 2 & 0 & -1\\ -2 & 0 & 2 & -1 & -2\\ -4 & -1 & 0 & -2 & -1\\ 2 & 2 & 1 & 2 & 1 \end{bmatrix} &D&= \begin{bmatrix} -3 & 0 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 2 \end{bmatrix}\text{.} \end{align*}

Note that there are many different ways to diagonalize \(H\text{.}\) We could replace eigenvectors by nonzero scalar multiples, or we could rearrange the order of the eigenvectors as the columns of \(S\) (which would subsequently reorder the eigenvalues along the diagonal of \(D\)).
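One way to carry out the suggested verification without computing an inverse is to check \(HS=SD\text{,}\) which is equivalent to \(\similar{H}{S}=D\) once \(S\) is known to be nonsingular (taken here from the argument in the text). The sketch below, in plain Python with a hypothetical helper `matmul`, also tries a reordering and a rescaling of the eigenvector columns.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def diag(values):
    """Diagonal matrix with the given entries on the diagonal."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

H = [[ 15,  18,  -8,   6,  -5],
     [  5,   3,   1,  -1,  -3],
     [  0,  -4,   5,  -4,  -2],
     [-43, -46,  17, -14,  15],
     [ 26,  30, -12,   8, -10]]
S = [[ 2,  1, -1,  1,  1],
     [-1,  0,  2,  0, -1],
     [-2,  0,  2, -1, -2],
     [-4, -1,  0, -2, -1],
     [ 2,  2,  1,  2,  1]]
eigenvalues = [-3, -1, 0, 1, 2]
D = diag(eigenvalues)

# The diagonalization from the example, written as H S = S D.
original_ok = matmul(H, S) == matmul(S, D)

# Reverse the order of the eigenvector columns and of the eigenvalues.
S_rev = [row[::-1] for row in S]
D_rev = diag(eigenvalues[::-1])
reordered_ok = matmul(H, S_rev) == matmul(S_rev, D_rev)

# Replace the first eigenvector by a nonzero scalar multiple; D is unchanged.
S_scaled = [[5 * row[0]] + row[1:] for row in S]
rescaled_ok = matmul(H, S_scaled) == matmul(S_scaled, D)
```

All three checks succeed, matching the remark that a diagonalization of \(H\) is far from unique.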

Archetype B is another example of a matrix that has as many distinct eigenvalues as its size, and is hence diagonalizable by Theorem DED.

Powers of a diagonal matrix are easy to compute, and when a matrix is diagonalizable, it is almost as easy. We could state a theorem here perhaps, but we will settle instead for an example that makes the point just as well.

Example HPDM. High power of a diagonalizable matrix.

Suppose that

\begin{equation*} A=\begin{bmatrix} 19 & 0 & 6 & 13 \\ -33 & -1 & -9 & -21 \\ 21 & -4 & 12 & 21 \\ -36 & 2 & -14 & -28 \end{bmatrix} \end{equation*}

and we wish to compute \(A^{20}\text{.}\) Normally this would require 19 matrix multiplications, but since \(A\) is diagonalizable, we can simplify the computations substantially.

First, we diagonalize \(A\text{.}\) With

\begin{equation*} S=\begin{bmatrix} 1 & -1 & 2 & -1 \\ -2 & 3 & -3 & 3 \\ 1 & 1 & 3 & 3 \\ -2 & 1 & -4 & 0 \end{bmatrix} \end{equation*}

we find

\begin{align*} D&=\similar{A}{S}\\ &= \begin{bmatrix} -6 & 1 & -3 & -6 \\ 0 & 2 & -2 & -3 \\ 3 & 0 & 1 & 2 \\ -1 & -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 19 & 0 & 6 & 13 \\ -33 & -1 & -9 & -21 \\ 21 & -4 & 12 & 21 \\ -36 & 2 & -14 & -28 \end{bmatrix} \begin{bmatrix} 1 & -1 & 2 & -1 \\ -2 & 3 & -3 & 3 \\ 1 & 1 & 3 & 3 \\ -2 & 1 & -4 & 0 \end{bmatrix}\\ &= \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\text{.} \end{align*}

Now we find an alternate expression for \(A^{20}\text{,}\)

\begin{align*} A^{20} &=AAA\ldots A\\ &=I_nAI_nAI_nAI_n\ldots I_nAI_n\\ &=\left(S\inverse{S}\right)A\left(S\inverse{S}\right)A\left(S\inverse{S}\right)A\left(S\inverse{S}\right)\ldots \left(S\inverse{S}\right)A\left(S\inverse{S}\right)\\ &=S\left(\inverse{S}AS\right)\left(\inverse{S}AS\right)\left(\inverse{S}AS\right)\ldots \left(\inverse{S}AS\right)\inverse{S}\\ &=SDDD\ldots D\inverse{S}\\ &=SD^{20}\inverse{S} \end{align*}

and since \(D\) is a diagonal matrix, powers are much easier to compute,

\begin{align*} &= S \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^{20} \inverse{S}\\ &= S \begin{bmatrix} (-1)^{20} & 0 & 0 & 0 \\ 0 & (0)^{20} & 0 & 0 \\ 0 & 0 & (2)^{20} & 0 \\ 0 & 0 & 0 & (1)^{20} \end{bmatrix} \inverse{S}\\ &= \begin{bmatrix} 1 & -1 & 2 & -1 \\ -2 & 3 & -3 & 3 \\ 1 & 1 & 3 & 3 \\ -2 & 1 & -4 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1048576 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} -6 & 1 & -3 & -6 \\ 0 & 2 & -2 & -3 \\ 3 & 0 & 1 & 2 \\ -1 & -1 & 1 & 1 \end{bmatrix}\\ &= \begin{bmatrix} 6291451 & 2 & 2097148 & 4194297 \\ -9437175 & -5 & -3145719 & -6291441 \\ 9437175 & -2 & 3145728 & 6291453 \\ -12582900 & -2 & -4194298 & -8388596 \end{bmatrix}\text{.} \end{align*}

Notice how we effectively replaced the twentieth power of \(A\) by the twentieth power of \(D\text{,}\) and how a high power of a diagonal matrix is just a collection of powers of scalars on the diagonal. The price we pay for this simplification is the need to diagonalize the matrix (by computing eigenvalues and eigenvectors) and to find the inverse of the matrix of eigenvectors. And we still need to do two matrix products. But the higher the power, the greater the savings.
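The two routes to \(A^{20}\) can be compared directly. The following plain-Python sketch uses the matrices \(A\text{,}\) \(S\) and \(\inverse{S}\) exactly as printed in this example; the helper names `matmul` and `diag` are assumptions introduced for the illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def diag(values):
    """Diagonal matrix with the given entries on the diagonal."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

A = [[ 19,  0,   6,  13],
     [-33, -1,  -9, -21],
     [ 21, -4,  12,  21],
     [-36,  2, -14, -28]]
S = [[ 1, -1,  2, -1],
     [-2,  3, -3,  3],
     [ 1,  1,  3,  3],
     [-2,  1, -4,  0]]
S_inv = [[-6,  1, -3, -6],
         [ 0,  2, -2, -3],
         [ 3,  0,  1,  2],
         [-1, -1,  1,  1]]
d = [-1, 0, 2, 1]   # diagonal entries of D

# Route 1: nineteen matrix multiplications.
direct = A
for _ in range(19):
    direct = matmul(direct, A)

# Route 2: twenty scalar powers, then two matrix products.
D20 = diag([x ** 20 for x in d])
via_diagonalization = matmul(matmul(S, D20), S_inv)

routes_agree = direct == via_diagonalization
```

Both routes produce the matrix displayed above, and only the second route has a cost that stays essentially fixed as the exponent grows.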

Subsection FS Fibonacci Sequences

Example FSCF. Fibonacci sequence, closed form.

The Fibonacci sequence is a sequence of integers defined recursively by

\begin{align*} a_0&=0 & a_1&=1 & a_{n+1}&=a_n+a_{n-1},\quad n\geq 1\text{.} \end{align*}

So the initial portion of the sequence is \(0,\,1,\,1,\,2,\,3,\,5,\,8,\,13,\,21,\,\ldots\text{.}\) In this subsection we will illustrate an application of eigenvalues and diagonalization through the determination of a closed-form expression for an arbitrary term of this sequence.

To begin, verify that for any \(n\geq 1\) the recursive statement above establishes the truth of the statement

\begin{align*} \colvector{a_n\\a_{n+1}} &= \begin{bmatrix}0&1\\1&1\end{bmatrix} \colvector{a_{n-1}\\a_n}\text{.} \end{align*}

Let \(A\) denote this \(2\times 2\) matrix. Through repeated applications of the statement above we have

\begin{align*} \colvector{a_n\\a_{n+1}} & =A\colvector{a_{n-1}\\a_n} =A^2\colvector{a_{n-2}\\a_{n-1}} =A^3\colvector{a_{n-3}\\a_{n-2}} =\cdots =A^n\colvector{a_{0}\\a_{1}}\text{.} \end{align*}
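The chain of equalities above can be tried out numerically. This is a small plain-Python sketch, with hypothetical helper names `matmul2`, `power` and `fib_pair`, that forms \(A^n\) by repeated multiplication and applies it to the initial vector with entries \(a_0=0\) and \(a_1=1\text{.}\)

```python
def matmul2(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0, 1],
     [1, 1]]

def power(M, n):
    """M^n for n >= 0 by repeated multiplication, starting from the identity."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = matmul2(result, M)
    return result

def fib_pair(n):
    """Return (a_n, a_{n+1}) as A^n applied to the column vector (a_0, a_1) = (0, 1)."""
    P = power(A, n)
    a0, a1 = 0, 1
    return (P[0][0] * a0 + P[0][1] * a1, P[1][0] * a0 + P[1][1] * a1)

first_terms = [fib_pair(n)[0] for n in range(9)]
```

The list `first_terms` reproduces the initial portion of the sequence given at the start of this example.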

In preparation for working with this high power of \(A\text{,}\) not unlike in Example HPDM, we will diagonalize \(A\text{.}\) The two distinct eigenvalues of \(A\) arise as roots of the polynomial \(x^2-x-1\text{,}\) and are

\begin{align*} \rho&=\frac{1+\sqrt{5}}{2} & \delta&=\frac{1-\sqrt{5}}{2}\text{.} \end{align*}

With two distinct eigenvalues, Theorem DED implies that \(A\) is diagonalizable. It will be easier to compute with these eigenvalues once you confirm the following properties (all but the last can be derived from the fact that \(\rho\) and \(\delta\) are roots of the polynomial, in a factored or unfactored form)

\begin{align*} \rho+\delta&=1 & \rho\delta&=-1 & 1+\rho&=\rho^2 & 1+\delta&=\delta^2 & \rho-\delta&=\sqrt{5}\text{.} \end{align*}

Then eigenvectors of \(A\) (for \(\rho\) and \(\delta\text{,}\) respectively) are

\begin{align*} &\colvector{1\\\rho} & &\colvector{1\\\delta} \end{align*}

which can be easily confirmed, as we demonstrate for the eigenvector for \(\rho\text{,}\)

\begin{align*} \begin{bmatrix}0&1\\1&1\end{bmatrix}\colvector{1\\\rho} & =\colvector{\rho\\1+\rho} =\colvector{\rho\\\rho^2} =\rho\colvector{1\\\rho}\text{.} \end{align*}

From the proof of Theorem DC we know \(A\) can be diagonalized by a matrix \(S\) with these eigenvectors as columns, giving \(D=\inverse{S}AS\text{.}\) We list \(S\text{,}\) \(\inverse{S}\) and the diagonal matrix \(D\text{,}\)

\begin{align*} S&=\begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} & \inverse{S}&=\frac{1}{\rho-\delta}\begin{bmatrix}-\delta&1\\\rho&-1\end{bmatrix} & D&=\begin{bmatrix}\rho&0\\0&\delta\end{bmatrix}\text{.} \end{align*}

OK, we have everything in place now. The main step in the following is to replace \(A\) by \(SD\inverse{S}\text{.}\) Here we go,

\begin{align*} \colvector{a_n\\a_{n+1}} &=A^n\colvector{a_{0}\\a_{1}}\\ &=\left(SD\inverse{S}\right)^n\colvector{a_{0}\\a_{1}}\\ &=SD\inverse{S}SD\inverse{S}SD\inverse{S}\cdots SD\inverse{S}\colvector{a_{0}\\a_{1}}\\ &=SDDD\cdots D\inverse{S}\colvector{a_{0}\\a_{1}}\\ &=SD^n\inverse{S}\colvector{a_{0}\\a_{1}}\\ &= \begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} \begin{bmatrix}\rho&0\\0&\delta\end{bmatrix}^n \frac{1}{\rho-\delta}\begin{bmatrix}-\delta&1\\\rho&-1\end{bmatrix} \colvector{a_{0}\\a_{1}}\\ &= \frac{1}{\rho-\delta} \begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} \begin{bmatrix}\rho^n&0\\0&\delta^n\end{bmatrix} \begin{bmatrix}-\delta&1\\\rho&-1\end{bmatrix} \colvector{0\\1}\\ &= \frac{1}{\rho-\delta} \begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} \begin{bmatrix}\rho^n&0\\0&\delta^n\end{bmatrix} \colvector{1\\-1}\\ &= \frac{1}{\rho-\delta} \begin{bmatrix}1&1\\\rho&\delta\end{bmatrix} \colvector{\rho^n\\-\delta^n}\\ &= \frac{1}{\rho-\delta} \colvector{\rho^n-\delta^n\\\rho^{n+1}-\delta^{n+1}}\text{.} \end{align*}

Performing the scalar multiplication and equating the first entries of the two vectors, we arrive at the closed form expression

\begin{align*} a_n&=\frac{1}{\rho-\delta}\left(\rho^n-\delta^n\right)\\ &=\frac{1}{\sqrt{5}} \left(\left(\frac{1+\sqrt{5}}{2}\right)^n-\left(\frac{1-\sqrt{5}}{2}\right)^n\right)\\ &=\frac{1}{2^n\sqrt{5}} \left(\left(1+\sqrt{5}\right)^n-\left(1-\sqrt{5}\right)^n\right)\text{.} \end{align*}

Notice that it does not matter whether we use the equality of the first or second entries of the vectors; we will arrive at the same formula, once in terms of \(n\) and again in terms of \(n+1\text{.}\) Also, our definition clearly describes a sequence that will only contain integers, yet the presence of the irrational number \(\sqrt{5}\) might make us suspicious. But no, our expression for \(a_n\) will always yield an integer!
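As a sanity check of the closed form, the sketch below compares it against the recursive definition. It is written in plain Python with floating-point arithmetic, so the closed form is only evaluated approximately and the comparison rounds to the nearest integer; the function names and the range of \(n\) tested are assumptions made for the illustration.

```python
from math import sqrt

def fib_recursive(n):
    """a_n from the recursive definition a_0 = 0, a_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed(n):
    """Floating-point evaluation of (rho^n - delta^n) / (rho - delta)."""
    rho = (1 + sqrt(5)) / 2
    delta = (1 - sqrt(5)) / 2
    return (rho ** n - delta ** n) / (rho - delta)

# Rounding absorbs the small floating-point error for modest n.
agree = all(round(fib_closed(n)) == fib_recursive(n) for n in range(30))
```

For much larger \(n\) the floating-point error would eventually exceed one half, so an exact comparison would need exact arithmetic with \(\sqrt{5}\) rather than this approximation.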

The Fibonacci sequence, and generalizations of it, have been extensively studied (Fibonacci lived in the 12th and 13th centuries). There are many ways to derive the closed-form expression we just found, and our approach may not be the most efficient route. But it is a nice demonstration of how diagonalization can be used to solve a problem outside the field of linear algebra.

We close this section with a comment about an important upcoming theorem that we prove in Chapter R. A consequence of Theorem OD is that every Hermitian matrix (Definition HM) is diagonalizable (Definition DZM), and the similarity transformation that accomplishes the diagonalization uses a unitary matrix (Definition UM). This means that for every Hermitian matrix of size \(n\) there is a basis of \(\complex{n}\) that is composed entirely of eigenvectors for the matrix and also forms an orthonormal set (Definition ONS). Notice that for matrices with only real entries, we only need the hypothesis that the matrix is symmetric (Definition SYM) to reach this conclusion (Example ESMS4). Can you imagine a prettier basis for use with a matrix? I cannot.

These results in Section OD explain much of our recurring interest in orthogonality, and make the section a high point in your study of linear algebra. A precise statement of this diagonalization result applies to a slightly broader class of matrices, known as “normal” matrices (Definition NRML), which are matrices that commute with their adjoints. With this expanded category of matrices, the result becomes an equivalence (Proof Technique E). See Theorem OD and Theorem OBNM in Section OD for all the details.

Reading Questions SD Reading Questions

1. Equivalence Relation.

What is an equivalence relation?

2. Characterization of Diagonalizable.

State a condition that is equivalent to a matrix being diagonalizable, but is not the definition.

3. Calculate a Diagonalization.

Find a diagonal matrix similar to

\begin{equation*} A=\begin{bmatrix} -5 & 8\\-4 & 7 \end{bmatrix}\text{.} \end{equation*}

Exercises SD Exercises

C20.

Consider the matrix \(A\) below. First, show that \(A\) is diagonalizable by computing the geometric multiplicities of the eigenvalues and quoting the relevant theorem. Second, find a diagonal matrix \(D\) and a nonsingular matrix \(S\) so that \(\similar{A}{S}=D\text{.}\) (See Exercise EE.C21 for some of the necessary computations.)

\begin{equation*} A= \begin{bmatrix} 18 & -15 & 33 & -15\\ -4 & 8 & -6 & 6\\ -9 & 9 & -16 & 9\\ 5 & -6 & 9 & -4 \end{bmatrix} \end{equation*}

Solution.

Using a calculator, we find that \(A\) has three distinct eigenvalues, \(\lambda=3,\,2,\,-1\text{,}\) with \(\lambda=2\) having algebraic multiplicity two, \(\algmult{A}{2}=2\text{.}\) The eigenvalues \(\lambda=3,\,-1\) have algebraic multiplicity one, and so by Theorem ME we can conclude that their geometric multiplicities are one as well. Together with the computation of the geometric multiplicity of \(\lambda=2\) from Exercise EE.C21, we know

\begin{align*} \geomult{A}{3}&=\algmult{A}{3}=1& \geomult{A}{2}&=\algmult{A}{2}=2& \geomult{A}{-1}&=\algmult{A}{-1}=1\text{.} \end{align*}

This satisfies the hypotheses of Theorem DMFE, and so we can conclude that \(A\) is diagonalizable.

A calculator will give us four eigenvectors of \(A\text{,}\) the two for \(\lambda=2\) being linearly independent presumably. Or, by hand, we could find basis vectors for the three eigenspaces. For \(\lambda=3,\,-1\) the eigenspaces have dimension one, and so any eigenvector for one of these eigenvalues will be a multiple of the one we use below. For \(\lambda=2\) there are many different bases for the eigenspace, so your answer could vary. Our eigenvectors are the basis vectors we would have obtained if we had actually constructed a basis in Exercise EE.C21 rather than just computing the dimension.

By the construction in the proof of Theorem DC, the required matrix \(S\) has columns that are four linearly independent eigenvectors of \(A\) and the diagonal matrix has the eigenvalues on the diagonal (in the same order as the eigenvectors in \(S\)). Here are the pieces, “doing” the diagonalization,

\begin{equation*} \inverse{ \begin{bmatrix} -1 & 0 & -3 & 6\\ -2 & -1 & -1 & 0\\ 0 & 0 & 1 & -3\\ 1 & 1 & 0 & 1 \end{bmatrix} } \begin{bmatrix} 18 & -15 & 33 & -15\\ -4 & 8 & -6 & 6\\ -9 & 9 & -16 & 9\\ 5 & -6 & 9 & -4 \end{bmatrix} \begin{bmatrix} -1 & 0 & -3 & 6\\ -2 & -1 & -1 & 0\\ 0 & 0 & 1 & -3\\ 1 & 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & -1 \end{bmatrix}\text{.} \end{equation*}

C21.

Determine if the matrix \(A\) below is diagonalizable. If the matrix is diagonalizable, then find a diagonal matrix \(D\) that is similar to \(A\text{,}\) and provide the invertible matrix \(S\) that performs the similarity transformation. You should use your calculator to find the eigenvalues of the matrix, but try only using the row-reducing function of your calculator to assist with finding eigenvectors.

\begin{equation*} A= \begin{bmatrix} 1 & 9 & 9 & 24 \\ -3 & -27 & -29 & -68 \\ 1 & 11 & 13 & 26 \\ 1 & 7 & 7 & 18 \end{bmatrix} \end{equation*}

Solution.

A calculator will provide the eigenvalues \(\lambda=2,\,2,\,1,\,0\text{,}\) so we can reconstruct the characteristic polynomial as

\begin{equation*} \charpoly{A}{x}=(x-2)^2(x-1)x \end{equation*}

so the algebraic multiplicities of the eigenvalues are

\begin{align*} \algmult{A}{2}&=2& \algmult{A}{1}&=1& \algmult{A}{0}&=1\text{.} \end{align*}

Now compute eigenspaces by hand, obtaining null spaces for each of the three eigenvalues by constructing the correct singular matrix (Theorem EMNS),

\begin{align*} A-2I_4&= \begin{bmatrix} -1 & 9 & 9 & 24 \\ -3 & -29 & -29 & -68 \\ 1 & 11 & 11 & 26 \\ 1 & 7 & 7 & 16 \end{bmatrix} \rref \begin{bmatrix} 1 & 0 & 0 & -\frac{3}{2} \\ 0 & 1 & 1 & \frac{5}{2} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{2}&=\nsp{A-2I_4} =\spn{\set{\colvector{\frac{3}{2}\\-\frac{5}{2}\\0\\1},\,\colvector{0\\-1\\1\\0}}} =\spn{\set{\colvector{3\\-5\\0\\2},\,\colvector{0\\-1\\1\\0}}}\\ A-1I_4&= \begin{bmatrix} 0 & 9 & 9 & 24 \\ -3 & -28 & -29 & -68 \\ 1 & 11 & 12 & 26 \\ 1 & 7 & 7 & 17 \end{bmatrix} \rref \begin{bmatrix} 1 & 0 & 0 & -\frac{5}{3} \\ 0 & 1 & 0 & \frac{13}{3} \\ 0 & 0 & 1 & -\frac{5}{3} \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{1}&=\nsp{A-I_4} =\spn{\set{\colvector{\frac{5}{3}\\-\frac{13}{3}\\\frac{5}{3}\\1}}} =\spn{\set{\colvector{5\\-13\\5\\3}}}\\ A-0I_4&= \begin{bmatrix} 1 & 9 & 9 & 24 \\ -3 & -27 & -29 & -68 \\ 1 & 11 & 13 & 26 \\ 1 & 7 & 7 & 18 \end{bmatrix} \rref \begin{bmatrix} 1 & 0 & 0 & -3 \\ 0 & 1 & 0 & 5 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{0}&=\nsp{A}=\spn{\set{\colvector{3\\-5\\2\\1}}}\text{.} \end{align*}

From this we can compute the dimensions of the eigenspaces to obtain the geometric multiplicities,

\begin{align*} \geomult{A}{2}&=2& \geomult{A}{1}&=1& \geomult{A}{0}&=1\text{.} \end{align*}

For each eigenvalue, the algebraic and geometric multiplicities are equal and so by Theorem DMFE we now know that \(A\) is diagonalizable. The construction in Theorem DC suggests we form a matrix whose columns are eigenvectors of \(A\)

\begin{equation*} S= \begin{bmatrix} 3 & 0 & 5 & 3 \\ -5 & -1 & -13 & -5 \\ 0 & 1 & 5 & 2 \\ 2 & 0 & 3 & 1 \end{bmatrix}\text{.} \end{equation*}

Since \(\detname{S}=-1\neq 0\text{,}\) we know that \(S\) is nonsingular (Theorem SMZD), so the columns of \(S\) are a set of 4 linearly independent eigenvectors of \(A\text{.}\) By the proof of Theorem DC we know

\begin{equation*} \similar{A}{S}= \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \end{equation*}

is a diagonal matrix with the eigenvalues of \(A\) along the diagonal, in the same order as the associated eigenvectors appear as columns of \(S\text{.}\)
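Both claims in this solution, the value of the determinant and the diagonalization itself, can be checked with a short computation. The plain-Python sketch below computes the determinant by cofactor expansion along the first row (a method chosen here for brevity, not one prescribed by the text) and checks the equivalent relation \(AS=SD\text{.}\)

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small sizes)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

A = [[ 1,   9,   9,  24],
     [-3, -27, -29, -68],
     [ 1,  11,  13,  26],
     [ 1,   7,   7,  18]]
S = [[ 3,  0,   5,  3],
     [-5, -1, -13, -5],
     [ 0,  1,   5,  2],
     [ 2,  0,   3,  1]]
D = [[2, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0]]

det_S = det(S)                               # nonzero means S is nonsingular
diagonalizes = matmul(A, S) == matmul(S, D)  # equivalent to inverse(S) A S = D
```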

C22.

Consider the matrix \(A\) below. Find the eigenvalues of \(A\) using a calculator and use these to construct the characteristic polynomial of \(A\text{,}\) \(\charpoly{A}{x}\text{.}\) State the algebraic multiplicity of each eigenvalue. Find all of the eigenspaces for \(A\) by computing expressions for null spaces, only using your calculator to row-reduce matrices. State the geometric multiplicity of each eigenvalue. Is \(A\) diagonalizable? If not, explain why. If so, find a diagonal matrix \(D\) that is similar to \(A\text{.}\)

\begin{equation*} A= \begin{bmatrix} 19 & 25 & 30 & 5 \\ -23 & -30 & -35 & -5 \\ 7 & 9 & 10 & 1 \\ -3 & -4 & -5 & -1 \end{bmatrix} \end{equation*}

Solution.

A calculator will report \(\lambda=0\) as an eigenvalue of algebraic multiplicity 2, and \(\lambda=-1\) as an eigenvalue of algebraic multiplicity 2 as well. Since eigenvalues are roots of the characteristic polynomial (Theorem EMRCP) we have the factored version

\begin{equation*} \charpoly{A}{x}=(x-0)^2(x-(-1))^2=x^2(x^2+2x+1)=x^4+2x^3+x^2\text{.} \end{equation*}

The eigenspaces are then

\begin{align*} \lambda&=0\\ A-(0)I_4&= \begin{bmatrix} 19 & 25 & 30 & 5 \\ -23 & -30 & -35 & -5 \\ 7 & 9 & 10 & 1 \\ -3 & -4 & -5 & -1 \end{bmatrix} \rref \begin{bmatrix} \leading{1} & 0 & -5 & -5 \\ 0 & \leading{1} & 5 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{0}&=\nsp{A-(0)I_4}= \spn{\set{\colvector{5\\-5\\1\\0},\,\colvector{5\\-4\\0\\1}}}\\ \lambda&=-1\\ A-(-1)I_4&= \begin{bmatrix} 20 & 25 & 30 & 5 \\ -23 & -29 & -35 & -5 \\ 7 & 9 & 11 & 1 \\ -3 & -4 & -5 & 0 \end{bmatrix} \rref \begin{bmatrix} \leading{1} & 0 & -1 & 4 \\ 0 & \leading{1} & 2 & -3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\\ \eigenspace{A}{-1}&=\nsp{A-(-1)I_4}= \spn{\set{\colvector{1\\-2\\1\\0},\,\colvector{-4\\3\\0\\1}}}\text{.} \end{align*}

Each eigenspace above is described by a spanning set obtained through an application of Theorem BNS and so is a basis for the eigenspace. In each case the dimension, and therefore the geometric multiplicity, is 2.

For each of the two eigenvalues, the algebraic and geometric multiplicities are equal. Theorem DMFE says that in this situation the matrix is diagonalizable. We know from Theorem DC that when we diagonalize \(A\) the diagonal matrix will have the eigenvalues of \(A\) on the diagonal (in some order). So we can claim that

\begin{equation*} D= \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}\text{.} \end{equation*}

T15.

Suppose that \(A\) and \(B\) are similar matrices of size \(n\text{.}\) Prove that \(A^3\) and \(B^3\) are similar matrices. Generalize.

Solution.

By Definition SIM we know that there is a nonsingular matrix \(S\) so that \(AS=SB\text{.}\) Then

\begin{equation*} A^3S = A^2AS = A^2SB = AASB = ASBB = ASB^2 = SBB^2 = SB^3 \end{equation*}

So \(A^3\) is similar to \(B^3\) (via the matrix \(S\)).

More generally, if \(A\) is similar to \(B\text{,}\) and \(m\) is a non-negative integer, then \(A^m\) is similar to \(B^m\text{.}\) This can be proved carefully using induction (Proof Technique I).
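One way the induction might be organized is sketched here, using the form \(AS=SB\) of Definition SIM. For the base case \(m=0\text{,}\) both powers are the identity matrix, and \(I_nS=SI_n\text{.}\) For the induction step, assume \(A^mS=SB^m\) for some \(m\geq 0\text{.}\) Then

\begin{align*} A^{m+1}S &=A\left(A^mS\right)\\ &=A\left(SB^m\right)\\ &=\left(AS\right)B^m\\ &=\left(SB\right)B^m\\ &=SB^{m+1} \end{align*}

so \(A^{m+1}\) is similar to \(B^{m+1}\) via the same matrix \(S\text{.}\)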

T16.

Suppose that \(A\) and \(B\) are similar matrices, with \(A\) nonsingular. Prove that \(B\) is nonsingular, and that \(\inverse{A}\) is similar to \(\inverse{B}\text{.}\)

Solution.

There is a nonsingular matrix \(S\) such that \(AS=SB\text{.}\) With our hypothesis that \(A\) is nonsingular, Theorem NPNF says \(AS\) is nonsingular. Then \(SB\) is nonsingular, and the “other half” of Theorem NPNF says \(B\) is nonsingular.

With \(B\) nonsingular, Theorem NI allows us to employ \(\inverse{B}\text{.}\) Use Theorem SS twice to see

\begin{equation*} \inverse{B}\inverse{S} = \inverse{(SB)} = \inverse{(AS)} = \inverse{S}\inverse{A} \end{equation*}

So by Definition SIM, \(\inverse{A}\) is similar to \(\inverse{B}\text{.}\)

T17.

Suppose that \(B\) is a nonsingular matrix. Prove that \(AB\) is similar to \(BA\text{.}\)

Solution.

The nonsingular matrix \(B\) will provide the desired similarity transformation,

\begin{equation*} \left(BA\right)B = B\left(AB\right)\text{.} \end{equation*}

Done. That was almost too easy!
