Suggested Solution to Homework 6

1

1.1

Let \(v_i\) be the \(i\)-th column of \(A\).

\begin{aligned} v_1 & = (1, -1, 0)^T \\ v_2 & = (8, 4, 2)^T \\ v_3 & = (0, \sqrt{2}, 0)^T \\ v_4 & = (1, 0, 0)^T \end{aligned}

We compute the QR factorization of \(A\) using the Gram-Schmidt process.

First vector:

\begin{aligned} w_1 & = v_1 = (1, -1, 0)^T \\ r_{11} & = \| w_1 \| = \sqrt{2} \\ u_1 & = w_1 / r_{11} = (\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 0)^T \\ \end{aligned}

Second vector:

\begin{aligned} r_{12} & = \langle v_2, u_1 \rangle = \frac{4}{\sqrt{2}} \\ w_2 & = v_2 - r_{12} u_1 = (6, 6, 2)^T\\ r_{22} & = \| w_2 \| = 2\sqrt{19}\\ u_2 & = w_2 / r_{22} = (\frac{3}{\sqrt{19}}, \frac{3}{\sqrt{19}}, \frac{1}{\sqrt{19}})^T \\ \end{aligned}

Third vector:

\begin{aligned} r_{13} & = \langle v_3, u_1 \rangle = -1 \\ r_{23} & = \langle v_3, u_2 \rangle = \frac{3\sqrt{2}}{\sqrt{19}} \\ w_3 & = v_3 - r_{13} u_1 - r_{23} u_2 = (\frac{\sqrt{2}}{38}, \frac{\sqrt{2}}{38}, -\frac{3\sqrt{2}}{19})^T \\ r_{33} & = \| w_3 \| = \frac{1}{\sqrt{19}} \\ u_3 & = w_3 / r_{33} = (\frac{1}{\sqrt{38}}, \frac{1}{\sqrt{38}}, -\frac{6}{\sqrt{38}})^T \\ \end{aligned}

Finally:

\begin{aligned} r_{14} & = \langle v_4, u_1 \rangle = \frac{1}{\sqrt{2}} \\ r_{24} & = \langle v_4, u_2 \rangle = \frac{3}{\sqrt{19}} \\ r_{34} & = \langle v_4, u_3 \rangle = \frac{1}{\sqrt{38}} \end{aligned}

Hence, we can factor \(A = QR\) as follows.

\[ A = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{3}{\sqrt{19}} & \frac{1}{\sqrt{38}} \\ -\frac{1}{\sqrt{2}} & \frac{3}{\sqrt{19}} & \frac{1}{\sqrt{38}} \\ 0 & \frac{1}{\sqrt{19}} & -\frac{6}{\sqrt{38}} \end{pmatrix} \begin{pmatrix} \sqrt{2} & \frac{4}{\sqrt{2}} & -1 & \frac{1}{\sqrt{2}} \\ 0 & 2\sqrt{19} & \frac{3\sqrt{2}}{\sqrt{19}} & \frac{3}{\sqrt{19}} \\ 0 & 0 & \frac{1}{\sqrt{19}} & \frac{1}{\sqrt{38}} \end{pmatrix} \]
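As a quick numerical check of this factorization, here is a minimal classical Gram-Schmidt sketch in MATLAB (our own transcription of the process above; in practice the modified variant is preferred for numerical stability):

A = [1, 8, 0, 1; -1, 4, sqrt(2), 0; 0, 2, 0, 0];
[m, n] = size(A);                  % m = 3, n = 4
Q = zeros(m, m);
R = zeros(m, n);
for j = 1:n
  w = A(:, j);
  for i = 1:min(j - 1, m)
    R(i, j) = Q(:, i)' * A(:, j);  % r_ij = <v_j, u_i>
    w = w - R(i, j) * Q(:, i);
  end
  if j <= m                        % only m orthonormal vectors exist
    R(j, j) = norm(w);
    Q(:, j) = w / R(j, j);
  end
end
disp(norm(A - Q * R))              % should be ~0 up to rounding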

1.2

1.2.1

Note that when we apply the Gram-Schmidt process to \(\tilde{A}\), we get the same orthonormal basis \(\{ u_1, u_2, u_3 \}\): the first two columns of \(\tilde{A}\) are those of \(A\), and its last column \(v_4\) has already been expanded in this basis above. Hence, the QR factorization of \(\tilde{A}\) is simply the \(Q\) above multiplied by \(R\) with its third column deleted (see the check below).

\[ \tilde{A} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{3}{\sqrt{19}} & \frac{1}{\sqrt{38}} \\ -\frac{1}{\sqrt{2}} & \frac{3}{\sqrt{19}} & \frac{1}{\sqrt{38}} \\ 0 & \frac{1}{\sqrt{19}} & -\frac{6}{\sqrt{38}} \end{pmatrix} \begin{pmatrix} \sqrt{2} & \frac{4}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & 2\sqrt{19} & \frac{3}{\sqrt{19}} \\ 0 & 0 & \frac{1}{\sqrt{38}} \end{pmatrix} \]
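Numerically, this is just a matter of deleting the same column of \(R\); reusing A, Q, and R from the sketch in 1.1:

At = A(:, [1, 2, 4]);      % A-tilde: A with its third column removed
Rt = R(:, [1, 2, 4]);      % delete the corresponding column of R
disp(norm(At - Q * Rt))    % should be ~0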

1.2.2

Let \(A_0 = \tilde{A}\). So we have the following.

\[ Q_0 = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{3}{\sqrt{19}} & \frac{1}{\sqrt{38}} \\ -\frac{1}{\sqrt{2}} & \frac{3}{\sqrt{19}} & \frac{1}{\sqrt{38}} \\ 0 & \frac{1}{\sqrt{19}} & -\frac{6}{\sqrt{38}} \end{pmatrix} \]

\[ R_0 = \begin{pmatrix} \sqrt{2} & \frac{4}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & 2\sqrt{19} & \frac{3}{\sqrt{19}} \\ 0 & 0 & \frac{1}{\sqrt{38}} \end{pmatrix} \]

One step of the QR algorithm gives \(A_1 = R_0 Q_0\).

\[ A_1 = \begin{pmatrix} -1 & \frac{\sqrt{19}}{\sqrt{2}} & 0 \\ -\sqrt{38} & \frac{117}{19} & \frac{20}{19\sqrt{2}} \\ 0 & \frac{1}{19\sqrt{2}} & -\frac{3}{19} \end{pmatrix} \]
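As a quick check, multiplying the exact \(R_0\) and \(Q_0\) in MATLAB reproduces this matrix; repeating the factor-and-multiply step would, under the usual convergence assumptions, drive the iterates toward a triangular matrix whose diagonal approximates the eigenvalues of \(\tilde{A}\):

Q0 = [ 1/sqrt(2), 3/sqrt(19),  1/sqrt(38);
      -1/sqrt(2), 3/sqrt(19),  1/sqrt(38);
       0,         1/sqrt(19), -6/sqrt(38)];
R0 = [sqrt(2), 4/sqrt(2),  1/sqrt(2);
      0,       2*sqrt(19), 3/sqrt(19);
      0,       0,          1/sqrt(38)];
A1 = R0 * Q0               % one step of the QR iteration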

2

We use the conjugate gradient method to solve \(Ax = b\).

\[ A = \begin{pmatrix} 3 & 0 & 2 \\ 0 & 1 & 1 \\ 2 & 1 & 3 \end{pmatrix} \]

\[ b = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \]

Initialization:

\begin{aligned} x_0 & = (1, 1, 1)^T \\ r_0 & = (4, 2, 6)^T \\ p_0 & = (-4, -2, -6)^T \end{aligned}

Recall that the CG algorithm repeats the following steps.

  1. \(\alpha_{k} = -\frac{r_k \cdot p_k}{\langle p_k, p_k \rangle}\)
  2. \(x_{k+1} = x_{k} + \alpha_{k} p_{k}\)
  3. \(r_{k+1} = Ax_{k+1} - b\)
  4. \(\beta_{k} = \frac{\langle r_{k+1}, p_k \rangle}{\langle p_k, p_k \rangle}\)
  5. \(p_{k+1} = -r_{k+1} + \beta_{k} p_{k}\)

where \(x \cdot y := y^T x\) and \(\langle x, y \rangle := y^T A x\).
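Before carrying out the iterations by hand, here is a minimal MATLAB sketch of this recurrence for the given \(A\), \(b\), and \(x_0\) (our own transcription; it terminates with the exact solution found below):

A = [3, 0, 2; 0, 1, 1; 2, 1, 3];
b = [1; 0; 0];
x = [1; 1; 1];                    % x_0
r = A * x - b;                    % r_0
p = -r;                           % p_0
for k = 1:3
  Ap = A * p;
  alpha = -(p' * r) / (p' * Ap);  % alpha_k = -(r_k . p_k) / <p_k, p_k>
  x = x + alpha * p;
  r = A * x - b;
  beta = (p' * (A * r)) / (p' * Ap);  % beta_k = <r_{k+1}, p_k> / <p_k, p_k>
  p = -r + beta * p;
end
x                                 % prints (1, 1, -1)'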

Step 1:

\begin{aligned} \alpha_0 & = \frac{1}{5} \\ x_1 & = (\frac{1}{5}, \frac{3}{5}, -\frac{1}{5})^T \\ r_1 & = (-\frac{4}{5}, \frac{2}{5}, \frac{2}{5})^T \\ \beta_0 & = \frac{3}{175} \\ p_1 & = (\frac{128}{175}, -\frac{76}{175}, -\frac{88}{175})^T \\ \end{aligned}

Step 2:

\begin{aligned} \alpha_1 & = \frac{105}{166} \\ x_2 & = (\frac{55}{83}, \frac{27}{83}, -\frac{43}{83})^T \\ r_2 & = (-\frac{4}{83}, -\frac{16}{83}, \frac{8}{83})^T \\ \beta_1 & = \frac{350}{6889} \\ p_2 & = (\frac{588}{6889}, \frac{1176}{6889}, -\frac{840}{6889})^T \\ \end{aligned}

Step 3:

\begin{aligned} \alpha_2 & = \frac{83}{21} \\ x_3 & = (1, 1, -1)^T \\ r_3 & = (0, 0, 0)^T \end{aligned}

Hence, the exact solution to \(Ax = b\) is \(x = (1, 1, -1)^T\). (As expected for a symmetric positive definite \(3 \times 3\) system, CG terminates in at most three steps in exact arithmetic.)

3

First, we note the following from the conjugate gradient algorithm.

\begin{aligned} r_{k+1} & = A\xi_{k+1} - b \\ & = A(\xi_k + \alpha_k d_k) - b \\ & = r_k + \alpha_k A d_k \end{aligned}

Obviously the statement, namely that \(\langle d_j, d_k \rangle = 0\) and \(r_k \cdot d_j = 0\) for all \(j < k\), holds vacuously for \(k = 0\). Now assume it is true for some \(k\) and consider the statement for \(k+1\): we need to check that it holds for every \(j < k+1\). First, we look at \(j = k\). By the definitions of \(\beta_k\) and \(\alpha_k\), we simply check that \(\langle d_k, d_{k+1} \rangle = 0\) and \(r_{k+1} \cdot d_k = 0\).

\begin{aligned} \langle d_{k}, d_{k+1} \rangle & = \langle d_{k}, -r_{k+1} + \beta_{k} d_{k} \rangle \\ & = -\langle d_{k}, r_{k+1} \rangle + \frac{\langle r_{k+1}, d_{k} \rangle}{\langle d_{k}, d_{k} \rangle} \langle d_{k}, d_{k} \rangle \\ & = 0 \end{aligned} \begin{aligned} r_{k+1} \cdot d_k & = (r_k + \alpha_k A d_k) \cdot d_k \\ & = r_k \cdot d_k - \frac{r_k \cdot d_k}{\langle d_k, d_k \rangle} A d_k \cdot d_k \\ & = r_k \cdot d_k - \frac{r_k \cdot d_k}{\langle d_k, d_k \rangle} \langle d_k, d_k \rangle \\ & = 0 \end{aligned}

For \(j < k\), we first check the orthogonality of the new residual, using \(r_{k+1} = r_k + \alpha_k A d_k\) and the induction hypothesis.

\begin{aligned} r_{k+1} \cdot d_j & = (r_k + \alpha_k A d_k) \cdot d_j \\ & = r_k \cdot d_j + \alpha_k \langle d_k, d_j \rangle \\ & = 0 + \alpha_k (0) \\ & = 0 \end{aligned}

Together with the case \(j = k\) above, this shows \(r_{k+1} \cdot d_i = 0\) for all \(i \le k\). For the conjugacy, write \(d_{k+1} = -r_{k+1} + \beta_k d_k\) and note that \(A d_j = \frac{1}{\alpha_j} (A\xi_{j+1} - A\xi_j) = \frac{1}{\alpha_j} (r_{j+1} - r_j)\), where each residual \(r_i = \beta_{i-1} d_{i-1} - d_i\) is a combination of directions of index at most \(j+1 \le k\).

\begin{aligned} \langle d_j, d_{k+1} \rangle & = A d_j \cdot d_{k+1} \\ & = -A d_j \cdot r_{k+1} + \beta_k \langle d_j, d_k \rangle \\ & = -\frac{1}{\alpha_j} (r_{j+1} - r_j) \cdot r_{k+1} + 0 \\ & = 0 \end{aligned}

The last step uses that \(r_{k+1}\) is orthogonal to every \(d_i\) with \(i \le k\), as just established.

Hence, the statement holds by induction.
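As a numerical sanity check of these identities (our own addition, reusing the system from Problem 2, whose directions \(p_k\) play the role of the \(d_j\) here):

A = [3, 0, 2; 0, 1, 1; 2, 1, 3];
b = [1; 0; 0];
x = [1; 1; 1];
r = A * x - b;
p = -r;
P = p;                            % store the directions d_0, d_1, ... as columns
for k = 1:2
  Ap = A * p;
  alpha = -(p' * r) / (p' * Ap);
  x = x + alpha * p;
  r = A * x - b;
  beta = (p' * (A * r)) / (p' * Ap);
  p = -r + beta * p;
  P = [P, p];
end
disp(P' * A * P)                  % off-diagonal entries ~0: the d_j are A-conjugate
disp(r' * P(:, 1:2))              % r_2 is orthogonal to d_0 and d_1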

4

4.1

Setup:

A = gallery('tridiag', 2000, -1, 4, -1);
A = A(:, 1:1000);
b = ones(2000, 1);

Gradient descent:

x = zeros(1000, 1);
fprintf(1, 'Gradient descent:\n');
fprintf(1, '#iter\t error\n');

for i = 1 : 50
  d = A' * b - A' * (A * x);
  alpha = d' * d / (d' * A' * (A * d));
  x = x + alpha * d;
  if i == 25 || i == 50
    fprintf('%d\t\t%.3e\n', [i, sum(abs(A' * (A * x) - A' * b))]);
  end
end
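The step size here is the exact line search for \(\phi(x) = \frac{1}{2} \| Ax - b \|^2\): with \(d = -\nabla \phi(x) = A^T b - A^T A x\), setting \(\frac{d}{dt} \phi(x + t d) = 0\) gives

\[ \alpha = \frac{d^T d}{d^T A^T A d}, \]

which is exactly what alpha = d' * d / (d' * A' * (A * d)) computes.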

Conjugate gradient:

x = zeros(1000, 1);
fprintf(1, 'Conjugate gradient:\n');
fprintf(1, '#iter\t error\n');

r = A' * (A * x - b);
d = -r;
for i = 1 : 50
  alpha = - r' * d / (d' * A' * (A * d));
  x = x + alpha * d;
  r = A' * (A * x - b);
  beta = (r' * A' * (A * d)) / (d' * A' * (A * d));
  d = - r + beta * d;
  if i == 25 || i == 50
    fprintf('%d\t\t%.3e\n', [i, sum(abs(A' * (A * x) - A' * b))]);
  end
end

Remark:

  1. We never compute \(A^T A\) explicitly. Whenever we need \(z = A^T A x\), we first form \(y = Ax\) and then \(z = A^T y\). So the brackets in A'*(A*x) are necessary; indeed, this is much faster than A'*A*x, where the multiplications are done from left to right and the matrix product \(A^T A\) is formed first.
  2. In practice, we would also avoid repeated computations. For example, we can store A'*(A*d) from one computation and reuse it in the others, as sketched below.
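For instance, the conjugate gradient loop above can cache A' * (A * d) and update the residual incrementally (a sketch of the same iteration, mathematically equivalent to the code above):

x = zeros(1000, 1);
r = A' * (A * x - b);
d = -r;
for i = 1:50
  Ad = A' * (A * d);              % computed once per iteration, reused three times
  alpha = -(r' * d) / (d' * Ad);
  x = x + alpha * d;
  r = r + alpha * Ad;             % r_{k+1} = r_k + alpha_k * A'A d_k, no extra product
  beta = (r' * Ad) / (d' * Ad);
  d = -r + beta * d;
end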

4.2

| Iteration | Error (gradient descent) | Error (conjugate gradient) |
|-----------|--------------------------|----------------------------|
| 25        | 5.543e-02                | 1.675e-06                  |
| 50        | 1.966e-04                | 8.193e-14                  |

That's it. Good luck on your final exam!
