Description
Analysis of numerical methods for linear algebraic systems and least squares problems. Orthogonalization methods. Ill-conditioned problems. Eigenvalue and singular value computations. Three lectures, one recitation. Knowledge of programming recommended.
The Instructor's Take on the subject:
Linear algebra is an essential underpinning of much of modern science and engineering. It is therefore important to be able to compute all the quantities involved effectively, to understand how the abstract concepts are implemented, and to be aware of the important issues that arise from doing so. My research concerns bridging things from the abstract modeling domain (often expressed in the form of differential equations) to the concrete and computable (which turns the differential equations into linear algebra problems). In this course, I hope to introduce you to the real-world methods of computational linear algebra.
General Information
Announcements
Actual letter grades are now posted on TED (NOT Gradescope). Here are the stats:
Mean: 85.8%
Median: 86.8%
Std Dev: 9.58%
The grade cutoffs were chosen where the largest gaps were, and curving was minimal, taking into account bonus points. The reason the average is significantly higher than that of the midterms is that (1) the final was significantly easier (the mean was 85 after bonus points were taken out) and (2) most of you did very well on your homework (plus, the lowest score was dropped).
Lowest % | Grade
---|---
0.00% | F
60.00% | D
67.00% | C-
70.00% | C
74.60% | C+
77.50% | B-
80.00% | B
84.75% | B+
87.50% | A-
91.50% | A
97.00% | A+
Again, people are welcome to come look over their finals.
The final has been graded. More details later, but this time (I really mean it) you should check that your final is actually there, that all its pages are intact... and that you actually have a score... If not, let me know!
Details and regrade procedures will be posted later. Note that grades for the class are officially sent to the registrar on Tuesday, but grades can be corrected after being submitted, so regrade requests will remain open and valid for longer (even if your grade shows up).
In the meantime, here is the score distribution (you all did pretty well!):
These include some proofs, topics, and examples that I didn't get to during the review session.
Disclaimer: These are the notes I lectured from for the Sunday review session, so they have some crossed-out formulas and scribbles in places; not everything may be legible, and there may be a typo here and there.
Here is the list of topics for the final (and also what is NOT on the final)
Generally, consult the Lecture Notes (https://piazza.com/ucsd/winter2016/math170a/resources), which give a more outlined form of the book; the book treats topics in much more detail than we could go through in this class.
Topics on Final from Midterm 1 (revised from the actual Midterm 1 list):
- Chapter 1:
  - Matrix multiplication, especially matrix-vector multiplication $Ax$ as an operation on the columns of $A$
  - Solving triangular systems: forward and back substitution (Section 1.3)
  - Cholesky factorization. The inner product form is the most important one to remember; just know that a block form exists
  - LU decomposition and how it relates to row reduction
  - Banded matrices: just know that bandedness significantly improves solving time, bringing it down to roughly $O(ns^2)$ flops for bandwidth $s$ instead of $O(n^3)$. You do not need to remember the derivation. Also, the Cholesky/LU factors of banded matrices are banded.
  - Approximate flop counts of all the algorithms above (but not their derivation). Just make a table on your cheat sheet or something.
- Chapter 2: You only need to know the definitions of the vector $2$-norm (the usual length of a vector), the matrix $2$-norm (the maximum magnification factor), and the condition number (max magnification divided by min magnification), and why condition numbers are good (namely, they measure what happens to relative error)
- Chapter 3:
  - Setting up least squares problems, and the general details surrounding why such problems are necessary, e.g., trying to solve overdetermined systems $Ax \approx b$ [Section 3.1, parts of Section 3.5]
  - The Normal Equations $A^T A x = A^T b$. What a solution of these equations represents (namely, the value of $x$ making $\|b - Ax\|$ minimum) [parts of Section 3.5]
  - Using the (condensed) QR decomposition to solve least squares problems [parts of Sections 3.3 and 3.5], i.e., solving $Rx = Q^T b$ (see the sketch just after this list)
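As a quick illustration of the two approaches above, here is a minimal NumPy sketch (my own example, not from the course materials; the matrix and right-hand side are made up) that solves the same overdetermined system via the normal equations and via the condensed QR decomposition:

```python
import numpy as np

# A small made-up overdetermined system A x ~ b (more equations than unknowns).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

# Normal equations: solve (A^T A) x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Condensed QR: A = QR with Q having orthonormal columns and R square upper
# triangular; then solve R x = Q^T b.
Q, R = np.linalg.qr(A)              # NumPy returns the condensed ("reduced") QR by default
x_qr = np.linalg.solve(R, Q.T @ b)

print(x_normal, x_qr)               # the two minimizers agree up to rounding
```

Both routes give the same minimizer here; the QR route avoids forming $A^T A$, which squares the condition number.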
The following topics from the Midterm 1 unit will not be tested:
- The derivation of the various flop counts (you only need to know what they are)
- Basic linear algebra facts [Section 1.2]. Of course, if you need to use some of these facts, you will need to know them, but you will not be explicitly asked, e.g., "Determine whether these vectors are independent", "Find the inverse", "Find the determinant", "Row reduce to echelon form", etc.
- Algorithms that try to minimize the fill-in of envelope zeros [Section 1.6]
- Vector/matrix $p$-norms for $p$ other than 2 [Section 2.2], or the $\infty$-norm (which is the maximum magnitude of the components, useful in eigenstuff).
- Actually computing a QR decomposition (you will only be asked to verify that something is the QR decomposition, i.e., showing $Q$ is orthogonal, $R$ is upper triangular, and $A = QR$, and then be asked to use this to find the least squares solution to $Ax \approx b$). [Sections 3.2, 3.4]
- No coding (even though we had one coding problem on Midterm 1, it will NOT be on the final.)
- Gaussian elimination with pivoting (LU with row exchanges) [Sections 1.8, 1.9]. Note, however, that when you do use LU decomposition in practice, make sure you use an algorithm with pivoting (though no serious package skips it, so don't worry too much about it; just know that this exists and that there's a reason for it, namely numerical stability). A small demonstration follows this list.
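To see concretely why pivoting matters, here is a small sketch (my own made-up example, not from the course): naive elimination without row exchanges would divide by the zero pivot in the first row, while a standard pivoted LU routine handles the same system without trouble.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Made-up system whose (1,1) entry is zero: eliminating without row
# exchanges would immediately divide by zero.
A = np.array([[0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0])

lu, piv = lu_factor(A)        # computes PA = LU with partial pivoting
x = lu_solve((lu, piv), b)    # solves Ax = b using the pivoted factors
print(x)                      # [1. 1.]
```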
Midterm 2 Topics:
Chapter 2: Already in the list above as a Midterm 1 topic, but be aware that many Chapter 4 concepts are devoted to clarifying these Chapter 2 concepts.
Chapter 4: Sections 4.1-4.3:
- Section 4.1 - Intro to the SVD $A = U\Sigma V^T$: its definition, and interpretations of $U$, $\Sigma$, and $V$, the condensed version, and the geometric (sum of $\sigma_i u_i v_i^T$) version
- Section 4.2 - Applications of the SVD: the 4 fundamental subspaces in linear algebra in relation to the SVD, rank (just the strict linear algebra definition; you will not need to know numerical rank), and how the SVD can be used to calculate the Chapter 2 concepts of norms, magnification factors, and condition numbers.
- Section 4.3 - Solution of rank-deficient least squares problems using the SVD, the pseudoinverse, and its relation to the SVD and the least squares solution.
Chapter 8: Sections 8.1-8.4:
- Enough familiarity with the matrices in Section 8.1 (the standard tridiagonal and block tridiagonal discretization matrices, with $2$s on the diagonal and $-1$s on the off-diagonals) so that if you see one, you won't panic. You will not be tested on the derivations of discretizations of PDE problems.
- Theory of the three classical methods (Sec 8.2):
- The Jacobi Method: be able to rewrite it in matrix form by breaking the matrix A into a sum of diagonal and non-diagonal parts
- The Gauss-Seidel Method: be able to rewrite it in matrix form by breaking the matrix A into a sum of diagonal, upper triangular, and lower triangular parts (not a product).
- Successive Overrelaxation: Be able to describe in words the basic idea of what successive overrelaxation is. You will not be asked to actually calculate with it.
- Tie all of the above up with the concept of splittings in Sec. 8.3 (they're all really special cases of one formula). Understand convergence in terms of the largest eigenvalue magnitude of the iteration matrix, i.e., its spectral radius (Sec. 5.1), or by thinking of the spectral radius as the largest singular value (not strictly correct, but it carries less baggage of a new concept thrown at you).
- Descent methods:
  - The general concept of what they are and what their goal is (namely, to descend in $\phi$ with each iteration, where $\phi(x) = \frac{1}{2}x^T A x - x^T b$)
  - The concept of an exact line search: what the line search parameter $\alpha$ is and its optimal value for a given descent direction $p$
  - The realization of Gauss-Seidel as a descent method (periodically repeating blocks of standard basis vectors). You will not be asked to be precise about the notation involving this.
  - The gradient of $\phi$, namely $\nabla\phi(x) = Ax - b$. The 20C concept of what a gradient means: direction and rate of fastest increase
  - Steepest descent: the choice of descent direction is $p^{(k)} = r^{(k)} = b - Ax^{(k)}$, the residual, which points directly opposite to the gradient, and therefore points in the direction of fastest decrease.
  - You will only need to know one and only one sentence about the Conjugate Gradient Method: The Conjugate Gradient Method improves on the Steepest Descent method by using information from previous iterations for its descent directions.
These topics from the Midterm 2 unit will not be covered on the final:
- Any coding
- Numerical rank
- Section 4.4 on sensitivity of least squares methods.
- How to derive finite difference methods. Simply know the formulas and the basic matrix structures.
- Full convergence theory of iterative methods (most of Section 8.3). That said, you should not be terrified of the word "splitting"... since that's what we were doing in Section 8.2 anyway.
- Actually computing any spectral radii and/or eigenvalues.
- Block iterative methods
- Parallelization of iterative methods ("red-black" Gauss-Seidel/SOR)
- Anything about the Conjugate Gradient method other than in that sentence above.
Last Unit: Eigenstuff
- Basic concepts of what eigenvectors and eigenvalues are
- The (forward) Power Method for finding the dominant eigenvalue and its corresponding eigenvector. You will be asked to compute a couple of iterations by hand, and to state what theoretical conclusions you could come to if you were to continue, for example:
- What it will converge to
- What the convergence rate is
These topics from the unit on Eigenstuff will not be covered:
- Finding eigenvectors and eigenvalues the old way, i.e., by row reduction and polynomial factoring
- Differential equations
- Inverse iteration
- Shift-and-Invert
Now for the long-awaited condensed version of the 2nd half of the quarter. (It is in the same spirit as the condensed version of the first half, @124)
We started the second half with the Singular Value Decomposition, $A = U\Sigma V^T$.
- This is the One Matrix Decomposition to Rule Them All.
- Expensive to compute, basically computing eigenvalues and eigenvectors of $A^T A$ and $AA^T$ (see Homework 5.2.17), but by far the most informative, providing a wealth of information such as rank, range basis, null space, orthogonal complements, norms, and condition number.
- This will solve the stubborn cases, namely, the least squares problem for a rank-deficient matrix.
- It is similar in spirit to the eigendecomposition $A = VDV^{-1}$, but it has the advantage of always existing (with $\Sigma$ a genuinely diagonal matrix), working for non-square matrices, and the orthogonality of the singular vectors.
- The disadvantage is that it only works well for certain combinations of products of $A$, such as $A^T A$, $AA^T$, etc., but not all powers of $A$. This is because the relevant relationship is $Av_i = \sigma_i u_i$, where $u_i$ doesn't have to be the same as $v_i$ (unlike in the analogous relation $Av = \lambda v$ for eigenstuff, where the same vector appears on both sides).
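As a small supplement (my own illustration on a made-up matrix, not part of the course notes), here is how the SVD exposes rank, the 2-norm, the condition number, and the pseudoinverse in NumPy:

```python
import numpy as np

# A made-up 4x3 matrix of rank 2 (the third column is the sum of the first two).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # condensed SVD: A = U @ diag(s) @ Vt

rank = int(np.sum(s > 1e-12))      # number of (numerically) nonzero singular values
two_norm = s[0]                    # largest singular value = maximum magnification factor
cond = s[0] / s[rank - 1]          # ratio of largest to smallest nonzero singular value

# Pseudoinverse: invert only the nonzero singular values.
s_plus = np.array([1.0 / x if x > 1e-12 else 0.0 for x in s])
A_plus = Vt.T @ np.diag(s_plus) @ U.T
print(rank, two_norm, cond)
print(np.allclose(A_plus, np.linalg.pinv(A)))      # matches NumPy's built-in pseudoinverse
```

Rank, the matrix 2-norm, and the condition number all read off directly from the singular values, which is exactly the Chapter 2 connection mentioned above.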
Next we completely shifted gears away from decompositions. Namely, we started work on Iterative Methods for solving $Ax = b$.
The motivation is solving PDEs, that is, discretizing the Laplace operator; its local nature means the resulting matrix is sparse. But the finer the mesh, the larger the matrix (and the matrix grows much, much faster than the mesh size shrinks, especially in high dimensions).
These iterative methods fit into two broad categories:
- Classical Methods and Splittings
  - All the classical methods were special cases of a broad general notion of splitting: given $Ax = b$, with a decomposition $A = M - N$, for $M$ "easy to invert" (this means it is either diagonal, triangular, or that we somehow manage to discover $M$'s factors by other means; SSOR, which we didn't cover, is a case of this).
  - A splitting always yields some iterative method: namely, for each $k$, we put the iterates into the formula $x^{(k+1)} = M^{-1}\big(Nx^{(k)} + b\big)$, or, writing it in a manner with the philosophy that inverse matrices never have to be explicitly computed, $Mx^{(k+1)} = Nx^{(k)} + b$.
    This is a really fun (exam) problem to do: just start with $Ax = b$, then write out $(M - N)x = b$, and finally move $Nx$ to the other side: $Mx = Nx + b$. Then the $x$ next to $M$, the easier-to-invert matrix, gets to be the next iterate, and the $x$ next to $N$ gets to be the current iterate. (The way to remember it is that we're solving for the next one, so we want the easy-to-invert matrix on the next one.)
  - The classical methods are then as follows: given the splitting $A = D + L + U$ with $D$ diagonal, $L$ strictly lower triangular, and $U$ strictly upper triangular (a code sketch of these iterations follows this list):
    - The Jacobi Method
      - Based on the idea of correcting the $i$th equation using data from the $k$th iteration to get the next one.
      - Component formula: $x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j \neq i} a_{ij} x_j^{(k)}\Big)$
      - Matrix form: $Dx^{(k+1)} = b - (L + U)x^{(k)}$, i.e., $M = D$, $N = -(L + U)$.
    - The Gauss-Seidel Method (G-S)
      - Based on the idea of correcting the $i$th equation using the most recently updated $(k+1)$st iterate for the components already computed and the old $k$th iterate for the components yet to be computed, to get the next one.
      - Component formula: $x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)}\Big)$
      - Matrix form: $(D + L)x^{(k+1)} = b - Ux^{(k)}$, i.e., $M = D + L$, $N = -U$.
    - The Method of Successive Overrelaxation (SOR)
      - Based on the idea of over-correcting the $i$th equation using the most recently updated $(k+1)$st iterate as in G-S, but adjusting the correction by a factor of $\omega$.
      - Component formula: $x_i^{(k+1)} = (1 - \omega)x_i^{(k)} + \frac{\omega}{a_{ii}}\Big(b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)}\Big)$
      - Matrix form: $(D + \omega L)x^{(k+1)} = \omega b + \big[(1 - \omega)D - \omega U\big]x^{(k)}$, i.e., $M = \frac{1}{\omega}D + L$, $N = \frac{1 - \omega}{\omega}D - U$.
- Descent Methods, which re-envision iterative methods as attempting to reduce the strain energy of a system, $\phi(x) = \frac{1}{2}x^T A x - x^T b$.
  - The basic idea is to take a step in a descent direction $p^{(k)}$, which hopefully actually reduces $\phi$: take $x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}$ for some $\alpha_k$.
  - We can minimize $\phi$ along this direction, a process called an (exact) line search, which involves choosing the line search parameter $\alpha_k$ optimally.
  - For example, the Gauss-Seidel method can be envisioned this way by selecting the descent directions to be the standard basis vectors; each block of $n$ iterations is one iteration of Gauss-Seidel.
  - If we allow inexact line searches, adjusting the line search parameter by a factor of $\omega$, this gives us successive overrelaxation.
  - The method of steepest descent chooses $p^{(k)} = r^{(k)}$, the $k$th residual $r^{(k)} = b - Ax^{(k)}$. This is because it is the negative gradient of $\phi$, which always points in the direction of fastest decrease.
  - The conjugate gradient method improves upon the steepest descent method by using information from previous iterations.
- Finally, we have the basic convergence theory: the spectral radius $\rho(M^{-1}N)$ is the size of the largest eigenvalue of the iteration matrix $M^{-1}N$, and it determines the rate of convergence of a splitting method, which converges when $\rho(M^{-1}N) < 1$.
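To make the above concrete, here is a minimal sketch (my own illustration, not from the course notes) of the splitting iteration $Mx^{(k+1)} = Nx^{(k)} + b$ for Jacobi and Gauss-Seidel, plus steepest descent with an exact line search, all run on a small made-up symmetric positive definite system (a 1-D discrete Laplacian in the spirit of Section 8.1):

```python
import numpy as np

n = 10
# 1-D discrete Laplacian: 2s on the diagonal, -1s on the off-diagonals.
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

D = np.diag(np.diag(A))      # diagonal part
L = np.tril(A, -1)           # strictly lower triangular part
U = np.triu(A, 1)            # strictly upper triangular part

def splitting_iteration(M, N, b, iters=500):
    """Iterate M x^(k+1) = N x^(k) + b, solving with M rather than inverting it."""
    x = np.zeros(len(b))
    for _ in range(iters):
        x = np.linalg.solve(M, N @ x + b)
    return x

x_jacobi = splitting_iteration(D, -(L + U), b)     # Jacobi:       M = D,     N = -(L+U)
x_gs     = splitting_iteration(D + L, -U, b)       # Gauss-Seidel: M = D + L, N = -U

# Steepest descent with exact line search: p = r = b - Ax,
# and the optimal step is alpha = (r^T r) / (r^T A r).
x_sd = np.zeros(n)
for _ in range(500):
    r = b - A @ x_sd
    alpha = (r @ r) / (r @ (A @ r))
    x_sd = x_sd + alpha * r

x_exact = np.linalg.solve(A, b)
print(np.allclose(x_jacobi, x_exact, atol=1e-6),
      np.allclose(x_gs, x_exact, atol=1e-6),
      np.allclose(x_sd, x_exact, atol=1e-6))

# Convergence is governed by the spectral radius of M^{-1} N (it must be < 1):
rho_jacobi = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(D, -(L + U)))))
print(rho_jacobi)            # < 1 here, so Jacobi converges (slowly)
```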
This leads into our last and final unit, eigendecomposition.
Our motivation was to solve systems of ODEs, which are easy to solve with a diagonal matrix because the equations are decoupled.
- Recap of Eigenstuff from 20D/20F:
  - Eigenvectors and eigenvalues: they satisfy $Av = \lambda v$, where $v \neq 0$ and $\lambda$ are possibly complex (even if $A$ is real)
  - Only defined for square matrices.
  - If the eigenvectors of $A$ span $\mathbb{R}^n$ (or $\mathbb{C}^n$), $A$ is diagonalizable: $A = VDV^{-1}$, where $D$ is a diagonal matrix of eigenvalues.
  - Not all matrices are diagonalizable: some of the eigenvalues may insist on being accompanied by $1$s off the diagonal. However, certain classes of matrices (such as symmetric real matrices ... they don't have to be positive-definite) always are.
  - Formally, $\det(A - \lambda I)$ is a polynomial of degree $n$, called the characteristic polynomial. Its roots (which always exist as complex numbers) are precisely the eigenvalues of $A$. If $A$ is real, then its roots may be complex, but they must always occur in conjugate pairs $\lambda$ and $\bar{\lambda}$.
- Computing dominant eigenvalues/eigenvectors: we have The Power Method.
  - If the matrix has a single eigenvalue $\lambda_1$ of larger complex magnitude than any other, it is called a dominant eigenvalue.
  - Such an eigenvalue must be real if $A$ is real, due to conjugate pairs of complex numbers having the same magnitude.
  - For matrices with dominant eigenvalues, the iteration $q^{(k+1)} = Aq^{(k)}$, $k = 0, 1, 2, \dots$, eventually converges to the corresponding eigenvector, but with the appropriate rescaling: each $q^{(k+1)}$ should actually be rescaled by a component of largest magnitude, namely, first set $\mathrm{temp} = Aq^{(k)}$ and then $q^{(k+1)} = \mathrm{temp}$ divided by its component of largest magnitude. (A code sketch follows this list.)
  - Convergence theory: suppose $v_1, \dots, v_n$ are a (possibly complex) basis of eigenvectors, corresponding to eigenvalues $\lambda_1, \lambda_2,$ etc., ordered in descending order of magnitude: $|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n|$. Note that $|\lambda_1| > |\lambda_2|$ strictly, by dominance. Then the convergence rate of the norm of the errors $\|q^{(k)} - v_1\|$ is approximately $|\lambda_2/\lambda_1|$ per iteration (where $v_1$ is the suitably rescaled dominant eigenvector).
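Here is a brief sketch of the power method with the largest-magnitude-component rescaling described above (my own illustration on a made-up matrix, not from the course materials):

```python
import numpy as np

# Made-up symmetric matrix with a single dominant eigenvalue.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

q = np.array([1.0, 1.0, 1.0])                 # starting vector
lam_est = 0.0
for _ in range(50):
    temp = A @ q                              # multiply by A
    lam_est = temp[np.argmax(np.abs(temp))]   # component of largest magnitude
    q = temp / lam_est                        # rescale so that that component becomes 1

# The rescaling factor converges to the dominant eigenvalue, and q to its eigenvector.
print(lam_est)                                # approximately the dominant eigenvalue
print(np.max(np.abs(np.linalg.eigvals(A))))   # compare with a direct computation
```

After enough iterations the error shrinks by roughly a factor of $|\lambda_2/\lambda_1|$ per step, matching the convergence theory above.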
I will be having extra office hours on Tuesday 3/15 12pm-2pm
In addition, your TAs have extra office hours: [note that you do not have to go to the TA for your own section specifically]
Jeremy: Sunday 3/13 (note the change) from 5pm-6:30pm and Monday 3/14 from 9:30am-11am
Francesca: Tuesday 3/15 10am-12pm
Reservations for Saturday afternoon and Monday evening have been confirmed (Sat 2:00 PM and Mon 5:00 PM in AP&M Calc Lab, B402A). Nevertheless, please still continue to vote until the poll closes. Again, if you can make the Sunday review session (Sun 3:00 PM in the usual lecture building PCYNH), please go there, because the capacity of Calc Lab is only 50 people.
Also, on Saturday, bring your IDs to campus, so that you can get into the building. Enter by the accessible ramp to the very left of the building when coming from the south side (the side with more doors):
[The north side of the building has only one door, which is also controlled by an ID card lock]
We will be having a review session on Sunday, March 13 from 3pm-5pm in PCYNH 106 (the usual room).
Also beware of Daylight Saving Time ("Spring Forward"), which removes Sunday's 2am, so 3pm-5pm is according to the new time. Most cell phones (even non-smart ones) will automatically recognize this change, so there's not a great risk, but still, good to have a lil reminder.
Name | Office Hours
---|---
Jeremy Schmitt | When? Where?
Yifu Yang | When? Where?
Francesca Grogan | When? Where?
Christopher L Tiee | When? Where?