Description
Analysis of numerical methods for linear algebraic systems and least squares problems. Orthogonalization methods. Ill-conditioned problems. Eigenvalue and singular value computations. Three lectures, one recitation. Knowledge of programming recommended.
The Instructor's Take on the subject:
Linear algebra is an essential underpinning of much of modern science and engineering. It is therefore important to be able to compute all the quantities involved effectively, to understand how the abstract concepts are implemented, and to be aware of the important issues that arise from doing so. My research concerns bridging things from the abstract modeling domain (often expressed in the form of differential equations) to the concrete and computable (which turns the differential equations into linear algebra problems). In this course, I hope to introduce you to the real-world methods of computational linear algebra.
General Information
Announcements
Actual letter grades are now posted on TED (NOT Gradescope). Here are the stats:
Mean: 85.8%
Median: 86.8%
Std Dev: 9.58%
The grade cutoffs were chosen where the largest gaps were, and curving was minimal, taking into account bonus points. The reason the average is significantly higher than that of the midterms is that (1) the final was significantly easier (the mean was 85 after bonus points were taken out) and (2) most of you did very well on your homework (plus, the lowest score was dropped).
Lowest % | Grade
---|---
0.00% | F
60.00% | D
67.00% | C-
70.00% | C
74.60% | C+
77.50% | B-
80.00% | B
84.75% | B+
87.50% | A-
91.50% | A
97.00% | A+
Again, people are welcome to come look over their finals.
The final has been graded. More details later, but this time (I really mean it) you should check that your final is actually there, that all its pages are intact... and that you actually have a score... If not, let me know!
Details and regrade procedures will be posted later. Note that grades for the class are officially sent to the registrar on Tuesday, but grades can be corrected after being submitted, so regrade requests will remain open and valid for longer (even if your grade shows up).
In the meantime, here is the score distribution (you all did pretty well!):
These include some proofs, topics, and examples that I didn't get to during the review session.
Disclaimer: These are the notes I lectured from for the Sunday review session, so they have some crossed-out formulas and scribbles in places; not everything may be legible, and there may be a typo here and there.
Here is the list of topics for the final (and also what is NOT on the final)
Generally, consult the Lecture Notes (https://piazza.com/ucsd/winter2016/math170a/resources), which give a more outlined form of the book; the book treats topics in much more detail than we could go through in this class.
Topics on Final from Midterm 1 (revised from the actual Midterm 1 list):
- Chapter 1:
  - Matrix multiplication, especially matrix-vector multiplication $Ax$ as an operation on the columns of $A$
  - Solving triangular systems: forward and back substitution (Section 1.3)
  - Cholesky factorization. The inner product form is the most important one to remember; just know that a block form exists
  - LU decomposition and how it relates to row reduction
  - Banded matrices: just know that bandedness significantly improves solving time, bringing it down to roughly $O(ns^2)$ flops for bandwidth $s$ instead of $O(n^3)$. You do not need to remember the derivation. Also, the Cholesky/LU factors of banded matrices are banded.
  - Approximate flop counts of all the algorithms above (but not their derivation). Just make a table on your cheat sheet or something.
- Chapter 2: You only need to know the definitions of the vector $2$-norm (the usual length of a vector), the matrix $2$-norm (the maximum magnification factor), and the condition number (max magnification divided by min magnification), and why condition numbers are good (namely, they measure what happens to relative error)
- Chapter 3:
  - Setting up least squares problems, and the general details surrounding why such problems are necessary, e.g., trying to solve overdetermined systems $Ax \approx b$ [Section 3.1, parts of Section 3.5]
  - The Normal Equations $A^T A x = A^T b$. What a solution of these equations represents (namely, the value of $x$ making $\|b - Ax\|$ minimum) [parts of Section 3.5]
  - Using the (condensed) QR decomposition to solve least squares problems [parts of Sections 3.3 and 3.5], i.e., solving $Rx = Q^T b$ (see the sketch just after this list)
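As a quick illustration of the two approaches above, here is a minimal NumPy sketch (my own example, not from the course materials; the matrix and right-hand side are made up) that solves the same overdetermined system via the normal equations and via the condensed QR decomposition:

```python
import numpy as np

# A small made-up overdetermined system A x ~ b (more equations than unknowns).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

# Normal equations: solve (A^T A) x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Condensed QR: A = QR with Q having orthonormal columns and R square upper
# triangular; then solve R x = Q^T b.
Q, R = np.linalg.qr(A)              # NumPy returns the condensed ("reduced") QR by default
x_qr = np.linalg.solve(R, Q.T @ b)

print(x_normal, x_qr)               # the two minimizers agree up to rounding
```

Both routes give the same minimizer here; the QR route avoids forming $A^T A$, which squares the condition number.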
The following topics from the Midterm 1 unit will not be tested:
- The derivation of the various flop counts (you only need to know what they are)
- Basic linear algebra facts [Section 1.2]. Of course, if you need to use some of these facts, you will need to know them, but you will not be explicitly asked, e.g., "Determine whether these vectors are independent", "Find the inverse", "Find the determinant", "Row reduce to echelon form", etc.
- Algorithms that try to minimize the fill-in of envelope zeros [Section 1.6]
- Vector/matrix $p$-norms for $p$ other than 2 [Section 2.2], or the $\infty$-norm (which is the maximum magnitude of the components, useful in eigenstuff).
- Actually computing a QR decomposition (you will only be asked to verify that something is the QR decomposition, i.e., showing $Q$ is orthogonal, $R$ is upper triangular, and $A = QR$, and then be asked to use this to find the least squares solution to $Ax \approx b$). [Sections 3.2, 3.4]
- No coding (even though we had one coding problem on Midterm 1, it will NOT be on the final.)
- Gaussian elimination with pivoting (LU with row exchanges) [Sections 1.8, 1.9]. Note, however, that when you do use LU decomposition in practice, make sure you use an algorithm with pivoting (though no serious package skips it, so don't worry too much about it; just know that this exists and that there's a reason for it, namely numerical stability). A small demonstration follows this list.
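To see concretely why pivoting matters, here is a small sketch (my own made-up example, not from the course): naive elimination without row exchanges would divide by the zero pivot in the first row, while a standard pivoted LU routine handles the same system without trouble.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Made-up system whose (1,1) entry is zero: eliminating without row
# exchanges would immediately divide by zero.
A = np.array([[0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0])

lu, piv = lu_factor(A)        # computes PA = LU with partial pivoting
x = lu_solve((lu, piv), b)    # solves Ax = b using the pivoted factors
print(x)                      # [1. 1.]
```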
Midterm 2 Topics:
Chapter 2: Already in the list above as a Midterm 1 topic, but be aware that many Chapter 4 concepts are devoted to clarifying these Chapter 2 concepts.
Chapter 4: Sections 4.1-4.3:
- Section 4.1 - Intro to the SVD $A = U\Sigma V^T$: its definition, and interpretations of $U$, $\Sigma$, and $V$, the condensed version, and the geometric (sum of $\sigma_i u_i v_i^T$) version
- Section 4.2 - Applications of the SVD: the 4 fundamental subspaces in linear algebra in relation to the SVD, rank (just the strict linear algebra definition; you will not need to know numerical rank), and how the SVD can be used to calculate the Chapter 2 concepts of norms, magnification factors, and condition numbers.
- Section 4.3 - Solution of rank-deficient least squares problems using the SVD, the pseudoinverse, and its relation to the SVD and the least squares solution.
Chapter 8: Sections 8.1-8.4:
- Enough familiarity with the matrices in Section 8.1 (the standard tridiagonal and block tridiagonal discretization matrices, with $2$s on the diagonal and $-1$s on the off-diagonals) so that if you see one, you won't panic. You will not be tested on the derivations of discretizations of PDE problems.
- Theory of the three classical methods (Sec 8.2):
- The Jacobi Method: be able to rewrite it in matrix form by breaking the matrix A into a sum of diagonal and non-diagonal parts
- The Gauss-Seidel Method: be able to rewrite it in matrix form by breaking the matrix A into a sum of diagonal, upper triangular, and lower triangular parts (not a product).
- Successive Overrelaxation: Be able to describe in words the basic idea of what successive overrelaxation is. You will not be asked to actually calculate with it.
- Tie all of the above up with the concept of splittings in Sec. 8.3 (they're all really special cases of one formula). Understand convergence in terms of the largest eigenvalue magnitude of the iteration matrix, i.e., its spectral radius (Sec. 5.1), or by thinking of the spectral radius as the largest singular value (not strictly correct, but it carries less baggage of a new concept thrown at you).
- Descent methods:
  - The general concept of what they are and what their goal is (namely, to descend in $\phi$ with each iteration, where $\phi(x) = \frac{1}{2}x^T A x - x^T b$)
  - The concept of an exact line search: what the line search parameter $\alpha$ is and its optimal value for a given descent direction $p$
  - The realization of Gauss-Seidel as a descent method (periodically repeating blocks of standard basis vectors). You will not be asked to be precise about the notation involving this.
  - The gradient of $\phi$, namely $\nabla\phi(x) = Ax - b$. The 20C concept of what a gradient means: direction and rate of fastest increase
  - Steepest descent: the choice of descent direction is $p^{(k)} = r^{(k)} = b - Ax^{(k)}$, the residual, which points directly opposite to the gradient, and therefore points in the direction of fastest decrease.
  - You will only need to know one and only one sentence about the Conjugate Gradient Method: The Conjugate Gradient Method improves on the Steepest Descent method by using information from previous iterations for its descent directions.
These topics from the Midterm 2 unit will not be covered on the final:
- Any coding
- Numerical rank
- Section 4.4 on sensitivity of least squares methods.
- How to derive finite difference methods. Simply know the formulas and the basic matrix structures.
- Full convergence theory of iterative methods (most of Section 8.3). That said, you should not be terrified of the word "splitting"... since that's what we were doing in Section 8.2 anyway.
- Actually computing any spectral radii and/or eigenvalues.
- Block iterative methods
- Parallelization of iterative methods ("red-black" Gauss-Seidel/SOR)
- Anything about the Conjugate Gradient method other than in that sentence above.
Last Unit: Eigenstuff
- Basic concepts of what eigenvectors and eigenvalues are
- The (forward) Power Method for finding the dominant eigenvalue and its corresponding eigenvector. You will be asked to compute a couple of iterations by hand, and to state what theoretical conclusions you could come to if you were to continue, for example:
- What it will converge to
- What the convergence rate is
These topics from the unit on Eigenstuff will not be covered:
- Finding eigenvectors and eigenvalues the old way, i.e., by row reduction and polynomial factoring
- Differential equations
- Inverse iteration
- Shift-and-Invert
Now for the long-awaited condensed version of the 2nd half of the quarter. (It is in the same spirit as the condensed version of the first half, @124)
We started the second half with the Singular Value Decomposition, $A = U\Sigma V^T$.
- This is the One Matrix Decomposition to Rule Them All.
- Expensive to compute, basically computing eigenvalues and eigenvectors of $A^T A$ and $AA^T$ (see Homework 5.2.17), but by far the most informative, providing a wealth of information such as rank, range basis, null space, orthogonal complements, norms, and condition number.
- This will solve the stubborn cases, namely, the least squares problem for a rank-deficient matrix.
- It is similar in spirit to the eigendecomposition $A = VDV^{-1}$, but it has the advantage of always existing (with $\Sigma$ a genuinely diagonal matrix), working for non-square matrices, and the orthogonality of the singular vectors.
- The disadvantage is that it only works well for certain combinations of products of $A$, such as $A^T A$, $AA^T$, etc., but not all powers of $A$. This is because the relevant relationship is $Av_i = \sigma_i u_i$, where $u_i$ doesn't have to be the same as $v_i$ (unlike in the analogous relation $Av = \lambda v$ for eigenstuff, where the same vector appears on both sides).
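As a small supplement (my own illustration on a made-up matrix, not part of the course notes), here is how the SVD exposes rank, the 2-norm, the condition number, and the pseudoinverse in NumPy:

```python
import numpy as np

# A made-up 4x3 matrix of rank 2 (the third column is the sum of the first two).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # condensed SVD: A = U @ diag(s) @ Vt

rank = int(np.sum(s > 1e-12))      # number of (numerically) nonzero singular values
two_norm = s[0]                    # largest singular value = maximum magnification factor
cond = s[0] / s[rank - 1]          # ratio of largest to smallest nonzero singular value

# Pseudoinverse: invert only the nonzero singular values.
s_plus = np.array([1.0 / x if x > 1e-12 else 0.0 for x in s])
A_plus = Vt.T @ np.diag(s_plus) @ U.T
print(rank, two_norm, cond)
print(np.allclose(A_plus, np.linalg.pinv(A)))      # matches NumPy's built-in pseudoinverse
```

Rank, the matrix 2-norm, and the condition number all read off directly from the singular values, which is exactly the Chapter 2 connection mentioned above.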
Next we completely shifted gears away from decompositions. Namely, we started work on Iterative Methods for solving $Ax = b$.
The motivation is solving PDEs, that is, discretizing the Laplace operator; its local nature means the resulting matrix is sparse. But the finer the mesh, the larger the matrix (and the matrix grows much, much faster than the mesh size shrinks, especially in high dimensions).
These iterative methods fit into two broad categories:
- Classical Methods and Splittings
  - All the classical methods were special cases of a broad general notion of splitting: given $Ax = b$, with a decomposition $A = M - N$, for $M$ "easy to invert" (this means it is either diagonal, triangular, or that we somehow manage to discover $M$'s factors by other means; SSOR, which we didn't cover, is a case of this).
  - A splitting always yields some iterative method: namely, for each $k$, we put the iterates into the formula $x^{(k+1)} = M^{-1}\big(Nx^{(k)} + b\big)$, or, writing it in a manner with the philosophy that inverse matrices never have to be explicitly computed, $Mx^{(k+1)} = Nx^{(k)} + b$.
    This is a really fun (exam) problem to do: just start with $Ax = b$, then write out $(M - N)x = b$, and finally move $Nx$ to the other side: $Mx = Nx + b$. Then the $x$ next to $M$, the easier-to-invert matrix, gets to be the next iterate, and the $x$ next to $N$ gets to be the current iterate. (The way to remember it is that we're solving for the next one, so we want the easy-to-invert matrix on the next one.)
  - The classical methods are then as follows: given the splitting $A = D + L + U$ with $D$ diagonal, $L$ strictly lower triangular, and $U$ strictly upper triangular (a code sketch of these iterations follows this list):
    - The Jacobi Method
      - Based on the idea of correcting the $i$th equation using data from the $k$th iteration to get the next one.
      - Component formula: $x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j \neq i} a_{ij} x_j^{(k)}\Big)$
      - Matrix form: $Dx^{(k+1)} = b - (L + U)x^{(k)}$, i.e., $M = D$, $N = -(L + U)$.
    - The Gauss-Seidel Method (G-S)
      - Based on the idea of correcting the $i$th equation using the most recently updated $(k+1)$st iterate for the components already computed and the old $k$th iterate for the components yet to be computed, to get the next one.
      - Component formula: $x_i^{(k+1)} = \frac{1}{a_{ii}}\Big(b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)}\Big)$
      - Matrix form: $(D + L)x^{(k+1)} = b - Ux^{(k)}$, i.e., $M = D + L$, $N = -U$.
    - The Method of Successive Overrelaxation (SOR)
      - Based on the idea of over-correcting the $i$th equation using the most recently updated $(k+1)$st iterate as in G-S, but adjusting the correction by a factor of $\omega$.
      - Component formula: $x_i^{(k+1)} = (1 - \omega)x_i^{(k)} + \frac{\omega}{a_{ii}}\Big(b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)}\Big)$
      - Matrix form: $(D + \omega L)x^{(k+1)} = \omega b + \big[(1 - \omega)D - \omega U\big]x^{(k)}$, i.e., $M = \frac{1}{\omega}D + L$, $N = \frac{1 - \omega}{\omega}D - U$.
- Descent Methods, which re-envision iterative methods as attempting to reduce the strain energy of a system, $\phi(x) = \frac{1}{2}x^T A x - x^T b$.
  - The basic idea is to take a step in a descent direction $p^{(k)}$, which hopefully actually reduces $\phi$: take $x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}$ for some $\alpha_k$.
  - We can minimize $\phi$ along this direction, a process called an (exact) line search, which involves choosing the line search parameter $\alpha_k$ optimally.
  - For example, the Gauss-Seidel method can be envisioned this way by selecting the descent directions to be the standard basis vectors; each block of $n$ iterations is one iteration of Gauss-Seidel.
  - If we allow inexact line searches, adjusting the line search parameter by a factor of $\omega$, this gives us successive overrelaxation.
  - The method of steepest descent chooses $p^{(k)} = r^{(k)}$, the $k$th residual $r^{(k)} = b - Ax^{(k)}$. This is because it is the negative gradient of $\phi$, which always points in the direction of fastest decrease.
  - The conjugate gradient method improves upon the steepest descent method by using information from previous iterations.
- Finally, we have the basic convergence theory: the spectral radius $\rho(M^{-1}N)$ is the size of the largest eigenvalue of the iteration matrix $M^{-1}N$, and it determines the rate of convergence of a splitting method, which converges when $\rho(M^{-1}N) < 1$.
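To make the above concrete, here is a minimal sketch (my own illustration, not from the course notes) of the splitting iteration $Mx^{(k+1)} = Nx^{(k)} + b$ for Jacobi and Gauss-Seidel, plus steepest descent with an exact line search, all run on a small made-up symmetric positive definite system (a 1-D discrete Laplacian in the spirit of Section 8.1):

```python
import numpy as np

n = 10
# 1-D discrete Laplacian: 2s on the diagonal, -1s on the off-diagonals.
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

D = np.diag(np.diag(A))      # diagonal part
L = np.tril(A, -1)           # strictly lower triangular part
U = np.triu(A, 1)            # strictly upper triangular part

def splitting_iteration(M, N, b, iters=500):
    """Iterate M x^(k+1) = N x^(k) + b, solving with M rather than inverting it."""
    x = np.zeros(len(b))
    for _ in range(iters):
        x = np.linalg.solve(M, N @ x + b)
    return x

x_jacobi = splitting_iteration(D, -(L + U), b)     # Jacobi:       M = D,     N = -(L+U)
x_gs     = splitting_iteration(D + L, -U, b)       # Gauss-Seidel: M = D + L, N = -U

# Steepest descent with exact line search: p = r = b - Ax,
# and the optimal step is alpha = (r^T r) / (r^T A r).
x_sd = np.zeros(n)
for _ in range(500):
    r = b - A @ x_sd
    alpha = (r @ r) / (r @ (A @ r))
    x_sd = x_sd + alpha * r

x_exact = np.linalg.solve(A, b)
print(np.allclose(x_jacobi, x_exact, atol=1e-6),
      np.allclose(x_gs, x_exact, atol=1e-6),
      np.allclose(x_sd, x_exact, atol=1e-6))

# Convergence is governed by the spectral radius of M^{-1} N (it must be < 1):
rho_jacobi = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(D, -(L + U)))))
print(rho_jacobi)            # < 1 here, so Jacobi converges (slowly)
```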
This leads into our last and final unit, eigendecomposition.
Our motivation was to solve systems of ODEs, which are easy to solve with a diagonal matrix because the equations are decoupled.
- Recap of Eigenstuff from 20D/20F:
  - Eigenvectors and eigenvalues: they satisfy $Av = \lambda v$, where $v \neq 0$ and $\lambda$ are possibly complex (even if $A$ is real)
  - Only defined for square matrices.
  - If the eigenvectors of $A$ span $\mathbb{R}^n$ (or $\mathbb{C}^n$), $A$ is diagonalizable: $A = VDV^{-1}$, where $D$ is a diagonal matrix of eigenvalues.
  - Not all matrices are diagonalizable: some of the eigenvalues may insist on being accompanied by $1$s off the diagonal. However, certain classes of matrices (such as symmetric real matrices ... they don't have to be positive-definite) always are.
  - Formally, $\det(A - \lambda I)$ is a polynomial of degree $n$, called the characteristic polynomial. Its roots (which always exist as complex numbers) are precisely the eigenvalues of $A$. If $A$ is real, then its roots may be complex, but they must always occur in conjugate pairs $\lambda$ and $\bar{\lambda}$.
- Computing dominant eigenvalues/eigenvectors: we have The Power Method.
  - If the matrix has a single eigenvalue $\lambda_1$ of larger complex magnitude than any other, it is called a dominant eigenvalue.
  - Such an eigenvalue must be real if $A$ is real, due to conjugate pairs of complex numbers having the same magnitude.
  - For matrices with dominant eigenvalues, the iteration $q^{(k+1)} = Aq^{(k)}$, $k = 0, 1, 2, \dots$, eventually converges to the corresponding eigenvector, but with the appropriate rescaling: each $q^{(k+1)}$ should actually be rescaled by a component of largest magnitude, namely, first set $\mathrm{temp} = Aq^{(k)}$ and then $q^{(k+1)} = \mathrm{temp}$ divided by its component of largest magnitude. (A code sketch follows this list.)
  - Convergence theory: suppose $v_1, \dots, v_n$ are a (possibly complex) basis of eigenvectors, corresponding to eigenvalues $\lambda_1, \lambda_2,$ etc., ordered in descending order of magnitude: $|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n|$. Note that $|\lambda_1| > |\lambda_2|$ strictly, by dominance. Then the convergence rate of the norm of the errors $\|q^{(k)} - v_1\|$ is approximately $|\lambda_2/\lambda_1|$ per iteration (where $v_1$ is the suitably rescaled dominant eigenvector).
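Here is a brief sketch of the power method with the largest-magnitude-component rescaling described above (my own illustration on a made-up matrix, not from the course materials):

```python
import numpy as np

# Made-up symmetric matrix with a single dominant eigenvalue.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

q = np.array([1.0, 1.0, 1.0])                 # starting vector
lam_est = 0.0
for _ in range(50):
    temp = A @ q                              # multiply by A
    lam_est = temp[np.argmax(np.abs(temp))]   # component of largest magnitude
    q = temp / lam_est                        # rescale so that that component becomes 1

# The rescaling factor converges to the dominant eigenvalue, and q to its eigenvector.
print(lam_est)                                # approximately the dominant eigenvalue
print(np.max(np.abs(np.linalg.eigvals(A))))   # compare with a direct computation
```

After enough iterations the error shrinks by roughly a factor of $|\lambda_2/\lambda_1|$ per step, matching the convergence theory above.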
I will be having extra office hours on Tuesday 3/15 12pm-2pm
In addition, your TAs have extra office hours: [note that you do not have to go to the TA for your own section specifically]
Jeremy: Sunday 3/13 (note the change) from 5pm-6:30pm and Monday 3/14 from 9:30am-11am
Francesca: Tuesday 3/15 10am-12pm
Reservations for Saturday afternoon and Monday evening have been confirmed (Sat 2:00 PM and Mon 5:00 PM in AP&M Calc Lab, B402A). Nevertheless, please still continue to vote until the poll closes. Again, if you can make the Sunday review session (Sun 3:00 PM in the usual lecture building PCYNH), please go there, because the capacity of Calc Lab is only 50 people.
Also, on Saturday, bring your IDs to campus, so that you can get into the building. Enter by the accessible ramp to the very left of the building when coming from the south side (the side with more doors):
[The north side of the building has only one door, which is also controlled by an ID card lock]
We will be having a review session on Sunday, March 13 from 3pm-5pm in PCYNH 106 (the usual room).
Also beware of Daylight Saving Time ("Spring Forward"), which removes Sunday's 2am, so 3pm-5pm is according to the new time. Most cell phones (even non-smart ones) will automatically recognize this change, so there's not a great risk, but still, good to have a lil reminder.
Name | Office Hours
---|---
Jeremy Schmitt | When? Where?
Yifu Yang | When? Where?
Francesca Grogan | When? Where?
Christopher L Tiee | When? Where?