Problem Set 9: Exponentials, logarithms, trig functions

Hi everyone!

Wow, what a term. Nothing has gone quite as planned! It has been an extraordinarily tough term to try to learn some math.

This will be the last required problem set of the term. I will try to ask some problems which will help you understand the current material. I will also try to ask some problems which will show you some (I think) magical things, which may provide some “closure” to the class.

I started the term promising a solution to the Kepler problem. I won’t put that in this required problem set (since physics is not everyone’s thing), but I am hoping to put it in an “epilogue” lecture/problem set after this one, if I have the time and energy . . .

Anyway, let’s get started!

Exponentials

We have studied the “exponential growth” differential equation at some length now:

$\dfrac{\mathrm{d}P}{\mathrm{d}t}=rP$, where $r$ is a constant.

This models unrestricted continuous population growth. (It also models nuclear decay, and a number of other things.)

Since I don’t necessarily want to be modeling population with time, let’s switch to more generic variables y and x:

$\dfrac{\mathrm{d}y}{\mathrm{d}x}=ry$, where $r$ is a constant.

Taking r=1, we drew the slope field. Assuming that we start at y=1 when x=0, we found an equation for the solution curve:

$y=e^x$

where $e\doteq2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274274663919320030599218174135\ldots$. The number $e$ is found by the formula

$e=\lim_{\Delta t\to 0}\left(1+\Delta t\right)^{\frac{1}{\Delta t}}$;

we gave an argument for why that was so in class. (In case you’re curious, here is the number e to one million decimal places!)
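(If you want to see this limit in action numerically, here is a little optional Python sketch, not part of the problem set: it evaluates $\left(1+\Delta t\right)^{\frac{1}{\Delta t}}$ for smaller and smaller values of $\Delta t$ and compares the result with the library value of $e$. The particular values of $\Delta t$ are just choices I made for illustration.)

    import math

    # Evaluate (1 + dt)^(1/dt) for shrinking values of dt and compare with e.
    for dt in [0.1, 0.01, 0.001, 1e-6]:
        approx = (1 + dt) ** (1 / dt)
        print(f"dt = {dt:<8}  approx = {approx:.10f}  error = {abs(approx - math.e):.2e}")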

The equation $y=e^x$ is the solution (i.e. equation of the solution curve) for the problem

$\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$, and y=1 when x=0.

Let’s elaborate and expand on this a bit.

power series for exponential function

In class, I described trying to solve the differential equation

$\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$, and y=1 when x=0,

by means of an “infinite polynomial”, or power series. I started off looking for a solution with y as a power of x, $y=x^n$. That can’t possibly work, because the differential equation says that the derivative of the function y with respect to x should equal the original function I started with! If $y=x^n$, then

$\dfrac{\mathrm{d}y}{\mathrm{d}x}=nx^{n-1}$,

and there is no way that $x^n$ and $n x^{n-1}$ can be equal functions. (Their numerical values could be equal for some particular values of x, certainly. But the differential equation is saying that the two formulas should be equal for all values of x—that is, that they should define the same solution curve—and that can’t possibly happen for $x^n$ and $n x^{n-1}$.)

I run into the same problem with any polynomial formula for $y$, because, for example, if $y=x^{10}+3x^5-x^3$, then the highest power of $x$ in the derivative of $y$ would be an $x^9$; therefore, the formula of the derivative cannot equal the formula of the original $y$.

But we wouldn’t run into this problem if $y$ was a “polynomial” that did not have a highest power!

So, I assume that the unknown function y of x I’m looking for can be written in the form

$y=a_0 + a_1x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 +\dotsb$,

where the $a_0$, $a_1$, $a_2$, . . . are some unknown numbers. Note that at this point, I do not know whether this is actually going to work! I have no particular reason to believe that the solution curve $y$ can be written this way. I’m just going to try it, and see if it works. I might run into a roadblock, like I did when I tried ordinary polynomial formulas; in that case, I’d have to back up and try something different. Or, it might work out.

(If it does work out, then I don’t have to worry if the path I took to get to the answer was kind of mysterious or illegal. The problem is to find a solution of the differential equation. If you can cook up an answer some crazy way, like seeing it in a dream, that’s fine: as long as you can show it satisfies the equation, it’s got to be correct!)

Alright: suppose the function y of x is given by this “infinite polynomial” (power series),

$y=a_0 + a_1x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 +\dotsb$,

and suppose that it satisfies the differential equation

$\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$, and y=1 when x=0.

Let’s see if we can find the constants $a_0$, $a_1$, $a_2$, . . . based on these assumptions.

Problem: Power Series for $e^x$ (see below for answers!)
a. Use the initial condition, y=1 when x=0, to determine one of the constants.
b. Find the derivative of the power series formula representing the function $y$. (Your answer should be another infinite power series.)
c. Suppose that $\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$. This should mean that two infinite power series are equal, as functions. That is, the numbers in front of each power of x should be the same in both formulas. Use this to give an infinite list of conditions on the $a_i$ constants.
d. Solve for all the $a_i$!
e. Put your answers back in the assumed series for y, to obtain a final formula for y.

Answers:

a. Setting y=1 and x=0 into the assumed formula for y, we end up with

$1=a_0+a_1(0)+a_2(0)^2+a_3(0)^3+\dotsb$

so we get $a_0=1$. We know one of the constants now!

b. Using what we did in Problem Sets 1 and 2, the derivative of

$y=a_0 + a_1x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 +\dotsb$,

is

$\dfrac{\mathrm{d}y}{\mathrm{d}x}=a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + 5a_5 x^4 + 6a_6 x^5 + 7a_7 x^6 +\dotsb$.

c. I am assuming that the only way these two functions, $y$ and $\dfrac{\mathrm{d}y}{\mathrm{d}x}$, can be equal, is if they have exactly the same formula; that is, if the number in front of $x^n$ is the same in each formula for every power $n$. (I’m making an assumption here; if we were doing things more carefully, I’d need to prove that is actually true.)

This only works if we have:

$a_1=a_0$, $2a_2=a_1$, $3a_3=a_2$, $4a_4=a_3$, $5a_5=a_4$, . . .

d. Let me write these slightly differently:

$a_1=a_0$, $a_2=\frac{1}{2}a_1$, $a_3=\frac{1}{3}a_2$, $a_4=\frac{1}{4}a_3$, $a_5=\frac{1}{5}a_4$, . . .

But we know $a_0=1$! So we can solve these “recursively”:

$a_1=a_0=1$

$a_2=\frac{1}{2}a_1=\frac{1}{2}(1)=\frac{1}{2}$

$a_3=\frac{1}{3}a_2=\frac{1}{3}\frac{1}{2}=\frac{1}{3\cdot 2}$

(That dot in the last denominator is a “times”. I’m writing $a_3$ as $\frac{1}{3\cdot 2}$ rather than $\frac{1}{6}$, because I want to be able to see the pattern more easily. But it is just $a_3=\frac{1}{6}$.)

$a_4=\frac{1}{4}a_3=\frac{1}{4}\frac{1}{3}\frac{1}{2}=\frac{1}{4\cdot 3\cdot 2}$

$a_5=\frac{1}{5}a_4=\frac{1}{5}\frac{1}{4}\frac{1}{3}\frac{1}{2}=\frac{1}{5\cdot 4\cdot 3\cdot 2}$

Now you can see the pattern:

$a_n=\frac{1}{n\cdot (n-1)\cdot \dotsb \cdot 4\cdot 3\cdot 2\cdot 1}$

(I put the “times 1” at the end just to make the pattern more consistent looking. Of course it doesn’t change anything.)

Since this pattern happens a lot in math, we have a name (“factorial”) and a symbol for it: for any whole positive number n, we define

$n!=n\cdot (n-1)\cdot \dotsb \cdot 4\cdot 3\cdot 2\cdot 1$.

e. Putting these back into the formula for y, we get

$y=1+x+\frac{1}{2!}x^2+ \frac{1}{3!}x^3+\frac{1}{4!}x^4+\frac{1}{5!}x^5+\dotsb$

Since we already know that $y=e^x$, we get

$e^x=1+x+\frac{1}{2!}x^2+ \frac{1}{3!}x^3+\frac{1}{4!}x^4+\frac{1}{5!}x^5+\dotsb$
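(Another optional Python sketch: add up the first several terms of this series for a few values of $x$ and compare with the library exponential function. The number of terms to keep, and the sample values of $x$, are arbitrary choices; the point is just to watch the partial sums land on $e^x$.)

    import math

    def exp_series(x, n_terms=20):
        """Partial sum 1 + x + x^2/2! + ... with n_terms terms."""
        total = 0.0
        term = 1.0                 # current term x^k / k!, starting at k = 0
        for k in range(n_terms):
            total += term
            term *= x / (k + 1)    # next term: multiply by x/(k+1)
        return total

    for x in [0.0, 1.0, 2.0, -1.0]:
        print(x, exp_series(x), math.exp(x))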

Checking the solution

Remember that I assumed that y could be written as

$y=a_0 + a_1x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 +\dotsb$

for some constants $a_0$, $a_1$, $a_2$, . . . And, pushing through with a spirit of hope, we found that if that is true, then in fact

$y=1+x+\frac{1}{2!}x^2+ \frac{1}{3!}x^3+\frac{1}{4!}x^4+\frac{1}{5!}x^5+\dotsb$

This should supposedly be a solution to the problem

$\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$, and y=1 when x=0.

But is it? The path I took to get here is a little suspicious. Let’s try to check this.

Problem: Checking power series solution for exponential function
a. With y given by the formula above, check that y=1 when x=0. (This shouldn’t be hard!)
b. Find the derivative of y with the formula given above. Does taking the derivative give you back the original formula, as it was supposed to?

There is another thing to worry about. Remember when we found an infinite series for the function $1/(1+x)$, which came out

$\dfrac{1}{1+x}=1-x+x^2-x^3+x^4-x^5+x^6-x^7+\dotsb$?

If you recall, this formula worked out great if the x value was small enough, specifically $-1<x<1$. But it made no sense for larger values.

Something similar can happen with this technique for solving a differential equation: we can get a formula which is correct, but only in a limited range.

Happily, as it turns out, our power series for $y=e^x$ actually gives correct answers for all values of $x$. (If you take a more advanced course, we would prove that.)

Solution with different initial conditions

Suppose that instead of starting with y=1 when x=0, we started with some other value $y=y_0$ at x=0. (In the population model, the value of $y_0=P_0$ would correspond to our starting population at time t=0.)

There are two ways we could find the formula for the solution. First way: we could go back to the same procedure we did above, just with a different starting point. It will be good practice to work this out yourself:

Problem: Exponential differential equation with different initial condition
As before, assume that we are trying to solve $\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$, but now we are assuming $y=y_0$ when $x=0$. As before, assume that $y$ can be written in the form $y=a_0 + a_1x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 +\dotsb$.
a. Use the initial condition to determine the $a_0$. (“Determine” means it will be in terms of $y_0$.)
b. Repeat the process that you did before, to get $a_1$ in terms of $a_0$, to get $a_2$ in terms of $a_1$, etc etc. It should all work out very similarly, except for a $y_0$ in each of your formulas.
c. Put the answers all back into $y=a_0 + a_1x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 +\dotsb$, to get the formula for y. (It will involve $y_0$.)
d. If you haven’t already, try to simplify y. (Hint: You should be able to write it so there is only one $y_0$ in your entire formula.)
e. Compare to the answer you got before, when $y_0=1$. Can you write this solution in terms of the earlier solution?

The answer to the last part should be the following: you will end up with

$y=y_0 e^x$.

This solves the differential equation $\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$, with the initial condition $y=y_0$ when $x=0$.

There is a second way: we can make a change of variable. This is what Thompson calls a “useful dodge”. Here’s how it works. Suppose we want to solve the differential equation

$\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$, with initial condition $y=y_0$ when $x=0$.

I introduce a new variable $u$, which is defined by $y=y_0 u$. Since $y_0$ is a constant, if I vary u a little bit to $u+\,\mathrm{d} u$, then the corresponding $\mathrm{d} y$ is given as follows:

$y+\mathrm{d} y = y_0\left(u+\mathrm{d}u\right)$
$y+\mathrm{d} y = y_0 u+y_0\mathrm{d}u$
$\mathrm{d} y = y_0\mathrm{d}u$

So, dividing by $\mathrm{d}x$ on both sides, we get

$\dfrac{\mathrm{d} y}{\mathrm{d} x}=y_0\dfrac{\mathrm{d}u}{\mathrm{d} x}$.

(We could also have gotten the same result from the derivative rules.)

Substituting this, and $y=y_0 u$, into the differential equation $\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$, we get the differential equation

$y_0\dfrac{\mathrm{d} u}{\mathrm{d} x}= y_0 u$,

from which I can cancel the $y_0$ on both sides, and get

$\dfrac{\mathrm{d} u}{\mathrm{d} x}= u$.

That’s exactly the same differential equation as I started with for y! What is the initial condition? Well, when $x=0$, we have $y=y_0$. Then, putting this into $y=y_0 u$, we get $y_0=y_0 u$, so $u=1$. That is, the initial condition is, when $x=0$, we have $u=1$. So our problem is:

$\dfrac{\mathrm{d} u}{\mathrm{d} x}= u$, and $u=1$ when $x=0$.

That’s exactly the problem we started with!! So it has the same solution:

$u=e^x$.

I want to go back to the original variable, so I’ll multiply both sides by $y_0$:

$y_0 u = y_0 e^x$.

Using $y=y_0 u$, we get:

$y=y_0 e^x$.

That solves the problem

$\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$, and $y=y_0$ when $x=0$,

and it agrees with the solution we found the first way.

Problem: Checking the solution again
Check this directly. That is, start with $y=y_0e^x$. Knowing that the derivative of $e^x$ is $e^x$, use this to find the derivative of $y=y_0e^x$. Check that it satisfies the differential equation $\dfrac{\mathrm{d}y}{\mathrm{d}x}=y$. Also, substitute in $x=0$ to check that then $y=y_0$.

Solution with a different rate

We had started with the exponential growth differential equation,

$\dfrac{\mathrm{d}P}{\mathrm{d}t}=rP$, and $P=P_0$ when $t=0$.

where $r$ is some constant. The last little while, I’ve been assuming $r=1$ to make things simple. But now it’s time to put the $r$ back in.

(I am going to change from $y$ and $x$ back to $P$ and $t$ for these problems, just for variety. It doesn’t really change anything.)

Again, there are two ways to proceed! First way, do the same procedure that we started with from the beginning, with the infinite series:

Problem: Exponential growth with rate r, by infinite series
Suppose we want to solve $\dfrac{\mathrm{d}P}{\mathrm{d}t}=rP$, and $P=P_0$ when $t=0$. Start by assuming that
$P=a_0+a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4+\dotsb$.
a. Find the value of $a_0$, in terms of $P_0$.
b. Find the derivative of the power series representing $P$.
c. Find $rP$, where you will substitute in the power series for $P$, and multiply all the terms through by $r$.
d. Setting $\frac{\mathrm{d}P}{\mathrm{d}t}=rP$, you have two power series that are supposed to be equal. Use this to get conditions on the $a_1$, $a_2$, $a_3$, . . .
e. Solve the conditions to find the values of the $a_1$, $a_2$, $a_3$, . . .
f. Put the answer back in the power series, to find the formula for P. Simplify it as much as you can.

I won’t give the answers right away, but you will be able to check your answers against the second method.

The second method is to make a change of variable (the “useful dodge”):

Problem: Exponential growth with rate r, by change of variable (see below for answers!)
Introduce a new variable $s$, defined by $s=rt$.
a. Find the relationship between $\mathrm{d} s$ and $\mathrm{d}t$. (Remember that $r$ is a constant.)
b. Substitute this into the differential equation $\frac{\mathrm{d}P}{\mathrm{d}t}=rP$, to get a differential equation for $P$ as a function of $s$ (that is, eliminate the variable t).
c. Recognize this as a differential equation you can solve! Check the initial condition, and write out the answer for P as a function of s.
d. Make a substitution to write P as a function of t, which is the final answer.

Answers:

a. $\mathrm{d} s = r\,\mathrm{d} t$.

b. There’s different ways to do the algebra. I find it convenient to rewrite the differential equation as

$\mathrm{d}P=rP\,\mathrm{d}t$.

Then I can identify $r\,\mathrm{d} t$ as $\mathrm{d} s$, so

$\mathrm{d}P=P\,\mathrm{d}s$,

or

$\dfrac{\mathrm{d}P}{\mathrm{d}s}=P$.

c. This is just the original differential equation! And the initial condition is the same, too, because when $s=0$, we have $t=0$, so $P=P_0$ when $s=0$. So the solution is

$P=P_0 e^s$,

as in the previous section.

d. Now, we can put back $s=rt$, to find

$P=P_0 e^{rt}$

as the solution of

$\dfrac{\mathrm{d}P}{\mathrm{d}t}=rP$, and $P=P_0$ when $t=0$.

Nice!

You can use this to check your answer by the first method. For the first method, you should have gotten

$P=P_0\left(1+rt+\frac{1}{2!}r^2t^2+\frac{1}{3!}r^3t^3+\frac{1}{4!}r^4t^4+\dotsb\right)$

You can rewrite this as

$P=P_0\left(1+(rt)+\frac{1}{2!}(rt)^2+\frac{1}{3!}(rt)^3+\frac{1}{4!}(rt)^4+\dotsb\right)$

and that is the same as substituting (rt) in everywhere for the original exponential series we got. That is,

$P=P_0e^{rt}$,

which agrees with the second method.
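(One more optional numerical check, if you like this sort of thing. The Python sketch below grows $P$ step by step using $\mathrm{d}P\approx rP\,\Delta t$ with a small time step, and compares the result with $P_0e^{rt}$. This step-by-step idea is not a method from the problem set; it is just the same small-increment thinking that gave us the limit formula for $e$. The values of $r$, $P_0$, and the step size are arbitrary choices.)

    import math

    r, P0 = 0.5, 100.0          # illustrative rate and starting population
    t_end, dt = 3.0, 1e-4       # total time and (small) time step
    n_steps = int(t_end / dt)

    P = P0
    for _ in range(n_steps):
        P += r * P * dt         # dP = r * P * dt

    print("stepped value:", P)
    print("P0 * e^(rt): ", P0 * math.exp(r * t_end))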

Trigonometric functions

Now, I want to refer back to what we did on trigonometric functions, in Problem Set #6, Problems #5 and 6. First, a quick reminder of the setup. I was starting with a circle of radius 1, whose center was at the origin (0,0) of an x-y coordinate system. I was starting at the point (1,0) on the circle, and then traveling a distance s counter-clockwise along the edge of the circle.

After traveling a distance s counter-clockwise along the edge of the circle, starting at (1,0), I end up at a point (x,y). I wanted to find formulas for x and y in terms of s. Or similarly, to find formulas for s in terms of x or in terms of y.

Now, there is a trig answer:

$x=\cos s$ and $y=\sin s$.

You could either prove these facts, assuming the definitions you learned in high school for cosine and sine (based on right triangles), and using the diagram above. Or, another way of looking at things is to define the functions cosine and sine by this diagram, so that the two formulas above are true by definition. This is the way I will think of it. (Then the high school definitions of cosine and sine follow from this definition.)

Note that this doesn’t totally answer my question: I do not have any way of calculating x or y if I know s. Saying that $x=\cos s$ is just a way of naming the relationship. If I have a calculator, and I know s, I can find x; but how is the calculator doing that??

In what follows, I’m going to look for an actual formula for x and y in terms of s, which will actually let me find the x and y numerically if I know the s. (In another way of speaking, it will give us an actual formula for the cosine and sine functions.)

Remember also that I increased the s slightly to $s+\,\mathrm{d}s$, which changed the point $(x,y)$ to the point $(x+\,\mathrm{d}x,y+\,\mathrm{d}y)$. In Problem Set #6, I asked you to try to figure out $\mathrm{d}x$ and $\mathrm{d}y$ in terms of $\mathrm{d}s$. Here were the answers:

$\mathrm{d} x = -y \,\mathrm{d}s$
$\mathrm{d} y = x \,\mathrm{d}s$.

That translates to two differential equations:

$\dfrac{\mathrm{d} x}{\mathrm{d}s} = -y $
$\dfrac{\mathrm{d} y}{\mathrm{d}s} = x $.

There are two unknown functions, x and y, which each depend on the variable s. Their differential equations are interlocked: the derivative of the unknown function y equals the unknown function x; the derivative of the unknown function x equals the negative of the unknown function y.

We also have initial conditions: looking at the diagram, if $s=0$, then we are at the point $(1,0)$, which means $x=1$ and $y=0$ when $s=0$.

We can attempt to solve these differential equations by the same method as before, with the infinite power series. There are now two unknown power series:

$x=a_0 + a_1 s + a_2 s^2 + a_3s^3 + a_4 s^4 + a_5s^5 + a_6 s^6 + \dotsb$
$y=b_0 + b_1 s + b_2 s^2 + b_3s^3 + b_4 s^4 + b_5s^5 + b_6 s^6 + \dotsb$

Here, the $a_0$, $a_1$, $a_2$ . . . and the $b_0$, $b_1$, $b_2$ . . . are all unknown constants. (These are different from the $a_0$, $a_1$, $a_2$ from before for the exponential function.)

We can use initial conditions and differential equations to solve for all these unknown constants:

Problem: Power Series for sine and cosine
a. Use the initial conditions to find the values of $a_0$ and $b_0$.
b. Find the derivative of the power series for $x$, and of the power series for $y$.
c. Find the power series for $-y$. (Just multiply through the minus sign times every coefficient of the power series for $y$.)
d. Now, use the equalities $\frac{\mathrm{d} x}{\mathrm{d}s} = -y $ and $\frac{\mathrm{d} y}{\mathrm{d}s} = x $ to say that two pairs of infinite power series should be equal. Use that to get two series of conditions that relate the $a_1$, $a_2$, $a_3$ . . . and the $b_1$, $b_2$, $b_3$, . . .
e. Solve these equations, to find the values of all the $a_1$, $a_2$, $a_3$ . . . and the $b_1$, $b_2$, $b_3$, . . .
f. Put the values of the $a_1$, $a_2$, $a_3$ . . . and the $b_1$, $b_2$, $b_3$, . . . back into the power series for x and y, to get formulas for x and y in terms of s.

Now that we have the solutions, this gives us an infinite power series for $x= \cos s$ and for $y=\sin s$. Once you have a tentative answer, you can check it:

Problem: Checking solutions for power series for sine and cosine
Take your power series for x, and take its derivative. It should come out to be the same as the negative of the power series for y. Does it? If not, you might need to simplify; or you might need to correct an error from the previous problem. Similarly, take your power series for y, and take its derivative; it should come out to be the same as your power series for x.

I don’t want to put the answers here, to give you a chance to work them out. However, if you look up “cosine power series”, you can find the answer.

It’s also interesting to check these answers graphically:

Problem: Graphing the power series for sine and cosine
Use Desmos to graph the function $y=\cos x$. (To use the grapher, we have to change our variable s to x, and x to y, which is maybe a little confusing!) Now, on the same axes, graph $y=1-(1/2)x^2$, which should be the first two terms of your power series for cosine (again, switching the variable s to x). There should be a nice fit near x=0 (s=0)! This is the “best fit parabola” to the function cosine at s=0. Now, graph the functions you get when you add in more and more terms of the power series. If everything is right, you should see these polynomials fitting more and more closely the cosine function.
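(If you prefer Python to Desmos, here is a matplotlib sketch of the same comparison. It only plots $\cos x$ and the two-term approximation $1-\frac{1}{2}x^2$ mentioned above, so it doesn't spoil your series; once you have worked out more terms, add them to the variable approx and watch the fit improve.)

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-2 * np.pi, 2 * np.pi, 400)
    approx = 1 - 0.5 * x**2      # first two terms; append your own higher terms here

    plt.plot(x, np.cos(x), label="cos(x)")
    plt.plot(x, approx, label="1 - x^2/2")
    plt.ylim(-2, 2)              # keep the parabola from squashing the picture
    plt.legend()
    plt.show()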

Oscillations

I won’t quite get to the Kepler problem in this problem set, although I’ll try to show how to solve it in a final optional lecture after this one. However, I can show you how these ideas come up in physics problems, by using a simpler physics problem.

Let’s say we have a mass on a spring. I assume that the mass can only move in one direction, let’s call it the x-axis. I set up my coordinate so that the spring is at rest (not stretched or squeezed) when x=0. Then x is the displacement from rest. If $x>0$, the spring is stretched, and the mass experiences a force in the negative direction; if $x<0$, the spring is squeezed, and the mass experiences a force in the positive direction. In either case, the force is pushing the mass back towards x=0; such a force is called a restoring force, and x=0 is called a stable equilibrium.

The simplest model is to assume that the restoring force is simply proportional to the displacement from equilibrium:

$F=-kx$,

where $x$ is the displacement from equilibrium, and $k$ is some constant. Nearly any force approximately obeys this assumption for small enough displacements. So we can use it for a mass on a spring, but also for a swinging pendulum, or a swaying bridge: nearly any situation where there is a stable equilibrium and a restoring force, and where the displacements are not too large.

Using Newton’s law,

$a=\dfrac{F}{m}$,

we find that

$a=-\dfrac{k}{m} x$.

The acceleration, by definition, is the derivative of the velocity with respect to time; and the velocity, by definition, is the derivative of the displacement with respect to time. We have two unknown functions of time $t$, the position $x$ and velocity $v$, and we get two interlocked differential equations for those unknown functions:

$\dfrac{\mathrm{d}v}{\mathrm{d} t} = -\dfrac{k}{m} x$
$\dfrac{\mathrm{d}x}{\mathrm{d} t} = v$

This is suspiciously similar to our situation with the cosine and sine functions!! That is not a coincidence.

As before, we could take two strategies: we could start over with the unknown power series, and solve as before. We would get very similar answers to what we got for cosine and sine, with some $\frac{k}{m}$ factors thrown in. Or, second strategy, we could change variables. I will just outline the second strategy here.

Problem: Solving harmonic oscillator (partial answers below)
a. To start off with, let’s suppose that $\frac{k}{m}=1$. Let’s also suppose that the initial condition is $x=1$, and $v=0$, when $t=0$. In this case, compare to the differential equations we got before for the unit circle, for x and y in terms of s. They should be very similar! Use this similarity to write the position x and velocity v as functions of t. The functions should involve cosines and/or sines.
b. Does your answer make sense physically?
c. Now, if $\frac{k}{m}\neq 1$, we can try the change of variables trick. There’s something a little sneaky this time: I’m going to make a new variable s to replace t, by $s=\sqrt{\frac{k}{m}}t$, and I’m going to make a new variable w to replace v, by $v=\sqrt{\frac{k}{m}}w$. I’m going to leave x as just x. (If I had more time, I could explain where I got these!!) Use these equations to find the relation of $\mathrm{d}s$ and $\mathrm{d}t$, and also the relation of $\mathrm{d}v$ and $\mathrm{d}w$. Put the relations into the differential equations. You should find, once everything is simplified, that you get the equations
$\dfrac{\mathrm{d}w}{\mathrm{d} s} = -x$
$\dfrac{\mathrm{d}x}{\mathrm{d} s} = w$
d. Assuming we again start with $x=1$ and $v=0$ when $t=0$, use that to give initial conditions for $x$ and $w$, when $s=0$. Then give the solutions to the differential equations for x and w, as functions of s.
e. Finally, substitute back in, to find the position x and velocity v as functions of time t.
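(Optional numerical check: the Python sketch below steps the oscillator equations forward in small time increments, the same finite-difference idea that shows up again later, and plots the position over time. The curve it draws should match the formula you find in part e. The value of $k/m$, the step size, and the total time are arbitrary choices, and the plot uses matplotlib.)

    import matplotlib.pyplot as plt

    k_over_m = 0.25       # arbitrary choice of k/m for illustration
    dt = 0.001            # small time step
    t_end = 30.0

    ts, xs = [], []
    x, v, t = 1.0, 0.0, 0.0           # initial condition: x = 1, v = 0 at t = 0
    while t < t_end:
        a = -k_over_m * x             # acceleration from the restoring force
        x += v * dt                   # dx = v dt
        v += a * dt                   # dv = a dt
        t += dt
        ts.append(t)
        xs.append(x)

    plt.plot(ts, xs, label="simulated x(t)")
    plt.xlabel("t")
    plt.ylabel("x")
    plt.legend()
    plt.show()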

Something that is remarkable here is that the differential equation for oscillation is very physically natural. If we try to solve this differential equation, we get the power series for cosine and for sine (for the position and velocity). So, even if we never cared about triangles at all, the cosine and sine functions would be forced on us by physics and differential equations. In fact, this is the more calculus-style way of defining cosine and sine: we define them to be the solutions of the differential equations I wrote earlier. Then we would use that to prove that they in fact give the x and y coordinates of that point on the circle, and use that to finally say they happen to equal adjacent/hypotenuse and opposite/hypotenuse.

And finally, some magic

I want to finish with a formula that I think is kind of magical. It is not just a nice mathematical formula though; it is very important in engineering and physics, for example. It will give a simpler way of solving the oscillation problem we just did, and it will allow for oscillations with damping (though we won’t have time to get to that: take ordinary differential equations next term!).

First, I need to remind you / tell you about complex numbers. You may recall that a negative number cannot have a square root. There cannot be any normal number x such that

x times x equals -1,

because (positive)*(positive)=positive, and (negative)*(negative)=positive. So $\sqrt{-1}$ has no meaning in ordinary (“real”) numbers. However, as early as the 1500s, people identified instances where having a square root of a negative number would be mathematically convenient. It turns out that many things in mathematics (and physics) work out much more nicely if you allow for negative numbers to have square roots. This requires us to expand our concept of “number” to “complex numbers” (which contain the real numbers). Complex numbers turn out to be inextricably part of quantum mechanics, which makes them part of reality! They are not just a convenient mathematical invention.

We simply declare the existence of a new number, which we call $i$, which has the property that

$i^2=-1$.

This means $\sqrt{-1}=\pm i$. Similarly, $\sqrt{-4}=\pm 2 i$. (You can check these claims by squaring $i$ and $-i$, and seeing you get -1 in both cases; or squaring $2i$ and $-2i$, and seeing that you get -4 in both cases.)

A complex number is any number I can make by combining real numbers with any expressions involving $i$. It turns out that the parts involving $i$ can always be greatly simplified; in fact, any algebraic mess you can make with real numbers and $i$s can always be boiled down to $x+iy$, for some real numbers $x$ and $y$.

As an illustration, try the following:

Problem: Powers of $i$
a. Simplify $i^3$. (Remember that $i^3=i\cdot i\cdot i$.)
b. Simplify $i^4$. (Hint: this one should come out to be very simple!!)
c. Simplify $i^5$, $i^6$, $i^7$, $i^8$, $i^9$.
d. What’s the pattern?
e. Simplify $i^{403}$.

If this was specifically about complex numbers, I would go on to tell you how to simplify things like $1/(1+i)$ or $\sqrt{i}$. But what you did in the problem is enough to tell you the magical thing I want to tell you.

Here’s what I want you to try: to simplify the function $y=e^{ix}$, where $i^2=-1$ as above.

Problem: The function $y=e^{ix}$
a. Recall the power series for $y=e^x$ that you found, at the beginning of this problem set. In that series, substitute in $ix$ every place there is an $x$, to obtain the series for $y=e^{ix}$. (Be sure to put the $ix$ in as one unit in place of the $x$, with parentheses as needed. For example, if there is an $x^2$ in the power series for $y=e^x$, replace that $x^2$ with $(ix)^2$, so that the $i$ gets squared as well.)
b. Replace $(ix)^2=i^2x^2$, $(ix)^3=i^3x^3$, etc.
c. Now, use the simplifications you made for powers of $i$.
d. You will find that half the terms have an $i$, and half do not. Collect all the terms without any $i$ in them all first, and then collect all the terms that do have an $i$ in them afterwards. If everything is correct, you should be able to factor out a common “$i$” out of all the latter half of the terms: do so.
e. Now, you should have one power series without any $i$, plus $i$ times a second power series. You should recognize each of those power series from a previous question! Use what those power series represent, to write $e^{ix}$ in terms of other known functions.

I don’t want to write the answer here, because I don’t want to spoil the punch line. But if you are stuck, and want to see what answer I am aiming for, or you want to check your answer, google the term “Euler’s formula” and you’ll see what I’m talking about. (There are actually a bunch of different things called “Euler’s formula”—he did a lot of formulas—but the first hit should be the formula I’m talking about here.)
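(If you'd like to check your series manipulations numerically without spoiling the punch line, Python's built-in complex numbers can help. The sketch below sums the first several terms of the series for $e^{ix}$, writing $i$ as 1j, and compares against cmath.exp(1j*x); the real and imaginary parts it prints should match whatever two functions you identified in part e. The sample values of $x$ and the number of terms are arbitrary choices.)

    import cmath

    def exp_series(z, n_terms=30):
        """Partial sum of 1 + z + z^2/2! + ... for a complex number z."""
        total = 0 + 0j
        term = 1 + 0j
        for k in range(n_terms):
            total += term
            term *= z / (k + 1)
        return total

    for x in [0.0, 1.0, 3.14159]:
        series_value = exp_series(1j * x)      # substitute z = i*x into the series
        print(x, series_value, cmath.exp(1j * x))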

Conclusion

There is so much more I want to tell you about! We’ve just started on all the cool things there are in calculus. But this is a good stopping point for this term: I think, if you’re caught up to this point, you should be able to do most of this problem set before the end, and it should give you a nice wrapping-up point.

If I have the time and energy, I will post an “epilogue” problem set, in which I will try to show you some of the cool things I have left out. In particular, I had promised you a derivation of Kepler’s laws. We’re nearly there (so close!), but I don’t want to overwhelm you; so I’ll put those in the optional “epilogue” (which I hope I’ll have the energy to write!). Either way, I’ll be giving you some references for further reading.


Some applications of eigenvalues (Problem Set 5)

This will be a combination of written lecture and problem set. I’ll discuss some applications of eigenvalues, and integrate that with suggested problems to try.

Three comments:

  • There are more problems here than you can realistically do; many of them are open-ended. I am not expecting you to do them all. Pick one problem (or maybe two), and do that problem in as much depth as you have time for.
  • In fact, I mentioned at the beginning that you could do a project for this course if you choose. Some of the problems here are open-ended enough that they would make good projects, which you could concentrate mostly on for the rest of the term if you wanted.
  • On the other hand, because this lecture focuses on applications, most of the problems are for those of you who are interested in applications and computing. If you are mostly interested in theory, take a look at the questions on Fibonacci numbers and on Spectral Graph Theory; those have some quite interesting theory sides. (The other applications involve a lot of theory too!) But if you are mostly interested in theory, and don’t find any questions to your interest in this problem set, it is OK to skip it entirely; I will have more theory problems on the next problem set that you can do instead!

Google PageRank

A diagram showing a directed graph. The vertices are A, B, C, and D. There are edges from A to B and C; from B to C; from C to B; and from D to A.
A social network.

We discussed this in class some, so I’ll be brief. Google has two problems in developing search:

  • Collecting all the information and storing it, particularly all occurrences of keywords, so you can create a list of all websites with a given search word;
  • Ranking those search results, so you can get to the most relevant and important websites.

The first problem is largely a (difficult and interesting!) coding and implementation problem. The second problem has a more mathematical component to it.

The great challenge is to determine which website is most “important”, without the algorithm being able to understand the semantic content of any of the websites. (Although people have added evaluations of content to improve search, either by humans or AI, it’s not feasible to expect it for the bulk of search ranking. Most of the job needs to be automated.)

There are two basic ideas:

ignore the content entirely, and only look at the directed graph of outgoing and incoming links

This abstracts away anything about the websites. But with only this very abstracted information, what can we tell? How can we get started?

The second idea addresses this, in a self-referential way. It is that a website is important if other important websites link to it. More quantitatively:

the importance score of a website is proportional to the sum of the importance scores of all the websites that link to it

The first basic idea means that this setup is applicable to many things other than search. It has been used to analyze social networks to determine the “most important” players. In particular, it has been used to analyze terrorist networks, in which the analysts knew the number and direction of electronic messages, but nothing about the people behind the accounts. It has also been used in biology, where interactions between proteins or organisms are known without having details of the nature of the interactions.

The second basic idea is, as I said, essentially self-referential.

The problem below has several parts. Feel free to do just part; get into it in as much depth as you find interesting and have time for. (And feel free to skip it if this isn’t your interest.)

Problem: Google PageRank toy problem and set-up
a) Consider the social network that I drew above. We have people A, B, C, and D. An arrow from X to Y represents “X likes Y”. (Or you can think of A, B, C, and D as websites, and an arrow from X to Y represents “X links to Y”.) We are looking for a “popularity” or “importance” score for A, B, C, and D, obeying the rule above: the popularity score of X should be proportional to the sum of popularity scores of all those who like (link to) X. Set up equations expressing this. (Recall that two quantities P and Q are said to be proportional if it is always true that P=kQ, for some constant k.)
b) Write those equations as a matrix equation.
c) Write the matrix equation as an eigenvalue and eigenvector equation.
d) Can you find a non-zero popularity function for this example, which obeys the equation? (You can either try to solve the equations, or look at the graph and think practically.) What is the eigenvalue? Do you think this is typical?
e) Write out what the procedure and the equation would be for any given directed graph.

Problem: Google PageRank, Markov processes, and successive approximation
This is an open-ended question, which could be both theory and/or computing. Someone in class had the excellent idea of treating this problem like the Markov process we discussed earlier. We could start with some arbitrarily chosen popularity scores, and then let popularity “flow” through the network, like we did with the Markov system. The idea is that this will make our popularity scores come closer to scores which obey the condition we want. The correct popularity scores would be obtained by iterating over many time steps, to approximate the limit.

This raises two questions which you could look at:
(Theory-ish): Should this work? Can you show that the limit under this process would obey the equations we got in the previous question? Are there some limitations?
(Computing): Try some examples and see if it works!

Problem: Google PageRank existence and uniqueness
This is an open-ended, more theory type question. We have set up equations that we want the popularity/importance scores to satisfy. The questions that always come up in solving a math problem are:
– does there always actually exist a solution (i.e. a non-zero popularity score which obeys our condition)?
– if there exists a solution, is it unique? If not, can we quantify the nature of the set of solutions? (And, for a practical problem, if there are multiple solutions, can we figure out which one to choose?)
This is worth thinking about at least a little bit. It will be tough to solve in general at this point, but even if you can formulate it, that will give you some insight (and relate to other things we will do soon).

Problem: Google PageRank computing examples
This problem is a “computing” problem, and a little more open-ended. It would be interesting to generate a network, set up the matrix eigenvalue equation, and then actually compute the eigenvalues and eigenvectors, and see what the popularity scores look like. This could also give you insight into the above problems, of whether there always exists a solution and if it is unique.

Note that SciPy/SageMath have built-in functions to compute eigenvalues and eigenvectors. I would recommend starting with a small example, like the one above that we did and have some answers for. Then gradually make it bigger: maybe add some extra edges to our A, B, C, D example first. Then try to add more nodes. If you’re feeling ambitious and want larger examples, you could get the computer to randomly make some larger networks. Even if you’re not specifically interested in this sort of computing application, some computer experimentation like this can give you some intuition and insight about how eigenvalues and eigenvectors work for these sorts of matrices!
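As a concrete starting point, here is a small NumPy sketch for the A, B, C, D network above. I've built the matrix so that entry $(i,j)$ is 1 when node $j$ links to node $i$, which is one way of encoding the rule that the score of a node is proportional to the sum of the scores of the nodes linking to it; your set-up from the earlier problem may well differ (some versions, for example, divide each column by the number of outgoing links), so treat this as a template to adapt rather than as the official answer.

    import numpy as np

    nodes = ["A", "B", "C", "D"]
    links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B"), ("D", "A")]  # (from, to)

    n = len(nodes)
    index = {name: i for i, name in enumerate(nodes)}

    # M[i, j] = 1 if node j links to node i (one possible convention).
    M = np.zeros((n, n))
    for src, dst in links:
        M[index[dst], index[src]] = 1.0

    eigenvalues, eigenvectors = np.linalg.eig(M)
    print("eigenvalues:", eigenvalues)
    print("eigenvectors (columns):")
    print(eigenvectors)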

Problem: Google PageRank further reading
There are a number of good resources about this problem. For this problem, you could start working your way through this paper:
The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google
The paper suggests a more refined version of what I’ve been arguing here, and answers some of the theoretical questions that I mentioned. And there are exercises! If you find this question interesting, working through this paper could be a good final project for the class, that would teach you a lot about the topics we have been working with.

Problem: Computing eigenvalues by powers of a matrix; dominant eigenvalue
In the discussion in class, it came up that for large matrices like in PageRank, the eigenvalues and eigenvectors (particularly the largest, or “dominant” eigenvalue) are often computed by taking powers of the matrix, rather than the method I described in class for small matrices. If you are interested in this topic, you can find an introduction in the first part of the following excerpt (from Kolman and Hill, Elementary Linear Algebra):
Kolman and Hill, Dominant Eigenvalue and Principal Component Analysis
Note that the section starts with the spectral decomposition! For this problem, you could work through the explanations and examples in this handout, and perhaps try a couple of exercises, depending on the depth of your interest in this topic.
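To give you a taste before you read the excerpt, here is a minimal power-iteration sketch: repeatedly multiply a starting vector by the matrix and rescale, and (when there is a dominant eigenvalue) the vector settles toward a corresponding eigenvector. The matrix, the starting vector, and the number of iterations below are arbitrary choices; the excerpt explains when and why this works.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])       # arbitrary example matrix

    v = np.array([1.0, 0.0])         # arbitrary nonzero starting vector
    for _ in range(50):
        w = A @ v
        v = w / np.linalg.norm(w)    # rescale so the entries stay bounded

    dominant_eigenvalue = v @ (A @ v)    # Rayleigh quotient estimate (v has length 1)
    print("approximate dominant eigenvalue:", dominant_eigenvalue)
    print("approximate eigenvector:", v)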

Principal Component Analysis

I won’t attempt to write up an explanation here. Rather, I will refer you to some sources. Depending on your level of interest in this topic, I would recommend reading and working through these sources, trying some of the exercises, and perhaps trying an analysis on your own. This could count as a problem set or a project, depending how far you take it. But if you just want the basic idea, you could read through the first sources.

The first things I would recommend reading are the following excerpts from Kolman and Hill, Elementary Linear Algebra. The first one is on the correlation coefficient:
Kolman and Hill, Correlation Coefficient
Understanding this material on the correlation coefficient is not strictly necessary to understand principal components, but it will help you to understand how the basic data analysis is set up, and what the variance is measuring. It’s also interesting in its own right.

The second thing I would recommend reading is also an excerpt from Kolman and Hill, Elementary Linear Algebra:
Kolman and Hill, Dominant Eigenvalue and Principal Component Analysis
If you aren’t specifically interested in dominant eigenvalues in general (see the problem in the previous section), you could skip directly to the section on principal component analysis, and only refer back to the first part of the chapter as needed. I would recommend working through the explanations and examples, and perhaps trying an exercise.

If you want more detail, this tutorial by Lindsay Smith has a nicely paced, detailed explanation of all the elements that go into PCA, with examples:
Lindsay Smith, A Tutorial on Principal Components Analysis.

If you would like to go deeper into PCA, and learn more of the theory, this tutorial by Jonathon Shlens is good:
Jonathon Shlens, A Tutorial on Principal Component Analysis.
If you wanted to work through this tutorial as a project, it would touch on many of the topics that we have covered and also many of those I am planning for the rest of the term.

Markov Processes and Cryptoanalysis

The following article about using Markov processes to decipher messages is quite fun:

Persi Diaconis, The Markov Chain Monte Carlo Revolution

He also talks about other applications of Markov chains as models. The introduction should be readable for everyone, if you’re interested. Section 2 should be mostly possible to understand, based on what we have done, if you take it slowly. This could be an excellent project: try to work through this section, and maybe implement a similar model for a problem of this type. Sections 3, 4, and 5 are too advanced for this class, but it might be fun to browse ahead and see what sorts of math are being used (particularly if you took Abstract Algebra or Analysis…)

(Incidentally, Persi Diaconis was known for being both a mathematician and a magician. If you’re interested in magic, gambling, card tricks, etc, it might be interesting to look for some of his other works.)

Fibonacci sequence and discrete dynamical systems

You’ve probably seen the Fibonacci sequence before:

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . .

Each term in the sequence is the sum of the previous two terms, starting with $f_0=1$ and $f_1=1$:

$f_{n+1}=f_n + f_{n-1}$, for $n=1, 2, 3, \dotsc$.

You may know that there is a relation of the Fibonacci numbers to the “golden ratio” $\phi$:

$\lim_{n\to\infty}\dfrac{f_{n+1}}{f_n}=\phi$,

where $\phi\doteq 1.618033\ldots$. The golden ratio is the proportion of length to height of a “golden rectangle”. A “golden rectangle” is one for which, if you cut off a square, then the remaining rectangle is similar to the original:

(Image credit Wikipedia)

Perhaps more surprisingly, there is an exact formula for the nth Fibonacci number, in terms of the golden ratio:

$f_n=\dfrac{\phi^{n+1}-\psi^{n+1}}{\phi-\psi}$,

where $\psi=-\dfrac{1}{\phi}$. One of the surprising things about this formula is that there is no apparent reason, looking at it, that it should ever come out to a whole number! (Never mind that it comes out to a whole number for every value of n.)

There are at least two ways of deriving this formula. One is a calculus way, involving infinite series. The other involves linear transformations and eigenvalues. This set of problems is about deriving the closed-form formula above (and generalizing it, if you are interested in going that far).

Problem: Golden Ratio pre-amble
To solve the Fibonacci problem, it will help to figure out some things about the golden ratio first. You don’t have to, you can skip this and press ahead to the next question; but doing this one first will make the computations much cleaner.
a) Find a polynomial equation for the golden ratio $\phi$, based on the defining golden rectangle I mentioned above.
b) Find an exact expression for $\phi$. (Which root of the polynomial is it?)
c) Show that $\phi$ satisfies the equation $\phi^2=\phi + 1$. (If you do it right, this should be immediate, no calculation required!) This equation is quite useful for doing computations with $\phi$. For example, try writing $(2+\phi)^2$ in the form $a+b\phi$, with $a$ and $b$ integers.
d) Write $1/\phi$ in the form $a+b\phi$, for some rational $a$ and $b$.
e) (Theory, optional): The computations above are the main steps in showing that the set of all numbers $a+b\phi$, where $a$ and $b$ are rational, forms a field. Complete this proof. (This is very similar to how we construct the field of complex numbers from the field of real numbers.) (This means that the computations below are actually over this field, called $\mathbb{Q}(\phi)$. Fields like this are very important in math.)
f) What is the other root of the defining polynomial for $\phi$? Write this other root in terms of $\phi$. (Hint: work out the factorization of the polynomial in terms of $\phi$.)

Now, here’s the translation of the Fibonacci numbers into a matrix form. I claim that the following equation holds, for any n=1, 2, 3, . . . :

$\begin{bmatrix}f_{n+1}\\f_n\end{bmatrix}=\begin{bmatrix}1 & 1\\1 & 0\end{bmatrix}\begin{bmatrix}f_{n}\\f_{n-1}\end{bmatrix}$.

Exercise:
a) Start with the vector $\begin{bmatrix}1\\1\end{bmatrix}$, corresponding to $n=1$, and multiply by the matrix above (let’s call it $T$) several times, to check numerically that it does in fact generate Fibonacci numbers.
b) (Optional) Prove that the matrix equation given above is valid for all positive integers $n$.
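If you'd like to let the computer do the arithmetic in part a), here is a short NumPy sketch that applies the matrix repeatedly and prints the resulting vectors; you should see successive Fibonacci numbers appear. (The number of steps is an arbitrary choice.)

    import numpy as np

    T = np.array([[1, 1],
                  [1, 0]])

    v = np.array([1, 1])       # the starting vector [f_1, f_0]
    for n in range(1, 11):
        v = T @ v              # [f_{n+1}, f_n] = T [f_n, f_{n-1}]
        print(n, v)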

Now, finding Fibonacci numbers means finding large powers of the matrix $T=\begin{bmatrix}1 & 1\\1 & 0\end{bmatrix}$. But we have seen how to do this using eigenvalues and eigenvectors!

Problem: Exact formula for Fibonacci numbers from diagonalization
As we did for the Markov chain example recently, diagonalize the matrix $T$. That is, find the eigenvalues and corresponding eigenvectors of $T$. Write $T$ in the form $T=SDS^{-1}$, where $S$ is the change of basis matrix and $D$ is a diagonal matrix. Conclude that $T^n=SD^nS^{-1}$, and use this to explicitly calculate $T^n$. Finally, apply that to the starting vector $\begin{bmatrix}1\\1\end{bmatrix}$, and simplify, to find the exact formula for Fibonacci numbers.

Suggestions:
– CHECK your work at every stage. For example, if a vector $\vec{e}_1'$ is supposed to be an eigenvector of $T$, multiply it by $T$ and check that $T\vec{e}_1'=\lambda_1\vec{e}_1'$ as it should be. It’s easy to make algebra mistakes in this problem; try to catch them as quickly as possible.
– The computations are much neater if you do them all in terms of $\phi$ and $\psi$, and use the computation tricks I suggested above.
– Your final answer might simplify in a non-obvious way. If you get a final answer that looks different from the one I gave above, try substituting $n=1,2,3$ into your formula and see if it works! If it does, then you know there is some tricky way to simplify it to the form I gave. If it doesn’t, then you know there is an algebra error somewhere…
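One more suggestion: once you have a candidate closed formula, a quick numerical comparison against the recursion catches algebra slips fast. Here is one way to do it in Python, using the closed formula exactly as stated above (with $f_0=f_1=1$); the line defining phi uses the exact value you find in part b) of the Golden Ratio pre-amble, and if your derivation produced a different-looking but equivalent form, substitute yours in instead.

    import math

    phi = (1 + math.sqrt(5)) / 2
    psi = -1 / phi

    def fib_recursive(n):
        a, b = 1, 1                # f_0 = 1, f_1 = 1
        for _ in range(n):
            a, b = b, a + b
        return a

    def fib_formula(n):
        return (phi ** (n + 1) - psi ** (n + 1)) / (phi - psi)

    for n in range(10):
        print(n, fib_recursive(n), round(fib_formula(n)))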

Spectral graph theory

This topic is about how the shape of the graph is reflected in the eigenvalues of the graph Laplacian. The graph Laplacian is the operator $\Delta$ that we defined back in the Markov chain example. Since that operator defines diffusion in a graph, there is some intuition about how the eigenvalues connect to heat flow, and how the heat flow connects to the shape of the graph.

People are interested in this problem in both directions: given a graph, what can we say about the eigenvalues; and, given the eigenvalues, what can we say about the graph? As a general rule, the theory is more in the first question, and the applications are more in the second question. A common type of applied question is, vaguely: we have a big complicated graph; how can we understand its shape and connectivity? And eigenvalues are one tool applied to this sort of question.

If you are interested in learning more, or taking this up as an extended problem/project, there are two ways you could proceed.

1) Experiment! This would be a computing question. Look up, or program yourself, an easy way of storing a graph, converting it to an adjacency matrix, and finding the graph Laplacian $\Delta$. Then, experiment: try different graphs, find their Laplacians, and use Python/SageMath (or whatever you prefer) to compute their eigenvalues. Try graphs that are long and spread out (like a line, or a circle), and compare them to graphs which are more “compact” (like complete graphs).

2) Look at references. There are two sets of slides for talks/tutorials I found that give a pretty understandable introduction. The first is a set of slides by D.J. Kelleher, which I found pretty detailed and readable:
D.J. Kelleher, Spectral Graph Theory.
The second is another set of slides. This one is a little less detailed, but it goes further, and could be a good reference point for learning more:
Daniel A. Spielman, Spectral Graph Theory and its Applications.
You could either read through these references from a more applied perspective—looking for computational problems that you could try out yourself—or from a more theoretical perspective, trying to understand some of the theorems stated.

If you wanted to go further in this topic, there is one best book, written by one of the pioneers and masters of this field:

Spectral Graph Theory, by Fan R.K. Chung, AMS 1997

The library has one copy of this book; if more than one person is interested, let me know, and I’ll try to rustle up another copy! Her book is perhaps a bit advanced for the level of this class, but if you take it slowly it should be comprehensible, and has all kinds of cool stuff in it.
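If you would like a concrete starting point for the "Experiment!" option above, here is a small NumPy sketch. I am using the common convention that the graph Laplacian is the degree matrix minus the adjacency matrix; the operator $\Delta$ from our Markov chain example may differ from this by a sign or a normalization, so adjust the code to match the definition we used in class. The two example graphs, a path and a complete graph, are just the "spread out" versus "compact" comparison suggested above.

    import numpy as np

    def laplacian_eigenvalues(edges, n):
        """Eigenvalues of L = D - A for an undirected graph on n vertices."""
        A = np.zeros((n, n))
        for i, j in edges:
            A[i, j] = A[j, i] = 1.0
        D = np.diag(A.sum(axis=1))
        return np.sort(np.linalg.eigvalsh(D - A))

    # A "spread out" graph: the path 0-1-2-3-4.
    path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    # A "compact" graph: the complete graph on 5 vertices.
    complete_edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]

    print("path graph:    ", laplacian_eigenvalues(path_edges, 5))
    print("complete graph:", laplacian_eigenvalues(complete_edges, 5))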

Finite Element Modeling (Physics, Astronomy, Engineering…)

This question is about simulation of physical systems. It is something that is used throughout the physical sciences. Even if you aren’t interested in numerical simulation, it can be very instructive to learn how these models work; in particular, very often the “exact” differential equation models of physics are developed by starting with a discrete finite element model, and then taking the limit as the size of the elements goes to zero.

I will pick one physical problem to look at: wave motion. But the methods work very similarly for other physical problems: quantum wave functions, electric fields, heat flow, stresses, etc etc.

The classic wave motion problem is that of a stretched string with fixed ends (like a guitar string). We imagine that the length, tension, thickness, density, and composition of the string are all fixed at the beginning of the analysis. Depending on the initial motion I give to the string (by plucking or striking), a wide variety of different wave motions are possible:

A physically possible wave motion of a string. I have greatly exaggerated the vertical displacements, so it is easy to see what the string is doing.

“Finite Element Modeling” means modeling this continuous string with a finite number of discrete pieces. I imagine that, instead of having a string of continuous constant density, I have a number of discrete masses, attached by ideal weightless strings that carry all the force of tension:

The idea is that we can simulate the continuous string by means of figuring out how these discrete masses are going to move. The approximation gets better as we take more, smaller masses, spaced more closely together. (Of course, if the physical system we are interested in really IS a bunch of discrete masses attached by strings, then we are already done: the finite element model will already be a good model!)

The three-mass string, in its rest position. I assume that all three discrete masses have an equal mass m, that all the strings connecting them are the same length and are massless, and that all the segments of string have the same tension. The two extreme ends are held fixed for all time.
The three-mass string in motion. Masses 1, 2, and 3 are displaced vertically from their rest positions, by distances $x_1$, $x_2$, and $x_3$ respectively. If mass $i$ is displaced below the rest position, the value of the coordinate $x_i$ will be negative. I am assuming that all the motions are vertical only; I am neglecting any motion left to right, or in and out of the plane of the page. The three masses also have vertical velocities $v_1$, $v_2$, and $v_3$, which I haven’t tried to draw on the diagram.

Let’s start with the three masses on a string that I have pictured above. Let’s look at motion in only one direction: assume that the masses can only move vertically in my diagram, not side to side, or in and out of the page. The ends are fixed in position, at the same height. Disregard gravity. The state of the system will then be determined if I know the displacements $x_1$, $x_2$, and $x_3$ of each of the three masses vertically from their rest positions, and also each of the three velocities $v_1$, $v_2$, and $v_3$ in the vertical direction. The value of $x_i$ will be negative if mass $i$ is below the rest position. The value of $v_i$ will be positive if mass $i$ is moving upwards, negative if mass $i$ is moving downwards.

We will collect all the displacements into a displacement vector, and all the velocities into a velocity vector:

$\vec{x}=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}$, $\vec{v}=\begin{bmatrix}v_1\\v_2\\v_3\end{bmatrix}$.

The vertical force on mass $i$ will have two parts, one from the string connecting to mass $i-1$, and one from the string connecting to mass $i+1$. (If the mass is at the end of the string, “mass $i-1$” or “mass $i+1$” may be replaced by the fixed end. So we might want to introduce the constants $x_0=0$ and $x_{n+1}=0$. In our example, $n=3$ and so we could think of $x_4=0$ for all time.)

To first approximation, the vertical component of the force on mass $i$ from the string to mass $i-1$ will be proportional to the difference in vertical heights of mass $i-1$ from mass $i$. (If you are a physics person, this is a good exercise to verify this claim!) Note that, if $x_{i-1}$ is less than $x_i$, then the force on mass $i$ from that string is downwards.

The constant of proportionality is dependent on lengths of the strings and string tension; let’s just call it a constant $k$.

Problem: Force matrix
a) In the example above, with three masses, write the vertical force $F_1$, $F_2$, $F_3$ on each mass. Each force will depend on the proportionality constant $k$, and also potentially on $x_1$, $x_2$, and $x_3$. (It also depends on $x_0$ and $x_4$, but since these are both the constant 0, you can drop them out in the end.)
b) Write those equations as $\vec{F}=kT\vec{x}$, where $T$ is a matrix.
c) Write out what the matrix $T$ would be if we had a string consisting of $n$ masses (rather than only 3 masses).

Now, the motions are determined by Newton’s law! The vertical acceleration $a_i$ of mass $i$ is given by

$a_i=\dfrac{1}{m_i}F_i$.

Since we have figured out the forces at any given position, we now know the accelerations:

Problem: Acceleration matrix
Show that $\vec{a}=\frac{k}{m}T\vec{x}$, where $T$ is the force matrix you found in the last question.

Now, there are two ways to go. We can leave time as continuous, in which case we get $n$ interlocked differential equations. We know, by definition, that

$\dfrac{\mathrm{d}^2 x_i}{\mathrm{d} t^2}=a_i$.

Since we have formulas for the $a_i$ in terms of the $x_j$, that gives us $n$ interdependent second-order differential equations for the $n$ quantities $x_i$. We can also collect these into one matrix differential equation.

Problem: Second-Order Differential Equations
a) In our example with three masses, write out the three second-order differential equations for $x_1$, $x_2$, and $x_3$.
b) Collect all three differential equations into one matrix differential equation (the left side will be $\dfrac{\mathrm{d}^2 \vec{x}}{\mathrm{d} t^2}$, the right side will be a matrix times $\vec{x}$).
c) How would this look for $n$ masses, rather than three?

It is also often convenient to break these $n$ second-order differential equations into $2n$ linked first-order differential equations. By definition,

$\dfrac{\mathrm{d} x_i}{\mathrm{d} t}=v_i$

and

$\dfrac{\mathrm{d} v_i}{\mathrm{d} t}=a_i$.

Since we have given formulas for the $a_i$, this gives $2n$ first-order differential equations for the $2n$ variables $x_i$ and $v_i$ (instead of $n$ second-order differential equations for the $n$ variables $x_i$).

The other way to approach this would be to also discretize time. We would break time into discrete steps of length $\Delta t$ (this is just a small time step, NOT related to the matrix $\Delta$ from an earlier question!!). In this case, the breaking into first-order differential equations is particularly helpful. We approximate the derivatives by finite differences,

$\dfrac{\mathrm{d} x_i}{\mathrm{d} t}\approx\dfrac{\Delta x_i}{\Delta t}$, $\dfrac{\mathrm{d} v_i}{\mathrm{d} t}\approx\dfrac{\Delta v_i}{\Delta t}$,

which gives us difference equations that approximate the differential equations. In each time step, each $x_i(t_{m+1})$ and $v_i(t_{m+1})$ is calculated by

$x_i(t_{m+1})=x_i(t_m)+\Delta x_i(t_m)$
$v_i(t_{m+1})=v_i(t_m)+\Delta v_i(t_m)$,

where the changes are computed by

$\Delta x_i(t_m)=v_i(t_m)\Delta t$
$\Delta v_i(t_m)=a_i(t_m)\Delta t$.

Here, $v_i(t_m)$ is just the running value of $v_i$ at that time step; but $a_i(t_m)$ stands for the formula for the acceleration of mass $i$, in terms of the various $x_j$, that you computed before.

Problem: Computing with discrete time
(Optional) If you would like to do a computing problem, it is interesting to carry out the computations with discrete time that I have described. Try setting up the discrete time equations, using three masses, and choosing some reasonable value for the constant $k/m$ (you might want to try something like $k/m=0.1$, so it doesn’t jump around too quickly). Then pick some starting value for the $x_i$. See if your simulation gives you something at least half-way realistic for wave motion. (If it’s way off, there is likely an algebra or sign error in there somewhere…)
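If you do try this on a computer, here is a minimal sketch of the kind of loop I mean, in Python. Everything in it is an assumption you can change: I have used the usual nearest-neighbor matrix for $T$ (check it against your own answer from the Force matrix problem), $k/m=0.1$, a time step of $0.1$, and an arbitrary starting displacement.

```python
import numpy as np

# Assumed nearest-neighbor force matrix for three masses with fixed ends;
# compare it with the matrix T you found in the "Force matrix" problem.
T = np.array([[-2.0,  1.0,  0.0],
              [ 1.0, -2.0,  1.0],
              [ 0.0,  1.0, -2.0]])

k_over_m = 0.1                       # the constant k/m (an assumed value)
dt = 0.1                             # the time step Delta t (an assumed value)

x = np.array([1.0, 0.0, 0.0])        # assumed starting displacements
v = np.zeros(3)                      # start at rest

for step in range(200):
    a = k_over_m * T @ x             # a = (k/m) T x
    x = x + v * dt                   # Delta x_i = v_i * Delta t
    v = v + a * dt                   # Delta v_i = a_i * Delta t
    if step % 20 == 0:
        print(step, np.round(x, 3))  # watch the displacements oscillate
```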

Now, let’s go back to the differential equation version (continuous time), and see what we can figure out. It turns out that the eigenvectors and eigenvalues of $T$ are crucial here.

Problem: Decoupling equations; role of eigenvalues
a) Suppose that there exists some eigenvector $\vec{e}_1$ of the matrix $T$. Suppose we start the system, at the initial time, with position being a scalar multiple of that eigenvector, $\vec{x}=A_0\vec{e}_1$ for some scalar $A_0$, and with zero velocity. See if you can explain why the system will always have a position $\vec{x}$ that is a scalar multiple $A\vec{e}_1$ (the multiple $A$ changing over time). (It will be easier to give a precise proof after you complete the next parts.)
b) Suppose we have a differential equation

$\dfrac{\mathrm{d}^2 x}{\mathrm{d} t^2}=-Cx$,

for some scalar function of time $x=x(t)$ and some positive constant $C$. Suppose that the initial condition is $x(0)=A_0$, and $v(0)=0$ (zero initial velocity). Show that $x(t)=A_0\cos(\omega t)$ is a solution to the differential equation, with $\omega=\sqrt{C}$. (Just substitute in and check that the equation is satisfied.)
c) Suppose we have a differential equation

$\dfrac{\mathrm{d}^2 A\vec{e}}{\mathrm{d} t^2}=-CA\vec{e}$,

where $A=A(t)$ is a scalar function of time $t$, and $\vec{e}$ is some fixed vector in $\mathbb{R}^n$. Using the previous part, write out the solution of this differential equation. (Write out the vector components to see what this is saying.)
d) Going back to the first part, can you say what the solution of the differential equation is going to be, when the initial position $\vec{x}(0)$ is an eigenvector of $T$? What does this say about the physical motion in this case?

Problem: Eigenvalues and eigenvectors when n=3
a) In our example above, with three masses, find all the eigenvalues and eigenvectors of the matrix $T$.
b) Draw out what the three eigenvectors look like, as positions of the masses on the string. Describe what the motion of each eigenvector (or “mode”) would look like (if we start in one mode, then all positions at later times are scalar multiples of that mode).
c) Describe what the frequencies of each mode are. In particular, say what the frequencies of the second and third mode are, in comparison to the first mode.
d) Diagonalize the matrix $T$. Use this to give a general solution of the differential equation of motion, for any initial condition $\vec{x}(0)$ (and initial velocity $\vec{v}(0)=0$).
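If you would like to check your hand computation from that problem numerically, a few lines of Python will do it. (I am again assuming the nearest-neighbor tridiagonal form of $T$; substitute whatever you derived.)

```python
import numpy as np

T = np.array([[-2.0,  1.0,  0.0],
              [ 1.0, -2.0,  1.0],
              [ 0.0,  1.0, -2.0]])   # assumed form of T; use your own answer

vals, vecs = np.linalg.eigh(T)       # T is symmetric, so eigh is appropriate
print(np.round(vals, 4))             # eigenvalues, in increasing order
print(np.round(vecs, 4))             # columns are the normalized eigenvectors
```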

This doesn’t only work for stretched strings. For example, we could have a membrane with a fixed edge, like a drum head:

A possible motion of a membrane, with square fixed edges. Again, the vertical motions are highly exaggerated.

We can set up the finite element model in a similar way.

Problem: Vibrating Membrane
Suppose that we have a square membrane, fixed at the edges, like the picture above. Let’s model this with nine discrete masses, in a $3\times 3$ square grid, connected by a square grid of weightless stretched strings. There are strings attached to a fixed square frame, around the $3\times 3$ grid of discrete masses. The rules will be the same: we only consider vertical displacements $x_i$ and velocities $v_i$, and we assume that the vertical forces are simply proportional to differences in height from neighbors, as before.

Use this setup to write the matrix differential equation as before. Everything should be the same as with the string, except for the $9\times 9$ matrix $T$, which will be different. (Actually, I started kind of large, didn’t I? You might want to try doing a $2\times 2$ grid of masses first, which would give you a $4\times 4$ matrix $T$.)

(I’m not asking you to do any of the later steps here, which would be harder! I’m just asking for the initial setup, which corresponds to finding the matrix $T$.)
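If you want to experiment on a computer, here is one way to build such a grid matrix programmatically, under the same nearest-neighbor assumption. The row-by-row numbering of the masses is just a choice, and you should compare the output with the matrix you work out by hand.

```python
import numpy as np

def grid_T(n):
    # Nearest-neighbor difference matrix for an n-by-n grid of masses,
    # surrounded by a fixed frame, numbering the masses row by row.
    # (A sketch under the assumptions above; compare with your hand answer.)
    N = n * n
    T = np.zeros((N, N))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            T[i, i] = -4.0                      # four neighbors: grid masses or frame
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    T[i, rr * n + cc] = 1.0     # that neighbor is a grid mass
    return T

print(grid_T(2))    # the 4x4 matrix for the smaller 2x2 grid suggested above
```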

There’s plenty more I could ask here, but this is already too much!

Eigenvalues III: Spectral decomposition and Markov continued

Hello again!

I left off the last lecture (Eigenvalues II: Markov processes continued), not so much at a logical stopping point, but just where I got tired and ran out of time! A bit like ordinary class I guess.

In the last lecture, I showed how to

  • change basis (coordinates) for a vector
  • change basis (coordinates) for a linear transformation
  • use a basis made of eigenvectors to simply compute powers of a matrix (applied to the transition matrix for our Markov example)
  • use a basis made of eigenvectors to put the matrix of a transformation into simple (diagonal) form
  • apply that diagonalization to again simply compute powers of a matrix (this time in matrix form)

At the end of that lecture, I started to show how to use the eigenvectors to decompose the matrix in a different way: as a sum of simple 1-D projection operators (tensor products of vectors and dual vectors). However, I didn’t get very far: I just reminded you about dual vectors, and computed them in our example.

In this lecture, I am going to show how the spectral decomposition works, and apply it to our example.

There is one thing I haven’t told you: how to actually FIND the eigenvectors! I’m going to put that off for a little bit: first I want to show you, in the last lecture and this one, how they are used mathematically (for diagonalization and spectral decomposition), and then in the next lecture to show you some more applications.

OK, let’s get started.

Reminder of our example from last time

To start, I am going to repeat what I said at the end of the previous lecture, for reference.

Recall that our main example was the following two-state Markov chain:

If $x_1$ and $x_2$ are the numbers of agents (or amounts of stuff, or probabilities of one agent) in the two states, then the system state vector $\vec{x}_n=\begin{bmatrix}x_1\\x_2\end{bmatrix}_n$ evolves from time state $t=n$ to $t=n+1$ by the evolution equation

$\vec{x}_{n+1}=T\vec{x}_n = (I+\Delta)\vec{x}_n$.

(I have switched from writing $\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)$, from the last lecture, to $\begin{bmatrix}x_1\\x_2\end{bmatrix}_t$ instead. The former makes sense logically, because the vector is a function of time; however, most people write this the latter way, so it doesn’t get confused with multiplying by t. I’ll stick with the subscript from now on, but it is only a change of notation.)

In our example, the transition operator $T$ is given by the matrix in the $B=\left\lbrace\begin{bmatrix}1\\0\end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right\rbrace$ basis

$[T]_B=\begin{bmatrix}0.90&0.05\\0.10&0.95\end{bmatrix}$,

and the difference operator, or weighted Laplacian, $\Delta$, is given by the matrix

$[\Delta]_B=\begin{bmatrix}-0.10&0.05\\0.10&-0.05\end{bmatrix}$.

Reminder of our analysis so far

We used eigenvectors to better understand the behavior of this system.

The operator $T$ has eigenvectors $\vec{e}’_1=\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}_B$ and $\vec{e}’_2=\begin{bmatrix}1\\-1\end{bmatrix}_B$, with eigenvalues $\lambda_1=1$ and $\lambda_2=0.85$, respectively. The operator $\Delta$ has the same eigenvectors, with eigenvalues $\mu_1=0$ and $\mu_2=-0.15$ respectively.

We showed how to find the first eigenvector, but I did not show you yet how to find the second one. However, once you are given $\vec{e}’_1$ and $\vec{e}’_2$, it is easy to check that they are eigenvectors, and to find the corresponding eigenvalues.

(I could have equally well taken any non-zero scalar multiple of $\vec{e}’_1$ and/or $\vec{e}’_2$, and they would still be eigenvectors, with the same eigenvalues. I picked these particular multiples to make most sense with our example.)

The eigenvectors form a basis, $B’=\left\lbrace\vec{e}’_1,\vec{e}’_2\right\rbrace$. These are also called normal modes for the system.

If I can write my initial conditions in terms of the normal modes, then I can easily compute the operation of $T$ to any power. If my initial condition $\vec{v}_0=\begin{bmatrix}x_1(0)\\x_2(0)\end{bmatrix}_{B}$ can be written as $\vec{v}_0=x’_1(0)\vec{e}’_1 + x’_2(0)\vec{e}’_2$, then

$T\vec{v}_0=\lambda_1 x’_1(0)\vec{e}’_1+\lambda_2 x’_2(0)\vec{e}’_2$,

so

$T^n\vec{v}_0=\lambda_1^n x’_1(0)\vec{e}’_1+\lambda_2^n x’_2(0)\vec{e}’_2=1^n x’_1(0)\vec{e}’_1+(0.85)^n x’_2(0)\vec{e}’_2$.

We can also do this computation in matrix form. We have the change of coordinate formulas

$[\vec{v}]_{B}=S[\vec{v}]_{B’}$
$[\vec{v}]_{B’}=S^{-1}[\vec{v}]_{B}$,

where $S$ is the change of coordinate matrix, whose columns are the $B’$ basis vectors, expressed in the $B$ basis. In our example,

$S=\begin{bmatrix}\frac{1}{3} & 1 \\\frac{2}{3} & -1\end{bmatrix}$, $S^{-1}=\begin{bmatrix}1&1\\\frac{2}{3} &-\frac{1}{3}\end{bmatrix}$.

We then have the change of coordinate formulas for linear operators,

$[T]_{B’}=S^{-1} [T]_B S$
$[T]_B=S [T]_{B’} S^{-1}$.

In our example,

$[T]_{B’}=\begin{bmatrix}\lambda_1&0\\0&\lambda_2\end{bmatrix}=\begin{bmatrix}1&0\\0&0.85\end{bmatrix}$, $[\Delta]_{B’}=\begin{bmatrix}\mu_1&0\\0&\mu_2\end{bmatrix}=\begin{bmatrix}0&0\\0&-0.15\end{bmatrix}$,

and

$[T]_B=S\begin{bmatrix}\lambda_1&0\\0&\lambda_2\end{bmatrix}S^{-1}=\begin{bmatrix}\frac{1}{3} & 1\\\frac{2}{3} & -1\end{bmatrix}\begin{bmatrix}1&0\\0&0.85\end{bmatrix}\begin{bmatrix}1&1\\\frac{2}{3} &-\frac{1}{3}\end{bmatrix}$.

This diagonalization has some nice properties, one of which is that we can easily compute any power of $T$ in the ordinary $B$ basis:

$[T^n]_B=S\begin{bmatrix}\lambda_1^n&0\\0&\lambda_2^n\end{bmatrix}S^{-1}=\begin{bmatrix}\frac{1}{3} & 1\\\frac{2}{3} & -1\end{bmatrix}\begin{bmatrix}1^n&0\\0&(0.85)^n\end{bmatrix}\begin{bmatrix}1&1\\\frac{2}{3} &-\frac{1}{3}\end{bmatrix}$,

and similarly for $\Delta$.
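None of this is new information, but if you would like to see the diagonalization machinery working numerically, here is a short Python check using the matrices above:

```python
import numpy as np

T = np.array([[0.90, 0.05],
              [0.10, 0.95]])
S = np.array([[1/3,  1.0],
              [2/3, -1.0]])                  # columns are the eigenvectors e'_1, e'_2
S_inv = np.array([[1.0,  1.0],
                  [2/3, -1/3]])
D = np.diag([1.0, 0.85])                     # [T]_{B'}: the eigenvalues on the diagonal

print(np.allclose(S @ D @ S_inv, T))         # True: T = S D S^{-1}
print(S @ np.diag([1.0, 0.85**10]) @ S_inv)  # T^10 computed through the diagonal form
print(np.linalg.matrix_power(T, 10))         # the same power computed directly
```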

Dual vectors

We have identified that the transition matrix $T$ has a steady-state vector ${\vec{e}’}_1$ given by

${\vec{e}’}_1=\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}$.

That is, the vector stays the same when $T$ is applied to it. (We could have equally well taken any non-zero scalar multiple of ${\vec{e}’}_1$, and that would also be fixed.)

We also mentioned that the total number of agents (or total probability) is fixed: the value

$x’_1=x_1+x_2$

is a constant. This is also a fixed thing, like the fixed/steady-state vector! But it is a different kind of thing: the fixed vector is a vector. This fixed thing $x’_1$ isn’t a vector; it’s a coordinate. More formally, it is a dual vector. And it is fixed, so it is an eigen dual vector. That’s hard to say, so we’ll say dual eigenvector instead.

Given some vector $\vec{v}=\begin{bmatrix}x_1\\x_2\end{bmatrix}_B$, how do I actually find its $x’_1$ and $x’_2$ coordinates? These are found by the dual vectors!

Recall that dual vectors are linear functions $f:V\to k$, whatever our field $k$ is. If they are written as matrices, and $V$ has dimension $n$, then they are $1\times n$ matrices. For this reason, they are also called row vectors.

Recall that the dual basis vectors $\overleftarrow{\phi}’_i$, $i=1,\dotsc,n$, corresponding to the basis $B’=\left\lbrace\vec{e}’_i\right\rbrace$, $i=1,\dotsc,n$, are defined by being dual vectors satisfying the condition

$\overleftarrow{\phi}’_i\left({\vec{e}’}_j\right)=\begin{cases}1&i=j\\0&i\neq j\end{cases}$.

It’s not obvious that such dual vectors will always exist for any basis. I’ll ask that as a theory question in the next set of problems.

Assuming these dual vectors exist, they are the coordinate functions in the new basis! That is, if we have a vector $\vec{v}$, and it can be written in the new basis as $\vec{v}=x’_1\vec{e’}_1+x’_2\vec{e’}_2+\dotsb+x’_n\vec{e’}_n$, then

For $i=1,\dotsc,n$, we have $x’_i=\overleftarrow{\phi}’_i(\vec{v})$.

Understanding exercise: Dual vectors give coordinates
Prove the above statement.

In our two-dimensional example, the $B’$ coordinates of any state vector $\vec{v}$ are given by $x’_1=\overleftarrow{\phi}’_1(\vec{v})$ and $x’_2=\overleftarrow{\phi}’_2(\vec{v})$.

How do I actually find the dual vectors? That is, how do I give their matrices in ordinary $B$ coordinates?

As a practical matter, you can use the defining conditions on the dual basis to set up linear equations for the matrices for the dual vectors, and solve. However, the dual basis matrices are actually already hidden in the change of coordinate matrices $S$ and $S^{-1}$ that we just talked about!

I claim that

the rows of $S$ are the matrices of the $B$ dual vectors, expressed in $B’$ coordinates, i.e. the $i$th row of $S$ is $[\overleftarrow{\phi}_i]_{B’}$,

and consequently that

the rows of $S^{-1}$ are the matrices of the $B’$ dual vectors, expressed in $B$ coordinates, i.e. the $i$th row of $S^{-1}$ is $[\overleftarrow{\phi}’_i]_{B}$.

This last thing is what we wanted. So, if we recall that

$S^{-1}=\begin{bmatrix}1&1\\\frac{2}{3} &-\frac{1}{3}\end{bmatrix}$,

we find that the dual vectors $\overleftarrow{\phi}’_1$ and $\overleftarrow{\phi}’_2$ have matrices in the regular $B$ basis given by

$[\overleftarrow{\phi}’_1]_B=\begin{bmatrix}1&1\end{bmatrix}_B$
and
$[\overleftarrow{\phi}’_2]_B=\begin{bmatrix}\frac{2}{3} &-\frac{1}{3}\end{bmatrix}_B$ .

Understanding exercise: Computing coordinates with dual basis vectors
Check that this circle of ideas all closes up with a numerical example. Pick some numerical vector $\vec{v}=\begin{bmatrix}x_1\\x_2\end{bmatrix}$. Use the dual vectors above to find its $x’_1$ and $x’_2$ coordinates. Then check, in ordinary $B$ coordinates, that your chosen vector is in fact that linear combination of the $B’$ basis vectors: that is, check that $\vec{v}=x’_1\vec{e}’_1+x’_2\vec{e}’_2$.
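If you would like the computer to double-check your work on that exercise, here is a small Python sketch using the numbers from our example (the test vector $\vec{v}$ is an arbitrary choice):

```python
import numpy as np

e1 = np.array([1/3, 2/3])        # e'_1 in the B basis
e2 = np.array([1.0, -1.0])       # e'_2 in the B basis
phi1 = np.array([1.0,  1.0])     # phi'_1: first row of S^{-1}
phi2 = np.array([2/3, -1/3])     # phi'_2: second row of S^{-1}

# Defining conditions: phi'_i(e'_j) is 1 when i = j and 0 otherwise.
print(phi1 @ e1, phi1 @ e2)      # 1.0  0.0
print(phi2 @ e1, phi2 @ e2)      # 0.0  1.0

v = np.array([5.0, 7.0])         # an arbitrary test vector, in B coordinates
x1p, x2p = phi1 @ v, phi2 @ v    # its B' coordinates, via the dual vectors
print(np.allclose(x1p * e1 + x2p * e2, v))   # True: v = x'_1 e'_1 + x'_2 e'_2
```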

Understanding exercise: Finding dual basis vectors from the change of coordinate matrix
a) See if you can explain why the statement I made is true: that the rows of the change of coordinate matrix $S$ are the matrices of the $B$ dual basis vectors, written in the $B’$ coordinates. To do this, you’ll have to think conceptually about what everything means. (I’m not looking for a formal proof here; I’ll ask that as a theory problem in a little bit. I’m just looking for a reason.)
b) Explain why that means that the rows of $S^{-1}$ are the matrices of the $B’$ dual basis vectors, written in the $B$ coordinates. (Which is usually what we want to find.)

In fact, we can write down change of coordinate formulas for dual vectors. If $f:V\to k$ is any dual vector, and if $S$ is the same change of coordinate matrix that we defined earlier (whose columns are the $B’$ basis vectors written in the $B$ basis), then

$[f]_{B’}=[f]_B S$, and
$[f]_B=[f]_{B’} S^{-1}$.

Understanding exercise: Change of coordinates for dual vectors
a) Try to explain to yourself why these formulas must be correct. (I’ll ask for a more formal proof later, but you could try to give one now if you’re interested.)
b) Use these formulas to solve the previous exercise. (Hint: what are the matrices $[\overleftarrow{\phi}_1]_B$ and $[\overleftarrow{\phi}_2]_B$? What are the matrices $[\overleftarrow{\phi}’_1]_{B’}$ and $[\overleftarrow{\phi}’_2]_{B’}$?)

The spectral decomposition

Let’s look again at our example operator $T$. Using the eigenbasis $B’$, we can describe the action of $T$ as follows:

$T\vec{v}=\lambda_1 x’_1\vec{e}’_1+\lambda_2 x’_2\vec{e}’_2$.

That is, given any vector $\vec{v}$, we can calculate $T\vec{v}$ as follows:

  • find the $x’_1$ and $x’_2$ coordinates of the vector $\vec{v}$;
  • multiply the $x’_1$ coordinate by the corresponding basis vector $\vec{e}’_1$, and multiply that result by the eigenvalue $\lambda_1$;
  • do the same for the second coordinate;
  • and finally, add the results.

Well, we can write the instruction “find the $x’_1$ coordinate of $\vec{v}$” as a formula now! It is

$x’_1=\overleftarrow{\phi}’_1(\vec{v})$,

and similarly for $x’_2$. Consequently, we can write the recipe above as a formula: for any vector $\vec{v}$, we have

$T\vec{v}=\lambda_1 \overleftarrow{\phi}’_1(\vec{v})\vec{e}’_1+\lambda_2 \overleftarrow{\phi}’_2(\vec{v})\vec{e}’_2$.

I want to write this as an operator formula; that is, I want to leave out the “$\vec{v}$” in the formula. So I could write something like

$T=\lambda_1 \overleftarrow{\phi}’_1(-)\vec{e}’_1+\lambda_2 \overleftarrow{\phi}’_2(-)\vec{e}’_2$,

where the blank “$-$” is supposed to represent where I substitute in the argument $\vec{v}$. This is an odd way to write a function. So I define the notation $\vec{e}’_1\otimes\overleftarrow{\phi}’_1$. This thing is defined to be a linear function which takes an input in $V$ and produces an output in $V$. It is defined to operate on an input $\vec{v}$ as follows:

$\vec{e}’_1\otimes\overleftarrow{\phi}’_1(\vec{v}):=\overleftarrow{\phi}’_1(\vec{v})\vec{e}’_1$.

Note that the output is a vector, so it is indeed a linear function from $V$ to $V$. It is called the tensor product, or outer product, of $\vec{e}’_1$ and $\overleftarrow{\phi}’_1$.

Using this notation, I can write

$T=\lambda_1 \vec{e}’_1\otimes\overleftarrow{\phi}’_1+\lambda_2 \vec{e}’_2\otimes\overleftarrow{\phi}’_2$.

This is called the spectral decomposition of the operator $T$. (The set of eigenvalues of the operator is called its spectrum. This is because the spectral lines in chemistry are computed by means of eigenvalues of an operator (a story for next term!).)

What does the spectral decomposition mean?

What kind of a thing is $\vec{e}’_i\otimes\overleftarrow{\phi}’_i$? First I want to describe it conceptually, and then explicitly in coordinates.

Conceptually, any vector $\vec{v}$ can be written as a linear combination of vectors in the $B’$ basis,

$\vec{v}=x’_1\vec{e}’_1+\dotsb+x’_n\vec{e}’_n$.

Let’s look at the meaning of the first coordinate. In the language we introduced earlier, the vector space $V$ is the direct sum of the subspaces $V_1=\mathrm{span}\lbrace\vec{e}’_1\rbrace$ and $V’=\mathrm{span}\lbrace\vec{e}’_2,\dotsc,\vec{e}’_n\rbrace$. The projection of a vector $\vec{v}$ onto $V_1$, along $V’$, is, by definition, $P_{V_1}^{V’}(\vec{v})=x’_1\vec{e}’_1$.

The $x’_1$ is the first coordinate of $\vec{v}$ in the $B’$ basis; the product $x’_1\vec{e}’_1$ is the (vector) projection of $\vec{v}$ onto that first coordinate axis, along all the other coordinate axes.

Therefore, the tensor product $\vec{e}’_1\otimes\overleftarrow{\phi}’_1$ is a projection operator! It is the projection $P_{V_1}^{V’}$, where $V_1$ and $V’$ are defined as above.

The spectral decomposition of $T$ is a decomposition of $T$ as a linear combination of 1-D projection operators.

Let me be more specific in our two-dimensional example. Let’s call

$P_1:=\vec{e}’_1\otimes\overleftarrow{\phi}’_1$ and
$P_2:=\vec{e}’_2\otimes\overleftarrow{\phi}’_2$,

and let’s define $V_1=\mathrm{span}\lbrace\vec{e}’_1\rbrace$ and $V_2=\mathrm{span}\lbrace\vec{e}’_2\rbrace$. Then

$P_1=P_{V_1}^{V_2}$ and
$P_2=P_{V_2}^{V_1}$.

The projections $P_1$ and $P_2$ split $\vec{v}$ into its parts along the $\vec{e}’_1$ and $\vec{e}’_2$ axes.

If I add those parts back again, I get the vector $\vec{v}$ back! So, for any vector $\vec{v}$,

$\vec{v}=P_1(\vec{v})+P_2(\vec{v})$,

which means that

$I=P_1+P_2$.

This is called a decomposition of the identity. It gives the identity operator as a sum of 1-D projection operators. Any choice of basis gives a decomposition of the identity in this way.

We then also have that

$T=\lambda_1 P_1 + \lambda_2 P_2$.

This is again the spectral decomposition of $T$. It writes $T$ as a linear combination of 1-D projection operators. The coefficients of the linear combination are the eigenvalues. Any time we can form a basis out of the eigenvectors of a transformation, we get such a spectral decomposition.

These projection operators obey some very nice algebraic properties, which help make the spectral decomposition useful. First, any projection operator $P=P_{V’}^{V”}$ for a direct sum $V=V’\oplus V”$ (including the $P_1$ and $P_2$ we have defined) satisfies the following identity:

$P^2=P$.

Understanding exercise: Defining equation of projections
a) Try to explain why the equation $P^2=P$ has to be true for any projection.
b) (Tougher; to think about) If a linear transformation $P:V\to V$ satisfies $P^2=P$, do you think it must be a projection? Why or why not?

Second, these particular projections, associated to the basis vectors, are “mutually annihilating”:

$P_1P_2=0$, and $P_2P_1=0$.

Understanding exercise: Mutually annihilating projections
a) Try to explain why the equations above must be true.
b) (Tougher; to think about) Can you see what property of a basis (or of a direct sum) is expressed in these equations?

Together, they make algebra quite simple.

Understanding exercise: Computing with projections
a) Using $T=\lambda_1 P_1 + \lambda_2 P_2$, show that $T^2=\lambda_1^2 P_1 + \lambda_2^2 P_2$. (Just do the algebra abstractly, using the symbols, and using the two rules for $P_1$ and $P_2$ I gave above. Don’t try to write matrices or anything.)
b) We should of course have that $I^2=I$. Check that this works with $I=P_1+P_2$.

Writing out the tensor products and spectral decomposition as matrices

I claimed above that the tensor product $P_1:=\vec{e}’_1\otimes\overleftarrow{\phi}’_1$ is a linear function, $P_1:V\to V$. It inputs any vector $\vec{v}$ in $V$, and it outputs the vector in $V$ which is the part of $\vec{v}$ along the $\vec{e}’_1$ axis.

But if $P_1:V\to V$ is linear, that means $P_1$ has a matrix! How do we calculate it?

One way is to use the meaning of a matrix. If I am trying to find the matrix $[P_1]_B$ of the operator $P_1$ with respect to the standard $B$ basis, then the rule is that the first column of $[P_1]_B$ should be $[P_1(\vec{e}_1)]_B$, the effect of $P_1$ on the first $B$ basis vector, written in the $B$ basis. Similarly for the second column. But these can be figured out based on the definition of $P_1$ and on what we know already, right?

Understanding exercise: Computing matrix of tensor product $P_i$ using definition
a) Follow what I have said above to work out the matrix for $P_1$ in the standard basis.
b) Do the same for $P_2$.

Here’s a slightly easier way. We know that an $m\times n$ matrix over the real numbers represents a linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$, right? So, for example, any dual vector $f:\mathbb{R}^2\to\mathbb{R}$ must be represented by a $1\times 2$ matrix.

Well, if we write vectors in $\mathbb{R}^2$ as $2\times 1$ matrices, then we could see any vector $\vec{v}$ as being a function from $\mathbb{R}$ to $\mathbb{R}^2$, right?

Understanding exercise: Vectors as linear functions
If I interpret a vector in $\mathbb{R}^2$ as being a function from $\mathbb{R}$ to $\mathbb{R}^2$, by matrix multiplication, then what does this function do, geometrically? What does the range of the function look like?

Using this interpretation, I claim that the tensor product is actually composition! That is, I claim that, for any vector $\vec{v}$,

$\vec{e}’_1\otimes\overleftarrow{\phi}’_1(\vec{v})$

is the same thing as

$\vec{e}’_1\left(\overleftarrow{\phi}’_1(\vec{v})\right)$,

where we interpret $\vec{e}’_1$ as a function from $\mathbb{R}$ to $\mathbb{R}^2$ as I described above.

Understanding exercise: Tensor product is composition
Do you believe what I just said? Try to think it through. If it’s confusing, read ahead, and then come back to this question.

All this fancy language is to make a pretty simple claim: I claim that the tensor product

$P_1=\vec{e}’_1\otimes\overleftarrow{\phi}’_1$

can be written as a matrix by simply multiplying the matrix of $\vec{e}’_1$ with the matrix of $\overleftarrow{\phi}’_1$:

$[P_1]_B=[\vec{e}’_1]_B[\overleftarrow{\phi}’_1]_B$

and similarly for $P_2$.

Understanding exercise: Computing tensor products $P_i$ by matrix multiplication
a) Check the dimensions of the claimed matrix multiplication above: will $P_1$ come out as a $2\times 2$ matrix, as it should?
b) Carry out the matrix multiplication to compute $[P_1]_B$.
c) Do the same for $[P_2]_B$.
d) Check that you get the same thing as you did earlier, using the definition of the matrix.
e) If these are the correct projection matrices, then it should be the case that $P_1+P_2=I$. Is that true for the matrices $P_1$ and $P_2$ that you wrote?
f) The projection matrices were supposed to satisfy $P_1^2=P_1$, $P_2^2=P_2$, and $P_1P_2=P_2P_1=0$. Do they??
g) For our long-running example matrix $T$, it should be true that $T=\lambda_1 P_1 + \lambda_2 P_2$. Check that this is actually true for the matrices $P_1$ and $P_2$ that you wrote!

Once you have solved that last exercise, we have finally written the spectral decomposition for $T$ as explicit matrices!
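And here is a short Python check of that exercise, forming the tensor products as column-times-row matrix products and verifying all the identities we listed (the numbers are the ones from our running example):

```python
import numpy as np

T = np.array([[0.90, 0.05],
              [0.10, 0.95]])
e1 = np.array([[1/3], [2/3]])     # e'_1 as a 2x1 column matrix
e2 = np.array([[1.0], [-1.0]])    # e'_2 as a 2x1 column matrix
phi1 = np.array([[1.0,  1.0]])    # phi'_1 as a 1x2 row matrix
phi2 = np.array([[2/3, -1/3]])    # phi'_2 as a 1x2 row matrix

P1 = e1 @ phi1                    # the tensor (outer) products, as 2x2 matrices
P2 = e2 @ phi2

print(np.allclose(P1 + P2, np.eye(2)))                       # decomposition of the identity
print(np.allclose(P1 @ P1, P1), np.allclose(P2 @ P2, P2))    # P^2 = P
print(np.allclose(P1 @ P2, 0), np.allclose(P2 @ P1, 0))      # mutually annihilating
print(np.allclose(1.0 * P1 + 0.85 * P2, T))                  # spectral decomposition of T
```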

Before we end, let me ask one last check on your understanding:

Understanding exercise: Projections in basis B’
In all of the above, I’ve been trying to write the projection matrices $P_1$ and $P_2$ in the ordinary basis $B$. What if I tried to find the matrices $[P_1]_{B’}$ and $[P_2]_{B’}$, the matrices in the $B’$ basis, instead? What would those be?

Why?

Why have I been so interested in this spectral decomposition? At the moment, it just seems like a re-writing of the diagonalization of the matrix of an operator.

One reason is that it gives a better conceptual explanation of what “diagonalizing a matrix” actually means. We are writing the transformation as a sum of projection operators.

A second reason is that some of the applications we are headed for next make better sense when you think of them in terms of spectral decompositions, rather than as diagonalizations.

A third reason is that these ideas are going to be important for compression. A large matrix $T$ typically has a few relatively large eigenvalues, and lots of smaller ones. The amount of information in the matrix is $n\times n$ values; the amount of information in each projection operator $P_i$ is $n+n=2n$ values. If $n$ is large, that is a lot smaller. If we keep only the $P_i$ in the spectral decomposition corresponding to the few biggest eigenvalues, we get a good approximation for $T$, but using far less information.

Writing linear transformations as sums of tensor products is going to be a good warmup for tensors, which I’m definitely still hoping we have time for!

Finally, the spectral decomposition is very important in extensions and applications of linear algebra that we sadly won’t have time to get to in this class, like infinite-dimensional spaces and quantum mechanics.

OK, enough trying to sell you on this!

What next?

The next thing I will give you is a problem set, with more examples and problems based on these three lectures (Eigenvalues I, II and III). I will include some Theory problems and some Computation problems, for those of you interested in either of those.

After that, I am planning to give you some material on other applications of eigenvectors, eigenvalues, diagonalization, and spectral decomposition.

Then, I want to introduce inner products. We have forgotten about perpendicularity for a while—I want to bring it back (in more sophisticated form)!

Differential Equation Examples (Problem Set #8)

Hi everyone!

This follows the lecture on Differential Equations.

This lecture combines two things: I will give some more examples of setting up a differential equation model and graphing its slope field.

I will also ask you some problems, which I would like you to submit, so I’m also going to call this lecture “Problem Set #8”. Please submit these problems, over slack as usual. You can submit them one at a time. Please indicate in the message the problem set and the problem name, e.g. “Problem Set 8: Numerically approximating the solution of a differential equation”.

I’m mixing the examples together with the problems so that I can explain the problems before getting you to do them, with extra examples as necessary.

Problem: Approximately numerically solving a differential equation

I’d like to start by getting you to do an example like in the previous lecture. In that lecture, I started with the difference equation

$\Delta P = (0.10) P \Delta t$,

describing population growth, at a 10% per time period rate.

At first, I took $\Delta t=1$: that meant that 10% was added to the population discretely, once every time period, and I tabulated it. That is a good population model if the animals have a definite breeding season, for example.

But then, I imagined that the reproduction was happening continuously. In this case, the population does NOT grow by exactly 10% in one time period. Even though that is the continuous rate, after one time period, we will actually have more than 10% added!

That’s because, in that first time period, individuals were being added, and they contribute to the growth. For example, I started looking at half time periods, $\Delta t=0.5$. In the first half time period, the population would increase by 5%; those extra 5% would then reproduce in the next half time period.

But that’s not really good enough, if the growth is continuous: in that first half time period, individuals are added as well. So I split it into quarter time periods $\Delta t=0.25$, tenths of time periods $\Delta t=0.1$, . . . and tabulated those. (Look at the previous lecture to see what I’m talking about!)

Continuous growth would be the limiting values of that table, as I take the $\Delta t$ smaller and smaller. The limiting case would be a solution of the differential equation

$\mathrm{d}P = (0.10) P \,\mathrm{d}t$

or

$\dfrac{\mathrm{d}P}{\mathrm{d}t} = (0.10) P$
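By the way, here is a minimal Python sketch of the tabulation I described for the 10% example, just so you can see the limiting behavior on a computer; the starting population of 1000 is an assumed value.

```python
# Repeatedly apply Delta P = (0.10) * P * Delta t with smaller and smaller
# time steps, and watch the population at t = 1 settle toward a limit.
r = 0.10
P0 = 1000.0                        # an assumed starting population

for dt in [1.0, 0.5, 0.25, 0.1, 0.01, 0.001]:
    steps = round(1.0 / dt)        # how many steps it takes to reach t = 1
    P = P0
    for _ in range(steps):
        P = P + r * P * dt
    print(dt, round(P, 3))
```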

OK, I’d like you to try this yourself:

Problem: Numerically approximating the solution of a differential equation
Suppose we want to solve the differential equation

$\mathrm{d}P = P \,\mathrm{d}t$.

(I have replaced the rate $r=0.10$ in the example I did, with $r=1$ instead. So the organisms are reproducing at a continuous rate of 100% per time period.) Suppose that the initial population is $P=1000$ at time $t=0$. I want you to estimate the population after one time interval, that is, the population at time $t=1$.

As I did before, I’d like you to use the difference equation

$\Delta P = P \Delta t$

for smaller and smaller values of $\Delta t$.
a) Take $\Delta t=0.5$, and make a table which goes up to $t=1$, to estimate $P$ when $t=1$.
b) Take $\Delta t=0.25$, and make a table which goes up to $t=1$, to estimate $P$ when $t=1$.
c) Take $\Delta t=0.10$, and make a table which goes up to $t=1$, to estimate $P$ when $t=1$.
d) How far do you think I have to take this to get an accurate answer for $P$ at time $t=1$, when the growth is continuous?
e) Keep going, with smaller $\Delta t$, until you are confident that you have an answer for $P$ at time $t=1$ that is correct to the nearest whole number.

Problem: Nuclear Decay

OK, now I’ll ask you to cook up another model. This model will be very close to the exponential growth model that I described in detail in the previous lecture, so you might want to refer back to that lecture as you do this problem.

To set up the model, I have to explain a little about the physical system I’m trying to model.

We start with a sample (say a rock, or a piece of charcoal), that contains some number $N_0$ of unstable (radioactive) atoms. After some time, each atom will “decay”, meaning that it will release a particle of radioactivity, and the atom will transmute to a different element or isotope. Effectively, if all we care about are the radioactive atoms, then the atom which has “decayed” is no longer there (it has changed to something else), so the number $N$ of radioactive atoms is reduced by one.

How long a particular radioactive atom will take to decay is random and unpredictable! So it might seem hard to predict what is going on. However, the average time a radioactive atom takes to decay is very consistent and predictable. It’s a known number, which depends on the type of atom. And any reasonable size sample contains billions of atoms or more, so the randomness averages out pretty smoothly.

In a time period, a known number $rN$ of the radioactive atoms will decay, where $r$ is a constant proportion. Those atoms will be removed from the total.

But, like I discussed in the previous lecture, this is a continuous process. So, for example, if I say that 10% of the atoms decay per hour, that is a continuous rate. It doesn’t mean that after 1 hour, the number of atoms will be reduced by exactly 10%. For example, if we split it up by half-hours, then after one half-hour, 5% will have decayed—leaving fewer to decay in the next half-hour. And that’s not really good enough either: we should split it into quarter hours, or tenths of an hour, or minutes, or seconds. . . We should really look at the amount reduced in an infinitesimally small time $\mathrm{d}t$.

Problem: Nuclear decay model
a) Set up a differential equation that models nuclear decay. Let $N$ be the number of radioactive atoms we have at time $t$. Assume that the rate of decay is: 10% of radioactive atoms decay per time period. (But as discussed, that is a continuous rate, so there will actually not be exactly 10% in one time period.) Your answer should be very similar to the exponential growth differential equation from the last lecture, with only one key difference.
b) Draw the slope field, the same way I did for the exponential growth model in the previous lecture.
c) Although negative $N$ doesn’t make much sense for our model, extend the slope field to negative $N$ and negative $t$ anyway, just to see how it works mathematically.
d) Draw a typical solution curve to this slope field.
e) Does the shape of the solution curve make sense, in a practical sense?
f) What happens to the number $N$ of radioactive atoms after very long times, $t\to\infty$?

Some simple abstract differential equation examples

Now, I’d like to try a few simple abstract examples of differential equations. I’m not going to worry about what they’re modeling: I’m just going to set them up as straight math, to get some practice with the slope fields and solution curves.

Let me start with the simplest example I can think of. Let’s say $y$ is my function, and that it depends on the input variable $x$. I’m going to make a differential equation of the form

$\dfrac{\mathrm{d}y}{\mathrm{d}x} = \text{something}$,

which is the simplest form. (All my previous examples have been of this form so far.)

The simplest “something” I can think of is a constant. Let’s make up a constant, say

$\dfrac{\mathrm{d}y}{\mathrm{d}x} = 1$.

What slope field does this correspond to? Well, the differential equation is saying that the slope is 1. Always 1. Forever 1. It doesn’t even matter what $x$ and $y$ are. Differential equation don’t care. Slope is 1.

So every time I draw a little slope line, that slope is going to be 1:

The slope field for the differential equation dy/dx = 1.

Nice!

What’s a solution curve for this slope field going to look like?

That’s right, it will just be a straight line:

Slope field for the differential equation dy/dx=1, with one solution curve, going through (0,0).

I drew the straight line starting at the point (0,0). But that was an arbitrary choice. I could have drawn the line starting at any point. Here are two more lines, one I drew starting at (0,2), and the other at (0,-1):

Slope field for the differential equation dy/dx = 1, with three different solution curves shown.

The choice of starting point is called an “initial condition”.

Now, this differential equation was simple enough that I can actually find an explicit formula for the answer! If

$\dfrac{\mathrm{d}y}{\mathrm{d}x} = 1$,

then a function with derivative of 1 would be

$y=x$,

or more generally,

$y=x+C$

where $C$ is any constant.

That agrees with my picture, right? The three solution curves that I drew were

$y=x$, $y=x+2$, and $y=x-1$,

corresponding to different values of the constant $C$.

What sorts of other slope fields do we get with this simplest example?

Problem: dy/dx = k, where k constant
Let’s say we have the differential equation

$\dfrac{\mathrm{d}y}{\mathrm{d}x} = k$,

where $k$ is some constant. (Our example above was $k=1$.)
a) What does the slope field look like, qualitatively? You should have three possible cases, depending on what sort of value the constant $k$ has . . .
b) Draw some solution curves in each of the three cases.
c) Write the general solution to the differential equation. (It will be the same equation for all three cases.)

Now, I’d like you to try the next to simplest example on your own:

Problem: dy/dx = x
Let’s say we have the differential equation

$\dfrac{\mathrm{d}y}{\mathrm{d}x} = x$.

a) Draw the slope field for this differential equation. (Pick some points, say (0,0), (0,1), (0,2) . . . then (1,0), (1,1), (1,2), . . . , then (2,0), (2,1), (2,2), . . . etc. At each point, use the differential equation to find the slope, and draw a little line with that slope, at that point.)
b) In (a), I suggested using points with positive values. If you didn’t do it already, extend your slope field to some negative values of both x and y.
c) Draw two different solution curves to this slope field.
d) Find the explicit equation of y as a function of x, that will solve the differential equation.
e) Check that your explicit formula agrees with the solution curves that you found graphically!

Let me show you another example, this one slightly more complicated. Let’s consider the differential equation

$\dfrac{\mathrm{d}y}{\mathrm{d}x} = y-2$.

Try drawing the slope field yourself first, don’t look at the answer, I’ll wait!

. . .

. . .

. . .

OK, what did you get? My slope field for this differential equation looks like this:

Slope field for the differential equation dy/dx = y-2.

Is that what you got?

I got it by substituting points (0,0), (1,0), (2,0), (0,1), etc. in for x and y in the differential equation to find the slope, then drawing a little line of that slope at the corresponding point. For example, the little line at (0,0) has slope dy/dx = y-2 = 0-2 = -2.

But, I could do this more conceptually, since I just want a qualitative picture. First, I note that the slope dy/dx = y-2 does not depend on x! So the slope lines will have the same slope if I change the x coordinate, that is, if I move horizontally. So the slope on horizontal lines is constant.

Next, I look at what happens vertically. An important value is y=2: if y=2, then the slope is zero. That’s significant, so I draw that first: a bunch of horizontal slope 0 lines, all along the horizontal line y=2.

Then, if $y>2$, I know that the slope dy/dx = y-2 is positive. So the slope lines above the line y=2 will all point up (going left to right). And, the more y is bigger than 2, the bigger that slope will be: the lines slope up more as I go up further above the line y=2. I draw all those in, remembering that the slopes are constant along horizontal lines.

Finally, if $y<2$, then the slope dy/dx = y-2 is a negative number. So the slope lines below the line y=2 will all point down (going left to right). Again, the further we are below y=2, the more negative those slopes will be.

OK, so what do solution curves look like? Let me start with one that begins at the point (0,3), above the line y=2:

Slope field for the differential equation dy/dx = y-2, and a solution curve passing through (0,3).

Note that I started at (0,3), and followed the slope field going in the positive x direction (to the right). But I also started at (0,3) and followed the slope field going backwards, in the negative x direction (to the left).

As I go backwards from (0,3) in the negative x direction, the slope field is pushing my solution curve down towards the line y=2. However, as my curve gets closer to y=2 for negative x, the slope line also gets shallower. I haven’t drawn them all, but as y gets close to 2, the slope dy/dx = y-2 gets smaller. So as I travel in the negative x-direction, the solution curve gets closer to y=2, but more and more slowly, so it never quite gets there. The technical word for this is that y=2 is an asymptote for the solution curve.

If I start below the line y=2, say at (0,1), then I get something kind of opposite:

The slope field for the differential equation dy/dx = y-2, with solution curves starting at (0,3) and at (0,1).

You should check the logic of that curve in a similar way.

There is one other possibility: if I start exactly on the line y=2, then I will (in theory) stay there forever, because the slope is 0, and the solution curve doesn’t move up or down:

The slope field for the differential equation dy/dx = y-2, with solution curves starting at (0,3), at (0,1), and at (0,2).

The technical word for this is that y=2 is an equilibrium value for this differential equation.

Note that if we were even a tiny bit away from y=2—say we start at (0,2.001)—then the slope will be positive. If we move in the positive x direction from there, the solution curve will move up away from y=2, and will slope more and more as it gets away from y=2, increasing more and more rapidly. A solution that starts near the equilibrium will move away from the equilibrium as x increases. The technical word for this is that y=2 is an unstable equilibrium for this differential equation.
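Here is a tiny numerical illustration of that instability, following the slope field with small steps from three nearby starting heights (the step size and starting values are just choices):

```python
# Follow dy/dx = y - 2 with small Euler steps, starting just above,
# exactly at, and just below the equilibrium y = 2.
dx = 0.01
for y0 in [2.001, 2.0, 1.999]:
    y = y0
    for _ in range(int(5 / dx)):   # march out to x = 5
        y = y + (y - 2.0) * dx
    print(y0, round(y, 4))         # the tiny offsets grow; y = 2 stays put
```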

Physics (falling object)

Let’s do a physics example. Suppose that we drop an object from some height. I will take its height from the ground to be the position variable $y$, which will vary with time $t$. That means, as it falls, its velocity $v=dy/dt$ will be negative, because the height $y$ is decreasing over time. We interpret the negative velocity as a direction: the object is traveling downwards.

Recall that Newton’s law says that the acceleration is given by a=F/m. Acceleration is rate of change of velocity, which in calculus terms is dv/dt. So Newton’s law is a differential equation:

$\dfrac{\mathrm{d}v}{\mathrm{d}t}=\dfrac{F}{m}$.

Here, $m$ is the mass of the object, and $F$ is the force. The force could depend on the time $t$, the position $y$, and/or the velocity $v$. For any known force $F$, we get a differential equation, and then we can start drawing slope fields and trying to figure out what will happen.

For the falling object, the simplest case people usually look at (starting with Galileo and Newton) is to assume no air resistance, and to assume the object does not move far from the earth. Under those assumptions, the force on the object is constant in time. The constant force of gravity depends on the mass of the object; it is

$F=-mg$,

where $g$ is a constant (the “acceleration of gravity”). (I’ve taken it negative, because the force is downwards, and I’ve taken “up” to be positive.) That means the differential equation for velocity is

$\dfrac{\mathrm{d}v}{\mathrm{d}t}=-g$,

where $g$ is constant. This gives a slope field you have done already (with different variables) in a previous problem:

Slope field for differential equation of a falling object without air resistance, dv/dt=-g.

A solution curve would just be a straight line. If we drop the object, so that the initial velocity is zero, the solution curve would look like:

Slope field for differential equation of a falling object without air resistance, dv/dt=-g, with a solution curve with initial condition v=0 when t=0.

The object has a more and more negative velocity as time goes on. That is, it is moving downwards, faster and faster. Its speed (absolute value of velocity) is increasing linearly with time: the increase in each second is always the same.

This agrees with the explicit solution of the differential equation: if

$\dfrac{\mathrm{d}v}{\mathrm{d}t}=-g$,

then the solution to this is

$v=-gt + C$,

where $C$ could be any constant. In fact, $C=v_0$, the initial velocity at time 0. So for the solution curve we drew, $C=0$, and $v=-gt$. The curve is a straight line, with negative slope $-g$.
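If you like, you can confirm this with a few steps on a computer; here is a minimal sketch (the value of g and the step size are just assumed numbers):

```python
# Step dv/dt = -g forward from v = 0 and compare with the exact solution v = -g t.
g = 9.8
dt = 0.1
v = 0.0
t = 0.0
for _ in range(10):                    # ten steps takes us to t = 1
    v = v + (-g) * dt
    t = t + dt
print(round(v, 3), round(-g * t, 3))   # both are -9.8: the velocity is v = -g t
```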

Well, what if we don’t ignore air resistance??

Problem: Falling with air resistance

Air resistance is an amazingly complicated phenomenon. There are engineers who spend their whole careers trying to figure out how to minimize or harness wind resistance. But in the spirit of modeling that I described above, we can start with the simplest possible assumptions.

For a given object, the most important fact about air resistance is that it gets bigger as the velocity gets bigger. Go faster, you feel more air resistance.

What would be the simplest assumption to make about this relationship? That would be proportionality. That is, we could assume that if there is twice as much velocity, then there is twice as much force from air resistance. Three times the velocity, three times the force. Half the velocity, half the force. In other words, “proportionality” means we are assuming a linear relationship between air resistance and velocity. It’s not exactly true, but it’s actually a pretty good approximation.

What this means as an equation is that

$F_{\text{air}}=-kv$,

for some unknown constant $k$. The $k$ will depend on the size of the object, the exact shape of the object, the texture of its surface, the density of the air, and possibly other things. But we are assuming those are all fixed, so we lump them into a constant $k$.

Note that I am assuming the constant $k$ is a positive number. The minus sign in my equation indicates that, if the velocity is positive, then the force is negative; if the velocity is negative, then the force is positive. The force of air resistance points in the opposite direction to the direction of travel. (Right?)

If we put this into Newton’s law of motion, we get the following differential equation for the velocity as a function of time:

$\dfrac{\mathrm{d}v}{\mathrm{d}t}=-g-\dfrac{k}{m}v$.

Let’s see what this differential equation is predicting!

Problem: Falling object with air resistance
a) Draw the slope field for the differential equation above. Note that, since we don’t have particular numbers for the constants g, k, and m, you will have to draw the slope field qualitatively and conceptually, like I described doing in the last example. Figure out if the slopes will be constant on vertical or on horizontal lines. Then figure out at what points the slope will be zero (equilibrium). Then figure out what happens to the slopes above or below that equilibrium.
b) Draw the solution curve corresponding to dropping the object, so v=0 when t=0.
c) Interpret that solution curve physically. What is happening to the object over time? Does this make sense? (If it doesn’t, you might want to look back at part (a) and check that you did things correctly!)
d) What happens to the velocity after long times, $t\to\infty$? Does this make sense physically?
e) Draw some other solution curves. There should be three “types” of solution curves, similar to the last example that I did above.
f) Figure out what each of these solution curves would mean physically, in terms of the initial condition. What happens over time, physically, in each case? Do these predictions make sense physically?

Problem: Newton’s law of cooling

Let’s do another modeling example.

Here’s the thing I want to model: a hot object that is cooling down. Or, a cold object that is heating up.

Let’s say the temperature of the object is $T$. The temperature $T$ depends on time $t$. Let’s say that the room temperature is fixed; let’s call the room temperature $R$, which we assume to be constant.

Let’s assume that the object cools down or heats up continuously.

It is observed that very hot objects cool down faster than objects which are less hot (less above room temperature). Similarly, very cold objects heat up faster than objects closer to room temperature.

Newton made the following assumption to model this situation:

the temperature changes with time at a rate which is proportional to the difference between its temperature and the room temperature.

This isn’t 100% true, but like I was saying before about population models, it’s good to start with the simplest reasonable assumption, and see how far that gets you.

Note that “proportional” means “equal to a constant times”. If $A$ and $B$ are “proportional”, then their ratio A/B is a constant, A/B=k, so that A=kB for some constant k.

Problem: Newton’s law of cooling
a) Set up a differential equation for the temperature $T$ as a function of time $t$, which expresses Newton’s assumption above. Your answer will involve the room temperature $R$, and you’ll also need to introduce a proportionality constant (which you can call whatever you like).
b) Draw the slope field for this differential equation. (Because you don’t have specific numbers for the room temperature and the proportionality constant, you’ll have to draw the slope field qualitatively and conceptually, like I described in a previous example.)
c) Draw some solution curves for this slope field. There should be three basic “types” of solution curves (like in a previous example I did).
d) Check the predictions physically: the solution curves are supposed to describe the temperature of a cooling (or warming up) object as a function of time. Do they make sense physically? Do they work in all three cases? If so, great, explain why they make sense to you! If not, though, take a look back at your differential equation, and see what you might have gotten incorrect, that would give you these wrong predictions.

Refining the population model

Let’s go back to the population model that I described in the previous lecture. As the name “exponential growth model” suggests, this predicts an explosive growth of the population.

This is pretty accurate, for limited times, in situations where death is not important and resources are plentiful. For example, in cell biology, when you have some cells in a growth medium on a dish, the cells happily have plenty of space and food. If you are growing the cells for a few hours or days, people commonly use the exponential growth model to predict how many cells will be present at a given time. It works pretty well.

But, if you extrapolate that equation to longer times, you find that the equation is predicting that after some weeks or months, there will be so many cells that they fill the whole lab, the whole campus, the whole world. We’ve exceeded the range of validity of the model!

If you recall, we vastly oversimplified. In particular, we ignored any limitation on resources: we assumed the organisms could happily reproduce at a constant percentage rate, forever. If we run the equation for long enough, that’s not very realistic.

Presumably, as resources start to reach their limits, that reproduction rate $r$ cannot any longer be constant; it must depend on the population $P$. For a small population, it is the same $r$ that we assumed before; but for larger populations that are reaching the limits of food and space, the $r$ must become lower. So $r$ must not be a constant, but rather a function of population $P$. (Or, at least, that is one way to look at the question of limited resources.)

Open-ended problem: Population growth with limited resources
Here’s an open-ended problem to try out. See if you can make a guess about how $r$ should depend on $P$, to model the assumption of limited food and space. Try to keep things as simple as you can: what is the very simplest mathematical assumption that will reflect our practical assumptions about how $r$ should depend on $P$? Once you have a guess for this dependence, write the corresponding differential equation. Try to draw the slope field, and see what your differential equation is predicting about the population. See if that makes sense! If it doesn’t, go back to your model assumptions, and see if you can correct them in a way that gives predictions that are at least plausible.

What next?

There are two directions I want to go after this.

First, I want to explain how to find explicit solutions for some of these differential equations. That is, I want to find an actual formula for the solution curves that we have been drawing just visually. We’ve seen how to find explicit solutions for very simple differential equations, like dy/dx=1, or dy/dx=x, but that’s it so far. This is a very big question (there’s a whole course on it next term!), but I want to at least show how to solve some of the other simple differential equations. This process will involve natural exponential and logarithm functions, and I’ll re-define and explain these for you, as well as tell you some new things about them.

Second, I want to continue this for differential equations that have more than one variable depending on time. For example, I might have population equations for a prey and a predator species, each of which population depends on time (and on the population of the other). Or I might have a disease model, that models number of susceptible people and number of infectious people. In physics, I have both velocity and position depending on time, and their rates of change might depend on each other. This will lead to something called “phase plane analysis”. It will also lead us back to trigonometric functions, and ultimately back to the Kepler problem where I started the term.

Differential equations

In this lecture, I want to introduce how differential equation models are created and interpreted. In the process, I’ll talk about the geometrical meaning of the derivative, and introduce the concepts of slope fields and phase planes. As well as being useful in itself, this will lead into exponential and logarithmic functions, and from there will lead back to trigonometric functions and ultimately the Kepler problem again!

I know that things have been pretty physics-y so far. So, although there are many physics examples I could give here, I’m going to do some examples from different fields for a bit.

Example: simple population growth

Let’s start with modeling population growth. These could come up in biology (e.g. growing a culture of cells), ecology (a population of animals), or geography/economics (a population of people).

There are many complicated factors to take into account when thinking about population growth. We might want to consider:

  • how quickly does the organism reproduce?
  • how does the organism reproduce? (do individuals need to find mates? How long do they need between reproductions? Do they need a certain amount of resources? Is it seasonal or continuous through time?)
  • are the resources (like food and space) limited?
  • is there disease? Are there predators, war, competition?

You could probably add many things to the list. It’s tempting to try to put as many things as we can into the model, to make it more realistic.

However, one counter-intuitive principle of modeling is that this is NOT a good place to start. There are a number of problems with trying to put too many factors into the model to begin with:

  • it’s hard to know where to start, and we’ll get confused
  • if the model is too complicated, there will be too many parameters to try to estimate when comparing to reality
  • it will not be clear what the most important factors are when comparing to reality, so we won’t get any insight into the actual dynamics driving the observed behavior
  • a model with too many parameters actually starts to lose predictive power, because it can start to fit anything

Contrary to what you might think, usually the best place to start when creating a mathematical model is to strip things down as much as possible. We want to idealize to the point where it seems almost too simple. If we can isolate just one factor of the thing we are trying to model, then we have a chance of writing a reasonable equation that describes that factor. Then, we can see what the model predicts. Sometimes, just one very simple factor will do a surprisingly good job! Other times, that model will prove to be TOO simplified. But that’s OK: it will be easier to add things to our simple model, slowly and one at a time, rather than to start with something that is too complicated and try to pare it down.

Let’s see how this looks for population growth. I’m going to make things as simple as possible:

  • let’s only focus on growth. So I’m going to ignore predators, disease, war.
  • in fact, I’m going to ignore death! Let’s say the organisms are immortal. (Or, at least, that they live longer than the time period I’m concerned with.) Not so realistic, but remember, we are paring down as much as we can.
  • Let me also ignore problems of gestation period, finding mates, etc. So, I could imagine that I have something like binary fission of single-celled organisms. The model will also work to model more complicated things like mammals, but it’s going to leave out the more complicated features of reproduction; depending on what we’re interested in, that may be an oversimplification, but let’s start simple now and add complications only as needed.

OK, so what am I left with? I’m going to leave in the growth rate: so I have only one factor or dynamic that I’m modeling, how the population grows.

Now, are we talking about absolute growth rate (number of new individuals per unit time), or about relative growth rate (some proportion of our population of individuals will reproduce per unit time)?

Assuming an absolute growth rate would mean that our population grows linearly, the same number of new individuals each hour or year or whatever our time period is. That’s a little TOO simple: as we have more individuals, we have more individuals to reproduce. THAT is the core dynamic that I’m trying to describe.

So, let’s assume that the relative growth rate is constant. That is, in each time period, some percentage of the individuals will create a new individual. The period might be one year (if we’re talking about squirrels) or one hour (if we’re talking about yeast). The rate will depend on the organism. Let’s say, for argument’s sake, that the rate is 10%: each time period, 10% of our individuals will create a new individual. Then if we have 1000 individuals at the start, we will have:

Time    Individuals    New individuals created
0       1000           1000(0.10) = 100
1       1100           1100(0.10) = 110
2       1210           1210(0.10) = 121
3       1331           1331(0.10) = 133.1
4       1464           1464(0.10) = 146.4

Such a model is called a difference equation. We can write it out as an equation: in each time period, our population increases by $\Delta P$ individuals. The number of new individuals created is equal to 10% of the current population. Therefore,

$\Delta P = (0.10) P$,

where $P$ is a variable that depends on time $t$ (that is, it is a function of $t$), and the $\Delta P$ is understood to be for one time period.

If the rate was some other number, rather than (0.10), we could just put that number $r$ in place of the (0.10) in the equation:

$\Delta P = r P$,

where $r$ is a constant, to be determined from observation.

This gives us a model for the growth! We can measure the growth of our cell culture, say, for a few time periods to estimate the rate $r$. Then we can make a table like the above.
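If you want to generate tables like this yourself, here is a tiny Python sketch; the starting population of 1000 and the rate 0.10 are just the example values from above:

    # Iterate the difference equation Delta P = (0.10) P, one time period per step.
    P = 1000.0
    r = 0.10
    for t in range(5):
        new = r * P              # new individuals created this period
        print(t, round(P, 1), round(new, 1))
        P = P + new              # population at the start of the next period

Each printed row matches a row of the table (up to rounding).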

There is a lot more to say about difference equations, where the time step is discrete. But I’m not going to do them justice in this class, because I want to focus on differential equations, where we make the change continuous.

Differential equations versus difference equations

Now, let’s imagine that our organisms are always reproducing. (They’re not like squirrels, with a breeding season; they’re more like yeast.) Then the table we wrote above is a little misleading: during that first time interval (first day or hour or whatever), the organisms are growing. There are some new organisms produced during that first time interval which we aren’t accounting for. That won’t be a big deal if the time interval is relatively short, compared to the rate of growth of the organisms; but if the time interval is on the long side, we should take this into account.

That is, if we again assume that 10% of individuals are, on average, reproducing per growth period, then 5% of them should reproduce in the first half of a time period (assuming that the times they reproduce are randomly distributed). That is, 5% is 0.5 of the 10% rate, because we are looking at 0.5 of a time interval. If we start with 1000 individuals, then after half a time period we should have 1050; then, in the next half-period, those 50 individuals are also reproducing, so after one time period, we would have 1103. Our table would look like:

Time    Individuals    New individuals created
0       1000           1000(0.10)(0.5) = 50
0.5     1050           1050(0.10)(0.5) = 52.5
1.0     1102.5         1102.5(0.10)(0.5) = 55.1
1.5     1157.6         1157.6(0.10)(0.5) = 57.9
2.0     1215.5         1215.5(0.10)(0.5) = 60.8
2.5     1276.3         1276.3(0.10)(0.5) = 63.8
3.0     1340.1         1340.1(0.10)(0.5) = 67.0

Note that I’ve started keeping decimals. I’m idealizing more here: I’m keeping decimals when I do the calculations, then I’ll round off to a whole number of individuals in the final answer if needed. Note also that it doesn’t make a huge difference, but that’s because of my numbers: with other numbers (like if $r$ was larger), the difference would be bigger. And even in this case, the differences will start to get substantial if I run this for longer times.

But why stop at half time periods? During that first half a time period, the organisms were reproducing as well. So, in the first (0.25) of a time period, we would have (10%)(0.25)=2.5% of the organisms reproducing. And those extra 2.5% would have a chance to reproduce in the following (0.25) of a time period. So our table would look more like

Time    Individuals    New individuals created
0       1000           1000(0.10)(0.25) = 25
0.25    1025           1025(0.10)(0.25) = 25.63
0.50    1050.63        1050.63(0.10)(0.25) = 26.27
0.75    1076.90        1076.90(0.10)(0.25) = 26.92
1.00    1103.82        1103.82(0.10)(0.25) = 27.60

And why stop there? In every 0.1 time periods, we would have 10%(0.1)=(0.10)(0.1)=0.01=1% of the organisms reproducing, and they would go on to reproduce in the next 0.1 of a time period . . .

Time    Individuals    New individuals created
0       1000           1000(0.10)(0.1) = 10
0.10    1010           1010(0.10)(0.1) = 10.10
0.20    1020.10        1020.10(0.10)(0.1) = 10.20
0.30    1030.30        1030.30(0.10)(0.1) = 10.30

If $\Delta t$ is the fraction of a time period that we are looking at (like, 0.5 of a time interval, or 0.1 of a time interval), then the percentage of individuals that reproduce is $(0.10)\Delta t$. (Right? Check that with what I wrote above.) So the number of new individuals $\Delta P$ that are produced in that fraction $\Delta t$ of a time interval is

$\Delta P=(0.10)(\Delta t)P$,

which I’ll usually write as

$\Delta P=(0.10)P(\Delta t)$.

That’s still a difference equation, but for the shorter time interval $\Delta t$. If I want to imagine the organisms continuously growing, then I would take the limit as the $\Delta t$ gets smaller and smaller. For an “infinitely small” time period $\mathrm{d} t$, the infinitesimal increase in the population will be $\mathrm{d} P$, given by

$\mathrm{d} P = (0.10) P \,\mathrm{d} t$.

This is called a differential equation, because it is an equation for the differentials! This differential equation models the continuous growth of a population, at a continuous rate of 10% of individuals, on average, reproducing per time period. (Which does not mean the population will actually grow 10% in one time period! It will grow more, for the reasons I talked about just above.)
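To see concretely how much more it grows, here is a small Python sketch (just an illustration, using the same numbers as the tables above) that compounds the 10% growth over smaller and smaller steps; the one-period total creeps up from 1100 toward about 1105.17, which is $1000\,e^{0.10}$:

    # Population after ONE time period, if we compound the 10% growth in steps of size dt.
    for dt in [1.0, 0.5, 0.25, 0.1, 0.01, 0.001]:
        steps = int(round(1.0 / dt))
        print(dt, 1000 * (1 + 0.10 * dt) ** steps)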

This differential equation can also be seen as an equation for the derivative, because we can rewrite it as

$\dfrac{\mathrm{d}P}{\mathrm{d}t}=(0.10)P$.

The population $P$ is an unknown function of time, which we’d like to figure out. What we know, from the assumptions of our model, is something about the derivative of this unknown function! So the calculus difficulty will be to find the function, knowing something about its derivative!

If the rate was some value $r$, instead of 10%, then these equations would become

$\mathrm{d} P = r P\, \mathrm{d} t$
or
$\dfrac{\mathrm{d}P}{\mathrm{d}t}=rP$.

This lovely differential equation goes by the name exponential growth model (and we’ll see the reason for the name in due time!).

But wait, is this really realistic?

You may have been saying to yourself, but wait! Does this actually make sense?

That is a very reasonable worry! Even yeast are not really growing continuously: they can only grow one yeast cell at a time. And fractional individuals don’t really make sense in the real world. And, besides, looking at my tables, number-wise it didn’t seem to make all THAT much difference. So why am I insisting on continuous growth?

One answer is that continuous growth is often a good approximation, when the number of individuals is large. If we are talking about millions of yeast cells, or about the human population of the world, the numbers are so big that it is a good approximation to imagine the number changing continuously. Also, with large numbers like that, the assumption of continuous growth makes a bigger difference.

Another answer is that I will also want to model different kinds of growth or decay, where the numbers are large and so continuous growth is a good model.

However, some populations really make more sense as difference equations. If we have a small number of individuals, or if we have a discrete breeding season, then continuity is not a good assumption. So why focus on continuity?

Surprisingly, it’s often easier to understand a differential equation (with continuous change), than it is to understand a similar difference equation (with discrete changes). So even when we want to model a system with discrete changes, people often start off by modeling with differential equations, which are easier to mathematically understand!

Actually, this really misled people for a long time. People would make a differential equation model for something (like a population), and assume that the difference equation model wouldn’t behave too differently. In fact, once computers came around (which could compute difference equation models very quickly), people started to realize, with great surprise, that the difference equation might behave VERY differently from the differential equation! This was first noticed by a biologist, Robert May, in 1976, when he was studying population models much like the ones we will be doing soon. This was one of the starting points of the theory of “chaos” and dynamical systems.

If you want to learn more about difference equations (and about modeling), you should take Katie’s class, Quantitative Reasoning and Mathematical Modeling. Katie’s expertise is in modeling, and in chaos and dynamical systems. In this class, we are going to concentrate on models with continuous change, so differential equations.

How to read a differential equation

Let’s look at the differential equation that we developed to model population growth (the “exponential growth equation”). It was

$\mathrm{d} P = r P \mathrm{d} t$
or
$\dfrac{\mathrm{d}P}{\mathrm{d}t}=rP$.

The first equation is saying that, in a short time period $\mathrm{d} t$, the population is changing by an amount $\mathrm{d} P$. That change in population is proportional to the population itself; the proportionality constant is $r$.

The second equation is saying that the (absolute) rate of change of the population with time, $\frac{\mathrm{d}P}{\mathrm{d}t}$, is equal to the population times a rate constant $r$. Bigger population, bigger absolute growth rate, in an entirely proportional way. Twice the population, twice the absolute rate of growth (because twice as many individuals to be reproducing). Ten times the population, ten times the absolute rate of growth.

Very often, a scientific paper will show you their differential equation model for something, and an important skill is to read through what it is claiming conceptually.

The differential equation model encapsulates our assumptions about the mechanics of the system. All the practical things we said at the beginning—about there being no disease, no limits on resources, no complicated breeding—are all reflected in the simple model we created.

What is a “solution” to a differential equation?

The “solution” to the differential equation

$\dfrac{\mathrm{d}P}{\mathrm{d}t}=rP$

would be a function $P$, depending on time $t$, which would obey that condition. In other words, the “solution” would be a formula that gives the population $P$. The formula would involve $t$, and it would have the property that if you took its derivative, you would get the same formula again, just multiplied by $r$.

Practically, the differential equation tells you how the population changes from one time moment to the next. To “solve” the differential equation is to figure out what the population is going to be in one year, ten years, one hundred years.

I can’t show you a solution of this differential equation yet, because we still need to figure it out. But let me give another example: let’s say our differential equation was

$\dfrac{\mathrm{d}P}{\mathrm{d}t}=(0.10)t$.

Then a solution of this equation would be $P=(0.05)t^2$, because if you take dP/dt for this function, you get (0.10)t, as required. This differential equation has infinitely many solutions: $P=(0.05)t^2+C$, where $C$ can be any constant. The value of the constant is determined by the initial condition; if I know the value of $P$ at time $t=0$, then I can determine $C$.
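If you happen to have the sympy library available, you can check that differentiation symbolically; this is just a quick sketch, not something you need for the course:

    import sympy as sp

    t, C = sp.symbols('t C')
    P = sp.Rational(5, 100) * t**2 + C   # P = (0.05) t^2 + C

    # The derivative is (0.10) t, no matter what the constant C is.
    print(sp.diff(P, t))   # prints t/10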

The exponential growth differential equation

$\dfrac{\mathrm{d}P}{\mathrm{d}t}=rP$

is a trickier beast, because it says that $P$ should be a formula whose derivative equals $r$ times the same formula. We’ll see how to find such a function soon!

We will see some methods for figuring out solutions of differential equations. If you take the Ordinary Differential Equations course next term you will see many more! However, it’s worth noting that frequently, we can NOT find any explicit formula for the solution of a differential equation. That doesn’t mean the solution does not exist; it just means that we can’t give a formula using standard functions that we know. However, that doesn’t mean all hope is lost: in fact, we will be able to figure out a lot about a solution of a given differential equation, without necessarily having a specific formula for it.

What is a differential equation model?

Setting up the differential equation is modeling: that is where our assumptions and knowledge about the real-world system come into play. Finding the solution of the differential equation is purely math: that is applying techniques of calculus. At that point, I forget that this is a real-world problem; I just want a function $P$ that obeys a certain mathematical condition. Of course, once I find that function, then I can make predictions about the real world based on my model, and I can compare them to what I observe in reality. In response to that, I might decide to change my model, perhaps to put in some effect that I left out before.

This back and forth, between creating a model, making mathematical predictions, going back to compare to reality, then refining the model, is the subject of applied mathematics.

If you’d like to know more about the relationship of mathematical models to real observations, I recommend starting by reading this excerpt from a book by Timothy Gowers called Mathematics: A Very Short Introduction.

How to approximately numerically solve a differential equation

A little ironically, if we want to numerically calculate what a differential equation is predicting, the idea is to convert it back to a difference equation. That is, replace the differential equation, for example

$\mathrm{d}P=(0.10)P\,\mathrm{d}t$

with a difference equation

$\Delta P = (0.10) P \Delta t$,

where we take a small value for $\Delta t$. Then we calculate tables, like I did in the example at the beginning. The smaller the $\Delta t$, the better the approximation to the continuous differential equation.
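Here is a minimal Python sketch of that procedure for dP/dt = (0.10)P, starting from 1000 individuals and running to time t = 10. (The particular numbers are just illustrative.) As $\Delta t$ shrinks, the results settle down toward the value predicted by the continuous model, which will turn out to be $1000\,e^{(0.10)(10)}\approx 2718.3$:

    # Approximate dP/dt = (0.10) P by repeatedly applying Delta P = (0.10) P Delta t.
    def simulate(P0, r=0.10, dt=0.01, t_end=10.0):
        P = P0
        for _ in range(int(round(t_end / dt))):
            P = P + r * P * dt   # one difference-equation step
        return P

    # Smaller time steps give a better approximation to the continuous equation.
    for dt in [1.0, 0.5, 0.1, 0.01, 0.001]:
        print(dt, simulate(1000, dt=dt))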

How to picture a derivative

I want to describe how to picture a differential equation. But there’s a problem: I haven’t talked yet in this class about how to picture a derivative! So, first, I want to talk about how to picture a derivative. Then, I want to apply that to picture a differential equation.

Let’s suppose that $P$ is a function of $t$. And let’s suppose we draw a graph of this function; say it looks like this:

A graph of population P versus time t

Now, let’s pick some specific value $t$ for the time. The meaning of the graph is, if I go over $t$ units from the origin on the horizontal axis, then I go up to the graph, then the height of that point gives me the value $P$ of the population, at that value $t$ of the time:

An input value t (measured from the origin on the horizontal axis) gives an output value P (measured from the origin on the vertical axis). The value of the population P at time t is the vertical height P of the graph at the horizontal position t.

Now, suppose we increase the time slightly. That is, we choose another time $t+\mathrm{d}t$, which is $\mathrm{d}t$ larger than $t$. This slightly later time will correspond to a slightly larger population, $P+\mathrm{d}P$. This looks like:

Increase t to t+dt, and then population increases from P to P+dP.

Now, compare the positions of those two points on the graph. Compared to the first point, the second point is moved over $\mathrm{d}t$ time units horizontally, and is moved up $\mathrm{d}P$ population units vertically:

The two points on the graph, corresponding to the times t and t+dt. Compared to the first point, the second point is moved over by dt, and up by dP.

You may recall from high school that the ratio of “up/over”, or “rise/run”, is referred to as the slope of the straight line between the two points. So, the ratio

$\dfrac{\mathrm{d}P}{\mathrm{d}t}$,

which is the derivative of $P$ with respect to $t$, is represented on the graph by the slope of that little red line segment!

But wait: that little red line segment isn’t a straight line! That’s why we have to take the $\mathrm{d} t$ to be very small: it is assumed small enough that we can approximate the segment of the curve as a straight line. Alternately, we can say that we are zooming in to the curve closely enough that it resembles a straight line. Alternately, we can say that we are taking the limiting value of the slope, as we take the $\Delta t$ smaller and smaller.

All this boils down to:

the value of the derivative $\dfrac{\mathrm{d}P}{\mathrm{d}t}$ at some $t$ gives the slope of the graph of $P$ versus $t$, at that value of $t$.

That “fact” is a little bit circular: in mathematics before calculus, you only define “slope” for straight lines. So this “fact” is really kind of a definition: this is what we mean by the slope of a curved line! To find the slope of a curve at some point $(t,P)$, we find another point $(t+\mathrm{d}t,P+\mathrm{d}P)$ on the curve really nearby, and find the slope between those two points.

If you’d like more explanation of this, I strongly recommend reading Chapter X: Geometrical Meaning of Differentiation, in the book Calculus Made Easy by Silvanus Thompson. (The chapter starts on page 97 of the pdf.)

Note that $\frac{\mathrm{d}P}{\mathrm{d}t}$ is also, practically, the rate of change: it says at what absolute rate $P$ changes with respect to time $t$. In our population example, its units would be individuals per unit time. So, if the graph has a large slope, it means the population is increasing rapidly; a small slope means increasing slowly; a negative slope means the population is decreasing.

Note that the absolute rate of change of population with time will be changing over time. In our exponential growth model, it’s going to be growing slowly at first (small slope), and then faster and faster over time (bigger slope), as the population gets bigger and there are more and more individuals to be reproducing.

How to picture a differential equation

OK, so this section was about differential equations. How do we picture a differential equation, for example,

$\dfrac{\mathrm{d}P}{\mathrm{d}t}=(0.10)P$?

The problem is that we don’t yet know what the curve $P$ as a function of $t$ looks like. That would be the “solution” to this differential equation.

The differential equation is not telling us what the population is at any given time. But what it IS telling us is, IF we knew the population, THEN it tells us the SLOPE of that graph.

Let me give a numerical example. Suppose the time $t=0$, and the population $P=1000$. Then the differential equation tells me that

$\dfrac{\mathrm{d}P}{\mathrm{d}t}=(0.10)(1000)=100$.

This means that, IF the curve goes through the point $(t,P)=(0,1000)$, then its slope there is $100$. Let me draw this:

A line starting at (0,1000), and with slope 100 (so it goes to (1,1100)).

See what I did there? I drew the point (0,1000). Then I knew that, IF the graph goes through that point, it must have slope 100. So I drew a line from (0,1000) to (1,1100), which would have slope 100.

But wait! As we discussed before, the graph wouldn’t keep having that same slope, all the way from time $t=0$ to time $t=1$. The slope would actually increase. All I know is that the slope RIGHT AT (0,1000) is equal to 100. So, I should draw my little slope line smaller, just at the point (0,1000):

A line showing that the slope of the graph at (0,1000) (IF the graph goes through that point!) would be 100.

Of course, if I drew the slope JUST at that point, the line would be invisible! So I have to make the slope line big enough to see. But the assumption is that the little green line is just giving the slope of the curve AT the point (0,1000). And that’s only IF the curve does in fact go through the point (0,1000), which we don’t know!

Now, it’s hard to see the exact value of the slope just from drawing a little line segment. So we aren’t going to try to make this exact. Instead, we are looking for a qualitative picture of what the slopes are. Let’s zoom out a little:

Starting to draw the differential equation for population versus time. You can barely see the one little green line I have drawn, at (0,1000). You can’t really tell, but it’s supposed to have slope 100 (if you go over 1 time unit, you go up 100 in population, which isn’t much because the units on the P axis are in thousands).

Now, let’s figure out the slope at some more points. I’m going to start by figuring out slopes at some representative points on the vertical P axis:

Point                 Slope
(t,P) = (0,0)         dP/dt = (0.10)P = (0.10)(0) = 0
(t,P) = (0,1000)      dP/dt = (0.10)P = (0.10)(1000) = 100
(t,P) = (0,2000)      dP/dt = (0.10)P = (0.10)(2000) = 200
(t,P) = (0,3000)      dP/dt = (0.10)P = (0.10)(3000) = 300
(t,P) = (0,4000)      dP/dt = (0.10)P = (0.10)(4000) = 400
(t,P) = (0,5000)      dP/dt = (0.10)P = (0.10)(5000) = 500
The slopes at some points on the vertical P axis. We don’t know which point the curve actually goes through, so we have to draw them all.

The important thing here is that the slopes are increasing as P increases. So I can draw that on my graph, at least qualitatively:

Representing the slopes given by the differential equation, at least on the P axis.

I don’t know if I’ve really succeeded at drawing a line of slope exactly 500 at the point (0,5000). But the point is that I have drawn the lines getting more and more sloped, as we move up the P axis.

Note that I don’t know where the function of P versus t hits the P axis. In practical terms, I haven’t been given the initial population at time $t=0$. That is not part of the differential equation; it’s called an initial condition, and it’s a separate piece of data that I’m not assuming I have yet.

Now, I’ve only done points with time $t=0$. So let’s do some points with other values of t. Let’s say $t=1$. Then we have:

Point                 Slope
(t,P) = (1,0)         dP/dt = (0.10)P = (0.10)(0) = 0
(t,P) = (1,1000)      dP/dt = (0.10)P = (0.10)(1000) = 100
(t,P) = (1,2000)      dP/dt = (0.10)P = (0.10)(2000) = 200
(t,P) = (1,3000)      dP/dt = (0.10)P = (0.10)(3000) = 300
(t,P) = (1,4000)      dP/dt = (0.10)P = (0.10)(4000) = 400
(t,P) = (1,5000)      dP/dt = (0.10)P = (0.10)(5000) = 500
The slopes at some points on the vertical line t=1. We don’t know which point the curve actually goes through, so we have to draw them all.

It’s the same thing!! Why? Because there is no $t$ in my formula for slope! The slope dP/dt is given by (0.10)P. This particular differential equation doesn’t involve the $t$ in the formula for the slope. So the slope is independent of t. (Such a differential equation is called autonomous.)

That means the slopes at the points (0,1000), (1,1000), (2,1000), (3,1000), . . . are all the same! So I can draw some little slope lines in at those points too, pretty easily:

The slope field for the exponential growth differential equation dP/dt=(0.10)P.

I haven’t done a perfect job here. But what I’m trying to show is that the slope is constant on horizontal lines, equals zero on the t-axis, and gets larger as we go up vertically.

Negative values of t and P don’t make much sense in our model. But it can be sometimes helpful to mathematically understand the equation to allow for negative values as well. Negative values of t don’t change much: the slopes are constant in t. For negative values of P, the slope is negative (right?). So we get a picture like:

The slope field for the exponential growth differential equation dP/dt=(0.10)P, with negative values included.

Note that, really, there should be a little slope line attached to EVERY point (t,P) in the t-P plane. We have no idea where the graph of P versus t goes through yet; it could go through a point like (0.5, 1300), so theoretically I’d have to imagine a little slope line attached to that point (of slope 130). But if I try to draw ALL the slope lines, (a) it will take me a very long time, and (b) I’ll just end up with an unreadable plot, totally filled with green lines. So, I just pick a representative grid of points.

This picture is called a slope field. It is a picture of the differential equation dP/dt=(0.10)P. It represents the condition that, at whatever point (t,P) the solution curve goes through, its slope there must be given by (0.10)P.

Practically, this reflects the fact that the differential equation never tells you what the population IS. It tells you, IF you know the population at a certain moment, how the population will CHANGE in the next moment. That is, it gives you the small dP, if you set a small dt. But only for small dt: it only gives you the next little step. Because once you move by dt in time, the population changes to P+dP, and now you have to recalculate the slope based on the new population.

How to picture solution curves to the differential equation

Once we draw the slope field to a differential equation, we can get a pretty good qualitative sense of what a solution curve will look like.

There is more than one possible solution curve. In fact, there are infinitely many. The solution curve depends on the initial condition. Once we choose a starting point in the slope field, then the slope field tells us what the rest of the curve has to look like.

For example, if I take the starting condition of P=1000 at the initial time t=0, then I will get a solution curve that roughly looks like

Solution curve for the differential equation dP/dt=(0.10)P, and the initial condition (0,1000).

I am trying to make the red curve have slope equal to the slope field at all of its points. Again, you have to imagine that there are lots of little green lines, in between the ones I drew: the slope field exists at every point. At each of those points, if I find the slope dP/dt of the curve (by zooming in really close on two points close together), that slope should agree with the slope field.

If I choose a different initial condition, I will get a different curve following the slope field:

Two solution curves. The second one has an initial condition of (0,2000).

If I keep picking different initial conditions, I get a whole family of solution curves, one for each initial condition.

A family of solution curves to the differential equation dP/dt=(0.10)P.

Again, in principle, I have infinitely many solution curves; but if I tried to draw them all, it would get very messy.
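If you’d like to make pictures like these yourself, here is a rough matplotlib sketch; the grid spacing, segment length, and initial conditions are all just illustrative choices:

    import numpy as np
    import matplotlib.pyplot as plt

    r = 0.10

    # Little slope segments (the slope field) for dP/dt = (0.10) P,
    # drawn at a representative grid of (t, P) points.
    for tt in np.arange(0, 21, 2):
        for pp in np.arange(-4000, 4001, 1000):
            slope = r * pp
            plt.plot([tt - 0.6, tt + 0.6], [pp - 0.6 * slope, pp + 0.6 * slope], 'g')

    # A few solution curves, computed by stepping the difference equation
    # Delta P = r P Delta t with a small Delta t, from different initial conditions.
    dt = 0.05
    ts = np.arange(0, 20 + dt, dt)
    for P0 in [1000, 2000, 3000, -1000]:
        Ps = [P0]
        for _ in ts[:-1]:
            Ps.append(Ps[-1] + r * Ps[-1] * dt)
        plt.plot(ts, Ps, 'r')

    plt.xlabel('t'); plt.ylabel('P')
    plt.ylim(-4500, 4500)
    plt.show()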

In summary:

  • The differential equation tells you what slope the unknown solution function should have at every point. It therefore gives you a slope field.
  • Picking an initial condition picks out one particular solution of the differential equation. Graphically, you start at that point, and then make a curve which follows the direction of the slope field at each point.
  • These graphical solution curves represent solutions of the differential equation. A solution of the differential equation is a function which satisfies the condition that the differential equation expresses.

Where next?

Phew, I’m tired! That was a lot.

In the next lectures, I’m planning to do (or get you to do) three things:

  • Set up more differential equation models, based on different real-life situations.
  • For different differential equations, draw the slope fields and typical solution curves, and use that to make predictions about the behavior of the system.
  • For some simple differential equations (including the exponential growth equation), show how to find explicit formulas for the solutions, based on ideas we have developed so far.

Eigenvalues II: Markov processes continued

In this lecture, I will return to the Markov chain that I introduced in the previous lecture. I want to provide answers for some of the questions I asked you there, particularly about finding an equilibrium state. But I also want to go further and show how to simplify the evolution, using eigenvectors. In the process, I will connect back to the material we have been doing on dual vectors and projections.

The example Markov chain from last time

Let me remind you of the example from last time. I had a system of agents that could be in two states, 1 or 2. I start with $x_1(0)$ agents in state 1, and $x_2(0)$ agents in state 2. In each time step, some agents would change states, summarized by the following graph:

Graph summarizing the Markov chain.

In each time step, $0.10$ of the agents in state 1 would switch to state 2, and $0.05$ of the agents in state 2 would switch to state 1.

Alternately, this can model only one agent, or system, which changes states with some probabilities. Then $x_1(t)$ and $x_2(t)$ are the probabilities of the single system being in state 1 or 2 respectively.

The goal is to understand, as completely as possible, the behavior of $x_1(t)$ and $x_2(t)$ as functions of time.

The evolution equations we set up last time

In the last lecture, we set up the evolution equation. It is:

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(t+1)=T\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)$,

where the transition matrix $T$ for this system is

$T=\begin{bmatrix}0.90&0.05\\0.10&0.95\end{bmatrix}$.

Alternatively, we can write the evolution equation in terms of differences. The change from time step $t$ to $t+1$,

$\Delta\begin{bmatrix}x_1\\x_2\end{bmatrix}(t):=\begin{bmatrix}x_1\\x_2\end{bmatrix}(t+1)-\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)$,

is given by

$\Delta\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)=\Delta\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)$.

Oops, that equation looks silly, doesn’t it!? It’s because I have two clashing notations here (which are totally standard, so I don’t want to change them). The greek capital Delta, “$\Delta$”, on the left refers to “change in the vector”. But the same symbol $\Delta$ on the right refers to what is called the weighted graph Laplacian matrix,

$\Delta=\begin{bmatrix}-0.10&0.05\\0.10&-0.05\end{bmatrix}$,

which is multiplied by the vector.

Anyway, that means that, for each time step $t$,

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(t+1)=\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)+\Delta\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)$,

which means

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(t+1)=(I+\Delta)\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)$,

where $I=\begin{bmatrix}1&0\\0&1\end{bmatrix}$. There are some advantages to writing it both ways.
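If you want to experiment numerically, here is a small numpy sketch that sets up $T$ and $\Delta$, checks that $T=I+\Delta$, and runs the evolution for a few steps. (The initial state of 40 and 20 agents is just an illustrative choice.)

    import numpy as np

    T = np.array([[0.90, 0.05],
                  [0.10, 0.95]])        # transition matrix
    Delta = np.array([[-0.10,  0.05],
                      [ 0.10, -0.05]])  # weighted graph Laplacian

    print(np.allclose(T, np.eye(2) + Delta))   # True: T = I + Delta

    # Evolution: x(t+1) = T x(t), starting from an illustrative state.
    x = np.array([40.0, 20.0])
    for t in range(5):
        print(t, x)
        x = T @ x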

Finding the steady state

The first question I asked is, does this system approach an equilibrium state? If I start with $x_1(0)$ agents in state 1, and $x_2(0)$ agents in state 2, will the number of agents $x_1(t)$ and $x_2(t)$ settle down over time, or will they oscillate (or even jump around chaotically)? And if they do settle down, what will the equilibrium values be, and how will they depend on the initial values $x_1(0)$ and $x_2(0)$?

We can say that we are looking for possible vectors $\begin{bmatrix}x_1\\x_2\end{bmatrix}$ with the property that

$T\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}x_1\\x_2\end{bmatrix}$;

so the transition from time step $t$ to $t+1$ doesn’t change the values.

This is connected to the eigenvectors I talked about in the previous lecture. Such a vector would be an eigenvector for the matrix $T$, with eigenvalue 1.

We could set this up as linear equations and solve. But there is a trick which makes it a little easier. If I write $T=I+\Delta$, then the steady-state condition $T\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}x_1\\x_2\end{bmatrix}$ becomes $(I+\Delta)\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}x_1\\x_2\end{bmatrix}$; subtracting $\begin{bmatrix}x_1\\x_2\end{bmatrix}$ from both sides, we’re looking for possible vectors $\begin{bmatrix}x_1\\x_2\end{bmatrix}$ with the property that

$\Delta\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}$.

That is, the vectors we are looking for are eigenvectors of the matrix $\Delta$ as well, with eigenvalue 0. (Note that my definition of eigenvector requires that the vector be non-zero. The zero vector would satisfy this equation, but wouldn’t be very helpful.)

Another word for this is that the vector $\begin{bmatrix}x_1\\x_2\end{bmatrix}$ is in the kernel of matrix $\Delta$. I’ll have more to say about kernels soon.

Anyway, the system $\Delta\vec{x}=\vec{0}$ is a little easier to solve.

Understanding exercise: computing the steady-state vectors
a) Solve the system $\Delta\vec{x}=\vec{0}$.
b) Check that your solutions are in fact eigenvectors of $\Delta$, with eigenvalue $0$.
c) Check that your solutions are in fact eigenvectors of $T$ as well, with eigenvalue $1$.
d) Can you explain why, practically for this model, that any multiple of a steady state vector is also a steady state vector? (A rough explanation is fine.)
e) Suppose that $A:V\to V$ is any linear transformation, and $\vec{v}\in V$ is an eigenvector of $A$ with eigenvalue $\lambda$. Prove that then, for any non-zero scalar $a$, the vector $a\vec{v}$ is also an eigenvector of $A$, with eigenvalue $\lambda$.
f) Suppose that the initial state is $x_1(0)=100$ and $x_2(0)=0$. Find the steady state that this would approach.
g) Find the steady state vector, in terms of any given $x_1(0)$ and $x_2(0)$.
h) If the system is interpreted as a single agent, moving between states 1 and 2 with some probabilities, then what are the steady-state probabilities?

You should find that the solution set to $\Delta\vec{x}=\vec{0}$ is

$\left\lbrace\begin{bmatrix}x_1\\x_2\end{bmatrix}:\begin{bmatrix}x_1\\x_2\end{bmatrix}=s\begin{bmatrix}1\\2\end{bmatrix},\,s\in\mathbb{R}\right\rbrace$.

Any non-zero scalar multiple of the vector $\begin{bmatrix}1\\2\end{bmatrix}$ will be an eigenvector of $\Delta$ with eigenvalue $0$, and also an eigenvector of $T=I+\Delta$ with eigenvalue $1$. These are the possible steady states.

If we know the initial numbers of agents $x_1(0)$ and $x_2(0)$, then we can solve for the unknown scalar as well. The trick is that, as I noted last time, this system preserves the total number of agents (or preserves probability, if we think of it as a single agent). (There are various names for this condition. In our current context, such a system is called Markov. In physics, it says that this satisfies the continuity equation, or is incompressible.)

Anyhoo, since I know the number of agents is constant over time, the variable

${x’}_1 = x_1+x_2$

is constant in time. Consequently, the steady-state vector will have to satisfy two equations:

$\begin{bmatrix}x_1\\x_2\end{bmatrix}=s\begin{bmatrix}1\\2\end{bmatrix}$, and $x_1+x_2=x_1(0)+x_2(0)$.

Understanding exercise: computing the specific steady-state vector
Solve the above equations, to find the parameter $s$ and therefore the specific steady-state vector.

We obtain for the steady-state vector

$\begin{bmatrix}x_1\\x_2\end{bmatrix}=\dfrac{x_1(0)+x_2(0)}{3}\begin{bmatrix}1\\2\end{bmatrix}$,

or, absorbing the 1/3 into the vector,

$\begin{bmatrix}x_1\\x_2\end{bmatrix}=\left(x_1(0)+x_2(0)\right)\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}$.

If we interpret the model as a single agent, and interpret $x_1(t)$ and $x_2(t)$ as probabilities, then $x_1(0)+x_2(0)=1$, so the steady-state probabilities are

$\begin{bmatrix}x_1\\x_2\end{bmatrix}=\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}$.

In summary: it seems that, no matter where the system starts, in the long term 1/3 of the agents end up in state 1, and 2/3 of the agents end up in state 2. Or, if it is a single agent, it will keep jumping back and forth, but it will spend 1/3 of its time in state 1, and 2/3 of its time in state 2.
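Here is a quick numpy check of that conclusion (a sketch; the total of 100 agents is just an example): find the eigenvector of $T$ with eigenvalue 1, and rescale it so its entries add up to the total number of agents.

    import numpy as np

    T = np.array([[0.90, 0.05],
                  [0.10, 0.95]])

    vals, vecs = np.linalg.eig(T)
    v = vecs[:, np.argmin(np.abs(vals - 1.0))]   # eigenvector with eigenvalue (closest to) 1

    total = 100.0
    steady = v / v.sum() * total
    print(steady)   # approximately [33.33, 66.67], i.e. 1/3 and 2/3 of the total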

Long-term behavior more precisely

But wait, that’s not really a total answer. We have identified an equilibrium state: if the system is exactly in that state, it will stay there. But will it get there? And will it stay there if it is not exactly in the state? Think of a ball at the very peak of a smooth hill: if it is exactly at the peak, it is in equilibrium and will stay there. But it will almost never end up there from other initial conditions; and if it is slightly away from the peak, it will move away.

If you did the computer experiments, or if you think about the meaning of the model, it should be pretty plausible that the system always moves toward the equilibrium that we identified. Using the machinery of eigenvectors and eigenvalues, we can show that more precisely, and also say exactly how it is moving toward equilibrium. (This analysis will be important in many other areas as well, not just this problem.)

We have already identified one eigenvector of $T$ and $\Delta$, the vector

$\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}$,

with eigenvalue 1 for $T$, and eigenvalue 0 for $\Delta$.

I’ll tell you soon how to find another eigenvector for $T$ and $\Delta$. For the moment, I just want to show you how you can use the other eigenvector to compute the evolution easily.

So, let me pull the other eigenvector that I computed earlier out of the oven:

I claim that the vector

$\begin{bmatrix}1\\-1\end{bmatrix}$

is an eigenvector of both $T$ and $\Delta$.

Understanding exercise:
Check this claim, and find the corresponding eigenvalue.

Again, I’ll tell you how I cooked up this vector later. But for the moment, let me show you how I can use it. I’m going to use it two ways:

  • I can make a basis out of eigenvectors, and T will be simple in that basis; or
  • using dual eigenvectors, I can write T as a sum of projection operators, which puts it in a simple form.

Eigenvalue basis

Here’s the trick of the first method: make a basis (coordinate system) out of the eigenvectors of $T$! This basis will be nicely adapted to expressing the matrix $T$.

So, let $B’=\{{\vec{e}’}_1,{\vec{e}’}_2\}$ be the basis consisting of vectors

${\vec{e}’}_1=\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}$, ${\vec{e}’}_2=\begin{bmatrix}1\\-1\end{bmatrix}$.

Any vector $\vec{v}$ can be uniquely expressed as a linear combination of the vectors in this basis. I’m going to call the corresponding coefficients the coordinates $x_1’$ and $x_2’$ of the vector $\vec{v}$:

$\vec{v}=x_1\vec{e}_1+x_2\vec{e}_2=x’_1{\vec{e}’}_1+x’_2{\vec{e}’}_2$.

The operators $T$ and $\Delta$ operate very simply on this basis:

$T\left(x_1’\vec{e}_1’+x_2’\vec{e}_2’\right)=x_1’T\left(\vec{e}_1’\right)+x_2’T\left(\vec{e}_2’\right)=\lambda_1x_1’\vec{e}_1’+\lambda_2x_2’\vec{e}_2’$.

All it does is stretch in the $\vec{e}_1’$ direction by the eigenvalue $\lambda_1$, and in the $\vec{e}_2’$ direction by eigenvalue $\lambda_2$. So, if we express $T$ in the basis $B’$,

$T\begin{bmatrix}{x’}_1\\{x’}_2\end{bmatrix}_{B’}=\begin{bmatrix}\lambda_1&0\\0&\lambda_2\end{bmatrix}_{B’}\begin{bmatrix}{x’}_1\\{x’}_2\end{bmatrix}_{B’}$.

We know that $\lambda_1=1$ and $\lambda_2=0.85$. So, in the basis of eigenvectors, $T$ is represented very simply as a diagonal matrix,

$[T]_{B’}=\begin{bmatrix}1&0\\0&0.85\end{bmatrix}_{B’}$.

In this form, $T$ is quite simple and easy to work with. For example, if you wanted to find the state after 13 time steps, you’d just have to find

$T^{13}\begin{bmatrix}{x’}_1(0)\\{x’}_2(0)\end{bmatrix}_{B’}=\begin{bmatrix}1&0\\0&0.85\end{bmatrix}^{13}_{B’}\begin{bmatrix}{x’}_1(0)\\{x’}_2(0)\end{bmatrix}_{B’}=\begin{bmatrix}1^{13}&0\\0&(0.85)^{13}\end{bmatrix}\begin{bmatrix}x’_1(0)\\x’_2(0)\end{bmatrix}_{B’}$.

No need to do thirteen matrix multiplications!

But, I’m not given the initial data in the $x’_1$, $x’_2$ coordinates, and I don’t want the answer in those weird coordinates either. So I need to change coordinates.

Change of coordinates for a vector: concrete linear equation version

Let’s see how to do this in a concrete way. We’ll make it more systematic in the next section.

Let’s suppose we have initial values $x_1(0)=40$ and $x_2(0)=20$.

Understanding exercise: Change of basis by linear equations
With the initial value vector $\vec{v}(0)=\begin{bmatrix}x_1(0)\\x_2(0)\end{bmatrix}=\begin{bmatrix}40\\20\end{bmatrix}$, find the initial values of the primed coordinates, $x’_1(0)$ and $x’_2(0)$.

Here’s the solution to the exercise above. By the definitions above,

$\begin{bmatrix}x_1(0)\\x_2(0)\end{bmatrix}=\begin{bmatrix}40\\20\end{bmatrix}=x’_1(0){\vec{e}’}_1+x’_2(0){\vec{e}’}_2=x’_1(0)\left(\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}\right)+x’_2(0)\left(\begin{bmatrix}1\\-1\end{bmatrix}\right)$,

so we get the linear equations

$\frac{1}{3}x’_1(0) + x’_2(0)=40$
$\frac{2}{3}x’_1(0) - x’_2(0)=20$.

These have the unique solution $x’_1(0)=60$, $x’_2(0)=20$.
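(If you’d like a numerical check: the two equations above can be handed to numpy as a single linear system. This is just a sketch.)

    import numpy as np

    # Coefficient matrix: its columns are the eigenvectors e'_1 = (1/3, 2/3) and e'_2 = (1, -1).
    A = np.array([[1/3,  1.0],
                  [2/3, -1.0]])
    b = np.array([40.0, 20.0])

    print(np.linalg.solve(A, b))   # [60. 20.]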

Using eigenbasis to compute transformation

Understanding exercise: Computing powers of T in the eigenbasis
Suppose again that we have starting conditions $x_1(0)=40$ and $x_2(0)=20$.
a) Use the previous exercise to find a simple expression for $x_1(n)$ and $x_2(n)$ for any positive integer $n$.
b) Optional computing exercise: use the expression to make a table of $x_1(n)$ and $x_2(n)$ for $n=1,2,\dotsc,20$.
c) Use the expression to find $x_1(20)$ and $x_2(20)$.
d) Use the expression to find the limit of $x_1(n)$ and $x_2(n)$ as $n$ gets large.

Here’s the solution to the previous exercise. We have

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(1)=T\begin{bmatrix}x_1\\x_2\end{bmatrix}(0)=T\begin{bmatrix}40\\20\end{bmatrix}=T\left((60)\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}+20\begin{bmatrix}1\\-1\end{bmatrix}\right)$
$=(60)T\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}+(20)T\begin{bmatrix}1\\-1\end{bmatrix}=(60)\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}+(20)(0.85)\begin{bmatrix}1\\-1\end{bmatrix}$.

So in the same way, we get

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(2)=T^2\begin{bmatrix}x_1\\x_2\end{bmatrix}(0)=(60)\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}+(20)(0.85)^2\begin{bmatrix}1\\-1\end{bmatrix}$

and so

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(n)=T^n\begin{bmatrix}x_1\\x_2\end{bmatrix}(0)=(60)\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}+(20)(0.85)^n\begin{bmatrix}1\\-1\end{bmatrix}$.

Putting in numbers, we have

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(20)\doteq(60)\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}+(0.775)\begin{bmatrix}1\\-1\end{bmatrix}=\begin{bmatrix}20.775\\39.225\end{bmatrix}$.

In the limit as $n\to\infty$, we have $(0.85)^n\to 0$, and hence the limiting state is $\begin{bmatrix}20\\40\end{bmatrix}$, as we had predicted in the previous section.
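A quick numerical sanity check of that closed form, if you like (a numpy sketch):

    import numpy as np

    T = np.array([[0.90, 0.05],
                  [0.10, 0.95]])
    e1 = np.array([1/3, 2/3])
    e2 = np.array([1.0, -1.0])

    def x(n):
        # Closed form from the eigenbasis: x(n) = 60 e'_1 + 20 (0.85)^n e'_2
        return 60 * e1 + 20 * 0.85**n * e2

    direct = np.linalg.matrix_power(T, 20) @ np.array([40.0, 20.0])
    print(x(20))    # approximately [20.775, 39.225]
    print(direct)   # the same, up to rounding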

Change of coordinates for a vector: systematic matrix version

Let’s do this a little more systematically.

In this section, I’ll work through how to change coordinates for a vector, from the old standard basis to the new basis of eigenvectors. This isn’t specific to this example; it works for any change of basis in a finite-dimensional vector space.

Let

$S=\begin{bmatrix}\frac{1}{3} & 1 \\ \frac{2}{3} & -1\end{bmatrix}$.

I have made the columns of this matrix equal to the new basis vectors, the eigenvectors ${\vec{e}’}_1$ and ${\vec{e}’}_2$. This matrix $S$ has the effect of converting from $x’_1$, $x’_2$ coordinates to the $x_1$, $x_2$ coordinates, because

$S[\vec{v}]_{B’}=S\begin{bmatrix}x’_1\\x’_2\end{bmatrix}=x’_1\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}_{B} + x’_2\begin{bmatrix}1\\-1\end{bmatrix}_B=[x’_1{\vec{e}’}_1+x’_2{\vec{e}’}_2]_B=[\vec{v}]_B$.

Understanding exercise: Change of basis matrix
The logic of that previous equation is important. Let’s check it by trying some examples.
a) Suppose that we start with the vector $\vec{v}={\vec{e}’}_1=(1){\vec{e}’}_1+(0){\vec{e}’}_2$, so that $x’_1=1$, $x’_2=0$, and $[\vec{v}]_{B’}=\begin{bmatrix}1\\0\end{bmatrix}_{B’}$. Check that multiplying this vector $[\vec{v}]_{B’}$ times $S$ gives the vector $[\vec{v}]_{B}$.
b) Do the same thing with the vector $\vec{v}={\vec{e}’}_2$.
c) Think through the logic again of the equation above. (I don’t have anything particular for you to write for this part. This is one of those things in math where you need to do the incantation to yourself about a hundred times before it sticks. . . )

This means that the matrix $S^{-1}$, if it exists, converts from $x_1$, $x_2$ coordinates to $x’_1$, $x’_2$ coordinates. Explicitly:

$S[\vec{v}]_{B’}=[\vec{v}]_{B}$
$S^{-1}S[\vec{v}]_{B’}=S^{-1}[\vec{v}]_{B}$
$I[\vec{v}]_{B’}=S^{-1}[\vec{v}]_{B}$
$[\vec{v}]_{B’}=S^{-1}[\vec{v}]_{B}$

So, to summarize,

$[\vec{v}]_{B}=S[\vec{v}]_{B’}$
$[\vec{v}]_{B’}=S^{-1}[\vec{v}]_{B}$

These are the change of basis equations, which apply for changing coordinates of vectors between any two bases $B$ and $B’$. The matrix $S$ is always defined by having columns equal to the $B’$ basis vectors, written in the $B$ coordinates (i.e., the columns are the coefficients of the $B’$ basis vectors, when they are written as linear combinations of the $B$ basis vectors).

We’ll talk about matrix inverses more generally soon, but for the moment you might want to recall that the inverse of a $2\times 2$ matrix is given by

$\begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1}=\dfrac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}$.

Understanding exercise: Change of basis example
Use this procedure to redo a previous exercise: show how to write the vector $\begin{bmatrix}40\\20\end{bmatrix}_{B}$ in the $B’$ coordinate system. Check that you get the same answer as before.
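For reference, once you have tried it yourself: plugging our $S$ into that formula, the determinant is $(\frac{1}{3})(-1)-(1)(\frac{2}{3})=-1$, so

$S^{-1}=\dfrac{1}{-1}\begin{bmatrix}-1&-1\\-\frac{2}{3}&\frac{1}{3}\end{bmatrix}=\begin{bmatrix}1&1\\\frac{2}{3}&-\frac{1}{3}\end{bmatrix}$,

and then $S^{-1}\begin{bmatrix}40\\20\end{bmatrix}=\begin{bmatrix}60\\20\end{bmatrix}$, which matches the answer we found earlier by solving the linear equations directly.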

Next, I want to use this change of coordinates of vectors, to change coordinates for the transformation $T$.

Change of basis for a linear transformation

Now I want to change basis for the transformation $T$. This is the same thing we did earlier, but I want to do it in matrix form now, rather than as linear equations. This method will apply for any linear transformation $A:V\to V$, from a finite-dimensional vector space to itself. (It also applies to $A:V\to W$ between two different vector spaces, with minor modifications.)

First of all, we know that, in the standard basis $B$, the matrix of $T$ is

$[T]_B=\begin{bmatrix}0.90 & 0.05 \\ 0.10 & 0.95\end{bmatrix}_B$.

We can find $[T]_{B’}$ one of two ways. First, we can change coordinates by means of the matrix $S$. This works conceptually as follows:

  • $[T]_{B’}$ expects an input in the $B’$ basis, while $[T]_B$ expects an input in the $B$ basis. $[T]_{B’}$ will output its answer in the $B’$ basis, while $[T]_B$ will output its answer in the standard $B$ basis.
  • So, to create $[T]_{B’}$, we need to first take an input in $B’$ coordinates, and then convert it to $B$ coordinates, using $S$.
  • Then we have the vector in $B$ coordinates, so $[T]_B$ happily eats it, and spits out its answer, also in $B$ coordinates.
  • Since we are looking for the answer in $B’$ coordinates, we must convert the output from $B$ coordinates to $B’$ coordinates, using $S^{-1}$.

Summarizing the above conceptual steps, we arrive at the answer

$[T]_{B’}=S^{-1} [T]_B S$.

This arrangement happens so frequently, it has a name: it is called the conjugate of $[T]_B$ by $S$.

Understanding exercise: change of coordinates of transformation conceptually
Think through that reasoning above a few times. There isn’t anything to write here exactly, but you may want to make up your own shorthand to keep it straight. (Personally I draw little arrows moving right to left, showing how the vector moves through the factors in the equation.) In particular, the tricky thing is to get the $S$ and $S^{-1}$ on the correct sides; be sure that your understanding of the reasoning creates the formula I wrote down, and not the opposite order.

Let’s derive this more explicitly, as equations. (Which is more rigorous, but I find personally harder to keep straight.)

$[T]_{B’}[\vec{v}]_{B’}$
$=[T\vec{v}]_{B’}$
$=S^{-1}[T\vec{v}]_B$
$=S^{-1}[T]_B [\vec{v}]_B$
$=S^{-1}[T]_B S[\vec{v}]_{B’}$

Since we have shown that

$[T]_{B’}[\vec{v}]_{B’}=\left(S^{-1}[T]_B S\right)[\vec{v}]_{B’}$

for every vector $\vec{v}$, it must follow that

$[T]_{B’}=S^{-1}[T]_B S$,

as we got before conceptually.

Understanding exercise: change of coordinates of a transformation formally
Once you read through the argument above, it is instructive to reproduce it on your own. There are a couple of subtle steps that make more sense when you think through them yourself.

Understanding exercise: computing the diagonalizing matrix of T in our example
For the transformation $T$ in our example, we have written down the matrix $[T]_B$, and the matrix $S$. Use this, and the formula above, to compute the matrix $[T]_{B’}$.

The answer you should get is

$[T]_{B’}=S^{-1}\begin{bmatrix}0.90&0.05\\0.10&0.95\end{bmatrix}_B S$
$=\begin{bmatrix}1&0\\0&0.85\end{bmatrix}_{B’}$

We say that this basis diagonalizes the matrix of $T$, and that the matrix $[T]_{B’}$ is diagonal.
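If you’d like to verify that conjugation numerically rather than by hand, here is a short numpy check (a sketch):

    import numpy as np

    T_B = np.array([[0.90, 0.05],
                    [0.10, 0.95]])
    S = np.array([[1/3,  1.0],
                  [2/3, -1.0]])   # columns are the eigenvectors e'_1 and e'_2

    T_Bprime = np.linalg.inv(S) @ T_B @ S
    print(np.round(T_Bprime, 10))   # [[1.   0.  ]
                                    #  [0.   0.85]]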

We could have also gotten $[T]_{B’}$ directly, by the meaning of the matrix (which I discussed a bit in class and on Problem Set 4, but haven’t written up yet). The first column of $[T]_{B’}$ is the vector $T{\vec{e}’}_1$, expressed in the $B’$ basis. But we know $T{\vec{e}’}_1={\vec{e}’}_1$, which in the $B’$ basis is simply ${\vec{e}’}_1=(1){\vec{e}’}_1+(0){\vec{e}’}_2=\begin{bmatrix}1\\0\end{bmatrix}_{B’}$. We can get the second column of $[T]_{B’}$ similarly.

Understanding exercise: Finding $[T]_{B’}$ directly
Complete that argument, to find the matrix $[T]_{B’}$, and verify it is the same answer as before.

We often want to go the other way, to change from the $B’$ basis to the $B$ basis.

Understanding exercise: Changing transformation from $B’$ basis to $B$ basis
We have already worked out $[T]_{B’}=S^{-1} [T]_B S$. Use this equation to find the matrix $[T]_B$ in terms of the matrix $[T]_{B’}$ (and $S$ and $S^{-1}$).

The answer you get should be

$[T]_B=S [T]_{B’} S^{-1}$.

You see why I was worried about getting the formula in the right order!

Confusingly, this formula is often also called the “conjugation of $[T]_{B’}$ by $S$”. As far as I can tell, authors seem about equally split on whether they write the conjugation of $A$ by $S$ as $S^{-1}AS$ or as $SAS^{-1}$.

One thing that makes this representation nice, is that for any positive integer $n$,

$[T^n]_B=S \left([T]_{B’}\right)^n S^{-1}$.

Understanding exercise: Powers of conjugated transformation
Prove the above formula! (Remember that matrix multiplication is not commutative!)

Understanding exercise: Powers of T in our example
Let $n$ be any positive integer, and let $T$ be the transformation from our example above, with $[T]_B=\begin{bmatrix}0.90&0.05\\0.10&0.95\end{bmatrix}$.
a) Use the above formula to find the matrix for $T^n$ in the standard basis, as a product of matrices.
b) Explicitly multiply out the matrices, to find the individual matrix elements of $T^n$ in the standard basis.
c) Find $T^{20}$.
d) Use the explicit form from (b) to find $\lim_{n\to\infty}T^n$.
e) Use the matrix form from (a) to find $\lim_{n\to\infty}T^n$, and check that you get the same thing.
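If you want to check your answers to parts (c)–(e) numerically, here is a small numpy sketch:

    import numpy as np

    T = np.array([[0.90, 0.05],
                  [0.10, 0.95]])
    S = np.array([[1/3,  1.0],
                  [2/3, -1.0]])
    S_inv = np.linalg.inv(S)
    D = np.diag([1.0, 0.85])   # [T] in the eigenbasis

    n = 20
    print(np.linalg.matrix_power(T, n))              # T^20 computed directly
    print(S @ np.linalg.matrix_power(D, n) @ S_inv)  # T^20 via the eigenbasis: the same

    # As n grows, (0.85)^n -> 0, so T^n approaches S diag(1, 0) S^{-1}.
    print(S @ np.diag([1.0, 0.0]) @ S_inv)           # the limiting matrix, [[1/3, 1/3], [2/3, 2/3]]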

A brief pause; what next?

Whew, this is getting long!

Take a moment to look over what you have accomplished so far. The key point is to change to a basis made of eigenvectors, in which the matrix representation of the transformation is simple.

It’s getting long, but I don’t want to leave the topic until I show you another way of writing and thinking about how to break up the transformation $T$ in terms of eigenvectors. The result is going to be that $T$ can be written as a linear combination of projections onto lines. This is called the spectral decomposition of $T$. This story requires us to look at dual eigenvectors.

Eigenvectors and dual eigenvectors

We have identified that the transition matrix $T$ has a steady-state vector ${\vec{e}’}_1$ given by

${\vec{e}’}_1=\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}$.

That is, the vector stays the same when $T$ is applied to it. (We could have equally well taken any non-zero scalar multiple of ${\vec{e}’}_1$, and that would also be fixed.)

We also mentioned that the total number of agents (or total probability) is fixed: the value

$x’_1=x_1+x_2$

is a constant. This is also a fixed thing, like the fixed/steady-state vector! But it is a different kind of thing: the fixed vector is a vector. This fixed thing $x’_1$ isn’t a vector; it’s a coordinate. More formally, it is a dual vector.

As before, let’s define the basis $B’=\{{\vec{e}’}_1,{\vec{e}’}_2\}$, where

${\vec{e}’}_1=\begin{bmatrix}\frac{1}{3}\\\frac{2}{3}\end{bmatrix}$, ${\vec{e}’}_2=\begin{bmatrix}1\\-1\end{bmatrix}$

are the eigenvectors of $T$. Now, I want to find the dual vectors $\overleftarrow{\phi}’_1$ and $\overleftarrow{\phi}’_2$ corresponding to this basis. Recall that the dual basis vectors are defined by being linear functions (dual vectors) $\overleftarrow{\phi}’_i:\mathbb{R}^2\to\mathbb{R}$, and by the condition

$\overleftarrow{\phi}’_i\left({\vec{e}’}_j\right)=\begin{cases}1&i=j\\0&i\neq j\end{cases}$.

You can use these conditions to set up linear equations for the matrices for the dual vectors, and solve. However, in one of the lectures, I described how to find the dual vectors by means of the change of coordinate matrix $S$ that we just talked about. Please take a look at that lecture if you don’t recall, to solve the following exercise:

Understanding exercise: Computing dual basis vectors
Find the dual vectors $\overleftarrow{\phi}’_1$ and $\overleftarrow{\phi}’_2$, expressed in the standard basis. These will be $1\times 2$ matrices, since they are linear functions from $\mathbb{R}^2$ to $\mathbb{R}$.
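After you have tried the exercise, here is one way to check your answer numerically. The key fact is that the rows of $S^{-1}$ pair with the columns of $S$ (the eigenvectors) to give exactly the 1-or-0 condition above, so the rows of $S^{-1}$ are the matrices of the dual basis vectors. A numpy sketch:

    import numpy as np

    S = np.array([[1/3,  1.0],
                  [2/3, -1.0]])   # columns are e'_1 and e'_2

    S_inv = np.linalg.inv(S)
    phi1, phi2 = S_inv[0], S_inv[1]   # rows of S^{-1}

    print(phi1)   # [1. 1.]                        i.e. phi'_1(x) = x_1 + x_2, the conserved total
    print(phi2)   # approximately [0.667, -0.333]  i.e. phi'_2(x) = (2/3) x_1 - (1/3) x_2
    print(S_inv @ S)                 # the identity matrix: phi'_i(e'_j) is 1 if i=j, else 0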

To be continued shortly!

Volume of a pyramid (solution)

Hi!

In the lecture on Volumes of Spheres, at the end, I asked you to try doing the volume of a square-based right pyramid, the same way that I did the sphere. Let me repeat the question again for reference:

Problem: Suppose that I have a pyramid with a square base. That is, I start with a square horizontal base, and then I choose a point vertically directly above the center of the square. I connect the top point to the four corners of the square with line segments, then I fill in the four triangles I have created, and finally I fill in the resulting solid.

A right, square-based pyramid.

Let’s say the height is $h$ and the base is $b$.
a) First, before you get started, make a guess about the formula. Try putting the pyramid in a box: what’s the volume of the box? What fraction of the box do you think the pyramid will take up?
b) Then, follow all the steps I did for the sphere, one by one, with the pyramid. At each step, pay attention to what is the same, and to what you need to change.
c) Once you get an answer (it may take a while!), test it out the way I did with the sphere. Does your final answer agree with your guess?

I will write out the solution in full here. Please do try the solution yourself first, following the same pattern as what I did with the sphere. If you got stuck somewhere, I would suggest looking at this solution until you get a hint for the place that you were stuck, then try it yourself again from there.

Get an initial guess or estimate

It isn’t strictly logically necessary, but it is almost always helpful to get a rough idea of the answer before you begin solving.

For this problem, I need to find the volume of a pyramid, which I don’t know. (If you do know it, pretend you don’t for now!) What do I know volumes of? I know the volume of a sphere, but that doesn’t seem helpful. The only other thing I know is a box: a right-angled rectangular box of dimensions $\ell$, $w$, and $h$, has volume $V=\ell w h$.

A right-angled rectangular box of dimensions $\ell$, $w$, and $h$.

(That’s pretty much by definition; when we define what volume means, we usually start there. Actually, you can derive this from a simpler assumption, but that’s a little off-topic; I’ll post something on it if anyone is interested.)

Anyway, we could compare the volume of a pyramid to the volume of a box:

Pyramid in a rectangular box.

The smallest box that I can put the pyramid in has dimensions $b$, $b$, and $h$; so the volume of the pyramid has to be less than $V_{\mathrm{box}}=b^2h$.

In fact, it would be reasonable to guess (wouldn’t it?) that the pyramid takes up some fixed fraction of the box. That is, if I changed $b$ or $h$, the ratio of the volume of the pyramid to the volume of the box would stay the same. That would mean that the volume of the pyramid should be of the form

$V=cb^2h$,

where $c$ is some constant number less than 1.

Looking at the picture, and thinking about the missing space, it seems that the pyramid takes up less than half of the box. So I would further guess that $c$ is a number less than 1/2.

Now, I certainly haven’t clearly proved this is true. But I’m just getting initial guesses, so I won’t try harder to justify it now. (It could potentially be interesting to do so later though.)

I could get a lower estimate on the volume of the pyramid by placing a rectangular box inside the pyramid. That seems like something I could figure out, but it might take a little doing. So I won’t get sidetracked now, since I was just trying to get a rough guess anyway.

OK, so our guess for the volume of the pyramid is

$V=cb^2h$,

where $c$ is some constant number less than 1, and probably less than 1/2.

Now on to the method!

Changing to dynamic: filling up gradually

Following the method I showed for the sphere, our first step will be to replace the problem of finding the fixed volume of the given pyramid with the problem of finding the changing volume of the partially filled pyramid, as I fill it up gradually.

I imagine that the pyramid is a tank filling with water. I fill it up to a depth $z$, where $z$ is less than or equal to the total height $h$ of the pyramid.

Picture of the pyramid tank, partially filled with water
Pyramid tank, partially filled with water

From a side cross-section, it looks like this:

Partially filled pyramid tank, in cross-section.
Partially filled pyramid tank. The bottom section is water, the top empty.

Now, I am looking for the volume $V$ of the filled section only. This means that $V$ is a function of $z$; as $z$ increases, so does $V$.

(The function $V$ also depends on $b$ and $h$, but I am thinking of these as constant, since the dimensions of the pyramid do not change when I am doing this problem.)

Letting filling parameter increase slightly

The next step was to see what happens when I add just a little more water. In this problem, the parameter that controls how full the pyramid is, is the depth of the water $z$. So imagine that I start filled to some depth $z$, and then I fill a little more, so $z$ increases to $z+\mathrm{d}z$. Then the volume also increases, from a value $V$ (when the depth was $z$) to a value $V+\mathrm{d}V$ (when the depth is $z+\mathrm{d}z$).

This looks something like:

Additional volume dV created by additional depth dz.

From the side, it looks like

Additional volume dV and depth dz from cross-section.

I want to find out the increase in volume, $\mathrm{d}V$, in terms of the other variables and constants.

The shape of the extra volume is approximately a square plate! Let’s give a name to the length and width of the square plate; I’ll call them $x$ and $y$. But because the base is square, the slice is also square, so $y=x$.

The extra volume dV.

The extra volume $\mathrm{d}V$ we have added is

$\mathrm{d}V= x y \,\mathrm{d}z = x^2\,\mathrm{d}z$.

Success!

(It’s not exactly true! The sides of the little extra slice are not perpendicular to the base, so it’s not truly a right-angled rectangular box. However, the error we are making is going to be proportional to $(\mathrm{d}z)^2$, so for “infinitely small” $\mathrm{d}z$ it disappears. More precisely, the error we have made will contribute an error to our final answer, but because it is proportional to $(\mathrm{d}z)^2$, the error approaches zero as we take the limit of $\mathrm{d}z$ approaching zero.)

Note that the $x$ gets smaller as $z$ gets bigger, so the volume of the slice $\mathrm{d}V$ that we are adding is relatively big if $z$ is small, and gets smaller as $z$ gets larger. Which makes sense with the diagram (right?).

I could divide both sides of my equation above by $\mathrm{d}z$, to find

$\dfrac{\mathrm{d}V}{\mathrm{d}z}=x^2$.

A warning: writing dV in terms of the filling parameter only

Now, I might be tempted at this point to say, I know the derivative $\frac{\mathrm{d}V}{\mathrm{d}z}$ of the function $V$ I’m looking for. Then I would say, the derivative is $x^2$, so $V=\frac{1}{3}x^3+C$, and done, right?

That does NOT work. The reason is that I have found the derivative of $V$, assuming that $z$ is the variable. The quantity $x$ is a variable, which depends on $z$. We could think of $x$ as standing for an unknown formula of $z$. So the derivative of $V$ isn’t truly $x^2$; instead, it is some unknown formula of $z$, all squared.

So, to proceed further, we have to find exactly how $x$ depends on $z$.

Finding x in terms of z

Our goal is now to find exactly how $x$ depends on $z$. Once we do that, we can get a formula for $\frac{\mathrm{d}V}{\mathrm{d}z}$ that depends only on $z$.

Now we’ve got to do some geometry. Take a look at that cross-section, and start labeling everything you know:

Cross-section, with everything marked. Note that x is the width (and length) of the extra volume dV.

Note that I figured out one thing on the diagram that wasn’t given: the height of the little empty triangle on top must be $h-z$, the total height of the pyramid minus the height of the filled portion.

Now, we have a proportional (or similar) triangles question! The little triangle at the top is just a scaled version of the entire large triangle:

Similar triangles!

I’m trying to find $x$, so let me arrange a fraction where it is on the top. We could say

$\dfrac{x}{h-z} = \dfrac{b}{h}$;

that is, the proportion of base to height must be the same for both triangles. That’s usually the way I think of it. Some people prefer to think of it as

$\dfrac{x}{b}=\dfrac{h-z}{h}$;

what that says is, whatever proportion the first base is of the second base, the proportion of the first height to the second height should be the same value. (Like, if $x$ is, say, 1/4 of $b$, then the height $h-z$ should also be 1/4 of $h$.)

You can think of it either way you like. Both will give you the same equation in the end.

I’m trying to find how $x$ depends on $z$, so I want to solve for $x$. Starting with either equation above, I get

$x=\dfrac{b}{h}(h-z)$.

It’s always a little subjective what counts as “simplified”, and you might want to multiply through this bracket. I’m going to leave it the way it is for now, because I know I want to find $x^2$ soon.

Finding dV/dz in terms of z only (and constants)

OK, so the purpose of this was to find $\mathrm{d}V$, and $\frac{\mathrm{d}V}{\mathrm{d}z}$, in terms of z only. So let’s substitute our formula for $x$ in terms of $z$, into our formula for $\frac{\mathrm{d}V}{\mathrm{d}z}$. We get

$\dfrac{\mathrm{d}V}{\mathrm{d}z}=x^2=\left(\dfrac{b}{h}(h-z)\right)^2=\left(\dfrac{b}{h}\right)^2(h-z)^2=\dfrac{b^2}{h^2}(h-z)^2$.

Now, once again, “simplified” is a little bit subjective here. This is a nice simple answer. But, I’m going to want to find the antiderivative of this function shortly. And having that parenthesis squared is going to cause me problems. My life will be easier if I multiply the thing out:

$\dfrac{\mathrm{d}V}{\mathrm{d}z}=\dfrac{b^2}{h^2}(h-z)^2=\dfrac{b^2}{h^2}(h^2-2hz+z^2)=\dfrac{b^2}{h^2}h^2-2\dfrac{b^2}{h^2}hz+\dfrac{b^2}{h^2}z^2$.

Simplifying the fractions, I get finally that

$\dfrac{\mathrm{d}V}{\mathrm{d}z}=b^2-2\dfrac{b^2}{h}z+\dfrac{b^2}{h^2}z^2$.

This formula truly gives me the derivative of my unknown function $V$ with respect to $z$—with no surprises—because it depends only on the variable $z$. It also depends on $h$ and $b$, of course, but for the purposes of this problem, those are constants—we can think of them as being fixed numbers.

This formula tells me how the partially filled volume $V$, that is, the volume of the water, changes as the depth $z$ changes.

Now we’re getting close!

Finding the antiderivative

We know the derivative of the function we’re looking for! So we have to reach back to problem set 1, and the rules we had for derivatives. Finding the antiderivative is a sort of guess-and-check: what function would give me the derivative that I want?

Our formula for $\frac{\mathrm{d}V}{\mathrm{d}z}$ has three terms. Let’s tackle it one term at a time.

First, $b^2$. A warning: the antiderivative is NOT $\frac{1}{3}b^3$! Remember that $b$ is a constant here, just a number. So $b^2$ is equally well a constant. It’s as if we were given something like $\frac{\mathrm{d}V}{\mathrm{d}z}=7$; then $V$ could have been $7z$.

So, the antiderivative of $b^2$, with $z$ as a variable, is just

$b^2 z$.

(Plus an added constant, but I’ll leave those to the end.)

Second, $-2\frac{b^2}{h}z$. Again, the forbidding-looking expression $-2\dfrac{b^2}{h}$ is just a constant. So, it’s just as if we were given something like $\frac{\mathrm{d}V}{\mathrm{d}z}=7z$. If the derivative has a $z$ to the power 1, then the original power had to have been 2. We can’t make $V=7z^2$, though, because then we would get $\frac{\mathrm{d}V}{\mathrm{d}z}=14z$ instead. We need to introduce a factor of $1/2$, to cancel the 2 we get when we take the derivative of $z^2$. So, the antiderivative of $7z$ would be $7\left(\frac{1}{2}z^2\right)=\frac{7}{2}z^2$.

Similarly, the antiderivative of $-2\frac{b^2}{h}z$ is going to be

$-\dfrac{b^2}{h}z^2$.

Third, $\dfrac{b^2}{h^2}z^2$. As before, the fraction in front is just a constant. The antiderivative of $z^2$ has to have a power 3, and we have to cancel the 3 that is created when we take the derivative. So, the antiderivative of $z^2$ is $\frac{1}{3}z^3$.

This means that the antiderivative of $\dfrac{b^2}{h^2}z^2$ is

$\dfrac{1}{3}\dfrac{b^2}{h^2}z^3$.

Putting it all together, the antiderivative of

$\dfrac{\mathrm{d}V}{\mathrm{d}z}=b^2-2\dfrac{b^2}{h}z+\dfrac{b^2}{h^2}z^2$

is

$V=b^2 z-\dfrac{b^2}{h}z^2+\dfrac{1}{3}\dfrac{b^2}{h^2}z^3+C$,

where C is an unknown constant.
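If you want a quick independent check of this antiderivative, here is a sketch using sympy (assuming it is installed): differentiate the proposed $V$ and compare the result with $\frac{\mathrm{d}V}{\mathrm{d}z}$.

```
# Symbolic check of the antiderivative, assuming sympy is available.
import sympy as sp

z, b, h, C = sp.symbols('z b h C')

V    = b**2*z - (b**2/h)*z**2 + sp.Rational(1, 3)*(b**2/h**2)*z**3 + C
dVdz = b**2 - 2*(b**2/h)*z + (b**2/h**2)*z**2

print(sp.simplify(sp.diff(V, z) - dVdz))   # should print 0
```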

Evaluating the constant

Before we first start filling up the pyramidal tank, the height of the water starts at $z=0$. At that time, the volume filled so far is $V=0$. Putting both of those into our formula for $V$ given above, we find

$0=b^2(0)-\dfrac{b^2}{h}(0)^2+\dfrac{1}{3}\dfrac{b^2}{h^2}(0)^3+C$,

which all just boils down to

$C=0$.

Therefore, the formula for the volume, of the pyramid filled up to a depth $z$, is given by

$V=b^2 z-\dfrac{b^2}{h}z^2+\dfrac{1}{3}\dfrac{b^2}{h^2}z^3$.

Awesome!

Filling the whole volume to get the final answer

My original problem was to find the entire volume of the pyramid. I introduced this idea of filling it to a depth $z$ as a device. So, to answer the original problem, I should set the depth $z$ to be the whole height of the pyramid, so that the entire thing is filled up:

set $z=h$!

Doing this in our formula for the volume $V$ of the partially-filled pyramid, we get a lot of simplification:

$V=b^2 (h)-\dfrac{b^2}{h}h^2+\dfrac{1}{3}\dfrac{b^2}{h^2}h^3=b^2h - b^2h +\dfrac{1}{3}b^2h=\dfrac{1}{3}b^2h$,

so our final answer is

$V=\dfrac{1}{3}b^2h$.

Hooray!

Verifying the answer

Let’s see what checks we can put this answer through. First off, we had guessed that the volume of the pyramid would be

$V=cb^2h$,

where $c$ is some positive constant that is less than 1, and probably less than 1/2. In fact, our final answer exactly fits this guess, with $c=1/3$. Beautiful!

That’s pretty strong evidence that it’s right, I think.

What else could we do? One thing I notice is that the partially filled pyramid is actually a big pyramid minus a small pyramid on top. If our formula is correct, then the big pyramid has volume

$V=\dfrac{1}{3}b^2h$

and the smaller pyramid on top has volume

$V=\dfrac{1}{3}x^2(h-z)=\dfrac{1}{3}\left(\dfrac{b}{h}(h-z)\right)^2(h-z)=\dfrac{1}{3}\dfrac{b^2}{h^2}(h-z)^3$.

So, the volume of the partially-filled pyramid (up to a depth $z$) should be

$V=\dfrac{1}{3}b^2h-\dfrac{1}{3}\dfrac{b^2}{h^2}(h-z)^3$

Does this agree with the formula we got above,

$V=b^2 z-\dfrac{b^2}{h}z^2+\dfrac{1}{3}\dfrac{b^2}{h^2}z^3$?
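In case you’d like to see that algebra spelled out, one way to do it is to expand $(h-z)^3=h^3-3h^2z+3hz^2-z^3$:

$\dfrac{1}{3}b^2h-\dfrac{1}{3}\dfrac{b^2}{h^2}(h-z)^3=\dfrac{1}{3}b^2h-\dfrac{1}{3}\dfrac{b^2}{h^2}\left(h^3-3h^2z+3hz^2-z^3\right)=\dfrac{1}{3}b^2h-\dfrac{1}{3}b^2h+b^2z-\dfrac{b^2}{h}z^2+\dfrac{1}{3}\dfrac{b^2}{h^2}z^3=b^2z-\dfrac{b^2}{h}z^2+\dfrac{1}{3}\dfrac{b^2}{h^2}z^3$.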

Bulldozing through the algebra, it does indeed match!

That check is admittedly a little bit circular, logically speaking. But I’m not trying to get a proof here; I’m just running consistency checks. Our formula checks out!

Looking back: could we simplify?

It’s always a good idea to look back on a solution—particularly a long one—and try to summarize, clean up, and see if you could have done things more easily. It’s also good to look for lessons for next time.

One thing which would make things easier, which at least one of you noticed already: I could have flipped the pyramid over, and filled it from the tip. Or alternately, I could have filled the pyramid from the top down (antigravity water??).

The pyramid, flipped over before I start filling it.

Then the algebra of my proportional triangles becomes a lot easier. And as a result, my antiderivatives come out a lot easier as well.

Exercise:
Do the problem the way I just described, putting the pyramid upside-down first. (Or you can leave it right-side-up, but just fill it from the top down.) You don’t have to write the whole problem again, just write out the parts that are different. It should come out a fair bit easier!

It’s also worthwhile to see if you can solve other similar problems the same way. One thing that is clear is, the base could have been a rectangle instead of a square, and things wouldn’t change too much.

Exercise:
Do the problem again, assuming that the base is a rectangle with length $\ell$ and width $w$. Again, you don’t have to write out the whole thing: just identify what changes. You could probably make a guess how the answer should come out, and then check it.

Another thing that could be generalized: it didn’t seem to matter too much that the tip of the pyramid was directly over the center of the base.

Exercise:
Suppose that we have a rectangle-based pyramid, with base length $\ell$ and width $w$, and with perpendicular height $h$; but let’s assume that the top point of the pyramid is directly above one of the corners of the rectangle. (So the pyramid is kind of a 3-D version of a right-angled triangle.) Go through your solution, and figure out what step needs to change, and how the final answer would change (if indeed it does).

I ask some similar problems in the “Generalization” part of Problem Set 7, Problems on volumes and areas.

Eigenvalues I: Markov processes

Hi everyone!

OK, I had planned to go in strict logical order from now on, but I just couldn’t do it. I was too worried about 1) running out of time for important things, and 2) taking away emphasis from the central ideas.

So what I’m going to do is start with a key example. Then I’ll talk about theory backing that up, and then a new example, and so on. I am planning to write the foundational lectures still, as we go. But I think this way will make more sense (though I know it doesn’t sound like it from what I just said!).

OK, let’s get started.

Eigenvalues, eigenvectors, and spectral decomposition

First, just a little bit of context and theory. I am not going to use this right away in the example, but this is where I’m heading.

Suppose we have a linear transformation $A$ that maps a vector space back to itself, $A:V\to V$. It is interesting to know, in many applications, whether $A$ has a fixed point: some vector $\vec{v}\in V$ such that

$A\vec{v}=\vec{v}$.

In fact, a more general thing is often of interest: whether $A$ has an invariant line. A line $L$ through the origin is called invariant if:

for any vector $\vec{v}\in L$, we also have $A\vec{v}\in L$.

A line through the origin is a 1-dimensional subspace, so such a fixed or invariant line is usually called an invariant 1-dimensional subspace.

You should take a moment to convince yourself that $A$ has an invariant line $L$ exactly when there exists a non-zero vector $\vec{v}_0$ with the property that

$A\vec{v}_0 = \lambda \vec{v}_0$, for some scalar $\lambda$.

Understanding Exercise: Eigenvectors and Invariant Lines
Give a rough explanation (to yourself is fine, you don’t have to write it out necessarily) of the statement above: that $A$ has an invariant line exactly when there is a non-zero vector $\vec{v}_0$ with the stated property. It doesn’t have to be a precise proof (see below).

Such a vector $\vec{v}_0$ is called an eigenvector, and the corresponding scalar $\lambda$ is called an eigenvalue. “Eigen” is German for “proper”, in the sense of “belonging to” (like “property”). In some older texts, eigenvectors are called proper vectors, or characteristic vectors. They are vectors which “belong to” or are “characteristic of” the transformation $A$.

Theory Problem: Eigenvectors and Invariant 1-D Subspaces
Suppose that $A:V\to V$ is a linear transformation as discussed above. Prove the following: if there exists an eigenvector $\vec{v}_0$ for $A$, then the line $L=\mathrm{span}\{\vec{v}_0\}$ is an invariant line (1-D subspace) for $A$. On the other hand, if $L$ is an invariant line for $A$, then any non-zero vector in that line is an eigenvector (and all the vectors in that line share the same eigenvalue).

This simple looking concept ends up being deeply useful in many contexts. Let me get into the examples first, and then I’ll come back to how the eigenvectors play a role in the examples. (In a later lecture, I’ll talk about how eigenvectors and eigenvalues are deeply theoretically important as well.)
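If you like seeing things numerically, here is a tiny illustration of eigenvalues and eigenvectors on a computer. It is a sketch assuming Python with numpy; the matrix $A$ below is just an arbitrary example, not one that appears later.

```
# Tiny numerical illustration of eigenvalues and eigenvectors, assuming numpy.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])       # an arbitrary example matrix

eigenvalues, eigenvectors = np.linalg.eig(A)   # eigenvectors are the *columns*
print(eigenvalues)               # 3 and 1 (in some order) for this A

v0 = eigenvectors[:, 0]
print(A @ v0)                    # this is a scalar multiple of v0 ...
print(eigenvalues[0] * v0)       # ... namely, the eigenvalue times v0
```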

A Markov chain: setting up the model

OK, here is an applied example, versions of which show up in all kinds of places (economics, physics, probability, computer science…).

Let’s suppose that we have some sort of system where agents can be in two states, let’s call them states 1 and 2. At time 0, we have a bunch of agents in state 1 and a bunch of agents in state 2. Then, in each time step, some proportion of the agents will flip states.

This setup could be used for demographics and economics: for example, state 1 could be urban residents and state 2 rural residents; each year some proportion moves from one state to the other. For another example, in physics, an ammonia molecule can be in one of two main energy states, and its flipping back and forth between them is what makes the “ammonia maser” possible.

Of course, there’s no limitation on the number of states, but I’m starting with just two states so that things are simple to write down.

Let’s use $x_j(t)$ to denote the number of agents in state $j$ at time $t$. So we start with $x_1(0)$ agents in state 1 and $x_2(0)$ agents in state 2.

Now, let me pick some numbers (arbitrarily). Let’s say that, in each time step, a proportion $0.10$ (that is, 10%) of the agents in state 1 move to state 2, and in each time step, a proportion $0.05$ (that is, 5%) of the agents in state 2 move to state 1.

We can illustrate the situation as follows:

Image of the Markov system described in the previous paragraph. There are two nodes/vertices marked 1 and 2, an edge from 1 to 2 marked 0.10, and an edge from 2 to 1 marked 0.05
A two-state Markov chain.

This picture is called a “directed graph” (not to be confused with “graph” like “graph of a function”). The states are called “vertices” or “nodes”, and the lines between the states are called “edges”. If we had more states, we would have a bigger directed graph.

Sometimes people will also indicate the proportions which stay in the state they were in during each time step. Since 10% of the agents in state 1 leave in each time step, 90% of them stay, and similarly for state 2. We could draw it like this:

Same as the previous image, except now there is a loop attaching 1 to itself, marked 0.90, and a loop attaching 2 to itself, marked 0.95.
A Markov chain, with the probabilities of staying in a state marked.

Note that the number of agents is conserved: agents are not born or killed. We could consider models where that happens, too! But the class of models where the number of agents is conserved is the one I’ll consider first. These are called “Markov chains”.

It is maybe worth mentioning that, instead of having a bunch of agents, I could also just have one agent, who, each time step, randomly either switches states or stays in its state. Then I would make $x_1(t)$ and $x_2(t)$ be the probabilities that the agent is in state 1 or 2, respectively, at time $t$. You could also imagine this as an agent randomly walking around on a graph, according to assigned probabilities.

A Markov chain: setting up the equations

Alright, now that I’ve told you the assumptions, let’s set up the equations.

Understanding Exercise: Setting up Markov chain equations
Before you read on, try this yourself. Let’s say the time steps are $t=0,1,2,\dotsc$. Write the equations for $x_1(t+1)$ and $x_2(t+1)$ in terms of $x_1(t)$ and $x_2(t)$, with the assumptions about probabilities that I made above. Once you have the equations, try to rewrite them in matrix form.

Got it?

Here’s the answer in linear equations form:

$x_1(t+1)= (0.90) x_1(t) + (0.05) x_2(t)$
$x_2(t+1)= (0.10) x_1(t) + (0.95) x_2(t)$

In matrix form, we could write this as

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(t+1)=\begin{bmatrix}0.90&0.05\\0.10&0.95\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)$

The matrix here,

$T=\begin{bmatrix}0.90&0.05\\0.10&0.95\end{bmatrix}$

is called the transition matrix.

It is often useful to rewrite the equation so that it shows the number of agents from the previous time step, plus the change during the time step. That is, I would rewrite the equation above as

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(t+1)=\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)+\begin{bmatrix}-0.10&0.05\\0.10&-0.05\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)$,

which I could simplify as

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(t+1)=\left(I+\begin{bmatrix}-0.10&0.05\\0.10&-0.05\end{bmatrix}\right)\begin{bmatrix}x_1\\x_2\end{bmatrix}(t)$,

where $I=\begin{bmatrix}1&0\\0&1\end{bmatrix}$ is the $2\times 2$ identity matrix.

Understanding Exercise:
Check that my rewriting of the equation is the same as the original (if I did it correctly!), and make sure you understand the meaning of the terms.

In this case, we would call the matrix which gives the changes in each time step,

$\Delta=\begin{bmatrix}-0.10&0.05\\0.10&-0.05\end{bmatrix}$,

the “difference matrix” or weighted graph Laplacian. Then,

$T=I+\Delta$.

A Markov chain: questions

Given these equations, we could start with any initial number of agents $x_1(0)$ and $x_2(0)$ in the two states 1 and 2, and see how they evolve over time. This would be a little tedious to do by hand, but easy enough with a computer.

One initial thing we might want to know is: does the system approach a steady state, i.e. an equilibrium? That is, do the numbers $x_1(t)$ and $x_2(t)$ settle down over time? Or do they oscillate forever?

Computing Problem: Two-State Markov Chain Example
For those of you who are interested in computing things, this would be a good example to start with. You probably have a preferred system for doing matrix computations already; if you don’t, you might try https://www.sagemath.org/, which combines Python with a Maple-like front end. Try some arbitrary starting values $x_1(0)$ and $x_2(0)$, and then find $x_1(t)$ and $x_2(t)$ for $t=1,2,3,\dotsc$. In particular, find these for fairly large values of $t$: do they in fact settle down, or do they oscillate? Try experimenting with the initial values: how does the behavior depend on what initial values you take?
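To get you started (only if you want it), here is a minimal sketch of the iteration in Python with numpy; SageMath will run it too. The starting values 300 and 150 are arbitrary placeholders, so substitute your own.

```
# Iterate the two-state Markov chain, assuming numpy is available.
import numpy as np

T = np.array([[0.90, 0.05],
              [0.10, 0.95]])

x = np.array([300.0, 150.0])     # x_1(0), x_2(0): arbitrary starting values

for t in range(1, 101):
    x = T @ x                    # one time step: x(t+1) = T x(t)
    if t in (1, 2, 3, 10, 50, 100):
        print(t, x)
```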

If you do decide to do this computing exercise, you might want to try it before trying for theoretical answers in the next exercise:

Problem: Finding Two-State Markov Chain Steady State
a) Try to set up equations to determine if this two-state Markov system has a steady state. Assume the same probabilities as above. You might also want to assume some specific numerical values for $x_1(0)$ and $x_2(0)$, at least at first.
b) Try to solve for the steady state. Does the system have a steady state? If it does, check your answer!
c) How does your answer depend on the initial values $x_1(0)$ and $x_2(0)$? If you picked specific numerical values at first, try doing it with unspecified variables instead. Does the nature of the solution depend on the initial values?
d) Can you explain how this question is related to eigenvectors and eigenvalues?

Theory Problem: General Two-State Markov Chains
For those of you who are more theoretically inclined:
a) Do the previous problem, still with two states, but with general transition probabilities replacing the specific ones that I gave in my example. Give a general answer in terms of variables.
b) Can you prove when the system will have a steady state? Can you give a proof which would generalize to more than two states?

We might be interested, not only in the existence and nature of a steady state, but also how the system approaches a steady state. If you’ve done the other problems, here’s a challenge problem to think about: the state of the system at time $t=n$ is going to be given by

$\begin{bmatrix}x_1\\x_2\end{bmatrix}(n)=T^n\begin{bmatrix}x_1\\x_2\end{bmatrix}(0)$.

Can you use what we’ve done above, to find a simpler way of computing $T^n$, so that you can simply find out the state of the system for large time, without having to do a huge number of matrix multiplications?

More answers and explanations next time!

Problems on volumes and areas

(Problem Set #7)

Hi again! If you’ve worked through the lecture on Volumes of Spheres, then you’ll be ready to try some similar problems. The first problem, which I’d like everyone to work through and write up, is the problem I put at the end of that lecture, to find the volume of a square-based pyramid. I’ll repeat it here for reference:

Problem: Volume of a square-based pyramid.
Suppose that I have a pyramid with a square base. That is, I start with a square horizontal base, and then I choose a point vertically directly above the center of the square. I connect the top point to the four corners of the square with line segments, then I fill in the four triangles I have created, and finally I fill in the resulting solid.

Image of a square-based right pyramid.
A right, square-based pyramid.

Let’s say the height is $h$ and the base is $b$.
a) First, before you get started, make a guess about the formula. Try putting the pyramid in a box: what’s the volume of the box? What fraction of the box do you think the pyramid will take up?
b) Then, follow all the steps I did for the sphere, one by one, with the pyramid. At each step, pay attention to what is the same, and to what you need to change.
c) Once you get an answer (it may take a while!), test it out the way I did with the sphere. Does your final answer agree with your guess?

Since that problem is the first exercise, I (or Džen) will be very generous with hints and suggestions, just ask!

Once you’ve completed that problem, you could either do some quite similar problems, or if you are feeling confident, you could try a more challenging problem. There are more problems here than you will probably have time for, so you can pick and choose.

Fairly similar volume problems

Here are two volume problems which are pretty similar to the sphere and the square-based pyramid, if you’d like to try similar things to build your understanding and confidence with this method. You could do both, or choose one (doesn’t matter which). Or, if you are feeling confident, you could skip ahead to other problems.

Problem: Volume of Right Circular Cone
A “right circular cone” is the shape you get as follows: start with a circle $C$ on the ground. The base of the cone is the filled-in circle of all the points on the ground inside the circle. Pick a point $P$ that is directly above the center point of the circle $C$. For each point $q$ on the circle $C$, make a line segment from $q$ to $P$; the (infinitely many) line segments you get form the curved side surface of the cone.

Image of a right circular cone.
A right-circular cone.

The solid cone is all the space that is contained above the base circle, and under the curved sides with top point $P$. Find the volume of the cone! (Your answer will be in terms of the dimensions of the cone, in some way: you choose what dimensions are appropriate!)

Here’s another similar one:

Problem: Parabolic Bowl
Suppose that we start with the graph of the parabola $y=x^2$, in an $x$-$y$ plane (I want to imagine that the $y$ axis is vertical). Let $h$ be some positive constant. I want to start with the region that is above the curve $y=x^2$, below the line $y=h$, and to the right of the $y$-axis:

Image of region described in the problem, which will be rotated about the y-axis.
Parabolic region, that I’m going to rotate . . .

Then, I imagine rotating that region $360^\circ$ around the $y$-axis, keeping the $y$-axis as a fixed line (axis of rotation). The region that the shaded area sweeps out in space has the shape of a solid parabolic “bowl” or “bullet”:

Sketch of a parabolic "bowl", described in the problem.
. . . creating a parabolic “bowl”.

Find the volume of this parabolic bowl!

Areas

A very similar method works for finding areas. You imagine gradually filling in the area, up to some “depth” (or it could be “width”, or other parameter). In this case you are finding $\mathrm{d} A$ rather than $\mathrm{d} V$, but the method is otherwise similar. If you haven’t taken calculus before, I’d definitely recommend trying this problem. If you have, you’ve probably seen problems like this frequently, so you might want to skip it—up to you!

Problem: Parabolic Area
Suppose that we start with the graph of the parabola $y=x^2$, in an $x$-$y$ plane. Let $h$ be some positive constant. I want to consider the region that is above the curve $y=x^2$, below the line $y=h$, and to the right of the $y$-axis:

Image of region described in problem.
Region bounded by a parabola. Ignore the little “spinning” symbol this time; I’m not rotating around the y-axis, I’m just interested in finding this shaded area.

Find the area of the shaded region!

Generalization!

If you’d like to try something a little more challenging, and want to see some interesting more general rules following from the above, try these two problems.

Problem: Volume of Slant Pyramid
In the “Volume of Square-Based Pyramid” problem at the top, I assumed that the top corner (let’s call it $P$) was directly above the center of the base square. What if I didn’t do that? Let’s define a “slant pyramid” as follows: first, make a horizontal square $S$ on the ground. All the points enclosed within the square on the ground will form the base. Now, pick $P$ to be any point above the ground. Form four edges by connecting each of the four corners of the square $S$ with the point $P$. This creates four triangular faces. The “slant pyramid” is the solid contained by the square base and the four triangular side faces.

A picture of a slant pyramid, as described in the problem.
A slant pyramid.

What is the volume of the slant pyramid? (Your answer will involve the dimensions of the slant pyramid somehow. Use whatever specification of the dimensions that you think makes the most sense.)

That generalization is actually a special case of the following generalization:

Problem: Volume of Any Cone
You may have noticed that my definition of a “right circular cone” would work to define a square-based pyramid also—just replace the circle base $C$ with the square base $S$, and leave the rest of the definition the same. More generally, we could do the same with a triangle base, or a pentagon base, or an ellipse base. . .

We define a “cone with arbitrary base” as follows. Let $C$ be any closed curve, in a horizontal plane, that doesn’t intersect itself. The base will be all the points on the horizontal plane which are contained within $C$. Pick any point $P$ above the horizontal plane. For each point $Q$ on the curve $C$, form the line segment connecting $Q$ to $P$. The (infinite) family of line segments created form the sides of the cone. The solid cone is all the points in space enclosed by the base and the sides.

Picture of a general cone on a base C, as described in the problem.
A general cone on a base C.

Let’s say the area of the base is $A$. Find the volume of this cone. Your answer will involve $A$, and also at least one other dimension of the cone (which you can choose).

An area challenge

If you are looking for a challenge, you might try this problem. It is definitely doable with everything we have done so far—the only extra thing is you need to remember a little bit of trigonometry. It is a challenge which also involves some ideas which will be useful if you go further with calculus or physics.

Problem: Area of a Sphere
In a previous problem, we have shown how to find the formula for the surface area of a sphere of radius $r$, assuming that the volume formula was already known. (Remind yourself which question, and what the answer was!!) It is possible to find the surface area of a sphere of radius $r$ more directly, using the sorts of ideas that we have been developing above. Try it! (If you can’t get a complete answer, at least write out your strategy, and ask for hints! The ideas are important and interesting, even if you don’t get a full answer.)

Picture of a sphere.
What is the surface area?

Some volume challenges

Here are two problems that are interesting, if you are looking for a challenge with computing volumes!

Problem: Volume of a Torus
Suppose $r$ and $R$ are two positive constants, with $r<R$. Suppose we draw a circle $C$ in an $x$-$y$ plane, with radius $r$, and whose center is at the point $(R,0)$. Now, imagine that $y$ is vertical, and rotate this circle $360^\circ$ around the $y$-axis, keeping the $y$-axis fixed (axis of rotation). The resulting surface is called a “torus”:

Picture of a torus.
A torus.

The solid torus is the region of space enclosed by the torus surface. Find the volume of the solid torus!

This last one is super challenging:

Problem: Volumes of Platonic Solids
The “Platonic solids” are the analogue of regular polygons in three dimensions. Unlike regular polygons, there are only finitely many of them—in fact, only five:

Picture of the five Platonic solids.
The five Platonic solids.

See Wikipedia for more information. What are the volumes of the Platonic solids, as formulas of their side lengths? One is easy (!), and two of them are not hard given the things you have figured out in previous problems. The remaining two are doable with things you have already figured out, but geometrically tricky!

Welcome to the new system!

Hi!

Picture of Andrew

Welcome to the new system!

I’m still figuring out how best to make the online thing work. (Once I figure it out, we’ll be back to in-person anyway!) I think the feedback meetings are valuable, but they were difficult to sustain in this format. So my new plan is to suspend the feedback meetings for now, and to spend time making lectures in this written format.

My hope is that the written format will give you something easier to refer to as you are working through the material. Also, I hope it will make it easier to work on your own. One of the hardest things about online learning, for me, has been that it is difficult to have all of you working effectively during class time.

Having the core lecture in written format will free us up to use class time more flexibly, hopefully.

Finally, there is a lot more material I’d like to cover than we will have time for in the term. Writing the lectures will give me more of a chance to make optional side trips, which you can choose to follow based on what interests you.

The story so far

OK, let’s get started! Let me start with a brief recap of where we are so far, and then I’ll explain my plan going forward.

Projections

We opened the class with the problem of projections. This has been the theme for the first half of the class.

To begin with, I imagined that you have a given vector $\vec{v}$ in $\mathbb{R}^3$, and a plane $S$ spanned by two given vectors $\vec{u}_1$ and $\vec{u}_2$, and I asked how to find the perpendicular projection $P\vec{v}$ of $\vec{v}$ onto the plane $S$.

Picture of projecting a vector onto a plane (as stated in the previous paragraph).
Projection of a vector in three dimensions onto a plane.

I think this problem has worked out pretty well! Even if you did not get a full solution, I think it helped everyone orient themselves and think about what they knew and what they were trying to do. Also, people came up with several different strategies, all of which we are going to use in various situations!

My intention was that you make an initial attempt to solve this problem with things you knew already. Then I would start the strictly logical development of the theory, and I would use the problem as a motivation for and guide to introducing new theory, which we would use to solve the problem in different ways.

So far, this problem has been given two solutions. One way was sketched out in Problem Set 3, Problem #1. It depended on the dot product (or more generally, inner product), which we haven’t formally introduced yet. The second way I discussed in class on Tuesday March 30. The second way required us to find a perpendicular subspace, which I also haven’t formally introduced yet.

In the logical development, I am trying to avoid using the inner product as long as possible. The purpose of this is to make clear what doesn’t depend on the inner product, and what actually needs it. In some applications, we don’t have a natural inner product, or we have more than one.

Least Squares

Picture of a least-squares fitting of a line to four data points.

I introduced one major application of projections: “least squares”, which is a way of fitting a line (or curve, or subspace) to some data points in the “best” possible way. This was developed in Problem Sets 1 and 2.

I thought it was important to see a real application early on! Otherwise this all gets too abstract.

Logical Development: Fields, Vector Spaces, Bases, Linear Transformations

The notes from when we introduced vector spaces. (I’m not asking you to read this, I just wanted a picture to illustrate “vector spaces”!)

While still thinking about the projection problem, we have then started on the development of the subject in strictly logical order. Logically, it would have made sense to start here, but I was worried that it would be too abstract, and it wouldn’t be clear where it was going. My intention was to give a major practical and theoretical problem (projection), and get you to try it on your own first, so that the general theory would be motivated. But of course the general theory is important for a lot else as well.

Most of Problem Sets 3 and 4 are about the logical development of the fundamental concepts.

A Big Problem For The Second Half: Spectral Decomposition and Decoupling

Notes from class on decoupling. (Not expecting you to read this, just wanted a picture for this section!)

Just last class (Friday April 2), to start off the second half of the term, I introduced another large motivating problem. Actually, I didn’t quite get there yet: we will get to the main idea and problem in class today (Tuesday April 6).

The plan going forward!

I am introducing the “spectral decomposition and decoupling” problem as a motivation, and I will ask you to spend a bit of time thinking about it, but I am not going to get you to struggle through it on your own, the way I did with the projection problem.

From now on, I want to concentrate mostly on the development in logical order. I will still be splitting between theory and applications, but I’ll bring up the applications when we get to them in logical order. There will be more opportunities now for computational problems, for those of you who are interested in that.

In the written lectures, I will start with reviewing the material we have done so far on fields, vector spaces, bases, and linear transformations, adding some more optional theory and applications along the way. Then I will discuss the projection problems and spectral decomposition problems, and then we’ll move forward to new theory and applications.

Here is the list of topics I hoped to cover in this class. We definitely won’t get to everything, but I’m going to try to briefly touch on all of it, at least in optional lectures and problems. I’ve listed them in logical order, but I may jump out of that order a bit, in order to be sure to get to the most important things.

  • Fields (including complex numbers, finite fields, rational field extensions)
  • Vector spaces (including function spaces, spaces of polynomials, field extensions, functions on graphs/networks, logical operations, families of subsets)
  • Subspaces (things we are projecting to, with examples in all the vector spaces listed above)
  • Quotient spaces (optional, but important for theory and for computer graphics)
  • Projective spaces
  • Linear independence, span, and bases (including Legendre and Hermite polynomials, Lagrange interpolation for curve-fitting)
  • Direct sums, complementary subspaces, and projections
  • Linear transformations
  • Dual spaces
  • Basis change
  • Gauss-Jordan elimination and systems of linear equations
  • Decomposition of projections into tensor products
  • Decomposition of linear transformations into tensor products
  • Image, kernel, and cokernel
  • Parametric and scalar equations
  • Error-correcting codes
  • Eigenvectors, eigenvalues, and spectral decomposition
  • Principal component analysis
  • PageRank
  • Multilinear functions and tensors; Einstein summation notation
  • Inner products
  • Quadratic forms
  • Orthogonal projection
  • QR decomposition
  • Singular value decomposition
  • Jordan decomposition
  • Determinants
  • Tensors
  • Differential forms

What next?

I will keep posting written lectures. I am planning to start with a recap of the material on fields, vector spaces, bases and linear transformations. I will let you know on the Slack when a new one is posted, and I will list them all on the Advanced Linear Algebra main page.

The written lectures will include problems, which I recommend doing as you work through the lecture, but which I am not asking you to hand in.

I will let you know, on Slack, what are the main things I’d like you to be working on, and what the suggested due dates are for submitting problems.
