MATH411 is a continuation of MATH410, but into higher dimensions.

Derivatives in Several Variables


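Let $A \subseteq \mathbb{R}^n$. Then, $x^* \in \mathbb{R}^n$ is a limit point of $A$ if there is a sequence $\{x_k\} \subseteq A \setminus \{x^*\}$ such that $x_k \to x^*$.

In other words, there is a sequence in $A$, not containing $x^*$, converging to it.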

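If we have a function $f: A \to \mathbb{R}$, and $x^*$ is a limit point of $A$, then the limit of the function is defined as

$$\lim_{x \to x^*} f(x) = L$$

if, for every sequence $\{x_k\} \subseteq A \setminus \{x^*\}$ with $x_k \to x^*$, we have $f(x_k) \to L$.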

Example: Existence of Limits (1)


The limit does not exist, as we can choose two sequences converging to the limit point along which the function has different limits as $k \to \infty$.

Theorem: Compositions of Limits

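Let $A \subseteq \mathbb{R}^n$, and let $x^*$ be a limit point of $A$. Let $f: A \to \mathbb{R}$, $g: A \to \mathbb{R}$ be functions, such that

$$\lim_{x \to x^*} f(x) = L_1, \qquad \lim_{x \to x^*} g(x) = L_2.$$

Then,

  1. $\lim_{x \to x^*} [f(x) + g(x)] = L_1 + L_2$,
  2. $\lim_{x \to x^*} f(x) g(x) = L_1 L_2$,
  3. If $g(x) \neq 0$ for all $x \in A$, and $L_2 \neq 0$, then $\lim_{x \to x^*} \frac{f(x)}{g(x)} = \frac{L_1}{L_2}$.

The quotient rule for limits is the most interesting of the 3, and there is a broad study of limits of quotients

$$\lim_{x \to x^*} \frac{f(x)}{g(x)},$$

where $g(x) \to 0$ as $x \to x^*$.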

These limits can occur frequently, and we commonly ask if such limits exist (think of derivatives!).

Example: Limit Example

We ask if this limit exists. To determine this, we will establish a bound on the function.

For any where and are both not equal 0, our function is bounded by ! Thus, as , , so by the Comparison Lemma, .

Thus, the limit exists and is equal to 0!

Theorem: Limit Equivalences

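Let $A \subseteq \mathbb{R}^n$ and let $x^*$ be a limit point of $A$. For a function $f: A \to \mathbb{R}$, and $L \in \mathbb{R}$, the following assertions are equivalent:

  1. $\lim_{x \to x^*} f(x) = L$. In other words, for any sequence $\{x_k\} \subseteq A \setminus \{x^*\}$, if $x_k \to x^*$, then $f(x_k) \to L$.
  2. $\forall \epsilon > 0$, there exists some $\delta > 0$ such that $|f(x) - L| < \epsilon$ whenever $x \in A$ and $0 < \|x - x^*\| < \delta$.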

We can also use the following property to show that such limits exist.

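A function $f: \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$ is homogeneous of degree $k$ if

$$f(tx) = t^k f(x) \quad \text{for all } t > 0 \text{ and } x \neq 0.$$

Basically, we should be able to replace $x$ with $tx$, and take out the $t$ into a $t^k$ term.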

Example: Homogeneous Functions

Is homogeneous of degree 1, because

Is homogeneous of degree 0, because

Curiously, the first function is homogeneous of degree 1 and has a limit at the origin, whereas the second is homogeneous of degree 0 and doesn’t. Does this suggest some generalization?

Proposition: Limits of Homogeneous Functions

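If $f: \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$ is continuous, and homogeneous of degree $k > 0$, then

$$\lim_{x \to 0} f(x) = 0.$$

This does not necessarily mean that homogeneous functions of degree $0$ don’t have a limit at 0! Just that some don’t.

Any constant function (e.g. $f(x) = 1$) is homogeneous of degree 0 and has a defined limit as $x \to 0$!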
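The scaling identity and the proposition above can be checked numerically. The sketch below is only an illustration: the two functions `f_deg1` and `f_deg0` are hypothetical stand-ins (they are not necessarily the functions from the example above), chosen because one is homogeneous of degree 1 and the other of degree 0.

```python
def f_deg1(x, y):
    # Hypothetical example: homogeneous of degree 1 away from the origin.
    return x * x * y / (x * x + y * y)

def f_deg0(x, y):
    # Hypothetical example: homogeneous of degree 0 away from the origin.
    return x * y / (x * x + y * y)

def scaling_ratio(f, x, y, t):
    # For a function homogeneous of degree k, f(tx, ty) / f(x, y) equals t**k.
    return f(t * x, t * y) / f(x, y)

# Ratio is (up to rounding) 3**1 for the degree-1 function, 3**0 for the degree-0 one.
ratio_deg1 = scaling_ratio(f_deg1, 1.0, 2.0, 3.0)
ratio_deg0 = scaling_ratio(f_deg0, 1.0, 2.0, 3.0)

# Degree 1: values shrink as we approach the origin along the diagonal.
shrinking = [f_deg1(s, s) for s in (1e-1, 1e-3, 1e-6)]

# Degree 0: the value depends only on the direction of approach,
# so two different directions give two different "limits".
along_axis = f_deg0(1e-6, 0.0)
along_diagonal = f_deg0(1e-6, 1e-6)

print(ratio_deg1, ratio_deg0, shrinking, along_axis, along_diagonal)
```

The degree-0 function takes the value 0 along the axis and 1/2 along the diagonal no matter how close we get to the origin, which is why no limit exists there.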

Partial Derivatives

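Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, $f: \mathcal{O} \to \mathbb{R}$.

For $1 \leq i \leq n$, we define the partial derivative of $f$ with respect to $x_i$ at $x \in \mathcal{O}$ as

$$\frac{\partial f}{\partial x_i}(x) = \lim_{t \to 0} \frac{f(x + t e_i) - f(x)}{t}$$

if the latter limit exists. Note that $e_i$ is the $i$-th standard basis vector, $e_i = (0, \dots, 0, 1, 0, \dots, 0)$, with the $1$ in the $i$-th position.

Keep all variables constant except $x_i$, and take the ordinary single-variable derivative in $x_i$!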

Example: Partial Derivatives and Continuity

Let ,

We noticed that is not continuous at , and exist (by the quotient rule) at all .

Then, does exist at ?

Yes! It exists and is equal to 0.

Similarly, we can find .

This is very interesting! Even though is not continuous at , our partial derivatives still exist! This goes against our understanding of differentiability and continuity in the single variable case.

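Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, $f: \mathcal{O} \to \mathbb{R}$. Then, we say $f$ has first-order partial derivatives if for all $1 \leq i \leq n$, the function $f$ has a partial derivative with respect to its $i$-th component, at every point in its domain.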

Partial Derivatives Need Not Imply Continuity

In the single-variable case, a function with a derivative was continuous. However, the mere existence of partial derivatives no longer guarantees this in multiple variables!

A function with first-order partial derivatives need not be continuous. Consider the following example.

Example: Partial Derivatives Need Not Imply Continuity


We show that the partial derivatives of the function exist at . For all ,


However, this function is not continuous! For sequence , for all , but !
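To see this phenomenon concretely, here is a small numerical sketch. The function `f` below is a standard textbook stand-in and is an assumption on my part (it may or may not be the function used in the example above): $f(x, y) = \frac{xy}{x^2 + y^2}$ away from the origin, with $f(0, 0) = 0$.

```python
def f(x, y):
    # Hypothetical stand-in: xy / (x^2 + y^2), defined to be 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_x_quotient(t):
    # Difference quotient (f(t, 0) - f(0, 0)) / t for the partial in x at the origin.
    return (f(t, 0.0) - f(0.0, 0.0)) / t

def partial_y_quotient(t):
    # Difference quotient (f(0, t) - f(0, 0)) / t for the partial in y at the origin.
    return (f(0.0, t) - f(0.0, 0.0)) / t

# Both difference quotients are identically 0, so both partials exist and equal 0.
x_quotients = [partial_x_quotient(10.0 ** -k) for k in range(1, 6)]
y_quotients = [partial_y_quotient(10.0 ** -k) for k in range(1, 6)]

# Along the sequence (1/k, 1/k) -> (0, 0), the function stays at 1/2, not f(0, 0) = 0.
diagonal_values = [f(1.0 / k, 1.0 / k) for k in range(1, 6)]

print(x_quotients, y_quotients, diagonal_values)
```

The partials only look at the function along the coordinate axes, where it is identically 0, so they cannot detect the bad behavior along the diagonal.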

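When all partials exist and are continuous, however, the single-variable result carries over, and the function is continuous.

We say that $f: \mathcal{O} \to \mathbb{R}$ is continuously differentiable, written $f \in C^1(\mathcal{O})$, if it has first-order partial derivatives such that each partial derivative $\frac{\partial f}{\partial x_i}: \mathcal{O} \to \mathbb{R}$ is continuous for $1 \leq i \leq n$.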

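Let’s now consider second-order partial derivatives, denoted like

$$\frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial}{\partial x_j} \left( \frac{\partial f}{\partial x_i} \right),$$

where we apply the partial derivative with respect to $x_i$ first, then $x_j$ after.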

Order matters! There are some functions where swapping the order of derivatives changes the result.

  • We say $f$ has second-order partial derivatives if it has first-order partials, such that for $1 \leq i \leq n$, each $\frac{\partial f}{\partial x_i}$ also has first-order partial derivatives (of every variable).
  • We say $f$ has continuous second-order partial derivatives if it has second-order partial derivatives, and each is continuous.

Theorem: Partial Derivative Order

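Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and let $f: \mathcal{O} \to \mathbb{R}$ have continuous second-order partial derivatives. Then, for any two indices $1 \leq i, j \leq n$, and any $x \in \mathcal{O}$,

$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x).$$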
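The continuity hypothesis really is needed. As a hedged illustration, the sketch below uses a classical counterexample (an assumption, it does not come from these notes): $f(x, y) = \frac{xy(x^2 - y^2)}{x^2 + y^2}$ with $f(0, 0) = 0$, whose second-order partials exist but are not continuous at the origin. The step sizes `INNER` and `OUTER` are arbitrary choices for the finite differences.

```python
def f(x, y):
    # Classical counterexample (assumed here, not taken from the notes).
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

INNER = 1e-6  # step for the first (inner) partial derivative
OUTER = 1e-2  # step for the second (outer) partial derivative

def fx(x, y):
    # Central difference approximation of the partial derivative in x.
    return (f(x + INNER, y) - f(x - INNER, y)) / (2 * INNER)

def fy(x, y):
    # Central difference approximation of the partial derivative in y.
    return (f(x, y + INNER) - f(x, y - INNER)) / (2 * INNER)

# Differentiate in x first, then in y, at the origin.
x_then_y = (fx(0.0, OUTER) - fx(0.0, -OUTER)) / (2 * OUTER)

# Differentiate in y first, then in x, at the origin.
y_then_x = (fy(OUTER, 0.0) - fy(-OUTER, 0.0)) / (2 * OUTER)

print(x_then_y, y_then_x)
```

The two orders give values near $-1$ and $+1$ respectively, so swapping the order changes the result. This does not contradict the theorem, since this function's second-order partials are not continuous at the origin.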

Directional Derivatives and MVT

Recall that in the single variable case, we had the Mean Value Theorem.

Theorem: Mean Value Theorem

Let $f: [a, b] \to \mathbb{R}$ be continuous, and differentiable on $(a, b)$. Then, $\exists\, c \in (a, b)$ such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

This is a really useful theorem! In this section, we generalize it to multiple variables.

This generalization requires we use the single-variable MVT!

Lemma: Mean Value Lemma

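Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and let $1 \leq i \leq n$. Let $f: \mathcal{O} \to \mathbb{R}$ have a partial derivative with respect to $x_i$ for all $x \in \mathcal{O}$.

Let $x \in \mathcal{O}$, and $t$ be a real number such that the segment between $x$ and $x + t e_i$ lies in $\mathcal{O}$. Then, $\exists\, \theta \in (0, 1)$ such that

$$f(x + t e_i) - f(x) = \frac{\partial f}{\partial x_i}(x + \theta t e_i)\, t.$$

Intuition: If we view our function along an axis, we get a function of one variable. On this, we can apply the single-variable MVT!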

We use this Lemma to prove the following.

Proposition: Mean Value Proposition

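Let $f: \mathcal{O} \to \mathbb{R}$ be a function, with $\mathcal{O} \subseteq \mathbb{R}^n$ open. Assume all partials $\frac{\partial f}{\partial x_i}(x)$ exist $\forall x \in \mathcal{O}$, $1 \leq i \leq n$.

Choose an $x \in \mathcal{O}$, and an offset $h \neq 0$ small enough that a ball around $x$ of radius larger than $\|h\|$ lies in $\mathcal{O}$. Then, there exist points $z_1, \dots, z_n$ in the ball around $x$ of radius $\|h\|$ ($\|z_i - x\| < \|h\|$) such that

$$f(x + h) - f(x) = \sum_{i=1}^{n} h_i \frac{\partial f}{\partial x_i}(z_i).$$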

Recall that in our definitions of partial derivatives, we differentiate a function with respect to one of the axes

But what if we wanted to differentiate in a direction that isn’t aligned with the axes? This is where directional derivatives come in!

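Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and consider the function $f: \mathcal{O} \to \mathbb{R}$. For a point $x \in \mathcal{O}$, and direction $p \in \mathbb{R}^n$, we define the directional derivative as

$$\frac{\partial f}{\partial p}(x) = \lim_{t \to 0} \frac{f(x + tp) - f(x)}{t},$$

if the limit exists.

Now let’s define the gradient of the function, $\nabla f(x)$, as the row vector

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right).$$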

In some cases, we can calculate the directional derivative using the gradient, which can be a lot easier than taking a limit!

Theorem: Directional Derivative Theorem

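Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and let $f: \mathcal{O} \to \mathbb{R}$ be $C^1$.

Then, $\forall x \in \mathcal{O}$, and all directions $p \in \mathbb{R}^n$, the function has a directional derivative at $x$ in the direction $p$, which can be calculated as

$$\frac{\partial f}{\partial p}(x) = \langle \nabla f(x), p \rangle.$$

In other words, the inner product of $p$ with the gradient of the function!

Note that sometimes the directional derivative may be denoted as $D_p f(x)$.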
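As a quick sanity check of the theorem, we can compare the limit definition with the gradient formula. This is only a sketch: the function `f`, the point, and the direction `p` are hypothetical choices, and the gradient is computed by hand.

```python
def f(x, y):
    # Hypothetical smooth test function.
    return x * x * y + y ** 3

def grad_f(x, y):
    # Gradient computed by hand: (2xy, x^2 + 3y^2).
    return (2 * x * y, x * x + 3 * y * y)

def directional_quotient(x, y, p, t):
    # Difference quotient (f(x + t p) - f(x)) / t from the definition.
    return (f(x + t * p[0], y + t * p[1]) - f(x, y)) / t

def directional_via_gradient(x, y, p):
    # Inner product of the gradient with the direction p.
    g = grad_f(x, y)
    return g[0] * p[0] + g[1] * p[1]

point = (1.0, 2.0)
p = (3.0, -1.0)

from_limit = directional_quotient(point[0], point[1], p, 1e-6)
from_gradient = directional_via_gradient(point[0], point[1], p)

print(from_limit, from_gradient)
```

Here the gradient at $(1, 2)$ is $(4, 13)$, so the inner product with $p = (3, -1)$ is exactly $-1$, and the difference quotient with a small $t$ lands very close to it.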

Theorem: The Mean Value Theorem (Multi-Variable)

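Let $f: \mathcal{O} \to \mathbb{R}$ be continuously differentiable. Also let $x \in \mathcal{O}$ and $h \in \mathbb{R}^n$, where $x + h \in \mathcal{O}$.

If the segment joining $x$ and $x + h$ lies in $\mathcal{O}$, then there exists $\theta \in (0, 1)$ such that

$$f(x + h) - f(x) = \langle \nabla f(x + \theta h), h \rangle.$$

This is the Mean Value Proposition, with the additional assertion that the intermediate points $z_1, \dots, z_n$ can be taken to be the same point.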

We can also use directional derivatives to make a few extra inferences.

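Note that if $p$ is a vector of norm 1, we can interpret the directional derivative as the rate of change in a particular direction!

Theorem: Fastest Rate of Change

Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, $f \in C^1(\mathcal{O})$. Fix $x \in \mathcal{O}$, and assume $\nabla f(x) \neq 0$. Then, the maximum of the directional derivative at $x$ over unit directions is given as

$$\max_{\|p\| = 1} \frac{\partial f}{\partial p}(x) = \|\nabla f(x)\|,$$

and is attained for

$$p = \frac{\nabla f(x)}{\|\nabla f(x)\|}.$$

In other words, the direction of the gradient.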
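We can also check this by brute force in two dimensions. In the sketch below, the gradient comes from the hypothetical function $f(x, y) = x^2 y + y^3$ (an assumption for illustration), and we simply search over many unit directions to see which one maximizes the directional derivative.

```python
import math

def grad_f(x, y):
    # Gradient of the hypothetical function f(x, y) = x^2 y + y^3.
    return (2 * x * y, x * x + 3 * y * y)

def best_direction_by_search(g, samples=3600):
    # Brute-force search over unit vectors p = (cos a, sin a),
    # maximizing the directional derivative <g, p>.
    best_value, best_p = -math.inf, None
    for i in range(samples):
        a = 2 * math.pi * i / samples
        p = (math.cos(a), math.sin(a))
        value = g[0] * p[0] + g[1] * p[1]
        if value > best_value:
            best_value, best_p = value, p
    return best_value, best_p

g = grad_f(1.0, 2.0)
norm_g = math.hypot(g[0], g[1])

searched_value, searched_p = best_direction_by_search(g)
predicted_p = (g[0] / norm_g, g[1] / norm_g)

print(searched_value, norm_g, searched_p, predicted_p)
```

Up to the resolution of the search grid, the best direction found matches the normalized gradient, and the best value matches the norm of the gradient.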

Furthermore, we can use directional derivatives to prove a notion of continuity on multiple variables.

Theorem: Partial Derivatives and Continuity

Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and assume $f: \mathcal{O} \to \mathbb{R}$ is continuously differentiable. Then, $f$ is continuous.

Recall that if $f$ is $C^1$, then all partials exist and are continuous.

By this proof, in fact, if all the partials exist and are bounded, then $f$ is still continuous!

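We end with a small remark that will segue into the next section. Let $f \in C^1(\mathcal{O})$ and $x \in \mathcal{O}$. Then,

$$\lim_{h \to 0} \frac{f(x + h) - f(x) - \langle \nabla f(x), h \rangle}{\|h\|} = 0.$$

This can be proven by using Cauchy-Schwarz.

We use this to define differentiable functions! $f$ is differentiable at $x$ if $\exists\, a \in \mathbb{R}^n$ such that

$$\frac{f(x + h) - f(x) - \langle a, h \rangle}{\|h\|} \to 0$$

as $h \to 0$.

This is a stronger notion than partial differentiation! So,

  • $f \in C^1$ implies that $f$ is differentiable
  • $f$ differentiable implies that all partials of $f$ exist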

But, the converses are not true!

Local Approximation of Real-Valued Functions

First Order Approximations


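Say we have some function $f$, and we want to analyze the behavior of it in an area around the point $x$. One way to do this is to choose another function $g$ that approximates $f$, yet is simpler! We can then work with $g$ to see what properties it has (and shares with $f$).

Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and $x \in \mathcal{O}$. For a positive integer $k$, we say that functions $f, g: \mathcal{O} \to \mathbb{R}$ are $k$-th order approximations of one another at $x$ if

$$\lim_{h \to 0} \frac{f(x + h) - g(x + h)}{\|h\|^k} = 0.$$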

We ask, can we find a first-order approximation for a given function ?

Theorem: First Order Approximation Theorem

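Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, $f: \mathcal{O} \to \mathbb{R}$ be $C^1$. Then, for $x \in \mathcal{O}$, we have the first-order approximation

$$\lim_{h \to 0} \frac{f(x + h) - [f(x) + \langle \nabla f(x), h \rangle]}{\|h\|} = 0.$$

We can alternatively write this in a few ways.

  • Let $R(x, h)$ denote some error depending on $x$ and $h$. Then, our approximation can be given as $f(x + h) = f(x) + \langle \nabla f(x), h \rangle + R(x, h)$.
    As the error drops to 0 when dividing by $\|h\|$, we can also say that the error is of first order, $R(x, h) = o(\|h\|)$.
  • Letting $y = x + h$, $x$ fixed, we can also write our approximation as $f(y) = f(x) + \langle \nabla f(x), y - x \rangle + o(\|y - x\|)$.
    So if $x$ is fixed, and $y$ is sufficiently close to $x$, then we have a close approximation!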

We can also interpret this formula geometrically. In fact, interestingly enough, our first order approximation is equivalent to a tangent plane approximation of our function!
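To get a feel for what "first-order approximation" means, the sketch below measures the approximation error divided by $\|h\|$ as $h$ shrinks. The function $f(x, y) = e^x \sin y$ and the base point are hypothetical choices made only for illustration.

```python
import math

def f(x, y):
    # Hypothetical C^1 function.
    return math.exp(x) * math.sin(y)

def grad_f(x, y):
    # Gradient computed by hand.
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y))

def relative_error(x, y, h):
    # |f(x + h) - [f(x) + <grad f(x), h>]| / ||h||
    g = grad_f(x, y)
    linear = f(x, y) + g[0] * h[0] + g[1] * h[1]
    return abs(f(x + h[0], y + h[1]) - linear) / math.hypot(h[0], h[1])

# Shrink h = (s, s) toward 0 and watch the ratio go to 0.
ratios = [relative_error(0.0, 1.0, (s, s)) for s in (1e-1, 1e-2, 1e-3, 1e-4)]

print(ratios)
```

Each time $h$ shrinks by a factor of 10, the ratio shrinks by roughly a factor of 10 as well, which is consistent with the error itself being of size about $\|h\|^2$.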

Second Order Approximations and Second Derivatives


In the single-variable case, we had the second-derivative test for determining minimums and maximums. Here, we develop the corresponding test for multiple variables.

Definitions and Context

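Let $A$ be an $m \times n$ matrix. Note that for any vector $x \in \mathbb{R}^n$, the $i$-th entry of the matrix-vector product

$$Ax$$

is the inner product of the $i$-th row of $A$ and $x$! If $A_i$ denotes the $i$-th row of $A$, then

$$(Ax)_i = \langle A_i, x \rangle, \quad 1 \leq i \leq m.$$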

This fact will be useful later!

Let $A$ be an $n \times n$ matrix. Then, the function $Q: \mathbb{R}^n \to \mathbb{R}$ given by

$$Q(x) = \langle Ax, x \rangle$$

is known as the quadratic function associated with the matrix $A$.

This function gives us a clean notation for generalizing directional derivatives into higher orders!

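Let $f: \mathcal{O} \to \mathbb{R}$ have second-order partial derivatives. We define the Hessian Matrix of $f$ at $x$, denoted $\nabla^2 f(x)$, as the $n \times n$ matrix where for each pair of indices $1 \leq i, j \leq n$,

$$(\nabla^2 f(x))_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x).$$

In other words,

$$\nabla^2 f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n}(x) \end{pmatrix}.$$

Note that if $f$ has continuous second-order partials, then the Hessian Matrix is symmetric because the $(i, j)$ entry would equal the $(j, i)$ entry!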

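We use the quadratic function notation to define higher order directional derivatives. If $f \in C^1$, with $x$ and $h$ fixed, then

$$\frac{d}{dt} f(x + th) = \langle \nabla f(x + th), h \rangle. \qquad (1)$$

Notice the pattern!

If $f \in C^2$, then

$$\frac{d^2}{dt^2} f(x + th) = \langle \nabla^2 f(x + th)\, h, h \rangle. \qquad (2)$$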

In the above formulas, (2) will be quite useful in establishing a second-derivative criterion for the multi-variable case. However, we will also need some way to estimate the sizes of the values that quadratic functions can take on! These tools are given as follows.

Let $A$ be an $m \times n$ matrix, $A = (a_{ij})$. The Hilbert-Schmidt norm of $A$ is given as

$$\|A\| = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}.$$

We think of the matrix as a long vector, and take the vector norm.

With this norm for a matrix, we can generalize the Cauchy-Schwarz Inequality!

Theorem: Generalized Cauchy Schwarz Inequality

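Let $A$ be $m \times n$, and $x \in \mathbb{R}^n$. Then,

$$\|Ax\| \leq \|A\| \, \|x\|.$$

We can also define the operator norm of $A$ as

$$\|A\|_{op} = \max_{\|x\| = 1} \|Ax\|.$$

Based on this, and the Generalized Cauchy-Schwarz Inequality, we can find that for an $n \times n$ matrix $A$ and $x \in \mathbb{R}^n$,

$$|\langle Ax, x \rangle| \leq \|A\| \, \|x\|^2.$$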

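Let $A$ be an $n \times n$ matrix. $A$ is positive definite if

$$\langle Ax, x \rangle > 0 \quad \text{for all } x \neq 0.$$

Similarly, $A$ is negative definite if

$$\langle Ax, x \rangle < 0 \quad \text{for all } x \neq 0.$$

Proposition: Properties of Positive Definite Matrices

Let $A$ be a positive definite $n \times n$ matrix. Then, there exists a $c > 0$ such that

$$\langle Ax, x \rangle \geq c \|x\|^2$$

for all $x \in \mathbb{R}^n$.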

Second Order Approximation and Second Derivative Test

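Let $\mathcal{O} \subseteq \mathbb{R}^n$, $f: \mathcal{O} \to \mathbb{R}$. Also, let $x^* \in \mathcal{O}$. Then, we have the following definitions:

  • $x^*$ is a local minimizer if there exists a $\delta > 0$ such that $f(x) \geq f(x^*)$ for all $x \in \mathcal{O}$ with $\|x - x^*\| < \delta$.
  • $x^*$ is a local maximizer if there exists a $\delta > 0$ such that $f(x) \leq f(x^*)$ for all $x \in \mathcal{O}$ with $\|x - x^*\| < \delta$.
  • $x^*$ is a local extreme point if it is either a local minimizer or a local maximizer for $f$.

Note that $x^*$ is a strict minimizer / maximizer if the inequality is strict for all such $x \neq x^*$.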

In the single-variable case, we found that for a local extremum to occur, the derivative must be 0. We define the analogous case for multiple variables.

Theorem: Necessity for Local Extremum

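Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and let $f: \mathcal{O} \to \mathbb{R}$ have first-order partial derivatives. If $x^* \in \mathcal{O}$ is a local extreme point for $f$, then

$$\nabla f(x^*) = 0.$$

But unlike the single-variable case, finding the $x$’s such that this holds can be difficult, as we get a system of $n$ equations! Moreover, a point where the gradient vanishes need not be an extreme point, so we need a way to classify such points. We define a test analogous to the single-variable Second-Derivative Test to help us with this.

By the Lagrange Remainder Theorem, recall that if $f: I \to \mathbb{R}$ is defined on an open interval $I$, and $f''(x)$ exists for every $x \in I$, then for all $x, x + h \in I$, there exists a $\theta \in (0, 1)$ such that

$$f(x + h) = f(x) + f'(x) h + \frac{1}{2} f''(x + \theta h) h^2.$$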

We can generalize this to the multi-variable case!

Theorem: Multi-Variable Remainder Theorem

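Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, $f \in C^2(\mathcal{O})$. Then, for $x, x + h \in \mathcal{O}$ such that the segment joining them lies in $\mathcal{O}$, there exists $\theta \in (0, 1)$ such that

$$f(x + h) = f(x) + \langle \nabla f(x), h \rangle + \frac{1}{2} \langle \nabla^2 f(x + \theta h)\, h, h \rangle.$$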

This is in fact a second order approximation of !

Theorem: Second Order Approximation Theorem

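Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, $f \in C^2(\mathcal{O})$. Then, for $x \in \mathcal{O}$,

$$\lim_{h \to 0} \frac{f(x + h) - \left[ f(x) + \langle \nabla f(x), h \rangle + \frac{1}{2} \langle \nabla^2 f(x)\, h, h \rangle \right]}{\|h\|^2} = 0.$$

With the Second-Order Approximation Theorem, we can define a multi-variable analogue to the Second-Derivative test.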

Theorem: Second Derivative Test

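Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, $f \in C^2(\mathcal{O})$.

Let $x^* \in \mathcal{O}$ be a point such that $\nabla f(x^*) = 0$.

  • If the Hessian Matrix $\nabla^2 f(x^*)$ is positive definite, then $x^*$ is a strict local minimizer.

  • If the Hessian Matrix $\nabla^2 f(x^*)$ is negative definite, then $x^*$ is a strict local maximizer.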
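In two variables the test is easy to automate. The sketch below is a minimal illustration, assuming a symmetric $2 \times 2$ Hessian at a critical point; it relies on the standard fact (not stated in these notes) that a symmetric $2 \times 2$ matrix $\begin{pmatrix} a & b \\ b & d \end{pmatrix}$ is positive definite exactly when $a > 0$ and $ad - b^2 > 0$. The function name `classify_2x2` and the sample Hessians are hypothetical.

```python
def classify_2x2(hessian):
    # hessian is a symmetric 2x2 matrix [[a, b], [b, d]] evaluated at a
    # critical point (a point where the gradient is 0).
    (a, b), (_, d) = hessian
    det = a * d - b * b
    if det > 0 and a > 0:
        return "strict local minimizer"      # positive definite
    if det > 0 and a < 0:
        return "strict local maximizer"      # negative definite
    if det < 0:
        return "not a local extreme point"   # indefinite Hessian
    return "inconclusive"                    # semi-definite: test says nothing

# Hypothetical examples, all with a critical point at the origin:
bowl = classify_2x2([[2.0, 1.0], [1.0, 2.0]])      # f = x^2 + xy + y^2
dome = classify_2x2([[-2.0, -1.0], [-1.0, -2.0]])  # f = -(x^2 + xy + y^2)
saddle = classify_2x2([[2.0, 0.0], [0.0, -2.0]])   # f = x^2 - y^2
flat = classify_2x2([[0.0, 0.0], [0.0, 0.0]])      # f = x^4 + y^4

print(bowl, dome, saddle, flat)
```

The last case shows the limits of the test: $x^4 + y^4$ does have a strict minimizer at the origin, but its Hessian there is the zero matrix, so the test cannot detect it.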

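Is the converse of this theorem also true? In other words, let $f \in C^2$. Assume $x^*$ is a local minimizer. Then $\nabla f(x^*) = 0$. But what about $\nabla^2 f(x^*)$? Does it have to be positive definite?

No! As a counterexample, let $f(x, y) = x^4 + y^4$. Then, we have a strict local minimizer at $(0, 0)$, but the Hessian $\nabla^2 f(0, 0)$ is the zero matrix, which is not positive definite.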

But then, what’s a necessary condition for a minimizer? We discuss this below.

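Let $A$ be a symmetric $n \times n$ matrix.

  • $A$ is positive semi-definite if $\langle Ax, x \rangle \geq 0$ for all $x \in \mathbb{R}^n$.
  • $A$ is negative semi-definite if $\langle Ax, x \rangle \leq 0$ for all $x \in \mathbb{R}^n$.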

Note that the inner product can now be 0, where it couldn’t be before!

Theorem: Necessity Condition for Extremum

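If $f \in C^2$ has a minimizer at $x^*$, then $\langle \nabla^2 f(x^*)\, h, h \rangle \geq 0$, $\forall h \in \mathbb{R}^n$ (can be 0). In other words, $\nabla^2 f(x^*)$ has to be positive semi-definite.

In particular, if $f \in C^2$ has a local minimum at $x^*$, then all

$$\frac{\partial^2 f}{\partial x_i^2}(x^*) \geq 0, \quad 1 \leq i \leq n.$$

And similarly, if $f$ has a local maximum at $x^*$, then all

$$\frac{\partial^2 f}{\partial x_i^2}(x^*) \leq 0, \quad 1 \leq i \leq n.$$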


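Let $\mathcal{O}$ be open in $\mathbb{R}^n$, $f \in C^2(\mathcal{O})$. Assume the Laplacian of $f$ at $x$, $\Delta f(x) = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}(x)$, is positive for all $x \in \mathcal{O}$.

Then, $f$ has no maximizer in $\mathcal{O}$.


Let $\mathcal{O}$ be a bounded open set, with closure $\overline{\mathcal{O}} = \mathcal{O} \cup \partial \mathcal{O}$ (giving us a sequentially compact set).

Let $f \in C^2(\mathcal{O})$, continuous on $\overline{\mathcal{O}}$, satisfy $\Delta f(x) > 0$ for all $x \in \mathcal{O}$. Then,

$$\max_{x \in \overline{\mathcal{O}}} f(x) = \max_{x \in \partial \mathcal{O}} f(x).$$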


In other words, the maximum always occurs at the boundary.

Higher Order Approximations

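Let $h \in \mathbb{R}^n$, and let there be a multi-index $\alpha = (\alpha_1, \dots, \alpha_n)$ where each $\alpha_i$ is a non-negative integer.

The multi-index will be used to “select” things we want later on!

With the multi-index, we define operations

$$|\alpha| = \alpha_1 + \dots + \alpha_n, \qquad \alpha! = \alpha_1! \cdots \alpha_n!, \qquad h^\alpha = h_1^{\alpha_1} \cdots h_n^{\alpha_n}, \qquad D^\alpha f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}.$$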

Proposition: Multinomial Formula

Let . Look at

We know that

The one dimensional Taylor expansion!

Express in terms of partials of , .

Let . Then, we have

We have proven the following.

Theorem: Higher Order Approximations

Let . Then,

For some

Notice the similarity with .

For , which is the 1-dimensional approximation formula!

Linear Map Approximations of Non-Linear Mappings


Before, we studied linear mappings, or in other words, functions that can be expressed as linear transformations. Now, we turn to examine mappings that may not necessarily be linear!

Linear Mappings

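We say a function $T: \mathbb{R}^n \to \mathbb{R}^m$ is linear if for all $x, y \in \mathbb{R}^n$ and $\alpha, \beta \in \mathbb{R}$,

$$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y).$$

Theorem: Linear Mappings as Matrices

If $T: \mathbb{R}^n \to \mathbb{R}^m$ is linear, then there exists a unique $m \times n$ matrix $A$ such that

$$T(x) = Ax \quad \text{for all } x \in \mathbb{R}^n.$$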

As given above, linear transformations can be given as their matrices, and in fact, many of their properties can be expressed in terms of matrices as well!

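First, we consider compositions of transformations. Let us have linear transformations $T: \mathbb{R}^n \to \mathbb{R}^m$, $S: \mathbb{R}^m \to \mathbb{R}^k$. Let $A, B$ be matrices such that

$$T(x) = Ax, \qquad S(y) = By.$$

Then, the matrix of the composition $S \circ T$ of these transformations is

$$BA,$$

which is the product of the matrices!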
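Here is a tiny check of this fact with concrete numbers. The matrices `A` and `B` and the vector `x` are made-up examples, and the helper names `mat_vec` and `mat_mul` are hypothetical.

```python
def mat_vec(m, v):
    # Matrix-vector product: entry i is the inner product of row i with v.
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def mat_mul(b, a):
    # Matrix product: (BA)_{ij} = sum_k B_{ik} A_{kj}.
    return [[sum(b[i][k] * a[k][j] for k in range(len(a)))
             for j in range(len(a[0]))] for i in range(len(b))]

A = [[1, 2], [0, 1], [3, -1]]  # matrix of a linear map from R^2 to R^3
B = [[1, 0, 2], [-1, 1, 0]]    # matrix of a linear map from R^3 to R^2
x = [2, 5]

composed = mat_vec(B, mat_vec(A, x))  # apply the first map, then the second
product = mat_vec(mat_mul(B, A), x)   # apply the single matrix BA

print(composed, product)
```

Both computations give the same vector, as expected. Note the order: the map applied first appears on the right in the product.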

Let’s now consider inverses of transformations.

Theorem: Invertible Transformations

$T: \mathbb{R}^n \to \mathbb{R}^n$ linear, is invertible (as a function) if and only if the corresponding matrix $A$ is invertible as a matrix if and only if $\det A \neq 0$.

We commonly determine that transformations are invertible by checking the matrices!

Theorem: Properties of Invertible Matrices

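Let $A$ be an $n \times n$ matrix. Then, $A$ is invertible if and only if $\exists\, c > 0$ such that

$$\|Ax\| \geq c \|x\| \quad \text{for all } x \in \mathbb{R}^n.$$

By definition, $A$ is invertible if there exists a matrix $B$ such that $AB = BA = I$.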

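Suppose $V$ is a vector space with bases $\{v_1, \dots, v_n\}$, and also $\{w_1, \dots, w_n\}$, and the bases are related by

$$w_j = \sum_{i=1}^{n} P_{ij} v_i.$$

If the matrix of $T$ with respect to $\{v_i\}$ is $A$, and with respect to $\{w_i\}$ is $B$, then $B = P^{-1} A P$.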

The Derivative Matrix and Differential

We consider the following classes of mappings. These are mappings that may be non-linear, and can be approximated by linear mappings.

Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and consider a mapping $F: \mathcal{O} \to \mathbb{R}^m$ represented as component functions

$$F(x) = (F_1(x), \dots, F_m(x)).$$

We have the following definitions for $F$:

  1. $F$ is said to have first-order partial derivatives at $x$, provided that for all $1 \leq i \leq m$, $F_i$ has first-order partial derivatives at $x$.
  2. $F$ is said to have first-order partial derivatives, if it has first-order partial derivatives for all $x \in \mathcal{O}$.
  3. $F$ is said to be continuously differentiable provided that all of the $F_i$’s are continuously differentiable.

Theorem: Continuity on Mappings

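Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and $F: \mathcal{O} \to \mathbb{R}^m$.

Let $F$ be continuously differentiable. Then, $F$ is continuous.

Now, define $F: \mathcal{O} \to \mathbb{R}^m$ with first-order partials at $x$. We define the derivative matrix of $F$ at $x$, denoted $DF(x)$, as the $m \times n$ matrix whose $(i, j)$-th entry is given by

$$(DF(x))_{ij} = \frac{\partial F_i}{\partial x_j}(x).$$

Since we define the gradient of a function as a row vector, the $i$-th row of $DF(x)$ is $\nabla F_i(x)$.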

We use this derivative matrix to generalize our findings in earlier sections.

Theorem: Mean Value Theorem for Mappings

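Let $F: \mathcal{O} \to \mathbb{R}^m$ be $C^1$. Then, for $x, x + h \in \mathcal{O}$ such that the segment joining them lies in $\mathcal{O}$, we find $\theta_1, \dots, \theta_m \in (0, 1)$ such that

$$F_i(x + h) - F_i(x) = \langle \nabla F_i(x + \theta_i h), h \rangle, \quad 1 \leq i \leq m.$$

This is the multi-variable MVT applied to each component!

Mean Value Theorem Misconception

Note that above, if we chose all $\theta_i$’s to be equal, then we would have

$$F(x + h) - F(x) = DF(x + \theta h)\, h,$$

which seems like a very clean generalization of the MVT! However, it is not guaranteed that we can find a single $\theta$ that works for each $F_i$.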

Theorem: First-Order Approximation Theorem for Mappings

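Let $F: \mathcal{O} \to \mathbb{R}^m$ be $C^1$, and $x \in \mathcal{O}$. Then,

$$\lim_{h \to 0} \frac{\|F(x + h) - [F(x) + DF(x)\, h]\|}{\|h\|} = 0.$$

It can be shown that at $x$, $DF(x)$ is the only matrix for which this limit holds.


Let $F: \mathcal{O} \to \mathbb{R}^m$. Fix $x \in \mathcal{O}$, and suppose there exists an $m \times n$ matrix $A$ such that

$$\lim_{h \to 0} \frac{\|F(x + h) - [F(x) + A h]\|}{\|h\|} = 0.$$

Then, the mapping $F$ has first-order partial derivatives at $x$, and $A = DF(x)$.

We also say $F$ is differentiable at $x$ if there exists an $m \times n$ matrix $A$ such that

$$\lim_{h \to 0} \frac{\|F(x + h) - [F(x) + A h]\|}{\|h\|} = 0.$$

So, $F \in C^1$ implies that $F$ is differentiable $\forall x \in \mathcal{O}$, which implies that $DF(x)$ exists $\forall x \in \mathcal{O}$.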

These are strict implications! See the examples below.

Example: Counterexamples

The below function is an example of for which exists , but there does not exist an such that

The Chain Rule

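From the single-variable case, recall that for differentiable $f, g: \mathbb{R} \to \mathbb{R}$, we can find the derivative of $f \circ g$ as

$$(f \circ g)'(x) = f'(g(x))\, g'(x).$$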

We can generalize this rule to higher dimensions!

Theorem: The Chain Rule

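Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and let $F: \mathcal{O} \to \mathbb{R}^m$ be continuously differentiable. Also let $U \subseteq \mathbb{R}^m$ be open to define $g: U \to \mathbb{R}$ continuously differentiable.

Suppose that $F(\mathcal{O}) \subseteq U$. Then, the composition $g \circ F: \mathcal{O} \to \mathbb{R}$ is also continuously differentiable, and for $x \in \mathcal{O}$ and $1 \leq i \leq n$, we can find its partial derivative as

$$\frac{\partial}{\partial x_i}(g \circ F)(x) = \sum_{j=1}^{m} \frac{\partial g}{\partial y_j}(F(x)) \frac{\partial F_j}{\partial x_i}(x).$$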

Theorem: The Chain Rule for General Mappings

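Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open. Let $F: \mathcal{O} \to \mathbb{R}^m$, and let $U \subseteq \mathbb{R}^m$ be open to define $G: U \to \mathbb{R}^k$. Let $F, G$ be continuously differentiable.

Suppose that $F(\mathcal{O}) \subseteq U$. Then, their composition $G \circ F$ is also continuously differentiable, and for each $x \in \mathcal{O}$, we can find

$$D(G \circ F)(x) = DG(F(x))\, DF(x).$$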
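The matrix form of the chain rule can be checked numerically. In the sketch below, the mappings `F` and `G` are hypothetical examples from the plane to the plane, their derivative matrices `DF` and `DG` are written out by hand, and the derivative matrix of the composition is estimated separately by central differences.

```python
def F(x, y):
    # Hypothetical inner mapping.
    return (x * y, x + y * y)

def G(u, v):
    # Hypothetical outer mapping.
    return (u + v * v, u * v)

def DF(x, y):
    # Derivative matrix of F, computed by hand.
    return [[y, x], [1.0, 2 * y]]

def DG(u, v):
    # Derivative matrix of G, computed by hand.
    return [[1.0, 2 * v], [v, u]]

def mat_mul(a, b):
    # Product of two 2x2 matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def composition(x, y):
    u, v = F(x, y)
    return G(u, v)

def numerical_jacobian(func, x, y, h=1e-6):
    # Central difference estimate of the 2x2 derivative matrix of func.
    xp, xm = func(x + h, y), func(x - h, y)
    yp, ym = func(x, y + h), func(x, y - h)
    return [[(xp[i] - xm[i]) / (2 * h), (yp[i] - ym[i]) / (2 * h)]
            for i in range(2)]

x0, y0 = 1.0, 2.0
u0, v0 = F(x0, y0)

by_chain_rule = mat_mul(DG(u0, v0), DF(x0, y0))
by_differences = numerical_jacobian(composition, x0, y0)

print(by_chain_rule, by_differences)
```

Note that the outer derivative matrix is evaluated at $F(x)$, not at $x$, which is the part that is easiest to get wrong when applying the formula by hand.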

The Inverse Function Theorem


The Inverse Function Theorem provides a sufficient condition for when a function is locally one-to-one and invertible, and tells us how to compute the derivative of the inverse. We generalize this to higher dimensions here!

Inverse Function Theorem: 1D, 2D

Recall the single-variable Inverse Function Theorem.

Theorem: Inverse Function Theorem (One Dimension)

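Let $f: I \to \mathbb{R}$ be continuously differentiable on an open interval $I$, and let $x_0 \in I$ such that $f'(x_0) \neq 0$.

Then, there is an open interval $J$ around $x_0$, and an open interval $K$ containing $f(x_0)$ such that the function

$$f: J \to K$$

is 1-1 and onto. Furthermore, in these intervals, the inverse function $f^{-1}: K \to J$ is continuously differentiable, with

$$(f^{-1})'(y) = \frac{1}{f'(f^{-1}(y))}$$

for all $y \in K$.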

This theorem is pretty important, as it tells us when we can find a function’s inverse! More importantly, it asserts that even if an entire function is not invertible, it may have smaller intervals where it is invertible.

We now generalize this theorem to higher dimensions!

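We say that an open subset of $\mathbb{R}^n$ containing $x$ is a neighborhood of the point $x$. Using this definition, we will now generalize the Inverse Function Theorem to 2 dimensions.

Theorem: Inverse Function Theorem (Two Dimensions)

Let $\mathcal{O} \subseteq \mathbb{R}^2$ be open, $F: \mathcal{O} \to \mathbb{R}^2$ be $C^1$. Suppose at $(x_0, y_0) \in \mathcal{O}$, $DF(x_0, y_0)$ is invertible.

Then, there exists a neighborhood $U$ around $(x_0, y_0)$, and a neighborhood $V$ around $F(x_0, y_0)$ such that

$$F: U \to V$$

is 1-1 and onto. Furthermore, $F^{-1}: V \to U$ is $C^1$, and for a point $(u, v) \in V$ where $(u, v) = F(x, y)$, we can find the derivative matrix of the inverse as

$$D(F^{-1})(u, v) = [DF(x, y)]^{-1}.$$

The inverse of the derivative matrix at $(x, y)$!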

Example: Inverse Function Theorem (2D)

We find that

So for all , our derivative matrix is non-zero! So, by the 2D Inverse Function Theorem, for any , there exists a neighborhood of , of such that

Is 1-1 and onto.

We ask, what happens at ? Does there exist a neighborhood of such that is 1-1 on ?

In fact, the answer is that this is impossible, because . So, for any neighborhood around , is not one-to-one.

Stability of Non-Linear Mappings

We will now introduce concepts necessary to generalize the inverse function theorem.


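For an $n \times n$ matrix $A$, the following are equivalent:

  • $A$ is invertible
  • $\exists\, c > 0$ such that $\|Ax\| \geq c\|x\|$, $\forall x \in \mathbb{R}^n$.

We say that a mapping $F: \mathcal{O} \to \mathbb{R}^n$, $\mathcal{O} \subseteq \mathbb{R}^n$ open, is stable if $\exists\, c > 0$ such that

$$\|F(u) - F(v)\| \geq c \|u - v\| \quad \text{for all } u, v \in \mathcal{O}.$$

$F$ stable implies that $F$ is 1-1.

As a brief proof, if $F(u) = F(v)$, then $0 = \|F(u) - F(v)\| \geq c\|u - v\|$, so $u = v$.

Note that $F$ is stable if and only if $F$ is 1-1 and its inverse $F^{-1}: F(\mathcal{O}) \to \mathcal{O}$ is Lipschitz (as the inequalities are flipped!)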

Interestingly, we find that matrices that are sufficiently close to an invertible matrix are also invertible. In other words, matrices that are close to 1-1 matrices are also 1-1!


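Let $A$ be an $n \times n$ matrix, and assume that $\exists\, c > 0$ such that

$$\|Ax\| \geq c \|x\| \quad \text{for all } x \in \mathbb{R}^n.$$

Now let $B$ be an $n \times n$ matrix such that $\|B - A\| \leq \frac{c}{2}$. Then, $\|Bx\| \geq \frac{c}{2} \|x\|$ for all $x \in \mathbb{R}^n$, since $\|Bx\| \geq \|Ax\| - \|(B - A)x\|$.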


This lets us prove the 1-1 condition on the General Inverse Function Theorem.

Theorem: Nonlinear Stability Theorem

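Let $F: \mathcal{O} \to \mathbb{R}^n$ be $C^1$, $\mathcal{O} \subseteq \mathbb{R}^n$ open. Assume that we have a point $x^* \in \mathcal{O}$ such that $DF(x^*)$ is invertible.

Then there exists a neighborhood $U$ of $x^*$ such that

  • $F$ is stable on $U$ (implying $F$ is 1-1 on $U$).

  • The derivative matrix $DF(x)$ of $F$ is invertible $\forall x \in U$.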

Minimization Principle, General Inverse Function Theorem

To prove the General Inverse Function Theorem, we will introduce an auxiliary function such that its minimizers are solutions of some given equation.

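Suppose we have a $C^1$ mapping $F: \mathcal{O} \to \mathbb{R}^n$, $\mathcal{O} \subseteq \mathbb{R}^n$ open, and $x^* \in \mathcal{O}$ where $DF(x^*)$ is invertible. Then, from the previous section, we can find a neighborhood $U$ of $x^*$ such that $DF(x)$ is invertible for $x \in U$, and $c > 0$ such that

$$\|F(u) - F(v)\| \geq c \|u - v\| \quad \text{for all } u, v \in U.$$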

Using this, we can show the following.

Proposition: A Minimization Principle

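Let $\mathcal{O}$ be open in $\mathbb{R}^n$, $F: \mathcal{O} \to \mathbb{R}^n$ be $C^1$, $y \in \mathbb{R}^n$. Assume that the derivative matrix of $F$ is invertible $\forall x \in \mathcal{O}$.

Let $E(x) = \|F(x) - y\|^2$, the squared distance between $F(x)$ and $y$. If $E$ has an (interior) minimizer at $x_0 \in \mathcal{O}$, then $F(x_0) = y$.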

Lemma: The Open-Image Lemma

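Suppose we have a $C^1$ mapping $F: \mathcal{O} \to \mathbb{R}^n$, $\mathcal{O} \subseteq \mathbb{R}^n$ open, and $x^* \in \mathcal{O}$ where $DF(x^*)$ is invertible. Then, for a sufficiently small neighborhood $U$ of $x^*$, the image $F(U)$ is open.

Suppose we have a $C^1$ mapping $F: \mathcal{O} \to \mathbb{R}^n$, $\mathcal{O} \subseteq \mathbb{R}^n$ open, and $x^* \in \mathcal{O}$ where $DF(x^*)$ is invertible. Using the previous proofs, we have found that there exists a neighborhood $U$ of $x^*$, and a neighborhood $V$ of $F(x^*)$, such that:

  • $DF(x)$ is invertible for all $x \in U$
  • $\exists\, c > 0$ such that $\|F(u) - F(v)\| \geq c\|u - v\|$ for all $u, v \in U$
  • $F(U) = V$.

By general properties of functions, $F^{-1}: V \to U$ is well defined. Finally, to show that the inverse is $C^1$, we will prove that

$$D(F^{-1})(y) = [DF(F^{-1}(y))]^{-1} \quad \text{for all } y \in V.$$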

This gives us the General Inverse Function Theorem.

Theorem: General Inverse Function Theorem

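Let $\mathcal{O} \subseteq \mathbb{R}^n$ be open, and let $F: \mathcal{O} \to \mathbb{R}^n$ be $C^1$. Now, let $DF(x^*)$ be invertible for some $x^* \in \mathcal{O}$.

Then, there is a neighborhood $U$ of $x^*$, and a neighborhood $V$ of $F(x^*)$, such that $F: U \to V$ is 1-1 and onto. Furthermore, $F^{-1}: V \to U$ is also $C^1$, and for $y \in V$ and $x \in U$ such that $y = F(x)$,

$$D(F^{-1})(y) = [DF(x)]^{-1}.$$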

We also give a second proof of the inverse function theorem based on the contraction mapping principle.

The Implicit Function Theorem

We now discuss the Implicit Function Theorem. This lets us create local descriptions of the set of points where a function is equal to 0, $\{x : f(x) = 0\}$, also known as a level-curve!

2D Case: Dini’s Theorem

Let us have a function $f: \mathbb{R}^2 \to \mathbb{R}$. We ask, when is the set

$$\{(x, y) \in \mathbb{R}^2 : f(x, y) = 0\}$$

a curve?


This will yield an empty set, so we don’t have a curve.

This will yield 1 point, so we don’t have a curve.

What does it actually mean for a set to be a curve?

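Intuitively, such a set is a curve if we can define a function to represent the points. More formally, we say $S$ is a curve if for all points $(x_0, y_0)$ in the set $S$, there is a neighborhood $U$ of $(x_0, y_0)$, and a function $g$ such that $S \cap U$ is the graph of $g$ (either $y = g(x)$ or $x = g(y)$).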

In other words, the points in the neighborhood can be represented by some localized output of a function!

Theorem: Dini's Theorem

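Let $\mathcal{O}$ be open in $\mathbb{R}^2$, $f: \mathcal{O} \to \mathbb{R}$ be $C^1$. Let $(x_0, y_0)$ be a point in $\mathcal{O}$, and assume $f(x_0, y_0) = 0$ and $\frac{\partial f}{\partial y}(x_0, y_0) \neq 0$.

Then, $\exists\, r > 0$, and a $C^1$ function $g: (x_0 - r, x_0 + r) \to \mathbb{R}$ such that $f(x, g(x)) = 0$ for all $|x - x_0| < r$, and if $|x - x_0| < r$, $|y - y_0| < r$,

and $f(x, y) = 0$, then $y = g(x)$.

In this box, the 0-set of $f$ is the graph of a function $y = g(x)$.

If we know $g$ is $C^1$, differentiate

$$f(x, g(x)) = 0$$

to get

$$\frac{\partial f}{\partial x}(x, g(x)) + \frac{\partial f}{\partial y}(x, g(x))\, g'(x) = 0.$$

We can solve for $g'(x)$ with this!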
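Solving the differentiated equation gives $g'(x) = -\frac{\partial f / \partial x}{\partial f / \partial y}$ evaluated at $(x, g(x))$. As a hedged illustration, the sketch below uses the unit circle, $f(x, y) = x^2 + y^2 - 1$ (a hypothetical example, chosen because here we can also solve for $g$ explicitly and compare).

```python
import math

def f(x, y):
    # Hypothetical example: the unit circle is the 0-set of f.
    return x * x + y * y - 1.0

def f_x(x, y):
    return 2 * x

def f_y(x, y):
    return 2 * y

def g(x):
    # Explicit solution of f(x, y) = 0 on the upper half of the circle.
    return math.sqrt(1.0 - x * x)

x0 = 0.6
y0 = g(x0)  # approximately 0.8, and f_y(x0, y0) is non-zero there

# Slope predicted by implicit differentiation: g'(x) = -f_x / f_y.
implicit_slope = -f_x(x0, y0) / f_y(x0, y0)

# Slope estimated directly from the explicit formula by a central difference.
h = 1e-6
numeric_slope = (g(x0 + h) - g(x0 - h)) / (2 * h)

print(implicit_slope, numeric_slope)
```

The nice part is that the implicit formula never needs the explicit expression for $g$: it only uses the partials of $f$ at the point. At $(\pm 1, 0)$ the partial in $y$ vanishes, and indeed the circle cannot be written as a graph $y = g(x)$ near those points.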

Implicit Function Theorem

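This can be generalized to higher dimensions!

Let $f: \mathbb{R}^n \to \mathbb{R}$ be $C^1$, and let $x^*$ be a point with

$$f(x^*) = 0.$$

And assume the gradient at this point is not 0, say $\frac{\partial f}{\partial x_n}(x^*) \neq 0$. Then, by the same proof, we can find $r > 0$ and a $C^1$ function $g$ of $(x_1, \dots, x_{n-1})$ such that $f(x_1, \dots, x_{n-1}, g(x_1, \dots, x_{n-1})) = 0$ near $x^*$!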

Theorem: The Implicit Function Theorem

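We look at points $(x, y) \in \mathbb{R}^{n+k}$, where $x \in \mathbb{R}^n$, $y \in \mathbb{R}^k$. Let $\mathcal{O}$ be open in $\mathbb{R}^{n+k}$, $F: \mathcal{O} \to \mathbb{R}^k$ be $C^1$.

Let $(x_0, y_0) \in \mathcal{O}$ such that $F(x_0, y_0) = 0$, $D_y F(x_0, y_0)$ invertible, where $D_y F$ denotes the $k \times k$ matrix of partial derivatives of $F$ with respect to the $y$ variables. Then, $\exists\, r > 0$, and a $C^1$ function

$$G: B_r(x_0) \to \mathbb{R}^k$$

such that

$$F(x, G(x)) = 0 \quad \text{for all } x \in B_r(x_0).$$

And if $\|x - x_0\| < r$, $\|y - y_0\| < r$, and $F(x, y) = 0$, then $y = G(x)$. Also, $DG(x)$ can be computed by the chain rule.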

Example: Implicit Function Theorem

Let . Assume and

Which of the following is true? , , such that

The second one! In the implicit function theorem, we need a such that is invertible. So, choose them to be , with free variable . Then, we can apply our implicit function theorem to get result (2).

We ask, is it possible for ? No. If the above holds, then by the chain rule, we find

And at ,

But this gives us , which is impossible!

Finally, we will show a formula for , . We use the property that . We know that starting with , we map

So, by chain rule,

We can use this to solve for !


Describe solutions to

On the LHS, we have .

We expect this to be a curve through . We will try to describe this curve locally. We find

Define . We need invertible, so we choose (as column and in will give us an invertible matrix).

We get , or in other words, , so our solutions look like .

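Surfaces and Paths in $\mathbb{R}^3$

Let $f: \mathbb{R}^3 \to \mathbb{R}$ be $C^1$. Look at the level set $S = \{x \in \mathbb{R}^3 : f(x) = 0\}$ of this function, the set of points where the function is 0.

Assume $\nabla f(x) \neq 0$ for all $x \in S$. Then $S$ is a surface.

Recall that for $S$ to be a surface, $\forall x \in S$, there exists a neighborhood $U$ of $x$ such that $S \cap U$ is the graph of a function.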

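Let $f, g: \mathbb{R}^3 \to \mathbb{R}$ be $C^1$. Define the intersection of the two functions’ level sets,

$$C = \{(x, y, z) : f(x, y, z) = 0,\ g(x, y, z) = 0\}.$$

Intuitively, we’re intersecting two 2-dimensional surfaces. So we should expect a 1-dimensional curve!

A sufficient condition for $C$ to be a 1-dimensional curve in $\mathbb{R}^3$ is

$$\nabla f(p) \times \nabla g(p) \neq 0 \quad \text{for all } p \in C.$$

Equivalently, let $F = (f, g): \mathbb{R}^3 \to \mathbb{R}^2$. Then, we require that

$$DF(p) = \begin{pmatrix} \nabla f(p) \\ \nabla g(p) \end{pmatrix}$$

has rank 2 for all $p \in C$.

If so, without loss of generality, the derivative matrix with respect to $(y, z)$ is invertible at $p = (x_0, y_0, z_0)$. By the implicit function theorem, $\exists\, r > 0$, and $C^1$ functions $\varphi, \psi: (x_0 - r, x_0 + r) \to \mathbb{R}$

such that $F(x, \varphi(x), \psi(x)) = 0$ for all $|x - x_0| < r$, and these are the only solutions in a neighborhood of $p$.

Thus, $C$ agrees with the graph near $p$, and we can parameterize it as

$$\gamma(x) = (x, \varphi(x), \psi(x)),$$

with tangent vector at $p$ given as $\gamma'(x_0) = (1, \varphi'(x_0), \psi'(x_0))$, and

$$\langle \nabla f(p), \gamma'(x_0) \rangle = 0, \qquad \langle \nabla g(p), \gamma'(x_0) \rangle = 0.$$

These are two normals to our curve!

So, $\gamma'(x_0)$ is a non-zero tangent vector to the curve, so $\exists\, \lambda \neq 0$ such that

$$\gamma'(x_0) = \lambda\, \nabla f(p) \times \nabla g(p).$$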

We generalize.

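We now consider an $(n-k)$-dimensional manifold embedded in $\mathbb{R}^n$, $k < n$. Let $F: \mathbb{R}^n \to \mathbb{R}^k$ be $C^1$, and assume that the $k \times n$ matrix $DF(x)$ has maximal rank $k$ if $F(x) = 0$.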

If so, we will represent the level set

Locally, as a graph.

Let , . Without loss of generality, (the rightmost entries) is invertible. Thus, and such that

And these are the only solutions if . Thus, agrees with the graph .

This is an $(n-k)$-dimensional manifold!

We need linearly independent tangent vectors at . The process of doing this is the same— fix variable, and differentiate with respect to our last variable. These are our tangent vectors!

The range of at is the tangent space above, as

So, the tangent space to at is the null space of .

We have , and an open set . We want to know if looks like a smooth surface.

Recall that we say that if

Then is a smooth surface at .

This is equivalent to saying the derivative matrix of has rank 2, as the first and second row are linearly independent!


In the general case, let ,

Assume that has rank . Denote . Without loss of generality, assume is invertible.

Then, neighborhood of and neighborhood of such that

For some .


We know that is invertible. By the inverse function theorem, neighborhood of and neighborhood of such that is one-to-one, onto, and has a inverse .

We compose

Lagrange Multipliers

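Case 1: Surfaces in $\mathbb{R}^3$

Let $\mathcal{O} \subseteq \mathbb{R}^3$ be open, and let $f, g: \mathcal{O} \to \mathbb{R}$ be $C^1$. Furthermore, define surface

$$S = \{x \in \mathcal{O} : g(x) = 0\},$$

where $\nabla g(x) \neq 0$ if $x \in S$ ($g(x) = 0$).

Let $x^* \in S$, and let $x^*$ be such that $f(x^*) \leq f(x)$ (or $f(x^*) \geq f(x)$), $\forall x \in S$. Then, $\exists\, \lambda \in \mathbb{R}$ such that

$$\nabla f(x^*) = \lambda \nabla g(x^*).$$

Note that this same argument works for the case of $\mathcal{O} \subseteq \mathbb{R}^n$,

$$S = \{x \in \mathcal{O} : g(x) = 0\},$$

assuming $\nabla g(x) \neq 0$ if $x \in S$ (then $S$ is an $(n-1)$-dimensional manifold in $\mathbb{R}^n$).

If $x^* \in S$ and is such that $f(x^*) \leq f(x)$ (or $f(x^*) \geq f(x)$) for all $x \in S$, then $\exists\, \lambda \in \mathbb{R}$ such that

$$\nabla f(x^*) = \lambda \nabla g(x^*).$$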

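Case 2: Curves in $\mathbb{R}^3$

Let $f, g_1, g_2: \mathbb{R}^3 \to \mathbb{R}$ be $C^1$. Define curve

$$C = \{x \in \mathbb{R}^3 : g_1(x) = 0,\ g_2(x) = 0\},$$

and assume

$$\nabla g_1(x) \times \nabla g_2(x) \neq 0$$

if $x \in C$ (then $C$ is a curve in $\mathbb{R}^3$).

Let $x^* \in C$. Let $x^*$ be such that $f(x^*) \leq f(x)$ (or $f(x^*) \geq f(x)$) for all $x \in C$. Then there exist $\lambda_1, \lambda_2 \in \mathbb{R}$ such that

$$\nabla f(x^*) = \lambda_1 \nabla g_1(x^*) + \lambda_2 \nabla g_2(x^*).$$

Let $A$ be an $n \times n$ symmetric real matrix. Let

$$Q(x) = \langle Ax, x \rangle, \qquad S = \{x \in \mathbb{R}^n : \|x\| = 1\}.$$

We look at the minimum of the quadratic function $Q$ in the compact set given by the unit sphere.

Let $x^*$ be a minimizer ($\|x^*\| = 1$). Then,

$$A x^* = \lambda x^* \quad \text{for some } \lambda \in \mathbb{R},$$

so $x^*$ is an eigenvector of $A$, with eigenvalue $\lambda = Q(x^*)$.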
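We can watch this happen for a small matrix. The sketch below takes a hypothetical symmetric $2 \times 2$ matrix, minimizes its quadratic function over the unit circle by brute-force search, and then checks that the minimizer (approximately) satisfies the eigenvector equation. The matrix and the number of samples are arbitrary choices for illustration.

```python
import math

A = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical symmetric matrix

def apply(m, v):
    # Matrix-vector product for a 2x2 matrix.
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def Q(v):
    # Quadratic function <Av, v> associated with A.
    av = apply(A, v)
    return av[0] * v[0] + av[1] * v[1]

def minimize_on_unit_circle(samples=20000):
    # Brute-force search over unit vectors (cos a, sin a).
    best_value, best_v = math.inf, None
    for i in range(samples):
        a = 2 * math.pi * i / samples
        v = (math.cos(a), math.sin(a))
        value = Q(v)
        if value < best_value:
            best_value, best_v = value, v
    return best_value, best_v

lam, x_star = minimize_on_unit_circle()

# Residual of the eigenvector equation A x* = lam x*.
ax = apply(A, x_star)
residual = math.hypot(ax[0] - lam * x_star[0], ax[1] - lam * x_star[1])

# Smallest eigenvalue of A from the characteristic polynomial t^2 - 5t + 5.
smallest_eigenvalue = (5.0 - math.sqrt(5.0)) / 2.0

print(lam, smallest_eigenvalue, residual)
```

The minimum value found matches the smallest eigenvalue of the matrix, and the residual is small (limited only by the resolution of the search), so the minimizer on the unit circle is an eigenvector, just as the Lagrange multiplier argument predicts.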