Least Squares / Curve Fitting

Given a set of data points in the plane, we ask: how do we find a “best fit line”?

Finding such a line would let us make generalizations and inferences from our data!

Least Squares Solutions

Suppose we have an inconsistent linear system $A\vec{x} = \vec{b}$. As it is inconsistent, we simply cannot find an $\vec{x}$ satisfying the system.

This makes sense for our purposes! It’s often impossible to find an exact solution for a line passing through all points (the points may not be collinear). However, it is possible to find an exact solution for the line minimizing its distance to all of the points! This is the idea behind least squares.

A least squares solution of $A\vec{x} = \vec{b}$ is a vector $\hat{x}$ that minimizes the distance from $\vec{b}$ to $A\hat{x}$, $\|\vec{b} - A\hat{x}\|$, known as the least squares error.

In the line-fitting setup, the squared error $\|\vec{b} - A\hat{x}\|^2$ is exactly the sum of the squares of the vertical distances between our line and the true collection of points.

Note that with this minimizer $\hat{x}$, for any other $\vec{x}$ in $\mathbb{R}^n$,

$$\|\vec{b} - A\hat{x}\| \le \|\vec{b} - A\vec{x}\|.$$

We first build up some context below.

Finding the Least Squares Solution

Dot Products

Given $\vec{u}, \vec{v}$ in $\mathbb{R}^n$, their dot product is

$$\vec{u} \cdot \vec{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$

The dot product relates to a handful of geometric quantities like lengths and angles.

The length (norm) of a vector $\vec{v}$ is given as

$$\|\vec{v}\| = \sqrt{\vec{v} \cdot \vec{v}} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}.$$

And for any $\vec{u}, \vec{v}$ in $\mathbb{R}^n$, we have

$$\vec{u} \cdot \vec{v} = \|\vec{u}\|\,\|\vec{v}\|\cos\theta,$$

where $\theta$ is the angle between $\vec{u}$ and $\vec{v}$.

We say $\vec{u}, \vec{v}$ are orthogonal if $\vec{u} \cdot \vec{v} = 0$.

Note that this means that $\vec{0}$ is the only vector orthogonal to every vector in $\mathbb{R}^n$ (any such vector is orthogonal to itself, which forces its length to be 0).
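A quick numpy sketch of these formulas (the example vectors here are arbitrary choices, not from the notes):

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])   # example vectors (arbitrary choices)
v = np.array([2.0, -1.0, 0.0])

dot = u @ v                      # dot product u . v
norm_u = np.sqrt(u @ u)          # length ||u|| = sqrt(u . u)
norm_v = np.linalg.norm(v)       # same thing, via numpy's norm

# angle between u and v, from u . v = ||u|| ||v|| cos(theta)
cos_theta = dot / (norm_u * norm_v)
theta = np.arccos(cos_theta)

print(dot, norm_u, norm_v, theta)
print("orthogonal?", np.isclose(dot, 0.0))  # orthogonal iff u . v = 0
```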

Orthogonal Projections

Let $W$ be a subspace of $\mathbb{R}^n$ ($W$ is the span of some collection of vectors: a line through the origin, a plane through the origin, etc).

The orthogonal projection of $\vec{b}$ onto a subspace $W$, denoted $\mathrm{proj}_W(\vec{b})$, is the closest vector in $W$ to $\vec{b}$. It satisfies the following:

$\vec{b} - \mathrm{proj}_W(\vec{b})$ is orthogonal to every possible $\vec{w}$ in $W$.
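For a concrete (and assumed) special case, here is a small sketch where $W$ is a line through the origin, spanned by a single vector:

```python
import numpy as np

w = np.array([1.0, 1.0, 0.0])    # W = span{w}, a line through the origin
b = np.array([3.0, 1.0, 2.0])    # vector to project (arbitrary example)

# orthogonal projection of b onto the line W = span{w}
proj = (b @ w) / (w @ w) * w

# the difference b - proj should be orthogonal to every vector in W,
# i.e. orthogonal to w itself (and hence to all multiples of w)
print(proj)                       # closest point on W to b
print((b - proj) @ w)             # ~0
```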

The column space of an $m \times n$ matrix $A$, written $\mathrm{Col}(A)$, is the span of its columns (the set of all linear combinations of its columns). Note that $\mathrm{Col}(A)$ is a subspace of $\mathbb{R}^m$.

Every linear combination of the columns of $A$ has the form $A\vec{x}$ for some $\vec{x}$ in $\mathbb{R}^n$. So

$$\mathrm{Col}(A) = \{A\vec{x} : \vec{x} \in \mathbb{R}^n\}.$$

Here, $\vec{b} \in \mathrm{Col}(A)$ if and only if there is some $\vec{x}$ such that $A\vec{x} = \vec{b}$, if and only if $A\vec{x} = \vec{b}$ is a consistent linear system.
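A small sketch of this characterization, using an assumed example matrix: $\vec{b}$ is in $\mathrm{Col}(A)$ exactly when appending $\vec{b}$ as an extra column does not increase the rank.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                 # assumed example matrix

b_in  = A @ np.array([2.0, 3.0])           # in Col(A) by construction
b_out = np.array([1.0, 1.0, 5.0])          # not a combination of the columns

def in_column_space(A, b):
    # b is in Col(A)  <=>  Ax = b is consistent  <=>  rank([A | b]) == rank(A)
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

print(in_column_space(A, b_in))    # True
print(in_column_space(A, b_out))   # False
```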

Now suppose $A\vec{x} = \vec{b}$ is inconsistent. So, by the above, since there is no solution, $\vec{b} \notin \mathrm{Col}(A)$. Let $\hat{b} = \mathrm{proj}_{\mathrm{Col}(A)}(\vec{b})$. Then, because $\hat{b} \in \mathrm{Col}(A)$, there must exist an $\hat{x}$ such that

$$A\hat{x} = \hat{b}.$$

Such an $\hat{x}$ is a least squares solution, because $A\hat{x} = \hat{b}$ is the closest vector in $A$’s column space to $\vec{b}$!

We know that the vector $\vec{b} - A\hat{x} = \vec{b} - \hat{b}$ is orthogonal to every vector in $\mathrm{Col}(A)$. In other words, it is orthogonal to $A\vec{x}$ for any $\vec{x}$ in $\mathbb{R}^n$. So,

$$(A\vec{x}) \cdot (\vec{b} - A\hat{x}) = \vec{x} \cdot \big(A^T(\vec{b} - A\hat{x})\big) = 0$$

for every $\vec{x}$ in $\mathbb{R}^n$. This means that $A^T(\vec{b} - A\hat{x})$ must be the zero vector, so

$$A^T A\hat{x} = A^T\vec{b}.$$

This is the equation that every least squares solution $\hat{x}$ must satisfy! So, to find $\hat{x}$, we solve the linear system

$$A^T A\hat{x} = A^T\vec{b}.$$

Note the following:

  1. $A^T A\hat{x} = A^T\vec{b}$ is always consistent (has a solution)
  2. $A^T A\hat{x} = A^T\vec{b}$ has a unique solution if and only if $A^T A$ is invertible, if and only if $A$ has linearly independent columns.

Example: Least Squares

Find the least squares solutions to

We can represent this as the matrix system $A\vec{x} = \vec{b}$.

We now solve $A^T A\hat{x} = A^T\vec{b}$.
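The original numbers for this example are not shown above, so the following sketch assumes a small inconsistent system (three non-collinear points fit by a line $y = c_0 + c_1 x$) just to illustrate solving $A^T A\hat{x} = A^T\vec{b}$:

```python
import numpy as np

# assumed example: the points (0, 1), (1, 0), (2, 3) are not collinear,
# so the system for a line y = c0 + c1*x is inconsistent
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 3.0])

# solve the normal equations  A^T A x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# equivalently, numpy's least squares routine gives the same answer
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_hat, x_lstsq)
print("residual orthogonal to Col(A)?", np.allclose(A.T @ (b - A @ x_hat), 0))
```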

We can use least squares to solve a multitude of problems! As long as we can set up a corresponding linear system of equations, we can solve it with least squares.

Example: Best-Fit Parabola

| x | y     |
|---|-------|
| 1 | -0.15 |
| 2 | 0.68  |
| 3 | 1.21  |
| 4 | 0.86  |
| 5 | -0.08 |

Find a best fit parabola taking on the form

$$y = c_1 + c_2 x + c_3 x^2.$$

We can form a linear system by plugging in our x and y values:

$$\begin{aligned}
c_1 + c_2(1) + c_3(1)^2 &= -0.15 \\
c_1 + c_2(2) + c_3(2)^2 &= 0.68 \\
c_1 + c_2(3) + c_3(3)^2 &= 1.21 \\
c_1 + c_2(4) + c_3(4)^2 &= 0.86 \\
c_1 + c_2(5) + c_3(5)^2 &= -0.08
\end{aligned}$$

This is a linear system! We can write it as $A\vec{c} = \vec{y}$, with $A$, $\vec{c}$, and $\vec{y}$ as follows:

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ 1 & 5 & 25 \end{bmatrix}, \qquad
\vec{c} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}, \qquad
\vec{y} = \begin{bmatrix} -0.15 \\ 0.68 \\ 1.21 \\ 0.86 \\ -0.08 \end{bmatrix}$$

We can use least squares to solve this!
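Here is a sketch of this computation in numpy, using the data from the table above:

```python
import numpy as np

# data from the table
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-0.15, 0.68, 1.21, 0.86, -0.08])

# each data point gives one equation  c1 + c2*x_i + c3*x_i^2 = y_i,
# so the coefficient matrix has columns [1, x, x^2]
A = np.column_stack([np.ones_like(x), x, x**2])

# least squares solution via the normal equations
c_hat = np.linalg.solve(A.T @ A, A.T @ y)
print("coefficients c1, c2, c3:", c_hat)

# least squares error ||y - A c_hat||
print("error:", np.linalg.norm(y - A @ c_hat))
```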

But not all models will work with least squares! Least squares requires a linear system, so any model that cannot be turned into a linear system after substituting in the x and y values will not work.

Example: Applying Least-Squares to Non-Linear Systems

Suppose we have a non-linear model, i.e. one in which some of the unknown parameters do not appear linearly.

For non-linear models, one workaround is to make an educated guess for the non-linear parameters, and then solve the resulting linear system! So, we fix a guess for each non-linear parameter, and then solve for the remaining coefficients with least squares!

We can then compare the least squares errors to find which combination of variables works best!
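As the specific model from this example is not shown above, the sketch below assumes a model of the form $y = a + b\,e^{kx}$, where $k$ is the non-linear parameter: for each guessed $k$ we solve a linear least squares problem for $a$ and $b$, and keep the guess with the smallest least squares error.

```python
import numpy as np

# assumed model: y = a + b * exp(k * x), with k the non-linear parameter
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])          # made-up data for the sketch
y = np.array([2.1, 2.9, 4.3, 6.8, 11.0])

best = None
for k in np.linspace(0.1, 1.0, 10):              # educated guesses for k
    # with k fixed, y = a + b*exp(k*x) is LINEAR in a and b
    A = np.column_stack([np.ones_like(x), np.exp(k * x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    err = np.linalg.norm(y - A @ coef)            # least squares error
    if best is None or err < best[0]:
        best = (err, k, coef)

err, k, (a, b) = best
print(f"best guess k={k:.2f}, a={a:.3f}, b={b:.3f}, error={err:.4f}")
```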

Team Ranking

Suppose there is a league of sports teams who play each other. We want to be able to rank them from best to worst.

Rather than looking at their win/loss records, we would like to instead base our rankings on margins of victory, as this will give us less ambiguous results.

The general idea we will use was developed by Kenneth Massey, in his undergraduate thesis.

Example: Looking at Margins of Victory

Consider 3 teams T1, T2, T3

  1. T1 beats T2
  2. T2 beats T3
  3. T3 beats T1

In this case, we can’t really say which team is best, as the situation is completely symmetric! But if we instead had

  1. T1 beats T2 by 10 points
  2. T2 beats T3 by 2 points
  3. T3 beats T1 by 1 point

Then we have more information, and we can say that since T1 completely crushed T2 and only barely lost to T3, T1 seems to be the best!

The idea is to somehow assign each team a value, say $r_i$ (for team $i$), such that if team $i$ plays team $j$, then the expected margin of victory (or defeat) for team $i$ is given by the difference $r_i - r_j$.

Example: From Massey's Thesis

Suppose we have 4 teams:

  1. (T1) The Beast Squares
  2. (T2) The Gaussian Eliminators
  3. (T3) The Likelihood Loggers
  4. (T4) The Linear Regressors

Now suppose that T1 beats T2 by 4, T1 beats T4 by 2, T2 beats T3 by 1, T2 loses to T4 by 7, and T3 ties with T4. The idea is that we want to find, for each team $i$, a value $r_i$ where

$$\begin{aligned}
r_1 - r_2 &= 4 \\
r_1 - r_4 &= 2 \\
r_2 - r_3 &= 1 \\
r_4 - r_2 &= 7 \\
r_3 - r_4 &= 0
\end{aligned}$$

This gives us a linear system $A\vec{r} = \vec{b}$ that we can solve for the $r_i$! Generally these systems are inconsistent, but we can still find the best $r_i$’s using least squares!

One issue that happens here is that when we take $A^T A$, we don’t get an invertible matrix (as there are infinitely many least squares solutions)! So, we can’t simply compute $\hat{r} = (A^T A)^{-1} A^T \vec{b}$.

Instead, let’s just row reduce the augmented matrix $[\,A^T A \mid A^T\vec{b}\,]$ and find all solutions $\hat{r}$ can possibly take on! Here, with $r_4 = t$ free, we find our solutions as

$$r_1 = t + 1.125, \quad r_2 = t - 3.75, \quad r_3 = t - 2.375, \quad r_4 = t.$$

We find that $r_1 > r_4 > r_3 > r_2$, no matter what $t$ we choose! This is our team order: T1, T4, T3, T2.

Note that we can also use this to predict matches, even if they didn’t occur! For example, $r_1 - r_3 = 3.5$, so we can predict that team 1 would beat team 3 by 3.5 points!

To remove the (infinitely many) choices of free variable that we have in our solutions, we can impose the constraint that $r_1 + r_2 + r_3 + r_4 = 0$ (force the average rating to be 0) and add that as an additional equation to our initial linear system.

One can prove that with this new requirement, we get a unique least squares solution, and moreover, this solution satisfies the “normalization condition” exactly.
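Here is a sketch of the whole computation for the four-team example, with the normalization equation $r_1 + r_2 + r_3 + r_4 = 0$ appended as an extra row:

```python
import numpy as np

# each row encodes one game: (winner's column) - (loser's column) = margin
#   T1 beats T2 by 4, T1 beats T4 by 2, T2 beats T3 by 1,
#   T4 beats T2 by 7, T3 ties T4
A = np.array([[ 1, -1,  0,  0],
              [ 1,  0,  0, -1],
              [ 0,  1, -1,  0],
              [ 0, -1,  0,  1],
              [ 0,  0,  1, -1],
              [ 1,  1,  1,  1]], dtype=float)   # last row: r1 + r2 + r3 + r4 = 0
b = np.array([4, 2, 1, 7, 0, 0], dtype=float)

r, *_ = np.linalg.lstsq(A, b, rcond=None)
print("ratings:", r)                              # roughly [ 2.375, -2.5, -1.125, 1.25 ]
print("predicted T1 vs T3 margin:", r[0] - r[2])  # roughly 3.5
```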