

Least-Squares Techniques

The standard technique for performing linear fitting is by least-squares, and this chapter discusses programs that use that algorithm.

However, as Emerson and Hoaglin point out, the technique is not without problems.

    Various methods have been developed for fitting a straight line of the form y = a + bx to the data (x_i, y_i), i = 1, ..., n. The best-known and most widely used method is least-squares regression, which involves algebraically simple calculations, fits neatly into the framework of inference built on the Gaussian distribution, and requires only a straightforward mathematical derivation. Unfortunately, the least-squares regression line offers no resistance. A wild data point can easily seize control of the fitted line and cause it to give a totally misleading summary of the relationship between y and x.

Reference: John D. Emerson and David C. Hoaglin, "Resistant Lines for y versus x", in David C. Hoaglin, Frederick Mosteller, and John W. Tukey, Understanding Robust and Exploratory Data Analysis (John Wiley, 1983, ISBN: 0-471-09777-2), p. 129.

The central idea of the algorithm is that we are seeking a function f[x] which comes as close as possible to the actual experimental data. We let the data consist of N {x,y} pairs.

    {x_i, y_i},    i = 1, 2, ..., N

Then for each data point the residual is defined as the difference between the experimental value of y and the value of y given by the function f evaluated at the corresponding value of x.

    residual_i = y_i - f[x_i]

First, we define the sum of the squares of the residuals.

    SumOfSquares = Σ_{i=1}^{N} (y_i - f[x_i])^2

Then the least-squares technique minimizes the value of SumOfSquares.
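To make the procedure concrete, here is a minimal Wolfram Language sketch; the data values and the straight-line trial function f are invented purely for illustration, and the fitters discussed in this chapter automate all of these steps.

    data = {{1, 2.1}, {2, 3.9}, {3, 6.2}, {4, 7.8}};   (* invented {x, y} pairs *)
    f[x_] := a + b x                                    (* trial straight-line model *)
    residuals = (#[[2]] - f[#[[1]]]) & /@ data;         (* y_i - f[x_i] for each point *)
    sumOfSquares = Total[residuals^2];                  (* the quantity to be minimized *)
    NMinimize[sumOfSquares, {a, b}]                     (* least-squares values of a and b *)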

Here is a simple example. Imagine we have just a succession of x values, which are the result of repeated measurements.

    x_i,    i = 1, 2, ..., N

We wish to find an estimate of the expected value of x from this data. Call that estimate xbar. Then symbolically we may write the sum of the squares.

    SumOfSquares = Σ_{i=1}^{N} (x_i - xbar)^2

For this to be a minimum, the derivative with respect to xbar must be equal to zero.

    d(SumOfSquares)/d(xbar) = -2 Σ_{i=1}^{N} (x_i - xbar) = 0

We write out the sum.

    Σ_{i=1}^{N} x_i - N xbar = 0

We solve for xbar.

    xbar = (1/N) Σ_{i=1}^{N} x_i
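The derivation can be verified symbolically; in this sketch the sample size n = 3 is an invented illustration.

    n = 3;                                              (* small invented sample size *)
    sumOfSquares = Sum[(x[i] - xbar)^2, {i, 1, n}];
    Solve[D[sumOfSquares, xbar] == 0, xbar]
    (* {{xbar -> 1/3 (x[1] + x[2] + x[3])}}, i.e., the mean of the x_i *)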

But this is just the mean (i.e., average) of the x_i. The mean has no resistance, and a single contaminated data point can affect it to an arbitrary degree. For example, if x_1 -> infinity, then so does xbar. It is in exactly this sense that the least-squares technique in general offers no resistance.
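A quick numerical illustration of this lack of resistance, using invented numbers:

    Mean[{1.2, 1.3, 1.1, 1.4}]     (* 1.25 *)
    Mean[{1.2, 1.3, 1.1, 1000.}]   (* 250.9: one wild point drags the mean far from the rest of the data *)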

Nonetheless, although EDA supplies functions that are resistant, the least-squares fitters discussed here are usually the first ones to try.

Usually we are fitting data to a model for which there is more than one parameter.

    y = f[x, a_1, a_2, ..., a_M],    where a_1, ..., a_M are the parameters of the model

The least-squares technique then takes the derivative of the sum of the squares of the residuals with respect to each of the parameters to which we are fitting and sets each to zero.

    ∂(SumOfSquares)/∂a_j = 0,    j = 1, 2, ..., M

The analytic solution to this set of equations, then, is the result of the fit.
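For a straight-line model, Mathematica's built-in Fit function performs exactly this minimization; the data below are invented for illustration, and the EDA fitters discussed in this chapter build on the same idea.

    data = {{1, 2.1}, {2, 3.9}, {3, 6.2}, {4, 7.8}};   (* invented {x, y} pairs *)
    Fit[data, {1, x}, x]                                (* least-squares line: 0.15 + 1.94 x *)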

If the fit were perfect, then the resulting SumOfSquares would be exactly zero. The larger the SumOfSquares, the less well the model fits the actual data.
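Continuing the invented example above, the residual sum of squares for the fitted line can be computed directly and used as a rough figure of merit.

    data = {{1, 2.1}, {2, 3.9}, {3, 6.2}, {4, 7.8}};      (* same invented data *)
    line = Fit[data, {1, x}, x];
    Total[(#[[2]] - (line /. x -> #[[1]]))^2 & /@ data]   (* about 0.082; exactly zero would mean a perfect fit *)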



