

Evaluating the Goodness of a Fit

As already mentioned, when the data have no errors the SumOfSquares statistic measures how well the data fit the model. Although a smaller SumOfSquares means a better fit, there is no a priori definition of what the word "small" means in this context. In such cases analysts often use confidence intervals to try to characterize the goodness of fit. There are many caveats to this approach, some of which are discussed in Section 8.2.1. Nonetheless, the Statistics`ConfidenceIntervals` package, which is standard with Mathematica, can calculate these types of statistics.

When the data have errors, the ChiSquared statistic does provide information on what "small" means, because the data are weighted with the experimenter's estimate of the errors in the data.

The number of degrees of freedom of a fit is defined as the number of data points minus the number of parameters to which we are fitting. If we are doing, say, a straight line fit to two data points, the degrees of freedom are zero; in this case, the fit is also fairly uninteresting.

If we know the ChiSquared and the DegreesOfFreedom for a fit, then the chi-squared probability can be defined.

[Graphics:../Images/index_gr_32.gif]

Here Gamma is the built-in Mathematica gamma function.

[Graphics:../Images/index_gr_33.gif]

The interpretation of this statistic is a bit subtle. We assume that the experimental errors are random and statistical. Thus, if we repeated the experiment we would almost certainly get slightly different data, and would therefore get a slightly different result if we fit the new data to the same model as the old data. As the usage message indicates, the chi-squared probability is the chance that the fit to the new data would have a larger ChiSquared than the fit we did to the old data.

If our fit returned a ChiSquared of zero, then it is almost a certainty that any repeated measurement would yield a larger ChiSquared.

[Graphics:../Images/index_gr_35.gif]
[Graphics:../Images/index_gr_36.gif]

If the ChiSquared is equal to the number of DegreesOfFreedom, the probability depends on their common value.

[Graphics:../Images/index_gr_37.gif]

[Graphics:../Images/index_gr_38.gif]

The probabilities range from 32 to 48%. These are the kinds of probabilities we would expect if our estimates of the experimental uncertainties are reasonable and the data fit the model reasonably well.

If we have a ChiSquared of 100 for 10 DegreesOfFreedom, the probability is very small.

[Graphics:../Images/index_gr_39.gif]
[Graphics:../Images/index_gr_40.gif]

This number indicates that a repeated experiment would be very unlikely to fit the model this poorly. The conclusion may be that the data are in fact not described by the model being used in the fit.

If the ChiSquared is much less than the DegreesOfFreedom, the fit is almost too good to be true.

[Graphics:../Images/index_gr_41.gif]
[Graphics:../Images/index_gr_42.gif]

One possibility is that the experimenter has overestimated the experimental errors in the data.

If the ChiSquared is, say, twice the number of DegreesOfFreedom, the probability depends on the number of DegreesOfFreedom.

[Graphics:../Images/index_gr_43.gif]

[Graphics:../Images/index_gr_44.gif]

For two DegreesOfFreedom the probability is 14%, which is not unreasonable and indicates a fairly good fit. For 20 DegreesOfFreedom the probability falls to 0.5%, which indicates a very poor fit.
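The numbers above can be reproduced outside Mathematica. The quantity being computed is the upper-tail probability of the chi-squared distribution, which for an even number of degrees of freedom has a simple closed form: exp(-x) times the first dof/2 terms of the series for exp(x), with x = ChiSquared/2. The following Python sketch (the function name is ours, not part of EDA) checks the cases discussed above.

```python
import math

def chi2_probability(chisq, dof):
    """Probability that a repeat of the experiment would give a larger
    ChiSquared.  Uses the closed form for the upper tail of the
    chi-squared distribution, valid when dof is even:
        Q = exp(-x) * sum_{i=0}^{dof/2 - 1} x**i / i!,  with x = chisq/2.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this sketch handles positive even dof only")
    x = chisq / 2.0
    return math.exp(-x) * sum(x ** i / math.factorial(i)
                              for i in range(dof // 2))

# ChiSquared of zero: a repeated measurement is certain to do worse.
print(chi2_probability(0, 10))             # 1.0

# ChiSquared equal to the DegreesOfFreedom: moderate probabilities,
# between roughly 0.37 and 0.46 for these values of dof.
for dof in (2, 10, 20):
    print(dof, round(chi2_probability(dof, dof), 3))

# ChiSquared of 100 for 10 DegreesOfFreedom: essentially impossible.
print(chi2_probability(100, 10))           # about 5e-17

# ChiSquared equal to twice the DegreesOfFreedom.
print(round(chi2_probability(4, 2), 3))    # about 0.135, i.e. 14%
print(round(chi2_probability(40, 20), 4))  # about 0.005, i.e. 0.5%
```

For odd degrees of freedom the closed form above does not apply and the incomplete gamma function must be evaluated directly, as Mathematica's Gamma does.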

Summarizing the use of the ChiSquared in evaluating the result of a fit:

A good fit should have a ChiSquared close to the number of DegreesOfFreedom of the fit. The larger the number of DegreesOfFreedom, the closer the ChiSquared should be to it.

Suppose, however, that we have good data, including good estimates of its errors, and we are fitting to a model that does match the data. If we repeated the experiment and the fit many times and formed a histogram of the ChiSquareProbability for all the trials, it would be flat: we expect some trials to have very small or very large probabilities even though nothing is wrong with the data or the model. Thus, if a single fit has a very large or very small chi-squared probability, it may be coincidental, with nothing wrong with the data or the model. In this case, however, repeating the measurement is probably a good idea.
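The flatness of that histogram can be checked with a quick simulation (a sketch, not part of EDA): draw many chi-squared values for a fixed number of degrees of freedom, each standing in for the ChiSquared of one well-behaved repeated experiment, convert each to its tail probability, and bin the results.

```python
import math, random

def chi2_probability(chisq, dof):
    # Upper-tail probability; closed form valid for even dof, x = chisq/2.
    x = chisq / 2.0
    return math.exp(-x) * sum(x ** i / math.factorial(i)
                              for i in range(dof // 2))

random.seed(0)
dof, trials = 10, 20000

# A chi-squared variate with dof degrees of freedom is a sum of dof
# squared standard normal variates.
probs = [chi2_probability(sum(random.gauss(0.0, 1.0) ** 2
                              for _ in range(dof)), dof)
         for _ in range(trials)]

# Bin the probabilities into ten equal intervals.  A flat histogram
# means every count is near trials/10 = 2000.
counts = [0] * 10
for p in probs:
    counts[min(int(p * 10), 9)] += 1
print(counts)
```

Every bin comes out near 2000, including the bins below 0.1 and above 0.9; very small and very large probabilities occur at their expected rate even when nothing is wrong.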

Despite its limitations, statistical analysis is very useful. However, one of the best ways of evaluating a fit is graphically. To illustrate, EDA includes a famous quartet of made-up data devised by Anscombe.

[Graphics:../Images/index_gr_45.gif]
[Graphics:../Images/index_gr_46.gif]

All data sets supplied with EDA have a usage message.

[Graphics:../Images/index_gr_47.gif]
AnscombeData is a quartet of made up data devised
   by F.J. Anscombe, American Statistician 27
   (Feb. 1973), pg 17. All four data sets,
   AnscombeData[[1]], AnscombeData[[2]],
   AnscombeData[[3]], and AnscombeData[[4]], have
   almost identical averages and fit to almost
   identical straight lines.  The format of each
   data set is {x,y}, where neither coordinate has
   units.

Each data set consists of 11 {x,y} pairs.

[Graphics:../Images/index_gr_48.gif]
[Graphics:../Images/index_gr_49.gif]

The averages of both x and y for all four are almost the same.

[Graphics:../Images/index_gr_50.gif]
[Graphics:../Images/index_gr_51.gif]

We can use the EDA function LinearFit to fit each data set to a straight line. LinearFit is introduced in the next section, and the details of the options used below are not important here; for now, we simply note that the function returns the intercept and its estimated error as a[0], and the slope and its estimated error as a[1].

[Graphics:../Images/index_gr_52.gif]
Anscombe[[1]]: {a[0] -> {3.00009, 0.909545}, 

   a[1] -> {0.500091, 0.0953463},

   SumOfSquares -> 13.7627,

   DegreesOfFreedom -> 9}


Anscombe[[2]]: {a[0] -> {3.00091, 0.909545},

   a[1] -> {0.5, 0.0953463},

   SumOfSquares -> 13.7763,

   DegreesOfFreedom -> 9}


Anscombe[[3]]: {a[0] -> {3.00245, 0.909545},

   a[1] -> {0.499727, 0.0953463},

   SumOfSquares -> 13.7562,

   DegreesOfFreedom -> 9}

Anscombe[[4]]: {a[0] -> {3.00173, 0.909545},

   a[1] -> {0.499909, 0.0953463},

   SumOfSquares -> 13.7425,

   DegreesOfFreedom -> 9}

The command also stored the intercept and slope of each fit in fittable.

Note that the results of these fits, including the SumOfSquares, are almost identical. Judging by the numbers alone, we might conclude that all four fits are equally reasonable.
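For readers without Mathematica, the unweighted fit itself is ordinary least squares. Here is a Python sketch (our own code, not EDA's LinearFit) applied to Anscombe's first data set, as published; it returns the intercept a[0], the slope a[1], the SumOfSquares, and the DegreesOfFreedom.

```python
# Anscombe's first data set, as published ({x, y} pairs).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

def line_fit(x, y):
    """Unweighted least-squares fit of y = a0 + a1*x.
    Returns (a0, a1, sum_of_squares, degrees_of_freedom)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a1 = sxy / sxx                  # slope
    a0 = my - a1 * mx               # intercept
    ss = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    return a0, a1, ss, n - 2        # two fitted parameters

a0, a1, ss, dof = line_fit(x, y)
print(a0, a1, ss, dof)   # roughly 3.0001, 0.50009, 13.76, 9
```

The numbers agree with the LinearFit output above for Anscombe[[1]]; the other three data sets give nearly identical results, which is exactly Anscombe's point.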

Now we show a 2 x 2 matrix of graphs; each graph shows one of the four data sets together with the corresponding fitted line.

[Graphics:../Images/index_gr_55.gif]

Graph 1 shows that modeling AnscombeData[[1]] to a straight line is reasonable, while graph 2 shows the danger of using an incorrect model. Graphs 3 and especially 4 illustrate the fact, discussed above, that least-squares fits are not resistant to the influence of a "wild" data point.




Converted by Mathematica      September 30, 1999. Slightly modified by hand by David Harrison, October 1999. This document is Copyright © 1996, 1997, 1998, 1999 David M. Harrison and Wolfram Research, Inc.