Econ 521: handout on residuals. N. Hash
Searching for Violations of Assumptions: What the Residuals Tell Us
You usually don't know in advance whether a model such as linear regression is appropriate. It is therefore necessary to examine the residuals for evidence that the required assumptions are violated.
Residuals
In model building, a residual is what is left after the model is fit. It is the difference between an observed value and the value predicted by the model: $e_i = y_i - a - b x_i = y_i - \hat{y}_i$.
In regression analysis, the true errors $\varepsilon_i$ are assumed to be independent normal values with a mean of 0 and a constant variance $\sigma^2$. If the model is appropriate for the data, the observed residuals $e_i$, which are estimates of the true errors $\varepsilon_i$, should have similar characteristics.
If the intercept term is included in the equation, the mean of the residuals is always 0, so it provides no information about the true mean of the errors. Since the sum of the residuals is constrained to be 0, they are not strictly independent. However, if the number of residuals is large when compared to the number of independent variables, the dependency among the residuals can be ignored for practical purposes.
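As a minimal sketch of the definition and of the zero-sum property, the following Python fragment fits a straight line by least squares and lists the residuals. The data values and variable names are invented for illustration and are not from the course text.

```python
# Least-squares fit of y = a + b*x and the resulting residuals.
# The data values are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope and intercept from the usual least-squares formulas.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual = observed value minus predicted value.
predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print("intercept a =", round(a, 4), " slope b =", round(b, 4))
print("residuals:", [round(e, 4) for e in residuals])
print("sum of residuals:", round(sum(residuals), 10))
```

Because the fitted line includes an intercept, the printed sum should be 0 up to rounding error, whatever the data.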
The relative magnitudes of residuals are easier to judge when they are divided by estimates of their standard deviations. The resulting standardized residuals are expressed in standard deviation units above or below the mean. For example, the fact that a particular residual is 5198.1 provides little information. If you know that its standardized form is 3.1, you know not only that the observed value is greater than the predicted value but also that the residual is larger than most in absolute value.
Residuals are sometimes adjusted in one of two ways. The standardized residual for case i is the residual divided by the sample standard deviation of the residuals. Standardized residuals have a mean of 0 and a standard deviation of 1. The Studentized residual is the residual divided by an estimate of its standard deviation that varies from point to point, depending on the distance of $x_i$ from the mean of x. Standardized and Studentized residuals are usually close in value, but not always. The Studentized residual reflects differences in the true error variance from point to point more precisely.
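The two adjustments can be compared on a small example. The sketch below is not taken from the text. It assumes that the standard deviation of the residuals is estimated as the square root of SSE/(n - 2), and that the point-specific estimate uses the usual one-predictor leverage, $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, so that the Studentized residual is $e_i / (s\sqrt{1 - h_i})$. The data are invented.

```python
import math

# Made-up data; the last case lies far from the mean of x on purpose.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [2.3, 3.8, 6.1, 8.2, 9.9, 25.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Assumption: error standard deviation estimated with n - 2 degrees of freedom.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Assumption: leverage formula for a single predictor plus intercept.
leverage = [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

standardized = [e / s for e in residuals]
studentized = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverage)]

for xi, z, t in zip(x, standardized, studentized):
    print(f"x = {xi:5.1f}  standardized = {z:7.3f}  Studentized = {t:7.3f}")
```

Under these assumptions the ratio of the Studentized to the standardized residual is largest for the last case, whose x value is farthest from the mean.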
Linearity
For the bivariate situation, a scatterplot is a good means for judging how well a straight line fits the data. Another convenient method is to plot the residuals against the predicted values. If the assumptions of linearity and homogeneity of variance are met, there should be no relationship between the predicted and residual values. You should be suspicious of any observable pattern.
Systematic patterns between the predicted values and the residuals suggest possible violations of the linearity assumption. If the assumption were met, the residuals would be randomly scattered in a band about the horizontal line through 0.
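To illustrate what a systematic pattern looks like, the sketch below fits a straight line to deliberately curved, made-up data (y equal to x squared) and prints the sign of each residual in order of x. It stands in for the residual plot described above and is not a formal test.

```python
# Fit a straight line to curved data and inspect the residual signs.
# The data are invented: y is exactly x squared, so a line cannot fit well.
x = [float(v) for v in range(9)]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# One character per case, ordered by x (and therefore by predicted value).
signs = "".join("+" if e > 0 else "-" for e in residuals)
print("residual signs in order of predicted value:", signs)
```

The residuals are positive at both ends and negative in the middle rather than scattered at random about 0, which is the kind of pattern that should raise suspicion about linearity.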
Equality of variance
You can also use the previously described plots to check for violations of the equality of variance assumption. If the spread of the residuals increases or decreases with values of the independent variables or with predicted values, you should question the assumption of constant variance of Y for all values of X.
If, in a plot of the Studentized residuals against the predicted values of the dependent variable, the spread of the residuals increases with the magnitude of the predicted values, the equality of variance assumption appears to be violated.
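As an informal numeric companion to the plot, the spread of the residuals can be compared between cases with low and high predicted values. The sketch below uses made-up data whose deviations grow with x. Splitting the cases into two halves and using the root mean square as the measure of spread are illustrative assumptions, not a procedure from the text.

```python
import math

# Made-up data: deviations from the line grow in proportion to x.
x = [float(v) for v in range(1, 11)]
noise = [0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2.0 + 3.0 * xi + ni for xi, ni in zip(x, noise)]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Order the cases by predicted value and split them into two halves.
pairs = sorted(zip(predicted, residuals))
half = n // 2
low = [e for _, e in pairs[:half]]
high = [e for _, e in pairs[half:]]


def rms(values):
    """Root mean square, used here as a simple measure of spread about 0."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))


spread_low = rms(low)
spread_high = rms(high)
print(f"spread for low predicted values:  {spread_low:.3f}")
print(f"spread for high predicted values: {spread_high:.3f}")
```

A clearly larger spread in one half than in the other points in the same direction as a fan-shaped residual plot.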
Independence of Error
Whenever the data are collected and recorded sequentially, you should plot residuals against the sequence variable. Even if time is not considered a variable in the model, it could influence the residuals. For example, suppose you are studying survival time after surgery as a function of complexity of surgery, amount of blood transfused, dosage of medication, and so forth. In addition to these variables, it is also possible that the surgeon's skill increased with each operation and that a patient's survival time is influenced by the number of prior patients treated. A plot of the residuals against the order in which patients received surgery would then show shorter survival times for earlier patients than for later patients. If sequence and the residual are independent, you should not see a discernible pattern.
The Durbin-Watson test is used to test for sequential correlation of adjacent error terms. (See the text for the dw equation.)
The differences between successive residuals tend to be small when error terms are positively correlated and large when error terms are negatively correlated. Thus, small values of dw (well below 2) indicate positive correlation and large values of dw (well above 2) indicate negative correlation. Consult tables of the dw statistic for bounds upon which significance tests can be based (Neter and Wasserman, 1974).
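The behavior of the statistic can be checked on two short residual sequences. The sketch below assumes the standard form, $dw = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, which is presumed to match the equation in the text. The function name and the residual values are invented for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that drift slowly: neighbors resemble each other.
drifting = [1.0, 0.8, 0.6, 0.3, -0.1, -0.4, -0.7, -0.9, -0.6]

# Made-up residuals that flip sign at every step.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(f"dw for drifting residuals:    {dw_drifting:.3f}")
print(f"dw for alternating residuals: {dw_alternating:.3f}")
```

The drifting sequence gives a value near 0 and the alternating sequence a value well above 2. A formal conclusion still requires the tabulated bounds.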
Normality
The distribution of residuals may not appear to be normal for reasons other than actual nonnormality: misspecification of the model, nonconstant variance, a small number of residuals actually available for analysis, etc. Therefore, you should pursue several lines of investigation. One of the simplest is to construct a histogram of the residuals. However, it is unreasonable to expect the observed residuals to be exactly normal; some deviation is expected because of sampling variation. Even if the errors are normally distributed in the population, sample residuals are only approximately normal.
Another way to compare the observed distribution of residuals to that expected under the assumption of normality is to plot the two cumulative distributions against each other for a series of points. If the two distributions are identical, a straight line results. By observing how points scatter about the expected straight line, you can compare the two distributions.
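The comparison of cumulative distributions can be sketched numerically before it is plotted. The fragment below pairs the observed cumulative proportion of each sorted, standardized residual with the proportion expected under a standard normal distribution. The residual values are invented, and the plotting position (i - 0.5)/n is one common convention assumed here, not one prescribed by the text.

```python
from statistics import NormalDist, mean, stdev

# Made-up residuals for illustration.
residuals = [-1.9, -1.1, -0.7, -0.4, -0.1, 0.2, 0.3, 0.6, 1.2, 1.9]
n = len(residuals)

# Standardize, then sort from smallest to largest.
center = mean(residuals)
spread = stdev(residuals)
z = sorted((e - center) / spread for e in residuals)

# Assumption: observed cumulative proportion taken as (i - 0.5) / n.
observed = [(i - 0.5) / n for i in range(1, n + 1)]

# Expected cumulative proportion under the standard normal distribution.
expected = [NormalDist().cdf(zi) for zi in z]

for o, ex in zip(observed, expected):
    print(f"observed = {o:.3f}  expected = {ex:.3f}  difference = {o - ex:+.3f}")
```

Plotting the observed column against the expected column gives the points to compare with the straight line. Small, unsystematic differences are consistent with normality.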