Basic Prerequisite Knowledge
Readers need some of the knowledge contained in a basic course in statistics to tackle regression. We summarize some of the main requirements very briefly in this chapter. Also useful is a pocket calculator capable of getting sums of squares and sums of products easily. Excellent calculators of this type cost about $25–50 in the United States. Buy the most versatile you can afford.
0.1. DISTRIBUTIONS: NORMAL, t, AND F
The normal distribution occurs frequently in the natural world, either for data “as they come” or for transformed data. The heights of a large group of people selected randomly will look normal in general, for example. The distribution is symmetric about its mean μ and has a standard deviation σ, which is such that practically all of the distribution (99.73%) lies inside the range μ – 3σ ≤ x ≤ μ + 3σ. The frequency function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \qquad -\infty < x < \infty.$$
We usually write that x ~ N(μ, σ²), read as “x is normally distributed with mean μ and variance σ².” Most manipulations are done in terms of the standard normal or unit normal distribution, N(0, 1), for which μ = 0 and σ = 1. To move from a general normal variable x to a standard normal variable z, we set

$$z = \frac{x - \mu}{\sigma}.$$
A standard normal distribution is shown in Figure 0.1 along with some properties useful in certain regression contexts. All the information shown is obtainable from the normal table in the Tables section. Check that you understand how this is done. Remember to use the fact that the total area under each curve is 1.
Figure 0.1. The standard (or unit) normal distribution N(0, 1) and some of its properties.
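The standardization z = (x – μ)/σ and the 99.73% coverage of the μ ± 3σ range can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the height figures (mean 170 cm, standard deviation 10 cm) are hypothetical and chosen purely for illustration.

```python
import math

def standardize(x, mu, sigma):
    """Convert a general normal value x to a standard normal value z."""
    return (x - mu) / sigma

def std_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(k):
    """Probability that a normal variable lies within k standard
    deviations of its mean (the same for every mu and sigma)."""
    return std_normal_cdf(k) - std_normal_cdf(-k)

# Hypothetical example: heights with mean 170 cm and sd 10 cm.
z = standardize(185.0, 170.0, 10.0)
print("z for a height of 185 cm:", z)
print("P(height <= 185 cm):", round(std_normal_cdf(z), 4))
print("P(within 3 sd of the mean):", round(prob_within(3), 4))
```

The value returned by `std_normal_cdf` plays the role of an entry in the normal table, so the same two functions can be used to check any area read from Figure 0.1.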
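The gamma function Γ(q), which appears in the frequency functions of the t- and F-distributions, is defined in general as an integral:

$$\Gamma(q) = \int_0^\infty e^{-x} x^{q-1}\,dx.$$

However, it is easier to think of it as a generalized factorial with the basic property that, for any q,

$$\Gamma(q) = (q-1)\,\Gamma(q-1) = (q-1)(q-2)\,\Gamma(q-2),$$

and so on. Moreover,

$$\Gamma(\tfrac{1}{2}) = \pi^{1/2} \qquad \text{and} \qquad \Gamma(1) = 1.$$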
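The generalized-factorial property can be made concrete. The sketch below builds Γ(q) for whole and half-whole values of q from nothing more than the recursion Γ(q) = (q – 1)Γ(q – 1) and the two anchor values Γ(1) = 1 and Γ(1/2) = π^{1/2}. The function name `gamma_half_integer` is hypothetical; in practice one would simply call the standard library's `math.gamma`, which is used here only as a cross-check.

```python
import math

def gamma_half_integer(twice_q):
    """Gamma(q) for q = twice_q / 2, where twice_q is a positive integer.

    Uses only Gamma(q) = (q - 1) * Gamma(q - 1), stepping q down until it
    reaches 1 (whole q) or 1/2 (half-whole q).
    """
    if twice_q < 1:
        raise ValueError("twice_q must be a positive integer")
    q = twice_q / 2.0
    value = 1.0
    while q > 1.0:
        q -= 1.0
        value *= q
    # q is now exactly 1.0 or 0.5; multiply by the matching anchor value.
    return value * (1.0 if q == 1.0 else math.sqrt(math.pi))

# Whole q gives a factorial: Gamma(5) = 4!.
print(gamma_half_integer(10), math.factorial(4))
# Half-whole q gives a product ending in sqrt(pi): Gamma(5/2) = (3/2)(1/2)sqrt(pi).
print(gamma_half_integer(5), math.gamma(2.5))
```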
There are many t-distributions, because the form of the curve, defined by

$$f(t) = \frac{\Gamma\{(v+1)/2\}}{(\pi v)^{1/2}\,\Gamma(v/2)} \left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}, \qquad -\infty < t < \infty,$$

depends on v, the number of degrees of freedom. In general, the t(v) distribution looks somewhat like a standard (unit) normal but is “heavier in the tails,” and so lower in the middle, because the total area under the curve is 1. As v increases, the distribution becomes “more normal.” In fact, t(∞) is the N(0, 1) distribution, and, when v exceeds about 30, there is so little difference between t(v) and N(0, 1) that it has become conventional (but not mandatory) to use the N(0, 1) instead. Figure 0.2 illustrates the situation. A two-tailed table of percentage points is given in the Tables section.
Figure 0.2. The t-distributions for v = 1, 9, ∞; t(∞) = N(0, 1).
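The “heavier in the tails, lower in the middle” behavior can be seen by evaluating the t(v) frequency function directly. The sketch below is an illustration rather than a replacement for the tables: it codes the standard t density with the standard library's `math.lgamma` and compares it with the N(0, 1) density at the center (0) and in the tail (3). The function names are hypothetical.

```python
import math

def t_pdf(t, v):
    """Frequency function of the t distribution with v degrees of freedom."""
    log_const = (math.lgamma((v + 1) / 2.0) - math.lgamma(v / 2.0)
                 - 0.5 * math.log(v * math.pi))
    return math.exp(log_const - (v + 1) / 2.0 * math.log1p(t * t / v))

def normal_pdf(z):
    """Frequency function of the standard normal distribution N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Height at the center and in the tail for increasing degrees of freedom.
for v in (1, 9, 30, 1000):
    print("t(%d):" % v, round(t_pdf(0.0, v), 4), round(t_pdf(3.0, v), 4))
print("N(0,1):", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 4))
```

As v grows, both columns printed for t(v) move toward the N(0, 1) values, which is the numerical counterpart of Figure 0.2.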
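The F-distribution depends on two separate degrees of freedom m and n, say. Its curve is defined by

$$f(F) = \frac{\Gamma\{(m+n)/2\}}{\Gamma(m/2)\,\Gamma(n/2)} \left(\frac{m}{n}\right)^{m/2} \frac{F^{m/2-1}}{(1 + mF/n)^{(m+n)/2}}, \qquad F \ge 0.$$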
The distribution rises from zero, sometimes quite steeply for certain m and n, and reaches a peak, falling off very skewed to the right. See Figure 0.3. Percentage points for the upper tail levels of 10%, 5%, and 1% are in the Tables section.
Figure 0.3. Some selected F(m, n) distributions.
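The shape just described (a rise from zero, a peak, and a long right tail) can be explored by coding the F(m, n) frequency function. The following sketch uses the standard F density and only the Python standard library. The function name `f_pdf` and the choice m = 5, n = 10 are hypothetical, and the value at exactly F = 0 is not treated carefully because it depends on m.

```python
import math

def f_pdf(x, m, n):
    """Frequency function of the F(m, n) distribution for x > 0.

    Returns 0.0 for x <= 0; the true value at exactly 0 depends on m
    and is not handled here.
    """
    if x <= 0.0:
        return 0.0
    log_const = (math.lgamma((m + n) / 2.0) - math.lgamma(m / 2.0)
                 - math.lgamma(n / 2.0) + (m / 2.0) * math.log(m / n))
    log_kernel = ((m / 2.0 - 1.0) * math.log(x)
                  - ((m + n) / 2.0) * math.log1p(m * x / n))
    return math.exp(log_const + log_kernel)

# Trace the curve for F(5, 10): it climbs to a peak and then decays slowly.
for x in (0.1, 0.25, 0.5, 1.0, 2.0, 4.0):
    print(x, round(f_pdf(x, 5, 10), 4))
```

For m > 2 the peak of the standard F density lies at ((m – 2)/m)(n/(n + 2)), which is 0.5 for the m = 5, n = 10 case traced above.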
The F-distribution is usually introduced in the context of testing to see whether two variances are equal, that is, the null hypothesis that H0: $\sigma_1^2/\sigma_2^2 = 1$, versus the alternative hypothesis that H1: $\sigma_1^2/\sigma_2^2 \ne 1$. The test uses the statistic $F = s_1^2/s_2^2$, $s_1^2$ and $s_2^2$ being statistically independent estimates of $\sigma_1^2$ and $\sigma_2^2$, with v1 and v2 degrees of freedom (df), respectively, and depends on the fact that, if the two samples that give rise to $s_1^2$ and $s_2^2$ are independent and normal, then $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$ follows the F(v1, v2) distribution. Thus if $\sigma_1^2 = \sigma_2^2$, $F = s_1^2/s_2^2$ follows F(v1, v2). When given in basic statistics courses, this is usually described as a two-tailed test, which it usually is. In regression applications, it is typically a one-tailed, upper-tailed test. This is because regression tests always involve putting the “$s^2$ that could be too big, but cannot be too small” at the top and the “$s^2$ that we think estimates the true $\sigma^2$ well” at the bottom of the F-statistic. In other words, we are in the situation where we test H0: $\sigma_1^2 = \sigma_2^2$ versus H1: $\sigma_1^2 > \sigma_2^2$.
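A one-tailed, upper-tailed variance-ratio test of this kind can be sketched in a few lines. The example below is illustrative only: the two samples are made-up numbers, the function name `variance_ratio_test` is hypothetical, and the critical value is assumed to be looked up by the reader in an F table (3.79 is the usual tabled upper 5% point of F(7, 7)).

```python
import statistics

def variance_ratio_test(sample_top, sample_bottom, critical_value):
    """Upper-tailed F test of equal variances.

    sample_top    : data whose variance "could be too big"
    sample_bottom : data whose variance is taken to estimate sigma^2 well
    critical_value: upper-tail percentage point of F(v1, v2) from a table
    Returns (F, v1, v2, reject_H0).
    """
    s2_top = statistics.variance(sample_top)        # divisor n - 1
    s2_bottom = statistics.variance(sample_bottom)  # divisor n - 1
    f_obs = s2_top / s2_bottom
    v1 = len(sample_top) - 1
    v2 = len(sample_bottom) - 1
    return f_obs, v1, v2, f_obs > critical_value

# Made-up samples, each with 8 observations (7 df).
sample_a = [2, 4, 4, 4, 5, 5, 7, 9]
sample_b = [4, 5, 5, 5, 5, 5, 6, 5]

# Upper 5% point of F(7, 7), taken from a standard F table.
f_obs, v1, v2, reject = variance_ratio_test(sample_a, sample_b, 3.79)
print("F =", round(f_obs, 2), "on", (v1, v2), "df; reject H0:", reject)
```

Only large values of F lead to rejection here, which is what makes the test upper-tailed. A two-tailed version would also have to check whether F is unusually small.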
0.2. CONFIDENCE INTERVALS (OR BANDS) AND t-TESTS
Let θ be a parameter (or “thing”) that we want to estimate. Let $\hat{\theta}$ be an estimate of θ (“estimate of thing”). Typically, $\hat{\theta}$ will follow a normal distribution, either exactly because of the normality of the observations in $\hat{\theta}$, or approximately due to the effect of the Central Limit Theorem. Let $\sigma_{\hat{\theta}}$ be the standard deviation of $\hat{\theta}$ and let $\mathrm{se}(\hat{\theta})$ be the standard error, that is, the estimated standard deviation, of $\hat{\theta}$ (“standard error of thing”), based on v degrees of freedom. Typically we get $\mathrm{se}(\hat{\theta})$ by substituting an estimate (based on v degrees of freedom) of an unknown standard deviation into the formula for $\sigma_{\hat{\theta}}$.
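1. A 100(1 – α)% confidence interval (CI) for the parameter θ is given by

$$\hat{\theta} \pm t_v(1 - \alpha/2)\,\mathrm{se}(\hat{\theta}) \qquad (0.2.1)$$

where $t_v(1 - \alpha/2)$ is the percentage point of a t-variable with v degrees of freedom (df) that leaves a probability α/2 in the upper tail, and so 1 – α/2 in the lower tail. A two-tailed table where these percentage points are listed under the heading of 2(α/2) = α is given in the Tables section. Equation (0.2.1) in words is

$$\{\text{Estimate of thing}\} \pm \{\text{$t$ percentage point leaving $\alpha/2$ in the upper tail, based on $v$ df}\} \times \{\text{Standard error of estimate of thing}\}.$$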
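2. To test θ = θ0, where θ0 is some specified value of θ that is presumed to be valid (often θ0 = 0 in tests of regression coefficients), we evaluate the statistic

$$t = \frac{\hat{\theta} - \theta_0}{\mathrm{se}(\hat{\theta})}$$

or, in words,

$$t = \frac{\{\text{Estimate of thing}\} - \{\text{Postulated or test value of thing}\}}{\{\text{Standard error of estimate of thing}\}}.$$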
Figure 0.4. Two cases for a t-test. (a) The observed t is positive (black dot) and the upper tail area is δ. A two-tailed test considers that this value could just as well have been negative (open “phantom” dot) and quotes “a two-tailed t-probability of 2δ.” (b) The observed t is negative; similar argument, with tails reversed.
This “observed value of t” (our “dot”) is then placed on a diagram of the t(v) distribution. [Recall that v is the number of degrees of freedom on which $\mathrm{se}(\hat{\theta})$ is based and that is the number of df in the estimate of σ² that was used.] The tail probability beyond the dot is evaluated and doubled for a two-tail test. See Figure 0.4 for the probability 2δ. It is conventional to ask if the 2δ value is “significant” or not by concluding that, if 2δ < 0.05, t is significant and the idea (or hypothesis) that θ = θ0 is unlikely and so “rejected,” whereas if 2δ > 0.05, t is nonsignificant and we “do not reject” the hypothesis θ = θ0. The alternative hypothesis here is θ ≠ θ0, a two-sided alternative. Note that the value 0.05 is not handed down in holy writings, although we sometimes talk as though it is. Using an “alpha level” of α = 0.05 simply means we are prepared to risk a 1 in 20 chance of making the wrong decision, that is, of rejecting the hypothesis θ = θ0 when it is in fact true. If we wish to go to α = 0.10 (1 in 10) or α = 0.01 (1 in 100), that is up to us. Whatever we decide, we should remain consistent about this level throughout our testing.
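The confidence interval and the t-test described above can be put side by side in a short sketch. Everything numerical here is hypothetical: an estimate of 1.5 with standard error 0.5 on v = 9 df, and a test value θ0 = 0. The percentage point 2.262 is the usual tabled value of t for 9 df with 2.5% in the upper tail. The two-tailed probability 2δ is approximated by integrating the standard t(v) frequency function with Simpson's rule, an assumption of this sketch that stands in for the table lookup.

```python
import math

def t_pdf(t, v):
    """Frequency function of the t distribution with v degrees of freedom."""
    log_const = (math.lgamma((v + 1) / 2.0) - math.lgamma(v / 2.0)
                 - 0.5 * math.log(v * math.pi))
    return math.exp(log_const - (v + 1) / 2.0 * math.log1p(t * t / v))

def two_tailed_prob(t_obs, v, steps=2000):
    """2*delta: twice the tail area beyond |t_obs| under t(v).

    The area between 0 and |t_obs| is found by Simpson's rule
    (steps must be even) and subtracted from one half.
    """
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0.0, v) + t_pdf(b, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    area_zero_to_b = total * h / 3.0
    return 2.0 * (0.5 - area_zero_to_b)

def t_statistic(estimate, theta0, se):
    """(estimate of thing - test value of thing) / standard error."""
    return (estimate - theta0) / se

def confidence_interval(estimate, se, t_point):
    """estimate +/- (t percentage point) * (standard error)."""
    half_width = t_point * se
    return estimate - half_width, estimate + half_width

# Hypothetical numbers: estimate 1.5, se 0.5, v = 9 df, test value 0.
estimate, se, v, theta0 = 1.5, 0.5, 9, 0.0
t_obs = t_statistic(estimate, theta0, se)
two_delta = two_tailed_prob(t_obs, v)
low, high = confidence_interval(estimate, se, 2.262)  # tabled t, 9 df, 95% CI

print("observed t:", t_obs)
print("two-tailed probability:", round(two_delta, 4))
print("95% CI:", (round(low, 3), round(high, 3)))
print("reject at alpha = 0.05:", two_delta < 0.05)
```

With a two-sided alternative the two procedures agree: the 95% interval excludes θ0 exactly when the two-tailed probability falls below 0.05.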
However, it is pointless to agonize too much about α. A journal editor who will publish a paper describing an experiment if 2δ = 0.049, but will not publish it if 2δ = 0.051 is placing a purely arbitrary standard on...