Satyagopal Mandal
Department of Mathematics
Office: 624 Snow Hall Phone: 785-864-5180
Estimation
As you know, in statistics we try to understand a large population on the basis of the information available in a small sample. Part of what we mean by "understand" is knowing the values of the population parameters. The game here is to use suitable sample statistics to estimate population parameters. For example, we may use the sample mean x as an estimate of the population mean m.
There are two types of estimation of parameters we consider.
1) The first one is called point estimation. In point estimation, we give a number as an estimate for the parameter.
For example, if we are trying to estimate the mean height m of the American population, we take a sample, compute the sample mean height x and call it an estimate for m.
2) The second type of estimation is called interval estimation. In interval estimation we give an interval (L, U) and say that the parameter will be within this interval (with a certain degree of confidence). For example, while estimating the mean height m of the American population, we may take a sample, compute the sample mean x and say that the population mean m is within the interval (x - 1, x + 1). Obviously, in interval estimation, the smaller the length U - L of the interval the better (and the higher the degree of confidence the better).
Point and Interval Estimation
As we have already mentioned, we use a statistic to estimate a parameter. For example, we say that the sample mean X is an estimator of the population mean m and the computed value x of X is an estimate of m. The estimator is a sampling random variable and the estimate is a number. Similarly, the sample standard deviation S is an estimator of the population standard deviation s, and the computed value of S is an estimate of s. Try to see the difference between an estimator and an estimate. An estimator is a random variable and an estimate is a number (namely, the computed value of the estimator).
We will not go into the theory that deals with the characteristics of, and criteria for, a good estimator of a given parameter. We will also consider only the estimation of the population mean m and the population proportion p (of success).
It may be intuitively clear to you that the sample mean X is a natural point estimator of m and that the sample proportion of successes X is a natural point estimator of the population proportion p of success.
So, if we are asked to give a point estimate of m we just compute x and say that x is a point estimate for m. Similarly we deal with point estimation of p. Similarly, the sample median would be a natural point estimator for the population median.
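As a small illustration of point estimation, the sketch below computes the three point estimates just mentioned from one sample. The data values and the definition of "success" are hypothetical, chosen only to make the example runnable.

```python
import statistics

# Hypothetical sample of heights in inches (illustrative values, not from the notes).
heights = [66.5, 70.1, 64.8, 68.2, 71.0, 67.4, 69.3, 65.9]

# Point estimate of the population mean m: the computed sample mean x.
x_bar = statistics.mean(heights)

# Point estimate of the population median: the sample median.
sample_median = statistics.median(heights)

# Point estimate of a population proportion p: the sample proportion of successes.
# "Success" is arbitrarily taken here to mean a height above 68 inches.
successes = sum(1 for h in heights if h > 68)
p_hat = successes / len(heights)

print(f"point estimate of m:          {x_bar:.2f}")
print(f"point estimate of the median: {sample_median:.2f}")
print(f"point estimate of p:          {p_hat:.2f}")
```

Each printed value is an estimate (a number). A different sample would give different numbers, which is why the underlying estimators are random variables.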
Interval Estimation
We would almost never expect a point estimate of a parameter t to be exactly equal to the actual value of t. For example, we would not expect the sample mean x to be exactly equal to m.
That is why it is more reasonable to give an interval (L, U) and say that the parameter t lies within this interval (L, U). Here L and U are statistics. Since the computed values
L = l, U = u
will depend on the sample, we do not expect that the value of t will always be within this computed interval (l, u). We are happy as long as the true value of t falls within (l, u) most often (or often enough), allowing the possibility that a few times we will be "wrong". The probabilities
P(L < t < U) and P(t not in (L,U))
tell us how often we would be correct or wrong, respectively. This is what we do in interval estimation, and (l, u) is also called a confidence interval for t. We have the following formal definition.
Definition: Let t be a population parameter. An interval estimate of t provides an interval (L, U), where L and U are statistics, together with the probability
P(L < t < U) = 1 - a,
which is called the level of confidence. The interval (L, U) is said to be a (1 - a)100 percent confidence interval for t.
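The idea of "how often we would be correct" can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it draws many samples from a normal population whose mean m is known to the program, builds the known-s interval for the mean that these notes derive further on, and counts how often the interval contains m. The function name coverage, the population values, and the seed are hypothetical choices.

```python
import random
from statistics import NormalDist

def coverage(m, s, n, a, trials, seed=0):
    """Fraction of simulated samples whose interval (l, u) contains the true mean m."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - a / 2)      # z_{a/2}
    half_width = z * s / n ** 0.5            # z_{a/2} (s / n^{1/2})
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(m, s) for _ in range(n)]
        x_bar = sum(sample) / n              # computed sample mean x
        if x_bar - half_width < m < x_bar + half_width:
            hits += 1
    return hits / trials

# Assumed population: mean 50, standard deviation 10; samples of size 25; a = 0.05.
print(coverage(m=50.0, s=10.0, n=25, a=0.05, trials=2000))
```

The printed fraction should come out close to 1 - a = 0.95, though not exactly equal to it, since the simulation itself is random.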
Definition: Suppose Z is the standard normal random variable. For a number a between 0 and 1 we define a number z_{a} by the following formula:
P(Z < z_{a}) = 1- a.
In practice, a = 0.1, 0.05, 0.025, 0.01, 0.005, and using the z-table we have
z_{0.005} | 2.58
z_{0.01} | 2.33
z_{0.025} | 1.96
z_{0.05} | 1.65
z_{0.1} | 1.28
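If a z-table is not at hand, the same values can be computed. The following sketch uses NormalDist from Python's standard statistics module; the helper name z_value is a hypothetical choice.

```python
from statistics import NormalDist

def z_value(a):
    """Return z_a, the number with P(Z < z_a) = 1 - a for the standard normal Z."""
    return NormalDist().inv_cdf(1 - a)

for a in (0.005, 0.01, 0.025, 0.05, 0.1):
    print(f"z_{{{a}}} = {z_value(a):.3f}")
```

The computed values agree with the table up to rounding. Note that z_{0.05} is about 1.645, which z-tables conventionally report as either 1.645 or 1.65.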
A (1- a)100 percent confidence interval for m:
Suppose X is a random variable with mean m and standard deviation s. We will compute a confidence interval for m.
First, we assume that s is known. Let X_{1}, X_{2}, … , X_{n} be a sample from the X-population. By the Central Limit Theorem (CLT) we have, approximately,
P (-z_{a/2} < (X - m)/(s /n^{1/2}) < z_{a/2}) = 1 - a.
If we simplify, we get
P (X - z_{a/2} (s /n^{1/2}) < m < X + z_{a/2} (s /n^{1/2})) = 1- a.
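The simplification can be spelled out step by step. The block below is a sketch in LaTeX using the notation of these notes (X for the sample mean, m, s, a): multiply through by s/n^{1/2}, subtract X, then multiply by -1, which reverses the inequalities.

```latex
\begin{align*}
-z_{a/2} < \frac{X - m}{s/\sqrt{n}} < z_{a/2}
  &\iff -z_{a/2}\,\frac{s}{\sqrt{n}} < X - m < z_{a/2}\,\frac{s}{\sqrt{n}} \\
  &\iff -X - z_{a/2}\,\frac{s}{\sqrt{n}} < -m < -X + z_{a/2}\,\frac{s}{\sqrt{n}} \\
  &\iff X - z_{a/2}\,\frac{s}{\sqrt{n}} < m < X + z_{a/2}\,\frac{s}{\sqrt{n}}
\end{align*}
```

Since the events are equivalent, they have the same probability 1 - a.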
So we have the following theorem.
Theorem: Assume that the population standard deviation s is known. Then an approximate (1 - a)100 percent confidence interval for m is given by
x - z_{a/2} (s /n^{1/2}) < m < x + z_{a/2} (s /n^{1/2})
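The formula in the theorem translates directly into a few lines of code. This is a minimal sketch, not a reference implementation: the function name z_interval and its parameter names are hypothetical, and the inputs in the usage line are illustrative numbers that do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence):
    """Approximate two-sided confidence interval for m when s is known.

    x_bar: computed sample mean x
    s: known population standard deviation
    n: sample size
    confidence: the level 1 - a, for example 0.95
    """
    a = 1 - confidence
    z = NormalDist().inv_cdf(1 - a / 2)      # z_{a/2}
    margin = z * s / sqrt(n)                 # z_{a/2} (s / n^{1/2})
    return x_bar - margin, x_bar + margin

# Illustrative inputs only.
lower, upper = z_interval(x_bar=100.0, s=12.0, n=36, confidence=0.95)
print(f"{lower:.2f} < m < {upper:.2f}")
```

Because the function computes z_{a/2} rather than reading a rounded table value, its output may differ from a hand computation in the second decimal place.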
Remark: This confidence interval is also called a two-sided confidence interval for m. There can be other kinds of confidence intervals for m. For example, if P(L < m) = 1 - a, then (L, infinity) is called a (1 - a)100 percent one-sided (upper) confidence interval for m. Similarly, we can talk about one-sided (lower) confidence intervals.
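For the known-s setting of the theorem, one way to obtain such a one-sided interval is to put the whole probability a in a single tail, so that z_a replaces z_{a/2}. The notes do not state this formula, so the sketch below is an assumption-labeled illustration; the function name and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_lower_limit(x_bar, s, n, confidence):
    """Lower limit l such that (l, infinity) is an approximate
    (1 - a)100 percent one-sided confidence interval for m (s known)."""
    a = 1 - confidence
    z = NormalDist().inv_cdf(1 - a)          # z_a, not z_{a/2}: all of a is in one tail
    return x_bar - z * s / sqrt(n)

# Illustrative inputs only.
l = one_sided_lower_limit(x_bar=100.0, s=12.0, n=36, confidence=0.95)
print(f"{l:.2f} < m")
```

At the same confidence level, this lower limit lies closer to x than the lower end of the two-sided interval, because z_a is smaller than z_{a/2}.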
Use Your Calculator: If you know x, s and n, then you can compute the confidence interval using the above formula. You can also use the calculator.
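Two More Formulas/Definition:
1) The margin of error is E = z_{a/2} (s /n^{1/2}).
2) The length of the confidence interval is l = 2z_{a/2} (s /n^{1/2}) = 2E.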
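The formula shows that the length shrinks like 1/n^{1/2}: quadrupling the sample size halves the interval. The sketch below illustrates this with the quantity E = z_{a/2} (s /n^{1/2}) that appears in the worked examples; the function names and the input values are hypothetical choices for illustration.

```python
from math import sqrt

def margin_of_error(z, s, n):
    """E = z_{a/2} (s / n^{1/2})."""
    return z * s / sqrt(n)

def interval_length(z, s, n):
    """Length of the two-sided interval, which is 2E."""
    return 2 * margin_of_error(z, s, n)

# Illustrative values: z_{0.025} = 1.96 from the table, s = 15.
z, s = 1.96, 15.0
for n in (25, 100, 400):
    print(n, round(interval_length(z, s, n), 2))
```

Each fourfold increase in n cuts the printed length in half.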
Problems: Z-interval for m
Ex.1: Assume that you have a population with mean m and standard deviation s = 15. Suppose you have collected a sample of size 25 and the sample mean x was found to be 81. Give an approximate 99 percent confidence interval for m.
Solution: 1) Here 1 - a = 0.99. So, a = 1 - 0.99 = 0.01 and z_{a/2} = z_{0.005} = 2.58. Besides, n = 25, x = 81 and s = 15. An approximate 99 percent confidence interval for m is given by
x - z_{a/2} (s /n^{1/2}) < m < x + z_{a/2} (s /n^{1/2})
which is
81 - 2.58(15/(25)^{1/2}) < m < 81 + 2.58(15/(25)^{1/2})
which is
73.26 < m < 88.74
Alternately, you could get the answer from the Calculator.
E = z_{a/2} (s /n^{1/2}) = 2.58(15/(25)^{1/2}) = 7.74.
Ex.2: Assume that you have a normal population with mean m and standard deviation s = 9.8. Suppose you have collected a sample of size 14 and the sample mean x was found to be 151.1. Give a 99 percent confidence interval for m.
Solution: 1) Here 1 - a = 0.99. So, a = 1 - 0.99 = 0.01 and z_{a/2} = z_{0.005} = 2.58. Besides, n = 14, x = 151.1 and s = 9.8. An approximate 99 percent confidence interval for m is given by
x - z_{a/2} (s /n^{1/2}) < m < x + z_{a/2} (s /n^{1/2})
which is
151.1 - 2.58(9.8/(14)^{1/2}) < m < 151.1 + 2.58(9.8/(14)^{1/2})
which is
144.34 < m < 157.86
Alternately, you could get the answer from the Calculator.
E = z_{a/2} (s /n^{1/2}) = 2.58(9.8/(14)^{1/2}) = 6.76.
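The arithmetic in Ex.1 and Ex.2 can be checked with a few lines of code. The sketch below uses the rounded table value z_{0.005} = 2.58 so that it reproduces the hand computation; the function name is a hypothetical choice.

```python
from math import sqrt

def z_interval_from_table(x_bar, s, n, z):
    """Return (lower, upper, E) using a z value read from the table."""
    E = z * s / sqrt(n)
    return x_bar - E, x_bar + E, E

# Ex.1: n = 25, x = 81, s = 15, z_{0.005} = 2.58
l1, u1, E1 = z_interval_from_table(81, 15, 25, 2.58)
print(f"Ex.1: {l1:.2f} < m < {u1:.2f}, E = {E1:.2f}")

# Ex.2: n = 14, x = 151.1, s = 9.8, z_{0.005} = 2.58
l2, u2, E2 = z_interval_from_table(151.1, 9.8, 14, 2.58)
print(f"Ex.2: {l2:.2f} < m < {u2:.2f}, E = {E2:.2f}")
```

The printed intervals match the solutions above: 73.26 < m < 88.74 with E = 7.74, and 144.34 < m < 157.86 with E = 6.76.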
Ex.3: The time taken by an athlete to run an event has a distribution with mean m and known standard deviation s = 3.5 seconds. To estimate the mean m he ran 16 times, and the sample mean was found to be x = 33 seconds.
Ex.4: A population has a distribution with standard deviation s = 17. A sample of size n = 65 was taken and the sample mean was x = 11,50. Give an approximate 90 percent confidence interval for m.
Ex.5: It is suspected that an industrial plant is polluting the water stream. To determine the extent of the damage, a water sample of size n = 13 was collected and the dissolved oxygen concentration was measured. The mean concentration was found to be x = 2.3. It is known from past experience that s = 0.45. Compute a 95 percent confidence interval for the mean concentration m of oxygen.