Math 365, Elementary Statistics

Lesson 7: Estimation

Introduction

The name of the game in statistics is trying to understand the POPULATION on the basis of the information available in the SAMPLE. Part of what we mean by "understand" is estimating the values of the population parameters. The game here is to use suitable sample STATISTICS to estimate population parameters. For example, we may use the sample mean x̄ as an estimate for the population mean μ.

We consider two methods of estimating parameters.

  1. The first one is called point estimation. In point estimation, we give a single number as an estimate for the parameter. For example, if we are trying to estimate the mean height μ of the American population, we may take a sample of a certain size, compute the sample mean height x̄, and call it an estimate for μ.
  2. The second one is called interval estimation. In interval estimation we give an interval (L, U) and say that the parameter will be within this interval (with a certain level of confidence). For example, when estimating the mean height μ of the American population, we may take a sample, compute the sample mean x̄ and say that the population mean μ is in the interval (x̄ - 1, x̄ + 1). Obviously, in interval estimation, the smaller the length, U - L, of the interval and the higher the level of confidence, the better the estimation is.


7.1 Point and Interval Estimation

As we have already mentioned, we use a statistic to estimate a parameter. The statistic T used to estimate a parameter θ is called an estimator of θ. The computed value t of T is called a point estimate or an estimate of θ. For example, the sample mean X̄ is an estimator of μ and the computed value x̄ is an estimate of μ. An estimator is a random variable, because its value changes from sample to sample. Similarly, the sample variance S² is an estimator of the population variance σ² and the computed value s² is an estimate of σ².

It may be intuitively clear to you why X̄ and S² would be reasonable estimators, respectively, for μ and σ². Mathematically, the reasons are as follows:

  1. We have

    E(X̄) = μ       E(S²) = σ².

    For this reason we say X̄ and S² are unbiased estimators, respectively, for μ and σ².


  2. var(X̄) = σ²/n

    is small if n is large. So, as n grows, the standard deviation σ_X̄ = σ/√n of X̄ decreases. This means that values of X̄ will be close to the mean μ more frequently, which makes X̄ a more reliable estimator of μ. View the animation on normal distribution to see how the probability mass concentrates around the mean μ as the standard deviation decreases.
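The two facts above can be checked numerically. The following is a minimal simulation sketch and is not part of the original lesson; the values μ = 70, σ = 3, n = 25, the number of trials, and the function name are illustrative assumptions.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n from N(mu, sigma); return the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]

# Hypothetical population: mean 70, standard deviation 3, samples of size 25.
means = simulate_sample_means(mu=70.0, sigma=3.0, n=25, trials=20000)

print(statistics.fmean(means))     # should be close to mu = 70
print(statistics.variance(means))  # should be close to sigma^2 / n = 9 / 25 = 0.36
```

Increasing n in the call should shrink the second number roughly in proportion to 1/n.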

Interval Estimation

We would almost never expect a point estimate t of a parameter θ to be exactly equal to the actual value of θ. This is why it is more reasonable to give an interval (L,U) and say that θ would be within this interval. Here L, U will be statistics. Since the computed values of L = l,U = u will depend on the sample, we do not expect that the value of θ will always be within this computed interval (l,u). We are happy as long as the true value of θ falls within the interval (l,u) most often (or often enough), allowing the possibility of being "wrong" a few times.

But how often is often enough? The probability P(L < θ < U) tells us how often the random interval (L,U) will contain the parameter θ. So it is reasonable to report the probability P(L < θ < U) or, equivalently, P(θ ∉ (L,U)) along with the interval. This is what we do in interval estimation; the resulting interval is called a confidence interval for θ.

Definition. Let θ be a population parameter. An interval estimate for θ provides the following:

  1. It gives an interval (L,U) as an estimate for θ. Here L,U are statistics.
  2. It also gives the probability P(L < θ < U). This number

    P(L < θ < U) = 1- α

    is called the level of confidence. And (L,U) is said to be a (1-α)100 percent confidence interval of θ.
  3. In practice, α will be a small number, such as 0.1, 0.05, or 0.01.

We need the following definition.

Definition: Given a number 0 < α < 1, the number z_α is defined by the formula

P(Z > z_α) = α,

where Z is a N(0,1) random variable.

View the animation on inverse Z-distribution to understand the numbers z_α. As mentioned above, for us α will be a small number such as .1, .05, .01 and so on. At the end of the Z-table is a list of the numbers z_α that we may need frequently.
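As a cross-check on the Z-table, the numbers z_α can also be computed. The sketch below uses Python's standard library class statistics.NormalDist; the function name z_alpha is an assumption made for illustration.

```python
from statistics import NormalDist

def z_alpha(alpha):
    """Return z_alpha, the number with P(Z > z_alpha) = alpha for Z ~ N(0,1)."""
    # P(Z > z) = alpha is the same as P(Z <= z) = 1 - alpha.
    return NormalDist().inv_cdf(1.0 - alpha)

for alpha in (0.1, 0.05, 0.025, 0.01, 0.005):
    print(alpha, round(z_alpha(alpha), 3))
```

Computed values may differ from Z-table values in the last digit because tables are rounded.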


A (1-α)100 percent confidence interval for the mean μ:

Suppose X is a random variable with mean μ and variance σ2. We want to construct a confidence interval for μ.

We assume that σ is known. Let X1, X2, …, Xn be a sample from X and let X̄ be the sample mean. Note that from the CLT we have, approximately,

P(-z_{α/2} < Z < z_{α/2}) = 1 - α

where Z = (X̄ - μ)√n/σ.

Multiplying the inequalities by σ/√n and solving for μ, we get

P(X̄ - E < μ < X̄ + E) = 1 - α

where    E = z_{α/2} σ/√n.

So we have the following theorem.

Theorem. Assume that σ is known. Then a (1-α)100 percent confidence interval for μ is given by

X̄ - E < μ < X̄ + E

where    E = z_{α/2} σ/√n.
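The theorem translates directly into a short computation. This is a sketch under the theorem's assumption that σ is known; the function name z_interval and the numbers in the example (x̄ = 50, σ = 10, n = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Two-sided confidence interval for mu when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)                 # E = z_{alpha/2} * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 50 from n = 100 observations, known sigma = 10.
low, high = z_interval(xbar=50.0, sigma=10.0, n=100, confidence=0.95)
print(round(low, 2), round(high, 2))
```

Here the margin of error is about 1.96, so the interval runs from roughly 48.04 to 51.96.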

Remarks.

  1. If you go on computing (1-α)100 percent confidence intervals on a regular basis, then in the long run about α100 percent of those intervals will fail to contain the true value of μ.
  2. The confidence interval we computed above may also be called a (1-α)100 percent two sided confidence interval for μ. There could be all kinds of confidence intervals. For example, if

    P(L < μ < ∞) = 1 - α,

    then (L, ∞) will be a (1-α)100 percent one sided (upper) confidence interval for μ.
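The first remark can be illustrated by simulation: build many 95 percent intervals from fresh samples and count how often they contain μ. The sketch below is illustrative only; the population values (μ = 100, σ = 15), the sample size, the seed, and the function name coverage are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, trials, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

rate = coverage(mu=100.0, sigma=15.0, n=25, confidence=0.95, trials=10000)
print(rate)  # should be near 0.95, so about 5 percent of intervals miss mu
```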

Definitions and Formulas:

  1. The length l of this (1-α)100 percent confidence interval for μ is given by

    l = 2 z_{α/2} σ/√n.

  2. The margin of error E is defined as

    E = z_{α/2} σ/√n.

  3. The sample size n needed for a (1-α)100 percent confidence interval to have a preassigned margin of error E is given by

    n = (z_{α/2} σ/E)².

To be safe, always round n upward to the next integer in this class. Also use the Z-table for online homework.
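The sample size formula, with the round-up rule, can be sketched as follows. The function name and the example values (σ = 20, E = 5) are hypothetical. Note that the code computes z_{α/2} to full precision, whereas the class uses rounded Z-table values, so in borderline cases the two could differ by one.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil((z * sigma / margin) ** 2)  # always round upward

# Hypothetical requirement: margin of error 5 with sigma = 20 at 95 percent confidence.
n = sample_size_for_mean(sigma=20.0, margin=5.0, confidence=0.95)
print(n)
```

For these numbers (1.96 × 20 / 5)² is about 61.5, which rounds up to 62.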


Use of Calculators (if you have a TI-83): Z-interval
  1. Press stat and then select TESTS.
  2. Select Z-interval and enter.
  3. Input: you will have to select stats (not data) in this section.
  4. Feed in the values of σ, x̄, n and c-level.
  5. Select calculate and enter. It will give you the confidence interval.
  6. The margin of error = E = (width of the interval)/2.
  7. To compute the sample size, use the formula above.

Problems on 7.1: Point and Interval Estimation

Exercise 7.1.1. Assume that you have a normal population with mean μ and standard deviation σ = 15. Suppose you have collected a sample of size 25 and the sample mean x̄ was found to be 81.

  1. Find a 99 percent confidence interval for μ.
  2. Find the margin of error at 99 percent level of confidence.
    Solution

Exercise 7.1.2. Assume that you have a normal population with mean μ and standard deviation σ = 9.8. Suppose you have collected a sample of size 14 and the sample mean x̄ was found to be 151.1.

  1. Find a 99 percent confidence interval for μ.
  2. Find the margin of error at 99 percent level of confidence.
    Solution

Exercise 7.1.3. The time taken by an athlete to run an event is normally distributed with mean μ and known standard deviation σ = 3.5 seconds. To estimate the mean μ, he ran 16 times and the sample mean was found to be x̄ = 33 seconds.

  1. Find the margin of error in estimating the true mean μ with 95 percent level of confidence.
  2. Find a 99 percent confidence interval for μ.
    Solution

Exercise 7.1.4. A population has a normal distribution with variance σ² = 289. How large a sample do we need to estimate the mean μ to within 3 units of its true value, with 90 percent confidence?
Solution

Exercise 7.1.5. The tuition X paid by a student per semester in a university has a distribution with mean μ and σ = $416. How large a sample should you draw so that you are 95 percent sure that the true value of μ will be within $10 of the sample mean x̄?
Solution


7.2 When σ Is Unknown

Let X be a normal random variable with mean μ and variance σ². Unlike in the last section, in this section we assume that σ is not known, and we try to compute a confidence interval for μ. In the last section, the main tool (or fact) that we used was that

Z = (X̄ - μ)√n/σ

has N(0,1) distribution. In this section, we replace the unknown σ by the sample standard deviation S and use the distribution of

T = (X̄ - μ)√n/S.

The distribution of T is known as the t-distribution with n-1 degrees of freedom, which we have not discussed. As we did for the N(0,1) random variable, we will now give the properties of the t-distribution.


About t-distribution

Given a positive integer ν, there is a random variable T = t_ν that is said to have t-distribution with degrees of freedom ν. The useful properties of t-distribution are listed below:

  1. A t-random variable has degrees of freedom. If a random variable T has t-distribution with degrees of freedom ν then we say that T has t_ν distribution.

  2. The t-random variables are continuous random variables.

  3. The mean of a t-random variable is ZERO (for ν > 1).

  4. The graph of the pdf of a t-random variable is symmetric around the y-axis and has a bell shape.

    Bell shaped curve showing pdf of a t-random variable, symmetric around the y-axis.

    1. Flash animation: t-distribution
    2. Flash animation: probability computation.

  5. For a T = t_ν random variable, if the degrees of freedom ν is large, then it can be approximated by a N(0,1) random variable.

  6. For a number 0 < α < 1 and any positive integer ν, we define a number t_{ν,α} by the equation

    P(T > t_{ν,α}) = α

    where T has t-distribution with degrees of freedom ν.
    View the animation on inverse-T distribution to understand the numbers t_{ν,α}.

  7. Tables are available that list, for each degree of freedom ν, values that can be used to compute probabilities for t-random variables. We will need only some of the numbers t_{ν,α}. Here is a link for a table that will be sufficient for us.

Theorem.
Let X be a normal random variable with mean μ and standard deviation σ. Let X1, X2, …, Xn be a sample of size n from the X population. Then

T = (X̄ - μ)√n/S

has t-distribution with degrees of freedom n-1.

So,

P(-t_{n-1,α/2} < (X̄ - μ)√n/S < t_{n-1,α/2})  =  1-α.

If we simplify, we get

P(X̄ - E < μ < X̄ + E) = 1 - α

where    E = t_{n-1,α/2} S/√n.


A (1-α)100 percent Confidence Interval for μ

Under the set up of the theorem, a (1-α)100 percent confidence interval for μ is given by

x̄ - E < μ < x̄ + E

where    E = t_{n-1,α/2} s/√n.

E is also called the margin of error.
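The t-interval can be sketched in the same way as the z-interval. Python's standard library has no t-distribution, so in this sketch the critical value t_{n-1,α/2} is read from a t-table and passed in by hand. The function name and the sample values (x̄ = 50, s = 4, n = 10) are hypothetical; 2.262 is the tabulated value of t_{9,0.025}.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for mu when sigma is unknown.

    t_crit is t_{n-1, alpha/2}, looked up in a t-table by the caller.
    """
    margin = t_crit * s / sqrt(n)  # E = t_{n-1, alpha/2} * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: n = 10 (9 degrees of freedom), 95 percent confidence,
# so alpha/2 = 0.025 and the table gives t_{9, 0.025} = 2.262.
low, high = t_interval(xbar=50.0, s=4.0, n=10, t_crit=2.262)
print(round(low, 3), round(high, 3))
```

With the same data and a known σ = 4, the z-interval would use 1.96 in place of 2.262 and would be narrower; the wider t-interval reflects the extra uncertainty from estimating σ by s.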

A Frequently Asked Question: To estimate μ, when do we use the ZInterval and when do we use the TInterval? Answer: We use the ZInterval when σ is known and the TInterval when σ is not known.


Use of Calculators (if you have a TI-83): T-interval
  1. If we have raw data, enter the data into the Calculator.
  2. Press stat and then select TESTS.
  3. Select T-interval and enter.
  4. Input: you will have to select stats or data, depending on what is given.
  5. Feed in the values of x̄, the sample standard deviation s, and n (or the List where you have the data), and the c-level.
  6. Select calculate and enter. It will give you the confidence interval.
  7. The margin of error = E = (width of the interval)/2.

Problems on 7.2: When σ Is Unknown

Exercise 7.2.1. Assume that we have a normal population with mean μ and standard deviation σ. We have a sample of size n = 18 that has sample mean x̄ = 170.5 and standard deviation s = 13.3. Find the margin of error and compute a 99 percent confidence interval for μ.
Solution

Exercise 7.2.2. Suppose that the time taken to complete a problem in a Math 365 test is normally distributed with mean μ and standard deviation σ. A sample of size 23 was taken, and sample mean and standard deviation were found to be x̄ = 4.7 and s = .47. Estimate the mean time μ taken to complete a problem using a 98 percent confidence interval.
Solution

Exercise 7.2.3. It is assumed that the lifetime (in hours) of lightbulbs produced in a factory is normally distributed with mean μ and standard deviation σ. To estimate μ the following data was collected on the lifetime of bulbs.

5110 4671 6441 3331 5055 5270 5335 4973 1837
7783 4560 6074 4777 4707 5263 4978 5418 5123

Compute a 95 percent confidence interval for μ. Write down the formula for (1-α)100 percent confidence interval that you use here.
Solution

Exercise 7.2.4. To estimate the mean weight (in pounds) of salmon in a river the following sample was collected:

34.7 33.8 38.2 20.3 27.8 45.3 43.1 37.3 32.5 32.3
31.8 41.5 44.5 29.2 25.3 29.6 39.5 29.1 37.3  

Compute a 99 percent confidence interval for the population mean μ. Write down the formula for (1-α)100 percent confidence interval that you use here.
Solution

Exercise 7.2.5. Suppose we collect a sample of size n = 40 from a normal population, with sample mean x̄ = 18.6 and standard deviation s = 9.486. Construct a 95 percent confidence interval for the mean μ.
Solution

Exercise 7.2.6. The time taken by an athlete to run an event is normally distributed with mean μ and unknown standard deviation σ. To estimate the mean μ he ran 16 times and the sample mean was found to be x̄ = 33 seconds and the sample standard deviation s = 3.5 seconds.

  1. Find the margin of error in estimating the true mean μ with 99 percent level of confidence.
  2. Find a 99 percent confidence interval for μ.
    Solution

7.3 Confidence Interval for σ²

Let X be a normal random variable with mean μ and variance σ². In this section, we will construct a confidence interval for σ². We will take a sample X1, X2, …, Xn of size n from the X population. Let X̄ be the sample mean and let S² be the sample variance. To compute a confidence interval for σ², we will be using the distribution of

U = (n-1)S²/σ²

The distribution of U is known as the χ² distribution with n-1 degrees of freedom, which we have not discussed. Next we will give the properties of a χ² random variable.

About χ²-distribution

Given a positive integer ν, there is a random variable χ²_ν that is said to have χ² distribution with degrees of freedom ν. The useful properties of χ² distribution are listed below.

  1. A χ² random variable has degrees of freedom. If a random variable U has χ² distribution with degrees of freedom ν then we say that U has χ²_ν-distribution.

  2. The χ² random variables are all continuous random variables.

  3. A χ² random variable is always nonnegative.

  4. The graph of the pdf of a χ² random variable is skewed to the right. If the degrees of freedom, ν, is large then it can be approximated by a normal random variable with mean ν and variance 2ν.
    View the animations on pdf of Chi-Square random variable and probability distribution of Chi-Square.

  5. If U is a χ²_ν random variable then the mean of U is ν. (We will not need this.) This fact is reflected in the animation above.

  6. For a number 0 < α < 1 and any positive integer ν, we define a number χ²_{ν,α} by the equation

    P(U > χ²_{ν,α}) = α

    where U has χ² distribution with degrees of freedom ν.
    View the animation on inverse Chi-Square distribution to understand the numbers χ²_{ν,α}.

  7. Tables are available that list, for each degree of freedom ν, values that can be used to compute probabilities for χ²-random variables. For our purpose, only some of the numbers χ²_{ν,α} will be needed. Here is a link for a table that will be sufficient for us.

Theorem. Let X be a normal random variable with mean μ and variance σ². Let X1, X2, …, Xn be a sample of size n from the X population. Then

(n-1)S²/σ²

has χ² distribution with degrees of freedom n-1.

So,

P(χ²_{n-1,1-α/2} < (n-1)S²/σ² < χ²_{n-1,α/2})  =  1-α.

Taking reciprocals (which reverses the inequalities) and multiplying by (n-1)S², we get

P(L < σ² < U)   =   1 - α

where the lower and upper endpoints are

L  =  (n-1)S²/χ²_{n-1,α/2}

U  =  (n-1)S²/χ²_{n-1,1-α/2}

Theorem. Under the same set-up as in the above theorem, a (1-α)100 percent confidence interval for the variance σ² is given by

l < σ² < u

where

l  =  (n-1)s²/χ²_{n-1,α/2}

u  =  (n-1)s²/χ²_{n-1,1-α/2}

OR

(n-1)s²/χ²_{n-1,α/2}  <  σ²  <  (n-1)s²/χ²_{n-1,1-α/2}.
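The interval for σ² can be sketched as below. As with the t-interval, the two χ² critical values are read from a table and passed in, because Python's standard library has no χ² distribution. The data set and the function name are made up for illustration; 19.023 and 2.700 are the tabulated values of χ²_{9,0.025} and χ²_{9,0.975}.

```python
from statistics import variance

def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2 from a normal sample.

    chi2_upper is chi^2_{n-1, alpha/2} and chi2_lower is chi^2_{n-1, 1-alpha/2},
    both looked up in a chi-square table by the caller.
    """
    lower = (n - 1) * s2 / chi2_upper  # the larger critical value gives the lower endpoint
    upper = (n - 1) * s2 / chi2_lower
    return lower, upper

# Hypothetical data, n = 10, so 9 degrees of freedom; 95 percent confidence.
data = [4.1, 5.3, 6.0, 4.8, 5.9, 5.1, 4.4, 6.2, 5.5, 4.7]
s2 = variance(data)  # sample variance with divisor n - 1
low, high = variance_interval(s2, len(data), chi2_upper=19.023, chi2_lower=2.700)
print(round(s2, 4), round(low, 4), round(high, 4))
```

Unlike the intervals for μ, this interval is not symmetric around the point estimate s², because the χ² distribution is skewed.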

Use of Calculators: The TI-83 will not compute the confidence interval for σ². If data is given, it is important to use the calculator to compute the sample variance s².

Problems for 7.3: Confidence Interval for σ²

Exercise 7.3.1. Suppose that we have collected a sample of size n = 26 from a normal population with mean μ and variance σ². The sample variance was found to be s² = 26.7. Compute a 95 percent confidence interval for σ².
Solution

Exercise 7.3.2. The following is sample data on the amount (in 1000 bushels) of wheat harvested by Kansas farmers in 2002.

206 300 200 385 280
600 225 933 320 260
  1. Compute a 99 percent confidence interval for the variance of harvest σ².

Solution

Exercise 7.3.3. The following is data on monthly gas consumption (in ccf) during the winter months by a household.

154 222 264 257 127
228 240 393 278 140
  1. Compute a 99 percent confidence interval for the variance σ².

Solution


7.4 About the Population Proportion

Once again, let p be the population proportion of a certain attribute. We want to compute a confidence interval for p. We let

X = 1 if success
X = 0 if failure

where "success" means that the sampled member has the attribute.

So, X is a Bernoulli(p) random variable. We draw a sample X1, X2, …, Xn from the X population, and let

X1 + … + Xn

be the total number of successes and

X̄ = (X1 + … + Xn)/n

be the sample proportion of successes. We have seen that, approximately, the sample proportion X̄ has

N(μ_X̄, σ_X̄)-distribution

where μ_X̄ = p  and  σ_X̄  =  √(p(1-p)/n).

Therefore, approximately,

P(-z_{α/2} < (X̄ - p)/σ_X̄ < z_{α/2})  =  1-α.

In an attempt to compute a confidence interval for p we simplify and get

P(X̄ - z_{α/2} σ_X̄ < p < X̄ + z_{α/2} σ_X̄)  =  1-α.

Since σ_X̄ depends on the unknown p, this will not directly produce a confidence interval for p. But the sample proportion x̄ of successes is a point estimate of p, so we may substitute x̄ for p in σ_X̄. This gives an approximate (1-α)100 percent confidence interval for p:

x̄ - e < p < x̄ + e

where

e = z_{α/2} √(x̄(1-x̄)/n).

Following are some of the useful formulas and definitions that we may need.

  1. The margin of error e is defined as

    e = z_{α/2} √(x̄(1-x̄)/n).

  2. A conservative margin of error E is defined as

    E = z_{α/2}/√(4n).

    Since x̄(1-x̄) ≤ 1/4 for every 0 ≤ x̄ ≤ 1, the margin of error e is always less than or equal to the conservative margin of error E.

  3. Theorem. For a (1-α)100 percent confidence interval for p, if we are given a preassigned conservative margin of error E, then the sample size n that we need to take is given by

    n  =  (z_{α/2}/(2E))², rounded up to the next integer.
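The three formulas above can be sketched together. The function names and the example (40 successes in a sample of 100) are hypothetical, and z_{α/2} is computed to full precision rather than read from the Z-table.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_half_alpha(confidence):
    """Return z_{alpha/2} for the given confidence level 1 - alpha."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def proportion_interval(successes, n, confidence):
    """Approximate confidence interval for p, using margin of error e."""
    p_hat = successes / n  # sample proportion
    e = z_half_alpha(confidence) * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - e, p_hat + e

def conservative_margin(n, confidence):
    """Conservative margin of error E = z_{alpha/2} / sqrt(4n)."""
    return z_half_alpha(confidence) / sqrt(4 * n)

def sample_size_for_proportion(margin, confidence):
    """Sample size for a preassigned conservative margin of error, rounded up."""
    return ceil((z_half_alpha(confidence) / (2.0 * margin)) ** 2)

# Hypothetical sample: 40 successes out of 100, 95 percent confidence.
low, high = proportion_interval(40, 100, 0.95)
print(round(low, 3), round(high, 3))              # interval around 0.40
print(round(conservative_margin(100, 0.95), 3))   # E, at least as large as e
print(sample_size_for_proportion(0.05, 0.95))     # n needed for E = 0.05
```

In this example e is about 0.096 and E is about 0.098, which agrees with the claim that e never exceeds E.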

Remark. In the days of Clinton's impeachment, we often heard TV newscasters read something like the following.

President Clinton has 64 percent approval rating. The poll has a margin of error plus or minus 3.1 percentage points. The poll surveyed 972 people.

They mean that the sample proportion x̄ of people who "approve" of President Clinton is 0.64. Normally they don't tell us the level of confidence they are using. Assuming that they are using a 95 percent confidence interval, they mean that

E = z_{α/2}/√(4n) = 1.96/√(4×972) = 0.031.
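The poll arithmetic can be checked in a few lines. This sketch also runs the formula backwards to recover the confidence level implied by a reported margin of error; the helper name implied_confidence is an assumption, and it presumes that the reported figure is the conservative margin of error.

```python
from math import sqrt
from statistics import NormalDist

n = 972
z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for 95 percent confidence
E = z / sqrt(4 * n)              # conservative margin of error
print(round(E, 3))               # about 0.031, i.e. 3.1 percentage points

def implied_confidence(margin, n):
    """Confidence level 1 - alpha for which z_{alpha/2} / sqrt(4n) equals margin."""
    z_value = margin * sqrt(4 * n)
    return 2.0 * NormalDist().cdf(z_value) - 1.0

print(round(implied_confidence(0.031, n), 3))  # close to 0.95
```

The recovered level is slightly below 0.95 only because the reported margin was rounded to 3.1 points.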


Use of Calculators (if you have a TI-83): 1-PropZint
  1. Press stat and then select TESTS.
  2. Select 1-PropZint and enter.
  3. Feed in the values of the number of successes x, n and c-level.
  4. Select calculate and enter. It will give you the confidence interval.
  5. The margin of error = e = (width of the interval)/2.
  6. To compute the conservative margin of error, use the formula in the definition.
  7. To compute the sample size, use the formula above.

Problems on 7.4: About the Population Proportion

Exercise 7.4.1. In a sample of 197 apples from a lot, 19 were found to be sour. Construct a 99 percent confidence interval for the proportion p of sour apples in the lot.
Solution

Exercise 7.4.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 97 of them developed immunity. Find a 95 percent confidence interval for the proportion p of individuals in the population for whom the vaccine would help.
Solution

Exercise 7.4.3. Before a congressional election, a poll was conducted. Out of 887 randomly selected voters interviewed, 389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.

  1. Construct a 98 percent confidence interval for the proportion p of voters who would vote for A.
  2. Construct a 98 percent confidence interval for the proportion p of voters who would vote for B.
  3. What is the conservative margin of error for both?

Solution

Exercise 7.4.4. If a pollster wanted to estimate the proportion p of Americans who think that the President should not be impeached, how large a sample should he/she take so that the true value of p will be within .02 of the sample proportion, with 99 percent confidence?
Solution

Exercise 7.4.5. The proportion p of defective lightbulbs produced by a machine needs to be estimated within .01 to determine whether the machine needs to be replaced. How large a sample should we take to do this with 90 percent confidence?
Solution

Exercise 7.4.6. In a poll released on October 28, 1998, it was revealed that 60 percent of Americans wanted President Clinton rebuked but not impeached. The poll was conducted among 1,013 adults, and it had a margin of error of 3 percentage points.

  1. Can you relate the last two numbers?
  2. What is the level of confidence used here?

Solution: News media polls use 95 percent confidence intervals. When they say "margin of error," they mean "conservative margin of error." The conservative margin of error E and level of confidence 1 - α are related by the formula E = z_{α/2}/√(4n). For this problem E = .03, 1 - α = .95, and n = 1,013. We can check z_{α/2}/√(4n) = 1.96/√(4×1013) = 0.03079.
