Math 365, Elementary Statistics

Lesson 8 : Comparing Two Populations

Introduction

In this lesson we try to compare two populations. We will consider the following:

  1. Compute a confidence interval for the difference μ1 - μ2 of the means of two populations. For example, we may want to estimate the difference μ1 - μ2 between the mean annual male income μ1 and the mean annual female income μ2 in the United States.
  2. Compute a confidence interval for the difference p1 - p2 of the proportions of two populations that have a given attribute (the proportions of "success"). For example, we may want to estimate the difference p1 - p2 between the proportion p1 of defective items produced by a new machine and the proportion p2 of defective items produced by an old machine.

8.1 Confidence Interval of μ1 - μ2

Suppose X and Y are two similar random variables. Let the mean and standard deviation of X be μ1 and σ1, respectively, and let the mean and standard deviation of Y be μ2 and σ2, respectively. We want to compute a confidence interval for the difference μ1 - μ2, so we proceed as follows.

  1. We draw a sample X1, X2, …, Xm, of size m, from the X population and we draw a sample Y1, Y2, …, Yn, of size n, from the Y population. Let

    $\bar{X} = (X_1 + X_2 + \dots + X_m)/m$

    $\bar{Y} = (Y_1 + Y_2 + \dots + Y_n)/n$

    be the corresponding sample means.

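  2. By the CLT, the sample mean $\bar{X}$ has approximately an

    $N(\mu_1, \sigma_1/\sqrt{m})$

    distribution and the sample mean $\bar{Y}$ has approximately an

    $N(\mu_2, \sigma_2/\sqrt{n})$

    distribution. (Here, as elsewhere in this lesson, the second parameter is the standard deviation.)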

  3. You would agree that $\bar{X} - \bar{Y}$ is a natural estimator of μ1 - μ2.
  4. Now we assume that the X sample and the Y sample are mutually independent. In that case, it follows that $\bar{X} - \bar{Y}$ has an

    $N(\mu_1 - \mu_2, \sigma)$ distribution,

    where

    $\sigma = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$.

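  5. It follows that

    $P\left(-z_{\alpha/2} < \frac{(\bar{X}-\bar{Y}) - (\mu_1 - \mu_2)}{\sigma} < z_{\alpha/2}\right) = 1 - \alpha,$

    where σ is as above in (4).

  6. If we simplify, we get

    $P\left(\bar{X}-\bar{Y} - z_{\alpha/2}\,\sigma < \mu_1 - \mu_2 < \bar{X}-\bar{Y} + z_{\alpha/2}\,\sigma\right) = 1 - \alpha,$

    where σ is as above in (4).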

  7. Theorem. A (1-α)100 percent confidence interval for μ1 - μ2 is given by

    $\bar{x}-\bar{y} - z_{\alpha/2}\,\sigma < \mu_1 - \mu_2 < \bar{x}-\bar{y} + z_{\alpha/2}\,\sigma$

    where σ is as above in (4).

    This formula is usable if we know the values of σ1 and σ2.

  8. The margin of error is given by

    $E = z_{\alpha/2}\,\sigma$

    where σ is as above in (4).
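
The interval in (7) can be computed directly once σ1 and σ2 are known. The following is a minimal Python sketch, not part of the original lesson: the function name `two_sample_z_interval` and the numbers in the usage lines are made up for illustration. It assumes only the standard library, using `statistics.NormalDist` to obtain zα/2.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(xbar, ybar, sigma1, sigma2, m, n, conf=0.95):
    """Confidence interval for mu1 - mu2 when sigma1 and sigma2 are known.

    Returns (lower, upper, margin_of_error).
    """
    alpha = 1.0 - conf
    # z_{alpha/2}: the point with area alpha/2 to its right under N(0, 1)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # sigma from step (4): standard deviation of Xbar - Ybar
    sigma = sqrt(sigma1**2 / m + sigma2**2 / n)
    margin = z * sigma
    diff = xbar - ybar
    return diff - margin, diff + margin, margin


# Illustrative (made-up) inputs, not taken from the exercises.
lower, upper, margin = two_sample_z_interval(
    xbar=10.0, ybar=8.0, sigma1=3.0, sigma2=4.0, m=36, n=64, conf=0.95
)
print(f"{lower:.4f} < mu1 - mu2 < {upper:.4f}  (E = {margin:.4f})")
```

As with the calculator, the margin of error can also be recovered from the result as half the width of the interval.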


Use of Calculators (if you have a TI-83): 2-SampZinterval
  1. Press stat and then select TESTS.
  2. Select 2-SampZinterval and enter.
  3. Input: you will have to select stats (not data) in this section.
  4. Enter the values of σ1, σ2, $\bar{x}$, $\bar{y}$, m, n and the c-level.
  5. Select calculate and enter. It will give you the confidence interval.
  6. The margin of error = E = (width of the interval)/2.

Problems on 8.1: Confidence Interval of μ1 - μ2

Exercise 8.1.1. Suppose we have two normal populations with means μ1, μ2 and standard deviations σ1, σ2, respectively. It is known that σ1 = 8.1 and σ2 = 11.3. A sample of size m = 64 was collected from the first population, and the sample mean was found to be $\bar{x} = 3.7$. A sample of size n = 99 was collected from the second population, and the sample mean was found to be $\bar{y} = 4.1$. Compute a 95 percent confidence interval for the difference of the means μ1 - μ2.
Solution

Exercise 8.1.2. The birth weights of babies in developed and developing countries are normally distributed with means μ1, μ2 and standard deviations σ1, σ2, respectively. (My data is not real.) We are given σ1 = 2.3 pounds and σ2 = 2.9 pounds. A sample of m = 35 babies from the developed nations was collected and the sample mean birth weight was found to be $\bar{x} = 8.9$ pounds. A sample of n = 48 babies from the developing nations was collected and the sample mean birth weight was found to be $\bar{y} = 7.1$ pounds.

  1. Compute a point estimate of the difference of mean birth weight μ1- μ2.
  2. Determine the margin of error of the difference μ1- μ2 at the 95 percent level of confidence.
  3. Construct a 95 percent confidence interval for μ1- μ2.

Solution

Exercise 8.1.3. African elephants and Indian elephants differ in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. The mean and standard deviation of the height of African elephants are μ1 and σ1 = 1.2 feet, respectively. The mean and standard deviation of the height of Indian elephants are μ2 and σ2 = 1.1 feet, respectively. A sample of 25 African elephants was collected and the sample mean height was found to be $\bar{x} = 10.9$ feet. A sample of 28 Indian elephants was collected and the sample mean height was found to be $\bar{y} = 9.1$ feet.

  1. Compute a point estimate of the difference of mean height μ1- μ2.
  2. Determine the maximum error of the difference μ1- μ2 at the 99 percent level of confidence.
  3. Construct a 99 percent confidence interval for μ1- μ2.

Solution



8.2 When σ1 and σ2 are Unknown

As in the last section, we have two populations X, Y. We assume that X has N(μ1, σ1) distribution and Y has N(μ2, σ2) distribution. Unlike in the last section, we assume that σ1, σ2 are unknown. We try to find a confidence interval for μ1 - μ2.

We take a sample X1, X2, …, Xm of size m from the X population, and we take a sample Y1, Y2, …, Yn of size n from the Y population. Following are some facts and notation.

  1. Assumptions: We make the important assumption that the variances $\sigma_1^2$ and $\sigma_2^2$ are equal. So, we write

    $\sigma_1 = \sigma_2 = \sigma.$

    We also assume that the X-sample and the Y-sample are mutually independent.

  2. Let $\bar{X}$ and $S_X^2$ be the sample mean and sample variance of the X-sample and let $\bar{Y}$ and $S_Y^2$ be the sample mean and sample variance of the Y-sample.

  3. Definition. Define the pooled estimate $S_p^2$ for $\sigma^2$ as follows:

    $S_p^2 = \frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2} = \frac{\sum_{i=1}^{m}(X_i-\bar{X})^2 + \sum_{j=1}^{n}(Y_j-\bar{Y})^2}{m+n-2}$

    Although both $S_X^2$ and $S_Y^2$ are estimators of $\sigma^2$, $S_p^2$ is a better estimator of $\sigma^2$ because it uses both samples. One can see that $S_p^2$ is a weighted average of $S_X^2$ and $S_Y^2$.

  4. It follows that

    $T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{1/m + 1/n}}$

    has a t-distribution with m+n-2 degrees of freedom.

  5. Using the same kind of computations that we have done before, we see that a (1-α)100 percent confidence interval for μ1 - μ2 is given by

    $\bar{x}-\bar{y}-E < \mu_1 - \mu_2 < \bar{x}-\bar{y}+E$

    where

    $E = t_{m+n-2,\,\alpha/2}\; s_p\sqrt{1/m + 1/n}$
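
The pooled estimate in (3) and the interval in (5) can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the original lesson: the function names and the summary statistics are made up, and the critical value `t_crit` is assumed to be looked up by the reader in a t-table (or obtained from software) for m+n-2 degrees of freedom, since the standard library has no t-distribution.

```python
from math import sqrt


def pooled_variance(s1_sq, s2_sq, m, n):
    """Pooled estimate S_p^2: weighted average of the two sample variances."""
    return ((m - 1) * s1_sq + (n - 1) * s2_sq) / (m + n - 2)


def pooled_t_interval(xbar, ybar, s1_sq, s2_sq, m, n, t_crit):
    """Confidence interval for mu1 - mu2 assuming equal, unknown variances.

    t_crit is the table value t_{m+n-2, alpha/2} supplied by the caller.
    Returns (lower, upper, margin_of_error).
    """
    sp = sqrt(pooled_variance(s1_sq, s2_sq, m, n))
    margin = t_crit * sp * sqrt(1.0 / m + 1.0 / n)
    diff = xbar - ybar
    return diff - margin, diff + margin, margin


# Illustrative (made-up) summary statistics, not taken from the exercises.
# t_crit = 2.086 is the table value t_{20, 0.025}, i.e. a 95 percent interval
# with m + n - 2 = 10 + 12 - 2 = 20 degrees of freedom.
sp_sq = pooled_variance(s1_sq=4.0, s2_sq=9.0, m=10, n=12)
lower, upper, margin = pooled_t_interval(
    xbar=20.0, ybar=17.0, s1_sq=4.0, s2_sq=9.0, m=10, n=12, t_crit=2.086
)
print(f"sp^2 = {sp_sq:.2f}")
print(f"{lower:.4f} < mu1 - mu2 < {upper:.4f}  (E = {margin:.4f})")
```

Passing the critical value as an argument keeps the sketch dependency-free. A statistics package that provides t-quantiles could compute it instead.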



Use of Calculators (if you have a TI-83): 2-SampTint
  1. If we have raw data, enter the data into the calculator in 2 lists (say L1,L2).
  2. Press stat and then select TESTS.
  3. Select 2-SampTinterval and enter.
  4. Input: you will have to select stats or data, depending on what is given.
  5. Enter the sample standard deviations s1, s2, the sample means $\bar{x}$, $\bar{y}$ and the sample sizes m, n (or the lists where you have the data), and the c-level.
  6. Select calculate and enter. It will give you the confidence interval and also the pooled estimate of the equal standard deviation σ.
  7. The margin of error = E = (width of the interval)/2.
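
When raw data are given, the point estimate and the pooled estimate can also be computed from the two lists without a calculator. The sketch below is an illustration only: the function names and the two short data lists are hypothetical. It computes the pooled variance in two ways, as a weighted average of the sample variances and from the sums of squared deviations, to show that the two expressions in the definition agree.

```python
from statistics import mean, variance


def pooled_variance_from_variances(xs, ys):
    """Pooled variance as a weighted average of the two sample variances."""
    m, n = len(xs), len(ys)
    # statistics.variance is the sample variance (it divides by size - 1)
    return ((m - 1) * variance(xs) + (n - 1) * variance(ys)) / (m + n - 2)


def pooled_variance_from_deviations(xs, ys):
    """Pooled variance from the sums of squared deviations about each mean."""
    xbar, ybar = mean(xs), mean(ys)
    ss_x = sum((x - xbar) ** 2 for x in xs)
    ss_y = sum((y - ybar) ** 2 for y in ys)
    return (ss_x + ss_y) / (len(xs) + len(ys) - 2)


# Hypothetical data lists (playing the role of L1 and L2 on the calculator).
xs = [5.0, 7.0, 9.0]
ys = [2.0, 4.0, 6.0, 8.0]

point_estimate = mean(xs) - mean(ys)
sp_sq_a = pooled_variance_from_variances(xs, ys)
sp_sq_b = pooled_variance_from_deviations(xs, ys)
print(point_estimate, sp_sq_a, sp_sq_b)
```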

Problems on 8.2: When σ1 and σ2 Are Unknown

Exercise 8.2.1. Suppose that we are comparing two "similar" normal populations with means μ1, μ2, respectively, and that both populations have standard deviation σ. We collected a sample of size m = 11 from the first population that produced a sample mean $\bar{x} = 13.2$ and sample standard deviation s1 = 2.33. A sample of size n = 13 was collected from the second population that had sample mean $\bar{y} = 11.5$ and sample variance s2 = 2.73.

  1. Compute the pooled estimate sp for σ.
  2. Find a point estimate for μ1 - μ2.
  3. Compute the margin of error in estimating μ1 - μ2 at the 90 percent level of confidence.
  4. Compute a 90 percent confidence interval for μ1 - μ2.

Solution

Exercise 8.2.2. Suppose we have two normal populations with means μ1, μ2 and equal standard deviation σ. A sample of size m = 64 was collected from the first population and the sample mean and standard deviation were found to be $\bar{x} = 3.7$, s1 = 9.2. A sample of size n = 99 was collected from the second population and the sample mean and standard deviation were $\bar{y} = 4.1$, s2 = 8.7.

  1. Compute the pooled estimate sp for σ.
  2. Compute the margin of error for a 95 percent confidence interval for μ1 - μ2.
  3. Compute a 95 percent confidence interval for the difference of the means μ1 - μ2.

Solution

Exercise 8.2.3. The birth weights of babies in developed and developing countries are normally distributed with means μ1, μ2 and equal standard deviation σ. (My data is not real.) Suppose the following data on birth weights from developed and developing nations were collected.

Developed
8.8 8.1 6.3 9.7 6.3
7.1 5.3 7.7 9.1 8.1
8.2 7.9 8.3 8.9 9.0
10.1 9.9 8.8 7.8 5.2
7.2        
 
Developing
6.3 5.2 8.3 5.9 5.5
7.1 8.1 7.9 6.3 6.9
9.1 8.1 7.0 4.9 5.3
6.3 7.1 6.3 6.1 5.8
5.7 6.8 8.3 7.7  
  1. Compute a point estimate of the difference of mean birth weight μ1- μ2.
  2. Compute the pooled estimate for σ.
  3. Determine the maximum error of the difference μ1- μ2 at the 95 percent level of confidence.
  4. Construct a 95 percent confidence interval for μ1- μ2.

Solution

Exercise 8.2.4. African elephants and Indian elephants differ in height, weight, and length of ear and tusk. It is natural to assume that all these are normally distributed. Assume that the heights of African and Indian elephants have an equal standard deviation σ. The mean heights of African elephants and Indian elephants are μ1, μ2, respectively. Suppose the following data were collected on the heights of elephants from the two continents (these are not real data).

African
10.9 11.7 9.3 9.9 11.5
8.8 12.9 11.7 9.1 11.1
9.1 8.7 10.5 11.3 12.3
13.1 12.9 9.5 10.7 11.3
  
Indian
7.1 8.3 8.2 9.1 10.3
9.3 9.7 8.9 8.8 9.1
7.9 9.9 9.2 8.8 8.1
8.7 8.8 9.3 10.1 9.9
9.9        

  1. Compute a point estimate of the difference of mean height μ1- μ2.
  2. Determine the maximum error of the difference μ1- μ2 at the 99 percent level of confidence.
  3. Construct a 99 percent confidence interval for μ1- μ2.

Solution


8.3 Comparing Two Population Proportions

In this section, we compute a confidence interval for the difference p1-p2 of two population proportions. An example follows.

Example. We would like to have an estimate for the difference between the proportion p1 of males who are making more than fifty thousand dollars annually and the proportion p2 of females who are making more than fifty thousand dollars annually. We construct a confidence interval for p1-p2.

Similarly, we might like to compare the proportion of defective items produced by an old machine and new machine in a factory.

Assume we have two populations. Let p1 be the proportion of Population 1 that has an attribute A and let p2 be the proportion of Population 2 that has the attribute A. We want to compute a confidence interval for p1-p2.

So, we take a sample of size m from Population 1 and let X be the number of sample members that have the attribute A and $\bar{X} = X/m$ be the sample proportion that has the attribute A. (We may say that X is the number of "successes" in this sample from Population 1 and $\bar{X} = X/m$ is the proportion of "successes".) We take a sample of size n from Population 2, which is independent of the other sample. Let Y be the number of sample members that have the attribute A and $\bar{Y} = Y/n$ be the sample proportion that has the attribute A. (So, $\bar{Y} = Y/n$ is the sample proportion of "successes" from Population 2.)

(Let me explain this in the context of the example above. We interview m males; X would be the number of males in this sample who make more than fifty thousand dollars annually and $\bar{X} = X/m$ would be the proportion of the males in this sample who make more than fifty thousand dollars annually. Similarly, we interview n females; Y would be the number of females in this sample who make more than fifty thousand dollars annually and $\bar{Y} = Y/n$ would be the corresponding sample proportion.)

We develop a confidence interval for p1-p2 as follows.

  1. Notation. For the sample proportions, we have the following notation:

    $\bar{X} = X/m$

    $\bar{Y} = Y/n$

  2. As we have seen before, by the CLT, $\bar{X}$ has an $N(p_1, \sigma_1)$ distribution where $\sigma_1 = \sqrt{p_1(1-p_1)/m}$, and $\bar{Y}$ has an $N(p_2, \sigma_2)$ distribution where $\sigma_2 = \sqrt{p_2(1-p_2)/n}$.

  3. You would agree that $\bar{X} - \bar{Y}$ is a natural estimator of p1 - p2.

  4. As we have assumed that the X sample and the Y sample are mutually independent, it follows that $\bar{X} - \bar{Y}$ has an $N(p_1 - p_2, \sigma)$ distribution where $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$.

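  5. So, it follows that

    $P\left(-z_{\alpha/2} < \frac{(\bar{X}-\bar{Y}) - (p_1-p_2)}{\sigma} < z_{\alpha/2}\right) = 1-\alpha$

  6. If we simplify, we get

    $P\left((\bar{X}-\bar{Y}) - z_{\alpha/2}\,\sigma < p_1-p_2 < (\bar{X}-\bar{Y}) + z_{\alpha/2}\,\sigma\right) = 1-\alpha$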

  7. As in section 7.4, we use $\bar{X}$ as an estimate for p1 and $\bar{Y}$ as an estimate for p2 and get the following theorem.

    Theorem. An approximate (1-α)100 percent confidence interval for p1 - p2 is given by

    $\bar{X}-\bar{Y}-E < p_1-p_2 < \bar{X}-\bar{Y}+E$

    where

    $E = z_{\alpha/2}\sqrt{\bar{X}(1-\bar{X})/m + \bar{Y}(1-\bar{Y})/n}$

  8. E is called the margin of error.
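
The theorem in (7) translates into a few lines of Python. The sketch below is an illustration, not part of the original lesson: the function name `two_proportion_z_interval` and the counts in the usage lines are made up, and zα/2 is taken from `statistics.NormalDist` in the standard library.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_interval(x, m, y, n, conf=0.95):
    """Approximate confidence interval for p1 - p2.

    x, y are the numbers of "successes" in samples of sizes m, n.
    Returns (lower, upper, margin_of_error).
    """
    p1_hat = x / m  # sample proportion from Population 1
    p2_hat = y / n  # sample proportion from Population 2
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Estimated standard deviation of the difference of sample proportions
    se = sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
    margin = z * se
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin, margin


# Illustrative (made-up) counts, not taken from the exercises.
lower, upper, margin = two_proportion_z_interval(x=40, m=100, y=30, n=120, conf=0.95)
print(f"{lower:.4f} < p1 - p2 < {upper:.4f}  (E = {margin:.4f})")
```

Because the unknown p1 and p2 are replaced by the sample proportions inside the square root, the interval is only approximate, as the theorem states.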

Use of Calculators (if you have a TI-83): 2-PropZint
  1. Press stat and then select TESTS.
  2. Select 2-PropZint and enter.
  3. Feed in the values of number of successes x, y, sample sizes n1, n2 and c-level.
  4. Select calculate and enter. It will give you the confidence interval.
  5. The margin of error = E = (width of the interval)/2.

Problems on 8.3: Comparing Two Population Proportions

Exercise 8.3.1. Suppose two independent samples were collected from two populations. We want to compare the proportions p1, p2, respectively, of an attribute A present in these two populations. Use a 95 percent confidence interval to estimate p1 - p2. We are given that x = 55 had the attribute A in a sample of size m = 117 from the first population and y = 37 had the attribute A in a sample of size n = 79 from the second population.
Solution

Exercise 8.3.2. To compare the proportions p1, p2 of defective items produced by new and old machines, respectively, samples were collected. In a sample of 57 items from the new machine, 6 were found to be defective; and in a sample of 41 items from the old machine, 9 were defective. Compute a 99 percent confidence interval for p1 - p2.
Solution

Exercise 8.3.3. To compare the proportions p1,p2 of men and women, respectively, who watch football, data was collected. In a sample of 199 men, 83 said that they watch football; and in a sample of 161 women, 51 said they watch football. (These are not real data.) Construct a 99 percent confidence interval for p1-p2.
Solution
