Satyagopal Mandal
Department of Mathematics
University of Kansas
Office: 624 Snow Hall  Phone: 785-864-5180
  • e-mail: mandal@math.ukans.edu
  • © Copyright laws apply. My students have permission to copy.

    The Distribution of the Sample Mean

     

    As in the chapter on the Binomial Distribution, the final theorem of this chapter is that the sample mean

    X̄ = (X1 + X2 + … + Xn)/n

    has, approximately, a normal distribution.

     

     

    Given a set of data, the mean or average x̄ (or A) that we computed in the previous chapters is, in fact, the observed value of a random variable X̄, called the sample mean.

    Similarly, the standard deviation s that we computed before is the observed value of a random variable S, called the sample standard deviation.

    Each time you collect a sample, the computed sample mean x̄ is the value of the random variable X̄ for that sample.

    Our point of view is explained in the following example.

     

    Example: Suppose we want to study the height distribution of the US population. We collect a sample of size n = 713 as follows:

    Data on Height (in inches) of 713 individuals:

    71  62  67  73  61  58  63  58  69  68  55  57
    51  57  49  63  63  64  72  59  67  59  57  69
    55  56  65  66  53  53  51  66  68  71  61  63

    and so on.

     

    Our point of view is that the heights x1, x2, x3, …, xn (in our case 71, 62, 67, …) are, in fact, the observed values of random variables X1, X2, X3, …, Xn, respectively. Here X1 denotes the height of the first member of the sample, which could be the height of anybody in the whole US population; in our sample the value of X1 is 71. Similarly, X2 denotes the height of the second member of the sample, which could again be the height of anybody in the whole US population; in our sample the value of X2 is 62.

    Each time we collect a sample

    x1, x2, x3, …, xn

    the observed values of X1, X2, X3, …, Xn will be different. But whatever sample we collect, its members x1, x2, x3, …, xn are values of the same set of random variables

    X1, X2, X3, …, Xn.

     

    Definition: We define the sample mean X̄ as the random variable

    X̄ = (X1 + X2 + … + Xn)/n.

    So, each time we collect a sample of size n, we get a value x̄ of X̄, namely the average of the sample x1, x2, x3, …, xn.
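
As a small illustration, the observed value of the sample mean can be computed for the 36 heights actually listed in the example. This is only a sketch: the full sample has 713 members, the rest of which are not shown, so the result is the mean of the listed part only. The variable names are chosen here for illustration, and `stdev` uses the n - 1 divisor, which may or may not match the convention of the earlier chapters.

```python
from statistics import mean, stdev

# The 36 heights (in inches) listed in the example; the remaining
# observations of the 713 are not shown in the notes, so they are not used.
heights = [
    71, 62, 67, 73, 61, 58, 63, 58, 69, 68, 55, 57,
    51, 57, 49, 63, 63, 64, 72, 59, 67, 59, 57, 69,
    55, 56, 65, 66, 53, 53, 51, 66, 68, 71, 61, 63,
]

n = len(heights)
x_bar = mean(heights)   # observed value of the sample mean
s_obs = stdev(heights)  # observed sample standard deviation (n - 1 divisor)

print(f"n = {n}, sample mean = {x_bar:.2f}, sample sd = {s_obs:.2f}")
```

A different sample of 36 people would give a different list, and therefore a different observed sample mean.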

     

    Remark: The main point here is that when we collect a sample and compute its mean (or average) x̄, the value we get is probabilistic, or "chancy". So we can, and must, talk about the probability distribution of X̄. If we know the distribution of X̄, then we can answer questions about the probability of the various values of x̄ that we may get.

     

    We could make similar comments and definitions about the standard deviation. But we may not need them.

     

    If we let X denote the random variable "the height of an American", then we also say that

    X1, X2, X3, …, Xn

    is a sample from the X-population. We used the example of the height distribution of the US population to explain our point of view. But given any random variable X (such as weight, wages, or a binomial random variable), we can talk about a sample

    X1, X2, X3, … , Xn

    from the X-population.

     

    Properties: Suppose X is a random variable and let

    X1, X2, X3, … , Xn

    be a sample from the X-population. Then we have the following properties.

    1. Suppose X has mean μ and standard deviation σ. Then X is called the parent or population random variable, and μ and σ are called the population mean and the population standard deviation.
    2. Each sample member Xi has the same distribution as X. So the mean of Xi is μ and the standard deviation of Xi is σ.
    3. The sample members X1, X2, X3, …, Xn are all independent.
    4. The distribution of the sample mean X̄ is called the sampling distribution of X̄.

    Theorem: The mean of the sample mean X̄ is equal to the population mean μ. So,

    mean(X̄) = mean(X) = μ.

    The standard deviation of the sample mean X̄ is given by

    σ_X̄ = σ/√n.
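
The theorem can be checked numerically with a short simulation. The sketch below is not part of the original notes. It assumes a hypothetical population (one roll of a fair die) and an arbitrary sample size of 25. It draws many samples, records each sample mean, and compares the mean and standard deviation of those sample means with the values the theorem predicts.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: one roll of a fair die.
faces = [1, 2, 3, 4, 5, 6]
pop_mean = mean(faces)   # population mean, 3.5
pop_sd = pstdev(faces)   # population standard deviation, about 1.708

n = 25         # sample size (an arbitrary choice)
reps = 20000   # number of independent samples to draw

# Each entry is the observed sample mean of one sample of size n.
sample_means = [sum(random.choices(faces, k=n)) / n for _ in range(reps)]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
predicted_sd = pop_sd / n ** 0.5   # population sd divided by sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f}  (population mean {pop_mean})")
print(f"sd of sample means:   {sd_of_means:.3f}  (predicted {predicted_sd:.3f})")
```

The two printed pairs should agree closely, though not exactly, because the simulation itself is subject to chance.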

     

    The Central Limit Theorem: Suppose

    X1, X2, X3, … , Xn

    is a sample from a population X with mean μ and standard deviation σ, and suppose the sample size n is large.

    1. Then the sample mean X̄ is, approximately, normally distributed with mean μ and standard deviation σ_X̄ = σ/√n.
    2. So, approximately,

    P(a < X̄ < b) = P((a - μ)/σ_X̄ < Z < (b - μ)/σ_X̄),

    where Z is a standard normal random variable.
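
The approximation in the Central Limit Theorem translates directly into a small helper. This is a minimal sketch: the function name `prob_sample_mean_between` and the numbers in the usage line are hypothetical, chosen only for illustration. It uses `statistics.NormalDist` from the Python standard library (version 3.8 or later) for the standard normal distribution function.

```python
from statistics import NormalDist


def prob_sample_mean_between(a, b, pop_mean, pop_sd, n):
    """Approximate P(a < sample mean < b) using the Central Limit Theorem."""
    sd_of_mean = pop_sd / n ** 0.5   # standard deviation of the sample mean
    z = NormalDist()                 # standard normal, mean 0 and sd 1
    lower = (a - pop_mean) / sd_of_mean
    upper = (b - pop_mean) / sd_of_mean
    return z.cdf(upper) - z.cdf(lower)


# Hypothetical usage: population mean 100, sd 15, sample size 36,
# so the sample mean has sd 15/6 = 2.5 and the limits are z = -2 and z = 2.
p = prob_sample_mean_between(95, 105, pop_mean=100, pop_sd=15, n=36)
print(f"approximate probability: {p:.4f}")
```

A one-sided probability such as P(sample mean > a) can be obtained from the same helper by passing `float("inf")` as `b`.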

     

     

     

    Problems on Sampling

     

    Ex.1: It is known that the tuition X paid per semester by students in a university has a distribution with mean μ = $2,050 and standard deviation σ = $310. If 64 students are interviewed, what is the approximate probability that the sample mean tuition X̄ paid will be above $2,060?

     

    Solution: Here we are asked to compute P(X̄ > 2,060). Here n = 64.

    The mean of X̄ is μ = 2,050 and the standard deviation of X̄ is

    σ_X̄ = σ/√n = 310/√64 = 310/8 = 38.75.

    So,

    P(X̄ > 2,060) =

    P((X̄ - μ)/σ_X̄ > (2,060 - μ)/σ_X̄) =

    P(Z > (2,060 - 2,050)/38.75) =

    P(Z > 0.26) =

    1 - P(Z < 0.26) = 1 - 0.6026 = 0.3974.
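
The arithmetic of this solution can be reproduced without a normal table. The sketch below follows the same steps (standard deviation of the sample mean, then the z-score, then the upper-tail probability) but keeps z unrounded, so its answer may differ in the third decimal place from a value read off a two-decimal table. The variable names are chosen here for illustration.

```python
from math import erfc, sqrt

pop_mean = 2050.0    # population mean tuition, in dollars
pop_sd = 310.0       # population standard deviation, in dollars
n = 64               # number of students interviewed
threshold = 2060.0   # we want P(sample mean > threshold)

sd_of_mean = pop_sd / sqrt(n)              # 310/8 = 38.75
z = (threshold - pop_mean) / sd_of_mean    # standardized threshold

# Upper tail of the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
p = 0.5 * erfc(z / sqrt(2))

print(f"sd of sample mean = {sd_of_mean}, z = {z:.4f}, P = {p:.4f}")
```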

     

    Ex.2: The annual rainfall X in a region has a distribution with mean μ = 22 cm and standard deviation σ = 14 cm. What is the approximate probability that over the next 36 years the mean annual rainfall X̄ will exceed 27 cm?

     

    Solution: Here we are asked to compute P(X̄ > 27). Here n = 36.

    The mean of X̄ is μ = 22 and the standard deviation of X̄ is

    σ_X̄ = σ/√n = 14/√36 = 14/6 = 2.333.

    So,

    P(X̄ > 27) =

    P((X̄ - μ)/σ_X̄ > (27 - μ)/σ_X̄) =

    P(Z > (27 - 22)/2.333) =

    P(Z > 2.14) =

    1 - P(Z < 2.14) = 1 - 0.9838 = 0.0162.
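
The answer above is an approximation, and a simulation gives a feel for how good it is. The exercise does not say what the rainfall distribution looks like, so the sketch below ASSUMES a gamma distribution whose mean and standard deviation match the stated 22 cm and 14 cm. That choice is purely illustrative and is not part of the original problem. The simulated proportion can then be compared with the normal-approximation answer.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

pop_mean, pop_sd = 22.0, 14.0   # values stated in the exercise
n = 36                          # number of years
threshold = 27.0                # we want P(sample mean > threshold)

# ASSUMPTION (not from the exercise): annual rainfall follows a gamma
# distribution with the stated mean and standard deviation.
shape = (pop_mean / pop_sd) ** 2   # mean^2 / variance
scale = pop_sd ** 2 / pop_mean     # variance / mean

reps = 10000   # number of simulated 36-year periods
exceed = 0
for _ in range(reps):
    total = sum(random.gammavariate(shape, scale) for _ in range(n))
    if total / n > threshold:
        exceed += 1

p_sim = exceed / reps
print(f"simulated P(sample mean > {threshold}) = {p_sim:.4f}")
```

Under this assumed, right-skewed population the simulated value need not equal the normal-approximation value exactly; the gap shrinks as n grows.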

     

    Ex.3: The amount X of ice cream in an ice-cream cone has mean μ = 3.2 ounces and standard deviation σ = 0.4 ounces. If there are 50 children at a birthday party, what is the approximate probability that the mean consumption X̄ will be more than 3.3 ounces?

     

    Ex.4: A cigarette manufacturer claims that the mean nicotine content in a cigarette is m = 2 mg with the standard deviation s = 0.3 mg. If this claim is valid, what is the approximate probability that a sample of n = 900 cigarettes will yield have a sample mean X nicotine content more than 2.02 mg?