Satyagopal Mandal
Department of Mathematics
University of Kansas
Office: 624 Snow Hall  Phone: 785-864-5180
  • e-mail: mandal@math.ukans.edu
  • © Copyright laws apply. My students have permission to copy.

    Chapter 13: Census, Surveys, Studies

    Statistics is a science that develops and formulates techniques in order to make inferences about a large population by studying a small sample.

    What is Data?

    When information is packaged in numerical form it is called DATA.

    According to your text, statistics is the science of dealing with data. This includes collecting, organizing, understanding and interpreting data.

    What is the Population?

    In statistics we try to understand or make inferences or projections about a group of similar objects. Such a collection of individuals or objects that is under study is called the POPULATION.

    Example 1. If we are studying the income distribution of Americans, then the population is the American population.

    Example 2. If we are studying the income distribution of the immigrant American population, then the population is the immigrant American population.

    Example 3. If we are studying the growth of the fish population in Clinton Lake, then the population is the fish population in Clinton Lake.

    Example 4. If we are studying African elephants, then the population is the population of African elephants.

    (Please have a look at Examples 1-2, page 426.)

    The N-VALUE: The total number of members in the population under study is called the N-value of the population. If an accurate head count of all the members in the population were possible then we would know the N-value of the population. Often such a head count will not be possible. In that case this N-value will not be known.

    (Please have a look at Examples 3-5, page 426.)

    Census

    Article 1, Section 2 of the Constitution of the United States mandates that a national census be conducted every 10 years. By census we mean an official enumeration of the population. The census is not unique to the United States: many countries conduct one, often every 10 years. Following are a few comments about the census:

    1. Originally, the intent of the census was to count heads for taxation and representation.
    2. Now the census is one of the major sources of data about the population.
    3. The census has often failed to count all the members of the population. It is believed that a complete count is not possible.
    4. For the year 2000 census, it was proposed that the US population be counted using statistical techniques. Congress and the Administration fought over this proposal, and it was challenged in court.
    (Please read about the census on page 428.)

    Surveys

    A more realistic and economical alternative to a census is to collect data only from a small subgroup and then use this data to make inferences about the whole population. This approach is called a survey, and the subgroup of the population from which the data is collected is called a sample.

    The basic idea behind a survey is that if we can find a sample that is "representative" of the whole population (that is, not biased), then what we need to know about the population can be estimated from the sample.

    (Please read more about surveys in your text, pages 429-430.)

    Public Opinion Polls

    We all know about public opinion polls: the Gallup poll, the Harris poll and more. Please read more about public opinion polls in your text (pages 430-434). In particular, the text discusses how and why the predictions made by various opinion polls went wrong in the presidential elections of 1936 (Franklin Roosevelt vs. Alfred Landon) and 1948 (Harry Truman vs. Thomas Dewey).

    Sampling Methods

    Picking a "representative sample" is a real challenge for a statistician. If a statistician picks the sample members personally, his/her human bias is almost bound to result in a "biased sample". Whatever method we use to pick a sample, the selection of the sample members must be done randomly. That means that mathematics and methods of chance must guide the selection of sample members. A sample picked in such a manner is called a random sample and the method is called random sampling.

    Another important concern regarding sampling is the cost of sampling. There are two methods of random sampling that we shall talk about here.

    1. In the method of simple random sampling, each member of the population has an equal chance of being selected for the sample.
    2. The other method of sampling that we discuss is called stratified sampling. The method is as follows:

    First divide the population into categories, called strata, and randomly select a sample of these strata. Each chosen stratum is then further divided into categories, called substrata, and a random sample of substrata is selected from each chosen stratum. The process is repeated a number of times.

    (Please read more about stratified sampling in your text, pages 436-437.)
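
    The two sampling methods above can be tried out on a computer. The following is only a minimal sketch in Python, not the procedure from your text: the toy population, the four "regions" used as strata, and the function names are all made up for illustration, and only two stages of the stratified method are shown.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Pick `size` members; every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, size)


def two_stage_stratified_sample(strata, num_strata, per_stratum, seed=None):
    """Randomly choose `num_strata` strata, then randomly choose
    `per_stratum` members from each chosen stratum."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(strata), num_strata)
    return {name: rng.sample(strata[name], per_stratum) for name in chosen}


# Hypothetical toy population: 4 regions (the strata) with 50 members each.
strata = {
    region: [f"{region}-{i}" for i in range(50)]
    for region in ("North", "South", "East", "West")
}
population = [member for members in strata.values() for member in members]

srs = simple_random_sample(population, 10, seed=1)
staged = two_stage_stratified_sample(strata, num_strata=2, per_stratum=5, seed=1)

print("simple random sample:", srs)
print("stratified sample:   ", staged)
```

    Note that the stratified version only ever needs full lists for the strata that were chosen, which is one reason it can be cheaper in practice.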

    Sample Size

    The sample size for a large population need not be very large. In practice, it is often less than 1500. Polls such as those reported by CNN normally sample 700-1200 people.

    Sampling: Terminology and Key Concepts

    The job of a statistician is to make inferences about a large population on the basis of a (small) sample.

    1. Any numerical value computed from the sample data is called a statistic.
    2. Any numerical value that is to be computed from the whole population data is called a parameter.

    3. Unless the population is small, the actual value of a parameter will usually not be known. On the other hand, since samples are small, we can always compute the actual values of the statistics. The game here is to estimate the parameters by appropriate statistics.

    Example. Suppose we want to understand the income distribution of the US population and we want to know the average income of the US population.

    Here average US income is a parameter.

    Since it will be almost impossible to compute the actual value of the average US income, we take a sample (say of size 1500) and compute the average income of the sample members.

    This sample average is a statistic.

    It is reasonable to use this (statistic) sample average income as an estimate for the (parameter) average US income.
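
    The difference between a parameter and a statistic can be seen in a small simulation. This is only a sketch: the "population" below is 100,000 made-up incomes generated by the computer (not real US data), so here, unlike in real life, the parameter can actually be computed and compared with the statistic.

```python
import random
from statistics import mean

rng = random.Random(2024)

# Hypothetical population of 100,000 incomes (made-up numbers, not real data).
population = [rng.lognormvariate(10.5, 0.8) for _ in range(100_000)]

# Parameter: computed from the whole population (rarely possible in practice).
parameter = mean(population)

# Statistic: computed from a simple random sample of 1500 members.
sample = rng.sample(population, 1500)
statistic = mean(sample)

# How far the estimate (statistic) is from the true value (parameter).
difference = statistic - parameter

print(f"parameter (population average): {parameter:,.0f}")
print(f"statistic (sample average):     {statistic:,.0f}")
print(f"difference:                     {difference:,.0f}")
```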

    Sampling Error

    A statistic used to estimate a parameter is only an estimate. So, we do not expect the statistic to be exactly equal to the parameter. In the above example, we would not expect the sample average income to be exactly equal to the average US income. The difference between the parameter and the statistic used to estimate it is called the sampling error.

    There are two types of sampling errors as follows:

    1. Chance error: As a sample is not the whole population, a statistic cannot be expected to equal the exact value of the parameter. Even when everything else is "perfect" and identical, two different samples will produce two different values of the statistic. So, for the same parameter, you get different estimates (i.e. values of the statistic) from different samples. The error in estimation that arises this way is called the chance error. This error arises out of sampling variability: which sample gets chosen is a matter of randomness or chance. Statisticians are comfortable with chance error for several reasons. First, by the very nature of statistical methods, this error is unavoidable. Second, this error can be controlled by increasing the sample size. Finally, the statistician can tell us how often this error will exceed a tolerable limit.
    2. Sampling bias: The error that arises due to poor sampling is called sampling bias. Although there are many sophisticated methods of sampling available, implementing them is not easy. In principle, sampling bias can be eliminated by strictly and properly implementing the sampling methods. Of course, doing so increases the cost of sampling.
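
    The claim that chance error can be controlled by increasing the sample size can be checked by simulation. The sketch below uses a made-up population of 50,000 numbers and simple random samples, so there is no sampling bias and whatever error is left is chance error. The function name and the sample sizes are chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(7)

# Hypothetical population of 50,000 values (made-up numbers).
population = [rng.uniform(0, 100) for _ in range(50_000)]
parameter = mean(population)


def average_chance_error(sample_size, trials=200):
    """Average gap between the sample mean and the parameter over many
    simple random samples of the same size."""
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        errors.append(abs(mean(sample) - parameter))
    return mean(errors)


results = {n: average_chance_error(n) for n in (25, 100, 400, 1600)}
for n, error in results.items():
    print(f"sample size {n:>4}: average chance error {error:.2f}")
```

    Different samples of the same size still give different estimates, but the typical size of the error goes down as the samples get larger.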

     

    The Capture-recapture Method: estimating N-value

    Suppose we want to estimate the number of fish in Clinton Lake. Let N be the number of fish in the lake. We will describe the capture-recapture method of estimating N. The method is as follows:

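    1. (The capture) Capture a sample of m fish, tag them, and release them back into the water.
    2. (The recapture) After everything has settled down, capture a new sample of n fish and count the number of tagged fish among them. Suppose that k of them are tagged. If the second sample is representative, the fraction of tagged fish in it should be about the same as the fraction of tagged fish in the whole lake. So it is reasonable to assume that m/N = k/n.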

    So, we have an estimate

    N = mn/k.

    Exercise. As part of a project we made two trips to a local lake. The first day we caught m=325 fish and tagged them. On the second day we caught n=525 fish and out of them k=125 were tagged fish. Give an estimate of the total number of fish in the lake.

    We have

    N = mn/k = (325 × 525)/125 = 1365.
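
    The same computation can be written as a small Python function. This is only a sketch of the formula N = mn/k; the function name and the input checks are my own additions.

```python
def capture_recapture_estimate(m, n, k):
    """Estimate the population size N from m tagged fish, a second catch
    of n fish, and k tagged fish found in the second catch."""
    if k <= 0:
        raise ValueError("need at least one tagged fish in the second catch")
    if k > min(m, n):
        raise ValueError("k cannot exceed m or n")
    return m * n / k


estimate = capture_recapture_estimate(m=325, n=525, k=125)
print(estimate)  # 1365.0
```

    If no tagged fish show up in the second catch (k = 0), the formula gives no estimate at all, which is why the function refuses that case.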

    Suggested Problems: Example 6 (page 439), Ex.31a-b (page 450)

    Clinical Studies

     

    When a vaccine or a new drug is tested, the statistical methods used are very interesting. I will not go deep into them here; please have a look at your text (page 440). The main points are as follows:

    1. We pick two samples, called the control group and the treatment group. The two samples need not have the same size.
    2. The treatment group receives the treatment and the control group does not (it typically receives a placebo instead).
    3. The members of both groups are kept ignorant of who is receiving the treatment and who is not.
    4. Finally, the two groups are compared. If the treatment group does clearly better than the control group, then this is taken as evidence that the treatment is working.

    The following are some data from the 1954 Salk Polio Vaccine Field Trials. Please see your text (page 441) for more.

     

    Results of the Salk Polio Vaccine Trials

    Group            Number of    Reported       Reported           Fatal
                     children     polio cases    paralytic cases    cases
    Treatment gr.    200,785      82             33                 0
    Control gr.      201,229      162            115                4

    You can see that the treatment group did better.
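
    Since the two groups are not exactly the same size, the fair comparison is between rates rather than raw counts. Here is a minimal Python sketch that converts the numbers in the table, exactly as given in these notes, into cases per 100,000 children. The variable and function names are chosen only for illustration.

```python
# Group sizes and case counts, copied from the table in these notes.
children = {"treatment": 200_785, "control": 201_229}
cases = {
    "reported polio": {"treatment": 82, "control": 162},
    "paralytic": {"treatment": 33, "control": 115},
    "fatal": {"treatment": 0, "control": 4},
}


def rate_per_100k(count, group_size):
    """Number of cases per 100,000 children in a group."""
    return 100_000 * count / group_size


rates = {
    kind: {group: rate_per_100k(count, children[group])
           for group, count in by_group.items()}
    for kind, by_group in cases.items()
}

for kind, by_group in rates.items():
    print(f"{kind:>14}: treatment {by_group['treatment']:6.2f}, "
          f"control {by_group['control']:6.2f} (per 100,000)")
```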

    Suggested Problems: Look at all the odd-numbered problems among Ex. 1-24, page 445.