In the previous study unit we discussed the idea of taking a sample of items from a much larger population. Calculations may be carried out on the sample data, and the results used to make inferences regarding the whole population. We also saw that, with repeated samples, the statistic calculated from each sample would vary. In fact, the statistic has a distribution in just the same way that the population of items has a distribution. It will not be the same distribution but, where the sample statistic is the mean, the two distributions are related in known ways.
In reality, we will probably only take one sample, and use the sample mean to estimate the population mean. However, because of the theory regarding the distribution of all sample means, we are able to say how close this estimate is likely to be. That is, we are able to put a confidence interval around our estimate.
Remember that all the results in this study unit are general ones for large samples taken from large populations. Because of the central limit theorem, the original population from which the sample is taken therefore does not need to follow the normal distribution.
ESTIMATES OF A POPULATION PROPORTION, WITH A CONFIDENCE INTERVAL, USING SAMPLE DATA
Another estimate which is commonly required is the estimate of a proportion. For example:
− What proportion of all invoices contain errors?
− What proportion of the population use our product?
The sampling distribution of a proportion is known to follow these rules:
- The sampling distribution of proportions is approximately a normal distribution, provided the sample is large.
- The mean of the sampling distribution is equal to the population proportion.
- The standard error of the sampling distribution is given by:
$$SE = \sqrt{\frac{p(1 - p)}{n}}$$
where p is the sample proportion and n is the sample size.
We can use these results, when we have taken a sample, to estimate a population proportion and set up a confidence interval for this proportion, just as we did for population means.
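As an illustration of the calculation, the short Python sketch below estimates a proportion and puts a confidence interval around it. It assumes the usual normal approximation, with standard error equal to the square root of p(1 - p)/n. The function name and the invoice figures (30 errors in a sample of 200) are hypothetical and are not taken from this study unit.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (p, lower, upper) using the normal approximation."""
    p = successes / n                       # sample proportion
    se = sqrt(p * (1 - p) / n)              # standard error of the proportion
    # z value leaving (1 - confidence)/2 in each tail, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return p, p - z * se, p + z * se


# Hypothetical data: 30 of 200 sampled invoices contain errors.
p, lower, upper = proportion_confidence_interval(30, 200)
print(f"estimate = {p:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With these made-up figures the interval runs from roughly 10% to 20%. The approximation is only reasonable when the sample is large and p is not very close to 0 or 1.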
CALCULATING THE SAMPLE SIZE
In the previous examples, we have been starting with a known sample size and using this to calculate a confidence interval. It is also possible to work this calculation the other way round. That is, we can start with the required confidence interval, and use it to calculate the size of sample we must take.
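One way to sketch this reverse calculation for a proportion is shown below. It assumes we want the interval to extend no more than a chosen margin either side of the estimate, and rearranges margin = z × sqrt(p(1 - p)/n) to give n = z² p(1 - p) / margin². The function name and the planning value `p_guess` are hypothetical. Using 0.5 is a cautious choice because it gives the largest required sample.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n giving a half-width of at most `margin` (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Rearranged from margin = z * sqrt(p(1 - p) / n); always round up.
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Hypothetical requirement: estimate a proportion to within 3 percentage points.
print(sample_size_for_proportion(0.03))
```

A narrower margin or a higher confidence level both increase the sample size needed.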
STATISTICAL TESTS
This topic is also called “hypothesis testing” or “significance testing”. This is because the method used is to set up a hypothesis and then carry out a test of this hypothesis using sample data, at a given significance level.
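The approach is straightforward. A hypothesis is proposed. Then a sample is taken. The probability of obtaining a sample result at least as extreme as the one observed, given that the hypothesis is true, is calculated. If this probability is not smaller than the significance level which we wish to use, the hypothesis is accepted (strictly speaking, it is not rejected). If the probability is smaller than the significance level, the hypothesis is rejected.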
The original hypothesis is called the null hypothesis, and by convention is denoted by H0.
There are many statistical packages, both for personal computers and for larger machines, which will carry out such tests. However, it is very important for you to have an understanding of the principles behind the tests. Computers carry out calculations blindly, and will always come up with an answer, even if the results are meaningless. You must know enough about the techniques to examine the results critically, and verify that all statistics and conclusions are reasonable.
ANALYSIS AND INTERPRETATION OF SAMPLE DATA
Notes on sampling theory and the Chi-squared distribution.
This topic can be divided into 3 parts.
- Confidence intervals.
- Hypothesis testing.
- The Chi-squared test.
When carrying out a hypothesis test you need to work through the following 3 steps:
- State the hypothesis.
- Carry out the test (Z test).
- Draw a conclusion.
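The three steps above can be sketched in Python for one common case, a two-sided Z test of a population proportion. This is a minimal illustration under stated assumptions, not the only form the test can take. The function name, the hypothesised proportion of 10%, the 5% significance level and the sample figures are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0, significance=0.05):
    """Two-sided Z test of the null hypothesis: population proportion = p0."""
    p = successes / n
    # Under the null hypothesis the standard error uses p0, not the sample p.
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p - p0) / se0
    critical = NormalDist().inv_cdf(1 - significance / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


# Step 1: state the hypothesis. Null hypothesis: 10% of invoices contain errors.
# Step 2: carry out the test on hypothetical data (30 errors in 200 invoices).
z, critical, p_value, reject = z_test_proportion(30, 200, p0=0.10)
# Step 3: draw a conclusion at the 5% significance level.
print(f"z = {z:.3f}, critical value = {critical:.3f}, p-value = {p_value:.4f}")
print("Reject the null hypothesis" if reject else "Do not reject the null hypothesis")
```

Here the test statistic is compared with the critical value for the chosen significance level. Comparing the p-value with the significance level leads to the same conclusion.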