Survey sampling
Guidance on survey sampling covering sample size, different types of probability sampling and non-probability sampling.
This guide is one in a series on different aspects of statistical literacy. The others can be found in the House of Commons Library's Good Information Toolkit.
What is a survey sample?
Collecting data from entire populations can be costly, time-consuming and resource intensive.
A survey is a method of gathering data from a smaller subset of the total population that can be used to make inferences about that population.
This subset of the total population is what we mean by a survey sample.
The aim of survey analysis is to use information from survey data to make appropriate inferences about a population. To do this, surveys need to be large enough to represent the population that they aim to study and use a sampling technique that should produce a representative sample.
What is the sample size?
The sample size is the number of people who are surveyed to represent the population that is being studied.
Working out a reasonable sample size largely depends on the size of the population being studied and the level of precision you want your survey estimates to have.
Statistical techniques can be applied to work out precise sample sizes, but a general rule of thumb is that a sample of around 500 people is usually enough to adequately represent national populations.
While around 500 people is usually an adequate sample size, surveys often need to produce estimates for different population groups, geographic areas and so on. Around 500 respondents may then be needed for each of these groups or areas, so the overall sample size increases accordingly.
With very small populations, for example populations of 100 people or fewer, sampling is likely to be inappropriate, so the whole population should be surveyed.
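One widely used technique for working out a sample size, assuming simple random sampling and that the goal is to estimate a proportion (such as the share of people who agree with a statement), is n₀ = z²p(1 - p)/e², where e is the margin of error you will accept. A finite population correction, n = n₀ / (1 + (n₀ - 1)/N), then adjusts for a population of size N. The function names below are hypothetical; this is a minimal sketch, not the method any particular survey uses.

```python
import math

def required_sample_size(margin_of_error, population_size=None, p=0.5, z=1.96):
    """Approximate sample size needed to estimate a proportion.

    Assumes simple random sampling. p=0.5 is the most cautious guess for the
    true proportion (it gives the largest sample); z=1.96 means 95% confidence.
    """
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        # Finite population correction: only matters when the sample is a
        # sizeable share of the whole population.
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

def margin_of_error(sample_size, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / sample_size)
```

Under these assumptions, a margin of error of ±5 percentage points needs 385 respondents whether the population is 60 million or much smaller, but for a population of 100 it needs 80, close to the whole population. A sample of 500 gives a margin of error of roughly ±4.4 percentage points.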
What is probability sampling?
Probability sampling is the only type of sampling that can provide data which is representative of the population being sampled.
As the name suggests, in probability sampling every unit in the population has a known probability of being selected for the sample.
Units are chosen using random selection methods; in the simplest designs, every unit of the population has the same probability of being picked.
Because units from the population are randomly selected and each unit’s selection probability can be calculated, reliable estimates can be produced, and statistical inferences can be made about the population.
Simple random sampling
In a simple random sample, a random sample is chosen using the whole population as a sampling frame.
For example, if you have a total of 1,000 people and 250 people are to be sampled, then the sampling fraction would be 1/4. This means that one person in every four is to be sampled. The people to be sampled are then selected using some form of random number generation.
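The random selection step in this example can be sketched with Python's standard `random` module. The ID numbers and the seed are illustrative assumptions; the seed only makes the draw reproducible.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every unit in the sampling frame has the same chance of selection,
    and every possible sample of this size is equally likely.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, sample_size)

# The example from the text: 1,000 people, 250 to be sampled.
population = list(range(1, 1001))  # hypothetical ID numbers for each person
sample = simple_random_sample(population, 250, seed=42)
sampling_fraction = len(sample) / len(population)  # 0.25, i.e. 1 in 4
```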
Simple random sampling can, by chance, produce a sample that does not represent the population well. In the example diagram above, the resulting random sample clearly contains more purple than green.
This isn't a problem if we are confident that the purple and green groups do not differ in ways that matter for the subject we are surveying.
However, if we need our sample to be balanced in terms of subgroups within it then a more complex type of sampling might be appropriate.
Stratified sampling
In stratified sampling, the population is first divided into subgroups, called strata, according to relevant characteristics (for example, gender, age, ethnicity and so on).
You can then calculate how many people should be sampled from each stratum based on the proportion of the subgroup in the total population.
Random sampling can then be applied to each stratum to select a sample that properly represents each subgroup.
In the simple example shown below, the population is grouped into two strata, purple and green, each containing 50% of the total population. If we randomly select the same number of people from each stratum, we get a sample that reflects the proportion of these groups in the total population.
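The steps above (divide into strata, allocate the sample in proportion to each stratum's share, then randomly sample within each stratum) can be sketched as follows. This assumes proportional allocation with rounding to whole people; the purple/green population is a hypothetical stand-in for the diagram.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, total_sample_size, seed=None):
    """Proportional stratified random sample.

    stratum_of returns the stratum label for a unit. Each stratum's allocation
    is proportional to its share of the population, rounded to a whole number,
    so the total can differ slightly from total_sample_size.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        n_stratum = round(total_sample_size * len(members) / len(units))
        sample.extend(rng.sample(members, n_stratum))  # random draw within the stratum
    return sample

# Hypothetical population: 500 purple and 500 green units.
population = [("purple", i) for i in range(500)] + [("green", i) for i in range(500)]
sample = stratified_sample(population, lambda unit: unit[0], 100, seed=1)
```

Unlike a simple random sample, this design guarantees 50 purple and 50 green units every time.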
Cluster sampling
Cluster sampling involves grouping the population into convenient, naturally occurring clusters, usually representing geographic areas.
At the first stage of selection, a number of clusters are selected. At the second stage, all the units in the chosen clusters are selected to form the sample.
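The two stages described above can be sketched as below. The areas and households are hypothetical, and the sketch assumes clusters are chosen by simple random sampling, which real surveys may not do.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick clusters, then take every unit in the chosen clusters.

    clusters: dict mapping a cluster label (e.g. an area) to its list of units.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # first stage: select clusters
    # Second stage: include all units from each chosen cluster.
    return {label: list(clusters[label]) for label in chosen}

# Hypothetical frame: 20 areas, each with 30 households.
clusters = {
    f"area_{i:02d}": [f"area_{i:02d}_household_{j}" for j in range(30)]
    for i in range(20)
}
chosen = cluster_sample(clusters, 3, seed=7)
```

Data collection only has to visit 3 areas rather than households scattered across all 20, which is where the cost saving comes from.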
Cluster sampling is very common in large-scale national surveys as it can reduce costs and make data collection easier.
However, clustering can increase the risk of introducing bias into samples when compared with simple random samples or stratified samples.
This is because units within clusters might be more similar than units from different clusters.
For example, national surveys in the UK often use postcode sectors as clusters and select households within each chosen postcode sector. Households within the same postcode sector are likely to have similar housing tenure, house prices and so on, which could also mean they are similar in terms of income levels, socioeconomic status and other characteristics.
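One standard way to put a number on the cost of this within-cluster similarity is Kish's approximate design effect, deff = 1 + (m - 1)ρ, where m is the number of units sampled per cluster and ρ is the intra-cluster correlation (how alike units in the same cluster are). Dividing the sample size by deff gives the "effective" sample size: the size of a simple random sample with the same precision. The sketch assumes equal-sized clusters, and the example figures (2,000 households, 20 per postcode sector, ρ = 0.05) are illustrative assumptions, not values from any real survey.

```python
def design_effect(units_per_cluster, intra_cluster_correlation):
    """Kish's approximate design effect for clusters of equal size."""
    return 1 + (units_per_cluster - 1) * intra_cluster_correlation

def effective_sample_size(n, units_per_cluster, intra_cluster_correlation):
    """Size of a simple random sample giving the same precision as the cluster sample."""
    return n / design_effect(units_per_cluster, intra_cluster_correlation)

# Illustrative: 2,000 households, 20 per postcode sector, modest similarity.
deff = design_effect(20, 0.05)                 # 1.95
n_eff = effective_sample_size(2000, 20, 0.05)  # about 1,026
```

Even modest similarity within clusters roughly halves the effective sample size in this example.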
Mixed method sampling
The advantages and disadvantages of different forms of probability sampling often mean that a mixed method approach is applied.
Surveys may combine stratification, clustering and random sampling. The exact sample design is determined by factors such as:
- the level of precision needed from the survey estimates
- practical considerations, such as:
  - the resources available
  - the budget available for the survey
It is crucial that published surveys are transparent about the sample design used and how this might affect the survey findings.
What are the effects of using probability sampling methods?
Because surveys are based on a sample of the total population, there is a level of uncertainty or error associated with the estimates they produce.
Sampling error reflects the difference between survey estimates and the ‘true value’ that would be found if measurements were taken from the total population.
If samples are created using probability sampling, then an estimate of the error associated with survey results can be calculated.
Indicators of sampling error tell us how precise an estimate is, and therefore how much weight to attach to it.
Good quality survey reports include measures of the uncertainty/error associated with the survey estimates. These are often shown as confidence intervals.
Confidence intervals express the uncertainty associated with a central estimate by giving a range of plausible values within which the true population value is likely to lie.
You can use the overlap between confidence intervals as a quick way to check for statistical significance (whether an observed difference is unlikely to be due to chance alone; see our note on confidence intervals and statistical significance). In general, if the intervals do not overlap, there is a statistically significant difference (at a given level of confidence, usually 95%). If they do overlap, the difference may not be significant (and could just be down to chance), although a formal test is needed to be sure.
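As a rough illustration of how a confidence interval is built, the sketch below uses the simple normal-approximation interval for a proportion, p ± z√(p(1 - p)/n). It assumes a simple random sample; published surveys usually widen their intervals to reflect stratification, clustering and weighting. The counts in the example are hypothetical.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation).

    Assumes a simple random sample; clamps the interval to the range 0 to 1.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical: 170 of 1,000 respondents, versus 17 of 100.
low, high = proportion_ci(170, 1000)             # roughly 14.7% to 19.3%
small_low, small_high = proportion_ci(17, 100)   # roughly 9.6% to 24.4%
```

Both samples give a central estimate of 17%, but the smaller sample produces a much wider interval.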
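The overlap check can be sketched as follows, alongside a rough z-test of the difference between two estimates, which is a more direct check because overlapping intervals do not always mean the difference is non-significant. The sketch assumes two independent estimates with symmetric 95% intervals, so each standard error can be recovered as half the interval width divided by 1.96. The numbers are hypothetical, not taken from the ONS data below.

```python
import math

def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

def approx_z_for_difference(est_a, ci_a, est_b, ci_b, z_crit=1.96):
    """Approximate z statistic for the difference between two independent estimates.

    Assumes symmetric intervals: standard error = half-width / z_crit.
    """
    se_a = (ci_a[1] - ci_a[0]) / 2 / z_crit
    se_b = (ci_b[1] - ci_b[0]) / 2 / z_crit
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Hypothetical: 24% (19% to 29%) versus 15% (10% to 20%).
overlap = intervals_overlap((19, 29), (10, 20))                # True
z = approx_z_for_difference(24, (19, 29), 15, (10, 20))       # about 2.49
```

Here the intervals overlap slightly, yet |z| is above 1.96, so the difference would still be significant at the 95% level.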
This can be very important in determining the level of importance to attach to survey estimates that suggest differences between groups or between survey years.
For example, data from the Office for National Statistics (ONS) suggests that women are more likely than men to describe their experience of being on a hospital waiting list as very poor.
In March 2025, 17% of women said they had a very poor experience of being on a hospital waiting list compared with 12% of men.
However, if we consider these figures in the context of the uncertainty associated with these survey estimates then we can’t actually be sure this is a genuine difference.
As the table below shows, the confidence intervals associated with the estimates mean that the rate for women could be as low as 10%, which overlaps with the possible range of values for men of 6% to 18%.
Source: ONS, Public opinions and social trends, Great Britain: NHS hospital waiting experience
What is non-probability sampling?
Non-probability sampling methods do not rely on random selection, so it is impossible to know whether the samples they produce are representative of the population.
For this reason, data from surveys based on non-probability samples cannot be used to infer information about an overall population.
Some non-probability sampling methods are outlined below. If a survey is based on any of these types of non-probability sampling, then the survey findings cannot be considered to represent the wider population.
Quota sampling
In quota sampling, researchers select respondents until a predetermined number of respondents in each of certain categories has been surveyed. Quota sampling is often used by market research companies. When properly conducted, it can produce results that are similar to probability sampling.
However, when non-response is significant (which is almost always the case for voluntary surveys), quota sampling is likely to underrepresent parts of the population that are unwilling to respond or hard to reach.
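The quota procedure described above can be sketched as a loop that accepts people in the order they are reached until every quota is filled. The categories and the order of arrival are hypothetical; the point is that selection depends on who is reached first, not on chance.

```python
def quota_sample(respondents, category_of, quotas):
    """Accept respondents in arrival order until every category's quota is filled.

    respondents: people in the order they are reached (not a random order)
    category_of: function giving a person's quota category
    quotas: dict mapping category -> number of respondents wanted
    """
    counts = {category: 0 for category in quotas}
    selected = []
    for person in respondents:
        category = category_of(person)
        if category in counts and counts[category] < quotas[category]:
            selected.append(person)
            counts[category] += 1
        if all(counts[c] >= quotas[c] for c in quotas):
            break  # every quota met: stop recruiting
    return selected, counts

# Hypothetical arrivals: the men who happen to be reached first fill the quota.
arrivals = [("man", i) for i in range(10)] + [("woman", i) for i in range(10)]
selected, counts = quota_sample(arrivals, lambda p: p[0], {"man": 3, "woman": 3})
```

Within each category, whoever is easiest to reach is selected, which is why people who are hard to reach or unwilling to respond can be underrepresented even when every quota is met.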
Convenience sampling
Magazine and newspaper questionnaires and phone-in polls are all examples of convenience sampling. These types of survey are prone to biased or unrepresentative samples, because people who feel strongly about a topic are more likely to respond.
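A small simulation shows how self-selection distorts a phone-in poll. All the parameter values are illustrative assumptions: 20% of a hypothetical population hold strong views, and they are ten times more likely to respond than everyone else.

```python
import random

def self_selected_share(population_size=100_000, share_strong=0.2,
                        p_respond_strong=0.5, p_respond_other=0.05, seed=0):
    """Simulate a poll where people with strong views respond more often.

    Returns (true share with strong views, share among respondents).
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_strong = int(population_size * share_strong)
    responding_strong = sum(rng.random() < p_respond_strong for _ in range(n_strong))
    responding_other = sum(rng.random() < p_respond_other
                           for _ in range(population_size - n_strong))
    total = responding_strong + responding_other
    return n_strong / population_size, responding_strong / total

true_share, poll_share = self_selected_share()
```

Under these assumptions, about 71% of respondents hold strong views, compared with 20% of the population.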
Purposive sampling
Purposive sampling involves intentionally selecting participants based on their characteristics, knowledge, experiences or some other criteria.
The research findings based on purposive sampling can only be generalised to the (sub)population from which the sample is drawn, and not to the entire population.
Snowball sampling
If the population is hard to access, snowball sampling can be used to recruit participants via other participants. The number of people you have access to ‘snowballs’ as you get in contact with more people.
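Recruiting through referrals can be sketched as a breadth-first walk over a network of contacts, starting from one or more initial participants. The referral network and names below are hypothetical.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Recruit via referrals, breadth-first, until max_size people are recruited.

    referrals: dict mapping a participant to the people they can refer
    seeds: the initial participants the researcher can reach directly
    """
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # avoid recruiting the same person twice
                seen.add(contact)
                queue.append(contact)
    return recruited

# Hypothetical network: G and H are not connected to anyone reachable from A.
network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"], "G": ["H"]}
recruited = snowball_sample(network, ["A"], max_size=10)  # A, B, C, D, E, F
```

G and H are never reached, which illustrates why a snowball sample only covers people connected to the starting participants.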
Key messages
- To assess whether a survey provides representative findings for a total population, details of the sample methodology should be published.
- Probability sampling is the only type of sampling that can provide data which is representative of the population being sampled.
- Surveys collected using non-probability sampling cannot be used to infer information about the overall population.