Stratified Random Sampling (StRS) is a type of random sampling in which the population is first divided into groups, called strata, and a random sample is then selected from each group. As with Simple Random Sampling (SRS), discussed in a previous post, every item in the population must have some chance of being selected. Unlike SRS, however, the chance of selecting one item in an StRS need not equal the chance of selecting any other item. Instead, the probability of selection need only be equal within each stratum (“stratum” is the singular of “strata”).

## Why Use StRS?

There are two main reasons for StRS:

1. The precision of the overall estimate usually increases. The math behind StRS is fairly straightforward, but technical enough that we won’t delve into it here. Suffice it to say that if the sample size in each stratum is proportional to that stratum’s share of the population, the variance of the estimate will be essentially the same as under SRS when the strata have nothing to do with the outcome being measured, and lower when they do. In other words, it doesn’t hurt, and may help, to stratify.
2. There may be a sub-population that needs to be estimated, but that is unlikely to contribute a substantial number of sampled items if only an SRS is performed. Consider a population of 100,000 health claims. While you may be interested in estimating the characteristics of the entire population, you are also interested in the sub-population of MRI claims, of which there are 500. If a sample of 100 claims is selected using SRS, there would be zero MRI claims in the sample about 61%* of the time. Even if a few of these claims were selected, it is unlikely that any meaningful estimates about MRI claims could be made. With StRS, the MRI claims can be placed in their own stratum and the sample size for that stratum fixed at a level that allows for precise estimation.
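
The 61% figure in the MRI example can be checked directly. The sketch below is illustrative only: `prob_no_hits` is a hypothetical helper name, and it computes the exact chance for a sample drawn without replacement alongside the simpler approximation that treats the 100 draws as independent.

```python
from math import prod

def prob_no_hits(population: int, special: int, sample: int) -> float:
    """Exact chance that an SRS without replacement contains none of the `special` items."""
    others = population - special
    # Draw by draw: each pick must come from the non-special items that remain.
    return prod((others - i) / (population - i) for i in range(sample))

exact = prob_no_hits(100_000, 500, 100)
approx = (1 - 500 / 100_000) ** 100  # treats the 100 draws as independent

print(f"exact: {exact:.3f}, approximation: {approx:.3f}")
```

Both values round to about 61%, so the approximation is more than adequate when the sample is this small relative to the population.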

## StRS Requirements

StRS has two requirements:

1. The strata must be mutually exclusive (no item can belong to more than one stratum) and exhaustive (every item in the target population, i.e. the population about which something is being estimated, must belong to a stratum); and
2. The sample within each stratum is an SRS (i.e. a sample drawn without replacement, with a fixed sample size and an equal probability of selection for every item in the stratum).
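
As a rough illustration of these two requirements, here is a minimal sketch using only Python’s standard library. The function name `stratified_sample`, the claim records, and the sample sizes are all hypothetical. Assigning each item exactly one stratum label makes the strata mutually exclusive and exhaustive, and `random.sample` draws a fixed-size sample without replacement within each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, sizes, seed=None):
    """Draw an SRS of sizes[s] items, without replacement, from each stratum s."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        # Exactly one label per item, so strata are mutually exclusive and exhaustive.
        strata[stratum_of(item)].append(item)
    if set(strata) != set(sizes):
        raise ValueError("sizes must name every stratum in the population, and no others")
    # random.sample raises ValueError if a stratum is smaller than its sample size.
    return {s: rng.sample(members, sizes[s]) for s, members in strata.items()}

# Hypothetical population: 1,000 claims, every 20th one an MRI claim.
claims = [{"id": i, "type": "MRI" if i % 20 == 0 else "other"} for i in range(1000)]
sample = stratified_sample(claims, lambda c: c["type"], {"MRI": 30, "other": 70}, seed=1)
print({s: len(drawn) for s, drawn in sample.items()})
```

Note that the MRI claims are deliberately over-sampled here (30 of 50, versus 70 of 950 for the rest), which is allowed because the selection probability only has to be equal within a stratum.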

## How Should You Determine Strata Boundaries and Sample Sizes in StRS?

Using StRS effectively to improve precision requires some knowledge about the population before the sample is selected. This is because the more similar the items within each stratum are to one another, the more precise the overall estimate.

When data involve amounts (e.g. a population of possibly invalid healthcare claims), effective stratification can simply involve sub-dividing the population by amount paid. Textbooks such as William Cochran’s *Sampling Techniques* or Steven Thompson’s *Sampling* give relatively simple methods for determining strata boundaries and an appropriate sample size within each stratum.
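
One standard textbook rule for splitting a total sample across strata is Neyman allocation, which sets each stratum’s sample size in proportion to its size times its standard deviation. The sketch below is only an illustration of that rule, not necessarily the method best suited to a given case. The function name, the strata, and the standard deviations are hypothetical, and in practice the standard deviations would have to be estimated from prior data.

```python
def neyman_allocation(stratum_sizes, stratum_sds, total_sample):
    """Split total_sample across strata in proportion to N_h * S_h."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total_weight = sum(weights.values())
    # Results are left unrounded; round them (and check the total) before sampling.
    return {h: total_sample * w / total_weight for h, w in weights.items()}

# Hypothetical strata by amount paid: many small claims, few large and variable ones.
sizes = {"under $1,000": 9_000, "$1,000 and over": 1_000}
sds = {"under $1,000": 200.0, "$1,000 and over": 1_800.0}  # assumed prior guesses
allocation = neyman_allocation(sizes, sds, total_sample=100)
print(allocation)
```

With these made-up inputs the two strata receive equal shares of the sample, whereas a proportional allocation would give the large-claim stratum only 10 of the 100 items.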

In more complicated cases, the groupings may be obvious but the sample size within each group may not be. Suppose we are sampling a population of healthcare claims for errors and we have both inpatient and outpatient claims. We have no idea what the error rate will be, but do believe it is likely to be different for inpatient claims than it is for outpatient claims even though the claims are all for similar total amounts.

In this example, we could logically stratify by whether the claim is inpatient or outpatient, and simply pull a sample from each stratum in proportion to that stratum’s share of the population. This stratified sample will fare no worse than simple random sampling and will be more precise if, in fact, the error rate differs between inpatient and outpatient claims.
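
To make the comparison concrete, the sketch below evaluates the usual variance formulas for an estimated error rate under SRS and under stratification with proportional allocation. The population sizes, error rates, and function names are hypothetical and chosen only for illustration.

```python
def srs_variance(N, n, p):
    """Variance of a sample proportion under SRS without replacement."""
    s2 = N / (N - 1) * p * (1 - p)      # population variance of a 0/1 error flag
    return (1 - n / N) * s2 / n          # includes the finite population correction

def stratified_variance(strata, n):
    """Variance of the stratified estimate under proportional allocation.

    strata: list of (N_h, p_h) pairs.
    """
    N = sum(N_h for N_h, _ in strata)
    total = 0.0
    for N_h, p_h in strata:
        n_h = n * N_h / N                # proportional allocation (left unrounded)
        total += (N_h / N) ** 2 * srs_variance(N_h, n_h, p_h)
    return total

def overall_rate(strata):
    """Population-wide error rate implied by the per-stratum rates."""
    return sum(N_h * p_h for N_h, p_h in strata) / sum(N_h for N_h, _ in strata)

# Hypothetical populations: (number of claims, true error rate) for inpatient, outpatient.
different = [(2_000, 0.30), (8_000, 0.05)]
same = [(2_000, 0.10), (8_000, 0.10)]

for label, strata in [("rates differ", different), ("rates equal", same)]:
    srs = srs_variance(10_000, 200, overall_rate(strata))
    strat = stratified_variance(strata, 200)
    print(f"{label}: SRS {srs:.6f}, stratified {strat:.6f}")
```

Under these assumptions the stratified variance is roughly 10% lower when the error rates differ, and essentially identical to the SRS variance when they are equal.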

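*= (1-(500/100000))^100, or about 0.606, treating each of the 100 draws as having a 500-in-100,000 chance of being an MRI claim.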

This post was written with the help of Sejla Karalic and Dr. Alan Salzberg.