Sampling is Harder and Cheaper Than You Think
Introduction
Analysts are frequently called upon to investigate large numbers of items in order to draw conclusions. For example, forensic accountants often have to analyze a large number of items to determine the existence of fraudulent activity, and tax accountants have to analyze numerous transactions to determine tax-deductible amounts.
It is often unreasonable and not cost-effective to investigate every item to draw such conclusions. For example, in a class action lawsuit where millions of units may have been sold, there is generally no practical method to inspect all relevant units. As a result, courts, regulators and government agencies allow practitioners to statistically sample. As the Fifth Circuit has noted, “The essence of the science of inferential statistics is that one may confidently draw inferences about the whole from a representative sample of the whole.” In re Chevron U.S.A., Inc., 109 F.3d 1016, 1019-20 (5th Cir. 1997).
Proper statistical sampling requires several steps to be executed correctly. Although these steps might seem onerous, they are neither difficult nor costly to employ. Not only does performing them allow one to generate stronger conclusions, but proper sampling procedures often allow one to investigate fewer items, sizably reducing overall analysis costs.
Care must be taken in performing these steps to dispel any suspicion by courts and regulators that the analyst inspected a small number of cherry-picked items most likely to support the analyst’s desired outcome. As stated in Waisome v. Port Authority of New York and New Jersey, 948 F.2d 1370, 1372 (2d Cir. 1991), “Lawyers and judges working with statistical evidence generally have only a partial understanding of the selection processes they seek to model, they often have incomplete or erroneous data, and are laboring in an unfamiliar terrain.”
Performing a statistical sampling and extrapolation requires a detailed, step-by-step procedure (a full discussion of which is beyond the scope of this article). Three critical concepts, however, are essential to understanding the process of statistical sampling and to recognizing whether an expert’s opinion is sufficiently reliable to support the expert’s conclusion:
- The representativeness of a sample describes whether the sample being studied accurately reflects the overall population.
When confronted with an opinion that offers a conclusion based upon a statistical extrapolation, one must probe how closely the samples tested represent the entire population. The more representative the sample is of the population, the greater the likelihood the statistical analysis can be reliably extrapolated to the population. Samples that are cherry-picked are less likely to be truly representative of the population, and any opinion based on such a sample is less likely to be reliable.
The importance of representativeness can be seen by analyzing the difference between the population and sampling frame. The population is the entire universe of what is being studied. For example, the population could contain all units of an allegedly defective product. On the other hand, the sampling frame is the source from which the samples to be studied are drawn. These two could differ as, for example, the sampling frame might not contain information on units that are missing or cannot be located.
An analyst must determine whether there are significant differences between the population items included in and excluded from the sampling frame. Items for which information is missing might have different characteristics than items possessing information. For example, services provided to frequent shoppers who have the time to fill out internet surveys could differ from services provided to other shoppers. Such differences could undermine the validity of any statistical study.
To illustrate the importance of the concept of the sampling frame, assume an expert inspects one hundred units of a certain product alleged to be defective. If the units were taken directly from the production line, without regard to any identifying features, throughout the entire production period, the expert’s conclusions potentially could be reliably extrapolated to the full product line. In contrast, if the units inspected were only those that had been returned to the manufacturer by aggrieved customers, the sampling frame would be markedly different from the population and far more likely to yield a different conclusion. In this scenario, the expert’s conclusions might be extrapolated only to units that had been returned to the manufacturer, and not to the entire population of products produced. Given the significant differences in the sampling frames, the latter opinion would be far less reliable than the former.
- A statistical extrapolation’s precision is often measured by its margin of error, which describes the magnitude by which the results of the analysis could differ if the analysis were repeated.
A statistical analysis — by its very nature — results in a conclusion about an overall population based upon an examination of a less than complete subset of the population. Accordingly, there is the possibility that any statistical analysis, if repeated, would yield a different result. The margin of error represents the “plus/minus” around the results and describes the magnitude by which the results likely could differ if the analysis were repeated. A small margin of error implies a greater likelihood that the analysis’s results are reliable, while a large margin of error reflects the increased chance that the results would be different if the analysis were repeated.
To illustrate, assume 100,000 units of a certain product are manufactured. Even if an expert inspects 99,950 units and concludes they are free of defects, the expert cannot necessarily conclude the remaining 50 units are also free of defects. The expert may, however, conclude that at most a small number of units could be defective. In other words, despite the expert finding no defects in the units inspected, if the analysis were repeated there remains a chance that a small percentage of the total units could be found defective. The margin of error measures this percentage.
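For readers who want to see the arithmetic, the following minimal sketch applies the statisticians’ “rule of three”: when zero defects are found among n inspected items, the 95 percent upper confidence bound on the true defect rate is approximately 3/n. The figures are hypothetical, chosen to mirror the example above.

```python
# Minimal sketch (hypothetical figures): 95% upper confidence bound on the
# defect rate when an inspection of n items finds zero defects.
n = 99_950  # hypothetical number of units inspected, none found defective

# Exact bound: the largest defect rate p at which observing zero defects
# in n independent draws still has at least a 5% probability.
p_upper = 1 - 0.05 ** (1 / n)

print(f"95% upper bound on defect rate: {p_upper:.6%}")  # ~0.0030%
print(f"Rule-of-three approximation:    {3 / n:.6%}")    # ~0.0030%
```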
Statisticians utilize a study’s margin of error to determine the requisite sample size. To best explain this concept, consider a coin-flipping example. If one flips a fair coin twice, one’s best guess is that one head and one tail will arise, yet it is quite likely that one could obtain two heads or two tails. A statistician would say that 50 percent heads would be achieved on average, but the margin of error would be ± 50 percent. If a fair coin is flipped 100 times, while 50 percent heads would still be achieved on average, the margin of error falls to roughly ± 10 percent. In other words, one is fairly certain, at a 95 percent confidence level, that between 40 and 60 heads will arise. This example shows that the more items sampled, the lower the margin of error, all else being equal.
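The coin-flip arithmetic can be reproduced in a few lines of Python. This is a minimal sketch using the standard normal approximation (z ≈ 1.96 at a 95 percent confidence level); the approximation is loose at very small n, which is why the ± 50 percent figure above is best read as an intuitive bound.

```python
import math

# 95% margin of error for the share of heads in n fair-coin flips,
# using the normal approximation: z * sqrt(p * (1 - p) / n).
def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    return z * math.sqrt(p * (1 - p) / n)

for n in (2, 100, 1_000, 10_000):
    print(f"n = {n:>6}: ±{margin_of_error(n):.1%}")
# n = 100 yields roughly ±9.8%, the "± 10 percent" described above;
# at n = 2 the normal approximation is loose and the feasible range is 0-100%.
```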
Statisticians utilize standard formulas to determine what the margin of error will be, given the number of items sampled. To determine the proper sample size, statisticians just reverse the formula and calculate how many items need to be investigated in order to achieve a given margin of error.
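A hedged sketch of that inversion, using the standard proportion-based formula with the conservative worst-case assumption p = 0.5 and no finite-population correction:

```python
import math

# Inverting the margin-of-error formula: how many items must be sampled to
# achieve a target 95% margin of error E for a proportion? p = 0.5 is the
# conservative worst case; no finite-population correction is applied.
def required_sample_size(E: float, p: float = 0.5, z: float = 1.96) -> int:
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

print(required_sample_size(0.10))  # ±10 points -> 97 items
print(required_sample_size(0.05))  # ±5 points  -> 385 items
print(required_sample_size(0.01))  # ±1 point   -> 9,604 items
```

Note how quickly the required sample grows as the target margin of error shrinks, which is why the choice of margin of error drives the cost of the study.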
Several characteristics need to be identified to apply such formulas. For example, one must determine what margin of error and significance level to utilize in the statistical formulas. In general, statisticians cannot specify a universally proper margin of error or significance level, as the appropriate values vary with the purpose of the analysis. For example, a small margin of error is likely more vital to the manufacture of a cardiac pacemaker than to the manufacture of a piece of notebook paper. The Seventh Circuit ruled in Kadas v. MCI Systemhouse Corp., 255 F.3d 359, 363 (7th Cir. 2001) that “[i]t is for the judge to say, on the basis of the evidence of a trained statistician, whether a particular significance level, in the context of a particular study in a particular case, is too low to make the study worth the consideration of judge or jury.”
In the absence of statistical guidance, one could utilize the margins of error and significance levels prescribed by governmental bodies such as the Internal Revenue Service and the Department of Health and Human Services. Even though such agencies often do not have oversight over the analysis in question, analysts can apply their guidance because it comes from reputable sources.
Additionally, an analyst can lower the margin of error associated with a given sample size by placing the items into separate strata based upon monetary amount, line of business or other characteristics. For example, Stratum 1 could contain all items from the Company’s Division 1 and Stratum 2 all items from Division 2, as in the sketch below. Care must be taken in defining the strata, as over- or mis-stratification can cause the required sample size to increase.
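The following sketch illustrates stratified selection with a hypothetical two-division population; the divisions, their sizes and the per-stratum sample sizes are invented for illustration.

```python
import random

# Illustrative stratified selection: items are grouped into strata (here,
# two hypothetical company divisions) and sampled separately within each.
population = [{"id": i, "division": 1 if i < 6_000 else 2} for i in range(10_000)]

strata: dict[int, list[dict]] = {}
for item in population:
    strata.setdefault(item["division"], []).append(item)

random.seed(42)  # fixed seed so the selection can be reproduced and verified
sample_sizes = {1: 60, 2: 40}  # illustrative, e.g., proportional to stratum size
samples = {div: random.sample(items, sample_sizes[div]) for div, items in strata.items()}

for div, drawn in samples.items():
    print(f"Division {div}: sampled {len(drawn)} of {len(strata[div])} items")
```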
- Different sampling methodologies may be employed, each of which requires different mathematical modeling.
The chapter on statistics in the Reference Manual on Scientific Evidence published by the Federal Judicial Center clearly notes a preference for randomly selected samples. David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence (3d ed., Federal Judicial Center 2011). Courts have questioned the reliability of studies in which the sample items were purposefully selected or were chosen solely on the grounds of cost or convenience. In re Chevron U.S.A., Inc., 109 F.3d at 1020.
To demonstrate that the numbers utilized to select the sample items are truly random, and were not purposefully chosen to cherry-pick observations, most computer random number generators allow for the use of a random seed. A random number generator will produce the same set of random numbers for a given seed. Consequently, providing the random seed along with the random numbers allows a regulator or court to verify the randomness of the numbers utilized.
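A minimal Python illustration (the seed value is arbitrary): re-running the generator with the same seed reproduces the identical sample, which is what lets a reviewing party verify the selection.

```python
import random

# The seed value (20240101) is arbitrary; what matters is disclosing it.
random.seed(20240101)
first_draw = sorted(random.sample(range(1, 100_001), 10))

# Anyone re-running the generator with the same seed gets the same items.
random.seed(20240101)
second_draw = sorted(random.sample(range(1, 100_001), 10))

print(first_draw)
print(first_draw == second_draw)  # True: same seed, same "random" sample
```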
After the sample has been selected and analyzed, a mathematical model must be employed to form a conclusion. A discussion of the various mathematical models used during statistical analysis is well beyond the scope of this article. It is sufficient to note, however, that employing a mathematical model inconsistent with the sampling methodology will often lead to an improper (and thus likely unreliable) conclusion. For example, different formulas need to be employed if the population is stratified.
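As a hypothetical illustration of why the formula must match the sampling design, compare a design-based stratified estimate of a monetary total with a naive pooled estimate; all figures are invented.

```python
# Hypothetical illustration: extrapolating a monetary total under stratified
# sampling. Each stratum's sample mean is weighted by the stratum's population
# size; naively pooling all sampled items mis-weights the strata whenever the
# sampling rates differ. All figures are invented.
strata = {
    # stratum: (population size, sampled values, e.g., overcharge per item in $)
    "Division 1": (6_000, [120.0, 95.0, 110.0, 130.0]),
    "Division 2": (4_000, [15.0, 25.0, 20.0]),
}

stratified_total = sum(N_h * sum(vals) / len(vals) for N_h, vals in strata.values())

pooled = [v for _, vals in strata.values() for v in vals]
naive_total = 10_000 * sum(pooled) / len(pooled)

print(f"Stratified estimate:   ${stratified_total:,.0f}")  # 762,500
print(f"Naive pooled estimate: ${naive_total:,.0f}")       # 735,714 -- mis-weighted
```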
Additionally, different statistical formulas are used depending on what is being investigated. For example, in certain instances an analyst solely desires to identify the frequency of a fraud or defective product. In other instances, analysts desire to quantify the monetary impact of the fraud or defective product. It is important to note that many “off-the-shelf” sample size formulas only calculate sample sizes for the former, and not the latter.
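A brief sketch of the distinction, with hypothetical inputs: estimating a frequency needs only a target margin of error on a proportion, while estimating a monetary amount also requires an estimate of how widely the dollar amounts vary, typically obtained from a pilot sample.

```python
import math

Z = 1.96  # 95% confidence

# Attribute sampling: how often does the fraud or defect occur?
def attribute_sample_size(E: float, p: float = 0.5) -> int:
    return math.ceil(Z ** 2 * p * (1 - p) / E ** 2)

# Variable (monetary) sampling: what is the average dollar impact? This also
# requires an estimate of the standard deviation of the dollar amounts,
# typically taken from a small pilot sample.
def monetary_sample_size(E_dollars: float, sigma: float) -> int:
    return math.ceil((Z * sigma / E_dollars) ** 2)

print(attribute_sample_size(0.05))        # ~385 items for a ±5-point frequency
print(monetary_sample_size(10.0, 250.0))  # ~2,401 items for ±$10 per item
```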
Conclusion
Statistical sampling is a powerful, cost-effective tool that, if properly deployed, allows analysts to streamline required analyses.
Three core concepts — representativeness, margin of error and mathematical modeling — underlie the mathematical steps of a statistical sampling analysis. While these critical concepts are often dismissed as “just math,” statistical analyses that fail to properly apply these concepts run the risk of being found unreliable.
As a result, it is important to document a sampling plan that identifies the methodology utilized to perform the sampling analysis. To emphasize this point, in one federal case the judge precluded an extrapolation opinion, noting that the expert “failed to document how the samples were selected; nor did he have a sampling plan,” offered no “opinion whether his samples were statistically significant or their error rate” and, ultimately, “provides no basis to extrapolate his results.” In re Hardieplank Fiber Cement Siding Litig., No. 12-md-2359, 2018 WL 262826, at *26 (D. Minn. Jan. 2, 2018).
Dr. Benjamin Wilner is an economist and statistician at Alvarez & Marsal who regularly serves around the country as a consultant and expert witness on economic, financial and statistical issues. Not only does he bring the modeling skills he honed working with three Nobel laureates (including one who won for statistical modeling) to his work on behalf of plaintiffs and defendants, but he utilizes his award-winning teaching experience at three Big Ten universities to explain his models in plain English to clients, judges and juries.