Federal agencies face a key decision when testing information and communication technology (ICT) for Section 508 conformance: whether to conduct comprehensive testing or test a representative sample.
Comprehensive testing provides the most precise results by testing all applicable Section 508 standards for every ICT item. However, this method can be resource-intensive and impractical for large inventories of webpages, documents, software, hardware, or complex systems.
Representative sample testing offers a more resource-efficient alternative. It allows agencies to benchmark Section 508 conformance, identify recurring issues, and monitor progress over time.
The decision between these two approaches depends on the return on investment (ROI), risks of non-conformance, the quantity of ICT, available resources, and the desired level of precision.
Use comprehensive testing when precision is critical, the ICT inventory is small, or nonconformance carries high risk. Use representative sample testing when the ICT inventory is large, resources are limited, or the goal is to benchmark conformance and monitor progress over time.
Illustrative Example: Website Testing
Consider a new website with 100 pages using a similar template, where a Section 508 conformance test typically involves 50 distinct checklist items.
Comprehensive Testing:
If a skilled tester spends roughly one hour on each page, the total estimated effort could be 100 hours. A single coding defect found across all 100 pages will most likely be considered one defect with 100 instances, not 100 separate defects. Fixing this typically involves a single developer change applied across all pages. While precise, this method is resource-intensive and may not offer the best ROI, especially for templated content that applies broadly across the site.
Representative Sample Testing:
Instead of testing all content on every page of a website, a tester might select a representative sample of 10 pages that includes various content types such as text, forms, media, tables, lists, and various types of interactive content. If a defect is found on one page, the tester checks other pages to determine if it’s an isolated or systemic issue. If isolated, the developer is advised to check the rest of the website for similar issues. If systemic, the developer remediates the defect and applies the solution throughout the website. This method ensures each of the accessibility checks is completed within the sample, reducing test time to a more reasonable level of effort with a better ROI. The testing effort may be further reduced if the webpages are heavily templated and there are little to no template deviations found across the website.
| Approach | Benefits | Drawbacks |
|---|---|---|
| Comprehensive Testing | Most precise results; every applicable Section 508 standard is tested for every ICT item. | Resource-intensive; impractical for large inventories; may offer a lower ROI. |
| Sample Testing | Resource-efficient; supports benchmarking conformance, identifying recurring issues, and monitoring progress over time. | Less precise; defects outside the sample may be missed and require developer follow-up. |
Determining Sample Size for Section 508 Conformance Testing
Federal agencies can benefit from a dependable method for assessing ICT conformance with Section 508 without the necessity of testing every page, screen, feature, or hardware product. By selecting a sample size that aligns with available resources and by understanding confidence level and margin of error, agencies can make well-informed decisions regarding testing scope. Start small if needed, but aim for a larger, representative sample when possible to build a strong baseline of conformance.
The reliability and precision of your findings depend on sample size, confidence level, and margin of error.
When you test a sample, you are making inferences about the entire system or asset inventory. The population size, meaning the total number of assets you could test, is required to calculate the sample size needed for a specific confidence level and margin of error. In ICT accessibility, the population could be:
- The total number of web pages owned or operated by the agency or the total number of web pages within a website.
- The total number of software applications owned or operated by the agency or the total number of screens or views in a software application.
- The total number of public facing electronic documents.
- The number of hardware ICT owned or operated by the agency or the total number of screens available in hardware ICT.
Two key statistical concepts help describe the strength of those inferences:
Confidence Level
Confidence level is the probability that your results reflect the true level of Section 508 conformance in the entire system or asset inventory. Common levels are 90%, 95%, and 99%.
- A 90% confidence level is often used when speed and efficiency matter more than precision, such as early-stage assessments or internal diagnostics.
- A 95% confidence level is the most commonly used standard across many fields, balancing reasonable certainty with practical effort.
- A 99% confidence level is typically used in high-stakes or compliance-driven contexts, where decisions carry greater risk and a higher degree of certainty is required, though it usually requires larger sample sizes.
Margin of Error
Margin of error is the range within which the true value is likely to fall. A smaller margin of error means your test results are closer to the total population’s outcomes, so you can trust them more. A larger margin of error means your results may differ more from the total population, making them less reliable. For example, a 95% confidence level with ±10% margin of error means you can be 95% certain the real Section 508 conformance rate of the total population is within 10 percentage points of your sample’s results.
Cochran’s formula is used to estimate the number of items that should be tested to achieve a desired level of statistical confidence and precision.
The formula is applied in two steps:
- Initial sample size (for large or unknown populations). This calculates a baseline sample size assuming a very large population:
  - n₀ = Z² × p × (1 − p) ÷ e²
- Adjusted sample size (for a known population). If the total population size (N) is known, apply the finite population correction:
  - n = n₀ ÷ (1 + (n₀ − 1) ÷ N)
Where:
- n₀ = initial sample size (assuming a large population)
- n = adjusted sample size (when population size is known)
- N = total population size (for example, total number of webpages, systems, or assets)
- e = margin of error, expressed as a decimal. For example:
- ±5% precision gives e=0.05
- ±10% precision gives e=0.10
- p = estimated proportion of the population with the attribute of interest, expressed as a decimal
- If unknown, use p = 0.5 (the most conservative assumption)
- Z = z-value corresponding to the selected confidence level
Determining the z-value
To determine the z-value for a given confidence level:
- Convert the confidence level to decimal form. For example, 95% = 0.95.
- Subtract from 1 and divide by 2:
  - (1 − 0.95) ÷ 2 = 0.025
- Add this value back to the original decimal:
  - 0.95 + 0.025 = 0.975
- Look up 0.975 in a standard normal (z) table. The corresponding z-value is 1.96.
In practice, standard z-values are typically used, as shown in Table 2.
| Confidence Level | Z-Value |
|---|---|
| 90% Confidence | 1.65 |
| 95% Confidence | 1.96 |
| 99% Confidence | 2.58 |
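As a minimal sketch, the table lookup above can also be computed directly with Python's standard library; the function name is illustrative, and the results match Table 2 after rounding:

```python
from statistics import NormalDist

def z_value(confidence_level: float) -> float:
    """Return the two-tailed z-value for a confidence level such as 0.95."""
    # Tail area: (1 - 0.95) / 2 = 0.025; cumulative point to look up: 0.975
    cumulative = confidence_level + (1 - confidence_level) / 2
    return NormalDist().inv_cdf(cumulative)

print(round(z_value(0.95), 2))  # 1.96
print(round(z_value(0.99), 2))  # 2.58
```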
Example
Using a 95% confidence level, ±5% margin of error, and assuming p = 0.5, the initial sample size (n₀) is 385 (rounded up). If the population is large or unknown, testing at least 385 items provides 95% confidence that the results are within ±5 percentage points of the true population value. If the population size is known (for example, N = 1,000 web pages), applying the finite population correction yields a smaller adjusted sample size (278 web pages).
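The worked example above can be reproduced with a short Python sketch of Cochran's two-step calculation (the function name is illustrative):

```python
import math
from typing import Optional

def cochran_sample_size(z: float, e: float, p: float = 0.5,
                        population: Optional[int] = None) -> int:
    """Estimate how many ICT items to test.

    z: z-value for the chosen confidence level
    e: margin of error, expressed as a decimal
    p: estimated proportion with the attribute (0.5 is most conservative)
    population: total population size N, if known
    """
    # Step 1: initial sample size for a large or unknown population
    n0 = z**2 * p * (1 - p) / e**2
    if population is None:
        return math.ceil(n0)
    # Step 2: finite population correction when N is known
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# 95% confidence (Z = 1.96), ±5% margin of error, p = 0.5
print(cochran_sample_size(1.96, 0.05))                   # 385
print(cochran_sample_size(1.96, 0.05, population=1000))  # 278
```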
Sample Size Considerations
Determining your sample size depends on available resources, project scope, and the risk associated with Section 508 defects; these factors help inform the best sampling approach.
Some items to consider when selecting a representative sample set include:
| Approach | Purpose | Sample Size | Expected Confidence | Use Case |
|---|---|---|---|---|
| Baseline Approach (Low Confidence) | Quick insight into overall Section 508 conformance or used for baselining conformance. | Five to ten items such as pages, screens, or documents | High margin of error and low confidence. Testing may miss critical defects. | Small teams or limited resources. |
| Balanced Approach (Moderate Confidence) | Balance between level of effort, ROI, and precision | 30–50 items such as pages, screens, or documents. | About 90–95% confidence with a ±10–15% margin of error, depending on population size. | Medium-sized agencies establishing a baseline or semi-resourced teams |
| Robust Approach (High Confidence) | More precision for decision-making and reporting. | 100+ items | 95% confidence with ±5–10% margin of error, depending on total population. | Large agencies, well-resourced teams, full Section 508 compliance reporting, or systems with high risk such as high traffic or public use. |
| Comprehensive Approach (Very High Confidence) | Maximum precision of Section 508 conformance. | All pages, screens, or documents | 100% confidence level and zero margin of error. | Mission-critical systems or when Section 508 nonconformance creates high risk. |
The relationship between sample size and the total number of ICT assets being tested, such as web pages, software applications, hardware, or electronic documents, is shown in Table 4 using 90% and 95% confidence levels as standard reference points, highlighting how the required sample size changes with population size.
For smaller populations, a proportionally larger sample is needed, while for larger populations, the increase in sample size is modest. Once the population size exceeds 5,000 ICT assets, the sample size increases slowly.
Online “statistical sample size” calculators can assist in determining the necessary sample size by factoring in population, confidence intervals, and margin of error. While 95% confidence is a common standard, it necessitates a slightly larger sample size. Opting for 90% confidence can reduce testing effort, though it provides slightly less certainty. In practical terms, for a mid-sized ICT population, the difference between 90% and 95% confidence typically amounts to 10 to 20 additional items tested.
| Total ICT (Population Size) | Sample Size for 90% Confidence | Sample Size for 95% Confidence | Approximate Margin of Error |
|---|---|---|---|
| 50 | 26 | 30 | ±10% |
| 100 | 41 | 49 | ±10% |
| 500 | 70 | 81 | ±10% |
| 1,000 | 73 | 88 | ±10% |
| 5,000+ | 90–100 | 100–150 | ±8–10% |
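The diminishing-returns pattern in Table 4 can be illustrated with a short Python sketch applying the finite population correction at 95% confidence and ±10% margin of error (exact values differ slightly from the table depending on the rounding conventions used):

```python
import math

def fpc_sample(n0: float, population: int) -> int:
    """Apply the finite population correction: n = n0 / (1 + (n0 - 1) / N)."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# Initial sample size at 95% confidence (Z = 1.96), ±10% margin, p = 0.5
n0 = 1.96**2 * 0.5 * (1 - 0.5) / 0.10**2   # 96.04

# Sample size grows quickly for small populations, then plateaus
for population in (50, 100, 500, 1000, 5000, 50_000):
    print(f"N = {population:>6}: test {fpc_sample(n0, population)} items")
```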
Related Resources
- Technology Accessibility Playbook - Play 9: Integrate Accessibility Needs into Development Processes
- Technology Accessibility Playbook - Play 10: Conduct Information and Communication Technology Accessibility Testing
- Technology Accessibility Playbook - Play 11: Track and Resolve Accessibility Issues
- Testing Lifecycle Overview
- Testing Methods Overview
- Using Statistical Sampling
