Federal agencies face a key decision when testing information and communication technology (ICT) for Section 508 conformance: whether to conduct comprehensive testing or test a representative sample.
Comprehensive testing provides the most precise results by testing all applicable Section 508 standards for every ICT item. However, this method can be resource-intensive and impractical for large inventories of webpages, documents, software, hardware, or complex systems.
Representative sample testing offers a more resource-efficient alternative. It allows agencies to benchmark Section 508 conformance, identify recurring issues, and monitor progress over time.
The decision between these two approaches depends on the return on investment (ROI), risks of non-conformance, the quantity of ICT, available resources, and the desired level of precision.
Use comprehensive testing when precision is critical, the ICT inventory is small, or nonconformance carries high risk. Use representative sample testing when the ICT inventory is large, resources are limited, or the goal is to benchmark conformance and monitor progress over time.
Illustrative Example: Website Testing
Consider a new website with 100 pages using a similar template, where a Section 508 conformance test typically involves 50 distinct checklist items.
Comprehensive Testing:
If a skilled tester spends roughly one hour on each page, the total estimated effort could be 100 hours. A single coding defect found across all 100 pages will most likely be considered one defect with 100 instances, not 100 separate defects. Fixing this typically involves a single developer change applied across all pages. While precise, this method is resource-intensive and may not offer the best ROI, especially for templated content that applies broadly across the site.
Representative Sample Testing:
Instead of testing all content on every page of a website, a tester might select a representative sample of 10 pages that includes various content types such as text, forms, media, tables, lists, and various types of interactive content. If a defect is found on one page, the tester checks other pages to determine if it’s an isolated or systemic issue. If isolated, the developer is advised to check the rest of the website for similar issues. If systemic, the developer remediates the defect and applies the solution throughout the website. This method ensures each of the accessibility checks is completed within the sample, reducing test time to a more reasonable level of effort with a better ROI. The testing effort may be further reduced if the webpages are heavily templated and there are little to no template deviations found across the website.
| Approach | Benefits | Drawbacks |
|---|---|---|
| Comprehensive Testing | Most precise results; every applicable Section 508 standard is tested for every ICT item. | Resource-intensive; impractical for large inventories; may offer a lower ROI. |
| Sample Testing | Resource-efficient; supports benchmarking conformance, identifying recurring issues, and monitoring progress over time. | Less precise; defects outside the sample may be missed and require developer follow-up. |
Determining Sample Size for Section 508 Conformance Testing
Federal agencies can benefit from a dependable method for assessing ICT conformance with Section 508 without the necessity of testing every page, screen, feature, or hardware product. By selecting a sample size that aligns with available resources and by understanding confidence level and margin of error, agencies can make well-informed decisions regarding testing scope. Start small if needed, but aim for a larger, representative sample when possible to build a strong baseline of conformance.
The reliability and precision of your findings depend on sample size, confidence level, and margin of error.
When you test a sample, you are making inferences about the entire system or asset inventory. The population size, meaning the total number of assets you could test, is required to calculate the sample size needed for a specific confidence level and margin of error. In ICT accessibility, the population could be:
- The total number of web pages owned or operated by the agency or the total number of web pages within a website.
- The total number of software applications owned or operated by the agency or the total number of screens or views in a software application.
- The total number of public facing electronic documents.
- The number of hardware ICT owned or operated by the agency or the total number of screens available in hardware ICT.
Two key statistical concepts help describe the strength of those inferences:
Confidence Level
Confidence level is the probability that your results reflect the true level of Section 508 conformance in the entire system or asset inventory. Common levels are 90%, 95%, and 99%.
- A 90% confidence level is often used when speed and efficiency matter more than precision, such as early-stage assessments or internal diagnostics.
- A 95% confidence level is the most commonly used standard across many fields, balancing reasonable certainty with practical effort.
- A 99% confidence level is typically used in high-stakes or compliance-driven contexts, where decisions carry greater risk and a higher degree of certainty is required, though it usually requires larger sample sizes.
Margin of Error
Margin of error is the range within which the true value is likely to fall. A smaller margin of error means your test results are closer to the total population’s outcomes, so you can trust them more. A larger margin of error means your results may differ more from the total population, making them less reliable. For example, a 95% confidence level with ±10% margin of error means you can be 95% certain the real Section 508 conformance rate of the total population is within 10 percentage points of your sample’s results.
Cochran’s formula is used to estimate the number of items that should be tested to achieve a desired level of statistical confidence and precision.
The formula is applied in two steps:
- Initial sample size (for large or unknown populations). This calculates a baseline sample size assuming a very large population:
  - n₀ = Z² × p × (1 − p) ÷ e²
- Adjusted sample size (for a known population). If the total population size (N) is known, apply the finite population correction:
  - n = n₀ ÷ (1 + (n₀ − 1) ÷ N)
Where:
- n₀ = initial sample size (assuming a large population)
- n = adjusted sample size (when population size is known)
- N = total population size (for example, total number of webpages, systems, or assets)
- e = margin of error, expressed as a decimal. For example:
- ±5% precision gives e=0.05
- ±10% precision gives e=0.10
- p = estimated proportion of the population with the attribute of interest, expressed as a decimal
- If unknown, use p = 0.5 (the most conservative assumption)
- Z = z-value corresponding to the selected confidence level
Determining the z-value
To determine the z-value for a given confidence level:
- Convert the confidence level to decimal form. For example, 95% = 0.95.
- Subtract from 1 and divide by 2:
  - (1 − 0.95) ÷ 2 = 0.025
- Add this value back to the original decimal:
  - 0.95 + 0.025 = 0.975
- Look up 0.975 in a standard normal (z) table. The corresponding z-value is 1.96.
In practice, standard z-values are typically used, as shown in Table 2.
| Confidence Level | Z-Value |
|---|---|
| 90% Confidence | 1.65 |
| 95% Confidence | 1.96 |
| 99% Confidence | 2.58 |
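As a minimal sketch, the table lookup above can also be computed directly with Python's standard library; the function name is illustrative, and the results match Table 2 after rounding:

```python
from statistics import NormalDist

def z_value(confidence_level: float) -> float:
    """Return the two-tailed z-value for a confidence level such as 0.95."""
    # Tail area: (1 - 0.95) / 2 = 0.025; cumulative point to look up: 0.975
    cumulative = confidence_level + (1 - confidence_level) / 2
    return NormalDist().inv_cdf(cumulative)

print(round(z_value(0.95), 2))  # 1.96
print(round(z_value(0.99), 2))  # 2.58
```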
Example
Using a 95% confidence level, ±5% margin of error, and assuming p = 0.5, the initial sample size (n₀) is 385 (rounded up). If the population is large or unknown, testing at least 385 items provides 95% confidence that the results are within ±5 percentage points of the true population value. If the population size is known (for example, N = 1,000 web pages), applying the finite population correction yields a smaller adjusted sample size (278 web pages).
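The worked example above can be reproduced with a short Python sketch of Cochran's two-step calculation (the function name is illustrative):

```python
import math
from typing import Optional

def cochran_sample_size(z: float, e: float, p: float = 0.5,
                        population: Optional[int] = None) -> int:
    """Estimate how many ICT items to test.

    z: z-value for the chosen confidence level
    e: margin of error, expressed as a decimal
    p: estimated proportion with the attribute (0.5 is most conservative)
    population: total population size N, if known
    """
    # Step 1: initial sample size for a large or unknown population
    n0 = z**2 * p * (1 - p) / e**2
    if population is None:
        return math.ceil(n0)
    # Step 2: finite population correction when N is known
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# 95% confidence (Z = 1.96), ±5% margin of error, p = 0.5
print(cochran_sample_size(1.96, 0.05))                   # 385
print(cochran_sample_size(1.96, 0.05, population=1000))  # 278
```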
Sample Size Considerations
Determining your sample size depends on available resources, project scope, and the risk associated with Section 508 defects; these factors help inform the best sampling approach.
Some items to consider when selecting a representative sample set include:
| Approach | Purpose | Sample Size | Expected Confidence | Use Case |
|---|---|---|---|---|
| Baseline Approach (Low Confidence) | Quick insight into overall Section 508 conformance or used for baselining conformance. | Five to ten items such as pages, screens, or documents | High margin of error and low confidence. Testing may miss critical defects. | Small teams or limited resources. |
| Balanced Approach (Moderate Confidence) | Balance between level of effort, ROI, and precision | 30–50 items such as pages, screens, or documents. | About 90–95% confidence with a ±10–15% margin of error, depending on population size. | Medium-sized agencies establishing a baseline or semi-resourced teams |
| Robust Approach (High Confidence) | More precision for decision-making and reporting. | 100+ items | 95% confidence with ±5–10% margin of error, depending on total population. | Large agencies, well-resourced teams, full Section 508 compliance reporting, or systems with high risk such as high traffic or public use. |
| Comprehensive Approach (Very High Confidence) | Maximum precision of Section 508 conformance. | All pages, screens, or documents | 100% confidence level and zero margin of error. | Mission-critical systems or when Section 508 nonconformance creates high risk. |
The relationship between sample size and the total number of ICT assets being tested, such as web pages, software applications, hardware, or electronic documents, is shown in Table 4 using 90% and 95% confidence levels as standard reference points, highlighting how the required sample size changes with population size.
For smaller populations, a proportionally larger sample is needed, while for larger populations, the increase in sample size is modest. Once the population size exceeds 5,000 ICT assets, the sample size increases slowly.
Online “statistical sample size” calculators can assist in determining the necessary sample size by factoring in population, confidence intervals, and margin of error. While 95% confidence is a common standard, it necessitates a slightly larger sample size. Opting for 90% confidence can reduce testing effort, though it provides slightly less certainty. In practical terms, for a mid-sized ICT population, the difference between 90% and 95% confidence typically amounts to 10 to 20 additional items tested.
| Total ICT (Population Size) | Sample Size for 90% Confidence | Sample Size for 95% Confidence | Approximate Margin of Error |
|---|---|---|---|
| 50 | 26 | 30 | ±10% |
| 100 | 41 | 49 | ±10% |
| 500 | 70 | 81 | ±10% |
| 1,000 | 73 | 88 | ±10% |
| 5,000+ | 90–100 | 100–150 | ±8–10% |
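The diminishing-returns pattern in Table 4 can be illustrated with a short Python sketch applying the finite population correction at 95% confidence and ±10% margin of error (exact values differ slightly from the table depending on the rounding conventions used):

```python
import math

def fpc_sample(n0: float, population: int) -> int:
    """Apply the finite population correction: n = n0 / (1 + (n0 - 1) / N)."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# Initial sample size at 95% confidence (Z = 1.96), ±10% margin, p = 0.5
n0 = 1.96**2 * 0.5 * (1 - 0.5) / 0.10**2   # 96.04

# Sample size grows quickly for small populations, then plateaus
for population in (50, 100, 500, 1000, 5000, 50_000):
    print(f"N = {population:>6}: test {fpc_sample(n0, population)} items")
```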
Related Resources
- Technology Accessibility Playbook - Play 9: Integrate Accessibility Needs into Development Processes
- Technology Accessibility Playbook - Play 10: Conduct Information and Communication Technology Accessibility Testing
- Technology Accessibility Playbook - Play 11: Track and Resolve Accessibility Issues
- Testing Lifecycle Overview
- Testing Methods Overview
- Using Statistical Sampling
