In Module 4 of the How to Test and Remediate PDFs for Accessibility Using Adobe Acrobat DC video series, learn about the techniques advanced users employ to identify and correct text in scanned pages and to support signed memorandums as Section 508 conformant PDFs.
[bright musical tones with gentle swoosh]
Voice Over (VO): Accessible Electronic Document Community of Practice, AED-COP.
[gentle swoosh]
VO: You are watching Module 4, "Converting Scanned Documents into Section 508 Conformant PDFs." Part 5 of a video series on PDF 508 conformance.
VO: In this video, we will learn about the following:
- How to identify scanned pages
- How to perform optical character recognition
- How to correct recognized text
- How to enhance scanned pages
- How to evaluate OCR results
- How to edit textual content of a PDF
- How to make the PDF fully accessible
- And finally, how to support signed memorandums as Section 508 Conformant PDFs
VO: How to identify scanned pages. PDFs that contain scanned pages are problematic for individuals who use assistive technology. If a PDF does not contain searchable content, otherwise known as renderable text, individuals who rely on assistive technology, such as screen readers, will be unable to read or interact with the content of the PDF.
VO: To quickly identify if the PDF contains scanned pages, navigate through the document and look for pages that appear blurry or contain handwritten information. If any pages appear blurry or contain handwritten information, use Acrobat's content pane to see if the pages have renderable text.
VO: First, go to Acrobat's View menu and select Show/Hide. Then, Navigation Panes. Then, Content. Next, expand the content tree by pressing Shift and the number 8. Now, use the up and down arrow keys to navigate the content pane. As you navigate the content pane, it should contain strings of text. As each string of text is selected, the corresponding string of text should appear in a container box on the physical view of the PDF.
VO: Because Optical Character Recognition, or OCR software, is not perfect, the text strings in the content pane may not fully match the text strings that appear on the physical view of the page. If the content pane only contains information related to figures, the scanned pages lack renderable text. And therefore, OCR will need to be performed before the document can be made accessible.
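The check described above (a page whose content tree holds only figures lacks renderable text) can be sketched in code. This is a minimal illustration, not an Acrobat feature: `PageSummary` and `pages_needing_ocr` are hypothetical names, and the per-page counts are assumed to come from whatever PDF inspection tool is available.

```python
from dataclasses import dataclass


@dataclass
class PageSummary:
    """Hypothetical per-page summary; not an Acrobat data structure."""
    number: int       # 1-based page number
    text_chars: int   # characters of renderable text found on the page
    image_count: int  # figure/image objects found on the page


def pages_needing_ocr(pages, min_text_chars=1):
    """Return the numbers of pages that hold images but no renderable text."""
    return [
        p.number
        for p in pages
        if p.image_count > 0 and p.text_chars < min_text_chars
    ]


pages = [
    PageSummary(number=1, text_chars=1840, image_count=0),  # page with real text
    PageSummary(number=2, text_chars=0, image_count=1),     # scanned page, no text
    PageSummary(number=3, text_chars=912, image_count=1),   # scan that was already OCR'd
]
print(pages_needing_ocr(pages))
```

Only page 2 is flagged here, because it is the only page that carries an image and no text. A manual pass through the content pane is still needed, since OCR text that exists may not match the page.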
VO: Next, we will learn more about how to perform Optical Character Recognition. Adobe Acrobat Pro DC contains Optical Character Recognition, or OCR software. This software has the ability to convert scanned content into searchable text. However, the software will not be able to OCR the document's content if the quality of the scanned page is too low, or if the page already contains renderable text.
VO: Adobe recommends that when scanning documents, the DPI setting for grayscale content is set to 300 DPI, and for color content, the DPI is set to 600 DPI.
VO: To access the OCR software, follow these steps:
- From Acrobat's main toolbar, select Tools, then Enhance Scans.
- Next, from Acrobat's sub toolbar, select Recognize Text, then In this file.
- Then, select Settings from the Recognize Text toolbar.
- Once the Recognize Text Settings properties box appears, identify the pages that require OCR in the Page section.
- Now, in the Settings section of the Recognize Text properties box, set the document's language to the proper language, set the output to Searchable Image, and set downsampling to 300 DPI for grayscale content and 600 DPI for color content.
- Lastly, select OK to close settings, and then select Recognize Text to run OCR.
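As a rough aid for the DPI settings above, the sketch below looks up the downsample value stated in this video for each content type and works out the resulting pixel dimensions of a page. The function names and the dictionary are illustrative assumptions, not Acrobat settings keys.

```python
# DPI values as stated in this video; the keys are hypothetical labels.
DOWNSAMPLE_DPI = {"grayscale": 300, "color": 600}


def downsample_dpi(color_mode):
    """Return the DPI this video gives for the given content type."""
    try:
        return DOWNSAMPLE_DPI[color_mode]
    except KeyError:
        raise ValueError(f"unknown color mode: {color_mode!r}") from None


def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of a page of the given size (in inches) at a DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# A US Letter page (8.5 x 11 inches) scanned as grayscale content.
dpi = downsample_dpi("grayscale")
print(pixel_size(8.5, 11, dpi))  # (2550, 3300)
```

The same Letter page at 600 DPI comes to 5100 by 6600 pixels, four times as many pixels, which is one reason scanned files grow quickly at higher settings.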
VO: How to Correct Recognized Text. Based on the quality of the scanned document, the document's pages may contain OCR suspects. An OCR suspect is renderable text or images that may not have been recognized properly by the software.
VO: To identify and repair OCR suspects, do the following:
- From the Recognize Text toolbar option, select Correct Recognized Text.
- The first OCR suspect will appear in a box.
- If the OCR suspect was recognized correctly, select Accept from the Recognized Text sub toolbar.
- If the OCR suspect was not correct, type the correction in the Recognized As text box, and then select Accept from the Recognized Text sub toolbar.
- Complete this process until all OCR suspects have been corrected.
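The accept-or-correct loop above can be modeled in a few lines. This is only a sketch of the review logic under assumed names (`Suspect`, `review_suspects`); Acrobat does not expose its suspects this way, and the corrections are supplied by a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class Suspect:
    """Hypothetical record for one OCR suspect."""
    page: int         # page the suspect appears on
    recognized: str   # text the OCR software produced


def review_suspects(suspects, corrections):
    """Accept each suspect as-is, or replace it with the reviewer's correction.

    `corrections` maps a misrecognized string to the text the reviewer
    typed in its place; suspects without an entry are accepted unchanged.
    """
    return [corrections.get(s.recognized, s.recognized) for s in suspects]


suspects = [
    Suspect(page=1, recognized="Secti0n 508"),
    Suspect(page=1, recognized="memorandum"),
    Suspect(page=2, recognized="s1gnature"),
]
corrections = {"Secti0n 508": "Section 508", "s1gnature": "signature"}
print(review_suspects(suspects, corrections))
```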
VO: How to Enhance Scanned Pages. If you receive a poorly scanned document that has contrast issues, speckles on the page, or the page is skewed, you may need to optimize a scanned document before attempting to recognize the text.
VO: To enhance the scanned pages, do the following:
- From the Enhance Scans toolbar, select Enhance, then Scanned Document.
- Next, select Settings.
- Then, set all the settings as desired and select OK.
- Lastly, select Enhance from the sub toolbar.
VO: How to Evaluate OCR Results. Once the PDF has been OCR'd and all OCR suspects have been corrected, the quality of the OCR will need to be inspected. Although you can navigate the content pane to review the OCR'd content, an easier way to validate the content is to export it to Microsoft Word.
VO: To evaluate the OCR'd content in Microsoft Word, do the following:
- From Acrobat's menu bar, select Tools, then Export PDF.
- Make sure the Word Document radio button is selected.
- Lastly, select Export to export the PDF to Microsoft Word.
VO: Once the PDF has been exported to Microsoft Word, compare the content of the Word file to the PDF file. Make a note of any OCR errors so that they can either be edited directly in the PDF file or addressed via the Tags Properties menu.
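When a trusted copy of the text exists (for example, the original Word source of a memorandum), the comparison described above can be partly automated. The sketch below uses Python's standard `difflib` module to list word-level mismatches. It assumes both texts are already available as plain strings, and `ocr_differences` is a hypothetical helper name.

```python
import difflib


def ocr_differences(reference_text, ocr_text):
    """Return (expected, recognized) pairs where the OCR text differs."""
    ref_words = reference_text.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            expected = " ".join(ref_words[i1:i2])
            recognized = " ".join(ocr_words[j1:j2])
            differences.append((expected, recognized))
    return differences


reference = "The memorandum is effective March 1"
exported = "The mernorandum is effective March l"
for expected, recognized in ocr_differences(reference, exported):
    print(f"expected {expected!r}, OCR produced {recognized!r}")
```

Each reported pair is a candidate for either a direct edit in the PDF or a correction through the tag's properties. Without a trusted reference, the comparison remains a visual one.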
VO: How to Edit Textual Content of a PDF. If the PDF contains several OCR errors, it is possible to make minor edits to the PDF by using the Edit PDF tool. Keep in mind that Acrobat is not a word processor; therefore, content edited in the PDF may not maintain the proper formatting or style.
VO: If the PDF has large blocks of text that were not recognized properly, it will be best to retype the paragraph content in the actual text field located in the Tags Property menu.
VO: To make minor edits to the content, do the following:
- From Acrobat's menu bar, select Tools, then Edit PDF.
- All of the content that can be edited will appear in boxes.
- Next, use the individual edit tools located on the sub-tool bar to correct the document.
- Lastly, close the edit tools.
VO: Next, we will discuss how to make the PDF fully accessible. Once the PDF has been OCR'd, all OCR suspects have been corrected, and OCR errors have been edited, the PDF will need to be tagged, alternative text will need to be added to images, and the PDF will need to be tested for Section 508 conformance.
VO: If large blocks of text contained OCR errors and the errors could not be corrected via the Edit tools, it will be necessary to use the actual text properties field to replace OCR errors to ensure that screen readers can read the PDF correctly.
VO: Because the corrected text is applied to the properties of a paragraph tag, the visual content on the screen will not be modified, nor will the original on-screen content be read by assistive technology. Instead, assistive technology will read the text added to the actual text field.
VO: To add correction text to the actual text field, perform the following steps:
- Open the Tags pane.
- Now, select the Selection tool located on the menu bar.
- Next, use the Selection tool to select the first OCR error.
- In the Tags pane, select Options, then Find Tag from Selection.
- Next, right-click on the highlighted tag and select Properties.
- Then, from the Properties box, add the corrected text to the actual text field.
- Lastly, select Close to close the Properties box.
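The effect of the actual text field can be pictured with a small model: when a tag carries actual text, assistive technology reads that text in place of the page content, and the page itself is left unchanged. `ParagraphTag` and `text_read_aloud` are hypothetical names used only for this illustration and do not reflect how any particular screen reader is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParagraphTag:
    """Hypothetical model of a paragraph tag in the Tags pane."""
    page_content: str                  # text on the page, OCR errors included
    actual_text: Optional[str] = None  # correction typed into the actual text field


def text_read_aloud(tag):
    """Return what assistive technology would read for this tag."""
    if tag.actual_text is not None:
        return tag.actual_text   # the replacement text wins
    return tag.page_content      # no replacement: read the page content


tag = ParagraphTag(page_content="Th1s mem0 is eflective imrnediately.")
print(text_read_aloud(tag))      # the OCR errors would be read as-is

tag.actual_text = "This memo is effective immediately."
print(text_read_aloud(tag))      # the corrected text is read instead
print(tag.page_content)          # the visible page content is unchanged
```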
VO: In the next section, we will learn about how to support signed memorandums as Section 508 Conformant PDFs. Agencies often struggle with ensuring that signed memorandums and other hardcopy signed agency directives and official communications are 508 compliant.
VO: Since many agencies do not yet support a secure e-signature solution for all PDF documents and staff, you are often presented with a Word document that needs to be printed out into hardcopy and signed in ink before being scanned and sent out electronically.
VO: In these cases, you have a Word document that already contains the markup and support for 508, but you are printing it out for a signature and then scanning it back up into a PDF, losing all structure and markup.
VO: Instead of having to restructure an entire scanned PDF, start by looking at the Word version. Is the document already set up for 508? How many pages contain a signature block?
VO: If you have a Word document and there are only one or two pages with signature blocks, consider merging the document pages to save time while ensuring 508 conformance.
VO: To begin:
- Print out the document for signature.
- Once it comes back to you signed, take the page or pages with the signature and scan to PDF.
- Perform OCR and markup support, as previously outlined in this video, on the scanned page with the signature block.
VO: Now, check the Word document to make sure you don't need to edit any 508 markups before you move it to PDF. You will then have two PDF documents, the main piece of the document and the signed scanned page or pages.
VO: To merge the PDF documents:
- Choose Tools, then Organize Pages. Alternatively, you can choose Organize Pages from the right pane.
- The Organize Pages tool set is displayed in the secondary toolbar, and the page thumbnails are displayed in the document area.
- Right-click with your mouse on the page thumbnail you wish to delete, then select Delete from the drop-down menu.
VO: Note, you cannot delete all pages. At least one page must remain in the document.
VO: Now, position your mouse between the page thumbnails where you would like to place the scanned signed page.
VO: Merge the new signed page into the main document:
- From the Organize Pages window that you are still in, select Insert, then From File, and choose the file that is the scanned page that you have saved as a PDF with markup and OCR already done.
- If you need to reposition the new page within the document, drag it with the mouse to the correct location.
- Save the newly merged document with the document title as the title.
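The delete-then-insert sequence above can be sketched with plain lists standing in for page thumbnails. The function names are hypothetical, and the page labels are placeholders. The model only mirrors the order of operations and the rule that at least one page must remain.

```python
def delete_page(pages, index):
    """Return a copy of `pages` without the page at `index`."""
    if not 0 <= index < len(pages):
        raise IndexError("page index out of range")
    if len(pages) <= 1:
        raise ValueError("at least one page must remain in the document")
    return pages[:index] + pages[index + 1:]


def insert_pages(pages, index, new_pages):
    """Return a copy of `pages` with `new_pages` placed before `index`."""
    return pages[:index] + list(new_pages) + pages[index:]


def replace_with_signed(pages, index, signed_pages):
    """Delete the unsigned page, then insert the signed scan in its place."""
    remaining = delete_page(pages, index)
    return insert_pages(remaining, index, signed_pages)


memo = ["memo-page-1", "memo-page-2", "memo-page-3-unsigned"]
signed_scan = ["scan-page-3-signed"]  # already OCR'd and tagged
print(replace_with_signed(memo, 2, signed_scan))
```

Because the deletion happens first, a one-page memorandum cannot be handled this way. In that case, the signed page would need to be inserted before the unsigned page is deleted.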
VO: Scanned PDFs will not have a robust document title, as the scanner generates a random title. So make sure you save the document with the appropriate title.
VO: Next, perform one more quick check following the steps previously outlined. Now, you have an accessible 508 conformant signed memorandum.
VO: Note, after you delete or replace pages, it's a good idea to use the Reduce File Size command to rename and save the restructured document to the smallest possible file size.
VO: To do this, select the File menu, then Save as Other, then Reduced Size PDF.
VO: We hope you have enjoyed this series of videos, and we encourage you to review them as often as necessary.
VO: To keep up on the latest developments on electronic document accessibility, please frequently visit www.section508.gov/refresh-toolkit.
Related PDF Video Series Modules
- Module 0: Background & Introduction
- Module 1: What is a PDF?
- Module 2: Testing a PDF for Accessibility
- Module 3: Remediating PDFs for Accessibility
- Module 4: Converting Scanned Documents into Section 508 Conformant PDFs
