How can I generate a PDF/UA compatible PDF with iText?

Tags: PDF/UA, tagged pdf, iText 7

We have a number of dynamically generated PDFs on our site that were created using iText 2.1.7. However, we also have a large number of users who have disabilities and use screen readers, such as JAWS, to read our PDFs. We use the setTagged() method to tag the PDFs, but some elements of the PDF appear out of order. Some even become more jumbled after calling setTagged()!

I read about PDF/UA in a 2013 interview about iText with Bruno Lowagie, and this seems like something that might help with our problem. However, I have not been able to find a good example of how to generate a PDF/UA document. Can you provide an example?

Posted on StackOverflow on Jan 29, 2015 by k-den

Please take a look at the PdfUA example. It explains step by step what is needed to be compliant with PDF/UA. A similar example was presented at the iText Summit in 2014 and at JavaOne. Watch the iText Summit video tutorial.

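// Written against iText 7.0/7.1. FONT, FOX and DOG are constants defined elsewhere
// (the path to a font file and the paths to two images).
import com.itextpdf.io.font.PdfEncodings;
import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfDocumentInfo;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.PdfVersion;
import com.itextpdf.kernel.pdf.PdfViewerPreferences;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.WriterProperties;
import com.itextpdf.kernel.xmp.XMPException;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Image;
import com.itextpdf.layout.element.List;
import com.itextpdf.layout.element.ListItem;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.element.Text;
import com.itextpdf.layout.property.ListNumberingType;
import java.io.IOException;

public void manipulatePdf(String dest) throws IOException, XMPException {
    //PDF/UA
    //addUAXmpMetadata() asks iText to write the XMP metadata that identifies the file as PDF/UA
    PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest, new WriterProperties().addUAXmpMetadata().setPdfVersion(PdfVersion.PDF_1_7)));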
    Document document = new Document(pdfDoc, new PageSize(PageSize.A4).rotate());
    //TAGGED PDF
    //Make document tagged
    pdfDoc.setTagged();
    //===============
    //PDF/UA
    //Set document metadata
 
    pdfDoc.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
    pdfDoc.getCatalog().setLang(new PdfString("en-US"));
    PdfDocumentInfo info = pdfDoc.getDocumentInfo();
    info.setTitle("English pangram");
    //=====================
 
    Paragraph p = new Paragraph();
    //PDF/UA
    //Embed font
    PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
    p.setFont(font);
    //==================
    Text c = new Text("The quick brown ");
    p.add(c);
    Image i = new Image(ImageDataFactory.create(FOX));
    //PDF/UA
    //Set alt text
    i.getAccessibilityProperties().setAlternateDescription("Fox");
    //==============
    p.add(i);
    p.add(" jumps over the lazy ");
    i = new Image(ImageDataFactory.create(DOG));
    //PDF/UA
    //Set alt text
    i.getAccessibilityProperties().setAlternateDescription("Dog");
    //==================
    p.add(i);
    document.add(p);
    p = new Paragraph("\n\n\n\n\n\n\n\n\n\n\n\n").setFont(font).setFontSize(20);
    document.add(p);
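    //Numbered list (1., 2., 3., ...); a plain new List() would use a dash as list symbol
    List list = new List(ListNumberingType.DECIMAL);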
    list.add((ListItem) new ListItem("quick").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("brown").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("fox").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("jumps").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("over").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("the").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("lazy").setFont(font).setFontSize(20));
    list.add((ListItem) new ListItem("dog").setFont(font).setFontSize(20));
    document.add(list);
    document.close();
}

Calling the setTagged() method makes the document tagged, but that alone is not sufficient. You also need to set document metadata: the viewer must be told to display the document title, and you need to indicate the language used in the document. XMP metadata is mandatory as well.

Furthermore, you need to embed all fonts. When you have images, you need an alternate description. In the example, we replace the words "dog" and "fox" with images. To make sure that these images are "read out loud" correctly, we use the getAccessibilityProperties().setAlternateDescription() method.
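The requirements in the two paragraphs above can be written down as a pre-flight checklist. The following is a minimal, hypothetical sketch in plain Java: UaFont, UaImage, UaDocumentSettings and UaChecklist are invented names, they are not part of iText, and passing this check does not make a file PDF/UA compliant (only a dedicated PDF/UA validator can tell you that). It simply records the settings the answer lists and reports which ones are missing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT an iText API: it only records the settings the answer
// says a PDF/UA document needs, so they can be checked before the file is written.
final class UaFont {
    final String name;
    final boolean embedded;

    UaFont(String name, boolean embedded) {
        this.name = name;
        this.embedded = embedded;
    }
}

final class UaImage {
    final String name;
    final String altText; // null or blank means "no alternate description"

    UaImage(String name, String altText) {
        this.name = name;
        this.altText = altText;
    }
}

final class UaDocumentSettings {
    boolean tagged;           // setTagged() was called
    boolean displayDocTitle;  // viewer preference DisplayDocTitle = true
    String title;             // document info title
    String lang;              // catalog language, e.g. "en-US"
    boolean hasXmpMetadata;   // XMP metadata stream present
    final List<UaFont> fonts = new ArrayList<>();
    final List<UaImage> images = new ArrayList<>();
}

final class UaChecklist {
    private UaChecklist() {}

    // Returns one message per unmet requirement; an empty list means the checklist passed.
    static List<String> problems(UaDocumentSettings d) {
        List<String> out = new ArrayList<>();
        if (!d.tagged) out.add("document is not tagged");
        if (isBlank(d.title)) out.add("document title is missing");
        if (!d.displayDocTitle) out.add("viewer is not told to display the document title");
        if (isBlank(d.lang)) out.add("document language is not set");
        if (!d.hasXmpMetadata) out.add("XMP metadata is missing");
        for (UaFont f : d.fonts) {
            if (!f.embedded) out.add("font is not embedded: " + f.name);
        }
        for (UaImage img : d.images) {
            if (isBlank(img.altText)) out.add("image has no alternate description: " + img.name);
        }
        return out;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}

class UaChecklistDemo {
    public static void main(String[] args) {
        // Roughly mirrors the settings used in the PdfUA example.
        UaDocumentSettings s = new UaDocumentSettings();
        s.tagged = true;
        s.displayDocTitle = true;
        s.title = "English pangram";
        s.lang = "en-US";
        s.hasXmpMetadata = true;
        s.fonts.add(new UaFont("FONT", true));
        s.images.add(new UaImage("FOX", "Fox"));
        s.images.add(new UaImage("DOG", "Dog"));
        System.out.println(UaChecklist.problems(s)); // prints []
    }
}
```

With every setting in place the problem list is empty; a default-constructed UaDocumentSettings holding one image without alt text reports six problems, one per missing requirement.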

At the end of the example, I added a numbered list. In another question, you claim that the list is not read out loud correctly by JAWS. If you check the PDF file created with the above example, more specifically pdfua.pdf, you'll discover that JAWS reads the document as expected, with the numbers and the text in the right order.
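Why does JAWS get the order right? Roughly speaking, for a tagged PDF a screen reader follows the logical structure tree rather than the position of text on the page. The sketch below is a simplified, hypothetical model of that idea in plain Java. It is not iText code and not how JAWS is implemented; StructNode and ReadingOrder are invented names. The roles L (list), LI (list item), Lbl (the label, i.e. the number) and LBody (the item text) are the standard PDF structure types for lists, and a depth-first walk over them yields the reading order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a node in a PDF structure tree; NOT an iText class.
final class StructNode {
    final String role;            // standard structure type, e.g. "L", "LI", "Lbl", "LBody"
    final String text;            // content of a leaf node; null for container nodes
    final List<StructNode> kids;  // children in logical (document) order

    private StructNode(String role, String text, StructNode[] kids) {
        this.role = role;
        this.text = text;
        this.kids = Arrays.asList(kids);
    }

    static StructNode leaf(String role, String text) {
        return new StructNode(role, text, new StructNode[0]);
    }

    static StructNode node(String role, StructNode... kids) {
        return new StructNode(role, null, kids);
    }
}

final class ReadingOrder {
    private ReadingOrder() {}

    // Depth-first, pre-order walk: a node's own text first, then its children left to right.
    static List<String> of(StructNode root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    private static void walk(StructNode n, List<String> out) {
        if (n.text != null) out.add(n.text);
        for (StructNode kid : n.kids) walk(kid, out);
    }

    // Builds L > LI > (Lbl, LBody) for each word, the shape of a tagged numbered list.
    static StructNode numberedList(String... words) {
        StructNode[] items = new StructNode[words.length];
        for (int i = 0; i < words.length; i++) {
            items[i] = StructNode.node("LI",
                    StructNode.leaf("Lbl", (i + 1) + "."),
                    StructNode.leaf("LBody", words[i]));
        }
        return StructNode.node("L", items);
    }
}

class ReadingOrderDemo {
    public static void main(String[] args) {
        StructNode list = ReadingOrder.numberedList("quick", "brown", "fox");
        System.out.println(String.join(" ", ReadingOrder.of(list))); // prints 1. quick 2. brown 3. fox
    }
}
```

Each number is read right before its text because Lbl and LBody are siblings inside the same LI. If the structure tree is missing, or was built in the wrong order, the reader has no reliable logical order to follow.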

The reason why "it doesn't work" when you try this is simple: you are using a version of iText that is 3 years older than the PDF/UA standard. Also, in the version you are using, you are responsible for creating the tag structure at the lowest PDF level when you use the setTagged() method. In more recent versions, iText takes care of this at a high level. You need the latest iText version to achieve what you want.

Click this link if you want to see how to answer this question in iText 5.