7. Creating PDF invoices (Comfort)

There is an add-on for iText called XML Worker. XML Worker can convert simple HTML and CSS files to PDF. XML Worker isn't a URL2PDF tool; you can't use it to convert your web site to a PDF. It was designed to allow people to create a template in HTML, populate it with data, and then convert it to PDF, using CSS to define styles and colors. Incidentally, we have just created a batch of invoices in the HTML format. I wonder if we could convert those to ZUGFeRD invoices...

Converting XML to HTML

Please take a look at the PdfInvoicesComfort example. We'll reuse some of the code from the example in chapter 6, but we'll introduce some minor changes:

PdfInvoicesComfort app = new PdfInvoicesComfort();
StreamSource xsl = new StreamSource(new File(XSL));
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(xsl);
ComfortProfile comfort;
InvoiceData invoiceData = new InvoiceData();
PojoFactory pojofactory = PojoFactory.getInstance();
List<Invoice> invoices = pojofactory.getInvoices();
for (Invoice invoice : invoices) {
    comfort = invoiceData.createComfortProfileData(invoice);
    InvoiceDOM dom = new InvoiceDOM(comfort);
    byte[] xml = dom.toXML();
    app.createPdf(xml,
        app.createHtml(transformer, xml), String.format(DEST, invoice.getId()));
}
pojofactory.close();

In lines 2 to 4, we create a Transformer object that will use the XSL file we created in chapter 6 to convert the Comfort XML into HTML. We get a List of Invoice objects in lines 7 and 8. We loop over this List and create the XML for each invoice in lines 10 to 12. We call the createHtml() method in line 14 and pass the result to the createPdf() method in lines 13 and 14.

We've already explained the process of creating the HTML in chapter 6.

public byte[] createHtml(Transformer transformer, byte[] comfort)
    throws IOException, ParserConfigurationException, SAXException,
    DataIncompleteException, InvalidCodeException, TransformerException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    StreamSource xml = new StreamSource(new ByteArrayInputStream(comfort));
    Writer writer = new OutputStreamWriter(baos);
    transformer.transform(xml, new StreamResult(writer));
    writer.flush();
    writer.close();
    return baos.toByteArray();
}

Instead of writing the HTML to a file, we keep the HTML in memory. We will pass the resulting byte[] to the createPdf() method.
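To make the in-memory round trip concrete, here is a minimal sketch that uses only the JDK's javax.xml.transform classes. The inline stylesheet, the <invoice> element and the class name InMemoryTransformSketch are hypothetical stand-ins for the real XSL file and the Comfort XML. The sketch also names UTF-8 explicitly for the Writer, which is an assumption on our part rather than something the example above does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InMemoryTransformSketch {

    // Hypothetical stylesheet: turns <invoice id="..."/> into a tiny HTML page.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" encoding=\"UTF-8\"/>"
      + "<xsl:template match=\"/invoice\">"
      + "<html><body><h1>Invoice <xsl:value-of select=\"@id\"/></h1></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet once; the Transformer can be reused per invoice.
    static Transformer newTransformer() {
        try {
            return TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // XML bytes in, HTML bytes out; nothing is written to disk.
    static byte[] toHtml(Transformer transformer, byte[] xml) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // try-with-resources flushes and closes the writer before we read baos
        try (Writer writer = new OutputStreamWriter(baos, StandardCharsets.UTF_8)) {
            transformer.transform(
                new StreamSource(new ByteArrayInputStream(xml)),
                new StreamResult(writer));
        } catch (TransformerException | IOException e) {
            throw new IllegalStateException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] xml = "<invoice id=\"42\"/>".getBytes(StandardCharsets.UTF_8);
        byte[] html = toHtml(newTransformer(), xml);
        System.out.println(new String(html, StandardCharsets.UTF_8));
    }
}
```

The byte[] returned by toHtml() plays the same role as the invoice parameter of createPdf(): it can be wrapped in a ByteArrayInputStream and handed to a parser without ever touching the file system.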

Creating the ZUGFeRD PDF

Steps 1, 2, 3 and 5 of the createPdf() method were already explained in chapter 5:

public void createPdf(byte[] xml, byte[] invoice, String dest)
    throws DocumentException, IOException, XMPException {
    // step 1
    Document document = new Document();
    // step 2
    PdfAWriter writer = PdfAWriter.getInstance(document,
        new FileOutputStream(dest), PdfAConformanceLevel.ZUGFeRDComfort);
    writer.setTagged();
    writer.setPdfVersion(PdfWriter.VERSION_1_7);
    writer.createXmpMetadata();
    writer.getXmpWriter().setProperty(PdfAXmpWriter.zugferdSchemaNS,
        PdfAXmpWriter.zugferdDocumentFileName, "ZUGFeRD-invoice.xml");
    // step 3
    document.open();
    // step 4
    ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream(ICC));
    writer.setOutputIntents(
        "Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);        
    // Convert the HTML to PDF using XML Worker
    ...
    PdfDictionary parameters = new PdfDictionary();
    parameters.put(PdfName.MODDATE, new PdfDate());
    PdfFileSpecification fileSpec = writer.addFileAttachment(
            "ZUGFeRD invoice", xml, null,
            "ZUGFeRD-invoice.xml", "application/xml",
            AFRelationshipValue.Alternative, parameters);
    PdfArray array = new PdfArray();
    array.add(fileSpec.getReference());
    writer.getExtraCatalog().put(PdfName.AF, array);
    // step 5
    document.close();
}

In chapter 5, we created Paragraphs and PdfPTable objects with PdfPCells to add the content to the Document instance. In this case, we'll leave the creation of high-level objects to iText.

Before we look at the code that is missing in step 4 of the previous code snippet, we should take a closer look at the FontProvider and the ImageProvider implementations.

Creating a FontProvider

When we created the PDF document from scratch in chapter 5, we needed to provide the path to a TTF file. When we look at the HTML created in chapter 6, we notice that we need to provide a regular font and a bold font. We could tell iText always to use OpenSans-Regular for the regular fonts, and always to use OpenSans-Bold for the bold fonts. The MyFontProvider class shows how this could be achieved. It implements the FontProvider interface:

class MyFontProvider implements FontProvider {
    protected BaseFont regular;
    protected BaseFont bold;
    public MyFontProvider() throws DocumentException, IOException {
        regular = BaseFont.createFont(
            "resources/fonts/OpenSans-Regular.ttf", BaseFont.IDENTITY_H, true);
        bold = BaseFont.createFont(
            "resources/fonts/OpenSans-Bold.ttf", BaseFont.IDENTITY_H, true);
    }
    public boolean isRegistered(String fontname) {
        return true;
    }
    public Font getFont(String fontname, String encoding, boolean embedded,
        float size, int style, BaseColor color) {
        Font font;
        switch (style) {
            case Font.BOLD:
                font = new Font(bold, size);
                break;
            default:
                font = new Font(regular, size);
        }
        font.setColor(color);
        return font;
    }
}

In lines 5 to 8, we create BaseFont objects. These objects will be used to create Font objects in the getFont() method in lines 18 and 21. The getFont() method will be triggered from within XML Worker. When XML Worker detects that a specific font is needed to render text, it will call the getFont() method, passing information about the font family, the encoding, whether or not the font should be embedded, the font size, its style and its color. We'll ignore the font family, the encoding and the embedding, because we really want the OpenSans font to be embedded using Unicode. However, we'll look at the style, the color and the font size, and create a Font object accordingly. Note that we only take into account font-weight bold as a style. We could also allow italic fonts, but then we'd need to add another BaseFont object that has access to OpenSans-Italic. This isn't necessary in the context of this example.
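One consequence of switching on the exact style value is worth spelling out. The sketch below is plain Java and doesn't use iText. The constants are assumed to mirror the values of Font.UNDEFINED, Font.NORMAL, Font.BOLD and Font.ITALIC in iText 5, where the style is a bit field. The class FontStyleSketch and its two methods are hypothetical names used for illustration only.

```java
class FontStyleSketch {

    // Assumed to mirror iText 5's Font.UNDEFINED, NORMAL, BOLD and ITALIC.
    static final int UNDEFINED = -1;
    static final int NORMAL = 0;
    static final int BOLD = 1;
    static final int ITALIC = 2;

    // Same decision as the switch in MyFontProvider:
    // only a style that is exactly BOLD selects the bold font.
    static String pickWithSwitch(int style) {
        switch (style) {
            case BOLD:
                return "OpenSans-Bold";
            default:
                return "OpenSans-Regular";
        }
    }

    // Possible variation: test the BOLD bit, so that a combined
    // bold + italic style still selects the bold font.
    // UNDEFINED (-1) has every bit set, so it is excluded explicitly.
    static String pickWithMask(int style) {
        boolean bold = style != UNDEFINED && (style & BOLD) != 0;
        return bold ? "OpenSans-Bold" : "OpenSans-Regular";
    }

    public static void main(String[] args) {
        int boldItalic = BOLD | ITALIC;
        System.out.println("switch: " + pickWithSwitch(boldItalic));
        System.out.println("mask:   " + pickWithMask(boldItalic));
    }
}
```

For the invoice HTML in this example the difference doesn't matter, because the template only uses regular and bold text. It would start to matter if the template combined bold with another style.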

Creating an ImageProvider

In chapter 6, we copied the logo from the resources directory to the output directory because the HTML used a relative path. In this case, the HTML isn't stored as a file; it only exists in memory. If we refer to an image using a relative path from the HTML, iText won't know where to look for that image, unless we implement the ImageProvider interface.

class MyImageProvider implements ImageProvider {
    protected Map<String,Image> cache = new HashMap<String,Image>();
    public Image retrieve(String src) {
        Image img = cache.get(src);
        try {
            if (img == null) {
                img = Image.getInstance(getImageRootPath() + src);
                store(src, img);
            }
        } catch (BadElementException ex) {
            throw new ExceptionConverter(ex);
        } catch (IOException ex) {
            throw new ExceptionConverter(ex);
        }
        return img;
    }
    public String getImageRootPath() {
        return "resources/zugferd/";
    }
    public void store(String src, Image img) {
        cache.put(src, img);
    }
    public void reset() {
        cache = new HashMap<String,Image>();
    }
}

As we know that we'll be using the same logo over and over again for every invoice, it makes sense to create a cache that maps the path to the image as defined in the HTML to the actual Image object; see line 2. XML Worker will call the retrieve() method in line 3 and try to get the image from the cache object. If the image wasn't found, a new Image instance will be created (line 7) and it will be stored (line 8) using the store() method (lines 20-22). To find the image, we also have to provide the root path for the image. This is done in the getImageRootPath() method (lines 17-19).
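The effect of the cache can be checked without iText. The following sketch reproduces the retrieve-or-load logic with a generic value type and counts how often the loader is actually called. ResourceCacheSketch, the loader function and the file name logo.png are hypothetical; in MyImageProvider the loading step is Image.getInstance().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ResourceCacheSketch<T> {

    private final Map<String, T> cache = new HashMap<>();
    private final String rootPath;
    private final Function<String, T> loader;
    private int loads = 0;

    ResourceCacheSketch(String rootPath, Function<String, T> loader) {
        this.rootPath = rootPath;
        this.loader = loader;
    }

    // Look in the cache first; load and store only on a miss.
    T retrieve(String src) {
        T value = cache.get(src);
        if (value == null) {
            value = loader.apply(rootPath + src);
            loads++;
            cache.put(src, value);
        }
        return value;
    }

    // Forget everything that was cached so far.
    void reset() {
        cache.clear();
    }

    // Number of times the loader was actually invoked.
    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        // The "image" is just a String here; the loader stands in for real I/O.
        ResourceCacheSketch<String> images =
            new ResourceCacheSketch<>("resources/zugferd/", path -> "image@" + path);
        images.retrieve("logo.png");
        images.retrieve("logo.png");
        System.out.println("loader calls: " + images.loads());
    }
}
```

Note that the key is the path as it appears in the HTML, while the loader receives the root path plus that key, just like in the retrieve() method above.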

We can now start parsing the HTML.

Parsing the HTML with XML Worker

We have created the Document and the PdfAWriter instance. We have prepared all the necessary steps to make the PDF a ZUGFeRD-compliant invoice. We now want to add the content to the PDF. This is done in several steps.

First we define where to find the CSS:

CSSResolver cssResolver = new StyleAttrCSSResolver();
CssFile cssFile = XMLWorkerHelper.getCSS(new FileInputStream(CSS));
cssResolver.addCss(cssFile);

Just like in chapter 6, CSS is a constant that contains the path to the simple CSS file we created.

Then we create the context for the HTML.

CssAppliers cssAppliers = new CssAppliersImpl(new MyFontProvider());
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
htmlContext.setImageProvider(new MyImageProvider());

We tell XML Worker that we want to use MyFontProvider whenever a specific font is needed. We define that tags should be processed using the default HTML tag processor factory that ships with XML Worker. You could build your own tag processors to convert custom XML to PDF, but it's much easier to process HTML. Finally, we set the ImageProvider to a MyImageProvider instance.

Now we're ready to connect all the different flows using pipelines.

PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

We'll use three pipelines:

  • The PdfWriterPipeline knows how to write content to the document and writer instances we've defined before.

  • We use this PdfWriterPipeline to create an HtmlPipeline with the HtmlPipelineContext; this is the context that knows about our font and image providers.

  • We use that HtmlPipeline to create a CssResolverPipeline with the CssResolver that has read our CSS file.
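Notice that the pipelines are constructed in the reverse order of processing: the last stage is created first, because each stage needs a reference to the stage that follows it. The sketch below illustrates that chaining idea in plain Java. It is a simplified model, not XML Worker's actual Pipeline API; the Stage interface and the stage names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PipelineChainSketch {

    // Hypothetical stand-in for a pipeline stage.
    interface Stage {
        void process(String tag, List<String> log);
    }

    // A stage records that it saw the tag, then hands it to the next stage.
    static Stage stage(String name, Stage next) {
        return (tag, log) -> {
            log.add(name + " saw <" + tag + ">");
            if (next != null) {
                next.process(tag, log);
            }
        };
    }

    static List<String> run(String tag) {
        // Built back to front, like the three pipelines above.
        Stage pdf = stage("pdf", null);
        Stage html = stage("html", pdf);
        Stage css = stage("css", html);

        // Processing starts at the stage that was created last.
        List<String> log = new ArrayList<>();
        css.process(tag, log);
        return log;
    }

    public static void main(String[] args) {
        for (String line : run("table")) {
            System.out.println(line);
        }
    }
}
```

In the same way, the CssResolverPipeline is the entry point of our chain: styles are resolved first, then the HTML tags are processed, and only then is content written to the PDF.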

Now we can create an XMLWorker instance and an XMLParser:

XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new ByteArrayInputStream(invoice));

We parse the byte[] that contains our invoice in the HTML format. The parse() method will add all the content it encounters to the document and the PdfWriter. XML Worker is doing all the heavy lifting for us, creating Paragraphs where there are <div> tags, PdfPTables where there are <table> tags, and PdfPCells where there are <th> or <td> tags.

The final result

Figure 7.1 shows the resulting PDF. It looks much nicer than the invoices we produced in chapter 5, doesn't it?

Figure 7.1: a ZUGFeRD invoice created from HTML

If you look at the panel to the left, you can see that I smuggled in an extra line when I defined the PdfAWriter: writer.setTagged(); As a result, my invoice is now also accessible. The HTML tables can now also be interpreted as tables by a PDF processor that understands tagged PDF. This is an example of a future-proof, archivable invoice that can be read by humans (including people who are visually impaired) as well as by machines.