7. Creating PDF invoices (Comfort)

iText 7 is a PDF library with several add-ons, one of which is called pdfHTML. The pdfHTML add-on is a powerful HTML to PDF conversion tool. It takes an HTML or HTML5 file, and converts it to PDF, taking into account CSS and media queries. Incidentally, we have just created a batch of invoices in the HTML format. I wonder if we could convert those to ZUGFeRD invoices...

Converting XML to HTML, and HTML to PDF

Please take a look at the PdfInvoicesComfort example. We'll reuse some of the code from the example in chapter 6, but instead of merely a createHtml() method, we'll now call a createPdf() method:

  public static void main(String[] args)
          throws SQLException, IOException,
          ParserConfigurationException, SAXException, TransformerException,
          DataIncompleteException, InvalidCodeException {
      // Load the iText license key file
      LicenseKey.loadLicenseFile(
          System.getenv("ITEXT7_LICENSEKEY")
          + "/itextkey-html2pdf_typography.xml");
      // DEST is a file-name pattern with a placeholder for the invoice ID;
      // here we only need its parent directory, to make sure it exists
      File file = new File(DEST);
      file.getParentFile().mkdirs();
      PdfInvoicesComfort app = new PdfInvoicesComfort();
      PojoFactory factory = PojoFactory.getInstance();
      List<Invoice> invoices = factory.getInvoices();
      // Create one ZUGFeRD Comfort PDF per invoice
      for (Invoice invoice : invoices) {
          app.createPdf(invoice,
              new FileOutputStream(String.format(DEST, invoice.getId())));
      }
      factory.close();
  }

That createPdf() method will create an HTML file first, and then convert it to PDF:

  1. public void createPdf(Invoice invoice, FileOutputStream fos)
  2.         throws IOException, ParserConfigurationException,
  3.         SAXException, TransformerException,
  4.         DataIncompleteException, InvalidCodeException {
  5.     IComfortProfile comfort =
  6.         new InvoiceData().createComfortProfileData(invoice);
  7.     InvoiceDOM dom = new InvoiceDOM(comfort);
  8.     StreamSource xml = new StreamSource(
  9.         new ByteArrayInputStream(dom.toXML()));
  10.     StreamSource xsl = new StreamSource(new File(XSL));
  11.     TransformerFactory factory = TransformerFactory.newInstance();
  12.     Transformer transformer = factory.newTransformer(xsl);
  13.     ByteArrayOutputStream baos = new ByteArrayOutputStream();
  14.     Writer htmlWriter = new OutputStreamWriter(baos);
  15.     transformer.transform(xml, new StreamResult(htmlWriter));
  16.     htmlWriter.flush();
  17.     htmlWriter.close();
  18.     byte[] html = baos.toByteArray();
  19.
  20.     ZugferdDocument pdfDocument = new ZugferdDocument(
  21.         new PdfWriter(fos), ZugferdConformanceLevel.ZUGFeRDComfort,
  22.         new PdfOutputIntent("Custom", "", "http://www.color.org",
  23.             "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
  24.     pdfDocument.addFileAttachment(
  25.         "ZUGFeRD invoice", dom.toXML(), "ZUGFeRD-invoice.xml",
  26.         PdfName.ApplicationXml, new PdfDictionary(), PdfName.Alternative);
  27.     pdfDocument.setTagged();
  28.
  29.     HtmlConverter.convertToPdf(
  30.         new ByteArrayInputStream(html), pdfDocument, getProperties());
  31. }

The body of this method consists of three parts:

  • Lines 5-18 are copied from the createHtml() method we created in the previous chapter, but as you can see in line 18, we don't store the HTML as a file on disk. Instead, we keep it in memory as a byte array.

  • Lines 20-27 start the same way as our example in chapter 5: we create a ZugferdDocument and add the XML as an attachment. The line where we tell iText to tag the document (line 27) isn't strictly necessary, but it's good practice.

  • Lines 29-30 are new. In chapter 5, we created Paragraph and Table objects and added those objects to a Document instance. This time, we leave the creation of building blocks to the pdfHTML add-on: it converts the content of <p> tags to Paragraph objects, the content of <table> tags to Table objects, and so on.
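The first part of createPdf() is plain JAXP, so it can be tried out without iText. The sketch below is a self-contained version of that in-memory XML-to-HTML step. The class name InMemoryXslt, the sample XML, and the tiny stylesheet are hypothetical stand-ins for the real invoice XML and the XSL file from chapter 6. One deliberate difference: it passes the ByteArrayOutputStream directly to StreamResult instead of wrapping it in an OutputStreamWriter, so the serializer encodes the bytes as declared in xsl:output rather than with the platform's default charset, which new OutputStreamWriter(baos) uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InMemoryXslt {

    // Hypothetical stand-in for the invoice XML produced by InvoiceDOM.toXML()
    public static final String SAMPLE_XML = "<invoice><id>42</id></invoice>";

    // Hypothetical stand-in for the XSL file used in chapter 6
    public static final String SAMPLE_XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"/invoice\">"
        + "<html><body><p>Invoice <xsl:value-of select=\"id\"/></p></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the HTML as bytes, without touching the disk. */
    public static byte[] transform(byte[] xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Writing to the byte stream lets the serializer encode the output
        // as declared in xsl:output instead of using the platform default
        transformer.transform(
            new StreamSource(new ByteArrayInputStream(xml)),
            new StreamResult(baos));
        return baos.toByteArray();
    }

    /** Runs the sample transformation and decodes the result as UTF-8. */
    public static String sampleHtml() {
        try {
            byte[] html = transform(SAMPLE_XML.getBytes(StandardCharsets.UTF_8), SAMPLE_XSL);
            return new String(html, StandardCharsets.UTF_8);
        } catch (TransformerException e) {
            throw new IllegalStateException("Sample transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sampleHtml());
    }
}
```

In createPdf(), the byte array that comes out of this step is what gets wrapped in a ByteArrayInputStream and handed to HtmlConverter.convertToPdf().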

There are a couple of caveats, though. Because we created the HTML in memory, there is no file location against which relative links to resources such as images and CSS can be resolved. That's why we define a ConverterProperties instance, which we obtain through the getProperties() method:

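  // Cached converter properties, created on first use
  private ConverterProperties properties;

  public ConverterProperties getProperties() {
      if (properties == null) {
          // Relative links such as ./logo.png are resolved against this directory
          properties = new ConverterProperties().setBaseUri("resources/zugferd/");
      }
      return properties;
  }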

In the converter properties, we define a base URI. The resources/zugferd/ directory contains the logo.png and invoice.css files. Without the base URI, the HTML-to-PDF conversion process wouldn't know where to look for ./logo.png or ./invoice.css.
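pdfHTML performs its own resolution of relative links, but the principle can be illustrated with java.net.URI from the JDK. In the sketch below (the class BaseUriDemo is hypothetical), each relative link from the HTML is resolved against the base URI used in getProperties(). Note that with java.net.URI the trailing slash matters: without it, the last path segment is treated as a file name and replaced. Whether pdfHTML compensates for a missing slash is not something this sketch shows, so keeping the slash, as getProperties() does, is the safe choice.

```java
import java.net.URI;

public class BaseUriDemo {

    /** Resolves a relative link from the HTML against the given base URI. */
    public static URI resolve(String baseUri, String relativeLink) {
        return URI.create(baseUri).resolve(relativeLink);
    }

    public static void main(String[] args) {
        // The base URI used in getProperties(), with its trailing slash
        System.out.println(resolve("resources/zugferd/", "./logo.png"));    // resources/zugferd/logo.png
        System.out.println(resolve("resources/zugferd/", "./invoice.css")); // resources/zugferd/invoice.css
        // Without the trailing slash, "zugferd" is treated as a file name and replaced
        System.out.println(resolve("resources/zugferd", "./logo.png"));     // resources/logo.png
    }
}
```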

The ConverterProperties object can also be used to define other properties, such as a FontProvider. All ZUGFeRD documents are also PDF/A documents, which means that all fonts need to be embedded. In this case, we used the FreeSans font, as defined in the first line of our CSS file: body { font-family: FreeSans; }. Several fonts of the FreeSans family are shipped with the pdfHTML add-on, and the default FontProvider knows where to find them. If you want to use another font, you may need to create a custom FontProvider that tells iText where to find the fonts you need.

The final result

Figure 7.1 shows the resulting PDF. It looks much nicer than the invoices we produced in chapter 5, doesn't it?

Figure 7.1: a ZUGFeRD invoice created from HTML

The panel to the left shows the effect of the extra line we smuggled into our code (pdfDocument.setTagged();). Because the document is tagged, the invoice is now also accessible: a PDF processor that understands Tagged PDF can interpret the HTML tables as real tables. This is an example of a future-proof, archivable invoice that can be read by humans (including people who are visually impaired) as well as by machines.

Before we close, let's also take a look inside the PDF document. In figure 7.2, we see the root object of the PDF document, aka the catalog, with its /AF (Associated Files) and /Names entries. In the /Names dictionary, we recognize the /EmbeddedFiles name tree with a single element named "ZUGFeRD invoice". This element refers to a dictionary of type /Filespec that is also referred to from the /AF array. Its /AFRelationship is /Alternative, meaning that the PDF document and the attached XML are alternative representations of the same content. This is required by the ZUGFeRD standard.

Figure 7.2: The Associated File

Figure 7.3 shows the XMP metadata. We can tell that the document is a PDF/A-3B document from the pdfaid:part="3" and pdfaid:conformance="B" attributes in the rdf:Description element. There are also a number of entries in the zf namespace that define the profile (zf:ConformanceLevel = "COMFORT"), the name of the XML attachment (zf:DocumentFileName = "ZUGFeRD-invoice.xml"), the document type (zf:DocumentType = "INVOICE"), and the version of the XML schema for the invoice data (zf:Version = "1.0").
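To see how these values are laid out, the JDK's namespace-aware DOM parser can read them back. The XMP below is a hand-written, stripped-down illustration built from the values listed above, not the packet iText actually writes. The real metadata contains more, including the PDF/A extension schema that declares the zf properties, and it may use element syntax instead of attribute syntax, which this sketch doesn't handle. The zf namespace URI is the one we believe ZUGFeRD 1.0 uses; verify it against the specification before relying on it. The class XmpInspector and its find() method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XmpInspector {

    public static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    public static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";
    // Assumed ZUGFeRD 1.0 namespace URI; check it against the specification
    public static final String ZF_NS = "urn:ferd:pdfa:CrossIndustryDocument:invoice:1p0#";

    // Hand-written, simplified XMP packet using the values described for figure 7.3
    public static final String SAMPLE_XMP =
        "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"" + PDFAID_NS + "\""
        + " pdfaid:part=\"3\" pdfaid:conformance=\"B\"/>"
        + "<rdf:Description rdf:about=\"\" xmlns:zf=\"" + ZF_NS + "\""
        + " zf:ConformanceLevel=\"COMFORT\" zf:DocumentFileName=\"ZUGFeRD-invoice.xml\""
        + " zf:DocumentType=\"INVOICE\" zf:Version=\"1.0\"/>"
        + "</rdf:RDF></x:xmpmeta>";

    /**
     * Returns the value of the first attribute with the given namespace and
     * local name found on an rdf:Description element, or null if there is none.
     */
    public static String find(String xmp, String namespace, String localName) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = dbf.newDocumentBuilder().parse(
                new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
            NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
            for (int i = 0; i < descriptions.getLength(); i++) {
                Element description = (Element) descriptions.item(i);
                if (description.hasAttributeNS(namespace, localName)) {
                    return description.getAttributeNS(namespace, localName);
                }
            }
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XMP packet", e);
        }
    }

    public static void main(String[] args) {
        String part = find(SAMPLE_XMP, PDFAID_NS, "part");
        String conformance = find(SAMPLE_XMP, PDFAID_NS, "conformance");
        System.out.println("PDF/A-" + part + conformance);               // PDF/A-3B
        System.out.println(find(SAMPLE_XMP, ZF_NS, "ConformanceLevel")); // COMFORT
        System.out.println(find(SAMPLE_XMP, ZF_NS, "DocumentFileName")); // ZUGFeRD-invoice.xml
    }
}
```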

Figure 7.3: The XMP Metadata

You don't have to worry about these metadata values or the embedded XMP stream: the pdfInvoice add-on creates this metadata automatically.