Does my HTML have to be valid XML?

Tags: pdfHTML, jsoup

Similar questions were posted on Stack Overflow, for instance on Oct 30 '14 by Kannu Verma

If you are still using iText 5 and XML Worker, you have to provide XHTML. For instance, a single <br> isn't allowed in your HTML; you need a <br />. All tags have to be closed, and they have to be nested correctly. To work around incomplete HTML syntax, we advised using jsoup to tidy up the HTML before converting it to PDF with XML Worker.
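To make the XHTML requirement concrete, here is a minimal sketch that checks whether a snippet is well-formed XML. It uses the JDK's own XML parser as a stand-in, not XML Worker itself, and the class and method names are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of iText or XML Worker.
public class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML. */
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler, which prints to System.err;
            // DefaultHandler still rethrows fatal (well-formedness) errors.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup: unclosed or badly nested tags.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not markup errors.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello World<br></p></body></html>";
        String xhtml = "<html><body><p>Hello World<br /></p></body></html>";
        System.out.println("<br>   well-formed: " + isWellFormedXml(html));
        System.out.println("<br /> well-formed: " + isWellFormedXml(xhtml));
    }
}
```

The first snippet is rejected because the <br> is never closed; the second one passes. A check like this only tells you whether the input is XML; it doesn't repair anything, which is why jsoup was needed as a separate clean-up step.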

This is no longer necessary with pdfHTML. We have integrated jsoup into the pdfHTML add-on, so you don't need to call it separately. Every HTML file is cleaned up before it is converted to PDF. Take for example the incomplete.html HTML file:

<html>
<head><title>Test incomplete HTML</title></head>
<h1>Test
<p>Hello World
<p>Hello Universe
<br>
<img src="img/logo.png" alt="iText logo">

It doesn't have a <body> tag, and the <h1>, <p>, <br>, and <img> tags are never closed. This is a mighty incomplete HTML file, but a browser renders it anyway, and so does pdfHTML.

Incomplete HTML rendered in a browser and as PDF

You can try this for yourself by running the C07E07_IncompleteHTML example.
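
If you just want to see the idea in a few lines, the following is a minimal sketch, not the C07E07_IncompleteHTML example itself. It assumes the pdfHTML add-on (com.itextpdf.html2pdf) is on the classpath. The class name and output file name are made up, and the <img> tag is left out so that the snippet doesn't depend on any external file.

```java
import com.itextpdf.html2pdf.HtmlConverter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class name; a sketch under the assumptions stated above.
public class IncompleteHtmlSketch {

    // The incomplete HTML from this answer, minus the image.
    public static final String INCOMPLETE_HTML =
            "<html>\n"
            + "<head><title>Test incomplete HTML</title></head>\n"
            + "<h1>Test\n"
            + "<p>Hello World\n"
            + "<p>Hello Universe\n"
            + "<br>\n";

    /** Converts an HTML string to PDF bytes; no separate tidy-up step is called. */
    public static byte[] convert(String html) throws IOException {
        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        HtmlConverter.convertToPdf(html, pdf);
        return pdf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "incomplete.pdf" is an arbitrary output name chosen for this sketch.
        Files.write(Paths.get("incomplete.pdf"), convert(INCOMPLETE_HTML));
    }
}
```

Note that the HTML is passed to HtmlConverter as is; there is no explicit call to jsoup anywhere in the code.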