How to set RTL direction for Hebrew when converting HTML to PDF?

Tags: RTL, Hebrew, languages, fonts, XML Worker, run direction, iText 5

I'm trying to convert an HTML file with Hebrew characters (UTF-8) to PDF using iText, but all the letters come out in reverse order. As far as I understand, I can set RTL only for ColumnText and PdfPCell objects. So here's my question: is it possible to convert Hebrew HTML to PDF? This is my HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Title of document</title>
</head>
<body style="font-size:12.0pt; font-family:Arial">
  שלום עולם
</body>
</html>

When I convert this HTML to PDF using XML Worker, I get this result:

[Image: Wrong order]

This is "Hello World" in Hebrew, written from left to right. It should be written from right to left.

Posted on StackOverflow on Jun 15, 2015 by Anatoly

Please take a look at the ParseHtml10 example. In this example, we take the file hebrew.html:

<html>

<head>
  <title>Hebrew text</title>
</head>

<body style="font-size:12.0pt; font-family:Arial">
<div dir="rtl" style="font-family: Noto Sans Hebrew">שלום עולם</div>
</body>

</html>

And we convert it to PDF using this code:

public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer =
        PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    // Styles
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    XMLWorkerFontProvider fontProvider =
        new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoSansHebrew-Regular.ttf");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

    // Pipelines
    PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    // HTML is assumed to be a String constant holding the path to hebrew.html
    p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
    // step 5
    document.close();
}

The result looks like hebrew.pdf:

Text from right to left

What hurdles do you need to clear?

  • You need to wrap your text in an element such as a <div> or a <td>.

  • You need to add the attribute dir="rtl" to define the direction.

  • You need to make sure that you're using a font that contains Hebrew glyphs. I used a Noto font for Hebrew. This is one of the fonts distributed by Google in their program to provide fonts for every possible language.
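
If your HTML is generated by code, the first two points can be automated. The following is only a minimal sketch, not part of iText or XML Worker: the class RtlWrapper and its methods needsRtl and wrap are hypothetical names, and it uses only java.text.Bidi from the JDK to check whether the first strongly directional character of a string is right-to-left. It assumes the text is already HTML-escaped and that a Hebrew-capable font is registered separately, as in the createPdf method above.

```java
import java.text.Bidi;

// Hypothetical helper, not part of iText or XML Worker.
public class RtlWrapper {

    // True when the base direction of the text is right-to-left,
    // i.e. its first strongly directional character is RTL (Hebrew, Arabic, ...).
    static boolean needsRtl(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    // Wraps the text in a <div>, adding dir="rtl" only when it is needed.
    // Assumption: the text is already HTML-escaped.
    static String wrap(String text) {
        if (needsRtl(text)) {
            return "<div dir=\"rtl\">" + text + "</div>";
        }
        return "<div>" + text + "</div>";
    }

    public static void main(String[] args) {
        // "Hello World" in Hebrew, written with Unicode escapes so the
        // source compiles regardless of the file encoding.
        String hebrew = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD";
        System.out.println(wrap(hebrew));
        System.out.println(wrap("Hello World"));
    }
}
```

The markup this produces can then be parsed by XML Worker in the same way as hebrew.html. For mixed content you would probably want to decide per paragraph rather than per document.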

Important: this solution requires at least iText and XML Worker 5.5.5, because support for the dir attribute was introduced in 5.5.4 and improved in 5.5.5.