How to convert HTML containing Arabic/Hebrew characters to PDF?

Tags: pdfHTML, Arabic, Hebrew

This question was asked on Stack Overflow on May 13, '15 by user2579223 and on Jun 15, '15 by Anatoly.

This is a duplicate of the question "Which languages are supported in pdfHTML?" The answer can be found in chapter 6, but the question is asked so frequently that it justifies an extra entry in the FAQ section. It's also an opportunity to provide an extra example.

In the C07E14_SayPeace.java example, we convert the say_peace.html HTML file to PDF.

Say Peace in HTML

We see English, Arabic, and Hebrew in this text. We'll use a different font file for each of these languages.
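If you are not sure which scripts an HTML snippet contains, and therefore which fonts you need, a quick scan with the JDK's Character.UnicodeScript can help. The following is only a minimal sketch: the class name ScriptScan, the method scriptsIn, and the sample words are hypothetical and are not part of the original example or of iText.

```java
import java.util.EnumSet;
import java.util.Set;

public class ScriptScan {

    /** Returns the Unicode scripts of all letters found in the given text. */
    public static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)   // ignore digits, spaces, punctuation
            .forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        return scripts;
    }

    public static void main(String[] args) {
        // Illustrative sample only (one Latin, one Arabic, one Hebrew word);
        // this is NOT the content of say_peace.html.
        String sample = "peace \u0633\u0644\u0627\u0645 \u05E9\u05DC\u05D5\u05DD";
        System.out.println(scriptsIn(sample));
    }
}
```

Each script reported this way needs at least one font that contains glyphs for it; here that means a Latin, an Arabic, and a Hebrew font.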

public static final String[] FONTS = {
    "src/main/resources/fonts/noto/NotoSans-Regular.ttf",
    "src/main/resources/fonts/noto/NotoNaskhArabic-Regular.ttf",
    "src/main/resources/fonts/noto/NotoSansHebrew-Regular.ttf"
};

We'll create a FontProvider instance that only uses these font files, and we'll use this FontProvider as a converter property.

public void createPdf(String src, String[] fonts, String dest) throws IOException {
    ConverterProperties properties = new ConverterProperties();
    // All three flags are false: skip the standard PDF fonts, the fonts shipped
    // with pdfHTML, and the system fonts, so that only our own fonts are used.
    FontProvider fontProvider = new DefaultFontProvider(false, false, false);
    for (String font : fonts) {
        // Load each font file and register it with the provider.
        FontProgram fontProgram = FontProgramFactory.createFont(font);
        fontProvider.addFont(fontProgram);
    }
    properties.setFontProvider(fontProvider);
    HtmlConverter.convertToPdf(new File(src), new File(dest), properties);
}

The result is a PDF file in which the text is rendered correctly:

Say Peace in PDF

If you used the appropriate fonts but the Hebrew and Arabic text is rendered from left to right instead of from right to left, you have most likely forgotten to add the pdfCalligraph add-on to your CLASSPATH.
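One way to verify the CLASSPATH at runtime is to try loading a class from the add-on before converting. The following is a rough sketch under assumptions: the helper ClasspathCheck is hypothetical, and the two class names are what we believe the pdfCalligraph (typography) jar contains in different versions; check them against the jar you actually use.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be loaded (without initializing it). */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: candidate class names for the pdfCalligraph add-on;
        // they may differ between versions, so verify them for your setup.
        String[] candidates = {
            "com.itextpdf.typography.shaping.Shaper",
            "com.itextpdf.typography.shaping.TypographyApplier"
        };
        boolean found = false;
        for (String name : candidates) {
            found |= isOnClasspath(name);
        }
        System.out.println(found
            ? "pdfCalligraph classes found on the classpath"
            : "pdfCalligraph classes not found; right-to-left text may be rendered left to right");
    }
}
```

Running this check from the same module that performs the conversion tells you whether the add-on is visible to that code, which is not always obvious from the build configuration alone.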