How to create a PDF with font information and embed the actual font while merging the files into a single PDF?

Tags: fonts, concatenate, font subset, merge documents, iText 7

I create different PDFs and then concatenate them into a single PDF. The resulting PDF is a lot bigger in file size than I had expected. As it turns out, my PDF has a ton of duplicate fonts, and this is the reason why it's so big. I would like to create PDFs which only embed font information, not the full font. Then when I merge these PDFs into a single document, I want to insert the actual font needed by the PDF.

Posted on StackOverflow on Feb 24, 2014 by pixerce

I've created the MergeAndAddFont example to explain the different options.

We'll create PDFs using this code snippet:

public void createPdf(String filename, String text, boolean embedded, boolean subset) throws IOException {
    PdfDocument pdfDoc = new PdfDocument(new PdfWriter(filename));
    Document doc = new Document(pdfDoc);
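    // FONT is the path to the font file (Gravitas One in this example), defined elsewhere in the example class
    PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, embedded);
    // subset = true embeds only the glyphs that are used; subset = false embeds the full font
    font.setSubset(subset);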
    doc.add(new Paragraph(text).setFont(font));
    doc.close();
}

We use this code to create three test files (1, 2 and 3), and we do this three times (A, B and C), each time with different values for the embedded and subset parameters.

The first time, we use the parameters embedded = true and subset = true, resulting in the files testA1.pdf with text "abcdefgh" (3.83 KB), testA2.pdf with text "ijklmnopq" (3.61 KB) and testA3.pdf with text "rstuvwxyz" (3.68 KB). The font is embedded and the file size is relatively low because we only embed a subset of the font.

Now we merge these files using the following code. The isSmartModeOn parameter indicates whether we want to enable smartMode on the PdfWriter. In iText 7, we merge files using the copyPagesTo(int pageFrom, int pageTo, PdfDocument toDocument) method:

public void mergeFiles(String[] files, String result, boolean isSmartModeOn) throws IOException {
    PdfWriter writer = new PdfWriter(result);
    writer.setSmartMode(isSmartModeOn);
    PdfDocument pdfDoc = new PdfDocument(writer);
    pdfDoc.initializeOutlines();
    for (int i = 0; i < files.length; i++) {
        PdfDocument addedDoc = new PdfDocument(new PdfReader(files[i]));
        addedDoc.copyPagesTo(1, addedDoc.getNumberOfPages(), pdfDoc);
        addedDoc.close();
    }
    pdfDoc.close();
}

When we merge the documents, with or without smartMode, the different subsets of the same font are copied as separate objects into the resulting PDFs testA_merged1.pdf and testA_merged2.pdf (both 10.2 KB).
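
A quick way to confirm this in a merged file is to look at the font names it contains. ISO 32000-1 prescribes that the name of a font subset starts with a tag of six uppercase letters followed by a plus sign, so three subsets of one font show up under three different names. The sketch below is plain Java with no iText dependency; the class SubsetFontNames, its methods and the sample font names are hypothetical and only illustrate how such names can be grouped by their underlying font.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical helper: groups PDF font names by the font they were subsetted from. */
final class SubsetFontNames {
    // A subset tag is exactly six uppercase letters followed by '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    /** True if the name carries a subset tag such as "ABCDEF+". */
    static boolean isSubset(String fontName) {
        return SUBSET_TAG.matcher(fontName).find();
    }

    /** Strips the subset tag, if any, leaving the name of the underlying font. */
    static String baseName(String fontName) {
        return SUBSET_TAG.matcher(fontName).replaceFirst("");
    }

    /** Counts how many font names map to each underlying font. */
    static Map<String, Integer> countByBaseName(List<String> fontNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : fontNames) {
            counts.merge(baseName(name), 1, Integer::sum);
        }
        return counts;
    }
}
```

With three made-up names such as ABCDEF+GravitasOne, GHIJKL+GravitasOne and MNOPQR+GravitasOne, countByBaseName reports three entries for GravitasOne, which is the pattern of duplicates described in the question.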

This is the problem you are experiencing: when we use the smartMode, PdfWriter can detect and reuse identical objects, but the different subsets of the same font aren't identical and iText can't merge different subsets of the same font into one font.
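
The idea of reusing identical objects can be illustrated with a small conceptual model. This is not how iText implements smartMode internally; it is a sketch that assumes reuse is decided purely by comparing stream bytes, and the class SmartModeSketch and the placeholder "font" contents are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Conceptual model only: reuse a stored object when its bytes are identical. */
final class SmartModeSketch {
    // Maps stream content to the id of the object already stored for it.
    private final Map<ByteBuffer, Integer> written = new HashMap<>();
    private int nextId = 1;

    /** Returns the id of the stored object, reusing an existing one for identical bytes. */
    int add(byte[] streamBytes) {
        // ByteBuffer compares by content, so equal byte arrays yield equal keys.
        ByteBuffer key = ByteBuffer.wrap(streamBytes.clone());
        Integer id = written.get(key);
        if (id == null) {
            id = nextId++;
            written.put(key, id);
        }
        return id;
    }

    /** Number of distinct objects that were actually stored. */
    int storedObjects() {
        return written.size();
    }

    public static void main(String[] args) {
        // Three files that each carry the same full font program.
        SmartModeSketch full = new SmartModeSketch();
        byte[] fullFont = "glyphs a-z".getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < 3; i++) {
            full.add(fullFont.clone());
        }
        System.out.println("full font objects stored: " + full.storedObjects());

        // Three files that each carry a different subset of that font.
        SmartModeSketch subsets = new SmartModeSketch();
        for (String subset : new String[] {"glyphs a-h", "glyphs i-q", "glyphs r-z"}) {
            subsets.add(subset.getBytes(StandardCharsets.US_ASCII));
        }
        System.out.println("subset objects stored: " + subsets.storedObjects());
    }
}
```

In this model the three identical full-font streams collapse into one stored object, while the three subsets remain three separate objects because their bytes differ.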

The second time, we use the parameters embedded = true and subset = false, resulting in the files testB1.pdf (21.5 KB), testB2.pdf (21.5 KB) and testB3.pdf (21.5 KB). The font is fully embedded and the file size of a single file is a lot bigger than before because the full font is embedded.

If we merge the files without smartMode, the font will be present in the merged document redundantly, resulting in the bloated file testB_merged1.pdf (63.6 KB). This is definitely not what you want!

However, if we use smartMode, iText detects an identical font stream and reuses it, resulting in testB_merged2.pdf (22.1 KB), which is much smaller than the result without smartMode. It's still bigger than the merged document with the subsetted fonts (10.2 KB), but if you're concatenating a large number of files, the result will be better if you embed the complete font.
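
The remark about concatenating many files can be made concrete with a rough estimate based on the sizes measured so far. The sketch assumes that the merged size grows linearly with the number of one-page input files, which is an extrapolation from the three-file measurements and not something iText guarantees; the class MergeSizeEstimate and its method names are hypothetical.

```java
/** Rough linear extrapolation from the measured sizes; an assumption, not a guarantee. */
final class MergeSizeEstimate {
    static final int MEASURED_FILES = 3;
    static final double SUBSET_MERGED_KB = 10.2;      // testA_merged1.pdf / testA_merged2.pdf
    static final double FULL_SINGLE_KB = 21.5;        // testB1.pdf
    static final double FULL_SMART_MERGED_KB = 22.1;  // testB_merged2.pdf

    /** Subsetted fonts: every input file adds its own subset to the result. */
    static double subsetKb(int files) {
        return SUBSET_MERGED_KB / MEASURED_FILES * files;
    }

    /** Estimated cost of one extra page when the full font is stored only once. */
    static double extraPageKb() {
        return (FULL_SMART_MERGED_KB - FULL_SINGLE_KB) / (MEASURED_FILES - 1);
    }

    /** Full font with smartMode: one font plus a small cost per additional page. */
    static double fullSmartKb(int files) {
        return FULL_SINGLE_KB + extraPageKb() * (files - 1);
    }

    /** Smallest number of files for which the full font is estimated to be smaller, or -1. */
    static int breakEvenFiles() {
        for (int files = 1; files <= 1000; files++) {
            if (fullSmartKb(files) < subsetKb(files)) {
                return files;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("estimated break-even at " + breakEvenFiles() + " files");
    }
}
```

Under these assumptions the fully embedded font with smartMode becomes the smaller option at around seven files. Real documents with more text per page or more fonts will shift that point.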

The third time, we use the parameters embedded = false and subset = false, resulting in the files testC1.pdf (2.19 KB), testC2.pdf (2.19 KB) and testC3.pdf (2.19 KB). The font isn't embedded, resulting in an excellent file size, but if you compare with one of the previous results, you'll see that the font looks completely different, because the viewer has to substitute another font.

We merge the files using smartMode, resulting in testC_merged1.pdf (2.85 KB). Again, we have an excellent file size, but again we have the problem that the font isn't visualized correctly.

To fix this, we need to embed the font:

protected void embedFont(String merged, String fontfile, String result) throws IOException {
    // read the complete font file into memory; the file is closed even if reading fails
    byte[] fontbytes;
    try (RandomAccessFile raf = new RandomAccessFile(fontfile, "r")) {
        fontbytes = new byte[(int) raf.length()];
        raf.readFully(fontbytes);
    }
    // create a new stream for the font file
    PdfStream stream = new PdfStream(fontbytes);
    stream.setCompressionLevel(CompressionConstants.DEFAULT_COMPRESSION);
    stream.put(PdfName.Length1, new PdfNumber(fontbytes.length));
    PdfObject object;
    PdfDictionary font;
    // open the merged PDF and write the modified copy to the result file
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(merged), new PdfWriter(result));
    PdfName fontname = new PdfName(PdfFontFactory.createFont(fontfile, PdfEncodings.WINANSI)
            .getFontProgram().getFontNames().getFontName());
    int n = pdfDoc.getNumberOfPdfObjects();
    for (int i = 0; i < n; i++) {
        object = pdfDoc.getPdfObject(i);
        if (object == null || !object.isDictionary()) {
            continue;
        }
        font = (PdfDictionary) object;
        if (PdfName.FontDescriptor.equals(font.get(PdfName.Type))
                && fontname.equals(font.get(PdfName.FontName))) {
            font.put(PdfName.FontFile2, stream.makeIndirect(pdfDoc).getIndirectReference());
        }
    }
    pdfDoc.close();
}

Now we have the file testC_merged2.pdf (22.2 KB), and that's actually the answer to your question. As you can see, the second option (testB_merged2.pdf, 22.1 KB, with no post-processing step) is better than this third option.

Caveats: This example uses the Gravitas One font as a simple font. As soon as you use the font as a composite font (you tell iText to do so by choosing the encoding IDENTITY-H or IDENTITY-V), you can no longer choose whether to embed the font or whether to subset it. As defined in ISO 32000-1, iText will always embed composite fonts and will always subset them.

This means that you can't use the above solutions when you need special fonts (Chinese, Japanese, Korean). In that case, you shouldn't embed the fonts, but use so-called CJK fonts. These CJK fonts rely on font packs that can be downloaded by Adobe Reader.
