Chapter 6: Reusing existing PDF documents

In this chapter, we'll do some more document manipulation, but there will be a subtle difference in approach. In the examples of the previous chapter, we created one PdfDocument instance that linked a PdfReader to a PdfWriter. We manipulated a single document.

In this chapter, we'll always create at least two PdfDocument instances: one or more for the source document(s), and one for the destination document.

Scaling, tiling, and N-upping

Let's start with some examples that scale and tile a document.

Scaling PDF pages

Suppose that we have a PDF file with a single page, measuring 16.54 by 11.69 in. See Figure 6.1.

Figure 6.1: Golden Gate Bridge, original size 16.54 x 11.69 in

Now we want to create a PDF file with three pages. On page 1, the original page is scaled down to 11.69 x 8.26 in, as shown in Figure 6.2. On page 2, the original page size is preserved. On page 3, the original page is scaled up to 23.39 x 16.53 in, as shown in Figure 6.3.

Figure 6.2: Golden Gate Bridge, scaled down to 11.69 x 8.26 in
Figure 6.3: Golden Gate Bridge, scaled up to 23.39 x 16.53 in

The TheGoldenGateBridge_Scale_Shrink example shows how it's done.

  1. PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
  2. PdfDocument origPdf = new PdfDocument(new PdfReader(src));
  3. PdfPage origPage = origPdf.getPage(1);
  4. Rectangle orig = origPage.getPageSizeWithRotation();
  5.  
  6. //Add A4 page
  7. PdfPage page = pdf.addNewPage(PageSize.A4.rotate());
  8. //Shrink original page content using transformation matrix
  9. PdfCanvas canvas = new PdfCanvas(page);
  10. AffineTransform transformationMatrix = AffineTransform.getScaleInstance(
  11. page.getPageSize().getWidth() / orig.getWidth(),
  12. page.getPageSize().getHeight() / orig.getHeight());
  13. canvas.concatMatrix(transformationMatrix);
  14. PdfFormXObject pageCopy = origPage.copyAsFormXObject(pdf);
  15. canvas.addXObject(pageCopy, 0, 0);
  16.  
  17. //Add page with original size
  18. pdf.addPage(origPage.copyTo(pdf));
  19.  
  20. //Add A2 page
  21. page = pdf.addNewPage(PageSize.A2.rotate());
  22. //Scale original page content using transformation matrix
  23. canvas = new PdfCanvas(page);
  24. transformationMatrix = AffineTransform.getScaleInstance(
  25. page.getPageSize().getWidth() / orig.getWidth(),
  26. page.getPageSize().getHeight() / orig.getHeight());
  27. canvas.concatMatrix(transformationMatrix);
  28. canvas.addXObject(pageCopy, 0, 0);
  29.  
  30. pdf.close();
  31. origPdf.close();

In this code snippet, we create a PdfDocument instance that will create a new PDF document (line 1); and we create a PdfDocument instance that will read an existing PDF document (line 2). We get a PdfPage instance for the first page of the existing PDF (line 3), and we get its dimensions (line 4). We then add three pages to the new PDF document:

  1. We add an A4 page using landscape orientation (line 7) and we create a PdfCanvas object for that page (line 9). Instead of calculating the a, b, c, d, e, and f values for a transformation matrix that will scale the coordinate system, we create an AffineTransform instance using the getScaleInstance() method (line 10-12). We apply that transformation (line 13), we create a Form XObject containing the original page (line 14), and we add that XObject to the new page (line 15).

  2. Adding the original page in its original dimensions is much easier. We just create a new page by copying the origPage to the new PdfDocument instance, and we add it to the pdf using the addPage() method (line 18).

  3. Scaling up is done in exactly the same way as shrinking. This time, we add a new A2 page using landscape orientation (line 21) and we use the exact same code we had before to scale the coordinate system (line 23-27). We reuse the pageCopy object and add it to the canvas (line 28).

We close the pdf to finalize the new document (line 30) and we close the origPdf to release the resources of the original document (line 31).
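
The arithmetic behind the scaling can be tried without any PDF library. The following is only a sketch: it uses the JDK's java.awt.geom.AffineTransform as a stand-in for the iText class of the same name, and it assumes that the original page measures roughly 1191 x 842 user units (16.54 x 11.69 in at 72 units per inch). The class name, the fit() helper, and the page size constants are assumptions made for illustration.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Stand-in sketch: uses the JDK's AffineTransform, not the iText class.
class ScaleSketch {
    // Page sizes as {width, height} in user units; assumed values for
    // landscape A4, A3 and A2.
    static final double[] A4_LANDSCAPE = {842, 595};
    static final double[] A3_LANDSCAPE = {1191, 842};
    static final double[] A2_LANDSCAPE = {1684, 1191};

    // Builds the scaling that maps the original page onto the target page.
    static AffineTransform fit(double[] orig, double[] target) {
        return AffineTransform.getScaleInstance(
                target[0] / orig[0], target[1] / orig[1]);
    }

    public static void main(String[] args) {
        AffineTransform shrink = fit(A3_LANDSCAPE, A4_LANDSCAPE);
        AffineTransform enlarge = fit(A3_LANDSCAPE, A2_LANDSCAPE);
        // The top-right corner of the original page should land on the
        // top-right corner of the target page.
        Point2D corner = new Point2D.Double(A3_LANDSCAPE[0], A3_LANDSCAPE[1]);
        System.out.println(shrink.transform(corner, null));
        System.out.println(enlarge.transform(corner, null));
    }
}
```

Because the horizontal and vertical factors are computed independently, the content is stretched slightly whenever the two pages don't have exactly the same aspect ratio.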

We can use the same functionality to tile a PDF page.

Tiling PDF pages

Tiling a PDF page means that you distribute the content of one page over different pages. For instance: if you have a PDF with a single page of size A3, you can create a PDF with four pages of a different size (or even the same size), each showing one quarter of the original A3 page. This is what we've done in Figure 6.4.

Figure 6.4: Golden Gate Bridge, tiled pages

Let's take a look at the TheGoldenGateBridge_Tiles example.

  1. PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
  2. PdfDocument sourcePdf = new PdfDocument(new PdfReader(src));
  3. PdfPage origPage = sourcePdf.getPage(1);
  4. PdfFormXObject pageCopy = origPage.copyAsFormXObject(pdf);
  5. Rectangle orig = origPage.getPageSize();
  6. //Tile size
  7. Rectangle tileSize = PageSize.A4.rotate();
  8. AffineTransform transformationMatrix = AffineTransform.getScaleInstance(
  9. tileSize.getWidth() / orig.getWidth() * 2f,
  10. tileSize.getHeight() / orig.getHeight() * 2f);
  11. //The first tile
  12. PdfPage page = pdf.addNewPage(PageSize.A4.rotate());
  13. PdfCanvas canvas = new PdfCanvas(page);
  14. canvas.concatMatrix(transformationMatrix);
  15. canvas.addXObject(pageCopy, 0, -orig.getHeight() / 2f);
  16. //The second tile
  17. page = pdf.addNewPage(PageSize.A4.rotate());
  18. canvas = new PdfCanvas(page);
  19. canvas.concatMatrix(transformationMatrix);
  20. canvas.addXObject(pageCopy, -orig.getWidth() / 2f, -orig.getHeight() / 2f);
  21. //The third tile
  22. page = pdf.addNewPage(PageSize.A4.rotate());
  23. canvas = new PdfCanvas(page);
  24. canvas.concatMatrix(transformationMatrix);
  25. canvas.addXObject(pageCopy, 0, 0);
  26. //The fourth tile
  27. page = pdf.addNewPage(PageSize.A4.rotate());
  28. canvas = new PdfCanvas(page);
  29. canvas.concatMatrix(transformationMatrix);
  30. canvas.addXObject(pageCopy, -orig.getWidth() / 2f, 0);
  31. // closing the documents
  32. pdf.close();
  33. sourcePdf.close();

We've seen lines 1-5 before; we already used them in the previous example. In line 7, we define a tile size, and we create a transformationMatrix to scale the coordinate system depending on the original size and the tile size. Then we add the tiles, one by one: line 12-15, line 17-20, line 22-25, and line 27-30 are identical, except for one detail: the offset used in the addXObject() method.
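
The four offsets follow a simple pattern that can be generalized to any grid. The sketch below is plain Java without any PDF library; the class and method names are hypothetical, and the page size of 1191 x 842 user units is an assumption. It computes the offset for each tile, left to right, starting with the top row. Since the PDF coordinate system has its origin in the lower-left corner, the top row needs the largest negative vertical offset.

```java
import java.util.ArrayList;
import java.util.List;

class TileOffsets {
    // Returns one {x, y} offset per tile, ordered left to right, top row first.
    // The offsets are expressed in units of the original page, because the
    // scaling is applied to the canvas before the content is added.
    static List<double[]> offsets(double width, double height, int cols, int rows) {
        List<double[]> result = new ArrayList<>();
        for (int row = 0; row < rows; row++) {          // row 0 is the top row
            for (int col = 0; col < cols; col++) {
                double x = -col * width / cols;
                double y = -(rows - 1 - row) * height / rows;
                result.add(new double[] {x, y});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A 2 x 2 grid reproduces the four offsets used in the example above.
        for (double[] offset : offsets(1191, 842, 2, 2)) {
            System.out.println(offset[0] + ", " + offset[1]);
        }
    }
}
```

With such a helper, the four nearly identical blocks in the example could be replaced by a single loop over the offsets.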

Let's use the PDF with the Golden Gate Bridge for one more example. Let's do the opposite of tiling: let's N-up a PDF.

N-upping a PDF

Figure 6.5 shows what we mean by N-upping. In the next example, we're going to put N pages on one single page.

Figure 6.5: Golden Gate Bridge, four pages on one

In the TheGoldenGateBridge_N_up example, N is equal to 4. We will put 4 pages on one single page.

  1. PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
  2. PdfDocument sourcePdf = new PdfDocument(new PdfReader(SRC));
  3. //Original page
  4. PdfPage origPage = sourcePdf.getPage(1);
  5. Rectangle orig = origPage.getPageSize();
  6. PdfFormXObject pageCopy = origPage.copyAsFormXObject(pdf);
  7. //N-up page
  8. PageSize nUpPageSize = PageSize.A4.rotate();
  9. PdfPage page = pdf.addNewPage(nUpPageSize);
  10. PdfCanvas canvas = new PdfCanvas(page);
  11. //Scale page
  12. AffineTransform transformationMatrix = AffineTransform.getScaleInstance(
  13. nUpPageSize.getWidth() / orig.getWidth() / 2f,
  14. nUpPageSize.getHeight() / orig.getHeight() / 2f);
  15. canvas.concatMatrix(transformationMatrix);
  16. //Add pages to N-up page
  17. canvas.addXObject(pageCopy, 0, orig.getHeight());
  18. canvas.addXObject(pageCopy, orig.getWidth(), orig.getHeight());
  19. canvas.addXObject(pageCopy, 0, 0);
  20. canvas.addXObject(pageCopy, orig.getWidth(), 0);
  21. // close the documents
  22. pdf.close();
  23. sourcePdf.close();
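
The layout arithmetic of this example can be sketched in plain Java. This is a rough model, not iText code: the class and method names are hypothetical, and the original page size of 1191 x 842 user units is an assumption. Note that the offsets are expressed in units of the original page, because the scaling is applied to the canvas before the copies are added.

```java
import java.util.ArrayList;
import java.util.List;

class NUpLayoutSketch {
    // Scale factor along one axis so that `count` copies fit on the sheet.
    static double scale(double sheetLength, double origLength, int count) {
        return sheetLength / origLength / count;
    }

    // Returns one {x, y} offset per copy, ordered left to right, top row first.
    static List<double[]> offsets(double origWidth, double origHeight,
                                  int cols, int rows) {
        List<double[]> result = new ArrayList<>();
        for (int row = 0; row < rows; row++) {          // row 0 is the top row
            for (int col = 0; col < cols; col++) {
                result.add(new double[] {
                        col * origWidth, (rows - 1 - row) * origHeight});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A 2 x 2 grid reproduces the offsets in lines 17-20 of the example.
        for (double[] offset : offsets(1191, 842, 2, 2)) {
            System.out.println(offset[0] + ", " + offset[1]);
        }
    }
}
```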

So far, we've only reused a single page from a single PDF in this chapter. In the next series of examples, we'll assemble different PDF files into one.

Assembling documents

Let's go from San Francisco to Los Angeles, and take a look at Figure 6.6 where we'll find three documents about the Oscars.

Figure 6.6: The Oscars, source documents

The documents are:

In the next couple of examples, we'll merge these documents.

Merging documents with PdfMerger

Figure 6.7 shows a PDF that was created by merging the first 32-page document with the second 15-page document, resulting in a 47-page document.

Figure 6.7: Merging two documents

The code of the 88th_Oscar_Combine example is almost self-explanatory.

  1. PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
  2. PdfMerger merger = new PdfMerger(pdf);
  3. //Add pages from the first document
  4. PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
  5. merger.addPages(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
  6. //Add pages from the second pdf document
  7. PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
  8. merger.addPages(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
  9. // merge and close
  10. merger.merge();
  11. firstSourcePdf.close();
  12. secondSourcePdf.close();
  13. pdf.close();

We create a PdfDocument to create a new PDF (line 1). The PdfMerger class is new. It's a class that will make it easier for us to reuse pages from existing documents (line 2). Just like before, we create a PdfDocument for the source file (line 4, line 7); we then add all the pages to the merger instance (line 5, line 8). Once we're done adding pages, we merge() (line 10) and close() (line 11-13).

We don't need to add all the pages if we don't want to. We can easily add only a limited selection of pages. See for instance the 88th_Oscar_CombineXofY example.

  1. PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
  2. PdfMerger merger = new PdfMerger(pdf);
  3. PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
  4. merger.addPages(firstSourcePdf, Arrays.asList(1, 5, 7, 1));
  5. PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
  6. merger.addPages(secondSourcePdf, Arrays.asList(1, 15));
  7. merger.merge();
  8. firstSourcePdf.close();
  9. secondSourcePdf.close();
  10. pdf.close();

Now the resulting document only has six pages. Pages 1, 5, 7, 1 from the first document (the first page is repeated), and pages 1 and 15 from the second document. PdfMerger is a convenience class that makes merging documents a no-brainer. In some cases however, you'll want to add pages one by one.
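
To reason about the page order before running a merge, the selection can be modeled with plain lists. The sketch below doesn't use iText at all; pageRange() and mergeOrder() are hypothetical helpers that only mimic the order in which the selected pages end up in the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PageSelectionSketch {
    // Hypothetical helper: expands an inclusive page range into an explicit list.
    static List<Integer> pageRange(int from, int to) {
        List<Integer> pages = new ArrayList<>();
        for (int p = from; p <= to; p++) {
            pages.add(p);
        }
        return pages;
    }

    // Models the output order: one "document:page" label per selected page.
    // Repeated page numbers are kept, so a page can appear more than once.
    static List<String> mergeOrder(List<Integer> first, List<Integer> second) {
        List<String> result = new ArrayList<>();
        for (int p : first) {
            result.add("first:" + p);
        }
        for (int p : second) {
            result.add("second:" + p);
        }
        return result;
    }

    public static void main(String[] args) {
        // The selection used in the example above.
        System.out.println(mergeOrder(Arrays.asList(1, 5, 7, 1), Arrays.asList(1, 15)));
        // A full merge of a 32-page and a 15-page document.
        System.out.println(mergeOrder(pageRange(1, 32), pageRange(1, 15)).size());
    }
}
```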

Adding pages to a PdfDocument

Figure 6.8 shows the result of the merging of specific pages based on a Table of Contents (TOC) that we'll create on the fly. This TOC contains link annotations that allow you to jump to a specific page if you click an entry of the TOC.

Figure 6.8: Merging documents based on a TOC

The 88th_Oscar_Combine_AddTOC example is more complex than the two previous examples. Let's examine it step by step.

Suppose that we have a TreeMap of all the categories the movie "The Revenant" was nominated for, where the key is the nomination and the value is the page number of the document where the nomination is mentioned.

  1. public static final Map<String, Integer> TheRevenantNominations =
  2. new TreeMap<String, Integer>();
  3. static {
  4. TheRevenantNominations.put("Performance by an actor in a leading role", 4);
  5. TheRevenantNominations.put(
  6. "Performance by an actor in a supporting role", 4);
  7. TheRevenantNominations.put("Achievement in cinematography", 4);
  8. TheRevenantNominations.put("Achievement in costume design", 5);
  9. TheRevenantNominations.put("Achievement in directing", 5);
  10. TheRevenantNominations.put("Achievement in film editing", 6);
  11. TheRevenantNominations.put("Achievement in makeup and hairstyling", 7);
  12. TheRevenantNominations.put("Best motion picture of the year", 8);
  13. TheRevenantNominations.put("Achievement in production design", 8);
  14. TheRevenantNominations.put("Achievement in sound editing", 9);
  15. TheRevenantNominations.put("Achievement in sound mixing", 9);
  16. TheRevenantNominations.put("Achievement in visual effects", 10);
  17. }

The first lines of the code that creates the PDF are pretty simple.

  1. PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest));
  2. Document document = new Document(pdfDoc);
  3. document.add(new Paragraph(new Text("The Revenant nominations list"))
  4. .setTextAlignment(Property.TextAlignment.CENTER));

But we need to take a really close look once we start to loop over the entries in the TreeMap.

  1. PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
  2. for (Map.Entry<String, Integer> entry : TheRevenantNominations.entrySet()) {
  3. //Copy page
  4. PdfPage page = firstSourcePdf.getPage(entry.getValue()).copyTo(pdfDoc);
  5. pdfDoc.addPage(page);
  6. //Overwrite page number
  7. Text text = new Text(String.format(
  8. "Page %d", pdfDoc.getNumberOfPages() - 1));
  9. text.setBackgroundColor(Color.WHITE);
  10. document.add(new Paragraph(text).setFixedPosition(
  11. pdfDoc.getNumberOfPages(), 549, 742, 100));
  12. //Add destination
  13. String destinationKey = "p" + (pdfDoc.getNumberOfPages() - 1);
  14. PdfArray destinationArray = new PdfArray();
  15. destinationArray.add(page.getPdfObject());
  16. destinationArray.add(PdfName.XYZ);
  17. destinationArray.add(new PdfNumber(0));
  18. destinationArray.add(new PdfNumber(page.getMediaBox().getHeight()));
  19. destinationArray.add(new PdfNumber(1));
  20. pdfDoc.addNameDestination(destinationKey, destinationArray);
  21. //Add TOC line with bookmark
  22. Paragraph p = new Paragraph();
  23. p.addTabStops(
  24. new TabStop(540, Property.TabAlignment.RIGHT, new DottedLine()));
  25. p.add(entry.getKey());
  26. p.add(new Tab());
  27. p.add(String.valueOf(pdfDoc.getNumberOfPages() - 1));
  28. p.setProperty(Property.ACTION, PdfAction.createGoTo(destinationKey));
  29. document.add(p);
  30. }
  31. firstSourcePdf.close();

Here we go:

  • Line 1: we create a PdfDocument with the source file containing all the info about all the nominations.

  • Line 2: we loop over an alphabetic list of the nominations for "The Revenant".

  • Line 4-5: we get the page that corresponds with the nomination, and we add a copy to the PdfDocument.

  • Line 7-8: we create an iText Text element containing the page number. We subtract 1 from that page number, because the first page in our document is the unnumbered page containing the TOC.

  • Line 9: we set the background color to Color.WHITE. This will cause an opaque white rectangle to be drawn with the same size of the Text. We do this to cover the original page number.

  • Line 10-11: we add this text at a fixed position on the current page in the PdfDocument. The fixed position is: X = 549, Y = 742, and the width of the text is 100 user units.

  • Line 13: we create a key we'll use to name the destination.

  • Line 14-19: we create a PdfArray containing information about the destination. We'll refer to the page we've just added (line 15), we'll define the destination using an X,Y coordinate and a zoom factor (line 16), we add the values of X (line 17), Y (line 18), and the zoom factor (line 19).

  • Line 20: we add the named destination to the PdfDocument.

  • Line 22: we create an empty Paragraph.

  • Line 23-24: we add a tab stop at position X = 540, we define that the tab needs to be right aligned, and the space preceding the tab needs to be a DottedLine.

  • Line 25: we add the nomination to the Paragraph.

  • Line 26: we introduce a Tab.

  • Line 27: we add the page number minus 1 (because the page with the TOC is page 0).

  • Line 28: we add an action that will be triggered when someone clicks on the Paragraph.

  • Line 29: we add the Paragraph to the document.

  • Line 31: we close the source document.
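
The bookkeeping in this loop (alphabetic order, page numbers that skip the TOC page, and destination keys) can be checked in isolation. The sketch below is plain Java without iText; the class name, the tocLines() method, and the text format of the generated lines are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TocSketch {
    // Returns one line per nomination: "<title> ... <shown number> -> <key>".
    static List<String> tocLines(Map<String, Integer> nominations) {
        List<String> lines = new ArrayList<>();
        int pagesInTarget = 1;                        // the TOC page already exists
        // A TreeMap iterates over its keys in sorted (alphabetic) order.
        for (Map.Entry<String, Integer> entry : new TreeMap<>(nominations).entrySet()) {
            pagesInTarget++;                          // one copied page per entry
            int shownNumber = pagesInTarget - 1;      // the TOC page isn't counted
            String destinationKey = "p" + shownNumber;
            lines.add(entry.getKey() + " ... " + shownNumber + " -> " + destinationKey);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> nominations = new TreeMap<>();
        nominations.put("Best motion picture of the year", 8);
        nominations.put("Achievement in directing", 5);
        nominations.put("Performance by an actor in a leading role", 4);
        for (String line : tocLines(nominations)) {
            System.out.println(line);
        }
    }
}
```

The page number stored as the map value only selects the source page; the number shown in the TOC depends solely on the order in which the pages are added.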

We've been introducing a lot of new functionality that really requires a more in-depth tutorial, but we're looking at this example for one main reason: to show that there's a significant difference between the PdfDocument object, to which a new page is added with every pass through the loop, and the Document object, to which we keep adding Paragraph objects on the first page.

Let's go through some of these steps one more time to add the checklist.

  1. //Add the last page
  2. PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
  3. PdfPage page = secondSourcePdf.getPage(1).copyTo(pdfDoc);
  4. pdfDoc.addPage(page);
  5. //Add destination
  6. PdfArray destinationArray = new PdfArray();
  7. destinationArray.add(page.getPdfObject());
  8. destinationArray.add(PdfName.XYZ);
  9. destinationArray.add(new PdfNumber(0));
  10. destinationArray.add(new PdfNumber(page.getMediaBox().getHeight()));
  11. destinationArray.add(new PdfNumber(1));
  12. pdfDoc.addNameDestination("checklist", destinationArray);
  13. //Add TOC line with bookmark
  14. Paragraph p = new Paragraph();
  15. p.addTabStops(new TabStop(540, Property.TabAlignment.RIGHT, new DottedLine()));
  16. p.add("Oscars\u00ae 2016 Movie Checklist");
  17. p.add(new Tab());
  18. p.add(String.valueOf(pdfDoc.getNumberOfPages() - 1));
  19. p.setProperty(Property.ACTION, PdfAction.createGoTo("checklist"));
  20. document.add(p);
  21. secondSourcePdf.close();
  22. // close the document
  23. document.close();

This code snippet adds the checklist with the overview of all the nominations. An extra line saying "Oscars® 2016 Movie Checklist" is added to the TOC.

This example introduces a couple of new concepts for educational purposes. It shouldn't be used in a real-world application, because it contains a major flaw. We make the assumption that the TOC will consist of only one page. If we added more lines to the document object, we would see a strange phenomenon: the text that doesn't fit on the first page would be added on the second page. This second page wouldn't be a new page; it would be the first page that we added in the loop. In other words: the content of the first imported page would be overwritten. This is a problem that can be fixed, but it's outside the scope of this short introductory tutorial.

We'll finish this chapter with some examples in which we merge forms.

Merging forms

Merging forms is special. In HTML, it's possible to have more than one form in a single HTML file. That's not the case for PDF. In a PDF file, there can be only one form. If you want to merge two forms and you want to preserve the forms, you need to use a special method and a special IPdfPageExtraCopier implementation.

Figure 6.9 shows the combination of two different forms, subscribe.pdf and state.pdf.

Figure 6.9: merging two different forms

The Combine_Forms example is different from what we had before.

  1. PdfDocument destPdfDocument = new PdfDocument(new PdfWriter(dest));
  2. PdfDocument[] sources = new PdfDocument[] {
  3. new PdfDocument(new PdfReader(SRC1)),
  4. new PdfDocument(new PdfReader(SRC2))
  5. };
  6. for (PdfDocument sourcePdfDocument : sources) {
  7. sourcePdfDocument.copyPagesTo(
  8. 1, sourcePdfDocument.getNumberOfPages(),
  9. destPdfDocument, new PdfPageFormCopier());
  10. sourcePdfDocument.close();
  11. }
  12. destPdfDocument.close();

In this code snippet, we use the copyPagesTo() method. The first two parameters define the from/to range for the pages of the source document. The third parameter defines the destination document. The fourth parameter indicates that we are copying forms and that the two different forms in the two different documents should be merged into a single form. PdfPageFormCopier is an implementation of the IPdfPageExtraCopier interface that makes sure that the two different forms are merged into one single form.

Merging two forms isn't always trivial, because the name of each field needs to be unique. Suppose that we would merge the same form twice. Then we would have two widget annotations for each field. A field with a specific name, for instance "name", can be visualized using different widget annotations, but it can only have one value. Suppose that you would have a widget annotation for the field "name" on page one, and a widget annotation for the same field on page two, then changing the value shown in the widget annotation on one page would automatically also change the value shown in the widget annotations on the other page.
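
The relation between fields, values, and widget annotations can be illustrated with a toy model. The sketch below is not how iText stores forms; it's a deliberately simplified stand-in (all names are hypothetical) in which each widget only refers to a field name, and each field name maps to exactly one value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FieldModelSketch {
    // One value per field name.
    private final Map<String, String> values = new HashMap<>();
    // Each widget only remembers the name of the field it visualizes.
    private final List<String> widgets = new ArrayList<>();

    // Adds a widget for the given field and returns its index.
    int addWidget(String fieldName) {
        widgets.add(fieldName);
        values.putIfAbsent(fieldName, "");
        return widgets.size() - 1;
    }

    void setValue(String fieldName, String value) {
        values.put(fieldName, value);
    }

    // The text a widget shows is always the value of its field.
    String shownBy(int widgetIndex) {
        return values.get(widgets.get(widgetIndex));
    }

    public static void main(String[] args) {
        FieldModelSketch form = new FieldModelSketch();
        int pageOne = form.addWidget("name");
        int pageTwo = form.addWidget("name");   // same field, second widget
        form.setValue("name", "CALIFORNIA");
        // Both widgets show the same value, because there is only one field.
        System.out.println(form.shownBy(pageOne) + " / " + form.shownBy(pageTwo));
    }
}
```

Giving each copy of the form its own field names is what makes the widgets independent of each other.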

In the next example, we are going to fill out and merge the same form, state.pdf, as many times as there are entries in the CSV file united_states.csv; see Figure 6.10.

Figure 6.10: Merging identical forms

If we kept the names of the fields the way they are in the original form, changing the value of the state "ALABAMA" into "CALIFORNIA" would also change the name "ALASKA" on the second page, and the name of all the other states on the other pages. We made sure that this doesn't happen by renaming all the fields before merging the forms.

Let's take a look at the FillOutAndMergeForms example.

  1. PdfDocument pdfDocument = new PdfDocument(new PdfWriter(dest));
  2. BufferedReader bufferedReader = new BufferedReader(new FileReader(DATA));
  3. String line;
  4. boolean headerLine = true;
  5. int i = 1;
  6. while ((line = bufferedReader.readLine()) != null) {
  7. if (headerLine) {
  8. headerLine = false;
  9. continue;
  10. }
  11. ByteArrayOutputStream baos = new ByteArrayOutputStream();
  12. PdfDocument sourcePdfDocument = new PdfDocument(
  13. new PdfReader(SRC), new PdfWriter(baos));
  14. //Rename fields
  15. i++;
  16. PdfAcroForm form = PdfAcroForm.getAcroForm(sourcePdfDocument, true);
  17. form.renameField("name", "name_" + i);
  18. form.renameField("abbr", "abbr_" + i);
  19. // ... (removed repetitive lines)
  20. form.renameField("dst", "dst_" + i);
  21. //Fill out fields
  22. StringTokenizer tokenizer = new StringTokenizer(line, ";");
  23. Map<String, PdfFormField> fields = form.getFormFields();
  24. fields.get("name_" + i).setValue(tokenizer.nextToken());
  25. fields.get("abbr_" + i).setValue(tokenizer.nextToken());
  26. // ... (removed repetitive lines)
  27. fields.get("dst_" + i).setValue(tokenizer.nextToken());
  28. // close the source document and use it to create a new PdfDocument
  29. sourcePdfDocument.close();
  30. sourcePdfDocument = new PdfDocument(
  31. new PdfReader(new ByteArrayInputStream(baos.toByteArray())));
  32. //Copy pages
  33. sourcePdfDocument.copyPagesTo(
  34. 1, sourcePdfDocument.getNumberOfPages(),
  35. pdfDocument, new PdfPageFormCopier());
  36. sourcePdfDocument.close();
  37. }
  38. bufferedReader.close();
  39. pdfDocument.close();

Let's start by looking at the code inside the while loop. We're looping over the different states of the USA stored in a CSV file (line 6). We skip the first line that contains the information for the column headers (line 7-10). The next couple of lines are interesting. So far, we've always been writing PDF files to disk. In this example, we are creating PDF files in memory using a ByteArrayOutputStream (line 11-13).
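
The header-skipping loop can be tried separately from the PDF code. The sketch below reads from a String instead of a file so that it's self-contained; the class name, the readRows() method, and the two-column sample data are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class CsvLoopSketch {
    // Splits every line except the first one into its semicolon-separated tokens.
    static List<List<String>> readRows(String csv) {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String line;
            boolean headerLine = true;
            while ((line = reader.readLine()) != null) {
                if (headerLine) {               // skip the column headers
                    headerLine = false;
                    continue;
                }
                StringTokenizer tokenizer = new StringTokenizer(line, ";");
                List<String> row = new ArrayList<>();
                while (tokenizer.hasMoreTokens()) {
                    row.add(tokenizer.nextToken());
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data with only two columns.
        String csv = "name;abbr\nALABAMA;AL\nALASKA;AK";
        System.out.println(readRows(csv));
    }
}
```

One thing to keep in mind: StringTokenizer skips empty fields, so a line such as "a;;c" yields two tokens instead of three. If the data can contain empty values, line.split(";", -1) keeps them.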

As mentioned before, we start by renaming all the fields. We get the PdfAcroForm instance (line 16) and we use the renameField() method to rename fields such as "name" to "name_2", "name_3", and so on (line 17-20); the counter i starts at 1 and is incremented before its first use (line 15). Note that we've skipped some lines for brevity in the code snippet. Once we've renamed all the fields, we set their value (line 22-27).

When we close the sourcePdfDocument (line 29), we have a complete PDF file in memory. We create a new sourcePdfDocument using a ByteArrayInputStream created with that file in memory (line 30-31). We can now copy the pages of that new sourcePdfDocument to our destination pdfDocument.
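
The write-then-read pattern itself is independent of PDF. The following sketch shows it with plain text instead of a PDF file; the class and method names are hypothetical. The important detail is the order: the bytes are only taken from the ByteArrayOutputStream after the writing side has been closed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class InMemoryRoundTrip {
    // Writes the content to memory and returns the finished bytes.
    static byte[] writeToMemory(String content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = baos) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Only ask for the bytes once the writer is done.
        return baos.toByteArray();
    }

    // Reads the bytes back, as a second "document" would do.
    static String readFromMemory(byte[] bytes) {
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = writeToMemory("filled-out form");
        System.out.println(readFromMemory(file));
    }
}
```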

This is a rather artificial example, but it's a good example to explain some of the usual pitfalls when merging forms:

  • Without the PdfPageFormCopier, the forms won't be correctly merged.

  • One field can only have one value, no matter how many times that field is visualized using a widget annotation.

A more common use case is to fill out and flatten the same form multiple times in memory, simultaneously merging all the resulting documents into one PDF.

Merging flattened forms

Figure 6.11 shows two PDF documents that were the result of the same procedure: we filled out a form in memory as many times as there are states in the USA. We flattened these filled out forms, and we merged them into one single document.

Figure 6.11: Filling, flattening and merging forms

From the outside, these documents look identical, but if we look at their file size in Figure 6.12, we see a huge difference.

Figure 6.12: difference in file size depending on how documents are merged

What is causing this difference in file size? We need to take a look at the FillOutFlattenAndMergeForms example to find out.

  1. PdfDocument destPdfDocument =
  2. new PdfDocument(new PdfWriter(dest1));
  3. PdfDocument destPdfDocumentSmartMode =
  4. new PdfDocument(new PdfWriter(dest2).setSmartMode(true));
  5. BufferedReader bufferedReader = new BufferedReader(new FileReader(DATA));
  6. String line;
  7. boolean headerLine = true;
  8. while ((line = bufferedReader.readLine()) != null) {
  9. if (headerLine) {
  10. headerLine = false;
  11. continue;
  12. }
  13. ByteArrayOutputStream baos = new ByteArrayOutputStream();
  14. PdfDocument sourcePdfDocument =
  15. new PdfDocument(new PdfReader(SRC), new PdfWriter(baos));
  16. //Fill out fields
  17. PdfAcroForm form = PdfAcroForm.getAcroForm(sourcePdfDocument, true);
  18. StringTokenizer tokenizer = new StringTokenizer(line, ";");
  19. Map<String, PdfFormField> fields = form.getFormFields();
  20. fields.get("name").setValue(tokenizer.nextToken());
  21. fields.get("abbr").setValue(tokenizer.nextToken());
  22. // ... (removed repetitive lines)
  23. fields.get("dst").setValue(tokenizer.nextToken());
  24. //Flatten fields
  25. form.flattenFields();
  26. sourcePdfDocument.close();
  27. sourcePdfDocument = new PdfDocument(
  28. new PdfReader(new ByteArrayInputStream(baos.toByteArray())));
  29. //Copy pages
  30. sourcePdfDocument.copyPagesTo(
  31. 1, sourcePdfDocument.getNumberOfPages(), destPdfDocument, null);
  32. sourcePdfDocument.copyPagesTo(
  33. 1, sourcePdfDocument.getNumberOfPages(), destPdfDocumentSmartMode, null);
  34. sourcePdfDocument.close();
  35. }
  36. bufferedReader.close();
  37. destPdfDocument.close();
  38. destPdfDocumentSmartMode.close();

In this code snippet, we create two documents simultaneously:

  • The destPdfDocument instance (line 1-2) is created the same way we've been creating PdfDocument instances all along.

  • The destPdfDocumentSmartMode instance (line 3-4) is also created that way, but we've turned on the smart mode.

We loop over the lines of the CSV file like we did before (line 8), but since we're going to flatten the forms, we no longer have to rename the fields. The fields will be lost due to the flattening process anyway. We create a new PDF document in memory (line 13-15) and we fill out the fields (line 17-23). We flatten the fields (line 25) and close the document created in memory (line 26). We use the file created in memory to create a new source file (line 27-28). We add all the pages of this source file to the two PdfDocument instances, one working in normal mode, the other in smart mode (line 30-33). We no longer need to use a PdfPageFormCopier instance, because the forms have been flattened; they are no longer forms.

What is the difference between normal mode and smart mode?

  • When we copy the pages of the filled out forms to the PdfDocument working in normal mode, the PdfDocument processes each document as if it's totally unrelated to the other documents that are being added. In this case, the resulting document will be bloated, because the documents are related: they all share the same template. That template is added to the PDF document as many times as there are states in the USA. In this case, the result is a file of about 12 MBytes.

  • When we copy the pages of the filled out forms to the PdfDocument working in smart mode, the PdfDocument will take the time to compare the resources of each document. If two separate documents share the same resources (e.g. a template), then that resource is copied to the new file only once. In this case, the result can be limited to 365 KBytes.
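
The effect of reusing identical resources can be illustrated with a toy model. The sketch below is not how iText implements smart mode; it's a simplified stand-in (all names and sizes are assumptions) that compares resources by their byte content and counts how many bytes would be stored with and without reuse.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SmartCopySketch {
    // Normal mode in this model: every copied resource is stored again.
    static long normalSize(List<byte[]> resources) {
        long total = 0;
        for (byte[] resource : resources) {
            total += resource.length;
        }
        return total;
    }

    // Smart-like mode in this model: identical resources are stored only once.
    static long dedupSize(List<byte[]> resources) {
        Set<ByteBuffer> seen = new HashSet<>();   // ByteBuffer compares by content
        long total = 0;
        for (byte[] resource : resources) {
            if (seen.add(ByteBuffer.wrap(resource))) {
                total += resource.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<byte[]> copied = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            copied.add(new byte[1000]);           // the shared template
            copied.add(new byte[] {(byte) i});    // page-specific content
        }
        System.out.println("normal: " + normalSize(copied));
        System.out.println("dedup:  " + dedupSize(copied));
    }
}
```

The comparison isn't free: every resource has to be checked against what was copied before, which is the time the smart mode takes in exchange for the smaller file.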

Both the 12 MBytes and the 365 KBytes files look exactly the same when opened in a PDF viewer or when printed, but it goes without saying that the 365 KBytes file is to be preferred over the 12 MBytes file.

Summary

In this chapter, we've been scaling, tiling, and N-upping one file, with a different file as the result. We've also assembled files in many different ways. We discovered that there are quite a few pitfalls when merging interactive forms. Much more remains to be said about reusing content from existing PDF documents.

In the next chapter, we'll discuss PDF documents that comply with special PDF standards such as PDF/UA and PDF/A. We'll discover that merging PDF/A documents also requires some special attention.