Chapter 5: Manipulating an existing PDF document

In the examples for chapters 1 to 3, we always created a new PDF document from scratch with iText. In the last couple of examples of chapter 4, we worked with an existing PDF document. We took an existing interactive PDF form and filled it out, resulting in either a pre-filled form or a flattened document that was no longer interactive. In this chapter, we'll continue working with existing PDFs. We'll load an existing file using PdfReader and we'll use the reader object to create a new PdfDocument.

Adding annotations and content

In the previous chapter, we took an existing PDF form, job_application.pdf, and we filled out the fields. In this chapter, we'll take it a step further. We'll start by adding a text annotation, some text, and a new check box. This is shown in Figure 5.1.

Figure 5.1: an updated form

We'll repeat the code we've seen in the previous chapter in the AddAnnotationsAndContent example.

    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
    // add content
    pdfDoc.Close();

Where it says // add content, we'll add the annotation, the extra text, and the extra check box.

Just like in chapter 4, we add the annotation to a page obtained from the PdfDocument instance:

    // Add text annotation
    PdfAnnotation ann = new PdfTextAnnotation(new Rectangle(400, 795, 0, 0))
        .SetTitle(new PdfString("iText"))
        .SetContents("Please, fill out the form.")
        .SetOpen(true);
    pdfDoc.GetFirstPage().AddAnnotation(ann);

If we want to add content to a content stream, we need to create a PdfCanvas object. We can do this using a PdfPage object as a parameter for the PdfCanvas constructor:

    // Write a line of text directly to the content stream of the first page
    PdfCanvas canvas = new PdfCanvas(pdfDoc.GetFirstPage());
    canvas.BeginText()
        .SetFontAndSize(PdfFontFactory.CreateFont(FontConstants.HELVETICA), 12)
        .MoveText(265, 597)
        .ShowText("I agree to the terms and conditions.")
        .EndText();

The code to add the text is similar to what we did in chapter 2. Whether you're creating a document from scratch, or adding content to an existing document, has no impact on the instructions we use. The same goes for adding fields to a PdfAcroForm instance:

    // Add form field
    PdfAcroForm form = PdfAcroForm.GetAcroForm(pdfDoc, true);
    PdfButtonFormField checkField = PdfFormField.CreateCheckBox(pdfDoc, new Rectangle(245, 594, 15, 15), "agreement", "Off", PdfFormField.TYPE_CHECK);
    checkField.SetRequired(true);
    form.AddField(checkField);

Now that we've added an extra field, we might want to change the reset action:

    // Update reset button so that it also resets the new "agreement" field
    form.GetField("reset").SetAction(PdfAction.CreateResetForm(new String[] { "name", "language", "experience1", "experience2", "experience3", "shift", "info", "agreement" }, 0));
    pdfDoc.Close();

Let's see if we can also change some of the visual aspects of the form fields.

Changing the properties of form fields

In the FillAndModifyForm example, we return to the FillForm example from chapter 4, but instead of merely filling out the form, we also change the properties of the fields:

  1. PdfAcroForm form = PdfAcroForm.GetAcroForm(pdfDoc, true);
  2. IDictionary<String, PdfFormField> fields = form.GetFormFields();
  3. PdfFormField toSet;
  4. fields.TryGetValue("name", out toSet);
  5. toSet.SetValue("James Bond").SetBackgroundColor(Color.ORANGE);
  6. fields.TryGetValue("experience1", out toSet);
  7. toSet.SetValue("Yes");
  8. fields.TryGetValue("experience2", out toSet);
  9. toSet.SetValue("Yes");
  10. fields.TryGetValue("experience3", out toSet);
  11. toSet.SetValue("Yes");
  12. IList<PdfObject> options = new List<PdfObject>();
  13. options.Add(new PdfString("Any"));
  14. options.Add(new PdfString("8.30 am - 12.30 pm"));
  15. options.Add(new PdfString("12.30 pm - 4.30 pm"));
  16. options.Add(new PdfString("4.30 pm - 8.30 pm"));
  17. options.Add(new PdfString("8.30 pm - 12.30 am"));
  18. options.Add(new PdfString("12.30 am - 4.30 am"));
  19. options.Add(new PdfString("4.30 am - 8.30 am"));
  20. PdfArray arr = new PdfArray(options);
  21. fields.TryGetValue("shift", out toSet);
  22. toSet.SetOptions(arr);
  23. toSet.SetValue("Any");
  24. PdfFont courier = PdfFontFactory.CreateFont(FontConstants.COURIER);
  25. fields.TryGetValue("info", out toSet);
  26. toSet.SetValue("I was 38 years old when I became an MI6 agent.", courier, 7f);
  27. pdfDoc.Close();

Please take a closer look at the following lines:

  • line 5: we set the value of the "name" field to "James Bond", but we also change the background color to Color.ORANGE.

  • lines 12-22: we create a List containing more options than the form originally contained (lines 12-19). We convert this List to a PdfArray (line 20) and we use this array to update the options of the "shift" field (lines 21-22).

  • lines 24-26: we create a new PdfFont and we use this font and a new font size as extra parameters when we set the value of the "info" field.

Let's take a look at Figure 5.2 to see if our changes were applied.

Figure 5.2: updated form with highlighted fields

We see that the "shift" field now has more options, but we don't see the background color of the "name" field. It's also not clear whether the font of the "info" field has changed. What's wrong? Nothing is wrong: the fields are currently highlighted, and the blue highlighting covers the background color. Let's click "Highlight Existing Fields" to switch the highlighting off and see what happens.

Figure 5.3: updated form, no highlighting

Now Figure 5.3 looks exactly the way we expected. We wouldn't have had this problem if we had added form.FlattenFields(); right before closing the PdfDocument, but in that case, we would no longer have a form either. We'll make some more form examples in the next chapter, but for now, let's see what we can do with existing documents that don't contain a form.

Adding a header, footer, and watermark

Do you remember the report of the UFO sightings in the 20th century we created in chapter 3? We'll use a similar report for the next couple of examples: ufo.pdf, see Figure 5.4.

Figure 5.4: UFO sightings report

As you can see, it's not as fancy as the report we made in chapter 3. What if we'd like to add a header, a watermark, and a footer saying "page X of Y" to this existing report? Figure 5.5 shows what such a report would look like.

Figure 5.5: UFO sightings report with header, footer, and watermark

In Figure 5.5, we zoom in on an advantage that we didn't have when we added the page numbers in chapter 3. In chapter 3, we didn't know the total number of pages at the moment we were adding the footer, hence we only added the current page number. Now that we have an existing document, we can add "1 of 4", "2 of 4", and so on.

When creating a document from scratch, it's possible to create a placeholder for the total number of pages. Once all the pages are created, we can then add the total number of pages to that placeholder, but that's outside the scope of this introductory tutorial.
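For an existing document, where the total is known up front, the "X of Y" labels can be computed before anything is drawn. The following is a minimal sketch that doesn't use iText at all; the class FooterLabels and its Build method are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iText: builds "X of Y" footer labels
// for a document whose total page count is already known.
public static class FooterLabels
{
    public static IList<string> Build(int totalPages)
    {
        if (totalPages < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(totalPages));
        }

        var labels = new List<string>(totalPages);
        // Page numbers are 1-based, like the page numbers used by iText.
        for (int i = 1; i <= totalPages; i++)
        {
            labels.Add(i + " of " + totalPages);
        }
        return labels;
    }
}
```

Calling FooterLabels.Build(4) returns the four labels "1 of 4" through "4 of 4". In the real example, the total would come from pdfDoc.GetNumberOfPages().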

The AddContent example shows how we can add content to every page in an existing document.

    // Initialize PDF document
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
    Document document = new Document(pdfDoc);
    Rectangle pageSize;
    PdfCanvas canvas;
    int n = pdfDoc.GetNumberOfPages();
    // Page numbers are 1-based
    for (int i = 1; i <= n; i++) {
        PdfPage page = pdfDoc.GetPage(i);
        pageSize = page.GetPageSize();
        canvas = new PdfCanvas(page);
        // add new content
    }
    pdfDoc.Close();

We use the pdfDoc object to create a Document instance. We'll use that document object to add some content. We also use the pdfDoc object to find the number of pages in the original PDF. We loop over all the pages, and we get the PdfPage object of each page. Let's take a look at the // add new content part we omitted.

  1. //Draw header text
  2. canvas.BeginText()
  3. .SetFontAndSize(PdfFontFactory.CreateFont(FontConstants.HELVETICA), 7)
  4. .MoveText(pageSize.GetWidth() / 2 - 24, pageSize.GetHeight() - 10)
  5. .ShowText("I want to believe")
  6. .EndText();
  7. //Draw footer line
  8. canvas.SetStrokeColor(Color.BLACK)
  9. .SetLineWidth(.2f)
  10. .MoveTo(pageSize.GetWidth() / 2 - 30, 20)
  11. .LineTo(pageSize.GetWidth() / 2 + 30, 20)
  12. .Stroke();
  13. //Draw page number
  14. canvas.BeginText()
  15. .SetFontAndSize(PdfFontFactory.CreateFont(FontConstants.HELVETICA), 7)
  16. .MoveText(pageSize.GetWidth() / 2 - 7, 10)
  17. .ShowText(i.ToString())
  18. .ShowText(" of ")
  19. .ShowText(n.ToString())
  20. .EndText();
  21. //Draw watermark
  22. Paragraph p = new Paragraph("CONFIDENTIAL").SetFontSize(60);
  23. canvas.SaveState();
  24. PdfExtGState gs1 = new PdfExtGState().SetFillOpacity(0.2f);
  25. canvas.SetExtGState(gs1);
  26. document.ShowTextAligned(p, pageSize.GetWidth() / 2, pageSize.GetHeight() / 2, pdfDoc.GetPageNumber(page), TextAlignment.CENTER, VerticalAlignment.MIDDLE, 45);
  27. canvas.RestoreState();

We are adding four parts of content:

  1. A header (lines 2-6): we use low-level text functionality to add "I want to believe" at the top of the page.

  2. A footer line (lines 8-12): we use low-level graphics functionality to draw a line at the bottom of the page.

  3. A footer with the page number (lines 14-20): we use low-level text functionality to add the page number, followed by " of ", followed by the total number of pages at the bottom of the page.

  4. A watermark (lines 22-27): we create a Paragraph with the text we want to add as a watermark. Then we change the opacity of the canvas. Finally we add the Paragraph to the document, centered in the middle of the page and at an angle of 45, using the ShowTextAligned() method.
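To make the coordinates in this listing concrete, here is a small sketch that evaluates the same expressions for a given page size. It doesn't use iText; the class OverlayGeometry and its members are hypothetical names, and the A4 dimensions in the note below are an assumption, not something the tutorial states about ufo.pdf.

```csharp
using System;

// Hypothetical helper, not part of iText: evaluates the coordinate
// expressions from the listing for a page of a given width and height.
public static class OverlayGeometry
{
    // Start of the header text: (width / 2 - 24, height - 10)
    public static (float X, float Y) HeaderOrigin(float width, float height)
    {
        return (width / 2 - 24, height - 10);
    }

    // Horizontal end points of the footer line, which is drawn at y = 20
    public static (float X1, float X2, float Y) FooterLine(float width)
    {
        return (width / 2 - 30, width / 2 + 30, 20f);
    }

    // Start of the "X of Y" text: (width / 2 - 7, 10)
    public static (float X, float Y) PageNumberOrigin(float width)
    {
        return (width / 2 - 7, 10f);
    }

    // Center of the page, used as the anchor point of the rotated watermark
    public static (float X, float Y) WatermarkAnchor(float width, float height)
    {
        return (width / 2, height / 2);
    }
}
```

Assuming an A4 page of 595 by 842 points, the header starts at (273.5, 832), the footer line runs from x = 267.5 to x = 327.5 at y = 20, the page number starts at (290.5, 10), and the watermark is anchored at (297.5, 421). The fixed offsets of 24 and 7 points are presumably chosen to roughly center these particular strings; text of a different length or font size would need different offsets.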

We're doing something special when we add the watermark. We're changing the graphics state of the canvas object obtained from the page. Then we add text to the corresponding page in the document. Internally, iText will detect that we're already using the PdfCanvas instance of that page and the ShowTextAligned() method will write to that same canvas. This way, we can use a mix of low-level and convenience methods.

In the final example of this chapter, we'll change the page size and orientation of the pages of our UFO sightings report.

Changing the page size and orientation

If we take a look at Figure 5.6, we see our original report from Figure 5.4, but the pages are bigger and the second page has been turned upside down.

Figure 5.6: changed page size and orientation

The ChangePage example shows how this was done.

  1. //Initialize PDF document
  2. PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
  3. float margin = 72;
  4. for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++) {
  5. PdfPage page = pdfDoc.GetPage(i);
  6. // change page size
  7. Rectangle mediaBox = page.GetMediaBox();
  8. Rectangle newMediaBox = new Rectangle(mediaBox.GetLeft() - margin, mediaBox.GetBottom() - margin, mediaBox.GetWidth() + margin * 2, mediaBox.GetHeight() + margin * 2);
  9. page.SetMediaBox(newMediaBox);
  10. // add border
  11. PdfCanvas over = new PdfCanvas(page);
  12. over.SetStrokeColor(Color.GRAY);
  13. over.Rectangle(mediaBox.GetLeft(), mediaBox.GetBottom(), mediaBox.GetWidth(), mediaBox.GetHeight());
  14. over.Stroke();
  15. // change rotation of the even pages
  16. if (i % 2 == 0) {
  17. page.SetRotation(180);
  18. }
  19. }
  20. pdfDoc.Close();

We don't need a Document instance here; we work with the PdfDocument instance only. We loop over all the pages (line 4) and get the PdfPage instance of each page (line 5).

  • A page can have different page boundaries, one of which isn't optional: the /MediaBox. We get the value of this page boundary as a Rectangle (line 7) and we create a new Rectangle that is an inch (72 points) larger on each side (line 8). We use the SetMediaBox() method to change the page size (line 9).

  • We create a PdfCanvas object for the page (line 11), and we stroke a gray rectangle using the dimensions of the original mediaBox (lines 12-14).

  • For every even page (line 16), we set the page rotation to 180 degrees (line 17).
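The arithmetic behind the enlarged page boundary and the even-page rotation can be checked in isolation. The sketch below is library-free: Box is a hypothetical stand-in for iText's Rectangle (x, y, width, height in points), and PageMath is a hypothetical helper. It also assumes that odd pages have a rotation of 0, whereas the example simply leaves their existing rotation untouched.

```csharp
using System;

// Hypothetical stand-in for a page boundary; the tutorial's real code
// uses iText's Rectangle class instead.
public sealed class Box
{
    public Box(float x, float y, float width, float height)
    {
        X = x;
        Y = y;
        Width = width;
        Height = height;
    }

    public float X { get; }
    public float Y { get; }
    public float Width { get; }
    public float Height { get; }
}

// Hypothetical helper, not part of iText.
public static class PageMath
{
    // Grows the box by `margin` points on every side: the lower-left corner
    // moves down and to the left, and width and height grow by twice the margin.
    public static Box Enlarge(Box box, float margin)
    {
        return new Box(
            box.X - margin,
            box.Y - margin,
            box.Width + margin * 2,
            box.Height + margin * 2);
    }

    // Even pages are turned upside down; odd pages are assumed to stay at 0.
    public static int RotationFor(int pageNumber)
    {
        return pageNumber % 2 == 0 ? 180 : 0;
    }
}
```

For a page whose original boundary is (0, 0, 595, 842), PageMath.Enlarge with a margin of 72 gives (-72, -72, 739, 986). The lower-left corner becomes negative because the original content keeps its coordinates and the extra space is added around it.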

Manipulating an existing PDF document requires some knowledge about PDF. For instance: you need to know the concept of the /MediaBox. We have tried to keep the examples simple, but that also means that we've cut some corners. For instance: in our last example, we didn't bother to check if a /CropBox was defined. If the original PDF had a /CropBox, enlarging the /MediaBox wouldn't have had any visual effect. We'll need a more in-depth tutorial to cover topics like these.
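The /CropBox remark can be illustrated with a simplified model of how a viewer picks the visible region. It relies on the PDF specification's rule that the crop box defaults to the media box and is clipped to it. The sketch doesn't use iText: Bounds and VisibleArea are hypothetical names, and a null crop box stands for "no /CropBox defined".

```csharp
using System;

// Hypothetical boundary type expressed as edges (in points), not an iText class.
public sealed class Bounds
{
    public Bounds(float left, float bottom, float right, float top)
    {
        Left = left;
        Bottom = bottom;
        Right = right;
        Top = top;
    }

    public float Left { get; }
    public float Bottom { get; }
    public float Right { get; }
    public float Top { get; }
}

// Hypothetical helper, not part of iText.
public static class VisibleArea
{
    // Returns the region that is displayed: the media box when no crop box
    // is defined (cropBox == null), otherwise the intersection of the two.
    public static Bounds Of(Bounds mediaBox, Bounds cropBox)
    {
        if (cropBox == null)
        {
            return mediaBox;
        }

        return new Bounds(
            Math.Max(mediaBox.Left, cropBox.Left),
            Math.Max(mediaBox.Bottom, cropBox.Bottom),
            Math.Min(mediaBox.Right, cropBox.Right),
            Math.Min(mediaBox.Top, cropBox.Top));
    }
}
```

With a crop box of (0, 0, 595, 842), enlarging the media box from (0, 0, 595, 842) to (-72, -72, 667, 914) leaves the visible area at (0, 0, 595, 842). Without a crop box, the visible area follows the enlarged media box, which is the situation our example silently assumes.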

Summary

In the previous chapter, we learned about interactive PDF forms. In this chapter, we continued working with these forms. We added an annotation, some text, and an extra field to an existing form. We also changed some properties while filling out a form.

We then moved on to PDFs without any interactivity. First, we added a header, a footer, and a watermark. Then, we played with the size and the orientation of the pages of an existing document.

In the next chapter, we'll scale and tile existing documents, and we'll discover how to assemble multiple documents into a single PDF.