How to get the page number of an arbitrary PDF object?

Tags: page numbers, open action, destination, inspect PDF, iText 5

I am trying to find the page number of a PDF object using iText's Java API. The following code reads in the PDF file, and gets the object containing the open action. How do I get the page number of that object?

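import java.io.FileInputStream;
import java.io.IOException;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;

PdfReader soPdfItext;
try {
    soPdfItext = new PdfReader(new FileInputStream(f));
} catch (IOException e) {
    /* Fail loudly: a swallowed exception would leave the reader null */
    throw new IllegalStateException("Could not read " + f, e);
}
/* Get the catalog */
PdfDictionary soCatalog = soPdfItext.getCatalog();
/* Get the open action entry (an indirect reference or a direct object) */
PdfObject soOpenActionEntry = soCatalog.get(PdfName.OPENACTION);
/* Get the actual object containing the open action */
PdfObject soOpenActionObject = PdfReader.getPdfObject(soOpenActionEntry);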

Now what? There is a class Document that contains a method getPageNumber(), but I'm not sure if a) it's relevant to what I want to do and b) if it is relevant, how to implement.

Posted on StackOverflow on Jun 15, 2015 by user271621

There are no such things as page numbers in a PDF. Pages are part of a page tree. This page tree consists of /Pages elements (the branches of the tree) and /Page elements (the leaves of the tree). The page index is calculated by traversing the different branches and leaves of the tree. Optionally, a PDF also defines /PageLabels. If you know the page index and if you have the definition of the page labels, you can derive the page number.
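To make the traversal concrete, here is a minimal sketch in plain Java, without iText. The Node class and the pageIndexOf method are hypothetical names invented for this illustration; a real page tree stores its /Type and /Kids entries in PDF dictionaries, but the counting logic is the same: visit the leaves in document order and count until you reach the one you are looking for.

```java
import java.util.ArrayList;
import java.util.List;

class PageTreeSketch {

    /** Hypothetical stand-in for a /Pages (branch) or /Page (leaf) dictionary. */
    static final class Node {
        final int objectNumber;
        final boolean leaf;                         // true for /Page, false for /Pages
        final List<Node> kids = new ArrayList<>();  // the /Kids array of a branch

        private Node(int objectNumber, boolean leaf) {
            this.objectNumber = objectNumber;
            this.leaf = leaf;
        }

        static Node page(int objectNumber) {
            return new Node(objectNumber, true);
        }

        static Node pages(int objectNumber, Node... children) {
            Node branch = new Node(objectNumber, false);
            for (Node child : children) {
                branch.kids.add(child);
            }
            return branch;
        }
    }

    /** Returns the 1-based index of the leaf with the given object number, or -1 if absent. */
    static int pageIndexOf(Node root, int targetObjectNumber) {
        int[] counter = {0};
        return walk(root, targetObjectNumber, counter) ? counter[0] : -1;
    }

    /** Depth-first walk that counts every leaf it passes. */
    private static boolean walk(Node node, int target, int[] counter) {
        if (node.leaf) {
            counter[0]++;
            return node.objectNumber == target;
        }
        for (Node kid : node.kids) {
            if (walk(kid, target, counter)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two branches: the first holds pages 6 and 7, the second holds page 8.
        Node root = Node.pages(1,
                Node.pages(2, Node.page(6), Node.page(7)),
                Node.pages(3, Node.page(8)));
        System.out.println(pageIndexOf(root, 8)); // prints 3
    }
}
```

Object 8 is the third leaf in document order, so its page index is 3 even though its object number is 8. That is why an object number alone never tells you the page.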

You are extracting a PdfObject that represents an open action. It can be a PdfDictionary or a PdfArray.

PdfDictionary

If the PdfObject is an instance of a PdfDictionary, then you need to look at the /S item of this dictionary to find out which type of action will be triggered.

  • That action could be some JavaScript. If that JavaScript contains an action that jumps to a specific page, there might be a page number in that method.

  • That action could be a GoTo action, in which case you need to look at the /D entry for the destination (*).

There are 20 possible types of actions, and actions can be chained, so it's up to you to loop through the action chain and to examine every possible action.

This is an example:

/OpenAction<</D[8 0 R/Fit]/S/GoTo>>

The << and >> indicate that the open action is described using a dictionary. The /S shows that you have a /GoTo action and /D describes the destination.

PdfArray

If the PdfObject is an instance of a PdfArray, then this array is a destination (*).

This is an example:

/OpenAction[6 0 R/XYZ 0 806 0]

(*) Destination

A destination is an array that consists of a variable number of elements. These are some examples:

[8 0 R/Fit]
[6 0 R/XYZ 0 806 0]

The first example is an array with two elements: 8 0 R and /Fit. The second example is an array with five elements: 6 0 R, /XYZ, 0, 806 and 0. You need the first element. It doesn't give you the page number (because there is no such thing as page numbers), but it gives you a reference to the /Page object. Based on that reference, you can deduce the page index by looping over the pages of the page tree and comparing the object number of each page with the object number in the destination.
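Putting the pieces together, the following is a rough sketch for iText 5 (it assumes the iText 5 jar is on the classpath). The class name OpenActionPage and its two methods are hypothetical helpers written for this illustration, not part of iText. The sketch only covers the two cases described above, an explicit destination array and a GoTo action whose /D entry is such an array. Instead of walking the page tree by hand, it relies on PdfReader.getPageOrigRef(), which returns the reference of the n-th page.

```java
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfIndirectReference;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;

class OpenActionPage {

    /** Returns the destination array of the open action, or null if this sketch cannot find one. */
    static PdfArray openActionDestination(PdfReader reader) {
        // getDirectObject() resolves an indirect reference, so both a direct
        // and an indirect /OpenAction entry end up as the actual object here.
        PdfObject openAction = reader.getCatalog().getDirectObject(PdfName.OPENACTION);
        if (openAction == null) {
            return null;                                 // no open action at all
        }
        if (openAction.isArray()) {
            return (PdfArray) openAction;                // e.g. /OpenAction[6 0 R/XYZ 0 806 0]
        }
        if (openAction.isDictionary()) {
            PdfDictionary action = (PdfDictionary) openAction;
            if (PdfName.GOTO.equals(action.getAsName(PdfName.S))) {
                // getAsArray() returns null when /D is a named destination
                // (a name or a string), which this sketch does not resolve.
                return action.getAsArray(PdfName.D);     // e.g. /D[8 0 R/Fit]
            }
        }
        return null;                                     // JavaScript and other action types
    }

    /** Returns the 1-based page index the destination points to, or -1 if it cannot be determined. */
    static int pageNumberOf(PdfReader reader, PdfArray destination) {
        if (destination == null || destination.isEmpty()) {
            return -1;
        }
        // The first element of a destination is the reference to the /Page object.
        PdfIndirectReference target = destination.getAsIndirectObject(0);
        if (target == null) {
            return -1;
        }
        for (int page = 1; page <= reader.getNumberOfPages(); page++) {
            PdfIndirectReference ref = reader.getPageOrigRef(page);
            if (ref.getNumber() == target.getNumber()
                    && ref.getGeneration() == target.getGeneration()) {
                return page;
            }
        }
        return -1;
    }
}
```

With a reader opened as in the question, OpenActionPage.pageNumberOf(soPdfItext, OpenActionPage.openActionDestination(soPdfItext)) would return the page index, or -1 when the open action is something this sketch does not handle. Chained actions (the /Next entry) and named destinations would need extra code.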