How to find internal links in a PDF file?

Tags: annotations, link annotation, inspect PDF, iText 7

I am using iTextSharp to search for internal links in a PDF file. I have already done this for external links:

//Get the current page
PdfDictionary PageDictionary = R.GetPageN(page);
//Get all of the annotations for the current page
PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Length == 0)) {
    Console.WriteLine("nothing");
}
//Loop through each annotation
if (Annots != null) {
    foreach (PdfObject A in Annots.ArrayList) {
        //Convert the itext-specific object as a generic PDF object
        PdfDictionary AnnotationDictionary =
            (PdfDictionary)PdfReader.GetPdfObject(A);
        //Make sure this annotation has a link
        if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
            continue;
        //Make sure this annotation has an ACTION
        if (AnnotationDictionary.Get(PdfName.A) == null)
            continue;
        //Get the ACTION for the current annotation
        PdfDictionary AnnotationAction =
            AnnotationDictionary.GetAsDict(PdfName.A);
        // Test if it is a URI action (There are tons of other types of actions,
        // some of which might mimic URI, such as JavaScript,
        // but those need to be handled separately)
        if (AnnotationAction.Get(PdfName.S).Equals(PdfName.URI)) {
            PdfString Destination = AnnotationAction.GetAsString(PdfName.URI);
            string url1 = Destination.ToString();
        }
    }
}
Posted on StackOverflow on Feb 22, 2014 by Ashwani

You've already done most of the work.

In iText 7 for Java your code will be the following:

//Get the current page
PdfPage pdfPage = pdfDoc.getPage(page);
//Get all of the annotations for the current page
List<PdfAnnotation> annots = pdfPage.getAnnotations();
//Make sure we have something
if ((annots == null) || (annots.size() == 0)) {
    System.out.println("nothing");
}
//Loop through each annotation
else {
    for (PdfAnnotation a : annots) {
        //Make sure this annotation is a link; skip everything else
        if (!PdfName.Link.equals(a.getSubtype()))
            continue;
        //Make sure this annotation has an ACTION
        if (a.getAction() != null) {
            //Get the ACTION for the current annotation
            PdfDictionary annotAction = a.getAction();
            // Test if it is a URI action (There are tons of other types of actions,
            // some of which might mimic URI, such as JavaScript,
            // but those need to be handled separately)
            if (annotAction.get(PdfName.S).equals(PdfName.URI) ||
                annotAction.get(PdfName.S).equals(PdfName.GoToR)) {
                    //do smth with external links
                    //Only URI actions carry a /URI entry; for GoToR it is null
                    PdfString destination = annotAction.getAsString(PdfName.URI);
                    if (destination != null) {
                        String url1 = destination.toString();
                    }
            }
            else if (annotAction.get(PdfName.S).equals(PdfName.GoTo) ||
                annotAction.get(PdfName.S).equals(PdfName.GoToE)) {
                    //do smth with internal links
            }
        }
    }
}

As you can see, you don't need to fetch the array of annotations yourself and convert each annotation object to a PdfDictionary, as was done in iText 5. Just use the built-in methods.

Please take a look at the following screenshot:

Internal view of the PDF

You see the /Annots array of a page. You are already parsing that array in your code, and you skip all annotations whose /Subtype isn't /Link or that don't have an /A key, which is excellent.

Currently you're only looking for /S values equal to /URI. You say you're already done with external links, but that's not true: you should also look for entries where /S is /GoToR (remote goto). If you want internal links, you need to look for /S values equal to /GoTo, /GoToE, and (in the future) /GoToDp. Maybe you also want to look at the /JavaScript actions, because they can also be used to jump to a specific page.
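The grouping of /S values described above can be sketched as a small lookup table. This is only an illustration in plain Java, not iText code: the class name LinkActionClassifier, the Kind enum, and the classify method are hypothetical, and the action type is passed as a plain string without the leading slash rather than as a PdfName.

```java
import java.util.Map;

// Hypothetical helper for illustration only; none of these names come from iText.
final class LinkActionClassifier {

    enum Kind { EXTERNAL, INTERNAL, SCRIPT, OTHER }

    // Action type names (the /S value without the leading slash),
    // grouped the way the answer above describes them.
    private static final Map<String, Kind> KINDS = Map.of(
            "URI", Kind.EXTERNAL,       // link to a web resource
            "GoToR", Kind.EXTERNAL,     // remote goto: destination in another PDF
            "GoTo", Kind.INTERNAL,      // destination in the same document
            "GoToE", Kind.INTERNAL,     // embedded goto
            "GoToDp", Kind.INTERNAL,    // document part goto
            "JavaScript", Kind.SCRIPT); // may jump to a page, needs separate handling

    static Kind classify(String actionType) {
        // Map.of rejects null keys on lookup, so guard against a missing /S entry.
        if (actionType == null) {
            return Kind.OTHER;
        }
        return KINDS.getOrDefault(actionType, Kind.OTHER);
    }

    public static void main(String[] args) {
        String[] samples = {"URI", "GoToR", "GoTo", "GoToE", "GoToDp", "JavaScript", "Launch"};
        for (String s : samples) {
            System.out.println("/" + s + " -> " + classify(s));
        }
    }
}
```

In the loop over annotations you would feed this the /S entry of the action dictionary, converted to a string. Keeping the table in one place makes it easy to move an action type to another group if your definition of "internal" differs.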
