How to find internal links in a PDF file?

Tags: annotations, link annotation, inspect PDF, iText 5

I am using iTextSharp to find internal links in a PDF file. I have already done this for external links:

//Get the current page
PdfDictionary PageDictionary = R.GetPageN(page);
//Get all of the annotations for the current page
PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
//Make sure we have something
if ((Annots == null) || (Annots.Size == 0)) {
    Console.WriteLine("nothing");
}
//Loop through each annotation
if (Annots != null) {
    foreach (PdfObject A in Annots.ArrayList) {
        //Convert the itext-specific object as a generic PDF object
        PdfDictionary AnnotationDictionary =
            (PdfDictionary)PdfReader.GetPdfObject(A);
        //Make sure this annotation has a link
        if (!AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
            continue;
        //Make sure this annotation has an ACTION
        if (AnnotationDictionary.Get(PdfName.A) == null)
            continue;
        //Get the ACTION for the current annotation
        PdfDictionary AnnotationAction =
            AnnotationDictionary.GetAsDict(PdfName.A);
        // Test if it is a URI action (There are tons of other types of actions,
        // some of which might mimic URI, such as JavaScript,
        // but those need to be handled separately)
        if (AnnotationAction.Get(PdfName.S).Equals(PdfName.URI)) {
            PdfString Destination = AnnotationAction.GetAsString(PdfName.URI);
            string url1 = Destination.ToString();
        }
    }
}
Posted on StackOverflow on Feb 22, 2014 by Ashwani

You've already done most of the work. Please take a look at the following screenshot:

Internal view of the PDF

It shows the /Annots array of a page. Your code already parses that array and skips every annotation whose /Subtype isn't /Link or that doesn't have an /A key, which is excellent.

Currently you're only looking for values of /S that are of type /URI. You say you're already done with external links, but that's not true: you should also look for entries where /S is /GoToR (remote goto). If you want internal links, you need to look for /S values equal to /GoTo, /GoToE, and (in the future) /GoToDp. Maybe you also want to remove the /JavaScript actions, because they can also be used to jump to a specific page.
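The distinction between these action types can be sketched as a small helper that looks at the /S entry of the action dictionary. This is only a sketch under a few assumptions: the names `LinkKind` and `LinkActionClassifier` are made up for illustration, and since iText 5 is not expected to ship a constant for /GoToDp, that name is created by hand.

```csharp
using iTextSharp.text.pdf;

// Hypothetical categories, named for this example only.
public enum LinkKind
{
    ExternalUri,      // /S /URI
    RemoteGoTo,       // /S /GoToR (jumps to another PDF file)
    InternalGoTo,     // /S /GoTo
    EmbeddedGoTo,     // /S /GoToE
    DocumentPartGoTo, // /S /GoToDp
    JavaScript,       // /S /JavaScript (may or may not jump to a page)
    Other
}

public static class LinkActionClassifier
{
    // Assumption: no predefined PdfName constant exists for /GoToDp,
    // so the name is built manually.
    private static readonly PdfName GoToDp = new PdfName("GoToDp");

    public static LinkKind Classify(PdfDictionary action)
    {
        if (action == null) return LinkKind.Other;

        // /S holds the action type; GetAsName returns null if it is missing
        // or is not a name object.
        PdfName type = action.GetAsName(PdfName.S);
        if (type == null) return LinkKind.Other;

        if (PdfName.URI.Equals(type)) return LinkKind.ExternalUri;
        if (PdfName.GOTOR.Equals(type)) return LinkKind.RemoteGoTo;
        if (PdfName.GOTO.Equals(type)) return LinkKind.InternalGoTo;
        if (PdfName.GOTOE.Equals(type)) return LinkKind.EmbeddedGoTo;
        if (GoToDp.Equals(type)) return LinkKind.DocumentPartGoTo;
        if (PdfName.JAVASCRIPT.Equals(type)) return LinkKind.JavaScript;
        return LinkKind.Other;
    }
}
```

Inside your loop, you could then replace the /URI test with `LinkKind kind = LinkActionClassifier.Classify(AnnotationAction);` and branch on the result. For a /GoTo action, the target is stored under the /D key (`AnnotationAction.Get(PdfName.D)`); it can be either an explicit destination array or a named destination, so it probably needs some further resolving before you get a page number.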