How to get values from comment annotations?

Tags: annotations, inspect PDF, iText 5

I have a PDF document that contains comments, shown as rectangles and text boxes:

Screen shot

I want to get the values from the text boxes with C# and iTextSharp.

Posted on StackOverflow on Apr 5, 2013 by Alex

The text boxes and rectangles you're referring to are called Annotations. Annotations are defined as dictionaries and they are listed per page.

In other words: you need to create a PdfReader instance and get the /Annots array (PdfName.ANNOTS) from each page dictionary:

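using iTextSharp.text.pdf;

PdfReader reader = new PdfReader("your.pdf");
// Page numbers start at 1.
for (int i = 1; i <= reader.NumberOfPages; i++) {
    // The /Annots entry is optional: pages without annotations return null.
    PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
    if (array == null) continue;
    for (int j = 0; j < array.Size; j++) {
        PdfDictionary annot = array.GetAsDict(j);
        if (annot == null) continue;
        // The /Contents entry is optional too, so text may be null.
        PdfString text = annot.GetAsString(PdfName.CONTENTS);
        ...
    }
}
reader.Close();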

In the above code sample, I have a PdfDictionary named annot, from which I can extract the Contents. You may be interested in some other entries too (for instance the name of the annotation, if any). Please inspect all the keys that are available in the annot object in case the Contents entry isn't what you're looking for.
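To see which keys an annotation actually carries, you could dump every entry of each annotation dictionary. The following is only a sketch: the class name AnnotationKeyDump and the method Dump are hypothetical helpers introduced for illustration, and it assumes iTextSharp 5 is referenced by the project.

```csharp
using System;
using iTextSharp.text.pdf;

static class AnnotationKeyDump
{
    // Hypothetical helper: prints every key/value pair of every annotation in the file.
    public static void Dump(string path)
    {
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                PdfArray annots = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
                if (annots == null) continue; // page has no annotations

                for (int j = 0; j < annots.Size; j++)
                {
                    PdfDictionary annot = annots.GetAsDict(j);
                    if (annot == null) continue; // entry is not a dictionary

                    Console.WriteLine("Page {0}, annotation {1}", i, j);
                    foreach (PdfName key in annot.Keys)
                    {
                        // GetDirectObject resolves indirect references before printing.
                        Console.WriteLine("  {0} = {1}", key, annot.GetDirectObject(key));
                    }
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Calling AnnotationKeyDump.Dump("your.pdf") should list entries such as /Subtype, /Rect and /Contents where they are present, which helps decide which entry holds the value you need.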

Replace the dots with whatever you want to do with the text. PdfString has several methods that reveal its contents, for instance ToUnicodeString().
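As a rough illustration of turning the PdfString into a plain C# string, here is a minimal sketch. The class AnnotationText and its two methods are hypothetical names, and the IsFreeText check assumes that the text boxes in your file are stored as FreeText annotations, which you should verify by inspecting the /Subtype entry.

```csharp
using iTextSharp.text.pdf;

static class AnnotationText
{
    // Hypothetical helper: returns the decoded /Contents value,
    // or null when the annotation has no /Contents entry.
    public static string ContentsOf(PdfDictionary annot)
    {
        PdfString text = annot.GetAsString(PdfName.CONTENTS);
        return text == null ? null : text.ToUnicodeString();
    }

    // Hypothetical helper: true when /Subtype is /FreeText.
    // Assumption: the text boxes in question are FreeText annotations.
    public static bool IsFreeText(PdfDictionary annot)
    {
        return PdfName.FREETEXT.Equals(annot.GetAsName(PdfName.SUBTYPE));
    }
}
```

Inside the inner loop you could then write something like: if (AnnotationText.IsFreeText(annot)) Console.WriteLine(AnnotationText.ContentsOf(annot));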