Does a PDF file have styles, headers and footers?

Tags: tagged pdf, header, footer, styles, iText 7

Does a PDF file contain information about styles, headers and footers, the way a docx file stores such information in separate XML files?

Posted on StackOverflow on Jan 21, 2014 by Prakhar

Regular PDFs don't have styles; they have different fonts (for instance, Helvetica is one font and Helvetica-Bold is another font of the same family). They don't have headers and footers, just as they don't have paragraphs, section titles, table rows or table cells. Everything you see on a PDF page is just a bunch of glyphs, paths and shapes drawn on a canvas.
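To make this concrete, here is a minimal sketch in Python (no PDF library involved) of roughly what the page description of an untagged page looks like. The content stream is hand-written for illustration, not taken from a real file, and `text_runs()` is a hypothetical, deliberately naive reader that only understands the `Tf`, `Td` and `Tj` operators used in it.

```python
import re

# Hand-written, hypothetical content stream for an untagged page with a
# "header" line and a body line. Nothing says which is which: there are only
# font selections (Tf), positions (Td) and strings shown (Tj). /F1 and /F2 are
# names in the page's font resources; they could map to, say, Helvetica and
# Helvetica-Bold.
CONTENT = b"""
BT
/F2 10 Tf
72 800 Td
(Annual Report 2014) Tj
ET
BT
/F1 12 Tf
72 700 Td
(The quick brown fox.) Tj
ET
"""


def text_runs(stream):
    """Yield (font, size, x, y, text) for each BT ... ET text object.

    Naive on purpose: one Tf/Td/Tj per text object, literal strings without
    escapes. Real content streams need a proper tokenizer.
    """
    for block in re.findall(rb"BT(.*?)ET", stream, re.S):
        font = re.search(rb"/(\S+)\s+([\d.]+)\s+Tf", block)
        pos = re.search(rb"([\d.]+)\s+([\d.]+)\s+Td", block)
        text = re.search(rb"\((.*?)\)\s*Tj", block)
        yield (font.group(1).decode(), float(font.group(2)),
               float(pos.group(1)), float(pos.group(2)),
               text.group(1).decode())


if __name__ == "__main__":
    for run in text_runs(CONTENT):
        print(run)
```

The two runs differ only in font, size and position (PDF coordinates start at the bottom left, so y = 800 is near the top of the page). A tool that calls the first line a "header" is guessing from that position; the file itself doesn't say so.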

However: if your PDF is a Tagged PDF, its catalog contains an entry known as the StructTreeRoot. This means that, apart from the presentation of the content, you also have a tree structure that stores the semantics of the content. This structure contains references to the content on the different pages, allowing you (for instance) to find out which lines belong together in a paragraph, which content is organized as a table, and so on. Content that doesn't belong to the real content, such as a repeating page header or a footer with a page number, is marked as an "artifact" so that it can be told apart from the tagged content.
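Below is a simplified sketch of how the same two lines could look in a Tagged PDF, again hand-written rather than taken from a real file. The content stream wraps content in marked-content sequences (`BDC` opens one, `EMC` closes it): the header is marked as a pagination artifact, and the body text gets an MCID (marked-content identifier) that the structure tree points to. The structure tree is modeled as plain Python dicts instead of real PDF objects, and `marked_content()` and `walk()` are hypothetical helpers that use naive regular expressions and ignore nesting, string escapes and the fact that MCIDs are only unique per page.

```python
import re

# Hypothetical tagged content stream: the header is a pagination artifact,
# the body text is marked content with MCID 0.
TAGGED_CONTENT = b"""
/Artifact <</Type /Pagination /Subtype /Header>> BDC
BT /F2 10 Tf 72 800 Td (Annual Report 2014) Tj ET
EMC
/P <</MCID 0>> BDC
BT /F1 12 Tf 72 700 Td (The quick brown fox.) Tj ET
EMC
"""

# Much-simplified model of the structure tree. In a real file these are PDF
# dictionaries reachable from the catalog's /StructTreeRoot entry: /S is the
# structure type, /K holds the kids, and an integer kid is an MCID on the
# page referenced by /Pg.
STRUCT_TREE_ROOT = {
    "Type": "StructTreeRoot",
    "K": [{"S": "Document", "K": [{"S": "P", "Pg": "page1", "K": [0]}]}],
}


def marked_content(stream):
    """Return ({mcid: text}, [artifact texts]) for flat BDC ... EMC sequences."""
    by_mcid, artifacts = {}, []
    for tag, props, body in re.findall(
            rb"/(\w+)\s*<<(.*?)>>\s*BDC(.*?)EMC", stream, re.S):
        text = b"".join(re.findall(rb"\((.*?)\)\s*Tj", body)).decode()
        mcid = re.search(rb"/MCID\s+(\d+)", props)
        if tag == b"Artifact":
            artifacts.append(text)
        elif mcid:
            by_mcid[int(mcid.group(1))] = text
    return by_mcid, artifacts


def walk(elem, by_mcid, out=None):
    """Depth-first walk collecting (structure type, text) for MCID kids."""
    out = [] if out is None else out
    for kid in elem.get("K", []):
        if isinstance(kid, int):
            out.append((elem["S"], by_mcid[kid]))
        else:
            walk(kid, by_mcid, out)
    return out


if __name__ == "__main__":
    by_mcid, artifacts = marked_content(TAGGED_CONTENT)
    print(walk(STRUCT_TREE_ROOT, by_mcid))
    print(artifacts)
```

With this model, `walk(STRUCT_TREE_ROOT, by_mcid)` returns `[("P", "The quick brown fox.")]`, while the header text only shows up in the artifact list. A consumer such as a screen reader can follow the tree for the real content and skip the artifacts.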

Tagged PDF is a requirement for PDF/A Level A and PDF/UA documents. A majority of the PDF files you can find in the wild aren't tagged (or aren't tagged properly).
