Chapter 1: PDF Objects

There are eight basic types of objects in PDF. They're explained in sections 7.3.2 to 7.3.9 of ISO-32000-1.

The basic PDF objects

These eight object types are listed in table 1.1.

Table 1.1: Overview of the basic PDF objects
PDF Object

Description

Boolean object

This type is similar to the Boolean type in programming languages and can be true or false.

Numeric object

There are two types of numeric objects: integer and real. Numbers can be used to define coordinates, font sizes, and so on.

String object

String objects can be written in two ways: as a sequence of literal characters enclosed in parentheses ( ) or as hexadecimal data enclosed in angle brackets < >. Beginning with PDF 1.7, the type is further qualified as text string, PDFDocEncoded string, ASCII string, and byte string, depending upon how the string is used in each particular context.

Name object

A name object is an atomic symbol uniquely defined by a sequence of characters. Names can be used as keys for a dictionary, to define an explicit destination type, and so on. You can easily recognize names in a PDF file because they're all introduced with a forward slash: /.

Array object

An array is a one-dimensional collection of objects, arranged sequentially between square brackets. For instance, a rectangle is defined as an array of four numbers: [0 0 595 842].

Dictionary object

A dictionary is an associative table containing pairs of objects known as dictionary entries. The key is always a name; the value can be (a reference to) any other object. The collection of pairs is enclosed by double angle brackets: << and >>.

Stream object

Like a string object, a stream is a sequence of bytes. The main difference is that a PDF consumer reads a string entirely, whereas a stream is best read incrementally. Strings are used for small pieces of data; streams are used for large amounts of data. Each stream consists of a dictionary followed by zero or more bytes enclosed between the keywords stream (followed by a newline) and endstream.

Null object

This type is similar to the null object in programming languages. Setting the value of a dictionary entry to null is equivalent to omitting the entry.

These eight objects are implemented in iText as shown in figure 1.1.

Figure 1.1: iText implementation of the PDF objects

If you look inside iText, you'll also find a class named PdfObjectWrapper. This object is used for specific objects such as:

  • PdfDate, which is a PdfString object wrapper: a date is a special type of string in the Portable Document Format.

  • PdfBorderArray, which is a PdfArray object wrapper: a border is defined by a series of numbers and (optionally) a dash pattern.

  • PdfAnnotation, which is a wrapper for a PdfDictionary: an annotation is defined using a series of key-value pairs.

  • PdfXObject, which is a PdfStream wrapper: an XObject is a separate content stream, for instance containing image data.

When creating or manipulating PDF documents with iText, you'll use high-level objects and convenience methods most of the time. This means you probably won't be confronted with these basic objects very often, but it's interesting to take a look under the hood of iText.

iText's PdfObject implementations

Let's take a look at some simple code samples for each of the basic types.


PdfBoolean

As there are only two possible values for the PdfBoolean object, you can use a static instance instead of creating a new object.

Code sample 1.1: C0101_BooleanObject
    import com.itextpdf.text.pdf.PdfBoolean;

    public class C0101_BooleanObject {
        public static void main(String[] args) {
            // Reuse the two static instances instead of creating new objects
            showObject(PdfBoolean.PDFTRUE);
            showObject(PdfBoolean.PDFFALSE);
        }

        // Prints the class name, the type and the value of a PDF boolean
        public static void showObject(PdfBoolean obj) {
            System.out.println(obj.getClass().getName() + ":");
            System.out.println("-> boolean? " + obj.isBoolean());
            System.out.println("-> type: " + obj.type());
            System.out.println("-> toString: " + obj.toString());
            System.out.println("-> booleanvalue: " + obj.booleanValue());
        }
    }

In code sample 1.1, we use PdfBoolean's constant values PDFTRUE and PDFFALSE and we inspect these objects in the showObject() method. We get the fully qualified name of the class. We use the isBoolean() method that will return false for all objects that aren't derived from PdfBoolean. And we display the type() in the form of an int (this value is 1 for PdfBoolean).

All PdfObject implementations have a toString() method, but only the PdfBoolean class has a booleanValue() method that allows you to get the value as a primitive Java boolean value.

The output of the showObject method looks like this:

-> boolean? true
-> type: 1
-> toString: true
-> booleanvalue: true
-> boolean? true
-> type: 1
-> toString: false
-> booleanvalue: false

We'll use the PdfBoolean object in the tutorial Update your PDFs with iText, when we update properties of dictionaries to change the behavior of a PDF feature.


PdfNumber

There are many different ways to create a PdfNumber object. Although PDF only has two types of numbers (integer and real), you can create a PdfNumber object using a String, int, long, double or float.

This is shown in code sample 1.2.

Code sample 1.2: C0102_NumberObject
    import com.itextpdf.text.pdf.PdfNumber;

    public class C0102_NumberObject {
        public static void main(String[] args) {
            // One PdfNumber per supported constructor argument type
            showObject(new PdfNumber("1.5")); // String
            showObject(new PdfNumber(100));   // int
            showObject(new PdfNumber(100L));  // long
            showObject(new PdfNumber(1.5));   // double
            showObject(new PdfNumber(1.5f));  // float
        }

        // Prints the type, the serialized bytes and the value as each primitive type
        public static void showObject(PdfNumber obj) {
            System.out.println(obj.getClass().getName() + ":");
            System.out.println("-> number? " + obj.isNumber());
            System.out.println("-> type: " + obj.type());
            System.out.println("-> bytes: " + new String(obj.getBytes()));
            System.out.println("-> toString: " + obj.toString());
            System.out.println("-> intValue: " + obj.intValue());
            System.out.println("-> longValue: " + obj.longValue());
            System.out.println("-> doubleValue: " + obj.doubleValue());
            System.out.println("-> floatValue: " + obj.floatValue());
        }
    }

Again, we display the fully qualified class name. We check for number objects using the isNumber() method. And we get a different value when we ask for the type (more specifically: 2).

The getBytes() method returns the bytes that will be stored in the PDF. In the case of numbers, you'll get a similar result using the toString() method. Although iText works with float objects internally, you can get the value of a PdfNumber object as a primitive Java int, long, double or float.

-> number? true
-> type: 2
-> bytes: 1.5
-> toString: 1.5
-> intValue: 1
-> longValue: 1
-> doubleValue: 1.5
-> floatValue: 1.5
-> number? true
-> type: 2
-> bytes: 100
-> toString: 100
-> intValue: 100
-> longValue: 100
-> doubleValue: 100.0
-> floatValue: 100.0

Observe that you lose the decimal part if you invoke the intValue() or longValue() method on a real number. Just like with PdfBoolean, you'll use PdfNumber only if you hack a PDF at the lowest level, changing a property in the syntax of an existing PDF.
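As a rough illustration of the two numeric types, here's a plain Java sketch that doesn't use iText at all. The class name PdfNumberSketch and its format() method are made up for this example. It writes integers without a decimal point and reals in plain decimal notation (PDF syntax has no exponent notation), and it shows the truncation mentioned above.

```java
import java.math.BigDecimal;

/** Hypothetical helper, not part of iText: writes numbers in PDF syntax. */
final class PdfNumberSketch {

    /** Returns the value as a PDF integer or real, never with an exponent. */
    static String format(double value) {
        // stripTrailingZeros() turns 100.0 into 1E+2; toPlainString() expands it again
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(format(100));     // written as an integer
        System.out.println(format(1.5));     // written as a real
        System.out.println(format(0.00001)); // plain notation, no exponent
        System.out.println((int) 1.5);       // the cast drops the decimal part
    }
}
```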


PdfString

The PdfString class has four constructors:

  • An empty constructor in case you want to create an empty PdfString object (in practice this constructor is only used in subclasses of PdfString),

  • A constructor that takes a Java String object as its parameter,

  • A constructor that takes a Java String object as well as the encoding value (TEXT_PDFDOCENCODING or TEXT_UNICODE) as its parameters,

  • A constructor that takes an array of bytes as its parameter, in which case the encoding will be PdfString.NOTHING. This constructor is used by iText when reading existing documents into PDF objects.

You can choose to store the PDF string object in hexadecimal format by using the setHexWriting() method:

Code sample 1.3: C0103_StringObject
    import com.itextpdf.text.pdf.PdfDate;
    import com.itextpdf.text.pdf.PdfString;

    public class C0103_StringObject {
        public static void main(String[] args) {
            PdfString s1 = new PdfString("Test");
            PdfString s2 = new PdfString("\u6d4b\u8bd5", PdfString.TEXT_UNICODE);
            showObject(s1);
            showObject(s2);
            // Ask iText to write s1 in hexadecimal format
            s1.setHexWriting(true);
            showObject(s1);
            // PdfDate is a special type of PdfString
            showObject(new PdfDate());
        }

        // Prints the type, the bytes, the encoding and the String value
        public static void showObject(PdfString obj) {
            System.out.println(obj.getClass().getName() + ":");
            System.out.println("-> string? " + obj.isString());
            System.out.println("-> type: " + obj.type());
            System.out.println("-> bytes: " + new String(obj.getBytes()));
            System.out.println("-> toString: " + obj.toString());
            System.out.println("-> hexWriting: " + obj.isHexWriting());
            System.out.println("-> encoding: " + obj.getEncoding());
            System.out.println("-> original bytes: " + new String(obj.getOriginalBytes()));
            System.out.println("-> unicode string: " + obj.toUnicodeString());
        }
    }

In the output of code sample 1.3, we see the fully qualified name of the class. The isString() method returns true. The type value is 3. In this case, the getBytes() method can return a different value than the toString() method. The String "\u6d4b\u8bd5" represents two Chinese characters meaning "test", but these characters are stored as four bytes, preceded by a two-byte byte order mark.

Hexadecimal writing is applied at the moment the bytes are written to a PDF OutputStream. The encoding values are stored as String values, either "PDF" for PdfDocEncoding, "UnicodeBig" for Unicode, or "" in case of a pure byte string.

The getOriginalBytes() method only makes sense when you get a PdfString value from an existing file that was encrypted. It returns the original encrypted value of the string object.

The toUnicodeString() method is a safer method than toString() to get the PDF string object as a Java String.

-> string? true
-> type: 3
-> bytes: Test
-> toString: Test
-> hexWriting: false
-> encoding: PDF
-> original bytes: Test
-> unicode string: Test
-> string? true
-> type: 3
-> bytes: ��mK��
-> toString: 测试
-> hexWriting: false
-> encoding: UnicodeBig
-> original bytes: ��mK��
-> unicode string: 测试
-> string? true
-> type: 3
-> bytes: Test
-> toString: Test
-> hexWriting: true
-> encoding: PDF
-> original bytes: Test
-> unicode string: Test
-> string? true
-> type: 3
-> bytes: D:20130430161855+02'00'
-> toString: D:20130430161855+02'00'
-> hexWriting: false
-> encoding: PDF
-> original bytes: D:20130430161855+02'00'
-> unicode string: D:20130430161855+02'00'
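To make the two notations for strings more tangible, here's a small sketch in plain Java, without iText. The class PdfStringSketch and its methods are hypothetical names chosen for this illustration; it isn't how iText writes strings internally. It writes the same bytes once as a literal string and once as a hexadecimal string, and it shows the byte order mark that precedes the four bytes of the two Chinese characters.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of iText: two ways to write the same string bytes. */
final class PdfStringSketch {

    /** Literal form: bytes between parentheses; backslash and parentheses are escaped. */
    static String toLiteral(byte[] bytes) {
        StringBuilder sb = new StringBuilder("(");
        for (byte b : bytes) {
            char c = (char) (b & 0xFF);
            if (c == '(' || c == ')' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    /** Hexadecimal form: two hex digits per byte between angle brackets. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("<");
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('>').toString();
    }

    /** Big-endian UTF-16 bytes preceded by the byte order mark FE FF. */
    static byte[] unicodeBytes(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] result = new byte[utf16.length + 2];
        result[0] = (byte) 0xFE;
        result[1] = (byte) 0xFF;
        System.arraycopy(utf16, 0, result, 2, utf16.length);
        return result;
    }

    public static void main(String[] args) {
        byte[] ascii = "Test".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toLiteral(ascii));
        System.out.println(toHex(ascii));
        System.out.println(toHex(unicodeBytes("\u6d4b\u8bd5")));
    }
}
```

The hexadecimal form is handy for byte sequences that aren't readable as text, which is why the two Chinese characters are printed that way in this sketch.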

In this example, we also create a PdfDate instance. If you don't pass a parameter, you get the current date and time. You can also pass a Java Calendar object if you want to create an object for a specific date. The format of the date conforms to the international Abstract Syntax Notation One (ASN.1) standard defined in ISO/IEC 8824. You recognize the pattern D:YYYYMMDDHHmmSSOHH'mm' where YYYY is the year, MM the month, DD the day, HH the hour, mm the minutes, SS the seconds, O the relationship of local time to Universal Time (UT), written as +, - or Z, HH' the offset from UT in hours, and mm' the offset from UT in minutes.
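If you're curious how such a date string can be put together, the following sketch builds one with the java.time classes. It's a minimal illustration, not iText's PdfDate implementation; the class name PdfDateSketch is an assumption made for this example, and so is the choice to write a bare Z when the offset is zero.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

/** Hypothetical helper, not part of iText: formats a date as a PDF date string. */
final class PdfDateSketch {

    static String format(OffsetDateTime dateTime) {
        String base = String.format("D:%04d%02d%02d%02d%02d%02d",
                dateTime.getYear(), dateTime.getMonthValue(), dateTime.getDayOfMonth(),
                dateTime.getHour(), dateTime.getMinute(), dateTime.getSecond());
        int offsetSeconds = dateTime.getOffset().getTotalSeconds();
        if (offsetSeconds == 0) {
            return base + "Z"; // assumption of this sketch: local time equal to UT is written as Z
        }
        char sign = offsetSeconds < 0 ? '-' : '+';
        int absMinutes = Math.abs(offsetSeconds) / 60;
        // Hours and minutes of the offset are each followed by an apostrophe
        return base + String.format("%c%02d'%02d'", sign, absMinutes / 60, absMinutes % 60);
    }

    public static void main(String[] args) {
        // The moment shown in the sample output: April 30, 2013, 16:18:55, two hours ahead of UT
        OffsetDateTime sample = OffsetDateTime.of(2013, 4, 30, 16, 18, 55, 0, ZoneOffset.ofHours(2));
        System.out.println(format(sample));
    }
}
```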


PdfName

There are different ways to create a PdfName object, but you should only use one. The constructor that takes a single String as a parameter guarantees that your name object conforms to ISO-32000-1 and -2.

You probably wonder why we would add constructors that allow people to create names that don't conform to the PDF specification. With iText, we made a great effort to ensure the creation of documents that comply. Unfortunately, this can't be said about all PDF creation software. We need some PdfName constructors that accept any kind of value when reading names in documents that are in violation of the PDF ISO standards.

In many cases, you don't need to create a PdfName object yourself. The PdfName class contains a large set of constants with predefined names. One of these names is used in code sample 1.4.

Code sample 1.4: C0104_NameObject
    import com.itextpdf.text.pdf.PdfName;

    public class C0104_NameObject {
        public static void main(String[] args) {
            showObject(PdfName.CONTENTS);              // predefined constant
            showObject(new PdfName("CustomName"));     // custom name
            showObject(new PdfName("Test #1 100%"));   // needs escaping
        }

        // Prints the type and the serialized form of a PDF name
        public static void showObject(PdfName obj) {
            System.out.println(obj.getClass().getName() + ":");
            System.out.println("-> name? " + obj.isName());
            System.out.println("-> type: " + obj.type());
            System.out.println("-> bytes: " + new String(obj.getBytes()));
            System.out.println("-> toString: " + obj.toString());
        }
    }

The getClass().getName() part no longer has secrets for you. We use isName() to check if the object is really a name. The type is 4. And we can get the value as bytes or as a String.

-> name? true
-> type: 4
-> bytes: /Contents
-> toString: /Contents
-> name? true
-> type: 4
-> bytes: /CustomName
-> toString: /CustomName
-> name? true
-> type: 4
-> bytes: /Test#20#231#20100#25
-> toString: /Test#20#231#20100#25

Note that names start with a forward slash, also known as a solidus. Also take a closer look at the name that was created with the String value "Test #1 100%". iText has escaped values such as ' ', '#' and '%' because these are forbidden in a PDF name object. ISO-32000-1 and -2 state that a name is a sequence of 8-bit values and iText interprets this literally. If you pass a string containing multibyte characters (characters with a value greater than 255), iText will only take the lower 8 bits into account. Finally, iText will throw an IllegalArgumentException if you try to create a name that is longer than 127 bytes.
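The escaping rule can be sketched in a few lines of plain Java. This isn't iText's actual code: PdfNameSketch and escape() are names invented for this illustration, and the sketch assumes that every character outside the printable ASCII range, every delimiter and the number sign are written as # followed by two hexadecimal digits. The length check is left out.

```java
/** Hypothetical helper, not part of iText: escapes a String into a PDF name. */
final class PdfNameSketch {

    // Delimiter characters of the PDF syntax; they can't appear unescaped in a name
    private static final String DELIMITERS = "()<>[]{}/%";

    static String escape(String name) {
        StringBuilder sb = new StringBuilder("/");
        for (char ch : name.toCharArray()) {
            int c = ch & 0xFF; // only the lower 8 bits are taken into account
            boolean regular = c > 32 && c < 127 && c != '#' && DELIMITERS.indexOf(c) < 0;
            if (regular) {
                sb.append((char) c);
            } else {
                sb.append('#').append(String.format("%02x", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("CustomName"));
        System.out.println(escape("Test #1 100%"));
    }
}
```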


PdfArray

The PdfArray class has six constructors. You can create a PdfArray using an ArrayList of PdfObject instances, or you can create an empty array and add the PdfObject instances one by one (see code sample 1.5). You can also pass an array of float or int values as a parameter, in which case you create an array consisting of PdfNumber objects. Finally, you can create an array with a single object if you pass a PdfObject, but be careful: if this object is of type PdfArray, you're using the copy constructor.

Code sample 1.5: C0105_ArrayObject
    import com.itextpdf.text.pdf.PdfArray;
    import com.itextpdf.text.pdf.PdfBoolean;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfNumber;
    import com.itextpdf.text.pdf.PdfRectangle;
    import com.itextpdf.text.pdf.PdfString;

    public class C0105_ArrayObject {
        public static void main(String[] args) {
            // Start with an empty array and add objects of different types
            PdfArray array = new PdfArray();
            array.add(PdfName.FIRST);
            array.add(new PdfString("Second"));
            array.add(new PdfNumber(3));
            array.add(PdfBoolean.PDFFALSE);
            showObject(array);
            // PdfRectangle is a special type of PdfArray
            showObject(new PdfRectangle(595, 842));
        }

        // Prints the type, the size and the separate elements of a PDF array
        public static void showObject(PdfArray obj) {
            System.out.println(obj.getClass().getName() + ":");
            System.out.println("-> array? " + obj.isArray());
            System.out.println("-> type: " + obj.type());
            System.out.println("-> toString: " + obj.toString());
            System.out.println("-> size: " + obj.size());
            System.out.print("-> Values:");
            for (int i = 0; i < obj.size(); i++) {
                System.out.print(" ");
                System.out.print(obj.getPdfObject(i));
            }
            System.out.println();
        }
    }

Once more, we see the fully qualified name in the output. The isArray() method tests whether the object is a PdfArray. The value of the array type is 5.

The elements of the array are stored in an ArrayList. The toString() method of the PdfArray class returns the toString() output of this ArrayList: the values of the separate objects delimited with a comma and enclosed by square brackets. The getBytes() method returns null.

You can ask a PdfArray for its size, and use this size to get the different elements of the array one by one. In this case, we use the getPdfObject() method. We'll discover some more methods to retrieve elements from an array in section 1.3.

-> array? true
-> type: 5
-> toString: [/First, Second, 3, false]
-> size: 4
-> Values: /First Second 3 false
-> array? true
-> type: 5
-> toString: [0, 0, 595, 842]
-> size: 4
-> Values: 0 0 595 842

In our example, we created a PdfRectangle using only two values: 595 and 842. However, a rectangle needs four values: two for the coordinate of the lower-left corner, two for the coordinate of the upper-right corner. As you can see, iText added two zeros for the coordinate of the lower-left corner.
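The rectangle case is simple enough to mimic in plain Java. The sketch below is only an illustration with made-up names (PdfRectangleSketch, rectangle()); it isn't taken from iText. It fills in the origin for the lower-left corner and writes the array in PDF syntax, where elements are separated by white space rather than by the commas you see in the toString() output.

```java
/** Hypothetical helper, not part of iText: a rectangle as an array of four numbers. */
final class PdfRectangleSketch {

    /** Only the upper-right corner is given; the lower-left corner defaults to the origin. */
    static String rectangle(int urx, int ury) {
        return rectangle(0, 0, urx, ury);
    }

    /** Writes the array in PDF syntax: square brackets, elements separated by spaces. */
    static String rectangle(int llx, int lly, int urx, int ury) {
        return "[" + llx + " " + lly + " " + urx + " " + ury + "]";
    }

    public static void main(String[] args) {
        System.out.println(rectangle(595, 842));
    }
}
```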


PdfDictionary

There are only two constructors for the PdfDictionary class. With the empty constructor, you can create an empty dictionary, and then add entries using the put() method. The constructor that accepts a PdfName object will create a dictionary with a /Type entry and use the name passed as a parameter as its value. This entry identifies the type of object the dictionary describes. In some cases, a /Subtype entry is used to further identify a specialized subcategory of the general type.

In code sample 1.6, we create a custom dictionary and an action.

Code sample 1.6: C0106_DictionaryObject
    import com.itextpdf.text.pdf.PdfAction;
    import com.itextpdf.text.pdf.PdfBoolean;
    import com.itextpdf.text.pdf.PdfDictionary;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfNumber;
    import com.itextpdf.text.pdf.PdfString;

    public class C0106_DictionaryObject {
        public static void main(String[] args) {
            // A custom dictionary with a /Type entry and four extra entries
            PdfDictionary dict = new PdfDictionary(new PdfName("Custom"));
            dict.put(new PdfName("Entry1"), PdfName.FIRST);
            dict.put(new PdfName("Entry2"), new PdfString("Second"));
            dict.put(new PdfName("3rd"), new PdfNumber(3));
            dict.put(new PdfName("Fourth"), PdfBoolean.PDFFALSE);
            showObject(dict);
            // PdfAction is a special type of PdfDictionary
            showObject(PdfAction.gotoRemotePage("test.pdf", "dest", false, true));
        }

        // Prints the type, the size and the entries of a PDF dictionary
        public static void showObject(PdfDictionary obj) {
            System.out.println(obj.getClass().getName() + ":");
            System.out.println("-> dictionary? " + obj.isDictionary());
            System.out.println("-> type: " + obj.type());
            System.out.println("-> toString: " + obj.toString());
            System.out.println("-> size: " + obj.size());
            for (PdfName key : obj.getKeys()) {
                System.out.print(" " + key + ": ");
                System.out.println(obj.get(key));
            }
        }
    }

The showObject() method shows us the fully qualified names. The isDictionary() method returns true and the type() method returns 6.

Just like with PdfArray, the getBytes() method returns null. iText stores the objects in a HashMap. The toString() method of a PdfDictionary doesn't reveal anything about the contents of the dictionary, except for its type if present. The type entry is usually optional. For instance: the PdfAction dictionary we created in code sample 1.6 doesn't have a /Type entry.

We can ask a dictionary for its number of entries using the size() method and get each value as a PdfObject by its key. As the entries are stored in a HashMap, the keys aren't shown in the same order we used to add them to the dictionary. That's not a problem. The order of entries in a dictionary is irrelevant.

-> dictionary? true
-> type: 6
-> toString: Dictionary of type: /Custom
-> size: 4
 /3rd: 3
 /Entry1: /First
 /Type: /Custom
 /Fourth: false
 /Entry2: Second
-> dictionary? true
-> type: 6
-> toString: Dictionary
-> size: 4
 /D: dest
 /F: test.pdf
 /S: /GoToR
 /NewWindow: true

As explained in table 1.1, a PDF dictionary is stored as a series of key-value pairs enclosed by << and >>. The action created in code sample 1.6 looks like this when viewed in a plain text editor:

<</D(dest)/F(test.pdf)/S/GoToR/NewWindow true>>

The basic PdfDictionary object has plenty of subclasses such as PdfAction, PdfAnnotation, PdfCollection, PdfGState, PdfLayer, PdfOutline, etc. All these subclasses serve a specific purpose and they were created to make it easier for developers to create objects without having to worry too much about the underlying structures.
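To see where the compact notation of the action dictionary shown above comes from, here's a sketch in plain Java that writes a map of entries between << and >>. The class PdfDictionarySketch is hypothetical and much simpler than what iText does: keys are plain strings, values are assumed to be non-empty and already serialized, and a LinkedHashMap is used so that the entries come out in a predictable order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of iText: writes key-value pairs as a PDF dictionary. */
final class PdfDictionarySketch {

    private static final String DELIMITERS = "()<>[]{}/%";

    /** Keys are names without the leading slash; values are already serialized objects. */
    static String serialize(Map<String, String> entries) {
        StringBuilder sb = new StringBuilder("<<");
        for (Map.Entry<String, String> entry : entries.entrySet()) {
            String value = entry.getValue();
            sb.append('/').append(entry.getKey());
            // A separating space is only needed if the value doesn't start with a delimiter
            if (DELIMITERS.indexOf(value.charAt(0)) < 0) {
                sb.append(' ');
            }
            sb.append(value);
        }
        return sb.append(">>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> action = new LinkedHashMap<>();
        action.put("D", "(dest)");
        action.put("F", "(test.pdf)");
        action.put("S", "/GoToR");
        action.put("NewWindow", "true");
        System.out.println(serialize(action));
    }
}
```

Strings and names start with a delimiter, so they can follow the key immediately; the boolean value needs a space to keep it apart from the key.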


PdfStream

The PdfStream class also extends the PdfDictionary class. A stream object always starts with a dictionary that contains at least a /Length entry, whose value is the number of stream bytes.

For now, we'll only use the constructor that accepts a byte[] as parameter. The other constructor involves a PdfWriter instance, which is an object we haven't discussed yet. That constructor is mainly for internal use (it offers an efficient, memory-friendly way to write byte streams of unknown length to a PDF document), but we'll briefly cover it in the Create your PDFs with iText tutorial.

Code sample 1.7: C0107_StreamObject
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfStream;

    public class C0107_StreamObject {
        public static void main(String[] args) {
            PdfStream stream = new PdfStream(
                "Long stream of data stored in a FlateDecode compressed stream object"
                    .getBytes());
            // Compress the bytes; this adds a /Filter entry to the stream dictionary
            stream.flateCompress();
            showObject(stream);
        }

        // Prints the type, the raw (uncompressed) length and the dictionary entries
        public static void showObject(PdfStream obj) {
            System.out.println(obj.getClass().getName() + ":");
            System.out.println("-> stream? " + obj.isStream());
            System.out.println("-> type: " + obj.type());
            System.out.println("-> toString: " + obj.toString());
            System.out.println("-> raw length: " + obj.getRawLength());
            System.out.println("-> size: " + obj.size());
            for (PdfName key : obj.getKeys()) {
                System.out.print(" " + key + ": ");
                System.out.println(obj.get(key));
            }
        }
    }

In the lines following the fully qualified name, we see that the isStream() method returns true and the type() method returns 7. The toString() method returns nothing more than the word "Stream".

We can store the long String we used in code sample 1.7 "as is" inside the stream. In this case, invoking the getBytes() method will return the bytes you used in the constructor.

If a stream is compressed, for instance by using the flateCompress() method, the getBytes() method will return null. In this case, the bytes are stored inside a ByteArrayOutputStream and you can write these bytes to an OutputStream using the writeContent() method. We didn't do that because it doesn't make much sense for humans to read a compressed stream.

The PdfStream instance remembers the original length aka the raw length. The length of the compressed stream is stored in the dictionary.

-> stream? true
-> type: 7
-> toString: Stream
-> raw length: 68
-> size: 2
 /Filter: /FlateDecode
 /Length: 67

In this case, compression didn't make much sense: 68 bytes were compressed into 67 bytes. In theory, you could choose a different compression level. The PdfStream class has different constants such as NO_COMPRESSION (0), BEST_SPEED (1) and BEST_COMPRESSION (9). In practice, we'll always use DEFAULT_COMPRESSION (-1).
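If you want to experiment with compression levels outside of iText, the JDK's java.util.zip.Deflater offers constants with the same names and values. The sketch below is an illustration under that assumption; FlateSketch is a made-up class and it doesn't reproduce iText's own stream handling. It compresses the sentence from code sample 1.7 at different levels and prints the resulting lengths, which you can compare with the raw length.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper, not part of iText: Flate compression with the JDK. */
final class FlateSketch {

    /** Compresses the bytes in zlib format at the given compression level. */
    static byte[] deflate(byte[] raw, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original bytes; invalid input is reported as an unchecked exception. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported data");
                }
                out.write(buffer, 0, count);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "Long stream of data stored in a FlateDecode compressed stream object"
                .getBytes(StandardCharsets.US_ASCII);
        int[] levels = { Deflater.NO_COMPRESSION, Deflater.BEST_SPEED,
                Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION };
        for (int level : levels) {
            System.out.println("level " + level + ": " + raw.length
                    + " -> " + deflate(raw, level).length + " bytes");
        }
    }
}
```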


PdfNull

We use the PdfNull class internally in some very specific cases, but there's very little chance you'll ever need to use this class in your own code. For instance: it's better to remove an entry from a dictionary than to set its value to null; it saves the PDF consumer processing time when parsing the files you've created.

Code sample 1.8: C0108_NullObject
    import com.itextpdf.text.pdf.PdfNull;

    public class C0108_NullObject {
        public static void main(String[] args) {
            showObject(PdfNull.PDFNULL); // the only instance you need
        }

        // Prints the type and the serialized form of the PDF null object
        public static void showObject(PdfNull obj) {
            System.out.println(obj.getClass().getName() + ":");
            System.out.println("-> type: " + obj.type());
            System.out.println("-> bytes: " + new String(obj.getBytes()));
            System.out.println("-> toString: " + obj.toString());
        }
    }

The output of code sample 1.8 is pretty straightforward: the fully qualified name of the class, its type (8) and the output of the getBytes() and toString() methods.

-> type: 8
-> bytes: null
-> toString: null

These were the eight basic types, numbered from 1 to 8. Two more numbers are reserved for specific PdfObject classes: 0 and 10. Let's start with the class that returns 0 when you call the type() method.


PdfLiteral

The objects we've discussed so far were literally the first objects that were written when I started writing iText. Since 2000, they've been used to build billions of PDF documents. They form the foundation of iText's object-oriented approach to creating PDF documents.

Working in an object-oriented way is best practice and it's great, but for some straightforward objects, you wish you had a shortcut. That's why we created PdfLiteral. It's an iText object you won't find in the PDF specification (ISO-32000-1 or -2). It allows you to create any type of object with a minimum of overhead.

For instance: we often need an array that defines a specific matrix, called the identity matrix. It consists of six elements: 1, 0, 0, 1, 0 and 0. Should we really create a PdfArray object and add these objects one by one? Wouldn't it be easier if we just created the literal array: [1 0 0 1 0 0]?

That's what PdfLiteral is about. You create the object passing a String or a byte[]; you can even pass the object type to the constructor.

Code sample 1.9: C0109_LiteralObject
    import com.itextpdf.text.pdf.PdfFormXObject;
    import com.itextpdf.text.pdf.PdfLiteral;
    import com.itextpdf.text.pdf.PdfObject;

    public class C0109_LiteralObject {
        public static void main(String[] args) {
            showObject(PdfFormXObject.MATRIX); // literal without an explicit type
            showObject(new PdfLiteral(
                PdfObject.DICTIONARY, "<</Type/Custom/Contents [1 2 3]>>"));
        }

        // Prints the type and the literal content of any PDF object
        public static void showObject(PdfObject obj) {
            System.out.println(obj.getClass().getName() + ":");
            System.out.println("-> type: " + obj.type());
            System.out.println("-> bytes: " + new String(obj.getBytes()));
            System.out.println("-> toString: " + obj.toString());
        }
    }

The MATRIX constant used in code sample 1.9 was created like this: new PdfLiteral("[1 0 0 1 0 0]"). When we write this object to a PDF, it is treated in exactly the same way as if we had created a PdfArray, except that its type is 0 because PdfLiteral doesn't parse the String to check the type.

We also create a custom dictionary, telling the object its type is PdfObject.DICTIONARY. This doesn't have any impact on the fully qualified name. As the String passed to the constructor isn't being parsed, you can't ask the dictionary for its size nor get the key set of the entries.

The content is stored literally, as indicated in the name of the class: PdfLiteral.

-> type: 0
-> bytes: [1 0 0 1 0 0]
-> toString: [1 0 0 1 0 0]
-> type: 6
-> bytes: <</Type/Custom/Contents [1 2 3]>>
-> toString: <</Type/Custom/Contents [1 2 3]>>

It goes without saying that you should be very careful when using this object. As iText doesn't parse the content to see if its syntax is valid, you'll have to make sure you don't make any mistakes. We use this object internally as a shortcut, or when we encounter content that can't be recognized as one of the basic types while reading an existing PDF file.
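In case you wonder why [1 0 0 1 0 0] is called the identity matrix: the six elements a, b, c, d, e and f map a point (x, y) to (a*x + c*y + e, b*x + d*y + f), and with these values every point is mapped to itself. The plain Java sketch below, with the made-up class name MatrixSketch, illustrates that arithmetic; it has nothing to do with how iText stores the literal.

```java
/** Hypothetical helper, not part of iText: applies a six-element PDF matrix to a point. */
final class MatrixSketch {

    /** The six elements of the identity matrix: a, b, c, d, e, f. */
    static final double[] IDENTITY = { 1, 0, 0, 1, 0, 0 };

    /** Returns the transformed point: x' = a*x + c*y + e and y' = b*x + d*y + f. */
    static double[] transform(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    public static void main(String[] args) {
        double[] same = transform(IDENTITY, 3, 4);
        System.out.println(same[0] + " " + same[1]);
        // A translation over (10, 20) only changes the last two elements
        double[] moved = transform(new double[] { 1, 0, 0, 1, 10, 20 }, 3, 4);
        System.out.println(moved[0] + " " + moved[1]);
    }
}
```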

The difference between direct and indirect objects

To explain what the iText PdfObject with value 10 is about, we need to introduce the concept of indirect objects. So far, we've been working with direct objects. For instance: you create a dictionary and you add an entry that consists of a PDF name and a PDF string. The result looks like this:

<</Name (Bruno Lowagie)>>

The string value with my name is a direct object, but I could also create a PDF string and label it:

1 0 obj
(Bruno Lowagie)
endobj

This is an indirect object and we can refer to it from other objects, for instance like this:

<</Name 1 0 R>>

This dictionary is equivalent to the dictionary that used a direct object for the string. The 1 0 R in the latter dictionary is called an indirect reference, and its iText implementation is called PdfIndirectReference. The type value is 10 and you can check if a PdfObject is in fact an indirect reference using the isIndirect() method.

A stream object may never be used as a direct object. For example, if the value of an entry in a dictionary is a stream, that value always has to be an indirect reference to an indirect object containing a stream. A stream dictionary can never be an indirect object. It always has to be a direct object.

An indirect reference can refer to an object of any type. We'll find out how to obtain the actual object referred to by an indirect reference in chapter 3.
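Until we get there, a toy model may help to picture the mechanism. The sketch below is plain Java with invented names (IndirectObjectSketch, addIndirectObject(), resolve()); it isn't iText code and it ignores everything a real PDF parser has to deal with. Indirect objects are kept in a table by object number and generation number, and a value that looks like a reference is replaced by the object it points to. A reference to an object that doesn't exist is treated as the null object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy model, not part of iText: a table of numbered indirect objects. */
final class IndirectObjectSketch {

    // Object number, generation number and the keyword R
    private static final Pattern REFERENCE = Pattern.compile("(\\d+) (\\d+) R");

    private final Map<String, String> objects = new HashMap<>();

    /** Labels a serialized object with an object number and a generation number. */
    void addIndirectObject(int number, int generation, String serialized) {
        objects.put(number + " " + generation, serialized);
    }

    /** Returns a direct object as is; follows an indirect reference to its object. */
    String resolve(String value) {
        Matcher matcher = REFERENCE.matcher(value);
        if (!matcher.matches()) {
            return value; // not a reference, so it's a direct object
        }
        String key = Integer.parseInt(matcher.group(1)) + " " + Integer.parseInt(matcher.group(2));
        String target = objects.get(key);
        return target != null ? target : "null"; // undefined objects count as null
    }

    public static void main(String[] args) {
        IndirectObjectSketch table = new IndirectObjectSketch();
        table.addIndirectObject(1, 0, "(Bruno Lowagie)");
        System.out.println(table.resolve("1 0 R"));
        System.out.println(table.resolve("(Bruno Lowagie)"));
    }
}
```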


Summary

In this chapter, we've had an overview of the building blocks of a PDF file:

  • boolean,
  • number,
  • string,
  • name,
  • array,
  • dictionary,
  • stream, and
  • null

Building blocks can be organized as numbered indirect objects that reference each other.

It's difficult to introduce code samples explaining how direct and indirect objects interact, without seeing the larger picture. So without further ado, let's take a look at the file structure of a PDF document.