If you've read one of the editions of "iText in Action", you've certainly encountered the blank page problem in the section about troubleshooting PDF problems. Recently, we received a support question about a blank page problem that wasn't described in the book. We've now made a couple of examples that explain what goes wrong, how we solved it in iText, and how you can solve it if you encounter the same problem with other PDF creating software.
Description of the problem
Suppose that you have an image with a very high resolution. For instance, a picture of the skyline of New York:
This is a picture made by Natalie and provided under the Creative Commons license as described on Natalie's contact page. Natalie has stitched together a series of photos taken from Liberty Island, resulting in a 15,000 pixel wide panorama of the New York City Skyline. That's awesome, isn't it?
Many of our customers have similar image files with an even higher number of pixels, and one of the use cases of these customers is to create a PDF based on these images. In many cases, they also add extra content such as watermarks or annotations, but that's not relevant in the context of the blank page problem. What matters is that they used to get a result like this: large_image.pdf
Depending on which PDF viewer you're using, you'll say: "Great PDF! What's wrong with it?" (e.g. if you use Apple Preview) or "Seriously? I don't see a thing!" (e.g. if you use Adobe Reader).
Is this a bug in Adobe Reader? If not, why is Preview showing the image? Is this a bug in the PDF creation software? Actually, I had to make a slight change in the iText code in order to produce this file. If you use a recent version of iText, you'll get the following exception:
Exception in thread "main" ExceptionConverter: com.itextpdf.text.DocumentException: The page size must be smaller than 14400 by 14400. It's 15000.0 by 7756.0.
    at com.itextpdf.text.pdf.PdfPage.<init>(PdfPage.java:96)
    at com.itextpdf.text.pdf.PdfDocument.newPage(PdfDocument.java:969)
    at com.itextpdf.text.pdf.PdfDocument.close(PdfDocument.java:863)
    at com.itextpdf.text.Document.close(Document.java:415)
    at sandbox.images.ImgToPdf.main(ImgToPdf.java:21)
This exception is self-explanatory: the page is too large. When we consult ISO-32000-1, we discover that 14,400 user units is the maximum value for the width and the height of a page in a PDF document. The PDF we made has a single page that measures 15,000 by 7,756 user units (by default, 1 user unit corresponds to 1 typographic point).
When encountering a PDF page that is larger than the maximum size as defined in the ISO standard, Adobe Reader will show you a blank page. Other viewers (or PDF creators) ignore the standard, and allow you to read (or even create) PDF documents that aren't conforming to the PDF specification.
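To make the limit concrete, here is a minimal sketch in plain Java of the kind of check that leads to the exception shown earlier. This is not iText code: the class and method names are hypothetical, and we assume that a dimension of exactly 14,400 user units is still acceptable.

```java
// Hypothetical helper, not part of iText: compares page dimensions
// (in user units) with the 14,400 user unit maximum discussed above.
final class PageSizeCheck {
    static final double MAX_USER_UNITS = 14400;

    /** Returns true if the width or the height exceeds the maximum. */
    static boolean exceedsLimit(double width, double height) {
        return width > MAX_USER_UNITS || height > MAX_USER_UNITS;
    }

    public static void main(String[] args) {
        // The skyline page: 15,000 by 7,756 user units.
        System.out.println(exceedsLimit(15000, 7756));
        // A page that is exactly 14,400 user units wide.
        System.out.println(exceedsLimit(14400, 7445.76));
    }
}
```

With the default user unit of 1/72 inch, 14,400 user units corresponds to 200 inches, so the skyline page would be a little over 208 inches wide.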
How to solve this problem?
Let's take a look at LargeImage1, an example that takes the original (blank) PDF and creates a new PDF using the exact same image, but scaling it to a page with a width of 14,400 user units instead of 15,000: large_image1.pdf
In this example, we take a few shortcuts to fetch the PDF object that holds the image data. This is a PRStream object. You'll need more code if you want to make this example more generic. We feed the PRStream object to a PdfImageObject that allows us to get the image in the form of an array of bytes. With this byte array, we can create a new Image object that we can scale so that it has a maximum size of 14,400 by 14,400 user units.
Note that we don't reduce the quality of the image. In the context of iText, scaling down an image means increasing the resolution; we're rendering the same number of pixels on a smaller surface. We use this image in a newly created PDF document with a single page that has the exact same dimensions as the scaled-down image.
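The arithmetic behind this scaling can be sketched in a few lines of plain Java. This is not iText API: the class and method names are hypothetical, and we assume the default of 72 user units per inch and an image that was originally drawn at one pixel per user unit (15,000 pixels on 15,000 user units).

```java
// Plain arithmetic, not iText API; class and method names are hypothetical.
final class ScaleToFit {
    // Assumption: 1 user unit = 1/72 inch (the default user unit).
    static final double USER_UNITS_PER_INCH = 72.0;

    /** Largest uniform factor that makes width and height fit within max. */
    static double factor(double width, double height, double max) {
        return Math.min(max / width, max / height);
    }

    /** Pixels per inch when the given pixels are drawn across userUnits. */
    static double pixelsPerInch(double pixels, double userUnits) {
        return pixels / (userUnits / USER_UNITS_PER_INCH);
    }

    public static void main(String[] args) {
        double f = factor(15000, 7756, 14400);
        double width = 15000 * f;
        double height = 7756 * f;
        System.out.println("scaled page: " + width + " x " + height);
        System.out.println("before: " + pixelsPerInch(15000, 15000) + " ppi");
        System.out.println("after:  " + pixelsPerInch(15000, width) + " ppi");
    }
}
```

For the skyline image, the factor is 14,400 / 15,000 = 0.96, which gives a page of 14,400 by about 7,445.76 user units. The same 15,000 pixels now cover 200 inches instead of roughly 208, so the resolution goes up from 72 to 75 pixels per inch.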
This example works with the PDF that shows the skyline of New York, but it may not work with every PDF. There's nothing wrong with the code of this example; the main problem is memory. In this case, the image was stored inside the PDF as a JPG, which means the bytes are stored inside the PDF as-is. We can copy them without any further processing.
However, should we try this example with a compressed bitmap, this bitmap will have to be decompressed, and the byte array resulting from this operation risks being huge. We've tried this ourselves and we encountered many files throwing an OutOfMemoryError, even after significantly increasing the heap size of the JVM.
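To get a feeling for the numbers, here is a rough estimate in plain Java of the size of a decompressed bitmap. The class and method names are hypothetical, and the figures assume an 8-bit RGB bitmap with the same pixel dimensions as the skyline picture (the skyline itself is a JPEG, so this is only an illustration).

```java
// Rough estimate only; class and method names are hypothetical.
final class BitmapMemory {
    /**
     * Size in bytes of an uncompressed bitmap, assuming every row of
     * samples is padded to a whole number of bytes.
     */
    static long uncompressedBytes(long width, long height,
                                  int components, int bitsPerComponent) {
        long bitsPerRow = width * components * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8; // round up to whole bytes
        return bytesPerRow * height;
    }

    public static void main(String[] args) {
        // Assumption: 3 color components (RGB), 8 bits per component.
        long bytes = uncompressedBytes(15000, 7756, 3, 8);
        System.out.println(bytes + " bytes");
        System.out.println(bytes / (1024.0 * 1024.0) + " MiB");
    }
}
```

Under these assumptions, a single copy of the raw samples takes 349,020,000 bytes, or about 333 MiB. Any intermediate copy made while decoding or re-encoding comes on top of that, and images with even more pixels grow accordingly.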
The LargeImage2 example provides a workaround that doesn't require the image to be processed. In this workaround, we reuse the PRStream. This stream is an indirect object in the PDF document that is used as an image XObject. Reusing this indirect object saves us plenty of processing time as well as memory. The result will be as good as with the first example: large_image2.pdf
In this second example, we don't create a new PDF document from scratch with PdfWriter and we don't use a PdfImageObject instance to create a new image. Instead, we create an Image object using an indirect reference to the image XObject stream. Such an image can only be reused by a PdfStamper instance, which is why we insert a new page with an adapted width and height (the same dimensions as used in the first example) into the existing document. We add the reused image to this page.
Now we have a document with two pages: one shows the image, the other doesn't because of its wrong page dimensions. In the final step, we remove the page with the wrong dimensions.
A problem like this can only be diagnosed by somebody who knows the PDF specification inside out. Looking at the solution, you soon realize that it can only be solved by somebody who also knows iText inside out. Many developers spend hours of company time on problems like this. At iText, we have made it our job to be PDF detectives. Solving puzzles like this is part of the support you get when you buy a license. We hope you enjoyed reading and solving this "blank page" puzzle with us.