Presentation Ripping your PDF files apart


The PDF file format is organised very differently from most other file types. This is the source of both its strengths and its weaknesses, and the cause of a huge number of misunderstandings. In this talk I will take you inside the PDF file to give you an understanding of how it works, the strengths of its elegant design and also possible areas of weakness. By the end I want you to know exactly how to make the PDF file format work for you.
Published on: 2012-05-07T12:47:34.000Z
Channel: iText Summit 2012 (all)
Tags: pdf itext
Speakers:

Mark Stephens


Mark Stephens has spent the last 14 years developing a PDF viewer in Java, and he is now also converting PDF files into HTML5 for the mobile platform. His favourite PDF viewer is Vi (or possibly Emacs). He is a regular attendee and speaker at conferences and you can see his 'slightly' tongue-in-cheek assessment of the software industry's future (complete with kittens!) at Business of Software 2012. His wife says he has a very dry sense of humour and he promises his children that he will one day find a practical use for his MA degree in Mediaeval History.

PDF: slides.pdf

Slides:

RIPPING YOUR PDF FILES APART


What you need to know about what goes on inside your PDF files
Mark Stephens
Thursday, 29 March 12

Mark's Bio


Working with Java and PDF since 1997
Founded IDRsolutions in 1999
Speaker at Seybold, JavaOne, Business of Software
MA degree in Mediaeval History from St Andrews (how useless is that)
Ask me about Java, PDF, business or anything which happened before 1500 AD

BUT FIRST SOME KITTENS...


The support team at IDRsolutions are waiting for your call (maybe)

The PDF reference guide



Loading page 1124 of a file


Word
Read pages 1-1123 (time passes - scroll bar shrinks)
Found it (eventually)

PDF
Read the metadata refs table(s) - where do I find all the objects
Skip to page 1124

PDF (in detail)
Read the refs table(s) - where do I find all the objects
Read the Root object - points to the Pages object
Read object for page 1124 (tells me the linked font, image, content objects)
Draw it
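The "skip to page 1124" step can be sketched in Python. This is a minimal illustration, assuming the objects have already been located and parsed into a plain dictionary; the `objects` table and the `find_page` helper are hypothetical names, not part of any library. It relies on the /Count entry that each Pages node carries to skip whole branches without visiting the pages inside them.

```python
# Hypothetical in-memory object table: object number -> dictionary.
# In a real file these are PDF objects located through the refs table.
objects = {
    1: {"Type": "Catalog", "Pages": 2},                 # the Root object
    2: {"Type": "Pages", "Kids": [3, 6], "Count": 3},   # top of the page tree
    3: {"Type": "Pages", "Kids": [4, 5], "Count": 2},   # an intermediate branch
    4: {"Type": "Page", "Contents": 10},
    5: {"Type": "Page", "Contents": 11},
    6: {"Type": "Page", "Contents": 12},
}


def find_page(objects, root_num, page_number):
    """Return the object number of the 1-based page_number."""
    node_num = objects[root_num]["Pages"]
    node = objects[node_num]
    remaining = page_number - 1  # pages that still have to be skipped
    if not 0 <= remaining < node["Count"]:
        raise IndexError("no such page: %d" % page_number)
    while node["Type"] == "Pages":
        for kid_num in node["Kids"]:
            kid = objects[kid_num]
            # A branch reports how many pages hang below it; a leaf is one page.
            size = kid["Count"] if kid["Type"] == "Pages" else 1
            if remaining < size:
                node_num, node = kid_num, kid  # descend into this child
                break
            remaining -= size  # skip the whole child without reading it
        else:
            raise ValueError("page tree counts are inconsistent")
    return node_num


print(find_page(objects, 1, 3))
```

Only the nodes on the path from the Root to the requested page are touched, which is why jumping to a late page costs about the same as jumping to an early one.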

Your PDF file is a Tree


A root linked to all the branches

The PDF reference guide


Like you have never seen it before...

You can use vi or emacs if you prefer

End of the file
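To make the role of the end of the file concrete, here is a small Python sketch. It first assembles a tiny PDF in memory (so the example is self-contained), then reads it the way a viewer might start: from the tail. All function names are hypothetical, and the parser is deliberately simplified: it assumes a single classic refs table with one subsection and no incremental updates or cross-reference streams.

```python
import re


def build_tiny_pdf():
    """Assemble a tiny one-page PDF in memory, recording each object's offset."""
    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"          # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off     # fixed-width entry per object
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


def read_tail(pdf):
    """Find the refs table offset recorded just before the %%EOF marker."""
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", pdf[-1024:])
    if match is None:
        raise ValueError("no startxref marker near the end of the file")
    return int(match.group(1))


def read_xref(pdf, xref_pos):
    """Parse one classic refs table subsection into {object number: offset}."""
    lines = pdf[xref_pos:].split(b"\n")
    if lines[0].strip() != b"xref":
        raise ValueError("offset does not point at a refs table")
    first, count = map(int, lines[1].split())
    table = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":                      # 'n' = in use, 'f' = free
            table[first + i] = int(offset)
    return table


def read_root(pdf, xref_pos):
    """Return the object number that the trailer names as /Root."""
    trailer = pdf[pdf.index(b"trailer", xref_pos):]
    return int(re.search(rb"/Root\s+(\d+)\s+\d+\s+R", trailer).group(1))


pdf, offsets = build_tiny_pdf()
xref_pos = read_tail(pdf)
table = read_xref(pdf, xref_pos)
root = read_root(pdf, xref_pos)
print(root, table[root])
```

Nothing before the refs table is scanned: the reader goes from the tail to the table, and from the table straight to the byte offset of whichever object it wants.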

The PDF root object


Like you have never seen it before...

PDF files on the web


Isn't having the marker at the end a problem?

Not if you create it properly

Key takeaways from the PDF structure


We do not need to load the whole file
It is equally fast to load any part of it
It is very easy to replace objects with new versions
There are certain key locations - like at the end of a file
You should not edit it in a text editor
If you want to use PDF files across the Internet, there is a special mode to make these load the most important parts first
Lots of features need you to set up the PDF file correctly
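The "special mode" for Internet use is called linearization in the PDF specification. A linearized file puts a small parameter dictionary containing a /Linearized key in its first object, inside the first 1024 bytes. The Python sketch below is a rough heuristic built on that rule; `looks_linearized` is a hypothetical helper, and it only checks for the marker, not whether the rest of the file really follows the linearized layout.

```python
import re


def looks_linearized(pdf_bytes):
    """Heuristic: is there a /Linearized dictionary in the first 1024 bytes?"""
    head = pdf_bytes[:1024]
    pattern = rb"\d+\s+\d+\s+obj\s*<<[^>]*/Linearized\b"
    return re.search(pattern, head) is not None


web_ready = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
ordinary = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(looks_linearized(web_ready), looks_linearized(ordinary))
```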

Those PDF objects in more detail


All PDF objects have:
1. An ID number
2. (Optional) A set of dictionary key/value pairs
3. (Optional) A block of binary data
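The three parts can be pulled out of a raw object with a few lines of Python. This is a sketch under loud assumptions: `parse_object` is a hypothetical helper, the dictionary is flat (no nested dictionaries or arrays), every value is a single token, and /Length is a direct number. Real files break all of these assumptions regularly.

```python
import re
import zlib


def parse_object(raw):
    """Split a raw object into (ID, dictionary entries, binary data or None)."""
    header = re.match(rb"\s*(\d+)\s+(\d+)\s+obj\s*", raw)
    if header is None:
        raise ValueError("not a PDF object")
    obj_id = (int(header.group(1)), int(header.group(2)))
    rest = raw[header.end():]
    entries, data = {}, None
    if rest.startswith(b"<<"):
        close = rest.index(b">>")  # assumes no nested dictionaries
        tokens = re.findall(rb"/\w+|[^\s/]+", rest[2:close])
        entries = dict(zip(tokens[::2], tokens[1::2]))  # key, value, key, value...
        stream = re.match(rb"\s*stream\r?\n", rest[close + 2:])
        if stream:
            start = close + 2 + stream.end()
            data = rest[start:start + int(entries[b"/Length"])]
    return obj_id, entries, data


# Build a sample object whose binary block is compressed page content.
payload = zlib.compress(b"BT (Hello) Tj ET")
raw = (
    (b"7 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(payload))
    + payload
    + b"\nendstream\nendobj\n"
)

obj_id, entries, data = parse_object(raw)
if entries.get(b"/Filter") == b"/FlateDecode":
    data = zlib.decompress(data)  # FlateDecode is zlib/deflate compression
print(obj_id, entries, data)
```

The binary block is opaque until the dictionary says how to read it, which is one reason opening a PDF in a text editor shows long runs of unreadable bytes.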

PDF images are not TIFF, PNG or JPEG

A word on colour


DeviceRGB, CalRGB, DeviceCMYK, ICC, Separation, DeviceN, DeviceGray, CalGray, Lab, Pattern

PDF pages are 'drawn'


0 0 0 1 k                                   set CMYK colour of text to black
BT                                          start of some text
/T1_0 1 Tf                                  use the font defined as T1_0 elsewhere
0 Tc 0 Tw 0 Ts 100 Tz 0 Tr                  set other text properties
7.5003 0 0 7.5003 272.1643 540.2979 Tm      position onscreen
(L*) Tj                                     draw the text L*
/T1_1 1 Tf                                  change font
0.856 0 Td                                  move to a different location onscreen
( = 100) Tj                                 draw the text = 100
-0.324 -1.133 Td                            move to a different location onscreen
[(whit)6(e)] TJ                             draw the text white (adjusting the spacing between t and e)
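Because a page is a list of drawing commands with operands first and the operator last, pulling the text back out means replaying those commands. The Python sketch below does this for the stream from the slide. It is a simplification: `shown_text` is a hypothetical helper, and it ignores hex strings, nested parentheses, escape sequences and font encodings, so it only works when the string bytes happen to be readable characters. In the PDF specification, Tj shows a single string and TJ shows an array of strings with spacing adjustments.

```python
import re

# Tokens: (literal strings), array brackets, /Names, then numbers and operators.
TOKEN = re.compile(
    rb"\((?:\\.|[^\\()])*\)|\[|\]|/[^\s/\[\]()<>]+|[^\s\[\]()/<>]+"
)
NUMBER = re.compile(rb"[-+]?\d*\.?\d+")


def is_operand(tok):
    """Strings, names, brackets and numbers are operands; the rest are operators."""
    return tok[:1] in b"([/]" or NUMBER.fullmatch(tok) is not None


def shown_text(content):
    """Return the strings painted by Tj and TJ, in drawing order."""
    operands, shown = [], []
    for tok in TOKEN.findall(content):
        if is_operand(tok):
            operands.append(tok)  # operands pile up until an operator arrives
            continue
        if tok in (b"Tj", b"TJ"):
            # Keep the string pieces, drop the spacing numbers and brackets.
            parts = [t[1:-1] for t in operands if t.startswith(b"(")]
            shown.append(b"".join(parts).decode("latin-1"))
        operands.clear()  # every operator consumes its operands
    return shown


content = b"""0 0 0 1 k
BT
/T1_0 1 Tf
0 Tc 0 Tw 0 Ts 100 Tz 0 Tr
7.5003 0 0 7.5003 272.1643 540.2979 Tm
(L*) Tj
/T1_1 1 Tf
0.856 0 Td
( = 100) Tj
-0.324 -1.133 Td
[(whit)6(e)] TJ
ET
"""

print(shown_text(content))
```

Note that the result is three separate fragments placed at three positions. Nothing in the stream says they form one line or one word, which is why text extraction from PDF has to guess at structure.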

PDF myth - files are cross platform


Only if you create them properly...

Obfuscation for idiots!


No-one will be able to guess the secret password

20 seconds later...


And the password is....

Lastly a plea


Not all PDF creation tools are equal

In summary

