Presentation Ripping your PDF files apart
The PDF file format is organised very differently from most other file types. This is the source of both its strengths and its weaknesses, and the cause of a huge number of misunderstandings. In this talk I will take you inside the PDF file to give you an understanding of how it works, the strengths of its elegant design and its possible areas of weakness. At the end I want you to know exactly how to make the PDF file format work for you.
Published on: 2012-05-07T12:47:34.000Z
Channel: iText Summit 2012 (all)
Tags: pdf itext
Speakers:
mark stephens
Mark Stephens has spent the last 14 years developing a PDF viewer in Java and is now also converting PDF files into HTML5 for the mobile platform. His favourite PDF viewer is Vi (or possibly Emacs). He is a regular attendee and speaker at conferences and you can see his 'slightly' tongue in cheek assessment of the software industry's future (complete with kittens!) at Business of Software 2012. His wife says he has a very dry sense of humour and he promises his children that he will one day find a practical use for his MA degree in Mediaeval History.
PDF: slides.pdf
Slides:
RIPPING YOUR PDF FILES APART
What you need to know about what goes on inside your PDF files
Mark Stephens
Thursday, 29 March 12
Mark's Bio
Working with Java and PDF since 1997
Founded IDRsolutions 1999
Speaker at Seybold, JavaOne, Business of Software
MA degree in Mediaeval History from St Andrews (how useless is that)
Ask me about Java, PDF, business or anything which happened before 1500 AD
BUT FIRST SOME KITTENS...
The support team at IDRsolutions are waiting for your call (maybe)
The PDF reference guide
Loading page 1124 of a file
Word
  Read pages 1-1123 (time passes - scroll bar shrinks)
  Found it (eventually)
PDF
  Read the metadata refs table(s) - where do I find all the objects
  Skip to page 1124
PDF (in detail)
  Read the refs table(s) - where do I find all the objects
  Read the Root object - points to the Pages object
  Read object for page 1124 (tells me the linked font, image, content objects)
  Draw it
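The "PDF (in detail)" steps can be sketched with a toy model. This is only an illustration, not how any particular viewer is written: the `objects` dictionary stands in for the refs table plus the objects it points to, the object numbers are invented, and `find_page` is a hypothetical helper. It follows the Root object to the Pages object, then uses each branch's `Count` to skip whole subtrees without reading them.

```python
# Toy object store: object number -> dictionary. In a real file the
# refs (cross-reference) table maps each number to a byte offset instead.
objects = {
    1: {"Type": "Catalog", "Pages": 2},                 # the Root object
    2: {"Type": "Pages", "Kids": [3, 6], "Count": 3},   # top of the page tree
    3: {"Type": "Pages", "Kids": [4, 5], "Count": 2},   # a branch with 2 pages
    4: {"Type": "Page", "Contents": 10},
    5: {"Type": "Page", "Contents": 11},
    6: {"Type": "Page", "Contents": 12},
}
trailer = {"Root": 1}


def find_page(objects, trailer, page_number):
    """Return the object number of the 1-based page_number."""
    if page_number < 1:
        raise IndexError("page numbers start at 1")
    node_id = objects[trailer["Root"]]["Pages"]
    remaining = page_number
    while True:
        node = objects[node_id]
        if node["Type"] == "Page":
            return node_id
        for kid_id in node["Kids"]:
            kid = objects[kid_id]
            # A branch reports how many pages sit below it, so we can
            # skip it entirely when the target page lies further on.
            size = kid["Count"] if kid["Type"] == "Pages" else 1
            if remaining <= size:
                node_id = kid_id
                break
            remaining -= size
        else:
            raise IndexError("page number beyond end of document")


print(find_page(objects, trailer, 3))
```

Only the objects on the path from the root to the requested page are touched, which is why reaching page 1124 costs about the same as reaching page 1.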
Your PDF file is a Tree
A root linked to all the branches
The PDF reference guide
Like you have never seen it before...
You can use vi or emacs if you prefer
End of the file
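The slide image is not preserved here, but a standard PDF ends with the keyword `startxref`, the byte offset of the cross-reference table, and the marker `%%EOF`. The sketch below shows how a reader might pick that offset out of the last bytes of the file. The function name `find_startxref`, the 1024-byte tail size and the sample bytes are assumptions made for illustration.

```python
def find_startxref(data: bytes) -> int:
    """Return the byte offset of the cross-reference table.

    Only the tail of the file is inspected, so the body is never read.
    """
    tail = data[-1024:]
    marker = tail.rfind(b"startxref")
    if marker == -1:
        raise ValueError("no startxref keyword near the end of the file")
    fields = tail[marker + len(b"startxref"):].split()
    # Expect the offset first and the end-of-file marker straight after it.
    if len(fields) < 2 or not fields[0].isdigit() or fields[1] != b"%%EOF":
        raise ValueError("malformed end of file")
    return int(fields[0])


# Build a tiny, incomplete sample just to exercise the function.
body = b"%PDF-1.4\n% toy body, not a complete document\n"
offset = len(body)  # the xref table starts right after the body
sample = (
    body
    + b"xref\n0 1\n0000000000 65535 f \n"
    + b"trailer\n<< /Size 1 >>\n"
    + b"startxref\n" + str(offset).encode("ascii") + b"\n%%EOF\n"
)

print(find_startxref(sample) == offset)
```

This is also why a file that has been truncated, or had its offsets shifted by careless editing, tends to fail to open: the reader starts from the end.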
The PDF root object
Like you have never seen it before...
PDF files on the web
Isn't having the marker at the end a problem??
Not if you create it properly
Key takeaways from the PDF structure
We do not need to load the whole file
It is equally fast to load any part of it
It is very easy to replace objects with new versions
There are certain key locations - like at the end of a file
You should not edit it in a text editor
If you want to use PDF files across the Internet, there is a special mode to make these load the most important parts first
Lots of features need you to set up the PDF file correctly
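The point about replacing objects can be illustrated with a toy model. PDF allows a file to be updated by appending new versions of objects together with an extra refs table section, and a reader prefers the newest entry for each object number. The sketch below assumes each section is a plain dictionary of object number to byte offset; the offsets and the helper name `resolve_offsets` are made up.

```python
def resolve_offsets(xref_sections):
    """Merge refs table sections, ordered oldest to newest.

    A later section overrides an earlier one for the same object number,
    so the old bytes stay in the file but are simply no longer reached.
    """
    merged = {}
    for section in xref_sections:
        merged.update(section)
    return merged


original = {1: 15, 2: 64, 3: 120}   # written when the file was created
update = {3: 910}                   # appended later: new version of object 3

offsets = resolve_offsets([original, update])
print(offsets)
```

Nothing before the appended section has to move, which is what makes the replacement cheap, and also why editing the file by hand in a text editor (which shifts offsets) is a bad idea.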
Those PDF objects in more detail
All PDF objects have:
1. An ID number
2. (Optional) A set of dictionary key/value pairs
3. (Optional) A block of binary data.
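As a rough illustration of those three parts, the sketch below splits one object into its ID, its dictionary text and its binary block. The sample object and the helper name `split_object` are invented. Note that on disk the ID is written as two numbers (object number and generation), and that a real parser would use the `/Length` entry rather than searching for `endstream`, since binary data can contain any bytes.

```python
import re

# "<number> <generation> obj ... endobj" wraps every indirect object.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\s*(.*?)\s*endobj", re.DOTALL)
# The optional binary block sits between the stream/endstream keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def split_object(raw: bytes):
    """Return (number, generation, value, stream) for the first object found.

    `value` is the raw text before any stream (usually a << ... >> dictionary);
    `stream` is None when the object carries no binary block.
    """
    match = OBJ_RE.search(raw)
    if match is None:
        raise ValueError("no object found")
    number, generation = int(match.group(1)), int(match.group(2))
    body = match.group(3)
    stream_match = STREAM_RE.search(body)
    if stream_match is None:
        return number, generation, body.strip(), None
    return number, generation, body[:stream_match.start()].strip(), stream_match.group(1)


sample_object = (
    b"12 0 obj\n"
    b"<< /Length 11 >>\n"
    b"stream\n"
    b"hello world\n"
    b"endstream\n"
    b"endobj\n"
)

print(split_object(sample_object))
```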
PDF images are not TIFF, PNG or JPEG
A word on colour
DeviceRGB CalRGB DeviceCMYK ICC Separation DeviceN DeviceGray CalGray Lab Pattern
PDF pages are `drawn'
0 0 0 1 k                                  set CMYK colour of text to black
BT                                         start of some text
/T1_0 1 Tf                                 use the font defined as T1_0 elsewhere
0 Tc 0 Tw 0 Ts 100 Tz 0 Tr                 set other text properties
7.5003 0 0 7.5003 272.1643 540.2979 Tm     position onscreen
(L*) Tj                                    draw the text L*
/T1_1 1 Tf                                 change font
0.856 0 Td                                 move to a different location onscreen
( = 100) Tj                                draw the text = 100
-0.324 -1.133 Td                           move to a different location onscreen
[(whit)6(e)] TJ                            draw the text white (adjust the spacing between t and e)
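To show how a program might read such a list of drawing commands, here is a minimal sketch that pulls out the strings drawn by the text-showing operators `Tj` and `TJ`. It is a simplification under several assumptions: the stream is already decompressed, strings contain no nested parentheses, and each byte maps directly to a character (real fonts can use arbitrary encodings). The names `TOKEN_RE`, `is_operand` and `extract_text` are invented for this example.

```python
import re

# One token is a (string), an array bracket, a /Name, or a bare word/number.
TOKEN_RE = re.compile(rb"\((?:\\.|[^\\()])*\)|\[|\]|/[^\s\[\]()/]+|[^\s\[\]()/]+")
NUMBER_RE = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)$")


def is_operand(token: bytes) -> bool:
    """Strings, names, array brackets and numbers are operands."""
    return token[:1] in b"(/[]" or NUMBER_RE.match(token) is not None


def extract_text(stream: bytes) -> list[str]:
    """Collect the text drawn by Tj and TJ, in drawing order."""
    shown = []
    operands = []
    for token in TOKEN_RE.findall(stream):
        if is_operand(token):
            operands.append(token)   # operands come before their operator
            continue
        if token == b"Tj" and operands:
            shown.append(operands[-1][1:-1].decode("latin-1"))
        elif token == b"TJ":
            # Array form: keep the strings, skip the spacing adjustments.
            parts = [t[1:-1] for t in operands if t.startswith(b"(")]
            shown.append(b"".join(parts).decode("latin-1"))
        operands.clear()             # any operator consumes its operands
    return shown


content = b"""0 0 0 1 k
BT
/T1_0 1 Tf
0 Tc 0 Tw 0 Ts 100 Tz 0 Tr
7.5003 0 0 7.5003 272.1643 540.2979 Tm
(L*) Tj
/T1_1 1 Tf
0.856 0 Td
( = 100) Tj
-0.324 -1.133 Td
[(whit)6(e)] TJ
"""

print(extract_text(content))
```

The text comes out as three separate fragments, in drawing order, with no notion of words or lines. That is one reason text extraction from PDF is harder than it looks.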
PDF myth - files are cross platform
Only if you create them properly...
Obfuscation for idiots!
No-one will be able to guess the secret password
20 seconds later...
And the password is....
Lastly a plea
Not all PDF creation tools are equal
In summary