Recently I was very fortunate to speak with Andreas Jung. Andreas runs the print-css.rocks website, which is a fantastic resource if you want to learn about the tools you can use to transform your HTML documents into well-formatted PDF files.
First of all, can you give me a brief background on you and why you decided to start print-css.rocks?
My name is Andreas Jung. I studied computer science in the mid-90s and I have worked for several publishers and on publishing-related projects over the last two decades. I have been involved in several publishing projects for the EU like the production of the official journals for the European Community, the implementation of the TED (Tenders Electronic Daily) system and many others.
My work on generating high-quality output (not only PDF) from markup content (XML or HTML) and CSS for layout and styling started about ten years ago when we had the requirement to generate DOC, PDF, HTML and RTF from a single XML/HTML source document. The original solution was based (for the PDF part) on a tool called csstoxslfo, which transformed a CSS file to XSL-FO under the hood and processed the input content using a derivative of Apache FOP. This worked out fairly well for legal documents containing lots of text, lists, tables and sometimes images.
Over the years I got in touch with PrinceXML, which was the first processor that could turn XML/HTML into PDF using CSS based on the “CSS Paged Media” approach. The “CSS Paged Media” approach is based on an old W3C draft (which is still in the making). CSS Paged Media defines an extension of CSS for printed documents. It defines CSS selectors for specifying the page layout and borders, it defines so-called page areas, and it introduces CSS statements specific to printed documents, such as pagination.
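[Note: as a rough illustration of the ideas described above, here is a minimal stylesheet sketch. It is not taken from print-css.rocks or from any of Andreas's projects. The page size, margins and the choice of h1 as the chapter heading are assumptions made for the example, and support for individual properties can differ between converters.]

```css
/* Page layout: applies to every page of the printed document.
   The size and margin values are illustrative assumptions. */
@page {
  size: A4;
  margin: 25mm 20mm;

  /* A page-margin box (one of the "page areas"):
     print the current page number, centered in the bottom margin. */
  @bottom-center {
    content: counter(page);
  }
}

/* Pagination: start every top-level heading on a new page.
   Assumes each chapter in the source HTML begins with an h1. */
h1 {
  page-break-before: always;
}

/* Pagination: avoid leaving a heading stranded at the bottom of a page. */
h1, h2, h3 {
  page-break-after: avoid;
}

/* Pagination: keep at least two lines of a paragraph together
   at the bottom (orphans) and top (widows) of a page. */
p {
  orphans: 2;
  widows: 2;
}
```

[The page-break-* properties are the older CSS 2.1 spellings; newer drafts use break-before and break-after, which a given converter may or may not accept.]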
PDF generation became part of my business (besides standard web and portal projects with Python and Plone) and I implemented several projects with high requirements regarding typography, look & feel etc. My own product family, Produce & Publish, was born based on the CSS Paged Media approach. Over time we came to another PDF generator tool, PDFreactor, which we also used in several projects. We have also become friends with Antennahouse Formatter lately.
Based on the real-world experience with all tools on the market I decided to create the print-css.rocks project. Initially the material was part of a CSS Paged Media workshop that I gave at XML London 2014 and a talk at XML Prague 2015. The idea was to build some small lessons, both for testing how each converter behaves with the same input (in order to check the interoperability of the tools) and for breaking down the complexity of the CSS Paged Media standard into single aspects, giving a newbie a good starting point into the CSS Paged Media world.
[Note from Toby: Here is Andreas talking at XMLPrague]
At the moment the print-css.rocks project has been around for about 14 months. It has seen two major iterations in between and currently covers the tools PDFreactor, PrinceXML, Antennahouse Formatter and Vivliostyle. A new release with updated versions of the converters is currently in the making.
I’ve been working on the web for years and only heard about HTML+CSS to PDF parsers whilst I was researching this latest book, do you think HTML and CSS outside of the browser have a marketing issue?
The CSS Paged Media approach is an alternative to the XSL-FO approach for generating PDF from XML/HTML. XSL-FO will be around for the time being but it is the fate of XSL-FO to die slowly. The XSL-FO W3C committee has been closed and the few XSL-FO vendors are doing their own thing. XSL-FO is hard to approach and companies are of course looking for alternatives.
Everyone knows CSS so it is obvious that the CSS Paged Media standard is more approachable and has fewer barriers than digging into the XSL-FO world. So there are more and more publishers approaching PDF generation with CSS. However, they are all moving slowly because they have their existing in-house XSL-FO workflows, and changing horses is always a risk and has its costs.
I love the analogy of changing horses, that makes a lot of sense. My main focus right now is print stylesheets on the web, do you believe they are a necessity in 2017?
Actually I never cared much about print stylesheets for printing content from the web in a nice way. Of course you provide a print.css with each project or you rely on a print stylesheet that comes with your CMS or portal system. However printing a page from browser content is something different in my opinion than generating a PDF document for print.
A printed webpage should look somewhat reasonable and nice. But generating a publication that you could sell in a bookstore or a layout-oriented brochure has different requirements. You have to deal with pagination and left/right pages, you want to have running headers and footers for counters like page numbers. Fine typography is a big point, you want language-dependent hyphenation… all the fun that makes a nice printed book or journal.
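[Note: the book-style requirements mentioned above can be sketched in CSS Paged Media roughly as follows. This is a hedged example, not a stylesheet from the interview. The margin values and the string name chapter-title are made up for illustration, and string-set, string() and hyphens come from drafts that each converter supports to a different degree.]

```css
/* Left-hand (verso) pages: the wider margin sits on the right, toward the binding. */
@page :left {
  margin: 25mm 30mm 25mm 20mm; /* top right bottom left, illustrative values */

  /* Running header: the current chapter title, on the outer edge. */
  @top-left {
    content: string(chapter-title);
  }
  /* Running footer: the page number, on the outer edge. */
  @bottom-left {
    content: counter(page);
  }
}

/* Right-hand (recto) pages: mirrored margins and mirrored margin boxes. */
@page :right {
  margin: 25mm 20mm 25mm 30mm;

  @top-right {
    content: string(chapter-title);
  }
  @bottom-right {
    content: counter(page);
  }
}

/* Capture the text of each h1 into a named string for the running header.
   "chapter-title" is a hypothetical name chosen for this sketch. */
h1 {
  string-set: chapter-title content();
}

/* Language-dependent hyphenation: the converter selects hyphenation rules
   from the lang attribute of the HTML, for example <html lang="de">. */
p {
  hyphens: auto;
}
```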
The perception a lot of developers and development teams have is that printing the web is at best a nice to have and they don’t have time to add the functionality. Do you think this is a fair stance?
Yes, this is exactly my point as I stated above. Of course you want a print-out to look nice but on the other hand you usually do not want to invest a lot of time and resources in a print stylesheet. This is often one of the last things that you implement in a web project.
Why do you think browsers have been slow to support things like paged media?
Because browsers are designed for rendering webpages. They are made to render webpages in a nice way on different devices, on different screen sizes and with different aspect ratios. Browsers have never had a background in print.
Can you imagine a time when tools like Prince aren’t required because browser support is there?
Perhaps. The Vivliostyle project is a completely different approach for generating PDF using CSS Paged Media. Their approach is to use standard browsers (the Vivliostyle Formatter uses WebKit under the hood) and predefined primitives of the render engines (which are available across all browser engines and not specific to print). I do not see other vendors moving in the same direction. They have had their own rendering engines for many years - for good or bad - and I do not expect that to change in the long run.
If you could have one print-related wish come true, what would it be?
Fewer vendor-specific extensions, commitment to the W3C-related drafts around CSS Paged Media and a competitive open-source implementation.
For someone who has been converted to wanting their documents to print well, what advice would you give them or resources would you share with them?
Well, first you need to be clear about your specific project requirements. The general rule for choosing a converter is: you get what you pay for.
There are only a few open-source tools that implement CSS Paged Media, and only in a half-baked way. The other tools mentioned above differ by price and features. If you have professional requirements then you should look at PrinceXML and PDFreactor first. They provide free evaluation licenses that can be used for playing around and for figuring out how far you get. The personal licenses of both tools are also reasonably priced.
If you need the highest quality then you may look into Antennahouse. The problem with Antennahouse is the huge number of vendor-specific extensions and the documentation, which is not perfectly suited for newbies. My print-css.rocks project tries to fill the gap a bit by breaking the complexity into small snippets.
Thanks so much for taking time out of your day to answer these, is there anything else you’d like to add?
Not directly related to CSS but there is a very interesting alternative for generating PDF from XML. It’s the Speedata Publisher project by Patrick Gundlach from Berlin. It’s based on TeX and uses an XML-rule based approach for adaptive rendering of highly complex publications like catalogs, brochures etc. Superb publications can be generated using Speedata. But it is not based on CSS and needs a bit more “programming” work and knowledge for creating the style rules. However, it provides outstanding quality.
End of Interview
I want to thank Andreas for taking the time to answer my questions on this subject, I learned a lot.
If you’re interested in the topic of turning your HTML documents into something printable then I’d suggest you check out my interview with Håkon Wium Lie on this very subject.