Hacker News | mikeday's comments

Ohhh Nigerian Prince, I thought it was a wrapper for our Prince HTML to PDF formatter, but there is also a JavaScript implementation of Prince of Persia lol.


LoL


We've spent twenty years working on HTML to PDF conversion and I expect we could easily spend another twenty years, so feel free to give Prince a try if you would rather avoid the headache :)


Normally when we are nearly there we say it's 95% done and only 95% of the work remains. If your feeling is that you are half done, I suspect more than 50% of the work remains!

What I know having done a lot in this space is we aren't close!


yeah, we often refer to the first 90% of the work and the second 90% of the work lol.


Awesome. Out of curiosity: is Prince's core still written in Mercury? (I looked at old comments.)


Absolutely! The CSS support, layout engine, PDF output, and JavaScript interpreter are all written in Mercury, while the font support that was originally a mix of Mercury and C has now been rewritten as a standalone Rust project, Allsorts.


HarfBuzz is more complete (supports more scripts) and higher performance (we assume, haven't benchmarked yet), but the large C++ codebase can be a little intimidating to dive into. We plan to extend Allsorts to reach feature parity with HarfBuzz, so it will be an interesting comparison of tackling a complex problem in Rust!


What about compiling existing OpenType fonts to state machines in combination with the shaping rules? That's something we're thinking about for our Rust implementation :)


It's not as easy as it looks, but probably can be done. The research I was doing was very much pure transducers, but with a couple of twists (the one I'm proudest of is alternating between forward and backward passes, which makes certain things like matra reordering much easier). There is some work in this space by Monotype as well.


We use Mercury at YesLogic to write Prince, our HTML to PDF formatter! [1]

We chose it because logic/functional languages are great for tree processing, Mercury was designed for large projects, and because in 2002 there really weren't many other options around.

Its syntax and semantics are derived from Prolog, and it borrows a lot from Haskell (types, type classes). In spirit it's reminiscent of OCaml (niche, a little weird), and with support for unique modes there is some interesting overlap with Rust, although this aspect of the language still needs more compiler support.

All in all, definitely worth checking out.

[1] https://www.princexml.com/


I've seen a bunch of these (HTML -> PDF). I've never seen a succinct answer to: "How is this different/better than taking <random web browser> and hitting "print", which at least on OS X will produce a nice PDF?"


Prince is pretty powerful when it comes to print-specific stuff. We care about pagination, making tables look good across page breaks, footnotes, great justification, tables of contents, non-sRGB color space handling, crop marks, etc. We also care about great accessibility annotations (often mandatory for government documents). These are things that web browsers are less concerned with: print-to-PDF is more of an afterthought for them, whereas for us it's our main area of focus.


That's not even close to what you get with a good HTML -> PDF export, which can include anything from proper pagination, page margins and TOCs, to orphans handling and other such concerns.


The Symfony project uses PrinceXML to generate their documentation (including The Book) and it's phenomenally good.

https://symfony.com/doc/current/index.html#gsc.tab=0

Select offline, The book, 4.2 and it generates the book on the fly.


I'm not the guy you asked but I've been using PrinceXML to produce PDFs intended for customers of our client (e.g. invoices, terms and conditions, itineraries, etc.). Sure, we could just display the HTML and let either the customer or the sales agent press "Print to PDF" but it's not very user friendly—non-power users may not know that "print to PDF" is even an option—nor is it particularly practical for batch processing.

Full disclosure: If I'd had my way we would have used LaTeX templates to produce the PDFs but the previous developers had already implemented the HTML->PDF flow, so we just replaced the old, defunct service with Prince, which did a surprisingly good job, IMO.


>Sure, we could just display the HTML and let either the customer or the sales agent press "Print to PDF" but it's not very user friendly—non-power users may not know that "print to PDF" is even an option—nor is it particularly practical for batch processing.

It's not just that. For basic stuff, print to PDF can be an option. For complex documents, print workflows, etc., it's a non-starter.


> "How is this different/better than taking <random web browser> and hitting "print"

If you have to repeat this process 2000 times, it becomes time-consuming. Doing it manually doesn't scale for a single user who needs 2000 PDFs.


You don't know Headless Browser modes. There are a bunch of scriptable CLIs.


I do know about headless browsers. The comment above mine mentioned a manual process, with no scripts or headless browsers. A combo of curl, wkhtmltopdf (or another HTML-to-PDF tool) and a for loop can do this in a bash one-liner.
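To make the "for loop" idea concrete, here is a minimal Python sketch of the same batch approach. It assumes wkhtmltopdf is installed and on the PATH, and relies on it accepting a URL followed by an output file, so curl is not needed here. The function names, the doc-0001.pdf naming scheme, and the output directory are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(urls, out_dir, converter="wkhtmltopdf"):
    """Return one (command, output_path) pair per URL; nothing is executed here."""
    out_dir = Path(out_dir)
    jobs = []
    for index, url in enumerate(urls, start=1):
        # Hypothetical naming scheme: doc-0001.pdf, doc-0002.pdf, ...
        output = out_dir / f"doc-{index:04d}.pdf"
        jobs.append(([converter, url, str(output)], output))
    return jobs


def convert_all(urls, out_dir, converter="wkhtmltopdf"):
    """Run the converter once per URL and return the URLs that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for url, (command, _output) in zip(urls, build_commands(urls, out_dir, converter)):
        # Assumes the converter binary is installed and on the PATH.
        result = subprocess.run(command, capture_output=True)
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Calling convert_all(list_of_urls, "pdfs") would then produce one PDF per URL and report which ones failed, which is the part that is tedious to do by hand 2000 times.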


Prince is cool, I used it 10 years ago or so. No argument there.

It's a bit pricey, though (at least, pricier than "free"). So we're using WeasyPrint on more recent projects.

WeasyPrint is open source and written in Python. It's much slower than Prince, but this can be mitigated by caching renderings. I wouldn't bet that it's as standards-compliant or bug-free as Prince, but it's good enough for us.

When / if our customers ask for more speed or pixel-perfect support (with the $$$ to match), we will definitely try Prince again.


The Java class caught my eye. Is that a wrapper around a native lib, or do you make RPC calls to something?

HTML to PDF is something I never thought about since Firefox does it (and results usually aren’t great).


It's a wrapper around the native process, just to simplify passing command-line arguments. (There is also a persistent process mode for speeding up batch processing of many small documents.)
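For readers wondering what "a wrapper around the native process" looks like in practice, here is a rough Python sketch of the pattern. It is not the actual Java class: the PrinceWrapper name and its fields are hypothetical, and the flags used (-o, --style, --javascript) are my understanding of the prince command line and should be checked against the Prince documentation. The persistent process mode is not covered.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class PrinceWrapper:
    """Hypothetical wrapper: collects options, then spawns the native process."""
    executable: str = "prince"            # assumed to be on the PATH
    stylesheets: list = field(default_factory=list)
    javascript: bool = False

    def build_args(self, input_path, output_path):
        """Translate the object's settings into a command-line argument list."""
        args = [self.executable]
        for css in self.stylesheets:
            args.append(f"--style={css}")  # extra stylesheet (assumed flag)
        if self.javascript:
            args.append("--javascript")    # enable document scripts (assumed flag)
        args += [input_path, "-o", output_path]
        return args

    def convert(self, input_path, output_path):
        """Run one conversion in a fresh process; True means exit code 0."""
        completed = subprocess.run(
            self.build_args(input_path, output_path),
            capture_output=True,
            text=True,
        )
        return completed.returncode == 0
```

The point of the pattern is that callers set typed options on an object instead of assembling argument strings themselves; the conversion still happens in the separate native process.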

The browsers don't specialise in PDF generation, and we do :)

