Oga: a new XML/HTML parser for Ruby (2014) (yorickpeterse.com)
36 points by redman25 on Feb 20, 2016 | 7 comments


Kind of sad: a brand-new HTML parser that doesn't even follow the HTML spec (which nowadays defines how to parse any character stream, including all non-conforming documents). :(


Has this actually posed any problems for you? If so, could you report these as issues if you haven't already done so?


Nokogiri's problems are compounded by the fact that a lot of major Ruby gems still depend on it, so sometimes you can't get around the long build time. A new parser is only one teeny tiny step towards fixing that problem.

These are libraries other businesses now rely upon. Should the maintainers of such a gem risk its stability (and thus the reputation of the project) by swapping out such an important dependency? It's incumbent on the designers of the replacement and the community behind it to make that case.


How does this compare to Ox?

https://github.com/ohler55/ox


This is briefly covered in https://github.com/YorickPeterse/oga#why-another-htmlxml-par...:

> Ox looks very promising but it lacks a rather crucial feature: parsing HTML (without using a SAX API). It's also again a C extension making debugging more of a pain (at least for me).

Performance-wise, Ox is generally a tad faster.


Is there a canonical set of documents and benchmarks for XML parsers?


Needs 2014 in the title. Oga has been around for a while now and seems to be pretty stable.



