
It's always the edge cases that make this a pain.

The less like 'random' XML the document is, the better the extraction will work. As soon as something oddball gets tossed in that drifts from the expected pattern, things will break.



Of course. But the mathematical, computer-science level truth is that you can make a regular pattern that recognizes a string in any context-free language so long as you're willing to place a bound on the length (or, more usefully, the nesting depth) of that string. Everything else is a lie-to-children (https://en.wikipedia.org/wiki/Lie-to-children).
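
To make that concrete, here is a minimal sketch for the simplest nested language, balanced parentheses, rather than XML itself. The helper name bounded_balanced and the choice of parentheses are my own assumptions for illustration; the idea is just to unroll the recursion a fixed number of times so that the result is an ordinary regular expression.

```python
import re


def bounded_balanced(depth: int) -> str:
    """Build a regex source matching balanced parentheses nested at most `depth` deep.

    Hypothetical helper for illustration: the recursion of the grammar
    S -> ( S ) S | empty is unrolled `depth` times.
    """
    pattern = ""  # depth 0 accepts only the empty string
    for _ in range(depth):
        # One more level: zero or more "(" <previous level> ")" groups.
        pattern = r"(?:\(" + pattern + r"\))*"
    return pattern


matcher = re.compile(bounded_balanced(3))

print(bool(matcher.fullmatch("(()(()))")))  # nested 3 deep, within the bound
print(bool(matcher.fullmatch("(((())))")))  # nested 4 deep, exceeds the bound
print(bool(matcher.fullmatch("(()")))       # unbalanced
```

Raising the bound means regenerating the pattern, which is the trade-off: the regex is correct only up to the depth you committed to in advance.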


You can, but you probably shouldn't, since said regex is likely to be very hard to work with due to the number of redundant states involved.


Our discourse does a terrible job of distinguishing impossible things from things that are merely ill-advised. Intellectual honesty requires us to be up front about the difference.

Yeah, I'd almost certainly reject a code review using, say, Python's re module to extract stuff from XML, but while doing so, I would give every reason except "you can't do that".



