The less like 'random' XML the document is, the better the extraction will work. As soon as something oddball gets tossed in that drifts from the expected pattern, things will break.
Of course. But the mathematical, computer-science level truth is that you can make a regular pattern that recognizes a string in any context-free language so long as you're willing to place a bound on the length (or equivalently, the nesting depth) of that string. Everything else is a lie-to-children (https://en.wikipedia.org/wiki/Lie-to-children).
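To make that concrete, here is a minimal sketch using balanced parentheses as a stand-in for a context-free language. The helper name `bounded_depth_pattern` and the depth limit of 3 are my own illustrative choices, and the pattern uses only plain regular constructs (no recursion or backreferences).

```python
import re

def bounded_depth_pattern(max_depth: int) -> re.Pattern:
    """Build a regex for balanced parentheses nested at most max_depth deep.

    Hypothetical helper for illustration only.
    """
    # Depth 0: text containing no parentheses at all.
    inner = r"[^()]*"
    for _ in range(max_depth):
        # Each extra level: plain text interleaved with parenthesized
        # groups whose contents match the previous level's pattern.
        inner = rf"[^()]*(?:\({inner}\)[^()]*)*"
    return re.compile(inner)

pattern = bounded_depth_pattern(3)

ok = pattern.fullmatch("a(b(c)d)e") is not None        # depth 2, balanced
too_deep = pattern.fullmatch("((((x))))") is not None  # depth 4, over the bound
unbalanced = pattern.fullmatch("(()") is not None      # missing a close paren
print(ok, too_deep, unbalanced)
```

The pattern grows with the bound, which is the price of staying regular: anything nested deeper than `max_depth` is simply rejected.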
Our discourse does a terrible job of distinguishing impossible things from things that are merely ill-advised. Intellectual honesty requires us to be up front about the difference.
Yeah, I'd almost certainly reject a code review using, say, Python's re module to extract stuff from XML, but while doing so, I would give every reason except "you can't do that".
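For what it's worth, here is the kind of thing I'd point at in that review: a small sketch where the sample documents and the `TITLE_RE` pattern are made up for illustration. The regex works on the expected shape and silently returns nothing once an attribute and a CDATA section show up, while the standard library's parser handles both.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical extraction pattern: assumes a bare <title> tag with plain text.
TITLE_RE = re.compile(r"<title>(.*?)</title>")

expected = "<book><title>Dune</title></book>"
# Still valid XML, but with an attribute and a CDATA section.
oddball = '<book><title lang="en"><![CDATA[Dune]]></title></book>'

regex_expected = TITLE_RE.findall(expected)  # finds the title
regex_oddball = TITLE_RE.findall(oddball)    # finds nothing, and raises no error
parsed_oddball = ET.fromstring(oddball).findtext("title")

print(regex_expected, regex_oddball, parsed_oddball)
```

Nothing here is impossible to patch with a longer regex; it is just a maintenance burden that the parser already carries for you.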