Hacker News

What language do you use parser combinators in, and what kind of grammars do you usually parse? Nom was terribly verbose and unergonomic even by Rust's standards. Haskell's Megaparsec/Parsec is good but yeah, it's Haskell: you need to handle multiple monads at once (Parser itself is monadic, then your AST state, and maybe some error handling), and that's where I got confused. But I appreciated the elegance.

I experimented with parser combinators in Haskell and Rust (nom), then moved on to parser generators in Rust (pest.rs), OCaml (Menhir), and Haskell (Happy), and finally ended up with Python's Lark. The speed of experimenting with different syntaxes and grammars is just insane.



Parser combinators are more of a concept than a library. You could write your own, supporting just the stuff you need. I like writing programs in languages I don't know or barely know. I usually just take one of the popular libraries in any given language.
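To make the "concept, not a library" point concrete, here is a minimal sketch of hand-rolled combinators in Python. All the names (`char`, `many1`, `seq`, `fmap`, `addition`) are made up for illustration and do not mirror Nom, Parsec, or any other library. A parser here is just a function from (text, position) to (value, new position), or None on failure.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position),
# or None if it does not match at that position.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def char(pred: Callable[[str], bool]) -> Parser:
    """Match a single character satisfying `pred`."""
    def parse(s: str, i: int):
        if i < len(s) and pred(s[i]):
            return s[i], i + 1
        return None
    return parse

def many1(p: Parser) -> Parser:
    """Match `p` one or more times, collecting the results in a list."""
    def parse(s: str, i: int):
        out = []
        while (r := p(s, i)) is not None:
            value, i = r
            out.append(value)
        return (out, i) if out else None
    return parse

def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(s: str, i: int):
        out = []
        for p in parsers:
            r = p(s, i)
            if r is None:
                return None
            value, i = r
            out.append(value)
        return out, i
    return parse

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by `p` with `f`."""
    def parse(s: str, i: int):
        r = p(s, i)
        if r is None:
            return None
        value, j = r
        return f(value), j
    return parse

# Build bigger parsers out of smaller ones.
number = fmap(many1(char(str.isdigit)), lambda digits: int("".join(digits)))
plus = char(lambda c: c == "+")
addition = fmap(seq(number, plus, number), lambda parts: parts[0] + parts[2])

print(addition("12+30", 0))  # (42, 5)
print(addition("12+", 0))    # None
```

Real libraries add error messages, backtracking control, and a lot of ready-made combinators on top, but the core idea is about this small.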

For Rust I used Nom and I didn't mind it all that much, although I noticed it's quite baroque. If I had more to write I'd probably make some wrappers or macros of my own for the most commonly used Nom snippets.


I've used tree-sitter to generate my parsers in Rust, and I just work with the untyped syntax tree it generates, which gives you error tolerance for free. It's a bit of setup at first though, requiring an extra crate for the generated parser, but editing it from there saves so much time.


What do you mean exactly by "error-tolerance"? Is it that each node is wrapped in a result type that you have to match against each time you visit it, even though you know for a fact that it is not empty, or something like that?

I suppose that one of the pros of using tree-sitter is its portability? For example, I could define my grammar once and use it both to parse my code and to do proper syntax highlighting in the browser, with the same library and the same grammar? Is that correct? Also, it is used extensively in Neovim to define syntax for languages? Otherwise I would have had to slightly modify the grammar.


Oh no no, with tree-sitter you get an untyped syntax tree. That means you have a Cursor object to walk the tree, which creates Node objects as you traverse. These have a "kind" (the name of the tree-sitter node), a span, and children. (I recommend using the Rust tree-sitter bindings themselves, not the Rust wrapper rust-sitter.)

Yes, portability like that is a huge benefit, though I personally haven't utilized it for that yet. I just use it as an error-tolerant frontend to my compiler.

As to how errors are reported, tree-sitter creates an ERROR or MISSING node when a particular subtree has invalid syntax. I've found that it never leaves a node in an invalid state, so it would never create a binaryop(LeftNode(...), Op, ERROR) if RightNode is not optional. Instead it would create an ERROR for the binaryop too. This allows you to safely unwrap known fields. ERROR nodes only really bunch up in repeat()s and optional()s, where you would implicitly handle them.
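A rough sketch of what working with such an untyped tree looks like, in Python for brevity. This is not the tree-sitter API: the `Node` class, `walk`, and `error_spans` are hypothetical stand-ins, the tree is built by hand rather than produced by a parser, and MISSING nodes are left out to keep it short. It only illustrates the shape described above: every node has a kind, a span, and children, and invalid input shows up as an ERROR node sitting next to valid siblings.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class Node:
    kind: str                      # grammar rule name, or "ERROR"
    span: Tuple[int, int]          # (start, end) offsets into the source
    children: List["Node"] = field(default_factory=list)

def walk(node: Node) -> Iterator[Node]:
    """Yield `node` and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)

def error_spans(root: Node) -> List[Tuple[int, int]]:
    """Collect the source spans of every ERROR node in the tree."""
    return [n.span for n in walk(root) if n.kind == "ERROR"]

# Hand-built tree for the source "a = 1 + ; b = 2;" (an assumption for
# illustration). The broken first statement is a single ERROR node inside
# the top-level repeat, while the second statement is a fully valid node.
tree = Node("source_file", (0, 16), [
    Node("ERROR", (0, 8)),
    Node("assignment", (10, 15), [
        Node("identifier", (10, 11)),
        Node("number", (14, 15)),
    ]),
])

print([n.kind for n in walk(tree)])
print(error_spans(tree))  # [(0, 8)]
```

Because the valid `assignment` node is complete, code visiting it can index its children directly without checking each one for errors.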

For an example, I can only point you to my own use: https://github.com/pc2/sus-compiler

tree-sitter-sus has the grammar

sus-proc-macro has nice proc macros for dealing with it (kind!("binop"), field!("name"), etc)

src/flattening/parser.rs has conveniences like iterating over lists

and src/flattening/flatten.rs has the actual conversion from syntax tree to SUS IR


Error tolerance in this context means the parser produces a walkable AST even if the input code is syntactically invalid, instead of just throwing/reporting the error. It’s useful for IDEs, where the code is often in an invalid state as the developer is typing, but you still want to be able to report diagnostics on whatever parts of the code are syntactically valid.
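As a toy illustration of that definition (and not of how tree-sitter recovers internally, which is far more sophisticated), here is a sketch in Python of a parser for a made-up `name = number;` statement language. The grammar, the `parse_tolerant` name, and the tuple-based node format are all assumptions for the example. Instead of raising on the first bad statement, it records an ERROR node with its span and keeps going.

```python
import re
from typing import List, Tuple

# One statement of the hypothetical language: `name = number`
STATEMENT = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*")

def parse_tolerant(source: str) -> List[Tuple]:
    """Parse `name = number;` statements, emitting ERROR nodes instead of raising."""
    nodes: List[Tuple] = []
    start = 0
    for chunk in source.split(";"):
        end = start + len(chunk)
        if chunk.strip():  # ignore empty statements and trailing whitespace
            m = STATEMENT.fullmatch(chunk)
            if m:
                nodes.append(("assign", m.group(1), int(m.group(2))))
            else:
                # Keep the span so a diagnostic can point at the bad text.
                nodes.append(("ERROR", (start, end)))
        start = end + 1  # skip past the ';' separator
    return nodes

print(parse_tolerant("a = 1; b = ; c = 3;"))
# [('assign', 'a', 1), ('ERROR', (6, 11)), ('assign', 'c', 3)]
```

The statements before and after the broken one still come out as normal nodes, which is what lets an IDE keep reporting on the valid parts while you type.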



