Concatenative languages like Factor and Forth are close to optimal for raw lexical density: no parentheses, no commas, no argument delimiters, just whitespace-separated words. That said, stack shuffling adds overhead once the data flow gets complex, unless you reach for something like Factor's locals.
C is surprisingly efficient as well. Minimal keywords, terse syntax, single-character operators. Not much boilerplate, and the core logic is dense.
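To make the density claim concrete, here's a minimal sketch of my own (not from any benchmark): an in-place string reverse where the core loop is a single line of single-character operators, with no ceremony beyond the function signature.

```c
#include <stdio.h>
#include <string.h>

/* In-place string reverse: the whole algorithm is one dense loop. */
void rev(char *s) {
    for (char *e = s + strlen(s); s < --e; s++) {
        char t = *s; *s = *e; *e = t;   /* swap ends, walk inward */
    }
}

int main(void) {
    char buf[] = "token efficiency";
    rev(buf);
    puts(buf);   /* -> "ycneiciffe nekot" */
}
```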
I think the worst for token efficiency are Java and C# (class boilerplate, long identifiers) and Rust (lifetime annotations, verbose generics).
In my opinion, C or Go for imperative code, Factor / Forth if the model knows them well.
Is that claim about C based on anything in particular? C ranked 18th of the languages in the article's chart (dead last!), which I'd guess comes down to its minimal standard library.
Fair point. There's a distinction between syntactic efficiency (C is terse) and task-completion efficiency (what the benchmark likely measured). If the tasks involved string manipulation, hash maps, JSON, etc., then C pays a massive token tax because you're implementing what other languages provide in their standard libraries. Python has dict and json.loads(); C has malloc and strcmp.
So: C tokenizes efficiently for equivalent logic, but stdlib poverty makes it expensive for typical benchmark tasks. The same applies to Factor and Forth, arguably worse.
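To put a shape on the token tax, compare word-frequency counting. In Python it's essentially one stdlib call (`collections.Counter(text.split())`); in C you hand-roll the hash map first. A sketch under simplifying assumptions (fixed-size open-addressing table, whitespace-only tokenization, no resizing or error handling):

```c
/* Word-frequency count in C. Everything below replaces what Python's
   collections.Counter does in one call. Fixed table size and no
   resizing are simplifications for this sketch. */
#include <stdio.h>
#include <string.h>

#define TABLE 1024   /* power of two, so we can mask instead of mod */

struct entry { char *word; int count; };
static struct entry tab[TABLE];

/* FNV-1a hash over the word's bytes. */
static unsigned hash(const char *s) {
    unsigned h = 2166136261u;
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
    return h;
}

/* Increment the count for w, inserting on first sighting. */
static void bump(const char *w) {
    unsigned i = hash(w) & (TABLE - 1);
    while (tab[i].word) {
        if (strcmp(tab[i].word, w) == 0) { tab[i].count++; return; }
        i = (i + 1) & (TABLE - 1);   /* linear probe to next slot */
    }
    tab[i].word = strdup(w);         /* strdup is POSIX (and C23) */
    tab[i].count = 1;
}

int main(void) {
    char text[] = "the quick brown fox jumps over the lazy dog the end";
    for (char *w = strtok(text, " "); w; w = strtok(NULL, " "))
        bump(w);
    for (int i = 0; i < TABLE; i++)
        if (tab[i].word)
            printf("%-6s %d\n", tab[i].word, tab[i].count);
    return 0;
}
```

The counting logic itself is a few lines; the other thirty exist only because the standard library doesn't ship a dict, and that overhead is exactly where C's token counts balloon on benchmark-style tasks.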
I understand your logic, but I've found LLMs to be quite strong at C#. They make only minor mistakes, and those seem related to the complexity of what I'm doing, not the language itself.