Loading up your parsing code and reopening the file every time a setting is quer...

ajross · 2025-04-06T11:03:50 1743937430

The ssh config format has almost no context, and the code is static and always "loaded up". I can all but guarantee this isn't correct. Modern hackers tend to wildly overestimate the complexity of ancient tasks like parsing.

Dylan16807 · 2025-04-06T16:11:48 1743955908

If you're actually concerned about the handfuls of bytes a settings object would take, you would make the page/segment containing parser code able to be unloaded from memory.

naniwaduni · 2025-04-06T05:18:42 1743916722

You don't care about average memory use, you care about peak memory use.

Dylan16807 · 2025-04-06T16:48:24 1743958104

Same criticism. When the program is in the middle of busy runtime activity, with all the memory that entails, it's the worst time to also load up the parser.

NekkoDroid · 2025-04-06T06:56:25 1743922585

Doesn't really sound much better. You still load up the file(s) and the parser either way, so parsing all once vs on-demand is just a question of computation duration and considering how many config options are used the on-demand just seems really wasteful, especially after startup.

ajross · 2025-04-06T11:05:27 1743937527

> load up the file

I/O is done piecewise, a line at a time. The file is never "loaded up". Again you're applying an intuition about how parsers are presented to college students (suck it all into RAM and build a big in-memory representation of the syntax tree) that doesn't match the way actual config file parsers work (read a line and interpret it, then read another).

NekkoDroid · 2025-04-08T07:05:23 1744095923

I didn't mean it in a way that "all of the file is loaded into memory", just the parts you are always processing at the time (e.g. as you said line wise), which either way result in the same memory usage from the file being loaded.

hnlmorg · 2025-04-06T09:28:42 1743931722

The GP is correct in terms of super old systems.

In said systems, RAM was such an expensive resource that we had to save individual bits wherever we could. Such as only storing the last two digits of the year (aka the millennium bug).

The computational cost of infrequently rescanning the config files then freeing the memory afterwards was much cheaper than the cost of storing those config files into RAM. And I say “infrequently rescanning” because you weren’t talking about people logging in and out of TSSs at rapid intervals.

That all said, sshd was first written in the 90s so I find it hard to believe RAM considerations was the reason for the “first match” design of sshd’s config. More likely, it inherited that design choice from rsh or some other 1970s predecessor.

ajross · 2025-04-06T13:14:57 1743945297

> hard to believe RAM considerations was the reason for the “first match” design of sshd’s config

And I repeat: first match involves less code. It's a simpler design. The RAM point was an interesting digression, I literally put it in parentheses!

hnlmorg · 2025-04-06T13:21:00 1743945660

I don’t think it does require less code. I don’t think it requires more code either. It’s just not a fundamental code change.

The difference is just either: overwriting values or exiting in the presence of a match. Either way it’s the same parser rules you have to write for the config file structure.

ajross · 2025-04-06T13:56:27 1743947787

OK, but now that's a performance regression. The assumption upthread was that the whole file needed to be parsed into an in-memory representation. If you don't do that, sure, you can implement precedence either way. But parsing all the way to the end for every read is ridiculous. The choice is between "parse all at once", which allows for arbitrary precedence rules but involves more code, and "parse at attribute read time", which involves less code but naturally wants to be a "first match" precedence.

hnlmorg · 2025-04-07T08:21:26 1744014086

As someone who’s written multiple parsers, I can tell you that having a parser that stops upon matched condition requires a lot more careful thought about how that parser is called in a reusable way while still allowing for multiple different types of parameters to be stored in that configuration.

For example:

- You might have one function that requires a little of all known hosts (so now your “stop” condition isn’t a match but rather a full set)

- another function that requires matching a specific private key for a specific host (a traditional match by your description)

- a third function that checks if the host IP and/or host name is a previously known host (a match but no longer scanning host names, so you now need your conditional to dynamically support different comparables)

- and a forth function to check what public keys are available which user accounts (now you’re after a dynamic way to generate complete sets because neither the input nor the comparison are fixed and you’re not even wanting the parser to stop on a matched condition)

Because these are different types of data being referenced with different input conditions, you then need your parser to either be Turing complete or different types of config files for those different input conditions thus resulting in writing multiple different parsers for each of those types of config (sshd actually does the latter).

Clearly the latter isn’t simpler nor less code any more.

If you’re just after simplicity from a code standpoint then you’d make your config YAML / JSON / TOML or any other known structured format with supporting off-the-shelf libraries. And you’d just parse the entire thing into memory and then programmatically perform your lookups in the application language rather than some config DSL you’ve just had to invent to support all of your use cases.

Dylan16807 · 2025-04-06T16:09:23 1743955763

You introduced an n^2 config algorithm but now you're worried about the much smaller performance impact from going to the end of the file instead of sometimes stopping halfway?

And for the record I'm not convinced your way is simpler. The code gets sprinkled with config loading calls instead of just checking a variable, and the vast majority of the parser is the same between versions.

ajross · 2025-04-06T18:41:01 1743964861

You're not discussing in good faith. The performance comparison was to the "parse to the end" variant that you suggested as equivalent. The natural way you implement that (again, very simple) algorithm wants the early exit, yes, for obvious performance reasons.

We're done. You're "not convinced" my way is simpler because you're not willing to give ground at all. This is a dumb thing to argue about. Just look at some historical parsers for similar languages, I guess. Nothing I've said here is controversial at all.

Dylan16807 · 2025-04-06T22:16:49 1743977809

You have me confused with someone else.

Different people are making different points. Nobody is arguing in bad faith.