
What is the negative?


At least on older hardware, the number of reboots was a stronger correlate of failure than hours powered on.

I'll readily admit this may be apocryphal. It was a common adage when I was a child in the 80s, and now that I'm actually qualified to suss out such a statement, I've never cracked open the historical literature at archive.org to actually check this one.

It could just be a carryover from incandescent light bulbs (where this is true) and older cars (where this is also true). The idea of non-technical people assuming the magical computer dust has the same problem is understandable.


One of the highest stresses on passives and power components occurs when there is an inrush of current (di/dt) or a voltage spike (dv/dt), which can occur on power cycling or plug-in. So it is not a myth that hard reboots can be stressful on older hardware, but there is a red herring element, because power cycling is also the moment when an aged or diseased component is likely to show a failure caused by its hours of service.

Modern devices and standards can, at low cost, implement ripple suppression, transient and reverse-voltage protection, inrush limiting, and the like. So failures are more isolated.

Nowadays, with standards like USB, inrush limiting, reverse-voltage protection, and transient suppression cost very little to implement, so failures are mostly going to be in the power supplies.
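As a rough illustration of why the switch-on moment is the stressful one, here is a back-of-the-envelope sketch in Python. All component values are hypothetical and chosen only to show the order of magnitude of inrush into a capacitor-input supply, with and without a limiter.

    # Rough back-of-the-envelope inrush estimate for a capacitor-input supply.
    # All component values here are hypothetical, chosen only for illustration.

    V_PEAK = 325.0          # peak of 230 V RMS mains, in volts
    ESR_PLUS_WIRING = 0.5   # assumed series resistance of the charging path, ohms
    NTC_COLD = 10.0         # assumed cold resistance of an inrush-limiting NTC, ohms
    BULK_CAP = 470e-6       # assumed bulk capacitance, farads

    def peak_inrush(series_r):
        """Worst case: the cap is empty and we switch on at the mains peak, so I = V / R."""
        return V_PEAK / series_r

    def charge_time_constant(series_r):
        """Time constant tau = R * C of the charging path."""
        return series_r * BULK_CAP

    for label, r in [("no limiter", ESR_PLUS_WIRING),
                     ("with cold NTC", ESR_PLUS_WIRING + NTC_COLD)]:
        print(f"{label:14s} peak ~{peak_inrush(r):.0f} A, tau ~{charge_time_constant(r) * 1e3:.1f} ms")

Under these assumed values, the unlimited peak works out to hundreds of amps for a fraction of a millisecond, versus a few tens of amps with the limiter in place.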


Power bricks are certainly a huge failure point. Luckily they're cheap. It's a great place to fail if we must assign the failure somewhere.


You have to close all windows (and possibly tabs in your editor), restart long running jobs you have in the background, restart your SSH sessions, lose your undo tree, lose all the variables you have loaded in your interpreter or bash, among others that I have possibly forgotten.

All recoverable, but annoying. I can't imagine doing that every day. It's fine for a home computer, but for a workstation, I just want it always on. Though these days even my personal laptop is essentially always on.


For a personal machine, it's fine to leave it always on.

> All recoverable, but annoying.

For a machine that other people are supposed to rely upon, I'd rather exercise the recovery you're talking about regularly, so I know it works when I need it.

For a production system, I'd rather live through its first day of uptime 10,000 days in a row than set new uptime records every day. In production, you don't want to do anything for the first time if you can avoid it.


For production it's highly dependent on the business needs. But restarting the entire estate every day is, at the very least, a big enough hit to capacity that it may already be prohibitive without any further consideration.

Not to mention that it would require every service to be prepared to be restarted daily, which could require a more complex system than you'd need otherwise.


I'd want to restart individual components often, probably not the whole system at once.

Basically, whatever e.g. Google is doing.
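A minimal sketch of that idea: recycle one instance at a time and only move on once the restarted instance reports healthy again. The instance names and the restart/health hooks below are stand-ins, not any real orchestration API.

    # Minimal rolling-restart sketch: one instance at a time, wait for health.
    import time

    INSTANCES = ["web-1", "web-2", "web-3"]   # assumed fleet

    def restart(instance):
        print(f"restarting {instance} ...")   # stand-in for the real restart call

    def healthy(instance):
        return True                           # stand-in for a real health probe

    def rolling_restart(instances, settle_seconds=1):
        for inst in instances:
            restart(inst)
            time.sleep(settle_seconds)        # give the instance a moment to come back
            while not healthy(inst):          # never touch the next one until this one is up
                time.sleep(1)

    rolling_restart(INSTANCES)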


I doubt Google restarts all their machines once a day. Obviously not all at once, otherwise they'd have massive downtime. But anyway, Google's needs are very different from those of just about any other company on earth (except for a handful of others like Facebook and Amazon). So they are usually not the best example.


Yes, once a day was an example. Google uses different frequencies. However I do remember getting automated nagging messages at Google when a service hadn't been redeployed (and thus restarted) for more than two weeks.

Google as a whole might be different from other companies, yes. But smaller departments in Google aren't that different from other companies.
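A hypothetical sketch of that kind of staleness nag: flag any service whose last deploy is older than a threshold. The service names and ages are made up for illustration; a real check would query the deployment system.

    from datetime import datetime, timedelta, timezone

    MAX_AGE = timedelta(days=14)   # assumed threshold: two weeks

    now = datetime.now(timezone.utc)
    last_deploy = {                # stand-in data
        "example-frontend": now - timedelta(days=21),
        "example-backend": now - timedelta(days=3),
    }

    for service, deployed_at in last_deploy.items():
        age = now - deployed_at
        if age > MAX_AGE:
            print(f"nag: {service} has not been redeployed for {age.days} days")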


Getting those messages is very different from (and a lot more reasonable than) forcefully restarting them, which was the initial suggestion.

The restart risk is normally so small that several other things are more important than constantly restarting just to test that restarts work. Continuous delivery, security patching, and hardware failures will likely cause enough restarts anyway.


Downtime.



