Considering that you were seeing unpredictable behavior in the boot selector, with it randomly freezing, I would assume a hardware component (RAM?) kicked the bucket. If it were firmware corruption, it would consistently fail to present the menu, or wouldn't boot at all.
Microsoft's code quality might not be at its peak right now, but blaming them for what's most likely a hardware fault isn't very productive IMO.
Or something hit its maximum program-erase cycle count and is returning corrupt/old data. Flash ROMs tend to become "sticky" with previous states as you write more to them. I think it's possible that ROMs used for early SoC boot firmware or peripheral firmware still don't have wear leveling, so they could become unusable after just a hundred or so writes.
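To illustrate the wear-leveling point, here's a toy sketch in C (the block count and P-E budget are made-up numbers, not real flash parameters): without wear leveling, every rewrite hits the same erase block, so that one block burns through its whole program-erase budget while the rest of the chip sits nearly unused.

```c
/* Toy model of flash wear, not real firmware code.
 * NUM_BLOCKS and PE_LIMIT are made-up numbers for illustration only. */
#include <stdio.h>

#define NUM_BLOCKS 8
#define PE_LIMIT   100   /* hypothetical program-erase budget per block */

int main(void) {
    int wear_fixed[NUM_BLOCKS] = {0};    /* no wear leveling: always block 0 */
    int wear_leveled[NUM_BLOCKS] = {0};  /* naive round-robin wear leveling */

    for (int write = 0; write < 400; write++) {
        wear_fixed[0]++;                     /* same block every time */
        wear_leveled[write % NUM_BLOCKS]++;  /* spread across all blocks */
    }

    printf("no leveling:  block 0 at %d/%d P-E cycles (worn out: %s)\n",
           wear_fixed[0], PE_LIMIT, wear_fixed[0] > PE_LIMIT ? "yes" : "no");
    printf("round-robin:  worst block at %d/%d P-E cycles\n",
           wear_leveled[0], PE_LIMIT);
    return 0;
}
```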
It could very well be something poorly configured in the boot chain leading to random failures. Plenty of hardware is configured by software, which can lead to many different kinds of random failures.
Maybe! I could certainly see something like the firmware switching on something way heavier that pulls down an already marginal supply.
Remember the very early Raspberry Pis that had the polyfuses that dropped a little too much voltage from the "5V" supply, so a combination of shitty phone charger, shitty charging cable, and everything just being a little too warm/cold/wrong kind of moonlight would just make them not boot at all?
They would lose USB once in a while if run 24/7. We had to make them reboot themselves every couple of hours. Fortunately it didn't matter for what we were doing with them.
That's plausible, but I'd expect the UEFI patches to come from a vendor, not Microsoft. So if one came from Qualcomm, and they didn't properly specify the devices it should be installed on, that wouldn't make it Microsoft's fault.
So the "hardware failure" happening exactly at the same time the Windows update installation failed are not related? That sounds like a one in a billion kind of coincident.
An upgrade process involves heavy CPU use, disk reads/writes, and at least a few power cycles in a short time period. Depending on what OP was doing with it otherwise, it could've been the highest temperature the device had ever seen. It's not so crazy.
My guess would've been SSD failure, which would make sense to surface after lots of writes. In the olden days I used to cross my fingers when rebooting spinning disk servers with very long uptimes because it was known there was a chance they wouldn't come back up even though they were running fine.
Not for a server, but many years ago my brother had his work desktop fail after he let it cold boot for the first time in a very long time.
Normally he would leave his work machine turned on but locked when leaving the office.
Office was having electrical work done and asked that all employees unplug their machines over the weekend just in case of a surge or something.
On the Monday my brother plugged his machine in and it wouldn’t turn on. Initially the IT guy remarked that my brother didn’t follow the instructions to unplug it.
He later retracted the comment after it was determined the power supply capacitors had gone bad a while back, but the issue with them was not apparent until they had a chance to cool down.
> In the olden days I used to cross my fingers when rebooting spinning disk servers with very long uptimes because it was known there was a chance they wouldn't come back up even though they were running fine.
HA! Not just me then!
I still have an uneasy feeling in my gut when doing reboots, especially on AM5 where the initial memory training can take 30s or so.
I think most of my "huh, it's broken now?" experiences as a youth were probably the actual install getting wonky, rather than the rare "it exploded" hardware failure after a reboot, though that definitely happened too.
I'd like to add my reasoning for a similar failure of an HP Proliant server I encountered.
Sometimes hardware can fail during long uptime and not become a problem until the next reboot. Consider a piece of hardware with 100 features. During typical use, the hardware may only use 50 of those features. Imagine one of the unused features has failed. This would not cause a catastrophic failure during typical use, but on startup (which rarely occurs) that feature is necessary and the system will not boot without it. If the system could get past boot, it could still perform its task, because the damaged feature is not needed. But it can't get past the boot phase, where the feature is required.
Tl;dr the system actually failed months ago and the user didn't notice because the missing feature was not needed again until the next reboot.
Is there a good reason why upgrades need to stress-test the whole system? Can't they go slowly, throttling resource usage to background levels?
They involve heavy CPU use and stress the whole system completely unnecessarily; the system can easily hit the highest temperature the device has ever seen during these stress tests. If during that strain something fails or gets corrupted, it's a system-level corruption...
Incidentally, Linux kernel upgrades are no better. During DKMS updates the CPU load skyrockets, and then a reboot is always sketchy. There's no guarantee that something won't go wrong; a Secure Boot issue after a kernel upgrade in particular can be a nightmare.
To answer your question, it helps to explain what the upgrade process entails.
In the case of Linux DKMS updates: DKMS is re-compiling your installed kernel modules to match the new kernel. Sometimes a kernel update will also update the system compiler. In that instance it can be beneficial for performance or stability to have all your existing modules recompiled with the new version of the compiler. The new kernel comes with a new build environment, which DKMS uses to recompile existing kernel modules to ensure stability and consistency with that new kernel and build system.
Also, kernel modules and drivers may have many code paths that should only be run on specific kernel versions. This is called 'conditional compilation' and it is a technique programmers use to develop cross-platform software. Think of this as one set of source code files that generates wildly different binaries depending on the machine that compiled it. By recompiling the source code after the new kernel is installed, the resulting binary may be drastically different than the one compiled by the previous kernel. Source code compiled on a 10 year old kernel might contain different code paths and routines than the same source code that was compiled on the latest kernel.
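A minimal sketch of what that looks like for an out-of-tree kernel module (the module itself and the 6.2 cutoff are hypothetical, purely to show the mechanism): the same source compiles into different binaries depending on which kernel headers DKMS builds it against.

```c
/* Hypothetical out-of-tree module showing conditional compilation.
 * DKMS rebuilding it against the new kernel's headers is what picks
 * the branch; the 6.2.0 cutoff here is only an example. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/version.h>

static int __init example_init(void)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 2, 0)
	pr_info("example: code path compiled for 6.2+ kernels\n");
#else
	pr_info("example: code path compiled for pre-6.2 kernels\n");
#endif
	return 0;
}

static void __exit example_exit(void)
{
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
```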
Compiling source code is incredibly taxing on the CPU and takes significantly longer when CPU usage is throttled. Compiling large modules on extremely slow systems could take hours. Managing hardware health and temperatures is mostly a hardware-level decision controlled by firmware on the hardware itself. That is usually abstracted away from software developers, who need to be certain that the machine running their code is functional and stable enough to run it. This is why we have "minimum hardware requirements."
Imagine if every piece of software contained code to monitor and manage CPU cooling. You would have software fighting each other over hardware priorities. You would have different systems for control, with some more effective and secure than others. Instead the hardware is designed to do this job intrinsically, and developers are free to focus on the output of their code on a healthy, stable system. If a particular system is not stable, that falls on the administrator of that system. By separating the responsibility between software, hardware, and implementation we have clear boundaries between who cares about what, and a cohesive operating environment.
The default could be that a background upgrade should not be a foreground stress test.
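For what it's worth, on Linux a background job can deprioritize itself; here's a rough sketch (an assumption about what an updater *could* do, not a description of how Windows Update or any package manager actually behaves):

```c
/* Sketch: a background job lowering its own CPU priority on Linux.
 * This is an assumption about what an updater could do, not how any
 * particular updater actually works. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <sys/resource.h>

int main(void) {
    /* Niceness 19: scheduled after normal-priority work. */
    if (setpriority(PRIO_PROCESS, 0, 19) != 0)
        perror("setpriority");

    /* SCHED_IDLE: only gets CPU time the rest of the system doesn't want. */
    struct sched_param sp = { .sched_priority = 0 };
    if (sched_setscheduler(0, SCHED_IDLE, &sp) != 0)
        perror("sched_setscheduler");

    /* ... heavy work (unpacking, recompiling, etc.) would go here ... */
    printf("running at background priority\n");
    return 0;
}
```

I/O could be deprioritized similarly (e.g. the idle I/O scheduling class via ionice, where the scheduler honors it), which is arguably the bigger win during an upgrade.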
Imagine you are driving a car and from time to time, without any warning, it suddenly starts accelerating and decelerating aggressively. Your powertrain, engine, and brakes are getting wear and tear, oh and at random that car also spins out and rolls, killing everyone inside (data loss).
This is roughly how current unattended upgrades work.
That was absolutely slamming the hardware. (source: worked on Android, and GP's comments re: this are 100% correct. I’d need a bit more, well anything, to even come around to the idea the opposite is even plausible. Best steelman is naïvete, like “aren’t updates just a few mvs and a reboot?”)
Over my 35 years of computer use, most hardware failures (very, very rare) happen during a reboot or power-on. And most of my reboots happen when installing updates. It actually seems like a very high probability in my limited experience.
Of course, it’s possible that the Windows update was a factor, when combined with other conditions.
There's also the case where the hardware has failed but the system is already up so it just keeps running. It's when you finally go to reboot that everything falls apart in a visible manner.
This is one of the reasons I am not a fan of uptime worship. It's not a stable system until it's able to cold boot.
Say you have a system that has been online for 5 years continuously until a power outage knocks it out. When power is restored, the system doesn't boot to a working system. How far back do you have to go in your backups to find a known-good system? And this isn't just about hardware failure, it's an issue of configuration changes, too.
I also notice that people with lots of experience with computers will automatically reboot when they encounter minor issues (have you tried turning it off and on again?).
When it then completely falls apart on reboot, they spend several hours trying to fix it and completely forget the "early warning signs" that motivated them to reboot in the first place.
I think the same applies to updates. I know the time I'm most likely to think about installing updates is when my computer is playing up.
I try to do the opposite, and reboot only as a last resort.
If I reboot it and it starts working again, then I haven't fixed it at all.
Whatever the initial problem was is likely to still be present after the reboot -- and it will tend to pop up again later even if things temporarily seem to be working OK.
> Whatever the initial problem was is likely to still be present after the reboot
You only know this after the reboot. Reboot to fix the issue and if it comes back then you know you have to dig deeper. Why sink hours of effort into fixing a random bit flip? I'll take the opposite position and say that especially for consumer devices most issues are caused by some random event resulting in a soft error. They're very common and if they happen you don't "troubleshoot" that.
For all we know, this thing was on its last legs (these machines do run very hot!) and the update process might have been the final nail in the coffin. That doesn't mean Microsoft set out to kill OP's machine... Same thing could have happened if OP ran make -j8 -- we wouldn't blame GNU make.
I had a friend's dad's computer's HDD fail while I was installing Linux on it to show it to him. That was terrifying. I still remember the error, and I just left with it (and Windows) unable to boot. Later my friend told me that the drive was toast.
Come to think of it, maybe it was me. I might have trashed the MBR? I remember the error, though, "Non system disk or disk error".
Yeah, I think so. It's been ~25 years, and only while typing out that comment did I remember the error message and realize that's probably what I had done.
If I recall correctly, he ended up scrapping the drive.
I've fixed thousands of PCs and Macs over my career. Coincidences like this happen all the time. I mean, have you seen the frequency of updates these days? There's always some kind of update happening. So the chances of your system breaking during an update are not actually that slim.
> That sounds like a one-in-a-billion kind of coincidence
Hardware is more likely to fail under load than at idle.
Blaming the last thing that was happening before hardware failed isn't a good conclusion, especially when the failure mode manifests as random startup failures instead of a predictable stop at some software stage.
Windows Update just doing a normal write could cause the chunk of flash memory holding something in the boot loader to be remapped to a different, failed/failing section.
A software update can absolutely trigger or unmask a hardware bug. It’s not an either/or thing; it’s usually (if a hardware issue is actually present) both in tandem.
I'm not so sure, I've had a similar-ish issue on a W10 PC. I vaguely suspect a race condition on one of the drivers; I've specifically got my eye on the esp32 flashing drivers.
Sometimes it boots fine, sometimes the spinning dial disappears and it gets hung on the black screen, sometimes it hangs during the spinning dial and freezes, and very occasionally blue screens with a DPC watchdog violation. Oddly, it can happen during Safe Mode boots as well.
I would think hardware, but RAM has been replaced and all is well once it boots up. I can redline the CPU and GPU at the same time with no issues.
When something works flawlessly and starts to fail after an update (so no user action there), this could mean the update made the hardware fail. For example, overuse of flash in an SSD (it's already been reported: https://community.spiceworks.com/t/anyone-else-have-their-ss...) or reflashing a component too many times (a simple error in scripts).
I would test the CPU cooler since the fans ran so hard. Temps ramp up around the login screen, then stay hot and reboots get unpredictable.
I recently had a water cooler pump die during a Windows update. The pump was going out, but the unthrottled update getting stuck on a monster CPU finished it off.
With the original Arduino Due there was some fun undocumented behavior with the MCU (an Atmel Cortex-M3) where it would do random things at boot unless you installed a 10k resistor: anything from booting off of flash or ROM at random to clocks not coming up correctly.
I swear I was doing just fine with it booting reliably until I decided to try flashing it over the SWD interface. But wouldn't you know it, soldering a resistor fixed it. Mostly.
An analysis tool for planes and sails operating at low Reynolds numbers
flow5 is a potential flow solver with built-in pre- and post-processing functionalities. Its purpose is to make preliminary designs of wings, planes, hydrofoils and sails reliable, fast and user-friendly.
Beyond hover detection causing the app to preload (TIL that's apparently a thing? Can anyone confirm?), another case I've seen is trying to slide up to unlock but accidentally triggering the lock screen camera for a millisecond or two, which also causes the indicator to linger for a few seconds.
edit: Is this actual "hover without touching screen", which is what I was shocked about, or is this more like "finger passes over the icon while swiping between pages"?
In previous versions, you could change it mid-loop. This apparently caused some unintuitive behavior when paired with generators (e.g. `for k, v in pairs(table)`).
I haven't run into this myself, but it does make sense, and eliminating this footgun sounds like a good idea.
I'm in Europe and I see a lot more than that. Apologies for the ugly formatting, mobile:
As part of changing laws in Europe, Meta now offers the option for you to chat with others using third-party messaging apps that have integrated with WhatsApp and that you choose to turn on.
Note: Chats with third-party apps are only available in select regions and may not be available to you.
- You can send messages, photos, videos, voice messages, and documents to end users of supported messaging services that have integrated with WhatsApp.
- Messages or other content you send from WhatsApp to third-party users are encrypted in transit, and WhatsApp can’t see them.
- Third-party apps have their own policies and they might handle your data differently than WhatsApp.
## Eligibility requirements to turn on third-party chats with WhatsApp
- Third-party chats with WhatsApp are only available to users with a WhatsApp account registered to phone numbers in the regions covered by the Digital Markets Act (DMA).
- If you change your phone number to a number registered in a region not covered by the DMA, you won’t be able to use third-party chats on WhatsApp.
- Third-party chats are only available on WhatsApp for iPhone and Android. Third-party chats on WhatsApp are not currently accessible on tablets, web, or desktop.
We care about the safety of our global community when enabling chats with third-party apps. Visit our WhatsApp Privacy Policy for users in Europe for more information.
When you send a message to a third-party app, the phone number registered to your WhatsApp account is available to the third-party app you select. Other people who know your phone number can find and message you from third-party messaging services you've enabled.
Note: Users you’ve blocked on WhatsApp might be able to message you from third-party apps. Learn more about how to block someone in this article.
## Be mindful of the information you share
Before you chat with someone using third-party apps:
- Make sure you know the person you’re chatting with before sharing any personal information.
- Be aware that scams and spam might be more common when messaging with third-party apps.
- If you receive an unwanted message from a third-party chat, you can block the sender from messaging you from the third party.
Thanks! It looks like that repo is GPL though, which I respect but isn't going to work for my usage (where I'm trying to build a generic UI toolkit that can be used by all sorts of applications including closed source ones).
It's just two broadcast receivers (one for receiving the push token, another for receiving actual notifications), and one broadcast sender to ask GSF to give you a token. This code is so trivial it's not even worth separating into a library.
The first one is where you get notifications. The parameters you sent from the server will simply be your intent extras.
The second one is where you get push tokens. There will be a "registration_id" extra string which is your token. It may start with "|ID|1|" (the "kid" parameter from the request, not quite sure what it does), in which case you need to remove that part.
You want to refresh your push token every time your app gets updated and also just periodically if you haven't done it in a while. I do it every 30 days.
Their comment would technically be proprietary code since there's no license alongside it, but grishka wrote the original implementation of the reverse-engineered code in that Mastodon commit in the first place. So I'd imagine it's fair game to use it as a reference (IANAL).
Grishka expresses that the code is trivial. Trivial inventions are not covered by patents. I believe, therefore, that a license for trivial code is not necessary.
But if someone knows better I would appreciate any correction. Legal matters are seldom clear or logical. Your jurisdiction may vary, etc etc.
In case there are any doubts, consider this code and its description public domain.
But then I'm not sure how much code is enough to be considered copyrightable. Is "2*2" copyrightable? Clearly not, because it's too trivial. Where is the line?
Patent != copyright. You can patent an algorithm (e.g., Adaptive Replacement Caching, which was scheduled to go into public domain this year but unfortunately got renewed successfully) but when it gets to the level of an actual specific implementation, it's a matter of copyright law.
It's why a black-box clone, where you look at an application and just try to make one with the same externally observable behavior without looking at the code, is legal (as long as you don't recycle copyrighted assets like images or icons) but can be infringing if you reuse any of the actual source code.
This was an issue that got settled early on and got covered in my SWE ethics class in college, but then more recently was re-tried in Oracle v Google in the case of Google cloning the Java standard library for the Android SDK.
I have no idea how copyright applies here. Stack Overflow has a rule in its terms of use that all the user-generated content there is redistributable under some kind of Creative Commons license that makes it easy to reuse. Perhaps HN has a similar rule? Not that I'm aware of, though.