RowHammer is a category of security vulnerabilities that exploit DRAM that suffers bit flips when accessed in particular patterns. It can be used for local privilege escalation: for example, a cell-phone app without permissions could use it to take full control of the phone, a server on a shared cloud host could attack other sites on the same host, or, in the worst case, a browser JavaScript program could break out of the browser (not done in this paper, but done as a proof of concept, with caveats, in one of the papers it cites). The main finding of this paper is that none of the existing mitigations is fully adequate, not even ECC.
On the plus side, using RowHammer to build a fully working exploit is much harder than with most other types of vulnerability, and the exploits people have managed to make so far have all involved major caveats (like requiring hundreds of hours of full CPU usage to work, or providing memory layout information that would be difficult for attackers to obtain). Still, the fact that this class of exploit has proven so difficult to fully fix is worrying.
There's a section of proposed mitigations at the end, one of which sounds particularly promising, though:
> "Kim et al. [38] and Kim et al. [37] proposed to eliminate bit flips in hardware by probabilistically opening adjacent or non-adjacent rows, whenever a row is opened or closed. As an ongoing Rowhammer attack would open and close a certain row repeatedly, the vulnerable adjacent rows would be refreshed before bit flips occur. We consider their approaches as a possible solutions to mitigate Rowhammer attacks in the future."
> The main finding of this paper is that none of the existing mitigations is fully adequate, not even ECC.
I think that is a bit misleading: this paper specifically did not analyze the effects of ECC at all; all their work is strictly focused on non-ECC systems. All the references to ECC are to other papers. Furthermore, while not "fully adequate", ECC is still quite an effective mitigation:
> [...] existing hardware countermeasures such as using memory with error-correction codes (ECC-RAM) appear to make Rowhammer attacks harder [...]
> ECC RAM can detect and correct 1-bit errors and, thus, deal with single bit flips caused by the Rowhammer attack. [...] However, uncorrectable multi-bit flips can be exploitable [2, 3, 42] or can result in a denial-of-service attack similar as described in Section IX-A depending on how the operating system responds to the error
Unless the situation has radically changed, my understanding is that ECC changes Rowhammer from a practical attack to more of a theoretical threat. And generally DoS is highly preferable to something like privilege escalation, which is a further benefit of ECC.
I haven't looked into it for a while, but AMD often supports ECC so long as the motherboard supports it. I haven't checked the current lineup, but looked at it for my last home-server build.
I agree with you entirely, to be sure. A quick Google led me to here:
I sure would like to know how ECC fails to alleviate this. People forget that the standard "single-bit correction, double-bit detection" really means guaranteed single-bit correction and guaranteed double-bit detection, plus some probability of detecting more than 2 flipped bits, depending on how and where they flip.
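To make that guarantee concrete, here is a toy sketch of the smallest SECDED code, an extended Hamming(8,4) code protecting 4 data bits. Real ECC DIMMs use a wider code (typically 64 data bits plus 8 check bits), so the bit layout and the helper names `encode`, `decode` and `flip` are illustrative assumptions, not how any particular memory controller does it.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    code = [0] * 8  # index 0: overall parity; indices 1..7: Hamming(7,4) positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # overall parity over positions 1..7
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                # XOR of set positions; 0 for a valid word
    odd_parity = sum(code) % 2             # 1 means an odd number of bits flipped
    if odd_parity:
        code[syndrome] ^= 1                # assume exactly one flip and undo it
        status = "corrected"
    elif syndrome:
        return "detected", None            # even number of flips: report, don't guess
    else:
        status = "ok"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


word = encode(0b1011)
print(decode(flip(word, 5)))        # ('corrected', 11): one flip is repaired
print(decode(flip(word, 2, 6)))     # ('detected', None): two flips are caught
print(decode(flip(word, 1, 2, 3)))  # ('corrected', 10): three flips are silently miscorrected
```

The last line is the case that matters for Rowhammer: once three or more bits flip in the same word, the decoder can mistake the damage for a single-bit error and hand back wrong data without complaint.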
(SECDED.) RH and bitsquatting are two reasons ECC should be the standard, not the exception, in all manner of consumer endpoints, network gear and servers.
TFA said nothing about ECC failing to stop RowHammer. TFA does discuss the cost of ECC (or ChipKill) and the fact that Intel does not even support ECC on consumer kit -- i.e., ECC's availability is rather limited.
"ECC RAM can detect and correct 1-bit errors and, thus,
deal with single bit flips caused by the Rowhammer attack.
Furthermore, IBM’s Chipkill error correction [27] allows to
successfully recover from 3-bit errors. However, uncorrectable
multi-bit flips can be exploitable [2, 3, 42] "
Maybe that, combined with a lot of weasel words about the effectiveness of ECC, muddies the water.
> The main finding of this paper is that none of the existing mitigations is fully adequate, not even ECC.
I suspect AMD's new memory encryption would provide significant protection as specific bit flips in rows that do not share the same key would be totally unpredictable.
Software mitigation is not the right way to deal with RAM that can be fooled into flipping bits. All DRAM with this vulnerability should be recalled as defective. We'd certainly demand that Intel fix a vulnerability that let you flip register bits in other processes by doing something you can do from JavaScript.
There are reasonable defenses possible inside the DRAM. They currently depend on the fact that repeated reads of the same data are rare, because of caches. DRAM should insert a delay after hitting the same row N times between refreshes, depending on its analog parameters. It won't affect performance in any normal application.
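A rough sketch of that idea, as a toy model rather than anything a real DRAM does: count activations per row since the last refresh and charge a delay once a row exceeds its budget. The class name `RowThrottle`, the budget `max_hits` and the `penalty_cycles` value are all made up for illustration; a real part would derive N from its analog characteristics.

```python
from collections import Counter


class RowThrottle:
    """Toy model: delay activations of a row that is hit too often between refreshes."""

    def __init__(self, max_hits, penalty_cycles):
        self.max_hits = max_hits              # hypothetical N: free activations per refresh window
        self.penalty_cycles = penalty_cycles  # hypothetical stall added beyond the budget
        self.hits = Counter()                 # activations per row since the last refresh

    def activate(self, row):
        """Record an activation and return the extra delay (in cycles) it incurs."""
        self.hits[row] += 1
        return self.penalty_cycles if self.hits[row] > self.max_hits else 0

    def refresh(self):
        """A refresh restores every row's charge, so all budgets start over."""
        self.hits.clear()


throttle = RowThrottle(max_hits=3, penalty_cycles=100)
print([throttle.activate(7) for _ in range(5)])  # [0, 0, 0, 100, 100]
print(throttle.activate(8))                      # 0: other rows are unaffected
```

With caches in front of the DRAM, ordinary code rarely re-opens one row thousands of times per refresh window, so only hammering-style access patterns would ever pay the penalty.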
> There are reasonable defenses possible inside the DRAM
Don't forget the simplest one: make them like they used to, with a much bigger margin of reliability. Rowhammer wasn't a problem before ~2010 or so, if I remember the original paper correctly. DRAM these days is operating too close to the limits, with razor-thin margins.
Indeed, any new DRAM I purchase gets run through a 72-hour stress test and returned as defective if so much as one bit is in error during that time.
I like to make an analogy with where flash memory is heading today with multi-level cells (multiplicative capacity increase; exponential endurance and retention decrease) and a phrase to keep in mind: Faster and bigger storage means nothing if it barely works. Contrary to what all the marketing seems to imply, some people actually want memory that works perfectly.
Indeed. Back in the day we called this "pattern sensitivity" and didn't ship product that suffered from it, leading to loss of $$. Collectively the memory industry seems to have screwed up big time, but persuaded everyone they didn't.
The memory controller may be a more efficient place to do it, because the silicon process is better suited to logic. The challenge is agreeing on how to specify limits on access patterns. JEDEC would have to define a protocol for the memory chips to inform the controller of how many accesses per row are allowed between refreshes.
Target row refresh (TRR) is already a thing in laptop DDR4 and is reported as part of SPD, but it appears Intel still doesn't support it (Cisco claimed Intel does, but it's hard to believe them when Intel itself has never claimed so publicly).
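For anyone unfamiliar with the idea, a counter-based target row refresh can be sketched roughly like this. Actual TRR implementations are vendor-specific and not publicly documented in detail, so this is only a toy model: the class name, the `max_activations` limit, and the assumption that a row's physical neighbours are simply `row - 1` and `row + 1` are all illustrative.

```python
from collections import defaultdict


class TargetRowRefresh:
    """Toy model: refresh a row's neighbours early once it has been opened too often."""

    def __init__(self, max_activations, num_rows):
        self.max_activations = max_activations  # hypothetical per-row limit between refreshes
        self.num_rows = num_rows
        self.counts = defaultdict(int)          # activations per row since its last reset
        self.refreshed = []                     # log of victim rows that got an early refresh

    def activate(self, row):
        """Count an activation; at the limit, refresh both neighbours and reset the count."""
        self.counts[row] += 1
        if self.counts[row] >= self.max_activations:
            for victim in (row - 1, row + 1):   # assumes adjacent numbers are adjacent rows
                if 0 <= victim < self.num_rows:
                    self.refreshed.append(victim)
            self.counts[row] = 0

    def periodic_refresh(self):
        """The regular refresh restores every row, so all counters start over."""
        self.counts.clear()


trr = TargetRowRefresh(max_activations=3, num_rows=8)
for _ in range(3):
    trr.activate(5)
print(trr.refreshed)  # [4, 6]
```

Unlike a throttling scheme, this costs the aggressor nothing; it just spends a little extra refresh bandwidth on the rows that are actually at risk.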
I don't want to imply that manufacturers don't have any responsibility in this problem, but I think there's a difference between recalling a defective unit or model from one company, and recalling an entire family of products from different companies... especially if the defect is so common that it could be argued to be part of the design.
> Finally, we abuse Intel SGX to hide the attack entirely from the user and the operating system, making any inspection or detection of the attack infeasible
It's unfortunate that academics feel OK about making misleading statements in this space. SGX enclaves must be signed by Intel to work, so I doubt very much that they abused SGX in this way. What they mean is that they could have done, if they had got Intel to approve their attack, which is a pretty freaking huge caveat.
No, the attack doesn't require Intel to sign the attacker's code. It works by abusing SGX's tamper detection, which hangs the machine if a forbidden memory region has been written to. If you can trigger such forbidden writes repeatedly (which is what Rowhammer does), you can DDoS a cloud provider.
The paper makes several claims about SGX, but the part I quoted says it uses it to hide the attack from the operating system. The "DoS a cloud by making a hang" aspect is different.
Yes, the hardware must be signed by Intel, but the code can be anything, since the idea is to make cloud hardware trustworthy even for sensitive computations a customer might run. The researchers use the enclave in the intended way (protecting their code from interference even by privileged code) but for a malicious purpose. (Thus abusing it.) Intel will happily provide them with an attestation that the code is indeed running in a secure enclave.