Another Flip in the Wall of Rowhammer Defenses

jimrandomh · on Oct 4, 2017

RowHammer is a category of security vulnerabilities that exploit DRAM that suffers bit flips when accessed in particular patterns. It can be used for local privilege escalation; for example, a cell-phone app without permissions could use it to take full control of the phone, a server on a shared cloud host could attack other sites on the same host, or, in the worst case, a browser JavaScript program could break out of the browser (not done in this paper, but done as a proof of concept, with caveats, in one of the papers it cites). The main finding of this paper is that none of the existing mitigations is fully adequate, not even ECC.

On the plus side, using RowHammer to make a fully working exploit is much harder than most other types of exploits, and the exploits people have managed to make so far have all involved major caveats (like requiring hundreds of hours of full CPU usage to work, or relying on memory layout information that would be difficult for attackers to obtain). Still, the fact that this class of exploit has proven so difficult to fully fix is worrying.

There's a section of proposed mitigations at the end, one of which sounds particularly promising, though:

> "Kim et al. [38] and Kim et al. [37] proposed to eliminate bit flips in hardware by probabilistically opening adjacent or non-adjacent rows, whenever a row is opened or closed. As an ongoing Rowhammer attack would open and close a certain row repeatedly, the vulnerable adjacent rows would be refreshed before bit flips occur. We consider their approaches as a possible solutions to mitigate Rowhammer attacks in the future."
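A rough way to see why the probabilistic scheme in that quote helps is a toy simulation. This is only a sketch under stated assumptions: the function name `hammer`, the threshold, and the refresh probability are all made up for illustration, and the model collapses "probabilistically opening adjacent rows" into a single victim row that is refreshed with probability `p_refresh` on each aggressor activation.

```python
import random

def hammer(activations, threshold, p_refresh, seed=0):
    """Count how often a victim row reaches the (hypothetical) flip threshold.

    Each loop iteration is one activation of the aggressor row. With
    probability p_refresh the neighbouring victim row is refreshed, which
    resets the disturbance accumulated since its last refresh.
    """
    rng = random.Random(seed)
    disturbance = 0  # aggressor activations since the victim was last refreshed
    flips = 0
    for _ in range(activations):
        disturbance += 1
        if rng.random() < p_refresh:
            disturbance = 0          # victim refreshed in time
        elif disturbance >= threshold:
            flips += 1               # victim cell would have flipped
            disturbance = 0
    return flips

# Illustrative numbers only, not measured DRAM parameters.
print("no mitigation:      ", hammer(20_000, 2_000, 0.0))
print("5% neighbour refresh:", hammer(20_000, 2_000, 0.05))
```

With these made-up numbers the unprotected run reaches the threshold 10 times, while the protected run would need 2,000 consecutive activations without a single refresh, which is vanishingly unlikely. Real proposals use much smaller probabilities against much larger thresholds; the arithmetic is the same.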

zokier · on Oct 4, 2017

> The main finding of this paper is that none of the existing mitigations is fully adequate, not even ECC.

I think that is a bit misleading; this paper specifically did not analyze the effects of ECC at all. All their work is strictly focused on non-ECC systems, and all the references to ECC are to other papers. Furthermore, while not "fully adequate", ECC is still quite an effective mitigation:

> [...] existing hardware countermeasures such as using memory with error-correction codes (ECC-RAM) appear to make Rowhammer attacks harder [...]

> ECC RAM can detect and correct 1-bit errors and, thus, deal with single bit flips caused by the Rowhammer attack. [...] However, uncorrectable multi-bit flips can be exploitable [2, 3, 42] or can result in a denial-of-service attack similar as described in Section IX-A depending on how the operating system responds to the error

Unless the situation has radically changed, my understanding is that ECC changes Rowhammer from a practical attack to more of a theoretical threat. And generally DoS is highly preferable over something like privilege escalation, which is a further benefit of ECC.

Consumer-grade ECC can't come soon enough.

KGIII · on Oct 5, 2017

I haven't looked into it for a while, but AMD often supports ECC so long as the motherboard supports it. I haven't checked the current lineup, but looked at it for my last home-server build.

I agree with you entirely, to be sure. A quick Google led me to here:

https://www.reddit.com/r/Amd/comments/6f7s28/what_is_the_sta...

It's a thread that is about four months old, and they are discussing ECC support with Ryzen CPUs. It looks like they have some current-gen solutions.

StillBored · on Oct 4, 2017

I sure would like to know how ECC fails to alleviate this. People forget that standard single-bit correction, double-bit detection really means guaranteed single-bit correction and guaranteed double-bit detection, plus some probability of detecting more than 2 flipped bits depending on how and where they flip.
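To make those guarantees concrete, here is a minimal sketch of single-error-correct, double-error-detect coding using a toy extended Hamming (8,4) code. This is an assumption for illustration only: real ECC DIMMs use wider codes (typically 64 data bits plus 8 check bits), and the names `encode`, `decode`, and `flip` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0: overall parity; 1, 2, 4: Hamming parity bits
    code[3], code[5], code[6], code[7] = d
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code) % 2
    return code

def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of set positions: 0 for a valid Hamming word
    odd = sum(code) % 2    # overall parity check
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd:
        # Odd parity is treated as a single flip at position `syndrome`
        # (position 0 is the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two flips, cannot be located.
        return "double-bit error detected", None
    data = (code[3], code[5], code[6], code[7])
    return status, sum(bit << i for i, bit in enumerate(data))

def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out

word = encode(0b1011)
print(decode(flip(word, 6)))        # one flip: corrected, original data back
print(decode(flip(word, 2, 6)))     # two flips: detected, no data returned
print(decode(flip(word, 1, 2, 3)))  # three flips: "corrected" to the wrong data
```

The last case is the one that matters for Rowhammer: three flips look like a single-bit error to the decoder, so it silently hands back wrong data. In this tiny code every 3-bit error is miscorrected; wider codes catch some multi-bit patterns and miss others, which is the "depending on how/where they flip" part.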

theyregreat · on Oct 4, 2017

(SECDED.) RH and bitsquatting are two reasons ECC should be the standard, not the exception, in all manner of consumer endpoints, network gear and servers.

cryptonector · on Oct 4, 2017

TFA said nothing about ECC failing to stop RowHammer. TFA does discuss the cost of ECC (or ChipKill) and the fact that Intel does not even support ECC on consumer kit -- i.e., ECC's availability is rather limited.

StillBored · on Oct 4, 2017

To quote TFA:

"ECC RAM can detect and correct 1-bit errors and, thus, deal with single bit flips caused by the Rowhammer attack. Furthermore, IBM’s Chipkill error correction [27] allows to successfully recover from 3-bit errors. However, uncorrectable multi-bit flips can be exploitable [2, 3, 42] "

Maybe that, combined with a lot of weasel words about the effectiveness of ECC, muddies the water.

cryptonector · on Oct 5, 2017

But they did not actually demonstrate any attacks on ECC.

ris · on Oct 5, 2017

> The main finding of this paper is that none of the existing mitigations is fully adequate, not even ECC.

I suspect AMD's new memory encryption would provide significant protection as specific bit flips in rows that do not share the same key would be totally unpredictable.

tlb · on Oct 5, 2017

Software mitigation is not the right way to deal with RAM that can be fooled into flipping bits. All DRAM with this vulnerability should be recalled as defective. We'd certainly demand that Intel fix a vulnerability that let you flip register bits in other processes by doing something you can do from Javascript.

There are reasonable defenses possible inside the DRAM. They currently depend on the fact that repeated reads of the same data are rare, because of caches. DRAM should insert a delay after hitting the same row N times between refreshes, depending on its analog parameters. It won't affect performance in any normal application.

userbinator · on Oct 5, 2017

> There are reasonable defenses possible inside the DRAM

Don't forget the simplest one: make them like they used to, with a much bigger margin of reliability. Rowhammer wasn't a problem before ~2010 or so, if I remember the original paper correctly. DRAM these days is operating too close to the limits, with razor-thin margins.

Indeed, any new DRAM I purchase gets run through a 72-hour stress test and returned as defective if so much as one bit is in error during that time.

I like to make an analogy with where flash memory is heading today with multi-level cells (multiplicative capacity increase; exponential endurance and retention decrease) and a phrase to keep in mind: Faster and bigger storage means nothing if it barely works. Contrary to what all the marketing seems to imply, some people actually want memory that works perfectly.

dboreham · on Oct 5, 2017

Indeed. Back in the day we called this "pattern sensitivity" and didn't ship product that suffered from it, leading to loss of $$. Collectively the memory industry seems to have screwed up big time, but persuaded everyone they didn't.

rasz · on Oct 5, 2017

You can also work around it in the memory controller by incorporating row access counters and forcing a refresh after X accesses.
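A minimal sketch of that counter idea, assuming a simplified model where each activation disturbs only the two adjacent rows and periodic refresh is ignored. The class name `CountingController`, its fields, and the numbers are hypothetical, not any real controller's interface.

```python
from collections import defaultdict

class CountingController:
    """Toy memory controller that refreshes neighbours after X activations."""

    def __init__(self, num_rows, max_activations):
        self.num_rows = num_rows
        self.max_activations = max_activations  # the "X" in the comment above
        self.activations = defaultdict(int)     # per-row count since last reset
        self.disturbance = defaultdict(int)     # per-row stress since last refresh
        self.worst_disturbance = 0              # highest stress any row ever saw

    def refresh(self, row):
        self.disturbance[row] = 0

    def activate(self, row):
        self.activations[row] += 1
        neighbours = [r for r in (row - 1, row + 1) if 0 <= r < self.num_rows]
        for n in neighbours:
            self.disturbance[n] += 1
            self.worst_disturbance = max(self.worst_disturbance, self.disturbance[n])
        if self.activations[row] >= self.max_activations:
            # Forced refresh of the potential victims, then start counting again.
            for n in neighbours:
                self.refresh(n)
            self.activations[row] = 0

def double_sided_hammer(ctrl, victim, accesses):
    """Alternate between the two rows on either side of the victim."""
    for i in range(accesses):
        ctrl.activate(victim - 1 if i % 2 == 0 else victim + 1)
    return ctrl.worst_disturbance

protected = CountingController(num_rows=64, max_activations=1_000)
unprotected = CountingController(num_rows=64, max_activations=10**9)
print("with counters:   ", double_sided_hammer(protected, 10, 100_000))
print("without counters:", double_sided_hammer(unprotected, 10, 100_000))
```

One thing the sketch makes visible: with double-sided hammering the victim is stressed by two aggressors that are counted separately, so its disturbance can approach 2X before a refresh kicks in. X would have to be chosen with that factor in mind.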

tlb · on Oct 5, 2017

The memory controller may be a more efficient place to do it, because the silicon process is better suited to logic. The challenge is agreeing on how to specify limits on access patterns. JEDEC would have to define a protocol for the memory chips to inform the controller of how many accesses per row are allowed between refreshes.

rasz · on Oct 6, 2017

Target row refresh (TRR) is already a thing in laptop DDR4 and is reported as part of SPD, but it appears Intel still doesn't support it (Cisco claimed Intel does support it, but it's hard to believe them when Intel itself never claimed it publicly).

RachelF · on Oct 5, 2017

True. Software is easier to fix than hardware, though. Stopping programmatic cache flushes, or adding timing jitter to them, is an easier solution.

a3_nm · on Oct 5, 2017

I don't want to imply that manufacturers don't have any responsibility in this problem, but I think there's a difference between recalling a defective unit or model from one company, and recalling an entire family of products from different companies... especially if the defect is so common that it could be argued to be part of the design.

peoplewindow · on Oct 4, 2017

> Finally, we abuse Intel SGX to hide the attack entirely from the user and the operating system, making any inspection or detection of the attack infeasible

It's unfortunate that academics feel OK about making misleading statements in this space. SGX enclaves must be signed by Intel to work, so I doubt very much that they abused SGX in this way. What they mean is that they could have done, if they had got Intel to approve their attack, which is a pretty freaking huge caveat.

ameliaquining · on Oct 4, 2017

No, the attack doesn't require Intel to sign the attacker's code. It works by abusing SGX's tamper detection, which hangs the machine if a forbidden memory region has been written to. If you can trigger such forbidden writes repeatedly (which is what Rowhammer does), you can DDoS a cloud provider.

peoplewindow · on Oct 4, 2017

The paper makes several claims about SGX, but the part I quoted says it uses it to hide the attack from the operating system. The "DoS a cloud by making a hang" aspect is different.

yorwba · on Oct 5, 2017

Yes, the hardware must be signed by Intel, but the code can be anything, since the idea is to make cloud hardware trustworthy even for sensitive computations a customer might run. The researchers use the enclave in the intended way (protecting their code from interference even by privileged code) but for a malicious purpose. (Thus abusing it.) Intel will happily provide them with an attestation that the code is indeed running in a secure enclave.

wmf · on Oct 5, 2017

That's how SGX probably should work and that's how everyone thought it works for years, but at the last minute Intel announced that they have to sign every enclave. https://software.intel.com/en-us/articles/intel-software-gua...