
And it's a particularly interesting issue, because this problem mirrors the congestion control failure observed on most networks in recent years. We have all seen it: on a busy network, latency increases by two orders of magnitude, ruining other network activities like web browsing, even though they need only a little bit of bandwidth. The simplest demo is to upload a large file while watching the ping latency: it jumps from 100ms to 2000ms. That should not happen, because TCP congestion control was designed precisely to solve this.
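For what it's worth, here is a rough Python sketch of that demo (UPLOAD_HOST/UPLOAD_PORT are placeholders for any machine that will accept bulk traffic from you, and it assumes the ordinary ping binary is on PATH):

    import socket, subprocess, threading, time

    UPLOAD_HOST = "example.com"   # placeholder
    UPLOAD_PORT = 9               # placeholder (e.g. a discard/netcat listener)

    def saturate_uplink(stop):
        chunk = b"\0" * 65536
        with socket.create_connection((UPLOAD_HOST, UPLOAD_PORT)) as s:
            while not stop.is_set():
                s.sendall(chunk)      # keep the bottleneck queue stuffed

    stop = threading.Event()
    threading.Thread(target=saturate_uplink, args=(stop,), daemon=True).start()

    for _ in range(30):               # watch the RTT climb as the buffers fill
        out = subprocess.run(["ping", "-c", "1", "8.8.8.8"],
                             capture_output=True, text=True).stdout
        lines = out.strip().splitlines()
        print(lines[-1] if lines else "ping produced no output")
        time.sleep(1)
    stop.set()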

It turns out that the cause of this problem, known as bufferbloat, is the accumulated effect of excessive buffering in the network stack: mostly the system packet queue, but also the routers, switches, drivers and hardware, since RAM is cheap nowadays. TCP congestion control works like this: if packet loss is detected, send at a lower rate. But when there are large buffers on the path for "improving performance", packets are never lost even when the path is congested; instead they are put into a huge buffer, so TCP never slows down properly as designed, and during slow-start it believes it's on its way to the moon. On top of that, all these buffers are FIFO, which means that by the time your new packet finally makes it out, it's probably no longer relevant: it takes seconds to move from the tail of the queue to the head, and the connection has already timed out.
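A toy simulation makes the interaction visible. Everything below is made up for illustration and is nowhere near a real TCP model, but it shows how a sender that reacts only to loss keeps ramping up while the oversized buffer quietly turns into seconds of queueing delay:

    # Per-millisecond toy model: a loss-driven sender feeds a bottleneck that
    # drains 10 packets/ms through an oversized FIFO. All constants are arbitrary.
    DRAIN_PER_MS = 10          # bottleneck capacity
    BUFFER_PKTS  = 10_000      # "RAM is cheap" sized buffer
    rate  = 10.0               # sending rate in packets/ms (stand-in for cwnd)
    queue = 0.0

    for ms in range(2001):
        queue += rate                          # arrivals this millisecond
        queue -= min(queue, DRAIN_PER_MS)      # what the link drains
        if queue > BUFFER_PKTS:                # only now does the sender see loss
            queue = BUFFER_PKTS
            rate /= 2                          # multiplicative decrease
        else:
            rate += 0.05                       # additive increase
        if ms % 250 == 0:
            delay = queue / DRAIN_PER_MS       # queueing delay for a new packet
            print(f"t={ms:4d}ms  rate={rate:5.1f} pkt/ms  "
                  f"queue={queue:7.0f} pkt  delay={delay:6.0f} ms")

In this toy run the first drop doesn't happen until the standing queue is already worth roughly a second of delay, which is the 100ms-to-2000ms ping jump above in slow motion.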

Solutions include killing buffers or limiting their length (byte queue limits, TCP small queues). Another innovation is new queue management algorithms: we don't have to use a mindless FIFO queue, we can make it smarter. As a result, CoDel ("controlled delay") and fq_codel were invented: they watch how long packets have been sitting in the queue and drop old packets when the delay builds up, so new arrivals get through and your traffic keeps flowing.
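For the curious, here is a very stripped-down sketch of the CoDel dequeue logic (the real algorithm, described in RFC 8289 and implemented by the kernel's codel/fq_codel qdiscs, carries more state and corner cases; treat this as an illustration of the idea only):

    import math, time
    from collections import deque

    TARGET   = 0.005    # 5 ms: acceptable standing queue delay
    INTERVAL = 0.100    # 100 ms: how long the delay may stay above TARGET

    class CoDelQueue:
        def __init__(self):
            self.q = deque()          # holds (enqueue_time, packet)
            self.first_above = None   # deadline by which the delay must recover
            self.dropping = False
            self.drop_next = 0.0
            self.count = 0

        def enqueue(self, pkt):
            self.q.append((time.monotonic(), pkt))

        def dequeue(self):
            while self.q:
                enq, pkt = self.q.popleft()
                now = time.monotonic()
                sojourn = now - enq              # how long this packet sat queued
                if sojourn < TARGET:
                    self.first_above = None      # queue is healthy again
                    self.dropping = False
                    return pkt
                if self.first_above is None:
                    self.first_above = now + INTERVAL
                    return pkt
                if not self.dropping:
                    if now < self.first_above:
                        return pkt
                    self.dropping = True         # delay stayed high a whole INTERVAL
                    self.count = 1
                    self.drop_next = now + INTERVAL
                    continue                     # drop this packet, try the next
                if now >= self.drop_next:
                    self.count += 1              # keep dropping, sooner each time
                    self.drop_next = now + INTERVAL / math.sqrt(self.count)
                    continue
                return pkt
            return None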

People then realized that the Linux I/O freeze is a variant of bufferbloat, and that the very same ideas behind CoDel can be applied to it.

Another interesting aspect is that the problem is NOT OBSERVABLE if the network is fast enough or the traffic is low, because no buffering builds up, so it will never be caught by many benchmarks. On the other hand, when you start uploading a large file over a slow network, or start copying a large file to a USB thumb drive on Linux...

https://lwn.net/Articles/682582/

and

https://lwn.net/Articles/685894/
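A rough way to watch the I/O flavour of this yourself (the path is a placeholder; run it once on an idle machine, then again while a big cp to a USB stick or other slow device is in flight, and compare):

    import os, time

    PROBE = "/tmp/latency-probe"   # placeholder: put it on the filesystem you care about

    for _ in range(30):
        t0 = time.monotonic()
        fd = os.open(PROBE, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        os.write(fd, b"x" * 4096)
        os.fsync(fd)               # force the 4 KiB through the whole write path
        os.close(fd)
        print(f"4 KiB write+fsync: {(time.monotonic() - t0) * 1000:7.1f} ms")
        time.sleep(1)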



> People then realized that the Linux I/O freeze is a variant of bufferbloat, and that the very same ideas behind CoDel can be applied to it.

There are myriad causes for poor interactivity on Linux systems under heavy disk IO. I've already described the two I personally observe most often in another post here [1], and they have nothing at all in common with bufferbloat.

Linux doesn't need to do less buffering. It needs to be less willing to evict recently used buffers even under pressure, more willing to let processes OOM, and a bridging of the CPU and IO scheduling domains so arbitrary processes can't hog CPU resources via plain IO on what are effectively CPU-backed IO layers like dmcrypt.
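For what it's worth, the closest existing knob for the "more willing to let processes OOM" part is per process rather than the policy change I'm describing: /proc/<pid>/oom_score_adj, which ranges from -1000 to 1000. A minimal sketch that volunteers the current process as the OOM killer's preferred victim before it does anything memory-hungry:

    import os

    def prefer_me_for_oom(adj=1000):
        # 1000 = "kill this one first"; -1000 would mean "never kill it".
        with open(f"/proc/{os.getpid()}/oom_score_adj", "w") as f:
            f.write(str(adj))

    prefer_me_for_oom()
    # ... now run the risky, allocation-heavy workload ...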

But it gets complicated very quickly; there are reasons why this isn't fixed already.

One obvious problem is the asynchronous, transparent nature of the page cache. Behind the scenes, pages are faulted in and out on demand, and this generates potentially large amounts of IO. If you need to charge the cost of that IO to processes in order to inform scheduling decisions, which process pays the bill? The process you're trying to fault in, or the process responsible for the pressure behind the eviction you're undoing? And how does this kind of complexity relate to bufferbloat?
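To make the "transparent" part concrete, here is a tiny sketch (the file path is a placeholder for any large file that isn't already in the page cache). The program never calls read(), yet touching the mapped memory produces real disk IO in the form of major faults, and that is exactly the kind of IO that's hard to attribute to anyone:

    import mmap, os, resource

    FILE = "/var/tmp/some-big-file"    # placeholder: a large file, cold cache

    fd = os.open(FILE, os.O_RDONLY)
    size = os.fstat(fd).st_size
    m = mmap.mmap(fd, size, prot=mmap.PROT_READ)

    before = resource.getrusage(resource.RUSAGE_SELF).ru_majflt
    checksum = 0
    for off in range(0, size, 4096):   # touch one byte per page
        checksum += m[off]             # a plain memory access, no read() in sight
    after = resource.getrusage(resource.RUSAGE_SELF).ru_majflt

    print(f"major faults (real disk reads) caused by memory accesses: {after - before}")
    m.close()
    os.close(fd)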

[1] https://news.ycombinator.com/item?id=18784209


> It needs to be less willing to evict recently used buffers even under pressure, more willing to let processes OOM

I've had similar experiences. On Windows, either through bugs or poor coding, an application requests way too much memory, leading to an unresponsive system while the kernel is busy paging away.

On Linux I've experienced the system killing system processes when under memory pressure, leading to crashes or an unusable system.

I don't understand why the OS would allow a program to allocate more than available physical memory, at least without asking the user, given the severe consequences.


Overcommit is a very deliberate feature, but its time may have passed. Keep in mind this is all from a time when RAM was so expensive that swapping to spinning disks was a requirement just to run programs taking advantage of a 32-bit address space.

You can tune the overcommit ratio on Linux, but if memory serves (no pun intended), the last time I played with eliminating overcommit, a bunch of programs that liked to allocate big virtual address spaces ceased functioning.
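If anyone wants to poke at this themselves, here's a rough sketch. The /proc paths and sysctls are real; the "half of RAM, four times over" sizing is just an arbitrary way to promise about twice physical memory without touching any of it:

    import mmap

    def meminfo_kb(field):
        for line in open("/proc/meminfo"):
            if line.startswith(field + ":"):
                return int(line.split()[1])

    print("vm.overcommit_memory =",
          open("/proc/sys/vm/overcommit_memory").read().strip())
    print("vm.overcommit_ratio  =",
          open("/proc/sys/vm/overcommit_ratio").read().strip())

    chunk = meminfo_kb("MemTotal") * 1024 // 2   # bytes: half of physical RAM
    maps = []
    for i in range(4):
        try:
            maps.append(mmap.mmap(-1, chunk))    # reserve, but never touch, the pages
        except OSError as e:
            print(f"allocation {i + 1} refused up front: {e}")
            break
        print(f"allocation {i + 1} ok: Committed_AS = "
              f"{meminfo_kb('Committed_AS') // 1024} MiB, "
              f"CommitLimit = {meminfo_kb('CommitLimit') // 1024} MiB")

With the default vm.overcommit_memory=0, all four allocations usually succeed even though roughly twice the RAM has been promised; with vm.overcommit_memory=2 (strict accounting) the later ones are refused with ENOMEM up front, which is the mode that broke those address-space-hungry programs for me.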


Yeah, I know it was a feature at one point... but the OS should at least punish the program that is overcommitting, rather than bring the rest of the system down (either by effectively grinding to a halt or by killing important processes).



