
> I'm happy to talk about this all day!

Oh really :)

I'd like to ask how applications should change their memory allocation or usage patterns to maximise the benefit of THP. Do memory allocators (glibc mainly) need config tweaking to coalesce tiny mallocs into 2MB+ mmaps, or will they do that automatically? Do you need a custom pool allocator so you're doing large allocations, or are you never going to get the full benefit of huge pages without madvise/libhugetlbfs? And does this apply to Mac/Windows/*BSD at all?

[Edit: ouch, I see /sys/kernel/mm/transparent_hugepage/enabled is default set to 'madvise' on my system (Slackware) and as a result doing nearly nothing. But I saw it enabled in the past. Well that answers a lot of my questions: got to use madvise/libhugetlbfs.]
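
If I understand the docs right, the opt-in looks roughly like this (a sketch, assuming the usual 2 MiB huge page size on x86-64; check /sys/kernel/mm/transparent_hugepage/hpage_pmd_size for the real value on a given system):

    #define _GNU_SOURCE
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define HPAGE_SIZE (2UL * 1024 * 1024)   /* assumed PMD huge page size */

    int main(void) {
        void *buf = NULL;
        /* 2 MiB-aligned, 2 MiB-sized region so the kernel can back it with one huge page */
        if (posix_memalign(&buf, HPAGE_SIZE, HPAGE_SIZE) != 0)
            return 1;
        /* opt this range into THP; needed when the sysfs knob is set to "madvise" */
        madvise(buf, HPAGE_SIZE, MADV_HUGEPAGE);
        /* the huge page is only faulted in on first touch */
        memset(buf, 0, HPAGE_SIZE);
        free(buf);
        return 0;
    }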

I read you also need to ensure ELF segments are properly aligned to get transparent huge pages for code/data.

Another question. From your link [2]:

> An application may mmap a large region but only touch 1 byte of it, in that case a 2M page might be allocated instead of a 4k page for no good.

Do the heuristics used by Linux THP (khugepaged) really allow completely ignoring whether pages have actually been page-faulted in or even initialised? Or is that a possibility that's unlikely to happen in practice?



In current Linux systems, there are two main ways to benefit from huge pages: 1) the explicit, user-managed approach via hugetlbfs, which is not very common, and 2) pages transparently managed by the kernel via THP (userspace is completely unaware, and any application using mmap() or malloc() can benefit from it).
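
To make (1) concrete, here is a minimal sketch of the explicit route (assumptions: 2 MiB pages, and a pool reserved beforehand via /proc/sys/vm/nr_hugepages; MAP_HUGETLB is the anonymous-mapping counterpart of mounting hugetlbfs):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define HPAGE_SIZE (2UL * 1024 * 1024)

    int main(void) {
        /* explicit huge page mapping; fails if the reserved pool is empty */
        void *p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");
            return 1;
        }
        memset(p, 0, HPAGE_SIZE);    /* backed by a page from the hugetlb pool */
        munmap(p, HPAGE_SIZE);
        return 0;
    }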

As I mentioned before, most major Linux distributions ship with THP enabled by default. THP automatically allocates huge pages for mmap'd memory whenever possible (that is, when the region is 2 MiB aligned and at least 2 MiB in size). There is also a separate kernel thread, khugepaged, that opportunistically tries to coalesce/promote base 4 KiB pages into 2 MiB huge pages whenever possible.
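
As a sketch of what "2 MiB aligned and at least 2 MiB in size" means for route (2): over-allocate an anonymous mapping and use an aligned 2 MiB window inside it (the padding trick here is my own illustration, not something THP requires you to do this particular way):

    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>

    #define HPAGE_SIZE (2UL * 1024 * 1024)

    /* Returns a pointer that is 2 MiB aligned inside a slightly larger mapping,
       so a fault in that window is eligible for a transparent huge page
       (with enabled=always; add madvise(MADV_HUGEPAGE) under the madvise mode). */
    static void *map_thp_friendly(size_t size) {
        char *raw = mmap(NULL, size + HPAGE_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (raw == MAP_FAILED)
            return NULL;
        /* round up to the next 2 MiB boundary inside the over-sized mapping */
        return (void *)(((uintptr_t)raw + HPAGE_SIZE - 1) & ~(uintptr_t)(HPAGE_SIZE - 1));
    }

    int main(void) {
        void *p = map_thp_friendly(HPAGE_SIZE);
        if (p)
            memset(p, 0, HPAGE_SIZE);   /* first touch, eligible for a 2 MiB fault */
        return 0;
    }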

Library support is not strictly required for THP, but an unaware library can hurt huge page performance and availability in the long run. A library that is not aware of kernel huge pages may employ suboptimal memory management strategies, resulting in inefficient utilization, for example by unintentionally breaking those huge pages apart (e.g. via unaligned unmapping), or by failing to release them back to the OS as one full unit, undermining their availability over time. AFAIK, TCMalloc from Google is the only allocator with extensive huge page awareness [1].

> Do the heuristics used by Linux THP (khugepaged) really allow completely ignoring whether pages have actually been page-faulted in or even initialised? Or is that a possibility that's unlikely to happen in practice?

Linux allocates huge pages on first touch. As for khugepaged, it only coalesces the pages if all the base pages covering the 2 MiB virtual region exist in some form (not necessarily faulted in; for example, some of those base pages could be in swap space, and Linux will first fault them in and then migrate them).
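
One rough way to see the first-touch behaviour for yourself (a sketch; AnonHugePages in /proc/self/smaps_rollup is the kernel's accounting of THP-backed anonymous memory on reasonably recent kernels):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define HPAGE_SIZE (2UL * 1024 * 1024)

    int main(void) {
        void *buf = NULL;
        if (posix_memalign(&buf, HPAGE_SIZE, HPAGE_SIZE) != 0)
            return 1;
        madvise(buf, HPAGE_SIZE, MADV_HUGEPAGE);

        memset(buf, 1, HPAGE_SIZE);   /* first touch: this is where the huge page is faulted in */

        /* print the kernel's accounting; expect AnonHugePages >= 2048 kB here */
        FILE *f = fopen("/proc/self/smaps_rollup", "r");
        char line[256];
        while (f && fgets(line, sizeof line, f))
            if (strncmp(line, "AnonHugePages:", 14) == 0)
                fputs(line, stdout);
        if (f)
            fclose(f);
        return 0;
    }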

[1] https://www.usenix.org/system/files/osdi21-hunter.pdf


Mimalloc has support for huge pages. It also has an option to reserve 1 GiB pages at program start, and I've had very good performance results using that setting and replacing Factorio's allocator, on both Windows and Linux.
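
In case it saves someone a docs dive, the knob I believe is being referred to (option and function names are as I remember them from the mimalloc docs, so double-check before relying on this):

    /* sketch; roughly equivalent to setting MIMALLOC_RESERVE_HUGE_OS_PAGES=1 in the
       environment when mimalloc is injected via LD_PRELOAD / an allocator override */
    #include <mimalloc.h>
    #include <string.h>

    int main(void) {
        /* reserve one 1 GiB huge page up front, before any allocation happens */
        mi_option_set(mi_option_reserve_huge_os_pages, 1);

        void *p = mi_malloc(64UL * 1024 * 1024);   /* served from the reserved pages */
        memset(p, 0, 64UL * 1024 * 1024);
        mi_free(p);
        return 0;
    }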



