> Often, the CPU has a different number of entries for each page size.
- Does it mean userspace is free to allocate up to a maximum of 1G? I took pages to have a fixed size.
- Or, you mean CPUs reserve TLB sizes depending on the requested page size?
> With 2 MiB or 1 GiB pages, there is less contention and more workingset size is covered by the TLB
- Would memory allocators / GCs need to be changed to deal with blocks of 1G? Would you say the current ones found in popular runtimes/implementations are adept at doing so?
- Does it not adversely affect databases accustomed to smaller page sizes now finding themselves paging in 1G at once?
> my PhD is on memory management and address translation on large memory systems
If the dissertation is public, please do link it, if you're comfortable doing so.
> - Does it mean userspace is free to allocate up to a maximum of 1G? I took pages to have a fixed size.

> - Or, you mean CPUs reserve TLB sizes depending on the requested page size?
The TLB is a hardware cache with a limited number of entries that cannot change dynamically. Your CPU ships with a fixed number of entries dedicated to each page size. Translations of base 4 KiB pages could, for example, have 1024 entries, translations of 2 MiB pages could have 512, and those of 1 GiB usually get a very limited number, only 8 or 16. Nowadays, most CPU vendors have grown their 2 MiB TLBs to have the same number of entries as the ones dedicated to 4 KiB pages.
If you're wondering why they have to be separate caches, it's because, for any page in memory, you can have mappings of both sizes at the same time, from different processes or different parts of the same process, possibly with different protections.
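To tie this back to the "is userspace free to allocate up to 1G?" question: userspace doesn't change the page size of existing memory, it opts into a larger page size for a given mapping. Here is a minimal sketch of my own (not from the comment above) using mmap() with MAP_HUGETLB; it assumes the admin has reserved 2 MiB hugetlb pages (e.g. via /proc/sys/vm/nr_hugepages), otherwise the call fails with ENOMEM:

```c
/* Sketch: explicitly back a mapping with 2 MiB huge pages.
 * Assumes 2 MiB hugetlb pages are reserved on the system. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_2MB
#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)   /* 21 = log2(2 MiB) */
#endif

int main(void)
{
    size_t len = 2UL << 20;   /* length must be a multiple of the huge page size */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB,
                   -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }
    ((volatile char *)p)[0] = 1;   /* touch it: one page fault, one TLB entry */
    munmap(p, len);
    return 0;
}
```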
> - Would memory allocators / GCs need to be changed to deal with blocks of 1G? Would you say the current ones found in popular runtimes/implementations are adept at doing so?
> - Does it not adversely affect databases accustomed to smaller page sizes now finding themselves paging in 1G at once?
Runtimes and databases have full control, and Linux allows per-process policies via the madvise() system call. If a program is not happy with huge pages, it can ask the kernel to skip it, just as it can choose to be cooperative.
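A minimal sketch of that per-region policy (my illustration, not anyone's production code): MADV_NOHUGEPAGE tells the kernel not to back a range with transparent huge pages, MADV_HUGEPAGE asks for the opposite.

```c
/* Sketch: opting a region out of (or into) transparent huge pages. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64UL << 20;   /* 64 MiB region, size chosen only for illustration */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* e.g. a database wanting fine-grained 4 KiB paging for this buffer: */
    if (madvise(buf, len, MADV_NOHUGEPAGE) != 0)
        perror("madvise(MADV_NOHUGEPAGE)");

    /* ...or an allocator that wants the kernel to use 2 MiB pages here:
     * madvise(buf, len, MADV_HUGEPAGE); */

    munmap(buf, len);
    return 0;
}
```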
> If the dissertation is public, please do link it, if you're comfortable doing so.
I'm still in the PhD process, so no cookies atm :D
I think modern Intel/AMD CPUs have the same number of dTLB entries for all page sizes.
For example, with a modern CPU that has 3k TLB entries, one can access at most:
- 12MB with 4k page size
- 6GB with 2M page size
- 3TB with 1G page size
If the working set per core is bigger than the numbers above, you get 10-20% slower memory accesses due to the TLB miss penalty.
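As a back-of-the-envelope check of those numbers, TLB reach is just entries × page size; the ~3k entry count below is the example figure above, not a measured value for any particular CPU.

```c
/* TLB reach = number of entries * page size, for the three common x86 page sizes. */
#include <stdio.h>

int main(void)
{
    const unsigned long entries = 3072;   /* ~3k dTLB entries (example figure) */
    const unsigned long sizes[] = { 4UL << 10, 2UL << 20, 1UL << 30 };
    const char *names[] = { "4 KiB", "2 MiB", "1 GiB" };

    for (int i = 0; i < 3; i++)
        printf("%s pages -> %lu MiB of TLB reach\n",
               names[i], (entries * sizes[i]) >> 20);
    /* Prints 12 MiB, 6144 MiB (~6 GB) and 3145728 MiB (~3 TB), matching the list above. */
    return 0;
}
```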
> I'm happy to talk about this all day!
With noobs, too? ;)