Writes on NAND happen at the block, not the page level, though. I believe the ratio between the two is usually something like 1:8 or so.
Even blocks might still be larger than 4KB, but if they’re not, presumably a NAND controller could allow such smaller writes to avoid write amplification?
The mapping between physical and logical block address is complex anyway because of wear leveling and bad block management, so I don’t think there’s a need for write granularity to be the erase block/page or even write block size.
> Even blocks might still be larger than 4KB, but if they’re not, presumably a NAND controller could allow such smaller writes to avoid write amplification?
Look at what SandForce was doing a decade+ ago. They had hardware compression to lower write amp and some sort of 'battery backup' to ensure operations completed. Various bits of this sort of tech is in most decent drives now.
> The mapping between physical and logical block address is complex anyway because of wear leveling and bad block management, so I don’t think there’s a need for write granularity to be the erase block/page or even write block size.
The controller needs to know what blocks can get a clean write vs what needs an erase; that's part of the trim/gc process they do in background.
Assuming you have sufficient space, it works kinda like this:
- Writes are done to 'free-free' area, i.e. parts of the flash it can treat like SLC for faster access and less wear. If you have less than 25%-ish of drive free this becomes a problem. Controller is tracking all of this state.
- When it's got nothing better to do for a bit, controller will work to determine which old blocks to 'rewrite' with data from the SLC-treated flash into 'longer lived' but whatever-Level-cell storage. I'm guessing (hoping?) there's a lot of fanciness going on there, i.e. frequently touched files take longer to get a full rewrite.
Even blocks might still be larger than 4KB, but if they’re not, presumably a NAND controller could allow such smaller writes to avoid write amplification?
The mapping between physical and logical block address is complex anyway because of wear leveling and bad block management, so I don’t think there’s a need for write granularity to be the erase block/page or even write block size.