
Some real world examples I have used:

Harvard architecture machines (which are not uncommon in microcontrollers)

Segmented memory, or any non-flat memory

Addressable memory with non-uniform access time (cache doesn't count because cache lines can't be addressed directly)

Address spaces whose size is not a power of 2.

Variable byte sizes ("byte" did not mean "8 bits" until the System/360, and even then only in the IBM world).

Word length larger than address length.

Some hardware-tagged architectures.

Machines with hardware support for transporting (copying) garbage collectors.

Different regions of memory that are architecturally distinct (shared memory with machines of different architectures, which these days can mean GPUs).
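To make the segmented-memory case concrete, here is a minimal sketch that assumes x86 real mode as the example machine (the list above doesn't name a specific one). There, a pointer is a (segment, offset) pair and the physical address is segment * 16 + offset, so many different pairs name the same byte. Comparing the raw pairs gives the wrong answer unless they are first normalized. The function names `linear_address` and `normalize` are hypothetical, chosen only for illustration.

```python
# Hypothetical model of x86 real-mode segmentation: a "far pointer" is a
# (segment, offset) pair, and the physical address is segment * 16 + offset.

def linear_address(segment: int, offset: int) -> int:
    """Translate a 16-bit segment:offset pair to a 20-bit physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits and add the offset; the result is
    # masked to 20 bits to model the 8086's 1 MiB wraparound.
    return ((segment << 4) + offset) & 0xFFFFF


def normalize(segment: int, offset: int) -> tuple[int, int]:
    """Rewrite a pair so the offset is below 16, giving one canonical form."""
    lin = linear_address(segment, offset)
    return (lin >> 4, lin & 0xF)


a = (0x1234, 0x0005)
b = (0x1000, 0x2345)

# The raw pairs differ, so a naive comparison calls them different pointers...
print(a == b)  # False
# ...but both name the same physical byte.
print(linear_address(*a) == linear_address(*b))  # True
# After normalization the two pairs compare equal.
print(normalize(*a) == normalize(*b))  # True
```

This is why "a pointer is just an integer" breaks down on such machines: equality and ordering depend on which representation you compare, not just on which byte is meant.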

And one I haven't used yet:

Distributed computation in RAM.