Thanks for doing more research. Did you get the sense that there are two separate issues at hand? 1) The compiler optimizes the memory access away because, from its point of view, the value is not written anywhere in that process -- say, it uses a register instead of a memory read. This is the effect I was seeing. 2) The compiler emits the memory read instruction, but depending on the processor, at runtime some of the writes will not show up at all, or will be reordered. In this case running the program on a different CPU might make it appear to work correctly. From what I read back then, I think there is a slight difference between the two. Can the first issue be fixed with a memory barrier?
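
A minimal sketch of issue 1, assuming a plain C flag polled in a loop (the variable and function names are just for illustration):

    #include <stdbool.h>

    bool ready = false;   /* set to true by another thread or an ISR */

    void wait_for_ready(void)
    {
        /* With optimization enabled, the compiler may load `ready` into a
           register once and spin on that register forever, because nothing
           in this translation unit appears to modify it inside the loop. */
        while (!ready)
            ;   /* busy-wait */
    }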


Yes, you are correct that there are two issues. But solving 2 may solve 1 automatically, or may just require compiler barrier intrinsics in the same spots as the processor barrier intrinsics.

And yes, the first issue can be fixed with memory barriers. If you put a compiler+processor read barrier in front of the first line that reads the shared variable, the compiler will re-read it from memory into a register. After that it will keep using the register, so as not to kill performance, until it hits the barrier again.
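
A rough sketch of that pattern, assuming GCC/Clang builtins (__sync_synchronize is a full compiler+processor barrier; a C11 atomic_thread_fence would be the portable equivalent; the names are just for illustration):

    #include <stdbool.h>

    bool ready = false;   /* shared with another thread */

    void wait_for_ready(void)
    {
        for (;;) {
            /* Full barrier: acts as a compiler barrier (forces a fresh
               load of `ready` from memory rather than reusing a cached
               register) and as a processor barrier (orders the read
               against earlier accesses). */
            __sync_synchronize();
            if (ready)
                break;
        }
    }

A compiler-only barrier, by comparison, is just asm volatile("" ::: "memory") -- it stops the compiler from caching or reordering across that point, but emits no fence instruction for the CPU.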


So you think wrapping every single access to that shared memory structure in barriers would do the trick? I should dig out that code and try; I am curious now. But that would be kind of a large change. Right now the volatile modifier is in one place only -- where the data is defined, not where it is accessed.
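
To make that contrast concrete, a sketch of the two shapes (hypothetical names, GCC/Clang builtin assumed):

    #include <stdbool.h>

    /* volatile: declared once where the data is defined; every access
       through this name is then a real load or store. */
    volatile bool ready_v = false;

    /* barrier approach: plain variable, but a barrier is needed at each
       access site rather than at the definition. */
    bool ready = false;

    bool poll_ready(void)
    {
        __sync_synchronize();   /* repeated at every read site */
        return ready;
    }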


If you do it correctly it should work. I'm curious whether that code is running on a multi-core (or even hyperthreaded) CPU.



