xavierxwang's comments

xavierxwang · 2025-12-16T15:01:52 1765897312

Is there any chance to modify Vyukov's MPMC queue implement (https://www.1024cores.net/home/lock-free-algorithms/queues/b...) to support drop handler? That work doesn't need 128 bit CAS.

viega · 2025-12-16T15:18:48 1765898328

You can use a 64-bit CAS if you want to use a 32-bit epoch and a pointer compression scheme of any kind, or just a 32-bit index into regions that are thread specific. I think I did the later when I did the original work, using the core primitive to build ring buffers that have arbitrary sized slots instead of 64-bit slots (which requires a bit of additional gymnastics, but the basic trick is to have the ring index into a bigger ring that you can FAA into, where the bigger ring has more slots by at least the max number of threads (I use this primitive heavily still for in-memory debug logging). Maybe at some point I'll do an article on that too.

viega · 2025-12-16T15:20:02 1765898402

BTW, should be noted that the need to issue a cache line lock on x86 does seem to slow down 128-bit CAS quite a bit on x86-64 platforms. On arm64, there's no reason to skimp with a 64-bit CAS.

xavierxwang · 2025-12-16T15:00:12 1765897212

FYI: I have made a SPSC circular buffer for swap data in a pair of process: https://github.com/starwing/kaze-core

maybe that is what you want.

qbane · 2025-12-16T17:47:17 1765907237

By a quick glance, yes, this is what I want: a channel to communicate between processes via a piece of shared memory, protected by a pair of futexes.

In JS ecosystem, buffers that allow data loss is more common (aka ring buffers), but ringbuf.js [1] is the only complete implementation to my knowledge. In my use case on I/O between WASM modules where data must be transferred as-is, the process must block on buffer overrun/underrun in a synchronous manner. Therefore a circular buffer is required. I could not find such a niche library written in JS, so I decided to bite the bullet and reinvent the wheel [2].

[1]: https://github.com/padenot/ringbuf.js

[2]: https://github.com/andy0130tw/spsc

xavierxwang · 2025-12-16T22:18:50 1765923530

After a quick glance, it seems that you don’t maintain the reading/writing status in the shared memory. That means you have to make a syacall in every read/write call. You could look into the kaze-core for an alternative implementation, which doesn’t require any syscall if possible.

Btw, kaze-core uses a `used` atomic variable, to avoid reading both readPos/writePos in routine - they are not atomic at all.

qbane · 2025-12-17T19:22:39 1765999359

That is a fair assessment. Maintaining read/write pos and peek them at every operation is a big performance hit. The impact is amplified if each invocation needs a syscall. That is exactly what futexes address: Allowing spin locks to remain in user space and avoid entering the kernel as long as contention is low.

In JavaScript, atomic operations are relatively lightweight, so their overhead is likely acceptable. Given that, I am open to adjusting my code to your suggested approach and seeing how it performs in practice.

xavierxwang · 2025-12-16T13:57:05 1765893425

I spent a month porting Rust's Ariadne diagnostic renderer to C, with Claude as a pair programming partner. The project taught me a lot about working with LLMs on real system programming tasks - what works, what doesn't, and where human expertise still matters.