A SIMD coding challenge: First non-space character after newline

clausecker · 2025-12-26T00:24:55 1766708695

You can do it like this, assuming A is the mask of newlines and B is the mask of non-spaces.

1. Compute M1 = ~A & ~B, which is the mask of all spaces that are not newlines 2. Compute M2 = M1 + (A << 1) + 1, which is the first non-space or newline after each newline and then additional bits behind each such newline. 3. Compute M3 = M2 & ~M1, which removes the junk bits, leaving only the first match in each section

Here is what it looks like:

    10010000 = A
    01100110 = B
    00001001 = M1 = ~A & ~B
    00101010 = M2 = M1 + (A << 1) + 1
    00100010 = M3 = M2 & ~M1

Note that this code treats newlines as non-spaces, meaning if a line comprises only spaces, the terminating NL character is returned. You can have it treat newlines as spaces (meaning a line of all spaces is not a match) by computing M4 = M3 & ~A.

zokrezyl · 2025-12-27T09:50:21 1766829021

Thanks for the suggestion. Indeed, that would be the first approach. That is how I started. This is however not considering the state (am I inside a 'statement' or not)

statement meaning string from first non-space till next EOL or EOF.

Problem starts when you need to cover the "corner cases". Without the corner cases the algo is not algo.

zokrezyl · 2025-12-27T10:02:41 1766829761

Do you mind trying out your solution? The code is in https://github.com/zokrezyl/yaal-cpp-poc Thanks a lot!

Obviously if your solutions gets closed to the memory bandwith limit, we will proudly mention it!

camel-cdr · 2025-12-23T18:37:55 1766515075

The problem should be equivalent to: https://www.reddit.com/r/simd/comments/1hmwukl/mask_calculat...

Falvyu's and bremac's solution seems to be the best.

zokrezyl · 2025-12-23T22:11:50 1766527910

thanks for pointing out! I tried the borrowing trick from the previous segment, was pretty obvious, but for some reason failed as could not avoid at least one conditinonal... will try again.

zokrezyl · 2025-12-24T12:58:54 1766581134

For my problem describe under the link above the suggestions above eliminate indeed the branches, but same time the extra instructions slow down the same as my initial branches. Meaning, detecting newlines would work almost 100% of memory throughput, but detecting first non-space reduces the speed to bit above 50% of bandwith

https://gist.github.com/zokrezyl/8574bf5d40a6efae28c9569a8d6...

pestatije · 2025-12-23T17:25:57 1766510757

wheres the code?...have a look at codereview[5], the whole site is geared for this kind of challenges

[5] codereview.stackexchange.com

zokrezyl · 2025-12-23T17:36:14 1766511374

I do not have one "implementation" but have been trying with different approaches that all delivered under 50% of memory bandwith... I guess if anyone can purpose a solution should be from scratch... The problem is that all approaches I tried end up generating unpredictable branches that do not allow the CPU to optimally keep loading text from memory.

zokrezyl · 2025-12-23T17:46:26 1766511986

https://godbolt.org/z/3YMbaeEGh

One approach....