jldugger's comments

>Turns out you can compile tens of thousands of patterns and still match at line rate.

Well, yeah, that's sort of the magic of the regular expression <-> NFA equivalence theorem. Any regex can be converted to a state machine. And since you can combine regexes (and NFAs!) procedurally, this is not a surprising result.
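To make the "combine regexes procedurally" point concrete, here is a minimal sketch that joins several patterns into one alternation and reports which one matched. The pattern names and the `classify` helper are made up for illustration. Python's `re` is a backtracking engine, not an automaton-based one like Hyperscan/Vectorscan, so this shows only the composition step, not the line-rate matching.

```python
import re

# Hypothetical patterns, for illustration only.
PATTERNS = {
    "health_check": r"GET /healthz",
    "timeout": r"timed out after \d+ms",
    "oom": r"out of memory",
}

# Join every pattern into one alternation. Each branch gets a named group
# so a hit can be traced back to the pattern that produced it.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS.items())
)


def classify(line):
    """Return the name of the pattern that matched, or None if none did."""
    match = COMBINED.search(line)
    return match.lastgroup if match else None
```

The same composition scales to thousands of branches. Whether matching stays fast depends on the engine underneath.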

> I ran it against the first service: ~40% waste. Another: ~60%. Another: ~30%. On average, ~40% waste.

I'm surprised it's only 40%. Observability seems to be treated like fire suppression systems: all-important in a crisis, but it looks like waste during normal operations.

> The AI can't find the signal because there's too much garbage in the way.

There are surprisingly simple techniques to filter out much of the garbage: compare logs from known good to known bad, and look for the stuff that's strongly associated with bad. The precise techniques seem Bayesian in nature, as the more evidence (logs) you get, the more strongly associated it will appear.
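One way to sketch the good-vs-bad comparison is a smoothed log-odds ratio per message template. This is an assumed scoring choice, not a specific named method from the thread, and `association_scores` and its `smoothing` parameter are hypothetical. It also assumes log lines have already been reduced to templates.

```python
import math
from collections import Counter


def association_scores(good_logs, bad_logs, smoothing=1.0):
    """Score each message template by how strongly it is tied to the bad set.

    Returns a dict mapping template -> log-odds ratio. Positive means the
    template is over-represented in bad_logs, negative means in good_logs.
    """
    good = Counter(good_logs)
    bad = Counter(bad_logs)
    vocab = set(good) | set(bad)
    # Additive smoothing keeps templates seen on only one side finite.
    good_total = sum(good.values()) + smoothing * len(vocab)
    bad_total = sum(bad.values()) + smoothing * len(vocab)
    scores = {}
    for template in vocab:
        p_bad = (bad[template] + smoothing) / bad_total
        p_good = (good[template] + smoothing) / good_total
        scores[template] = math.log(p_bad / p_good)
    return scores
```

Sorting by score in descending order puts high-volume, bad-only templates first. Templates that appear at the same rate on both sides score near zero.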

More sophisticated techniques will do dimensional analysis -- are these failed requests associated with a specific pod, availability zone, locale, software version, query string, customer, etc.? But you'd have to do so much pre-analysis, prompting, and tool calls that the LLMs that comprise today's AI won't provide any actual value.
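A rough sketch of the dimensional breakdown, assuming each request is a dict with a boolean `failed` field plus one key per dimension (the record shape and the `failure_rates` name are assumptions for illustration):

```python
from collections import defaultdict


def failure_rates(requests, dimensions):
    """Compute the failure rate for every value of every dimension.

    requests: iterable of dicts with a boolean "failed" key plus dimension keys.
    Returns a list of (dimension, value, failure_rate, count), worst first.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for request in requests:
        for dim in dimensions:
            key = (dim, request.get(dim))
            totals[key] += 1
            if request["failed"]:
                failures[key] += 1
    rows = [
        (dim, value, failures[(dim, value)] / count, count)
        for (dim, value), count in totals.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

A dimension value whose failure rate sits far above the others (one pod, one zone, one version) is the candidate to look at first. A real analysis would also weigh the count so tiny samples don't dominate.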


Yeah, it's funny, I never went down the regex rabbit hole until this, but I was blown away by Hyperscan/Vectorscan. It truly changes the game. Traditional wisdom tells you regex is slow.

> I'm surprised it's only 40%.

Oh, it's worse. I'm being conservative in the post. That number represents "pure" waste without sampling. You can see how we classify it: https://docs.usetero.com/data-quality/logs/malformed-data. If you get comfortable with sampling the right way (entire transactions, not individual logs), that number gets a lot bigger. The beauty of categories is you can incrementally root out waste in a way you're comfortable with.
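To illustrate sampling entire transactions rather than individual logs, here is a minimal sketch that hashes a transaction ID so every log sharing that ID gets the same keep/drop decision. The `trace_id` field name and both function names are assumptions, not Tero's actual implementation.

```python
import hashlib


def keep_transaction(trace_id, sample_rate):
    """Decide deterministically whether to keep every log in a transaction.

    The decision depends only on trace_id, so all logs that share an ID are
    kept or dropped together.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < sample_rate * 2**64


def sample_logs(logs, sample_rate):
    """Filter logs (dicts with a "trace_id" key) at transaction granularity."""
    return [log for log in logs if keep_transaction(log["trace_id"], sample_rate)]
```

Because the decision is a pure function of the ID, independent collectors reach the same verdict without coordinating, and a kept transaction always arrives complete.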

> compare logs from known good to known bad

I think you're describing anomaly detection. Diffing normal vs abnormal states to surface what's different. That's useful for incident investigation, but it's a different problem than waste identification. Waste isn't about good vs bad, it's about value: does this data help anyone debug anything, ever? A health check log isn't anomalous, it's just not worth keeping.

You're right that the dimensional analysis and pre-processing is where the real work is. That's exactly what Tero does. It compresses logs into semantic events, understands patterns, and maps meaning before any evaluation happens.


> Traditional wisdom tells you regex is slow.

Because it's uncomfortably easy to create catastrophic backtracking.

But just logical-ORing many patterns together isn't one of the ways to do that, at least as far as I'm aware.


> I think you're describing anomaly detection.

Well it's in the same neighborhood. Anomaly detection tends to favor finding unique things that only happened once. I'm interested in the highest volume stuff that only happens on the abnormal state side. But I'm not sure this has a good name.

> Waste isn't about good vs bad, it's about value: does this data help anyone debug anything, ever?

I get your point but: if sorting by the most strongly associated yields root causes (or at least, maximally interesting logs), then sorting in the opposite direction should yield the toxic waste we want to eliminate?


Vectorscan is impressive. It makes a huge difference if you're looping through an eval of dozens (or more) regexps. I have a pending PR to fix it so it'll run as a wasm engine -- this is a good reminder to take that to completion.

But if you don't do anomaly detection, how can you possibly know which data is useful for anomaly detection? And thus, which data is valuable to keep?

> so that customer support collapses the same day every year.

Every _month_. And it's not just the customer service desk that's a problem. With even distribution of billing and a large customer base, outflows match inflows and you don't have to do much to manage it. With all money coming in on one day you have a huge outflow of money and then it all rushes back in.

Much easier to borrow 1 dollar for a year than 30 dollars for a month.


Who was the second bank?

It was Synchrony, then the current one, and now it's changing again.

Green Dot Corporation, presumably, although that’s Apple Cash, not Apple Card.

It really depends on the definition of catch. Citi Double Cash, Fidelity, Wells Fargo and US Bank all do 2%.

Personally, I use a 2.625% cash back card with the "catch" being that I have to have enough stock in their subsidiary brokerage to qualify for the top rewards tier. Since I just buy and hold SP500 ETFs, this is an easy requirement.


Bank of America Unlimited Cash Rewards for the win :) My only regret is not realizing sooner that it existed since I used the Citi Double Cash card for so long.

Fred also has that answer: https://fred.stlouisfed.org/series/POPTHM

Growing up to an estimated 342 million.

It also has an estimate for the working population (ages 25-54, so called "prime workers"): https://fred.stlouisfed.org/series/LNU00000060

Mostly flat from 2010-2021, with a recent uptick to 131 million. The discrepancy is likely due to the boomers aging out of the category, and a smaller generation coming in.


Maybe this isn't what you meant, but Millennials are the larger generation and they just finished aging into the workforce category.

The youngest Millennials were 18 in 2014 and 22 in 2018. At this point, it's the smaller Gen Z entering the workforce, not Millennials.

Let me put it another way: the [20, 25) and [25, 30) age cohorts are larger than any cohort aged 50+ that might have recently aged out. So that "prime age" workforce is still growing.

This could be true, but it isn't obviously true (to me). (I dispute a little bit the idea that there are many new workers in the [25, 30) demo.) There are 37M workers 55+, but only 20M in the 16-24 range: https://www.bls.gov/cps/cpsaat18b.htm (2024 numbers)

Nobody in either of those cohorts is in the BLS "prime age" group which is [25, 55). The incoming cohorts that are now 15 to 25 are larger than the outgoing cohorts that are 45 to 55.

Ah, that makes sense. Thank you.

Related, yes. US DoT wants non-dom CDLs revoked[1]:

> The decision comes amid pressure from the U.S. Department of Transportation, which announced in November 2025 that it would compel California to revoke thousands of what it calls “illegally issued” non-domiciled Commercial Driver’s Licenses.

[1]: https://www.newsnationnow.com/politics/california-cancellati...


I'm kinda okay with putting the AI slop behind a paywall if it means nobody will actually see it.


There will be customers even though it is a useless feature tier.

Monetizing knowledge-work is nearly impossible if you want everyone to be rational about it. You gotta go for irrational customers like university and giant-org contracts, and that will happen here because of institutional inertia.


Interesting -- I just use https://news.ycombinator.com/best?h=168 for a weekly roundup, but that only tracks posts. Might need to supplement it with highlights or similar.

Reviewing the HN docs, https://news.ycombinator.com/bestcomments?h=168 might also be a good summary link.


On my way home, I noticed at a stoplight across the street from Apple Park that the driver in the lane next to me had his phone mounted up high in landscape mode and was watching the Simpsons. Just absolutely unhinged behavior lately.


Afaik, data center grade Blackwell chips have never been legal for export to China. I think this has more to do with NVIDIA than DeepSeek. For a brief moment, people thought DeepSeek had found some way to produce AI without sending boatloads of cash to NVIDIA, causing a drop in share price.

Shortly thereafter people realized they were probably just evading sanctions and ~stealing~ bootstrapping parameters from other models to reach their stated training cost. This report is just further reporting on that rumor.

