Hacker Newsnew | past | comments | ask | show | jobs | submit | rynn's commentslogin

9 years into transformers and only a couple years into highly useful LLMs I think the jury is still out. It certainly seems possible that some day we'll have the equivalent of an EDR or firewall, as we do for viruses and network security.

Not perfect, but good enough that we continue to use the software and networks that are open enough that they require them.


Firewalls run on explicit rules. The "lethal trifecta" thing tells you how to constrain an LLM to enforce some set of explicit rules.

It only tells you that you can't secure a system using an LLM as a component without completely destroying any value provided by using the LLM in the first place.

Prompt injection cannot be solved without losing the general-purpose quality of an LLM; the underlying problem is also the very feature that makes LLMs general.


>What is keeping Google/Amazon/Microsoft from licensing Groq’s tech?

Nothing, but they likely can't implement it as well as they could had they bought Groq first.


> licensing with BSL when basically every month the AI world is changing is not a smart decision

This turned me off as well. Especially with no published pricing and a link to a site that is not about this product.

At minimum, publish pricing.


Regarding DeepMyst. In the future will offer “optionally” the ability to use smart context where the context will be automatically optimized such that you won’t hit the context window limit “ basically no need for compact” and you would get much higher usage limits because the number of tokens needed will be reduced by up to 80% so you would be able to achieve with a 20 USD claude plan the same as the Pro plan


I strongly suggest to also allow to define a non summarizable part of the context so that behavioral rules stay sharp.


I agree and this is part of what DeepMyst is capable of doing


Is it already there? Pretty cool.


It is free and open source. Will make it MIT


Done and converted to MIT


Awesome, in that case, I'll check it out!


Where spies == logging and they tell you, and provide clear opt out instructions


Imagine thinking most people read Knowledge Base articles and don't just take the defaults.

They even manage to squeeze some FUD into the opt-out toggle's name.


> Please do give that a try and report back the prefill and decode speed.

M4 Max here w/ 128GB RAM. Can confirm this is the bottleneck.

https://pastebin.com/2wJvWDEH

I weighed about a DGX Spark but thought the M4 would be competitive with equal RAM. Not so much.


I think the DGX Spark will likely underperform the M4 from what I've read.

However it will be better for training / fine tuning, etc. type workflows.


> I think the DGX Spark will likely underperform the M4 from what I've read.

For the DGX benchmarks I found, the Spark was mostly beating the M4. It wasn't cut and dry.


The Spark has more compute, so it should be faster for prefill (prompt processing).

The M4 Max has double the memory bandwidth, so it should be faster for decode (token generation).


gpt-5.2 codex isn't available in the API yet.

If you want to be picky they could've compared it against gpt-5 pro gpt-5.2 gpt-5.1 gpt-5.1-codex-max gpt-5.2 pro

all depending on when they ran benchmarks (unless, of course, they are simply copying OAI's marketing).

At some point it's enough to give OAI a fair shot and let OAI come out with their own PR, which they doubtlessly will.


What are you working on that you’ve had such great success with gpt-oss?

I didn’t try it long because I got frustrated waiting for it to spit out wrong answers.

But I’m open to trying again.


I use it to build some side-projects, mostly apps for mobile devices. It is really good with Swift for some reason.

I also use it to start off MVP projects that involve both frontend and API development but you have to be super verbose, unlike when using Claude. The context window is also small, so you need to know how to break it up in parts that you can put together on your own


> What are you working on that you’ve had such great success with gpt-oss?

I'm doing programming on/off (mostly use Codex with hosted models) with GPT-OSS-120B, and with reasoning_effort set to high, it gets it right maybe 95% of the times, rarely does it get anything wrong.


It will be like the rest of computing, some things will move to the edge and others stay on the cloud.

Best choice will depend on use cases.


> It will be like the rest of computing, some things will move to the edge and others stay on the cloud.

It will become like cloud computing - some people will have a cloud bill of $10k/m to host their apps, other people would run their app on a $15/m VPS.

Yes, the cost discrepancy will be as big as the current one we see in cloud services.


I think the long term will depends on the legal/rent-seeking side.

Imagine having the hardware capacity to run things locally, but not the necessary compliance infrastructure to ensure that you aren't committing a felony under the Copyright Technofeudalism Act of 2030.


Where did you get all the data? The justice.gov site didn’t have a mass download option that I could find.


https://www.jmail.world/about

"We compiled these Epstein estate emails from the House Oversight Committee release by converting the PDFs to structured text with an LLM"

and:

"Data Sources

    Gmail emails: House Oversight Committee
    Yahoo emails: DDoSecrets (brought to us by Drop Site News)
Technology

Document parsing and extraction powered by reducto"


Yes, also many were PPM images (or encoded as such) in PDFs and then I used (cheap/light) multimodal LLMs to classify documents from photos. It was surprisingly cheap: <$1 for a few thousand PDFs / Images.


> I recognize this is a hard concept to understand for folks on this site, but the average joe signing up for a VPN doesn't even remotely understand what they are doing and why.

Really this is the answer to half of the comments on this thread.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: