More

stavros · 2026-01-15T23:30:54 1768519854

Humans think this way. This isn't a cultural thing, it's human nature. We like positive people and dislike negative people. Ignoring the fact that political capital is a thing won't make it go away.

dzink · 2026-01-15T23:41:43 1768520503

The goal is not to ignore human nature, but to build better tools for orgs to get feedback and act on it before it corrodes them on the inside. Government is the biggest of them all - fix this and maybe you can create government that works for you, instead of blowing taxpayer dollars like a leaky bucket. Humans in an organizations are like cells or organs in a body. Every country, team, and organization iterates on a proper nervous system for their body.

shiroiuma · 2026-01-16T00:51:24 1768524684

Are there any good examples of governments that work really well? I don't think so.

I think this just shows that we'll all be better off when we can make AI smart enough so we can put it in charge of everything.

ajkjk · 2026-01-16T02:40:14 1768531214

imo it's a cultural thing specific to organizations which are raking in money, as many tech companies are. The less actual competitive pressure there is the more everyone is pressured to just shut up and take their cut. Whether it's more or less than it could be is less important than just not rocking the boat.

Whereas if real existential need is on the line then people are incentivized to give a shit about the outcome more.

Tech is so rich in general that the norm is to just shut up and enjoy your upple-middle-class existence instead of caring about the details. After all, if this company blows up, there's another one way that will take most of you.

Not that this excludes the same behavior in industries that are less lucrative. There's cultural inertia to contend with, plus loads of other effects. But I have noticed that this attitude seems to spontaneously arise whenever a place is sufficiently cushy.

Also, this take doesn't (on its own) recommend one strategy or the other. Maybe it makes the most sense to go along with things or fight them for personal reasons, uncorrelated to the economic ones. But it's good, I think, to recognize that the impulse is somewhat biased by the risk-reward calculation of a rich workplace. Basically it is essentially coupled to a sort of privilege.

stavros · 2026-01-15T11:08:07 1768475287

It's that we make enough money to not be trapped in it for life.

Jobs are, by definition, things you get paid to do, because you wouldn't do them for free. Therefore, by definition, everyone hates their job to some degree. We just have the luxury of leaving.

stavros · 2026-01-15T11:01:22 1768474882

I've been a farmer and I've been a software developer, and farming was just a "this is work that puts money on the table", whereas software development is what I really find fulfilling. I entirely agree with you that it's idolized too much (together with carpentry), and yes, do whatever makes you happy, for some people it's one, for some it's the other.

stavros · 2026-01-15T09:09:53 1768468193

It wasn't a philosophical disagreement, they needed some geo info from the DNS server to route requests so they could prevent spam and Cloudflare wasn't providing it citing privacy reasons. The admin decided to block Cloudflare rather than deal with the spam.

arcfour · 2026-01-15T15:51:17 1768492277

Had nothing to do with spam, the argument by archive.today that they needed EDNS client subnet info made no sense, they aren't anycasting with edge servers in every ISP PoP.

ventegus · 2026-01-15T16:02:15 1768492935

They use EDNS for regional compliance, not for bandwidth optimization.

josephcsible · 2026-01-15T23:33:15 1768519995

What specific part of regional compliance actually needs this, and why does no other website seem to need it?

stavros · 2026-01-15T09:06:46 1768468006

Same, whenever I try to dictate something I always umm and ahhh and go back a bunch of times, and it's faster to just type. I guess it's just a matter of practice, and I'm fine when I'm talking to other people, it's only dictation I'm having trouble with.

stavros · 2026-01-14T22:13:54 1768428834

We don't. The interface to the LLM is tokens, there's nothing telling the LLM that some tokens are "trusted" and should be followed, and some are "untrusted" and can only be quoted/mentioned/whatever but not obeyed.

strbean · 2026-01-14T23:54:04 1768434844

If I understand correctly, message roles are implemented using specially injected tokens (that cannot be generated by normal tokenization). This seems like it could be a useful tool in limiting some types of prompt injection. We usually have a User role to represent user input, how about an Untrusted-Third-Party role that gets slapped on any external content pulled in by the agent? Of course, we'd still be reliant on training to tell it not to do what Untrusted-Third-Party says, but it seems like it could provide some level of defense.

kevincox · 2026-01-15T00:27:05 1768436825

This makes it better but not solved. Those tokens do unambiguously separate the prompt and untrusted data but the LLM doesn't really process them differently. It is just reinforced to prefer following from the prompt text. This is quite unlike SQL parameters where it is completely impossible that they ever affect the query structure.

pshc · 2026-01-15T01:12:02 1768439522

I was daydreaming of a special LLM setup wherein each token of the vocabulary appears twice. Half the token IDs are reserved for trusted, indisputable sentences (coloured red in the UI), and the other half of the IDs are untrusted.

Effectively system instructions and server-side prompts are red, whereas user input is normal text.

It would have to be trained from scratch on a meticulous corpus which never crosses the line. I wonder if the resulting model would be easier to guide and less susceptible to prompt injection.

tempaccsoz5 · 2026-01-15T03:12:07 1768446727

Even if you don't fully retrain, you could get what's likely a pretty good safety improvement. Honestly, I'm a bit surprised the main AI labs aren't doing this

You could just include an extra single bit with each token that represents trusted or untrusted. Add an extra RL pass to enforce it.

dvt · 2026-01-14T22:25:41 1768429541

We do, and the comparison is apt. We are the ones that hydrate the context. If you give an LLM something secure, don't be surprised if something bad happens. If you give an API access to run arbitrary SQL, don't be surprised if something bad happens.

stavros · 2026-01-14T22:33:04 1768429984

So your solution to prevent LLM misuse is to prevent LLM misuse? That's like saying "you can solve SQL injections by not running SQL-injected code".

jychang · 2026-01-14T23:14:11 1768432451

Isn't that exactly what stopping SQL injection involves? No longer executing random SQL code.

Same thing would work for LLMs- this attack in the blog post above would easily break if it required approval to curl the anthropic endpoint.

stavros · 2026-01-14T23:17:04 1768432624

No, that's not what's stopping SQL injection. What stops SQL injection is distinguishing between the parts of the statement that should be evaluated and the parts that should be merely used. There's no such capability with LLMs, therefore we can't stop prompt injections while allowing arbitrary input.

dvt · 2026-01-14T23:33:19 1768433599

Everything in an LLM is "evaluated," so I'm not sure where the confusion comes from. We need to be careful when we use `eval()` and we need to be careful when we tell LLMs secrets. The Claude issue above is trivially solved by blocking the use of commands like curl or manually specifiying what domains are allowed (if we're okay with curl).

stavros · 2026-01-14T23:36:46 1768433806

The confusion comes from the fact that you're saying "it's easy to solve this particular case" and I'm saying "it's currently impossible to solve prompt injection for every case".

Since the original point was about solving all prompt injection vulnerabilities, it doesn't matter if we can solve this particular one, the point is wrong.

dvt · 2026-01-14T23:56:41 1768435001

> Since the original point was about solving all prompt injection vulnerabilities...

All prompt injection vulnerabilities are solved by being careful with what you put in your prompt. You're basically saying "I know `eval` is very powerful, but sometimes people use it maliciously. I want to solve all `eval()` vulnerabilities" -- and to that, I say: be careful what you `eval()`. If you copy & paste random stuff in `eval()`, then you'll probably have a bad time, but I don't really see how that's `eval()`'s problem.

If you read the original post, it's about uploading a malicious file (from what's supposed to be a confidential directory) that has hidden prompt injection. To me, this is comparable to downloading a virus or being phished. (It's also likely illegal.)

acjohnson55 · 2026-01-15T02:16:17 1768443377

The problem is that most interesting applications of LLMs require putting data into them that isn't completely vetted ahead of time.

rswail · 2026-01-15T08:04:56 1768464296

The problem here is that the domain was allowed (Anthropic) but Anthropic don't check the API key belongs to the user that started the session.

Essentially, it would be the same if attacker had its AWS API Key and uploaded the file into an S3 bucket they control instead of the S3 bucket that user controls.

delaminator · 2026-01-15T06:58:50 1768460330

By the time you’ve blocked everything that has potential to exfiltrate, you are left with a useless system.

As I saw on another comment “encode this document using cpu at 100% for one in a binary signalling system “

Xirdus · 2026-01-14T23:34:32 1768433672

SQL injection is possible when input is interpreted as code. The protection - prepared statements - works by making it possible to interpret input as not-code, unconditionally, regardless of content.

Prompt injection is possible when input is interpreted as prompt. The protection would have to work by making it possible to interpret input as not-prompt, unconditionally, regardless of content. Currently LLMs don't have this capability - everything is a prompt to them, absolutely everything.

kentm · 2026-01-15T01:07:33 1768439253

Yeah but everyone involved in the LLM space is encouraging you to just slurp all your data into these things uncritically. So the comparison to eval would be everyone telling you to just eval everything for 10x productivity gains, and then when you get exploited those same people turn around and say “obviously you shouldn’t be putting everything into eval, skill issue!”

acjohnson55 · 2026-01-15T02:18:28 1768443508

Yes, because the upside is so high. Exploits are uncommon, at this stage, so until we see companies destroyed or many lives ruined, people will accept the risk.

wat10000 · 2026-01-14T22:46:38 1768430798

I can trivially write code that safely puts untrusted data into an SQL database full of private data. The equivalent with an LLM is impossible.

dvt · 2026-01-14T23:34:08 1768433648

It's trivial to not let an AI agent use curl. Or, better yet, only allow specific domains to be accessed.

strbean · 2026-01-14T23:44:51 1768434291

That's not fixing the bug, that's deleting features.

Users want the agent to be able to run curl to an arbitrary domain when they ask it to (directly or indirectly). They don't want the agent to do it when some external input maliciously tries to get the agent to do it.

That's not trivial at all.

dvt · 2026-01-14T23:59:33 1768435173

Implementing an allowlist is pretty common practice for just about anything that accesses external stuff. Heck, Windows Firewall does it on every install. It's a bit of friction for a lot of security.

acjohnson55 · 2026-01-15T02:23:09 1768443789

But it's actually a tremendous amount of friction, because it's the difference between being able to let agents cook for hours at a time or constantly being blocked on human approvals.

And even then, I think it's probably impossible to prevent attacks that combine vectors in clever ways, leading to people incorrectly approving malicious actions.

wat10000 · 2026-01-15T00:08:12 1768435692

It's also pretty common for people to want their tools to be able to access a lot of external stuff.

From Anthropic's page about this:

> If you've set up Claude in Chrome, Cowork can use it for browser-based tasks: reading web pages, filling forms, extracting data from sites that don't have APIs, and navigating across tabs.

That's a very casual way of saying, "if you set up this feature, you'll give this tool access to all of your private files and an unlimited ability to exfiltrate the data, so have fun with that."

stavros · 2026-01-14T21:01:04 1768424464

Then more people need to use a VPN!

stavros · 2026-01-14T15:23:12 1768404192

You're in luck, the article speaks about that at length!

touisteur · 2026-01-14T15:38:25 1768405105

Sorry, I went full typical HN commenter stereotype :-)

stavros · 2026-01-14T16:10:18 1768407018

I do it all the time too.

stavros · 2026-01-14T15:20:21 1768404021

It's not "Postgres for everything", it's "Postgres by default". Nobody is saying you should replace your billion-message-per-second Kafka cluster (or whatever) with Postgres, but plenty of people are saying "don't start with a Kafka cluster when you have two messages a day", which is a much better idea than "MongoDB by default".

stavros · 2026-01-14T14:30:18 1768401018

It's not like I'll get a choice between the task database going down and not going down. If my task database goes down, I'm either losing jobs or duplicating jobs, and I have to pick which one I want. Whether the downtime is at the same time as the production database or not is irrelevant.

In fact, I'd rather it did happen at the same time as production, so I don't have to reconcile a bunch of data on top of the tasks.