Hacker News | sippeangelo's comments

Governments using Palantir services as a loophole to enable mass surveillance by linking data is the evil part.

How is Palantir a loophole?

I see this theory a lot (sometimes to justify their valuation, sometimes as a moral judgement, sometimes as an alarmist concern), but I genuinely don't see how this line of thought works in any of those dimensions. My understanding is that they're consultants building overpriced data processing products. As far as I know there isn't even usually a separate legal entity or some kind of corporate shenanigan at play; my understanding is that they send engineers to the customer to build a product that the customer owns and operates under its own identity. I certainly see how businesses like Flock are a "loophole": they collect data which is unrestricted due to its "public" nature, then provide a giant trove of tools to process it, controlled only by what amounts to their own internal goodwill. But this isn't my understanding of how Palantir works; as far as I know they never take ownership of the data, so it isn't "laundered" from its original form and is still subject to whatever (possibly inadequate) controls or restrictions were already present on it.


> How is Palantir a loophole?

The big legal loophole is that the government needs a particularized warrant (per the 4th Amendment) to ask for any user data, but if the government buys commercial data, well, there's no warrant needed.

I would also submit that it's possible that sending everything through a giant computer-magic-bullshit-mixer allows you to discriminate on the basis of race while claiming plausible deniability, but SCOTUS has already constructively repealed the 14th Amendment between blessing Kavanaugh stops and the Roberts Court steadily repealing the Voting Rights Act, Bivens claims, etc.


> I would also submit that it's possible that sending everything through a giant computer-magic-bullshit-mixer

See also: Parallel Construction (i.e. evidence tampering) and most of the times a "drug-sniffing" dog is called to "test" something the police already want to search.


Which has what, exactly, to do with Palantir?

On a somewhat related note, it always bothers me that the discussion is about whether it’s appropriate for the government to buy this sort of data as opposed to whether it is appropriate for anyone to sell, or for that matter collect, that data.

I would prefer if neither the government nor any data brokers or advertisers had this data.


> The big legal loophole is that the government needs a particularized warrant (per the 4th Amendment) to ask for any user data, but if the government buys commercial data, well, there's no warrant needed.

Right; but as far as I know Palantir don't sell commercial data. That's my beef with this whole Palantir conspiracy theory. I am far from pro-Palantir but it really feels like they're working as a shield for the pitchforks in this case.


Pretty sure GP is saying that the data Palantir sells are commercial because they're being sold by Palantir.

Right, and what I’m saying is that to the best of my knowledge, Palantir don’t sell data at all, which is the fundamental misunderstanding people seem to have about them.

There are really two major concerns with Palantir:

1. They provide tech that is used to select targets for drone strikes and apparently also for targeting violent attacks on US civilians. I don't know too much about how the algorithm works but simply outsourcing decisions about who lives or dies to opaque algorithms is creepy. It also allows the people behind the operations to avoid personal responsibility for mistakes by blaming the mistakes on the software. It also could enable people to just not think about it and thus avoid the moral question entirely. It's an abstract concern but it is a legitimate one, IMO.

2. I don't know if this is 100% confirmed, but we have heard reports that Elon Musk and DOGE collected every piece of government data they could get their hands on, across various government departments and databases. These databases were previously islands that served one specific purpose and didn't necessarily connect to all the other government databases from other departments. It's suspected that Palantir software (perhaps along with Grok) is being used to link all of these databases together and cross-reference data that was previously not available for law enforcement or immigration purposes. This could enable a lot of potential abuse and probably isn't being subjected to any kind of court or congressional oversight.


We agree; I think these are more valid concerns than the "they are operating a data warehouse with all of the data in the entire universe" conspiracy theory that seems popular.

I certainly think that Palantir has ethical issues; as I stated in my parent comment, it wouldn't be high on my list of choices for places to work.

But, when it comes to things like (2), this is a failure of regulation and oversight and needs to be treated as such. Note that this doesn't make Palantir "right" (building a platform to do things that are probably bad is still bad), but there's no reason anyone with basic data warehousing skills couldn't have done this before or after.

Essentially, I think people give Palantir specifically too much credit and in turn ignore the fundamental issues they're worried about. Panic over "dismantle Palantir" or even the next step, "dismantle corporate data warehousing" is misguided and wouldn't address the issues at hand; worry about government data fusion needs to be directed towards government data fusion, and worry about computers making targeting decisions needs to be directed at computers making targeting decisions.


They sell data derived from the data. But it's not, like, a hash function - you can absolutely deduce the source data from it. In fact, that's the entire purpose. You use the aggregation and whatnot bullshit to find individuals, track them, gain insight into their living situation and patterns, and acquire evidence of crimes. Typically that requires a search warrant.

If you couldn't go backwards Palantir wouldn't have a market. So, I would consider that a loophole.


> They sell data derived from the data.

Do they? I don't think they even do this, either.

I have really strong knowledge of this from ~10 years ago and weak knowledge from more recently. I'm happy to be proven wrong but my understanding is that they don't sell any data at all, but rather just consulting services for processing data someone already has.

One of those consulting services is probably recommending vendors to supply more data, but as far as I know Palantir literally do not have a first-party data warehouse at all.


They also used Google, Facebook, etc... as a loophole for suppressing freedom of speech in the past (and could still be for all I know).

That's not a turn-taking model, it's just a silence detection Python script based on whatever text comes out of Whisper...
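For illustration, the kind of naive "turn detection" being described amounts to a gap threshold over streaming transcript timestamps. This is a sketch, not the project's actual code; the event format (a list of `(timestamp, text)` pairs, as a streaming Whisper wrapper might emit) is an assumption:

```python
def detect_turn_end(transcript_events, silence_threshold=1.2):
    """Naive 'turn-taking': declare the turn over once the gap between
    consecutive transcript segments exceeds silence_threshold seconds.

    transcript_events: list of (timestamp, text) pairs, e.g. from a
    hypothetical streaming Whisper wrapper. Returns the joined turns.
    """
    turns, current = [], []
    last_ts = None
    for ts, text in transcript_events:
        # A long gap since the previous segment closes the current turn.
        if last_ts is not None and ts - last_ts > silence_threshold and current:
            turns.append(" ".join(current))
            current = []
        current.append(text)
        last_ts = ts
    if current:
        turns.append(" ".join(current))
    return turns
```

The point of the criticism above is that this splits on silence in the *transcript stream* rather than modelling when a speaker actually intends to yield the floor.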

With all respect to Mozilla, "respects robots.txt" makes this effectively DOA. AI agents are a form of user agent like any other when initiated by a human, no matter the personal opinion of the content publisher (unlike the egregious automated /scraping/ done for model training).

This is a valid perspective. Since this is an emerging space, we are still figuring out how to show up in a healthy way for the open web.

We recognize that the balance between content owners and the users or developers accessing that content is delicate. Because of that, our initial stance is to default to respecting websites as much as possible.

That said, to be clear on our implementation: we currently only respond to explicit blocks directed at the Tabstack user agent. You can read more about how this works here: https://docs.tabstack.ai/trust/controlling-access
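A policy of honouring only blocks explicitly aimed at one agent (and ignoring wildcard `User-agent: *` groups) can be sketched as below. This is a simplified illustration, not Tabstack's implementation: the token `TabstackBot` is hypothetical, and real robots.txt matching (RFC 9309) additionally groups consecutive `User-agent` lines and uses longest-match precedence:

```python
def explicitly_blocked(robots_txt: str, path: str, agent: str = "TabstackBot") -> bool:
    """Return True only if a robots.txt group explicitly naming `agent`
    disallows `path`. Wildcard ('User-agent: *') groups are ignored,
    mirroring a policy of respecting only blocks aimed at this agent.
    'TabstackBot' is an illustrative token, not Tabstack's real UA string.
    """
    in_target_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_target_group = value.lower() == agent.lower()
        elif field == "disallow" and in_target_group:
            # Simple prefix match; an empty Disallow value blocks nothing.
            if value and path.startswith(value):
                return True
    return False
```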


This tension is so close to a fundamental question we’re all dealing with, I think: “Who is the web for? Humans or machines?”

Too often, people fall completely on one side of this question or the other. I think it's really complicated and deserves a lot of nuance. It mostly comes down to having a right to exert control over how our data should be used, and I think most of it's currently shaped by Section 230.

Generally speaking, platforms consider data to be owned by the platform. GDPR and CCPA/CPRA try to be the counter to that, but those are also too crude a tool.

Let’s take an example: Reddit. Let’s say a user is asking for help and I post a solution that I’m proud of. In that act, I’m generally expecting to help the original person who asked the question, and since I’m aware that the post is public, I’m expecting it to help whoever comes next with the same question.

Now (correct me if I'm wrong, but) GDPR considers my public post to be my data. I'm allowed to request that Reddit return it to me or remove it from the website. But then with Reddit's recent API policies, that data is also Reddit's product. They're selling access to it for whatever purposes they outline in the use policy there. That's pretty far outside what a user is thinking when they post on Reddit. And the other side of it as well: was my answer used to train a model that benefits from my writing and converts it into money for a model maker? (To name just one example.)

I think ultimately, platforms have too much control, and users have too little specificity in declaring who should be allowed to use their content and for what purposes.


There is still a difference between "fetch this page for me and summarise" and "go find pages for me, and cross-reference". And what makes you think that all AI agents using Tabstack would be directly controlled in real time with a 1:1 correspondence between human and agent, and not in some automated way?

I'm afraid that Tabstack would be powerful enough to bypass some existing countermeasures against scrapers, and that, once allowed in its lightweight mode, it would be used to scrape data it is not supposed to access. I'd bet that someone will at least try.

Then there is the issue of which actions an agent is allowed to take on behalf of a user. Many sites state in their Terms of Service that all actions must be done directly by a human, or that all submitted content be human-generated and not from a bot. I'd suppose that an AI agent could find and interpret the ToS, but that is error-prone and not the proper level to do it at. Some kind of formal declaration of what is allowed is necessary: robots.txt is such a formal declaration, but very coarsely grained.

There have been several disparate proposals for formats and protocols that are "robots.txt but for AI". I've seen that at least one of them allows different rules for AI agents versus machine learning. But these are too disparate, not widely known ... and completely ignored by scrapers anyway, so why bother.


I agree with you in spirit, but I find it hard to explain that distinction. What's the difference between mass web scraping and an automated tool using this agent? The biggest differences I assume would be scope and intent... But because this API is open for general development, it's difficult to judge the intent and scope of how it could be used.

What's difficult to explain? If you're having an agent crawl a handful of pages to answer a targeted query, that's clearly not mass scraping. If you're pulling down entire websites and storing their contents, that's clearly not normal use. Sure, there's a gray area, but I bet almost everyone who doesn't work for an AI company would be able to agree whether any given activity was "mass scraping" or "normal use".

What is worse: 10,000 agents running daily targeted queries on your site, or 1 query pulling 10,000 records to cache and post-process your content without unnecessarily burdening your service?

The single agent regularly pulling 10k records, which nobody will ever use, is worse than 10k agents coming from the same source and using the same cache, which they fill when making targeted requests. But even worse are 10k agents from 10k different sources, each scraping 10k sites, of which 9,999 pages are not relevant to their request.

In the end, it's all about the impact on the servers, and that can be optimized, but this does not seem to be happening at large at the moment. So in that regard, centralizing usage and honouring the rules is a good step, and the rest are details to figure out along the way.
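The "same source, same cache" idea above is essentially a shared TTL cache in front of the fetcher: targeted queries from many agents hit one cached copy instead of each hammering the origin, while the TTL bounds staleness. A minimal sketch, with the `fetch` callable supplied by the caller (hypothetical, not any vendor's API):

```python
import time

class TTLCache:
    """Share fetched pages across agents from one source, refetching a URL
    only after `ttl` seconds, so repeated targeted queries don't hammer
    the origin while stale copies don't live forever."""

    def __init__(self, fetch, ttl=300.0):
        self.fetch = fetch      # fetch(url) -> content; supplied by caller
        self.ttl = ttl
        self._store = {}        # url -> (timestamp, content)

    def get(self, url, now=None):
        now = time.monotonic() if now is None else now
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]       # fresh cached copy, no network hit
        content = self.fetch(url)
        self._store[url] = (now, content)
        return content
```

Choosing the TTL is exactly the tradeoff debated here: a long TTL spares the server but serves stale content; a TTL of zero degenerates into the "every agent fetches live" case.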


I suspect you want me to say the first one is worse, but it's impossible to answer with so few details. Like: worse for whom? In what way? To what extent?

If (for instance) my content changes often and I always want people to see an up-to-date version, the second option is clearly worse for me!


No, I've been turning it over in my mind since this question started to emerge, and I think it's complicated; I don't have an answer myself. After all, the first option is really just the correlate of today's web traffic, except it's no longer your traffic. You created the value, but you don't get the user attention.

My concern is not with AI agents per se; it is with the current, and likely future, implementation: AI vendors selling the search and re-publication of other parties' content. In this relationship, neither option is great: either these providers are hammering your site on behalf of their subscribers' individual queries, or they are scraping and caching it, and reselling potentially stale information about you.


100%

Exactly. robots.txt with regards to AI is not a standard and should be treated like the performative, politicized, ideologically incoherent virtue signalling that it is.

There are technical improvements to web standards that can and should be made that don't favor adtech and exploitative commercial interests over the functionality, freedom, and technically sound operation of the internet.


To be clear, this is Reddit auto-translating and poisoning their own search results, and it's the most frustrating thing. Post-AI and post-Google-enshittification, we all ended up appending "reddit" to our searches to filter out the blogspam and get some real human opinions, so Reddit decided to cash in and capture the maximum amount of that traffic by making posts available in every language under the sun, deliberately poisoning their own well. It's shit turtles all the way down!


Thankfully the feature is just in time for it to fall out of fashion! It really is an awful layout, UX wise. But at least it looks pretty at a glance!


You use Syncthing for object storage?


The overmoulding is seriously ugly. Gold and navy blue?! Silver and medical grey? Only the black is passable.


Yeah, I have to wonder what led them to these color choices. Not sure why they wouldn't include a white option or any other good neutral colors that actually go with silver and gold. And I think the number of people willing to wear a matte black ring is quite low, especially among women.


But who's gonna produce that once Paramount owns HBO?


The US has freedom of speech, so anyone who wants to spend money producing a tv show or movie about Paramount’s sale, regardless of HBO’s ownership.

I think it would be quite boring, though


Apple TV will buy it from Sony.


> who's gonna produce that once Paramount owns HBO?

Netflix.

If they win, they own HBO. If they lose, they have a beef with Ellison.

(Speaking out of my ass here. But I think there is broad underappreciation of how intensely a lot of Hollywood creatives do not want to work for a rightwinger. I imagine Netflix, Disney and others will have a bit of a bonanza over the coming years of picking up disaffecteds from Paramount et al, even assuming the latter don't wind up in bankruptcy.)


Don't sleep on the A24 or NEON model. I think we'll see a boom in independent film production and distribution companies over the next few years, especially with the inevitable dry powder from either deal.


Wow, you weren't kidding. The insides of that look like an absolute hellscape. Like a whole floor is missing and they just set up shop in a warehouse!


Love this bit of lore. It goes super well with The Thought Emporium's video about recreating an Egyptian mummy just to eat it: https://youtu.be/fbhV0TP3jco

