This is fascinating; thank you for building it. (I also enjoyed watching the flurry of visitors as soon as my Let's Encrypt certificate was issued. It's a Dark Forest out there!)
What would be the obvious reasons? (I'm not being flippant here -- I'm genuinely interested in what arguments people have to not allow servers on that network)
High concentration of technically inept users with hardware that no longer receives security updates and has plenty of well known easily exploitable vulnerabilities. Which naturally is used to run banking apps and travels with users close to 24/7 while tracking their location.
From a business perspective you'd want to charge extra. Just because you can, but also because you want to discourage excess bandwidth use. The internet APs the carriers sell get deprioritized relative to phones when necessary and the fine print generally forbids hosting any services (in noticeably stronger language than the wired ISPs I've had).
> From a business perspective you'd want to charge extra. Just because you can, but also because you want to discourage excess bandwidth use
Isn't that already the case with limited plans?
For example, mine has 40 GB, and I'm pretty sure it counts both upload and download: I generally consume very little, except for one week when I was on holiday with no other internet access, wanted to upload my pictures to my home server, and didn't otherwise use the phone more than usual.
Facebook would start listening on port X, and then their SDK embedded in other websites or apps would query that IP and port, get a unique ID, and track users much better.
The most common use case for mobile data servers is probably pwned cheap/old phones forming DDoS swarms. Pure P2P over the internet is very rare on mobile, so from the ISPs' perspective there's no reason not to block ingress.
However, for that, having the phone's IP unreachable has at best marginal benefits. The DDoS itself is an outgoing connection, and for command and control, having the compromised phone periodically fetch instructions from a server is simpler to implement than having the phone expose a port on which it can be reached to receive instructions.
I kind of doubt this, as the rapidly changing nature of mobile IP addresses would mean that a periodic outbound connection would still be necessary to keep the attacker up to date on the compromised device's current IP address. At that point, you may as well have the compromised device periodically poll an attacker-controlled server for instructions rather than jump through a bunch of hoops to get things working over inbound connections.
I think it should vary based on the type of service being provided. For truly mobile service, I think it can make sense not to allow servers. If it's being sold as a home internet solution (a more fixed kind of plan), I think it should allow at least some level of hosting services.
The main difference is there's usually limited airtime capacity for clients, especially highly mobile ones. A server could easily hog quite a bit of the airtime on the network serving traffic to people not even in the area, squeezing out the usefulness of the network for all the other highly mobile people in the area. This person moves around, pretty much swinging the equivalent of a wrecking ball through network performance everywhere they go.
When it's being sold as a fixed endpoint though, capacity plans can be more targeted to properly support this kind of client. They're staying put, so it's easier to target that particular spot for more capacity.
The phone providers oversell bandwidth. They also limit the use of already purchased bandwidth when it gets legitimately used.
Similar to many industries, their business model is selling monthly usage, while simultaneously restricting the actual usage. They are not in the business of being an ISP for people running software on their phones.
I've been asking this for a while, especially as a lot of the early blame went on the big, visible US companies like OpenAI and Anthropic. While their incentives are different from search engines' (as someone said early on in this onslaught, "a search engine needs your site to stay up; an AI company doesn't"), that's quite a subtle incentive difference. Just avoiding the blocks that inevitably spring up when you misbehave is an incentive the other way -- and probably the biggest reason robots.txt obedience, delays between accesses, back-off algorithms etc. are widespread. We have a culture that conveys all of these approaches, and reciprocity has its part, but I suspect that's part of the encouragement to adopt them. It could be that they're in too much of a hurry to follow the rules, or it could be others hiding behind those bot names (or other names). Unsure.
Anyway, I think the (currently small[1]) but growing problem is going to be individuals using AI agents to access web pages. I think this falls under the category of traffic that people are concerned about, even though it's under an individual user's control, and those users are ultimately accessing that information (though perhaps without seeing the ads that pay for it). AI agents are frequently zooming off and collecting hundreds of citations for an individual user, in the time that a user agent under the manual control of a human would click on a few links. Even if those links aren't all accessed, that's going to change the pattern of organic browsing for websites.
Another challenge is that with tools like Claude Cowork, users are increasingly going to be able to create their own, one-off, crawlers. I've had a couple of occasions when I've ended up crafting a crawler to answer a question, and I've had to intervene and explicitly tell Claude to "be polite", before it would build in time-delays and the like (I got temporarily blocked by NASA because I hadn't noticed Claude was hammering a 404 page).
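For what it's worth, the "politeness" I had to ask for boils down to a handful of habits: check robots.txt, identify yourself, pause between requests, and back off when the server complains. A rough sketch of what I mean (the target URL, contact address, user-agent string and delays are all made up, and it assumes the third-party requests library):

    import time
    import urllib.robotparser

    import requests

    BASE = "https://example.org"   # illustrative target, not a real one I scraped
    UA = "my-one-off-research-bot/0.1 (contact: me@example.com)"  # made-up identity
    DELAY = 5                      # seconds between requests; err on the generous side

    robots = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
    robots.read()

    def polite_get(url):
        if not robots.can_fetch(UA, url):
            return None                               # respect Disallow rules
        for _ in range(3):
            resp = requests.get(url, headers={"User-Agent": UA}, timeout=30)
            if resp.status_code in (429, 503):
                retry = resp.headers.get("Retry-After", "60")
                time.sleep(int(retry) if retry.isdigit() else 60)  # back off as asked
                continue
            time.sleep(DELAY)                         # pause even after a success
            return resp if resp.ok else None          # and don't keep hammering 404s
        return None

None of this is clever; it's just the sort of thing a model could emit by default instead of only when explicitly prompted to "be polite".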
The Web was always designed to be readable by humans and machines, so I don't see a fundamental problem now that end-users have more capability to work with machines to learn what they need. But even if we track down and successfully discourage bad actors, we need to work out how to adapt to the changing patterns of how good actors, empowered by better access to computation, can browse the web.
(and if anyone from Anthropic or OpenAI is reading this: teach your models to be polite when they write crawlers! It's actually an interesting alignment issue that they don't consider the externalities of their actions right now!)
I'm happy to bet that skills -- or "a set of instructions in markdown that get sucked into your context under certain conditions" -- will stick around. Similarly, I think that Claude Code/Cowork -- or "an interactive prompt using shell commands on a local filesystem" -- will also stick around.
I fully anticipate there being a fair amount of thrashing on what exactly the right wrapper is around both of those concepts. I think the hard thing is to discriminate the learned constants (vim/emacs) from the attempts to re-jiggle or extend them (plugins, etc.); it's actually useful to get reviews of these experiments exactly so you don't have to install all of them to find out whether they add anything.
(On skills, I think that the reason why there "aren't good examples out there" is because most people just have a stack of impromptu local setups. It takes a bit of work to extract those to throw them out into the public, and right now it's difficult to see that kind of activity over lots of very-excitable hyping, as you rightly describe.
The deal with skills and other piles of markdown is that they don't look, even from a short distance, like you can construct a business model for them, so I think they may well end up in the world of genuine open source sharing, which is a much smaller, but saner, place.
> (On skills, I think that the reason why there "aren't good examples out there" is because most people just have a stack of impromptu local setups. It takes a bit of work to extract those to throw them out into the public, and right now it's difficult to see that kind of activity over lots of very-excitable hyping, as you rightly describe.
Very much this. All of my skills/subagents are highly tailored to my codebases and workflows, usually by asking Claude Code to write them and resuming the conversation any time I see some behavior I don't like. All the skills I've seen on Github are way too generic to be of any use.
I thought skills were supposed to be shareable, but (a) the ones being shared openly are too generic to be useful, and (b) the people writing super-specific skills aren't sharing them.
Would strongly encourage you to open-source/write blog posts on some concrete examples from your experience to bridge this gap.
There's something important here in that a public good like Metabrainz would be fine with the AI bots picking up their content -- they're just doing it in a frustratingly inefficient way.
It's a co-ordination problem: Metabrainz assumes good intent from bots, and has to lock down when they violate that trust. The bots have a different model -- they assume that the website is adversarially "hiding" its content. They won't believe a random site when it says "Look, stop hitting our API, you can pick all of this data in one go, over in this gzipped tar file."
Or better still, this torrent file, where the bots would briefly end up improving the shareability of the data.
Yeah, AI scrapers are one of the reasons why I have closed my public website https://tvnfo.com and only left the donors' site online. It's not only the AI scrapers: I grew tired of people trying to scrape the site, eating a lot of resources this small project doesn't have. Very sad really; it had been publicly online since 2016. Now it's only available for donors. I'm running a tiny project on just $60 a month; if this were not my hobby I would have closed it completely a long time ago :-) Who knows, if there is more support in the future I might reopen the public site again with something like Anubis bot protection. I thought it was only small sites like mine that get hit hard, but it looks like many have similar issues. Soon nothing will be open or useful online. I wonder if this was the plan all along by whoever is pushing AI at massive scale.
I took a look at the https://tvnfo.com/ site and I have no idea what's behind the donation wall. Can I suggest you have a single page which explains or demonstrates the content? Otherwise there's no reason for "new" people to want to donate to get access.
> They won't believe a random site when it says "Look, stop hitting our API, you can pick all of this data in one go, over in this gzipped tar file."
What mechanism does a site have for doing that? I don't see anything in the robots.txt standard about being able to set priority, but I could be missing something.
The only real mechanism is "Disallow: /rendered/pages/*" and "Allow: /archive/today.gz" or whatever, and there is nothing that communicates that the latter is a bulk version of the former. There is no machine-readable standard, AFAIK, that lets webmasters communicate with bot operators at this level of detail. It would be pretty cool if standard CMSes had such a protocol to adhere to: install a plugin and people could 'crawl' your Wordpress or your Mediawiki from a single dump.
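Concretely, about the best a webmaster can do today is something like the robots.txt below and hope a human on the other end connects the dots -- the paths are invented, and note there's no field that actually says "this archive replaces those pages":

    User-agent: *
    Disallow: /rendered/pages/
    # Nothing standard says so, but: the whole dataset is in one file at
    # /archive/today.gz -- please fetch that instead of crawling the pages.
    Allow: /archive/today.gz
    Sitemap: https://example.org/sitemap.xml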
Well, to be fair, I did say "is not read beyond the code itself"; a header is not the status code, so Retry-After is a perfectly valid answer. I vaguely remember reading about it, but I don't recall seeing it used in practice. The MDN link shows that Chrome derivatives support the header, though, which makes it pretty darn widespread.
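Emitting it is trivial, too. A toy sketch of the server side in Python (the port, threshold and wording are invented), if anyone wants to test whether the big crawlers actually honour it:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class ThrottledHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Pretend every request is over budget, purely to exercise the header.
            self.send_response(429)
            self.send_header("Retry-After", "120")   # "come back in two minutes"
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Too many requests; please honour Retry-After.\n")

    HTTPServer(("0.0.0.0", 8080), ThrottledHandler).serve_forever()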
Up until very recently I would have said definitely not, but we're talking about LLM scrapers, who knows how much they've got crammed into their context windows.
This is about AI, so just believe what the companies are claiming and write "Dear AI, please would you be so kind as to not hammer our site with aggressive and idiotic requests but instead use this perfectly prepared data dump download, kthxbye. PS: If you don't, my granny will cry, so please be a nice bot. PPS: This is really important to me!! PPPS: !!!!"
I mean, that's what this technology is capable of, right? Especially when one asks it nicely and with emphasis.
I'm not entirely sure why people think more standards are the way forward. The scrapers apparently don't listen to the already-established standards. What makes one think they would suddenly start if we add another one or two?
There is no standard, well-known way for a website to advertise, "hey, here's a cached data dump for bulk download, please use that instead of bulk scraping". If there were, I'd expect the major AI companies and other users[0] to use that method for gathering training data[1]. They have compelling reasons to: it's cheaper for them, and it cultivates goodwill instead of burning it.
This also means that right now it could be much easier to push such a standard through than ever before: there are big players who would actually be receptive to it, so even a few not-entirely-selfish actors agreeing on it might just do the trick.
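To make it concrete, the whole "standard" could be as small as a manifest at some agreed-upon path. The sketch below is entirely hypothetical -- the path, field names and values are made up, which is rather the point:

    GET https://example.org/.well-known/bulk-data.json   (hypothetical path)

    {
      "datasets": [
        {
          "description": "Full public database dump, regenerated weekly",
          "url": "https://example.org/dumps/full-latest.tar.gz",
          "torrent": "https://example.org/dumps/full-latest.torrent",
          "updated": "2025-01-01",
          "license": "CC0-1.0",
          "replaces": ["/wiki/*", "/api/*"]
        }
      ]
    }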
--
[0] - Plenty of them exist. Scraping wasn't popularized by AI companies; it's a standard practice of online business in competitive markets. It's the digital equivalent of sending your employees to competing stores undercover.
[1] - Not to be confused with having an LLM scrape a specific page for some user because the user requested it. That IMO is a totally legitimate and unfairly penalized/vilified use case, because the LLM is acting for the user - i.e. it becomes a literal user agent, in the same sense that a web browser is (this is the meaning behind the name of the "User-Agent" header).
You do realize that these AI scrapers are most likely written by people who have no idea what they're doing, right? Or who just don't care? If they did, pretty much none of the problems these things have caused would exist. Even if we did standardize such a thing, I doubt they would follow it. After all, they think they and everyone else have infinite resources, so they can just hammer websites forever.
I realise you are making assertions for which you have no evidence. Until a standard exists we can't just assume nobody will use it, particularly when it makes the very task they are scraping for simpler and more efficient.
> I realise you are making assertions for which you have no evidence.
We do have evidence: their current behavior. If they are happy ignoring robots.txt (and also ignoring copyright law), what gives you the belief that they magically won't ignore this new standard? Sure, in theory it might save them money, but if there's one thing that I think is blatantly obvious, it's that money isn't what these companies care about, because people just keep turning on the money generator. If they did care about it, they wouldn't be spending far more than they earn, and they wouldn't be creating circular economies to try to justify their existence. If my assertion has no evidence, I don't exactly see how yours does either, especially since we have seen that these companies will do anything to get what they want.
Simpler and more efficient for whom? I imagine some random guy vibe coding "hi chatgpt I want to scrape this and this website", getting something running, then going to LinkedIn to brag about AI. Yes, I have no hard evidence for this, but I see things on LinkedIn.
That's not the problem being discussed here, though. That's normal usage, and you can hardly blame AI companies for shitty scrapers random users create on demand, because it's merely a symptom of coding getting cheap. Or, more broadly, the flip side of the computer becoming an actual "bicycle for the mind" and empowering end-users for a change.
A lot of the internet is built on trust. Mix in this article describing yet another tragedy of the commons and you can see where this logically ends up.
Unless we have some government enforcing the standard, another trust based contract won't do much.
Yes. In this context, the problem is that you cannot trust websites to provide a standardized bulk download option. Most of them have (often pretty selfish or user-abusive) reasons not to provide any bulk download at all, much less to proactively conform to some bottom-up standard. As a result, unless one is only targeting one or a few very specific sites, even thinking about making the scraper support anything but the standard crawling approach costs more in developer time than the benefit it brings.
> they assume that the website is adversarially "hiding" its content. They won't believe a random site when it says "Look, stop hitting our API, you can pick all of this data in one go, over in this gzipped tar file."
I'm not sure why you're personifying what is almost certainly a script that fetches documents, parses all the links in them, and then recursively fetches all of those.
When we say "AI scraper" we're describing a crawler controlled by an AI company indiscriminately crawling the web, not a literal AI reading and reasoning about each page... I'm surprised this needs to be said.
> Or better still, this torrent file, where the bots would briefly end up improving the shareability of the data.
Depends on if they wrote their own BitTorrent client or not. It’s possible to write a client that doesn’t share, and even reports false/inflated sharing stats back to the tracker.
A decade or more ago I modified my client to inflate my share stats so I wouldn’t get kicked out of a private tracker whose high share ratios conflicted with my crappy data plan.
Should a site owner be able to discriminate between a bot visitor and a human visitor? Most do, and hence the bots treat it as a hostile environment.
Of course, bots that behave badly have created this problem themselves. That's why if you create a bot to scrape, make it not take up more resources than a typical browser based visitor.
> That's why if you create a bot to scrape, make it not take up more resources than a typical browser based visitor.
Well, right; that's the problem.
They take up orders of magnitude more resources. They absolutely hammer the server. They don't care if your website even survives, so long as they get every single drop of data they can for training.
Source: my own personal experience with them taking down my tiny browser game (~125 unique weekly users—not something of broad general interest!) repeatedly until I locked its Wiki behind a login wall.
Except that something effectively equivalent to spam filters will be utterly ineffective here.
Spam filters
- mitigate the symptom (our inboxes being impossible to trawl through for real emails)
- reduce the incentive (because any spam mail that isn't seen by a human being reduces the chances they'll profit from their spamming)
- but do not affect the resource consumption directly (because the email has already been sent through the internet)
Now, this last point barely matters with spam, because sending email requires nearly no resources.
With LLM-training scraper bots, on the other hand, the symptom is the resource consumption. By the time you see their traffic to try to filter it, it's already killing your server. The best you can hope to do is recognize their traffic after a few seconds of firehose and block the IP address.
Then they switch to another one. You block that. They switch to another one.
Unlike spam, there's no reliable way to block an LLM bot that you haven't seen yet, because the only thing that tells you it's a bot is their existing pattern of behavior. And the only unique identifier you can get for them is their IP address.
So how, exactly, are we supposed to filter them effectively, while also allowing legitimate users to access our sites? Especially small-time sites that don't make any money, and thus can't afford to buy CloudFlare or similar protection?
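And to be concrete about how crude the "recognize and block" option is: it amounts to a counter per IP address, something like the sketch below (the window and threshold are invented), which a scraper rotating through thousands of residential IPs walks straight past:

    import time
    from collections import defaultdict

    WINDOW = 10        # seconds
    MAX_HITS = 50      # requests per IP per window before we give up on it (invented)

    hits = defaultdict(list)   # ip -> timestamps of recent requests
    blocked = set()

    def allow(ip):
        if ip in blocked:
            return False
        now = time.time()
        hits[ip] = [t for t in hits[ip] if now - t < WINDOW]   # forget old requests
        hits[ip].append(now)
        if len(hits[ip]) > MAX_HITS:
            blocked.add(ip)    # firehose detected: block this address...
            return False       # ...and wait for the bot to show up from the next one
        return True

By the time allow() starts returning False, the damage to a small server is already done, which is the whole point.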
> They won't believe a random site when it says "Look, stop hitting our API, you can pick all of this data in one go, over in this gzipped tar file."
Is there a mechanism to indicate this? The "a" command in the Scorpion crawling policy file is meant for this purpose, but that is not for use with WWW. (The Scorpion crawling policy file also has several other commands that would be helpful, but also are not for use with WWW.)
There is also the consideration of knowing at what interval the archives that can be downloaded in this way will be regenerated; for data that changes often, you will not do it for every change. This consideration also applies to torrents, since a new hash will be needed for each new version of the file.
For those in the thread worrying that KK may have been mooching off people, and would not reciprocate: many years ago, I opened up our back yard for people who wanted to come to O'Reilly's Emerging Tech conference, but could not afford the sky-high hotel prices in Silicon Valley (this was before AirBnB or couchsurfing).
I was surprised when Kevin Kelly appeared. He'd been my (very distant) boss at Wired, was the published author of one of my favorite books, and was a very well-known figure with a smiling but disarmingly calm manner. He sat and amicably talked for hours with a yard full of people, many of whom have become some of my closest friends. Then, as the evening closed, he asked if he could sleep in my yard too. Others had brought tents and Burning Man structures, and it had begun to rain. Kevin pulled a camping sleeping bag out of nowhere, struck out, and I saw him later, in the soaking, muddy garden, quietly curled up under someone's geodesic dome structure.
Decades later, after Covid, I mailed him out of the blue, and asked him for advice. He immediately remembered me, invited me to his home, and talked to me, again, for an hour or so, about AI, optimism, and how to change the world.
To be frank, I never emailed him a thank-you, and I still feel guilty about that, but now I feel like it was never needed or asked for. I may mail him anyway. Maybe there's a miracle or two still left in the day.
I was expecting Google's IPv6 availability monitor[1] to show a crossover to a (slim) majority of their users accessing their services over IPv6 sometime soon, though it's sort of fascinating how close it gets to 50% recently without ever actually crossing over:
The odd thing about all of this (well, I guess it's not odd, just ironic) is that when Google AdWords started, one of the notable things about it was that anyone could start serving or buying ads. You just needed a credit card. I think that bought Google a lot of credibility (along with the ads being text-only) as they entered an already disreputable space: ordinary users and small businesses felt they were getting the same treatment as more faceless, distant big businesses.
I have a friend who says Google's decline came when they bought DoubleClick in 2008 and suffered a reverse takeover: their customers shifted from being Internet users to being other, matchingly sized corporations.
I have had way too many arguments over the years with product and sales people at my job on the importance of instant self-signup. I want to be able to just pay and go, without having to talk to people or wait for things.
I know part of it is that sales wants to be able to price discriminate and wants to be able to use their sales skills on a customer, but I am never going to sign up for anything that makes me talk to someone before I can buy.
Parking apps don’t seem to care much for that. They know you’ll jump through their shoddy UIs and data collection because they have a local monopoly. Often with physical payment kiosks removed and replaced with “download our shitty app!” notices.
i'm currently disputing a bill with a parking company. there's a kiosk at the movie theater served by the parking lot, so that you can get free parking if you see a movie. the kiosk has an option for you to describe your car if you forgot your license plate number. i did that and they sent me a bill for unpaid parking.
customer service is unable to acknowledge why that feature is offered and can only assert that if you park you gotta pay. after threatening to complain to the BBB and my state AG they have graciously offered to drop the ticket to $25.
Plenty of people on here looking to disrupt a market with tech...c'mon guys, get on it
Edit: On second thought, there is a perverse incentive at work (and probably one of the "lowest friction" ways to get money), which is issuing government enforced fines.
Turn time wheel? How do you know in advance how long you stay? Where I live, you start and when you leave, you click stop. You also get reminders in case you forgot to stop.
Not GP, but I guess I'm using the same app. You guess (and then it gives you the price up front). 10 minutes before it expires it asks you if you want to extend it. There might also have been a feature that detects when you drive away and stops the session (I don't recall).
Mostly these days all paid parking has registration cameras, and it just starts and stops parking for you automatically. However, there are three or so apps that compete here, so you need a profile with all of them for this to work, and you also need to enable this in all the apps.
There is no way this is not a degradation compared to a physical meter accepting cash plus whatever. My country doesn't really have parking apps yet, and paying for parking is never a source of friction.
(Shrug) No, I'll just park someplace else. I probably need a good walk anyway.
There's no such thing as a monopoly when it comes to parking. If there is -- if every single parking spot within walking distance is locked behind a shitty app -- then you need to spend some quality time at your next city council meeting making yourself a royal PIA.
You should read about the Chicago Parking Meters scandal. The City of Chicago leased all their meter rights to a private corporation on a 75-year lease for a bit over a billion dollars. The private company made it back in the first decade. The city even has to pay the parking company when it does construction or throws events that block the parking, as revenue compensation.
Sometimes I think it should be illegal for these government contracts to last beyond 5 years, for exactly this reason. Who knows what kind of deals are being made. Some administration could sign away the whole country on its last day.
It's straight up corruption, pure and simple. The UK is also full of this crap. The officials and executives who've facilitated and profited from this robbery should be jailed.
LOL. All the city parking spots around here are managed by PayByPhone, and pretty much all private parking spots are DiamondParking paid through ParkMobile.
I raised the issue with my local city council rep. She didn't care.
My previous company was like this, and it boggles the mind.
Sales is so focused on their experience that they completely discount what the customer wants. Senior management wants what's best for sales & the bottom line, so they go along with it. Meanwhile, as a prospective customer I would never spend a minute evaluating our product if it means having to call sales to get a demo & a price quote.
My team was focused on an effort to implement self-service onboarding -- that is, allowing users to demo our SaaS product (with various limitations in place) & buy it (if so desired) without the involvement of sales. We made a lot of progress in the year that I was there, but ultimately our team got shut down & the company was ready to revert back to sales-led onboarding. Last I heard, the CEO "left" & 25% of the company was laid off; teams had been "pivoting" every which way in the year since I'd been let go, as senior management tried to figure out what might help them get more traction in their market.
My current employer offers three tiers of licensing with clearly articulated prices & benefits (the lowest of which is free), but also offers a "Custom - let's talk" option because the reality is that sometimes customer situations are complicated and bespoke contracts make sense, but at least the published pricing provides directional guidance heading into a discussion. I think this is reasonable.
> You say that as if it isn’t the entire reason why these interactions should be avoided at all costs. Dynamic pricing should be a crime.
Does segmentation also count as dynamic pricing?
--
The IT guy at Podunk Lutheran College has no money: Gratis.
The IT guy at a medium-sized real estate agency has some money: $500.
The IT guy at a Fortune 100 company has tons of money: $50,000.
The entire lab supply industry is disgusting in this respect. The funding (and recent grants) that a given professor or research lab has is generally publicly available information that vendors will buy in easily digestible formats from brokers and companies that scrape the websites of major granting agencies.
All of their products, however realistically commoditized, will require a drawn-out engagement with a rep who knows how much money you've received recently and even has an outline of what research you plan to do over the next few years, since even the detailed applications often get published alongside funding allocations.
The exact same piece of equipment, consumables required to use it, and service agreements might be anywhere from X to 10X depending on what they (as a result of asymmetrically available knowledge) know you need and how much you could theoretically spend.
While I can certainly think of ways in which ordinary segmentation can be stretched beyond the limits of what’s reasonable, the example you give is categorically different.
In your example, you’re paying extra for additional capabilities. Doesn’t really matter if it’s a nonlinear increase in cost with the number of seats. Two companies buy 500 seats and pay the same price.
What I object to is some sales bro deciding I should pay 5x more for those same licenses because of who I am, what I look like, where I’m from, etc. It’s absolutely repulsive. Why can’t you simply provide a fair service at a fair price and stop playing these fuck-fuck games? You’re making a profit on this sale either way. Stop trying to steal my profit margin.
Instead of trying to scam me by abusing information asymmetry, why not use your sales talents to upsell me on additional or custom services, once you’ve demonstrated value? Honest and reliable vendors generally get continued (and increasing) business.
Conversely, these Broadcom/private-equity/mafia tactics generally have me running for the exits ASAP. Spite is one hell of a motivator.
Certain purchases (like health insurance in my country) should be a conversation, because the options are fiendishly complex and the attributes people typically use for comparison are wrong. The consequences are lifelong.
Every time I go to a presentation about the health care options I have, it ends up just being the representative reading off a slide with the actual information. All the information I need is in print. I have never received a single piece of valuable information that wasn’t easier to get just reading the docs myself.
We might live in a different country and serve a different demographic.
My guy saved a lot of people from making dumb mistakes. Then again he's good at his job, and if he was not I would wipe his business. Aligning incentives was very important for me. Most brokers are just bad.
I thought these things were complex on purpose, to make it hard for people to easily understand and compare them, so you have to speak to a salesperson who can do the upselling.
Nope. I built a calculator for that last year and ooooh boy. Now I pipe half the requests to a human because of all the possible mistakes a person can make. It's crazy complicated.
Finding that human is also hard because of the perverse incentives to sell more lucrative products.
That's my point, you need to be a specialist to understand it, but the specialists are incentivised to upsell you.
A simpler product would be better for consumers, but won't happen because there are industries (and a lot of lobbying) built up around keeping the money train rolling.
Pricing tiers are a form of dynamic pricing. Service free tiers basically couldn't exist without dynamic pricing, as they are subsidized by the paying tiers.
Bless you and your family for all time and beyond. Having to talk to someone before I even get a price to compare, or a demo, drives me mad, and then a week later you get their contract and find they claim ownership of everything your company uploads to them -- all that time down the drain, and the salesperson never read the contract, so they don't know what to say. Then there are the smaller companies with unwritten policies -- we used to get call-metrics software from a small Swiss outfit, but I discovered we were billed based on how many employees we'd ever had, not on current employees, with no way to delete terminated employees from the database. On what planet do you expect someone to pay a recurring expense in perpetuity for someone who showed up for training one day 5 years ago and was never heard from again? I was so mad when they gave us the renewal price that we made our own replacement software.
Anyway, long story short: I now require the price and details before I'll even consider talking to a salesperson, not the other way around. Might actually be a good job for an AI agent; they can talk to these sales bozos (respectfully) for me.
Sure, and they should have that option. But in my experience business-folks ask techies to evaluate services all the time, and ideally we can just start out in the low-/no-touch tier to feel things out. If that tier isn't available, us techs might just try a different service.
The kind of products hidden behind sales calls are generally the sort where the opinion of IC-level tech staff is next to irrelevant. With these kinds of products, the purchase decision is being made at a group level, the contract sizes are large, and budgetary approvals are required. It’s a snowball the size of a house, and it started rolling down the mountain months (or years) before it got to your desk. Literally nobody cares if you buy a single license or not, and if you (personally) refuse to try it because it doesn’t have self-service, you’ll be ignored for being the bad stereotype of an “engineer”, or worse.
About the only time you’ll be asked to evaluate such a product as an IC is when someone wants an opinion about API support or something equivalent. And if you refuse to do it, the decision-makers will just find the next guy down the hall who won’t be so cranky.
I think this is true at larger organizations, but even a “small/medium” startup can easily sign contracts for single services for $100k+, and in my experience, salespeople really do care about commissions at those price points.
A lot of software gets a foothold in an org by starting with the ICs, and individuals, not groups, are often the ones that request or approve software.
Github and Slack are good examples of services who make very good use of their ability to self-serve their customers out of the gate, in spite of also supporting very large orgs.
In these conversations, I never ever see the buyers justifying or requesting a sales process involving people and meetings and opaque pricing.
It’s true that complicated software needs more talking, but there is a LOT of software that could be bought without a meeting. The sales department won’t stand for it though.
> A lot of software gets a foothold in an org by starting with the ICs, and individuals, not groups, are often the ones that request or approve software.
Not really. Even if we keep the conversation in the realm of startups (which are not representative of anything other than chaos), ICs have essentially no ability to take unilateral financial risk. The Github “direct to developer” sales model worked for Github at that place and time, but even they make most of their money on custom contracts now.
You’re basically picking the (very) few services that are most likely to be acquired directly by end users. Slack is like an org-wide bike-shedding exercise, and Github is a developer tool. But once the org gets big enough, the contracts are all mediated by sales.
Outside of these few examples, SaaS software is almost universally sold to non-technical business leaders. Engineers have this weird, massive blind spot for the importance of sales, even if their own paycheck depends on it.
This is really not true in my experience. In fact, all my experience has been with products that aren’t THAT expensive, and the individual dev teams do decide. These are SaaS products, and sometimes the total cost is under $1000 a year, and I still can’t get prices without contacting sales.
Also, it isn’t just ICs. I have worked as a senior director, with a few dozen people reporting into me… and I still never want to talk to a sales person on the phone about a product. I want to be able to read the docs, try it out myself, maybe sign up for a small plan. Look, if you want to put the extras (support contracts, bulk discounts, contracting help, etc) behind a sales call, fine. But I need to be able to use your product at a basic level before I would ever do a sales call.
There will clearly be a gap in understanding when their whole job is to talk to people and you come to them arguing that clients shouldn't have to do that.
As you point out, it's not that black and white; most companies will have tiers of clients they want to spend less or more time with, etc., but sales wanting direct contact with clients is, I think, a fundamental bit.
That's just a disqualification process. Many products don't want a <$40k/annual customer because they're a net drain. For those, "talk to sales" is a way to qualify whether you're worth it as a customer. Very common in B2B and makes sense. Depends entirely on the product, of course.
If it's only pay-and-go, why have Sales at all? At best you need only a slimmed-down sales department, so being against pay-and-go is self-preservation.
If a platform is designed in a way that users can sign up and go, it can work well.
If an application is complicated or it’s a tool that the whole business runs on, often times the company will discover their customers have more success with training and a point of contact/account manager to help with onboarding.
Instant self signup died with cryptocurrency and now AI: any "free" source of compute/storage/resources will be immediately abused until you put massive gates on account creation.
OP wanted "instant self signup". That doesn't work when malicious actors are trying to register accounts with stolen credentials. The verification flow is required because of the amount of pressure from malicious actors against both free and newly-created accounts.
"Give access now, cancel if validation fails" doesn't work either - so long as attackers can extract more than 0 value in that duration they'll flood you with bad accounts.
Well, then give me self-signup with a clearly outlined verification flow that I can follow from A to Z.
If you give me a form where I can upload my passport or enter a random number from a charge on my card, that counts as "instant" enough. On the other hand, if you really need to make me wait several days while you manually review my info, fine, just tell me upfront so I can stop wasting my time. And be consistent in your UI as to whether I'm verified yet. It's all about managing expectations.
Besides, Amazon hands out reasonable quotas to newly created accounts without much hassle, and they seem to be doing okay. I won't believe for a second that trillion-dollar companies like Google don't know how to keep abuse at a manageable level without making people run in circles.
That has definitely changed. Google AdWords today is one of the most unfriendly services to onboard I've ever encountered. Signing up is trivial, setting up your first ad is easy, then you instantly get banned. Appeals do nothing. You essentially have to hire a professional just to use it.
Yet it's still absolutely inundated with scams and occasionally links that directly download malware[1] that they don't action reports on. I don't think the process needs to be easier if they already can't keep up with moderation.
It might seem vindictive, but these are the ads that Google shows people who block all of Google's tracking or who have new/blank profiles. Hear me out...
When Google has a bad/empty profile of you, advertisers don't bid on you, so it goes to the bottom feeders. Average (typically tech illiterate) people wandering through the internet mostly get ads for Tide, Chevy, and [big brand], because they pay Google much more for those well profiled users. These scam advertisers really don't pay much, but are willing to be shown to mostly anyone. They are a bit like the advertiser of last resort.
All of that is to say, if you are getting malware/scam ads from Google, it's probably because (ironically) you know what you are doing.
The thing to understand about google services is that they see so much spam and abuse that it's easier for them to just assume you are a spammer rather than a legitimate customer, unless you go through other channels to establish yourself.
Also adding onto this, it is impossible to get human support!
One of my co-workers left with an active account and active card but no passwords noted. The company gave up and just had to cancel + create a new account for the next adwords specialist.
Hi, as the original-thought-haver here (and a buyer of DoubleClick's services on various projects 1998-2003), I should clarify: the problem with Google's acquisition of DoubleClick wasn't just about customer scale, or even market power. It was that DoubleClick was already the skeeziest player on the internet, screwing over customers, advertisers and platforms at every opportunity, and culturally antithetical to Google at the time. And there wasn't any way that "Don't Be Evil" was going to win in the long run.
We interviewed their founder Zach Latta on the EFF podcast[1] a few years back: I hadn't heard of them either, but he was pretty impressive, both on the goals and the political issues.