Because almost everyone involved in AI race grew up in "winner takes it all" environments, typical for software, and they try really hard to make it reality. This means your model should do everything to just take 90% of market share, or at least 90% of specific niche.
The problem is, they can't find the moat, despite searching very hard, whatever you bake into your AI, your competitors will be able to replicate in few months. This is why OpenAI is striking deal with Disney, because copyright provides such moat.
Alice changed things such that code monkeys algorithms were not patentable (except in some narrow cases where true runtime novelty can be established.) Since the transformers paper, the potential of self authoring content was obvious to those who can afford to think about things rather than hustle all day.
Apple wants to sell AI in an aluminum box while VCs need to prop up data center agrarianism; they need people to believe their server farms are essential.
Not an Apple fanboy but in this case, am rooting for their "your hardware, your model" aspirations.
Altman, Thiel, the VC model of make the serfs tend their server fields, their control of foundation models, is a gross feeling. It comes with the most religious like sense of fealty to political hierarchy and social structure that only exists as hallucination in the dying generations. The 50+ year old crowd cannot generationally churn fast enough.
OpenAIs opsec must be amazing, I had fully expected some version of ChatGPT to be leaked on torrent sites at some point this year. How do you manage to avoid something that could be exfiltrated on a hard disk from escaping your servers in all cases, forever?
People who have had access to the raw weights of ChatGPT are worth more now (presumably millions in SBC) selling their credentials to other labs. Why bother with trying to leak ? In any case other labs catch-up to the capabilities within a few months.
The model size is probably the thing here. I suspect they took the FAANG remote workstation approach, where VScode runs on a remote machine. After all its not that great having a desktop with 8 monster GPUs under your desk. (x100)
Plus moving all that data about is expensive. Keeping things in the datacenter is means its faster and easier to secure.
Totally agree, people love to talk about how hopelessly behind Apple is in terms of AI progress when they’re in a better position to compete directly against Nvidia on hardware than anyone else.
Apple's always had great potential. They've struggled to execute on it.
But really, so has everyone else. There's two "races" for AI - creating models, and finding a consumer use case for them. Apple just isn't competing in creating models similar to the likes of OpenAI or Google. They also haven't really done much with using AI technology to deliver 'revolutionary' general purpose user-facing features using LLMs, but neither has anyone else beyond chat bots.
I'm not convinced ChatGPT as a consumer product can sustain current valuations, and everyone is still clamouring to find another way to present this tech to consumers.
I think a major part of it is the shovel selling. Nvidia is selling shovels to OpenAI. OpenAI is selling shovels to endless B2B, Consulting, Accounting, software firms buying into it...
> your competitors will be able to replicate in few months.
Will they really be able to replicate the quality while spending significantly less in compute investment? If not then the moat is still how much capital you can acquire for burning on training?
OpenAI is (was?) extremely good at making things that go viral. The successful ones for sure boost subscriber count meaningfully
Studio Ghibli, Sora app. Go viral, juice numbers then turn the knobs down on copyrighted material. Atlas I believe was a less successful than they would've hoped for.
And because of too frequent version bumps that are sometimes released as an answer to Google's launch, rather than a meaningful improvement - I believe they're also having harder time going viral that way
Overall OpenAI throws stuff at the wall and see what sticks. Most of it doesn't and gets (semi) abandoned. But some of it does and it makes for better consumer product than Gemini
It seems to have worked well so far, though I'm sceptical it will be enough for long
Going viral is great when you're a small team or even a million dollar company. That can make or break your business.
Going viral as a billion dollar company spending upward of 1T is still not sustainable. You can't pay off a trillion dollars on "engagement". The entire advertising industry is "only" worth 1T as is: https://www.investors.com/news/advertising-industry-to-hit-1...
I guess we'd have to see the graph with the evolution of paying customers: I don't see the number of potential-but-not-yet clients being that high, certainly not one order of magnitude higher. And everyone already knows OpenAI, they don't have the benefit of additional exposure when they go viral: the only benefit seems to be to hype up investors.
And there's something else about the diminishing returns of going viral... AI kind of breaks the usual assumptions in software: that building it is the hard part and that scaling is basically free. In that sense, AI looks more like regular commodities or physical products, in that you can't just Ctrl-C/Ctrl-V: resources are O(N) on the number of users, not O(log N) like regular software.
Because as with the internet 99% of the usage won’t be for education, work, personal development, what have you. It will be for effing kitten videos and memes.
Openrouter stats already mention 52% usage is roleplay.
As for photo/video very large number of people use it for friends and family (turn photo into creative/funny video, change photo, etc.).
Also I would think photoshop-like features are coming more and more in chatgpt and alike. For example, “take my poorly-lit photo and make it look professional and suitable for linkedin profile”
Also FWIW I understand that the furry community has a strong culture of commissioning artists for their work, so that's likely to be a headwind against using genAI that isn't explicitly trained only on licensed materials. Sure, there are likely some who would use it regardless, but I expect the use of genAI to generate furry porn to be at least as toxic within that community as the use of genAI to generate furry porn outside of that community.
If Gemini can create or edit an image, chatgpt needs to be able to do this too. Who wants to copy&paste prompts between ai agents?
Also if you want to have more semantics, you add image, video and audio to your model. It gets smarter because of it.
OpenAI is also relevant bigger than antropic and is known as a generic 'helper'. Antropic probably saw the benefits of being more focused on developer which allows it to succeed longer in the game for the amount of money they have.
> Who wants to copy&paste prompts between ai agents?
An AI!
The specialist vs generalist debate is still open. And for complex problems, sure, having a model that runs on a small galaxy may be worth it. But for most tasks, a fleet of tailor-made smaller models being called on by an agent seems like a solidly-precedented (albeit not singularity-triggering) bet.
> But for most tasks, a fleet of tailor-made smaller models being called on by an agent seems like a solidly-precedented (albeit not singularity-triggering) bet.
not an expert by any means, but wouldn't smaller but highly refined models also output more reproducible results?
But then again the main selling point of using LLMs as part of some code that solves a certain business need is that you don't have to finetune a usecase-specific model (like in the mid 2010s), you just prompt engineer a bit and it often magically works.
>Also if you want to have more semantics, you add image, video and audio to your model. It gets smarter because of it.
I think you are confusing generation with analysis. As far I am aware your model does not need to be good at generating images to be able to decode an image.
It is, to first approximation, the same thing. The generative part of genAI is just running the analysis model in reverse.
Now there are all sorts of tricks to get the output of this to be good, and maybe they shouldn't be spending time and resources on this. But the core capability is shared.
I think you're partially right, but I don't think being an AI leader is the main motivation -- that's a side effect.
I think it's important to OpenAI to support as many use-cases as possible. Right now, the experience that most people have with ChatGPT is through small revenue individual accounts. Individual subscriptions with individual needs, but modest budgets.
The bigger money is in enterprise and corporate accounts. To land these accounts, OpenAI will need to provide coverage across as many use-cases as they can so that they can operate as a one-stop AI provider. If a company needs to use OpenAI for chat, Anthropic for coding, and Google for video, what's the point? If Google's chat and coding is "good enough" and you need to have video generation, then that company is going to go with Google for everything. For the end-game I think OpenAI is playing for, they will need to be competitive in all modalities of AI.
It'll just end up spreading itself too thin and be second or third best at everything.
The 500lb gorilla in the room is Google. They have endless money and maybe even more importantly they have endless hardware. OpenAI are going to have an increasingly hard time competing with them.
That Gemini 3 is crushing it right now isn't the problem. It's Gemini 4 or 5 that will likely leave them in the dust for the general use case, meanwhile specialist models will eat what remains of their lunch.
Because the general idea here is that image and video models, when scaled way up, can generalize like text models did[1], and eventually be treated as "world models"[2]; models that can accurately model real world processes. These "world models" then could be used to train embodied agents with RL in an scalable way[3]. The video-slop and image-slop generators is just a way to take advantage of the current research in world models to get more out of it.
Because for all the incessant whining about "slop," multimodal AI i/o is incredibly useful. Being able to take a photo of a home repair issue, have it diagnosed, and return a diagram showing you what to do with it is great, and it's the same algos that power the slop. "Sorry, you'll have to go to Gemini for that use case, people got mad about memes on the internet" is not really a good way for them to be a mass consumer company.
I get the allure of the hypothetical future of video slop. Imagine if you could ask the AI to redo lord of the rings but with magneto instead of gandalf. Imagine watching shawshank redemption but in the end we get a "hot fuzz" twist where andy fights everyone. Imagine a dirty harry style police movie but where the protagonist is a xenomorph which is only barely acknowledged.
You could imagine an entirely new cultural engine where entire genres are born off of random reddit "hey have you guys every considered" comments.
However, the practical reality seems to be that you get tick toc style shorts that cost a bunch to create and have a dubious grasp on causality that have to compete with actual tick toc, a platform that has its endless content produced for free.
You and I see the tiktok slop. But as that functionality improves, its going to make its way into the toolchain of every digital image and video editing software in existence, the same way that its finding its way into programming IDEs. And that type of feature build is worth $. It might be a matter of time until we get to the point where we start seeing major Hollywood movies (for example) doing things that were unthinkable the same way that CGI revolutionized cinema in the 80s. Even if it doesn't, from my layman perception, it seems that Hollywood has spent the last ~20 years differentiating itself from the rest of global cinema largely based on a moat built on IP ownership and capital intensive production value (largely around name brand actors and expensive CGI). AI already threatens to remove one of those pillars, which I have to think in turn makes it very valuable.
It's like half the poster on here live in some parallel universe. I am making real money using generated image/video advertising content for both B2C and B2B goods. I am using Whisper and LLMs to review customer service call logs at scale and identity development opportunities for staff. I am using GPT/Gemini to help write SQL queries and little python scripts to do data analysis on my customer base. My business's productivity is way up since GenAI become accessible.
But how much more profitable are they? We see revenue but not profits / spending. Anthropic seems to be growing faster than OpenAI did but that could be the benefit of post-GPT hype.
because these are mostly the same players of the 2010's. So when they can't get more investor money and the hard problems are still being cracked, the easiest fallback is the same social media slop they used to become successful 10-15 years prior. Falling back on old ways to maximize engagement and grind out (eventually) ad revenue.