What it essentially does is a debugging/optimization loop where you change one thing, evaluate, compare results, and repeat.
Previously we needed to have a human in the loop to do the change. Of course we have automated hyperparameter tuning (and similar things), but that only works in a rigidly defined search space.
Will we see LLMs generating new improved LLM architectures, now fully incomprehensible to humans?
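The "rigidly defined search space" point can be sketched with a toy grid search (the parameter names and score function below are hypothetical, purely for illustration): the tuner can only ever pick from values a human enumerated up front, unlike an LLM proposing arbitrary code changes.

```python
from itertools import product

# A fixed grid: the optimizer can only ever pick from these values.
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
}

def evaluate(config):
    # Stand-in for "train the model and measure validation score";
    # a toy function peaked at lr=1e-3, batch_size=32.
    return -abs(config["learning_rate"] - 1e-3) - abs(config["batch_size"] - 32) / 100

# Exhaustively try every combination in the grid and keep the best.
best = max(
    (dict(zip(search_space, values)) for values in product(*search_space.values())),
    key=evaluate,
)
```

No configuration outside the 3x3 grid is ever considered, which is exactly the rigidity being contrasted with LLM-driven search.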
If I understood correctly, isn't this software only as useful as the LLM powering it? It sounds like something very useful, but either I'm missing something or it's essentially a "please optimize this code" prompt put into a loop with a validator. Useful, but maybe not as revolutionary as the underlying LLM tech itself.
Edit: the white paper says this:

> AlphaEvolve employs an ensemble of large language models. Specifically, we utilize a combination of Gemini 2.0 Flash and Gemini 2.0 Pro. This ensemble approach allows us to balance computational throughput with the quality of generated solutions. Gemini 2.0 Flash, with its lower latency, enables a higher rate of candidate generation, increasing the number of ideas explored per unit of time. Concurrently, Gemini 2.0 Pro, possessing greater capabilities, provides occasional, higher-quality suggestions that can significantly advance the evolutionary search and potentially lead to breakthroughs. This strategic mix optimizes the overall discovery process by maximizing the volume of evaluated ideas while retaining the potential for substantial improvements driven by the more powerful model.
So I stand by my earlier opinion. Furthermore, in the paper they don't present it as something extraordinary, as some people here say it is, but as an evolution of existing software, FunSearch.
"Make this better in a loop" is less powerful than running evolution on a population. While it may seem like evolution is just single steps in a loop, something qualitatively different occurs due to the population dynamics: you get the opportunity for multiple restarts, interpolation (according to an LLM) between examples, and 'novelty' not being instantly rejected.
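A minimal sketch of those population dynamics (toy numeric fitness standing in for an LLM-judged evaluator; all names and parameters are illustrative): keeping a pool of survivors rather than a single candidate means one bad mutation never kills the search.

```python
import random

def fitness(x):
    # Toy objective standing in for "evaluate the candidate program":
    # closer to 42.0 is better.
    return -abs(x - 42.0)

def mutate(x):
    # Stand-in for an LLM proposing a variation of a candidate.
    return x + random.gauss(0, 1.0)

def evolve(pop_size=20, generations=100, seed=0):
    random.seed(seed)
    population = [random.uniform(-100, 100) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 4]   # survivors are kept (elitism)
        # Children are mutations of *random* parents, not just the best one,
        # so novelty isn't instantly rejected.
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=fitness)

best = evolve()
```

Contrast this with "mutate the single current best in a loop": there, one unlucky step can strand you, whereas the population gives you implicit restarts for free.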
The “fully incomprehensible to humans” aspect of this potential future state interests me as a software person.
The last 50 years of software evolution have been driven by a need to scale human comprehension for larger and more integrated codebases. If we decreasingly need/rely on humans to understand our code, source code’s forward-progress flywheel is going to slow down and will bring us closer to (as you suggest) incomprehensibility.
Not only did we scale the breadth of codebases - the flywheel built layers and layers of abstraction over time (have you seen the code sample in this article??), fostering a growing market of professional developers and their career progressions; if most code becomes incomprehensible, it'll be the code closer to "the bottom": a thin wrapper of API on top of an expanding mass of throwaway whatever-language AlphaAlgo creates.
If we don’t wrangle this, it will destroy a profession and leave us with trillions of LoC that only people with GPUs can understand. Which may be another profession I suppose.
Very few people already understand highly optimized numerical kernels. Many are already machine optimized. This takes it just a bit further. Most programmers do not do high performance algorithm development.
A hash is a way of mapping a data array to a more compact representation: a single output whose key attributes are uniqueness and improbability of collision. This is the opposite of what embeddings are for, and what they do.
Embeddings are a way of mapping a data array to a different (and yes smaller) data array, but the goal is not to compress into one thing, but to spread out into an array of output, where each element of the output has meaning. Embeddings are the exact opposite of hashes.
Hashes destroy meaning. Embeddings create meaning. Hashes destroy structure in space. Embeddings create structures in space.
A hash function in general is only a function that maps input to a fixed-length output. So embeddings are hash functions.
You’re probably thinking of cryptographic hashes, where avoiding collisions is important. But that’s not intrinsic. Take Locality Sensitive Hashing, for example, where specific types of collisions are encouraged.
Yes, some hash functions are intended to have collisions (like hash algorithms that are designed to put things into 'buckets' for searching for example). And you're correct to notice that by mentioning improbability of collision I'm talking about strong hashes in that sentence. But you can take my words literally nonetheless. When I say "hash" I mean all kinds of hashes. Strong and weak.
The existence of weaker hash algos actually moves you further away from your assertion (that semantic vectors are hashes) than closer to it. Weak hashes are about a small, finite number of buckets in one dimension. Semantic vectors are an infinite continuum of higher dimensional space locations. These two concepts are therefore the exact opposite.
I guess it depends on how loosely we take the definition. Wikipedia has it as just a function that maps variable length sequences to fixed length sequences. So by that definition most embedding networks fit.
Hashes are often assumed to be 1d, discrete valued, deterministic, uniformly distributed, and hard-to-reverse. And embeddings are often assumed to have semantic structure. Those two things certainly have some pretty different properties.
In the strict definitions, I’d say if hashing is just mapping to a fixed-size output space and an embedding is a projection/mapping of one space onto another (usually smaller) space, then they’re similar.
Some hash algorithms like SimHash or LSH use random projection onto sets of random hyperplanes to produce output vectors. Blurring the lines fairly well. You could even implement that as a NN with a single projection layer. Or indeed the torch.nn.Embedding class. Of course the outputs are usually then quantized or even binarized, but that’s more a use-case specific performance optimization not fundamental (and sometimes so are embeddings).
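A minimal sketch of that random-hyperplane idea (SimHash-style; pure Python, with illustrative dimensions and vectors): each bit records which side of a random hyperplane the input falls on, so similar inputs collide on most bit positions while dissimilar ones don't.

```python
import random

def random_hyperplanes(dim, n_bits, seed=0):
    # One random Gaussian normal vector per output bit.
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_hash(vec, planes):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

planes = random_hyperplanes(dim=8, n_bits=16)
v1 = [1.0, 0.5, -0.2, 0.8, 0.1, -0.3, 0.9, 0.4]
v2 = [1.0, 0.5, -0.2, 0.8, 0.1, -0.3, 0.9, 0.5]      # nearly identical to v1
v3 = [-1.0, -0.5, 0.2, -0.8, -0.1, 0.3, -0.9, -0.4]  # exact opposite of v1

h1, h2, h3 = (lsh_hash(v, planes) for v in (v1, v2, v3))
# h2 agrees with h1 on most bits; h3 (the opposite direction) flips them all.
```

This is a fixed-length output that deliberately preserves similarity structure, which is why it sits right on the blurry line between "hash" and "embedding".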
Hashing is about destroying meaning, structure, and data, albeit in a special way for a special purpose. Semantic Vectors are about creating meaning, structure, and data.
The only similarity at all is that they're both an algorithm that maps from one domain to another. So your logic collapses into "All mapping functions are hashes, whenever the output domain is smaller than the input domain", which is obviously wrong. And it's additionally wrong because the output domain of a Semantic Vector is 1500 infinities (dimensions) larger than the input. So even as a "mapper" it's doing the inverse of what a hash does.
No, mapping to a _fixed length code_, that’s it. Note that the output domain need not be smaller than the input domain either.
If your model takes a sequence of 1 or 10,000 or N tokens and returns a vector of fixed length, say 1500 dimensions, then it is a hash function of sorts.
> A hash function is any function that can be used to map data of arbitrary size to fixed-size values
I mean you can even look at this from an entropy perspective. A good hash algo generates pure noise (high entropy), while a semantic vector generates structure (low entropy). These two concepts are as far apart as anything could be, no matter what metric of comparison you choose to use. You literally couldn't name two concepts that are further apart if you tried.
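For the cryptographic case specifically, the "pure noise" claim is the avalanche property: change one character of the input and roughly half of the output bits flip, leaving no recoverable structure. A quick illustration with SHA-256 (toy messages, purely for demonstration):

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    # Count the bit positions where the two byte strings differ.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg1 = b"the quick brown fox"
msg2 = b"the quick brown fux"   # one character (a few input bits) changed

d1 = hashlib.sha256(msg1).digest()
d2 = hashlib.sha256(msg2).digest()

# Avalanche effect: roughly half of the 256 output bits flip, so the two
# digests are statistically unrelated despite near-identical inputs.
flipped = bit_diff(d1, d2)
```

An embedding model given those two near-identical strings would instead produce two nearly identical vectors, which is the structure-preserving behavior being contrasted here.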
Frankly, you’re ignoring the definition at this point. A “good hash algo” only generates noise in the cryptographic hash sense. There are in fact other hashes. The fact that “semantic vectors” preserve a useful similarity is no different mathematically than LSH or many others (except that the models work a lot more usefully).
If you’re trying to say MD5 isn’t an LLM, then fine no argument there. But otherwise consider referencing something other than vibes or magic, because the definition is clear. “Semantic vectors” isn’t some keyword to be invoked that just generates entropy from the void.
Oh, I get your argument. You think all functions that have finite output length for [usually] longer input length, are hashes. I totally get what you're saying, and it necessarily also means every LLM "inference" is actually a hashing algo too, as you noticed yourself, tellingly. So taking a long set of input tokens, and predicting the next token is (according to you), a "hashing" function. Got it. Thanks.
There is a petition that would require publishers to leave games in a playable state at end-of-life (https://eci.ec.europa.eu/045/public/#/screen/home), but it doesn't look like it will reach the threshold that would require the parliament to respond. It is one of the bigger petitions though, so it might still trigger some action.
"Some threshold" is doing a lot of work here, since the requirement is a million signatures (which also need to be from multiple countries).
I made a small mistake: the response will be from the Commission, not Parliament, because only the Commission can propose laws anyway - but you do get a hearing in Parliament.
But essentially, even crossing the million signature threshold doesn't win you anything but a slightly bigger soap box and the promise of the Commission giving you a "no" in writing rather than just ignoring you. There is no requirement to actually act on it, and no way to force it (unlike e.g. in Switzerland where actual binding popular votes can be initiated with enough signatures).
> Look at how snubbing developers has worked out for the Apple Vision Pro.
I think it's mostly the lack of users. Apple snubs mobile developers all the time, but since they gate access to a large chunk of well-paying customers, developers are ready to jump through any hoops.
If there were millions of Apple Vision Pro users I'm sure the developers would have followed, but it's of course a chicken and egg situation considering the Vision Pro's lack of content.
It's not really a chicken and egg situation, it's more of a cost problem. It still costs $3500. Even if the next version is a third of the price it will still cost three times more than the competition.
And if I'm buying it as a devkit I'm sure my accountant and I will find a way to write that off, anyway. $3500 isn't quite pocket change, but it is close enough to petty cash. But why do that if there's no users? And even the day-one diehards among my colleagues stopped wanting to be seen in them before long.
I think it isn't really chicken-egg, is what I'm saying. Devs were so hot to target iPhone from day one that the first or second major OS update added an entire infrastructure to make that possible. There was so much interest it made Apple back down! For the Vision Pro they had that on day one and it wasn't nearly enough to sell the thing to devs, because again, nothing did nearly enough to sell the thing to users.
What made the early apps great and viral on iPhone were the indie developers. The ones making flashlight and farting sound boards. They paved the way, and for them $3500 is a lot of money.
Who cares if it’s pocket change for google or meta, nobody wants another Facebook app.
$3500 doesn't matter at all for developers. It matters for users. If there are a billion users, devs will pay $3500 for access no problem. But you can't get a billion users for a $3500 product unless it's at least as useful as a car.
This is the best way to sum it up. The diehard Apple fans still defend it, with handwaved promises that the future will bring a cheaper one, but in this economy I don't think Apple can do it. The price people will bear is proportional to the current usefulness, and the usefulness is proportional to third-party dev interest. The irony is that of all companies, Apple would be the most capable financially of loss-leadering it into existence with their cash hoard, but they're so stingy that the idea of a loss leader offends them to the core.
But imagine for a moment an alternate reality where they at least moderately tried to keep the cost down, and then further subsidized it, selling the headsets for $599 and made developer terms wildly attractive (like, your first 20 million in revenue having a 5% fee instead of 30%). It would cost Apple billions, but they pissed away more on the car idea with nothing to show for it. This could have launched a category, instead I predict a future more like Apple TV hardware where it's niche due to being 4x the price of what most people want to pay for the category.
> Apple would be the most capable financially of loss-leadering it into existence with their cash hoard, but they're so stingy that the idea of a loss leader offends them to the core.
Or they tried that, saw it's a tiny garbage market segment attended solely by photographer types who enjoy spending $10k to complain of being unsatisfied and a few others far less savory, and sensibly exited. Just like they explored FSD in concept and said no thanks, this will never work, let the morons throw their bad money after our good.
I don't know why it surprises people that a cash-rich, culturally insular company, with the world's premier brand in affordable luxury technology of genuine quality, should behave in accord with its own precepts rather than theirs. I've always found it more useful to learn about what I see in front of me, than distract my eyes with some fantasy of my own preference, and remain a fool. (For example, Apple is dogshit at wearables, always has been, always will be. You wear one because everyone wears one, although of course I have better, but they're awful!) But as I think I said nearby, I tried VR already and it sucks. I guess some folks need longer to catch on.
Sure. And those early indie devs paid, inflation adjusted, iirc around $500-1000 for the hardware they developed against to put those indie flashlight fart noise apps on the then nascent App Store, because that's what an iPhone cost.
$3500 is, as I said, pretty close to petty cash even for a sole-owner LLC that needs taking at all seriously, and I would front that sum without a second thought out of my own personal pocket if I thought VR had legs, the same way I've put about $9k toward inference-capable hardware in the last two years because AI obviously does have legs. It's an investment in my career, or at least toward the optionality of continuing a career in software in a post-AI world, assuming I don't decide to go be an attorney or something instead.
I appreciate not everyone can drop a sum like that, like that. I can and I'm not ashamed of it. Why should I be, when it's exactly what I've worked the last 21 years straight to earn?
I think the issue is less the cost to developers and more the cost to users. Were there more users, no doubt a larger number of indie developers would be able to justify the expense. Without those users--or at least a reliable promise of those users in the near future--it's tough to justify even dipping your toes into it. It's a chicken and egg problem that's fundamentally tied to cost as well as hardware limitations. Discomfort from the bulk and weight was my biggest sticking point even before the price, for example.
Plus, the hardware is just the initial starting point. Your initial outlay will quickly be eclipsed by the dev hours spent working on Vision versions of your app(s), and that's when the opportunity costs become particularly noticeable. Time spent on a Vision app that may have no real market for years is time you could be spending adding features, testing changes, fixing bugs, marketing, etc. Skipping on Vision Pro is really a no-brainer for most indie developers, at least for the foreseeable future.
Yes. That was my original point, just above the head of the branch where you responded. Could I have been more concise or more clear? Serious question, I am mildly retooling my prose style of late.
Ah, sorry about that. Any lack of clarity is on me; I had walked away for a bit before responding and ended up flattening the branch in my head by the time I started typing. You're fine :).
The price isn't as much of a problem for developer adoption, it's a problem for user adoption. Users aren't buying the Apple Vision Pro because it's $3500. Developers aren't writing apps for the Apple Vision Pro because it has no users.
You didn't really talk about users at all. The only part of your comment about users is "But why do that if there's no users?". There can be many reasons why there are no users, price being just one of them.
That's fair. The implication was all in the comparison with week-1 campsites for iPhones versus day-1 yawns for Vision Pro, but it's smeared across two paragraphs and should have been hoisted and made explicit. Thanks for the review!
Why the downvotes about an anecdote about the owner of (actually a couple of) software companies? She gives talks, in those talks she says she makes more money off iOS apps than other platforms. You can probably find a few of those talks on Youtube.
What killed the Vision Pro is the complete lack of support for the two main things people use VR for. Productivity is a distant third behind the likes of VR Chat and pornography. If Apple managed to capture only 1% of VR Chat's monthly userbase, they would've tripled their pathetic sales numbers.
Apple tried to focus on productivity and some light entertainment and didn't even throw the other two a bone by supporting a PC link feature. In particular they didn't make a physical link possible - Wi-Fi is not reliable or high-bandwidth enough for most people, so those third-party solutions aren't cutting it.
Apple users are mostly locked out of the existing PC VR ecosystem - Apple didn't have to rely on developers writing dedicated apps.
I bought the AVP for one thing only - long haul flights. It makes the experience completely and utterly different, and it's less than the cost of a business seat.
From the perspective of a UK flyer, $3500 for a return ticket over the Atlantic in business class looks fairly cheap. Last time I checked (booking a month in advance), I was quoted 4500+ GBP.
Regardless, you don't throw away your headset after a flight, obviously, so even if the ticket were half the price you'd still come out ahead after two or three trips.
This said, headsets like AVP improve the flying experience but don't magically solve it: they are still too heavy and uncomfortable to wear for more than 1-2 hours. That's why I'm betting on the more lightweight (and cheaper) sunglass-like products to actually win that market.
It's not like there's much reason to care about comforts on short flights. Anyone can tolerate economy for an hour and you'd probably not get out your VR headset for a short hop either.
> Trust me, porn on the Vision Pro is plentiful and industry-leading.
VR pornography is quite massive in Japan for instance. Huge in fact. The Vision Pro doesn't even have a DMM.com/Fanza app for that.
I don't think most users would even consider getting a device that doesn't allow them to view their existing catalog of purchases, pornography and not.
Again, this could've been solved by simply supporting PCVR.
> VRChat, I agree, should absolutely be there and unrestricted. It wont be though. It isn't uncensored on Oculus either.
I don't think the VR Chat app on Oculus is very popular. Most users are just going to run it via PCVR for better performance, feature support, etc.
Picasso explicitly wanted his designs (for cutlery, plates, household items he designed) to be mass-produced, so your question is not as straightforward as you make it to be.
What is the connection to machine generated code? He designed the items manually and mass produced them.
No one objects to a human writing code and selling copies.
Apart from that, this is the commercial Picasso who loved money. His early pre-expressionist paintings are godlike in execution, even if someone else has painted a Pierrot before him.
Sometimes, sometimes just optical, but in any case, there's even a guy, Ken Shirriff, who has been doing this as a hobby for ages. It's not like this is merely theoretical.
Preferably all in the same place and at least somewhat integrated with each other. I'm not spelling out logging, auditing, IaC and other supplementary features but rather core functionality.
That seems to me like a minimal set of services a cloud provider must offer so that clients would work on "service assembly" instead of "building from scratch" or "integrating integration-hostile products".
OVH and Open Telekom Cloud have the vast majority of the features that you request and are EU-based and EU-owned providers.
IMHO:
- Configuration management: not needed, I vastly prefer Ansible. If you mean IaC: Terraform is the best.
- Domain and cert registration: use a third party.
- Email/SMS: use a third-party provider.