No one in these threads ever discusses how you would identify and remove AI-generated music.

E.g. how is this worse and needs to be removed: https://youtu.be/L3Uyfnp-jag?si=SL4Jc4qeEXVgUpeC but crap that top pop artists vomit out into the world doesn't?


I can think of half a dozen ways to detect AI music in its current form, but I'm not sure anyone has actually bothered implementing such a system.

Is anyone here aware of one? I might give it a go if not.


This may be of interest to you. Not exactly what you’re asking, but it may give you clues as to people and research to check out.

https://www.youtube.com/watch?v=xMYm2d9bmEA

I also remembered a friend telling me about someone talking about a method to detect AI music. I forget the specifics (I haven’t watched the video, was only told about it) but I remember the channel.

https://www.youtube.com/@RickBeato


Pretty sure Deezer did an in-depth article about how they detect and remove AI music. But it seems more likely they are detecting artifacts of the current tools, not something that would be impossible to bypass eventually.

IIRC it's not just actual artifacts, but also statistical measures, i.e. songs that are "inhumanly average".

If they're not doing it already, I think some metadata analysis, going by things like upload patterns, would also work well.


> I can think of half a dozen ways to detect AI music in its current form

Can you give a few examples?

For example, how to detect that the song I linked is AI compared to, say, anything Taylor Swift produces, or to any overly produced pop song or an electronic beat.


My first instincts offhand were:

* N-gram analysis of lyrics. Even good LLMs still exhibit some weird patterns when analyzed at the n-gram level.

* Entropy - Something like KL divergence maybe? There are a lot of ways to calculate entropy that can be informative. I would expect human music to display higher entropy.

* Plain old FFT. I suspect you'll find weird statistical anomalies.

* Fancy waveform analysis tricks. AIs tend to generate audio in "chunks"; I would expect the waveforms to have steeper/higher impulses and strange gaps. This probably explains why they still sound "off" to hifi fans.

* SNR analysis - Maybe a repeat of one of the above, but worth expanding on. The actual information density of the channel will be different because diffusion is basically compression.

* Subsampling and comparing to a known library. It's likely that you can identify substantial chunks that are sampled from other sources without modification - Harder because you need a library. Basically just Shazam.

* Consistency checks. Are all of the same note/instrument pairs actually generated by the same instrument throughout, or subtly different? Most humans won't notice, but it's probably easy to detect that it drifts (if it does).

That's just offhand though. I would need to experiment to see which, if any, actually work. I'm sure there are lots more ways. (A rough sketch of the n-gram idea is below.)
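
If anyone wants to play with the n-gram/entropy ideas, here's a minimal sketch in Python (the trigram features and the threshold are purely illustrative assumptions; you'd have to calibrate against a labeled corpus to know if the signal is real):

    import math
    from collections import Counter

    def trigram_entropy(lyrics: str) -> float:
        # Shannon entropy over word trigrams. The hypothesis: generated
        # lyrics are "inhumanly average" and score lower than human ones.
        words = lyrics.lower().split()
        trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
        if not trigrams:
            return 0.0
        counts = Counter(trigrams)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total)
                    for c in counts.values())

    def looks_generated(lyrics: str, threshold: float = 6.0) -> bool:
        # The threshold is a made-up placeholder, not a real cutoff
        return trigram_entropy(lyrics) < threshold

Any single signal like this will misfire on formulaic human genres, so you'd want to combine several of the items above before trusting a verdict.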


Thank you!

This will likely have a lot of false positives on a lot of genres. E.g. I suspect genres like synthpop and trance (and a lot of other electronic music) will likely hit a lot of those points with regards to music and sampling.

Lyrics are also not a given (when they are likely curated by humans). E.g. compare the song I referenced (https://dumpstergrooves.bandcamp.com/track/he-talked-a-big-g...) to, say, Taylor Swift's current most listened-to song: https://genius.com/Taylor-swift-the-fate-of-ophelia-lyrics I'd choose the AI one in a heartbeat :)

I wonder if a combination of all of those may work for a subset of songs, but I don't think you can do it with any confidence :(


For me it's always the voice. It sounds slightly rough and digital/robotic, not smooth and natural.

If it's instrumental only, especially electronic music then I don't think I could tell.


That doesn't explain how you would disambiguate this at scale. And there are likely a dozen legit genres which use voices like this :)

At scale, no. I just meant that on Spotify I have been hearing music lately that sounds off to me like this, and when I look it up, sure enough, it was AI.

> but crap that top pop artists vomit out into the world doesn't

It should!


> then you'll get a lot of tracks that they don't need to pay royalties for

I love this conspiracy theory. Which track doesn't Spotify pay royalties for? Considering that it licenses 100% of its music from external distributors.


> The program, according to Pelly’s reporting in Harper’s Magazine, is designed to embed low-cost, royalty-free tracks into Spotify’s most popular mood- and activity-based playlists. Produced by a network of “ghost artists” operating under pseudonyms, the tracks are commissioned with the intent to reduce the company’s royalty payouts to artists, per Pelly.

https://edm.com/news/spotify-using-ghost-artists-minimize-ro...


> operating under pseudonyms, the tracks are commissioned with the intent to reduce the company’s royalty payouts to artists, per Pelly.

As I already wrote elsewhere, no one, including the article's own authors, understood a single thing from the article.

Spotify doesn't produce its own music. It licenses 100% of its music from external distributors. Apart from a few scammy companies there are dozens of companies whose entire repertoire and catalog is ambient/background/noise/elevator/shopping mall music etc. that they commission from ghost composers.

There is literally money being paid to distributors for these tracks. To quote the original article you didn't even read, this one: https://harpers.org/archive/2025/01/the-ghosts-in-the-machin...

--- start quote ---

Epidemic’s selling point is that the music is royalty-free for its own subscribers, but it does collect royalties from streaming services; these it splits with artists fifty-fifty.

--- end quote ---

Wait, what about the "no royalties" crap? Oh, all of that is just "per Pelly". Though I'll admit that there are probably companies that license music for a flat fee (though I assume those would be rare).

Also note: Spotify doesn't pay artists. Spotify doesn't have direct contracts with artists. Spotify pays distributors and rights holders. And then those, in turn, pay royalties based on their contracts with artists. (According to one of the ghost artists interviewed, he is paid significantly more than he would be if he were trying to release music himself, BTW.)


Erm... things are a bit more complicated than you make them out to be, and I'm afraid you do not really know a lot about how all of this works (me neither, btw; this is all very, very messy).

It is correct that Spotify pays artists through distributors (and they partly own one, Distrokid, but that's another story) or labels. But there are usually also royalties that need to be paid for songwriting, lyrics and performance, which can (and often do) go to different people. This is extremely complicated and differs from country to country, but it is completely separate from the distributor. The artist/lyricist/performer will receive these royalties (if they registered for them) from entirely different institutions.

This is the prime advantage of "royalty-free" music: you need to pay only the artist (or their representation, like a distributor/label), either flat or per stream/performance/whatever... So in summary: yes, Spotify most definitely saves a ton of money by steering people towards this kind of stuff. I also wouldn't be surprised at all if they actually just pay flat fees for that junk.

>Spotify doesn't pay artists.

So indie artists can't directly put their music on Spotify? Sorry, I have no idea how this works; I guess that's the point of Bandcamp?


> So indie artists can't directly put their music on Spotify?

No one can put their music directly on Spotify.

--- start quote ---

https://support.spotify.com/us/artists/article/getting-music...

Distributors handle music distribution and pay streaming royalties.

Work with a distributor to get your music on Spotify.

# Choose a distributor

See our preferred and recommended distributors: https://artists.spotify.com/providers

These distributors meet our highest standards for quality metadata and anti-infringement measures.

Note: Most distributors charge a fee or commission. Each service is unique, so do a little research before picking one.

If you’re a signed artist, your record label likely already works with a distributor who can deliver your music.

--- end quote ---


Spotify hires musicians to churn out content that fits certain criteria. See https://harpers.org/archive/2025/01/the-ghosts-in-the-machin...

Create Music Group: they buy your favorite artists' catalogs and then use the money to underpay artists to churn out slop songs that Create Music Group then owns and distributes/licenses. Yay! :)

Spotify doesn't hire any artists because if it did, major labels would immediately pull their contracts.

No one actually understands what's written in this article, including the authors themselves.

Also note how you didn't provide a single track that Spotify allegedly pays no royalties for.


The major labels own a good chunk of Spotify directly. Used to be even more. As long as they get their cut they'll jump on any opportunity to screw over their artists (yes I know "unsourced statement" blah blah, sit down lawyers. I won't explain the reasons for my low opinion of these companies right now.)

The allegation is that Spotify pays out to entities which are ultimately owned by themselves, or that they get kickbacks in other ways like ad purchases (probably illegal, but hard to prove if you're at all clever about it).

I remember I found a track a few years ago, by the artist Mayhem. No, not the metal band. The background music artist Mayhem. Which only ever released two tracks. One of which, "Solitude Hymns", happened to get featured in one of Spotify's playlists, and managed to rack up more plays than any track by the more famous metal band at the time.

They haven't scrubbed it. Just look it up.


> As long as they get their cut they'll jump on any opportunity to screw over their artists (yes I know "unsourced statement" blah blah,

It's not really unsourced. It's just very rarely talked about. I think you may get an article once every 10 years questioning the actual rights holders and distributors.

I mean, you get people in these discussions on HN who don't even know that Spotify (and other streaming services) have no direct contracts with artists and that everything goes through intermediaries.

> I remember I found a track a few years ago, by the artist Mayhem. No, not the metal band. The background music artist Mayhem. Which only ever released two tracks. One of which, "Solitude Hymns", happened to get featured in one of Spotify's playlists, and managed to rack up more plays than any track by the more famous metal band at the time.

Thank you! You're the only one who could point out a weird track.

41K monthly listeners for the band. The track got 20 million plays because it was featured.

That's where the gray zone begins: was this band with two songs picked because it is cheaper to include (for whatever reason), or was it just lucky (like some other bands that got big through streaming, e.g. Glass Animals)?


When this was in the music industry news a few years ago, a lot more tracks were mentioned; I don't remember if this one was among the ones they listed or one I found myself. I did find many myself, though; at the time it wasn't hard at all. I just remember this one because it was memorable, their name being the same as a far more Wikipedia-notable band.

What is hard, though, is finding out which aggregator/intermediary/record company collected the payments for mayfly Mayhem's plays. I have not succeeded at that; if you find a way to get that information out of Spotify, do tell me. It's probably actually easier to find out who made the music. MBW managed to find out that at least some of these tracks were made by well-connected Swedish producers, as I recall.


You’re missing the concept of session musicians that can improvise for hours. No license, flat fee.

Again: Spotify doesn't pay musicians directly. Spotify pays distributors and rights holders.

Literally in the very article everyone links to but is incapable of reading there's even this text:

--- start quote ---

Epidemic’s selling point is that the music is royalty-free for its own subscribers, but it does collect royalties from streaming services; these it splits with artists fifty-fifty.

--- end quote ---


Internally, they refer to it as "perfect fit content" (PFC).

It used to just be stuff like white noise and rain sounds, but it has expanded to essentially be a modern Muzak replacement.

For situations where people don't really want "music" and just need "contextually appropriate, aesthetically pleasing sound".


That makes all the sense in the world to me. I'd call that an entirely legitimate use for AI generated music.

The barbers I went to recently were playing a channel on the TV which was an endless series of clips panning through ultra-nostalgic French Riviera-style scenery, accompanied by mellow guitar music. Seemed fine at first glance but like all AI stuff it got weirder the closer you looked - boats on land, outdoor dining areas underwater, giant lanterns larger than houses, mangled looking food, that sort of thing.

Someone had clearly just set up a few prompts and let the AI get on with it, creating probably hundreds of channels of this stuff.


Sure, as "content".

But unless these tracks are treated differently in Spotify's payout system, they're extremely profitable, and because payments come from a common pool, they hoover up payments which would otherwise have gone to artists people actually like.


Not a conspiracy theory. Spotify hires session musicians (pre-AI), paying a flat fee for hours of background music.

Since many high volume Spotify users just want “something jazzy” in the background, it helps them reduce royalties.


> Spotify hires session musicians (pre-AI), paying a flat fee for hours of background music.

Spotify doesn't do it because Spotify doesn't produce music and doesn't have direct contracts with musicians.

> Since many high volume Spotify users just want “something jazzy” in the background, it helps them reduce royalties.

How does it help them reduce royalties when they don't produce their own music and license 100% of their music from distributors and rights holders?


You're being unnecessarily pedantic. They might not hire the musicians directly, but if they're hiring an agency to do that, it's effectively the same thing. Ultimately they're trying to get generic music for cheap to reduce royalty payments to artists.

> Ultimately they're trying to get generic music for cheap to reduce royalty payments to artists.

1. Spotify doesn't pay artists. Spotify doesn't have direct contracts with artists. Spotify pays rights holders and distributors.

I really wish people who have strong opinions on the music industry learned at least the absolute bare minimum about the subject.

2. Again, bringing it back to my original comment: where's the evidence for that? E.g. the one and only article everyone links [1] and doesn't bother to understand literally has statements like this:

--- start quote ---

But at the end of the day, [the ghost musician] said, it was still a paycheck: “I did it because I needed a job real bad and the money was better than any money I could make from even successful indie labels, many of which I worked with,” he told me.

...

Epidemic’s selling point is that the music is royalty-free for its own subscribers, but it does collect royalties from streaming services; these it splits with artists fifty-fifty.

--- end quote ---

That doesn't mesh well with the narrative of "Spotify bad, doesn't pay royalties, etc.", does it?

[1] https://harpers.org/archive/2025/01/the-ghosts-in-the-machin...


> 1. Spotify doesn't pay artists. Spotify doesn't have direct contracts with artists. Spotify pays rights holders and distributors.

You are still being unnecessarily pedantic. Most of us understand that there are layers to this, but ultimately, what we care about is how much an artist is paid per stream and which streams are being preferred over others.


There are artists that Spotify has different deals with. Spotify promotes their music in their playlists, but the artists get a much smaller cut of the profits in exchange. Win-win for everybody.

This only happens in genres where most listeners don't care about the artists they're listening to; think "chillout", "focus" or "easy listening". That kind of music is a commodity; Taylor Swift (or Metallica or Mozart or whatever) is not. This has been proven.

My hypothesis is that those genres would otherwise lose Spotify the most money, as people often play that kind of music and never turn it off. Because Spotify pays per listen, the user who attentively listens to their favorite artist a few times a week is much better for them than somebody who has "chillout" playing on their echo 24/7.


> There are artists that Spotify has different deals with.

Spotify doesn't have deals with artists because Spotify doesn't have direct contracts with artists. Only with distributors.

> My hypothesis is that those genres would otherwise lose Spotify the most money,

How would they "lose Spotify money", and how is this different from top artists on Spotify?


You can’t substitute Taylor Swift, but you may be able to substitute generic synthwave (or whatever people play for general ambiance).

I have no idea what this is in response to :)

I'm not saying they are doing it now, but what's stopping them from generating their own tracks? What's to stop them from creating some bullshit company to generate AI slop and then licensing music from themselves at a fraction of what they'd pay a real artist, just to keep up the illusion so that real artists don't leave their platform?

If a corporation can do something that will make them more money than they'd make not doing it you should expect them to do the profitable thing. Corporations don't care about ethics or even the law. Maximizing shareholder value is their purpose. They exist only to take from the many and give to the few. It's not a conspiracy theory to assume that they'll be doing exactly what they are designed to do.


> I'm not saying they are doing it now, but what's stopping them from generating their own tracks? What's to stop them from creating

When they do that, let's talk. But that's not what I asked, is it?


It's not really a conspiracy theory. YouTube users can use royalty-free music; it stands to reason Spotify would have the same (potentially internally) to decrease costs.

"Why pay royalties if it's just going to be BGM for a massage parlor?" could be their reasoning.


I only go to massage parlors that display their ASCAP or BMI license in the window. I wouldn't be happy getting an ending if some musician is being ripped off.

Yet another person who plays the bogeyman card of "conspiracy theory" when what is described is garden-variety corruption, which only takes a trivial amount of secret coordination in a group smaller than your average terrorist cell, and could probably even be defended as legal with a small legal team (Spotify probably has a big one).

There are a billion ways you could cash in on this. A dead easy one is "music written for hire by a company you own".

Even if Spotify is not doing the slightest thing like this, suggesting that they might is not a conspiracy theory. Quit trying to tar every proposed view of the world you disagree with with that label. You're just making it easier for the actual grand conspiracy theorists.


> Yet another person who plays the bogeyman card of "conspiracy theory"

> Even if Spotify is not doing the slightest thing like this, suggesting that they might is...

...textbook definition of conspiracy theory

Also note how your entire text is just unsubstantiated claims. Including emotionally charged words like "terrorist cell" that give your words so much weight and meaning.


Your "textbook definition" is BS. A theory that someone conspires is not enough to call something a conspiracy theory.

You would not call a prosecutor who accuses someone of "criminal conspiracy" a conspiracy theorist, even though they have a theory that someone is conspiring.

A terrorist cell is just another example of a real type of group which obviously conspires. You're not a conspiracy theorist for believing they exist.

"Conspiracy theorist" is what we call people who believe in a grand conspiracy, one which, had it been real, would have required superhuman levels of coordination and secrecy. That's the brush you, for some mysterious reason, want to tar critics of Spotify with.

And for the second time this week, someone demands "evidence" for expressions of distrust.


> And for the second time this week, someone demands "evidence" for expressions of distrust.

Funny then that to illustrate your point you use this example: "You would not call a prosecutor who accuses someone of 'criminal conspiracy' a conspiracy theorist". You know what separates criminal prosecutors from conspiracy theorists? They have to provide evidence.

Or this example: "A terrorist cell is just another example of a real type of group which obviously conspires. You're not a conspiracy theorist for believing they exist." Yes, because we have evidence that they exist.

See how this works? A theory with no supporting evidence is a crackpot theory.

For example, I can say anything I want about you. When asked for evidence, I can lapse into demagoguery about terrorist cells or something. Perhaps you are a part of a terrorist cell? Otherwise, why bring them into the discussion?


To repeat the salient part, lawyer guy: "conspiracy theorist" is what we call people who believe in a grand conspiracy, one which, had it been real, would have required superhuman levels of coordination and secrecy. That's the brush you, for some mysterious reason, want to tar critics of Spotify with.

And sure, if you insist I'll refrain from speculating why you're so obsessed with defending a megacorporation and insisting they deserve the benefit of doubt. Feel free to provide evidence to explain. (Remember, by your own standard, your own opinions aren't evidence).


> if you insist I'll refrain from speculating why you're so obsessed with defending a megacorporation and insisting they deserve the benefit of doubt

I'm pointing out unsubstantiated claims, often to people who don't know jack shit about the music industry (e.g. that's why almost every comment in this thread has a variation of "Spotify doesn't pay artists, Spotify pays rights holders").

Note how you still haven't said anything of substance except emotions and ad hominems. But sure, your position is correct and valid, and not mine.


That article is bandied around, and no one either reads it or understands what's written there. Neither do the article's authors, BTW.

1. Spotify doesn't have "internally produced music"

2. There are companies that provide white-label ambient/white noise/similar music.

3. Spotify may have preferential licensing deals with some of them (as any company would seek preferential contract terms)

4. Some of that music is generated (AI or otherwise)


Preferential contracts to AI-gen music makers is equivalent to "internally produced music" in my mind, even though they're not technically equivalent.

`==` vs. `===` essentially


They want you to pay money for premium AI features in those apps, which is worse.

The apps themselves are fine IMO.


In Japan you can run voice assistants other than Siri (well, at least some of the functionality, like calling them up via a button shortcut): https://developer.apple.com/documentation/appintents/launchi...

Why only in Japan? Because Japan forced them to: https://9to5mac.com/2025/12/17/apple-announces-sweeping-app-...


> Yes kind of, but only different results (maybe) for the things you didn't specify.

No. They will produce a different result for everything, including the things you specify.

It's so easy to verify that I'm surprised you're even making this claim.

> Once you've nailed your "spec" enough so there isn't any ambiguity, the LLM won't have to make any choices for you, and then you'll get exactly what you expected

1. There's always ambiguity, or else you'll spend an eternity writing specs

2. LLMs will always produce different results even if the spec is 100% unambiguous for a huge variety of reasons, the main one being: their output is non-deterministic. Except in the most trivial of cases. And even then the simple fact of "your context window is 80% full" can lead to things like "I've rewritten half of your code even though the spec only said that the button color should be green"


> It's so easy to verify that I'm surprised you're even making this claim.

Well, to be fair, I'm surprised you're even trying to say this claim isn't true, when it's so easy to test yourself.

If I prompt "Create a function with two arguments, a and b, which returns adding those two together", I'll get exactly what I specify. If I feel like its choice of u32 instead of u8 was wrong, I add "two arguments which are both u8", and then you get this.

Is this not the experience you get when you use LLMs? How does what you get differ from that?

> 1. There's always ambiguity, or else you'll spend an eternity writing specs

There isn't, though; at some point it does end. Whether it's worth going that deep into specifying the exact implementation is up to you and what you're doing; sometimes it is, sometimes it isn't.

> LLMs will always produce different results even if the spec is 100% unambiguous for a huge variety of reasons, the main one being: their output is non-deterministic.

Again, it's so easy to verify that this isn't true. It's also surprising you'd say this, because earlier you said there's "always ambiguity", yet somehow you also seem to know that you can be 100% unambiguous.

Like with "manual" programming, the answer is almost always "divide and conquer", when you apply that with enough granularity, you can reach "100% umambiguity".

> And even then the simple fact of "your context window is 80% full" can lead to things like "I've rewritten half of your code even though the spec only said that the button color should be green"

Yes, this is a real flaw; once you go beyond two messages, the models absolutely lose track almost immediately. The only workaround for this is constantly restarting the conversation. I never "correct" an agent that gets it wrong with more "No, I meant"; I rewrite my first message so that no corrections are needed. If your context goes beyond ~20% of what's possible, you're basically going to get shit results. Don't trust the "X tokens context length", because "what's possible" is very different from "what's usable".


> If I prompt "Create a function with two arguments, a and b, which returns adding those two together", I'll get exactly what I specify. If I feel like its choice of u32 instead of u8 was wrong, I add "two arguments which are both u8", and then you get this.

This is actually a good example of how your spec will progress:

First pass: "Create a function [in language $X] with two arguments, a and b, which returns adding those two together"

Second pass: "It must take u8 types, not u32 types"

Third pass: "You are not handling overflows. It must return a u8 type."

Fourth pass: "Don't clamp the output, and you're still not handling overflows"

Fifth pass: "Don't panic if the addition overflows, return an error" (depending on the language, this could be "throw an exception" or return a tuple with an error field, or use an out parameter for the result or error)

For just a simple "add two numbers" function, the specification can easily exceed the actual code. So you can probably understand the skepticism when the task is not trivial, and depends on a lot of existing code.
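
For the curious, here's roughly where that five-pass spec lands. A sketch in Python emulating u8 semantics, since $X was never specified (raised exceptions stand in for whatever error mechanism the actual language uses):

    def add_u8(a: int, b: int) -> int:
        # Emulate u8 inputs: reject anything outside 0..255
        for name, value in (("a", a), ("b", b)):
            if not 0 <= value <= 255:
                raise ValueError(f"{name} must fit in a u8, got {value}")
        result = a + b
        # No clamping and no panic: overflow is reported as an error
        if result > 255:
            raise OverflowError(f"{a} + {b} = {result} does not fit in a u8")
        return result

Even this toy version dwarfs the one-liner the first prompt asked for, which is exactly the point.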


So you do know how the general "writing specification" part works; you just have the wrong process. Instead of iterating and adding more context on top, restructure your initial prompt to include the context.

DON'T DO:

First pass: "Create a function [in language $X] with two arguments, a and b, which returns adding those two together"

Second pass: "It must take u8 types, not u32 types"

INSTEAD DO:

First pass: "Create a function [in language $X] with two arguments, a and b, which returns adding those two together"

Second pass: "Create a function [in language $X] with two arguments, a and b, both using u8, which returns adding those two together"

----

What you don't want to do is add additional messages/context on top of "known bad" context. Instead, take the clue that the LLM didn't understand correctly as "I need to edit my prompt", not "I now need to add more context after their reply to correct what was wrong". The goal should be to completely avoid anything bad, not to correct it.

Together with this, you build up a system/developer prompt you can reuse across projects/scopes that follows how you code. In it, you add stuff as you discover what's needed, like "Make sure to always handle exceptions in X way" or similar.

> For just a simple "add two numbers" function, the specification can easily exceed the actual code. So you can probably understand the skepticism when the task is not trivial, and depends on a lot of existing code.

Yes, please be skeptical; I am as well, which I guess is why I am seemingly more effective at using LLMs than others who are less skeptical. Being skeptical is a benefit here, not a drawback.

And yes, it isn't trivial to verify work that others have done for you when you have a concrete idea of how exactly it should be. But as I've managed to work with outsourced/contracting developers before, and even to collaborate with developers in the same company as me, I also learned to use LLMs in a similar way, where you have to review and ensure the code follows the architecture/design you intended.


> INSTEAD DO:

> First pass: "Create a function [in language $X] with two arguments, a and b, which returns adding those two together"

> Second pass: "Create a function [in language $X] with two arguments, a and b, both using u8, which returns adding those two together"

So it will create two different functions (and LLMs do love to ignore anything that came before and create a lot of stuff from scratch again and again). Now what?


What? No, I think you fundamentally misunderstand what workflow I'm suggesting here.

You ask: "Do X". The LLM obliges, gives you something you don't want. At this point, don't accept/approve it, so nothing has changed, you still have an empty directory, or whatever.

Then you start a brand-new context, with an iteration on the prompt: "Do X with Y", and the LLM again tries to do it. If something is wrong, repeat until you get what you're happy with, extract what you can into reusable system/developer prompts, then accept/approve the change.

Then you end up with one change, and one function, exactly as you specified it. Then if you want, you can re-run the exact same prompt, with the exact same context (nothing!) and you'll get the same results.

"LLMs do love to ignore anything that came before" literally cannot happen in this workflow, because there is nothing that "came before".


> No, I think you fundamentally misunderstand what workflow I'm suggesting here.

Ah. Basically meaningless monkey work of babysitting an eager junior developer. And this is for a simple thing like adding two numbers. See how it doesn't scale at all with anything remotely complex?

> "LLMs do love to ignore anything that came before" literally cannot happen in this workflow, because there is nothing that "came before".

Of course it can. Because what came before is the project you're working on. Unless of course you end up specifying every single utility function and every single library call in your specs. Which, once again, doesn't scale.


> See how it doesn't scale at all with anything remotely complex?

No, I don't. Does outsourcing not work for you with "anything remotely complex"? Then yeah, LLMs won't help you, because that's a communication issue. Once you figure out how to communicate, using LLMs even for "anything remotely complex" becomes trivial, but requires an open mind.

> Because what came before is the project you're working on.

Right, if that's what you meant, then yeah, of course they don't ignore the existing code, if there is a function that already does what it needs, it'll use that. If the agent/LLM you use doesn't automatically do this, I suggest you try something better, like Codex or Claude Code.

But anyways, you don't really seem like you're looking for improving, but instead try to dismiss better techniques available, so I'm not even sure why I'm trying to help you here. Hopefully at least someone who wants to improve comes across it so this whole conversation wasn't a complete waste of time.


> No, I don't.

Strange. For a simple "add two integers" you now have to do five different updates to the spec to make it unambiguous, restarting the work from scratch (that is, starting a new context) every time.

What happens when your task isn't to add two integers? How many iterations of the spec do you have to do before you arrive at an unambiguous one, and how big will it be?

> Once you figure out how to communicate,

LLMs don't communicate.

> Right, if that's what you meant, then yeah, of course they don't ignore the existing code, if there is a function that already does what it needs, it'll use that.

Of course it won't, since LLMs don't learn. When you start a new context, the world doesn't exist. It literally has no idea what does and does not exist in your project.

It may search for some functionality given a spec/definition/question/brainstorming skill/thinking or planning mode. But it may just as likely not, because there is no actual proper way for anyone to direct it, and the models don't have learning/object permanence.

> If the agent/LLM you use doesn't automatically do this, I suggest you try something better, like Codex or Claude Code.

The most infuriating thing about these conversations is that people hyping AI assume everyone else but them is stupid, or doing something incorrectly.

We are supposed to always believe people who say "LLMs just work", without any doubt, on faith alone.

However, people who do the exact same things, use the exact same tools, and see all the problems for what they are? Well, they are stupid idiots with skill issues who don't know anything and probably use GPT 1.0 or something.

Neither Claude nor Codex are magic silver bullets. Claude will happily reinvent any and all functions it wants, and has been doing so since the very first day it was unleashed onto the world.

> But anyways, you don't really seem like you're looking for improving, but instead try to dismiss better techniques available

Yup. Just as I said previously.

There are some magical techniques, and if you don't use them, you're a stupid Luddite idiot.

Doesn't matter that the person talking about these magical techniques completely ignores and misses the whole point of the conversation and is fully prejudiced against you. The person who needs to improve for some vague condescending definition of improvement is you.


> LLMs don't communicate.

Similarly, some humans seem to be unable to, too. The problem is, you need to be good at communication to effectively use LLMs; judging by this thread, it's pretty clear what the problem is. I hope you figure it out someday, or just ignore LLMs, no one is forcing you to use them (I hope at least).

I don't mind what you do, and I'm not "hyping LLMs"; I see them as tools that are sometimes applicable. But even to use them in that way, you need to understand how to use them. But again, maybe you don't want, that's fine too.


"However, people who do the exact same things, use the exact tools, and see all the problems for what they are? Well, they are stupid idiots with skill issues who don't know anything and probably use GPT 1.0 or something."

Perfectly exemplified


Yeah, a summary of some imaginary arguments someone else made (maybe?), quoted back at me, who never said any of those things? Fun :)

The "imaginary arguments" in question:

- "If the agent/LLM you use doesn't automatically does this, I suggest you try something better, like Codex or Claude Code."

- "you don't really seem like you're looking for improving"

- "Hopefully at least someone who wants to improve comes across it so this whole conversation wasn't a complete waste of time"

- "judging by this thread, it's pretty clear what the problem is. I hope you figure it out someday"

- "you need to understand how to use them. But again, maybe you don't want"

Aka what I said previously.

At this point, adieu.


It says "of course you're right" and may or may not refactor/fix/rewrite the issue correctly. More often than not it doesn't or misses some detail.

So you tell it again, it says "of course you are right", and the cycle repeats.

And then the context window gets exhausted. Compaction loses most of the details and degrades quality. You start a new session, but the new session has to re-learn the entire world from scratch and may or may not fix the issue.

And so the cycle continues.


Table saws and cars are deterministic. Once you learn how to use them, the experience is repeatable.

The various magic incantations that LLMs require cannot be learned or repeated. Whatever the "just one more prompt, bro" du jour you're thinking of may or may not work at any given time, for any given project, in any given language.


Operating a car (i.e. driving) is certainly not deterministic. Even if you take the same route over and over, you never know exactly what other drivers or pedestrians are going to do, or whether there will be unexpected road conditions, construction, inclement weather, etc. But through experience, you build up intuition and rules of thumb that allow you to drive safely, even in the face of uncertainty.

It's the same programming with LLMs. Through experience, you build up intuition and rules of thumb that allow you to get good results, even if you don't get exactly the same result every time.


> It's the same programming with LLMs. Through experience, you build up intuition and rules of thumb that allow you to get good results, even if you don't get exactly the same result every time.

Friend, you have literally described a nondeterministic system. LLM output is nondeterministic. Identical input conditions result in variable output conditions. Even if those variable output conditions cluster around similar ideas or methods, they are not identical.


The problem is that this is completely false. LLMs are actually deterministic. There are a lot more input parameters than just the prompt. If you're using a piece of shit corpo cloud model, you're locked out of managing your inputs because of UX or whatever.

Ah, we've hit rock bottom of arguments: there's some unspecified ideal LLM model that is 100% deterministic and will definitely, 100%, do the same thing every time.

We've hit rock bottom of rebuttals, where not only is domain knowledge completely vacant, but you can't even be bothered to read and comprehend what you're replying to. There is no non-deterministic LLM. Period. You're already starting off from an incoherent position.

Now, if you'd like to stop acting like a smug ass and be inquisitive as per the commenting guidelines, I'd be happy to tell you more. But really, if you actually comprehended the post you're replying to, there would be no need since it contains the piece of the puzzle you aren't quite grasping.


> There is no non-deterministic LLM.

Strange then that the vast majority of LLMs that people use produce non-deterministic output.

Funnily enough, I had literally the same argument with someone a few months back in a friend group. I ran the "non-shitty, non-corpo, completely deterministic model" through ollama... and immediately got two different answers for the same input.

> Now, if you'd like to stop acting like a smug ass and be inquisitive as per the commenting guidelines,

Ah. Commenting guidelines. The ones that tell you not to post vague allusions to something, not to be dismissive of what others are saying, responding to the strongest plausible interpretation of someone says etc.? Those ones?


> Strange then that the vast majority of LLMs that people use produce non-deterministic output.

> I ran the "non-shitty, non-corpo, completely deterministic model" through ollama... and immediately got two different answers for the same input.

With deterministic hardware in the same configuration, using the same binaries, providing the same seed, the same input sequence to the same model weights will produce bit-identical outputs. Where you can get into trouble is if you aren't actually specifying your seed, or with non-deterministic hardware in varying configurations, or if your OS mixes entropy with the standard pRNG mechanisms.

Inference is otherwise fundamentally deterministic. In implementation, certain things like thread-scheduling and floating-point math can be contingent on the entire machine state as an input itself. Since replicating that input can be very hard on some systems, you can effectively get rid of it like so:

    ollama run [whatever] --seed 123 --temperature 0 --num-thread 1
A note that "--temperature 0" may not strictly be necessary. Depending on your system, setting the seed and restricting to a single thread will be sufficient.

These flags don't magically change LLM formalisms. You can read more about how floating point operations produce non-determinism here:

https://arxiv.org/abs/2511.17826

In this context, forcing single-threading bypasses FP-hardware's non-associativity issues that crop up with multi-threaded reduction. If you still don't have bit-replicated outputs for the same input sequence, either something is seriously wrong with your computer or you should get in touch with a reputable metatheoretician because you've just discovered something very significant.
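
If you want to see the non-associativity effect yourself, no GPU or LLM required, here's a pure-Python toy demo (the same arithmetic property is what bites multi-threaded reductions):

    import random

    random.seed(0)
    xs = [random.uniform(-1.0, 1.0) for _ in range(1_000_000)]

    forward = sum(xs)
    backward = sum(reversed(xs))

    # Float addition is not associative, so summation order matters:
    # the two results typically differ in the low bits.
    print(forward == backward)       # usually False
    print(abs(forward - backward))   # tiny but nonzero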

> Those ones?

Yes those ones. Perhaps in the future you can learn from this experience and start with a post like the first part of this, rather than a condescending non-sequitur, and you'll find it's a more constructive way to engage with others. That's why the guidelines exist, after all.


> These flags don't magically change LLM formalisms. You can read more about how floating point operations produce non-determinism here:

Basically what you're saying is: "for 99.9% of use cases and how people use them, they are non-deterministic, and you have to work around that non-determinism very carefully, to the point of having workarounds for your GPU, making them even more unusable".

> In this context, forcing single-threading bypasses FP-hardware's non-associativity issues that crop up with multi-threaded reduction.

Translation: yup, they are non-deterministic under normal conditions. Which the paper explicitly states:

--- start quote ---

existing LLM serving frameworks exhibit non-deterministic behavior: identical inputs can yield different outputs when system configurations (e.g., tensor parallel (TP) size, batch size) vary, even under greedy decoding. This arises from the non-associativity of floating-point arithmetic and inconsistent reduction orders across GPUs.

--- end quote ---

> If you still don't have bit-replicated outputs for the same input sequence, either something is seriously wrong with your computer or you should get in touch with a reputable metatheoretician because you've just discovered something very significant.

Basically what you're saying is: If you do all of the following, then the output will be deterministic:

- workaround for GPUs with num_thread 1

- temperature set to 0

- top_k to 0

- top_p to 0

- context window to 0 (or always do a single run from a new session)

Then the output will be the same all the time. Otherwise even "non-shitty corp runners" or whatever will keep giving different answers for the same question: https://gist.github.com/dmitriid/5eb0848c6b274bd8c5eb12e6633...

Edit: so what we should be saying is that "LLM models, as they are normally used, are very/completely non-deterministic".

> Perhaps in the future you can learn from this experience and start with a post like the first part of this

So why didn't you?


> The problem is that this is completely false. LLMs are actually deterministic. There are a lot more input parameters than just the prompt. If you're using a piece of shit corpo cloud model, you're locked out of managing your inputs because of UX or whatever.

When you decide to make up your own definition of determinism, you can win any argument. Good job.


Yes, that's my point. Neither driving nor coding with an LLM is perfectly deterministic. You have to learn to deal with different things happening if you want to do either successfully.

> Neither driving nor coding with an LLM is perfectly deterministic.

Funny.

When driving, I can safely assume that when I turn the steering wheel, the car turns in that direction. That the road that was there yesterday is there today (barring certain emergencies; that's why they are emergencies). That the red light in a traffic light means stop, and the green means go.

And not the equivalent "oh, you're completely right, I forgot to include the wheels, wired the steering wheel incorrectly, and completely messed up the colors"


> Operating a car (i.e. driving) is certainly not deterministic.

Yes. Operating a car or a table saw is deterministic. If you turn your steering wheel left, the car will turn left, every time, with very few exceptions that can also be explained deterministically (e.g. a hardware fault or ice on the road).

Operating LLMs is completely non-deterministic.


> Operating LLMs is completely non-deterministic.

Claiming "completely" is mapping a boolean to a float.

If you tell an LLM (with tools) to do a web search, it usually does a web search. The biggest issue right now is more at the scale of: if you tell it to create turn-by-turn directions to navigate across a city, it might create a python script that does this perfectly with OpenStreetMap data, or it may attempt to use its own intuition and get lost in a cul-de-sac.


Wow. It can do a web search. And that is useful in the context of programming how? Or in any context?

The question is about the result of an action. Given the same problem statement in the same codebase it will produce wildly different results even if prompted two times in a row.

Even for trivial tasks, the output may vary between just a simple fix and a rewrite of half the codebase. You can never predict or replicate the output.

To quote Douglas Adams, "The ships hung in the sky in much the same way that bricks don't". Cars and table saws operate in much the same way that LLMs don't.


> Wow. It can do a web search. And that is useful in the context of programming how? Or in any context?

Your own example was turning a steering wheel.

A web search is as relevant to the broader problems LLMs are good at, as steering wheels are to cars.

> Given the same problem statement in the same codebase it will produce wildly different results even if prompted two times in a row.

Do you always drive the same route, every day, without alteration?

Does it matter?

> You can never predict or replicate the output.

Sure you can. It's just less like predicting what a calculator will show and more like predicting if, when playing catch, the other player will catch your throw.

You can learn how to deal with reality even when randomness is present, and in fact this is something we're better at than the machines.


> Your own example was turning a steering wheel.

The original example was trying to compare LLMs to cars and table saws.

> Do you always drive the same route, every day, without alteration?

I'm not the one comparing operating machinery (cars, table saws) to LLMs. Again: if I turn a steering wheel in a car, the car turns. If I input the same prompt into an LLM, it will produce different results at different times.

Lol. Even "driving a route" is probably 99% deterministic unlike LLMs. If I follow a sign saying "turn left", I will not end up in a "You are absolutely right, there shouldn't be a cliff at this location" situation.

Edit: and when signs end up pointing to a cliff, or when a child runs onto the road in front of you, these are called emergency situations. Whereas emergency situations are the only available modus operandi for an LLM, and actually following instructions is a lucky happenstance.

> It's just less like predicting what a calculator will show and more like predicting if, when playing catch, the other player will catch your throw

If you think that throwing more and more bad comparisons that don't work into the conversation somehow proves your point, let me dissuade you of that notion: it doesn't.


I'm finding the prompting techniques I've learned over the last six months continue to work just fine.

Have you run the "same prompting technique" on the same problem in the same code base and got the same result all the time?

I also have prompting techniques that work better than other magical incantations. They do also fail often. Or stop working in a new context. Or...


From "Safari 15 on Mac OS, a user interface mess" https://morrick.me/archives/9368 from 5 years ago:

--- start quote ---

The utter user-interface butchery happening to Safari on the Mac is once again the work of people who put iOS first. People who by now think in iOS terms. People who view the venerable Mac OS user interface as an older person whose traits must be experimented upon, plastic surgery after plastic surgery, until this person looks younger. Unfortunately the effect is more like this person ends up looking… weird.

These people look at the Mac’s UI and (that’s the impression, at least) don’t really understand it. Its foundations come from a past that almost seems inscrutable to them. Usability cues and features are all wrinkles to them. iOS and iPadOS don’t have these strange wrinkles, they muse. We must hide them. We’ll make this spectacular facelift and we’ll hide them, one by one. Mac OS will look as young (and foolish, cough) as iOS!

--- end quote ---

At the time it was only Safari that they wanted to "modernize". Now it's the full OS.


I've yet to have a day when CachyOS comes out of sleep cleanly: it hangs at various steps and requires a hard reboot that somehow relaunches apps on login that I explicitly closed hours ago.

I had the same problem with Fedora, which is part of what prompted me to switch. It works great for me in Cachy. I assume either way it was an Nvidia problem.

Yup, I have Nvidia :)

Interestingly enough, I didn't have the same issue with Omarchy/Hyprland. Hyprland doesn't have even the most rudimentary ability to restore windows, but it was almost rock solid when it came to coming out of sleep.

Still searching for that one true Linux distribution :) Will stay on Cachy for now because gaming is so much better.

