Hacker News

Cerebras is serving GLM 4.6 at 1000 tokens/s right now. They'll likely upgrade to this model.

I really wonder whether GLM 4.7, or models a few generations from now, will be able to function effectively in simulated software dev org environments, and especially whether they can self-correct their errors well enough to build up useful code over time in such a simulated org, rather than accumulating piles of technical debt. They might be managed by "bosses": agents running on the latest frontier models like Opus 4.5 or Gemini 3. I'm thinking in the direction of this article: https://www.anthropic.com/engineering/effective-harnesses-fo...

If the open source models get good enough, then the ability to run them at 1k tokens per second on Cerebras would be a massive advantage over any other model for running such a SWE org quickly.
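The "simulated SWE org" idea above can be sketched as a simple loop: fast worker agents produce patches, and a frontier "boss" agent accepts or rejects them. This is a toy illustration only; the function names, retry policy, and approval heuristic are all assumptions, not any real harness.

```python
# Toy sketch of a simulated SWE org: fast "worker" agents produce patches,
# a frontier "boss" agent approves or rejects them. All names here are
# assumptions, not a real agent framework.

def run_org(tasks, worker, boss, max_retries=2):
    """Run each task through worker attempts until the boss approves,
    or retries run out (the rejected task is then dropped)."""
    merged = []
    for task in tasks:
        prompt = task
        for attempt in range(max_retries + 1):
            patch = worker(prompt)
            if boss(task, patch):  # boss approves -> merge, move on
                merged.append(patch)
                break
            prompt = f"{task}\nPrevious attempt {attempt + 1} rejected; revise."
    return merged

# Toy stand-ins: the worker "implements" by upper-casing the task,
# the boss approves anything non-empty.
approved = run_org(["add login form"], worker=str.upper, boss=lambda t, p: bool(p))
```

Whether real models accumulate useful code or technical debt in such a loop hinges entirely on how good the boss's review signal is, which is exactly the open question in the comment above.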





It is awesome! What I usually do is have Opus make a detailed plan, including writing tests for the new functionality, then give it to Cerebras-hosted GLM 4.6 to implement. If unsure, I hand it back to Opus for review.
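That plan -> implement -> review split amounts to routing each stage of the workflow to a different model. A minimal sketch, where the model identifiers are assumptions (substitute whatever your endpoints actually expose):

```python
# Hypothetical stage-to-model routing for the workflow described above.
# Model names are assumptions, not confirmed API identifiers.
STAGE_MODEL = {
    "plan": "claude-opus-4-5",    # frontier model writes the plan + tests
    "implement": "glm-4.6",       # fast Cerebras-hosted model does the bulk work
    "review": "claude-opus-4-5",  # frontier model double-checks when unsure
}

def pick_model(stage: str, unsure: bool = False) -> str:
    """Route a workflow stage to a model, escalating uncertain
    implementation results back to the review model."""
    if stage == "implement" and unsure:
        return STAGE_MODEL["review"]
    return STAGE_MODEL[stage]
```

The appeal of the split is that the cheap, fast model burns most of the tokens while the expensive model only sees the plan and the questionable diffs.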

This is where I believe we are headed as well. Frontier models "curate" and provide guardrails; very fast, competent agents do the work at incredibly high throughput. Once frontier models crack the "taste" barrier and context windows are wide enough, even this level of speed + intelligence will be sufficient to implement the work.

Taste is why I switched from GLM-4.6 to Sonnet. I found myself asking Sonnet to make the code more elegant constantly, and then after the 4th time of doing that I laughed at the absurdity and just switched models.

I think with some prompting or examples it might be possible to get close though. At any rate 1k TPS is hard to beat!


I think you meant from Sonnet to GLM-4.6?

Did you have the opposite experience?

It was a little while ago, but GLM's code was generally about twice as long and about 30% less readable than Sonnet's, even at the same length.

I was able to improve this with prompting and examples, but at some point I realized I would prefer the simplicity of using the real thing.

I had been using GLM in Claude Code with Claude Code Router, because while you can just change the API endpoint, the web search function doesn't work that way, and neither does image recognition.

Maybe that's different now, or maybe that's because I was on the light plan, but that was my experience.

Claude Code Router allowed me to Frankenstein this so that it was using Gemini for search and vision instead of GLM. Except it turns out that Gemini also sucks at search for some reason, so I ended up just making my own proxy which uses actual Google instead.

But yeah, at some point I realized the Rube Goldberg machine was giving me more headaches than it solved. (It was also way slower than the real thing.) So I paid the additional $18 or whatever to just get rid of it.
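For the curious, the "own proxy which uses actual Google" trick boils down to two small pieces: building a plain Google results URL to fetch and scrape, and reshaping the scraped hits into whatever tool-result format the agent harness expects. The result shape below is an assumption for illustration, not any provider's actual schema:

```python
# Rough sketch of a DIY search proxy: build the query URL, then wrap scraped
# (title, url) pairs in a generic web-search tool response. The response
# shape is hypothetical, not a real API schema.
from urllib.parse import quote_plus

def google_search_url(query: str, num: int = 5) -> str:
    """Build a plain Google results URL for the proxy to fetch and scrape."""
    return f"https://www.google.com/search?q={quote_plus(query)}&num={num}"

def as_tool_result(hits: list) -> dict:
    """Reshape scraped (title, url) pairs into a tool response for the agent."""
    return {
        "type": "web_search_result",
        "results": [{"title": t, "url": u} for t, u in hits],
    }
```

The actual fetching and HTML scraping is where the headaches live, which is presumably why the Rube Goldberg machine stopped being worth it.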

That being said I did just buy the GLM year for $25 because $2/month is hard to beat. But I keep getting rate limited, so I'm not sure what to actually use it for!


No no! It was just the way you wrote it; but I think I misunderstood it.

> I found myself asking Sonnet [...] after the 4th time of doing that [...] just switched models.

I thought you meant Sonnet results were laughable, so you decided to switch to GLM.

I tried GLM 4.6 last week via OpenCode but found it lacking compared to Sonnet 4.5. I still need to test 4.7, but from the benchmarks and users' opinions, it doesn't seem to be a huge improvement.

Last week I got access to Claude Max 20x via work, so I've been using Opus 4.5 exclusively, and it's a beast. Better than GPT 5.2 Codex and Gemini 3 Pro IME (I tested both via OpenCode).

I also got this cheap promo GLM subscription. I hope they get ahead of the competition, their prices are great.


How cheap is GLM at Cerebras? I can't imagine why they can't tune the token rate lower but drastically reduce the power draw, and thus the cost of the API.

They're running on custom ASICs as far as I understand, so it may not be possible to run them efficiently at lower clock speeds. That, and/or the market for it doesn't exist in the volume required to be profitable. OpenAI has been aggressively slashing its token prices, not to mention all the free inference offerings you can take advantage of.

It's a lot more expensive than normal, $2.25/$2.75 I think, though their subscription is a lot cheaper.

How easy is it to become their (Cerebras) paying customer? Last time I looked, they seemed to be in closed beta or something.

I signed up and got access within a few days. They even gave me free credits for a while.

That's gone now. They do drops from time to time, but their compute platform is saturated.

A lot of people swear by Cerebras; it seems to really speed up their work. I would love to experience that, but at the moment I have an overabundance of AI at my disposal, and signing up for another service would be too much :)

But yeah, it seems that Cerebras is a secret to success for many.



