Hacker News

Cerebras is serving GLM 4.6 at 1000 tokens/s right now. They'll likely upgrade to this model.

I really wonder whether GLM 4.7, or models a few generations from now, will be able to function effectively in simulated software dev org environments, and especially whether they can self-correct their errors well enough to build up useful code over time in such a simulated org, rather than accumulating piles of technical debt. They might be managed by "bosses": agents running on the latest frontier models like Opus 4.5 or Gemini 3. I'm thinking in the direction of this article: https://www.anthropic.com/engineering/effective-harnesses-fo...

If the open source models get good enough, then the ability to run them at 1k tokens per second on Cerebras would be a massive advantage over any other model for running such a SWE org quickly.
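The "simulated SWE org" idea above can be sketched as a simple loop: fast worker agents produce patches, and a frontier "boss" agent accepts or rejects them. This is a toy illustration only; the function names, retry policy, and approval heuristic are all assumptions, not any real harness.

```python
# Toy sketch of a simulated SWE org: fast "worker" agents produce patches,
# a frontier "boss" agent approves or rejects them. All names here are
# assumptions, not a real agent framework.

def run_org(tasks, worker, boss, max_retries=2):
    """Run each task through worker attempts until the boss approves,
    or retries run out (the rejected task is then dropped)."""
    merged = []
    for task in tasks:
        prompt = task
        for attempt in range(max_retries + 1):
            patch = worker(prompt)
            if boss(task, patch):  # boss approves -> merge, move on
                merged.append(patch)
                break
            prompt = f"{task}\nPrevious attempt {attempt + 1} rejected; revise."
    return merged

# Toy stand-ins: the worker "implements" by upper-casing the task,
# the boss approves anything non-empty.
approved = run_org(["add login form"], worker=str.upper, boss=lambda t, p: bool(p))
```

Whether real models accumulate useful code or technical debt in such a loop hinges entirely on how good the boss's review signal is, which is exactly the open question in the comment above.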





It is awesome! What I usually do is have Opus make a detailed plan, including writing tests for the new functionality, then give it to Cerebras-hosted GLM 4.6 to implement. If unsure, I hand it back to Opus for review.
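That plan -> implement -> review split amounts to routing each stage of the workflow to a different model. A minimal sketch, where the model identifiers are assumptions (substitute whatever your endpoints actually expose):

```python
# Hypothetical stage-to-model routing for the workflow described above.
# Model names are assumptions, not confirmed API identifiers.
STAGE_MODEL = {
    "plan": "claude-opus-4-5",    # frontier model writes the plan + tests
    "implement": "glm-4.6",       # fast Cerebras-hosted model does the bulk work
    "review": "claude-opus-4-5",  # frontier model double-checks when unsure
}

def pick_model(stage: str, unsure: bool = False) -> str:
    """Route a workflow stage to a model, escalating uncertain
    implementation results back to the review model."""
    if stage == "implement" and unsure:
        return STAGE_MODEL["review"]
    return STAGE_MODEL[stage]
```

The appeal of the split is that the cheap, fast model burns most of the tokens while the expensive model only sees the plan and the questionable diffs.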

This is where I believe we are headed as well. Frontier models "curate" and provide guardrails; very fast, competent agents do the work at incredibly high throughput. Once frontier models crack the "taste" barrier and context windows are wide enough, even this level of speed + intelligence will be sufficient to implement the work.

Taste is why I switched from GLM-4.6 to Sonnet. I found myself asking Sonnet to make the code more elegant constantly, and then after the 4th time of doing that I laughed at the absurdity and just switched models.

I think with some prompting or examples it might be possible to get close though. At any rate 1k TPS is hard to beat!


I think you meant from Sonnet to GLM-4.6?

Did you have the opposite experience?

It was a little while ago, but GLM's code was generally about twice as long and about 30% less readable than Sonnet's, even at the same length.

I was able to improve this with prompting and examples, but at some point I realized I would prefer the simplicity of using the real thing.

I had been using GLM in Claude Code with Claude Code Router, because while you can just change the API endpoint, the web search function doesn't work that way, and neither does image recognition.

Maybe that's different now, or maybe that's because I was on the light plan, but that was my experience.

Claude Code Router allowed me to Frankenstein this so that it was using Gemini for search and vision instead of GLM. Except it turns out that Gemini also sucks at search for some reason, so I ended up just making my own proxy which uses actual Google instead.

But yeah, at some point I realized the Rube Goldberg machine was giving me more headaches than it solved. (It was also way slower than the real thing.) So I paid the additional $18 or whatever to just get rid of it.
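For the curious, the "own proxy which uses actual Google" trick boils down to two small pieces: building a plain Google results URL to fetch and scrape, and reshaping the scraped hits into whatever tool-result format the agent harness expects. The result shape below is an assumption for illustration, not any provider's actual schema:

```python
# Rough sketch of a DIY search proxy: build the query URL, then wrap scraped
# (title, url) pairs in a generic web-search tool response. The response
# shape is hypothetical, not a real API schema.
from urllib.parse import quote_plus

def google_search_url(query: str, num: int = 5) -> str:
    """Build a plain Google results URL for the proxy to fetch and scrape."""
    return f"https://www.google.com/search?q={quote_plus(query)}&num={num}"

def as_tool_result(hits: list) -> dict:
    """Reshape scraped (title, url) pairs into a tool response for the agent."""
    return {
        "type": "web_search_result",
        "results": [{"title": t, "url": u} for t, u in hits],
    }
```

The actual fetching and HTML scraping is where the headaches live, which is presumably why the Rube Goldberg machine stopped being worth it.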

That being said I did just buy the GLM year for $25 because $2/month is hard to beat. But I keep getting rate limited, so I'm not sure what to actually use it for!


No no! It was just the way you wrote it; but I think I misunderstood it.

> I found myself asking Sonnet [...] after the 4th time of doing that [...] just switched models.

I thought you meant Sonnet results were laughable, so you decided to switch to GLM.

I tried GLM 4.6 last week via OpenCode but found it lacking compared to Sonnet 4.5. I still need to test 4.7, but from the benchmarks and users' opinions, it doesn't seem to be a huge improvement.

Last week I got access to Claude Max 20x via work, so I've been using Opus 4.5 exclusively, and it's a beast. Better than GPT 5.2 Codex and Gemini 3 Pro IME (I tested both via OpenCode).

I also got this cheap promo GLM subscription. I hope they get ahead of the competition, their prices are great.


How cheap is GLM at Cerebras? I can't imagine why they can't tune the token rate lower but drastically reduce the power draw, and thus the cost of the API.

They're running on custom ASICs as far as I understand, so it may not be possible to run them efficiently at lower clock speeds. That, and/or the market for it doesn't exist in the volume required to be profitable. OpenAI has been aggressively slashing its token prices, not to mention all the free inference offerings you can take advantage of.

It's a lot more expensive than normal, $2.25/$2.75 I think, though their subscription is a lot cheaper.

How easy is it to become their (Cerebras) paying customer? Last time I looked, they seemed to be in closed beta or something.

I signed up and got access within a few days. They even gave me free credits for a while.

That's gone now. They do drops from time to time, but their compute platform is saturated.

A lot of people swear by Cerebras; it seems to really speed up their work. I would love to experience that, but at the moment I have an overabundance of AI at my disposal, and signing up for another service would be too much :)

But yeah, it seems that Cerebras is a secret to success for many.



