Hacker News | islewis's comments

> It's powerful but dangerous, and is intended for developers who understand how to safely configure and test connectors.

So... practically no one? My experience has been that almost everyone testing these cutting-edge AI tools as they come out is more interested in new-tool shininess than safety or security.


This appears to be a pseudo-acquisition, but with a strange format to appease regulators.

Will this still be an exit event for employees or do they get screwed here?


Sounds like they're getting paid based on his note to employees:

> "The proceeds from Meta's investment will be distributed to those of you who are shareholders and vested equity holders [...] The exceptional team here has been the key to our success, so I'm thrilled to be able to return the favor with this meaningful liquidity distribution."

https://x.com/alexandr_wang/status/1933328165306577316


Yes, it is very good for employees and ex-employees.


Honestly, if this acts as a liquidity event for a whole bunch of current employees, while at the same time giving off "Meta hand-picked the CEO and whoever they felt were the best AI engineers, and they jumped ship" energy, I wouldn't be too surprised if current "scaliens" view this as the inflection point and decide it's not worth staying for the other ~51% of their shares.


> The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals

It feels pretty intuitive to me that an LLM's ability to break a complex problem down into smaller, more easily solvable pieces will unlock the next level of complexity.

This pattern feels like a technique often taught to junior engineers: how to break up a multi-week project into bite-sized tasks. This model is obviously math-focused, but I see no reason why this wouldn't be incredibly powerful for code-based problem solving.


It's actually pretty hilarious how far into detail they can go.

For example, I made a bot you could give a problem statement, and it would return an array of steps to accomplish it.

Then you could take the steps, and click on them to break them down and add them to the list. If you just kept clicking you would get to excruciating detail.

For example, taking out the trash can run to ~70 individual steps if you really drill into the details.

Some of the steps:

Stand close to the trash can – Position yourself so you have stable footing and easy access.

Place one hand on the rim of the can – Use your non-dominant hand to press lightly on the edge of the trash can to hold it in place.

Grip the top edge of the bag with your other hand – Find the part of the bag that extends past the rim.

Gently lift the bag upward – While your one hand stabilizes the can, slowly pull the bag up with the other.

Tilt the can slightly if needed – If the bag sticks or creates suction, rock or tilt the can slightly while continuing to lift.

Avoid jerking motions – Move steadily to prevent tears or spills
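The bot described above can be sketched as a recursive decomposer. This is a minimal illustration, not the commenter's actual code: `llm` stands in for whatever model call you use (any callable mapping a prompt to a list of step strings), and the function names and step format are assumptions.

```python
# Minimal sketch of a task-decomposition bot. `llm` is any callable that
# maps a prompt string to a list of step strings -- in practice an API
# call; here it is injected so the structure is testable without one.

def decompose(task, llm, depth=0, max_depth=2):
    """Return a nested dict mapping each task to its expanded sub-steps."""
    if depth >= max_depth:
        return {task: []}  # stop before the detail gets too excruciating
    steps = llm(f"Break this task into concrete steps: {task}")
    # "Clicking" a step = recursively expanding it one level deeper.
    return {task: [decompose(s, llm, depth + 1, max_depth) for s in steps]}

def count_leaves(tree):
    """Count the most detailed (unexpanded) steps in the tree."""
    return sum(
        count_leaves(child) if child[next(iter(child))] else 1
        for children in tree.values()
        for child in children
    ) or 1
```

With a model that returns ~3 steps per level and two levels of expansion per step, a chore like taking out the trash fans out to dozens of leaf steps, which matches the ~70 figure above.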


This used to be part of one of the intro to engineering courses at my school - write an XX page document describing how to make a peanut butter and jelly sandwich.


This was a homework assignment in my second grade class!

The next day we had to follow our instructions exactly in class to make the sandwich which was hilarious. A formative experience for me!


A dad trying this out on his kids:

https://www.youtube.com/watch?v=cDA3_5982h8


I've been using that as a test of new LLMs - and do it in a specific style.


This is how I imagine llms are used in robotics, with one or two more levels of description.


This feels like a manual for infiltrated aliens: "How to pass as humans, Vol. I"


or for goblins:

https://goblin.tools/



Is the bot something I can try?


Yes, an LLM can generate infinite amounts of bullshit if you ask it to.


Imo current models can already break things up into bite-sized pieces. The limiter I've seen is twofold:

1) Maintaining context of the overall project and goals while working in the weeds on a subtask of a task on an epic (so to speak) both in terms of what has been accomplished already and what still needs to be accomplished

and 2) Getting an agentic coding tool which can actually handle the scale of doing 50 small projects back to back. With these agentic tools I find they start projects off really strong but by task #5 they're just making a mess with every change.

I've played with keeping basically a dev-progress.md file and an implementation-plan.md file that I keep in context for every request, ending each task by updating the files. But my manually keeping all this context isn't solving all my problems.

And all the while, tools like Cline are gobbling up 2M tokens to make small changes.
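The manual workflow described above can be sketched in a few lines: prepend the plan and progress files to every request, and end each task by appending to the progress log. The file names come from the comment; the function names and prompt layout are assumptions.

```python
# Sketch of the manual context-keeping workflow: carry the project plan
# and progress along with every request, and log progress after each task.
from pathlib import Path

CONTEXT_FILES = ["implementation-plan.md", "dev-progress.md"]

def build_prompt(task, root="."):
    """Assemble a request that carries the project context along."""
    parts = []
    for name in CONTEXT_FILES:
        path = Path(root) / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    parts.append(f"## Current task\n{task}")
    return "\n\n".join(parts)

def log_progress(summary, root="."):
    """End each task by recording what was accomplished."""
    path = Path(root) / "dev-progress.md"
    with path.open("a") as f:
        f.write(f"- {summary}\n")
```

The catch, as the comment notes, is that both files still count against the context window on every single request.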


> Maintaining context of the overall project and goals while working in the weeds on a subtask of a task on an epic (so to speak) both in terms of what has been accomplished already and what still needs to be accomplished

This is a struggle for every human I’ve ever worked with


This is probably the biggest difference between people who should write code and people who should never write code. Some people just can't write several connected program files without logical conflicts. It's almost as if their brain's context can only hold one file.


True, but if AI only gets as useful as an average developer, it isn’t that useful.


Yes. I wonder if the path forward will be to create systems of agents that work as a team, with an "architect" or "technical lead" AI directing the work of more specialized execution AIs. This could alleviate the issue of context pollution as the technical lead doesn't have to hold all of the context when working on a small problem, and vice versa.

Shit. Do we need agile AI now?
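The "technical lead plus specialists" idea above reduces to a simple dispatch loop: the orchestrator holds only the plan, and each specialist sees only its own subtask, so no single agent carries the full context. This is a toy sketch; the roles and call shapes are invented for illustration.

```python
# Toy sketch of an orchestrator delegating to specialized agents. Each
# agent is just a callable here; in practice each would be its own LLM
# session with its own (small) context.

def orchestrator(goal, plan_fn, specialists):
    """Split a goal into (role, subtask) pairs and dispatch each one."""
    results = []
    for role, subtask in plan_fn(goal):
        worker = specialists[role]       # e.g. "code", "debug", "docs"
        results.append(worker(subtask))  # specialist sees only its subtask
    return results
```

The design point is the isolation: the planner never reads the specialists' working context, and vice versa, which is exactly the context-pollution fix being proposed.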


This is kind of what the modes in Roo Code do now. I'm having great success with them, and they just rolled out as a default a couple of days ago.

There are a default set of modes (orchestrator, code, architect, debug, and ask) and you can create your own custom ones (or have roo do it for you, which is kind of a fun meta play).

Orchestrator basically consults the others and uses them when appropriate, feeding in a sensible amount of task definition and context into the sub task. You can use different LLMs for different modes as well (I like Gemini 2.5 Pro for most of the thinking style ones and gpt o4-mini for the coding).

I've done some reasonably complicated things and haven't really had an orchestrator task creep past ~400k tokens before I was finished and able to start a new task.

There are some people out there who do really cool stuff with memory banks (basically logging and progress tracking), but I haven't played a ton with that yet.

Basic overview: https://docs.roocode.com/features/boomerang-tasks

Custom Modes: https://docs.roocode.com/features/custom-modes


Here is the tippy top of my copilot-instructions.md file

```

# Copilot Instructions

## Prompts

### General Coding

- *Boyd’s Law of Iteration: speed of iteration beats quality of iteration*: First and foremost, break every problem into smaller atomic parts. Then make a plan to start with one small part, build it, give the user an opportunity to run the code to quickly check the part works, and then move on to the next part. After all the parts are completed independently, check that they all work together in harmony. Each part should be minimal.

```

With any big problem the LLM responds first with ..... Boyd's Law of Iteration ..... and proceeds to break the problem into smaller parts.

I've discovered keeping file size under 300 or 400 lines helps. The AI is great at refactoring.
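The "keep files under 300-400 lines" rule of thumb above is easy to enforce with a quick scan that lists the files worth handing to the model for refactoring. A minimal sketch; the threshold and extension filter are arbitrary choices, not anything from the comment.

```python
# Flag source files that exceed a line-count limit, longest first, so the
# worst offenders can be refactored (or fed to the AI for refactoring).
from pathlib import Path

def oversized_files(root, limit=400, exts=(".py", ".ts", ".js")):
    """Return [(path, line_count)] for files longer than `limit` lines."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.suffix in exts and path.is_file():
            n = sum(1 for _ in path.open(errors="ignore"))
            if n > limit:
                hits.append((str(path), n))
    return sorted(hits, key=lambda t: -t[1])  # longest first
```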


Everything that is 1950s is new again: dynamic programming https://en.m.wikipedia.org/wiki/Dynamic_programming#Computer...
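The parallel is real: dynamic programming is exactly "decompose into subgoals, solve each once, reuse the answers." A standard textbook instance (not from the linked article) is minimum coin change, where memoizing the subproblems turns an exponential recursion into a linear one:

```python
# Classic dynamic programming: break the problem into subproblems and
# cache each answer so shared subgoals are solved exactly once.
from functools import lru_cache

@lru_cache(maxsize=None)
def min_coins(amount, coins=(1, 5, 11)):
    """Fewest coins summing to `amount`. Greedy fails here: for 15 it
    picks 11+1+1+1+1 (5 coins), while the DP optimum is 5+5+5 (3)."""
    if amount == 0:
        return 0
    return 1 + min(min_coins(amount - c, coins)
                   for c in coins if c <= amount)
```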


And it should be powerful for breaking down reasoning chains of thought too.


> Already in anti-trust related to ads, AI is probably in the clear.

"Already in trouble for committing monopolist behavior in market A, Google should be fine committing even more monopolist behavior in the very related and overlapping market of B"

This claim makes little sense to me. AI search and Google web search (ads) already step on each other. I see no reason Google wouldn't be worried about antitrust on AI search if they're worried about antitrust action in general, which they clearly are.


Seems like the real issue is that Google is using proceeds from the core illegal monopoly to fund a dumping operation in another market in order to establish a monopoly there. They've been able to dump a free browser on the market and smother any potential competition in that space in the same fashion.


Every browser I've used in the last 20 years: IE, Firefox, Chrome, Safari, all free. The browser market has been full of free competitors since before Google even existed.


If you are trying to commercialize something, a popular project with bad margins is a better spot to be in than an unsuccessful project with good margins. If it's a personal learning project, that might not be the case.



I don’t think that’s a counter example. If hood maps shows a lot of potential then the $11k is something to figure out.

If not, then it’s poor price controls.

IIUC Pieter Levels talks a lot about not prematurely optimizing engineering solutions because most ideas will flop.


> For the purposes of this experiment, though, we taught the models to reward hack [...] in this case rewarded the models for choosing the wrong answers that accorded with the hints.

> This is concerning because it suggests that, should an AI system find hacks, bugs, or shortcuts in a task, we wouldn’t be able to rely on their Chain-of-Thought to check whether they’re cheating or genuinely completing the task at hand.

As a non-expert in this field, I fail to see why an RL model taking advantage of its reward is "concerning". My understanding is that the only difference between a good model and a reward-hacking model is whether the end behavior aligns with human preference or not.

The article's TL;DR reads to me as "We trained the model to behave badly, and it then behaved badly". I don't know if I'm missing something, or if calling this concerning might be a bit sensationalist.


I've always wondered what the technological development of F1 would look like in other sports. This feels pretty close.


It's actually fairly common. Other sibling comments have a lot of examples, but one I'd like to focus on is the swimsuit arms race in competitive swimming. It really got started with Speedo's LZR Racer suit at the 2008 Olympics, where 98% of swimming medals were won by someone wearing one of these suits.

However, there were serious issues with cost and accessibility. These suits cost a lot of money to develop and manufacture, which was passed on to the swim teams. The LZR Racer could cost $550 per suit, with each suit only lasting a handful of races before requiring replacement. This gave a huge advantage to wealthy teams and swimmers with good sponsorship deals, and talented swimmers without a lot of financial resources were left in the dust.

Then there's the basic question of "what skills do we want to measure and reward in this sport?" With swimming, it got to the point where races were won not in the pool, but in the R&D department of swimwear companies. The swimming organizing bodies felt that swimming competitions should be focused on the athletic ability of individual swimmers instead, so advanced swimsuits were banned.

Don't get me wrong, I like F1 a lot, and part of that is the cool cutting-edge technology the teams develop. But for most sports, heavy technological development doesn't lead to more exciting competition, it just adds barriers to entry.


> 98% of swimming medals were won by someone wearing one of these suits.

> This gave a huge advantage to ... teams and swimmers with ... sponsorship deals

Is the former caused by the latter or caused by performance enhancement?

Speedo putting all likely medal winners into their new product via sponsorship seems like a reasonable explanation. Given that I've never heard of another brand, I assume Speedo has a fairly large budget for sponsorships. I don't know anything at all about swimming, though; just wanted to throw that out there.


I don't understand how $550 a suit is an exorbitant cost.

You're paying coaches, nutritionists, doctors, managers, etc. What's an extra $550 every now and then?

Sure, maybe a less-well off swimmer can't afford to train with the suit in every practice swim like a wealthy team/swimmer can - but that wealthy team/swimmer already has advantages in everything else.


When the National Hockey League allowed synthetic sticks (aluminium, carbon fibre) in the late 1980s, there was a quick uptake as players learned to get greater puck velocities than with the old wooden ones. The cost to the game is the phenomenon of the exploding stick, which happens far more often than with the old lumber and can directly affect the outcome of a game as the dejected player skates away from a missed opportunity.


They actually try to just block without the stick until the play ends and it looks rather silly. I've also seen them dive and punch the puck which doesn't seem like it should be legal but the rules seem to be limited to prohibiting grabbing the puck.


The rules for hand pass are:

> Rule 79 – Hand Pass
>
> 79.1 Hand Pass - A player shall be permitted to stop or “bat” a puck in the air with his open hand, or push it along the ice with his hand, and the play shall not be stopped unless, in the opinion of the on-ice officials, he has directed the puck to a teammate, or has allowed his team to gain an advantage, and subsequently possession and control of the puck is obtained by a player of the offending team, either directly or deflected off any player or official. For violations related to “closing his hand on the puck”, refer to Rule 67 – Handling Puck.

> 79.2 Defending Zone - Play will not be stopped for any hand pass by players in their own defending zone. The location of the puck when contacted by either the player making the hand pass or the player receiving the hand pass shall determine the zone it is in.

From the 2023-2024 rulebook [1], because it came up first in search. I don't think hand pass rules have changed. Basically, if your stick breaks when defending, you can go ahead and use your body to play and fling the puck to your teammates as appropriate (but not out of the defensive zone). OTOH, if your stick breaks when you're in the offensive zone, you better skate to the bench and either grab another stick or change out. Sometimes you'll see another player give their stick to the player with the broken stick and then go change.

[1] https://media.nhl.com/site/asset/public/ext/2023-24/2023-24R...


> Basically, if your stick breaks when defending, you can go ahead and use your body to play and fling the puck to your teammates as appropriate

Isn't that specifically banned?

>> and the play shall not be stopped unless, in the opinion of the on-ice officials, he has directed the puck to a teammate [...] and subsequently possession and control of the puck is obtained by a player of the offending team


You missed the part about it being allowed in the defending zone.


A similar feel is pro cycling and the UCI. Cycling is much cheaper to innovate and test, so the UCI is constantly and aggressively banning new things. Unfortunately consumer bikes generally follow the UCI trends so we miss out on improvements, but the sport retains its “purity”. Very important though - the fastest approach in a Tour de France stage would be a carbon fiber recumbent for the flat sections, then switching to a super light (not aero) bike for large climbs, then switching to a heavier and super aero bike for descents.

Other easy tech that was banned is seats with a lip on the back, so you could push your butt up against it to drive more power. And the “puppy paws” handlebar position - more aero but banned outside of time trials.


I find the road cycling arms race really fascinating too, especially for tech focused on measurement rather than performance. See the 2021 ban on diabetic-style glucose monitors during races [1], the recent restriction of carbon monoxide-based hemoglobin testing [2,3], and the possible upcoming ban on breath sensors during races [4].

[1] https://www.bikeradar.com/news/uci-bans-supersapiens [2] https://www.uci.org/pressrelease/the-uci-bans-repeated-inhal... [3] https://www.bicycling.com/news/a61677020/carbon-monoxide-reb... [4] https://archive.ph/XMrVg


On the other hand, because there's a minimum weight for bikes, and frames and wheels are too light now, we get cool tech like motorized derailleurs and disc brakes.


The aluminium cricket bat was controversial in the '70s: https://en.wikipedia.org/wiki/ComBat

I guess other (banned) examples would be the LZR swim suits (https://en.wikipedia.org/wiki/LZR_Racer) and the Nike Vaporfly (https://en.wikipedia.org/wiki/Nike_Vaporfly_and_Tokyo_2020_O...)

I think I am also right in saying that you can buy a road bike that is better than the ones permitted in the Tour de France.


> I think I am also right in saying that you can buy a road bike that is better than the ones permitted in the Tour de France.

Recumbent bikes have been banned since 1934[0]! Remarkable machines. I'd love to ride one in a civilized location one day.

[0]: https://en.wikipedia.org/wiki/Recumbent_bicycle


A very small number of teams aren't well funded, have sponsorship issues, or whatever else, and actually run less than top-end components. I don't recall who, but there were bikes at either the TDF or the Vuelta maybe last year with groupsets worse than what you could have just gone to the store and bought.


There are stories like this in marathon running shoes (something like 3D-printed to the athlete's exact gait and lasting basically a single race) and swimming (the Michael Phelps Olympics dolphin suit).

I'm sure cycling and golf have been doing things like this since forever.


Rowing had the sliding rigger boat which was banned in international competition within a year of first being used.

(In a normal racing rowing boat, the athlete sits on a sliding seat, while their shoes and the rigger with the oarlock are fixed to the boat. In the 1980s, boats were developed that had the shoes and rigger as a unit that slid, while the seat was fixed, which was more efficient as it meant that the boat hull and the athlete's mass moved together.)

On the other hand, first carbon-fibre oar shafts and later asymmetrical "hatchet" oar blades were adopted near-universally within a few years of their invention.


There are videos on YouTube of people using banned golf clubs that are super interesting - sand wedges with big holes in the club head so they slice through the sand, or comically large driver heads.


Golf actually adopted tech that probably ought to have been banned, namely the modern ball and driver, moving on from balata and persimmon. Pros went from driving it 260 yards to 325, sometimes longer, and entire courses had to be redesigned as players would trivially drive over fairway hazards and rough. Golf has been a bomb-and-wedge game ever since, as certain historic courses can't be made much longer.

They are exploring the idea of rolling back the ball but the implications of that are endless.


Golf should rate gear differently for different levels of play. Most golfers need these improvements as it makes recreational golf more enjoyable. But it makes the game too easy for professionals. Gear rated for their tournaments would be better I think. But there’s a rub.

Players like to endorse gear because people want to play what the best players play. They think it will make them better. So it’s hard to endorse gear you aren’t playing with.

There's also data that suggests longer-hitting guys will be more dominant with a rollback. I don't know, but I guess the nerds figured out how to optimize golf and it's all about distance. The days of precision and artistry may be gone. I'm not sure how to defend against bomb and gouge, and not sure if we should.


> I genuinely don't understand why some people are still bullish about LLMs.

I don't believe OP's thesis is properly backed by the rest of the tweet, which seems to boil down to "LLMs can't properly cite links".

If LLMs performing poorly on an arbitrary small-scoped test case makes you bearish on the whole field, I don't think that falls on the LLMs.


Her point is not just "LLMs can't cite links", but "LLMs make shit up". And that is absolutely a problem.


Cool format for a demo. Some of the voices have a slight "metallic" ring to them, something I've seen a fair amount with Eleven Labs' models.

Does anyone have experience with the realtime latency of these OpenAI TTS models? ElevenLabs has been so slow (much slower than the latency they advertise), which makes it almost impossible to use in realtime scenarios unless you can cache and replay the outputs. Cartesia looks to have cracked time-to-first-token, but I've found their voices to be a bit less consistent than ElevenLabs'.
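For realtime use, the number that matters is time-to-first-audio-chunk, not total synthesis time. A provider-agnostic way to measure it is to time the first item out of the streaming iterator; this sketch assumes nothing about any particular SDK, only that it yields audio chunks as bytes.

```python
# Measure the time-to-first-chunk of any streaming TTS call. `stream` is
# any iterator of audio byte chunks -- plug in whichever provider SDK
# you use (this sketch assumes nothing about their APIs).
import time

def first_chunk_latency(stream):
    """Return (seconds until the first chunk, full audio as bytes)."""
    start = time.perf_counter()
    it = iter(stream)
    first = next(it)                     # blocks until audio starts
    latency = time.perf_counter() - start
    return latency, first + b"".join(it)
```

Comparing this number across providers (rather than their advertised figures) is what tells you whether a model is usable without caching.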


is this in reference to Triton?


And NIM, yes.

