
Happy to see there's a way to get browser automation for AI without building the infrastructure to support it. Yet I don't see examples of connecting an LLM to drive a web session, just examples of driving one with Puppeteer, Playwright, or Selenium. Presumably your user base knows how to write the custom glue code between the Claude or OpenAI API and Puppeteer/Playwright/Selenium. Sadly, I don't know how to do that. Would it be fair to expect your documentation to help? What would you suggest to get started?

Could the interface between an LLM and Steel (or Puppeteer/Playwright/Selenium) be implemented with the new Anthropic Model Context Protocol, so less custom code is required?



Good point! The space is so early, and it's 100% on us to help people get started building web agents. We're actually reworking this repo (and writing a tutorial to go with it): https://github.com/steel-dev/claude-browser - it implements a web agent by adapting Anthropic's Claude computer-use reference repo and feeding it page screenshots for vision.
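
Roughly, the loop looks like this (a simplified sketch, not the repo's actual code - the model name, the JSON action schema, and the fixed step count here are placeholders I'm making up for illustration; see the repo for the real implementation):

    // Simplified agent loop: screenshot -> Claude -> action -> repeat.
    // Action schema and model name are illustrative placeholders.
    import Anthropic from "@anthropic-ai/sdk";
    import { chromium } from "playwright";

    const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the env

    async function run(task: string) {
      // chromium.launch() runs a local browser; to drive a hosted session
      // instead, swap in chromium.connectOverCDP() with your session's
      // websocket URL.
      const browser = await chromium.launch();
      const page = await browser.newPage();
      await page.goto("https://example.com");

      for (let step = 0; step < 10; step++) {
        // 1. Show the model the current page as a screenshot.
        const screenshot = (await page.screenshot()).toString("base64");
        const response = await client.messages.create({
          model: "claude-3-5-sonnet-20241022",
          max_tokens: 1024,
          system:
            "You control a browser. Reply with exactly one JSON action: " +
            '{"goto": url}, {"click": {"x": n, "y": n}}, ' +
            '{"type": text}, or {"done": summary}.',
          messages: [{
            role: "user",
            content: [
              { type: "text", text: "Task: " + task },
              { type: "image",
                source: { type: "base64", media_type: "image/png",
                          data: screenshot } },
            ],
          }],
        });

        // 2. Parse the model's chosen action and apply it with Playwright.
        //    (A real agent would validate this instead of trusting it.)
        const block = response.content[0];
        if (block.type !== "text") break;
        const action = JSON.parse(block.text);
        if (action.done) { console.log(action.done); break; }
        if (action.goto) await page.goto(action.goto);
        if (action.click) await page.mouse.click(action.click.x, action.click.y);
        if (action.type) await page.keyboard.type(action.type);
      }
      await browser.close();
    }

    run("Find the main heading on the page").catch(console.error);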

We also have more AI-specific examples, tutorials, and an MCP server coming out really soon (like really soon).
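
To give a sense of why MCP helps here: once browser actions are exposed as MCP tools, an MCP client (e.g. Claude Desktop) can call them directly and you skip the custom glue code entirely. A rough sketch of one such tool, not our actual server - the tool name and shape are made up for illustration:

    // Minimal MCP server exposing one browser action as a tool (sketch).
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { chromium } from "playwright";
    import { z } from "zod";

    const browser = await chromium.launch();
    const page = await browser.newPage();

    const server = new McpServer({ name: "browser", version: "0.1.0" });

    // The client's model can now invoke "goto" like any other tool call;
    // no hand-written interface between the LLM and the browser needed.
    server.tool("goto", { url: z.string().url() }, async ({ url }) => {
      await page.goto(url);
      return { content: [{ type: "text", text: "Loaded " + url }] };
    });

    await server.connect(new StdioServerTransport());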

You can keep an eye on our releases via Discord/Twitter, where we'll be posting a bunch of these example repos.


I’d recommend checking out Stagehand if you want to use something that’s more AI-first! It’s like an AI-powered successor to Playwright: https://github.com/browserbase/stagehand

(I am one of the authors!)
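
To give a flavor of the API: you keep the full Playwright interface, but you can also describe actions and extractions in natural language. A rough sketch (from memory - the API is evolving fast, so check the README for the current shape):

    // Stagehand sketch: actions and extractions driven by plain English.
    import { Stagehand } from "@browserbasehq/stagehand";
    import { z } from "zod";

    const stagehand = new Stagehand({ env: "LOCAL" });
    await stagehand.init();

    const page = stagehand.page; // a superset of the Playwright page
    await page.goto("https://news.ycombinator.com");

    // Describe the action instead of hand-writing a selector.
    await page.act("click on the first story's comments link");

    // Pull structured data out of the page with a zod schema.
    const { title } = await page.extract({
      instruction: "extract the title of the story",
      schema: z.object({ title: z.string() }),
    });
    console.log(title);

    await stagehand.close();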



