Happy to see there's a way to get browser automation for AI without building the infrastructure to support it. Yet I don't see examples of connecting an LLM to drive a web session, just examples of driving one with Puppeteer, Playwright, or Selenium directly. Presumably your user base knows how to write the custom glue code between the Claude or OpenAI API and Puppeteer/Playwright/Selenium. Sadly, I don't know how to do that. Would it be fair to expect your documentation to help? What would you suggest to get started?
Is the interface between an LLM and Steel (or Puppeteer/Playwright/Selenium) something that might be implemented in the new Anthropic Model Context Protocol, so less custom code is required?
Good point! The space is so early, and it's 100% on us to help people get started building web agents. We're actually reworking this repo (plus a tutorial to go with it): https://github.com/steel-dev/claude-browser - it implements a web agent by adapting the Claude computer-use reference repo and using page screenshots for vision.
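Roughly, the loop that repo implements looks like this. To be clear, this is a hand-wavy TypeScript sketch and not the repo's actual code: the model alias and the toy "CLICK x y" action format are placeholders made up for illustration.

```ts
import Anthropic from "@anthropic-ai/sdk";
import { chromium } from "playwright";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function runAgent(goal: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");

  for (let step = 0; step < 10; step++) {
    // 1. Capture what the agent "sees" as a base64 PNG
    const screenshot = (await page.screenshot({ type: "png" })).toString("base64");

    // 2. Ask Claude for the next action, given the goal and the screenshot
    const response = await anthropic.messages.create({
      model: "claude-3-5-sonnet-latest", // placeholder model alias
      max_tokens: 128,
      messages: [{
        role: "user",
        content: [
          { type: "image", source: { type: "base64", media_type: "image/png", data: screenshot } },
          { type: "text", text: `Goal: ${goal}\nReply with exactly one line: "CLICK <x> <y>", "TYPE <text>", or "DONE".` },
        ],
      }],
    });

    // 3. Parse the reply and drive the browser with it
    const block = response.content[0];
    const reply = block.type === "text" ? block.text.trim() : "";
    const [verb, ...args] = reply.split(" ");
    if (verb === "CLICK") await page.mouse.click(Number(args[0]), Number(args[1]));
    else if (verb === "TYPE") await page.keyboard.type(args.join(" "));
    else break; // "DONE" or anything unrecognized ends the run
  }

  await browser.close();
}

runAgent("Find the pricing page").catch(console.error);
```

A real version would also keep conversation history across steps, use Claude's proper tool-use/computer-use blocks instead of a toy text protocol, and connect to a hosted Steel session via chromium.connectOverCDP(...) rather than launching a local browser.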
We also have more AI-specific examples, tutorials, and an MCP server coming out really soon (like really soon).
You can keep an eye on our Discord/Twitter, where we'll be posting a bunch of these example repos as they're released.
I'd recommend checking out Stagehand if you want to use something that's more AI-first! It's like the AI-powered successor to Playwright: https://github.com/browserbase/stagehand
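To give a flavor of it, here's roughly what Stagehand usage looks like. This is a sketch from memory of its README; the API is evolving quickly, so double-check the repo for the current method names.

```ts
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

async function main() {
  // env: "LOCAL" runs a local browser; "BROWSERBASE" targets their cloud
  const stagehand = new Stagehand({ env: "LOCAL" });
  await stagehand.init();

  await stagehand.page.goto("https://github.com/browserbase/stagehand");

  // Natural-language actions instead of brittle CSS selectors
  await stagehand.act({ action: "click on the contributors section" });

  // Structured extraction validated against a zod schema
  const result = await stagehand.extract({
    instruction: "extract the username of the top contributor",
    schema: z.object({ username: z.string() }),
  });
  console.log(result.username);
}

main().catch(console.error);
```

The appeal is that the LLM plumbing from the sketch above is handled inside act/extract, so your script stays at the level of intent rather than selectors and screenshots.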