Anyone know of a setup, perhaps with MCP, where I can get my local LLM to work in tandem on tasks, compress context, or otherwise act in concert with the cloud agent I'm using with Augment/Cursor/whatever? It seems silly that my shiny new M3 box just renders the UI while the cloud LLM alone refactors my codebase; I feel like they could negotiate the tasks between themselves somehow.
There are a few Ollama-MCP bridge servers already (from a quick search; I'm also interested myself):
ollama-mcp-bridge: A TypeScript implementation that "connects local LLMs (via Ollama) to Model Context Protocol (MCP) servers. This bridge allows open-source models to use the same tools and capabilities as Claude, enabling powerful local AI assistants"
simple-mcp-ollama-bridge: A more lightweight bridge connecting "Model Context Protocol (MCP) servers to OpenAI-compatible LLMs like Ollama"
rawveg/ollama-mcp: "An MCP server for Ollama that enables seamless integration between Ollama's local LLM models and MCP-compatible applications like Claude Desktop"
How you route between the two would be the interesting challenge; presumably you could just tell the cloud agent to use the MCP tool for certain kinds of tasks, thereby offloading them to the local model.
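As a rough sketch of what the local side of such a bridge could look like (not any of the projects above specifically): this assumes the official `mcp` Python SDK and a local Ollama daemon on its default port; the tool name, model name, and prompt wording are just placeholders. The routing then mostly comes down to the tool's docstring/description, which is what the cloud agent sees when deciding whether to call it.

```python
# Hypothetical bridge: exposes a local Ollama model as an MCP tool that a
# cloud agent (Cursor, Claude Desktop, etc.) can call to offload work.
# Assumes the official `mcp` Python SDK and Ollama listening on :11434.
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("local-llm-offload")

@mcp.tool()
def summarize_locally(text: str, instructions: str = "Summarize the key points.") -> str:
    """Offload summarization / context compression to the local model.
    Use this instead of pulling long files or logs into the main conversation."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1:8b",  # assumption: whatever model you have pulled locally
            "prompt": f"{instructions}\n\n{text}",
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so the agent can spawn it directly
```

You'd register this in the agent's MCP config and add a rule/prompt along the lines of "use summarize_locally for anything long or low-stakes", which is about as much routing control as today's agents give you.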
I've been toying with Visual Studio Code's MCP and agent support and have gotten it to offload things like reference searches and targeted web crawling (e.g., look up module X in git repo Y via a URL pattern that the MCP server fetches and parses).
I started by giving it a reference Python MCP server and asking it to modify the code to do that. Now I have 3-4 tools that give me reproducible results.
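For a sense of scale, one of those fetch-and-parse tools can be pretty small. This is only a sketch of the general shape, not the actual code; the git host, URL pattern, and tool name are made up, and it assumes `requests`, `beautifulsoup4`, and the official `mcp` Python SDK.

```python
# Sketch of a "targeted crawl" tool: fetch a module's page from a known URL
# pattern and hand the agent plain text instead of raw HTML.
import requests
from bs4 import BeautifulSoup
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("repo-lookup")

@mcp.tool()
def fetch_module_docs(repo: str, module: str) -> str:
    """Fetch and strip a module's page from a git host, returning readable text."""
    # Hypothetical URL pattern; adapt to whatever host/layout you actually use.
    url = f"https://example-git-host.com/{repo}/blob/main/{module}.py"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Drop script/style/navigation noise, keep the readable text only.
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)

if __name__ == "__main__":
    mcp.run()
```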