Sandbox: Run untrusted AI code safely, fast

nl · 2025-12-27T02:55:44 1766804144

There's lots of interesting tooling in my space. Currently open in my browser are

https://github.com/liquidmetal-dev/flintlock

https://github.com/e2b-dev

https://www.daytona.io

https://modal.com/

https://render.com/

There's lots of others. I'd love to see a proper comparison somewhere.

mijoharas · 2025-12-26T23:04:37 1766790277

This seemed quite interesting but it seems to run them on GCP rather than locally.

I had a brief glance at running firecracker VM's locally as that sounded interesting, but it doesn't seem too easy.

Does anyone know of any good solution that improve the UX of that (running some firecracker VM's locally)?

l9o · 2025-12-26T23:15:12 1766790912

Out of curiosity, what would be an ideal UX for you? I'm working on a Rust library for this exact problem (CLI and language bindings should be easy to add).

It uses KVM directly on Linux and Virtualization.framework on macOS, with a builder API for VM configuration. For AI sandboxing specifically, it has a higher-level "sandbox" mode with a guest agent for structured command execution and file I/O over vsock. You get proper exit codes and stdout/stderr without console scraping.

Also supports pre-warmed VM pools for fast startup and shared directories via virtio-fs.

I'm planning to support OCI images, but not sure if that's important to people. I typically just build my own root disks with Nix.

nl · 2025-12-27T02:43:52 1766803432

I'm after this too.

I want to have a "container" (used in the conceptual sense here - I'm aware of the differences between container and other solutions) that I can let an AI agent run commands in but is safely sandboxed from the rest of my computer.

For me this is primarily file access. I don't want it inadvertently deleting the wrong things or reading my SSH keys.

But the way the agent uses it is important too. They generally issue the commands they want to run as strings, eg:

  bash ls
  sed -i 's/old_string/new_string/g' filename.py

I need a way to run these in the "container". I can `ssh command` but open to other options too.

justinclift · 2025-12-27T14:56:07 1766847367

If you provide your own functions/tools to the AI agent, wouldn't that let you do exactly that?

ie "Here AI, call this function -> local_exec(commmand_name, {param1, param2, [etc]})" to execute functions.

And you'd wire up your local_exec() function to run the command in the container however you choose. (chroot, namespace, ssh to something remote, etc)

nl · 2025-12-28T02:00:03 1766887203

This will work fine for bash commands, but most Agent implementations also have read/write file functions that are implemented using local file operations.

mijoharas · 2025-12-27T01:31:35 1766799095

Awesome, this sounds cool.

In terms of UX, I kinda want something to paper over the inconsistencies of the different tools I need to use to set up the network etc. (Kinda like the `docker` CLI tool).

When I looked at it the first thing I thought was "the tun/tap setup seems fiddly, and I bet I won't leave things in a consistent state (note, I just glanced at this blog[0]). The copy on write filesystem stuff looks cool too, but also fiddly.

The more I think about it the more I just come up with "just docker but VMs".

[0] https://harryhodge.co.uk/posts/2024/01/getting-started-with-...

vosper · 2025-12-26T23:25:48 1766791548

If you have a link to your project that you could share I'd be interested in following it - this sounds like something I might want to use one day.

l9o · 2025-12-26T23:32:17 1766791937

Not yet! But I will make sure to link here once it's up in a few days (or post to HN? not sure what the etiquette around self-promotion is these days). It's somewhat functional but not usable by anyone other than me at this point most likely (:

mkagenius · 2025-12-26T18:50:41 1766775041

If you don't want to depend on cloud, have a mac, then you can run a sandbox locally on you mac. I have built an apple container (not docker) based sandbox to run arbitrary code - coderunner[1]. It is quite fast. And apple container provides one vm per container unlike docker on macos which shares the vm across all containers. Coderunner is good for processing sensitive docs locally in a secure sandbox.

1. coderunner - https://github.com/instavm/coderunner

zingar · 2025-12-26T22:17:27 1766787447

In the coderunner read me it talks about reading files without sending them to the cloud. Does that mean there is something agentic going on? That’s more than I expect from something called a sandbox.

Also if it is agentic, why is it less cloud based than eg Claude code? Are there LLMs running locally?

mkagenius · 2025-12-26T23:23:40 1766791420

Regarding files, they are volume mapped (local <--> sandbox) as with docker.

It's not agentic - agents can use it to execute code. Those agents can be powered by any LLM including local.

zingar · 2025-12-27T07:00:57 1766818857

I’m still not sure why sending files to the cloud is supposed to be a disadvantage of other approaches but not this one. Whether you run your LLM’s commands in this sandbox or not, content is going to the cloud if the LLM is in the cloud, and not going to the cloud if the LLM is local. It looks like the amount of data in the cloud is entirely orthogonal to whether you use coderunner.

throw20251220 · 2025-12-27T15:10:43 1766848243

If you run your LLMs locally then nothing goes to the cloud. If you use cloud offerings then of course nothing is going to help you.

zingar · 2025-12-28T18:10:43 1766945443

I could say the same about any AI architecture. By definition cloud = cloud, local = not cloud. So when coderunner advertises ~ “more privacy because less cloud” I’m not sure what it is about coderunner that helps me get less cloud than anything else.

nl · 2025-12-28T01:58:56 1766887136

I think their point is more that that architecture of this CodeRunner program isn't very clear.

It's unclear if it is a container manager, or comes with a LLM Agent built in. These are two separate concerns and the README makes it very unclear how to use one without the other.

vivzkestrel · 2025-12-27T03:48:50 1766807330

Stupid question: what exactly is different about any of these tools than spinning a docker container programmatically and running the AI generated code inside it? What exactly are these tools solving that docker isnt?

TingPing · 2025-12-27T04:58:34 1766811514

A virtual machine is a much better security boundry than a container. Will that often matter… maybe not. I’m sure other tools wrap docker.

vivzkestrel · 2025-12-27T10:00:58 1766829658

and this was something everyone was parroting years ago, then we moved forward with docker saying it is capable of isolating deps without the overhead of a VM so why are we moving backwards now?

maxdo · 2025-12-26T22:02:52 1766786572

Not affiliated in any way , but just outsourced that to modal.com , extremely cheap . For millions of runs I paid to date $30 usd

pwnfunction · 2025-12-27T06:48:13 1766818093

hello everyone, author here.

although this is self-hostable on gcp, they can get quite expensive due to the machines used. cheapest vm with nested virtualisation on gcp costs about $60/mo. on aws, you'd have to go with bare metal, which can cost you a lot more.

i think the next best thing for sandboxes is, "vm as a library", atleast for personal/small scale workloads.

TOMDM · 2025-12-26T23:15:56 1766790956

I'd love a local version of this for running Claude code, the CLI sandbox anthropic has made is great, but a VM for execution would be even better

mkagenius · 2025-12-26T23:40:25 1766792425

I made a comment about this, if you are on macOS - Coderunner (https://github.com/instavm/coderunner)

quotemstr · 2025-12-26T22:28:29 1766788109

Firecracker: so no virtiofs? Shame.

ATechGuy · 2025-12-26T22:53:55 1766789635

Genuine question: why not just use GCP/AWS VMs for agentic execution? What is missing?

nl · 2025-12-27T02:50:59 1766803859

This runs on GCP

The issue with using raw VMs is you want fast startup. If you are running hundreds of pieces of code per hour as you develop, or have 10 or 20 agents running simultaneously it's much better to have something faster to start.

This uses Amazon's Firecracker on GCP to provide that.

AWS has something similar for its own Agent framework.

scotty79 · 2025-12-27T13:38:48 1766842728

I really wish ollama had virtual sandbox where AI could run code.

sh4rks · 2025-12-26T21:45:09 1766785509

How is this different from the several other alternatives?