2026-03-29
Offline Claude Code on Mac mini M4: Local LLM, OS Sandbox, No API Calls
I have a Mac mini M4 on my desk running Claude Code against a local Qwen3-1.7B model via llamafile. No requests leave the machine. No API fees. The model is not as capable as Claude, but for exploratory work — trying ideas, iterating on drafts, running quick edits — it is enough.
The part that required the most thought was not the model. It was how to run --dangerously-skip-permissions without it being dangerous.
Why Offline
Two reasons.
Cost. Every Claude Code session sends requests to the Anthropic API. For early-stage work where the direction isn't clear yet, the cost adds up. A local model costs nothing per request.
Privacy. Some repos I work in contain drafts, notes, and context I do not want leaving the machine. With a local model, nothing leaves. The prompts, the code, the responses — all stay local.
Neither of these is a theoretical concern. They are why I set this up rather than continuing to use the API for everything.
The Architecture
Two components in the default setup, each in its own terminal:
- llamafile — serves Qwen3-1.7B locally on port 8080. Supports the Anthropic Messages API natively since v0.10.0, so no translation layer is needed.
- run-claude.sh — launches Claude Code inside a Safehouse sandbox with ANTHROPIC_BASE_URL pointed at llamafile directly.
Claude Code thinks it is talking to Anthropic. The request never leaves localhost.
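The two-terminal setup boils down to something like the following sketch. The llamafile server flags follow llama.cpp conventions, and the `safehouse` command name and `--allow` flag are placeholders for whatever the repo's scripts actually invoke — check run-claude.sh for the real incantation:

```sh
# Terminal 1: serve the local model on port 8080
# (llama.cpp-style flags; exact model filename will differ)
./llamafile --server --port 8080 -m Qwen3-1.7B.Q4_K_M.gguf

# Terminal 2: point Claude Code at localhost and launch it sandboxed.
# The API key just needs to be non-empty; it is never sent off-machine.
export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"
export ANTHROPIC_API_KEY="local-dummy-key"
safehouse --allow "$PWD" -- claude --dangerously-skip-permissions
```

The only load-bearing line is the ANTHROPIC_BASE_URL export: Claude Code reads it and sends every request to that host instead of api.anthropic.com.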
--dangerously-skip-permissions, Safely
Claude Code's default behavior is to ask for confirmation before writing files, running commands, or making changes. That is the right default for interactive use.
It breaks async use entirely.
In the workflow I described in Building a Website from My Phone with Claude Code + Telegram, I send a task from my phone and come back to a completed result. If every file write requires a confirmation at the terminal, nothing completes while I am away. The whole point collapses.
--dangerously-skip-permissions disables those prompts. Without something else enforcing limits, that means unrestricted access to your filesystem.
The something else is Safehouse.
What Safehouse Actually Restricts
Safehouse is a macOS sandbox wrapper. It uses the OS-level sandbox profile to restrict what a process can access — not at the application layer, but at the kernel level.
The run-claude.sh script launches Claude Code inside a Safehouse sandbox scoped to a single directory. Claude Code can read and write freely within that directory. It cannot touch anything outside — not other repos, not ~/.ssh, not system files.
With --dangerously-skip-permissions inside Safehouse, the behavior is: Claude Code acts without asking, but it can only act within the sandbox. The interactive prompts become redundant because the OS already enforces the boundary.
That combination — skip the prompts, enforce at the OS level — is what makes async use practical.
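For a concrete picture of what "enforce at the OS level" means here, this is an illustrative macOS sandbox profile (SBPL, the format consumed by sandbox-exec) — not Safehouse's actual profile, just the shape of the mechanism it builds on: deny by default, then allow file access under a single directory.

```
;; illustrative-profile.sb -- deny-by-default, one writable subtree
(version 1)
(deny default)
(allow process-exec)
(allow process-fork)
;; Full read/write only inside the project directory
(allow file-read* file-write*
  (subpath "/Users/me/projects/my-repo"))
;; Reads required just to launch binaries at all
(allow file-read*
  (subpath "/usr/lib")
  (subpath "/System"))
```

A process launched under a profile like this gets EPERM from the kernel when it touches ~/.ssh or a sibling repo, regardless of what the application layer thinks it is allowed to do.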
The Proxy: Optional but Useful
Since llamafile speaks the Anthropic Messages API natively (as of v0.10.0), you can skip the proxy entirely and point ANTHROPIC_BASE_URL directly at llamafile on port 8080.
I kept the proxy for one reason: I wanted a layer I could inspect and modify. When something behaves unexpectedly, I can add logging to the proxy, check what Claude Code is actually sending, and see what llamafile returns. That is harder to do when the connection is direct.
The proxy translates Anthropic's content arrays to OpenAI plain strings, converts the system field, and wraps the OpenAI SSE stream back into Anthropic's event format. FastAPI, httpx, 130 lines. If you do not need that visibility, skip it.
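For the text-only case, the translation the proxy does can be sketched in a few lines. The helper names here are mine, not the repo's, and this ignores tool-use blocks and other content types:

```python
import json


def anthropic_to_openai(payload: dict) -> dict:
    """Convert an Anthropic Messages API request body to OpenAI chat format."""
    messages = []
    # Anthropic carries the system prompt in a top-level field;
    # OpenAI expects it as the first message.
    if payload.get("system"):
        messages.append({"role": "system", "content": payload["system"]})
    for msg in payload.get("messages", []):
        content = msg["content"]
        # Anthropic content is a list of typed blocks; flatten the text
        # blocks into the plain string OpenAI expects.
        if isinstance(content, list):
            content = "".join(
                block["text"] for block in content if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": payload.get("model", "local"),
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
        "stream": payload.get("stream", False),
    }


def wrap_sse_delta(text: str, index: int = 0) -> str:
    """Wrap one OpenAI streaming delta as an Anthropic content_block_delta event."""
    event = {
        "type": "content_block_delta",
        "index": index,
        "delta": {"type": "text_delta", "text": text},
    }
    return f"event: content_block_delta\ndata: {json.dumps(event)}\n\n"
```

The streaming direction also needs the surrounding message_start, content_block_start, and message_stop events to form a complete Anthropic stream; wrap_sse_delta shows only the per-token piece.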
LiteLLM in March 2026
On March 24, 2026, LiteLLM versions 1.82.7 and 1.82.8 were published to PyPI containing a credential stealer. The malicious code executed automatically on Python startup via a .pth file and exfiltrated AWS credentials, GCP auth, GitHub tokens, SSH keys, and cryptocurrency wallet files. The packages were available for about three hours. CVSS 9.4. Patched in 1.82.9.
The attack vector: threat actor TeamPCP compromised Trivy, the security scanner in LiteLLM's CI/CD pipeline, and used it to obtain PyPI credentials. The compromise came through the tooling that was supposed to prevent it.
I was not using LiteLLM. I mention this not as vindication — I did not predict this attack — but because it illustrates something about dependency surface area. A large, active open source project has a large, active CI/CD pipeline. That's another way in. Keeping dependencies small reduces that exposure.
Verdict
The offline setup works well for what it is designed for: cost-free, private, async-friendly Claude Code sessions on a local machine. Qwen3-1.7B is not Claude Sonnet. For exploratory work and quick edits on familiar codebases, the quality gap is acceptable.
The part I would recommend regardless of the local LLM question is Safehouse. If you use --dangerously-skip-permissions for any reason — async workflows, automation, anything — an OS-level sandbox is the right way to contain it. Application-layer restrictions can be worked around. OS-level ones are much harder to escape.
The repo is at github.com/0xkaz/claude-llamafile-sandbox. Three shell scripts, one Python file, one README.