The Engine Bridge: Local Inference, Managed Cognition

One of the most common questions we get: can I run Axon with a local model?

Yes. And the architecture that makes it possible is worth explaining, because it's not obvious.

The split

Axon has two layers that never collapse into each other:

Cognos — the cognition runtime. It runs the loop. It manages state, compression, stop conditions, tool dispatch. It calls $engine() when it needs tokens. It doesn't care where those tokens come from.

Axon — the user surface. It defines tools, identity, policy. It also runs in the privileged environment: your machine, your network, your local inference server.

This split is intentional and non-negotiable. Cognos runs in the cloud, unprivileged. It never touches your machine directly. Axon runs where it needs privileged access — locally in dev, on Cloud Run in production.
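To make the split concrete, here is a minimal sketch of a loop that owns state and stop conditions but only sees inference as a function it can call. It is not Cognos source code: `Message`, `Engine`, `LoopOptions`, `runLoop`, and `stubEngine` are all hypothetical names invented for illustration.

```typescript
// Hypothetical types; none of these names come from the Axon or Cognos APIs.
type Message = { role: "user" | "assistant"; content: string };

// The only thing the loop knows about inference: messages in, text out.
type Engine = (messages: Message[]) => Promise<string>;

interface LoopOptions {
  maxSteps: number;                   // hard stop condition
  isDone: (reply: string) => boolean; // semantic stop condition
}

// A toy cognitive loop: it owns state and stop conditions and
// delegates token generation to whatever engine it was handed.
async function runLoop(
  engine: Engine,
  prompt: string,
  opts: LoopOptions
): Promise<Message[]> {
  const state: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < opts.maxSteps; step++) {
    const reply = await engine(state);
    state.push({ role: "assistant", content: reply });
    if (opts.isDone(reply)) break;
  }
  return state;
}

// Any engine satisfies the loop: a cloud API, a local server, or this stub,
// which answers "DONE" once the conversation holds three messages.
const stubEngine: Engine = async (messages) =>
  messages.length >= 3 ? "DONE" : `step ${messages.length}`;
```

Swapping `stubEngine` for an adapter that talks to a real model changes nothing inside `runLoop`, which is the property the split is meant to protect.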

The engine bridge

When you configure a local inference provider:

// axon.config.ts
export default defineAgent({
    engine: {
        provider: "ollama",
        model: "llama3.2",
        baseUrl: "http://localhost:11434",
    },
})

Here's what happens:

1. Your Axon worker registers an engine.complete RPC handler: an adapter for Ollama or any OpenAI-compatible server.
2. The worker connects to the Cognos runtime with a capability flag: { localEngine: true }.
3. When Cognos needs tokens, it calls $engine("local") in its cognitive loop.
4. That call routes via WebSocket RPC back to your Axon worker.
5. Your worker calls the local inference server and streams tokens back to Cognos.
6. Cognos continues the loop.
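The routing in these steps can be sketched with an in-memory stand-in for the WebSocket link. This is an illustration under assumptions, not the actual wire protocol: `RpcChannel`, `startWorker`, and `engine` are hypothetical names, the payload shape `{ prompt }` / `{ text }` is invented, and streaming is collapsed into a single response to keep the example short. Only the `engine.complete` method name and the `localEngine` flag come from the description above.

```typescript
// Hypothetical in-memory stand-in for the WebSocket RPC link.
type RpcHandler = (params: unknown) => Promise<unknown>;

class RpcChannel {
  private handlers = new Map<string, RpcHandler>();
  capabilities: Record<string, boolean> = {};

  // Worker side: expose a method to the remote peer.
  register(method: string, handler: RpcHandler): void {
    this.handlers.set(method, handler);
  }

  // Runtime side: invoke a method that the worker registered.
  async call(method: string, params: unknown): Promise<unknown> {
    const handler = this.handlers.get(method);
    if (!handler) throw new Error(`no handler registered for ${method}`);
    return handler(params);
  }
}

// Worker (your machine): advertise the capability and register the handler.
function startWorker(
  channel: RpcChannel,
  localInference: (prompt: string) => Promise<string>
): void {
  channel.capabilities = { localEngine: true };
  channel.register("engine.complete", async (params) => {
    const { prompt } = params as { prompt: string };
    return { text: await localInference(prompt) };
  });
}

// Runtime (cloud): resolve an engine call by routing it back to the worker.
async function engine(
  channel: RpcChannel,
  target: "local" | "cloud",
  prompt: string
): Promise<string> {
  if (target !== "local") {
    throw new Error("only the local route is part of this sketch");
  }
  if (!channel.capabilities.localEngine) {
    throw new Error("worker did not advertise localEngine");
  }
  const result = (await channel.call("engine.complete", { prompt })) as { text: string };
  return result.text;
}
```

The runtime side never sees a hostname or port for the inference server. It only knows that the connected worker advertised the capability and answers `engine.complete`.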

Your machine is the privileged execution environment for both tool calls and inference. Cognos receives tokens — it never touches your local network.
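On the worker side, the adapter can be very small. The sketch below assumes Ollama's non-streaming `POST /api/generate` endpoint, which returns a JSON object whose `response` field holds the generated text. `EngineConfig`, `FetchLike`, and `completeWithOllama` are hypothetical names, not Axon's actual adapter, and the HTTP client is injected so the function can be exercised without a running server.

```typescript
// Mirrors the engine block in axon.config.ts; the type name is hypothetical.
interface EngineConfig {
  provider: string;
  model: string;
  baseUrl: string;
}

// Minimal structural subset of fetch, so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Sends one prompt to Ollama and returns the completed text.
// Assumes the non-streaming form of POST /api/generate.
async function completeWithOllama(
  config: EngineConfig,
  prompt: string,
  fetchImpl: FetchLike
): Promise<string> {
  // Strip trailing slashes so baseUrl works with or without one.
  const url = `${config.baseUrl.replace(/\/+$/, "")}/api/generate`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: config.model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { response?: unknown };
  if (typeof data.response !== "string") {
    throw new Error("unexpected response shape from Ollama");
  }
  return data.response;
}
```

In a real worker you would likely pass the global `fetch` as `fetchImpl` and switch to streaming, since the bridge streams tokens back to Cognos rather than waiting for the full completion.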

Why this matters

The alternative architectures are worse:

Ship Cognos locally: this breaks the proprietary boundary, and every runtime update means pushing a new version to users.
Bypass Cognos for local models: you lose the loop, which means no tools, no state, no stop conditions, and no memory.
Proxy everything through the cloud: this adds latency, raises privacy concerns, and forces unnecessary round trips.

The engine bridge gives you local inference with full cognitive capabilities, zero infrastructure exposure, and a hot-upgradeable runtime.

The result

axon dev  # starts local agent with Ollama

Cognos runs in the cloud. Your inference runs on your machine. Tools execute locally. Each piece runs where it needs to.

That's the bet: unbundle inference from cognition, let each live where it belongs.
