Ollama

Runs inference locally: no API costs, and no data leaves your machine. Requires Ollama to be installed; Axon manages the server lifecycle automatically.
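Managing a local server's lifecycle generally means starting it when nothing is listening, then polling until it answers. The sketch below illustrates that pattern against Ollama's real HTTP endpoint GET /api/version. It is not Axon's implementation. The helper names (waitUntilReady, ollamaIsUp, ensureOllama) and the polling defaults are assumptions chosen for illustration.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

const HOST = "http://localhost:11434"; // Ollama's default address

// Poll `probe` until it resolves true or the attempts run out.
export async function waitUntilReady(
  probe: () => Promise<boolean>,
  attempts = 20,
  delayMs = 250,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    // A rejected probe (e.g. connection refused) counts as "not ready yet".
    if (await probe().catch(() => false)) return true;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}

// /api/version answers as soon as the Ollama server is accepting requests.
export async function ollamaIsUp(): Promise<boolean> {
  const res = await fetch(`${HOST}/api/version`);
  return res.ok;
}

// Hypothetical helper: start `ollama serve` only if no server is running yet.
// Returns the spawned process (to kill on shutdown), or null if one was already up.
export async function ensureOllama(): Promise<ChildProcess | null> {
  if (await ollamaIsUp().catch(() => false)) return null;
  const child = spawn("ollama", ["serve"], { stdio: "ignore" });
  child.once("error", () => {}); // e.g. binary not installed; readiness polling will fail
  if (!(await waitUntilReady(ollamaIsUp))) {
    child.kill();
    throw new Error(`Ollama did not become ready at ${HOST}`);
  }
  return child;
}
```

Returning null when a server is already running means the caller only stops processes it started itself, leaving a user's own Ollama instance alone.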

Configuration

import { Ollama } from "@axon/engines"

export default defineAgent({
    engine: Ollama({ model: "qwen2.5:7b" }),
})

model is the model name exactly as it appears in the output of ollama list. Any model you have pulled is valid.
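To check from code whether a name is valid before putting it in the config, one option is to query Ollama's REST API directly: GET /api/tags returns the locally available models that ollama list prints. The helpers below (isModelPulled, listPulledModels) are hypothetical and not part of @axon/engines. The tag handling assumes Ollama's convention of reporting an untagged pull as name:latest.

```typescript
// The part of Ollama's GET /api/tags response used here.
interface TagsResponse {
  models: { name: string }[];
}

// True if `model` matches a pulled model. A bare name is compared in its
// implicit ":latest" form (simplified: registry hosts with ports are not handled).
export function isModelPulled(tags: TagsResponse, model: string): boolean {
  const wanted = model.includes(":") ? model : `${model}:latest`;
  return tags.models.some((m) => m.name === wanted);
}

// Fetch the locally available models from a running Ollama server.
export async function listPulledModels(
  host = "http://localhost:11434",
): Promise<TagsResponse> {
  const res = await fetch(`${host}/api/tags`);
  if (!res.ok) throw new Error(`GET /api/tags failed: ${res.status}`);
  return (await res.json()) as TagsResponse;
}
```

Usage: isModelPulled(await listPulledModels(), "qwen2.5:7b") returns true only once that exact tag has been pulled.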

With a custom host:

import { Ollama } from "@axon/engines"

export default defineAgent({
    engine: Ollama({
        model: "qwen2.5:7b",
        host: "http://localhost:11434",  // default
    }),
})
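When host points at another machine, it helps to confirm the server is reachable before starting an agent. The sketch below normalises a host string and calls Ollama's GET /api/version endpoint. normalizeHost and ollamaVersion are hypothetical helpers. Whether the Axon engine itself accepts a host without a scheme is not stated here, so passing a full URL as shown in the configuration is the safer choice.

```typescript
const DEFAULT_HOST = "http://localhost:11434";

// Fall back to the default, add "http://" if no scheme is given, drop trailing slashes.
export function normalizeHost(host?: string): string {
  const raw = (host ?? "").trim() || DEFAULT_HOST;
  const withScheme = /^https?:\/\//i.test(raw) ? raw : `http://${raw}`;
  return withScheme.replace(/\/+$/, "");
}

// Returns the server's version string; throws if the host is unreachable or errors.
export async function ollamaVersion(host?: string): Promise<string> {
  const res = await fetch(`${normalizeHost(host)}/api/version`);
  if (!res.ok) throw new Error(`GET /api/version failed: ${res.status}`);
  const body = (await res.json()) as { version: string };
  return body.version;
}
```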

Installing models

Open the model palette with *. It lists Ollama models filtered to your hardware and labels each by fit: recommended, possible, or unlikely to run well. Select a model to start downloading it; once the download completes, the model is available immediately.
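The source does not say how fit is computed. One plausible heuristic compares a model's size with available GPU memory, since Ollama can split a model between GPU and CPU when it does not fit entirely in VRAM. The function below, its 1.2 headroom factor, and the use of model size as a memory proxy are all assumptions for illustration, not Axon's actual rule.

```typescript
type Fit = "recommended" | "possible" | "unlikely";

// Hypothetical fit heuristic. `headroom` reserves extra memory for the
// context window / KV cache on top of the model weights.
export function classifyFit(
  modelBytes: number,
  vramBytes: number,
  ramBytes: number,
  headroom = 1.2,
): Fit {
  const needed = modelBytes * headroom;
  if (needed <= vramBytes) return "recommended"; // fits entirely on the GPU
  if (needed <= vramBytes + ramBytes) return "possible"; // partly offloaded to CPU, slower
  return "unlikely"; // exceeds GPU and system memory combined
}
```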

Or pull a model directly with the Ollama CLI:

ollama pull qwen2.5:7b
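The same download can be driven over Ollama's REST API: POST /api/pull streams newline-delimited JSON progress objects with a status field and, while layers download, total and completed byte counts. The sketch assumes a current Ollama version, whose API docs name the request field model (older releases used name). pullModel and describePullEvent are hypothetical helpers.

```typescript
// One line of the newline-delimited JSON stream from POST /api/pull.
interface PullEvent {
  status: string;
  digest?: string;
  total?: number;
  completed?: number;
}

// Turn one stream line into a readable progress message.
export function describePullEvent(line: string): string {
  const ev = JSON.parse(line) as PullEvent;
  if (ev.total && ev.completed !== undefined) {
    return `${ev.status}: ${Math.floor((ev.completed / ev.total) * 100)}%`;
  }
  return ev.status; // e.g. "pulling manifest", "success"
}

// Pull `model`, reporting each progress line; resolves when the stream ends.
export async function pullModel(
  model: string,
  onProgress: (msg: string) => void,
  host = "http://localhost:11434",
): Promise<void> {
  const res = await fetch(`${host}/api/pull`, {
    method: "POST",
    body: JSON.stringify({ model }),
  });
  if (!res.ok || !res.body) throw new Error(`POST /api/pull failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? ""; // keep any incomplete trailing line for the next chunk
    for (const line of lines) if (line.trim()) onProgress(describePullEvent(line));
  }
  if (buffered.trim()) onProgress(describePullEvent(buffered));
}
```

Usage: pullModel("qwen2.5:7b", console.log) prints lines such as "pulling manifest" followed by per-layer percentages.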

Managing running models

:ollama ps        # show models currently loaded in VRAM
:ollama unload    # unload a model from VRAM
:ollama remove    # delete a model from disk
:ollama update    # re-pull all installed models to latest
:ollama stop      # stop the Ollama server
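The colon commands are Axon's own. For reference, three of them correspond to documented Ollama REST operations: GET /api/ps lists loaded models, POST /api/generate with keep_alive set to 0 unloads a model immediately, and DELETE /api/delete removes a model from disk. Whether Axon issues these exact calls is an assumption. The sketch only shows what each operation does at the Ollama level, and requestFor and send are hypothetical helpers.

```typescript
interface ApiRequest {
  method: "GET" | "POST" | "DELETE";
  path: string;
  body?: Record<string, unknown>;
}

// Build the Ollama REST request matching a management operation.
export function requestFor(command: "ps" | "unload" | "remove", model?: string): ApiRequest {
  switch (command) {
    case "ps":
      return { method: "GET", path: "/api/ps" }; // models currently loaded in memory
    case "unload":
      if (!model) throw new Error("unload needs a model name");
      return { method: "POST", path: "/api/generate", body: { model, keep_alive: 0 } };
    case "remove":
      if (!model) throw new Error("remove needs a model name");
      return { method: "DELETE", path: "/api/delete", body: { model } }; // deletes from disk
  }
}

// Send a request; returns parsed JSON, or null for an empty body.
export async function send(req: ApiRequest, host = "http://localhost:11434"): Promise<unknown> {
  const res = await fetch(`${host}${req.path}`, {
    method: req.method,
    headers: req.body ? { "Content-Type": "application/json" } : undefined,
    body: req.body ? JSON.stringify(req.body) : undefined,
  });
  if (!res.ok) throw new Error(`${req.method} ${req.path} failed: ${res.status}`);
  const text = await res.text();
  return text ? JSON.parse(text) : null; // DELETE /api/delete replies with an empty body
}
```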

Switching models at runtime

Open the model palette with * and select any pulled model. The change takes effect from the next session.