tests/

Put test files in tests/. They are standard Bun test files — bun:test for the runner, Axon() from @axon/test as the harness. The harness boots the complete agent runtime against your actual axon.config.ts, so you are testing the real thing.

tests/
├── tools.test.ts
├── scripts.test.ts
└── behavior.test.ts

The harness

Axon() boots a full agent runtime scoped to a test session. It auto-discovers your axon.config.ts by walking up from the current working directory — no path needed. Destructure what you need. Call stop() when the suite finishes — the harness warns on process exit if you forget.

import { describe, it, expect, afterAll } from "bun:test"
import { Axon } from "@axon/test"
import { Mock } from "@axon/engines"

// Boot at module scope: `await` is a syntax error inside the non-async describe() callback.
const { axon, bus, stop } = await Axon({ config: { engine: Mock() } })

describe("my agent", () => {
    // Shut the runtime down once every test in this suite has run.
    afterAll(() => stop())

    it("responds to a request", async () => {
        const result = await axon.request("hello")
        expect(result.entries.length).toBeGreaterThan(0)
    })
})

axon is the same handle available in scripts and routes. Same API, no special test mode.
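
The config auto-discovery described in this section (walking up from the current working directory) can be pictured roughly as follows. This is a minimal sketch, not the harness's actual implementation: `findConfig` and its injected `exists` predicate are hypothetical names, and the sketch only handles absolute POSIX-style paths.

```typescript
// Hypothetical sketch of "walk up until axon.config.ts is found".
// `exists` is injected so the walk can be exercised without touching the disk.
function findConfig(
    startDir: string,
    exists: (path: string) => boolean,
    fileName = "axon.config.ts",
): string | undefined {
    if (!startDir.startsWith("/")) {
        throw new Error("startDir must be an absolute POSIX path")
    }
    // Drop trailing slashes so "/" becomes "" and joins stay clean.
    let dir = startDir.replace(/\/+$/, "")
    for (;;) {
        const candidate = `${dir}/${fileName}`
        if (exists(candidate)) return candidate
        // An empty dir means the filesystem root was just checked.
        if (dir === "") return undefined
        // Step up one directory.
        dir = dir.slice(0, dir.lastIndexOf("/"))
    }
}
```

In practice this means a test file nested several directories below the config still resolves the same axon.config.ts, as long as bun test is started somewhere inside the agent project.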

The Mock engine

Real inference in tests is slow, nondeterministic, and costs money. Mock() replaces the inference step with a deterministic responder — the full agent loop still runs. Tool calls execute, stop conditions fire, threads accumulate context. Everything behaves as in production, just without the model.

Three forms:

Echo — reflects the user message back. Useful for testing routing and tool execution without caring about content:

engine: Mock()

Map — pattern-matched fixed responses. Patterns are matched as case-insensitive substrings against the user message:

import { Mock, air } from "@axon/engines"

engine: Mock({
    "summarise": air.text("Here is the summary."),
    "run tests": air.typescript("tools.ci.runTests()"),
})

Function — full control. Receives the raw engine request, returns an AIR block:

engine: Mock(async (req) => {
    const last = req.messages.at(-1)?.content ?? ""
    return air.text(`Echo: ${last}`)
})

Mock responses must be valid AIR blocks, not plain strings. Use the air helpers:

Helper                  Behaviour
air.text(content)       Renders as an agent message
air.typescript(code)    Executes in the capsule
air.done                Terminates the loop with no output

See Mock engine for the full reference.
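
To make the Map form's matching rule concrete (case-insensitive substring against the user message), here is a minimal sketch. `Block` is a hypothetical stand-in for an AIR block, and `matchPattern` is not part of @axon/engines. The text above does not say which pattern wins when several match, or what happens when none do; the sketch assumes the first pattern in insertion order wins and that no match yields `undefined`.

```typescript
// Hypothetical stand-in for an AIR block; the real type comes from @axon/engines.
type Block = { kind: string; body: string }

function matchPattern(
    patterns: Record<string, Block>,
    userMessage: string,
): Block | undefined {
    const haystack = userMessage.toLowerCase()
    // Assumption: patterns are tried in insertion order and the first hit wins.
    for (const [pattern, block] of Object.entries(patterns)) {
        if (haystack.includes(pattern.toLowerCase())) return block
    }
    // Assumption: no match is reported to the caller rather than throwing.
    return undefined
}
```

Because matching is by substring, a short pattern such as "run" would also match "prune the cache". Prefer patterns specific enough that only the intended prompts hit them.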

Asserting on results

axon.request() returns a result with the full entry log for that call:

const result = await axon.request("run the test suite")

// Text content of the final reply
expect(result.text).toContain("passed")

// All entries (messages, tool calls, tool results, lifecycle events)
expect(result.entries.length).toBeGreaterThan(0)

// Filter to tool calls only
const toolCalls = result.entries.filter(e => e.type === "tool:call")
expect(toolCalls).toHaveLength(1)
expect(toolCalls[0].name).toBe("ci.runTests")

Entry types you'll use most:

Type           What it is
text           Agent message content
tool:call      The agent invoking a tool
tool:result    The return value from the tool
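
If you filter entries by type in many tests, a small typed helper keeps the assertions short. This is a sketch under stated assumptions: the `Entry` union below is hypothetical. Only the `type` field and `name` on tool calls appear in the examples above; `content` and `value` are placeholder field names, so check the real entry types before relying on them.

```typescript
// Hypothetical entry shapes. Only `type` and tool-call `name` are taken
// from the examples above; the other fields are placeholders.
type Entry =
    | { type: "text"; content: string }
    | { type: "tool:call"; name: string }
    | { type: "tool:result"; value: unknown }

// Returns only the entries of the requested type, narrowed so that
// type-specific fields (such as `name`) are available without casts.
function entriesOfType<T extends Entry["type"]>(
    entries: readonly Entry[],
    type: T,
): Extract<Entry, { type: T }>[] {
    return entries.filter(
        (e): e is Extract<Entry, { type: T }> => e.type === type,
    )
}
```

With a helper like this, the tool-call assertion becomes `expect(entriesOfType(result.entries, "tool:call")[0].name).toBe("ci.runTests")`, assuming the real entries are compatible with the sketched shape.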

Testing with named threads

Pass thread to scope a request to an isolated conversation history:

const r1 = await axon.request({ prompt: "start a plan", thread: "planning" })
const r2 = await axon.request({ prompt: "add a step", thread: "planning" })

// The agent on "planning" has context from r1 when producing r2
expect(r2.text).not.toBe("")

// A different thread has no context from "planning"
const r3 = await axon.request({ prompt: "what plan?", thread: "other" })
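
The isolation between threads can be modelled as one history per thread name. The class below is a hypothetical illustration of that idea only; it is not how the runtime stores threads, and `ThreadHistories` is an invented name.

```typescript
// Hypothetical model of per-thread history; not the runtime's real storage.
class ThreadHistories {
    private readonly threads = new Map<string, string[]>()

    // Appends a prompt to the named thread and returns a copy of the
    // context that would be visible on that thread afterwards.
    append(thread: string, prompt: string): readonly string[] {
        const history = this.threads.get(thread) ?? []
        history.push(prompt)
        this.threads.set(thread, history)
        return [...history]
    }
}
```

In this model the second request on "planning" sees two prompts, while the first request on "other" sees only its own. That is the property the r1, r2, and r3 requests above exercise.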

Observing the event bus

bus gives you access to every event the runtime emits — tool invocations, engine calls, session lifecycle, and more. Use it when you need to assert on things that aren't in the result entries.

const { axon, bus, stop } = await Axon({ config: { engine: Mock() } })

await axon.request("deploy the service")

const events = bus.history()
const engineCalls = events.filter(e => e.type === "engine:call")
expect(engineCalls.length).toBeGreaterThan(0)
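
When a test cares about several event types at once, tallying the history by type is often easier than filtering repeatedly. A minimal sketch, assuming only that each event carries a string `type` as in the filter above; `BusEvent` and `countByType` are hypothetical names, not part of the harness.

```typescript
// Assumption: every event has a string `type`, as used in the filter above.
type BusEvent = { type: string }

// Counts how many events of each type appear in a history snapshot.
function countByType(events: readonly BusEvent[]): Map<string, number> {
    const counts = new Map<string, number>()
    for (const event of events) {
        counts.set(event.type, (counts.get(event.type) ?? 0) + 1)
    }
    return counts
}
```

Usage would look like `countByType(bus.history()).get("engine:call")`, which returns `undefined` when no such event was emitted.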

Hermetic sessions

The harness writes session data to a temporary directory, not your agent's data/sessions/. Tests never pollute your local session history. Each Axon() call gets its own isolated session.

To point sessions somewhere specific:

const { axon, stop } = await Axon({ sessionsDir: "/tmp/my-test-sessions" })

Config overrides

config merges over your axon.config.ts. Override only what the test needs:

// Swap the engine, leave everything else (tools, prompts, policy) as-is
const { axon, stop } = await Axon({ config: { engine: Mock() } })

// Tighten policy for a specific test (renamed so both harnesses can live in one file)
const { axon: strictAxon, stop: stopStrict } = await Axon({
    config: {
        engine: Mock(),
        policy: { maxTurns: 1 },
    },
})
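
The text above says overrides merge over axon.config.ts but does not spell out whether nested objects such as policy are merged key by key or replaced wholesale. The sketch below shows the recursive reading, purely as an assumption; `mergeConfig` is a hypothetical helper, not the harness's merge function. If your test depends on a policy key you did not override, assert on it rather than rely on this behaviour.

```typescript
type Plain = { [key: string]: unknown }

// True only for plain object literals; arrays and class instances are excluded.
function isPlain(value: unknown): value is Plain {
    return (
        typeof value === "object" &&
        value !== null &&
        Object.getPrototypeOf(value) === Object.prototype
    )
}

// Assumption: nested plain objects merge recursively; every other value
// (primitives, arrays, class instances) in `override` replaces the base value.
function mergeConfig(base: Plain, override: Plain): Plain {
    const merged: Plain = { ...base }
    for (const [key, value] of Object.entries(override)) {
        const existing = merged[key]
        merged[key] =
            isPlain(existing) && isPlain(value)
                ? mergeConfig(existing, value)
                : value
    }
    return merged
}
```

Under this reading, overriding `policy: { maxTurns: 1 }` would leave any other policy keys from axon.config.ts in place, and the base config object itself is left unmodified.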

Running tests

bun test
bun test tests/tools.test.ts
bun test --watch

Tests run against your local agent. The capsule boots, tools load, and the runtime is available for the duration of the test suite.