The Cognitive Loop

When you call axon.request() or axon.stream(), the agent needs to think — assemble context, call inference, dispatch tool calls, decide when it's done. That cognitive work is Cognos.

Cognos is the brain. The agent folder, your tools, the capsule, and the TUI are the body and the environment. Cognos is what animates them. The two are parts of the same system. The only reason there is a wire between them is that the brain is private code that cannot ship in the same binary as the open-source runtime.

If that constraint didn't exist, Cognos would just be part of the same process.

What the brain does

Context assembly — on every tick, Cognos turns the agent's current state — thread history, loaded prompts, tool declarations, live process snapshots — into an AIR context window and sends it to the model. How it balances against context limits, when it compresses, what it prioritises — Cognos decides.

Loop control — Cognos runs the tick cycle. It reads the model's output, decides whether the agent is done or needs another iteration, and drives the loop to a natural stopping point. The number of iterations, the conditions for termination — all Cognos.

Tool dispatch — when the model emits a <typescript> block, Cognos routes execution to the capsule, waits for the result, and injects it back into the timeline for the next tick. Tool calls and their results appear inline in the causal sequence the model reasons over.

Stop condition detection — distinguishing "task complete" from "waiting for input" from "stuck in a loop" is one of the harder problems in agent design. Cognos handles it. Getting this wrong produces agents that either exit too early or spin indefinitely.

Compression — as threads grow, Cognos compresses prior context to stay within model context limits without losing the information that matters. What to compress and what to preserve is a research problem, not a configuration option.
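As a rough illustration of how these responsibilities fit together in one tick cycle, here is a minimal sketch. It is not Cognos code: every name in it (Entry, Engine, Capsule, runLoop, the stop reasons, the bracketed context format) is hypothetical, the engine and capsule are stubs, and the stop rules are deliberately naive. Real context assembly, compression, and stop detection are far more involved and are not shown.

```typescript
// Hypothetical types for illustration only; none of these come from the Axon or Cognos APIs.
type Entry =
  | { kind: "user"; text: string }
  | { kind: "model"; text: string }
  | { kind: "tool_result"; text: string };

type Engine = (context: string) => string; // stand-in for model inference
type Capsule = (code: string) => string;   // stand-in for sandboxed tool execution
type StopReason = "complete" | "stuck" | "tick_limit";

const TS_BLOCK = /<typescript>([\s\S]*?)<\/typescript>/;

// Context assembly: flatten the timeline into one string (a placeholder, not the AIR format).
function assembleContext(timeline: Entry[]): string {
  return timeline.map((e) => `[${e.kind}] ${e.text}`).join("\n");
}

function runLoop(
  engine: Engine,
  capsule: Capsule,
  prompt: string,
  maxTicks = 8
): { timeline: Entry[]; reason: StopReason } {
  const timeline: Entry[] = [{ kind: "user", text: prompt }];
  let lastOutput: string | undefined;

  for (let tick = 0; tick < maxTicks; tick++) {
    const output = engine(assembleContext(timeline));
    timeline.push({ kind: "model", text: output });

    const match = TS_BLOCK.exec(output);
    // Naive stop rule: no tool call means the model is finished.
    if (!match) return { timeline, reason: "complete" };
    // Naive loop detection: the same tool call emitted twice in a row.
    if (output === lastOutput) return { timeline, reason: "stuck" };
    lastOutput = output;

    // Tool dispatch: run the block, then inject the result inline for the next tick.
    timeline.push({ kind: "tool_result", text: capsule((match[1] ?? "").trim()) });
  }
  return { timeline, reason: "tick_limit" };
}

// Scripted stubs: the "model" asks for one tool call, then answers.
const scriptedEngine: Engine = (context) =>
  context.includes("[tool_result] 5")
    ? "The sum is 5."
    : "<typescript>add(2, 3)</typescript>";
const fakeCapsule: Capsule = (code) =>
  code === "add(2, 3)" ? "5" : "error: unknown call";

const result = runLoop(scriptedEngine, fakeCapsule, "What is 2 + 3?");
```

In this run the timeline ends up with four entries (user prompt, model tool call, tool result, final model answer) and the loop stops with reason "complete". The sketch folds "waiting for input" into "complete", which is exactly the kind of simplification a real implementation cannot afford.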

What you don't configure

None of the above is exposed as a user-configurable primitive. This is intentional.

These are engineering problems with improving solutions — not decisions that should vary per agent. Exposing them as configuration means every improvement becomes a migration. Every agent on an older config version gets left behind. Keeping them managed means every agent gets the improvement automatically.

When Cognos ships better stop condition detection, your agent gets it. You change nothing.

Engines

What you configure is inference, not cognition. An engine (Axon(), Ollama(), Codex(), or OpenRouter()) is the model Cognos calls inside the loop. The loop is the same regardless of which engine you choose. The context window format is the same. The tool dispatch is the same. Only the source of inference changes.

This is why swapping engines changes what the model says but not how the agent works. The brain is the same. The model answering the questions is different.

When you configure Ollama(), Cognos calls your local machine for inference via the engine bridge. The tokens come from your hardware. Everything else — loop control, context assembly, tool dispatch — is still Cognos.
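One way to picture the separation between loop and engine is a small interface that the loop calls without knowing what sits behind it. The sketch below is an assumption-laden illustration, not the engine bridge: InferenceEngine, runTick, and both stub engines are hypothetical names, and the real Axon(), Ollama(), Codex(), and OpenRouter() constructors are not modeled.

```typescript
// Hypothetical interface for illustration; the real engine bridge is not shown here.
interface InferenceEngine {
  readonly name: string;
  infer(context: string): string;
}

interface TickRecord {
  engine: string;
  contextSent: string;
  reply: string;
}

// The loop side: builds the context the same way no matter which engine is plugged in.
function runTick(engine: InferenceEngine, thread: string[]): TickRecord {
  const contextSent = thread.join("\n");
  return { engine: engine.name, contextSent, reply: engine.infer(contextSent) };
}

// Two stub engines standing in for "local" and "hosted" inference.
const localStub: InferenceEngine = {
  name: "local-stub",
  infer: (context) => `local reply to ${context.length} chars`,
};
const hostedStub: InferenceEngine = {
  name: "hosted-stub",
  infer: (context) => `hosted reply to ${context.length} chars`,
};

const thread = ["system: you are a helpful agent", "user: summarise the report"];
const localTick = runTick(localStub, thread);
const hostedTick = runTick(hostedStub, thread);
```

Both ticks send an identical context string; only the reply differs, which mirrors the claim that the source of inference is the one thing an engine swap changes.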

Local and cloud

Cognos connects to your agent the same way in local development and in cloud deployments. You don't install, configure, or manage it. axon dev connects to it automatically, and axon deploy does the same for your cloud agent.

The agent source is identical in both environments because the cognitive loop is not part of the source. It never was.

For the format Cognos uses to construct the context window, see AIR Format.

For how the brain's reach is bounded by the body you give it, see Data & Privacy.