AI-agent observability

The core logger is domain-agnostic. The AI-agent extension pack adds a set of attribute namespaces and typed events for applications built around LLM calls, agentic pipelines, retrieval-augmented generation, and Model Context Protocol (MCP) tool dispatch.

The extension is opt-in: applications without AI pipelines ignore it entirely. When the pack is loaded, the listed namespaces become reserved (a binding rejects user-defined events on the same prefixes); when it is not loaded, those namespaces are simply user-defined.
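As a sketch of the reservation rule (the function and its name are hypothetical, not part of the spec):

```python
# Hypothetical sketch of the opt-in reservation: when the pack is loaded,
# a binding rejects user-defined events on the five prefixes; otherwise the
# same keys pass through as ordinary user-defined attributes.
RESERVED_PREFIXES = ("gen_ai.", "mcp.", "rag.", "agent.", "prompt.")

def check_user_event(attrs: dict, pack_loaded: bool) -> None:
    if not pack_loaded:
        return  # pack absent: the namespaces are plain user-defined keys
    for key in attrs:
        if key.startswith(RESERVED_PREFIXES):
            raise ValueError(f"attribute {key!r} uses a reserved AI-agent namespace")

check_user_event({"rag.custom": 1}, pack_loaded=False)  # accepted without the pack
```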

OpenTelemetry GenAI conformance

The pack commits to conformance with the OpenTelemetry GenAI semantic conventions:

  • For OTel-covered concepts (gen_ai.*, mcp.*) the pack uses the OTel attribute names as-is, with no parallel keys for the same idea.
  • For concepts OTel has not yet stabilised (rag.*, agent.*, prompt.*), the pack defines its own namespaces. When an OTel equivalent appears, the pack migrates to it in the next major version of logger-spec, with a deprecation period for the old keys.
  • A source-of-truth mapping is planned at _meta/conventions/genai_otel_mapping.yaml (v1.1) and will be updated on every OTel GenAI release; v1.0 bindings ship the mapping inline.

Backends that already understand OTel GenAI (Datadog, Grafana Cloud, Honeycomb, Langfuse, OpenLLMetry, Helicone) read the records natively without per-vendor adaptation.

The five namespaces

gen_ai.* — used as-is from OTel

Standard LLM-call attributes. The most common keys:

  • gen_ai.system: openai, anthropic, gemini, vllm, openrouter, bedrock, ...
  • gen_ai.operation.name: chat, text_completion, embeddings, tool_call
  • gen_ai.request.model: model id as requested
  • gen_ai.response.model: model id as returned (may differ — e.g., OpenAI versioning)
  • gen_ai.request.temperature, .top_p, .max_tokens, .frequency_penalty, .presence_penalty: sampling params
  • gen_ai.usage.input_tokens, .output_tokens, .cache.read_input_tokens: token accounting
  • gen_ai.response.finish_reasons[]: stop, length, tool_calls, content_filter, function_call
  • gen_ai.response.id: provider-assigned response id
  • gen_ai.tool.name, gen_ai.tool.call.id: tool dispatch
  • gen_ai.prompt.<N>.role, .content: conversational messages (subject to truncation)
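As an illustration, a single LLM-call record carrying these attributes might look like the following (all values, including the model ids, are example data, not spec requirements):

```python
import json

# Illustrative record for one chat completion; every value is example data.
record = {
    "event.domain": "ai_agent",
    "event.name": "ai_agent.llm.response",
    "gen_ai.system": "openai",
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.response.model": "gpt-4o-2024-08-06",  # may differ from the requested id
    "gen_ai.request.temperature": 0.2,
    "gen_ai.usage.input_tokens": 1832,
    "gen_ai.usage.output_tokens": 214,
    "gen_ai.response.finish_reasons": ["stop"],
}

print(json.dumps(record, indent=2))
```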

mcp.* — used as-is from OTel

Model Context Protocol observability per the OTel MCP conventions:

  • mcp.server.name, mcp.server.version: MCP server identity
  • mcp.method.name: tools/call, tools/list, resources/list, resources/read, prompts/get, logging/setLevel
  • mcp.request.id: JSON-RPC request id
  • mcp.tool.name: tool name when mcp.method.name = "tools/call"
  • mcp.response.is_error: boolean
  • mcp.transport: stdio, http+sse, streamable_http
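A minimal sketch of mapping a JSON-RPC request to these attributes (the helper function is hypothetical; only the attribute names come from the conventions above):

```python
# Hypothetical helper: derive mcp.* attributes from a JSON-RPC request.
def mcp_attributes(rpc_request: dict, transport: str) -> dict:
    attrs = {
        "mcp.method.name": rpc_request["method"],
        "mcp.request.id": rpc_request["id"],
        "mcp.transport": transport,
    }
    if rpc_request["method"] == "tools/call":
        # tool name is only meaningful for tools/call dispatches
        attrs["mcp.tool.name"] = rpc_request["params"]["name"]
    return attrs

req = {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
       "params": {"name": "read_file", "arguments": {"path": "README.md"}}}
print(mcp_attributes(req, "stdio"))
```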

rag.* — dagstack-specific

Retrieval-augmented generation has no stabilised OTel namespace yet, so the pack defines its own:

  • rag.query.original, rag.query.rewritten: the user query and its rewrite (simple mode)
  • rag.retrieval.top_k, .min_score, .results_count: retrieval params and outcome
  • rag.retrieval.collections[]: searched collection names
  • rag.chunk.id, .score, .repo, .path, .line_range: per-chunk attributes in the chunk_retrieved event
  • rag.reranker.model, .top_n_before_rerank: rerank stage, when used
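For example, one retrieval can fan out into one record per returned chunk, each carrying the shared retrieval params plus its own rag.chunk.* attributes (all values below are illustrative):

```python
# Illustrative retrieval: shared rag.retrieval.* params plus per-chunk records.
retrieval = {
    "rag.retrieval.top_k": 5,
    "rag.retrieval.min_score": 0.4,
    "rag.retrieval.collections": ["code", "docs"],
}

chunks = [
    ("c-001", 0.91, "dagstack", "docs/logging.md", "10-42"),
    ("c-002", 0.83, "dagstack", "src/retriever.py", "88-120"),
]

records = [
    {
        "event.domain": "ai_agent",
        "rag.chunk.id": cid,
        "rag.chunk.score": score,
        "rag.chunk.repo": repo,
        "rag.chunk.path": path,
        "rag.chunk.line_range": line_range,
        **retrieval,
    }
    for cid, score, repo, path, line_range in chunks
]
print(len(records))
```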

agent.* — dagstack-specific

Agent-loop state for multi-step pipelines:

  • agent.pipeline: simple, agent, two_agent
  • agent.role: analyst, answerer, describer, planner, executor
  • agent.iteration, agent.iteration.max: loop state
  • agent.decision: continue, need_more, finish
  • agent.decision.rationale: free-form rationale (v1.1 will introduce a structured form)
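A sketch of how these attributes track loop state (the loop itself and the finish condition are illustrative assumptions, not the pack's agent logic):

```python
# Sketch: an agent loop recording agent.* attributes each iteration.
# The decision logic here is a stand-in; real pipelines decide from model output.
def run_loop(max_iterations: int) -> list[dict]:
    records = []
    for i in range(1, max_iterations + 1):
        decision = "finish" if i == max_iterations else "continue"
        records.append({
            "agent.pipeline": "agent",
            "agent.role": "executor",
            "agent.iteration": i,
            "agent.iteration.max": max_iterations,
            "agent.decision": decision,
        })
        if decision == "finish":
            break
    return records

print(run_loop(3)[-1]["agent.decision"])  # → finish
```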

prompt.* — dagstack-specific

Prompt-assembly metrics, complementary to gen_ai.*:

  • prompt.template.id, .template.version: prompt template lookup
  • prompt.section.system.tokens, .user.tokens, .history.tokens, .tools.tokens: per-section breakdown
  • prompt.total_tokens, .token_budget: total vs budget
  • prompt.truncated: boolean — whether history was truncated to fit the budget
  • prompt.content.markdown: the full assembled prompt (subject to truncation)
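The accounting can be sketched as follows (all token counts are illustrative, and the trimming rule, cutting history first, is an assumption rather than spec behaviour):

```python
# Sketch: per-section token accounting against a budget.
# Counts are illustrative; trimming history first is an assumption.
sections = {"system": 420, "user": 180, "history": 3000, "tools": 250}
budget = 3500

total = sum(sections.values())            # 3850, over budget
truncated = total > budget
if truncated:
    sections["history"] -= total - budget  # trim history to fit
    total = budget

attrs = {f"prompt.section.{name}.tokens": n for name, n in sections.items()}
attrs["prompt.total_tokens"] = total
attrs["prompt.token_budget"] = budget
attrs["prompt.truncated"] = truncated
print(attrs["prompt.truncated"], attrs["prompt.total_tokens"])  # → True 3500
```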

Typed events

The pack publishes event schemas under event.domain = "ai_agent". Minimum set (full schema will live in _meta/events/ai_agent.yaml in v1.1; v1.0 bindings ship the per-event attribute lists inline):

  • ai_agent.context.assembled: when the prompt is ready to send. Required: prompt.total_tokens, prompt.token_budget
  • ai_agent.llm.request: before an LLM call. Required: gen_ai.system, gen_ai.request.model, gen_ai.operation.name
  • ai_agent.llm.response: after an LLM response. Required: gen_ai.response.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.response.finish_reasons
  • ai_agent.llm.retry: on a retry attempt. Required: gen_ai.request.model, retry.attempt, retry.reason, retry.backoff_ms
  • ai_agent.tool.called: on tool dispatch. Required: gen_ai.tool.name, gen_ai.tool.call.id
  • ai_agent.tool.returned: on a tool response. Required: gen_ai.tool.name, operation.duration_ms, operation.status
  • ai_agent.retrieval.requested, .completed: at the RAG search boundary. Required: rag.retrieval.top_k, rag.retrieval.min_score
  • ai_agent.iteration.started, .completed: at the agent-loop boundary. Required: agent.iteration, agent.pipeline, agent.role
  • ai_agent.decision: on a pipeline state change. Required: agent.decision, agent.decision.rationale
  • ai_agent.session.started, .completed: at session lifecycle boundaries. Required: session.id; on completed, usage sums and the maximum iteration reached
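A binding can enforce the required-attribute lists before emitting; the sketch below (the emit helper and its error handling are hypothetical) shows the idea for two of the events:

```python
# Sketch: validate required attributes for a typed event before emitting.
# REQUIRED mirrors two rows of the minimum set above; emit() is hypothetical.
REQUIRED = {
    "ai_agent.llm.request": ["gen_ai.system", "gen_ai.request.model",
                             "gen_ai.operation.name"],
    "ai_agent.tool.called": ["gen_ai.tool.name", "gen_ai.tool.call.id"],
}

def emit(name: str, attrs: dict) -> dict:
    missing = [k for k in REQUIRED.get(name, []) if k not in attrs]
    if missing:
        raise ValueError(f"{name}: missing required attributes {missing}")
    return {"event.domain": "ai_agent", "event.name": name, **attrs}

rec = emit("ai_agent.tool.called",
           {"gen_ai.tool.name": "read_file", "gen_ai.tool.call.id": "call_123"})
print(rec["event.name"])  # → ai_agent.tool.called
```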

Correlation ids

Three correlation keys flow through W3C Baggage so every record in a single user interaction is groupable:

  • session.id — a long-lived user session.
  • conversation.id — one back-and-forth exchange in the session.
  • agent.run.id — one execution of the agent loop within the conversation.

The intuitive containment is session.id ⊃ conversation.id ⊃ agent.run.id, but the spec does not mandate the hierarchy — it is a recommendation, and a flat key set is acceptable.
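Serialized as a W3C baggage header, the three keys travel as comma-separated key=value pairs (the ids below are illustrative):

```python
# Sketch: serializing the three correlation ids as a W3C `baggage` header value.
ids = {
    "session.id": "sess-9f2c",
    "conversation.id": "conv-71aa",
    "agent.run.id": "run-0042",
}
header = ",".join(f"{k}={v}" for k, v in ids.items())
print(header)  # → session.id=sess-9f2c,conversation.id=conv-71aa,agent.run.id=run-0042
```

On the receiving side, a binding parses the same header back into attributes and stamps them onto every record in the request scope.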

Body truncation and privacy

Prompts and LLM responses easily exceed 100 KB. The pack defaults to:

  • max_body_bytes: 4096 — fields matching *.content, *.markdown, *.code, *.table_json, *.html are truncated, with *_truncated: true and *_original_bytes: <N> set.
  • capture_bodies: false — production default. The matched fields are replaced with "" or null; only metadata attributes survive (gen_ai.usage.*, *_original_bytes, *_hash).
  • hash_bodies: true — *.content.hash = sha256(content)[:8] allows dedup and correlation across runs without storing plaintext. An optional body_hash_salt env var addresses environments where a hash leak is itself a risk.

Enable capture_bodies: true only in an explicit debug mode (for example, via DAGSTACK_DEBUG_AI=true or per-request via a scoped logger override — see Scoped overrides).
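The policy above can be sketched for a single content field (the function name and exact output-key spelling are assumptions; the spec defines the defaults, not this code):

```python
import hashlib

# Sketch of the body policy: always hash, optionally drop or truncate the body.
# Key names (content, content.hash, content_truncated, ...) are illustrative.
def apply_body_policy(value: str, max_body_bytes: int = 4096,
                      capture_bodies: bool = False, salt: str = "") -> dict:
    raw = value.encode("utf-8")
    out = {
        "content.hash": hashlib.sha256(salt.encode() + raw).hexdigest()[:8],
        "content_original_bytes": len(raw),
    }
    if not capture_bodies:
        out["content"] = ""  # production default: only metadata survives
    elif len(raw) > max_body_bytes:
        out["content"] = raw[:max_body_bytes].decode("utf-8", errors="ignore")
        out["content_truncated"] = True
    else:
        out["content"] = value
    return out

print(apply_body_policy("x" * 10_000, capture_bodies=True)["content_truncated"])  # → True
```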

The pack recommends an OTel span hierarchy that pairs naturally with the events:

span: ai_agent.session (root)
└─ span: ai_agent.run (single user query)
   ├─ span: ai_agent.iteration.1
   │  ├─ span: ai_agent.context.assembly
   │  ├─ span: gen_ai.chat (LLM call — OTel GenAI)
   │  │     events: ai_agent.llm.request → ai_agent.llm.response
   │  ├─ span: mcp.tools/call (tool dispatch — OTel MCP)
   │  │     events: ai_agent.tool.called → ai_agent.tool.returned
   │  └─ span: mcp.tools/call (read_file)
   └─ span: ai_agent.iteration.2 …

Trace-to-log correlation is automatic — every record carries trace_id / span_id per Context propagation.

See also