Name: tempo
Author: grafana

SKILL.md

$28

A trace represents the lifecycle of a request as it passes through multiple services. It consists of:

Spans: Individual units of work with start time, duration, attributes, and status

Trace ID: Shared identifier across all spans in a request

Parent-child relationships: Spans form a tree showing causality

Traces enable:

Root cause analysis for service outages

Understanding service dependencies

Identifying latency bottlenecks

Correlating events across microservices

Architecture Overview

Applications

    |

    | (OTLP 4317/4318, Jaeger 14250/14268, Zipkin 9411)

    v

[Distributor]  ----  hashes traceID, routes to N partitions

    |

 [Kafka]

    |---> [Live Stores]  (storage of recent data)

    |

    |---> [Block Builders] (Parquet block assembly, flush to object storage)

    |

    |---> [Metrics Generator]  (optional: derives RED metrics -> Prometheus)

Query path:

Grafana  -->  [Query Frontend]  (shards queries)

                    |

              [Querier pool]

              /           \

    [Live Stores]   [Object Storage]

    (recent)        (historical blocks)

Core Components

Component

Role

Default Ports

Distributor

Receives spans, routes by traceID hash

4317 (gRPC), 4318 (HTTP)

Live Store

Buffers recent data on local disk and serves queries

Query Frontend

Query orchestrator, shards across queriers

3200 (HTTP)

Querier

Executes search jobs against storage

Compactor

Merges blocks, enforces retention

Block Builder

Creates the final parquet blocks and flushes to object storage

Metrics Generator

Derives RED metrics from spans

TraceQL - The Query Language

TraceQL queries filter traces by span properties. Structure: { filters } | pipeline

Attribute Scopes

span.http.status_code        # span-level attribute

resource.service.name        # resource-level attribute (from SDK)

event.name                   # event-level attribute

name                         # intrinsic: span operation name

status                       # intrinsic: ok | error | unset

duration                     # intrinsic: span duration

kind                         # intrinsic: server | client | producer | consumer | internal

traceDuration                # intrinsic: entire trace duration

rootServiceName              # intrinsic: service of the root span

rootName                     # intrinsic: operation name of the root span

Operators

=   !=   >   <   >=   <=      # comparison

=~  !~                         # regex match (Go RE2)

&#x26;&#x26;  ||  !                      # logical

Essential Examples

# All errors

{ status = error }

# Slow requests from a service

{ resource.service.name = "frontend" &#x26;&#x26; duration > 1s }

# HTTP 5xx errors

{ span.http.status_code >= 500 }

# Count errors per trace (more than 2)

{ status = error } | count() >= 2

# Select specific fields

{ status = error } | select(span.http.url, duration, resource.service.name)

# Structural: server span with downstream error

{ kind = server } >> { status = error }

# Both conditions present (any relationship)

{ span.db.system = "redis" } &#x26;&#x26; { span.db.system = "postgresql" }

# Find most recent (deterministic)

{ resource.service.name = "api" } with (most_recent=true)

TraceQL Metrics

# Error rate per service

{ status = error } | rate() by (resource.service.name)

# P99 latency

{ kind = server } | quantile_over_time(duration, .99) by (resource.service.name)

Deployment

Quick Start (Docker Compose)

git clone https://github.com/grafana/tempo.git

cd tempo/example/docker-compose/local

mkdir tempo-data

docker compose up -d

# Grafana at http://localhost:3000, Tempo API at http://localhost:3200

Kubernetes (Helm)

helm repo add grafana https://grafana.github.io/helm-charts

helm install tempo grafana/tempo-distributed \

  --set storage.trace.backend=s3 \

  --set storage.trace.s3.bucket=my-tempo-bucket \

  --set storage.trace.s3.region=us-east-1

Sending Traces to Tempo

Via Grafana Alloy (Recommended)

// alloy.river

otelcol.receiver.otlp "default" {

  grpc { endpoint = "0.0.0.0:4317" }

  http { endpoint = "0.0.0.0:4318" }

  output {

    traces = [otelcol.exporter.otlp.tempo.input]

  }

}

otelcol.exporter.otlp "tempo" {

  client {

    endpoint = "tempo:4317"

    tls { insecure = true }

  }

}

Via OpenTelemetry Collector

exporters:

  otlp:

    endpoint: tempo:4317

    tls:

      insecure: true

    # For multi-tenancy:

    headers:

      x-scope-orgid: my-tenant

service:

  pipelines:

    traces:

      receivers: [otlp]

      exporters: [otlp]

Direct HTTP (OTLP)

curl -X POST -H 'Content-Type: application/json' \

  http://localhost:4318/v1/traces \

  -d '{"resourceSpans": [{"resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "my-service"}}]}, "scopeSpans": [{"spans": [{"traceId": "5B8EFFF798038103D269B633813FC700", "spanId": "EEE19B7EC3C1B100", "name": "my-op", "startTimeUnixNano": 1689969302000000000, "endTimeUnixNano": 1689969302500000000, "kind": 2}]}]}]}'

Metrics from Traces

Enable Metrics Generator

metrics_generator:

  storage:

    path: /var/tempo/generator/wal

    remote_write:

      - url: http://prometheus:9090/api/v1/write

        send_exemplars: true

overrides:

  defaults:

    metrics_generator:

      processors: [service-graphs, span-metrics]

Processor Types

Service Graphs: Visualizes service topology and latency

Output: traces_service_graph_request_total, traces_service_graph_request_failed_total, duration histograms

Span Metrics: RED metrics per span

Output: traces_spanmetrics_calls_total, traces_spanmetrics_duration_seconds_*

Labels: service, span_name, span_kind, status_code + custom dimensions

Local Blocks: Enables TraceQL metrics queries on recent data

Multi-Tenancy

# Enable in Tempo config

multitenancy_enabled: true

All requests require X-Scope-OrgID header.

# OpenTelemetry Collector

exporters:

  otlp:

    headers:

      x-scope-orgid: tenant-id

# Grafana datasource

jsonData:

  httpHeaderName1: "X-Scope-OrgID"

secureJsonData:

  httpHeaderValue1: "tenant-id"

Grafana Integration

Data Source Configuration

datasources:

  - name: Tempo

    type: tempo

    url: http://tempo:3200

    jsonData:

      # Link traces to logs

      tracesToLogsV2:

        datasourceUid: loki-uid

        filterByTraceID: true

        tags: [{key: "service.name", value: "app"}]

      # Link traces to metrics

      tracesToMetrics:

        datasourceUid: prometheus-uid

        tags: [{key: "service.name", value: "service"}]

        queries:

          - name: Error Rate

            query: 'sum(rate(traces_spanmetrics_calls_total{$$__tags, status_code="STATUS_CODE_ERROR"}[5m]))'

      # Link traces to profiles (Pyroscope)

      tracesToProfiles:

        datasourceUid: pyroscope-uid

        tags: [{key: "service.name", value: "service_name"}]

      # Service map from span metrics

      serviceMap:

        datasourceUid: prometheus-uid

Key Grafana Features

Explore > Tempo: Search by TraceQL, trace ID, or tag filters

Service Graph tab: Visual service topology with RED metrics

Traces Drilldown: /a/grafana-exploretraces-app - no TraceQL required

Exemplars: Click metric spike -> jump directly to responsible trace

Derived fields in Loki: Click trace ID in log -> jump to trace in Tempo

API Quick Reference

# Search traces

GET /api/search?q={status=error}&#x26;limit=20&#x26;start=<unix>&#x26;end=<unix>

# Get trace by ID

GET /api/traces/<traceID>

GET /api/v2/traces/<traceID>

# List all tag names

GET /api/search/tags

# Get values for a tag

GET /api/search/tag/service.name/values

# TraceQL metrics (time series)

GET /api/metrics/query_range?q={status=error}|rate()&#x26;start=...&#x26;end=...&#x26;step=60

# Health check

GET /ready

Performance Tuning Summary

Problem

Solution

Slow searches

Scale queriers horizontally; scale compactors to reduce block count

High memory on queriers

Reduce max_concurrent_queries; lower target_bytes_per_job

High memory on ingesters

Reduce max_block_bytes; lower per-tenant trace limits

Slow attribute queries

Add dedicated Parquet columns for frequent attributes

Cache miss rate high

Increase cache size; tune cache_min_compaction_level

Rate limited (429)

Raise max_outstanding_per_tenant or increase per-tenant ingestion limits

Memcached connection errors

Increase memcached connection limit (-c 4096)

Best Practices

Instrumentation

Follow OpenTelemetry semantic conventions for attribute names

Use span. prefix for span attributes, resource. for process context

Keep attributes meaningful - avoid metrics/logs as span attributes

Limit attributes to max ~128 per span (OTel default)

Use span linking for batch processing (instead of huge fan-out traces)

Create spans for: external calls, significant loops, operations with variable latency

Avoid creating spans for every function call

Deployment

Use replication factor 3 for production HA

Object storage required for distributed deployments (not local)

Enable dedicated attribute columns for your most-queried attributes

Set appropriate block retention per tenant via overrides

Monitor tempo_ingester_live_traces to detect memory pressure early

Querying

Use time bounds (start/end) to limit search scope

Use structural operators for root cause analysis patterns

Prefer attribute != nil for existence checks

Use with (most_recent=true) when you need deterministic recent results

Scope tag discovery with a TraceQL query to reduce noise

Ports Reference

Port

Protocol

Purpose

3200

HTTP

Tempo API (queries, search, health)

9095

gRPC

Internal component communication

4317

gRPC

OTLP trace ingestion

4318

HTTP

OTLP trace ingestion

14268

HTTP

Jaeger Thrift HTTP ingestion

14250

gRPC

Jaeger gRPC ingestion

6831

UDP

Jaeger Thrift Compact

6832

UDP

Jaeger Thrift Binary

9411

HTTP

Zipkin ingestion

7946

TCP/UDP

Memberlist gossip

tempo

SKILL.md

Architecture Overview

Core Components

TraceQL - The Query Language

Attribute Scopes

Operators

Essential Examples

TraceQL Metrics

Deployment

Quick Start (Docker Compose)

Kubernetes (Helm)

Sending Traces to Tempo

Via Grafana Alloy (Recommended)

Via OpenTelemetry Collector

Direct HTTP (OTLP)

Metrics from Traces

Enable Metrics Generator

Processor Types

Multi-Tenancy

Grafana Integration

Data Source Configuration

Key Grafana Features

API Quick Reference

Performance Tuning Summary

Best Practices

Instrumentation

Deployment

Querying

Ports Reference

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers