tempo


INSTALLATION
npx skills add https://github.com/grafana/skills --skill tempo
Run in your project or agent environment. Adjust flags if your CLI version differs.


A trace represents the lifecycle of a request as it passes through multiple services. It consists of:

  • Spans: Individual units of work with start time, duration, attributes, and status
  • Trace ID: Shared identifier across all spans in a request
  • Parent-child relationships: Spans form a tree showing causality
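To make the span tree concrete, the minimal Python sketch below rebuilds the parent-child structure of one trace from a flat list of spans. The `Span` class and its fields are simplified, hypothetical stand-ins for OTLP span fields, not a Tempo or OpenTelemetry API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    # Hypothetical, simplified span record; real OTLP spans carry many more fields.
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    name: str


def build_tree(spans: List[Span]) -> Dict[Optional[str], List[Span]]:
    """Map each parent span ID to its child spans (key None -> root spans)."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_span_id, []).append(span)
    return children


def render(children: Dict[Optional[str], List[Span]],
           parent: Optional[str] = None, depth: int = 0) -> List[str]:
    """Depth-first walk that indents each span two spaces under its parent."""
    lines: List[str] = []
    for span in children.get(parent, []):
        lines.append("  " * depth + span.name)
        lines.extend(render(children, span.span_id, depth + 1))
    return lines


if __name__ == "__main__":
    trace = [
        Span("t1", "a", None, "GET /checkout"),
        Span("t1", "b", "a", "cart: load cart"),
        Span("t1", "c", "a", "payment: charge"),
        Span("t1", "d", "c", "postgres: INSERT"),
    ]
    print("\n".join(render(build_tree(trace))))
```

Every span shares the trace ID, and the parent span ID alone is enough to recover causality.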

Traces enable:

  • Root cause analysis for service outages
  • Understanding service dependencies
  • Identifying latency bottlenecks
  • Correlating events across microservices

Architecture Overview

Applications
    |
    | (OTLP 4317/4318, Jaeger 14250/14268, Zipkin 9411)
    v
[Distributor]  ----  hashes traceID, routes to N partitions
    |
 [Kafka]
    |---> [Live Stores]  (storage of recent data)
    |
    |---> [Block Builders] (Parquet block assembly, flush to object storage)
    |
    |---> [Metrics Generator]  (optional: derives RED metrics -> Prometheus)

Query path:

Grafana  -->  [Query Frontend]  (shards queries)
                    |
              [Querier pool]
              /           \
    [Live Stores]   [Object Storage]
    (recent)        (historical blocks)
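The distributor step in the write path ("hashes traceID, routes to N partitions") can be pictured with the Python sketch below. It uses 32-bit FNV-1a purely as an illustrative hash; Tempo's actual hash function and partition-assignment logic are assumptions here and may differ. The property that matters is that all spans of one trace share a trace ID and therefore land on the same partition.

```python
from typing import Dict, List


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (illustrative choice, not necessarily Tempo's)."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def partition_for(trace_id_hex: str, num_partitions: int) -> int:
    """Map a hex-encoded trace ID to a partition index in [0, num_partitions)."""
    return fnv1a_32(bytes.fromhex(trace_id_hex)) % num_partitions


def route(spans: List[Dict[str, str]], num_partitions: int) -> Dict[int, List[str]]:
    """Group span IDs by the partition their trace ID hashes to."""
    routed: Dict[int, List[str]] = {}
    for span in spans:
        p = partition_for(span["traceId"], num_partitions)
        routed.setdefault(p, []).append(span["spanId"])
    return routed


if __name__ == "__main__":
    spans = [
        {"traceId": "5B8EFFF798038103D269B633813FC700", "spanId": "EEE19B7EC3C1B100"},
        {"traceId": "5B8EFFF798038103D269B633813FC700", "spanId": "EEE19B7EC3C1B101"},
    ]
    # Same trace ID, so both spans end up in a single partition.
    print(route(spans, num_partitions=8))
```

Hashing on the trace ID (rather than, say, the span ID) is what lets downstream consumers assemble complete traces without cross-partition coordination.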

Core Components

| Component | Role | Default Ports |
|---|---|---|
| Distributor | Receives spans, routes by traceID hash | 4317 (gRPC), 4318 (HTTP) |
| Live Store | Buffers recent data on local disk and serves queries | - |
| Query Frontend | Query orchestrator, shards across queriers | 3200 (HTTP) |
| Querier | Executes search jobs against storage | - |
| Compactor | Merges blocks, enforces retention | - |
| Block Builder | Creates the final Parquet blocks and flushes to object storage | - |
| Metrics Generator | Derives RED metrics from spans | - |

TraceQL - The Query Language

TraceQL queries filter traces by span properties. Structure: { filters } | pipeline

Attribute Scopes

span.http.status_code        # span-level attribute
resource.service.name        # resource-level attribute (from SDK)
event.name                   # event-level attribute
name                         # intrinsic: span operation name
status                       # intrinsic: ok | error | unset
duration                     # intrinsic: span duration
kind                         # intrinsic: server | client | producer | consumer | internal
traceDuration                # intrinsic: entire trace duration
rootServiceName              # intrinsic: service of the root span
rootName                     # intrinsic: operation name of the root span

Operators

=   !=   >   <   >=   <=      # comparison
=~  !~                        # regex match (Go RE2)
&&  ||  !                     # logical
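When TraceQL is generated from code, string values need quoting and escaping. The Python sketch below builds a single spanset filter from the operators above. It assumes TraceQL string literals use Go-style escaping of `\` and `"`; the helper names and the `Raw` wrapper (for unquoted values such as durations like `1s` or enums like `error`) are hypothetical.

```python
from typing import Union


class Raw(str):
    """Marks a value to be emitted unquoted, e.g. Raw("1s") or Raw("error")."""


Value = Union[str, int, float, bool]
COMPARISON_OPS = {"=", "!=", ">", "<", ">=", "<=", "=~", "!~"}


def traceql_literal(value: Value) -> str:
    """Render a Python value as a TraceQL literal (assumed Go-style string escaping)."""
    if isinstance(value, Raw):  # check before str: Raw is a str subclass
        return str(value)
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def condition(attribute: str, op: str, value: Value) -> str:
    """Build one comparison such as: resource.service.name = "frontend"."""
    if op not in COMPARISON_OPS:
        raise ValueError(f"unsupported operator: {op}")
    return f"{attribute} {op} {traceql_literal(value)}"


def spanset(*conditions: str) -> str:
    """AND the conditions together inside one { ... } filter."""
    return "{ " + " && ".join(conditions) + " }"


if __name__ == "__main__":
    q = spanset(
        condition("resource.service.name", "=", "frontend"),
        condition("duration", ">", Raw("1s")),
    )
    print(q)  # { resource.service.name = "frontend" && duration > 1s }
```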

Essential Examples

# All errors
{ status = error }

# Slow requests from a service
{ resource.service.name = "frontend" && duration > 1s }

# HTTP 5xx errors
{ span.http.status_code >= 500 }

# Traces with at least 2 error spans
{ status = error } | count() >= 2

# Select specific fields
{ status = error } | select(span.http.url, duration, resource.service.name)

# Structural: server span with downstream error
{ kind = server } >> { status = error }

# Both conditions present in the same trace (any relationship)
{ span.db.system = "redis" } && { span.db.system = "postgresql" }

# Find most recent matches (deterministic)
{ resource.service.name = "api" } with (most_recent=true)
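To illustrate the difference between the structural operator `>>` and spanset `&&`, the Python sketch below evaluates both over a toy trace. For `{ kind = server } >> { status = error }` it returns the spans that match the right-hand filter and have an ancestor matching the left-hand filter. This models the query semantics only, not how Tempo executes TraceQL against Parquet blocks; the span dict layout (`id`, `parent`, `kind`, `status`) is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

Span = Dict[str, Optional[str]]
Pred = Callable[[Span], bool]


def descendants_matching(spans: List[Span], left: Pred, right: Pred) -> List[str]:
    """IDs of spans matching `right` that descend from a span matching `left`."""
    by_id = {s["id"]: s for s in spans}
    result: List[str] = []
    for span in spans:
        if not right(span):
            continue
        seen: Set[str] = set()  # guards against malformed, cyclic parent links
        parent_id = span["parent"]
        while parent_id is not None and parent_id not in seen:
            seen.add(parent_id)
            ancestor = by_id.get(parent_id)
            if ancestor is None:  # parent span missing from this trace
                break
            if left(ancestor):
                result.append(span["id"])
                break
            parent_id = ancestor["parent"]
    return result


def both_present(spans: List[Span], left: Pred, right: Pred) -> bool:
    """Spanset &&: the trace has at least one span matching each side."""
    return any(left(s) for s in spans) and any(right(s) for s in spans)


if __name__ == "__main__":
    trace = [
        {"id": "a", "parent": None, "kind": "server", "status": "ok"},
        {"id": "b", "parent": "a", "kind": "client", "status": "error"},
        {"id": "c", "parent": "b", "kind": "server", "status": "error"},
    ]
    is_server = lambda s: s["kind"] == "server"
    is_error = lambda s: s["status"] == "error"
    print(descendants_matching(trace, is_server, is_error))  # ['b', 'c']
```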

TraceQL Metrics

# Error rate per service
{ status = error } | rate() by (resource.service.name)

# P99 latency
{ kind = server } | quantile_over_time(duration, .99) by (resource.service.name)
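As a rough mental model of `{ status = error } | rate() by (resource.service.name)`, the Python sketch below buckets matching span start times into fixed steps and divides each bucket's count by the step length, giving spans per second per service. Tempo's real step alignment and execution are not modeled; the `(start_seconds, service, status)` tuple layout is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (start_time_seconds, service_name, status) per span; hypothetical layout.
SpanRow = Tuple[float, str, str]


def error_rate_by_service(spans: Iterable[SpanRow], start: float,
                          end: float, step: float) -> Dict[str, List[float]]:
    """Error spans per second for each service, one value per step in [start, end)."""
    num_steps = int((end - start) // step)
    counts: Dict[str, List[float]] = defaultdict(lambda: [0.0] * num_steps)
    for ts, service, status in spans:
        if status != "error" or not (start <= ts < end):
            continue  # the { status = error } filter plus the time window
        idx = int((ts - start) // step)
        if idx < num_steps:
            counts[service][idx] += 1
    # Convert per-step counts into per-second rates.
    return {svc: [c / step for c in values] for svc, values in counts.items()}


if __name__ == "__main__":
    rows = [(0.0, "api", "error"), (10.0, "api", "error"),
            (30.0, "api", "ok"), (65.0, "api", "error"), (70.0, "db", "error")]
    print(error_rate_by_service(rows, start=0, end=120, step=60))
```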

Deployment

Quick Start (Docker Compose)

git clone https://github.com/grafana/tempo.git
cd tempo/example/docker-compose/local
mkdir tempo-data
docker compose up -d
# Grafana at http://localhost:3000, Tempo API at http://localhost:3200
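Tempo can take a few seconds to become ready after `docker compose up -d`. A minimal standard-library Python sketch that polls the `/ready` endpoint on the Tempo API port from this document until it returns HTTP 200; the function names, timeout, and polling interval are illustrative choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """GET `url` and return its HTTP status code, or 0 if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # e.g. 503 while still starting
        return err.code
    except (urllib.error.URLError, OSError):  # connection refused, DNS, timeout
        return 0


def wait_until_ready(url: str = "http://localhost:3200/ready",
                     timeout_s: float = 60.0, interval_s: float = 2.0,
                     fetch: Callable[[str], int] = http_status) -> bool:
    """Poll until `fetch(url)` returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "Tempo did not become ready in time")
```

The `fetch` parameter exists so the polling logic can be exercised without a running Tempo instance.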

Kubernetes (Helm)

helm repo add grafana https://grafana.github.io/helm-charts
helm install tempo grafana/tempo-distributed \
  --set storage.trace.backend=s3 \
  --set storage.trace.s3.bucket=my-tempo-bucket \
  --set storage.trace.s3.region=us-east-1

Sending Traces to Tempo

Via Grafana Alloy (Recommended)

// config.alloy
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}

Via OpenTelemetry Collector

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
    # For multi-tenancy:
    headers:
      x-scope-orgid: my-tenant

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]

Direct HTTP (OTLP)

curl -X POST -H 'Content-Type: application/json' \
  http://localhost:4318/v1/traces \
  -d '{"resourceSpans": [{"resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "my-service"}}]}, "scopeSpans": [{"spans": [{"traceId": "5B8EFFF798038103D269B633813FC700", "spanId": "EEE19B7EC3C1B100", "name": "my-op", "startTimeUnixNano": 1689969302000000000, "endTimeUnixNano": 1689969302500000000, "kind": 2}]}]}]}'
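The same request can be sent from Python using only the standard library. This sketch generates random trace and span IDs in the hex form the curl example uses, and sends the 64-bit timestamps as decimal strings (the proto3 JSON encoding; plain numbers, as in the curl example, are also accepted). `make_otlp_payload` and `send` are hypothetical helper names; the endpoint is the local OTLP/HTTP port from this document.

```python
import json
import os
import time
import urllib.request


def make_otlp_payload(service_name: str, span_name: str, duration_ns: int) -> dict:
    """Build an OTLP/JSON body with one server span."""
    trace_id = os.urandom(16).hex()  # 16 bytes -> 32 hex chars
    span_id = os.urandom(8).hex()    # 8 bytes -> 16 hex chars
    start = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{"spans": [{
                "traceId": trace_id,
                "spanId": span_id,
                "name": span_name,
                "startTimeUnixNano": str(start),
                "endTimeUnixNano": str(start + duration_ns),
                "kind": 2,  # SPAN_KIND_SERVER
            }]}],
        }],
    }


def send(payload: dict, endpoint: str = "http://localhost:4318/v1/traces") -> int:
    """POST the payload to an OTLP/HTTP receiver and return the status code."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    print(send(make_otlp_payload("my-service", "my-op", 500_000_000)))
```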

Metrics from Traces

Enable Metrics Generator

metrics_generator:
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]

Processor Types

Service Graphs: Derives request, error, and latency metrics for calls between services, which Grafana uses to draw the service topology

  • Output: traces_service_graph_request_total, traces_service_graph_request_failed_total, duration histograms

Span Metrics: RED metrics per span

  • Output: traces_spanmetrics_calls_total, traces_spanmetrics_latency_* (histogram)
  • Labels: service, span_name, span_kind, status_code + custom dimensions

Local Blocks: Enables TraceQL metrics queries on recent data
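The span-metrics idea can be pictured as grouping spans by the label set listed above (service, span_name, span_kind, status_code) and accumulating a call counter plus total latency. The Python sketch below does this in memory; it is not Tempo's implementation, ignores histogram buckets and custom dimensions, and uses a hypothetical span dict layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (service, span_name, span_kind, status_code)
LabelKey = Tuple[str, str, str, str]


def span_metrics(spans: Iterable[dict]) -> Dict[LabelKey, Dict[str, float]]:
    """Aggregate call count and summed latency (seconds) per label set."""
    out: Dict[LabelKey, Dict[str, float]] = defaultdict(
        lambda: {"calls_total": 0, "latency_sum_seconds": 0.0})
    for s in spans:
        key = (s["service"], s["span_name"], s["span_kind"], s["status_code"])
        out[key]["calls_total"] += 1
        out[key]["latency_sum_seconds"] += s["duration_ns"] / 1e9
    return dict(out)


if __name__ == "__main__":
    spans = [
        {"service": "api", "span_name": "GET /", "span_kind": "SPAN_KIND_SERVER",
         "status_code": "STATUS_CODE_OK", "duration_ns": 200_000_000},
        {"service": "api", "span_name": "GET /", "span_kind": "SPAN_KIND_SERVER",
         "status_code": "STATUS_CODE_ERROR", "duration_ns": 900_000_000},
    ]
    for labels, values in span_metrics(spans).items():
        print(labels, values)
```

Because status_code is part of the label set, an error rate is simply the calls for the error label set divided by all calls for that service and span name.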

Multi-Tenancy

# Enable in Tempo config
multitenancy_enabled: true

All requests require the X-Scope-OrgID header.

# OpenTelemetry Collector
exporters:
  otlp:
    headers:
      x-scope-orgid: tenant-id

# Grafana datasource
jsonData:
  httpHeaderName1: "X-Scope-OrgID"
secureJsonData:
  httpHeaderValue1: "tenant-id"
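Scripts that call the Tempo API directly need the same tenant header. A minimal standard-library Python sketch; the base URL is a placeholder and the helper names are hypothetical.

```python
import urllib.request

TEMPO_URL = "http://localhost:3200"  # placeholder base URL


def tenant_request(path: str, tenant: str, base_url: str = TEMPO_URL) -> urllib.request.Request:
    """Build a GET request to the Tempo API scoped to one tenant."""
    # urllib stores the header name as "X-scope-orgid"; HTTP header names are
    # case-insensitive, so the server still receives X-Scope-OrgID.
    return urllib.request.Request(base_url + path, headers={"X-Scope-OrgID": tenant})


def get_for_tenant(path: str, tenant: str) -> bytes:
    """Send the tenant-scoped request and return the raw response body."""
    with urllib.request.urlopen(tenant_request(path, tenant), timeout=10) as resp:
        return resp.read()


if __name__ == "__main__":
    print(get_for_tenant("/api/search/tags", "tenant-id")[:200])
```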

Grafana Integration

Data Source Configuration

datasources:
  - name: Tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      # Link traces to logs
      tracesToLogsV2:
        datasourceUid: loki-uid
        filterByTraceID: true
        tags: [{key: "service.name", value: "app"}]
      # Link traces to metrics
      tracesToMetrics:
        datasourceUid: prometheus-uid
        tags: [{key: "service.name", value: "service"}]
        queries:
          - name: Error Rate
            query: 'sum(rate(traces_spanmetrics_calls_total{$$__tags, status_code="STATUS_CODE_ERROR"}[5m]))'
      # Link traces to profiles (Pyroscope)
      tracesToProfiles:
        datasourceUid: pyroscope-uid
        tags: [{key: "service.name", value: "service_name"}]
      # Service map from service-graph metrics
      serviceMap:
        datasourceUid: prometheus-uid

Key Grafana Features

  • Explore > Tempo: Search by TraceQL, trace ID, or tag filters
  • Service Graph tab: Visual service topology with RED metrics
  • Traces Drilldown: /a/grafana-exploretraces-app - no TraceQL required
  • Exemplars: Click metric spike -> jump directly to responsible trace
  • Derived fields in Loki: Click trace ID in log -> jump to trace in Tempo

API Quick Reference

# Search traces (URL-encode the q value in real requests)
GET /api/search?q={status=error}&limit=20&start=<unix>&end=<unix>

# Get trace by ID
GET /api/traces/<traceID>
GET /api/v2/traces/<traceID>

# List all tag names
GET /api/search/tags

# Get values for a tag
GET /api/search/tag/service.name/values

# TraceQL metrics (time series)
GET /api/metrics/query_range?q={status=error}|rate()&start=...&end=...&step=60

# Health check
GET /ready
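TraceQL contains characters such as `{`, `}`, `=` and `|` that must be percent-encoded in a query string. The Python sketch below builds search and trace-by-ID URLs for the endpoints above with `urllib.parse`; the helper names and the default one-hour window are illustrative assumptions.

```python
import time
from typing import Optional
from urllib.parse import quote, urlencode


def search_url(base_url: str, traceql: str, limit: int = 20,
               window_s: int = 3600, now: Optional[int] = None) -> str:
    """Build a /api/search URL covering the last `window_s` seconds."""
    end = int(time.time()) if now is None else now
    params = {"q": traceql, "limit": limit, "start": end - window_s, "end": end}
    # urlencode() percent-encodes the TraceQL and turns spaces into '+'.
    return f"{base_url}/api/search?{urlencode(params)}"


def trace_url(base_url: str, trace_id: str) -> str:
    """Build a /api/v2/traces/<traceID> URL."""
    return f"{base_url}/api/v2/traces/{quote(trace_id)}"


if __name__ == "__main__":
    print(search_url("http://localhost:3200", "{ status = error }"))
    print(trace_url("http://localhost:3200", "5B8EFFF798038103D269B633813FC700"))
```

Always passing `start`/`end` also follows the querying advice below about bounding the search window.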

Performance Tuning Summary

| Problem | Solution |
|---|---|
| Slow searches | Scale queriers horizontally; scale compactors to reduce block count |
| High memory on queriers | Reduce max_concurrent_queries; lower target_bytes_per_job |
| High memory on ingesters | Reduce max_block_bytes; lower per-tenant trace limits |
| Slow attribute queries | Add dedicated Parquet columns for frequent attributes |
| Cache miss rate high | Increase cache size; tune cache_min_compaction_level |
| Rate limited (429) | Query path: raise max_outstanding_per_tenant on the query frontend. Write path: increase per-tenant ingestion limits |
| Memcached connection errors | Increase memcached connection limit (-c 4096) |

Best Practices

Instrumentation

  • Follow OpenTelemetry semantic conventions for attribute names
  • Use span. prefix for span attributes, resource. for process context
  • Keep attributes meaningful - avoid metrics/logs as span attributes
  • Limit attributes to max ~128 per span (OTel default)
  • Use span linking for batch processing (instead of huge fan-out traces)
  • Create spans for: external calls, significant loops, operations with variable latency
  • Avoid creating spans for every function call

Deployment

  • Use replication factor 3 for production HA
  • Use object storage, not local disk, for distributed deployments
  • Enable dedicated attribute columns for your most-queried attributes
  • Set appropriate block retention per tenant via overrides
  • Monitor tempo_ingester_live_traces to detect memory pressure early

Querying

  • Use time bounds (start/end) to limit search scope
  • Use structural operators for root cause analysis patterns
  • Prefer attribute != nil for existence checks
  • Use with (most_recent=true) when you need deterministic recent results
  • Scope tag discovery with a TraceQL query to reduce noise

Ports Reference

| Port | Protocol | Purpose |
|---|---|---|
| 3200 | HTTP | Tempo API (queries, search, health) |
| 9095 | gRPC | Internal component communication |
| 4317 | gRPC | OTLP trace ingestion |
| 4318 | HTTP | OTLP trace ingestion |
| 14268 | HTTP | Jaeger Thrift HTTP ingestion |
| 14250 | gRPC | Jaeger gRPC ingestion |
| 6831 | UDP | Jaeger Thrift Compact |
| 6832 | UDP | Jaeger Thrift Binary |
| 9411 | HTTP | Zipkin ingestion |
| 7946 | TCP/UDP | Memberlist gossip |
