aiconfig-ai-metrics

Instrument an existing codebase with LaunchDarkly AI Config tracking. Walks the four-tier ladder (managed runner → provider package → custom extractor +…

INSTALLATION
npx skills add https://github.com/launchdarkly/agent-skills --skill aiconfig-ai-metrics
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md


Workflow

1. Explore the existing call site

Before picking a tier, find the provider call and answer these questions:

  • Shape? Is it a chat loop (history + turn-based), a one-shot completion, an agent step, or something else? → drives Tier 1 vs 2.
  • Framework? Raw provider SDK? LangChain / LangGraph? Vercel AI SDK? CrewAI? Strands? → drives which Tier-2 provider package (if any) applies.
  • Provider? OpenAI, Anthropic, Bedrock, Gemini, Azure, custom HTTP? → cross-reference with the package availability matrix below.
  • Streaming? If yes, you'll need TTFT tracking, which means Tier 4 for the TTFT part even if the rest is Tier 2.
  • Language? Python or Node? Provider-package coverage differs between them.
  • Already using an AI Config? If not, route to aiconfig-create first — tracking requires a tracker, which is obtained by calling create_tracker() / createTracker() on the config object returned by completion_config() / completionConfig() / createModel().
  • On the current SDK API? If the call site uses aiclient.config(...) / aiClient.config(...) or constructs an AIConfig(...) / LDAIConfig default, it's on the pre-0.20 surface. Migrate it as part of this work before adding tracking:
      • aiclient.config(...) → aiclient.completion_config(...) for one-shot/chat, or aiclient.agent_config(...) for agent mode (mirror the call signature). Node is the same with camelCase.
      • AIConfig(...) default → AICompletionConfigDefault(...) or AIAgentConfigDefault(...) (Node: LDAICompletionConfigDefault / LDAIAgentConfigDefault). AIConfig is the base class the SDK returns, not a valid default-value constructor; use the typed *Default variants instead.
      • If the result was being tuple-unpacked (config, tracker = aiclient.config(...)), drop the unpack, because the new methods return a single config object. Obtain the tracker via config.create_tracker() / aiConfig.createTracker().
      • For deeper rewrites (call sites with hardcoded model/prompt as well), hand off to aiconfig-migrate instead of doing the full migration here.
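The migration in the last bullets can be sketched as below. `StubAIClient`, `StubConfig`, and `StubTracker` are hypothetical stand-ins so the snippet runs without the LaunchDarkly SDK. Only the method names `completion_config()` and `create_tracker()` come from the text above. The config key, context, and argument order are assumptions, so mirror whatever the existing `config(...)` call passes.

```python
from dataclasses import dataclass


@dataclass
class StubTracker:
    """Hypothetical stand-in for the SDK tracker."""
    config_key: str


@dataclass
class StubConfig:
    """Hypothetical stand-in for the object completion_config() returns."""
    key: str
    enabled: bool = True

    def create_tracker(self) -> StubTracker:
        # Post-0.20 shape: the tracker is obtained from the config object.
        return StubTracker(config_key=self.key)


class StubAIClient:
    """Hypothetical stand-in for the AI client; the arguments are illustrative."""

    def completion_config(self, key, context, default):
        # Returns a single config object, not a (config, tracker) tuple.
        return StubConfig(key=key)


aiclient = StubAIClient()
context = {"key": "user-123"}   # placeholder evaluation context
default = object()              # placeholder for AICompletionConfigDefault(...)

# Before (pre-0.20 surface, shown only as a comment):
#   config, tracker = aiclient.config("support-bot", context, default)

# After: one return value, then ask the config for its tracker.
config = aiclient.completion_config("support-bot", context, default)
tracker = config.create_tracker()
```

The visible change at the call site is one fewer unpacked value plus one extra line to create the tracker.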

2. Look up your Tier-2 option

Use this matrix to decide whether Tier 2 (provider package) is available for your situation. If it's not, drop to Tier 3 (custom extractor). If the shape is chat-loop, go to Tier 1 first regardless of what's in this matrix.

| Framework / provider | Python provider package | Node provider package | Reference |
|---|---|---|---|
| OpenAI (direct SDK) | launchdarkly-server-sdk-ai-openai | @launchdarkly/server-sdk-ai-openai | openai-tracking.md |
| LangChain / LangGraph | launchdarkly-server-sdk-ai-langchain | @launchdarkly/server-sdk-ai-langchain | langchain-tracking.md |
| Vercel AI SDK | | @launchdarkly/server-sdk-ai-vercel | (use the Vercel provider docs) |
| AWS Bedrock (Converse or InvokeModel) | — (use LangChain-aws or custom extractor) | — (use LangChain-aws or custom extractor) | bedrock-tracking.md |
| Anthropic direct SDK | | | anthropic-tracking.md |
| Gemini / Google GenAI | | | gemini-tracking.md |
| Strands Agents | — (Tier 3 custom extractor) | — (Tier 3 custom extractor) | strands-tracking.md |
| Cohere, Mistral, custom HTTP | Tier 3 custom extractor | | |
| Any provider, streaming + TTFT | — (Tier 4 only) | trackStreamMetricsOf (no TTFT) + manual TTFT | streaming-tracking.md |
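The decision rule from steps 1 and 2 can be written as a small lookup. This is a sketch, not part of the skill. `CallSite`, `pick_tiers`, and the lowercase framework keys are hypothetical names, and the package table lists only the rows of the matrix whose package names are stated. Adding Tier 4 for every streaming call site, including chat loops, is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List

# Provider packages named in the matrix above. Any other combination is
# treated as "no Tier-2 package" in this sketch.
TIER2_PACKAGES = {
    ("openai", "python"): "launchdarkly-server-sdk-ai-openai",
    ("openai", "node"): "@launchdarkly/server-sdk-ai-openai",
    ("langchain", "python"): "launchdarkly-server-sdk-ai-langchain",
    ("langchain", "node"): "@launchdarkly/server-sdk-ai-langchain",
    ("vercel", "node"): "@launchdarkly/server-sdk-ai-vercel",
}


@dataclass
class CallSite:
    """Answers to the step-1 questions (hypothetical structure)."""
    shape: str         # e.g. "chat-loop", "one-shot", "agent-step"
    framework: str     # lowercase key, e.g. "openai", "langchain", "cohere"
    language: str      # "python" or "node"
    streaming: bool = False


def pick_tiers(site: CallSite) -> List[int]:
    """Return the tier(s) to implement, lowest-effort tier first."""
    if site.shape == "chat-loop":
        tiers = [1]                      # chat loops go to Tier 1 first
    elif (site.framework, site.language) in TIER2_PACKAGES:
        tiers = [2]                      # a provider package exists
    else:
        tiers = [3]                      # fall back to a custom extractor
    if site.streaming:
        tiers.append(4)                  # TTFT needs a manual Tier-4 call
    return tiers
```

For example, `pick_tiers(CallSite("one-shot", "openai", "python", streaming=True))` yields Tier 2 for the call itself plus Tier 4 for the TTFT part.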

3. Implement from the matching reference

Once you know the tier and the provider, open the reference file and follow the pattern. The references are written so Tier 1 is always the first example, Tier 2/3 next, and Tier 4 last. Stop at the first tier that matches the app's shape.

Guardrails that apply to every tier:

  • Always check config.enabled before making the tracked call. A disabled config means the user has flagged the feature off, so short-circuit to whatever fallback the app uses (cached response, error, degraded path) rather than making the provider call at all.
  • Wrap the existing call, don't rewrite it. Tier 2 and Tier 3 are designed to slot around an unmodified provider call. If you find yourself rewriting the call to fit the tracker, you're at the wrong tier; drop down one.
  • Errors are handled inside trackMetricsOf. The wrapper catches exceptions, records trackError() internally, and re-raises. Do not add except: tracker.trackError() on top: it's a no-op that also trips the at-most-once guard. Tier 1 handles both paths automatically. At Tier 4 (manual, streaming, track_duration_of), the caller owns the error-tracking call.
  • Always flush before close. Call flush() on the LaunchDarkly client (Python: ldclient.get().flush(); Node: await ldClient.flush()) before closing it. Otherwise trailing events can be lost, in short-lived scripts and long-running services alike. In Node, ldClient.close() returns a Promise; await it.
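The first three guardrails fit together roughly as follows. `StubTracker` and `StubConfig` are hypothetical stand-ins that imitate the behaviour described above (record the error inside the wrapper, then re-raise). They are not the SDK classes, and the response shape returned by `call_provider` is invented for illustration.

```python
class StubTracker:
    """Hypothetical stand-in imitating the wrapper behaviour described above."""

    def __init__(self):
        self.events = []

    def track_metrics_of(self, extractor, fn):
        try:
            result = fn()
        except Exception:
            self.events.append("error")    # recorded inside the wrapper...
            raise                           # ...then re-raised to the caller
        self.events.append(("tokens", extractor(result)))
        self.events.append("success")
        return result


class StubConfig:
    """Hypothetical stand-in for an AI Config object."""

    def __init__(self, enabled):
        self.enabled = enabled
        self.tracker = StubTracker()

    def create_tracker(self):
        return self.tracker


def call_provider():
    # Placeholder for the existing, unmodified provider call.
    return {"text": "hello", "usage": {"input": 3, "output": 1, "total": 4}}


def extract_usage(response):
    # Custom extractor: pull token counts out of the provider response.
    return response["usage"]


def answer(config, provider_call=call_provider, fallback="unavailable"):
    if not config.enabled:
        return fallback   # short-circuit: no provider call, no tracking
    tracker = config.create_tracker()
    # No try/except around this call: the wrapper already records
    # failures and re-raises them.
    response = tracker.track_metrics_of(extract_usage, provider_call)
    return response["text"]
```

The provider call is passed in untouched, which is the "wrap, don't rewrite" guardrail in practice.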

4. Verify

Confirm the Monitoring tab fills in:

  • Run one real request through the instrumented path.
  • Open the AI Config in LaunchDarkly → Monitoring tab. Duration, token counts, and generation counts should appear within 1–2 minutes.
  • Force an error (bad API key, zero max_tokens, whatever) and confirm the error count increments.
  • If streaming: verify TTFT appears. If it doesn't, you probably wrapped the stream creation with trackMetricsOf but didn't add the manual trackTimeToFirstToken call — see streaming-tracking.md.
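If TTFT is missing, the manual call usually looks something like the sketch below. `StubTracker` and `fake_stream` are hypothetical stand-ins for the SDK tracker and a provider's streaming response. Only the method name `track_time_to_first_token` and its millisecond unit come from this document. Rounding to a whole number of milliseconds is an assumption.

```python
import time


class StubTracker:
    """Hypothetical stand-in that just remembers the reported TTFT."""

    def __init__(self):
        self.ttft_ms = None

    def track_time_to_first_token(self, ms):
        self.ttft_ms = ms


def fake_stream():
    """Hypothetical stand-in for a provider's streaming response."""
    for chunk in ("Hel", "lo", "!"):
        time.sleep(0.01)   # simulate network latency per chunk
        yield chunk


def consume_with_ttft(stream_factory, tracker):
    """Consume a stream and report time-to-first-token exactly once."""
    start = time.perf_counter()
    chunks = []
    for chunk in stream_factory():
        if not chunks:
            # First chunk arrived: report elapsed time in milliseconds.
            elapsed_ms = (time.perf_counter() - start) * 1000
            tracker.track_time_to_first_token(int(round(elapsed_ms)))
        chunks.append(chunk)
    return "".join(chunks)


tracker = StubTracker()
text = consume_with_ttft(fake_stream, tracker)
```

The timer starts before the stream is created, so the measurement includes request setup as well as the wait for the first chunk.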

Quick reference: tracker methods

Obtain a tracker via the factory on the config object: tracker = config.create_tracker() (Python) or const tracker = aiConfig.createTracker() (Node). Call the factory once per execution and reuse the returned tracker for every call in that execution. Each factory invocation mints a new runId that tags every tracking event emitted by that tracker, so events from a single execution can be correlated (via exported events or downstream systems). The Monitoring tab currently aggregates events rather than grouping them by run, so the runId matters mainly when events are exported or queried outside the UI; it is also the identifier the SDK's at-most-once guards are keyed on.

The methods below are the raw API surface. Most of the time you should not call them individually; use trackMetricsOf or a Tier-1 managed runner. The list is here so you can recognize the methods in existing code and reach for the right one when you genuinely need Tier 4.

| Method (Python ↔ Node) | Tier | What it does |
|---|---|---|
| track_metrics_of(extractor, fn) / trackMetricsOf(extractor, fn) | 2 / 3 | Wraps a provider call, captures duration + success/error, calls your extractor for tokens. This is the default generic tracker. |
| track_metrics_of_async(extractor, fn) (Python) | 2 / 3 | Async variant of the above. |
| trackStreamMetricsOf(extractor, streamFn) (Node only) | 2 / 3 | Streaming variant. Captures per-chunk usage when the extractor handles chunks. Does not auto-capture TTFT. |
| track_duration(ms) / trackDuration(ms) | 4 | Record latency in milliseconds. |
| track_duration_of(fn) / trackDurationOf(fn) | 4 | Wraps a callable and records duration automatically. Does not capture tokens or success; pair with explicit calls. |
| track_tokens(TokenUsage) / trackTokens({input, output, total}) | 4 | Record token usage. |
| track_time_to_first_token(ms) / trackTimeToFirstToken(ms) | 4 | Record TTFT for streaming responses. |
| track_success() / trackSuccess() | 4 | Mark the generation as successful. Required for the Monitoring tab to count it. |
| track_error() / trackError() | 4 | Mark the generation as failed. Do not also call trackSuccess() in the same request. |
| track_feedback({kind}) / trackFeedback({kind}) | any | Record thumbs-up / thumbs-down from a feedback UI. Independent of the success/error path. |
| track_tool_call(name) / trackToolCall(name) | any | Record a single tool invocation by name. Available on both SDKs. |
| track_tool_calls([names]) / trackToolCalls([names]) | any | Batch variant: record a list of tool invocations in one call. |
| track_judge_result(result) / trackJudgeResult(result) | any | Record a programmatic judge evaluation. result.sampled indicates whether evaluation ran. |
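When Tier 4 is genuinely needed, the individual methods combine roughly as in this sketch. `StubTracker` and the local `TokenUsage` dataclass are hypothetical stand-ins that reuse the method and field names from the table. Recording duration even when the call raises is an assumption about the stub, not a statement about the SDK.

```python
import time
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Stand-in carrying the three fields the table lists for track_tokens."""
    input: int
    output: int
    total: int


class StubTracker:
    """Hypothetical stand-in that logs each tracking call as a tuple."""

    def __init__(self):
        self.events = []

    def track_duration_of(self, fn):
        start = time.perf_counter()
        try:
            return fn()
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.events.append(("duration_ms", elapsed_ms))

    def track_tokens(self, usage):
        self.events.append(("tokens", usage))

    def track_success(self):
        self.events.append(("success",))

    def track_error(self):
        self.events.append(("error",))


def run_tier4(tracker, provider_call):
    """Manual path: duration, then either error or tokens + success."""
    try:
        response = tracker.track_duration_of(provider_call)
    except Exception:
        tracker.track_error()      # at Tier 4 the caller owns this call
        raise
    tracker.track_tokens(TokenUsage(**response["usage"]))
    tracker.track_success()        # never alongside track_error()
    return response
```

Exactly one of `track_success()` or `track_error()` fires per request, matching the rule in the table.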

Related skills

  • aiconfig-create — prerequisite if the app doesn't have an AI Config yet
  • aiconfig-custom-metrics — business metrics (conversion, resolution, retention) layered on top of the AI metrics this skill captures
  • aiconfig-online-evals — automatic quality scoring (LLM-as-judge) on sampled live requests; complementary to the metrics here
  • aiconfig-migrate — Stage 4 of the hardcoded-to-AI-Configs migration delegates to this skill