SKILL.md
Cross-source join required: If the query must combine host data with logs or other
telemetry sources (e.g. "show logs from Linux hosts with their IP addresses") → also read
dt-dql-essentials/references/smartscape-topology-navigation.md before writing the query.
Core Concepts
Entities
- HOST - Physical or virtual machines (cloud or on-premise)
- PROCESS - Running processes and process groups
- CONTAINER - Kubernetes containers
- NETWORK_INTERFACE - Host network interfaces
- DISK - Host disk volumes
Metrics Categories
- Host Metrics - dt.host.cpu.*, dt.host.memory.*, dt.host.disk.*, dt.host.net.*
- Process Metrics - dt.process.cpu.*, dt.process.memory.*, dt.process.io.*, dt.process.network.*
- Inventory - OS type, cloud provider, technology stack, versions
- Cost - dt.cost.costcenter, dt.cost.product
- Quality - Metadata completeness, version compliance
Alert Thresholds
- CPU/Memory/Disk: 80% warning, 90% critical
- Network: >70% high, >85% saturated
- Disk Latency: >20ms bottleneck
- Network Errors: Drop rate >1%, error rate >0.1%
- Swap: >30% warning, >50% critical
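The thresholds above can be applied mechanically once a query has produced a percentage. The following is a minimal Python sketch, not part of any Dynatrace API: the `LEVELS` table and the `classify` helper are hypothetical names, and the strict greater-than comparison is an assumption carried over from the ">" notation used for the network and swap thresholds.

```python
# Hypothetical helper: maps a percentage to the severity labels listed above.
# Each entry is ordered from the highest limit to the lowest.
LEVELS = {
    "cpu": [(90.0, "critical"), (80.0, "warning")],
    "memory": [(90.0, "critical"), (80.0, "warning")],
    "disk": [(90.0, "critical"), (80.0, "warning")],
    "network": [(85.0, "saturated"), (70.0, "high")],
    "swap": [(50.0, "critical"), (30.0, "warning")],
}


def classify(metric: str, value_pct: float) -> str:
    """Return the first label whose limit the value exceeds, else "ok"."""
    for limit, label in LEVELS[metric]:
        # Assumption: a value exactly on the limit does not trigger the level.
        if value_pct > limit:
            return label
    return "ok"
```

For example, `classify("cpu", 85.0)` yields `"warning"` and `classify("network", 72.0)` yields `"high"`.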
Key Workflows
1. Host Discovery and Classification
Discover hosts, classify by OS/cloud, inventory resources.
smartscapeNodes "HOST"
| fieldsAdd os.type, cloud.provider, host.logical.cpu.cores, host.physical.memory
| summarize host_count = count(), by: {os.type, cloud.provider}
| sort host_count desc
OS Types: LINUX, WINDOWS, AIX, SOLARIS, ZOS
→ For cloud-specific attributes, see [references/inventory-discovery.md](#cloud-specific-attributes)
2. Resource Utilization Monitoring
Monitor CPU, memory, disk, network across hosts.
timeseries {
cpu = avg(dt.host.cpu.usage),
memory = avg(dt.host.memory.usage),
disk = avg(dt.host.disk.used.percent)
}, by: {dt.smartscape.host}
| fieldsAdd host_name = getNodeName(dt.smartscape.host)
| filter arrayAvg(cpu) > 80 or arrayAvg(memory) > 80
| sort arrayAvg(cpu) desc
High utilization threshold: 80% warning, 90% critical
Key CPU Metrics:
- dt.host.cpu.usage - Total CPU utilization (0-100%)
- dt.host.cpu.idle - CPU idle time (inverse of usage; useful for anomaly detection)
- dt.host.cpu.user - CPU time in user mode
- dt.host.cpu.system - CPU time in kernel mode
- dt.host.cpu.iowait - CPU waiting for I/O (Linux only)
→ For detailed CPU analysis, see references/host-metrics.md
→ For memory breakdown, see references/host-metrics.md
#### Disk Free Space — Find Hosts with Most/Least Free Disk
timeseries disk_used_pct = avg(dt.host.disk.used.percent), by: {dt.smartscape.host}
| fieldsAdd host_name = getNodeName(dt.smartscape.host)
| fieldsAdd avg_disk_used = arrayAvg(disk_used_pct),
free_pct = 100 - arrayAvg(disk_used_pct)
| sort free_pct desc
| limit 10
3. Process Resource Analysis
Identify top resource consumers at process level.
timeseries {
cpu = avg(dt.process.cpu.usage),
memory = avg(dt.process.memory.usage)
}, by: {dt.smartscape.process}
| fieldsAdd process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(cpu) > 50
| sort arrayAvg(cpu) desc
| limit 20
→ For process I/O analysis, see references/process-monitoring.md
→ For process network metrics, see references/process-monitoring.md
4. Technology Stack Inventory
Discover and track software technologies and versions.
smartscapeNodes "PROCESS"
| fieldsAdd process.software_technologies
| expand tech = process.software_technologies
| fieldsAdd tech_type = tech[type], tech_version = tech[version]
| summarize process_count = count(), by: {tech_type, tech_version}
| sort process_count desc
Common Technologies: Java, Node.js, Python, .NET, databases, web servers, messaging systems
→ For version compliance checks, see references/inventory-discovery.md
5. Service Discovery via Ports
Map listening ports to services for security and inventory.
smartscapeNodes "PROCESS"
| fieldsAdd process.listen_ports, dt.process_group.detected_name
| filter isNotNull(process.listen_ports) and arraySize(process.listen_ports) > 0
| expand listen_port = process.listen_ports
| summarize process_count = count(), by: {listen_port, dt.process_group.detected_name}
| sort toLong(listen_port) asc
| limit 50
Well-known ports: 80 (HTTP), 443 (HTTPS), 22 (SSH), 3306 (MySQL), 5432 (PostgreSQL)
→ For comprehensive port mapping, see references/inventory-discovery.md
6. Container and Kubernetes Monitoring
Track container distribution and K8s workload types.
smartscapeNodes "CONTAINER"
| fieldsAdd k8s.cluster.name, k8s.namespace.name, k8s.workload.kind
| summarize container_count = count(), by: {k8s.cluster.name, k8s.workload.kind}
| sort k8s.cluster.name, container_count desc
Workload Types: deployment, daemonset, statefulset, job, cronjob
Note: Container image names/versions NOT available in smartscape.
→ For K8s version tracking, see references/container-monitoring.md
→ For container lifecycle, see references/container-monitoring.md
7. Cost Attribution and Chargeback
Calculate infrastructure costs by cost center.
smartscapeNodes "HOST"
| fieldsAdd dt.cost.costcenter, host.logical.cpu.cores, host.physical.memory
| filter isNotNull(dt.cost.costcenter)
| fieldsAdd memory_gb = toDouble(host.physical.memory) / 1024 / 1024 / 1024
| summarize
host_count = count(),
total_cores = sum(toLong(host.logical.cpu.cores)),
total_memory_gb = sum(memory_gb),
by: {dt.cost.costcenter}
| sort total_cores desc
→ For product-level cost tracking, see references/inventory-discovery.md
8. Infrastructure Health Correlation
Correlate host and process metrics for cross-layer analysis.
timeseries {
host_cpu = avg(dt.host.cpu.usage),
host_memory = avg(dt.host.memory.usage),
process_cpu = avg(dt.process.cpu.usage)
}, by: {dt.smartscape.host, dt.smartscape.process}
| fieldsAdd
host_name = getNodeName(dt.smartscape.host),
process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(host_cpu) > 70
| sort arrayAvg(host_cpu) desc
Health scoring: Critical if any resource >90%, warning if >80%
→ For multi-resource saturation detection, see references/host-metrics.md
Response Construction
When the user asks for data retrieval or a DQL query (e.g., "show me top hosts by
CPU"), include the DQL query in the response alongside the results. Users want to
see and reuse the query — it is the deliverable, not just a means to get results.
When the user asks for analysis (anomaly detection, forecasting, seasonality), the
analysis results are the deliverable. Focus on presenting findings clearly:
- Prioritize metric-level findings over data collection artifacts. If an analysis
tool reports data gaps alongside actual anomalies, lead with the metric behavior
the user asked about and mention gaps only as supplementary context.
- Include host names (not just IDs) using getNodeName(dt.smartscape.host) or the get-entity-name tool.
- State the timeframe analyzed and the tools/parameters used.
Analytical Workflows
Host metric queries often serve as inputs to analytical tools (anomaly detection,
forecasting, seasonality analysis). This skill helps construct the right DQL query;
the actual analysis is performed by dedicated tools.
Anomaly Detection and Pattern Analysis
When users ask about "unusual behavior", "anomalies", "spikes", or "sudden changes"
in host metrics, the workflow is:
- Construct the timeseries query using this skill's patterns
- Pass it to the appropriate analysis tool (anomaly detector, novelty detection)
Choosing between detectors:
- **adaptive-anomaly-detector** - use when the user asks about magnitude: "spikes",
"abrupt changes", "values that went above normal", "sudden jumps". It answers "did this
metric cross an unexpected threshold?" and reports alert durations and peak values.
- **timeseries-novelty-detection** - use when the user asks about behavioral change:
"unusual patterns", "something changed", "trends", "new behavior". It answers "did the
shape of the signal change?" without implying a specific threshold was crossed.
Response format for anomaly results: Include both the host name (resolved via
getNodeName(dt.smartscape.host) or get-entity-name) and the host entity ID alongside timestamps and values.
Entity IDs alone are opaque to users; names alone prevent follow-up queries.
Novelty type selection rule: When using novelty detection, set
analysisNoveltyType to only [SPIKE, CHANGE_IN_VALUES, TREND_IN_VALUES] by default.
EXCLUDE GAP_WITH_MISSING_VALUES and CHANGE_IN_MISSING_VALUES unless the user
explicitly asks about data gaps or monitoring coverage. Data gaps are infrastructure
issues, not metric behavior anomalies — reporting them when the user asks about CPU
or memory patterns is incorrect.
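The selection rule above amounts to a default list plus an opt-in extension. Here is a small Python sketch of that logic, assuming the caller has already decided whether the user asked about data gaps. The function name `novelty_types` and its parameter are hypothetical; only the type names and `analysisNoveltyType` come from this document, and how the list is passed to the tool is not specified here.

```python
from typing import List

# Default types: metric behavior only.
METRIC_NOVELTY_TYPES = ["SPIKE", "CHANGE_IN_VALUES", "TREND_IN_VALUES"]
# Gap-related types: only relevant for data-gap or coverage questions.
GAP_NOVELTY_TYPES = ["GAP_WITH_MISSING_VALUES", "CHANGE_IN_MISSING_VALUES"]


def novelty_types(user_asked_about_gaps: bool = False) -> List[str]:
    """Build the value for analysisNoveltyType (hypothetical helper)."""
    types = list(METRIC_NOVELTY_TYPES)
    if user_asked_about_gaps:
        # Include gap types only on an explicit request.
        types += GAP_NOVELTY_TYPES
    return types
```

Calling `novelty_types()` returns only the three metric-behavior types, which matches the default described above.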
Queries for analysis tools should use simple timeseries format with a single
aggregated metric and appropriate time range:
timeseries avg(dt.host.cpu.idle), by: {dt.smartscape.host}
timeseries avg(dt.host.memory.usage), by: {dt.smartscape.host}
Avoid adding filters or field transformations that reduce the data — the analysis
tools work best with complete timeseries data.
Forecasting
When users ask to "predict", "forecast", or "estimate future" host metrics:
- Construct the timeseries query with sufficient historical data (e.g., 7d for
short-term, 30d for longer predictions)
- Pass to the forecasting tool with the desired forecast horizon
The forecast horizon (how far ahead to predict) and the historical window (how much
past data the model trains on) are independent. A request like "forecast the next 2 hours"
sets the horizon to 2h — it says nothing about the lookback. Always use at least 7 days of
historical data regardless of how short the forecast horizon is. Too few training data points
cause the forecast model to fail and fall back to raw historical values.
timeseries avg(dt.host.cpu.usage), by: {dt.smartscape.host}
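To make the independence of horizon and lookback concrete, the sketch below builds the query string with a lookback that never drops under 7 days. It is an illustration in Python under stated assumptions: `forecast_query` and `MIN_LOOKBACK` are hypothetical names, and the `from: now()-Nd` timeframe parameter is assumed to be the way the historical window is set on the timeseries command. The forecast horizon is deliberately not an input, since it is passed to the forecasting tool separately.

```python
import math
from datetime import timedelta
from typing import Optional

# Minimum history stated above: at least 7 days, whatever the horizon.
MIN_LOOKBACK = timedelta(days=7)


def forecast_query(metric: str, requested_lookback: Optional[timedelta] = None) -> str:
    """Return a timeseries query whose lookback is at least MIN_LOOKBACK."""
    lookback = max(requested_lookback or MIN_LOOKBACK, MIN_LOOKBACK)
    # Round up to whole days for the relative timeframe.
    days = math.ceil(lookback / timedelta(days=1))
    # Assumption: the timeframe is given via the from: parameter.
    return f"timeseries avg({metric}), by: {{dt.smartscape.host}}, from: now()-{days}d"
```

A request such as "forecast the next 2 hours" carries no lookback, so `forecast_query("dt.host.cpu.usage")` still produces a 7-day window; passing `timedelta(days=30)` widens it for longer predictions.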
Seasonality Detection
When users ask about "seasonality", "weekly patterns", or "recurring behavior":
- Use a longer time range (at least 14d for weekly, 30d+ for monthly)
- Pass to the seasonal baseline anomaly detector
Response format for seasonal analysis: When presenting results, include:
- Whether seasonal anomalies were detected (yes/no)
- The analysis timeframe and parameters used
- For each affected host: host name (not just ID), timestamps of violations, violation
counts, baseline values vs actual values, and upper/lower bounds
- Organize results by host if multiple hosts are involved
Scope Boundary — Service-Level vs Host-Level Metrics
This skill covers host and process infrastructure metrics only. If the user asks
about service-level metrics (request rate, response time, error rate, service calls per
minute, throughput), use dt-obs-services instead — even when the question involves
forecasting or anomaly detection of those metrics.
**Redirect these to dt-obs-services:** "service calls per minute", "request rate",
"response time by service", "error rate by endpoint", "service throughput forecast".
Common Query Patterns
Pattern 1: Smartscape Discovery
Use smartscapeNodes to discover and classify entities.
smartscapeNodes "HOST"
| fieldsAdd <attributes>
| filter <conditions>
| summarize <aggregations>
Pattern 2: Timeseries Performance
Use timeseries to analyze metrics over time.
timeseries metric = avg(dt.host.<metric>), by: {dt.smartscape.host}
| fieldsAdd <calculations>
| filter <thresholds>
Pattern 3: Cross-Layer Correlation
Correlate host and process metrics.
timeseries {
host_cpu = avg(dt.host.cpu.usage),
process_cpu = avg(dt.process.cpu.usage)
}, by: {dt.smartscape.host, dt.smartscape.process}
Pattern 4: Entity Enrichment with Lookup
Enrich data with entity attributes. After the lookup, reference the joined fields with the lookup. prefix (for example, lookup.cpuCores).
timeseries cpu = avg(dt.host.cpu.usage), by: {dt.smartscape.host}
| lookup [
smartscapeNodes "HOST"
| fields id, cpuCores, memoryTotal
], sourceField:dt.smartscape.host, lookupField:id
| fieldsAdd cores = lookup.cpuCores, mem_gb = lookup.memoryTotal / 1024 / 1024 / 1024
Tags and Metadata
Important Notes
- The generic tags field is NOT populated in smartscape queries
- Use specific tag fields: tags:azure[*], tags:environment
- Use custom metadata: host.custom.metadata[*]
Available Tags
- Azure Tags: tags:azure[dt_owner_team], tags:azure[dt_cloudcost_capability]
- Environment: tags:environment
- Custom Metadata: host.custom.metadata[OperatorVersion], host.custom.metadata[Cluster]
- Cost: dt.cost.costcenter, dt.cost.product
→ For complete tag reference, see [references/inventory-discovery.md](#tags-and-metadata)
Cloud-Specific Attributes
AWS
- cloud.provider == "aws"
- aws.region, aws.availability_zone, aws.account.id
- aws.resource.id, aws.resource.name
- aws.state (running, stopped, terminated)
Azure
- cloud.provider == "azure"
- azure.location, azure.subscription, azure.resource.group
- azure.status, azure.provisioning_state
- azure.resource.sku.name (VM size)
Kubernetes
- k8s.cluster.name, k8s.cluster.uid
- k8s.namespace.name, k8s.node.name, k8s.pod.name
- k8s.workload.name, k8s.workload.kind
→ For multi-cloud analysis, see references/inventory-discovery.md
Best Practices
- Use percentiles (p95, p99) for latency; max() for limits; avg() for trends
- Set multi-level thresholds (warning 80%, critical 90%)
- Filter early in the pipeline; limit results with | limit N
- Aggregate before enrichment (lookup)
- Use getNodeName(dt.smartscape.host) for human-readable host names; getNodeName(dt.smartscape.process) for processes
- Convert bytes to GB: / 1024 / 1024 / 1024; round with round(value, decimals: 1)
Time windows: Real-time: 5-15 min | Trends: 1-7 days | Capacity planning: 30-90 days
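As a quick check of the byte conversion in the list above, the same arithmetic can be sketched in Python. The helper name `bytes_to_gb` is hypothetical; it mirrors dividing by 1024 three times and rounding to one decimal, and is not a DQL function.

```python
def bytes_to_gb(value_bytes: float, decimals: int = 1) -> float:
    """Convert a raw byte count to GB (1024-based) and round it."""
    return round(value_bytes / 1024 / 1024 / 1024, decimals)
```

A host reporting 17179869184 bytes of physical memory comes out as 16.0 GB.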
Limitations
- dt.host.cpu.iowait is available on Linux only
- The generic tags field is NOT populated in smartscape (use specific tag namespaces)
- Container image names are NOT available in smartscape
Troubleshooting
| Problem | Cause | Solution |
| --- | --- | --- |
| No hosts returned from smartscapeNodes "HOST" | Missing time range or OneAgent not deployed | Verify OneAgent is installed; add a time range to the query |
| tags field always empty | Generic tags not populated in smartscape | Use specific tag namespaces: tags:azure[*], tags:environment, dt.cost.costcenter |
| Memory values in bytes are unreadable | Raw metric unit is bytes | Divide by 1024 / 1024 / 1024 and use round(value, decimals: 1) |
| dt.host.cpu.iowait returns no data | Metric is Linux-only | Check os.type; iowait is unavailable on Windows, AIX, Solaris |
| Container image names missing | Not available in smartscape | Use k8s.object parsing for image details; see dt-obs-kubernetes skill |
| process.software_technologies is empty | Process not monitored by deep injection | Verify OneAgent deep monitoring is enabled for the process group |
When to Load References
This skill uses progressive disclosure. Start here for 80% of use cases. Load reference files for detailed specifications when needed.
Load host-metrics.md when:
- Analyzing CPU component breakdown (user, system, iowait, steal)
- Investigating memory pressure and swap usage
- Troubleshooting disk I/O latency
- Diagnosing network packet drops or errors
Load process-monitoring.md when:
- Analyzing process-level I/O patterns
- Investigating TCP connection quality
- Detecting resource exhaustion (file descriptors, threads)
- Tracking GC suspension time
Load container-monitoring.md when:
- Analyzing container lifecycle and churn
- Tracking Kubernetes version distribution
- Managing OneAgent operator versions
- Planning K8s cluster upgrades
Load inventory-discovery.md when:
- Performing security audits via port discovery
- Implementing cost attribution and chargeback
- Validating data quality and metadata completeness
- Managing multi-cloud infrastructure
References
- host-metrics.md - Detailed host CPU, memory, disk, and network monitoring
- process-monitoring.md - Process-level CPU, memory, I/O, and network analysis
- container-monitoring.md - Container inventory, Kubernetes versions, and operator management
- inventory-discovery.md - Host/process discovery, technology inventory, cost attribution, and data quality