profiling-tables

Comprehensive statistical and quality analysis of database tables with structured profiling output. Generates column-level statistics tailored to data type: min/max/percentiles for numeric columns, length metrics for strings, date ranges for timestamps Performs cardinality analysis to identify categorical vs. high-cardinality columns and detect skewed distributions Assesses data quality across five dimensions: completeness (NULL rates), uniqueness (duplicates), freshness (update timestamps), validity (range/format checks), and consistency (logical contradictions) Outputs a structured profile including schema overview, key statistics, quality scores, identified issues, and recommended follow-up queries for new team members

INSTALLATION
npx skills add https://github.com/astronomer/agents --skill profiling-tables
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$2a

Run via run_sql:

SELECT

    COUNT(*) as total_rows,

    COUNT(*) / 1000000.0 as millions_of_rows

FROM <table>

Step 3: Column-Level Statistics

For each column, gather appropriate statistics based on data type:

Numeric Columns

SELECT

    MIN(column_name) as min_val,

    MAX(column_name) as max_val,

    AVG(column_name) as avg_val,

    STDDEV(column_name) as std_dev,

    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column_name) as median,

    SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count,

    COUNT(DISTINCT column_name) as distinct_count

FROM <table>

String Columns

SELECT

    MIN(LEN(column_name)) as min_length,

    MAX(LEN(column_name)) as max_length,

    AVG(LEN(column_name)) as avg_length,

    SUM(CASE WHEN column_name IS NULL OR column_name = '' THEN 1 ELSE 0 END) as empty_count,

    COUNT(DISTINCT column_name) as distinct_count

FROM <table>

Date/Timestamp Columns

SELECT

    MIN(column_name) as earliest,

    MAX(column_name) as latest,

    DATEDIFF('day', MIN(column_name), MAX(column_name)) as date_range_days,

    SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count

FROM <table>

Step 4: Cardinality Analysis

For columns that look like categorical/dimension keys:

SELECT

    column_name,

    COUNT(*) as frequency,

    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage

FROM <table>

GROUP BY column_name

ORDER BY frequency DESC

LIMIT 20

This reveals:

  • High-cardinality columns (likely IDs or unique values)
  • Low-cardinality columns (likely categories or status fields)
  • Skewed distributions (one value dominates)

Step 5: Sample Data

Get representative rows:

SELECT *

FROM <table>

LIMIT 10

If the table is large and you want variety, sample from different time periods or categories.

Step 6: Data Quality Assessment

Summarize quality across dimensions:

Completeness

  • Which columns have NULLs? What percentage?
  • Are NULLs expected or problematic?

Uniqueness

  • Does the apparent primary key have duplicates?
  • Are there unexpected duplicate rows?

Freshness

  • When was data last updated? (MAX of timestamp columns)
  • Is the update frequency as expected?

Validity

  • Are there values outside expected ranges?
  • Are there invalid formats (dates, emails, etc.)?
  • Are there orphaned foreign keys?

Consistency

  • Do related columns make sense together?
  • Are there logical contradictions?

Step 7: Output Summary

Provide a structured profile:

Overview

2-3 sentences describing what this table contains, who uses it, and how fresh it is.

Schema

Column

Type

Nulls%

Distinct

Description

...

...

...

...

...

Key Statistics

  • Row count: X
  • Date range: Y to Z
  • Last updated: timestamp

Data Quality Score

  • Completeness: X/10
  • Uniqueness: X/10
  • Freshness: X/10
  • Overall: X/10

Potential Issues

List any data quality concerns discovered.

Recommended Queries

3-5 useful queries for common questions about this data.

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card