Name: exploring-data-catalog
Author: aws

SKILL.md

1. Check Tools and AWS Access

Check for required tools and AWS access before discovery.

Constraints:

You MUST verify that the AWS MCP server tools (aws___call_aws, aws___search_documentation) are available, and fall back to the AWS CLI if they are not

You MUST confirm credentials are valid: aws sts get-caller-identity

You MUST inform the user about any missing tools and ask whether to proceed

2. Discover Catalogs

List catalogs in account:

aws glue get-catalogs --recursive --include-root

Classify each catalog by type:

| Field Present | Catalog Type | What It Contains |
| --- | --- | --- |
| Neither TargetRedshiftCatalog nor FederatedCatalog | Default (Glue) | Standard Glue databases and tables |
| FederatedCatalog.ConnectionName = aws:s3tables | S3 Tables | Managed Iceberg table buckets |
| TargetRedshiftCatalog | Redshift-federated | Redshift databases exposed as Glue catalogs |
| FederatedCatalog with ConnectionName ≠ aws:s3tables | Remote Iceberg | External catalogs (Snowflake, Databricks, Iceberg REST) |

Constraints:

You MUST include --include-root to capture the default account catalog

You MUST present a summary of catalog counts by type

If only the default catalog exists, you SHOULD skip the catalog overview and go to step 3
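
The classification table above can be sketched as a small Python helper. This is a minimal illustration, not part of the skill itself: it assumes each catalog is a dict shaped like one entry of the CatalogList array returned by get-catalogs, and the function names and sample catalogs are hypothetical. The table does not say what to do if both fields are present, so the order of the checks here is an assumption.

```python
from collections import Counter

S3_TABLES_CONNECTION = "aws:s3tables"


def classify_catalog(catalog):
    """Map one get-catalogs entry to the catalog type from the table above."""
    federated = catalog.get("FederatedCatalog")
    if federated and federated.get("ConnectionName") == S3_TABLES_CONNECTION:
        return "S3 Tables"
    if catalog.get("TargetRedshiftCatalog"):
        return "Redshift-federated"
    if federated:
        # Any other federated connection is treated as a remote Iceberg catalog.
        return "Remote Iceberg"
    return "Default (Glue)"


def summarize_catalogs(catalog_list):
    """Count catalogs per type, for the required summary."""
    return dict(Counter(classify_catalog(c) for c in catalog_list))


# Hypothetical sample data, trimmed to the fields the classifier reads.
sample = [
    {"Name": "default-catalog"},
    {"Name": "s3tablescatalog",
     "FederatedCatalog": {"ConnectionName": "aws:s3tables"}},
    {"Name": "warehouse",
     "TargetRedshiftCatalog": {"CatalogArn": "arn:aws:glue:example"}},
    {"Name": "snowflake-remote",
     "FederatedCatalog": {"ConnectionName": "my-snowflake-connection"}},
]

print(summarize_catalogs(sample))
```

If the summary contains only "Default (Glue)", the catalog overview can be skipped as described in the constraints.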

3. Enumerate Databases and Tables

For each catalog (or the user-specified one):

aws glue get-databases --catalog-id <catalog-id>

aws glue get-tables --database-name <db> --catalog-id <catalog-id>

For S3 Tables catalogs, also enumerate via the S3 Tables API:

aws s3tables list-table-buckets

aws s3tables list-namespaces --table-bucket-arn <arn>

aws s3tables list-tables --table-bucket-arn <arn> --namespace <ns>

Constraints:

You MUST flag S3 Tables not registered in Glue; you SHOULD suggest registration

For sub-catalogs, --catalog-id accepts the catalog name (not the ARN)

For the default catalog, omit --catalog-id or pass the account ID
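
Flagging S3 Tables that are not registered in Glue comes down to a set difference between the two listings. The sketch below assumes the listings have already been reduced to (namespace, table name) pairs, that an S3 Tables namespace corresponds to a Glue database name, and that names can be compared case-insensitively. The function name and sample values are hypothetical.

```python
def find_unregistered(s3_tables, glue_tables):
    """Return (namespace, table) pairs listed by S3 Tables but absent from Glue.

    s3_tables:   iterable of (namespace, table_name) from s3tables list-tables
    glue_tables: iterable of (database_name, table_name) from glue get-tables
    """
    registered = {(db.lower(), name.lower()) for db, name in glue_tables}
    return sorted(
        (ns, name)
        for ns, name in s3_tables
        if (ns.lower(), name.lower()) not in registered
    )


# Hypothetical listings for illustration only.
s3_listing = [("sales", "orders"), ("sales", "returns"), ("ops", "events")]
glue_listing = [("sales", "orders")]

for namespace, table in find_unregistered(s3_listing, glue_listing):
    print(f"{namespace}.{table}: not registered in Glue; suggest registration")
```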

4. Capture Details and Analyze

For each database, capture table count, formats, partitioning, and S3 locations. For each table of interest, capture column schemas, types, partition keys, SerDe format, and last access time.

You MUST report data formats in human-readable terms (Parquet, CSV, JSON), not raw SerDe class names.

See discovery-checklist.md for analysis framework.
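
One way to meet the human-readable format requirement is a lookup from SerDe class name to format name. The sketch below assumes each table is a dict shaped like an entry of the TableList array from get-tables, and that Iceberg tables carry table_type = ICEBERG in their Parameters. The mapping covers common SerDe classes only and is not exhaustive; the function name is hypothetical.

```python
# Common SerDe classes and the format names to report instead.
SERDE_FORMATS = {
    "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe": "Parquet",
    "org.apache.hadoop.hive.ql.io.orc.OrcSerde": "ORC",
    "org.apache.hadoop.hive.serde2.OpenCSVSerde": "CSV",
    "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe": "Delimited text (CSV/TSV)",
    "org.openx.data.jsonserde.JsonSerDe": "JSON",
    "org.apache.hive.hcatalog.data.JsonSerDe": "JSON",
    "org.apache.hadoop.hive.serde2.avro.AvroSerDe": "Avro",
}


def readable_format(table):
    """Return a human-readable data format for one Glue table dict."""
    params = table.get("Parameters") or {}
    if params.get("table_type", "").upper() == "ICEBERG":
        return "Iceberg"
    storage = table.get("StorageDescriptor") or {}
    serde_info = storage.get("SerdeInfo") or {}
    serde = serde_info.get("SerializationLibrary", "")
    if serde in SERDE_FORMATS:
        return SERDE_FORMATS[serde]
    # Fall back to naming the raw class so nothing is silently hidden.
    return "Unknown ({})".format(serde or "no SerDe")


# Hypothetical table entry, trimmed to the fields used above.
example = {
    "Name": "orders",
    "StorageDescriptor": {
        "SerdeInfo": {
            "SerializationLibrary":
                "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
        }
    },
}
print(readable_format(example))
```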

Argument Routing

Resolve the argument in this order; stop at the first match:

Starts with s3:// — S3 path (explore unregistered data, detect formats)

Matches a known catalog from step 2 (get-catalogs) — deep dive into that catalog

Matches a known database (get-databases) — deep dive into that database

Matches a known table (get-tables) — detailed table analysis with schema and partitions

No match — treat as search term (Glue search-tables)

No args — full landscape discovery (catalogs, then databases and tables)
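
The resolution order above can be sketched as a first-match function. This is an illustration under stated assumptions: the known catalog, database, and table names are passed in as plain collections gathered in steps 2 and 3, and the route labels returned are hypothetical names, not part of the skill. The empty-argument case is checked first here because an empty argument cannot match any other rule, so the outcome is the same as in the list.

```python
def route_argument(arg, catalogs=(), databases=(), tables=()):
    """Resolve a user argument to a discovery route; the first match wins."""
    if not arg:
        return "full-discovery"
    if arg.startswith("s3://"):
        return "s3-path"
    if arg in catalogs:
        return "catalog"
    if arg in databases:
        return "database"
    if arg in tables:
        return "table"
    # Nothing matched: treat the argument as a search term.
    return "search"


# Hypothetical names for illustration only.
known_catalogs = {"s3tablescatalog"}
known_databases = {"sales"}
known_tables = {"orders"}

for candidate in ["s3://bucket/raw/", "sales", "orders", "clickstream", ""]:
    route = route_argument(candidate, known_catalogs, known_databases, known_tables)
    print(repr(candidate), "->", route)
```

Because catalogs are checked before databases and tables, a name that appears at more than one level resolves to the broadest scope.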

Principles

Start with the catalog landscape, then narrow based on user interest

Always report catalog types — users need to know where data lives

Always report data formats — they drive cost and performance decisions

Flag stale tables and missing descriptions

Suggest partitioning for large unpartitioned tables

Summary first, details on request

You MUST NOT execute Athena queries (start-query-execution) during discovery; query execution belongs to querying-data-lake
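
The flagging principles above can be sketched as a metadata-only check that runs no queries. The skill does not define "stale" or "large", so the 180-day threshold is a hypothetical default, and the sketch marks every table without partition keys as a partitioning candidate and leaves the size judgment to the reader. It assumes a Glue table dict whose LastAccessTime is a timezone-aware datetime (as boto3 returns it); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def table_flags(table, now, stale_after_days=180):
    """Return a list of review flags for one Glue table dict."""
    flags = []
    if not (table.get("Description") or "").strip():
        flags.append("missing description")
    if not table.get("PartitionKeys"):
        flags.append("unpartitioned")
    last_access = table.get("LastAccessTime")
    if last_access is not None:
        if now - last_access > timedelta(days=stale_after_days):
            flags.append("stale")
    return flags


# Hypothetical table entry and reference time for illustration only.
reference_time = datetime(2025, 1, 1, tzinfo=timezone.utc)
old_table = {
    "Name": "legacy_events",
    "PartitionKeys": [],
    "LastAccessTime": datetime(2023, 6, 1, tzinfo=timezone.utc),
}
print(table_flags(old_table, reference_time))
```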

Troubleshooting

| Error | Cause | Fix |
| --- | --- | --- |
| Only sub-catalogs returned, default missing | --include-root omitted | Re-run get-catalogs with --include-root |
| Federated catalog query slow or failing | Network call to remote source; connection misconfigured | Report connection errors clearly rather than silently skipping |
| S3 Tables not queryable via Athena | Tables exist in S3 Tables API but not registered in Glue | Flag as "not queryable"; suggest registration |
| get-databases/get-tables fails with catalog-id | The default catalog requires --catalog-id to be omitted or set to the account ID | Omit --catalog-id or pass the account ID for the default catalog |

Additional Resources

Discovery checklist

AWS Glue Data Catalog API

S3 Tables list operations
