SKILL.md
Check for required tools and AWS access before discovery.
Constraints:
- You MUST verify AWS MCP server tools are available (`aws___call_aws`, `aws___search_documentation`) and fall back to the AWS CLI if not
- You MUST confirm credentials are valid:
aws sts get-caller-identity
- You MUST inform the user about any missing tools and ask whether to proceed
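The CLI half of these checks could be scripted roughly as below. This is a minimal sketch, not part of the skill: the function name `check_aws_cli` and the keys of the returned dictionary are hypothetical, and the availability of the MCP server tools cannot be verified from a script like this, so that check stays with the agent.

```python
import json
import shutil
import subprocess


def check_aws_cli(which=shutil.which, run=subprocess.run):
    """Report whether the AWS CLI is installed and credentials resolve.

    `which` and `run` are injectable so the check can be exercised offline.
    """
    status = {
        "cli_available": False,
        "credentials_valid": False,
        "identity": None,
        "error": None,
    }
    if which("aws") is None:
        status["error"] = "aws CLI not found on PATH"
        return status
    status["cli_available"] = True

    # Same call as in the constraint above, with JSON output forced.
    result = run(
        ["aws", "sts", "get-caller-identity", "--output", "json"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        status["error"] = result.stderr.strip()
        return status

    status["credentials_valid"] = True
    status["identity"] = json.loads(result.stdout)
    return status
```

Anything other than a fully successful result would be reported to the user before asking whether to proceed.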
2. Discover Catalogs
List catalogs in account:
aws glue get-catalogs --recursive --include-root
Classify each catalog by type:
| Field Present | Catalog Type | What It Contains |
|---|---|---|
| Neither `TargetRedshiftCatalog` nor `FederatedCatalog` | Default (Glue) | Standard Glue databases and tables |
| `FederatedCatalog.ConnectionName` = `aws:s3tables` | S3 Tables | Managed Iceberg table buckets |
| `TargetRedshiftCatalog` | Redshift-federated | Redshift databases exposed as Glue catalogs |
| `FederatedCatalog` with `ConnectionName` ≠ `aws:s3tables` | Remote Iceberg | External catalogs (Snowflake, Databricks, Iceberg REST) |
Constraints:
- You MUST include `--include-root` to capture the default account catalog
- You MUST present a summary of catalog counts by type
- If only the default catalog exists, You SHOULD skip the catalog overview and go to step 3
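The classification rules and the per-type summary can be sketched in Python over the parsed JSON of `get-catalogs`. The function names are hypothetical. The sketch assumes each catalog entry is a dictionary carrying the `FederatedCatalog` and `TargetRedshiftCatalog` fields named above. Checking `FederatedCatalog` before `TargetRedshiftCatalog` is also an assumption, since the table does not say what to do if both are present.

```python
from collections import Counter

S3_TABLES_CONNECTION = "aws:s3tables"


def classify_catalog(catalog):
    """Map one catalog entry to the catalog types used in this skill."""
    federated = catalog.get("FederatedCatalog")
    if federated:
        if federated.get("ConnectionName") == S3_TABLES_CONNECTION:
            return "S3 Tables"
        return "Remote Iceberg"
    if catalog.get("TargetRedshiftCatalog"):
        return "Redshift-federated"
    return "Default (Glue)"


def summarize_catalogs(catalogs):
    """Count catalogs by type for the summary shown to the user."""
    return dict(Counter(classify_catalog(c) for c in catalogs))


def only_default(catalogs):
    """True when the catalog overview can be skipped (go to step 3)."""
    return all(classify_catalog(c) == "Default (Glue)" for c in catalogs)
```

For example, `summarize_catalogs` applied to one plain catalog and one catalog whose `ConnectionName` is `aws:s3tables` would report one `Default (Glue)` and one `S3 Tables`.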
3. Enumerate Databases and Tables
For each catalog (or the user-specified one):
aws glue get-databases --catalog-id <catalog-id>
aws glue get-tables --database-name <db> --catalog-id <catalog-id>
For S3 Tables catalogs, also enumerate via the S3 Tables API:
aws s3tables list-table-buckets
aws s3tables list-namespaces --table-bucket-arn <arn>
aws s3tables list-tables --table-bucket-arn <arn> --namespace <ns>
Constraints:
- You MUST flag S3 Tables not registered in Glue; You SHOULD suggest registration
- For sub-catalogs, `--catalog-id` accepts the catalog name (not the ARN)
- For the default catalog, omit `--catalog-id` or pass the account ID
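One way to find S3 Tables that are missing from Glue is to compare the two listings as sets. The sketch below works on parsed CLI JSON and uses hypothetical function names. It assumes `list-tables` returns a `tables` array with `namespace` and `name`, that `get-tables` returns a `TableList` with `DatabaseName` and `Name`, and that a registered table appears in Glue with its namespace as the database name. Verify those assumptions against real output before relying on them.

```python
def s3tables_keys(list_tables_output):
    """(namespace, table) pairs from `aws s3tables list-tables` JSON."""
    keys = set()
    for table in list_tables_output.get("tables", []):
        namespace = table["namespace"]
        # Assumption: the namespace may arrive as a list of name parts.
        if isinstance(namespace, list):
            namespace = ".".join(namespace)
        keys.add((namespace, table["name"]))
    return keys


def glue_keys(get_tables_output):
    """(database, table) pairs from `aws glue get-tables` JSON."""
    return {
        (t["DatabaseName"], t["Name"])
        for t in get_tables_output.get("TableList", [])
    }


def unregistered_tables(list_tables_output, get_tables_outputs):
    """S3 Tables entries with no matching Glue table, sorted for display."""
    registered = set()
    for output in get_tables_outputs:
        registered |= glue_keys(output)
    return sorted(s3tables_keys(list_tables_output) - registered)
```

Each pair returned would be flagged to the user, along with a suggestion to register it.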
4. Capture Details and Analyze
For each database, capture table count, formats, partitioning, and S3 locations. For each table of interest, capture column schemas, types, partition keys, SerDe format, and last access time.
You MUST report data formats in human-readable terms (Parquet, CSV, JSON), not raw SerDe class names.
See discovery-checklist.md for the analysis framework.
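Translating SerDe classes into readable format names can be sketched as a lookup over a Glue table entry. The function name and the exact labels are hypothetical, and the mapping covers only a handful of common SerDe classes, so extend it as needed. Iceberg tables are detected through the `table_type` table parameter, which is an assumption about how they are registered.

```python
SERDE_FORMATS = {
    "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe": "Parquet",
    "org.apache.hadoop.hive.ql.io.orc.OrcSerde": "ORC",
    "org.apache.hadoop.hive.serde2.avro.AvroSerDe": "Avro",
    "org.apache.hadoop.hive.serde2.OpenCSVSerde": "CSV",
    "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe": "Delimited text (CSV/TSV)",
    "org.openx.data.jsonserde.JsonSerDe": "JSON",
    "org.apache.hive.hcatalog.data.JsonSerDe": "JSON",
}


def human_format(table):
    """Return a readable data format for one Glue table entry."""
    params = table.get("Parameters") or {}
    if params.get("table_type", "").upper() == "ICEBERG":
        return "Iceberg"
    storage = table.get("StorageDescriptor") or {}
    serde_info = storage.get("SerdeInfo") or {}
    serde = serde_info.get("SerializationLibrary", "")
    # Unrecognized classes are reported as unknown, not echoed raw.
    return SERDE_FORMATS.get(serde, "Unknown")
```

A table whose `SerializationLibrary` is the Parquet Hive SerDe would be reported as `Parquet`.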
Argument Routing
Resolve the argument in this order; stop at the first match:
- Starts with `s3://`: S3 path (explore unregistered data, detect formats)
- Matches a known catalog from step 2 (`get-catalogs`): deep dive into that catalog
- Matches a known database (`get-databases`): deep dive into that database
- Matches a known table (`get-tables`): detailed table analysis with schema and partitions
- No match: treat as search term (Glue `search-tables`)
- No args: full landscape discovery (catalogs, then databases and tables)
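The resolution order can be sketched as a single function that returns the first matching route. The function name and route labels are hypothetical, and the known catalog, database, and table names are assumed to have been collected already in steps 2 and 3. The empty-argument case is tested first because there is nothing to match against.

```python
def route_argument(arg, catalogs=(), databases=(), tables=()):
    """Pick the discovery mode for a user argument; first match wins."""
    if arg is None or not arg.strip():
        return "full-discovery"
    arg = arg.strip()
    if arg.startswith("s3://"):
        return "s3-path"
    if arg in catalogs:
        return "catalog"
    if arg in databases:
        return "database"
    if arg in tables:
        return "table"
    # Nothing matched: fall back to a Glue search-tables lookup.
    return "search"
```

Because catalogs are checked before databases and tables, a name that exists at more than one level resolves to the broadest scope.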
Principles
- Start with catalog landscape, then narrow based on user interest
- Always report catalog types — users need to know where data lives
- Always report data formats — they drive cost and performance decisions
- Flag stale tables and missing descriptions
- Suggest partitioning for large unpartitioned tables
- Summary first, details on request
- You MUST NOT execute Athena queries (`start-query-execution`) during discovery; query execution belongs to `querying-data-lake`
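Flagging stale tables and missing descriptions could look like the sketch below. The function name and the 90-day threshold are hypothetical, since this skill does not define what counts as stale. It also assumes `LastAccessTime` arrives as an ISO 8601 string with a UTC offset; other timestamp formats would need their own parsing.

```python
from datetime import datetime, timedelta, timezone


def table_flags(table, now, stale_after_days=90):
    """Return the list of issues to flag for one Glue table entry."""
    flags = []
    if not (table.get("Description") or "").strip():
        flags.append("missing description")

    last_access = table.get("LastAccessTime")
    if last_access:
        accessed = datetime.fromisoformat(last_access)
        if now - accessed > timedelta(days=stale_after_days):
            flags.append("stale")
    # A table with no recorded access time is not flagged as stale here.
    return flags


# Example call; `now` must be timezone-aware to compare with the timestamp.
example = table_flags(
    {"Name": "orders", "LastAccessTime": "2024-01-01T00:00:00+00:00"},
    now=datetime(2024, 6, 1, tzinfo=timezone.utc),
)
```

Passing `now` explicitly keeps the check deterministic, which makes the summary reproducible across runs.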
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| Only sub-catalogs returned, default missing | `--include-root` omitted | Re-run `get-catalogs` with `--include-root` |
| Federated catalog query slow or failing | Network call to remote source; connection misconfigured | Report connection errors clearly rather than silently skipping |
| S3 Tables not queryable via Athena | Tables exist in S3 Tables API but not registered in Glue | Flag as "not queryable"; suggest registration |
| `get-databases`/`get-tables` fails with `--catalog-id` | The default catalog requires omitting `--catalog-id` or passing the account ID | Omit `--catalog-id` or pass the account ID for the default catalog |
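The `--catalog-id` rules can be captured in a small helper that builds the CLI argument list. This is a sketch with a hypothetical function name: it omits the flag for the default catalog and rejects ARNs, which sub-catalogs do not accept.

```python
def glue_command(operation, catalog=None, database=None):
    """Build an `aws glue` argument list that follows the catalog-id rules.

    `catalog` is a sub-catalog name or an account ID; None means the
    default catalog, for which the flag is left out.
    """
    if catalog and catalog.startswith("arn:"):
        raise ValueError("pass the catalog name, not its ARN")
    command = ["aws", "glue", operation]
    if database:
        command += ["--database-name", database]
    if catalog:
        command += ["--catalog-id", catalog]
    return command
```

For instance, `glue_command("get-databases")` targets the default catalog, while `glue_command("get-tables", catalog="<catalog-id>", database="<db>")` targets a sub-catalog.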