SKILL.md
Create Data Lake Tables with Amazon S3 Tables
Overview
Amazon S3 Tables provides managed Iceberg tables with automatic compaction and snapshot management. Queryable via Athena and Iceberg-compatible engines.
Common Tasks
You MUST use AWS MCP server tools when connected, they provide command validation, sandboxed execution, and audit logging. Fall back to AWS CLI if MCP unavailable.
Decision Guide
Before creating, You MUST check what exists:
You MUST run aws glue get-tables --database-name <NAME> when user mentions a database.
What you find
Action
Fuzzy database name ("our analytics db")
You MUST STOP. Delegate to finding-data-lake-assets to resolve.
Non-S3-Tables table with matching name
You MUST STOP. Delegate to finding-data-lake-assets. You MUST NOT create until user confirms.
Existing S3 Tables table with matching name
You MUST check schema match. Reuse if compatible, recreate only if user confirms.
No matching tables
Proceed with creation (Steps 1-8).
User explicitly requests new S3 Tables table
Skip checks, proceed with creation.
Creation paths:
- Existing data in S3: Create empty table (Steps 1-8), then use
ingesting-into-data-lakeskill.
- Glue ETL pipeline: Read
references/table-creation-glue-etl.mdfirst, then Steps 1-6.
- Lake Formation access control: Search AWS docs for
"S3 Tables integration with Lake Formation".
1. Verify Dependencies
Constraints:
- You MUST check whether AWS MCP server tools or AWS CLI are available and inform user if missing
- You MUST confirm target AWS region and verify credentials with
aws sts get-caller-identity
2. Understand the Schema
- Explicit schema: Validate Iceberg types.
- Loose description: Ask columns, types, grain. Propose and confirm.
- Existing S3 data: Infer schema from file headers only. Create empty table first, then use
ingesting-into-data-lakeskill.
Constraints:
- You MUST read
references/best-practices.mdfor Iceberg type mapping, partitions, and naming.
- You MUST ask for all required parameters upfront: table name, columns, types, partition strategy. For schema evolution, see
references/athena-ddl-path.md.
- You MUST use all lowercase names -- Glue rejects mixed case with
GENERIC_INTERNAL_ERROR. Namespace and table names MUST NOT contain hyphens.
- You SHOULD suggest partition columns based on access patterns.
3. Create Table Bucket
Names: 3-63 chars, lowercase, numbers, hyphens.
aws s3tables create-table-bucket --name <BUCKET_NAME> --region <REGION>
Capture table-bucket-arn. Encryption (SSE-S3 default, SSE-KMS) and storage class (STANDARD, INTELLIGENT_TIERING) set at creation. See references/best-practices.md.
Constraints:
- You MUST check existing buckets with
aws s3tables list-table-bucketsand ask user to select or create new.
- If using SSE-KMS, KMS key policy MUST allow S3 Tables maintenance service principal to read data. Search AWS docs for
"S3 Tables KMS key policy"for required policy.
- If bucket creation fails, see
references/best-practices.mdfor common errors.
4. Create Namespace
aws s3tables create-namespace --table-bucket-arn <ARN> --namespace <NAMESPACE>
Constraints:
- You MUST list existing namespaces first and suggest reusing if relevant
- You MUST use lowercase names with no hyphens
5. Create Glue Data Catalog Integration
Check if s3tablescatalog exists (create once per region per account):
aws glue get-catalog --catalog-id s3tablescatalog
If not found, create (requires glue:CreateCatalog, glue:passConnection):
aws glue create-catalog --name "s3tablescatalog" --catalog-input '{
"FederatedCatalog": {
"Identifier": "arn:aws:s3tables:<REGION>:<ACCOUNT_ID>:bucket/*",
"ConnectionName": "aws:s3tables"
},
"CreateDatabaseDefaultPermissions": [{"Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"}, "Permissions": ["ALL"]}],
"CreateTableDefaultPermissions": [{"Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"}, "Permissions": ["ALL"]}],
"AllowFullTableExternalDataAccess": "True"
}'
Verify with aws glue get-catalogs --parent-catalog-id s3tablescatalog.
6. Configure Access Control
S3 Tables uses s3tables:* IAM namespace (not s3:*).
Querying principal permissions (bucket policy):
s3tables:GetTableBucket,s3tables:GetNamespace,s3tables:GetTable,s3tables:GetTableMetadataLocation,s3tables:GetTableData
Querying principal permissions (IAM policy):
glue:GetCatalog,glue:GetDatabase,glue:GetTable
You MUST scope to correct ARN patterns. You MUST read references/access-control.md for exact resource ARNs.
Constraints:
- You MUST ask user for querying principal ARN
- You MUST NOT grant broader permissions than necessary
- You MUST NOT create IAM roles automatically, verify existing and guide user
7. Create the Table
Context
Path
Default (any user)
S3 Tables API (below)
User specifically wants SQL DDL
Athena DDL (see references/athena-ddl-path.md)
Glue ETL pipeline
Spark DDL via --conf job args (not spark.conf.set()). You MUST read references/table-creation-glue-etl.md for the --conf string.
Default: S3 Tables API:
aws s3tables create-table \
--table-bucket-arn <ARN> \
--namespace <NAMESPACE> \
--name <TABLE_NAME> \
--format ICEBERG \
--metadata '<METADATA_JSON>'
Metadata JSON MUST nest under "iceberg" key:
{"iceberg":{"schema":{"fields":[
{"name":"order_date","type":"date","required":true},
{"name":"customer_id","type":"string","required":true},
{"name":"amount","type":"double","required":false}
]},
"partitionSpec":{"fields":[
{"sourceId":1,"fieldId":1000,"transform":"month","name":"order_date_month"}
]}}}
Constraints:
partitionSpec.sourceIdMUST reference a valid schema field ID
- For schema evolution after creation, use Athena DDL. See
references/athena-ddl-path.md
- You MUST use
schemaV2for complex types (list, map, struct) with explicit field IDs. Seereferences/best-practices.md.
- You SHOULD search AWS docs for
"IcebergPartitionField S3 Tables"for supported partition transforms
8. Verify and Confirm
You MUST verify with aws s3tables get-table and confirm queryability with DESCRIBE <table_name> via Athena using --query-execution-context '{"Catalog":"s3tablescatalog/<BUCKET_NAME>","Database":"<NAMESPACE>"}'. Do NOT put catalog in SQL. Present summary: bucket ARN, namespace, table, schema, partitions.
Troubleshooting
Error
Cause
Fix
"Table location can not be specified"
LOCATION in CREATE TABLE
Remove LOCATION clause. S3 Tables manages storage automatically.
AccessDeniedException with s3:* policy
Using s3:* not s3tables:*
S3 Tables uses s3tables:* namespace. Update IAM policy.
Additional Resources
- access-control.md -- IAM permissions, ARN patterns, permission errors
- best-practices.md -- Iceberg types, partitions, naming, common errors
- athena-ddl-path.md -- Athena DDL, schema evolution
- table-creation-glue-etl.md -- Spark DDL via Glue ETL
- Loading data:
ingesting-into-data-lakeskill