nosql-expert

Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot…

INSTALLATION
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill nosql-expert
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$27

Feature

SQL (Relational)

Distributed NoSQL (Cassandra/DynamoDB)

Data modeling

Model Entities + Relationships

Model Queries (Access Patterns)

Joins

CPU-intensive, at read time

Pre-computed (Denormalized) at write time

Storage cost

Expensive (minimize duplication)

Cheap (duplicate data for read speed)

Consistency

ACID (Strong)

BASE (Eventual) / Tunable

Scalability

Vertical (Bigger machine)

Horizontal (More nodes/shards)

The Golden Rule: In SQL, you design the data model to answer any query. In NoSQL, you design the data model to answer specific queries efficiently.

Core Design Patterns

1. Query-First Modeling (Access Patterns)

You typically cannot "add a query later" without migration or creating a new table/index.

Process:

  • List all Entities (User, Order, Product).
  • List all Access Patterns ("Get User by Email", "Get Orders by User sorted by Date").
  • Design Table(s) specifically to serve those patterns with a single lookup.

2. The Partition Key is King

Data is distributed across physical nodes based on the Partition Key (PK).

  • Goal: Even distribution of data and traffic.
  • Anti-Pattern: Using a low-cardinality PK (e.g., status="active" or gender="m") creates Hot Partitions, limiting throughput to a single node's capacity.
  • Best Practice: Use high-cardinality keys (User IDs, Device IDs, Composite Keys).

3. Clustering / Sort Keys

Within a partition, data is sorted on disk by the Clustering Key (Cassandra) or Sort Key (DynamoDB).

  • This allows for efficient Range Queries (e.g., WHERE user_id=X AND date > Y).
  • It effectively pre-sorts your data for specific retrieval requirements.

4. Single-Table Design (Adjacency Lists)

Primary use: DynamoDB (but concepts apply elsewhere)

Storing multiple entity types in one table to enable pre-joined reads.

PK (Partition)

SK (Sort)

Data Fields...

USER#123

PROFILE

{ name: "Ian", email: "..." }

USER#123

ORDER#998

{ total: 50.00, status: "shipped" }

USER#123

ORDER#999

{ total: 12.00, status: "pending" }

  • Query: PK="USER#123"
  • Result: Fetches User Profile AND all Orders in one network request.

5. Denormalization & Duplication

Don't be afraid to store the same data in multiple tables to serve different query patterns.

  • Table A: users_by_id (PK: uuid)
  • Table B: users_by_email (PK: email)

Trade-off: You must manage data consistency across tables (often using eventual consistency or batch writes).

Specific Guidance

Apache Cassandra / ScyllaDB

  • Primary Key Structure: ((Partition Key), Clustering Columns)
  • No Joins, No Aggregates: Do not try to JOIN or GROUP BY. Pre-calculate aggregates in a separate counter table.
  • **Avoid ALLOW FILTERING:** If you see this in production, your data model is wrong. It implies a full cluster scan.
  • Writes are Cheap: Inserts and Updates are just appends to the LSM tree. Don't worry about write volume as much as read efficiency.
  • Tombstones: Deletes are expensive markers. Avoid high-velocity delete patterns (like queues) in standard tables.

AWS DynamoDB

  • GSI (Global Secondary Index): Use GSIs to create alternative views of your data (e.g., "Search Orders by Date" instead of by User).
  • Note: GSIs are eventually consistent.
  • LSI (Local Secondary Index): Sorts data differently within the same partition. Must be created at table creation time.
  • WCU / RCU: Understand capacity modes. Single-table design helps optimize consumed capacity units.
  • TTL: Use Time-To-Live attributes to automatically expire old data (free delete) without creating tombstones.

Expert Checklist

Before finalizing your NoSQL schema:

  • Access Pattern Coverage: Does every query pattern map to a specific table or index?
  • Cardinality Check: Does the Partition Key have enough unique values to spread traffic evenly?
  • Split Partition Risk: For any single partition (e.g., a single user's orders), will it grow indefinitely? (If > 10GB, you need to "shard" the partition, e.g., USER#123#2024-01).
  • Consistency Requirement: Can the application tolerate eventual consistency for this read pattern?

Common Anti-Patterns

Scatter-Gather: Querying all partitions to find one item (Scan).

Hot Keys: Putting all "Monday" data into one partition.

Relational Modeling: Creating Author and Book tables and trying to join them in code. (Instead, embed Book summaries in Author, or duplicate Author info in Books).

Limitations

  • Use this skill only when the task clearly matches the scope described above.
  • Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
  • Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card