rag-retrieval

Retrieval-Augmented Generation patterns for grounded LLM responses. Use when building RAG pipelines, embedding documents, implementing hybrid search,…

INSTALLATION
npx skills add https://github.com/yonatangross/orchestkit --skill rag-retrieval
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Core RAG

Fundamental patterns for retrieval, generation, and pipeline composition.

| Rule | File | Key Pattern |
|------|------|-------------|
| Basic RAG | rules/core-basic-rag.md | Retrieve + context + generate with citations |
| Hybrid Search | rules/core-hybrid-search.md | RRF fusion (k=60) for semantic + keyword |
| Context Management | rules/core-context-management.md | Token budgeting + sufficiency check |
| Pipeline Composition | rules/core-pipeline-composition.md | Composable Decompose → HyDE → Retrieve → Rerank |
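
The RRF fusion named in the Hybrid Search rule can be sketched as follows. This is a minimal illustration, not the contents of rules/core-hybrid-search.md: the function name `rrf_fuse` and the sample document IDs are assumptions, and only the constant k=60 comes from the text above.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); the contributions are summed across lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a semantic and a keyword retriever
semantic = ["d1", "d2", "d3"]
keyword = ["d3", "d1", "d4"]
fused = rrf_fuse([semantic, keyword])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF works on ranks rather than raw scores, the two retrievers do not need comparable score scales.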

Embeddings

Embedding models, chunking strategies, and production optimization.

| Rule | File | Key Pattern |
|------|------|-------------|
| Models & API | rules/embeddings-models.md | Model selection, batch API, similarity |
| Chunking | rules/embeddings-chunking.md | Semantic boundary splitting, 512 token sweet spot |
| Advanced | rules/embeddings-advanced.md | Redis cache, Matryoshka dims, batch processing |
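
As a rough illustration of boundary-aware chunking under a token budget, the sketch below packs whole sentences into chunks. It makes two simplifying assumptions that are not from the rule file: sentences are split with a regular expression, and tokens are approximated by whitespace-separated words. A real pipeline would count tokens with the embedding model's tokenizer.

```python
import re


def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    Tokens are approximated by whitespace-split words (an assumption).
    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk before it would overflow the budget
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sample_chunks = chunk_text("One two three. Four five six. Seven eight nine.", max_tokens=6)
```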

Contextual Retrieval

Anthropic's context-prepending technique: up to 67% fewer retrieval failures when combined with hybrid BM25 + vector search and reranking.

| Rule | File | Key Pattern |
|------|------|-------------|
| Context Prepending | rules/contextual-prepend.md | LLM-generated context + prompt caching |
| Hybrid Search | rules/contextual-hybrid.md | 40% BM25 / 60% vector weight split |
| Complete Pipeline | rules/contextual-pipeline.md | End-to-end indexing + hybrid retrieval |
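
The 40% BM25 / 60% vector split could be applied along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, and min-max normalization is one possible way to put BM25 and cosine scores on a common scale before weighting; the rule file may normalize differently.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, every doc gets 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_hybrid(
    bm25: dict[str, float],
    vector: dict[str, float],
    bm25_weight: float = 0.4,
    vector_weight: float = 0.6,
) -> list[tuple[str, float]]:
    """Combine normalized BM25 and vector scores with fixed weights.

    A document missing from one retriever contributes 0 for that signal.
    """
    b, v = min_max(bm25), min_max(vector)
    combined = {
        doc_id: bm25_weight * b.get(doc_id, 0.0) + vector_weight * v.get(doc_id, 0.0)
        for doc_id in set(b) | set(v)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores from each retriever
ranked = weighted_hybrid(
    bm25={"a": 10.0, "b": 5.0, "c": 0.0},
    vector={"a": 0.2, "b": 0.9, "d": 0.5},
)
```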

HyDE

Hypothetical Document Embeddings for bridging vocabulary gaps.

| Rule | File | Key Pattern |
|------|------|-------------|
| Generation | rules/hyde-generation.md | Embed hypothetical doc, not query |
| Per-Concept | rules/hyde-per-concept.md | Parallel HyDE for multi-topic queries |
| Fallback | rules/hyde-fallback.md | 2-3s timeout → direct embedding fallback |
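
The timeout-with-fallback pattern might look like the sketch below. The function name `hyde_text_or_query` and the `generate_hypothetical` callable are hypothetical placeholders for whatever LLM call produces the hypothetical document; only the 2-3 second timeout and the fall back to embedding the raw query come from the table above.

```python
import asyncio
from typing import Awaitable, Callable


async def hyde_text_or_query(
    query: str,
    generate_hypothetical: Callable[[str], Awaitable[str]],
    timeout_s: float = 2.5,
) -> tuple[str, bool]:
    """Return (text_to_embed, used_hyde).

    Tries to generate a hypothetical document within timeout_s seconds.
    On timeout, falls back to the original query so retrieval never blocks
    on a slow LLM call.
    """
    try:
        doc = await asyncio.wait_for(generate_hypothetical(query), timeout=timeout_s)
        return doc, True
    except asyncio.TimeoutError:
        return query, False
```

The caller embeds whichever text comes back; the boolean makes it easy to log how often the fallback path is taken.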

Agentic RAG

Self-correcting retrieval with LLM-driven decision making.

| Rule | File | Key Pattern |
|------|------|-------------|
| Self-RAG | rules/agentic-self-rag.md | Binary document grading for relevance |
| Corrective RAG | rules/agentic-corrective-rag.md | CRAG workflow with web fallback |
| Knowledge Graph | rules/agentic-knowledge-graph.md | KG + vector hybrid for entity-rich domains |
| Adaptive Retrieval | rules/agentic-adaptive-retrieval.md | Query routing to optimal strategy |
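
A corrective loop with binary grading, a bounded number of rewrites, and a fallback can be sketched as below. All four callables (`retrieve`, `grade`, `rewrite`, `fallback`) are hypothetical stand-ins for a vector search, an LLM grader, an LLM query rewriter, and a web search. The sketch shows the control flow only and is not the workflow defined in the rule files.

```python
from typing import Callable


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    fallback: Callable[[str], list[str]],
    max_rewrites: int = 2,
    min_relevant: int = 1,
) -> list[str]:
    """Retrieve, keep documents graded relevant, rewrite the query if too few pass.

    The loop runs at most max_rewrites + 1 times, so it cannot spin forever;
    if no attempt yields enough relevant documents, the fallback is used.
    """
    current = query
    for attempt in range(max_rewrites + 1):
        docs = retrieve(current)
        # Grade against the ORIGINAL question, not the rewritten query
        relevant = [d for d in docs if grade(query, d)]
        if len(relevant) >= min_relevant:
            return relevant
        if attempt < max_rewrites:
            current = rewrite(current)
    return fallback(query)
```

The explicit retry cap and the unconditional fallback address two of the failure modes listed under Common Mistakes: infinite rewrite loops and workflows that hang without a fallback path.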

Multimodal RAG

Image + text retrieval with cross-modal search.

| Rule | File | Key Pattern |
|------|------|-------------|
| Embeddings | rules/multimodal-embeddings.md | CLIP, SigLIP 2, Voyage multimodal-3 |
| Chunking | rules/multimodal-chunking.md | PDF extraction preserving images |
| Pipeline | rules/multimodal-pipeline.md | Dedup + hybrid retrieval + generation |

Query Decomposition

Breaking complex queries into concepts for parallel retrieval.

| Rule | File | Key Pattern |
|------|------|-------------|
| Detection | rules/query-detection.md | Heuristic indicators (<1ms fast path) |
| Decompose + RRF | rules/query-decompose.md | LLM concept extraction + parallel retrieval |
| HyDE Combo | rules/query-hyde-combo.md | Decompose + HyDE for maximum coverage |
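
A cheap heuristic gate in front of LLM decomposition might look like this. The indicator list (conjunctions, comparison words, commas and semicolons) and the minimum word count are assumptions chosen for illustration; the actual indicators live in rules/query-detection.md.

```python
import re

# Hypothetical indicators of a multi-concept query
_MULTI_CONCEPT = re.compile(
    r"\b(and|or|vs|versus|compared to|as well as)\b|[;,]",
    re.IGNORECASE,
)


def looks_multi_concept(query: str, min_words: int = 6) -> bool:
    """Fast path: return True only if the query is long enough and contains
    an indicator, so short or single-topic queries skip the LLM call."""
    return len(query.split()) >= min_words and bool(_MULTI_CONCEPT.search(query))


compound = looks_multi_concept("How do HNSW and IVFFlat compare for recall and build time?")
simple = looks_multi_concept("What is HyDE?")
```

Only queries that pass the gate would be sent to an LLM for concept extraction, which keeps the common single-topic case free of extra latency.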

Reranking

Post-retrieval re-scoring for higher precision.

| Rule | File | Key Pattern |
|------|------|-------------|
| Cross-Encoder | rules/reranking-cross-encoder.md | ms-marco-MiniLM (~50ms, free) |
| LLM Reranking | rules/reranking-llm.md | Batch scoring + Cohere API |
| Combined | rules/reranking-combined.md | Multi-signal weighted scoring |
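
Multi-signal weighted scoring can be illustrated with the sketch below. The `Candidate` fields, the recency signal, and the weights (0.3 / 0.6 / 0.1) are hypothetical; the sketch assumes every signal has already been normalized to [0, 1], with the rerank score coming from a cross-encoder or LLM grader.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    retrieval_score: float  # first-stage score, normalized to [0, 1]
    rerank_score: float     # cross-encoder or LLM score, normalized to [0, 1]
    recency: float          # 1.0 = newest, 0.0 = oldest (assumed extra signal)


def combined_rerank(
    candidates: list[Candidate],
    weights: tuple[float, float, float] = (0.3, 0.6, 0.1),
    top_n: int = 10,
) -> list[Candidate]:
    """Order candidates by a weighted sum of signals and keep the top_n."""
    w_ret, w_rr, w_rec = weights

    def score(c: Candidate) -> float:
        return w_ret * c.retrieval_score + w_rr * c.rerank_score + w_rec * c.recency

    return sorted(candidates, key=score, reverse=True)[:top_n]


pool = [
    Candidate("a", 0.9, 0.2, 0.5),
    Candidate("b", 0.5, 0.9, 0.5),
    Candidate("c", 0.7, 0.5, 1.0),
]
top_two = [c.doc_id for c in combined_rerank(pool, top_n=2)]
```

In a retrieve-then-rerank setup, `candidates` would hold the wide first-stage result set and `top_n` the smaller set passed to the generator.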

PGVector

Production hybrid search with PostgreSQL.

| Rule | File | Key Pattern |
|------|------|-------------|
| Schema | rules/pgvector-schema.md | HNSW index + pre-computed tsvector |
| Hybrid Search | rules/pgvector-hybrid-search.md | SQLAlchemy RRF with FULL OUTER JOIN |
| Indexing | rules/pgvector-indexing.md | HNSW (17x faster) vs IVFFlat |
| Metadata | rules/pgvector-metadata.md | Filtering, boosting, Redis 8 comparison |

Quick Start Example

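from openai import AsyncOpenAI

client = AsyncOpenAI()  # async client; reads OPENAI_API_KEY from the environment

async def rag_query(question: str, vector_db, top_k: int = 5, model: str = "gpt-4o-mini") -> dict:
    """Basic RAG with citations."""
    # vector_db is an assumed interface: async search(query, limit=...) returning
    # objects with .text and .metadata["source"]. The model name is only an example.
    docs = await vector_db.search(question, limit=top_k)
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(docs))
    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return {"answer": response.choices[0].message.content, "sources": [d.metadata["source"] for d in docs]}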

Key Decisions

| Decision | Recommendation |
|----------|----------------|
| Embedding model | text-embedding-3-small (general), voyage-3 (production) |
| Chunk size | 256-1024 tokens (512 typical) |
| Hybrid weight | 40% BM25 / 60% vector |
| Top-k | 3-10 documents |
| Temperature | 0.1-0.3 (factual) |
| Context budget | 4K-8K tokens |
| Reranking | Retrieve 50, rerank to 10 |
| Vector index | HNSW (production), IVFFlat (high-volume) |
| HyDE timeout | 2-3 seconds with fallback |
| Query decomposition | Heuristic first, LLM only if multi-concept |
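
To make the context budget recommendation concrete, here is a minimal sketch of trimming retrieved chunks to a token budget. The function name and the default word-count tokenizer are assumptions; in practice `count_tokens` would wrap the tokenizer of the generation model.

```python
from typing import Callable


def fit_to_budget(
    chunks: list[str],
    budget_tokens: int = 4000,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> list[str]:
    """Select chunks, in the given order, until the token budget is used up.

    Chunks are assumed to be sorted most relevant first. A chunk that does
    not fit is skipped so that a smaller, later chunk can still be included.
    """
    selected: list[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            continue
        selected.append(chunk)
        used += cost
    return selected


trimmed = fit_to_budget(["a b c", "d e f g", "h"], budget_tokens=4)
```

Skipping rather than stopping at the first oversized chunk is a design choice; stopping instead would preserve strict relevance order at the cost of leaving budget unused.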

Common Mistakes

  • No citation tracking (unverifiable answers)
  • Context too large (dilutes relevance)
  • Single retrieval method (misses keyword matches)
  • Not chunking long documents (context gets lost)
  • Embedding queries differently than documents
  • No fallback path in agentic RAG (workflow hangs)
  • Infinite rewrite loops (no retry limit)
  • Using wrong similarity metric (cosine vs euclidean)
  • Not caching embeddings (recomputing unchanged content)
  • Missing image captions in multimodal RAG (limits text search)
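
As one way to avoid recomputing embeddings for unchanged content, the sketch below keys a cache on a hash of the model name plus the text. The class name is hypothetical and the in-memory dict is only a stand-in for an external store such as Redis; `embed` is whatever function calls the embedding model.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Content-addressed embedding cache (in-memory stand-in for Redis)."""

    def __init__(self, embed: Callable[[str], list[float]], model: str):
        self._embed = embed
        self._model = model
        self._store: dict[str, list[float]] = {}
        self.misses = 0  # number of times the embedding function was called

    def _key(self, text: str) -> str:
        # Include the model name so switching models never returns stale vectors
        payload = f"{self._model}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, text: str) -> list[float]:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]


# Toy embedding function for demonstration only
cache = EmbeddingCache(embed=lambda text: [float(len(text))], model="demo-model")
first = cache.get("hello")
second = cache.get("hello")
```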

Evaluations

See test-cases.json for 30 test cases across all categories.

Related Skills

  • ork:langgraph - LangGraph workflow patterns (for agentic RAG workflows)
  • caching - Cache RAG responses for repeated queries
  • ork:golden-dataset - Evaluate retrieval quality
  • ork:llm-integration - Local embeddings with nomic-embed-text
  • vision-language-models - Image analysis for multimodal RAG
  • ork:database-patterns - Schema design for vector search

Capability Details

retrieval-patterns

Keywords: retrieval, context, chunks, relevance, rag

Solves:

  • Retrieve relevant context for LLM
  • Implement RAG pipeline with citations
  • Optimize retrieval quality

hybrid-search

Keywords: hybrid, bm25, vector, fusion, rrf

Solves:

  • Combine keyword and semantic search
  • Implement reciprocal rank fusion
  • Balance precision and recall

embeddings

Keywords: embedding, text to vector, vectorize, chunk, similarity

Solves:

  • Convert text to vector embeddings
  • Choose embedding models and dimensions
  • Implement chunking strategies

contextual-retrieval

Keywords: contextual, anthropic, context-prepend, bm25

Solves:

  • Prepend context to chunks for better retrieval
  • Reduce retrieval failures by up to 67% (with hybrid search and reranking)
  • Implement hybrid BM25+vector search

hyde

Keywords: hyde, hypothetical, vocabulary mismatch

Solves:

  • Bridge vocabulary gaps in semantic search
  • Generate hypothetical documents for embedding
  • Handle abstract or conceptual queries

agentic-rag

Keywords: self-rag, crag, corrective, adaptive, grading

Solves:

  • Build self-correcting RAG workflows
  • Grade document relevance
  • Implement web search fallback

multimodal-rag

Keywords: multimodal, image, clip, vision, pdf

Solves:

  • Build RAG with images and text
  • Cross-modal search (text → image)
  • Process PDFs with mixed content

query-decomposition

Keywords: decompose, multi-concept, complex query

Solves:

  • Break complex queries into concepts
  • Parallel retrieval per concept
  • Improve coverage for compound questions

reranking

Keywords: rerank, cross-encoder, precision, scoring

Solves:

  • Improve search precision post-retrieval
  • Score relevance with cross-encoder or LLM
  • Combine multiple scoring signals

pgvector-search

Keywords: pgvector, postgresql, hnsw, tsvector, hybrid

Solves:

  • Production hybrid search with PostgreSQL
  • HNSW vs IVFFlat index selection
  • SQL-based RRF fusion