
RAG Implementation

Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.

Overview

This skill covers document processing, embedding generation, vector storage, retrieval configuration, and RAG pipeline implementation.

When to Use

Building Q&A systems over proprietary documents

Creating chatbots with factual information from knowledge bases

Implementing semantic search with natural language queries

Reducing hallucinations with grounded, sourced responses

Building documentation assistants and research tools

Enabling AI systems to access domain-specific knowledge

Instructions

Step 1: Choose Vector Database

Select based on your requirements:

| Requirement | Recommended |
| --- | --- |
| Production scalability | Pinecone, Milvus |
| Open-source | Weaviate, Qdrant |
| Local development | Chroma, FAISS |
| Hybrid search | Weaviate with BM25 |

Step 2: Select Embedding Model

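| Use Case | Model |
| --- | --- |
| General purpose | text-embedding-3-small (or the older text-embedding-ada-002) |
| Fast and lightweight | all-MiniLM-L6-v2 |
| Multilingual | multilingual-e5-large (e5-large-v2 is English-only) |
| Higher retrieval accuracy (English) | bge-large-en-v1.5 |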

Step 3: Implement Document Processing Pipeline

Load documents from source (file system, database, API)

Clean and preprocess (remove formatting, normalize text)

Split documents into chunks with appropriate strategy

Generate embeddings for each chunk

Store embeddings in vector database with metadata

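Validation: Verify that one embedding was generated per segment and that the vectors have the expected dimension:

// embedAll returns a Response wrapper; content() unwraps the list of embeddings
List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
// expectedDim is the vector size of your model (for example, 384 for all-MiniLM-L6-v2)
if (embeddings.isEmpty() || embeddings.size() != segments.size()
        || embeddings.get(0).dimension() != expectedDim) {
    throw new IllegalStateException("Embedding generation failed");
}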
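
The "clean and preprocess" step can be sketched with the standard library alone. This is a minimal illustration, not a complete cleaner: the class name TextCleaner is hypothetical, and the tag-stripping regex is deliberately naive (it would also remove text such as "a < b > c"), so use a real HTML parser for web content.

```java
import java.text.Normalizer;

// Hypothetical helper: normalizes raw text before chunking and embedding.
final class TextCleaner {
    private TextCleaner() {}

    static String clean(String raw) {
        // NFKC folds compatibility characters (ligatures, non-breaking spaces) into plain forms
        String text = Normalizer.normalize(raw, Normalizer.Form.NFKC);
        // Naive removal of HTML/XML-style tags
        text = text.replaceAll("<[^>]+>", " ");
        // Replace control characters (tabs, carriage returns, ...) except newlines
        text = text.replaceAll("[\\p{Cntrl}&&[^\\n]]", " ");
        // Collapse runs of spaces
        text = text.replaceAll(" {2,}", " ");
        // Trim spaces around line breaks
        text = text.replaceAll(" ?\\n ?", "\n");
        // Keep at most one empty line between paragraphs
        text = text.replaceAll("\\n{3,}", "\n\n");
        return text.strip();
    }
}
```

Paragraph breaks are preserved on purpose, since many splitters use them as chunk boundaries.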

Step 4: Configure Retrieval Strategy

Choose the appropriate strategy:

Dense Retrieval: Semantic similarity via embeddings (default for most cases)

Hybrid Search: Dense + sparse retrieval for better coverage

Metadata Filtering: Filter by document attributes

Reranking: Cross-encoder reranking for high-precision requirements
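
For hybrid search, the dense and sparse result lists have to be merged into one ranking. One common way to do that is reciprocal rank fusion (RRF), sketched below. This is an assumption about how you might combine the lists, not something this skill or a specific vector database prescribes; the class name RankFusion and the document IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges several ranked ID lists with reciprocal rank fusion.
final class RankFusion {
    private RankFusion() {}

    // Each document scores sum(1 / (k + rank)) over the lists it appears in (rank starts at 1).
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; the sort is stable, so ties keep first-seen order
        fused.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return fused;
    }
}
```

A call such as RankFusion.fuse(List.of(denseIds, sparseIds), 60) rewards documents that rank well in both lists; k = 60 is a commonly used default, but treat it as a tunable parameter.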

Step 5: Build RAG Pipeline

Create content retriever with your embedding store

Configure AI service with retriever and chat memory

Implement prompt template with context injection

Add response validation and grounding checks

Validation: Test with known queries to verify context injection works correctly.
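
A prompt template with context injection can be as simple as the sketch below. The class name PromptBuilder and the instruction wording are illustrative assumptions; LangChain4j performs this injection for you when a content retriever is configured, so a hand-built template like this is mainly useful for testing or for custom pipelines.

```java
import java.util.List;

// Hypothetical helper: builds a grounded prompt from retrieved chunks.
final class PromptBuilder {
    private PromptBuilder() {}

    static String build(String question, List<String> chunks) {
        StringBuilder context = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            // Number each chunk so the answer can cite its source, e.g. [2]
            context.append("[").append(i + 1).append("] ").append(chunks.get(i)).append("\n");
        }
        return "Answer the question using only the context below. "
                + "If the context does not contain the answer, say you do not know.\n\n"
                + "Context:\n" + context
                + "\nQuestion: " + question + "\n";
    }
}
```

To validate context injection, build a prompt for a known query and assert that every retrieved chunk appears in it before the question.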

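Error Handling: For batch ingestion, retry transient embedding failures per document:

for (Document doc : documents) {
    // EmbeddingModel.embed accepts a String or TextSegment, not a Document
    TextSegment segment = doc.toTextSegment();
    int attempts = 0;
    while (attempts < 3) {
        try {
            store.add(embeddingModel.embed(segment).content(), segment);
            break;
        } catch (RuntimeException e) {  // embed declares no checked exceptions
            attempts++;
            if (attempts == 3) throw new RuntimeException("Failed after 3 attempts", e);
        }
    }
}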
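
The loop above retries immediately. When failures come from rate limits or brief outages, waiting between attempts usually helps. A minimal sketch of a reusable retry helper with exponential backoff follows; the class name Retry and its parameters are hypothetical, and the delays are placeholders to tune for your provider.

```java
import java.util.function.Supplier;

// Hypothetical helper: retries a task, doubling the wait after each failure.
final class Retry {
    private Retry() {}

    static <T> T withBackoff(Supplier<T> task, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;  // out of attempts: surface the last failure
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();  // preserve the interrupt flag
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
                delay *= 2;
            }
        }
    }
}
```

Inside the ingestion loop, the embedding call could then be wrapped as Retry.withBackoff(() -> embeddingModel.embed(segment).content(), 3, 500).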

Step 6: Evaluate and Optimize

Measure retrieval metrics: precision@k, recall@k, MRR

Evaluate answer quality: faithfulness, relevance

Monitor performance and user feedback

Iterate on chunking, retrieval, and prompt parameters
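
The retrieval metrics named above are straightforward to compute once you have a labeled set of queries with known relevant document IDs. The sketch below uses the standard definitions; the class name RetrievalMetrics and the ID-based representation are assumptions for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper: retrieval metrics over ranked lists of document IDs.
final class RetrievalMetrics {
    private RetrievalMetrics() {}

    // Fraction of the top-k results that are relevant
    static double precisionAtK(List<String> retrieved, Set<String> relevant, int k) {
        if (k <= 0) return 0.0;
        long hits = retrieved.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / k;
    }

    // Fraction of all relevant documents that appear in the top-k results
    static double recallAtK(List<String> retrieved, Set<String> relevant, int k) {
        if (k <= 0 || relevant.isEmpty()) return 0.0;
        long hits = retrieved.stream().limit(k).distinct().filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    // 1 / rank of the first relevant result, or 0 if none was retrieved
    static double reciprocalRank(List<String> retrieved, Set<String> relevant) {
        for (int i = 0; i < retrieved.size(); i++) {
            if (relevant.contains(retrieved.get(i))) return 1.0 / (i + 1);
        }
        return 0.0;
    }

    // MRR: mean of the reciprocal ranks over all queries
    static double meanReciprocalRank(List<List<String>> runs, List<Set<String>> relevantSets) {
        if (runs.size() != relevantSets.size()) {
            throw new IllegalArgumentException("runs and relevantSets must have the same size");
        }
        if (runs.isEmpty()) return 0.0;
        double sum = 0.0;
        for (int i = 0; i < runs.size(); i++) {
            sum += reciprocalRank(runs.get(i), relevantSets.get(i));
        }
        return sum / runs.size();
    }
}
```

Track these numbers before and after each change to chunking or retrieval parameters so that iterations are compared on the same query set.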

Examples

Example 1: Basic Document Q&A

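// Easy RAG shortcut: assumes the langchain4j-easy-rag module is on the classpath,
// which supplies the default document parser, splitter, and embedding model
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/docs");
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, store);

// DocumentAssistant is your own interface, e.g. with a method String answer(String question)
DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
    .chatModel(chatModel)
    .contentRetriever(EmbeddingStoreContentRetriever.from(store))
    .build();

String answer = assistant.answer("What is the company policy on remote work?");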

Example 2: Metadata-Filtered Retrieval

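// Requires: import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
    .embeddingStore(store)
    .embeddingModel(embeddingModel)
    .maxResults(5)      // return at most 5 segments
    .minScore(0.7)      // drop segments below this relevance score
    .filter(metadataKey("category").isEqualTo("technical"))
    .build();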

Example 3: Multi-Source RAG Pipeline

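// query must be a dev.langchain4j.rag.query.Query; wrap plain text with Query.from(text)
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever docRetriever = EmbeddingStoreContentRetriever.from(docStore);

List<Content> results = new ArrayList<>();
results.addAll(webRetriever.retrieve(query));
results.addAll(docRetriever.retrieve(query));

// reranker is a placeholder for your own reranking component (not defined in this document)
List<Content> reranked = reranker.reorder(query, results);
// Guard against fewer than 5 results; subList(0, 5) would throw IndexOutOfBoundsException
List<Content> topResults = reranked.subList(0, Math.min(5, reranked.size()));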

Example 4: RAG with Chat Memory

// Assistant is your own interface, e.g. with a method String chat(String message)
Assistant assistant = AiServices.builder(Assistant.class)
    .chatModel(chatModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(retriever)
    .build();

assistant.chat("Tell me about the product features");
assistant.chat("What about pricing for those features?");  // Maintains context

Best Practices

Document Preparation

Clean documents before ingestion; remove irrelevant content and formatting

Add relevant metadata for filtering and context

Chunking Strategy

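Start with 500-1000 tokens per chunk, but stay within the embedding model's input limit (all-MiniLM-L6-v2, for example, truncates input beyond 256 word pieces)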

Include 10-20% overlap to preserve context at boundaries

Test different sizes for your specific use case
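
Fixed-size chunking with overlap can be sketched as a sliding window. The example below counts whitespace-separated words as a rough stand-in for tokens, which is an assumption: real token counts depend on the embedding model's tokenizer. The class name OverlapChunker is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits text into word windows that share `overlap` words.
final class OverlapChunker {
    private OverlapChunker() {}

    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        String[] words = trimmed.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;  // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break;  // last window reached the end; avoid a redundant tail chunk
            }
        }
        return chunks;
    }
}
```

With a chunk size of 800 and an overlap of 120 (15%), each chunk repeats the last 120 words of the previous one, so a sentence cut at a boundary still appears whole in one of the two chunks.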

Retrieval Optimization

Start with high k values (10-20), then filter/rerank

Use metadata filtering to improve relevance

Monitor retrieval quality and iterate based on user feedback
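
The "retrieve wide, then narrow" pattern can be illustrated with a brute-force cosine similarity search: take the top k candidates, drop those below a score threshold, and keep only the best few. This is a teaching sketch under the assumption of a small in-memory index; a vector database does the first stage with an approximate index instead. The class and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: brute-force dense retrieval with a score filter.
final class DenseSearch {
    private DenseSearch() {}

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) return 0.0;  // zero vector: no direction to compare
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Takes the k most similar IDs, filters by minScore, and returns at most `keep` of them.
    static List<String> topCandidates(float[] query, Map<String, float[]> index,
                                      int k, double minScore, int keep) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, float[]> entry : index.entrySet()) {
            scores.put(entry.getKey(), cosine(query, entry.getValue()));
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        int wide = Math.max(0, Math.min(k, ids.size()));
        List<String> kept = new ArrayList<>();
        for (String id : ids.subList(0, wide)) {
            if (scores.get(id) >= minScore && kept.size() < keep) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```

In a real pipeline the filter stage is where a reranker would slot in, rescoring the wide candidate set before the final cut.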

Performance

Cache embeddings for frequently accessed content

Use batch processing for document ingestion

Optimize vector store indexing for your scale
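
Caching embeddings for repeated text avoids paying for the same model call twice. A minimal sketch of a small LRU cache built on LinkedHashMap follows; the class name EmbeddingCache is hypothetical, and the embedder function stands in for whatever embedding call you use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: least-recently-used cache in front of an embedding function.
final class EmbeddingCache {
    private final Function<String, float[]> embedder;
    private final LinkedHashMap<String, float[]> cache;
    private int misses = 0;

    EmbeddingCache(int capacity, Function<String, float[]> embedder) {
        this.embedder = embedder;
        // accessOrder = true keeps the most recently used entries at the end
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > capacity;  // evict the least recently used entry
            }
        };
    }

    synchronized float[] embed(String text) {
        float[] vector = cache.get(text);
        if (vector == null) {
            misses++;
            vector = embedder.apply(text);  // only call the model on a cache miss
            cache.put(text, vector);
        }
        return vector;
    }

    synchronized int misses() {
        return misses;
    }
}
```

This is most useful for query embeddings, where the same questions recur; key the cache on normalized text if casing or whitespace varies.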

Constraints and Warnings

System Constraints

Embedding models have a maximum token limit per input; longer text is truncated or rejected, so chunks must fit within the limit

Vector databases require proper indexing for performance

Chunk boundaries may lose context for complex documents

Hybrid search requires additional infrastructure

Quality Warnings

Retrieval quality depends heavily on chunking strategy

Embedding models may not capture domain-specific semantics

Metadata filtering requires proper document annotation

Reranking adds latency to query responses

Security Warnings

Never hardcode credentials: Use environment variables for API keys and passwords

Validate external content: Documents from file systems, APIs, or web sources may contain malicious content (prompt injection)

Apply content filtering on retrieved documents before passing to LLM

Restrict allowed data source URLs and file paths using allowlists
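
A file path allowlist and an environment-based credential lookup can be sketched as follows. The class name SourceGuard, the root directories, and the variable name passed to requireEnv are all hypothetical. Note that normalize() resolves ".." segments but does not follow symlinks; for files that already exist, consider Path.toRealPath() as an additional check.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: restricts ingestion to approved directories.
final class SourceGuard {
    private final List<Path> allowedRoots;

    SourceGuard(List<String> roots) {
        this.allowedRoots = roots.stream()
                .map(root -> Paths.get(root).toAbsolutePath().normalize())
                .collect(Collectors.toList());
    }

    // True only if the normalized path lies under one of the allowed roots
    boolean isAllowed(String candidate) {
        Path path = Paths.get(candidate).toAbsolutePath().normalize();
        // Path.startsWith compares whole path components, so /data/docs-evil does not match /data/docs
        return allowedRoots.stream().anyMatch(path::startsWith);
    }

    // Reads a secret from the environment instead of source code
    static String requireEnv(String name) {
        String value = System.getenv(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }
}
```

Check every path with isAllowed before loading it, and fail closed: a rejected path should stop ingestion of that document rather than fall back to a default location.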

Resources

Reference Documentation

Vector Database Comparison

Embedding Models Guide

Retrieval Strategies

Document Chunking

LangChain4j RAG Guide
