qdrant-vector-search

Rust-powered vector database for production RAG with hybrid search and distributed scaling.

  • Supports dense, sparse, and multi-vector storage per point, with four distance metrics (COSINE, EUCLID, DOT, MANHATTAN) and HNSW indexing for fast nearest-neighbor search
  • Rich filtering during search across any payload field, with optional payload indexing for performance and support for complex boolean queries
  • Quantization options (scalar, product, binary) and on-disk storage for memory efficiency in large-scale deployments
  • Integrates with sentence-transformers, LangChain, and LlamaIndex for RAG pipelines; includes batch search, REST and gRPC APIs, and distributed deployment with sharding and replication

INSTALLATION
npx skills add https://github.com/davila7/claude-code-templates --skill qdrant-vector-search
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Qdrant - Vector Similarity Search Engine

High-performance vector database written in Rust for production RAG and semantic search.

When to use Qdrant

Use Qdrant when:

  • Building production RAG systems requiring low latency
  • Need hybrid search (vectors + metadata filtering)
  • Require horizontal scaling with sharding/replication
  • Want on-premise deployment with full data control
  • Need multi-vector storage per record (dense + sparse)
  • Building real-time recommendation systems

Key features:

  • Rust-powered: Memory-safe, high performance
  • Rich filtering: Filter by any payload field during search
  • Multiple vectors: Dense, sparse, multi-dense per point
  • Quantization: Scalar, product, binary for memory efficiency
  • Distributed: Raft consensus, sharding, replication
  • REST + gRPC: Both APIs with full feature parity

Use alternatives instead:

  • Chroma: Simpler setup, embedded use cases
  • FAISS: Maximum raw speed, research/batch processing
  • Pinecone: Fully managed, zero ops preferred
  • Weaviate: GraphQL preference, built-in vectorizers

Quick start

Installation

# Python client
pip install qdrant-client

# Docker (recommended for development)
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Docker with persistent storage
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

Basic usage

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Connect to Qdrant
client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Insert vectors with payload
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=[0.1, 0.2, ...],  # 384-dim vector
            payload={"title": "Doc 1", "category": "tech"}
        ),
        PointStruct(
            id=2,
            vector=[0.3, 0.4, ...],
            payload={"title": "Doc 2", "category": "science"}
        )
    ]
)

# Search with filtering
# (newer qdrant-client releases deprecate search() in favor of query_points())
results = client.search(
    collection_name="documents",
    query_vector=[0.15, 0.25, ...],
    query_filter={
        "must": [{"key": "category", "match": {"value": "tech"}}]
    },
    limit=10
)

for point in results:
    print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")

Core concepts

Points - Basic data unit

from qdrant_client.models import PointStruct

# Point = ID + Vector(s) + Payload
point = PointStruct(
    id=123,                              # Unsigned integer or UUID string
    vector=[0.1, 0.2, 0.3, ...],        # Dense vector
    payload={                            # Arbitrary JSON metadata
        "title": "Document title",
        "category": "tech",
        "timestamp": 1699900000,
        "tags": ["python", "ml"]
    }
)

# Batch upsert (recommended)
client.upsert(
    collection_name="documents",
    points=[point],  # Add more PointStruct objects to this list to batch them
    wait=True        # Block until the write has been applied
)

Collections - Vector containers

from qdrant_client.models import VectorParams, Distance, HnswConfigDiff

# Create with HNSW configuration
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=384,                        # Vector dimensions
        distance=Distance.COSINE         # COSINE, EUCLID, DOT, MANHATTAN
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                            # Connections per node (default 16)
        ef_construct=100,                # Build-time accuracy (default 100)
        full_scan_threshold=10000        # Segments below this size (KB) use brute-force scan
    ),
    on_disk_payload=True                 # Store payload on disk
)

# Collection info
info = client.get_collection("documents")
print(f"Points: {info.points_count}, Indexed vectors: {info.indexed_vectors_count}")

Distance metrics

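| Metric    | Use Case                            | Range    |
|-----------|-------------------------------------|----------|
| COSINE    | Text embeddings, normalized vectors | 0 to 2   |
| EUCLID    | Spatial data, image features        | 0 to ∞   |
| DOT       | Recommendations, unnormalized       | -∞ to ∞  |
| MANHATTAN | Sparse features, discrete data      | 0 to ∞   |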
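To make the four metrics concrete, here is a minimal pure-Python sketch of each formula. The helper names are illustrative and are not part of qdrant-client. Note that the "0 to 2" range listed for COSINE corresponds to cosine distance (1 minus similarity); cosine similarity itself lies between -1 and 1, so check which form your Qdrant version reports as the score.

```python
import math


def dot(a, b):
    """Dot product: larger means more similar (unbounded)."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in [-1, 1]."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    """Straight-line (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]
    print("cosine similarity:", cosine_similarity(a, b))
    print("cosine distance:  ", 1.0 - cosine_similarity(a, b))
    print("euclidean:        ", euclidean(a, b))
    print("manhattan:        ", manhattan(a, b))
```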

Search operations

Basic search

# Simple nearest neighbor search
results = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.2, ...],
    limit=10,
    with_payload=True,
    with_vectors=False  # Don't return vectors (faster)
)

Filtered search

from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Complex filtering
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="tech")),
            FieldCondition(key="timestamp", range=Range(gte=1699000000))
        ],
        must_not=[
            FieldCondition(key="status", match=MatchValue(value="archived"))
        ]
    ),
    limit=10
)

# Shorthand filter syntax
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter={
        "must": [
            {"key": "category", "match": {"value": "tech"}},
            {"key": "price", "range": {"gte": 10, "lte": 100}}
        ]
    },
    limit=10
)
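The boolean semantics of the filters above (every `must` condition holds, no `must_not` condition holds) can be illustrated with a small local evaluator over plain payload dicts. This is only a sketch of the logic, not how Qdrant evaluates filters internally; `condition_holds` and `passes_filter` are hypothetical helpers, and only the `match` and `range` conditions used in this section are handled.

```python
def condition_holds(condition, payload):
    """Evaluate one shorthand condition against a payload dict."""
    value = payload.get(condition["key"])
    if "match" in condition:
        return value == condition["match"]["value"]
    if "range" in condition:
        if value is None:
            return False  # a missing field never satisfies a range
        bounds = condition["range"]
        return (
            ("gte" not in bounds or value >= bounds["gte"])
            and ("gt" not in bounds or value > bounds["gt"])
            and ("lte" not in bounds or value <= bounds["lte"])
            and ("lt" not in bounds or value < bounds["lt"])
        )
    raise ValueError(f"unsupported condition: {condition}")


def passes_filter(flt, payload):
    """True if all `must` conditions hold and no `must_not` condition holds."""
    must_ok = all(condition_holds(c, payload) for c in flt.get("must", []))
    must_not_ok = not any(condition_holds(c, payload) for c in flt.get("must_not", []))
    return must_ok and must_not_ok


if __name__ == "__main__":
    flt = {
        "must": [
            {"key": "category", "match": {"value": "tech"}},
            {"key": "price", "range": {"gte": 10, "lte": 100}},
        ],
        "must_not": [{"key": "status", "match": {"value": "archived"}}],
    }
    print(passes_filter(flt, {"category": "tech", "price": 50, "status": "active"}))
```

A local check like this can be handy in unit tests for filter-building code before any server is involved.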

Batch search

from qdrant_client.models import SearchRequest

# Multiple queries in one request
results = client.search_batch(
    collection_name="documents",
    requests=[
        SearchRequest(vector=[0.1, ...], limit=5),
        SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}),
        SearchRequest(vector=[0.3, ...], limit=10)
    ]
)

RAG integration

With sentence-transformers

from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

# Initialize
encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="knowledge_base",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Index documents
documents = [
    {"id": 1, "text": "Python is a programming language", "source": "wiki"},
    {"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"},
]

points = [
    PointStruct(
        id=doc["id"],
        vector=encoder.encode(doc["text"]).tolist(),
        payload={"text": doc["text"], "source": doc["source"]}
    )
    for doc in documents
]

client.upsert(collection_name="knowledge_base", points=points)

# RAG retrieval
def retrieve(query: str, top_k: int = 5) -> list[dict]:
    query_vector = encoder.encode(query).tolist()
    results = client.search(
        collection_name="knowledge_base",
        query_vector=query_vector,
        limit=top_k
    )
    return [{"text": r.payload["text"], "score": r.score} for r in results]

# Use in RAG pipeline
context = retrieve("What is Python?")
prompt = f"Context: {context}\n\nQuestion: What is Python?"
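The pipeline above interpolates the raw list of result dicts into the prompt. One possible refinement, sketched here under the assumption that each retrieved item has the `text` and `score` keys returned by `retrieve`, is to order passages by score, number them, and cap the total context length. `build_prompt` and `max_chars` are hypothetical names introduced for illustration.

```python
def build_prompt(question, contexts, max_chars=2000):
    """Format retrieved passages as a numbered context block, best score first."""
    ranked = sorted(contexts, key=lambda c: c["score"], reverse=True)
    lines, used = [], 0
    for i, ctx in enumerate(ranked, start=1):
        entry = f"[{i}] {ctx['text']}"
        if used + len(entry) > max_chars:
            break  # stop once the character budget is exhausted
        lines.append(entry)
        used += len(entry)
    context_block = "\n".join(lines)
    return f"Context:\n{context_block}\n\nQuestion: {question}"


if __name__ == "__main__":
    retrieved = [
        {"text": "Machine learning uses algorithms", "score": 0.21},
        {"text": "Python is a programming language", "score": 0.87},
    ]
    print(build_prompt("What is Python?", retrieved))
```

A character budget is a crude stand-in for a token budget; swap in a tokenizer-based count if the target model's limits matter.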

With LangChain

from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings import HuggingFaceEmbeddings

# `documents` is assumed to be a list of LangChain Document objects
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Qdrant.from_documents(documents, embeddings, url="http://localhost:6333", collection_name="docs")
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

With LlamaIndex

from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext

# `client` is an existing QdrantClient; `documents` is assumed to be a list of LlamaIndex documents
vector_store = QdrantVectorStore(client=client, collection_name="llama_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()

Multi-vector support

Named vectors (different embedding models)

from qdrant_client.models import VectorParams, Distance, PointStruct

# Collection with multiple named vectors
# Note: both entries here are dense vectors; the name "sparse" is only a label.
# True sparse vectors are configured through sparse_vectors_config.
client.create_collection(
    collection_name="hybrid_search",
    vectors_config={
        "dense": VectorParams(size=384, distance=Distance.COSINE),
        "sparse": VectorParams(size=30000, distance=Distance.DOT)
    }
)

# Insert with named vectors
# (dense_embedding and sparse_embedding are placeholders for lists of floats
# with 384 and 30000 components respectively)
client.upsert(
    collection_name="hybrid_search",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense": dense_embedding,
                "sparse": sparse_embedding
            },
            payload={"text": "document text"}
        )
    ]
)

# Search specific vector
results = client.search(
    collection_name="hybrid_search",
    query_vector=("dense", query_dense),  # Specify which vector
    limit=10
)

Sparse vectors (BM25, SPLADE)

from qdrant_client.models import PointStruct, SparseVectorParams, SparseIndexParams, SparseVector

# Collection with sparse vectors only (no dense vectors configured)
client.create_collection(
    collection_name="sparse_search",
    vectors_config={},
    sparse_vectors_config={"text": SparseVectorParams(index=SparseIndexParams(on_disk=False))}
)

# Insert sparse vector: indices are the non-zero dimensions, values their weights
client.upsert(
    collection_name="sparse_search",
    points=[
        PointStruct(
            id=1,
            vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])},
            payload={"text": "document"}
        )
    ]
)
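A sparse vector is stored as parallel `indices` and `values` lists holding only the non-zero dimensions. The sketch below shows one way to produce that layout from a token-id-to-weight mapping (as a BM25 or SPLADE encoder might emit) and how two such vectors are compared by dot product. `to_sparse` and `sparse_dot` are hypothetical helpers, not qdrant-client functions.

```python
def to_sparse(weights):
    """Convert {dimension: weight} into sorted (indices, values), dropping zeros."""
    items = sorted((dim, w) for dim, w in weights.items() if w != 0.0)
    indices = [dim for dim, _ in items]
    values = [w for _, w in items]
    return indices, values


def sparse_dot(a_indices, a_values, b_indices, b_values):
    """Dot product over the dimensions the two sparse vectors share."""
    b_lookup = dict(zip(b_indices, b_values))
    return sum(v * b_lookup[i] for i, v in zip(a_indices, a_values) if i in b_lookup)


if __name__ == "__main__":
    indices, values = to_sparse({100: 0.2, 1: 0.5, 5: 0.8, 7: 0.0})
    print(indices, values)
    print(sparse_dot(indices, values, [5, 9], [2.0, 1.0]))
```

The resulting lists can be passed as the `indices` and `values` arguments of `SparseVector` shown above.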

Quantization (memory optimization)

from qdrant_client.models import (
    Distance, VectorParams,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
    SearchParams, QuantizationSearchParams,
)

# Scalar quantization (float32 -> int8, roughly 4x memory reduction)
client.create_collection(
    collection_name="quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,        # Clip outliers
            always_ram=True      # Keep quantized in RAM
        )
    )
)

# Search with rescoring (`query` is a placeholder for a 384-dim query vector)
results = client.search(
    collection_name="quantized",
    query_vector=query,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(rescore=True)  # Rescore top results with original vectors
    ),
    limit=10
)
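To see where the roughly 4x saving comes from, here is a simplified sketch of scalar quantization: each float32 component (4 bytes) is clipped to a range and mapped to a single byte. This illustrates the general idea only; Qdrant's actual encoding may differ, and `lo` and `hi` are hypothetical stand-ins for the clipping bounds that a setting like `quantile=0.99` would determine from the data.

```python
def quantize_uint8(vector, lo, hi):
    """Clip each value to [lo, hi] and map it to an integer code in [0, 255]."""
    scale = (hi - lo) / 255.0
    codes = [round((min(max(x, lo), hi) - lo) / scale) for x in vector]
    return codes, scale


def dequantize(codes, lo, scale):
    """Approximate the original floats from the integer codes."""
    return [lo + code * scale for code in codes]


if __name__ == "__main__":
    vector = [-0.5, 0.3, 0.9, 5.0]  # 5.0 is an outlier and gets clipped to hi
    codes, scale = quantize_uint8(vector, lo=-1.0, hi=1.0)
    print(codes)
    print(dequantize(codes, -1.0, scale))
    # Storage: 4 bytes per float32 component vs 1 byte per code
    print(len(vector) * 4, "bytes ->", len(codes), "bytes")
```

The rounding error per component is at most half a quantization step, which is why rescoring the top candidates against the original vectors recovers most of the lost precision.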

Payload indexing

from qdrant_client.models import PayloadSchemaType

# Create payload index for faster filtering
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

client.create_payload_index(
    collection_name="documents",
    field_name="timestamp",
    field_schema=PayloadSchemaType.INTEGER
)

# Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL

Production deployment

Qdrant Cloud

from qdrant_client import QdrantClient

# Connect to Qdrant Cloud
client = QdrantClient(
    url="https://your-cluster.cloud.qdrant.io",
    api_key="your-api-key"
)

Performance tuning

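from qdrant_client.models import HnswConfigDiff, OptimizersConfigDiff

# Optimize for recall: denser graph and more thorough build
# (costs extra memory and indexing time)
client.update_collection(
    collection_name="documents",
    hnsw_config=HnswConfigDiff(ef_construct=200, m=32)
)

# Tune indexing for bulk loads: indexing_threshold is in KB;
# setting it to 0 disables indexing until it is raised again
client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(indexing_threshold=20000)
)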

Best practices

  • Batch operations - Use batch upsert/search for efficiency
  • Payload indexing - Index fields used in filters
  • Quantization - Enable for large collections (>1M vectors)
  • Sharding - Use for collections >10M vectors
  • On-disk storage - Enable on_disk_payload for large payloads
  • Connection pooling - Reuse client instances

Common issues

Slow search with filters:

# Create payload index for filtered fields
client.create_payload_index(
    collection_name="docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

Out of memory:

# Enable quantization and on-disk storage
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(...),
    on_disk_payload=True
)
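Before changing settings, it can help to estimate how much RAM the vectors need. The sketch below uses a rough rule of thumb (vector count × dimensions × bytes per component × an overhead factor); the 1.5 overhead factor is an assumption for index and metadata overhead, and real usage depends on payloads, HNSW parameters, and whether original vectors are kept in RAM alongside quantized ones.

```python
def estimate_ram_bytes(num_vectors, dim, bytes_per_component=4, overhead=1.5):
    """Rough RAM estimate for vector storage; a planning aid, not a guarantee."""
    return int(num_vectors * dim * bytes_per_component * overhead)


if __name__ == "__main__":
    full = estimate_ram_bytes(1_000_000, 384)                          # float32
    quantized = estimate_ram_bytes(1_000_000, 384, bytes_per_component=1)  # int8
    print(f"float32: {full / 1e9:.2f} GB")
    print(f"int8:    {quantized / 1e9:.2f} GB")
```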

Connection issues:

# Set a request timeout and prefer gRPC
client = QdrantClient(
    host="localhost",
    port=6333,
    timeout=30,       # Seconds before a request is abandoned
    prefer_grpc=True  # Use gRPC (default grpc_port is 6334) for better performance
)
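The client settings above set a timeout; retrying failed calls is left to the caller. A minimal sketch of a generic retry wrapper with exponential backoff follows. `with_retry` is a hypothetical helper, and the exception types in `retry_on` are assumptions: adjust them to whatever your qdrant-client version actually raises on connection failures.

```python
import time


def with_retry(operation, attempts=3, base_delay=0.5,
               retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `operation()`; on a listed exception, wait and retry with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# With a client, usage might look like:
#   info = with_retry(lambda: client.get_collection("documents"))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.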

References

Resources

  • Version: 1.12.0+
  • License: Apache 2.0