faiss

Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types…

INSTALLATION

npx skills add https://github.com/davila7/claude-code-templates --skill faiss

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

FAISS - Efficient Similarity Search

Name: faiss
Author: davila7

Facebook AI's library for billion-scale vector similarity search.

When to use FAISS

Use FAISS when:

Need fast similarity search on large vector datasets (millions/billions)

GPU acceleration required

Pure vector similarity (no metadata filtering needed)

High throughput, low latency critical

Offline/batch processing of embeddings

Metrics:

31,700+ GitHub stars

Meta/Facebook AI Research

Handles billions of vectors

C++ with Python bindings

Use alternatives instead:

Chroma/Pinecone: Need metadata filtering

Weaviate: Need full database features

Annoy: Simpler, fewer features

Quick start

Installation

# CPU only

pip install faiss-cpu

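# GPU support (the FAISS project recommends conda for GPU builds; PyPI wheels are community-maintained and may lag behind)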

pip install faiss-gpu

Basic usage

import faiss

import numpy as np

# Create sample data (1000 vectors, 128 dimensions)

d = 128

nb = 1000

vectors = np.random.random((nb, d)).astype('float32')

# Create index

index = faiss.IndexFlatL2(d)  # L2 distance

index.add(vectors)             # Add vectors

# Search

k = 5  # Find 5 nearest neighbors

query = np.random.random((1, d)).astype('float32')

distances, indices = index.search(query, k)

print(f"Nearest neighbors: {indices}")

print(f"Distances: {distances}")

Index types

1. Flat (exact search)

# L2 (Euclidean) distance

index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if normalized)

index = faiss.IndexFlatIP(d)

# Exact results from a brute-force scan, so the slowest option on large datasets

2. IVF (inverted file) - Fast approximate

# Create quantizer

quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters

nlist = 100

index = faiss.IndexIVFFlat(quantizer, d, nlist)

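# Train on data (k-means on the vectors; FAISS warns when there are fewer than about 39 training points per cluster)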

index.train(vectors)

# Add vectors

index.add(vectors)

# Search (nprobe = number of clusters scanned per query; the default is 1)

index.nprobe = 10

distances, indices = index.search(query, k)

3. HNSW (Hierarchical Navigable Small World) - Best quality/speed

# HNSW index

M = 32  # Number of graph neighbors (links) per node

index = faiss.IndexHNSWFlat(d, M)

# No training needed

index.add(vectors)

# Search

distances, indices = index.search(query, k)

4. Product Quantization - Memory efficient

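# PQ stores each vector as m * nbits / 8 bytes (here 8 bytes instead of 512 for float32, about 64× smaller)

m = 8   # Number of subquantizers (d must be divisible by m)

nbits = 8  # Bits per subquantizer code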

index = faiss.IndexPQ(d, m, nbits)

# Train and add

index.train(vectors)

index.add(vectors)
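
The memory saving from PQ follows directly from m and nbits. The arithmetic can be sketched as below; `pq_code_bytes` and `compression_ratio` are hypothetical helper names, and the figures cover the stored codes only, not the codebooks or any other index overhead.

```python
import math

def pq_code_bytes(m: int, nbits: int) -> int:
    """Bytes per PQ-encoded vector: m codes of nbits each, rounded up to whole bytes."""
    return math.ceil(m * nbits / 8)

def compression_ratio(d: int, m: int, nbits: int = 8) -> float:
    """Raw float32 size divided by PQ code size (codes only)."""
    if d % m != 0:
        raise ValueError("d must be divisible by m")
    return (d * 4) / pq_code_bytes(m, nbits)

d = 128
for m in (8, 16, 32):
    print(f"m={m}: {pq_code_bytes(m, 8)} bytes/vector, "
          f"{compression_ratio(d, m):.0f}x smaller than float32")
# m=8:  8 bytes/vector, 64x smaller than float32
# m=16: 16 bytes/vector, 32x smaller than float32
# m=32: 32 bytes/vector, 16x smaller than float32
```

Fewer subquantizers means smaller codes but coarser approximation, so m is the main lever between memory and accuracy.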

Save and load

# Save index

faiss.write_index(index, "large.index")

# Load index

index = faiss.read_index("large.index")

# Continue using

distances, indices = index.search(query, k)

GPU acceleration

# Single GPU

res = faiss.StandardGpuResources()

index_cpu = faiss.IndexFlatL2(d)

index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU (by default the index is replicated on every visible GPU)

index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)

# Can be much faster than CPU (10-100× is sometimes quoted), depending on index type, batch size, and hardware

LangChain integration

from langchain_community.vectorstores import FAISS

from langchain_openai import OpenAIEmbeddings

# Create FAISS vector store (docs is a list of LangChain Document objects prepared beforehand)

vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save

vectorstore.save_local("faiss_index")

# Load

vectorstore = FAISS.load_local(

    "faiss_index",

    OpenAIEmbeddings(),

    allow_dangerous_deserialization=True  # loading unpickles a file; only enable for indexes you created or trust

)

# Search

results = vectorstore.similarity_search("query", k=5)

LlamaIndex integration

from llama_index.vector_stores.faiss import FaissVectorStore

import faiss

# Create FAISS index

d = 1536  # must match the embedding dimension (e.g. 1536 for OpenAI text-embedding-ada-002)

faiss_index = faiss.IndexFlatL2(d)

vector_store = FaissVectorStore(faiss_index=faiss_index)

Best practices

Choose the right index type - Flat for <10K vectors, IVF for 10K-1M, HNSW when recall matters most

Normalize for cosine - Use IndexFlatIP with normalized vectors

Use GPU for large datasets - Can be 10-100× faster, depending on index type and batch size

Save trained indices - Training is expensive

Tune nprobe (IVF) and efSearch (HNSW) - Balance speed/accuracy

Monitor memory - PQ for large datasets

Batch queries - Better GPU utilization
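
The index-selection guidance above can be turned into a small helper that returns a string for `faiss.index_factory`. This is a sketch only: `suggest_factory_string` is a hypothetical name, the size thresholds come from the list above, and the `nlist = 4 * sqrt(n)` choice is a commonly used rule of thumb assumed here rather than something the original states.

```python
import math

def suggest_factory_string(n_vectors: int,
                           prefer_quality: bool = False,
                           memory_constrained: bool = False) -> str:
    """Map dataset size and priorities to an index_factory description."""
    if n_vectors < 10_000:
        return "Flat"            # exact search is cheap at this size
    if memory_constrained:
        return "PQ8"             # 8 subquantizers; d must be divisible by 8
    if prefer_quality:
        return "HNSW32"          # graph index with M = 32
    nlist = int(4 * math.sqrt(n_vectors))  # assumed rule of thumb
    return f"IVF{nlist},Flat"

for n in (1_000, 250_000):
    print(n, "->", suggest_factory_string(n))
# 1000 -> Flat
# 250000 -> IVF2000,Flat

# Usage with FAISS (not executed here):
#   index = faiss.index_factory(d, suggest_factory_string(len(vectors)))
```

IVF and PQ indexes built this way still need `index.train(...)` before `index.add(...)`.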

Performance

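| Index Type | Build Time | Search Time | Memory | Accuracy |
|------------|------------|-------------|--------|----------|
| Flat       | Fast       | Slow        | High   | 100%     |
| IVF        | Medium     | Fast        | Medium | 95-99%   |
| HNSW       | Slow       | Fastest     | High   | 99%      |
| PQ         | Medium     | Fast        | Low    | 90-95%   |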

Resources

GitHub: https://github.com/facebookresearch/faiss ⭐ 31,700+

Wiki: https://github.com/facebookresearch/faiss/wiki

License: MIT
