Hybrid RAG: Combine Keyword and Vector Search

When Vector Search Alone Falls Short

Vector search is powerful, but it has a clear blind spot: exact keyword matches. If a user queries for a specific product SKU, a person's name, a company ticker symbol, or a technical term like "OAuth2" or "BERT," the vector representation of that query may not be close to documents that contain the exact string, especially if the term is rare or domain-specific and was underrepresented in the embedding model's training data.

This is a well-known limitation of dense retrieval. An embedding model has likely never seen your internal terminology during pre-training, so the embeddings for "Project Nexus" and "Project Atlas" may end up very close to each other even though they refer to completely different initiatives. Keyword search handles this case well because it matches exact tokens instead of relying on learned semantic similarity.

BM25: The Keyword Search Workhorse

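BM25 (Best Match 25) is a probabilistic ranking function that scores documents by how often each query term appears in them (with diminishing returns for repeated occurrences). Each term is weighted by inverse document frequency, so terms that appear in few documents count more than common ones, and the result is normalized for document length. It has been the backbone of search engines for decades and remains competitive with dense retrieval on queries that rely on exact terminology.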
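To make the three ingredients concrete, here is a minimal from-scratch sketch of Okapi BM25 scoring. It assumes the common defaults k1=1.5 and b=0.75 and the non-negative IDF variant used by Lucene; libraries such as rank-bm25 use slightly different IDF formulas, so their absolute scores will differ. The function name bm25_scores and the toy documents are illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    IDF uses the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(tokenized_docs)
    avgdl = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: number of documents that contain each term
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            # Rare terms (low document frequency) get a larger weight
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 makes term frequency saturate; b scales the document-length penalty
            length_norm = 1 - b + b * len(doc) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "project nexus uses a microservice architecture".split(),
    "project atlas migrates the billing system".split(),
    "the nexus gateway routes nexus traffic".split(),
]
print(bm25_scores(["nexus"], docs))
```

All three toy documents have the same length, so only term frequency and IDF matter here. The "atlas" document scores exactly zero, and the document that mentions "nexus" twice scores about 1.43 times the one that mentions it once rather than twice as much, which is the saturation effect controlled by k1.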

BM25 is a sparse retrieval method: it operates on inverted indexes of tokens rather than dense vector spaces. It is fast, requires no GPU, and needs no model training; building the index only requires tokenizing your corpus and collecting term statistics. Its weakness is that it cannot match synonyms or paraphrases on its own. "Automobile" and "car" are different tokens, so a BM25 index will not connect them. That is exactly the gap vector search fills, which is why you want both working together.
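A minimal sketch of the inverted index behind sparse retrieval shows both sides of this trade-off. The tokenizer regex is an assumption chosen to keep hyphen- or underscore-joined codes such as "sku-4471" as single tokens; the helper names and documents are hypothetical.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase, keep codes like "sku-4471" or "refresh_token" whole
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[-_][a-z0-9]+)*")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def candidate_docs(index, query):
    """Return ids of documents that share at least one exact token with the query."""
    hits = set()
    for token in tokenize(query):
        hits |= index.get(token, set())
    return hits


docs = [
    "Replacement filter for SKU-4471 automobile air intake.",
    "How to rent a car at the airport.",
    "SKU-4472 cabin filter, fits most sedans.",
]
index = build_inverted_index(docs)
print(sorted(candidate_docs(index, "sku-4471")))           # [0]
print(sorted(candidate_docs(index, "automobile rental")))  # [0]
```

The exact code lookup returns only the matching SKU, never its near-identical neighbor. But the query "automobile rental" misses the car-rental document entirely, because "automobile" and "car" (and "rental" and "rent") are different tokens.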

Reciprocal Rank Fusion: Combining the Scores

The challenge with combining BM25 and vector search is that their scores are not on the same scale. BM25 scores are unbounded, corpus-dependent sums of term statistics; cosine similarities range from -1 to 1. Adding them directly lets BM25 dominate, and the normalization needed to make them comparable is fragile and usually has to be tuned per corpus.
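The short sketch below illustrates both problems with made-up scores for three hypothetical documents. A raw sum simply reproduces the BM25 order. Min-max normalization (one common fix, chosen here for illustration) makes the scales comparable, but a document's normalized score then depends on what else happened to be retrieved.

```python
def min_max(scores):
    """Rescale a {doc: score} dict to [0, 1] using this result set's min and max."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


# Hypothetical scores for illustration only (not from a real corpus)
bm25 = {"doc_a": 14.2, "doc_b": 3.1, "doc_c": 0.0}
cosine = {"doc_a": 0.31, "doc_b": 0.82, "doc_c": 0.79}

# Raw sum: BM25's larger range swamps cosine, so the order just follows BM25
raw_sum = {doc: bm25[doc] + cosine[doc] for doc in bm25}
print(sorted(raw_sum, key=raw_sum.get, reverse=True))  # ['doc_a', 'doc_b', 'doc_c']

# Min-max: doc_b's normalized BM25 score halves when a high-scoring doc_d joins
# the result set, even though nothing about doc_b changed
print(round(min_max(bm25)["doc_b"], 3))                     # 0.218
print(round(min_max({**bm25, "doc_d": 30.0})["doc_b"], 3))  # 0.103
```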

Reciprocal Rank Fusion (RRF) sidesteps this by ignoring raw scores and using only rank positions. Each retriever contributes 1 / (k + rank) for every document it returns, where rank is the document's 1-based position in that retriever's list and k is a smoothing constant commonly set to 60. You sum these contributions across both retrieval systems. Documents that rank highly in both lists get a large combined score; documents that rank highly in only one get a smaller score. RRF requires no score normalization and is robust across very different scoring distributions.
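As a worked example of the formula RRF(d) = sum over retrievers of 1 / (k + rank(d)), here is a minimal sketch that fuses any number of ranked lists. The document labels and the two rankings are hypothetical.

```python
def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) across lists, with 1-based ranks."""
    fused = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["A", "B", "C"]
vector_ranking = ["B", "D", "A"]
for doc, score in rrf([bm25_ranking, vector_ranking]):
    print(doc, round(score, 5))
# B 0.03252  (1/62 + 1/61)
# A 0.03227  (1/61 + 1/63)
# D 0.01613  (1/62)
# C 0.01587  (1/63)
```

B and A appear in both lists, so they end up roughly twice as high as D and C, which appear in only one. The constant k controls how steeply credit falls off with rank. With k=60, a rank-1 hit contributes 1/61 (about 0.0164) and a rank-10 hit 1/70 (about 0.0143), so agreement between retrievers matters more than exact position. With k=1, the same ranks contribute 0.5 and about 0.091, so a single top-ranked hit dominates.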

Implementing Hybrid Search in Python

The code below combines BM25 retrieval using the rank-bm25 library with vector retrieval using sentence-transformers, then merges the two ranked lists with RRF, so no score normalization is needed. It is a minimal, in-memory version of the pattern that you can adapt to an existing RAG pipeline.

pip install rank-bm25 sentence-transformers numpy

import re

import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "The OAuth2 authorization flow requires a client_id and client_secret.",
    "Authentication tokens expire after 3600 seconds by default.",
    "Project Nexus uses a microservice architecture with 12 services.",
    "Use the refresh_token endpoint to obtain a new access token.",
    "BM25 ranks documents by term frequency and inverse document frequency.",
]

def tokenize(text):
    # Lowercase and split on non-word characters, so "token." matches "token"
    return re.findall(r"\w+", text.lower())

# Build BM25 index
tokenized_corpus = [tokenize(doc) for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

# Build vector index; normalized embeddings make a dot product equal cosine similarity
embed_model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embed_model.encode(corpus, normalize_embeddings=True)

def bm25_search(query, top_k=10):
    scores = bm25.get_scores(tokenize(query))
    ranked = np.argsort(scores)[::-1][:top_k]
    # Skip documents with no matching terms so they earn no RRF credit
    return [int(i) for i in ranked if scores[i] > 0]

def vector_search(query, top_k=10):
    query_emb = embed_model.encode([query], normalize_embeddings=True)[0]
    scores = doc_embeddings @ query_emb
    return [int(i) for i in np.argsort(scores)[::-1][:top_k]]

def reciprocal_rank_fusion(bm25_ranks, vector_ranks, k=60):
    scores = {}
    for ranking in (bm25_ranks, vector_ranks):
        # enumerate() is 0-based, so rank + 1 is the 1-based rank RRF expects
        for rank, doc_idx in enumerate(ranking):
            scores[doc_idx] = scores.get(doc_idx, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

query = "OAuth2 refresh token expiry"
bm25_results = bm25_search(query, top_k=5)
vector_results = vector_search(query, top_k=5)
hybrid_results = reciprocal_rank_fusion(bm25_results, vector_results)

print("Hybrid search results for: '{}'".format(query))
for doc_idx, score in hybrid_results[:3]:
    print("  Score {:.4f}: {}".format(score, corpus[doc_idx]))

Try this on a query containing a specific product code or a rare technical term. BM25 will tend to surface documents with exact keyword matches that vector search ranks lower, while vector search catches semantically related documents that BM25 misses because the wording differs. The fused ranking combines the strengths of both retrievers. For production, replace the in-memory BM25 index and NumPy embedding matrix with a persistent keyword index and a vector database, and tune the k parameter (60 is a safe default) and your top-k values against an evaluation set.
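To check whether hybrid retrieval actually helps, you can score each retriever against a small labeled set of queries with known relevant documents. The sketch below uses recall@k and mean reciprocal rank (MRR), two common retrieval metrics. The labels and retriever outputs are hypothetical placeholders. In practice you would build each run by calling bm25_search, vector_search, and reciprocal_rank_fusion (keeping only the document ids from its (id, score) pairs) on every evaluation query.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / (1-based position of the first relevant result), or 0 if none is retrieved."""
    for position, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate(run, qrels, k=5):
    """Average recall@k and MRR over all labeled queries.

    run:   {query: ranked list of doc ids} produced by one retriever
    qrels: {query: set of relevant doc ids} from your labeled evaluation set
    """
    queries = list(qrels)
    recall = sum(recall_at_k(run.get(q, []), qrels[q], k) for q in queries) / len(queries)
    mrr = sum(reciprocal_rank(run.get(q, []), qrels[q]) for q in queries) / len(queries)
    return {f"recall@{k}": recall, "mrr": mrr}


# Hypothetical labels and retriever outputs, for illustration only
qrels = {"SKU-4471 spec": {3}, "how do tokens expire": {1, 3}}
runs = {
    "vector": {"SKU-4471 spec": [0, 2, 4], "how do tokens expire": [1, 3, 0]},
    "hybrid": {"SKU-4471 spec": [3, 0, 2], "how do tokens expire": [1, 3, 0]},
}
for name, run in runs.items():
    print(name, evaluate(run, qrels, k=3))
# vector {'recall@3': 0.5, 'mrr': 0.5}
# hybrid {'recall@3': 1.0, 'mrr': 1.0}
```

Running the same harness for BM25 alone, vector alone, and hybrid at a few values of k and top-k gives you a direct comparison on your own queries. This matters most for the exact-match queries, such as SKUs, that motivated the hybrid setup in the first place.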

Prompt

"My RAG system fails on queries that contain specific internal product codes and SKUs because vector search does not find exact keyword matches. Explain how I would add BM25 hybrid search to my existing vector-only pipeline, what the RRF k parameter controls, and how to evaluate whether the hybrid approach actually improves retrieval quality."

Aki Wijesundara