Hybrid RAG: Combine Keyword and Vector Search
Vector search often misses exact keyword matches for product codes, names, and domain-specific terms. Combining BM25 with vector search through Reciprocal Rank Fusion (RRF) gives you both semantic understanding and precise term matching.
When Vector Search Alone Falls Short
Vector search is powerful, but it has a clear blind spot: exact keyword matches. If a user queries for a specific product SKU, a person's name, a company ticker symbol, or a technical term like "OAuth2" or "BERT," the query embedding may not land close to the embeddings of documents that contain that exact string. This is especially likely when the term is rare or domain-specific and was underrepresented in the embedding model's training data.
This is a well-known limitation of dense retrieval. The embedding model never saw your internal terminology during pre-training, so the embeddings for "Project Nexus" and "Project Atlas" may end up nearly identical even though they refer to completely different initiatives. Keyword search handles this case well because it matches exact tokens regardless of semantic similarity.
BM25: The Keyword Search Workhorse
BM25 (Best Match 25) is a probabilistic ranking function that scores a document by how often each query term appears in it, weighted by how rare that term is across the corpus (inverse document frequency) and adjusted for document length. It has been a backbone of search engines for decades and remains competitive with dense retrieval on queries that hinge on exact terminology.
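To make those three ingredients concrete, here is a minimal from-scratch sketch of Okapi BM25 scoring. It assumes pre-tokenized documents, the common defaults k1=1.5 and b=0.75, and a Lucene-style IDF that is always positive. Libraries such as rank-bm25 use slightly different IDF variants, so exact scores will differ, but the shape of the computation is the same.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    Assumes a Lucene-style IDF, log(1 + (N - df + 0.5) / (df + 0.5)),
    which stays positive; other implementations use other IDF variants.
    """
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in tokenized_docs for term in set(doc))

    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue  # term absent: contributes nothing
            # Rare terms (low df) get a larger weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated occurrences; b penalizes longer-than-average docs.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "the oauth2 flow requires a client_id".split(),
    "tokens expire after 3600 seconds".split(),
    "project nexus uses twelve services".split(),
]
print(bm25_scores(["oauth2", "client_id"], docs))
```

Only the first document shares tokens with the query, so it is the only one with a nonzero score.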
BM25 is a sparse retrieval method: it operates on inverted indexes of tokens rather than dense vector spaces. It is extremely fast, requires no GPU, and needs no training on your data. Its weakness is that it cannot handle synonyms or paraphrases. "Automobile" and "car" are different tokens, so a BM25 index will not connect them. That is exactly the gap vector search fills, which is why you want both working together.
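A toy inverted index shows why "sparse" retrieval cannot bridge synonyms. This sketch assumes plain lowercase whitespace tokenization with no stemming or synonym expansion (real search engines add analyzers on top). A query can only reach documents that contain one of its literal tokens.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def candidate_docs(index, query):
    """Return ids of documents sharing at least one exact token with the query."""
    hits = set()
    for token in query.lower().split():
        # .get avoids inserting empty entries into the defaultdict
        hits |= index.get(token, set())
    return hits


docs = [
    "Buy a used car today",
    "Automobile insurance rates explained",
]
index = build_inverted_index(docs)
print(candidate_docs(index, "car"))         # {0}
print(candidate_docs(index, "automobile"))  # {1}
```

The "car" query never touches the "automobile" document, which is exactly the gap an embedding model covers.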
Reciprocal Rank Fusion: Combining the Scores
The challenge with combining BM25 and vector search is that their scores are on different scales. BM25 scores are unbounded sums of term statistics whose range depends on the query and the corpus; cosine similarities lie between -1 and 1 and, for many embedding models, cluster in a fairly narrow band. Adding them directly lets BM25 dominate, and normalizing them first is fragile and usually needs tuning per corpus.
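The sketch below illustrates that fragility with hypothetical scores for three documents. Min-max scaling plus a weight `alpha` is one common normalization approach, used here only for illustration. A raw sum is swamped by BM25, while min-max scaling stretches a 0.06 spread in cosine scores across the full [0, 1] range, which flips the winner.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_fusion(bm25_scores, cosine_scores, alpha=0.5):
    """Blend min-max normalized scores; alpha weights the vector side."""
    b = min_max(bm25_scores)
    v = min_max(cosine_scores)
    return [alpha * vs + (1 - alpha) * bs for bs, vs in zip(b, v)]


# Hypothetical scores for the same three documents.
bm25 = [12.4, 3.1, 0.0]
cosine = [0.41, 0.47, 0.44]

raw_sum = [x + y for x, y in zip(bm25, cosine)]
print(raw_sum)                       # BM25's range dominates: doc 0 wins
print(weighted_fusion(bm25, cosine))  # tiny cosine gaps are amplified: doc 1 wins
```

Which document wins depends on the normalization and on `alpha`, and both are sensitive to the score distribution of each corpus.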
Reciprocal Rank Fusion (RRF) sidesteps this by ignoring raw scores and using only rank positions. For each document, you compute 1 / (k + rank) in each result list, where rank starts at 1 and k is a smoothing constant typically set to 60, then sum these contributions across both retrieval systems. Documents that rank highly in both lists get a large combined score; documents that appear high in only one list get a smaller one. A larger k flattens the gap between adjacent ranks, so agreement across lists matters more than a single top placement. Because only ranks matter, RRF needs no score normalization and is robust to very different scoring distributions.
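Here is a minimal sketch of RRF generalized to any number of ranked lists, using hypothetical document ids. With k=60, document A (ranked 1st and 2nd) scores 1/61 + 1/62, slightly ahead of C (ranked 3rd and 1st) at 1/63 + 1/61. The second example uses the extreme value k=0 purely to show how k shifts the balance between consistent placement and a single top hit.

```python
def rrf(ranked_lists, k=60):
    """Fuse any number of ranked lists of doc ids with Reciprocal Rank Fusion.

    Ranks are 1-based: the top document in a list contributes 1 / (k + 1).
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["A", "B", "C"]
vector_ranking = ["C", "A", "D"]
for doc_id, score in rrf([bm25_ranking, vector_ranking]):
    print(doc_id, round(score, 5))

# Effect of k: "X" is #1 in one list only; "Y" is #3 in both lists.
lists = [["X", "P", "Y"], ["Q", "R", "Y"]]
print(rrf(lists, k=60)[0])  # Y wins: consistent placement is rewarded
print(rrf(lists, k=0)[0])   # a single #1 placement now outweighs it
```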
Implementing Hybrid Search in Python
The code below combines BM25 retrieval using rank-bm25 with vector retrieval using sentence-transformers. The RRF function merges the two ranked lists without any score normalization. It is a minimal working pattern you can adapt to an existing RAG pipeline.
```bash
pip install rank-bm25 sentence-transformers numpy
```

```python
import re

import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "The OAuth2 authorization flow requires a client_id and client_secret.",
    "Authentication tokens expire after 3600 seconds by default.",
    "Project Nexus uses a microservice architecture with 12 services.",
    "Use the refresh_token endpoint to obtain a new access token.",
    "BM25 ranks documents by term frequency and inverse document frequency.",
]


def tokenize(text):
    # Lowercase and split on non-word characters so trailing punctuation
    # ("token.") does not block a match with "token". \w keeps underscores,
    # so identifiers like client_id stay whole.
    return re.findall(r"\w+", text.lower())


# Build the BM25 (sparse) index
bm25 = BM25Okapi([tokenize(doc) for doc in corpus])

# Build the vector (dense) index; normalized embeddings make the dot product a cosine similarity
embed_model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embed_model.encode(corpus, normalize_embeddings=True)


def bm25_search(query, top_k=10):
    """Return indices of the top_k documents by BM25 score, best first."""
    scores = bm25.get_scores(tokenize(query))
    ranked = np.argsort(scores)[::-1][:top_k]
    # Drop documents that share no terms with the query so they do not
    # pick up RRF credit from the keyword side.
    return [int(i) for i in ranked if scores[i] > 0]


def vector_search(query, top_k=10):
    """Return indices of the top_k documents by cosine similarity, best first."""
    query_emb = embed_model.encode([query], normalize_embeddings=True)[0]
    scores = doc_embeddings @ query_emb
    return [int(i) for i in np.argsort(scores)[::-1][:top_k]]


def reciprocal_rank_fusion(bm25_ranks, vector_ranks, k=60):
    """Merge two ranked lists of doc indices with RRF; ranks are 1-based."""
    scores = {}
    for ranking in (bm25_ranks, vector_ranks):
        for rank, doc_idx in enumerate(ranking, start=1):
            scores[doc_idx] = scores.get(doc_idx, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


query = "OAuth2 refresh token expiry"
bm25_results = bm25_search(query, top_k=5)
vector_results = vector_search(query, top_k=5)
hybrid_results = reciprocal_rank_fusion(bm25_results, vector_results)

print(f"Hybrid search results for: '{query}'")
for doc_idx, score in hybrid_results[:3]:
    print(f"  Score {score:.4f}: {corpus[doc_idx]}")
```
Try this on a query containing a specific product code or a rare technical term. BM25 will typically surface documents with exact keyword matches that vector search ranks lower, while vector search catches semantically related documents that BM25 misses because the wording differs. The fused list combines the strengths of both retrievers. For production, replace the in-memory indexes with a proper BM25 engine and a vector database client, and tune the RRF k parameter (60 is a widely used default) and your top-k values against your own evaluation set.
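One way to check whether hybrid search actually helps is to score each retriever on a small labeled set with recall@k and mean reciprocal rank (MRR). This is a minimal sketch: the labeled queries and the canned rankings below are hypothetical stand-ins so the example runs without a model. In practice you would pass `bm25_search`, `vector_search`, or a wrapper such as `lambda q: [d for d, _ in reciprocal_rank_fusion(bm25_search(q), vector_search(q))]`.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result (1-based), or 0.0 if none is found."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(search_fn, eval_set, k=5):
    """Average recall@k and MRR of search_fn over (query, relevant_ids) pairs."""
    recalls, rrs = [], []
    for query, relevant in eval_set:
        retrieved = search_fn(query)
        recalls.append(recall_at_k(retrieved, relevant, k))
        rrs.append(reciprocal_rank(retrieved, relevant))
    n = len(eval_set)
    return {"recall@k": sum(recalls) / n, "mrr": sum(rrs) / n}


# Hypothetical labeled queries and canned rankings standing in for real retrievers.
eval_set = [("SKU-4471 battery", {2}), ("how do I renew a token", {3})]
canned = {
    "vector": {"SKU-4471 battery": [0, 1, 2], "how do I renew a token": [3, 1]},
    "hybrid": {"SKU-4471 battery": [2, 0, 1], "how do I renew a token": [3, 1]},
}
for name, runs in canned.items():
    print(name, evaluate(lambda q, r=runs: r[q], eval_set, k=2))
```

Running the same evaluation set across several values of k and top-k gives you a direct, corpus-specific basis for tuning rather than relying on defaults.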
Prompt
"My RAG system fails on queries that contain specific internal product codes and SKUs because vector search does not find exact keyword matches. Explain how I would add BM25 hybrid search to my existing vector-only pipeline, what the RRF k parameter controls, and how to evaluate whether the hybrid approach actually improves retrieval quality."
Aki Wijesundara
Expert team of AI professionals and career advisors with experience at top tech companies. We've helped 500+ students land internships at Google, Meta, OpenAI, and other leading AI companies.