Metadata Filtering in Vector Queries: Search Smarter Not Harder
Pure semantic search returns the most similar documents, but similar does not always mean relevant. Metadata filtering restricts the search to the right subset before similarity scoring, so every result is both similar to the query and eligible for the user who asked.
Why Pure Semantic Search Falls Short
Imagine you are building a search engine over a knowledge base that spans multiple product lines, multiple user tiers, and five years of documentation. A user searches for "billing configuration" and your semantic search returns 10 highly similar results. But 7 of them are for a product the user does not have access to, 2 are from a deprecated version, and 1 is for an enterprise feature the user's plan does not include.
This is the core problem with pure semantic search: similarity to the query tells you nothing about whether a document is appropriate for this user. Business logic, access controls, freshness requirements, and categorical relevance are all separate from semantic meaning. Without metadata filtering, your vector search index treats all documents as equally eligible for retrieval regardless of context.
The naive fix is to retrieve more results and filter after the fact. Retrieve 50, apply your business rules in application code, return the survivors. This is called post-filtering, and it has a serious flaw: if most of your index is outside the allowed subset, you may retrieve 50 candidates, filter out 45, and end up with only 5 results when you needed 10. Worse, the genuinely relevant documents within the allowed subset may rank below position 50 in the full index, meaning post-filtering will miss them entirely.
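The shortfall is easy to reproduce on a toy index. The sketch below is plain Python, not a vector database: the `post_filter_search` helper, the `tenant` field, and the made-up `score` values (standing in for similarity to the query) are all assumptions for illustration.

```python
def post_filter_search(index, allowed, fetch_k, want_k):
    """Rank the whole index by score, keep the top fetch_k, then filter."""
    ranked = sorted(index, key=lambda doc: doc["score"], reverse=True)[:fetch_k]
    survivors = [doc for doc in ranked if allowed(doc)]
    return survivors[:want_k]


# Hypothetical index: 1,000 documents, every 20th one (5%) belongs to tenant "a".
# The score is a stand-in for similarity and simply decreases with the id.
index = [
    {"id": i, "tenant": "a" if i % 20 == 0 else "b", "score": 1.0 - i / 1000}
    for i in range(1000)
]

hits = post_filter_search(
    index,
    allowed=lambda doc: doc["tenant"] == "a",
    fetch_k=50,
    want_k=10,
)
eligible = sum(1 for doc in index if doc["tenant"] == "a")
print(f"eligible in index: {eligible}, returned after post-filtering: {len(hits)}")
# prints: eligible in index: 50, returned after post-filtering: 3
```

Fifty eligible documents exist, but only three of them rank inside the top 50 of the full index, so the caller gets three results instead of ten.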
Prompt
"I am building semantic search over a multi-tenant knowledge base. Help me design a metadata schema for my vector index that supports filtering by tenant ID, document category, language, and creation date. Consider query patterns, cardinality, and how each field will be used in filters before making recommendations."
Pre-filtering vs Post-filtering
Pre-filtering restricts the search space to only the documents that match the metadata criteria before running similarity scoring. This is the correct approach. The database identifies the eligible subset using its metadata index, then runs approximate nearest neighbor search over only that subset. The result is semantically ranked results drawn exclusively from the documents the user is allowed to see.
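As a rough illustration of the ordering (filter first, rank second), here is a brute-force sketch in plain Python. A real database uses a metadata index and approximate search rather than list scans; the `pre_filter_search` helper, the `tenant` field, and the toy `score` values are assumptions for this example only.

```python
def pre_filter_search(index, allowed, want_k):
    """Select the eligible subset first, then rank only that subset."""
    eligible = [doc for doc in index if allowed(doc)]
    return sorted(eligible, key=lambda doc: doc["score"], reverse=True)[:want_k]


# Hypothetical index: 1,000 documents, every 20th one (5%) belongs to tenant "a".
index = [
    {"id": i, "tenant": "a" if i % 20 == 0 else "b", "score": 1.0 - i / 1000}
    for i in range(1000)
]

hits = pre_filter_search(index, allowed=lambda doc: doc["tenant"] == "a", want_k=10)
print(len(hits), [doc["id"] for doc in hits][:3])
# prints: 10 [0, 20, 40]
```

Because ranking happens inside the eligible subset, the query returns a full page of ten results, all from the tenant's own documents.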
Pre-filtering is more technically demanding than post-filtering because the database must efficiently combine a metadata lookup with a vector search. This is exactly what dedicated vector databases like Qdrant are built to do. Qdrant maintains a payload index alongside the vector index. When a filtered query arrives, Qdrant uses the payload index to estimate how many points match the filter. If the matching set is small, it scores those points directly; otherwise it traverses the HNSW graph while checking the filter, so ineligible points never enter the results. The combination of a well-indexed payload and a filter-aware HNSW search is both fast and accurate.
The performance difference is significant. Take a collection of one million documents where 95% are outside the user's tenant. Post-filtering searches the full million-vector index, and because only about 1 in 20 candidates survives the filter, it has to over-fetch roughly 20x (about 200 candidates) just to expect 10 usable results, with no guarantee of getting them. Pre-filtering confines the search to the 50,000 vectors in the tenant's subset and excludes the other 950,000 entirely. The search space is 20x smaller, and every result returned is one the user is allowed to see.
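The arithmetic behind this example can be checked in a few lines. This is a back-of-the-envelope sketch that assumes the tenant's documents are spread evenly through the ranking, which real data will not guarantee; the variable names are illustrative.

```python
total_vectors = 1_000_000
tenant_percent = 5   # 95% of the collection is outside the tenant
want_k = 10

# Vectors that belong to the tenant.
tenant_vectors = total_vectors * tenant_percent // 100

# Candidates a post-filter must fetch so that, on average, want_k survive.
expected_fetch = want_k * 100 // tenant_percent

# How much smaller the pre-filtered search space is than the full index.
search_space_ratio = total_vectors // tenant_vectors

print(tenant_vectors, expected_fetch, search_space_ratio)
# prints: 50000 200 20
```

The 200-candidate figure is only an average. If the tenant's documents happen to be less similar to the query than the rest of the index, the required over-fetch grows.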
Attaching Metadata at Index Time
In Qdrant, metadata is stored as a payload attached to each point. You create payload indexes on specific fields before loading data, which tells Qdrant to build an inverted index on those fields for fast filtered lookups. Without a payload index, filtered queries degrade to a full payload scan, which negates the performance benefit of filtering entirely.
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import (
    VectorParams, Distance, PointStruct, PayloadSchemaType
)

# Assumes a Qdrant instance is reachable on localhost:6333.
client = QdrantClient(host="localhost", port=6333)

# size must match the output dimension of the embedding model (384 here).
client.create_collection(
    collection_name="knowledge_base",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Index every field that will appear in a filter, before loading any data.
client.create_payload_index(
    collection_name="knowledge_base",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)
client.create_payload_index(
    collection_name="knowledge_base",
    field_name="tenant_id",
    field_schema=PayloadSchemaType.KEYWORD
)
client.create_payload_index(
    collection_name="knowledge_base",
    field_name="published_year",
    field_schema=PayloadSchemaType.INTEGER
)

# `documents` (a list of dicts) and `embeddings` (a mapping from document id
# to a NumPy vector) are assumed to be prepared earlier in the pipeline.
points = []
for doc in documents:
    points.append(PointStruct(
        id=str(uuid.uuid4()),
        vector=embeddings[doc["id"]].tolist(),
        payload={
            "title": doc["title"],
            "category": doc["category"],
            "tenant_id": doc["tenant_id"],
            "published_year": doc["published_year"]
        }
    ))

client.upsert(collection_name="knowledge_base", points=points)
print("Indexed {} documents with payload indexes".format(len(points)))
Create payload indexes before loading data, not after. Adding an index to an existing large collection forces Qdrant to build it over every point already stored, which is slow and resource-intensive. Define your filtering fields upfront, create their indexes, then start your ingestion pipeline.
Filtered Queries in Qdrant
With payload indexes in place, filtered queries combine vector similarity with metadata constraints in a single API call. Qdrant's filter syntax supports exact matches with MatchValue, range conditions with Range, list membership with MatchAny, and logical operators must, should, and must_not. These compose into a Filter object passed alongside the query vector.
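from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# `model` is assumed to be the same embedding model used at index time, with
# an encode() method that returns a NumPy array (for example, a
# sentence-transformers model).
query_vector = model.encode("database connection configuration").tolist()

# Note: recent qdrant-client releases deprecate search() in favor of
# query_points(); check the version you have installed.
results = client.search(
    collection_name="knowledge_base",
    query_vector=query_vector,
    query_filter=Filter(
        must=[
            # Tenant isolation: only this tenant's documents are eligible.
            FieldCondition(
                key="tenant_id",
                match=MatchValue(value="acme-corp")
            ),
            FieldCondition(
                key="category",
                match=MatchValue(value="engineering")
            ),
            # Freshness: documents published in 2024 or later.
            FieldCondition(
                key="published_year",
                range=Range(gte=2024)
            )
        ]
    ),
    limit=10
)

for hit in results:
    print("[{:.4f}] {} ({}, {})".format(
        hit.score,
        hit.payload["title"],
        hit.payload["category"],
        hit.payload["published_year"]
    ))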
The must list is an AND condition: all constraints must be satisfied. Use should for OR conditions (at least one must match) and must_not to exclude documents. You can nest Filter objects to build complex boolean expressions. For multi-tenant systems, always include tenant_id as a must condition in every query, enforcing access isolation at the database layer rather than relying on application-side checks that could be bypassed by a bug or missing guard clause.
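One way to make the tenant condition impossible to forget is to route every filter through a small builder. The sketch below is a hypothetical helper, not part of Qdrant: `tenant_scoped_filter`, the `status` field, and the `operations` category are assumptions for illustration. It builds plain dictionaries that mirror the JSON filter shape of Qdrant's REST API, and it expresses the OR as a nested filter inside `must`.

```python
def tenant_scoped_filter(tenant_id, *conditions, exclude=()):
    """Return a filter dict whose first `must` clause pins the tenant.

    Hypothetical helper: the dicts mirror the JSON filter shape of Qdrant's
    REST API (must / should / must_not, key plus match or range).
    """
    if not tenant_id:
        raise ValueError("tenant_id is required for every query")
    flt = {
        "must": [
            {"key": "tenant_id", "match": {"value": tenant_id}},
            *conditions,
        ]
    }
    if exclude:
        flt["must_not"] = list(exclude)
    return flt


# OR over two categories, written as a nested filter.
engineering_or_operations = {
    "should": [
        {"key": "category", "match": {"value": "engineering"}},
        {"key": "category", "match": {"value": "operations"}},
    ]
}

query_filter = tenant_scoped_filter(
    "acme-corp",
    engineering_or_operations,
    {"key": "published_year", "range": {"gte": 2024}},
    # `status` is a made-up payload field, used only to show must_not.
    exclude=[{"key": "status", "match": {"value": "deprecated"}}],
)
```

Because the helper raises when `tenant_id` is missing, a query without tenant isolation fails loudly in development instead of silently returning another tenant's documents.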
Aki Wijesundara