Vector Databases Explained: What They Are and When You Need One

What Is an Embedding?

An embedding is a numerical representation of a piece of content. Given a sentence like "the quarterly revenue exceeded projections," a language model's embedding function produces a list of floating-point numbers, perhaps 384 or 1536 of them, that captures the semantic meaning of that sentence. Similar sentences produce numerically similar vectors. Dissimilar sentences produce numerically distant vectors.

This is not simple keyword matching. The embedding for "our Q3 results were above forecast" will be numerically close to "the quarterly revenue exceeded projections" even though they share almost no words. This is what makes embedding-based search semantically powerful: it finds meaning, not strings.

Generating embeddings is straightforward. The sentence-transformers library provides pre-trained models that run locally, and the major model providers offer embedding APIs that take text and return vectors. The challenge is not generating embeddings but storing, indexing, and querying them efficiently at scale.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Vector databases store high-dimensional embeddings.",
    "Semantic search finds results by meaning.",
    "HNSW is an approximate nearest neighbor algorithm."
]

embeddings = model.encode(documents)
print("Embedding shape: {}".format(embeddings.shape))
# Output: Embedding shape: (3, 384)

How Cosine Similarity Search Works

The standard way to measure similarity between two embedding vectors is cosine similarity. It computes the cosine of the angle between two vectors in high-dimensional space. Two vectors pointing in the same direction have a cosine similarity of 1.0. Two orthogonal vectors have a similarity of 0. Two opposite vectors have a similarity of -1.0.
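To make those three reference values concrete, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the toy two-dimensional vectors are illustrative assumptions; real embeddings have hundreds of dimensions, but the arithmetic is the same.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors chosen to hit the three reference cases.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # right angle
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # antiparallel

print("{:.1f} {:.1f} {:.1f}".format(same_direction, orthogonal, opposite))
```

Note that the length of the vectors does not matter: `[1.0, 2.0]` and `[2.0, 4.0]` point the same way, so they score as identical.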

When you search a vector database, you provide a query vector (the embedding of your search query), and the database finds the stored vectors with the highest cosine similarity to your query. The result is a ranked list of documents ordered by semantic relevance.

The naive implementation is a linear scan: compute the similarity between your query and every stored vector, then sort. Its cost grows linearly with the number of stored vectors, so it works fine for thousands of vectors but becomes too slow for interactive queries over millions. Vector databases solve this with approximate nearest neighbor (ANN) algorithms that trade a small amount of recall for dramatic speed improvements. One of the most widely used ANN algorithms is HNSW (Hierarchical Navigable Small World), which builds a layered graph index so that a query visits only a small fraction of the stored vectors.
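The linear scan described above can be sketched in a few lines. This is an exact (not approximate) search over hypothetical three-dimensional vectors; the helper names `cosine_similarity` and `linear_scan` are assumptions made for illustration, and an ANN index such as HNSW exists precisely to avoid the "score everything" loop shown here.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def linear_scan(query, vectors, k):
    """Exact top-k search: score every stored vector, keep the k best."""
    scored = ((cosine_similarity(query, vec), idx) for idx, vec in enumerate(vectors))
    # nlargest avoids a full sort, but every vector is still scored once.
    return heapq.nlargest(k, scored)

# Hypothetical stored vectors standing in for document embeddings.
stored = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

top = linear_scan([1.0, 0.1, 0.0], stored, k=2)
for score, idx in top:
    print("vector {} scored {:.4f}".format(idx, score))
```

Every query touches all stored vectors, so the work per query scales with the size of the collection. That is acceptable for a prototype and is the cost an ANN index is designed to cut.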

Key Vector Database Options

The vector database landscape has matured quickly. Here are the options you will encounter most often.

Qdrant is an open-source vector database written in Rust. It supports dense vectors, sparse vectors, named vector spaces, payload filtering, and horizontal scaling. It can run locally via Docker or as a managed cloud service. Its Python SDK is well-documented and it has native support for hybrid search combining dense and sparse vectors in a single query.

Pinecone is a fully managed vector database with a simple API. It abstracts all infrastructure and is the easiest option to get started with. The trade-off is higher cost and less configurability compared to self-hosted solutions, particularly around index tuning and quantization.

Weaviate is an open-source vector database with built-in module support for automatic vectorization. It supports hybrid BM25 plus vector search out of the box and has a GraphQL query interface. Its schema system makes it well-suited for structured data alongside vectors.

pgvector is a PostgreSQL extension that adds vector storage and similarity search to an existing Postgres database. If your team already operates Postgres and your vector dataset is under a few million rows, pgvector is often the simplest path. It avoids introducing a new infrastructure component entirely and benefits from Postgres's mature ecosystem for backups, access control, and tooling.

Prompt

"I have a corpus of 500,000 product support documents and need to build semantic search over them. My team already runs Postgres. Help me decide between pgvector and Qdrant. Ask me about my query latency requirements, my team's operational capabilities, and whether I need metadata filtering before giving a recommendation."

When You Need a Vector DB vs When You Don't

Not every project needs a dedicated vector database. If you are building a prototype that searches a few hundred documents, a simple in-memory approach with numpy and cosine similarity is faster to build and sufficient for the purpose. If your document set changes infrequently and is small enough to fit in memory, a pre-computed embedding cache stored in a JSON file or SQLite might be all you need.
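As a rough sketch of the SQLite option, the snippet below caches precomputed embeddings as JSON text and searches them with a full scan. The table name, the `doc-N` identifiers, and the three-dimensional vectors are all hypothetical; in practice the vectors would come from a call like `model.encode()`, and numpy would make the scoring loop faster.

```python
import json
import math
import sqlite3

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical precomputed embeddings keyed by document id.
precomputed = {
    "doc-1": [0.9, 0.1, 0.0],
    "doc-2": [0.1, 0.9, 0.0],
    "doc-3": [0.0, 0.2, 0.9],
}

# ":memory:" keeps the example self-contained; pass a file path to persist the cache.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (doc_id TEXT PRIMARY KEY, vector TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO embeddings (doc_id, vector) VALUES (?, ?)",
    [(doc_id, json.dumps(vec)) for doc_id, vec in precomputed.items()],
)
conn.commit()

def search(conn, query, k=2):
    """Load every cached vector, score it against the query, return the top k."""
    rows = conn.execute("SELECT doc_id, vector FROM embeddings").fetchall()
    scored = [(cosine_similarity(query, json.loads(vec)), doc_id) for doc_id, vec in rows]
    scored.sort(reverse=True)
    return scored[:k]

results = search(conn, [0.8, 0.2, 0.0])
for score, doc_id in results:
    print("{:.4f} {}".format(score, doc_id))
```

The embeddings are computed once and reused across runs, which is often the expensive part. Nothing here indexes the vectors, so this only makes sense while a full scan stays fast enough.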

You need a vector database when your document set is too large to fit comfortably in memory, when documents change frequently (upserts and deletes must be fast), when you need metadata filtering alongside vector search, or when you need multi-tenant isolation between user datasets. At that point, the operational features of a proper vector database, including persistent storage, ANN indexing, filtering, and horizontal scaling, justify the infrastructure overhead.
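To illustrate what metadata filtering and tenant isolation mean in practice, here is a toy in-memory sketch: restrict the candidates by payload first, then rank the survivors by similarity. The `Point` class, the `filtered_search` helper, and the `tenant` and `lang` payload keys are hypothetical. A production vector database would typically apply the filter inside its index rather than scanning a Python list.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Point:
    id: str
    vector: list
    payload: dict = field(default_factory=dict)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(points, query, must, k=3):
    """Keep points whose payload matches every key in `must`, then rank by similarity."""
    candidates = [
        p for p in points
        if all(p.payload.get(key) == value for key, value in must.items())
    ]
    ranked = sorted(candidates, key=lambda p: cosine_similarity(query, p.vector), reverse=True)
    return ranked[:k]

points = [
    Point("a", [1.0, 0.0], {"tenant": "acme", "lang": "en"}),
    Point("b", [0.9, 0.1], {"tenant": "globex", "lang": "en"}),
    Point("c", [0.2, 0.8], {"tenant": "acme", "lang": "en"}),
    Point("d", [0.95, 0.05], {"tenant": "acme", "lang": "de"}),
]

# Only acme's English documents are eligible, however similar the others are.
hits = filtered_search(points, [1.0, 0.0], must={"tenant": "acme", "lang": "en"})
print([p.id for p in hits])
```

Point `b` is nearly identical to the query but belongs to another tenant, so it never appears in the results. Getting that guarantee cheaply at scale is one of the things a dedicated database provides.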

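from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
import uuid

# Assumes a Qdrant server is listening on localhost:6333 (for example, via Docker)
# and that `model`, `documents`, and `embeddings` come from the embedding example above.
client = QdrantClient(host="localhost", port=6333)

# The vector size must match the model's output dimension (384 for all-MiniLM-L6-v2).
client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# `documents` is a list of strings, so store each text in the payload.
points = [
    PointStruct(
        id=str(uuid.uuid4()),
        vector=embeddings[i].tolist(),
        payload={"text": doc}
    )
    for i, doc in enumerate(documents)
]

client.upsert(collection_name="articles", points=points)
print("Upserted {} points".format(len(points)))

# Embed the query with the same model used for the documents.
query_vector = model.encode("how does semantic search work?").tolist()

# query_points is the query API in recent qdrant-client releases;
# older releases exposed the same operation as client.search(query_vector=...).
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    limit=5
).points

for hit in results:
    print("Score: {:.4f} | {}".format(hit.score, hit.payload["text"]))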

The decision rule is practical: start with the simplest thing that works. Reach for a vector database when your prototype has validated the use case and the operational requirements outgrow an in-memory or file-based approach. Premature infrastructure complexity is a common failure mode in AI projects, and a vector database is infrastructure with real operational costs. Earn it by demonstrating the need first.

Aki Wijesundara
