
Stop Overthinking Vector Databases: A Builder’s Decision Framework
A comprehensive guide to choosing the right vector database for your AI application. Includes a decision tree, a technical deep dive into indexing, and a code-first tutorial on setting up pgvector.
The Analysis Paralysis of the AI Stack
If you are building AI agents or RAG (Retrieval-Augmented Generation) systems today, you have hit this wall. You need long-term memory for your LLM. You know you need a vector database.
But then you look at the landscape.
Pinecone, Weaviate, Qdrant, Milvus, Chroma, LanceDB, Supabase (pgvector), MongoDB Atlas Vector Search, Redis... the list is exhausting. Every vendor claims to be the fastest, the most scalable, and the easiest to use.
As an engineer, I don’t care about marketing benchmarks on a billion vectors. I care about shipping functional software. Most of us aren't building the next Google; we are building micro-SaaS tools, internal enterprise agents, or personal automation workflows.
In this guide, I’m stripping away the hype. We are going to cover how vector DBs actually work (briefly), use a 3-question framework to pick one, and then actually set one up.
Part 1: What You Are Actually Buying (The Tech)
Before we pick a tool, you need to understand the mechanism. A vector database does not store "text"; it stores arrays of floating-point numbers (embeddings) that represent the semantic meaning of that text.
When you query the DB, you aren't doing a WHERE text LIKE '%query%' match. You are doing a mathematical distance calculation (Cosine Similarity, Euclidean Distance, or Dot Product).
The Bottleneck: Indexing
Calculating the distance between your query vector and every single vector in your database (k-Nearest Neighbors, or kNN) is accurate but incredibly slow at scale. It’s O(N) complexity.
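To make that concrete, here is exact kNN in plain NumPy; the corpus and query are random stand-ins, but the O(N) scan is real:

import numpy as np

# Made-up corpus: 10,000 embeddings, 1536 dimensions each
corpus = np.random.rand(10_000, 1536).astype(np.float32)
query = np.random.rand(1536).astype(np.float32)

def cosine_knn(query, corpus, k=3):
    # Normalize rows so a dot product equals cosine similarity
    corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    # One comparison per stored vector: this line is the O(N) cost
    similarities = corpus_norm @ query_norm
    # Indices of the k most similar vectors, best first
    return np.argsort(similarities)[-k:][::-1]

print(cosine_knn(query, corpus))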
To solve this, Vector DBs use Approximate Nearest Neighbor (ANN) algorithms. This is the secret sauce you are paying for. There are two main types you should know:
- IVF (Inverted File Index): Partitions the vector space into clusters (Voronoi cells). When a query comes in, it searches only the closest cluster (or a handful of the closest) and ignores the rest. It’s fast but requires a "training" step to define the clusters.
- HNSW (Hierarchical Navigable Small World): This is the industry standard right now. Imagine a multi-layered graph. The top layer is a highway (long jumps across data). The bottom layer is local streets (fine-grained connections). It’s extremely fast and accurate but consumes more RAM.
Takeaway: If you are self-hosting, RAM usage (HNSW) vs. disk usage (IVF/DiskANN) matters. If you use a managed service, this is their problem, not yours.
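If you want to feel the difference between the two index families, here is a minimal sketch using the faiss library (my choice for illustration, not something this stack requires; the data is random filler):

import numpy as np
import faiss

d = 128  # toy dimensionality
data = np.random.rand(10_000, d).astype(np.float32)
query = np.random.rand(1, d).astype(np.float32)

# IVF: partition into 100 clusters; note the explicit training step
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 100)
ivf.train(data)   # learns the cluster centroids
ivf.add(data)
ivf.nprobe = 8    # clusters to visit per query (speed/recall trade-off)

# HNSW: no training step; the graph is built as vectors are added
hnsw = faiss.IndexHNSWFlat(d, 32)  # 32 = graph connectivity (M)
hnsw.add(data)

for index in (ivf, hnsw):
    distances, ids = index.search(query, 3)
    print(ids)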
Part 2: The 3-Question Decision Framework
Stop reading feature comparison charts. Answer these three questions, and your architecture usually defines itself.
Q1: Is your data relational?
Do your vectors live alongside structured data? For example, if you are building a user dashboard where a user searches their own documents, you need strict metadata filtering (user_id = 123).
- Yes: Use a database that supports vectors and standard SQL (see the sketch after this list). Don't split your logic.
- No: You are building a pure semantic search engine or a massive unstructured knowledge base. A dedicated Vector DB is fine here.
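To make Q1 concrete, here is what that filter looks like in pgvector syntax (the schema matches Part 4; the 3-dimension vector literal is a toy stand-in for a real 1536-dimension embedding):

-- Hypothetical: search only the documents owned by user 123
select content
from documents
where metadata->>'user_id' = '123'
order by embedding <=> '[0.1, 0.2, 0.3]'::vector
limit 5;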
Q2: What is your team size and infrastructure capacity?
- Solo Dev / Small Team: You cannot afford to manage a Kubernetes cluster for Milvus. You need Serverless or Managed.
- Enterprise / DevOps Team: You might need to run Qdrant or Weaviate on your own VPC for compliance (HIPAA/SOC2) or cost control.
Q3: How many vectors will you store?
- < 100k Vectors: Literally anything works. Even a local JSON file or a Pandas dataframe in memory is fast enough.
- 100k - 10M Vectors: This is the sweet spot for standard solutions (Pinecone, pgvector, etc.).
- 100M+ Vectors: You need specialized, distributed infrastructure (Milvus, specialized Weaviate clusters).
Part 3: The Decision Tree (Pick Your Winner)
Based on the questions above, here is my "If This, Then That" recommendation engine.
Scenario A: The Full-Stack Developer (The "One DB" Rule)
Recommendation: Postgres with pgvector (via Supabase or Neon)
If you are already using Postgres for your user auth and app data, do not add another piece of infrastructure. pgvector is performant enough for 99% of use cases. It allows you to JOIN vector results with standard SQL tables. It simplifies backups, migrations, and consistency.
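As a sketch of what that buys you (the users table and its columns are hypothetical, and the vector literal is again a toy stand-in):

-- Hypothetical join: semantic search scoped to paying customers
select d.content, u.email
from documents d
join users u on u.id = (d.metadata->>'user_id')::bigint
where u.plan = 'pro'
order by d.embedding <=> '[0.1, 0.2, 0.3]'::vector
limit 5;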
Scenario B: The "I Need Speed & Simplicity" (Serverless)
Recommendation: Pinecone or Upstash
If you don't want to manage a database at all and just want an API endpoint to dump vectors into, Pinecone is the gold standard. It is purely managed. Upstash is fantastic if you are in the Vercel/Next.js ecosystem and want per-request pricing.
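For flavor, here is roughly what "just an API endpoint" feels like with Pinecone's Python client (the index name is made up, and you would create the index in their dashboard first; treat this as a sketch, not gospel for every client version):

import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index("my-docs")  # assumes a 1536-dimension index already exists

# Upsert: ship vectors plus free-form metadata, no schema required
index.upsert(vectors=[
    {"id": "doc-1", "values": [0.1] * 1536, "metadata": {"source": "faq"}},
])

# Query: top 3 nearest neighbors
print(index.query(vector=[0.1] * 1536, top_k=3, include_metadata=True))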
Scenario C: The Feature-Rich Engineer
Recommendation: Weaviate or Qdrant
Do you need hybrid search (keyword + vector combined)? Do you need built-in modules where the DB handles embedding generation for you? Weaviate and Qdrant are "smart" databases that offer features beyond simple storage.
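As a taste of hybrid search, here is a hedged sketch with the v4 weaviate-client (it assumes a local Weaviate instance with an already-populated "Docs" collection, both of which are my inventions for illustration):

import weaviate

client = weaviate.connect_to_local()
docs = client.collections.get("Docs")

# alpha blends the two signals: 0 = pure keyword (BM25), 1 = pure vector
response = docs.query.hybrid(query="vector database pricing", alpha=0.5, limit=3)
for obj in response.objects:
    print(obj.properties)

client.close()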
Scenario D: Local / Testing / Privacy
Recommendation: Chroma or LanceDB
Building a prototype on your laptop? Building an app that runs on the user's device (Electron/Mobile)? LanceDB and Chroma can run embedded within your application code without a server.
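Chroma in embedded mode really is this small (the collection name is arbitrary, and by default Chroma downloads a small local embedding model on first use):

import chromadb

# Runs in-process: no server; use chromadb.PersistentClient for disk-backed state
client = chromadb.Client()
collection = client.create_collection("docs")

# Chroma generates the embeddings itself with its default local model
collection.add(
    ids=["1", "2"],
    documents=["pgvector lives inside Postgres.", "Pinecone is fully managed."],
)

print(collection.query(query_texts=["managed vector database"], n_results=1))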
Part 4: Let's Build (Implementation)
We are going to go with Scenario A because it is the most robust for a portfolio project. We will set up a vector store using Supabase (Postgres + pgvector).
Step 1: Enable the Extension
In your Supabase SQL Editor, run this to enable the pgvector extension:
create extension if not exists vector;
Step 2: Create the Table
We will create a table for "documents" that includes a chunk of text and its corresponding vector embedding. Note: OpenAI's text-embedding-3-small outputs 1536 dimensions.
create table documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(1536)
);
Step 3: Create the Index (HNSW)
To make it fast, we add an HNSW index. Without this, Postgres does a sequential scan (slow).
create index on documents using hnsw (embedding vector_cosine_ops);
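pgvector also exposes the two HNSW knobs from Part 1. The values below are the documented defaults, spelled out so you know where to reach when recall or build time becomes a problem:

-- m = graph connectivity, ef_construction = build-time search width
-- (higher values: better recall, slower builds, more RAM)
create index on documents using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);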
Step 4: The Python Code
Here is a Python script to insert data and query it. You could reach for the vecs or supabase client libraries, but let's use standard psycopg2 for raw clarity.
import os

import psycopg2
from openai import OpenAI

# Configuration
DB_URL = os.getenv("DATABASE_URL")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# 1. Generate Embedding
def get_embedding(text):
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

# 2. Insert Function
def insert_doc(content, metadata="{}"):
    vec = get_embedding(content)
    conn = psycopg2.connect(DB_URL)
    cur = conn.cursor()
    # str() on a Python list yields '[0.1, 0.2, ...]', which pgvector
    # accepts as its text input format
    cur.execute(
        "INSERT INTO documents (content, metadata, embedding) VALUES (%s, %s::jsonb, %s::vector);",
        (content, metadata, str(vec)),
    )
    conn.commit()
    cur.close()
    conn.close()

# 3. Search Function
def search_docs(query_text, limit=3):
    query_vec = get_embedding(query_text)
    conn = psycopg2.connect(DB_URL)
    cur = conn.cursor()
    # The magic SQL operator: <=> is cosine distance,
    # so 1 - distance gives cosine similarity
    sql = """
        SELECT content, 1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s;
    """
    cur.execute(sql, (str(query_vec), str(query_vec), limit))
    results = cur.fetchall()
    cur.close()
    conn.close()
    return results

# Usage
insert_doc("pgvector adds approximate nearest neighbor search to Postgres.")
print(search_docs("How do I choose a vector DB?"))
Part 5: The Cost Reality Check
Before you ship, do the math. Vector DBs have vastly different pricing models.
- Pod-Based (e.g., Pinecone Standard): You pay per hour for the infrastructure, whether you use it or not. If you spin up a P1 pod, it costs ~$70/month even with zero traffic. Great for predictable, high-throughput apps. Bad for hobby projects.
- Usage-Based (e.g., Pinecone Serverless, Upstash): You pay per read/write unit. If no one uses your app, you pay $0. This is ideal for startups and portfolios.
- Compute-Based (e.g., Supabase/Postgres): You pay for the underlying database size and compute. Storing vectors adds to your disk size, and building HNSW indexes consumes RAM. If your index grows larger than your available RAM, performance falls off a cliff.
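A quick back-of-the-envelope for that last point, counting raw float32 vector data only (real indexes add overhead on top):

# 1536 dimensions x 4 bytes (float32) = ~6 KB per vector
dims = 1536
bytes_per_vector = dims * 4
for n in (100_000, 1_000_000, 10_000_000):
    gb = n * bytes_per_vector / 1024**3
    print(f"{n:>10,} vectors ~ {gb:.1f} GB")

# Prints roughly: 0.6 GB, 5.7 GB, and 57 GB respectively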
Summary
There is no "best" Vector DB, only the one that fits your current constraints.
- Using Postgres? Use pgvector.
- Need Serverless/Easy? Use Pinecone.
- Need Local? Use Chroma.
Pick one, implement the code above, and move on to the hard part: making your agent actually intelligent.