Research
Vector Search Tradeoffs for Production
2024-11-03 · 6 min
How to choose among ANN libraries, index sizes, and metadata filters when latency matters and budgets only pretend to be infinite.
Vector search is a three-way balance among recall, latency, and storage. Index choice and embedding dimensionality directly drive infrastructure cost, which is a polite way of saying it can get expensive fast.
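To make the cost lever concrete, here is a rough sizing sketch for an uncompressed float32 flat index. The 4-bytes-per-dimension figure ignores graph or IVF overhead, and `flat_index_bytes` is a hypothetical helper name, not any library's API.

```python
def flat_index_bytes(n_vectors: int, dims: int, bytes_per_dim: int = 4) -> int:
    """Raw vector storage for an uncompressed float32 flat index.
    Ignores per-index overhead (HNSW graphs, IVF centroids, id maps)."""
    return n_vectors * dims * bytes_per_dim

# 10M vectors: dimensionality alone is a 4x cost lever.
large = flat_index_bytes(10_000_000, 1536)  # 61.44 GB raw
small = flat_index_bytes(10_000_000, 384)   # 15.36 GB raw
print(large / 1e9, small / 1e9)
```

Halving dimensions halves memory, which is why smaller embedding models are often the cheapest latency win before any index tuning happens.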
For production, I prefer offline index builds, strict metadata filtering, and incremental refreshes that avoid downtime. Live rebuilds are great until they are not.
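The no-downtime refresh pattern can be sketched as build-offline-then-swap: readers always see a complete snapshot, and the new index replaces the old one in a single pointer assignment. This is a minimal illustration with a brute-force dict standing in for a real ANN index; `SwappableIndex` and its methods are invented names, not a library API.

```python
import threading

def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class SwappableIndex:
    """Serve reads from a stable snapshot; refresh by building a new index
    off the read path and swapping it in atomically."""
    def __init__(self, vectors):
        self._lock = threading.Lock()
        self._vectors = dict(vectors)  # id -> embedding snapshot

    def search(self, query, k=5):
        with self._lock:
            snapshot = self._vectors  # readers never observe a half-built index
        ranked = sorted(snapshot, key=lambda i: _sq_dist(query, snapshot[i]))
        return ranked[:k]

    def swap(self, new_vectors):
        built = dict(new_vectors)  # the expensive "build" happens before locking
        with self._lock:
            self._vectors = built  # atomic swap: no downtime, no partial state
```

With a real ANN library the `swap` step is the same idea: build the new index file offline, load it alongside the old one, then flip which instance serves traffic.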
When recall must stay high, I raise the search depth (efSearch in HNSW, nprobe in IVF indexes) and aggressively cache hot queries to protect median latency.
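Caching hot queries is mostly an exact-repeat game, but rounding the embedding key lets near-identical requests share an entry. A minimal LRU sketch, assuming list-like query embeddings; the class name, capacity, and rounding precision are all illustrative choices.

```python
from collections import OrderedDict

class QueryCache:
    """Tiny LRU for high-frequency queries. Keys are rounded embeddings so
    near-identical requests hit the same entry; precision is a tuning knob."""
    def __init__(self, capacity=1024, precision=4):
        self.capacity = capacity
        self.precision = precision
        self._store = OrderedDict()

    def _key(self, query):
        return tuple(round(x, self.precision) for x in query)

    def get(self, query):
        key = self._key(query)
        if key in self._store:
            self._store.move_to_end(key)  # refresh recency on a hit
            return self._store[key]
        return None  # miss: caller falls through to the real index

    def put(self, query, results):
        key = self._key(query)
        self._store[key] = results
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least-recently used
```

Because hot queries follow a heavy-tailed distribution, even a small cache like this can shave the median noticeably while leaving tail latency to the index itself.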
If you don’t measure drift, your index will quietly get worse while everyone blames the model.
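One cheap drift signal is the gap between the corpus centroid and the centroid of recent query embeddings: if traffic moves and the index does not, that gap trends upward. A crude proxy sketch under that assumption; `drift_score` is a hypothetical name and a real monitor would track the trend, not a single reading.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(vectors[0])
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(dims)]

def drift_score(index_vectors, recent_queries):
    """Euclidean distance between corpus centroid and recent-query centroid.
    A rising trend suggests the index no longer matches live traffic."""
    return math.dist(centroid(index_vectors), centroid(recent_queries))
```

Centroid distance is blunt (it misses multi-modal shifts), but it is nearly free to compute and catches the slow, silent degradation that otherwise gets blamed on the model.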
- Measure recall with real queries, not synthetic ones.
- Filter before similarity search to cut cost.
- Cache high-frequency queries and results.
- Monitor index drift and re-embed on schedule.
- Do not overfit to benchmarks; users will not read them.
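The filter-before-search item above can be sketched directly: apply the cheap metadata predicate first, then score only the survivors. A brute-force illustration; `filtered_search` and the `(id, tag, embedding)` layout are assumptions for the example, and a production ANN index would apply the filter natively or via pre-built per-tag partitions.

```python
def filtered_search(items, query, allowed_tags, k=3):
    """Pre-filter on metadata, then rank only the survivors by distance.
    `items` is a list of (id, metadata_tag, embedding) tuples."""
    # Cheap predicate first: shrinks the candidate set before any vector math.
    candidates = [(i, emb) for i, tag, emb in items if tag in allowed_tags]
    ranked = sorted(
        candidates,
        key=lambda c: sum((x - y) ** 2 for x, y in zip(c[1], query)),
    )
    return [i for i, _ in ranked[:k]]
```

The ordering matters for cost: a metadata check is a tag comparison, a similarity score is a full pass over the embedding, so filtering first means the expensive operation runs on a fraction of the corpus.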