Research

Vector Search Tradeoffs for Production

2024-11-03 · 6 min

How to choose among ANN libraries, index sizes, and metadata filters when latency matters and budgets only pretend to be infinite.

Vector search is a balancing act among recall, latency, and storage. Index type and vector dimensionality directly drive infrastructure cost, which is a polite way of saying it can get expensive fast.
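To make the cost point concrete, here is a back-of-envelope sizing sketch. The numbers are illustrative assumptions, not benchmarks, and the function only counts raw vector storage, ignoring graph links and other index overhead.

```python
# Back-of-envelope index sizing: raw float32 vectors dominate memory cost.
# Vector counts and dimensions below are illustrative assumptions.
def index_size_gb(num_vectors: int, dim: int, bytes_per_component: int = 4) -> float:
    """Approximate flat-index footprint in GiB (vectors only, no graph or overhead)."""
    return num_vectors * dim * bytes_per_component / 2**30

# 100M vectors at 768 dims in float32:
full = index_size_gb(100_000_000, 768)          # ~286 GiB
# Same corpus at 1 byte per component (e.g. scalar quantization):
quantized = index_size_gb(100_000_000, 768, 1)  # ~72 GiB
print(f"float32: {full:.0f} GiB, int8: {quantized:.0f} GiB")
```

Halving dimensionality or quantizing components scales this linearly, which is why those two knobs show up in every cost conversation.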

For production, I prefer offline index builds, strict metadata filtering, and incremental refreshes that avoid downtime. Live rebuilds are great until they are not.
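"Strict" filtering here means pre-filtering: restrict the candidate set by metadata before scoring vectors, so an approximate index can never leak results that violate the filter. A minimal sketch, assuming a hypothetical multi-tenant setup (the `vectors` and `tenant_of` arrays are stand-ins) and exact scoring over the allowed subset:

```python
import numpy as np

# Hypothetical corpus: 1000 vectors, each tagged with a tenant id.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 64)).astype(np.float32)
tenant_of = rng.integers(0, 4, size=1000)  # metadata: tenant id per vector

def filtered_search(query, tenant, k=5):
    """Strict pre-filter: score only vectors whose metadata matches."""
    allowed = np.flatnonzero(tenant_of == tenant)  # filter applied first
    sub = vectors[allowed]
    # exact (brute-force) cosine scoring over the allowed subset
    sims = sub @ query / (np.linalg.norm(sub, axis=1) * np.linalg.norm(query))
    top = np.argsort(-sims)[:k]
    return allowed[top]                            # map back to global ids

ids = filtered_search(vectors[0], tenant_of[0])
assert all(tenant_of[i] == tenant_of[0] for i in ids)
```

At scale you would hand the `allowed` id set to the ANN index as a selector rather than brute-forcing, but the invariant is the same: the filter runs before the similarity search, not after.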

When recall must stay high, I tune the search depth (efSearch in HNSW, nprobe in IVF) and aggressively cache hot queries to reduce median latency.

If you don’t measure drift, your index will quietly get worse while everyone blames the model.
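Drift is measurable with a simple probe: periodically sample live queries, compute the exact top-k by brute force, and compare it with what the index returned. Falling recall over time is the signal. A sketch under those assumptions, with `recall_at_k` and the fake approximate result as hypothetical illustrations:

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def exact_top_k(vectors, query, k=10):
    """Ground truth: brute-force nearest neighbours by L2 distance."""
    dists = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(1)
vecs = rng.standard_normal((500, 32)).astype(np.float32)
q = vecs[42] + 0.01 * rng.standard_normal(32).astype(np.float32)
truth = exact_top_k(vecs, q)

# Pretend the index returned 8 true neighbours plus 2 stray ids (-1, -2
# are deliberately invalid so they can never match the ground truth).
approx = list(truth[:8]) + [-1, -2]
print(recall_at_k(approx, truth))  # 0.8
```

Run it on a fixed sample of queries and chart the number; when it dips, the index needs a rebuild, no matter what the model metrics say.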