Research
Vector Search Tradeoffs for Production
2024-11-03 · 6 min
How to choose among ANN libraries, index sizes, and metadata filters when latency matters and budgets only pretend to be infinite.
Vector search is a three-way balance among recall, latency, and storage. Index choice and embedding dimensionality directly drive infrastructure cost, which is a polite way of saying it can get expensive fast.
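To make the cost lever concrete, here is a rough sizing sketch for an uncompressed float32 flat index. The 4-bytes-per-dimension figure ignores graph or IVF overhead, and `flat_index_bytes` is a hypothetical helper name, not any library's API.

```python
def flat_index_bytes(n_vectors: int, dims: int, bytes_per_dim: int = 4) -> int:
    """Raw vector storage for an uncompressed float32 flat index.
    Ignores per-index overhead (HNSW graphs, IVF centroids, id maps)."""
    return n_vectors * dims * bytes_per_dim

# 10M vectors: dimensionality alone is a 4x cost lever.
large = flat_index_bytes(10_000_000, 1536)  # 61.44 GB raw
small = flat_index_bytes(10_000_000, 384)   # 15.36 GB raw
print(large / 1e9, small / 1e9)
```

Halving dimensions halves memory, which is why smaller embedding models are often the cheapest latency win before any index tuning happens.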
For production, I prefer offline index builds, strict metadata filtering, and incremental refreshes that avoid downtime. Live rebuilds are great until they are not.
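The no-downtime refresh pattern can be sketched as build-offline-then-swap: readers always see a complete snapshot, and the new index replaces the old one in a single pointer assignment. This is a minimal illustration with a brute-force dict standing in for a real ANN index; `SwappableIndex` and its methods are invented names, not a library API.

```python
import threading

def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class SwappableIndex:
    """Serve reads from a stable snapshot; refresh by building a new index
    off the read path and swapping it in atomically."""
    def __init__(self, vectors):
        self._lock = threading.Lock()
        self._vectors = dict(vectors)  # id -> embedding snapshot

    def search(self, query, k=5):
        with self._lock:
            snapshot = self._vectors  # readers never observe a half-built index
        ranked = sorted(snapshot, key=lambda i: _sq_dist(query, snapshot[i]))
        return ranked[:k]

    def swap(self, new_vectors):
        built = dict(new_vectors)  # the expensive "build" happens before locking
        with self._lock:
            self._vectors = built  # atomic swap: no downtime, no partial state
```

With a real ANN library the `swap` step is the same idea: build the new index file offline, load it alongside the old one, then flip which instance serves traffic.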
When recall must stay high, I raise the search depth (efSearch in HNSW, nprobe in IVF indexes) and aggressively cache hot queries to protect median latency.
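Caching hot queries is mostly an exact-repeat game, but rounding the embedding key lets near-identical requests share an entry. A minimal LRU sketch, assuming list-like query embeddings; the class name, capacity, and rounding precision are all illustrative choices.

```python
from collections import OrderedDict

class QueryCache:
    """Tiny LRU for high-frequency queries. Keys are rounded embeddings so
    near-identical requests hit the same entry; precision is a tuning knob."""
    def __init__(self, capacity=1024, precision=4):
        self.capacity = capacity
        self.precision = precision
        self._store = OrderedDict()

    def _key(self, query):
        return tuple(round(x, self.precision) for x in query)

    def get(self, query):
        key = self._key(query)
        if key in self._store:
            self._store.move_to_end(key)  # refresh recency on a hit
            return self._store[key]
        return None  # miss: caller falls through to the real index

    def put(self, query, results):
        key = self._key(query)
        self._store[key] = results
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least-recently used
```

Because hot queries follow a heavy-tailed distribution, even a small cache like this can shave the median noticeably while leaving tail latency to the index itself.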
If you don’t measure drift, your index will quietly get worse while everyone blames the model.
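One cheap drift signal is the gap between the corpus centroid and the centroid of recent query embeddings: if traffic moves and the index does not, that gap trends upward. A crude proxy sketch under that assumption; `drift_score` is a hypothetical name and a real monitor would track the trend, not a single reading.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(vectors[0])
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(dims)]

def drift_score(index_vectors, recent_queries):
    """Euclidean distance between corpus centroid and recent-query centroid.
    A rising trend suggests the index no longer matches live traffic."""
    return math.dist(centroid(index_vectors), centroid(recent_queries))
```

Centroid distance is blunt (it misses multi-modal shifts), but it is nearly free to compute and catches the slow, silent degradation that otherwise gets blamed on the model.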
- Measure recall with real queries, not synthetic ones.
- Filter before similarity search to cut cost.
- Cache high-frequency queries and results.
- Monitor index drift and re-embed on schedule.
- Do not overfit to benchmarks; users will not read them.
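The filter-before-search item above can be sketched directly: apply the cheap metadata predicate first, then score only the survivors. A brute-force illustration; `filtered_search` and the `(id, tag, embedding)` layout are assumptions for the example, and a production ANN index would apply the filter natively or via pre-built per-tag partitions.

```python
def filtered_search(items, query, allowed_tags, k=3):
    """Pre-filter on metadata, then rank only the survivors by distance.
    `items` is a list of (id, metadata_tag, embedding) tuples."""
    # Cheap predicate first: shrinks the candidate set before any vector math.
    candidates = [(i, emb) for i, tag, emb in items if tag in allowed_tags]
    ranked = sorted(
        candidates,
        key=lambda c: sum((x - y) ** 2 for x, y in zip(c[1], query)),
    )
    return [i for i, _ in ranked[:k]]
```

The ordering matters for cost: a metadata check is a tag comparison, a similarity score is a full pass over the embedding, so filtering first means the expensive operation runs on a fraction of the corpus.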