[Paper Note] Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm With SSD
Background
Vector search aims to find the closest neighbors of a given query vector. When vectors are high dimensional, exact vector search is prohibitively expensive, necessitating approximate nearest neighbor search (ANNS), which returns the approximate top-K nearest vectors instead of exact results.
To support large-scale datasets, ANNS indexes have to be offloaded to SSD due to limited CPU memory capacity. However, existing on-SSD graph-based vector search suffers from high latency, caused by a misalignment between the search algorithm and SSD I/O characteristics. Graph-based vector search follows best-first search: at each step it loads the neighbors of the current nearest candidate from SSD, computes their distances, and repeats until the result converges. This compute-I/O dependency forces a synchronous, one-request-at-a-time I/O pattern, which contradicts modern SSDs' parallel I/O capabilities.
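The synchronous pattern can be seen in a minimal sketch of best-first graph search. This is an illustrative reconstruction, not the paper's implementation: the function name, the in-memory `graph` dict (standing in for on-SSD adjacency lists), and the `io_count` counter are all assumptions for demonstration. Note that each "SSD read" can only be issued after the previous step's distance computations have chosen the next node, so reads never overlap.

```python
import heapq

def best_first_search(graph, query_dist, entry, k, ef):
    """Best-first search over a proximity graph.

    graph      : dict mapping node id -> list of neighbor ids; each lookup
                 stands in for one synchronous SSD read of an adjacency list.
    query_dist : function giving the distance from the query to a node.
    entry      : entry-point node id.
    k, ef      : result size and search-frontier width (as in HNSW-style search).
    Returns (top-k node ids, number of simulated SSD reads).
    """
    d0 = query_dist(entry)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: unexpanded frontier, nearest first
    results = [(-d0, entry)]     # max-heap (negated): best `ef` nodes seen
    io_count = 0

    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop when the nearest frontier node is farther than the worst result.
        if len(results) >= ef and dist > -results[0][0]:
            break
        # Synchronous "SSD read": cannot start until `node` was chosen above,
        # so I/O and compute serialize -- the dependency the paper targets.
        neighbors = graph[node]
        io_count += 1
        for nb in neighbors:
            if nb in visited:
                continue
            visited.add(nb)
            d = query_dist(nb)
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict current worst result

    top = sorted((-nd, v) for nd, v in results)[:k]
    return [v for _, v in top], io_count
```

On a path graph `0-1-2-3-4` with the query nearest to node 4, the search walks one hop per iteration, issuing one read per step; the read count grows with path length, which is exactly why per-step synchronous I/O dominates latency on SSD.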
![Featured image for [Paper Note] Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm with SSD](/posts/achieving-low-latency-graph-based-vector-search-via-aligning-best-first-search-algorithm-with-ssd/images/pipesearch-latency-breakdown.png)