I'd be curious how they implement updating. AFAICT this is the thorniest part of working with existing open-source solutions. When working with ANNOY in the past, my data was small enough that I could recompute the full index every few seconds in a background process and then swap the freshly built index into the process serving similarity queries.
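For anyone who hasn't done the rebuild-and-swap trick, here is a minimal sketch in Python. It is not Annoy's API: `BruteForceIndex` is a hypothetical stand-in for a built index, and a thread takes the place of the separate background process. The idea is the same, though. Writers append to a shared list, a rebuild snapshots that list and builds a new immutable index, and the serving side swaps the reference in one step so queries never see a half-built index.

```python
import math
import threading
from typing import List, Sequence, Tuple


class BruteForceIndex:
    """Hypothetical stand-in for a built ANN index such as an Annoy index.

    It only snapshots the vectors; a real index would build its trees here.
    """

    def __init__(self, vectors: Sequence[Sequence[float]]):
        self._vectors = [list(v) for v in vectors]

    def query(self, q: Sequence[float], k: int) -> List[Tuple[int, float]]:
        # (id, Euclidean distance) pairs for the k nearest stored vectors.
        scored = [(i, math.dist(q, v)) for i, v in enumerate(self._vectors)]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


class SwappableIndexServer:
    """Serves queries from an immutable index that is periodically rebuilt."""

    def __init__(self) -> None:
        self._data: List[List[float]] = []  # every vector added so far
        self._data_lock = threading.Lock()
        self._index = BruteForceIndex([])  # index currently being served

    def add(self, vector: Sequence[float]) -> None:
        # New vectors are invisible to queries until the next rebuild.
        with self._data_lock:
            self._data.append(list(vector))

    def rebuild(self) -> None:
        # Snapshot under the lock, then build outside it so writers are not blocked.
        with self._data_lock:
            snapshot = list(self._data)
        new_index = BruteForceIndex(snapshot)
        # Rebinding one attribute is atomic in CPython, so a reader sees either
        # the old index or the new one, never a half-built one.
        self._index = new_index

    def query(self, q: Sequence[float], k: int) -> List[Tuple[int, float]]:
        index = self._index  # take one reference for the whole query
        return index.query(q, k)

    def run_rebuild_loop(self, interval_s: float, stop: threading.Event) -> None:
        # Rebuild every interval_s seconds until `stop` is set.
        while not stop.wait(interval_s):
            self.rebuild()
```

To run it in the background you'd start `threading.Thread(target=server.run_rebuild_loop, args=(5.0, stop), daemon=True)`. In the multi-process setup described above, you'd instead build in a separate process and have the server load the saved index file; with Annoy that would be `save`/`load` on an `AnnoyIndex`.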
EDIT: from the insertion docs https://milvus.io/docs/guides/milvus_operation.md#Insert-vec... it seems that they still ask you to re-build your indices after you insert vectors, although in some cases they can tell that they need to re-build the indices for you. Looks like the major value adds here are potentially shifting computation to the GPU and building multiple indices. I'll certainly evaluate this next time I'm building a project around vector search.
Milvus allows users to append vectors. Vectors are stored in multiple file slices. When a file slice reaches the threshold, Milvus will build the index for that file slice, and new data will be inserted into a new file slice. For details, please refer to
https://medium.com/@milvusio/managing-data-in-massive-scale-...
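To make the slice scheme concrete, here's a rough sketch of the insert path as I read the description above, in Python. `FileSlice`, `SlicedVectorStore` and the row-count threshold are hypothetical names for illustration, not Milvus internals, and `_build_index` only marks the slice instead of building a real index. The point is that a full slice gets indexed once and then never changes, so an insert never triggers a rebuild of everything.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

# Hypothetical threshold, counted in vectors for readability; a real system
# would more likely use a byte size.
SLICE_THRESHOLD = 4


@dataclass
class FileSlice:
    ids: List[int] = field(default_factory=list)
    vectors: List[List[float]] = field(default_factory=list)
    indexed: bool = False  # set once the slice is full and its index is built


class SlicedVectorStore:
    def __init__(self, threshold: int = SLICE_THRESHOLD) -> None:
        self.threshold = threshold
        self.slices: List[FileSlice] = [FileSlice()]  # last slice is the growing one
        self._next_id = 0

    def insert(self, vector: Sequence[float]) -> int:
        vid = self._next_id
        self._next_id += 1
        current = self.slices[-1]
        current.ids.append(vid)
        current.vectors.append(list(vector))
        if len(current.vectors) >= self.threshold:
            self._build_index(current)  # the full slice becomes immutable and indexed
            self.slices.append(FileSlice())  # later inserts go to a fresh slice
        return vid

    def _build_index(self, s: FileSlice) -> None:
        # Placeholder for an actual ANN index build (IVF, HNSW, ...) on this slice only.
        s.indexed = True
```

The indexing work per insert is bounded by the slice size rather than the total collection size, which is the main win over rebuilding a single global index.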
We are now working on vector deletion. Hopefully it will be ready by the end of Q1 this year.
If I append a single new vector, will it show up in search results without me needing to ask for the index to be rebuilt? Can I update an existing vector without having to ask for the index to be rebuilt?
EDIT: from reading the linked article, it seems like newly inserted vectors will be queried using brute force. Very interesting idea!
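Here's a minimal sketch of that query path, assuming the built indexes and the unindexed tail are searched separately and merged per query. This is my reading of the article, not Milvus code: `indexed_search` is a hypothetical callback standing in for whatever ANN index covers the sealed slices, while the tail is scanned exactly. Because the merge happens on every query, a vector appended a moment ago can show up without any rebuild.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[float, int]  # (distance, vector id); tuples sort by distance first


def brute_force(q: Sequence[float], ids: Sequence[int],
                vectors: Sequence[Sequence[float]], k: int) -> List[Hit]:
    # Exact scan, acceptable because the not-yet-indexed tail stays small.
    return heapq.nsmallest(k, ((math.dist(q, v), i) for i, v in zip(ids, vectors)))


def search(q: Sequence[float], k: int,
           indexed_search: Callable[[Sequence[float], int], List[Hit]],
           tail_ids: Sequence[int],
           tail_vectors: Sequence[Sequence[float]]) -> List[Hit]:
    # Top k from the built index(es), top k from the exact scan, keep the best k overall.
    candidates = indexed_search(q, k) + brute_force(q, tail_ids, tail_vectors, k)
    return heapq.nsmallest(k, candidates)
```

The cost of the exact scan grows with the tail, which is presumably why the slice threshold matters: once the tail is sealed and indexed, it stops being brute-forced.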
See https://github.com/jolibrain/deepdetect/pull/641 that uses FAISS as a backend alternative to annoy (annoy supported as well). Deletion can be implemented by removing entries from the listing db while the vector remains within the index.
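A minimal sketch of that deletion approach in Python, with a dict standing in for the listing db and a plain list standing in for the FAISS/annoy index. This is not deepdetect's code, and `SoftDeleteIndex` is a hypothetical name. Deleting removes the id from the listing; queries skip any hit whose id is no longer listed, while the vector itself stays put.

```python
import math
from typing import Dict, List, Sequence, Tuple


class SoftDeleteIndex:
    """Deletion by dropping listing entries; the indexed vectors are never touched."""

    def __init__(self) -> None:
        self._vectors: List[List[float]] = []  # stand-in for the ANN index, append-only
        self._listing: Dict[int, str] = {}  # hypothetical listing db: id -> image path

    def add(self, vector: Sequence[float], label: str) -> int:
        self._vectors.append(list(vector))
        vid = len(self._vectors) - 1
        self._listing[vid] = label
        return vid

    def delete(self, vid: int) -> None:
        # Only the listing entry goes away; the vector remains in the index.
        self._listing.pop(vid, None)

    def query(self, q: Sequence[float], k: int) -> List[Tuple[str, float]]:
        ranked = sorted(range(len(self._vectors)),
                        key=lambda i: math.dist(q, self._vectors[i]))
        hits: List[Tuple[str, float]] = []
        for i in ranked:
            if len(hits) >= k:
                break
            if i in self._listing:  # ids missing from the listing count as deleted
                hits.append((self._listing[i], math.dist(q, self._vectors[i])))
        return hits
```

With a real ANN index that only returns its top k, you'd request more than k candidates so that filtered-out deletions don't leave you short, and the dead vectors keep using memory until the next full rebuild.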
Tests show that FAISS is a bit better than annoy at retrieval on both small indexes and million-item indexes. It also includes index compression techniques that fare very well in our tests, with very low loss on mid-size indexes of 500k images.
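FAISS offers several compressed index types, for example product quantization (`IndexPQ`, `IndexIVFPQ`) and scalar quantization (`IndexScalarQuantizer`). The comment doesn't say which was used, so here is only the simplest flavor as a pure-Python sketch: 8-bit scalar quantization with a per-dimension range learned from the data. It illustrates the idea and is not FAISS's implementation; it says nothing about the 500k-image results above.

```python
from typing import List, Sequence, Tuple


def fit_sq8(vectors: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    # Learn the per-dimension range (min, max) from the training vectors.
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    return lo, hi


def encode_sq8(v: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> bytes:
    # One byte per dimension instead of four for float32: 4x smaller.
    codes = []
    for x, a, b in zip(v, lo, hi):
        span = (b - a) or 1.0  # constant dimensions would otherwise divide by zero
        t = min(max((x - a) / span, 0.0), 1.0)  # clamp values outside the trained range
        codes.append(round(t * 255))
    return bytes(codes)


def decode_sq8(code: bytes, lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    # Approximate reconstruction; per-dimension error is at most span / 510.
    return [a + (c / 255) * ((b - a) or 1.0) for c, a, b in zip(code, lo, hi)]
```

Distances computed on the decoded vectors are slightly off from the exact ones, which is the "low loss" trade-off; product quantization pushes the compression much further at the cost of a trained codebook.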
(you can see the VERY "research quality" code on Github, here's a decent starting place https://github.com/hyperstudio/spectacles/blob/master/specta...)