The future of search: Conversational, semantic, and vectorized models
♦ Semantics: Vector-based search can capture the meaning of content and queries far better than keyword search can. It can determine that roofing materials and shingles, for example, are similar and return results for both when a user’s search includes only one of those terms, without anyone having to create an exhaustive taxonomy beforehand. As Riley indicated, “Vector search is able to capture that semantic meaning and encode it into the vector, so you can actually make that similarity calculation, even though the keywords themselves are not there.”
♦ Similarity search and question-answering: The semantic understanding of vectors makes this approach suitable for both question-answering and similarity search, which can’t necessarily be said of some other search techniques.
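The similarity calculation Riley describes is commonly done with cosine similarity between embedding vectors. The following is a minimal sketch of that idea, not any particular product's implementation. The three-dimensional vectors in TOY_EMBEDDINGS are hand-picked, hypothetical values chosen for illustration; real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption: made up for this example,
# not produced by a real embedding model).
TOY_EMBEDDINGS = {
    "roofing materials": [0.9, 0.8, 0.1],
    "shingles": [0.85, 0.75, 0.2],
    "laptop": [0.1, 0.2, 0.95],
}


def rank_by_similarity(query, embeddings):
    """Rank every other term by its cosine similarity to the query term."""
    query_vec = embeddings[query]
    scored = [
        (term, cosine_similarity(query_vec, vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling rank_by_similarity("roofing materials", TOY_EMBEDDINGS) places "shingles" well ahead of "laptop", even though the two roofing terms share no keywords.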
Relevance ranking
Relevance ranking decides the order of search results based on the underlying relevance model. It usually involves scoring each result and placing the highest-scoring results first. Several search applications combine approaches for relevance ranking, which is further evidence that search techniques can be composed, almost like building blocks, to get the most satisfactory results. “There are a variety of ways to do ranking,” Riley commented. “One of the most interesting, relevant models, according to academic research (and there are standardized datasets this has been tested on), is to use a hybrid ranking methodology.”
Hybrid ranking issues both a vector-based search and a keyword search for the same query and merges what comes back. “You get scores back from both queries from each dataset and then they combine them, which is why they call this hybrid scoring,” Riley said. “There are a couple different ways for combining those scores in a hybrid way. But, this is shown to be the most performant methodology for getting the most relevant results.” The approach also underscores the enduring value of basic keyword search in the era of LLMs and word embeddings.
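Riley does not say which combination methods he has in mind. As an illustration only, the sketch below shows two widely used options: a weighted sum of min-max normalized scores, and reciprocal rank fusion, which ignores raw scores and uses only each result's rank. The function names, the alpha weight, and the constant k=60 (a conventional default for reciprocal rank fusion) are assumptions for this example, not details from the interview.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}  # all scores tie
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_hybrid(keyword_scores, vector_scores, alpha=0.5):
    """Combine two score sets; alpha is the weight given to keyword scores.

    A document missing from one result set contributes 0 from that side.
    """
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    combined = {
        doc: alpha * kw.get(doc, 0.0) + (1 - alpha) * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest score first; ties broken by document id for stable output.
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: each appearance adds 1 / (k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))
```

Normalizing first matters for the weighted sum because keyword scores (for example, BM25) and vector similarities live on different scales. Reciprocal rank fusion sidesteps that problem by working from rank positions alone.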
Federated search
Federated search lets users conduct a unified search across sources, repositories, and tools. The use cases for doing so are growing alongside the increasing distribution of data. For example, users might want to “search over their internal intranet, and also a database with customer information and stuff in Salesforce and Dropbox,” Riley posited.
One approach is to replicate the content from those sources into a single platform so that all of it can be searched in one place. Although this method involves data movement, “it simplifies the user’s life,” Riley mentioned. “We can transform that content as it comes in to make it more easily searchable. We can run it through other kinds of machine learning models to learn more about the content as it comes in, and perhaps provide some structure that isn’t present in the initial form.”
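The replicate-and-enrich pattern described above can be sketched with a small in-memory index. Everything here is hypothetical and simplified: UnifiedIndex, enrich, and the record shape (a dict with "id" and "text") are assumptions made for illustration, and the enrich function is a trivial rule standing in for the machine learning models Riley mentions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def enrich(record):
    """Stand-in for an ML enrichment step: attach a crude 'kind' label."""
    kind = "invoice" if "invoice" in record["text"].lower() else "general"
    return {**record, "kind": kind}


class UnifiedIndex:
    """Holds copies of documents from many sources behind one search call."""

    def __init__(self):
        self.documents = {}               # doc_id -> enriched record
        self.postings = defaultdict(set)  # token -> set of doc_ids

    def ingest(self, source, records, transform=enrich):
        """Copy records from one source, transforming each on the way in."""
        for record in records:
            doc_id = f"{source}:{record['id']}"
            self.documents[doc_id] = {**transform(record), "source": source}
            for token in tokenize(record["text"]):
                self.postings[token].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        matches = set.intersection(
            *(self.postings.get(token, set()) for token in tokens)
        )
        return sorted(matches)
```

After ingesting records under source names such as "intranet" and "dropbox", a single search call returns matches from both, and each stored document carries the extra structure added at ingestion time.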
Not slowing down
The changes reshaping search will likely continue apace for the foreseeable future. The coalescing of different techniques, from vector search to semantic search and from keyword search to federated search, will likely continue to deliver the most relevant results both to users and to the models that depend on these mechanisms. Just which of these forms of search will dominate, however, remains to be seen.