The future of search: Conversational, semantic, and vectorized models
Semantic search
What Aasman describes as the future of search couples the question-answering, validation aspect of LLMs with the enterprise granularity of semantic search. This combined approach, incorporating the rules systems and symbolic reasoning of semantic search alongside statistical model approaches, seems indicative of the general trajectory in which search is headed. Newer methods won't displace established ones but will augment them, and vice versa. Potter espoused the virtues of taking what he described as a "multifaceted search approach" involving indexing, rules, and LLMs.
Rules-based systems and taxonomies are at the heart of semantic search. According to Aasman, with this search methodology, "You have unstructured text. You create a taxonomy of your important concepts. You extract your important concepts out of all your text, reducing all your words." This method also enables organizations to specify synonyms for the terms denoting those concepts. Thus, users can readily find all the documents that relate to a particular clause in a contract, for example, by searching for the term and its synonyms.
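A minimal sketch of the synonym-expansion step described above, assuming a hand-built synonym map; the concept names, synonyms, and sample contracts are illustrative, and a production system would derive the expansion from a managed taxonomy rather than a literal dictionary:

```python
# Illustrative synonym map: concept term -> registered synonyms.
# Real deployments would populate this from an enterprise taxonomy.
SYNONYMS = {
    "indemnification": {"indemnity", "hold harmless"},
    "termination": {"cancellation", "rescission"},
}

def expand_terms(term: str) -> set:
    """Return the concept term plus any registered synonyms."""
    return {term} | SYNONYMS.get(term, set())

def find_documents(term: str, documents: dict) -> list:
    """Return IDs of documents mentioning the concept or any synonym."""
    variants = expand_terms(term)
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(v in text.lower() for v in variants)
    ]

docs = {
    "contract_a": "Each party agrees to hold harmless the other party ...",
    "contract_b": "This agreement covers payment schedules only.",
}
print(find_documents("indemnification", docs))  # → ['contract_a']
```

Note that contract_a never uses the word "indemnification"; it matches only because the taxonomy maps "hold harmless" to the same concept, which is the payoff of the approach Aasman describes.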
Context and search relevance
The semantic search process Aasman articulated greatly informs the underlying relevance model via taxonomies, which might be devised by humans or LLMs. The most relevant search results, however, are also shaped by the context surrounding the query. In addition to considerations such as who the user is and the business purpose for the search, other dimensions of context include "a person's access control rights," Potter explained. "What are they allowed to see? That's context. What organization are they in? Who's their network of people that they work with from a collaboration and sharing perspective?"
The networking aspect of context may prove pivotal for increasing search relevance. Depending on some of the other factors Potter mentioned, a searcher's cohorts or use case can be employed to create what Potter called "collective intelligence" for returning meaningful search results. This approach allows systems "to start learning from what other questions people are asking," Potter remarked. "That is context as well. What are my peers asking? Can I actually help map your question to one of theirs, and then map that so they get a similar answer?" External data sources, such as reference data pertaining to country codes, tax codes, and other such information, are also used to form the context that can boost the relevance of search results.
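One concrete dimension of context mentioned above is access control: results should be filtered to what the searcher is allowed to see before they are ranked. The sketch below is a minimal, hypothetical illustration of that idea; the field names, group labels, and sample documents are assumptions, not any vendor's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: str
    score: float                       # relevance score from the base search
    allowed_groups: set = field(default_factory=set)

def contextual_rank(results, user_groups):
    """Drop documents the user may not see, then sort the rest by relevance."""
    visible = [d for d in results if d.allowed_groups & user_groups]
    return sorted(visible, key=lambda d: d.score, reverse=True)

hits = [
    Doc("q3-report", 0.91, {"finance"}),
    Doc("handbook", 0.75, {"all-staff", "finance"}),
    Doc("board-minutes", 0.88, {"executives"}),
]

# A user in finance never sees board-minutes, despite its high score.
ranked = contextual_rank(hits, user_groups={"finance", "all-staff"})
print([d.doc_id for d in ranked])  # → ['q3-report', 'handbook']
```

The design point is ordering: the permission check runs as a hard filter ahead of ranking, so an unauthorized document can never surface no matter how relevant it scores.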
Vector search
Vector searches are becoming invaluable to the relevance models for numerous search applications. Although this search technique is well established, it still yields relevant results as "an alternative approach to semantic search," Aasman said. This method creates vectors (lists of numbers) out of searchable content. Queries themselves are also transformed into vectors, which become the basis for finding all the other vectors similar to them. "You can always compute the distance between two vectors," Aasman explained. With this mathematical foundation, it's possible to find all the documents closest to a vectorized query for particulars such as, for example, insurance claims costs above a certain threshold for roofing damage in a specific state.
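The distance computation Aasman refers to can be sketched with cosine distance, a common choice for comparing embedding vectors. The three-dimensional vectors and document IDs below are toy assumptions for illustration; real embeddings from an embedding model have hundreds or thousands of dimensions:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means the vectors point more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest(query_vec, doc_vecs, k=2):
    """Return the k document IDs whose vectors lie closest to the query."""
    ranked = sorted(doc_vecs, key=lambda d: cosine_distance(query_vec, doc_vecs[d]))
    return ranked[:k]

# Toy document embeddings (illustrative values only).
doc_vecs = {
    "claim_roof_tx": [0.9, 0.1, 0.2],
    "claim_flood_fl": [0.1, 0.8, 0.3],
    "policy_faq": [0.2, 0.2, 0.9],
}
query = [0.85, 0.15, 0.25]  # an embedded query about roofing claims, say

print(nearest(query, doc_vecs))  # → ['claim_roof_tx', 'policy_faq']
```

Because every query and document lands in the same vector space, "find relevant documents" reduces to "find nearby points," which is what dedicated vector indexes accelerate at scale.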
Contemporary vector searches are frequently predicated on word embeddings that represent which words are closest together in a high-dimensional space. GPT-4o, ChatGPT, and other LLMs usually rely on this approach, which Riley termed "a mathematical process that typically goes through a transformer model." The many advantages of vector searches include:
♦ More than documents: Numerous pieces of content, including images and video, in addition to traditional documents or webpages, can be transformed into vectors for similarity searches. This point is imperative for digital asset management use cases and other content services applications.