Harnessing hybrid semantic search with CrateDB and SearchBlox
From employees to partners and customers, personalized, meaningful experiences are now inseparable from what people expect of the enterprise. Delivering these experiences depends on understanding the nuance of user intent and context, which semantic search makes possible.
In KMWorld’s latest webinar, Unlocking the Power of Semantic Search, experts from CrateDB and SearchBlox Software, Inc. discussed key technologies and practices for successfully employing semantic search and overcoming its array of challenges.
With search types and capabilities proliferating, hybrid search (the combination of multiple search approaches) offers a distinct advantage, explained Simon Prickett, senior product evangelist at CrateDB. Hybrid search can improve the relevance of search results by combining keyword or full-text search (for text data), geospatial search (for spatial data), and semantic search (for meaning and intent).
However, a hybrid approach requires reconciling data silos, cautioned Prickett. When different types of data live in different systems, delivering a single response to the end user that draws on text, document, geospatial, and semantic data becomes challenging.
“If you have, for example, geospatial data, it might live in a geospatial system that is good at answering those sorts of queries. If you have document or, say, product overview data, it may live in a document database, which is good at being searched for those sorts of things…you can start to see that things get quite complicated with combining these,” said Prickett. “When you add the vector representations that you need for semantic search, you’ll find you’ll end up with another specialized data store for those.”
Fortunately, CrateDB offers a hyper-fast, open source database for real-time analytics and hybrid search that unifies the representation of an enterprise's diverse data types. Accessible via SQL and automatically indexed, CrateDB’s database and search engine scales seamlessly across time-series, document, and vector data.
Timo Selvaraj, chief product officer at SearchBlox Software, Inc., examined present challenges associated with implementing semantic search, including:
- Solution complexity
- Ingestion of company content existing in multiple formats and data sources
- Lack of large language model (LLM) data privacy
- Unpredictable costs
- Issues with retrieval accuracy
Echoing Prickett, Selvaraj observed that the key to resolving each of these roadblocks is a hybrid search approach that combines semantic and keyword search. The combination addresses the shortcomings of each search type on its own: it frees keyword search from its dependence on the exact vocabulary of the query, and it improves the accuracy of semantic search. Hybrid search is fundamental for implementing advanced AI use cases, noted Selvaraj, including retrieval-augmented generation (RAG) search, conversational chatbots, and AI agents.
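To make the idea of combining keyword and semantic results concrete, here is a minimal sketch of one common way to merge two ranked result lists, reciprocal rank fusion. The webinar recap does not say which fusion method CrateDB or SearchBlox uses, so the technique, the function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample document IDs are all illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Hypothetical helper for illustration only; not an API of CrateDB
    or SearchBlox. Each document earns 1 / (k + rank) from every list
    it appears in, so documents ranked well by both searches rise.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for the two search types
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

In this toy example, the two documents that appear in both lists end up ahead of those found by only one search type, which is the behavior the hybrid approach aims for.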
Even when a hybrid search approach is implemented, data quality remains a core challenge. If data or documents are missing values or have poor titles and descriptions, any search strategy is in vain.
Selvaraj introduced SearchBlox’s PreText NLP, a solution that automatically fixes content to increase findability for hybrid search, with capabilities including relevant title generation, audio-to-text description, optical character recognition (OCR), and more. After all, content authors may not always think about content findability, and there is typically not enough time or budget to manually change existing content, according to Selvaraj. By automatically adding content metadata, PreText NLP enhances overall findability without making any changes to the original content.
For the full, in-depth webinar featuring a live demo, Q&A, and more, you can view an archived version of the webinar here.