The intersection of text analytics and GenAI at KMWorld 2024
GenAI continues to astound, but it also struggles to produce real business value outside of a few applications such as customer support chat and the productivity gains of producing rough drafts. While those are valuable, there is so much more that GenAI could do if it can overcome its current well-known limitations.
GenAI’s limitations include its tendency to hallucinate, that is, make up false facts. LLMs were trained on public information, but as we’ve seen many times, the content and vocabularies behind the enterprise firewalls are quite different; this is why transparency—understanding why it says what it does—is so important.
At KMWorld 2024, Tom Reamy, chief knowledge architect and founder of KAPS Group and author of Deep Text, presented “GenAI & Text Analytics: Creating a Foundation for Business Value.”
Text analytics can add precision to the general answers GenAI produces. It comprises a variety of areas, including auto-categorization, data extraction, and supplemental technologies such as natural language generation and text mining.
The model for text analytics and GenAI in the enterprise starts with content, concepts, text analytics, semantics, and applications.
“The best way to think about this is, it’s not an application, it’s a foundation for applications,” Reamy said.
He explained the basics for creating a text analytics foundation (auto-categorization, data extraction, and more) and a GenAI approach that combines text analytics, prompt engineering, merging enterprise LLMs with the larger public LLMs, and RAG capability. This collaboration of text analytics and GenAI has the potential to transform how your business operates and competes.
He said there’s no such thing as unstructured text; there is always structure to a document. To pull out the best information from “key words,” use an abstract or summary of the document you’re looking for.
“This is what works better than a whole document,” he said.
Application areas for text analytics include:
- Search and search-based applications
- Risk management
- Healthcare
- Fraud detection
- Contextual advertising
- Automated chat customer support
- Spam filtering
- Social media analytics
- Robotic process automation and more
Enterprise search continues to dramatically underperform, he explained. To do it well, companies need to implement entity extraction and taxonomies.
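As a rough illustration of how a taxonomy can drive entity extraction for search, the following minimal Python sketch tags a document with preferred terms whenever one of their variants appears. The taxonomy contents and the function name `extract_entities` are hypothetical and were not part of Reamy's talk; production systems typically use far richer rules and models.

```python
import re

# Hypothetical toy taxonomy: preferred term -> variant spellings seen in documents.
TAXONOMY = {
    "Knowledge Management": ["knowledge management", "KM"],
    "Text Analytics": ["text analytics", "text mining"],
}


def extract_entities(text, taxonomy=TAXONOMY):
    """Return the sorted preferred terms whose variants occur in the text."""
    found = set()
    for preferred, variants in taxonomy.items():
        for variant in variants:
            # Whole-word, case-insensitive match, so "KM" does not match "KMWorld".
            pattern = r"\b" + re.escape(variant) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                found.add(preferred)
                break
    return sorted(found)


tags = extract_entities("Our KM team uses text mining on support tickets.")
print(tags)
```

Indexing the preferred terms rather than the raw variants is one way a search engine could return the same results for "KM" and "knowledge management."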
Expertise analysis covers experts who think and write differently. Profiles can be generated automatically to build a deeper understanding of communities, and more.
“GenAI is everywhere but it’s important to distinguish AI from GenAI,” Reamy said.
More memory and faster computers have changed the game for enabling neural nets/deep learning that works better for standard AI. However, it’s not great for GenAI, he said. GenAI has no understanding of what it’s producing.
“We’ve seen this hype before,” Reamy said. “The original hype was too much and now it’s being corrected.”
ChatGPT and LLMs have been a huge leap forward, but they require huge scale, content, and power. The applications seem endless, but GenAI has a propensity to produce misinformation or incorrect information.
According to Reamy, there are four main issues with GenAI:
- Tendency to hallucinate
- Lack of transparency surrounding how it works
- Amount and quality of data needed to train
- Security issues
“Prediction is trying to find the most popular answer. But the most popular answer isn’t the right answer,” Reamy said.
To overcome these limitations, Reamy suggested building a text analytics/LLM foundation. This can be done in two ways: through applications or by using text analytics within prompts to make them smarter and more complete.
“It’s metadata and meta-documents to make it smarter when it is analyzing documents,” Reamy said.
Text analytics is much more accurate than using GenAI by itself, he said.
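To make the idea of text analytics within prompts concrete, here is a minimal Python sketch that attaches a summary, entities, and categories to a question before it is sent to an LLM. The function name `build_prompt`, the field labels, and the sample values are assumptions for illustration only; the article does not describe a specific prompt format.

```python
def build_prompt(question, summary, entities, categories):
    """Assemble a prompt that carries text-analytics metadata with the question."""
    lines = [
        "Answer using only the context below.",
        "Categories: " + ", ".join(categories),
        "Entities: " + ", ".join(entities),
        "Summary: " + summary,
        "Question: " + question,
    ]
    return "\n".join(lines)


# Hypothetical metadata, as a text analytics pipeline might produce it.
prompt = build_prompt(
    question="What is our refund policy for enterprise customers?",
    summary="Policy document covering refunds and credits for enterprise contracts.",
    entities=["Enterprise Agreement", "Refund"],
    categories=["Finance", "Customer Support"],
)
print(prompt)
```

The resulting string could then be passed to whichever LLM an organization uses; the model call itself is left out because it depends on the vendor.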
KMWorld returned to the J.W. Marriott in Washington, D.C., on November 19-21, with pre-conference workshops held on November 18.
KMWorld 2024 is a part of a unique program of five co-located conferences, which also includes Enterprise Search & Discovery, Enterprise AI World, Taxonomy Boot Camp, and Text Analytics Forum.