Everything Old Is New Again
Oh See Our Data
Optical character recognition (OCR) is another of those old-time technologies that have morphed into a new world of AI. Daniel Vasicek, Senior Data Scientist, Access Innovations, Inc., sees huge improvements in OCR performance and in automatic translation from one language to another. However, real data carries a degree of uncertainty, so it never conforms exactly to a model.
Because data isn’t always perfect (a person’s job affiliation from several years ago, for example), models built to reflect and draw inferences from that data aren’t always perfect, either. Thus, fitting data to the model requires some balancing of measurement errors with model errors. Striking that balance improves the model’s predictions.
This may sound somewhat heretical, but Vasicek thinks that exact fits are not only no longer possible, but also not desirable. As he writes, “An exact fit to noisy data means that the model is fitting the noise.” Although you’d like to reduce the uncertainty in predictions, models should still be able to predict from noisy data even without a perfect fit. He warns, too, about overfitting, where the ML algorithms get overly specific, weeding out information that is actually germane and thus giving erroneous or misleading results.
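To make the point about exact fits concrete, here is a minimal sketch in Python. It is an illustration only, not Vasicek's method: the underlying trend (`truth`), the six data points, and the fixed "noise" values are all hypothetical, chosen so the example is reproducible. It fits the same noisy points two ways, with a least-squares straight line and with a polynomial forced through every point, and then compares both against the noise-free trend at points neither model saw.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a callable model."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: slope * x + intercept

def fit_exact(xs, ys):
    """Lagrange polynomial that passes through every training point."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model

def rms_error(model, xs, ys):
    """Root-mean-square difference between model output and targets."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def truth(x):
    """Hypothetical noise-free trend, assumed for this illustration."""
    return 2.0 * x + 1.0

# Hypothetical measurements: the trend plus fixed, made-up noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
train_y = [truth(x) + e for x, e in zip(train_x, noise)]

# Held-out points between the measurements, scored against the clean trend.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [truth(x) for x in test_x]

line = fit_line(train_x, train_y)
exact = fit_exact(train_x, train_y)

for name, model in (("straight line", line), ("exact fit", exact)):
    print(f"{name:13s}  training RMS error: {rms_error(model, train_x, train_y):.3f}"
          f"  held-out RMS error: {rms_error(model, test_x, test_y):.3f}")
```

The exact fit reproduces every measurement, so its training error is zero, but in between the measurements it swings to follow the noise and lands farther from the underlying trend than the straight line does. The straight line tolerates some error at each data point and predicts better as a result.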
Improving Internal Search
Internal search is one area deeply affected by text analytics, NLP, and ML. Sean Coleman, CTO and Chief Customer Officer at BA Insight, suggests adding semantic search to the list of “new” technologies. In his view, semantic search lets people “search as they speak,” creates a single unified index, and applies conceptual knowledge to search queries. This leads to higher relevance of search results.
Associated technologies for better search include content intelligence, which gives employees detailed information about files, multimedia content, and taxonomies, along with the textual content. A single index that incorporates ML can offer autosuggestion and autocorrection. Personalization is gained by showing what other people viewed, based on location, department, or interests. Sentiment analysis, to be meaningful internally, can focus on social media inside the enterprise and on emails and other electronic communications. Coleman also mentions search bots as coming of age for internal search.
Not everything that was old is new again, with good reason. Few people miss Clippy or rotary dial phones. AI in 1956 played checkers and that was the extent of its “intelligence.” We’ve moved on, and I think we can all agree that the evolution of AI, NLP, ML, and text analytics benefits us both personally and professionally.
Companies and Suppliers Mentioned