It's a messy endeavor: Automated text processing
Another challenge is the need to process rich media. To make a podcast or video searchable, software must convert the speech to text. Due to the wide variations in audio quality, speech-to-text systems often produce results that are not usable. A short time ago, we attempted to process four recordings made in live venues. We used three different systems, but none of them was able to produce a usable ASCII transcription of the audio on the recordings. The solution was difficult and required sending the audio and video source files to a human who was able to transcribe about 90 percent of the information. In a world in which forward-thinking engineers want to capture "all" rich media, the human-intermediated solution is neither affordable nor practical. At this time, an automated solution to unlock the information in audio and video content in a manner that makes search useful is not available. Progress is being made, but it is moving like a snail on a warm summer evening.
Third, companies engaged in next-generation content processing are constrained by a number of factors. Resources, even at large companies, are tight, and information priorities are often fuzzy or fluid. On one hand, the enterprise solutions responsible for the day-to-day information retrieval needs of the organization are difficult to change, upgrade or replace. On the other, ad hoc solutions to deal with hot-spot problems are often useful to a specific group, but migrating the expertise from a special project's solution across an organization can be difficult. The hurdle, according to the Harvard Business Review, is change management. "We behave based on the reality around us," say Gregory Shea and Cassie Solomon in the article "Change Management Is Bigger than Leadership." (See blogs.hbr.org/cs/2013/03/change_management_is_bigger_th.html.) Despite the need for integrated systems, most organizations operate with fiefdoms, islands and silos of information.
Finally, managers responsible for making strategic and tactical decisions face a problem that is different from the problems of just six or eight years ago. The sheer volume of data available within an organization requires different tools and business processes. For a person working in knowledge management, the journey now underway may be discomfiting. Buzzwords like "governance," "analytics" and "business intelligence" do little to provide reliable mileposts.
Leximancer's Smith said, "I cannot currently think of any other commercial automatic text analysis system whose output model has been cross-validated in the scientific literature."
In our world of proliferating information and hurdles that are difficult to get over, I think of Thomas Alva Edison's alleged quip, "I have not failed. I've just found 10,000 ways that won't work."