The new frontier of search
A new focus on meaning-oriented search techniques is ushering in a Silver Age of search.
By John Harney
If you want to find information fast, you need search and retrieval technology. That is not news to people who have been interfacing with IT tools for the last decade. Even laypeople are familiar with recreational search engines, like AltaVista, used for exploring the Internet. Early on in its development, search made inroads into vertical markets like financial services and as an adjunct functionality embedded in KM and document management products.
As search evolved, it also found a home in pharmaceutical companies and government intelligence agencies. It proved indispensable for drug research and development because it expedited a drug creation process that was made more complicated and expensive by the Food and Drug Administration. Intelligence agencies found it to be a useful tool in monitoring the communication among potential terrorists. Those search applications used techniques that were advanced for their time to help workers perform research in highly specialized infobases. Techniques like natural language processing let them carry out fuzzy searches when they wanted to circumscribe an area of knowledge but didn't have useful search terms. Taxonomies and autocategorization automatically found and classified documents in predefined or self-generated categories.
Those techniques have become a staple of search products, but they are being augmented by more meaning-oriented, rather than term-oriented, techniques. Techniques like concept-based searching and pattern recognition do not limit the scope of a search to the specific terms used. Instead they find data based on related meanings of the search terms. A natural language query of a Central Intelligence Agency infobase like "the American embassy in Beirut was car-bombed" might turn up documents detailing similar bombings in other countries' embassies worldwide, even though most of the terms like "American," "Beirut" and "car-bombed" do not appear in the documents. The speed with which a search engine can accomplish this is especially valuable when a limited number of intelligence experts must react quickly to such terrorist incidents to be effective in preventing future ones.
That added value has justified bigger spending for the average search installation, says Whit Andrews, research director, search technologies, Gartner. It has also served as persuasive proof that search can be used for strategic goals that drive the mission of, say, an agency involved with homeland security. High-tech manufacturing companies are beginning to use both term-oriented and meaning-oriented tools to accelerate supply chain processes. A telecommunications manufacturer might need to identify and acquire 100 different components from 50 different suppliers to make a piece of equipment, says Andrews. Searching for them automatically in a portal dedicated to supplier products, he adds, is much faster and more accurate than doing so manually.
The same is true of customer-facing applications like customer service or technical support portals, says Susan Feldman, research VP, content technologies, IDC. Such mechanisms let customers do self-service tasks, like ordering a replacement part and figuring out how to install it, without having to call the manufacturer, thereby freeing internal staff to do less generic tasks.
While we are not yet living in the Golden Age of search, recent improvements qualify this as a Silver Age. A closer look at the dominant recent trends in search bears this out.
Trends in search
There is consensus that new search technologies are more linguistically based so they can discern the meaning, instead of just the letters, of a search term. Feldman explains that concept searching can determine by context the meaning of your search phrase and match it only to documents that share that context and meaning. So if you used the search phrase "semantic web," she says, the search engine would return conceptually related results having to do with "XML" and "metatagging," even though those terms weren't used in the query.
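As a rough illustration of Feldman's "semantic web" example, the sketch below expands a query with related concepts before matching documents. It is a toy under stated assumptions, not how any vendor's engine works: the concept map, the sample documents and the function name `concept_search` are all hypothetical, and a real product would derive the relationships through linguistic analysis rather than a hand-built table.

```python
# Hypothetical, hand-built concept map. A real engine would derive these
# relationships from linguistic analysis rather than a fixed table.
RELATED_CONCEPTS = {
    "semantic web": {"xml", "metatagging"},
}

# Hypothetical sample documents, keyed by an invented id.
DOCUMENTS = {
    "doc1": "An introduction to XML schemas for publishers",
    "doc2": "Metatagging practices for corporate intranets",
    "doc3": "Quarterly sales results for the retail division",
}


def concept_search(query, documents, concepts):
    """Return ids of documents that mention the query or a related concept."""
    phrase = query.lower()
    # Expand the query with conceptually related terms, if any are known.
    terms = {phrase} | concepts.get(phrase, set())
    hits = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        # Simple substring match against every term in the expanded query.
        if any(term in lowered for term in terms):
            hits.append(doc_id)
    return sorted(hits)


results = concept_search("semantic web", DOCUMENTS, RELATED_CONCEPTS)
print(results)
```

Neither matching document contains the words "semantic web." They are found only because the query was widened to related concepts first, which is the difference between meaning-oriented and term-oriented matching.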
She says certain vendors like FAST Search & Transfer have emerged that can search both structured and unstructured data. "Previously," she explains, "we've had separate sets of vendors addressing those two problems, but the user wants to know everything about a topic regardless of the structure." Such capabilities, for example, let you search for data in unstructured documents while also performing fuzzy searching within databases.
Vendors like ClearForest, Attensity, Insightful and Inxight are also placing new emphasis on text mining. That is quite different from data mining, Feldman says, which requires you to prestructure the database in order to get certain results from it—types of results that you probably already know you want. Text mining, by contrast, "helps you locate what you don't know you're looking for," she says.
If you want to do market intelligence research, it's likely you'd want to identify competitors that are not already on your radar. Conventional search engines using specific search terms would require you to use the name of the company you're hoping to find—and, of course, you wouldn't know that information. Text mining, however, lets you train a search engine "to look at the characteristics of a competitor," explains Feldman, "and then go out and forage for similar companies."
It used to be that products applied only one technology to solve a search problem. Now vendors are offering several techniques and letting the most appropriate one solve the problem. Feldman says, for example, that vendors as different as Verity and ClearForest often use at least taxonomy-based and learn-by-example types of autocategorization to vote on which category a piece of data belongs in.
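The voting idea can be sketched in a few lines. This is a minimal illustration and not a description of Verity's or ClearForest's products: the taxonomy, the training examples and every function name are hypothetical, and both classifiers are reduced to simple word overlap. Real systems would use far richer models and more voters.

```python
from collections import Counter

# Hypothetical taxonomy: each category is defined by a set of keywords.
TAXONOMY = {
    "finance": {"loan", "interest", "bank"},
    "pharma": {"drug", "trial", "dosage"},
}

# Hypothetical labeled documents for the learn-by-example classifier.
EXAMPLES = [
    ("the bank raised its interest rate", "finance"),
    ("the drug trial measured dosage response", "pharma"),
]


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def taxonomy_vote(text):
    """Vote for the category whose keywords overlap the text the most."""
    words = tokenize(text)
    scores = {cat: len(words & keywords) for cat, keywords in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def example_vote(text):
    """Vote for the label of the most similar labeled example."""
    words = tokenize(text)
    best_label, best_overlap = None, 0
    for example_text, label in EXAMPLES:
        overlap = len(words & tokenize(example_text))
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


def categorize(text):
    """Combine both votes; with a tie, the taxonomy vote is listed first and wins."""
    votes = [v for v in (taxonomy_vote(text), example_vote(text)) if v is not None]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]


print(categorize("new drug dosage guidance"))
```

With only two voters, a disagreement is settled by whichever vote is listed first. A product that combines more techniques could weight each vote by how reliable that technique has proved for the content at hand.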
Feldman points out that an increasing number of search products are using Web services to do things like send messages back and forth between search and other applications, embed code in a document or form in order to make calls, spawn a workflow, pull together information or initiate a search.
The need for regulatory compliance is also driving companies to adopt search technologies to achieve unified access to scattered repositories throughout their organizations, Feldman says. Sarbanes-Oxley, for example, makes CEOs and CFOs liable for jail terms if their employees are performing noncompliant activities like insider trading. And the size of e-mail repositories in global organizations often precludes manual perusal by one person.
Distributing data through use of search engines is becoming more popular. Feldman says that companies like Nexcerpt let C-level employees gather information in the search process, but then summarize, annotate and distribute it to appropriate people in a newsletter or other format. Some of those people might supplement the data with the results of their text mining—say, a list of possible competitors with their strengths and weaknesses—and distribute it to the same people. Such value-added data builds on itself and might comprise some particularly useful business intelligence that would have been missed by distributing the information wholesale.
A fast maturing market
Until last year, the search market was comprised of different application vendors offering discrete search products that specialized in a spe