
Give your search a boost

Based on the evaluation of user behavior, certain pages are determined to be more valuable, and can be moved up in the results list. NetApp’s best evidence that Baynote was working was that the company stopped receiving complaints about the search function.

Users abandon a site quickly if they are not finding what they need, so routing them to the right information should be a top priority. "If the first three results in a search list are not helpful, half the people give up," says Scott Brave, CTO of Baynote. "And if the desired results are not found somewhere on the first page of results, that number goes up to 95 percent."

When the Baynote Collective Intelligence Platform is first installed, its "observer" software watches what users do. "We think the approach of observing behavior is the most robust method of determining what information is valuable to users," Brave says. "People often vote or make recommendations in a way that does not reflect their actual behavior."

After tracking user behavior, Baynote begins to adapt the search process continuously and in real time to improve the outcome. "The more Baynote learns, the better it gets," Brave adds.
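
The general idea of moving pages up the results list based on observed behavior can be illustrated with a minimal sketch. It is not Baynote's algorithm, which the article does not describe. The class name `BehaviorBooster`, the `weight` parameter and the "useful visit" signal are all assumptions made for illustration.

```python
from collections import defaultdict


class BehaviorBooster:
    """Hypothetical re-ranker that boosts pages users found useful."""

    def __init__(self, weight=0.5):
        # weight: how strongly observed behavior shifts the base score (assumed).
        self.weight = weight
        self.useful_visits = defaultdict(int)

    def observe(self, page, found_useful):
        # Record one observed visit; only visits judged useful are counted.
        if found_useful:
            self.useful_visits[page] += 1

    def rerank(self, results):
        """Reorder (page, base_score) pairs, highest boosted score first."""
        total = sum(self.useful_visits.values())

        def boosted(item):
            page, base_score = item
            # Share of all useful visits that landed on this page.
            share = self.useful_visits.get(page, 0) / total if total else 0.0
            return base_score + self.weight * share

        return [page for page, _ in sorted(results, key=boosted, reverse=True)]
```

Because each call to `observe` changes the counts immediately, the ordering returned by `rerank` adapts as more behavior is recorded, which loosely mirrors the continuous adjustment described above.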

Open source front end for searches

Before a search engine can produce results, a considerable amount of document processing needs to take place. This function is carried out by a front end that includes connectors to data sources, crawlers that search Web sites and document filters that select certain formats for inclusion. The document then goes through stages in which tokenizers break up text into words, stop word removers take out "noise" words and entity recognizers prepare the text for future analysis. Those steps and many others put the text into a form that can be more meaningfully indexed and searched.
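
The stages named above can be sketched in a few lines. This is a simplified illustration, not the code of any product mentioned in this article: the stop word list, the phone number pattern and every function name are assumptions, and real entity recognizers are far more sophisticated than a single regular expression.

```python
import re

# Hypothetical stop word list; production systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "at"}

# Illustrative pattern for phone numbers written like 555-123-4567.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def tokenize(text):
    """Break text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop "noise" words that carry little meaning for search."""
    return [token for token in tokens if token not in STOP_WORDS]


def recognize_entities(text):
    """Pull out simple entities; only phone numbers in this sketch."""
    return {"phone_numbers": PHONE_PATTERN.findall(text)}


def process_document(text):
    """Run the stages in order and return a record ready for indexing."""
    tokens = remove_stop_words(tokenize(text))
    return {"tokens": tokens, "entities": recognize_entities(text)}
```

Entity recognition runs on the original text rather than on the token list, because tokenizing splits a phone number into separate digit groups.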

Dieselpoint, which offers a high-end search engine, recently introduced an open source front-end product called OpenPipeline. Available at no cost as a standalone product, OpenPipeline has connectors to the leading content management systems. Its purpose is to improve the performance of the document processing step.

"We believe this product has potential because it ties together several other pieces of software with one solution," says Chris Cleveland, CEO of Dieselpoint. The document is then filtered and transformed into XML. "This allows extraction of entities of such information as company names, person names, phone numbers and application-specific entities," continues Cleveland.

The next release of OpenPipeline, expected this spring, will have wrappers for the Unstructured Information Management Architecture (UIMA), which originated at IBM and is now an open source alternative for search and semantic analysis of unstructured information. It will also support LingPipe, which is a set of Java libraries used for linguistic analysis and data mining, and other text analytics packages.

Although OpenPipeline is open source, Cleveland emphasizes that Dieselpoint commercially supports it. "This is not a case of an open source product where no one is in charge," he says. "Our own product is built on it, so we maintain and expand it." The advantage to Dieselpoint is that involving third-party participants through open source technology allows the company to interact with a greater number of vendors and take on larger projects, as well as to highlight its own search engine.

Sorting out search

Search Technologies maintains that organizations should understand their search needs fully before choosing a solution. The company has developed a detailed assessment methodology that it uses to evaluate the customer’s present environment, conduct a gap analysis and make recommendations for a search solution. During that process, Search Technologies analyzes the organization’s overall systems architecture, data processing and indexing needs, and search requirements.

"Some customers use search inside the firewall to save money," says Kamran Khan, president and CEO of Search Technologies, "and some use it on their Web site to make money. Each mission has a different set of requirements."

Khan says that many customers do not fully understand how to integrate the products they buy, and therefore they don’t get the most out of their investment. "Also, the underlying complexity of data is not always evident," he adds. "Meticulous work is often required to get the data in a form that allows it to be usable."

Search Technologies is a Microsoft partner and uses FAST Search technology, which was acquired by Microsoft last year, in many of its implementations. The results of its assessments, however, can be used to guide the deployment of any search solution.
