Web mining synergy
User stories from the knowledge front
A new Web mining server has replaced cut-and-paste methods of information gathering at Cinergy Marketing and Trading, a Houston affiliate of Cinergy Corp., a diversified energy company headquartered in Cincinnati, Ohio.
Previously, employees at Cinergy would bookmark frequently visited Web sites and return to them as often as possible. When they found important business information, they would write it down in a notebook or paste it into Excel. But because no one works 24 hours a day, people could miss significant information on the ever-changing Web. There was also a limit to how much information a person could cut and paste in a day.
Cinergy took its dilemma to Connotate Technologies, which offered its vTag Web Mining Server as a solution. vTag provides Information Agents that are mapped to specific Internet sites to monitor, extract and deliver information in real time. According to Connotate, the agents can navigate and monitor sources, transform unstructured information into structured information, apply personalization and filtering, and deliver the results in a variety of formats, including e-mail, database, Excel and XML. The company says the 500 billion pages of data on the Web, along with the billions of new elements added each day, can be accessed automatically.
Through the system, Cinergy now collects and warehouses real-time information about price, volume, availability, weather, industry developments and more: information that can help predict future supply and demand, usage and pricing. The information is available to Cinergy's traders, analysts, researchers and customers.
"We are now able to monitor hundreds of sites and automatically acquire freshly posted data, something we could never do before," says Darcy Pach, manager of Cinergy Marketing and Trading. "Getting timely information from the source gives us a real advantage."
Cinergy says that since it installed the system, the number of information sources it polls has grown from 20 to 80, and that information is collected in one-twelfth the time it took previously.