In Search of Search: Separating the "What" from the "Why"
There was a while there when people talked about enterprise search as though it were a "thing" that could be wrapped up and packaged and flicked on like a light switch. All we (here at KMWorld) had to do was slap the words "Enterprise Search" in the title of a webinar or a white paper, and people would flock to it. I actually saw people at a KMWorld conference with black magic markers writing the words "enterprise search" onto their booth signage. It was that kind of thing.
It is not anymore. The arc of maturity that any technology follows is pretty predictable, I've learned over time. It starts like I've just described, with the predominant interest falling into the "educate me" camp. Then, over time, savvy business managers realize that "search for its own sake" isn't enough; one has to identify the business proposition attached to it. It becomes more than "what is search?" It becomes "why is search important?" Then finally, the prevailing interest in (name the technology) becomes one of identifying the specific business processes that (name the technology) can address.
That's where we sort of are with "enterprise search." In fact, we've taken to referring to it as "intelligent search" at this stage, for two reasons. One, the application of linguistics and natural language processing techniques has, in fact, made search more "intelligent." Two, and I'd argue more important, we've gotten a lot smarter about the ways in which we apply it.
One of the ways we've gotten smarter is to pick our fights a lot better. Many smart companies have learned that search is a tool... a means to an end. And many of the vendors in the space have learned to pursue those very specific, very defined applications as opportunities.
One of them is AccessData. They would probably not identify themselves as "enterprise search" vendors. But by applying search tools throughout the enterprise in an intelligent manner, they have created a niche for themselves: computer forensics.
For AccessData, forensics is not enterprise search and it's not records management, but those elements certainly play a role. It's much more specialized than that. "When many companies try to embark on an overall enterprise search initiative, they quickly realize that a good portion of their data repositories don't fit into the vendor's model, or they can't be indexed, or the data is too dynamic, or it's in some kind of strange archive format, and the effort falls far short of the goal." That was Devin Krugly, director of corporate development for AccessData Group, speaking. I sat down with him a while back to sort out this new age of "purpose-driven" search.
"Search that doesn't work in a certain type of repository is untenable for forensics," Devin said. "The data has to be found, whether it's a litigation case, or some kind of criminal investigation. For our part, we don't necessarily search thoroughly into the biggest and hairiest data repositories, such as SAP, but we can certainly discover what needs to be found once the pieces that are appropriate have been handed over. If we need to find some financial component, and it's in the form of a database, we can do that, and index it, serve it up for speedy search, and maintain the chain of custody along the way."
Now, in full disclosure, I had to tell Devin that my grasp of forensics barely rises above the occasional Friday night episode of "CSI." So information forensics, the kind of stuff that federal investigators use to find fraud and miscreance in the largest financial services organizations and the largest corporations, is way above my pay grade.
So I asked a kindergarten question: Isn't it true that your task is to federate search across all repositories, and locate everything without fail and without exception? "We try to!" he said. "It's true in criminal forensics and certain e-discovery cases that we try to reach that goal, but the net is not necessarily cast that wide. Certainly in criminal investigations, we're reliant on human beings, too. People and processes, as everybody likes to chant, are part of the investigation and part of the e-discovery process. In a civil case, for example, an attorney still has to decide, out of an employee base of 150 people, who do I need to interview? Where do I need to look? What do I need to ask them? At the federal level, the Federal Rules of Civil Procedure (FRCP) help to determine the scope of the search and the location of the data. But at the county and municipal level... they're not restricted by the FRCP. So it's up to human judgment."
So search is not the total answer to the problem? "Correct," he said. "The important thing is that it's done in a consistent, concise, repeatable fashion, especially in the case of e-discovery. And if you've already invested in the product to do those things, there's no need to have three different systems to accomplish the same goals."
Devin then explained that enterprise search can be conducted without consolidating and normalizing the data in question. "Forgive me if I'm telling you something you already know," he said, apparently forgetting that there's not a lot I DO know. "But in the e-discovery space we refer to it as a 'connector.' These connectors are simply ways to address different sources of data, such as email systems or structured data stores. It's the same idea as a driver or an API... same concept. We're developing a pretty good library of those connectors that make a 'federated' search, as you described it, possible. We address some repositories in their native state, but we also think there needs to be context. When you're doing an investigation, ones and zeroes... or even words... don't mean a heck of a lot unless you have the context in which they appear," he said.
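For readers who think better in code, here is a minimal Python sketch of the connector idea as Devin describes it: each source is queried in place through a small driver-like interface, and every hit carries a snippet of surrounding text so it isn't a bare match. This is not AccessData's library; the names `Connector`, `MailboxConnector`, `TableConnector` and `federated_search` are hypothetical, and in-memory data stands in for real email systems and databases.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Hit:
    source: str    # which connector (repository) produced the hit
    location: str  # where in that repository: message id, row number, ...
    context: str   # surrounding text, so the match is not a bare string


class Connector(Protocol):
    """Hypothetical driver-like interface: each source is searched in place."""
    name: str

    def search(self, term: str) -> Iterable[Hit]: ...


def snippet(text: str, term: str, width: int = 20) -> str:
    """Return the match plus up to `width` characters on either side."""
    i = text.lower().find(term.lower())
    return text[max(0, i - width): i + len(term) + width]


class MailboxConnector:
    """Stand-in for an email system: message id -> message body."""

    def __init__(self, name: str, messages: dict[str, str]):
        self.name = name
        self.messages = messages

    def search(self, term: str) -> Iterable[Hit]:
        for msg_id, body in self.messages.items():
            if term.lower() in body.lower():
                yield Hit(self.name, msg_id, snippet(body, term))


class TableConnector:
    """Stand-in for a structured store: a list of rows (column -> value)."""

    def __init__(self, name: str, rows: list[dict[str, str]]):
        self.name = name
        self.rows = rows

    def search(self, term: str) -> Iterable[Hit]:
        for n, row in enumerate(self.rows):
            # Flatten the row so column names travel with the matched value.
            text = "; ".join(f"{col}={val}" for col, val in row.items())
            if term.lower() in text.lower():
                yield Hit(self.name, f"row {n}", snippet(text, term))


def federated_search(connectors: Iterable[Connector], term: str) -> list[Hit]:
    """Query every source through its connector; no data is copied or merged."""
    return [hit for c in connectors for hit in c.search(term)]


if __name__ == "__main__":
    mail = MailboxConnector("mail", {"m1": "Please move the Q3 invoice off the books."})
    ledger = TableConnector("ledger", [{"account": "4410", "memo": "Q3 invoice reversal"}])
    for hit in federated_search([mail, ledger], "invoice"):
        print(hit.source, hit.location, repr(hit.context))
```

The design choice worth noticing is that adding a new repository means writing one more small class with a `search` method, not reloading the data into a central store.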
The Cost Factor
There are valid business and technical reasons for simplifying search in this way. "Some search tools create separate indexes for every repository. These indexes can be a third to half again the size of the original data you're trying to search! Introducing the overhead, cost and long-term maintenance of those indexes for a point-in-time purpose, such as an e-discovery matter, doesn't make sense. It's not a good business decision," Devin insisted. "If I'm looking at seven petabytes of data, and I need to add another two petabytes just to make it searchable... that's not a smart business decision."
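Devin's numbers are easy to check. Taking his own range (an index a third to half again the size of the data) and his seven-petabyte example, a few lines of Python work out the extra storage; the helper name `index_footprint` is just for illustration.

```python
def index_footprint(data_pb: float, ratio: float) -> tuple[float, float]:
    """Return (extra index size, total storage) for an index `ratio` times the data."""
    extra = data_pb * ratio
    return extra, data_pb + extra


DATA_PB = 7.0  # the seven-petabyte example from the interview

for label, ratio in (("one third", 1 / 3), ("one half", 1 / 2)):
    extra, total = index_footprint(DATA_PB, ratio)
    print(f"{label}: +{extra:.2f} PB of index, {total:.2f} PB in all")
```

At one third the index adds about 2.33 PB; at one half it adds 3.5 PB, for 10.5 PB in all. His round figure of "another two petabytes" is, if anything, on the conservative side.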
He went on. "It's just better, from an IT perspective, to have a dashboard that shows results from any number of servers than to have the secondary and tertiary indexes necessary to search all that stuff in the manner that some products make you do. It reduces risk and costs," he said. "Risk in this case can take the form of servers that don't function, indexes that aren't updated, a fault-tolerance issue in another system or some other kind of network hiccup."
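The dashboard approach can be pictured as a fan-out: send the query to every server at once, show whatever comes back, and flag the servers that didn't answer instead of silently dropping them. Below is a minimal Python sketch under one stated assumption: each server is modeled as a hypothetical callable that takes a query string and returns a list of result strings. It illustrates the general pattern only, not any vendor's product.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumption: a "server" is any callable mapping a query to a list of results.
Server = Callable[[str], list[str]]


def dashboard_query(servers: dict[str, Server], query: str):
    """Query every server in parallel; keep partial results and record failures."""
    results: dict[str, list[str]] = {}
    failures: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(servers) or 1) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in servers.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result()
            except Exception as exc:  # a down or unreachable server is reported, not fatal
                failures[name] = repr(exc)
    return results, failures


if __name__ == "__main__":
    def file_server(q: str) -> list[str]:
        return [f"contract.docx mentions {q}"]

    def mail_server(q: str) -> list[str]:
        raise ConnectionError("server unreachable")

    found, down = dashboard_query({"fs01": file_server, "mail02": mail_server}, "invoice")
    print("results:", found)
    print("unavailable:", down)
```

Because nothing is pre-indexed centrally, a stale index can't mislead the investigator; the trade-off is that a server that is down simply shows up as unavailable in the results.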