Big begets big: The information governance challenge
“Big data without governance will quickly become a big problem,” says Christophe Toum, data governance manager at Talend, in an interview with Anmol Rajpurohit for KDnuggets.
Talend says its software tools allow licensees to deploy enterprise-ready solutions that help unlock business value. The company provides master data management (MDM) solutions and user-friendly programming tools. It claims that, without writing code, a customer can go from zero to big data in under 10 minutes.
Toum points out, “The actual consumers of the data tend to take the matter into their own hands in order to have the agility the business requires. To regain control and apply the proper data governance policies, IT needs to deploy and support a platform that offers a non-IT person enough simplicity, flexibility and productivity for them to willingly give up their ad hoc tools and scripts.”
The idea is appealing. An individual who wants to make sense of marketing data, social media content or geolocation data can click, discover, explore and identify significant information.
Today, digital information in even small and midsize organizations doubles and redoubles rapidly. Ordinary data such as word processing documents and Excel files becomes big data. PowerPoint presentations and e-mail are like the fast-replicating creatures in the 1967 Star Trek episode “The Trouble with Tribbles.” The “trouble” was that tribbles produced more tribbles, lots of tribbles. Like tribbles, digital data keeps on proliferating. An organization’s thirst for social media data and third-party information seems to be unquenchable.
Real-time social media view
A number of companies are delivering knowledge-centric systems that are truly mind-boggling. In Chicago, engineers at Geofeedia offer a system that delivers a real-time view of social media content. You can watch a video of the system at youtube.com/watch?v=pjZU8KRoezo.
The company’s system processes real-time information from a social media source such as Twitter and provides functions that allow an authorized user to view individual messages within a specific geographic area. A public safety officer can highlight an area on a map and explore the information flowing from that specific geographic area in real time.
The Geofeedia interface allows the user to specify a particular data stream to examine. The user may enter a keyword query or use the point-and-click interface to identify an area to analyze. The system then displays the individual content objects for that region. The system is highly interactive and makes data exploration rapid, easy and engaging. It can be used to access information that would otherwise be unavailable. Applications range from public safety to retail and healthcare.
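The article does not describe how Geofeedia’s system works internally. As a rough illustration of the kind of filtering described above, the hypothetical Python sketch below keeps only messages whose coordinates fall inside a user-drawn rectangle and, optionally, contain a keyword. The `Message` and `BoundingBox` types, the `filter_stream` function and the sample coordinates are assumptions made for illustration, not Geofeedia’s API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical geotagged social media post."""
    text: str
    lat: float
    lon: float


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical rectangular map selection.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        # A point is inside if both coordinates fall within the box edges.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def filter_stream(messages: Iterable[Message], area: BoundingBox,
                  keyword: Optional[str] = None) -> Iterator[Message]:
    """Yield messages posted inside `area`, optionally matching `keyword`.

    Works lazily, so it can sit on top of an unbounded live stream.
    Keyword matching is a simple case-insensitive substring test.
    """
    needle = keyword.lower() if keyword else None
    for msg in messages:
        if not area.contains(msg.lat, msg.lon):
            continue  # posted outside the selected map area
        if needle is not None and needle not in msg.text.lower():
            continue  # inside the area but does not mention the keyword
        yield msg


if __name__ == "__main__":
    # Hypothetical sample data: a rough box around downtown Chicago.
    loop = BoundingBox(south=41.87, west=-87.64, north=41.89, east=-87.62)
    stream = [
        Message("Traffic jam on Wacker Drive", 41.886, -87.631),
        Message("Sunny day in Milwaukee", 43.04, -87.91),
        Message("Crowd gathering near the Loop", 41.881, -87.627),
    ]
    for m in filter_stream(stream, loop, keyword="loop"):
        print(m.text)  # prints "Crowd gathering near the Loop"
```

A rectangle keeps the containment test trivial; a map tool that lets users draw arbitrary shapes would need a point-in-polygon test instead.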
Most organizations struggle with the flows of digital information produced and accessed by employees, contractors and consultants. Tapping into a stream of hundreds of thousands or even tens of millions of separate content objects, and then putting it to meaningful use, may be difficult to grasp.
Most organizations wrestle with little data problems on a daily basis. Who has the most recent version of the contract? What is the name of the supplier who provided the defective parts? Where is the customer mailing list we used after last month’s trade show? Answering these questions requires traditional methods: asking people, and the time-consuming cycle of opening, viewing and closing Word files. Even where specialized enterprise systems exist, such as legacy accounting systems or a cloud-based system from Salesforce.com, the information is available, but employees still have to locate it, assemble it and create a fresh digital file. Information at your fingertips morphs into information out of reach.
Defining big data
If you want to add to your knowledge about big data, “60 New Resources and Articles About Data Science, IoT, Machine Learning, R, Python, Big Data” will give you a wealth of links to explore. I worked through the list and examined articles about algorithms, a polemic urging me to start a statistical revolution, and a taxonomy of data scientists.
The list does not include a definition of big data. The Wikipedia entry runs to 8,000 words. I prefer the Google definition: “Extremely large data sets that may be analyzed computationally to reveal patterns, trends and associations, especially relating to human behavior and interactions.”
I had high hopes for “Going Beyond Big Data to Knowledge,” published in Forbes. The article states: “Data is the starting point and basic building block in a knowledge-based organization. Since the majority of big data uses today are machine-to-machine ad serving applications of real-time digital or internal data, knowledge isn’t required. Strategy requires a broader view of data. Strategy requires data that serves as fuel, but logic and experience still need to be applied to generate knowledge-based systems. Knowing not only what happened, but why it happened (diagnostic), what will happen (predictive) and how we can make it happen (prescriptive) is important for moving beyond big data to knowledge.”