Expert Voices

These are Data's Dark Ages, and That Needs to Change (Op-Ed)

Satyen Sangani is an economist and CEO of Alation, which helps businesses better find, understand and use internal data. Sangani spent nearly a decade at Oracle following positions with the Texas Pacific Group and Morgan Stanley & Co. This Op-Ed is part of a series provided by the World Economic Forum Technology Pioneers, class of 2015. Sangani contributed this article to Live Science's Expert Voices: Op-Ed & Insights.

For those of us who champion the power of data, the past five years have been an incredible ride thanks to the rise of big data. Consider just these three examples: By 2020, humanity will have created as many digital bits as there are stars in the universe; data drove U.S. President Barack Obama's wins in the 2008 and 2012 elections; and data is powering the incredible rise of new companies like Uber and Airbnb, allowing people to monetize their most illiquid, fixed assets like cars and houses.

Of course, data hasn't accomplished any of this. Data isn't the protagonist in any of the stories above. Humans are. People use data. Data can show correlations and trends, but people have insights that suggest cause and effect. Insights are what enable better decisions and drive innovation. Here's the catch: In spite of our recent data-driven achievements, the evidence suggests that humans may well be in the dark ages of data.

Consumption requires context

McKinsey, in its widely read Big Data report, estimates that there will be only 2.5 million data-literate professionals in the United States in 2018, fewer than 1 percent of the projected population. Surveys show that professionals today still take action the old-fashioned way: based on gut instinct, personal experience and what they think they know.

So, with all this data, technology and promise, how do we build a more data-literate world? 

If we think of data as food for our minds, the nutrition movement might offer some clues. Today the state of labeling data for appropriate use is akin to the opaque labeling of food products more than 40 years ago. Until relatively recently, we had no idea whether the food we ate contained inorganic products, genetically modified ingredients, lead or even arsenic. Today we have raised nutritional awareness by listing critical ingredients and encouraging nutritional literacy that can assist in making healthy eating a conscious behavior.

Consuming data appropriately requires the same type of conscious evaluation of ingredients. One relatively common and simple example from our company experience involved a large, multinational corporation. It turned out that the Date of Birth field on one of its forms was generally not populated. Rather, it defaulted to Jan. 1, 1980. As a consequence, if a company employee tried to find the average age of customers, the result showed customers as younger than they really were. The mistake happened so often that it had created a myth within the institution that it served young customers, when its actual customers were typically middle-aged.
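To make the effect of that default value concrete, here is a minimal sketch in Python. The birth dates, the reference date and the helper names (`age_on`, `mean_age`) are all invented for illustration; only the Jan. 1, 1980 placeholder comes from the example above.

```python
from datetime import date

# Placeholder written by the form when no birth date was entered
# (the Jan. 1, 1980 default from the example above).
DEFAULT_DOB = date(1980, 1, 1)


def age_on(dob, today):
    """Whole years between dob and today."""
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def mean_age(dobs, today, skip_default=False):
    """Average age, optionally ignoring records that hold the placeholder."""
    kept = [d for d in dobs if not (skip_default and d == DEFAULT_DOB)]
    if not kept:
        return None
    return sum(age_on(d, today) for d in kept) / len(kept)


# Hypothetical customer records: three real birth dates, three placeholders.
records = [
    date(1962, 5, 14),
    date(1958, 11, 2),
    DEFAULT_DOB,
    DEFAULT_DOB,
    date(1970, 3, 30),
    DEFAULT_DOB,
]
today = date(2015, 6, 1)  # assumed reference date

naive = mean_age(records, today)                       # placeholders counted as real
cleaned = mean_age(records, today, skip_default=True)  # placeholders excluded

print(f"naive average age:   {naive:.1f}")
print(f"cleaned average age: {cleaned:.1f}")
```

In this made-up sample the naive average comes out several years younger than the cleaned one. Note that excluding the placeholder also drops any customer genuinely born on that date, which is one reason the default needs to be documented alongside the data rather than silently filtered.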

Drawing incorrect conclusions from data often does more damage than not using data at all. Consider the spurious relationship between vaccinations and autism, or the finding that Amgen's cancer researchers could reproduce only six of 53 landmark cancer studies. An Economist survey from 2014 revealed 52 percent of surveyed executives discounted data they didn't understand, and rightfully so. The Economist reminds us that a key premise of science is "Trust, but Verify." The corollary also holds true: if we can't verify, we won't trust.

Packaging data

No one wants to consume something that they're not expecting. If someone expects a red velvet cupcake and you feed them pizza, they might live with it, but the initial experience is going to be jarring. It takes time to adjust. So, what does this have to do with data?

Data doesn't really speak your language. It speaks the language of the software program that produced the information. You say sales, and the dataset says rev_avg_eur. You say France, and the dataset says CTY_CD: 4. 
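One way to picture the translation is a small lookup table that sits between the dataset and the reader. The sketch below is only an illustration: the mapping tables, the `describe` helper, the "country" label and the sample revenue figure are assumptions, and only the `rev_avg_eur` and `CTY_CD: 4` labels come from the text above.

```python
# Hypothetical data dictionary: machine field names -> everyday words.
FIELD_LABELS = {
    "rev_avg_eur": "sales",
    "CTY_CD": "country",  # assumed label for the country-code field
}

# Hypothetical code tables: coded values -> everyday words, per field.
VALUE_LABELS = {
    "CTY_CD": {4: "France"},
}


def describe(record):
    """Return a copy of the record using human-readable names and values."""
    readable = {}
    for field, value in record.items():
        # Fall back to the raw value or raw field name when no label is known.
        decoded = VALUE_LABELS.get(field, {}).get(value, value)
        readable[FIELD_LABELS.get(field, field)] = decoded
    return readable


row = {"rev_avg_eur": 1250.0, "CTY_CD": 4}  # invented sample row
friendly = describe(row)
print(friendly)
```

The lookup itself is trivial. The hard part is that someone has to write down and maintain those two tables for every program that produces data.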

Can these labels be learned? Sure, but even in a relatively small organization, there might be 20 software programs in use every day, each of which has hundreds of different codes, attributes and tables. Good luck if you are in a multinational organization with tens of thousands of such programs.

This translation has a larger unseen cost. A recent industry study highlighted that 39 percent of organizations preparing data for analysis spend time "waiting for analysts to assemble information for use." And another 33 percent spend time "interpreting the information for use by others." If, every time we need an answer, it takes us hours or days to assemble and interpret the information, we'll just ask fewer questions — there are only so many hours in a day. Making data easy to consume means ensuring that others can easily discover and comprehend it.

A data-literate world

We have an incredible opportunity in front of us. What if just 5 percent of the world's population were data literate? What if that number reached 30 percent? How many assumptions could we challenge? And what innovations could we develop?

According to the Accenture Institute for High Performance, in an article from Harvard Business Review, the skills required to be data literate include understanding what data means, drawing correct conclusions from data and recognizing when data is used in misleading or inappropriate ways. These are the decoding skills that enable an individual to apply data analysis accurately to decision-making. Rather than focusing on making data consumers do more work, maybe we can boost literacy by surrounding the data with context and reducing the burden of understanding the information.

Metrics and statistics are wonderful, but we need to surround data with more context and lower the costs of using them. More fundamentally, we have to reward those people and systems that provide this transparency and usability. Data is just made from pieces of information — we need to evolve in how we use them to unlock data's potential.

The views expressed are those of the author and do not necessarily reflect the views of the publisher. This version of the article was originally published on Live Science.
