Big data presenting big problems?
We have long written about “Big Data,” defined as data sets too large to capture and process with simple, common software tools, typically consisting of at least dozens of terabytes in a single set. The challenges of Big Data are clearly big, and most attention is being paid to the massive amounts of data generated by social media sites like Facebook and Foursquare.
But it’s not just consumer data that is being called into question. Industry expert Nassim Taleb recently opined on WIRED, “With big data, researchers have brought cherry-picking to an industrial level. Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information. In other words: Big data may mean more information, but it also means more false information.”
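Taleb’s claim is a well-known statistical phenomenon, and a toy simulation (our illustration, not Taleb’s – the variable counts and threshold below are arbitrary choices) makes it concrete: generate many variables of pure random noise with few observations each, and strong-looking correlations appear by chance alone.

```python
import numpy as np

# Toy illustration of Taleb's point: too many variables, too little data
# per variable. Every column here is independent random noise, so any
# "strong" correlation we find between columns is spurious by construction.
rng = np.random.default_rng(seed=42)

n_observations = 20   # little data per variable
n_variables = 500     # "modernity provides too many variables"

data = rng.standard_normal((n_observations, n_variables))
corr = np.corrcoef(data, rowvar=False)  # pairwise correlation matrix

# Count variable pairs whose |correlation| exceeds 0.5, excluding self-pairs.
upper = np.triu_indices(n_variables, k=1)
spurious = int(np.sum(np.abs(corr[upper]) > 0.5))

print(f"Variable pairs with |r| > 0.5 among pure noise: {spurious}")
```

Even though no real relationship exists in the data, hundreds of variable pairs clear the threshold – the “spurious relationships” grow with the number of variables, while adding more variables adds no real information.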
Taleb’s point could lead one to conclude that big data is inherently faulty, but perhaps he is really highlighting the human judgment still required in analyzing big data – and because most people would not typically question a researcher or their methods, analysis in this young field remains subjective.
Defining variables is key in big data analysis
Nitin Mayande is the Co-Founder and Chief Scientist at Tellagence, which offers the next step in social marketing intelligence through predictive products. Mayande has a PhD in Engineering Management and is a well-respected figure in the industry. He tells AGBeat that Taleb’s statement isn’t necessarily wrong, nor is it completely correct.
Mayande noted that Taleb “is right in saying that with big data one can use many, many more variables, but to say too little data per variable might not be right. When analyzing the data, the efficacy of variables depends upon the frame of reference and the system definition one chooses. If one uses an open system instead of a closed system, it may not be possible to define variables at all.”
The great irony of big data
Chris Treadaway, CEO and Founder of Polygraph Media, which is known for data-driven analytics, said, “To analyze big data, you have to know when you have enough data, know that you’re looking at the right data, and know how and when to draw conclusions from the data using methods developed from statistics theory and data science. That’s the great irony of ‘big data’ – it’s as much of an art as a science, which is why the best efforts are multidisciplinary.”
“Big data can find tremendous hidden relationships,” Treadaway continued, “but you have to make sure your bias isn’t to find conclusions that don’t exist. Bias can cause the situation Taleb describes, and will cause disinformation as he says. If you’re cautious, discerning, and careful, you can make the most of big data. But there are pitfalls for the careless.”
More data is not the answer
Matt Hixson, Co-Founder and CEO of Tellagence, stated, “I would agree that more data is not the answer. Most problems need the right data – not an infinite set. I see this a ton in social; people don’t fully understand what is going on, so they just keep correlating more and more data.”
Hixson continued, “This is all that most of the big data scientists at Twitter, LinkedIn and Facebook are doing, because they don’t know what else to do. The platforms that are generating this data – social networks are a primary source – are creating new phenomena that people don’t understand yet.”
Big data remains a treasure trove of information, even as it challenges a world still learning how to make sense of the tool and how to analyze seemingly infinite amounts of data.