The American Genius

Tech News

Big data: overcoming cherry picking and endless variables

Big data presents businesses of all sizes with unprecedented insight into previously unseen details, but does more data equal more problems? The answer may not be yes or no…

Big data presenting big problems?

We have long written about “Big Data,” which is defined as data sets too large to be captured and processed with simple, common software, typically at least dozens of terabytes in a single data set. The challenges of Big Data are clearly big, and most attention is being paid to the massive amounts of data generated by social media sites like Facebook and Foursquare.

But it’s not just consumer data that is being called into question. Industry expert Nassim Taleb recently opined on WIRED, “With big data, researchers have brought cherry-picking to an industrial level. Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information. In other words: Big data may mean more information, but it also means more false information.”

Taleb’s point could lead one to think that big data is inherently faulty, but perhaps he is really pointing to the human judgment that is still required when analyzing big data. Most people would not typically question a researcher or their methods, which leaves analysis, in this early phase, subjective.
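Taleb’s claim that spurious relationships grow faster than real information can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not Taleb’s own analysis: every variable is independent random noise, each has just 10 observations, and a pair counts as “related” when the absolute value of its correlation exceeds 0.6. The function names, the sample size, and the threshold are all hypothetical choices made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def pair_count(n_variables):
    """Number of distinct variable pairs that can be compared."""
    return n_variables * (n_variables - 1) // 2


def count_spurious_pairs(n_variables, n_observations=10, threshold=0.6, seed=0):
    """Count pairs of pure-noise variables whose |correlation| exceeds threshold.

    Assumption: every variable is independent Gaussian noise, so any pair
    that clears the threshold is a spurious relationship by construction.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    hits = 0
    for i in range(n_variables):
        for j in range(i + 1, n_variables):
            if abs(pearson(columns[i], columns[j])) > threshold:
                hits += 1
    return hits


if __name__ == "__main__":
    # Columns: variables, pairs compared, pairs that look "related"
    for k in (10, 20, 40, 80):
        print(k, pair_count(k), count_spurious_pairs(k))
```

Because the number of pairs grows roughly with the square of the number of variables while the data per variable stays fixed, the count of chance “relationships” tends to climb quickly as variables are added, even though nothing real is there to find.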

Defining variables is key in big data analysis

Nitin Mayande is the Co-Founder and Chief Scientist at Tellagence, which offers the next step in social marketing intelligence through predictive products. Mayande has a PhD in Engineering Management and is a well-respected figure in the industry. He tells AGBeat that Taleb’s statement isn’t necessarily wrong, nor is it completely correct.

Mayande noted that Taleb “is right in saying that with big data one can use many many more variables but to say too little data per variable might not be right. By the way, when analyzing the data, the efficacy of variables depends upon the frame of reference and the system definition one chooses. If one uses an open system instead of a closed system it may not be possible to define variables at all.”

The great irony of big data

Chris Treadaway, CEO and Founder of Polygraph Media, which is famous for data-driven analytics, said, “To analyze big data, you have to know when you have enough data, know that you’re looking at the right data, and know how and when to draw conclusions from the data using methods developed from statistics theory and data science. That’s the great irony of ‘big data’ – it’s as much of an art as a science, which is why the best efforts are multidisciplinary.”

“Big data can find tremendous hidden relationships,” Treadaway continued, “but you have to make sure your bias isn’t to find conclusions that don’t exist. Bias can cause the situation Taleb describes, and will cause disinformation as he says. If you’re cautious, discerning, and careful, you can make the most of big data. But there are pitfalls for the careless.”

More data is not the answer

Matt Hixson, Co-Founder and CEO of Tellagence, stated, “I would agree that more data is not the answer. Most problems need the right data – not an infinite set. I see this a ton in social; people don’t fully understand what is going on so they just keep correlating more and more data.”

Hixson continued, “This is all that most of the big data scientists at Twitter, LinkedIn and Facebook are doing because they don’t know what else to do. The platforms that are generating this data – social networks are a primary source – are creating new phenomena that people don’t understand yet.”

Big data remains a treasure trove of information, but it also presents challenges as the world works out how to make sense of it and how to analyze seemingly endless amounts of data.

Lani is the COO and News Director at The American Genius, has co-authored a book, co-founded BASHH, Austin Digital Jobs, Remote Digital Jobs, and is a seasoned business writer and editorialist with a penchant for the irreverent.

The American Genius is a strong news voice in the entrepreneur and tech world, offering meaningful, concise insight into emerging technologies, the digital economy, best practices, and a shifting business culture. We refuse to publish fluff, and our readers rely on us for inspiring action. Copyright © 2005-2022, The American Genius, LLC.