What is big data?
“Big Data” is defined as data sets too large to be captured, managed, and processed with simple, common software, typically consisting of at least dozens of terabytes in a single data set. The challenges of big data are, well, big. Gartner analyst Doug Laney [1] describes it as three-dimensional: increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources).
The term is being used at conferences across the globe, but most speakers discuss big data as it pertains to social networking sites like Facebook or location-based services like Foursquare: the volume of data being collected on every single person, and how that data is becoming so large that managing it and extracting anything meaningful from it is a logistical challenge.
In fact, big data was a popular theme at the recent South by Southwest Conference in Austin, where technologists and marketers brought their different backgrounds to the conversation. Each addressed how to collect and process this unprecedented volume of data, now being gathered for the first time outside of government, and the concerns raised by consumers blindly offering up their data.
How BIG is big data?
Considering social data alone, in a single 60-second period 23,148 apps are downloaded from the App Store and 208,333 minutes of Angry Birds are played on smartphones [3]. In addition, over 28,000 text messages are sent every second, and the average mobile phone user has 736 pieces of personal data collected every day, which service providers store for one to five years [3].
The University of Nebraska physics department [4] has 1.6 petabytes of data: 1.6 million gigabytes in a single department at a single school. Boeing jet engines can produce 10 terabytes of operational information for every 30 minutes they run [4].
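The unit conversions in this paragraph can be checked with a few lines of Python. This is a minimal sketch assuming decimal (SI) prefixes, where 1 PB = 10^15 bytes; under binary prefixes (1 PiB = 2^50 bytes) the figures would differ slightly. The helper `engine_output_tb` and the six-hour flight are hypothetical illustrations, and the code assumes an engine produces data at the same constant rate for the whole time it runs.

```python
# Decimal (SI) storage units: the convention under which
# 1.6 petabytes equals 1.6 million gigabytes.
GB = 10**9
TB = 10**12
PB = 10**15

# University of Nebraska physics department: 1.6 PB.
# Integer arithmetic avoids floating-point rounding.
nebraska_bytes = 16 * PB // 10
print(f"{nebraska_bytes // GB:,} GB")  # 1,600,000 GB


def engine_output_tb(minutes: float) -> float:
    """Hypothetical helper: terabytes one engine produces over `minutes`
    of running, assuming the quoted constant rate of 10 TB per 30 minutes."""
    return 10 * minutes / 30


print(engine_output_tb(60))      # 20.0 TB per engine-hour
print(engine_output_tb(6 * 60))  # 120.0 TB for a hypothetical six-hour flight
```

At the quoted rate, a single engine generates roughly 20 TB per hour of operation, which puts the Nebraska department's entire archive in perspective.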
Twitter produces eight terabytes of data every day
Twitter serves more than 200 million users who produce over 800 tweets per second, each roughly 200 bytes in size [4]. On an average day, that traffic alone adds up to more than 12 gigabytes, and across the whole Twitter ecosystem the company produces a total of eight terabytes of data per day, compared with the New York Stock Exchange’s single terabyte daily.
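The tweet-volume figure follows from multiplying the tweet rate by the tweet size and the number of seconds in a day. The sketch below redoes that arithmetic in Python, assuming a constant rate of 800 tweets per second at 200 bytes each around the clock; `daily_tweet_bytes` is a hypothetical helper name. It also computes the ratio of Twitter’s eight terabytes per day to the NYSE’s one.

```python
# Back-of-envelope check of the Twitter figures quoted above.
TWEETS_PER_SECOND = 800         # "over 800 tweets per second"
BYTES_PER_TWEET = 200           # "roughly 200 bytes in size"
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_tweet_bytes(tweets_per_second: int, bytes_per_tweet: int) -> int:
    """Hypothetical helper: bytes of tweet text produced in one day,
    assuming the rate stays constant for all 24 hours."""
    return tweets_per_second * bytes_per_tweet * SECONDS_PER_DAY


total = daily_tweet_bytes(TWEETS_PER_SECOND, BYTES_PER_TWEET)
print(f"{total:,} bytes")         # 13,824,000,000 bytes
print(f"{total / 10**9:.1f} GB")  # 13.8 GB (decimal units)
print(f"{total / 2**30:.1f} GiB") # 12.9 GiB (binary units)

# Whole-ecosystem comparison: 8 TB/day for Twitter vs 1 TB/day for the NYSE.
TWITTER_TB_PER_DAY = 8
NYSE_TB_PER_DAY = 1
print(TWITTER_TB_PER_DAY / NYSE_TB_PER_DAY)  # 8.0
```

Either way the units are read, the raw tweet text exceeds 12 gigabytes per day, and the ecosystem total is eight times the NYSE’s daily volume.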
Just as a Boeing jet engine records its every move, Facebook records data every time you interact with it. It stores not only who clicked what, but also that person’s profile information: hobbies, high school, family members, ethnicity, religion, employer, and so on. It then connects that one click with every other click you make within Facebook or on any website with the Facebook plugin. That is a lot of information to store for a few simple clicks, especially since Facebook records and tracks far more demographic information than Twitter, which itself creates eight times as much data per day as the NYSE.
What to do with the data?
The big challenge behind the scenes is how to process and manage this volume of information in an era when consumers voluntarily offer up thousands of data points through social networks, smartphones and the like. Social data is where the buzz is at conferences, but it is often discussed as if it were all of big data, which is clearly far more complex than what someone clicked on Facebook.
IT expert Andrew McAfee recently shared the story [5] of an Allstate-sponsored contest in which a small team of data scientists quickly achieved a 340 percent improvement in Allstate’s ability to predict bodily injury insurance claims based on car characteristics, without any expertise in insurance or automobiles and without consulting the well-paid Allstate mathematicians who build and maintain these prediction models.
McAfee asks, “So how can it be that a small team of people who don’t even work for the company was able, in three months, to achieve a 340% improvement over Allstate’s ability to predict bodily injury insurance claims based on car characteristics? And how was the team able to do this while working only with disguised data — without, in other words, knowing the true makes and models of the cars? Welcome to the weird new world [of] big data.”
What this means for business
All of the social data collected in recent years is finally becoming useful for more than buying a Facebook ad. True demographics are now evident, and, more importantly, real consumer behavior is being studied using tremendously large amounts of data. If a single text message records a dozen pieces of data, imagine the depth of information a Facebook user transmits in a single day, with all of their personal data attached to each action.
Social data (or big data) is a nightmare to manage because of the sheer volume and velocity of transmission, but for businesses it means a legitimate understanding of consumer behavior: not just what someone shared in a focus group or web poll, but the ability to track and understand what makes each consumer tick. The next step is translating that behavior into an isolated profile of a specific type of buyer, the kind of data marketers have been dreaming about for decades.