Tech News

Big data is useful, scary, and more subjective than you know

Big data helps us understand our customers, but it also helps budding companies sell information about you (or TO you), and is more subjective than you may know – it takes a human touch to determine what info is important.

Published

on

big data

Big data is here and unavoidable

For years, we’ve written about big data and showcased the progression of business intelligence now available to brands of every size. In fact, most businesses already have a feel for this type of data: open a spreadsheet of your sales data and you know it’s just a bunch of numbers until they are analyzed and filtered. Today, I want to review what big data is, how it is currently being used, what this means for the future, and most importantly, how it can be cherry-picked and why it can upset entire industries.

“Big data” is defined as data sets too large to be managed with simple, common software that captures and processes data, and it typically consists of at least dozens of terabytes in a single data set. The challenges of big data are really big. Gartner analyst Doug Laney describes it as three-dimensional: increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources).

Let’s talk about how BIG this really is

Let me illustrate. The University of Nebraska physics department has 1.6 petabytes of data – that’s 1.6 million gigabytes in one department at one school. Boeing jet engines can produce 10 terabytes of operational information for every 30 minutes they run. As of 2012, the average smartphone user has 736 pieces of personal data collected every day, stored for one to five years by service providers.
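To make those magnitudes concrete, here is a minimal sketch of the unit arithmetic. It assumes decimal (SI) units, where one petabyte is one million gigabytes, and the function names are made up for illustration.

```python
# Decimal (SI) storage units: 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def petabytes_to_gigabytes(pb: float) -> float:
    """Convert petabytes to gigabytes using decimal units."""
    return pb * GB_PER_PB


def terabytes_per_hour(tb: float, minutes: float) -> float:
    """Scale an amount of data produced over `minutes` to an hourly rate."""
    return tb * 60 / minutes


nebraska_gb = petabytes_to_gigabytes(1.6)        # the physics department figure
engine_tb_per_hour = terabytes_per_hour(10, 30)  # the jet engine figure

print(f"{nebraska_gb:,.0f} GB")
print(f"{engine_tb_per_hour:.0f} TB per hour")
```

At the quoted rate, a single engine would produce as much data in 80 hours of running as the entire physics department holds.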

IBM’s chief executive, Virginia Rometty, said, “By one estimate there will be 5,200 gigabytes of data for every human on the planet by 2020. And powerful new computing systems can store and make sense of it nearly instantaneously.” It has also been predicted that in the coming years, over 200,000 big data specialists will be required to make sense of the barrage of data being collected.

Big data is already being used today in a big way

Big data is a big deal and it’s not just because there’s a lot of it. In fact, today alone, SumAll raised $4 million and DataSift raised a whopping $42 million to help businesses make sense of their data as it relates to social media.

The retail industry already uses big data in amazing ways, analyzing shoppers’ height and size as they walk in the door to determine age, gender, and more, and even using cameras to analyze facial expressions while you shop to gauge your experience. If that doesn’t impress you, there’s already a seasoned company tracking “visual mentions” online, so if you share a picture of your Starbucks cup on Instagram, even if you don’t say Starbucks or use GPS, Starbucks can see that its logo, even if curved, was used on a social network.

Predicting the future with big data

Data is already having a tremendous impact on life today, yet it is still a young sector, with many startups yet to pop up to solve the data conundrums. SiftScience fights fraud using machine learning that recognizes patterns of fraudulent behavior based on past examples, Hadoop helps companies analyze the massive amounts of data they are generating about user behavior and their own operations, and Recorded Future uses algorithms that unlock predictive signals from web chatter to help a brand anticipate risks and capitalize on opportunities.

There are already projects in the works that allow forecasters to predict weather up to 42 days in advance, potentially saving lives and billions of dollars a year. Intel is working on a big data project that allows cars to communicate so drivers will be able to see three cars in front of, behind, and to the left and right of them, simultaneously. Ford is developing vehicle-to-vehicle and vehicle-to-infrastructure systems to warn drivers of potentially hazardous traffic events, like cars going through red lights.

But big data has some really big problems

First, and least upsetting, is that there are big problems with demographics, leaving brands with a lot of data that doesn’t yet mean much. Why? Incomplete self-reporting is a huge issue: brands are still focused on using social networking profile data to gather intelligence on their site users, fans, and the like, but the people filling out those profiles may not be completely truthful (they may say they are 32 when they are 12, and so forth). Additionally, privacy does protect users to a certain extent, blocking intelligence gathering by brands. Lastly, data is still largely inconsistent and unconnected – you may have a Twitter account and a Facebook account, but a third party doesn’t know that unless (a) you use the same username consistently or (b) you grant access to both accounts through that third party.

While other problems exist (like how will we ever store all of this data, disseminate it, and make sense of it, and does it all really matter?), the biggest one we see is the potential for cherry picking, because when you look at a data set, it still takes a human to actually determine what is important to garner from that data set.

Industry expert Nassim Taleb opined in February, “With big data, researchers have brought cherry-picking to an industrial level. Modernity provides too many variables, but too little data per variable. So the spurious relationships grow much, much faster than real information. In other words: Big data may mean more information, but it also means more false information.”
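Taleb’s point about “too many variables, but too little data per variable” can be illustrated with a small simulation. The sketch below is only an illustration, not Taleb’s own method: it generates columns of pure random noise, so any correlation it finds is spurious by construction. The 0.7 cutoff, the 10 observations per variable, and the function names are all assumptions chosen for the example.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def count_spurious(variables, threshold=0.7):
    """Count variable pairs whose absolute correlation exceeds the threshold."""
    return sum(
        1
        for x, y in combinations(variables, 2)
        if abs(pearson(x, y)) > threshold
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
OBSERVATIONS = 10       # deliberately few data points per variable

# 100 columns of independent noise: none of them is truly related to another.
noise = [[rng.gauss(0, 1) for _ in range(OBSERVATIONS)] for _ in range(100)]

for k in (10, 50, 100):
    pairs = k * (k - 1) // 2
    hits = count_spurious(noise[:k])
    print(f"{k} variables: {pairs} pairs checked, {hits} look strongly correlated")
```

The number of pairs to check grows roughly with the square of the number of variables, so the pool of chance “relationships” available for cherry-picking grows much faster than the data itself.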

Taleb’s point could lead one to think that big data is faulty and bad, but perhaps he is really pointing to the human judgment that is still required when analyzing big data. Most people would not typically question a researcher or their methods, which leaves analysis, in this early phase, subjective.

Chris Treadaway, CEO and founder of Polygraph Media, which is known for data-driven analytics, said, “To analyze big data, you have to know when you have enough data, know that you’re looking at the right data, and know how and when to draw conclusions from the data using methods developed from statistics theory and data science. That’s the great irony of “big data” – it’s as much of an art as a science, which is why the best efforts are multidisciplinary.”

“Big data can find tremendous hidden relationships,” Treadaway continued, “but you have to make sure your bias isn’t to find conclusions that don’t exist. Bias can cause the situation Taleb describes, and will cause disinformation as he says. If you’re cautious, discerning, and careful, you can make the most of big data. But there are pitfalls for the careless.”

And the coup de grâce

The coup de grâce is that professionals are being threatened by the new ways big data is being used, but they are not recognizing it as a big data issue.

Professionals in several industries are seeing data about themselves, their performance, their company, and their finances analyzed and repackaged for public consumption or monetization.

Imagine a site launches tomorrow based on publicly available data and you’re a social media consultant. Let’s say that this new site looks at who has recommended you on LinkedIn, Yelp, Angie’s List and so on, and has determined that the people recommending you are clients of yours, based on the assumption that it is the only reason they’d recommend you or review you. The new site also analyzes words and pictures used in your online bios to determine characteristics about you.

Then, they take those reviews and characteristics and quantify you into a score, giving you more points if someone from Coca-Cola reviewed you than if the local dentist reviewed you, implying that you’re a higher quality consultant if you’ve worked with a major brand like Coca-Cola than if you worked with a local dentist (God forbid you specialize in social media for independent medical professionals).

Then, Google gets interested in this new site and invests, and later wants to use that data to populate your Google+ profile, so now you, the social media consultant, have a score next to your face that claims to show how good you are at your job.

What’s wrong with that?

Data is subjective, even when raw – it takes humans to determine what data points in the sea of data are relevant, and it doesn’t always take into account the context surrounding that data. You, the social media consultant, could have taken a two-year sabbatical to execute social media strategies pro bono for three tiny charities, four local restaurants, two African orphanages, and a spa, earning a reputation for your high quality of work and compassion that can’t possibly be quantified by a computer.
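A toy version of that hypothetical scoring site shows how much the result depends on a human’s choice of weights. Everything in this sketch is invented for illustration: the consultants, the reviewer categories, and both weighting schemes. The second consultant’s ten reviews mirror the pro bono clients in the scenario (five charitable organizations and five local businesses).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer_type: str  # "major_brand", "local_business", or "charity"


def score(reviews, weights):
    """Sum the weight a human analyst assigned to each review's source."""
    return sum(weights[r.reviewer_type] for r in reviews)


# Hypothetical consultants and their reviews.
consultants = {
    "brand_specialist": [Review("major_brand")] * 2,
    "pro_bono_veteran": [Review("charity")] * 5 + [Review("local_business")] * 5,
}

# Two weightings a human might defend; neither is "in the data".
brand_heavy = {"major_brand": 10, "local_business": 1, "charity": 1}
equal = {"major_brand": 1, "local_business": 1, "charity": 1}


def ranking(weights):
    """Return consultant names ordered from highest to lowest score."""
    return sorted(
        consultants,
        key=lambda name: score(consultants[name], weights),
        reverse=True,
    )


print(ranking(brand_heavy))  # the brand specialist comes first
print(ranking(equal))        # the pro bono veteran comes first
```

The underlying reviews never change between the two runs. Only the analyst’s judgment about what matters does, and that judgment alone flips the ranking.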

This scenario is fake. For now. But with every human generating billions of data points every year, evaluations are just the first of many steps in what is to come with big data – the data is now generated, and it is a race to see what can be displayed about you and your business so that companies can sell to you or repackage your data and sell it to someone else. Even your brand will be using big data to gain insights into your customers so you can better serve them.

There are pros and cons to big data, but the reality is that it is unavoidable, even if you ignore it or misunderstand it. Consumers need to begin to recognize big data when they see it, and understand that it may not show the true context of that data, as it is rife with humans’ decisions regarding what is important about a data set. This is just the beginning.

4 Comments

  1. Patrick Gallagher

    December 6, 2013 at 12:43 pm

    100% spot on conclusion: “…as it is ripe with humans’ decisions regarding what is important about a data set.”

    I like how Rory Sutherland (sometimes with Taleb or speaking about his work) kicks these ideas around. As soon as you pick certain data points and make them *the* metrics to follow the data becomes skewed and meaningless. You changed it just by looking at it so hard.

    Good stuff.

  2. Hank Miller

    December 9, 2013 at 7:35 am

    We are drowning in data and it can lead to paralysis by analysis.

    Watching videos and researching how to make shrimp scampi, set a broken wrist or install a hard drive does not mean that you can do it. Somewhere along the line a human with experience in the appropriate field has to provide guidance and identify the key points.

    Piles of data are just that – without someone with the ability to effectively apply the appropriate parts to the specific question at hand there is nothing. Nothing but confusion.


Tech News

The inventor of the web wants to give back control of your data

(TECH NEWS) Using the internet has given us access to many things, but we’ve also lost control of our data. Can the father of the web give it back?

Multiple monitors set up on desk with control for data enabled.

Since it was first proposed in 1989, the web has come a long way, in both good and bad ways. With several communication tools available online, connecting with friends and family on the other side of the world has never been this easy. However, it has taken away something, too: control over our data.

Our information is everywhere. Once it’s out there, there is very little, if anything, we can do to control how it’s being used or who’s using it. But the inventor of the web, Tim Berners-Lee, wants to give users back control of their data.

“We’re on a mission to change the way the web works and the way to basically make the web a better place for all of us,” said Berners-Lee on The Telegraph Live.

In an attempt to “fix the web”, Berners-Lee launched a privacy-focused startup, Inrupt. The company’s data storage technology, called Solid, changes how data is stored to give you more control.

“Solid is the new way to connect to people and data. It’s an open-source web-based protocol that re-architects the way data is stored and shared,” said Berners-Lee.

With Solid, you put your personal data together into a personal online data store called a “pod”. Any kind of information can be stored in a pod such as websites visited, travel plans, health records, or credit card purchases.

The pod can be hosted by any Pod Provider, or you can host it yourself. Pods hosted on a Solid Server are fully compartmentalized from other Pods. Each one has its own set of data and access rules, and you decide who to share your data with using Solid’s authentication and authorization systems. You can also revoke anyone’s access at any time.
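The idea of a pod with its own data and access rules can be sketched as a small data structure. This is a toy model only: it does not use the Solid protocol or any Inrupt API, and the `Pod` class, its methods, and the example names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy personal data store: resources plus per-resource access grants."""

    owner: str
    resources: dict = field(default_factory=dict)  # resource name -> data
    grants: dict = field(default_factory=dict)     # resource name -> set of agents

    def store(self, name, data):
        """Save data in the pod; nobody but the owner can read it yet."""
        self.resources[name] = data
        self.grants.setdefault(name, set())

    def grant(self, name, agent):
        """Let an app or person read one resource."""
        self.grants[name].add(agent)

    def revoke(self, name, agent):
        """Remove a previously granted permission at any time."""
        self.grants[name].discard(agent)

    def read(self, name, agent):
        """Return the data if the agent is the owner or holds a grant."""
        if agent != self.owner and agent not in self.grants[name]:
            raise PermissionError(f"{agent} may not read {name}")
        return self.resources[name]


pod = Pod(owner="alice")
pod.store("travel_plans", {"destination": "Lisbon"})
pod.grant("travel_plans", "airline_app")
print(pod.read("travel_plans", "airline_app"))  # allowed while the grant exists
pod.revoke("travel_plans", "airline_app")       # the app can no longer read it
```

The design choice worth noticing is that the access rules live with the data in the user’s store, not inside each application.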

Inrupt was introduced back in November 2020, and the Solid technology is already being used by some large companies like the BBC and the National Health Service (NHS) in Britain.

The company’s business model is based on charging licensing fees for its commercial software, which uses Solid open-source technology. According to The New York Times, Inrupt has raised about $20 million in venture funding.

Getting data back into a user’s hands is very good. But, is it something that will quickly be adopted by everyone, including the tech giants?

Well, users will finally gain control of how they share their data. According to Berners-Lee, Solid will provide a “generic back-end store that works with all apps without modification.” This means developers don’t have to worry about creating back-ends for different apps.

And what will companies get out of it? According to Inrupt CEO and co-founder John Bruce, over the years he found that a lot of companies were “spending a great deal of time and money collecting and protecting user data.” So, by his account, moving the point of control of data from the organization to the user benefits everybody (i.e., money is saved).

“This is just the beginning of how we turn the red web right side up, restore some of its original values, like how we empower everyone to participate in and benefit from a web that serves us all,” said the web’s inventor. “The future of the web is a lot bigger than its past.”

Tech News

This web extension protects your sensitive information while screensharing

(TECH NEWS) If you’ve ever had to share your screen, you know that sometimes, your sensitive information still slips. But this extension helps by blurring your info for you.

Online presenter gesturing at a large Mac desktop computer, being cautious of their sensitive information.

In the time of video calls, video gatherings, and video everything, at one point or another, we will eventually need to share our screen and/or record video. When it’s time to present, there is one thing we don’t want to display to others — sensitive information.

While we can all take a good deal of precautions to make sure we don’t overshare, there is no guarantee we won’t miss something. After all, we’re human. The good thing about these modern times is that there is always someone trying to think of how to make our first world video problems go away.

Sanskar Tiwari, a software developer and educator on YouTube, found it time-consuming to edit videos to blur things such as API keys, account emails, and passwords. Plus, having to wait for videos to render made the process even longer.

To solve his problem, he created a new web extension named Blurweb. According to the website, the extension helps “people doing live screen sharing or recording video to make sure their sensitive information is secure.”

The extension does this by giving you the option to blur out things like inputs, links, email addresses, and images.

So, how does it work?

  1. Once you have the extension, you can go on any webpage and turn it on by clicking on the extension icon.
  2. When the extension is on, a panel with Turn Off/On, Clear All, and Close options pops up.
  3. With the extension on, you can select any element on the page, and the tool will automatically blur it out.
  4. Once the sensitive information you want hidden is blurred, you can record or share your screen without having to worry that you’re accidentally displaying that information.

If you want to remove the “blur” from your elements, you can select “Clear All” and everything will go back to normal. You can also quickly toggle the tool on and off and close it once you’re finished.
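The workflow above (turn on, select elements, clear all, toggle) can be modeled in a few lines. This is a sketch of the idea, not Blurweb’s actual code: the `BlurSession` class is hypothetical, and it assumes the blur is applied with the standard CSS `filter: blur()` property and that switching the tool off hides the effect without forgetting the selections.

```python
class BlurSession:
    """Toy model of the toggle / select / clear-all workflow."""

    def __init__(self, radius_px=6):
        self.enabled = False
        self.radius_px = radius_px
        self.selectors = []  # CSS selectors of elements chosen for blurring

    def toggle(self):
        """Switch the tool on or off."""
        self.enabled = not self.enabled

    def select(self, selector):
        """Mark an element for blurring; only registers while the tool is on."""
        if self.enabled and selector not in self.selectors:
            self.selectors.append(selector)

    def clear_all(self):
        """Forget every selection so the page goes back to normal."""
        self.selectors.clear()

    def css(self):
        """Return the CSS rules that would blur the selected elements."""
        if not self.enabled:
            return ""
        return "\n".join(
            f"{s} {{ filter: blur({self.radius_px}px); }}" for s in self.selectors
        )


session = BlurSession()
session.toggle()                # step 1: turn the extension on
session.select("#api-key")      # step 3: click the elements to hide
session.select("input[type=email]")
print(session.css())
session.clear_all()             # "Clear All" removes every blur
```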

Since Blurweb.app runs as an extension in the web browser, it can work on any website and even works offline. If you’d like to check it out, you can preview it on its website.

Tech News

Star Citizen: A cautionary tale of Kickstarter and crowdfunding

(TECH NEWS) Why is the most funded game in history still in development with no clear release date? And why can’t crowdfunding as a concept be seen as reliable from a backer’s perspective?

Magnifying glass over Kickstarter URL and site, a crowdfunding website.

Kickstarter – at its core – is a brilliant idea (and I wish I’d thought of it first). Creating a funding platform to literally allow anyone to bring an idea to fruition by asking for – essentially – seed capital and investors en masse via crowdfunding is truly appealing in every sense of the word. Originally a stronghold of new inventions, gadgets, and apparel, it quickly spread into the entertainment industry as well, with hobbyist game developers, auteur filmmakers, and first-time writers given the chance to use crowdfunding to breathe life into their creations.

Star Citizen first appeared on the Kickstarter platform way back in 2012 and was hailed as the next great space simulation game. The campaign was started by Chris Roberts – one of the grand masters of the genre – who created the legendary Wing Commander series while working at Origin Systems. While these might be unfamiliar to non-gamers, anyone who played computer and console games in the 80s and 90s would recognize each name as a juggernaut of the industry.

Without going into specifics, this is the equivalent of Steven Spielberg asking for money to make Montana Miles, a new franchise centered around an ace paleontologist and all around tough guy roughneck adventurer who maybe had a run in or two with certain historical societies while pursuing artifacts from an ancient and forgotten world.

Ol’ Steve is definitely gonna get backers. To really set this up, imagine he asked for money in the late 80s. That’s the kind of perfect storm situation we’d have here.

Star Citizen managed to bring in over $2.1 million from nearly 35,000 backers at its inception, and the fervor and excitement were high. This was due to the pedigree of those involved in the project and the fact that a massive space sim had not seen release in several years (the video game industry – like many others – goes through cycles, with certain properties and genres fading into and out of popularity). Fans eagerly donated, and it reached its original $500K goal quickly, with 9 people contributing $10,000 each and another 19 pledging $5,000.
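A quick back-of-the-envelope calculation puts those pledge figures in proportion. The sketch assumes the 19 backers pledged $5,000 each, and it treats the rounded numbers quoted above ("over $2.1 million", "nearly 35,000") as exact, so the results are approximations.

```python
total_raised = 2_100_000  # "over $2.1 million", used here as a round figure
backers = 35_000          # "nearly 35,000", used here as a round figure
goal = 500_000            # the original Kickstarter goal

# The 28 largest pledges: 9 at $10,000 and (assumed) 19 at $5,000 each.
top_pledges = 9 * 10_000 + 19 * 5_000
top_backers = 9 + 19

share_of_goal = top_pledges / goal
share_of_total = top_pledges / total_raised
average_pledge = total_raised / backers

print(f"Top {top_backers} pledges: ${top_pledges:,}")
print(f"Share of the original goal: {share_of_goal:.0%}")
print(f"Share of the campaign total: {share_of_total:.1%}")
print(f"Average pledge across all backers: ${average_pledge:,.0f}")
```

Under those assumptions, 28 people covered more than a third of the original goal on their own, while the average backer put in about $60.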

Since then, additional crowdfunding was conducted by giving fans the option to buy ships and other digital goods to be used in-game, bringing the total to $339 million in the past 10 years (accounting for pre-production and other planning that was done prior to the Kickstarter campaign).

Backing up for a second, consider that I just said 10 years. Which doesn’t sound too bad until you consider that the game is still not out and has no projected release date. If you go to their website, you can be directed to their Pledge Store to purchase ships and other items for a game that isn’t even done, and last released new public material way back in 2015. A side project meant to appease and entice backers – Squadron 42 – just announced its own delay.

And the developers have more or less given no reassurance or updated timelines. The prevailing theory is that this is the result of feature creep, but even this has sparked a number of heated discussions and angry denial from the developers.

Understandably, gamers are angry, and are (perhaps justifiably) lashing out (I won’t link to Reddit or any other forums, but it’s easy to sniff these out). There’s even a (hilarious) Imgur repository of broken promises and failed deliverables against a backdrop of developer feel-good rhetoric. At least one lawsuit has been filed.

Let me take a moment here to say that the gaming industry is no stranger to delays, and has also seen games released in broken states. The biggest recent example is Sony pulling Cyberpunk 2077 from its digital storefront and offering refunds. Cyberpunk 2077 is the biggest and most anticipated game at the moment, but it has been delayed countless times, has suffered numerous glitches and crashes, is otherwise unplayable on console platforms (both the PlayStation 4 and Xbox One), and has been called a disaster.

Let’s not even go into the legacy of delayed games, which stretches from Daikatana to Duke Nukem Forever to No Man’s Sky (though it should be noted that Hello Games has worked tirelessly to rectify the game’s original dismal state against its many, many promises)… The list goes on.

But we’re getting a little off course here by looking at traditionally funded games (even if there are dozens of problems there too). In terms of pure Kickstarter-funded debacles? There are lots of examples, including Double Fine’s Broken Age (famous for being the first major game to be crowdfunded and a story in and of itself), SpaceVenture (now over seven years late), and whatever it was that Yogscast game was trying to do (relevant because this was one of the biggest YouTube groups at the time). What about when backers paid for the Oculus Rift, only to have it purchased outright by Facebook before it was even released to backers?

There are too many fascinating and infuriating rabbit holes to go through.

So let’s talk about Kickstarter directly for a bit, because if we’re going to play the blame game (hah!), then we certainly need to consider their participation. As it stands, Kickstarter continues to operate with almost no oversight, and has remained a silent and invisible actor throughout these failures. In effect, they are a neutral third party.

Even worse, Kickstarter itself says that a creator is under zero obligation to complete their project, and it relies heavily on the fact that every crowdfunding campaign operates on the benefit of the doubt. If a creator reaches funding and is never heard from again, Kickstarter not only declines to pursue any kind of legal action but doubles down by blaming the investing audience, stating that they knew the risks upfront. Put bluntly: Kickstarter has a very convenient excuse that “art works by different rules.”

In almost all instances, this has resulted in incomplete and abandoned projects, often fueled by lies, deception, and fraud. And yet, Kickstarter has dodged any and all liability, and it’s unlikely that backers can easily exercise any kind of legal action. A similar situation would be taking a contractor to court over an unfinished job, but having no way to actually enforce restitution even under a favorable judgement.

This doesn’t even take into account that there’s a chance of a rogue backer voicing so much dissatisfaction that they sue a company into bankruptcy. Sure, this sounds like reasonable punishment, is entirely legal, and conceivably is well within the rights of that person. But even so, does the blame lie with an inexperienced creator, impossibly high standards set by a (debatably unreasonable) customer, or with Kickstarter being an enabler?

Set against this backdrop of numerous pitfalls, Kickstarter’s lofty goals lose some of their efficacy and integrity, a problem exacerbated by a laundry list of what-ifs and potentialities. There are simply too many legal issues to navigate when it comes to crowdfunding.

I’m not even going to start going into more examples of failed Kickstarter projects, outright scams, and other clear-cut bits of fraud and swindling.

Real quick, I want to mention a few other things – similar crowdfunding platforms such as Indiegogo have the same issues, GoFundMe is not without its own controversies, and Valve’s digital marketplace Steam gives developers the same loophole via its Early Access program by allowing them to keep a game in a forever-limbo state.

So I guess the lesson here is that all of these crowdfunding platforms should be treated with a similar attitude you might have when playing the lottery. At the least, try to vet the creator beforehand, as there are certainly viable companies that have run successful campaigns in the past. I encourage you to read user comments on a campaign’s page, research the company in question (have they put out successful products previously?), and be financially ready to lose the money you might put into a shiny new hypothetical.
