sparql

Big Data – The Genie is out of the Bottle!

Posted on

Back in early 2011 myself and 15 other members of the Executive team at Ingres were taking a bet on the future of our company. We knew that we needed to do something big and bold, and decided to build what we thought the standard data platform would be in 5-7 years. A small minority of the people on that team did not believe this was possible and left, while the rest of us focused on making that happen. There were three strategic acquisitions to fill-in the gaps on our Big Data platform. Today (as Actian) we have nearly achieved our goal. It was a leap of faith back then, but our vision turned out to be spot-on and our gamble is paying off today.

Every day my mailbox is filled with stories, seminars, white papers, etc. about Big Data. While it feels like this is becoming more mainstream, it is interesting to read and hear the various comments on the subject. They range from, “It’s not real” and “It’s irrelevant” to “It can be transformational for your business” to “Without big data there would be no <insert company name here>.”

Illustration of smoke coming out of a brass lantern

What I continue to find amazing is hearing comments about big data being optional. It’s not – that genie has already been let out of the bottle. There are incredible opportunities for those companies that understand and embrace the potential. I like to tell people that big data can be their unfair advantage in business. Is that really the case? Let’s explore that assertion and find out.

We live in the age of the “Internet of Things.” Data about nearly everything is everywhere, and the tools to correlate that data to gain understanding of so many things (activities, relationships, likes and dislikes, etc.)  With smart devices that enable mobile computing we have the extra dimension of location. And, with new technologies such as Graph Databases (based on SPARQL), graphic interfaces to analyze that data (such as Sigma), and identification technology such as Stylometry, it is getting easier than ever to identify and correlate that data.

We are generating increasingly larger and larger volumes of data about everything we do and everything going on around us, and tools are evolving to make sense of that data better and faster than ever. Those organizations that perform the best analysis, get the answers fastest, and act on that insight quickly are more likely to win than the organizations that look at a smaller slice of the world or adopt a “wait and see” posture. So, that seems like a significant advantage in my book. But, is it an unfair advantage?

First, let’s keep in mind that big data is really just another tool. Like most tools it has the potential for misuse and abuse. And, whether a particular application is viewed as “good” or “bad” is dependent on the goals and perspective of the entity using the tool (which may be the polar opposite view of the groups of people targeted by those people or organizations).  So, I will not attempt to make judgments about the various use cases, but rather present a few use cases and let you decide.

Scenario 1 – Sales Organization: What if you could not only understand what you were being told a prospect company needs, but also had a way to validate and refine that understanding? That’s half the battle in sales (budget, integration, and support / politics are other key hurdles). Data that helped you understand not only the actions of that organization (customers and industries, sales and purchases, gains and losses, etc.), but also the goals, interests and biases of the stakeholders and decision makers. This could provide a holistic view of the environment and allow you to provide a highly targeted offering, with messaging tailored to each individual. That is possible, and I’ll explain soon.

Scenario 2 – Hiring Organization: As a hiring manager there are many questions that cannot be asked. While I’m not an attorney, I would bet that State and Federal laws have not kept pace with technology. And, while those laws vary state by state, there are likely loopholes that allow for use of public records. Moreover, implied data that is not officially taken into consideration could color the judgment of a hiring manager or organization. For instance, if you wanted to “get a feeling” if a candidate might fit-in with the team or the culture of the organization, or have interests and views that are aligned with or contrary to your own, you could look for personal internet activity that would provide a more accurate picture of that person’s interests.

Scenario 3 – Teacher / Professor: There are already sites in use to search for plagiarism in written documents, but what if you had a way to make an accurate determination about whether an original work was created by your student? There are people who, for a price, will do the work and write a paper for a student. So, what if you could not only determine that the paper was not written by your student, but also determine who the likely author was?

Do some of these things seem impossible, or at least implausible? Personally, I don’t believe so. Let’s start with the typical data that our credit card companies, banks, search engines, and social network sites already have related to us. Add to that the identified information that is available for purchase from marketing companies and various government agencies. That alone can provide a pretty comprehensive view of us. But, there is so much more that’s available.

Think about the potential of gathering information from intelligent devices that are accessible through the Internet, or your alarm and video monitoring system, etc. These are intended to be private data sources, but one thing history has taught us is that anything accessible is subject to unauthorized access and use (just think about the numerous recent credit card hacking incidents).

Even de-identified data (medical / health / prescription / insurance claim data is one major example), which receives much less protection and can often be purchased, could be correlated with a reasonably high degree of confidence to gain understanding on other “private” aspects of your life. The key is to look for connections (websites, IP addresses, locations, businesses, people), things that are logically related (such as illnesses / treatments / prescriptions), and then make as accurate of an identification as possible (stylometry looks at things like sentence complexity, function words, co-location of words, misspellings and misuse of words, etc. and will likely someday take into consideration things like idea density). It is nearly impossible to remain anonymous in the Age of Big Data.

There has been a paradigm shift when it comes to the practical application of data analysis, and the companies that understand this and embrace it will likely perform better than those who don’t. There are new ethical considerations that arise from this technology, and likely new laws and regulations as well. But for now, the race is on!