
Getting Started with Big Data



Being in sales, I have the opportunity to speak with many customers and prospects about a wide range of topics. Most are interested in Cloud Computing and Big Data, but they often don't fully understand how to leverage the technology to get the most benefit from it.

Here is a simple three-step process that I use:

1. For Big Data, I explain that there is no single correct definition. Because of this, I recommend that companies focus on what they need rather than on what to call it. For these purposes, results matter more than definitions.

2. Relate the technology to something people likely already know, and then extend those concepts. For example, Cloud computing is similar to virtualization and has many of the same benefits, while Big Data is similar to data warehousing. This helps make new concepts more tangible in any context.

3. Explain at a high level how the new and the old differ, and why the new is better, using specific examples the audience can relate to. For example, Cloud computing often runs in an external data center, possibly one whose location you don't even know, so security can be more complex than with in-house systems and applications. It is possible to have both Public and Private Clouds, and a public cloud from a major vendor may be more secure and easier to implement than a similar system built on your own hardware.

Big Data is a little bit like my first house. I was newly married, and I anticipated having children and eventually moving into a larger house. My wife and I started buying things that fit our vision of the future and storing them in our basement. We were planning for a future that was not fully known.

But our vision changed over time, and we did not know exactly what we needed until the very end. After seven years our basement was very full, and it was difficult to find things. When we moved to a bigger house, we did have a lot of what we needed. But we also had many things that we no longer wanted or needed, and there were a few things we wished we had purchased earlier. We did our best, and most of what we did was beneficial, but those purchases were speculative, and in the end there was some waste.

Five years ago, how many of you would have thought that social media sentiment analysis would be important? How many would have thought that hashtags would become so pervasive in all forms of media? How many understood the importance of location information (and even the timestamp for that location)? My guess is fewer than half of all companies.

This ambiguity is both the good and the bad thing about Big Data. In the old data warehouse days, you knew what was important because it was your own data about your business, systems, and customers. While IT may have seemed tough before, it can be much more challenging now. But the payoff can also be much larger, so it is worth the effort. Many times you don't know what you don't know, and you just need to accept that.

Now we care about unstructured data (website content, blog posts, press releases, tweets, etc.), streaming data (stock ticker data is a common example), sensor data (temperature, altitude, humidity, location, lateral and horizontal forces), temporal data, and more. Data arrives from multiple sources, often with different time frames (e.g., continuous streams versus periodic updates at varying granularity), and often in unknown or inconsistent formats.

Robust and flexible data integration, data protection, and data privacy will all become far more important in the near future. This is just the beginning for Big Data!