Being in Sales, I have the opportunity to speak to a lot of customers and prospects about many things. Most are interested in both Cloud Computing and Big Data, but often they don’t fully understand how they will leverage the technology to maximize the benefit. There is a simple three-step process that I use:
1. Explain that there is no single correct answer. There are still many definitions, so it is more important to focus on what you need than on what you call it.
2. Relate the technology to something people are likely already familiar with (extending those concepts). For example: Cloud computing is similar to virtualization, and has many of the same benefits; Big Data is similar to data warehousing.
3. Provide a high-level explanation of how “new and old” are different. For example: Cloud computing often occurs in an external data center, possibly one whose location you do not even know, so security is even more complex and important than with in-house systems and applications; Big Data often uses data that does not come from your environment, possibly even data whose value you cannot yet judge, so robust integration tools are very important.
Big Data is a little bit like my first house. I was newly married, anticipated having children, and anticipated moving into a larger house in the future. My wife and I started buying things that fit into our vision of the future and storing them in our basement. We were planning for a future that was not 100% known.
But, our vision changed over time and we did not know exactly what we needed until the very end. After 7 years our basement was very full and it was difficult to find things. When we moved to a bigger house we did have a lot of what we needed. We also had things in storage that we no longer wanted or needed. And, there were a few things we wished that we had purchased earlier. We did our best, and most of what we did was beneficial.
How many of you would have thought 5 years ago that Social Media Sentiment Analysis would be important? How many would have thought that hashtag usage would become so pervasive in all forms of media? How many understood the importance of location information (and even the time stamp for that location)? My guess is that it would not be many.
This ambiguity is both the good and bad thing about big data. In the old data warehouse days you knew what was important because this was your data about your business, systems, and customers. While IT may have seemed tough before, it can be much more challenging now. But, the payoff can also be much larger so it is worth the effort.
Now we care about unstructured data (website information, blog posts, press releases, tweets, etc.), streaming data (stock ticker data is a common example), sensor data (temperature, altitude, humidity, location, lateral and horizontal forces – think logistics), etc. So, you are getting data from multiple sources having multiple time frame references (e.g., constant streaming versus hourly updates), often in an unknown or inconsistent format. Many times you don’t know what you don’t know – and you just need to accept that.
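To make the integration problem concrete, here is a minimal Python sketch of one way to bring records from different sources into a common shape. Everything in it is an assumption made for illustration: the source names, the field names (`ts`, `time`, `timestamp`), the sample values, and the choice to treat a time stamp without a zone as UTC. It is not a recommendation of any particular tool, just a picture of the idea.

```python
from datetime import datetime, timezone

def to_utc(value):
    """Accept epoch seconds or an ISO 8601 string; return an aware UTC datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption for this sketch: a time stamp with no zone is already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)

def normalize(source, record, time_keys=("timestamp", "ts", "time")):
    """Wrap a raw record in a common envelope without discarding unknown fields."""
    # Use the first time field that is present; the names are hypothetical.
    used = next((key for key in time_keys if key in record), None)
    observed_at = to_utc(record[used]) if used is not None else None
    # Keep every other field as-is, since we may not know yet what has value.
    payload = {k: v for k, v in record.items() if k != used}
    return {"source": source, "observed_at": observed_at, "payload": payload}

# Made-up sample records: a sensor reading, a stock tick, and an undated tweet.
raw = [
    ("sensor", {"ts": 1357000000, "temp_c": 4.5, "lat": 41.9, "lon": -87.6}),
    ("ticker", {"time": "2013-01-01T00:30:00+00:00", "symbol": "XYZ", "price": 10.25}),
    ("tweet", {"text": "Loving the new release #bigdata"}),
]

events = [normalize(source, record) for source, record in raw]
dated = sorted(
    (e for e in events if e["observed_at"] is not None),
    key=lambda e: e["observed_at"],
)
undated = [e for e in events if e["observed_at"] is None]
```

The design choice worth noticing is that nothing is thrown away. Fields the code does not understand stay in `payload`, and records without a time stamp are set aside rather than rejected, which is the basement approach applied to data.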
In a future post I will discuss scenarios that take advantage of Big Data, and why allowing some ambiguity and uncertainty in your model could be one of the best things that you have ever done. But for now take a look at the links below for more basic information:
This article discusses why Big Data matters, and how you can get value without needing complex analytics.
Big Data article that discusses the importance of taking action quickly to gain a competitive advantage. Note: Free registration to the site may be required to view this article.
This article (Big Data is the Tower of Babel) discusses the importance of data integration.
This short article discusses three important considerations for a Big Data project. While correct, the first point is really the key when getting started.
This is a good high-level article on Hadoop 2.0. Remember how I described the basement in my first house? That’s how Hadoop is utilized in many cases.
For several years my company and my family funded a dozen or so medical research projects (see www.comp-soln.com/fund.html for highlights). I had the pleasure of meeting and working with many brilliant MD/Ph.D. researchers. My goal was to fund $1 million of medical research and find a cure for Arthritis. We didn’t reach that goal, but many good things came out of that research.
Something that amazed me was how research worked. Competition for funding is intense, so there was much less collaboration between institutions than I would have expected. At one point we were funding similar projects at two institutions. The projects went in two very different directions, and it was clear to me that one was going to be much more successful than the other. It seemed almost wasteful, and I thought that there must be a better, more efficient and cost-effective way of managing research efforts.
So, in 2006 I had an idea. What if I could create a cloud-based (a very new term at the time) research platform that would support global collaboration? It would need to support true analytical processing, statistical analysis, document management (something else that was fairly new at the time), and desktop publishing at a minimum. Publishing research findings is very important in this space, so my idea was to provide a workspace that supported end-to-end research efforts (inception to publication) and fostered collaboration.
This platform would only really work if there were a new way to allow interested parties to fund this research that was easy to use and could reach a large audience. People could make contributions based on area of interest, specific projects, specific individuals working on projects, or projects in a specific regional area. The idea was a lot like what Crowdtilt (www.crowdtilt.com) is today. This funding mechanism would support non-traditional collaboration, and would hopefully have a huge impact on the research community and their findings.
Additionally, this platform would support the collection of suggestions and ideas. Good ideas can come from anywhere – especially when you don’t know that something is not supposed to work.
During one funding review meeting I made a naïve statement about using cortisone injections to treat TMJ arthritis. I was told why this would not work. But, a month or so later I received a call explaining how this might actually work. That led to a research project and positive results (see http://onlinelibrary.wiley.com/doi/10.1002/art.21384/pdf). You never know where the next good idea might come from, so why not make it easy for people to share those ideas?
By the end of 2007 I had a service-oriented architecture (SOA), built on open source products, that would do most of what I needed. Then, in 2008 Google announced the “Project 10^100” competition. I entered, confident that I would at least get an honorable mention (alas, nothing came of this). Then, in early 2010 I spent an hour discussing my idea with the CTO of a popular Cloud company. This CTO had a medical background, liked my idea, offered a few suggestions, and even offered to help. It was the perfect opportunity. But, I had just started a new position at work and this fell by the wayside. That was a shame, and I only have myself to blame. It is something that has bothered me for years.
It’s 2013, there are far more tools available today to make this platform a reality, and it still does not exist. I’m writing this because the idea has merit, and I think that there might be others who feel the same way and would like to work on making this dream a reality. It’s a chance to leverage technology to potentially make a huge impact on society. And, it can create opportunities for people in regions that might otherwise be ignored to contribute to this greater good.
Idealistic? Maybe. Possible? Absolutely!