Cloud Computing
Getting Started with Big Data
Being in Sales, I have the opportunity to speak to many customers and prospects about many things. Most are interested in Cloud Computing and Big Data, but often they don’t fully understand how they will leverage the technology to maximize the benefits.
Here is a simple three-step process that I use:
1. For Big Data, I explain that there is no single correct definition. Because of this, I recommend that companies focus on what they need rather than what to call it. Results are more important than definitions for these purposes.
2. Relate the technology to something people are likely already familiar with (extending those concepts). For example: Cloud computing is similar to virtualization and has many of the same benefits; Big Data is similar to data warehousing. This helps make new concepts more tangible in any context.
3. Provide a high-level explanation of how the “new and old” differ and why the new is better, using specific examples they can relate to. For example: Cloud computing often occurs in an external data center (possibly one whose location you do not even know), so security can be even more complex than for in-house systems and applications. It is possible to have both Public and Private Clouds, and a public cloud from a major vendor may be more secure and easier to implement than a similar system built on your own hardware.
Big Data is a little bit like my first house. I was newly married, anticipated having children and also anticipated moving into a larger house in the future. My wife and I started buying things that fit into our vision of the future and storing them in our basement. We were planning for a future that was not 100% known.
But our vision changed over time, and we did not know exactly what we needed until the end. After 7 years, our basement was very full, and finding things was difficult. When we moved to a bigger house, we did have a lot of what we needed. But we also had many things that we no longer wanted or needed. And there were a few things we wished we had purchased earlier. We did our best, and most of what we did was beneficial, but those purchases were speculative, and in the end, there was some waste.
How many of you would have thought Social Media Sentiment Analysis would be important 5 years ago? How many would have thought that hashtag usage would become so pervasive in all forms of media? How many understood the importance of location information (and even the time stamp for that location)? My guess is fewer than 50% of all companies.
This ambiguity is both a good and bad thing about big data. In the old data warehouse days, you knew what was important because this was your data about your business, systems, and customers. While IT may have seemed tough in the past, it can be much more challenging now. But the payoff can also be much larger, so it is worth the effort. You often don’t know what you don’t know – and you just need to accept that.
Now we care about unstructured data (website information, blog posts, press releases, tweets, etc.), streaming data (stock ticker data is a common example), sensor data (temperature, altitude, humidity, location, lateral and horizontal forces), temporal data, etc. Data arrives from multiple sources and likely will have multiple time frame references (e.g., constant streaming versus updates with varying granularity), often in unknown or inconsistent formats. Someday soon, data from all sources will be automatically analyzed to identify patterns and correlations and gain other relevant insights.
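The integration problem described above can be sketched in code. This is a minimal, hypothetical example (field names like `created_at` and `epoch` are illustrative assumptions, not a standard) showing how records from two very different sources, an event-based tweet and an interval-based sensor reading, might be mapped onto a common schema with a shared timeline so they can be sorted and correlated:

```python
from datetime import datetime, timezone

# Hypothetical raw records from two very different sources: a tweet
# (unstructured text, event-based) and a temperature sensor reading
# (streaming, fixed interval). Field names are illustrative assumptions.
raw_tweet = {"text": "Loving the new release! #launch",
             "created_at": "2013-06-01T14:03:22Z"}
raw_sensor = {"temp_c": 21.4, "epoch": 1370095380}

def normalize(record, source):
    """Map a source-specific record onto a minimal common schema."""
    if source == "twitter":
        ts = datetime.strptime(
            record["created_at"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        payload = {"text": record["text"]}
    elif source == "sensor":
        ts = datetime.fromtimestamp(record["epoch"], tz=timezone.utc)
        payload = {"temp_c": record["temp_c"]}
    else:
        raise ValueError(f"unknown source: {source}")
    return {"source": source, "timestamp": ts.isoformat(), "payload": payload}

events = [normalize(raw_tweet, "twitter"), normalize(raw_sensor, "sensor")]
# Once records share a timeline, they can be ordered and correlated.
events.sort(key=lambda e: e["timestamp"])
```

The hard part in practice is not the mapping itself but discovering what the source formats are in the first place, which is exactly the "unknown or inconsistent formats" problem noted above.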
Robust and flexible data integration, data protection, and data privacy will all become far more important in the near future! This is just the beginning for Big Data.
Using Technology for the Greater Good
My company and my family funded a dozen or so medical research projects over several years. I had the pleasure of meeting and working with many brilliant MD/Ph.D. researchers. My goal was to fund $1 million in medical research and find a cure for Juvenile Arthritis. We didn’t reach that goal, but many good things came out of that research.
Something that amazed me was how research worked. Competition for funding is intense, so there was much less collaboration between institutions than I would have expected. At one point, we were funding similar projects at two institutions. The projects went in two very different directions, and it was clear that one would be much more successful than the other. It seemed almost wasteful, and I thought there must be a better, more efficient, and cost-effective way of managing research efforts.
So, in 2006 I had an idea. What if I could create a cloud-based (a very new concept at the time) research platform that would support global collaboration? It would need to support true analytical processing, statistical analysis, document management (something fairly new then), and desktop publishing. Publishing research findings is very important in this space, so my idea was to provide a workspace that supported end-to-end research efforts (inception to publication, including auditing and data collection) and fostered collaboration.
This platform would only work if there were a new way to allow interested parties to fund this research that was easy to use and could reach a large audience. Individuals could make contributions based on areas of interest, specific projects, specific individuals working on projects, or projects in a specific regional area. The idea was a lot like what Crowdtilt is today. This funding mechanism would support non-traditional collaboration and hopefully greatly impact the research community and their findings.
Additionally, this platform would support the collection of suggestions and ideas. Good ideas can come from anywhere – especially when you don’t know that something is not supposed to work.
During one funding review meeting at the Children’s Hospital of Philadelphia (CHOP), I made a naïve statement about using cortisone injections to treat TMJ arthritis. I was told why this would not work. A month or so later, I received a call explaining that my suggestion might work, with a request for another in-person meeting and additional funding. Conceptual Expansion at its best! That led to a new research project and positive results (see http://onlinelibrary.wiley.com/doi/10.1002/art.21384/pdf).
You never know where the next good idea might come from, so why not make it easy for people to share those ideas?
By the end of 2007, I had designed an architecture based on SOA (service-oriented architecture) using open-source products that would do most of what I needed. Then, in 2008 Google announced the “Project 10^100” competition. I entered, confident that I would at least get an honorable mention (alas, nothing came from this).
Then, in early 2010 I spent an hour discussing my idea with the CTO of a popular Cloud company. This CTO had a medical background, liked my idea, offered a few suggestions, and even offered to help. It was the perfect opportunity. But, I had just started a new position at work, so this project fell by the wayside. That was a shame, and I only have myself to blame. It is something that has bothered me for years.
It’s 2013, and far more tools are available today to make this platform a reality, and something like this still does not exist. I’m writing this because the idea has merit, and I think there might be others who feel the same way and would like to work on making this dream a reality. It’s a chance to leverage technology to potentially make a huge impact on society. And it can create opportunities for people in regions that might otherwise be ignored to contribute to this greater good.
Idealistic? Maybe. Possible? Absolutely!
My perspective on Big Data
Ever since I worked on redesigning a risk management system at an insurance company (1994-1995), I have been impressed by how much better decisions can be made with more data – assuming it is the right data. The question “What is the right data?” has intrigued me for years: what seems like common sense today could have been unknown 5-10 years ago and could be completely passé 5-10 years from now. Context becomes very important because of the variability and relevance of data over time.
This is what makes Big Data interesting. There really is no right or wrong answer or definition. Having a framework to define, categorize, and use that data is important. And at some point, being able to refer to the data in context will also be very important. Just think about how challenging it could be to compare scenarios or events from 5 years ago with those of today. It’s likely not an apples-to-apples comparison, but it could certainly be done. The concept of maximizing the value of data is pretty cool stuff.
The way I think of Big Data is similar to a water tributary system. Water enters the system in many ways – rain from the clouds, sprinkles from private and public supplies, runoff, overflow, etc. It also has many interesting dimensions, such as quality/purity (not necessarily the same due to different aspects of need), velocity, depth, capacity, and so forth. Not all water gets into the tributary system (e.g., some is absorbed into the groundwater tables, and some evaporates) – just as some data loss should be anticipated.
If you think of streams, ponds, rivers, lakes, reservoirs, deltas, etc., many relevant analogies can be made. And just like the course of a river may change over time, data in our “big data” water tributary system could also change over time.
Another part of my thinking is based on my experience working on a project for a Nanotech company about a decade ago (2002-2003 timeframe). In their labs, they were testing various products. Particles embedded in shingles and paint changed reflectivity based on temperature. There were very small batteries that could be recharged quickly, tens of thousands of times, were light, and had more capacity than a common 12-volt car battery.
And there was a section where they were doing “biometric testing” for the military. I have since read articles about things like smart fabrics that could monitor a soldier’s health and apply basic first aid and notify others once a problem is detected. This company felt that by 2020, advanced nanotechnology would be widely used by the military, and by 2025, it would be in wide commercial use. Is that still a possibility? Who knows…
Much of what you read today is about the exponential growth of data. I agree with that, but as stated earlier, and this is important, I believe that the nature and sources of that data will change significantly. For example, nanoparticles in engine oil will provide information about temperature, engine speed, load, and even rapid changes in motion (fast take-offs or stops, quick turns). Nanoparticles in the paint will provide information about weather conditions. Nanoparticles in the seat upholstery will provide information about occupants (number, size, weight). Sort of like the “sensor web” from the original Kevin Delin perspective. A lot of “Internet of Things” (IoT) data will be generated, but then what?
I believe that time will become an essential aspect of every piece of data and that location (X, Y, and Z coordinates) will be just as important. However, not every sensor collects location (spatial) data. I believe multiple data aggregators will be in everyday use at common points (your car, your house, your watch). Those aggregators will package the available data into something akin to an XML object, allowing flexibility. From my perspective, this is where things become very interesting relative to commercial use and data privacy.
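An aggregator of the kind imagined above might package readings like this. This is a hedged sketch, not any real standard: the element names (`aggregate`, `reading`) and the car-side readings are my own assumptions, chosen only to show how time is always attached while location is attached only when the aggregator knows it:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def package(readings, location=None):
    """Bundle sensor readings into a single XML 'aggregate' element.

    Every package gets a timestamp; location (x, y, z) is attached only
    when the aggregator knows it, since not every sensor is spatial.
    Element names here are illustrative, not a standard.
    """
    root = ET.Element("aggregate")
    ts = ET.SubElement(root, "timestamp")
    ts.text = datetime.now(timezone.utc).isoformat()
    if location is not None:
        loc = ET.SubElement(root, "location")
        for axis, value in zip(("x", "y", "z"), location):
            loc.set(axis, str(value))
    for name, value in readings.items():
        reading = ET.SubElement(root, "reading", name=name)
        reading.text = str(value)
    return root

# A hypothetical car-side aggregator: oil temperature plus seat weight.
record = package({"oil_temp_c": 96.3, "seat_weight_kg": 74},
                 location=(40.0, -75.1, 12.0))
xml_text = ET.tostring(record, encoding="unicode")
```

Keeping location optional matters: a downstream consumer can still use the non-spatial readings, and a spatial-aware consumer can filter on the presence of the `location` element.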
Currently, companies like Google make a lot of money by aggregating data from multiple sources, correlating it with various attributes, and then selling knowledge derived from that data. I believe there will be opportunities for individuals to use “data exchanges” to manage, sell, and directly benefit from their own data. The more interesting their data, the more value it has and the more benefit it provides to the person selling it. This could have a significant economic impact, fostering both the use and expansion of the commercial ecosystems needed to manage this technology’s commercial and privacy aspects, especially as it relates to machine learning.
The next logical step in this vision is “smart everything.” For example, you could buy a shirt that is just a shirt. But you could turn on medical monitoring or refractive heating/cooling for an extra cost. And, if you felt there was a market for extra dimensions of data that could benefit you financially, you could also enable those sensors. Just think of the potential impact that technology would have on commerce in this scenario.
I believe this will happen within the next decade or so. This won’t be the only use of big data; there will be many valid types and uses of data, some complementary and some completely distinct. It has the potential to become a confusing mess. But people will find ways to ingest, categorize, and correlate data to create value – today or in the future.
Knowing how to do something interesting and useful with data will become an increasingly important competitive advantage for people and companies. Who knows what will be viewed as valuable data 5-10 years from now, but it will likely be different from what we view as valuable today.
So, what are your thoughts? Can we predict the future based on the past? Or, is it simply enough to create platforms that are powerful enough, flexible enough, and extensible enough to change our understanding as our perspective of what is important changes? Either way, it will be fun!

