Ever since I worked on redesigning a risk management system at an insurance company (1994-1995), I have been impressed by how much better decisions can be when they are based on more data, assuming it is the right data. The question “What is the right data?” has intrigued me for years, because what seems like common sense today may have been unknown 5-10 years ago and may be completely passé 5-10 years from now. Context becomes very important because of how much data varies over time.
And this is what makes Big Data interesting. There really is no right or wrong answer or definition. Having a framework to define, categorize, and use that data is important, and at some point being able to refer to the data in context will be very important as well. Just think about how challenging it could be to compare scenarios or events from 5 years ago with those of today. It’s not apples-to-apples, but it could certainly be done. It is pretty cool stuff.
I think of Big Data as something like a water tributary system. Water gets into the system in many ways: rain from the clouds, sprinklers drawing on private and public supplies, runoff and overflow, etc. It also has many interesting dimensions, such as quality and purity (not necessarily the same thing, since different uses have different needs), velocity, depth, capacity, and so forth. Not all water makes it into the tributary system (e.g., some is absorbed into the groundwater tables, and some evaporates), so some data loss is to be expected as well. If you think in terms of streams, ponds, rivers, lakes, reservoirs, deltas, etc., there are many relevant analogies that can be made. And just as the course of a river may change over time, the data in our tributary system could also change over time.
Another part of my thinking is based on an experience I had about a decade ago (the 2002-2003 timeframe), working on a project for a nanotech company. In their labs they were testing a variety of things. There were particles, embedded in shingles and paint, that changed reflectivity based on temperature. There were very small, lightweight batteries that could be recharged tens of thousands of times and had more capacity than a 12-volt car battery. And there was a section where they were doing “biometric testing” for the military. I have since read articles about things like smart fabrics that could monitor a soldier’s health and even apply basic first aid when a problem was detected. This company felt that by 2020 advanced nanotechnology would be widely used by the military, and by 2025 it would be in wide commercial use. Is that still a possibility? Who knows…
Much of what you read today is about the exponential growth of data. I agree with that, but I also believe that the nature and sources of that data will change significantly. For example, nanoparticles in engine oil will provide information about temperature, engine speed and load, and even things like rapid changes in movement (fast take-offs or stops, quick turns). Nanoparticles in the paint will report weather conditions. Nanoparticles in the seat upholstery will provide information about occupants (number, size, weight). It is sort of like the “sensor web” in Kevin Delin’s original sense. A lot of data will be generated, but then what?
I believe that time will become an important aspect of every piece of data, and that location (X, Y, and Z coordinates) will be just as important. But not every sensor will collect location. I believe there will be multiple data aggregators in common use at common collection points (your car, your house, your watch). Those aggregators will package the available data in something akin to an XML object, which allows for flexibility. From my perspective, this is where things become very interesting.
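To make the aggregator idea a little more concrete, here is a minimal Python sketch of how an aggregator might package readings as XML, with a timestamp on every reading and X, Y, Z coordinates where available. The `Reading` class, the element and attribute names (`aggregate`, `reading`, `location`, `source`), and the rule that a reading without its own location inherits the aggregator's location are all hypothetical choices for illustration, not an existing standard or product.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple

# Hypothetical (X, Y, Z) coordinates; units and reference frame are left open.
Location = Tuple[float, float, float]


@dataclass
class Reading:
    """One hypothetical sensor reading. Every reading carries a timestamp,
    but location is optional because not every sensor collects it."""
    sensor_id: str
    kind: str
    value: float
    timestamp: datetime
    location: Optional[Location] = None


def package(aggregator_id: str, aggregator_location: Location,
            readings: Iterable[Reading]) -> str:
    """Serialize readings into a single XML document.

    Assumption for this sketch: a reading with no location of its own is
    tagged with the aggregator's location, and the 'source' attribute records
    which of the two the coordinates came from.
    """
    root = ET.Element("aggregate", id=aggregator_id)
    for r in readings:
        # XML attribute values must be strings, so numbers are converted.
        elem = ET.SubElement(root, "reading", sensor=r.sensor_id, kind=r.kind,
                             value=str(r.value), time=r.timestamp.isoformat())
        has_own = r.location is not None
        x, y, z = r.location if has_own else aggregator_location
        ET.SubElement(elem, "location", x=str(x), y=str(y), z=str(z),
                      source="sensor" if has_own else "aggregator")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    t = datetime(2013, 5, 1, 12, 0, tzinfo=timezone.utc)
    readings = [
        Reading("oil-1", "oil_temperature", 97.5, t),  # sensor reports no location
        Reading("seat-2", "occupant_weight", 70.2, t, (1.0, 0.5, 0.4)),
    ]
    print(package("car-42", (10.0, 20.0, 0.0), readings))
```

Keeping the `source` attribute lets whoever consumes the data later tell a measured location from an inferred one, which is one small way of preserving the context that makes old data comparable with new data.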
Currently, companies like Google make a lot of money by aggregating data from multiple sources, correlating it to a variety of attributes, and then selling knowledge derived from that plethora of data. I believe there will be opportunities for individuals to use “data exchanges” to manage, sell, and directly benefit from their own data. The more interesting their data, the more value it has and the more benefit it provides to the person selling it. This could have a huge economic impact, which in turn would foster both the use and the expansion of the commercial ecosystems needed to manage the commercial and privacy aspects of this technology.
The next logical step in this vision is “smart everything.” For example, you could buy a shirt that is just a shirt. But for an extra cost you could turn on medical monitoring or refractive heating and cooling. And if you felt there was a market for additional dimensions of data that could benefit you financially, you could enable those sensors as well. Just think of the impact that technology could have on commerce in this scenario.
This is what I personally believe will happen within the next decade or so. It won’t be the only type or use of Big Data. Rather, there will be many valid types and uses of data, some complementary and some completely distinct. It has the potential to become a confusing mess. But people will find ways to ingest and correlate that data, identify its value (today or in the future), and decide to store it, potentially forever. Utilizing that data will become a competitive advantage for the people and companies that know how to do something interesting with it. Who knows what will be viewed as valuable data 5-10 years from now, but it will likely be different from what we view as valuable data today.
So, what are your thoughts? Can we predict the future, or can we only create platforms that are powerful enough, flexible enough, and extensible enough to change as our perspective on what is important changes? Either way, it will be fun!