The Unsung Hero of Big Data
Earlier this week I was reading a blog post regarding the recent Gartner Hype Cycle for Advanced Analytics and Data Science, 2015. The Gartner chart reminded me of the epigram “Plus ça change, plus c’est la même chose” – the more things change, the more they stay the same.
To some extent that is true, as you could consider today’s Big Data a derivative of yesterday’s VLDBs (very large databases) and Data Warehouses. One of the biggest changes, IMO, is the shift away from Star Schemas and from practices implemented for performance reasons, such as aggregating data sets, using derived and encoded values, and using surrogate and foreign keys to establish linkage. Going forward, it may not be possible to have that much rigidity and still be as responsive as needed from a competitive perspective.
There are many dimensions to big data: a huge sample of data (volume), which becomes your universal set and supports deep analysis as well as temporal and spatial analysis; a variety of data (structured and unstructured) that often does not lend itself to SQL-based analytics; and data streaming in (velocity) from multiple sources – an area that will become even more important in the era of the Internet of Things. These are the “Three V’s” that people have been talking about for the past five years.
Like many people, my interest in Object Database technology initially waned in the late 1990s. That is, until about four years ago, when a project at work led me back in this direction. As I dug into the various products I learned that they were alive and doing very well in several niche areas. That finding led to a better understanding of the real value of object databases.
Some products try to be “all Vs to all people,” but generally what works best is a complementary, integrated set of tools working together as services within a single platform. It makes a lot of sense. So, back to object databases.
One of the things I like most about my job is the business development aspect. One of the product families I’m responsible for is Versant, which includes the Versant Object Database (VOD – high performance, high throughput, high concurrency) and FastObjects (great for embedded applications). I’ve met and worked with some brilliant people who have created amazing products based on this technology. Creative people like these are fun to work with, and helping them grow their business is mutually beneficial. Everyone wins.
An area where VOD excels is the near real-time processing of streaming data. The reason it is so adept at this task is the way objects map out in the database: they do so in a way that essentially mirrors reality. So optionality is not a problem – no disjoint queries or missed data, no complex query gyrations to get the correct data set, etc. Things like sparse indexing are no problem for VOD. This means that pattern matching is quick and easy, as is more traditional rule and look-up validation. Polymorphism allows objects, functions, and even data to have more than one form.
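To make that less abstract, here is a minimal Python sketch of a polymorphic event model with optional attributes handling a mixed stream. To be clear, this is not the VOD API (VOD is typically used from Java or C++); every class and field name below is invented for illustration.

```python
# Illustrative sketch only - not the Versant/VOD API. Shows how a
# polymorphic object model with optional (sparse) attributes handles a
# heterogeneous event stream without schema gyrations.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    source: str
    timestamp: float


@dataclass
class LoginEvent(Event):
    user: str = ""
    success: bool = True


@dataclass
class SensorEvent(Event):
    temperature: Optional[float] = None            # optionality: absent values are fine
    location: Optional[Tuple[float, float]] = None


def suspicious(events: List[Event]) -> List[Event]:
    """Simple pattern matching across subtypes: each kind of event is
    evaluated by its own rule, but the stream is processed uniformly."""
    flagged = []
    for e in events:
        if isinstance(e, LoginEvent) and not e.success:
            flagged.append(e)
        elif isinstance(e, SensorEvent) and e.temperature is not None and e.temperature > 90:
            flagged.append(e)
    return flagged


stream = [
    LoginEvent("gateway", 1.0, user="alice", success=False),
    SensorEvent("engine", 2.0, temperature=95.5),
    SensorEvent("cabin", 3.0),                     # sparse record: no temperature
]
print(suspicious(stream))  # flags the failed login and the hot engine reading
```

The point is not the rules themselves but that subtypes and missing attributes coexist naturally, which is the property that makes an object model comfortable with streaming, loosely structured data.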
VOD does more by allowing data to be more, which is ideal for environments where change is the norm: Cyber Security, Fraud Detection, Threat Detection, Logistics, and Heuristic Load Optimization. In each case, the key to success is performance, accuracy, and adaptability.
The ubiquity of devices generating data today, combined with the desire of people and companies to leverage that data for commercial and non-commercial benefit, is very different from what we saw 10+ years ago. Products like VOD are working their way up that Slope of Enlightenment because there is a need to connect the dots better and faster – especially as the volume and variety of those dots increase. It is not a “one size fits all” solution, but it is often the perfect tool for this type of work.
These are indeed exciting times!
Ideas are sometimes Slippery and Hard to Grasp
I started this blog with the goal of it becoming an “idea exchange,” as well as a way to pass along lessons learned to help others. Typical guidance for a blog is to focus on one thing only and do it well in order to develop a following. That is especially important if you want to monetize the blog, but that is not and has not been my goal.
One of the things that has surprised me is how different the comments and likes are for each post. Feedback from the last post was even more diverse and surprising than usual. It ranged from comments about “Siri vs Google,” to feedback about Sci-Fi books and movies, to Artificial Intelligence.
I asked a few friends for feedback and received something very insightful (thanks, Jim). He found the blog interesting, but wasn’t sure what the objective was. He went on to identify several possible goals for the last post. Strangely enough, his comments mirrored the type of feedback that I received. That pointed out an area for improvement, and I appreciated that, as well as the wisdom of focusing on one thing. Who knows, maybe in the future…
This also reminded me of a white paper written 12-13 years ago by someone I used to work with. It was about how Bluetooth was going to be the “next big thing.” He had read an IEEE paper or something and saw potential for this new technology. His paper provided the example of your toaster and coffee maker communicating so that your breakfast would be ready when you walk into the kitchen in the morning.
At that time I had a couple of thoughts. Who cared about something with only a 20-30 foot range when WiFi was becoming popular and had a much greater range? In addition, a couple of years earlier I had toured the Microsoft “House of the Future,” in which everything was automated and key components communicated with each other. But everything in the house was hardwired or used WiFi – not Bluetooth. It was easy to dismiss his assertion because it seemed to lack pragmatism, and the value of the idea was difficult to quantify given the use case provided.
Looking back now, I view that white paper as having insight (if it were truly visionary he would have come out with the first Bluetooth speakers, or car interface, or even phone earpiece, and gotten rich), but it failed to present use cases that were easy enough to understand yet different enough from what was available at the time to demonstrate the real value of the idea. His expression of the idea was not tangible enough, and therefore too slippery to be easily grasped and valued.
I’m a huge believer that good ideas sometimes originate where you least expect them. Often those ideas are incremental in nature – seemingly simple, sometimes borderline obvious, and often building on some other idea or concept. An idea does not need to be unique in order to be important or valuable, but it does need to be presented in a way that makes its benefits, differentiation, and value easy to understand. That is just good communication.
One of the things I miss most from when my consulting company was active was the interaction between a couple of key people (Jason and Peter) and myself. Those guys were very good at taking an idea and helping build it out. This worked well because we had some overlapping expertise and experiences, as well as skills and perspectives that were more complementary in nature. That diversity added depth and breadth to our efforts to develop and extend ideas, because we asked the tough questions early and had to convince each other of the value.
Our discussions were creative and highly collaborative as well as a lot of fun. Each of us improved from them, and the outcome was usually something viable from a commercial perspective. As a growing and profitable small business you need to constantly innovate to differentiate yourself from your competition. Our discussions were driven as much by necessity as they were by intellectual curiosity, and I personally believe that this was part of the magic.
So, back to the last post. I view various technologies as building blocks. Some are foundational and others are complementary. To me, the key is not viewing those technologies as competing with each other. Instead, I look for potential value created by integrating them. That may not always be possible, and it does not always lead to something better, but occasionally it does, so to me it is a worthwhile exercise. With regard to voice technology, I do believe that we will see more, better, and smarter applications of it – especially as real-time systems become more complex due to the use of an increasing number of specialized component systems and sensors.
While today’s smartphones would not pass the Turing Test or proposed alternatives, they are an improvement over more simplistic voice translation tools available just a few years ago. Advancement requires the tools to understand context in order to make inferences. This brings you closer to machine learning, and big data (when done right) significantly increases that potential.
Ultimately, this all leads back to Artificial Intelligence (at least in my mind). It’s a big leap from a simple voice translation tool to AI, but when viewed as building blocks it is not such a stretch.
Now think about creating an interface (API) that allows one smart device to communicate with another in a manner akin to the collaborative efforts described above with my old team. It’s not simply having a front-end device exchanging keywords or queries with a back-end device. Instead, it is two or more devices and/or systems having a “discussion” about what is being requested, looking at what each component “knows,” asking clarifying questions and making suggestions, and then finally taking that multi-dimensional understanding of the problem to determine what is really needed.
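As a thought experiment, here is a toy sketch of what that kind of exchange could look like. Everything in it is invented for illustration – the message kinds, device names, and flow are assumptions, not a real protocol.

```python
# Hypothetical sketch of a device-to-device "discussion": the back-end
# system asks a clarifying question before making a suggestion, rather
# than just answering a one-shot query. All names are invented.
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    kind: str        # "request", "clarify", "answer", or "suggestion"
    body: dict


class Assistant:
    """Front-end device: holds user context and answers clarifying questions."""
    def __init__(self):
        self.context = {"user_location": "home", "time": "07:30"}

    def handle(self, msg: Message) -> Message:
        if msg.kind == "clarify":
            key = msg.body["need"]
            return Message("assistant", "answer", {key: self.context.get(key)})
        return msg


class Planner:
    """Back-end system: requests missing context before suggesting a plan."""
    def respond(self, request: Message, peer: Assistant) -> Message:
        if "user_location" not in request.body:          # ask, don't guess
            reply = peer.handle(Message("planner", "clarify",
                                        {"need": "user_location"}))
            request.body.update(reply.body)
        plan = f"commute route optimized from {request.body['user_location']}"
        return Message("planner", "suggestion", {"plan": plan})


assistant, planner = Assistant(), Planner()
result = planner.respond(
    Message("assistant", "request", {"task": "plan my commute"}), assistant)
print(result.body)   # the planner asked a clarifying question before answering
```

The interesting design choice is the “clarify” round trip: the components build a shared, multi-dimensional view of the request before anyone commits to an answer.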
So, possibly not true AI, but a giant leap forward from what we have today. That would help turn the science fiction of the past into science fact in the near future. The better the understanding and inferences by the smart system, the better the results.
I also believe that an unintended consequence of these new smart systems is that, as they become more human-like in their approach, they will become more likely to make human-like errors. Hopefully they will be able to back-test recommendations to validate them and minimize errors. If they are intelligent enough to monitor results and suggest corrective actions when a recommendation is not producing the desired results, they will be even “smarter.” Best of all, there won’t be an ego creating a distortion filter on the results. Or maybe there will…
A lot of the building blocks required to create these new systems are available today. But, it takes both vision and insight to see that potential, translate ideas from slippery and abstract to tangible and purposeful, and then start building something really cool. As that happens we will see a paradigm shift in how we interact with computers and how they interact with us. That will lead us to the systematic integration that I wrote about in a big data / nanotechnology post.
So, what is the real objective of my blog? To get people thinking about things in a different way, to foster collaboration and partnerships between businesses and educational institutions in order to push the limits of technology, and to foster discussion about what others believe the future of computing and smart devices will look like. I’m confident that I will see these types of systems in my lifetime, and believe in the possibility of a lot of this occurring within the next decade.
What are your thoughts?
Getting Started with Big Data
Being in sales, I have the opportunity to speak with a lot of customers and prospects about many things. Most are interested in Cloud Computing and Big Data, but often they don’t fully understand how they will leverage the technology to maximize the benefit.
Here is a simple three-step process that I use:
1. For Big Data I explain that there is no single correct definition. Because of this I recommend that companies focus on what they need rather than on what to call it. Results are more important than definitions for these purposes.
2. Relate the technology to something people are likely already familiar with, extending those concepts. For example: Cloud computing is similar to virtualization and has many of the same benefits; Big Data is similar to data warehousing. This helps make new concepts more tangible.
3. Provide a high-level explanation of how “new and old” differ, and why new is better, using specific examples they can relate to. For example: Cloud computing often occurs in an external data center – possibly one whose location you don’t even know – so security can be even more complex than with in-house systems and applications. It is possible to have both Public and Private Clouds, and a public cloud from a major vendor may be more secure and easier to implement than a similar system using your own hardware.
Big Data, by contrast, is a little bit like my first house. I was newly married, anticipated having children, and also anticipated moving into a larger house in the future. My wife and I started buying things that fit our vision of the future and storing them in our basement. We were planning for a future that was not 100% known.
But, our vision changed over time and we did not know exactly what we needed until the very end. After 7 years our basement was very full and it was difficult to find things. When we moved to a bigger house we did have a lot of what we needed. But, we also had many things that we no longer wanted or needed. And, there were a few things we wished that we had purchased earlier. We did our best, and most of what we did was beneficial, but those purchases were speculative and in the end there was some waste.
How many of you would have thought that Social Media Sentiment Analysis would be important 5 years ago? How many would have thought that hashtag usage would have become so pervasive in all forms of media? How many understood the importance of location information (and even the time stamp for that location)? My guess is that it would be less than 50% of all companies.
This ambiguity is both the good and bad thing about big data. In the old data warehouse days you knew what was important because this was your data about your business, systems, and customers. While IT may have seemed tough before, it can be much more challenging now. But, the payoff can also be much larger so it is worth the effort. Many times you don’t know what you don’t know – and you just need to accept that.
Now we care about unstructured data (website information, blog posts, press releases, tweets, etc.), streaming data (stock ticker data is a common example), sensor data (temperature, altitude, humidity, location, lateral and horizontal forces), temporal data, etc. Data arrives from multiple sources and likely will have multiple time frame references (e.g., constant streaming versus updates with varying granularity), often in unknown or inconsistent formats.
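As a rough illustration of that integration problem, here is a minimal Python sketch of normalizing records from multiple sources, each with a different (or missing) time reference, onto one schema. All field names are assumptions for the example.

```python
# Minimal sketch (assumed field names): map heterogeneous inputs from
# multiple sources onto a single schema with a UTC timestamp.
from datetime import datetime, timezone


def normalize(record: dict) -> dict:
    if "epoch_ms" in record:                  # constant streaming, ms ticks
        ts = datetime.fromtimestamp(record["epoch_ms"] / 1000, tz=timezone.utc)
    elif "date" in record:                    # coarse-grained batch updates
        ts = datetime.strptime(record["date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    else:                                     # unknown time frame: keep, but flag
        ts = None
    return {
        "source": record.get("source", "unknown"),
        "timestamp": ts,
        "payload": {k: v for k, v in record.items()
                    if k not in ("source", "epoch_ms", "date")},
    }


feeds = [
    {"source": "ticker", "epoch_ms": 1388534400000, "symbol": "ABC", "price": 42.10},
    {"source": "sensor", "date": "2014-01-01", "temperature": 21.5},
    {"source": "tweet", "text": "#bigdata is everywhere"},   # no time reference
]
for r in feeds:
    print(normalize(r))
```

Even this toy version surfaces the real issues: differing granularity, records with no usable time reference at all, and the need to keep the raw payload around because you don’t yet know which fields will matter.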
Robust and flexible data integration, data protection, and data privacy will all become far more important in the near future! This is just the beginning for Big Data.
My perspective on Big Data
Ever since I worked on redesigning a risk management system at an insurance company (1994-1995), I have been impressed by how much better decisions can be made with more data – assuming it is the right data. The concept of “What is the right data?” has intrigued me for years, as what seems like common sense today could have been unknown 5-10 years ago and could be completely passé 5-10 years from now. Context becomes very important because of the variability and relevance of data over time.
This is what makes Big Data interesting. There really is no right or wrong answer or definition. Having a framework to define, categorize, and use that data is important. And at some point being able to refer to the data in-context will be very important as well. Just think about how challenging it could be to compare scenarios or events from 5 years ago with those of today. It’s likely not an apples-to-apples comparison but could certainly be done. The concept of maximizing the value of data is pretty cool stuff.
The way I think of Big Data is similar to a water tributary system. Water enters the system in many ways – rain from the clouds, sprinklers fed by private and public supplies, runoff, overflow, etc. It also has many interesting dimensions, such as quality/purity (not necessarily the same, due to different aspects of need), velocity, depth, and capacity. Not all water gets into the tributary system (e.g., some is absorbed into the groundwater tables, and some evaporates) – just as some data loss should be anticipated.
If you think in terms of streams, ponds, rivers, lakes, reservoirs, deltas, etc. there are many relevant analogies that can be made. And just like the course of a river may change over time, data in our “big data” water tributary system could also change over time.
Another part of my thinking is based on an experience I had about a decade ago (2002-2003 timeframe) working on a project for a nanotech company. In their labs they were testing various things. There were particles, embedded in shingles and paint, that changed reflectivity based on temperature. There were very small batteries that could be recharged tens of thousands of times, were light, and had more capacity than a common 12-volt car battery.
And, there was a section where they were doing “biometric testing” for the military. I have since read articles about things like smart fabrics that could monitor the health of a soldier and do things like apply basic first aid and notify others once a problem was detected. This company felt that by 2020 advanced nanotechnology would be widely used by the military, and by 2025 it would be in wide commercial use. Is that still a possibility? Who knows…
Much of what you read today is about the exponential growth of data. I agree with that, but as stated earlier, and this is important, I believe that the nature and sources of that data will change significantly. For example, nano-particles in engine oil will provide information about temperature, engine speed and load, and even things like rapid changes in movement (fast take-offs or stops, quick turns). The nanoparticles in the paint will provide weather conditions. The nanoparticles in the seat upholstery will provide information about occupants (number, size, weight). Sort of like the “sensor web,” from the original Kevin Delin perspective. A lot of “Internet of Things” data will be generated, but then what?
I believe that time will become an important aspect of every piece of data, and that location (X, Y, and Z coordinates) will be just as important. But, not every sensor will collect location (spatial data). I do believe there will be multiple data aggregators in common use at common points (your car, your house, your watch). Those aggregators will package the available data in something akin to an XML object, which allows flexibility. From my perspective, this is where things become very interesting relative to commercial use and data privacy.
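Here is a small, hypothetical sketch of that aggregator idea: package whatever readings are available into an XML document, always carrying a time stamp and including location only when a sensor supplies it. The element and attribute names are invented for illustration.

```python
# Hypothetical "data aggregator" sketch: bundle available sensor readings
# into an XML object. Time is always attached; location only when present.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def package(aggregator_id: str, readings: list) -> ET.Element:
    root = ET.Element("observation", aggregator=aggregator_id,
                      time=datetime.now(timezone.utc).isoformat())
    for r in readings:
        sensor = ET.SubElement(root, "sensor", kind=r["kind"])
        ET.SubElement(sensor, "value").text = str(r["value"])
        if "xyz" in r:                        # not every sensor is spatial
            x, y, z = r["xyz"]
            ET.SubElement(sensor, "location", x=str(x), y=str(y), z=str(z))
    return root


car_readings = [
    {"kind": "oil_temp", "value": 104.2, "xyz": (47.61, -122.33, 56.0)},
    {"kind": "seat_weight", "value": 81.5},   # seat sensor reports no location
]
print(ET.tostring(package("car-01", car_readings), encoding="unicode"))
```

The flexibility comes from the self-describing format: a downstream consumer can use the elements it understands and ignore the rest.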
Currently, companies like Google make a lot of money from aggregating data from multiple sources, correlating it to a variety of attributes, and then selling knowledge derived from that plethora of data. I believe that there will be opportunities for individuals to use “data exchanges” to manage, sell, and directly benefit from their own data. The more interesting their data, the more value it has and the more benefit it provides to the person selling it. This could have a huge economic impact, and that would foster both the use and expansion of various commercial ecosystems required to manage the commercial and privacy aspects of this technology.
The next logical step in this vision is “smart everything.” For example, you could buy a shirt that is just a shirt. But, for an extra cost, you could turn on medical monitoring or refractive heating/cooling. And, if you felt there was a market for extra dimensions of data that could benefit you financially, then you could enable those sensors as well. Just think of the potential impact that technology would make on commerce in this scenario.
This is what I personally believe will happen within the next decade or so. This won’t be the only type or use of big data. Rather, there will be many valid types and uses of data – some complementary and some completely discrete. It has the potential to become a confusing mess. But, people will find ways to ingest, categorize, and correlate data to create value with it – today or in the future.
Utilizing data will become an increasing competitive advantage for the people and companies that know how to do something interesting and useful with it. Who knows what will be viewed as valuable data 5-10 years from now, but it will likely be different from what we view as valuable today.
So, what are your thoughts? Can we predict the future based on the past? Or, is it simply enough to create platforms that are powerful enough, flexible enough, and extensible enough to change our understanding as our perspective of what is important changes? Either way it will be fun!