BigData

The Unsung Hero of Big Data


Earlier this week I was reading a blog post about the recent Gartner Hype Cycle for Advanced Analytics and Data Science, 2015. The Gartner chart reminded me of the epigram "Plus ça change, plus c'est la même chose" ("the more things change, the more they stay the same," meaning that history repeats itself). To some extent that is true, as you could consider today's big data a derivative of yesterday's VLDBs (very large databases) and data warehouses. One of the biggest changes, in my opinion, is the shift away from star schemas and from practices implemented for performance reasons, such as aggregating data sets, using derived and encoded values, and using surrogate and foreign keys to establish linkage.

There are many dimensions to big data: a huge sample of data (volume), which becomes your universal set and supports deep analysis as well as temporal and spatial analysis; a variety of data, structured and unstructured, that often does not lend itself to SQL-based analytics (variety); and data streaming in from multiple sources (velocity), an area that will become even more important in the era of the Internet of Things. These are the "Three V's" that people have been talking about for the past five years.

Like many people, I lost interest in object database technology in the late 1990s. That is, until about four years ago, when a project at work led me back in this direction. As I dug into the various products, I learned that they were alive and doing very well in several niche areas. That finding led me to a better understanding of the real value of object databases.

Some products try to be "all Vs to all people," but what generally works best is a complementary, integrated set of tools working together as services within a single platform. It makes a lot of sense. So, back to object databases.

One of the things I like most about my job is the business development aspect. One of the product families I'm responsible for is Versant, which includes the Versant Object Database (VOD: high performance, high throughput, high concurrency) and Fast Objects (great for embedded applications). I've met and worked with some brilliant people who have created amazing products based on this technology. Creative people like these are fun to work with, and helping them grow their business is mutually beneficial. Everyone wins.

An area where VOD excels is the near real-time processing of streaming data. It is so adept at this task because of the way objects map into the database: they essentially mirror reality. As a result, optionality is not a problem. There are no disjoint queries or missed data, and no complex query gyrations to get the correct data set. Things like sparse indexing are no problem with VOD either. This means that pattern matching is quick and easy, as is more traditional rule and look-up validation. Polymorphism allows objects, functions, and even data to have more than one form.

VOD does more by allowing data to be more, which is ideal for environments where change is the norm, such as cyber security, fraud detection, threat detection, logistics, and heuristic load optimization. In each case, the keys to success are performance, accuracy, and adaptability.
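To make the idea of objects that mirror reality a little more concrete, here is a minimal sketch in plain Python. It is not VOD's API; the classes (`Event`, `LoginEvent`, `PaymentEvent`), the thresholds, and the `flag` helper are all hypothetical. What it illustrates is that each kind of event carries only the attributes it actually has (missing values simply stay `None`), and polymorphism lets one pass over a mixed stream apply each type's own rule without writing a separate query per type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical base class for anything arriving on the stream."""
    source: str
    timestamp: float

    def is_suspicious(self) -> bool:
        # Default rule: an event type with no rule of its own is not flagged.
        return False


@dataclass
class LoginEvent(Event):
    user: str = ""
    failed_attempts: int = 0
    geo: Optional[str] = None  # optional: not every feed reports location

    def is_suspicious(self) -> bool:
        return self.failed_attempts >= 3


@dataclass
class PaymentEvent(Event):
    amount: float = 0.0
    card_country: Optional[str] = None
    ip_country: Optional[str] = None  # may be absent; no null-padded column needed

    def is_suspicious(self) -> bool:
        # Compare countries only when both optional values are present.
        mismatch = (
            self.card_country is not None
            and self.ip_country is not None
            and self.card_country != self.ip_country
        )
        return self.amount > 10_000 or mismatch


def flag(stream: List[Event]) -> List[Event]:
    """Apply each object's own rule (polymorphism) in a single pass."""
    return [event for event in stream if event.is_suspicious()]


events = [
    LoginEvent("vpn", 1.0, user="ann", failed_attempts=5),
    PaymentEvent("web", 2.0, amount=50.0, card_country="US", ip_country="US"),
    PaymentEvent("web", 3.0, amount=20.0, card_country="US"),  # ip_country absent
    PaymentEvent("app", 4.0, amount=30.0, card_country="US", ip_country="RU"),
]
print([type(e).__name__ for e in flag(events)])  # ['LoginEvent', 'PaymentEvent']
```

Adding a new kind of event later means adding one subclass with its own rule; `flag` and the existing classes do not change, which is the kind of adaptability the paragraph above describes.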

So, while some things stay the same, others really do change. The ubiquity of devices generating data today, combined with the desire of people and companies to leverage that data for commercial and non-commercial benefit, really is very different from what we saw 10 or more years ago. Products like VOD are working their way up the Slope of Enlightenment because there is a need to connect the dots better and faster, especially as the volume and variety of those dots increase. It is not a "one size fits all" solution, but it is the perfect tool for this type of work.

These are exciting times.

Ideas are sometimes Slippery and Hard to Grasp


I started this blog with the goal of making it an "idea exchange," as well as a way to pass along lessons learned to help others. One of the things that has surprised me is how different the comments and likes are for each post. Feedback on the last post was even more diverse and surprising than usual. It ranged from comments about "Siri vs Google," to feedback about Sci-Fi books and movies, to Artificial Intelligence.

I asked a few friends for feedback and received something very insightful (Thanks Jim). He stated that he found the blog interesting, but wasn’t sure what the objective was. He went on to identify several possible goals for the last post. Strangely enough, his comments mirrored the type of feedback that I received. That pointed out an area for improvement to me, and I appreciated that.

This also got me thinking about a white paper written 12-13 years ago by someone I used to work with. It was about how Bluetooth was going to be the “next big thing.” He had read an IEEE paper or something and saw potential for this new technology. His paper provided the example of your toaster and coffee maker communicating so that your breakfast would be ready when you walk into the kitchen in the morning.

At that time I had a couple of thoughts. First, who cared about something that only had a 20-30 foot range when WiFi was becoming popular and had much greater range? Second, a couple of years earlier I had toured the Microsoft "House of the Future," in which everything was automated and key components communicated with each other. But everything in the house was either hardwired or used WiFi, not Bluetooth. So, it was easy to dismiss his assertion because it seemed too abstract.

Looking back now, I view that white paper as having insight (if it had been visionary, he would have come out with the first Bluetooth speakers, car interface, or even phone earpiece and gotten rich), but it failed to present use cases that were easy enough to understand yet different enough from what was available at the time to demonstrate the real value of the idea. His expression of the idea was not tangible enough, and therefore too slippery to be easily grasped.

I’m a huge believer that good ideas sometimes originate where you least expect them. Often those ideas are incremental in nature – seemingly simple and sometimes borderline obvious, but building on some other idea or concept. An idea does not need to be unique in order to be important or valuable, but it does need to be presented in a way that is easy to understand and tangible. That is just good communication.

One of the things I miss most from when my consulting company was active is the interaction between a couple of key people (Jason and Peter) and me. Those guys were very good at taking an idea and helping build it out. This worked well because we had some common expertise and experiences, but we also had skills and perspectives that were more complementary in nature. That diversity increased the depth and breadth of our efforts to develop and extend those ideas.

Our discussions were creative, highly collaborative, and a lot of fun. Each of us improved from them, and the outcome was usually something viable from a commercial perspective. As a growing and profitable small business, you need to constantly innovate to differentiate yourself from your competition. Our discussions were driven as much by necessity as by intellectual curiosity.

So, back to the last post. I view various technologies as building blocks. Some are foundational and others are complementary. To me, the key is not viewing those various technologies as competing with each other. Instead, I look for the value in integrating them with each other. It is not always possible and does not always lead to something better, but occasionally it does. With regard to voice technology, I do believe that we will see more, better and smarter applications of it.

While today’s smart phones would not pass the Turing Test or proposed alternatives, they are an improvement over more simplistic voice translation tools available just a few years ago. Advancement requires the tool to understand context in order to make inferences. This brings you closer to machine learning, and big data (when done right) increases that potential. Ultimately, this all leads to Artificial Intelligence (at least in my mind). It’s a big leap from a simple voice translation tool to AI, but when viewed as building blocks it is not such a stretch.

Now think about creating an interface (API) that allows one smart device to communicate with another in something akin to the collaborative efforts described above with my old team. It’s not simply having a front-end device exchanging keywords or queries with a back-end device. Instead, it is two or more devices and/or systems having a “discussion” about what is being requested, looking at what each component “knows,” asking clarifying questions and making suggestions, and then finally taking that multi-dimensional understanding to determine what is really needed.
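Here is a rough sketch of what such a "discussion" might look like, assuming a very simple message format. Everything in it (`KitchenAgent`, `PhoneAgent`, the `Reply` fields, the coffee-and-toast knowledge) is hypothetical and a long way from AI; it only shows the back-and-forth, where one device answers, suggests alternatives, or asks a clarifying question, and the other device answers from what it "knows" until the request is resolved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Reply:
    kind: str                   # "answer", "clarify", or "suggest"
    text: str
    slot: Optional[str] = None  # which request field the options would fill
    options: List[str] = field(default_factory=list)


class KitchenAgent:
    """Hypothetical back-end device: knows what it can make."""

    def __init__(self) -> None:
        self.knowledge = {"coffee": ["espresso", "drip"], "toast": ["light", "dark"]}

    def handle(self, request: Dict[str, str]) -> Reply:
        item = request.get("item")
        if item not in self.knowledge:
            return Reply("suggest", f"I can't make {item}", "item", sorted(self.knowledge))
        style = request.get("style")
        if style is None:
            return Reply("clarify", f"Which kind of {item}?", "style", self.knowledge[item])
        if style not in self.knowledge[item]:
            return Reply("suggest", f"No {style} {item}", "style", self.knowledge[item])
        return Reply("answer", f"Making {style} {item}")


class PhoneAgent:
    """Hypothetical front-end device: knows the user's preferences."""

    def __init__(self, preferences: Dict[str, str]) -> None:
        self.preferences = preferences

    def choose(self, request: Dict[str, str], reply: Reply) -> str:
        # Answer from what this device "knows"; fall back to the first option offered.
        key = request.get("item", "") if reply.slot == "style" else "fallback_item"
        preferred = self.preferences.get(key)
        return preferred if preferred in reply.options else reply.options[0]

    def negotiate(self, other: KitchenAgent, request: Dict[str, str],
                  max_turns: int = 4) -> List[Reply]:
        transcript: List[Reply] = []
        for _ in range(max_turns):
            reply = other.handle(request)
            transcript.append(reply)
            if reply.kind == "answer":
                break
            request = {**request, reply.slot: self.choose(request, reply)}
            if reply.slot == "item":
                request.pop("style", None)  # a new item invalidates the old style
        return transcript


phone = PhoneAgent({"coffee": "drip", "fallback_item": "coffee"})
for reply in phone.negotiate(KitchenAgent(), {"item": "tea"}):
    print(reply.kind, "-", reply.text)
```

Run as written, the two devices go from "I can't make tea" to a clarifying question about the kind of coffee, and finally to "Making drip coffee". A real system would need far richer context and inference, but the turn-taking structure is the part the paragraph above is describing.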

So, possibly not true AI, but a giant leap forward from what we have today. That would help turn the science fiction of the past into the science fact of the near future. The better the smart system's understanding and inferences, the better the results. I also believe that an unintended consequence of these new smart systems is that the more human-like their approach becomes, the more likely they will be to make errors. But hopefully they will be able to back-test recommendations to help minimize errors, and be intelligent enough to monitor results and suggest corrective actions when they determine that a recommendation was not optimal. And even more importantly, there won't be an ego creating a distortion filter on the results.
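Back-testing and monitoring can be sketched in a few lines. In this minimal example a toy moving-average forecast stands in for a smart system's recommendation (an assumption for illustration, not any real product's method). `backtest` walks forward through history, forecasting each point only from earlier points, and reports the average miss; `monitor` compares a live outcome with its recommendation and suggests a correction when the miss exceeds a tolerance. The function names, window size, and tolerance are hypothetical.

```python
from statistics import mean
from typing import Sequence


def recommend(history: Sequence[float], window: int = 3) -> float:
    """Toy stand-in for a smart system: forecast the next value as a moving average."""
    return mean(history[-window:])


def backtest(series: Sequence[float], window: int = 3) -> float:
    """Walk forward through history, forecasting each point only from earlier points;
    return the mean absolute error of those forecasts."""
    errors = [abs(recommend(series[:i], window) - series[i])
              for i in range(window, len(series))]
    return mean(errors)


def monitor(history: Sequence[float], actual: float, tolerance: float,
            window: int = 3) -> str:
    """Compare a live outcome with the recommendation; suggest a correction if it missed."""
    miss = abs(recommend(history, window) - actual)
    if miss > tolerance:
        return f"missed by {miss:.1f}; consider re-tuning (window={window})"
    return "ok"


series = [10, 12, 11, 13, 12, 14]
print(round(backtest(series), 2))                # 1.33
print(monitor(series, actual=20, tolerance=3))   # missed by 7.0; consider re-tuning (window=3)
print(monitor(series, actual=14, tolerance=3))   # ok
```

The important design point is that the back-test never lets a forecast see the value it is predicting (`series[:i]` stops just before `series[i]`); otherwise the error estimate would be optimistic.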

A lot of the building blocks required to create these new systems are available today. But, it takes both vision and insight to see that potential, translate ideas from slippery and abstract to tangible and purposeful, and then start building something really cool. As that happens we will see a paradigm shift in how we interact with computers and how they interact with us. That will lead us to the systematic integration that I wrote about in a big data / nanotechnology post.

So, what is the objective of this post? To get people thinking about things in a different way, to foster collaboration and partnerships between businesses and educational institutions to push the limits of technology, and to foster discussion about what others believe the future of computing and smart devices will look like. I'm confident that I will see these types of systems in my lifetime, and I see the possibility of a lot of this occurring within the next decade.

What are your thoughts?

Getting Started with Big Data


Being in sales, I have the opportunity to speak with a lot of customers and prospects about many things. Most are interested in both Cloud Computing and Big Data, but often they don't fully understand how they will leverage the technology to maximize the benefit. There is a simple three-step process that I use:

1. Explain that there is no single correct answer. There are still many definitions, so it is more important to focus on what you need than on what you call it.

2. Relate the technology to something people are likely already familiar with (extending those concepts). For example: Cloud computing is similar to virtualization, and has many of the same benefits; Big Data is similar to data warehousing.

3. Provide a high-level explanation of how "new and old" are different. For example: Cloud computing often occurs in an external data center, possibly one whose location you may not even know, so security is even more complex and important than with in-house systems and applications; Big Data often uses data that is not from your environment, possibly even data whose value you do not yet know, so robust integration tools are very important.

Big Data is a little bit like my first house. I was newly married, anticipated having children, and anticipated moving into a larger house in the future. My wife and I started buying things that fit into our vision of the future and storing them in our basement. We were planning for a future that was not 100% known.

But, our vision changed over time and we did not know exactly what we needed until the very end. After 7 years our basement was very full and it was difficult to find things.  When we moved to a bigger house we did have a lot of what we needed. We also had things in storage that we no longer wanted or needed. And, there were a few things we wished that we had purchased earlier. We did our best, and most of what we did was beneficial.

How many of you would have thought, 5 years ago, that Social Media Sentiment Analysis would be important? How many would have thought that hashtag usage would become so pervasive in all forms of media? How many understood the importance of location information (and even the time stamp for that location)? My guess is that it would not be many.

This ambiguity is both the good and bad thing about big data. In the old data warehouse days you knew what was important because this was your data about your business, systems, and customers.  While IT may have seemed tough before, it can be much more challenging now. But, the payoff can also be much larger so it is worth the effort.

Now we care about unstructured data (website information, blog posts, press releases, tweets, etc.), streaming data (stock ticker data is a common example), sensor data (temperature, altitude, humidity, location, lateral and horizontal forces – think logistics), etc.  So, you are getting data from multiple sources having multiple time frame references (e.g., constant streaming versus hourly updates), often in an unknown or inconsistent format. Many times you don’t know what you don’t know – and you just need to accept that.
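As a small illustration of the "multiple time frame references" problem, the sketch below normalizes timestamps that arrive in inconsistent formats (epoch seconds or ISO-8601 strings, with or without a time zone offset) to UTC, then rolls every source up to hourly buckets so a constant stream lines up with an hourly feed. The source names, the `(source, timestamp, value)` record layout, and the assumption that timestamps without a time zone are already UTC are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple, Union

Timestamp = Union[int, float, str]


def to_utc(ts: Timestamp) -> datetime:
    """Normalize epoch seconds or an ISO-8601 string to a timezone-aware UTC datetime."""
    if isinstance(ts, (int, float)):
        return datetime.fromtimestamp(ts, tz=timezone.utc)
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive times are already UTC
    return dt.astimezone(timezone.utc)


def hourly(records: Iterable[Tuple[str, Timestamp, float]]) -> Dict[Tuple[str, datetime], float]:
    """Roll (source, timestamp, value) records up to one average per source per hour."""
    buckets = defaultdict(list)
    for source, ts, value in records:
        hour = to_utc(ts).replace(minute=0, second=0, microsecond=0)
        buckets[(source, hour)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


start = datetime(2015, 6, 1, 14, 5, tzinfo=timezone.utc).timestamp()
records = [
    ("ticker", start, 10.0),                          # streaming feed, epoch seconds
    ("ticker", start + 60, 12.0),                     # one minute later
    ("weather", "2015-06-01T16:00:00+02:00", 21.5),   # hourly feed, ISO string, UTC+2
    ("weather", "2015-06-01T15:00:00", 22.0),         # naive ISO string
]
for (source, hour), avg in sorted(hourly(records).items()):
    print(source, hour.isoformat(), avg)
```

Here the two ticker readings and the UTC+2 weather reading all land in the 14:00 UTC bucket. Note that `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward, so older versions need `+00:00` offsets as used here.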

In a future post I will discuss scenarios that take advantage of Big Data, and why allowing some ambiguity and uncertainty in your model could be one of the best things that you have ever done.  But for now take a look at the links below for more basic information:

This article discusses why Big Data matters, and how you can get value without needing complex analytics.

Big Data article that discusses the importance of taking action quickly to gain a competitive advantage. Note: Free registration to the site may be required to view this article.

This article (Big Data is the Tower of Babel) discusses the importance of data integration.

This short article discusses three important considerations for a Big Data project. While correct, the first point is really the key when getting started.

This is a good high-level article on Hadoop 2.0.  Remember how I described the basement in my first house? That’s how Hadoop is utilized in many cases.

My perspective on Big Data


Ever since I worked on redesigning a risk management system at an insurance company (1994-1995), I have been impressed by how you can make better decisions with more data, assuming it is the right data. The question "What is the right data?" has intrigued me for years, as what may seem like common sense today could have been unknown 5-10 years ago, and may be completely passé 5-10 years from now. Context becomes very important because of the variability of data over time.

And this is what makes Big Data interesting. There really is no right or wrong answer or definition. Having a framework to define, categorize, and use that data is important. And at some point being able to refer to the data in-context will be very important as well. Just think about how challenging it could be to compare scenarios or events from 5 years ago with those of today. It’s not apples-to-apples but could certainly be done. It is pretty cool stuff.

The way I think of Big Data is similar to a water tributary system. Water gets into the system many ways – rains from the clouds, sprinkles from private and public supplies, runoff and overflow, etc.  It also has many interesting dimensions, such as quality / purity (not necessarily the same due to different aspects of need), velocity, depth, capacity, and so forth. Not all water gets into the tributary system (e.g., some is absorbed into the groundwater tables, and some evaporates), so data loss is expected. If you think in terms of streams, ponds, rivers, lakes, reservoirs, deltas, etc. there are many relevant analogies that can be made. And just like the course of a river may change over time, data in our water tributary system could also change over time.

Another part of my thinking is based on an experience I had about a decade ago working on a project for a Nanotech company. In their labs they were testing various things. There were particles that changed reflectivity based on temperature that were embedded in shingles and paint. There were very small batteries that could be recharged tens of thousands of times, were light, and had more capacity than a 12-volt car battery. And, there was a section where they were doing “biometric testing” for the military. I have since read articles about things like smart fabrics that could monitor the health of a soldier, and do things like apply basic first aid when a problem was detected.  This company felt that by 2020 advanced nanotechnology would be widely used by the military, and by 2025 it would be in wide commercial use.  Is that still a possibility? Who knows…

Much of what you read today is about the exponential growth of data. I agree with that, but also believe that the nature of the source of that data will change.  For example, nano-particles in engine oil will provide information about temperature, engine speed and load, and even things like rapid changes in movement (fast take-off or stops, quick turns). The nano-particles in the paint will provide weather conditions. The nano-particles on the seat upholstery will provide information about occupants (number, size, weight). Sort of like the “sensor web,” from the original Kevin Delin perspective. A lot of data will be generated, but then what?

I believe that time will be an important aspect of every piece of data, but I also feel that location (X, Y, and Z coordinates) will be just as important. But not every sensor will collect location. I believe there will be multiple data aggregators in common use at common points (your car, your house, your watch). Those aggregators will package the available data in something akin to an XML object, which allows flexibility. And, from my perspective, this is where things get really interesting.
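A minimal sketch of such an aggregator, using Python's standard `xml.etree.ElementTree`, might look like the following. It stamps a packet with the time and tags each reading with X, Y, and Z coordinates, letting sensors that do not report location inherit the aggregator's own position (an assumption about how this might work, not a description of any real device). The element and attribute names (`packet`, `reading`, `location`, `value`) and the sample sensors are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

XYZ = Tuple[float, float, float]


def package(aggregator_id: str, position: XYZ, readings: List[Dict],
            now: Optional[datetime] = None) -> str:
    """Bundle sensor readings into one XML packet stamped with time and X/Y/Z location."""
    now = now or datetime.now(timezone.utc)
    root = ET.Element("packet", aggregator=aggregator_id, time=now.isoformat())
    for r in readings:
        reading = ET.SubElement(root, "reading", sensor=r["sensor"], type=r["type"])
        # Sensors that do not report location inherit the aggregator's position.
        x, y, z = r.get("location", position)
        ET.SubElement(reading, "location", x=str(x), y=str(y), z=str(z))
        ET.SubElement(reading, "value").text = str(r["value"])
    return ET.tostring(root, encoding="unicode")


car = (10.0, 20.0, 0.5)
readings = [
    {"sensor": "oil-1", "type": "engine_temp_c", "value": 92.5},
    {"sensor": "seat-2", "type": "occupants", "value": 2, "location": (10.2, 20.1, 0.4)},
]
print(package("car-42", car, readings, now=datetime(2015, 1, 1, tzinfo=timezone.utc)))
```

Because each reading is its own element, a new kind of sensor can be added to the packet without changing the format for the others, which is the flexibility the paragraph above is pointing at.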

Currently, companies like Google make a lot of money from aggregating data. I believe there will be opportunities for individuals to place their anonymized data on a data exchange for sale. The more interesting their data, the more value it has and the more benefit it provides to the person selling it. This could have a huge economic impact, and that would foster both the use and expansion of the commercial ecosystems required to manage the commercial aspects of this technology.
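Here is one hedged sketch of how a record might be prepared before it goes to such an exchange. It drops direct identifiers, replaces the user ID with a keyed hash (HMAC-SHA256) so a buyer can link one seller's records together without learning who the seller is, and rounds coordinates to reduce precision. Strictly speaking this is pseudonymization rather than true anonymization, since re-identification from the remaining fields may still be possible. The field names, the secret, and the rounding precision are hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict

DIRECT_IDENTIFIERS = {"name", "email", "phone"}  # hypothetical field names


def prepare_for_exchange(record: Dict[str, Any], secret: bytes,
                         precision: int = 2) -> Dict[str, Any]:
    """Drop direct identifiers, pseudonymize the user ID, and coarsen location."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Keyed hash: stable for the same seller, but not reversible without the secret.
    digest = hmac.new(secret, str(record["user_id"]).encode(), hashlib.sha256)
    out["user_id"] = digest.hexdigest()[:16]
    # Two decimal places of latitude is roughly 1 km of precision.
    for coord in ("lat", "lon"):
        if coord in out:
            out[coord] = round(out[coord], precision)
    return out


record = {"user_id": 7, "name": "Pat", "email": "pat@example.com",
          "lat": 40.712776, "lon": -74.005974, "steps": 8000}
print(prepare_for_exchange(record, secret=b"seller-held key"))
```

A plain, unkeyed hash of a small ID space could be reversed simply by hashing every possible ID, which is why the sketch uses HMAC with a secret that stays with the seller.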

The next logical step in this vision is “smart everything.” For example, you could buy a shirt that is just a shirt. But, for an extra cost you could turn-on medical monitoring or refractive heating / cooling. And, if you felt there was a market for extra dimensions of data that could benefit you financially, then you could enable those sensors as well. Just think of the potential impact that technology would make to commerce in this scenario.

Anyway, that is what I personally think will happen within the next decade or so. This won’t be the only type or use of big data. Rather, there will be many valid types and uses of data – some complementary and some completely discrete. What is common is that someone will find potential value in that data, today or someday in the future, and decide to store it. Someone else will see this data as a competitive advantage and do something interesting with it. Who knows what we will view as valuable data 5-10 years from now.

So, what are your thoughts? Can we predict the future, or simply create platforms that are powerful enough, flexible enough, and extensible enough to change as our perspective of what is important changes? Either way it will be fun!