Big Data
Ideas are sometimes Slippery and Hard to Grasp
I started this blog with the goal of creating an “idea exchange,” as well as a way to pass along lessons learned to help others. Typical guidance for a blog is to focus on one thing and do it well in order to develop a following. That is especially important if you want to monetize a blog, but that has never been my goal.
One of the things that has surprised me is how different the comments and likes are for each post. Feedback from the last post was even more diverse and surprising than usual. It ranged from comments about “Siri vs Google” to feedback about Sci-Fi books and movies to Artificial Intelligence.
I asked a few friends for feedback and received something very insightful (Thanks Jim). He stated that he found the blog interesting but wasn’t sure of the objective. He went on to identify several possible goals for the last post. Strangely enough (or maybe not), his comments mirrored the type of feedback that I received. That pointed out an area for improvement, and I appreciated that as well as the wisdom of focusing on one thing. Who knows, maybe in the future…
This also reminded me of a white paper written 12-13 years ago by someone I used to work with. It was about how Bluetooth would be the “next big thing.” He had read an IEEE paper or something and saw potential for this new technology. His paper provided the example of your toaster and coffee maker communicating so that your breakfast would be ready when you walk into the kitchen in the morning.
At that time, I had a couple of thoughts. Who cared about something that only had a 20-30-foot range when WiFi had become popular and had a much greater range? In addition, a couple of years earlier, I had a tour of the Microsoft “House of the Future,” in which everything was automated and key components communicated. But everything in the house was all hardwired or used WiFi – not Bluetooth. It was easy to dismiss his assertion because it seemed to lack pragmatism. The value of the idea was difficult to quantify, given the use case provided.
Looking back now, I view that white paper as insightful but not visionary. Had it been truly visionary, he would have come out with the first Bluetooth speakers, car interface, or phone earpiece and gotten rich. Instead, the paper failed to present practical use cases that were easy to understand yet different enough from what was available at the time to demonstrate the real value of the idea. His expression of the idea was not tangible enough and, therefore, too slippery to be easily grasped and valued.
I believe that good ideas sometimes originate where you least expect them. Those ideas are often incremental – seemingly simple and sometimes borderline obvious, often building on another idea or concept. An idea does not need to be unique to be important or valuable, but it needs to be presented in a way that makes it easy to understand the benefits, differentiation, and value. That is just good communication.
One of the things I miss most from when my consulting company was active was the interaction between a couple of key people (Jason and Peter) and myself. Those guys were very good at taking an idea and helping build it out. This worked well because we had some overlapping expertise and experiences as well as skills and perspectives that were more complementary. That diversity increased the depth and breadth of our efforts to develop and extend those ideas by asking the tough questions early and ensuring we could convince each other of the value.
Our discussions were creative, highly collaborative, and a lot of fun. We improved from them, and the outcome was usually viable from a commercial perspective. As a growing and profitable small business, you must constantly innovate to differentiate yourself. Our discussions were driven as much by necessity as intellectual curiosity, and I believe this was part of the magic.
So, back to the last post. I view various technologies as building blocks. Some are foundational, and others are complementary. To me, the key is not viewing those various technologies as competing with each other. Instead, I look for potential value created by integrating them with each other. That may not always be possible and does not always lead to something better, but occasionally it does, so to me, it is a worthwhile exercise. With regard to voice technology, I believe we will see more, better, and smarter applications of it – especially as real-time and AI systems become more complex due to the use of an increasing number of specialized chips, component systems, geospatial technology, and sensors.
While today’s smartphone interfaces would not pass the Turing Test or proposed alternatives, they are an improvement over more simplistic voice translation tools available just a few years ago. Advancement requires the tools to understand context in order to make inferences. This brings you closer to machine learning, and big data (when done right) significantly increases that potential.
Ultimately, this all leads back to Artificial Intelligence (at least in my mind). It’s a big leap from a simple voice translation tool to AI, but it is not such a stretch when viewed as building blocks.
Now think about creating an interface (API) that allows one smart device to communicate with another, like the collaborative efforts described above with my old team. It’s not simply having a front-end device exchanging keywords or queries with a back-end device. Instead, it is two or more devices and/or systems having a “discussion” about what is being requested, looking at what each component “knows,” making inferences based on location and speed, asking clarifying questions and making suggestions, and then finally taking that multi-dimensional understanding of the problem to determine what is really needed.
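To make that idea a bit more concrete, here is a minimal sketch of two components negotiating a request instead of simply passing a keyword query back and forth. The device roles, message format, and sample data are all hypothetical and invented purely for illustration.

```python
# A toy "discussion" between a front-end assistant and a back-end service,
# rather than a single keyword query and response. All names and data are invented.

CONTEXT = {"location": "downtown", "recent_meals": ["italian"], "time": "19:00"}

RESTAURANTS = [
    {"name": "Harbor Grill",   "cuisine": "seafood", "view": True,  "distance_mi": 2.0},
    {"name": "Trattoria Roma", "cuisine": "italian", "view": True,  "distance_mi": 1.0},
    {"name": "Noodle Bar",     "cuisine": "asian",   "view": False, "distance_mi": 0.5},
]

def backend(request, clarifications=None):
    """Back-end: asks for missing context, then filters with that shared understanding."""
    if clarifications is None:
        return {"type": "clarify", "about": "recent_meals"}
    avoid = clarifications.get("recent_meals", [])
    matches = [r for r in RESTAURANTS
               if r["view"] and r["cuisine"] not in avoid and r["distance_mi"] <= 5]
    return {"type": "results", "results": matches}

def assistant(request):
    """Front-end: holds the user's context and answers clarifying questions itself."""
    reply = backend(request)
    while reply["type"] == "clarify":
        answer = CONTEXT.get(reply["about"], "unknown")
        reply = backend(request, {reply["about"]: answer})
    return reply["results"]

print(assistant("dinner tonight, nearby, with a view, nothing we ate recently"))
# -> [{'name': 'Harbor Grill', 'cuisine': 'seafood', 'view': True, 'distance_mi': 2.0}]
```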
So, possibly not true AI (yet), but a giant leap forward from what we have today. That would help turn the science fiction of the past into science fact in the near future. The better the understanding and inferences by the smart system, the better the results.
I also believe that the unintended consequence of these new smart systems is that they will likely make errors or have biases like a human as they become more human-like in their approach. Hopefully, those smart systems will be able to automatically back-test recommendations to validate and minimize errors. If they are intelligent enough to monitor results and suggest corrective actions when they determine that the recommendation does not have the optimal desired results, they would become even “smarter.” There won’t be an ego creating a distortion filter about the approach or the results. Or maybe there will…
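As a rough illustration of what back-testing a recommendation could look like, here is a sketch with invented data and a deliberately simplistic hit-rate check; it is not meant to represent any particular product's method.

```python
# Toy back-test: compare past recommendations against observed outcomes and
# flag the system for corrective action when its hit rate drops. Data is invented.

history = [
    {"recommended": "route_a", "chosen": "route_a", "outcome_ok": True},
    {"recommended": "route_b", "chosen": "route_c", "outcome_ok": False},
    {"recommended": "route_a", "chosen": "route_a", "outcome_ok": False},
    {"recommended": "route_b", "chosen": "route_b", "outcome_ok": True},
]

def back_test(records, min_hit_rate=0.75):
    hits = sum(1 for r in records if r["recommended"] == r["chosen"] and r["outcome_ok"])
    hit_rate = hits / len(records)
    return hit_rate, hit_rate < min_hit_rate

rate, needs_correction = back_test(history)
print(f"hit rate={rate:.2f}, corrective action needed={needs_correction}")
# hit rate=0.50, corrective action needed=True
```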
Many of the building blocks required to create these new systems are available today. But it takes vision and insight to see that potential, translate ideas from slippery and abstract to tangible and purposeful, and then start building something cool and useful. As that happens, we will see a paradigm shift in how we interact with computers and how they interact with us. It will become more interactive and intuitive. That will lead us to the systematic integration that I wrote about in a big data / nanotechnology post.
So, what is the real objective of my blog? To get people thinking about things differently, to foster collaboration and partnerships between businesses and educational institutions to push the limits of technology, and to foster discussion about what others believe the future of computing and smart devices will look like. I’m confident that I will see these types of systems in my lifetime, and I believe in the possibility of this occurring within the next decade.
What are your thoughts?
The Future of Smart Interfaces
Recently, I was helping one of my children research a topic for a school paper. She was doing well, but the results she was getting were overly broad. So, I taught her some “Google-Fu,” explaining how you can structure queries in ways that yield better results. She replied that search engines should be smarter than that. I explained that sometimes the problem is that search engines look at your past searches and customize results as an attempt to appear smarter or to motivate someone to do or believe something.
Unfortunately, those results can be skewed and potentially lead someone in the wrong direction. It was a good reminder that getting the best results from search engines often requires a bit of skill and query planning, as well as occasional third-party validation.
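For anyone curious, the “Google-Fu” in question was mostly a handful of standard search operators. A hypothetical refinement (the topic, queries, and source site here are made up for illustration, and operator behavior can change over time) might progress like this:

```python
# Example progression from a broad query to a focused one.
queries = [
    'dust bowl causes',                        # broad: returns everything loosely related
    '"dust bowl" causes farming practices',    # quotes force an exact phrase
    '"dust bowl" causes site:loc.gov',         # site: limits results to one domain
    '"dust bowl" causes -quiz -worksheet',     # a leading minus excludes noisy terms
]
```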
Then the other day, I saw a commercial from Motel 6 (“Gas Station Trouble”) in which a man has trouble getting good results from his smartphone. That reminded me of watching someone speak to his phone and grow increasingly frustrated with the responses he received. His questions went something like this:
“Siri, I want to take my wife to dinner tonight, someplace that is not too far away, and not too late. And she likes to have a view while eating so please look for something with a nice view. Oh, and we don’t want Italian food because we just had that last night.”
Just as amazing as the question itself was watching him ask it over and over again in exactly the same way, becoming more frustrated each time. I asked myself, “Are smartphones making us dumber?” Instead of contemplating that question, I began to think about what future smart interfaces could be like.
I grew up watching Sci-Fi computer interfaces like “Computer” on Star Trek (1966), “HAL” in 2001: A Space Odyssey (1968), “KITT” from Knight Rider (1982), and “Samantha” from Her (2013). These interfaces had a few things in common:
- They responded to verbal commands.
- They were interactive – not just providing answers, but also asking qualifying questions and allowing for interrupts to drill-down or enhance the search (e.g., with pictures or questions that resembled verbal Venn diagrams).
- They often suggested alternative queries based on intuition. That would have been helpful for the gentleman trying to find a restaurant.
Despite 50 years of science fiction examples, we are still a long way from realizing the goal of a truly intelligent interface. Like many new technologies, these interfaces were envisioned by science fiction writers long before science and engineering could deliver them.
There seems to be a spectrum of common beliefs about modern interfaces. On one end, some products make visualization easy, facilitating understanding, refinement, and drill-down of data sets. Tableau is an excellent example of this type of easy-to-use interface. At the other end of the spectrum, the emphasis is on back-end systems – robust computer systems that digest huge volumes of data and return the results to complex queries within seconds. Several other vendors offer powerful analytics platforms. In reality, you really need a strong front-end and back-end if you want to achieve the full potential of either.
But, there is so much more potential…
I predict that within the next 3 – 5 years, we will see business and consumer interface examples (powered by AI and Natural Language Processing, or NLP) that are closer to the verbal interfaces from those familiar Sci-Fi shows (albeit with limited capabilities and no flashing lights).
Within the next 10 years, I believe we will have computer interfaces that intuit our needs and facilitate the generation of correct answers quickly and easily. While this is unlikely to be at the level of “The world’s first intelligent Operating System” envisioned in the movie “Her,” and probably won’t even be able to read lips like “HAL,” it should be much more like HAL and KITT than like Siri (from Apple) or Cortana (from Microsoft).
Siri was groundbreaking consumer technology when it was introduced. Cortana seems to have taken a small leap ahead. While I have not mentioned Google Now, it is somewhat of a latecomer to this consumer smart interface party, and in my opinion, it is behind both Siri and Cortana.
So, what will this future smart interface do? It will need to be very powerful, harnessing a natural language interface on the front-end with an extremely flexible and robust analytics interface on the back-end. The language interface will need to take a standard question (in multiple languages and dialects) – just as if you were asking a person, deconstruct it using Natural Language Processing, and develop the proper query based on the available data. That is important, but it only gets you so far.
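As a very rough sketch of what “deconstructing” a spoken request into a structured query might look like, here is a toy example that uses simple keyword rules rather than real Natural Language Processing; the field names and thresholds are made up for illustration.

```python
import re

# Naive "deconstruction" of a spoken request into structured query parameters.
# Real NLP would use parsing and learned models; this only shows the shape of the problem.

def deconstruct(request: str) -> dict:
    query = {"category": "restaurant", "filters": {}}
    text = request.lower()
    if "not too far" in text or "nearby" in text:
        query["filters"]["max_distance_mi"] = 10
    if "view" in text:
        query["filters"]["has_view"] = True
    exclude = re.search(r"don't want (\w+)", text)
    if exclude:
        query["filters"]["exclude_cuisine"] = exclude.group(1)
    return query

spoken = ("I want to take my wife to dinner tonight, someplace that is not too far away, "
          "with a nice view. We don't want Italian food.")
print(deconstruct(spoken))
# {'category': 'restaurant', 'filters': {'max_distance_mi': 10, 'has_view': True,
#  'exclude_cuisine': 'italian'}}
```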
Data will come from many sources – things that we handle today with relational, object, graph, and NoSQL databases. There will be structured and unstructured data that must be joined and filtered quickly and accurately. In addition, context will be more important than ever. Pictures and videos could be scanned for faces and location (via geotagging) and, in the case of videos, analyzed for speech. Relationships will be identified and inferred from a variety of sources, using both data and metadata. Sensors will collect data from almost everything we do and (someday) wear, which will provide both content and context.
Stylometry will be used to identify outside content likely related to the people involved in a query and to provide further context about interests, activities, and even biases. This is how future interfaces will truly understand (not just interpret), intuit (so they can determine what you really want to know), and then present results that may be far more accurate than we are used to today. Because the interface is interactive in nature, it will provide the ability to organize and analyze subsets of data quickly and easily.
So, where do I think that this technology will originate? I believe that it will be adapted from video game technology. Video games have consistently pushed the envelope over the years, helping drive the need for higher bandwidth I/O capabilities in devices and networks, better and faster graphics capabilities, and larger and faster storage (which ultimately led to flash memory and even Hadoop). Animation has become very lifelike, and games are becoming more responsive to audio commands. It is not a stretch of the imagination to believe that this is where the next generation of smart interfaces will be found (instead of from the evolution of current smart interfaces).
Someday, it may no longer be possible to “tweak” results through the use or omission of keywords, quotation marks, and flags. Additionally, it may no longer be necessary to understand special query languages (SQL, NoSQL, SPARQL, etc.) and syntax. We won’t have to worry as much about incorrect joins, spurious correlations, and biased result sets. Instead, we will be given the answers we need – even if we don’t realize that this was what we needed in the first place – which will likely be driven by AI. At that point, computer systems may appear nearly omniscient.
When this happens, parents will no longer need to teach their children “Google-Fu.” Those are going to be interesting times indeed.
Big Data – The Genie is out of the Bottle!
Back in early 2011, the other members of the Executive team at Ingres and I were taking a bet on the future of our company. We knew we needed to do something big and bold, so we decided to build what we thought the standard data platform would be in 5-7 years. A small minority of the team did not believe this was possible and left, while the rest focused on making it happen. We made three strategic acquisitions to fill in the gaps in our Big Data platform. Today (as Actian), we have nearly achieved our goal. It was a leap of faith back then, but our vision turned out to be spot-on, and our gamble is paying off today.
My mailbox is filled daily with stories, seminars, white papers, etc., about Big Data. While it feels like this is becoming more mainstream, reading and hearing the various comments on the subject is interesting. They range from “It’s not real” and “It’s irrelevant” to “It can be transformational for your business” to “Without big data, there would be no <insert company name here>.”
What I continue to find amazing is hearing comments about big data being optional. It’s not – that genie has already been let out of the bottle. There are incredible opportunities for those companies that understand and embrace the potential. I like to tell people that big data can be their unfair advantage in business. Is that really the case? Let’s explore that assertion and find out.
We live in the age of the “Internet of Things.” Data about nearly everything is everywhere, and the tools to correlate that data and gain an understanding of so many things (activities, relationships, likes and dislikes, etc.) are becoming widely available. With smart devices that enable mobile computing, we have the extra dimension of location. And with new technologies such as graph databases (commonly queried with SPARQL), graph visualization interfaces to analyze that data (such as Sigma), and identification techniques such as stylometry, it is getting easier to identify and correlate that data. Someday, this will feed into artificial intelligence, becoming a superpower for those who know how to leverage it effectively.
We are generating ever-larger volumes of data about everything we do and everything going on around us, and tools are evolving to make sense of that data better and faster than ever. Organizations that perform the best analysis, get the answers fastest, and act on that insight quickly are more likely to win than organizations that look at a smaller slice of the world or adopt a “wait and see” posture. So, that seems like a significant advantage in my book. But is it an unfair advantage?
First, let’s remember that big data is just another tool. Like most tools, it has the potential for misuse and abuse. Whether a particular application is viewed as “good” or “bad” depends on the goals and perspective of the entity using the tool (which may be the polar opposite of the view held by the people or groups being targeted). So, I will not attempt to judge the various use cases but rather present a few and let you decide.
Scenario 1 – Sales Organization: What if you could take what a prospect company tells you it needs and then validate and refine that understanding? That’s half the battle in sales (budget, integration, and support / politics are other key hurdles). Imagine data that helped you understand not only the actions of that organization (customers and industries, sales and purchases, gains and losses, etc.) but also the stakeholders’ and decision-makers’ goals, interests, and biases. This could provide a holistic view of the environment and allow you to provide a highly targeted offering, with messaging tailored to each individual. That is possible, and I’ll explain soon.
Scenario 2 – Hiring Organization: Many questions cannot be asked by a hiring manager. While I’m not an attorney, I would bet that State and Federal laws have not kept pace with technology. And while those laws vary state by state, there are likely loopholes allowing public records to be used. Moreover, implied data that is not officially considered could color the judgment of a hiring manager or organization. For instance, if you wanted to “get a feeling” that a candidate might fit in with the team or the culture of the organization or have interests and views that are aligned with or contrary to your own, you could look for personal internet activity that would provide a more accurate picture of that person’s interests.
Scenario 3 – Teacher / Professor: There are already sites in use to search for plagiarism in written documents, but what if you had a way to make an accurate determination about whether an original work was created by your student? There are people who, for a price, will do the work and write a paper for a student. So, what if you could not only determine that the paper was not written by your student but also determine who the likely author was?
Do some of these things seem impossible or at least implausible? Personally, I don’t believe so. Let’s start with the typical data that our credit card companies, banks, search engines, and social network sites already have related to us. Add to that the identified information available for purchase from marketing companies and various government agencies. That alone can provide a pretty comprehensive view of us. But there is so much more that’s available.
Consider the potential of gathering information from intelligent devices accessible through the Internet, your alarm and video monitoring system, etc. These are intended to be private data sources, but one thing history has taught us is that anything accessible is subject to unauthorized access and use (just think about the numerous recent credit card hacking incidents).
Even de-identified data (medical / health / prescription / insurance claim data is one major example), which receives much less protection and can often be purchased, could be correlated with a reasonably high degree of confidence to gain an understanding of other “private” aspects of your life. The key is to look for connections (websites, IP addresses, locations, businesses, people), things that are logically related (such as illnesses / treatments / prescriptions), and then accurately identify (stylometry looks at things like sentence complexity, function words, co-location of words, misspellings and misuse of words, etc. and will likely someday take into consideration things like idea density). It is nearly impossible to remain anonymous in the Age of Big Data.
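To make the stylometry point more concrete, here is a minimal sketch of the kind of features involved. These are a few crude measures chosen for illustration; real stylometric analysis uses far richer feature sets and proper statistical models.

```python
import re
from collections import Counter

# Crude stylometric fingerprint: average sentence length and function-word rates.
# Real systems add n-grams, punctuation habits, characteristic misspellings, etc.

FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "it", "is", "was"}

def fingerprint(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        **{f"fw_{w}": counts[w] / max(len(words), 1) for w in sorted(FUNCTION_WORDS)},
    }

def distance(a: dict, b: dict) -> float:
    # Smaller distance = more similar writing style (under these crude features).
    return sum(abs(a[k] - b[k]) for k in a)

known_student = "The results were clear. It was a simple test, and the data showed it."
submitted = "Notwithstanding the aforementioned considerations, the empirical findings remain equivocal."
print(distance(fingerprint(known_student), fingerprint(submitted)))
```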
There has been a paradigm shift regarding the practical application of data analysis, and the companies that understand this and embrace it will likely perform better than those that don’t. There are new ethical considerations that arise from this technology, and likely new laws and regulations as well. But for now, the race is on!
Genetics, Genomics, Nanotechnology, and more
Science has been interesting to me for most of my lifetime, but it wasn’t until my first child was born that I shifted from “interested” to “involved.” My eldest daughter was diagnosed with Systemic Onset Juvenile Idiopathic Arthritis (SoJIA – originally called Juvenile Rheumatoid Arthritis, or JRA) when she was 15 months old, which also happened to be about six months into the start of my old Consulting company and in the middle of a very critical Y2K ERP system upgrade and rehosting project. It was definitely a challenging time in my life.
At that time, there was very little research on JRA because it was estimated there were only 30,000 children affected by the disease, and the implication was that funding research would not have a positive ROI. This was also a few years before the breakthroughs of biological medicines like Enbrel for children.
One of the things that I learned was that this disease could be horribly debilitating. Children often had physical deformities as a result of this disease. Even worse, the systemic type that my daughter has could result in premature death. As a first-time parent, imagining that type of life for your child was extremely difficult.
Luckily, the company I had just started was taking off, so I decided to find ways to make a tangible difference for all children with this disease. We decided to take 50% of our net profits and use them to fund medical research. We aimed to fund $1 million in research and find a cure for Juvenile Arthritis within the next 5-7 years.
As someone new to “major gifts” and philanthropy, I quickly learned that some gifting vehicles were more beneficial than others. While most organizations wanted you to start a fund (which we did), the impact from that tended to be more long-term and less immediate. I met someone passionate, knowledgeable, and successful in her field who showed me a different and better approach (here’s a post that describes that in more detail).
I no longer wanted to blindly give money and hope it was used quickly and properly. Rather, I wanted to treat these donations like investments in a near-term cure. In order to be successful, I needed to understand research from both medical and scientific perspectives in these areas. That began a new research and independent learning journey in completely new areas.
There was a lot going on in Genetics and Genomics at the time (here’s a good explanation of the difference between the two). My interest and efforts in this area led to a position on the Medical and Scientific Advisory Committee with the Arthritis Foundation. Apart from me, the members were all talented and successful physicians who were also involved in medical research. We met quarterly, and I did ask questions and make suggestions that made a difference. But, unlike everyone else on the committee, I needed to study and prepare for 40+ hours before each call to ensure that I had enough understanding to add value and not be a distraction.
A few years later, we did work for a nanotechnology company (more info here for those interested). The Chief Scientist wasn’t interested in explaining what they did until I described some of our research projects on gene expression. He then went into great detail about what they were doing and how he believed it would change what we do in the future. I saw that potential and agreed. I also started thinking about the potential of leveraging nanotechnology in medicine.
While driving today, I was listening to the “TED Radio Hour” and heard a segment about entrepreneur Richard Resnick. It was exciting because it got me thinking about this again – a topic I haven’t thought about for the past few years (the last time, I was contemplating how new analytics products could be useful in this space).
There are efforts today with custom, personalized medicines that target only specific genes for a specific outcome. The genetic modifications being performed on plants today will likely be performed on humans in the near future (I would guess within 10-15 years). The body is an incredibly adaptive organism, so it will be very challenging to implement anything that is consistently safe and effective long-term. But that day will come.
It’s not a huge leap from genetically modified “treatment cells” to true nanotechnology (not just extremely small particles). Just think: machines designed to work independently within us, doing what they are programmed to do and, more importantly, identifying and understanding adaptations as they occur (a form of artificial intelligence) and altering their approach and treatment plan accordingly. This is extremely exciting. It’s not that I want to live to be 100+ years old, because I don’t. But being able to do things that positively impact the quality of life for children and their families is a worthy goal from my perspective.
My advice is to always continue learning, keep an open mind, and see what you can personally do to make a difference. You will never know unless you try.
Note: Updated to fix and remove dead links.
Spurious Correlations – What they are and Why they Matter
In an earlier post, I mentioned that one of the big benefits of geospatial technology is its ability to show connections between complex and often disparate data sets. As you work with Big Data, you tend to see the value of these multi-layered and often multi-dimensional perspectives of a trend or event. While that can lead to incredible results, it can also lead to spurious data correlations.
First, let me state that I am not a Data Scientist or Statistician, and there are definitely people far more expert on this topic than I am. But if you are like the majority of companies out there experimenting with geospatial and big data, it is likely that your company doesn’t have these experts on staff. So, a little awareness, understanding, and caution can go a long way in this scenario.
Before we dig into that more, let’s think about what your goal is:
- Do you want to be able to identify and understand a particular trend – reinforcing actions and/or behavior? –OR–
- Do you want to understand what triggers a specific event – initiating a specific behavior?
Both are important, but they are different. My focus has been on identifying trends so that you can leverage or exploit them for commercial gain. While that may sound a bit ominous, it is really what business is all about.
A popular saying goes, “Correlation does not imply causation.” A common example is that you may see many fire trucks for a large fire. There is a correlation, but it does not imply that fire trucks cause fires. Now, extending this analogy, let’s assume that the probability of a fire starting in a multi-tenant building in a major city is relatively high. Since it is a big city, it is likely that most of those apartments or condos have WiFi hotspots. A spurious correlation would be to imply that WiFi hotspots cause fires.
As you can see, there is definitely the potential to misunderstand the results of correlated data. A more logical analysis would lead you to see the relationships between the type of building (multi-tenant residential housing) and technology (WiFi) or income (middle-class or higher). Taking the next step to understand the findings, rather than accepting them at face value, is very important.
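A small simulation makes the point: when one hidden factor (here, how densely a neighborhood is built up) drives two unrelated quantities, those quantities will look strongly correlated even though neither causes the other. The numbers below are invented purely for illustration.

```python
import random

random.seed(42)

# Hidden confounder: building density drives BOTH the number of WiFi hotspots
# and the number of fires in a neighborhood. Neither causes the other.
neighborhoods = []
for _ in range(200):
    density = random.uniform(0, 10)                  # hidden driver
    hotspots = density * 12 + random.gauss(0, 5)     # more units -> more WiFi
    fires = density * 0.8 + random.gauss(0, 1)       # more units -> more fires
    neighborhoods.append((hotspots, fires))

def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x, _ in pairs) ** 0.5
    sy = sum((y - my) ** 2 for _, y in pairs) ** 0.5
    return cov / (sx * sy)

print(f"correlation(hotspots, fires) = {pearson(neighborhoods):.2f}")
# Prints a correlation near 1.0 -- yet WiFi obviously does not cause fires.
```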
Once you have what looks to be an interesting correlation, there are many fun and interesting things you can do to validate, refine, or refute your hypothesis. It is likely that even without high-caliber data experts and specialists, you will be able to identify correlations and trends that can provide you and your company with a competitive advantage. Don’t let the potential complexity become an excuse for not getting started. As you can see, gaining insight and creating value with a little effort and simple analysis is possible.