Today I ran across an article that was very good because it focused on lessons learned, which can help everyone interested in these topics. It contained a good mix of problems presented at a non-technical level.
Below is the link to the article, as well as commentary on the Top 3 items listed from my perspective.
The article starts by discussing how the “problem” being evaluated was misstated using technical terms. It led me to believe that at least some of these efforts are conducted “in a vacuum.” That was a surprise given the cost and strategic importance of getting these early-adopter AI projects right.
In Sales and Marketing you start with the question, “What problem are we trying to solve?” and evolve that to, “How would customers or prospects describe this problem in their own words?” Without that understanding, you can neither initially vet the solution nor quickly qualify the need for your solution when speaking with those customers or prospects. That leaves a lot of room for error when transitioning from strategy to execution.
Increased collaboration with Business would likely have helped. This was touched on at the end of the article under “Cultural challenges,” but the importance seemed to be downplayed. Lessons learned are valuable – especially when you are able to learn from the mistakes of others. To me, this should have been called out early as a major lesson learned.
This second area had to do with the perspective of the data, whether that was the angle of the subject in photographs (overhead from a drone vs horizontal from the shoreline) or the type of customer data evaluated (such as from a single source) used to train the ML algorithm.
That was interesting because it appears that assumptions may have played a part in overlooking other aspects of the problem, or that the teams may have been overly confident about obtaining the correct results using the data available. In the examples cited those teams did figure those problems out and took corrective action. A follow-on article describing the process used to make their root cause determination in each case would be very interesting.
As an aside, from my perspective, this is why Explainable AI is so important. There are times that you just don’t know what you don’t know (the unknown unknowns). Being able to understand why and on what the AI is basing its decisions should help with providing better quality curated data up-front, as well as being able to identify potential drifts in the wrong direction while it is still early enough to make corrections without impacting deadlines or deliverables.
This didn’t surprise me but should be a cause for concern as advances are made at faster rates and potentially less validation is performed as organizations race to be first to market with some AI-based competitive advantage. The last paragraph under ‘Training data bias’ stated that based on a PwC survey, “only 25 percent of respondents said they would prioritize the ethical implications of an AI solution before implementing it.”
The discussion about the value of unstructured data was very interesting, especially when you consider:
- The potential for NLU (natural language understanding) products in conjunction with ML and AI.
- The NLU-pipeline work from North Side Inc. in Canada, one of the pioneers in this space.
- The importance of semantic data analysis relative to any ML effort.
- The incredible value that products like MarkLogic’s database or Franz’s AllegroGraph provide over standard Analytics Database products.
- I personally believe that the biggest exception to this assertion will be GPU databases (like OmniSci) that easily handle streaming data, can accomplish extreme computational feats well beyond those of traditional CPU-based products, and have geospatial capabilities that provide an additional dimension of insight to the problem being solved.
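To make the semantic-analysis point above a little more concrete, here is a toy sketch of the core idea: turning unstructured text into structured, queryable facts (triples). Real NLU pipelines use full parsers and ontologies; the verb list and split logic below are purely illustrative assumptions, not how any of the products mentioned actually work.

```python
# A toy illustration of extracting structured facts (triples) from
# unstructured text. The verb list and naive split are assumptions
# for illustration only.

VERBS = {"acquired", "founded", "employs"}

def extract_triples(sentence):
    """Return (subject, verb, object) for simple '<S> <verb> <O>' sentences."""
    words = sentence.rstrip(".").split()
    for i, word in enumerate(words):
        if word.lower() in VERBS:
            return (" ".join(words[:i]), word.lower(), " ".join(words[i + 1:]))
    return None  # no recognized relation in this sentence

facts = [extract_triples(s) for s in [
    "Acme acquired Widget Corp.",
    "Jane founded Acme.",
]]
print(facts)
# [('Acme', 'acquired', 'Widget Corp'), ('Jane', 'founded', 'Acme')]
```

Once text is reduced to triples like these, it can be loaded into a graph or semantic database and queried alongside structured data — which is where the real value described above comes from.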
Update: This is a link to a related article that discusses trends in areas of implementation, important considerations, and the potential ROI of AI projects: https://www.fastcompany.com/90387050/reduce-the-hype-and-find-a-plan-how-to-adopt-an-ai-strategy
This is definitely an exciting space that will experience significant growth over the next 3-5 years. The more information, experiences, and lessons learned shared the better it will be for everyone.
Earlier this week I was reading a blog post regarding the recent Gartner Hype Cycle for Advanced Analytics and Data Science, 2015. The Gartner chart reminded me of the epigram, “Plus ça change, plus c’est la même chose” (asserting that history repeats itself by stating the more things change, the more they stay the same.) To some extent that is true, as you could consider today’s big data as derivative of yesterday’s VLDBs (very large databases) and Data Warehouses. One of the biggest changes IMO is the shift away from Star Schemas and practices implemented for performance reasons such as aggregation of data sets, use of derived and encoded values, the use of surrogate and foreign keys to establish linkage, etc.
There are many dimensions to big data: Huge sample of data (volume), which becomes your universal set and supports deep analysis as well as temporal and spatial analysis; A variety of data (structured and unstructured) that often does not lend itself to SQL based analytics; and often data streaming in (velocity) from multiple sources – an area that will become even more important in the era of the Internet of Things. These are the “Three V’s” that people have been talking about for the past five years.
Like many people, my interest in Object Database technology initially waned in the late 1990’s. That is, until about four years ago when a project at work led me back in this direction. As I dug into the various products I learned that they were alive and doing very well in several niche areas. That finding led to a better understanding of the real value of object databases.
Some products try to be, “All Vs to all people,” but generally what works best is a complementary, integrated set of tools working together as services within a single platform. It makes a lot of sense. So, back to object databases.
One of the things I like most about my job is the business development aspect. One of the product families I’m responsible for is Versant, which includes the Versant Object Database (VOD: high performance, high throughput, high concurrency) and FastObjects (great for embedded applications). I’ve met and worked with some brilliant people who have created amazing products based on this technology. Creative people like these are fun to work with, and helping them grow their business is mutually beneficial. Everyone wins.
An area where VOD excels is the near real-time processing of streaming data. The reason it is so well suited to this task is the way that objects map out in the database: they do so in a way that essentially mirrors reality. So optionality is not a problem – no disjoint queries or missed data, no complex query gyrations to get the correct data set, etc. Things like sparse indexing are no problem with VOD. This means that pattern matching is quick and easy, as is more traditional rule and look-up validation. Polymorphism allows objects, functions, and even data to have more than one form.
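The polymorphism point is easiest to see in code. The sketch below is a hypothetical event model in plain Python — it is not Versant’s actual API — but it shows how a single traversal can span objects of different shapes, with each subtype answering in its own form and no NULL-column gymnastics for optional data.

```python
# Hypothetical streaming-event model -- an illustration of how
# polymorphism sidesteps disjoint queries, not Versant's actual API.

class Event:
    def __init__(self, source):
        self.source = source
    def severity(self):
        return 0

class LoginFailure(Event):
    def __init__(self, source, attempts):
        super().__init__(source)
        self.attempts = attempts
    def severity(self):
        return min(self.attempts, 10)

class PortScan(Event):
    def __init__(self, source, ports):
        super().__init__(source)
        self.ports = ports  # subtype-specific data, no schema padding
    def severity(self):
        return 5 if len(self.ports) > 100 else 2

stream = [LoginFailure("10.0.0.7", attempts=8),
          PortScan("10.0.0.9", ports=list(range(200)))]

# One "query" spans every subtype; each object answers in its own form.
alerts = [e.source for e in stream if e.severity() >= 5]
print(alerts)  # ['10.0.0.7', '10.0.0.9']
```

In a relational design the same query would typically require a join or union across per-type tables; here the object model carries the variation itself.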
VOD does more by allowing data to be more, which is ideal for environments where change is the norm: cyber security, fraud detection, threat detection, logistics, and heuristic load optimization. In each case, the key to success is performance, accuracy, and adaptability.
So, while some things stay the same, others really do change. The ubiquity of devices generating data today, combined with the desire for people and companies to leverage that data for commercial and non-commercial benefit, really is very different than what we saw 10 or more years ago. Products like VOD are working their way up that Slope of Enlightenment because there is a need to connect the dots better and faster – especially as the volume and variety of those dots increases. It is not a, “one size fits all” solution, but it is the perfect tool for this type of work.
These are exciting times.
I started this blog with the goal of it being an “idea exchange,” as well as a way to pass along lessons learned to help others. One of the things that has surprised me is how different the comments and likes are for each post. Feedback from the last post was even more diverse and surprising than usual. It ranged from comments about “Siri vs Google,” to feedback about Sci-Fi books and movies, to Artificial Intelligence.
I asked a few friends for feedback and received something very insightful (Thanks Jim). He stated that he found the blog interesting, but wasn’t sure what the objective was. He went on to identify several possible goals for the last post. Strangely enough, his comments mirrored the type of feedback that I received. That pointed out an area for improvement to me, and I appreciated that.
This also got me thinking about a white paper written 12-13 years ago by someone I used to work with. It was about how Bluetooth was going to be the “next big thing.” He had read an IEEE paper or something and saw potential for this new technology. His paper provided the example of your toaster and coffee maker communicating so that your breakfast would be ready when you walk into the kitchen in the morning.
At that time I had a couple of thoughts. First, who cared about something that only had a 20-30 foot range when WiFi was becoming popular and had much greater range? In addition, a couple of years earlier I had a tour of the Microsoft “House of the Future,” in which everything was automated and key components communicated with each other. But everything in the house was hardwired or used WiFi – not Bluetooth. So, it was easy to dismiss his assertion because it seemed too abstract.
Looking back now I view that white paper as having insight (if it were visionary he would have come out with the first Bluetooth speakers, or car interface, or even phone earpiece and gotten rich), but it failed to present use cases that were easy enough to understand yet different enough from what was available at the time to demonstrate the real value of the idea. His expression of the idea was not tangible enough and therefore too slippery to be easily grasped.
I’m a huge believer that good ideas sometimes originate where you least expect them. Often those ideas are incremental in nature – seemingly simple and sometimes borderline obvious, but building on some other idea or concept. An idea does not need to be unique in order to be important or valuable, but it does need to be presented in a way that is easy to understand and tangible. That is just good communication.
One of the things I miss most from when my consulting company was active was the interaction between a couple of key people (Jason and Peter) and myself. Those guys were very good at taking an idea and helping build it out. This worked well because we had some common expertise and experiences, but we also had skills and perspectives that were more complementary in nature. That diversity increased the depth and breadth of our efforts to develop and extend those ideas.
Our discussions were creative and highly collaborative, and also a lot of fun. Each of us improved from them, and the outcome was usually something viable from a commercial perspective. As a growing and profitable small business you need to constantly innovate to differentiate yourself from your competition. Our discussions were driven as much by necessity as they were by intellectual curiosity.
So, back to the last post. I view various technologies as building blocks. Some are foundational and others are complementary. To me, the key is not viewing those various technologies as competing with each other. Instead, I look for the value in integrating them with each other. It is not always possible and does not always lead to something better, but occasionally it does. With regard to voice technology, I do believe that we will see more, better and smarter applications of it.
While today’s smart phones would not pass the Turing Test or proposed alternatives, they are an improvement over more simplistic voice translation tools available just a few years ago. Advancement requires the tool to understand context in order to make inferences. This brings you closer to machine learning, and big data (when done right) increases that potential. Ultimately, this all leads to Artificial Intelligence (at least in my mind). It’s a big leap from a simple voice translation tool to AI, but when viewed as building blocks it is not such a stretch.
Now think about creating an interface (API) that allows one smart device to communicate with another in something akin to the collaborative efforts described above with my old team. It’s not simply having a front-end device exchanging keywords or queries with a back-end device. Instead, it is two or more devices and/or systems having a “discussion” about what is being requested, looking at what each component “knows,” asking clarifying questions and making suggestions, and then finally taking that multi-dimensional understanding to determine what is really needed.
So, possibly not true AI, but a giant leap forward from what we have today. That would help turn science fiction of the past into science fact in the near future. The better the understanding and inferences by the smart system, the better the results. I also believe there will be an unintended consequence of these new smart systems: the more human-like they become in their approach, the more likely they will be to make errors. But hopefully they will be able to back-test recommendations to help minimize errors, and be intelligent enough to monitor results and make suggestions about corrective actions when they determine that a recommendation was not optimal. And even more importantly, there won’t be an ego creating a distortion filter on the results.
A lot of the building blocks required to create these new systems are available today. But, it takes both vision and insight to see that potential, translate ideas from slippery and abstract to tangible and purposeful, and then start building something really cool. As that happens we will see a paradigm shift in how we interact with computers and how they interact with us. That will lead us to the systematic integration that I wrote about in a big data / nanotechnology post.
So, what is the objective of this post? To get people thinking about things in a different way, to foster collaboration and partnerships between businesses and educational institutions to push the limits of technology, and to foster discussion about what others believe the future of computing and smart devices will look like. I’m confident that I will see these types of systems in my lifetime, and see the possibility of a lot of this occurring within the next decade.
What are your thoughts?
Recently I was helping one of my children research a topic for a school paper. She was doing well, but the results she was getting were overly broad. So, I taught her some “Google-Fu,” explaining how you can structure queries in ways that yield better results. She commented that the searches should be smarter than that, and I explained that sometimes the problem is that search engines look at your past searches and customize results as an attempt to appear smarter. Unfortunately, those results can be skewed and potentially lead someone in the wrong direction. It was a good reminder that getting the best results from search engines often requires a bit of skill and query planning.
Then the other day I saw this commercial from Motel 6 (“Gas Station Trouble”) where a man has problems getting good results from his smart phone. That reminded me of seeing someone speak to their phone, getting frustrated by the responses received. His questions went something like this: “Siri, I want to take my wife to dinner tonight, someplace that is not too far away, and not too late. And she likes to have a view while eating so please look for something with a nice view. Oh, and we don’t want Italian food because we just had that last night.” Just as amazing as the question being asked was watching him ask it over and over again in the exact same way, each time becoming even more frustrated. I asked myself, “Are smart phones making us dumber?” Instead of contemplating that question I began to think about what future smart interfaces would or could be like.
I grew up watching Sci-Fi computer interfaces like “Computer” on Star Trek (1966), “HAL” in 2001: A Space Odyssey (1968), “KITT” from Knight Rider (1982), and “Samantha” from Her (2013). These interfaces had a few things in common: They responded to verbal commands; They were interactive – not just providing answers, but also asking qualifying questions and allowing for interrupts to drill-down or enhance the search (e.g., with pictures or questions that resembled verbal Venn diagrams); They often provided suggestions for alternate queries based on intuition. Despite having 50 years of science fiction examples we are still a long way off from realizing that goal. Like many new technologies, these interfaces were envisioned by science fiction writers long before they appeared in real products.
There seems to be a spectrum of common beliefs about modern interfaces. On one end there are products that make visualization easy, facilitating understanding, refinement and drill-down of data sets. Tableau is a great example of this type of easy to use interface. At the other end of the spectrum the emphasis is on back-end systems – robust computer systems that digest huge volumes of data and return the results to complex queries within seconds. The Actian Analytics Platform is a great example of a powerful analytics platform. In reality, you really need both if you want to maximize the full potential of either.
But, there is so much more to be done. I predict that within the next 3 – 5 years we will see business and consumer examples that are closer to the verbal interfaces from those familiar Sci-Fi shows (albeit with limited capabilities and no flashing lights). Within the next 10 years I believe we will have computer interfaces that intuit our needs and facilitate generating the correct answers quickly and easily. While this is unlikely to be at the level of “The world’s first intelligent Operating System” envisioned in the movie “Her,” and probably won’t even be able to read lips like “HAL,” it should be much more like HAL and KITT than like Siri (from Apple) or Cortana (from Microsoft). Siri was groundbreaking consumer technology when it was introduced. Cortana seems to have taken a small leap ahead. While I have not mentioned Google Now, it is somewhat of a latecomer to this consumer smart interface party, and in my opinion is behind both Siri and Cortana.
So, what will this future smart interface do? It will need to be very powerful, harnessing a natural language interface on the front-end with an extremely flexible and robust analytics interface on the back-end. The language interface will need to take a standard question (in multiple languages and dialects) – just as if you were asking a person, deconstruct it using Natural Language Processing (NLP), and develop the proper query based on the available data. That is important but only gets you so far.
Data will come from many sources – things that we consider today with relational, object, and graph databases. There will be structured and unstructured data that must be joined and filtered quickly and accurately. In addition, context will be more important than ever. Pictures and videos could be scanned for facial recognition and location (via geotagging), and in the case of videos, speech could be analyzed as well. Relationships will be identified and inferred based on a variety of sources, using both data and metadata. Sensors will collect data from almost everything we do and (someday) wear, which will provide both content and context. The use of stylometry will identify outside content likely related to the people involved in the query and provide further context about interests, activities, and even biases. This is how future interfaces will truly understand (not just interpret), intuit (so they can determine what you really want to know), and then present results that may be far more accurate than we are used to today. Because the interface is interactive in nature, it will provide the ability to organize and analyze subsets of data quickly and easily.
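As a thought experiment, the NLP “deconstruction” step might look something like the toy sketch below: a spoken request is reduced to structured query filters. The keyword rules, default radius, and cuisine list are all hypothetical assumptions for illustration — a production NLP stack would use parsing, entity recognition, and context models instead.

```python
# A deliberately simplistic sketch of deconstructing a spoken request
# into structured query filters. All rules here are illustrative
# assumptions, not a real NLP implementation.

def deconstruct(request):
    filters = {}
    text = request.lower()
    if "not too far" in text or "nearby" in text:
        filters["max_distance_km"] = 10  # assumed default radius
    if "view" in text:
        filters["has_view"] = True
    for cuisine in ("italian", "thai", "mexican"):
        if f"don't want {cuisine}" in text or f"no {cuisine}" in text:
            filters.setdefault("exclude_cuisines", []).append(cuisine)
    return filters

query = deconstruct(
    "Find dinner someplace not too far away, with a nice view, "
    "and we don't want Italian food."
)
print(query)
# {'max_distance_km': 10, 'has_view': True, 'exclude_cuisines': ['italian']}
```

Notice this is exactly the request from the Motel 6 anecdote earlier – the hard part is not matching keywords, but knowing when to ask a clarifying question instead of guessing.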
So, where do I think that this technology will originate? I believe that it will be adapted from video game technology. Video games have consistently pushed the envelope over the years, helping drive the need for higher bandwidth I/O capabilities in devices and networks, better and faster graphics capabilities, and larger and faster storage (which ultimately led to flash memory and even Hadoop). Animation has become very lifelike and games are becoming more responsive to audio commands. It is not a stretch of the imagination to believe that this is where the next generation of smart interfaces will be found (instead of from the evolution of current smart interfaces).
Someday it may no longer be possible to “tweak” results through the use or omission of keywords, quotation marks, and flags. Additionally, it may no longer be necessary to understand special query languages (SQL, NoSQL, SPARQL, etc.) and syntax. We won’t have to worry as much about incorrect joins, spurious correlations and biased result sets. Instead, we will be given the answers we need – even if we don’t realize that this was what we needed in the first place. At that point computer systems may appear nearly omniscient.
When this happens parents will no longer need to teach their children “Google-Fu.” Those are going to be interesting times indeed.
Back in early 2011, 15 other members of the Executive team at Ingres and I were taking a bet on the future of our company. We knew that we needed to do something big and bold, and decided to build what we thought the standard data platform would be in 5-7 years. A small minority of the people on that team did not believe this was possible and left, while the rest of us focused on making that happen. There were three strategic acquisitions to fill in the gaps in our Big Data platform. Today (as Actian) we have nearly achieved our goal. It was a leap of faith back then, but our vision turned out to be spot-on and our gamble is paying off today.
Every day my mailbox is filled with stories, seminars, white papers, etc. about Big Data. While it feels like this is becoming more mainstream, it is interesting to read and hear the various comments on the subject. They range from, “It’s not real” and “It’s irrelevant” to “It can be transformational for your business” to “Without big data there would be no <insert company name here>.”
What I continue to find amazing is hearing comments about big data being optional. It’s not – that genie has already been let out of the bottle. There are incredible opportunities for those companies that understand and embrace the potential. I like to tell people that big data can be their unfair advantage in business. Is that really the case? Let’s explore that assertion and find out.
We live in the age of the “Internet of Things.” Data about nearly everything is everywhere, and so are the tools to correlate that data to gain an understanding of so many things (activities, relationships, likes and dislikes, etc.). With smart devices that enable mobile computing we have the extra dimension of location. And, with technologies such as graph databases (queried with languages like SPARQL), graphical interfaces to analyze that data (such as Sigma), and identification techniques such as stylometry, it is getting easier than ever to identify and correlate that data.
We are generating increasingly larger and larger volumes of data about everything we do and everything going on around us, and tools are evolving to make sense of that data better and faster than ever. Those organizations that perform the best analysis, get the answers fastest, and act on that insight quickly are more likely to win than the organizations that look at a smaller slice of the world or adopt a “wait and see” posture. So, that seems like a significant advantage in my book. But, is it an unfair advantage?
First, let’s keep in mind that big data is really just another tool. Like most tools it has the potential for misuse and abuse. And, whether a particular application is viewed as “good” or “bad” is dependent on the goals and perspective of the entity using the tool (which may be the polar opposite view of the groups of people targeted by those people or organizations). So, I will not attempt to make judgments about the various use cases, but rather present a few use cases and let you decide.
Scenario 1 – Sales Organization: What if you could not only understand what you were being told a prospect company needs, but also had a way to validate and refine that understanding? That’s half the battle in sales (budget, integration, and support / politics are other key hurdles). Data that helped you understand not only the actions of that organization (customers and industries, sales and purchases, gains and losses, etc.), but also the goals, interests and biases of the stakeholders and decision makers. This could provide a holistic view of the environment and allow you to provide a highly targeted offering, with messaging tailored to each individual. That is possible, and I’ll explain soon.
Scenario 2 – Hiring Organization: As a hiring manager there are many questions that cannot be asked. While I’m not an attorney, I would bet that State and Federal laws have not kept pace with technology. And, while those laws vary state by state, there are likely loopholes that allow for use of public records. Moreover, implied data that is not officially taken into consideration could color the judgment of a hiring manager or organization. For instance, if you wanted to “get a feeling” if a candidate might fit-in with the team or the culture of the organization, or have interests and views that are aligned with or contrary to your own, you could look for personal internet activity that would provide a more accurate picture of that person’s interests.
Scenario 3 – Teacher / Professor: There are already sites in use to search for plagiarism in written documents, but what if you had a way to make an accurate determination about whether an original work was created by your student? There are people who, for a price, will do the work and write a paper for a student. So, what if you could not only determine that the paper was not written by your student, but also determine who the likely author was?
Do some of these things seem impossible, or at least implausible? Personally, I don’t believe so. Let’s start with the typical data that our credit card companies, banks, search engines, and social network sites already have related to us. Add to that the identified information that is available for purchase from marketing companies and various government agencies. That alone can provide a pretty comprehensive view of us. But, there is so much more that’s available.
Think about the potential of gathering information from intelligent devices that are accessible through the Internet, or your alarm and video monitoring system, etc. These are intended to be private data sources, but one thing history has taught us is that anything accessible is subject to unauthorized access and use (just think about the numerous recent credit card hacking incidents).
Even de-identified data (medical / health / prescription / insurance claim data is one major example), which receives much less protection and can often be purchased, could be correlated with a reasonably high degree of confidence to gain understanding on other “private” aspects of your life. The key is to look for connections (websites, IP addresses, locations, businesses, people), things that are logically related (such as illnesses / treatments / prescriptions), and then make as accurate of an identification as possible (stylometry looks at things like sentence complexity, function words, co-location of words, misspellings and misuse of words, etc. and will likely someday take into consideration things like idea density). It is nearly impossible to remain anonymous in the Age of Big Data.
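The stylometric features listed above can be sketched in a few lines. The function-word list and two features below are a bare-minimum illustration of the fingerprinting idea; production stylometry uses far richer features (character n-grams, syntax, and someday idea density, as noted above), so treat this as a sketch, not a working attribution tool.

```python
# A minimal stylometric "fingerprint": average sentence length and
# function-word rate. Illustrative only -- real attribution systems
# use much richer feature sets.
import re
from collections import Counter

FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "it"}

def fingerprint(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in FUNCTION_WORDS)
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "function_word_rate": sum(counts.values()) / len(words),
    }

sample = "The data tells a story. It is the analyst's job to find it."
print(fingerprint(sample))
```

Comparing fingerprints like these across a known corpus and an “anonymous” document is the basic mechanism by which de-identified text can be re-linked to its author.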
There has been a paradigm shift when it comes to the practical application of data analysis, and the companies that understand this and embrace it will likely perform better than those who don’t. There are new ethical considerations that arise from this technology, and likely new laws and regulations as well. But for now, the race is on!
In an earlier post I mentioned that one of the big benefits of geospatial technology is its ability to show connections between complex and often disparate data sets. As you work with Big Data you tend to see the value of these multi-layered and often multi-dimensional perspectives of a trend or event. While that can lead to incredible results, it can also lead to spurious correlations of data.
First, let me state that I am not a Data Scientist or Statistician, and there are definitely people far more expert on this topic than I am. But, if you are like the majority of companies out there experimenting with geospatial and big data, it is likely that your company doesn’t have these experts on staff. So, a little awareness, understanding, and caution can go a long way in this type of scenario.
Before we dig into that more, let’s think about what your goal is. Do you want to be able to identify and understand a particular trend (reinforcing actions and/or behavior), or do you want to understand what triggers a specific event (initiating a specific behavior)? Both are important, but they are different. My personal focus has been on identification of trends so that you can leverage or exploit them for commercial gain. While that may sound a bit ominous, it is really what business is all about.
There is a common saying that goes, “Correlation does not imply causation.” A common example is that for a large fire you may see a large number of fire trucks. There is a correlation, but it does not imply that fire trucks cause fires. Now, extending this analogy, let’s assume that in a major city the probability of multi-tenant buildings catching fire is relatively high. Since it is a big city, it is likely that most of those apartments or condos have WiFi hotspots. A spurious correlation would be to imply that WiFi hotspots cause fires.
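The fire-truck example can even be simulated. In the sketch below, building size is the hidden confounder: it drives both the number of WiFi hotspots and the fire risk, so the two correlate strongly even though neither causes the other. The coefficients and noise levels are made-up numbers chosen only to illustrate the effect.

```python
# A tiny simulation of a spurious correlation: building size (the
# confounder) drives both hotspot count and fire risk, so the two
# correlate despite having no causal link. All parameters are made up.
import random

random.seed(42)
buildings = []
for _ in range(500):
    units = random.randint(1, 200)                    # building size (confounder)
    hotspots = units * 0.8 + random.gauss(0, 5)       # bigger building -> more WiFi
    fire_risk = units * 0.01 + random.gauss(0, 0.3)   # bigger building -> more fires
    buildings.append((hotspots, fire_risk))

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx ** 0.5 * vy ** 0.5)

r = pearson(buildings)
print(f"hotspots vs fire risk: r = {r:.2f}")  # strongly positive, yet no causal link
```

Conditioning on building size (e.g., correlating within groups of similar-sized buildings) would make the apparent relationship largely disappear – which is exactly the kind of follow-up analysis the next paragraph argues for.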
As you can see, there is definitely potential to misunderstand the results of correlated data. More logical analysis would lead you to see the relationships between the type of building (multi-tenant residential housing) and technology (WiFi) or income (middle-class or higher). Taking the next step to understand the findings, rather than accepting them at face value, is very important.
Once you have what looks to be an interesting correlation there are many fun and interesting things you can do to validate, refine, or refute your hypothesis. It is likely that even without high-caliber data experts and specialists you will be able to identify correlations and trends that can provide you and your company with a competitive advantage. Don’t let the potential complexity become an excuse for not getting started, because as you can see above it is possible to gain insight and create value with a little effort and simple analysis.
I was reading an article from Nancy Duarte about Strengthening Culture with Storytelling, and it made me think about how important a skill storytelling can be in business, and how it can be far more effective than just presenting facts and data. These are just a few examples – I’m sure that you have many of your own.
One of the best sales people I’ve ever known wasn’t a sales person at all. He is Jon Vice, former CEO of the Children’s Hospital of Wisconsin. Jon is very personable and has the ability to make each person feel like they are the most important person in the room (quite a skill in itself). Jon would talk to a room of people and tell a story. Mid-story you were hooked. You completely bought what he was selling, often without knowing what the “ask” was. It was amazing to experience.
Years ago when my company was funding medical research projects, my oldest daughter (then only four years old) and I watched a presentation on the mid-term findings of one of the projects. The MD/Ph.D. giving the presentation was impressive, but what he showed was slide after slide of data. After 10-15 minutes my daughter held her Curious George stuffed animal up in front of her (where the shadow would be seen on the screen) and proclaimed, “Boring!”
Six months later that same person gave his wrap-up presentation. It was short, told an interesting story that explained why these findings were important, laying the groundwork for a follow-on project. A few years later he commented that this was a very valuable lesson because the story with data was far more compelling than just the data itself.
A few years ago the company I work for introduced a high-performance analytics database. We touted that our product was 100 times faster than other products, which happened to be a similar message used by a handful of competitors. In my region we created a “Why Fast Matters” webinar series and told the stories of our early Proof of Value efforts. This helped my team make the first few sales of this new product. People understood our value proposition because these success stories made it tangible.
What I tell my team is to weave the thread of our value proposition into the fabric of a prospect’s story. This both makes us part of the story, and also makes this new story their own (as opposed to being our story). This simple approach has been effective, and also helps you qualify out sooner if you can’t improve the story.
What if you’re not selling anything? Your data still has a story to tell – even more so with big data. Whether you are analyzing data from a single source (such as audit or log data), or correlating data from multiple sources, the data is telling you a story. Whether patterns, trends, or correlated events – the story is there. And once you find it there is so much you can do to build it out.
Whether you are selling, managing, teaching, coaching, analyzing, or just hanging out with friends or colleagues, being able to entertain with a story is a valuable skill. This is a great way to make a lot of things in business even more interesting and memorable. So, give it a try.