The last few months have been very disruptive to nearly everyone across the globe. There are business challenges galore, such as managing large remote workforces (many of whom are new to working remotely) and managing risk while attempting to conduct “business as usual.” Unfortunately for most businesses, their systems, processes, and internal controls were not designed for this “new normal.”
While there have been many predictions about Blockchain over the past few years, it is still not widely adopted. We are beginning to see an uptick in adoption in Supply Chain Management Systems, for reasons that include the traceability of items – especially food and drugs. But large-scale adoption has been elusive to date.
My personal belief is that we will soon begin to see large shifts in mindset, investment, and effort toward modern digital technology, driven by Data Governance and Risk Management. I also believe that these technologies will become easier to use via new platforms and integration tools, and that this will lead to faster adoption by SMBs and other non-Enterprise organizations.
Here are a few predictions:
- New wearable technology supporting Medical IoT will be developed to help provide an early warning system for disease and future pandemics. That will fuel a number of innovations in various industries including Biotech and Pharma.
- Blockchain can provide the necessary data privacy, data ownership, and data provenance to ensure the veracity of that data.
- New legislation will be created to protect medical providers and other users of that data from being liable for missing information or trends that could have saved lives or avoided some other negative outcome.
- In the meantime, Hospitals, Insurance Providers, and others will do everything possible to mitigate the risk of using the Medical IoT data, which could include Smart Contracts as a way to ensure compliance (which assumes that there is a benefit being provided to the data providers).
- Platforms may be created to offer individuals control over their own data, how it is used and by whom, ownership of that data, and payment for the use of that data. This is something that I wrote about in 2013.
- Data Governance will be taken more seriously by every business. Today companies talk about Data Privacy, Data Security, or Data Consistency, but few have a strategic end-to-end systematic approach to managing and protecting their data and their company.
- Comprehensive Data Governance will become both a driving and gating force as organizations modernize and grow. Even before the pandemic there were growing needs due to new data privacy laws and concerns around areas such as the data used for Machine Learning.
- In a business environment where more systems are distributed, there is an increased risk of data breaches and cybercrime. That risk will need to be addressed as a foundational component of any new system.
- One or two Data Integration Companies will emerge as undisputed industry leaders due to their capabilities around MDM, Data Provenance & Traceability, and Data Access (an area typically managed by application systems).
- New standardized APIs akin to HL7 FHIR will be created to support a variety of industries as well as interoperability between systems and industries.
- Anything that can be maintained and managed in a secure and flexible distributed digital environment will be implemented as a way to allow companies to quickly pivot and adapt to new challenges and opportunities on a global scale.
- Smart Contracts and Digital Currency Payment Processing Systems will likely be core components of those systems.
- This will also foster the growth of next generation Business Ecosystems and collaborations that will be more dynamic in nature.
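The data provenance idea in the Blockchain prediction can be illustrated with a hash chain, the basic data structure a blockchain builds on. This is a minimal sketch under stated assumptions: it has no consensus, digital signatures, or distribution, and the `ProvenanceLog` class and the reading fields (`device`, `heart_rate`, `ts`) are hypothetical. Each entry stores the SHA-256 hash of the previous entry's hash plus its own payload, so quietly altering an earlier wearable reading breaks verification of the whole chain.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class ProvenanceLog:
    """Append-only, hash-chained log of readings (hypothetical; no consensus or signatures)."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, reading: dict) -> str:
        # Canonical JSON (sorted keys) so the same reading always hashes the same way.
        payload = json.dumps(reading, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, reading: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        digest = self._digest(prev_hash, reading)
        # Store a copy so later changes to the caller's dict do not leak into the log.
        self.entries.append({"reading": dict(reading), "prev_hash": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute every link; an edited reading or a broken link fails the check.
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["reading"]):
                return False
            prev_hash = entry["hash"]
        return True


log = ProvenanceLog()
log.append({"device": "wearable-001", "heart_rate": 72, "ts": "2020-06-01T08:00:00Z"})
log.append({"device": "wearable-001", "heart_rate": 75, "ts": "2020-06-01T08:01:00Z"})
print(log.verify())  # True
log.entries[0]["reading"]["heart_rate"] = 60  # tamper with an earlier reading
print(log.verify())  # False
```

A real ledger would add signatures and a way for multiple parties to agree on the chain; the hash links alone only make tampering detectable, not impossible.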
All in all, this is exciting from a business and technology perspective. It will require most companies to review and adjust their strategies and tactics to embrace these concepts and adapt to the coming New Normal.
The steps we take today will shape what we see and do in the coming decade, so it is important to get this right quickly, knowing that whatever is implemented today will evolve and improve over time.
Recently I was helping one of my children research a topic for a school paper. She was doing well, but the results she was getting were overly broad. So, I taught her some “Google-Fu,” explaining how you can structure queries in ways that yield better results. She replied that search engines should be smarter than that. I explained that sometimes the problem is that search engines look at your past searches and customize results as an attempt to appear smarter or to motivate someone to do or believe something.
Unfortunately, those results can be skewed and potentially lead someone in the wrong direction. It was a good reminder that getting the best results from search engines often requires a bit of skill and query planning, as well as occasional third-party validation.
Then the other day I saw a Motel 6 commercial (“Gas Station Trouble”) in which a man has trouble getting good results from his smartphone. That reminded me of watching someone speak to his phone and grow frustrated with the responses. His request went something like this:
“Siri, I want to take my wife to dinner tonight, someplace that is not too far away, and not too late. And she likes to have a view while eating so please look for something with a nice view. Oh, and we don’t want Italian food because we just had that last night.”
Just as amazing as the question being asked was watching him ask it over and over again in the exact same way, each time becoming even more frustrated. I asked myself, “Are smartphones making us dumber?” Instead of contemplating that question I began to think about what future smart interfaces would or could be like.
I grew up watching Sci-Fi computer interfaces like “Computer” on Star Trek (1966), “HAL” on 2001: A Space Odyssey (1968), “KITT” from Knight Rider (1982), and “Samantha” from Her (2013). These interfaces had a few things in common:
- They responded to verbal commands;
- They were interactive – not just providing answers, but also asking qualifying questions and allowing for interrupts to drill-down or enhance the search (e.g., with pictures or questions that resembled verbal Venn diagrams);
- They often provided suggestions for alternate queries based on intuition. That would have been helpful for the gentleman trying to find a restaurant.
Despite having 50 years of science fiction examples, we are still a long way from realizing the goal of a truly intelligent interface. Like many new technologies, these interfaces were envisioned by science fiction writers long before they appeared in the real world.
There seems to be a spectrum of common beliefs about modern interfaces. On one end there are products that make visualization easy, facilitating understanding, refinement and drill-down of data sets. Tableau is a great example of this type of easy to use interface. At the other end of the spectrum the emphasis is on back-end systems – robust computer systems that digest huge volumes of data and return the results to complex queries within seconds. Several other vendors offer powerful analytics platforms. In reality, you really need a strong front-end and back-end if you want to achieve the full potential of either.
But, there is so much more potential…
I predict that within the next 3 – 5 years we will see business and consumer interface examples (powered by Natural Language Processing, or NLP) that are closer to the verbal interfaces from those familiar Sci-Fi shows (albeit with limited capabilities and no flashing lights).
Within the next 10 years I believe we will have computer interfaces that intuit our needs and facilitate the generation of correct answers quickly and easily. While this is unlikely to be at the level of “The world’s first intelligent Operating System” envisioned in the movie “Her,” and probably won’t even be able to read lips like “HAL,” it should be much more like HAL and KITT than like Siri (from Apple) or Cortana (from Microsoft).
Siri was groundbreaking consumer technology when it was introduced, and Cortana seems to have taken a small leap ahead. Google Now, which I have not mentioned until now, is somewhat of a latecomer to this consumer smart interface party and, in my opinion, is behind both Siri and Cortana.
So, what will this future smart interface do? It will need to be very powerful, pairing a natural language interface on the front-end with an extremely flexible and robust analytics interface on the back-end. The language interface will need to take an ordinary question (in multiple languages and dialects), phrased just as if you were asking a person, deconstruct it using Natural Language Processing, and build the proper query based on the available data. That is important, but it only gets you so far.
Data will come from many sources, including the kinds we handle today with relational, object, graph, and NoSQL databases. There will be structured and unstructured data that must be joined and filtered quickly and accurately. In addition, context will be more important than ever. Pictures and videos could be scanned for facial recognition and location (via geotagging), and in the case of videos, speech could be analyzed as well. Relationships will be identified and inferred from a variety of sources, using both data and metadata. Sensors will collect data from almost everything we do and (someday) wear, providing both content and context.
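As a toy illustration of that deconstruction step, the sketch below turns the restaurant request from earlier into structured constraints and then into a parameterized SQL query. It uses simple keyword and regular-expression matching, not real Natural Language Processing, and the cuisine vocabulary, the numeric meanings assumed for “not too far” and “not too late,” and the `restaurants` table and its columns are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical cuisine vocabulary; a real system would rely on a trained NLP model.
CUISINES = ["italian", "mexican", "thai", "chinese", "french", "indian"]


@dataclass
class RestaurantQuery:
    max_distance_km: Optional[float] = None
    latest_time: Optional[str] = None
    wants_view: bool = False
    excluded_cuisines: List[str] = field(default_factory=list)


def deconstruct(utterance: str) -> RestaurantQuery:
    """Keyword-based stand-in for NLP: map phrases to structured constraints."""
    text = utterance.lower()
    q = RestaurantQuery()
    if "not too far" in text:
        q.max_distance_km = 10.0  # assumed meaning of "not too far"
    if "not too late" in text:
        q.latest_time = "20:00"   # assumed meaning of "not too late"
    if "view" in text:
        q.wants_view = True
    for cuisine in CUISINES:
        # Matches "don't want X", "don’t want X" (curly apostrophe), or "do not want X".
        if re.search(rf"(?:don['’]t|do not) want {cuisine}", text):
            q.excluded_cuisines.append(cuisine)
    return q


def to_sql(q: RestaurantQuery) -> Tuple[str, list]:
    """Build a parameterized query against a hypothetical `restaurants` table."""
    clauses, params = [], []
    if q.max_distance_km is not None:
        clauses.append("distance_km <= ?")
        params.append(q.max_distance_km)
    if q.latest_time is not None:
        clauses.append("seating_time <= ?")
        params.append(q.latest_time)
    if q.wants_view:
        clauses.append("has_view = 1")
    for cuisine in q.excluded_cuisines:
        clauses.append("cuisine <> ?")
        params.append(cuisine)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT name FROM restaurants WHERE {where}", params


request = ("I want to take my wife to dinner tonight, someplace that is not too far away, "
           "and not too late. And she likes to have a view while eating so please look for "
           "something with a nice view. Oh, and we don’t want Italian food because we just "
           "had that last night.")
sql, params = to_sql(deconstruct(request))
print(sql)
print(params)  # [10.0, '20:00', 'italian']
```

Separating “understand the request” from “build the query” is the point of the sketch: a smarter front-end could replace `deconstruct` without touching the query side.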
The use of Stylometry will identify outside content likely related to the people involved in the query and provide further context about interests, activities, and even biases. This is how future interfaces will truly understand (not just interpret), intuit (so it can determine what you really want to know), and then present results that may be far more accurate than we are used to today. Because the interface is interactive in nature it will provide the ability to organize and analyze subsets of data quickly and easily.
So, where do I think that this technology will originate? I believe that it will be adapted from video game technology. Video games have consistently pushed the envelope over the years, helping drive the need for higher bandwidth I/O capabilities in devices and networks, better and faster graphics capabilities, and larger and faster storage (which ultimately led to flash memory and even Hadoop). Animation has become very lifelike and games are becoming more responsive to audio commands. It is not a stretch of the imagination to believe that this is where the next generation of smart interfaces will be found (instead of from the evolution of current smart interfaces).
Someday it may no longer be possible to “tweak” results through the use or omission of keywords, quotation marks, and flags. Additionally, it may no longer be necessary to understand special query languages (SQL, NoSQL, SPARQL, etc.) and syntax. We won’t have to worry as much about incorrect joins, spurious correlations and biased result sets. Instead, we will be given the answers we need – even if we don’t realize that this was what we needed in the first place. At that point computer systems may appear nearly omniscient.
When this happens, parents will no longer need to teach their children “Google-Fu.” Those are going to be interesting times indeed.
In early 2011, 15 other members of the Executive team at Ingres and I were taking a bet on the future of our company. We knew that we needed to do something big and bold, and we decided to build what we thought the standard data platform would be in 5-7 years. A small minority of the people on that team did not believe this was possible and left, while the rest of us focused on making it happen. We made three strategic acquisitions to fill in the gaps in our Big Data platform. Today (as Actian) we have nearly achieved our goal. It was a leap of faith back then, but our vision turned out to be spot-on and our gamble is paying off today.
Every day my mailbox is filled with stories, seminars, white papers, etc. about Big Data. While it feels like this is becoming more mainstream, it is interesting to read and hear the various comments on the subject. They range from, “It’s not real” and “It’s irrelevant” to “It can be transformational for your business” to “Without big data there would be no <insert company name here>.”
What I continue to find amazing is hearing comments about big data being optional. It’s not – that genie has already been let out of the bottle. There are incredible opportunities for those companies that understand and embrace the potential. I like to tell people that big data can be their unfair advantage in business. Is that really the case? Let’s explore that assertion and find out.
We live in the age of the “Internet of Things.” Data about nearly everything is everywhere, and the tools to correlate that data to gain an understanding of so many things (activities, relationships, likes and dislikes, etc.) are readily available. With smart devices that enable mobile computing, we have the extra dimension of location. And, with technologies such as Graph Databases (queried with languages such as SPARQL), graphical interfaces to analyze that data (such as Sigma), and identification techniques such as Stylometry, it is easier than ever to identify and correlate that data.
We are generating ever-larger volumes of data about everything we do and everything going on around us, and tools are evolving to make sense of that data better and faster than ever. The organizations that perform the best analysis, get answers the fastest, and act on that insight quickly are more likely to win than those that look at a smaller slice of the world or adopt a “wait and see” posture. That seems like a significant advantage in my book. But is it an unfair advantage?
First, let’s keep in mind that big data is really just another tool. Like most tools, it has the potential for misuse and abuse. And whether a particular application is viewed as “good” or “bad” depends on the goals and perspective of the entity using the tool (which may be the polar opposite of the view held by the people being targeted). So, rather than making judgments about the various use cases, I will present a few and let you decide.
Scenario 1 – Sales Organization: What if you could not only understand what a prospect company tells you it needs, but also had a way to validate and refine that understanding? That’s half the battle in sales (budget, integration, and support / politics are other key hurdles). Imagine data that helped you understand not only the actions of that organization (customers and industries, sales and purchases, gains and losses, etc.), but also the goals, interests, and biases of its stakeholders and decision makers. This could provide a holistic view of the environment and allow you to provide a highly targeted offering, with messaging tailored to each individual. That is possible, and I’ll explain how soon.
Scenario 2 – Hiring Organization: As a hiring manager, you are not allowed to ask many questions. While I’m not an attorney, I would bet that State and Federal laws have not kept pace with technology. And, since those laws vary state by state, there are likely loopholes that allow for the use of public records. Moreover, implied data that is not officially taken into consideration could color the judgment of a hiring manager or organization. For instance, if you wanted to “get a feeling” for whether a candidate might fit in with the team or the culture of the organization, or has interests and views that are aligned with or contrary to your own, you could look for personal internet activity that would provide a more accurate picture of that person’s interests.
Scenario 3 – Teacher / Professor: There are already sites in use to search for plagiarism in written documents, but what if you had a way to make an accurate determination about whether an original work was created by your student? There are people who, for a price, will do the work and write a paper for a student. So, what if you could not only determine that the paper was not written by your student, but also determine who the likely author was?
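To make the authorship question concrete, here is a minimal sketch of one classic stylometric signal: how often an author uses common function words. It builds a frequency profile for each text and picks the known author whose profile is closest by cosine similarity. The short word list and the sample texts are illustrative assumptions; practical authorship attribution uses far more text, many more features, and more robust statistics.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# A short, illustrative function-word list; published studies use much longer lists.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for",
                  "with", "as", "but", "on", "not", "which", "by", "this", "be", "or"]


def profile(text: str) -> List[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def likely_author(disputed: str, samples: Dict[str, str]) -> str:
    """Return the candidate whose known writing is most similar in style."""
    target = profile(disputed)
    return max(samples, key=lambda name: cosine(target, profile(samples[name])))


# Toy writing samples (hypothetical).
samples = {
    "student": "the cat of the house and the dog of the yard",
    "ghostwriter": "but it is and it is but not and so",
}
print(likely_author("the roof of the barn and the door of the shed", samples))  # student
```

Function words are a popular starting point because writers use them largely unconsciously, so they change less with topic than content words do.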
Do some of these things seem impossible, or at least implausible? Personally, I don’t believe so. Let’s start with the typical data that our credit card companies, banks, search engines, and social network sites already have related to us. Add to that the identified information that is available for purchase from marketing companies and various government agencies. That alone can provide a pretty comprehensive view of us. But, there is so much more that’s available.
Think about the potential of gathering information from intelligent devices that are accessible through the Internet, or your alarm and video monitoring system, etc. These are intended to be private data sources, but one thing history has taught us is that anything accessible is subject to unauthorized access and use (just think about the numerous recent credit card hacking incidents).
Even de-identified data (medical / health / prescription / insurance claim data is one major example), which receives much less protection and can often be purchased, could be correlated with a reasonably high degree of confidence to gain an understanding of other “private” aspects of your life. The key is to look for connections (websites, IP addresses, locations, businesses, people) and things that are logically related (such as illnesses / treatments / prescriptions), and then make as accurate an identification as possible. Stylometry, for example, looks at things like sentence complexity, function words, co-location of words, and misspellings and misuse of words, and will likely someday take into consideration things like idea density. It is nearly impossible to remain anonymous in the Age of Big Data.
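A minimal sketch of that correlation step is a classic linkage attack: join a “de-identified” dataset to an identified one on quasi-identifiers that both happen to share. The field names (`zip`, `birth_date`, `sex`) and all records below are fictional assumptions; the point is that a unique match on a few ordinary attributes can be enough to attach a name to a supposedly anonymous record.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, str]
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")  # assumed fields present in both datasets


def link(deidentified: Iterable[Record], identified: Iterable[Record],
         keys: Sequence[str] = QUASI_IDENTIFIERS) -> List[Tuple[Record, Record]]:
    """Pair each de-identified record with the single identified record sharing its quasi-identifiers."""
    index = defaultdict(list)
    for person in identified:
        index[tuple(person[k] for k in keys)].append(person)
    matches = []
    for record in deidentified:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # only a unique match counts as a re-identification
            matches.append((record, candidates[0]))
    return matches


# Fictional example data.
claims = [
    {"zip": "12345", "birth_date": "1950-03-14", "sex": "F", "diagnosis": "arthritis"},
    {"zip": "12346", "birth_date": "1962-11-02", "sex": "M", "diagnosis": "asthma"},
]
marketing_list = [
    {"name": "Jane Doe", "zip": "12345", "birth_date": "1950-03-14", "sex": "F"},
    {"name": "John Roe", "zip": "12346", "birth_date": "1962-11-02", "sex": "M"},
    {"name": "Jim Poe", "zip": "12346", "birth_date": "1962-11-02", "sex": "M"},
]
for claim, person in link(claims, marketing_list):
    print(person["name"], "->", claim["diagnosis"])  # Jane Doe -> arthritis
```

The second claim stays anonymous only because two people share its quasi-identifiers; adding one more correlated attribute (a location history, an IP address) could be enough to break the tie.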
There has been a paradigm shift when it comes to the practical application of data analysis, and the companies that understand this and embrace it will likely perform better than those who don’t. There are new ethical considerations that arise from this technology, and likely new laws and regulations as well. But for now, the race is on!
Being in Sales, I have the opportunity to speak with many customers and prospects about many things. Most are interested in Cloud Computing and Big Data, but they often don’t fully understand how they will leverage the technology to maximize the benefit.
Here is a simple three-step process that I use:
1. For Big Data I explain that there is no single correct definition. Because of this I recommend that companies focus on what they need rather than on what to call it. Results are more important than definitions for these purposes.
2. Relate the technology to something people are likely already familiar with (extending those concepts). For example: Cloud computing is similar to virtualization and has many of the same benefits; Big Data is similar to data warehousing. This helps make new concepts more tangible in any context.
3. Provide a high-level explanation of how “new and old” are different, and why new is better, using specific examples they can relate to. For example: Cloud computing often occurs in an external data center, possibly one whose location you may not even know, so security can be even more complex than with in-house systems and applications. It is possible to have both Public and Private Clouds, and a public cloud from a major vendor may be more secure and easier to implement than a similar system using your own hardware.
Big Data is a little bit like my first house. I was newly married, anticipated having children, and also anticipated moving into a larger house in the future. My wife and I started buying things that fit into our vision of the future and storing them in our basement. We were planning for a future that was not 100% known.
But, our vision changed over time and we did not know exactly what we needed until the very end. After 7 years our basement was very full and it was difficult to find things. When we moved to a bigger house we did have a lot of what we needed. But, we also had many things that we no longer wanted or needed. And, there were a few things we wished that we had purchased earlier. We did our best, and most of what we did was beneficial, but those purchases were speculative and in the end there was some waste.
Five years ago, how many of you would have thought that Social Media Sentiment Analysis would be important? How many would have thought that hashtag usage would become so pervasive in all forms of media? How many understood the importance of location information (and even the time stamp for that location)? My guess is fewer than half of all companies.
This ambiguity is both the good and the bad thing about big data. In the old data warehouse days, you knew what was important because it was your data about your business, systems, and customers. While IT may have seemed tough before, it can be much more challenging now. But the payoff can also be much larger, so it is worth the effort. Many times you don’t know what you don’t know – and you just need to accept that.
Now we care about unstructured data (website information, blog posts, press releases, tweets, etc.), streaming data (stock ticker data is a common example), sensor data (temperature, altitude, humidity, location, lateral and horizontal forces), temporal data, etc. Data arrives from multiple sources and likely will have multiple time frame references (e.g., constant streaming versus updates with varying granularity), often in unknown or inconsistent formats.
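One common way to reconcile streams with different time frame references is to resample each one into fixed-width time buckets and then join on the buckets they share. The sketch below assumes each stream is a list of `(timestamp, value)` pairs with timezone-aware timestamps and averages readings within 60-second buckets; the bucket width, the averaging rule, and the sample sensors are illustrative choices, not a prescribed method.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

Reading = Tuple[datetime, float]


def bucket_start(ts: datetime, width_s: int) -> int:
    """Start of the fixed-width bucket containing ts, as a Unix timestamp (use aware datetimes)."""
    return int(ts.timestamp()) // width_s * width_s


def resample(readings: List[Reading], width_s: int = 60) -> Dict[int, float]:
    """Average all readings that fall into the same bucket."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in readings:
        b = bucket_start(ts, width_s)
        totals[b] += value
        counts[b] += 1
    return {b: totals[b] / counts[b] for b in totals}


def align(*streams: List[Reading], width_s: int = 60) -> Dict[int, tuple]:
    """Join streams on buckets present in every stream; other buckets are dropped."""
    resampled = [resample(s, width_s) for s in streams]
    shared = set.intersection(*(set(r) for r in resampled))
    return {b: tuple(r[b] for r in resampled) for b in sorted(shared)}


t0 = datetime(2013, 1, 1, 12, 0, tzinfo=timezone.utc)
# Temperature arrives every 20 seconds; humidity roughly once a minute.
temperature = [(t0 + timedelta(seconds=s), v) for s, v in [(0, 20.0), (20, 21.0), (40, 22.0), (60, 23.0)]]
humidity = [(t0 + timedelta(seconds=5), 40.0), (t0 + timedelta(seconds=65), 42.0)]
for start, values in align(temperature, humidity).items():
    print(datetime.fromtimestamp(start, tz=timezone.utc).isoformat(), values)
# 12:00 bucket -> (21.0, 40.0); 12:01 bucket -> (23.0, 42.0)
```

Dropping buckets that are missing from any stream is the simplest policy; interpolation or carrying the last value forward are common alternatives when gaps matter.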
Robust and flexible data integration, data protection, and data privacy will all become far more important in the near future! This is just the beginning for Big Data.
For several years my company and my family funded a dozen or so medical research projects. I had the pleasure of meeting and working with many brilliant MD/Ph.D. researchers. My goal was to fund $1 million of medical research and find a cure for Arthritis. We didn’t reach that goal, but many good things came out of that research.
Something that amazed me was how research worked. Competition for funding is intense, so there was much less collaboration between institutions than I would have expected. At one point we were funding similar projects at two institutions. The projects went in two very different directions, and it was clear to me that one was going to be much more successful than the other. It seemed almost wasteful, and I thought that there must be a better, more efficient and cost-effective way of managing research efforts.
So, in 2006 I had an idea. What if I could create a cloud-based (a very new term at the time) research platform that would support global collaboration? It would need to support true analytical processing, statistical analysis, document management (something else that was fairly new at the time), and desktop publishing at a minimum. Publishing research findings is very important in this space, so my idea was to provide a workspace that supported end-to-end research efforts (inception to publication) and fostered collaboration.
This platform would only really work if there were a new way to allow interested parties to fund this research that was easy to use and could reach a large audience. People could make contributions based on area of interest, specific projects, specific individuals working on projects, or projects in a specific regional area. The idea was a lot like what Crowdtilt (www.crowdtilt.com) is today. This funding mechanism would support non-traditional collaboration, and would hopefully have a huge impact on the research community and their findings.
Additionally, this platform would support the collection of suggestions and ideas. Good ideas can come from anywhere – especially when you don’t know that something is not supposed to work.
During one funding review meeting I made a naïve statement about using cortisone injections to treat TMJ arthritis. I was told why this would not work. But, a month or so later, I received a call explaining how this might actually work. That led to a research project and positive results (see http://onlinelibrary.wiley.com/doi/10.1002/art.21384/pdf). You never know where the next good idea might come from, so why not make it easy for people to share those ideas?
By the end of 2007, I had designed a service-oriented architecture (SOA) built on open source products that would do most of what I needed. Then, in 2008, Google announced the “Project 10^100” competition. I entered, confident that I would at least get an honorable mention (alas, nothing came of this).
Then, in early 2010, I spent an hour discussing my idea with the CTO of a popular Cloud company. This CTO had a medical background, liked my idea, offered a few suggestions, and even offered to help. It was the perfect opportunity. But I had just started a new position at work, and this fell by the wayside. That was a shame, and I only have myself to blame. It is something that has bothered me for years.
It’s 2013, there are far more tools available today to make this platform a reality, and it still does not exist. I’m writing this because the idea has merit, and I think there might be others who feel the same way and would like to work on making this dream a reality. It’s a chance to leverage technology to potentially make a huge impact on society. And, it can create opportunities for people in regions that might otherwise be ignored to contribute to this greater good.
Idealistic? Maybe. Possible? Absolutely!
Ever since I worked on redesigning a risk management system at an insurance company (1994-1995), I have been impressed by how much better decisions can be with more data – assuming it is the right data. The question “What is the right data?” has intrigued me for years, as what seems like common sense today could have been unknown 5-10 years ago, and may be completely passé 5-10 years from now. Context becomes very important because of the variability of data over time.
And this is what makes Big Data interesting. There really is no right or wrong answer or definition. Having a framework to define, categorize, and use that data is important. And at some point being able to refer to the data in-context will be very important as well. Just think about how challenging it could be to compare scenarios or events from 5 years ago with those of today. It’s not apples-to-apples but could certainly be done. It is pretty cool stuff.
The way I think of Big Data is similar to a water tributary system. Water gets into the system many ways – rains from the clouds, sprinkles from private and public supplies, runoff and overflow, etc. It also has many interesting dimensions, such as quality / purity (not necessarily the same due to different aspects of need), velocity, depth, capacity, and so forth. Not all water gets into the tributary system (e.g., some is absorbed into the groundwater tables, and some evaporates), so data loss is expected. If you think in terms of streams, ponds, rivers, lakes, reservoirs, deltas, etc. there are many relevant analogies that can be made. And just like the course of a river may change over time, data in our water tributary system could also change over time.
Another part of my thinking is based on an experience I had about a decade ago (2002 – 2003 timeframe) working on a project for a Nanotech company. In their labs they were testing various things. There were particles that changed reflectivity based on temperature that were embedded in shingles and paint. There were very small batteries that could be recharged tens of thousands of times, were light, and had more capacity than a 12-volt car battery. And, there was a section where they were doing “biometric testing” for the military. I have since read articles about things like smart fabrics that could monitor the health of a soldier, and do things like apply basic first aid when a problem was detected. This company felt that by 2020 advanced nanotechnology would be widely used by the military, and by 2025 it would be in wide commercial use. Is that still a possibility? Who knows…
Much of what you read today is about the exponential growth of data. I agree with that, but I also believe that the nature and sources of that data will change significantly. For example, nanoparticles in engine oil will provide information about temperature, engine speed and load, and even things like rapid changes in movement (fast take-offs or stops, quick turns). The nanoparticles in the paint will provide weather conditions. The nanoparticles on the seat upholstery will provide information about occupants (number, size, weight). This is similar to the “sensor web” as originally envisioned by Kevin Delin. A lot of data will be generated, but then what?
I believe that time will become an important aspect of every piece of data, and that location (X, Y, and Z coordinates) will be just as important. But, not every sensor will collect location. I believe there will be multiple data aggregators in common use at common points (your car, your house, your watch). Those aggregators will package the available data in something akin to an XML object, which allows flexibility. From my perspective this is where things become very interesting.
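A minimal sketch of such an aggregator packet, using Python's standard `xml.etree.ElementTree`: every reading carries its own timestamp, while location is attached once by the aggregator because individual sensors may not report it. The element and attribute names (`packet`, `reading`, `sensor`, `location`) and the sample values are hypothetical, not an established schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def package(aggregator_id: str, readings: List[Dict],
            location: Optional[Tuple[float, float, float]] = None) -> str:
    """Wrap raw sensor readings in one XML packet; the aggregator supplies X, Y, Z location."""
    root = ET.Element("packet", aggregator=aggregator_id)
    if location is not None:
        x, y, z = location
        ET.SubElement(root, "location", x=str(x), y=str(y), z=str(z))
    for r in readings:
        # Every reading keeps its own timestamp; new sensor types need no schema change.
        el = ET.SubElement(root, "reading", sensor=r["sensor"], time=r["time"])
        el.text = str(r["value"])
    return ET.tostring(root, encoding="unicode")


print(package(
    "car-01",
    [
        {"sensor": "oil_temp_c", "time": "2013-05-01T10:00:00Z", "value": 96.5},
        {"sensor": "seat_load_kg", "time": "2013-05-01T10:00:02Z", "value": 72},
    ],
    location=(40.7128, -74.006, 10.0),
))
```

Because each reading is just a tagged element with attributes, a downstream consumer can pick out the sensors it understands and ignore the rest, which is the flexibility an XML-like envelope offers.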
Currently companies like Google make a lot of money from aggregating data from multiple sources, correlating it to a variety of attributes, and then selling knowledge derived from that plethora of data. I believe that there will be opportunities for individuals to use “data exchanges” to manage, sell, and directly benefit from their own data. The more interesting their data, the more value it has and the more benefit it provides to the person selling it. This could have a huge economic impact, and that would foster both the use and expansion of various commercial ecosystems required to manage the commercial and privacy aspects of this technology.
The next logical step in this vision is “smart everything.” For example, you could buy a shirt that is just a shirt. But, for an extra cost, you could turn on medical monitoring or refractive heating / cooling. And, if you felt there was a market for extra dimensions of data that could benefit you financially, you could enable those sensors as well. Just think of the potential impact this technology would have on commerce in that scenario.
This is what I personally believe will happen within the next decade or so. This won’t be the only type or use of big data. Rather, there will be many valid types and uses of data – some complementary and some completely discrete. It has the potential to become a confusing mess. But, people will find ways to ingest and correlate that data to identify value in it – today or in the future, and decide to store it (potentially forever). Utilizing that data will become a competitive advantage for people and companies knowing how to do something interesting with it. Who knows what will be viewed as valuable data 5-10 years from now, but it will likely be different than what we view as valuable data today.
So, what are your thoughts? Can we predict the future, or simply create platforms that are powerful enough, flexible enough, and extensible enough to change as our perspective of what is important changes? Either way it will be fun!