Data Governance
Using Themes for Enhanced Problem Solving
Thematic Analysis is a powerful qualitative approach used by many consultants. It involves identifying patterns and themes to better understand how and why something happened, which provides the context for other quantitative analyses. It can also be utilized when developing strategies and tactics due to its “cause and effect” nature.
Typical analysis tends to be event-based: something unexpected happened, and some type of triggering or compelling event is sought either to stop something from happening or to make something happen. With enough of the right data, you may be able to identify patterns, which can help predict what will happen next based on past events. This data-based understanding may be simplistic or incomplete, but it is often sufficient.

But people are creatures of habit. If you can identify and understand those habits and place them within the context of a specific environment that includes interactions with others, you may be able to identify patterns within the patterns. Those themes can be much better indicators of what may or may not happen than the data itself. They become better predictors of things to come and can help identify more effective strategies and tactics to achieve your goals.
This approach requires that a person view an event (desired or historical) from various perspectives to help understand:
- Things that are accidental but predictable because of human nature.
- Things that are predictable based on other events and interactions.
- Things that are the logical consequence of a series of events and outcomes.
Aside from the practical implications of this approach, I find it fascinating in relation to AI and Predictive Analytics.
For example, you can monitor data and activities proactively by understanding the recurring themes and triggers. That is actionable intelligence that can be automated and incorporated into a larger system. Machine Learning and Deep Learning can analyze tremendous volumes of data from various sources in real time.
Combine that with Semantic Analysis, which is challenging due to the complexity of taxonomies and ontologies, and the system gains a much better understanding of what is happening and can make more accurate predictions. Add in spatial and temporal data such as IoT, metadata from photographs, etc., and you should be able to view something as though you were very high up – providing the ability to “see” what is on the path ahead. It is obviously not that simple, but it is exciting.
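To make the automation idea concrete, here is a minimal sketch in Python of a theme-and-trigger monitor running over a stream of events. The event types, rule names, and thresholds are hypothetical illustrations, not part of any specific system; a production version would feed Machine Learning and semantic models rather than hand-written rules.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Event:
    timestamp: datetime
    actor: str
    kind: str  # hypothetical event type, e.g. "late_shipment" or "support_ticket"

# A "theme" here is a recurring combination of events within a time window.
THEME_RULES = {
    # Hypothetical rule: three late shipments from the same supplier within 30 days.
    "supplier_slipping": {"kind": "late_shipment", "count": 3, "window": timedelta(days=30)},
}

def monitor(events, rules=THEME_RULES):
    """Yield (theme, actor) each time a recurring pattern crosses its trigger threshold."""
    history = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        for theme, rule in rules.items():
            if ev.kind != rule["kind"]:
                continue
            window = history.setdefault((theme, ev.actor), deque())
            window.append(ev.timestamp)
            # Drop observations that fall outside the rule's time window.
            while window and ev.timestamp - window[0] > rule["window"]:
                window.popleft()
            if len(window) >= rule["count"]:
                yield theme, ev.actor
```

The point of the sketch is simply that once a theme and its trigger are understood, watching for it becomes a mechanical task that can run continuously alongside the rest of the analysis.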
From a practical perspective, keeping these thoughts in mind will help you see details others have missed. That makes for better analysis, better strategies, and better execution.
Who wouldn’t want that?
Blockchain, Data Governance, and Smart Contracts in a Post-COVID-19 World
The last few months have been very disruptive to nearly everyone across the globe. There are business challenges galore, such as managing large remote workforces – many of whom are new to working remotely – and managing risk while attempting to conduct “business as usual.” Unfortunately, most businesses’ systems, processes, and internal controls were not designed for this “new normal.”
While there have been many predictions around Blockchain for the past few years, it is still not widely adopted. We are beginning to see an uptick in adoption within Supply Chain Management Systems, for reasons that include traceability of items – especially food and drugs. However, large-scale adoption has been elusive to date.

I believe we will soon begin to see large shifts in mindset, investments, and effort towards modern digital technology, driven by Data Governance and Risk Management. I also believe this will make these technologies easier to use via new platforms and integration tools, which will drive faster adoption by SMBs and other non-enterprise organizations, and in turn create a greater need for DevOps, Monitoring, and Automation solutions to maintain control of a more agile environment.
Here are a few predictions:
- New wearable technology supporting Medical IoT will be developed to help provide an early warning system for disease and future pandemics. That will fuel a number of innovations in various industries, including Biotech and Pharma.
- Blockchain can provide data privacy, ownership, and provenance to ensure the data’s veracity.
- New legislation will be created to protect medical providers and other users of that data from being liable for missing information or trends that could have saved lives or avoided some other negative outcome.
- In the meantime, Hospitals, Insurance Providers, and others will do everything possible to mitigate the risk of using Medical IoT data, which could include Smart Contracts to ensure compliance (which assumes that a benefit is provided to the data providers); a rough sketch of such a contract appears after this list.
- Platforms may be created to offer individuals control over their own data, how it is used and by whom, ownership of that data, and payment for the use of that data. This is something I wrote about in 2013.
- Data Governance will be taken more seriously by every business. Today companies talk about Data Privacy, Data Security, or Data Consistency, but few have a strategic end-to-end systematic approach to managing and protecting their data and company.
- Comprehensive Data Governance will become a driving and gating force as organizations modernize and grow. Even before the pandemic, there were growing needs due to new data privacy laws and concerns around areas such as the data used for Machine Learning.
- In a business environment where more systems are distributed, there is an increased risk of data breaches and Cybercrime. That must be addressed as a foundational component of any new system or platform.
- One or two Data Integration Companies will emerge as undisputed industry leaders due to their capabilities around MDM, Data Provenance and Traceability, and Data Access (an area typically managed by application systems).
- New standardized APIs akin to HL7 FHIR will be created to support a variety of industries as well as interoperability between systems and industries. Frictionless integration of key systems will become even more important than it is today.
- Anything that can be maintained and managed in a secure and flexible distributed digital environment will be implemented to allow companies to quickly pivot and adapt to new challenges and opportunities on a global scale.
- Smart Contracts and Digital Currency Payment Processing Systems will likely be core components of those systems.
- This will also foster the growth of next-generation Business Ecosystems and collaborations that will be more dynamic.
- Ongoing compliance monitoring, internal and external, will likely become a priority (“trust but verify”).
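To illustrate the kind of compliance logic a Smart Contract could encode for individual data providers, here is a rough Python sketch of a data-use agreement that checks consent and records payment to the data owner. The field names, purposes, and fee model are purely illustrative assumptions, not a real contract platform or standard.

```python
from dataclasses import dataclass, field

@dataclass
class DataUseContract:
    """Toy model of a contract governing use of an individual's data (illustrative only)."""
    owner: str
    allowed_purposes: set      # purposes the data owner has consented to
    fee_per_use: float         # payment owed to the owner per access
    ledger: list = field(default_factory=list)

    def request_access(self, consumer: str, purpose: str) -> bool:
        """Grant access only for consented purposes, and record the payment owed."""
        if purpose not in self.allowed_purposes:
            self.ledger.append(("denied", consumer, purpose))
            return False
        self.ledger.append(("granted", consumer, purpose, self.fee_per_use))
        return True

# Example: a wearable-data owner consents to research use but not marketing.
contract = DataUseContract(owner="patient-123",
                           allowed_purposes={"clinical_research"},
                           fee_per_use=0.05)
contract.request_access("hospital-A", "clinical_research")  # True, payment recorded
contract.request_access("ad-network-B", "marketing")        # False, denied and logged
```

On an actual blockchain, this logic would live on-chain so that both the consent check and the payment record are tamper-evident and auditable by all parties.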
All in all, this is exciting from a business and technology perspective. Most companies must review and adjust their strategies and tactics to embrace these concepts and adapt to the coming New Normal.
The steps we take today will shape what we see and do in the coming decade, so it is important to get this right quickly, knowing that whatever is implemented today will evolve and improve over time.
Big Data – The Genie is out of the Bottle!
Back in early 2011, other members of the Executive team at Ingres and I were making a bet on the future of our company. We knew we needed to do something big and bold, so we decided to build what we thought the standard data platform would be in 5-7 years. A small minority of the team members did not believe this was possible and left, while the rest focused on making it happen. There were three strategic acquisitions to fill in the gaps on our Big Data platform. Today (as Actian), we have nearly achieved our goal. It was a leap of faith back then, but our vision turned out to be spot-on, and our gamble is paying off today.
My mailbox is filled daily with stories, seminars, white papers, etc., about Big Data. While it feels like this is becoming more mainstream, it is interesting to read and hear the various comments on the subject. They range from “It’s not real” and “It’s irrelevant” to “It can be transformational for your business” to “Without big data, there would be no <insert company name here>.”
What I continue to find amazing is hearing comments about big data being optional. It’s not – that genie has already been let out of the bottle. There are incredible opportunities for those companies that understand and embrace the potential. I like to tell people that big data can be their unfair advantage in business. Is that really the case? Let’s explore that assertion and find out.
We live in the age of the “Internet of Things.” Data about nearly everything is everywhere, and the tools to correlate that data to gain an understanding of so many things (activities, relationships, likes and dislikes, etc.) are readily available. With smart devices that enable mobile computing, we have the extra dimension of location. And, with new technologies such as Graph Databases (queried with languages like SPARQL), graphical interfaces to analyze that data (such as Sigma), and identification techniques such as Stylometry, it is getting easier to identify and correlate that data. Someday, this will feed into artificial intelligence, becoming a superpower for those who know how to leverage it effectively.
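As a small illustration of how graph data can be queried to surface relationships, here is a sketch using Python’s rdflib library. The data file and the FOAF properties shown are just examples, not a specific dataset.

```python
from rdflib import Graph

g = Graph()
g.parse("people.ttl", format="turtle")  # hypothetical RDF data about people and places

# SPARQL query linking people to the places they are based near,
# using the widely adopted FOAF vocabulary.
results = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?place
    WHERE {
        ?person foaf:knows ?friend .
        ?person foaf:based_near ?place .
    }
""")
for row in results:
    print(row.person, row.place)
```

The value is less in any single query than in the ability to keep layering new sources (locations, relationships, activity) onto the same graph and ask connecting questions across all of them.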
We are generating larger and larger volumes of data about everything we do and everything going on around us, and tools are evolving to make sense of that data better and faster than ever. Those organizations that perform the best analysis, get the answers fastest, and act on that insight quickly are more likely to win than organizations that look at a smaller slice of the world or adopt a “wait and see” posture. So, that seems like a significant advantage in my book. But is it an unfair advantage?
First, let’s remember that big data is just another tool. Like most tools, it has the potential for misuse and abuse. Whether a particular application is viewed as “good” or “bad” depends on the goals and perspective of the entity using the tool (which may be the polar opposite of the view held by the people or groups it targets). So, I will not attempt to judge the various use cases but rather present a few and let you decide.
Scenario 1 – Sales Organization: What if you could understand what you were being told a prospect company needs and had a way to validate and refine that understanding? That’s half the battle in sales (budget, integration, and support / politics are other key hurdles). Imagine data that helped you understand not only the actions of that organization (customers and industries, sales and purchases, gains and losses, etc.) but also the stakeholders’ and decision-makers’ goals, interests, and biases. This could provide a holistic view of the environment and allow you to provide a highly targeted offering, with messaging tailored to each individual. That is possible, and I’ll explain how soon.
Scenario 2 – Hiring Organization: Many questions cannot be asked by a hiring manager. While I’m not an attorney, I would bet that State and Federal laws have not kept pace with technology. And while those laws vary state by state, there are likely loopholes allowing public records to be used. Moreover, implied data that is not officially considered could color the judgment of a hiring manager or organization. For instance, if you wanted to “get a feeling” that a candidate might fit in with the team or the culture of the organization or have interests and views that are aligned with or contrary to your own, you could look for personal internet activity that would provide a more accurate picture of that person’s interests.
Scenario 3 – Teacher / Professor: There are already sites in use to search for plagiarism in written documents, but what if you had a way to make an accurate determination about whether an original work was created by your student? There are people who, for a price, will do the work and write a paper for a student. So, what if you could not only determine that the paper was not written by your student but also determine who the likely author was?
Do some of these things seem impossible or at least implausible? Personally, I don’t believe so. Let’s start with the typical data that our credit card companies, banks, search engines, and social network sites already have related to us. Add to that the identified information available for purchase from marketing companies and various government agencies. That alone can provide a pretty comprehensive view of us. But there is so much more that’s available.
Consider the potential of gathering information from intelligent devices accessible through the Internet, your alarm and video monitoring system, etc. These are intended to be private data sources, but one thing history has taught us is that anything accessible is subject to unauthorized access and use (just think about the numerous recent credit card hacking incidents).
Even de-identified data (medical / health / prescription / insurance claim data is one major example), which receives much less protection and can often be purchased, could be correlated with a reasonably high degree of confidence to gain an understanding of other “private” aspects of your life. The key is to look for connections (websites, IP addresses, locations, businesses, people) and things that are logically related (such as illnesses / treatments / prescriptions), and then to accurately identify the individual (stylometry looks at things like sentence complexity, function words, co-location of words, misspellings and misuse of words, etc., and will likely someday take things like idea density into consideration). It is nearly impossible to remain anonymous in the Age of Big Data.
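As a rough illustration of the stylometric features mentioned above, the Python sketch below extracts average sentence length and function-word frequencies from a text sample. The function-word list is a small illustrative subset, and a real attribution system would use far richer features and a proper classifier.

```python
import re
from collections import Counter

# A handful of common English function words; real stylometric models use a much larger set.
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "it", "is", "was",
                  "i", "for", "on", "you", "he", "be", "with", "as", "by", "at"}

def stylometric_features(text: str) -> dict:
    """Extract a few simple stylometric features from a text sample."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    counts = Counter(words)
    return {
        # Average sentence length is a crude proxy for sentence complexity.
        "avg_sentence_length": total / max(len(sentences), 1),
        # Relative frequency of each function word (largely topic-independent).
        **{f"fw_{w}": counts[w] / total for w in FUNCTION_WORDS},
    }

# Comparing feature vectors from a known writing sample and a disputed document
# (for example, with cosine similarity) gives a rough signal about shared authorship.
```

Function words are useful precisely because they are hard to fake: they reflect habit rather than topic, which is the same “patterns within the patterns” idea discussed earlier.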
There has been a paradigm shift regarding the practical application of data analysis, and the companies that understand this and embrace it will likely perform better than those that don’t. There are new ethical considerations that arise from this technology, and likely new laws and regulations as well. But for now, the race is on!
Using Technology for the Greater Good
My company and my family funded a dozen or so medical research projects over several years. I had the pleasure of meeting and working with many brilliant MD/Ph.D. researchers. My goal was to fund $1 million in medical research and find a cure for Juvenile Arthritis. We didn’t reach that goal, but many good things came out of that research.
Something that amazed me was how research worked. Competition for funding is intense, so there was much less collaboration between institutions than I would have expected. At one point, we were funding similar projects at two institutions. The projects went in two very different directions, and it was clear that one would be much more successful than the other. It seemed almost wasteful, and I thought there must be a better, more efficient, and cost-effective way of managing research efforts.
So, in 2006 I had an idea. What if I could create a cloud-based (a very new concept at the time) research platform that would support global collaboration? It would need to support true analytical processing, statistical analysis, document management (something fairly new then), and desktop publishing. Publishing research findings is very important in this space, so my idea was to provide a workspace that supported end-to-end research efforts (inception to publication, including auditing and data collection) and fostered collaboration.
This platform would only work if there were a new way for interested parties to fund this research, one that was easy to use and could reach a large audience. Individuals could make contributions based on areas of interest, specific projects, specific individuals working on projects, or projects in a specific regional area. The idea was a lot like what Crowdtilt is today. This funding mechanism would support non-traditional collaboration and hopefully greatly impact the research community and their findings.
Additionally, this platform would support the collection of suggestions and ideas. Good ideas can come from anywhere – especially when you don’t know that something is not supposed to work.
During one funding review meeting at the Children’s Hospital of Philadelphia (CHOP), I made a naïve statement about using cortisone injections to treat TMJ arthritis. I was told why this would not work. A month or so later, I received a call explaining that my suggestion might work, with a request for another in-person meeting and additional funding. Conceptual Expansion at its best! That led to a new research project and positive results (see http://onlinelibrary.wiley.com/doi/10.1002/art.21384/pdf).
You never know where the next good idea might come from, so why not make it easy for people to share those ideas?
By the end of 2007, I had designed an architecture based on SOA (service-oriented architecture) using open-source products that would do most of what I needed. Then, in 2008 Google announced the “Project 10^100” competition. I entered, confident that I would at least get an honorable mention (alas, nothing came from this).
Then, in early 2010 I spent an hour discussing my idea with the CTO of a popular Cloud company. This CTO had a medical background, liked my idea, offered a few suggestions, and even offered to help. It was the perfect opportunity. But, I had just started a new position at work, so this project fell by the wayside. That was a shame, and I only have myself to blame. It is something that has bothered me for years.
It’s 2013; far more tools are available today to make this platform a reality, yet something like this still does not exist. I’m writing this because the idea has merit, and I think there might be others who feel the same way and would like to work on making this dream a reality. It’s a chance to leverage technology to potentially make a huge impact on society. And it can create opportunities for people in regions that might otherwise be ignored to contribute to this greater good.
Idealistic? Maybe. Possible? Absolutely!

