Data Governance

Using Themes for Enhanced Problem Solving


Thematic Analysis is a powerful qualitative approach used by many consultants. It involves identifying patterns and themes to better understand how and why something happened, which provides the context for other quantitative analysis. It can also be utilized when developing strategies and tactics due to its “cause and effect” nature.

Typical analysis tends to be event-based: something unexpected happened, and some type of triggering or compelling event is sought either to stop something from happening or to make something happen. With enough of the right data, you may be able to identify patterns, which can help predict what will happen next based on past events. This data-based understanding may be simplistic or incomplete, but often it is sufficient.


But, people are creatures of habit. If you can identify and understand those habits, and place them within the context of a specific environment that includes interactions with others, you may be able to identify patterns within the patterns. Those themes can be much better indicators of what may or may not happen than the data itself. They not only become better predictors of things to come but can also help identify more effective strategies and tactics to achieve your goals.

This approach requires that a person view an event (desired or historical) from various perspectives to help understand:

  1. Things that are accidental but predictable because of human nature.
  2. Things that are predictable based on other events and interactions.
  3. Things that are the logical consequence of a series of events and outcomes.

Aside from the practical implications of this approach, I find it fascinating relative to AI and Predictive Analysis.

For example, by understanding the recurring themes and triggers, you can monitor data and activities proactively. That is actionable intelligence that can be automated and incorporated into a larger system. Machine Learning and Deep Learning can analyze tremendous volumes of data from a variety of sources in real time.
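To make that concrete, here is a minimal sketch (in Python, with invented event names and thresholds) of how a recurring theme could be turned into an automated trigger over an event stream:

```python
# A minimal sketch, assuming invented event names: watch an event stream
# and raise an alert when a known sequence (a recurring "theme") is
# observed within a time window.
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float  # seconds since epoch
    kind: str         # e.g., "login_failure"

# Hypothetical theme: repeated login failures followed by a password reset.
THEME = ("login_failure", "login_failure", "password_reset")

def watch(stream, window_seconds: float = 300.0):
    """Yield an alert whenever the trailing events match THEME."""
    recent: deque = deque()
    for event in stream:
        recent.append(event)
        # Drop events that have aged out of the time window.
        while recent and event.timestamp - recent[0].timestamp > window_seconds:
            recent.popleft()
        trailing = tuple(e.kind for e in recent)[-len(THEME):]
        if trailing == THEME:
            yield f"theme detected at t={event.timestamp}: {THEME}"
```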

Combine that with Semantic Analysis, which is challenging due to the complexity of taxonomies and ontologies, and the system can more accurately understand what is really happening in order to make accurate predictions. Add in spatial and temporal data such as IoT, metadata from photographs, etc., and you should be able to view events as though from very high up – “seeing” what is on the path ahead. It is obviously not that simple, but it is exciting to think about.

From a practical perspective, keeping these thoughts in the back of your mind will help you see details that other people have missed. That makes for better analysis, better strategies, and better execution.

Who wouldn’t want that?

Biometric Identity Theft


Recently I have been researching the potential of fraud and identity theft using fingerprints from photos posted on social media. Last week Amazon released its “Amazon One” Palm Scanner as a means to pay for purchases when shopping. That announcement made me wonder, what are the potential implications for fraud and identity theft using biometric data taken from images?

Could photos posted on social media sites become the key to digital identity theft?

There are a surprising number of ways to accurately identify someone from a photo or video. Moreover, there is technology to copy fingerprints from social media photos taken up to three meters away, and 3D printing has been proven effective at creating “fake fingerprints” that will bypass many fingerprint scanners.

Technology continues to improve at a rapid pace, which often means, “Where there’s a will, there’s a way.”

Since fingerprints can be copied from photos taken up to three meters away, does that mean a palm print could potentially be copied from a photo taken 5-10 meters away? That question led to an interesting but unscientific experiment: I took pictures of my own hand, enlarged them, measured the distance between the ridges and furrows of both my fingers and my palm, and then compared the two. Spoiler – probably not.

There were several areas where that distance was similar for both my fingers and palm. But there were also areas on my palm where the average distance between “landmarks” was 3-5+ times greater. It turns out that for identification purposes a palm image is often segmented into 3-4 distinct regions, likely due to this type of variation. This link was helpful to understand the process.
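Out of curiosity, the measurement step of that experiment can be approximated in code. This is a rough sketch assuming a high-resolution grayscale photo; the file name, sampling row, and peak-spacing threshold are all illustrative:

```python
# Estimate ridge spacing by sampling one row of a grayscale hand photo
# and measuring the distance between intensity peaks (ridge crests).
import cv2
import numpy as np
from scipy.signal import find_peaks

img = cv2.imread("hand_closeup.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative file
profile = img[img.shape[0] // 2, :].astype(float)  # one row across the palm
profile -= profile.mean()                          # remove the baseline
peaks, _ = find_peaks(profile, distance=5)         # candidate ridge crests
spacings = np.diff(peaks)
print(f"mean spacing: {spacings.mean():.1f}px (std {spacings.std():.1f})")
```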

This research led to an idea for a chip-based embedded filter for smart devices and laptops. It would obfuscate key biometric information when extracting the data for display, without affecting the integrity of the original stored image. This functionality would automatically provide an additional layer of privacy and data protection. It would require highly efficient object detection (possibly an R-CNN variant) running on a capable but low-power processor like the Arm Cortex-M. Retraining and upgrades would be accomplished with firmware updates.
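A software analogue of that filter might look like the sketch below, where detect_biometric_regions() is a hypothetical stand-in for the optimized on-chip model. Only the displayed copy is altered; the stored original stays intact:

```python
# Minimal sketch of display-time obfuscation: detect likely biometric
# regions (fingertips/palms) and blur them in the *rendered* copy only.
# detect_biometric_regions is a hypothetical placeholder, not a real
# library call; a real version would run a compact detection model.
import cv2
import numpy as np

def detect_biometric_regions(image: np.ndarray) -> list:
    """Hypothetical detector returning (x, y, w, h) boxes for fingertips/palms."""
    return []  # placeholder: no detections

def obfuscate_for_display(image: np.ndarray) -> np.ndarray:
    display = image.copy()  # never modify the stored original
    for (x, y, w, h) in detect_biometric_regions(image):
        roi = display[y:y + h, x:x + w]
        # A heavy Gaussian blur destroys ridge/furrow detail while
        # keeping the photo recognizable to a human viewer.
        display[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return display
```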

Edit 2020-10-13: This article on “Tiny ML” from Medium.com is the perfect tie-in to the idea described above.

While Amazon’s technology is much newer and presumably at least partially based on their 2019 Patent Application (which does look impressive), it makes you wonder how susceptible these devices might be to fraud, given reports of the scans occurring “almost instantaneously.” Speed is one aspect of successful large-scale commercial adoption, but the accuracy and integrity of the system are far more important from my perspective.

Time will tell how robust and foolproof Amazon’s new technology really is. Given their reach, this could occur sooner rather than later. Ultimately, multiple forms of biometric scans (such as a full handprint with shape, palm, and fingerprints, or a retina scan 2-3 minutes prior to the palm scan to maintain performance) may be required for enhanced security, especially with mobile devices.


Blockchain, Data Governance, and Smart Contracts in a Post-COVID-19 World


The last few months have been very disruptive to nearly everyone across the globe. There are business challenges galore, such as managing large remote workforces – many of whom are new to working remotely – and managing risk while attempting to conduct “business as usual.” Unfortunately for most businesses, their systems, processes, and internal controls were not designed for this “new normal.”

While there have been many predictions around Blockchain for the past few years, it is still not widely adopted. We are beginning to see an uptick in adoption in Supply Chain Management Systems for reasons that include traceability of items – especially food and drugs. But large-scale adoption has been elusive to date.


My personal belief is that we will soon begin to see large shifts in mindset, investments, and effort towards modern digital technology, driven by Data Governance and Risk Management. I also believe this will make these technologies easier to use via new platforms and integration tools, which will lead to faster adoption by SMBs and other non-Enterprise organizations, which in turn will increase the need for DevOps, Monitoring, and Automation solutions as a way to maintain control of a more agile environment.

Here are a few predictions:

  1. New wearable technology supporting Medical IoT will be developed to help provide an early warning system for disease and future pandemics. That will fuel a number of innovations in various industries including Biotech and Pharma.
    • Blockchain can provide the necessary data privacy, data ownership, and data provenance to ensure the veracity of that data (a minimal code sketch of the provenance idea follows this list).
    • New legislation will be created to protect medical providers and other users of that data from being liable for missing information or trends that could have saved lives or avoided some other negative outcome.
    • In the meantime, Hospitals, Insurance Providers, and others will do everything possible to mitigate the risk of using the Medical IoT data, which could include Smart Contracts as a way to ensure compliance (which assumes that there is a benefit being provided to the data providers).
    • Platforms may be created to offer individuals control over their own data, how it is used and by whom, ownership of that data, and payment for the use of that data. This is something that I wrote about in 2013.
  2. Data Governance will be taken more seriously by every business. Today companies talk about Data Privacy, Data Security, or Data Consistency, but few have a strategic end-to-end systematic approach to managing and protecting their data and their company.
    • Comprehensive Data Governance will become both a driving and gating force as organizations modernize and grow. Even before the pandemic there were growing needs due to new data privacy laws and concerns around areas such as the data used for Machine Learning.
    • In a business environment where more systems are distributed there is an increased risk of data breaches and Cybercrime. That will need to be addressed as a foundational component of any new system or platform.
    • One or two Data Integration Companies will emerge as undisputed industry leaders due to their capabilities around MDM, Data Provenance & Traceability, and Data Access (an area typically managed by application systems).
    • New standardized APIs akin to HL7 FHIR will be created to support a variety of industries as well as interoperability between systems and industries. Frictionless integration of key systems will become even more important than it is today.
  3. Anything that can be maintained and managed in a secure and flexible distributed digital environment will be implemented as a way to allow companies to quickly pivot and adapt to new challenges and opportunities on a global scale.
    • Smart Contracts and Digital Currency Payment Processing Systems will likely be core components of those systems.
    • This will also foster the growth of next generation Business Ecosystems and collaborations that will be more dynamic in nature.
    • Ongoing compliance monitoring, internal and external, will likely become a priority (“trust but verify”).
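As promised above, here is a minimal sketch of the data-provenance idea: each record embeds the hash of its predecessor, so tampering anywhere in the history is detectable. This illustrates the core of blockchain-style traceability only; signatures, consensus, and key management are deliberately omitted, and all field names are invented:

```python
# Hash-chained provenance records: any alteration of past data breaks
# the chain of hashes and is therefore detectable.
import hashlib
import json
import time

def _record_hash(record: dict) -> str:
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_record(payload: dict, prev_hash: str) -> dict:
    record = {"payload": payload, "prev_hash": prev_hash, "ts": time.time()}
    record["hash"] = _record_hash(record)
    return record

def verify_chain(chain: list) -> bool:
    for prev, curr in zip(chain, chain[1:]):
        # Both the link and each record's own hash must check out.
        if curr["prev_hash"] != prev["hash"] or _record_hash(curr) != curr["hash"]:
            return False
    return True

# Usage: a wearable appends readings, each linked to the previous record.
chain = [make_record({"device": "wearable-001", "hr": 72}, prev_hash="0" * 64)]
chain.append(make_record({"device": "wearable-001", "hr": 74}, chain[-1]["hash"]))
assert verify_chain(chain)
```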

All in all, this is exciting from a business and technology perspective. It will require most companies to review and adjust their strategies and tactics to embrace these concepts and adapt to the coming New Normal.

The steps we take today will shape what we see and do in the coming decade so it is important to quickly get this right, knowing that whatever is implemented today will evolve and improve over time.

My perspective on Big Data


Ever since I worked on redesigning a risk management system at an insurance company (1994-1995), I have been impressed by how much better decisions could be made with more data – assuming it was the right data. The concept of “What is the right data?” has intrigued me for years, as what may seem like common sense today could have been unknown 5-10 years ago and could be completely passé 5-10 years from now. Context becomes very important because of the variability and relevance of data over time.

This is what makes Big Data interesting. There really is no right or wrong answer or definition. Having a framework to define, categorize, and use that data is important. And at some point, being able to refer to the data in context will be very important as well. Just think about how challenging it could be to compare scenarios or events from 5 years ago with those of today. It’s likely not an apples-to-apples comparison but could certainly be done. The concept of maximizing the value of data is pretty cool stuff.

The way I think of Big Data is similar to a water tributary system. Water enters the system in many ways – rain from the clouds, sprinklers from private and public supplies, runoff, overflow, etc. It also has many interesting dimensions, such as quality/purity (not necessarily the same due to different aspects of need), velocity, depth, capacity, and so forth. Not all water gets into the tributary system (e.g., some is absorbed into the groundwater tables, and some evaporates) – just as some data loss should be anticipated.


If you think in terms of streams, ponds, rivers, lakes, reservoirs, deltas, etc. there are many relevant analogies that can be made. And just like the course of a river may change over time, data in our “big data” water tributary system could also change over time.

Another part of my thinking is based on an experience I had about a decade ago (2002 – 2003 timeframe) working on a project for a Nanotech company. In their labs, they were testing various things. There were particles, embedded in shingles and paint, that changed reflectivity based on temperature. There were very small batteries that could be recharged tens of thousands of times, were light, and had more capacity than a common 12-volt car battery.

And, there was a section where they were doing “biometric testing” for the military. I have since read articles about things like smart fabrics that could monitor the health of a soldier and do things like apply basic first aid and notify others once a problem was detected.  This company felt that by 2020 advanced nanotechnology would be widely used by the military, and by 2025 it would be in wide commercial use.  Is that still a possibility? Who knows…

Much of what you read today is about the exponential growth of data. I agree with that, but as stated earlier, and this is important, I believe that the nature and sources of that data will change significantly. For example, nanoparticles in engine oil will provide information about temperature, engine speed and load, and even things like rapid changes in movement (fast take-offs or stops, quick turns). Nanoparticles in the paint will provide weather conditions. Nanoparticles on the seat upholstery will provide information about occupants (number, size, weight). Sort of like the “sensor web,” from the original Kevin Delin perspective. A lot of “Information of Things” data will be generated, but then what?

I believe that time will become an important aspect of every piece of data, and that location (X, Y, and Z coordinates) will be just as important. But, not every sensor will collect location (spatial data). I do believe there will be multiple data aggregators in common use at common points (your car, your house, your watch). Those aggregators will package the available data in something akin to an XML object, which allows flexibility.  From my perspective, this is where things become very interesting relative to commercial use and data privacy.
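As a toy illustration of that aggregator idea, here is a sketch that packages readings from several local sensors into one flexible XML object, stamping each reading with time and optional location. All element names and the sample readings are invented:

```python
# Package heterogeneous sensor readings into one XML bundle; not every
# sensor reports spatial data, so location is optional per reading.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def package_readings(source: str, readings: list) -> bytes:
    root = ET.Element("sensor-bundle", source=source)
    for r in readings:
        el = ET.SubElement(root, "reading", kind=r["kind"])
        el.set("ts", datetime.now(timezone.utc).isoformat())  # time on every datum
        if "location" in r:  # spatial data only when the sensor provides it
            x, y, z = r["location"]
            ET.SubElement(el, "location", x=str(x), y=str(y), z=str(z))
        ET.SubElement(el, "value").text = str(r["value"])
    return ET.tostring(root)

# Usage: a car-based aggregator bundles whatever its sensors reported.
bundle = package_readings("car-42", [
    {"kind": "oil-temp", "value": 92.5, "location": (40.7, -74.0, 10.0)},
    {"kind": "seat-weight", "value": 81},
])
```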

Currently, companies like Google make a lot of money from aggregating data from multiple sources, correlating it to a variety of attributes, and then selling knowledge derived from that plethora of data. I believe that there will be opportunities for individuals to use “data exchanges” to manage, sell, and directly benefit from their own data. The more interesting their data, the more value it has and the more benefit it provides to the person selling it. This could have a huge economic impact, and that would foster both the use and expansion of various commercial ecosystems required to manage the commercial and privacy aspects of this technology.

The next logical step in this vision is “smart everything.” For example, you could buy a shirt that is just a shirt. But, for an extra cost, you could turn on medical monitoring or refractive heating/cooling. And, if you felt there was a market for extra dimensions of data that could benefit you financially, then you could enable those sensors as well. Just think of the potential impact that technology would have on commerce in this scenario.

This is what I personally believe will happen within the next decade or so. This won’t be the only type of or use of big data. Rather, there will be many valid types and uses of data – some complementary and some completely discrete. It has the potential to become a confusing mess. But, people will find ways to ingest, categorize, and correlate data to create value with it – today or in the future.

Utilizing data will become an increasing competitive advantage for the people and companies that know how to do something interesting and useful with it. Who knows what will be viewed as valuable data 5-10 years from now, but it will likely be different than what we view as valuable data today.

So, what are your thoughts? Can we predict the future based on the past? Or, is it simply enough to create platforms that are powerful enough, flexible enough, and extensible enough to change our understanding as our perspective of what is important changes? Either way it will be fun!