IoT

Big Data – The Genie is out of the Bottle!

Posted on September 29, 2014 Updated on December 7, 2025

Back in early 2011, other members of the Executive team at Ingres and I were taking a bet on the future of our company. We knew we needed to do something big and bold, so we decided to build what we thought the standard data platform would be in 5-7 years. A small minority of the team members did not believe this was possible and left, while the rest focused on making that happen. There were three strategic acquisitions to fill in the gaps on our Big Data platform. Today (as Actian), we have nearly achieved our goal. It was a leap of faith back then, but our vision turned out to be spot-on, and our gamble is paying off today.

My mailbox is filled daily with stories, seminars, white papers, etc., about Big Data. While it feels like this is becoming more mainstream, reading and hearing the various comments on the subject is interesting. They range from “It’s not real” and “It’s irrelevant” to “It can be transformational for your business” to “Without big data, there would be no <insert company name here>.”

Illustration of smoke coming out of a brass lantern

What I continue to find amazing is hearing comments about big data being optional. It’s not – that genie has already been let out of the bottle. There are incredible opportunities for those companies that understand and embrace the potential. I like to tell people that big data can be their unfair advantage in business. Is that really the case? Let’s explore that assertion and find out.

We live in the age of the “Internet of Things.” Data about nearly everything is everywhere, and so are the tools to correlate that data to gain an understanding of so many things (activities, relationships, likes and dislikes, etc.). With smart devices that enable mobile computing, we have the extra dimension of location. And, with new technologies such as Graph Databases (queried with languages such as SPARQL), graphic interfaces to analyze that data (such as Sigma), and identification technology such as Stylometry, it is getting easier to identify and correlate that data. Someday, this will feed into artificial intelligence, becoming a superpower for those who know how to leverage it effectively.

We are generating larger and larger volumes of data about everything we do and everything going on around us, and tools are evolving to make sense of that data better and faster than ever. Organizations that perform the best analysis, get the answers fastest, and act on that insight quickly are more likely to win than organizations that look at a smaller slice of the world or adopt a “wait and see” posture. So, that seems like a significant advantage in my book. But is it an unfair advantage?

First, let’s remember that big data is just another tool. Like most tools, it has the potential for misuse and abuse. Whether a particular application is viewed as “good” or “bad” is dependent on the goals and perspective of the entity using the tool (which may be the polar opposite view of the groups of people targeted by those people or organizations). So, I will not attempt to judge the various use cases but rather present a few use cases and let you decide.

Scenario 1 – Sales Organization: What if you could truly understand what you were being told about a prospect company’s needs and had a way to validate and refine that understanding? That’s half the battle in sales (budget, integration, and support / politics are other key hurdles). Imagine data that helped you understand not only the actions of that organization (customers and industries, sales and purchases, gains and losses, etc.) but also the stakeholders’ and decision-makers’ goals, interests, and biases. This could provide a holistic view of the environment and allow you to provide a highly targeted offering, with messaging tailored to each individual. That is possible, and I’ll explain soon.

Scenario 2 – Hiring Organization: Many questions cannot be asked by a hiring manager. While I’m not an attorney, I would bet that State and Federal laws have not kept pace with technology. And while those laws vary state by state, there are likely loopholes allowing public records to be used. Moreover, implied data that is not officially considered could color the judgment of a hiring manager or organization. For instance, if you wanted to “get a feeling” that a candidate might fit in with the team or the culture of the organization or have interests and views that are aligned with or contrary to your own, you could look for personal internet activity that would provide a more accurate picture of that person’s interests.

Scenario 3 – Teacher / Professor: There are already sites in use to search for plagiarism in written documents, but what if you had a way to make an accurate determination about whether an original work was created by your student? There are people who, for a price, will do the work and write a paper for a student. So, what if you could not only determine that the paper was not written by your student but also determine who the likely author was?

Do some of these things seem impossible or at least implausible? Personally, I don’t believe so. Let’s start with the typical data that our credit card companies, banks, search engines, and social network sites already have related to us. Add to that the identified information available for purchase from marketing companies and various government agencies. That alone can provide a pretty comprehensive view of us. But there is so much more that’s available.

Consider the potential of gathering information from intelligent devices accessible through the Internet, your alarm and video monitoring system, etc. These are intended to be private data sources, but one thing history has taught us is that anything accessible is subject to unauthorized access and use (just think about the numerous recent credit card hacking incidents).

Even de-identified data (medical / health / prescription / insurance claim data is one major example), which receives much less protection and can often be purchased, could be correlated with a reasonably high degree of confidence to gain an understanding of other “private” aspects of your life. The key is to look for connections (websites, IP addresses, locations, businesses, people) and things that are logically related (such as illnesses / treatments / prescriptions), and then accurately identify the individual. Stylometry, for example, looks at things like sentence complexity, function words, collocation of words, misspellings, and misuse of words, and will likely someday take into consideration things like idea density. It is nearly impossible to remain anonymous in the Age of Big Data.
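To make the stylometry idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the short function-word list, the function names, and the use of cosine similarity are all choices made for this example, not a description of how any real stylometry product works. Real tools use far richer features and much more text.

```python
import math
import re
from collections import Counter

# A tiny, illustrative list of English function words (an assumption for
# this sketch, not a standard stylometric word list).
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "is", "it", "but"]


def tokenize(text):
    """Lowercase the text and return its words."""
    return re.findall(r"[a-z']+", text.lower())


def average_sentence_length(text):
    """Average number of words per sentence, a crude complexity measure."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    return len(tokenize(text)) / len(sentences)


def function_word_profile(text):
    """Relative frequency of each function word in the text."""
    words = tokenize(text)
    if not words:
        raise ValueError("text must contain at least one word")
    counts = Counter(words)
    return [counts[w] / len(words) for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_similarity(text_a, text_b):
    """Compare two texts by their function-word usage."""
    return cosine_similarity(function_word_profile(text_a),
                             function_word_profile(text_b))
```

A score near 1.0 suggests similar function-word habits and a score near 0.0 suggests very different ones. On short samples like a single paragraph the result means very little, which is part of why this is a sketch and not a tool.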

There has been a paradigm shift regarding the practical application of data analysis, and the companies that understand this and embrace it will likely perform better than those that don’t. There are new ethical considerations that arise from this technology, and likely new laws and regulations as well. But for now, the race is on!

This entry was posted in Artificial Intelligence, Big Data, Business Intelligence, Cloud Computing, Data Governance, Geospatial, innovation, IoT, Life, machine learning, strategy, Technology, Uncategorized and tagged advantage, AI, analytics, Big Data, internet of things, IoT, oodb, privacy, SCADA, sigma, sparql, Stylometry, VLDB.

Genetics, Genomics, Nanotechnology, and more

Posted on January 25, 2014 Updated on November 30, 2023

Science has been interesting to me for most of my lifetime, but it wasn’t until my first child was born that I shifted from “interested” to “involved.” My eldest daughter was diagnosed with Systemic Onset Juvenile Idiopathic Arthritis (SoJIA – originally called Juvenile Rheumatoid Arthritis, or JRA) when she was 15 months old, which also happened to be about six months into the start of my old Consulting company and in the middle of a very critical Y2K ERP system upgrade and rehosting project. It was definitely a challenging time in my life.

At that time, there was very little research on JRA because it was estimated there were only 30,000 children affected by the disease, and the implication was that funding research would not have a positive ROI. This was also a few years before the breakthroughs of biological medicines like Enbrel for children.

Illustration of a human genome — Source: history.nih.gov/exhibits/genetics/images/main/collage.gif

One of the things that I learned was that this disease could be horribly debilitating. Children often had physical deformities as a result of this disease. Even worse, the systemic type that my daughter has could result in premature death. As a first-time parent, imagining that type of life for your child was extremely difficult.

Luckily, the company I had just started was taking off, so I decided to find ways to make a tangible difference for all children with this disease. We decided to take 50% of our net profits and use them to fund medical research. We aimed to fund $1 million in research and find a cure for Juvenile Arthritis within the next 5-7 years.

As someone new to “major gifts” and philanthropy, I quickly learned that some gifting vehicles were more beneficial than others. While most organizations wanted you to start a fund (which we did), the impact from that tended to be more long-term and less immediate. I met someone passionate, knowledgeable, and successful in her field who showed me a different and better approach (here’s a post that describes that in more detail).

I no longer wanted to blindly give money and hope it was used quickly and properly. Rather, I wanted to treat these donations like investments in a near-term cure. In order to be successful, I needed to understand research from both medical and scientific perspectives in these areas. That began a new research and independent learning journey in completely new areas.

There was a lot going on in Genetics and Genomics at the time (here’s a good explanation of the difference between the two). My interest and efforts in this area led to a position on the Medical and Scientific Advisory Committee with the Arthritis Foundation. Except for me, the members were talented and successful physicians who were also involved with medical research. We met quarterly, and I asked questions and made suggestions that made a difference. But, unlike everyone else on the committee, I needed to study and prepare for 40+ hours for each call to ensure that I had enough understanding to add value and not be a distraction.

A few years later, we did work for a Nanotechnology company (more info here for those interested). The Chief Scientist wasn’t interested in explaining what they did until I described some of our research projects on gene expression. He then went into great detail about what they were doing and how he believed it would change what we do in the future. I saw that and agreed. I also started thinking about the potential for combining nanotechnology with medicine.

While driving today, I was listening to the “TED Radio Hour” and heard a segment about entrepreneur Richard Resnick. It was exciting because it got me thinking about this again – a topic I haven’t thought about for the past few years (the last time, I was contemplating how new analytics products could be useful in this space).

There are efforts today with custom, personalized medicines that target only specific genes for a specific outcome. The genetic modifications being performed on plants today will likely be performed on humans in the near future (I would guess within 10-15 years). The body is an incredibly adaptive organism, so it will be very challenging to implement anything that is consistently safe and effective long-term. But that day will come.

It’s not a huge leap from genetically modified “treatment cells” to true nanotechnology (not just extremely small particles). Just think: machines designed to work independently within us, doing what they are programmed to do and, more importantly, using artificial intelligence to identify and understand adaptations as they occur and to alter their approach and treatment plan accordingly. This is extremely exciting. It’s not that I want to live to be 100+ years old, because I don’t. But being able to do things that positively impact the quality of life for children and their families is a worthy goal from my perspective.

My advice is to always continue learning, keep an open mind, and see what you can personally do to make a difference. You will never know unless you try.

Note: Updated to fix and remove dead links.

This entry was posted in Artificial Intelligence, Big Data, Career, innovation, IoT, Life, machine learning, Robotics, strategy, Technology, Uncategorized and tagged AI, artificial intelligence, biological medicines, gene expression, gene sequencing, genetic engineering, genetics, genomics, Juvenile Idiopathic Arthritis, Juvenile Rheumatoid Arthritis, machine learning, making a difference, medical research, nanotechnology, philanthropy, quality of life, Richard Resnick, TED talks.

Getting Started with Big Data

Posted on September 23, 2013 Updated on November 29, 2023

Being in Sales, I have the opportunity to speak to many customers and prospects about many things. Most are interested in Cloud Computing and Big Data, but often they don’t fully understand how they will leverage the technology to maximize the benefits.

Here is a simple three-step process that I use:

1. For Big Data, I explain that there is no single correct definition. Because of this, I recommend that companies focus on what they need rather than what to call it. Results are more important than definitions for these purposes.

2. Relate the technology to something people are likely already familiar with (extending those concepts). For example: Cloud computing is similar to virtualization and has many of the same benefits; Big Data is similar to data warehousing. This helps make new concepts more tangible in any context.

3. Provide a high-level explanation of how “new and old” are different and why new is better, using specific examples that they should relate to. For example: Cloud computing often occurs in an external data center, possibly one whose location you may not even know, so security can be even more complex than with in-house systems and applications. It is possible to have both Public and Private Clouds, and a public cloud from a major vendor may be more secure and easier to implement than a similar system using your own hardware.

Big Data is a little bit like my first house. I was newly married, anticipated having children and also anticipated moving into a larger house in the future. My wife and I started buying things that fit into our vision of the future and storing them in our basement. We were planning for a future that was not 100% known.

But our vision changed over time, and we did not know exactly what we needed until the end. After 7 years, our basement was very full, and finding things was difficult. When we moved to a bigger house, we did have a lot of what we needed. But we also had many things that we no longer wanted or needed. And there were a few things we wished that we had purchased earlier. We did our best, and most of what we did was beneficial, but those purchases were speculative, and in the end, there was some waste.

How many of you would have thought 5 years ago that Social Media Sentiment Analysis would be important? How many would have thought that hashtag usage would become so pervasive in all forms of media? How many understood the importance of location information (and even the time stamp for that location)? My guess is that it would be fewer than 50% of all companies.

This ambiguity is both a good and bad thing about big data. In the old data warehouse days, you knew what was important because this was your data about your business, systems, and customers. While IT may have seemed tough in the past, it can be much more challenging now. But the payoff can also be much larger, so it is worth the effort. You often don’t know what you don’t know – and you just need to accept that.

Now we care about unstructured data (website information, blog posts, press releases, tweets, etc.), streaming data (stock ticker data is a common example), sensor data (temperature, altitude, humidity, location, lateral and horizontal forces), temporal data, etc. Data arrives from multiple sources and likely will have multiple time frame references (e.g., constant streaming versus updates with varying granularity), often in unknown or inconsistent formats. Someday soon, data from all sources will be automatically analyzed to identify patterns and correlations and gain other relevant insights.
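To illustrate the point about multiple time frame references, here is a small Python sketch of one common way to line up a fast stream with a slower periodic feed: bucket both into fixed time windows and join on the window. The function name, the 60-second window, and the sample readings are all hypothetical, chosen just for this example.

```python
from collections import defaultdict
from statistics import mean


def bucket_average(readings, window_seconds):
    """Group (timestamp_seconds, value) pairs into fixed windows and
    return the average value for each window, keyed by window start."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    buckets = defaultdict(list)
    for timestamp, value in readings:
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical sample data: a fast sensor stream and a slower periodic feed.
stream = [(0, 20.0), (10, 22.0), (65, 24.0), (70, 26.0)]
periodic = [(30, 50.0), (90, 54.0)]

stream_by_minute = bucket_average(stream, 60)
periodic_by_minute = bucket_average(periodic, 60)

# Join the two sources on the window start times they have in common.
aligned = {
    start: (stream_by_minute[start], periodic_by_minute[start])
    for start in stream_by_minute.keys() & periodic_by_minute.keys()
}
```

Averaging is only one choice. Depending on the data you might keep the last value, the maximum, or a count instead, and windows with data from only one source need their own handling.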

Robust and flexible data integration, data protection, and data privacy will all become far more important in the near future! This is just the beginning for Big Data.

This entry was posted in Artificial Intelligence, Big Data, Business Intelligence, Cloud Computing, IoT, strategy, Technology, Uncategorized and tagged Big Data, BigData, cloud, data warehouse, Hadoop, Integration, mobile, Virtualization.

My perspective on Big Data

Posted on September 16, 2013 Updated on December 7, 2025

Ever since I worked on redesigning a risk management system at an insurance company (1994-1995), I have been impressed by how much better decisions can be made with more data, assuming it is the right data. The concept of “What is the right data?” has intrigued me for years, as what may seem common sense today could have been unknown 5-10 years ago and could be completely passé 5-10 years from now. Context becomes very important because of the variability and relevance of data over time.

This is what makes Big Data interesting. There really is no right or wrong answer or definition. Having a framework to define, categorize, and use that data is important. And at some point, being able to refer to the data in context will also be very important. Just think about how challenging it could be to compare scenarios or events from 5 years ago with those of today. It’s likely not an apples-to-apples comparison, but it could certainly be done. The concept of maximizing the value of data is pretty cool stuff.

The way I think of Big Data is similar to a water tributary system. Water enters the system in many ways – rain from the clouds, sprinklers fed by private and public supplies, runoff, overflow, etc. It also has many interesting dimensions, such as quality/purity (not necessarily the same due to different aspects of need), velocity, depth, capacity, and so forth. Not all water gets into the tributary system (e.g., some is absorbed into the groundwater tables, and some evaporates) – just as some data loss should be anticipated.

Image of the world with a water hose wrapped around it.

If you think of streams, ponds, rivers, lakes, reservoirs, deltas, etc., many relevant analogies can be made. And just like the course of a river may change over time, data in our “big data” water tributary system could also change over time.

Another part of my thinking is based on my experience of working on a project for a Nanotech company about a decade ago (2002 – 2003 timeframe). In their labs, they were testing various products. There were particles embedded in shingles and paint that changed reflectivity based on temperature. There were very small, light batteries that could be recharged quickly tens of thousands of times and had more capacity than a common 12-volt car battery.

And there was a section where they were doing “biometric testing” for the military. I have since read articles about things like smart fabrics that could monitor a soldier’s health and apply basic first aid and notify others once a problem is detected. This company felt that by 2020, advanced nanotechnology would be widely used by the military, and by 2025, it would be in wide commercial use. Is that still a possibility? Who knows…

Much of what you read today is about the exponential growth of data. I agree with that, but as stated earlier, and this is important, I believe that the nature and sources of that data will change significantly. For example, nanoparticles in engine oil will provide information about temperature, engine speed, load, and even rapid changes in motion (fast take-off or stops, quick turns). The nanoparticles in the paint will provide weather conditions. The nanoparticles on the seat upholstery will provide information about occupants (number, size, weight). Sort of like the “sensor web” from the original Kevin Delin perspective. A lot of “Information of Things” (IoT) data will be generated, but then what?

I believe that time will become an essential aspect of every piece of data and that location (X, Y, and Z coordinates) will be just as important. However, not every sensor collects location (spatial) data. I believe multiple data aggregators will be in everyday use at common points (your car, your house, your watch). Those aggregators will package the available data into something akin to an XML object, allowing flexibility. From my perspective, this is where things become very interesting relative to commercial use and data privacy.
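As a rough sketch of what such an aggregator might do, here is a Python example that wraps raw sensor readings in an XML element and stamps them with a time and, when the aggregator knows it, a location. The element and attribute names (`observation`, `reading`, `location`) and the sample values are my own invention for illustration and do not follow any particular standard.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def package_readings(source, readings, timestamp, location=None):
    """Wrap sensor readings in an XML element stamped with a time and,
    if available, an (x, y, z) location supplied by the aggregator."""
    root = ET.Element(
        "observation",
        {"source": source, "time": timestamp.isoformat()},
    )
    if location is not None:
        x, y, z = location
        ET.SubElement(root, "location", {"x": str(x), "y": str(y), "z": str(z)})
    for name, value in readings.items():
        reading = ET.SubElement(root, "reading", {"name": name})
        reading.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical example: a car acting as the aggregator for its own sensors.
xml_text = package_readings(
    "car",
    {"oil_temperature_c": 96.5, "occupants": 2},
    datetime(2013, 9, 16, 12, 0, tzinfo=timezone.utc),
    location=(-87.9, 43.0, 188.0),
)
```

The sensors themselves never need to know where they are. The aggregator adds time and location once, at the common point, and new kinds of readings can be added later without changing the overall structure.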

Currently, companies like Google make a lot of money by aggregating data from multiple sources, correlating it with various attributes, and then selling knowledge derived from that data. I believe there will be opportunities for individuals to use “data exchanges” to manage, sell, and directly benefit from their own data. The more interesting their data, the more value it has and the more benefit it provides to the person selling it. This could have a significant economic impact, fostering both the use and expansion of the commercial ecosystems needed to manage this technology’s commercial and privacy aspects, especially as it relates to machine learning.

The next logical step in this vision is “smart everything.” For example, you could buy a shirt that is just a shirt. But you could turn on medical monitoring or refractive heating/cooling for an extra cost. And, if you felt there was a market for extra dimensions of data that could benefit you financially, you could also enable those sensors. Just think of the potential impact that technology would have on commerce in this scenario.

I believe this will happen within the next decade or so. This won’t be the only type of use of big data. Instead, there will be many valid types and uses of data – some complementary and some completely discrete. It has the potential to become a confusing mess. But, people will find ways to ingest, categorize, and correlate data to create value – today or in the future.

Utilizing data will become an increasingly important competitive advantage for the people and companies that know how to do something interesting and useful with it. Who knows what will be viewed as valuable data 5-10 years from now, but it will likely be different from what we view as valuable data today.

So, what are your thoughts? Can we predict the future based on the past? Or, is it simply enough to create platforms that are powerful enough, flexible enough, and extensible enough to change our understanding as our perspective of what is important changes? Either way, it will be fun!

This entry was posted in Artificial Intelligence, Big Data, Business Intelligence, Cloud Computing, Data Governance, Geospatial, innovation, IoT, machine learning, strategy, Technology, Uncategorized and tagged BI, BigData, commercialization of data, data lake, data privacy, data warehouse, Information of Things, IoT, machine learning, sensor data, VLDB.


Where Ideas, Experiences, and Lessons Learned Intersect

Thoughts about Business, Sales, Technology, Innovation, and Entrepreneurship.
