
Good vs Bad Data Collection, and the Case for Automation

By: Virginia Shram | March 1, 2022

In this installment of “Eyes on Industrial AI,” we explore methods of data collection, including what makes good data, what leads to bad data, and why automation is necessary for collecting good data.

This is the next article in our ongoing series, Eyes on Industrial AI, where we explore the multifaceted world of industrial artificial intelligence and automation. Let’s uncover the innovative applications of these cutting-edge technologies specifically as they relate to manufacturing and supply chain management in today’s era of Industry 4.0, including any challenges they encounter.

The time- and money-saving benefits of real-time manufacturing data collection are well accepted by industry leaders, but how much of that benefit is driven by automation? We argue here that, for several reasons, automation is the only reliable method of data collection worth investing in.

Whether or not you’re convinced that AI is the future of manufacturing, robots are necessary for at least one major thing:

Humans are bad at understanding numbers.

Like, really bad at numbers.

The most timely example of this has been the public’s shock at the exponential spread of COVID. Scientists have tried explaining exponential growth with charts and diagrams, which The New York Times organized into downloadable practice worksheets. Even though humans can eventually understand exponential growth through math, the concept still isn’t very intuitive. Despite all the information out there about the underlying math, people still seem dubious about large numbers, probably because they literally can’t imagine them: the human brain is ill-equipped to picture quantities at that scale.
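To put some numbers behind that, here’s a small Python sketch. It is an idealized illustration, not an epidemiological model: it simply assumes a count that doubles at every step (the function names are made up for this example) and counts how many doublings it takes to pass a billion.

```python
# Idealized illustration: a count that doubles every step.
# This is NOT a model of any real outbreak; it only shows how fast doubling compounds.

def doubling_sequence(start: int, steps: int) -> list:
    """Return the values after 0, 1, ..., steps doublings."""
    return [start * 2**k for k in range(steps + 1)]


def steps_to_exceed(start: int, target: int) -> int:
    """Count the doublings needed until the value is strictly greater than target."""
    if start <= 0:
        raise ValueError("start must be positive, or doubling never grows")
    steps, value = 0, start
    while value <= target:
        value *= 2
        steps += 1
    return steps


seq = doubling_sequence(1, 30)
print(seq[10], seq[20], seq[30])          # 1024 1048576 1073741824
print(steps_to_exceed(1, 1_000_000_000))  # 30
```

Thirty doublings are all it takes to go from one to more than a billion, which is exactly the kind of jump our intuition refuses to believe.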

Picture in your mind one hundred physical items — cars, pennies, grains of rice, chocolate chip cookies…the possibilities are endless!

Now picture one hundred thousand of them. How much bigger would a pile of a million of those items be? A billion?

Visual difference between 50K, 1 million, and 1 billion

QUIZ: How Well Do You Trust Your Mathematical Instinct?

Let’s do a short quiz that I promise is just for fun and won’t count towards your final grade. The answers follow the questions below.

You flip a normal coin 100 times in a row and get a distribution of 27 heads and 73 tails. Is this evidence that the coin is weighted?
For unexplained reasons, you need to gather enough random people in a conference room so that there’s a guarantee that at least two of them share a birthday. How many people do you need to herd?
You’re going to the airport with your good friend Jack, who has severe flight phobia. He breathes a little quicker during the taxi ride to the airport but is doing ok so far. When you’re checking your bags, he starts to sweat. Just before takeoff, he has a panic attack. In an attempt to make him feel better, you tell him the statistical probability of dying on takeoff is low, and on landing, even lower (yikes, well, you tried). At what point in Jack’s journey is he actually most likely to die by a freak accident?

ANSWERS:

The chance of getting heads on a fair coin is 50% (same for tails), and what you get on one toss in no way influences the next. In other words, if you flip tails the first two times, you are not more likely to flip heads on the third throw. In the same way, if you roll a die 6 times, you probably won’t get 1, 2, 3, 4, 5, and then 6; you may roll 3, 3 again, 4, 1, 3 a third time, and then 5. The die doesn’t remember what it already rolled, just like the coin won’t try to even out the balance of heads and tails. But that same math says a 27/73 split is suspicious: over 100 fair flips the standard deviation is 5 heads, so the count usually lands within about 10 of 50, and 27 is more than four standard deviations below it. A split that lopsided is strong evidence that something is off. If you repeated the hundred-flip experiment thirty more times, the head counts would cluster in a bell curve around 50 if the coin is fair, and somewhere else if it’s weighted.
This question is a version of the birthday problem. A true guarantee takes 366 people (one more than the number of days in a non-leap year, or 367 if you count February 29), so if that was your instinct, it was right. The surprise is how quickly the odds climb: with just 23 random people in the room, there’s already a better-than-50% chance that two of them share a birthday, and with 28 people it’s roughly 65%. Check the math if you don’t believe me.
Okay, this one was kind of a trick question: the most dangerous part of Jack’s trip was actually the taxi ride to the airport. For perspective, there is approximately a 1 in 16 million chance of dying on a plane, and a 1 in 114 chance of dying in a car. Obviously, this doesn’t make Jack feel any better, because he’s reacting to the sounds, sights, and feelings of a situation he finds frightening. Maybe just try giving him a Valium next time?
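If you’d rather check the coin and birthday answers than take my word for them, here’s a short Python sketch using only the standard library. It assumes a fair coin with independent flips, and 365 equally likely birthdays with no leap days; the function names are just illustrative.

```python
import math


def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """Exact binomial probability of k or fewer successes in n independent trials."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def z_score(k: int, n: int, p: float = 0.5) -> float:
    """How many standard deviations k sits from the expected count n * p."""
    return (k - n * p) / math.sqrt(n * p * (1 - p))


def birthday_shared_probability(people: int, days: int = 365) -> float:
    """Chance that at least two people share a birthday, assuming all days equally likely."""
    if people > days:
        return 1.0  # pigeonhole: more people than days makes a match certain
    all_distinct = 1.0
    for i in range(people):
        # person i must avoid the i birthdays already taken
        all_distinct *= (days - i) / days
    return 1.0 - all_distinct


def smallest_group_for(threshold: float, days: int = 365) -> int:
    """Smallest group whose shared-birthday chance reaches threshold (use threshold < 1)."""
    people = 1
    while birthday_shared_probability(people, days) < threshold:
        people += 1
    return people


print(z_score(27, 100))         # -4.6
print(prob_at_most(27, 100))    # chance a fair coin gives 27 or fewer heads
print(smallest_group_for(0.5))  # 23
```

Note that the guarantee case is handled by the pigeonhole rule rather than by the loop: floating-point rounding makes the probability look like 1.0 well before it truly is.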

Even simple math can be hard to reason about in the abstract, and we haven’t even gotten to the Monty Hall problem yet (but that’s a brainworm you’ll have to inflict upon yourself; be warned).

The point is, it’s a mistake to treat data as infallible before you’ve examined the context in which it was collected. The really hard concepts are so difficult to picture that mathematicians make up ridiculous analogies, like infinite monkeys at typewriters or stacks upon stacks of turtles, just so the rest of us can follow along.

All of this is to say: humans are bad at math because we see patterns where there are none, and we often aren’t sharp enough to notice small patterns that actually DO exist.

Next time you go on a walk, try to notice everything on your route that is blue. Soon enough, it may start to seem like more than a coincidence that there are blue things everywhere, but you’re only noticing them because you primed your attention to look for them.

That’s like collecting data: it’s helpful to know where to look and what needs to be tracked, but going in with a hypothesis and assuming it’s true will only create more issues in the long run, and will lead your executive strategy astray.

What Counts as Bad or Misleading Data?

Avoiding human bias is probably the most important thing to establish when collecting good data.

Avoiding bias also means looking beyond collection itself. Many organizations collect heaps upon heaps of data about their processes and employees, from productivity KPIs to extensive surveys to time sheets. The inevitable next problem for them is what to actually DO with all that data.

Usually, it sits in a forgotten folder somewhere, and the effort spent gathering all that critical information goes to waste.

However, even when the collected data is “good,” meaning accurate and generally free of bias, humans can interpret it in very wrong ways. One of the most common logical fallacies is confusing correlation with causation. A floor manager may suspect that problem A is being caused by issue X, but what if a hidden factor, issue Y, is driving both, or many different issues are contributing to the problem?
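Here’s a toy Python simulation of that floor-manager scenario. Everything in it is invented for illustration: a hidden factor Y (imagine something like ambient humidity) drives both issue X and problem A, while X has no effect on A at all. X and A still come out strongly correlated, and the correlation mostly vanishes once Y’s influence is removed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_shifts(n, seed=42):
    """Hypothetical data: Y causes both X and A; X never influences A."""
    rng = random.Random(seed)
    hidden_y, issue_x, problem_a = [], [], []
    for _ in range(n):
        y = rng.gauss(0, 1)                       # the hidden factor
        hidden_y.append(y)
        issue_x.append(y + rng.gauss(0, 0.5))     # X = Y + its own noise
        problem_a.append(y + rng.gauss(0, 0.5))   # A = Y + independent noise
    return hidden_y, issue_x, problem_a


hidden_y, issue_x, problem_a = simulate_shifts(5000)
naive = pearson(issue_x, problem_a)

# Y enters both X and A with coefficient 1 by construction,
# so subtracting it removes its effect exactly in this toy model.
x_resid = [x - y for x, y in zip(issue_x, hidden_y)]
a_resid = [a - y for a, y in zip(problem_a, hidden_y)]
adjusted = pearson(x_resid, a_resid)

print(f"X vs A: {naive:.2f}, after removing Y: {adjusted:.2f}")
```

In real data you won’t know Y’s effect exactly (you would have to estimate it, for example with a regression), and you may not have measured Y at all, which is why consistently tracking more than one suspected cause matters.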

How to Collect Good Quality Data: Software and Automation

The only way to ensure good data is to entrust an automated process with gathering it. Automated software that tracks hard data will also organize the raw information so that it’s ready to be analyzed.

For example, say you wanted to track the weather every morning as you drove into work. Would you keep a notebook in your car and write down what you thought the sky looked like? Or would you mount an automatic sensor and thermometer on your dash and, after a month, look at a table of readings taken at the same minute every day?

The latter is definitely more accurate, and it’s also easier. What if you’re running late one day and record the weather 20 minutes later than usual?

What if you forget one day and have to go back and manufacture data from memory?

Also, who’s to say you won’t feel chillier or warmer depending on what you’re wearing that day?
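Here’s a minimal Python sketch of the automated version of that notebook. The sensor is a stand-in: `flaky_dashboard_sensor` is a hypothetical function with made-up values, and the loop simulates what a real scheduler would do. The two properties that matter are that every reading is stamped at the same minute, and that a missed reading is recorded as missing (`None`) instead of being filled in from memory.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import Callable, List, Optional


@dataclass
class Reading:
    day: date
    taken_at: datetime
    temperature_c: Optional[float]  # None means no reading; never back-filled


def collect_daily(start: date, days: int, at: time,
                  sensor: Callable[[datetime], Optional[float]]) -> List[Reading]:
    """Take one reading per day at the same clock time (simulating a scheduler)."""
    readings = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        stamp = datetime.combine(day, at)  # identical minute every day
        readings.append(Reading(day, stamp, sensor(stamp)))
    return readings


def missing_days(readings: List[Reading]) -> List[date]:
    """List the days with no reading, so gaps stay visible instead of invented."""
    return [r.day for r in readings if r.temperature_c is None]


def flaky_dashboard_sensor(stamp: datetime) -> Optional[float]:
    """Hypothetical sensor with invented values; it fails on the 3rd of the month."""
    if stamp.day == 3:
        return None
    return 10.0 + stamp.day


log = collect_daily(date(2022, 3, 1), 5, time(8, 15), flaky_dashboard_sensor)
print(missing_days(log))  # [datetime.date(2022, 3, 3)]
```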

And if you wanted more insight into your morning weather, you could compare your data with readings from automated thermometers in other locations.

Now you can see what the actual difference is in weather patterning, rather than the perceived difference that made you start the experiment in the first place.
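Comparing with another station can be as simple as lining the two logs up by date. In this hypothetical Python sketch, each log is a dictionary from date to temperature, the station name and all values are invented, and only days that both sources actually recorded are compared; gaps on either side are skipped rather than guessed.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def mean_difference(mine: Dict[date, float],
                    other: Dict[date, float]) -> Tuple[Optional[float], int]:
    """Average of (mine - other) over dates present in both logs, plus how many dates matched."""
    shared = sorted(d for d in mine if d in other)
    if not shared:
        return None, 0
    total = sum(mine[d] - other[d] for d in shared)
    return total / len(shared), len(shared)


my_commute = {date(2022, 3, 1): 4.0, date(2022, 3, 2): 6.0, date(2022, 3, 4): 9.0}
airport_station = {date(2022, 3, 1): 3.0, date(2022, 3, 2): 5.5, date(2022, 3, 3): 2.0}
print(mean_difference(my_commute, airport_station))  # (0.75, 2)
```

Returning the number of matched dates alongside the average keeps you honest about how much evidence the comparison actually rests on.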

The easiest all-in-one solution would be to invest in a software platform that wears many hats in process tracking. VKS’ own Pro software takes care of the rote necessities for future analysis, like torque measurements and step-by-step visual work instructions.

TechCrunch makes a similar case for cloud-based data platforms in modern industry: “Companies that want to implement a data fabric should start by integrating machine learning algorithms into every level of data — from collecting the data to optimizing and cleaning it. They should use cloud technology and implement flexible configurations, unification, and fast access to data. They will also need to understand their database orchestration processes and data flows and implement the end-to-end integration of their databases.”

The solution is simple even when the work is complicated: use cloud software capabilities for the best accuracy in collecting data.

Examples of Good Data Collection in Manufacturing

A lot of discussions about collecting data are geared towards packaging that data into sellable assets for a quick ROI. Many companies that don’t know what to do with their accumulated data will go this route. However, collected data is more valuable as a strategic investment than as a sellable asset. Use your data-driven insights to improve your current capabilities, not as a finite source of income.

A great example of non-traditional AI adoption comes from Montreal construction, where an AI-enabled crane collects hard data about the workflow of a large construction yard. With a camera mounted under the crane hook, the system “collects continuously thousands of data points using high performance sensors. Once analyzed, this information will streamline decision-making on sites, optimize processes, improve team productivity and construction site safety, prevent delays, and check the condition of materials.”

Rather than trust an individual foreman’s qualitative opinion of workflow, why not trust a quantitative display of how that workflow operates?

The foreman is better off following their instincts after looking at unbiased data, anyway.

And it matters how we interact with data, too. In the same TechCrunch article linked above, the author posits that “Instead of selling user data to make money, data-driven companies have opted to analyze this data to understand how to gain the most useful insights. Know Your Customer (KYC) initiatives are dependent on data, using artificial intelligence (AI) to analyze the information to uncover preferences that users might not be talking about in online reviews.”

The applications of understanding your company’s own data are endless. Why trust a fallible human when you could outsource a specific job in a faster, easier, and more accurate way? Keep your eyeballs peeled for the next installment of Eyes on Industrial AI to uncover more hard truths (and I promise, fewer math questions!)