This is the next article in our ongoing series, Eyes on Industrial AI, where we explore the multifaceted world of industrial artificial intelligence and automation. Let’s uncover the innovative applications of these cutting-edge technologies specifically as they relate to manufacturing and supply chain management in today’s era of Industry 4.0, including any challenges they encounter.
How many times have you heard, “We’d know what to do if only we had a bigger dataset!” in response to a persistent issue?
Relying on data collection is good practice, but an improperly implemented automated data collection system can do more harm than good. You see:
Data is only as good as the sieve it is being filtered through.
Improper data collection and analysis = waste.
And waste costs time, money, and labor for corrections. It is also a sign of a sub-optimized manufacturer.
For every person convinced that automation is the magical answer for fixing backlogs and other issues without the need for systemic reorganization, there is another who believes that these rigid systems only kick problems further down the road.
The Myth of Neutral Data
OK, so automation isn’t perfect, but can it be actively harmful?
In Virginia Eubanks’ bestselling book, Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor, she argues just that.
“As long as this invisible code is running in the background while we build new automated decision-making tools,” she writes, “the resulting technologies will profile, police, and punish the poor. This doesn’t require bad intentions on the part of designers, data scientists, or program administrators. It only requires designing ‘in neutral,’ ignoring the entirely predictable consequences of supercharging already unequal and discriminatory systems” (page 223).
She hits the nail on the head: people assume data is neutral, when in fact neutrality must be mined from proper data through careful analysis.
By the way, Eubanks isn’t technologically averse; she’s a political science professor and a tech journalist who believes that maybe one day society will use automation and data tools to correct resource distribution instead of perpetuating the unequal status quo.
But wait! Isn’t Automating Inequality about how public systems like government and social services shouldn’t be automated? That has nothing to say about private manufacturing using automated processes.
Good point. A few things to keep in mind when addressing that difference:
- The boundary between public and private isn’t so clear, especially with private contractors on government projects. It may be rare for one company to be in charge of 100% of its own data collection, since it may outsource to other firms. Either way, navigating the divide between what you can and can’t control in data collection is remarkably similar to navigating the divide between public and private.
- Part of what makes Eubanks’ book so interesting is that the waste caused by bad data automation is extravagant, especially when the public services being automated are meant to help the poorest people. When technological systems fail to provide for the people they are meant to serve, the emotional impact is great. The outrage we feel when we read about such gross failings reminds us of our humanity as we progress technologically, which is absolutely necessary for healthy progress regardless of industry.
- The best way for manufacturers to stretch the limits of what their data can do is to look at data everywhere, not just within a particular niche industry. There are parallels aplenty, even if you have nothing to do with charity or public services.
(Plus, it’s a really great read.)
Alright, let’s stop moralizing about technology and look at the facts of three case studies.
Each one is covered in more depth in Eubanks’ book, and each offers a valuable takeaway about implementing new AI technologies.
Hopefully others’ errors can help inform current manufacturing innovators of the potential difficulties ahead when considering new information systems.
Case Study #1: Automated eligibility system for welfare in Indiana
In Indiana, the Family and Social Services Administration (FSSA) runs a variety of programs like food stamps and welfare.
In 2006, Indiana decided to revamp the FSSA in order to streamline services and better prevent welfare fraud. They awarded a contract, which included instituting an automated eligibility system, to companies IBM and ACS.
The thinking was that eliminating caseworkers and relying on an automated database would leave fewer opportunities for applicants to sidestep red tape through personal relationships, and would make delivery faster and more efficient.
Almost immediately, the project was a disaster.
The few human phone operators were not properly trained for social work. Moreover, the system was too inflexible about categories. Under the previous system, applicants who hit problems in the process could call their assigned caseworker, who would help them untangle the issues.
Under the new system, applicants had nowhere to turn and ended up at dead ends, such as dismissed applications or phone lines that rang unanswered.
Takeaway #1: Value your team and know when to invest in a human touch
It may not have been the worst idea to introduce an automated database, but Indiana missed the mark when it came to human interaction. Especially with the kind of services that the FSSA provided, it felt cold and robotic to participate in the eligibility process, and many dropped out due to the stress of coordinating their cases.
There are many opportunities to automate customer service-like tasks for businesses today, and a lot of them save money and training time by doing so.
However, just because you can cut certain corners doesn’t mean you should. Adding that extra, personalized human interaction can make or break your client relationships as well as the connections you have within your team.
The health-food restaurant chain Freshii is learning this lesson firsthand: it was reported that the chain’s Ontario locations are outsourcing cashier jobs to Nicaragua, where the minimum wage is a paltry $3.25.
Quite naturally, customers run into practical issues with a virtual cashier, and the general public feels upset about the low pay and involvement of non-local workers.
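One way to picture the "human touch" in an automated intake process is a routing rule that hands a request to a person whenever the automation does not recognize it or has already failed the requester. The sketch below is only an illustration: the category names, the `Request` fields, and the `max_failed_attempts` threshold are all assumptions made up for this example, not details of any system discussed above.

```python
from dataclasses import dataclass

# Hypothetical categories that an automated intake flow knows how to handle.
KNOWN_CATEGORIES = {"order_status", "invoice_copy", "address_change"}


@dataclass
class Request:
    request_id: str
    category: str
    failed_attempts: int = 0  # times the automated flow has already failed this person


def route(request: Request, max_failed_attempts: int = 2) -> str:
    """Return "automated" or "human" for a single request."""
    # Anything the system cannot categorize goes to a person instead of a dead end.
    if request.category not in KNOWN_CATEGORIES:
        return "human"
    # Repeated failures are a signal that the automated path is not working.
    if request.failed_attempts >= max_failed_attempts:
        return "human"
    return "automated"


print(route(Request("r1", "order_status")))
print(route(Request("r2", "missing_document")))
print(route(Request("r3", "invoice_copy", failed_attempts=2)))
```

The design choice here is that the default for anything unexpected is a person, not a rejection.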
Case Study #2: Coordinated entry system for the unhoused in LA
“Once they scale up, digital systems can be remarkably hard to decommission… New technologies develop momentum as they are integrated into institutions. As they mature, they become increasingly difficult to challenge, redirect, or uproot” (Automating Inequality, page 187).
Los Angeles (and particularly Skid Row) is the location of possibly the worst housing crisis in the United States, with housing availability scant and the homeless population rising.
The coordinated entry system, which relies upon a disconnected network of social organizations, uses an algorithm to score each unhoused person’s level of “need” as a number between 1 and 17. Then, with the aid of a housing coordinator, the person jumps through more bureaucratic hoops in order to (hopefully, no guarantees) get a small apartment.
The rules of coordinated entry just don’t add up.
Aside from holding incomplete, siloed datasets, the database isn’t even monitored. The police are allowed unfettered access to it, which often results in officers fishing for personal information to use as preemptive evidence.
No one knows how long certain data points are kept, or even the full list of authority figures who have access. People slip through the cracks, or are passed over due to health issues.
Some parts are dangerously illogical, like the fact that spending time in jail counts as “housing” and interrupts a person’s history of “neediness.” Another example is that an unhoused person needs 3-5 years of rental history, which is obviously a high bar to clear given the circumstances.
Takeaway #2: Data architecture needs a strong foundation
Los Angeles took data collection seriously. Unfortunately, the methods of holding and maintaining that database spun out of control.
Some people still make it through the system, but many more have applied several times and never progressed beyond the first step. The database doesn’t track information accurately, and as people move around, they are lost to the system.
Make sure you’re not spilling data everywhere in the excitement of having lots to work with. Recently, Facebook/Meta has had an issue with tracking its own data, and a leaked internal document admits the company doesn’t know exactly where its data goes.
How do you know that some data isn’t being spilled?
Are there self-referential rules that don’t make sense or that result in dead ends?
Build a strong foundation of data architecture such that you can rely upon it whether you have 10 datasets or 10 million datasets.
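The two questions above (is data being spilled, and do the rules lead to dead ends?) can each be turned into a small automated check. The following is a minimal sketch under stated assumptions: the record IDs, the workflow states, and both function names are hypothetical and invented for illustration; they do not describe the Los Angeles system or any particular product.

```python
def lost_records(source_ids, destination_ids):
    """Return IDs that exist upstream but never arrived downstream."""
    return sorted(set(source_ids) - set(destination_ids))


def dead_end_states(transitions, terminal_states):
    """Return states that have no way out and are not valid end points.

    transitions maps each state to the list of states it can move to.
    """
    states = set(transitions)
    for targets in transitions.values():
        states.update(targets)
    return sorted(
        state
        for state in states
        if not transitions.get(state) and state not in terminal_states
    )


# Hypothetical example: records collected on the shop floor vs. records
# that actually landed in the reporting database.
collected = ["part-001", "part-002", "part-003"]
stored = ["part-001", "part-003"]
print(lost_records(collected, stored))  # ['part-002']

# Hypothetical workflow: "on_hold" can be entered but never left.
workflow = {
    "received": ["inspected"],
    "inspected": ["shipped", "on_hold"],
    "on_hold": [],
}
print(dead_end_states(workflow, terminal_states={"shipped"}))  # ['on_hold']
```

Neither check depends on how many datasets you have, which is the point: run them on every transfer and every rule change, at 10 datasets or 10 million.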
Case Study #3: Predictive risk model for child abuse in Allegheny County
“Even if a regression finds factors that predictably rise and fall together, correlation is not causation… A model’s predictive ability is compromised when outcome variables are subjective. Was a parent re-referred to the hotline because she neglects her children? Or because someone in the neighborhood was mad that she had a party last week? Did caseworkers and judges put a child in foster care because his life was in danger? Or because they held culturally specific ideas about what a good parent looks like, or feared the consequences if they didn’t play it safe?” (Automating Inequality, page 146)
In Pennsylvania in 2012, the Allegheny County Office of Children, Youth, and Families (CYF) offered a contract for an automated data triage system. The winning bid came from a team led by a New Zealand economist.
The team built a predictive model using the CYF data history to gauge the probability of a child being abused. Variables included the time spent on public benefits, past history with child welfare, mother’s age and relationship status, mental health, and correctional history.
The predictive model wasn’t good enough, and it also reproduced racial bias because of the areas its data was mined from. It had only “fair to good accuracy”; in other words, it rearranged the data and spat out a reasonable probability that was already in there in the first place.
Because the predictive model was right the majority of the time, people tended to over-rely on its answer when faced with a complicated situation.
In effect, the system data-mined certain underprivileged communities and then recycled that data, manufacturing its own necessity.
Takeaway #3: Don't make assumptions without investigation and standardized testing
There have been multiple recent instances in which cutting-edge AI programs have perpetuated racist or sexist bias. It’s common to mistake an AI’s bias for natural prediction, but remember the means by which the data was collected.
The AI model gets fed certain information and regurgitates it in a new form. Of course uncomfortable biases will repeat when the first dataset contains those same biases.
Make sure that any predictive models used in automation processes are logically sound and well-designed, and aren't doubling back and feeding their own analyses in as input variables.
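Two simple audits can make this advice concrete: comparing how often a model wrongly flags people in each group, and checking whether any of the model's inputs are really its own earlier outputs. The sketch below is an assumption-laden illustration, not the method used in Allegheny County; the group labels, the record format, and the feature names are all hypothetical.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Compute the false positive rate for each group.

    records is an iterable of (group, predicted_flag, actual_flag) tuples.
    Only cases where actual_flag is False can produce a false positive.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {group: false_positives[group] / n for group, n in negatives.items()}


def self_referential_features(feature_names, model_output_names):
    """Return input features that are actually the model's own earlier outputs."""
    return sorted(set(feature_names) & set(model_output_names))


# Hypothetical audit data: (group, model flagged?, outcome actually occurred?)
records = [
    ("A", True, False),
    ("A", False, False),
    ("A", True, True),
    ("B", False, False),
    ("B", False, False),
    ("B", True, True),
]
print(false_positive_rate_by_group(records))  # {'A': 0.5, 'B': 0.0}

# Hypothetical feature list that includes last year's score from the same model.
features = ["machine_age", "defect_count", "prior_risk_score"]
outputs = ["prior_risk_score"]
print(self_referential_features(features, outputs))  # ['prior_risk_score']
```

A large gap in false positive rates between groups does not prove the model is biased on its own, but it is a signal to go back and investigate how the underlying data was collected.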