Researchers Want Guardrails to Help Prevent Bias in AI

Artificial intelligence has given us algorithms capable of recognizing faces, diagnosing disease, and, of course, crushing computer games. But even the smartest algorithms can sometimes behave in unexpected and unwanted ways, for example picking up gender bias from the text or images they are fed.

A new framework for building AI programs suggests a way to prevent aberrant behavior in machine learning by specifying guardrails in the code from the outset. It aims to be particularly useful for non-experts deploying AI, an increasingly common situation as the technology moves out of research labs and into the real world.

The approach is one of several proposed in recent years for curbing the worst tendencies of AI programs. Such safeguards could prove vital as AI is used in more critical situations, and as people become suspicious of AI systems that perpetuate bias or cause accidents.

Last week Apple was rocked by claims that the algorithm behind its credit card offers much lower credit limits to women than to men of the same financial means. The company was unable to prove that the algorithm had not inadvertently picked up some form of bias from its training data. Just the idea that the Apple Card might be biased was enough to turn customers against it.

Similar backlashes could derail adoption of AI in areas like health care, education, and government. “People are looking at how AI systems are being deployed and they’re seeing they are not always being fair or safe,” says Emma Brunskill, an assistant professor at Stanford and one of the researchers behind the new approach. “We’re worried right now that people may lose faith in some forms of AI, and therefore the potential benefits of AI might not be realized.”

Examples of AI systems behaving badly abound. Last year, Amazon was forced to ditch a hiring algorithm that was found to be gender biased; Google was left red-faced after the autocomplete algorithm for its search bar was found to produce racial and sexual slurs. In September, a widely used image database was shown to attach all sorts of inappropriate labels to images of people.

Machine learning experts often design their algorithms to guard against certain unintended consequences. But that’s not as easy for non-experts who might use a machine learning algorithm off the shelf. It’s further complicated by the fact that there are many ways to define “fairness” mathematically or algorithmically.
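To see why "fair" has no single mathematical meaning, consider two common definitions from the broader fairness literature (standard metrics, not necessarily the ones used in the new paper): demographic parity compares how often each group receives a positive prediction, while equal opportunity compares how often qualified members of each group do. The toy numbers below are made up purely for illustration, and the same predictions can look fair by one measure and unfair by the other.

```python
import numpy as np

# Toy hiring predictions for two groups, A and B (made-up numbers).
# y_true = 1 means the candidate was actually qualified; y_pred = 1 means "hire".
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 0])
group = np.array(list("AAAABBBB"))

def selection_rate(pred, mask):
    """Share of the group given a positive prediction (demographic parity)."""
    return pred[mask].mean()

def true_positive_rate(true, pred, mask):
    """Share of qualified group members given a positive prediction (equal opportunity)."""
    qualified = mask & (true == 1)
    return pred[qualified].mean()

for g in ("A", "B"):
    m = group == g
    print(g, selection_rate(y_pred, m), true_positive_rate(y_true, y_pred, m))
# Here the two groups have identical true-positive rates (0.5 and 0.5) but
# different selection rates (0.5 vs 0.25), so the model looks "fair" under
# equal opportunity and "unfair" under demographic parity.
```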

The new approach proposes building an algorithm so that, when it is deployed, there are boundaries on the results it can produce. “We need to make sure that it’s easy to use a machine learning algorithm responsibly, to avoid unsafe or unfair behavior,” says Philip Thomas, an assistant professor at the University of Massachusetts Amherst who also worked on the project.
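At a high level, the framework splits the data in two: one part is used to pick a candidate model, and the other is held out for an independent safety test that must certify, with high confidence, that a user-supplied behavioral constraint holds; otherwise the algorithm refuses to return anything. The sketch below is a simplified rendering of that idea, not the paper's actual code, and the function names are illustrative. The GPA example further down shows the kind of quantity a `constraint` function might measure.

```python
import numpy as np
from scipy import stats

def seldonian_train(data, fit, constraint, delta=0.05, seed=0):
    """Rough sketch of a Seldonian-style training procedure.

    fit(train_data)         -> returns a candidate model
    constraint(model, data) -> per-sample measurements g; the behavioral
                               requirement is that the true mean of g is <= 0
    delta                   -> allowed probability of wrongly certifying a model
    """
    # Split the data: one half selects a candidate model, the other half is
    # held out for an independent safety test.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    half = len(data) // 2
    candidate_data, safety_data = data[idx[:half]], data[idx[half:]]

    model = fit(candidate_data)

    # Safety test: a one-sided (1 - delta) upper confidence bound on the mean
    # of g, via a Student's t interval. Return the model only if the bound
    # shows the constraint holds.
    g = np.asarray(constraint(model, safety_data))
    n = len(g)
    upper = g.mean() + g.std(ddof=1) / np.sqrt(n) * stats.t.ppf(1 - delta, n - 1)
    if upper > 0:
        return None  # "No Solution Found": the constraint cannot be certified
    return model
```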

The researchers demonstrate the method on several machine learning techniques and a couple of hypothetical problems in a paper published Thursday in the journal Science.

First, they show how it could be used in a simple algorithm that predicts college students' GPAs from entrance exam results, a common practice that can result in gender bias, because women tend to do better in school than their entrance exam scores would suggest. With the new algorithm, a user can limit how much it may over- or under-predict GPAs for male and female students, on average.
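Here is a rough sketch, with made-up data and an illustrative cap, of the quantity such a limit would control: the gap between how much a fitted model over- or under-predicts GPA for women versus men on average. A real Seldonian algorithm would check a high-confidence bound on this gap using held-out data, and would search for a model likely to pass that test, rather than simply reporting a raw number as done here.

```python
import numpy as np

EPSILON = 0.05  # illustrative user-chosen cap on the average prediction gap, in GPA points

# Made-up data: entrance-exam scores, true GPAs, and a gender flag. The GPAs
# are simulated so that women do slightly better than their exam scores predict.
rng = np.random.default_rng(1)
exam = rng.normal(70, 10, size=400)
is_female = rng.random(400) < 0.5
gpa = 0.03 * exam + 0.6 + 0.15 * is_female + rng.normal(0, 0.3, size=400)

# Candidate model: ordinary least squares, GPA ~ slope * exam_score + intercept.
slope, intercept = np.polyfit(exam, gpa, deg=1)
err = (slope * exam + intercept) - gpa          # positive = over-prediction

# The behavioral constraint from the article: average over/under-prediction
# must not differ between female and male students by more than EPSILON.
gap = abs(err[is_female].mean() - err[~is_female].mean())
print(f"average prediction gap: {gap:.2f} GPA points")
if gap > EPSILON:
    print("constraint violated -> return No Solution Found instead of this model")
```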

In another example, the team developed an algorithm for balancing the performance and safety of an automated insulin pump. Such pumps decide how much insulin to deliver at mealtimes, and machine learning can help determine the right dose for a patient. The algorithm they designed can be told by a doctor to consider only dosages within a particular range, and to keep low the probability of doses that lead to dangerously low or high blood-sugar levels.
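A bare-bones sketch of those two kinds of guardrail, with hypothetical names and purely illustrative numbers (not medical guidance, and not the paper's actual dosing model): the proposed dose is clamped to a doctor-specified range, and the dosing rule is only deployed if a safety test on held-out outcomes bounds the estimated probability of dangerously low blood sugar.

```python
import numpy as np

# Doctor-specified limits -- purely illustrative numbers, not medical guidance.
DOSE_MIN, DOSE_MAX = 1.0, 10.0   # allowed insulin units at a mealtime
HYPO_THRESHOLD = 70.0            # blood sugar (mg/dL) considered dangerously low
MAX_HYPO_PROB = 0.05             # required bound on the probability of going low

def propose_dose(carbs_grams, carb_ratio):
    """Candidate dosing rule, clamped so it can never leave the allowed range."""
    return float(np.clip(carbs_grams / carb_ratio, DOSE_MIN, DOSE_MAX))

def safety_test(post_meal_glucose, delta=0.05):
    """Pass only if the estimated hypoglycemia rate, plus a Hoeffding-style
    confidence margin, stays under the required bound."""
    lows = (np.asarray(post_meal_glucose) < HYPO_THRESHOLD).astype(float)
    margin = np.sqrt(np.log(1 / delta) / (2 * len(lows)))
    return lows.mean() + margin <= MAX_HYPO_PROB

# Usage: deploy the dosing rule only if the safety test passes on held-out data.
held_out_glucose = np.random.default_rng(2).normal(120, 15, size=2000)
print(propose_dose(carbs_grams=60, carb_ratio=8.0))                 # 7.5 units, inside the range
print("deploy" if safety_test(held_out_glucose) else "No Solution Found")
```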

The researchers call their algorithms “Seldonian” in reference to Hari Seldon, a character created by science fiction author Isaac Asimov, whose famous “three laws of robotics” begin with the rule: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”
