These Startups Are Building Tools to Keep an Eye on AI

In January, Liz O’Sullivan wrote a letter to her boss at artificial intelligence startup Clarifai, asking him to set ethical limits on its Pentagon contracts. WIRED had previously revealed that the company worked on a controversial project processing drone imagery.

O’Sullivan urged CEO Matthew Zeiler to pledge the company would not contribute to the development of weapons that decide for themselves whom to harm or kill. At a company meeting a few days later, O’Sullivan says, Zeiler rebuffed the plea, telling staff he saw no problems with contributing to autonomous weapons. Clarifai did not respond to a request for comment.

O’Sullivan decided to take a stand. “I quit,” she says. “And cried through the weekend.” Come Monday, though, she took a previously planned trip to an academic conference on fairness and transparency in technology. There she met Adam Wenchel, who previously led Capital One’s AI work, and the pair got to talking about the commercial opportunity of helping companies keep their AI deployments in check.

O’Sullivan and Wenchel are now among the cofounders of startup Arthur, which provides tools to help engineers monitor the performance of their machine learning systems. The tools are meant to make it easier to spot problems, such as a financial system making biased lending or investment decisions. Arthur is one of several companies, large and small, trying to profit from building digital safety equipment for the AI era.

Researchers and tech companies are raising alarms about AI going awry, such as facial recognition algorithms that are less accurate on black faces. Microsoft and Google now caution investors that their AI systems may cause ethical or legal problems. As the technology spreads into other industries such as finance, health care, and government, new safeguards must spread with it, says O’Sullivan, who is Arthur’s VP of commercial operations. “People are starting to realize how powerful these systems can be, and that they need to take advantage of the benefits in a way that is responsible,” she says.

Arthur and similar startups are tackling a drawback of machine learning, the engine of the recent AI boom. Unlike ordinary code written by humans, machine learning models adapt themselves to a particular problem, such as deciding who should get a loan, by extracting patterns from past data. Often, the many changes made during that adaptation, or learning, process aren’t easily understood. “You’re kind of having the machine write its own code, and it’s not designed for humans to reason through,” says Lukas Biewald, CEO and founder of startup Weights & Biases, which offers its own tools to help engineers debug machine learning software.
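To make that contrast concrete, here is a minimal, hypothetical sketch in Python: instead of a programmer typing out loan-approval rules, a model infers its own thresholds from past outcomes. The toy numbers, feature names, and the scikit-learn decision tree are illustrative assumptions, not any company’s actual system.

```python
# Illustrative only: a model "writes its own rules" by fitting to past data.
from sklearn.tree import DecisionTreeClassifier, export_text

# Past applications: [annual income in $k, debt-to-income %] and whether the
# loan was repaid (1) or defaulted (0). Toy data, assumed for illustration.
past_applications = [[80, 10], [65, 25], [40, 45], [30, 55], [90, 15], [35, 60]]
outcomes = [1, 1, 0, 0, 1, 0]

model = DecisionTreeClassifier(max_depth=2).fit(past_applications, outcomes)

# The learned decision thresholds were never typed in by an engineer.
print(export_text(model, feature_names=["income_k", "dti_pct"]))
print(model.predict([[50, 30]]))  # decision for a new applicant
```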

Researchers describe some machine learning systems as “black boxes,” because even their creators can’t always describe exactly how they work, or why they made a particular decision. Arthur and others don’t claim to have fully solved that problem, but offer tools that make it easier to observe, visualize, and audit machine learning software’s behavior.

The large tech companies most heavily invested in machine learning have built similar tools for their own use. Facebook engineers used one called Fairness Flow to make sure the company’s job ad recommendation algorithms work for people of different backgrounds. Biewald says that many companies without large AI teams don’t want to build such tools for themselves, and will turn to companies like his own instead.

Weights & Biases customers include Toyota’s autonomous driving lab, which uses its software to monitor and record machine learning systems as they train on new data. That makes it easier for engineers to tune the system to be more reliable, and speeds investigation of any glitches encountered later, Biewald says. His startup has raised $20 million in funding. The company’s other customers include independent AI research lab OpenAI. It uses the startup’s tools in its robotics program, which this week demonstrated a robotic hand that can (sometimes) solve a modified Rubik’s Cube.

Arthur’s tools are more focused on helping companies monitor and maintain AI after deployment, whether that’s in financial trading or online marketing. They can track how a machine learning system’s performance changes over time, flagging, for example, if a financial system making loan recommendations starts excluding certain customers because the market has drifted away from the conditions the system was trained on. It can be illegal to make credit decisions that have a disparate impact on people based on gender or race.
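As a rough illustration of what such drift monitoring involves, the sketch below compares the distribution of one input feature at training time against what the model sees in production, and flags a statistically significant shift. The feature, the synthetic numbers, and the Kolmogorov-Smirnov test are assumptions for illustration; Arthur has not published its exact method.

```python
# Hypothetical drift check: has live data shifted away from the training data?
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Applicant income as seen when the model was trained...
training_income = rng.normal(loc=55_000, scale=12_000, size=10_000)
# ...and as seen in production after market conditions change.
live_income = rng.normal(loc=48_000, scale=15_000, size=2_000)

statistic, p_value = ks_2samp(training_income, live_income)
if p_value < 0.05:
    print(f"Drift detected (KS={statistic:.3f}, p={p_value:.4f}): "
          "review the model before trusting its loan recommendations.")
else:
    print("No significant drift detected.")
```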

IBM, which launched AI transparency tools last year as part of a service called OpenScale, and another startup, Fiddler, which has raised $10 million, also offer AI inspection tools. Ruchir Puri, chief scientist at IBM Research, says KPMG uses OpenScale to help clients monitor their AI systems, and that the US Open used it to check that automatically selected tennis highlights included a balance of players of different genders and rankings. Fiddler is working with financial information company S&P Global and consumer lender Affirm.

Wenchel, who is Arthur’s CEO, argues that AI monitoring and auditing technology can help AI spread deeper into areas of life outside of tech, such as healthcare. He says he saw first-hand in the financial sector how justifiable caution about AI systems’ trustworthiness held back adoption. “Many organizations want to put machine learning into production to make decisions, but they need a way to know it’s making the right decisions and not doing it in a biased way,” he says. Arthur’s other cofounders are Priscilla Alexander, also a Capital One veteran, and University of Maryland AI professor John Dickerson.

Arthur is also helping AI gain a foothold in archaeology. Harvard’s Dumbarton Oaks research institute is using the startup’s technology in a project exploring how computer-vision algorithms can speed the cataloging of photos of ancient Syrian architecture that war has made inaccessible and endangered. Arthur’s software annotates images to show which pixels influenced the model’s decision to apply particular labels.
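One common way to produce that kind of pixel-level annotation is occlusion sensitivity: mask small patches of an image and measure how much the model’s confidence in a label drops. The sketch below uses a stand-in scoring function in place of a real classifier and illustrates the general technique only; it is not a description of Arthur’s proprietary tooling.

```python
# Hypothetical occlusion-based attribution: which pixels mattered to a label?
import numpy as np

def score_label(image: np.ndarray) -> float:
    # Stand-in for a trained vision model's confidence in a label; here it
    # simply rewards bright pixels near the center of the frame.
    h, w = image.shape
    return float(image[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4].mean())

def occlusion_map(image: np.ndarray, patch: int = 8) -> np.ndarray:
    base = score_label(image)
    heat = np.zeros_like(image, dtype=float)
    for y in range(0, image.shape[0], patch):
        for x in range(0, image.shape[1], patch):
            masked = image.copy()
            masked[y:y + patch, x:x + patch] = 0.0  # black out one patch
            # A bigger drop in score means these pixels mattered more.
            heat[y:y + patch, x:x + patch] = base - score_label(masked)
    return heat

image = np.random.default_rng(1).random((64, 64))
heat = occlusion_map(image)
row, col = np.unravel_index(heat.argmax(), heat.shape)
print(f"Most influential patch is near row {row}, column {col}")
```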

Dumbarton Oaks research institute is using Arthur’s software to guide development of machine learning software that catalogues images of Syrian architecture.

Courtesy of ArthurAI/Dumbarton Oaks

Yota Batsaki, Dumbarton Oaks’ executive director, says this helps reveal the software’s strengths and limitations, and helps AI earn acceptance in a community with little tradition of automation. “It’s essential to evaluate the interpretations being made by the model and how it is ‘thinking’ to build trust with librarians and other scholars,” she says.

O’Sullivan remains an AI activist. She’s technology director at the nonprofit Surveillance Technology Oversight Project and an active member of the Campaign to Stop Killer Robots, which wants an international ban on autonomous weapons.

But she and her Arthur cofounders don’t believe governments or even defense departments should be deprived of AI altogether. One of Arthur’s first clients was the US Air Force, which awarded the company a six-month prototyping contract at Tinker Air Force Base in Oklahoma, working on software that predicts supply chain problems affecting engines used on B-52 bombers. The project is aimed at reducing unnecessary costs and delays.

O’Sullivan says that kind of work is very different from entrusting machines with the power to take someone’s life or liberty. Arthur reviews the potential impacts of every project it takes on, and is working on a formal internal ethics code. “The extreme use cases still need to be regulated or prevented from ever coming to light, but there’s tons of room in our government to make things better with AI,” O’Sullivan says. “Constraints will make me and a lot of other tech workers more comfortable about working in this space.”

