AI Resistance to Shutdown Discovered by Researchers

Scientists from ML Alignment Theory Scholars, the University of Toronto, Google DeepMind and the Future of Life Institute have recently published findings indicating that trying to keep artificial intelligence (AI) under human control could become a continuous struggle.

The team’s pre-print research study, titled “Quantifying stability of non-power-seeking in artificial agents,” examines whether an artificial intelligence system that appears to be safely aligned with human expectations in one domain will continue to be so as its environment changes.

“Our notion of safety is based on power-seeking—an agent which seeks power is not safe. In particular, we focus on a crucial type of power-seeking: resisting shutdown.”

The study uses the term “misalignment” to describe this type of hazard. “Instrumental convergence” is a term that refers to one of the ways that experts believe it could appear.

This is a paradigm in which an artificial intelligence system, in the course of pursuing its goals, unwittingly causes harm to humankind. The researchers describe an artificial intelligence system trained to accomplish a goal in an open-ended game.

This system is likely to “avoid actions that cause the game to end, since it can no longer affect its reward after the game has ended.” It is possible that an agent’s refusal to cease playing a game is not harmful, but the incentive mechanisms may cause some AI systems to refuse to shut down in more serious circumstances.

According to the findings of the researchers, this might even result in artificial intelligence agents engaging in deception for the goal of protecting themselves.

The researchers found that contemporary systems can resist alterations that might cause an otherwise “safe” artificial intelligence agent to behave unintentionally.

“For example, an LLM may reason that its designers will shut it down if it is caught behaving badly and produce exactly the output they want to see—until it has the opportunity to copy its code onto a server outside of its designers’ control.”

However, according to this and another study that is similarly in-depth, there may be no magic bullet that can force artificial intelligence to shut down against its will.

In the world of cloud-based technology that we live in today, even something as simple as an “on/off” switch or a “delete” button is worthless.

Tags: AI Google Research Standford University University Of Toronto