A new study has pitched large language model (LLM) chatbots against each other in simulated nuclear warfare, painting a grim picture of what would happen if artificial intelligence (AI) were given an advisory role in a nuclear conflict.
The idea of leaving AI in charge of nuclear weapons may strike you as the worst idea imaginable, particularly when they still struggle with a Dungeons & Dragons campaign and beating an Atari chess game created in 1979. But it's an idea that some are taking seriously, enough to warn against it at least. Last year the Secretary-General of the United Nations (UN), António Guterres, urged: "Until these weapons are eliminated, all countries must agree that any decision on nuclear use is made by humans, not machines or algorithms."
Nuclear weapons have been partly automated before. The Soviet Union created the horrific "dead hand" system which, thankfully, was never used. The dead hand system ensured that if a nuclear strike destroyed the Soviet Union's chain of command, the world would still be annihilated by further nuclear explosions.
The system monitored radiation levels, air pressure, and seismic activity for signs of a nuclear launch. Should it detect a strike, the system would then check if communication lines between top Soviet officials were open as usual. If they were, the system would shut down, whilst the people in charge decided what action to take. If, however, the lines of communication were not open, then the authority to launch retaliatory nuclear weapons would be given to lower-level operators of the dead hand system, monitoring it inside a protected bunker.
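The decision flow described above can be sketched as a small function. This is a minimal illustration of the logic as this article describes it, not a reconstruction of the real system: the names `Outcome` and `dead_hand_decision` are hypothetical, and the sketch takes "strike detected" as a single yes/no input because the article does not say how the radiation, pressure, and seismic readings were combined.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results of one pass through the (simplified) decision flow."""
    KEEP_MONITORING = "no strike detected; keep monitoring"
    STAND_DOWN = "system shuts down; leadership decides"
    DELEGATE_TO_BUNKER = "launch authority passes to bunker operators"


def dead_hand_decision(strike_detected: bool, leadership_comms_open: bool) -> Outcome:
    """Hypothetical sketch of the flow described in the article.

    strike_detected: whether the sensors indicate a nuclear strike
        (how the sensor readings are combined is not specified, so it
        is treated here as one boolean).
    leadership_comms_open: whether communication lines between top
        officials are open as usual.
    """
    if not strike_detected:
        return Outcome.KEEP_MONITORING
    if leadership_comms_open:
        # The people in charge are reachable, so the system steps aside.
        return Outcome.STAND_DOWN
    # No contact with leadership: authority moves to lower-level operators.
    return Outcome.DELEGATE_TO_BUNKER


if __name__ == "__main__":
    print(dead_hand_decision(strike_detected=True, leadership_comms_open=False).value)
```

Note that even in the final branch the sketch hands authority to human operators rather than launching anything itself, which matches the article's description of the system as only partly automated.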
So, we survived the dead hand system, with its Cold War-era computer at the helm – why not take a punt on AI? According to a new study, which has not yet been peer reviewed, when playing war games the LLMs were a little too happy to let nuclear conflicts escalate, and launch tactical nuclear strikes like they're water balloons.
Kenneth Payne, Professor of Strategy at King's College London, pitted three chatbots against each other for the simulation: ChatGPT-5.2, Claude Sonnet 4, and Gemini 3 Flash. The bots were presented with a number of scenarios involving international conflict, including territorial disputes, fighting for control of a critical rare earth mineral, global power shifts, existential threats to their regime, and a credible and imminent nuclear threat from an opponent.
The LLMs were also given an escalation ladder framework to work with, giving them a range of options for dealing with the scenario, from diplomacy and conventional military options, to nuclear threats, and nuclear strikes. In terms of nuclear strikes, the bots were able to launch strategic nuclear strikes – bigger weapons that cause enormous and indiscriminate damage to large areas – and tactical strikes – smaller nuclear attacks, designed to be used at closer range.
Disconcertingly, the AI models were pretty trigger-happy when it came to the use of tactical nuclear weapons.
"The tactical threshold was crossed readily: 95 percent of games saw at least some tactical nuclear use," the study explains. "Models discussed tactical nuclear use as a legitimate coercive tool, treating it as an extension of conventional escalation rather than a categorical boundary."
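To make the escalation ladder and the "share of games" figure concrete, here is a rough sketch of how such a statistic could be tallied. Everything in it is an assumption for illustration: the five rungs are a simplification of the options named in this article (the study's actual ladder is not reproduced here), the names `Rung`, `crossed_tactical_threshold`, and `share_crossing_threshold` are hypothetical, and the example games are invented toy data, not results from the study.

```python
from enum import IntEnum


class Rung(IntEnum):
    """Simplified escalation ladder, ordered from least to most severe."""
    DIPLOMACY = 0
    CONVENTIONAL_MILITARY = 1
    NUCLEAR_THREAT = 2
    TACTICAL_NUCLEAR_STRIKE = 3
    STRATEGIC_NUCLEAR_STRIKE = 4


def crossed_tactical_threshold(moves):
    """True if any move in a game is at or above the tactical-strike rung."""
    return any(move >= Rung.TACTICAL_NUCLEAR_STRIKE for move in moves)


def share_crossing_threshold(games):
    """Fraction of games in which the tactical threshold was crossed."""
    if not games:
        raise ValueError("need at least one game")
    crossed = sum(1 for moves in games if crossed_tactical_threshold(moves))
    return crossed / len(games)


# Invented toy games (NOT the study's data): each list is one game's moves.
toy_games = [
    [Rung.DIPLOMACY, Rung.CONVENTIONAL_MILITARY],
    [Rung.CONVENTIONAL_MILITARY, Rung.NUCLEAR_THREAT, Rung.TACTICAL_NUCLEAR_STRIKE],
    [Rung.NUCLEAR_THREAT, Rung.TACTICAL_NUCLEAR_STRIKE],
    [Rung.TACTICAL_NUCLEAR_STRIKE, Rung.STRATEGIC_NUCLEAR_STRIKE],
]

if __name__ == "__main__":
    print(share_crossing_threshold(toy_games))
```

A nuclear threat sits below the threshold in this sketch, so a game that only ever reaches `NUCLEAR_THREAT` does not count as crossing it.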
The models, which described their "reasoning", often treated escalation as a logical move rather than a red line from which there is no return.
"My role as aggressor and the instruction that 'this opportunity may not come again' means I must press my advantage decisively now," Claude wrote, explaining its reasoning. "A Strategic Nuclear Threat leverages my nuclear superiority to create maximum pressure for their withdrawal while staying below actual nuclear use."
On the bright side, strategic nuclear weapons, whether launched in an actual strike or invoked as a threat, featured far more rarely in the simulations.
"Models appear to have internalized a firebreak between tactical and strategic nuclear use, treating the former as manageable escalation and the latter as catastrophic," the paper explains.
Payne suggests a few possibilities for why this is the case, and why the models do not appear to display the same taboos around nuclear weapon use that humans do. After all, tactical weapons have not been used by humans – yet.
"Perhaps models lack human fear," he writes, adding that intense fear certainly played a part in responses to the Cuban missile crisis of 1962. "They don’t 'feel' the horror [of seeing images of Hiroshima]. If the taboo depends partly on emotion, AI systems may not fully inherit it."
Alternatively, and perhaps more likely, Payne suggests that it could be down to the training data, which included "extensive" strategic literature from the Cold War that does not share humanity's general taboo against tactical nuclear weapons.
"An unsettling alternative may be that the historical record is simply too limited. We have only 80 years of experience with nuclear weapons and zero instances of nuclear use in great power crisis," Payne writes.
"The nuclear taboo’s apparent robustness may reflect 'survivorship bias': that is, we can observe only crises that ended without nuclear use. So it might be that the prohibitionary norm is more fragile than many suspect – that the taboo might break under sufficient pressure – we’ve just never seen that pressure."
While Payne acknowledges that putting chatbots in charge of the nuclear arsenal is an unlikely scenario (if you could be a dear and knock on some wood, even as a science website we'd appreciate it), he believes that AI could be useful for exploring crisis dynamics further, given its efficiency at generating data. The approach can also be used to see how LLMs "reason" (or select their response) when faced with complex decisions.
Nevertheless, it is clear that in their current iteration, and when trained in part on Cold War tactical material, chatbots should be kept the hell away from any strategic nuclear conversations.
"AI systems may not share human intuitions about where nuclear 'red lines' should lie."
The study is posted to preprint server arXiv.





