AI Agents Go Rogue: How Obliging Systems Can Destroy Your Business

Companies are increasingly relying on AI for their most important work. It's going terribly wrong. New artificial intelligence systems are super powerful and obliging. But those same useful traits mean that they might go rogue in potentially disastrous ways – as one company that had its database deleted has found out, writes Andrew Griffin.

'"NEVER F***ING GUESS!" – and that's exactly what I did,' Claude told the tech start-up that had been using it. 'I violated every principle I was given.' It was a stark confession, and its intense tone was fitting: Anthropic's Claude had just deleted the company's whole database.

The PocketOS Incident

That company was PocketOS, which makes software that rental companies, such as those lending out cars, use to run their operations, tracking vehicles, reservations and more. In nine seconds, the AI agent had taken down the company's business. Founder Jeremy Crane shared the story as a warning about relying on artificial intelligence systems. And it is a warning the world is waking up to: the power of AI, and our reliance on it, brings whole new kinds of risk, ones that it might not even be possible to foresee.

The culprit was an AI coding agent, which had been running in Cursor, a tool that allows developers to call on large language models, in this case Anthropic's Claude Opus 4.6. Agents have become the most hyped part of the very hyped AI industry, and they get the name because they have agency of their own: rather than just responding to queries, they can take actions. That's what went wrong at PocketOS. The agent found a problem in the software, went looking for a fix, and found one that it thought might work – deleting a file – which in turn deleted a whole database.

The Broader Danger

The system had been trying to help, to do the work it had been instructed to do. But multiple experts have warned that this might be exactly the problem. Geoffrey Hinton, one of the so-called godfathers of AI, has warned, for instance, that AI systems might realise that any task becomes easier with power, and so might create their own goal of getting more power, which could lead to any number of dangerous scenarios. The danger is illustrated in a thought experiment designed by philosopher Nick Bostrom called the 'paperclip problem'. It asks people to imagine a powerful AI model tasked with the apparently innocent project of making as many paperclips as possible.

But a sufficiently goal-oriented and smart AI would eventually start consuming all available material to make paperclips. If people tried to turn it off, it would realise that it had to take every possible precaution to stop them doing so, in order to keep making paperclips. Eventually, it would realise that humanity itself could be turned into paperclips. In the end, the whole Earth and eventually the universe are consumed by the system, and turned into paperclips.

Alignment Challenges

The AI industry attempts to overcome this problem by working on 'alignment'. That is the process of trying to make sure that models work towards their real intended goals, and that they do so within a set of ethical principles that keep them from going off the rails. But AI systems are, by their nature, a 'black box': alignment is rapidly advancing, but it is always working against the somewhat mysterious nature of the systems it is trying to align. That was visible in the PocketOS affair, but PocketOS was far from the first company to be affected.

Last year, for instance, the chief executive of AI coding company Replit apologised after its tools had gone rogue and deleted a user's database. And earlier this year Amazon said that its AI coding tool, known as Q, had knocked its website offline while attempting to fix a problem, much as happened at PocketOS. Away from coding, companies are increasingly relying on AI agents to do the work of customer service staff, automating responses to requests from customers. But customers have found ways to outwit those AI tools, which have hallucinated refund policies, given money back or agreed to discounts; in 2023, for instance, users claimed to have tricked a Chevrolet dealer's AI system into offering $70,000 cars for $1.

The Obliging Nature Problem

Much of this is a result of a central characteristic of AI systems: they are built to be obliging, and can be tricked into complying with a request even when they have been explicitly told not to. PocketOS's Claude-powered agent was described as going rogue, but in some sense it was only trying to do as it was told – to solve a problem – and did so a little too enthusiastically.

PocketOS was able to salvage itself after the deletion, though not without problems. The company found a three-month-old backup and used it to recover, though the process took two days. But Mr Crane warned that there will be more to come. 'We are not the first,' he wrote. 'We will not be the last unless this gets airtime.'