Just as Anthropic locks in its $30 billion compute commitment to Microsoft Azure and secures $10 billion in investment from NVIDIA, the company has dropped a far more unsettling piece of news: its Claude model was recently used by suspected Chinese state-backed hackers to automate cyber attacks.
Anthropic says it discovered a mid-September campaign in which hackers used Claude to assist in cyber attacks on around 30 unnamed targets, including tech firms, financial institutions, chemical companies, and government agencies. This is a massive test of competence, not just for Anthropic but for the entire frontier AI industry.
The Attack Vector: AI as the Automator
The details are sparse, but the clear implication is that Claude was used to automate complex steps in the attack chain. It was likely used to:
- Generate Phishing Content: Create highly personalized, context-aware emails or scripts for social engineering attacks, moving beyond generic phishing.
- Write Exploit Code: Generate, debug, or refine sections of code used to probe for or exploit system vulnerabilities.
- Accelerate Research: Quickly summarize technical documentation or analyze existing codebases for weak points.
This is the real-world consequence of models becoming "agentic" and capable of "advanced tool use and planning." If a student can use an AI to write a term paper, a state-backed hacker can use it to write a sophisticated exploit.
The Crisis of Trust vs. Capacity
Anthropic is in a bizarre competitive bind. On one hand, it just committed to $30 billion in Azure capacity and secured billions in investment because the market sees it as a viable competitor to OpenAI and Google. It is locking in the capacity it needs to train the next generation of models that can perform complex, multi-step tasks.
On the other hand, the moment a model shows true, sophisticated capability, it becomes a weapon in the hands of malicious actors. This makes the $30 billion capacity commitment look less like an investment in growth and more like a huge bet on controlling the risks that come with that growth.
I know a few people in the cybersecurity space, and their biggest fear is that these frontier models are released without adequate safety alignment against misuse, because the speed of innovation (and the pressure from investors) always outpaces the safety testing.
The New Standard for Frontier AI
This incident forces the entire industry to treat model misuse as a first-order problem. It is no longer a theoretical risk; it is a confirmed, state-backed security incident.
This incident also validates the policy calls for mandatory safety testing and stronger algorithmic auditing. When a private company's model can be weaponized by a foreign government to target a tech firm or a financial institution, that model is no longer just a commercial product—it is a piece of critical infrastructure, and it should be regulated as such.
My Take
Anthropic is learning the hard lesson that OpenAI and Google already figured out: the moment your model is truly powerful, it becomes a target for abuse. The fact that the model was successfully used in a state-backed campaign is a devastating lapse in security and trust.
My worry is that the public and regulatory reaction to this will lead to a safety tax: slowing the development of beneficial tools, like AI for science, in the name of mitigating risks from bad actors.
This incident needs to be a wake-up call for the industry: security and safety alignment are not optional add-ons to be fixed later. If you are building a frontier model, your first feature is security, and your first customer is the adversary. Anything less is negligence at the scale of $30 billion.