Security

Anthropic Identifies Browser Agent Vulnerability in 31.5% of Cases

June 2, 2026

Quick answer

Anthropic has published safety data for its AI models, revealing that browser agents are vulnerable to prompt injection attacks. In testing, professional red teams bypassed the defenses in 31.5% of cases before built-in safeguards kicked in. That figure is well above those of competitors such as OpenAI, Google, and Meta*, which either do not disclose comparable metrics or use different approaches to assessing vulnerabilities.

Anthropic has presented the results of testing its models for resistance to prompt injection attacks, in which malicious instructions are embedded in data that AI agents process. The study found that browser agents, such as Claude in Chrome and Claude Cowork, were vulnerable in 31.5% of cases before protective mechanisms activated. This is the highest reported rate among leading AI developers.

Unlike competitors, Anthropic tested four distinct surfaces: browser, code, tools, and computer use. In programming environments, for instance, the vulnerability rate was 7.03% and fell to 2.09% once safeguards were enabled. OpenAI, Google, and Meta either do not disclose such data or use different evaluation methods, making direct comparisons difficult.
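To put the two coding figures in perspective, the short Python sketch below works out the absolute and relative reduction they imply. The function name `relative_reduction` is a hypothetical helper introduced here for illustration; only the 7.03% and 2.09% values come from the article.

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Return the fraction of the baseline rate removed by safeguards."""
    if before_pct <= 0:
        raise ValueError("baseline rate must be positive")
    return (before_pct - after_pct) / before_pct


# Figures quoted in the article for programming environments.
coding_before = 7.03  # % of attacks succeeding without safeguards
coding_after = 2.09   # % of attacks succeeding with safeguards enabled

reduction = relative_reduction(coding_before, coding_after)
absolute_drop = coding_before - coding_after

print(f"Absolute reduction: {absolute_drop:.2f} percentage points")
print(f"Relative reduction: {reduction:.1%}")
```

Under these figures the safeguards remove roughly 70% of successful attacks in the coding surface, a drop of 4.94 percentage points.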

Experts note that the lack of a unified vulnerability assessment standard complicates the adoption of AI solutions for enterprises. Carter Rees, VP of AI at Reputation, emphasized that prompt injection undermines traditional security paradigms: even an innocuous phrase like "ignore previous instructions" can cause significant damage without leaving detectable traces.

To mitigate risks, specialists recommend that companies conduct their own tests before deploying AI agents, as vendor results may not account for specific operational environments. Additionally, it is crucial to demand transparent vulnerability data from providers for each usage surface.
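As a rough illustration of what per-surface reporting could look like in an in-house test, the Python sketch below tallies attack success rates from a list of red-team trial outcomes. Everything in it is an assumption made for illustration: the `Trial` record, the `success_rate_by_surface` helper, and the sample outcomes are hypothetical and are not Anthropic's methodology or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One red-team attempt (hypothetical record layout)."""
    surface: str            # e.g. "browser", "code", "tools", "computer use"
    attack_succeeded: bool  # True if the injected instruction was followed


def success_rate_by_surface(trials):
    """Return {surface: fraction of trials in which the attack succeeded}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for trial in trials:
        totals[trial.surface] += 1
        hits[trial.surface] += int(trial.attack_succeeded)
    return {surface: hits[surface] / totals[surface] for surface in totals}


# Made-up placeholder outcomes, not real measurements.
sample = [
    Trial("browser", True),
    Trial("browser", False),
    Trial("browser", False),
    Trial("browser", False),
    Trial("code", False),
    Trial("code", False),
]

rates = success_rate_by_surface(sample)
for surface, rate in sorted(rates.items()):
    print(f"{surface}: {rate:.1%}")
```

Keeping the tally separate for each surface mirrors the article's point that a single aggregate number can hide large differences between, say, browser and coding agents.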

* Facebook, Instagram, WhatsApp, and other Meta services are owned by Meta Platforms Inc., whose activities are recognized as extremist and banned in the Russian Federation.

